├── requirements.txt ├── MANIFEST.in ├── docs ├── source │ ├── LICENSE.rst │ ├── README.rst │ ├── CHANGELOG.rst │ ├── images │ │ ├── dataset_structure.png │ │ └── package_data_and_metadata_into_beautiful_box.png │ ├── citing_dtool.rst │ ├── configuring_storage_brokers.rst │ ├── tagging_datasets.rst │ ├── index.rst │ ├── publishing_a_dataset.rst │ ├── configuring_the_dtool_cache_directory.rst │ ├── python_api.rst │ ├── configuring_user_name_and_email.rst │ ├── creating_plugins.rst │ ├── installation_notes.rst │ ├── annotating_datasets.rst │ ├── configuring_a_custom_readme_template.rst │ ├── philosophy.rst │ ├── working_with_overlays.rst │ ├── conf.py │ ├── quick_start_guide.rst │ └── working_with_datasets.rst ├── Makefile └── make.bat ├── icons └── 22x22 │ └── dtool_logo.png ├── .gitignore ├── tests ├── test_dtool_package.py └── __init__.py ├── setup.cfg ├── tox.ini ├── .github ├── dependabot.yml └── workflows │ ├── test.yml │ └── publish.yml ├── dtool └── __init__.py ├── LICENSE.rst ├── pyproject.toml ├── README.rst └── CHANGELOG.rst /requirements.txt: -------------------------------------------------------------------------------- 1 | -e . 2 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README.rst 2 | include LICENSE.rst 3 | -------------------------------------------------------------------------------- /docs/source/LICENSE.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../../LICENSE.rst 2 | -------------------------------------------------------------------------------- /docs/source/README.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../../README.rst 2 | -------------------------------------------------------------------------------- /docs/source/CHANGELOG.rst: -------------------------------------------------------------------------------- 1 | .. 
include:: ../../CHANGELOG.rst 2 | -------------------------------------------------------------------------------- /icons/22x22/dtool_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jic-dtool/dtool/HEAD/icons/22x22/dtool_logo.png -------------------------------------------------------------------------------- /docs/source/images/dataset_structure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jic-dtool/dtool/HEAD/docs/source/images/dataset_structure.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.swp 3 | *.egg-info 4 | 5 | .coverage 6 | .eggs 7 | .tox 8 | .pytest_cache 9 | env 10 | build 11 | dist 12 | 13 | dtool/version.py 14 | -------------------------------------------------------------------------------- /tests/test_dtool_package.py: -------------------------------------------------------------------------------- 1 | """Test the dtool package.""" 2 | 3 | 4 | def test_version_is_string(): 5 | import dtool 6 | assert isinstance(dtool.__version__, str) 7 | -------------------------------------------------------------------------------- /docs/source/images/package_data_and_metadata_into_beautiful_box.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jic-dtool/dtool/HEAD/docs/source/images/package_data_and_metadata_into_beautiful_box.png -------------------------------------------------------------------------------- /docs/source/citing_dtool.rst: -------------------------------------------------------------------------------- 1 | Citing dtool 2 | ============ 3 | 4 | Olsson TSG, Hartley M. 2019. Lightweight data management with dtool. PeerJ 7:e6562 https://doi.org/10.7717/peerj.6562 5 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [flake8] 2 | exclude=env*,.tox,.git,*.egg,build,docs 3 | 4 | [tool:pytest] 5 | testpaths = tests 6 | addopts = --cov=dtool 7 | #addopts = -x --pdb 8 | 9 | [coverage:run] 10 | source = dtool 11 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist=py27,py3,flake8 3 | 4 | [testenv] 5 | deps=pytest 6 | pytest-cov 7 | mock 8 | pytest-mock 9 | coverage 10 | -r{toxinidir}/requirements.txt 11 | commands=py.test 12 | 13 | [testenv:flake8] 14 | deps=flake8 15 | commands=flake8 16 | -------------------------------------------------------------------------------- /docs/source/configuring_storage_brokers.rst: -------------------------------------------------------------------------------- 1 | Configuring storage brokers 2 | =========================== 3 | 4 | Some remote storage brokers require extra configuration to enable 5 | authentication. 6 | 7 | The command below configures access to an Azure storage container named 8 | ``jicinformatics``:: 9 | 10 | $ dtool config azure set jicinformatics the-secret-token 11 | the-secret-token 12 | 13 | For information on other storage brokers, have a look at their documentation 14 | and/or use ``dtool config --help`` to get more information. 
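Like other ``dtool config`` subcommands, the command above persists its value in the ``~/.config/dtool/dtool.json`` file. As a rough sketch, the file might afterwards contain something like the JSON below; note that the exact key name used by the Azure storage broker is an assumption here, so check the dtool-azure documentation for the authoritative name::

    {
        "DTOOL_AZURE_ACCOUNT_KEY_jicinformatics": "the-secret-token"
    }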
15 | -------------------------------------------------------------------------------- /.github/dependabot.yml: -------------------------------------------------------------------------------- 1 | # To get started with Dependabot version updates, you'll need to specify which 2 | # package ecosystems to update and where the package manifests are located. 3 | # Please see the documentation for all configuration options: 4 | # https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates 5 | 6 | version: 2 7 | updates: 8 | # Maintain dependencies for GitHub Actions 9 | - package-ecosystem: "github-actions" 10 | directory: "/" 11 | schedule: 12 | interval: "daily" 13 | 14 | - package-ecosystem: "pip" 15 | directory: "/" 16 | schedule: 17 | interval: "daily" -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Minimal makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | SPHINXPROJ = dtool 8 | SOURCEDIR = source 9 | BUILDDIR = build 10 | 11 | # Put it first so that "make" without argument is like "make help". 12 | help: 13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 14 | 15 | .PHONY: help Makefile 16 | 17 | # Catch-all target: route all unknown targets to Sphinx using the new 18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 19 | %: Makefile 20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 21 | -------------------------------------------------------------------------------- /docs/source/tagging_datasets.rst: -------------------------------------------------------------------------------- 1 | Tagging datasets 2 | ================ 3 | 4 | It is possible to tag datasets with labels. 5 | 6 | To tag a dataset with the label "rnaseq" one would use the command below:: 7 | 8 | $ dtool tag set rnaseq 9 | 10 | It is possible to add more than one tag to a dataset. The command below 11 | adds the tag "A.thaliana":: 12 | 13 | $ dtool tag set A.thaliana 14 | 15 | To list tags one would use the command below:: 16 | 17 | $ dtool tag ls 18 | 19 | This would produce the output:: 20 | 21 | A.thaliana 22 | rnaseq 23 | 24 | It is possible to delete a tag that has been added to a dataset:: 25 | 26 | 27 | $ dtool tag delete A.thaliana
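Tags can also be read and written programmatically. Below is a minimal sketch that assumes the tagging methods exposed by the ``dtoolcore`` Python API (``put_tag``, ``list_tags`` and ``delete_tag``); the dataset URI is illustrative.

.. code-block:: python

    from dtoolcore import DataSet

    dataset = DataSet.from_uri("file:///Users/olssont/my_dataset")
    dataset.put_tag("rnaseq")      # equivalent to: dtool tag set
    print(dataset.list_tags())     # equivalent to: dtool tag ls
    dataset.delete_tag("rnaseq")   # equivalent to: dtool tag delete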
-------------------------------------------------------------------------------- /docs/source/index.rst: -------------------------------------------------------------------------------- 1 | dtool: Manage Scientific Data 2 | ============================= 3 | 4 | Make your data more resilient, portable and easy to work with by packaging 5 | files & metadata into self-contained datasets. 6 | 7 | .. toctree:: 8 | :maxdepth: 2 9 | 10 | README 11 | installation_notes 12 | philosophy 13 | quick_start_guide 14 | working_with_datasets 15 | tagging_datasets 16 | annotating_datasets 17 | working_with_overlays 18 | configuring_user_name_and_email 19 | configuring_the_dtool_cache_directory 20 | configuring_a_custom_readme_template 21 | configuring_storage_brokers 22 | publishing_a_dataset 23 | python_api 24 | creating_plugins 25 | citing_dtool 26 | CHANGELOG 27 | LICENSE 28 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- 1 | """Test fixtures.""" 2 | 3 | import os 4 | import shutil 5 | import tempfile 6 | 7 | import pytest 8 | 9 | _HERE = os.path.dirname(__file__) 10 | 11 | 12 | @pytest.fixture 13 | def chdir_fixture(request): 14 | d = tempfile.mkdtemp() 15 | curdir = os.getcwd() 16 | os.chdir(d) 17 | 18 | @request.addfinalizer 19 | def teardown(): 20 | os.chdir(curdir) 21 | shutil.rmtree(d) 22 | 23 | 24 | @pytest.fixture 25 | def tmp_dir_fixture(request): 26 | d = tempfile.mkdtemp() 27 | 28 | @request.addfinalizer 29 | def teardown(): 30 | shutil.rmtree(d) 31 | return d 32 | 33 | 34 | @pytest.fixture 35 | def local_tmp_dir_fixture(request): 36 | d = tempfile.mkdtemp(dir=_HERE) 37 | 38 | @request.addfinalizer 39 | def teardown(): 40 | shutil.rmtree(d) 41 | return d 42 | -------------------------------------------------------------------------------- /docs/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | pushd %~dp0 4 | 5 | REM Command file for Sphinx documentation 6 | 7 | if "%SPHINXBUILD%" == "" ( 8 | set SPHINXBUILD=sphinx-build 9 | ) 10 | set SOURCEDIR=source 11 | set BUILDDIR=build 12 | set SPHINXPROJ=dtool 13 | 14 | if "%1" == "" goto help 15 | 16 | %SPHINXBUILD% >NUL 2>NUL 17 | if errorlevel 9009 ( 18 | echo. 19 | echo.The 'sphinx-build' command was not found. Make sure you have Sphinx 20 | echo.installed, then set the SPHINXBUILD environment variable to point 21 | echo.to the full path of the 'sphinx-build' executable. Alternatively you 22 | echo.may add the Sphinx directory to PATH. 23 | echo. 24 | echo.If you don't have Sphinx installed, grab it from 25 | echo.http://sphinx-doc.org/ 26 | exit /b 1 27 | ) 28 | 29 | %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% 30 | goto end 31 | 32 | :help 33 | %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% 34 | 35 | :end 36 | popd 37 | -------------------------------------------------------------------------------- /docs/source/publishing_a_dataset.rst: -------------------------------------------------------------------------------- 1 | Publishing a dataset 2 | ==================== 3 | 4 | It is possible to publish datasets hosted in AWS S3 and Microsoft Azure 5 | Storage. A dataset is published by making it accessible via the HTTP(S) 6 | protocol. 7 | 8 | .. warning:: A published dataset is accessible by anyone in the world with an 9 | internet connection! 10 | 11 | .. code-block:: none 12 | 13 | $ dtool publish s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db 14 | Dataset accessible at https://dtool-demo.s3.amazonaws.com/ba92a5fa-d3b4-4f10-bcb9-947f62e652db 15 | 16 | The URL returned by the ``dtool publish`` command can be used to interact with the dataset. 17 | 18 | .. 
code-block:: none 19 | 20 | $ dtool summary https://dtool-demo.s3.amazonaws.com/ba92a5fa-d3b4-4f10-bcb9-947f62e652db 21 | name: hypocotyl3 22 | uuid: ba92a5fa-d3b4-4f10-bcb9-947f62e652db 23 | creator_username: olssont 24 | number_of_items: 339 25 | size: 86.7MiB 26 | frozen_at: 2018-09-12 27 | 28 | -------------------------------------------------------------------------------- /docs/source/configuring_the_dtool_cache_directory.rst: -------------------------------------------------------------------------------- 1 | Configuring the dtool cache directory 2 | ===================================== 3 | 4 | When fetching an item from a dataset stored in object storage, the file 5 | gets stored in a cache directory. The default cache directory is:: 6 | 7 | ~/.cache/dtool 8 | 9 | You may want to configure this cache to be in a different location. This can be achieved using the ``dtool config cache`` command:: 10 | 11 | $ mkdir /tmp/dtool 12 | $ dtool config cache /tmp/dtool 13 | 14 | It is also possible to override both the default and the configured cache 15 | directory by exporting the environment variable ``DTOOL_CACHE_DIRECTORY``. 16 | This can be useful when using local SSD on a compute cluster:: 17 | 18 | 19 | $ mkdir /local/ssd/dtool 20 | $ export DTOOL_CACHE_DIRECTORY=/local/ssd/dtool 21 | 22 | 23 | .. warning:: There is no automatic mechanism built into dtool to clear up the 24 | cache. It can therefore grow very large if you are working with 25 | lots of datasets in object storage. 26 | -------------------------------------------------------------------------------- /dtool/__init__.py: -------------------------------------------------------------------------------- 1 | """dtool package.""" 2 | 3 | import logging 4 | 5 | logger = logging.getLogger(__name__) 6 | 7 | # workaround for diverging python versions: 8 | try: 9 | from importlib.metadata import version, PackageNotFoundError 10 | logger.debug("imported version, PackageNotFoundError from importlib.metadata") 11 | except ModuleNotFoundError: 12 | from importlib_metadata import version, PackageNotFoundError 13 | logger.debug("imported version, PackageNotFoundError from importlib_metadata") 14 | 15 | # first, try to determine dynamic version at runtime 16 | try: 17 | __version__ = version(__name__) 18 | logger.debug("Determined version %s via importlib_metadata.version", __version__) 19 | except PackageNotFoundError: 20 | # if that fails, check for static version file written by setuptools_scm 21 | try: 22 | from .version import version as __version__ 23 | logger.debug("Determined version %s from autogenerated dtool/version.py", __version__) 24 | except Exception as e: 25 | logger.debug("All efforts to determine version failed: %s", e) 26 | __version__ = None -------------------------------------------------------------------------------- /LICENSE.rst: -------------------------------------------------------------------------------- 1 | MIT License 2 | =========== 3 | 4 | Copyright (c) 2017 Tjelvar Olsson 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy 7 | of this software and associated documentation files (the "Software"), to deal 8 | in the Software without restriction, including without limitation the rights 9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | copies of the Software, and to permit persons to whom the Software is 11 | furnished to do so, subject to the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be 
included in all 14 | copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 22 | SOFTWARE. 23 | -------------------------------------------------------------------------------- /docs/source/python_api.rst: -------------------------------------------------------------------------------- 1 | Python API 2 | ========== 3 | 4 | The ``dtool`` command line tool is built using the Python API in `dtoolcore 5 | <https://github.com/jic-dtool/dtoolcore>`_. This API can also be used to create 6 | and interact with datasets directly. 7 | 8 | Below is an example showing how to load a dataset from a URI and use it to 9 | print out a list of all the data item identifiers in the dataset. 10 | 11 | .. code-block:: python 12 | 13 | >>> from dtoolcore import DataSet 14 | >>> dataset = DataSet.from_uri("bgi-sequencing-12345") 15 | >>> for i in dataset.identifiers: 16 | ... print(i) 17 | ... 18 | 1c10766c4a29536bc648260f456202091e2f57b4 19 | fbcc24bed36128535a263b74b2e138d7cc43e90c 20 | 9ca330a84f3dbbdd457a860b5e3c21c917743dd6 21 | 3dce23b901709a24cfbb974b70c1ef132af10a67 22 | 78e7f1507da598e9f6a02810c1f846cfc24fb8ad 23 | 42f43f49b74ef7f901010965aae71170c9fd3ef6 24 | ab069337b0f86cdad899d57e8de63d5b2b680c85 25 | b55ae3fbe6081eb2ed4ed2c4ea316dbeb943ea2c 26 | 27 | More information on how to make use of the Python API can be found in the 28 | `dtoolcore documentation <https://dtoolcore.readthedocs.io>`_. 29 | -------------------------------------------------------------------------------- /.github/workflows/test.yml: -------------------------------------------------------------------------------- 1 | name: test 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | - master 8 | tags: 9 | - '*' 10 | pull_request: 11 | 12 | jobs: 13 | test: 14 | runs-on: ubuntu-latest 15 | 16 | strategy: 17 | matrix: 18 | python-version: ['3.7', '3.8', '3.9', '3.10', '3.11', '3.12'] 19 | 20 | steps: 21 | - name: Git checkout 22 | uses: actions/checkout@v3 23 | 24 | - name: Set up python3 ${{ matrix.python-version }} 25 | uses: actions/setup-python@v2 26 | with: 27 | python-version: ${{ matrix.python-version }} 28 | 29 | - name: Install requirements 30 | run: | 31 | python -m pip install --upgrade pip 32 | pip install flake8 33 | pip install .[test] 34 | pip list 35 | 36 | - name: Test with pytest 37 | run: | 38 | pytest -sv 39 | 40 | - name: Lint with flake8 41 | run: | 42 | # stop the build if there are Python syntax errors or undefined names 43 | flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics 44 | # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide 45 | flake8 . 
--count --exit-zero --max-complexity=12 --max-line-length=127 --statistics -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["setuptools>=42", "setuptools_scm[toml]>=6.3"] 3 | build-backend = "setuptools.build_meta" 4 | 5 | [project] 6 | name = "dtool" 7 | description = "dtool command line client for managing data" 8 | readme = "README.rst" 9 | license = {file = "LICENSE.rst"} 10 | authors = [ 11 | {name = "Tjelvar Olsson", email = "tjelvar.olsson@gmail.com"} 12 | ] 13 | dynamic = ["version"] 14 | dependencies = [ 15 | "dtoolcore==3.18.3", 16 | "dtool-cli==0.7.1", 17 | "dtool-create==0.23.4", 18 | "dtool-info==0.16.2", 19 | "dtool-symlink==0.3.1", 20 | "dtool-http==0.5.1", 21 | "dtool-config==0.4.1", 22 | "dtool-overlay==0.3.1", 23 | "dtool-annotation==0.1.1", 24 | "dtool-tag==0.1.1" 25 | ] 26 | 27 | [project.optional-dependencies] 28 | test = [ 29 | "pytest", 30 | "pytest-cov" 31 | ] 32 | docs = [ 33 | "sphinx", 34 | "sphinx_rtd_theme" 35 | ] 36 | 37 | [project.urls] 38 | Documentation = "https://dtool.readthedocs.io" 39 | Repository = "https://github.com/jic-dtool/dtool" 40 | Changelog = "https://github.com/jic-dtool/dtool/blob/master/CHANGELOG.rst" 41 | 42 | [tool.setuptools_scm] 43 | version_scheme = "guess-next-dev" 44 | local_scheme = "no-local-version" 45 | write_to = "dtool/version.py" 46 | 47 | [tool.setuptools] 48 | packages = ["dtool"] 49 | -------------------------------------------------------------------------------- /docs/source/configuring_user_name_and_email.rst: -------------------------------------------------------------------------------- 1 | Configuring user name and email 2 | =============================== 3 | 4 | When running the ``dtool readme interactive`` command the default name and email 5 | address are ``Your Name`` and ``you@example.com``. 6 | 7 | :: 8 | 9 | $ dtool readme interactive my_dataset 10 | description [Dataset description]: 11 | project [Project name]: 12 | confidential [False]: 13 | personally_identifiable_information [False]: 14 | name [Your Name]: 15 | email [you@example.com]: 16 | username [olssont]: 17 | creation_date [2017-12-14]: 18 | 19 | These defaults can be changed by configuring the user name and email address. 20 | 21 | :: 22 | 23 | $ dtool config user name "Care A. Bout-Data" 24 | Care A. Bout-Data 25 | $ dtool config user email researcher@famous.uni.ac.uk 26 | researcher@famous.uni.ac.uk 27 | 28 | 29 | 30 | Rerunning the previous ``dtool readme interactive`` command now gives updated 31 | defaults when prompting for input. 32 | 33 | :: 34 | 35 | $ dtool readme interactive my_dataset 36 | description [Dataset description]: 37 | project [Project name]: 38 | confidential [False]: 39 | personally_identifiable_information [False]: 40 | name [Care A. 
Bout-Data]: 41 | email [researcher@famous.uni.ac.uk]: 42 | username [olssont]: 43 | creation_date [2017-12-14]: 44 | -------------------------------------------------------------------------------- /.github/workflows/publish.yml: -------------------------------------------------------------------------------- 1 | name: publish 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | - master 8 | tags: 9 | - '*' 10 | 11 | jobs: 12 | build: 13 | 14 | runs-on: ubuntu-latest 15 | 16 | steps: 17 | - uses: actions/checkout@v4 18 | with: 19 | fetch-depth: 0 20 | 21 | - name: Set up Python 3.12 22 | uses: actions/setup-python@v5 23 | with: 24 | python-version: 3.12 25 | 26 | - name: Install requirements 27 | run: | 28 | pip install --upgrade build 29 | pip install --upgrade setuptools wheel setuptools-scm[toml] 30 | pip list 31 | 32 | - name: Package distribution 33 | run: | 34 | python -m build 35 | 36 | - name: Publish distribution to Test PyPI 37 | uses: pypa/gh-action-pypi-publish@release/v1 38 | continue-on-error: true 39 | with: 40 | user: __token__ 41 | password: ${{ secrets.test_pypi_password }} 42 | repository-url: https://test.pypi.org/legacy/ 43 | verbose: true 44 | skip-existing: true 45 | 46 | - name: Publish distribution to PyPI 47 | if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags') 48 | uses: pypa/gh-action-pypi-publish@release/v1 49 | with: 50 | user: __token__ 51 | password: ${{ secrets.pypi_password }} 52 | verbose: true 53 | -------------------------------------------------------------------------------- /docs/source/creating_plugins.rst: -------------------------------------------------------------------------------- 1 | Creating plugins 2 | ================ 3 | 4 | It is possible to create plugins for the ``dtool`` command line tool. There are 5 | two different types of plugins: command line tools and backend storage brokers. 6 | The former allows a developer to add custom extensions to the ``dtool`` 7 | command. The latter allows a developer to create an interface for talking to a 8 | new type of storage. One could for example create a storage broker to interface 9 | with `Amazon S3 <https://aws.amazon.com/s3/>`_ object storage. 10 | 11 | 12 | Extending the ``dtool`` command line tool 13 | ----------------------------------------- 14 | 15 | Information on how to extend the ``dtool`` command line tool is available in 16 | the README file of `dtool-cli <https://github.com/jic-dtool/dtool-cli>`_. 17 | 18 | Concrete examples making use of this plugin system are: 19 | 20 | - `dtool-create <https://github.com/jic-dtool/dtool-create>`_ 21 | - `dtool-info <https://github.com/jic-dtool/dtool-info>`_ 22 | 23 | 24 | Creating an interface to a new type of storage 25 | ---------------------------------------------- 26 | 27 | Below are the steps required to create a storage broker for allowing ``dtool`` 28 | to interact with a new backend. A concrete example making use of this plugin 29 | system is `dtool-irods <https://github.com/jic-dtool/dtool-irods>`_. 30 | 31 | 1. Examine the code in ``dtoolcore.storagebroker.DiskStorageBroker``. 32 | 2. Create a Python class for your storage, e.g. ``MyStorageBroker`` 33 | 3. Add a ``MyStorageBroker.key`` attribute to the class; this key is used to 34 | look up an appropriate storage broker when interacting with a dataset 35 | 4. Add a ``dtoolcore.FileHasher`` instance that matches the hashing algorithm 36 | used by your storage to your ``MyStorageBroker.hasher`` attribute 37 | 5. Add implementations for all the public functions in the 38 | ``dtoolcore.storagebroker.DiskStorageBroker`` class to ``MyStorageBroker`` 39 | 6. Expose the ``MyStorageBroker`` class as a ``dtool.storage_brokers`` 40 | entry point, e.g. 
add a section along the lines of the one below to the 41 | ``setup.py`` file:: 42 | 43 | entry_points={ 44 | "dtool.storage_brokers": [ 45 | "MyStorageBroker=my_dtool_storage_plugin:MyStorageBroker", 46 | ], 47 | }, 48 | 
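To make these steps concrete, below is a minimal sketch of what such a class could look like. It is illustrative only: the ``key`` value and module layout are assumptions, the import of ``FileHasher`` and ``md5sum_hexdigest`` follows the pattern used by existing storage broker plugins, and the definitive list of public methods to implement should be taken from ``dtoolcore.storagebroker.DiskStorageBroker`` itself.

.. code-block:: python

    from dtoolcore.filehasher import FileHasher, md5sum_hexdigest


    class MyStorageBroker(object):

        # Key used to look up this storage broker when interacting
        # with a dataset, e.g. via URIs of the form mystorage://...
        key = "mystorage"

        # Hasher matching the hashing algorithm used by the storage.
        hasher = FileHasher(md5sum_hexdigest)

        def __init__(self, uri, config_path=None):
            self.uri = uri

        # Implementations of the public DiskStorageBroker methods
        # (e.g. get_text, put_text, put_item, iter_item_handles)
        # go here.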
-------------------------------------------------------------------------------- /docs/source/installation_notes.rst: -------------------------------------------------------------------------------- 1 | Installation notes 2 | ================== 3 | 4 | dtool is a Python package that is pip installable. 5 | 6 | Make sure that ``pip``, ``setuptools`` and ``wheel`` are up to date. 7 | This is a requirement of one of the dependencies (``ruamel.yaml``). 8 | 9 | .. code-block:: none 10 | 11 | $ pip install -U pip setuptools wheel 12 | 13 | dtool can then be installed using ``pip``. 14 | 15 | .. code-block:: none 16 | 17 | $ pip install dtool 18 | 19 | 20 | Adding support for S3 object storage 21 | ------------------------------------ 22 | 23 | Install the ``dtool-s3`` package using ``pip``. 24 | 25 | .. code-block:: none 26 | 27 | $ pip install dtool-s3 28 | 29 | To configure Amazon S3 credentials see the README file in the `dtool-s3 30 | <https://github.com/jic-dtool/dtool-s3>`_ GitHub repository. 31 | 32 | 33 | Adding support for Azure storage 34 | -------------------------------- 35 | 36 | Install the ``dtool-azure`` package using ``pip``. 37 | 38 | .. code-block:: none 39 | 40 | $ pip install dtool-azure 41 | 42 | To configure Microsoft Azure credentials see the README file in the 43 | `dtool-azure <https://github.com/jic-dtool/dtool-azure>`_ GitHub repository. 44 | 45 | 46 | Adding support for ECS S3 object storage 47 | ---------------------------------------- 48 | 49 | Install the ``dtool-ecs`` package using ``pip``. 50 | 51 | .. code-block:: none 52 | 53 | $ pip install dtool-ecs 54 | 55 | To configure ECS S3 object storage credentials see the README file in the 56 | `dtool-ecs <https://github.com/jic-dtool/dtool-ecs>`_ GitHub repository. 57 | 58 | 59 | Adding support for iRODS storage 60 | -------------------------------- 61 | 62 | Install the ``dtool-irods`` package using ``pip``. 63 | 64 | .. code-block:: none 65 | 66 | $ pip install dtool-irods 67 | 68 | .. warning:: In order to be able to use the iRODS backend storage 69 | you will need to install the iCommands. Linux packages 70 | can be downloaded from `irods.org/download 71 | <https://irods.org/download/>`_. On Mac OSX these can 72 | be installed using the brew package manager:: 73 | 74 | $ brew install irods 75 | 76 | For more details see the `dtool-irods 77 | <https://github.com/jic-dtool/dtool-irods>`_ GitHub repository. 78 | -------------------------------------------------------------------------------- /docs/source/annotating_datasets.rst: -------------------------------------------------------------------------------- 1 | Annotating datasets 2 | =================== 3 | 4 | It is possible to annotate a dataset with so-called key/value pairs. Such 5 | key/value annotations are intended to make it easy to add and access specific 6 | metadata at a per dataset level. 7 | 8 | The difference between annotations and the descriptive metadata is that the 9 | former is easier to work with in a programmatic fashion. The descriptive 10 | metadata, stored in the dataset's README content, is more free form. It is 11 | non-trivial to access specific pieces of information from the descriptive 12 | metadata in the dataset's README content, whereas a dtool annotation can be 13 | easily accessed by its name (key). 14 | 15 | To create an annotation using the dtool CLI one would use the ``dtool annotation 16 | set`` command. For example, to annotate a dataset with a "project" one would use 17 | the command:: 18 | 19 | $ dtool annotation set project world-peace 20 | 21 | To access the "project" annotation one would use the ``dtool annotation get`` command:: 22 | 23 | $ dtool annotation get project 24 | world-peace 25 | 26 | Annotations set using ``dtool annotation set`` are strings by default. It is possible 27 | to set the type to ``int``, ``float``, and ``bool`` using the ``--type`` option. For 28 | example, to annotate a dataset with a "stars" rating one could use the command:: 29 | 30 | $ dtool annotation set --type int stars 3 31 | 32 | For more complex data structures one can set the type to ``json``. For example:: 33 | 34 | $ dtool annotation set --type json params '{"x": 3.4, "y": 5.6}' 35 | 36 | It is possible to list all the annotations of a dataset:: 37 | 38 | $ dtool annotation ls 39 | params {"x": 3.4, "y": 5.6} 40 | project world-peace 41 | stars 3 42 | 43 | To update an annotation one can use the ``dtool annotation set`` command again. 44 | For example, to show that a dataset is really fantastic one could increase its 45 | star rating to 5:: 46 | 47 | $ dtool annotation set stars 5 --type int 48 | $ dtool annotation get stars 49 | 5 50 | 51 | .. warning:: 52 | There are restrictions on the characters and the length of the keys. They have to 53 | match the regular expression ``^[a-zA-Z.-_]*$`` and be 80 characters or less. 
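Annotations can also be read and written from Python. The sketch below assumes the annotation methods exposed by the ``dtoolcore`` Python API (``put_annotation``, ``get_annotation`` and ``list_annotation_names``); the dataset URI is illustrative.

.. code-block:: python

    from dtoolcore import DataSet

    dataset = DataSet.from_uri("file:///Users/olssont/my_dataset")
    dataset.put_annotation("stars", 5)        # equivalent to: dtool annotation set
    print(dataset.get_annotation("stars"))    # 5
    print(dataset.list_annotation_names())    # equivalent to: dtool annotation ls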
-------------------------------------------------------------------------------- /docs/source/configuring_a_custom_readme_template.rst: -------------------------------------------------------------------------------- 1 | Configuring a custom README template 2 | ==================================== 3 | 4 | When running the ``dtool readme interactive`` command one is prompted to enter 5 | the default descriptive metadata shown below. 6 | 7 | :: 8 | 9 | $ dtool readme interactive my_dataset 10 | description [Dataset description]: 11 | project [Project name]: 12 | confidential [False]: 13 | personally_identifiable_information [False]: 14 | name [Your Name]: 15 | email [you@example.com]: 16 | username [olssont]: 17 | creation_date [2017-12-14]: 18 | 19 | It is possible to configure the required metadata prompted for by the 20 | ``dtool readme interactive`` command. This requires the creation of a 21 | README template file making use of the YAML file format. 22 | 23 | The default template is shown below. 24 | 25 | .. code-block:: yaml 26 | 27 | --- 28 | description: Dataset description 29 | project: Project name 30 | confidential: False 31 | personally_identifiable_information: False 32 | owners: 33 | - name: {DTOOL_USER_FULL_NAME} 34 | email: {DTOOL_USER_EMAIL} 35 | username: {username} 36 | creation_date: {date} 37 | # links: 38 | # - http://dx.doi.org/your_doi 39 | # - http://github.com/your_code_repository 40 | # budget_codes: 41 | # - E.g. CCBS1H10S 42 | 43 | To create a custom template that also prompts for a species definition one 44 | could create the file ``~/custom_dtool_readme.yml`` with the content below. 45 | 46 | .. code-block:: yaml 47 | 48 | --- 49 | description: Dataset description 50 | project: Project name 51 | species: A. thaliana 52 | confidential: False 53 | personally_identifiable_information: False 54 | owners: 55 | - name: {DTOOL_USER_FULL_NAME} 56 | email: {DTOOL_USER_EMAIL} 57 | username: {username} 58 | creation_date: {date} 59 | 60 | To configure dtool to make use of this template one can use the ``dtool config readme-template`` command:: 61 | 62 | $ dtool config readme-template ~/custom_dtool_readme.yml 63 | 64 | The ``dtool config readme-template`` command sets the 65 | ``DTOOL_README_TEMPLATE_FPATH`` key in the ``~/.config/dtool/dtool.json`` file. 66 | Alternatively one can make use of the ``DTOOL_README_TEMPLATE_FPATH`` 67 | environment variable:: 68 | 69 | $ export DTOOL_README_TEMPLATE_FPATH=~/custom_dtool_readme.yml 70 | 71 | Re-running the previous ``dtool readme interactive`` command now includes a prompt for the species and the default value ``A. thaliana``:: 72 | 73 | $ dtool readme interactive my_dataset 74 | description [Dataset description]: 75 | project [Project name]: 76 | species [A. thaliana]: 77 | confidential [False]: 78 | personally_identifiable_information [False]: 79 | name [Your Name]: 80 | email [you@example.com]: 81 | username [olssont]: 82 | creation_date [2017-12-14]: 83 | 84 | 85 | -------------------------------------------------------------------------------- /docs/source/philosophy.rst: -------------------------------------------------------------------------------- 1 | Philosophy - what is dtool? 2 | =========================== 3 | 4 | What problem is dtool solving? 5 | ------------------------------ 6 | 7 | Managing data as a collection of individual files is hard. Analysing that data 8 | will require that certain sets of files are present, understanding it requires 9 | suitable metadata, and copying or moving it while keeping its integrity is 10 | difficult. 11 | 12 | dtool solves this problem by packaging a collection of files and accompanying 13 | metadata into a self-contained and unified whole: a dataset. 14 | 15 | When metadata is kept separate from the data, for example in an Excel spreadsheet 16 | with links to the data files, it becomes difficult to reorganise the data 17 | without fear of breaking the links between the data and the metadata. By 18 | encapsulating both the data files and associated metadata in a dataset one is 19 | free to move the dataset around at will. The high level organisation of 20 | datasets can therefore evolve over time as data management processes change. 21 | 22 | dtool also solves an issue of trust. By including file hashes as metadata 23 | it is possible to verify the integrity of a dataset after it has been moved to 24 | a new location or when coming back to a dataset after a period of time. 25 | 26 | It is possible to discover and access both metadata and data files in a 27 | dataset. It is therefore easy to create scripts and pipelines to process the 28 | items, or a subset of items, in a dataset. 29 | 30 | 31 | What is a "dtool dataset"? 32 | -------------------------- 33 | 34 | Briefly, a dtool dataset consists of: 35 | 36 | - The files added to the dataset, known as the dataset "items" 37 | - Metadata used to describe the dataset as a whole 38 | - Metadata describing the items in the dataset 39 | 40 | The exact details of how this data and metadata are stored depend on the 41 | "backend" (the type of storage used). In other words a dataset is stored 42 | differently on a local file system than it is in an Amazon S3 object 43 | store. However, the ``dtool`` commands and the Python API for interacting with 44 | datasets are the same for all backends. 45 | 
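The Python sketch below illustrates this point (the URIs are illustrative, and the ``item_properties`` method is assumed to be the programmatic counterpart of the ``dtool item properties`` command shown later in this documentation).

.. code-block:: python

    from dtoolcore import DataSet

    # A dataset on local disk and a dataset in S3 object storage are
    # loaded and queried with exactly the same calls.
    for uri in (
        "file:///Users/olssont/my_dataset",
        "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db",
    ):
        dataset = DataSet.from_uri(uri)
        for identifier in dataset.identifiers:
            print(dataset.item_properties(identifier)["relpath"])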
46 | 47 | What does a dtool dataset look like on local disk? 48 | -------------------------------------------------- 49 | 50 | Below is the structure of a fictional dataset containing three items from an 51 | RNA sequencing experiment. 52 | 53 | .. code-block:: none 54 | 55 | $ tree ~/my_dataset 56 | /Users/olssont/my_dataset 57 | ├── README.yml 58 | └── data 59 | ├── rna_seq_reads_1.fq.gz 60 | ├── rna_seq_reads_2.fq.gz 61 | └── rna_seq_reads_3.fq.gz 62 | 63 | The ``README.yml`` file is where metadata used to describe the whole dataset is 64 | stored. The items of the dataset are stored in the directory named ``data``. 65 | 66 | There is also hidden metadata, stored as plain text files, in a directory named 67 | ``.dtool``. This should not be edited directly by the user. 68 | 69 | .. image:: images/dataset_structure.png 70 | 71 | 72 | How does one create a dtool dataset? 73 | ------------------------------------ 74 | 75 | This happens in stages: 76 | 77 | 1. One creates a so-called "proto dataset" 78 | 2. One adds data and metadata to this proto dataset 79 | 3. One converts the proto dataset into a dataset by "freezing" it 80 | 81 | Once a proto dataset is "frozen" it is simply referred to as a dataset and it 82 | is no longer possible to modify the data in it. In other words it is not 83 | possible to add or remove items from a dataset or to alter any of the items in 84 | a dataset. 85 | 86 | The process can be likened to creating an open box (the proto dataset), putting 87 | items (data) into it, sticking a label (metadata) on it, and closing the box 88 | (freezing the dataset). 89 | 90 | .. image:: images/package_data_and_metadata_into_beautiful_box.png 91 | 92 | 93 | Give me more details! 94 | --------------------- 95 | 96 | An in-depth discussion of dtool can be found in the paper 97 | `Lightweight data management with dtool <https://doi.org/10.7717/peerj.6562>`_. 98 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | dtool: Manage Scientific Data 2 | ============================= 3 | 4 | .. |dtool| image:: https://github.com/jic-dtool/dtool/blob/master/icons/22x22/dtool_logo.png?raw=True 5 | :height: 20px 6 | :target: https://github.com/jic-dtool/dtool 7 | 8 | .. |pypi| image:: https://badge.fury.io/py/dtool.svg 9 | :target: http://badge.fury.io/py/dtool 10 | :alt: PyPi package 11 | 12 | .. |test| image:: https://img.shields.io/github/actions/workflow/status/jic-dtool/dtool/test.yml?branch=master&label=tests 13 | :target: https://github.com/jic-dtool/dtool/actions/workflows/test.yml 14 | 15 | .. |docs| image:: https://readthedocs.org/projects/dtool/badge/?version=latest 16 | :target: https://readthedocs.org/projects/dtool?badge=latest 17 | :alt: Documentation Status 18 | 19 | |dtool| |pypi| |test| |docs| 20 | 21 | *Make your data more resilient, portable and easy to work with by packaging 22 | files & metadata into self-contained datasets.* 23 | 24 | - Documentation: http://dtool.readthedocs.io 25 | - Paper: https://doi.org/10.7717/peerj.6562 26 | - Free software: MIT License 27 | 28 | Overview 29 | -------- 30 | 31 | dtool is a suite of software for managing scientific data and making it 32 | accessible programmatically. It consists of a command line interface ``dtool`` 33 | and a Python API: `dtoolcore <https://github.com/jic-dtool/dtoolcore>`_. 
34 | 35 | The ``dtool`` command line interface allows one to organise files into datasets 36 | and to move datasets between different storage solutions, for example from 37 | local disk to remote object storage. Importantly, it also provides methods to 38 | verify that the transfer has been successful. 39 | 40 | The Python API gives complete access to the data and metadata in a dataset. It 41 | makes it easy to create scripts for processing the items, or a subset of items, 42 | in a dataset. The Python API also allows datasets to be constructed 43 | programmatically. 44 | 45 | dtool is extensible, meaning that it is possible to create plugins both for 46 | adding functionality to the command line interface and for creating interfaces 47 | to custom storage backends. 48 | 49 | The ``dtool`` Python package is a meta package that installs the packages: 50 | 51 | - `dtoolcore <https://github.com/jic-dtool/dtoolcore>`_ - core API 52 | - `dtool-cli <https://github.com/jic-dtool/dtool-cli>`_ - CLI plugin scaffold 53 | - `dtool-annotation <https://github.com/jic-dtool/dtool-annotation>`_ - CLI commands for working with dataset annotations 54 | - `dtool-config <https://github.com/jic-dtool/dtool-config>`_ - CLI commands for configuring dtool 55 | - `dtool-create <https://github.com/jic-dtool/dtool-create>`_ - CLI commands for creating datasets 56 | - `dtool-info <https://github.com/jic-dtool/dtool-info>`_ - CLI commands for getting information about datasets 57 | - `dtool-overlay <https://github.com/jic-dtool/dtool-overlay>`_ - CLI commands for working with per item metadata stored as overlays 58 | - `dtool-symlink <https://github.com/jic-dtool/dtool-symlink>`_ - storage broker interface allowing symlinking to data 59 | - `dtool-http <https://github.com/jic-dtool/dtool-http>`_ - storage broker interface allowing read only access to datasets over HTTP 60 | - `dtool-tag <https://github.com/jic-dtool/dtool-tag>`_ - CLI commands for tagging datasets 61 | 62 | Installation:: 63 | 64 | $ pip install dtool 65 | 66 | There are support packages for several object storage solutions: 67 | 68 | - `dtool-s3 <https://github.com/jic-dtool/dtool-s3>`_ - storage broker interface to S3 object storage 69 | - `dtool-smb <https://github.com/jic-dtool/dtool-smb>`_ - storage broker interface to smb network share 70 | - `dtool-azure <https://github.com/jic-dtool/dtool-azure>`_ - storage broker interface to Azure Storage 71 | - `dtool-ecs <https://github.com/jic-dtool/dtool-ecs>`_ - storage broker interface to ECS S3 object storage 72 | - `dtool-irods <https://github.com/jic-dtool/dtool-irods>`_ - storage broker interface to iRODS 73 | 74 | If you have access to Amazon S3, Microsoft Azure, ECS S3 or iRODS storage you may also want to install support for these:: 75 | 76 | 77 | $ pip install dtool-s3 dtool-azure dtool-ecs dtool-irods 78 | 79 | Usage:: 80 | 81 | $ dtool create my-awesome-dataset 82 | Created proto dataset file:///Users/olssont/my-awesome-dataset 83 | Next steps: 84 | 1. Add raw data, eg: 85 | dtool add item my_file.txt file:///Users/olssont/my-awesome-dataset 86 | Or use your system commands, e.g: 87 | mv my_data_directory /Users/olssont/my-awesome-dataset/data/ 88 | 2. Add descriptive metadata, e.g: 89 | dtool readme interactive file:///Users/olssont/my-awesome-dataset 90 | 3. Convert the proto dataset into a dataset: 91 | dtool freeze file:///Users/olssont/my-awesome-dataset 92 | -------------------------------------------------------------------------------- /docs/source/working_with_overlays.rst: -------------------------------------------------------------------------------- 1 | Working with overlays 2 | ===================== 3 | 4 | Overlays provide a means to store and access per item metadata. 5 | 6 | Displaying a table with all per item metadata 7 | --------------------------------------------- 8 | 9 | It is possible to display all the per item metadata as a CSV table using the 10 | command ``dtool overlays show``. 11 | 12 | .. 
code-block:: none 13 | 14 | $ dtool overlays show http://bit.ly/Ecoli-reads-minified 15 | identifiers,pair_id,is_read1,useful_name,relpaths 16 | 8bda245a8cd526673aab775f90206c8b67d196af,9760280dc6313d3bb598fa03c5931a7f037d7ffc,False,ERR022075,ERR022075_2.fastq.gz 17 | 9760280dc6313d3bb598fa03c5931a7f037d7ffc,8bda245a8cd526673aab775f90206c8b67d196af,True,ERR022075,ERR022075_1.fastq.gz 18 | 19 | The dataset above has three overlays named: ``pair_id``, ``is_read1``, and 20 | ``useful_name``. The columns named ``identifiers`` and ``relpaths`` are 21 | reported for bookkeeping purposes. 22 | 23 | 24 | Accessing an overlay value of a specific dataset item 25 | ------------------------------------------------------ 26 | 27 | It is possible to access the value stored in an overlay for a specific 28 | item using the command ``dtool item overlay``. 29 | 30 | .. code-block:: none 31 | 32 | $ dtool item overlay \ 33 | is_read1 \ 34 | http://bit.ly/Ecoli-reads-minified \ 35 | 9760280dc6313d3bb598fa03c5931a7f037d7ffc 36 | True 37 | 
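Overlays can also be read programmatically. The sketch below assumes ``dtoolcore``'s ``get_overlay`` method, which returns a dictionary mapping item identifiers to overlay values; it mirrors the ``dtool item overlay`` command above.

.. code-block:: python

    from dtoolcore import DataSet

    dataset = DataSet.from_uri("http://bit.ly/Ecoli-reads-minified")
    is_read1 = dataset.get_overlay("is_read1")
    print(is_read1["9760280dc6313d3bb598fa03c5931a7f037d7ffc"])  # True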
38 | 39 | Creating overlays 40 | ----------------- 41 | 42 | Overlay creation happens in two steps. 43 | 44 | 1. Create a template overlay CSV file using the format above 45 | 2. Use the template to write all overlays in the template to the dataset 46 | 47 | Creating overlay templates 48 | ^^^^^^^^^^^^^^^^^^^^^^^^^^ 49 | 50 | A starting template can be created using the ``dtool overlays show`` command. 51 | For a dataset with no overlays this will result in a table with the columns 52 | ``identifiers`` and ``relpaths``. The table will have one row for each item in 53 | the dataset. One can then add columns for the overlays one would wish to 54 | create. 55 | 56 | However, in many cases one would want to use metadata in the items' relpaths to 57 | generate a starting CSV template. This can be achieved using the commands: 58 | 59 | - ``dtool overlays template parse`` 60 | - ``dtool overlays template glob`` 61 | - ``dtool overlays template pairs`` 62 | 63 | Consider for example the dataset below:: 64 | 65 | $ dtool ls http://bit.ly/Ecoli-reads-minified 66 | 8bda245a8cd526673aab775f90206c8b67d196af ERR022075_2.fastq.gz 67 | 9760280dc6313d3bb598fa03c5931a7f037d7ffc ERR022075_1.fastq.gz 68 | 69 | The command below could be used to generate a template for the overlays 70 | "useful_name" and "read":: 71 | 72 | $ dtool overlays template parse \ 73 | http://bit.ly/Ecoli-reads-minified \ 74 | '{useful_name}_{read:d}.fastq.gz' 75 | 76 | This results in the CSV output below:: 77 | 78 | identifiers,read,useful_name,relpaths 79 | 8bda245a8cd526673aab775f90206c8b67d196af,2,ERR022075,ERR022075_2.fastq.gz 80 | 9760280dc6313d3bb598fa03c5931a7f037d7ffc,1,ERR022075,ERR022075_1.fastq.gz 81 | 82 | To ignore a variable element when parsing one can use unnamed curly braces. The 83 | command below for example only generates the overlay "useful_name":: 84 | 85 | $ dtool overlays template parse \ 86 | http://bit.ly/Ecoli-reads-minified \ 87 | '{useful_name}_{:d}.fastq.gz' 88 | identifiers,useful_name,relpaths 89 | 8bda245a8cd526673aab775f90206c8b67d196af,ERR022075,ERR022075_2.fastq.gz 90 | 9760280dc6313d3bb598fa03c5931a7f037d7ffc,ERR022075,ERR022075_1.fastq.gz 91 | 92 | 93 | Sometimes one simply wants to create a boolean overlay based on whether or not 94 | a particular file matches a glob pattern. The command below can be used to 95 | create a CSV template for an overlay named ``is_read1``:: 96 | 97 | 98 | $ dtool overlays template glob \ 99 | http://bit.ly/Ecoli-reads-minified \ 100 | is_read1 \ 101 | '*1.fastq.gz' 102 | identifiers,is_read1,relpaths 103 | 8bda245a8cd526673aab775f90206c8b67d196af,False,ERR022075_2.fastq.gz 104 | 9760280dc6313d3bb598fa03c5931a7f037d7ffc,True,ERR022075_1.fastq.gz 105 | 106 | Sometimes it is useful to be able to find pairs of items, for example when 107 | dealing with genomic sequencing data that has forward and reverse reads. 108 | 109 | One can create a "pair_id" overlay CSV template for this dataset using the 110 | command below:: 111 | 112 | $ dtool overlays template pairs http://bit.ly/Ecoli-reads-minified .fastq.gz 113 | identifiers,pair_id,relpaths 114 | 8bda245a8cd526673aab775f90206c8b67d196af,9760280dc6313d3bb598fa03c5931a7f037d7ffc,ERR022075_2.fastq.gz 115 | 9760280dc6313d3bb598fa03c5931a7f037d7ffc,8bda245a8cd526673aab775f90206c8b67d196af,ERR022075_1.fastq.gz 116 | 117 | In the above the suffix ".fastq.gz" is used to extract the prefix ``ERR022075_`` 118 | that is used to find matching pairs. 119 | 120 | 121 | Writing an overlay template to a dataset 122 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 123 | 124 | Once one has an overlay template CSV file one can write this to a dataset:: 125 | 126 | $ dtool overlays write overlays.csv 127 | 128 | 129 | Further reading 130 | --------------- 131 | 132 | For more information see the documentation at https://github.com/jic-dtool/dtool-overlay 133 | -------------------------------------------------------------------------------- /docs/source/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # This file is execfile()d with the current directory set to its 4 | # containing dir. 5 | # 6 | # Note that not all possible configuration values are present in this 7 | # autogenerated file. 8 | # 9 | # All configuration values have a default; values that are commented out 10 | # serve to show the default. 11 | 12 | # If extensions (or modules to document with autodoc) are in another directory, 13 | # add these directories to sys.path here. If the directory is relative to the 14 | # documentation root, use os.path.abspath to make it absolute, like shown here. 15 | # 16 | import os 17 | # import sys 18 | # sys.path.insert(0, os.path.abspath('.')) 19 | 20 | 21 | # -- General configuration ------------------------------------------------ 22 | 23 | # If your documentation needs a minimal Sphinx version, state it here. 24 | # 25 | # needs_sphinx = '1.0' 26 | 27 | # Add any Sphinx extension module names here, as strings. They can be 28 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 29 | # ones. 30 | extensions = [ 31 | 'sphinx.ext.autodoc', 32 | 'sphinx.ext.doctest', 33 | 'sphinx.ext.viewcode'] 34 | 35 | # Add any paths that contain templates here, relative to this directory. 36 | templates_path = ['_templates'] 37 | 38 | # The suffix(es) of source filenames. 39 | # You can specify multiple suffix as a list of string: 40 | # 41 | # source_suffix = ['.rst', '.md'] 42 | source_suffix = '.rst' 43 | 44 | # The master toctree document. 45 | master_doc = 'index' 46 | 47 | # General information about the project. 
48 | project = u"dtool" 49 | copyright = u"2017, Tjelvar Olsson" 50 | author = u"Tjelvar Olsson" 51 | repo_name = u"dtool" 52 | 53 | # The version info for the project you're documenting, acts as replacement for 54 | # |version| and |release|, also used in various other places throughout the 55 | # built documents. 56 | # 57 | # The short X.Y version. 58 | version = u"3.26.2" 59 | # The full version, including alpha/beta/rc tags. 60 | release = version 61 | 62 | # The language for content autogenerated by Sphinx. Refer to documentation 63 | # for a list of supported languages. 64 | # 65 | # This is also used if you do content translation via gettext catalogs. 66 | # Usually you set "language" from the command line for these cases. 67 | language = None 68 | 69 | # List of patterns, relative to source directory, that match files and 70 | # directories to ignore when looking for source files. 71 | # This patterns also effect to html_static_path and html_extra_path 72 | exclude_patterns = [] 73 | 74 | # The name of the Pygments (syntax highlighting) style to use. 75 | pygments_style = 'sphinx' 76 | 77 | # If true, `todo` and `todoList` produce output, else they produce nothing. 78 | todo_include_todos = False 79 | 80 | 81 | # -- Options for HTML output ---------------------------------------------- 82 | 83 | # The theme to use for HTML and HTML Help pages. See the documentation for 84 | # a list of builtin themes. 85 | # 86 | html_theme = 'default' 87 | 88 | # Set the readthedocs theme. 89 | on_rtd = os.environ.get('READTHEDOCS', None) == 'True' 90 | 91 | if not on_rtd: # only import and set the theme if we're building docs locally 92 | print('using readthedocs theme...') 93 | import sphinx_rtd_theme 94 | html_theme = 'sphinx_rtd_theme' 95 | html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] 96 | # otherwise, readthedocs.org uses their theme by default, so no need to specify 97 | # it 98 | 99 | # Theme options are theme-specific and customize the look and feel of a theme 100 | # further. For a list of options available for each theme, see the 101 | # documentation. 102 | # 103 | # html_theme_options = {} 104 | 105 | # Add any paths that contain custom static files (such as style sheets) here, 106 | # relative to this directory. They are copied after the builtin static files, 107 | # so a file named "default.css" will overwrite the builtin "default.css". 108 | html_static_path = ['_static'] 109 | 110 | 111 | # -- Options for HTMLHelp output ------------------------------------------ 112 | 113 | # Output file base name for HTML help builder. 114 | htmlhelp_basename = '{}doc'.format(repo_name) 115 | 116 | 117 | # -- Options for LaTeX output --------------------------------------------- 118 | 119 | latex_elements = { 120 | # The paper size ('letterpaper' or 'a4paper'). 121 | # 122 | # 'papersize': 'letterpaper', 123 | 124 | # The font size ('10pt', '11pt' or '12pt'). 125 | # 126 | # 'pointsize': '10pt', 127 | 128 | # Additional stuff for the LaTeX preamble. 129 | # 130 | # 'preamble': '', 131 | 132 | # Latex figure (float) alignment 133 | # 134 | # 'figure_align': 'htbp', 135 | } 136 | 137 | # Grouping the document tree into LaTeX files. List of tuples 138 | # (source start file, target name, title, 139 | # author, documentclass [howto, manual, or own class]). 
140 | latex_documents = [ 141 | (master_doc, '{}.tex'.format(repo_name), 142 | u'{} Documentation'.format(repo_name), 143 | author, 'manual'), 144 | ] 145 | 146 | 147 | # -- Options for manual page output --------------------------------------- 148 | 149 | # One entry per manual page. List of tuples 150 | # (source start file, name, description, authors, manual section). 151 | man_pages = [ 152 | (master_doc, repo_name, u'{} Documentation'.format(repo_name), 153 | [author], 1) 154 | ] 155 | 156 | 157 | # -- Options for Texinfo output ------------------------------------------- 158 | 159 | # Grouping the document tree into Texinfo files. List of tuples 160 | # (source start file, target name, title, author, 161 | # dir menu entry, description, category) 162 | texinfo_documents = [ 163 | (master_doc, repo_name, u'{} Documentation'.format(repo_name), 164 | author, repo_name, u'Manage scientific data', 165 | 'Miscellaneous'), 166 | ] 167 | -------------------------------------------------------------------------------- /docs/source/quick_start_guide.rst: -------------------------------------------------------------------------------- 1 | Quick start guide 2 | ================= 3 | 4 | This quick start guide shows how the ``dtool`` command line tool can be used to 5 | accomplish some common data management tasks. 6 | 7 | Organising files into a dataset on local disk 8 | --------------------------------------------- 9 | 10 | In this scenario one simply wants to organise one or more files into a dataset 11 | in the file system on the local computer. 12 | 13 | When working on local disk a dataset is simply a standardised directory layout 14 | combined with some hidden files used to annotate the dataset and its items. 15 | 16 | The first step is to create a "proto" dataset. The command below creates a 17 | dataset named ``fishers-iris-data`` in the current working directory. 18 | 19 | .. code-block:: none 20 | 21 | $ dtool create fishers-iris-data 22 | 23 | One can now add files to the dataset by moving/copying them to the 24 | ``fishers-iris-data/data`` directory, or by using the built in ``dtool add 25 | item`` command. In the example below the file ``iris.csv`` is added to the 26 | proto dataset. 27 | 28 | .. code-block:: none 29 | 30 | $ touch iris.csv 31 | $ dtool add item iris.csv fishers-iris-data 32 | 33 | Metadata describing the data is as important as the data itself. Metadata 34 | describing the dataset is stored in the file ``fishers-iris-data/README.yml``. 35 | An easy way to add content to this file is to use the ``dtool readme 36 | interactive`` command, which will prompt for input regarding the dataset. 37 | 38 | .. code-block:: none 39 | 40 | $ dtool readme interactive fishers-iris-data 41 | description [Dataset description]: Fisher's classic iris data, but with an empty file :( 42 | project [Project name]: dtool demo 43 | confidential [False]: 44 | personally_identifiable_information [False]: 45 | name [Your Name]: Tjelvar Olsson 46 | email [olssont@nbi.ac.uk]: 47 | username [olssont]: 48 | creation_date [2017-10-06]: 49 | Updated readme 50 | To edit the readme using your default editor: 51 | dtool readme edit fishers-iris-data 52 | 53 | Finally, to convert the proto dataset into a dataset one uses the ``dtool 54 | freeze`` command. 55 | 56 | .. 
code-block:: none 57 | 58 | $ dtool freeze fishers-iris-data 59 | Generating manifest [####################################] 100% iris.csv 60 | Dataset frozen fishers-iris-data 61 | 62 | 63 | Copying data from an external hard drive to remote storage as a dataset 64 | ----------------------------------------------------------------------- 65 | 66 | Genome sequencing generates large volumes of data, which are often sent from 67 | the sequencing company to the user by posting an external hard drive. When 68 | backing up such data on a remote storage system one does not want to have to 69 | reorganise the data before copying it to the remote storage system. 70 | 71 | In this case one can create a "symlink" dataset and copy that to the remote 72 | storage. A symlink dataset is a dataset where the data directory is a symlink 73 | to another location, for example the data directory on the external hard drive. 74 | 75 | .. code-block:: none 76 | 77 | $ dtool create bgi-sequencing-12345 --symlink-path /mnt/external-hard-drive 78 | 79 | Again, adding metadata to the dataset is vital. 80 | 81 | .. code-block:: none 82 | 83 | $ dtool readme interactive bgi-sequencing-12345 84 | 85 | One can then convert the proto dataset into a dataset by "freezing" it. 86 | 87 | .. code-block:: none 88 | 89 | $ dtool freeze bgi-sequencing-12345 90 | 91 | It is now time to copy the dataset to the remote storage. The command below 92 | assumes that one has credentials set up to write to the Amazon S3 bucket 93 | ``dtool-demo``. The command copies the local dataset to the S3 ``dtool-demo`` 94 | bucket. 95 | 96 | .. code-block:: none 97 | 98 | $ dtool cp bgi-sequencing-12345 s3://dtool-demo/ 99 | 100 | The command above returns feedback on the URI used to identify the dataset in 101 | the remote storage. In this case 102 | ``s3://dtool-demo/1e47c076-2eb0-43b2-b219-fc7d419f1f16``. 103 | 104 | The URI used to identify the dataset uses the UUID of the dataset rather than 105 | the dataset's name. This is to avoid name clashes in the object storage. 106 | 107 | Finally, one may want to confirm that the data transfer was successful. This 108 | can be achieved using the ``dtool diff`` command, which should show no 109 | differences if the transfer was successful. 110 | 111 | .. code-block:: none 112 | 113 | $ dtool diff bgi-sequencing-12345 s3://dtool-demo/1e47c076-2eb0-43b2-b219-fc7d419f1f16 114 | 115 | By default only identifiers and file sizes are compared. To check file hashes 116 | make use of the ``--full`` option. 117 | 118 | .. warning:: When comparing datasets, identifiers, sizes and hashes are 119 | compared. When checking that the hashes are identical the hashes 120 | for the first dataset are recalculated using the hashing algorithm 121 | of the reference dataset (the second). If the dataset in S3 had 122 | been specified as the first argument then all the files would have 123 | had to be downloaded to the local disk before calculating 124 | their hashes, which would have made the command slower. 125 | 126 | 127 | Copying a dataset from remote storage to local disk 128 | --------------------------------------------------- 129 | 130 | After having copied a dataset to a remote storage system one may have deleted 131 | the copy on the local disk. In this case one may want to be able to get the 132 | dataset back onto the local disk. 133 | 134 | 135 | This can be achieved using the ``dtool cp`` command. The command below copies 136 | the dataset in S3 to the current working directory. 137 | 138 | .. 
Note that on the local disk the dataset will use the name of the dataset rather
than the UUID, in this example ``bgi-sequencing-12345``.

Again, one can verify the data transfer using the ``dtool diff`` command.

.. code-block:: none

    $ dtool diff bgi-sequencing-12345 s3://dtool-demo/1e47c076-2eb0-43b2-b219-fc7d419f1f16
--------------------------------------------------------------------------------
/docs/source/working_with_datasets.rst:
--------------------------------------------------------------------------------
Working with datasets
=====================

Listing datasets
----------------

It is possible to list all datasets in a directory or in an S3 bucket
using the ``dtool ls`` command.

.. code-block:: none

    $ dtool ls ~/my_datasets
    bgi-sequencing-12345
    file:///Users/olssont/my_datasets/bgi-sequencing-12345
    drone-images
    file:///Users/olssont/my_datasets/drone-images
    fishers-iris-data
    file:///Users/olssont/my_datasets/fishers-iris-data
    my_rnaseq_data
    file:///Users/olssont/my_datasets/my_rnaseq_data

.. tip:: When using this command proto datasets are highlighted in red.

.. tip:: The ``dtool ls`` command takes a URI. As such it can be used to list
   the datasets in remote storage locations. The example below lists all
   the datasets in the S3 bucket named ``dtool-demo``::

       $ dtool ls s3://dtool-demo/


Generating an inventory of datasets
-----------------------------------

It is possible to generate CSV/TSV/HTML inventories of datasets in a directory
or in another base URI such as an Amazon S3 bucket. For example, the command
below is used to generate an HTML report of all the datasets in the
``s3://dtool-demo/`` bucket.

.. code-block:: none

    $ dtool inventory --format html s3://dtool-demo/ > inventory.html


Verifying a dataset has not been modified since freezing it
-----------------------------------------------------------

A dtool dataset has metadata listing its items and their hashes. This
information can be used to verify that a dataset is in the same state as it was
when it was frozen.

In the example below the dataset has been corrupted in three ways.

1. The file ``rna_seq_reads_4.fq.gz`` has been added to it
2. The file ``rna_seq_reads_3.fq.gz`` has been deleted from it
3. The content of the file ``rna_seq_reads_1.fq.gz`` has been modified

.. code-block:: none

    $ dtool verify ~/my_datasets/my_rnaseq_data
    Unknown item: 49919bdae83011b96bf54d984735e24c4419feb5 rna_seq_reads_4.fq.gz
    Missing item: 72b24007759c0086a316d13838021c2571853a16 rna_seq_reads_3.fq.gz

By default only identifiers and file sizes are compared. To check file hashes
make use of the ``--full`` option.

.. code-block:: none

    $ dtool verify --full ~/my_datasets/my_rnaseq_data
    Unknown item: 49919bdae83011b96bf54d984735e24c4419feb5 rna_seq_reads_4.fq.gz
    Missing item: 72b24007759c0086a316d13838021c2571853a16 rna_seq_reads_3.fq.gz
    Altered item: d4e065787eab480e9cbd2bac6988bc7717464c83 rna_seq_reads_1.fq.gz


Displaying the README descriptive metadata
------------------------------------------

To display the README metadata used to describe the dataset one can make use of
the ``dtool readme show`` command.

.. code-block:: none

    $ dtool readme show ~/my_datasets/chrX-rna-seq
    ---
    description: RNA-seq sample data
    creation_date: 2017-11-20
    ftp: "ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/"
    doi: "10.1038/nprot.2016.095"

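The README content can also be read programmatically. A short sketch,
assuming ``DataSet.from_uri`` and ``get_readme_content`` from the
``dtoolcore`` package behave as their names suggest; the URI is illustrative
and PyYAML is assumed to be available for parsing the returned YAML string.

.. code-block:: python

    # Sketch: load a dataset and parse its descriptive metadata.
    import yaml  # PyYAML, assumed available
    from dtoolcore import DataSet

    dataset = DataSet.from_uri("file:///Users/olssont/my_datasets/chrX-rna-seq")
    metadata = yaml.safe_load(dataset.get_readme_content())
    print(metadata["description"])
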

Reporting summary information about a dataset
---------------------------------------------

One often wants to find out how many items are in a dataset and what their
total size is. This can be achieved using the ``dtool summary`` command.

.. code-block:: none

    $ dtool summary ~/my_datasets/drone-images
    name: drone-images
    uuid: c2542c2b-d149-4f73-84bc-741bf9af918f
    creator_username: hartleym
    number_of_items: 59
    size: 152.5MiB
    frozen_at: 2017-09-19


Listing the item identifiers in a dataset
-----------------------------------------

To list all the item identifiers in a dataset one can use the ``dtool
identifiers`` command.

.. code-block:: none

    $ dtool identifiers ~/my_datasets/my_rnaseq_data
    b0f92a668d24a3015692b0869e2b7590a62a380c
    72b24007759c0086a316d13838021c2571853a16
    d4e065787eab480e9cbd2bac6988bc7717464c83


.. tip:: Using ``dtool ls`` on a dataset URI results in a list of item
   identifiers and relpaths::

       $ dtool ls ~/my_datasets/my_rnaseq_data
       b0f92a668d24a3015692b0869e2b7590a62a380c - rna_seq_reads_2.fq.gz
       72b24007759c0086a316d13838021c2571853a16 - rna_seq_reads_3.fq.gz
       d4e065787eab480e9cbd2bac6988bc7717464c83 - rna_seq_reads_1.fq.gz


Finding out the size of an item in a dataset
--------------------------------------------

To find the size of a specific item in a dataset one can use the ``dtool item
properties`` command. The command below accesses the properties of the item
with the identifier ``58f50508c42a56919376132e36b693e9815dbd0c``.

.. code-block:: none

    $ dtool item properties ~/my_datasets/drone-images 58f50508c42a56919376132e36b693e9815dbd0c
    {
      "relpath": "IMG_8585.JPG",
      "size_in_bytes": 2716446,
      "utc_timestamp": 1505818439.0,
      "hash": "dbcb0d6f22ec660fa4ac33b3d74556f3"
    }

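The same properties can be accessed from Python. A brief sketch, assuming
``DataSet.item_properties`` returns the dictionary shown above:

.. code-block:: python

    # Sketch: look up one item's properties via the dtoolcore Python API.
    from dtoolcore import DataSet

    dataset = DataSet.from_uri("file:///Users/olssont/my_datasets/drone-images")
    props = dataset.item_properties("58f50508c42a56919376132e36b693e9815dbd0c")
    print(props["size_in_bytes"])  # e.g. 2716446
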

Accessing the content of an item in a dataset
---------------------------------------------

When all files are on local disk getting access to them is trivial. However,
when files are located in some object storage system in the cloud, access may
be less trivial.

dtool solves this problem by providing a method that returns an absolute path
on local disk, with the promise that the requested file will be available at
that path when the call returns.

The dtool command line interface makes this call available as the command
``dtool item fetch``.

Below is an example of this command being used on a local disk file storage.

.. code-block:: none

    $ dtool item fetch ~/my_datasets/drone-images 58f50508c42a56919376132e36b693e9815dbd0c
    /Users/olssont/my_datasets/drone-images/data/IMG_8585.JPG

Below is an example of this command being used on a dataset in the S3 bucket
``dtool-demo``.

.. code-block:: none

    $ dtool item fetch s3://dtool-demo/1e47c076-2eb0-43b2-b219-fc7d419f1f16 3dce23b901709a24cfbb974b70c1ef132af10a67
    /Users/olssont/.cache/dtool/s3/1e47c076-2eb0-43b2-b219-fc7d419f1f16/3dce23b901709a24cfbb974b70c1ef132af10a67.txt


Processing all the items in a dataset
-------------------------------------

By combining the use of ``dtool identifiers`` and ``dtool item fetch`` it is
possible to create basic Bash scripts to process all the items in a dataset.

.. code-block:: none

    $ DS_URI=~/my_datasets/my_rnaseq_data
    $ for ITEM_ID in `dtool identifiers $DS_URI`;
    > do ITEM_FPATH=`dtool item fetch $DS_URI $ITEM_ID`;
    > echo $ITEM_FPATH;
    > done
    /Users/olssont/my_datasets/my_rnaseq_data/data/rna_seq_reads_2.fq.gz
    /Users/olssont/my_datasets/my_rnaseq_data/data/rna_seq_reads_3.fq.gz
    /Users/olssont/my_datasets/my_rnaseq_data/data/rna_seq_reads_1.fq.gz

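The equivalent loop can be written in Python. The sketch below assumes that
``DataSet.item_content_abspath`` is the method backing ``dtool item fetch``,
i.e. that it fetches remote items into the local cache and returns a local
path.

.. code-block:: python

    # Sketch: process every item in a dataset via the dtoolcore Python API.
    from dtoolcore import DataSet

    dataset = DataSet.from_uri("file:///Users/olssont/my_datasets/my_rnaseq_data")
    for identifier in dataset.identifiers:
        fpath = dataset.item_content_abspath(identifier)
        print(fpath)
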
--------------------------------------------------------------------------------
/CHANGELOG.rst:
--------------------------------------------------------------------------------
CHANGELOG
=========

This project uses `semantic versioning <https://semver.org>`_.
This change log uses principles from `keep a changelog <https://keepachangelog.com>`_.

[Unreleased]
------------

Added
^^^^^


Changed
^^^^^^^


Deprecated
^^^^^^^^^^


Removed
^^^^^^^


Fixed
^^^^^


Security
^^^^^^^^

[3.27.0] - 2024-07-04
---------------------

Added
^^^^^


Changed
^^^^^^^

- Pinned ``dtoolcore`` version to 3.18.3, where copying tags has been fixed
- Embedded dtool icon in ``README.rst``
- Replaced ``setup.py`` with ``pyproject.toml``
- Dynamic versioning from scm tag


[3.26.2] - 2022-02-20
---------------------

Fixed
^^^^^

- Fixed defect where the "frozen_at" administrative metadata changed in the
  destination dataset when a dataset was being copied.
  Many thanks to `Johannes L. Hörmann `_
  and `Lars Pastewka `_ for bug reports,
  design discussions and code contributions.
  See:
  https://github.com/jic-dtool/dtoolcore/issues/20
- Improved handling of Windows paths with drive letters where the
  dataset is located in a drive different to that of the working
  directory, see https://github.com/jic-dtool/dtoolcore/pull/23


[3.26.1] - 2021-06-23
---------------------

Fixed
^^^^^

- License files now included in releases thanks to Jan Janssen (https://github.com/jan-janssen)


[3.26.0] - 2021-04-11
---------------------

Added
^^^^^

- ``dtoolcore.iter_datasets_in_base_uri`` helper function
- ``dtoolcore.iter_proto_datasets_in_base_uri`` helper function

Fixed
^^^^^

- Fixed defect in ``dtool readme interactive`` command when the readme template contains a date.
  Thanks to Lars Pastewka.
- Fixed defect in "dtool readme interactive" where the default date of today was
  not updated when using "{{ date }}" in the readme template. See
  https://github.com/jic-dtool/dtool-create/issues/24
  Thanks to Antoine Sanner.
- Fixed issue where "dtool readme edit" opened the file with a ".txt" extension
  rather than a ".yml" extension. See:
  https://github.com/jic-dtool/dtool-cli/issues/3
  Thanks to Antoine Sanner.


[3.25.0] - 2020-03-25
---------------------

Added support for tags from the dtool CLI.

Added
^^^^^

- The CLI command 'dtool tag set'
- The CLI command 'dtool tag ls'
- The CLI command 'dtool tag delete'


[3.24.0] - 2020-03-23
---------------------

Added Python API support for tags.

Added
^^^^^

- Added ``dtoolcore._BaseDataSet.put_tag()`` method
- Added ``dtoolcore._BaseDataSet.delete_tag()`` method
- Added ``dtoolcore._BaseDataSet.list_tags()`` method
- Added ``dtoolcore.storagebroker.BaseStorageBroker.delete_key()`` method
- Added ``dtoolcore.storagebroker.BaseStorageBroker.get_tag_key()`` method
- Added ``dtoolcore.storagebroker.BaseStorageBroker.list_tags()`` method
- Added ``dtoolcore.storagebroker.BaseStorageBroker.put_tag()`` method
- Added ``dtoolcore.storagebroker.BaseStorageBroker.delete_tag()`` method
- Added ``dtoolcore.storagebroker.DiskStorageBroker.delete_key()`` method
- Added ``dtoolcore.storagebroker.DiskStorageBroker.get_tag_key()`` method
- Added ``dtoolcore.storagebroker.DiskStorageBroker.list_tags()`` method
- Default cache directory changed from ``~/.cache/dtool/http`` to
  ``~/.cache/dtool``

Fixed
^^^^^

- Cache environment variable changed from DTOOL_HTTP_CACHE_DIRECTORY to
  DTOOL_CACHE_DIRECTORY


[3.23.0] - 2020-02-28
---------------------

Added
^^^^^

- Add ``dtool readme validate`` command
- Ability to update descriptive metadata in README of frozen datasets
  when using ``dtool readme write``

Fixed
^^^^^

- Fixed several defects in how URIs were parsed and generated on Windows.


[3.22.0] - 2020-02-06
---------------------

Improved Python API for creating datasets.

Added
^^^^^

- dtoolcore.create_proto_dataset() helper function
- dtoolcore.create_derived_proto_dataset() helper function
- dtoolcore.DataSetCreator helper context manager class
- dtoolcore.DerivedDataSetCreator helper context manager class

Fixed
^^^^^

- Fixed defect where using ``DTOOL_NUM_PROCESSES`` > 1 resulted in
  a cPickle.PicklingError on some storage brokers. Multiprocessing
  is now only used if the storage broker supports it.


[3.21.1] - 2020-01-23
---------------------

- Fixed defect where 'dtool verify' calculated hashes even when the '-f/--full'
  option was not specified. The 'dtool verify' command now runs more quickly.


[3.21.0] - 2020-01-21
---------------------

Added
^^^^^

- Ability to use multiple processes (cores) to generate item properties for
  manifest files in parallel. Set the environment variable
  ``DTOOL_NUM_PROCESSES`` to specify the number of processes to use.

Fixed
^^^^^

- Included .dtool/annotations directory in DiskStorageBroker self-description file


[3.20.0] - 2019-10-31
---------------------

*New feature: Dataset annotation*

Dataset annotations are intended to make it easy to add and access specific
metadata at a per dataset level.

The difference between annotations and the descriptive metadata is that the
former is easier to work with in a programmatic fashion. The descriptive
metadata, stored in the dataset's README content, is more free form. It is
non-trivial to access specific pieces of information from the descriptive
metadata in the dataset's README content, whereas a dtool annotation can be
easily accessed by its name.

Added
^^^^^

- Added ``dtool annotation set`` command
- Added ``dtool annotation get`` command
- Added ``dtool annotation ls`` command


[3.19.0] - 2019-09-12
---------------------

Added
^^^^^

- Added sorting of items by relpath to 'dtool ls'

Fixed
^^^^^

- Fixed formatting of 'dtool ls' from using two spaces to using
  one tab to make it easier to work with command line tools such as ``cut``
- Fixed ordering of lines in overlay CSV template from being sorted by the
  identifier to being ordered by the relpath


[3.18.0] - 2019-09-06
---------------------

Added
^^^^^

- Added 'dtool overlays show' command
- Added 'dtool overlays write' command
- Added 'dtool overlays template parse' command
- Added 'dtool overlays template glob' command
- Added 'dtool overlays template pairs' command


Deprecated
^^^^^^^^^^

- Deprecated 'dtool overlay ls'
- Deprecated 'dtool overlay show'


[3.17.0] - 2019-08-06
---------------------

Added
^^^^^

- Added support for host name in file URI.
- Added ``dtool status`` command for working out if a dataset is frozen or not
- Added ``dtool uri`` command for expanding absolute and relative paths into
  proper URIs


[3.16.0] - 2019-07-12
---------------------

Added
^^^^^

- Added more debug logging
- Added ``dtool config ecs ls`` command to list ECS base URIs that have been
  configured
- Added support for configuring access to ECS buckets in multiple namespaces

Fixed
^^^^^

- The ``dtool config azure ls`` command now returns base URIs rather than
  container names


[3.15.0] - 2019-04-26
---------------------

Added
^^^^^

- ``dtool config readme-template`` CLI command for configuring the path to a
  custom readme template
- ``dtoolcore._BaseDataSet.base_uri`` property
- ``dtoolcore.storagebroker.BaseStorageBroker.generate_base_uri`` method
- ``dtoolcore.utils.DEFAULT_CACHE_PATH`` global helper variable
- ``dtoolcore.utils.get_config_value_from_file`` helper function
- ``dtoolcore.utils.write_config_value_to_file`` helper function


Changed
^^^^^^^

- ``dtool config cache`` now works with one unified cache directory for all
  storage brokers
- Started using a unified environment variable to specify the cache directory:
  ``DTOOL_CACHE_DIRECTORY``
- Default cache directory changed to ``~/.cache/dtool``

Fixed
^^^^^

- Fixed defect when username was supplied as two separate strings to
  ``dtool config user name`` in CLI


[3.14.1] - 2018-12-12
---------------------

Fixed
^^^^^

- Fixed the ``dtool config azure set`` help text


[3.14.0] - 2018-11-21
---------------------

Added
^^^^^

- Added ``dtool publish`` command
- Added ``-f/--format`` option to ``dtool summary`` command to enable output in
  JSON format
- Added sorting of CSV/TSV/HTML inventories by dataset name


Changed
^^^^^^^

- Changed default output of ``dtool summary`` to be human-readable YAML


[3.13.0] - 2018-11-13
---------------------

Added
^^^^^

- Added support for Windows! :)
- Added ``dtool config`` command


[3.12.0] - 2018-09-25
---------------------

Added
^^^^^

- Added ``dtool uuid`` command
- Added ``dtool item relpath`` command


[3.11.0] - 2018-09-20
---------------------

Added
^^^^^

- ``dtool cp`` to replace ``dtool copy``
- ``dtool readme write`` to write readme from file or stdin
- ``dtool item overlay`` command


Deprecated
^^^^^^^^^^

- ``dtool copy`` in favour of ``dtool cp``


Removed
^^^^^^^

- Removed ``created_at`` field from default README template


Fixed
^^^^^

- Defect in ``dtool create`` when providing a relative path to the
  ``--symlink-path`` option
- Python 2 defect in dealing with unicode in README.yml file when using
  ``dtool readme edit``


[3.10.0] - 2018-09-11
---------------------

Added
^^^^^

- ``dtoolcore.filehasher.hashsum_digest`` helper function
- ``dtoolcore.filehasher.md5sum_digest`` helper function


Changed
^^^^^^^

- Renamed ``dtoolcore.filehasher.hashsum`` to the more descriptive
  ``dtoolcore.filehasher.hashsum_hexdigest``

Fixed
^^^^^

- Dealt with an issue in how ruamel.yaml handles float values


[3.9.0] - 2018-08-03
--------------------

Added
^^^^^

- Added ability to update the name of a frozen dataset from the ``dtool`` CLI
- Added ``update_name`` method to ``DataSet`` class (previously only available
  on ``ProtoDataSet`` class)


[3.8.0] - 2018-07-31
--------------------

Dataset name validation.

Added
^^^^^

- ``dtoolcore.generate_admin_metadata`` function raises
  ``dtoolcore.DtoolCoreInvalidNameError`` if an invalid name is provided
- ``dtoolcore.utils.name_is_valid`` utility function for checking sanity of
  dataset names
- Validation of dataset name upon creation using dtool CLI
- Validation of dataset name when updating it using dtool CLI

Fixed
^^^^^

- Fixed defect where ``dtool ls -q`` was listing dataset names rather than URIs,
  making it impossible to process datasets in a BASE_URI programmatically
- Make ``SymlinkStorageBroker`` compatible with dtoolcore 3.4.0


[3.7.0] - 2018-07-26
--------------------

Storage broker base class redesign and refactoring.

Added
^^^^^

- Ability to update descriptive metadata in README of frozen datasets
- Validation that the descriptive metadata provided by the
  ``dtool readme edit`` command is valid YAML
- Added ``dtoolcore.storagebroker.BaseStorageBroker``
- Added logging to the reusable ``BaseStorageBroker`` methods
- ``get_text`` new method on ``BaseStorageBroker`` class
- ``put_text`` new method on ``BaseStorageBroker`` class
- ``get_admin_metadata_key`` new method on ``BaseStorageBroker`` class
- ``get_readme_key`` new method on ``BaseStorageBroker`` class
- ``get_manifest_key`` new method on ``BaseStorageBroker`` class
- ``get_overlay_key`` new method on ``BaseStorageBroker`` class
- ``get_structure_key`` new method on ``BaseStorageBroker`` class
- ``get_dtool_readme_key`` new method on ``BaseStorageBroker`` class
- ``get_size_in_bytes`` new method on ``BaseStorageBroker`` class
- ``get_utc_timestamp`` new method on ``BaseStorageBroker`` class
- ``get_hash`` new method on ``BaseStorageBroker`` class
- ``get_relpath`` new method on ``BaseStorageBroker`` class
- ``update_readme`` new method on ``BaseStorageBroker`` class
- ``DataSet.put_readme`` method that can be used to update descriptive metadata
  in (frozen) dataset README whilst keeping a copy of the historical README
  content
- Add ``storage_broker_version`` key to structure parameters

Fixed
^^^^^

- Stop ``copy_resume`` function calculating hashes unnecessarily
- Fixed the documentation of the ``dtool verify`` command


[3.6.2] - 2018-07-10
--------------------

Fixed
^^^^^

- Default config file now set in ``dtoolcore.utils.get_config_value`` if not provided in caller


[3.6.1] - 2018-07-09
--------------------

Fixed
^^^^^

- Made download to DTOOL_HTTP_CACHE_DIRECTORY more robust
- Added ability to deal with redirects to enable working with shortened URLs


[3.6.0] - 2018-07-05
--------------------

Added
^^^^^

- Bundling of ``dtool-http`` package

Removed
^^^^^^^

- Bundling of ``dtool-irods`` package
- Bundling of ``dtool-s3`` package


[3.5.0] - 2018-06-06
--------------------

Added
^^^^^

- Pre-checks to 'dtool freeze' command to ensure that there is no rogue content
  in the base of disk datasets
- Added rogue content validation check to DiskStorageBroker.pre_freeze hook


[3.4.0] - 2018-05-24
--------------------

Added
^^^^^

- Pre-checks to 'dtool freeze' command to ensure that the item handles are sane, i.e.
  that they do not contain newline characters
- Pre-checks to 'dtool freeze' command to ensure that there are not too many
  items in the proto dataset, defaulting to fewer than 10000


[3.3.1] - 2018-05-18
--------------------

Fixed
^^^^^

- Defect where the inventory HTML template was not included in the Python package on PyPI


[3.3.0] - 2018-05-18
--------------------

Added
^^^^^

- Add "created_at" key to the administrative metadata
- ``dtool inventory`` command for generating csv/tsv/html inventories of collections
  of datasets
- Added support for ``-h`` flag as well as ``--help``
- Added timestamp to logging output

Fixed
^^^^^

- Improved handling of URIs in validation code
- Fixed defect where running ``dtool item properties`` with an invalid identifier
  resulted in a KeyError exception being propagated to the user
- Fixed defect where ``dtool verify`` did not compare file sizes
- Fixed timestamp defect in DiskStorageBroker


[3.2.1] - 2018-05-01
--------------------

Fixed
^^^^^

- Fixed issue arising from a file being put into iRODS and the connection
  breaking before the appropriate metadata could be set on the file in iRODS.
  See also: https://github.com/jic-dtool/dtool-irods/issues/7


[3.2.0] - 2018-02-09
--------------------

Release to make it easier to create symlink datasets in an automated fashion.

Changed
^^^^^^^

- Simplified the way to specify the symbolic link path in the
  SymLinkStorageBroker
- The path to the data when creating a symlink dataset is now specified using the
  ``-s/--symlink-path`` option rather than being something that is prompted for.
  This makes it easier to create symlink datasets in an automated fashion.


[3.1.0] - 2018-02-05
--------------------

Added
^^^^^

- ``--resume`` option to ``dtool copy`` command
- ``--quiet`` and ``--verbose`` options to ``dtool ls`` and improved formatting
- Add ``dtoolcore.copy_resume`` function


[3.0.0] - 2018-01-18
--------------------

This release makes use of the dtoolcore version 3.0.0 API, which improves the
handling of URIs and adds more metadata describing the structure of datasets.

Another major feature of this release is the addition of an S3 storage broker
that can be used to interact with Amazon's S3 object storage.

Added
^^^^^

- AWS S3 object storage broker
- Writing of ``.dtool/structure.json`` file to the DiskStorageBroker; a file
  for describing the structure of the dtool dataset in a computer readable format
- Writing of ``.dtool/README.txt`` file to the DiskStorageBroker; a file
  for describing the structure of the dtool dataset in a human readable format
- Writing of ``.dtool/structure.json`` file to the IrodsStorageBroker; a file
  for describing the structure of the dtool dataset in a computer readable format
- Writing of ``.dtool/README.txt`` file to the IrodsStorageBroker; a file
  for describing the structure of the dtool dataset in a human readable format


Changed
^^^^^^^

- Make use of dtoolcore version 3 API


Fixed
^^^^^

- Removed the historical ``dtool_readme`` key/value pair from the
  administrative metadata (in the .dtool/dtool file)


[2.4.0] - 2017-12-14
--------------------

Added
^^^^^

- Ability to specify a custom README.yml template file path.
- Ability to configure the full user name for the README.yml template using
  ``DTOOL_USER_FULL_NAME``

Fixed
^^^^^

- Made ``.dtool/manifest.json`` content created by DiskStorageBroker human
  readable by adding new lines and indentation to the JSON formatting.
- Made the DiskStorageBroker.list_overlay_names method more robust. It no
  longer falls over if the ``.dtool/overlays`` directory has been lost, e.g. by
  cloning a dataset with no overlays from a Git repository.
- Fixed defect where an incorrect URI would get set on the dataset when using
  ``DataSet.from_path`` class method on a relative path
- Made the YAML output prettier by adding more indentation.
- Replaced hardcoded ``nbi.ac.uk`` email with configurable ``DTOOL_USER_EMAIL``
  in the default README.yml template.
- Fixed ``IrodsStorageBroker.generate_uri`` class method
- Made ``.dtool/manifest.json`` content created by IrodsStorageBroker human
  readable by adding new lines and indentation to the JSON formatting.
- Added rule to catch ``CAT_INVALID_USER`` string for giving a more informative
  error message when iRODS authentication times out


[2.3.2] - 2017-10-25
--------------------

Fixed
^^^^^

- Fixed issue where the symbolic link was not fully resolved when creating
  a symlink dataset that used the terminal to prompt for the data directory


[2.3.1] - 2017-10-25
--------------------

Fixed
^^^^^

- More graceful exit if one presses Cancel in file browser when creating a
  symlink dataset
- Data directory now falls back on click command line prompt if TkInter has
  issues when creating a symlink dataset


[2.3.0] - 2017-10-23
--------------------

Added
^^^^^

- ``pre_freeze_hook`` to the storage broker interface, called at the beginning
  of the ``ProtoDataSet.freeze`` method.
- ``--quiet`` flag to ``dtool create`` command
- ``dtool overlay ls`` command to list the overlays in a dataset
- ``dtool overlay show`` command to show the content of a specific overlay


Changed
^^^^^^^

- Improved speed of freezing a dataset in iRODS by making use of
  caches to reduce the number of calls made to iRODS during this
  process
- ``dtool copy`` now specifies target location using URI rather than
  using the ``--prefix`` and ``--storage`` arguments


Fixed
^^^^^

- Made the ``DiskStorageBroker.create_structure`` method more robust
- More informative error message when iRODS has not been configured
- More informative error message when iRODS authentication times out
- Stopped client hanging when iRODS authentication has timed out
- The storage broker's ``put_item`` method now returns the relpath
- Made the ``IrodsStorageBroker.create_structure`` method more
  robust by checking if the parent collection exists
- Made error handling in ``dtool create`` more specific
- Added propagation of the original error message when a ``StorageBrokerOSError``
  is caught in ``dtool create``


[2.2.0] - 2017-10-09
--------------------

Added
^^^^^

- ``dtool ls`` can now be used to list the relpaths of the items in a dataset
- ``-f/--full`` flag to ``dtool diff`` command to include checking of file
  hashes
- ``-f/--full`` flag to ``dtool verify`` command to include checking of file
  hashes


Changed
^^^^^^^

- ``dtool ls`` now works with URIs rather than with prefix and storage arguments
- ``dtool diff`` now only compares identifiers and file sizes by default
- ``dtool verify`` now only compares identifiers and file sizes by default


Fixed
^^^^^

- Made ``DiskStorageBroker.list_dataset_uris`` class method more robust


[2.1.2] - 2017-10-05
--------------------

Fixed
^^^^^

- Set the correct dependency to actually get the fix reported in 2.1.1

[2.1.1] - 2017-10-05
--------------------

Fixed
^^^^^

- Fixed defect in iRODS storage broker where files with white space resulted in
  broken identifiers


[2.1.0] - 2017-10-04
--------------------

Added
^^^^^

- ``dtool readme show`` command that returns the readme content
- ``--quiet`` flag to ``dtool copy`` command

Changed
^^^^^^^

- Improved the ``dtool readme --help`` output

Fixed
^^^^^

- Progress bar now shows information on individual items being processed
- ``dtool ls`` now works with relative paths
- Fixed defect where ``IrodsStorageBroker.put_item`` raised SystemError when
  trying to overwrite an existing file


[2.0.2] - 2017-09-25
--------------------

Fixed
^^^^^

- Better validation of input in terms of base vs proto vs frozen dataset URIs
- Fixed bug where copy created an intermediate proto dataset that
  self-identified as a frozen dataset.
- Fixed potential bug where a copy could convert a proto dataset to
  a dataset before all its overlays had been copied over
- Fixed type of "frozen_at" time stamp in admin metadata: from string to float


[2.0.1] - 2017-09-20
--------------------

Fixed
^^^^^

- Made version requirements of dtool sub-packages explicit

[2.0.0] - 2017-09-14
--------------------

Initial release of ``dtool`` as a meta package.
--------------------------------------------------------------------------------