├── poetry.toml
├── mypy.ini
├── tests
│   ├── data
│   │   ├── sample.csv
│   │   ├── sample.jpeg
│   │   ├── sample.xlsx
│   │   └── 2022_04_14_rocket.png
│   ├── test_hash.py
│   ├── test_parser.py
│   ├── test_config.py
│   ├── test_project.py
│   └── test_cli.py
├── example
│   ├── wrongImagesInMNISTTestset.xlsx
│   └── prepare.sh
├── .github
│   ├── pull_request_template.md
│   ├── ISSUE_TEMPLATE
│   │   ├── feature_request.md
│   │   └── bug_report.md
│   └── workflows
│       ├── ci.yml
│       └── ci_dev.yml
├── base
│   ├── __init__.py
│   ├── hash.py
│   ├── exception.py
│   ├── spinner.py
│   ├── dataset.py
│   ├── config.py
│   ├── parser.py
│   ├── files.py
│   └── cli.py
├── CONTRIBUTING.md
├── LICENSE
├── pyproject.toml
├── .gitignore
├── download_mnist.py
├── CODE_OF_CONDUCT.md
├── README.md
└── docs
    ├── CLI.md
    └── SDK.md

--------------------------------------------------------------------------------
/poetry.toml:
--------------------------------------------------------------------------------
1 | [virtualenvs]
2 | in-project = true
--------------------------------------------------------------------------------
/mypy.ini:
--------------------------------------------------------------------------------
1 | [mypy]
2 | ignore_missing_imports = True
--------------------------------------------------------------------------------
/tests/data/sample.csv:
--------------------------------------------------------------------------------
1 | key1,key2,key3
2 | 1,2,3
3 | 4,5,6
4 | 7,8,9
--------------------------------------------------------------------------------
/tests/data/sample.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/tests/data/sample.jpeg
--------------------------------------------------------------------------------
/tests/data/sample.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/tests/data/sample.xlsx
--------------------------------------------------------------------------------
/tests/data/2022_04_14_rocket.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/tests/data/2022_04_14_rocket.png
--------------------------------------------------------------------------------
/example/wrongImagesInMNISTTestset.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/example/wrongImagesInMNISTTestset.xlsx
--------------------------------------------------------------------------------
/.github/pull_request_template.md:
--------------------------------------------------------------------------------
1 | close #your_issue_id
2 | 
3 | # Motivation
4 | 
5 | # Description of the changes
6 | 
7 | # Example
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: 🪄 Feature Request
3 | about: Request a new feature or enhancement
4 | labels: 0_enhancement
5 | ---
6 | 
7 | # Motivation
8 | 
9 | # Description
10 | 
11 | # Additional context (optional)
12 | 
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: 🐛 Bug Report
3 | about: Create a bug report
4 | labels: 1_bug
5 | ---
6 | 
7 | # Expected behavior
8 | 
9 | # Error messages, stack traces, or logs
10 | 
11 | # Steps to reproduce
12 | 
13 | # Additional context (optional)
14 | 
--------------------------------------------------------------------------------
/base/__init__.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2021 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | 
6 | from .project import Project
7 | from .dataset import Dataset
8 | 
9 | 
10 | # check that the local cache directory and files exist
11 | import os
12 | 
13 | CACHE_DIR = os.path.join(os.path.expanduser("~"), ".base")
14 | CONFIG_FILE = os.path.join(os.path.expanduser("~"), ".base", "config")
15 | PROJECT_FILE = os.path.join(os.path.expanduser("~"), ".base", "projects")
16 | 
17 | os.makedirs(CACHE_DIR, exist_ok=True)
18 | 
19 | # initialize with empty files
20 | if not os.path.exists(CONFIG_FILE):
21 |     open(CONFIG_FILE, "w").close()
22 | if not os.path.exists(PROJECT_FILE):
23 |     open(PROJECT_FILE, "w").close()
24 | 
25 | VERSION = "0.1.3"
26 | __version__ = VERSION
27 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contribution Guide
2 | 
3 | Thanks for your interest in helping improve Adansons Base!
4 | 
5 | 1. Please check existing issues to see whether someone is already working on the same thing.
6 | 
7 | 2. Create a new issue if you want to fix a large bug or add a new feature.
8 | 
9 | 3. Create a new branch or fork this repository. (Use a branch name that makes clear which issue it relates to, e.g. `feature/#100` or `fix/#101`.)
10 | 
11 | 4. Run the code formatter `black` and check that `pytest` passes after you finish your work.
12 | 
13 | 5. Create a pull request against the `main` branch and specify reviewers.
14 | 
15 | ## Development details
16 | 
17 | ### 1. Set up the development environment
18 | 
19 | Check your Poetry installation.
20 | 
21 | If you haven't installed Poetry yet, please follow [the official instructions](https://python-poetry.org/docs/#installation).
22 | 
23 | If `poetry --help` works, you're good to go.
24 | 
25 | ```Bash
26 | poetry install
27 | ```
28 | 
29 | ### 2. Running tests
30 | 
31 | ```Bash
32 | poetry run pytest tests/
33 | ```
34 | 
35 | ### 3. Format source
36 | 
37 | ```Bash
38 | poetry run black .
39 | ```
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2022 Adansons Inc
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/example/prepare.sh:
--------------------------------------------------------------------------------
1 | # Partially adapted from https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh
2 | 
3 | # script to extract the ImageNet dataset
4 | # ILSVRC2012_img_val.tar (about 6.3 GB)
5 | # make sure ILSVRC2012_img_val.tar is in your current directory
6 | #
7 | # Adapted from:
8 | # https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md
9 | # https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4
10 | #
11 | # imagenet/val/
12 | # ├── n01440764
13 | # │   ├── ILSVRC2012_val_00000293.JPEG
14 | # │   ├── ILSVRC2012_val_00002138.JPEG
15 | # │   ├── ......
16 | # ├── ......
17 | 
18 | # Make the imagenet directory
19 | #
20 | mkdir imagenet
21 | 
22 | # Extract the validation data and move images to subfolders:
23 | #
24 | # Create validation directory; move .tar file; change directory; extract validation .tar; remove compressed file
25 | mkdir imagenet/val && mv ILSVRC2012_img_val.tar imagenet/val/ && cd imagenet/val && tar -xvf ILSVRC2012_img_val.tar && rm -f ILSVRC2012_img_val.tar
26 | # get the valprep.sh script from soumith and run it; this script creates all class directories and moves images into the corresponding directories
27 | wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | zsh
28 | #
29 | # This results in a validation directory like so:
30 | #
31 | # imagenet/val/
32 | # ├── n01440764
33 | # │   ├── ILSVRC2012_val_00000293.JPEG
34 | # │   ├── ILSVRC2012_val_00002138.JPEG
35 | # │   ├── ......
36 | # ├── ......
37 | #
38 | #
39 | # Check the total file count after extraction
40 | #
41 | # $ find val/ -name "*.JPEG" | wc -l
42 | # 50000
43 | #
--------------------------------------------------------------------------------
/base/hash.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import hashlib
6 | 
7 | 
8 | HASH_FUNCS = {
9 |     "md5": hashlib.md5,
10 |     "sha224": hashlib.sha224,
11 |     "sha256": hashlib.sha256,
12 |     "sha384": hashlib.sha384,
13 |     "sha512": hashlib.sha512,
14 |     "sha1": hashlib.sha1,
15 | }
16 | 
17 | 
18 | def calc_file_hash(
19 |     path: str,
20 |     algorithm: str = "sha256",
21 |     split_chunk: bool = True,
22 |     chunk_size: int = 2048,
23 | ) -> str:
24 |     """
25 |     Calculate the hash value of a file
26 | 
27 |     Parameters
28 |     ----------
29 |     path : str
30 |         target file path
31 |     algorithm : {"md5", "sha224", "sha256", "sha384", "sha512", "sha1"}, default="sha256"
32 |         hash algorithm name
33 |     split_chunk : bool, default=True
34 |         if True, read a large file in byte chunks
35 |     chunk_size : int, default=2048
36 |         number of hash blocks per read chunk (bytes read = chunk_size * block_size)
37 | 
38 |     Returns
39 |     -------
40 |     digest : str
41 |         hash string of the input file
42 |     """
43 |     hash_func = HASH_FUNCS[algorithm]()
44 | 
45 |     with open(path, "rb") as f:
46 |         if split_chunk:
47 |             while True:
48 |                 chunk = f.read(chunk_size * hash_func.block_size)
49 |                 if len(chunk) == 0:
50 |                     break
51 | 
52 |                 hash_func.update(chunk)
53 |         else:
54 |             chunk = f.read()
55 |             hash_func.update(chunk)
56 | 
57 |     digest = hash_func.hexdigest()
58 |     return digest
59 | 
60 | 
61 | if __name__ == "__main__":
62 |     pass
63 | 
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.poetry]
2 | name = "adansons-base"
3 | version = "0.1.3"
4 | description = "Adansons Base"
5 | readme = "README.md"
6 | authors = ["Adansons Developers <engineer@adansons.co.jp>"]
7 | homepage = ""
8 | repository = "https://github.com/adansons/base"
9 | license = "MIT"
10 | packages = [
11 |     { include = "base"},
12 | ]
13 | classifiers = [
14 |     "Development Status :: 4 - Beta",
15 |     "Intended Audience :: Science/Research",
16 |     "Intended Audience :: Information Technology",
17 |     "Intended Audience :: Developers",
18 |     "License :: OSI Approved :: MIT License",
19 |     "Programming Language :: Python :: 3.8",
20 |     "Programming Language :: Python :: 3.9",
21 |     "Programming Language :: Python :: 3.10",
22 |     "Topic :: Database",
23 |     "Topic :: Scientific/Engineering",
24 |     "Topic :: Scientific/Engineering :: Artificial Intelligence",
25 |     "Topic :: Scientific/Engineering :: Information Analysis",
26 |     "Operating System :: MacOS",
27 |     "Operating System :: Microsoft :: Windows",
28 |     "Operating System :: POSIX :: Linux"
29 | ]
30 | 
31 | [tool.poetry.scripts]
32 | base = 'base.cli:main'
33 | 
34 | [tool.poetry.dependencies]
35 | python = "^3.8,<3.11"
36 | click = ">=8.0.0"
37 | requests = ">=1.0.0"
38 | numpy = ">=1.18.5"
39 | scikit-learn = ">=0.23.0"
40 | PyYAML = "^6.0"
41 | "ruamel.yaml" = "^0.17.21"
42 | colorama = "^0.4.4"
43 | pandas = "^1.4.3"
44 | 
45 | [tool.poetry.dev-dependencies]
46 | black = "^21.12b0"
47 | pytest = "^6.2.5"
48 | mypy = "^0.931"
49 | boto3 = "^1.21.13"
50 | jupyter = "^1.0.0"
51 | jupyterlab = "^3.0.9"
52 | 
53 | [build-system]
54 | requires = ["poetry-core>=1.0.0"]
55 | build-backend = "poetry.core.masonry.api"
56 | 
--------------------------------------------------------------------------------
/tests/test_hash.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
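# --- added usage sketch (editorial, not part of the original module) ---
# The tests below pin calc_file_hash digests for tests/data/sample.jpeg. As a
# minimal illustration of the API above, chunked and whole-file hashing should
# agree, since split_chunk only changes *how* bytes are fed to the hash object;
# left commented out because it assumes the sample file's relative path:
#
#     from base.hash import calc_file_hash
#
#     path = "tests/data/sample.jpeg"
#     chunked = calc_file_hash(path)                   # default: sha256, chunked reads
#     whole = calc_file_hash(path, split_chunk=False)  # single f.read()
#     assert chunked == whole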
4 | # Please contact engineer@adansons.co.jp 5 | 6 | import os 7 | import sys 8 | 9 | sys.path.append(os.path.dirname(os.path.dirname(__file__))) 10 | 11 | from base.hash import calc_file_hash 12 | 13 | PATH = os.path.join(os.path.dirname(__file__), "data", "sample.jpeg") 14 | MD5HASH = "93304c750cf3dd4e8e91d374d60b9734" 15 | SHA224HASH = "c9462ddf27c8aefbf74f70ca13fd113f304e46d1359cde3f3aa8908a" 16 | SHA256HASH = "09e300d993f62d0e623e0d631a468e6126881b0e9152547ca8b369e7233e5717" 17 | SHA384HASH = "eb2e4a765e17f666122bb30f13a40e843fbfb32d6f6b3f96b5d8614c2761f3827ef5c374b5078c651d31ac549feed8f2" 18 | SHA512HASH = "c9414d9abf93f278457d9d31a0eef74a57644f7431aa9132a3ac5e7642b29a6b2f27976ff19700cca0bd9b902f8e4d5bfcfb4733b8b79e9b8c85d40fc796e7d6" 19 | SHA1HASH = "ec33c6e4dbe7a84f177899f6aac29bb718cb0451" 20 | 21 | 22 | def test_calc_file_hash_md5(): 23 | digest = calc_file_hash(PATH, algorithm="md5", split_chunk=False) 24 | assert digest == MD5HASH 25 | 26 | 27 | def test_calc_file_hash_sha224(): 28 | digest = calc_file_hash(PATH, algorithm="sha224", split_chunk=False) 29 | assert digest == SHA224HASH 30 | 31 | 32 | def test_calc_file_hash_sha256(): 33 | digest = calc_file_hash(PATH, algorithm="sha256", split_chunk=False) 34 | assert digest == SHA256HASH 35 | 36 | 37 | def test_calc_file_hash_sha384(): 38 | digest = calc_file_hash(PATH, algorithm="sha384", split_chunk=False) 39 | assert digest == SHA384HASH 40 | 41 | 42 | def test_calc_file_hash_sha512(): 43 | digest = calc_file_hash(PATH, algorithm="sha512", split_chunk=False) 44 | assert digest == SHA512HASH 45 | 46 | 47 | def test_calc_file_hash_sha1(): 48 | digest = calc_file_hash(PATH, algorithm="sha1", split_chunk=False) 49 | assert digest == SHA1HASH 50 | 51 | 52 | def test_split_chunk(): 53 | digest = calc_file_hash(PATH, algorithm="sha256", split_chunk=True) 54 | assert digest == SHA256HASH 55 | 56 | 57 | if __name__ == "__main__": 58 | test_calc_file_hash_md5() 59 | test_calc_file_hash_sha224() 60 | test_calc_file_hash_sha256() 61 | test_calc_file_hash_sha384() 62 | test_calc_file_hash_sha512() 63 | test_calc_file_hash_sha1() 64 | test_split_chunk() 65 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: CI_PRD 2 | 3 | on: 4 | # execute when pull requested 5 | pull_request: 6 | branches: 7 | - 'main' 8 | 9 | paths: 10 | - '.github/workflows/**' 11 | - 'base/**' 12 | - 'tests/**' 13 | 14 | jobs: 15 | test: 16 | runs-on: ubuntu-latest 17 | strategy: 18 | matrix: 19 | python-version: [3.9] 20 | 21 | env: 22 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 23 | 24 | steps: 25 | - name: Checkout 26 | uses: actions/checkout@v2 27 | with: 28 | ref: ${{ github.event.pull_request.head.ref }} 29 | 30 | - name: Set up Python ${{ matrix.python-version }} 31 | uses: actions/setup-python@v2 32 | with: 33 | python-version: ${{ matrix.python-version }} 34 | 35 | - name: Install Dependencies 36 | run: | 37 | pip install --upgrade pip 38 | pip install poetry 39 | poetry install --no-interaction 40 | 41 | - name: Run the test with pytest 42 | run: poetry run pytest tests/ 43 | env: 44 | BASE_USER_ID: ${{ secrets.BASE_USER_ID }} 45 | BASE_ACCESS_KEY: ${{ secrets.BASE_ACCESS_KEY_PRD }} 46 | 47 | fix-format: 48 | runs-on: ubuntu-latest 49 | needs: test 50 | 51 | env: 52 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 53 | 54 | steps: 55 | - name: Checkout 56 | uses: actions/checkout@v2 57 | with: 58 | ref: ${{ 
github.event.pull_request.head.ref }} 59 | 60 | - name: Set up Python 61 | uses: actions/setup-python@v2 62 | with: 63 | python-version: '3.9' 64 | 65 | - name: Install Dependencies 66 | run: | 67 | pip install --upgrade pip 68 | pip install poetry 69 | poetry install --no-interaction 70 | 71 | - name: Format Python Source with Black 72 | run: poetry run black . 73 | 74 | - name: Push to Pull Requested branch 75 | run: | 76 | if ! git diff --exit-code --quiet 77 | then 78 | git config --global user.email "github-actions[bot]@users.noreply.github.com" 79 | git config --global user.name "github-actions" 80 | 81 | git add . 82 | git commit -m "[Actions]Fix format with black." 83 | git push 84 | fi -------------------------------------------------------------------------------- /.github/workflows/ci_dev.yml: -------------------------------------------------------------------------------- 1 | name: CI_DEV 2 | 3 | on: 4 | # execute when pull requested 5 | pull_request: 6 | branches: 7 | - 'dev' 8 | 9 | paths: 10 | - '.github/workflows/**' 11 | - 'base/**' 12 | - 'tests/**' 13 | 14 | jobs: 15 | test: 16 | runs-on: ubuntu-latest 17 | strategy: 18 | matrix: 19 | python-version: [3.9] 20 | 21 | env: 22 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 23 | 24 | steps: 25 | - name: Checkout 26 | uses: actions/checkout@v2 27 | with: 28 | ref: ${{ github.event.pull_request.head.ref }} 29 | 30 | - name: Set up Python ${{ matrix.python-version }} 31 | uses: actions/setup-python@v2 32 | with: 33 | python-version: ${{ matrix.python-version }} 34 | 35 | - name: Install Dependencies 36 | run: | 37 | pip install --upgrade pip 38 | pip install poetry 39 | poetry install --no-interaction 40 | 41 | - name: Run the test with pytest 42 | run: poetry run pytest tests/ 43 | env: 44 | BASE_USER_ID: ${{ secrets.BASE_USER_ID }} 45 | BASE_ACCESS_KEY: ${{ secrets.BASE_ACCESS_KEY }} 46 | BASE_API_ENDPOINT: ${{ secrets.BASE_API_ENDPOINT }} 47 | 48 | fix-format: 49 | runs-on: ubuntu-latest 50 | needs: test 51 | 52 | env: 53 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 54 | 55 | steps: 56 | - name: Checkout 57 | uses: actions/checkout@v2 58 | with: 59 | ref: ${{ github.event.pull_request.head.ref }} 60 | 61 | - name: Set up Python 62 | uses: actions/setup-python@v2 63 | with: 64 | python-version: '3.9' 65 | 66 | - name: Install Dependencies 67 | run: | 68 | pip install --upgrade pip 69 | pip install poetry 70 | poetry install --no-interaction 71 | 72 | - name: Format Python Source with Black 73 | run: poetry run black . 74 | 75 | - name: Push to Pull Requested branch 76 | run: | 77 | if ! git diff --exit-code --quiet 78 | then 79 | git config --global user.email "github-actions[bot]@users.noreply.github.com" 80 | git config --global user.name "github-actions" 81 | 82 | git add . 83 | git commit -m "[Actions]Fix format with black." 84 | git push 85 | fi -------------------------------------------------------------------------------- /base/exception.py: -------------------------------------------------------------------------------- 1 | import click 2 | 3 | 4 | def CatchAllExceptions(cls, handler): 5 | """ 6 | Function to override the default exception handler of click. 7 | With thanks to this Stack Overflow answer: 8 | https://stackoverflow.com/questions/52213375/python-click-exception-handling-under-setuptools 9 | 10 | Parameters 11 | ---------- 12 | cls : The class that the handler is being applied e.g. 
click.Command
13 |     handler : The handler function that prints the error message
14 | 
15 |     Returns
16 |     -------
17 |     Cls : The new class itself with the handler applied
18 |     """
19 | 
20 |     class Cls(cls):
21 |         _original_args = None
22 | 
23 |         def make_context(self, info_name, args, parent=None, **extra):
24 | 
25 |             # grab the original command line arguments
26 |             self._original_args = " ".join(args)
27 | 
28 |             try:
29 |                 return super(Cls, self).make_context(
30 |                     info_name, args, parent=parent, **extra
31 |                 )
32 |             except Exception as exc:
33 |                 # call the handler
34 |                 handler(self, info_name, exc)
35 |                 # let the user see the original error
36 |                 raise
37 | 
38 |         def invoke(self, ctx):
39 |             try:
40 |                 return super(Cls, self).invoke(ctx)
41 |             except Exception as exc:
42 |                 # call the handler
43 |                 handler(self, ctx.info_name, exc)
44 | 
45 |                 # let the user see the original error
46 |                 raise
47 | 
48 |     return Cls
49 | 
50 | 
51 | def search_export_exception(cmd, info_name, exc):
52 |     """
53 |     Function to handle the exception for the "base search --export" command.
54 | 
55 |     Parameters
56 |     ----------
57 |     cmd: The command that the user is trying to run
58 |     info_name: The name of the exception
59 |     exc: The exception message
60 | 
61 |     Returns
62 |     -------
63 |     None
64 |     """
65 |     # send error info to rollbar, etc, here
66 |     # note: each message must be tested separately; `("a" or "b") in s` only checks "a"
67 |     message = str(exc)
68 |     if (
69 |         "'--export' requires an argument" in message
70 |         or "'--e' requires an argument" in message
71 |     ):
72 |         click.echo("You can specify ‘json’ or ‘csv’ as export-file-type")
73 |     elif (
74 |         "'--output' requires an argument" in message
75 |         or "'--o' requires an argument" in message
76 |     ):
77 |         click.echo("You can specify the output file name")
78 |     else:
79 |         click.echo("Raised error: {}".format(exc))
80 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | archive/
3 | 
4 | # Byte-compiled / optimized / DLL files
5 | __pycache__/
6 | *.py[cod]
7 | *$py.class
8 | 
9 | # C extensions
10 | *.so
11 | 
12 | # Distribution / packaging
13 | .Python
14 | build/
15 | develop-eggs/
16 | dist/
17 | downloads/
18 | eggs/
19 | .eggs/
20 | lib/
21 | lib64/
22 | parts/
23 | sdist/
24 | var/
25 | wheels/
26 | pip-wheel-metadata/
27 | share/python-wheels/
28 | *.egg-info/
29 | .installed.cfg
30 | *.egg
31 | MANIFEST
32 | 
33 | # PyInstaller
34 | # Usually these files are written by a python script from a template
35 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
36 | *.manifest
37 | *.spec
38 | 
39 | # Installer logs
40 | pip-log.txt
41 | pip-delete-this-directory.txt
42 | 
43 | # Unit test / coverage reports
44 | htmlcov/
45 | .tox/
46 | .nox/
47 | .coverage
48 | .coverage.*
49 | .cache
50 | nosetests.xml
51 | coverage.xml
52 | *.cover
53 | *.py,cover
54 | .hypothesis/
55 | .pytest_cache/
56 | 
57 | # Translations
58 | *.mo
59 | *.pot
60 | 
61 | # Django stuff:
62 | *.log
63 | local_settings.py
64 | db.sqlite3
65 | db.sqlite3-journal
66 | 
67 | # Flask stuff:
68 | instance/
69 | .webassets-cache
70 | 
71 | # Scrapy stuff:
72 | .scrapy
73 | 
74 | # Sphinx documentation
75 | docs/_build/
76 | 
77 | # PyBuilder
78 | target/
79 | 
80 | # Jupyter Notebook
81 | .ipynb_checkpoints
82 | 
83 | # IPython
84 | profile_default/
85 | ipython_config.py
86 | 
87 | # pyenv
88 | .python-version
89 | 
90 | # pipenv
91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
94 | # install all needed dependencies.
95 | #Pipfile.lock
96 | 
97 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
98 | __pypackages__/
99 | 
100 | # Celery stuff
101 | celerybeat-schedule
102 | celerybeat.pid
103 | 
104 | # SageMath parsed files
105 | *.sage.py
106 | 
107 | # Environments
108 | .env
109 | .venv
110 | env/
111 | venv/
112 | ENV/
113 | env.bak/
114 | venv.bak/
115 | 
116 | # Spyder project settings
117 | .spyderproject
118 | .spyproject
119 | 
120 | # Rope project settings
121 | .ropeproject
122 | 
123 | # mkdocs documentation
124 | /site
125 | 
126 | # mypy
127 | .mypy_cache/
128 | .dmypy.json
129 | dmypy.json
130 | 
131 | # Pyre type checker
132 | .pyre/
133 | 
134 | # Created by https://www.toptal.com/developers/gitignore/api/terraform
135 | # Edit at https://www.toptal.com/developers/gitignore?templates=terraform
136 | 
137 | /example/download_mnist.py
138 | .vscode/settings.json
139 | 
--------------------------------------------------------------------------------
/base/spinner.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import time
6 | import itertools
7 | import threading
8 | 
9 | 
10 | class Spinner:
11 |     """
12 |     Spinner class
13 | 
14 |     Attributes
15 |     ----------
16 |     text : str
17 |         text to be displayed with a spinner when the process is started
18 |     etext : str
19 |         text to be displayed when the process is terminated
20 |     overwrite : bool
21 |         whether `etext` overwrites `text` or not
22 |     """
23 | 
24 |     def __init__(
25 |         self, text: str = "Please wait...", etext: str = "", overwrite: bool = True
26 |     ) -> None:
27 |         """
28 |         Parameters
29 |         ----------
30 |         text : str
31 |             text to be displayed with a spinner when the process is started
32 |         etext : str
33 |             text to be displayed when the process is terminated
34 |         overwrite : bool
35 |             whether `etext` overwrites `text` or not
36 |         """
37 |         self.text = text
38 |         self.etext = etext
39 |         self.overwrite = overwrite
40 | 
41 |         self._stop_flag = False
42 | 
43 |     def _spinner(self) -> None:
44 |         """
45 |         Display the `text` and a spinner during processing.
46 |         """
47 |         chars = itertools.cycle(r"/-\|")
48 |         while not self._stop_flag:
49 |             print(f"\r{self.text} {next(chars)}", end="")
50 |             time.sleep(0.2)
51 | 
52 |     def start(self):
53 |         """
54 |         Set up a subthread to run a spinner.
55 |         """
56 |         self._stop_flag = False
57 |         self._spinner_thread = threading.Thread(target=self._spinner)
58 |         self._spinner_thread.setDaemon(True)
59 |         self._spinner_thread.start()
60 | 
61 |     def stop(self, etext: str = "", overwrite: bool = True):
62 |         """
63 |         Kill the subthread and display `etext`.
64 | 65 | Parameters 66 | ---------- 67 | etext : str 68 | text to be displayed when the process is terminated 69 | overwrite : bool 70 | whether `etext` overwrites `text` or not 71 | """ 72 | if self._spinner_thread and self._spinner_thread.is_alive(): 73 | self._stop_flag = True 74 | self._spinner_thread.join() 75 | 76 | etext = etext or self.etext 77 | overwrite = self.overwrite if not self.overwrite else overwrite 78 | 79 | if overwrite: 80 | if etext == "": 81 | print(f"\r\033[2K\033[G", end="") 82 | else: 83 | print(f"\r\033[2K\033[G{etext}") 84 | else: 85 | if etext == "": 86 | print(f"\033[1D\033[K\n", end="") 87 | else: 88 | print(f"\033[1D\033[K\n{etext}") 89 | 90 | def __enter__(self) -> None: 91 | """ 92 | Start the spinner used on context managers. 93 | """ 94 | self.start() 95 | 96 | def __exit__(self, exception_type, exception_value, traceback) -> None: 97 | """ 98 | Stop the spinner used on context managers. 99 | """ 100 | if exception_type is not None: 101 | if self.overwrite: 102 | self.stop(etext=self.text) 103 | else: 104 | self.stop(etext="") 105 | else: 106 | self.stop() 107 | -------------------------------------------------------------------------------- /download_mnist.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact higuchi@adansons.co.jp 5 | # This script is cited from https://github.com/myleott/mnist_png/blob/master/convert_mnist_to_png.py 6 | 7 | import os 8 | import struct 9 | import sys 10 | 11 | from array import array 12 | 13 | import png 14 | 15 | import gzip 16 | import urllib.request 17 | 18 | 19 | original_sources_list = [ 20 | "train-images-idx3-ubyte", 21 | "train-labels-idx1-ubyte", 22 | "t10k-images-idx3-ubyte", 23 | "t10k-labels-idx1-ubyte", 24 | ] 25 | 26 | 27 | def download_original_sources(path: str = "."): 28 | for s in original_sources_list: 29 | fname = f"{s}.gz" 30 | url = f"http://yann.lecun.com/exdb/mnist/{fname}" 31 | 32 | s_path = os.path.join(path, s) 33 | fname_path = os.path.join(path, fname) 34 | urllib.request.urlretrieve(url, fname_path) 35 | with gzip.open(fname_path, mode="rb") as gzfile: 36 | with open(s_path, mode="wb") as f: 37 | f.write(gzfile.read()) 38 | 39 | os.remove(fname_path) 40 | 41 | 42 | def remove_original_sources(path: str = "."): 43 | for s in original_sources_list: 44 | s_path = os.path.join(path, s) 45 | os.remove(s_path) 46 | 47 | 48 | # source: http://abel.ee.ucla.edu/cvxopt/_downloads/mnist.py 49 | def read(dataset: str = "train", path: str = "."): 50 | if dataset == "train": 51 | fname_img = os.path.join(path, "train-images-idx3-ubyte") 52 | fname_lbl = os.path.join(path, "train-labels-idx1-ubyte") 53 | elif dataset == "test": 54 | fname_img = os.path.join(path, "t10k-images-idx3-ubyte") 55 | fname_lbl = os.path.join(path, "t10k-labels-idx1-ubyte") 56 | else: 57 | raise ValueError("dataset must be 'test' or 'train'") 58 | 59 | flbl = open(fname_lbl, "rb") 60 | magic_nr, size = struct.unpack(">II", flbl.read(8)) 61 | lbl = array("b", flbl.read()) 62 | flbl.close() 63 | 64 | fimg = open(fname_img, "rb") 65 | magic_nr, size, rows, cols = struct.unpack(">IIII", fimg.read(16)) 66 | img = array("B", fimg.read()) 67 | fimg.close() 68 | 69 | return lbl, img, size, rows, cols 70 | 71 | 72 | def write_dataset(labels, data, size, rows, cols, output_dir): 73 | # create output directories 74 | output_dirs = [os.path.join(output_dir, str(i)) for i in range(10)] 75 | for dir in output_dirs: 76 | 
os.makedirs(dir, exist_ok=True)
77 | 
78 |     # write data
79 |     for (i, label) in enumerate(labels):
80 |         output_filename = os.path.join(output_dirs[label], str(i) + ".png")
81 |         with open(output_filename, "wb") as h:
82 |             w = png.Writer(cols, rows, greyscale=True)
83 |             data_i = [
84 |                 data[(i * rows * cols + j * cols) : (i * rows * cols + (j + 1) * cols)]
85 |                 for j in range(rows)
86 |             ]
87 |             w.write(h, data_i)
88 | 
89 | 
90 | if __name__ == "__main__":
91 |     if len(sys.argv) != 2:
92 |         print(f"usage: {sys.argv[0]} <data_root_dir>")
93 |         sys.exit()
94 | 
95 |     data_root_dir = sys.argv[1]
96 |     os.makedirs(data_root_dir, exist_ok=True)
97 |     download_original_sources(data_root_dir)
98 | 
99 |     for dataset in ["train", "test"]:
100 |         labels, data, size, rows, cols = read(dataset, data_root_dir)
101 |         write_dataset(
102 |             labels, data, size, rows, cols, os.path.join(data_root_dir, dataset)
103 |         )
104 | 
105 |     remove_original_sources(data_root_dir)
106 | 
--------------------------------------------------------------------------------
/base/dataset.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import numpy as np
6 | from sklearn.model_selection import train_test_split
7 | from typing import Callable, Optional, Tuple
8 | 
9 | from base.files import Files
10 | 
11 | 
12 | class Dataset:
13 |     """
14 |     Dataset class
15 | 
16 |     Attributes
17 |     ----------
18 |     transform : function
19 |         function for preprocessing
20 |     target_key : str
21 |         key you want to use as the label
22 |     files : list
23 |         list of File objects
24 |     paths : list
25 |         list of filepaths
26 |     convert_dict : dict
27 |         dict to convert labels to numbers
28 |     y : list of int
29 |         target label
30 |     y_train : list of int
31 |         target label used to train
32 |     y_test : list of int
33 |         target label used to test
34 |     x : list
35 |         data
36 |     x_train : list
37 |         data used to train
38 |     x_test : list
39 |         data used to test
40 |     train_path : list
41 |         filepath used to train
42 |     test_path : list
43 |         filepath used to test
44 |     """
45 | 
46 |     def __init__(
47 |         self, files: Files, target_key: str, transform: Optional[Callable] = None
48 |     ) -> None:
49 |         """
50 |         Make the dict to convert labels to numbers.
51 | 
52 |         files : list
53 |             list of File objects
54 |         target_key : str
55 |             key you want to use as the label
56 |         transform : function or None, default None
57 |             function for preprocessing
58 |         """
59 | 
60 |         self.transform = transform
61 |         self.target_key = target_key
62 |         if self.transform is None:
63 |             self.transform = lambda x: x
64 | 
65 |         self.files = files
66 |         self.paths = self.files.paths
67 | 
68 |     def train_test_split(self, split_rate: float = 0.25) -> Tuple[list, list, list, list]:
69 |         """
70 |         Split train data and test data.
71 | 72 | Parameters 73 | ---------- 74 | split_rate : float 75 | the proportion of the dataset to include in the test data 76 | 77 | Returns 78 | ------- 79 | x_train : list 80 | data used to train 81 | x_test : list 82 | data used to test 83 | y_train : list of int 84 | target label used to train 85 | y_test : list of int 86 | target label used to test 87 | """ 88 | self.y = [getattr(i, self.target_key) for i in self.files] 89 | self.x = [self.transform(i) for i in self.paths] 90 | 91 | ( 92 | self.train_path, 93 | self.test_path, 94 | self.y_train, 95 | self.y_test, 96 | ) = train_test_split(self.paths, self.y, test_size=split_rate, stratify=self.y) 97 | 98 | self.x_train = [self.transform(i) for i in self.train_path] 99 | self.x_test = [self.transform(i) for i in self.test_path] 100 | 101 | return self.x_train, self.x_test, self.y_train, self.y_test 102 | 103 | def __len__(self) -> int: 104 | return len(self.files) 105 | 106 | def __getitem__(self, idx: int) -> Tuple: 107 | path = self.paths[idx] 108 | data = self.transform(path) 109 | label = getattr(self.files[idx], self.target_key) 110 | 111 | return data, label 112 | 113 | 114 | if __name__ == "__main__": 115 | pass 116 | -------------------------------------------------------------------------------- /tests/test_parser.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact engineer@adansons.co.jp 5 | 6 | import os 7 | import sys 8 | import pytest 9 | 10 | sys.path.append(os.path.dirname(os.path.dirname(__file__))) 11 | 12 | from base.parser import Parser 13 | 14 | 15 | INPUT_PATH1 = "/Origin/左声帯嚢胞/1-055-E-a01.wav" 16 | PARSING_RULE1 = "{_}/{disease}/{_}-{patient-id}-{part}-{iteration}.wav" 17 | PARSING_KEYS1 = ["_", "disease", "_", "patient-id", "part", "iteration", "[UnuseToken]"] 18 | PARSED_DICT1 = { 19 | "disease": "左声帯嚢胞", 20 | "patient-id": "055", 21 | "part": "E", 22 | "iteration": "a01", 23 | } 24 | 25 | INPUT_PATH2 = "/Origin/hoge1/fugasuzukipiyo_03.csv" 26 | PARSING_RULE2 = "{_}/hoge{num1}/fuga{name}piyo_{month}.csv" 27 | CONVERTED_PARSING_RULE2 = "{_}/{[UnuseToken]}/{num1}/{[UnuseToken]}/{name}/{[UnuseToken]}_{month}.{[UnuseToken]}" 28 | PARSING_KEYS2 = [ 29 | "_", 30 | "[UnuseToken]", 31 | "num1", 32 | "[UnuseToken]", 33 | "name", 34 | "[UnuseToken]", 35 | "month", 36 | "[UnuseToken]", 37 | ] 38 | PARSED_DICT2 = {"num1": "1", "name": "suzuki", "month": "03"} 39 | 40 | INPUT_PATH3 = "/Origin/hoge1/fugasuzukipiyo_2022_03_02.csv" 41 | PARSING_RULE3 = "{_}/hoge{num1}/fuga{name}piyo_{timestamp}.csv" 42 | DETAIL_PARSING_RULE = "{Origin}/hoge{1}/fuga{suzuki}piyo_{2022_03_02}.csv" 43 | PARSING_KEYS3 = [ 44 | "_", 45 | "[UnuseToken]", 46 | "num1", 47 | "[UnuseToken]", 48 | "name", 49 | "[UnuseToken]", 50 | "timestamp", 51 | "[UnuseToken]", 52 | ] 53 | PARSED_DICT3 = {"num1": "1", "name": "suzuki", "timestamp": "2022_03_02"} 54 | 55 | 56 | def test_generate_parser_pattern1(): 57 | parser = Parser(PARSING_RULE1, extension="wav") 58 | assert parser.parsing_keys == PARSING_KEYS1 59 | 60 | 61 | def test_parse_pattern1(): 62 | parser = Parser(PARSING_RULE1, extension="wav") 63 | parsed_dict = parser(INPUT_PATH1) 64 | assert parsed_dict == PARSED_DICT1 65 | 66 | 67 | def test_generate_parser_pattern2(): 68 | parser = Parser(PARSING_RULE2, extension="csv") 69 | assert parser.parsing_keys == PARSING_KEYS2 70 | assert parser.parsing_rule == CONVERTED_PARSING_RULE2 71 | 72 | 73 | def test_parse_pattern2(): 74 | parser = 
Parser(PARSING_RULE2, extension="csv")
75 |     parsed_dict = parser(INPUT_PATH2)
76 |     assert parsed_dict == PARSED_DICT2
77 | 
78 | 
79 | def test_generate_parser_pattern3():
80 |     parser = Parser(PARSING_RULE3, extension="csv")
81 |     parser.update_rule(DETAIL_PARSING_RULE)
82 |     assert parser.parsing_keys == PARSING_KEYS3
83 | 
84 | 
85 | def test_parse_pattern3():
86 |     parser = Parser(PARSING_RULE3, extension="csv")
87 |     parser.update_rule(DETAIL_PARSING_RULE)
88 |     parsed_dict = parser(INPUT_PATH3)
89 |     assert parsed_dict == PARSED_DICT3
90 | 
91 | 
92 | def test_is_path_parsable():
93 |     parser = Parser(PARSING_RULE3, extension="csv")
94 | 
95 |     # Check for `IndexError` when parsing the file path with an invalid `parsing_rule`.
96 |     with pytest.raises(Exception) as e:
97 |         _ = parser(INPUT_PATH3)
98 |     assert str(e.value) == "list index out of range"
99 | 
100 |     # Test `is_path_parsable` with a non-parsable path.
101 |     parsable_flag = parser.is_path_parsable(INPUT_PATH3)
102 |     assert parsable_flag == False
103 | 
104 | 
105 | if __name__ == "__main__":
106 |     test_generate_parser_pattern1()
107 |     test_parse_pattern1()
108 |     test_generate_parser_pattern2()
109 |     test_parse_pattern2()
110 |     test_generate_parser_pattern3()
111 |     test_parse_pattern3()
112 |     test_is_path_parsable()
113 | 
--------------------------------------------------------------------------------
/tests/test_config.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import time
6 | from click.testing import CliRunner
7 | from base.cli import (
8 |     list_project,
9 |     remove_project,
10 | )
11 | from base.config import (
12 |     get_user_id,
13 |     register_user_id,
14 |     get_access_key,
15 |     register_access_key,
16 |     get_project_uid,
17 |     check_project_exists,
18 |     register_project_uid,
19 |     delete_project_config,
20 |     update_project_info,
21 |     get_user_id_from_db,
22 | )
23 | from base.project import create_project
24 | 
25 | PROJECT_NAME = "adansons_test_project"
26 | 
27 | 
28 | def test_initialize():
29 |     """
30 |     If something went wrong in a past test session,
31 |     you may have existing tables left over, so clear them before running the tests below.
32 | """ 33 | runner = CliRunner() 34 | result = runner.invoke(list_project, []) 35 | if PROJECT_NAME in result.output: 36 | result = runner.invoke(remove_project, [PROJECT_NAME]) 37 | 38 | result = runner.invoke(list_project, ["--archived"]) 39 | if PROJECT_NAME in result.output: 40 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 41 | 42 | 43 | def test_get_access_key(): 44 | get_access_key() 45 | 46 | 47 | def test_register_access_key(): 48 | access_key = get_access_key() 49 | register_access_key(access_key) 50 | 51 | 52 | def test_get_user_id_from_db(): 53 | access_key = get_access_key() 54 | get_user_id_from_db(access_key) 55 | 56 | 57 | def test_register_user_id(): 58 | access_key = get_access_key() 59 | user_id = get_user_id_from_db(access_key) 60 | register_user_id(user_id) 61 | 62 | 63 | def test_get_user_id(): 64 | get_user_id() 65 | 66 | 67 | def test_register_project(): 68 | user_id = get_user_id() 69 | create_project(user_id, PROJECT_NAME) 70 | time.sleep(20) 71 | 72 | 73 | def test_check_project_exists(): 74 | user_id = get_user_id() 75 | assert check_project_exists(user_id, PROJECT_NAME) 76 | 77 | 78 | def test_get_project_uid(): 79 | user_id = get_user_id() 80 | get_project_uid(user_id, PROJECT_NAME) 81 | 82 | 83 | def test_register_project_uid(): 84 | user_id = get_user_id() 85 | project_uid = get_project_uid(user_id, PROJECT_NAME) 86 | register_project_uid(user_id, PROJECT_NAME, project_uid) 87 | 88 | 89 | def test_update_project_info(): 90 | user_id = get_user_id() 91 | update_project_info(user_id) 92 | 93 | 94 | def test_archive_project(): 95 | runner = CliRunner() 96 | result = runner.invoke(remove_project, [PROJECT_NAME]) 97 | assert result.exit_code == 0 98 | assert f"{PROJECT_NAME} was Archived" in result.output 99 | result = runner.invoke(list_project, []) 100 | assert result.exit_code == 0 101 | assert PROJECT_NAME not in result.output 102 | 103 | 104 | # How to test delete project config itself? 105 | def test_delete_project(): 106 | runner = CliRunner() 107 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 108 | assert result.exit_code == 0 109 | assert f"{PROJECT_NAME} was Deleted" in result.output 110 | result = runner.invoke(list_project, ["--archived"]) 111 | assert result.exit_code == 0 112 | assert PROJECT_NAME not in result.output 113 | 114 | 115 | if __name__ == "__main__": 116 | test_initialize() 117 | test_get_access_key() 118 | test_register_access_key() 119 | test_get_user_id_from_db() 120 | test_register_user_id() 121 | test_get_user_id() 122 | test_check_project_exists() 123 | test_get_project_uid() 124 | test_register_project_uid() 125 | test_update_project_info() 126 | test_archive_project() 127 | test_delete_project() 128 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity 10 | and orientation. 
11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at 63 | Email (support@adansons.co.jp). 64 | All complaints will be reviewed and investigated promptly and fairly. 65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series 86 | of actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. 
No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or 93 | permanent ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within 113 | the community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.0, available at 119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 120 | 121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 122 | enforcement ladder](https://github.com/mozilla/diversity). 123 | 124 | [homepage]: https://www.contributor-covenant.org 125 | 126 | For answers to common questions about this code of conduct, see the FAQ at 127 | https://www.contributor-covenant.org/faq. Translations are available at 128 | https://www.contributor-covenant.org/translations. 129 | -------------------------------------------------------------------------------- /base/config.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact engineer@adansons.co.jp 5 | import os 6 | import json 7 | import time 8 | import requests 9 | import configparser 10 | 11 | from base.spinner import Spinner 12 | 13 | CONFIG_FILE = os.path.join(os.path.expanduser("~"), ".base", "config") 14 | PROJECT_FILE = os.path.join(os.path.expanduser("~"), ".base", "projects") 15 | LINKER_DIR = os.path.join(os.path.expanduser("~"), ".base", "linker") 16 | 17 | HEADER = {"Content-Type": "application/json"} 18 | BASE_API_ENDPOINT = os.environ.get( 19 | "BASE_API_ENDPOINT", "https://api.base.adansons.co.jp" 20 | ) 21 | 22 | 23 | def get_user_id() -> str: 24 | """ 25 | Get user id from config file. 26 | if you have 'BASE_USER_ID' on environment variables, Base will use it 27 | 28 | Returns 29 | ------- 30 | user_id : str 31 | aquired user id from environment variable or config file 32 | """ 33 | user_id = os.environ.get("BASE_USER_ID", None) 34 | if user_id is None: 35 | config = configparser.ConfigParser() 36 | config.read(CONFIG_FILE) 37 | 38 | user_id = config["default"]["user_id"] 39 | 40 | return user_id 41 | 42 | 43 | def register_user_id(user_id: str) -> None: 44 | """ 45 | Register user id to local config file. 
46 | 
47 |     Parameters
48 |     ----------
49 |     user_id : str
50 |         target user id
51 |     """
52 |     config = configparser.ConfigParser()
53 |     config.read(CONFIG_FILE)
54 | 
55 |     config["default"].update({"user_id": user_id})
56 |     with open(CONFIG_FILE, "w") as f:
57 |         config.write(f)
58 | 
59 | 
60 | def get_access_key() -> str:
61 |     """
62 |     Get the access key from the config file.
63 |     If 'BASE_ACCESS_KEY' is set in the environment variables, Base will use it.
64 | 
65 |     Returns
66 |     -------
67 |     access_key : str
68 |         acquired API access key from environment variable or config file
69 |     """
70 |     access_key = os.environ.get("BASE_ACCESS_KEY", None)
71 |     if access_key is None:
72 |         config = configparser.ConfigParser()
73 |         config.read(CONFIG_FILE)
74 | 
75 |         access_key = config["default"]["access_key"]
76 | 
77 |     return access_key
78 | 
79 | 
80 | def register_access_key(access_key: str) -> None:
81 |     """
82 |     Register the access key to the local config file.
83 | 
84 |     Parameters
85 |     ----------
86 |     access_key : str
87 |         API access key
88 |     """
89 |     config = configparser.ConfigParser()
90 |     config.read(CONFIG_FILE)
91 | 
92 |     config["default"] = {"access_key": access_key}
93 |     os.makedirs(os.path.dirname(CONFIG_FILE), exist_ok=True)
94 |     with open(CONFIG_FILE, "w") as f:
95 |         config.write(f)
96 | 
97 | 
98 | def get_project_uid(user_id: str, project_name: str) -> str:
99 |     """
100 |     Get the project uid from a project name.
101 | 
102 |     Parameters
103 |     ----------
104 |     user_id : str
105 |         user id
106 |     project_name : str
107 |         target project name
108 | 
109 |     Returns
110 |     -------
111 |     project_uid : str
112 |         project uid of the given project name
113 |     """
114 |     config = configparser.ConfigParser()
115 |     config.read(PROJECT_FILE)
116 | 
117 |     is_exist = check_project_exists(user_id, project_name)
118 |     if not is_exist:
119 |         raise KeyError(f"Project {project_name} does not exist.")
120 |     else:
121 |         project_uid = config[user_id][project_name]
122 |         return project_uid
123 | 
124 | 
125 | def check_project_exists(user_id: str, project_name: str) -> bool:
126 |     """
127 |     Check whether the project already exists
128 | 
129 |     Parameters
130 |     ----------
131 |     user_id : str
132 |         user id
133 |     project_name : str
134 |         target project name
135 | 
136 |     Returns
137 |     -------
138 |     project_exists : bool
139 |         whether the project already exists
140 |     """
141 |     config = configparser.ConfigParser()
142 |     config.read(PROJECT_FILE)
143 | 
144 |     project_exists = project_name in config[user_id]
145 | 
146 |     return project_exists
147 | 
148 | 
149 | def check_project_available(user_id: str, project_id: str) -> None:
150 |     """
151 |     Check whether the project's tables are available.
152 | 
153 |     Parameters
154 |     ----------
155 |     user_id : str
156 |         user id
157 |     project_id : str
158 |         target project uid
159 |     """
160 |     access_key = get_access_key()
161 |     HEADER.update({"x-api-key": access_key})
162 | 
163 |     with Spinner(text="Creating the project, please wait..."):
164 |         is_available = False
165 |         while not is_available:
166 |             url = (
167 |                 f"{BASE_API_ENDPOINT}/project/{project_id}/tables/status?user={user_id}"
168 |             )
169 |             res = requests.get(url, headers=HEADER)
170 | 
171 |             if res.status_code != 200:
172 |                 raise Exception("Something went wrong. Please try again.")
173 | 
174 |             status = res.json()["Status"]
175 |             if status == "Creating":
176 |                 time.sleep(1)
177 |             else:
178 |                 is_available = True
179 | 
180 | 
181 | def register_project_uid(user_id: str, project: str, project_uid: str) -> None:
182 |     """
183 |     Register the project uid to the local config file.
184 | 185 | Parameters 186 | ---------- 187 | user_id : str 188 | user id 189 | project : str 190 | target project name 191 | project_uid : str 192 | target project uid 193 | """ 194 | config = configparser.ConfigParser() 195 | config.read(PROJECT_FILE) 196 | 197 | if config.has_section(user_id): 198 | config[user_id][project] = project_uid 199 | else: 200 | config[user_id] = {project: project_uid} 201 | with open(PROJECT_FILE, "w") as f: 202 | config.write(f) 203 | 204 | 205 | def delete_project_config(user_id: str, project_name: str) -> None: 206 | """ 207 | Delete config of specified project. 208 | 209 | Parameters 210 | ---------- 211 | user_id : str 212 | user id 213 | project_name : str 214 | target project name 215 | """ 216 | config = configparser.ConfigParser() 217 | config.read(PROJECT_FILE) 218 | 219 | config.remove_option(user_id, project_name) 220 | with open(PROJECT_FILE, "w") as f: 221 | config.write(f) 222 | 223 | 224 | def update_project_info(user_id: str) -> None: 225 | """ 226 | Update local project info with remote. 227 | 228 | Parameters 229 | ---------- 230 | user_id : str 231 | target user id 232 | """ 233 | config = configparser.ConfigParser() 234 | config.read(PROJECT_FILE) 235 | 236 | config.remove_section(user_id) 237 | 238 | access_key = get_access_key() 239 | HEADER.update({"x-api-key": access_key}) 240 | 241 | url = f"{BASE_API_ENDPOINT}/projects?user={user_id}" 242 | res = requests.get(url, headers=HEADER) 243 | if res.status_code != 200: 244 | raise ValueError("Invalid user configuration") 245 | projects = res.json()["Projects"] 246 | 247 | url += "&archived=1" 248 | res = requests.get(url, headers=HEADER) 249 | if res.json()["Projects"]: 250 | projects.extend(res.json()["Projects"]) 251 | 252 | project_info = {} 253 | for project in projects: 254 | project_name = project["ProjectName"] 255 | project_uid = project["ProjectUid"] 256 | project_info[project_name] = project_uid 257 | 258 | config[user_id] = project_info 259 | with open(PROJECT_FILE, "w") as f: 260 | config.write(f) 261 | 262 | 263 | def get_user_id_from_db(access_key: str) -> str: 264 | """ 265 | Get user id from remote db. 266 | 267 | Parameters 268 | ---------- 269 | access_key : str 270 | API access key saved in config file 271 | """ 272 | url = f"{BASE_API_ENDPOINT}/user/id" 273 | res = requests.get(url, data=json.dumps({"api_key": access_key}), headers=HEADER) 274 | 275 | if res.status_code != 200: 276 | raise ValueError( 277 | "Incorrect access key was specified. Please retry or ask support team via Slack. \nIf you have not joined our Slack yet, get your invite here!\n-> https://share.hsforms.com/16OxTF7eJRPK92oGCny7nGw8moen\n" 278 | ) 279 | user_id = res.json()["user_id"] 280 | 281 | return user_id 282 | 283 | 284 | if __name__ == "__main__": 285 | pass 286 | -------------------------------------------------------------------------------- /tests/test_project.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 
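# --- added usage sketch (editorial, not part of the original file) ---
# A rough illustration of how the base.config helpers above chain together on
# first-time setup; the key value is a placeholder, and the calls hit the
# remote API, so this is left commented out:
#
#     from base.config import register_access_key, get_user_id_from_db, register_user_id
#
#     register_access_key("YOUR-ACCESS-KEY")            # persist the key to ~/.base/config
#     user_id = get_user_id_from_db("YOUR-ACCESS-KEY")  # resolve the user id via the API
#     register_user_id(user_id)                         # cache the user id locally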
4 | # Please contact engineer@adansons.co.jp 5 | import os 6 | import time 7 | from click.testing import CliRunner 8 | from base.cli import ( 9 | list_project, 10 | remove_project, 11 | ) 12 | from base.config import ( 13 | get_access_key, 14 | delete_project_config, 15 | get_user_id_from_db, 16 | ) 17 | from base.project import ( 18 | create_project, 19 | get_projects, 20 | archive_project, 21 | delete_project, 22 | Project, 23 | summarize_keys_information, 24 | ) 25 | 26 | PROJECT_NAME = "adansons_test_project" 27 | USER_ID = get_user_id_from_db(get_access_key()) 28 | INVITE_USER_ID = "test_invite@adansons.co.jp" 29 | TESTS_DIR = os.path.dirname(__file__) 30 | TEST_METADATA_SUMMARY = [ 31 | { 32 | "LowerValue": "0", 33 | "EditorList": ["xxxx@yyy.com"], 34 | "Creator": "xxxx@yyy.com", 35 | "ValueHash": "6dd1c6ef359fc0290897273dfee97dd6d1f277334b9a53f07056500409fd0f3a", 36 | "LastEditor": "xxxx@yyy.com", 37 | "UpperValue": "59999", 38 | "ValueType": "str", 39 | "CreatedTime": "1651429889.986235", 40 | "LastModifiedTime": "1651430744.0796146", 41 | "KeyHash": "a56145270ce6b3bebd1dd012b73948677dd618d496488bc608a3cb43ce3547dd", 42 | "KeyName": "id", 43 | "RecordedCount": 70000, 44 | }, 45 | { 46 | "LowerValue": "0", 47 | "EditorList": ["xxxx@yyy.com"], 48 | "Creator": "xxxx@yyy.com", 49 | "ValueHash": "6dd1c6ef359fc0290897273dfee97dd6d1f277334b9a53f07056500409fd0f3a", 50 | "LastEditor": "xxxx@yyy.com", 51 | "UpperValue": "59999", 52 | "ValueType": "int", 53 | "CreatedTime": "1651429889.986235", 54 | "LastModifiedTime": "1651430744.0796146", 55 | "KeyHash": "a56145270ce6b3bebd1dd012b73948677dd618d496488bc608a3cb43ce3547dd", 56 | "KeyName": "index", 57 | "RecordedCount": 70000, 58 | }, 59 | { 60 | "LowerValue": "0or6", 61 | "EditorList": ["xxxx@yyy.com"], 62 | "Creator": "xxxx@yyy.com", 63 | "ValueHash": "665c5c8dca33d1e21cbddcf524c7d8e19ec4b6b1576bbb04032bdedd8e79d95a", 64 | "LastEditor": "xxxx@yyy.com", 65 | "UpperValue": "-1", 66 | "ValueType": "str", 67 | "CreatedTime": "1651430744.0796146", 68 | "LastModifiedTime": "1651430744.0796146", 69 | "KeyHash": "34627e3242f2ca21f540951cb5376600aebba58675654dd5f61e860c6948bffa", 70 | "KeyName": "correction", 71 | "RecordedCount": 74, 72 | }, 73 | { 74 | "LowerValue": "0", 75 | "EditorList": ["xxxx@yyy.com"], 76 | "Creator": "xxxx@yyy.com", 77 | "ValueHash": "0c2fb8f0d59d60a0a5e524c7794d1cf091a377e5c0d3b2cf19324432562555e1", 78 | "LastEditor": "xxxx@yyy.com", 79 | "UpperValue": "9", 80 | "ValueType": "str", 81 | "CreatedTime": "1651429889.986235", 82 | "LastModifiedTime": "1651430744.0796146", 83 | "KeyHash": "1aca80e8b55c802f7b43740da2990e1b5735bbb323d93eb5ebda8395b04025e2", 84 | "KeyName": "label", 85 | "RecordedCount": 70000, 86 | }, 87 | { 88 | "LowerValue": "0", 89 | "EditorList": ["xxxx@yyy.com"], 90 | "Creator": "xxxx@yyy.com", 91 | "ValueHash": "0c2fb8f0d59d60a0a5e524c7794d1cf091a377e5c0d3b2cf19324432562555e1", 92 | "LastEditor": "xxxx@yyy.com", 93 | "UpperValue": "9", 94 | "ValueType": "int", 95 | "CreatedTime": "1651429889.986235", 96 | "LastModifiedTime": "1651430744.0796146", 97 | "KeyHash": "1aca80e8b55c802f7b43740da2990e1b5735bbb323d93eb5ebda8395b04025e2", 98 | "KeyName": "originalLabel", 99 | "RecordedCount": 70000, 100 | }, 101 | { 102 | "LowerValue": "test", 103 | "EditorList": ["xxxx@yyy.com"], 104 | "Creator": "xxxx@yyy.com", 105 | "ValueHash": "0e546bb01e2c9a9d1c388fca8ce3fabdde16084aee10c58becd4767d39f62ab7", 106 | "LastEditor": "xxxx@yyy.com", 107 | "UpperValue": "train", 108 | "ValueType": "str", 109 | "CreatedTime": 
"1651429889.986235", 110 | "LastModifiedTime": "1651430744.0796146", 111 | "KeyHash": "9c98c4cbd490df10e7dc42f441c72ef835e3719d147241e32b962a6ff8c1f49d", 112 | "KeyName": "dataType", 113 | "RecordedCount": 70000, 114 | }, 115 | ] 116 | TEST_SUMMARY_OUTPUT = { 117 | "MaxRecordedCount": 70000, 118 | "UniqueKeyCount": 4, 119 | "MaxCharCount": { 120 | "KEY NAME": 23, 121 | "VALUE RANGE": 12, 122 | "VALUE TYPE": 34, 123 | "RECORDED COUNT": 14, 124 | }, 125 | "Keys": [ 126 | ("KEY NAME", "VALUE RANGE", "VALUE TYPE", "RECORDED COUNT"), 127 | ("'id','index'", "0 ~ 59999", "str('id'), int('index')", "70000"), 128 | ("'correction'", "0or6 ~ -1", "str('correction')", "74"), 129 | ( 130 | "'label','originalLabel'", 131 | "0 ~ 9", 132 | "str('label'), int('originalLabel')", 133 | "70000", 134 | ), 135 | ("'dataType'", "test ~ train", "str('dataType')", "70000"), 136 | ], 137 | } 138 | 139 | 140 | def test_initialize(): 141 | """ 142 | If something went wrong past test session. 143 | You may have exsiting tables, so you have to clear them before below tests. 144 | """ 145 | runner = CliRunner() 146 | result = runner.invoke(list_project, []) 147 | if PROJECT_NAME in result.output: 148 | result = runner.invoke(remove_project, [PROJECT_NAME]) 149 | 150 | result = runner.invoke(list_project, ["--archived"]) 151 | if PROJECT_NAME in result.output: 152 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 153 | 154 | 155 | def test_create_project(): 156 | create_project(USER_ID, PROJECT_NAME) 157 | time.sleep(20) 158 | 159 | 160 | def test_get_projects(): 161 | project_list = get_projects(USER_ID) 162 | assert any([project["ProjectName"] == PROJECT_NAME for project in project_list]) 163 | 164 | 165 | def test_add_datafiles(): 166 | project = Project(PROJECT_NAME) 167 | dir_path = TESTS_DIR 168 | extension = "jpeg" 169 | parsing_rule = "{_}/{title}.jpeg" 170 | file_num = project.add_datafiles(dir_path, extension, parsing_rule=parsing_rule) 171 | assert file_num == 1 172 | 173 | 174 | def test_add_datafile(): 175 | project = Project(PROJECT_NAME) 176 | file_path = os.path.join(TESTS_DIR, "data", "sample.jpeg") 177 | attributes = {"title": "sample"} 178 | file_num = project.add_datafile(file_path, attributes) 179 | 180 | 181 | def test_extract_metafile(): 182 | project = Project(PROJECT_NAME) 183 | file_path = os.path.join(TESTS_DIR, "data", "sample.xlsx") 184 | project.extract_metafile(file_path) 185 | 186 | 187 | def test_estimate_join_rule(): 188 | project = Project(PROJECT_NAME) 189 | file_path = os.path.join(TESTS_DIR, "data", "sample.xlsx") 190 | project.estimate_join_rule(file_path=file_path) 191 | 192 | 193 | def test_add_metafile(): 194 | project = Project(PROJECT_NAME) 195 | file_path = [os.path.join(TESTS_DIR, "data", "sample.xlsx")] 196 | project.add_metafile(file_path, auto=True) 197 | 198 | 199 | def test_get_metadata_summary(): 200 | project = Project(PROJECT_NAME) 201 | project.get_metadata_summary() 202 | 203 | 204 | def test_link_datafiles(): 205 | project = Project(PROJECT_NAME) 206 | dir_path = TESTS_DIR 207 | extension = "jpeg" 208 | project.link_datafiles(dir_path, extension) 209 | 210 | 211 | def test_add_member(): 212 | project = Project(PROJECT_NAME) 213 | project.add_member(INVITE_USER_ID, "Editor") 214 | 215 | 216 | def test_update_member(): 217 | project = Project(PROJECT_NAME) 218 | project.update_member(INVITE_USER_ID, "Admin") 219 | 220 | 221 | def test_get_members(): 222 | project = Project(PROJECT_NAME) 223 | project.get_members() 224 | 225 | 226 | def 
227 |     project = Project(PROJECT_NAME)
228 |     project.remove_member(INVITE_USER_ID)
229 | 
230 | 
231 | def test_archive_project():
232 |     archive_project(USER_ID, PROJECT_NAME)
233 | 
234 | 
235 | def test_delete_project():
236 |     delete_project(USER_ID, PROJECT_NAME)
237 |     delete_project_config(USER_ID, PROJECT_NAME)
238 | 
239 | 
240 | def test_summarize_keys_information():
241 |     result = summarize_keys_information(TEST_METADATA_SUMMARY)
242 |     assert result == TEST_SUMMARY_OUTPUT
243 | 
244 | 
245 | if __name__ == "__main__":
246 |     test_initialize()
247 |     test_create_project()
248 |     test_get_projects()
249 |     test_add_datafiles()
250 |     test_add_datafile()
251 |     test_extract_metafile()
252 |     test_estimate_join_rule()
253 |     test_add_metafile()
254 |     test_get_metadata_summary()
255 |     test_link_datafiles()
256 |     test_add_member()
257 |     test_update_member()
258 |     test_get_members()
259 |     test_remove_member()
260 |     test_archive_project()
261 |     test_delete_project()
262 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Adansons Base Document
2 | 
3 | - [Product Concept](#product-concept)
4 | - [0. Get Access Key](#0-get-access-key)
5 | - [1. Installation](#1-installation)
6 | - [2. Configuration](#2-configuration)
7 |   - [2.1 with CLI](#21-with-cli)
8 |   - [2.2 Environment Variables](#22-environment-variables)
9 | - [3. Tutorial 1: Organize metadata and Create a dataset](#3-tutorial-1-organize-metadata-and-create-a-dataset)
10 |   - [Step 0. prepare sample dataset](#step-0-prepare-sample-dataset)
11 |   - [Step 1. create a new project](#step-1-create-a-new-project)
12 |   - [Step 2. import data files](#step-2-import-data-files)
13 |   - [Step 3. import external metadata files](#step-3-import-external-metadata-files)
14 |   - [Step 4. filter and export dataset with CLI](#step-4-filter-and-export-dataset-with-cli)
15 |   - [Step 5. filter and export dataset with Python SDK](#step-5-filter-and-export-dataset-with-python-sdk)
16 | - [4. API Reference](#4-api-reference)
17 |   - [4.1 Command Reference](#41-command-reference)
18 |   - [4.2 Python Reference](#42-python-reference)
19 | 
20 | 
21 | ## Product Concept
22 | - Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets.
23 | - It makes dataset creation more effective, helps find essential insights from training results, and improves AI performance.
24 | 
25 | More detail
26 | ↓↓↓
27 | 
28 | - Medium
29 |   - https://medium.com/@KenichiHiguchi/3-things-you-need-to-deal-with-in-data-management-to-create-best-dataset-781177507fc2
30 | - Product Page
31 |   - https://adansons.wraptas.site
32 | 
33 | ---
34 | ## 0. Get Access Key
35 | 
36 | Type your email into the form below to join our Slack and get the access key.
37 | 
38 | Invitation Form: https://share.hsforms.com/1KG8Hp2kwSjC6fjVwwlklZA8moen
39 | 
40 | 
41 | ## 1. Installation
42 | 
43 | Adansons Base contains a Command Line Interface (CLI) and a Python SDK, and you can install both with the `pip` command.
44 | 
45 | ```bash
46 | pip install -U pip
47 | pip install adansons-base
48 | ```
49 | 
50 | > Note: if you want to use the CLI from any directory, you have to install the package with the Python interpreter that is installed globally on your computer.
51 | 
52 | ## 2. Configuration
53 | 
54 | ### 2.1 with CLI
55 | 
56 | when you run any Base CLI command for the first time, Base will ask for the access key provided on our Slack.
57 | 
58 | then, Base will verify that the specified access key is correct.
59 | 
60 | if you don't have an access key, please see [0. Get Access Key](#0-get-access-key).
61 | 
62 | this command will show you what projects you have:
63 | 
64 | ```bash
65 | base list
66 | ```
67 | 
68 | Output:
69 | 
70 | ```
71 | Welcome to Adansons Base!!
72 | 
73 | Let's start with your access key provided on our slack.
74 | 
75 | Please register your access key: xxxxxxxxxx
76 | 
77 | Successfully configured as xxxx@yyyy.com
78 | 
79 | projects
80 | ========
81 | ```
82 | 
83 | 
84 | ### 2.2 Environment Variables
85 | 
86 | if you don't want to configure interactively, you can use environment variables for configuration.
87 | 
88 | `BASE_USER_ID` is used for the identification of users; this is the email address you submitted via our form.
89 | 
90 | ```bash
91 | export BASE_ACCESS_KEY=xxxxxxxxxx
92 | export BASE_USER_ID=xxxx@yyyy.com
93 | ```
94 | 
95 | ## 3. Tutorial 1: Organize metadata and Create a dataset
96 | 
97 | let's start the Base tutorial with the mnist dataset.
98 | 
99 | ### Step 0. prepare sample dataset
100 | 
101 | first, install the dependencies for downloading the dataset.
102 | 
103 | ```bash
104 | pip install pypng
105 | ```
106 | 
107 | then, download the script for mnist from our Base repository.
108 | 
109 | ```bash
110 | curl -sSL https://raw.githubusercontent.com/adansons/base/main/download_mnist.py > download_mnist.py
111 | ```
112 | 
113 | run the download_mnist script. you can specify any folder to download into as the last argument (default: "~/dataset/mnist"). if you run this command on Windows, please replace it with a Windows path like "C:\dataset\mnist".
114 | 
115 | ```bash
116 | python3 ./download_mnist.py ~/dataset/mnist
117 | ```
118 | 
119 | > Note: Base can link the data files wherever you put them on your local computer. So if you already downloaded the mnist dataset, you can use it as is.
120 | 
121 | after downloading, you can see data files in ~/dataset/mnist.
122 | 
123 | ```
124 | ~
125 | └── dataset
126 |     └── mnist
127 |         ├── train
128 |         │   ├── 0
129 |         │   │   ├── 1.png
130 |         │   │   ├── ...
131 |         │   │   └── 59987.png
132 |         │   ├── ...
133 |         │   └── 9
134 |         └── test
135 |             ├── 0
136 |             └── ...
137 | ```
138 | 
139 | ### Step 1. create a new project
140 | 
141 | create the mnist project with the [base new](/docs/CLI.md#new) command.
142 | 
143 | ```bash
144 | base new mnist
145 | ```
146 | 
147 | Output:
148 | 
149 | ```
150 | Your Project UID
151 | ----------------
152 | abcdefghij0123456789
153 | 
154 | save Project UID in local file (~/.base/projects)
155 | ```
156 | 
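for reference, `~/.base/projects` is a plain INI file handled with Python's `configparser`: one section per user ID, mapping each project name to its Project UID. the values below are illustrative; your UID will differ.

```
[xxxx@yyyy.com]
mnist = abcdefghij0123456789
```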
157 | 
158 | Base will issue a Project Unique ID and automatically save it in a local file.
159 | 
160 | ### Step 2. import data files
161 | 
162 | after step 0, you have many png image files in the "~/dataset/mnist" directory.
163 | 
164 | let's upload the metadata related to their paths into the mnist project with the `base import` command.
165 | 
166 | ```bash
167 | base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"
168 | ```
169 | 
170 | > Note: if you changed the download folder, please replace "~/dataset/mnist" in the above command.
171 | 
172 | Output:
173 | 
174 | ```
175 | Check datafiles...
176 | found 70000 files with png extension.
177 | Success!
178 | ```
179 | 
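you can also do the same import from Python. the sketch below uses `Project.add_datafiles` as it is exercised in this repository's test suite; the directory path is illustrative, and we assume the return value is the number of registered files (as the tests assert).

```python
import os

from base import Project

# equivalent of `base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"`
project = Project("mnist")
file_num = project.add_datafiles(
    os.path.expanduser("~/dataset/mnist"),       # directory to scan
    "png",                                       # target file extension
    parsing_rule="{dataType}/{label}/{id}.png",  # how each path maps to metadata keys
)
print(file_num)  # 70000 for the full mnist download
```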
180 | 
181 | ### Step 3. import external metadata files
182 | 
183 | if you have external metadata files, you can integrate them into the existing project database with the `--external-file` option.
184 | 
185 | this time, we use `wrongImagesInMNISTTestset.csv`, published on GitHub by youkaichao.
186 | 
187 | [https://github.com/youkaichao/mnist-wrong-test](https://github.com/youkaichao/mnist-wrong-test)
188 | 
189 | this is extra metadata that corrects wrong labels in the mnist test dataset.
190 | 
191 | you can evaluate your model more strictly and correctly by using this extra metadata with Base.
192 | 
193 | download the external CSV
194 | 
195 | ```bash
196 | curl -SL https://raw.githubusercontent.com/youkaichao/mnist-wrong-test/master/wrongImagesInMNISTTestset.csv > ~/Downloads/wrongImagesInMNISTTestset.csv
197 | ```
198 | 
199 | ```bash
200 | base import mnist --external-file --path ~/Downloads/wrongImagesInMNISTTestset.csv -a dataType:test
201 | ```
202 | 
203 | Output:
204 | 
205 | ```
206 | 1 tables found!
207 | now estimating the rule for table joining...
208 | 
209 | 1 table joining rule was estimated!
210 | Below table joining rule will be applied...
211 | 
212 | Rule no.1
213 | 
214 | key 'index' -> connected to 'id' key on exist table
215 | key 'originalLabel' -> connected to 'label' key on exist table
216 | key 'correction' -> newly added
217 | 
218 | 1 tables will be applied
219 | Table 1 sample record:
220 | {'index': 8, 'originalLabel': 5, 'correction': '-1'}
221 | 
222 | Do you want to perform table join?
223 | Base will join tables with that rule described above.
224 | 
225 | 'y' will be accepted to approve.
226 | 
227 | Enter a value: y
228 | Success!
229 | ```
230 | 
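this join can be scripted as well. a minimal sketch using `Project.add_metafile` as it appears in the test suite; we assume `auto=True` corresponds to the CLI's `--auto-approve` behavior, and the CLI's `-a key:value` extra-attribute option has no equivalent shown here.

```python
import os

from base import Project

project = Project("mnist")
# estimate the table-join rule and apply it without the interactive prompt
project.add_metafile(
    [os.path.expanduser("~/Downloads/wrongImagesInMNISTTestset.csv")],  # list of external metadata files
    auto=True,  # assumed to auto-approve the estimated join rule, like --auto-approve
)
```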
231 | 
232 | ### Step 4. filter and export dataset with CLI
233 | 
234 | now, we are ready to create a dataset.
235 | 
236 | let's pick up a part of the data files, those with label 1, 2, or 3 in the training split, from the mnist project with the `base search` command.
237 | 
238 | you can use the `--conditions` option for a magical search filter and the `--query` option for an advanced filter.
239 | 
240 | Note that the `--query` option can only use values for searching.
241 | 
242 | 
243 | 
244 | Be careful: you may get a very large output on your console without the `-s, --summary` option.
245 | 
246 | The `--query` option's grammar is below.
247 | 
248 | `--query {KeyName} {Operator} {Values}`
249 | 
250 | - add 1 space between each section
251 | - **don't use spaces anywhere else**
252 | 
253 | You can use the operators below in the query option.
254 | 
255 | [operators]
256 | ```
257 | ==     : equal
258 | !=     : not equal
259 | >=     : greater than or equal
260 | <=     : less than or equal
261 | >      : greater than
262 | <      : less than
263 | in     : contained in the list of Values
264 | not in : not contained in the list of Values
265 | ```
266 | 
267 | (check [search docs](/docs/CLI.md#search) for more information).
268 | 
269 | ```bash
270 | base search mnist --conditions "train" --query "label in ['1','2','3']"
271 | ```
272 | 
273 | > Note: when you want to use an `in` or `not in` query, you have to specify each component of the list as a string, without spaces, like `"['1','2','3']"`.
274 | >
275 | 
276 | Output:
277 | 
278 | ```
279 | 18831 files
280 | ========
281 | '/home/xxxx/dataset/mnist/train/1/42485.png'
282 | ...
283 | ```
284 | 
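a single `--query` cannot express an OR across two different filters, but the Python SDK can: `Files` objects returned by `project.files()` support `|` as a deduplicated union, so you can combine two searches. a short sketch (the resulting count depends on your project):

```python
from base import Project

project = Project("mnist")
# label == 1 OR correction == -1, combined from two separate searches
ones = project.files(conditions="train", query=["label == 1"])
hard = project.files(conditions="test", query=["correction == -1"])
union = ones | hard  # `|` merges the two result sets and drops duplicate records
print(len(union))
```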
285 | 
286 | > Note: If you specify no conditions or query, Base will return all data files.
287 | 
288 | If you want to use an 'OR search' with the `--query` option, please use our Python SDK, as sketched above.
289 | 
290 | ### Step 5. filter and export dataset with Python SDK
291 | 
292 | in a Python script, you can filter and export datasets easily and simply with the `Project` and `Files` classes. (see [SDK docs](/docs/SDK.md#project-class))
293 | 
294 | (If you don't have the packages below, please install them by using `pip`)
295 | ```bash
296 | pip install numpy pillow torch torchvision
297 | ```
298 | 
299 | ```python
300 | from base import Project, Dataset
301 | import numpy as np
302 | from PIL import Image
303 | 
304 | 
305 | # export the dataset you want to use
306 | project = Project("mnist")
307 | files = project.files(conditions="train", query=["label in ['1','2','3']"])
308 | 
309 | print(files[0])
310 | # this returns a path-like `File` object
311 | # -> '/home/xxxx/dataset/mnist/train/1/12909.png'
312 | print(files[0].label)
313 | # this returns the value of the attribute 'label' of the first `File` object
314 | # -> '1'
315 | 
316 | # function to load an image from a path
317 | # this is necessary if you want to use images in your dataset,
318 | # because the base Dataset class doesn't convert a path into an image
319 | def preprocess_func(path):
320 |     image = Image.open(path)
321 |     image = image.resize((28, 28))
322 |     image = np.array(image)
323 |     return image
324 | 
325 | dataset = Dataset(files, target_key="label", transform=preprocess_func)
326 | 
327 | # you can also use dataset objects like this.
328 | for data, label in dataset:
329 |     # data: the image data as an ndarray
330 |     # label: the label of the image data, like '1'
331 |     pass
332 | 
333 | x_train, x_test, y_train, y_test = dataset.train_test_split(split_rate=0.2)
334 | 
335 | # or use it with torch
336 | import torch
337 | import torchvision.transforms as transforms
338 | from PIL import Image
339 | 
340 | def preprocess_func(path):
341 |     image = transforms.ToTensor()(transforms.Resize((28, 28))(Image.open(path)))
342 |     return image
343 | 
344 | dataset = Dataset(files, target_key="label", transform=preprocess_func)
345 | loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
346 | ```
347 | 
348 | finally, let's try one of the most characteristic use cases of Adansons Base.
349 | 
350 | in the external file you imported in step 3, some mnist test data files are annotated as `"-1"` in the correction column. this means that those files are difficult to classify, even for a human.
351 | 
352 | so, you should exclude those files from your dataset to evaluate your AI models more properly.
353 | 
354 | ```python
355 | # you can exclude files which have "-1" on "correction" with the code below
356 | eval_files = project.files(conditions="test", query=["correction != -1"])
357 | 
358 | print(len(eval_files))
359 | # this returns the number of files matched with the requested conditions or query
360 | # -> 9963
361 | 
362 | eval_dataset = Dataset(eval_files, target_key="label", transform=preprocess_func)
363 | ```
364 | 
365 | ## 4. API Reference
366 | 
367 | ### 4.1 Command Reference
368 | 
369 | [Command Reference](/docs/CLI.md)
370 | 
371 | ### 4.2 Python Reference
372 | 
373 | [Python Reference](/docs/SDK.md)
374 | 
--------------------------------------------------------------------------------
/base/parser.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import re
6 | from typing import List, Optional
7 | 
8 | 
9 | class Parser:
10 |     """
11 |     The class of path parsing.
12 |     Instance Args
13 |     -----------------
14 |     self.sep: str
15 |         Path separator
16 | 
17 |     self.parsing_rule: str
18 |         Path parsing rule
19 | 
20 |     self.splitters: List
21 |         The list of splitters in the parsing rule.
22 |         ex.) `["-", "/", "_"]`
23 | 
24 |     self.split_counts: List
25 |         The list of the number of splitters included in each value of the detailed parsing rule.
26 |         If you input the detailed parsing rule ({Origin}/{2022_04_05}-{dog}_{1}.png),
27 |         then self.split_counts is `[0, 2, 0, 0]`.
28 | 
29 |     self.unuse_strs: List
30 |         The list of unused strings in the parsing rule.
31 |         If you input the parsing rule (hoge{num1}/fuga{num2}.txt),
32 |         then self.unuse_strs is `["hoge", "fuga"]`.
33 |     """
34 | 
35 |     def __init__(
36 |         self, parsing_rule: str, extension: str, sep: Optional[str] = None
37 |     ) -> None:
38 |         """
39 |         Parameters
40 |         ----------
41 |         parsing_rule : str
42 |             specified parsing rule
43 |             ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
44 |             * a phrase in "{}" will be interpreted as a key
45 |             * "{_}" will be ignored
46 |         sep : str, default "/"
47 |             path separator char; this depends on the operating system
48 |         """
49 |         if sep is None:
50 |             self.sep = "/"
51 |         else:
52 |             self.sep = sep
53 | 
54 |         self.parsing_rule = parsing_rule
55 |         self.extension = extension
56 |         self.__generate_parser()
57 | 
58 |     def __generate_parser(self, is_update: bool = False) -> None:
59 |         """
60 |         Generate parsing keys and pattern from the parsing rule
61 |         Parameters (instance vars)
62 |         --------------------------
63 |         self.parsing_rule : str
64 |             specified parsing rule
65 |             ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
66 |             * a phrase in "{}" will be interpreted as a key
67 |             * "{_}" will be ignored
68 |         is_update : bool
69 |             whether to update the parsing rule
70 |         Returns (instance vars)
71 |         -----------------------
72 |         self.parsing_keys : list
73 |             list of keys
74 |             ex.) ["_", "name", "timestamp", "sensor", "condition", "iteration"]
["_", "name", "timestamp", "sensor", "condition", "iteration"] 75 | """ 76 | # 正規表現"(.*)"は任意の文字の任意回数の繰り返し 77 | # "{(.*)}"とすることで、{}に囲まれた文字列を抽出できる 78 | # しかし、複数{}がある時に、どの"{"と"}"の組み合わせを取れば良いかわからない 79 | # "{(.*?)}"と?をつけることで、最小文字数の文字列を囲んだ{}をpickupできる 80 | self.parsing_rule = self.convert_parsable_parse_rule() 81 | 82 | parsing_keys = re.findall("{(.*?)}", self.parsing_rule) 83 | splitters = self.extract_splitter() 84 | 85 | if is_update: 86 | split_counts = self.count_splitter_in_each_key(parsing_keys) 87 | else: 88 | split_counts = [0] * len(parsing_keys) 89 | 90 | self.splitters = splitters 91 | self.split_counts = split_counts 92 | 93 | if not is_update: 94 | self.parsing_keys = parsing_keys 95 | 96 | def __call__(self, path: str) -> dict: 97 | """ 98 | Parse path with generated parser 99 | Parameters 100 | ---------- 101 | path : str 102 | required the file path 103 | Returns 104 | ------- 105 | parsed_dict : dict 106 | meta data dictionary 107 | Example 108 | ------- 109 | >>> from base.parser import Parser 110 | >>> parser = Parser("your parsing rule") 111 | >>> result = parser("your target path for parse") 112 | """ 113 | if path.startswith(self.sep): 114 | path = (self.sep).join(path.split(self.sep)[1:]) 115 | 116 | # ユーザーから取得したparsing_ruleを元に 117 | # hoge/2022-03-14/A-200-A-a-01 から 118 | # {hoge}/{2022-03-14}/{A-200}-{A}-{a-01} に変換する 119 | # "{(.*?)}"で{}内の値を抽出できるので良きみ 120 | path = self.convert_path_to_parsable_format(path) 121 | parsed_values = re.findall("{(.*?)}", path) 122 | 123 | parsed_dict = {} 124 | for key, value in zip(self.parsing_keys, parsed_values): 125 | if key in ["_", "[UnuseToken]"]: 126 | continue 127 | parsed_dict[key] = value 128 | 129 | return parsed_dict 130 | 131 | def update_rule(self, parsing_rule: str) -> None: 132 | """ 133 | Update parsing rule and re-generate parsing keys and pattern 134 | Parameters 135 | ---------- 136 | parsing_rule : str 137 | specified parsing rule 138 | ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv 139 | * phrase in "{}" will be interpretered as key 140 | * "{_}" will be ignored 141 | """ 142 | self.parsing_rule = parsing_rule 143 | self.__generate_parser(is_update=True) 144 | 145 | def is_path_parsable(self, path: str) -> bool: 146 | """ 147 | Check path is parsable 148 | Parameters 149 | --------------------- 150 | path : str 151 | ex) `dataset/dog/2022_12_04-A.png` 152 | Return 153 | --------------------- 154 | parsable_flag : bool 155 | """ 156 | if path.startswith(self.sep): 157 | path = (self.sep).join(path.split(self.sep)[1:]) 158 | 159 | parsable_flag = True 160 | try: 161 | path = self.convert_path_to_parsable_format(path) 162 | except: 163 | parsable_flag = False 164 | 165 | return parsable_flag 166 | 167 | def convert_parsable_parse_rule(self) -> str: 168 | """ 169 | Replace unused strings in `parsing_rule` with `{_}` 170 | 171 | Return 172 | --------------------- 173 | parsable_parsing_rule: str 174 | ex.) `{_}/{num1}/{fuga}/{num2}.txt` 175 | """ 176 | if not self.parsing_rule.endswith("." + self.extension): 177 | self.parsing_rule += "." 
178 | 
179 |         self.unuse_strs = self.extract_unuse_str()
180 | 
181 |         for not_use_str in self.unuse_strs:
182 |             self.parsing_rule = self.parsing_rule.replace(not_use_str, "{[UnuseToken]}")
183 | 
184 |         self.parsing_rule = self.parsing_rule.replace("}{", "}" + self.sep + "{")
185 | 
186 |         parsable_parsing_rule = self.parsing_rule
187 |         return parsable_parsing_rule
188 | 
189 |     def convert_path_to_parsable_format(self, path: str) -> str:
190 |         """
191 |         Convert path to a parsable format
192 |         1. Insert separators before and after unused strings.
193 |         2. Enclose each value to be extracted from the path with `{}`.
194 |             - Add `{` to `parsable_format_path` right after a separator `/`.
195 |             - `}` is added to `parsable_format_path`:
196 |                 - just before a separator is added,
197 |                 - when the number of splitters added to `parsable_format_path` after the last `{`
198 |                   reaches the count determined by the parsing_rule
199 | 
200 |         Parameters
201 |         ---------------
202 |         path : str
203 |             ex.) `Origin/suzuki/2022_12_03/A-200-C-100.csv`
204 |         Return
205 |         --------------
206 |         parsable_format_path : str
207 |             the string converted for path parsing
208 |             ex.) `{Origin}/{suzuki}/{2022_12_03}/{A-200}-{C}-{100}.csv`
209 |         """
210 |         closure_cnt, split_cnt = 0, 0
211 | 
212 |         path = self.insert_sep_to_path(path)
213 | 
214 |         parsable_format_path = "{"
215 |         for s in path:
216 |             if (s in self.splitters) and (split_cnt == self.split_counts[closure_cnt]):
217 |                 parsable_format_path += "}" + s + "{"
218 |                 closure_cnt += 1
219 |                 split_cnt = 0
220 | 
221 |             else:
222 |                 if s in self.splitters:
223 |                     split_cnt += 1
224 |                 parsable_format_path += s
225 | 
226 |         return parsable_format_path
227 | 
228 |     def extract_sub_splitters(self) -> List:
229 |         """
230 |         Extract strings not enclosed in `{}`
231 |         Returns
232 |         -----------------
233 |         sub_splitters : List
234 |             ex) `["hoge", "/fuga"]`
235 |         """
236 |         parsing_rule_ = self.parsing_rule
237 | 
238 |         # switch `{XX}` to `}XX{`
239 |         parsing_rule_ = parsing_rule_.replace("{", "[RPTRight]").replace(
240 |             "}", "[RPTLeft]"
241 |         )
242 |         parsing_rule_ = parsing_rule_.replace("[RPTRight]", "}").replace(
243 |             "[RPTLeft]", "{"
244 |         )
245 |         parsing_rule_ = "{" + parsing_rule_ + "}"
246 | 
247 |         sub_splitters = re.findall(r"\{(.*?)\}", parsing_rule_)
248 | 
249 |         return sub_splitters
250 | 
251 |     def extract_splitter(self) -> List:
252 |         """
253 |         Find splitters such as '/', '-' or '_' etc...
254 | 
255 |         Return
256 |         ---------------------
257 |         splitters : list
258 |             ex) `["/", "/", "/", "-", "_", "."]`
259 |         """
260 |         sub_splitters = self.extract_sub_splitters()
261 | 
262 |         code_pattern = re.compile("[!-/:-@[-`{-~]")
263 | 
264 |         splitters = []
265 |         for sub_sp in sub_splitters:
266 |             sp = re.findall(code_pattern, sub_sp)
267 |             splitters.extend(sp)
268 | 
269 |         return splitters
270 | 
271 |     def extract_unuse_str(self) -> List:
272 |         """
273 |         Find the strings that are not used as values.
274 |         Return
275 |         ---------------------
276 |         unuse_strs : List
277 |             ex) `["hoge", "fuga"]`
278 |         """
279 |         sub_splitters = self.extract_sub_splitters()
280 |         str_pattern = re.compile("[^!-/:-@[-`{-~]+")
281 | 
282 |         unuse_strs = []
283 |         for sub_sp in sub_splitters:
284 |             strs = re.findall(str_pattern, sub_sp)
285 |             unuse_strs.extend(strs)
286 | 
287 |         return unuse_strs
288 | 
289 |     def count_splitter_in_each_key(self, values: List) -> List:
290 |         """
291 |         Count the number of splitters in each key
292 |         Parameters
293 |         ----------------
294 |         values : List
295 |             The list of values extracted from the `path`
296 |             ex.) `["hoge", "/fuga"]` (path: hoge{num1}/fuga{num2}.txt)
`["hoge", "/fuga"]` (path: hoge{num1}/fuga{num2}.txt) 297 | 298 | Retern 299 | --------------- 300 | split_cnts : List 301 | The list of the number of splitters in each key 302 | ex.) ` 303 | """ 304 | split_cnts = [] 305 | for value in values: 306 | split_in_value = 0 307 | if value != "_": 308 | for split in set(self.splitters): 309 | split_in_value += value.count(split) 310 | split_cnts.append(split_in_value) 311 | 312 | return split_cnts 313 | 314 | def insert_sep_to_path(self, path: str) -> str: 315 | """ 316 | Insert splitter before/after `unuse_str` in the path 317 | Parameters 318 | ------------------ 319 | path : str 320 | ex.) `Origin/hoge1/fugasuzukipiyo_03.csv` 321 | Returns 322 | ------------------ 323 | path : str 324 | ex.) `{Origin}/{hoge}/{1}/{fuga}/{suzuki}/{piyo}_{03}.csv` 325 | """ 326 | for unuse_str in self.unuse_strs: 327 | path = path.replace(unuse_str, self.sep + unuse_str + self.sep) 328 | 329 | path = path.replace(self.sep * 3, self.sep) 330 | 331 | splitters = sorted(list(set(self.splitters))) 332 | 333 | # Put "/" at the last position in `splitters` 334 | if self.sep in splitters: 335 | splitters.remove(self.sep) 336 | splitters.append(self.sep) 337 | 338 | for splitter in splitters: 339 | path = path.replace(splitter + self.sep, self.sep).replace( 340 | self.sep + splitter, self.sep 341 | ) 342 | 343 | return path 344 | 345 | def validate_parsing_rule(self) -> bool: 346 | """ 347 | Check that the parsing_rule is valid 348 | Parameter 349 | ---------------- 350 | self.parsing_keys 351 | 352 | Return 353 | ---------------- 354 | is_valid : bool 355 | """ 356 | if len(self.parsing_keys) == 0: 357 | is_valid = False 358 | elif len(self.parsing_keys) == 1 and self.parsing_keys[0] == "[UnuseToken]": 359 | is_valid = False 360 | else: 361 | is_valid = True 362 | return is_valid 363 | 364 | 365 | if __name__ == "__main__": 366 | pass 367 | -------------------------------------------------------------------------------- /tests/test_cli.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact engineer@adansons.co.jp 5 | 6 | import os 7 | 8 | import time 9 | from unittest import result, runner 10 | from click.testing import CliRunner 11 | from base.cli import ( 12 | create_table, 13 | import_data, 14 | list_project, 15 | remove_project, 16 | show_project_detail, 17 | import_data, 18 | data_link, 19 | search_files, 20 | invite_member, 21 | ) 22 | 23 | 24 | PROJECT_NAME = "adansons_test_project" 25 | INVITE_USER_ID = "test_invite@adansons.co.jp" 26 | 27 | 28 | def test_initialize(): 29 | """ 30 | If something went wrong past test session. 31 | You may have exsiting tables, so you have to clear them before below tests. 
32 | """ 33 | runner = CliRunner() 34 | result = runner.invoke(list_project, []) 35 | if PROJECT_NAME in result.output: 36 | result = runner.invoke(remove_project, [PROJECT_NAME]) 37 | 38 | result = runner.invoke(list_project, ["--archived"]) 39 | if PROJECT_NAME in result.output: 40 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 41 | 42 | 43 | def test_create_table(): 44 | runner = CliRunner() 45 | result = runner.invoke(create_table, [PROJECT_NAME]) 46 | assert result.exit_code == 0 47 | assert "Your Project UID" in result.output 48 | 49 | 50 | def test_list_project(): 51 | runner = CliRunner() 52 | result = runner.invoke(list_project, []) 53 | assert result.exit_code == 0 54 | assert PROJECT_NAME in result.output 55 | 56 | 57 | def test_show_project_detail(): 58 | # wait create table 59 | time.sleep(20) 60 | runner = CliRunner() 61 | result = runner.invoke(show_project_detail, [PROJECT_NAME]) 62 | assert result.exit_code == 0 63 | assert f"project {PROJECT_NAME}" in result.output 64 | 65 | 66 | def test_import_dataset_with_invalid_rule(): 67 | runner = CliRunner() 68 | result = runner.invoke( 69 | import_data, 70 | [ 71 | PROJECT_NAME, 72 | "-d", 73 | os.path.dirname(__file__), 74 | "-e", 75 | "png", 76 | "-c", 77 | "{_}/{date}_{key}.png", 78 | ], 79 | input="{data}/{2022_04_14}_{rocket}.png", 80 | ) 81 | assert result.exit_code == 0 82 | assert "Success!" in result.output 83 | 84 | 85 | def test_import_dataset(): 86 | runner = CliRunner() 87 | result = runner.invoke( 88 | import_data, 89 | [ 90 | PROJECT_NAME, 91 | "-d", 92 | os.path.dirname(__file__), 93 | "-e", 94 | "jpeg", 95 | "-c", 96 | "{_}/{title}.jpeg", 97 | ], 98 | ) 99 | assert result.exit_code == 0 100 | assert "Success!" in result.output 101 | 102 | 103 | def test_import_metafile_extract(): 104 | runner = CliRunner() 105 | result = runner.invoke( 106 | import_data, 107 | [ 108 | PROJECT_NAME, 109 | "-m", 110 | "-p", 111 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 112 | "--extract", 113 | ], 114 | ) 115 | assert result.exit_code == 0 116 | assert "Success!" in result.output 117 | 118 | 119 | def test_import_metafile_estimate_rule(): 120 | runner = CliRunner() 121 | result = runner.invoke( 122 | import_data, 123 | [ 124 | PROJECT_NAME, 125 | "-m", 126 | "-p", 127 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 128 | "--estimate-rule", 129 | ], 130 | ) 131 | assert result.exit_code == 0 132 | assert "Success!" in result.output 133 | 134 | 135 | def test_import_metafile(): 136 | runner = CliRunner() 137 | result = runner.invoke( 138 | import_data, 139 | [ 140 | PROJECT_NAME, 141 | "-m", 142 | "-p", 143 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 144 | "--auto-approve", 145 | ], 146 | ) 147 | assert result.exit_code == 0 148 | assert "Success!" in result.output 149 | 150 | 151 | def test_import_metafile_modify(): 152 | runner = CliRunner() 153 | result = runner.invoke( 154 | import_data, 155 | [ 156 | PROJECT_NAME, 157 | "-m", 158 | "-p", 159 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 160 | ], 161 | input="m", 162 | ) 163 | assert result.exit_code == 0 164 | assert "Success!" 
in result.output 165 | 166 | 167 | def test_import_metafile_modify_join_rule_file(): 168 | runner = CliRunner() 169 | result = runner.invoke( 170 | import_data, 171 | [ 172 | PROJECT_NAME, 173 | "-m", 174 | "--join-rule", 175 | "joinrule_definition_adansons_test_project.yml", 176 | ], 177 | input="y", 178 | ) 179 | assert result.exit_code == 0 180 | assert "Success!" in result.output 181 | 182 | 183 | def test_import_metafile_exkeyvalue(): 184 | runner = CliRunner() 185 | result = runner.invoke( 186 | import_data, 187 | [ 188 | PROJECT_NAME, 189 | "-m", 190 | "-p", 191 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 192 | "-a", 193 | "key1:value1", 194 | "--auto-approve", 195 | ], 196 | ) 197 | assert result.exit_code == 0 198 | assert "Success!" in result.output 199 | 200 | 201 | def test_import_metafile_exkeyvalue_multiple(): 202 | runner = CliRunner() 203 | result = runner.invoke( 204 | import_data, 205 | [ 206 | PROJECT_NAME, 207 | "-m", 208 | "-p", 209 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 210 | "-a", 211 | "key1:value1", 212 | "-a", 213 | "key2:value2", 214 | "--auto-approve", 215 | ], 216 | ) 217 | assert result.exit_code == 0 218 | assert "Success!" in result.output 219 | 220 | 221 | def test_import_metafile_exkeyvalue_invalid(): 222 | runner = CliRunner() 223 | result = runner.invoke( 224 | import_data, 225 | [ 226 | PROJECT_NAME, 227 | "-m", 228 | "-p", 229 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 230 | "-a", 231 | "key1-value1", 232 | "--auto-approve", 233 | ], 234 | ) 235 | assert result.exit_code == 0 236 | assert "invalid" in result.output 237 | 238 | 239 | def test_import_metafile_csv(): 240 | runner = CliRunner() 241 | result = runner.invoke( 242 | import_data, 243 | [ 244 | PROJECT_NAME, 245 | "-m", 246 | "-p", 247 | os.path.join(os.path.dirname(__file__), "data", "sample.csv"), 248 | "--auto-approve", 249 | ], 250 | ) 251 | assert result.exit_code == 0 252 | assert "Success!" in result.output 253 | 254 | 255 | def test_import_metafile_csv_exkeyvalue(): 256 | runner = CliRunner() 257 | result = runner.invoke( 258 | import_data, 259 | [ 260 | PROJECT_NAME, 261 | "-m", 262 | "-p", 263 | os.path.join(os.path.dirname(__file__), "data", "sample.csv"), 264 | "-a", 265 | "key1:value1", 266 | "--auto-approve", 267 | ], 268 | ) 269 | assert result.exit_code == 0 270 | assert "Success!" in result.output 271 | 272 | 273 | def test_import_metafile_csv_exkeyvalue_multiple(): 274 | runner = CliRunner() 275 | result = runner.invoke( 276 | import_data, 277 | [ 278 | PROJECT_NAME, 279 | "-m", 280 | "-p", 281 | os.path.join(os.path.dirname(__file__), "data", "sample.csv"), 282 | "-a", 283 | "key1:value1", 284 | "-a", 285 | "key2:value2", 286 | "--auto-approve", 287 | ], 288 | ) 289 | assert result.exit_code == 0 290 | assert "Success!" in result.output 291 | 292 | 293 | def test_data_link(): 294 | runner = CliRunner() 295 | result = runner.invoke( 296 | data_link, 297 | [ 298 | PROJECT_NAME, 299 | "-d", 300 | os.path.dirname(__file__), 301 | "-e", 302 | "jpeg", 303 | ], 304 | ) 305 | assert result.exit_code == 0 306 | assert "linked!" 
in result.output
307 | 
308 | 
309 | def test_import_metafile_csv_exkeyvalue_invalid():
310 |     runner = CliRunner()
311 |     result = runner.invoke(
312 |         import_data,
313 |         [
314 |             PROJECT_NAME,
315 |             "-m",
316 |             "-p",
317 |             os.path.join(os.path.dirname(__file__), "data", "sample.csv"),
318 |             "-a",
319 |             "key1-value1",
320 |         ],
321 |     )
322 |     assert result.exit_code == 0
323 |     assert "invalid" in result.output
324 | 
325 | 
326 | def test_search_files():
327 |     time.sleep(5)
328 |     runner = CliRunner()
329 |     result = runner.invoke(search_files, [PROJECT_NAME, "-q", "title == sample"])
330 |     assert result.exit_code == 0
331 |     assert "1 files" in result.output
332 | 
333 | 
334 | def test_get_project_member():
335 |     runner = CliRunner()
336 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
337 |     assert result.exit_code == 0
338 |     assert "project Members" in result.output
339 | 
340 | 
341 | def test_invite_project_member():
342 |     runner = CliRunner()
343 |     result = runner.invoke(
344 |         invite_member, [PROJECT_NAME, "-m", INVITE_USER_ID, "-p", "Editor"]
345 |     )
346 |     assert result.exit_code == 0
347 |     assert "Successfully" in result.output
348 |     runner = CliRunner()
349 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
350 |     assert result.exit_code == 0
351 |     assert f"{INVITE_USER_ID} (Editor" in result.output
352 | 
353 | 
354 | def test_change_permission():
355 |     runner = CliRunner()
356 |     result = runner.invoke(
357 |         invite_member, [PROJECT_NAME, "-m", INVITE_USER_ID, "-p", "Admin", "-u"]
358 |     )
359 |     assert result.exit_code == 0
360 |     assert "Successfully" in result.output
361 |     runner = CliRunner()
362 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
363 |     assert result.exit_code == 0
364 |     assert f"{INVITE_USER_ID} (Admin" in result.output
365 | 
366 | 
367 | # skip test_change_project_owner because it is difficult to handle multiple users in the CLI
368 | def test_delete_project_member():
369 |     runner = CliRunner()
370 |     result = runner.invoke(remove_project, [PROJECT_NAME, "-m", INVITE_USER_ID])
371 |     assert result.exit_code == 0
372 |     assert f"{INVITE_USER_ID} was removed from {PROJECT_NAME}" in result.output
373 |     runner = CliRunner()
374 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
375 |     assert result.exit_code == 0
376 |     assert f"{INVITE_USER_ID} (Admin" not in result.output
377 | 
378 | 
379 | def test_archive_project():
380 |     runner = CliRunner()
381 |     result = runner.invoke(remove_project, [PROJECT_NAME])
382 |     assert result.exit_code == 0
383 |     assert f"{PROJECT_NAME} was Archived" in result.output
384 |     result = runner.invoke(list_project, [])
385 |     assert result.exit_code == 0
386 |     assert PROJECT_NAME not in result.output
387 | 
388 | 
389 | def test_list_archived_project():
390 |     runner = CliRunner()
391 |     result = runner.invoke(list_project, ["--archived"])
392 |     assert result.exit_code == 0
393 |     assert PROJECT_NAME in result.output
394 | 
395 | 
396 | def test_delete_project():
397 |     runner = CliRunner()
398 |     result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"])
399 |     assert result.exit_code == 0
400 |     assert f"{PROJECT_NAME} was Deleted" in result.output
401 |     result = runner.invoke(list_project, ["--archived"])
402 |     assert result.exit_code == 0
403 |     assert PROJECT_NAME not in result.output
404 | 
405 | 
406 | if __name__ == "__main__":
407 |     test_initialize()
408 |     test_create_table()
409 |     test_list_project()
410 |     test_show_project_detail()
411 |     test_import_dataset()
412 |     test_import_metafile_extract()
413 |     test_import_metafile_estimate_rule()
414 |     test_import_metafile()
415 |     test_import_metafile_modify()
416 |     test_import_metafile_modify_join_rule_file()
417 |     test_import_metafile_exkeyvalue()
418 |     test_import_metafile_exkeyvalue_multiple()
419 |     test_import_metafile_csv_exkeyvalue()
420 |     test_import_metafile_csv_exkeyvalue_multiple()
421 |     test_import_metafile_csv()
422 |     test_data_link()
423 |     test_import_metafile_csv_exkeyvalue_invalid()
424 |     test_import_metafile_exkeyvalue_invalid()
425 |     test_search_files()
426 |     test_get_project_member()
427 |     test_invite_project_member()
428 |     test_change_permission()
429 |     test_delete_project_member()
430 |     test_archive_project()
431 |     test_list_archived_project()
432 |     test_delete_project()
433 | 
--------------------------------------------------------------------------------
/base/files.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import os
6 | import re
7 | import json
8 | import copy
9 | import requests
10 | import urllib.parse
11 | from typing import Optional, Union, List, Any
12 | 
13 | from base.config import (
14 |     get_user_id,
15 |     get_access_key,
16 |     get_project_uid,
17 |     BASE_API_ENDPOINT,
18 | )
19 | 
20 | HEADER = {"Content-Type": "application/json"}
21 | LINKER_DIR = os.path.join(os.path.expanduser("~"), ".base", "linker")
22 | 
23 | 
24 | class File(str):
25 |     """
26 |     File class
27 | 
28 |     Attributes
29 |     ----------
30 |     path : str
31 |         path of the file
32 |     attrs : dict
33 |         dict of attributes (metadata) which are related to this file
34 | 
35 |     Note
36 |     ----
37 |     The metadata of the file are added as attributes.
38 | """ 39 | 40 | def __new__(cls, file_path: str, attrs: dict): 41 | self = super().__new__(cls, file_path) 42 | self.path = file_path 43 | self.metadata = attrs 44 | self.__dict__.update(attrs) 45 | return self 46 | 47 | def __getitem__(self, key: str) -> Any: 48 | return self.__dict__[key] 49 | 50 | 51 | class Files: 52 | """ 53 | Files class 54 | 55 | Attributes 56 | ---------- 57 | project_name : str 58 | registerd project name 59 | user_id : str 60 | registerd user id 61 | project_uid : str 62 | project unique hash 63 | conditions : str, default None 64 | value of the condition to search for files 65 | query : list of str, default [] 66 | conditional expression of key and value to search for files 67 | sort_key : str, default None 68 | key to sort files 69 | result : list of dict 70 | search result 71 | files : list 72 | list of File class 73 | paths : list 74 | list of filepath 75 | items : list of dict 76 | metadata other than filepath 77 | """ 78 | 79 | def __init__( 80 | self, 81 | project_name: str, 82 | conditions: Optional[str] = None, 83 | query: List[str] = [], 84 | sort_key: Union[str, List[str], None] = None, 85 | ) -> None: 86 | """ 87 | Parameters 88 | ---------- 89 | project_name : str 90 | registerd project name 91 | conditions : str, default None 92 | value of the condition to search for files 93 | query : list of str, default [] 94 | conditional expression of key and value to search for files 95 | sort_key : str, default None 96 | key to sort files 97 | """ 98 | access_key = get_access_key() 99 | HEADER.update({"x-api-key": access_key}) 100 | 101 | self.project_name = project_name 102 | self.user_id = get_user_id() 103 | self.project_uid = get_project_uid(self.user_id, project_name) 104 | 105 | self.sort_key = sort_key 106 | 107 | self.__export(conditions=conditions, query=query, sort_key=sort_key) 108 | 109 | self.reprtext = self.__reprtext_generator(conditions, query) 110 | self.expression = self.__class__.__name__ 111 | 112 | def __search( 113 | self, conditions: Optional[str] = None, query: List[str] = [] 114 | ) -> List[dict]: 115 | """ 116 | Get metadata of filtered files from DynamoDB. 
117 | 
118 |         Parameters
119 |         ----------
120 |         conditions : str, default None
121 |             value of the condition to search for files
122 |         query : list of str, default []
123 |             conditional expression of key and value to search for files
124 | 
125 | 
126 | 
127 |         Returns
128 |         -------
129 |         result : list of dict
130 |             search result of metadata
131 |         """
132 |         url = f"{BASE_API_ENDPOINT}/project/{self.project_uid}/files"
133 |         if conditions is not None:
134 |             url += "/" + "/".join(map(urllib.parse.quote_plus, conditions.split(",")))
135 |         url += "?user=" + self.user_id
136 | 
137 |         res = requests.get(url=url, headers=HEADER)
138 |         if res.status_code == 200:
139 |             result_url = res.json()["URL"]
140 |             result = requests.get(result_url)
141 |             result = json.loads(result.content.decode("utf-8"))["Items"]
142 |         else:
143 |             raise Exception("Undefined error happened.")
144 | 
145 |         result = self.__query_filter(result, query)
146 | 
147 |         linked_hash_location = os.path.join(
148 |             LINKER_DIR, self.project_uid, "linked_hash.json"
149 |         )
150 |         with open(linked_hash_location, "r", encoding="utf-8") as f:
151 |             hash_dict = json.loads(f.read())
152 |         result = [{"FilePath": hash_dict[i.pop("FileHash")], **i} for i in result]
153 | 
154 |         return result
155 | 
156 |     def __export(
157 |         self,
158 |         conditions: Optional[str] = None,
159 |         query: List[str] = [],
160 |         sort_key: Union[str, List[str], None] = None,
161 |     ):
162 |         """
163 |         Get metadata and return the Files instance.
164 | 
165 |         Parameters
166 |         ----------
167 |         conditions : str, default None
168 |             value of the condition to search for files
169 |         query : list of str, default []
170 |             conditional expression of key and value to search for files
171 |         sort_key : str, default None
172 |             key to sort files
173 | 
174 |         Returns
175 |         -------
176 |         self : Files class instance
177 |         """
178 |         # argument validation
179 |         self.__validate_args(conditions, query, sort_key)
180 | 
181 |         result = self.__search(conditions, query)
182 |         if sort_key is not None:
183 |             if isinstance(sort_key, str):
184 |                 sort_key = [sort_key]
185 |             result = sorted(
186 |                 result,
187 |                 key=lambda x: [x.get(key, float("inf")) for key in sort_key],
188 |             )
189 | 
190 |         self.result = result
191 |         self.__set_attributes(result)
192 | 
193 |         return self
194 | 
195 |     def filter(
196 |         self,
197 |         conditions: Optional[str] = None,
198 |         query: List[str] = [],
199 |         sort_key: Union[str, List[str], None] = None,
200 |     ):
201 |         """
202 |         Filter metadata and return the Files instance.
203 | 
204 |         Parameters
205 |         ----------
206 |         conditions : str, default None
207 |             value of the condition to search for files
208 |         query : list of str, default []
209 |             conditional expression of key and value to search for files
210 |         sort_key : str, default None
211 |             key to sort files
212 | 
213 |         Returns
214 |         -------
215 |         self : Files class instance
216 |         """
217 |         # argument validation
218 |         self.__validate_args(conditions, query, sort_key)
219 | 
220 |         filtered_files = copy.copy(self)
221 |         filtered_files.sort_key = (
222 |             sort_key or self.sort_key
223 |         )  # value1 or value2 <==> value2 if value1 is None else value1
224 | 
225 |         result = filtered_files.result
226 |         if conditions is not None:
227 |             result = filtered_files.__conditions_filter(result, conditions)
228 |         if len(query) > 0:
229 |             result = filtered_files.__query_filter(result, query)
230 |         if sort_key is not None:
231 |             if isinstance(sort_key, str):
232 |                 sort_key = [sort_key]
233 |             result = sorted(
234 |                 result,
235 |                 key=lambda x: [x.get(key, float("inf")) for key in sort_key],
236 |             )
237 | 
238 |         filtered_files.result = result
239 |         filtered_files.__set_attributes(result)
240 | 
241 |         return filtered_files
242 | 
243 |     def __query_filter(self, result: List[dict], query: List[str] = []) -> List[dict]:
244 |         """
245 |         Filter metadata with query.
246 | 
247 |         Parameters
248 |         ----------
249 |         result : list of dict
250 |             search result of metadata
251 |         query : list of str, default []
252 |             conditional expression of key and value to search for files
253 | 
254 |         Returns
255 |         -------
256 |         result : list of dict
257 |             metadata filtered with query
258 |         """
259 | 
260 |         def number_to_int(obj: str):
261 |             return int(obj) if obj.isdigit() else obj
262 | 
263 |         def natural_keys(primary_class: str):
264 |             def sort_function(obj):
265 |                 try:
266 |                     keys = [eval(primary_class)(obj)]
267 |                 except Exception:
268 |                     keys = [number_to_int(c) for c in re.split(r"(\d+)", str(obj))]
269 |                 finally:
270 |                     return keys
271 | 
272 |             return sort_function
273 | 
274 |         unquote = lambda v: v.lstrip("'").rstrip("'").lstrip('"').rstrip('"')
275 | 
276 |         for q in query:
277 |             queried_result = []
278 | 
279 |             query_split = q.split(" ", 2)
280 |             if len(query_split) < 3 or query_split[1] not in [
281 |                 "==",
282 |                 "!=",
283 |                 ">",
284 |                 ">=",
285 |                 "<",
286 |                 "<=",
287 |                 "in",
288 |                 "is",
289 |                 "not",
290 |             ]:
291 |                 raise ValueError(
292 |                     "Invalid query grammar. See docs about query option.\nhttps://github.com/adansons/base/blob/main/docs/CLI.md#search"
293 |                 )
294 | 
295 |             # if q = "label <= 7" or q = "label <= '7'"
296 |             # key = "label", value = "7", operator = "<="
297 |             if query_split[1] in ["in", "is", "not"]:
298 |                 key = query_split[0]
299 |                 qs_ = query_split[2].split(" ", 1)
300 |                 value = unquote(qs_[-1])
301 |                 operator = " ".join([query_split[1]] + qs_[:-1])
302 |             else:
303 |                 key = query_split[0]
304 |                 value = unquote(query_split[-1])
305 |                 operator = " ".join(query_split[1:-1])
306 | 
307 |             if operator == "==":
308 |                 for data in result:
309 |                     if key in data and eval(f"'{data[key]}' {operator} '{value}'"):
310 |                         queried_result.append(data)
311 |             elif operator == "!=":
312 |                 for data in result:
313 |                     if key in data and not eval(f"'{data[key]}' {operator} '{value}'"):
314 |                         continue
315 |                     else:
316 |                         queried_result.append(data)
317 |             elif operator in [">", ">="]:
318 |                 for data in result:
319 |                     if key in data:
320 |                         s = sorted(
321 |                             [data[key], value],
322 |                             key=natural_keys(data[key].__class__.__name__),
323 |                         )
324 |                         if s[0] == value and (operator == ">=" or data[key] != value):
325 |                             queried_result.append(data)
326 |             elif operator in ["<", "<="]:
327 |                 for data in result:
328 |                     if key in data:
329 |                         s = sorted(
330 |                             [data[key], value],
331 |                             key=natural_keys(data[key].__class__.__name__),
332 |                         )
333 |                         if s[1] == value and (operator == "<=" or data[key] != value):
334 |                             queried_result.append(data)
335 |             elif operator in ["is", "is not"]:
336 |                 # in python, the "is" and "is not" operators are allowed to compare with `None`
337 |                 # so, if any other value is given, raise ValueError
338 |                 if value != "None":
339 |                     raise ValueError(
340 |                         "Only 'None' is allowed with `is` or `is not` operators."
341 |                     )
342 |                 for data in result:
343 |                     if (operator == "is" and key not in data) or (
344 |                         operator == "is not" and key in data
345 |                     ):
346 |                         queried_result.append(data)
347 |             elif operator in ["in", "not in"]:
348 |                 value = [unquote(v) for v in re.split("[ ,]", value[1:-1]) if v != ""]
349 |                 for data in result:
350 |                     if key in data and eval(f"'{data[key]}' {operator} {value}"):
351 |                         queried_result.append(data)
352 |             else:
353 |                 raise ValueError(
354 |                     f"Specified operator '{operator}' was blocked for security reasons."
355 |                 )
356 | 
357 |             result = queried_result
358 |         return result
359 | 
360 |     def __conditions_filter(
361 |         self, result: List[dict], conditions: Optional[str] = None
362 |     ) -> List[dict]:
363 |         """
364 |         Filter metadata with conditions.
365 | 
366 |         Parameters
367 |         ----------
368 |         result : list of dict
369 |             search result of metadata
370 |         conditions : str, default None
371 |             value of the condition to search for files
372 | 
373 |         Returns
374 |         -------
375 |         result : list of dict
376 |             metadata filtered with conditions
377 |         """
378 |         conditions = set(conditions.split(","))
379 |         result = [record for record in result if set(record.values()) & conditions]
380 |         return result
381 | 
382 |     def __set_attributes(self, result: List[dict]) -> None:
383 |         """
384 |         Set instance variables.
385 | 
386 |         Parameters
387 |         ----------
388 |         result : list of dict
389 |             search result of metadata
390 | 
391 |         Returns
392 |         -------
393 |         None
394 |         """
395 |         files = []
396 |         paths = []
397 |         items = []
398 |         for res in result:
399 |             attrs = {}
400 |             for k, v in res.items():
401 |                 if k == "FilePath":
402 |                     path = v
403 |                 else:
404 |                     attrs[k] = v
405 |             file = File(path, attrs)
406 |             files.append(file)
407 |             paths.append(path)
408 |             items.append(attrs)
409 | 
410 |         self.files = files  # list of File objects
411 |         self.paths = paths  # list of filepaths
412 |         self.items = items  # list of metadata dicts other than filepath
413 | 
414 |     def __validate_args(self, conditions, query, sort_key):
415 |         if conditions is not None:
416 |             if not isinstance(conditions, str):
417 |                 raise TypeError(
418 |                     f'Argument "conditions" must be str, not {conditions.__class__.__name__}.'
419 |                 )
420 |         if not hasattr(query, "__iter__"):
421 |             raise TypeError(
422 |                 f'Argument "query" must be list, not {query.__class__.__name__}.'
423 |             )
424 |         if sort_key is not None:
425 |             if not isinstance(sort_key, (str, list)):
426 |                 raise TypeError(
427 |                     f'Argument "sort_key" must be str or list, not {sort_key.__class__.__name__}.'
428 |                 )
429 | 
430 |     def __getitem__(self, idx: int) -> File:
431 |         return self.files[idx]
432 | 
433 |     def __len__(self) -> int:
434 |         return len(self.files)
435 | 
436 |     def __repr_formatter(self, string: Optional[str]) -> Optional[str]:
437 |         return "'" + string + "'" if string is not None else None
438 | 
439 |     def __reprtext_generator(self, conditions, query) -> str:
440 |         project_name = self.__repr_formatter(self.project_name)
441 |         conditions = self.__repr_formatter(conditions)
442 |         query = query
443 |         sort_key = self.__repr_formatter(self.sort_key) if isinstance(self.sort_key, str) else self.sort_key
444 |         reprtext = f"{self.__class__.__name__}(project_name={project_name}, conditions={conditions}, query={query}, sort_key={sort_key}, file_num={len(self.files)})\n"
445 |         return reprtext
446 | 
447 |     def __repr__(self) -> str:
448 |         # if this instance is the result of an operation,
449 |         if self.reprtext.count(self.__class__.__name__) >= 2:
450 |             repr_header = "======Files======\n"
451 |             expres_header = "===Expressions===\n"
452 |             # number each Files instance
453 |             # 'Files(project_name=,...)' -> '{}(project_name=,...)' to use str.format()
454 |             self.reprtext = re.sub(f"{self.__class__.__name__}", "{}", self.reprtext)
455 |             self.expression = re.sub(
456 |                 f"{self.__class__.__name__}", "{}", self.expression
457 |             )
458 |             # '{}(project_name=,...)' -> 'Files1(project_name=,...)'
459 |             self.reprtext = self.reprtext.format(
460 |                 *[
461 |                     f"{self.__class__.__name__}{i+1}"
462 |                     for i in range(self.reprtext.count("{}"))
463 |                 ]
464 |             )
465 |             self.expression = self.expression.format(
466 |                 *[
467 |                     f"{self.__class__.__name__}{i+1}"
468 |                     for i in range(self.expression.count("{}"))
469 |                 ]
470 |             )
471 |             return repr_header + self.reprtext + expres_header + self.expression
472 |         else:
473 |             return self.reprtext
474 | 
475 |     def __add__(self, other: "Files") -> "Files":
476 |         if isinstance(other, self.__class__):
477 |             files = copy.copy(self)
478 |             files.result = self.result + other.result
479 |             files.__set_attributes(files.result)
480 |             files.reprtext = files.reprtext + other.reprtext
481 |             files.expression += " + " + other.expression
482 |             return files
483 |         else:
484 |             raise TypeError(
485 |                 f"unsupported operand type(s) for +: '{self.__class__.__name__}' and '{other.__class__.__name__}'."
486 |             )
487 | 
488 |     def __or__(self, other: "Files") -> "Files":
489 |         if isinstance(other, self.__class__):
490 |             files_list = list(
491 |                 map(
492 |                     lambda x: json.dumps(sorted(x.items())),
493 |                     [*(self.result), *(other.result)],
494 |                 )
495 |             )
496 |             uniq_result = sorted(set(files_list), key=files_list.index)
497 | 
498 |             files = copy.copy(self)
499 |             files.result = [dict(json.loads(result)) for result in uniq_result]
500 |             files.__set_attributes(files.result)
501 | 
502 |             files.reprtext = files.reprtext + other.reprtext
503 |             files_expression_count = files.expression.count(files.__class__.__name__)
504 |             other_expression_count = other.expression.count(other.__class__.__name__)
505 |             if files_expression_count >= 2 and other_expression_count >= 2:
506 |                 files.expression = f"({files.expression}) or ({other.expression})"
507 |             elif files_expression_count == 1 and other_expression_count >= 2:
508 |                 files.expression = f"{files.expression} or ({other.expression})"
509 |             elif files_expression_count >= 2 and other_expression_count == 1:
510 |                 files.expression = f"({files.expression}) or {other.expression}"
511 |             elif files_expression_count == 1 and other_expression_count == 1:
512 |                 files.expression = f"{files.expression} or {other.expression}"
513 |             return files
514 |         else:
515 |             raise TypeError(
516 |                 f"unsupported operand type(s) for |: '{self.__class__.__name__}' and '{other.__class__.__name__}'."
517 |             )
518 | 
519 | 
520 | if __name__ == "__main__":
521 |     pass
522 | 
--------------------------------------------------------------------------------
/base/cli.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import os
6 | import sys
7 | import time
8 | import glob
9 | import json
10 | 
11 | import click
12 | from datetime import datetime
13 | 
14 | from base import VERSION
15 | from base.project import (
16 |     Project,
17 |     create_project,
18 |     get_projects,
19 |     archive_project,
20 |     delete_project,
21 |     summarize_keys_information,
22 | )
23 | from base.config import (
24 |     get_user_id,
25 |     get_access_key,
26 |     register_access_key,
27 |     register_user_id,
28 |     update_project_info,
29 |     get_user_id_from_db,
30 |     check_project_available,
31 | )
32 | from .exception import CatchAllExceptions, search_export_exception
33 | 
34 | 
35 | def base_config(func):
36 |     def wrapper(*args, **kwargs):
37 |         # try to get the user_id
38 |         try:
39 |             access_key = get_access_key()
40 |             user_id = get_user_id()
41 |             update_project_info(user_id)
42 |         except Exception:
43 |             click.echo(
44 |                 "Welcome to Adansons Base!!\n\nLet's start with your access key provided on our slack.\n(if you don't have an access key, please press ENTER.)\n"
45 |             )
46 |             while True:
47 |                 try:
48 |                     access_key = click.prompt(
49 |                         "Please register your access key", type=str, default="none"
50 |                     )
51 |                     if access_key == "none":
52 |                         click.echo(
53 |                             "\nGet invitation from here!\n-> https://share.hsforms.com/16OxTF7eJRPK92oGCny7nGw8moen\n"
54 |                         )
55 |                         sys.exit()
56 |                 except click.exceptions.Abort:
57 |                     click.echo("\nAborted!")
58 |                     sys.exit()
59 | 
60 |                 try:
61 |                     register_access_key(access_key)
62 |                     user_id = get_user_id_from_db(access_key)
63 |                     register_user_id(user_id)
64 |                     update_project_info(user_id)
65 |                 except click.exceptions.Abort:
66 |                     click.echo("\nAborted!")
67 |                     sys.exit()
68 |                 except Exception:
69 |                     click.echo(
70 |                         "\nIncorrect access key was specified, please re-configure or ask the support team.\n"
71 |                     )
72 |                 else:
click.echo(f"\nSuccessfully configured as {user_id}\n") 74 | time.sleep(3) 75 | kwargs["user_id"] = user_id 76 | break 77 | else: 78 | kwargs["user_id"] = user_id 79 | 80 | func(*args, **kwargs) 81 | 82 | return wrapper 83 | 84 | 85 | @click.version_option(VERSION) 86 | @click.group() 87 | def main(): 88 | """Adansons Database Command Line Interface""" 89 | pass 90 | 91 | 92 | @main.command(name="new", help="create new project") 93 | @click.argument("project") 94 | @base_config 95 | def create_table(project, user_id): 96 | """ 97 | Create a new project table command 98 | Usage 99 | ----- 100 | $ base new sample-project 101 | Arguments 102 | --------- 103 | project: str 104 | new project name 105 | parameters 106 | ---------- 107 | user_id : str 108 | registerd user id 109 | Returns 110 | ------- 111 | project_uid : str 112 | project unique hash 113 | """ 114 | try: 115 | project_uid = create_project(user_id, project) 116 | check_project_available(user_id, project_uid) 117 | except Exception as e: 118 | click.echo(e) 119 | else: 120 | click.echo( 121 | f"Your Project UID\n----------------\n{project_uid}\n\nsave Project UID in local file (~/.base/projects)" 122 | ) 123 | return project_uid 124 | 125 | 126 | @main.command(name="list", help="show project list") 127 | @click.option("--archived", is_flag=True) 128 | @base_config 129 | def list_project(archived, user_id): 130 | """ 131 | Show project list command 132 | Usage 133 | ----- 134 | $ base list 135 | Parameters 136 | ---------- 137 | user_id : str 138 | registerd user id 139 | archived : bool 140 | if you want show archived projects 141 | """ 142 | 143 | try: 144 | project_list = get_projects(user_id, archived=archived) 145 | except Exception as e: 146 | click.echo(e) 147 | else: 148 | click.echo("projects\n========") 149 | for project in project_list: 150 | private = "yes" if project["PrivateProject"] == "1" else "no" 151 | created_date = datetime.fromtimestamp( 152 | float(project["CreatedTime"]) 153 | ).strftime("%Y-%m-%d %H:%M:%S") 154 | click.echo( 155 | f"[{project['ProjectName']}]\nProject UID: {project['ProjectUid']}\nRole: {project['UserRole']}\nPrivate Project: {private}\nCreated Date: {created_date}" 156 | ) 157 | 158 | 159 | @main.command(name="rm", help="remove project") 160 | @click.argument("project") 161 | @click.option("--confirm", is_flag=True) 162 | @click.option( 163 | "-m", 164 | "--member", 165 | type=str, 166 | help="member id you want to remove from project", 167 | required=False, 168 | default=None, 169 | multiple=True, 170 | ) 171 | @base_config 172 | def remove_project(confirm, project, user_id, member): 173 | """ 174 | Delete a project command 175 | Usage 176 | ----- 177 | $ base rm sample-project 178 | Arguments 179 | --------- 180 | project : str 181 | project name wich you want to delete 182 | Parameters 183 | ---------- 184 | user_id : str 185 | registerd user id 186 | Options 187 | ------- 188 | confirm : bool 189 | if you want delete archived projects 190 | member : list 191 | if you want remove project member from project 192 | """ 193 | if not member: 194 | if confirm: 195 | try: 196 | delete_project(user_id, project) 197 | except Exception as e: 198 | click.echo(e) 199 | 200 | else: 201 | click.echo(f"{project} was Deleted") 202 | else: 203 | 204 | try: 205 | archive_project(user_id, project) 206 | except Exception as e: 207 | click.echo(e) 208 | else: 209 | click.echo(f"{project} was Archived") 210 | else: 211 | 212 | pjt = Project(project) 213 | try: 214 | pjt.remove_member(member) 215 | except 
Exception as e: 216 | click.echo(e) 217 | else: 218 | click.echo(f"{','.join(member)} was removed from {project}") 219 | 220 | 221 | @main.command(name="show", help="show project detail") 222 | @click.argument("project") 223 | @click.option("--member-list", is_flag=True) 224 | @base_config 225 | def show_project_detail(project, user_id, member_list): 226 | """ 227 | 228 | Show project detail command 229 | Usage 230 | ----- 231 | $ base show sample-project 232 | Arguments 233 | --------- 234 | project : str 235 | project name wich you are interested in 236 | Parameters 237 | ---------- 238 | user_id : str 239 | registerd user id 240 | Optinons 241 | -------- 242 | member_list : bool 243 | if you want see about project members 244 | """ 245 | pjt = Project(project) 246 | if not member_list: 247 | 248 | try: 249 | key_list = pjt.get_metadata_summary() 250 | summary_for_print = summarize_keys_information(key_list) 251 | except Exception as e: 252 | click.echo(e) 253 | else: 254 | click.echo( 255 | f"project {project}\n===============\nYou have {summary_for_print['MaxRecordedCount']} records with {summary_for_print['UniqueKeyCount']} keys in this project.\n\n[Keys Information]\n" 256 | ) 257 | 258 | # first element is ('KEY NAME', 'VALUE RANGE', 'VALUE TYPE', 'RECORDED COUNT') 259 | max_len_list = [ 260 | summary_for_print["MaxCharCount"][column] 261 | for column in summary_for_print["Keys"][0] 262 | ] 263 | for row in summary_for_print["Keys"]: 264 | click.echo( 265 | " ".join( 266 | [ 267 | content + " " * (length - len(content)) 268 | for content, length in zip(row, max_len_list) 269 | ] 270 | ) 271 | ) 272 | else: 273 | try: 274 | member_list = pjt.get_members() 275 | except Exception as e: 276 | click.echo(e) 277 | else: 278 | click.echo("project Members\n===============") 279 | for column in member_list: 280 | created_date = datetime.fromtimestamp( 281 | float(column["CreatedTime"]) 282 | ).strftime("%Y-%m-%d %H:%M:%S") 283 | click.echo( 284 | f'{column["UserID"]} ({column["UserRole"]}, invited at {created_date})' 285 | ) 286 | 287 | 288 | @main.command(name="import", help="import dataset into project") 289 | @click.argument("project") 290 | @click.option( 291 | "-m", 292 | "--external-file", 293 | help="flag for external meta-data file", 294 | is_flag=True, 295 | default=False, 296 | ) 297 | @click.option( 298 | "-p", 299 | "--path", 300 | help="path for external meta-data file", 301 | required=False, 302 | default=None, 303 | multiple=True, 304 | ) 305 | @click.option( 306 | "-d", 307 | "--directory", 308 | type=str, 309 | help="target directory path", 310 | required=False, 311 | default=None, 312 | ) 313 | @click.option( 314 | "-e", 315 | "--extension", 316 | type=str, 317 | help="target file extensions", 318 | required=False, 319 | default=None, 320 | ) 321 | @click.option( 322 | "-c", "--parse", type=str, help="path parsing rule", required=False, default=None 323 | ) 324 | @click.option( 325 | "-a", 326 | "--additional", 327 | type=str, 328 | help="additional key and value", 329 | required=False, 330 | default=None, 331 | multiple=True, 332 | ) 333 | @click.option( 334 | "--extract", 335 | type=str, 336 | help="flag for extract external file", 337 | is_flag=True, 338 | default=False, 339 | ) 340 | @click.option( 341 | "--estimate-rule", 342 | type=str, 343 | help="flag for estimate join rule", 344 | is_flag=True, 345 | default=False, 346 | ) 347 | @click.option( 348 | "--join-rule", 349 | type=str, 350 | help="file path for defining the join rule", 351 | required=False, 352 | 
default=None,
353 | )
354 | @click.option("--export", type=str, help="export file type", required=False)
355 | @click.option("--output", type=str, help="output file path", required=False)
356 | @click.option("--auto-approve", is_flag=True)
357 | @base_config
358 | def import_data(
359 |     project,
360 |     external_file,
361 |     path,
362 |     directory,
363 |     extension,
364 |     parse,
365 |     additional,
366 |     auto_approve,
367 |     extract,
368 |     estimate_rule,
369 |     join_rule,
370 |     export,
371 |     output,
372 |     user_id,
373 | ):
374 |     """
375 |     Import data file command
376 |     Usage
377 |     -----
378 |     $ base import sample-project -d ../dataset -e wav -c {timestamp}/{UID}-{condition}-{iteration}.wav
379 |     If you want to import meta-data from an external file:
380 |     $ base import sample-project --external-file your/path/to_data
381 |     Arguments
382 |     ---------
383 |     project : str
384 |         project name which you are interested in
385 |     Parameters
386 |     ----------
387 |     user_id : str
388 |         registered user id
389 |     directory : str, default=None
390 |     extension : str, default=None
391 |     parse : str, default=None
392 |     additional : tuple of str, default=None
393 |     auto_approve : bool, default=False
394 |         approve estimated table joining rule
395 |     """
396 |     if additional is None:
397 |         additional = {}
398 |     else:
399 |         try:
400 |             additional = {
401 |                 element.split(":")[0]: element.split(":")[1] for element in additional
402 |             }
403 |         except Exception:
404 |             click.echo(
405 |                 "Found an invalid argument in -a. The argument must be: -a key:value"
406 |             )
407 |         else:
408 |             import_metafile(
409 |                 project,
410 |                 path,
411 |                 additional,
412 |                 auto_approve,
413 |                 extract,
414 |                 estimate_rule,
415 |                 join_rule,
416 |                 export,
417 |                 output,
418 |             ) if external_file else import_dataset(
419 |                 project, directory, extension, parse, additional
420 |             )
421 | 
422 | 
423 | def import_dataset(project, directory, extension, parse, additional):
424 |     pjt = Project(project)
425 |     if directory is None:
426 |         directory = click.prompt(
427 |             "Where is your dataset? (select root of dataset directory)", type=str
428 |         )
429 |     if extension is None:
430 |         extension = []
431 |         extension = click.prompt(
432 |             "What is your data file extension? (ex: csv, jpg, png, wav)", type=str
433 |         )
434 |     if extension[0] == ".":
435 |         extension = extension[1:]
436 | 
437 |     click.echo("Check datafiles...")
438 |     files = glob.glob(os.path.join(directory, "**", f"*.{extension}"), recursive=True)
439 |     click.echo(f"found {len(files)} files with {extension} extension.")
440 |     assert (
441 |         len(files) > 0
442 |     ), "No datafiles found. Please check your directory and extension."
443 | 
444 |     if parse is None:
445 |         sample_file_path = files[0].split(directory)[-1]
446 |         if sample_file_path[0] == os.sep:
447 |             sample_file_path = sample_file_path[1:]
448 |         click.echo(
449 |             f"\nTell me the parsing rule to get meta data from the file path with '{extension}'.\n\
450 | * you can use {{key-name}} to parse phrases with key.\n\
451 | * you can use {{_}} to ignore some phrases.\n\
452 | * you have to use '/' as separator.\n\
453 | ** sample parsing rule: {{_}}/{{name}}/{{timestamp}}/{{sensor}}-{{condition}}_{{iteration}}.csv\n\
454 | path to your file: {sample_file_path}"
455 |         )
456 |         parse = click.prompt("Parsing rule", type=str)
457 | 
458 |     try:
459 |         pjt.add_datafiles(
460 |             directory,
461 |             extension,
462 |             attributes=additional,
463 |             parsing_rule=parse,
464 |             detail_parsing_rule=None,
465 |         )
466 |     except ValueError as e:
467 |         click.echo(e)
468 |         click.echo(
469 |             f"\nCan't parse uniquely with parsing rule: {parse}\n\
470 | Please tell me the detail parsing rule in accordance with the actual path.\n\
471 | * use {{value}} to parse phrases with value in the actual path\n\
472 | * put {{}} before/after the value corresponding to {{_}} on the original parsing rule.\n\
473 | ** original parsing rule: {{_}}/{{name}}/{{timestamp}}/{{sensor}}-{{condition}}_{{iteration}}.csv\n\
474 | ** example path: Origin/suzuki/2020-04-07/A200-C_50.csv\n\
475 | ** sample detail parsing rule: {{Origin}}/{{suzuki}}/{{2020-04-07}}/{{A200}}-{{C}}_{{50}}.csv\n\
476 | path to your file: {files[0].split(directory)[-1]}"
477 |         )
478 |         detail_parse = click.prompt("Detail parsing rule", type=str)
479 | 
480 |         try:
481 |             pjt.add_datafiles(
482 |                 directory,
483 |                 extension,
484 |                 attributes=additional,
485 |                 parsing_rule=parse,
486 |                 detail_parsing_rule=detail_parse,
487 |             )
488 |         except Exception as e:
489 |             click.echo(e)
490 |         else:
491 |             click.echo("Success!")
492 |     except Exception as e:
493 |         click.echo(e)
494 |     else:
495 |         click.echo("Success!")
496 | 
497 | 
498 | def import_metafile(
499 |     project,
500 |     path,
501 |     additional,
502 |     auto_approve,
503 |     extract,
504 |     estimate_rule,
505 |     join_rule,
506 |     export,
507 |     output,
508 | ):
509 |     pjt = Project(project)
510 |     if (path == ()) and (join_rule is None):
511 |         path = click.prompt(
512 |             "Where is your meta-data file? (select a path for an external meta-data file)",
513 |             type=str,
514 |         )
515 |     try:
516 |         if extract:
517 |             for pth in path:
518 |                 result = pjt.extract_metafile(
519 |                     file_path=pth, attributes=additional, verbose=2
520 |                 )
521 |                 if export is not None:
522 |                     if export.lower() == "csv":
523 |                         for i, res in enumerate(result, 1):
524 |                             result_keys = list(res[0].keys())
525 |                             for r in res:
526 |                                 extra_keys = list(set(r.keys()) - set(result_keys))
527 |                                 result_keys += extra_keys
528 | 
529 |                             output_csv = ",".join(result_keys)
530 |                             for r in res:
531 |                                 result_values = [str(r[k]) for k in result_keys]
532 |                                 output_csv += "\n" + ",".join(result_values)
533 | 
534 |                             output_path = os.path.join(
535 |                                 ".",
536 |                                 f"{os.path.basename(pth.split('.')[0])}_Table{i}.csv",
537 |                             )
538 |                             if output is not None:
539 |                                 output_path = output
540 |                                 os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
541 | 
542 |                             file_count = 1
543 |                             while True:
544 |                                 if os.path.exists(output_path):
545 |                                     basename, ext = os.path.splitext(output_path)
546 |                                     output_path = f"{basename} ({file_count}){ext}"
547 |                                     file_count += 1
548 |                                 else:
549 |                                     break
550 |                             with open(output_path, "w", encoding="utf-8") as f:
551 |                                 f.write(output_csv)
552 |                     else:
553 |                         click.echo(
554 |                             f"Sorry, export file type: {export} is not supported yet..."
555 |                         )
556 |         elif estimate_rule:
557 |             for pth in path:
558 |                 pjt.estimate_join_rule(file_path=pth, verbose=2)
559 |         else:
560 |             pjt.add_metafile(
561 |                 file_path=path,
562 |                 attributes=additional,
563 |                 auto=auto_approve,
564 |                 join_rule_path=join_rule,
565 |                 verbose=1,
566 |             )
567 |     except Exception as e:
568 |         click.echo(e)
569 |     else:
570 |         click.echo("Success!")
571 | 
572 | 
573 | @main.command(
574 |     name="search",
575 |     help="search files",
576 |     cls=CatchAllExceptions(click.Command, handler=search_export_exception),
577 | )
578 | @click.argument("project")
579 | @click.option(
580 |     "-q",
581 |     "--query",
582 |     type=str,
583 |     help="query key value pair and operator. you have to specify like 'key >= value'",
584 |     required=False,
585 |     multiple=True,
586 | )
587 | @click.option(
588 |     "-c",
589 |     "--conditions",
590 |     type=str,
591 |     help="query value. you have to specify as 'value1,value2,...'",
592 |     required=False,
593 | )
594 | @click.option("-e", "--export", type=str, help="export file type", required=False)
595 | @click.option("-o", "--output", type=str, help="output file path", required=False)
596 | @click.option("-s", "--summary", is_flag=True)
597 | @base_config
598 | def search_files(
599 |     project,
600 |     query,
601 |     conditions,
602 |     export,
603 |     output,
604 |     user_id,
605 |     summary,
606 | ):
607 |     """
608 |     Query database
609 |     Usage
610 |     -----
611 |     $ base search sample-project -q "key >= xxxxx" -c yyy,zzz
612 |     Arguments
613 |     ---------
614 |     project : str
615 |         project name which you are interested in
616 |     Parameters
617 |     ----------
618 |     user_id : str
619 |         registered user id
620 |     query : str
621 |     conditions : str
622 |     Options
623 |     -------
624 |     summary : bool
625 |         whether to hide the detailed output
626 |     """
627 |     pjt = Project(project)
628 |     try:
629 |         if conditions is not None:
630 |             result = pjt.files(conditions=conditions, query=query).result
631 |         else:
632 |             result = pjt.files(query=query).result
633 |     except Exception as e:
634 |         click.echo(e)
635 |     else:
636 |         click.echo(f"{len(result)} files")
637 |         if not summary:
638 |             click.echo("========")
639 |             for r in result:
640 |                 click.echo(r)
641 |         if export is not None:
642 |             if export.lower() == "json":
643 |                 output_json = json.dumps({"Data": result}, indent=4, ensure_ascii=False)
644 | 
645 |                 output_path = os.path.join(".", "dataset.json")
646 |                 if output is not None:
647 |                     output_path = output
648 |                     os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
649 | 
650 |                 file_count = 1
651 |                 while True:
652 |                     if os.path.exists(output_path):
653 |                         basename, ext = os.path.splitext(output_path)
654 |                         output_path = f"{basename} ({file_count}){ext}"
655 |                         file_count += 1
656 |                     else:
657 |                         break
658 |                 with open(output_path, "w", encoding="utf-8") as f:
659 |                     f.write(output_json)
660 |             elif export.lower() == "csv":
661 |                 result_keys = []
662 |                 for r in result:
663 |                     extra_keys = list(set(r.keys()) - set(result_keys))
664 |                     result_keys += extra_keys
665 | 
666 |                 output_csv = ",".join(result_keys)
667 |                 for r in result:
668 |                     result_values = [str(r[k]) for k in result_keys]
669 |                     output_csv += "\n" + ",".join(result_values)
670 | 
671 |                 output_path = os.path.join(".", "dataset.csv")
672 |                 if output is not None:
673 |                     output_path = output
674 |                     os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
675 | 
676 |                 file_count = 1
677 |                 while True:
678 |                     if os.path.exists(output_path):
679 |                         basename, ext = os.path.splitext(output_path)
680 |                         output_path = f"{basename} ({file_count}){ext}"
681 |                         file_count += 1
682 |                     else:
683 |                         break
684 |                 with open(output_path, "w", encoding="utf-8") as f:
685 |                     f.write(output_csv)
686 |             else:
687 |                 click.echo(f"Sorry, export file type: {export} is not supported yet...")
688 |         elif export is None and output is not None:
689 |             click.echo("\nPlease specify an export file type. (e.g. --export json)")
690 | 
691 | 
692 | @main.command(name="invite", help="invite project member")
693 | @click.argument("project")
694 | @click.option(
695 |     "-m", "--member", type=str, help="member id you want to invite", required=True
696 | )
697 | @click.option(
698 |     "-p",
699 |     "--permission",
700 |     type=str,
701 |     help="permission level, select from 'Viewer', 'Editor', 'Admin', 'Owner'",
702 |     required=True,
703 | )
704 | @click.option("-u", "--update", is_flag=True)
705 | @base_config
706 | def invite_member(project, member, permission, update, user_id):
707 |     """
708 |     Invite project member
709 |     Usage
710 |     -----
711 |     $ base invite sample-project -m MEMBER -p Editor
712 |     Arguments
713 |     ---------
714 |     project : str
715 |         project name which you want to invite to
716 |     Parameters
717 |     ----------
718 |     user_id : str
719 |         registered user id
720 |     member : str
721 |         user id whom you want to invite
722 |     permission : str
723 |         permission level you want to give the member
724 |     Options
725 |     -------
726 |     update : bool
727 |         whether to update the permission of an existing project member
728 |     """
729 |     pjt = Project(project)
730 |     if not update:
731 |         try:
732 |             pjt.add_member(member, permission)
733 |         except Exception as e:
734 |             click.echo(e)
735 |         else:
736 |             click.echo(f"Successfully invited {member} into {project} as {permission}")
737 |     else:
738 |         try:
739 |             pjt.update_member(member, permission)
740 |         except Exception as e:
741 |             click.echo(e)
742 |         else:
743 |             click.echo(f"Successfully updated {member}'s permission to {permission}")
744 | 
745 | 
746 | @main.command(name="link", help="link local datafiles to project")
747 | @click.argument("project")
748 | @click.option(
749 |     "-d",
750 |     "--directory",
751 |     type=str,
752 |     help="target directory path",
753 |     required=False,
754 |     default=None,
755 | )
756 | @click.option(
757 |     "-e",
758 |     "--extension",
759 |     type=str,
760 |     help="target file extensions",
761 |     required=False,
762 |     default=None,
763 | )
764 | @base_config
765 | def data_link(project, directory, extension, user_id):
766 |     """
767 |     Create linker metadata for local datafiles.
768 |     Usage
769 |     -----
770 |     $ base link sample-project -d ../dataset -e wav
771 |     Arguments
772 |     ---------
773 |     project : str
774 |         project name which you are interested in
775 |     Parameters
776 |     ----------
777 |     user_id : str
778 |         registered user id
779 |     directory : str, default=None
780 |     extension : str, default=None
781 |     """
782 |     pjt = Project(project)
783 |     if directory is None:
784 |         directory = click.prompt(
785 |             "Where is your dataset? (select root of dataset directory)", type=str
786 |         )
787 |     if extension is None:
788 |         extension = []
789 |         extension = click.prompt(
790 |             "What is your data file extension? (ex: csv, jpg, png, wav)", type=str
791 |         )
792 |     if extension[0] == ".":
793 |         extension = extension[1:]
794 | 
795 |     try:
796 |         file_num = pjt.link_datafiles(directory, extension)
797 |     except Exception as e:
798 |         click.echo(e)
799 |     else:
800 |         click.echo("Check datafiles...")
801 |         click.echo(f"found {file_num} files with {extension} extension.")
802 |         click.echo("linked!")
803 | 
804 | 
805 | if __name__ == "__main__":
806 |     main()
807 | 
--------------------------------------------------------------------------------
/docs/CLI.md:
--------------------------------------------------------------------------------
1 | # Command Reference
2 | 
3 | Here we provide the specifications, complete descriptions, and comprehensive usage examples for `base` commands. For a list of commands, type `base --help`.
4 | 
5 | - [import](#import)
6 | - [invite](#invite)
7 | - [link](#link)
8 | - [list](#list)
9 | - [new](#new)
10 | - [rm](#rm)
11 | - [search](#search)
12 | - [show](#show)
13 | 
14 | ## import
15 | 
16 | ---
17 | 
18 | Import data files or external meta data files into a Base project.
19 | 
20 | **Synopsis**
21 | 
22 | ---
23 | 
24 | ```
25 | usage: base import project [-d <datafiles-dirpath>] [-e <datafile-extension>] [-c <path-parsing-rule>] [-m] [-p <external-filepath>] [-a <additional>]
26 | 
27 | positional arguments:
28 |   project  your project name to import.
29 | ```
30 | 
31 | **Description**
32 | 
33 | ---
34 | 
35 | This command provides a way to import meta data that is related to data file paths or defined in external files such as `.xlsx` or `.csv`.
36 | 
37 | You have to select the import mode: data files or external files.
38 | 
39 | If you want to import data files, you have to specify the `-d`, `-c` and `-e` options (or the prompt will ask you interactively).
40 | 
41 | Base will then take the actions below.
42 | 
43 | 1. Calculate the file hash.
44 | 2. Parse the file path with the `parsing-rule`.
45 | 3. Create meta data records with the file hash and parsed path data.
46 | 4. Add those records into the project database table.
47 | 
48 | ```
49 | {
50 |     "FileHash": String,
51 |     "MetaKey1": ...,
52 |     ...
53 | }
54 | ```
55 | 
56 | If you want to import external files, you have to specify the `-m` and `-p` options.
57 | 
58 | Base will then take the actions below.
59 | 
60 | 1. Extract tables from the external file.
61 | 2. Parse each table and detect headers in the table.
62 | 3. Set each header as a Key and create meta data records.
63 | 4. Link and update existing records with the new meta data records in the project database table.
64 | 
65 | ```
66 | {
67 |     "Table0,MetaKey1": ...,
68 |     ...
69 | }
70 | ```
71 | 
72 | **Options**
73 | 
74 | ---
75 | 
76 | - `-d <datafiles-dirpath>`, `--directory <datafiles-dirpath>` - specify a `datafiles-dirpath` to load data files which have the extension specified with the `-e` option. Base will search recursively.
77 | - `-e <datafile-extension>`, `--extension <datafile-extension>` - specify a `datafile-extension` to filter the targets when loading data files. if you have several extensions in one dataset (such as png and jpg), you have to split the loading workflow.
78 | - `-c <path-parsing-rule>`, `--parse <path-parsing-rule>` - specify a `path-parsing-rule` to extract meta data from each data file path.
79 | 
80 | ```
81 | - you can use {key-name} to parse phrases with key.
82 | - you can use {_} to ignore some phrases.
83 | - you have to use '/' as separator.
84 | 
85 | >>> sample parsing rule: {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
86 | ```
87 | The following options are used only when importing external files.
88 | - `-m`, `--external-file` - parse the content of the external files specified with the `-p` option.
89 | - `-p <external-filepath>`, `--path <external-filepath>` - specify an `external-filepath` to import external files. Base will parse the content of that file, extract the table data in it, and parse the tables.
90 | - `-a <additional>`, `--additional <additional>` - specify additional meta data you want to add to the whole file you import. the value must include a colon (":") between the `key name` and the `value string`. for instance, if you want to import and join an external file for only "test" data type files, you should specify it like `-a dataType:test`.
91 | - `--extract` - with this option, only extract the content of the external file; this does not link and update with existing tables. you can specify an output path with the `--export` and `--output` options to get the extract results.
92 | - `--export <export-file-type>` - if you want to convert the extract results into CSV, you can specify CSV as the `export-file-type`.
93 | - `--output <output-filepath>` - specify an `output-filepath` to save the dataset file. default is "./{external-filepath}_Table{number}.csv"
94 | - `--estimate-rule` - with this option, only estimate the joining rule from existing tables and the external files specified with the `-p` option; this does not link and update with existing tables.
95 | 
96 | **Example: Import png files on project "mnist"**
97 | 
98 | ---
99 | 
100 | ```
101 | # after you download mnist data files based on Tutorial1
102 | $ base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"
103 | ```
104 | 
105 | 
Output 106 | 107 | ``` 108 | Check datafiles... 109 | found 70000 files with png extension. 110 | 70000/70000 files uploaded. 111 | Success! 112 | ``` 113 |
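The same import can also be driven from Python via the SDK. A minimal sketch, assuming `Project.add_datafiles` accepts the directory, extension, and parsing rule exactly as `base/cli.py` passes them (the keyword names below mirror that call):

```python
import os.path

from base.project import Project

project = Project("mnist")
# SDK equivalent (assumed) of:
#   base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"
project.add_datafiles(
    os.path.expanduser("~/dataset/mnist"),       # root of the dataset directory (-d)
    "png",                                       # target file extension (-e)
    attributes={},                               # additional key:value pairs (-a)
    parsing_rule="{dataType}/{label}/{id}.png",  # path parsing rule (-c)
    detail_parsing_rule=None,
)
```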
114 | 115 | **Example: Import external csv file on project “mnist”** 116 | 117 | --- 118 | 119 | ``` 120 | # download external csv 121 | $ curl -SL https://raw.githubusercontent.com/youkaichao/mnist-wrong-test/master/wrongImagesInMNISTTestset.csv > ~/Downloads/wrongImagesInMNISTTestset.csv 122 | 123 | $ base import mnist --external-file --path ~/Downloads/wrongImagesInMNISTTestset.csv -a dataType:test 124 | ``` 125 | 126 |
Output 127 | 128 | ``` 129 | 1 tables found! 130 | now estimating the rule for table joining... 131 | 132 | 1 table joining rule was estimated! 133 | Below table joining rule will be applied... 134 | 135 | Rule no.1 136 | 137 | key 'index' -> connected to 'id' key on exist table 138 | key 'originalLabel' -> connected to 'label' key on exist table 139 | key 'correction' -> newly added 140 | 141 | 1 tables will be applied 142 | Table 1 sample record: 143 | {'index': 8, 'originalLabel': 5, 'correction': '-1'} 144 | 145 | Do you want to perform table join? 146 | Base will join tables with that rule described above. 147 | 148 | 'y' will be accepted to approve. 149 | If you need to modify it, please enter 'm' 150 | Definition YML file with estimated table join rules will be downloaded, then you can modify it and apply the new join rule. 151 | Enter a value: y 152 | Success! 153 | ``` 154 | If you enter 'm', definition YAML file with estimated table join rules will be downloaded. 155 | You can modify this file and execute the commands displayed in the terminal to apply the new join rule. 156 | 157 | ``` 158 | Do you want to perform table join? 159 | Base will join tables with that rule described above. 160 | 161 | 'y' will be accepted to approve. 162 | 163 | If you need to modify it, please enter 'm' 164 | Definition YML file with estimated table join rules will be downloaded, then you can modify it and apply the new join rule. 165 | Enter a value: m 166 | 167 | Downloaded a YAML file 'joinrule_definition_mnist.yml' in current directory. 168 | Key information for the new table and the existing table is as follows. 169 | 170 | 171 | ===== New Table1 ===== 172 | KEY NAME VALUE RANGE VALUE TYPE RECORDED COUNT 173 | 'index' 8 ~ 9850 int('index') 74 174 | 'originalLabel' 0 ~ 9 int('originalLabel') 74 175 | 'correction' -1 ~ 8or9 str('correction') 74 176 | 'dataType' test ~ test str('dataType') 74 177 | 178 | ===== Existing Table ===== 179 | KEY NAME VALUE RANGE VALUE TYPE RECORDED COUNT 180 | 'id' 0 ~ 59999 str('id') 70000 181 | 'label' 0 ~ 9 str('label') 70000 182 | 'dataType' test ~ train str('dataType') 70000 183 | 184 | You can apply the new join-rule according to 2 steps. 185 | 1. Modify the file 'joinrule_definition_mnist.yml'. Open the file to see a detailed description. 186 | 2. Execute the following command. 187 | base import mnist --external-file --additional dataType:test --join-rule joinrule_definition_mnist.yml 188 | 189 | Success! 190 | ``` 191 | joinrule_definition_mnist.yml 192 | ```yaml 193 | RequestedTime: 1654257223.4988642 194 | ProjectName: mnist 195 | Body: 196 | Table1: 197 | FilePath: /Users/user/Downloads/wrongImagesInMNISTTestset.csv 198 | JoinRules: 199 | index: id 200 | originalLabel: label 201 | correction: 202 | dataType: dataType 203 | ``` 204 | New join rules can be defined by modifying the `Body/Table/JoinRules` section. 205 | Fundamentally, this section consists of Key-Value Pairs. Key is the key name from the new table extracted from the external file. Value is the key name from the existing table. 206 | 207 | How to define join rules. 208 | if you have same key on the new table and the existing table, write like this. 209 | ```yaml 210 | 'New table key': 'Existing table key' 211 | ``` 212 | 213 | if you have new value on the existing key, write like this. 214 | ```yaml 215 | 'New table key': 'ADD:Existing table key' 216 | ``` 217 | 218 | if you have new key, no need to specify anything. 
219 | ```yaml 220 | 'New table key': 221 | ``` 222 | 223 | For example: 224 | ```yaml 225 | JoinRules: 226 | first_name: name 227 | age: ADD:Age 228 | height: 229 | ``` 230 | 1. "first_name: name" means to join the new key named "first_name" with the existing key named "name". 231 | 2. "age: ADD:Age" means to add new values of the new key named 'age' on the existing key named 'Age'. 232 | 3. "height: " means to add the key named "height" as a new key. 233 | 234 | 235 | 236 |
237 | 
238 | → [Back to top](#command-reference)
239 | 
240 | ## invite
241 | 
242 | Invite collaborators into your Base project.
243 | 
244 | **Synopsis**
245 | 
246 | ---
247 | 
248 | ```
249 | usage: base invite project [-m <member-id>] [-p <permission-level>] [-u]
250 | 
251 | positional arguments:
252 |   project  your project name to invite.
253 | ```
254 | 
255 | **Description**
256 | 
257 | ---
258 | 
259 | This command controls access to your project.
260 | 
261 | You can invite a new project member at one of the `permission level`s below.
262 | 
263 | - `Viewer` : can only read meta data on the project database. a viewer can not import data files or external files and can not control the permission of other members.
264 | - `Editor` : can read and write meta data into the project database. an editor can not control the permission of other members.
265 | - `Admin` : can read and write meta data into the project database. an admin can also control the permission of other members, but can not transfer the `Owner` permission level.
266 | - `Owner` : can transfer the `owner` permission to others, and delete the project completely.
267 | 
268 | You can also update a member's permission level with the `-u` option, if you are an admin or owner.
269 | 
270 | If you are the project owner and try to update another member's permission to `Owner`, that member will become the project owner and your permission will be downgraded to `Admin`.
271 | 
272 | 
273 | **Options**
274 | 
275 | ---
276 | 
277 | - `-m <member-id>`, `--member <member-id>` - specify a `member-id` to invite. if you will be invited by others, you have to tell them your user id.
278 | - `-p <permission-level>`, `--permission <permission-level>` - specify the permission level to grant: 'Viewer', 'Editor', 'Admin' or 'Owner'.
279 | - `-u`, `--update` - update the permission level of an existing project member.
280 | 
281 | 
282 | **Example: Invite a viewer into mnist**
283 | 
284 | ---
285 | 
286 | check your current project members on mnist with the `[base show --member-list](#show)` command
287 | 
288 | ```
289 | $ base show mnist --member-list
290 | ```
291 | 
292 | 
Output 293 | 294 | ``` 295 | project Members 296 | =============== 297 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 298 | ``` 299 |
300 | 
301 | then, invite zzzz@yyyy.com into mnist as a viewer
302 | 
303 | ```
304 | $ base invite mnist --member zzzz@yyyy.com --permission viewer
305 | ```
306 | 
307 | 
Output 308 | 309 | ``` 310 | Successfully invited zzzz@yyyy.com into mnist as Viewer 311 | ``` 312 |
313 | 
314 | finally, you can check the invited user in the project member list.
315 | 
316 | ```
317 | $ base show mnist --member-list
318 | ```
319 | 
320 | 
Output 321 | 322 | ``` 323 | project Members 324 | =============== 325 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 326 | zzzz@yyyy.com (Viewer, invited at 2022-03-12 13:45:04) 327 | ``` 328 |
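If you prefer the SDK, the same invitation can be sent from Python. A minimal sketch, assuming `Project.add_member` takes the member id and permission level just as `base/cli.py` calls it:

```python
from base.project import Project

# SDK equivalent (assumed) of: base invite mnist --member zzzz@yyyy.com --permission viewer
Project("mnist").add_member("zzzz@yyyy.com", "Viewer")
```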
329 | 330 | **Example: Update project member’s permission** 331 | 332 | --- 333 | 334 | check current your project members 335 | 336 | ``` 337 | $ base show mnist --member-list 338 | ``` 339 | 340 |
Output 341 | 342 | ``` 343 | project Members 344 | =============== 345 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 346 | zzzz@yyyy.com (Viewer, invited at 2022-03-12 13:45:04) 347 | ``` 348 |
349 | 350 | then, update permission of zzzz@yyyy.com to editor 351 | 352 | ``` 353 | $ base invite mnist --update --member zzzz@yyyy.com --permission editor 354 | ``` 355 | 356 |
Output

357 | 
358 | ```
359 | Successfully updated zzzz@yyyy.com's permission to Editor
360 | ```
361 | 
362 | 
363 | finally, you can check the updated user permission in the project member list.
364 | 
365 | ```
366 | $ base show mnist --member-list
367 | ```
368 | 
369 | 
Output 370 | 371 | ``` 372 | project Members 373 | =============== 374 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 375 | zzzz@yyyy.com (Editor, invited at 2022-03-12 13:45:04) 376 | ``` 377 |
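Updating a permission works the same way from Python. A minimal sketch, assuming `Project.update_member` mirrors the CLI call in `base/cli.py`:

```python
from base.project import Project

# SDK equivalent (assumed) of: base invite mnist --update --member zzzz@yyyy.com --permission editor
Project("mnist").update_member("zzzz@yyyy.com", "Editor")
```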
378 | 
379 | → [Back to top](#command-reference)
380 | 
381 | ## link
382 | 
383 | Link paths to data files on the local computer with meta data on the Base project.
384 | 
385 | **Synopsis**
386 | 
387 | ---
388 | 
389 | ```
390 | usage: base link project [-d <datafiles-dirpath>] [-e <datafile-extension>]
391 | 
392 | positional arguments:
393 |   project  your invited project name to link data files.
394 | ```
395 | 
396 | **Description**
397 | 
398 | ---
399 | 
400 | This command will link data files and meta data records on the Base project.
401 | 
402 | After being invited to a project, collaborators have to link their data files on the local computer.
403 | 
404 | The data files are often located in a different directory than the project owner's, and sometimes under different directory or file names.
405 | 
406 | So Base will create a linker to match local file paths, and this enables your collaborators to share the python script which loads local files.
407 | 
408 | **Options**
409 | 
410 | ---
411 | 
412 | - `-d <datafiles-dirpath>`, `--directory <datafiles-dirpath>` - specify a `datafiles-dirpath` to load data files which have the extension specified with the `-e` option. Base will search recursively.
413 | - `-e <datafile-extension>`, `--extension <datafile-extension>` - specify a `datafile-extension` to filter the targets when loading data files. if you have several extensions in one dataset (such as png and jpg), you have to split the loading workflow.
414 | 
415 | **Example: Link mnist data files into invited project**
416 | 
417 | ---
418 | 
419 | ```
420 | $ base link mnist --directory ~/Downloads/mnist --extension png
421 | ```
422 | 
423 | then, you can search and export the dataset as you want, or run a python modeling script shared by other collaborators.
424 | 
425 | 
Output 426 | 427 | ``` 428 | Check datafiles... 429 | found 70000 files with png extension. 430 | linked! 431 | ``` 432 |
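Linking can also be done from Python. A minimal sketch, assuming `Project.link_datafiles` returns the number of linked files, as it does when called from `base/cli.py`:

```python
import os.path

from base.project import Project

# SDK equivalent (assumed) of: base link mnist --directory ~/Downloads/mnist --extension png
file_num = Project("mnist").link_datafiles(os.path.expanduser("~/Downloads/mnist"), "png")
print(f"found {file_num} files with png extension.")
```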
433 | 
434 | → [Back to top](#command-reference)
435 | 
436 | ## list
437 | 
438 | Show the list of Base projects you can access.
439 | 
440 | **Synopsis**
441 | 
442 | ---
443 | 
444 | ```
445 | usage: base list [--archived]
446 | ```
447 | 
448 | **Description**
449 | 
450 | ---
451 | 
452 | This command will show you which projects you can access.
453 | 
454 | You can check the `Project UID`, your `Role` on the project ("Viewer", "Editor", "Admin" or "Owner"), whether the project is `private` or not, and the project `created date`.
455 | 
456 | **Options**
457 | 
458 | ---
459 | 
460 | - `--archived` - show archived projects
461 | 
462 | **Example: Check projects (not archived)**
463 | 
464 | ---
465 | 
466 | if you have project "mnist",
467 | 
468 | ```
469 | $ base list
470 | ```
471 | 
472 | 
Output 473 | 474 | ``` 475 | projects 476 | ======== 477 | [mnist] 478 | Project UID: abcdefghij0123456789 479 | Role: Owner 480 | Private Project: yes 481 | Created Date: 2022-03-11 18:18:54 482 | ``` 483 |
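The project list is also reachable from Python. A minimal sketch, assuming `get_projects` returns records with the same fields the CLI prints (`ProjectName`, `ProjectUid`, `UserRole`):

```python
from base.config import get_user_id
from base.project import get_projects

# SDK equivalent (assumed) of: base list
user_id = get_user_id()
for project in get_projects(user_id, archived=False):
    print(project["ProjectName"], project["ProjectUid"], project["UserRole"])
```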
484 | 485 | **Example: Check project (archived)** 486 | 487 | --- 488 | 489 | if you have archived project “fashion-mnist”, 490 | 491 | ``` 492 | $ base list --archived 493 | ``` 494 | 495 |
Output 496 | 497 | ``` 498 | projects 499 | ======== 500 | [fashion-mnist] 501 | Project UID: klmnopqrst0123456789 502 | Role: Owner 503 | Private Project: yes 504 | Created Date: 2022-03-16 01:38:29 505 | ``` 506 |
507 | 
508 | > Note: you can archive your projects with the [`base rm`](#rm) command.
509 | > 
510 | 
511 | → [Back to top](#command-reference)
512 | 
513 | ## new
514 | 
515 | Create a new Base project.
516 | 
517 | **Synopsis**
518 | 
519 | ---
520 | 
521 | ```
522 | usage: base new project
523 | 
524 | positional arguments:
525 |   project  your project name to create.
526 | ```
527 | 
528 | **Description**
529 | 
530 | ---
531 | 
532 | This command will create a database table for meta data.
533 | 
534 | 1. issues a 20-character `project unique id (Project UID)` and creates tables.
535 | 2. saves the Project UID in the `~/.base/projects` file on your local computer.
536 | 3. you can then use the `project name` as an alias for the Project UID with any Base command.
537 | 
538 | **Example**
539 | 
540 | ---
541 | 
542 | ```
543 | $ base new mnist
544 | ```
545 | 
546 | 
Output 547 | 548 | ``` 549 | Your Project UID 550 | ---------------- 551 | abcdefghij0123456789 552 | 553 | save Project UID in local file (~/.base/projects) 554 | ``` 555 |
556 | 
557 | then, the project uids will be saved in `~/.base/projects`.
558 | 
559 | ```
560 | $ cat ~/.base/projects
561 | [xxxx@yyyy.com]
562 | mnist = abcdefghij0123456789
563 | ```
564 | 
565 | > Note: your user id is saved in the Global section
566 | > 
567 | 
568 | → [Back to top](#command-reference)
569 | 
570 | ## rm
571 | 
572 | Archive or completely delete your Base projects.
573 | 
574 | **Synopsis**
575 | 
576 | ---
577 | 
578 | ```
579 | usage: base rm project [--confirm] [-m <member-id>]
580 | 
581 | positional arguments:
582 |   project  your project name to archive or delete.
583 | ```
584 | 
585 | **Description**
586 | 
587 | ---
588 | 
589 | This command provides a way to remove a project member, and to archive or delete your project.
590 | 
591 | If you specify the `-m` option, you can remove a project member from the project.
592 | 
593 | If not, Base will archive or delete the specified project.
594 | 
595 | To prevent unexpected deletion, we suggest archiving projects rather than deleting them.
596 | 
597 | If not deleted, you can restore your archived projects.
598 | 
599 | > Note: the delete project action can be performed only by the project owner.
600 | > 
601 | 
602 | **Options**
603 | 
604 | ---
605 | 
606 | - `--confirm` - delete an archived project completely (only Owner user)
607 | - `-m <member-id>`, `--member <member-id>` - specify a `member-id` to remove from the project. you can see your project member list with the `[base show --member-list](#show)` command.
608 | 
609 | **Example: Remove project member**
610 | 
611 | ---
612 | 
613 | check your project members on mnist with the `[base show --member-list](#show)` command
614 | 
615 | ```
616 | $ base show mnist --member-list
617 | ```
618 | 
619 | 
Output 620 | 621 | ``` 622 | project Members 623 | =============== 624 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 625 | zzzz@yyyy.com (Editor, invited at 2022-03-12 13:45:04) 626 | ``` 627 |
628 | 629 | then, remove zzzz@yyyy.com from mnist 630 | 631 | ``` 632 | $ base rm mnist --member zzzz@yyyy.com 633 | ``` 634 | 635 |
Output 636 | 637 | ``` 638 | zzzz@yyyy.com was removed from mnist 639 | ``` 640 |
641 | 
642 | finally, you can confirm the removed user is no longer in the project member list.
643 | 
644 | ```
645 | $ base show mnist --member-list
646 | ```
647 | 
648 | 
Output 649 | 650 | ``` 651 | project Members 652 | =============== 653 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 654 | ``` 655 |
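Removing a member is also possible from Python. A minimal sketch, assuming `Project.remove_member` accepts an iterable of member ids, as `base/cli.py` passes the tuple collected by the repeatable `-m` option:

```python
from base.project import Project

# SDK equivalent (assumed) of: base rm mnist --member zzzz@yyyy.com
# the CLI passes a tuple because -m can be repeated; a single-element tuple is used here
Project("mnist").remove_member(("zzzz@yyyy.com",))
```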
656 | 657 | **Example: Archive mnist project** 658 | 659 | --- 660 | 661 | ``` 662 | $ base rm mnist 663 | ``` 664 | 665 |
Output 666 | 667 | ``` 668 | mnist was Archived 669 | ``` 670 |
671 | 672 | then, you can check whether the project was archived with `[base list](#list)` command. 673 | 674 | **Example: Delete mnist project** 675 | 676 | --- 677 | 678 | ``` 679 | $ base rm mnist --confirm 680 | ``` 681 | 682 |
Output 683 | 684 | ``` 685 | mnist was Deleted 686 | ``` 687 |
688 | 689 | then, you can check whether the project was deleted with `[base list](#list)` command. 690 | 691 | ``` 692 | $ base list --archived 693 | ``` 694 | 695 |
Output 696 | 697 | ``` 698 | projects 699 | ======== 700 | ``` 701 |
702 | 
703 | > Note: once you delete a project, you can never restore its saved data.
704 | 
705 | 
706 | → [Back to top](#command-reference)
707 | 
708 | ## search
709 | 
710 | Search data files and export them based on the meta data of a Base project.
711 | 
712 | **Synopsis**
713 | 
714 | ---
715 | 
716 | ```
717 | usage: base search project [-q <query-condition>] [-c <value-conditions>]
718 |                            [-e <export-file-type>] [-o <output-filepath>] [-s]
719 | 
720 | positional arguments:
721 |   project  your project name to search.
722 | ```
723 | 
724 | **Description**
725 | 
726 | ---
727 | 
728 | This command provides a search engine for data files.
729 | 
730 | You can search for words in the meta data with the `-c` option, or set filters with the `-q` option.
731 | 
732 | You can also export the results as JSON or CSV with the `-e` and `-o` options.
733 | 
734 | > Note: if you have the same values on different keys, the condition filter can be confused and return a result you did not expect. for reliable filtering, you should specify the key name with the query option if some values are duplicated over 2 or more keys.
735 | 
736 | **Options**
737 | 
738 | ---
739 | 
740 | - `-q <query-condition>`, `--query <query-condition>` - specify a `query-condition` to filter the data files based on meta data. you can use various operators and specify multiple `query-condition`s.
741 | 
742 | ```
743 | [query grammar]
744 | {KeyName} {Operator} {Values}
745 | - add 1 space between each section
746 | - don't use spaces anywhere else
747 | >>> sample query condition: CategoryName == airplane
748 | 
749 | [operators]
750 | - == : equal
751 | - != : not equal
752 | - >= : greater than or equal
753 | - <= : less than or equal
754 | - > : greater than
755 | - < : less than
756 | - is : missing value (only 'None' is allowed as Values, ex. query='correction is None')
757 | - is not : any value (only 'None' is allowed as Values, ex. query='correction is not None')
758 | - in : inside the list of Values
759 | - not in : outside the list of Values
760 | ```
761 | 
762 | > Note: you have to follow the query grammar.
763 | > 
764 | - `-c <value-conditions>`, `--conditions <value-conditions>` - specify `value-conditions` to filter by meta data value. this mode is powerful because you do not have to know the KeyName of the meta data.
765 |     - if you specify multiple values of one meta data key, Base will return the union of the values.
766 |         - ex.) if you specify "airplane,automobile" and both of them are in the same meta data key "CategoryName", Base will interpret it as "CategoryName is airplane or automobile".
767 |     - if you specify multiple values of different meta data keys, Base will return the intersection of the values.
768 |         - ex.) if you specify "airplane,2007" and one of them is in the meta data key "CategoryName" and the other in "Timestamp", Base will interpret it as "CategoryName is airplane and also Timestamp is 2007".
769 |     - and you can combine these behaviors.
770 |         - ex.) if you specify "airplane,automobile,2007" and two of them are in the same meta data key "CategoryName" and one in "Timestamp", Base will interpret it as "(CategoryName is airplane and also Timestamp is 2007) or (CategoryName is automobile and also Timestamp is 2007)".
771 | 
772 | ```
773 | [conditions grammar]
774 | "{Value1},{Value2},..."
775 | - separate with comma
776 | >>> sample conditions: "airplane,automobile"
777 | ```
778 | 
779 | > Note: you have to follow the conditions grammar.
780 | > 
781 | - `-e <export-file-type>`, `--export <export-file-type>` - if you want to convert the search results into JSON or CSV, you can specify JSON or CSV as the `export-file-type`.
782 | - `-o <output-filepath>`, `--output <output-filepath>` - specify an `output-filepath` to save the dataset file.
default is “./dataset.json” or “./dataset.csv” 783 | - `-s`, `--summary` - summarize result and hide detail output 784 | 785 | **Example: Search mnist with value conditions** 786 | 787 | --- 788 | 789 | ``` 790 | $ base search mnist --conditions "train" --query "label in ['1','2','3']" 791 | ``` 792 | 793 |
Output 794 | 795 | ``` 796 | 18831 files 797 | ======== 798 | '/home/xxxx/dataset/mnist/train/1/42485.png' 799 | ... 800 | ``` 801 |
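The same filter can be expressed through the SDK (see the [Python Reference](SDK.md)); a minimal sketch using `Project.files`, which is what the CLI calls under the hood in `base/cli.py`:

```python
from base.project import Project

# SDK equivalent of:
#   base search mnist --conditions "train" --query "label in ['1','2','3']"
files = Project("mnist").files(conditions="train", query=["label in ['1','2','3']"])
print(len(files))  # number of matched files
print(files[0])    # File object for the first match (prints its local path)
```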
802 | 803 | **Example: Search mnist and export as JSON** 804 | 805 | --- 806 | 807 | ``` 808 | $ base search mnist --conditions "test" --query "correction != -1" --export JSON --output ./dataset.json 809 | ``` 810 | 811 |
Output 812 | 813 | ``` 814 | 9963 files 815 | ======== 816 | '/home/xxxx/dataset/mnist/test/7/3329.png' 817 | ... 818 | ``` 819 |
820 | 
821 | ```
822 | $ cat ./dataset.json
823 | {
824 |     "Data": [
825 |         {
826 |             "FilePath": "/home/xxxx/dataset/mnist/test/7/3329.png",
827 |             "label": "7",
828 |             "dataType": "test",
829 |             "id": "3329"
830 |         },
831 |         ...
832 |     ]
833 | }
834 | ```
835 | 
836 | → [Back to top](#command-reference)
837 | 
838 | ## show
839 | 
840 | Show detailed information about your Base project.
841 | 
842 | **Synopsis**
843 | 
844 | ---
845 | 
846 | ```
847 | usage: base show project [--member-list]
848 | 
849 | positional arguments:
850 |   project  your project name to show detail.
851 | ```
852 | 
853 | **Description**
854 | 
855 | ---
856 | 
857 | This command will show you what meta data is in your project.
858 | 
859 | Each meta data key has a `KeyName` (like "CategoryName") and a `KeyHash` to identify the meta data even if you change the KeyName.
860 | 
861 | The structure of the returned records looks like below.
862 | 
863 | ```
864 | {
865 |     "KeyHash": String,
866 |     "KeyName": String,
867 |     "RecordedCount": Number,
868 |     "Creator": String,
869 |     "LastEditor": String,
870 |     "EditerList": List,
871 |     "ValueHash": String,
872 |     "ValueType": String,
873 |     "UpperValue": String,
874 |     "LowerValue": String,
875 |     "UniqueValues": String,
876 |     "CreatedTime": String of unix time,
877 |     "LastModifiedTime": String of unix time
878 | }
879 | ```
880 | 
881 | **Options**
882 | 
883 | ---
884 | 
885 | - `--member-list` - show project members
886 | 
887 | **Example: Show mnist project**
888 | 
889 | ---
890 | 
891 | ```
892 | $ base show mnist
893 | ```
894 | 
895 | 
Output 896 | 897 | ``` 898 | project mnist 899 | =============== 900 | You have 70000 records with 4 keys in this project. 901 | 902 | [Keys Information] 903 | 904 | KEY NAME VALUE RANGE VALUE TYPE RECORDED COUNT 905 | 'id','index' 0 ~ 59999 str('id'), int('index') 70000 906 | 'correction' 0or6 ~ -1 str('correction') 74 907 | 'label','originalLabel' 0 ~ 9 str('label'), int('originalLabel') 70000 908 | 'dataType' test ~ train str('dataType') 70000 909 | ... 910 | ``` 911 |
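The same summary is reachable from Python. A minimal sketch, assuming `Project.get_metadata_summary` and `summarize_keys_information` behave as `base/cli.py` uses them (the summary dict carries `MaxRecordedCount`, `UniqueKeyCount`, and a `Keys` table):

```python
from base.project import Project, summarize_keys_information

# SDK equivalent (assumed) of: base show mnist
key_list = Project("mnist").get_metadata_summary()
summary = summarize_keys_information(key_list)
print(summary["MaxRecordedCount"], summary["UniqueKeyCount"])
for row in summary["Keys"]:  # first row is the header tuple
    print(row)
```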
912 | 913 | **Example: Show mnist project members** 914 | 915 | --- 916 | 917 | ``` 918 | $ base show mnist --member-list 919 | ``` 920 | 921 |
Output 922 | 923 | ``` 924 | project Members 925 | =============== 926 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 927 | zzzz@yyyy.com (Editor, invited at 2022-03-12 13:45:04) 928 | ``` 929 |
930 | 
931 | → [Back to top](#command-reference)
--------------------------------------------------------------------------------
/docs/SDK.md:
--------------------------------------------------------------------------------
1 | # Python Reference
2 | 
3 | - base.config
4 |   - [func check_project_exists](#checkprojectexists)
5 |   - [func delete_project_config](#deleteprojectconfig)
6 |   - [func get_access_key](#getaccesskey)
7 |   - [func get_project_uid](#getprojectuid)
8 |   - [func get_user_id](#getuserid)
9 |   - [func get_user_id_from_db](#getuseridfromdb)
10 |   - [func register_access_key](#registeraccesskey)
11 |   - [func register_project_uid](#registerprojectuid)
12 |   - [func register_user_id](#registeruserid)
13 |   - [func update_project_info](#updateprojectinfo)
14 | - base.dataset
15 |   - [class Dataset](#dataset-class)
16 | - base.file
17 |   - [class File](#file-class)
18 |   - [class Files](#files-class)
19 | - base.hash
20 |   - [func calc_file_hash](#calcfilehash)
21 | - base.parser
22 |   - [class Parser](#parser-class)
23 | - base.project
24 |   - [class Project](#project-class)
25 |   - [func archive_project](#archiveproject)
26 |   - [func create_project](#createproject)
27 |   - [func delete_project](#deleteproject)
28 |   - [func get_projects](#getprojects)
29 |   - [func summarize_keys_information](#summarizekeysinformation)
30 | 
31 | ## **check_project_exists()**
32 | 
33 | ```python
34 | function base.config.check_project_exists(user_id="string", project_name="string")
35 | ```
36 | 
37 | Check whether the project already exists or not
38 | 
39 | **Parameters**
40 | 
41 | - user_id (string) - required
42 |     - acquired user id from environment variable or config file
43 | - project_name (string) - required
44 |     - target project name
45 | 
46 | **Returns**
47 | 
48 | - project_exists (bool)
49 |     - whether the project already exists or not
50 | 
51 | → [Back to top](#python-reference)
52 | 
53 | ## **delete_project_config()**
54 | 
55 | ```python
56 | function base.config.delete_project_config(user_id="string", project_name="string")
57 | ```
58 | 
59 | Delete the config of the specified project.
60 | 
61 | **Parameters**
62 | 
63 | - user_id (string) - required
64 |     - acquired user id from environment variable or config file
65 | - project_name (string) - required
66 |     - target project name
67 | 
68 | → [Back to top](#python-reference)
69 | 
70 | ## **get_access_key()**
71 | 
72 | ```python
73 | function base.config.get_access_key()
74 | ```
75 | 
76 | Get the access key from the config file. If you have 'BASE_ACCESS_KEY' in your environment variables, Base will use it
77 | 
78 | **Returns**
79 | 
80 | - access_key (string)
81 |     - acquired API access key from environment variable or config file
82 | 
83 | → [Back to top](#python-reference)
84 | 
85 | ## **get_project_uid()**
86 | 
87 | ```python
88 | function base.config.get_project_uid(user_id="string", project_name="string")
89 | ```
90 | 
91 | Get the project uid from the project name.
92 | 
93 | **Parameters**
94 | 
95 | - user_id (string) - required
96 |     - acquired user id from environment variable or config file
97 | - project_name (string) - required
98 |     - target project name
99 | 
100 | **Returns**
101 | 
102 | - project_uid (string)
103 |     - project uid of the given project name
104 | 
105 | → [Back to top](#python-reference)
106 | 
107 | ## **get_user_id()**
108 | 
109 | ```python
110 | function base.config.get_user_id()
111 | ```
112 | 
113 | Get the user id from the config file. If you have 'BASE_USER_ID' in your environment variables, Base will use it
114 | 
115 | **Returns**
116 | 
117 | - user_id (string)
118 |     - acquired user id from environment variable or config file
119 | 
120 | → [Back to top](#python-reference)
121 | 
122 | ## **get_user_id_from_db()**
123 | 
124 | ```python
125 | function base.config.get_user_id_from_db(access_key="string")
126 | ```
127 | 
128 | Get the user id from the remote db.
129 | 
130 | **Parameters**
131 | 
132 | - access_key (string) - required
133 |     - acquired API access key from environment variable or config file
134 | 
135 | **Returns**
136 | 
137 | - user_id (string)
138 |     - acquired user id from database
139 | 
140 | → [Back to top](#python-reference)
141 | 
142 | ## **register_access_key()**
143 | 
144 | ```python
145 | function base.config.register_access_key(access_key="string")
146 | ```
147 | 
148 | Register the access key to the local config file.
149 | 
150 | **Parameters**
151 | 
152 | - access_key (string) - required
153 |     - API access key
154 | 
155 | → [Back to top](#python-reference)
156 | 
157 | ## **register_project_uid()**
158 | 
159 | ```python
160 | function base.config.register_project_uid(user_id="string", project="string", project_uid="string")
161 | ```
162 | 
163 | Register the project uid to the local config file.
164 | 
165 | **Parameters**
166 | 
167 | - user_id (string) - required
168 |     - acquired user id from environment variable or config file
169 | - project (string) - required
170 |     - target project name
171 | - project_uid (string) - required
172 |     - target project uid
173 | 
174 | → [Back to top](#python-reference)
175 | 
176 | ## **register_user_id()**
177 | 
178 | ```python
179 | function base.config.register_user_id(user_id="string")
180 | ```
181 | 
182 | Register the user id to the local config file.
183 | 
184 | **Parameters**
185 | 
186 | - user_id (string) - required
187 |     - target user id
188 | 
189 | → [Back to top](#python-reference)
190 | 
191 | ## **update_project_info()**
192 | 
193 | ```python
194 | function base.config.update_project_info(user_id="string")
195 | ```
196 | 
197 | Update the local project info with the remote.
198 | 
199 | **Parameters**
200 | 
201 | - user_id (string) - required
202 |     - acquired user id from environment variable or config file
203 | 
204 | → [Back to top](#python-reference)
205 | 
206 | ## **Dataset class**
207 | 
208 | ```python
209 | class base.dataset.Dataset
210 | ```
211 | 
212 | This is a middle-level (numpy or other) interface for datasets in Base. The Dataset class receives a Files object as an argument and processes each data file with the specified transform function. You can create a high-level (torch tensor or other) dataset interface, like the DataLoader of PyTorch, using this Dataset object.
213 | 
214 | ```python
215 | import base
216 | 
217 | project = base.Project("project-name")
218 | files = project.files(conditions="string", query=["string"], sort_key="string")
219 | dataset = base.Dataset(files=files, target_key="string", transform=None|Callable)
220 | ```
221 | 
222 | These are the available attributes:
223 | 
224 | - transform (Callable)
225 |     - preprocess function
226 | - target_key (string)
227 |     - object variable for modeling
228 | - files (Files)
229 |     - inherited dataset interface
230 | 
231 | These are the available methods:
232 | 
233 | - [train_test_split()](#traintestsplit)
234 | 
235 | ### **train_test_split()**
236 | 
237 | ```python
238 | x_train, x_test, y_train, y_test = dataset.train_test_split(split_rate=float)
239 | ```
240 | 
241 | This method splits the dataset into 2 folds. You can adjust the split ratio with the `split_rate` option.
You can adjust split ratio with `split_rate` option. 242 | 243 | **Parameters** 244 | 245 | - split_rate (float) - default 0.25 246 | - the ratio of test set 247 | 248 | **Returns** 249 | 250 | - x_train (list) 251 | - transformed train data 252 | - x_test (list) 253 | - transformed test data 254 | - y_train (list) 255 | - train label specified as target_key in Dataset class initialization 256 | - y_test (list) 257 | - test label specified as target_key in Dataset class initialization 258 | 259 | **Usage** 260 | Using the index operator [] on the Dataset class object, you can get the data transformed by user-defined preprocessing functions and label specified by target key. 261 | 262 | ```python 263 | def preprocess_func(path): 264 | image = Image.open(path) 265 | image = image.resize((28, 28)) 266 | image = np.array(image) 267 | return image 268 | 269 | test_files = Project("mnist").files(conditions="test") 270 | test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func) 271 | 272 | print(test_dataset[0]) 273 | >>>(array([[ 0, 0, ...]]), '7' 274 | ``` 275 | 276 | If transform is not specified, local path is returned by default. 277 | ```python 278 | test_files = Project("mnist").files(conditions="test") 279 | test_dataset = Dataset(test_files, target_key="label") 280 | 281 | print(test_dataset[0]) 282 | >>> '/Users/user/dataset/mnist/test/7/4815.png', '7' 283 | ``` 284 | 285 | For example: 286 | 287 | You can get X and y using for loops as in the following example. 288 | ```python 289 | def preprocess_func(path): 290 | image = Image.open(path) 291 | image = image.resize((28, 28)) 292 | image = np.array(image) 293 | return image 294 | 295 | def get_image_and_label(dataset, idx): 296 | X, label = dataset[idx] # label = "0" or "1" or "2" , ... 297 | y = int(label) 298 | # cerate one-hot vector 299 | y = np.eye(10)[y] 300 | return X, y 301 | 302 | test_files = Project("mnist").files(conditions="test") 303 | test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func) 304 | 305 | X_test = np.empty((len(test_dataset), 28, 28, 1)) 306 | y_test = np.empty((len(test_dataset), 10)) 307 | for i in range(len(test_dataset)): 308 | X_test[i], y_test[i] = get_image_and_label(test_dataset, i) 309 | ``` 310 | 311 | If you use `train_test_split()`, y_train and y_test are list of string obtained by target_key by default. 312 | ```python 313 | files = Project("mnist").files() 314 | dataset = Dataset(files, target_key="label", transform=preprocess_func) 315 | X_train, y_train, X_test, y_test = dataset.train_test_split(0.25) 316 | 317 | print(y_train) 318 | >>> ["1", "3", "4",...] 319 | ``` 320 | 321 | 322 | → [Back to top](#python-reference) 323 | 324 | ## **File class** 325 | 326 | ```python 327 | class base.files.File 328 | ``` 329 | 330 | Using the index operator [] on the Files class object, you can get the File class object at a specific index. 331 | 332 | ```python 333 | print(files[0]) 334 | >>> "/home/xxxx/dataset/mnist/0/12909.png" 335 | ``` 336 | 337 | These are the available attributes: 338 | 339 | - path (string) 340 | - local filepath. 341 | 342 | For example: 343 | 344 | ```python 345 | files[0].path 346 | >>> "/home/xxxx/dataset/mnist/0/12909.png" 347 | ``` 348 | 349 | - metadata (dict) 350 | whole dict of attributes (metadata) which related with this file. 
For example:

```python
files[0].metadata
>>> {
    "dataType": "train",
    "label": "0",
    "id": "12909"
}
```

- attrs (string)
  - attributes (metadata) related to this file, each accessible as an object attribute

For example:

```python
files[0].label
>>> "0"

files[0].id
>>> "12909"
```

→ [Back to top](#python-reference)

## **Files class**

```python
class base.files.Files
```

This is a low-level (file path) interface for datasets in Base. A Files object contains the File instances that matched your dataset filter.

```python
import base

project = base.Project("project-name")
files = project.files(conditions="string", query=["string"], sort_key="string")
```

You can filter data files and get a Files object simply by specifying criteria with the `files` method of `base.Project`.

**Using the index operator [] on a Files class object, you can get the [`File class`](#file-class) object at a specific index.**

For example:

```python
files[0]
>>> "/home/xxxx/dataset/mnist/0/12909.png"

files[0].label
>>> "0"

files[0].id
>>> "12909"
```

These are the available attributes:

- project_name (string)
  - registered project name
- user_id (string)
  - registered user id
- project_uid (string)
  - project unique hash
- conditions (string) - default `None`
  - value to search for files
- query (list) - default []
  - expression of key and value to search for files
- sort_key (string) - default `None`
  - key to sort files
- files (list)
  - list of File class objects
- result (list)
  - list of metadata_dict filtered by criteria
  ```python
  [
      {
          "FilePath": String,
          "MetaKey1": ...,
          ...
      },
      ...
  ]
  ```
- paths (list)
  - list of local filepaths
  ```python
  [
      "String",
      ...
  ]
  ```
- items (list)
  - list of metadata_dict other than filepath
  ```python
  [
      {
          "MetaKey1": ...,
          ...
      },
      ...
  ]
  ```

This is the available method:

- [filter()](#filter)

### **filter()**

```python
files = files.filter(conditions="string", query=["string"], sort_key="string")
```

This method applies an additional filter to an already filtered Files object. You can use this method repeatedly; see the example after this section.

**Parameters**

- conditions (string) - optional
  - value to search for files.

For example:

```python
conditions="0"
```

If you want to search by multiple criteria, you must provide comma (,) separated strings.

For example:

```python
conditions="0,1,2"
```

You will get files that meet at least one of the criteria.

**Note**

There must be no single-byte spaces between values.

- query (list) - default []
  - expression of key and value to search for files.

For example:

```python
query=["label == 0"]
```

You can use `==`, `!=`, `>`, `>=`, `<`, `<=`, `is`, `is not`, `in`, and `not in` as operators.
If you want to search by multiple criteria, you must provide a list of expressions.

For example:

```python
query=["label == 0", "id >= 10000"]
```

You will get files that meet all the criteria.

**Note**

A single-byte space is required before and after the operator.

- sort_key (string) - optional
  - key to sort files.

For example:

```python
sort_key="label"
```

**Returns**

- Files class
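As a quick illustration of repeated filtering (the key names and conditions here are hypothetical), each call returns a new, further narrowed Files object:

```python
import base

project = base.Project("mnist")
files = project.files(conditions="train")

# chain filters: each call returns a new, further filtered Files object
zeros = files.filter(query=["label == 0"])
early_zeros = zeros.filter(query=["id < 10000"], sort_key="id")

print(len(files), len(zeros), len(early_zeros))
```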
These are the available operators:

- [+ (concatenation)](#-concatenation)
- [| (union)](#-union)

### **+ (concatenation)**

Return a new Files object that is the concatenation of the two Files objects. You can chain this operator.

This operation does **not** deduplicate elements: if both Files objects contain the same File object, the resulting Files object contains that File object twice.

**Expression**

```python
concated_files = files1 + files2

# You can chain the operator.
concated_files = files1 + files2 + files3
concated_files2 = concated_files + files4
```

**Examples**

```python
files1 = project.files(conditions="0,1,2", query=['dataType == test'], sort_key="id")
files2 = project.files(conditions="0,1,2", query=['dataType == train'], sort_key="id")

files = files1 + files2
print(files)
>>> ======Files======
Files1(project_name='mnist', conditions='0,1,2', query=['dataType == test'], sort_key='id', file_num=3148)
Files2(project_name='mnist', conditions='0,1,2', query=['dataType == train'], sort_key='id', file_num=18624)
===Expressions===
Files1 + Files2

print(len(files))
>>> 21772
```

### **| (union)**

Return a new Files object that is the union of the two Files objects. You can chain this operator.

This operation guarantees that all File objects in the resulting Files object are unique.

**Expression**

```python
union_files = files1 | files2

# You can chain the operator.
union_files = files1 | files2 | files3
union_files2 = union_files | files4
```

**Examples**

```python
files1 = project.files(conditions="0,1,2", sort_key="id")
files2 = project.files(conditions="0", sort_key="id")

files = files1 | files2
print(files)
>>> ======Files======
Files1(project_name='mnist', conditions='0,1,2', query=[], sort_key='id', file_num=21772)
Files2(project_name='mnist', conditions='0', query=[], sort_key='id', file_num=6905)
===Expressions===
Files1 or Files2

print(len(files))
>>> 21772
```

→ [Back to top](#python-reference)

## **calc_file_hash()**

```python
function base.hash.calc_file_hash(path="string", algorithm="md5"|"sha224"|"sha256"|"sha384"|"sha512"|"sha1", split_chunk=False|True, chunk_size=int)
```

Calculate the hash value of a file.

**Parameters**

- path (string) - required
  - target file path
- algorithm (string) - default "sha256"
  - hash algorithm name
- split_chunk (bool) - default True
  - if True, split a large file into byte chunks
- chunk_size (integer) - default 2048
  - block byte size of a chunk

**Returns**

- digest (string)
  - hash string of the input file
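**Usage**

A minimal sketch using one of the repository's sample files (the path is relative to the repository root; adjust it for your environment):

```python
from base.hash import calc_file_hash

# default algorithm is "sha256"; chunked reading keeps memory usage low
digest = calc_file_hash("tests/data/sample.csv")
print(digest)

# a different algorithm, reading the whole file at once
md5_digest = calc_file_hash("tests/data/sample.csv", algorithm="md5", split_chunk=False)
print(md5_digest)
```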
→ [Back to top](#python-reference)

## **Parser class**

```python
class base.parser.Parser
```

This is a file path parser. When you call the `add_datafiles` method of `base.Project`, Base initializes a Parser object with the specified parsing rule and tries to extract metadata from each file path with the `__call__` method.

```python
from base.parser import Parser

parser = Parser(parsing_rule="string", sep=None|"string")
result = parser(path="string")
```

### **\_\_init\_\_()**

Initialize self with parsing_rule and generate a parser.

```python
base.parser.Parser(parsing_rule="string", sep=None|"string")
```

1. Replace unused strings with `{_}` in `parsing_rule`
2. Extract keys enclosed in `{}`

* Example of processing method

```Raw
1. parsing_rule: hoge/{num1}/fuga/{num2}.txt
   -> {_}/{num1}/{_}/{num2}.txt

2. {_}/{num1}/{_}/{num2}.txt
   -> ["_", "num1", "_", "num2"]
```

**Parameter**

- parsing_rule (string) - required
  - specified parsing rule
    ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
- sep (string) - optional
  - the separator of the file path

### **\_\_call\_\_()**

Parse your target path.

```python
parser(path="string")
```

1. Convert the file path string to a parsable format.
2. Extract values enclosed in `{}` in the parsable formatted path.
3. Generate a dictionary from the keys and values extracted with `parsing_rule`.

* Example of processing method

```Raw
1. path: mnist/train/0/12909.png
   -> {mnist}/{train}/{0}/{12909}.png

2. parsable format: {mnist}/{train}/{0}/{12909}.png
   -> ["mnist", "train", "0", "12909"]

3. keys  : ["_", "dataType", "label", "id"]
   values: ["mnist", "train", "0", "12909"]
   -> {"dataType": "train", "label": "0", "id": "12909"}
```

**Parameters**

- path (string) - required
  - the file path

**Return**

- parsed_dict (dict)
  - metadata dictionary

These are the available methods:

- [is_path_parsable()](#is_path_parsable)
- [update_rule()](#update_rule)

### **is_path_parsable()**

Verify that the specified parsing rule works properly for a given path. If not, it returns False.

```python
parser.is_path_parsable(path="string")
```

**Parameter**

- path (string) - required
  - the file path

**Return**

- parsable_flag (bool)
  - True if the file path is parsable

### **update_rule()**

Generate a parser that takes the number of separators into account, based on a concrete parsing example.

Use this method when `is_path_parsable("your-path")` is False.

```python
parser.update_rule(parsing_rule="string")
```

**Parameters**

- parsing_rule (string) - required
  - detailed parsing rule
    ex.) {Origin}/{train}/{2022_04_05}-{dog}_{a01}.png
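**Usage**

A minimal sketch of the check-then-update flow (the rules and paths here are hypothetical):

```python
from base.parser import Parser

parser = Parser(parsing_rule="{_}/{dataType}/{label}/{id}.png")

path = "mnist/train/0/12909.png"
if not parser.is_path_parsable(path):
    # give a concrete example so the parser can count separators correctly
    parser.update_rule("{mnist}/{train}/{0}/{12909}.png")

print(parser(path))
# -> {"dataType": "train", "label": "0", "id": "12909"}
```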
→ [Back to top](#python-reference)

## **Project class**

```python
class base.project.Project
```

This is the base class of a project. You have to initialize it with an existing project name. If you specify a project name which you don't have, you will get a ValueError; in that case, retry after calling the `base.project.create_project` function.

```python
import base

project = base.Project("project-name")
```

These are the available attributes:

- project_name (string)
  - registered project name
- user_id (string)
  - registered user id
- project_uid (string)
  - project unique hash

These are the available methods:

- [add_datafile()](#add_datafile)
- [add_datafiles()](#add_datafiles)
- [add_member()](#add_member)
- [add_metafile()](#add_metafile)
- [extract_metafile()](#extract_metafile)
- [estimate_join_rule()](#estimate_join_rule)
- [files()](#files)
- [get_members()](#get_members)
- [get_metadata_summary()](#get_metadata_summary)
- [link_datafiles()](#link_datafiles)
- [remove_member()](#remove_member)
- [update_member()](#update_member)

### **add_datafile()**

Import metadata of one file.

```python
project.add_datafile(file_path="string", attributes={"string":"string"})
```

1. Calculate the file hash.
2. Create a metadata record with the file hash and attributes.
3. Add that record to the project database table.

```python
{
    "FileHash": String,
    "MetaKey1": ...,
    ...
}
```

**Parameters**

- file_path (string) - required
  - the file path
- attributes (dict) - default {}
  - the extra metadata (attributes)

**Raises**

- Exception
  - raises if something went wrong with the upload request to the server

### **add_datafiles()**

Import metadata related to datafile paths.

```python
project.add_datafiles(dir_path="string", extension="string", attributes={"string":"string"}, parsing_rule="string", detail_parsing_rule="string")
```

1. Calculate the file hashes.
2. Parse the file paths with `parsing_rule`.
3. Create metadata records with the file hashes, attributes, and parsed path data.
4. Add those records to the project database table.

```python
{
    "FileHash": String,
    "MetaKey1": ...,
    ...
}
```

**Parameters**

- dir_path (string) - required
  - the root directory path for datafiles
- extension (string) - required
  - the extension of datafiles
- attributes (dict) - default {}
  - the extra metadata (attributes) combined with all datafiles
- parsing_rule (string) - optional
  - the rule for extracting metadata from datafile paths
    ex.) {_}/{disease}/{patient-id}-{part}-{iteration}.png
- detail_parsing_rule (string) - optional
  - detailed information about the parsing rule
    ex.) {_}/{CancerA}/{1-123}-{1}-{100}.png

**Returns**

- file_num (integer)
  - number of imported datafiles

**Raises**

- ValueError
  - raises if an invalid parsing rule was specified
- Exception
  - raises if something went wrong with the upload request to the server
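**Usage**

A minimal sketch of importing a directory of files (the directory layout, path, and keys are hypothetical):

```python
import base

project = base.Project("mnist")

# e.g. files laid out as .../mnist/<dataType>/<label>/<id>.png
file_num = project.add_datafiles(
    dir_path="/home/xxxx/dataset/mnist",
    extension="png",
    attributes={"origin": "MNIST"},
    parsing_rule="{_}/{dataType}/{label}/{id}.png",
)
print(file_num)  # number of imported datafiles
```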
### **add_member()**

Invite a new project member.

```python
project.add_member(member="string", permission_level="string")
```

**Parameters**

- member (string) - required
  - the user id of the new member
- permission_level (string) - required
  - the new member's permission level
  - Viewer
    can only read metadata in the project database.
    A viewer can not import data files or external files
    and can not control permissions of other members.
  - Editor
    can read and write metadata in the project database.
    An editor can not control permissions of other members.
  - Admin
    can read and write metadata in the project database.
    An admin can also control permissions of other members,
    but can not transfer the Owner permission level.

**Raises**

- ValueError
  - raises if an invalid permission level was specified
- Exception
  - raises if something went wrong with the invite request to the server

### **add_metafile()**

Import metadata from external files.

```python
project.add_metafile(file_path=["string"], attributes={"string":"string"})
```

**Parameters**

- file_path (list) - required
  - list of external file paths
- attributes (dict) - default {}
  - the extra metadata (attributes) combined with all datafiles

**Raises**

- ValueError
  - raises if a specified external file is not a csv or excel file
- Exception
  - raises if something went wrong with the upload request to the server

### **extract_metafile()**

Only extract metadata from an external file, without importing it.

```python
project.extract_metafile(file_path="string", attributes={"string":"string"})
```

**Parameters**

- file_path (string) - required
  - the external file path
- attributes (dict) - default {}
  - the extra metadata (attributes) combined with all datafiles

**Returns**

- tables (list)
  - list of table data extracted from the external file

```JavaScript
[
    [
        {
            "MetaKey1": ...,
            "MetaKey2": ...,
            ...
        },
        ...
    ],
    ...
]
```

**Raises**

- ValueError
  - raises if the specified external file is not a csv or excel file
- Exception
  - raises if something went wrong with the request to the server

### **estimate_join_rule()**

Only estimate the join rule from an external file and the existing table.

```python
project.estimate_join_rule(file_path="string", tables=list)
```

**Parameters**

Either file_path or tables is required. If both are specified, tables take precedence.

- file_path (string)
  - the external file path
- tables (list)
  - output of the base.Project().extract_metafile() method

**Returns**

- join_rule (list)
  - list of join rules estimated from the external file and the existing table

```JavaScript
[
    {
        "new key1": "exist key1",
        ...
    },
    ...
]
```

**Raises**

- ValueError
  - raises if the specified external file is not a csv or excel file
- Exception
  - raises if something went wrong with the request to the server

### **files()**

Return the [`Files class`](#files-class).
You can filter files easily and simply by specified criteria.

```python
files = project.files(conditions="string", query=["string"], sort_key="string")
```

**Parameters**

- conditions (string) - optional
  - value to search for files

For example:

```python
conditions="0"
```

If you want to search by multiple criteria, you must provide comma (,) separated strings.

For example:

```python
conditions="0,1,2"
```

You will get files that meet at least one of the criteria.

- query (list) - default []
  - expression of key and value to search for files

For example:

```python
query=["label == 0"]
```

You can use `==`, `!=`, `>`, `>=`, `<`, `<=`, `is`, `is not`, `in`, and `not in` as operators.

If you want to search by multiple criteria, you must provide a list of expressions.

For example:

```python
query=["label == 0", "id >= 10000"]
```

You will get files that meet all the criteria.

**Note**

A single-byte space is required before and after the operator.

- sort_key (string) - optional
  - key to sort files.

For example:

```python
sort_key="label"
```

**Returns**

- [`Files class`](#files-class)
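**Usage**

For example, combining all three parameters (the project and key names mirror the examples above):

```python
import base

project = base.Project("mnist")

# files labeled 0, 1, or 2 in the test set, sorted by id
files = project.files(conditions="0,1,2", query=["dataType == test"], sort_key="id")
print(len(files))
print(files[0].path)
```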
### **get_members()**

Get the list of project members.

```python
project.get_members()
```

**Returns**

- member_list (list)
  - list of each member's information

```JavaScript
[
    {
        "UserID": String,
        "UserRole": String,
        "CreatedTime": String of unix time
    },
    ...
]
```

**Raises**

- Exception
  - raises if something went wrong with the request to the server

### **get_metadata_summary()**

Get the list of metadata information.

```python
project.get_metadata_summary()
```

**Returns**

- key_list (list)
  - list of information about each metadata key

```JavaScript
[
    {
        "KeyHash": String,
        "KeyName": String,
        "ValueHash": String,
        "ValueType": String,
        "RecordedCount": Integer,
        "UpperValue": String,
        "LowerValue": String,
        "CreatedTime": String of unix time,
        "LastModifiedTime": String of unix time,
        "Creator": String,
        "LastEditor": String,
        "EditerList": List of String
    },
    ...
]
```

**Raises**

- Exception
  - raises if something went wrong with the request to the server

### **link_datafiles()**

Create linker metadata to local datafiles.

```python
project.link_datafiles(dir_path="string", extension="string")
```

**Parameters**

- dir_path (string) - required
  - the root directory path for datafiles
- extension (string) - required
  - the extension of datafiles

**Returns**

- file_num (integer)
  - number of linked datafiles
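**Usage**

For example (the directory path is hypothetical):

```python
import base

project = base.Project("mnist")

# create linker metadata for the local datafiles under dir_path
file_num = project.link_datafiles(dir_path="/home/xxxx/dataset/mnist", extension="png")
print(file_num)  # number of linked datafiles
```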
### **remove_member()**

Remove a project member.

```python
project.remove_member(member=["string"]|"string")
```

**Parameters**

- member (list or string) - required
  - the target member(s) to remove

**Raises**

- Exception
  - raises if something went wrong with the removal request to the server

### **update_member()**

Update a project member's permission.

```python
project.update_member(member="string", permission_level="Viewer"|"Editor"|"Admin"|"Owner")
```

**Parameters**

- member (string) - required
  - the user id of an existing member
- permission_level (string) - required
  - the member's new permission level
  - Viewer
    can only read metadata in the project database.
    A viewer can not import data files or external files
    and can not control permissions of other members.
  - Editor
    can read and write metadata in the project database.
    An editor can not control permissions of other members.
  - Admin
    can read and write metadata in the project database.
    An admin can also control permissions of other members,
    but can not transfer the Owner permission level.
  - Owner
    can transfer the Owner permission to others,
    and delete the project completely.

**Raises**

- ValueError
  - raises if an invalid permission level was specified
- Exception
  - raises if something went wrong with the update request to the server

→ [Back to top](#python-reference)

## **archive_project()**

```python
function base.project.archive_project(user_id="string", project_name="string")
```

Archive a project.

**Parameters**

- user_id (string) - required
  - registered user id
- project_name (string) - required
  - the project name you want to archive

**Raises**

- Exception
  - raises if something went wrong with the request to the server

## **create_project()**

```python
function base.project.create_project(user_id="string", project_name="string", private=True|False)
```

Create a new project.

**Parameters**

- user_id (string) - required
  - registered user id
- project_name (string) - required
  - the project name you want to create
- private (bool) - default True
  - specifies whether or not to allow public access to the project

**Returns**

- project_uid (string)
  - project unique hash

**Raises**

- Exception
  - raises if something went wrong with the request to the server

## **delete_project()**

```python
function base.project.delete_project(user_id="string", project_name="string")
```

Delete a project.

**Parameters**

- user_id (string) - required
  - registered user id
- project_name (string) - required
  - the archived project name you want to delete

**Raises**

- Exception
  - raises if something went wrong with the request to the server

## **get_projects()**

```python
function base.project.get_projects(user_id="string", archived=False|True)
```

Get the list of projects.

**Parameters**

- user_id (string) - required
  - registered user id
- archived (bool) - default False
  - if False, return non-archived projects; if True, return archived projects

**Returns**

- project_list (list)
  - list of project names you have

**Raises**

- Exception
  - raises if something went wrong with the request to the server
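Putting these functions together, here is a minimal lifecycle sketch. The user id and project name are hypothetical; note that a project must be archived before it can be deleted.

```python
from base.project import (
    archive_project,
    create_project,
    delete_project,
    get_projects,
)

user_id = "your-user-id"  # hypothetical; use your registered user id

project_uid = create_project(user_id=user_id, project_name="my-project")
print(get_projects(user_id=user_id))  # active projects

# archive first, then delete
archive_project(user_id=user_id, project_name="my-project")
print(get_projects(user_id=user_id, archived=True))  # archived projects
delete_project(user_id=user_id, project_name="my-project")
```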
→ [Back to top](#python-reference)

## **summarize_keys_information()**

```python
function base.project.summarize_keys_information(metadata_summary=list)
```

Summarize the information of keys on a project for printing.

**Parameters**

- metadata_summary (list) - required
  - output of the base.Project().get_metadata_summary() method

**Returns**

- summary_for_print (dict)
  - summarized key information for printing

```JavaScript
{
    "MaxRecordedCount": Integer,
    "UniqueKeyCount": Integer,
    "MaxCharCount": {
        "KEY NAME": Integer,
        "VALUE RANGE": Integer,
        "VALUE TYPE": Integer,
        "RECORDED COUNT": Integer
    },
    "Keys": [
        (
            KeyName: String,
            ValueRange: String,
            ValueType: String,
            RecordedCount: String
        ),
        ...
    ]
}
```
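**Usage**

A minimal sketch (assuming the "mnist" project exists) of pairing this function with `get_metadata_summary()`:

```python
import base
from base.project import summarize_keys_information

project = base.Project("mnist")

# summarize_keys_information consumes the raw summary returned by get_metadata_summary()
summary = project.get_metadata_summary()
print(summarize_keys_information(summary))
```

→ [Back to top](#python-reference)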