├── poetry.toml
├── mypy.ini
├── tests
│   ├── data
│   │   ├── sample.csv
│   │   ├── sample.jpeg
│   │   ├── sample.xlsx
│   │   └── 2022_04_14_rocket.png
│   ├── test_hash.py
│   ├── test_parser.py
│   ├── test_config.py
│   ├── test_project.py
│   └── test_cli.py
├── example
│   ├── wrongImagesInMNISTTestset.xlsx
│   └── prepare.sh
├── .github
│   ├── pull_request_template.md
│   ├── ISSUE_TEMPLATE
│   │   ├── feature_request.md
│   │   └── bug_report.md
│   └── workflows
│       ├── ci.yml
│       └── ci_dev.yml
├── base
│   ├── __init__.py
│   ├── hash.py
│   ├── exception.py
│   ├── spinner.py
│   ├── dataset.py
│   ├── config.py
│   ├── parser.py
│   ├── files.py
│   └── cli.py
├── CONTRIBUTING.md
├── LICENSE
├── pyproject.toml
├── .gitignore
├── download_mnist.py
├── CODE_OF_CONDUCT.md
├── README.md
└── docs
    ├── CLI.md
    └── SDK.md

--------------------------------------------------------------------------------
/poetry.toml:
--------------------------------------------------------------------------------
1 | [virtualenvs]
2 | in-project = true
--------------------------------------------------------------------------------
/mypy.ini:
--------------------------------------------------------------------------------
1 | [mypy]
2 | ignore_missing_imports = True
--------------------------------------------------------------------------------
/tests/data/sample.csv:
--------------------------------------------------------------------------------
1 | key1,key2,key3
2 | 1,2,3
3 | 4,5,6
4 | 7,8,9
--------------------------------------------------------------------------------
/tests/data/sample.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/tests/data/sample.jpeg
--------------------------------------------------------------------------------
/tests/data/sample.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/tests/data/sample.xlsx
--------------------------------------------------------------------------------
/tests/data/2022_04_14_rocket.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/tests/data/2022_04_14_rocket.png
--------------------------------------------------------------------------------
/example/wrongImagesInMNISTTestset.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adansons/base/HEAD/example/wrongImagesInMNISTTestset.xlsx
--------------------------------------------------------------------------------
/.github/pull_request_template.md:
--------------------------------------------------------------------------------
1 | close #your_issue_id
2 | 
3 | # Motivation
4 | 
5 | # Description of the changes
6 | 
7 | # Example
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: 🪄 Feature Request
3 | about: Request a new feature or enhancement
4 | labels: 0_enhancement
5 | ---
6 | 
7 | # Motivation
8 | 
9 | # Description
10 | 
11 | # Additional context (optional)
12 | 
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: 🐛 Bug Report
3 | about: Create a bug report
4 | labels: 1_bug
5 | ---
6 | 
7 | # Expected behavior
8 | 
9 | # Error messages, stack traces, or logs
10 | 
11 | # Steps to reproduce
12 | 
13 | # Additional context (optional)
14 | 
--------------------------------------------------------------------------------
/base/__init__.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2021 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | 
6 | from .project import Project
7 | from .dataset import Dataset
8 | 
9 | 
10 | # check that the local cache directory and files exist
11 | import os
12 | 
13 | CACHE_DIR = os.path.join(os.path.expanduser("~"), ".base")
14 | CONFIG_FILE = os.path.join(os.path.expanduser("~"), ".base", "config")
15 | PROJECT_FILE = os.path.join(os.path.expanduser("~"), ".base", "projects")
16 | 
17 | os.makedirs(CACHE_DIR, exist_ok=True)
18 | 
19 | # initialize with empty files
20 | if not os.path.exists(CONFIG_FILE):
21 |     open(CONFIG_FILE, "w").close()
22 | if not os.path.exists(PROJECT_FILE):
23 |     open(PROJECT_FILE, "w").close()
24 | 
25 | VERSION = "0.1.3"
26 | __version__ = VERSION
27 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contribution Guide
2 | 
3 | Thanks for your interest in helping improve Adansons Base!
4 | 
5 | 1. Please check existing issues to see whether someone is already working on the same thing.
6 | 
7 | 2. Create a new issue if you want to fix a large bug or add a new feature.
8 | 
9 | 3. Create a new branch or fork this repository. (Use a branch name that makes clear which issue it relates to, e.g. `feature/#100` or `fix/#101`.)
10 | 
11 | 4. Run the code formatter `black` and check that `pytest` passes after you finish your work.
12 | 
13 | 5. Create a pull request against the `main` branch and specify reviewers.
14 | 
15 | ## Development details
16 | 
17 | ### 1. Set up the development environment
18 | 
19 | Check your Poetry installation.
20 | 
21 | If you haven't installed Poetry yet, please follow [the official instructions](https://python-poetry.org/docs/#installation).
22 | 
23 | If `poetry --help` works, you're good to go.
24 | 
25 | ```Bash
26 | poetry install
27 | ```
28 | 
29 | ### 2. Running tests
30 | 
31 | ```Bash
32 | poetry run pytest tests/
33 | ```
34 | 
35 | ### 3. Format source
36 | 
37 | ```Bash
38 | poetry run black .
39 | ```
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2022 Adansons Inc
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/example/prepare.sh:
--------------------------------------------------------------------------------
1 | # Partially adapted from https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh
2 | 
3 | # script to extract the ImageNet dataset
4 | # ILSVRC2012_img_val.tar (about 6.3 GB)
5 | # make sure ILSVRC2012_img_val.tar is in your current directory
6 | #
7 | # Adapted from:
8 | # https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md
9 | # https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4
10 | #
11 | # imagenet/val/
12 | # ├── n01440764
13 | # │   ├── ILSVRC2012_val_00000293.JPEG
14 | # │   ├── ILSVRC2012_val_00002138.JPEG
15 | # │   ├── ......
16 | # ├── ......
17 | 
18 | # Make the imagenet directory
19 | #
20 | mkdir imagenet
21 | 
22 | # Extract the validation data and move images to subfolders:
23 | #
24 | # Create validation directory; move .tar file; change directory; extract validation .tar; remove compressed file
25 | mkdir imagenet/val && mv ILSVRC2012_img_val.tar imagenet/val/ && cd imagenet/val && tar -xvf ILSVRC2012_img_val.tar && rm -f ILSVRC2012_img_val.tar
26 | # get the valprep.sh script from soumith and run it; this script creates all class directories and moves images into the corresponding directories
27 | wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | zsh
28 | #
29 | # This results in a validation directory like so:
30 | #
31 | # imagenet/val/
32 | # ├── n01440764
33 | # │   ├── ILSVRC2012_val_00000293.JPEG
34 | # │   ├── ILSVRC2012_val_00002138.JPEG
35 | # │   ├── ......
36 | # ├── ......
37 | #
38 | #
39 | # Check the total file count after extraction
40 | #
41 | # $ find val/ -name "*.JPEG" | wc -l
42 | # 50000
43 | #
--------------------------------------------------------------------------------
/base/hash.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import hashlib
6 | 
7 | 
8 | HASH_FUNCS = {
9 |     "md5": hashlib.md5,
10 |     "sha224": hashlib.sha224,
11 |     "sha256": hashlib.sha256,
12 |     "sha384": hashlib.sha384,
13 |     "sha512": hashlib.sha512,
14 |     "sha1": hashlib.sha1,
15 | }
16 | 
17 | 
18 | def calc_file_hash(
19 |     path: str,
20 |     algorithm: str = "sha256",
21 |     split_chunk: bool = True,
22 |     chunk_size: int = 2048,
23 | ) -> str:
24 |     """
25 |     Calculate the hash value of a file
26 | 
27 |     Parameters
28 |     ----------
29 |     path : str
30 |         target file path
31 |     algorithm : {"md5", "sha224", "sha256", "sha384", "sha512", "sha1"}, default="sha256"
32 |         hash algorithm name
33 |     split_chunk : bool, default=True
34 |         if True, read a large file in byte chunks
35 |     chunk_size : int, default=2048
36 |         number of hash blocks per read chunk (bytes read = chunk_size * block_size)
37 | 
38 |     Returns
39 |     -------
40 |     digest : str
41 |         hash string of the input file
42 |     """
43 |     hash_func = HASH_FUNCS[algorithm]()
44 | 
45 |     with open(path, "rb") as f:
46 |         if split_chunk:
47 |             while True:
48 |                 chunk = f.read(chunk_size * hash_func.block_size)
49 |                 if len(chunk) == 0:
50 |                     break
51 | 
52 |                 hash_func.update(chunk)
53 |         else:
54 |             chunk = f.read()
55 |             hash_func.update(chunk)
56 | 
57 |     digest = hash_func.hexdigest()
58 |     return digest
59 | 
60 | 
61 | if __name__ == "__main__":
62 |     pass
63 | 
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.poetry]
2 | name = "adansons-base"
3 | version = "0.1.3"
4 | description = "Adansons Base"
5 | readme = "README.md"
6 | authors = ["Adansons Developers <engineer@adansons.co.jp>"]
7 | homepage = ""
8 | repository = "https://github.com/adansons/base"
9 | license = "MIT"
10 | packages = [
11 |     { include = "base"},
12 | ]
13 | classifiers = [
14 |     "Development Status :: 4 - Beta",
15 |     "Intended Audience :: Science/Research",
16 |     "Intended Audience :: Information Technology",
17 |     "Intended Audience :: Developers",
18 |     "License :: OSI Approved :: MIT License",
19 |     "Programming Language :: Python :: 3.8",
20 |     "Programming Language :: Python :: 3.9",
21 |     "Programming Language :: Python :: 3.10",
22 |     "Topic :: Database",
23 |     "Topic :: Scientific/Engineering",
24 |     "Topic :: Scientific/Engineering :: Artificial Intelligence",
25 |     "Topic :: Scientific/Engineering :: Information Analysis",
26 |     "Operating System :: MacOS",
27 |     "Operating System :: Microsoft :: Windows",
28 |     "Operating System :: POSIX :: Linux"
29 | ]
30 | 
31 | [tool.poetry.scripts]
32 | base = 'base.cli:main'
33 | 
34 | [tool.poetry.dependencies]
35 | python = "^3.8,<3.11"
36 | click = ">=8.0.0"
37 | requests = ">=1.0.0"
38 | numpy = ">=1.18.5"
39 | scikit-learn = ">=0.23.0"
40 | PyYAML = "^6.0"
41 | "ruamel.yaml" = "^0.17.21"
42 | colorama = "^0.4.4"
43 | pandas = "^1.4.3"
44 | 
45 | [tool.poetry.dev-dependencies]
46 | black = "^21.12b0"
47 | pytest = "^6.2.5"
48 | mypy = "^0.931"
49 | boto3 = "^1.21.13"
50 | jupyter = "^1.0.0"
51 | jupyterlab = "^3.0.9"
52 | 
53 | [build-system]
54 | requires = ["poetry-core>=1.0.0"]
55 | build-backend = "poetry.core.masonry.api"
56 | 
--------------------------------------------------------------------------------
/tests/test_hash.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
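# --- added usage sketch (editorial, not part of the original module) ---
# The tests below pin calc_file_hash digests for tests/data/sample.jpeg. As a
# minimal illustration of the API above, chunked and whole-file hashing should
# agree, since split_chunk only changes *how* bytes are fed to the hash object;
# left commented out because it assumes the sample file's relative path:
#
#     from base.hash import calc_file_hash
#
#     path = "tests/data/sample.jpeg"
#     chunked = calc_file_hash(path)                   # default: sha256, chunked reads
#     whole = calc_file_hash(path, split_chunk=False)  # single f.read()
#     assert chunked == whole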
4 | # Please contact engineer@adansons.co.jp 5 | 6 | import os 7 | import sys 8 | 9 | sys.path.append(os.path.dirname(os.path.dirname(__file__))) 10 | 11 | from base.hash import calc_file_hash 12 | 13 | PATH = os.path.join(os.path.dirname(__file__), "data", "sample.jpeg") 14 | MD5HASH = "93304c750cf3dd4e8e91d374d60b9734" 15 | SHA224HASH = "c9462ddf27c8aefbf74f70ca13fd113f304e46d1359cde3f3aa8908a" 16 | SHA256HASH = "09e300d993f62d0e623e0d631a468e6126881b0e9152547ca8b369e7233e5717" 17 | SHA384HASH = "eb2e4a765e17f666122bb30f13a40e843fbfb32d6f6b3f96b5d8614c2761f3827ef5c374b5078c651d31ac549feed8f2" 18 | SHA512HASH = "c9414d9abf93f278457d9d31a0eef74a57644f7431aa9132a3ac5e7642b29a6b2f27976ff19700cca0bd9b902f8e4d5bfcfb4733b8b79e9b8c85d40fc796e7d6" 19 | SHA1HASH = "ec33c6e4dbe7a84f177899f6aac29bb718cb0451" 20 | 21 | 22 | def test_calc_file_hash_md5(): 23 | digest = calc_file_hash(PATH, algorithm="md5", split_chunk=False) 24 | assert digest == MD5HASH 25 | 26 | 27 | def test_calc_file_hash_sha224(): 28 | digest = calc_file_hash(PATH, algorithm="sha224", split_chunk=False) 29 | assert digest == SHA224HASH 30 | 31 | 32 | def test_calc_file_hash_sha256(): 33 | digest = calc_file_hash(PATH, algorithm="sha256", split_chunk=False) 34 | assert digest == SHA256HASH 35 | 36 | 37 | def test_calc_file_hash_sha384(): 38 | digest = calc_file_hash(PATH, algorithm="sha384", split_chunk=False) 39 | assert digest == SHA384HASH 40 | 41 | 42 | def test_calc_file_hash_sha512(): 43 | digest = calc_file_hash(PATH, algorithm="sha512", split_chunk=False) 44 | assert digest == SHA512HASH 45 | 46 | 47 | def test_calc_file_hash_sha1(): 48 | digest = calc_file_hash(PATH, algorithm="sha1", split_chunk=False) 49 | assert digest == SHA1HASH 50 | 51 | 52 | def test_split_chunk(): 53 | digest = calc_file_hash(PATH, algorithm="sha256", split_chunk=True) 54 | assert digest == SHA256HASH 55 | 56 | 57 | if __name__ == "__main__": 58 | test_calc_file_hash_md5() 59 | test_calc_file_hash_sha224() 60 | test_calc_file_hash_sha256() 61 | test_calc_file_hash_sha384() 62 | test_calc_file_hash_sha512() 63 | test_calc_file_hash_sha1() 64 | test_split_chunk() 65 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: CI_PRD 2 | 3 | on: 4 | # execute when pull requested 5 | pull_request: 6 | branches: 7 | - 'main' 8 | 9 | paths: 10 | - '.github/workflows/**' 11 | - 'base/**' 12 | - 'tests/**' 13 | 14 | jobs: 15 | test: 16 | runs-on: ubuntu-latest 17 | strategy: 18 | matrix: 19 | python-version: [3.9] 20 | 21 | env: 22 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 23 | 24 | steps: 25 | - name: Checkout 26 | uses: actions/checkout@v2 27 | with: 28 | ref: ${{ github.event.pull_request.head.ref }} 29 | 30 | - name: Set up Python ${{ matrix.python-version }} 31 | uses: actions/setup-python@v2 32 | with: 33 | python-version: ${{ matrix.python-version }} 34 | 35 | - name: Install Dependencies 36 | run: | 37 | pip install --upgrade pip 38 | pip install poetry 39 | poetry install --no-interaction 40 | 41 | - name: Run the test with pytest 42 | run: poetry run pytest tests/ 43 | env: 44 | BASE_USER_ID: ${{ secrets.BASE_USER_ID }} 45 | BASE_ACCESS_KEY: ${{ secrets.BASE_ACCESS_KEY_PRD }} 46 | 47 | fix-format: 48 | runs-on: ubuntu-latest 49 | needs: test 50 | 51 | env: 52 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 53 | 54 | steps: 55 | - name: Checkout 56 | uses: actions/checkout@v2 57 | with: 58 | ref: ${{ 
github.event.pull_request.head.ref }} 59 | 60 | - name: Set up Python 61 | uses: actions/setup-python@v2 62 | with: 63 | python-version: '3.9' 64 | 65 | - name: Install Dependencies 66 | run: | 67 | pip install --upgrade pip 68 | pip install poetry 69 | poetry install --no-interaction 70 | 71 | - name: Format Python Source with Black 72 | run: poetry run black . 73 | 74 | - name: Push to Pull Requested branch 75 | run: | 76 | if ! git diff --exit-code --quiet 77 | then 78 | git config --global user.email "github-actions[bot]@users.noreply.github.com" 79 | git config --global user.name "github-actions" 80 | 81 | git add . 82 | git commit -m "[Actions]Fix format with black." 83 | git push 84 | fi -------------------------------------------------------------------------------- /.github/workflows/ci_dev.yml: -------------------------------------------------------------------------------- 1 | name: CI_DEV 2 | 3 | on: 4 | # execute when pull requested 5 | pull_request: 6 | branches: 7 | - 'dev' 8 | 9 | paths: 10 | - '.github/workflows/**' 11 | - 'base/**' 12 | - 'tests/**' 13 | 14 | jobs: 15 | test: 16 | runs-on: ubuntu-latest 17 | strategy: 18 | matrix: 19 | python-version: [3.9] 20 | 21 | env: 22 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 23 | 24 | steps: 25 | - name: Checkout 26 | uses: actions/checkout@v2 27 | with: 28 | ref: ${{ github.event.pull_request.head.ref }} 29 | 30 | - name: Set up Python ${{ matrix.python-version }} 31 | uses: actions/setup-python@v2 32 | with: 33 | python-version: ${{ matrix.python-version }} 34 | 35 | - name: Install Dependencies 36 | run: | 37 | pip install --upgrade pip 38 | pip install poetry 39 | poetry install --no-interaction 40 | 41 | - name: Run the test with pytest 42 | run: poetry run pytest tests/ 43 | env: 44 | BASE_USER_ID: ${{ secrets.BASE_USER_ID }} 45 | BASE_ACCESS_KEY: ${{ secrets.BASE_ACCESS_KEY }} 46 | BASE_API_ENDPOINT: ${{ secrets.BASE_API_ENDPOINT }} 47 | 48 | fix-format: 49 | runs-on: ubuntu-latest 50 | needs: test 51 | 52 | env: 53 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 54 | 55 | steps: 56 | - name: Checkout 57 | uses: actions/checkout@v2 58 | with: 59 | ref: ${{ github.event.pull_request.head.ref }} 60 | 61 | - name: Set up Python 62 | uses: actions/setup-python@v2 63 | with: 64 | python-version: '3.9' 65 | 66 | - name: Install Dependencies 67 | run: | 68 | pip install --upgrade pip 69 | pip install poetry 70 | poetry install --no-interaction 71 | 72 | - name: Format Python Source with Black 73 | run: poetry run black . 74 | 75 | - name: Push to Pull Requested branch 76 | run: | 77 | if ! git diff --exit-code --quiet 78 | then 79 | git config --global user.email "github-actions[bot]@users.noreply.github.com" 80 | git config --global user.name "github-actions" 81 | 82 | git add . 83 | git commit -m "[Actions]Fix format with black." 84 | git push 85 | fi -------------------------------------------------------------------------------- /base/exception.py: -------------------------------------------------------------------------------- 1 | import click 2 | 3 | 4 | def CatchAllExceptions(cls, handler): 5 | """ 6 | Function to override the default exception handler of click. 7 | With thanks to this Stack Overflow answer: 8 | https://stackoverflow.com/questions/52213375/python-click-exception-handling-under-setuptools 9 | 10 | Parameters 11 | ---------- 12 | cls : The class that the handler is being applied e.g. 
click.Command
13 |     handler : The handler function that prints the error message
14 | 
15 |     Returns
16 |     -------
17 |     Cls : The new class itself with the handler applied
18 |     """
19 | 
20 |     class Cls(cls):
21 |         _original_args = None
22 | 
23 |         def make_context(self, info_name, args, parent=None, **extra):
24 | 
25 |             # grab the original command line arguments
26 |             self._original_args = " ".join(args)
27 | 
28 |             try:
29 |                 return super(Cls, self).make_context(
30 |                     info_name, args, parent=parent, **extra
31 |                 )
32 |             except Exception as exc:
33 |                 # call the handler
34 |                 handler(self, info_name, exc)
35 |                 # let the user see the original error
36 |                 raise
37 | 
38 |         def invoke(self, ctx):
39 |             try:
40 |                 return super(Cls, self).invoke(ctx)
41 |             except Exception as exc:
42 |                 # call the handler
43 |                 handler(self, ctx.info_name, exc)
44 | 
45 |                 # let the user see the original error
46 |                 raise
47 | 
48 |     return Cls
49 | 
50 | 
51 | def search_export_exception(cmd, info_name, exc):
52 |     """
53 |     Function to handle the exception for the "base search --export" command.
54 | 
55 |     Parameters
56 |     ----------
57 |     cmd: The command that the user is trying to run
58 |     info_name: The name of the exception
59 |     exc: The exception message
60 | 
61 |     Returns
62 |     -------
63 |     None
64 |     """
65 |     # send error info to rollbar, etc, here
66 |     # note: each message must be tested separately; `("a" or "b") in s` only checks "a"
67 |     message = str(exc)
68 |     if (
69 |         "'--export' requires an argument" in message
70 |         or "'--e' requires an argument" in message
71 |     ):
72 |         click.echo("You can specify ‘json’ or ‘csv’ as export-file-type")
73 |     elif (
74 |         "'--output' requires an argument" in message
75 |         or "'--o' requires an argument" in message
76 |     ):
77 |         click.echo("You can specify the output file name")
78 |     else:
79 |         click.echo("Raised error: {}".format(exc))
80 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | archive/
3 | 
4 | # Byte-compiled / optimized / DLL files
5 | __pycache__/
6 | *.py[cod]
7 | *$py.class
8 | 
9 | # C extensions
10 | *.so
11 | 
12 | # Distribution / packaging
13 | .Python
14 | build/
15 | develop-eggs/
16 | dist/
17 | downloads/
18 | eggs/
19 | .eggs/
20 | lib/
21 | lib64/
22 | parts/
23 | sdist/
24 | var/
25 | wheels/
26 | pip-wheel-metadata/
27 | share/python-wheels/
28 | *.egg-info/
29 | .installed.cfg
30 | *.egg
31 | MANIFEST
32 | 
33 | # PyInstaller
34 | # Usually these files are written by a python script from a template
35 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
36 | *.manifest
37 | *.spec
38 | 
39 | # Installer logs
40 | pip-log.txt
41 | pip-delete-this-directory.txt
42 | 
43 | # Unit test / coverage reports
44 | htmlcov/
45 | .tox/
46 | .nox/
47 | .coverage
48 | .coverage.*
49 | .cache
50 | nosetests.xml
51 | coverage.xml
52 | *.cover
53 | *.py,cover
54 | .hypothesis/
55 | .pytest_cache/
56 | 
57 | # Translations
58 | *.mo
59 | *.pot
60 | 
61 | # Django stuff:
62 | *.log
63 | local_settings.py
64 | db.sqlite3
65 | db.sqlite3-journal
66 | 
67 | # Flask stuff:
68 | instance/
69 | .webassets-cache
70 | 
71 | # Scrapy stuff:
72 | .scrapy
73 | 
74 | # Sphinx documentation
75 | docs/_build/
76 | 
77 | # PyBuilder
78 | target/
79 | 
80 | # Jupyter Notebook
81 | .ipynb_checkpoints
82 | 
83 | # IPython
84 | profile_default/
85 | ipython_config.py
86 | 
87 | # pyenv
88 | .python-version
89 | 
90 | # pipenv
91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
94 | # install all needed dependencies.
95 | #Pipfile.lock
96 | 
97 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
98 | __pypackages__/
99 | 
100 | # Celery stuff
101 | celerybeat-schedule
102 | celerybeat.pid
103 | 
104 | # SageMath parsed files
105 | *.sage.py
106 | 
107 | # Environments
108 | .env
109 | .venv
110 | env/
111 | venv/
112 | ENV/
113 | env.bak/
114 | venv.bak/
115 | 
116 | # Spyder project settings
117 | .spyderproject
118 | .spyproject
119 | 
120 | # Rope project settings
121 | .ropeproject
122 | 
123 | # mkdocs documentation
124 | /site
125 | 
126 | # mypy
127 | .mypy_cache/
128 | .dmypy.json
129 | dmypy.json
130 | 
131 | # Pyre type checker
132 | .pyre/
133 | 
134 | # Created by https://www.toptal.com/developers/gitignore/api/terraform
135 | # Edit at https://www.toptal.com/developers/gitignore?templates=terraform
136 | 
137 | /example/download_mnist.py
138 | .vscode/settings.json
139 | 
--------------------------------------------------------------------------------
/base/spinner.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import time
6 | import itertools
7 | import threading
8 | 
9 | 
10 | class Spinner:
11 |     """
12 |     Spinner class
13 | 
14 |     Attributes
15 |     ----------
16 |     text : str
17 |         text to be displayed with a spinner when the process is started
18 |     etext : str
19 |         text to be displayed when the process is terminated
20 |     overwrite : bool
21 |         whether `etext` overwrites `text` or not
22 |     """
23 | 
24 |     def __init__(
25 |         self, text: str = "Please wait...", etext: str = "", overwrite: bool = True
26 |     ) -> None:
27 |         """
28 |         Parameters
29 |         ----------
30 |         text : str
31 |             text to be displayed with a spinner when the process is started
32 |         etext : str
33 |             text to be displayed when the process is terminated
34 |         overwrite : bool
35 |             whether `etext` overwrites `text` or not
36 |         """
37 |         self.text = text
38 |         self.etext = etext
39 |         self.overwrite = overwrite
40 | 
41 |         self._stop_flag = False
42 | 
43 |     def _spinner(self) -> None:
44 |         """
45 |         Display the `text` and a spinner during processing.
46 |         """
47 |         chars = itertools.cycle(r"/-\|")
48 |         while not self._stop_flag:
49 |             print(f"\r{self.text} {next(chars)}", end="")
50 |             time.sleep(0.2)
51 | 
52 |     def start(self):
53 |         """
54 |         Set up a subthread to run a spinner.
55 |         """
56 |         self._stop_flag = False
57 |         self._spinner_thread = threading.Thread(target=self._spinner)
58 |         self._spinner_thread.setDaemon(True)
59 |         self._spinner_thread.start()
60 | 
61 |     def stop(self, etext: str = "", overwrite: bool = True):
62 |         """
63 |         Kill the subthread and display `etext`.
64 | 65 | Parameters 66 | ---------- 67 | etext : str 68 | text to be displayed when the process is terminated 69 | overwrite : bool 70 | whether `etext` overwrites `text` or not 71 | """ 72 | if self._spinner_thread and self._spinner_thread.is_alive(): 73 | self._stop_flag = True 74 | self._spinner_thread.join() 75 | 76 | etext = etext or self.etext 77 | overwrite = self.overwrite if not self.overwrite else overwrite 78 | 79 | if overwrite: 80 | if etext == "": 81 | print(f"\r\033[2K\033[G", end="") 82 | else: 83 | print(f"\r\033[2K\033[G{etext}") 84 | else: 85 | if etext == "": 86 | print(f"\033[1D\033[K\n", end="") 87 | else: 88 | print(f"\033[1D\033[K\n{etext}") 89 | 90 | def __enter__(self) -> None: 91 | """ 92 | Start the spinner used on context managers. 93 | """ 94 | self.start() 95 | 96 | def __exit__(self, exception_type, exception_value, traceback) -> None: 97 | """ 98 | Stop the spinner used on context managers. 99 | """ 100 | if exception_type is not None: 101 | if self.overwrite: 102 | self.stop(etext=self.text) 103 | else: 104 | self.stop(etext="") 105 | else: 106 | self.stop() 107 | -------------------------------------------------------------------------------- /download_mnist.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact higuchi@adansons.co.jp 5 | # This script is cited from https://github.com/myleott/mnist_png/blob/master/convert_mnist_to_png.py 6 | 7 | import os 8 | import struct 9 | import sys 10 | 11 | from array import array 12 | 13 | import png 14 | 15 | import gzip 16 | import urllib.request 17 | 18 | 19 | original_sources_list = [ 20 | "train-images-idx3-ubyte", 21 | "train-labels-idx1-ubyte", 22 | "t10k-images-idx3-ubyte", 23 | "t10k-labels-idx1-ubyte", 24 | ] 25 | 26 | 27 | def download_original_sources(path: str = "."): 28 | for s in original_sources_list: 29 | fname = f"{s}.gz" 30 | url = f"http://yann.lecun.com/exdb/mnist/{fname}" 31 | 32 | s_path = os.path.join(path, s) 33 | fname_path = os.path.join(path, fname) 34 | urllib.request.urlretrieve(url, fname_path) 35 | with gzip.open(fname_path, mode="rb") as gzfile: 36 | with open(s_path, mode="wb") as f: 37 | f.write(gzfile.read()) 38 | 39 | os.remove(fname_path) 40 | 41 | 42 | def remove_original_sources(path: str = "."): 43 | for s in original_sources_list: 44 | s_path = os.path.join(path, s) 45 | os.remove(s_path) 46 | 47 | 48 | # source: http://abel.ee.ucla.edu/cvxopt/_downloads/mnist.py 49 | def read(dataset: str = "train", path: str = "."): 50 | if dataset == "train": 51 | fname_img = os.path.join(path, "train-images-idx3-ubyte") 52 | fname_lbl = os.path.join(path, "train-labels-idx1-ubyte") 53 | elif dataset == "test": 54 | fname_img = os.path.join(path, "t10k-images-idx3-ubyte") 55 | fname_lbl = os.path.join(path, "t10k-labels-idx1-ubyte") 56 | else: 57 | raise ValueError("dataset must be 'test' or 'train'") 58 | 59 | flbl = open(fname_lbl, "rb") 60 | magic_nr, size = struct.unpack(">II", flbl.read(8)) 61 | lbl = array("b", flbl.read()) 62 | flbl.close() 63 | 64 | fimg = open(fname_img, "rb") 65 | magic_nr, size, rows, cols = struct.unpack(">IIII", fimg.read(16)) 66 | img = array("B", fimg.read()) 67 | fimg.close() 68 | 69 | return lbl, img, size, rows, cols 70 | 71 | 72 | def write_dataset(labels, data, size, rows, cols, output_dir): 73 | # create output directories 74 | output_dirs = [os.path.join(output_dir, str(i)) for i in range(10)] 75 | for dir in output_dirs: 76 | 
os.makedirs(dir, exist_ok=True)
77 | 
78 |     # write data
79 |     for (i, label) in enumerate(labels):
80 |         output_filename = os.path.join(output_dirs[label], str(i) + ".png")
81 |         with open(output_filename, "wb") as h:
82 |             w = png.Writer(cols, rows, greyscale=True)
83 |             data_i = [
84 |                 data[(i * rows * cols + j * cols) : (i * rows * cols + (j + 1) * cols)]
85 |                 for j in range(rows)
86 |             ]
87 |             w.write(h, data_i)
88 | 
89 | 
90 | if __name__ == "__main__":
91 |     if len(sys.argv) != 2:
92 |         print(f"usage: {sys.argv[0]} <data_root_dir>")
93 |         sys.exit()
94 | 
95 |     data_root_dir = sys.argv[1]
96 |     os.makedirs(data_root_dir, exist_ok=True)
97 |     download_original_sources(data_root_dir)
98 | 
99 |     for dataset in ["train", "test"]:
100 |         labels, data, size, rows, cols = read(dataset, data_root_dir)
101 |         write_dataset(
102 |             labels, data, size, rows, cols, os.path.join(data_root_dir, dataset)
103 |         )
104 | 
105 |     remove_original_sources(data_root_dir)
106 | 
--------------------------------------------------------------------------------
/base/dataset.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import numpy as np
6 | from sklearn.model_selection import train_test_split
7 | from typing import Callable, Optional, Tuple
8 | 
9 | from base.files import Files
10 | 
11 | 
12 | class Dataset:
13 |     """
14 |     Dataset class
15 | 
16 |     Attributes
17 |     ----------
18 |     transform : function
19 |         function for preprocessing
20 |     target_key : str
21 |         key you want to use as the label
22 |     files : list
23 |         list of File objects
24 |     paths : list
25 |         list of filepaths
26 |     convert_dict : dict
27 |         dict to convert labels to numbers
28 |     y : list of int
29 |         target label
30 |     y_train : list of int
31 |         target label used to train
32 |     y_test : list of int
33 |         target label used to test
34 |     x : list
35 |         data
36 |     x_train : list
37 |         data used to train
38 |     x_test : list
39 |         data used to test
40 |     train_path : list
41 |         filepath used to train
42 |     test_path : list
43 |         filepath used to test
44 |     """
45 | 
46 |     def __init__(
47 |         self, files: Files, target_key: str, transform: Optional[Callable] = None
48 |     ) -> None:
49 |         """
50 |         Make the dict to convert labels to numbers.
51 | 
52 |         files : list
53 |             list of File objects
54 |         target_key : str
55 |             key you want to use as the label
56 |         transform : function or None, default None
57 |             function for preprocessing
58 |         """
59 | 
60 |         self.transform = transform
61 |         self.target_key = target_key
62 |         if self.transform is None:
63 |             self.transform = lambda x: x
64 | 
65 |         self.files = files
66 |         self.paths = self.files.paths
67 | 
68 |     def train_test_split(self, split_rate: float = 0.25) -> Tuple[list, list, list, list]:
69 |         """
70 |         Split train data and test data.
71 | 72 | Parameters 73 | ---------- 74 | split_rate : float 75 | the proportion of the dataset to include in the test data 76 | 77 | Returns 78 | ------- 79 | x_train : list 80 | data used to train 81 | x_test : list 82 | data used to test 83 | y_train : list of int 84 | target label used to train 85 | y_test : list of int 86 | target label used to test 87 | """ 88 | self.y = [getattr(i, self.target_key) for i in self.files] 89 | self.x = [self.transform(i) for i in self.paths] 90 | 91 | ( 92 | self.train_path, 93 | self.test_path, 94 | self.y_train, 95 | self.y_test, 96 | ) = train_test_split(self.paths, self.y, test_size=split_rate, stratify=self.y) 97 | 98 | self.x_train = [self.transform(i) for i in self.train_path] 99 | self.x_test = [self.transform(i) for i in self.test_path] 100 | 101 | return self.x_train, self.x_test, self.y_train, self.y_test 102 | 103 | def __len__(self) -> int: 104 | return len(self.files) 105 | 106 | def __getitem__(self, idx: int) -> Tuple: 107 | path = self.paths[idx] 108 | data = self.transform(path) 109 | label = getattr(self.files[idx], self.target_key) 110 | 111 | return data, label 112 | 113 | 114 | if __name__ == "__main__": 115 | pass 116 | -------------------------------------------------------------------------------- /tests/test_parser.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact engineer@adansons.co.jp 5 | 6 | import os 7 | import sys 8 | import pytest 9 | 10 | sys.path.append(os.path.dirname(os.path.dirname(__file__))) 11 | 12 | from base.parser import Parser 13 | 14 | 15 | INPUT_PATH1 = "/Origin/左声帯嚢胞/1-055-E-a01.wav" 16 | PARSING_RULE1 = "{_}/{disease}/{_}-{patient-id}-{part}-{iteration}.wav" 17 | PARSING_KEYS1 = ["_", "disease", "_", "patient-id", "part", "iteration", "[UnuseToken]"] 18 | PARSED_DICT1 = { 19 | "disease": "左声帯嚢胞", 20 | "patient-id": "055", 21 | "part": "E", 22 | "iteration": "a01", 23 | } 24 | 25 | INPUT_PATH2 = "/Origin/hoge1/fugasuzukipiyo_03.csv" 26 | PARSING_RULE2 = "{_}/hoge{num1}/fuga{name}piyo_{month}.csv" 27 | CONVERTED_PARSING_RULE2 = "{_}/{[UnuseToken]}/{num1}/{[UnuseToken]}/{name}/{[UnuseToken]}_{month}.{[UnuseToken]}" 28 | PARSING_KEYS2 = [ 29 | "_", 30 | "[UnuseToken]", 31 | "num1", 32 | "[UnuseToken]", 33 | "name", 34 | "[UnuseToken]", 35 | "month", 36 | "[UnuseToken]", 37 | ] 38 | PARSED_DICT2 = {"num1": "1", "name": "suzuki", "month": "03"} 39 | 40 | INPUT_PATH3 = "/Origin/hoge1/fugasuzukipiyo_2022_03_02.csv" 41 | PARSING_RULE3 = "{_}/hoge{num1}/fuga{name}piyo_{timestamp}.csv" 42 | DETAIL_PARSING_RULE = "{Origin}/hoge{1}/fuga{suzuki}piyo_{2022_03_02}.csv" 43 | PARSING_KEYS3 = [ 44 | "_", 45 | "[UnuseToken]", 46 | "num1", 47 | "[UnuseToken]", 48 | "name", 49 | "[UnuseToken]", 50 | "timestamp", 51 | "[UnuseToken]", 52 | ] 53 | PARSED_DICT3 = {"num1": "1", "name": "suzuki", "timestamp": "2022_03_02"} 54 | 55 | 56 | def test_generate_parser_pattern1(): 57 | parser = Parser(PARSING_RULE1, extension="wav") 58 | assert parser.parsing_keys == PARSING_KEYS1 59 | 60 | 61 | def test_parse_pattern1(): 62 | parser = Parser(PARSING_RULE1, extension="wav") 63 | parsed_dict = parser(INPUT_PATH1) 64 | assert parsed_dict == PARSED_DICT1 65 | 66 | 67 | def test_generate_parser_pattern2(): 68 | parser = Parser(PARSING_RULE2, extension="csv") 69 | assert parser.parsing_keys == PARSING_KEYS2 70 | assert parser.parsing_rule == CONVERTED_PARSING_RULE2 71 | 72 | 73 | def test_parse_pattern2(): 74 | parser = 
Parser(PARSING_RULE2, extension="csv")
75 |     parsed_dict = parser(INPUT_PATH2)
76 |     assert parsed_dict == PARSED_DICT2
77 | 
78 | 
79 | def test_generate_parser_pattern3():
80 |     parser = Parser(PARSING_RULE3, extension="csv")
81 |     parser.update_rule(DETAIL_PARSING_RULE)
82 |     assert parser.parsing_keys == PARSING_KEYS3
83 | 
84 | 
85 | def test_parse_pattern3():
86 |     parser = Parser(PARSING_RULE3, extension="csv")
87 |     parser.update_rule(DETAIL_PARSING_RULE)
88 |     parsed_dict = parser(INPUT_PATH3)
89 |     assert parsed_dict == PARSED_DICT3
90 | 
91 | 
92 | def test_is_path_parsable():
93 |     parser = Parser(PARSING_RULE3, extension="csv")
94 | 
95 |     # Check for `IndexError` when parsing the file path with an invalid `parsing_rule`.
96 |     with pytest.raises(Exception) as e:
97 |         _ = parser(INPUT_PATH3)
98 |     assert str(e.value) == "list index out of range"
99 | 
100 |     # Test `is_path_parsable` with a non-parsable path.
101 |     parsable_flag = parser.is_path_parsable(INPUT_PATH3)
102 |     assert parsable_flag == False
103 | 
104 | 
105 | if __name__ == "__main__":
106 |     test_generate_parser_pattern1()
107 |     test_parse_pattern1()
108 |     test_generate_parser_pattern2()
109 |     test_parse_pattern2()
110 |     test_generate_parser_pattern3()
111 |     test_parse_pattern3()
112 |     test_is_path_parsable()
113 | 
--------------------------------------------------------------------------------
/tests/test_config.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import time
6 | from click.testing import CliRunner
7 | from base.cli import (
8 |     list_project,
9 |     remove_project,
10 | )
11 | from base.config import (
12 |     get_user_id,
13 |     register_user_id,
14 |     get_access_key,
15 |     register_access_key,
16 |     get_project_uid,
17 |     check_project_exists,
18 |     register_project_uid,
19 |     delete_project_config,
20 |     update_project_info,
21 |     get_user_id_from_db,
22 | )
23 | from base.project import create_project
24 | 
25 | PROJECT_NAME = "adansons_test_project"
26 | 
27 | 
28 | def test_initialize():
29 |     """
30 |     If something went wrong in a past test session,
31 |     you may have existing tables left over, so clear them before running the tests below.
32 | """ 33 | runner = CliRunner() 34 | result = runner.invoke(list_project, []) 35 | if PROJECT_NAME in result.output: 36 | result = runner.invoke(remove_project, [PROJECT_NAME]) 37 | 38 | result = runner.invoke(list_project, ["--archived"]) 39 | if PROJECT_NAME in result.output: 40 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 41 | 42 | 43 | def test_get_access_key(): 44 | get_access_key() 45 | 46 | 47 | def test_register_access_key(): 48 | access_key = get_access_key() 49 | register_access_key(access_key) 50 | 51 | 52 | def test_get_user_id_from_db(): 53 | access_key = get_access_key() 54 | get_user_id_from_db(access_key) 55 | 56 | 57 | def test_register_user_id(): 58 | access_key = get_access_key() 59 | user_id = get_user_id_from_db(access_key) 60 | register_user_id(user_id) 61 | 62 | 63 | def test_get_user_id(): 64 | get_user_id() 65 | 66 | 67 | def test_register_project(): 68 | user_id = get_user_id() 69 | create_project(user_id, PROJECT_NAME) 70 | time.sleep(20) 71 | 72 | 73 | def test_check_project_exists(): 74 | user_id = get_user_id() 75 | assert check_project_exists(user_id, PROJECT_NAME) 76 | 77 | 78 | def test_get_project_uid(): 79 | user_id = get_user_id() 80 | get_project_uid(user_id, PROJECT_NAME) 81 | 82 | 83 | def test_register_project_uid(): 84 | user_id = get_user_id() 85 | project_uid = get_project_uid(user_id, PROJECT_NAME) 86 | register_project_uid(user_id, PROJECT_NAME, project_uid) 87 | 88 | 89 | def test_update_project_info(): 90 | user_id = get_user_id() 91 | update_project_info(user_id) 92 | 93 | 94 | def test_archive_project(): 95 | runner = CliRunner() 96 | result = runner.invoke(remove_project, [PROJECT_NAME]) 97 | assert result.exit_code == 0 98 | assert f"{PROJECT_NAME} was Archived" in result.output 99 | result = runner.invoke(list_project, []) 100 | assert result.exit_code == 0 101 | assert PROJECT_NAME not in result.output 102 | 103 | 104 | # How to test delete project config itself? 105 | def test_delete_project(): 106 | runner = CliRunner() 107 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 108 | assert result.exit_code == 0 109 | assert f"{PROJECT_NAME} was Deleted" in result.output 110 | result = runner.invoke(list_project, ["--archived"]) 111 | assert result.exit_code == 0 112 | assert PROJECT_NAME not in result.output 113 | 114 | 115 | if __name__ == "__main__": 116 | test_initialize() 117 | test_get_access_key() 118 | test_register_access_key() 119 | test_get_user_id_from_db() 120 | test_register_user_id() 121 | test_get_user_id() 122 | test_check_project_exists() 123 | test_get_project_uid() 124 | test_register_project_uid() 125 | test_update_project_info() 126 | test_archive_project() 127 | test_delete_project() 128 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity 10 | and orientation. 
11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at 63 | Email (support@adansons.co.jp). 64 | All complaints will be reviewed and investigated promptly and fairly. 65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series 86 | of actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. 
No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or 93 | permanent ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within 113 | the community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.0, available at 119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 120 | 121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 122 | enforcement ladder](https://github.com/mozilla/diversity). 123 | 124 | [homepage]: https://www.contributor-covenant.org 125 | 126 | For answers to common questions about this code of conduct, see the FAQ at 127 | https://www.contributor-covenant.org/faq. Translations are available at 128 | https://www.contributor-covenant.org/translations. 129 | -------------------------------------------------------------------------------- /base/config.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact engineer@adansons.co.jp 5 | import os 6 | import json 7 | import time 8 | import requests 9 | import configparser 10 | 11 | from base.spinner import Spinner 12 | 13 | CONFIG_FILE = os.path.join(os.path.expanduser("~"), ".base", "config") 14 | PROJECT_FILE = os.path.join(os.path.expanduser("~"), ".base", "projects") 15 | LINKER_DIR = os.path.join(os.path.expanduser("~"), ".base", "linker") 16 | 17 | HEADER = {"Content-Type": "application/json"} 18 | BASE_API_ENDPOINT = os.environ.get( 19 | "BASE_API_ENDPOINT", "https://api.base.adansons.co.jp" 20 | ) 21 | 22 | 23 | def get_user_id() -> str: 24 | """ 25 | Get user id from config file. 26 | if you have 'BASE_USER_ID' on environment variables, Base will use it 27 | 28 | Returns 29 | ------- 30 | user_id : str 31 | aquired user id from environment variable or config file 32 | """ 33 | user_id = os.environ.get("BASE_USER_ID", None) 34 | if user_id is None: 35 | config = configparser.ConfigParser() 36 | config.read(CONFIG_FILE) 37 | 38 | user_id = config["default"]["user_id"] 39 | 40 | return user_id 41 | 42 | 43 | def register_user_id(user_id: str) -> None: 44 | """ 45 | Register user id to local config file. 
46 | 
47 |     Parameters
48 |     ----------
49 |     user_id : str
50 |         target user id
51 |     """
52 |     config = configparser.ConfigParser()
53 |     config.read(CONFIG_FILE)
54 | 
55 |     config["default"].update({"user_id": user_id})
56 |     with open(CONFIG_FILE, "w") as f:
57 |         config.write(f)
58 | 
59 | 
60 | def get_access_key() -> str:
61 |     """
62 |     Get the access key from the config file.
63 |     If 'BASE_ACCESS_KEY' is set in the environment variables, Base will use it.
64 | 
65 |     Returns
66 |     -------
67 |     access_key : str
68 |         acquired API access key from environment variable or config file
69 |     """
70 |     access_key = os.environ.get("BASE_ACCESS_KEY", None)
71 |     if access_key is None:
72 |         config = configparser.ConfigParser()
73 |         config.read(CONFIG_FILE)
74 | 
75 |         access_key = config["default"]["access_key"]
76 | 
77 |     return access_key
78 | 
79 | 
80 | def register_access_key(access_key: str) -> None:
81 |     """
82 |     Register the access key to the local config file.
83 | 
84 |     Parameters
85 |     ----------
86 |     access_key : str
87 |         API access key
88 |     """
89 |     config = configparser.ConfigParser()
90 |     config.read(CONFIG_FILE)
91 | 
92 |     config["default"] = {"access_key": access_key}
93 |     os.makedirs(os.path.dirname(CONFIG_FILE), exist_ok=True)
94 |     with open(CONFIG_FILE, "w") as f:
95 |         config.write(f)
96 | 
97 | 
98 | def get_project_uid(user_id: str, project_name: str) -> str:
99 |     """
100 |     Get the project uid from a project name.
101 | 
102 |     Parameters
103 |     ----------
104 |     user_id : str
105 |         user id
106 |     project_name : str
107 |         target project name
108 | 
109 |     Returns
110 |     -------
111 |     project_uid : str
112 |         project uid of the given project name
113 |     """
114 |     config = configparser.ConfigParser()
115 |     config.read(PROJECT_FILE)
116 | 
117 |     is_exist = check_project_exists(user_id, project_name)
118 |     if not is_exist:
119 |         raise KeyError(f"Project {project_name} does not exist.")
120 |     else:
121 |         project_uid = config[user_id][project_name]
122 |         return project_uid
123 | 
124 | 
125 | def check_project_exists(user_id: str, project_name: str) -> bool:
126 |     """
127 |     Check whether the project already exists
128 | 
129 |     Parameters
130 |     ----------
131 |     user_id : str
132 |         user id
133 |     project_name : str
134 |         target project name
135 | 
136 |     Returns
137 |     -------
138 |     project_exists : bool
139 |         whether the project already exists
140 |     """
141 |     config = configparser.ConfigParser()
142 |     config.read(PROJECT_FILE)
143 | 
144 |     project_exists = project_name in config[user_id]
145 | 
146 |     return project_exists
147 | 
148 | 
149 | def check_project_available(user_id: str, project_id: str) -> None:
150 |     """
151 |     Check whether the project's tables are available.
152 | 
153 |     Parameters
154 |     ----------
155 |     user_id : str
156 |         user id
157 |     project_id : str
158 |         target project uid
159 |     """
160 |     access_key = get_access_key()
161 |     HEADER.update({"x-api-key": access_key})
162 | 
163 |     with Spinner(text="Creating the project, please wait..."):
164 |         is_available = False
165 |         while not is_available:
166 |             url = (
167 |                 f"{BASE_API_ENDPOINT}/project/{project_id}/tables/status?user={user_id}"
168 |             )
169 |             res = requests.get(url, headers=HEADER)
170 | 
171 |             if res.status_code != 200:
172 |                 raise Exception("Something went wrong. Please try again.")
173 | 
174 |             status = res.json()["Status"]
175 |             if status == "Creating":
176 |                 time.sleep(1)
177 |             else:
178 |                 is_available = True
179 | 
180 | 
181 | def register_project_uid(user_id: str, project: str, project_uid: str) -> None:
182 |     """
183 |     Register the project uid to the local config file.
184 | 185 | Parameters 186 | ---------- 187 | user_id : str 188 | user id 189 | project : str 190 | target project name 191 | project_uid : str 192 | target project uid 193 | """ 194 | config = configparser.ConfigParser() 195 | config.read(PROJECT_FILE) 196 | 197 | if config.has_section(user_id): 198 | config[user_id][project] = project_uid 199 | else: 200 | config[user_id] = {project: project_uid} 201 | with open(PROJECT_FILE, "w") as f: 202 | config.write(f) 203 | 204 | 205 | def delete_project_config(user_id: str, project_name: str) -> None: 206 | """ 207 | Delete config of specified project. 208 | 209 | Parameters 210 | ---------- 211 | user_id : str 212 | user id 213 | project_name : str 214 | target project name 215 | """ 216 | config = configparser.ConfigParser() 217 | config.read(PROJECT_FILE) 218 | 219 | config.remove_option(user_id, project_name) 220 | with open(PROJECT_FILE, "w") as f: 221 | config.write(f) 222 | 223 | 224 | def update_project_info(user_id: str) -> None: 225 | """ 226 | Update local project info with remote. 227 | 228 | Parameters 229 | ---------- 230 | user_id : str 231 | target user id 232 | """ 233 | config = configparser.ConfigParser() 234 | config.read(PROJECT_FILE) 235 | 236 | config.remove_section(user_id) 237 | 238 | access_key = get_access_key() 239 | HEADER.update({"x-api-key": access_key}) 240 | 241 | url = f"{BASE_API_ENDPOINT}/projects?user={user_id}" 242 | res = requests.get(url, headers=HEADER) 243 | if res.status_code != 200: 244 | raise ValueError("Invalid user configuration") 245 | projects = res.json()["Projects"] 246 | 247 | url += "&archived=1" 248 | res = requests.get(url, headers=HEADER) 249 | if res.json()["Projects"]: 250 | projects.extend(res.json()["Projects"]) 251 | 252 | project_info = {} 253 | for project in projects: 254 | project_name = project["ProjectName"] 255 | project_uid = project["ProjectUid"] 256 | project_info[project_name] = project_uid 257 | 258 | config[user_id] = project_info 259 | with open(PROJECT_FILE, "w") as f: 260 | config.write(f) 261 | 262 | 263 | def get_user_id_from_db(access_key: str) -> str: 264 | """ 265 | Get user id from remote db. 266 | 267 | Parameters 268 | ---------- 269 | access_key : str 270 | API access key saved in config file 271 | """ 272 | url = f"{BASE_API_ENDPOINT}/user/id" 273 | res = requests.get(url, data=json.dumps({"api_key": access_key}), headers=HEADER) 274 | 275 | if res.status_code != 200: 276 | raise ValueError( 277 | "Incorrect access key was specified. Please retry or ask support team via Slack. \nIf you have not joined our Slack yet, get your invite here!\n-> https://share.hsforms.com/16OxTF7eJRPK92oGCny7nGw8moen\n" 278 | ) 279 | user_id = res.json()["user_id"] 280 | 281 | return user_id 282 | 283 | 284 | if __name__ == "__main__": 285 | pass 286 | -------------------------------------------------------------------------------- /tests/test_project.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 
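# --- added usage sketch (editorial, not part of the original file) ---
# A rough illustration of how the base.config helpers above chain together on
# first-time setup; the key value is a placeholder, and the calls hit the
# remote API, so this is left commented out:
#
#     from base.config import register_access_key, get_user_id_from_db, register_user_id
#
#     register_access_key("YOUR-ACCESS-KEY")            # persist the key to ~/.base/config
#     user_id = get_user_id_from_db("YOUR-ACCESS-KEY")  # resolve the user id via the API
#     register_user_id(user_id)                         # cache the user id locally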
4 | # Please contact engineer@adansons.co.jp 5 | import os 6 | import time 7 | from click.testing import CliRunner 8 | from base.cli import ( 9 | list_project, 10 | remove_project, 11 | ) 12 | from base.config import ( 13 | get_access_key, 14 | delete_project_config, 15 | get_user_id_from_db, 16 | ) 17 | from base.project import ( 18 | create_project, 19 | get_projects, 20 | archive_project, 21 | delete_project, 22 | Project, 23 | summarize_keys_information, 24 | ) 25 | 26 | PROJECT_NAME = "adansons_test_project" 27 | USER_ID = get_user_id_from_db(get_access_key()) 28 | INVITE_USER_ID = "test_invite@adansons.co.jp" 29 | TESTS_DIR = os.path.dirname(__file__) 30 | TEST_METADATA_SUMMARY = [ 31 | { 32 | "LowerValue": "0", 33 | "EditorList": ["xxxx@yyy.com"], 34 | "Creator": "xxxx@yyy.com", 35 | "ValueHash": "6dd1c6ef359fc0290897273dfee97dd6d1f277334b9a53f07056500409fd0f3a", 36 | "LastEditor": "xxxx@yyy.com", 37 | "UpperValue": "59999", 38 | "ValueType": "str", 39 | "CreatedTime": "1651429889.986235", 40 | "LastModifiedTime": "1651430744.0796146", 41 | "KeyHash": "a56145270ce6b3bebd1dd012b73948677dd618d496488bc608a3cb43ce3547dd", 42 | "KeyName": "id", 43 | "RecordedCount": 70000, 44 | }, 45 | { 46 | "LowerValue": "0", 47 | "EditorList": ["xxxx@yyy.com"], 48 | "Creator": "xxxx@yyy.com", 49 | "ValueHash": "6dd1c6ef359fc0290897273dfee97dd6d1f277334b9a53f07056500409fd0f3a", 50 | "LastEditor": "xxxx@yyy.com", 51 | "UpperValue": "59999", 52 | "ValueType": "int", 53 | "CreatedTime": "1651429889.986235", 54 | "LastModifiedTime": "1651430744.0796146", 55 | "KeyHash": "a56145270ce6b3bebd1dd012b73948677dd618d496488bc608a3cb43ce3547dd", 56 | "KeyName": "index", 57 | "RecordedCount": 70000, 58 | }, 59 | { 60 | "LowerValue": "0or6", 61 | "EditorList": ["xxxx@yyy.com"], 62 | "Creator": "xxxx@yyy.com", 63 | "ValueHash": "665c5c8dca33d1e21cbddcf524c7d8e19ec4b6b1576bbb04032bdedd8e79d95a", 64 | "LastEditor": "xxxx@yyy.com", 65 | "UpperValue": "-1", 66 | "ValueType": "str", 67 | "CreatedTime": "1651430744.0796146", 68 | "LastModifiedTime": "1651430744.0796146", 69 | "KeyHash": "34627e3242f2ca21f540951cb5376600aebba58675654dd5f61e860c6948bffa", 70 | "KeyName": "correction", 71 | "RecordedCount": 74, 72 | }, 73 | { 74 | "LowerValue": "0", 75 | "EditorList": ["xxxx@yyy.com"], 76 | "Creator": "xxxx@yyy.com", 77 | "ValueHash": "0c2fb8f0d59d60a0a5e524c7794d1cf091a377e5c0d3b2cf19324432562555e1", 78 | "LastEditor": "xxxx@yyy.com", 79 | "UpperValue": "9", 80 | "ValueType": "str", 81 | "CreatedTime": "1651429889.986235", 82 | "LastModifiedTime": "1651430744.0796146", 83 | "KeyHash": "1aca80e8b55c802f7b43740da2990e1b5735bbb323d93eb5ebda8395b04025e2", 84 | "KeyName": "label", 85 | "RecordedCount": 70000, 86 | }, 87 | { 88 | "LowerValue": "0", 89 | "EditorList": ["xxxx@yyy.com"], 90 | "Creator": "xxxx@yyy.com", 91 | "ValueHash": "0c2fb8f0d59d60a0a5e524c7794d1cf091a377e5c0d3b2cf19324432562555e1", 92 | "LastEditor": "xxxx@yyy.com", 93 | "UpperValue": "9", 94 | "ValueType": "int", 95 | "CreatedTime": "1651429889.986235", 96 | "LastModifiedTime": "1651430744.0796146", 97 | "KeyHash": "1aca80e8b55c802f7b43740da2990e1b5735bbb323d93eb5ebda8395b04025e2", 98 | "KeyName": "originalLabel", 99 | "RecordedCount": 70000, 100 | }, 101 | { 102 | "LowerValue": "test", 103 | "EditorList": ["xxxx@yyy.com"], 104 | "Creator": "xxxx@yyy.com", 105 | "ValueHash": "0e546bb01e2c9a9d1c388fca8ce3fabdde16084aee10c58becd4767d39f62ab7", 106 | "LastEditor": "xxxx@yyy.com", 107 | "UpperValue": "train", 108 | "ValueType": "str", 109 | "CreatedTime": 
"1651429889.986235", 110 | "LastModifiedTime": "1651430744.0796146", 111 | "KeyHash": "9c98c4cbd490df10e7dc42f441c72ef835e3719d147241e32b962a6ff8c1f49d", 112 | "KeyName": "dataType", 113 | "RecordedCount": 70000, 114 | }, 115 | ] 116 | TEST_SUMMARY_OUTPUT = { 117 | "MaxRecordedCount": 70000, 118 | "UniqueKeyCount": 4, 119 | "MaxCharCount": { 120 | "KEY NAME": 23, 121 | "VALUE RANGE": 12, 122 | "VALUE TYPE": 34, 123 | "RECORDED COUNT": 14, 124 | }, 125 | "Keys": [ 126 | ("KEY NAME", "VALUE RANGE", "VALUE TYPE", "RECORDED COUNT"), 127 | ("'id','index'", "0 ~ 59999", "str('id'), int('index')", "70000"), 128 | ("'correction'", "0or6 ~ -1", "str('correction')", "74"), 129 | ( 130 | "'label','originalLabel'", 131 | "0 ~ 9", 132 | "str('label'), int('originalLabel')", 133 | "70000", 134 | ), 135 | ("'dataType'", "test ~ train", "str('dataType')", "70000"), 136 | ], 137 | } 138 | 139 | 140 | def test_initialize(): 141 | """ 142 | If something went wrong past test session. 143 | You may have exsiting tables, so you have to clear them before below tests. 144 | """ 145 | runner = CliRunner() 146 | result = runner.invoke(list_project, []) 147 | if PROJECT_NAME in result.output: 148 | result = runner.invoke(remove_project, [PROJECT_NAME]) 149 | 150 | result = runner.invoke(list_project, ["--archived"]) 151 | if PROJECT_NAME in result.output: 152 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 153 | 154 | 155 | def test_create_project(): 156 | create_project(USER_ID, PROJECT_NAME) 157 | time.sleep(20) 158 | 159 | 160 | def test_get_projects(): 161 | project_list = get_projects(USER_ID) 162 | assert any([project["ProjectName"] == PROJECT_NAME for project in project_list]) 163 | 164 | 165 | def test_add_datafiles(): 166 | project = Project(PROJECT_NAME) 167 | dir_path = TESTS_DIR 168 | extension = "jpeg" 169 | parsing_rule = "{_}/{title}.jpeg" 170 | file_num = project.add_datafiles(dir_path, extension, parsing_rule=parsing_rule) 171 | assert file_num == 1 172 | 173 | 174 | def test_add_datafile(): 175 | project = Project(PROJECT_NAME) 176 | file_path = os.path.join(TESTS_DIR, "data", "sample.jpeg") 177 | attributes = {"title": "sample"} 178 | file_num = project.add_datafile(file_path, attributes) 179 | 180 | 181 | def test_extract_metafile(): 182 | project = Project(PROJECT_NAME) 183 | file_path = os.path.join(TESTS_DIR, "data", "sample.xlsx") 184 | project.extract_metafile(file_path) 185 | 186 | 187 | def test_estimate_join_rule(): 188 | project = Project(PROJECT_NAME) 189 | file_path = os.path.join(TESTS_DIR, "data", "sample.xlsx") 190 | project.estimate_join_rule(file_path=file_path) 191 | 192 | 193 | def test_add_metafile(): 194 | project = Project(PROJECT_NAME) 195 | file_path = [os.path.join(TESTS_DIR, "data", "sample.xlsx")] 196 | project.add_metafile(file_path, auto=True) 197 | 198 | 199 | def test_get_metadata_summary(): 200 | project = Project(PROJECT_NAME) 201 | project.get_metadata_summary() 202 | 203 | 204 | def test_link_datafiles(): 205 | project = Project(PROJECT_NAME) 206 | dir_path = TESTS_DIR 207 | extension = "jpeg" 208 | project.link_datafiles(dir_path, extension) 209 | 210 | 211 | def test_add_member(): 212 | project = Project(PROJECT_NAME) 213 | project.add_member(INVITE_USER_ID, "Editor") 214 | 215 | 216 | def test_update_member(): 217 | project = Project(PROJECT_NAME) 218 | project.update_member(INVITE_USER_ID, "Admin") 219 | 220 | 221 | def test_get_members(): 222 | project = Project(PROJECT_NAME) 223 | project.get_members() 224 | 225 | 226 | def 
227 |     project = Project(PROJECT_NAME)
228 |     project.remove_member(INVITE_USER_ID)
229 | 
230 | 
231 | def test_archive_project():
232 |     archive_project(USER_ID, PROJECT_NAME)
233 | 
234 | 
235 | def test_delete_project():
236 |     delete_project(USER_ID, PROJECT_NAME)
237 |     delete_project_config(USER_ID, PROJECT_NAME)
238 | 
239 | 
240 | def test_summarize_keys_information():
241 |     result = summarize_keys_information(TEST_METADATA_SUMMARY)
242 |     assert result == TEST_SUMMARY_OUTPUT
243 | 
244 | 
245 | if __name__ == "__main__":
246 |     test_initialize()
247 |     test_create_project()
248 |     test_get_projects()
249 |     test_add_datafiles()
250 |     test_add_datafile()
251 |     test_extract_metafile()
252 |     test_estimate_join_rule()
253 |     test_add_metafile()
254 |     test_get_metadata_summary()
255 |     test_link_datafiles()
256 |     test_add_member()
257 |     test_update_member()
258 |     test_get_members()
259 |     test_remove_member()
260 |     test_archive_project()
261 |     test_delete_project()
262 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Adansons Base Document
2 | 
3 | - [Product Concept](#product-concept)
4 | - [0. Get Access Key](#0-get-access-key)
5 | - [1. Installation](#1-installation)
6 | - [2. Configuration](#2-configuration)
7 |   - [2.1 with CLI](#21-with-cli)
8 |   - [2.2 Environment Variables](#22-environment-variables)
9 | - [3. Tutorial 1: Organize metadata and Create a dataset](#3-tutorial-1-organize-metadata-and-create-a-dataset)
10 |   - [Step 0. prepare sample dataset](#step-0-prepare-sample-dataset)
11 |   - [Step 1. create a new project](#step-1-create-a-new-project)
12 |   - [Step 2. import data files](#step-2-import-data-files)
13 |   - [Step 3. import external metadata files](#step-3-import-external-metadata-files)
14 |   - [Step 4. filter and export dataset with CLI](#step-4-filter-and-export-dataset-with-cli)
15 |   - [Step 5. filter and export dataset with Python SDK](#step-5-filter-and-export-dataset-with-python-sdk)
16 | - [4. API Reference](#4-api-reference)
17 |   - [4.1 Command Reference](#41-command-reference)
18 |   - [4.2 Python Reference](#42-python-reference)
19 | 
20 | 
21 | ## Product Concept
22 | - Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets.
23 | - It makes dataset creation more effective, helps find essential insights from training results, and improves AI performance.
24 | 
25 | More detail
26 | ↓↓↓
27 | 
28 | - Medium
29 |   - https://medium.com/@KenichiHiguchi/3-things-you-need-to-deal-with-in-data-management-to-create-best-dataset-781177507fc2
30 | - Product Page
31 |   - https://adansons.wraptas.site
32 | 
33 | ---
34 | ## 0. Get Access Key
35 | 
36 | Type your email into the form below to join our Slack and get the access key.
37 | 
38 | Invitation Form: https://share.hsforms.com/1KG8Hp2kwSjC6fjVwwlklZA8moen
39 | 
40 | 
41 | ## 1. Installation
42 | 
43 | Adansons Base contains a Command Line Interface (CLI) and a Python SDK, and you can install both with the `pip` command.
44 | 
45 | ```bash
46 | pip install -U pip
47 | pip install adansons-base
48 | ```
49 | 
50 | > Note: if you want to use the CLI from any directory, you have to install the package with the Python interpreter that is installed globally on your computer.
51 | 
52 | ## 2. Configuration
53 | 
54 | ### 2.1 with CLI
55 | 
56 | when you run any Base CLI command for the first time, Base will ask for the access key provided on our Slack.
57 | 
58 | then, Base will verify that the specified access key is correct.
59 | 
60 | if you don't have an access key, please see [0. Get Access Key](#0-get-access-key).
61 | 
62 | this command will show you what projects you have:
63 | 
64 | ```bash
65 | base list
66 | ```
67 | 
68 | Output:
69 | 
70 | ```
71 | Welcome to Adansons Base!!
72 | 
73 | Let's start with your access key provided on our slack.
74 | 
75 | Please register your access key: xxxxxxxxxx
76 | 
77 | Successfully configured as xxxx@yyyy.com
78 | 
79 | projects
80 | ========
81 | ```
82 | 
83 | 
84 | ### 2.2 Environment Variables
85 | 
86 | if you don't want to configure interactively, you can use environment variables for configuration.
87 | 
88 | `BASE_USER_ID` is used for the identification of users; this is the email address you submitted via our form.
89 | 
90 | ```bash
91 | export BASE_ACCESS_KEY=xxxxxxxxxx
92 | export BASE_USER_ID=xxxx@yyyy.com
93 | ```
94 | 
95 | ## 3. Tutorial 1: Organize metadata and Create a dataset
96 | 
97 | let's start the Base tutorial with the mnist dataset.
98 | 
99 | ### Step 0. prepare sample dataset
100 | 
101 | first, install the dependencies for downloading the dataset.
102 | 
103 | ```bash
104 | pip install pypng
105 | ```
106 | 
107 | then, download the script for mnist from our Base repository.
108 | 
109 | ```bash
110 | curl -sSL https://raw.githubusercontent.com/adansons/base/main/download_mnist.py > download_mnist.py
111 | ```
112 | 
113 | run the download_mnist script. you can specify any folder to download into as the last argument (default: "~/dataset/mnist"). if you run this command on Windows, please replace it with a Windows path like "C:\dataset\mnist".
114 | 
115 | ```bash
116 | python3 ./download_mnist.py ~/dataset/mnist
117 | ```
118 | 
119 | > Note: Base can link the data files wherever you put them on your local computer. So if you already downloaded the mnist dataset, you can use it as is.
120 | 
121 | after downloading, you can see data files in ~/dataset/mnist.
122 | 
123 | ```
124 | ~
125 | └── dataset
126 |     └── mnist
127 |         ├── train
128 |         │   ├── 0
129 |         │   │   ├── 1.png
130 |         │   │   ├── ...
131 |         │   │   └── 59987.png
132 |         │   ├── ...
133 |         │   └── 9
134 |         └── test
135 |             ├── 0
136 |             └── ...
137 | ```
138 | 
139 | ### Step 1. create a new project
140 | 
141 | create the mnist project with the [base new](/docs/CLI.md#new) command.
142 | 
143 | ```bash
144 | base new mnist
145 | ```
146 | 
147 | Output:
148 | 
149 | ```
150 | Your Project UID
151 | ----------------
152 | abcdefghij0123456789
153 | 
154 | save Project UID in local file (~/.base/projects)
155 | ```
156 | 
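for reference, `~/.base/projects` is a plain INI file handled with Python's `configparser`: one section per user ID, mapping each project name to its Project UID. the values below are illustrative; your UID will differ.

```
[xxxx@yyyy.com]
mnist = abcdefghij0123456789
```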
157 | 
158 | Base will issue a Project Unique ID and automatically save it in a local file.
159 | 
160 | ### Step 2. import data files
161 | 
162 | after step 0, you have many png image files in the "~/dataset/mnist" directory.
163 | 
164 | let's upload the metadata related to their paths into the mnist project with the `base import` command.
165 | 
166 | ```bash
167 | base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"
168 | ```
169 | 
170 | > Note: if you changed the download folder, please replace "~/dataset/mnist" in the above command.
171 | 
172 | Output:
173 | 
174 | ```
175 | Check datafiles...
176 | found 70000 files with png extension.
177 | Success!
178 | ```
179 | 
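you can also do the same import from Python. the sketch below uses `Project.add_datafiles` as it is exercised in this repository's test suite; the directory path is illustrative, and we assume the return value is the number of registered files (as the tests assert).

```python
import os

from base import Project

# equivalent of `base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"`
project = Project("mnist")
file_num = project.add_datafiles(
    os.path.expanduser("~/dataset/mnist"),       # directory to scan
    "png",                                       # target file extension
    parsing_rule="{dataType}/{label}/{id}.png",  # how each path maps to metadata keys
)
print(file_num)  # 70000 for the full mnist download
```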
180 | 
181 | ### Step 3. import external metadata files
182 | 
183 | if you have external metadata files, you can integrate them into the existing project database with the `--external-file` option.
184 | 
185 | this time, we use `wrongImagesInMNISTTestset.csv`, published on GitHub by youkaichao.
186 | 
187 | [https://github.com/youkaichao/mnist-wrong-test](https://github.com/youkaichao/mnist-wrong-test)
188 | 
189 | this is extra metadata that corrects wrong labels in the mnist test dataset.
190 | 
191 | you can evaluate your model more strictly and correctly by using this extra metadata with Base.
192 | 
193 | download the external CSV
194 | 
195 | ```bash
196 | curl -SL https://raw.githubusercontent.com/youkaichao/mnist-wrong-test/master/wrongImagesInMNISTTestset.csv > ~/Downloads/wrongImagesInMNISTTestset.csv
197 | ```
198 | 
199 | ```bash
200 | base import mnist --external-file --path ~/Downloads/wrongImagesInMNISTTestset.csv -a dataType:test
201 | ```
202 | 
203 | Output:
204 | 
205 | ```
206 | 1 tables found!
207 | now estimating the rule for table joining...
208 | 
209 | 1 table joining rule was estimated!
210 | Below table joining rule will be applied...
211 | 
212 | Rule no.1
213 | 
214 | key 'index' -> connected to 'id' key on exist table
215 | key 'originalLabel' -> connected to 'label' key on exist table
216 | key 'correction' -> newly added
217 | 
218 | 1 tables will be applied
219 | Table 1 sample record:
220 | {'index': 8, 'originalLabel': 5, 'correction': '-1'}
221 | 
222 | Do you want to perform table join?
223 | Base will join tables with that rule described above.
224 | 
225 | 'y' will be accepted to approve.
226 | 
227 | Enter a value: y
228 | Success!
229 | ```
230 | 
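this join can be scripted as well. a minimal sketch using `Project.add_metafile` as it appears in the test suite; we assume `auto=True` corresponds to the CLI's `--auto-approve` behavior, and the CLI's `-a key:value` extra-attribute option has no equivalent shown here.

```python
import os

from base import Project

project = Project("mnist")
# estimate the table-join rule and apply it without the interactive prompt
project.add_metafile(
    [os.path.expanduser("~/Downloads/wrongImagesInMNISTTestset.csv")],  # list of external metadata files
    auto=True,  # assumed to auto-approve the estimated join rule, like --auto-approve
)
```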
231 | 
232 | ### Step 4. filter and export dataset with CLI
233 | 
234 | now, we are ready to create a dataset.
235 | 
236 | let's pick up a part of the data files, those with label 1, 2, or 3 in the training split, from the mnist project with the `base search` command.
237 | 
238 | you can use the `--conditions` option for a magical search filter and the `--query` option for an advanced filter.
239 | 
240 | Note that the `--query` option can only use values for searching.
241 | 
242 | 
243 | 
244 | Be careful: you may get a very large output on your console without the `-s, --summary` option.
245 | 
246 | The `--query` option's grammar is below.
247 | 
248 | `--query {KeyName} {Operator} {Values}`
249 | 
250 | - add 1 space between each section
251 | - **don't use spaces anywhere else**
252 | 
253 | You can use the operators below in the query option.
254 | 
255 | [operators]
256 | ```
257 | ==     : equal
258 | !=     : not equal
259 | >=     : greater than or equal
260 | <=     : less than or equal
261 | >      : greater than
262 | <      : less than
263 | in     : contained in the list of Values
264 | not in : not contained in the list of Values
265 | ```
266 | 
267 | (check [search docs](/docs/CLI.md#search) for more information).
268 | 
269 | ```bash
270 | base search mnist --conditions "train" --query "label in ['1','2','3']"
271 | ```
272 | 
273 | > Note: when you want to use an `in` or `not in` query, you have to specify each component of the list as a string, without spaces, like `"['1','2','3']"`.
274 | >
275 | 
276 | Output:
277 | 
278 | ```
279 | 18831 files
280 | ========
281 | '/home/xxxx/dataset/mnist/train/1/42485.png'
282 | ...
283 | ```
284 | 
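a single `--query` cannot express an OR across two different filters, but the Python SDK can: `Files` objects returned by `project.files()` support `|` as a deduplicated union, so you can combine two searches. a short sketch (the resulting count depends on your project):

```python
from base import Project

project = Project("mnist")
# label == 1 OR correction == -1, combined from two separate searches
ones = project.files(conditions="train", query=["label == 1"])
hard = project.files(conditions="test", query=["correction == -1"])
union = ones | hard  # `|` merges the two result sets and drops duplicate records
print(len(union))
```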
285 | 
286 | > Note: If you specify no conditions or query, Base will return all data files.
287 | 
288 | If you want to use an 'OR search' with the `--query` option, please use our Python SDK, as sketched above.
289 | 
290 | ### Step 5. filter and export dataset with Python SDK
291 | 
292 | in a Python script, you can filter and export datasets easily and simply with the `Project` and `Files` classes. (see [SDK docs](/docs/SDK.md#project-class))
293 | 
294 | (If you don't have the packages below, please install them by using `pip`)
295 | ```bash
296 | pip install numpy pillow torch torchvision
297 | ```
298 | 
299 | ```python
300 | from base import Project, Dataset
301 | import numpy as np
302 | from PIL import Image
303 | 
304 | 
305 | # export the dataset you want to use
306 | project = Project("mnist")
307 | files = project.files(conditions="train", query=["label in ['1','2','3']"])
308 | 
309 | print(files[0])
310 | # this returns a path-like `File` object
311 | # -> '/home/xxxx/dataset/mnist/train/1/12909.png'
312 | print(files[0].label)
313 | # this returns the value of the attribute 'label' of the first `File` object
314 | # -> '1'
315 | 
316 | # function to load an image from a path
317 | # this is necessary if you want to use images in your dataset,
318 | # because the base Dataset class doesn't convert a path into an image
319 | def preprocess_func(path):
320 |     image = Image.open(path)
321 |     image = image.resize((28, 28))
322 |     image = np.array(image)
323 |     return image
324 | 
325 | dataset = Dataset(files, target_key="label", transform=preprocess_func)
326 | 
327 | # you can also use dataset objects like this.
328 | for data, label in dataset:
329 |     # data: the image data as an ndarray
330 |     # label: the label of the image data, like '1'
331 |     pass
332 | 
333 | x_train, x_test, y_train, y_test = dataset.train_test_split(split_rate=0.2)
334 | 
335 | # or use it with torch
336 | import torch
337 | import torchvision.transforms as transforms
338 | from PIL import Image
339 | 
340 | def preprocess_func(path):
341 |     image = transforms.ToTensor()(transforms.Resize((28, 28))(Image.open(path)))
342 |     return image
343 | 
344 | dataset = Dataset(files, target_key="label", transform=preprocess_func)
345 | loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
346 | ```
347 | 
348 | finally, let's try one of the most characteristic use cases of Adansons Base.
349 | 
350 | in the external file you imported in step 3, some mnist test data files are annotated as `"-1"` in the correction column. this means that those files are difficult to classify, even for a human.
351 | 
352 | so, you should exclude those files from your dataset to evaluate your AI models more properly.
353 | 
354 | ```python
355 | # you can exclude files which have "-1" on "correction" with the code below
356 | eval_files = project.files(conditions="test", query=["correction != -1"])
357 | 
358 | print(len(eval_files))
359 | # this returns the number of files matched with the requested conditions or query
360 | # -> 9963
361 | 
362 | eval_dataset = Dataset(eval_files, target_key="label", transform=preprocess_func)
363 | ```
364 | 
365 | ## 4. API Reference
366 | 
367 | ### 4.1 Command Reference
368 | 
369 | [Command Reference](/docs/CLI.md)
370 | 
371 | ### 4.2 Python Reference
372 | 
373 | [Python Reference](/docs/SDK.md)
374 | 
--------------------------------------------------------------------------------
/base/parser.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import re
6 | from typing import List, Optional
7 | 
8 | 
9 | class Parser:
10 |     """
11 |     The class of path parsing.
12 |     Instance Args
13 |     -----------------
14 |     self.sep: str
15 |         Path separator
16 | 
17 |     self.parsing_rule: str
18 |         Path parsing rule
19 | 
20 |     self.splitters: List
21 |         The list of splitters in the parsing rule.
22 |         ex.) `["-", "/", "_"]`
23 | 
24 |     self.split_counts: List
25 |         The list of the number of splitters included in each value of the detailed parsing rule.
26 |         If you input the detailed parsing rule ({Origin}/{2022_04_05}-{dog}_{1}.png),
27 |         then self.split_counts is `[0, 2, 0, 0]`.
28 | 
29 |     self.unuse_strs: List
30 |         The list of unused strings in the parsing rule.
31 |         If you input the parsing rule (hoge{num1}/fuga{num2}.txt),
32 |         then self.unuse_strs is `["hoge", "fuga"]`.
33 |     """
34 | 
35 |     def __init__(
36 |         self, parsing_rule: str, extension: str, sep: Optional[str] = None
37 |     ) -> None:
38 |         """
39 |         Parameters
40 |         ----------
41 |         parsing_rule : str
42 |             specified parsing rule
43 |             ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
44 |             * a phrase in "{}" will be interpreted as a key
45 |             * "{_}" will be ignored
46 |         sep : str, default "/"
47 |             path separator char; this depends on the operating system
48 |         """
49 |         if sep is None:
50 |             self.sep = "/"
51 |         else:
52 |             self.sep = sep
53 | 
54 |         self.parsing_rule = parsing_rule
55 |         self.extension = extension
56 |         self.__generate_parser()
57 | 
58 |     def __generate_parser(self, is_update: bool = False) -> None:
59 |         """
60 |         Generate parsing keys and pattern from the parsing rule
61 |         Parameters (instance vars)
62 |         --------------------------
63 |         self.parsing_rule : str
64 |             specified parsing rule
65 |             ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
66 |             * a phrase in "{}" will be interpreted as a key
67 |             * "{_}" will be ignored
68 |         is_update : bool
69 |             whether to update the parsing rule
70 |         Returns (instance vars)
71 |         -----------------------
72 |         self.parsing_keys : list
73 |             list of keys
74 |             ex.) ["_", "name", "timestamp", "sensor", "condition", "iteration"]
["_", "name", "timestamp", "sensor", "condition", "iteration"] 75 | """ 76 | # 正規表現"(.*)"は任意の文字の任意回数の繰り返し 77 | # "{(.*)}"とすることで、{}に囲まれた文字列を抽出できる 78 | # しかし、複数{}がある時に、どの"{"と"}"の組み合わせを取れば良いかわからない 79 | # "{(.*?)}"と?をつけることで、最小文字数の文字列を囲んだ{}をpickupできる 80 | self.parsing_rule = self.convert_parsable_parse_rule() 81 | 82 | parsing_keys = re.findall("{(.*?)}", self.parsing_rule) 83 | splitters = self.extract_splitter() 84 | 85 | if is_update: 86 | split_counts = self.count_splitter_in_each_key(parsing_keys) 87 | else: 88 | split_counts = [0] * len(parsing_keys) 89 | 90 | self.splitters = splitters 91 | self.split_counts = split_counts 92 | 93 | if not is_update: 94 | self.parsing_keys = parsing_keys 95 | 96 | def __call__(self, path: str) -> dict: 97 | """ 98 | Parse path with generated parser 99 | Parameters 100 | ---------- 101 | path : str 102 | required the file path 103 | Returns 104 | ------- 105 | parsed_dict : dict 106 | meta data dictionary 107 | Example 108 | ------- 109 | >>> from base.parser import Parser 110 | >>> parser = Parser("your parsing rule") 111 | >>> result = parser("your target path for parse") 112 | """ 113 | if path.startswith(self.sep): 114 | path = (self.sep).join(path.split(self.sep)[1:]) 115 | 116 | # ユーザーから取得したparsing_ruleを元に 117 | # hoge/2022-03-14/A-200-A-a-01 から 118 | # {hoge}/{2022-03-14}/{A-200}-{A}-{a-01} に変換する 119 | # "{(.*?)}"で{}内の値を抽出できるので良きみ 120 | path = self.convert_path_to_parsable_format(path) 121 | parsed_values = re.findall("{(.*?)}", path) 122 | 123 | parsed_dict = {} 124 | for key, value in zip(self.parsing_keys, parsed_values): 125 | if key in ["_", "[UnuseToken]"]: 126 | continue 127 | parsed_dict[key] = value 128 | 129 | return parsed_dict 130 | 131 | def update_rule(self, parsing_rule: str) -> None: 132 | """ 133 | Update parsing rule and re-generate parsing keys and pattern 134 | Parameters 135 | ---------- 136 | parsing_rule : str 137 | specified parsing rule 138 | ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv 139 | * phrase in "{}" will be interpretered as key 140 | * "{_}" will be ignored 141 | """ 142 | self.parsing_rule = parsing_rule 143 | self.__generate_parser(is_update=True) 144 | 145 | def is_path_parsable(self, path: str) -> bool: 146 | """ 147 | Check path is parsable 148 | Parameters 149 | --------------------- 150 | path : str 151 | ex) `dataset/dog/2022_12_04-A.png` 152 | Return 153 | --------------------- 154 | parsable_flag : bool 155 | """ 156 | if path.startswith(self.sep): 157 | path = (self.sep).join(path.split(self.sep)[1:]) 158 | 159 | parsable_flag = True 160 | try: 161 | path = self.convert_path_to_parsable_format(path) 162 | except: 163 | parsable_flag = False 164 | 165 | return parsable_flag 166 | 167 | def convert_parsable_parse_rule(self) -> str: 168 | """ 169 | Replace unused strings in `parsing_rule` with `{_}` 170 | 171 | Return 172 | --------------------- 173 | parsable_parsing_rule: str 174 | ex.) `{_}/{num1}/{fuga}/{num2}.txt` 175 | """ 176 | if not self.parsing_rule.endswith("." + self.extension): 177 | self.parsing_rule += "." 
178 | 
179 |         self.unuse_strs = self.extract_unuse_str()
180 | 
181 |         for not_use_str in self.unuse_strs:
182 |             self.parsing_rule = self.parsing_rule.replace(not_use_str, "{[UnuseToken]}")
183 | 
184 |         self.parsing_rule = self.parsing_rule.replace("}{", "}" + self.sep + "{")
185 | 
186 |         parsable_parsing_rule = self.parsing_rule
187 |         return parsable_parsing_rule
188 | 
189 |     def convert_path_to_parsable_format(self, path: str) -> str:
190 |         """
191 |         Convert path to a parsable format
192 |         1. Insert separators before and after unused strings.
193 |         2. Enclose each value to be extracted from the path with `{}`.
194 |             - Add `{` to `parsable_format_path` right after a separator `/`.
195 |             - `}` is added to `parsable_format_path`:
196 |                 - just before a separator is added,
197 |                 - when the number of splitters added to `parsable_format_path` after the last `{`
198 |                   reaches the count determined by the parsing_rule
199 | 
200 |         Parameters
201 |         ---------------
202 |         path : str
203 |             ex.) `Origin/suzuki/2022_12_03/A-200-C-100.csv`
204 |         Return
205 |         --------------
206 |         parsable_format_path : str
207 |             the string converted for path parsing
208 |             ex.) `{Origin}/{suzuki}/{2022_12_03}/{A-200}-{C}-{100}.csv`
209 |         """
210 |         closure_cnt, split_cnt = 0, 0
211 | 
212 |         path = self.insert_sep_to_path(path)
213 | 
214 |         parsable_format_path = "{"
215 |         for s in path:
216 |             if (s in self.splitters) and (split_cnt == self.split_counts[closure_cnt]):
217 |                 parsable_format_path += "}" + s + "{"
218 |                 closure_cnt += 1
219 |                 split_cnt = 0
220 | 
221 |             else:
222 |                 if s in self.splitters:
223 |                     split_cnt += 1
224 |                 parsable_format_path += s
225 | 
226 |         return parsable_format_path
227 | 
228 |     def extract_sub_splitters(self) -> List:
229 |         """
230 |         Extract strings not enclosed in `{}`
231 |         Returns
232 |         -----------------
233 |         sub_splitters : List
234 |             ex) `["hoge", "/fuga"]`
235 |         """
236 |         parsing_rule_ = self.parsing_rule
237 | 
238 |         # switch `{XX}` to `}XX{`
239 |         parsing_rule_ = parsing_rule_.replace("{", "[RPTRight]").replace(
240 |             "}", "[RPTLeft]"
241 |         )
242 |         parsing_rule_ = parsing_rule_.replace("[RPTRight]", "}").replace(
243 |             "[RPTLeft]", "{"
244 |         )
245 |         parsing_rule_ = "{" + parsing_rule_ + "}"
246 | 
247 |         sub_splitters = re.findall(r"\{(.*?)\}", parsing_rule_)
248 | 
249 |         return sub_splitters
250 | 
251 |     def extract_splitter(self) -> List:
252 |         """
253 |         Find splitters such as '/', '-' or '_' etc...
254 | 
255 |         Return
256 |         ---------------------
257 |         splitters : list
258 |             ex) `["/", "/", "/", "-", "_", "."]`
259 |         """
260 |         sub_splitters = self.extract_sub_splitters()
261 | 
262 |         code_pattern = re.compile("[!-/:-@[-`{-~]")
263 | 
264 |         splitters = []
265 |         for sub_sp in sub_splitters:
266 |             sp = re.findall(code_pattern, sub_sp)
267 |             splitters.extend(sp)
268 | 
269 |         return splitters
270 | 
271 |     def extract_unuse_str(self) -> List:
272 |         """
273 |         Find the strings that are not used as values.
274 |         Return
275 |         ---------------------
276 |         unuse_strs : List
277 |             ex) `["hoge", "fuga"]`
278 |         """
279 |         sub_splitters = self.extract_sub_splitters()
280 |         str_pattern = re.compile("[^!-/:-@[-`{-~]+")
281 | 
282 |         unuse_strs = []
283 |         for sub_sp in sub_splitters:
284 |             strs = re.findall(str_pattern, sub_sp)
285 |             unuse_strs.extend(strs)
286 | 
287 |         return unuse_strs
288 | 
289 |     def count_splitter_in_each_key(self, values: List) -> List:
290 |         """
291 |         Count the number of splitters in each key
292 |         Parameters
293 |         ----------------
294 |         values : List
295 |             The list of values extracted from the `path`
296 |             ex.) `["hoge", "/fuga"]` (path: hoge{num1}/fuga{num2}.txt)
`["hoge", "/fuga"]` (path: hoge{num1}/fuga{num2}.txt) 297 | 298 | Retern 299 | --------------- 300 | split_cnts : List 301 | The list of the number of splitters in each key 302 | ex.) ` 303 | """ 304 | split_cnts = [] 305 | for value in values: 306 | split_in_value = 0 307 | if value != "_": 308 | for split in set(self.splitters): 309 | split_in_value += value.count(split) 310 | split_cnts.append(split_in_value) 311 | 312 | return split_cnts 313 | 314 | def insert_sep_to_path(self, path: str) -> str: 315 | """ 316 | Insert splitter before/after `unuse_str` in the path 317 | Parameters 318 | ------------------ 319 | path : str 320 | ex.) `Origin/hoge1/fugasuzukipiyo_03.csv` 321 | Returns 322 | ------------------ 323 | path : str 324 | ex.) `{Origin}/{hoge}/{1}/{fuga}/{suzuki}/{piyo}_{03}.csv` 325 | """ 326 | for unuse_str in self.unuse_strs: 327 | path = path.replace(unuse_str, self.sep + unuse_str + self.sep) 328 | 329 | path = path.replace(self.sep * 3, self.sep) 330 | 331 | splitters = sorted(list(set(self.splitters))) 332 | 333 | # Put "/" at the last position in `splitters` 334 | if self.sep in splitters: 335 | splitters.remove(self.sep) 336 | splitters.append(self.sep) 337 | 338 | for splitter in splitters: 339 | path = path.replace(splitter + self.sep, self.sep).replace( 340 | self.sep + splitter, self.sep 341 | ) 342 | 343 | return path 344 | 345 | def validate_parsing_rule(self) -> bool: 346 | """ 347 | Check that the parsing_rule is valid 348 | Parameter 349 | ---------------- 350 | self.parsing_keys 351 | 352 | Return 353 | ---------------- 354 | is_valid : bool 355 | """ 356 | if len(self.parsing_keys) == 0: 357 | is_valid = False 358 | elif len(self.parsing_keys) == 1 and self.parsing_keys[0] == "[UnuseToken]": 359 | is_valid = False 360 | else: 361 | is_valid = True 362 | return is_valid 363 | 364 | 365 | if __name__ == "__main__": 366 | pass 367 | -------------------------------------------------------------------------------- /tests/test_cli.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright 2022 Adansons Inc. 4 | # Please contact engineer@adansons.co.jp 5 | 6 | import os 7 | 8 | import time 9 | from unittest import result, runner 10 | from click.testing import CliRunner 11 | from base.cli import ( 12 | create_table, 13 | import_data, 14 | list_project, 15 | remove_project, 16 | show_project_detail, 17 | import_data, 18 | data_link, 19 | search_files, 20 | invite_member, 21 | ) 22 | 23 | 24 | PROJECT_NAME = "adansons_test_project" 25 | INVITE_USER_ID = "test_invite@adansons.co.jp" 26 | 27 | 28 | def test_initialize(): 29 | """ 30 | If something went wrong past test session. 31 | You may have exsiting tables, so you have to clear them before below tests. 
32 | """ 33 | runner = CliRunner() 34 | result = runner.invoke(list_project, []) 35 | if PROJECT_NAME in result.output: 36 | result = runner.invoke(remove_project, [PROJECT_NAME]) 37 | 38 | result = runner.invoke(list_project, ["--archived"]) 39 | if PROJECT_NAME in result.output: 40 | result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"]) 41 | 42 | 43 | def test_create_table(): 44 | runner = CliRunner() 45 | result = runner.invoke(create_table, [PROJECT_NAME]) 46 | assert result.exit_code == 0 47 | assert "Your Project UID" in result.output 48 | 49 | 50 | def test_list_project(): 51 | runner = CliRunner() 52 | result = runner.invoke(list_project, []) 53 | assert result.exit_code == 0 54 | assert PROJECT_NAME in result.output 55 | 56 | 57 | def test_show_project_detail(): 58 | # wait create table 59 | time.sleep(20) 60 | runner = CliRunner() 61 | result = runner.invoke(show_project_detail, [PROJECT_NAME]) 62 | assert result.exit_code == 0 63 | assert f"project {PROJECT_NAME}" in result.output 64 | 65 | 66 | def test_import_dataset_with_invalid_rule(): 67 | runner = CliRunner() 68 | result = runner.invoke( 69 | import_data, 70 | [ 71 | PROJECT_NAME, 72 | "-d", 73 | os.path.dirname(__file__), 74 | "-e", 75 | "png", 76 | "-c", 77 | "{_}/{date}_{key}.png", 78 | ], 79 | input="{data}/{2022_04_14}_{rocket}.png", 80 | ) 81 | assert result.exit_code == 0 82 | assert "Success!" in result.output 83 | 84 | 85 | def test_import_dataset(): 86 | runner = CliRunner() 87 | result = runner.invoke( 88 | import_data, 89 | [ 90 | PROJECT_NAME, 91 | "-d", 92 | os.path.dirname(__file__), 93 | "-e", 94 | "jpeg", 95 | "-c", 96 | "{_}/{title}.jpeg", 97 | ], 98 | ) 99 | assert result.exit_code == 0 100 | assert "Success!" in result.output 101 | 102 | 103 | def test_import_metafile_extract(): 104 | runner = CliRunner() 105 | result = runner.invoke( 106 | import_data, 107 | [ 108 | PROJECT_NAME, 109 | "-m", 110 | "-p", 111 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 112 | "--extract", 113 | ], 114 | ) 115 | assert result.exit_code == 0 116 | assert "Success!" in result.output 117 | 118 | 119 | def test_import_metafile_estimate_rule(): 120 | runner = CliRunner() 121 | result = runner.invoke( 122 | import_data, 123 | [ 124 | PROJECT_NAME, 125 | "-m", 126 | "-p", 127 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 128 | "--estimate-rule", 129 | ], 130 | ) 131 | assert result.exit_code == 0 132 | assert "Success!" in result.output 133 | 134 | 135 | def test_import_metafile(): 136 | runner = CliRunner() 137 | result = runner.invoke( 138 | import_data, 139 | [ 140 | PROJECT_NAME, 141 | "-m", 142 | "-p", 143 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 144 | "--auto-approve", 145 | ], 146 | ) 147 | assert result.exit_code == 0 148 | assert "Success!" in result.output 149 | 150 | 151 | def test_import_metafile_modify(): 152 | runner = CliRunner() 153 | result = runner.invoke( 154 | import_data, 155 | [ 156 | PROJECT_NAME, 157 | "-m", 158 | "-p", 159 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 160 | ], 161 | input="m", 162 | ) 163 | assert result.exit_code == 0 164 | assert "Success!" 
in result.output 165 | 166 | 167 | def test_import_metafile_modify_join_rule_file(): 168 | runner = CliRunner() 169 | result = runner.invoke( 170 | import_data, 171 | [ 172 | PROJECT_NAME, 173 | "-m", 174 | "--join-rule", 175 | "joinrule_definition_adansons_test_project.yml", 176 | ], 177 | input="y", 178 | ) 179 | assert result.exit_code == 0 180 | assert "Success!" in result.output 181 | 182 | 183 | def test_import_metafile_exkeyvalue(): 184 | runner = CliRunner() 185 | result = runner.invoke( 186 | import_data, 187 | [ 188 | PROJECT_NAME, 189 | "-m", 190 | "-p", 191 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 192 | "-a", 193 | "key1:value1", 194 | "--auto-approve", 195 | ], 196 | ) 197 | assert result.exit_code == 0 198 | assert "Success!" in result.output 199 | 200 | 201 | def test_import_metafile_exkeyvalue_multiple(): 202 | runner = CliRunner() 203 | result = runner.invoke( 204 | import_data, 205 | [ 206 | PROJECT_NAME, 207 | "-m", 208 | "-p", 209 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 210 | "-a", 211 | "key1:value1", 212 | "-a", 213 | "key2:value2", 214 | "--auto-approve", 215 | ], 216 | ) 217 | assert result.exit_code == 0 218 | assert "Success!" in result.output 219 | 220 | 221 | def test_import_metafile_exkeyvalue_invalid(): 222 | runner = CliRunner() 223 | result = runner.invoke( 224 | import_data, 225 | [ 226 | PROJECT_NAME, 227 | "-m", 228 | "-p", 229 | os.path.join(os.path.dirname(__file__), "data", "sample.xlsx"), 230 | "-a", 231 | "key1-value1", 232 | "--auto-approve", 233 | ], 234 | ) 235 | assert result.exit_code == 0 236 | assert "invalid" in result.output 237 | 238 | 239 | def test_import_metafile_csv(): 240 | runner = CliRunner() 241 | result = runner.invoke( 242 | import_data, 243 | [ 244 | PROJECT_NAME, 245 | "-m", 246 | "-p", 247 | os.path.join(os.path.dirname(__file__), "data", "sample.csv"), 248 | "--auto-approve", 249 | ], 250 | ) 251 | assert result.exit_code == 0 252 | assert "Success!" in result.output 253 | 254 | 255 | def test_import_metafile_csv_exkeyvalue(): 256 | runner = CliRunner() 257 | result = runner.invoke( 258 | import_data, 259 | [ 260 | PROJECT_NAME, 261 | "-m", 262 | "-p", 263 | os.path.join(os.path.dirname(__file__), "data", "sample.csv"), 264 | "-a", 265 | "key1:value1", 266 | "--auto-approve", 267 | ], 268 | ) 269 | assert result.exit_code == 0 270 | assert "Success!" in result.output 271 | 272 | 273 | def test_import_metafile_csv_exkeyvalue_multiple(): 274 | runner = CliRunner() 275 | result = runner.invoke( 276 | import_data, 277 | [ 278 | PROJECT_NAME, 279 | "-m", 280 | "-p", 281 | os.path.join(os.path.dirname(__file__), "data", "sample.csv"), 282 | "-a", 283 | "key1:value1", 284 | "-a", 285 | "key2:value2", 286 | "--auto-approve", 287 | ], 288 | ) 289 | assert result.exit_code == 0 290 | assert "Success!" in result.output 291 | 292 | 293 | def test_data_link(): 294 | runner = CliRunner() 295 | result = runner.invoke( 296 | data_link, 297 | [ 298 | PROJECT_NAME, 299 | "-d", 300 | os.path.dirname(__file__), 301 | "-e", 302 | "jpeg", 303 | ], 304 | ) 305 | assert result.exit_code == 0 306 | assert "linked!" 
in result.output
307 | 
308 | 
309 | def test_import_metafile_csv_exkeyvalue_invalid():
310 |     runner = CliRunner()
311 |     result = runner.invoke(
312 |         import_data,
313 |         [
314 |             PROJECT_NAME,
315 |             "-m",
316 |             "-p",
317 |             os.path.join(os.path.dirname(__file__), "data", "sample.csv"),
318 |             "-a",
319 |             "key1-value1",
320 |         ],
321 |     )
322 |     assert result.exit_code == 0
323 |     assert "invalid" in result.output
324 | 
325 | 
326 | def test_search_files():
327 |     time.sleep(5)
328 |     runner = CliRunner()
329 |     result = runner.invoke(search_files, [PROJECT_NAME, "-q", "title == sample"])
330 |     assert result.exit_code == 0
331 |     assert "1 files" in result.output
332 | 
333 | 
334 | def test_get_project_member():
335 |     runner = CliRunner()
336 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
337 |     assert result.exit_code == 0
338 |     assert "project Members" in result.output
339 | 
340 | 
341 | def test_invite_project_member():
342 |     runner = CliRunner()
343 |     result = runner.invoke(
344 |         invite_member, [PROJECT_NAME, "-m", INVITE_USER_ID, "-p", "Editor"]
345 |     )
346 |     assert result.exit_code == 0
347 |     assert "Successfully" in result.output
348 |     runner = CliRunner()
349 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
350 |     assert result.exit_code == 0
351 |     assert f"{INVITE_USER_ID} (Editor" in result.output
352 | 
353 | 
354 | def test_change_permission():
355 |     runner = CliRunner()
356 |     result = runner.invoke(
357 |         invite_member, [PROJECT_NAME, "-m", INVITE_USER_ID, "-p", "Admin", "-u"]
358 |     )
359 |     assert result.exit_code == 0
360 |     assert "Successfully" in result.output
361 |     runner = CliRunner()
362 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
363 |     assert result.exit_code == 0
364 |     assert f"{INVITE_USER_ID} (Admin" in result.output
365 | 
366 | 
367 | # skip test_change_project_owner because it is difficult to handle multiple users in the CLI
368 | def test_delete_project_member():
369 |     runner = CliRunner()
370 |     result = runner.invoke(remove_project, [PROJECT_NAME, "-m", INVITE_USER_ID])
371 |     assert result.exit_code == 0
372 |     assert f"{INVITE_USER_ID} was removed from {PROJECT_NAME}" in result.output
373 |     runner = CliRunner()
374 |     result = runner.invoke(show_project_detail, [PROJECT_NAME, "--member-list"])
375 |     assert result.exit_code == 0
376 |     assert f"{INVITE_USER_ID} (Admin" not in result.output
377 | 
378 | 
379 | def test_archive_project():
380 |     runner = CliRunner()
381 |     result = runner.invoke(remove_project, [PROJECT_NAME])
382 |     assert result.exit_code == 0
383 |     assert f"{PROJECT_NAME} was Archived" in result.output
384 |     result = runner.invoke(list_project, [])
385 |     assert result.exit_code == 0
386 |     assert PROJECT_NAME not in result.output
387 | 
388 | 
389 | def test_list_archived_project():
390 |     runner = CliRunner()
391 |     result = runner.invoke(list_project, ["--archived"])
392 |     assert result.exit_code == 0
393 |     assert PROJECT_NAME in result.output
394 | 
395 | 
396 | def test_delete_project():
397 |     runner = CliRunner()
398 |     result = runner.invoke(remove_project, [PROJECT_NAME, "--confirm"])
399 |     assert result.exit_code == 0
400 |     assert f"{PROJECT_NAME} was Deleted" in result.output
401 |     result = runner.invoke(list_project, ["--archived"])
402 |     assert result.exit_code == 0
403 |     assert PROJECT_NAME not in result.output
404 | 
405 | 
406 | if __name__ == "__main__":
407 |     test_initialize()
408 |     test_create_table()
409 |     test_list_project()
410 |     test_show_project_detail()
411 |     test_import_dataset()
412 |     test_import_metafile_extract()
413 |     test_import_metafile_estimate_rule()
414 |     test_import_metafile()
415 |     test_import_metafile_modify()
416 |     test_import_metafile_modify_join_rule_file()
417 |     test_import_metafile_exkeyvalue()
418 |     test_import_metafile_exkeyvalue_multiple()
419 |     test_import_metafile_csv_exkeyvalue()
420 |     test_import_metafile_csv_exkeyvalue_multiple()
421 |     test_import_metafile_csv()
422 |     test_data_link()
423 |     test_import_metafile_csv_exkeyvalue_invalid()
424 |     test_import_metafile_exkeyvalue_invalid()
425 |     test_search_files()
426 |     test_get_project_member()
427 |     test_invite_project_member()
428 |     test_change_permission()
429 |     test_delete_project_member()
430 |     test_archive_project()
431 |     test_list_archived_project()
432 |     test_delete_project()
433 | 
--------------------------------------------------------------------------------
/base/files.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import os
6 | import re
7 | import json
8 | import copy
9 | import requests
10 | import urllib.parse
11 | from typing import Optional, Union, List, Any
12 | 
13 | from base.config import (
14 |     get_user_id,
15 |     get_access_key,
16 |     get_project_uid,
17 |     BASE_API_ENDPOINT,
18 | )
19 | 
20 | HEADER = {"Content-Type": "application/json"}
21 | LINKER_DIR = os.path.join(os.path.expanduser("~"), ".base", "linker")
22 | 
23 | 
24 | class File(str):
25 |     """
26 |     File class
27 | 
28 |     Attributes
29 |     ----------
30 |     path : str
31 |         path of the file
32 |     attrs : dict
33 |         dict of attributes (metadata) which are related to this file
34 | 
35 |     Note
36 |     ----
37 |     The metadata of the file are added as attributes.
38 | """ 39 | 40 | def __new__(cls, file_path: str, attrs: dict): 41 | self = super().__new__(cls, file_path) 42 | self.path = file_path 43 | self.metadata = attrs 44 | self.__dict__.update(attrs) 45 | return self 46 | 47 | def __getitem__(self, key: str) -> Any: 48 | return self.__dict__[key] 49 | 50 | 51 | class Files: 52 | """ 53 | Files class 54 | 55 | Attributes 56 | ---------- 57 | project_name : str 58 | registerd project name 59 | user_id : str 60 | registerd user id 61 | project_uid : str 62 | project unique hash 63 | conditions : str, default None 64 | value of the condition to search for files 65 | query : list of str, default [] 66 | conditional expression of key and value to search for files 67 | sort_key : str, default None 68 | key to sort files 69 | result : list of dict 70 | search result 71 | files : list 72 | list of File class 73 | paths : list 74 | list of filepath 75 | items : list of dict 76 | metadata other than filepath 77 | """ 78 | 79 | def __init__( 80 | self, 81 | project_name: str, 82 | conditions: Optional[str] = None, 83 | query: List[str] = [], 84 | sort_key: Union[str, List[str], None] = None, 85 | ) -> None: 86 | """ 87 | Parameters 88 | ---------- 89 | project_name : str 90 | registerd project name 91 | conditions : str, default None 92 | value of the condition to search for files 93 | query : list of str, default [] 94 | conditional expression of key and value to search for files 95 | sort_key : str, default None 96 | key to sort files 97 | """ 98 | access_key = get_access_key() 99 | HEADER.update({"x-api-key": access_key}) 100 | 101 | self.project_name = project_name 102 | self.user_id = get_user_id() 103 | self.project_uid = get_project_uid(self.user_id, project_name) 104 | 105 | self.sort_key = sort_key 106 | 107 | self.__export(conditions=conditions, query=query, sort_key=sort_key) 108 | 109 | self.reprtext = self.__reprtext_generator(conditions, query) 110 | self.expression = self.__class__.__name__ 111 | 112 | def __search( 113 | self, conditions: Optional[str] = None, query: List[str] = [] 114 | ) -> List[dict]: 115 | """ 116 | Get metadata of filtered files from DynamoDB. 
117 | 
118 |         Parameters
119 |         ----------
120 |         conditions : str, default None
121 |             value of the condition to search for files
122 |         query : list of str, default []
123 |             conditional expression of key and value to search for files
124 | 
125 | 
126 | 
127 |         Returns
128 |         -------
129 |         result : list of dict
130 |             search result of metadata
131 |         """
132 |         url = f"{BASE_API_ENDPOINT}/project/{self.project_uid}/files"
133 |         if conditions is not None:
134 |             url += "/" + "/".join(map(urllib.parse.quote_plus, conditions.split(",")))
135 |         url += "?user=" + self.user_id
136 | 
137 |         res = requests.get(url=url, headers=HEADER)
138 |         if res.status_code == 200:
139 |             result_url = res.json()["URL"]
140 |             result = requests.get(result_url)
141 |             result = json.loads(result.content.decode("utf-8"))["Items"]
142 |         else:
143 |             raise Exception("Undefined error happened.")
144 | 
145 |         result = self.__query_filter(result, query)
146 | 
147 |         linked_hash_location = os.path.join(
148 |             LINKER_DIR, self.project_uid, "linked_hash.json"
149 |         )
150 |         with open(linked_hash_location, "r", encoding="utf-8") as f:
151 |             hash_dict = json.loads(f.read())
152 |         result = [{"FilePath": hash_dict[i.pop("FileHash")], **i} for i in result]
153 | 
154 |         return result
155 | 
156 |     def __export(
157 |         self,
158 |         conditions: Optional[str] = None,
159 |         query: List[str] = [],
160 |         sort_key: Union[str, List[str], None] = None,
161 |     ):
162 |         """
163 |         Get metadata and return the Files instance.
164 | 
165 |         Parameters
166 |         ----------
167 |         conditions : str, default None
168 |             value of the condition to search for files
169 |         query : list of str, default []
170 |             conditional expression of key and value to search for files
171 |         sort_key : str, default None
172 |             key to sort files
173 | 
174 |         Returns
175 |         -------
176 |         self : Files class instance
177 |         """
178 |         # argument validation
179 |         self.__validate_args(conditions, query, sort_key)
180 | 
181 |         result = self.__search(conditions, query)
182 |         if sort_key is not None:
183 |             if isinstance(sort_key, str):
184 |                 sort_key = [sort_key]
185 |             result = sorted(
186 |                 result,
187 |                 key=lambda x: [x.get(key, float("inf")) for key in sort_key],
188 |             )
189 | 
190 |         self.result = result
191 |         self.__set_attributes(result)
192 | 
193 |         return self
194 | 
195 |     def filter(
196 |         self,
197 |         conditions: Optional[str] = None,
198 |         query: List[str] = [],
199 |         sort_key: Union[str, List[str], None] = None,
200 |     ):
201 |         """
202 |         Filter metadata and return the Files instance.
203 | 
204 |         Parameters
205 |         ----------
206 |         conditions : str, default None
207 |             value of the condition to search for files
208 |         query : list of str, default []
209 |             conditional expression of key and value to search for files
210 |         sort_key : str, default None
211 |             key to sort files
212 | 
213 |         Returns
214 |         -------
215 |         self : Files class instance
216 |         """
217 |         # argument validation
218 |         self.__validate_args(conditions, query, sort_key)
219 | 
220 |         filtered_files = copy.copy(self)
221 |         filtered_files.sort_key = (
222 |             sort_key or self.sort_key
223 |         )  # value1 or value2 <==> value2 if value1 is None else value1
224 | 
225 |         result = filtered_files.result
226 |         if conditions is not None:
227 |             result = filtered_files.__conditions_filter(result, conditions)
228 |         if len(query) > 0:
229 |             result = filtered_files.__query_filter(result, query)
230 |         if sort_key is not None:
231 |             if isinstance(sort_key, str):
232 |                 sort_key = [sort_key]
233 |             result = sorted(
234 |                 result,
235 |                 key=lambda x: [x.get(key, float("inf")) for key in sort_key],
236 |             )
237 | 
238 |         filtered_files.result = result
239 |         filtered_files.__set_attributes(result)
240 | 
241 |         return filtered_files
242 | 
243 |     def __query_filter(self, result: List[dict], query: List[str] = []) -> List[dict]:
244 |         """
245 |         Filter metadata with query.
246 | 
247 |         Parameters
248 |         ----------
249 |         result : list of dict
250 |             search result of metadata
251 |         query : list of str, default []
252 |             conditional expression of key and value to search for files
253 | 
254 |         Returns
255 |         -------
256 |         result : list of dict
257 |             metadata filtered with query
258 |         """
259 | 
260 |         def number_to_int(obj: str):
261 |             return int(obj) if obj.isdigit() else obj
262 | 
263 |         def natural_keys(primary_class: str):
264 |             def sort_function(obj):
265 |                 try:
266 |                     keys = [eval(primary_class)(obj)]
267 |                 except Exception:
268 |                     keys = [number_to_int(c) for c in re.split(r"(\d+)", str(obj))]
269 |                 finally:
270 |                     return keys
271 | 
272 |             return sort_function
273 | 
274 |         unquote = lambda v: v.lstrip("'").rstrip("'").lstrip('"').rstrip('"')
275 | 
276 |         for q in query:
277 |             queried_result = []
278 | 
279 |             query_split = q.split(" ", 2)
280 |             if len(query_split) < 3 or query_split[1] not in [
281 |                 "==",
282 |                 "!=",
283 |                 ">",
284 |                 ">=",
285 |                 "<",
286 |                 "<=",
287 |                 "in",
288 |                 "is",
289 |                 "not",
290 |             ]:
291 |                 raise ValueError(
292 |                     "Invalid query grammar. See docs about query option.\nhttps://github.com/adansons/base/blob/main/docs/CLI.md#search"
293 |                 )
294 | 
295 |             # if q = "label <= 7" or q = "label <= '7'"
296 |             # key = "label", value = "7", operator = "<="
297 |             if query_split[1] in ["in", "is", "not"]:
298 |                 key = query_split[0]
299 |                 qs_ = query_split[2].split(" ", 1)
300 |                 value = unquote(qs_[-1])
301 |                 operator = " ".join([query_split[1]] + qs_[:-1])
302 |             else:
303 |                 key = query_split[0]
304 |                 value = unquote(query_split[-1])
305 |                 operator = " ".join(query_split[1:-1])
306 | 
307 |             if operator == "==":
308 |                 for data in result:
309 |                     if key in data and eval(f"'{data[key]}' {operator} '{value}'"):
310 |                         queried_result.append(data)
311 |             elif operator == "!=":
312 |                 for data in result:
313 |                     if key in data and not eval(f"'{data[key]}' {operator} '{value}'"):
314 |                         continue
315 |                     else:
316 |                         queried_result.append(data)
317 |             elif operator in [">", ">="]:
318 |                 for data in result:
319 |                     if key in data:
320 |                         s = sorted(
321 |                             [data[key], value],
322 |                             key=natural_keys(data[key].__class__.__name__),
323 |                         )
324 |                         if s[0] == value and (operator == ">=" or data[key] != value):
325 |                             queried_result.append(data)
326 |             elif operator in ["<", "<="]:
327 |                 for data in result:
328 |                     if key in data:
329 |                         s = sorted(
330 |                             [data[key], value],
331 |                             key=natural_keys(data[key].__class__.__name__),
332 |                         )
333 |                         if s[1] == value and (operator == "<=" or data[key] != value):
334 |                             queried_result.append(data)
335 |             elif operator in ["is", "is not"]:
336 |                 # in python, the "is" and "is not" operators are allowed to compare with `None`
337 |                 # so, if any other value is given, raise ValueError
338 |                 if value != "None":
339 |                     raise ValueError(
340 |                         "Only 'None' is allowed with `is` or `is not` operators."
341 |                     )
342 |                 for data in result:
343 |                     if (operator == "is" and key not in data) or (
344 |                         operator == "is not" and key in data
345 |                     ):
346 |                         queried_result.append(data)
347 |             elif operator in ["in", "not in"]:
348 |                 value = [unquote(v) for v in re.split("[ ,]", value[1:-1]) if v != ""]
349 |                 for data in result:
350 |                     if key in data and eval(f"'{data[key]}' {operator} {value}"):
351 |                         queried_result.append(data)
352 |             else:
353 |                 raise ValueError(
354 |                     f"Specified operator '{operator}' was blocked for security reasons."
355 |                 )
356 | 
357 |             result = queried_result
358 |         return result
359 | 
360 |     def __conditions_filter(
361 |         self, result: List[dict], conditions: Optional[str] = None
362 |     ) -> List[dict]:
363 |         """
364 |         Filter metadata with conditions.
365 | 
366 |         Parameters
367 |         ----------
368 |         result : list of dict
369 |             search result of metadata
370 |         conditions : str, default None
371 |             value of the condition to search for files
372 | 
373 |         Returns
374 |         -------
375 |         result : list of dict
376 |             metadata filtered with conditions
377 |         """
378 |         conditions = set(conditions.split(","))
379 |         result = [record for record in result if set(record.values()) & conditions]
380 |         return result
381 | 
382 |     def __set_attributes(self, result: List[dict]) -> None:
383 |         """
384 |         Set instance variables.
385 | 
386 |         Parameters
387 |         ----------
388 |         result : list of dict
389 |             search result of metadata
390 | 
391 |         Returns
392 |         -------
393 |         None
394 |         """
395 |         files = []
396 |         paths = []
397 |         items = []
398 |         for res in result:
399 |             attrs = {}
400 |             for k, v in res.items():
401 |                 if k == "FilePath":
402 |                     path = v
403 |                 else:
404 |                     attrs[k] = v
405 |             file = File(path, attrs)
406 |             files.append(file)
407 |             paths.append(path)
408 |             items.append(attrs)
409 | 
410 |         self.files = files  # list of File objects
411 |         self.paths = paths  # list of filepaths
412 |         self.items = items  # list of metadata dicts other than filepath
413 | 
414 |     def __validate_args(self, conditions, query, sort_key):
415 |         if conditions is not None:
416 |             if not isinstance(conditions, str):
417 |                 raise TypeError(
418 |                     f'Argument "conditions" must be str, not {conditions.__class__.__name__}.'
419 |                 )
420 |         if not hasattr(query, "__iter__"):
421 |             raise TypeError(
422 |                 f'Argument "query" must be list, not {query.__class__.__name__}.'
423 |             )
424 |         if sort_key is not None:
425 |             if not isinstance(sort_key, (str, list)):
426 |                 raise TypeError(
427 |                     f'Argument "sort_key" must be str or list, not {sort_key.__class__.__name__}.'
428 |                 )
429 | 
430 |     def __getitem__(self, idx: int) -> File:
431 |         return self.files[idx]
432 | 
433 |     def __len__(self) -> int:
434 |         return len(self.files)
435 | 
436 |     def __repr_formatter(self, string: Optional[str]) -> Optional[str]:
437 |         return "'" + string + "'" if string is not None else None
438 | 
439 |     def __reprtext_generator(self, conditions, query) -> str:
440 |         project_name = self.__repr_formatter(self.project_name)
441 |         conditions = self.__repr_formatter(conditions)
442 |         query = query
443 |         sort_key = self.__repr_formatter(self.sort_key) if isinstance(self.sort_key, str) else self.sort_key
444 |         reprtext = f"{self.__class__.__name__}(project_name={project_name}, conditions={conditions}, query={query}, sort_key={sort_key}, file_num={len(self.files)})\n"
445 |         return reprtext
446 | 
447 |     def __repr__(self) -> str:
448 |         # if this instance is the result of an operation,
449 |         if self.reprtext.count(self.__class__.__name__) >= 2:
450 |             repr_header = "======Files======\n"
451 |             expres_header = "===Expressions===\n"
452 |             # number each Files instance
453 |             # 'Files(project_name=,...)' -> '{}(project_name=,...)' to use str.format()
454 |             self.reprtext = re.sub(f"{self.__class__.__name__}", "{}", self.reprtext)
455 |             self.expression = re.sub(
456 |                 f"{self.__class__.__name__}", "{}", self.expression
457 |             )
458 |             # '{}(project_name=,...)' -> 'Files1(project_name=,...)'
459 |             self.reprtext = self.reprtext.format(
460 |                 *[
461 |                     f"{self.__class__.__name__}{i+1}"
462 |                     for i in range(self.reprtext.count("{}"))
463 |                 ]
464 |             )
465 |             self.expression = self.expression.format(
466 |                 *[
467 |                     f"{self.__class__.__name__}{i+1}"
468 |                     for i in range(self.expression.count("{}"))
469 |                 ]
470 |             )
471 |             return repr_header + self.reprtext + expres_header + self.expression
472 |         else:
473 |             return self.reprtext
474 | 
475 |     def __add__(self, other: "Files") -> "Files":
476 |         if isinstance(other, self.__class__):
477 |             files = copy.copy(self)
478 |             files.result = self.result + other.result
479 |             files.__set_attributes(files.result)
480 |             files.reprtext = files.reprtext + other.reprtext
481 |             files.expression += " + " + other.expression
482 |             return files
483 |         else:
484 |             raise TypeError(
485 |                 f"unsupported operand type(s) for +: '{self.__class__.__name__}' and '{other.__class__.__name__}'."
486 |             )
487 | 
488 |     def __or__(self, other: "Files") -> "Files":
489 |         if isinstance(other, self.__class__):
490 |             files_list = list(
491 |                 map(
492 |                     lambda x: json.dumps(sorted(x.items())),
493 |                     [*(self.result), *(other.result)],
494 |                 )
495 |             )
496 |             uniq_result = sorted(set(files_list), key=files_list.index)
497 | 
498 |             files = copy.copy(self)
499 |             files.result = [dict(json.loads(result)) for result in uniq_result]
500 |             files.__set_attributes(files.result)
501 | 
502 |             files.reprtext = files.reprtext + other.reprtext
503 |             files_expression_count = files.expression.count(files.__class__.__name__)
504 |             other_expression_count = other.expression.count(other.__class__.__name__)
505 |             if files_expression_count >= 2 and other_expression_count >= 2:
506 |                 files.expression = f"({files.expression}) or ({other.expression})"
507 |             elif files_expression_count == 1 and other_expression_count >= 2:
508 |                 files.expression = f"{files.expression} or ({other.expression})"
509 |             elif files_expression_count >= 2 and other_expression_count == 1:
510 |                 files.expression = f"({files.expression}) or {other.expression}"
511 |             elif files_expression_count == 1 and other_expression_count == 1:
512 |                 files.expression = f"{files.expression} or {other.expression}"
513 |             return files
514 |         else:
515 |             raise TypeError(
516 |                 f"unsupported operand type(s) for |: '{self.__class__.__name__}' and '{other.__class__.__name__}'."
517 |             )
518 | 
519 | 
520 | if __name__ == "__main__":
521 |     pass
522 | 
--------------------------------------------------------------------------------
/base/cli.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | # Copyright 2022 Adansons Inc.
4 | # Please contact engineer@adansons.co.jp
5 | import os
6 | import sys
7 | import time
8 | import glob
9 | import json
10 | 
11 | import click
12 | from datetime import datetime
13 | 
14 | from base import VERSION
15 | from base.project import (
16 |     Project,
17 |     create_project,
18 |     get_projects,
19 |     archive_project,
20 |     delete_project,
21 |     summarize_keys_information,
22 | )
23 | from base.config import (
24 |     get_user_id,
25 |     get_access_key,
26 |     register_access_key,
27 |     register_user_id,
28 |     update_project_info,
29 |     get_user_id_from_db,
30 |     check_project_available,
31 | )
32 | from .exception import CatchAllExceptions, search_export_exception
33 | 
34 | 
35 | def base_config(func):
36 |     def wrapper(*args, **kwargs):
37 |         # try to get the user_id
38 |         try:
39 |             access_key = get_access_key()
40 |             user_id = get_user_id()
41 |             update_project_info(user_id)
42 |         except Exception:
43 |             click.echo(
44 |                 "Welcome to Adansons Base!!\n\nLet's start with your access key provided on our slack.\n(if you don't have an access key, please press ENTER.)\n"
45 |             )
46 |             while True:
47 |                 try:
48 |                     access_key = click.prompt(
49 |                         "Please register your access key", type=str, default="none"
50 |                     )
51 |                     if access_key == "none":
52 |                         click.echo(
53 |                             "\nGet invitation from here!\n-> https://share.hsforms.com/16OxTF7eJRPK92oGCny7nGw8moen\n"
54 |                         )
55 |                         sys.exit()
56 |                 except click.exceptions.Abort:
57 |                     click.echo("\nAborted!")
58 |                     sys.exit()
59 | 
60 |                 try:
61 |                     register_access_key(access_key)
62 |                     user_id = get_user_id_from_db(access_key)
63 |                     register_user_id(user_id)
64 |                     update_project_info(user_id)
65 |                 except click.exceptions.Abort:
66 |                     click.echo("\nAborted!")
67 |                     sys.exit()
68 |                 except Exception:
69 |                     click.echo(
70 |                         "\nIncorrect access key was specified, please re-configure or ask the support team.\n"
71 |                     )
72 |                 else:
click.echo(f"\nSuccessfully configured as {user_id}\n") 74 | time.sleep(3) 75 | kwargs["user_id"] = user_id 76 | break 77 | else: 78 | kwargs["user_id"] = user_id 79 | 80 | func(*args, **kwargs) 81 | 82 | return wrapper 83 | 84 | 85 | @click.version_option(VERSION) 86 | @click.group() 87 | def main(): 88 | """Adansons Database Command Line Interface""" 89 | pass 90 | 91 | 92 | @main.command(name="new", help="create new project") 93 | @click.argument("project") 94 | @base_config 95 | def create_table(project, user_id): 96 | """ 97 | Create a new project table command 98 | Usage 99 | ----- 100 | $ base new sample-project 101 | Arguments 102 | --------- 103 | project: str 104 | new project name 105 | parameters 106 | ---------- 107 | user_id : str 108 | registerd user id 109 | Returns 110 | ------- 111 | project_uid : str 112 | project unique hash 113 | """ 114 | try: 115 | project_uid = create_project(user_id, project) 116 | check_project_available(user_id, project_uid) 117 | except Exception as e: 118 | click.echo(e) 119 | else: 120 | click.echo( 121 | f"Your Project UID\n----------------\n{project_uid}\n\nsave Project UID in local file (~/.base/projects)" 122 | ) 123 | return project_uid 124 | 125 | 126 | @main.command(name="list", help="show project list") 127 | @click.option("--archived", is_flag=True) 128 | @base_config 129 | def list_project(archived, user_id): 130 | """ 131 | Show project list command 132 | Usage 133 | ----- 134 | $ base list 135 | Parameters 136 | ---------- 137 | user_id : str 138 | registerd user id 139 | archived : bool 140 | if you want show archived projects 141 | """ 142 | 143 | try: 144 | project_list = get_projects(user_id, archived=archived) 145 | except Exception as e: 146 | click.echo(e) 147 | else: 148 | click.echo("projects\n========") 149 | for project in project_list: 150 | private = "yes" if project["PrivateProject"] == "1" else "no" 151 | created_date = datetime.fromtimestamp( 152 | float(project["CreatedTime"]) 153 | ).strftime("%Y-%m-%d %H:%M:%S") 154 | click.echo( 155 | f"[{project['ProjectName']}]\nProject UID: {project['ProjectUid']}\nRole: {project['UserRole']}\nPrivate Project: {private}\nCreated Date: {created_date}" 156 | ) 157 | 158 | 159 | @main.command(name="rm", help="remove project") 160 | @click.argument("project") 161 | @click.option("--confirm", is_flag=True) 162 | @click.option( 163 | "-m", 164 | "--member", 165 | type=str, 166 | help="member id you want to remove from project", 167 | required=False, 168 | default=None, 169 | multiple=True, 170 | ) 171 | @base_config 172 | def remove_project(confirm, project, user_id, member): 173 | """ 174 | Delete a project command 175 | Usage 176 | ----- 177 | $ base rm sample-project 178 | Arguments 179 | --------- 180 | project : str 181 | project name wich you want to delete 182 | Parameters 183 | ---------- 184 | user_id : str 185 | registerd user id 186 | Options 187 | ------- 188 | confirm : bool 189 | if you want delete archived projects 190 | member : list 191 | if you want remove project member from project 192 | """ 193 | if not member: 194 | if confirm: 195 | try: 196 | delete_project(user_id, project) 197 | except Exception as e: 198 | click.echo(e) 199 | 200 | else: 201 | click.echo(f"{project} was Deleted") 202 | else: 203 | 204 | try: 205 | archive_project(user_id, project) 206 | except Exception as e: 207 | click.echo(e) 208 | else: 209 | click.echo(f"{project} was Archived") 210 | else: 211 | 212 | pjt = Project(project) 213 | try: 214 | pjt.remove_member(member) 215 | except 
Exception as e: 216 | click.echo(e) 217 | else: 218 | click.echo(f"{','.join(member)} was removed from {project}") 219 | 220 | 221 | @main.command(name="show", help="show project detail") 222 | @click.argument("project") 223 | @click.option("--member-list", is_flag=True) 224 | @base_config 225 | def show_project_detail(project, user_id, member_list): 226 | """ 227 | 228 | Show project detail command 229 | Usage 230 | ----- 231 | $ base show sample-project 232 | Arguments 233 | --------- 234 | project : str 235 | project name wich you are interested in 236 | Parameters 237 | ---------- 238 | user_id : str 239 | registerd user id 240 | Optinons 241 | -------- 242 | member_list : bool 243 | if you want see about project members 244 | """ 245 | pjt = Project(project) 246 | if not member_list: 247 | 248 | try: 249 | key_list = pjt.get_metadata_summary() 250 | summary_for_print = summarize_keys_information(key_list) 251 | except Exception as e: 252 | click.echo(e) 253 | else: 254 | click.echo( 255 | f"project {project}\n===============\nYou have {summary_for_print['MaxRecordedCount']} records with {summary_for_print['UniqueKeyCount']} keys in this project.\n\n[Keys Information]\n" 256 | ) 257 | 258 | # first element is ('KEY NAME', 'VALUE RANGE', 'VALUE TYPE', 'RECORDED COUNT') 259 | max_len_list = [ 260 | summary_for_print["MaxCharCount"][column] 261 | for column in summary_for_print["Keys"][0] 262 | ] 263 | for row in summary_for_print["Keys"]: 264 | click.echo( 265 | " ".join( 266 | [ 267 | content + " " * (length - len(content)) 268 | for content, length in zip(row, max_len_list) 269 | ] 270 | ) 271 | ) 272 | else: 273 | try: 274 | member_list = pjt.get_members() 275 | except Exception as e: 276 | click.echo(e) 277 | else: 278 | click.echo("project Members\n===============") 279 | for column in member_list: 280 | created_date = datetime.fromtimestamp( 281 | float(column["CreatedTime"]) 282 | ).strftime("%Y-%m-%d %H:%M:%S") 283 | click.echo( 284 | f'{column["UserID"]} ({column["UserRole"]}, invited at {created_date})' 285 | ) 286 | 287 | 288 | @main.command(name="import", help="import dataset into project") 289 | @click.argument("project") 290 | @click.option( 291 | "-m", 292 | "--external-file", 293 | help="flag for external meta-data file", 294 | is_flag=True, 295 | default=False, 296 | ) 297 | @click.option( 298 | "-p", 299 | "--path", 300 | help="path for external meta-data file", 301 | required=False, 302 | default=None, 303 | multiple=True, 304 | ) 305 | @click.option( 306 | "-d", 307 | "--directory", 308 | type=str, 309 | help="target directory path", 310 | required=False, 311 | default=None, 312 | ) 313 | @click.option( 314 | "-e", 315 | "--extension", 316 | type=str, 317 | help="target file extensions", 318 | required=False, 319 | default=None, 320 | ) 321 | @click.option( 322 | "-c", "--parse", type=str, help="path parsing rule", required=False, default=None 323 | ) 324 | @click.option( 325 | "-a", 326 | "--additional", 327 | type=str, 328 | help="additional key and value", 329 | required=False, 330 | default=None, 331 | multiple=True, 332 | ) 333 | @click.option( 334 | "--extract", 335 | type=str, 336 | help="flag for extract external file", 337 | is_flag=True, 338 | default=False, 339 | ) 340 | @click.option( 341 | "--estimate-rule", 342 | type=str, 343 | help="flag for estimate join rule", 344 | is_flag=True, 345 | default=False, 346 | ) 347 | @click.option( 348 | "--join-rule", 349 | type=str, 350 | help="file path for defining the join rule", 351 | required=False, 352 | 
default=None,
353 | )
354 | @click.option("--export", type=str, help="export file type", required=False)
355 | @click.option("--output", type=str, help="output file path", required=False)
356 | @click.option("--auto-approve", is_flag=True)
357 | @base_config
358 | def import_data(
359 |     project,
360 |     external_file,
361 |     path,
362 |     directory,
363 |     extension,
364 |     parse,
365 |     additional,
366 |     auto_approve,
367 |     extract,
368 |     estimate_rule,
369 |     join_rule,
370 |     export,
371 |     output,
372 |     user_id,
373 | ):
374 |     """
375 |     Import data file command
376 |     Usage
377 |     -----
378 |     $ base import sample-project -d ../dataset -e wav -c {timestamp}/{UID}-{condition}-{iteration}.wav
379 |     If you want to import meta-data from an external file:
380 |     $ base import sample-project --external-file your/path/to_data
381 |     Arguments
382 |     ---------
383 |     project : str
384 |         project name which you are interested in
385 |     Parameters
386 |     ----------
387 |     user_id : str
388 |         registered user id
389 |     directory : str, default=None
390 |     extension : str, default=None
391 |     parse : str, default=None
392 |     additional : tuple of str, default=None
393 |     auto_approve : bool, default=False
394 |         approve estimated table joining rule
395 |     """
396 |     if additional is None:
397 |         additional = {}
398 |     else:
399 |         try:
400 |             additional = {
401 |                 element.split(":")[0]: element.split(":")[1] for element in additional
402 |             }
403 |         except Exception:
404 |             click.echo(
405 |                 "Found an invalid argument in -a. The argument must be: -a key:value"
406 |             )
407 |         else:
408 |             import_metafile(
409 |                 project,
410 |                 path,
411 |                 additional,
412 |                 auto_approve,
413 |                 extract,
414 |                 estimate_rule,
415 |                 join_rule,
416 |                 export,
417 |                 output,
418 |             ) if external_file else import_dataset(
419 |                 project, directory, extension, parse, additional
420 |             )
421 | 
422 | 
423 | def import_dataset(project, directory, extension, parse, additional):
424 |     pjt = Project(project)
425 |     if directory is None:
426 |         directory = click.prompt(
427 |             "Where is your dataset? (select root of dataset directory)", type=str
428 |         )
429 |     if extension is None:
430 |         extension = []
431 |         extension = click.prompt(
432 |             "What is your data file extension? (ex: csv, jpg, png, wav)", type=str
433 |         )
434 |     if extension[0] == ".":
435 |         extension = extension[1:]
436 | 
437 |     click.echo("Check datafiles...")
438 |     files = glob.glob(os.path.join(directory, "**", f"*.{extension}"), recursive=True)
439 |     click.echo(f"found {len(files)} files with {extension} extension.")
440 |     assert (
441 |         len(files) > 0
442 |     ), "No datafiles found. Please check your directory and extension."
443 | 
444 |     if parse is None:
445 |         sample_file_path = files[0].split(directory)[-1]
446 |         if sample_file_path[0] == os.sep:
447 |             sample_file_path = sample_file_path[1:]
448 |         click.echo(
449 |             f"\nTell me the parsing rule to get meta data from the file path with '{extension}'.\n\
450 | * you can use {{key-name}} to parse phrases with key.\n\
451 | * you can use {{_}} to ignore some phrases.\n\
452 | * you have to use '/' as separator.\n\
453 | ** sample parsing rule: {{_}}/{{name}}/{{timestamp}}/{{sensor}}-{{condition}}_{{iteration}}.csv\n\
454 | path to your file: {sample_file_path}"
455 |         )
456 |         parse = click.prompt("Parsing rule", type=str)
457 | 
458 |     try:
459 |         pjt.add_datafiles(
460 |             directory,
461 |             extension,
462 |             attributes=additional,
463 |             parsing_rule=parse,
464 |             detail_parsing_rule=None,
465 |         )
466 |     except ValueError as e:
467 |         click.echo(e)
468 |         click.echo(
469 |             f"\nCan't parse uniquely with parsing rule: {parse}\n\
470 | Please tell me the detail parsing rule in accordance with the actual path.\n\
471 | * use {{value}} to parse phrases with value in the actual path\n\
472 | * put {{}} before/after the value corresponding to {{_}} on the original parsing rule.\n\
473 | ** original parsing rule: {{_}}/{{name}}/{{timestamp}}/{{sensor}}-{{condition}}_{{iteration}}.csv\n\
474 | ** example path: Origin/suzuki/2020-04-07/A200-C_50.csv\n\
475 | ** sample detail parsing rule: {{Origin}}/{{suzuki}}/{{2020-04-07}}/{{A200}}-{{C}}_{{50}}.csv\n\
476 | path to your file: {files[0].split(directory)[-1]}"
477 |         )
478 |         detail_parse = click.prompt("Detail parsing rule", type=str)
479 | 
480 |         try:
481 |             pjt.add_datafiles(
482 |                 directory,
483 |                 extension,
484 |                 attributes=additional,
485 |                 parsing_rule=parse,
486 |                 detail_parsing_rule=detail_parse,
487 |             )
488 |         except Exception as e:
489 |             click.echo(e)
490 |         else:
491 |             click.echo("Success!")
492 |     except Exception as e:
493 |         click.echo(e)
494 |     else:
495 |         click.echo("Success!")
496 | 
497 | 
498 | def import_metafile(
499 |     project,
500 |     path,
501 |     additional,
502 |     auto_approve,
503 |     extract,
504 |     estimate_rule,
505 |     join_rule,
506 |     export,
507 |     output,
508 | ):
509 |     pjt = Project(project)
510 |     if (path == ()) and (join_rule is None):
511 |         path = click.prompt(
512 |             "Where is your meta-data file? (select a path for an external meta-data file)",
513 |             type=str,
514 |         )
515 |     try:
516 |         if extract:
517 |             for pth in path:
518 |                 result = pjt.extract_metafile(
519 |                     file_path=pth, attributes=additional, verbose=2
520 |                 )
521 |                 if export is not None:
522 |                     if export.lower() == "csv":
523 |                         for i, res in enumerate(result, 1):
524 |                             result_keys = list(res[0].keys())
525 |                             for r in res:
526 |                                 extra_keys = list(set(r.keys()) - set(result_keys))
527 |                                 result_keys += extra_keys
528 | 
529 |                             output_csv = ",".join(result_keys)
530 |                             for r in res:
531 |                                 result_values = [str(r[k]) for k in result_keys]
532 |                                 output_csv += "\n" + ",".join(result_values)
533 | 
534 |                             output_path = os.path.join(
535 |                                 ".",
536 |                                 f"{os.path.basename(pth.split('.')[0])}_Table{i}.csv",
537 |                             )
538 |                             if output is not None:
539 |                                 output_path = output
540 |                                 os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
541 | 
542 |                             file_count = 1
543 |                             while True:
544 |                                 if os.path.exists(output_path):
545 |                                     basename, ext = os.path.splitext(output_path)
546 |                                     output_path = f"{basename} ({file_count}){ext}"
547 |                                     file_count += 1
548 |                                 else:
549 |                                     break
550 |                             with open(output_path, "w", encoding="utf-8") as f:
551 |                                 f.write(output_csv)
552 |                     else:
553 |                         click.echo(
554 |                             f"Sorry, export file type: {export} is not supported yet..."
555 |                         )
556 |         elif estimate_rule:
557 |             for pth in path:
558 |                 pjt.estimate_join_rule(file_path=pth, verbose=2)
559 |         else:
560 |             pjt.add_metafile(
561 |                 file_path=path,
562 |                 attributes=additional,
563 |                 auto=auto_approve,
564 |                 join_rule_path=join_rule,
565 |                 verbose=1,
566 |             )
567 |     except Exception as e:
568 |         click.echo(e)
569 |     else:
570 |         click.echo("Success!")
571 | 
572 | 
573 | @main.command(
574 |     name="search",
575 |     help="search files",
576 |     cls=CatchAllExceptions(click.Command, handler=search_export_exception),
577 | )
578 | @click.argument("project")
579 | @click.option(
580 |     "-q",
581 |     "--query",
582 |     type=str,
583 |     help="query key value pair and operator. you have to specify like 'key >= value'",
584 |     required=False,
585 |     multiple=True,
586 | )
587 | @click.option(
588 |     "-c",
589 |     "--conditions",
590 |     type=str,
591 |     help="query value. you have to specify as 'value1,value2,...'",
592 |     required=False,
593 | )
594 | @click.option("-e", "--export", type=str, help="export file type", required=False)
595 | @click.option("-o", "--output", type=str, help="output file path", required=False)
596 | @click.option("-s", "--summary", is_flag=True)
597 | @base_config
598 | def search_files(
599 |     project,
600 |     query,
601 |     conditions,
602 |     export,
603 |     output,
604 |     user_id,
605 |     summary,
606 | ):
607 |     """
608 |     Query database
609 |     Usage
610 |     -----
611 |     $ base search sample-project -q "key >= xxxxx" -c yyy,zzz
612 |     Arguments
613 |     ---------
614 |     project : str
615 |         project name which you are interested in
616 |     Parameters
617 |     ----------
618 |     user_id : str
619 |         registered user id
620 |     query : str
621 |     conditions : str
622 |     Options
623 |     -------
624 |     summary : bool
625 |         whether to hide the detailed output
626 |     """
627 |     pjt = Project(project)
628 |     try:
629 |         if conditions is not None:
630 |             result = pjt.files(conditions=conditions, query=query).result
631 |         else:
632 |             result = pjt.files(query=query).result
633 |     except Exception as e:
634 |         click.echo(e)
635 |     else:
636 |         click.echo(f"{len(result)} files")
637 |         if not summary:
638 |             click.echo("========")
639 |             for r in result:
640 |                 click.echo(r)
641 |         if export is not None:
642 |             if export.lower() == "json":
643 |                 output_json = json.dumps({"Data": result}, indent=4, ensure_ascii=False)
644 | 
645 |                 output_path = os.path.join(".", "dataset.json")
646 |                 if output is not None:
647 |                     output_path = output
648 |                     os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
649 | 
650 |                 file_count = 1
651 |                 while True:
652 |                     if os.path.exists(output_path):
653 |                         basename, ext = os.path.splitext(output_path)
654 |                         output_path = f"{basename} ({file_count}){ext}"
655 |                         file_count += 1
656 |                     else:
657 |                         break
658 |                 with open(output_path, "w", encoding="utf-8") as f:
659 |                     f.write(output_json)
660 |             elif export.lower() == "csv":
661 |                 result_keys = []
662 |                 for r in result:
663 |                     extra_keys = list(set(r.keys()) - set(result_keys))
664 |                     result_keys += extra_keys
665 | 
666 |                 output_csv = ",".join(result_keys)
667 |                 for r in result:
668 |                     result_values = [str(r[k]) for k in result_keys]
669 |                     output_csv += "\n" + ",".join(result_values)
670 | 
671 |                 output_path = os.path.join(".", "dataset.csv")
672 |                 if output is not None:
673 |                     output_path = output
674 |                     os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
675 | 
676 |                 file_count = 1
677 |                 while True:
678 |                     if os.path.exists(output_path):
679 |                         basename, ext = os.path.splitext(output_path)
680 |                         output_path = f"{basename} ({file_count}){ext}"
681 |                         file_count += 1
682 |                     else:
683 |                         break
684 |                 with open(output_path, "w", encoding="utf-8") as f:
685 |                     f.write(output_csv)
686 |             else:
687 |                 click.echo(f"Sorry, export file type: {export} is not supported yet...")
688 |         elif export is None and output is not None:
689 |             click.echo("\nPlease specify an export file type. (e.g. --export json)")
690 | 
691 | 
692 | @main.command(name="invite", help="invite project member")
693 | @click.argument("project")
694 | @click.option(
695 |     "-m", "--member", type=str, help="member id you want to invite", required=True
696 | )
697 | @click.option(
698 |     "-p",
699 |     "--permission",
700 |     type=str,
701 |     help="permission level, select from 'Viewer', 'Editor', 'Admin', 'Owner'",
702 |     required=True,
703 | )
704 | @click.option("-u", "--update", is_flag=True)
705 | @base_config
706 | def invite_member(project, member, permission, update, user_id):
707 |     """
708 |     Invite project member
709 |     Usage
710 |     -----
711 |     $ base invite sample-project -m MEMBER -p Editor
712 |     Arguments
713 |     ---------
714 |     project : str
715 |         project name which you want to invite to
716 |     Parameters
717 |     ----------
718 |     user_id : str
719 |         registered user id
720 |     member : str
721 |         user id whom you want to invite
722 |     permission : str
723 |         permission level you want to give the member
724 |     Options
725 |     -------
726 |     update : bool
727 |         whether to update the permission of an existing project member
728 |     """
729 |     pjt = Project(project)
730 |     if not update:
731 |         try:
732 |             pjt.add_member(member, permission)
733 |         except Exception as e:
734 |             click.echo(e)
735 |         else:
736 |             click.echo(f"Successfully invited {member} into {project} as {permission}")
737 |     else:
738 |         try:
739 |             pjt.update_member(member, permission)
740 |         except Exception as e:
741 |             click.echo(e)
742 |         else:
743 |             click.echo(f"Successfully updated {member}'s permission to {permission}")
744 | 
745 | 
746 | @main.command(name="link", help="link local datafiles to project")
747 | @click.argument("project")
748 | @click.option(
749 |     "-d",
750 |     "--directory",
751 |     type=str,
752 |     help="target directory path",
753 |     required=False,
754 |     default=None,
755 | )
756 | @click.option(
757 |     "-e",
758 |     "--extension",
759 |     type=str,
760 |     help="target file extensions",
761 |     required=False,
762 |     default=None,
763 | )
764 | @base_config
765 | def data_link(project, directory, extension, user_id):
766 |     """
767 |     Create linker metadata for local datafiles.
768 |     Usage
769 |     -----
770 |     $ base link sample-project -d ../dataset -e wav
771 |     Arguments
772 |     ---------
773 |     project : str
774 |         project name which you are interested in
775 |     Parameters
776 |     ----------
777 |     user_id : str
778 |         registered user id
779 |     directory : str, default=None
780 |     extension : str, default=None
781 |     """
782 |     pjt = Project(project)
783 |     if directory is None:
784 |         directory = click.prompt(
785 |             "Where is your dataset? (select root of dataset directory)", type=str
786 |         )
787 |     if extension is None:
788 |         extension = []
789 |         extension = click.prompt(
790 |             "What is your data file extension? (ex: csv, jpg, png, wav)", type=str
791 |         )
792 |     if extension[0] == ".":
793 |         extension = extension[1:]
794 | 
795 |     try:
796 |         file_num = pjt.link_datafiles(directory, extension)
797 |     except Exception as e:
798 |         click.echo(e)
799 |     else:
800 |         click.echo("Check datafiles...")
801 |         click.echo(f"found {file_num} files with {extension} extension.")
802 |         click.echo("linked!")
803 | 
804 | 
805 | if __name__ == "__main__":
806 |     main()
807 | 
--------------------------------------------------------------------------------
/docs/CLI.md:
--------------------------------------------------------------------------------
1 | # Command Reference
2 | 
3 | Here we provide the specifications, complete descriptions, and comprehensive usage examples for `base` commands. For a list of commands, type `base --help`.
4 | 
5 | - [import](#import)
6 | - [invite](#invite)
7 | - [link](#link)
8 | - [list](#list)
9 | - [new](#new)
10 | - [rm](#rm)
11 | - [search](#search)
12 | - [show](#show)
13 | 
14 | ## import
15 | 
16 | ---
17 | 
18 | Import data files or external meta data files into a Base project.
19 | 
20 | **Synopsis**
21 | 
22 | ---
23 | 
24 | ```
25 | usage: base import project [-d <datafiles-dirpath>] [-e <datafile-extension>] [-c <path-parsing-rule>] [-m] [-p <external-filepath>] [-a <additional>]
26 | 
27 | positional arguments:
28 |   project  your project name to import.
29 | ```
30 | 
31 | **Description**
32 | 
33 | ---
34 | 
35 | This command provides a way to import meta data that is related to data file paths or defined in external files such as `.xlsx` or `.csv`.
36 | 
37 | You have to select the import mode: data files or external files.
38 | 
39 | If you want to import data files, you have to specify the `-d`, `-c` and `-e` options (or the prompt will ask you interactively).
40 | 
41 | Base will then take the actions below.
42 | 
43 | 1. Calculate the file hash.
44 | 2. Parse the file path with the `parsing-rule`.
45 | 3. Create meta data records with the file hash and parsed path data.
46 | 4. Add those records into the project database table.
47 | 
48 | ```
49 | {
50 |     "FileHash": String,
51 |     "MetaKey1": ...,
52 |     ...
53 | }
54 | ```
55 | 
56 | If you want to import external files, you have to specify the `-m` and `-p` options.
57 | 
58 | Base will then take the actions below.
59 | 
60 | 1. Extract tables from the external file.
61 | 2. Parse each table and detect headers in the table.
62 | 3. Set each header as a Key and create meta data records.
63 | 4. Link and update existing records with the new meta data records in the project database table.
64 | 
65 | ```
66 | {
67 |     "Table0,MetaKey1": ...,
68 |     ...
69 | }
70 | ```
71 | 
72 | **Options**
73 | 
74 | ---
75 | 
76 | - `-d <datafiles-dirpath>`, `--directory <datafiles-dirpath>` - specify a `datafiles-dirpath` to load data files which have the extension specified with the `-e` option. Base will search recursively.
77 | - `-e <datafile-extension>`, `--extension <datafile-extension>` - specify a `datafile-extension` to filter the targets when loading data files. if you have several extensions in one dataset (such as png and jpg), you have to split the loading workflow.
78 | - `-c <path-parsing-rule>`, `--parse <path-parsing-rule>` - specify a `path-parsing-rule` to extract meta data from each data file path.
79 | 
80 | ```
81 | - you can use {key-name} to parse phrases with key.
82 | - you can use {_} to ignore some phrases.
83 | - you have to use '/' as separator.
84 | 
85 | >>> sample parsing rule: {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
86 | ```
87 | The following options are used only when importing external files.
88 | - `-m`, `--external-file` - parse the content of the external files specified with the `-p` option.
89 | - `-p <external-filepath>`, `--path <external-filepath>` - specify an `external-filepath` to import external files. Base will parse the content of that file, extract the table data in it, and parse the tables.
90 | - `-a <additional>`, `--additional <additional>` - specify additional meta data you want to add to the whole file you import. the value must include a colon (":") between the `key name` and the `value string`. for instance, if you want to import and join an external file for only "test" data type files, you should specify it like `-a dataType:test`.
91 | - `--extract` - with this option, only extract the content of the external file; this does not link and update with existing tables. you can specify an output path with the `--export` and `--output` options to get the extract results.
92 | - `--export <export-file-type>` - if you want to convert the extract results into CSV, you can specify CSV as the `export-file-type`.
93 | - `--output <output-filepath>` - specify an `output-filepath` to save the dataset file. default is "./{external-filepath}_Table{number}.csv"
94 | - `--estimate-rule` - with this option, only estimate the joining rule from existing tables and the external files specified with the `-p` option; this does not link and update with existing tables.
95 | 
96 | **Example: Import png files on project "mnist"**
97 | 
98 | ---
99 | 
100 | ```
101 | # after you download mnist data files based on Tutorial1
102 | $ base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"
103 | ```
104 | 
105 | 
Output 106 | 107 | ``` 108 | Check datafiles... 109 | found 70000 files with png extension. 110 | 70000/70000 files uploaded. 111 | Success! 112 | ``` 113 |
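The same import can also be driven from Python via the SDK. A minimal sketch, assuming `Project.add_datafiles` accepts the directory, extension, and parsing rule exactly as `base/cli.py` passes them (the keyword names below mirror that call):

```python
import os.path

from base.project import Project

project = Project("mnist")
# SDK equivalent (assumed) of:
#   base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"
project.add_datafiles(
    os.path.expanduser("~/dataset/mnist"),       # root of the dataset directory (-d)
    "png",                                       # target file extension (-e)
    attributes={},                               # additional key:value pairs (-a)
    parsing_rule="{dataType}/{label}/{id}.png",  # path parsing rule (-c)
    detail_parsing_rule=None,
)
```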
114 | 115 | **Example: Import external csv file on project “mnist”** 116 | 117 | --- 118 | 119 | ``` 120 | # download external csv 121 | $ curl -SL https://raw.githubusercontent.com/youkaichao/mnist-wrong-test/master/wrongImagesInMNISTTestset.csv > ~/Downloads/wrongImagesInMNISTTestset.csv 122 | 123 | $ base import mnist --external-file --path ~/Downloads/wrongImagesInMNISTTestset.csv -a dataType:test 124 | ``` 125 | 126 |
Output 127 | 128 | ``` 129 | 1 tables found! 130 | now estimating the rule for table joining... 131 | 132 | 1 table joining rule was estimated! 133 | Below table joining rule will be applied... 134 | 135 | Rule no.1 136 | 137 | key 'index' -> connected to 'id' key on exist table 138 | key 'originalLabel' -> connected to 'label' key on exist table 139 | key 'correction' -> newly added 140 | 141 | 1 tables will be applied 142 | Table 1 sample record: 143 | {'index': 8, 'originalLabel': 5, 'correction': '-1'} 144 | 145 | Do you want to perform table join? 146 | Base will join tables with that rule described above. 147 | 148 | 'y' will be accepted to approve. 149 | If you need to modify it, please enter 'm' 150 | Definition YML file with estimated table join rules will be downloaded, then you can modify it and apply the new join rule. 151 | Enter a value: y 152 | Success! 153 | ``` 154 | If you enter 'm', definition YAML file with estimated table join rules will be downloaded. 155 | You can modify this file and execute the commands displayed in the terminal to apply the new join rule. 156 | 157 | ``` 158 | Do you want to perform table join? 159 | Base will join tables with that rule described above. 160 | 161 | 'y' will be accepted to approve. 162 | 163 | If you need to modify it, please enter 'm' 164 | Definition YML file with estimated table join rules will be downloaded, then you can modify it and apply the new join rule. 165 | Enter a value: m 166 | 167 | Downloaded a YAML file 'joinrule_definition_mnist.yml' in current directory. 168 | Key information for the new table and the existing table is as follows. 169 | 170 | 171 | ===== New Table1 ===== 172 | KEY NAME VALUE RANGE VALUE TYPE RECORDED COUNT 173 | 'index' 8 ~ 9850 int('index') 74 174 | 'originalLabel' 0 ~ 9 int('originalLabel') 74 175 | 'correction' -1 ~ 8or9 str('correction') 74 176 | 'dataType' test ~ test str('dataType') 74 177 | 178 | ===== Existing Table ===== 179 | KEY NAME VALUE RANGE VALUE TYPE RECORDED COUNT 180 | 'id' 0 ~ 59999 str('id') 70000 181 | 'label' 0 ~ 9 str('label') 70000 182 | 'dataType' test ~ train str('dataType') 70000 183 | 184 | You can apply the new join-rule according to 2 steps. 185 | 1. Modify the file 'joinrule_definition_mnist.yml'. Open the file to see a detailed description. 186 | 2. Execute the following command. 187 | base import mnist --external-file --additional dataType:test --join-rule joinrule_definition_mnist.yml 188 | 189 | Success! 190 | ``` 191 | joinrule_definition_mnist.yml 192 | ```yaml 193 | RequestedTime: 1654257223.4988642 194 | ProjectName: mnist 195 | Body: 196 | Table1: 197 | FilePath: /Users/user/Downloads/wrongImagesInMNISTTestset.csv 198 | JoinRules: 199 | index: id 200 | originalLabel: label 201 | correction: 202 | dataType: dataType 203 | ``` 204 | New join rules can be defined by modifying the `Body/Table/JoinRules` section. 205 | Fundamentally, this section consists of Key-Value Pairs. Key is the key name from the new table extracted from the external file. Value is the key name from the existing table. 206 | 207 | How to define join rules. 208 | if you have same key on the new table and the existing table, write like this. 209 | ```yaml 210 | 'New table key': 'Existing table key' 211 | ``` 212 | 213 | if you have new value on the existing key, write like this. 214 | ```yaml 215 | 'New table key': 'ADD:Existing table key' 216 | ``` 217 | 218 | if you have new key, no need to specify anything. 
219 | ```yaml 220 | 'New table key': 221 | ``` 222 | 223 | For example: 224 | ```yaml 225 | JoinRules: 226 | first_name: name 227 | age: ADD:Age 228 | height: 229 | ``` 230 | 1. "first_name: name" means to join the new key named "first_name" with the existing key named "name". 231 | 2. "age: ADD:Age" means to add new values of the new key named 'age' on the existing key named 'Age'. 232 | 3. "height: " means to add the key named "height" as a new key. 233 | 234 | 235 | 236 |
237 | 
238 | → [Back to top](#command-reference)
239 | 
240 | ## invite
241 | 
242 | Invite collaborators into your Base project.
243 | 
244 | **Synopsis**
245 | 
246 | ---
247 | 
248 | ```
249 | usage: base invite project [-m <member-id>] [-p <permission-level>] [-u]
250 | 
251 | positional arguments:
252 |   project  your project name to invite.
253 | ```
254 | 
255 | **Description**
256 | 
257 | ---
258 | 
259 | This command controls access to your project.
260 | 
261 | You can invite a new project member at one of the `permission level`s below.
262 | 
263 | - `Viewer` : can only read meta data on the project database. a viewer can not import data files or external files and can not control the permission of other members.
264 | - `Editor` : can read and write meta data into the project database. an editor can not control the permission of other members.
265 | - `Admin` : can read and write meta data into the project database. an admin can also control the permission of other members, but can not transfer the `Owner` permission level.
266 | - `Owner` : can transfer the `owner` permission to others, and delete the project completely.
267 | 
268 | You can also update a member's permission level with the `-u` option, if you are an admin or owner.
269 | 
270 | If you are the project owner and try to update another member's permission to `Owner`, that member will become the project owner and your permission will be downgraded to `Admin`.
271 | 
272 | 
273 | **Options**
274 | 
275 | ---
276 | 
277 | - `-m <member-id>`, `--member <member-id>` - specify a `member-id` to invite. if you will be invited by others, you have to tell them your user id.
278 | - `-p <permission-level>`, `--permission <permission-level>` - specify the permission level to grant: 'Viewer', 'Editor', 'Admin' or 'Owner'.
279 | - `-u`, `--update` - update the permission level of an existing project member.
280 | 
281 | 
282 | **Example: Invite a viewer into mnist**
283 | 
284 | ---
285 | 
286 | check your current project members on mnist with the `[base show --member-list](#show)` command
287 | 
288 | ```
289 | $ base show mnist --member-list
290 | ```
291 | 
292 | 
Output 293 | 294 | ``` 295 | project Members 296 | =============== 297 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 298 | ``` 299 |
300 | 
301 | then, invite zzzz@yyyy.com into mnist as a viewer
302 | 
303 | ```
304 | $ base invite mnist --member zzzz@yyyy.com --permission viewer
305 | ```
306 | 
307 | 
Output 308 | 309 | ``` 310 | Successfully invited zzzz@yyyy.com into mnist as Viewer 311 | ``` 312 |
313 | 
314 | finally, you can check the invited user in the project member list.
315 | 
316 | ```
317 | $ base show mnist --member-list
318 | ```
319 | 
320 | 
Output 321 | 322 | ``` 323 | project Members 324 | =============== 325 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 326 | zzzz@yyyy.com (Viewer, invited at 2022-03-12 13:45:04) 327 | ``` 328 |
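If you prefer the SDK, the same invitation can be sent from Python. A minimal sketch, assuming `Project.add_member` takes the member id and permission level just as `base/cli.py` calls it:

```python
from base.project import Project

# SDK equivalent (assumed) of: base invite mnist --member zzzz@yyyy.com --permission viewer
Project("mnist").add_member("zzzz@yyyy.com", "Viewer")
```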
329 | 330 | **Example: Update project member’s permission** 331 | 332 | --- 333 | 334 | check current your project members 335 | 336 | ``` 337 | $ base show mnist --member-list 338 | ``` 339 | 340 |
Output 341 | 342 | ``` 343 | project Members 344 | =============== 345 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 346 | zzzz@yyyy.com (Viewer, invited at 2022-03-12 13:45:04) 347 | ``` 348 |
349 | 350 | then, update permission of zzzz@yyyy.com to editor 351 | 352 | ``` 353 | $ base invite mnist --update --member zzzz@yyyy.com --permission editor 354 | ``` 355 | 356 |
Output

357 | 
358 | ```
359 | Successfully updated zzzz@yyyy.com's permission to Editor
360 | ```
361 | 
362 | 
363 | finally, you can check the updated user permission in the project member list.
364 | 
365 | ```
366 | $ base show mnist --member-list
367 | ```
368 | 
369 | 
Output 370 | 371 | ``` 372 | project Members 373 | =============== 374 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 375 | zzzz@yyyy.com (Editor, invited at 2022-03-12 13:45:04) 376 | ``` 377 |
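Updating a permission works the same way from Python. A minimal sketch, assuming `Project.update_member` mirrors the CLI call in `base/cli.py`:

```python
from base.project import Project

# SDK equivalent (assumed) of: base invite mnist --update --member zzzz@yyyy.com --permission editor
Project("mnist").update_member("zzzz@yyyy.com", "Editor")
```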
378 | 
379 | → [Back to top](#command-reference)
380 | 
381 | ## link
382 | 
383 | Link paths to data files on the local computer with meta data on the Base project.
384 | 
385 | **Synopsis**
386 | 
387 | ---
388 | 
389 | ```
390 | usage: base link project [-d <datafiles-dirpath>] [-e <datafile-extension>]
391 | 
392 | positional arguments:
393 |   project  your invited project name to link data files.
394 | ```
395 | 
396 | **Description**
397 | 
398 | ---
399 | 
400 | This command will link data files and meta data records on the Base project.
401 | 
402 | After being invited to a project, collaborators have to link their data files on the local computer.
403 | 
404 | The data files are often located in a different directory than the project owner's, and sometimes under different directory or file names.
405 | 
406 | So Base will create a linker to match local file paths, and this enables your collaborators to share the python script which loads local files.
407 | 
408 | **Options**
409 | 
410 | ---
411 | 
412 | - `-d <datafiles-dirpath>`, `--directory <datafiles-dirpath>` - specify a `datafiles-dirpath` to load data files which have the extension specified with the `-e` option. Base will search recursively.
413 | - `-e <datafile-extension>`, `--extension <datafile-extension>` - specify a `datafile-extension` to filter the targets when loading data files. if you have several extensions in one dataset (such as png and jpg), you have to split the loading workflow.
414 | 
415 | **Example: Link mnist data files into invited project**
416 | 
417 | ---
418 | 
419 | ```
420 | $ base link mnist --directory ~/Downloads/mnist --extension png
421 | ```
422 | 
423 | then, you can search and export the dataset as you want, or run a python modeling script shared by other collaborators.
424 | 
425 | 
Output 426 | 427 | ``` 428 | Check datafiles... 429 | found 70000 files with png extension. 430 | linked! 431 | ``` 432 |
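Linking can also be done from Python. A minimal sketch, assuming `Project.link_datafiles` returns the number of linked files, as it does when called from `base/cli.py`:

```python
import os.path

from base.project import Project

# SDK equivalent (assumed) of: base link mnist --directory ~/Downloads/mnist --extension png
file_num = Project("mnist").link_datafiles(os.path.expanduser("~/Downloads/mnist"), "png")
print(f"found {file_num} files with png extension.")
```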
433 | 
434 | → [Back to top](#command-reference)
435 | 
436 | ## list
437 | 
438 | Show the list of Base projects you can access.
439 | 
440 | **Synopsis**
441 | 
442 | ---
443 | 
444 | ```
445 | usage: base list [--archived]
446 | ```
447 | 
448 | **Description**
449 | 
450 | ---
451 | 
452 | This command will show you which projects you can access.
453 | 
454 | You can check the `Project UID`, your `Role` on the project ("Viewer", "Editor", "Admin" or "Owner"), whether the project is `private` or not, and the project `created date`.
455 | 
456 | **Options**
457 | 
458 | ---
459 | 
460 | - `--archived` - show archived projects
461 | 
462 | **Example: Check projects (not archived)**
463 | 
464 | ---
465 | 
466 | if you have project "mnist",
467 | 
468 | ```
469 | $ base list
470 | ```
471 | 
472 | 
Output 473 | 474 | ``` 475 | projects 476 | ======== 477 | [mnist] 478 | Project UID: abcdefghij0123456789 479 | Role: Owner 480 | Private Project: yes 481 | Created Date: 2022-03-11 18:18:54 482 | ``` 483 |
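The project list is also reachable from Python. A minimal sketch, assuming `get_projects` returns records with the same fields the CLI prints (`ProjectName`, `ProjectUid`, `UserRole`):

```python
from base.config import get_user_id
from base.project import get_projects

# SDK equivalent (assumed) of: base list
user_id = get_user_id()
for project in get_projects(user_id, archived=False):
    print(project["ProjectName"], project["ProjectUid"], project["UserRole"])
```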
484 | 485 | **Example: Check project (archived)** 486 | 487 | --- 488 | 489 | if you have archived project “fashion-mnist”, 490 | 491 | ``` 492 | $ base list --archived 493 | ``` 494 | 495 |
Output 496 | 497 | ``` 498 | projects 499 | ======== 500 | [fashion-mnist] 501 | Project UID: klmnopqrst0123456789 502 | Role: Owner 503 | Private Project: yes 504 | Created Date: 2022-03-16 01:38:29 505 | ``` 506 |
507 | 
508 | > Note: you can archive your projects with the [`base rm`](#rm) command.
509 | > 
510 | 
511 | → [Back to top](#command-reference)
512 | 
513 | ## new
514 | 
515 | Create a new Base project.
516 | 
517 | **Synopsis**
518 | 
519 | ---
520 | 
521 | ```
522 | usage: base new project
523 | 
524 | positional arguments:
525 |   project  your project name to create.
526 | ```
527 | 
528 | **Description**
529 | 
530 | ---
531 | 
532 | This command will create a database table for meta data.
533 | 
534 | 1. issues a 20-character `project unique id (Project UID)` and creates tables.
535 | 2. saves the Project UID in the `~/.base/projects` file on your local computer.
536 | 3. you can then use the `project name` as an alias for the Project UID with any Base command.
537 | 
538 | **Example**
539 | 
540 | ---
541 | 
542 | ```
543 | $ base new mnist
544 | ```
545 | 
546 | 
Output 547 | 548 | ``` 549 | Your Project UID 550 | ---------------- 551 | abcdefghij0123456789 552 | 553 | save Project UID in local file (~/.base/projects) 554 | ``` 555 |
556 | 
557 | then, the project uids will be saved in `~/.base/projects`.
558 | 
559 | ```
560 | $ cat ~/.base/projects
561 | [xxxx@yyyy.com]
562 | mnist = abcdefghij0123456789
563 | ```
564 | 
565 | > Note: your user id is saved in the Global section
566 | > 
567 | 
568 | → [Back to top](#command-reference)
569 | 
570 | ## rm
571 | 
572 | Archive or completely delete your Base projects.
573 | 
574 | **Synopsis**
575 | 
576 | ---
577 | 
578 | ```
579 | usage: base rm project [--confirm] [-m <member-id>]
580 | 
581 | positional arguments:
582 |   project  your project name to archive or delete.
583 | ```
584 | 
585 | **Description**
586 | 
587 | ---
588 | 
589 | This command provides a way to remove a project member, and to archive or delete your project.
590 | 
591 | If you specify the `-m` option, you can remove a project member from the project.
592 | 
593 | If not, Base will archive or delete the specified project.
594 | 
595 | To prevent unexpected deletion, we suggest archiving projects rather than deleting them.
596 | 
597 | If not deleted, you can restore your archived projects.
598 | 
599 | > Note: the delete project action can be performed only by the project owner.
600 | > 
601 | 
602 | **Options**
603 | 
604 | ---
605 | 
606 | - `--confirm` - delete an archived project completely (only Owner user)
607 | - `-m <member-id>`, `--member <member-id>` - specify a `member-id` to remove from the project. you can see your project member list with the `[base show --member-list](#show)` command.
608 | 
609 | **Example: Remove project member**
610 | 
611 | ---
612 | 
613 | check your project members on mnist with the `[base show --member-list](#show)` command
614 | 
615 | ```
616 | $ base show mnist --member-list
617 | ```
618 | 
619 | 
Output 620 | 621 | ``` 622 | project Members 623 | =============== 624 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 625 | zzzz@yyyy.com (Editor, invited at 2022-03-12 13:45:04) 626 | ``` 627 |
628 | 629 | then, remove zzzz@yyyy.com from mnist 630 | 631 | ``` 632 | $ base rm mnist --member zzzz@yyyy.com 633 | ``` 634 | 635 |
Output 636 | 637 | ``` 638 | zzzz@yyyy.com was removed from mnist 639 | ``` 640 |
641 | 
642 | finally, you can confirm the removed user is no longer in the project member list.
643 | 
644 | ```
645 | $ base show mnist --member-list
646 | ```
647 | 
648 | 
Output 649 | 650 | ``` 651 | project Members 652 | =============== 653 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 654 | ``` 655 |
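Removing a member is also possible from Python. A minimal sketch, assuming `Project.remove_member` accepts an iterable of member ids, as `base/cli.py` passes the tuple collected by the repeatable `-m` option:

```python
from base.project import Project

# SDK equivalent (assumed) of: base rm mnist --member zzzz@yyyy.com
# the CLI passes a tuple because -m can be repeated; a single-element tuple is used here
Project("mnist").remove_member(("zzzz@yyyy.com",))
```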
656 | 657 | **Example: Archive mnist project** 658 | 659 | --- 660 | 661 | ``` 662 | $ base rm mnist 663 | ``` 664 | 665 |
Output 666 | 667 | ``` 668 | mnist was Archived 669 | ``` 670 |
671 | 672 | then, you can check whether the project was archived with `[base list](#list)` command. 673 | 674 | **Example: Delete mnist project** 675 | 676 | --- 677 | 678 | ``` 679 | $ base rm mnist --confirm 680 | ``` 681 | 682 |
Output 683 | 684 | ``` 685 | mnist was Deleted 686 | ``` 687 |
688 | 689 | then, you can check whether the project was deleted with `[base list](#list)` command. 690 | 691 | ``` 692 | $ base list --archived 693 | ``` 694 | 695 |
Output 696 | 697 | ``` 698 | projects 699 | ======== 700 | ``` 701 |
702 | 
703 | > Note: once you delete a project, you can never restore its saved data.
704 | 
705 | 
706 | → [Back to top](#command-reference)
707 | 
708 | ## search
709 | 
710 | Search data files and export them based on the meta data of a Base project.
711 | 
712 | **Synopsis**
713 | 
714 | ---
715 | 
716 | ```
717 | usage: base search project [-q <query-condition>] [-c <value-conditions>]
718 |                            [-e <export-file-type>] [-o <output-filepath>] [-s]
719 | 
720 | positional arguments:
721 |   project  your project name to search.
722 | ```
723 | 
724 | **Description**
725 | 
726 | ---
727 | 
728 | This command provides a search engine for data files.
729 | 
730 | You can search for words in the meta data with the `-c` option, or set filters with the `-q` option.
731 | 
732 | You can also export the results as JSON or CSV with the `-e` and `-o` options.
733 | 
734 | > Note: if you have the same values on different keys, the condition filter can be confused and return a result you did not expect. for reliable filtering, you should specify the key name with the query option if some values are duplicated over 2 or more keys.
735 | 
736 | **Options**
737 | 
738 | ---
739 | 
740 | - `-q <query-condition>`, `--query <query-condition>` - specify a `query-condition` to filter the data files based on meta data. you can use various operators and specify multiple `query-condition`s.
741 | 
742 | ```
743 | [query grammar]
744 | {KeyName} {Operator} {Values}
745 | - add 1 space between each section
746 | - don't use spaces anywhere else
747 | >>> sample query condition: CategoryName == airplane
748 | 
749 | [operators]
750 | - == : equal
751 | - != : not equal
752 | - >= : greater than or equal
753 | - <= : less than or equal
754 | - > : greater than
755 | - < : less than
756 | - is : missing value (only 'None' is allowed as Values, ex. query='correction is None')
757 | - is not : any value (only 'None' is allowed as Values, ex. query='correction is not None')
758 | - in : inside the list of Values
759 | - not in : outside the list of Values
760 | ```
761 | 
762 | > Note: you have to follow the query grammar.
763 | > 
764 | - `-c <value-conditions>`, `--conditions <value-conditions>` - specify `value-conditions` to filter by meta data value. this mode is powerful because you do not have to know the KeyName of the meta data.
765 |     - if you specify multiple values of one meta data key, Base will return the union of the values.
766 |         - ex.) if you specify "airplane,automobile" and both of them are in the same meta data key "CategoryName", Base will interpret it as "CategoryName is airplane or automobile".
767 |     - if you specify multiple values of different meta data keys, Base will return the intersection of the values.
768 |         - ex.) if you specify "airplane,2007" and one of them is in the meta data key "CategoryName" and the other in "Timestamp", Base will interpret it as "CategoryName is airplane and also Timestamp is 2007".
769 |     - and you can combine these behaviors.
770 |         - ex.) if you specify "airplane,automobile,2007" and two of them are in the same meta data key "CategoryName" and one in "Timestamp", Base will interpret it as "(CategoryName is airplane and also Timestamp is 2007) or (CategoryName is automobile and also Timestamp is 2007)".
771 | 
772 | ```
773 | [conditions grammar]
774 | "{Value1},{Value2},..."
775 | - separate with comma
776 | >>> sample conditions: "airplane,automobile"
777 | ```
778 | 
779 | > Note: you have to follow the conditions grammar.
780 | > 
781 | - `-e <export-file-type>`, `--export <export-file-type>` - if you want to convert the search results into JSON or CSV, you can specify JSON or CSV as the `export-file-type`.
782 | - `-o <output-filepath>`, `--output <output-filepath>` - specify an `output-filepath` to save the dataset file.
default is “./dataset.json” or “./dataset.csv” 783 | - `-s`, `--summary` - summarize result and hide detail output 784 | 785 | **Example: Search mnist with value conditions** 786 | 787 | --- 788 | 789 | ``` 790 | $ base search mnist --conditions "train" --query "label in ['1','2','3']" 791 | ``` 792 | 793 |
Output 794 | 795 | ``` 796 | 18831 files 797 | ======== 798 | '/home/xxxx/dataset/mnist/train/1/42485.png' 799 | ... 800 | ``` 801 |
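The same filter can be expressed through the SDK (see the [Python Reference](SDK.md)); a minimal sketch using `Project.files`, which is what the CLI calls under the hood in `base/cli.py`:

```python
from base.project import Project

# SDK equivalent of:
#   base search mnist --conditions "train" --query "label in ['1','2','3']"
files = Project("mnist").files(conditions="train", query=["label in ['1','2','3']"])
print(len(files))  # number of matched files
print(files[0])    # File object for the first match (prints its local path)
```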
802 | 803 | **Example: Search mnist and export as JSON** 804 | 805 | --- 806 | 807 | ``` 808 | $ base search mnist --conditions "test" --query "correction != -1" --export JSON --output ./dataset.json 809 | ``` 810 | 811 |
Output 812 | 813 | ``` 814 | 9963 files 815 | ======== 816 | '/home/xxxx/dataset/mnist/test/7/3329.png' 817 | ... 818 | ``` 819 |
820 | 
821 | ```
822 | $ cat ./dataset.json
823 | {
824 |     "Data": [
825 |         {
826 |             "FilePath": "/home/xxxx/dataset/mnist/test/7/3329.png",
827 |             "label": "7",
828 |             "dataType": "test",
829 |             "id": "3329"
830 |         },
831 |         ...
832 |     ]
833 | }
834 | ```
835 | 
836 | → [Back to top](#command-reference)
837 | 
838 | ## show
839 | 
840 | Show detailed information about your Base project.
841 | 
842 | **Synopsis**
843 | 
844 | ---
845 | 
846 | ```
847 | usage: base show project [--member-list]
848 | 
849 | positional arguments:
850 |   project  your project name to show detail.
851 | ```
852 | 
853 | **Description**
854 | 
855 | ---
856 | 
857 | This command will show you what meta data is in your project.
858 | 
859 | Each meta data key has a `KeyName` (like "CategoryName") and a `KeyHash` to identify the meta data even if you change the KeyName.
860 | 
861 | The structure of the returned records looks like below.
862 | 
863 | ```
864 | {
865 |     "KeyHash": String,
866 |     "KeyName": String,
867 |     "RecordedCount": Number,
868 |     "Creator": String,
869 |     "LastEditor": String,
870 |     "EditerList": List,
871 |     "ValueHash": String,
872 |     "ValueType": String,
873 |     "UpperValue": String,
874 |     "LowerValue": String,
875 |     "UniqueValues": String,
876 |     "CreatedTime": String of unix time,
877 |     "LastModifiedTime": String of unix time
878 | }
879 | ```
880 | 
881 | **Options**
882 | 
883 | ---
884 | 
885 | - `--member-list` - show project members
886 | 
887 | **Example: Show mnist project**
888 | 
889 | ---
890 | 
891 | ```
892 | $ base show mnist
893 | ```
894 | 
895 | 
Output 896 | 897 | ``` 898 | project mnist 899 | =============== 900 | You have 70000 records with 4 keys in this project. 901 | 902 | [Keys Information] 903 | 904 | KEY NAME VALUE RANGE VALUE TYPE RECORDED COUNT 905 | 'id','index' 0 ~ 59999 str('id'), int('index') 70000 906 | 'correction' 0or6 ~ -1 str('correction') 74 907 | 'label','originalLabel' 0 ~ 9 str('label'), int('originalLabel') 70000 908 | 'dataType' test ~ train str('dataType') 70000 909 | ... 910 | ``` 911 |
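The same summary is reachable from Python. A minimal sketch, assuming `Project.get_metadata_summary` and `summarize_keys_information` behave as `base/cli.py` uses them (the summary dict carries `MaxRecordedCount`, `UniqueKeyCount`, and a `Keys` table):

```python
from base.project import Project, summarize_keys_information

# SDK equivalent (assumed) of: base show mnist
key_list = Project("mnist").get_metadata_summary()
summary = summarize_keys_information(key_list)
print(summary["MaxRecordedCount"], summary["UniqueKeyCount"])
for row in summary["Keys"]:  # first row is the header tuple
    print(row)
```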
912 | 913 | **Example: Show mnist project members** 914 | 915 | --- 916 | 917 | ``` 918 | $ base show mnist --member-list 919 | ``` 920 | 921 |
Output 922 | 923 | ``` 924 | project Members 925 | =============== 926 | xxxx@yyyy.com (Owner, invited at 2022-03-11 18:18:54) 927 | zzzz@yyyy.com (Editor, invited at 2022-03-12 13:45:04) 928 | ``` 929 |
930 | 
931 | → [Back to top](#command-reference)
--------------------------------------------------------------------------------
/docs/SDK.md:
--------------------------------------------------------------------------------
1 | # Python Reference
2 | 
3 | - base.config
4 |   - [func check_project_exists](#checkprojectexists)
5 |   - [func delete_project_config](#deleteprojectconfig)
6 |   - [func get_access_key](#getaccesskey)
7 |   - [func get_project_uid](#getprojectuid)
8 |   - [func get_user_id](#getuserid)
9 |   - [func get_user_id_from_db](#getuseridfromdb)
10 |   - [func register_access_key](#registeraccesskey)
11 |   - [func register_project_uid](#registerprojectuid)
12 |   - [func register_user_id](#registeruserid)
13 |   - [func update_project_info](#updateprojectinfo)
14 | - base.dataset
15 |   - [class Dataset](#dataset-class)
16 | - base.file
17 |   - [class File](#file-class)
18 |   - [class Files](#files-class)
19 | - base.hash
20 |   - [func calc_file_hash](#calcfilehash)
21 | - base.parser
22 |   - [class Parser](#parser-class)
23 | - base.project
24 |   - [class Project](#project-class)
25 |   - [func archive_project](#archiveproject)
26 |   - [func create_project](#createproject)
27 |   - [func delete_project](#deleteproject)
28 |   - [func get_projects](#getprojects)
29 |   - [func summarize_keys_information](#summarizekeysinformation)
30 | 
31 | ## **check_project_exists()**
32 | 
33 | ```python
34 | function base.config.check_project_exists(user_id="string", project_name="string")
35 | ```
36 | 
37 | Check whether the project already exists or not
38 | 
39 | **Parameters**
40 | 
41 | - user_id (string) - required
42 |     - acquired user id from environment variable or config file
43 | - project_name (string) - required
44 |     - target project name
45 | 
46 | **Returns**
47 | 
48 | - project_exists (bool)
49 |     - whether the project already exists or not
50 | 
51 | → [Back to top](#python-reference)
52 | 
53 | ## **delete_project_config()**
54 | 
55 | ```python
56 | function base.config.delete_project_config(user_id="string", project_name="string")
57 | ```
58 | 
59 | Delete the config of the specified project.
60 | 
61 | **Parameters**
62 | 
63 | - user_id (string) - required
64 |     - acquired user id from environment variable or config file
65 | - project_name (string) - required
66 |     - target project name
67 | 
68 | → [Back to top](#python-reference)
69 | 
70 | ## **get_access_key()**
71 | 
72 | ```python
73 | function base.config.get_access_key()
74 | ```
75 | 
76 | Get the access key from the config file. If you have 'BASE_ACCESS_KEY' in your environment variables, Base will use it
77 | 
78 | **Returns**
79 | 
80 | - access_key (string)
81 |     - acquired API access key from environment variable or config file
82 | 
83 | → [Back to top](#python-reference)
84 | 
85 | ## **get_project_uid()**
86 | 
87 | ```python
88 | function base.config.get_project_uid(user_id="string", project_name="string")
89 | ```
90 | 
91 | Get the project uid from the project name.
92 | 
93 | **Parameters**
94 | 
95 | - user_id (string) - required
96 |     - acquired user id from environment variable or config file
97 | - project_name (string) - required
98 |     - target project name
99 | 
100 | **Returns**
101 | 
102 | - project_uid (string)
103 |     - project uid of the given project name
104 | 
105 | → [Back to top](#python-reference)
106 | 
107 | ## **get_user_id()**
108 | 
109 | ```python
110 | function base.config.get_user_id()
111 | ```
112 | 
113 | Get the user id from the config file. If you have 'BASE_USER_ID' in your environment variables, Base will use it
114 | 
115 | **Returns**
116 | 
117 | - user_id (string)
118 |     - acquired user id from environment variable or config file
119 | 
120 | → [Back to top](#python-reference)
121 | 
122 | ## **get_user_id_from_db()**
123 | 
124 | ```python
125 | function base.config.get_user_id_from_db(access_key="string")
126 | ```
127 | 
128 | Get the user id from the remote db.
129 | 
130 | **Parameters**
131 | 
132 | - access_key (string) - required
133 |     - acquired API access key from environment variable or config file
134 | 
135 | **Returns**
136 | 
137 | - user_id (string)
138 |     - acquired user id from database
139 | 
140 | → [Back to top](#python-reference)
141 | 
142 | ## **register_access_key()**
143 | 
144 | ```python
145 | function base.config.register_access_key(access_key="string")
146 | ```
147 | 
148 | Register the access key to the local config file.
149 | 
150 | **Parameters**
151 | 
152 | - access_key (string) - required
153 |     - API access key
154 | 
155 | → [Back to top](#python-reference)
156 | 
157 | ## **register_project_uid()**
158 | 
159 | ```python
160 | function base.config.register_project_uid(user_id="string", project="string", project_uid="string")
161 | ```
162 | 
163 | Register the project uid to the local config file.
164 | 
165 | **Parameters**
166 | 
167 | - user_id (string) - required
168 |     - acquired user id from environment variable or config file
169 | - project (string) - required
170 |     - target project name
171 | - project_uid (string) - required
172 |     - target project uid
173 | 
174 | → [Back to top](#python-reference)
175 | 
176 | ## **register_user_id()**
177 | 
178 | ```python
179 | function base.config.register_user_id(user_id="string")
180 | ```
181 | 
182 | Register the user id to the local config file.
183 | 
184 | **Parameters**
185 | 
186 | - user_id (string) - required
187 |     - target user id
188 | 
189 | → [Back to top](#python-reference)
190 | 
191 | ## **update_project_info()**
192 | 
193 | ```python
194 | function base.config.update_project_info(user_id="string")
195 | ```
196 | 
197 | Update the local project info with the remote.
198 | 
199 | **Parameters**
200 | 
201 | - user_id (string) - required
202 |     - acquired user id from environment variable or config file
203 | 
204 | → [Back to top](#python-reference)
205 | 
206 | ## **Dataset class**
207 | 
208 | ```python
209 | class base.dataset.Dataset
210 | ```
211 | 
212 | This is a middle-level (numpy or other) interface for datasets in Base. The Dataset class receives a Files object as an argument and processes each data file with the specified transform function. You can create a high-level (torch tensor or other) dataset interface, like the DataLoader of PyTorch, using this Dataset object.
213 | 
214 | ```python
215 | import base
216 | 
217 | project = base.Project("project-name")
218 | files = project.files(conditions="string", query=["string"], sort_key="string")
219 | dataset = base.Dataset(files=files, target_key="string", transform=None|Callable)
220 | ```
221 | 
222 | These are the available attributes:
223 | 
224 | - transform (Callable)
225 |     - preprocess function
226 | - target_key (string)
227 |     - object variable for modeling
228 | - files (Files)
229 |     - inherited dataset interface
230 | 
231 | These are the available methods:
232 | 
233 | - [train_test_split()](#traintestsplit)
234 | 
235 | ### **train_test_split()**
236 | 
237 | ```python
238 | x_train, x_test, y_train, y_test = dataset.train_test_split(split_rate=float)
239 | ```
240 | 
241 | This method splits the dataset into 2 folds. You can adjust the split ratio with the `split_rate` option.
You can adjust split ratio with `split_rate` option. 242 | 243 | **Parameters** 244 | 245 | - split_rate (float) - default 0.25 246 | - the ratio of test set 247 | 248 | **Returns** 249 | 250 | - x_train (list) 251 | - transformed train data 252 | - x_test (list) 253 | - transformed test data 254 | - y_train (list) 255 | - train label specified as target_key in Dataset class initialization 256 | - y_test (list) 257 | - test label specified as target_key in Dataset class initialization 258 | 259 | **Usage** 260 | Using the index operator [] on the Dataset class object, you can get the data transformed by user-defined preprocessing functions and label specified by target key. 261 | 262 | ```python 263 | def preprocess_func(path): 264 | image = Image.open(path) 265 | image = image.resize((28, 28)) 266 | image = np.array(image) 267 | return image 268 | 269 | test_files = Project("mnist").files(conditions="test") 270 | test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func) 271 | 272 | print(test_dataset[0]) 273 | >>>(array([[ 0, 0, ...]]), '7' 274 | ``` 275 | 276 | If transform is not specified, local path is returned by default. 277 | ```python 278 | test_files = Project("mnist").files(conditions="test") 279 | test_dataset = Dataset(test_files, target_key="label") 280 | 281 | print(test_dataset[0]) 282 | >>> '/Users/user/dataset/mnist/test/7/4815.png', '7' 283 | ``` 284 | 285 | For example: 286 | 287 | You can get X and y using for loops as in the following example. 288 | ```python 289 | def preprocess_func(path): 290 | image = Image.open(path) 291 | image = image.resize((28, 28)) 292 | image = np.array(image) 293 | return image 294 | 295 | def get_image_and_label(dataset, idx): 296 | X, label = dataset[idx] # label = "0" or "1" or "2" , ... 297 | y = int(label) 298 | # cerate one-hot vector 299 | y = np.eye(10)[y] 300 | return X, y 301 | 302 | test_files = Project("mnist").files(conditions="test") 303 | test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func) 304 | 305 | X_test = np.empty((len(test_dataset), 28, 28, 1)) 306 | y_test = np.empty((len(test_dataset), 10)) 307 | for i in range(len(test_dataset)): 308 | X_test[i], y_test[i] = get_image_and_label(test_dataset, i) 309 | ``` 310 | 311 | If you use `train_test_split()`, y_train and y_test are list of string obtained by target_key by default. 312 | ```python 313 | files = Project("mnist").files() 314 | dataset = Dataset(files, target_key="label", transform=preprocess_func) 315 | X_train, y_train, X_test, y_test = dataset.train_test_split(0.25) 316 | 317 | print(y_train) 318 | >>> ["1", "3", "4",...] 319 | ``` 320 | 321 | 322 | → [Back to top](#python-reference) 323 | 324 | ## **File class** 325 | 326 | ```python 327 | class base.files.File 328 | ``` 329 | 330 | Using the index operator [] on the Files class object, you can get the File class object at a specific index. 331 | 332 | ```python 333 | print(files[0]) 334 | >>> "/home/xxxx/dataset/mnist/0/12909.png" 335 | ``` 336 | 337 | These are the available attributes: 338 | 339 | - path (string) 340 | - local filepath. 341 | 342 | For example: 343 | 344 | ```python 345 | files[0].path 346 | >>> "/home/xxxx/dataset/mnist/0/12909.png" 347 | ``` 348 | 349 | - metadata (dict) 350 | whole dict of attributes (metadata) which related with this file. 
For example:

```python
files[0].metadata
>>> {
    "dataType": "train",
    "label": "0",
    "id": "12909"
}
```

- attrs (string)
  - attributes (metadata) related to this file, each accessible as an object attribute

For example:

```python
files[0].label
>>> "0"

files[0].id
>>> "12909"
```

→ [Back to top](#python-reference)

## **Files class**

```python
class base.files.Files
```

This is a low-level (file path) interface for datasets in Base. A Files object contains the File instances that matched your dataset filter.

```python
import base

project = base.Project("project-name")
files = project.files(conditions="string", query=["string"], sort_key="string")
```

You can filter data files and get a Files object simply by specifying criteria with the `files` method of `base.Project`.

**Using the index operator [] on a Files class object, you can get the [`File class`](#file-class) object at a specific index.**

For example:

```python
files[0]
>>> "/home/xxxx/dataset/mnist/0/12909.png"

files[0].label
>>> "0"

files[0].id
>>> "12909"
```

These are the available attributes:

- project_name (string)
  - registered project name
- user_id (string)
  - registered user id
- project_uid (string)
  - project unique hash
- conditions (string) - default `None`
  - value to search for files
- query (list) - default []
  - expression of key and value to search for files
- sort_key (string) - default `None`
  - key to sort files
- files (list)
  - list of File class objects
- result (list)
  - list of metadata_dict filtered by criteria
  ```python
  [
      {
          "FilePath": String,
          "MetaKey1": ...,
          ...
      },
      ...
  ]
  ```
- paths (list)
  - list of local filepaths
  ```python
  [
      "String",
      ...
  ]
  ```
- items (list)
  - list of metadata_dict other than filepath
  ```python
  [
      {
          "MetaKey1": ...,
          ...
      },
      ...
  ]
  ```

This is the available method:

- [filter()](#filter)

### **filter()**

```python
files = files.filter(conditions="string", query=["string"], sort_key="string")
```

This method applies an additional filter to an already filtered Files object. You can use this method repeatedly; see the example after this section.

**Parameters**

- conditions (string) - optional
  - value to search for files.

For example:

```python
conditions="0"
```

If you want to search by multiple criteria, you must provide comma (,) separated strings.

For example:

```python
conditions="0,1,2"
```

You will get files that meet at least one of the criteria.

**Note**

There must be no single-byte spaces between values.

- query (list) - default []
  - expression of key and value to search for files.

For example:

```python
query=["label == 0"]
```

You can use `==`, `!=`, `>`, `>=`, `<`, `<=`, `is`, `is not`, `in`, and `not in` as operators.
If you want to search by multiple criteria, you must provide a list of expressions.

For example:

```python
query=["label == 0", "id >= 10000"]
```

You will get files that meet all the criteria.

**Note**

A single-byte space is required before and after the operator.

- sort_key (string) - optional
  - key to sort files.

For example:

```python
sort_key="label"
```

**Returns**

- Files class
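As a quick illustration of repeated filtering (the key names and conditions here are hypothetical), each call returns a new, further narrowed Files object:

```python
import base

project = base.Project("mnist")
files = project.files(conditions="train")

# chain filters: each call returns a new, further filtered Files object
zeros = files.filter(query=["label == 0"])
early_zeros = zeros.filter(query=["id < 10000"], sort_key="id")

print(len(files), len(zeros), len(early_zeros))
```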
These are the available operators:

- [+ (concatenation)](#-concatenation)
- [| (union)](#-union)

### **+ (concatenation)**

Return a new Files object that is the concatenation of the two Files objects. You can chain this operator.

This operation does **not** deduplicate elements: if both Files objects contain the same File object, the resulting Files object contains that File object twice.

**Expression**

```python
concated_files = files1 + files2

# You can chain the operator.
concated_files = files1 + files2 + files3
concated_files2 = concated_files + files4
```

**Examples**

```python
files1 = project.files(conditions="0,1,2", query=['dataType == test'], sort_key="id")
files2 = project.files(conditions="0,1,2", query=['dataType == train'], sort_key="id")

files = files1 + files2
print(files)
>>> ======Files======
Files1(project_name='mnist', conditions='0,1,2', query=['dataType == test'], sort_key='id', file_num=3148)
Files2(project_name='mnist', conditions='0,1,2', query=['dataType == train'], sort_key='id', file_num=18624)
===Expressions===
Files1 + Files2

print(len(files))
>>> 21772
```

### **| (union)**

Return a new Files object that is the union of the two Files objects. You can chain this operator.

This operation guarantees that all File objects in the resulting Files object are unique.

**Expression**

```python
union_files = files1 | files2

# You can chain the operator.
union_files = files1 | files2 | files3
union_files2 = union_files | files4
```

**Examples**

```python
files1 = project.files(conditions="0,1,2", sort_key="id")
files2 = project.files(conditions="0", sort_key="id")

files = files1 | files2
print(files)
>>> ======Files======
Files1(project_name='mnist', conditions='0,1,2', query=[], sort_key='id', file_num=21772)
Files2(project_name='mnist', conditions='0', query=[], sort_key='id', file_num=6905)
===Expressions===
Files1 or Files2

print(len(files))
>>> 21772
```

→ [Back to top](#python-reference)

## **calc_file_hash()**

```python
function base.hash.calc_file_hash(path="string", algorithm="md5"|"sha224"|"sha256"|"sha384"|"sha512"|"sha1", split_chunk=False|True, chunk_size=int)
```

Calculate the hash value of a file.

**Parameters**

- path (string) - required
  - target file path
- algorithm (string) - default "sha256"
  - hash algorithm name
- split_chunk (bool) - default True
  - if True, split a large file into byte chunks
- chunk_size (integer) - default 2048
  - block byte size of a chunk

**Returns**

- digest (string)
  - hash string of the input file
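**Usage**

A minimal sketch using one of the repository's sample files (the path is relative to the repository root; adjust it for your environment):

```python
from base.hash import calc_file_hash

# default algorithm is "sha256"; chunked reading keeps memory usage low
digest = calc_file_hash("tests/data/sample.csv")
print(digest)

# a different algorithm, reading the whole file at once
md5_digest = calc_file_hash("tests/data/sample.csv", algorithm="md5", split_chunk=False)
print(md5_digest)
```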
→ [Back to top](#python-reference)

## **Parser class**

```python
class base.parser.Parser
```

This is a file path parser. When you call the `add_datafiles` method of `base.Project`, Base initializes a Parser object with the specified parsing rule and tries to extract metadata from each file path with the `__call__` method.

```python
from base.parser import Parser

parser = Parser(parsing_rule="string", sep=None|"string")
result = parser(path="string")
```

### **\_\_init\_\_()**

Initialize self with parsing_rule and generate a parser.

```python
base.parser.Parser(parsing_rule="string", sep=None|"string")
```

1. Replace unused strings with `{_}` in `parsing_rule`
2. Extract keys enclosed in `{}`

* Example of processing method

```Raw
1. parsing_rule: hoge/{num1}/fuga/{num2}.txt
   -> {_}/{num1}/{_}/{num2}.txt

2. {_}/{num1}/{_}/{num2}.txt
   -> ["_", "num1", "_", "num2"]
```

**Parameter**

- parsing_rule (string) - required
  - specified parsing rule
    ex.) {_}/{name}/{timestamp}/{sensor}-{condition}_{iteration}.csv
- sep (string) - optional
  - the separator of the file path

### **\_\_call\_\_()**

Parse your target path.

```python
parser(path="string")
```

1. Convert the file path string to a parsable format.
2. Extract values enclosed in `{}` in the parsable formatted path.
3. Generate a dictionary from the keys and values extracted with `parsing_rule`.

* Example of processing method

```Raw
1. path: mnist/train/0/12909.png
   -> {mnist}/{train}/{0}/{12909}.png

2. parsable format: {mnist}/{train}/{0}/{12909}.png
   -> ["mnist", "train", "0", "12909"]

3. keys  : ["_", "dataType", "label", "id"]
   values: ["mnist", "train", "0", "12909"]
   -> {"dataType": "train", "label": "0", "id": "12909"}
```

**Parameters**

- path (string) - required
  - the file path

**Return**

- parsed_dict (dict)
  - metadata dictionary

These are the available methods:

- [is_path_parsable()](#is_path_parsable)
- [update_rule()](#update_rule)

### **is_path_parsable()**

Verify that the specified parsing rule works properly for a given path. If not, it returns False.

```python
parser.is_path_parsable(path="string")
```

**Parameter**

- path (string) - required
  - the file path

**Return**

- parsable_flag (bool)
  - True if the file path is parsable

### **update_rule()**

Generate a parser that takes the number of separators into account, based on a concrete parsing example.

Use this method when `is_path_parsable("your-path")` is False.

```python
parser.update_rule(parsing_rule="string")
```

**Parameters**

- parsing_rule (string) - required
  - detailed parsing rule
    ex.) {Origin}/{train}/{2022_04_05}-{dog}_{a01}.png
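**Usage**

A minimal sketch of the check-then-update flow (the rules and paths here are hypothetical):

```python
from base.parser import Parser

parser = Parser(parsing_rule="{_}/{dataType}/{label}/{id}.png")

path = "mnist/train/0/12909.png"
if not parser.is_path_parsable(path):
    # give a concrete example so the parser can count separators correctly
    parser.update_rule("{mnist}/{train}/{0}/{12909}.png")

print(parser(path))
# -> {"dataType": "train", "label": "0", "id": "12909"}
```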
→ [Back to top](#python-reference)

## **Project class**

```python
class base.project.Project
```

This is the base class of a project. You have to initialize it with an existing project name. If you specify a project name which you don't have, you will get a ValueError; in that case, retry after calling the `base.project.create_project` function.

```python
import base

project = base.Project("project-name")
```

These are the available attributes:

- project_name (string)
  - registered project name
- user_id (string)
  - registered user id
- project_uid (string)
  - project unique hash

These are the available methods:

- [add_datafile()](#add_datafile)
- [add_datafiles()](#add_datafiles)
- [add_member()](#add_member)
- [add_metafile()](#add_metafile)
- [extract_metafile()](#extract_metafile)
- [estimate_join_rule()](#estimate_join_rule)
- [files()](#files)
- [get_members()](#get_members)
- [get_metadata_summary()](#get_metadata_summary)
- [link_datafiles()](#link_datafiles)
- [remove_member()](#remove_member)
- [update_member()](#update_member)

### **add_datafile()**

Import metadata of one file.

```python
project.add_datafile(file_path="string", attributes={"string":"string"})
```

1. Calculate the file hash.
2. Create a metadata record with the file hash and attributes.
3. Add that record to the project database table.

```python
{
    "FileHash": String,
    "MetaKey1": ...,
    ...
}
```

**Parameters**

- file_path (string) - required
  - the file path
- attributes (dict) - default {}
  - the extra metadata (attributes)

**Raises**

- Exception
  - raises if something went wrong with the upload request to the server

### **add_datafiles()**

Import metadata related to datafile paths.

```python
project.add_datafiles(dir_path="string", extension="string", attributes={"string":"string"}, parsing_rule="string", detail_parsing_rule="string")
```

1. Calculate the file hashes.
2. Parse the file paths with `parsing_rule`.
3. Create metadata records with the file hashes, attributes, and parsed path data.
4. Add those records to the project database table.

```python
{
    "FileHash": String,
    "MetaKey1": ...,
    ...
}
```

**Parameters**

- dir_path (string) - required
  - the root directory path for datafiles
- extension (string) - required
  - the extension of datafiles
- attributes (dict) - default {}
  - the extra metadata (attributes) combined with all datafiles
- parsing_rule (string) - optional
  - the rule for extracting metadata from datafile paths
    ex.) {_}/{disease}/{patient-id}-{part}-{iteration}.png
- detail_parsing_rule (string) - optional
  - detailed information about the parsing rule
    ex.) {_}/{CancerA}/{1-123}-{1}-{100}.png

**Returns**

- file_num (integer)
  - number of imported datafiles

**Raises**

- ValueError
  - raises if an invalid parsing rule was specified
- Exception
  - raises if something went wrong with the upload request to the server
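**Usage**

A minimal sketch of importing a directory of files (the directory layout, path, and keys are hypothetical):

```python
import base

project = base.Project("mnist")

# e.g. files laid out as .../mnist/<dataType>/<label>/<id>.png
file_num = project.add_datafiles(
    dir_path="/home/xxxx/dataset/mnist",
    extension="png",
    attributes={"origin": "MNIST"},
    parsing_rule="{_}/{dataType}/{label}/{id}.png",
)
print(file_num)  # number of imported datafiles
```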
### **add_member()**

Invite a new project member.

```python
project.add_member(member="string", permission_level="string")
```

**Parameters**

- member (string) - required
  - the user id of the new member
- permission_level (string) - required
  - the new member's permission level
  - Viewer
    can only read metadata in the project database.
    A viewer can not import data files or external files
    and can not control permissions of other members.
  - Editor
    can read and write metadata in the project database.
    An editor can not control permissions of other members.
  - Admin
    can read and write metadata in the project database.
    An admin can also control permissions of other members,
    but can not transfer the Owner permission level.

**Raises**

- ValueError
  - raises if an invalid permission level was specified
- Exception
  - raises if something went wrong with the invite request to the server

### **add_metafile()**

Import metadata from external files.

```python
project.add_metafile(file_path=["string"], attributes={"string":"string"})
```

**Parameters**

- file_path (list) - required
  - list of external file paths
- attributes (dict) - default {}
  - the extra metadata (attributes) combined with all datafiles

**Raises**

- ValueError
  - raises if a specified external file is not a csv or excel file
- Exception
  - raises if something went wrong with the upload request to the server

### **extract_metafile()**

Only extract metadata from an external file, without importing it.

```python
project.extract_metafile(file_path="string", attributes={"string":"string"})
```

**Parameters**

- file_path (string) - required
  - the external file path
- attributes (dict) - default {}
  - the extra metadata (attributes) combined with all datafiles

**Returns**

- tables (list)
  - list of table data extracted from the external file

```JavaScript
[
    [
        {
            "MetaKey1": ...,
            "MetaKey2": ...,
            ...
        },
        ...
    ],
    ...
]
```

**Raises**

- ValueError
  - raises if the specified external file is not a csv or excel file
- Exception
  - raises if something went wrong with the request to the server

### **estimate_join_rule()**

Only estimate the join rule from an external file and the existing table.

```python
project.estimate_join_rule(file_path="string", tables=list)
```

**Parameters**

Either file_path or tables is required. If both are specified, tables take precedence.

- file_path (string)
  - the external file path
- tables (list)
  - output of the base.Project().extract_metafile() method

**Returns**

- join_rule (list)
  - list of join rules estimated from the external file and the existing table

```JavaScript
[
    {
        "new key1": "exist key1",
        ...
    },
    ...
]
```

**Raises**

- ValueError
  - raises if the specified external file is not a csv or excel file
- Exception
  - raises if something went wrong with the request to the server

### **files()**

Return the [`Files class`](#files-class).
You can filter files easily and simply by specified criteria.

```python
files = project.files(conditions="string", query=["string"], sort_key="string")
```

**Parameters**

- conditions (string) - optional
  - value to search for files

For example:

```python
conditions="0"
```

If you want to search by multiple criteria, you must provide comma (,) separated strings.

For example:

```python
conditions="0,1,2"
```

You will get files that meet at least one of the criteria.

- query (list) - default []
  - expression of key and value to search for files

For example:

```python
query=["label == 0"]
```

You can use `==`, `!=`, `>`, `>=`, `<`, `<=`, `is`, `is not`, `in`, and `not in` as operators.

If you want to search by multiple criteria, you must provide a list of expressions.

For example:

```python
query=["label == 0", "id >= 10000"]
```

You will get files that meet all the criteria.

**Note**

A single-byte space is required before and after the operator.

- sort_key (string) - optional
  - key to sort files.

For example:

```python
sort_key="label"
```

**Returns**

- [`Files class`](#files-class)
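**Usage**

For example, combining all three parameters (the project and key names mirror the examples above):

```python
import base

project = base.Project("mnist")

# files labeled 0, 1, or 2 in the test set, sorted by id
files = project.files(conditions="0,1,2", query=["dataType == test"], sort_key="id")
print(len(files))
print(files[0].path)
```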
### **get_members()**

Get the list of project members.

```python
project.get_members()
```

**Returns**

- member_list (list)
  - list of each member's information

```JavaScript
[
    {
        "UserID": String,
        "UserRole": String,
        "CreatedTime": String of unix time
    },
    ...
]
```

**Raises**

- Exception
  - raises if something went wrong with the request to the server

### **get_metadata_summary()**

Get the list of metadata information.

```python
project.get_metadata_summary()
```

**Returns**

- key_list (list)
  - list of information about each metadata key

```JavaScript
[
    {
        "KeyHash": String,
        "KeyName": String,
        "ValueHash": String,
        "ValueType": String,
        "RecordedCount": Integer,
        "UpperValue": String,
        "LowerValue": String,
        "CreatedTime": String of unix time,
        "LastModifiedTime": String of unix time,
        "Creator": String,
        "LastEditor": String,
        "EditerList": List of String
    },
    ...
]
```

**Raises**

- Exception
  - raises if something went wrong with the request to the server

### **link_datafiles()**

Create linker metadata to local datafiles.

```python
project.link_datafiles(dir_path="string", extension="string")
```

**Parameters**

- dir_path (string) - required
  - the root directory path for datafiles
- extension (string) - required
  - the extension of datafiles

**Returns**

- file_num (integer)
  - number of linked datafiles
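**Usage**

For example (the directory path is hypothetical):

```python
import base

project = base.Project("mnist")

# create linker metadata for the local datafiles under dir_path
file_num = project.link_datafiles(dir_path="/home/xxxx/dataset/mnist", extension="png")
print(file_num)  # number of linked datafiles
```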
### **remove_member()**

Remove a project member.

```python
project.remove_member(member=["string"]|"string")
```

**Parameters**

- member (list or string) - required
  - the target member(s) to remove

**Raises**

- Exception
  - raises if something went wrong with the removal request to the server

### **update_member()**

Update a project member's permission.

```python
project.update_member(member="string", permission_level="Viewer"|"Editor"|"Admin"|"Owner")
```

**Parameters**

- member (string) - required
  - the user id of an existing member
- permission_level (string) - required
  - the member's new permission level
  - Viewer
    can only read metadata in the project database.
    A viewer can not import data files or external files
    and can not control permissions of other members.
  - Editor
    can read and write metadata in the project database.
    An editor can not control permissions of other members.
  - Admin
    can read and write metadata in the project database.
    An admin can also control permissions of other members,
    but can not transfer the Owner permission level.
  - Owner
    can transfer the Owner permission to others,
    and delete the project completely.

**Raises**

- ValueError
  - raises if an invalid permission level was specified
- Exception
  - raises if something went wrong with the update request to the server

→ [Back to top](#python-reference)

## **archive_project()**

```python
function base.project.archive_project(user_id="string", project_name="string")
```

Archive a project.

**Parameters**

- user_id (string) - required
  - registered user id
- project_name (string) - required
  - the project name you want to archive

**Raises**

- Exception
  - raises if something went wrong with the request to the server

## **create_project()**

```python
function base.project.create_project(user_id="string", project_name="string", private=True|False)
```

Create a new project.

**Parameters**

- user_id (string) - required
  - registered user id
- project_name (string) - required
  - the project name you want to create
- private (bool) - default True
  - specifies whether or not to allow public access to the project

**Returns**

- project_uid (string)
  - project unique hash

**Raises**

- Exception
  - raises if something went wrong with the request to the server

## **delete_project()**

```python
function base.project.delete_project(user_id="string", project_name="string")
```

Delete a project.

**Parameters**

- user_id (string) - required
  - registered user id
- project_name (string) - required
  - the archived project name you want to delete

**Raises**

- Exception
  - raises if something went wrong with the request to the server

## **get_projects()**

```python
function base.project.get_projects(user_id="string", archived=False|True)
```

Get the list of projects.

**Parameters**

- user_id (string) - required
  - registered user id
- archived (bool) - default False
  - if False, return non-archived projects; if True, return archived projects

**Returns**

- project_list (list)
  - list of project names you have

**Raises**

- Exception
  - raises if something went wrong with the request to the server
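Putting these functions together, here is a minimal lifecycle sketch. The user id and project name are hypothetical; note that a project must be archived before it can be deleted.

```python
from base.project import (
    archive_project,
    create_project,
    delete_project,
    get_projects,
)

user_id = "your-user-id"  # hypothetical; use your registered user id

project_uid = create_project(user_id=user_id, project_name="my-project")
print(get_projects(user_id=user_id))  # active projects

# archive first, then delete
archive_project(user_id=user_id, project_name="my-project")
print(get_projects(user_id=user_id, archived=True))  # archived projects
delete_project(user_id=user_id, project_name="my-project")
```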
→ [Back to top](#python-reference)

## **summarize_keys_information()**

```python
function base.project.summarize_keys_information(metadata_summary=list)
```

Summarize the information of keys on a project for printing.

**Parameters**

- metadata_summary (list) - required
  - output of the base.Project().get_metadata_summary() method

**Returns**

- summary_for_print (dict)
  - summarized key information for printing

```JavaScript
{
    "MaxRecordedCount": Integer,
    "UniqueKeyCount": Integer,
    "MaxCharCount": {
        "KEY NAME": Integer,
        "VALUE RANGE": Integer,
        "VALUE TYPE": Integer,
        "RECORDED COUNT": Integer
    },
    "Keys": [
        (
            KeyName: String,
            ValueRange: String,
            ValueType: String,
            RecordedCount: String
        ),
        ...
    ]
}
```
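**Usage**

A minimal sketch (assuming the "mnist" project exists) of pairing this function with `get_metadata_summary()`:

```python
import base
from base.project import summarize_keys_information

project = base.Project("mnist")

# summarize_keys_information consumes the raw summary returned by get_metadata_summary()
summary = project.get_metadata_summary()
print(summarize_keys_information(summary))
```

→ [Back to top](#python-reference)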