├── .coveragerc ├── .github ├── .gitmessage.txt ├── ISSUE_TEMPLATE │ ├── bug-template.md │ ├── config.yml │ ├── feature-template.md │ └── issue-template.md ├── PULL_REQUEST_TEMPLATE.md └── workflows │ ├── build.yml │ ├── codespell.yml │ ├── lint.yml │ ├── publish.yml │ └── tests.yml ├── .gitignore ├── .pre-commit-config.yaml ├── .readthedocs.yml ├── .travis.yml ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.rst ├── HISTORY.rst ├── LICENSE.txt ├── MANIFEST.in ├── README.md ├── assets ├── function_schematic.png ├── function_schematic.svg ├── models_export.svg └── models_import.svg ├── docker ├── docker-compose-base.yml └── docker-compose-test-all.yml ├── local-test.env ├── pyDataverse ├── __init__.py ├── api.py ├── auth.py ├── docs │ └── source │ │ ├── _images │ │ └── collection_dataset.png │ │ ├── _static │ │ └── .gitkeep │ │ ├── _templates │ │ ├── layout.html │ │ ├── sidebar_intro.html │ │ └── sidebar_related-links.html │ │ ├── community │ │ ├── contact.rst │ │ └── releases.rst │ │ ├── conf.py │ │ ├── contributing │ │ └── contributing.rst │ │ ├── index.rst │ │ ├── reference.rst │ │ ├── snippets │ │ ├── pip-install.rst │ │ ├── requirements.rst │ │ └── warning_production.rst │ │ └── user │ │ ├── advanced-usage.rst │ │ ├── basic-usage.rst │ │ ├── csv-templates.rst │ │ ├── faq.rst │ │ ├── installation.rst │ │ ├── resources.rst │ │ └── use-cases.rst ├── exceptions.py ├── models.py ├── schemas │ └── json │ │ ├── datafile_upload_schema.json │ │ ├── dataset_upload_default_schema.json │ │ ├── dataverse_upload_schema.json │ │ └── dspace_schema.json ├── templates │ ├── datafiles.csv │ ├── datasets.csv │ └── dataverses.csv └── utils.py ├── pyproject.toml ├── requirements.txt ├── run-tests.sh ├── tests ├── __init__.py ├── api │ ├── __init__.py │ ├── test_access.py │ ├── test_api.py │ ├── test_async_api.py │ └── test_upload.py ├── auth │ ├── __init__.py │ └── test_auth.py ├── conftest.py ├── core │ └── __init__.py ├── data │ ├── datafile.txt │ ├── datafile_upload_full.json │ ├── datafile_upload_min.json │ ├── dataset_upload_full_default.json │ ├── dataset_upload_min_default.json │ ├── dataverse_upload_full.json │ ├── dataverse_upload_min.json │ ├── file_upload_ds_minimum.json │ ├── output │ │ └── .gitkeep │ ├── replace.xyz │ ├── tree.json │ ├── user-guide │ │ ├── datafile.txt │ │ ├── datafiles.csv │ │ ├── dataset.json │ │ ├── datasets.csv │ │ └── dataverse.json │ └── user.json ├── logs │ └── .gitkeep ├── models │ ├── __init__.py │ ├── test_datafile.py │ ├── test_dataset.py │ ├── test_dataverse.py │ └── test_dvobject.py └── utils │ ├── __init__.py │ └── test_utils.py └── tox.ini /.coveragerc: -------------------------------------------------------------------------------- 1 | [html] 2 | directory = docs/coverage_htlm 3 | -------------------------------------------------------------------------------- /.github/.gitmessage.txt: -------------------------------------------------------------------------------- 1 | Capitalized, short (50 chars or less) summary (#123) 2 | 3 | More detailed explanatory text, if necessary. Wrap it to about 72 4 | characters or so. In some contexts, the first line is treated as the 5 | subject of an email and the rest of the text as the body. The blank 6 | line separating the summary from the body is critical (unless you omit 7 | the body entirely); tools like rebase can get confused if you run the 8 | two together. 9 | 10 | Write your commit message in the imperative: "Fix bug" and not "Fixed bug" 11 | or "Fixes bug." 
This convention matches up with commit messages generated 12 | by commands like git merge and git revert. 13 | 14 | Further paragraphs come after blank lines. 15 | 16 | - Bullet points are okay, too 17 | 18 | - Typically a hyphen or asterisk is used for the bullet, followed by a single space, with blank lines in between, but conventions vary here 19 | 20 | - Use a hanging indent 21 | 22 | If you use an issue tracker, add a reference(s) to them at the bottom, 23 | like so: 24 | 25 | Resolves: #123 26 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug-template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: 'Bug Report' 3 | about: 'This is a bug report issue' 4 | labels: 'type:bug, status:incoming' 5 | --- 6 | 7 | 8 | 9 | 10 | 11 | Thank you for your contribution! 12 | 13 | It's great, that you want contribute to pyDataverse. 14 | 15 | First, start by reading the [Bug reports, enhancement requests and other issues](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html) section. 16 | 17 | ### Before we can start 18 | 19 | Before moving on, please check some things first: 20 | 21 | * [ ] Your issue may already be reported! Please search on the [issue tracker](https://github.com/gdcc/pyDataverse/issues) before creating one. 22 | * [ ] Is this something you can **debug and fix**? Send a pull request! For more information, see the [Contributor Guide](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html). 23 | * [ ] We as maintainers foster an open and welcoming environment. Be respectfull, supportive and nice to each other! :) 24 | 25 | ### Prerequisites 26 | 27 | * [ ] Are you running the expected version of pyDataverse? (check via `pip freeze`). 28 | 29 | ### Bug report 30 | 31 | [Please replace this line with a brief summary of your issue and add code and/or screenshots or other media to it if available] 32 | 33 | To write a bug report, we have defined a small, helpful workflow, to keep communication effective. 34 | 35 | **1. Describe your environment** 36 | 37 | * [ ] OS: NAME, VERSION, 64/32bit 38 | * [ ] pyDataverse: VERSION 39 | * [ ] Python: VERSION 40 | * [ ] Dataverse: VERSION 41 | 42 | **2. Actual behaviour:** 43 | 44 | [What actually happened] 45 | [Add logs, code, data, screenshots or other media if available.] 46 | 47 | **3. Expected behaviour:** 48 | 49 | [What have you expected to happen?] 50 | 51 | **4. Steps to reproduce** 52 | 53 | 1. [First Step] 54 | 2. [Second Step] 55 | 3. [and so on...] 56 | 57 | [Add logs, code, data, screenshots or other media if available.] 58 | 59 | **5. Possible solution** 60 | 61 | [If you have a clue, tell what could be the actual solution to the problem] 62 | 63 | **6. Check your bug report** 64 | 65 | Before you submit the issue: 66 | 67 | * Check if all information necessary to understand the problem is in. 68 | * Check if your language is written in a positive way. 
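To make filling in the "Describe your environment" checklist easier, the version details can be collected with a short snippet like the one below (a convenience sketch, not part of the template itself; it assumes pyDataverse is already installed):

```python
# Collect environment details for a pyDataverse bug report.
import platform
import sys

import pyDataverse

print("OS:", platform.platform())          # OS name, version and architecture
print("Python:", sys.version.split()[0])   # Python version
print("pyDataverse:", pyDataverse.__version__)
# The Dataverse server version is shown in the instance footer or via its
# /api/info/version endpoint.
```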
69 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/config.yml: -------------------------------------------------------------------------------- 1 | blank_issues_enabled: false 2 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature-template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: 'Feature Request' 3 | about: 'This is a feature request issue' 4 | labels: 'type:feature, status:incoming' 5 | --- 6 | 7 | 8 | 9 | 10 | 11 | Thank you for your contribution! 12 | 13 | It's great, that you want contribute to pyDataverse. 14 | 15 | First, start by reading the [Bug reports, enhancement requests and other issues](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html) section. 16 | 17 | ### Before we can start 18 | 19 | Before moving on, please check some things first: 20 | 21 | * [ ] Your issue may already be reported! Please search on the [issue tracker](https://github.com/gdcc/pyDataverse/issues) before creating one. 22 | * [ ] Is this something you can **debug and fix**? Send a pull request! For more information, see the [Contributor Guide](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html). 23 | * [ ] We as maintainers foster an open and welcoming environment. Be respectfull, supportive and nice to each other! :) 24 | 25 | ### Prerequisites 26 | 27 | * [ ] Are you running the latest version? 28 | 29 | ### Feature Request 30 | 31 | We will consider your request but it may be closed if it's something we're not actively planning to work on. 32 | 33 | **Please note: By far the quickest way to get a new feature is to file a [Pull Request](https://github.com/gdcc/pyDataverse/pulls).** 34 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/issue-template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: 'Issue' 3 | about: 'This is a normal issue' 4 | labels: 'status:incoming' 5 | --- 6 | 7 | 8 | 9 | 10 | 11 | Thank you for your contribution! 12 | 13 | It's great, that you want contribute to pyDataverse. 14 | 15 | First, start by reading the [Bug reports, enhancement requests and other issues](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html) section. 16 | 17 | ### Before we can start 18 | 19 | Before moving on, please check some things first: 20 | 21 | * [ ] Your issue may already be reported! Please search on the [issue tracker](https://github.com/gdcc/pyDataverse/issues) before creating one. 22 | * [ ] Use our issue templates for bug reports and feature requests, if that's what you need. 23 | * [ ] Are you running the expected version of pyDataverse? (check via `pip freeze`). 24 | * [ ] Is this something you can **debug and fix**? Send a pull request! Bug fixes and documentation fixes are welcome. For more information, see the [Contributor Guide](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html). 25 | * [ ] We as maintainers foster an open and welcoming environment. Be respectfull, supportive and nice to each other! 
:) 26 | 27 | ### Issue 28 | 29 | [Explain the reason for your issue] 30 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | *Any change needs to be discussed before proceeding. Failure to do so may result in the rejection of the pull request.* 11 | 12 | 13 | 14 | Thanks for submitting a pull request! It's great, that you want contribute to pyDataverse. Please provide enough information so that others can review it. 15 | 16 | First, start always by reading the [Contribution Guide](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html). There you can find all information needed, to create good pull requests. 17 | 18 | ### All Submissions 19 | 20 | **Describe your environment** 21 | 22 | * [ ] OS: NAME, VERSION, 64/32bit 23 | * [ ] pyDataverse: VERSION 24 | * [ ] Python: VERSION 25 | * [ ] Dataverse: VERSION 26 | 27 | **Follow best practices** 28 | 29 | * [ ] Have you checked to ensure there aren't other open [Pull Requests](https://github.com/gdcc/pyDataverse/pulls) for the same update/change? 30 | * [ ] Have you followed the guidelines in our [Contribution Guide](https://pydataverse.readthedocs.io/en/master/contributing/contributing.html)? 31 | * [ ] Have you read the [Code of Conduct](https://github.com/gdcc/pyDataverse/blob/master/CODE_OF_CONDUCT.md)? 32 | * [ ] Do your changes in a separate branch. Branches MUST have descriptive names. 33 | * [ ] Have you merged the latest changes from upstream to your branch? 34 | 35 | **Describe the PR** 36 | 37 | * [ ] What kind of change does this PR introduce? 38 | * TEXT 39 | * [ ] Why is this change required? What problem does it solve? 40 | * TEXT 41 | * [ ] Screenshots (if appropriate) 42 | * [ ] Put `Closes #ISSUE_NUMBER` to the end of this pull request 43 | 44 | **Testing** 45 | 46 | * [ ] Have you used tox and/or pytest for testing the changes? 47 | * [ ] Did the local testing ran successfully? 48 | * [ ] Did the Continuous Integration testing (Travis-CI) ran successfully? 49 | 50 | **Commits** 51 | 52 | * [ ] Have descriptive commit messages with a short title (first line). 53 | * [ ] Use the [commit message template](https://github.com/gdcc/pyDataverse/blob/master/.github/.gitmessage.txt) 54 | * [ ] Put `Closes #ISSUE_NUMBER` in your commit messages to auto-close the issue that it fixes (if such). 55 | 56 | **Others** 57 | 58 | * [ ] Is there anything you need from someone else? 59 | 60 | ### Documentation contribution 61 | 62 | * [ ] Have you followed NumPy Docstring standard? 63 | 64 | ### Code contribution 65 | 66 | * [ ] Have you used pre-commit? 67 | * [ ] Have you formatted your code with black prior to submission (e. g. via pre-commit)? 68 | * [ ] Have you written new tests for your changes? 69 | * [ ] Have you ran mypy on your changes successfully? 70 | * [ ] Have you documented your update (Docstrings and/or Docs)? 71 | * [ ] Do your changes require additional changes to the documentation? 
72 | 73 | 74 | Closes #ISSUE_NUMBER 75 | -------------------------------------------------------------------------------- /.github/workflows/build.yml: -------------------------------------------------------------------------------- 1 | name: Build PyDataverse 2 | on: [push] 3 | 4 | jobs: 5 | build: 6 | runs-on: ubuntu-latest 7 | strategy: 8 | matrix: 9 | python-version: ["3.8", "3.9", "3.10", "3.11"] 10 | name: Build pyDataverse 11 | steps: 12 | - name: "Checkout" 13 | uses: "actions/checkout@v4" 14 | - name: Setup Python 15 | uses: actions/setup-python@v3 16 | with: 17 | python-version: ${{ matrix.python-version }} 18 | - name: Install Python Dependencies 19 | run: | 20 | python3 -m pip install --upgrade pip 21 | python3 -m pip install poetry 22 | 23 | poetry install 24 | -------------------------------------------------------------------------------- /.github/workflows/codespell.yml: -------------------------------------------------------------------------------- 1 | # Codespell configuration is within setup.cfg 2 | --- 3 | name: Codespell 4 | 5 | on: 6 | push: 7 | branches: [master] 8 | pull_request: 9 | branches: [master] 10 | 11 | permissions: 12 | contents: read 13 | 14 | jobs: 15 | codespell: 16 | name: Check for spelling errors 17 | runs-on: ubuntu-latest 18 | 19 | steps: 20 | - name: Checkout 21 | uses: actions/checkout@v4 22 | - name: Codespell 23 | uses: codespell-project/actions-codespell@v2 24 | -------------------------------------------------------------------------------- /.github/workflows/lint.yml: -------------------------------------------------------------------------------- 1 | name: Ruff 2 | on: [push, pull_request] 3 | jobs: 4 | ruff: 5 | runs-on: ubuntu-latest 6 | steps: 7 | - uses: actions/checkout@v4 8 | - uses: chartboost/ruff-action@v1 9 | -------------------------------------------------------------------------------- /.github/workflows/publish.yml: -------------------------------------------------------------------------------- 1 | name: Build and publish 2 | 3 | on: 4 | release: 5 | types: [released] 6 | 7 | jobs: 8 | deploy: 9 | runs-on: ubuntu-latest 10 | 11 | steps: 12 | - uses: actions/checkout@v2 13 | - name: "Build and publish to PyPi" 14 | uses: JRubics/poetry-publish@v1.17 15 | with: 16 | pypi_token: ${{ secrets.PYPI_TOKEN }} 17 | -------------------------------------------------------------------------------- /.github/workflows/tests.yml: -------------------------------------------------------------------------------- 1 | name: Unit tests 2 | on: [push] 3 | 4 | jobs: 5 | custom_test: 6 | runs-on: ubuntu-latest 7 | strategy: 8 | matrix: 9 | python-version: ["3.8", "3.9", "3.10", "3.11"] 10 | name: Test pyDataverse 11 | env: 12 | PORT: 8080 13 | steps: 14 | - name: "Checkout" 15 | uses: "actions/checkout@v4" 16 | - name: Run Dataverse Action 17 | id: dataverse 18 | uses: gdcc/dataverse-action@main 19 | - name: Setup Python 20 | uses: actions/setup-python@v3 21 | with: 22 | python-version: ${{ matrix.python-version }} 23 | - name: Install Python Dependencies 24 | run: | 25 | python3 -m pip install --upgrade pip 26 | python3 -m pip install poetry 27 | 28 | poetry install --with tests 29 | 30 | - name: Run tests 31 | env: 32 | API_TOKEN_SUPERUSER: ${{ steps.dataverse.outputs.api_token }} 33 | API_TOKEN: ${{ steps.dataverse.outputs.api_token }} 34 | BASE_URL: ${{ steps.dataverse.outputs.base_url }} 35 | DV_VERSION: ${{ steps.dataverse.outputs.dv_version }} 36 | run: | 37 | python3 -m poetry run pytest 38 | 
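For reference, the variables exported above (BASE_URL, API_TOKEN, API_TOKEN_SUPERUSER, DV_VERSION) are the values the test suite picks up from the environment (see the manual setup section in README.md). A minimal sketch of consuming them outside CI — not a file in this repository, and assuming a Dataverse instance is already running — could look like this, with `NativeApi` used as in the docs Quickstart:

```python
# Minimal sketch: connect to the Dataverse instance described by the
# environment variables that the workflow (or run-tests.sh) exports.
import os

from pyDataverse.api import NativeApi

base_url = os.environ["BASE_URL"]     # e.g. http://localhost:8080
api_token = os.environ["API_TOKEN"]   # superuser token from the bootstrap step

api = NativeApi(base_url, api_token)
```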
-------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Local testing dv artifacts 2 | dv 3 | solr 4 | 5 | # Apple artifacts 6 | .DS_Store 7 | 8 | # Byte-compiled / optimized / DLL files 9 | __pycache__/ 10 | *.py[cod] 11 | *$py.class 12 | 13 | # C extensions 14 | *.so 15 | 16 | # Distribution / packaging 17 | .Python 18 | build/ 19 | develop-eggs/ 20 | dist/ 21 | downloads/ 22 | eggs/ 23 | .eggs/ 24 | lib/ 25 | lib64/ 26 | parts/ 27 | sdist/ 28 | var/ 29 | wheels/ 30 | share/python-wheels/ 31 | *.egg-info/ 32 | .installed.cfg 33 | *.egg 34 | MANIFEST 35 | 36 | # PyInstaller 37 | # Usually these files are written by a python script from a template 38 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 39 | *.manifest 40 | *.spec 41 | 42 | # Installer logs 43 | pip-log.txt 44 | pip-delete-this-directory.txt 45 | 46 | # Unit test / coverage reports 47 | htmlcov/ 48 | .tox/ 49 | .nox/ 50 | .coverage 51 | .coverage.* 52 | .cache 53 | nosetests.xml 54 | coverage.xml 55 | *.cover 56 | *.py,cover 57 | .hypothesis/ 58 | .pytest_cache/ 59 | cover/ 60 | 61 | # Translations 62 | *.mo 63 | *.pot 64 | 65 | # Django stuff: 66 | *.log 67 | local_settings.py 68 | db.sqlite3 69 | db.sqlite3-journal 70 | 71 | # Flask stuff: 72 | instance/ 73 | .webassets-cache 74 | 75 | # Scrapy stuff: 76 | .scrapy 77 | 78 | # Sphinx documentation 79 | docs/_build/ 80 | 81 | # PyBuilder 82 | .pybuilder/ 83 | target/ 84 | 85 | # Jupyter Notebook 86 | .ipynb_checkpoints 87 | 88 | # IPython 89 | profile_default/ 90 | ipython_config.py 91 | 92 | # pyenv 93 | # For a library or package, you might want to ignore these files since the code is 94 | # intended to run in multiple environments; otherwise, check them in: 95 | # .python-version 96 | 97 | # pipenv 98 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 99 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 100 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 101 | # install all needed dependencies. 102 | #Pipfile.lock 103 | 104 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 105 | __pypackages__/ 106 | 107 | # Celery stuff 108 | celerybeat-schedule 109 | celerybeat.pid 110 | 111 | # SageMath parsed files 112 | *.sage.py 113 | 114 | # Environments 115 | .env 116 | .venv 117 | env/ 118 | venv/ 119 | ENV/ 120 | env.bak/ 121 | venv.bak/ 122 | 123 | # Spyder project settings 124 | .spyderproject 125 | .spyproject 126 | 127 | # Rope project settings 128 | .ropeproject 129 | 130 | # mkdocs documentation 131 | /site 132 | 133 | # mypy 134 | .mypy_cache/ 135 | .dmypy.json 136 | dmypy.json 137 | 138 | # Pyre type checker 139 | .pyre/ 140 | 141 | # pytype static type analyzer 142 | .pytype/ 143 | 144 | # Cython debug symbols 145 | cython_debug/ 146 | 147 | # Personal 148 | notes*.md 149 | stash*.* 150 | setup.sh 151 | .pypirc 152 | data/ 153 | !tests/data 154 | tests/data/output/ 155 | dev/ 156 | internal 157 | *.code-workspace 158 | .python-version 159 | devel.py 160 | test_manual.py 161 | /docs 162 | src/pyDataverse/docs/build 163 | src/pyDataverse/docs/source/_build 164 | src/pyDataverse/docs/Makefile 165 | env-config/ 166 | wiki/ 167 | 168 | # Poetry lock 169 | poetry.lock 170 | 171 | # Ruff 172 | .ruff_cache/ 173 | 174 | # JetBrains 175 | .idea/ 176 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | default_language_version: 2 | python: python3 3 | exclude: ^migrations/ 4 | repos: 5 | - repo: https://github.com/pre-commit/pre-commit-hooks 6 | rev: v4.6.0 7 | hooks: 8 | - id: check-added-large-files 9 | - id: check-case-conflict 10 | - id: check-docstring-first 11 | - id: check-json 12 | - id: check-symlinks 13 | - id: check-xml 14 | - id: check-yaml 15 | - id: detect-private-key 16 | - id: end-of-file-fixer 17 | - id: pretty-format-json 18 | args: [--autofix, --no-sort-keys] 19 | 20 | - repo: https://github.com/astral-sh/ruff-pre-commit 21 | rev: v0.4.4 22 | hooks: 23 | - id: ruff 24 | args: [--fix] 25 | - id: ruff-format 26 | 27 | - repo: https://github.com/asottile/blacken-docs 28 | rev: v1.7.0 29 | hooks: 30 | - id: blacken-docs 31 | additional_dependencies: [black==19.10b0] 32 | 33 | - repo: https://github.com/codespell-project/codespell 34 | # Configuration for codespell is in setup.cfg 35 | rev: v2.2.6 36 | hooks: 37 | - id: codespell 38 | -------------------------------------------------------------------------------- /.readthedocs.yml: -------------------------------------------------------------------------------- 1 | # .readthedocs.yml 2 | # Read the Docs configuration file 3 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details 4 | 5 | # Required 6 | version: 2 7 | 8 | # Build documentation in the docs/ directory with Sphinx 9 | sphinx: 10 | configuration: pyDataverse/docs/source/conf.py 11 | 12 | # Optionally build your docs in additional formats such as PDF and ePub 13 | formats: all 14 | 15 | # Optionally set the version of Python and requirements required to build your docs 16 | python: 17 | version: 3.6 18 | install: 19 | - requirements: requirements/docs.txt 20 | - method: pip 21 | path: . 
22 | system_packages: true 23 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | cache: pip 3 | dist: xenial 4 | 5 | matrix: 6 | include: 7 | - python: 3.6 8 | env: TOXENV=py36 9 | - python: 3.7 10 | env: TOXENV=py37 11 | - python: 3.8 12 | env: TOXENV=py38 13 | - python: 3.6 14 | env: TOXENV=docs 15 | - python: 3.6 16 | env: TOXENV=coverage 17 | - python: 3.6 18 | env: TOXENV=coveralls 19 | 20 | branches: 21 | only: 22 | - master 23 | - develop 24 | 25 | before_install: 26 | - echo $TRAVIS_PYTHON_VERSION 27 | 28 | install: 29 | - pip install tox-travis 30 | - pip install coverage 31 | - pip install coveralls 32 | - virtualenv --version 33 | - easy_install --version 34 | - pip --version 35 | - tox --version 36 | 37 | script: 38 | - tox 39 | 40 | after_success: 41 | - coveralls 42 | 43 | notifications: 44 | email: 45 | recipients: 46 | - stefan.kasberger@univie.ac.at 47 | on_success: change 48 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to making participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | - Using welcoming and inclusive language 18 | - Being respectful of differing viewpoints and experiences 19 | - Gracefully accepting constructive criticism 20 | - Focusing on what is best for the community 21 | - Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | - The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | - Trolling, insulting/derogatory comments, and personal or political attacks 28 | - Public or private harassment 29 | - Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | - Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies within all project spaces, and it also applies when 49 | an individual is representing the project or its community in public spaces. 
50 | Examples of representing a project or community include using an official 51 | project e-mail address, posting via an official social media account, or acting 52 | as an appointed representative at an online or offline event. Representation of 53 | a project may be further defined and clarified by project maintainers. 54 | 55 | ## Enforcement 56 | 57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 58 | reported by contacting the project lead at stefan.kasberger@univie.ac.at. All 59 | complaints will be reviewed and investigated and will result in a response that 60 | is deemed necessary and appropriate to the circumstances. The project team is 61 | obligated to maintain confidentiality with regard to the reporter of an incident. 62 | Further details of specific enforcement policies may be posted separately. 63 | 64 | Project maintainers who do not follow or enforce the Code of Conduct in good 65 | faith may face temporary or permanent repercussions as determined by other 66 | members of the project's leadership. 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 71 | available at 72 | 73 | [homepage]: https://www.contributor-covenant.org 74 | 75 | For answers to common questions about this code of conduct, see 76 | 77 | -------------------------------------------------------------------------------- /HISTORY.rst: -------------------------------------------------------------------------------- 1 | .. _history: 2 | 3 | 4 | 0.3.1 - (2021-04-06) 5 | ---------------------------------------------------------- 6 | 7 | `Release `_ 8 | 9 | 10 | 0.3.0 - (2021-01-26) - Ruth Wodak 11 | ---------------------------------------------------------- 12 | 13 | `Release `_ 14 | 15 | 16 | 0.2.1 - (2019-06-19) 17 | ---------------------------------------------------------- 18 | 19 | `Release `_ 20 | 21 | 22 | 0.2.0 - (2019-06-18) - Ida Pfeiffer 23 | ---------------------------------------------------------- 24 | 25 | `Release `_ 26 | 27 | 28 | 0.1.1 - (2019-05-28) 29 | ---------------------------------------------------------- 30 | 31 | `Release `_ 32 | 33 | 34 | 0.1.0 - (2019-05-22) - Marietta Blau 35 | ---------------------------------------------------------- 36 | 37 | `Release `_ 38 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | ===================== 3 | 4 | Copyright © 2024 Stefan Kasberger, Jan Range 5 | 6 | Permission is hereby granted, free of charge, to any person 7 | obtaining a copy of this software and associated documentation 8 | files (the “Software”), to deal in the Software without 9 | restriction, including without limitation the rights to use, 10 | copy, modify, merge, publish, distribute, sublicense, and/or sell 11 | copies of the Software, and to permit persons to whom the 12 | Software is furnished to do so, subject to the following 13 | conditions: 14 | 15 | The above copyright notice and this permission notice shall be 16 | included in all copies or substantial portions of the Software. 17 | 18 | THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, 19 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 20 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 21 | NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 22 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 23 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 24 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 25 | OTHER DEALINGS IN THE SOFTWARE. 26 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | recursive-include pyDataverse/schemas * 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![PyPI](https://img.shields.io/pypi/v/pyDataverse.svg)](https://pypi.org/project/pyDataverse/) [![Conda Version](https://img.shields.io/conda/vn/conda-forge/pydataverse.svg)](https://anaconda.org/conda-forge/pydataverse/) ![Build Status](https://github.com/gdcc/pyDataverse/actions/workflows/tests.yml/badge.svg) [![Coverage Status](https://coveralls.io/repos/github/gdcc/pyDataverse/badge.svg)](https://coveralls.io/github/gdcc/pyDataverse) [![Documentation Status](https://readthedocs.org/projects/pydataverse/badge/?version=latest)](https://pydataverse.readthedocs.io/en/latest) PyPI - Python Version [![GitHub](https://img.shields.io/github/license/gdcc/pydataverse.svg)](https://opensource.org/licenses/MIT) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4664557.svg)](https://doi.org/10.5281/zenodo.4664557) 2 | 3 | # pyDataverse 4 | 5 | [![Project Status: Unsupported – The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) 6 | 7 | pyDataverse is a Python module for [Dataverse](http://dataverse.org). 8 | It helps to access the Dataverse [API's](http://guides.dataverse.org/en/latest/api/index.html) and manipulate, validate, import and export all Dataverse data-types (Dataverse, Dataset, Datafile). 9 | 10 | **Find out more: [Read the Docs](https://pydataverse.readthedocs.io/en/latest/)** 11 | 12 | # Running tests 13 | 14 | In order to run the tests, you need to have a Dataverse instance running. We have prepared a shell script that will start a Dataverse instance using Docker that runs all tests in a clean environment. To run the tests, execute the following command: 15 | 16 | ```bash 17 | # Defaults to Python 3.11 18 | ./run_tests.sh 19 | 20 | # To run the tests with a specific Python version 21 | ./run_tests.sh -p 3.8 22 | ``` 23 | 24 | Once finished, you can find the test results in the `dv/unit-tests.log` file and in the terminal. 25 | 26 | ## Manual setup 27 | 28 | If you want to run single tests you need to manually set up the environment and set up the necessary environment variables. Please follow the instructions below. 29 | 30 | **1. Start the Dataverse instance** 31 | 32 | ```bash 33 | docker compose \ 34 | -f ./docker/docker-compose-base.yml \ 35 | --env-file local-test.env \ 36 | up -d 37 | ``` 38 | 39 | **2. Set up the environment variables** 40 | 41 | ```bash 42 | export BASE_URL=http://localhost:8080 43 | export DV_VERSION=6.2 # or any other version 44 | export $(grep "API_TOKEN" "dv/bootstrap.exposed.env") 45 | export API_TOKEN_SUPERUSER=$API_TOKEN 46 | ``` 47 | 48 | **3. 
Run the test(s) with pytest** 49 | 50 | ```bash 51 | python -m pytest -v 52 | ``` 53 | 54 | ## Chat with us! 55 | 56 | If you are interested in the development of pyDataverse, we invite you to join us for a chat on our [Zulip Channel](https://dataverse.zulipchat.com/#narrow/stream/377090-python). This is the perfect place to discuss and exchange ideas about the development of pyDataverse. Whether you need help or have ideas to share, feel free to join us! 57 | 58 | ## PyDataverse Working Group 59 | 60 | We have formed a [pyDataverse working group](https://py.gdcc.io) to exchange ideas and collaborate on pyDataverse. There is a bi-weekly meeting planned for this purpose, and you are welcome to join us by clicking the following [WebEx meeting link](https://unistuttgart.webex.com/unistuttgart/j.php?MTID=m322473ae7c744792437ce854422e52a3). For a list of all the scheduled dates, please refer to the [Dataverse Community calendar](https://calendar.google.com/calendar/embed?src=c_udn4tonm401kgjjre4jl4ja0cs%40group.calendar.google.com&ctz=America%2FNew_York). 61 | -------------------------------------------------------------------------------- /assets/function_schematic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/assets/function_schematic.png -------------------------------------------------------------------------------- /docker/docker-compose-base.yml: -------------------------------------------------------------------------------- 1 | version: "2.4" 2 | name: pydataverse 3 | services: 4 | dataverse: 5 | container_name: "dataverse" 6 | hostname: dataverse 7 | image: ${DATAVERSE_IMAGE} 8 | restart: on-failure 9 | user: payara 10 | environment: 11 | - DATAVERSE_DB_HOST=postgres 12 | - DATAVERSE_DB_USER=${DATAVERSE_DB_USER} 13 | - DATAVERSE_DB_PASSWORD=${DATAVERSE_DB_PASSWORD} 14 | - JVM_ARGS=-Ddataverse.pid.providers=fake 15 | -Ddataverse.pid.default-provider=fake 16 | -Ddataverse.pid.fake.type=FAKE 17 | -Ddataverse.pid.fake.label=FakeDOIProvider 18 | -Ddataverse.pid.fake.authority=10.5072 19 | -Ddataverse.pid.fake.shoulder=FK2/ 20 | ports: 21 | - "8080:8080" 22 | networks: 23 | - dataverse 24 | depends_on: 25 | postgres: 26 | condition: service_started 27 | solr: 28 | condition: service_started 29 | dv_initializer: 30 | condition: service_completed_successfully 31 | volumes: 32 | - ${PWD}/dv/data:/dv 33 | - ${PWD}:/secrets 34 | tmpfs: 35 | - /dumps:mode=770,size=2052M,uid=1000,gid=1000 36 | - /tmp:mode=770,size=2052M,uid=1000,gid=1000 37 | mem_limit: 2147483648 # 2 GiB 38 | mem_reservation: 1024m 39 | privileged: false 40 | healthcheck: 41 | test: curl --fail http://dataverse:8080/api/info/version || exit 1 42 | interval: 10s 43 | retries: 20 44 | start_period: 20s 45 | timeout: 240s 46 | 47 | dv_initializer: 48 | container_name: "dv_initializer" 49 | image: ${CONFIGBAKER_IMAGE} 50 | restart: "no" 51 | command: 52 | - sh 53 | - -c 54 | - "fix-fs-perms.sh dv" 55 | volumes: 56 | - ${PWD}/dv/data:/dv 57 | 58 | postgres: 59 | container_name: "postgres" 60 | hostname: postgres 61 | image: postgres:${POSTGRES_VERSION} 62 | restart: on-failure 63 | environment: 64 | - POSTGRES_USER=${DATAVERSE_DB_USER} 65 | - POSTGRES_PASSWORD=${DATAVERSE_DB_PASSWORD} 66 | ports: 67 | - "5432:5432" 68 | networks: 69 | - dataverse 70 | 71 | solr_initializer: 72 | container_name: "solr_initializer" 73 | image: ${CONFIGBAKER_IMAGE} 74 | restart: "no" 75 | command: 76 | - sh 77 | - -c 78 | - 
"fix-fs-perms.sh solr && cp -a /template/* /solr-template" 79 | volumes: 80 | - ${PWD}/solr/data:/var/solr 81 | - ${PWD}/solr/conf:/solr-template 82 | 83 | solr: 84 | container_name: "solr" 85 | hostname: "solr" 86 | image: solr:${SOLR_VERSION} 87 | depends_on: 88 | solr_initializer: 89 | condition: service_completed_successfully 90 | restart: on-failure 91 | ports: 92 | - "8983:8983" 93 | networks: 94 | - dataverse 95 | command: 96 | - "solr-precreate" 97 | - "collection1" 98 | - "/template" 99 | volumes: 100 | - ${PWD}/solr/data:/var/solr 101 | - ${PWD}/solr/conf:/template 102 | 103 | smtp: 104 | container_name: "smtp" 105 | hostname: "smtp" 106 | image: maildev/maildev:2.0.5 107 | restart: on-failure 108 | expose: 109 | - "25" # smtp server 110 | environment: 111 | - MAILDEV_SMTP_PORT=25 112 | - MAILDEV_MAIL_DIRECTORY=/mail 113 | networks: 114 | - dataverse 115 | tmpfs: 116 | - /mail:mode=770,size=128M,uid=1000,gid=1000 117 | 118 | bootstrap: 119 | container_name: "bootstrap" 120 | hostname: "bootstrap" 121 | image: ${CONFIGBAKER_IMAGE} 122 | restart: "no" 123 | networks: 124 | - dataverse 125 | volumes: 126 | - ${PWD}/dv/bootstrap.exposed.env:/.env 127 | command: 128 | - sh 129 | - -c 130 | - "bootstrap.sh -e /.env dev" 131 | depends_on: 132 | dataverse: 133 | condition: service_healthy 134 | 135 | networks: 136 | dataverse: 137 | driver: bridge 138 | -------------------------------------------------------------------------------- /docker/docker-compose-test-all.yml: -------------------------------------------------------------------------------- 1 | version: "2.4" 2 | services: 3 | unit-tests: 4 | container_name: unit-tests 5 | image: python:${PYTHON_VERSION}-slim 6 | environment: 7 | BASE_URL: http://dataverse:8080 8 | DV_VERSION: 6.3 9 | networks: 10 | - dataverse 11 | volumes: 12 | - ${PWD}:/pydataverse 13 | - ../dv:/dv 14 | command: 15 | - sh 16 | - -c 17 | - | 18 | # Fetch the API Token from the local file 19 | export $(grep "API_TOKEN" "dv/bootstrap.exposed.env") 20 | export API_TOKEN_SUPERUSER=$$API_TOKEN 21 | cd /pydataverse 22 | 23 | # Run the unit tests 24 | python3 -m pip install --upgrade pip 25 | python3 -m pip install pytest pytest-cov 26 | python3 -m pip install -e . 27 | python3 -m pytest > /dv/unit-tests.log 28 | 29 | depends_on: 30 | bootstrap: 31 | condition: service_completed_successfully 32 | -------------------------------------------------------------------------------- /local-test.env: -------------------------------------------------------------------------------- 1 | # Dataverse 2 | DATAVERSE_IMAGE=docker.io/gdcc/dataverse:unstable 3 | DATAVERSE_DB_USER=dataverse 4 | DATAVERSE_DB_PASSWORD=secret 5 | CONFIGBAKER_IMAGE=docker.io/gdcc/configbaker:unstable 6 | 7 | # Services 8 | POSTGRES_VERSION=15 9 | SOLR_VERSION=9.3.0 10 | -------------------------------------------------------------------------------- /pyDataverse/__init__.py: -------------------------------------------------------------------------------- 1 | """Find out more at https://github.com/GDCC/pyDataverse. 2 | 3 | Copyright 2019 Stefan Kasberger 4 | 5 | Licensed under the MIT License. 
6 | """ 7 | 8 | from __future__ import absolute_import 9 | 10 | 11 | __author__ = "Stefan Kasberger" 12 | __email__ = "stefan.kasberger@univie.ac.at" 13 | __copyright__ = "Copyright (c) 2019 Stefan Kasberger" 14 | __license__ = "MIT License" 15 | __version__ = "0.3.4" 16 | __url__ = "https://github.com/GDCC/pyDataverse" 17 | __download_url__ = "https://pypi.python.org/pypi/pyDataverse" 18 | __description__ = "A Python module for Dataverse." 19 | -------------------------------------------------------------------------------- /pyDataverse/auth.py: -------------------------------------------------------------------------------- 1 | """This module contains authentication handlers compatible with :class:`httpx.Auth`""" 2 | 3 | from typing import Generator 4 | 5 | from httpx import Auth, Request, Response 6 | 7 | from pyDataverse.exceptions import ApiAuthorizationError 8 | 9 | 10 | class ApiTokenAuth(Auth): 11 | """An authentication handler to add an API token as the X-Dataverse-key 12 | header. 13 | 14 | For more information on how to retrieve an API token and how it is used, 15 | please refer to https://guides.dataverse.org/en/latest/api/auth.html. 16 | """ 17 | 18 | def __init__(self, api_token: str): 19 | """Initializes the auth handler with an API token. 20 | 21 | Parameters 22 | ---------- 23 | api_token : str 24 | The API token retrieved from your Dataverse instance user profile. 25 | 26 | Examples 27 | -------- 28 | 29 | >>> import os 30 | >>> from pyDataverse.api import DataAccessApi 31 | >>> base_url = 'https://demo.dataverse.org' 32 | >>> api_token_auth = ApiTokenAuth(os.getenv('API_TOKEN')) 33 | >>> api = DataAccessApi(base_url, api_token_auth) 34 | 35 | """ 36 | if not isinstance(api_token, str): 37 | raise ApiAuthorizationError("API token passed is not a string.") 38 | self.api_token = api_token 39 | 40 | def auth_flow(self, request: Request) -> Generator[Request, Response, None]: 41 | """Adds the X-Dataverse-key header with the API token and yields the 42 | original :class:`httpx.Request`. 43 | 44 | Parameters 45 | ---------- 46 | request : httpx.Request 47 | The request object which requires authentication headers 48 | 49 | Yields 50 | ------ 51 | httpx.Request 52 | The original request with modified headers 53 | """ 54 | request.headers["X-Dataverse-key"] = self.api_token 55 | yield request 56 | 57 | 58 | class BearerTokenAuth(Auth): 59 | """An authentication handler to add a Bearer token as defined in `RFC 6750 60 | `_ to the request. 61 | 62 | A bearer token could be obtained from an OIDC provider, for example, 63 | Keycloak. 64 | """ 65 | 66 | def __init__(self, bearer_token: str): 67 | """Initializes the auth handler with a bearer token. 68 | 69 | Parameters 70 | ---------- 71 | bearer_token : str 72 | The bearer token retrieved from your OIDC provider. 73 | 74 | Examples 75 | -------- 76 | 77 | >>> import os 78 | >>> from pyDataverse.api import DataAccessApi 79 | >>> base_url = 'https://demo.dataverse.org' 80 | >>> bearer_token_auth = OAuthBearerTokenAuth(os.getenv('OAUTH_TOKEN')) 81 | >>> api = DataAccessApi(base_url, bearer_token_auth) 82 | 83 | """ 84 | if not isinstance(bearer_token, str): 85 | raise ApiAuthorizationError("API token passed is not a string.") 86 | self.bearer_token = bearer_token 87 | 88 | def auth_flow(self, request: Request) -> Generator[Request, Response, None]: 89 | """Adds the X-Dataverse-key header with the API token and yields the 90 | original :class:`httpx.Request`. 
91 | 92 | Parameters 93 | ---------- 94 | request : httpx.Request 95 | The request object which requires authentication headers 96 | 97 | Yields 98 | ------ 99 | httpx.Request 100 | The original request with modified headers 101 | """ 102 | request.headers["Authorization"] = f"Bearer {self.bearer_token}" 103 | yield request 104 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/_images/collection_dataset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/pyDataverse/docs/source/_images/collection_dataset.png -------------------------------------------------------------------------------- /pyDataverse/docs/source/_static/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/pyDataverse/docs/source/_static/.gitkeep -------------------------------------------------------------------------------- /pyDataverse/docs/source/_templates/layout.html: -------------------------------------------------------------------------------- 1 | {% extends "!layout.html" %} 2 | {% block extrahead %} 3 | 4 | 17 | 18 | {% endblock %} 19 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/_templates/sidebar_intro.html: -------------------------------------------------------------------------------- 1 |

[sidebar_intro.html: the HTML/Jinja markup of this template was lost during extraction; only fragments remain. Recoverable content: the project name ("{{ project }}"), the theme description ("{{ theme_description }}"), the credit line "Developed by Stefan Kasberger at AUSSDA - The Austrian Social Science Data Archive.", and a Travis CI build badge (https://travis-ci.com/gdcc/pyDataverse.svg?branch=master).]
16 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/_templates/sidebar_related-links.html: -------------------------------------------------------------------------------- 1 |

Useful Links
[sidebar_related-links.html: the HTML markup of this template was lost during extraction; only the "Useful Links" heading remains, followed by a short list of related links.]
2 | 9 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/community/contact.rst: -------------------------------------------------------------------------------- 1 | .. _community_contact: 2 | 3 | Contact 4 | ================= 5 | 6 | If you'd like to get in touch with the community and development of pyDataverse, 7 | there are several options: 8 | 9 | 10 | GitHub 11 | ------ 12 | 13 | The best way to track the development of pyDataverse is through the 14 | `GitHub repo `_. 15 | 16 | 17 | Email 18 | ------- 19 | 20 | The author of pyDataverse, Stefan Kasberger, can also be contacted 21 | directly via Email. 22 | 23 | - stefan.kasberger@univie.ac.at 24 | 25 | 26 | Twitter 27 | ------- 28 | 29 | pyDataverse is developed at AUSSDA - The Austrian Social Science Data Archive. 30 | You can get regular updates of pyDataverse from our Twitter account, or get in 31 | touch with. 32 | 33 | - `@theaussda `_ 34 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/community/releases.rst: -------------------------------------------------------------------------------- 1 | .. _community_history: 2 | 3 | Release History 4 | =========================== 5 | 6 | .. include:: ../../../../../HISTORY.rst 7 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/conf.py: -------------------------------------------------------------------------------- 1 | # Configuration file for the Sphinx documentation builder. 2 | # 3 | # This file does only contain a selection of the most common options. For a 4 | # full list see the documentation: 5 | # http://www.sphinx-doc.org/en/master/config 6 | 7 | # -- Path setup -------------------------------------------------------------- 8 | 9 | # If extensions (or modules to document with autodoc) are in another directory, 10 | # add these directories to sys.path here. If the directory is relative to the 11 | # documentation root, use os.path.abspath to make it absolute, like shown here. 12 | # 13 | # sys.path.insert(0, os.path.abspath('../../')) 14 | 15 | # -- Project information ----------------------------------------------------- 16 | 17 | import pyDataverse 18 | from datetime import date 19 | import os 20 | import sys 21 | 22 | sys.path.insert(0, os.path.abspath("../..")) 23 | 24 | project = "pyDataverse" 25 | author = "Stefan Kasberger" 26 | author_affiliation = "AUSSDA - The Austrian Social Science Data Archive" 27 | copyright = "{0}, {1}".format(date.today().strftime("%Y"), author) 28 | description = "pyDataverse helps with the Dataverse API's and data types (Dataverse, Dataset, Datafile)." 29 | 30 | # The short X.Y version 31 | version = pyDataverse.__version__ 32 | # The full version, including alpha/beta/rc tags 33 | release = pyDataverse.__version__ 34 | 35 | 36 | # -- General configuration --------------------------------------------------- 37 | 38 | # If your documentation needs a minimal Sphinx version, state it here. 39 | # 40 | # needs_sphinx = '1.0' 41 | 42 | # Add any Sphinx extension module names here, as strings. They can be 43 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 44 | # ones. 
45 | extensions = [ 46 | "sphinx.ext.autodoc", 47 | "sphinx.ext.doctest", 48 | "sphinx.ext.coverage", 49 | "sphinx.ext.napoleon", 50 | "sphinx.ext.todo", 51 | "sphinx.ext.viewcode", 52 | "sphinx.ext.intersphinx", 53 | ] 54 | 55 | # Add any paths that contain templates here, relative to this directory. 56 | templates_path = ["_templates"] 57 | 58 | # The suffix(es) of source filenames. 59 | # You can specify multiple suffix as a list of string: 60 | # 61 | # source_suffix = ['.rst', '.md'] 62 | source_suffix = ".rst" 63 | 64 | # The master toctree document. 65 | master_doc = "index" 66 | 67 | # The language for content autogenerated by Sphinx. Refer to documentation 68 | # for a list of supported languages. 69 | # 70 | # This is also used if you do content translation via gettext catalogs. 71 | # Usually you set "language" from the command line for these cases. 72 | language = "en" 73 | 74 | # If true, the current module name will be prepended to all description 75 | # unit titles (such as .. function::). 76 | add_module_names = False 77 | 78 | # List of patterns, relative to source directory, that match files and 79 | # directories to ignore when looking for source files. 80 | # This pattern also affects html_static_path and html_extra_path . 81 | exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "code/"] 82 | 83 | # The name of the Pygments (syntax highlighting) style to use. 84 | pygments_style = "sphinx" 85 | 86 | # -- Options for HTML output ------------------------------------------------- 87 | 88 | # The theme to use for HTML and HTML Help pages. See the documentation for 89 | # a list of builtin themes. 90 | # 91 | html_theme = "alabaster" 92 | 93 | # Theme options are theme-specific and customize the look and feel of a theme 94 | # further. For a list of options available for each theme, see the 95 | # documentation. 96 | # 97 | html_theme_options = { 98 | "description": description, 99 | "show_powered_by": False, 100 | "github_button": True, 101 | "github_user": "gdcc", 102 | "github_repo": "pyDataverse", 103 | "github_banner": False, 104 | "travis_button": True, 105 | } 106 | 107 | # Add any paths that contain custom static files (such as style sheets) here, 108 | # relative to this directory. They are copied after the builtin static files, 109 | # so a file named "default.css" will overwrite the builtin "default.css". 110 | html_static_path = ["_static"] 111 | 112 | # Custom sidebar templates, must be a dictionary that maps document names 113 | # to template names. 114 | # 115 | # The default sidebars (for documents that don't match any pattern) are 116 | # defined by theme itself. Builtin themes are using these templates by 117 | # default: ``['localtoc.html', 'relations.html', 'sourcelink.html', 118 | # 'searchbox.html']``. 119 | # 120 | html_sidebars = { 121 | "index": [ 122 | "sidebar_intro.html", 123 | "navigation.html", 124 | "sidebar_related-links.html", 125 | "sourcelink.html", 126 | "searchbox.html", 127 | ], 128 | "**": [ 129 | "sidebar_intro.html", 130 | "navigation.html", 131 | "sidebar_related-links.html", 132 | "sourcelink.html", 133 | "searchbox.html", 134 | ], 135 | } 136 | 137 | # -- Options for HTMLHelp output --------------------------------------------- 138 | 139 | # Output file base name for HTML help builder. 140 | htmlhelp_basename = "pyDataverse" 141 | 142 | 143 | # -- Options for LaTeX output ------------------------------------------------ 144 | 145 | latex_elements: dict[str, str] = { 146 | # The paper size ('letterpaper' or 'a4paper'). 
147 | # 148 | # 'papersize': 'letterpaper', 149 | # The font size ('10pt', '11pt' or '12pt'). 150 | # 151 | # 'pointsize': '10pt', 152 | # Additional stuff for the LaTeX preamble. 153 | # 154 | # 'preamble': '', 155 | # Latex figure (float) alignment 156 | # 157 | # 'figure_align': 'htbp', 158 | } 159 | 160 | # Grouping the document tree into LaTeX files. List of tuples 161 | # (source start file, target name, title, 162 | # author, documentclass [howto, manual, or own class]). 163 | latex_documents = [ 164 | ( 165 | master_doc, 166 | "pyDataverse.tex", 167 | "pyDataverse Documentation", 168 | "AUSSDA - Austrian Social Science Data Archive", 169 | "manual", 170 | ), 171 | ] 172 | 173 | # -- Options for manual page output ------------------------------------------ 174 | 175 | # One entry per manual page. List of tuples 176 | # (source start file, name, description, authors, manual section). 177 | man_pages = [(master_doc, "pyDataverse", "pyDataverse Documentation", [author], 1)] 178 | 179 | # -- Options for Texinfo output ---------------------------------------------- 180 | 181 | # Grouping the document tree into Texinfo files. List of tuples 182 | # (source start file, target name, title, author, 183 | # dir menu entry, description, category) 184 | texinfo_documents = [ 185 | ( 186 | master_doc, 187 | "pyDataverse", 188 | "pyDataverse Documentation", 189 | author, 190 | "pyDataverse", 191 | description, 192 | "Miscellaneous", 193 | ), 194 | ] 195 | 196 | 197 | # -- Extension configuration ------------------------------------------------- 198 | 199 | # Intersphinx 200 | 201 | intersphinx_mapping = { 202 | "python": ("https://docs.python.org/3", None), 203 | } 204 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/contributing/contributing.rst: -------------------------------------------------------------------------------- 1 | Contributor Guide 2 | ========================================= 3 | 4 | .. _contributing_contributing: 5 | 6 | .. include:: ../../../../CONTRIBUTING.rst 7 | :start-line: 2 8 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/index.rst: -------------------------------------------------------------------------------- 1 | .. _homepage: 2 | 3 | pyDataverse 4 | ========================================= 5 | 6 | Release v\ |version|. 7 | 8 | .. image:: https://img.shields.io/github/v/release/gdcc/pyDataverse 9 | :target: https://github.com/gdcc/pyDataverse 10 | 11 | .. image:: https://img.shields.io/conda/vn/conda-forge/pydataverse.svg 12 | :target: https://anaconda.org/conda-forge/pydataverse 13 | 14 | .. image:: https://travis-ci.com/gdcc/pyDataverse.svg?branch=master 15 | :target: https://travis-ci.com/gdcc/pyDataverse 16 | 17 | .. image:: https://img.shields.io/pypi/v/pyDataverse.svg 18 | :target: https://pypi.org/project/pyDataverse/ 19 | 20 | .. image:: https://img.shields.io/pypi/wheel/pyDataverse.svg 21 | :target: https://pypi.org/project/pyDataverse/ 22 | 23 | .. image:: https://img.shields.io/pypi/pyversions/pyDataverse.svg 24 | :target: https://pypi.org/project/pyDataverse/ 25 | 26 | .. image:: https://readthedocs.org/projects/pydataverse/badge/?version=latest 27 | :target: https://pydataverse.readthedocs.io/en/latest 28 | 29 | .. image:: https://coveralls.io/repos/github/gdcc/pyDataverse/badge.svg 30 | :target: https://coveralls.io/github/gdcc/pyDataverse 31 | 32 | .. 
image:: https://img.shields.io/github/license/gdcc/pydataverse.svg 33 | :target: https://opensource.org/licenses/MIT 34 | 35 | .. image:: https://img.shields.io/badge/code%20style-black-000000.svg 36 | :target: https://github.com/psf/black 37 | 38 | .. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4664557.svg 39 | :target: https://doi.org/10.5281/zenodo.4664557 40 | 41 | ------------------- 42 | 43 | .. _homepage_description: 44 | 45 | **pyDataverse** is a Python module for `Dataverse `_ you can use for: 46 | 47 | - accessing the Dataverse `API's `_ 48 | - manipulating and using the Dataverse (meta)data - Dataverses, Datasets, Datafiles 49 | 50 | No matter, if you want to import huge masses of data into 51 | Dataverse, test your Dataverse instance after deployment or want to make 52 | basic API calls: 53 | **pyDataverse helps you with Dataverse!** 54 | 55 | pyDataverse is fully Open Source and can be used by everybody. 56 | 57 | .. image:: https://www.repostatus.org/badges/latest/unsupported.svg 58 | :alt: Project Status: Unsupported – The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired. 59 | :target: https://www.repostatus.org/#unsupported 60 | 61 | pyDataverse is not supported right now. A new maintainer or funding is desired. Please contact the author `Stefan Kasberger `_, if you want to contribute in some way. 62 | 63 | .. _homepage_install: 64 | 65 | Install 66 | ----------------------------- 67 | 68 | To install pyDataverse, simply run this command in your terminal of choice: 69 | 70 | .. code-block:: shell 71 | 72 | pip install pyDataverse 73 | 74 | Or run this command to install using conda: 75 | 76 | .. code-block:: shell 77 | 78 | conda install pyDataverse -c conda-forge 79 | 80 | Find more options at :ref:`user_installation`. 81 | 82 | **Requirements** 83 | 84 | .. include:: snippets/requirements.rst 85 | 86 | 87 | .. _homepage_quickstart: 88 | 89 | Quickstart 90 | ----------------------------- 91 | 92 | .. include:: snippets/warning_production.rst 93 | 94 | **Import Dataset metadata JSON** 95 | 96 | To import the metadata of a Dataset from Dataverse's own JSON format, 97 | use :meth:`ds.from_json() `. 98 | The created :class:`Dataset ` can then 99 | be retrieved with :meth:`get() `. 100 | 101 | For this example, we use the ``dataset.json`` from 102 | ``tests/data/user-guide/`` 103 | (`GitHub repo `_) 104 | and place it in the root directory. 105 | 106 | :: 107 | 108 | >>> from pyDataverse.models import Dataset 109 | >>> from pyDataverse.utils import read_file 110 | >>> ds = Dataset() 111 | >>> ds_filename = "dataset.json" 112 | >>> ds.from_json(read_file(ds_filename)) 113 | >>> ds.get() 114 | {'citation_displayName': 'Citation Metadata', 'title': 'Youth in Austria 2005', 'author': [{'authorName': 'LastAuthor1, FirstAuthor1', 'authorAffiliation': 'AuthorAffiliation1'}], 'datasetContact': [{'datasetContactEmail': 'ContactEmail1@mailinator.com', 'datasetContactName': 'LastContact1, FirstContact1'}], 'dsDescription': [{'dsDescriptionValue': 'DescriptionText'}], 'subject': ['Medicine, Health and Life Sciences']} 115 | 116 | **Create Dataset by API** 117 | 118 | To access Dataverse's Native API, you first have to instantiate 119 | :class:`NativeApi `. Then create the 120 | Dataset through the API with 121 | :meth:`create_dataset() `. 122 | 123 | This returns, as all API functions do, a 124 | :class:`httpx.Response ` object, with the 125 | DOI inside ``data``. 
126 | 127 | Replace following variables with your own instance data 128 | before you execute the lines: 129 | 130 | - BASE_URL: Base URL of your Dataverse instance, without trailing slash (e. g. ``https://data.aussda.at``)) 131 | - API_TOKEN: API token of a Dataverse user with proper rights to create a Dataset 132 | - DV_PARENT_ALIAS: Alias of the Dataverse, the Dataset should be attached to. 133 | 134 | :: 135 | 136 | >>> from pyDataverse.api import NativeApi 137 | >>> api = NativeApi(BASE_URL, API_TOKEN) 138 | >>> resp = api.create_dataset(DV_PARENT_ALIAS, ds.json()) 139 | Dataset with pid 'doi:10.5072/FK2/UTGITX' created. 140 | >>> resp.json() 141 | {'status': 'OK', 'data': {'id': 251, 'persistentId': 'doi:10.5072/FK2/UTGITX'}} 142 | 143 | 144 | For more tutorials, check out 145 | :ref:`User Guide - Basic Usage ` and 146 | :ref:`User Guide - Advanced Usage `. 147 | 148 | 149 | .. _homepage_features: 150 | 151 | Features 152 | ----------------------------- 153 | 154 | - **Comprehensive API wrapper** for all Dataverse API’s and most of their endpoints 155 | - **Data models** for each of Dataverses data types: **Dataverse, Dataset and Datafile** 156 | - Data conversion to and from Dataverse's own JSON format for API uploads 157 | - **Easy mass imports and exports through CSV templates** 158 | - Utils with helper functions 159 | - **Documented** examples and functionalities 160 | - Custom exceptions 161 | - Tested (`Travis CI `_) and documented (`Read the Docs `_) 162 | - Open Source (`MIT `_) 163 | 164 | 165 | .. _homepage_user-guide: 166 | 167 | User Guide 168 | ----------------------------- 169 | 170 | .. toctree:: 171 | :maxdepth: 3 172 | 173 | user/installation 174 | user/basic-usage 175 | user/advanced-usage 176 | user/use-cases 177 | user/csv-templates 178 | user/faq 179 | Wiki 180 | user/resources 181 | 182 | 183 | .. _homepage_reference: 184 | 185 | Reference / API 186 | ----------------------------- 187 | 188 | If you are looking for information on a specific class, function, or method, 189 | this part of the documentation is for you. 190 | 191 | .. toctree:: 192 | :maxdepth: 2 193 | 194 | reference 195 | 196 | 197 | .. _homepage_community-guide: 198 | 199 | Community Guide 200 | ----------------------------- 201 | 202 | This part of the documentation, which is mostly prose, details the 203 | pyDataverse ecosystem and community. 204 | 205 | .. toctree:: 206 | :maxdepth: 1 207 | 208 | community/contact 209 | community/releases 210 | 211 | 212 | .. _homepage_contributor-guide: 213 | 214 | Contributor Guide 215 | ----------------------------- 216 | 217 | .. toctree:: 218 | :maxdepth: 2 219 | 220 | contributing/contributing 221 | 222 | 223 | .. _homepage_thanks: 224 | 225 | Thanks! 226 | ----------------------------- 227 | 228 | To everyone who has contributed to pyDataverse - with an idea, an issue, a 229 | pull request, developing used tools, sharing it with others or by any other means: 230 | **Thank you for your support!** 231 | 232 | Open Source projects live from the cooperation of the many and pyDataverse is 233 | no exception to that, so to say thank you is the least that can be done. 234 | 235 | Special thanks to Lars Kaczmirek, Veronika Heider, Christian Bischof, Iris 236 | Butzlaff and everyone else from AUSSDA, Slava Tykhonov and Marion Wittenberg 237 | from DANS and all the people who do an amazing job by developing Dataverse 238 | at IQSS, but especially to Phil Durbin for it's support from the first minute. 
239 | 240 | pyDataverse is funded by 241 | `AUSSDA - The Austrian Social Science Data Archive `_ 242 | and through the EU Horizon2020 programme 243 | `SSHOC - Social Sciences & Humanities Open Cloud `_ 244 | (T5.2). 245 | 246 | 247 | .. _homepage_license: 248 | 249 | License 250 | ----------------------------- 251 | 252 | Copyright Stefan Kasberger and others, 2019-2021. 253 | 254 | Distributed under the terms of the MIT license, pyDataverse is free and open source software. 255 | 256 | Full License Text: `LICENSE.txt `_ 257 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/reference.rst: -------------------------------------------------------------------------------- 1 | Reference / API 2 | ========================================= 3 | 4 | .. module:: pyDataverse 5 | 6 | This part of the documentation covers all the interfaces / APIs of the pyDataverse modules. 7 | 8 | 9 | Where pyDataverse depends on external libraries, we document the most 10 | important right here and provide links to the canonical documentation outside of scope. 11 | 12 | 13 | API Interface 14 | ----------------------------- 15 | 16 | Access all of Dataverse APIs. 17 | 18 | .. automodule:: pyDataverse.api 19 | :members: 20 | :special-members: 21 | 22 | 23 | Models Interface 24 | ----------------------------- 25 | 26 | Use all metadata models of the Dataverse data-types (`Dataverse`, `Dataset` 27 | and `Datafile`). This includes import, export and manipulation. 28 | 29 | .. automodule:: pyDataverse.models 30 | :inherited-members: 31 | 32 | 33 | Utils Interface 34 | ----------------------------- 35 | 36 | Helper functions. 37 | 38 | .. automodule:: pyDataverse.utils 39 | :members: 40 | 41 | 42 | Auth Helpers 43 | ----------------------------- 44 | 45 | .. automodule:: pyDataverse.auth 46 | :members: 47 | 48 | 49 | Exceptions 50 | ----------------------------- 51 | 52 | Custom exceptions. 53 | 54 | .. automodule:: pyDataverse.exceptions 55 | :members: 56 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/snippets/pip-install.rst: -------------------------------------------------------------------------------- 1 | .. code-block:: shell 2 | 3 | pip install -U pyDataverse 4 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/snippets/requirements.rst: -------------------------------------------------------------------------------- 1 | pyDataverse officially supports Python 3.6–3.8 2 | 3 | Python packages required: 4 | 5 | - `requests `_>=2.12.0 6 | - `jsonschema `_>=3.2.0 7 | 8 | External packages required: 9 | 10 | - curl (only for :meth:`replace_datafile() ` necessary) 11 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/snippets/warning_production.rst: -------------------------------------------------------------------------------- 1 | .. warning:: 2 | Do not execute the example code on a Dataverse production instance, 3 | unless 100% sure! 4 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/user/advanced-usage.rst: -------------------------------------------------------------------------------- 1 | .. _user_advanced-usage: 2 | 3 | Advanced Usage 4 | ============== 5 | 6 | In addition to these tutorials, you can find more basic examples at 7 | :ref:`User Guide - Basic Usage `. 8 | and use-cases 9 | :ref:`User Guide - Use-Cases `. 
10 | 11 | 12 | .. _advanced-usage_data-migration: 13 | 14 | Import CSV to Dataverse 15 | ----------------------- 16 | 17 | This tutorial will show you how to mass-import metadata from pyDataverse's own 18 | CSV format (see :ref:`CSV templates `), create 19 | pyDataverse objects from it (Datasets and Datafiles) 20 | and upload the data and metadata through the API. 21 | 22 | The CSV format in this case can work as an exchange format or kind of a bridge between all kind of data formats and programming languages. Note that this can be filled directly by humans who collect the data manually (such as in digitization projects) as well as through more common automation workflows. 23 | 24 | 25 | .. _advanced-usage_prepare: 26 | 27 | Prepare 28 | ^^^^^^^ 29 | 30 | **Requirements** 31 | 32 | - pyDataverse installed (see :ref:`user_installation`) 33 | 34 | **Information** 35 | 36 | - Follow the order of code execution 37 | - Dataverse Docker 4.18.1 used 38 | - pyDataverse 0.3.0 used 39 | - API responses may vary by each request and Dataverse installation! 40 | 41 | .. include:: ../snippets/warning_production.rst 42 | 43 | **Additional Resources** 44 | 45 | - CSV templates from ``pyDataverse/templates/`` are used (see :ref:`CSV templates `) 46 | - Data from ``tests/data/user-guide/`` is used (`GitHub repo `_) 47 | 48 | 49 | .. _advanced-usage_data-migration_adapt-csv-templates: 50 | 51 | Adapt CSV template 52 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 53 | 54 | See :ref:`CSV templates - Adapt CSV template(s) `. 55 | 56 | 57 | .. _advanced-usage_data-migration_fill-csv-templates: 58 | 59 | Add metadata to the CSV files 60 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 61 | 62 | After preparing the CSV files, the metadata will need to be collected (manually or programmatically). No matter the origin or the format, each row must contain one entity (Dataverse collection, Dataset or Datafile). 63 | 64 | As mentioned in "Additional Resources" in the tutorial we use prepared data and place it in the root directory. You can ether use our files or fill in your own metadata with your own datafiles. 65 | 66 | No matter what you choose, you have to have properly formatted CSV files 67 | (``datasets.csv`` and ``datafiles.csv``) before moving on. 68 | 69 | Don't forget: Some columns must be entered in a JSON format! 70 | 71 | 72 | .. _advanced-usage_data-migration_add-datafiles: 73 | 74 | Add datafiles 75 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 76 | 77 | Add the files you have filled in the ``org.filename`` cell in ``datafiles.csv`` 78 | and then place them in the root directory (or any other specified directory). 79 | 80 | 81 | .. _advanced-usage_data-migration_import-csv-templates: 82 | 83 | Import CSV files 84 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 85 | 86 | Import the CSV files with 87 | :meth:`read_csv_as_dicts() `. 88 | This creates a list of :class:`dict`'s, automatically imports 89 | the Dataverse Software's own metadata attribute (``dv.`` prefix), 90 | converts boolean values, and loads JSON cells properly. 91 | 92 | :: 93 | 94 | >>> import os 95 | >>> from pyDataverse.utils import read_csv_as_dicts 96 | >>> csv_datasets_filename = "datasets.csv" 97 | >>> ds_data = read_csv_as_dicts(csv_datasets_filename) 98 | >>> csv_datafiles_filename = "datafiles.csv" 99 | >>> df_data = read_csv_as_dicts(csv_datafiles_filename) 100 | 101 | Once we have the data in Python, we can easily import the data into 102 | pyDataverse. 103 | 104 | For this, loop over each Dataset :class:`dict`, to: 105 | 106 | #. 
Instantiate an empty :class:`Dataset ` 107 | #. add the data with :meth:`set() ` and 108 | #. append the instance to a :class:`list`. 109 | 110 | :: 111 | 112 | >>> from pyDataverse.models import Dataset 113 | >>> ds_lst = [] 114 | >>> for ds in ds_data: 115 | >>> ds_obj = Dataset() 116 | >>> ds_obj.set(ds) 117 | >>> ds_lst.append(ds_obj) 118 | 119 | To import the :class:`Datafile `'s, do 120 | the same with ``df_data``: 121 | :meth:`set() ` the Datafile metadata, and 122 | append it. 123 | 124 | :: 125 | 126 | >>> from pyDataverse.models import Datafile 127 | >>> df_lst = [] 128 | >>> for df in df_data: 129 | >>> df_obj = Datafile() 130 | >>> df_obj.set(df) 131 | >>> df_lst.append(df_obj) 132 | 133 | 134 | .. _advanced-usage_data-migration_upload-data: 135 | 136 | Upload data via API 137 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 138 | 139 | Before we can upload metadata and data, we need to create an instance of 140 | :class:`NativeApi `. 141 | You will need to replace the following variables with your own Dataverse installation's data 142 | before executing the lines: 143 | 144 | - BASE_URL: Base URL of your Dataverse installation, without trailing slash (e. g. ``https://data.aussda.at``)) 145 | - API_TOKEN: API token of a Dataverse user with proper rights to create a Dataset and upload Datafiles 146 | 147 | :: 148 | 149 | >>> from pyDataverse.api import NativeApi 150 | >>> api = NativeApi(BASE_URL, API_TOKEN) 151 | 152 | Loop over the :class:`list ` of :class:`Dataset `'s, 153 | upload the metadata with 154 | :meth:`create_dataset() ` and collect 155 | all ``dataset_id``'s and ``pid``'s in ``dataset_id_2_pid``. 156 | 157 | Note: The Dataverse collection assigned to ``dv_alias`` must be published in order to add a Dataset to it. 158 | 159 | :: 160 | 161 | >>> dv_alias = ":root:" 162 | >>> dataset_id_2_pid = {} 163 | >>> for ds in ds_lst: 164 | >>> resp = api.create_dataset(dv_alias, ds.json()) 165 | >>> dataset_id_2_pid[ds.get()["org.dataset_id"]] = resp.json()["data"]["persistentId"] 166 | Dataset with pid 'doi:10.5072/FK2/WVMDFE' created. 167 | 168 | The API requests always return a 169 | :class:`httpx.Response ` object, which can then be used 170 | to extract the data. 171 | 172 | Next, we'll do the same for the :class:`list ` of 173 | :class:`Datafile `'s with 174 | :meth:`upload_datafile() `. 175 | In addition to the metadata, the ``PID`` (Persistent Identifier, which is mostly the DOI) and the ``filename`` must be passed. 176 | 177 | :: 178 | 179 | >>> for df in df_lst: 180 | >>> pid = dataset_id_2_pid[df.get()["org.dataset_id"]] 181 | >>> filename = os.path.join(os.getcwd(), df.get()["org.filename"]) 182 | >>> df.set({"pid": pid, "filename": filename}) 183 | >>> resp = api.upload_datafile(pid, filename, df.json()) 184 | 185 | Now we have created all Datasets, which we added to ``datasets.csv``, and uploaded 186 | all Datafiles, which we placed in the root directory, to the Dataverse installation. 187 | 188 | 189 | .. _advanced-usage_data-migration_publish-dataset: 190 | 191 | Publish Datasets via API 192 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 193 | 194 | Finally, we iterate over all Datasets and publish them with 195 | :meth:`publish_dataset() `. 
196 | 197 | :: 198 | 199 | >>> for dataset_id, pid in dataset_id_2_pid.items(): 200 | >>> resp = api.publish_dataset(pid, "major") 201 | >>> resp.json() 202 | Dataset doi:10.5072/FK2/WVMDFE published 203 | {'status': 'OK', 'data': {'id': 444, 'identifier': 'FK2/WVMDFE', 'persistentUrl': 'https://doi.org/10.5072/FK2/WVMDFE', 'protocol': 'doi', 'authority': '10.5072', 'publisher': 'Root', 'publicationDate': '2021-01-13', 'storageIdentifier': 'file://10.5072/FK2/WVMDFE'}} 204 | 205 | 206 | The Advanced Usage tutorial is now finished! If you want to 207 | revisit basic examples and use cases you can do so at 208 | :ref:`User Guide - Basic Usage ` and 209 | :ref:`User Guide - Use-Cases `. 210 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/user/basic-usage.rst: -------------------------------------------------------------------------------- 1 | .. _user_basic-usage: 2 | 3 | Basic Usage 4 | ================= 5 | 6 | This tutorial will show you how to import metadata from the Dataverse software's own 7 | JSON format, create pyDataverse objects from it (Dataverse collection, Dataset 8 | and Datafile), upload it via the API, and clean up at the end. 9 | 10 | In addition to this tutorial, you can find more advanced examples at 11 | :ref:`User Guide - Advanced Usage ` and background information at 12 | :ref:`User Guide - Use-Cases `. 13 | 14 | 15 | .. _user_basic-usage_prepare: 16 | 17 | Prepare 18 | ------------------------------------------ 19 | 20 | **Requirements** 21 | 22 | - pyDataverse installed (see :ref:`Installation `) 23 | 24 | **Basic Information** 25 | 26 | - Follow the order of code execution 27 | - Dataverse Docker 4.18.1 used 28 | - pyDataverse 0.3.0 used 29 | - API responses may vary by each request and Dataverse installation! 30 | 31 | .. include:: ../snippets/warning_production.rst 32 | 33 | **Additional Resources** 34 | 35 | - Data from ``tests/data/user-guide/`` used (`GitHub repo `_) 36 | 37 | 38 | .. _user_basic-usage_api-connection: 39 | 40 | Connect to Native API 41 | ------------------------------------------ 42 | 43 | First, create a :class:`NativeApi ` instance. You will use it 44 | later for data creation. Replace the following variables with your own installation's data 45 | before you execute the lines: 46 | 47 | - BASE_URL: Base URL of your Dataverse installation, without trailing slash (e. g. ``https://data.aussda.at``)) 48 | - API_TOKEN: API token of a Dataverse installation user with proper permissions to create a Dataverse collection, create a Dataset, and upload Datafiles 49 | 50 | :: 51 | 52 | >>> from pyDataverse.api import NativeApi 53 | >>> api = NativeApi(BASE_URL, API_TOKEN) 54 | 55 | Check with :meth:`get_info_version() `, 56 | if the API connection works and to retrieve the version of your Dataverse instance: 57 | 58 | :: 59 | 60 | >>> resp = api.get_info_version() 61 | >>> resp.json() 62 | {'status': 'OK', 'data': {'version': '4.15.1', 'build': '1377-701b56b'}} 63 | >>> resp.status_code 64 | 200 65 | 66 | All API requests return a :class:`httpx.Response ` object, which 67 | can then be used (e. g. :meth:`json() `). 68 | 69 | 70 | .. _user_basic-usage_create-dataverse: 71 | 72 | Create Dataverse Collection 73 | ----------------------------- 74 | 75 | The top-level data-type in the Dataverse software is called a Dataverse collection, so we will start with that. 76 | Take a look at the figure below to better understand the relationship between a Dataverse collection, a dataset, and a datafile. 
77 | 78 | .. figure:: ../_images/collection_dataset.png 79 | :align: center 80 | :alt: collection dataset datafile 81 | 82 | A dataverse collection (also known as a :class:`Dataverse `) acts as a container for your :class:`Datasets`. 83 | It can also store other collections (:class:`Dataverses `). 84 | You could create your own Dataverse collections, but it is not a requirement. 85 | A Dataset is a container for :class:`Datafiles`, such as data, documentation, code, metadata, etc. 86 | You need to create a Dataset to deposit your files. All Datasets are uniquely identified with a DOI at Dataverse. 87 | For more detailed explanations, check out `the Dataverse User Guide `_. 88 | 89 | Going back to the example, first, instantiate a :class:`Dataverse ` 90 | object and import the metadata from the Dataverse Software's own JSON format with 91 | :meth:`from_json() `: 92 | 93 | :: 94 | 95 | >>> from pyDataverse.models import Dataverse 96 | >>> from pyDataverse.utils import read_file 97 | >>> dv = Dataverse() 98 | >>> dv_filename = "dataverse.json" 99 | >>> dv.from_json(read_file(dv_filename)) 100 | 101 | With :meth:`get() ` you can 102 | have a look at all the data of the object: 103 | 104 | :: 105 | 106 | >>> dv.get() 107 | {'alias': 'pyDataverse_user-guide', 'name': 'pyDataverse - User Guide', 'dataverseContacts': [{'contactEmail': 'info@aussda.at'}]} 108 | >>> type(dv.get()) 109 | 110 | 111 | To see only the metadata necessary for the Dataverse API upload, use 112 | :meth:`json() `, which defaults 113 | to the needed format for the Dataverse API upload 114 | (equivalent to ``json(data_format="dataverse_upload")``): 115 | 116 | :: 117 | 118 | >>> dv.json() 119 | '{\n "alias": "pyDataverse_user-guide",\n "dataverseContacts": [\n {\n "contactEmail": "info@aussda.at"\n }\n ],\n "name": "pyDataverse - User Guide"\n}' 120 | >>> type(dv.json()) 121 | 122 | 123 | Then use :meth:`create_dataverse() ` to 124 | upload the Dataverse metadata to your Dataverse installation via its Native API and 125 | create an unpublished Dataverse collection draft. For this, you have to pass a) the parent 126 | Dataverse collection alias to which the new Dataverse collection is attached and b) the metadata in the Dataverse Software's 127 | own JSON format (:meth:`json() `): 128 | 129 | :: 130 | 131 | >>> resp = api.create_dataverse(":root", dv.json()) 132 | Dataverse pyDataverse_user-guide created. 133 | 134 | Last, we publish the Dataverse collection draft with 135 | :meth:`publish_dataverse() `: 136 | 137 | :: 138 | 139 | >>> resp = api.publish_dataverse("pyDataverse_user-guide") 140 | Dataverse pyDataverse_user-guide published. 141 | 142 | To have a look at the results of our work, you can check the created Dataverse collection 143 | on the frontend, or use pyDataverse to retrieve the Dataverse collection with 144 | :meth:`get_dataverse() `: 145 | 146 | :: 147 | 148 | >>> resp = api.get_dataverse("pyDataverse_user-guide") 149 | >>> resp.json() 150 | {'status': 'OK', 'data': {'id': 441, 'alias': 'pyDataverse_user-guide', 'name': 'pyDataverse - User Guide', 'dataverseContacts': [{'displayOrder': 0, 'contactEmail': 'info@aussda.at'}], 'permissionRoot': True, 'dataverseType': 'UNCATEGORIZED', 'ownerId': 1, 'creationDate': '2021-01-13T20:47:43Z'}} 151 | 152 | This is it, our first Dataverse collection created with the help of pyDataverse! 153 | Now let's move on and apply what we've learned to Datasets and Datafiles. 154 | 155 | .. 
_user_basic-usage_create-dataset: 156 | 157 | Create Dataset 158 | ----------------------------- 159 | 160 | Again, start by creating an empty pyDataverse object, this time a 161 | :class:`Dataset `: 162 | 163 | :: 164 | 165 | >>> from pyDataverse.models import Dataset 166 | >>> ds = Dataset() 167 | 168 | The function names often are the same for each data-type. So again, we can use 169 | :meth:`from_json() ` to import 170 | the metadata from the JSON file, but this time it feeds into a Dataset: 171 | 172 | :: 173 | 174 | >>> ds_filename = "dataset.json" 175 | >>> ds.from_json(read_file(ds_filename)) 176 | 177 | You can also use :meth:`get() ` 178 | to output all data: 179 | 180 | :: 181 | 182 | >>> ds.get() 183 | {'citation_displayName': 'Citation Metadata', 'title': 'Youth in Austria 2005', 'author': [{'authorName': 'LastAuthor1, FirstAuthor1', 'authorAffiliation': 'AuthorAffiliation1'}], 'datasetContact': [{'datasetContactEmail': 'ContactEmail1@mailinator.com', 'datasetContactName': 'LastContact1, FirstContact1'}], 'dsDescription': [{'dsDescriptionValue': 'DescriptionText'}], 'subject': ['Medicine, Health and Life Sciences']} 184 | 185 | Now, as the metadata is imported, we don't know if the data is valid and can be used 186 | to create a Dataset. Maybe some attributes are missing or misnamed, or a 187 | mistake during import happened. pyDataverse offers a convenient function 188 | to test this out with 189 | :meth:`validate_json() `, so 190 | you can move on with confidence: 191 | 192 | 193 | :: 194 | 195 | >>> ds.validate_json() 196 | True 197 | 198 | Adding or updating data manually is easy. With 199 | :meth:`set() ` 200 | you can pass any attribute you want as a collection of key-value 201 | pairs in a :class:`dict`: 202 | 203 | :: 204 | 205 | >>> ds.get()["title"] 206 | Youth in Austria 2005 207 | >>> ds.set({"title": "Youth from Austria 2005"}) 208 | >>> ds.get()["title"] 209 | Youth from Austria 2005 210 | 211 | To upload the Dataset, use 212 | :meth:`create_dataset() `. 213 | You'll pass the Dataverse collection where the Dataset should be attached 214 | and include the metadata as a JSON string 215 | (:meth:`json() `): 216 | 217 | :: 218 | 219 | >>> resp = api.create_dataset("pyDataverse_user-guide", ds.json()) 220 | Dataset with pid 'doi:10.5072/FK2/EO7BNB' created. 221 | >>> resp.json() 222 | {'status': 'OK', 'data': {'id': 442, 'persistentId': 'doi:10.5072/FK2/EO7BNB'}} 223 | 224 | Save the created PID (short for Persistent Identifier, which in 225 | our case is the DOI) in a :class:`dict`: 226 | 227 | :: 228 | 229 | >>> ds_pid = resp.json()["data"]["persistentId"] 230 | 231 | Private Dataset URL's can also be created. Use 232 | :meth:`create_dataset_private_url() ` 233 | to get the URL and the private token: 234 | 235 | :: 236 | 237 | >>> resp = api.create_dataset_private_url(ds_pid) 238 | Dataset private URL created: http://data.aussda.at/privateurl.xhtml?token={PRIVATE_TOKEN} 239 | >>> resp.json() 240 | {'status': 'OK', 'data': {'token': '{PRIVATE_TOKEN}', 'link': 'http://data.aussda.at/privateurl.xhtml?token={PRIVATE_TOKEN}', 'roleAssignment': {'id': 174, 'assignee': '#442', 'roleId': 8, '_roleAlias': 'member', 'privateUrlToken': '{PRIVATE_TOKEN}', 'definitionPointId': 442}}} 241 | 242 | Finally, to make the Dataset public, publish the draft with 243 | :meth:`publish_dataset() `. 
244 | Set ``release_type="major"`` (defaults to ``minor``), to create version 1.0: 245 | 246 | :: 247 | 248 | >>> resp = api.publish_dataset(ds_pid, release_type="major") 249 | Dataset doi:10.5072/FK2/EO7BNB published 250 | 251 | 252 | .. _user_basic-usage_upload-datafile: 253 | 254 | Upload Datafile 255 | ----------------------------- 256 | 257 | After all the preparations, it's now time to upload a 258 | :class:`Datafile ` and attach it to the Dataset: 259 | 260 | :: 261 | 262 | >>> from pyDataverse.models import Datafile 263 | >>> df = Datafile() 264 | 265 | Again, import your metadata with :meth:`from_json() `. 266 | Then, set your PID and filename manually (:meth:`set() `), 267 | as they are required as metadata for the upload and are created during the 268 | import process: 269 | 270 | :: 271 | 272 | >>> df_filename = "datafile.txt" 273 | >>> df.set({"pid": ds_pid, "filename": df_filename}) 274 | >>> df.get() 275 | {'pid': 'doi:10.5072/FK2/EO7BNB', 'filename': 'datafile.txt'} 276 | 277 | Upload the Datafile with 278 | :meth:`upload_datafile() `. 279 | Pass the PID, the Datafile filename and the Datafile metadata: 280 | 281 | :: 282 | 283 | >>> resp = api.upload_datafile(ds_pid, df_filename, df.json()) 284 | >>> resp.json() 285 | {'status': 'OK', 'data': {'files': [{'description': '', 'label': 'datafile.txt', 'restricted': False, 'version': 1, 'datasetVersionId': 101, 'dataFile': {'id': 443, 'persistentId': '', 'pidURL': '', 'filename': 'datafile.txt', 'contentType': 'text/plain', 'filesize': 7, 'description': '', 'storageIdentifier': '176fd85f46f-cf06cf243502', 'rootDataFileId': -1, 'md5': '8b8db3dfa426f6bdb1798d578f5239ae', 'checksum': {'type': 'MD5', 'value': '8b8db3dfa426f6bdb1798d578f5239ae'}, 'creationDate': '2021-01-13'}}]}} 286 | 287 | By uploading the Datafile, the attached Dataset gets an update. 288 | This means that a new unpublished Dataset version is created as a draft 289 | and the change is not yet publicly available. To make it available 290 | through creating a new Dataset version, publish the Dataset with 291 | :meth:`publish_dataset() `. 292 | Again, set the ``release_type="major"`` to create version 2.0, as a file change 293 | always leads to a major version change: 294 | 295 | :: 296 | 297 | >>> resp = api.publish_dataset(ds_pid, release_type="major") 298 | Dataset doi:10.5072/FK2/EO7BNB published 299 | 300 | 301 | .. _user_basic-usage_download-data: 302 | 303 | Download and save a dataset to disk 304 | ---------------------------------------- 305 | 306 | You may want to download and explore an existing dataset from Dataverse. The following code snippet will show how to retrieve and save a dataset to your machine. 307 | 308 | Note that if the dataset is public, you don't need to have an API_TOKEN. Furthermore, you don't even need to have a Dataverse account to use this functionality. The code would therefore look as follows: 309 | 310 | :: 311 | 312 | >>> from pyDataverse.api import NativeApi, DataAccessApi 313 | >>> from pyDataverse.models import Dataverse 314 | 315 | >>> base_url = 'https://dataverse.harvard.edu/' 316 | 317 | >>> api = NativeApi(base_url) 318 | >>> data_api = DataAccessApi(base_url) 319 | 320 | However, you need to know the DOI of the dataset that you want to download. In this example, we use ``doi:10.7910/DVN/KBHLOD``, which is hosted on Harvard's Dataverse instance that we specified as ``base_url``. 
The code looks as follows: 321 | 322 | :: 323 | 324 | >>> DOI = "doi:10.7910/DVN/KBHLOD" 325 | >>> dataset = api.get_dataset(DOI) 326 | 327 | As previously mentioned, every dataset comprises of datafiles, therefore, we need to get the list of datafiles by ID and save them on disk. That is done in the following code snippet: 328 | 329 | :: 330 | 331 | >>> files_list = dataset.json()['data']['latestVersion']['files'] 332 | 333 | >>> for file in files_list: 334 | >>> filename = file["dataFile"]["filename"] 335 | >>> file_id = file["dataFile"]["id"] 336 | >>> print("File name {}, id {}".format(filename, file_id)) 337 | 338 | >>> response = data_api.get_datafile(file_id) 339 | >>> with open(filename, "wb") as f: 340 | >>> f.write(response.content) 341 | File name cat.jpg, id 2456195 342 | 343 | Please note that in this example, the dataset will be saved in the execution directory. You could change that by adding a desired path in the ``open()`` function above. 344 | 345 | 346 | .. _user_basic-usage_get-data-tree: 347 | 348 | Retrieve all created data as a Dataverse tree 349 | --------------------------------------------------------- 350 | 351 | PyDataverse offers a convenient way to retrieve all children-data from a specific 352 | Dataverse collection or Dataset down to the Datafile level (Dataverse collections, Datasets 353 | and Datafiles). 354 | 355 | Simply pass the identifier of the parent (e. g. Dataverse collection alias or Dataset 356 | PID) and the list of the children data-types that should be collected 357 | (``dataverses``, ``datasets``, ``datafiles``) to 358 | :meth:`get_children() `: 359 | 360 | 361 | :: 362 | 363 | >>> tree = api.get_children("pyDataverse_user-guide", children_types= ["datasets", "datafiles"]) 364 | >>> tree 365 | [{'dataset_id': 442, 'pid': 'doi:10.5072/FK2/EO7BNB', 'type': 'dataset', 'children': [{'datafile_id': 443, 'filename': 'datafile.txt', 'label': 'datafile.txt', 'pid': '', 'type': 'datafile'}]}] 366 | 367 | In our case, we don't use ``dataverses`` as children data-type, as there 368 | is none inside the created Dataverse collection. 369 | 370 | For further use of the tree, have a look at 371 | :meth:`dataverse_tree_walker() ` 372 | and :meth:`save_tree_data() `. 373 | 374 | 375 | .. _user_basic-usage_remove-data: 376 | 377 | Clean up and remove all created data 378 | ---------------------------------------- 379 | 380 | As we have created a Dataverse collection, created a Dataset, and uploaded a Datafile, we now will 381 | remove all of it in order to clean up what we did so far. 382 | 383 | The Dataset has been published in the step above, so we have to destroy it with 384 | :meth:`destroy_dataset() `. 385 | To remove a non-published Dataset, 386 | :meth:`delete_dataset() ` 387 | must be used instead. 388 | 389 | Note: When you delete a Dataset, it automatically deletes all attached 390 | Datafile(s): 391 | 392 | :: 393 | 394 | >>> resp = api.destroy_dataset(ds_pid) 395 | Dataset {'status': 'OK', 'data': {'message': 'Dataset :persistentId destroyed'}} destroyed 396 | 397 | When you want to retrieve the Dataset now with 398 | :meth:`get_dataset() `, pyDataverse throws an 399 | :class:`OperationFailedError ` 400 | exception, which is the expected behaviour, as the Dataset was deleted: 401 | 402 | :: 403 | 404 | >>> resp = api.get_dataset(ds_pid) 405 | pyDataverse.exceptions.OperationFailedError: ERROR: GET HTTP 404 - http://data.aussda.at/api/v1/datasets/:persistentId/?persistentId=doi:10.5072/FK2/EO7BNB. 
MSG: {"status":"ERROR","message":"Dataset with Persistent ID doi:10.5072/FK2/EO7BNB not found."} 406 | 407 | After removing all Datasets and/or Dataverse collections in it, delete the parent Dataverse collection 408 | (:meth:`delete_dataverse() `). 409 | 410 | Note: It is not possible to delete a Dataverse collection with any data (Dataverse collection or 411 | Dataset) attached to it. 412 | 413 | :: 414 | 415 | >>> resp = api.delete_dataverse("pyDataverse_user-guide") 416 | Dataverse pyDataverse_user-guide deleted. 417 | 418 | Now the Dataverse instance is as it was once before we started. 419 | 420 | The Basic Usage tutorial is now finished, but maybe you want to 421 | have a look at more advanced examples at 422 | :ref:`User Guide - Advanced Usage ` and at 423 | :ref:`User Guide - Use-Cases ` for more information. 424 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/user/csv-templates.rst: -------------------------------------------------------------------------------- 1 | .. _user_csv-templates: 2 | 3 | CSV Templates 4 | ============================ 5 | 6 | .. _user_csv-templates_description: 7 | 8 | General 9 | ----------------------------- 10 | 11 | The CSV templates offer a **pre-defined data format**, which can be used to 12 | import metadata into pyDataverse, and export from it. 13 | They support all three Dataverse Software data-types: Dataverse collections, Datasets and Datafiles. 14 | 15 | CSV is an open file format, and great for humans and for machines. It can be 16 | opened with your Spreadsheet software and edited manually, or used by your 17 | favoured programming language. 18 | 19 | The CSV format can also work as an exchange format or kind of a bridge 20 | between all kind of data formats and programming languages. 21 | 22 | The CSV templates and the mentioned workflow below can be used especially for: 23 | 24 | - **Mass imports into a Dataverse installation:** The data to be imported could ether be collected manually (e. g. digitization of paper works), or created by machines (coming from any data source you have). 25 | - **Data exchange:** share pyDataverse data with any other system in an open, machine-readable format 26 | 27 | The CSV templates are licensed under `CC BY 4.0 `_ 28 | 29 | 30 | .. _user_csv-templates_data-format: 31 | 32 | Data format 33 | ----------------------------- 34 | 35 | - Separator: ``,`` 36 | - Encoding: ``utf-8`` 37 | - Quotation: ``"``. Note: In JSON strings, you have to escape with ``\`` before a quotation mark (e. g. adapt ``"`` to ``\"``). 38 | - Boolean: we recommend using ``TRUE`` and ``FALSE`` as boolean values. Note: They can be modified, when you open it with your preferred spreadsheet software (e. g. Libre Office), depending on the software or your operating systems settings. 39 | 40 | 41 | .. _user_csv-templates_content: 42 | 43 | Content 44 | ----------------------------- 45 | 46 | The templates don't come empty. They are pre-filled with supportive information to get started. 47 | Each row is one entry 48 | 49 | 1. **Column names**: The attribute name for each column. You can add and remove columns as you want. The pre-filled columns are a recommendation, as they consist of all metadata for the specific data-type, and the most common internal fields for handling the workflow. This is the only row that's not allowed to be deleted. There are three established prefixes so far (you can define your own if you want): 50 | 51 | a. 
``org.``: Organization specific information to handle the data workflow later on. 52 | b. ``dv.``: Dataverse specific metadata, used for API uploads. Use the exact Dataverse software attribute name after the prefix, so the metadata gets imported properly. 53 | c. ``alma.``: ALMA specific information 54 | 55 | 2. **Description:** Description of the Dataverse software attribute. This row is for support purposes only, and must be deleted before usage. 56 | 3. **Attribute type:** Describes the type of the attribute (``serial``, ``string`` or ``numeric``). Strings can also be valid JSON strings to use more complex data structures. This row is for support purposes only, and must be deleted before usage. 57 | 4. **Example:** Contains a concrete example. To start adding your own data, it is often good to get started by copying the example for it. This row is for support purposes only, and must be deleted before usage. 58 | 5. **Multiple:** ``TRUE``, if multiple entries are allowed (boolean). This row is for support purposes only, and must be deleted before usage. 59 | 6. **Sub-keys:** ``TRUE``, if sub-keys are part (boolean). Only applicable to JSON strings. This row is for support purposes only, and must be deleted before usage. 60 | 61 | 62 | .. _user_csv-templates_usage: 63 | 64 | Usage 65 | ----------------------------- 66 | 67 | To use the CSV templates, we propose following steps as a best practice. 68 | The workflow is the same for Dataverse collections, Datasets and Datafiles. 69 | 70 | There is also a more detailed tutorial on how to use the CSV templates 71 | for mass imports in the 72 | :ref:`User Guide - Advanced `. 73 | 74 | The CSV templates can be found in ``pyDataverse/templates/`` 75 | (`GitHub repo `_): 76 | 77 | - `dataverses.csv `_ 78 | - `datasets.csv `_ 79 | - `datafiles.csv `_ 80 | 81 | 82 | .. _user_csv-templates_usage_create-csv: 83 | 84 | Adapt CSV template(s) 85 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 86 | 87 | First, adapt the CSV templates to your own needs and workflow. 88 | 89 | #. **Open a template file and save it:** Just start by copying the file and changing its filename to something descriptive (e.g. ``20200117_datasets.csv``). 90 | #. **Adapt columns:** Then change the pre-defined columns (attributes) to your needs. 91 | #. **Add metadata:** Add metadata in the first empty row. Closely following the example is often a good starting point, especially for JSON strings. 92 | #. **Remove supporting rows:** Once you are used to the workflow, you can delete the supportive rows 2 to 6. This must be done before you use the template for pyDataverse! 93 | #. **Save and use:** Once you have finished editing, save the CSV-file and import it to pyDataverse. 94 | 95 | 96 | .. _user_csv-templates_usage_add-metadata: 97 | 98 | Use the CSV files 99 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 100 | 101 | For further usage of the CSV files with pyDataverse, for example: 102 | 103 | - adding metadata to the CSV files 104 | - importing CSV files 105 | - uploading data and metadata via API 106 | 107 | ... have a look at the :ref:`Data Migration Tutorial `. 108 | 109 | 110 | .. _user_csv-templates_usage_export-csv: 111 | 112 | Export from pyDataverse 113 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 114 | 115 | If you want to export your metadata from a pyDataverse object ( 116 | :class:`Dataverse `, 117 | :class:`Dataset `, 118 | :class:`Datafile `) 119 | to a CSV file: 120 | 121 | #. Get the metadata as :class:`dict ` (:meth:`Dataverse.get() `, :meth:`Dataset.get() ` or :meth:`Datafile.get() `). 
122 | #. Pass the :class:`dict ` to :func:`write_dicts_as_csv() `. Note: Use the internal attribute lists from ``pyDataverse.models`` to get a complete list of fieldnames for each Dataverse data-type (e. g. ``Dataset.__attr_import_dv_up_citation_fields_values``). 123 | 124 | 125 | .. _user_csv-templates_resources: 126 | 127 | Resources 128 | ----------------------------- 129 | 130 | - Dataverse example data taken from `dataverse_full.json `_ 131 | - Dataset example data taken from `dataset_full.json `_ 132 | - Datafile example data taken from `Native API documentation `_ 133 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/user/faq.rst: -------------------------------------------------------------------------------- 1 | .. _community_faq: 2 | 3 | FAQ 4 | ================================== 5 | 6 | **Q: What is the "Dataverse Software?" What is a "Dataverse collection?"** 7 | 8 | A: The Dataverse Software is the name of the open source 9 | data repository software. 2. A Dataverse collection is the top-level data-type in the 10 | Dataverse Software. 11 | 12 | **Q: What is a "Dataset"?** 13 | 14 | A: The term dataset differs from the usual use for a structured set of data. 15 | A Dataset in the Dataverse software is a data-type typically representative for all content of one study. 16 | The Dataset itself contains only metadata, but it relates to other data-types: 17 | Datafiles are attached to it and a Dataset is always part of a Dataverse collection. 18 | 19 | **Q: What is a "Datafile"?** 20 | 21 | A: A Datafile is a Dataverse software data-type. It consists of the file itself and 22 | its metadata. A Datafile is always part of a Dataset. 23 | 24 | **Q: What are the expected HTTP Status Codes for the API requests?** 25 | 26 | A: So far, this is still an unsolved question, as it is not documented yet. 27 | We started to collect this information at a 28 | `Wiki page `_ 29 | , so if you have some knowledge about this, please add it there 30 | or get in touch with us (:ref:`community_contact`). 31 | 32 | **Q: Can I create my own API calls?** 33 | 34 | A: Yes, you can use the :class:`Api ` base-class and its request functions 35 | (:meth:`get_request() `, :meth:`post_request() `, :meth:`put_request() ` and 36 | :meth:`delete_request() `) and pass your own parameter. 37 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/user/installation.rst: -------------------------------------------------------------------------------- 1 | .. _user_installation: 2 | 3 | Installation 4 | ================= 5 | 6 | .. contents:: Table of Contents 7 | :local: 8 | 9 | There are different options on how to install a Python package, mostly depending 10 | on your preferred tools and what you want to do with it. The easiest 11 | way is in most cases to use pip (see :ref:`below `). 12 | 13 | 14 | .. _user_installation_requirements: 15 | 16 | Requirements 17 | ----------------------------- 18 | 19 | .. include:: ../snippets/requirements.rst 20 | 21 | 22 | Installer requirements: `setuptools `_ 23 | 24 | 25 | .. _user_installation_pip: 26 | 27 | Pip 28 | ----------------------------- 29 | 30 | To install the latest release of pyDataverse from PyPI, simply run this 31 | `pip `_ 32 | command in your terminal of choice: 33 | 34 | .. include:: ../snippets/pip-install.rst 35 | 36 | 37 | .. 
_user_installation_pipenv: 38 | 39 | Pipenv 40 | ----------------------------- 41 | 42 | `Pipenv `_ combines pip and virtualenv. 43 | 44 | .. code-block:: shell 45 | 46 | pipenv install pyDataverse 47 | 48 | 49 | .. _user_installation_source-code: 50 | 51 | 52 | Conda 53 | ----------------------------- 54 | 55 | pyDataverse is also available through `conda-forge `_. 56 | 57 | .. code-block:: shell 58 | 59 | conda install pyDataverse -c conda-forge 60 | 61 | 62 | Source Code 63 | ----------------------------- 64 | 65 | PyDataverse is actively developed on GitHub, where the code is 66 | `always available `_. 67 | 68 | You can either clone the public repository: 69 | 70 | .. code-block:: shell 71 | 72 | git clone git://github.com/gdcc/pyDataverse.git 73 | 74 | Or download the archive of the ``master`` branch as a zip: 75 | 76 | .. code-block:: shell 77 | 78 | curl -OL https://github.com/gdcc/pyDataverse/archive/master.zip 79 | 80 | Once you have a copy of the source, you can embed it in your own Python 81 | package: 82 | 83 | .. code-block:: shell 84 | 85 | cd pyDataverse 86 | pip install . 87 | 88 | 89 | .. _user_installation_development: 90 | 91 | Development 92 | ----------------------------- 93 | 94 | To set up your development environment, see 95 | :ref:`contributing_working-with-code_development-environment`. 96 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/user/resources.rst: -------------------------------------------------------------------------------- 1 | .. _user_resources: 2 | 3 | 4 | Resources 5 | ================= 6 | 7 | 8 | .. _user_resources_presentations-workshops: 9 | 10 | Presentations / Workshops 11 | ----------------------------- 12 | 13 | - `Slides `_: from talk at Dataverse Community Meeting 2019 14 | - `Jupyter Notebook Demo `_: at European Dataverse Community Workshop Tromso 2020 15 | 16 | 17 | .. _user_resources_dataverse: 18 | 19 | Dataverse Project 20 | ----------------------------- 21 | 22 | - `Dataverse `_ 23 | - `API Guide `_ 24 | 25 | 26 | .. _user_resources_developing: 27 | 28 | Developing 29 | ----------------------------- 30 | 31 | **Helpful** 32 | 33 | - `JSON Schema `_ 34 | 35 | - `Validator Webservice `_ 36 | - `Getting Started `_ 37 | - `Schema Validation `_ 38 | 39 | **Open Source Development** 40 | 41 | - `Producing Open Source Software by Karl Fogel `_ 42 | - `GitHub flow `_ 43 | - `Git Workflow `_ 44 | - `Writing on GitHub `_ 45 | -------------------------------------------------------------------------------- /pyDataverse/docs/source/user/use-cases.rst: -------------------------------------------------------------------------------- 1 | .. _user_use-cases: 2 | 3 | Use-Cases 4 | ================= 5 | 6 | For a basic introduction to pyDataverse, visit 7 | :ref:`User Guide - Basic Usage `. For information on more advanced uses, visit :ref:`User Guide - Advanced Usage `. 8 | 9 | 10 | .. _use-cases_data-migration: 11 | 12 | Data Migration 13 | ----------------------------- 14 | 15 | Importing lots of data from data sources outside a Dataverse installation can be done 16 | with the help of the :ref:`CSV templates `. 17 | Simply add your data to the CSV files, import the files into pyDataverse, and then 18 | upload the data and metadata via the API. 
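In condensed form, such a migration boils down to only a few pyDataverse calls. The sketch below is a minimal outline, not a ready-made script: ``BASE_URL``, ``API_TOKEN`` and ``DV_ALIAS`` are placeholders for your own installation's data, and ``datasets.csv`` stands for a filled-in copy of the CSV template.

::

    >>> from pyDataverse.api import NativeApi
    >>> from pyDataverse.models import Dataset
    >>> from pyDataverse.utils import read_csv_as_dicts
    >>> api = NativeApi(BASE_URL, API_TOKEN)
    >>> for metadata in read_csv_as_dicts("datasets.csv"):
    >>>     ds = Dataset()
    >>>     ds.set(metadata)
    >>>     resp = api.create_dataset(DV_ALIAS, ds.json())

Datafile uploads and publishing follow the same pattern; the tutorials linked below walk through the complete workflow step by step.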
19 | 20 | The following mappings currently exist: 21 | 22 | - CSV 23 | - CSV 2 pyDataverse (:ref:`Tutorial `) 24 | - pyDataverse 2 CSV (:ref:`Tutorial `) 25 | - Dataverse Upload JSON 26 | - JSON 2 pyDataverse 27 | - pyDataverse to JSON 28 | 29 | If you would like to add an additional mapping, we welcome 30 | :ref:`contributions `! 31 | 32 | 33 | .. _use-cases_testing: 34 | 35 | Testing 36 | ----------------------------- 37 | 38 | 39 | .. _use-cases_testing_create-test-data: 40 | 41 | Create test data for integrity tests (DevOps) 42 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 43 | 44 | Get full lists of all Dataverse collections, Datasets and Datafiles of an installation, 45 | or a subset of it. The results are stored in JSON files, which then 46 | can be used to do data integrity tests and verify data completeness. 47 | This is typically useful after an upgrade or a Dataverse migration. 48 | The data integrates easily into 49 | `aussda_tests `_ and to any CI 50 | build tools. 51 | 52 | The general steps for use: 53 | 54 | - Collect a data tree with all Dataverse collections, Datasets and Datafiles (:meth:`get_children() `) 55 | - Extract Dataverse collections, Datasets and Datafiles from the tree (:func:`dataverse_tree_walker() `) 56 | - Save extracted data (:func:`save_tree_data() `) 57 | 58 | 59 | .. _use-cases_testing_mass-removal: 60 | 61 | Mass removal of data in a Dataverse installation (DevOps) 62 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 63 | 64 | After testing, you often have to clean up Dataverse collections 65 | with Datasets and Datafiles within. It can be 66 | tricky to remove them all at once, but pyDataverse helps you to do it 67 | with only a few commands: 68 | 69 | - Collect a data tree with all Dataverse collections and Datasets (:meth:`get_children() `) 70 | - Extract Dataverse collections and Datasets from the tree (:func:`dataverse_tree_walker() `) 71 | - Save extracted data (:func:`save_tree_data() `) 72 | - Iterate over all Datasets to delete/destroy them (:meth:`destroy_dataset() ` :meth:`delete_dataset() `, :meth:`destroy_dataset() `) 73 | - Iterate over all Dataverse collections to delete them (:meth:`delete_dataverse() `) 74 | 75 | This functionality is not yet fully implemented in pyDataverse, 76 | but you can find it in 77 | `aussda_tests `_. 78 | 79 | 80 | .. _use-cases_data-science: 81 | 82 | Data Science Pipeline 83 | ------------------------------------ 84 | 85 | Using APIs, you can access data and/or metadata from a Dataverse installation. You can also use pyDataverse to automatically add data and metadata to your Dataset. PyDataverse connects your Data Science pipeline with your Dataverse installation. 86 | 87 | 88 | .. _use-cases_microservices: 89 | 90 | Web-Applications / Microservices 91 | ------------------------------------------ 92 | 93 | As it is a direct and easy way to access Dataverses API's and 94 | to manipulate the Dataverse installation's data models, it integrates really well into 95 | all kind of web-applications and microservices. For example, you can use pyDataverse to 96 | visualize data, do some analysis, enrich it with other data 97 | sources (and so on). 
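As an illustration only (the helper name and structure below are made up for this example; the ``get_dataset()`` call and the JSON keys are the same ones used in the Basic Usage guide), a small service function that lists the Datafiles of a published Dataset for a web frontend could look like this:

::

    >>> from pyDataverse.api import NativeApi
    >>> def list_datafile_names(base_url, doi):
    ...     api = NativeApi(base_url)
    ...     dataset = api.get_dataset(doi)
    ...     files = dataset.json()["data"]["latestVersion"]["files"]
    ...     return [f["dataFile"]["filename"] for f in files]

Because every API call returns a plain ``httpx.Response``, the JSON can be handed straight to whatever framework does the rendering or analysis.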
98 | -------------------------------------------------------------------------------- /pyDataverse/exceptions.py: -------------------------------------------------------------------------------- 1 | """Find out more at https://github.com/GDCC/pyDataverse.""" 2 | 3 | 4 | class DataverseError(Exception): 5 | """Base exception class for Dataverse-related error.""" 6 | 7 | pass 8 | 9 | 10 | class DataverseApiError(DataverseError): 11 | """Base exception class for Dataverse-related api error.""" 12 | 13 | pass 14 | 15 | 16 | class OperationFailedError(DataverseApiError): 17 | """Raised when an operation fails for an unknown reason.""" 18 | 19 | pass 20 | 21 | 22 | class ApiUrlError(DataverseApiError): 23 | """Raised when the request url is not valid.""" 24 | 25 | pass 26 | 27 | 28 | class ApiResponseError(DataverseApiError): 29 | """Raised when the requests response fails.""" 30 | 31 | pass 32 | 33 | 34 | class ApiAuthorizationError(OperationFailedError): 35 | """Raised if a user provides invalid credentials.""" 36 | 37 | pass 38 | 39 | 40 | class DataverseNotEmptyError(OperationFailedError): 41 | """Raised when a Dataverse has accessioned Datasets.""" 42 | 43 | pass 44 | 45 | 46 | class DataverseNotFoundError(OperationFailedError): 47 | """Raised when a Dataverse cannot be found.""" 48 | 49 | pass 50 | 51 | 52 | class DatasetNotFoundError(OperationFailedError): 53 | """Raised when a Dataset cannot be found.""" 54 | 55 | pass 56 | 57 | 58 | class DatafileNotFoundError(OperationFailedError): 59 | """Raised when a Datafile cannot be found.""" 60 | 61 | pass 62 | -------------------------------------------------------------------------------- /pyDataverse/schemas/json/datafile_upload_schema.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "http://json-schema.org/draft-07/schema", 3 | "$id": "https://github.com/GDCC/pyDataverse/schemas/json/datafile_upload_schema.json", 4 | "type": "object", 5 | "title": "Datafile JSON upload schema", 6 | "description": "Describes the full Datafile metadata JSON file structure for a Dataverse API upload.", 7 | "default": {}, 8 | "required": [ 9 | "pid", 10 | "filename" 11 | ], 12 | "additionalProperties": false, 13 | "properties": { 14 | "description": { 15 | "$id": "#/properties/description", 16 | "type": "string" 17 | }, 18 | "categories": { 19 | "$id": "#/properties/categories", 20 | "type": "array", 21 | "additionalItems": false, 22 | "items": { 23 | "anyOf": [ 24 | { 25 | "$id": "#/properties/categories/items/anyOf/0", 26 | "type": "string" 27 | } 28 | ], 29 | "$id": "#/properties/categories/items" 30 | } 31 | }, 32 | "restrict": { 33 | "$id": "#/properties/restrict", 34 | "type": "boolean" 35 | }, 36 | "pid": { 37 | "$id": "#/properties/pid", 38 | "type": "string" 39 | }, 40 | "filename": { 41 | "$id": "#/properties/filename", 42 | "type": "string" 43 | }, 44 | "label": { 45 | "$id": "#/properties/label", 46 | "type": "string" 47 | }, 48 | "directoryLabel": { 49 | "$id": "#/properties/directoryLabel", 50 | "type": "string" 51 | } 52 | } 53 | } 54 | -------------------------------------------------------------------------------- /pyDataverse/schemas/json/dataverse_upload_schema.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "http://json-schema.org/draft-07/schema", 3 | "$id": "https://github.com/GDCC/pyDataverse/schemas/json/dataverse_upload_schema.json", 4 | "type": "object", 5 | "title": "Dataverse JSON upload schema", 6 | "description": 
"Describes the full Dataverse metadata JSON file structure for a Dataverse API upload.", 7 | "required": [ 8 | "name", 9 | "alias", 10 | "dataverseContacts" 11 | ], 12 | "additionalProperties": false, 13 | "properties": { 14 | "name": { 15 | "$id": "#/properties/name", 16 | "type": "string" 17 | }, 18 | "alias": { 19 | "$id": "#/properties/alias", 20 | "type": "string" 21 | }, 22 | "dataverseContacts": { 23 | "$id": "#/properties/dataverseContacts", 24 | "type": "array", 25 | "additionalItems": false, 26 | "items": { 27 | "anyOf": [ 28 | { 29 | "$id": "#/properties/dataverseContacts/items/anyOf/0", 30 | "type": "object", 31 | "required": [ 32 | "contactEmail" 33 | ], 34 | "additionalProperties": false, 35 | "properties": { 36 | "contactEmail": { 37 | "$id": "#/properties/dataverseContacts/items/anyOf/0/properties/contactEmail", 38 | "type": "string" 39 | } 40 | } 41 | } 42 | ], 43 | "$id": "#/properties/dataverseContacts/items" 44 | } 45 | }, 46 | "affiliation": { 47 | "$id": "#/properties/affiliation", 48 | "type": "string" 49 | }, 50 | "description": { 51 | "$id": "#/properties/description", 52 | "type": "string" 53 | }, 54 | "dataverseType": { 55 | "$id": "#/properties/dataverseType", 56 | "type": "string" 57 | } 58 | } 59 | } 60 | -------------------------------------------------------------------------------- /pyDataverse/schemas/json/dspace_schema.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "http://json-schema.org/schema#", 3 | "type": "object", 4 | "properties": { 5 | "responseHeader": { 6 | "type": "object", 7 | "properties": { 8 | "status": { 9 | "type": "integer" 10 | }, 11 | "QTime": { 12 | "type": "integer" 13 | }, 14 | "params": { 15 | "type": "object", 16 | "properties": { 17 | "q": { 18 | "type": "string" 19 | }, 20 | "indent": { 21 | "type": "string" 22 | }, 23 | "wt": { 24 | "type": "string" 25 | } 26 | } 27 | } 28 | } 29 | }, 30 | "response": { 31 | "type": "object", 32 | "properties": { 33 | "numFound": { 34 | "type": "integer" 35 | }, 36 | "start": { 37 | "type": "integer" 38 | }, 39 | "docs": { 40 | "type": "array", 41 | "items": { 42 | "type": "object", 43 | "properties": { 44 | "ZANo": { 45 | "type": "string" 46 | }, 47 | "id": { 48 | "type": "string" 49 | }, 50 | "AccessClass": { 51 | "type": "string" 52 | }, 53 | "NumberOfUnits": { 54 | "type": "string" 55 | }, 56 | "src": { 57 | "type": "string" 58 | }, 59 | "Studynumber": { 60 | "type": "string" 61 | }, 62 | "AnalysisSystem": { 63 | "type": "string" 64 | }, 65 | "NumberOfVariables": { 66 | "type": "string" 67 | }, 68 | "SDC": { 69 | "type": "string" 70 | }, 71 | "Universe": { 72 | "type": "string" 73 | }, 74 | "max": { 75 | "type": "string" 76 | }, 77 | "min": { 78 | "type": "string" 79 | }, 80 | "Abstract": { 81 | "type": "string" 82 | }, 83 | "CollectionMethod": { 84 | "type": "string" 85 | }, 86 | "Title": { 87 | "type": "string" 88 | }, 89 | "SelectionMethod": { 90 | "type": "string" 91 | }, 92 | "Remarks": { 93 | "type": "string" 94 | }, 95 | "DatCollector": { 96 | "type": "string" 97 | }, 98 | "EAbstract": { 99 | "type": "string" 100 | }, 101 | "EUniverse": { 102 | "type": "string" 103 | }, 104 | "ECollectionMethod": { 105 | "type": "string" 106 | }, 107 | "ETitle": { 108 | "type": "string" 109 | }, 110 | "ESelectionMethod": { 111 | "type": "string" 112 | }, 113 | "EDatCollector": { 114 | "type": "string" 115 | }, 116 | "PriInvestigator": { 117 | "type": "array", 118 | "items": { 119 | "type": "string" 120 | } 121 | }, 122 | "Institution": { 
123 | "type": "array", 124 | "items": { 125 | "type": "string" 126 | } 127 | }, 128 | "EGeogFree": { 129 | "type": "array", 130 | "items": { 131 | "type": "string" 132 | } 133 | }, 134 | "GeogISO": { 135 | "type": "array", 136 | "items": { 137 | "type": "string" 138 | } 139 | }, 140 | "EGeogName": { 141 | "type": "array", 142 | "items": { 143 | "type": "string" 144 | } 145 | }, 146 | "GeogFree": { 147 | "type": "array", 148 | "items": { 149 | "type": "string" 150 | } 151 | }, 152 | "GeogName": { 153 | "type": "array", 154 | "items": { 155 | "type": "string" 156 | } 157 | }, 158 | "CategoryNo": { 159 | "type": "array", 160 | "items": { 161 | "type": "string" 162 | } 163 | }, 164 | "ECategory": { 165 | "type": "array", 166 | "items": { 167 | "type": "string" 168 | } 169 | }, 170 | "Category": { 171 | "type": "array", 172 | "items": { 173 | "type": "string" 174 | } 175 | }, 176 | "TopicNo": { 177 | "type": "array", 178 | "items": { 179 | "type": "string" 180 | } 181 | }, 182 | "ETopic": { 183 | "type": "array", 184 | "items": { 185 | "type": "string" 186 | } 187 | }, 188 | "Topic": { 189 | "type": "array", 190 | "items": { 191 | "type": "string" 192 | } 193 | }, 194 | "VersionYear": { 195 | "type": "string" 196 | }, 197 | "EVersionName": { 198 | "type": "string" 199 | }, 200 | "PublicationYear": { 201 | "type": "string" 202 | }, 203 | "VersionDate": { 204 | "type": "string" 205 | }, 206 | "DOInumber": { 207 | "type": "array", 208 | "items": { 209 | "type": "string" 210 | } 211 | }, 212 | "VersionNumber": { 213 | "type": "string" 214 | }, 215 | "VersionName": { 216 | "type": "string" 217 | }, 218 | "DOI": { 219 | "type": "array", 220 | "items": { 221 | "type": "string" 222 | } 223 | }, 224 | "CollDateMin": { 225 | "type": "string" 226 | }, 227 | "CollDateMax": { 228 | "type": "string" 229 | }, 230 | "GroupNo": { 231 | "type": "array", 232 | "items": { 233 | "type": "string" 234 | } 235 | }, 236 | "EGroupName": { 237 | "type": "array", 238 | "items": { 239 | "type": "string" 240 | } 241 | }, 242 | "GroupName": { 243 | "type": "array", 244 | "items": { 245 | "type": "string" 246 | } 247 | }, 248 | "DatasetFile": { 249 | "type": "array", 250 | "items": { 251 | "type": "string" 252 | } 253 | }, 254 | "DatasetSize": { 255 | "type": "array", 256 | "items": { 257 | "type": "string" 258 | } 259 | }, 260 | "DatasetDescription": { 261 | "type": "array", 262 | "items": { 263 | "type": "string" 264 | } 265 | }, 266 | "Dataset": { 267 | "type": "array", 268 | "items": { 269 | "type": "string" 270 | } 271 | }, 272 | "EDatasetDescription": { 273 | "type": "array", 274 | "items": { 275 | "type": "string" 276 | } 277 | }, 278 | "QuestionnaireSize": { 279 | "type": "array", 280 | "items": { 281 | "type": "string" 282 | } 283 | }, 284 | "EQuestionnaireDescription": { 285 | "type": "array", 286 | "items": { 287 | "type": "string" 288 | } 289 | }, 290 | "QuestionnaireFile": { 291 | "type": "array", 292 | "items": { 293 | "type": "string" 294 | } 295 | }, 296 | "Questionnaire": { 297 | "type": "array", 298 | "items": { 299 | "type": "string" 300 | } 301 | }, 302 | "QuestionnaireDescription": { 303 | "type": "array", 304 | "items": { 305 | "type": "string" 306 | } 307 | }, 308 | "CodebookSize": { 309 | "type": "array", 310 | "items": { 311 | "type": "string" 312 | } 313 | }, 314 | "Codebook": { 315 | "type": "array", 316 | "items": { 317 | "type": "string" 318 | } 319 | }, 320 | "ECodebookDescription": { 321 | "type": "array", 322 | "items": { 323 | "type": "string" 324 | } 325 | }, 326 | "CodebookDescription": { 
327 | "type": "array", 328 | "items": { 329 | "type": "string" 330 | } 331 | }, 332 | "CodebookFile": { 333 | "type": "array", 334 | "items": { 335 | "type": "string" 336 | } 337 | } 338 | } 339 | } 340 | } 341 | } 342 | } 343 | } 344 | } 345 | -------------------------------------------------------------------------------- /pyDataverse/templates/datafiles.csv: -------------------------------------------------------------------------------- 1 | "attribute","org.datafile_id","org.dataset_id","org.filename","org.to_upload","org.is_uploaded","org.to_delete","org.is_deleted","org.to_update","org.is_updated","dv.datafile_id","dv.description","dv.categories","dv.restrict","dv.label","dv.directoryLabel","alma.title","alma.pages","alma.year" 2 | "description","Unique identifier for a Datafile from the organizational perspective.","Unique identifier for a Dataset from the organizational perspective. Relates to the column in datasets.csv.","Filename without path.","Datafile is to be uploaded.","Datafile is uploaded.","Datafile is to be deleted.","Datafile is deleted.","Datafile Metadata is to be updated.","Datafile Metadata is updated.","Datafile ID in Dataverse.","Description for the file","Categories for the file.","File restriction.","Title","Directory, the file should be stored in.","Title for Alma.","Number of pages of work.", 3 | "type","serial","numeric","string","boolean","boolean","boolean","boolean","boolean","boolean","string","string","String, 4 | JSON object [""VALUE"", ""VALUE""] 5 | Labels: ""Documentation"", ""Data"", ""Code""","boolean","string","string","string","string","string" 6 | "example",1,1,"01035_en_q.pdf","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE",634,"My description bbb.","[""Data""]","FALSE","Text Report","data/subdir1","Text Report",23,1997 7 | "multiple",,,,,,,,,,,,,,,,,, 8 | "sub_keys",,,,,,,,,,,,,,,,,, 9 | -------------------------------------------------------------------------------- /pyDataverse/templates/datasets.csv: -------------------------------------------------------------------------------- 1 | "attribute","org.dataset_id","org.dataverse_id","org.doi","org.privateurl","org.to_upload","org.is_uploaded","org.to_publish","org.is_published","org.to_delete","org.is_deleted","org.to_update","org.is_updated","dv.license","dv.termsOfAccess","dv.termsOfUse","dv.otherId","dv.title","dv.subtitle","dv.alternativeTitle","dv.series","dv.notesText","dv.author","dv.dsDescription","dv.subject","dv.keyword","dv.topicClassification","dv.language","dv.grantNumber","dv.dateOfCollection","dv.kindOfData","dv.dataSources","dv.accessToSources","dv.alternativeURL","dv.characteristicOfSources","dv.dateOfDeposit","dv.depositor","dv.distributionDate","dv.otherReferences","dv.productionDate","dv.productionPlace","dv.contributor","dv.relatedDatasets","dv.relatedMaterial","dv.datasetContact","dv.distributor","dv.producer","dv.publication","dv.software","dv.timePeriodCovered","dv.geographicUnit","dv.geographicBoundingBox","dv.geographicCoverage","dv.actionsToMinimizeLoss","dv.cleaningOperations","dv.collectionMode","dv.collectorTraining","dv.controlOperations","dv.dataCollectionSituation","dv.dataCollector","dv.datasetLevelErrorNotes","dv.deviationsFromSampleDesign","dv.frequencyOfDataCollection","dv.otherDataAppraisal","dv.socialScienceNotes","dv.researchInstrument","dv.responseRate","dv.samplingErrorEstimates","dv.samplingProcedure","dv.unitOfAnalysis","dv.universe","dv.timeMethod","dv.weighting","dv.fileAccessRequest" 2 | "description","Unique identifier for a Dataset from the 
organizational perspective.","Unique identifier for the dataverse coming from the organization. Ether dataverse ID or dataverse alias. Relates to column in datverses.csv.","DOI related to the Dataset.","Private URL related to the Dataset.","Is the Dataset to be uploaded.","Is the Dataset uploaded.","Is the Dataset to be published.","Is the Dataset published.","Is the Dataset to be deleted.","Is the Dataset deleted.","Dataset Metadata is to be updated.","Dataset Metadata is updated.","License text.","Terms of Access text.","Terms of Use text.","Other identifier related to the Dataset.","Title","Subtitle","Alternative title.","Series, which the Dataset is part of.",,"Author(s), work creator(s) of the Dataset content.","Description","Subject","Keyword(s)","Topic(s)","Language(s)","Grant number(s)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 3 | "type","serial","serial/string","string","string","boolean","boolean","boolean","boolean","boolean","boolean","boolean","boolean","string","string","string","String 4 | JSON object [{""otherIdAgency"": ""VALUE"", ""otherIdValue"": ""VALUE""}]","string","string","string","string 5 | JSON object 6 | {""seriesName"": ""VALUE"", ""seriesInformation"": ""VALUE""}","string","String 7 | JSON object [{""authorName"": ""VALUE"", ""authorAffiliation"": ""VALUE"", ""authorIdentifierScheme"": ""VALUE"", ""authorIdentifier"": ""VALUE""}]","String 8 | JSON object [{""dsDescriptionValue"": ""VALUE"", ""dsDescriptionDate"": ""VALUE""}]","String, JSON object 9 | [""VALUE"", ""VALUE""]","String, 10 | JSON object [{""keywordValue"": ""VALUE"", ""keywordVocabulary"": ""VALUE"", ""keywordVocabularyURI"": ""VALUE""}]","String, 11 | JSON object [{""topicClassValue"": ""VALUE"", ""topicClassVocab"": ""VALUE"", ""topicClassVocabURI"": ""VALUE""}]","String, JSON object 12 | [""VALUE"", ""VALUE""]","String 13 | JSON object [{""grantNumberAgency"": ""VALUE"", ""grantNumberValue"": ""VALUE""}]","String, 14 | JSON object [{""dateOfCollectionStart"": ""VALUE"", ""dateOfCollectionEnd"": ""VALUE""}] 15 | DateOfCollectionStart & dateOfCollectionEnd → String, YYYY-MM-DD ","string, 16 | JSON object [""VALUE"", ""VALUE""]","string, 17 | JSON object [""VALUE"", ""VALUE""]","string","string; URL","string","string; YYYY-MM-DD","string","string; YYYY-MM-DD","string, 18 | JSON object [""VALUE"", ""VALUE""]","string; YYYY-MM-DD","string","String, 19 | JSON object [{""contributorType"": ""VALUE"", ""contributorName"": ""VALUE""}]","string, 20 | JSON object [""VALUE"", ""VALUE""]","string, 21 | JSON object [""VALUE"", ""VALUE""]","String, 22 | JSON object [{""datasetContactName"": ""VALUE"", ""datasetContactAffiliation"": ""VALUE"", ""datasetContactEmail"": ""VALUE""}]","String, 23 | JSON object [{""distributorName"": ""VALUE"", ""distributorAffiliation"": ""VALUE"", ""distributorAbbreviation"": ""VALUE"", ""distributorURL"": ""VALUE"", ""distributorLogoURL"": ""VALUE""}]","String, 24 | JSON object [{""producerName"": ""VALUE"", ""producerAffiliation"": ""VALUE"", ""producerAbbreviation"": ""VALUE"", ""producerURL"": ""VALUE"", ""producerLogoURL"": ""VALUE""}]","String, 25 | JSON object [{""publicationCitation"": ""VALUE"", ""publicationIDType"": ""VALUE"", ""publicationIDNumber"": ""VALUE"", ""publicationURL"": ""VALUE""}]","String, 26 | JSON object [{""softwareName"": ""VALUE"", ""softwareVersion"": ""VALUE""}]","String, 27 | JSON object [{""timePeriodCoveredStart"": ""VALUE"", ""timePeriodCoveredEnd"": ""VALUE""}]","string, 28 | JSON object [""VALUE"", ""VALUE""]","String, 29 | JSON object 
[{""westLongitude"": ""VALUE"", ""eastLongitude"": ""VALUE"", ""northLongitude"": ""VALUE"", ""southLongitude"": ""VALUE""}]","String, 30 | JSON object [{""country"": ""VALUE"", ""state"": ""VALUE"", ""city"": ""VALUE"", ""otherGeographicCoverage"": ""VALUE""}]","string","string","string","string","string","string","string","string","string","string","string","String 31 | JSON object [{""socialScienceNotesType"": ""VALUE"", ""socialScienceNotesSubject"": ""VALUE"", ""socialScienceNotesText"": ""VALUE""}]","string","string","string","string","string, 32 | JSON object [""VALUE"", ""VALUE""]","string, 33 | JSON object [""VALUE"", ""VALUE""]","string","string","bool" 34 | "example",1,1,"doi:10.11587/19ZW6I",,"TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","CC0","Terms of Access","CC0 Waiver","[{""otherIdAgency"": ""OtherIDAgency1"", ""otherIdValue"": ""OtherIDIdentifier1""}]","Replication Data for: Title","Subtitle","Alternative Title","{""seriesName"": ""SeriesName"", ""seriesInformation"": ""SeriesInformation""}","Notes1","[{""authorName"": ""LastAuthor1, FirstAuthor1"", ""authorAffiliation"": ""AuthorAffiliation1"", ""authorIdentifierScheme"": ""ORCID"", ""authorIdentifier"": ""AuthorIdentifier1""}]","[{""dsDescriptionValue"": ""DescriptionText2"", ""dsDescriptionDate"": ""1000-02-02""}]","[""Agricultural Sciences"", ""Business and Management"", ""Engineering"", ""Law""]","[{""keywordValue"": ""KeywordTerm1"", ""keywordVocabulary"": ""KeywordVocabulary1"", ""keywordVocabularyURI"": ""http://KeywordVocabularyURL1.org""}]","[{""topicClassValue"": ""Topic Class Value1"", ""topicClassVocab"": ""Topic Classification Vocabulary"", ""topicClassVocabURI"": ""http://www.topicURL.net""}]","[""English"", ""German""]","[{""grantNumberAgency"": ""GrantInformationGrantAgency1"", ""grantNumberValue"": ""GrantInformationGrantNumber1""}]","[{""dateOfCollectionStart"": ""1006-01-01"", ""dateOfCollectionEnd"": ""1006-01-01""}]","[""KindOfData1"", ""KindOfData2""]","[""DataSources1"", ""DataSources2""]","DocumentationAndAccessToSources","http://AlternativeURL.org","CharacteristicOfSourcesNoted","1002-01-01","LastDepositor, FirstDepositor","1004-01-01","[""OtherReferences1"", ""OtherReferences2""]","1003-01-01","ProductionPlace","[{""contributorType"": ""Data Collector"", ""contributorName"": ""LastContributor1, FirstContributor1""}]","[""RelatedDatasets1"", ""RelatedDatasets2""]","[""RelatedMaterial1"", ""RelatedMaterial2""]","[{""datasetContactName"": ""LastContact1, FirstContact1"", ""datasetContactAffiliation"": ""ContactAffiliation1"", ""datasetContactEmail"": ""ContactEmail1@mailinator.com""}]","[{""distributorName"": ""LastDistributor1, FirstDistributor1"", ""distributorAffiliation"": ""DistributorAffiliation1"", ""distributorAbbreviation"": ""DistributorAbbreviation1"", ""distributorURL"": ""http://DistributorURL1.org"", ""distributorLogoURL"": ""http://DistributorLogoURL1.org""}]","[{""producerName"": ""LastProducer1, FirstProducer1"", ""producerAffiliation"": ""ProducerAffiliation1"", ""producerAbbreviation"": ""ProducerAbbreviation1"", ""producerURL"": ""http://ProducerURL1.org"", ""producerLogoURL"": ""http://ProducerLogoURL1.org""}]","[{""publicationCitation"": ""RelatedPublicationCitation1"", ""publicationIDType"": ""ark"", ""publicationIDNumber"": ""RelatedPublicationIDNumber1"", ""publicationURL"": ""http://RelatedPublicationURL1.org""}]","[{""softwareName"": ""SoftwareName1"", ""softwareVersion"": ""SoftwareVersion1""}]","[{""timePeriodCoveredStart"": ""1005-01-01"", 
""timePeriodCoveredEnd"": ""1005-01-02""}]","[""GeographicUnit1"", ""GeographicUnit2""]","[{""westLongitude"": ""10"", ""eastLongitude"": ""20"", ""northLongitude"": ""30"", ""southLongitude"": ""40""}]","[{""country"": ""Afghanistan"", ""state"": ""GeographicCoverageStateProvince1"", ""city"": ""GeographicCoverageCity1"", ""otherGeographicCoverage"": ""GeographicCoverageOther1""}]","ActionsToMinimizeLosses","CleaningOperations","CollectionMode","CollectorTraining","ControlOperations","CharacteristicsOfDataCollectionSituation","LastDataCollector1, FirstDataCollector1","StudyLevelErrorNotes","MajorDeviationsForSampleDesign","Frequency","OtherFormsOfDataAppraisal","[{""socialScienceNotesType"": ""NotesType"", ""socialScienceNotesSubject"": ""NotesSubject"", ""socialScienceNotesText"": ""NotesText""}]","TypeOfResearchInstrument","ResponseRate","EstimatesOfSamplingError","SamplingProcedure","[""UnitOfAnalysis1"", ""UnitOfAnalysis2""]","[""Universe1"", ""Universe2""]","TimeMethod","Weighting","True" 35 | "multiple",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 36 | "sub_keys",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 37 | -------------------------------------------------------------------------------- /pyDataverse/templates/dataverses.csv: -------------------------------------------------------------------------------- 1 | "attribute","org.dataverse_id","org.to_upload","org.is_uploaded","org.to_publish","org.is_published","org.to_delete","org.is_deleted","org.to_update","org.is_updated","dv.dataverse_id","dv.dataverse","dv.affiliation","dv.alias","dv.dataverseContacts","dv.dataverseType","dv.description","dv.name" 2 | "description","Unique identifier for the dataverse coming from the organization.","Dataverse should be uploaded.","Dataverse is uploaded.","Dataverse is to be published.","Dataverse is published.","Dataverse is to be published.","Dataverse is published.","Dataverse Metadata is to be updated.","Dataverse Metadata is updated.","Dataverse ID in Dataverse.","Parent dataverse.","Affiliation of Dataverse","Alias of Dataverse","Contact Email(s) of Dataverse.","Type of Dataverse.","Description text of Dataverse.","Name of Dataverse." 3 | "type","string","boolean","boolean","boolean","boolean","boolean","boolean","boolean","boolean","number","string","string","string","string","string","string","string" 4 | "example",1,"TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE",31,"public","Scientific Research University","science","[{""contactEmail"": ""pi@example.edu""}, {""contactEmail"": ""student@ex ample.edu""}]","LABORATORY","We do all the science.","Scientific Research" 5 | "multiple","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","TRUE","FALSE","FALSE","FALSE" 6 | "sub_keys","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","contactEmail (format: email)","NULL","NULL","NULL" 7 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "pyDataverse" 3 | version = "0.3.4" 4 | description = "A Python module for Dataverse." 
5 | authors = [ 6 | "Stefan Kasberger ", 7 | "Jan Range ", 8 | ] 9 | license = "MIT" 10 | readme = "README.md" 11 | repository = "https://github.com/gdcc/pyDataverse" 12 | packages = [{ include = "pyDataverse" }] 13 | 14 | [tool.poetry.dependencies] 15 | python = "^3.8.1" 16 | httpx = "^0.27.0" 17 | jsonschema = "^4.21.1" 18 | 19 | [tool.poetry.group.dev] 20 | optional = true 21 | 22 | [tool.poetry.group.dev.dependencies] 23 | black = "^24.3.0" 24 | radon = "^6.0.1" 25 | mypy = "^1.9.0" 26 | autopep8 = "^2.1.0" 27 | pydocstyle = "^6.3.0" 28 | pygments = "^2.17.2" 29 | pytest = "^8.1.1" 30 | pytest-cov = "^5.0.0" 31 | tox = "^4.14.2" 32 | selenium = "^4.19.0" 33 | wheel = "^0.43.0" 34 | pre-commit = "3.5.0" 35 | sphinx = "7.1.2" 36 | restructuredtext-lint = "^1.4.0" 37 | rstcheck = "^6.2.1" 38 | ruff = "^0.4.4" 39 | 40 | 41 | [tool.poetry.group.tests] 42 | optional = true 43 | 44 | [tool.poetry.group.tests.dependencies] 45 | pytest = "^8.1.1" 46 | pytest-cov = "^5.0.0" 47 | pytest-asyncio = "^0.23.7" 48 | tox = "^4.14.2" 49 | selenium = "^4.19.0" 50 | 51 | [tool.poetry.group.docs] 52 | optional = true 53 | 54 | [tool.poetry.group.docs.dependencies] 55 | sphinx = "7.1.2" 56 | pydocstyle = "^6.3.0" 57 | restructuredtext-lint = "^1.4.0" 58 | pygments = "^2.17.2" 59 | rstcheck = "^6.2.1" 60 | 61 | [tool.poetry.group.lint] 62 | optional = true 63 | 64 | [tool.poetry.group.lint.dependencies] 65 | black = "^24.3.0" 66 | radon = "^6.0.1" 67 | mypy = "^1.9.0" 68 | types-jsonschema = "^4.23.0" 69 | autopep8 = "^2.1.0" 70 | ruff = "^0.4.4" 71 | 72 | [build-system] 73 | requires = ["poetry-core"] 74 | build-backend = "poetry.core.masonry.api" 75 | 76 | [tool.pytest.ini_options] 77 | addopts = ["-v", "--cov=pyDataverse"] 78 | 79 | [tool.coverage.run] 80 | source = "tests" 81 | 82 | [tool.coverage.report] 83 | show_missing = true 84 | 85 | [tool.radon] 86 | cc_min = "B" 87 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # Requirements 2 | httpx==0.27.0 3 | jsonschema==4.21.1 4 | -------------------------------------------------------------------------------- /run-tests.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Parse arguments 4 | usage() { 5 | echo "Usage: $0 [-p Python version (e.g. 3.10, 3.11, ...)]" 1>&2 6 | exit 1 7 | } 8 | 9 | while getopts ":p:d:" o; do 10 | case "${o}" in 11 | p) 12 | p=${OPTARG} 13 | ;; 14 | *) ;; 15 | esac 16 | done 17 | shift $((OPTIND - 1)) 18 | 19 | # Fall back to Python 3.11 if no Python version is specified 20 | if [ -z "${p}" ]; then 21 | printf "\n⚠️ No Python version specified, falling back to '3.11'\n" 22 | p=3.11 23 | fi 24 | 25 | # Validate Python version 26 | if [[ ! "${p}" =~ ^3\.[0-9]+$ ]]; then 27 | printf "\n❌ Invalid Python version. Please specify a valid Python version (e.g. 3.10, 3.11, ...)\n" 28 | exit 1 29 | fi 30 | 31 | # Check if Docker is installed 32 | if ! command -v docker &>/dev/null; then 33 | echo "✋ Docker is not installed. Please install Docker before running this script."
34 | exit 1 35 | fi 36 | 37 | # Prepare the environment for the test 38 | mkdir dv >>/dev/null 2>&1 39 | touch dv/bootstrap.exposed.env >>/dev/null 2>&1 40 | 41 | # Add python version to the environment 42 | export PYTHON_VERSION=${p} 43 | 44 | printf "\n🚀 Preparing containers\n" 45 | printf " Using PYTHON_VERSION=${p}\n\n" 46 | 47 | # Run all containers 48 | docker compose \ 49 | -f docker/docker-compose-base.yml \ 50 | -f ./docker/docker-compose-test-all.yml \ 51 | --env-file local-test.env \ 52 | up -d 53 | 54 | printf "\n🔎 Running pyDataverse tests\n" 55 | printf " Logs will be printed once finished...\n\n" 56 | 57 | # Check if "unit-test" container has finished 58 | while [ -n "$(docker ps -f "name=unit-tests" -f "status=running" -q)" ]; do 59 | printf " Waiting for unit-tests container to finish...\n" 60 | sleep 5 61 | done 62 | 63 | # Check if "unit-test" container has failed 64 | if [ "$(docker inspect -f '{{.State.ExitCode}}' unit-tests)" -ne 0 ]; then 65 | printf "\n❌ Unit tests failed. Printing logs...\n" 66 | docker logs unit-tests 67 | printf "\n Stopping containers\n" 68 | docker compose \ 69 | -f docker/docker-compose-base.yml \ 70 | -f ./docker/docker-compose-test-all.yml \ 71 | --env-file local-test.env \ 72 | down 73 | exit 1 74 | fi 75 | 76 | # Print test results 77 | printf "\n" 78 | cat dv/unit-tests.log 79 | printf "\n\n✅ Unit tests passed\n\n" 80 | 81 | # Stop all containers 82 | docker compose \ 83 | -f docker/docker-compose-base.yml \ 84 | -f ./docker/docker-compose-test-all.yml \ 85 | --env-file local-test.env \ 86 | down 87 | printf "\n🎉 Done\n\n" 88 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/__init__.py -------------------------------------------------------------------------------- /tests/api/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/api/__init__.py -------------------------------------------------------------------------------- /tests/api/test_access.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import httpx 4 | 5 | from pyDataverse.api import DataAccessApi 6 | 7 | 8 | class TestDataAccess: 9 | def test_get_data_by_id(self): 10 | """Tests getting data file by id.""" 11 | 12 | # Arrange 13 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 14 | API_TOKEN = os.getenv("API_TOKEN") 15 | 16 | assert BASE_URL is not None, "BASE_URL is not set" 17 | assert API_TOKEN is not None, "API_TOKEN is not set" 18 | 19 | # Create dataset 20 | metadata = json.load(open("tests/data/file_upload_ds_minimum.json")) 21 | pid = self._create_dataset(BASE_URL, API_TOKEN, metadata) 22 | api = DataAccessApi(BASE_URL, API_TOKEN) 23 | 24 | # Upload a file 25 | self._upload_datafile(BASE_URL, API_TOKEN, pid) 26 | 27 | # Retrieve the file ID 28 | file_id = self._get_file_id(BASE_URL, API_TOKEN, pid) 29 | 30 | # Act 31 | response = api.get_datafile(file_id, is_pid=False) 32 | response.raise_for_status() 33 | content = response.content.decode("utf-8") 34 | 35 | # Assert 36 | expected = open("tests/data/datafile.txt").read() 37 | assert content == expected, "Data retrieval failed." 
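# --- Editor's note (hedged sketch, not part of the upstream test suite) ---------------------
# The retrieval flow exercised above can also be reproduced outside pytest. A minimal example
# that uses only calls already shown in this file (DataAccessApi, get_datafile); the instance
# URL and the datafile id 42 are illustrative placeholders, not values from the repository:
#
#     from pyDataverse.api import DataAccessApi
#
#     api = DataAccessApi("https://demo.dataverse.org")  # an API token is only needed for restricted files
#     response = api.get_datafile(42, is_pid=False)      # 42 = assumed database id of a datafile
#     response.raise_for_status()
#     print(response.content.decode("utf-8"))
# ---------------------------------------------------------------------------------------------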
38 | 39 | def test_get_data_by_pid(self): 40 | """Tests getting data file by id. 41 | 42 | Test runs with a PID instead of a file ID from Harvard. 43 | No PID given if used within local containers 44 | 45 | TODO - Check if possible with containers 46 | """ 47 | 48 | # Arrange 49 | BASE_URL = "https://dataverse.harvard.edu" 50 | pid = "doi:10.7910/DVN/26093/IGA4JD" 51 | api = DataAccessApi(BASE_URL) 52 | 53 | # Act 54 | response = api.get_datafile(pid, is_pid=True) 55 | response.raise_for_status() 56 | content = response.content 57 | 58 | # Assert 59 | expected = self._get_file_content(BASE_URL, pid) 60 | assert content == expected, "Data retrieval failed." 61 | 62 | @staticmethod 63 | def _create_dataset( 64 | BASE_URL: str, 65 | API_TOKEN: str, 66 | metadata: dict, 67 | ): 68 | """ 69 | Create a dataset in the Dataverse. 70 | 71 | Args: 72 | BASE_URL (str): The base URL of the Dataverse instance. 73 | API_TOKEN (str): The API token for authentication. 74 | metadata (dict): The metadata for the dataset. 75 | 76 | Returns: 77 | str: The persistent identifier (PID) of the created dataset. 78 | """ 79 | url = f"{BASE_URL}/api/dataverses/root/datasets" 80 | response = httpx.post( 81 | url=url, 82 | json=metadata, 83 | headers={ 84 | "X-Dataverse-key": API_TOKEN, 85 | "Content-Type": "application/json", 86 | }, 87 | ) 88 | 89 | response.raise_for_status() 90 | 91 | return response.json()["data"]["persistentId"] 92 | 93 | @staticmethod 94 | def _get_file_id( 95 | BASE_URL: str, 96 | API_TOKEN: str, 97 | pid: str, 98 | ): 99 | """Retrieves a file ID for a given persistent identifier (PID) in Dataverse.""" 100 | 101 | response = httpx.get( 102 | url=f"{BASE_URL}/api/datasets/:persistentId/?persistentId={pid}", 103 | headers={ 104 | "X-Dataverse-key": API_TOKEN, 105 | "Content-Type": "application/json", 106 | }, 107 | ) 108 | 109 | response.raise_for_status() 110 | 111 | return response.json()["data"]["latestVersion"]["files"][0]["dataFile"]["id"] 112 | 113 | @staticmethod 114 | def _upload_datafile( 115 | BASE_URL: str, 116 | API_TOKEN: str, 117 | pid: str, 118 | ): 119 | """Uploads a file to Dataverse""" 120 | 121 | url = f"{BASE_URL}/api/datasets/:persistentId/add?persistentId={pid}" 122 | response = httpx.post( 123 | url=url, 124 | files={"file": open("tests/data/datafile.txt", "rb")}, 125 | headers={ 126 | "X-Dataverse-key": API_TOKEN, 127 | }, 128 | ) 129 | 130 | response.raise_for_status() 131 | 132 | @staticmethod 133 | def _get_file_content( 134 | BASE_URL: str, 135 | pid: str, 136 | ): 137 | """Retrieves the file content for testing purposes.""" 138 | 139 | response = httpx.get( 140 | url=f"{BASE_URL}/api/access/datafile/:persistentId/?persistentId={pid}", 141 | follow_redirects=True, 142 | ) 143 | 144 | response.raise_for_status() 145 | 146 | return response.content 147 | -------------------------------------------------------------------------------- /tests/api/test_api.py: -------------------------------------------------------------------------------- 1 | import os 2 | import httpx 3 | import pytest 4 | from httpx import Response 5 | from time import sleep 6 | from pyDataverse.api import DataAccessApi, NativeApi, SwordApi 7 | from pyDataverse.auth import ApiTokenAuth 8 | from pyDataverse.exceptions import ApiAuthorizationError 9 | from pyDataverse.exceptions import ApiUrlError 10 | from pyDataverse.models import Dataset 11 | from pyDataverse.utils import read_file 12 | from ..conftest import test_config 13 | 14 | 15 | BASE_DIR = 
os.path.dirname(os.path.dirname(os.path.abspath(os.path.dirname(__file__)))) 16 | 17 | 18 | class TestApiConnect(object): 19 | """Test the NativeApi() class initialization.""" 20 | 21 | def test_api_connect(self, native_api): 22 | sleep(test_config["wait_time"]) 23 | 24 | assert isinstance(native_api, NativeApi) 25 | assert not native_api.api_token 26 | assert native_api.api_version == "v1" 27 | assert native_api.base_url == os.getenv("BASE_URL").rstrip("/") 28 | assert native_api.base_url_api_native == "{0}/api/{1}".format( 29 | os.getenv("BASE_URL").rstrip("/"), native_api.api_version 30 | ) 31 | 32 | def test_api_connect_base_url_wrong(self): 33 | """Test native_api connection with wrong `base_url`.""" 34 | # None 35 | with pytest.raises(ApiUrlError): 36 | NativeApi(None) 37 | 38 | 39 | class TestApiTokenAndAuthBehavior: 40 | def test_api_token_none_and_auth_none(self): 41 | api = NativeApi("https://demo.dataverse.org") 42 | assert api.api_token is None 43 | assert api.auth is None 44 | 45 | def test_api_token_none_and_auth(self): 46 | auth = ApiTokenAuth("mytoken") 47 | api = NativeApi("https://demo.dataverse.org", auth=auth) 48 | assert api.api_token is None 49 | assert api.auth is auth 50 | 51 | def test_api_token_and_auth(self): 52 | auth = ApiTokenAuth("mytoken") 53 | # Only one, api_token or auth, should be specified 54 | with pytest.warns(UserWarning): 55 | api = NativeApi( 56 | "https://demo.dataverse.org", api_token="sometoken", auth=auth 57 | ) 58 | assert api.api_token is None 59 | assert api.auth is auth 60 | 61 | def test_api_token_and_auth_none(self): 62 | api_token = "mytoken" 63 | api = NativeApi("https://demo.dataverse.org", api_token) 64 | assert api.api_token == api_token 65 | assert isinstance(api.auth, ApiTokenAuth) 66 | assert api.auth.api_token == api_token 67 | 68 | 69 | class TestApiRequests(object): 70 | """Test the native_api requests.""" 71 | 72 | dataset_id = None 73 | 74 | @classmethod 75 | def setup_class(cls): 76 | """Create the native_api connection for later use.""" 77 | cls.dataverse_id = "test-pyDataverse" 78 | cls.dataset_id = None 79 | 80 | def test_get_request(self, native_api): 81 | """Test successful `.get_request()` request.""" 82 | # TODO: test params und auth default 83 | base_url = os.getenv("BASE_URL").rstrip("/") 84 | query_str = base_url + "/api/v1/info/server" 85 | resp = native_api.get_request(query_str) 86 | sleep(test_config["wait_time"]) 87 | 88 | assert isinstance(resp, Response) 89 | 90 | def test_get_dataverse(self, native_api): 91 | """Test successful `.get_dataverse()` request`.""" 92 | resp = native_api.get_dataverse(":root") 93 | sleep(test_config["wait_time"]) 94 | 95 | assert isinstance(resp, Response) 96 | 97 | 98 | if not os.environ.get("TRAVIS"): 99 | 100 | class TestApiToken(object): 101 | """Test user rights.""" 102 | 103 | def test_token_missing(self): 104 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 105 | api = NativeApi(BASE_URL) 106 | resp = api.get_info_version() 107 | assert resp.json()["data"]["version"] == os.getenv("DV_VERSION") 108 | # assert resp.json()["data"]["build"] == "267-a91d370" 109 | 110 | with pytest.raises(ApiAuthorizationError): 111 | ds = Dataset() 112 | ds.from_json( 113 | read_file( 114 | os.path.join( 115 | BASE_DIR, "tests/data/dataset_upload_min_default.json" 116 | ) 117 | ) 118 | ) 119 | api.create_dataset(":root", ds.json()) 120 | 121 | def test_token_empty_string(self): 122 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 123 | api = NativeApi(BASE_URL, "") 124 | resp = 
api.get_info_version() 125 | assert resp.json()["data"]["version"] == os.getenv("DV_VERSION") 126 | # assert resp.json()["data"]["build"] == "267-a91d370" 127 | 128 | with pytest.raises(ApiAuthorizationError): 129 | ds = Dataset() 130 | ds.from_json( 131 | read_file( 132 | os.path.join( 133 | BASE_DIR, "tests/data/dataset_upload_min_default.json" 134 | ) 135 | ) 136 | ) 137 | api.create_dataset(":root", ds.json()) 138 | 139 | # def test_token_no_rights(self): 140 | # BASE_URL = os.getenv("BASE_URL") 141 | # API_TOKEN = os.getenv("API_TOKEN_NO_RIGHTS") 142 | # api = NativeApi(BASE_URL, API_TOKEN) 143 | # resp = api.get_info_version() 144 | # assert resp.json()["data"]["version"] == os.getenv("DV_VERSION") 145 | # assert resp.json()["data"]["build"] == "267-a91d370" 146 | 147 | # with pytest.raises(ApiAuthorizationError): 148 | # ds = Dataset() 149 | # ds.from_json( 150 | # read_file( 151 | # os.path.join( 152 | # BASE_DIR, "tests/data/dataset_upload_min_default.json" 153 | # ) 154 | # ) 155 | # ) 156 | # api.create_dataset(":root", ds.json()) 157 | 158 | def test_token_right_create_dataset_rights(self): 159 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 160 | api_su = NativeApi(BASE_URL, os.getenv("API_TOKEN_SUPERUSER")) 161 | # api_nru = NativeApi(BASE_URL, os.getenv("API_TOKEN_TEST_NO_RIGHTS")) 162 | 163 | resp = api_su.get_info_version() 164 | assert resp.json()["data"]["version"] == os.getenv("DV_VERSION") 165 | # assert resp.json()["data"]["build"] == "267-a91d370" 166 | # resp = api_nru.get_info_version() 167 | # assert resp.json()["data"]["version"] == os.getenv("DV_VERSION") 168 | # assert resp.json()["data"]["build"] == "267-a91d370" 169 | 170 | ds = Dataset() 171 | ds.from_json( 172 | read_file( 173 | os.path.join(BASE_DIR, "tests/data/dataset_upload_min_default.json") 174 | ) 175 | ) 176 | resp = api_su.create_dataset(":root", ds.json()) 177 | pid = resp.json()["data"]["persistentId"] 178 | assert resp.json()["status"] == "OK" 179 | 180 | # with pytest.raises(ApiAuthorizationError): 181 | # resp = api_nru.get_dataset(pid) 182 | 183 | resp = api_su.delete_dataset(pid) 184 | assert resp.json()["status"] == "OK" 185 | 186 | def test_token_should_not_be_exposed_on_error(self): 187 | BASE_URL = os.getenv("BASE_URL") 188 | API_TOKEN = os.getenv("API_TOKEN") 189 | api = DataAccessApi(BASE_URL, API_TOKEN) 190 | 191 | result = api.get_datafile("does-not-exist").json() 192 | assert API_TOKEN not in result["requestUrl"] 193 | 194 | @pytest.mark.parametrize( 195 | "auth", (True, False, "api-token", ApiTokenAuth("some-token")) 196 | ) 197 | def test_using_auth_on_individual_requests_is_deprecated(self, auth): 198 | BASE_URL = os.getenv("BASE_URL") 199 | API_TOKEN = os.getenv("API_TOKEN") 200 | api = DataAccessApi(BASE_URL, auth=ApiTokenAuth(API_TOKEN)) 201 | with pytest.warns(DeprecationWarning): 202 | api.get_datafile("does-not-exist", auth=auth) 203 | 204 | @pytest.mark.parametrize( 205 | "auth", (True, False, "api-token", ApiTokenAuth("some-token")) 206 | ) 207 | def test_using_auth_on_individual_requests_is_deprecated_unauthorized( 208 | self, auth 209 | ): 210 | BASE_URL = os.getenv("BASE_URL") 211 | no_auth_api = DataAccessApi(BASE_URL) 212 | with pytest.warns(DeprecationWarning): 213 | no_auth_api.get_datafile("does-not-exist", auth=auth) 214 | 215 | def test_sword_api_requires_http_basic_auth(self): 216 | BASE_URL = os.getenv("BASE_URL") 217 | API_TOKEN = os.getenv("API_TOKEN") 218 | api = SwordApi(BASE_URL, api_token=API_TOKEN) 219 | assert isinstance(api.auth, httpx.BasicAuth) 220 | 
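# --- Editor's note (hedged sketch, not part of the upstream test suite) ---------------------
# As the surrounding tests show, SwordApi wraps the API token in HTTP Basic auth. A minimal
# usage example built only from calls that appear in this class (SwordApi, get_service_document);
# the base URL and token are placeholders:
#
#     from pyDataverse.api import SwordApi
#
#     api = SwordApi("https://demo.dataverse.org", api_token="xxxxxxxx")  # placeholder token
#     response = api.get_service_document()  # the SWORD service document lists collections open for deposit
#     print(response.status_code)
# ---------------------------------------------------------------------------------------------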
221 | def test_sword_api_can_authenticate(self): 222 | BASE_URL = os.getenv("BASE_URL") 223 | API_TOKEN = os.getenv("API_TOKEN") 224 | api = SwordApi(BASE_URL, api_token=API_TOKEN) 225 | response = api.get_service_document() 226 | assert response.status_code == 200 227 | 228 | def test_sword_api_cannot_authenticate_without_token(self): 229 | BASE_URL = os.getenv("BASE_URL") 230 | api = SwordApi(BASE_URL) 231 | with pytest.raises(ApiAuthorizationError): 232 | api.get_service_document() 233 | -------------------------------------------------------------------------------- /tests/api/test_async_api.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import pytest 3 | 4 | 5 | class TestAsyncAPI: 6 | @pytest.mark.asyncio 7 | async def test_async_api(self, native_api): 8 | async with native_api: 9 | tasks = [native_api.get_info_version() for _ in range(10)] 10 | responses = await asyncio.gather(*tasks) 11 | 12 | assert len(responses) == 10 13 | for response in responses: 14 | assert response.status_code == 200, "Request failed." 15 | -------------------------------------------------------------------------------- /tests/api/test_upload.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import tempfile 4 | 5 | import httpx 6 | 7 | from pyDataverse.api import DataAccessApi, NativeApi 8 | from pyDataverse.models import Datafile 9 | 10 | 11 | class TestFileUpload: 12 | def test_file_upload(self): 13 | """ 14 | Test case for uploading a file to a dataset. 15 | 16 | This test case performs the following steps: 17 | 1. Creates a dataset using the provided metadata. 18 | 2. Prepares a file for upload. 19 | 3. Uploads the file to the dataset. 20 | 4. Asserts that the file upload was successful. 21 | 22 | Raises: 23 | AssertionError: If the file upload fails. 24 | 25 | """ 26 | # Arrange 27 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 28 | API_TOKEN = os.getenv("API_TOKEN") 29 | 30 | # Create dataset 31 | metadata = json.load(open("tests/data/file_upload_ds_minimum.json")) 32 | pid = self._create_dataset(BASE_URL, API_TOKEN, metadata) 33 | api = NativeApi(BASE_URL, API_TOKEN) 34 | 35 | # Prepare file upload 36 | df = Datafile({"pid": pid, "filename": "datafile.txt"}) 37 | 38 | # Act 39 | response = api.upload_datafile( 40 | identifier=pid, 41 | filename="tests/data/datafile.txt", 42 | json_str=df.json(), 43 | ) 44 | 45 | # Assert 46 | assert response.status_code == 200, "File upload failed." 47 | 48 | def test_file_upload_without_metadata(self): 49 | """ 50 | Test case for uploading a file to a dataset without metadata. 51 | 52 | --> json_str will be set as None 53 | 54 | This test case performs the following steps: 55 | 1. Creates a dataset using the provided metadata. 56 | 2. Prepares a file for upload. 57 | 3. Uploads the file to the dataset. 58 | 4. Asserts that the file upload was successful. 59 | 60 | Raises: 61 | AssertionError: If the file upload fails. 
62 | 63 | """ 64 | # Arrange 65 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 66 | API_TOKEN = os.getenv("API_TOKEN") 67 | 68 | # Create dataset 69 | metadata = json.load(open("tests/data/file_upload_ds_minimum.json")) 70 | pid = self._create_dataset(BASE_URL, API_TOKEN, metadata) 71 | api = NativeApi(BASE_URL, API_TOKEN) 72 | 73 | # Act 74 | response = api.upload_datafile( 75 | identifier=pid, 76 | filename="tests/data/datafile.txt", 77 | json_str=None, 78 | ) 79 | 80 | # Assert 81 | assert response.status_code == 200, "File upload failed." 82 | 83 | def test_bulk_file_upload(self, create_mock_file): 84 | """ 85 | Test case for uploading bulk files to a dataset. 86 | 87 | This test is meant to check the performance of the file upload feature 88 | and that nothing breaks when uploading multiple files in line. 89 | 90 | This test case performs the following steps: 91 | 0. Create 50 mock files. 92 | 1. Creates a dataset using the provided metadata. 93 | 2. Prepares a file for upload. 94 | 3. Uploads the file to the dataset. 95 | 4. Asserts that the file upload was successful. 96 | 97 | Raises: 98 | AssertionError: If the file upload fails. 99 | 100 | """ 101 | # Arrange 102 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 103 | API_TOKEN = os.getenv("API_TOKEN") 104 | 105 | # Create dataset 106 | metadata = json.load(open("tests/data/file_upload_ds_minimum.json")) 107 | pid = self._create_dataset(BASE_URL, API_TOKEN, metadata) 108 | api = NativeApi(BASE_URL, API_TOKEN) 109 | 110 | with tempfile.TemporaryDirectory() as tmp_dir: 111 | # Create mock files 112 | mock_files = [ 113 | create_mock_file( 114 | filename=f"mock_file_{i}.txt", 115 | dir=tmp_dir, 116 | size=1024**2, # 1MB 117 | ) 118 | for i in range(50) 119 | ] 120 | 121 | for mock_file in mock_files: 122 | # Prepare file upload 123 | df = Datafile({"pid": pid, "filename": os.path.basename(mock_file)}) 124 | 125 | # Act 126 | response = api.upload_datafile( 127 | identifier=pid, 128 | filename=mock_file, 129 | json_str=df.json(), 130 | ) 131 | 132 | # Assert 133 | assert response.status_code == 200, "File upload failed." 134 | 135 | def test_file_replacement_wo_metadata(self): 136 | """ 137 | Test case for replacing a file in a dataset without metadata. 138 | 139 | Steps: 140 | 1. Create a dataset using the provided metadata. 141 | 2. Upload a datafile to the dataset. 142 | 3. Replace the uploaded datafile with a mutated version. 143 | 4. Verify that the file replacement was successful and the content matches the expected content. 
144 | """ 145 | 146 | # Arrange 147 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 148 | API_TOKEN = os.getenv("API_TOKEN") 149 | 150 | # Create dataset 151 | metadata = json.load(open("tests/data/file_upload_ds_minimum.json")) 152 | pid = self._create_dataset(BASE_URL, API_TOKEN, metadata) 153 | api = NativeApi(BASE_URL, API_TOKEN) 154 | data_api = DataAccessApi(BASE_URL, API_TOKEN) 155 | 156 | # Perform file upload 157 | df = Datafile({"pid": pid, "filename": "datafile.txt"}) 158 | response = api.upload_datafile( 159 | identifier=pid, 160 | filename="tests/data/replace.xyz", 161 | json_str=df.json(), 162 | ) 163 | 164 | # Retrieve file ID 165 | file_id = response.json()["data"]["files"][0]["dataFile"]["id"] 166 | 167 | # Act 168 | with tempfile.TemporaryDirectory() as tempdir: 169 | original = open("tests/data/replace.xyz").read() 170 | mutated = "Z" + original[1::] 171 | mutated_path = os.path.join(tempdir, "replace.xyz") 172 | 173 | with open(mutated_path, "w") as f: 174 | f.write(mutated) 175 | 176 | json_data = {} 177 | 178 | response = api.replace_datafile( 179 | identifier=file_id, 180 | filename=mutated_path, 181 | json_str=json.dumps(json_data), 182 | is_filepid=False, 183 | ) 184 | 185 | # Assert 186 | file_id = response.json()["data"]["files"][0]["dataFile"]["id"] 187 | content = data_api.get_datafile(file_id, is_pid=False).text 188 | 189 | assert response.status_code == 200, "File replacement failed." 190 | assert content == mutated, "File content does not match the expected content." 191 | 192 | def test_file_replacement_w_metadata(self): 193 | """ 194 | Test case for replacing a file in a dataset with metadata. 195 | 196 | Steps: 197 | 1. Create a dataset using the provided metadata. 198 | 2. Upload a datafile to the dataset. 199 | 3. Replace the uploaded datafile with a mutated version. 200 | 4. Verify that the file replacement was successful and the content matches the expected content. 
201 | """ 202 | 203 | # Arrange 204 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 205 | API_TOKEN = os.getenv("API_TOKEN") 206 | 207 | # Create dataset 208 | metadata = json.load(open("tests/data/file_upload_ds_minimum.json")) 209 | pid = self._create_dataset(BASE_URL, API_TOKEN, metadata) 210 | api = NativeApi(BASE_URL, API_TOKEN) 211 | data_api = DataAccessApi(BASE_URL, API_TOKEN) 212 | 213 | # Perform file upload 214 | df = Datafile({"pid": pid, "filename": "datafile.txt"}) 215 | response = api.upload_datafile( 216 | identifier=pid, 217 | filename="tests/data/replace.xyz", 218 | json_str=df.json(), 219 | ) 220 | 221 | # Retrieve file ID 222 | file_id = response.json()["data"]["files"][0]["dataFile"]["id"] 223 | 224 | # Act 225 | with tempfile.TemporaryDirectory() as tempdir: 226 | original = open("tests/data/replace.xyz").read() 227 | mutated = "Z" + original[1::] 228 | mutated_path = os.path.join(tempdir, "replace.xyz") 229 | 230 | with open(mutated_path, "w") as f: 231 | f.write(mutated) 232 | 233 | json_data = { 234 | "description": "My description.", 235 | "categories": ["Data"], 236 | "forceReplace": False, 237 | "directoryLabel": "some/other", 238 | } 239 | 240 | response = api.replace_datafile( 241 | identifier=file_id, 242 | filename=mutated_path, 243 | json_str=json.dumps(json_data), 244 | is_filepid=False, 245 | ) 246 | 247 | # Assert 248 | file_id = response.json()["data"]["files"][0]["dataFile"]["id"] 249 | data_file = api.get_dataset(pid).json()["data"]["latestVersion"]["files"][0] 250 | content = data_api.get_datafile(file_id, is_pid=False).text 251 | 252 | assert ( 253 | data_file["description"] == "My description." 254 | ), "Description does not match." 255 | assert data_file["categories"] == ["Data"], "Categories do not match." 256 | assert ( 257 | data_file["directoryLabel"] == "some/other" 258 | ), "Directory label does not match." 259 | assert response.status_code == 200, "File replacement failed." 260 | assert content == mutated, "File content does not match the expected content." 261 | 262 | @staticmethod 263 | def _create_dataset( 264 | BASE_URL: str, 265 | API_TOKEN: str, 266 | metadata: dict, 267 | ): 268 | """ 269 | Create a dataset in the Dataverse. 270 | 271 | Args: 272 | BASE_URL (str): The base URL of the Dataverse instance. 273 | API_TOKEN (str): The API token for authentication. 274 | metadata (dict): The metadata for the dataset. 275 | 276 | Returns: 277 | str: The persistent identifier (PID) of the created dataset. 
278 | """ 279 | url = f"{BASE_URL}/api/dataverses/root/datasets" 280 | response = httpx.post( 281 | url=url, 282 | json=metadata, 283 | headers={ 284 | "X-Dataverse-key": API_TOKEN, 285 | "Content-Type": "application/json", 286 | }, 287 | ) 288 | 289 | response.raise_for_status() 290 | 291 | return response.json()["data"]["persistentId"] 292 | -------------------------------------------------------------------------------- /tests/auth/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/auth/__init__.py -------------------------------------------------------------------------------- /tests/auth/test_auth.py: -------------------------------------------------------------------------------- 1 | import uuid 2 | 3 | import pytest 4 | from httpx import Request 5 | 6 | from pyDataverse.auth import ApiTokenAuth, BearerTokenAuth 7 | from pyDataverse.exceptions import ApiAuthorizationError 8 | 9 | 10 | class TestApiTokenAuth: 11 | def test_token_header_is_added_during_auth_flow(self): 12 | api_token = str(uuid.uuid4()) 13 | auth = ApiTokenAuth(api_token) 14 | request = Request("GET", "https://example.org") 15 | assert "X-Dataverse-key" not in request.headers 16 | modified_request = next(auth.auth_flow(request)) 17 | assert "X-Dataverse-key" in modified_request.headers 18 | assert modified_request.headers["X-Dataverse-key"] == api_token 19 | 20 | @pytest.mark.parametrize( 21 | "non_str_token", (123, object(), lambda x: x, 1.423, b"123", uuid.uuid4()) 22 | ) 23 | def test_raise_if_token_is_not_str(self, non_str_token): 24 | with pytest.raises(ApiAuthorizationError): 25 | ApiTokenAuth(non_str_token) 26 | 27 | 28 | class TestBearerTokenAuth: 29 | def test_authorization_header_is_added_during_auth_flow(self): 30 | # Token as shown in RFC 6750 31 | bearer_token = "mF_9.B5f-4.1JqM" 32 | auth = BearerTokenAuth(bearer_token) 33 | request = Request("GET", "https://example.org") 34 | assert "Authorization" not in request.headers 35 | modified_request = next(auth.auth_flow(request)) 36 | assert "Authorization" in modified_request.headers 37 | assert modified_request.headers["Authorization"] == f"Bearer {bearer_token}" 38 | 39 | @pytest.mark.parametrize( 40 | "non_str_token", (123, object(), lambda x: x, 1.423, b"123", uuid.uuid4()) 41 | ) 42 | def test_raise_if_token_is_not_str(self, non_str_token): 43 | with pytest.raises(ApiAuthorizationError): 44 | BearerTokenAuth(non_str_token) 45 | -------------------------------------------------------------------------------- /tests/conftest.py: -------------------------------------------------------------------------------- 1 | """Find out more at https://github.com/GDCC/pyDataverse.""" 2 | 3 | import os 4 | import pytest 5 | from pyDataverse.api import NativeApi 6 | 7 | 8 | def test_config(): 9 | test_dir = os.path.dirname(os.path.realpath(__file__)) 10 | root_dir = os.path.dirname(test_dir) 11 | test_data_dir = os.path.join(test_dir, "data") 12 | json_schemas_dir = os.path.join(root_dir, "pyDataverse/schemas/json") 13 | test_data_output_dir = os.path.join(test_data_dir, "output") 14 | invalid_filename_strings = ["wrong", ""] 15 | invalid_filename_types = [(), [], 12, 12.12, set(), True, False] 16 | 17 | return { 18 | "root_dir": root_dir, 19 | "test_dir": test_dir, 20 | "test_data_dir": test_data_dir, 21 | "json_schemas_dir": json_schemas_dir, 22 | "test_data_output_dir": test_data_output_dir, 23 | "dataverse_upload_min_filename": 
os.path.join( 24 | test_data_dir, "dataverse_upload_min.json" 25 | ), 26 | "dataverse_upload_full_filename": os.path.join( 27 | test_data_dir, "dataverse_upload_full.json" 28 | ), 29 | "dataverse_upload_schema_filename": os.path.join( 30 | json_schemas_dir, "dataverse_upload_schema.json" 31 | ), 32 | "dataverse_json_output_filename": os.path.join( 33 | test_data_output_dir, "dataverse_pytest.json" 34 | ), 35 | "dataset_upload_min_filename": os.path.join( 36 | test_data_dir, "dataset_upload_min_default.json" 37 | ), 38 | "dataset_upload_full_filename": os.path.join( 39 | test_data_dir, "dataset_upload_full_default.json" 40 | ), 41 | "dataset_upload_schema_filename": os.path.join( 42 | json_schemas_dir, "dataset_upload_default_schema.json" 43 | ), 44 | "dataset_json_output_filename": os.path.join( 45 | test_data_output_dir, "dataset_pytest.json" 46 | ), 47 | "datafile_upload_min_filename": os.path.join( 48 | test_data_dir, "datafile_upload_min.json" 49 | ), 50 | "datafile_upload_full_filename": os.path.join( 51 | test_data_dir, "datafile_upload_full.json" 52 | ), 53 | "datafile_upload_schema_filename": os.path.join( 54 | json_schemas_dir, "datafile_upload_schema.json" 55 | ), 56 | "datafile_json_output_filename": os.path.join( 57 | test_data_output_dir, "datafile_pytest.json" 58 | ), 59 | "tree_filename": os.path.join(test_data_dir, "tree.json"), 60 | "invalid_filename_strings": ["wrong", ""], 61 | "invalid_filename_types": [(), [], 12, 12.12, set(), True, False], 62 | "invalid_validate_types": [None, "wrong", {}, []], 63 | "invalid_json_data_types": [[], (), 12, set(), True, False, None], 64 | "invalid_set_types": invalid_filename_types + ["", "wrong"], 65 | "invalid_json_strings": invalid_filename_strings, 66 | "invalid_data_format_types": invalid_filename_types, 67 | "invalid_data_format_strings": invalid_filename_strings, 68 | "base_url": os.getenv("BASE_URL").rstrip("/"), 69 | "api_token": os.getenv("API_TOKEN"), 70 | "travis": os.getenv("TRAVIS") or False, 71 | "wait_time": 1, 72 | } 73 | 74 | 75 | test_config = test_config() 76 | 77 | 78 | @pytest.fixture() 79 | def native_api(monkeypatch): 80 | """Fixture, so set up an Api connection. 81 | 82 | Returns 83 | ------- 84 | Api 85 | Api object. 86 | 87 | """ 88 | 89 | BASE_URL = os.getenv("BASE_URL").rstrip("/") 90 | 91 | monkeypatch.setenv("BASE_URL", BASE_URL) 92 | return NativeApi(BASE_URL) 93 | 94 | 95 | def import_dataverse_min_dict(): 96 | """Import minimum Dataverse dict. 97 | 98 | Returns 99 | ------- 100 | dict 101 | Minimum Dataverse metadata. 102 | 103 | """ 104 | return { 105 | "alias": "test-pyDataverse", 106 | "name": "Test pyDataverse", 107 | "dataverseContacts": [{"contactEmail": "info@aussda.at"}], 108 | } 109 | 110 | 111 | def import_dataset_min_dict(): 112 | """Import dataset dict. 113 | 114 | Returns 115 | ------- 116 | dict 117 | Dataset metadata. 118 | 119 | """ 120 | return { 121 | "license": "CC0", 122 | "termsOfUse": "CC0 Waiver", 123 | "termsOfAccess": "Terms of Access", 124 | "citation_displayName": "Citation Metadata", 125 | "title": "Replication Data for: Title", 126 | } 127 | 128 | 129 | def import_datafile_min_dict(): 130 | """Import minimum Datafile dict. 131 | 132 | Returns 133 | ------- 134 | dict 135 | Minimum Datafile metadata. 136 | 137 | """ 138 | return { 139 | "pid": "doi:10.11587/EVMUHP", 140 | "filename": "tests/data/datafile.txt", 141 | } 142 | 143 | 144 | def import_datafile_full_dict(): 145 | """Import full Datafile dict. 
146 | 147 | Returns 148 | ------- 149 | dict 150 | Full Datafile metadata. 151 | 152 | """ 153 | return { 154 | "pid": "doi:10.11587/EVMUHP", 155 | "filename": "tests/data/datafile.txt", 156 | "description": "Test datafile", 157 | "restrict": False, 158 | } 159 | 160 | 161 | @pytest.fixture 162 | def create_mock_file(): 163 | """Returns a function that creates a mock file.""" 164 | 165 | def _create_mock_file(filename: str, dir: str, size: int): 166 | """Create a mock file. 167 | 168 | Args: 169 | filename (str): Filename. 170 | dir (str): Directory. 171 | size (int): Size. 172 | 173 | Returns: 174 | str: Path to the file. 175 | """ 176 | path = os.path.join(dir, filename) 177 | with open(path, "wb") as f: 178 | f.write(os.urandom(size)) 179 | 180 | return path 181 | 182 | return _create_mock_file 183 | -------------------------------------------------------------------------------- /tests/core/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/core/__init__.py -------------------------------------------------------------------------------- /tests/data/datafile.txt: -------------------------------------------------------------------------------- 1 | hello! 2 | -------------------------------------------------------------------------------- /tests/data/datafile_upload_full.json: -------------------------------------------------------------------------------- 1 | { 2 | "description": "Another data file.", 3 | "categories": [ 4 | "Documentation" 5 | ], 6 | "restrict": true, 7 | "pid": "doi:10.11587/NVWE8Y", 8 | "filename": "20001_ta_de_v1_0.pdf", 9 | "label": "Questionnaire", 10 | "directoryLabel": "data/subdir1" 11 | } 12 | -------------------------------------------------------------------------------- /tests/data/datafile_upload_min.json: -------------------------------------------------------------------------------- 1 | { 2 | "pid": "doi:10.11587/RRKEA9", 3 | "filename": "10109_qu_de_v1_0.pdf" 4 | } 5 | -------------------------------------------------------------------------------- /tests/data/dataset_upload_min_default.json: -------------------------------------------------------------------------------- 1 | { 2 | "datasetVersion": { 3 | "metadataBlocks": { 4 | "citation": { 5 | "fields": [ 6 | { 7 | "value": "Darwin's Finches", 8 | "typeClass": "primitive", 9 | "multiple": false, 10 | "typeName": "title" 11 | }, 12 | { 13 | "value": [ 14 | { 15 | "authorName": { 16 | "value": "Finch, Fiona", 17 | "typeClass": "primitive", 18 | "multiple": false, 19 | "typeName": "authorName" 20 | }, 21 | "authorAffiliation": { 22 | "value": "Birds Inc.", 23 | "typeClass": "primitive", 24 | "multiple": false, 25 | "typeName": "authorAffiliation" 26 | } 27 | } 28 | ], 29 | "typeClass": "compound", 30 | "multiple": true, 31 | "typeName": "author" 32 | }, 33 | { 34 | "value": [ 35 | { 36 | "datasetContactEmail": { 37 | "typeClass": "primitive", 38 | "multiple": false, 39 | "typeName": "datasetContactEmail", 40 | "value": "finch@mailinator.com" 41 | }, 42 | "datasetContactName": { 43 | "typeClass": "primitive", 44 | "multiple": false, 45 | "typeName": "datasetContactName", 46 | "value": "Finch, Fiona" 47 | } 48 | } 49 | ], 50 | "typeClass": "compound", 51 | "multiple": true, 52 | "typeName": "datasetContact" 53 | }, 54 | { 55 | "value": [ 56 | { 57 | "dsDescriptionValue": { 58 | "value": "Darwin's finches (also known as the Gal\u00e1pagos finches) are a group of about 
fifteen species of passerine birds.", 59 | "multiple": false, 60 | "typeClass": "primitive", 61 | "typeName": "dsDescriptionValue" 62 | } 63 | } 64 | ], 65 | "typeClass": "compound", 66 | "multiple": true, 67 | "typeName": "dsDescription" 68 | }, 69 | { 70 | "value": [ 71 | "Medicine, Health and Life Sciences" 72 | ], 73 | "typeClass": "controlledVocabulary", 74 | "multiple": true, 75 | "typeName": "subject" 76 | } 77 | ], 78 | "displayName": "Citation Metadata" 79 | } 80 | } 81 | } 82 | } 83 | -------------------------------------------------------------------------------- /tests/data/dataverse_upload_full.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "Scientific Research", 3 | "alias": "science", 4 | "dataverseContacts": [ 5 | { 6 | "contactEmail": "pi@example.edu" 7 | }, 8 | { 9 | "contactEmail": "student@example.edu" 10 | } 11 | ], 12 | "affiliation": "Scientific Research University", 13 | "description": "We do all the science.", 14 | "dataverseType": "LABORATORY" 15 | } 16 | -------------------------------------------------------------------------------- /tests/data/dataverse_upload_min.json: -------------------------------------------------------------------------------- 1 | { 2 | "alias": "test-pyDataverse", 3 | "name": "Test pyDataverse", 4 | "dataverseContacts": [ 5 | { 6 | "contactEmail": "info@aussda.at" 7 | } 8 | ] 9 | } 10 | -------------------------------------------------------------------------------- /tests/data/file_upload_ds_minimum.json: -------------------------------------------------------------------------------- 1 | { 2 | "datasetVersion": { 3 | "metadataBlocks": { 4 | "citation": { 5 | "fields": [ 6 | { 7 | "multiple": true, 8 | "typeClass": "compound", 9 | "typeName": "author", 10 | "value": [ 11 | { 12 | "authorName": { 13 | "multiple": false, 14 | "typeClass": "primitive", 15 | "typeName": "authorName", 16 | "value": "John Doe" 17 | } 18 | } 19 | ] 20 | }, 21 | { 22 | "multiple": true, 23 | "typeClass": "compound", 24 | "typeName": "datasetContact", 25 | "value": [ 26 | { 27 | "datasetContactName": { 28 | "multiple": false, 29 | "typeClass": "primitive", 30 | "typeName": "datasetContactName", 31 | "value": "John Doe" 32 | }, 33 | "datasetContactEmail": { 34 | "multiple": false, 35 | "typeClass": "primitive", 36 | "typeName": "datasetContactEmail", 37 | "value": "john@doe.com" 38 | } 39 | } 40 | ] 41 | }, 42 | { 43 | "multiple": true, 44 | "typeClass": "compound", 45 | "typeName": "dsDescription", 46 | "value": [ 47 | { 48 | "dsDescriptionValue": { 49 | "multiple": false, 50 | "typeClass": "primitive", 51 | "typeName": "dsDescriptionValue", 52 | "value": "This is a description of the dataset" 53 | } 54 | } 55 | ] 56 | }, 57 | { 58 | "multiple": true, 59 | "typeClass": "controlledVocabulary", 60 | "typeName": "subject", 61 | "value": [ 62 | "Other" 63 | ] 64 | }, 65 | { 66 | "multiple": false, 67 | "typeClass": "primitive", 68 | "typeName": "title", 69 | "value": "My dataset" 70 | } 71 | ] 72 | } 73 | } 74 | } 75 | } 76 | -------------------------------------------------------------------------------- /tests/data/output/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/data/output/.gitkeep -------------------------------------------------------------------------------- /tests/data/replace.xyz: 
-------------------------------------------------------------------------------- 1 | A B C 2 | 3.4292 -4.32647 -1.66819 3 | 3.4292 -4.51647 -1.65310 4 | -------------------------------------------------------------------------------- /tests/data/tree.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "dataverse_alias": "parent_dv_1", 4 | "type": "dataverse", 5 | "dataverse_id": 1, 6 | "children": [ 7 | { 8 | "dataverse_alias": "parent_dv_1_sub_dv_1", 9 | "type": "dataverse", 10 | "dataverse_id": 3 11 | }, 12 | { 13 | "dataset_id": "1AB23C", 14 | "pid": "doi:12.34567/1AB23C", 15 | "type": "dataset", 16 | "children": [ 17 | { 18 | "datafile_id": 1, 19 | "filename": "appendix.pdf", 20 | "label": "appendix.pdf", 21 | "pid": "doi:12.34567/1AB23C/ABC123", 22 | "type": "datafile" 23 | }, 24 | { 25 | "datafile_id": 2, 26 | "filename": "survey.zsav", 27 | "label": "survey.zsav", 28 | "pid": "doi:12.34567/1AB23C/DEF456", 29 | "type": "datafile" 30 | } 31 | ] 32 | }, 33 | { 34 | "dataset_id": "4DE56F", 35 | "pid": "doi:12.34567/4DE56F", 36 | "type": "dataset", 37 | "children": [ 38 | { 39 | "datafile_id": 3, 40 | "filename": "manual.pdf", 41 | "label": "manual.pdf", 42 | "pid": "doi:12.34567/4DE56F/GHI789", 43 | "type": "datafile" 44 | } 45 | ] 46 | } 47 | ] 48 | }, 49 | { 50 | "dataverse_alias": "parent_dv_2", 51 | "type": "dataverse", 52 | "dataverse_id": 2, 53 | "children": [ 54 | { 55 | "dataset_id": "7GH89I", 56 | "pid": "doi:12.34567/7GH89I", 57 | "type": "dataset", 58 | "children": [ 59 | { 60 | "datafile_id": 4, 61 | "filename": "study.zsav", 62 | "label": "study.zsav", 63 | "pid": "doi:12.34567/7GH89I/JKL012", 64 | "type": "datafile" 65 | } 66 | ] 67 | }, 68 | { 69 | "dataset_id": "0JK1LM", 70 | "pid": "doi:12.34567/0JK1LM", 71 | "type": "dataset", 72 | "children": [ 73 | { 74 | "datafile_id": 5, 75 | "filename": "documentation.pdf", 76 | "label": "documentation.pdf", 77 | "pid": "doi:12.34567/0JK1LM/MNO345", 78 | "type": "datafile" 79 | }, 80 | { 81 | "datafile_id": 6, 82 | "filename": "data.R", 83 | "label": "data.R", 84 | "pid": "doi:12.34567/0JK1LM/PQR678", 85 | "type": "datafile" 86 | } 87 | ] 88 | } 89 | ] 90 | }, 91 | { 92 | "dataset_id": "2NO34P", 93 | "pid": "doi:12.34567/2NO34P", 94 | "type": "dataset", 95 | "children": [ 96 | { 97 | "datafile_id": 7, 98 | "filename": "summary.md", 99 | "label": "summary.md", 100 | "pid": "doi:12.34567/2NO34P/STU901", 101 | "type": "datafile" 102 | } 103 | ] 104 | } 105 | ] 106 | -------------------------------------------------------------------------------- /tests/data/user-guide/datafile.txt: -------------------------------------------------------------------------------- 1 | hello! 
2 | -------------------------------------------------------------------------------- /tests/data/user-guide/datafiles.csv: -------------------------------------------------------------------------------- 1 | "org.datafile_id","org.dataset_id","org.filename","org.to_upload","org.is_uploaded","org.to_delete","org.is_deleted","org.to_update","org.is_updated","dv.datafile_id","dv.description","dv.categories","dv.restrict","dv.label","dv.directoryLabel","alma.title","alma.pages","alma.year" 2 | 1,1,"datafile.txt","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE",634,"My description bbb.","[""Data""]","FALSE","Text Report","data/subdir1","Text Report",23,1997 3 | -------------------------------------------------------------------------------- /tests/data/user-guide/dataset.json: -------------------------------------------------------------------------------- 1 | { 2 | "datasetVersion": { 3 | "metadataBlocks": { 4 | "citation": { 5 | "fields": [ 6 | { 7 | "value": "Youth in Austria 2005", 8 | "typeClass": "primitive", 9 | "multiple": false, 10 | "typeName": "title" 11 | }, 12 | { 13 | "value": [ 14 | { 15 | "authorName": { 16 | "value": "LastAuthor1, FirstAuthor1", 17 | "typeClass": "primitive", 18 | "multiple": false, 19 | "typeName": "authorName" 20 | }, 21 | "authorAffiliation": { 22 | "value": "AuthorAffiliation1", 23 | "typeClass": "primitive", 24 | "multiple": false, 25 | "typeName": "authorAffiliation" 26 | } 27 | } 28 | ], 29 | "typeClass": "compound", 30 | "multiple": true, 31 | "typeName": "author" 32 | }, 33 | { 34 | "value": [ 35 | { 36 | "datasetContactEmail": { 37 | "typeClass": "primitive", 38 | "multiple": false, 39 | "typeName": "datasetContactEmail", 40 | "value": "ContactEmail1@mailinator.com" 41 | }, 42 | "datasetContactName": { 43 | "typeClass": "primitive", 44 | "multiple": false, 45 | "typeName": "datasetContactName", 46 | "value": "LastContact1, FirstContact1" 47 | } 48 | } 49 | ], 50 | "typeClass": "compound", 51 | "multiple": true, 52 | "typeName": "datasetContact" 53 | }, 54 | { 55 | "value": [ 56 | { 57 | "dsDescriptionValue": { 58 | "value": "DescriptionText", 59 | "multiple": false, 60 | "typeClass": "primitive", 61 | "typeName": "dsDescriptionValue" 62 | } 63 | } 64 | ], 65 | "typeClass": "compound", 66 | "multiple": true, 67 | "typeName": "dsDescription" 68 | }, 69 | { 70 | "value": [ 71 | "Medicine, Health and Life Sciences" 72 | ], 73 | "typeClass": "controlledVocabulary", 74 | "multiple": true, 75 | "typeName": "subject" 76 | } 77 | ], 78 | "displayName": "Citation Metadata" 79 | } 80 | } 81 | } 82 | } 83 | -------------------------------------------------------------------------------- /tests/data/user-guide/datasets.csv: -------------------------------------------------------------------------------- 1 | 
"org.dataset_id","org.dataverse_id","org.doi","org.privateurl","org.to_upload","org.is_uploaded","org.to_publish","org.is_published","org.to_delete","org.is_deleted","org.to_update","org.is_updated","dv.license","dv.termsOfAccess","dv.termsOfUse","dv.otherId","dv.title","dv.subtitle","dv.alternativeTitle","dv.series","dv.notesText","dv.author","dv.dsDescription","dv.subject","dv.keyword","dv.topicClassification","dv.language","dv.grantNumber","dv.dateOfCollection","dv.kindOfData","dv.dataSources","dv.accessToSources","dv.alternativeURL","dv.characteristicOfSources","dv.dateOfDeposit","dv.depositor","dv.distributionDate","dv.otherReferences","dv.productionDate","dv.productionPlace","dv.contributor","dv.relatedDatasets","dv.relatedMaterial","dv.datasetContact","dv.distributor","dv.producer","dv.publication","dv.software","dv.timePeriodCovered","dv.geographicUnit","dv.geographicBoundingBox","dv.geographicCoverage","dv.actionsToMinimizeLoss","dv.cleaningOperations","dv.collectionMode","dv.collectorTraining","dv.controlOperations","dv.dataCollectionSituation","dv.dataCollector","dv.datasetLevelErrorNotes","dv.deviationsFromSampleDesign","dv.frequencyOfDataCollection","dv.otherDataAppraisal","dv.socialScienceNotes","dv.researchInstrument","dv.responseRate","dv.samplingErrorEstimates","dv.samplingProcedure","dv.unitOfAnalysis","dv.universe","dv.timeMethod","dv.weighting","dv.fileAccessRequest" 2 | 1,1,"doi:10.11587/19ZW6I",,"TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","CC0","Terms of Access","CC0 Waiver","[{""otherIdAgency"": ""OtherIDAgency1"", ""otherIdValue"": ""OtherIDIdentifier1""}]","Replication Data for: Title","Subtitle","Alternative Title","{""seriesName"": ""SeriesName"", ""seriesInformation"": ""SeriesInformation""}","Notes1","[{""authorName"": ""LastAuthor1, FirstAuthor1"", ""authorAffiliation"": ""AuthorAffiliation1"", ""authorIdentifierScheme"": ""ORCID"", ""authorIdentifier"": ""AuthorIdentifier1""}]","[{""dsDescriptionValue"": ""DescriptionText2"", ""dsDescriptionDate"": ""1000-02-02""}]","[""Agricultural Sciences"", ""Business and Management"", ""Engineering"", ""Law""]","[{""keywordValue"": ""KeywordTerm1"", ""keywordVocabulary"": ""KeywordVocabulary1"", ""keywordVocabularyURI"": ""http://KeywordVocabularyURL1.org""}]","[{""topicClassValue"": ""Topic Class Value1"", ""topicClassVocab"": ""Topic Classification Vocabulary"", ""topicClassVocabURI"": ""http://www.topicURL.net""}]","[""English"", ""German""]","[{""grantNumberAgency"": ""GrantInformationGrantAgency1"", ""grantNumberValue"": ""GrantInformationGrantNumber1""}]","[{""dateOfCollectionStart"": ""1006-01-01"", ""dateOfCollectionEnd"": ""1006-01-01""}]","[""KindOfData1"", ""KindOfData2""]","[""DataSources1"", ""DataSources2""]","DocumentationAndAccessToSources","http://AlternativeURL.org","CharacteristicOfSourcesNoted","1002-01-01","LastDepositor, FirstDepositor","1004-01-01","[""OtherReferences1"", ""OtherReferences2""]","1003-01-01","ProductionPlace","[{""contributorType"": ""Data Collector"", ""contributorName"": ""LastContributor1, FirstContributor1""}]","[""RelatedDatasets1"", ""RelatedDatasets2""]","[""RelatedMaterial1"", ""RelatedMaterial2""]","[{""datasetContactName"": ""LastContact1, FirstContact1"", ""datasetContactAffiliation"": ""ContactAffiliation1"", ""datasetContactEmail"": ""ContactEmail1@mailinator.com""}]","[{""distributorName"": ""LastDistributor1, FirstDistributor1"", ""distributorAffiliation"": ""DistributorAffiliation1"", ""distributorAbbreviation"": ""DistributorAbbreviation1"", 
""distributorURL"": ""http://DistributorURL1.org"", ""distributorLogoURL"": ""http://DistributorLogoURL1.org""}]","[{""producerName"": ""LastProducer1, FirstProducer1"", ""producerAffiliation"": ""ProducerAffiliation1"", ""producerAbbreviation"": ""ProducerAbbreviation1"", ""producerURL"": ""http://ProducerURL1.org"", ""producerLogoURL"": ""http://ProducerLogoURL1.org""}]","[{""publicationCitation"": ""RelatedPublicationCitation1"", ""publicationIDType"": ""ark"", ""publicationIDNumber"": ""RelatedPublicationIDNumber1"", ""publicationURL"": ""http://RelatedPublicationURL1.org""}]","[{""softwareName"": ""SoftwareName1"", ""softwareVersion"": ""SoftwareVersion1""}]","[{""timePeriodCoveredStart"": ""1005-01-01"", ""timePeriodCoveredEnd"": ""1005-01-02""}]","[""GeographicUnit1"", ""GeographicUnit2""]","[{""westLongitude"": ""10"", ""eastLongitude"": ""20"", ""northLongitude"": ""30"", ""southLongitude"": ""40""}]","[{""country"": ""Afghanistan"", ""state"": ""GeographicCoverageStateProvince1"", ""city"": ""GeographicCoverageCity1"", ""otherGeographicCoverage"": ""GeographicCoverageOther1""}]","ActionsToMinimizeLosses","CleaningOperations","CollectionMode","CollectorTraining","ControlOperations","CharacteristicsOfDataCollectionSituation","LastDataCollector1, FirstDataCollector1","StudyLevelErrorNotes","MajorDeviationsForSampleDesign","Frequency","OtherFormsOfDataAppraisal","[{""socialScienceNotesType"": ""NotesType"", ""socialScienceNotesSubject"": ""NotesSubject"", ""socialScienceNotesText"": ""NotesText""}]","TypeOfResearchInstrument","ResponseRate","EstimatesOfSamplingError","SamplingProcedure","[""UnitOfAnalysis1"", ""UnitOfAnalysis2""]","[""Universe1"", ""Universe2""]","TimeMethod","Weighting","True" 3 | -------------------------------------------------------------------------------- /tests/data/user-guide/dataverse.json: -------------------------------------------------------------------------------- 1 | { 2 | "alias": "pyDataverse_user-guide", 3 | "name": "pyDataverse - User Guide", 4 | "dataverseContacts": [ 5 | { 6 | "contactEmail": "info@aussda.at" 7 | } 8 | ] 9 | } 10 | -------------------------------------------------------------------------------- /tests/data/user.json: -------------------------------------------------------------------------------- 1 | { 2 | "email": "admin@stefankasberger.at", 3 | "firstName": "pyDataverse", 4 | "lastName": "Test", 5 | "userName": "pyDataverseTest" 6 | } 7 | -------------------------------------------------------------------------------- /tests/logs/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/logs/.gitkeep -------------------------------------------------------------------------------- /tests/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/models/__init__.py -------------------------------------------------------------------------------- /tests/models/test_datafile.py: -------------------------------------------------------------------------------- 1 | """Datafile data model tests.""" 2 | 3 | import json 4 | import jsonschema 5 | import os 6 | import platform 7 | import pytest 8 | 9 | from pyDataverse.models import Datafile 10 | from pyDataverse.utils import read_file, write_json 11 | from ..conftest import test_config 12 | 13 | 14 | def data_object(): 15 
| """Get Datafile object. 16 | 17 | Returns 18 | ------- 19 | pydataverse.models.Datafile 20 | :class:`Datafile` object. 21 | """ 22 | return Datafile() 23 | 24 | 25 | def dict_flat_set_min(): 26 | """Get flat dict for set() of minimum Datafile. 27 | 28 | Returns 29 | ------- 30 | dict 31 | Flat dict with minimum Datafile data. 32 | 33 | """ 34 | return {"pid": "doi:10.11587/RRKEA9", "filename": "10109_qu_de_v1_0.pdf"} 35 | 36 | 37 | def dict_flat_set_full(): 38 | """Get flat dict for set() of full Datafile. 39 | 40 | Returns 41 | ------- 42 | dict 43 | Flat dict with full Datafile data. 44 | 45 | """ 46 | return { 47 | "pid": "doi:10.11587/NVWE8Y", 48 | "filename": "20001_ta_de_v1_0.pdf", 49 | "description": "Another data file.", 50 | "restrict": True, 51 | "categories": ["Documentation"], 52 | "label": "Questionnaire", 53 | "directoryLabel": "data/subdir1", 54 | } 55 | 56 | 57 | def object_data_init(): 58 | """Get dictionary for Datafile with initial attributes. 59 | 60 | Returns 61 | ------- 62 | dict 63 | Dictionary of init data attributes set. 64 | 65 | """ 66 | return { 67 | "_Datafile_default_json_format": "dataverse_upload", 68 | "_Datafile_default_json_schema_filename": test_config[ 69 | "datafile_upload_schema_filename" 70 | ], 71 | "_Datafile_allowed_json_formats": ["dataverse_upload", "dataverse_download"], 72 | "_Datafile_json_dataverse_upload_attr": [ 73 | "description", 74 | "categories", 75 | "restrict", 76 | "label", 77 | "directoryLabel", 78 | "pid", 79 | "filename", 80 | ], 81 | "_internal_attributes": [], 82 | } 83 | 84 | 85 | def object_data_min(): 86 | """Get dictionary for Datafile with minimum attributes. 87 | 88 | Returns 89 | ------- 90 | pyDataverse.Datafile 91 | :class:`Datafile` with minimum attributes set. 92 | 93 | """ 94 | return {"pid": "doi:10.11587/RRKEA9", "filename": "10109_qu_de_v1_0.pdf"} 95 | 96 | 97 | def object_data_full(): 98 | """Get flat dict for :func:`get()` with initial data of Datafile. 99 | 100 | Returns 101 | ------- 102 | pyDataverse.Datafile 103 | :class:`Datafile` with full attributes set. 104 | 105 | """ 106 | return { 107 | "pid": "doi:10.11587/NVWE8Y", 108 | "filename": "20001_ta_de_v1_0.pdf", 109 | "description": "Another data file.", 110 | "restrict": True, 111 | "categories": ["Documentation"], 112 | "label": "Questionnaire", 113 | "directoryLabel": "data/subdir1", 114 | } 115 | 116 | 117 | def dict_flat_get_min(): 118 | """Get flat dict for :func:`get` with minimum data of Datafile. 119 | 120 | Returns 121 | ------- 122 | dict 123 | Minimum Datafile dictionary returned by :func:`get`. 124 | 125 | """ 126 | return {"pid": "doi:10.11587/RRKEA9", "filename": "10109_qu_de_v1_0.pdf"} 127 | 128 | 129 | def dict_flat_get_full(): 130 | """Get flat dict for :func:`get` of full data of Datafile. 131 | 132 | Returns 133 | ------- 134 | dict 135 | Full Datafile dictionary returned by :func:`get`. 136 | 137 | """ 138 | return { 139 | "pid": "doi:10.11587/NVWE8Y", 140 | "filename": "20001_ta_de_v1_0.pdf", 141 | "description": "Another data file.", 142 | "restrict": True, 143 | "categories": ["Documentation"], 144 | "label": "Questionnaire", 145 | "directoryLabel": "data/subdir1", 146 | } 147 | 148 | 149 | def json_upload_min(): 150 | """Get JSON string of minimum Datafile. 151 | 152 | Returns 153 | ------- 154 | dict 155 | JSON string. 156 | 157 | """ 158 | return read_file(test_config["datafile_upload_min_filename"]) 159 | 160 | 161 | def json_upload_full(): 162 | """Get JSON string of full Datafile. 
163 | 164 | Returns 165 | ------- 166 | str 167 | JSON string. 168 | 169 | """ 170 | return read_file(test_config["datafile_upload_full_filename"]) 171 | 172 | 173 | def json_dataverse_upload_attr(): 174 | """List of attributes import or export in format `dataverse_upload`. 175 | 176 | Returns 177 | ------- 178 | list 179 | List of attributes, which will be used for import and export. 180 | 181 | """ 182 | return [ 183 | "description", 184 | "categories", 185 | "restrict", 186 | "label", 187 | "directoryLabel", 188 | "pid", 189 | "filename", 190 | ] 191 | 192 | 193 | def json_dataverse_upload_required_attr(): 194 | """List of attributes required for `dataverse_upload` JSON. 195 | 196 | Returns 197 | ------- 198 | list 199 | List of attributes, which will be used for import and export. 200 | 201 | """ 202 | return ["pid", "filename"] 203 | 204 | 205 | class TestDatafileGeneric(object): 206 | """Generic tests for Datafile().""" 207 | 208 | def test_datafile_set_and_get_valid(self): 209 | """Test Datafile.get() with valid data.""" 210 | data = [ 211 | ((dict_flat_set_min(), object_data_min()), dict_flat_get_min()), 212 | ((dict_flat_set_full(), object_data_full()), dict_flat_get_full()), 213 | (({}, {}), {}), 214 | ] 215 | 216 | pdv = data_object() 217 | pdv.set(dict_flat_set_min()) 218 | assert isinstance(pdv.get(), dict) 219 | 220 | for input, data_eval in data: 221 | pdv = data_object() 222 | pdv.set(input[0]) 223 | data = pdv.get() 224 | for key, val in data_eval.items(): 225 | assert data[key] == input[1][key] == data_eval[key] 226 | assert len(data) == len(input[1]) == len(data_eval) 227 | 228 | def test_datafile_set_invalid(self): 229 | """Test Datafile.set() with invalid data.""" 230 | 231 | # invalid data 232 | for data in test_config["invalid_set_types"]: 233 | with pytest.raises(AssertionError): 234 | pdv = data_object() 235 | pdv.set(data) 236 | 237 | def test_datafile_from_json_valid(self): 238 | """Test Datafile.from_json() with valid data.""" 239 | data = [ 240 | (({json_upload_min()}, {}), object_data_min()), 241 | (({json_upload_full()}, {}), object_data_full()), 242 | ( 243 | ({json_upload_min()}, {"data_format": "dataverse_upload"}), 244 | object_data_min(), 245 | ), 246 | (({json_upload_min()}, {"validate": False}), object_data_min()), 247 | ( 248 | ( 249 | {json_upload_min()}, 250 | {"filename_schema": "wrong", "validate": False}, 251 | ), 252 | object_data_min(), 253 | ), 254 | ( 255 | ( 256 | {json_upload_min()}, 257 | { 258 | "filename_schema": test_config[ 259 | "datafile_upload_schema_filename" 260 | ], 261 | "validate": True, 262 | }, 263 | ), 264 | object_data_min(), 265 | ), 266 | (({"{}"}, {"validate": False}), {}), 267 | ] 268 | 269 | for input, data_eval in data: 270 | pdv = data_object() 271 | args = input[0] 272 | kwargs = input[1] 273 | pdv.from_json(*args, **kwargs) 274 | 275 | for key, val in data_eval.items(): 276 | assert getattr(pdv, key) == data_eval[key] 277 | assert len(pdv.__dict__) - len(object_data_init()) == len(data_eval) 278 | 279 | def test_datafile_from_json_invalid(self): 280 | """Test Datafile.from_json() with invalid data.""" 281 | # invalid data 282 | for data in test_config["invalid_json_data_types"]: 283 | with pytest.raises(AssertionError): 284 | pdv = data_object() 285 | pdv.from_json(data, validate=False) 286 | 287 | if int(platform.python_version_tuple()[1]) >= 5: 288 | for json_string in test_config["invalid_json_strings"]: 289 | with pytest.raises(json.decoder.JSONDecodeError): 290 | pdv = data_object() 291 | 
pdv.from_json(json_string, validate=False) 292 | else: 293 | for json_string in test_config["invalid_json_strings"]: 294 | with pytest.raises(ValueError): 295 | pdv = data_object() 296 | pdv.from_json(json_string, validate=False) 297 | 298 | # invalid `filename_schema` 299 | for filename_schema in test_config["invalid_filename_strings"]: 300 | with pytest.raises(FileNotFoundError): 301 | pdv = data_object() 302 | pdv.from_json(json_upload_min(), filename_schema=filename_schema) 303 | 304 | for filename_schema in test_config["invalid_filename_types"]: 305 | with pytest.raises(AssertionError): 306 | pdv = data_object() 307 | pdv.from_json(json_upload_min(), filename_schema=filename_schema) 308 | 309 | # invalid `data_format` 310 | for data_format in ( 311 | test_config["invalid_data_format_types"] 312 | + test_config["invalid_data_format_strings"] 313 | ): 314 | with pytest.raises(AssertionError): 315 | pdv = data_object() 316 | pdv.from_json( 317 | json_upload_min(), data_format=data_format, validate=False 318 | ) 319 | 320 | # invalid `validate` 321 | for validate in test_config["invalid_validate_types"]: 322 | with pytest.raises(AssertionError): 323 | pdv = data_object() 324 | pdv.from_json(json_upload_min(), validate=validate) 325 | 326 | with pytest.raises(jsonschema.exceptions.ValidationError): 327 | pdv = data_object() 328 | pdv.from_json("{}") 329 | 330 | for attr in json_dataverse_upload_required_attr(): 331 | with pytest.raises(jsonschema.exceptions.ValidationError): 332 | pdv = data_object() 333 | data = json.loads(json_upload_min()) 334 | del data[attr] 335 | data = json.dumps(data) 336 | pdv.from_json(data, validate=True) 337 | 338 | def test_datafile_to_json_valid(self): 339 | """Test Datafile.json() with valid data.""" 340 | data = [ 341 | ((dict_flat_set_min(), {}), json.loads(json_upload_min())), 342 | ((dict_flat_set_full(), {}), json.loads(json_upload_full())), 343 | ( 344 | (dict_flat_set_min(), {"data_format": "dataverse_upload"}), 345 | json.loads(json_upload_min()), 346 | ), 347 | ( 348 | (dict_flat_set_min(), {"validate": False}), 349 | json.loads(json_upload_min()), 350 | ), 351 | ( 352 | ( 353 | dict_flat_set_min(), 354 | {"filename_schema": "wrong", "validate": False}, 355 | ), 356 | json.loads(json_upload_min()), 357 | ), 358 | ( 359 | ( 360 | dict_flat_set_min(), 361 | { 362 | "filename_schema": test_config[ 363 | "datafile_upload_schema_filename" 364 | ], 365 | "validate": True, 366 | }, 367 | ), 368 | json.loads(json_upload_min()), 369 | ), 370 | (({}, {"validate": False}), {}), 371 | ] 372 | 373 | pdv = data_object() 374 | pdv.set(dict_flat_set_min()) 375 | assert isinstance(pdv.json(), str) 376 | 377 | for input, data_eval in data: 378 | pdv = data_object() 379 | pdv.set(input[0]) 380 | kwargs = input[1] 381 | data = json.loads(pdv.json(**kwargs)) 382 | for key, val in data_eval.items(): 383 | assert data[key] == data_eval[key] 384 | assert len(data) == len(data_eval) 385 | 386 | def test_datafile_to_json_invalid(self): 387 | """Test Datafile.json() with non-valid data.""" 388 | # invalid `filename_schema` 389 | for filename_schema in test_config["invalid_filename_strings"]: 390 | with pytest.raises(FileNotFoundError): 391 | obj = data_object() 392 | obj.json(filename_schema=filename_schema) 393 | 394 | for filename_schema in test_config["invalid_filename_types"]: 395 | with pytest.raises(AssertionError): 396 | pdv = data_object() 397 | pdv.json(filename_schema=filename_schema) 398 | 399 | # invalid `data_format` 400 | for data_format in ( 401 | 
test_config["invalid_data_format_types"] 402 | + test_config["invalid_data_format_strings"] 403 | ): 404 | with pytest.raises(AssertionError): 405 | pdv = data_object() 406 | pdv.set(dict_flat_set_min()) 407 | pdv.json(data_format=data_format, validate=False) 408 | 409 | # invalid `validate` 410 | for validate in test_config["invalid_validate_types"]: 411 | with pytest.raises(AssertionError): 412 | pdv = data_object() 413 | pdv.set(dict_flat_set_min()) 414 | pdv.json(validate=validate) 415 | 416 | with pytest.raises(jsonschema.exceptions.ValidationError): 417 | pdv = data_object() 418 | pdv.set({}) 419 | pdv.json() 420 | 421 | for attr in json_dataverse_upload_required_attr(): 422 | with pytest.raises(jsonschema.exceptions.ValidationError): 423 | pdv = data_object() 424 | data = json.loads(json_upload_min()) 425 | del data[attr] 426 | pdv.set(data) 427 | pdv.json(validate=True) 428 | 429 | def test_datafile_validate_json_valid(self): 430 | """Test Datafile.validate_json() with valid data.""" 431 | data = [ 432 | ((dict_flat_set_min(), {}), True), 433 | ((dict_flat_set_full(), {}), True), 434 | ((dict_flat_set_min(), {"data_format": "dataverse_upload"}), True), 435 | ( 436 | ( 437 | dict_flat_set_min(), 438 | { 439 | "data_format": "dataverse_upload", 440 | "filename_schema": test_config[ 441 | "datafile_upload_schema_filename" 442 | ], 443 | }, 444 | ), 445 | True, 446 | ), 447 | ( 448 | ( 449 | dict_flat_set_min(), 450 | {"filename_schema": test_config["datafile_upload_schema_filename"]}, 451 | ), 452 | True, 453 | ), 454 | ] 455 | 456 | for input, data_eval in data: 457 | pdv = data_object() 458 | pdv.set(input[0]) 459 | 460 | assert pdv.validate_json() == data_eval 461 | 462 | def test_datafile_validate_json_invalid(self): 463 | """Test Datafile.validate_json() with non-valid data.""" 464 | # invalid data 465 | for attr in json_dataverse_upload_required_attr(): 466 | with pytest.raises(jsonschema.exceptions.ValidationError): 467 | for data in [dict_flat_set_min(), dict_flat_set_full()]: 468 | pdv = data_object() 469 | pdv.set(data) 470 | delattr(pdv, attr) 471 | pdv.validate_json() 472 | 473 | # invalid `filename_schema` 474 | for filename_schema in test_config["invalid_filename_strings"]: 475 | with pytest.raises(FileNotFoundError): 476 | pdv = data_object() 477 | pdv.set(dict_flat_set_min()) 478 | pdv.validate_json(filename_schema=filename_schema) 479 | 480 | for filename_schema in test_config["invalid_filename_types"]: 481 | with pytest.raises(AssertionError): 482 | pdv = data_object() 483 | pdv.set(dict_flat_set_min()) 484 | pdv.validate_json(filename_schema=filename_schema) 485 | 486 | 487 | class TestDatafileSpecific(object): 488 | """Specific tests for Datafile().""" 489 | 490 | def test_datafile_init_valid(self): 491 | """Test Datafile.__init__() with valid data.""" 492 | # specific 493 | data = [ 494 | (Datafile(), {}), 495 | (Datafile(dict_flat_set_min()), object_data_min()), 496 | (Datafile(dict_flat_set_full()), object_data_full()), 497 | (Datafile({}), {}), 498 | ] 499 | 500 | for pdv, data_eval in data: 501 | for key, val in data_eval.items(): 502 | assert getattr(pdv, key) == data_eval[key] 503 | assert len(pdv.__dict__) - len(object_data_init()) == len(data_eval) 504 | 505 | def test_datafile_init_invalid(self): 506 | """Test Datafile.init() with invalid data.""" 507 | pdv = Datafile() 508 | 509 | # invalid data 510 | for data in ["invalid_set_types"]: 511 | with pytest.raises(AssertionError): 512 | pdv.set(data) 513 | 514 | 515 | if not os.environ.get("TRAVIS"): 516 | 
517 | class TestDatafileGenericTravisNot(object): 518 | """Generic tests for Datafile(), not running on Travis (no file-write permissions).""" 519 | 520 | def test_dataverse_from_json_to_json_valid(self): 521 | """Test Dataverse to JSON from JSON with valid data.""" 522 | data = [ 523 | ({json_upload_min()}, {}), 524 | ({json_upload_full()}, {}), 525 | ({json_upload_min()}, {"data_format": "dataverse_upload"}), 526 | ({json_upload_min()}, {"validate": False}), 527 | ( 528 | {json_upload_min()}, 529 | {"filename_schema": "wrong", "validate": False}, 530 | ), 531 | ( 532 | {json_upload_min()}, 533 | { 534 | "filename_schema": test_config[ 535 | "datafile_upload_schema_filename" 536 | ], 537 | "validate": True, 538 | }, 539 | ), 540 | ({"{}"}, {"validate": False}), 541 | ] 542 | 543 | for args_from, kwargs_from in data: 544 | pdv_start = data_object() 545 | args = args_from 546 | kwargs = kwargs_from 547 | pdv_start.from_json(*args, **kwargs) 548 | if "validate" in kwargs: 549 | if not kwargs["validate"]: 550 | kwargs = {"validate": False} 551 | write_json( 552 | test_config["datafile_json_output_filename"], 553 | json.loads(pdv_start.json(**kwargs)), 554 | ) 555 | pdv_end = data_object() 556 | kwargs = kwargs_from 557 | pdv_end.from_json( 558 | read_file(test_config["datafile_json_output_filename"]), **kwargs 559 | ) 560 | 561 | for key, val in pdv_end.get().items(): 562 | assert getattr(pdv_start, key) == getattr(pdv_end, key) 563 | assert len(pdv_start.__dict__) == len( 564 | pdv_end.__dict__, 565 | ) 566 | -------------------------------------------------------------------------------- /tests/models/test_dvobject.py: -------------------------------------------------------------------------------- 1 | """Dataverse data model tests.""" 2 | 3 | from pyDataverse.models import DVObject 4 | 5 | 6 | class TestDVObject(object): 7 | """Tests for :class:DVObject().""" 8 | 9 | def test_dataverse_init(self): 10 | """Test Dataverse.__init__().""" 11 | obj = DVObject() 12 | 13 | assert not hasattr(obj, "default_json_format") 14 | assert not hasattr(obj, "allowed_json_formats") 15 | assert not hasattr(obj, "default_json_schema_filename") 16 | assert not hasattr(obj, "json_dataverse_upload_attr") 17 | -------------------------------------------------------------------------------- /tests/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gdcc/pyDataverse/693d0ff8d2849eccc32f9e66228ee8976109881a/tests/utils/__init__.py -------------------------------------------------------------------------------- /tests/utils/test_utils.py: -------------------------------------------------------------------------------- 1 | from pyDataverse.utils import read_json, dataverse_tree_walker 2 | from ..conftest import test_config 3 | 4 | 5 | class TestUtilsSaveTreeData: 6 | def test_dataverse_tree_walker_valid_default(self): 7 | dv_ids = [1, 2, 3] 8 | dv_aliases = ["parent_dv_1", "parent_dv_1_sub_dv_1", "parent_dv_2"] 9 | ds_ids = ["1AB23C", "4DE56F", "7GH89I", "0JK1LM", "2NO34P"] 10 | ds_pids = [ 11 | "doi:12.34567/1AB23C", 12 | "doi:12.34567/4DE56F", 13 | "doi:12.34567/7GH89I", 14 | "doi:12.34567/0JK1LM", 15 | "doi:12.34567/2NO34P", 16 | ] 17 | df_ids = [1, 2, 3, 4, 5, 6, 7] 18 | df_filenames = [ 19 | "appendix.pdf", 20 | "survey.zsav", 21 | "manual.pdf", 22 | "study.zsav", 23 | "documentation.pdf", 24 | "data.R", 25 | "summary.md", 26 | ] 27 | df_labels = [ 28 | "appendix.pdf", 29 | "survey.zsav", 30 | "manual.pdf", 31 | "study.zsav", 32 | 
"documentation.pdf", 33 | "data.R", 34 | "summary.md", 35 | ] 36 | df_pids = [ 37 | "doi:12.34567/1AB23C/ABC123", 38 | "doi:12.34567/1AB23C/DEF456", 39 | "doi:12.34567/4DE56F/GHI789", 40 | "doi:12.34567/7GH89I/JKL012", 41 | "doi:12.34567/0JK1LM/MNO345", 42 | "doi:12.34567/0JK1LM/PQR678", 43 | "doi:12.34567/2NO34P/STU901", 44 | ] 45 | 46 | data = read_json(test_config["tree_filename"]) 47 | dataverses, datasets, datafiles = dataverse_tree_walker(data) 48 | 49 | assert isinstance(dataverses, list) 50 | assert isinstance(datasets, list) 51 | assert isinstance(datafiles, list) 52 | assert len(dataverses) == 3 53 | assert len(datasets) == 5 54 | assert len(datafiles) == 7 55 | 56 | for dv in dataverses: 57 | assert "dataverse_alias" in dv 58 | assert "dataverse_id" in dv 59 | assert dv["dataverse_alias"] in dv_aliases 60 | dv_aliases.pop(dv_aliases.index(dv["dataverse_alias"])) 61 | assert dv["dataverse_id"] in dv_ids 62 | dv_ids.pop(dv_ids.index(dv["dataverse_id"])) 63 | assert (len(dv_aliases)) == 0 64 | assert (len(dv_ids)) == 0 65 | 66 | for ds in datasets: 67 | assert "dataset_id" in ds 68 | assert "pid" in ds 69 | assert ds["dataset_id"] in ds_ids 70 | ds_ids.pop(ds_ids.index(ds["dataset_id"])) 71 | assert ds["pid"] in ds_pids 72 | ds_pids.pop(ds_pids.index(ds["pid"])) 73 | assert (len(ds_ids)) == 0 74 | assert (len(ds_pids)) == 0 75 | 76 | for df in datafiles: 77 | assert "datafile_id" in df 78 | assert "filename" in df 79 | assert "label" in df 80 | assert "pid" in df 81 | assert df["datafile_id"] in df_ids 82 | df_ids.pop(df_ids.index(df["datafile_id"])) 83 | assert df["filename"] in df_filenames 84 | df_filenames.pop(df_filenames.index(df["filename"])) 85 | assert df["label"] in df_labels 86 | df_labels.pop(df_labels.index(df["label"])) 87 | assert df["pid"] in df_pids 88 | df_pids.pop(df_pids.index(df["pid"])) 89 | assert (len(df_ids)) == 0 90 | assert (len(df_filenames)) == 0 91 | assert (len(df_pids)) == 0 92 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | requires = 3 | tox>=4 4 | envlist = py3,py3{8,9,10,11,12},coverage,coveralls,lint 5 | skip_missing_interpreters = True 6 | 7 | [testenv] 8 | description = default settings for unspecified tests 9 | skip_install = True 10 | allowlist_externals = poetry 11 | passenv = * 12 | commands_pre = 13 | poetry lock --no-update 14 | poetry install --with=tests 15 | commands = 16 | pytest -v tests --cov=pyDataverse --basetemp={envtmpdir} 17 | 18 | [testenv:py3] 19 | 20 | [testenv:py38] 21 | basepython = python3.8 22 | 23 | [testenv:py39] 24 | basepython = python3.9 25 | 26 | [testenv:py310] 27 | basepython = python3.10 28 | 29 | [testenv:py311] 30 | basepython = python3.11 31 | 32 | [testenv:py312] 33 | basepython = python3.12 34 | 35 | [testenv:py313] 36 | basepython = python3.13 37 | 38 | [testenv:coverage] 39 | description = create report for coverage 40 | commands = 41 | pytest tests --cov=pyDataverse --cov-report=term-missing --cov-report=xml --cov-report=html 42 | 43 | [testenv:coveralls] 44 | description = create reports for coveralls 45 | commands = 46 | pytest tests --doctest-modules -v --cov=pyDataverse 47 | 48 | [testenv:lint] 49 | commands_pre = 50 | poetry lock --no-update 51 | poetry install --with=lint 52 | commands = 53 | ruff check pyDataverse tests 54 | 55 | [testenv:mypy] 56 | commands_pre = 57 | poetry lock --no-update 58 | poetry install --with=lint 59 | commands = 60 | mypy 
pyDataverse tests 61 | 62 | [testenv:docs] 63 | description = invoke sphinx-build to build the HTML docs 64 | commands_pre = 65 | poetry lock --no-update 66 | poetry install --with=docs 67 | commands = 68 | sphinx-build -d pyDataverse/docs/build/docs_doctree pyDataverse/docs/source docs/build/html --color -b html {posargs} 69 | 70 | [testenv:pydocstyle] 71 | description = check docstring style with pydocstyle 72 | commands_pre = 73 | poetry lock --no-update 74 | poetry install --with=docs 75 | commands = 76 | pydocstyle pyDataverse/ 77 | pydocstyle tests/ 78 | 79 | [testenv:radon-mc] 80 | description = Radon McCabe number 81 | commands_pre = 82 | poetry lock --no-update 83 | poetry install --with=lint 84 | commands = 85 | radon cc pyDataverse/ -a 86 | 87 | [testenv:radon-mi] 88 | description = Radon Maintainability Index 89 | commands_pre = 90 | poetry lock --no-update 91 | poetry install --with=lint 92 | commands = 93 | radon mi pyDataverse/ 94 | radon mi tests/ 95 | 96 | [testenv:radon-raw] 97 | description = Radon raw metrics 98 | commands_pre = 99 | poetry lock --no-update 100 | poetry install --with=lint 101 | commands = 102 | radon raw pyDataverse/ 103 | radon raw tests/ 104 | 105 | [testenv:radon-hal] 106 | description = Radon Halstead metrics 107 | commands_pre = 108 | poetry lock --no-update 109 | poetry install --with=lint 110 | commands = 111 | radon hal pyDataverse/ 112 | radon hal tests/ 113 | --------------------------------------------------------------------------------
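
The following sketch is not a file from the repository; it is a minimal illustration of how the fixtures above are exercised by the test suite, assuming pyDataverse is installed and the working directory is the repository root. It reuses only calls that appear in tests/models/test_datafile.py (Datafile built from the minimal upload dict) and tests/utils/test_utils.py (flattening tests/data/tree.json with dataverse_tree_walker); anything beyond that setup is an assumption.

import json

from pyDataverse.models import Datafile
from pyDataverse.utils import dataverse_tree_walker, read_json

# Minimal Datafile, as used in tests/models/test_datafile.py:
# only "pid" and "filename" are required by the upload schema.
df = Datafile({"pid": "doi:10.11587/RRKEA9", "filename": "10109_qu_de_v1_0.pdf"})
assert df.validate_json()       # should pass against the default upload schema
print(json.loads(df.json()))    # -> {"pid": "...", "filename": "..."}

# Flatten the dataverse/dataset/datafile hierarchy from the tree fixture,
# as done in tests/utils/test_utils.py (path assumes the repository root).
tree = read_json("tests/data/tree.json")
dataverses, datasets, datafiles = dataverse_tree_walker(tree)
print(len(dataverses), len(datasets), len(datafiles))  # -> 3 5 7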