├── LICENSE ├── README.md ├── cookiecutter.json ├── hooks └── post_gen_project.py └── {{cookiecutter.__directory_name}} ├── .gitignore ├── .pre-commit-config.yaml ├── .python-version ├── README.md ├── config ├── main.yaml ├── model │ ├── model1.yaml │ └── model2.yaml └── process │ ├── process1.yaml │ └── process2.yaml ├── data ├── final │ └── .gitkeep ├── processed │ └── .gitkeep └── raw │ └── .gitkeep ├── docs └── .gitkeep ├── models └── .gitkeep ├── notebooks └── .gitkeep ├── pyproject.toml ├── requirements-dev.txt ├── requirements.txt ├── src ├── __init__.py ├── process.py ├── train_model.py └── utils.py └── tests ├── __init__.py ├── test_process.py └── test_train_model.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Khuyen Tran 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![View article](https://img.shields.io/badge/CodeCut-View_article-blue)](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/) [![View on YouTube](https://img.shields.io/badge/YouTube-Watch%20on%20Youtube-red?logo=youtube)](https://youtu.be/TzvcPi3nsdw) 2 | 3 | # Data Science Cookie Cutter 4 | 5 | ## Why? 6 | It is important to structure your data science project according to a consistent standard so that your teammates can easily maintain and modify it. 7 | 8 | This repository provides a template that incorporates best practices for creating a maintainable and reproducible data science project. 9 | 10 | ## Tools used in this project 11 | * [hydra](https://hydra.cc/): Manage configuration files - [article](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/) 12 | * [pdoc](https://github.com/pdoc3/pdoc): Automatically create API documentation for your project 13 | * [pre-commit plugins](https://pre-commit.com/): Automate code review and formatting 14 | * [Poetry](https://towardsdatascience.com/how-to-effortlessly-publish-your-python-package-to-pypi-using-poetry-44b305362f9f): Dependency management - [article](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/) 15 | * [uv](https://github.com/astral-sh/uv): Ultra-fast Python package installer and resolver 16 | * [pip](https://pip.pypa.io/): Traditional Python package installer 17 | 18 | ## How to use this project 19 | 20 | Install Cookiecutter: 21 | ```bash 22 | pip install cookiecutter 23 | ``` 24 | 25 | Create a project based on the template: 26 | ```bash 27 | cookiecutter https://github.com/khuyentran1401/data-science-template 28 | ``` 29 | 30 | You will be prompted to choose your preferred dependency manager: 31 | - `poetry`: 
Modern Python package and dependency manager 32 | - `uv`: Ultra-fast Python package installer and resolver 33 | - `pip`: Traditional Python package installer 34 | 35 | ## Book: Production-Ready Data Science 36 | 37 | Want to learn more about building production-ready data science projects? Check out my upcoming book: 38 | 39 | [Production Ready Data Science: From Prototyping to Production with Python](https://codecut.ai/production-ready-data/?utm_source=github&utm_medium=repository&utm_campaign=data_science_template) 40 | 41 | The book will cover: 42 | 43 | - Best practices for structuring data science projects 44 | - Tools and techniques for reproducible research 45 | - Deploying and monitoring machine learning models 46 | - And much more! 47 | 48 | Sign up now to receive the first 3 chapters for free! You'll also be notified when the full book becomes available. 49 | 50 | ## Other Resources: 51 | - [Article](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/) 52 | - [Video](https://youtu.be/TzvcPi3nsdw) 53 | 54 | -------------------------------------------------------------------------------- /cookiecutter.json: -------------------------------------------------------------------------------- 1 | { 2 | "project_name": "Project Name", 3 | "__directory_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}", 4 | "author_name": "Your Name", 5 | "dependency_manager": ["pip", "poetry", "uv"], 6 | "compatible_python_versions": ">=3.9" 7 | } 8 | -------------------------------------------------------------------------------- /hooks/post_gen_project.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | 4 | def parse_dependencies_settings(): 5 | if "{{ cookiecutter.dependency_manager }}" != "pip": 6 | os.remove("requirements.txt") 7 | os.remove("requirements-dev.txt") 8 | 9 | 10 | def handle_python_version_file(): 11 | """Remove .python-version file if not using 
uv.""" 12 | if "{{ cookiecutter.dependency_manager }}" != "uv": 13 | os.remove(".python-version") 14 | 15 | 16 | if __name__ == "__main__": 17 | parse_dependencies_settings() 18 | handle_python_version_file() 19 | -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | # C extensions 6 | *.so 7 | 8 | # Distribution / packaging 9 | .Python 10 | env/ 11 | .venv 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | venv 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | 48 | # Translations 49 | *.mo 50 | *.pot 51 | 52 | # Django stuff: 53 | *.log 54 | 55 | # Sphinx documentation 56 | docs/_build/ 57 | 58 | # PyBuilder 59 | target/ 60 | 61 | # DotEnv configuration 62 | .env 63 | 64 | # Database 65 | *.db 66 | *.rdb 67 | 68 | # Pycharm 69 | .idea 70 | 71 | # VS Code 72 | .vscode/ 73 | 74 | # Spyder 75 | .spyproject/ 76 | 77 | # Jupyter NB Checkpoints 78 | .ipynb_checkpoints/ 79 | 80 | # Mac OS-specific storage files 81 | .DS_Store 82 | 83 | # vim 84 | *.swp 85 | *.swo 86 | 87 | # Caches 88 | .mypy_cache/ 89 | .pytest_cache/ 90 | .ruff_cache/ 91 | 92 | # Hydra logs 93 | outputs 94 | multirun 95 | 96 | # Data 97 | data/*/* 98 | 99 | # Notebooks 100 | notebooks/* 
-------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/charliermarsh/ruff-pre-commit 3 | rev: v0.11.6 4 | hooks: 5 | - id: ruff 6 | args: [--fix] 7 | - repo: https://github.com/pre-commit/mirrors-mypy 8 | rev: v1.15.0 9 | hooks: 10 | - id: mypy -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/.python-version: -------------------------------------------------------------------------------- 1 | {%- if cookiecutter.dependency_manager == "uv" -%} 2 | {{ cookiecutter.compatible_python_versions.replace(">=", "") }} 3 | {%- endif -%} -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/README.md: -------------------------------------------------------------------------------- 1 | # {{cookiecutter.project_name}} 2 | 3 | ## Tools used in this project 4 | 5 | * [hydra](https://hydra.cc/): Manage configuration files - [article](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/) 6 | * [pdoc](https://github.com/pdoc3/pdoc): Automatically create API documentation for your project 7 | * [pre-commit plugins](https://pre-commit.com/): Automate code review and formatting 8 | {%- if cookiecutter.dependency_manager == "poetry" %} 9 | * [Poetry](https://towardsdatascience.com/how-to-effortlessly-publish-your-python-package-to-pypi-using-poetry-44b305362f9f): Dependency management - [article](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/) 10 | {%- elif cookiecutter.dependency_manager == "uv" %} 11 | * [uv](https://github.com/astral-sh/uv): Ultra-fast Python package installer and resolver 12 | {%- endif %} 13 | 14 | ## Project Structure 15 | 16 | ```bash 17 | . 
18 | ├── config 19 | │ ├── main.yaml # Main configuration file 20 | │ ├── model # Configurations for training model 21 | │ │ ├── model1.yaml # First variation of parameters to train model 22 | │ │ └── model2.yaml # Second variation of parameters to train model 23 | │ └── process # Configurations for processing data 24 | │ ├── process1.yaml # First variation of parameters to process data 25 | │ └── process2.yaml # Second variation of parameters to process data 26 | ├── data 27 | │ ├── final # data after training the model 28 | │ ├── processed # data after processing 29 | │ └── raw # raw data 30 | ├── docs # documentation for your project 31 | ├── .gitignore # files that should not be committed to Git 32 | ├── models # store models 33 | ├── notebooks # store notebooks 34 | {%- if cookiecutter.dependency_manager == "pip" %} 35 | ├── pyproject.toml # project metadata and ruff/mypy settings 36 | {%- elif cookiecutter.dependency_manager == "poetry" %} 37 | ├── .pre-commit-config.yaml # configurations for pre-commit 38 | ├── pyproject.toml # dependencies for poetry 39 | {%- elif cookiecutter.dependency_manager == "uv" %} 40 | ├── .pre-commit-config.yaml # configurations for pre-commit 41 | ├── .python-version # specify Python version for the project 42 | ├── pyproject.toml # project metadata and dependencies 43 | {%- endif %} 44 | ├── README.md # describe your project 45 | ├── src # store source code 46 | │ ├── __init__.py # make src a Python module 47 | │ ├── process.py # process data before training model 48 | │ ├── train_model.py # train model 49 | │ └── utils.py # store helper functions 50 | └── tests # store tests 51 | ├── __init__.py # make tests a Python module 52 | ├── test_process.py # test functions for process.py 53 | └── test_train_model.py # test functions for train_model.py 54 | ``` 55 | 56 | ## Version Control Setup 57 | 58 | 1. Initialize Git in your project directory: 59 | ```bash 60 | git init 61 | ``` 62 | 63 | 2. 
Add your remote repository: 64 | ```bash 65 | # For HTTPS 66 | git remote add origin https://github.com/username/repository-name.git 67 | 68 | # For SSH 69 | git remote add origin git@github.com:username/repository-name.git 70 | ``` 71 | 72 | 3. Create and switch to a new branch: 73 | ```bash 74 | git checkout -b main 75 | ``` 76 | 77 | 4. Add and commit your files: 78 | ```bash 79 | git add . 80 | git commit -m "Initial commit" 81 | ``` 82 | 83 | 5. Push to your remote repository: 84 | ```bash 85 | git push -u origin main 86 | ``` 87 | 88 | ## Set up the environment 89 | 90 | {%- if cookiecutter.dependency_manager == "poetry" %} 91 | 1. Install [Poetry](https://python-poetry.org/docs/#installation) 92 | 93 | 2. Activate the virtual environment: 94 | 95 | ```bash 96 | poetry shell 97 | ``` 98 | 99 | 3. Install dependencies: 100 | 101 | - To install all dependencies from pyproject.toml, run: 102 | 103 | ```bash 104 | poetry install 105 | ``` 106 | 107 | - To install only production dependencies, run: 108 | 109 | ```bash 110 | poetry install --only main 111 | ``` 112 | 113 | Note: To follow the rest of the instructions in this README (including running tests, generating documentation, and using pre-commit hooks), it is recommended to install all dependencies using `poetry install`. 114 | 115 | 4. Run Python scripts: 116 | 117 | ```bash 118 | # Run directly with poetry 119 | poetry run python src/process.py 120 | 121 | # Or after activating the virtual environment 122 | python src/process.py 123 | ``` 124 | {%- elif cookiecutter.dependency_manager == "uv" %} 125 | 1. Install [uv](https://docs.astral.sh/uv/getting-started/installation/) 126 | 127 | 2. 
Install dependencies: 128 | 129 | - To install all dependencies from pyproject.toml, run: 130 | 131 | ```bash 132 | uv sync --all-extras 133 | ``` 134 | 135 | - To install only production dependencies, run: 136 | 137 | ```bash 138 | uv sync 139 | ``` 140 | 141 | Note: To follow the rest of the instructions in this README (including running tests, generating documentation, and using pre-commit hooks), it is recommended to install all dependencies using `uv sync --all-extras`. 142 | 143 | 3. Run Python scripts: 144 | 145 | ```bash 146 | uv run src/process.py 147 | ``` 148 | {%- else %} 149 | 1. Create the virtual environment: 150 | 151 | ```bash 152 | python3 -m venv venv 153 | ``` 154 | 155 | 2. Activate the virtual environment: 156 | 157 | * For Linux/MacOS: 158 | 159 | ```bash 160 | source venv/bin/activate 161 | ``` 162 | 163 | * For Command Prompt: 164 | 165 | ```bash 166 | .\venv\Scripts\activate 167 | ``` 168 | 169 | 3. Install dependencies: 170 | 171 | - To install all dependencies, run: 172 | 173 | ```bash 174 | pip install -r requirements-dev.txt 175 | ``` 176 | 177 | - To install only production dependencies, run: 178 | 179 | ```bash 180 | pip install -r requirements.txt 181 | ``` 182 | 183 | Note: To follow the rest of the instructions in this README (including running tests, generating documentation, and using pre-commit hooks), it is recommended to install all dependencies using `pip install -r requirements-dev.txt`. 184 | 185 | 4. 
Run Python scripts: 186 | 187 | ```bash 188 | # After activating the virtual environment 189 | python3 src/process.py 190 | ``` 191 | {%- endif %} 192 | 193 | ## Set up pre-commit hooks 194 | 195 | {%- if cookiecutter.dependency_manager == "poetry" %} 196 | Set up pre-commit: 197 | ```bash 198 | poetry run pre-commit install 199 | ``` 200 | {%- elif cookiecutter.dependency_manager == "uv" %} 201 | Set up pre-commit: 202 | ```bash 203 | uv run pre-commit install 204 | ``` 205 | {%- else %} 206 | Set up pre-commit: 207 | ```bash 208 | pre-commit install 209 | ``` 210 | {%- endif %} 211 | 212 | The pre-commit configuration is already set up in `.pre-commit-config.yaml`. It includes: 213 | * `ruff`: A fast Python linter that automatically fixes issues when possible (it runs with `--fix`) 214 | * `mypy`: Static type checking for Python to catch type-related errors before runtime 215 | 216 | 217 | Pre-commit will now run automatically on every commit. If any check fails, the commit is aborted. Issues that `ruff` can fix are fixed in place, so review the changes, stage them, and commit again. 218 | 219 | ## View and alter configurations 220 | 221 | The project uses Hydra to manage configurations. You can view and modify these configurations from the command line. 222 | 223 | To view available configurations: 224 | 225 | {%- if cookiecutter.dependency_manager == "poetry" %} 226 | ```bash 227 | poetry run python src/process.py --help 228 | ``` 229 | {%- elif cookiecutter.dependency_manager == "uv" %} 230 | ```bash 231 | uv run src/process.py --help 232 | ``` 233 | {%- else %} 234 | ```bash 235 | python3 src/process.py --help 236 | ``` 237 | {%- endif %} 238 | 239 | Output: 240 | 241 | ```yaml 242 | process is powered by Hydra. 
243 | 244 | == Configuration groups == 245 | Compose your configuration from those groups (group=option) 246 | 247 | model: model1, model2 248 | process: process1, process2 249 | 250 | 251 | == Config == 252 | Override anything in the config (foo.bar=value) 253 | 254 | process: 255 | use_columns: 256 | - col1 257 | - col2 258 | model: 259 | name: model1 260 | data: 261 | raw: data/raw/sample.csv 262 | processed: data/processed/processed.csv 263 | final: data/final/final.csv 264 | ``` 265 | 266 | To override configurations (for example, changing the input data file): 267 | 268 | {%- if cookiecutter.dependency_manager == "poetry" %} 269 | ```bash 270 | poetry run python src/process.py data.raw=sample2.csv 271 | ``` 272 | {%- elif cookiecutter.dependency_manager == "uv" %} 273 | ```bash 274 | uv run src/process.py data.raw=sample2.csv 275 | ``` 276 | {%- else %} 277 | ```bash 278 | python3 src/process.py data.raw=sample2.csv 279 | ``` 280 | {%- endif %} 281 | 282 | Output: 283 | 284 | ``` 285 | Process data using sample2.csv 286 | Columns used: ['col1', 'col2'] 287 | ``` 288 | 289 | You can override any configuration value shown in the help output. Multiple overrides can be combined in a single command. For more information about Hydra's configuration system, visit the [official documentation](https://hydra.cc/docs/intro/). 
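
To make the dotted override syntax concrete, here is a minimal sketch in plain Python of how an override such as `data.raw=sample2.csv` maps onto the nested configuration. This is an illustration only, not Hydra's implementation: the helper `apply_overrides` is hypothetical, values are kept as plain strings, and Hydra's own parser supports much more (type conversion, config group selection such as `model=model2`, appending keys with `+`).

```python
from copy import deepcopy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of ``config`` with ``dotted.key=value`` overrides applied.

    Hypothetical helper for illustration; Hydra does this work for you.
    """
    result = deepcopy(config)
    for override in overrides:
        dotted_key, _, value = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # A missing section raises KeyError, so typos are not silently accepted.
            node = node[name]
        if leaf not in node:
            raise KeyError(f"Unknown configuration key: {dotted_key}")
        node[leaf] = value  # kept as a string in this sketch
    return result


# Mirrors the structure printed by --help above.
config = {
    "process": {"use_columns": ["col1", "col2"]},
    "model": {"name": "model1"},
    "data": {
        "raw": "data/raw/sample.csv",
        "processed": "data/processed/processed.csv",
        "final": "data/final/final.csv",
    },
}

updated = apply_overrides(config, ["data.raw=sample2.csv", "model.name=model2"])
print(updated["data"]["raw"])
print(updated["model"]["name"])
```

Each override walks the configuration one section at a time and replaces a single leaf value, which is why several overrides can be combined in one command without interfering with each other.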
290 | 291 | ## Auto-generate API documentation 292 | 293 | {%- if cookiecutter.dependency_manager == "poetry" %} 294 | Generate static documentation: 295 | ```bash 296 | poetry run pdoc src -o docs 297 | ``` 298 | 299 | Start documentation server (available at http://localhost:8080): 300 | ```bash 301 | poetry run pdoc src --http localhost:8080 302 | ``` 303 | {%- elif cookiecutter.dependency_manager == "uv" %} 304 | Generate static documentation: 305 | ```bash 306 | uv run pdoc src -o docs 307 | ``` 308 | 309 | Start documentation server (available at http://localhost:8080): 310 | ```bash 311 | uv run pdoc src --http localhost:8080 312 | ``` 313 | {%- else %} 314 | Generate static documentation: 315 | ```bash 316 | pdoc src -o docs 317 | ``` 318 | 319 | Start documentation server (available at http://localhost:8080): 320 | ```bash 321 | pdoc src --http localhost:8080 322 | ``` 323 | {%- endif %} 324 | 325 | The documentation will be generated from your docstrings and type hints in your Python files. The static documentation will be saved in the `docs` directory, while the live server allows you to view the documentation with hot-reloading as you make changes. 
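
As an illustration of what the documentation is built from, the sketch below shows a function with a docstring and type hints, followed by the standard-library calls that expose the same metadata at runtime. The function `select_columns` is a hypothetical helper, not part of this template, and the snippet shows only the kind of input a documentation generator reads, not how pdoc works internally.

```python
import inspect
from typing import get_type_hints


def select_columns(
    rows: list[dict[str, str]], columns: list[str]
) -> list[dict[str, str]]:
    """Keep only the requested columns in each row.

    Args:
        rows: Records loaded from the raw data file.
        columns: Column names to keep, for example the ``use_columns`` config value.

    Returns:
        New records that contain only ``columns``.
    """
    return [{name: row[name] for name in columns} for row in rows]


# The docstring and annotations are ordinary runtime metadata.
summary = inspect.getdoc(select_columns).splitlines()[0]
hints = get_type_hints(select_columns)
print(summary)
print(sorted(hints))
```

Writing the summary on the first line of the docstring and annotating every parameter keeps the generated pages readable without any extra documentation files.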
326 | -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/config/main.yaml: -------------------------------------------------------------------------------- 1 | defaults: 2 | - process: process1 3 | - model: model1 4 | - _self_ 5 | 6 | data: 7 | raw: data/raw/sample.csv 8 | processed: data/processed/processed.csv 9 | final: data/final/final.csv -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/config/model/model1.yaml: -------------------------------------------------------------------------------- 1 | name: model1 -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/config/model/model2.yaml: -------------------------------------------------------------------------------- 1 | name: model2 -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/config/process/process1.yaml: -------------------------------------------------------------------------------- 1 | use_columns: 2 | - col1 3 | - col2 -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/config/process/process2.yaml: -------------------------------------------------------------------------------- 1 | use_columns: 2 | - col1 3 | - col2 4 | - col3 -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/data/final/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/data/final/.gitkeep -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/data/processed/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/data/processed/.gitkeep -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/data/raw/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/data/raw/.gitkeep -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/docs/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/docs/.gitkeep -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/models/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/models/.gitkeep -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/notebooks/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/notebooks/.gitkeep -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/pyproject.toml: -------------------------------------------------------------------------------- 1 | {%- if 
cookiecutter.dependency_manager == "poetry" %} 2 | [tool.poetry] 3 | name = "{{ cookiecutter.__directory_name }}" 4 | version = "0.1.0" 5 | description = "" 6 | authors = ["{{ cookiecutter.author_name }}"] 7 | 8 | [tool.poetry.dependencies] 9 | python = "{{ cookiecutter.compatible_python_versions }}" 10 | hydra-core = "^1.3.2" 11 | 12 | [tool.poetry.group.dev.dependencies] 13 | pdoc3 = "^0.11.6" 14 | pytest = "^8.3.5" 15 | pre-commit = "^4.2.0" 16 | 17 | [build-system] 18 | requires = ["poetry-core>=1.0.0"] 19 | build-backend = "poetry.core.masonry.api" 20 | {% elif cookiecutter.dependency_manager == "uv" %} 21 | [project] 22 | name = "{{ cookiecutter.__directory_name }}" 23 | version = "0.1.0" 24 | description = "" 25 | authors = [ 26 | { name = "{{ cookiecutter.author_name }}" } 27 | ] 28 | requires-python = "{{ cookiecutter.compatible_python_versions }}" 29 | dependencies = [ 30 | "hydra-core>=1.3.2" 31 | ] 32 | 33 | [project.optional-dependencies] 34 | dev = [ 35 | "pdoc3>=0.11.6", 36 | "pytest>=8.3.5", 37 | "pre-commit>=4.2.0" 38 | ] 39 | {% else %} 40 | [project] 41 | name = "{{ cookiecutter.__directory_name }}" 42 | version = "0.1.0" 43 | description = "" 44 | authors = [ 45 | { name = "{{ cookiecutter.author_name }}" } 46 | ] 47 | dependencies = [ 48 | "hydra-core>=1.3.2" 49 | ] 50 | 51 | [project.optional-dependencies] 52 | dev = [ 53 | "pdoc3>=0.11.6", 54 | "pytest>=8.3.5", 55 | "pre-commit>=4.2.0" 56 | ] 57 | {% endif %} 58 | 59 | [tool.ruff] 60 | # Exclude a variety of commonly ignored directories. 61 | exclude = [ 62 | ".bzr", 63 | ".direnv", 64 | ".eggs", 65 | ".git", 66 | ".git-rewrite", 67 | ".hg", 68 | ".mypy_cache", 69 | ".nox", 70 | ".pants.d", 71 | ".pytype", 72 | ".ruff_cache", 73 | ".svn", 74 | ".tox", 75 | ".venv", 76 | "__pypackages__", 77 | "_build", 78 | "buck-out", 79 | "build", 80 | "dist", 81 | "node_modules", 82 | "venv", 83 | ] 84 | 85 | # Same as Black. 
86 | line-length = 88 87 | 88 | [tool.ruff.lint] 89 | ignore = ["E501"] 90 | select = ["B","C","E","F","W","B9", "I", "Q"] 91 | 92 | [tool.ruff.format] 93 | quote-style = "double" 94 | indent-style = "tab" 95 | skip-magic-trailing-comma = false 96 | 97 | [tool.ruff.lint.mccabe] 98 | max-complexity = 10 99 | 100 | [tool.mypy] 101 | ignore_missing_imports = true -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/requirements-dev.txt: -------------------------------------------------------------------------------- 1 | -r requirements.txt 2 | pdoc3>=0.11.6 3 | pytest>=8.3.5 4 | pre-commit>=4.2.0 -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/requirements.txt: -------------------------------------------------------------------------------- 1 | hydra-core>=1.3.2 -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/src/__init__.py: -------------------------------------------------------------------------------- 1 | """Source code of your project""" -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/src/process.py: -------------------------------------------------------------------------------- 1 | """ 2 | Demo code that uses Hydra to access the parameters under the config directory. 
3 | 4 | Author: Khuyen Tran 5 | """ 6 | 7 | import hydra 8 | from omegaconf import DictConfig 9 | 10 | 11 | @hydra.main(config_path="../config", config_name="main", version_base="1.2") 12 | def process_data(config: DictConfig): 13 | """Function to process the data""" 14 | 15 | print(f"Process data using {config.data.raw}") 16 | print(f"Columns used: {config.process.use_columns}") 17 | 18 | 19 | if __name__ == "__main__": 20 | process_data() 21 | -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/src/train_model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Demo code that uses Hydra to access the parameters under the config directory. 3 | 4 | Author: Khuyen Tran 5 | """ 6 | 7 | import hydra 8 | from omegaconf import DictConfig 9 | 10 | 11 | @hydra.main(config_path="../config", config_name="main", version_base="1.2") 12 | def train_model(config: DictConfig): 13 | """Function to train the model""" 14 | 15 | print(f"Train model using {config.data.processed}") 16 | print(f"Model used: {config.model.name}") 17 | print(f"Save the output to {config.data.final}") 18 | 19 | 20 | if __name__ == "__main__": 21 | train_model() 22 | -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/src/utils.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/src/utils.py -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/tests/__init__.py: -------------------------------------------------------------------------------- 1 | # Avoid ModuleNotFoundError 2 | 3 | import sys 4 | 5 | sys.path.append("./src") 6 | 
-------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/tests/test_process.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/tests/test_process.py -------------------------------------------------------------------------------- /{{cookiecutter.__directory_name}}/tests/test_train_model.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/tests/test_train_model.py --------------------------------------------------------------------------------