├── LICENSE
├── README.md
├── cookiecutter.json
├── hooks
│   └── post_gen_project.py
└── {{cookiecutter.__directory_name}}
    ├── .gitignore
    ├── .pre-commit-config.yaml
    ├── .python-version
    ├── README.md
    ├── config
    │   ├── main.yaml
    │   ├── model
    │   │   ├── model1.yaml
    │   │   └── model2.yaml
    │   └── process
    │       ├── process1.yaml
    │       └── process2.yaml
    ├── data
    │   ├── final
    │   │   └── .gitkeep
    │   ├── processed
    │   │   └── .gitkeep
    │   └── raw
    │       └── .gitkeep
    ├── docs
    │   └── .gitkeep
    ├── models
    │   └── .gitkeep
    ├── notebooks
    │   └── .gitkeep
    ├── pyproject.toml
    ├── requirements-dev.txt
    ├── requirements.txt
    ├── src
    │   ├── __init__.py
    │   ├── process.py
    │   ├── train_model.py
    │   └── utils.py
    └── tests
        ├── __init__.py
        ├── test_process.py
        └── test_train_model.py


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2024 Khuyen Tran
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | [![View article](https://img.shields.io/badge/CodeCut-View_article-blue)](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/) [![View on YouTube](https://img.shields.io/badge/YouTube-Watch%20on%20Youtube-red?logo=youtube)](https://youtu.be/TzvcPi3nsdw) 
 2 | 
 3 | # Data Science Cookie Cutter
 4 | 
 5 | ## Why?
 6 | A consistent project structure makes it easier for your teammates to maintain and modify your data science project.
 7 | 
 8 | This repository provides a template that incorporates best practices to create a maintainable and reproducible data science project.  
 9 | 
10 | ## Tools used in this project
11 | * [hydra](https://hydra.cc/): Manage configuration files - [article](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/)
12 | * [pdoc](https://github.com/pdoc3/pdoc): Automatically create API documentation for your project
13 | * [pre-commit plugins](https://pre-commit.com/): Automate code review and formatting
14 | * [Poetry](https://towardsdatascience.com/how-to-effortlessly-publish-your-python-package-to-pypi-using-poetry-44b305362f9f): Dependency management - [article](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/)
15 | * [uv](https://github.com/astral-sh/uv): Ultra-fast Python package installer and resolver
16 | * [pip](https://pip.pypa.io/): Traditional Python package installer
17 | 
18 | ## How to use this project
19 | 
20 | Install Cookiecutter:
21 | ```bash
22 | pip install cookiecutter
23 | ```
24 | 
25 | Create a project based on the template:
26 | ```bash
27 | cookiecutter https://github.com/khuyentran1401/data-science-template
28 | ```
29 | 
30 | You will be prompted to choose your preferred dependency manager:
31 | - `poetry`: Modern Python package and dependency manager
32 | - `uv`: Ultra-fast Python package installer and resolver
33 | - `pip`: Traditional Python package installer
34 | 
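The choice of dependency manager decides which files survive in the generated project. As a rough sketch of the logic in `cookiecutter.json`, `hooks/post_gen_project.py`, and the `.python-version` template (the function names `directory_name`, `removed_files`, and `python_version_pin` are made up for illustration and are not part of the template):

```python
def directory_name(project_name: str) -> str:
    """Mirror the `__directory_name` expression in cookiecutter.json."""
    return project_name.lower().replace(" ", "_")


def removed_files(dependency_manager: str) -> list[str]:
    """List the files the post-generation hook deletes for a given choice."""
    removed = []
    if dependency_manager != "pip":
        # requirements files are only kept for the pip workflow
        removed += ["requirements.txt", "requirements-dev.txt"]
    if dependency_manager != "uv":
        # .python-version is only kept for the uv workflow
        removed.append(".python-version")
    return removed


def python_version_pin(compatible_python_versions: str) -> str:
    """Mirror the `.python-version` template, which strips the ">=" prefix."""
    return compatible_python_versions.replace(">=", "")


if __name__ == "__main__":
    print(directory_name("Project Name"))
    for manager in ("pip", "poetry", "uv"):
        print(manager, removed_files(manager))
    print(python_version_pin(">=3.9"))
```

With the default answers, the project is created in `project_name/`, and choosing `pip` removes only `.python-version`.
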
35 | ## Book: Production-Ready Data Science
36 | 
37 | Want to learn more about building production-ready data science projects? Check out my upcoming book:
38 | 
39 | [Production Ready Data Science: From Prototyping to Production with Python](https://codecut.ai/production-ready-data/?utm_source=github&utm_medium=repository&utm_campaign=data_science_template)
40 | 
41 | The book will cover:
42 | 
43 | - Best practices for structuring data science projects
44 | - Tools and techniques for reproducible research 
45 | - Deploying and monitoring machine learning models
46 | - And much more!
47 | 
48 | Sign up now to receive the first 3 chapters for free! You'll also be notified when the full book becomes available.
49 | 
50 | ## Other Resources:
51 | - [Article](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/)
52 | - [Video](https://youtu.be/TzvcPi3nsdw)
53 | 
54 | 


--------------------------------------------------------------------------------
/cookiecutter.json:
--------------------------------------------------------------------------------
1 | {
2 |     "project_name": "Project Name",
3 |     "__directory_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}",
4 |     "author_name": "Your Name",
5 |     "dependency_manager": ["pip", "poetry", "uv"],
6 |     "compatible_python_versions": ">=3.9"
7 | }
8 | 


--------------------------------------------------------------------------------
/hooks/post_gen_project.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | 
 3 | 
 4 | def parse_dependencies_settings():
 5 |     if "{{ cookiecutter.dependency_manager }}" != "pip":
 6 |         os.remove("requirements.txt")
 7 |         os.remove("requirements-dev.txt")
 8 | 
 9 | 
10 | def handle_python_version_file():
11 |     """Remove .python-version file if not using uv."""
12 |     if "{{ cookiecutter.dependency_manager }}" != "uv":
13 |         os.remove(".python-version")
14 | 
15 | 
16 | if __name__ == "__main__":
17 |     parse_dependencies_settings()
18 |     handle_python_version_file()
19 | 


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | 
  5 | # C extensions
  6 | *.so
  7 | 
  8 | # Distribution / packaging
  9 | .Python
 10 | env/
 11 | .venv
 12 | build/
 13 | develop-eggs/
 14 | dist/
 15 | downloads/
 16 | eggs/
 17 | .eggs/
 18 | lib/
 19 | lib64/
 20 | parts/
 21 | sdist/
 22 | var/
 23 | *.egg-info/
 24 | .installed.cfg
 25 | *.egg
 26 | venv
 27 | 
 28 | # PyInstaller
 29 | #  Usually these files are written by a python script from a template
 30 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 31 | *.manifest
 32 | *.spec
 33 | 
 34 | # Installer logs
 35 | pip-log.txt
 36 | pip-delete-this-directory.txt
 37 | 
 38 | # Unit test / coverage reports
 39 | htmlcov/
 40 | .tox/
 41 | .coverage
 42 | .coverage.*
 43 | .cache
 44 | nosetests.xml
 45 | coverage.xml
 46 | *.cover
 47 | 
 48 | # Translations
 49 | *.mo
 50 | *.pot
 51 | 
 52 | # Django stuff:
 53 | *.log
 54 | 
 55 | # Sphinx documentation
 56 | docs/_build/
 57 | 
 58 | # PyBuilder
 59 | target/
 60 | 
 61 | # DotEnv configuration
 62 | .env
 63 | 
 64 | # Database
 65 | *.db
 66 | *.rdb
 67 | 
 68 | # Pycharm
 69 | .idea
 70 | 
 71 | # VS Code
 72 | .vscode/
 73 | 
 74 | # Spyder
 75 | .spyproject/
 76 | 
 77 | # Jupyter NB Checkpoints
 78 | .ipynb_checkpoints/
 79 | 
 80 | # Mac OS-specific storage files
 81 | .DS_Store
 82 | 
 83 | # vim
 84 | *.swp
 85 | *.swo
 86 | 
 87 | # Caches
 88 | .mypy_cache/
 89 | .pytest_cache/
 90 | .ruff_cache/
 91 | 
 92 | # Hydra logs
 93 | outputs
 94 | multirun
 95 | 
 96 | # Data
 97 | data/*/*
 98 | 
 99 | # Notebooks
100 | notebooks/*


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
 1 | repos:
 2 |   - repo: https://github.com/astral-sh/ruff-pre-commit
 3 |     rev: v0.11.6
 4 |     hooks:
 5 |       - id: ruff
 6 |         args: [--fix]
 7 |   - repo: https://github.com/pre-commit/mirrors-mypy
 8 |     rev: v1.15.0
 9 |     hooks:
10 |       - id: mypy


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/.python-version:
--------------------------------------------------------------------------------
1 | {%- if cookiecutter.dependency_manager == "uv" -%}
2 | {{ cookiecutter.compatible_python_versions.replace(">=", "") }}
3 | {%- endif -%} 


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/README.md:
--------------------------------------------------------------------------------
  1 | # {{cookiecutter.project_name}}
  2 | 
  3 | ## Tools used in this project
  4 | 
  5 | * [hydra](https://hydra.cc/): Manage configuration files - [article](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/)
  6 | * [pdoc](https://github.com/pdoc3/pdoc): Automatically create API documentation for your project
  7 | * [pre-commit plugins](https://pre-commit.com/): Automate code review and formatting
  8 | {%- if cookiecutter.dependency_manager == "poetry" %}
  9 | * [Poetry](https://towardsdatascience.com/how-to-effortlessly-publish-your-python-package-to-pypi-using-poetry-44b305362f9f): Dependency management - [article](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/)
 10 | {%- elif cookiecutter.dependency_manager == "uv" %}
 11 | * [uv](https://github.com/astral-sh/uv): Ultra-fast Python package installer and resolver
 12 | {%- endif %}
 13 | 
 14 | ## Project Structure
 15 | 
 16 | ```bash
 17 | .
 18 | ├── config
 19 | │   ├── main.yaml                   # Main configuration file
 20 | │   ├── model                       # Configurations for training model
 21 | │   │   ├── model1.yaml             # First variation of parameters to train model
 22 | │   │   └── model2.yaml             # Second variation of parameters to train model
 23 | │   └── process                     # Configurations for processing data
 24 | │       ├── process1.yaml           # First variation of parameters to process data
 25 | │       └── process2.yaml           # Second variation of parameters to process data
 26 | ├── data
 27 | │   ├── final                       # data after training the model
 28 | │   ├── processed                   # data after processing
 29 | │   └── raw                         # raw data
 30 | ├── docs                            # documentation for your project
 31 | ├── .gitignore                      # files and directories that Git should ignore
 32 | ├── models                          # store models
 33 | ├── notebooks                       # store notebooks
 34 | {%- if cookiecutter.dependency_manager == "pip" %}
 35 | ├── pyproject.toml                  # project metadata and ruff/mypy settings
 36 | {%- elif cookiecutter.dependency_manager == "poetry" %}
 37 | ├── .pre-commit-config.yaml         # configurations for pre-commit
 38 | ├── pyproject.toml                  # dependencies for poetry
 39 | {%- elif cookiecutter.dependency_manager == "uv" %}
 40 | ├── .pre-commit-config.yaml         # configurations for pre-commit
 41 | ├── .python-version                 # specify Python version for the project
 42 | ├── pyproject.toml                  # project metadata and dependencies
 43 | {%- endif %}
 44 | ├── README.md                       # describe your project
 45 | ├── src                             # store source code
 46 | │   ├── __init__.py                 # make src a Python module
 47 | │   ├── process.py                  # process data before training model
 48 | │   ├── train_model.py              # train model
 49 | │   └── utils.py                    # store helper functions
 50 | └── tests                           # store tests
 51 |     ├── __init__.py                 # make tests a Python module
 52 |     ├── test_process.py             # test functions for process.py
 53 |     └── test_train_model.py         # test functions for train_model.py
 54 | ```
 55 | 
 56 | ## Version Control Setup
 57 | 
 58 | 1. Initialize Git in your project directory:
 59 | ```bash
 60 | git init
 61 | ```
 62 | 
 63 | 2. Add your remote repository:
 64 | ```bash
 65 | # For HTTPS
 66 | git remote add origin https://github.com/username/repository-name.git
 67 | 
 68 | # For SSH
 69 | git remote add origin git@github.com:username/repository-name.git
 70 | ```
 71 | 
 72 | 3. Create and switch to a new branch:
 73 | ```bash
 74 | git checkout -b main
 75 | ```
 76 | 
 77 | 4. Add and commit your files:
 78 | ```bash
 79 | git add .
 80 | git commit -m "Initial commit"
 81 | ```
 82 | 
 83 | 5. Push to your remote repository:
 84 | ```bash
 85 | git push -u origin main
 86 | ```
 87 | 
 88 | ## Set up the environment
 89 | 
 90 | {%- if cookiecutter.dependency_manager == "poetry" %}
 91 | 1. Install [Poetry](https://python-poetry.org/docs/#installation)
 92 | 
 93 | 2. Activate the virtual environment:
 94 | 
 95 | ```bash
 96 | poetry shell
 97 | ```
 98 | 
 99 | 3. Install dependencies:
100 | 
101 | - To install all dependencies from pyproject.toml, run:
102 | 
103 | ```bash
104 | poetry install
105 | ```
106 | 
107 | - To install only production dependencies, run:
108 | 
109 | ```bash
110 | poetry install --only main
111 | ```
112 | 
113 | Note: To follow the rest of the instructions in this README (including running tests, generating documentation, and using pre-commit hooks), it is recommended to install all dependencies using `poetry install`.
114 | 
115 | 4. Run Python scripts:
116 | 
117 | ```bash
118 | # Run directly with poetry
119 | poetry run python src/process.py
120 | 
121 | # Or after activating the virtual environment
122 | python src/process.py
123 | ```
124 | {%- elif cookiecutter.dependency_manager == "uv" %}
125 | 1. Install [uv](https://docs.astral.sh/uv/getting-started/installation/)
126 | 
127 | 2. Install dependencies:
128 | 
129 | - To install all dependencies from pyproject.toml, run:
130 | 
131 | ```bash
132 | uv sync --all-extras
133 | ```
134 | 
135 | - To install only production dependencies, run:
136 | 
137 | ```bash
138 | uv sync
139 | ```
140 | 
141 | Note: To follow the rest of the instructions in this README (including running tests, generating documentation, and using pre-commit hooks), it is recommended to install all dependencies using `uv sync --all-extras`.
142 | 
143 | 3. Run Python scripts:
144 | 
145 | ```bash
146 | uv run src/process.py
147 | ```
148 | {%- else %}
149 | 1. Create the virtual environment:
150 | 
151 | ```bash
152 | python3 -m venv venv
153 | ```
154 | 
155 | 2. Activate the virtual environment:
156 | 
157 | * For Linux/MacOS:
158 | 
159 | ```bash
160 | source venv/bin/activate
161 | ```
162 | 
163 | * For Windows (Command Prompt):
164 | 
165 | ```bash
166 | .\venv\Scripts\activate
167 | ```
168 | 
169 | 3. Install dependencies:
170 | 
171 | - To install all dependencies, run:
172 | 
173 | ```bash
174 | pip install -r requirements-dev.txt
175 | ```
176 | 
177 | - To install only production dependencies, run:
178 | 
179 | ```bash
180 | pip install -r requirements.txt
181 | ```
182 | 
183 | Note: To follow the rest of the instructions in this README (including running tests, generating documentation, and using pre-commit hooks), it is recommended to install all dependencies using `pip install -r requirements-dev.txt`.
184 | 
185 | 4. Run Python scripts:
186 | 
187 | ```bash
188 | # After activating the virtual environment
189 | python3 src/process.py
190 | ```
191 | {%- endif %}
192 | 
193 | ## Set up pre-commit hooks
194 | 
195 | {%- if cookiecutter.dependency_manager == "poetry" %}
196 | Set up pre-commit:
197 | ```bash
198 | poetry run pre-commit install
199 | ```
200 | {%- elif cookiecutter.dependency_manager == "uv" %}
201 | Set up pre-commit:
202 | ```bash
203 | uv run pre-commit install
204 | ```
205 | {%- else %}
206 | Set up pre-commit:
207 | ```bash
208 | pre-commit install
209 | ```
210 | {%- endif %}
211 | 
212 | The pre-commit configuration is already set up in `.pre-commit-config.yaml`. This includes:
213 | * `ruff`: A fast Python linter that automatically fixes issues when possible
215 | * `mypy`: Static type checking for Python to catch type-related errors before runtime
216 | 
217 | Pre-commit will now run automatically on every commit. If a check fails, the commit is aborted and ruff fixes the issues it can; stage the changes and commit again.
218 | 
219 | ## View and alter configurations
220 | 
221 | The project uses Hydra to manage configurations. You can view and modify these configurations from the command line.
222 | 
223 | To view available configurations:
224 | 
225 | {%- if cookiecutter.dependency_manager == "poetry" %}
226 | ```bash
227 | poetry run python src/process.py --help
228 | ```
229 | {%- elif cookiecutter.dependency_manager == "uv" %}
230 | ```bash
231 | uv run src/process.py --help
232 | ```
233 | {%- else %}
234 | ```bash
235 | python3 src/process.py --help
236 | ```
237 | {%- endif %}
238 | 
239 | Output:
240 | 
241 | ```yaml
242 | process is powered by Hydra.
243 | 
244 | == Configuration groups ==
245 | Compose your configuration from those groups (group=option)
246 | 
247 | model: model1, model2
248 | process: process1, process2
249 | 
250 | 
251 | == Config ==
252 | Override anything in the config (foo.bar=value)
253 | 
254 | process:
255 |   use_columns:
256 |   - col1
257 |   - col2
258 | model:
259 |   name: model1
260 | data:
261 |   raw: data/raw/sample.csv
262 |   processed: data/processed/processed.csv
263 |   final: data/final/final.csv
264 | ```
265 | 
266 | To override configurations (for example, changing the input data file):
267 | 
268 | {%- if cookiecutter.dependency_manager == "poetry" %}
269 | ```bash
270 | poetry run python src/process.py data.raw=sample2.csv
271 | ```
272 | {%- elif cookiecutter.dependency_manager == "uv" %}
273 | ```bash
274 | uv run src/process.py data.raw=sample2.csv
275 | ```
276 | {%- else %}
277 | ```bash
278 | python3 src/process.py data.raw=sample2.csv
279 | ```
280 | {%- endif %}
281 | 
282 | Output:
283 | 
284 | ```
285 | Process data using sample2.csv
286 | Columns used: ['col1', 'col2']
287 | ```
288 | 
289 | You can override any configuration value shown in the help output. Multiple overrides can be combined in a single command. For more information about Hydra's configuration system, visit the [official documentation](https://hydra.cc/docs/intro/).
290 | 
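To make the `key.subkey=value` syntax concrete, here is a simplified, standard-library-only sketch of how a dotted override maps onto a nested configuration. It is not Hydra's implementation (Hydra also parses value types, handles config groups, and more), and `apply_override` is a hypothetical helper:

```python
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `a.b=value` override applied.

    Values are kept as plain strings; Hydra itself does richer parsing.
    """
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    updated = copy.deepcopy(config)
    node = updated
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = value
    return updated


config = {
    "data": {
        "raw": "data/raw/sample.csv",
        "processed": "data/processed/processed.csv",
    },
    "model": {"name": "model1"},
}

new_config = apply_override(config, "data.raw=sample2.csv")
print(new_config["data"]["raw"])  # sample2.csv
print(config["data"]["raw"])      # data/raw/sample.csv (original is unchanged)
```

Several overrides can be applied by calling the helper once per `key=value` pair, which mirrors passing multiple overrides on the command line.
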
291 | ## Auto-generate API documentation
292 | 
293 | {%- if cookiecutter.dependency_manager == "poetry" %}
294 | Generate static documentation:
295 | ```bash
296 | poetry run pdoc src -o docs
297 | ```
298 | 
299 | Start documentation server (available at http://localhost:8080):
300 | ```bash
301 | poetry run pdoc src --http localhost:8080
302 | ```
303 | {%- elif cookiecutter.dependency_manager == "uv" %}
304 | Generate static documentation:
305 | ```bash
306 | uv run pdoc src -o docs
307 | ```
308 | 
309 | Start documentation server (available at http://localhost:8080):
310 | ```bash
311 | uv run pdoc src --http localhost:8080
312 | ```
313 | {%- else %}
314 | Generate static documentation:
315 | ```bash
316 | pdoc src -o docs
317 | ```
318 | 
319 | Start documentation server (available at http://localhost:8080):
320 | ```bash
321 | pdoc src --http localhost:8080
322 | ```
323 | {%- endif %}
324 | 
325 | The documentation will be generated from your docstrings and type hints in your Python files. The static documentation will be saved in the `docs` directory, while the live server allows you to view the documentation with hot-reloading as you make changes.
326 | 
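As an illustration of what pdoc picks up, a helper like the one below would be rendered with its signature, type hints, and docstring. The function `select_columns` is a hypothetical example and is not part of the template:

```python
def select_columns(rows: list[dict], use_columns: list[str]) -> list[dict]:
    """Keep only the requested columns in each row.

    Args:
        rows: Records represented as dictionaries, one per row.
        use_columns: Names of the columns to keep, in output order.

    Returns:
        A new list of rows containing only `use_columns`.
    """
    return [{column: row[column] for column in use_columns} for row in rows]


rows = [{"col1": 1, "col2": 2, "col3": 3}]
print(select_columns(rows, ["col1", "col2"]))  # [{'col1': 1, 'col2': 2}]
```
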


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/config/main.yaml:
--------------------------------------------------------------------------------
1 | defaults:
2 |   - process: process1
3 |   - model: model1
4 |   - _self_
5 | 
6 | data:
7 |   raw: data/raw/sample.csv
8 |   processed: data/processed/processed.csv
9 |   final: data/final/final.csv


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/config/model/model1.yaml:
--------------------------------------------------------------------------------
1 | name: model1


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/config/model/model2.yaml:
--------------------------------------------------------------------------------
1 | name: model2


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/config/process/process1.yaml:
--------------------------------------------------------------------------------
1 | use_columns: 
2 |   - col1
3 |   - col2


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/config/process/process2.yaml:
--------------------------------------------------------------------------------
1 | use_columns: 
2 |   - col1
3 |   - col2
4 |   - col3


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/data/final/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/data/final/.gitkeep


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/data/processed/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/data/processed/.gitkeep


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/data/raw/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/data/raw/.gitkeep


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/docs/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/docs/.gitkeep


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/models/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/models/.gitkeep


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/notebooks/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/notebooks/.gitkeep


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/pyproject.toml:
--------------------------------------------------------------------------------
  1 | {%- if cookiecutter.dependency_manager == "poetry" %}
  2 | [tool.poetry]
  3 | name = "{{ cookiecutter.__directory_name }}"
  4 | version = "0.1.0"
  5 | description = ""
  6 | authors = ["{{ cookiecutter.author_name }}"]
  7 | 
  8 | [tool.poetry.dependencies]
  9 | python = "{{ cookiecutter.compatible_python_versions }}"
 10 | hydra-core = "^1.3.2"
 11 | 
 12 | [tool.poetry.group.dev.dependencies]
 13 | pdoc3 = "^0.11.6"
 14 | pytest = "^8.3.5"
 15 | pre-commit = "^4.2.0"
 16 | 
 17 | [build-system]
 18 | requires = ["poetry-core>=1.0.0"]
 19 | build-backend = "poetry.core.masonry.api"
 20 | {% elif cookiecutter.dependency_manager == "uv" %}
 21 | [project]
 22 | name = "{{ cookiecutter.__directory_name }}"
 23 | version = "0.1.0"
 24 | description = ""
 25 | authors = [
 26 |     { name = "{{ cookiecutter.author_name }}" }
 27 | ]
 28 | requires-python = "{{ cookiecutter.compatible_python_versions }}"
 29 | dependencies = [
 30 |     "hydra-core>=1.3.2"
 31 | ]
 32 | 
 33 | [project.optional-dependencies]
 34 | dev = [
 35 |     "pdoc3>=0.11.6",
 36 |     "pytest>=8.3.5",
 37 |     "pre-commit>=4.2.0"
 38 | ]
 39 | {% else %}
 40 | [project]
 41 | name = "{{ cookiecutter.__directory_name }}"
 42 | version = "0.1.0"
 43 | description = ""
 44 | authors = [
 45 |     { name = "{{ cookiecutter.author_name }}" }
 46 | ]
 47 | dependencies = [
 48 |     "hydra-core>=1.3.2"
 49 | ]
 50 | 
 51 | [project.optional-dependencies]
 52 | dev = [
 53 |     "pdoc3>=0.11.6",
 54 |     "pytest>=8.3.5",
 55 |     "pre-commit>=4.2.0"
 56 | ]
 57 | {% endif %}
 58 | 
 59 | [tool.ruff]
 60 | # Exclude a variety of commonly ignored directories.
 61 | exclude = [
 62 |     ".bzr",
 63 |     ".direnv",
 64 |     ".eggs",
 65 |     ".git",
 66 |     ".git-rewrite",
 67 |     ".hg",
 68 |     ".mypy_cache",
 69 |     ".nox",
 70 |     ".pants.d",
 71 |     ".pytype",
 72 |     ".ruff_cache",
 73 |     ".svn",
 74 |     ".tox",
 75 |     ".venv",
 76 |     "__pypackages__",
 77 |     "_build",
 78 |     "buck-out",
 79 |     "build",
 80 |     "dist",
 81 |     "node_modules",
 82 |     "venv",
 83 | ]
 84 | 
 85 | # Same as Black.
 86 | line-length = 88
 87 | 
 88 | [tool.ruff.lint]
 89 | ignore = ["E501"]
 90 | select = ["B","C","E","F","W","B9", "I", "Q"]
 91 | 
 92 | [tool.ruff.format]
 93 | quote-style = "double"
 94 | indent-style = "tab"
 95 | skip-magic-trailing-comma = false
 96 | 
 97 | [tool.ruff.lint.mccabe]
 98 | max-complexity = 10
 99 | 
100 | [tool.mypy]
101 | ignore_missing_imports = true


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/requirements-dev.txt:
--------------------------------------------------------------------------------
1 | -r requirements.txt
2 | pdoc3>=0.11.6
3 | pytest>=8.3.5
4 | pre-commit>=4.2.0


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/requirements.txt:
--------------------------------------------------------------------------------
1 | hydra-core>=1.3.2


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/src/__init__.py:
--------------------------------------------------------------------------------
1 | """Source code of your project"""


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/src/process.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Demo code that uses Hydra to access the parameters under the config directory.
 3 | 
 4 | Author: Khuyen Tran
 5 | """
 6 | 
 7 | import hydra
 8 | from omegaconf import DictConfig
 9 | 
10 | 
11 | @hydra.main(config_path="../config", config_name="main", version_base="1.2")
12 | def process_data(config: DictConfig):
13 |     """Function to process the data"""
14 | 
15 |     print(f"Process data using {config.data.raw}")
16 |     print(f"Columns used: {config.process.use_columns}")
17 | 
18 | 
19 | if __name__ == "__main__":
20 |     process_data()
21 | 


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/src/train_model.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Demo code that uses Hydra to access the parameters under the config directory.
 3 | 
 4 | Author: Khuyen Tran
 5 | """
 6 | 
 7 | import hydra
 8 | from omegaconf import DictConfig
 9 | 
10 | 
11 | @hydra.main(config_path="../config", config_name="main", version_base="1.2")
12 | def train_model(config: DictConfig):
13 |     """Function to train the model"""
14 | 
15 |     print(f"Train model using {config.data.processed}")
16 |     print(f"Model used: {config.model.name}")
17 |     print(f"Save the output to {config.data.final}")
18 | 
19 | 
20 | if __name__ == "__main__":
21 |     train_model()
22 | 


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/src/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/src/utils.py


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/tests/__init__.py:
--------------------------------------------------------------------------------
1 | # Avoid ModuleNotFoundError
2 | 
3 | import sys
4 | 
5 | sys.path.append("./src")
6 | 


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/tests/test_process.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/tests/test_process.py


--------------------------------------------------------------------------------
/{{cookiecutter.__directory_name}}/tests/test_train_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CodeCutTech/data-science-template/8fb1bc546d24e04ceea2a5f2537788ca49fce682/{{cookiecutter.__directory_name}}/tests/test_train_model.py


--------------------------------------------------------------------------------