├── .gitignore
├── Insight_Project_Framework
    ├── README.md
    └── scratchpad.py
├── LICENSE
├── README.md
├── build
    ├── environment.sh
    └── requirements.txt
├── configs
    └── example_config.yml
├── data
    ├── preprocessed
    │   └── example.txt
    ├── processed
    │   └── example_output.txt
    └── raw
    │   └── raw_data.txt
├── requirements.txt
└── tests
    └── README.md

/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json
--------------------------------------------------------------------------------
/Insight_Project_Framework/README.md:
--------------------------------------------------------------------------------
All of your project code will go here

--------------------------------------------------------------------------------
/Insight_Project_Framework/scratchpad.py:
--------------------------------------------------------------------------------
"""
This scratchpad uses Streamlit, which should be installed on your machine if
you follow the Insight Installation Instructions:

    https://docs.google.com/presentation/d/1qo_MDz3iF0YRykuElF6I9WC4yAQIYzOA-GY16_NOuUM

Or by running:

    pip install -r requirements.txt

from the top-level project folder.
"""

import streamlit as st
import numpy as np

st.write('This is a scratchpad for *Streamlit.* **Edit it and see what happens!**')

st.subheader('A Numpy Array')

st.write(np.random.randn(10, 10))

st.subheader('A Graph!')

st.line_chart(np.random.randn(200, 2))

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Matthew Rubashkin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Insight_Project_Framework
Framework for machine learning projects at Insight Data Science.

## Motivation for this project format:
- **Insight_Project_Framework** : Put all source code for production within a structured directory
- **tests** : Put all source code for testing in an easy-to-find location
- **configs** : Enable modification of all preset variables within a single directory (consisting of one or many config files for separate tasks)
- **data** : Include a small amount of example data in the GitHub repository so tests can be run to validate installation
- **build** : Include scripts that automate building of a standalone environment
- **static** : Any images or content to include in the README or web framework if part of the pipeline

## Setup
Clone the repository and update the Python path
```
repo_name=Insight_Project_Framework # Name of your new repository
username=mrubash1 # Username for your personal github account
git clone https://github.com/$username/$repo_name
cd $repo_name
echo "export $repo_name=${PWD}" >> ~/.bash_profile
# Use an absolute path so PYTHONPATH resolves from any working directory
echo "export PYTHONPATH=${PWD}/src:${PYTHONPATH}" >> ~/.bash_profile
source ~/.bash_profile
```
Create a new development branch and switch onto it
```
branch_name=dev-readme_requisites-20180905 # Name of development branch, of the form 'dev-feature_name-date_of_creation'
git checkout -b $branch_name
```

## Initial Commit
Let's start with a blank slate: remove `.git` and reinitialize the repo
```
cd $repo_name
rm -rf .git
git init
git status
```
You'll see a list of files; these are files that git doesn't track yet. At this point, feel free to change the directory names to match your project, e.g. rename the parent directory `Insight_Project_Framework` and the project directory `Insight_Project_Framework`.
Now commit these:
```
git add .
git commit -m "Initial commit"
git push origin $branch_name
```

## Requisites

- List all packages and software needed to build the environment
- This could include cloud command line tools (e.g. gsutil), package managers (e.g. conda), etc.

#### Dependencies

- [Streamlit](https://streamlit.io)

#### Installation
To install the package above, please run:
```shell
pip install -r requirements.txt
```

## Build Environment
- Include instructions on how to launch scripts in the build subfolder
- Build scripts can include shell scripts or python setup.py files
- The purpose of these scripts is to build a standalone environment for running the code in this repository
- The environment can be for local use, or for use in a cloud environment
- If using a cloud environment, commands could include CLI tools from a cloud provider (e.g. gsutil from Google Cloud Platform)
```
# Example

# Step 1
# Step 2
```

## Configs
- We recommend using either .yaml or .txt for your config files, not .json
- **DO NOT STORE CREDENTIALS IN THE CONFIG DIRECTORY!!**
- If credentials are needed, use environment variables or HashiCorp's [Vault](https://www.vaultproject.io/)


## Test
- Include instructions on how to run all tests after the software is installed
```
# Example

# Step 1
# Step 2
```

## Run Inference
- Include instructions on how to run inference
- e.g.
image classification on a single image for a CNN deep learning project
```
# Example

# Step 1
# Step 2
```

## Build Model
- Include instructions on how to build the model
- This can be done either locally or on the cloud
```
# Example

# Step 1
# Step 2
```

## Serve Model
- Include instructions on how to set up a REST or RPC endpoint
- This is for running remote inference via a custom model
```
# Example

# Step 1
# Step 2
```

## Analysis
- Include some form of EDA (exploratory data analysis)
- And/or include benchmarking of the model and results
```
# Example

# Step 1
# Step 2
```

--------------------------------------------------------------------------------
/build/environment.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mrubash1/Insight_Project_Framework/392e34ef3614809774744bb88339dea9cb09ca76/build/environment.sh
--------------------------------------------------------------------------------
/build/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mrubash1/Insight_Project_Framework/392e34ef3614809774744bb88339dea9cb09ca76/build/requirements.txt
--------------------------------------------------------------------------------
/configs/example_config.yml:
--------------------------------------------------------------------------------
# Dataset Stuff -------------------------------------------------
#
data_path: ~/data
output_path: ~/output

val_size: 10000
train_chunk_size: 40000


# Training Hyperparams --------------------------------------

batch_size: 128
num_epochs: 200
validation_every: 1

weight_decay: 0.0005

learning_rate_schedule:
  init: 0.1
  final: 0.0001

momentum_schedule:
  0: 0.0
  1: 0.5
  2: 0.9

layer_config:
  0:
    layer_type: InputLayer
    input_shape: [128, 1, 91, 64]

  1:
    layer_type: Conv2DLayer
    n_filters: 64
    filter_size: [8,59]
    nonlinearity: rectifier
    init_bias_value: 0.01

  2:
    layer_type: MaxPooling2DLayer
    pool_size: [6,3]
    ignore_border: False

  3:
    layer_type: DenseLayer
    n_outputs: 500
    nonlinearity: rectifier
    init_bias_value: 0.1
    dropout: 0.5

  4:
    layer_type: DenseLayer
    n_outputs: 2
    nonlinearity: sigmoid
    init_bias_value: 0.1
    dropout: 0.0

  5:
    layer_type: OutputLayer
--------------------------------------------------------------------------------
/data/preprocessed/example.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mrubash1/Insight_Project_Framework/392e34ef3614809774744bb88339dea9cb09ca76/data/preprocessed/example.txt
--------------------------------------------------------------------------------
/data/processed/example_output.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mrubash1/Insight_Project_Framework/392e34ef3614809774744bb88339dea9cb09ca76/data/processed/example_output.txt
--------------------------------------------------------------------------------
/data/raw/raw_data.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mrubash1/Insight_Project_Framework/392e34ef3614809774744bb88339dea9cb09ca76/data/raw/raw_data.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# To install these Python dependencies, please type:
#
#   pip install -r requirements.txt

streamlit

--------------------------------------------------------------------------------
/tests/README.md:
--------------------------------------------------------------------------------
Extra credit for writing tests!

The structure of this directory should mirror the structure of your project
directory. For each file in your project directory, `/.py`
you'll have a test file here: `/test_.py`

If you want to learn more about tests, check out this video:

https://www.youtube.com/watch?v=6tNS--WetLI

If you end up writing tests, this is another good thing to know about:

https://docs.scipy.org/doc/numpy/reference/routines.testing.html

--------------------------------------------------------------------------------
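As a rough illustration of the mirrored layout described in the tests README, the sketch below pairs a project function with a test for it. The function `scale_to_unit_range` and both test names are hypothetical and do not exist in this repository; in practice the function would live in the project directory and the tests in a matching `test_` file under `tests/`. The sketch uses only the standard library; for array-valued results, the NumPy testing routines linked in the README (for example `numpy.testing.assert_allclose`) would be a reasonable substitute.

```python
import math


def scale_to_unit_range(values):
    """Hypothetical project function: rescale numbers linearly into [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def test_scale_to_unit_range():
    """Hypothetical test that would sit in the mirrored tests directory."""
    result = scale_to_unit_range([2.0, 4.0, 6.0])
    expected = [0.0, 0.5, 1.0]
    assert len(result) == len(expected)
    for got, want in zip(result, expected):
        # Compare floats with a tolerance rather than exact equality.
        assert math.isclose(got, want, abs_tol=1e-12)


def test_scale_to_unit_range_constant_input():
    """Constant input should map to all zeros instead of raising an error."""
    assert scale_to_unit_range([3.0, 3.0]) == [0.0, 0.0]
```

Test functions named with a `test_` prefix like these are discovered automatically by runners such as pytest, assuming one is installed; it is not listed in this repository's requirements.txt.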