├── Code
│   ├── Deployment
│   │   ├── operationalization.py
│   │   └── Readme.md
│   ├── Modeling
│   │   ├── model.R
│   │   └── Readme.md
│   ├── Data_Acquisition_and_Understanding
│   │   ├── dataPrep.py
│   │   ├── datapipeline.json
│   │   └── Readme.md
│   └── Readme.md
├── Docs
│   ├── Model
│   │   ├── README.md
│   │   ├── FinalReport.md
│   │   ├── Model 1
│   │   │   └── Model Report.md
│   │   └── Baseline
│   │       └── Baseline Models.md
│   ├── Data_Report
│   │   ├── ReadMe.md
│   │   ├── DataPipeline.txt
│   │   ├── DataSummaryReport.md
│   │   └── Data Defintion.md
│   ├── Project
│   │   ├── System Architecture.docx
│   │   ├── README.md
│   │   ├── Charter.md
│   │   └── Exit Report.md
│   ├── Data_Dictionaries
│   │   ├── data-dictionary-from-sql-table.PNG
│   │   ├── Raw-Data-Dictionary.csv
│   │   └── ReadMe.md
│   └── README.md
├── LICENSE.TXT
├── Sample_Data
│   ├── For_Modeling
│   │   └── modelling.md
│   ├── Raw
│   │   └── rawData.md
│   ├── Processed
│   │   └── processed.md
│   └── README.md
├── NOTICE.TXT
├── LICENSE-CODE.TXT
└── README.md

/Code/Deployment/operationalization.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Code/Modeling/model.R:
--------------------------------------------------------------------------------
# R code for models
--------------------------------------------------------------------------------
/Code/Data_Acquisition_and_Understanding/dataPrep.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Docs/Model/README.md:
--------------------------------------------------------------------------------
# Folder for hosting all documents and reports related to modeling
--------------------------------------------------------------------------------
/LICENSE.TXT:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/Azure-TDSP-ProjectTemplate/HEAD/LICENSE.TXT
--------------------------------------------------------------------------------
/Docs/Data_Report/ReadMe.md:
--------------------------------------------------------------------------------
# Data Report Folder
_Location to place documents describing the results of data exploration_
--------------------------------------------------------------------------------
/Docs/Project/System Architecture.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/Azure-TDSP-ProjectTemplate/HEAD/Docs/Project/System Architecture.docx
--------------------------------------------------------------------------------
/Code/Data_Acquisition_and_Understanding/datapipeline.json:
--------------------------------------------------------------------------------
# Add your data pipeline spec here. This is dependent on your tool. For Azure Data Factory (ADF) it is a set of JSON files.
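# As an illustration only (the names and structure below are placeholders, not a validated ADF spec),
# a minimal ADF pipeline definition has roughly this shape:
{
  "name": "DataPrepPipeline",
  "properties": {
    "description": "Copy raw data from the source store to the staging store",
    "activities": [
      {
        "name": "CopyRawData",
        "type": "Copy",
        "inputs": [ { "referenceName": "RawDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "StagedDataset", "type": "DatasetReference" } ]
      }
    ]
  }
}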

--------------------------------------------------------------------------------
/Docs/Data_Dictionaries/data-dictionary-from-sql-table.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/Azure-TDSP-ProjectTemplate/HEAD/Docs/Data_Dictionaries/data-dictionary-from-sql-table.PNG
--------------------------------------------------------------------------------
/Code/Deployment/Readme.md:
--------------------------------------------------------------------------------
# This folder contains code for model deployment

You can add a detailed description related to your specific data science project in this markdown file.
--------------------------------------------------------------------------------
/Docs/Data_Dictionaries/Raw-Data-Dictionary.csv:
--------------------------------------------------------------------------------
Column Index,Column Name,Type of Variable,"Values (range, levels, examples, etc.)",Short Description,Joining Keys with other datasets?
1,,,,,
2,,,,,
3,,,,,
--------------------------------------------------------------------------------
/Docs/Data_Report/DataPipeline.txt:
--------------------------------------------------------------------------------
# Data Pipeline

Describe the data pipeline and provide a logical diagram. List how frequently the data is moved: real time/stream, near real time, or batched at a given frequency.
--------------------------------------------------------------------------------
/Code/Data_Acquisition_and_Understanding/Readme.md:
--------------------------------------------------------------------------------
# This folder hosts code for data acquisition and understanding (exploratory analysis)

You can add a detailed description related to your specific data science project in this markdown file.
--------------------------------------------------------------------------------
/Code/Modeling/Readme.md:
--------------------------------------------------------------------------------
# This folder contains code for modeling and related activities (such as feature engineering, model evaluation, etc.)

You can add a detailed description related to your specific data science project in this markdown file.
--------------------------------------------------------------------------------
/Code/Readme.md:
--------------------------------------------------------------------------------
# Code folder for hosting code for a Data Science Project

This folder hosts all code for a data science project. It has three sub-folders, corresponding to three stages of the Data Science Lifecycle:

1. Data_Acquisition_and_Understanding
2. Modeling
3. Deployment
--------------------------------------------------------------------------------
/Docs/README.md:
--------------------------------------------------------------------------------
# Folder for hosting all documents for a Data Science Project

Documents will contain information about the following:

1. System architecture
2. Data dictionaries
3. Reports related to data understanding and modeling
4. Project management and planning docs
5. Information obtained from a business owner or client about the project
6. Docs and presentations prepared to share information about the project
--------------------------------------------------------------------------------
/Sample_Data/For_Modeling/modelling.md:
--------------------------------------------------------------------------------
# List of feature sets

| Feature Set Name | Link to the Full Feature Set | Full Feature Set Size (MB) | Link to Report |
| ---:| ---: | ---: | ---: |
| Feature Set 1 | [link](link/to/feature/set1) | 2,000 | [Feature Set 1 Report](link/to/report1)|
| Feature Set 2 | [link](link/to/feature/set2) | 300 | [Feature Set 2 Report](link/to/report2)|

If the link to the full dataset does not apply, provide some information on how to access the full dataset.

If the data stays in Azure file storage, please provide the link to a text file, checked in to the git repository, that contains the file storage information.
--------------------------------------------------------------------------------
/Docs/Project/README.md:
--------------------------------------------------------------------------------
# Folder for hosting project documents and reports for a Data Science Project

These could be:

1. Project management and planning docs
2. System architecture
3. Information obtained from a business owner or client about the project
4. Docs and presentations prepared to share information about the project

In this folder, we have templates for the project charter and exit report.

In addition, if you have access to Microsoft Project or Excel, you may use the project templates provided in this [blog](https://blogs.msdn.microsoft.com/buckwoody/2017/10/24/a-data-science-microsoft-project-template-you-can-use-in-your-solutions).
--------------------------------------------------------------------------------
/Sample_Data/Raw/rawData.md:
--------------------------------------------------------------------------------
## List of Raw Datasets

| Raw Dataset Name | Link to the Full Dataset | Full Dataset Size (MB) | Link to Report |
| ---:| ---: | ---: | ---: |
| Raw Dataset 1 | [link](link/to/full/dataset1) | 2,000 | [Raw Dataset 1 Report](link/to/report1)|
| Raw Dataset 2 | [link](link/to/full/dataset2) | 300 | [Raw Dataset 2 Report](link/to/report2)|

If the link to the full dataset does not apply, provide some information on how to access the full dataset.

If the data stays in Azure file storage, please provide the link to a text file, checked in to the git repository, that contains the file storage information.
--------------------------------------------------------------------------------
/Sample_Data/Processed/processed.md:
--------------------------------------------------------------------------------
## List of Processed Datasets

| Processed Dataset Name | Link to the Full Processed Dataset | Full Processed Dataset Size (MB) | Link to Report |
| ---:| ---: | ---: | ---: |
| Processed Dataset 1 | [link](link/to/processed/dataset1) | 2,000 | [Processed Dataset 1 Report](link/to/report1)|
| Processed Dataset 2 | [link](link/to/processed/dataset2) | 300 | [Processed Dataset 2 Report](link/to/report2)|

If the link to the full dataset does not apply, provide some information on how to access the full dataset.

If the data stays in Azure file storage, please provide the link to a text file, checked in to the git repository, that contains the file storage information.
--------------------------------------------------------------------------------
/Docs/Data_Report/DataSummaryReport.md:
--------------------------------------------------------------------------------
# Data Report
This file will be generated for each data file received or processed. The Interactive Data Exploration, Analysis, and Reporting (IDEAR) utility developed by the TDSP team at Microsoft can help you explore and visualize the data in an interactive way, and generate the data report along the way as you explore and visualize.

IDEAR allows you to output into the report the data summary, statistics, and charts that you want to use to tell the data story. You only need to click a few buttons, and the report is generated for you.

## General summary of the data

## Data quality summary

## Target variable

## Individual variables

## Variable ranking

## Relationship between explanatory variables and target variable
--------------------------------------------------------------------------------
/Sample_Data/README.md:
--------------------------------------------------------------------------------
The **Sample_Data** directory in the project git repository is the place to store **SAMPLE** datasets, which should be small in size; it is **NOT** for entire datasets. If your client does not allow you to store even sample data in the GitHub repository, store, if possible, a sample dataset with all confidential fields hashed. If that is still not allowed, please do not store sample data here, but do still fill in the table in each sub-directory.

The small sample datasets can be used to make your data preprocessing, feature engineering, or modeling scripts runnable. They make it easy to quickly run the scripts that process or model the data and to understand what the scripts are doing.

In each sub-directory, there is a markdown file which lists all datasets in that directory. Please provide the link to the full dataset in case one wants to access it.
--------------------------------------------------------------------------------
/Docs/Model/FinalReport.md:
--------------------------------------------------------------------------------
# Final Model Report
_Report describing the final model to be delivered, typically comprising one or more of the models built during the life of the project_

## Analytic Approach
* What is the target definition?
* What are the inputs (description)?
* What kind of model was built?

## Solution Description
* Simple solution architecture (data sources, solution components, data flow)
* What is the output?

## Data
* Source
* Data schema
* Sampling
* Selection (dates, segments)
* Stats (counts)

## Features
* List of raw and derived features
* Importance ranking

## Algorithm
* Description or images of data flow graph
  * If AzureML, link to:
    * Training experiment
    * Scoring workflow
* What learner(s) were used?
* Learner hyper-parameters

## Results
* ROC/Lift charts, AUC, R^2, MAPE as appropriate
* Performance graphs for parameter sweeps, if applicable
--------------------------------------------------------------------------------
/NOTICE.TXT:
--------------------------------------------------------------------------------
## Legal Notices
Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the LICENSE file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation
may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries.
The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.
Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.
--------------------------------------------------------------------------------
/Docs/Data_Dictionaries/ReadMe.md:
--------------------------------------------------------------------------------
# Data Dictionaries
_Place to put data description documents, typically received from a client_

This is typically a field-level description of the data files received.

This document provides the descriptions of the data provided by the client. If the client provides data dictionaries in text (in emails or text files), copy them directly here, or take a snapshot of the text and add it here as an image. If the client provides data dictionaries in Excel worksheets, put the Excel files directly in this directory, and add a link to each Excel file here.

If the client provides the data from a database-like data management system, you can also copy and paste the data schema (snapshot) here. If necessary, also provide a brief description of each column after the snapshot image, in case the image does not include such information.

##

_Example image of data schema when data is from a SQL server_

![](data-dictionary-from-sql-table.PNG)

##

[dataset 2 with dictionary in Excel](./Raw-Data-Dictionary.csv)
--------------------------------------------------------------------------------
/LICENSE-CODE.TXT:
--------------------------------------------------------------------------------
The MIT License (MIT)
Copyright (c) Microsoft Corporation. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/Docs/Model/Model 1/Model Report.md:
--------------------------------------------------------------------------------
# Model Report
_A report to provide details on a specific experiment (model) - possibly one of many_

If applicable, the Automated Modeling and Reporting utility developed by the Microsoft TDSP team can be used to generate reports, which can provide content for most of the sections in this model report.

## Analytic Approach
* What is the target definition?
* What are the inputs (description)?
* What kind of model was built?

## Model Description

* Models and Parameters

* Description or images of data flow graph
  * If AzureML, link to:
    * Training experiment
    * Scoring workflow
* What learner(s) were used?
* Learner hyper-parameters

## Results (Model Performance)
* ROC/Lift charts, AUC, R^2, MAPE as appropriate
* Performance graphs for parameter sweeps, if applicable

## Model Understanding

* Variable Importance (significance)

* Insight Derived from the Model

## Conclusion and Discussions for Next Steps

* Conclusion

* Discussion on overfitting (if applicable)

* What other Features Can Be Generated from the Current Data

* What other Relevant Data Sources Are Available to Help the Modeling
--------------------------------------------------------------------------------
/Docs/Model/Baseline/Baseline Models.md:
--------------------------------------------------------------------------------
# Baseline Model Report

_The baseline model is the model a data scientist trains and evaluates quickly once the first (preliminary) feature set is ready for machine learning modeling. By building the baseline model, the data scientist can make a quick assessment of the feasibility of the machine learning task._

When applicable, the Automated Modeling and Reporting utility developed by the TDSP team at Microsoft can be employed to build baseline models quickly, and the baseline model report can easily be generated from this utility.

> If using the Automated Modeling and Reporting tool, most of the sections below will be generated automatically from this tool.
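
As a minimal illustration of such a quick baseline (assuming a tabular, numeric feature set with a binary target; the file path and column name below are placeholders, not part of this template):

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder sample file; substitute your own preliminary feature set.
df = pd.read_csv("Sample_Data/For_Modeling/features_sample.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Majority-class baseline: the floor any real model must beat.
majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# A first quick learner on the preliminary features.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, model in [("majority-class", majority), ("logistic regression", logit)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name} AUC: {auc:.3f}")
```

If the quick learner barely improves on the majority-class floor, that is an early signal that the current feature set may not support the machine learning task.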
## Analytic Approach
* What is the target definition?
* What are the inputs (description)?
* What kind of model was built?

## Model Description

* Models and Parameters

* Description or images of data flow graph
  * If AzureML, link to:
    * Training experiment
    * Scoring workflow
* What learner(s) were used?
* Learner hyper-parameters

## Results (Model Performance)
* ROC/Lift charts, AUC, R^2, MAPE as appropriate
* Performance graphs for parameter sweeps, if applicable

## Model Understanding

* Variable Importance (significance)

* Insight Derived from the Model

## Conclusion and Discussions for Next Steps

* Conclusion on Feasibility Assessment of the Machine Learning Task

* Discussion on Overfitting (If Applicable)

* What other Features Can Be Generated from the Current Data

* What other Relevant Data Sources Are Available to Help the Modeling
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# TDSP Project Structure, and Documents and Artifact Templates

This is a general project directory structure for the Team Data Science Process developed by Microsoft. It also contains templates for the various documents that are recommended as part of executing a data science project when using TDSP.

[Team Data Science Process (TDSP)](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview) is an agile, iterative data science methodology to improve collaboration and team learning. It is supported through a lifecycle definition, a standard project structure, artifact templates, and [tools](https://github.com/Azure/Azure-TDSP-Utilities) for productive data science.

**NOTE:** In this directory structure, the **Sample_Data folder is NOT supposed to contain LARGE raw or processed data**. It is only supposed to contain **small, sample** data sets, which can be used to test the code.

The two documents under Docs/Project, namely the [Charter](./Docs/Project/Charter.md) and [Exit Report](./Docs/Project/Exit%20Report.md), are particularly important to consider. They help to define the project at the start of an engagement, and provide a final report to the customer or client.

**NOTE:** In some projects, e.g. short-term proof of concept (PoC) or proof of value (PoV) engagements, it can be relatively time consuming to create all the recommended documents and artifacts. In that case, at least the Charter and Exit Report should be created and delivered to the customer or client. As necessary, organizations may modify certain sections of the documents. But it is strongly recommended that the content of the documents be maintained, as they provide important information about the project and its deliverables.
--------------------------------------------------------------------------------
/Docs/Project/Charter.md:
--------------------------------------------------------------------------------
# Project Charter

## Business background

* Who is the client, and what business domain is the client in?
* What business problems are we trying to address?

## Scope

* What data science solutions are we trying to build?
* What will we do?
* How is it going to be consumed by the customer?

## Personnel

* Who is on this project:
  * Microsoft:
    * Project lead
    * PM
    * Data scientist(s)
    * Account manager
  * Client:
    * Data administrator
    * Business contact

## Metrics

* What are the qualitative objectives? (e.g. reduce user churn)
* What is a quantifiable metric? (e.g. reduce the fraction of users with 4-week inactivity)
* Quantify what improvement in the values of the metrics is useful for the customer scenario (e.g. reduce the fraction of users with 4-week inactivity by 20%)
* What is the baseline (current) value of the metric? (e.g. current fraction of users with 4-week inactivity = 60%)
* How will we measure the metric? (e.g. A/B test on a specified subset for a specified period; or comparison of performance after implementation to baseline)

## Plan

* Phases (milestones), timeline, and a short description of what we'll do in each phase.

## Architecture

* Data
  * What data do we expect? Raw data in the customer data sources (e.g. on-prem files, SQL, on-prem Hadoop, etc.)
  * Data movement from on-prem to Azure using ADF or other data movement tools (AzCopy, Event Hubs, etc.) to move either:
    * all the data,
    * the data after some pre-aggregation on-prem, or
    * sampled data that is enough for modeling
* What tools and data storage/analytics resources will be used in the solution, e.g.,
  * ASA for stream aggregation
  * HDI/Hive/R/Python for feature construction, aggregation, and sampling
  * AzureML for modeling and web service operationalization
* How will the scoring or operationalized web service(s) (RRS and/or BES) be consumed in the business workflow of the customer? If applicable, write down pseudo code for the APIs of the web service calls.
* How will the customer use the model results to make decisions?
* Data movement pipeline in production
* Make a one-slide diagram showing the end-to-end data flow and decision architecture
  * If there is a substantial change in the customer's business workflow, make a before/after diagram showing the data flow.

## Communication

* How will we keep in touch? Weekly meetings?
* Who are the contact persons on both sides?
--------------------------------------------------------------------------------
/Docs/Project/Exit Report.md:
--------------------------------------------------------------------------------
# Exit Report of Project for Customer

Instructions: Template for exit criteria for data science projects. This is a concise document that includes an overview of the entire project, including details of each stage and learnings. If a section isn't applicable (e.g. the project didn't include an ML model), simply mark that section as "Not applicable". Suggested length is 5 to 20 pages. Code should mostly reside in the code repository (not in this document).

Customer:

Team Members:

## Overview


## Business Domain


## Business Problem


## Data Processing


## Modeling, Validation


## Solution Architecture


## Benefits

### Company Benefit (internal only. Double check if you want to share this with your customer)


### Customer Benefit
What is the benefit (ROI, savings, productivity gains, etc.) for the customer? If just a PoC, what is the estimated ROI?
If exact metrics are not available, why does it have an impact for the customer?

## Learnings

### Project Execution


### Data Science / Engineering


### Domain


### Product


### What's unique about this project, specific challenges


## Links


## Next Steps


## Appendix
--------------------------------------------------------------------------------
/Docs/Data_Report/Data Defintion.md:
--------------------------------------------------------------------------------
# Data and Feature Definitions

This document provides a central hub for the raw data sources, the processed/transformed data, and the feature sets. More details on each dataset are provided in the data summary report.

For each dataset, an individual report is provided describing the data schema, the meaning of each data field, and other information that is helpful for understanding the data. If a dataset is the output of processing/transforming/feature engineering existing data set(s), the names of the input data sets and links to the scripts used to conduct the operation are also provided.

When applicable, the Interactive Data Exploration, Analysis, and Reporting (IDEAR) utility developed by Microsoft is applied to explore and visualize the data, and to generate the data report. Instructions on how to use IDEAR can be found [here]().

For each dataset, the links to the sample datasets in the _**Data**_ directory are also provided.

_**For ease of modifying this report, placeholder links are included in this page, for example a link to dataset 1, but they are just placeholders pointing to a non-existent page. These should be modified to point to the actual locations.**_

## Raw Data Sources

| Dataset Name | Original Location | Destination Location | Data Movement Tools / Scripts | Link to Report |
| ---:| ---: | ---: | ---: | -----: |
| Dataset 1 | Brief description of its original location | Brief description of its destination location | [script1.py](link/to/python/script/file/in/Code) | [Dataset 1 Report](link/to/report1)|
| Dataset 2 | Brief description of its original location | Brief description of its destination location | [script2.R](link/to/R/script/file/in/Code) | [Dataset 2 Report](link/to/report2)|

* Dataset 1 summary.
* Dataset 2 summary.

## Processed Data

| Processed Dataset Name | Input Dataset(s) | Data Processing Tools/Scripts | Link to Report |
| ---:| ---: | ---: | ---: |
| Processed Dataset 1 | [Dataset1](link/to/dataset1/report), [Dataset2](link/to/dataset2/report) | [Python_Script1.py](link/to/python/script/file/in/Code) | [Processed Dataset 1 Report](link/to/report1)|
| Processed Dataset 2 | [Dataset2](link/to/dataset2/report) | [script2.R](link/to/R/script/file/in/Code) | [Processed Dataset 2 Report](link/to/report2)|

* Processed Dataset 1 summary.
* Processed Dataset 2 summary.
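
As a minimal sketch of the kind of processing script the table above links to (the paths and column names here are placeholders, not part of this template):

```python
import numpy as np
import pandas as pd

# Placeholder input; in practice this matches a raw dataset location listed above.
raw = pd.read_csv("Sample_Data/Raw/raw_sample.csv")

# Example transformations: drop incomplete rows and derive a log-scaled feature.
processed = raw.dropna()
processed["amount_log"] = np.log1p(processed["amount"].clip(lower=0))

# Write the processed output to the destination recorded in the table above.
processed.to_csv("Sample_Data/Processed/processed_sample.csv", index=False)
```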

## Feature Sets

| Feature Set Name | Input Dataset(s) | Feature Engineering Tools/Scripts | Link to Report |
| ---:| ---: | ---: | ---: |
| Feature Set 1 | [Dataset1](link/to/dataset1/report), [Processed Dataset2](link/to/dataset2/report) | [R_Script2.R](link/to/R/script/file/in/Code) | [Feature Set 1 Report](link/to/report1)|
| Feature Set 2 | [Processed Dataset2](link/to/dataset2/report) | [SQL_Script2.sql](link/to/sql/script/file/in/Code) | [Feature Set 2 Report](link/to/report2)|

* Feature Set 1 summary.
* Feature Set 2 summary.
--------------------------------------------------------------------------------