├── Code
│   ├── Deployment
│   │   ├── operationalization.py
│   │   └── Readme.md
│   ├── Modeling
│   │   ├── model.R
│   │   └── Readme.md
│   ├── Data_Acquisition_and_Understanding
│   │   ├── dataPrep.py
│   │   ├── datapipeline.json
│   │   └── Readme.md
│   └── Readme.md
├── Docs
│   ├── Model
│   │   ├── README.md
│   │   ├── FinalReport.md
│   │   ├── Model 1
│   │   │   └── Model Report.md
│   │   └── Baseline
│   │       └── Baseline Models.md
│   ├── Data_Report
│   │   ├── ReadMe.md
│   │   ├── DataPipeline.txt
│   │   ├── DataSummaryReport.md
│   │   └── Data Defintion.md
│   ├── Project
│   │   ├── System Architecture.docx
│   │   ├── README.md
│   │   ├── Charter.md
│   │   └── Exit Report.md
│   ├── Data_Dictionaries
│   │   ├── data-dictionary-from-sql-table.PNG
│   │   ├── Raw-Data-Dictionary.csv
│   │   └── ReadMe.md
│   └── README.md
├── LICENSE.TXT
├── Sample_Data
│   ├── For_Modeling
│   │   └── modelling.md
│   ├── Raw
│   │   └── rawData.md
│   ├── Processed
│   │   └── processed.md
│   └── README.md
├── NOTICE.TXT
├── LICENSE-CODE.TXT
└── README.md

/Code/Deployment/operationalization.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Code/Modeling/model.R:
--------------------------------------------------------------------------------
# R code for models
--------------------------------------------------------------------------------
/Code/Data_Acquisition_and_Understanding/dataPrep.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Docs/Model/README.md:
--------------------------------------------------------------------------------
# Folder for hosting all documents and reports related to modeling
--------------------------------------------------------------------------------
/LICENSE.TXT:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/Azure-TDSP-ProjectTemplate/HEAD/LICENSE.TXT
--------------------------------------------------------------------------------
/Docs/Data_Report/ReadMe.md:
--------------------------------------------------------------------------------
# Data Report Folder
_Location to place documents describing the results of data exploration_
--------------------------------------------------------------------------------
/Docs/Project/System Architecture.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/Azure-TDSP-ProjectTemplate/HEAD/Docs/Project/System Architecture.docx
--------------------------------------------------------------------------------
/Code/Data_Acquisition_and_Understanding/datapipeline.json:
--------------------------------------------------------------------------------
# Add your data pipeline spec here. This is dependent on your tool. For Azure Data Factory (ADF) it is a set of JSON files.
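# As an illustration only (the names and structure below are placeholders, not a validated ADF spec),
# a minimal ADF pipeline definition has roughly this shape:
{
  "name": "DataPrepPipeline",
  "properties": {
    "description": "Copy raw data from the source store to the staging store",
    "activities": [
      {
        "name": "CopyRawData",
        "type": "Copy",
        "inputs": [ { "referenceName": "RawDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "StagedDataset", "type": "DatasetReference" } ]
      }
    ]
  }
}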

--------------------------------------------------------------------------------
/Docs/Data_Dictionaries/data-dictionary-from-sql-table.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/Azure-TDSP-ProjectTemplate/HEAD/Docs/Data_Dictionaries/data-dictionary-from-sql-table.PNG
--------------------------------------------------------------------------------
/Code/Deployment/Readme.md:
--------------------------------------------------------------------------------
# This folder contains code for model deployment

You can add a detailed description related to your specific data science project in this markdown file.
--------------------------------------------------------------------------------
/Docs/Data_Dictionaries/Raw-Data-Dictionary.csv:
--------------------------------------------------------------------------------
Column Index,Column Name,Type of Variable,"Values (range, levels, examples, etc.)",Short Description,Joining Keys with other datasets?
1,,,,,
2,,,,,
3,,,,,
--------------------------------------------------------------------------------
/Docs/Data_Report/DataPipeline.txt:
--------------------------------------------------------------------------------
# Data Pipeline

Describe the data pipeline and provide a logical diagram. List how frequently the data is moved: real time/stream, near real time, or batched at a given frequency.
--------------------------------------------------------------------------------
/Code/Data_Acquisition_and_Understanding/Readme.md:
--------------------------------------------------------------------------------
# This folder hosts code for data acquisition and understanding (exploratory analysis)

You can add a detailed description related to your specific data science project in this markdown file.
--------------------------------------------------------------------------------
/Code/Modeling/Readme.md:
--------------------------------------------------------------------------------
# This folder contains code for modeling and related activities (such as feature engineering, model evaluation, etc.)

You can add a detailed description related to your specific data science project in this markdown file.
--------------------------------------------------------------------------------
/Code/Readme.md:
--------------------------------------------------------------------------------
# Code folder for hosting code for a Data Science Project

This folder hosts all code for a data science project. It has three sub-folders, corresponding to three stages of the Data Science Lifecycle:

1. Data_Acquisition_and_Understanding
2. Modeling
3. Deployment
--------------------------------------------------------------------------------
/Docs/README.md:
--------------------------------------------------------------------------------
# Folder for hosting all documents for a Data Science Project

Documents will contain information about the following:

1. System architecture
2. Data dictionaries
3. Reports related to data understanding and modeling
4. Project management and planning docs
5. Information obtained from a business owner or client about the project
6. Docs and presentations prepared to share information about the project
--------------------------------------------------------------------------------
/Sample_Data/For_Modeling/modelling.md:
--------------------------------------------------------------------------------
# List of feature sets

| Feature Set Name | Link to the Full Feature Set | Full Feature Set Size (MB) | Link to Report |
| ---:| ---: | ---: | ---: |
| Feature Set 1 | [link](link/to/feature/set1) | 2,000 | [Feature Set 1 Report](link/to/report1)|
| Feature Set 2 | [link](link/to/feature/set2) | 300 | [Feature Set 2 Report](link/to/report2)|

If the link to the full dataset does not apply, provide some information on how to access the full dataset.

If the data stays in Azure file storage, please provide the link to a text file, checked in to the git repository, that contains the file storage information.
--------------------------------------------------------------------------------
/Docs/Project/README.md:
--------------------------------------------------------------------------------
# Folder for hosting project documents and reports for a Data Science Project

These could be:

1. Project management and planning docs
2. System architecture
3. Information obtained from a business owner or client about the project
4. Docs and presentations prepared to share information about the project

In this folder, we have templates for the project charter and exit report.

In addition, if you have access to Microsoft Project or Excel, you may use the project templates provided in this [blog](https://blogs.msdn.microsoft.com/buckwoody/2017/10/24/a-data-science-microsoft-project-template-you-can-use-in-your-solutions).
--------------------------------------------------------------------------------
/Sample_Data/Raw/rawData.md:
--------------------------------------------------------------------------------
## List of Raw Datasets

| Raw Dataset Name | Link to the Full Dataset | Full Dataset Size (MB) | Link to Report |
| ---:| ---: | ---: | ---: |
| Raw Dataset 1 | [link](link/to/full/dataset1) | 2,000 | [Raw Dataset 1 Report](link/to/report1)|
| Raw Dataset 2 | [link](link/to/full/dataset2) | 300 | [Raw Dataset 2 Report](link/to/report2)|

If the link to the full dataset does not apply, provide some information on how to access the full dataset.

If the data stays in Azure file storage, please provide the link to a text file, checked in to the git repository, that contains the file storage information.
--------------------------------------------------------------------------------
/Sample_Data/Processed/processed.md:
--------------------------------------------------------------------------------
## List of Processed Datasets

| Processed Dataset Name | Link to the Full Processed Dataset | Full Processed Dataset Size (MB) | Link to Report |
| ---:| ---: | ---: | ---: |
| Processed Dataset 1 | [link](link/to/processed/dataset1) | 2,000 | [Processed Dataset 1 Report](link/to/report1)|
| Processed Dataset 2 | [link](link/to/processed/dataset2) | 300 | [Processed Dataset 2 Report](link/to/report2)|

If the link to the full dataset does not apply, provide some information on how to access the full dataset.

If the data stays in Azure file storage, please provide the link to a text file, checked in to the git repository, that contains the file storage information.
--------------------------------------------------------------------------------
/Docs/Data_Report/DataSummaryReport.md:
--------------------------------------------------------------------------------
# Data Report
This file will be generated for each data file received or processed. The Interactive Data Exploration, Analysis, and Reporting (IDEAR) utility developed by the TDSP team at Microsoft can help you explore and visualize the data in an interactive way, and generate the data report along the way as you explore and visualize.

IDEAR allows you to output into the report the data summary, statistics, and charts that you want to use to tell the data story. You only need to click a few buttons, and the report is generated for you.

## General summary of the data

## Data quality summary

## Target variable

## Individual variables

## Variable ranking

## Relationship between explanatory variables and target variable
--------------------------------------------------------------------------------
/Sample_Data/README.md:
--------------------------------------------------------------------------------
The **Sample_Data** directory in the project git repository is the place to store **SAMPLE** datasets, which should be small in size; it is **NOT** for entire datasets. If your client does not allow you to store even sample data in the GitHub repository, store, if possible, a sample dataset with all confidential fields hashed. If that is still not allowed, please do not store sample data here, but do still fill in the table in each sub-directory.

The small sample datasets can be used to make your data preprocessing, feature engineering, or modeling scripts runnable. They make it easy to quickly run the scripts that process or model the data and to understand what the scripts are doing.

In each sub-directory, there is a markdown file which lists all datasets in that directory. Please provide the link to the full dataset in case one wants to access it.
--------------------------------------------------------------------------------
/Docs/Model/FinalReport.md:
--------------------------------------------------------------------------------
# Final Model Report
_Report describing the final model to be delivered, typically comprising one or more of the models built during the life of the project_

## Analytic Approach
* What is the target definition?
* What are the inputs (description)?
* What kind of model was built?

## Solution Description
* Simple solution architecture (data sources, solution components, data flow)
* What is the output?

## Data
* Source
* Data schema
* Sampling
* Selection (dates, segments)
* Stats (counts)

## Features
* List of raw and derived features
* Importance ranking

## Algorithm
* Description or images of data flow graph
  * If AzureML, link to:
    * Training experiment
    * Scoring workflow
* What learner(s) were used?
* Learner hyper-parameters

## Results
* ROC/Lift charts, AUC, R^2, MAPE as appropriate
* Performance graphs for parameter sweeps, if applicable
--------------------------------------------------------------------------------
/NOTICE.TXT:
--------------------------------------------------------------------------------
## Legal Notices
Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the LICENSE file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation
may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries.
The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.
Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.
--------------------------------------------------------------------------------
/Docs/Data_Dictionaries/ReadMe.md:
--------------------------------------------------------------------------------
# Data Dictionaries
_Place to put data description documents, typically received from a client_

This is typically a field-level description of the data files received.

This document provides the descriptions of the data provided by the client. If the client provides data dictionaries in text (in emails or text files), copy them directly here, or take a snapshot of the text and add it here as an image. If the client provides data dictionaries in Excel worksheets, put the Excel files directly in this directory, and add a link to each Excel file here.

If the client provides the data from a database-like data management system, you can also copy and paste the data schema (snapshot) here. If necessary, also provide a brief description of each column after the snapshot image, in case the image does not include such information.

##

_Example image of data schema when data is from a SQL server_

![](data-dictionary-from-sql-table.PNG)

##

[dataset 2 with dictionary in Excel](./Raw-Data-Dictionary.csv)
--------------------------------------------------------------------------------
/LICENSE-CODE.TXT:
--------------------------------------------------------------------------------
The MIT License (MIT)
Copyright (c) Microsoft Corporation. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/Docs/Model/Model 1/Model Report.md:
--------------------------------------------------------------------------------
# Model Report
_A report to provide details on a specific experiment (model) - possibly one of many_

If applicable, the Automated Modeling and Reporting utility developed by the Microsoft TDSP team can be used to generate reports, which can provide content for most of the sections in this model report.

## Analytic Approach
* What is the target definition?
* What are the inputs (description)?
* What kind of model was built?

## Model Description

* Models and Parameters

* Description or images of data flow graph
  * If AzureML, link to:
    * Training experiment
    * Scoring workflow
* What learner(s) were used?
* Learner hyper-parameters

## Results (Model Performance)
* ROC/Lift charts, AUC, R^2, MAPE as appropriate
* Performance graphs for parameter sweeps, if applicable

## Model Understanding

* Variable Importance (significance)

* Insight Derived from the Model

## Conclusion and Discussions for Next Steps

* Conclusion

* Discussion on overfitting (if applicable)

* What other Features Can Be Generated from the Current Data

* What other Relevant Data Sources Are Available to Help the Modeling
--------------------------------------------------------------------------------
/Docs/Model/Baseline/Baseline Models.md:
--------------------------------------------------------------------------------
# Baseline Model Report

_The baseline model is the model a data scientist trains and evaluates quickly once the first (preliminary) feature set is ready for machine learning modeling. By building the baseline model, the data scientist can make a quick assessment of the feasibility of the machine learning task._

When applicable, the Automated Modeling and Reporting utility developed by the TDSP team at Microsoft can be employed to build baseline models quickly, and the baseline model report can easily be generated from this utility.

> If using the Automated Modeling and Reporting tool, most of the sections below will be generated automatically from this tool.
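
As a minimal illustration of such a quick baseline (assuming a tabular, numeric feature set with a binary target; the file path and column name below are placeholders, not part of this template):

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder sample file; substitute your own preliminary feature set.
df = pd.read_csv("Sample_Data/For_Modeling/features_sample.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Majority-class baseline: the floor any real model must beat.
majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# A first quick learner on the preliminary features.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, model in [("majority-class", majority), ("logistic regression", logit)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name} AUC: {auc:.3f}")
```

If the quick learner barely improves on the majority-class floor, that is an early signal that the current feature set may not support the machine learning task.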
## Analytic Approach
* What is the target definition?
* What are the inputs (description)?
* What kind of model was built?

## Model Description

* Models and Parameters

* Description or images of data flow graph
  * If AzureML, link to:
    * Training experiment
    * Scoring workflow
* What learner(s) were used?
* Learner hyper-parameters

## Results (Model Performance)
* ROC/Lift charts, AUC, R^2, MAPE as appropriate
* Performance graphs for parameter sweeps, if applicable

## Model Understanding

* Variable Importance (significance)

* Insight Derived from the Model

## Conclusion and Discussions for Next Steps

* Conclusion on Feasibility Assessment of the Machine Learning Task

* Discussion on Overfitting (If Applicable)

* What other Features Can Be Generated from the Current Data

* What other Relevant Data Sources Are Available to Help the Modeling
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# TDSP Project Structure, and Documents and Artifact Templates

This is a general project directory structure for the Team Data Science Process developed by Microsoft. It also contains templates for the various documents that are recommended as part of executing a data science project when using TDSP.

[Team Data Science Process (TDSP)](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview) is an agile, iterative data science methodology to improve collaboration and team learning. It is supported through a lifecycle definition, a standard project structure, artifact templates, and [tools](https://github.com/Azure/Azure-TDSP-Utilities) for productive data science.

**NOTE:** In this directory structure, the **Sample_Data folder is NOT supposed to contain LARGE raw or processed data**. It is only supposed to contain **small, sample** data sets, which can be used to test the code.

The two documents under Docs/Project, namely the [Charter](./Docs/Project/Charter.md) and [Exit Report](./Docs/Project/Exit%20Report.md), are particularly important to consider. They help to define the project at the start of an engagement, and provide a final report to the customer or client.

**NOTE:** In some projects, e.g. short-term proof of concept (PoC) or proof of value (PoV) engagements, it can be relatively time consuming to create all the recommended documents and artifacts. In that case, at least the Charter and Exit Report should be created and delivered to the customer or client. As necessary, organizations may modify certain sections of the documents. But it is strongly recommended that the content of the documents be maintained, as they provide important information about the project and its deliverables.
--------------------------------------------------------------------------------
/Docs/Project/Charter.md:
--------------------------------------------------------------------------------
# Project Charter

## Business background

* Who is the client, and what business domain is the client in?
* What business problems are we trying to address?

## Scope

* What data science solutions are we trying to build?
* What will we do?
* How is it going to be consumed by the customer?

## Personnel

* Who is on this project:
  * Microsoft:
    * Project lead
    * PM
    * Data scientist(s)
    * Account manager
  * Client:
    * Data administrator
    * Business contact

## Metrics

* What are the qualitative objectives? (e.g. reduce user churn)
* What is a quantifiable metric? (e.g. reduce the fraction of users with 4-week inactivity)
* Quantify what improvement in the values of the metrics is useful for the customer scenario (e.g. reduce the fraction of users with 4-week inactivity by 20%)
* What is the baseline (current) value of the metric? (e.g. current fraction of users with 4-week inactivity = 60%)
* How will we measure the metric? (e.g. A/B test on a specified subset for a specified period; or comparison of performance after implementation to baseline)

## Plan

* Phases (milestones), timeline, and a short description of what we'll do in each phase.

## Architecture

* Data
  * What data do we expect? Raw data in the customer data sources (e.g. on-prem files, SQL, on-prem Hadoop, etc.)
  * Data movement from on-prem to Azure using ADF or other data movement tools (AzCopy, Event Hubs, etc.) to move either:
    * all the data,
    * the data after some pre-aggregation on-prem, or
    * sampled data that is enough for modeling
* What tools and data storage/analytics resources will be used in the solution, e.g.,
  * ASA for stream aggregation
  * HDI/Hive/R/Python for feature construction, aggregation, and sampling
  * AzureML for modeling and web service operationalization
* How will the scoring or operationalized web service(s) (RRS and/or BES) be consumed in the business workflow of the customer? If applicable, write down pseudo code for the APIs of the web service calls.
* How will the customer use the model results to make decisions?
* Data movement pipeline in production
* Make a one-slide diagram showing the end-to-end data flow and decision architecture
  * If there is a substantial change in the customer's business workflow, make a before/after diagram showing the data flow.

## Communication

* How will we keep in touch? Weekly meetings?
* Who are the contact persons on both sides?
--------------------------------------------------------------------------------
/Docs/Project/Exit Report.md:
--------------------------------------------------------------------------------
# Exit Report of Project for Customer

Instructions: Template for exit criteria for data science projects. This is a concise document that includes an overview of the entire project, including details of each stage and learnings. If a section isn't applicable (e.g. the project didn't include an ML model), simply mark that section as "Not applicable". Suggested length is 5 to 20 pages. Code should mostly reside in the code repository (not in this document).

Customer:

Team Members:

## Overview


## Business Domain


## Business Problem


## Data Processing


## Modeling, Validation


## Solution Architecture


## Benefits

### Company Benefit (internal only. Double check if you want to share this with your customer)


### Customer Benefit
What is the benefit (ROI, savings, productivity gains, etc.) for the customer? If just a PoC, what is the estimated ROI?
If exact metrics are not available, why does it have an impact for the customer?

## Learnings

### Project Execution


### Data Science / Engineering


### Domain


### Product


### What's unique about this project, specific challenges


## Links


## Next Steps


## Appendix
--------------------------------------------------------------------------------
/Docs/Data_Report/Data Defintion.md:
--------------------------------------------------------------------------------
# Data and Feature Definitions

This document provides a central hub for the raw data sources, the processed/transformed data, and the feature sets. More details on each dataset are provided in the data summary report.

For each dataset, an individual report is provided describing the data schema, the meaning of each data field, and other information that is helpful for understanding the data. If a dataset is the output of processing/transforming/feature engineering existing data set(s), the names of the input data sets and links to the scripts used to conduct the operation are also provided.

When applicable, the Interactive Data Exploration, Analysis, and Reporting (IDEAR) utility developed by Microsoft is applied to explore and visualize the data, and to generate the data report. Instructions on how to use IDEAR can be found [here]().

For each dataset, the links to the sample datasets in the _**Data**_ directory are also provided.

_**For ease of modifying this report, placeholder links are included in this page, for example a link to dataset 1, but they are just placeholders pointing to a non-existent page. These should be modified to point to the actual locations.**_

## Raw Data Sources

| Dataset Name | Original Location | Destination Location | Data Movement Tools / Scripts | Link to Report |
| ---:| ---: | ---: | ---: | -----: |
| Dataset 1 | Brief description of its original location | Brief description of its destination location | [script1.py](link/to/python/script/file/in/Code) | [Dataset 1 Report](link/to/report1)|
| Dataset 2 | Brief description of its original location | Brief description of its destination location | [script2.R](link/to/R/script/file/in/Code) | [Dataset 2 Report](link/to/report2)|

* Dataset 1 summary.
* Dataset 2 summary.

## Processed Data

| Processed Dataset Name | Input Dataset(s) | Data Processing Tools/Scripts | Link to Report |
| ---:| ---: | ---: | ---: |
| Processed Dataset 1 | [Dataset1](link/to/dataset1/report), [Dataset2](link/to/dataset2/report) | [Python_Script1.py](link/to/python/script/file/in/Code) | [Processed Dataset 1 Report](link/to/report1)|
| Processed Dataset 2 | [Dataset2](link/to/dataset2/report) | [script2.R](link/to/R/script/file/in/Code) | [Processed Dataset 2 Report](link/to/report2)|

* Processed Dataset 1 summary.
* Processed Dataset 2 summary.
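
As a minimal sketch of the kind of processing script the table above links to (the paths and column names here are placeholders, not part of this template):

```python
import numpy as np
import pandas as pd

# Placeholder input; in practice this matches a raw dataset location listed above.
raw = pd.read_csv("Sample_Data/Raw/raw_sample.csv")

# Example transformations: drop incomplete rows and derive a log-scaled feature.
processed = raw.dropna()
processed["amount_log"] = np.log1p(processed["amount"].clip(lower=0))

# Write the processed output to the destination recorded in the table above.
processed.to_csv("Sample_Data/Processed/processed_sample.csv", index=False)
```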

## Feature Sets

| Feature Set Name | Input Dataset(s) | Feature Engineering Tools/Scripts | Link to Report |
| ---:| ---: | ---: | ---: |
| Feature Set 1 | [Dataset1](link/to/dataset1/report), [Processed Dataset2](link/to/dataset2/report) | [R_Script2.R](link/to/R/script/file/in/Code) | [Feature Set 1 Report](link/to/report1)|
| Feature Set 2 | [Processed Dataset2](link/to/dataset2/report) | [SQL_Script2.sql](link/to/sql/script/file/in/Code) | [Feature Set 2 Report](link/to/report2)|

* Feature Set 1 summary.
* Feature Set 2 summary.
--------------------------------------------------------------------------------