├── .gitignore ├── .gitmodules ├── 1807.05351.pdf ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── ML Schema Core Specification.pdf ├── README.md ├── _prior_art ├── README.md ├── kubeflow │ ├── Artifact.proto │ ├── ArtifactConnection.proto │ ├── Data.proto │ ├── Executable.proto │ ├── Framework.proto │ ├── Metrics.proto │ ├── Model.proto │ ├── Project.proto │ ├── README.md │ ├── Run.proto │ ├── TimeRange.proto │ └── UUID.proto ├── mlflow │ └── README.md ├── modeldb │ └── README.md ├── pachyderm │ └── README.md └── seldon │ └── README.md ├── common └── object.md ├── data ├── artifact.md ├── datapath.md ├── dataset.md ├── dataset.yml ├── datastore.md └── readme.md ├── docs ├── archive │ └── README-old.md └── assets │ └── logos │ ├── mlspec_logo.png │ └── mlspec_logo_light.png ├── experiment_tracking ├── README.md ├── experiment_example.yml └── run.md ├── logging_proto ├── README.md └── inferenceLog.yml ├── metadata_file ├── README.md └── metadata.yaml ├── model_packaging ├── README.md ├── data.yaml ├── model.yaml ├── model_example.yml ├── model_onnx_conversion.yaml ├── model_packaging.md ├── model_packaging.yaml ├── model_scoring.yaml ├── model_serving.yaml └── model_training.yaml ├── monitoring_proto ├── README.md └── inferenceRequest.yml └── pipelines ├── module.md ├── pipeline.md └── pipeline.yml /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "_prior_art/_prior_art/tf_metadata"] 2 | path = _prior_art/_prior_art/tf_metadata 3 | url = https://github.com/tensorflow/metadata/ 4 | -------------------------------------------------------------------------------- /1807.05351.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/1807.05351.pdf -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | We have adopted the [Contributor Covenant](https://www.contributor-covenant.org/version/2/0/code_of_conduct/) as our Code of Conduct. 4 | 5 | Please read the full text at [https://www.contributor-covenant.org/version/2/0/code_of_conduct/](https://www.contributor-covenant.org/version/2/0/code_of_conduct/). 6 | 7 | For any questions or concerns, please contact the project maintainers. 8 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to ML Spec 2 | 3 | Thank you for your interest in contributing to ML Spec! We welcome contributions from the community to help improve and advance the project. This document outlines the guidelines and best practices for contributing to ML Spec. 4 | 5 | ## Code of Conduct 6 | 7 | We have adopted the [Contributor Covenant](https://www.contributor-covenant.org/version/2/0/code_of_conduct/) as our Code of Conduct. By participating in this project, you agree to abide by its terms. Please read the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file for more information. 
8 | 9 | ## Ways to Contribute 10 | 11 | There are several ways you can contribute to ML Spec: 12 | 13 | - Reporting bugs and issues 14 | - Suggesting enhancements and new features 15 | - Improving documentation 16 | - Submitting pull requests with bug fixes or new features 17 | - Providing feedback and suggestions for improvement 18 | 19 | ## Getting Started 20 | 21 | 1. Fork the ML Spec repository on GitHub. 22 | 2. Clone your forked repository to your local machine. 23 | 3. Create a new branch for your contribution. 24 | 4. Make your changes and commit them with descriptive messages. 25 | 5. Push your changes to your forked repository. 26 | 6. Submit a pull request to the main ML Spec repository. 27 | 28 | ## Pull Request Guidelines 29 | 30 | When submitting a pull request, please ensure the following: 31 | 32 | - Provide a clear and descriptive title. 33 | - Include a detailed description of the changes made and the problem they solve. 34 | - Reference any relevant issues or pull requests. 35 | - Ensure your code follows the project's coding conventions and style guide. 36 | - Include tests for any new functionality or bug fixes. 37 | - Update the documentation if necessary. 38 | 39 | ## Issue Reporting 40 | 41 | If you encounter a bug or have a feature request, please submit an issue on the GitHub issue tracker. When reporting an issue, please provide the following information: 42 | 43 | - A clear and descriptive title. 44 | - A detailed description of the issue or feature request. 45 | - Steps to reproduce the issue (if applicable). 46 | - Any relevant error messages or logs. 47 | - Your operating system and version. 48 | - Any other relevant information. 49 | 50 | ## Communication 51 | 52 | If you have any questions, suggestions, or need clarification, you can reach out to the maintainers and the community through the following channels: 53 | 54 | - GitHub Issues: For bug reports, feature requests, and general discussions related to the project. 55 | - ML Spec Mailing List: For general discussions, announcements, and community engagement. 56 | - ML Spec Slack Channel: For real-time communication and collaboration with the community. 57 | 58 | ## License 59 | 60 | By contributing to ML Spec, you agree that your contributions will be licensed under the [Apache License 2.0](https://github.com/mlspec/MLSpec/blob/master/LICENSE). 61 | 62 | Thank you for your contributions and helping to make ML Spec better! 63 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 
134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 
193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /ML Schema Core Specification.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/ML Schema Core Specification.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![MLSpec Logo](./docs/assets/logos/mlspec_logo.png#gh-light-mode-only) 2 | ![MLSpec Logo](./docs/assets/logos/mlspec_logo_light.png#gh-dark-mode-only) 3 | 4 | MLSpec is an open source framework for defining and verifying machine learning (ML) workflows. The project provides a standardized schema and libraries to specify and validate the various stages of an ML pipeline, from data preprocessing to model training, evaluation, and deployment. 5 | 6 | ## Table of Contents 7 | 8 | - [Background](#background) 9 | - [Foundational Work](#foundational-work) 10 | - [Existing Multi-Stage ML Workflows](#existing-multi-stage-ml-workflows) 11 | - [Specification](#specification) 12 | - [MLSpec Standards](#mlspec-standards) 13 | - [End-to-End Complete Lifecycle](#end-to-end-complete-lifecycle) 14 | - [Repository Structure](#repository-structure) 15 | - [Vision and Direction](#vision-and-direction) 16 | - [Enhancing Model Interpretability and Trust](#enhancing-model-interpretability-and-trust) 17 | - [Roadmap](#roadmap) 18 | - [Contributing](#contributing) 19 | - [Code of Conduct](#code-of-conduct) 20 | - [Acknowledgments](#acknowledgments) 21 | [Help Shape The Future of MLSpec!](#help-shape-the-future-of-mlspec) 22 | 23 | ## Background 24 | 25 | The field of machine learning has seen significant advancements in recent years, with the development of various frameworks, tools, and platforms to support the ML lifecycle. However, the lack of standardization and interoperability among these tools has led to challenges in reproducing, sharing, and governing ML workflows across different environments and organizations. 26 | 27 | ### Foundational Work 28 | 29 | MLSpec builds upon the ideas and concepts from the foundational work in the field of ML workflow specification and verification. Some notable contributions include: 30 | 31 | - [Predictive Model Markup Language (PMML)](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) 32 | - [Portable Format for Analytics (PFA)](https://dmg.org/pfa/) 33 | - [ML Metadata](https://www.tensorflow.org/tfx/guide/mlmd) 34 | 35 | These projects have paved the way for standardizing ML model serialization and metadata, but they primarily focus on individual models rather than end-to-end workflows. 36 | 37 | MLSpec also draws inspiration from the MLSchema project, which has made significant contributions to the field of ML workflow specification. 
Due to the instability of the original MLSchema website, we have mirrored some of their key resources here: 38 | 39 | - [MLSchema Paper](https://github.com/mlspec/MLSpec/blob/master/1807.05351.pdf) 40 | - [MLSchema Core Specification](https://github.com/mlspec/MLSpec/blob/master/ML%20Schema%20Core%20Specification.pdf) 41 | 42 | These resources provide valuable insights into the design principles and approaches behind ML workflow specification and have influenced the development of MLSpec. 43 | 44 | ### Existing Multi-Stage ML Workflows 45 | 46 | Several prominent companies and organizations have developed their own multi-stage ML workflow solutions to address the challenges of managing end-to-end machine learning pipelines. These projects have focused on combining ML and batch processing capabilities to create robust and scalable workflows. 47 | 48 | Some notable examples of these multi-stage ML workflow solutions include: 49 | 50 | - [Facebook’s FBLearner Flow](https://code.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/): FBLearner Flow is Facebook's internal ML platform that enables engineers and data scientists to build, train, and deploy ML models at scale. It provides a unified interface for managing the entire ML lifecycle, from data preparation to model serving. 51 | 52 | - [Google's TFX:](https://dl.acm.org/citation.cfm?id=3098021) TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. It provides a suite of components and libraries for building, training, and serving ML models, with a focus on scalability and reproducibility. 53 | 54 | - [Kubeflow Pipelines](https://cloud.google.com/blog/products/ai-machine-learning/getting-started-kubeflow-pipelines): Kubeflow Pipelines is an open-source platform for building and deploying portable, scalable ML workflows on Kubernetes. It allows users to define and orchestrate complex ML pipelines using a declarative approach. 55 | 56 | - [Microsoft Azure ML Pipelines](https://learn.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines?view=azureml-api-2): Azure ML Pipelines is a service within the Azure Machine Learning platform that enables the creation and management of end-to-end ML workflows. It provides a visual interface and SDK for building, scheduling, and monitoring ML pipelines. 57 | 58 | - [Netflix Meson](https://netflixtechblog.com/meson-workflow-orchestration-for-netflix-recommendations-fc932625c1d9): Meson is Netflix's internal platform for building and managing ML workflows. It provides a unified framework for data preparation, model training, and deployment, with a focus on scalability and ease of use. 59 | 60 | - [Spotify's Luigi](https://github.com/spotify/luigi): Luigi is an open-source Python library developed by Spotify for building and scheduling complex pipelines of batch jobs. While not specifically designed for ML workflows, it has been widely adopted in the ML community for managing data processing and model training pipelines. 61 | 62 | - [Uber's Michelangelo](https://www.uber.com/blog/michelangelo-machine-learning-platform/): Michelangelo is Uber's ML platform that powers a wide range of services, from fraud detection to customer support. It provides end-to-end functionality for building, training, and deploying ML models at scale. 63 | 64 | From studying these projects and their associated papers, we have identified a set of common steps that encompass the end-to-end machine learning workflow. 
These steps include data ingestion, data preparation, feature engineering, model training, model evaluation, model deployment, and monitoring. 65 | 66 | MLSpec aims to provide a standardized specification and framework for defining and executing these multi-stage ML workflows, drawing inspiration from the successful approaches and best practices established by these industry leaders. 67 | 68 | # Specification 69 | 70 | ## MLSpec Standards 71 | 72 | MLSpec aims to establish a set of standards for various components involved in an end-to-end machine learning workflow. By defining these standards, MLSpec seeks to promote interoperability, reproducibility, and best practices across different ML tools and platforms. 73 | 74 | The standards cover the following key areas: 75 | 76 | 1. **Workflow Orchestration**: MLSpec will define a standard set of endpoints that each step in an ML workflow should expose. These endpoints may include: 77 | - `/ok`: An endpoint to check the health and readiness of the step. 78 | - `/varz`: An endpoint to retrieve various runtime variables and configurations. 79 | - `/metrics`: An endpoint to expose performance metrics and monitoring data. 80 | - Additional endpoints for step-specific functionality and control. 81 | 82 | By standardizing these endpoints, MLSpec enables consistent monitoring, control, and integration of ML workflow steps across different orchestration platforms. A minimal sketch of such a step appears after this list. 83 | 84 | 2. **Model Management**: MLSpec will provide guidelines and standards for versioning, packaging, and deploying ML models. This may include: 85 | - Model serialization formats and conventions. 86 | - Metadata schemas for capturing model provenance, hyperparameters, and performance metrics. 87 | - APIs for model serving and inference. 88 | - Best practices for model versioning and lineage tracking. 89 | 90 | Standardizing model management practices promotes model reproducibility, interpretability, and maintainability. 91 | 92 | 3. **Logging**: MLSpec will define a standard logging format for capturing relevant information about each inference request. This logging format will align with the NCSA (National Center for Supercomputing Applications) Common Log Format, which includes fields such as: 93 | - Timestamp 94 | - Request ID 95 | - User ID 96 | - Model version 97 | - Input data 98 | - Output predictions 99 | - Latency 100 | - Additional metadata 101 | 102 | Adopting a standardized logging format enables consistent monitoring, debugging, and analysis of ML system behavior. 103 | 104 | 4. **Data Validation and Quality**: MLSpec will provide guidelines and standards for validating and ensuring the quality of input data at various stages of the ML workflow. This may include: 105 | - Data schema validation. 106 | - Data quality checks (e.g., missing values, outliers, data drift). 107 | - Data versioning and lineage tracking. 108 | - Integration with data validation frameworks and tools. 109 | 110 | Ensuring data quality and validation helps maintain the integrity and reliability of ML workflows. 111 | 112 | 5. **Experiment Tracking**: MLSpec will define standards for tracking and managing ML experiments, including: 113 | - Experiment metadata schemas. 114 | - APIs for logging and querying experiment runs. 115 | - Best practices for organizing and versioning experiment artifacts. 116 | - Integration with popular experiment tracking frameworks. 117 | 118 | Standardizing experiment tracking enables reproducibility, comparison, and analysis of ML experiments.
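As a concrete illustration of the workflow-orchestration endpoints above, the following is a minimal sketch of a single pipeline step exposing `/ok`, `/varz`, and `/metrics` over HTTP. Only the endpoint names come from the standard described in item 1; the step name, variables, and JSON payload shapes shown here are illustrative assumptions, not part of the specification.

```python
# Minimal sketch of an MLSpec-style workflow step, using only the Python
# standard library. Endpoint names (/ok, /varz, /metrics) follow the text
# above; payload contents are illustrative assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical runtime state for a single pipeline step.
STEP_VARS = {"step_name": "data-validation", "model_version": "0.3.1"}
STEP_METRICS = {"requests_total": 0, "last_latency_ms": None}


class StepHandler(BaseHTTPRequestHandler):
    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/ok":          # health and readiness of the step
            self._reply(200, {"status": "ok"})
        elif self.path == "/varz":      # runtime variables and configuration
            self._reply(200, STEP_VARS)
        elif self.path == "/metrics":   # performance and monitoring data
            self._reply(200, STEP_METRICS)
        else:
            self._reply(404, {"error": "unknown endpoint"})


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), StepHandler).serve_forever()
```

With a step wired up this way, any orchestrator can probe it uniformly (e.g., `curl http://localhost:8080/ok`), regardless of which ML framework the step itself uses.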
119 | 120 | By establishing standards across these critical components of the ML workflow, MLSpec aims to foster a more robust, interoperable, and governed ecosystem for machine learning development and deployment. 121 | 122 | ## End-to-End Complete Lifecycle 123 | 124 | With MLSpec, we believe that every stage of an ML lifecycle requires some form of metadata management. We have identified the following steps as critical components of a complete ML lifecycle: 125 | 126 | 1. **Codify Objectives**: Detail the model outputs, possible errors, and minimum success criteria for launching in code. Use a simple DSL that can be used to verify success/failure programmatically for automated deployment. 127 | 128 | 2. **Data Ingestion**: Specify the tools/connectors (e.g., ODBC, Spark, HDFS, CSV, etc.) used for pulling in data, along with the queries used (including signed datasets), sharding strategies, and any labeling or synthetic data generation/simulation techniques. 129 | 130 | 3. **Data Analysis**: Provide a set of descriptive statistics on the included features and configurable slices of the data. Identify outliers. 131 | 132 | 4. **Data Transformation**: Document the data conversions and feature wrangling techniques (e.g., feature to-integer mappings) used, as well as any outliers that were programmatically eliminated. 133 | 134 | 5. **Data Validation**: Apply validation to the data based on a versioned, succinct description of the expected properties of the data. Use schemas to prevent bad behavior, such as training on deprecated data. Provide mechanisms to generate the first version of the schema (e.g., `select * from foo limit 30`) that can be used to drive other platform components, such as automatic feature-engineering or data-analysis tools. 135 | 136 | 6. **Data Splitting (including partitioning)**: Record how the data is split into training, validation, hold back, and debugging sets, along with the results of validation for statistics of each set. Use metadata to detect leakage of training data into testing data and/or overfitting. 137 | 138 | 7. **Model Training/Tuning**: Capture metadata about how the model is packaged and the distribution strategy, hyperparameters searched and results of the search, results of any conversions to other model serving formats (e.g., TF -> ONNX), and techniques used to quantize/compress/prune the model and the results. 139 | 140 | 8. **Model Evaluation/Validation**: Record the results of evaluation and validation of models to ensure they meet the original codified objectives before serving them to users. Compute metrics on slices of data, both for improving performance and avoiding bias (e.g., gender A gets significantly better results than gender B). Document the source of data used for validation. 141 | 142 | 9. **Test**: Record the results of final confirmation for the model on the hold back data set. This MUST BE A SEPARATE STEP FROM #8. Document the source of data used for the final test. 143 | 144 | 10. **Model Packaging**: Capture metadata about the model package, including additional security constraints, monitoring agents, signing, etc. Provide descriptions of the necessary infrastructure (e.g., P100s, 16 GB of RAM, etc.). 145 | 146 | 11. **Serving**: Record the results of rolling the model out to production. 147 | 148 | 12. 
**Monitoring**: Provide live queryable metadata that enables liveness checking and ML-specific signals that need action, such as significant deviation from previous model performance or degradation of the model performance over time. Include a rollback strategy (e.g., if this model is failing, use model `last-year.last-month.pkl`). 149 | 150 | 13. **Logging**: Generate an NCSA-style record per inference request, including a cryptographically secure record of the version of the pipeline (including features) and data used to train. 151 | 152 | By capturing and managing metadata at each stage of the ML lifecycle, MLSpec aims to provide a comprehensive and standardized approach to ensuring the reproducibility, interpretability, and governance of end-to-end machine learning workflows. 153 | 154 | ## Repository Structure 155 | 156 | - [common](./common) 157 | 158 | - [object](./common/object.md) 159 | 160 | General notes applicable to multiple objects in the system. How they are identified and named, basic operations, etc. 161 | 162 | - [data](./data) 163 | 164 | - [datastore](./data/datastore.md) 165 | 166 | Data stores 167 | 168 | - [datapath](./data/datapath.md) 169 | 170 | Data references 171 | 172 | - [artifact](./data/artifact.md) 173 | 174 | Data produced by runs 175 | 176 | - [dataset](./data/dataset.md) 177 | 178 | Named and versioned data in storage 179 | 180 | - [pipelines](./pipelines) 181 | 182 | - [pipeline](./pipelines/pipeline.md) 183 | 184 | DAG for executing computation on data and training and deploying models 185 | 186 | - [module](./pipelines/module.md) 187 | 188 | Reusable definition of a computation, including its script, expected inputs, outputs, etc. 189 | 190 | - [experiment_tracking](./experiment_tracking) 191 | 192 | - [run](./experiment_tracking/run.md) 193 | 194 | Tracked execution of a pipeline or single script on compute 195 | 196 | - [model_packaging](./model_packaging) 197 | 198 | - [models](./model_packaging/README.md) 199 | 200 | Trained models 201 | 202 | - logging_proto 203 | 204 | - monitoring_proto 205 | 206 | - [metadata_file](./metadata_file) 207 | 208 | - [metadata](./metadata_file/metadata.yaml) 209 | 210 | The metadata file used to recreate the ML workflow 211 | 212 | ## Vision and Direction 213 | 214 | Our vision for MLSpec is to establish it as a robust and widely adopted framework for defining, standardizing, and verifying complex ML workflows. We aim to: 215 | 216 | 1. **Enhance Framework Support**: Extend MLSpec to support the latest ML frameworks and libraries, such as PyTorch, XGBoost, and LightGBM, enabling seamless integration with cutting-edge techniques and architectures. 217 | 218 | 2. **Accommodate Complex Workflows**: Expand the MLSpec schema to accommodate intricate, multi-stage ML pipelines, including data preprocessing, feature engineering, model training, evaluation, and deployment. 219 | 220 | 3. **Integrate with MLOps and AutoML**: Align MLSpec with modern MLOps practices and AutoML frameworks, enabling streamlined workflow management and automation. 221 | 222 | 4. **Improve Governance and Compliance**: Introduce a methodology for recording attestations of workflow execution in accordance with the schema to support governance and compliance requirements. 223 | 224 | 5. **Foster Community Engagement**: Revitalize the project's community by improving documentation, providing clear contributing guidelines, and actively engaging with users and contributors. 225 | 226 | 6.
**Integrate with Workflow Orchestration Tools**: Provide seamless integration with popular workflow management platforms, such as Apache Airflow, Flyte, and Prefect, allowing users to leverage MLSpec for defining and verifying ML workflows within their existing orchestration pipelines. 227 | 228 | ## Enhancing Model Interpretability and Trust 229 | 230 | One of the key challenges in the adoption and deployment of machine learning models is their interpretability and the trust users place in them. Many ML models are often considered "black boxes," making it difficult for users to understand how they arrive at their predictions or decisions. This lack of transparency can lead to a lack of trust and hesitation in relying on these models, especially in critical domains such as healthcare, finance, and legal systems. 231 | 232 | MLSpec aims to address this challenge by providing a framework for building interpretable and transparent ML workflows. By standardizing the end-to-end ML lifecycle and promoting best practices for model development, evaluation, and deployment, MLSpec enables users to gain insights into the behavior and decision-making process of ML models. 233 | 234 | Some of the ways MLSpec promotes model interpretability and trust include: 235 | 236 | 1. **Standardized Model Metadata**: MLSpec defines a standard schema for capturing and storing metadata about ML models, including information about their architecture, training data, hyperparameters, and performance metrics. This metadata provides a clear and comprehensive view of the model's characteristics and behavior. 237 | 238 | 2. **Model Evaluation and Validation**: MLSpec emphasizes rigorous model evaluation and validation practices to ensure that models meet the desired performance criteria and are free from biases or unintended consequences. By standardizing evaluation metrics and techniques, MLSpec enables users to assess the reliability and trustworthiness of models objectively. 239 | 240 | 3. **Model Explainability Techniques**: MLSpec encourages the use of model explainability techniques, such as feature importance analysis, partial dependence plots, and counterfactual explanations, to provide insights into how models make predictions. These techniques help users understand the factors influencing model decisions and identify potential issues or biases. 241 | 242 | 4. **Governance and Auditing**: MLSpec includes mechanisms for model governance and auditing, allowing organizations to track and monitor the lifecycle of ML models. This includes capturing information about model lineage, versioning, and approvals, ensuring that models adhere to regulatory requirements and ethical standards. 243 | 244 | By focusing on model interpretability and trust, MLSpec aims to foster the responsible development and deployment of ML models. It provides the necessary tools and guidelines to build transparent and accountable ML workflows, enabling users to have confidence in the models they use and the decisions they make. 245 | 246 | Join us in our mission to create a more interpretable and trustworthy ML ecosystem with MLSpec! 247 | 248 | ## Roadmap 249 | 250 | Our short-term goals (next 3-6 months) include: 251 | 252 | - [ ] Refactor the library codebase to improve maintainability and extensibility. 253 | - [ ] Add support for PyTorch and XGBoost frameworks. 254 | - [ ] Enhance the schema to accommodate complex, multi-stage workflows. 255 | - [ ] Implement workflow attestation and digital signing capabilities. 
256 | - [ ] Overhaul the documentation and contributing guidelines. 257 | - [ ] Develop an Apache Airflow operator/plugin for integrating MLSpec-defined workflows. 258 | 259 | Our medium-term goals (6-12 months) include: 260 | 261 | - [ ] Integrate with popular MLOps platforms and AutoML frameworks. 262 | - [ ] Develop tools and dashboards for governance and compliance reporting. 263 | - [ ] Collaborate with the Apache Airflow, Kubeflow, Prefect, Argo Workflows, MLflow, and Flyte communities to develop seamless integrations for defining and verifying ML workflows within their respective orchestration platforms. 264 | - [ ] Establish industry partnerships and collaborations to promote adoption. 265 | - [ ] Foster an active community of contributors and users. 266 | 267 | We welcome contributions from the community to refine and expand this roadmap. 268 | 269 | ## Contributing 270 | 271 | We are excited to have you on board! Please refer to the [CONTRIBUTING.md](CONTRIBUTING.md) file for detailed guidelines on how to get involved, whether it's by reporting issues, submitting pull requests, or participating in discussions. 272 | 273 | ## Code of Conduct 274 | 275 | To ensure a welcoming and inclusive environment, we have adopted the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md). Please review and adhere to these guidelines. 276 | 277 | ## Acknowledgments 278 | 279 | We would like to express our gratitude to David Aronchick (@aronchick) and the authors and contributors of MLSpec for their pioneering work on this project. Their efforts have laid the foundation for what we aim to achieve. 280 | 281 | ## Help Shape The Future of MLSpec! 282 | 283 | We invite you to be a part of the MLSpec journey. Try out the new developments, provide feedback, and contribute your ideas and code. Together, we can shape the future of standardized and verifiable ML workflows. 284 | 285 | For discussions and updates, stay tuned for our upcoming mailing list and social accounts. 286 | 287 | We believe MLSpec has the potential to become a foundational tool for building reliable and governed ML pipelines, and we look forward to working with the community to realize this vision. 288 | -------------------------------------------------------------------------------- /_prior_art/README.md: -------------------------------------------------------------------------------- 1 | **Example Repo** 2 | 3 | Examples of metadata used by other projects. 4 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/Artifact.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information.
26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/ArtifactConnection.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 
15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/Data.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 
6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/Executable.proto: 
-------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 
| google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/Framework.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 
5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/Metrics.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID 
projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/Model.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 
| string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/Project.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 
26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/_prior_art/kubeflow/README.md -------------------------------------------------------------------------------- /_prior_art/kubeflow/Run.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 
6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/TimeRange.proto: 
-------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 | string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 
| google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/kubeflow/UUID.proto: -------------------------------------------------------------------------------- 1 | /* 2 | Copyright 2019 The Kubeflow Authors. 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | */ 16 | 17 | syntax = "proto3"; 18 | 19 | package protobuf; 20 | 21 | import "google/protobuf/timestamp.proto"; 22 | import "google/protobuf/struct.proto"; 23 | 24 | // Instead of have project_uuid, run_uuid and etc, we have uuid with type 25 | // information. 26 | message UUID { 27 | string value = 1; 28 | Type type = 2; 29 | enum Type { 30 | UNKNOWN = 0; 31 | PROJECT = 1; 32 | RUN = 2; 33 | ARTIFACT_CONNECTION = 3; 34 | DATA_METADATA = 4; 35 | MODEL_METADATA = 5; 36 | EXECUTABLE_METADATA = 6; 37 | METRICS = 7; 38 | } 39 | } 40 | 41 | message Project { 42 | UUID id = 1; 43 | string name = 2; 44 | string description = 3; 45 | repeated UUID runs = 4; 46 | repeated UUID artifacts = 5; 47 | map annotations = 6; 48 | } 49 | 50 | message Run { 51 | UUID id = 1; 52 | string name = 2; 53 | string description = 3; 54 | UUID project = 4; 55 | repeated UUID artifacts = 5; 56 | map annotations = 6; 57 | } 58 | 59 | message ArtifactConnection { 60 | UUID id = 1; 61 | UUID first_artifact = 2; 62 | UUID second_artifact = 3; 63 | UUID run = 4; 64 | UUID project = 5; 65 | } 66 | 67 | message ArtifactMetadata { 68 | oneof metadata { 69 | DataMetadata data_metadata = 1; 70 | ExecutableMetadata executable_metadata = 2; 71 | ModelMetadata model_metadata = 3; 72 | } 73 | } 74 | 75 | message DataMetadata { 76 | UUID id = 1; 77 | string name = 2; 78 | string description = 3; 79 | string source = 4; 80 | string query = 5; 81 | string version = 6; 82 | google.protobuf.Timestamp ingestTime = 7; 83 | TimeRange timerange = 8; 84 | repeated UUID runs = 9; 85 | repeated UUID projects = 10; 86 | map annotations = 11; 87 | repeated UUID jobs = 12; 88 | } 89 | 90 | message ModelMetadata { 91 | UUID id = 1; 92 | string name = 2; 93 | string description = 3; 94 | string kind = 4; 95 | string version = 5; 96 | repeated string tags = 15; 97 | map hyperparameters = 6; 98 | Framework framework = 7; 99 | string storage_location = 8; 100 | google.protobuf.Timestamp create_ts = 14; 101 | repeated UUID metrics_ids = 9; 102 | UUID run = 10; 103 | UUID project = 11; 104 | map annotations = 12; 105 | repeated UUID jobs = 13; 106 | } 107 | 108 | message ExecutableMetadata { 109 | UUID id = 1; 110 | string name = 2; 111 | string description = 3; 112 | string repository = 4; 113 | string version = 5; 114 | repeated string tags = 6; 115 | google.protobuf.Timestamp create_ts = 7; 116 | repeated UUID runs = 8; 117 | repeated UUID projects = 9; 118 | map annotations = 10; 119 | repeated UUID jobs = 11; 120 | } 121 | 122 | message Metrics { 123 | UUID id = 1; 124 | UUID model = 2; 125 | UUID data = 3; 126 | UUID job = 4; 127 | string kind = 5; 128 
| string description = 8; 129 | map values = 6; 130 | map annotations = 7; 131 | } 132 | 133 | message Framework { 134 | string name = 1; 135 | string version = 2; 136 | } 137 | 138 | message TimeRange { 139 | google.protobuf.Timestamp start = 1; 140 | google.protobuf.Timestamp end = 2; 141 | } 142 | -------------------------------------------------------------------------------- /_prior_art/mlflow/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/_prior_art/mlflow/README.md -------------------------------------------------------------------------------- /_prior_art/modeldb/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/_prior_art/modeldb/README.md -------------------------------------------------------------------------------- /_prior_art/pachyderm/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/_prior_art/pachyderm/README.md -------------------------------------------------------------------------------- /_prior_art/seldon/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/_prior_art/seldon/README.md -------------------------------------------------------------------------------- /common/object.md: -------------------------------------------------------------------------------- 1 | # Object 2 | 3 | All system-managed objects generally have the following attributes and actions. 4 | 5 | ### Attributes 6 | 7 | - **Id** 8 | 9 | [*String, Required*] Id is a system-generated string, immutable and uniquely identifying an object. The system must ensure its uniqueness within a given workspace over the lifetime of the workspace. Usually [UUIDs (GUIDs)](https://en.wikipedia.org/wiki/Universally_unique_identifier) are used for this. 10 | 11 | Example: `123e4567-e89b-12d3-a456-426655440000` 12 | 13 | The maximum length is 40 characters. 14 | 15 | Note: 40 characters is sufficient to represent either UUIDs or SHA-1 hashes as human-readable strings. 16 | 17 | - **Name** 18 | 19 | [*String, Required*] 20 | 21 | Name is a string assigned by the user, uniquely identifying an object. It's mutable, and can be changed by the user as long as it stays unique within a specific scope at any moment in time among non-archived objects. The scope is usually the workspace, but in some cases uniqueness is enforced within a smaller scope; for example, artifact names are unique only within a run. 22 | 23 | Additional criteria: 24 | - It should consist of alphanumeric characters, hyphens (-), and underscores (_). 25 | - It should start with an alphanumeric character. 26 | - It should not contain spaces or any other special characters. 27 | - It should be case-insensitive for uniqueness checks. 28 | - The maximum length is 255 characters. 29 | 30 | Examples of valid names: 31 | - `my-dataset` 32 | - `model_v1` 33 | - `preprocessing_pipeline_2` 34 | 35 | - **Status** 36 | 37 | [*Enum, Required*] 38 | 39 | Allowed values: Active, Deprecated, Archived 40 | 41 | Not all statuses are applicable to every object type. 42 | 43 | - An *Active* object is visible in default lists, can be retrieved by name or id, and can be used in the system. 44 | - A *Deprecated* object is not visible in default lists, can be retrieved by name or id, and can be used in the system. The user might be warned that this object is deprecated and advised to use some other object instead. 45 | - An *Archived* object is not visible in default lists, can't be retrieved by name, and can't be used to execute new jobs. It can be retrieved by Id, and most of its metadata remains readable for the purpose of exploring the history of past executions. 46 | 47 | - **Description** 48 | 49 | [*String, Optional*]. 50 | 51 | An arbitrary text description for the given object. 52 | 53 | The maximum length is 32K characters. 54 | 55 | 56 | 57 | ### Actions 58 | 59 | - Create 60 | 61 | - Get 62 | 63 | - Archive 64 | 65 | The object is not deleted. Its name can be reused for some other object. 66 | 67 | - Rename 68 | 69 | 70 | 71 | ## Deletion 72 | 73 | Hard deletion is unsupported for system-managed objects. Objects can be archived to free up their name and hide them from default lists or the UX. However, objects are preserved so that users can later explore the history of their ML work, and so that other objects referencing the given object do not break. 74 |
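To make the attributes above concrete, here is a minimal, hypothetical sketch of what a system-managed object record could look like when serialized as YAML. The field names simply mirror the attributes described in object.md above; the values and the YAML layout itself are illustrative assumptions, not part of the spec.

```yaml
# Hypothetical serialization of a system-managed object (illustrative only)
id: 123e4567-e89b-12d3-a456-426655440000   # system generated, immutable, unique within the workspace
name: my-dataset                           # user assigned, unique among non-archived objects in its scope
status: Active                             # one of: Active, Deprecated, Archived
description: >
  Customer churn training data, refreshed weekly.
```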
-------------------------------------------------------------------------------- /data/artifact.md: -------------------------------------------------------------------------------- 1 | # Artifact 2 | 3 | *Artifacts* are pieces of data produced by runs. 4 | 5 | Artifacts have the following properties: 6 | 7 | - Id 8 | 9 | System-assigned id, unique within the workspace 10 | 11 | - Name 12 | 13 | User-assigned name, unique within the context of the run which produced it 14 | 15 | - Created By Run Id 16 | 17 | The [Run](run.md) which produced this artifact 18 | 19 | - Data Path 20 | 21 | Reference to the data in storage (includes Data Store, relative path, etc.) 22 | 23 | An Artifact can be promoted to a [DataSet](dataset.md). 24 | 25 | -------------------------------------------------------------------------------- /data/datapath.md: -------------------------------------------------------------------------------- 1 | # DataPath 2 | 3 | ### DataPath 4 | 5 | A DataPath is used to refer to something stored in a DataStore. A DataPath always contains a reference to a DataStore (which the user usually gives by name, and the system usually tracks by Id), plus DataStore-type-specific properties. In most cases this is a relative path in the given DataStore, but it can sometimes be more complex, such as a SQL table name or a SQL query used to retrieve data from a SQL database. 6 | 7 | Example DataPath: 8 | 9 | ```json 10 | { 11 | "DataStore": "MyCloudStorageContainer", 12 | "RelativePath": "/data/summerproject/1.txt" 13 | } 14 | ``` 15 | 16 | DataPaths don't have names, Ids, or other metadata. For those, see Artifacts and DataSets. 17 | 18 |
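The example above covers only the relative-path case. For a DataPath that points at a SQL-backed DataStore, a sketch might look like the following; the `SqlQuery` property name is an assumption made for illustration, since the spec only says that the properties beyond the DataStore reference are specific to the DataStore type.

```json
{
  "DataStore": "MyCustomerSqlDatabase",
  "SqlQuery": "SELECT id, age, income FROM customers WHERE signup_year >= 2018"
}
```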
-------------------------------------------------------------------------------- /data/dataset.md: -------------------------------------------------------------------------------- 1 | # Dataset 2 | 3 | Datasets represent named data stored within a Datastore. Conceptually, Datasets enable data to be used effectively in various scenarios by abstracting away the underlying storage (such as formatting and encoding) and access complexities (such as reading and transforming it into a meaningful form). 4 | 5 | # Dataset Version 6 | 7 | Datasets are versioned based on their definition (the location of the data and the steps to transform it). A Dataset definition can be a simple Datapath (a file path relative to a Datastore) or a complex Dataflow (DataPrep dataflow JSON). 8 | 9 | Any change to the definition necessitates the creation of a new version. The versioning concept here is the same as a source code commit in VSO or similar systems. 10 | 11 | # Archiving a Version 12 | 13 | Individual Dataset versions can be Archived when a version is not supposed to be used for any reason (such as the underlying data no longer being available). When an Archived Dataset version is used in a Pipeline, execution will be blocked with an error. No further actions can be performed on archived Dataset versions, but the references will be kept intact. 14 | 15 | # Deprecating a Version 16 | 17 | Individual Dataset versions can be deprecated when their usage is no longer recommended and a replacement is available. When a deprecated Dataset version is used in a Pipeline, a warning message is returned but execution is not blocked. 18 | 19 | # Reactivate a Version 20 | 21 | Dataset versions can be reactivated to be used again. 22 | 23 | # Profile 24 | 25 | The Profile of a Dataset version is the schema and various statistical measures of the underlying data at a point in time. The Profile of a Dataset version becomes stale when the underlying data changes. 26 | 27 | Note: At this point, Profile freshness can only be calculated for file-based Datastores (Azure Block Blob / File Blob / ADLS) 28 | 29 | # Dataset Checkpoint 30 | 31 | A checkpoint is a combination of a Profile and an optional materialized copy of the data itself, tied to a specific dataset version. 32 | 33 | When data gets materialized, a new profile is generated for the materialized data and used instead of the current profile, even though the current profile is still valid. This ensures the freshness of the Profile while the data is being materialized. 34 | 35 | Also, when a checkpoint with materialized data is used in execution/training, the materialized data gets used instead of the live data. 36 | 37 | # Dataset Views 38 | 39 | Dataset Views are simple user-defined filters on a Dataset version that select a subset of rows. Each Dataset View has one Dataflow filter expression. Refer to Dataflow expressions for more details. A sketch of a hypothetical view definition appears after the example dataset below. 40 | 41 | E.g.: Weather Dataset version 1 can point to all data between 2015 and 2018; the views would be Weather_2015, Weather_2016, and so on. 42 | -------------------------------------------------------------------------------- /data/dataset.yml: -------------------------------------------------------------------------------- 1 | mlSpecVersion: 0.0.1 2 | id: 592f0c1c-72ae-4236-9202-7e6aff1954f4 3 | location: https://internal.contoso.com/datasets/90d47e48-9105-49d0-a159-5c029f97ecd0 4 | description: Error Logs 2019-08-22 5 | dateGenerated: 2019-08-22 6 | isDeprecated: false 7 | isReadOnly: true 8 | isVisible: true 9 | name: foo 10 | preview: ref 11 | profile: ref 12 | state: active 13 | tags: 14 | - key: X 15 | value: '100' 16 | version: 1.0.1 17 | versionedBy: BobSmith 18 | versionedOn: 2019-08-21 19 | versionNotes: None 20 |
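Building on the Dataset Views description above, a hypothetical view definition might be serialized along the following lines. Every field name here is an illustrative assumption; the spec only requires that a view identifies the Dataset version it filters and carries a single Dataflow filter expression.

```yaml
# Hypothetical Dataset View sketch (field names are illustrative)
name: Weather_2016
dataset: weather
datasetVersion: 1
filterExpression: "year == 2016"   # exactly one Dataflow filter expression per view
```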
-------------------------------------------------------------------------------- /data/datastore.md: -------------------------------------------------------------------------------- 1 | # Datastore 2 | 3 | ### Datastore 4 | 5 | Datastores are objects which indicate the location of stored data. Examples of Datastores include a SQL database, a cloud storage account such as an Azure Blob container or AWS S3 bucket, or the DBFS file system in a Databricks cluster. A Datastore can be either a pointer to the data, or used to store various pieces of data organized into files, folders, SQL tables, etc., depending on its type. 6 | 7 | The Datastore object shouldn't be used for referencing a specific file or folder; it's intended to be a bigger container. DataPaths are used to refer to specific data (like a file or folder) in a Datastore. A Datastore can be imagined as a "place", and a DataPath as a "thing" stored in that "place". 8 | 9 | A Datastore has a user-assigned *name* and *type*, and a system-generated *id*. Depending on its type, a Datastore can have more properties. For example, a SQL Database Datastore can have properties such as a connection string and database name. 10 | 11 | The name is unique and assigned by the user; at any moment in time only a single Datastore can have a given name. The user can rename a Datastore. The Id is also unique, but immutable and system-generated. Usually the *Id* is a GUID. 12 | 13 | A Datastore can be archived. When it's archived, its name is freed up and can be used for another Datastore. 14 | -------------------------------------------------------------------------------- /data/readme.md: -------------------------------------------------------------------------------- 1 | # Data tracking 2 | 3 | Data tracking is built on several abstractions: 4 | 5 | - [DataStore](datastore.md) 6 | 7 | A place for storing data. Could be a storage account, SQL database, HDFS filesystem, etc. 8 | 9 | - [DataPath](datapath.md) 10 | 11 | A thing stored in a DataStore. Always has a reference to a DataStore and some way to specify a location within it, for example a relative path. Could be a file, folder, table in a SQL database, etc. 12 | 13 | - [Artifact](artifact.md) 14 | 15 | A piece of data produced by a run; has an identifier (and can be stored in a registry of artifacts) and a DataPath. 16 | 17 | - [DataSet](dataset.md) 18 | 19 | Named and versioned data, with rich metadata (profile, schema, snapshots, etc.). Uses a DataPath to point to the actual data. -------------------------------------------------------------------------------- /docs/archive/README-old.md: -------------------------------------------------------------------------------- 1 | # MLSpec 2 | A project to standardize the intercomponent schemas for a multi-stage ML Pipeline 3 | 4 | # Prior Art 5 | Many of these concepts are inspired by the previous work from the [MLSchema project](http://ml-schema.github.io/documentation/ML%20Schema.html) (Mirrored here due to site instability) - 6 | - Paper - https://github.com/mlspec/MLSpec/blob/master/1807.05351.pdf 7 | - Website - https://github.com/mlspec/MLSpec/blob/master/ML%20Schema%20Core%20Specification.pdf 8 | 9 | # Background 10 | The machine learning industry has embraced the concept of cloud-native architectures, made up of multiple component parts loosely coupled together. One of the issues with this approach, however, has been that while the steps of a machine learning pipeline have been fairly well articulated in a wide variety of publications, the specifications for how to wire these steps together remain highly varied, which makes it difficult to build any standard tools that might simplify or formalize machine learning operations. 11 | 12 | This project is about establishing community-driven standards that automated tooling can consume and output. Ideally, this enables the next opportunity around standardized ML software engineering practices.
13 | 14 | # Existing Multi-Stage ML Workflows 15 | The below provide inspiration as projects which focus on ML and batch solutions: 16 | 17 | - [Facebook’s FBLearner Flow](https://code.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/) 18 | - [Google’s TFX Paper](https://dl.acm.org/citation.cfm?id=3098021) 19 | - [Kubeflow Pipelines](https://cloud.google.com/blog/products/ai-machine-learning/getting-started-kubeflow-pipelines) 20 | - [Microsoft Azure ML Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) 21 | - [Netflix Meson](https://medium.com/netflix-techblog/meson-workflow-orchestration-for-netflix-recommendations-fc932625c1d9) 22 | - [Spotify’s Luigi](https://github.com/spotify/luigi) 23 | - [Uber’s Michelangelo](https://eng.uber.com/michelangelo/) [[paper](http://proceedings.mlr.press/v67/li17a/li17a.pdf)] 24 | 25 | From these papers, we feel the following steps summarize all the steps in an end-to-end machine learning workflow. 26 | 27 | # Proposed standards 28 | We propose a standard around the following components. 29 | 30 | - Workflow orchestration – what are the standard endpoints that each step in an ML workflow require (e.g. /ok, /varz, /metrics, etc) 31 | - Model - … 32 | - Logging – what is the NCSA standard log for each inference request? 33 | - Other… 34 | 35 | # End-to-End Complete Lifecycle 36 | We feel that over time, every stage of an ML lifecycle will need some form of metadata management. The below represent a collection of these steps: 37 | 38 | 1. *Codify Objectives* - Detail the model outputs, possible errors and minimum success for launching in code; a simple DSL that can be used to verify success/failure programmatically for automated deployment 39 | 2. *Data Ingestion* - What tools/connectors (e.g. ODBC, Spark, HDFS, CSV, etc) were used for pulling in data; what queries were used (including signed datasets); sharding strategies; May include labelling or synthetic data generation/simulation. 40 | 3. *Data Analysis* - Set of descriptive statistics on the included features and configurable slices of the data. Identification of outliers. 41 | 4. *Data Transformation* - What data conversions and feature wrangling (e.g. feature to-integer mappings) were used; what outliers were programmatically eliminated 42 | 5. *Data Validation* - What validation was applied to the data based on a versioned, succinct description of the expected properties of the data; schema can also be used to prevent bad behavior, such as training on deprecated data; mechanisms to generate the first version of the schema (e.g. select * from foo limit 30) that can be used to drive other platform components, e.g., automatic feature-engineering or data-analysis tools. 43 | 6. *Data Splitting (including partitioning)* - How the data is split into training, validation, hold back & debugging sets and records and gets results of validation for statistics of each set; metadata here may be be used to detect leakage of training data into testing data and/or overfit 44 | 7. *Model Training/Tuning* - Metadata about how the model is packaged and the distribution strategy; hyperparameters searched and results of the search; results of any conversions to other model serving format (e.g. TF -> ONNX); techniques used to quantize/compress/prune model and the results 45 | 8. 
*Model Evaluation/Validation* - Result of evaluation and validation of model to ensure they meet original codified objectives before serving them to users; computation of metrics on slices of data, both for improving performance and avoiding bias (e.g. gender A gets significantly better results than gender B); source of data used for validation 46 | 9. *Test* - Results of final confirmation for model on the hold back data set; MUST BE SEPARATE STEP FROM #8; source of data used for final test 47 | 10. *Model Packaging* - Metadata about model package; includes adding additional security constraints, monitoring agents, signing, etc.; descriptions of the necessary infrastructure (e.g. P100s, 16 GB of RAM, etc) 48 | 11. *Serving* - Results of rolling model out to production 49 | 12. *Monitoring* - Live queryable metadata that provides liveness checking and ML-specific signals that need action, such as significant deviation from previous model performance or degradation of the model performance over time; ideally includes rollback strategy (e.g. if this model is failing, use model last-year.last-month.pkl) 50 | 13. *Logging* - NCSA-style record per inference request, including a cryptographically secure record of the version of the pipeline (including features) and data used to train. 51 | 52 | # Table of contents for MLSpec repo 53 | 54 | - [common](./common) 55 | 56 | - [object](./common/object.md) 57 | 58 | General notes applicable to multiple objects in the system. How they are identified and named, basic operations, etc. 59 | 60 | - [data](./data) 61 | 62 | - [datastore](datastore.md) 63 | 64 | Data storages 65 | 66 | - [datapath](./data/datapath.md) 67 | 68 | Data references 69 | 70 | - [artifact](artifact.md) 71 | 72 | Data produced by runs 73 | 74 | - [dataset](./data/dataset.md) 75 | 76 | Named and versioned data in storage 77 | 78 | - [pipelines](./pipelines) 79 | 80 | - [pipeline](pipeline.md) 81 | 82 | DAG for executing computation on data and training and deploying models 83 | 84 | - [module](module.md) 85 | 86 | Reusable definition of computation, includes script, set of expected inputs, outputs, etc. 
87 | 88 | - [experiment_tracking](./experiment_tracking) 89 | 90 | - [run](./experiment_tracking/run.md) 91 | 92 | Tracked execution of pipeline or single script on compute 93 | 94 | - [model_packaging](./model_packaging) 95 | 96 | - [models](./model_packaging/README.md) 97 | 98 | Trained models 99 | 100 | - logging_proto 101 | 102 | - monitoring_proto 103 | 104 | - [metadata_file](./metadata_file) 105 | 106 | - [metadata](./metadata_file/metadata.yaml) 107 | 108 | The metadata file used to recreate the ML workflow 109 | -------------------------------------------------------------------------------- /docs/assets/logos/mlspec_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/docs/assets/logos/mlspec_logo.png -------------------------------------------------------------------------------- /docs/assets/logos/mlspec_logo_light.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/docs/assets/logos/mlspec_logo_light.png -------------------------------------------------------------------------------- /experiment_tracking/README.md: -------------------------------------------------------------------------------- 1 | # Experiment Tracking 2 | 3 | Experiment tracking is handled by capturing [runs](run.md), their metadata and relationships to other objects. 4 | 5 | 6 | 7 | -------------------------------------------------------------------------------- /experiment_tracking/experiment_example.yml: -------------------------------------------------------------------------------- 1 | mlSpecVersion: 0.0.1 2 | id: 592f0c1c-72ae-4236-9202-7e6aff1954f4 3 | project_id: 5999460a-86e4-4b3e-b48d-b4c73b9d73b0 4 | experiment_id: 3eb00c1e-e06a-4a50-b911-54e6fd4d5746 5 | name: 'test experiment run' 6 | submittedDate: '2019-01-01T000000' 7 | submitterId: 'GUID' 8 | hyperparameters: 9 | - key: C 10 | value: '100' 11 | - key: solver 12 | value: lbfgs 13 | - key: max_iter 14 | value: '1000' 15 | artifacts: 16 | key: model 17 | path: output/census_logreg_simple.gz 18 | artifact_type: MODEL 19 | datasets: 20 | key: input_data 21 | path: data/credit-default.csv 22 | artifact_type: DATA 23 | metrics: 24 | key: accuracy 25 | value: '0.7787333333333334' 26 | value_type: NUMBER 27 | tags: 28 | - key: origin 29 | value: ENUS 30 | properties: 31 | - key: systemId 32 | value: foo 33 | -------------------------------------------------------------------------------- /experiment_tracking/run.md: -------------------------------------------------------------------------------- 1 | # Run 2 | 3 | Run is a single execution of some 'job', such as script, pipeline, etc. Once executed, it can't execute again. 4 | 5 | Runs have various metadata, metrics, artifacts, etc. associated with them. Also runs have references to other runs and other objects in the system 6 | 7 | ### Relationships 8 | 9 | Runs can have parent-child relationships with other runs. These relationships help establish a hierarchical structure and capture dependencies between runs. 10 | 11 | - Parent Run: A run can have a parent run, indicating that it is a part of a larger workflow or a subsequent run in a series of related runs. The parent run acts as a container or grouping mechanism for its child runs. 12 | - Child Runs: A run can have multiple child runs, representing sub-tasks or sub-components of the parent run. 
Child runs are typically spawned by the parent run and can inherit certain properties or configurations from the parent. 13 | The parent-child relationships can be captured using the following fields: 14 | 15 | - parent_run_id: The ID of the parent run, if applicable. This field establishes the link between a child run and its parent run. 16 | - child_run_ids: An array of IDs representing the child runs associated with the current run. This field allows for tracking the child runs spawned by the current run. 17 | By capturing these relationships, you can create a hierarchical structure of runs, enabling better organization, traceability, and understanding of the dependencies between different parts of the ML workflow. 18 | 19 | ## Tags 20 | 21 | Tags are key-value pairs that provide additional metadata and annotations for a run. They allow for flexible categorization, filtering, and querying of runs based on specific attributes or characteristics. 22 | 23 | * Tags can be used to label runs with meaningful information, such as the purpose of the run, the algorithm used, the dataset version, or any other relevant contextual information. 24 | * Each tag consists of a key and a value, both represented as strings. 25 | * Multiple tags can be associated with a single run, allowing for rich metadata capture. 26 | * Tags can be used to group related runs, filter runs based on specific criteria, or provide additional context for analysis and interpretation. 27 | 28 | Example tags: 29 | 30 | ``` 31 | - key: "experiment_type" 32 | value: "hyperparameter_tuning" 33 | - key: "dataset_version" 34 | value: "v2.0" 35 | - key: "algorithm" 36 | value: "random_forest" 37 | ``` 38 | 39 | ## Metrics 40 | 41 | Metrics are quantitative measurements or outcomes associated with a run. They capture the performance, accuracy, or other relevant numerical values that are recorded during the execution of a run. 42 | 43 | * Metrics provide a way to track and compare the performance of different runs or experiments. 44 | * Each metric consists of a key, which represents the name or identifier of the metric, and a value, which is the numerical value associated with the metric. 45 | * Metrics can be of different types, such as scalar values (e.g., accuracy, loss), arrays or vectors (e.g., precision-recall curve), or even dictionaries or JSON objects for more complex metrics. 46 | * Metrics can be captured at different points during the run, such as at regular intervals or at the end of the run. 47 | 48 | Example metrics: 49 | 50 | ``` 51 | metrics: 52 | - key: "accuracy" 53 | value: 0.87 54 | value_type: "scalar" 55 | - key: "precision_recall_curve" 56 | value: [0.92, 0.88, 0.85, 0.80] 57 | value_type: "array" 58 | - key: "confusion_matrix" 59 | value: { 60 | "true_positive": 100, 61 | "true_negative": 200, 62 | "false_positive": 20, 63 | "false_negative": 30 64 | } 65 | value_type: "object" 66 | ``` 67 | 68 | ## Artifacts 69 | 70 | Artifacts are data produced by a run. A single run can produce multiple artifacts. Artifacts are always stored as blobs identified by DataPaths - those could be files, folders, SQL tables, etc. Because Artifacts always point to data using DataPaths, they always represent data stored in Datastores. Different artifacts for a given run can be stored in different Datastores. Some artifacts can be published as (or promoted to) DataSets. 71 | 72 | See [Artifact](artifact.md) for more information. 73 | 74 | ### Logs 75 | 76 | Logs are represented as artifacts. 
A run can have several log objects - the usual ones are stdout, stderr, the driver log, etc. They are used to capture traces of the executing job. Depending on the implementation, logs can be raw text, HTML, or some other text format. 77 | 78 | -------------------------------------------------------------------------------- /logging_proto/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mlspec/MLSpec/c4fe68b0d4d62d61ff56e434b54af383e09abf27/logging_proto/README.md -------------------------------------------------------------------------------- /logging_proto/inferenceLog.yml: -------------------------------------------------------------------------------- 1 | applicationInfo: 2 | - applicationId 3 | - applicationEndpoint 4 | requestInfo: 5 | - requestId 6 | - requestTimestamp 7 | - requestLatency 8 | modelInfo: 9 | models: 10 | type: array 11 | items: 12 | type: string 13 | -------------------------------------------------------------------------------- /metadata_file/README.md: -------------------------------------------------------------------------------- 1 | # [WIP] Metadata_file 2 | This folder contains a metadata file showing what we think the metadata for an ML job should look like. We hope to recreate the ML job based on this metadata file. We started the first version of the metadata file based on the Kubeflow MNIST example, so for now we focus only on this one particular example. This metadata file will be expanded to adapt to more scenarios in the ML world, and we are happy to hear your advice. 3 | 4 | Feel free to submit an issue or PR if you have any questions or advice. 5 | 6 | 7 | # [WIP] Abstractions 8 | There are 6 sections in the metadata description: 9 | 10 | 1. *framework* - the name and version of the framework; optionally also contains the runtime and other supporting files 11 | 2. *model* - the model used for this job 12 | 3. *dataset* - dataset acquisition, storage, loading and other dataset processing 13 | 4. *data_process* - different data processing functions are applied based on different scenarios. Right now only some basic MNIST image classification functions are included 14 | 5. *model_architecture* - includes input, output and other model architecture definitions 15 | 6. *training_params* - parameters for training; includes lr, loss, batch_size, etc. 16 | 17 | 18 | -------------------------------------------------------------------------------- /metadata_file/metadata.yaml: -------------------------------------------------------------------------------- 1 | framework: 2 | - name: tensorflow 3 | - version: 1.12.0 4 | - runtime: python2.7 5 | - requirement(optional): 6 | - numpy: 1.14.0 7 | - grpc: 0.3.post19 8 | 9 | model: 10 | - name: image-recognition 11 | - version: 1.0 12 | - source : http://... or test.py 13 | - creator: Xiyuan_Wang 14 | - time: 2019-02-26 15 | - type: Enum(Keras, Graph...) 16 | 17 | dataset: 18 | - name: mnist 19 | - version: 1.0 20 | - source: http://... or dataset.zip 21 | 22 | # data_process differs in various ways; this is particularly for the mnist example 23 | # we suggest implementing this with a base + plugin structure.
Base contains the most basic keyword definition like 24 | # the dataset_split while plugin contains functions in different scenarios like image recognition, voice 25 | # recognition and so on 26 | data_process: 27 | - data_load: 28 | - data_split: 29 | - padding: 30 | - truncating: 31 | - type: enum(Image, NLP, voice) 32 | - key1: value1 33 | - key2: value2 34 | 35 | # the model architecture combines with 'model' if there already exists the model file, or this section may 36 | # be useful with the Dynamic Computation Graphs? 37 | model_architecture: 38 | - input: 39 | - fully_connected_layer: 40 | - output: 41 | - dropout: 42 | - embedding: 43 | - batch_normalization: 44 | 45 | 46 | training_params: 47 | - learning_rate: 48 | - loss: 49 | - batch_size: 50 | - epoch: 51 | - optimizer: 52 | - xxx 53 | - yyy 54 | - train_op: 55 | -------------------------------------------------------------------------------- /model_packaging/README.md: -------------------------------------------------------------------------------- 1 | # Model Packaging 2 | Models may come from a variety of places: 3 | - Forked from a baseline experiment with sources tracked in a Git repository 4 | - Developed in a corporate environment's compliant experimentation system 5 | - Shared as part of a larger Model Ensemble. 6 | 7 | Models may need to be consumed by several different upstream inferencing stacks. 8 | We should be able to track rich metadata around where models came from, what they are capable of, and where they are running. 9 | 10 | # Why is this metadata important? 11 | You are a bank being sued for discrimination in how you chose loan recipients via an ML model. 12 | Prove that the model you used was (a) not tampered with (b) had an audit trail for training and (c) the data set was unbiased. 13 | 14 | # Key Requirements 15 | - Track paper trail requirements for outputs produced in training pipelines. 16 | - Enable easy registration and updates of models across a variety of storage formats. 17 | - Provide a fluent command line experience / set of REST APIs for registering models. 18 | - Express model schema and service schema (how do I use the model?) 19 | - Support a model policy store 20 | - Simplify operationalization of a model 21 | - Track model lineage 22 | - Which code / data / experiment produced a model 23 | - Pipelines used to produce a model 24 | - Other models used in the inference process for a model (composites) 25 | - At minimum, capture the following: 26 | - code 27 | - data 28 | - config (conda env, base docker image, … ) 29 | - Track metrics associated with a model 30 | - Make it easy to compare metrics across different versions of a model 31 | - Expose available deployment paths for inference 32 | - What can we convert it to? Where can it be deployed to? 33 | - New code or new model as a trigger for Deployment 34 | - Change management integrated into the pipeline directly 35 | - Track breaking changes on models through semantic versioning 36 | -------------------------------------------------------------------------------- /model_packaging/data.yaml: -------------------------------------------------------------------------------- 1 | # data (Optional) 2 | # source_id: (Optional) Extension file id regarding the data source. 3 | # domain: (Optional) Metadata about the data domain. 
4 | # website: (Optional) Links to the data description 5 | # license: (Optional) Data license 6 | 7 | data: 8 | source_id: IMDB-WIKI 9 | domain: "Image" 10 | website: "https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/" 11 | license: "Apache 2.0" 12 | -------------------------------------------------------------------------------- /model_packaging/model.yaml: -------------------------------------------------------------------------------- 1 | # name: (Required) name of this model file 2 | # description: (Optional) description of this model file 3 | # author: (Required for trainable) 4 | # name: (Required for trainable) name of this training job's author 5 | # email: (Required for trainable) email of this training job's author 6 | # framework: (Required) 7 | # name: (Required) ML/DL framework format that the model is stored as. 8 | # version: (Optional) Framework version used for this model 9 | # runtimes: (Required for trainable) 10 | # name: (Required for trainable) programming language for the model runtime 11 | # version: (Required for trainable) programming language version for the model runtime 12 | # license: (Optional) License for this model. 13 | # domain: (Optional) Domain metadata for this model. 14 | # purpose: (Optional) Purpose of this model, e.g. binary_classification 15 | # binary_classification - Binary classification 16 | # multiclass_classification - Multiclass classification 17 | # regression_prediction - Regression – Prediction 18 | # regression_recognition - Recognition – Detection 19 | # website: (Optional) Links that explain this model in more details 20 | # labels: (Optional list) labels and tags for this model 21 | # - url: (Optional) Link to the ML/DL model page. 22 | # - pipeline_uuids: (Optional) Linkage with a list of execuable pipelines. 23 | 24 | name: Facial Age estimator Model 25 | model_identifier: facial-age-estimator 26 | description: Sample Model trained to classify the age of the human face. 27 | author: 28 | name: DL Developer 29 | email: "me@ibm.com" 30 | framework: 31 | name: "tensorflow" 32 | version: "1.13.1" 33 | runtimes: 34 | name: python 35 | version: "3.5" 36 | 37 | license: "Apache 2.0" 38 | domain: "Facial Recognition" 39 | purpose: 40 | website: "https://developer.ibm.com/exchanges/models/all/max-facial-age-estimator" 41 | labels: 42 | - url: 43 | - pipeline_uuids: ["abcd1234"] 44 | -------------------------------------------------------------------------------- /model_packaging/model_example.yml: -------------------------------------------------------------------------------- 1 | specVersion: 0.0.1 2 | # We need semantic versioning to indicate breaking changes 3 | 4 | runtimeArtifacts: 5 | # Collection of files (asset) required to instantiate the model 6 | 7 | example: http://github.com/example 8 | # Per artifact, we track 9 | # - location is URI which can point to a variety of stores (ADLS / Git / Blob store / ACR path / …) 10 | # - (optional) relativePath (if it needs to be laid out specially when turning the model into a service) 11 | 12 | framework: 13 | name: tensorflow 14 | # (scikit / pytorch / sparkml / onnx / MLnet / etc.) 
15 | version: 1.7 16 | # Provides specific optimizers / value add for known flavors of models 17 | # Version of model flavor (tf 1.7 / 1.8) 18 | custom: 19 | # Also provides a “custom” option 20 | 21 | accelerator: V100 22 | 23 | time_created: 2019-01-12T22:53:18+00:00 24 | # when model was created 25 | 26 | created_by: sam@rockwell.com 27 | purpose: binary_classification 28 | # binary_classification - Binary classification 29 | # multiclass_classification - Multiclass classification 30 | # regression_prediction - Regression – Prediction 31 | # regression_recognition - Recognition – Detection 32 | 33 | clustering: xxx 34 | dimensionality_reduction: xxx 35 | # Clustering and dimensionality reduction 36 | 37 | custom: 38 | # Open extensible field by model type 39 | # Composition - Describes the collection of models used for Composite Model 40 | origin: 41 | # Origin (collection of artifacts) 42 | codeAssets: xxx 43 | dataAssets: xxx 44 | created_by: xxx 45 | pipeline: xxx 46 | logs: xxx 47 | expiry: xxx 48 | 49 | metadata: 50 | # Optional section 51 | schema: 52 | #Specify the features required for scoring with the model (input, type, shape) 53 | 54 | link_to_dataset: xxx 55 | # Linkage to model dataset / profile 56 | 57 | features: xxx 58 | # Captures model features 59 | 60 | service_schema: xxx 61 | # Service schema (generate Swagger client) 62 | output_schema: xxx 63 | # Output format, in contract (with same type of validation check as the inputs) (could explore compatible version ranges, etc.) 64 | metrics: 65 | key: value 66 | latency: 0.01 67 | # key/value pairs with user defined metrics (from training) 68 | # Model performance metrics (latency to inference) 69 | sampleInputs: http://uri/inputs.txt 70 | tags: 71 | # Arbitrary key / value pairs 72 | 73 | key: value 74 | # Used to express anything not specified in the loose typing above -------------------------------------------------------------------------------- /model_packaging/model_onnx_conversion.yaml: -------------------------------------------------------------------------------- 1 | # convert: (Optional) 2 | # onnx_convertable: (Optional) Enable convertion to ONNX format. 3 | # The model needs to be either trainable or servable. Default: False 4 | # model_source: (Required for onnx_convertable) Model binary path that needs the format conversion. 5 | # data_store: (Required) datastore for the model source 6 | # initial_model: 7 | # bucket: (Required) Bucket that has the model source 8 | # path: (Required) Bucket path that has the model source 9 | # url: (Optional) Link to the model 10 | # onnx_model: 11 | # bucket:(Required) Bucket to store the onnx model 12 | # path: (Required) Bucket path to store the onnx model 13 | # url: (Optional) Link to the converted model 14 | # tf_inputs: (Required for TensorFlow model) Input placeholder and shapes of the model. 15 | # tf_outputs: (Required for TensorFlow model) Output placeholders of the model. 
16 | # tf_rtol: (Optional) Relative tolerance for TensorFlow 17 | # tf_atol: (Optional) Absolute tolerance for TensorFlow 18 | 19 | convert: 20 | onnx_convertable: true 21 | model_source: 22 | initial_model: 23 | data_store: age_datastore 24 | bucket: facial-age-estimator 25 | path: 2.0/assets/model.pt 26 | url: "" 27 | initial_model_local: 28 | path: /local/1.0/assets/ 29 | onnx_converted_model: 30 | onnx_model: 31 | data_store: age_datastore 32 | bucket: facial-age-estimator 33 | path: 3.0/assets/model.onnx 34 | url: "" 35 | onnx_model_local: 36 | path: /local/1.0/assets/ 37 | tf_inputs: 38 | "X:0": [1] 39 | tf_outputs: 40 | - pred:0 41 | tf_rtol: 0 42 | tf_atol: 0 43 | data_stores: 44 | - name: age_datastore 45 | type: s3 46 | connection: 47 | endpoint: https://s3-api.us-geo.objectstorage.softlayer.net 48 | access_key_id: xxxxxxxxxx 49 | secret_access_key: xxxxxxxxxxxxx 50 | 51 | # data_stores_file_paths: (Optional) - 52 | # - name: (Required) name of the data_store_file_path 53 | # key: value 54 | 55 | data_store_file_paths: 56 | - name: scoring_file_paths 57 | feature_file: 2.0/assets/features.csv 58 | input_schema_file: 2.0/assets/input_schema.json 59 | output_schema_file: 2.0/assets/output_schema.json 60 | sample_inputs_file: 2.0/assets/scoring_inputs.json 61 | 62 | # container_stores: (Optional) 63 | # - name: (Required) name of the container_store 64 | # connection: 65 | # container_registry: (Required) container registry for this container_store 66 | # container_registry_token: (Required if container registry is private) container registry token 67 | 68 | container_stores: 69 | - name: container_store 70 | connection: 71 | container_registry: docker.io 72 | container_registry_token: "" 73 | 74 | -------------------------------------------------------------------------------- /model_packaging/model_packaging.md: -------------------------------------------------------------------------------- 1 | # Model Packaging YAML file 2 | 3 | The yaml file described here allows you to 'register' a model with an AI/ML platform. The goal is to define the overall Model metadata in a standard way so that you can bring a model into the workflow system at any point in its lifecycle, if you so desire. For example, someone might just want to train a model, while someone else might have trained a model elsewhere and would just like to serve it. If they enter during the training phase, then some parts of the serving template should be filled in automatically. Someone else might only need to convert their model to ONNX format and deploy it. 4 | 5 | Even for serving, there are different ways to describe your Model depending on how you have packaged it (as a container, or split into pre-processing, prediction and post-processing, which in turn differs across model types). 6 | 7 | The training section is the most evolved and can handle multiple use cases. For serving, this starts with a sample for a container-based Model, but we hope to evolve it in the future. 8 | 9 | ## General Model Metadata 10 | 11 | ``` 12 | name: (Required) name of this model file 13 | description: (Optional) description of this model file 14 | author: (Required for trainable) 15 | name: (Required for trainable) name of this training job's author 16 | email: (Required for trainable) email of this training job's author 17 | framework: (Required) 18 | name: (Required) ML/DL framework format that the model is stored as.
19 | version: (Optional) Framework version used for this model 20 | runtimes: (Required for trainable) 21 | name: (Required for trainable) programming language for the model runtime 22 | version: (Required for trainable) programming language version for the model runtime 23 | labels: (Optional list) labels and tags for this model 24 | - url: (Optional) Link to the ML/DL model page. 25 | - pipeline_uuids: (Optional) Linkage with a list of execuable pipelines. 26 | license: (Optional) License for this model. 27 | domain: (Optional) Domain metadata for this model. 28 | purpose: (Optional) Purpose of this model, e.g. binary_classification 29 | website: (Optional) Links that explain this model in more details 30 | ``` 31 | ## Information Required for Model Training 32 | 33 | ``` 34 | train: (optional) 35 | trainable: (optional) Indicate the model is trainable. Default: False 36 | tested_platforms(optional list): platform on which this model can trained (current options: wml, ffdl, kubeflow) 37 | model_source: (Required for trainable) 38 | initial_model: (Required for trainable) 39 | data_store: (Required) datastore for the model code source 40 | bucket: (Required) Bucket that has the model code source 41 | path: (Required) Bucket path that has the model code source 42 | url: (Optional) Link to the model 43 | initial_model_local: (Optional) 44 | path: (Optional) Initial model code in the user local machine 45 | model_training_results: (Required for trainable) 46 | trained_model: (Required for trainable) 47 | data_store: (Required) datastore for the training result source 48 | bucket: (Required) Bucket that has the training result source 49 | path: (Required) Bucket path that has the training result source 50 | url: (Optional) Link to the model 51 | trained_model_local: (Optional) 52 | path: (Optional) Path to pull trained model in the user local machine 53 | data_source: (Optional) 54 | training_data: (Required for trainable) 55 | data_store: (Required) datastore for the model data source 56 | bucket: (Required) Bucket that has the model data source 57 | path: (Required) Bucket path that has the model data source 58 | url: (Optional) Link to the model 59 | training_data_local: (Optional) 60 | path: (Optional) Initial data files in the user local machine 61 | mount_type: (Required) object storage mount type 62 | evaluation_metrics: (optional) Define the metrics for the training job. 63 | type: (Required) evaluation_metrics type 64 | in: (Required) Path to store the evaluation_metrics 65 | training_container_image: (Optional) 66 | container_image_url: (Optional) Custom training container image url 67 | container_store: (Optional) container_store for the custom training image 68 | execution: (Required for trainable) 69 | command: (Required) Entrypoint commands to execute model code 70 | name: (Required) T-shirt size for training on Watson Machine Learning 71 | nodes: (Required) Number of nodes needed for this training job. Default: 1 72 | training_params: (Optional) list of hyperparameters for the training model 73 | - (optional) list of key(param name):value(param value) 74 | ``` 75 | 76 | ## Information required for Model Serving 77 | 78 | ``` 79 | serve: (Optional) 80 | servable: (Optional) Indicate the model is servable without training. 
Default: False 81 | tested_platforms (optional list): platform on which this model can served (current options: kubernetes, knative, seldon, wml, kfserving) 82 | model_source: (Optional) - (Required if servable is true) 83 | servable_model: (Required for s3 or url type) 84 | data_store: (Required for s3 type) datastore for the model source 85 | bucket: (Required for s3 type) Bucket that has the model source 86 | path: (Required for s3 type) Source path to the model 87 | url: (Required for url type) Source URL for the model 88 | servable_model_local: (Optional) 89 | path: (Optional) Servable model path in the user local machine 90 | serving_container_image: (Required for container type) 91 | container_image_url: (Required for container type) Container image to serve the model. 92 | container_store: (Optional) container_store name 93 | ``` 94 | 95 | ## Information required for Model Scoring 96 | 97 | ``` 98 | score: (Optional) 99 | scorable: (Optional) Indicate the model is scorable. Default: False 100 | model_feature_schema_source: (Required if scorable is true) 101 | scorable_model: (Required for s3 or url type) 102 | data_store: (Required for s3 type) datastore for the model source 103 | bucket: (Required for s3 type) Bucket that has the model source 104 | data_store_file_paths: (Required for s3 type) Source path to the model schema, features and test files 105 | url: (Required for url type) Source URL for the model 106 | secorable_model_local: (Required if local) 107 | path: (Optional) Servable model path in the user local machine 108 | metrics: Metrics for scoring 109 | ``` 110 | 111 | ## Data Metedata 112 | 113 | ``` 114 | data (Optional) 115 | source_id: (Optional) Extension file id regarding the data source. 116 | domain: (Optional) Metadata about the data domain. 117 | website: (Optional) Links to the data description 118 | license: (Optional) Data license 119 | ``` 120 | 121 | ## Data Location 122 | 123 | ``` 124 | data_stores: (Optional) - (Required for trainable) 125 | - name: (Required) name of the data_stores 126 | connection: 127 | endpoing: (Required) Object Storage endpoint URL or public Object Storage key link. 128 | access_key_id: (Required) Object Storage access_key_id 129 | secret_access_key: (Required) Object secret_access_key 130 | ``` 131 | 132 | ## File paths for Data Location 133 | 134 | ``` 135 | data_stores_file_paths: (Optional) - 136 | - name: (Required) name of the data_store_file_path 137 | key: value 138 | ``` 139 | 140 | ## Process - Mixin steps like training post process, serving pre process can be added 141 | 142 | ``` 143 | process: (Optional) 144 | - name: (Required) Script Process name. Can mix any kind of process here 145 | params: (Optional) Free flowing list of key:value paisrs 146 | staging_dir: (Optional) Staging directory within the local machine 147 | trained_model_path: (Optional) trained model path within the object storage bucket 148 | ``` 149 | ## Location for Docker container registry 150 | 151 | ``` 152 | container_stores: (Optional) 153 | - name: (Required) name of the container_store 154 | connection: 155 | container_registry: (Required) container registry for this container_store 156 | container_registry_token: (Required if container registry is private) container registry token 157 | ``` 158 | 159 | ## Data required for Model conversion to ONNX format 160 | 161 | ``` 162 | convert: (Optional) 163 | onnx_convertable: (Optional) Enable convertion to ONNX format. 164 | The model needs to be either trainable or servable. 
Default: False 165 | model_source: (Required for onnx_convertable) Model binary path that needs the format conversion. 166 | data_store: (Required) datastore for the model source 167 | initial_model: 168 | bucket: (Required) Bucket that has the model source 169 | path: (Required) Bucket path that has the model source 170 | url: (Optional) Link to the model 171 | onnx_model: 172 | bucket:(Required) Bucket to store the onnx model 173 | path: (Required) Bucket path to store the onnx model 174 | url: (Optional) Link to the converted model 175 | tf_inputs: (Required for TensorFlow model) Input placeholder and shapes of the model. 176 | tf_outputs: (Required for TensorFlow model) Output placeholders of the model. 177 | tf_rtol: (Optional) Relative tolerance for TensorFlow 178 | tf_atol: (Optional) Absolute tolerance for TensorFlow 179 | ``` 180 | -------------------------------------------------------------------------------- /model_packaging/model_packaging.yaml: -------------------------------------------------------------------------------- 1 | # name: (Required) name of this model file 2 | # description: (Optional) description of this model file 3 | # author: (Required for trainable) 4 | # name: (Required for trainable) name of this training job's author 5 | # email: (Required for trainable) email of this training job's author 6 | # framework: (Required) 7 | # name: (Required) ML/DL framework format that the model is stored as. 8 | # version: (Optional) Framework version used for this model 9 | # runtimes: (Required for trainable) 10 | # name: (Required for trainable) programming language for the model runtime 11 | # version: (Required for trainable) programming language version for the model runtime 12 | 13 | name: Facial Age estimator Model 14 | model_identifier: facial-age-estimator 15 | description: Sample Model trained to classify the age of the human face. 16 | author: 17 | name: DL Developer 18 | email: "me@ibm.com" 19 | framework: 20 | name: "tensorflow" 21 | version: "1.13.1" 22 | runtimes: 23 | name: python 24 | version: "3.5" 25 | 26 | # labels: (Optional list) labels and tags for this model 27 | # - url: (Optional) Link to the ML/DL model page. 28 | # - pipeline_uuids: (Optional) Linkage with a list of execuable pipelines. 29 | # license: (Optional) License for this model. 30 | # domain: (Optional) Domain metadata for this model. 31 | # purpose: (Optional) Purpose of this model, e.g. binary_classification 32 | # binary_classification - Binary classification 33 | # multiclass_classification - Multiclass classification 34 | # regression_prediction - Regression – Prediction 35 | # regression_recognition - Recognition – Detection 36 | # website: (Optional) Links that explain this model in more details 37 | 38 | license: "Apache 2.0" 39 | domain: "Facial Recognition" 40 | purpose: 41 | website: "https://developer.ibm.com/exchanges/models/all/max-facial-age-estimator" 42 | labels: 43 | - url: 44 | - pipeline_uuids: ["abcd1234"] 45 | 46 | # train: (optional) 47 | # trainable: (optional) Indicate the model is trainable. 
Default: False 48 | # tested_platforms(optional list): platform on which this model can trained (current options: wml, ffdl, kubeflow) 49 | # model_source: (Required for trainable) 50 | # initial_model: (Required for trainable) 51 | # data_store: (Required) datastore for the model code source 52 | # bucket: (Required) Bucket that has the model code source 53 | # path: (Required) Bucket path that has the model code source 54 | # url: (Optional) Link to the model 55 | # initial_model_local: (Optional) 56 | # path: (Optional) Initial model code in the user local machine 57 | # model_training_results: (Required for trainable) 58 | # trained_model: (Required for trainable) 59 | # data_store: (Required) datastore for the training result source 60 | # bucket: (Required) Bucket that has the training result source 61 | # path: (Required) Bucket path that has the training result source 62 | # url: (Optional) Link to the model 63 | # trained_model_local: (Optional) 64 | # path: (Optional) Path to pull trained model in the user local machine 65 | # data_source: (Optional) 66 | # training_data: (Required for trainable) 67 | # data_store: (Required) datastore for the model data source 68 | # bucket: (Required) Bucket that has the model data source 69 | # path: (Required) Bucket path that has the model data source 70 | # url: (Optional) Link to the model 71 | # training_data_local: (Optional) 72 | # path: (Optional) Initial data files in the user local machine 73 | # mount_type: (Optional) object storage mount type 74 | # evaluation_metrics: (optional) Define the metrics for the training job. 75 | # type: (Required) evaluation_metrics type 76 | # in: (Required) Path to store the evaluation_metrics 77 | # training_container_image: (Optional) 78 | # container_image_url: (Optional) Custom training container image url 79 | # container_store: (Optional) container_store for the custom training image 80 | # execution: (Required for trainable) 81 | # command: (Required) Entrypoint commands to execute model code 82 | # name: (Required) T-shirt size for training on Watson Machine Learning 83 | # nodes: (Required) Number of nodes needed for this training job. 
Default: 1 84 | # training_params: (Optional) list of hyperparameters for the training model 85 | # - (optional) list of key(param name):value(param value) 86 | train: 87 | trainable: true 88 | tested_platforms: 89 | - wml 90 | - ffdl 91 | model_source: 92 | initial_model: 93 | data_store: age_datastore 94 | bucket: facial-age-estimator 95 | path: 1.0/assets/ 96 | url: "" 97 | initial_model_local: 98 | path: /local/1.0/assets/ 99 | model_training_results: 100 | trained_model: 101 | data_store: age_datastore 102 | bucket: facial-age-estimator 103 | path: 1.0/assets/ 104 | url: "" 105 | trained_model_local: 106 | path: /local/1.0/assets/ 107 | data_source: 108 | training_data: 109 | data_store: age_datastore 110 | bucket: facial-age-estimator 111 | path: 1.0/assets/ 112 | training_data_url: 113 | training_data_local: 114 | path: /local/1.0/assets/ 115 | mount_type: mount_cos 116 | evaluation_metrics: 117 | type: tensorboard 118 | in: "$JOB_STATE_DIR/logs/tb/test" 119 | training_container_image: 120 | container_image_url: tensorflow/tensorflow:latest-gpu-py3 121 | container_store: container_store 122 | execution: 123 | command: python3 convolutional_network.py --trainImagesFile ${DATA_DIR}/train-images-idx3-ubyte.gz 124 | --trainLabelsFile ${DATA_DIR}/train-labels-idx1-ubyte.gz --testImagesFile ${DATA_DIR}/t10k-images-idx3-ubyte.gz 125 | --testLabelsFile ${DATA_DIR}/t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 20000 126 | compute_configuration: 127 | name: k80 128 | nodes: 1 129 | training_params: 130 | - learning_rate: 131 | - loss: 132 | - batch_size: 133 | - epoch: 134 | - optimizer: 135 | - xxx 136 | - yyy 137 | - train_op: 138 | 139 | # serve: (Optional) 140 | # servable: (Optional) Indicate the model is servable. Default: False 141 | # tested_platforms (optional list): platform on which this model can served (current options: kubernetes, knative, seldon, wml, kfserving) 142 | # model_source: (Optional) - (Required if servable is true) 143 | # servable_model: (Required for s3 or url type) 144 | # data_store: (Required for s3 type) datastore for the model source 145 | # bucket: (Required for s3 type) Bucket that has the model source 146 | # path: (Required for s3 type) Source path to the model 147 | # url: (Required for url type) Source URL for the model 148 | # servable_model_local: (Optional) 149 | # path: (Optional) Servable model path in the user local machine 150 | # serving_container_image: (Required for container type) 151 | # container_image_url: (Required for container type) Container image to serve the model. 152 | # container_store: (Optional) container_store name 153 | 154 | serve: 155 | servable: true 156 | tested_platforms: 157 | - kubernetes 158 | - knative 159 | model_source: 160 | servable_model: 161 | data_store: age_datastore 162 | bucket: facial-age-estimator 163 | path: 2.0/assets/ 164 | url: "" 165 | servable_model_local: 166 | path: /local/1.0/assets/ 167 | url: "" 168 | scorable_model_local: 169 | path: /local/1.0/assets/ 170 | serving_container_image: 171 | container_image_url: "codait/max-facial-age-estimator:latest" 172 | container_store: container_store 173 | 174 | # score: (Optional) 175 | # scorable: (Optional) Indicate the model is scorable. 
Default: False 176 | # model_feature_schema_source: (Required if scorable is true) 177 | # scorable_model: (Required for s3 or url type) 178 | # data_store: (Required for s3 type) datastore for the model source 179 | # bucket: (Required for s3 type) Bucket that has the model source 180 | # data_store_file_paths: (Required for s3 type) Source path to the model schema, features and test files 181 | # url: (Required for url type) Source URL for the model 182 | # secorable_model_local: (Required if local) 183 | # path: (Optional) Servable model path in the user local machine 184 | # metrics: Metrics for scoring 185 | 186 | score: 187 | scorable: true 188 | model_features_schema_source: 189 | scorable_model: 190 | data_store: festure_schema_datastore 191 | bucket: feature_schema_bucket 192 | data_store_file_paths: scoring_file_paths 193 | scorable_model_local: 194 | data_store_file_paths: scoring_file_paths 195 | metrics: 196 | key: value 197 | latency: 0.01 198 | params: 199 | key: value 200 | 201 | # data (Optional) 202 | # source_id: (Optional) Extension file id regarding the data source. 203 | # domain: (Optional) Metadata about the data domain. 204 | # website: (Optional) Links to the data description 205 | # license: (Optional) Data license 206 | 207 | data: 208 | source_id: IMDB-WIKI 209 | domain: "Image" 210 | website: "https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/" 211 | license: "Apache 2.0" 212 | 213 | # process: (Optional) 214 | # - name: (Required) Script Process name. Can mix any kind of process here 215 | # params: (Optional) Free flowing list of key:value paisrs 216 | # staging_dir: (Optional) Staging directory within the local machine 217 | # trained_model_path: (Optional) trained model path within the object storage bucket 218 | 219 | process: 220 | - name: training_post_process 221 | params: 222 | key: value 223 | staging_dir: training_output/ 224 | trained_model_path: 225 | 226 | # data_stores: (Optional) - (Required for trainable) 227 | # - name: (Required) name of the data_stores 228 | # connection: 229 | # endpoing: (Required) Object Storage endpoint URL or public Object Storage key link. 230 | # access_key_id: (Required) Object Storage access_key_id 231 | # secret_access_key: (Required) Object secret_access_key 232 | 233 | data_stores: 234 | - name: age_datastore 235 | type: s3 236 | connection: 237 | endpoint: https://s3-api.us-geo.objectstorage.softlayer.net 238 | access_key_id: xxxxxxxxxx 239 | secret_access_key: xxxxxxxxxxxxx 240 | 241 | # data_stores_file_paths: (Optional) - 242 | # - name: (Required) name of the data_store_file_path 243 | # key: value 244 | 245 | data_store_file_paths: 246 | - name: scoring_file_paths 247 | feature_file: 2.0/assets/features.csv 248 | input_schema_file: 2.0/assets/input_schema.json 249 | output_schema_file: 2.0/assets/output_schema.json 250 | sample_inputs_file: 2.0/assets/scoring_inputs.json 251 | 252 | # container_stores: (Optional) 253 | # - name: (Required) name of the container_store 254 | # connection: 255 | # container_registry: (Required) container registry for this container_store 256 | # container_registry_token: (Required if container registry is private) container registry token 257 | 258 | container_stores: 259 | - name: container_store 260 | connection: 261 | container_registry: docker.io 262 | container_registry_token: "" 263 | 264 | # convert: (Optional) 265 | # onnx_convertable: (Optional) Enable convertion to ONNX format. 266 | # The model needs to be either trainable or servable. 
Default: False 267 | # model_source: (Required for onnx_convertable) Model binary path that needs the format conversion. 268 | # data_store: (Required) datastore for the model source 269 | # initial_model: 270 | # bucket: (Required) Bucket that has the model source 271 | # path: (Required) Bucket path that has the model source 272 | # url: (Optional) Link to the model 273 | # onnx_model: 274 | # bucket:(Required) Bucket to store the onnx model 275 | # path: (Required) Bucket path to store the onnx model 276 | # url: (Optional) Link to the converted model 277 | # tf_inputs: (Required for TensorFlow model) Input placeholder and shapes of the model. 278 | # tf_outputs: (Required for TensorFlow model) Output placeholders of the model. 279 | # tf_rtol: (Optional) Relative tolerance for TensorFlow 280 | # tf_atol: (Optional) Absolute tolerance for TensorFlow 281 | 282 | convert: 283 | onnx_convertable: true 284 | model_source: 285 | initial_model: 286 | data_store: age_datastore 287 | bucket: facial-age-estimator 288 | path: 2.0/assests/model.pt 289 | url: "" 290 | initial_model_local: 291 | path: /local/1.0/assets/ 292 | onnx_converted_model: 293 | onnx_model: 294 | data_store: age_datastore 295 | bucket: facial-age-estimator 296 | path: 3.0/assets/model.onnx 297 | url: "" 298 | onnx_model_local: 299 | path: /local/1.0/assets/ 300 | tf_inputs: 301 | "X:0": [1] 302 | tf_outputs: 303 | - pred:0 304 | tf_rtol: 0 305 | tf_atol: 0 306 | -------------------------------------------------------------------------------- /model_packaging/model_scoring.yaml: -------------------------------------------------------------------------------- 1 | 2 | # score: (Optional) 3 | # scorable: (Optional) Indicate the model is scorable. Default: False 4 | # model_feature_schema_source: (Required if scorable is true) 5 | # scorable_model: (Required for s3 or url type) 6 | # data_store: (Required for s3 type) datastore for the model source 7 | # bucket: (Required for s3 type) Bucket that has the model source 8 | # data_store_file_paths: (Required for s3 type) Source path to the model schema, features and test files 9 | # url: (Required for url type) Source URL for the model 10 | # secorable_model_local: (Required if local) 11 | # path: (Optional) Servable model path in the user local machine 12 | # metrics: Metrics for scoring 13 | 14 | score: 15 | scorable: true 16 | model_features_schema_source: 17 | scorable_model: 18 | data_store: festure_schema_datastore 19 | bucket: feature_schema_bucket 20 | data_store_file_paths: scoring_file_paths 21 | scorable_model_local: 22 | data_store_file_paths: scoring_file_paths 23 | metrics: 24 | key: value 25 | latency: 0.01 26 | params: 27 | key: value 28 | 29 | 30 | # process: (Optional) 31 | # - name: (Required) Script Process name. Can mix any kind of process here 32 | # params: (Optional) Free flowing list of key:value paisrs 33 | # staging_dir: (Optional) Staging directory within the local machine 34 | # trained_model_path: (Optional) trained model path within the object storage bucket 35 | 36 | process: 37 | - name: scoring_pre_process 38 | params: 39 | key: value 40 | staging_dir: serving_output/ 41 | 42 | # data_stores: (Optional) - (Required for trainable) 43 | # - name: (Required) name of the data_stores 44 | # connection: 45 | # endpoing: (Required) Object Storage endpoint URL or public Object Storage key link. 
46 | # access_key_id: (Required) Object Storage access_key_id 47 | # secret_access_key: (Required) Object secret_access_key 48 | 49 | data_stores: 50 | - name: age_datastore 51 | type: s3 52 | connection: 53 | endpoint: https://s3-api.us-geo.objectstorage.softlayer.net 54 | access_key_id: xxxxxxxxxx 55 | secret_access_key: xxxxxxxxxxxxx 56 | 57 | # data_stores_file_paths: (Optional) - 58 | # - name: (Required) name of the data_store_file_path 59 | # key: value 60 | 61 | data_store_file_paths: 62 | - name: scoring_file_paths 63 | feature_file: 2.0/assets/features.csv 64 | input_schema_file: 2.0/assets/input_schema.json 65 | output_schema_file: 2.0/assets/output_schema.json 66 | sample_inputs_file: 2.0/assets/scoring_inputs.json 67 | 68 | # container_stores: (Optional) 69 | # - name: (Required) name of the container_store 70 | # connection: 71 | # container_registry: (Required) container registry for this container_store 72 | # container_registry_token: (Required if container registry is private) container registry token 73 | 74 | container_stores: 75 | - name: container_store 76 | connection: 77 | container_registry: docker.io 78 | container_registry_token: "" -------------------------------------------------------------------------------- /model_packaging/model_serving.yaml: -------------------------------------------------------------------------------- 1 | # serve: (Optional) 2 | # servable: (Optional) Indicate the model is servable. Default: False 3 | # tested_platforms (optional list): platform on which this model can served (current options: kubernetes, knative, seldon, wml, kfserving) 4 | # model_source: (Optional) - (Required if servable is true) 5 | # servable_model: (Required for s3 or url type) 6 | # data_store: (Required for s3 type) datastore for the model source 7 | # bucket: (Required for s3 type) Bucket that has the model source 8 | # path: (Required for s3 type) Source path to the model 9 | # url: (Required for url type) Source URL for the model 10 | # servable_model_local: (Optional) 11 | # path: (Optional) Servable model path in the user local machine 12 | # serving_container_image: (Required for container type) 13 | # container_image_url: (Required for container type) Container image to serve the model. 14 | # container_store: (Optional) container_store name 15 | 16 | serve: 17 | servable: true 18 | tested_platforms: 19 | - kubernetes 20 | - knative 21 | model_source: 22 | servable_model: 23 | data_store: age_datastore 24 | bucket: facial-age-estimator 25 | path: 2.0/assets/ 26 | url: "" 27 | servable_model_local: 28 | path: /local/1.0/assets/ 29 | url: "" 30 | scorable_model_local: 31 | path: /local/1.0/assets/ 32 | serving_container_image: 33 | container_image_url: "codait/max-facial-age-estimator:latest" 34 | container_store: container_store 35 | 36 | # process: (Optional) 37 | # - name: (Required) Script Process name. Can mix any kind of process here 38 | # params: (Optional) Free flowing list of key:value paisrs 39 | # staging_dir: (Optional) Staging directory within the local machine 40 | # trained_model_path: (Optional) trained model path within the object storage bucket 41 | 42 | process: 43 | - name: serving_pre_process 44 | params: 45 | key: value 46 | staging_dir: training_output/ 47 | trained_model_path: 48 | 49 | # data_stores: (Optional) - (Required for trainable) 50 | # - name: (Required) name of the data_stores 51 | # connection: 52 | # endpoing: (Required) Object Storage endpoint URL or public Object Storage key link. 
53 | # access_key_id: (Required) Object Storage access_key_id 54 | # secret_access_key: (Required) Object secret_access_key 55 | 56 | data_stores: 57 | - name: age_datastore 58 | type: s3 59 | connection: 60 | endpoint: https://s3-api.us-geo.objectstorage.softlayer.net 61 | access_key_id: xxxxxxxxxx 62 | secret_access_key: xxxxxxxxxxxxx 63 | 64 | # data_stores_file_paths: (Optional) - 65 | # - name: (Required) name of the data_store_file_path 66 | # key: value 67 | 68 | data_store_file_paths: 69 | - name: serving_file_paths 70 | feature_file: 2.0/assets/features.csv 71 | input_schema_file: 2.0/assets/input_schema.json 72 | output_schema_file: 2.0/assets/output_schema.json 73 | sample_inputs_file: 2.0/assets/scoring_inputs.json 74 | 75 | # container_stores: (Optional) 76 | # - name: (Required) name of the container_store 77 | # connection: 78 | # container_registry: (Required) container registry for this container_store 79 | # container_registry_token: (Required if container registry is private) container registry token 80 | 81 | container_stores: 82 | - name: container_store 83 | connection: 84 | container_registry: docker.io 85 | container_registry_token: "" 86 | -------------------------------------------------------------------------------- /model_packaging/model_training.yaml: -------------------------------------------------------------------------------- 1 | # train: (optional) 2 | # trainable: (optional) Indicate the model is trainable. Default: False 3 | # tested_platforms(optional list): platform on which this model can trained (current options: wml, ffdl, kubeflow) 4 | # model_source: (Required for trainable) 5 | # initial_model: (Required for trainable) 6 | # data_store: (Required) datastore for the model code source 7 | # bucket: (Required) Bucket that has the model code source 8 | # path: (Required) Bucket path that has the model code source 9 | # url: (Optional) Link to the model 10 | # initial_model_local: (Optional) 11 | # path: (Optional) Initial model code in the user local machine 12 | # model_training_results: (Required for trainable) 13 | # trained_model: (Required for trainable) 14 | # data_store: (Required) datastore for the training result source 15 | # bucket: (Required) Bucket that has the training result source 16 | # path: (Required) Bucket path that has the training result source 17 | # url: (Optional) Link to the model 18 | # trained_model_local: (Optional) 19 | # path: (Optional) Path to pull trained model in the user local machine 20 | # data_source: (Optional) 21 | # training_data: (Required for trainable) 22 | # data_store: (Required) datastore for the model data source 23 | # bucket: (Required) Bucket that has the model data source 24 | # path: (Required) Bucket path that has the model data source 25 | # url: (Optional) Link to the model 26 | # training_data_local: (Optional) 27 | # path: (Optional) Initial data files in the user local machine 28 | # mount_type: (Optional) object storage mount type 29 | # evaluation_metrics: (optional) Define the metrics for the training job. 
30 | # type: (Required) evaluation_metrics type 31 | # in: (Required) Path to store the evaluation_metrics 32 | # training_container_image: (Optional) 33 | # container_image_url: (Optional) Custom training container image url 34 | # container_store: (Optional) container_store for the custom training image 35 | # execution: (Required for trainable) 36 | # command: (Required) Entrypoint commands to execute model code 37 | # name: (Required) T-shirt size for training on Watson Machine Learning 38 | # nodes: (Required) Number of nodes needed for this training job. Default: 1 39 | # training_params: (Optional) list of hyperparameters for the training model 40 | # - (optional) list of key(param name):value(param value) 41 | train: 42 | trainable: true 43 | tested_platforms: 44 | - wml 45 | - ffdl 46 | model_source: 47 | initial_model: 48 | data_store: age_datastore 49 | bucket: facial-age-estimator 50 | path: 1.0/assets/ 51 | url: "" 52 | initial_model_local: 53 | path: /local/1.0/assets/ 54 | model_training_results: 55 | trained_model: 56 | data_store: age_datastore 57 | bucket: facial-age-estimator 58 | path: 1.0/assets/ 59 | url: "" 60 | trained_model_local: 61 | path: /local/1.0/assets/ 62 | data_source: 63 | training_data: 64 | data_store: age_datastore 65 | bucket: facial-age-estimator 66 | path: 1.0/assets/ 67 | training_data_url: 68 | training_data_local: 69 | path: /local/1.0/assets/ 70 | mount_type: mount_cos 71 | evaluation_metrics: 72 | type: tensorboard 73 | in: "$JOB_STATE_DIR/logs/tb/test" 74 | training_container_image: 75 | container_image_url: tensorflow/tensorflow:latest-gpu-py3 76 | container_store: container_store 77 | execution: 78 | command: python3 convolutional_network.py --trainImagesFile ${DATA_DIR}/train-images-idx3-ubyte.gz 79 | --trainLabelsFile ${DATA_DIR}/train-labels-idx1-ubyte.gz --testImagesFile ${DATA_DIR}/t10k-images-idx3-ubyte.gz 80 | --testLabelsFile ${DATA_DIR}/t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 20000 81 | compute_configuration: 82 | name: k80 83 | nodes: 1 84 | training_params: 85 | - learning_rate: 86 | - loss: 87 | - batch_size: 88 | - epoch: 89 | - optimizer: 90 | - xxx 91 | - yyy 92 | - train_op: 93 | 94 | # process: (Optional) 95 | # - name: (Required) Script Process name. Can mix any kind of process here 96 | # params: (Optional) Free flowing list of key:value paisrs 97 | # staging_dir: (Optional) Staging directory within the local machine 98 | # trained_model_path: (Optional) trained model path within the object storage bucket 99 | 100 | process: 101 | - name: training_post_process 102 | params: 103 | key: value 104 | staging_dir: training_output/ 105 | trained_model_path: 106 | 107 | # data_stores: (Required for trainable) 108 | # - name: (Required) name of the data_stores 109 | # connection: 110 | # endpoing: (Required) Object Storage endpoint URL or public Object Storage key link. 111 | # access_key_id: (Required) Object Storage access_key_id 112 | # secret_access_key: (Required) Object secret_access_key 113 | 114 | data_stores: 115 | - name: age_datastore 116 | type: s3 117 | connection: 118 | endpoint: https://s3-api.us-geo.objectstorage.softlayer.net 119 | access_key_id: xxxxxxxxxx 120 | secret_access_key: xxxxxxxxxxxxx 121 | 122 | # data_stores_file_paths: (Optional. To be used if there are multiple files, else the path field can be used as above) - 123 | # - name: (Optional) name of any additional training file paths to assign. 
Should be referenced from a datastore if needed 124 | # key: value 125 | 126 | data_store_file_paths: 127 | - name: training_file_paths 128 | training_file_one: 2.0/assets/training_file_one 129 | training_file_two: 2.0/assets/training_file_two 130 | 131 | 132 | # container_stores: (Optional) 133 | # - name: (Required) name of the container_store 134 | # connection: 135 | # container_registry: (Required) container registry for this container_store 136 | # container_registry_token: (Required if container registry is private) container registry token 137 | 138 | container_stores: 139 | - name: container_store 140 | connection: 141 | container_registry: docker.io 142 | container_registry_token: "" 143 | 144 | -------------------------------------------------------------------------------- /monitoring_proto/README.md: -------------------------------------------------------------------------------- 1 | # Inference Requests & Predictions 2 | We will need to capture the following key attributes of an inference request to analyze drift & perform other upstream operations 3 | (such as labeling, feedback aggregation): 4 | 5 | ## Inference requests 6 | - applicationId 7 | - requestId # correlation ID 8 | - requestTimestamp 9 | - inputType 10 | - inputFeatures (dictionary, string:string) 11 | (for images, example features include dimensions, number of channels, dimension ordering, path to image file) 12 | - groundTruthLabel 13 | 14 | ## Inference predictions 15 | - applicationId 16 | - requestId 17 | - modelId (name/version) 18 | - inferenceServiceId 19 | - requestTimestamp 20 | - inferenceLatency 21 | - inputFeatures (dictionary, string:string) 22 | - prediction 23 | - predictionExplanation (dictionary, string:float) 24 | - predictionFeedback (float, 0:1, how useful was it) 25 | - feedbackActions (list, string, what did the user do after the prediction came back) 26 | 27 | Note that for ensemble cases (where requests are running through an inference pipeline) several of these may be generated. 28 | Do we log isFinal in ensemble case? 29 | -------------------------------------------------------------------------------- /monitoring_proto/inferenceRequest.yml: -------------------------------------------------------------------------------- 1 | applicationInfo: 2 | - applicationId 3 | - applicationEndpoint 4 | requestInfo: 5 | - requestId 6 | - requestTimestamp 7 | - requestLatencyMs 8 | resultInfo: 9 | - resultCode 10 | - resultSizeInBytes 11 | inputs: 12 | featureNames: 13 | type: array 14 | items: 15 | type: string 16 | featureValues: 17 | type: array 18 | items: 19 | type: string 20 | prediction: 21 | - predictionMimeType 22 | - predictionData 23 | -------------------------------------------------------------------------------- /pipelines/module.md: -------------------------------------------------------------------------------- 1 | # Modules 2 | 3 | Module represents a unit of computation, defining script which will run on compute target, and describing its interface. Module interface describes inputs, outputs, parameter definitions, but doesn't bind them to specific values or data. Module has snapshot associated with it, capturing script, binaries and other files necessary to execute on compute target. 4 | 5 | Module is container of ModuleVersions. Users can publish new versions, deprecate them, and otherwise manage. 
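To make the interface idea concrete, here is a minimal, illustrative sketch of what a module definition might contain. This is not a normative schema - the field names below are hypothetical (the entity examples further down deliberately leave `inputs`, `outputs` and `params` as TBD):

```yaml
# Illustrative only: a possible module interface definition (hypothetical fields)
name: 'My training module'
snapshot: <snapshot id or source>      # script, binaries and other files needed on the compute target
inputs:
  - name: training_data                # bound to actual data only when the module is used as a step
    type: DataPath
outputs:
  - name: trained_model
    type: DataPath
params:
  - name: learning_rate
    type: float
    default: 0.001
```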
6 | 7 | ## Publishing a new module version 8 | 9 | There are multiple ways to publish a new ModuleVersion: 10 | 11 | - Using the UX, CLI, SDK, or REST API 12 | - The snapshot can originate from many sources: a git commit, a folder, a VSTS artifact, an existing snapshot, or a docker image 13 | 14 | Publishing is scoped to a workspace. 15 | 16 | ## Consuming a module 17 | 18 | In most cases users will reference/consume a Module, not a ModuleVersion. The system is responsible for resolving Module -> ModuleVersion at a meaningful time: 19 | 20 | - When publishing a pipeline, if the user used a Module in their graph, we should preserve the Module reference and resolve the ModuleVersion only during submission 21 | 22 | - The challenge here is that the interface is defined for a ModuleVersion, not a Module. We can offer interface checks while authoring the graph, but we can't enforce them - the actual module version might have a different interface, so interface enforcement can only happen when the pipeline run is triggered 23 | 24 | - When submitting a pipeline for execution, the Module is resolved to a ModuleVersion immediately at graph submission 25 | 26 | - The user can always use a ModuleVersion directly in either of the above scenarios, in which case no binding needs to be performed (both referencing styles are sketched below, before the Entities section) 27 | 28 | Generally, if the user doesn't care much about versioning, they simply publish and consume the Module and always work with the latest version. The underlying infrastructure tracks the actually executed instances as ModuleVersions, and the user can always fall back to binding a specific version. 29 | 30 | ## Mutability 31 | 32 | A Module has both mutable and immutable metadata. Most of its metadata is mutable. 33 | 34 | Mutable: name, description, versions, status 35 | 36 | Immutable: id, creation time 37 | 38 | 39 | 40 | A ModuleVersion has both mutable and immutable metadata. Most of its metadata is immutable. 41 | 42 | Immutable: id, snapshot id, inputs, outputs, parameters, creation time, status, etc. Generally, any field that influences execution is immutable. 43 | 44 | Mutable: description 45 | 46 | ## Naming and identifiers 47 | 48 | A Module has a user-assigned name, which must be unique within the workspace. The Module ID is a system-generated unique identifier. The Module name can be changed by the user (renaming the module) and is also "freed up" from the user's namespace when the module is archived. The Module ID is unique and immutable.
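To make the by-name vs. pinned-version distinction concrete, a step in a pipeline graph might reference a module either way. This is a sketch only - `moduleName` mirrors the convention used in [pipeline.yml](pipeline.yml), while `moduleVersionId` is a hypothetical field for pinning an exact version:

```yaml
steps:
  train_latest:
    moduleName: 'My training module'       # floating reference; resolved to the latest ModuleVersion at submission
    computeName: 'cpu'
  train_pinned:
    moduleVersionId: 'b27ce40e-4ddd-4ef5-8f7b-095621f29a03'   # binds this exact ModuleVersion; no resolution needed
    computeName: 'cpu'
```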
49 | 50 | 51 | ## Entities 52 | 53 | - Module 54 | 55 | ```json 56 | { 57 | "id": "266c582a-a92b-478f-93b3-66a89b755bbf", 58 | "name" : "My training module", 59 | "description" : "I dreamed epic algo and coded a new boosted decision trees algo, just need to make it work now", 60 | "status" : "active", 61 | "createdUtc" : "2019/01/01 12:12:32Z", 62 | "versions" : [ 63 | { 64 | "version" : "1", 65 | "id" : "b27ce40e-4ddd-4ef5-8f7b-095621f29a03" 66 | }, 67 | { 68 | "version" : "2", 69 | "id": "db47be6b-8e73-403c-9280-0e30a8dc80d3" 70 | } 71 | ] 72 | } 73 | ``` 74 | 75 | 76 | 77 | - ModuleVersion 78 | 79 | ```json 80 | { 81 | 82 | "id" : "b27ce40e-4ddd-4ef5-8f7b-095621f29a03", 83 | "module" : "266c582a-a92b-478f-93b3-66a89b755bbf", 84 | "description" : "final attempt3", 85 | "version" : "1", 86 | "status" : "deprecated", 87 | "snapshot_id" : "blah", 88 | "createdUtc" : "2019/01/01 12:12:32Z", 89 | "inputs": TBD, 90 | "outputs" : TBD, 91 | "params" : TBD 92 | 93 | } 94 | ``` 95 | 96 | -------------------------------------------------------------------------------- /pipelines/pipeline.md: -------------------------------------------------------------------------------- 1 | # Machine Learning Pipeline 2 | Machine learning pipelines are optimized around a data scientist's workflow and focus on making the model training process as efficient as possible. 3 | 4 | ## Graph 5 | 6 | The core abstraction for Machine Learning Pipelines is a DAG, with every vertex being either 'data' or a 'step' (a 'computation unit'); edges are data or control dependencies between vertices. 7 | 8 | #### Vertices 9 | A 'data' vertex can be: 10 | - [DataPath](../data/datapath.md) 11 | - [DataSet](../data/dataset.md) 12 | 13 | A data vertex has a single output port. 14 | 15 | A 'step' vertex is always an instance of a [Module](module.md), with bound inputs, outputs and parameters. A Module can be one of two types: *primitive* (a single script which can be executed on compute), and *complex* (a subgraph - a DAG itself, defined as above). This makes the graph a nested, recursive structure. 16 | 17 | A step vertex can have many inputs (or none), and many outputs (or none). 18 | 19 | #### Edges 20 | 21 | There are two types of edges: 22 | - Data dependency. Such edges connect a data vertex or step output to an input of another step. At execution time, the input of that step will be set to a specific instance of data 23 | 24 | - Control dependency. Such edges connect a step output to an input of another step. This edge type is not applicable to data vertices, and there is no data flow involved. It purely represents that one step must be executed after another 25 | 26 | Edges always connect inputs and outputs of vertices. Edges not associated with inputs and outputs are not allowed. 27 | 28 | ## Pipelines, Pipeline Drafts and Pipeline Runs 29 | 30 | Graphs fall into two major classes: 31 | - a blueprint, which defines what should be executed and how, and 32 | - a particular executing instance 33 | While both classes can look similar, they are not exactly the same. An executing instance naturally has properties like status, start time, logs, etc. A blueprint would instead define the topology of the graph, parameter types and acceptable values, etc. 34 | 35 | A mutable blueprint is called a *Pipeline Draft*, and an immutable, preserved blueprint is called a *Pipeline*. An executing instance is called a *PipelineRun*. 36 | 37 | A user can interactively construct a *Pipeline Draft*. This client-side object defines a DAG and can be parametrized.
It is mutable during authoring, and can then be submitted for execution or published as a *Pipeline* for future executions. When it's submitted for execution, a *PipelineRun* is created and orchestrated by the system. There are many ways to create *Pipeline Drafts* - a *Pipeline Draft* can be defined by a yaml file on disk, built as an in-memory object in python, created in UI interfaces, etc. 38 | 39 | A *Pipeline* is a preserved, immutable blueprint, which defines a parametrized DAG and can be "instantiated" multiple times. Every time it's instantiated, it produces a PipelineRun which is then orchestrated by the system. 40 | 41 | A *PipelineRun* is what is actually executed by the system. While PipelineDraft or Pipeline graphs contain Steps, a PipelineRun graph contains StepRuns - executing instances of Steps. 42 | 43 | Mapping summary: 44 | 45 | | Blueprint | Executing Instance | 46 | | ------------- | ------------------ | 47 | | Step | StepRun | 48 | | PipelineDraft | PipelineRun | 49 | | Pipeline | PipelineRun | 50 | 51 | Transformation summary: 52 | 53 | PipelineDraft.submit() -> PipelineRun 54 | 55 | Pipeline.submit() -> PipelineRun 56 | PipelineRun.clone() -> PipelineDraft 57 | Pipeline.clone() -> PipelineDraft 58 | 59 | # Steps in a Pipeline 60 | A Step is always an instance of some [Module](module.md). Modules can be thought of as reusable packages, and can be used many times in the same pipeline or in other pipelines. For convenience, we allow users to construct steps directly from scripts and binaries, but the system is responsible for publishing the module and instantiating it in such cases. 61 | 62 | #### Inputs and outputs 63 | 64 | All data inputs and outputs are stored in [DataStores](../data/datastore.md). A single step can have inputs and outputs stored in different DataStores: both multiple data stores of the same type and data stores of different types are allowed. 65 | 66 | A step must have every non-optional input connected through an edge to another vertex. An input can be connected to only a single other vertex. 67 | 68 | A step may have outputs **not** connected to other steps. An output can be connected to multiple other vertices. A single output can be connected to multiple inputs of a single vertex as well. 69 | 70 | #### Parameters 71 | 72 | Steps have all parameters set to specific values, either explicitly or implicitly (through default values defined for the corresponding module). 73 | 74 | ## Parametrization 75 | 76 | Pipeline Drafts and Pipelines can be parametrized with graph-level parameters. Graph-level parameters have names, types (string, float, etc.), default values and constraints defined. When a PipelineRun is created from either a Pipeline Draft or a Pipeline, those parameters are set to actual values. Graph parameter names are unique within the graph. 77 | 78 | ## Pipeline Run 79 | 80 | When a graph is orchestrated, its execution is captured by a *Pipeline Run*. 81 | 82 | Nodes in the executing graph are *Pipeline Step Runs*. Each step run's output is a uniquely identified [Artifact](../data/artifact.md). When a step run finishes and a downstream step can be executed, the orchestrator will ensure that the corresponding artifacts are passed as inputs to the next step run. 83 | 84 | The Pipeline Run and each Step Run are runs - each has a run id, logs, metrics and other metadata associated with it. See [Run](../experiment_tracking/run.md) for more details. A non-normative sketch of what such a record might capture is shown below.
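As a rough illustration only, a PipelineRun record for the example pipeline draft below might carry something like the following. All field names here are hypothetical - the authoritative shape of a run is defined by the [Run](../experiment_tracking/run.md) spec:

```yaml
# Illustrative only: a hypothetical PipelineRun record
pipeline_run:
  run_id: <system-generated run id>
  pipeline: 'myTrainingPipeline'         # the blueprint this run was instantiated from
  status: Running
  parameters:                            # graph-level parameters bound to actual values
    data_file: 'data/Posts.xml'
  step_runs:
    - step: preprocess
      status: Completed
      outputs:
        - artifact: <artifact id>        # uniquely identified Artifact, passed to downstream step runs
    - step: train
      status: Running
```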
85 | 86 | # A YML specification for a pipeline draft 87 | See [pipeline.yml](pipeline.yml) for example 88 | -------------------------------------------------------------------------------- /pipelines/pipeline.yml: -------------------------------------------------------------------------------- 1 | name: 'myTrainingPipeline' 2 | data: 3 | pipelineData1 4 | steps: 5 | preprocess: 6 | moduleName: 'BatchStep' 7 | computeName: 'cpu' 8 | cmd: 'unzip data/Posts.xml.zip -d data/' 9 | deps: 10 | path: data/Posts.xml.zip 11 | outs: 12 | - cache: true 13 | path: data/Posts.xml 14 | train: 15 | moduleName: 'PythonScriptStep' 16 | computeName: 'cpu' 17 | cmd: 'python train.py' 18 | deps: 19 | path: data/Posts.xml.zip 20 | outs: 21 | path: data/Posts.xml 22 | validate: 23 | moduleName: 'PythonScriptStep' 24 | computeName: 'cpu' 25 | parameters: 26 | data_file: 'path' 27 | cmd: 'python validate.py {data_file}' 28 | 29 | pipeline: [preprocess,train,validate] 30 | --------------------------------------------------------------------------------