├── README.md
├── assets
│   ├── mltrl-immunai22.pdf
│   └── mltrl_flow_full.png
├── ethics_checklist.md
├── examples
│   ├── example-BO-flowchart.png
│   ├── mltrl_card_BO_level4.md
│   └── mltrl_card_BO_level8.md
└── mltrl_card.md

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Machine Learning Technology Readiness Levels

Repository for the MLTRL framework and materials. [![DOI](https://zenodo.org/badge/511701165.svg)](https://zenodo.org/badge/latestdoi/511701165)

![MLTRL flow](assets/mltrl_flow_full.png?raw=true "MLTRL Flow")


The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end. The lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme is spacecraft systems, where mission-critical measures and robustness are ingrained in the development process. Drawing on experience in both spacecraft engineering and ML (from research through product, across domain areas), we have developed a proven systems engineering approach for machine learning development and deployment: the *Machine Learning Technology Readiness Levels (MLTRL)* framework defines a principled process to ensure robust, reliable, and responsible systems while being streamlined for ML workflows, including key distinctions from traditional software engineering. MLTRL further defines a lingua franca for people across teams and organizations to work collaboratively on artificial intelligence and machine learning technologies.

See the main paper ["Technology Readiness Levels for Machine Learning Systems"](https://arxiv.org/abs/2101.03989) (in press) for details on the framework and results in areas including ML for medical diagnostics, consumer computer vision, satellite imagery, and particle physics.


## MLTRL Cards

Here we include a template for MLTRL practitioners to copy/fork for use in their projects, teams, and organizations: [MLTRL Card](mltrl_card.md).
We also provide several real-world example Cards, and welcome others to be submitted via pull request.

Please see the MLTRL journal paper for details on Card implementation and utilities.


## AI Ethics Guide

The MLTRL framework frequently calls on ethics policies to ensure safe and responsible technologies.

Here we include a thorough AI ethics guide for all to use, and to adapt to your domain as needed: [ethics_checklist.md](ethics_checklist.md)

Note that the recent development of community guidelines for AI publishing has resulted in a handful of industry-specific checklists. We list those that we know of here (and accept contributions from the community):

- [MI_CLAIM (Minimum Information for CLinical AI Modeling)](https://github.com/beaunorgeot/MI-CLAIM)
- [Checklist for Artificial Intelligence in Medical Imaging (CLAIM)](https://pubs.rsna.org/doi/10.1148/ryai.2020200029)
- *TODO: more?*


## MLTRL resources

Please submit additional materials via pull request, thanks.

### Talks:

- 2022 @ MIT Media Lab (TODO)
- [2022 @ Immunai](assets/mltrl-immunai22.pdf)

### Papers:

- **[Lavin et al. '21, *TRL for ML Systems*](https://arxiv.org/abs/2101.03989)**
- [Lavin & Renard '21, *TRL for AI&ML*](https://arxiv.org/abs/2006.12497)
- [Paleyes et al. '22, *Challenges in Deploying Machine Learning: a Survey of Case Studies*](https://dl.acm.org/doi/10.1145/3533378)


## Citations

If you use MLTRL and/or develop related methods, please cite our work as follows:
```
@article{Lavin22NatureComms,
  title={Technology Readiness Levels for Machine Learning Systems},
  author={Alexander Lavin and Ciar{\'a}n M. Gilligan-Lee and Alessya Visnjic and Siddha Ganju and Dava Newman and Atilim Gunes Baydin and Sujoy Ganguly and Danny B. Lange and Ajay Sharma and Stephan Zheng and Eric P. Xing and Adam Gibson and James Parr and Chris Mattmann and Yarin Gal},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.03989}
}
```

--------------------------------------------------------------------------------
/assets/mltrl-immunai22.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-infrastructure-alliance/mltrl/30541308eefe21c99858c7661721ecfd914c44a8/assets/mltrl-immunai22.pdf

--------------------------------------------------------------------------------
/assets/mltrl_flow_full.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-infrastructure-alliance/mltrl/30541308eefe21c99858c7661721ecfd914c44a8/assets/mltrl_flow_full.png

--------------------------------------------------------------------------------
/ethics_checklist.md:
--------------------------------------------------------------------------------
# MLTRL Ethics Checklist

We include an example ethics playbook for use in MLTRL. There are three sections to use as described in the MLTRL journal paper:
1. Pre-dev, per project
2. Post-dev, per project
3. Guiding questions, for teams and the org

This example comes from a 200-person enterprise machine learning startup with clients in the healthcare, retail, and automotive industries in the US and Europe. Each item was drafted by the company's Ethics Council, revised following feedback from all teams, and approved by the executive team and board.

---

**Purpose:**
We seek to create an objective process that allows us to analyze AI use cases in order to create trusted products and solutions for the ongoing benefit of humanity. These guidelines and questions help us navigate various ethical considerations/gray areas, allowing us to decide, at an appropriate point in the product and sales cycle, whether to pursue (or not pursue) projects, and to answer critical questions about how to execute projects responsibly.


## Pre-Development Ethics Evaluation/Checklist

*Purpose: The project owner (and working group) will complete this template of questions for any new project. These questions help navigate the ethical considerations / gray areas, and they should result in buy-in (or not) to pursue said project.*

### Data ownership and rights

1. How do we use or reuse data that is given to us by clients?
2. How do we store data? What about PII?
3. What information do we need to give clients about where data is housed, transferred to, etc.?

### Data collection and storage

1. How do we verify security throughout data collection, transferal, processing, and storage?
2. How do we ensure that the clients' usage aligns with PII protection and policy?
3. If relevant, precisely how are we de-identifying / anonymizing data?

### Legal

1. How do we ensure that we are legally compliant, specifically pertaining to PII protection and other client proprietary data security?
2. How do we ensure that individuals (especially protected classes) aren't being harmed or affected by unfair biases?

### Privacy etc.

1. How do we seek informed, affirmative consent from individuals?
2. How do we provide users with clear and accessible opt-out options?
3. How do we notify individuals about access to their personal data?
4. How do we notify individuals about potential changes to policy?
5. How do we inform individuals of how their data is being used and/or shared?

### Second-order effects, and third-party users

1. What information do we need to give end-users about how data is collected, stored, transferred, etc.?
2. How do we guarantee intended use of tech and data?
3. How do we maintain the intended use of tech and data down the road?

### Miscellaneous

1. How can we accurately and precisely communicate the results of our algorithms?
2. How do we ensure we won't mislead (all stakeholders)?
3. What are potential ways users can be harmed? How do we mitigate and monitor these risks?

(For *personally identifiable information (PII)*, we adopt the definition provided by the U.S. General Services Administration, and the definition of *personally identifiable data (PID)* used for GDPR compliance.)



## Post-Development Ethics Evaluation/Checklist

*One paragraph of explanation is expected for each question.*

1. Have we ensured that our technologies are legally compliant?
2. Have we listed how this technology can be attacked or abused?
3. Have we tested our training data to ensure it is fair and representative?
4. Have we studied and understood possible sources of bias in our data?
5. Does our team reflect diversity of opinions, backgrounds, and kinds of thought?
6. What kind of user consent do we need to collect and to store the data?
7. Do we have a mechanism for gathering consent from users?
8. Have we explained clearly what users are consenting to?
9. Do we have a mechanism for redress if people are harmed by the results?
10. Can we shut down this software in production?
11. Have we tested for fairness with respect to different user groups?
12. Have we tested for disparate error rates among different user groups?
13. Do we test and monitor for model drift to ensure our software remains fair over time?
14. Do we have a plan to protect and secure user data?
15. How do we communicate issues / policy changes / updates to users?



## High-level questions

*These questions and responses have been drafted by the company's Ethics Council, revised following feedback from all teams, and approved by the executive team and board.*

1. What is our stance on gathering and storing personally identifiable information (PII)?
    - We adhere to all relevant local, state, federal, international, etc. laws and regulations in the jurisdictions where we operate.
    - We strive to preserve individuals' power by giving notice and seeking consent.
    - When interacting with users (through an Augustus app, website, etc.), we offer explicit opt-in / opt-out for data sharing; opt-out is the default selection, and we do not bias selections one way or another (e.g. unlike Facebook privacy settings).
    - Gathering vs storing vs using data:
        - We'll gather and securely store data. We do not hand personally identifiable information over to business clients or third parties. (Exception to data collection and storage policy below.)

2. What is our stance on tracking individuals?
    - We may monitor individuals in specific locations, but only track across locations, businesses, etc. as outlined below.
    - Tracking by biometrics?
        - We may gather and securely store data, but we do not hand personally identifiable information over to business clients and third parties.
    - Wi-Fi tracking?
        - We only track individuals with full, informed consent.

3. Are we comfortable collecting non-PII data and handing it over to clients without knowing what it will be used for or who it will be sent to?
    - If we are utilizing the client's hardware and software to gather this data, then they own the data, unless otherwise contractually specified.
    - If we are utilizing our own hardware and software to gather this data, then we use the data and allow our clients to have access to this data through secure and limited means.
    - We explicitly state in contractual agreements that gathered and generated datasets shall not be used to derive PII, by the client or by downstream users.

4. Do we have mechanisms in place to ensure that data is only used for the intended purpose?
    - We explicitly state in contractual agreements that gathered and generated datasets shall not be used to derive PII, by the client or by downstream users.
    - Deployed systems will include data monitoring mechanisms for this purpose, when appropriate/possible.

5. What use cases will we avoid pursuing?
    - Those that enable, promote, or advance illegal activity
    - Those that cause physical harm
    - Those that encourage harmful and abusive biases

6. What governments or regimes are we unwilling to work with?
    - Those that violate internationally recognized standards regarding human rights.
    - Those on the [UN Sanctions List](https://www.un.org/securitycouncil/content/un-sc-consolidated-list)

--------------------------------------------------------------------------------
/examples/example-BO-flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-infrastructure-alliance/mltrl/30541308eefe21c99858c7661721ecfd914c44a8/examples/example-BO-flowchart.png

--------------------------------------------------------------------------------
/examples/mltrl_card_BO_level4.md:
--------------------------------------------------------------------------------
# MLTRL Card example - Bayesian optimization algorithm - level 4

| Summary info               | Content, links |
| -------------------------- | ------------- |
| Tech name (project ID)     | Mixed-variable BO (R02.1)[^1] |
| Current Level              | 4 (*link to prior level cards*) |
| Owner(s)                   | Gunes Baydin |
| Reviewer(s)                | Yarin Gal |
| Main project page          | (*link to e.g. team's internal wiki page*) |
| Req's, V&V docs            | (*link, if not on main page above*) |
| Code repo & docs           | (*link to Github etc*) |
| Ethics checks?             | WIP: *link to MVBO [checklist](../ethics_checklist.md)* |
| Coupled components         | R04.1, P13.2 |

[^1]: Note the ID convention shown is arbitrary — we use `R` for research, `02` for some index on the team's projects, and `1` for a version of the project.

**TL;DR** — MVBO is a novel Bayesian optimization (BO) algorithm for the efficient optimization of mixed-variable functions (which may represent a real-world system, data-driven model, or other parameterized process).


### Top-level requirements

*link to full R&D requirements doc (there is not yet a product req's doc)*

1. The algorithm shall jointly optimize discrete and continuous variables.
2. The BO approach shall be useful for hyperparameter optimization of data-driven models.
3. The BO approach shall effectively model and tune mechanistic equations (of physical systems).
4. The algorithm shall have a total runtime at least 1-2 orders of magnitude faster than the end-to-end system/function.
5. The method shall be transparent, surfacing info such as quantified uncertainties, bounds, etc.

**Extra notes**: Req's are intentionally succinct and specific to the mixed-variable alg variant we're developing, ignoring generic BO items that are well-studied/validated.


### Model / algorithm info

A Bayesian optimization (BO) algorithm — a probabilistic scheme for finding optimal parameters of a typically black-box, expensive function or system — but modified for mixed-variable settings: our variation, "mixed-variable BO" (MVBO), is for settings that have both discrete and continuous variables to optimize jointly.

Implementation notes:

- Gaussian process surrogate model, with standard RBF kernel
- Thompson sampling scheme, which we run until converging to a local optimum
- Implementation leverages the standard BO libraries GPyTorch and BoTorch.

**Extra notes**: Our working definition of Bayesian optimization (BO): the objective is to select and gather the most informative (possibly noisy) observations for finding the global maximum of an unknown, highly complex (e.g., non-convex, no closed-form expression or derivatives) objective function modelled by a GP, given a sampling budget (e.g., number of costly function evaluations). The rewards (informativeness measure) are defined using an improvement-based (e.g. probability of improvement or expected improvement over the currently found maximum), entropy-based, or upper confidence bound (UCB) acquisition function.


### Intended use

- Optimize the parameters of an expensive, black-box function `f` with constraints.
- This algorithm is specifically useful when `f` is mixed-variable: it contains both discrete and continuous variables to optimize.
- ML model parameter tuning is a common example for BO. With MVBO we can tune the hyperparameters of a convnet: continuous variables (e.g. learning rate, momentum) and discrete variables (e.g. kernel size, stride, padding), subject to constraints – some combos of kernel/stride/padding lead to invalid networks. An illustrative sketch follows below.
- Other applications include optimization in sensor placement / array design.

**Extra notes**: MVBO use-cases are still exploratory, as this began at Level 1 without a specific application area in mind.
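
To make the intended use concrete, here is a minimal, illustrative sketch of a single mixed-variable BO iteration on a toy two-variable problem, written against the standard BoTorch/GPyTorch building blocks named above. It is not the MVBO implementation or its API: the toy objective, bounds, and discrete choices are placeholders, and an off-the-shelf expected improvement acquisition stands in for MVBO's Thompson-sampling scheme.

```python
# Illustrative sketch, not the MVBO implementation. One iteration of
# mixed-variable BO on a toy objective, using standard BoTorch/GPyTorch
# utilities; expected improvement stands in for MVBO's Thompson sampling.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf_mixed
from gpytorch.kernels import RBFKernel, ScaleKernel
from gpytorch.mlls import ExactMarginalLogLikelihood

torch.manual_seed(0)


def toy_objective(x: torch.Tensor) -> torch.Tensor:
    # x[..., 0]: continuous (e.g. learning rate); x[..., 1]: discrete (e.g. kernel size)
    return -((x[..., 0] - 0.3) ** 2) - 0.1 * (x[..., 1] - 3.0) ** 2


# 2 x d bounds: row 0 = lower, row 1 = upper; variable 1 takes values {1, 3, 5}
bounds = torch.tensor([[0.0, 1.0], [1.0, 5.0]], dtype=torch.double)
train_x = bounds[0] + (bounds[1] - bounds[0]) * torch.rand(8, 2, dtype=torch.double)
train_x[:, 1] = torch.tensor([1.0, 3.0, 5.0], dtype=torch.double)[torch.randint(0, 3, (8,))]
train_y = toy_objective(train_x).unsqueeze(-1)

# GP surrogate with an RBF kernel (as in the implementation notes above)
gp = SingleTaskGP(train_x, train_y, covar_module=ScaleKernel(RBFKernel(ard_num_dims=2)))
fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

# Propose the next candidate, enumerating the discrete variable's valid values
acqf = ExpectedImprovement(gp, best_f=train_y.max())
candidate, acq_value = optimize_acqf_mixed(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=5,
    raw_samples=64,
    fixed_features_list=[{1: 1.0}, {1: 3.0}, {1: 5.0}],
)
print(candidate, acq_value)
```

Run end to end, this proposes one new (learning rate, kernel size) candidate; the actual MVBO loop wraps such a step in the Thompson-sampling scheme and convergence checks noted above.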

### Testing status

- Low-level tests verify the algorithm can find solutions for simple mixed-variable functions
- Tests verify the algorithm converges for standard BO problems
- There are several unit tests on the BO loop, but more are needed (notably for diverse parameter sets)
- The MVBO algorithm converges to a solution on optimization benchmark problems in 1.0 s or less on a 4-core CPU.

**Extra notes**: Base BO tests are assumed valid from the source BoTorch and GPyTorch repositories. A sketch of this kind of low-level check is shown below.
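
For illustration, a minimal sketch of the kind of low-level convergence check listed above. The `mvbo.optimize` entry point, its arguments, and its return signature are hypothetical placeholders; the project's actual test suite lives in the linked repo.

```python
# Hypothetical sketch of a low-level convergence test; `mvbo.optimize` and its
# signature are placeholders, not the project's real API.
import math

import pytest

from mvbo import optimize  # hypothetical entry point (placeholder)


def quadratic_mixed(x: float, k: int) -> float:
    # Known optimum at x = 0.5, k = 3
    return -((x - 0.5) ** 2) - 0.05 * (k - 3) ** 2


@pytest.mark.parametrize("seed", [0, 1, 2])
def test_mvbo_finds_known_optimum(seed):
    best_x, best_k, best_val = optimize(
        objective=quadratic_mixed,
        continuous_bounds=[(0.0, 1.0)],
        discrete_choices={"k": [1, 2, 3, 4, 5]},
        budget=30,
        seed=seed,
    )
    assert best_k == 3
    assert math.isclose(best_x, 0.5, abs_tol=0.05)
    assert best_val >= quadratic_mixed(0.45, 3)
```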

### Data considerations

Benchmark experiments have been run on standard optimization benchmarks (e.g. the Branin-Hoo function) and also newer ML-based benchmarks: [github.com/uber/bayesmark](https://github.com/uber/bayesmark)

Given the code refactoring at this stage to interface with (and extend) PyTorch-based BO libraries (BoTorch and GPyTorch), the MVBO implementation expects PyTorch-based data structures and data handlers.

There has not yet been extensive testing on noisy and sparse datasets, which should be done in the context of applications at later stages.


### Caveats, known edge cases, recommendations

- We have not yet run experiments to fully understand the algorithm relative to other BO components – e.g. GP vs RF as surrogate models.
- For most BO problems you are best off starting with default algorithms (see BoTorch)


### MLTRL stage debrief

1. What was accomplished, and by who?

    Investigated the algorithm's fundamental properties for efficiently navigating solution spaces of BO benchmark problems over mixed-variable domains. Codebase refactored to (1) follow spec and extend the industry-standard libs BoTorch and GPyTorch, and (2) include an automated unit test suite. See *link to experiments page w/ plots*.

2. What was punted and/or de-scoped?

    n/a

3. What was learned?

    - The algorithm behaves similarly to two (coupled) EI acquisition functions.
    - Convergence properties satisfied, on par w/ existing BO algorithms for computational efficiency: *link to notebook of results*.

4. What tech debt was gained? Mitigated?

    - Moved code from team sandbox (research-caliber) to the mainline R&D repo, with sufficient unit tests (over MVBO), integration tests (with BoTorch and GPyTorch), and lightweight synthetic test data generators.

---

--------------------------------------------------------------------------------
/examples/mltrl_card_BO_level8.md:
--------------------------------------------------------------------------------
# MLTRL Card example - Bayesian optimization algorithm - level 8

| Summary info               | Content, links |
| -------------------------- | ------------- |
| Tech name (project ID)     | Dynamic Solar Array Optimization (P13.2) |
| Current Level              | 8 (*link to prior level cards*) |
| Owner(s)                   | Alexander Lavin |
| Reviewer(s)                | Eric Xing, Dava Newman |
| Main project page          | (*link to e.g. team's internal wiki page*) |
| Req's, V&V docs            | (*link, if not on main page above*) |
| Code repo & docs           | (*link to Github etc*) |
| Ethics checks?             | Yes: *link to completed [checklist](../ethics_checklist.md)* |
| Coupled components         | R02.1, P12.2 |

**TL;DR** — Applying our MVBO algorithm in a systems optimization context, specifically to deploy with a new (greenfield) solar array control system, running in both live/production and shadow/simulated settings.


### Top-level requirements

*links to full product requirements doc (and the R&D requirements doc if applicable from earlier levels)*

1. The MVBO algorithm shall run efficiently and reliably as a module in the broader industrial control system.
2. The optimization module shall continuously validate and deploy both production and shadow instances.
3. The MVBO algorithm and broader optimization module shall test for, be robust against, and log any shifts in data, environment, or other operations.
4. The control system optimization scheme shall be resilient to faults and intrusions (internally and externally).
5. The optimization module shall have a total runtime less than 5% of the end-to-end control software pipeline's runtime.
6. The optimization module shall have a fallback/bypass mode in case of unforeseen failure.

**Extra notes**: More product requirements exist for the specific customer deployments and/or distribution channels, not listed here in the interest of space.


### Model / algorithm info

![Solar array optimization flowchart](example-BO-flowchart.png?raw=true "Solar array optimization flowchart")

**Extra notes**: See previous Cards for details on MVBO


### Intended use

Closed-loop constrained optimization of an industrial control system that operates the positioning (i.e. actuation) of solar panels in a given array/field. Extensive V&V testing must be done in simulation with real data and synthetic data prior to live deployment.

**Extra notes**: At this advanced MLTRL stage there should be a product datasheet/whitepaper that elucidates the use-cases.

### Testing status

The following should be implemented and running (with visible status indicators and continuous logs):

1. CI/CD test suite implementation and status page
2. Cyber-physical systems tests for device and network bandwidths/latencies
3. Optimization module's shadow testing suite
4. Optimization module's testing suite for environment/data shifts


### Data considerations

Production data should be assumed to be specific to each deployment / environment, even with our domain randomization techniques. This also holds for the synthetic datasets, which are generated based on a limited set of real data.
If there is not sufficient data to validate the optimization module ahead of a given deployment, we can ship the control system software with the module in "pass through" mode such that we can start acquiring real system-specific data and release MVBO features in later updates; a sketch of this fallback/pass-through behavior is shown below.
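
To make requirement 6 and the "pass through" shipping mode concrete, below is a minimal sketch of a bypass wrapper around the optimization module; the class, method, and logger names are placeholders, not the production module's API.

```python
# Minimal sketch of the fallback / "pass through" behavior described above.
# Names are placeholders, not the production module's API.
import logging
from typing import Callable, Sequence

logger = logging.getLogger("solar_opt")

Setpoints = Sequence[float]


class OptimizationModule:
    """Wraps the MVBO-based setpoint optimizer with a bypass path (req. 6)."""

    def __init__(self, optimizer: Callable[[Setpoints], Setpoints], enabled: bool = True):
        self.optimizer = optimizer  # e.g. the MVBO-based setpoint optimizer
        self.enabled = enabled      # False => ship in "pass through" mode

    def propose_setpoints(self, baseline: Setpoints) -> Setpoints:
        """Return optimized actuator setpoints, or the baseline on bypass/failure."""
        if not self.enabled:
            logger.info("optimization disabled; passing baseline setpoints through")
            return baseline
        try:
            return self.optimizer(baseline)
        except Exception:
            # Unforeseen failure must never block the control loop.
            logger.exception("optimizer failed; falling back to baseline setpoints")
            return baseline
```

The same pattern can support the shadow instances of requirement 2: compute the proposal, log it, and still return the baseline setpoints to the controller.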

### Caveats, known edge cases, recommendations

In very rare cases, which we've only seen in (intentionally challenging) simulated scenarios, the optimization module will try to shift half the solar field m degrees and the other half n degrees, and then back again on the next control loop. Simply adding random noise to several of the actuators eliminates this scenario without affecting performance metrics.

### MLTRL stage debrief

1. What was accomplished, and by who?

    Containerization and integration of the optimization subsystem into a production-ready software module.

2. What was punted and/or de-scoped?

    n/a

3. What was learned?

    Runtime of the optimization module is insignificant relative to the broader control software system, although most of the CI/CD testing is focused on optimization scenarios and robustness.

4. What tech debt was gained? Mitigated?

    Potentially mitigated by containerizing the optimization subsystem such that we may release it as an independent package for open source (and thus gain feedback from external use-cases). Note that open-sourcing is in the backlog for this project (i.e. the solar-opt epic on Jira).

---

--------------------------------------------------------------------------------
/mltrl_card.md:
--------------------------------------------------------------------------------
# MLTRL Cards

In the ML/AI Technology Readiness Levels framework ([Lavin et al. '21](https://arxiv.org/abs/2101.03989)), the maturity of each ML technology is tracked via "report cards" called *MLTRL cards*. As a given technology or project proceeds through the framework, a Card evolves corresponding to the progression of technology readiness levels. Cards provide a means of inter-team and cross-functional communication. Where the systems engineering framework itself defines a lingua franca for ML development and adoption, these *MLTRL cards* are like a [pilot's flight plan](https://en.wikipedia.org/wiki/Flight_plan) (and the MLTRL ethics checklist is like the pilot *and* crew [preflight checks](https://pilotinstitute.com/pre-flight-checks/)).

The content of an MLTRL Card falls roughly into two categories: project info, and implicit knowledge. The former clearly states info such as project owners and reviewers, development status, datasets, code and deployment characteristics, and semantic versioning (for code, models and data). In the latter category, Cards communicate specific insights that are typically siloed in the ML development team but should be transparent to other stakeholders: modeling assumptions, dataset biases, corner cases, etc. These critical details can be incomplete or ignored if Cards are treated as typical software documentation afterthoughts, and thus should be prioritized in project workflows; in MLTRL, Cards are key for progressing to subsequent development levels.

The aim is to promote transparency and trust, within teams and across organizations.

Here we include a template for MLTRL practitioners to copy/fork for use in their projects, teams, organizations, and so on. Please refer to the full paper for more details and context, and cite the journal publication where appropriate: (todo: replace with DOI on publication) [arxiv.org/abs/2101.03989](https://arxiv.org/abs/2101.03989).

The best substrates/platforms for MLTRL Cards we've seen are git-traceable markdown (as in this repo) and internal company docs on platforms such as Nuclino and Confluence. These exemplify several **necessary features of whatever tool you choose: Card provenance, cross-linking with users and pages, and living/editable docs that mitigate stagnation.**


### Comparisons

MLTRL Cards aim to be more information-dense than the comparable efforts below; we're aiming for the utility of datasheets for medical devices and engineering tools, rather than high-level product whitepapers.

Recent ["ML model cards"](https://arxiv.org/abs/1810.03993) are related but are not nearly as thorough and robust — from a [Google Cloud blog post](https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-ai-explanations-to-increase-fairness-responsibility-and-trust): `"Our aim with these first model card examples is to provide practical information about models' performance and limitations in order to help developers make better decisions about what models to use for what purpose and how to deploy them responsibly."`

Additional ML model card comparisons:

- [Tensorflow Model Card Toolkit](https://github.com/tensorflow/model-card-toolkit)
- [🤗-Building a model card](https://huggingface.co/course/chapter4/4?fw=pt)
- GPT2 examples: [HuggingFace](https://huggingface.co/gpt2) and [OpenAI](https://github.com/openai/gpt-2/blob/master/model_card.md)

The HuggingFace GPT2 example is the closest to the MLTRL Card deliverable: clean yet detailed, with actual examples that are informative towards practical use (rather than simplistic Google card 'results'), versioned and taggable.


### Complements

MLTRL Cards should leverage other community tools (documentation, provenance, etc.) where beneficial. For instance, **datasets** should have their own "datacards", which we don't specify (yet) in MLTRL.

- Google's "Datasheets for Datasets" ([paper](https://arxiv.org/abs/1803.09010), [template](https://research.google/static/documents/datasets/crowdsourced-high-quality-colombian-spanish-es-co-multi-speaker-speech-dataset.pdf)) — it is straightforward to follow this practice within the context of MLTRL. (Note that Microsoft also provides a version, but in a format that is less implementable and transparent as a deliverable: [MS Datasheets for Datasets](https://www.microsoft.com/en-us/research/project/datasheets-for-datasets/))
- Semantic versioning for datasets, prescribed in MLTRL, is becoming a standard practice and is supported by many tools: for example, one can easily coordinate datacards and MLTRL Cards with [DVC](https://dvc.org/), including programmatic tools for keeping track of data commits/tags and data artifacts/registries (see the sketch below).
- Data accountability, ethics, and overall best-practices are constantly evolving areas that should be tracked, potentially for incorporating new methods into MLTRL, and potentially for MLTRL learning lessons to inform the field. [Hutchinson et al. '21](https://arxiv.org/abs/2010.13561) is a good place to start.
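
As a minimal sketch of that coordination, assuming a DVC-tracked data registry (the repository URL, dataset path, and tag below are placeholders):

```python
# Minimal sketch: resolve and read a dataset at the exact version recorded in
# an MLTRL Card. The repo URL, path, and tag below are placeholders.
import dvc.api

DATA_REPO = "https://github.com/example-org/data-registry"  # hypothetical registry repo
DATASET = "data/telemetry.parquet"                          # hypothetical dataset path
DATA_TAG = "v1.2.0"                                         # git tag cited in the Card

# Storage URL for the tagged version (handy to paste into the Card's
# "Data considerations" section)...
print(dvc.api.get_url(DATASET, repo=DATA_REPO, rev=DATA_TAG))

# ...or stream that pinned version directly for training / validation runs.
with dvc.api.open(DATASET, repo=DATA_REPO, rev=DATA_TAG, mode="rb") as f:
    blob = f.read()
```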

---


## Card Outline

It's useful to view the Card contents in the context of real example Cards: [Level 4 BayesOpt Card](examples/mltrl_card_BO_level4.md)

### Card content

> First is a quick summary table...

- Tech name, project ID
- Current level
- Owner(s)
- Reviewer(s)
- link to main project page (in company wiki, for example)
- link to code, documentation
- link to ethics checklist

> Then we have more details...

#### Top-level requirements

A quick view of the main req's is very handy for newcomers and stakeholders to quickly grok the tech and development approach. The req's listed here are top-level, i.e. referenced with integers 1, 2, 3, ... aligned with the full project req's document, which follows the format `requirement number.subset.component`.

Then there will be link(s) to the full requirements + V&V table; refer to the main manuscript to see how MLTRL defines *research-* and *product-requirements*, and the use of verification and validation (V&V).

#### Model/algorithm info

Concise "elevator pitch" — think from the perspective of an MLTRL reviewer who may be a domain expert but not skilled in ML.

#### Intended use

This can have mission-critical information and should clearly communicate the what, how, and why of intended *and* unacceptable use-cases. In general this section is most verbose at the early stages of commercialization (Levels 5-8).

#### Testing status

What's tested algorithmically? How is the code/implementation tested? What testing recommendations should be acted on in other MLTRL stages?

#### Data considerations

1. Refer to the MLTRL manuscript for level-by-level data specs.
2. Highlight important/interesting data findings — for example, class imbalances, subpar labels, noise and gaps, acquisition assumptions, etc.
3. Point to specific datasets in use (internal and external sources) and data versioning info.
4. Explain precisely what data and conditions a model has been trained and developed on.

Note this section's content, amongst others, can vary significantly depending on the ML model or algorithm and the domain. For instance, this section can be verbose with examples for image models, but not necessarily for time series algorithms. And in fields such as medicine there should be notes on acquisition, sharing, privacy, and ethics that may not be as significant in e.g. manufacturing.

#### Caveats, known edge cases, recommendations

Additional notes to highlight — for example, this is a good place to call out potential biases, technical or knowledge debt, and other matters to be considered in later stages.


#### MLTRL stage debrief

Succinct summary of stage progress, in a question-response format that is specified at review time (or using defaults like those in the [MLTRL Card examples](examples/)).
--------------------------------------------------------------------------------