**Educating data science teams regarding the risks of trying to use Jupyter Notebooks in production**

- Communicating the SDLC and the lessons of DevOps in a way that is comfortable for the audience
- Highlighting the dangers of using RAD tools in production
- Demonstrating simple alternatives that are easy to use as part of an MLOps process

**Treat ML assets as first-class citizens in DevOps processes**

- Extend CI/CD tools to support ML assets
- Integrate ML assets with version control systems

**Providing mechanisms by which training/testing/validation data sets, training scripts, models, experiments, and service wrappers may all be versioned appropriately**

- All assets under version control
- ML assets include sets of data
- Data volumes may be large
- Data may be sensitive
- Data ownership may be complex
- Multiple platforms may be involved

**Providing mechanisms by which changes to training/testing/validation data sets, training scripts, models, experiments, and service wrappers may all be auditable across their full lifecycle**

- Models may be retrained based upon new training data
- Models may be retrained based upon new training approaches
- Models may be self-learning
- Models may degrade over time
- Models may be deployed in new applications
- Models may be subject to attack and require revision
- Incidents may occur that require root cause analysis and change
- Corporate or government compliance may require audit or investigation

**Treating training/testing/validation data sets as managed assets under an MLOps workflow**

- Content of training/testing/validation data sets is essential to the audit process and root cause analysis
- Data is not typically treated as a managed asset under conventional CI/CD
- Data may reside across multiple systems
- Data may only be able to reside in restricted jurisdictions
- Data storage may not be immutable
- Data ownership may be a factor

**Managing the security of data in the MLOps process with particular focus upon the increased risk associated with aggregated data sets used for training or batch processing**

- Aggregated data carries additional risk and represents a higher value target
- Migration of data into Cloud environments, particularly for training, may be problematic

**Implications of privacy, GDPR and the right to be forgotten upon training sets and deployed models**

- The right to use data for training purposes may require audit
- Legal compliance may require removing data from audited training sets, which in turn implies the need to retrain and redeploy models built from that data
- The right to be forgotten or the right to opt out of certain data tracking may require per-user reset of model predictions

**Methods for wrapping trained models as deployable services in scenarios where data scientists training the models may not be experienced software developers with a background in service-oriented design**

Operational use of a model brings new architectural requirements that may sit outside the domain of expertise of the data scientists who created it. Model developers may not be software developers and may therefore experience challenges implementing APIs around their models and integrating them within solutions.

**Approaches for enabling all Machine Learning frameworks to be used within the scope of MLOps, regardless of language or platform**

It is a common mistake to assume that ML equals Python. Many commonly used frameworks such as PyTorch and TensorFlow exist and can be expected to continue to proliferate. MLOps must not be opinionated about frameworks or languages.

**Approaches for enabling MLOps to support a broad range of target platforms, including but not limited to CPU, GPU, TPU, custom ASICs and neuromorphic silicon**

Choice of training platform and operational platform may be varied and could be different for separate models in a single project.

**Methods for ensuring efficient use of hardware in both training and operational scenarios**

- Hardware accelerators are expensive and difficult to virtualise
- Cadence of training activities impacts cost of ownership
- Elastic scaling of models against demand in operation is challenging when based upon hardware acceleration

**Approaches for applying MLOps to very large scale problems at petabyte scale and beyond**

- Problems associated with moving and refreshing large training sets
- Problems associated with distributing training loads across hardware accelerators
- Problems with the speed of distributing training data to the correct hardware accelerators
- Problems of provisioning / releasing large pools of hardware resources

**Providing appropriate pipeline tools to manage MLOps workflows transparently as part of existing DevOps solutions**

- Integration of ML assets with existing CI/CD solutions
- Extending Cloud-native build tools to support allocation of ML assets, data and hardware during training builds
- Hardware pool management

**Testing ML assets appropriately**

- Conventional Unit / Integration / BDD / UAT testing
- Adversarial testing
- Bias detection
- Fairness testing
- Ethics testing
- Interpretability
- Stress testing
- Security testing

**Governance processes for managing the release cycle of MLOps assets, including Responsible AI principles**

- Managing release of new training sets to data science teams
- Establishing thresholds for acceptable models
- Monitoring model performance (and drift) over time to feed into thresholds for retraining and deployments
- Managing competitive training of model variants against each other in dev environments
- Managing release of preferred models into staging environments for integration and UAT
- Managing release of specific model versions into production environments for specific clients/deployments
- Managing root cause analysis for incident investigation
- Observability / interpretability
- Explainability and compliance

**Management of shared dependencies between training and operational phases**

A number of ML approaches require reusable resources that are applied both during training and during the pre-processing of data being passed to operational models. It is necessary to be able to synchronise these assets across the lifecycle of the model, e.g. preprocessing, validation, word embeddings etc.

**Abstraction for models**

Stored models are currently often little more than serialised objects. To decouple training languages, platforms and hardware from operational languages, platforms and hardware, it is necessary to have broadly supported standard intermediate storage formats for models that can be used reliably to decouple training and operational phases.

**Longevity of ML assets**

Decision-making systems can be expected to require very long effective operating lifetimes. It will be necessary in some scenarios to be able to refer to instances of models across significant spans of time, so forward and backward compatibility, storage formats and the migration of long-running transactions all need to be considered.

**Managing and tracking trade-offs**

Solutions including ML components will be required to manage trade-offs between multiple factors, for example in striking a balance between model accuracy and customer privacy, or between explainability and the risk of revealing data about individuals in the data set. It may be necessary to provide intrinsic metrics to help customers balance these equations in production. It should also be anticipated that customers will need to be able to safely A/B test different scenarios to measure their impact upon this balance.

**Escalation of data categories**

As a side effect of applying governance processes to check for fairness and bias within models, it may become necessary to hold special category data providing information about race, religion or belief, sexual orientation, disability, pregnancy or gender reassignment in order to detect such biases. As a result, there will be an escalation in data sensitivity and in the legal constraints that apply to the solution.

**Intrinsic protection of models**

Models are vulnerable to certain common classes of attack, such as:

- Black box attempts to reverse engineer them
- Model inversion attacks attempting to extract data about individuals
- Membership inference attacks attempting to verify the presence of individuals in data sets
- Adversarial attacks using tailored data to manipulate outcomes

It should be anticipated that there will be a need for generic protections against these classes of attack across all deployed models.

**Emergency cut out**

As models are typically trained with production/runtime data and act on production data, there can be cases where undesirable behaviour of a recently deployed model change only becomes apparent in a production environment. One example is a chat bot that starts to use inappropriate language in reaction to certain interactions and training sets. There is a need to be able to quickly cut out a model, or roll back immediately to an earlier version, should this happen.

**Online learning**

There are systems in use that rely on online learning, where a model or similar evolves in near real time with the data flowing in (and there is not necessarily a deploy stage). Such systems may also modify themselves at runtime without any sort of rebuilding or human approval. This will require further research and observation of emerging practices and use cases for this approach to machine learning.

**Educating data science teams regarding the risks of trying to use Jupyter Notebooks in production**

Jupyter Notebooks are the tool we use to educate Data Scientists, as they can easily be used to explore ad hoc ML problems incrementally on a laptop. Unfortunately, when all you have is a hammer, everything tends to look like a nail. We see Jupyter Notebooks featuring in production scenarios, not because they are the best tool for the job, but because they are the tool we taught people to use, and because we didn't teach about any of the problems inherent in using that tool. This approach persists because of an ongoing gap in the tool chain.

- Improved technology solutions are required that enable Data Scientists to easily run experiments at scale on elastic Cloud resources in a consistent, audited and repeatable way.
- These solutions should integrate with existing Integrated Development Environments, Source Code Control and Quality Management tools.
- Solutions should integrate with CI/CD environments so as to facilitate Data Scientists setting up multiple variations on a training approach and launching these as parallel, audited activities on Cloud infrastructure.
- Working with the training of models and the consumption of trained models should be a seamless experience within a single toolchain.
- Tooling should introduce new Data Scientists to the core concepts of software asset management, quality management and automated testing.
- Tooling should enable appropriate governance processes around the release of ML assets into team environments.

**Treat ML assets as first-class citizens in DevOps processes**

ML assets include, but are not limited to, training sets, training configurations (hyper-parameters), scripts and models. Some of these assets are large; others are more like source code assets. We should be able to track changes to all of these assets and see how the changes relate to each other. Reporting on popular DevOps metrics like "Change Failure Rate" and "Mean Cycle Time" should be possible if this is done correctly. Whilst some of these assets are large, they are not without precedent in the DevOps world. People have been handling changes to database configurations, and the copying and backup of data, for some time, so the same should apply to all ML assets. The following sections explore how this may be implemented.

**Providing mechanisms by which training sets, training scripts and service wrappers may all be versioned appropriately**

Training sets are prepared data sets which are extracted from production data. They may contain PII or sensitive information, so making the data available to developers in a way analogous to source code may be problematic. Training scripts are smaller artefacts but are sometimes coupled to the training sets. All scripts should be versioned in a way that connects them to the training sets they are associated with. Scripts and training sets are also coupled to metadata, such as instructions on how training sets are split up for testing and validation, so that a model can be reproduced, ideally in a deterministic manner, if required.

**Providing mechanisms by which changes to training sets, training scripts and service wrappers may all be auditable across their full lifecycle**

It is an essential requirement that all changes to ML assets result in a robust audit trail capable of meeting forensic standards of analysis. It must be possible to work backwards from a given deployment of assets at a specified time, tracing all changes to this set of assets and the originators of each change. Tooling supporting decision-making systems in applications working with sensitive personal data or life-threatening situations must support non-repudiation and immutable audit records.

It should be anticipated that customers will be operating in environments subject to legal or regulatory compliance requirements that vary by industry and jurisdiction and may demand differing standards of audit transparency, including requirements for third-party audit.

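One possible way to approximate tamper-evident, immutable audit records is an append-only, hash-chained log of asset changes. The sketch below is a minimal illustration of that idea only; the record fields (asset ID, change description, author) are hypothetical and real tooling would persist the chain in durable, access-controlled storage.

```python
# Minimal sketch of an append-only, hash-chained audit trail for ML assets.
import hashlib
import json
import time


def _digest(payload: dict, previous_hash: str) -> str:
    """Hash the record together with the previous hash to chain entries."""
    body = json.dumps(payload, sort_keys=True) + previous_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only log in which each entry commits to everything before it."""

    def __init__(self):
        self._records = []

    def record_change(self, asset_id: str, change: str, author: str) -> dict:
        payload = {
            "asset_id": asset_id,
            "change": change,
            "author": author,
            "timestamp": time.time(),
        }
        previous_hash = self._records[-1]["hash"] if self._records else ""
        entry = {"payload": payload, "hash": _digest(payload, previous_hash)}
        self._records.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain to detect tampering with earlier entries."""
        previous_hash = ""
        for entry in self._records:
            if entry["hash"] != _digest(entry["payload"], previous_hash):
                return False
            previous_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.record_change("model:churn", "retrained on revised training set", "alice")
trail.record_change("service:churn-api", "deployed model version 1.3.0", "bob")
assert trail.verify()
```
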
**Treating training sets as managed assets under an MLOps workflow**

One of the difficult challenges for MLOps tooling is to be able to treat data as a managed asset in an MLOps workflow. Typically, it should be expected that traditional source code control techniques are inapplicable to data assets, which may be very large and reside in environments that are not amenable to tracking changes in the same manner as source code. New metadata techniques must be created to effectively record and manage aggregates of data that represent specific versions of training, testing or validation sets. Given the variety of data storage mechanisms in common usage, this will likely necessitate pluggable extensions to tooling.

It must be possible to define a specific set of data which can be introduced into MLOps tooling for a given training run to produce a model version, and to retrospectively inspect the set of data that was used to create a known model version in the past. Multiple model instances may share one data set, and subsequent iterations of a model may be required to be regression tested against specific data set instances. It should be recognised that data sets are long-lived assets that may have compliance implications but which may also need to be edited in response to data protection requests, invalidating all models based upon that set.

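One candidate metadata technique is a manifest that records a named data set version as a list of content hashes over the underlying files, wherever they reside. The sketch below, using invented field names and a local directory purely for illustration, shows the general shape; real tooling would need pluggable back-ends for the storage systems involved.

```python
# Illustrative manifest describing a specific data set version by content hash.
import hashlib
import json
from pathlib import Path
from typing import List


def file_fingerprint(path: Path) -> str:
    """Content hash of a single data file, independent of where it is stored."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(name: str, version: str, files: List[Path]) -> dict:
    """Describe a specific training/testing/validation data set version."""
    return {
        "name": name,
        "version": version,
        "files": [
            {"path": str(p), "sha256": file_fingerprint(p)} for p in sorted(files)
        ],
    }


# A training run would record the manifest alongside the resulting model so the
# exact data set version can be inspected retrospectively.
manifest = build_manifest("churn-training", "2020.06.1",
                          list(Path("data").glob("*.csv")))
Path("churn-training-2020.06.1.json").write_text(json.dumps(manifest, indent=2))
```
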
**Managing the security of data in the MLOps process with particular focus upon the increased risk associated with aggregated data sets used for training or batch processing**

The vast majority of MLOps use cases can be expected to involve mission-critical applications, sensitive personal data and large aggregated data sets, all of which represent high-value targets with a high impact from a security breach. As a result, MLOps tooling must adopt a 'secure-by-design' position rather than assuming that customers will harden their deployments as part of their responsibilities.

Solutions must not default to insecure configurations for convenience, nor should they provide user-facing options that invalidate system security as a side effect of adding functionality.

**Implications of privacy, GDPR and the right to be forgotten upon training sets and deployed models**

- Tooling should provide mechanisms for auditing individual permissions to use source data as part of training sets, with the assumption that permission may be withdrawn at any time
- Tooling should provide mechanisms to invalidate the status of deployed models where permissions to use have been revoked
- Tooling may optionally provide mechanisms to automatically retrain and revalidate models on the basis of revoked permissions (see the sketch following this list)
- Tooling should facilitate user-specific exceptions to model invocation rules where this is necessary to enable the right to be forgotten or to support the right to opt out of certain types of data tracking

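As a rough illustration of the second and third points above, a consent registry could track which data subjects contributed to each training set and flag dependent models for retraining when consent is withdrawn. The sketch below uses invented names and in-memory structures purely to show the relationship between revocation and model invalidation.

```python
# Sketch: linking consent revocation to model invalidation (names are invented).
from typing import Dict, List, Set


class ConsentRegistry:
    """Track which subjects contributed to which training sets, and which
    deployed models were built from those sets."""

    def __init__(self):
        self.subjects_by_dataset: Dict[str, Set[str]] = {}
        self.models_by_dataset: Dict[str, List[str]] = {}
        self.invalidated_models: Set[str] = set()

    def register_training_set(self, dataset_id: str, subject_ids: Set[str]) -> None:
        self.subjects_by_dataset[dataset_id] = set(subject_ids)

    def register_model(self, model_id: str, dataset_id: str) -> None:
        self.models_by_dataset.setdefault(dataset_id, []).append(model_id)

    def revoke_consent(self, subject_id: str) -> List[str]:
        """Mark every model trained on this subject's data as needing retraining
        against a cleaned data set."""
        affected: List[str] = []
        for dataset_id, subjects in self.subjects_by_dataset.items():
            if subject_id in subjects:
                subjects.discard(subject_id)
                affected.extend(self.models_by_dataset.get(dataset_id, []))
        self.invalidated_models.update(affected)
        return affected


registry = ConsentRegistry()
registry.register_training_set("churn-2020.06", {"user-1", "user-2"})
registry.register_model("churn-model-1.3.0", "churn-2020.06")
print(registry.revoke_consent("user-2"))  # ['churn-model-1.3.0']
```
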
**Methods for wrapping trained models as deployable services in scenarios where data scientists training the models may not be experienced software developers with a background in service-oriented design**

Models that have been trained and which have passed acceptance testing need to be deployed as part of a broader application. This might take the form of elastically scalable Cloud services within a distributed web application, or an embedded code/data bundle within a mobile application or other physical device. In some cases, it may be expected that models need to be translated to an alternate form prior to deployment, perhaps as components of a dedicated FPGA or ASIC in a hardware solution. A minimal illustrative service wrapper is sketched after the requirements below.

- MLOps tooling should integrate and simplify the deployment of models into customer applications, according to the architecture specified by the customer
- MLOps tooling should not force a specific style of deployment for models, such as a dedicated, central 'model server'
- Tooling should assume that model execution in Cloud environments must be able to scale elastically
- Tooling should allow for careful management of the execution cost of models in Cloud environments, to mitigate the risk of unexpected proliferation of consumption of expensive compute resources
- Tooling should provide mechanisms for out-of-the-box deployment of models against common architectures, with the assumption that customers may not be expert service developers
- Tooling should provide automated governance processes to manage the release of models into production environments in a controlled manner
- Tooling should provide the facility to upgrade and roll back deployed models across environments
- It should be assumed that models represent reusable assets that may be deployed in the form of multiple instances at differing point release versions across many independent production environments
- It should be assumed that more than one point release version of a model may be deployed concurrently in order to support phased upgrades of system functionality across customer environments

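For illustration only, a minimal HTTP wrapper around a trained model might look like the sketch below. It assumes a scikit-learn-style model (anything exposing `predict`) saved at a hypothetical path `model.joblib`, and uses Flask purely as an example; as noted above, tooling should not mandate any particular deployment style.

```python
# Minimal sketch of a service wrapper around a trained model.
# Assumes `model.joblib` was produced earlier by the training pipeline.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # any object exposing predict()
MODEL_VERSION = "1.3.0"              # surfaced for audit and roll-back


@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [[5.1, 3.5, 1.4, 0.2]]
    prediction = model.predict(features).tolist()
    return jsonify({"model_version": MODEL_VERSION, "prediction": prediction})


@app.route("/healthz", methods=["GET"])
def healthz():
    # Liveness endpoint so orchestrators can manage upgrades and roll-backs.
    return jsonify({"status": "ok", "model_version": MODEL_VERSION})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```
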
**Approaches for enabling all Machine Learning frameworks to be used within the scope of MLOps, regardless of language or platform**

MLOps is a methodology that must be applicable in all environments, using any programming language or framework. Implementations of MLOps tooling may be opinionated about the approach to the methodology but must be agnostic to the underlying technologies used to implement the associated models and services.

It should be possible to use MLOps tooling to deploy solution components that utilise different languages or frameworks, using loosely-coupled principles to provide compatibility layers.

**Approaches for enabling MLOps to support a broad range of target platforms, including but not limited to CPU, GPU, TPU, custom ASICs and neuromorphic silicon**

MLOps should be considered as a cross-compilation problem where the architectures of the source and target platforms may be different. In trivial cases, models may be trained on, say, CPU or GPU and deployed to execute on the same CPU or GPU architecture; however, other scenarios already exist and should be expected to become increasingly common in the future. This may include training on GPU / TPU and executing on CPU or, in edge devices, training on any architecture and then translating the models into physical logic that can be implemented at very low cost / size / power directly on FPGA or ASIC devices.

This implies the need for architecture-independent intermediate formats to facilitate cross-deployment or cross-compilation onto target platforms.

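ONNX is one existing example of such an architecture-independent intermediate format. The sketch below, offered only as an illustration, exports a trivial PyTorch model to ONNX and executes it with ONNX Runtime on CPU; the same file could equally be consumed by a GPU, accelerator or edge runtime.

```python
# Illustrative only: train-side export to an intermediate format (ONNX)
# and inference-side execution with a separate runtime.
import numpy as np
import torch
import onnxruntime as ort

# Stand-in for a trained model: a single linear layer.
model = torch.nn.Linear(4, 1)
model.eval()

# Export to an architecture-independent intermediate representation.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# The operational environment loads the intermediate format with its own
# runtime, with no dependency on the training framework or hardware.
session = ort.InferenceSession("model.onnx")
result = session.run(None, {"input": np.random.randn(1, 4).astype(np.float32)})
print(result[0])
```
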
**Methods for ensuring efficient use of hardware in both training and operational scenarios**

ML training and inferencing are typically very processing-intensive operations that can be expected to be accelerated by the use of dedicated hardware. Training is essentially an intermittent, ad hoc process that may run for hours to days to complete a single training run, demanding full utilisation of large-scale compute resources for this period and then releasing that demand outside active training runs. Similarly, inferencing on a model may be processing intensive during a given execution. Elastic scaling of hardware resources will be essential for minimising cost of ownership; however, the dedicated nature of existing accelerator cards makes it hard to scale these elastically in today's Cloud infrastructure.

Additionally, some accelerators provide the ability to subdivide processing resource into smaller allocation units to allow for efficient allocation of smaller work units within high-capacity infrastructure.

It will be necessary to extend the capabilities of existing Cloud platforms to permit more efficient utilisation of expensive compute resources whilst managing overall demand across multiple customers in a way that mitigates security and privacy concerns.

**Approaches for applying MLOps to very large scale problems at petabyte scale and beyond**

As of 2020, large ML data sets are considered to start at around 50TB and very large data sets may derive from petabytes of source data, especially in visual applications such as autonomous vehicle control. At these scales, it becomes necessary to spread ML workloads across thousands of GPU instances in order to keep overall training times within acceptable elapsed time windows (less than a week per run).

Individual GPUs are currently able to process in the order of 1-10GB of data per second but only have around 40GB of local RAM. An individual server can be expected to have around 1TB of conventional RAM and around 15TB of local high speed storage as cache for around 8 GPUs, so may be able to transfer data between these and the compute units at high hundreds of GB/s, with upstream connections to network storage running at low hundreds of GB/s.

Efficient workflows rely upon being able to reduce problems into smaller sub-units with constrained data requirements, or systems start to become I/O bound. MLOps tooling for large problems must be able to efficiently decompose training and inferencing workloads into individual operations and data sets that can be effectively distributed as parallel activities across a homogeneous infrastructure with a supercomputing-style architecture. This can be expected to exceed the capabilities of conventional Cloud computing infrastructure and to require dedicated hardware components and architecture, so any MLOps tooling must have appropriate awareness of the target architecture in order to optimise deployments.

At this scale, it is not feasible to create multiple physical copies of petabytes of training data due to storage capacity constraints and limitations of data transfer rates, so strategies for versioning sets of data with metadata against an incrementally growing pool will be necessary.

**Providing appropriate pipeline tools to manage MLOps workflows transparently as part of existing DevOps solutions**

Existing projects using DevOps practices will typically have automated pipelines and delivery. Ideally, MLOps solutions would extend this rather than replace it. In some cases it may be necessary for an ML model, for example, to have its own tooling and pipeline, but that should be the exception, as there is typically a non-trivial amount of source code for training scripts and service endpoints that goes along with the model, as covered previously.

**Testing ML assets appropriately**

ML assets should be considered at least at the same level as traditional source code in terms of testing (unit, integration, end to end, acceptance etc.). Metrics like coverage may still apply to scripts and service endpoints, if not to the model itself (as it isn't derived from source code). Further to this, deployed models are typically used in decision-making capacities where the stakes are higher, or where there are potential governance, compliance or bias risks. This implies that testing will need to cover far more than source code, and must actively show, in a way suitable for a variety of stakeholders, how the model was tested (for example, whether socioeconomic bias testing was included). The testing involved should be reported in an accessible fashion so that it is not only available to developers to audit.

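As a sketch of what testing beyond conventional unit tests might look like, the pytest-style example below checks a trained candidate model against an accuracy threshold and a simple demographic-parity-style bias bound on a held-out set. The thresholds and the `load_model` / `load_holdout` helpers are hypothetical, invented for illustration.

```python
# Illustrative acceptance tests for a trained model (pytest style).
# `load_model()` and `load_holdout()` are hypothetical project helpers that
# return a fitted model and numpy arrays (features, labels, group labels).
import numpy as np

from my_project.registry import load_model, load_holdout  # hypothetical

ACCURACY_THRESHOLD = 0.85
MAX_GROUP_DISPARITY = 0.10  # max allowed gap in positive-prediction rates


def test_model_meets_accuracy_threshold():
    model = load_model("candidate")
    features, labels, _groups = load_holdout()
    accuracy = (model.predict(features) == labels).mean()
    assert accuracy >= ACCURACY_THRESHOLD


def test_positive_rate_disparity_between_groups_is_bounded():
    model = load_model("candidate")
    features, _labels, groups = load_holdout()
    predictions = model.predict(features)
    rates = [predictions[groups == g].mean() for g in np.unique(groups)]
    assert max(rates) - min(rates) <= MAX_GROUP_DISPARITY
```
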
**Governance processes for managing the release cycle of MLOps assets, including Responsible AI principles**

MLOps as a process extends governance requirements into areas beyond those typically considered as part of conventional DevOps practices. It is necessary to be able to extend auditing and traceability of MLOps assets all the way back into the data that is chosen for the purposes of training models in the first instance. MLOps tooling will need to provide mechanisms for managing the release of new training sets for the purposes of training, with the consideration that many data scientists may be working on a given model, and more than one model may be trained against a specific instance of a training data set. Customers must have the capability to retain a history of prior versions of training data and be able to recreate specific environments as the basis of new avenues of investigation, remedial work or root cause analysis, potentially under forensic conditions.

The training process is predicated upon the idea of setting predefined success criteria for a given training session. Tooling should make it easy for data science teams to clearly and expressively declare success criteria against which an automated training execution will be evaluated, such that no manual intervention is required to determine when a particular training run meets the standard required for promotion to a staging environment (a sketch of such a declarative gate appears after this section).

Tooling should also offer the ability to manage competitive training of model variants against each other as part of automated batch training activities. This could involve predefined sets of hyper-parameters to test in parallel training executions, use of smart hyper-parameter tuning libraries as part of training scripts, or evolutionary approaches to iteratively creating model variants.

It should be possible to promote preferred candidate models into staging environments for integration and acceptance testing using a defined set of automated governance criteria, providing a full audit trail that can be retained across the lifetime of the ML asset being created.

Tooling should permit the selective promotion of specific model versions into target production environments, with the assumption that customers may need to manage multiple live versions of any given model in production for multiple client environments. Again, this requires a persistent audit trail for each deployment.

It should be assumed that the decision-making nature of ML-based products will mean that any incident or defect in a production environment may result in the need for a formal investigation or root cause analysis, potentially as part of a compliance audit or litigation case. Tooling should facilitate the ability to easily walk backwards through audit trail pathways to establish the full state of all assets associated with a given deployment and all governance decisions associated with it. This should be implemented in such a way as to be easily initiated by non-technical staff and to provide best efforts at non-repudiation and tamper protection of any potential evidence.

Models themselves can be expected to be required to be constructed with a level of conformance to observability, interpretability and explainability standards, which may be defined in legislation in some cases. This is a fundamentally hard problem and one which will have a significant impact upon the practical viability of deploying ML solutions in some fields. Typically, these concerns have an inverse relationship with security and privacy requirements, so it is important that tooling considers the balance of these trade-offs when providing capabilities in these areas.

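The promotion gate mentioned above could be expressed declaratively; the sketch below shows one possible shape, with invented criterion names and thresholds, in which a candidate model's evaluation metrics are checked against predeclared success criteria before promotion to staging is allowed.

```python
# Sketch: declarative promotion criteria evaluated by an automated gate.
# Criterion names and thresholds are illustrative only.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Criterion:
    metric: str
    minimum: float


PROMOTION_CRITERIA: List[Criterion] = [
    Criterion(metric="accuracy", minimum=0.85),
    Criterion(metric="auc", minimum=0.90),
    Criterion(metric="fairness_score", minimum=0.95),
]


def eligible_for_staging(metrics: Dict[str, float]) -> bool:
    """True only if every predeclared success criterion is met."""
    return all(metrics.get(c.metric, 0.0) >= c.minimum for c in PROMOTION_CRITERIA)


candidate_metrics = {"accuracy": 0.91, "auc": 0.93, "fairness_score": 0.97}
if eligible_for_staging(candidate_metrics):
    print("Promote candidate to staging and record the decision in the audit trail.")
else:
    print("Reject candidate and record the failed criteria for review.")
```
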
**Management of shared dependencies between training and operational phases**

Appropriate separation of concerns should be facilitated in the implementation of training scripts so that aspects associated with the preprocessing of data are handled independently from the core training activities.

Tooling should provide clean mechanisms for efficiently and safely deploying code associated with preprocessing activities to both training and service deployment environments. It is critical that preprocessing algorithms are kept consistent between these environments at all times, and a change in one must trigger a change in the other or raise an alert status capable of blocking an accidental release of mismatched libraries.

It should be considered that the target environment for deployment may differ from that of training, so it may be necessary to support implementations of preprocessing functions in different languages or architectures. Under such circumstances, tooling should provide robust mechanisms for ensuring an ongoing link between the implementations of these algorithms, preferably with a single, unified form of testing to prevent divergence.

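One way to keep preprocessing consistent is to package it as a single shared module that both the training script and the service wrapper import, and to record a fingerprint of that module alongside the trained model so that a mismatch can be detected at load time. The sketch below illustrates the idea with invented names and a deliberately trivial feature transform.

```python
# preprocessing.py -- single shared implementation used by BOTH the
# training pipeline and the service wrapper (names are illustrative).
import hashlib
import inspect


def preprocess(raw: dict) -> list:
    """Turn a raw input record into the feature vector the model expects."""
    return [float(raw["age"]) / 100.0, 1.0 if raw["country"] == "GB" else 0.0]


def preprocessing_fingerprint() -> str:
    """Hash of the preprocessing source, stored with the trained model."""
    return hashlib.sha256(inspect.getsource(preprocess).encode("utf-8")).hexdigest()


def check_consistency(recorded_fingerprint: str) -> None:
    """Called by the service wrapper at start-up: refuse to serve if the
    fingerprint recorded at training time no longer matches the deployed code."""
    if recorded_fingerprint != preprocessing_fingerprint():
        raise RuntimeError("Preprocessing code does not match the trained model")
```
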
**Abstraction layer for models**

It is unsafe to assume that models will be deployed into environments that closely match those in which the model was originally trained. As a result, the use of object serialisation for model description is only viable in a narrow range of potential use cases. Typically, we need to be able to use a platform-independent intermediate format to describe models, so that this can act as a normalised form for data exchange between training and operational environments.

This form must be machine readable and structured in such a way that it is extensible to support a wide range of evolving ML techniques and can easily have new marshalling and unmarshalling components added as new source and target environments are adopted. The normalised form should be structured such that the training environment has no need to know about the target operational environment in advance.

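The extensibility requirement suggests a plugin-style interface for marshalling models into and out of the intermediate form. The sketch below is purely illustrative of that shape; the class and method names are invented and say nothing about how an actual intermediate format would be defined.

```python
# Illustrative plugin interface for converting models to and from a
# normalised intermediate form (all names are invented).
from abc import ABC, abstractmethod
from typing import Any, Dict


class ModelMarshaller(ABC):
    """One implementation per source or target environment."""

    @abstractmethod
    def to_intermediate(self, native_model: Any) -> bytes:
        """Serialise a framework-native model into the normalised form."""

    @abstractmethod
    def from_intermediate(self, payload: bytes) -> Any:
        """Materialise an executable model for the target environment."""


class MarshallerRegistry:
    """New environments are added without touching existing implementations."""

    def __init__(self):
        self._by_environment: Dict[str, ModelMarshaller] = {}

    def register(self, environment: str, marshaller: ModelMarshaller) -> None:
        self._by_environment[environment] = marshaller

    def get(self, environment: str) -> ModelMarshaller:
        return self._by_environment[environment]
```
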
**Longevity of ML assets**

With traditional software assets, such as binaries, there is some expectation of backwards compatibility (in the case of Java, for example, this compatibility of binaries has spanned decades). ML assets, such as model binaries, will need to offer some reasonable level of backwards compatibility. Versioning of the artefacts that serve the model is also important (but typically that is a better-known challenge). In the case of backwards-breaking changes, ML assets such as training scripts, tests and data sets will need to be accessible in order to reproduce a model for the new target runtime.

**Managing and tracking trade-offs**

ML solutions always involve trade-offs between competing concerns. A typical example might be the trade-off between model accuracy, model explainability and data privacy. Viewed as the points of a triangle, we can select for a model that sits at some point within the triangle, where its distance from each vertex represents its proximity to that ideal case. Since we train models by discovery, we can only make changes to our training data sets and hyper-parameters and then evaluate the properties of any resulting model by testing against these desired properties. Because of this, it is important that MLOps tooling provides capabilities to automate much of the heavy lifting associated with managing trade-offs in general. This might take the form of automated training of variations upon a basic model which are subsequently tested against a panel of selection criteria, evaluated and presented to customers in such a way as to make their relevant properties easily interpretable.

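As one illustration of automating this kind of trade-off management, the sketch below scores a set of candidate model variants against a panel of criteria with customer-supplied weights, so that the relative positions of the candidates within the trade-off space can be presented for selection. All names and numbers are invented.

```python
# Sketch: scoring candidate model variants against weighted trade-off criteria.
from typing import Dict

# Hypothetical evaluation results for three trained variants.
candidates: Dict[str, Dict[str, float]] = {
    "variant-a": {"accuracy": 0.92, "explainability": 0.40, "privacy": 0.70},
    "variant-b": {"accuracy": 0.88, "explainability": 0.75, "privacy": 0.80},
    "variant-c": {"accuracy": 0.85, "explainability": 0.90, "privacy": 0.95},
}

# Customer-defined weights expressing how the balance should be struck.
weights = {"accuracy": 0.5, "explainability": 0.2, "privacy": 0.3}


def score(metrics: Dict[str, float]) -> float:
    """Weighted sum of the criteria for a single candidate."""
    return sum(weights[name] * value for name, value in metrics.items())


ranked = sorted(candidates.items(), key=lambda item: score(item[1]), reverse=True)
for name, metrics in ranked:
    print(f"{name}: weighted score {score(metrics):.3f}")
```
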
**Escalation of data categories**

To obtain an accurate model, or to prevent the production of models with undesirable biases, it may be necessary to store data of a very sensitive nature or in legally protected categories. This data may be used to vet a model pre-release, or for training; in either case it will likely be persisted along with training scripts. This will require strong data protections (as with any database of a sensitive nature) and auditability of access. MLOps systems are likely to be targets of attack to obtain this data, meaning stronger protections than those applied to source code alone will be required.

It is anticipated that regulatory requirements intended to reduce the impact of bias or fairness issues will have unintended consequences relating to the sensitivity of other data that must be collected to fulfil these requirements, creating additional privacy risks.

**Intrinsic protection of models**

Model inferencing will have to embrace modern application security techniques to protect the model against these kinds of attacks. Inferencing might be protected through restriction of access (tokens), rate limiting, and monitoring of incoming traffic. In addition, as part of the integration test phase, there is a requirement to test the model against adversarial attacks (both common and domain-specific attacks) in a sandboxed environment.

It must also be recognised that Python, whilst convenient as a language for expressing ML concepts, is an interpreted scripting language that is intrinsically insecure in production environments, since any ad hoc Python source that can be injected into a Python environment can be executed without constraint, even if shell access is disabled. The long term use of Python to build mission-critical ML models should be discouraged in favour of more secure-by-design options.

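A minimal sketch of access restriction and rate limiting in front of an inference call is shown below. The token store and limits are invented placeholders; production deployments would normally delegate these controls to gateway or API-management infrastructure rather than in-process code.

```python
# Sketch: token check and per-caller rate limit in front of model inference.
import time
from collections import defaultdict, deque

VALID_TOKENS = {"example-client-token"}   # placeholder; use a real secret store
MAX_REQUESTS_PER_MINUTE = 60

_request_times = defaultdict(deque)       # recent request timestamps per token


def authorise_and_limit(token: str) -> None:
    """Reject unknown callers and callers exceeding the request budget."""
    if token not in VALID_TOKENS:
        raise PermissionError("invalid access token")
    now = time.time()
    window = _request_times[token]
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        raise RuntimeError("rate limit exceeded; request refused")
    window.append(now)


def guarded_predict(model, features, token: str):
    authorise_and_limit(token)          # also a natural point to log traffic
    return model.predict(features)      # monitoring can inspect inputs here
```
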
**Emergency cut out**

A model may need to be cut out abruptly, and this may need to be done at the service wrapper level as an emergency measure. Rolling back to a previous version of the model and a service wrapper is desirable, but only if it is fast enough for safety reasons. At the very least, the ability to deny service within minutes in cases of misbehaviour is required. This cut out needs to be human-triggered at least, and possibly triggered via a live health check in the service wrapper. It is common in traditional service deployments to have health and liveness checks; a similar concept applies to deployed models, where health includes acceptable behaviour.

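The sketch below shows one way a kill switch might sit inside a service wrapper: a flag that can be flipped by a human operator or by an automated behaviour check, after which the wrapper falls back to a previous model version or refuses to serve predictions. Names and the behaviour check are illustrative assumptions only.

```python
# Sketch: emergency cut-out inside a model service wrapper.
class ModelService:
    def __init__(self, current_model, previous_model=None):
        self.current_model = current_model
        self.previous_model = previous_model
        self.cut_out = False            # flipped by an operator or health check

    def trigger_cut_out(self, reason: str) -> None:
        print(f"Cutting out current model: {reason}")
        self.cut_out = True

    def healthy(self, behaviour_check) -> bool:
        # behaviour_check encapsulates 'acceptable behaviour' probes,
        # e.g. screening recent outputs for inappropriate content.
        ok = behaviour_check(self.current_model)
        if not ok:
            self.trigger_cut_out("behaviour check failed")
        return ok

    def predict(self, features):
        if self.cut_out:
            if self.previous_model is not None:
                return self.previous_model.predict(features)    # roll back
            raise RuntimeError("model withdrawn from service")  # deny service
        return self.current_model.predict(features)
```
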
**Online learning**

Self-learning ML techniques require that models are trained live in production environments, often continuously. This places the behaviour of such models outside the governance constraints of current MLOps platforms and introduces potentially unconstrained risk that must be managed in end user code. Further consideration must be given to identifying ways in which MLOps capabilities can be extended into this space in order to provide easier access to best known methods for mitigating the risk of degrading quality.