├── .dockerignore ├── .github └── ISSUE_TEMPLATE │ ├── bug_report.md │ └── feature_request.md ├── .gitignore ├── .python-version ├── CONTRIBUTING.md ├── DEVELOPERS.md ├── Dockerfile ├── LICENSE ├── MANIFEST.in ├── README.md ├── demo.gif ├── edge ├── edge_docker_entrypoint.sh ├── omniboard-screenshot.png ├── pyproject.toml ├── requirements-dev.txt ├── requirements.txt ├── setup.py ├── src ├── edge │ ├── __init__.py │ ├── command │ │ ├── __init__.py │ │ ├── common │ │ │ ├── __init__.py │ │ │ └── precommand_check.py │ │ ├── config │ │ │ ├── __init__.py │ │ │ └── subparser.py │ │ ├── dvc │ │ │ ├── __init__.py │ │ │ ├── init.py │ │ │ └── subparser.py │ │ ├── experiments │ │ │ ├── __init__.py │ │ │ ├── get_dashboard.py │ │ │ ├── get_mongodb.py │ │ │ ├── init.py │ │ │ └── subparser.py │ │ ├── force_unlock.py │ │ ├── init.py │ │ └── model │ │ │ ├── __init__.py │ │ │ ├── deploy.py │ │ │ ├── describe.py │ │ │ ├── get_endpoint.py │ │ │ ├── init.py │ │ │ ├── list.py │ │ │ ├── remove.py │ │ │ ├── subparser.py │ │ │ └── template.py │ ├── config.py │ ├── dvc.py │ ├── enable_api.py │ ├── endpoint.py │ ├── exception.py │ ├── gcloud.py │ ├── k8s │ │ ├── __init__.py │ │ └── omniboard.yaml │ ├── path.py │ ├── sacred.py │ ├── state.py │ ├── storage.py │ ├── templates │ │ └── tensorflow_model │ │ │ ├── cookiecutter.json │ │ │ └── {{cookiecutter.model_name}} │ │ │ ├── __init__.py │ │ │ └── train.py │ ├── train.py │ ├── tui.py │ ├── versions.py │ └── vertex_deploy.py └── vertex_edge.py ├── tutorials ├── setup.md └── train_deploy.md └── vertex-edge-logo.png /.dockerignore: -------------------------------------------------------------------------------- 1 | env/ 2 | *~ 3 | __pycache__/ 4 | .pytest_cache/ 5 | data/ 6 | build/ 7 | dist/ -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to 
help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Additional context** 32 | Add any other context about the problem here. 33 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 
21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | env/ 2 | *~ 3 | __pycache__/ 4 | .pytest_cache/ 5 | /.idea/ 6 | build/ 7 | dist/ 8 | /**/vertex_edge.egg-info/ -------------------------------------------------------------------------------- /.python-version: -------------------------------------------------------------------------------- 1 | 3.8.0 2 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to vertex:edge 2 | 3 | First off, thanks for taking the time to contribute! 4 | 5 | This article will help you get started, from learning [how you can contribute](#how) all the way to raising your first [pull request](#firstcontrib). 6 | 7 | ## Contents 8 | 9 | * [How can I contribute?](#how) 10 | * [Your first code contribution](#firstcontrib) 11 | * [Style guides](#styleguides) 12 | 13 | 14 | ## How can I contribute? 15 | 16 | ### Testing 17 | 18 | This is a new project that is moving fast, and so one of the most useful ways you can help out is simply testing the tools and the documentation in order to tease out bugs, edge-cases and opportunities to improve things. 19 | 20 | If you are testing this out, whether in production or not, we're really keen to hear from you and to receive your feedback. 21 | 22 | ### Reporting bugs 23 | 24 | Sadly, bugs happen; we're sorry! Before reporting a bug, please check the [open issues](https://github.com/fuzzylabs/vertex-edge/issues) to see if somebody has submitted the same bug before. If so, feel free to add further detail to the existing issue. 
25 | 26 | If your bug hasn't been raised before then [go ahead and raise it](https://github.com/fuzzylabs/vertex-edge/issues/new?assignees=&labels=bug&template=bug_report.md&title=) using our bug report template. Please provide as much information as possible to help us reproduce the bug. 27 | 28 | ### Suggesting enhancements 29 | 30 | Enhancements and feature requests are very much welcome. We hope to learn from real-world usage which features are missing so that we can improve the tool to meet the expectations of real machine learning projects. Please use our [feature request template](https://github.com/fuzzylabs/vertex-edge/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) to do this. 31 | 32 | ### Taking on an existing issue 33 | 34 | You'll find plenty of opportunities to contribute among our [open issues](https://github.com/fuzzylabs/vertex-edge/issues). If you'd like to pick up an issue, please add a comment saying so, as this avoids duplicate work. Then read on to make your [first code contribution](#firstcontrib). 35 | 36 | 37 | ## Your first code contribution 38 | 39 | ### Fork the repository 40 | 41 | We prefer that you fork the repository to your own GitHub account first before raising a pull request. 42 | 43 | ### Pull requests 44 | 45 | Once you've got a code change that's ready to be reviewed, please raise a pull request. If you've got some ongoing work that's not quite ready for review, feel free to raise the pull request, but please place `[WIP]` (work-in-progress) in front of the PR title so we know it's still being worked on. 46 | 47 | Please include a description in the pull request explaining what has been changed and/or added, how, and why. Please also link to relevant issues and discussions where appropriate. 48 | 49 | 50 | ## Style guides 51 | 52 | ### Git commit messages 53 | 54 | * Make sure it's descriptive, so not `fix bug` but `fix issue #1234 where servers spontaneously combusted on random Tuesdays`.
55 | * Keep the first line brief; use multiple lines if you want to add more details. 56 | * Reference relevant issues, discussions and pull requests where appropriate. 57 | 58 | ### Python code 59 | 60 | * Above all, write clean, understandable code. 61 | * Use [black](https://github.com/psf/black) and [PyLint](https://pypi.org/project/pylint) to help ensure code is consistent. 62 | 63 | ### Documentation 64 | 65 | * Use [Markdown](https://guides.github.com/features/mastering-markdown). 66 | * Place a table of contents at the top of each Markdown file. 67 | * Write concise, clear explanations. 68 | -------------------------------------------------------------------------------- /DEVELOPERS.md: -------------------------------------------------------------------------------- 1 | # Development guide 2 | 3 | ## Python Package 4 | 5 | ### Requirements 6 | 7 | ``` 8 | pip install -r requirements-dev.txt 9 | ``` 10 | 11 | ### Build 12 | 13 | TODO 14 | 15 | ``` 16 | ./setup.py build 17 | ./setup.py install 18 | ``` 19 | 20 | Or, to build a distributable package: 21 | 22 | ``` 23 | python -m build 24 | ``` 25 | 26 | ### Push to PyPI 27 | 28 | ``` 29 | twine upload dist/* --verbose 30 | ``` 31 | 32 | ### Testing locally 33 | 34 | ``` 35 | mkdir my_test_project 36 | cd my_test_project 37 | python -m venv env/ 38 | source env/bin/activate 39 | pip install -e <path to the vertex-edge repository> 40 | ``` 41 | 42 | This installs the tool locally, in editable mode, within a virtual environment. 43 | 44 | ## Docker image 45 | 46 | ### Build 47 | 48 | ``` 49 | docker build .
-t fuzzylabs/edge 50 | ``` 51 | 52 | ### Push 53 | 54 | ``` 55 | docker push fuzzylabs/edge 56 | ``` 57 | 58 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.9.6-slim 2 | 3 | RUN apt update \ 4 | && apt install -y curl \ 5 | && apt install -y git \ 6 | && rm -rf /var/lib/apt/lists/* 7 | 8 | # Install GCloud tools 9 | RUN curl https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-347.0.0-linux-x86_64.tar.gz > /tmp/google-cloud-sdk.tar.gz \ 10 | && mkdir -p /usr/local/gcloud \ 11 | && tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \ 12 | && /usr/local/gcloud/google-cloud-sdk/install.sh \ 13 | && /usr/local/gcloud/google-cloud-sdk/bin/gcloud components install alpha --quiet \ 14 | && pip install dvc \ 15 | && rm /tmp/google-cloud-sdk.tar.gz 16 | 17 | ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin 18 | 19 | # Install Kubectl 20 | RUN curl -LO https://dl.k8s.io/release/v1.21.0/bin/linux/amd64/kubectl \ 21 | && install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl 22 | 23 | # Install Helm 24 | RUN curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash 25 | 26 | # Install Python dependencies 27 | WORKDIR /project/ 28 | COPY requirements.txt requirements.txt 29 | RUN pip install --no-cache-dir -r requirements.txt 30 | 31 | # Install edge 32 | COPY setup.py setup.py 33 | COPY MANIFEST.in MANIFEST.in 34 | COPY edge edge 35 | COPY src/ src/ 36 | 37 | COPY src/edge/k8s/omniboard.yaml /omniboard.yaml 38 | 39 | RUN ./setup.py build 40 | RUN ./setup.py install 41 | 42 | # Copy the entrypoint script 43 | COPY edge_docker_entrypoint.sh /edge_docker_entrypoint.sh 44 | ENTRYPOINT ["/edge_docker_entrypoint.sh"] 45 | -------------------------------------------------------------------------------- /LICENSE: 
-------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 
39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include src/edge/k8s/*.yaml 2 | recursive-include src/edge/templates/ * -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

<img src="vertex-edge-logo.png" alt="Vertex Edge Logo">

2 |

3 | 4 | 5 |


6 | 7 | # Vertex:Edge 8 | 9 | Adopting MLOps into a data science workflow requires specialist knowledge of cloud engineering. As a data scientist, you just want to train your models and get on with your life. **vertex:edge** provides an environment for training and deploying models on Google Cloud that leverages the best available open-source MLOps tools to track your experiments and version your data. 10 | 11 | 14 | 15 | ## Contents 16 | 17 | * **[Why vertex:edge?](#why-vertexedge)** 18 | * **[Pre-requisites](#pre-requisites)** 19 | * **[Quick-start](#quick-start)** 20 | * **[Tutorials](#tutorials)** 21 | * **[Contributing](#contributing)** 22 | 23 | # Why vertex:edge? 24 | 25 | **vertex:edge** is a tool that sits on top of Vertex (Google's cloud AI platform). Ordinarily, training and deploying models with Vertex requires a fair amount of repetitive work, and moreover the tooling provided by Vertex for things like data versioning and experiment tracking [aren't quite up-to-scratch](https://fuzzylabs.ai/blog/vertex-ai-the-hype). 26 | 27 | **vertex:edge** addresses a number of challenges: 28 | 29 | * Training and deploying a model on Vertex with minimal effort. 30 | * Setting up useful MLOps tools such as experiment trackers in Google Cloud, without needing a lot of cloud engineering knowledge. 31 | * Seamlessly integrating MLOps tools into machine learning workflows. 32 | 33 | Our vision is to provide a complete environment for training models with MLOps capabilities built-in. Right now we support model training and deployment through Vertex and TensorFlow, experiment tracking thanks to [Sacred](https://github.com/IDSIA/sacred), and data versioning through [DVC](https://dvc.org). In the future we want to not only expand these features, but also add: 34 | 35 | * Support for multiple ML frameworks. 36 | * Integration into model monitoring solutions. 37 | * Easy integration into infrastructure-as-code tools such as Terraform. 
38 | 39 | # Pre-requisites 40 | 41 | * [A Google Cloud account](https://cloud.google.com). 42 | * [gcloud command line tool](https://cloud.google.com/sdk/docs/install). 43 | * [Docker](https://docs.docker.com/get-docker) (version 18 or greater). 44 | * Python, at least version 3.8. Check this using `python --version`. 45 | * PIP, at least version 21.2.0. Check this using `pip --version`. To upgrade PIP, run `pip install --upgrade pip`. 46 | 47 | # Quick-start 48 | 49 | This guide gives you a quick overview of using **vertex:edge** to train and deploy a model. If this is your first time training a model on Vertex, we recommend reading the more detailed tutorials on [Project Setup](tutorials/setup.md) and [Training and Deploying a Model to GCP](tutorials/train_deploy.md). 50 | 51 | ## Install vertex:edge 52 | 53 | ``` 54 | pip install vertex-edge 55 | ``` 56 | 57 | ## Authenticate with GCP 58 | 59 | ``` 60 | gcloud auth login 61 | gcloud config set project <your project ID> 62 | gcloud config set compute/region <your region> 63 | gcloud auth application-default login 64 | ``` 65 | 66 | ## Initialise your project 67 | 68 | ``` 69 | edge init 70 | edge model init hello-world 71 | edge model template hello-world 72 | ``` 73 | 74 | n.b. when you run `edge init`, you will be prompted for a cloud storage bucket name. This bucket is used for tracking your project state, storing trained models, and storing versioned data. Remember that bucket names need to be globally unique on GCP. 75 | 76 | ## Train and deploy 77 | 78 | After running the above, you'll have a new Python script under `models/hello-world/train.py`. This script uses TensorFlow to train a simple model.
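The generated script switches between local and Vertex execution based on an environment variable (see the `RUN_ON_VERTEX=True` invocation below). The following is an illustrative, stdlib-only sketch of that toggle pattern; the function name and accepted values are assumptions, not vertex:edge's actual implementation.

```python
import os

def should_run_on_vertex() -> bool:
    """Return True when the RUN_ON_VERTEX environment variable is set to a truthy value."""
    return os.environ.get("RUN_ON_VERTEX", "").lower() in ("true", "1", "yes")

if should_run_on_vertex():
    print("Submitting training job to Vertex")
else:
    print("Training locally")
```

With a toggle like this, `RUN_ON_VERTEX=True python train.py` takes the cloud branch, while a plain `python train.py` trains locally, without any code changes.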
79 | 80 | To train the model on Google Vertex, run: 81 | 82 | ``` 83 | RUN_ON_VERTEX=True python models/hello-world/train.py 84 | ``` 85 | 86 | Once this has finished, you can deploy the model using: 87 | 88 | ``` 89 | edge model deploy hello-world 90 | ``` 91 | 92 | You can also train the model locally, without modifying any of the code: 93 | 94 | ``` 95 | pip install tensorflow 96 | python models/hello-world/train.py 97 | ``` 98 | 99 | Note that we needed to install TensorFlow first. This is by design, because we don't want the **vertex:edge** tool to depend on specific ML frameworks. 100 | 101 | ## Track experiments 102 | 103 | We can add experiment tracking with just one command: 104 | 105 | ``` 106 | edge experiments init 107 | ``` 108 | 109 | With experiment tracking enabled, every time you train a model, the details of the training run will be recorded, including performance metrics and training parameters. 110 | 111 | You can view all of these experiments in a dashboard. To get the dashboard URL, run: 112 | 113 | ``` 114 | edge experiments get-dashboard 115 | ``` 116 | 117 |
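Conceptually, an experiment tracker stores one record per training run, pairing the training parameters with the resulting metrics. The sketch below is illustrative only and is not Sacred's actual schema; real trackers store much richer metadata (host info, source code version, captured stdout, and so on).

```python
import json
import time

# Illustrative record of a single training run (hypothetical schema)
run_record = {
    "experiment": "hello-world",
    "started_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    "params": {"learning_rate": 0.01, "epochs": 5},
    "metrics": {"accuracy": 0.93},
}

print(json.dumps(run_record["params"], sort_keys=True))  # → {"epochs": 5, "learning_rate": 0.01}
```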

118 | <p align="center"><img src="omniboard-screenshot.png" alt="Omniboard experiment tracking dashboard"></p> 119 |

120 | 121 | To learn more, read our tutorial on [Tracking your experiments](tutorials/experiment_tracking.md). 122 | 123 | ## Version data 124 | 125 | By using data version control you can always track the history of your data. Combined with experiment tracking, it means each model can be tied to precisely the dataset that was used when the model was trained. 126 | 127 | We use [DVC](https://dvc.org) for data versioning. To enable it, run: 128 | 129 | ``` 130 | edge dvc init 131 | ``` 132 | 133 | n.b. you need to be working in an existing Git repository before you can enable data versioning. 134 | 135 | To learn more, read our tutorial on [Versioning your data](tutorials/versioning_data.md). 136 | 137 | # Tutorials 138 | 139 | * [Project Setup](tutorials/setup.md) 140 | * [Training and Deploying a Model to GCP](tutorials/train_deploy.md) 141 | * [Tracking your experiments](tutorials/experiment_tracking.md) 142 | * [Versioning your data](tutorials/versioning_data.md) 143 | 144 | # Contributing 145 | 146 | This is a new project and we're keen to get feedback from the community to help us improve it. Please do **raise and discuss issues**, send us pull requests, and don't forget to **~~like and subscribe~~** star and fork this repo. 147 | 148 | **If you want to contribute** then please check out our [contributions guide](CONTRIBUTING.md) and [developers guide](DEVELOPERS.md). We look forward to your contributions! 
149 | -------------------------------------------------------------------------------- /demo.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/demo.gif -------------------------------------------------------------------------------- /edge: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This script is a wrapper for running the vertex:edge Docker image 4 | # As well as running the image itself, it also mounts the Google Cloud secret key 5 | # So that we will be authenticated with GCP while inside the Docker container 6 | 7 | ACCOUNT=$(gcloud config get-value account) 8 | HOST_UID=$(id -u) 9 | HOST_GID=$(id -g) 10 | docker run -it \ 11 | -v "$(pwd)":/project/ \ 12 | -v ~/.config/gcloud/:/root/.config/gcloud/ \ 13 | -e ACCOUNT="$ACCOUNT" -e HOST_UID="$HOST_UID" -e HOST_GID="$HOST_GID" \ 14 | fuzzylabs/edge "$@" 15 | -------------------------------------------------------------------------------- /edge_docker_entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | export GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json 3 | gcloud config set component_manager/disable_update_check true &> /dev/null 4 | gcloud config set account "$ACCOUNT" &> /dev/null 5 | 6 | if [[ $1 == "bash" ]] 7 | then 8 | bash 9 | else 10 | vertex_edge.py "$@" 11 | fi 12 | 13 | # Restore host ownership of any files created inside the container 14 | chown -R "$HOST_UID":"$HOST_GID" .
&> /dev/null 14 | -------------------------------------------------------------------------------- /omniboard-screenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/omniboard-screenshot.png -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.black] 2 | line-length = 120 3 | 4 | [tool.pylint.'MESSAGES CONTROL'] 5 | max-line-length = 120 6 | -------------------------------------------------------------------------------- /requirements-dev.txt: -------------------------------------------------------------------------------- 1 | # Packages used for development but not required for the library distribution 2 | 3 | pylint 4 | black 5 | twine 6 | build 7 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | ## Serialisation / deserialisation 2 | 3 | pyserde==0.4.0 4 | 5 | ## GCP 6 | 7 | google-cloud-container==2.4.1 8 | google-cloud-secret-manager==2.5.0 9 | google_cloud_aiplatform==1.1.1 10 | google_cloud_storage==1.38.0 11 | 12 | ## Data versioning 13 | 14 | #dvc[gs]==2.5.0 15 | 16 | ## Experiment tracking 17 | 18 | sacred==0.8.2 19 | pymongo==3.11.4 20 | 21 | ## Terminal UI 22 | 23 | questionary==1.10.0 24 | 25 | ## TODO: do we need Dill? 
26 | #dill==0.3.4 27 | 28 | # Model templating 29 | cookiecutter==1.7.3 30 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from setuptools import setup, find_packages 4 | 5 | setup( 6 | name="vertex-edge", 7 | version="0.1.117", 8 | url="https://github.com/fuzzylabs/vertex-edge", 9 | package_dir={'': 'src'}, 10 | packages=find_packages("src/"), 11 | scripts=["edge", "src/vertex_edge.py"], 12 | include_package_data=True, 13 | install_requires=[ 14 | "pyserde==0.4.0", 15 | #"google-api-core==1.29.0", 16 | "google-cloud-container==2.4.1", 17 | "google-cloud-secret-manager==2.5.0", 18 | "google_cloud_aiplatform==1.1.1", 19 | "google-cloud-storage==1.38.0", 20 | "cookiecutter==1.7.3", 21 | #"dvc[gs]==2.5.0", 22 | "sacred==0.8.2", 23 | "pymongo==3.11.4", 24 | "questionary==1.10.0" 25 | ] 26 | ) 27 | -------------------------------------------------------------------------------- /src/edge/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/__init__.py -------------------------------------------------------------------------------- /src/edge/command/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/__init__.py -------------------------------------------------------------------------------- /src/edge/command/common/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/common/__init__.py 
-------------------------------------------------------------------------------- /src/edge/command/common/precommand_check.py: -------------------------------------------------------------------------------- 1 | from edge.config import EdgeConfig 2 | from edge.exception import EdgeException 3 | from edge.gcloud import is_authenticated, project_exists, is_billing_enabled 4 | from edge.tui import SubStepTUI, StepTUI 5 | 6 | 7 | def check_gcloud_authenticated(project_id: str): 8 | with SubStepTUI(message="Checking if you have authenticated with gcloud") as sub_step: 9 | _is_authenticated, _reason = is_authenticated(project_id) 10 | if not _is_authenticated: 11 | raise EdgeException(_reason) 12 | 13 | 14 | def check_project_exists(gcloud_project: str): 15 | with SubStepTUI(f"Checking if project '{gcloud_project}' exists") as sub_step: 16 | project_exists(gcloud_project) 17 | 18 | 19 | def check_billing_enabled(gcloud_project: str): 20 | with SubStepTUI(f"Checking if billing is enabled for project '{gcloud_project}'") as sub_step: 21 | if not is_billing_enabled(gcloud_project): 22 | raise EdgeException( 23 | f"Billing is not enabled for project '{gcloud_project}'. " 24 | f"Please enable billing for this project by following " 25 | f"these instructions: " 26 | f"https://cloud.google.com/billing/docs/how-to/modify-project" 
27 | ) 28 | 29 | 30 | def precommand_checks(config: EdgeConfig): 31 | gcloud_project = config.google_cloud_project.project_id 32 | with StepTUI(message="Checking your GCP environment", emoji="☁️") as step: 33 | check_gcloud_authenticated(gcloud_project) 34 | check_project_exists(gcloud_project) 35 | check_billing_enabled(gcloud_project) 36 | -------------------------------------------------------------------------------- /src/edge/command/config/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/config/__init__.py -------------------------------------------------------------------------------- /src/edge/command/config/subparser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import sys 3 | 4 | from edge.config import EdgeConfig 5 | from edge.exception import EdgeException 6 | 7 | 8 | def add_config_parser(subparsers): 9 | parser = subparsers.add_parser("config", help="Configuration related actions") 10 | actions = parser.add_subparsers(title="action", dest="action", required=True) 11 | actions.add_parser("get-region", help="Get configured region") 12 | 13 | 14 | def run_config_actions(args: argparse.Namespace): 15 | if args.action == "get-region": 16 | with EdgeConfig.context(silent=True) as config: 17 | print(config.google_cloud_project.region) 18 | sys.exit(0) 19 | else: 20 | raise EdgeException("Unexpected config command") 21 | -------------------------------------------------------------------------------- /src/edge/command/dvc/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/dvc/__init__.py -------------------------------------------------------------------------------- 
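The top-level dispatcher (`src/vertex_edge.py`) is not included in this dump, but each command group follows the same `add_*_parser` pattern as `config/subparser.py` above. A minimal self-contained sketch of how such a subparser plugs into a root parser — the wiring here is an assumption for illustration, not taken from the source:

```python
import argparse

# Each command group contributes an "add_*_parser" function, mirroring
# src/edge/command/config/subparser.py above.
def add_config_parser(subparsers) -> None:
    parser = subparsers.add_parser("config", help="Configuration related actions")
    actions = parser.add_subparsers(title="action", dest="action", required=True)
    actions.add_parser("get-region", help="Get configured region")

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical root parser: vertex_edge.py presumably registers every
    # command group's subparser on one parser like this.
    parser = argparse.ArgumentParser(prog="edge")
    subparsers = parser.add_subparsers(title="command", dest="command", required=True)
    add_config_parser(subparsers)
    return parser

# Parsing "edge config get-region" yields both the command and the action.
args = build_parser().parse_args(["config", "get-region"])
```

With this shape, the dispatcher only needs to switch on `args.command` and hand `args` to the matching `run_*_actions` function.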
/src/edge/command/dvc/init.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from edge.command.common.precommand_check import precommand_checks 4 | from edge.config import EdgeConfig 5 | from edge.state import EdgeState 6 | from edge.tui import TUI 7 | from edge.dvc import setup_dvc 8 | from edge.path import get_model_dvc_pipeline 9 | 10 | 11 | def dvc_init(): 12 | intro = "Initialising data version control (DVC)" 13 | success_title = "DVC initialised successfully" 14 | success_message = f""" 15 | Now you can version your data using DVC. See https://dvc.org/doc for more details about how it can be used. 16 | 17 | What's next? We suggest you proceed with: 18 | 19 | Train and deploy a model (see 'Training a model' section of the README for more details): 20 | ./edge.sh model init fashion 21 | dvc repro {get_model_dvc_pipeline("fashion")} 22 | ./edge.sh model deploy fashion 23 | 24 | Happy herding! 🐏 25 | """.strip() 26 | failure_title = "DVC initialisation failed" 27 | failure_message = "See the errors above. See README for more details." 
28 | with TUI( 29 | intro, 30 | success_title, 31 | success_message, 32 | failure_title, 33 | failure_message 34 | ) as tui: 35 | with EdgeConfig.context() as config: 36 | precommand_checks(config) 37 | with EdgeState.context(config) as state: 38 | setup_dvc( 39 | state.storage.bucket_path, 40 | config.storage_bucket.dvc_store_directory 41 | ) 42 | 43 | -------------------------------------------------------------------------------- /src/edge/command/dvc/subparser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from edge.command.dvc.init import dvc_init 3 | from edge.exception import EdgeException 4 | 5 | 6 | def add_dvc_parser(subparsers): 7 | parser = subparsers.add_parser("dvc", help="DVC related actions") 8 | actions = parser.add_subparsers(title="action", dest="action", required=True) 9 | actions.add_parser("init", help="Initialise DVC") 10 | 11 | 12 | def run_dvc_actions(args: argparse.Namespace): 13 | if args.action == "init": 14 | dvc_init() 15 | else: 16 | raise EdgeException("Unexpected DVC command") 17 | -------------------------------------------------------------------------------- /src/edge/command/experiments/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/experiments/__init__.py -------------------------------------------------------------------------------- /src/edge/command/experiments/get_dashboard.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from edge.config import EdgeConfig 4 | from edge.state import EdgeState 5 | 6 | 7 | def get_dashboard(): 8 | with EdgeConfig.context(silent=True) as config: 9 | with EdgeState.context(config, silent=True) as state: 10 | print(state.sacred.external_omniboard_string) 11 | sys.exit(0) 12 | 
-------------------------------------------------------------------------------- /src/edge/command/experiments/get_mongodb.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from edge.config import EdgeConfig 4 | from edge.sacred import get_connection_string 5 | from edge.state import EdgeState 6 | 7 | 8 | def get_mongodb(): 9 | with EdgeConfig.context(silent=True) as config: 10 | with EdgeState.context(config, silent=True) as state: 11 | project_id = config.google_cloud_project.project_id 12 | secret_id = config.experiments.mongodb_connection_string_secret 13 | print(get_connection_string(project_id, secret_id)) 14 | sys.exit(0) 15 | -------------------------------------------------------------------------------- /src/edge/command/experiments/init.py: -------------------------------------------------------------------------------- 1 | from edge.command.common.precommand_check import precommand_checks 2 | from edge.config import EdgeConfig, SacredConfig 3 | from edge.enable_api import enable_service_api 4 | from edge.exception import EdgeException 5 | from edge.path import get_model_dvc_pipeline 6 | from edge.state import EdgeState 7 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark 8 | from edge.sacred import setup_sacred 9 | import questionary 10 | 11 | 12 | def experiments_init(): 13 | intro = "Initialising experiment tracking" 14 | success_title = "Experiment tracking initialised successfully" 15 | success_message = "" 16 | failure_title = "Experiment tracking initialisation failed" 17 | failure_message = "See the errors above. See README for more details." 
18 | with TUI( 19 | intro, 20 | success_title, 21 | success_message, 22 | failure_title, 23 | failure_message 24 | ) as tui: 25 | with EdgeConfig.context(to_save=True) as config: 26 | precommand_checks(config) 27 | with EdgeState.context(config, to_lock=True, to_save=True) as state: 28 | with StepTUI("Enabling required Google Cloud APIs", emoji="☁️"): 29 | with SubStepTUI("Enabling Kubernetes Engine API for experiment tracking"): 30 | enable_service_api("container.googleapis.com", config.google_cloud_project.project_id) 31 | with SubStepTUI("Enabling Secret Manager API for experiment tracking"): 32 | enable_service_api("secretmanager.googleapis.com", config.google_cloud_project.project_id) 33 | with StepTUI("Configuring experiment tracking", emoji="⚙️"): 34 | with SubStepTUI("Configuring Kubernetes cluster name on GCP", status=TUIStatus.NEUTRAL) as sub_step: 35 | sub_step.add_explanation("If a name for an existing cluster is provided, this cluster " 36 | "will be used. Otherwise, vertex:edge will create a new cluster with " 37 | "GKE auto-pilot.") 38 | previous_cluster_name = ( 39 | config.experiments.gke_cluster_name if config.experiments is not None else "sacred" 40 | ) 41 | cluster_name = questionary.text( 42 | "Choose a name for a Kubernetes cluster to use:", 43 | default=previous_cluster_name, 44 | qmark=qmark, 45 | validate=(lambda x: x.strip() != "") 46 | ).ask() 47 | if cluster_name is None: 48 | raise EdgeException("Cluster name is required") 49 | sacred_config = SacredConfig( 50 | gke_cluster_name=cluster_name, 51 | mongodb_connection_string_secret="sacred-mongodb-connection-string" 52 | ) 53 | config.experiments = sacred_config 54 | sacred_state = setup_sacred( 55 | config.google_cloud_project.project_id, 56 | config.google_cloud_project.region, 57 | config.experiments.gke_cluster_name, 58 | config.experiments.mongodb_connection_string_secret, 59 | ) 60 | state.sacred = sacred_state 61 | tui.success_message = ( 62 | f"Now you can track experiments, and 
view them in Omniboard dashboard " 63 | f"at {sacred_state.external_omniboard_string}\n\n" 64 | "What's next? We suggest you proceed with:\n\n" 65 | " Train and deploy a model (see 'Training a model' section of the README for more details):\n" 66 | " ./edge.sh model init fashion\n" 67 | f" dvc repro {get_model_dvc_pipeline('fashion')}\n" 68 | " ./edge.sh model deploy fashion\n\n" 69 | "Happy herding! 🐏" 70 | ) 71 | 72 | 73 | 74 | -------------------------------------------------------------------------------- /src/edge/command/experiments/subparser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | from edge.command.experiments.get_dashboard import get_dashboard 4 | from edge.command.experiments.get_mongodb import get_mongodb 5 | from edge.exception import EdgeException 6 | from edge.command.experiments.init import experiments_init 7 | 8 | 9 | def add_experiments_parser(subparsers): 10 | parser = subparsers.add_parser("experiments", help="Experiments related actions") 11 | actions = parser.add_subparsers(title="action", dest="action", required=True) 12 | actions.add_parser("init", help="Initialise experiments") 13 | actions.add_parser("get-dashboard", help="Get experiment tracker dashboard URL") 14 | actions.add_parser("get-mongodb", help="Get MongoDB connection string") 15 | 16 | 17 | def run_experiments_actions(args: argparse.Namespace): 18 | if args.action == "init": 19 | experiments_init() 20 | elif args.action == "get-dashboard": 21 | get_dashboard() 22 | elif args.action == "get-mongodb": 23 | get_mongodb() 24 | else: 25 | raise EdgeException("Unexpected experiments command") 26 | -------------------------------------------------------------------------------- /src/edge/command/force_unlock.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from edge.config import EdgeConfig 4 | from edge.state import EdgeState 5 | 6 | 7 | def force_unlock(): 
8 | with EdgeConfig.context() as config: 9 | EdgeState.unlock( 10 | config.google_cloud_project.project_id, 11 | config.storage_bucket.bucket_name, 12 | ) 13 | sys.exit(0) 14 | -------------------------------------------------------------------------------- /src/edge/command/init.py: -------------------------------------------------------------------------------- 1 | from edge.command.common.precommand_check import check_gcloud_authenticated, check_project_exists, check_billing_enabled 2 | from edge.config import GCProjectConfig, StorageBucketConfig, EdgeConfig 3 | from edge.enable_api import enable_service_api 4 | from edge.exception import EdgeException 5 | from edge.gcloud import is_authenticated, get_gcloud_account, get_gcloud_project, get_gcloud_region, get_gcp_regions, \ 6 | project_exists, is_billing_enabled 7 | from edge.state import EdgeState 8 | from edge.storage import setup_storage 9 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark 10 | from edge.versions import get_gcloud_version, Version, get_kubectl_version, get_helm_version 11 | from edge.path import get_model_dvc_pipeline 12 | import questionary 13 | 14 | 15 | def edge_init(): 16 | success_title = "Initialised successfully" 17 | success_message = f""" 18 | What's next? We suggest you proceed with: 19 | 20 | Commit the new vertex:edge configuration to git: 21 | git add edge.yaml && git commit -m "Initialise vertex:edge" 22 | 23 | Configure an experiment tracker (optional): 24 | ./edge.sh experiments init 25 | 26 | Configure data version control: 27 | ./edge.sh dvc init 28 | 29 | Train and deploy a model (see 'Training a model' section of the README for more details): 30 | ./edge.sh model init fashion 31 | dvc repro {get_model_dvc_pipeline("fashion")} 32 | ./edge.sh model deploy fashion 33 | 34 | Happy herding! 🐏 35 | """.strip() 36 | failure_title = "Initialisation failed" 37 | failure_message = "See the errors above. See README for more details." 
38 | with TUI( 39 | "Initialising vertex:edge", 40 | success_title, 41 | success_message, 42 | failure_title, 43 | failure_message 44 | ) as tui: 45 | with StepTUI(message="Checking your local environment", emoji="🖥️") as step: 46 | with SubStepTUI("Checking gcloud version") as sub_step: 47 | gcloud_version = get_gcloud_version() 48 | expected_gcloud_version_string = "2021.05.21" 49 | expected_gcloud_version = Version.from_string(expected_gcloud_version_string) 50 | if not gcloud_version.is_at_least(expected_gcloud_version): 51 | raise EdgeException( 52 | f"We found gcloud version {str(gcloud_version)}, " 53 | f"but we require at least {str(expected_gcloud_version)}. " 54 | "Update gcloud by running `gcloud components update`." 55 | ) 56 | 57 | try: 58 | gcloud_alpha_version = get_gcloud_version("alpha") 59 | expected_gcloud_alpha_version_string = "2021.06.00" 60 | expected_gcloud_alpha_version = Version.from_string(expected_gcloud_alpha_version_string) 61 | if not gcloud_alpha_version.is_at_least(expected_gcloud_alpha_version): 62 | raise EdgeException( 63 | f"We found gcloud alpha component version {str(gcloud_alpha_version)}, " 64 | f"but we require at least {str(expected_gcloud_alpha_version)}. " 65 | "Update gcloud by running `gcloud components update`." 66 | ) 67 | except KeyError: 68 | raise EdgeException( 69 | f"We couldn't find the gcloud alpha components, " 70 | f"please install these by running `gcloud components install alpha`" 71 | ) 72 | 73 | with SubStepTUI("Checking kubectl version") as sub_step: 74 | kubectl_version = get_kubectl_version() 75 | expected_kubectl_version_string = "v1.19.0" 76 | expected_kubectl_version = Version.from_string(expected_kubectl_version_string) 77 | if not kubectl_version.is_at_least(expected_kubectl_version): 78 | raise EdgeException( 79 | f"We found kubectl version {str(kubectl_version)}, " 80 | f"but we require at least {str(expected_kubectl_version)}. 
" 81 | "Please visit https://kubernetes.io/docs/tasks/tools/ for installation instructions." 82 | ) 83 | 84 | with SubStepTUI("Checking helm version") as sub_step: 85 | helm_version = get_helm_version() 86 | expected_helm_version_string = "v3.5.2" 87 | expected_helm_version = Version.from_string(expected_helm_version_string) 88 | if not helm_version.is_at_least(expected_helm_version): 89 | raise EdgeException( 90 | f"We found gcloud version {str(helm_version)}, " 91 | f"but we require at least {str(expected_helm_version)}. " 92 | "Please visit https://helm.sh/docs/intro/install/ for installation instructions." 93 | ) 94 | 95 | with StepTUI(message="Checking your GCP environment", emoji="☁️") as step: 96 | with SubStepTUI(message="Verifying GCloud configuration") as sub_step: 97 | gcloud_account = get_gcloud_account() 98 | if gcloud_account is None or gcloud_account == "": 99 | raise EdgeException( 100 | "gcloud account is unset. " 101 | "Run `gcloud auth login && gcloud auth application-default login` to authenticate " 102 | "with the correct account" 103 | ) 104 | 105 | gcloud_project = get_gcloud_project() 106 | if gcloud_project is None or gcloud_project == "": 107 | raise EdgeException( 108 | "gcloud project id is unset. " 109 | "Run `gcloud config set project $PROJECT_ID` to set the correct project id" 110 | ) 111 | 112 | gcloud_region = get_gcloud_region() 113 | if gcloud_region is None or gcloud_region == "": 114 | raise EdgeException( 115 | "gcloud region is unset. 
" 116 | "Run `gcloud config set compute/region $REGION` to set the correct region" 117 | ) 118 | 119 | sub_step.update(status=TUIStatus.NEUTRAL) 120 | sub_step.set_dirty() 121 | 122 | if not questionary.confirm(f"Is this the correct GCloud account: {gcloud_account}", qmark=qmark).ask(): 123 | raise EdgeException( 124 | "Run `gcloud auth login && gcloud auth application-default login` to authenticate " 125 | "with the correct account" 126 | ) 127 | if not questionary.confirm(f"Is this the correct project id: {gcloud_project}", qmark=qmark).ask(): 128 | raise EdgeException("Run `gcloud config set project ` to set the correct project id") 129 | if not questionary.confirm(f"Is this the correct region: {gcloud_region}", qmark=qmark).ask(): 130 | raise EdgeException("Run `gcloud config set compute/region ` to set the correct region") 131 | 132 | check_gcloud_authenticated(gcloud_project) 133 | 134 | with SubStepTUI(f"{gcloud_region} is available on Vertex AI") as sub_step: 135 | if gcloud_region not in get_gcp_regions(gcloud_project): 136 | formatted_regions = "\n ".join(get_gcp_regions(gcloud_project)) 137 | raise EdgeException( 138 | "Vertex AI only works in certain regions. 
" 139 | "Please choose one of the following by running `gcloud config set compute/region `:\n" 140 | f" {formatted_regions}" 141 | ) 142 | 143 | gcloud_config = GCProjectConfig( 144 | project_id=gcloud_project, 145 | region=gcloud_region, 146 | ) 147 | 148 | check_project_exists(gcloud_project) 149 | check_billing_enabled(gcloud_project) 150 | 151 | with StepTUI(message="Initialising Google Storage and vertex:edge state file", emoji="💾") as step: 152 | with SubStepTUI("Enabling Storage API") as sub_step: 153 | enable_service_api("storage-component.googleapis.com", gcloud_project) 154 | 155 | with SubStepTUI("Configuring Google Storage bucket", status=TUIStatus.NEUTRAL) as sub_step: 156 | sub_step.set_dirty() 157 | storage_bucket_name = questionary.text( 158 | "Now you need to choose a name for a storage bucket that will be used for data version control, " 159 | "model assets and keeping track of the vertex:edge state\n " 160 | "NOTE: Storage bucket names must be unique and follow certain conventions. " 161 | "Please see the following guidelines for more information " 162 | "https://cloud.google.com/storage/docs/naming-buckets." 
163 | "\n Enter Storage bucket name to use: ", 164 | qmark=qmark 165 | ).ask().strip() 166 | if storage_bucket_name is None or storage_bucket_name == "": 167 | raise EdgeException("Storage bucket name is required") 168 | 169 | storage_config = StorageBucketConfig( 170 | bucket_name=storage_bucket_name, 171 | dvc_store_directory="dvcstore", 172 | vertex_jobs_directory="vertex", 173 | ) 174 | storage_state = setup_storage(gcloud_project, gcloud_region, storage_bucket_name) 175 | 176 | _state = EdgeState( 177 | storage=storage_state 178 | ) 179 | 180 | _config = EdgeConfig( 181 | google_cloud_project=gcloud_config, 182 | storage_bucket=storage_config, 183 | ) 184 | 185 | skip_saving_state = False 186 | with SubStepTUI("Checking if vertex:edge state file exists") as sub_step: 187 | if EdgeState.exists(_config): 188 | sub_step.update( 189 | "The state file already exists. " 190 | "This means that vertex:edge has already been initialised using this storage bucket.", 191 | status=TUIStatus.WARNING 192 | ) 193 | sub_step.set_dirty() 194 | if not questionary.confirm( 195 | f"Do you want to delete the state and start over (this action is destructive!)", 196 | qmark=qmark, 197 | default=False, 198 | ).ask(): 199 | skip_saving_state = True 200 | 201 | if skip_saving_state: 202 | with SubStepTUI("Saving state file skipped", status=TUIStatus.WARNING) as sub_step: 203 | pass 204 | else: 205 | with SubStepTUI("Saving state file") as sub_step: 206 | _state.save(_config) 207 | 208 | with StepTUI(message="Saving configuration", emoji="⚙️") as step: 209 | with SubStepTUI("Saving configuration to edge.yaml") as sub_step: 210 | _config.save("./edge.yaml") 211 | -------------------------------------------------------------------------------- /src/edge/command/model/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/model/__init__.py 
-------------------------------------------------------------------------------- /src/edge/command/model/deploy.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from serde.json import from_json 4 | from edge.command.common.precommand_check import precommand_checks 5 | from edge.config import EdgeConfig 6 | from edge.exception import EdgeException 7 | from edge.state import EdgeState 8 | from edge.train import TrainedModel 9 | from edge.tui import TUI, StepTUI, SubStepTUI 10 | from edge.vertex_deploy import vertex_deploy 11 | from edge.path import get_model_dvc_pipeline, get_vertex_model_json 12 | 13 | 14 | def model_deploy(model_name: str): 15 | intro = f"Deploying model '{model_name}' on Vertex AI" 16 | success_title = "Model deployed successfully" 17 | success_message = "Success" 18 | failure_title = "Model deployment failed" 19 | failure_message = "See the errors above. See README for more details." 20 | with EdgeConfig.context() as config: 21 | with TUI( 22 | intro, 23 | success_title, 24 | success_message, 25 | failure_title, 26 | failure_message 27 | ) as tui: 28 | precommand_checks(config) 29 | with EdgeState.context(config, to_lock=True, to_save=True) as state: 30 | with StepTUI("Checking model configuration", emoji="🐏"): 31 | with SubStepTUI("Checking that the model is initialised"): 32 | if model_name not in config.models: 33 | raise EdgeException("Model has not been initialised. " 34 | f"Run `./edge.sh model init {model_name}` to initialise.") 35 | if state.models is None or state.models.get(model_name) is None: 36 | raise EdgeException("Model is missing from vertex:edge state. " 37 | "This might mean that the model has not been initialised. 
" 38 | f"Run `./edge.sh model init {model_name}` to initialise.") 39 | endpoint_resource_name = state.models[model_name].endpoint_resource_name 40 | with SubStepTUI("Checking that the model has been trained"): 41 | if not os.path.exists(get_vertex_model_json(model_name)): 42 | raise EdgeException(f"{get_vertex_model_json(model_name)} does not exist. " 43 | "This means that the model has not been trained") 44 | with open(get_vertex_model_json(model_name)) as file: 45 | model = from_json(TrainedModel, file.read()) 46 | if model.is_local: 47 | raise EdgeException("This model was trained locally, and hence cannot be deployed " 48 | "on Vertex AI") 49 | model_resource_name = model.model_name 50 | 51 | vertex_deploy(endpoint_resource_name, model_resource_name, model_name) 52 | 53 | state.models[model_name].deployed_model_resource_name = model_resource_name 54 | 55 | short_endpoint_resource_name = "/".join(endpoint_resource_name.split("/")[2:]) 56 | tui.success_message = ( 57 | "You can see the deployed model at " 58 | f"https://console.cloud.google.com/vertex-ai/" 59 | f"{short_endpoint_resource_name}?project={config.google_cloud_project.project_id}\n\n" 60 | "Happy herding! 🐏" 61 | ) 62 | 63 | -------------------------------------------------------------------------------- /src/edge/command/model/describe.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from dataclasses import dataclass 3 | from serde import serialize 4 | from serde.yaml import to_yaml 5 | from edge.config import EdgeConfig, ModelConfig 6 | from edge.state import ModelState, EdgeState 7 | 8 | 9 | @serialize 10 | @dataclass 11 | class Description: 12 | config: ModelConfig 13 | state: ModelState 14 | 15 | 16 | def describe_model(model_name): 17 | with EdgeConfig.context(silent=True) as config: 18 | if model_name not in config.models: 19 | print(f"'{model_name}' model is not initialised. 
" 20 | f"Initialise it by running `./edge.sh model init {model_name}`") 21 | sys.exit(1) 22 | else: 23 | with EdgeState.context(config, silent=True) as state: 24 | description = Description( 25 | config.models[model_name], 26 | state.models[model_name] 27 | ) 28 | print(to_yaml(description)) 29 | sys.exit(0) 30 | -------------------------------------------------------------------------------- /src/edge/command/model/get_endpoint.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from edge.config import EdgeConfig 4 | from edge.state import EdgeState 5 | import questionary 6 | 7 | 8 | def get_model_endpoint(model_name: str): 9 | with EdgeConfig.context(silent=True) as config: 10 | if config.models is None or model_name not in config.models: 11 | questionary.print("Model is not initialised. Initialise it by running `./edge.sh model init`.", 12 | style="fg:ansired") 13 | sys.exit(1) 14 | with EdgeState.context(config, silent=True) as state: 15 | print(state.models[model_name].endpoint_resource_name) 16 | sys.exit(0) 17 | -------------------------------------------------------------------------------- /src/edge/command/model/init.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | from edge.command.common.precommand_check import precommand_checks 4 | from edge.config import EdgeConfig, ModelConfig 5 | from edge.enable_api import enable_service_api 6 | from edge.endpoint import setup_endpoint 7 | from edge.exception import EdgeException 8 | from edge.state import EdgeState 9 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark 10 | from edge.path import get_model_dvc_pipeline 11 | import questionary 12 | 13 | 14 | def model_init(model_name: str): 15 | intro = f"Initialising model '{model_name}' on Vertex AI" 16 | success_title = "Model initialised successfully" 17 | success_message = f""" 18 | What's next? 
We suggest you proceed with: 19 | 20 | Train and deploy a model (see 'Training a model' section of the README for more details): 21 | dvc repro {get_model_dvc_pipeline(model_name)} 22 | ./edge.sh model deploy {model_name} 23 | 24 | Happy herding! 🐏 25 | """.strip() 26 | failure_title = "Model initialisation failed" 27 | failure_message = "See the errors above. See README for more details." 28 | with TUI( 29 | intro, 30 | success_title, 31 | success_message, 32 | failure_title, 33 | failure_message 34 | ) as tui: 35 | with EdgeConfig.context(to_save=True) as config: 36 | precommand_checks(config) 37 | with EdgeState.context(config, to_lock=True, to_save=True) as state: 38 | with StepTUI("Enabling required Google Cloud APIs", emoji="☁️"): 39 | with SubStepTUI("Enabling Vertex AI API for model training and deployment"): 40 | enable_service_api("aiplatform.googleapis.com", config.google_cloud_project.project_id) 41 | 42 | with StepTUI(f"Configuring model '{model_name}'", emoji="⚙️"): 43 | with SubStepTUI(f"Checking if model '{model_name}' is configured") as sub_step: 44 | if model_name in config.models: 45 | sub_step.update(f"Model '{model_name}' is already configured", status=TUIStatus.WARNING) 46 | sub_step.set_dirty() 47 | if not questionary.confirm( 48 | f"Do you want to configure model '{model_name}' again?", 49 | qmark=qmark 50 | ).ask(): 51 | raise EdgeException(f"Configuration for model '{model_name}' already exists") 52 | else: 53 | sub_step.update( 54 | message=f"Model '{model_name}' is not configured", 55 | status=TUIStatus.NEUTRAL 56 | ) 57 | with SubStepTUI(f"Creating model '{model_name}' configuration"): 58 | model_config = ModelConfig( 59 | name=model_name, 60 | endpoint_name=f"{model_name}-endpoint" 61 | ) 62 | config.models[model_name] = model_config 63 | 64 | endpoint_name = config.models[model_name].endpoint_name 65 | 66 | model_state = setup_endpoint( 67 | config.google_cloud_project.project_id, 68 | config.google_cloud_project.region, 69 | endpoint_name 70 | ) 
71 | 72 | directory_exists = False 73 | pipeline_exists = False 74 | with StepTUI("Checking project directory structure", emoji="📁"): 75 | with SubStepTUI(f"Checking that 'models/{model_name}' directory exists") as sub_step: 76 | if not (os.path.exists(f"models/{model_name}") and os.path.isdir(f"models/{model_name}")): 77 | sub_step.update( 78 | message=f"'models/{model_name}' directory does not exist", 79 | status=TUIStatus.NEUTRAL 80 | ) 81 | else: 82 | directory_exists = True 83 | if directory_exists: 84 | with SubStepTUI(f"Checking that 'models/{model_name}/dvc.yaml' pipeline exists") as sub_step: 85 | if not os.path.exists(f"models/{model_name}/dvc.yaml"): 86 | sub_step.update( 87 | message=f"'models/{model_name}/dvc.yaml' pipeline does not exist", 88 | status=TUIStatus.NEUTRAL 89 | ) 90 | else: 91 | pipeline_exists = True 92 | 93 | if state.models is None: 94 | state.models = {} 95 | state.models[model_name] = model_state 96 | 97 | if not directory_exists or not pipeline_exists: 98 | tui.success_message = (f"Note that the 'models/{model_name}' directory or its dvc.yaml pipeline does not exist yet; create them before training.\n\n" + tui.success_message) 99 | -------------------------------------------------------------------------------- /src/edge/command/model/list.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from edge.config import EdgeConfig 4 | 5 | 6 | def list_models(): 7 | with EdgeConfig.context(silent=True) as config: 8 | print("Configured models:") 9 | print("\n".join([f" - {x}" for x in config.models.keys()])) 10 | sys.exit(0) 11 | -------------------------------------------------------------------------------- /src/edge/command/model/remove.py: -------------------------------------------------------------------------------- 1 | import questionary 2 | 3 | from edge.command.common.precommand_check import precommand_checks 4 | from edge.config import EdgeConfig 5 | from edge.endpoint import tear_down_endpoint 6 | from edge.exception import EdgeException 7 | from edge.state import EdgeState 8 | from edge.tui import 
TUI, StepTUI, SubStepTUI, TUIStatus, qmark 9 | 10 | 11 | def remove_model(model_name): 12 | intro = f"Removing model '{model_name}' from vertex:edge" 13 | success_title = "Model removed successfully" 14 | success_message = "Success" 15 | failure_title = "Model removal failed" 16 | failure_message = "See the errors above. See README for more details." 17 | with TUI( 18 | intro, 19 | success_title, 20 | success_message, 21 | failure_title, 22 | failure_message 23 | ) as tui: 24 | with EdgeConfig.context(to_save=True) as config: 25 | precommand_checks(config) 26 | with EdgeState.context(config, to_save=True, to_lock=True) as state: 27 | with StepTUI(f"Checking model '{model_name}' configuration and state", emoji="🐏"): 28 | with SubStepTUI(f"Checking model '{model_name}' configuration"): 29 | if model_name not in config.models: 30 | raise EdgeException(f"'{model_name}' model is not in `edge.yaml` configuration, so it " 31 | f"cannot be removed.") 32 | with SubStepTUI(f"Checking model '{model_name}' state"): 33 | if model_name not in state.models: 34 | raise EdgeException(f"'{model_name}' is not in vertex:edge state, which suggests that " 35 | f"it has not been initialised. 
Cannot be removed") 36 | with SubStepTUI("Confirming action", status=TUIStatus.WARNING) as sub_step: 37 | sub_step.add_explanation(f"This action will undeploy '{model_name}' model from Vertex AI, " 38 | f"delete the Vertex AI endpoint associated with '{model_name}' model, " 39 | f"and remove '{model_name}' model from vertex:edge config and " 40 | f"state.") 41 | if not questionary.confirm("Do you want to continue?", qmark=qmark, default=False).ask(): 42 | raise EdgeException("Canceled by user") 43 | 44 | with StepTUI(f"Removing '{model_name}' model"): 45 | with SubStepTUI(f"Deleting '{state.models[model_name].endpoint_resource_name}' endpoint"): 46 | tear_down_endpoint(state.models[model_name].endpoint_resource_name) 47 | with SubStepTUI(f"Removing '{model_name}' model from config and state"): 48 | del config.models[model_name] 49 | del state.models[model_name] 50 | 51 | -------------------------------------------------------------------------------- /src/edge/command/model/subparser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | from edge.command.model.deploy import model_deploy 4 | from edge.command.model.describe import describe_model 5 | from edge.command.model.get_endpoint import get_model_endpoint 6 | from edge.command.model.init import model_init 7 | from edge.command.model.list import list_models 8 | from edge.command.model.remove import remove_model 9 | from edge.command.model.template import create_model_from_template 10 | from edge.exception import EdgeException 11 | 12 | 13 | def add_model_parser(subparsers): 14 | parser = subparsers.add_parser("model", help="Model related actions") 15 | actions = parser.add_subparsers(title="action", dest="action", required=True) 16 | 17 | init_parser = actions.add_parser("init", help="Initialise model on Vertex AI") 18 | init_parser.add_argument("model_name", metavar="model-name", help="Model name") 19 | 20 | deploy_parser = actions.add_parser("deploy", 
help="Deploy model on Vertex AI") 21 | deploy_parser.add_argument("model_name", metavar="model-name", help="Model name") 22 | 23 | get_endpoint_parser = actions.add_parser("get-endpoint", help="Get Vertex AI endpoint URI") 24 | get_endpoint_parser.add_argument("model_name", metavar="model-name", help="Model name") 25 | 26 | actions.add_parser("list", help="List initialised models") 27 | 28 | describe_parser = actions.add_parser("describe", help="Describe an initialised model") 29 | describe_parser.add_argument("model_name", metavar="model-name", help="Model name") 30 | 31 | remove_parser = actions.add_parser("remove", help="Remove an initialised model from vertex:edge") 32 | remove_parser.add_argument("model_name", metavar="model-name", help="Model name") 33 | 34 | template_parser = actions.add_parser("template", help="Create a model pipeline from a template") 35 | template_parser.add_argument("model_name", metavar="model-name", help="Model name") 36 | template_parser.add_argument("-f", action="store_true", 37 | help="Force override a pipeline directory if already exists") 38 | 39 | 40 | def run_model_actions(args: argparse.Namespace): 41 | if args.action == "init": 42 | model_init(args.model_name) 43 | elif args.action == "deploy": 44 | model_deploy(args.model_name) 45 | elif args.action == "get-endpoint": 46 | get_model_endpoint(args.model_name) 47 | elif args.action == "list": 48 | list_models() 49 | elif args.action == "describe": 50 | describe_model(args.model_name) 51 | elif args.action == "remove": 52 | remove_model(args.model_name) 53 | elif args.action == "template": 54 | create_model_from_template(args.model_name, args.f) 55 | else: 56 | raise EdgeException("Unexpected model command") 57 | -------------------------------------------------------------------------------- /src/edge/command/model/template.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from edge.command.common.precommand_check import 
precommand_checks 4 | from edge.config import EdgeConfig 5 | from edge.exception import EdgeException 6 | from edge.state import EdgeState 7 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark 8 | from cookiecutter.main import cookiecutter 9 | from cookiecutter.exceptions import OutputDirExistsException 10 | import questionary 11 | 12 | 13 | def create_model_from_template(model_name: str, force: bool = False): 14 | intro = f"Creating model pipeline '{model_name}' from a template" 15 | success_title = "Pipeline is created from a template" 16 | success_message = "Success" 17 | failure_title = "Pipeline creation failed" 18 | failure_message = "See the errors above. See README for more details." 19 | with EdgeConfig.context() as config: 20 | with TUI( 21 | intro, 22 | success_title, 23 | success_message, 24 | failure_title, 25 | failure_message 26 | ) as tui: 27 | precommand_checks(config) 28 | with EdgeState.context(config) as state: 29 | with StepTUI("Checking model configuration", emoji="🐏"): 30 | with SubStepTUI("Checking that the model is initialised"): 31 | if model_name not in config.models: 32 | raise EdgeException("Model has not been initialised. " 33 | f"Run `./edge.sh model init {model_name}` to initialise.") 34 | if state.models is None or state.models.get(model_name) is None: 35 | raise EdgeException("Model is missing from vertex:edge state. " 36 | "This might mean that the model has not been initialised. 
" 37 | f"Run `./edge.sh model init {model_name}` to initialise.") 38 | with StepTUI("Creating pipeline from a template", emoji="🐏"): 39 | with SubStepTUI("Choosing model pipeline template", status=TUIStatus.NEUTRAL) as substep: 40 | substep.set_dirty() 41 | templates = { 42 | "tensorflow": "tensorflow_model", 43 | } 44 | pipeline_template = questionary.select( 45 | "Choose model template", 46 | templates.keys(), 47 | qmark=qmark 48 | ).ask() 49 | if pipeline_template is None: 50 | raise EdgeException("Pipeline template must be selected") 51 | pipeline_template = templates[pipeline_template] 52 | with SubStepTUI(f"Applying template '{pipeline_template}'"): 53 | try: 54 | cookiecutter( 55 | os.path.join( 56 | os.path.dirname(os.path.abspath(__file__)), 57 | f"../../templates/{pipeline_template}/" 58 | ), 59 | output_dir="models/", 60 | extra_context={ 61 | "model_name": model_name 62 | }, 63 | no_input=True, 64 | overwrite_if_exists=force, 65 | ) 66 | except OutputDirExistsException as exc: 67 | raise EdgeException( 68 | f"Pipeline directory 'models/{model_name}' already exists, so the template cannot be " 69 | f"applied. If you want to override the existing pipeline, run `edge model template " 70 | f"{model_name} -f`." 
71 | ) 72 | -------------------------------------------------------------------------------- /src/edge/config.py: -------------------------------------------------------------------------------- 1 | import inspect 2 | import sys 3 | from dataclasses import dataclass, field 4 | from typing import TypeVar, Type, Optional, Dict 5 | from serde import serialize, deserialize 6 | from serde.yaml import from_yaml, to_yaml 7 | from contextlib import contextmanager 8 | 9 | from edge.tui import StepTUI, SubStepTUI 10 | from edge.path import get_default_config_path, get_default_config_path_from_model 11 | 12 | 13 | @deserialize 14 | @serialize 15 | @dataclass 16 | class GCProjectConfig: 17 | project_id: str 18 | region: str 19 | 20 | 21 | @deserialize 22 | @serialize 23 | @dataclass 24 | class StorageBucketConfig: 25 | bucket_name: str 26 | dvc_store_directory: str 27 | vertex_jobs_directory: str 28 | 29 | 30 | @deserialize 31 | @serialize 32 | @dataclass 33 | class SacredConfig: 34 | gke_cluster_name: str 35 | mongodb_connection_string_secret: str 36 | 37 | 38 | @deserialize 39 | @serialize 40 | @dataclass 41 | class ModelConfig: 42 | name: str 43 | endpoint_name: str 44 | training_container_image_uri: str = "europe-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6:latest" 45 | serving_container_image_uri: str = "europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest" 46 | 47 | 48 | T = TypeVar("T", bound="EdgeConfig") 49 | 50 | 51 | @deserialize 52 | @serialize 53 | @dataclass 54 | class EdgeConfig: 55 | google_cloud_project: GCProjectConfig 56 | storage_bucket: StorageBucketConfig 57 | experiments: Optional[SacredConfig] = None 58 | models: Dict[str, ModelConfig] = field(default_factory=dict) 59 | 60 | def save(self, path: str): 61 | with open(path, "w") as f: 62 | f.write(to_yaml(self)) 63 | 64 | def __str__(self) -> str: 65 | return to_yaml(self) 66 | 67 | @classmethod 68 | def from_string(cls: Type[T], string: str) -> T: 69 | return from_yaml(EdgeConfig, string) 70 | 
71 | @classmethod 72 | def load(cls: Type[T], path: str) -> T: 73 | with open(path) as f: 74 | yaml_str = f.read() 75 | 76 | return from_yaml(EdgeConfig, yaml_str) 77 | 78 | @classmethod 79 | def load_default(cls: Type[T]) -> T: 80 | config_path = get_default_config_path_from_model(inspect.getframeinfo(sys._getframe(1)).filename) 81 | config = EdgeConfig.load(config_path) 82 | return config 83 | 84 | @classmethod 85 | @contextmanager 86 | def context(cls: Type[T], config_path: Optional[str] = None, to_save: bool = False, silent: bool = False) -> T: 87 | if config_path is None: 88 | config_path = get_default_config_path() 89 | config = EdgeConfig.load(config_path) 90 | try: 91 | yield config 92 | finally: 93 | if to_save: 94 | with StepTUI("Saving vertex:edge configuration", emoji="💾", silent=silent): 95 | with SubStepTUI("Saving vertex:edge configuration", silent=silent): 96 | config.save(config_path) 97 | -------------------------------------------------------------------------------- /src/edge/dvc.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import subprocess 3 | from typing import Optional 4 | 5 | import os 6 | import shutil 7 | from edge.exception import EdgeException 8 | from edge.tui import StepTUI, SubStepTUI, TUIStatus, qmark 9 | import questionary 10 | 11 | 12 | def dvc_exists() -> bool: 13 | return os.path.exists(".dvc") and os.path.isdir(".dvc") 14 | 15 | 16 | def dvc_init(): 17 | with StepTUI("Initialising DVC", emoji="🔨"): 18 | with SubStepTUI("Initialising DVC"): 19 | try: 20 | subprocess.check_output("dvc init", shell=True, stderr=subprocess.DEVNULL) 21 | except subprocess.CalledProcessError as e: 22 | raise EdgeException(f"Unexpected error occurred while initialising DVC:\n{str(e)}") 23 | 24 | 25 | def dvc_destroy(): 26 | with StepTUI("Destroying DVC", emoji="🔥"): 27 | with SubStepTUI("Deleting DVC configuration [.dvc]"): 28 | shutil.rmtree(".dvc") 29 | with SubStepTUI("Deleting DVC 
data [data/fashion-mnist/*.dvc]"): 30 | for f in glob.glob("data/fashion-mnist/*.dvc"): 31 | os.remove(f) 32 | with SubStepTUI("Deleting pipeline lock file [models/pipelines/fashion/dvc.lock]"): 33 | if os.path.exists("models/pipelines/fashion/dvc.lock"): 34 | os.remove("models/pipelines/fashion/dvc.lock") 35 | 36 | 37 | def dvc_remote_exists(path: str) -> (bool, bool): 38 | try: 39 | remotes_raw = subprocess.check_output("dvc remote list", shell=True, stderr=subprocess.DEVNULL).decode("utf-8") 40 | remotes = [x.split("\t") for x in remotes_raw.strip().split("\n") if len(x.split("\t")) == 2] 41 | for remote in remotes: 42 | if remote[0] == "storage": 43 | if remote[1].strip() == path: 44 | return True, True 45 | else: 46 | return True, False 47 | return False, False 48 | except subprocess.CalledProcessError as e: 49 | raise EdgeException(f"Unexpected error occurred while listing DVC remotes:\n{str(e)}") 50 | 51 | 52 | def get_dvc_storage_path() -> Optional[str]: 53 | try: 54 | remotes_raw = subprocess.check_output("dvc remote list", shell=True, stderr=subprocess.DEVNULL).decode("utf-8") 55 | remotes = [x.split("\t") for x in remotes_raw.strip().split("\n") if len(x.split("\t")) == 2] 56 | for remote in remotes: 57 | if remote[0] == "storage": 58 | return remote[1].strip() 59 | return None 60 | except subprocess.CalledProcessError as e: 61 | raise EdgeException(f"Unexpected error occurred while getting DVC remote storage path:\n{str(e)}") 62 | 63 | 64 | def dvc_add_remote(path: str): 65 | with StepTUI("Configuring DVC remote storage", emoji="⚙️"): 66 | with SubStepTUI(f"Adding '{path}' as DVC remote storage URI") as sub_step: 67 | try: 68 | storage_exists, correct_path = dvc_remote_exists(path) 69 | if storage_exists: 70 | if correct_path: 71 | return 72 | else: 73 | sub_step.update(f"Modifying existing storage URI to '{path}'") 74 | subprocess.check_output( 75 | f"dvc remote modify storage url {path} && dvc remote default storage", shell=True, 76 | 
stderr=subprocess.DEVNULL 77 | ) 78 | else: 79 | subprocess.check_output(f"dvc remote add storage {path} && dvc remote default storage", shell=True, 80 | stderr=subprocess.DEVNULL) 81 | except subprocess.CalledProcessError as e: 82 | raise EdgeException(f"Unexpected error occurred while adding remote storage to DVC:\n{str(e)}") 83 | 84 | 85 | def setup_dvc(bucket_path: str, dvc_store_directory: str): 86 | storage_path = os.path.join(bucket_path, dvc_store_directory) 87 | exists = False 88 | is_remote_correct = False 89 | to_destroy = False 90 | with StepTUI("Checking DVC configuration", emoji="🔍"): 91 | with SubStepTUI("Checking if DVC is already initialised") as sub_step: 92 | exists = dvc_exists() 93 | if not exists: 94 | sub_step.update( 95 | message="DVC is not initialised", 96 | status=TUIStatus.NEUTRAL 97 | ) 98 | if exists: 99 | with SubStepTUI("Checking if DVC remote storage is configured") as sub_step: 100 | configured_storage_path = get_dvc_storage_path() 101 | is_remote_correct = storage_path == configured_storage_path 102 | if configured_storage_path is None: 103 | sub_step.update( 104 | f"DVC remote storage is not configured", 105 | status=TUIStatus.NEUTRAL 106 | ) 107 | elif not is_remote_correct: 108 | sub_step.update( 109 | f"DVC remote storage does not match vertex:edge config", 110 | status=TUIStatus.WARNING 111 | ) 112 | sub_step.set_dirty() 113 | sub_step.add_explanation( 114 | f"DVC remote storage is configured to '{configured_storage_path}', " 115 | f"but vertex:edge has been configured to use '{storage_path}'. " 116 | f"This might mean that DVC has been already initialised to work with " 117 | f"a different GCP environment. " 118 | f"If this is the case, we recommend to reinitialise DVC from scratch. \n\n " 119 | f"Alternatively, you may have changed the bucket name on purpose in your GCP environment. 
" 120 | f"In which case, DVC does not need to be reinitialised, and your DVC config will be " 121 | f"updated to match your vertex:edge config." 122 | ) 123 | to_destroy = questionary.confirm( 124 | "Do you want to destroy DVC and initialise it from scratch? (this action is destructive!)", 125 | default=False, 126 | qmark=qmark 127 | ).ask() 128 | if to_destroy is None: 129 | raise EdgeException("Canceled by user") 130 | if to_destroy: 131 | dvc_destroy() 132 | exists = False 133 | is_remote_correct = False 134 | 135 | # Checking again, DVC might have been destroyed by this point 136 | if not exists: 137 | dvc_init() 138 | 139 | if not is_remote_correct: 140 | dvc_add_remote(storage_path) 141 | -------------------------------------------------------------------------------- /src/edge/enable_api.py: -------------------------------------------------------------------------------- 1 | """ 2 | Enabling Google Cloud APIs 3 | """ 4 | import json 5 | import os 6 | import subprocess 7 | from .exception import EdgeException 8 | from .config import EdgeConfig 9 | 10 | 11 | def enable_api(_config: EdgeConfig): 12 | """ 13 | Enable all necessary APIs (deprecated) 14 | 15 | :param _config: 16 | :return: 17 | """ 18 | print("# Enabling necessary Google Cloud APIs") 19 | project_id = _config.google_cloud_project.project_id 20 | 21 | print("## Kubernetes Engine") 22 | print("Required for installing the experiment tracker") 23 | os.system(f"gcloud services enable container.googleapis.com --project {project_id}") 24 | 25 | print("## Storage") 26 | print("Required for DVC remote storage, Vertex AI artifact storage, and Vertex:Edge state") 27 | os.system(f"gcloud services enable storage-component.googleapis.com --project {project_id}") 28 | 29 | print("## Vertex AI") 30 | print("Required for training and deploying on Vertex AI") 31 | os.system(f"gcloud services enable aiplatform.googleapis.com --project {project_id}") 32 | 33 | print("## Secret Manager") 34 | print("Required for 
secret sharing, including connection strings for the experiment tracker") 35 | os.system(f"gcloud services enable secretmanager.googleapis.com --project {project_id}") 36 | 37 | print("## Cloud Run") 38 | print("Required for deploying the webapp") 39 | os.system(f"gcloud services enable run.googleapis.com --project {project_id}") 40 | 41 | 42 | def is_service_api_enabled(service_name: str, project_id: str) -> bool: 43 | """ 44 | Check if a [service_name] API is enabled 45 | 46 | :param service_name: 47 | :param project_id: 48 | :return: 49 | """ 50 | try: 51 | enabled_services = json.loads(subprocess.check_output( 52 | f"gcloud services list --enabled --project {project_id} --format json", 53 | shell=True, 54 | stderr=subprocess.STDOUT 55 | ).decode("utf-8")) 56 | for service in enabled_services: 57 | if service_name in service["name"]: 58 | return True 59 | return False 60 | except subprocess.CalledProcessError as error: 61 | parse_enable_service_api_error(service_name, error) 62 | return False 63 | 64 | 65 | def enable_service_api(service: str, project_id: str): 66 | """ 67 | Enable [service] API 68 | 69 | :param service: 70 | :param project_id: 71 | :return: 72 | """ 73 | if not is_service_api_enabled(service, project_id): 74 | try: 75 | subprocess.check_output( 76 | f"gcloud services enable {service} --project {project_id}", 77 | shell=True, 78 | stderr=subprocess.STDOUT 79 | ) 80 | except subprocess.CalledProcessError as error: 81 | parse_enable_service_api_error(service, error) 82 | 83 | 84 | def parse_enable_service_api_error(service: str, error: subprocess.CalledProcessError): 85 | """ 86 | Parse errors coming from `gcloud services` commands 87 | 88 | :param service: 89 | :param error: 90 | :return: 91 | """ 92 | output = error.output.decode("utf-8") 93 | if output.startswith("ERROR: (gcloud.services.enable) PERMISSION_DENIED"): 94 | raise EdgeException(f"Service '{service}' cannot be enabled because you have insufficient permissions " 95 | f"on Google 
Cloud") 96 | 97 | raise error 98 | -------------------------------------------------------------------------------- /src/edge/endpoint.py: -------------------------------------------------------------------------------- 1 | """ 2 | Performing operations on Vertex AI endpoints 3 | """ 4 | import re 5 | from typing import Optional 6 | from google.cloud import aiplatform 7 | from google.api_core.exceptions import PermissionDenied 8 | from .config import EdgeConfig 9 | from .exception import EdgeException 10 | from .state import ModelState, EdgeState 11 | from .tui import StepTUI, SubStepTUI, TUIStatus 12 | 13 | 14 | def get_endpoint(sub_step: SubStepTUI, project_id: str, region: str, endpoint_name: str) -> Optional[str]: 15 | """ 16 | Get Vertex AI endpoint resource name 17 | 18 | :param sub_step: 19 | :param project_id: 20 | :param region: 21 | :param endpoint_name: 22 | :return: 23 | """ 24 | endpoints = aiplatform.Endpoint.list( 25 | filter=f'display_name="{endpoint_name}"', 26 | project=project_id, 27 | location=region, 28 | ) 29 | if len(endpoints) > 1: 30 | sub_step.update(status=TUIStatus.WARNING) 31 | sub_step.add_explanation( 32 | f"Multiple endpoints with '{endpoint_name}' name were found. 
Vertex:edge will use the first one found" 33 | ) 34 | elif len(endpoints) == 0: 35 | return None 36 | return endpoints[0].resource_name 37 | 38 | 39 | def create_endpoint(project_id: str, region: str, endpoint_name: str) -> str: 40 | """ 41 | Create an endpoint on Vertex AI 42 | 43 | :param project_id: 44 | :param region: 45 | :param endpoint_name: 46 | :return: 47 | """ 48 | try: 49 | endpoint = aiplatform.Endpoint.create(display_name=endpoint_name, project=project_id, location=region) 50 | 51 | return endpoint.resource_name 52 | except PermissionDenied as error: 53 | try: 54 | permission = re.search("Permission '(.*)' denied", error.args[0]).group(1) 55 | raise EdgeException( 56 | f"Endpoint '{endpoint_name}' could not be created in project '{project_id}' " 57 | f"because you have insufficient permission. Make sure you have '{permission}' permission." 58 | ) from error 59 | except AttributeError as attribute_error: 60 | raise error from attribute_error 61 | 62 | 63 | def setup_endpoint(project_id: str, region: str, endpoint_name: str) -> ModelState: 64 | """ 65 | Setup procedure for Vertex AI endpoint 66 | 67 | :param project_id: 68 | :param region: 69 | :param endpoint_name: 70 | :return: 71 | """ 72 | with StepTUI("Configuring Vertex AI endpoint", emoji="☁️"): 73 | with SubStepTUI(f"Checking if Vertex AI endpoint '{endpoint_name}' exists") as sub_step: 74 | endpoint_resource_name = get_endpoint( 75 | sub_step, project_id, region, endpoint_name 76 | ) 77 | if endpoint_resource_name is None: 78 | sub_step.update(message=f"'{endpoint_name}' endpoint does not exist, creating...") 79 | endpoint_resource_name = create_endpoint( 80 | project_id, region, endpoint_name 81 | ) 82 | sub_step.update(message=f"Created '{endpoint_name}' endpoint") 83 | return ModelState(endpoint_resource_name=endpoint_resource_name) 84 | 85 | 86 | def tear_down_endpoint(endpoint_resource_name: str): 87 | endpoint = aiplatform.Endpoint(endpoint_resource_name) 88 | endpoint.undeploy_all() 89 | 
endpoint.delete() 90 | -------------------------------------------------------------------------------- /src/edge/exception.py: -------------------------------------------------------------------------------- 1 | class EdgeException(Exception): 2 | def __init__(self, message: str, fatal: bool = True): 3 | self.fatal = fatal 4 | super().__init__(message) 5 | -------------------------------------------------------------------------------- /src/edge/gcloud.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import subprocess 4 | from typing import List 5 | from .exception import EdgeException 6 | 7 | # Regions that are supported for Vertex AI training and deployment 8 | regions = [ 9 | "us-central1", 10 | "europe-west4", 11 | "asia-east1", 12 | "asia-northeast1", 13 | "asia-northeast3", 14 | "asia-southeast1", 15 | "australia-southeast1", 16 | "europe-west1", 17 | "europe-west2", 18 | "northamerica-northeast1", 19 | "us-west1", 20 | "us-east1", 21 | "us-east4", 22 | ] 23 | 24 | 25 | def get_gcp_regions(project: str) -> List[str]: 26 | return regions 27 | 28 | 29 | def get_gcloud_account() -> str: 30 | return ( 31 | subprocess.check_output("gcloud config get-value account", shell=True, stderr=subprocess.DEVNULL) 32 | .decode("utf-8") 33 | .strip() 34 | ) 35 | 36 | 37 | def get_gcloud_project() -> str: 38 | return ( 39 | subprocess.check_output("gcloud config get-value project", shell=True, stderr=subprocess.DEVNULL) 40 | .decode("utf-8") 41 | .strip() 42 | ) 43 | 44 | 45 | def get_gcloud_region() -> str: 46 | return ( 47 | subprocess.check_output("gcloud config get-value compute/region", shell=True, stderr=subprocess.DEVNULL) 48 | .decode("utf-8") 49 | .strip() 50 | ) 51 | 52 | 53 | def is_billing_enabled(project: str) -> bool: 54 | try: 55 | response = json.loads( 56 | subprocess.check_output( 57 | f"gcloud alpha billing projects describe {project} --format json", shell=True, 
stderr=subprocess.DEVNULL 58 | ) 59 | ) 60 | return response["billingEnabled"] 61 | except subprocess.CalledProcessError: 62 | raise EdgeException( 63 | f"Unable to access billing information for project '{project}'. " 64 | f"Please verify that the project ID is valid and your user has permissions " 65 | f"to access the billing information for this project.", 66 | fatal=False 67 | ) 68 | 69 | 70 | def is_authenticated(project_id: str) -> (bool, str): 71 | """ 72 | Check if gcloud is authenticated 73 | :return: is authenticated, and the reason if not 74 | """ 75 | try: 76 | subprocess.check_output("gcloud auth print-access-token", shell=True, stderr=subprocess.DEVNULL) 77 | except subprocess.CalledProcessError: 78 | return False, "gcloud is not authenticated. Run `gcloud auth login`." 79 | 80 | try: 81 | credentials = json.loads(subprocess.check_output( 82 | "gcloud auth application-default print-access-token --format json", shell=True, stderr=subprocess.DEVNULL 83 | ).decode("utf-8")) 84 | if credentials["quota_project_id"] != project_id: 85 | return False, f"Quota project id does not match '{project_id}'. Please generate new application default" \ 86 | f" credentials by running `gcloud auth application-default login`" 87 | return True, "" 88 | except subprocess.CalledProcessError: 89 | return ( 90 | False, 91 | "gcloud does not have application default credentials configured. " 92 | "Run `gcloud auth application-default login`.", 93 | ) 94 | 95 | 96 | def project_exists(project_id: str) -> bool: 97 | try: 98 | subprocess.check_output(f"gcloud projects describe {project_id}", shell=True, stderr=subprocess.DEVNULL) 99 | return True 100 | except subprocess.CalledProcessError: 101 | raise EdgeException( 102 | f"Unable to find project {project_id}. " 103 | "This means it does not exist or you do not have permissions to access it. " 104 | "Please verify that the project ID is valid in Google Cloud Console." 
105 | ) 106 | -------------------------------------------------------------------------------- /src/edge/k8s/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/k8s/__init__.py -------------------------------------------------------------------------------- /src/edge/k8s/omniboard.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: omniboard-deployment 5 | labels: 6 | app: omniboard 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: omniboard 12 | template: 13 | metadata: 14 | labels: 15 | app: omniboard 16 | spec: 17 | containers: 18 | - name: nginx 19 | image: vivekratnavel/omniboard 20 | env: 21 | - name: MONGO_URI 22 | valueFrom: 23 | secretKeyRef: 24 | name: mongodb-connection 25 | key: internal 26 | ports: 27 | - containerPort: 9000 28 | --- 29 | apiVersion: v1 30 | kind: Service 31 | metadata: 32 | name: omniboard-lb 33 | spec: 34 | type: LoadBalancer 35 | selector: 36 | app: omniboard 37 | ports: 38 | - protocol: TCP 39 | port: 9000 40 | targetPort: 9000 41 | -------------------------------------------------------------------------------- /src/edge/path.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | # TODO: Document all 4 | 5 | def get_default_config_path(): 6 | # TODO: Document env var 7 | path = os.environ.get("EDGE_CONFIG_PATH") 8 | if path is None: 9 | path = os.path.join(os.getcwd(), "edge.yaml") 10 | return path 11 | 12 | 13 | def get_default_config_path_from_model(caller: str): 14 | path = os.environ.get("EDGE_CONFIG_PATH") 15 | if path is None: 16 | path = os.path.join(os.path.dirname(caller), "../../", "edge.yaml") 17 | return path 18 | 19 | 20 | def get_model_path(model_name: str): 21 | return f"models/{model_name}" 22 | 23 
| 24 | def get_model_dvc_pipeline(model_name: str): 25 | return os.path.join(get_model_path(model_name), "dvc.yaml") 26 | 27 | 28 | def get_vertex_model_json(model_name: str): 29 | return os.path.join(get_model_path(model_name), "trained_model.json") 30 | 31 | -------------------------------------------------------------------------------- /src/edge/sacred.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import subprocess 4 | import time 5 | from edge.config import EdgeConfig 6 | from google.cloud import container_v1 7 | from google.cloud.container_v1 import Cluster 8 | from google.api_core.exceptions import NotFound, PermissionDenied 9 | from google.cloud import secretmanager_v1 10 | 11 | from edge.exception import EdgeException 12 | from edge.state import SacredState, EdgeState 13 | from sacred.observers import MongoObserver 14 | from sacred.experiment import Experiment 15 | 16 | from edge.tui import StepTUI, SubStepTUI, TUIStatus 17 | 18 | 19 | def create_cluster(project_id: str, region: str, cluster_name: str) -> Cluster: 20 | with SubStepTUI(f"Checking if '{cluster_name}' cluster exists") as sub_step: 21 | client = container_v1.ClusterManagerClient() 22 | try: 23 | cluster = client.get_cluster( 24 | project_id=project_id, name=f"projects/{project_id}/locations/{region}/clusters/{cluster_name}" 25 | ) 26 | except NotFound: 27 | sub_step.update(message=f"Cluster '{cluster_name}' does not exist, creating... 
(may take a few minutes)") 28 | try: 29 | subprocess.check_output( 30 | f"gcloud container clusters create-auto {cluster_name} --project {project_id} --region {region}", 31 | shell=True, stderr=subprocess.STDOUT 32 | ) 33 | except subprocess.CalledProcessError as exc: 34 | raise EdgeException(f"Error occurred while creating cluster '{cluster_name}'\n{exc.output}") 35 | 36 | cluster = client.get_cluster( 37 | project_id=project_id, name=f"projects/{project_id}/locations/{region}/clusters/{cluster_name}" 38 | ) 39 | sub_step.update(message=f"Cluster '{cluster_name}' created", status=TUIStatus.SUCCESSFUL) 40 | return cluster 41 | return cluster 42 | 43 | 44 | def get_credentials(project_id: str, region: str, cluster_name: str): 45 | with SubStepTUI("Getting cluster credentials"): 46 | try: 47 | subprocess.check_output( 48 | f"gcloud container clusters get-credentials {cluster_name} --project {project_id} --region {region}", 49 | shell=True, 50 | stderr=subprocess.STDOUT, 51 | ) 52 | except subprocess.CalledProcessError as e: 53 | raise EdgeException(f"Error occurred while getting kubernetes cluster credentials\n{e.output}") 54 | 55 | 56 | def get_mongodb_password(): 57 | with SubStepTUI("Getting MongoDB password"): 58 | try: 59 | return subprocess.check_output( 60 | 'kubectl get secret --namespace default mongodb -o jsonpath="{.data.mongodb-passwords}" | ' 61 | 'base64 --decode', 62 | shell=True, 63 | ).decode("utf-8") 64 | except subprocess.CalledProcessError as e: 65 | raise EdgeException(f"Error occurred while getting MongoDB password\n{e.output}") 66 | 67 | 68 | def get_lb_ip(name) -> str: 69 | try: 70 | return subprocess.check_output( 71 | f'kubectl get service --namespace default {name} -o jsonpath="{{.status.loadBalancer.ingress[0].ip}}"', 72 | shell=True, 73 | ).decode("utf-8") 74 | except subprocess.CalledProcessError as e: 75 | raise EdgeException(f"Error occurred while getting IP for {name}\n{e.output}") 76 | 77 | 78 | def check_mongodb_installed() -> bool: 
79 | helm_charts = json.loads(subprocess.check_output("helm list -o json", shell=True).decode("utf-8")) 80 | for chart in helm_charts: 81 | if chart["name"] == "mongodb": 82 | return True 83 | return False 84 | 85 | 86 | def check_mongodb_lb_installed() -> bool: 87 | try: 88 | subprocess.check_output("kubectl get service mongodb-lb -o json", stderr=subprocess.STDOUT, shell=True) 89 | except subprocess.CalledProcessError as e: 90 | if e.output.decode("utf-8") == 'Error from server (NotFound): services "mongodb-lb" not found\n': 91 | return False 92 | else: 93 | raise e 94 | return True 95 | 96 | 97 | def install_mongodb() -> (str, str): 98 | with SubStepTUI("Checking if MongoDB is installed on the cluster") as sub_step: 99 | try: 100 | if not check_mongodb_installed(): 101 | sub_step.update("Installing MongoDB on the cluster") 102 | subprocess.check_output( 103 | """ 104 | helm repo add bitnami https://charts.bitnami.com/bitnami 105 | helm upgrade -i --wait mongodb bitnami/mongodb --version 10.29.1 --set auth.username=sacred,auth.database=sacred 106 | """, 107 | shell=True, 108 | ) 109 | sub_step.update("MongoDB is installed on the cluster", status=TUIStatus.SUCCESSFUL) 110 | except subprocess.CalledProcessError as e: 111 | raise EdgeException(f"Error occurred while installing MongoDB with helm chart\n{e.output}") 112 | 113 | with SubStepTUI("Making MongoDB externally available"): 114 | try: 115 | if not check_mongodb_lb_installed(): 116 | subprocess.check_output( 117 | "kubectl expose deployment mongodb --name mongodb-lb --type LoadBalancer --port 60000 " 118 | "--target-port 27017", 119 | shell=True, 120 | ) 121 | except subprocess.CalledProcessError as e: 122 | raise EdgeException(f"Error occurred while exposing MongoDB\n{e.output}") 123 | 124 | password = get_mongodb_password() 125 | 126 | with SubStepTUI("Getting MongoDB IP address (may take a few minutes)"): 127 | external_ip = get_lb_ip("mongodb-lb") 128 | while external_ip == "": 129 | time.sleep(5) 130 | 
external_ip = get_lb_ip("mongodb-lb") 131 | 132 | internal_connection_string = f"mongodb://sacred:{password}@mongodb/sacred" 133 | external_connection_string = f"mongodb://sacred:{password}@{external_ip}:60000/sacred" 134 | 135 | with SubStepTUI("Saving MongoDB credentials into kubernetes secrets") as sub_step: 136 | try: 137 | subprocess.check_output( 138 | "kubectl delete secret mongodb-connection", shell=True, stderr=subprocess.STDOUT 139 | ) 140 | except subprocess.CalledProcessError as exc: 141 | if "NotFound" in exc.output.decode("utf-8"): 142 | pass # error expected if the secret was not previously created 143 | else: 144 | raise EdgeException( 145 | f"Error while trying to delete mongodb-connection secret\n{exc.output.decode('utf-8')}" 146 | ) 147 | try: 148 | subprocess.check_output( 149 | f"kubectl create secret generic mongodb-connection " 150 | f"--from-literal=internal={internal_connection_string}", 151 | shell=True, 152 | ) 153 | except subprocess.CalledProcessError as exc: 154 | raise EdgeException( 155 | f"Error while trying to create mongodb-connection secret\n{exc.output.decode('utf-8')}" 156 | ) 157 | 158 | sub_step.update(status=TUIStatus.SUCCESSFUL) 159 | sub_step.add_explanation(f"Internal connection string: mongodb://sacred:*****@mongodb/sacred") 160 | sub_step.add_explanation(f"External connection string: mongodb://sacred:*****@{external_ip}:60000/sacred") 161 | sub_step.add_explanation(f"You can get full connection strings by running `./edge.sh experiments get-mongodb`") 162 | 163 | return internal_connection_string, external_connection_string 164 | 165 | 166 | def install_omniboard() -> str: 167 | with SubStepTUI("Installing experiment tracker dashboard (Omniboard)"): 168 | try: 169 | subprocess.check_output("kubectl apply -f /omniboard.yaml", stderr=subprocess.STDOUT, shell=True) 170 | except subprocess.CalledProcessError as e: 171 | raise EdgeException(f"Error occurred while applying Omniboard's configuration\n {e} {e.output}") 172 | 173 
| with SubStepTUI("Getting Omniboard IP address (may take a few minutes)") as sub_step: 174 | external_ip = get_lb_ip("omniboard-lb") 175 | while external_ip == "": 176 | time.sleep(5) 177 | external_ip = get_lb_ip("omniboard-lb") 178 | 179 | sub_step.update(status=TUIStatus.SUCCESSFUL) 180 | sub_step.add_explanation( 181 | f"Omniboard is installed and available at http://{external_ip}:9000", 182 | ) 183 | return f"http://{external_ip}:9000" 184 | 185 | 186 | def save_mongo_to_secretmanager(project_id: str, secret_id: str, connection_string: str): 187 | with SubStepTUI("Saving MongoDB credentials to Google Cloud Secret Manager") as sub_step: 188 | try: 189 | client = secretmanager_v1.SecretManagerServiceClient() 190 | try: 191 | client.access_secret_version(name=f"projects/{project_id}/secrets/{secret_id}/versions/latest") 192 | except NotFound: 193 | client.create_secret( 194 | request={ 195 | "parent": f"projects/{project_id}", 196 | "secret_id": secret_id, 197 | "secret": {"replication": {"automatic": {}}}, 198 | } 199 | ) 200 | 201 | client.add_secret_version( 202 | request={ 203 | "parent": f"projects/{project_id}/secrets/{secret_id}", 204 | "payload": {"data": connection_string.encode()}, 205 | } 206 | ) 207 | except PermissionDenied as exc: 208 | sub_step.update(status=TUIStatus.FAILED) 209 | sub_step.add_explanation(exc.message) 210 | 211 | 212 | def delete_mongo_to_secretmanager(_config: EdgeConfig): 213 | project_id = _config.google_cloud_project.project_id 214 | secret_id = _config.experiments.mongodb_connection_string_secret 215 | print("## Removing MongoDB connection string from Google Cloud Secret Manager") 216 | client = secretmanager_v1.SecretManagerServiceClient() 217 | 218 | try: 219 | client.access_secret_version(name=f"projects/{project_id}/secrets/{secret_id}/versions/latest") 220 | client.delete_secret(name=f"projects/{project_id}/secrets/{secret_id}") 221 | except NotFound: 222 | print("Secret does not exist") 223 | return 224 | 225 | 226 | 
def delete_cluster(_config: EdgeConfig): 227 | project_id = _config.google_cloud_project.project_id 228 | region = _config.google_cloud_project.region 229 | cluster_name = _config.experiments.gke_cluster_name 230 | print(f"## Deleting cluster '{cluster_name}'") 231 | client = container_v1.ClusterManagerClient() 232 | try: 233 | client.get_cluster( 234 | project_id=project_id, name=f"projects/{project_id}/locations/{region}/clusters/{cluster_name}" 235 | ) 236 | os.system(f"gcloud container clusters delete {cluster_name} --project {project_id} --region {region}") 237 | except NotFound: 238 | print("Cluster does not exist") 239 | 240 | 241 | def setup_sacred(project_id: str, region: str, gke_cluster_name: str, secret_id: str) -> SacredState: 242 | with StepTUI("Installing experiment tracker", emoji="📔"): 243 | create_cluster( 244 | project_id, region, gke_cluster_name 245 | ) 246 | 247 | get_credentials( 248 | project_id, region, gke_cluster_name 249 | ) 250 | 251 | internal_mongo_string, external_mongo_string = install_mongodb() 252 | 253 | save_mongo_to_secretmanager(project_id, secret_id, external_mongo_string) 254 | 255 | external_omniboard_string = install_omniboard() 256 | 257 | return SacredState(external_omniboard_string=external_omniboard_string) 258 | 259 | 260 | def tear_down_sacred(_config: EdgeConfig, _state: EdgeState): 261 | print("# Tearing down Sacred+Omniboard") 262 | 263 | delete_mongo_to_secretmanager(_config) 264 | delete_cluster(_config) 265 | 266 | 267 | def get_connection_string(project_id: str, secret_id: str) -> str: 268 | client = secretmanager_v1.SecretManagerServiceClient() 269 | 270 | secret_name = f"projects/{project_id}/secrets/{secret_id}/versions/latest" 271 | response = client.access_secret_version(name=secret_name) 272 | 273 | return response.payload.data.decode("UTF-8") 274 | 275 | 276 | def track_experiment(config: EdgeConfig, state: EdgeState, experiment: Experiment): 277 | if config is None or state is None: 278 | 
print("Vertex:edge configuration is not provided, the experiment will not be tracked") 279 | return 280 | 281 | if state.sacred is None: 282 | print("Experiment tracker is not initialised in vertex:edge, the experiment will not be tracked") 283 | return 284 | 285 | project_id = config.google_cloud_project.project_id 286 | secret_id = config.experiments.mongodb_connection_string_secret 287 | mongo_connection_string = get_connection_string(project_id, secret_id) 288 | experiment.observers.append(MongoObserver(mongo_connection_string)) 289 | -------------------------------------------------------------------------------- /src/edge/state.py: -------------------------------------------------------------------------------- 1 | import os.path 2 | from serde import serialize, deserialize 3 | from serde.yaml import to_yaml, from_yaml 4 | from dataclasses import dataclass 5 | from google.cloud import storage 6 | 7 | from edge.exception import EdgeException 8 | from edge.storage import get_bucket, StorageBucketState 9 | from edge.config import EdgeConfig 10 | from edge.tui import StepTUI, SubStepTUI 11 | from typing import Type, TypeVar, Optional, Dict 12 | from contextlib import contextmanager 13 | 14 | 15 | @deserialize 16 | @serialize 17 | @dataclass 18 | class SacredState: 19 | external_omniboard_string: str 20 | 21 | 22 | @deserialize 23 | @serialize 24 | @dataclass 25 | class ModelState: 26 | endpoint_resource_name: str 27 | deployed_model_resource_name: Optional[str] = None 28 | 29 | 30 | T = TypeVar("T", bound="EdgeState") 31 | 32 | 33 | @deserialize 34 | @serialize 35 | @dataclass 36 | class EdgeState: 37 | models: Optional[Dict[str, ModelState]] = None 38 | sacred: Optional[SacredState] = None 39 | storage: Optional[StorageBucketState] = None 40 | 41 | def save(self, _config: EdgeConfig): 42 | client = storage.Client(project=_config.google_cloud_project.project_id) 43 | bucket = client.bucket(_config.storage_bucket.bucket_name) 44 | blob = 
storage.Blob(".edge_state/edge_state.yaml", bucket) 45 | blob.upload_from_string(to_yaml(self)) 46 | 47 | 48 | @classmethod 49 | def load(cls: Type[T], _config: EdgeConfig) -> T: 50 | client = storage.Client(project=_config.google_cloud_project.project_id) 51 | bucket = client.bucket(_config.storage_bucket.bucket_name) 52 | blob = storage.Blob(".edge_state/edge_state.yaml", bucket) 53 | 54 | if blob.exists(): 55 | return from_yaml(EdgeState, blob.download_as_bytes(client).decode("utf-8")) 56 | else: 57 | raise EdgeException(f"State file was not found in '{_config.storage_bucket.bucket_name}' bucket. " 58 | f"Initialise vertex:edge state by running `./edge.py init`.") 59 | 60 | @classmethod 61 | @contextmanager 62 | def context( 63 | cls: Type[T], 64 | _config: EdgeConfig, 65 | to_lock: bool = False, 66 | to_save: bool = False, 67 | silent: bool = False 68 | ) -> T: 69 | with StepTUI("Loading vertex:edge state", emoji="💾", silent=silent): 70 | state = None 71 | locked = False 72 | 73 | if to_lock: 74 | with SubStepTUI("Locking state", silent=silent): 75 | locked = EdgeState.lock(_config.google_cloud_project.project_id, 76 | _config.storage_bucket.bucket_name) 77 | 78 | with SubStepTUI("Loading state", silent=silent): 79 | state = EdgeState.load(_config) 80 | try: 81 | yield state 82 | finally: 83 | if (to_save and state is not None) or locked: 84 | with StepTUI("Saving vertex:edge state", emoji="💾", silent=silent): 85 | if to_save and state is not None: 86 | with SubStepTUI("Saving state", silent=silent): 87 | state.save(_config) 88 | if locked: 89 | with SubStepTUI("Unlocking state", silent=silent): 90 | EdgeState.unlock(_config.google_cloud_project.project_id, 91 | _config.storage_bucket.bucket_name) 92 | 93 | @classmethod 94 | def exists(cls: Type[T], _config: EdgeConfig) -> bool: 95 | client = storage.Client(project=_config.google_cloud_project.project_id) 96 | bucket = client.bucket(_config.storage_bucket.bucket_name) 97 | blob =
storage.Blob(".edge_state/edge_state.yaml", bucket) 98 | return blob.exists() 99 | 100 | @classmethod 101 | def lock(cls, project: str, bucket_name: str, blob_name: str = ".edge_state/edge_state.yaml") -> bool: 102 | """ 103 | Lock the state file in Google Storage Bucket 104 | 105 | :param project: Google Cloud project ID 106 | :param bucket_name: name of the bucket that holds the state file 107 | :param blob_name: path of the state file within the bucket 108 | :return: bool -- True if the lock was acquired 109 | """ 110 | bucket = get_bucket(project, bucket_name) 111 | if bucket is None or not bucket.exists(): 112 | raise EdgeException("Google Storage Bucket does not exist. Initialise it by running `./edge.py init`.") 113 | blob = storage.Blob(f"{blob_name}.lock", bucket) 114 | if blob.exists(): 115 | raise EdgeException("State file is already locked") 116 | 117 | blob.upload_from_string("locked") 118 | return True 119 | 120 | @classmethod 121 | def unlock(cls, project: str, bucket_name: str, blob_name: str = ".edge_state/edge_state.yaml"): 122 | bucket = get_bucket(project, bucket_name) 123 | blob = storage.Blob(f"{blob_name}.lock", bucket) 124 | 125 | if bucket is not None and blob.exists(): 126 | blob.delete() 127 | -------------------------------------------------------------------------------- /src/edge/storage.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from typing import Optional 3 | from serde import serialize, deserialize 4 | from dataclasses import dataclass 5 | from google.api_core.exceptions import NotFound, Forbidden 6 | from google.cloud import storage 7 | from .config import EdgeConfig 8 | from .exception import EdgeException 9 | from .tui import ( 10 | print_substep_not_done, print_substep_success, print_substep_failure, print_failure_explanation, print_substep, 11 | clear_last_line, SubStepTUI 12 | ) 13 | 14 | 15 | @deserialize 16 | @serialize 17 | @dataclass 18 | class StorageBucketState: 19 | bucket_path: str 20 | 21 | 22 | def get_bucket(project_id: str, bucket_name:
str) -> Optional[storage.Bucket]: 23 | try: 24 | client = storage.Client(project_id) 25 | bucket = client.get_bucket(bucket_name) 26 | return bucket 27 | except NotFound: 28 | return None 29 | except Forbidden: 30 | raise EdgeException( 31 | f"The bucket '{bucket_name}' exists, but you do not have permissions to access it. " 32 | "Maybe it belongs to another project? " 33 | "Please see the following guidelines for more information " 34 | "https://cloud.google.com/storage/docs/naming-buckets" 35 | ) 36 | 37 | 38 | def get_bucket_uri(project_id: str, bucket_name: str) -> Optional[str]: 39 | bucket = get_bucket(project_id, bucket_name) 40 | if bucket is None: 41 | return None 42 | return f"gs://{bucket.name}/" 43 | 44 | 45 | def create_bucket(project_id: str, region: str, bucket_name: str) -> str: 46 | client = storage.Client(project_id) 47 | bucket = client.create_bucket(bucket_or_name=bucket_name, project=project_id, location=region) 48 | return f"gs://{bucket.name}/" 49 | 50 | 51 | def delete_bucket(project_id: str, region: str, bucket_name: str): 52 | client = storage.Client(project_id) 53 | bucket = client.get_bucket(bucket_name) 54 | print("## Deleting bucket content") 55 | bucket.delete_blobs(blobs=list(bucket.list_blobs())) 56 | print("## Deleting bucket") 57 | bucket.delete(force=True) 58 | print("Bucket deleted") 59 | 60 | 61 | def tear_down_storage(_config: EdgeConfig, _state): 62 | print("# Tearing down Google Storage") 63 | bucket_path = get_bucket_uri( 64 | _config.google_cloud_project.project_id, 65 | _config.storage_bucket.bucket_name, 66 | ) 67 | if bucket_path is not None: 68 | delete_bucket( 69 | _config.google_cloud_project.project_id, 70 | _config.google_cloud_project.region, 71 | _config.storage_bucket.bucket_name, 72 | ) 73 | 74 | 75 | def setup_storage(project_id: str, region: str, bucket_name: str) -> StorageBucketState: 76 | with SubStepTUI(f"Checking if '{bucket_name}' exists") as sub_step: 77 | try: 78 | bucket_path = get_bucket_uri( 79 | 
project_id, 80 | bucket_name, 81 | ) 82 | if bucket_path is None: 83 | clear_last_line() 84 | sub_step.update(message=f"'{bucket_name}' does not exist, creating it") 85 | bucket_path = create_bucket( 86 | project_id, 87 | region, 88 | bucket_name, 89 | ) 90 | return StorageBucketState(bucket_path) 91 | except NotFound as e: 92 | raise EdgeException(f"The '{project_id}' project could not be found. It might mean that the quota project " 93 | f"is set to a different project. Please generate new application default " 94 | f"credentials by running `gcloud auth application-default login`") 95 | except ValueError as e: 96 | raise EdgeException(f"Unexpected error while setting up Storage bucket:\n{str(e)}") 97 | -------------------------------------------------------------------------------- /src/edge/templates/tensorflow_model/cookiecutter.json: -------------------------------------------------------------------------------- 1 | { 2 | "model_name": "sklearn_model" 3 | } -------------------------------------------------------------------------------- /src/edge/templates/tensorflow_model/{{cookiecutter.model_name}}/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/templates/tensorflow_model/{{cookiecutter.model_name}}/__init__.py -------------------------------------------------------------------------------- /src/edge/templates/tensorflow_model/{{cookiecutter.model_name}}/train.py: -------------------------------------------------------------------------------- 1 | from edge.train import Trainer 2 | 3 | class MyTrainer(Trainer): 4 | def main(self): 5 | self.set_parameter("example", 123) 6 | 7 | # Add model training logic here 8 | 9 | return 0 # return your model score here 10 | 11 | MyTrainer("{{cookiecutter.model_name}}").run() 12 | -------------------------------------------------------------------------------- 
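As an aside, the contract that the cookiecutter template above relies on can be sketched in isolation. `StubTrainer` below is a hypothetical stand-in that only mimics the interface; the real `edge.train.Trainer` (see `src/edge/train.py` below) additionally wires up Sacred experiment tracking and Vertex job dispatch:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class StubTrainer(ABC):
    """Hypothetical stand-in mimicking the Trainer interface used by the template."""

    def __init__(self, name: str):
        self.name = name
        self.params: Dict[str, Any] = {}
        self.score = None

    def set_parameter(self, key: str, value: Any):
        # The real Trainer records this in the Sacred experiment run config
        self.params[key] = value

    @abstractmethod
    def main(self):
        """Implemented by the data scientist; returns the model score."""

    def run(self):
        # The real Trainer also decides here whether to run locally or on Vertex
        self.score = self.main()
        return self.score


class MyTrainer(StubTrainer):
    def main(self):
        self.set_parameter("example", 123)
        # Model training logic would go here
        return 0.95  # stand-in for a real evaluation score


trainer = MyTrainer("example_model")
score = trainer.run()
```

Calling `run()` rather than `main()` directly is what lets the library intercept the training script and attach tracking around it.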
/src/edge/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import abc 4 | import uuid 5 | import inspect 6 | import logging 7 | from typing import Optional, Any 8 | from dataclasses import dataclass 9 | from enum import Enum 10 | 11 | from serde import serialize, deserialize 12 | from serde.json import to_json 13 | from sacred import Experiment 14 | from sacred.observers import MongoObserver 15 | from google.cloud import secretmanager_v1 16 | from google.cloud.aiplatform import Model, CustomJob 17 | 18 | import edge.path 19 | #from edge.state import EdgeState 20 | from edge.config import EdgeConfig 21 | from edge.exception import EdgeException 22 | 23 | logging.basicConfig(level=logging.INFO) 24 | 25 | class TrainingTarget(Enum): 26 | LOCAL = "local" 27 | VERTEX = "vertex" 28 | 29 | @deserialize 30 | @serialize 31 | @dataclass 32 | class TrainedModel: 33 | model_name: Optional[str] 34 | is_local: bool = False 35 | 36 | @classmethod 37 | def from_vertex_model(cls, model: Model): 38 | return TrainedModel( 39 | model_name=model.resource_name, 40 | ) 41 | 42 | @classmethod 43 | def from_local_model(cls): 44 | return TrainedModel( 45 | model_name=None, 46 | is_local=True, 47 | ) 48 | 49 | """ 50 | A Trainer encapsulates a model training script and its associated MLOps lifecycle 51 | 52 | TODO: Explain why it has been built in this way. Sacred forces us into this pattern, but at least we hide it from the user. 53 | TODO: How much can we abstract this? What if Sacred is replaced with something else? 54 | TODO: How can we be even better at handling experiment config? 55 | """ 56 | class Trainer: 57 | # TODO: group together experiment variables and Vertex variables.
Note when target is local, we don't need Vertex values 58 | experiment = None 59 | experiment_run = None 60 | edge_config = None 61 | #edge_state = None 62 | name = None 63 | # TODO: Remove hard-coded Git link 64 | pip_requirements = [ 65 | "vertex-edge @ git+https://github.com/fuzzylabs/vertex-edge.git" 66 | ] 67 | vertex_staging_path = None 68 | vertex_output_path = None 69 | script_path = None 70 | mongo_connection_string = None 71 | target = TrainingTarget.LOCAL 72 | model_config = None 73 | model_id = None 74 | 75 | def __init__(self, name: str): 76 | self.name = name 77 | 78 | # We need the path to the training script itself 79 | self.script_path = inspect.getframeinfo(sys._getframe(1)).filename 80 | 81 | # Determine our target training environment 82 | if os.environ.get("RUN_ON_VERTEX") == "True": 83 | logging.info("Target training environment is Vertex") 84 | self.target = TrainingTarget.VERTEX 85 | else: 86 | logging.info("Target training environment is Local") 87 | self.target = TrainingTarget.LOCAL 88 | 89 | # Load the Edge configuration from the appropriate source 90 | # TODO: Document env var 91 | if os.environ.get("EDGE_CONFIG"): 92 | logging.info("Edge config will be loaded from environment variable EDGE_CONFIG") 93 | self.edge_config = self._decode_config_string(os.environ.get("EDGE_CONFIG")) 94 | else: 95 | logging.info("Edge config will be loaded from edge.yaml") 96 | # TODO: This isn't very stable. We should search for the config file. 97 | self.edge_config = EdgeConfig.load(edge.path.get_default_config_path_from_model(inspect.getframeinfo(sys._getframe(1)).filename)) 98 | 99 | # Extract the model configuration and check if the model has been initialised 100 | if name in self.edge_config.models: 101 | self.model_config = self.edge_config.models[name] 102 | else: 103 | raise EdgeException(f"Model with name {name} could not be found in Edge config.
Perhaps it hasn't been initialised") 104 | 105 | # Load the Edge state 106 | #self.edge_state = EdgeState.load(self.edge_config) 107 | #logging.info(f"Edge state: {self.edge_state}") 108 | 109 | if os.environ.get("MODEL_ID"): 110 | self.model_id = os.environ.get("MODEL_ID") 111 | else: 112 | self.model_id = uuid.uuid4() 113 | 114 | # Determine correct paths for Vertex running 115 | self.vertex_staging_path = "gs://" + os.path.join( 116 | self.edge_config.storage_bucket.bucket_name, 117 | self.edge_config.storage_bucket.vertex_jobs_directory 118 | ) 119 | self.vertex_output_path = os.path.join(self.vertex_staging_path, str(self.model_id)) 120 | 121 | # Set up experiment tracking for this training job 122 | # TODO: Restore Git support 123 | # TODO: If training target is Vertex, we don't need to init an experiment 124 | # TODO: Experiment initialisation in its own function (but *must* be called during construction) 125 | self.experiment = Experiment(name, save_git_info=True) 126 | 127 | # TODO: Document env var 128 | if os.environ.get("MONGO_CONNECTION_STRING"): 129 | self.mongo_connection_string = os.environ.get("MONGO_CONNECTION_STRING") 130 | else: 131 | self.mongo_connection_string = self._get_mongo_connection_string() 132 | 133 | if self.mongo_connection_string is not None: 134 | self.experiment.observers.append(MongoObserver(self.mongo_connection_string)) 135 | else: 136 | logging.info("Experiment tracker has not been initialised") 137 | 138 | @self.experiment.main 139 | def ex_noop_main(c): 140 | pass 141 | 142 | """ 143 | To be implemented by data scientist 144 | """ 145 | @abc.abstractmethod 146 | def main(self): 147 | # TODO: A more user-friendly message 148 | raise NotImplementedError("The main method for this trainer has not been implemented") 149 | 150 | def set_parameter(self, key: str, value: Any): 151 | self.experiment_run.config[key] = value 152 | 153 | def get_parameter(self, key: str) -> Any: 154 | return self.experiment_run.config[key] 155 | 156 | 
def log_scalar(self, key: str, value: Any): 157 | self.experiment_run.log_scalar(key, value) 158 | 159 | def get_model_save_path(self): 160 | # TODO: Support local paths 161 | return self.vertex_output_path 162 | 163 | """ 164 | Executes the training script and tracks experiment details 165 | """ 166 | def run(self): 167 | json_path = os.path.join( 168 | os.path.dirname(self.script_path), 169 | "trained_model.json" 170 | ) 171 | 172 | with open(json_path, "w") as train_json: 173 | if self.target == TrainingTarget.VERTEX: 174 | self._run_on_vertex() 175 | 176 | try: 177 | model = self._create_model_on_vertex() 178 | train_json.write(to_json(TrainedModel.from_vertex_model(model))) 179 | except Exception as e: 180 | logging.info("Unable to capture saved model. This might mean the model has not been saved by the training script") 181 | else: 182 | self._run_locally() 183 | train_json.write(to_json(TrainedModel.from_local_model())) 184 | 185 | def _run_locally(self): 186 | self.experiment_run = self.experiment._create_run() 187 | result = self.main() 188 | 189 | self.experiment_run.log_scalar("score", result) 190 | self.experiment_run({}) 191 | 192 | def _run_on_vertex(self): 193 | environment_variables = { 194 | "RUN_ON_VERTEX": "False", 195 | "EDGE_CONFIG": self._get_encoded_config(), 196 | "MODEL_ID": str(self.model_id) 197 | } 198 | 199 | if self.mongo_connection_string is not None: 200 | environment_variables["MONGO_CONNECTION_STRING"] = self.mongo_connection_string 201 | 202 | CustomJob.from_local_script( 203 | display_name=f"{self.name}-custom-training", 204 | script_path=self.script_path, 205 | container_uri=self.model_config.training_container_image_uri, 206 | requirements=self.pip_requirements, 207 | #args=training_script_args, 208 | replica_count=1, 209 | project=self.edge_config.google_cloud_project.project_id, 210 | location=self.edge_config.google_cloud_project.region, 211 | staging_bucket=self.vertex_staging_path, 212 | 
environment_variables=environment_variables 213 | ).run() 214 | 215 | def _create_model_on_vertex(self): 216 | return Model.upload( 217 | display_name=self.name, 218 | project=self.edge_config.google_cloud_project.project_id, 219 | location=self.edge_config.google_cloud_project.region, 220 | serving_container_image_uri=self.model_config.serving_container_image_uri, 221 | artifact_uri=self.get_model_save_path() 222 | ) 223 | 224 | def _get_encoded_config(self) -> str: 225 | return str(self.edge_config).replace("\n", "\\n") 226 | 227 | def _decode_config_string(self, s: str) -> EdgeConfig: 228 | return EdgeConfig.from_string(s.replace("\\n", "\n")) 229 | 230 | def _get_mongo_connection_string(self) -> str: 231 | # Try to get the Mongo connection string, if available 232 | try: 233 | client = secretmanager_v1.SecretManagerServiceClient() 234 | secret_name = f"projects/{self.edge_config.google_cloud_project.project_id}/secrets/{self.edge_config.experiments.mongodb_connection_string_secret}/versions/latest" 235 | response = client.access_secret_version(name=secret_name) 236 | return response.payload.data.decode("UTF-8") 237 | except Exception as e: 238 | return None 239 | -------------------------------------------------------------------------------- /src/edge/tui.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import traceback 3 | from typing import Optional 4 | import questionary 5 | from enum import Enum 6 | from edge.exception import EdgeException 7 | 8 | styles = { 9 | "heading": "bold underline", 10 | "step": "bold", 11 | "substep": "", 12 | "success": "fg:ansigreen", 13 | "failure": "fg:ansired", 14 | "warning": "fg:ansiyellow", 15 | } 16 | 17 | qmark = " ?" 
18 | 19 | 20 | def print_heading(text: str): 21 | questionary.print(text, styles["heading"]) 22 | 23 | 24 | def strfmt_step(text: str, emoji: str = "*"): 25 | return f"{emoji} {text}" 26 | 27 | 28 | def print_step(text: str, emoji: str = "*"): 29 | questionary.print(strfmt_step(text, emoji), styles["step"]) 30 | 31 | 32 | def strfmt_substep(text): 33 | return f" {text}" 34 | 35 | 36 | def print_substep(text: str): 37 | questionary.print(strfmt_substep(f"◻️ {text}"), styles["substep"]) 38 | 39 | 40 | def strfmt_substep_success(text): 41 | return strfmt_substep(f"✔ ️{text}") 42 | 43 | 44 | def strfmt_substep_failure(text): 45 | return strfmt_substep(f"❌ {text}") 46 | 47 | 48 | def strfmt_substep_warning(text): 49 | return strfmt_substep(f"⚠️ {text}") 50 | 51 | 52 | def strfmt_substep_not_done(text): 53 | return strfmt_substep(f"⏳ {text}") 54 | 55 | 56 | def print_substep_success(text: str): 57 | questionary.print(strfmt_substep_success(text), styles["success"]) 58 | 59 | 60 | def print_substep_failure(text: str): 61 | questionary.print(strfmt_substep_failure(text), styles["failure"]) 62 | 63 | 64 | def print_substep_not_done(text: str): 65 | questionary.print(strfmt_substep_not_done(text), styles["substep"]) 66 | 67 | 68 | def print_substep_warning(text: str): 69 | questionary.print(strfmt_substep_warning(text), styles["warning"]) 70 | 71 | 72 | def strfmt_failure_explanation(text: str): 73 | return f" - {text}" 74 | 75 | 76 | def print_failure_explanation(text: str): 77 | questionary.print(strfmt_failure_explanation(text), styles["failure"]) 78 | 79 | 80 | def print_warning_explanation(text: str): 81 | questionary.print(strfmt_failure_explanation(text), styles["warning"]) 82 | 83 | 84 | def clear_last_line(): 85 | print("\033[1A\033[0K", end="\r") 86 | 87 | 88 | class TUIStatus(Enum): 89 | NEUTRAL = "neutral" 90 | PENDING = "pending" 91 | SUCCESSFUL = "successful" 92 | FAILED = "failed" 93 | WARNING = "warning" 94 | 95 | 96 | class TUI(object): 97 | def __init__( 98 
| self, 99 | intro: str, 100 | success_title: str, 101 | success_message: str, 102 | failure_title: str, 103 | failure_message: str, 104 | ): 105 | self.intro = intro 106 | self.success_title = success_title 107 | self.success_message = success_message 108 | self.failure_title = failure_title 109 | self.failure_message = failure_message 110 | 111 | def __enter__(self): 112 | questionary.print(self.intro, "bold underline") 113 | return self 114 | 115 | def __exit__(self, exc_type, exc_val, exc_tb): 116 | if exc_type is None: 117 | print() 118 | questionary.print(self.success_title, style="fg:ansigreen") 119 | print() 120 | print(self.success_message) 121 | sys.exit(0) 122 | elif exc_type is EdgeException: 123 | print() 124 | questionary.print(self.failure_title, style="fg:ansired") 125 | print() 126 | questionary.print(self.failure_message, style="fg:ansired") 127 | sys.exit(1) 128 | else: 129 | return False 130 | 131 | 132 | class StepTUI(object): 133 | def __init__(self, message: str, emoji: str = "*", silent: bool = False): 134 | self.message = message 135 | self.emoji = emoji 136 | self.silent = silent 137 | 138 | def __enter__(self): 139 | self.print() 140 | return self 141 | 142 | def __exit__(self, exc_type, exc_val, exc_tb): 143 | return False # Do not suppress 144 | 145 | def print(self): 146 | if self.silent: 147 | return 148 | questionary.print(f"{self.emoji} {self.message}", "bold") 149 | 150 | 151 | class SubStepTUI(object): 152 | style = { 153 | TUIStatus.NEUTRAL: "", 154 | TUIStatus.PENDING: "", 155 | TUIStatus.SUCCESSFUL: styles["success"], 156 | TUIStatus.FAILED: styles["failure"], 157 | TUIStatus.WARNING: styles["warning"] 158 | } 159 | 160 | emoji = { 161 | TUIStatus.NEUTRAL: "🤔", 162 | TUIStatus.PENDING: "⏳", 163 | TUIStatus.SUCCESSFUL: "✔", 164 | TUIStatus.FAILED: "❌", 165 | TUIStatus.WARNING: "⚠️" 166 | } 167 | 168 | def __init__(self, message: str, status=TUIStatus.PENDING, silent: bool = False): 169 | self.message = message 170 | self.status 
= status 171 | self.written = False 172 | self._entered = False 173 | self._dirty = False 174 | self.silent = silent 175 | 176 | def __enter__(self): 177 | self._entered = True 178 | self.print() 179 | return self 180 | 181 | def __exit__(self, exc_type, exc_val, exc_tb): 182 | suppress = True 183 | if exc_type is None: # sub-step exited without errors 184 | if self.status == TUIStatus.PENDING: 185 | self.update(status=TUIStatus.SUCCESSFUL) 186 | elif exc_type is EdgeException: 187 | if exc_val.fatal: 188 | self.update(status=TUIStatus.FAILED) 189 | suppress = False 190 | else: 191 | self.update(status=TUIStatus.WARNING) 192 | self.add_explanation(str(exc_val)) 193 | else: 194 | self.update(status=TUIStatus.FAILED) 195 | self.add_explanation( 196 | "Unexpected error occurred:\n\n" 197 | f"{''.join(traceback.format_tb(exc_tb))}\n" 198 | f" {exc_type.__name__}: {str(exc_val)}\n" 199 | f"\n Please raise an issue for this error at https://github.com/fuzzylabs/vertex-edge/issues" 200 | ) 201 | raise EdgeException("Unexpected error occurred, but already reported") 202 | 203 | self._entered = False 204 | 205 | return suppress 206 | 207 | def print(self): 208 | if self.silent: 209 | return 210 | if not self._entered: 211 | return 212 | if self.written and not self._dirty: 213 | clear_last_line() 214 | line = f"    {self.emoji[self.status]} {self.message}" 215 | questionary.print(line, self.style[self.status]) 216 | self.written = True 217 | 218 | def update(self, message: Optional[str] = None, status: Optional[TUIStatus] = None): 219 | if message is not None: 220 | self.message = message 221 | if status is not None: 222 | self.status = status 223 | self.print() 224 | 225 | def add_explanation(self, text: str): 226 | self._dirty = True 227 | line = f"       - {text}" 228 | questionary.print(line, self.style[self.status]) 229 | 230 | def set_dirty(self): 231 | self._dirty = True 232 | -------------------------------------------------------------------------------- /src/edge/versions.py:
--------------------------------------------------------------------------------
1 | import json
2 | import subprocess
3 | from dataclasses import dataclass
4 | from .exception import EdgeException
5 | 
6 | 
7 | @dataclass
8 | class Version:
9 |     major: int
10 |     minor: int
11 |     patch: int
12 | 
13 |     @classmethod
14 |     def from_string(cls, version_string: str):
15 |         version_string = version_string.split("+")[0].strip("v")
16 |         ns = [int(x) for x in version_string.split(".")]
17 |         return Version(*ns)
18 | 
19 |     def is_at_least(self, other):
20 |         if self.major > other.major:
21 |             return True
22 |         elif self.major == other.major:
23 |             if self.minor > other.minor:
24 |                 return True
25 |             elif self.minor == other.minor:
26 |                 if self.patch >= other.patch:
27 |                     return True
28 |         return False
29 | 
30 |     def __str__(self):
31 |         return f"{self.major}.{self.minor}.{self.patch}"
32 | 
33 | 
34 | def command_exist(command) -> bool:
35 |     try:
36 |         subprocess.check_output(f"which {command}", shell=True, stderr=subprocess.STDOUT)
37 |         return True
38 |     except subprocess.CalledProcessError:
39 |         return False
40 | 
41 | 
42 | def get_version(command) -> str:
43 |     try:
44 |         version_string = subprocess.check_output(command, shell=True, stderr=subprocess.DEVNULL).decode("utf-8")
45 |         return version_string
46 |     except subprocess.CalledProcessError:
47 |         raise EdgeException(f"Unexpected error while trying to get version with `{command}`")
48 | 
49 | 
50 | def get_gcloud_version(component: str = "core") -> Version:
51 |     if not command_exist("gcloud"):
52 |         raise EdgeException("Unable to locate gcloud. Please visit https://cloud.google.com/sdk/docs/install for installation instructions.")
53 |     version_string = get_version("gcloud version --format json")
54 |     return Version.from_string(json.loads(version_string)[component])
55 | 
56 | def get_kubectl_version() -> Version:
57 |     if not command_exist("kubectl"):
58 |         raise EdgeException("Unable to locate kubectl. 
Please visit https://kubernetes.io/docs/tasks/tools/ for installation instructions.")
59 |     version_string = get_version("kubectl version --client=true --short -o json")
60 |     return Version.from_string(json.loads(version_string)["clientVersion"]["gitVersion"])
61 | 
62 | 
63 | def get_helm_version() -> Version:
64 |     if not command_exist("helm"):
65 |         raise EdgeException("Unable to locate helm. Please visit https://helm.sh/docs/intro/install/ for installation instructions.")
66 |     version_string = get_version("helm version --short")
67 |     return Version.from_string(version_string)
68 | 
--------------------------------------------------------------------------------
/src/edge/vertex_deploy.py:
--------------------------------------------------------------------------------
1 | from google.cloud.aiplatform import Model, Endpoint
2 | from google.api_core.exceptions import NotFound
3 | 
4 | from edge.exception import EdgeException
5 | from edge.tui import StepTUI, SubStepTUI
6 | 
7 | 
8 | def vertex_deploy(endpoint_resource_name: str, model_resource_name: str, model_name: str):
9 |     with StepTUI(f"Deploying model '{model_name}'", emoji="🐏"):
10 |         with SubStepTUI(f"Checking endpoint '{endpoint_resource_name}'"):
11 |             try:
12 |                 endpoint = Endpoint(endpoint_name=endpoint_resource_name)
13 |             except NotFound:
14 |                 raise EdgeException(f"Endpoint '{endpoint_resource_name}' was not found. Please reinitialise the model "
15 |                                     f"by running `edge model init` to create it.")
16 |         with SubStepTUI(f"Undeploying previous models from endpoint '{endpoint_resource_name}'"):
17 |             endpoint.undeploy_all()
18 |         with SubStepTUI(f"Deploying model '{model_resource_name}' on endpoint '{endpoint_resource_name}'"):
19 |             try:
20 |                 model = Model(model_resource_name)
21 |             except NotFound:
22 |                 raise EdgeException(f"Model '{model_resource_name}' was not found. 
You need to train a model " 23 | f"by running `dvc repro ...`.") 24 | endpoint.deploy(model=model, traffic_percentage=100, machine_type="n1-standard-2") 25 | -------------------------------------------------------------------------------- /src/vertex_edge.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | vertex:edge CLI tool 4 | """ 5 | import argparse 6 | import warnings 7 | import logging 8 | 9 | from edge.command.force_unlock import force_unlock 10 | from edge.command.experiments.subparser import add_experiments_parser, run_experiments_actions 11 | from edge.command.init import edge_init 12 | from edge.command.dvc.subparser import add_dvc_parser, run_dvc_actions 13 | from edge.command.config.subparser import add_config_parser, run_config_actions 14 | from edge.command.model.subparser import add_model_parser, run_model_actions 15 | 16 | logging.disable(logging.WARNING) 17 | warnings.filterwarnings( 18 | "ignore", 19 | "Your application has authenticated using end user credentials from Google Cloud SDK without a quota project.", 20 | ) 21 | 22 | 23 | if __name__ == "__main__": 24 | parser = argparse.ArgumentParser(description="Edge", formatter_class=argparse.RawTextHelpFormatter) 25 | parser.add_argument( 26 | "-c", "--config", type=str, default="edge.yaml", help="Path to the configuration file (default: edge.yaml)" 27 | ) 28 | 29 | subparsers = parser.add_subparsers(title="command", dest="command", required=True) 30 | init_parser = subparsers.add_parser("init", help="Initialise vertex:edge") 31 | force_unlock_parser = subparsers.add_parser("force-unlock", help="Force unlock vertex:edge state") 32 | 33 | add_dvc_parser(subparsers) 34 | add_model_parser(subparsers) 35 | add_experiments_parser(subparsers) 36 | add_config_parser(subparsers) 37 | 38 | args = parser.parse_args() 39 | 40 | if args.command == "init": 41 | edge_init() 42 | elif args.command == "force-unlock": 43 | force_unlock() 44 | 
elif args.command == "dvc":
45 |         run_dvc_actions(args)
46 |     elif args.command == "model":
47 |         run_model_actions(args)
48 |     elif args.command == "experiments":
49 |         run_experiments_actions(args)
50 |     elif args.command == "config":
51 |         run_config_actions(args)
52 |     else:
53 |         raise NotImplementedError("The rest of the commands are not implemented")
54 | 
--------------------------------------------------------------------------------
/tutorials/setup.md:
--------------------------------------------------------------------------------
1 | # vertex:edge setup
2 | 
3 | In this tutorial you'll see how to set up a new project using vertex:edge.
4 | 
5 | ## Preparation
6 | 
7 | The very first thing you'll need is a fresh directory in which to work. For instance:
8 | 
9 | ```
10 | mkdir hello-world-vertex
11 | cd hello-world-vertex
12 | ```
13 | 
14 | ## Setting up GCP environment
15 | 
16 | Now you'll need a [GCP account](https://cloud.google.com), so sign up for one if you haven't already done so.
17 | 
18 | Then within your GCP account, [create a new project](https://cloud.google.com/resource-manager/docs/creating-managing-projects). Take a note of the project ID; you'll be able to view this in the Google Cloud console (via the project selection dialog). Note that the project ID won't necessarily match the name that you chose for the project, as GCP often appends some digits to the end of the name.
19 | 
20 | Finally, make sure you have [enabled billing](https://cloud.google.com/billing/docs/how-to/modify-project) for your new project too.
21 | 
22 | ## Authenticating with GCP
23 | 
24 | If you haven't got the `gcloud` command line tool, [install it now](https://cloud.google.com/sdk/docs/install).
25 | 
26 | And then authenticate by running:
27 | 
28 | ```
29 | gcloud auth login
30 | ```
31 | 
32 | Next you need to configure the project ID. This should be the project which you created during 'Setting up GCP environment' above.
33 | 
34 | ```
35 | gcloud config set project <your-project-id>
36 | ```
37 | 
38 | You'll also need to configure a region. Please see the [GCP documentation](https://cloud.google.com/vertex-ai/docs/general/locations#feature-availability) to learn which regions are available for Vertex.
39 | 
40 | ```
41 | gcloud config set compute/region <region>
42 | ```
43 | 
44 | **Note** `gcloud` might ask you if you want to enable the Google Compute API on the project. If so, type `y` to enable this.
45 | 
46 | Finally, you need to run one more command to complete authentication:
47 | 
48 | ```
49 | gcloud auth application-default login
50 | ```
51 | 
52 | ## Installing vertex:edge
53 | 
54 | We'll use pip to install **vertex:edge**. Before doing this, it's a good idea to run `pip install --upgrade pip` to ensure that you have the most recent pip version.
55 | 
56 | To install vertex:edge, run:
57 | 
58 | ```
59 | pip install vertex-edge
60 | ```
61 | 
62 | After doing that, you should have the `edge` command available. Try running:
63 | 
64 | ```
65 | edge --help
66 | ```
67 | 
68 | **Note** that when you run `edge` for the first time, it will download a Docker image (`fuzzylabs/edge`), which might take some time to complete. All Edge commands run inside Docker.
69 | 
70 | ## Initialising vertex:edge
71 | 
72 | Before you can use **vertex:edge** to train models, you'll need to initialise your project. This only needs to be done once, whenever you start a new project.
73 | 
74 | Inside your project directory, run:
75 | 
76 | ```
77 | edge init
78 | ```
79 | 
80 | As part of the initialisation process, vertex:edge will first verify that your GCP environment is set up correctly, and it will confirm your choice of project name and region, so that you don't accidentally install things to the wrong GCP environment.
81 | 
82 | It will ask you to choose a name for a cloud storage bucket. This bucket is used for a number of things:
83 | 
84 | * Tracking the state of your project.
85 | * Storing model assets. 
86 | * Storing versioned data.
87 | 
88 | Keep in mind that on GCP, storage bucket names are **globally unique**, so you need to choose a name that isn't already taken. For more information please see the [official GCP documentation](https://cloud.google.com/storage/docs/naming-buckets).
89 | 
90 | You might wonder what initialisation actually _does_:
91 | 
92 | * It creates a configuration file in your project directory, called `edge.yaml`. The configuration includes details about your GCP environment, the models that you have created, and the cloud storage bucket.
93 | * It also creates a _state file_. This lives in the cloud storage bucket and is used by **vertex:edge** to keep track of everything that it has deployed or trained.
94 | 
95 | ## Next steps
96 | 
97 | After all of the above, you'll have a new project directory `hello-world-vertex`, which will contain a configuration file `edge.yaml`. You'll also have a GCP project, inside which there will be a cloud storage bucket with a name that matches what you chose during `edge init`.
98 | 
99 | You're now ready to train and deploy a model. See the [next tutorial](train_deploy.md) to learn how.
100 | 
--------------------------------------------------------------------------------
/tutorials/train_deploy.md:
--------------------------------------------------------------------------------
1 | # Training a model
2 | 
3 | In this tutorial you'll train and deploy a TensorFlow model to Google Vertex using the vertex:edge command line tool and Python library.
4 | 
5 | Before following this tutorial, you should have already set up a GCP project and initialised vertex:edge. See the [setup tutorial](setup.md) for more information. 
6 | 
7 | ## Initialisation
8 | 
9 | We're going to use the [TensorFlow](https://www.tensorflow.org) framework for this example, so let's go ahead and install that now:
10 | 
11 | ```
12 | pip install tensorflow
13 | ```
14 | 
15 | Next we initialise a new model, to make **vertex:edge** aware of it:
16 | 
17 | ```
18 | edge model init hello-world
19 | ```
20 | 
21 | If you check your `edge.yaml` file now, you will see that a model has been added to the `models` section:
22 | 
23 | ```yaml
24 | models:
25 |   hello-world:
26 |     endpoint_name: hello-world-endpoint
27 |     name: hello-world
28 |     serving_container_image_uri: europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest
29 |     training_container_image_uri: europe-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6:latest
30 | ```
31 | 
32 | Note that you won't see anything new appear in the Google Cloud Console until after the model has actually been trained, which we'll do next.
33 | 
34 | ## Writing a model training script
35 | 
36 | To begin with, we can generate an outline of our model training code using a template:
37 | 
38 | ```
39 | edge model template hello-world
40 | ```
41 | 
42 | You will be asked which framework you want to use, so select `tensorflow`.
43 | 
44 | There will now be a Python script named `train.py` inside `models/hello-world`. Open this script in your favourite editor or IDE. It looks like this:
45 | 
46 | ```python
47 | from edge.train import Trainer
48 | 
49 | class MyTrainer(Trainer):
50 |     def main(self):
51 |         self.set_parameter("example", 123)
52 | 
53 |         # Add model training logic here
54 | 
55 |         return 0  # return your model score here
56 | 
57 | MyTrainer("hello-world").run()
58 | ```
59 | 
60 | Every model training script needs to have the basic structure shown above. Let's break this down a little bit:
61 | 
62 | * We start by importing the class `Trainer` from the **vertex:edge** library.
63 | * We define a training class. 
This class can have any name you like, as long as it extends `Trainer`.
64 | * The `Trainer` class provides a method called `main`, and this is where we write all of the model training logic.
65 | * We have the ability to set parameters and save performance metrics for experiment tracking - more on this shortly.
66 | * At the end, we just need one more line to instantiate and run our training class.
67 | 
68 | Now let's create something a bit more interesting. A simple classifier:
69 | 
70 | ```python
71 | TODO
72 | ```
73 | 
74 | ## Training the model
75 | 
76 | Now we can train the model simply by running:
77 | 
78 | ```
79 | python models/hello-world/train.py
80 | ```
81 | 
82 | This will run the training script locally - i.e. on your computer. That's fine if your model is reasonably simple, but for more compute-intensive models we want to use the on-demand compute available in Google Vertex.
83 | 
84 | The good news is that you don't need to modify the code in any way in order to train the model on Vertex, because **vertex:edge** figures out how to package the training script and run it for you. All you need to run is:
85 | 
86 | ```
87 | RUN_ON_VERTEX=True python models/hello-world/train.py
88 | ```
89 | 
90 | ## Deploying the model
91 | 
92 | Once you've trained the model on Vertex as above, you can also deploy it to Vertex. One important thing to remember, however, is that models trained locally _cannot_ be deployed to Vertex. 
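Incidentally, the `RUN_ON_VERTEX=True` switch used above is just an environment variable that the training script reads when it starts. As an illustration only (the actual decision logic lives inside `edge.train.Trainer` and may differ), such a toggle can be read along these lines:

```python
import os

def run_on_vertex() -> bool:
    """Return True when training should be submitted to Vertex rather than run locally."""
    # Accept common truthy spellings: RUN_ON_VERTEX=True, true, 1, yes
    return os.environ.get("RUN_ON_VERTEX", "").strip().lower() in {"true", "1", "yes"}
```

With a convention like this, leaving the variable unset (or set to anything else) keeps training on your own machine.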
93 | 94 | Because vertex:edge keeps track of all the models you've trained, it's very easy to deploy the most recently trained model, like this: 95 | 96 | ``` 97 | edge model deploy hello-world 98 | ``` 99 | -------------------------------------------------------------------------------- /vertex-edge-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/vertex-edge-logo.png --------------------------------------------------------------------------------