├── .dockerignore
├── .github
│   └── ISSUE_TEMPLATE
│       ├── bug_report.md
│       └── feature_request.md
├── .gitignore
├── .python-version
├── CONTRIBUTING.md
├── DEVELOPERS.md
├── Dockerfile
├── LICENSE
├── MANIFEST.in
├── README.md
├── demo.gif
├── edge
├── edge_docker_entrypoint.sh
├── omniboard-screenshot.png
├── pyproject.toml
├── requirements-dev.txt
├── requirements.txt
├── setup.py
├── src
│   ├── edge
│   │   ├── __init__.py
│   │   ├── command
│   │   │   ├── __init__.py
│   │   │   ├── common
│   │   │   │   ├── __init__.py
│   │   │   │   └── precommand_check.py
│   │   │   ├── config
│   │   │   │   ├── __init__.py
│   │   │   │   └── subparser.py
│   │   │   ├── dvc
│   │   │   │   ├── __init__.py
│   │   │   │   ├── init.py
│   │   │   │   └── subparser.py
│   │   │   ├── experiments
│   │   │   │   ├── __init__.py
│   │   │   │   ├── get_dashboard.py
│   │   │   │   ├── get_mongodb.py
│   │   │   │   ├── init.py
│   │   │   │   └── subparser.py
│   │   │   ├── force_unlock.py
│   │   │   ├── init.py
│   │   │   └── model
│   │   │       ├── __init__.py
│   │   │       ├── deploy.py
│   │   │       ├── describe.py
│   │   │       ├── get_endpoint.py
│   │   │       ├── init.py
│   │   │       ├── list.py
│   │   │       ├── remove.py
│   │   │       ├── subparser.py
│   │   │       └── template.py
│   │   ├── config.py
│   │   ├── dvc.py
│   │   ├── enable_api.py
│   │   ├── endpoint.py
│   │   ├── exception.py
│   │   ├── gcloud.py
│   │   ├── k8s
│   │   │   ├── __init__.py
│   │   │   └── omniboard.yaml
│   │   ├── path.py
│   │   ├── sacred.py
│   │   ├── state.py
│   │   ├── storage.py
│   │   ├── templates
│   │   │   └── tensorflow_model
│   │   │       ├── cookiecutter.json
│   │   │       └── {{cookiecutter.model_name}}
│   │   │           ├── __init__.py
│   │   │           └── train.py
│   │   ├── train.py
│   │   ├── tui.py
│   │   ├── versions.py
│   │   └── vertex_deploy.py
│   └── vertex_edge.py
├── tutorials
│   ├── setup.md
│   └── train_deploy.md
└── vertex-edge-logo.png
/.dockerignore:
--------------------------------------------------------------------------------
1 | env/
2 | *~
3 | __pycache__/
4 | .pytest_cache/
5 | data/
6 | build/
7 | dist/
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Go to '...'
16 | 2. Click on '....'
17 | 3. Scroll down to '....'
18 | 4. See error
19 |
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 |
23 | **Screenshots**
24 | If applicable, add screenshots to help explain your problem.
25 |
26 | **Desktop (please complete the following information):**
27 | - OS: [e.g. iOS]
28 | - Browser [e.g. chrome, safari]
29 | - Version [e.g. 22]
30 |
31 | **Additional context**
32 | Add any other context about the problem here.
33 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 |
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 |
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 |
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | env/
2 | *~
3 | __pycache__/
4 | .pytest_cache/
5 | /.idea/
6 | build/
7 | dist/
8 | /**/vertex_edge.egg-info/
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
1 | 3.8.0
2 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to vertex:edge
2 |
3 | First off, thanks for taking the time to contribute!
4 |
5 | This article will help you get started, from learning [how you can contribute](#how) all the way to raising your first [pull request](#firstcontrib).
6 |
7 | ## Contents
8 |
9 | * [How can I contribute?](#how)
10 | * [Your first code contribution](#firstcontrib)
11 | * [Style guides](#styleguides)
12 |
13 |
14 | ## How can I contribute?
15 |
16 | ### Testing
17 |
18 | This is a new project that is moving fast, and so one of the most useful ways you can help out is simply testing the tools and the documentation in order to tease out bugs, edge-cases and opportunities to improve things.
19 |
20 | If you are testing this out, whether in production or not, we're really keen to hear from you and to receive your feedback.
21 |
22 | ### Reporting bugs
23 |
24 | Sadly, bugs happen; we're sorry! Before reporting a bug, please check the [open issues](https://github.com/fuzzylabs/vertex-edge/issues) to see if somebody has submitted the same bug before. If so, feel free to add further detail to the existing issue.
25 |
26 | If your bug hasn't been raised before then [go ahead and raise it](https://github.com/fuzzylabs/vertex-edge/issues/new?assignees=&labels=bug&template=bug_report.md&title=) using our bug report template. Please provide as much information as possible to help us to reproduce the bug.
27 |
28 | ### Suggesting enhancements
29 |
30 | Enhancements and feature requests are very much welcome. We hope to learn from real-world usage which features are missing, so that we can improve the tool to meet the expectations of real machine learning projects. Please use our [feature request template](https://github.com/fuzzylabs/vertex-edge/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) to do this.
31 |
32 | ### Taking on an existing issue
33 |
34 | You'll find plenty of opportunities to contribute among our [open issues](https://github.com/fuzzylabs/vertex-edge/issues). If you'd like to pick up an issue, please add a comment saying so, as this avoids duplicated work. Then read on to make your [first code contribution](#firstcontrib).
35 |
36 |
37 | ## Your first code contribution
38 |
39 | ### Fork the repository
40 |
41 | We prefer that you fork the repository to your own GitHub account before raising a pull request.
42 |
43 | ### Pull requests
44 |
45 | Once you've got a code change that's ready to be reviewed, please raise a pull request. If you've got some ongoing work that's not quite ready for review, feel free to raise the pull request, but please place `[WIP]` (work-in-progress) in front of the PR title so we know it's still being worked on.
46 |
47 | Please include a description in the pull request explaining what has been changed and/or added, how, and why. Please also link to relevant issues and discussions where appropriate.
48 |
49 |
50 | ## Style guides
51 |
52 | ### Git commit messages
53 |
54 | * Make sure it's descriptive, so not `fix bug` but `fix issue #1234 where servers spontaneously combusted on random Tuesdays`.
55 | * Keep the first line brief; use multiple lines if you want to add more details.
56 | * Reference relevant issues, discussions and pull requests where appropriate.
57 |
58 | ### Python code
59 |
60 | * Above all, write clean, understandable code.
61 | * Use [black](https://github.com/psf/black) and [PyLint](https://pypi.org/project/pylint) to help ensure code is consistent.
62 |
63 | ### Documentation
64 |
65 | * Use [Markdown](https://guides.github.com/features/mastering-markdown).
66 | * Place a table of contents at the top of each Markdown file.
67 | * Write concise, clear explanations.
68 |
--------------------------------------------------------------------------------
/DEVELOPERS.md:
--------------------------------------------------------------------------------
1 | # Development guide
2 |
3 | ## Python Package
4 |
5 | ### Requirements
6 |
7 | ```
8 | pip install -r requirements-dev.txt
9 | ```
10 |
11 | ### Build
12 |
13 | TODO
14 |
15 | ```
16 | ./setup.py build
17 | ./setup.py install
18 | ```
19 |
20 | Or, to build a distributable package:
21 |
22 | ```
23 | python -m build
24 | ```
25 |
26 | ### Push to PyPI
27 |
28 | ```
29 | twine upload dist/* --verbose
30 | ```
31 |
32 | ### Testing locally
33 |
34 | ```
35 | mkdir my_test_project
36 | cd my_test_project
37 | python -m venv env/
38 | source env/bin/activate
39 | pip install -e /path/to/vertex-edge
40 | ```
41 |
42 | This installs the tool in editable mode inside the virtual environment; replace `/path/to/vertex-edge` with the path to your local checkout of this repository.
43 |
44 | ## Docker image
45 |
46 | ### Build
47 |
48 | ```
49 | docker build . -t fuzzylabs/edge
50 | ```
51 |
52 | ### Push
53 |
54 | ```
55 | docker push fuzzylabs/edge
56 | ```
57 |
58 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.9.6-slim
2 |
3 | RUN apt update \
4 | && apt install -y curl \
5 | && apt install -y git \
6 | && rm -rf /var/lib/apt/lists/*
7 |
8 | # Install GCloud tools
9 | RUN curl https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-347.0.0-linux-x86_64.tar.gz > /tmp/google-cloud-sdk.tar.gz \
10 | && mkdir -p /usr/local/gcloud \
11 | && tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \
12 | && /usr/local/gcloud/google-cloud-sdk/install.sh \
13 | && /usr/local/gcloud/google-cloud-sdk/bin/gcloud components install alpha --quiet \
14 | && pip install dvc \
15 | && rm /tmp/google-cloud-sdk.tar.gz
16 |
17 | ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
18 |
19 | # Install Kubectl
20 | RUN curl -LO https://dl.k8s.io/release/v1.21.0/bin/linux/amd64/kubectl \
21 | && install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
22 |
23 | # Install Helm
24 | RUN curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
25 |
26 | # Install Python dependencies
27 | WORKDIR /project/
28 | COPY requirements.txt requirements.txt
29 | RUN pip install --no-cache-dir -r requirements.txt
30 |
31 | # Install edge
32 | COPY setup.py setup.py
33 | COPY MANIFEST.in MANIFEST.in
34 | COPY edge edge
35 | COPY src/ src/
36 |
37 | COPY src/edge/k8s/omniboard.yaml /omniboard.yaml
38 |
39 | RUN ./setup.py build
40 | RUN ./setup.py install
41 |
42 | # Copy the entrypoint script
43 | COPY edge_docker_entrypoint.sh /edge_docker_entrypoint.sh
44 | ENTRYPOINT ["/edge_docker_entrypoint.sh"]
45 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include src/edge/k8s/*.yaml
2 | recursive-include src/edge/templates/ *
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 |
4 |
5 |
6 |
7 | # Vertex:Edge
8 |
9 | Adopting MLOps into a data science workflow requires specialist knowledge of cloud engineering. As a data scientist, you just want to train your models and get on with your life. **vertex:edge** provides an environment for training and deploying models on Google Cloud that leverages the best available open-source MLOps tools to track your experiments and version your data.
10 |
11 |
14 |
15 | ## Contents
16 |
17 | * **[Why vertex:edge?](#why-vertexedge)**
18 | * **[Pre-requisites](#pre-requisites)**
19 | * **[Quick-start](#quick-start)**
20 | * **[Tutorials](#tutorials)**
21 | * **[Contributing](#contributing)**
22 |
23 | # Why vertex:edge?
24 |
25 | **vertex:edge** is a tool that sits on top of Vertex (Google's cloud AI platform). Ordinarily, training and deploying models with Vertex requires a fair amount of repetitive work, and moreover the tooling provided by Vertex for things like data versioning and experiment tracking [isn't quite up to scratch](https://fuzzylabs.ai/blog/vertex-ai-the-hype).
26 |
27 | **vertex:edge** addresses a number of challenges:
28 |
29 | * Training and deploying a model on Vertex with minimal effort.
30 | * Setting up useful MLOps tools such as experiment trackers in Google Cloud, without needing a lot of cloud engineering knowledge.
31 | * Seamlessly integrating MLOps tools into machine learning workflows.
32 |
33 | Our vision is to provide a complete environment for training models with MLOps capabilities built-in. Right now we support model training and deployment through Vertex and TensorFlow, experiment tracking thanks to [Sacred](https://github.com/IDSIA/sacred), and data versioning through [DVC](https://dvc.org). In the future we want to not only expand these features, but also add:
34 |
35 | * Support for multiple ML frameworks.
36 | * Integration into model monitoring solutions.
37 | * Easy integration into infrastructure-as-code tools such as Terraform.
38 |
39 | # Pre-requisites
40 |
41 | * [A Google Cloud account](https://cloud.google.com).
42 | * [gcloud command line tool](https://cloud.google.com/sdk/docs/install).
43 | * [Docker](https://docs.docker.com/get-docker) (version 18 or greater).
44 | * Python, at least version 3.8. Check this using `python --version`.
45 | * PIP, at least version 21.2.0. Check this using `pip --version`. To upgrade PIP, run `pip install --upgrade pip`.
46 |
47 | # Quick-start
48 |
49 | This guide gives you a quick overview of using **vertex:edge** to train and deploy a model. If this is your first time training a model on Vertex, we recommend reading the more detailed tutorials on [Project Setup](tutorials/setup.md) and [Training and Deploying a Model to GCP](tutorials/train_deploy.md).
50 |
51 | ## Install vertex:edge
52 |
53 | ```
54 | pip install vertex-edge
55 | ```
56 |
57 | ## Authenticate with GCP
58 |
59 | ```
60 | gcloud auth login
61 | gcloud config set project <your-project-id>
62 | gcloud config set compute/region <your-region>
63 | gcloud auth application-default login
64 | ```
65 |
66 | ## Initialise your project
67 |
68 | ```
69 | edge init
70 | edge model init hello-world
71 | edge model template hello-world
72 | ```
73 |
74 | N.B. when you run `edge init`, you will be prompted for a cloud storage bucket name. This bucket is used for tracking your project state, storing trained models, and storing versioned data. Remember that bucket names need to be globally unique on GCP.
75 |
76 | ## Train and deploy
77 |
78 | After running the above, you'll have a new Python script under `models/hello-world/train.py`. This script uses TensorFlow to train a simple model.
79 |
80 | To train the model on Google Vertex, run:
81 |
82 | ```
83 | RUN_ON_VERTEX=True python models/hello-world/train.py
84 | ```
85 |
86 | Once this has finished, you can deploy the model using:
87 |
88 | ```
89 | edge model deploy hello-world
90 | ```
91 |
92 | You can also train the model locally, without modifying any of the code:
93 |
94 | ```
95 | pip install tensorflow
96 | python models/hello-world/train.py
97 | ```
98 |
99 | Note that we needed to install TensorFlow first. This is by design, because we don't want the **vertex:edge** tool to depend on specific ML frameworks.
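The `RUN_ON_VERTEX` switch works because the generated training script decides where to run from an environment variable. A minimal standard-library sketch of that pattern (the helper name below is illustrative, not the actual code in the generated `train.py`):

```python
import os


def should_run_on_vertex() -> bool:
    """Return True when the RUN_ON_VERTEX environment variable is set to a
    truthy value, signalling that training should be submitted to Vertex
    rather than run locally."""
    return os.environ.get("RUN_ON_VERTEX", "False").lower() in ("true", "1", "yes")


if __name__ == "__main__":
    if should_run_on_vertex():
        print("Submitting training job to Vertex...")
    else:
        print("Training locally...")
```

Because the decision is made at runtime, the same script serves both workflows without any code changes.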
100 |
101 | ## Track experiments
102 |
103 | We can add experiment tracking with just one command:
104 |
105 | ```
106 | edge experiments init
107 | ```
108 |
109 | With experiment tracking enabled, every time you train a model, the details of the training run will be recorded, including performance metrics and training parameters.
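Conceptually, each tracked run is a structured record of parameters and metrics. A hypothetical illustration of the kind of document a tracker such as Sacred persists per run (field names here are illustrative, not Sacred's actual schema):

```python
import json
import time


def make_run_record(params: dict, metrics: dict) -> str:
    """Bundle one training run's parameters and metrics into a JSON
    document, the shape of record an experiment tracker stores."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,    # e.g. {"learning_rate": 0.01, "epochs": 10}
        "metrics": metrics,  # e.g. {"accuracy": 0.97, "loss": 0.08}
    }
    return json.dumps(record)
```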
110 |
111 | You can view all of these experiments in a dashboard. To get the dashboard URL, run:
112 |
113 | ```
114 | edge experiments get-dashboard
115 | ```
116 |
117 |
118 |
119 |
120 |
121 | To learn more, read our tutorial on [Tracking your experiments](tutorials/experiment_tracking.md).
122 |
123 | ## Version data
124 |
125 | By using data version control you can always track the history of your data. Combined with experiment tracking, it means each model can be tied to precisely the dataset that was used when the model was trained.
126 |
127 | We use [DVC](https://dvc.org) for data versioning. To enable it, run:
128 |
129 | ```
130 | edge dvc init
131 | ```
132 |
133 | N.B. you need to be working in an existing Git repository before you can enable data versioning.
134 |
135 | To learn more, read our tutorial on [Versioning your data](tutorials/versioning_data.md).
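Under the hood, DVC pins each dataset version by recording a content hash in a small metadata file that Git tracks. A standard-library sketch of that fingerprinting idea (an illustration of the concept, not DVC's actual implementation):

```python
import hashlib


def content_hash(path: str) -> str:
    """Compute an MD5 digest of a file's bytes, the kind of fingerprint
    DVC records so a dataset version can be referenced from Git."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large datasets don't need to fit in memory
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the hash changes whenever the data changes, committing the hash to Git is enough to tie any experiment to the exact dataset it was trained on.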
136 |
137 | # Tutorials
138 |
139 | * [Project Setup](tutorials/setup.md)
140 | * [Training and Deploying a Model to GCP](tutorials/train_deploy.md)
141 | * [Tracking your experiments](tutorials/experiment_tracking.md)
142 | * [Versioning your data](tutorials/versioning_data.md)
143 |
144 | # Contributing
145 |
146 | This is a new project and we're keen to get feedback from the community to help us improve it. Please do **raise and discuss issues**, send us pull requests, and don't forget to **~~like and subscribe~~** star and fork this repo.
147 |
148 | **If you want to contribute** then please check out our [contributions guide](CONTRIBUTING.md) and [developers guide](DEVELOPERS.md). We look forward to your contributions!
149 |
--------------------------------------------------------------------------------
/demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/demo.gif
--------------------------------------------------------------------------------
/edge:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # This script is a wrapper for running the vertex:edge Docker image
4 | # As well as running the image itself, it also mounts the Google Cloud secret key
5 | # So that we will be authenticated with GCP while inside the Docker container
6 |
7 | ACCOUNT=$(gcloud config get-value account)
8 | HOST_UID=$(id -u)
9 | HOST_GID=$(id -g)
10 | docker run -it \
11 | -v "$(pwd)":/project/ \
12 | -v ~/.config/gcloud/:/root/.config/gcloud/ \
13 | -e ACCOUNT="$ACCOUNT" -e HOST_UID="$HOST_UID" -e HOST_GID="$HOST_GID" \
14 | fuzzylabs/edge "$@"
15 |
--------------------------------------------------------------------------------
/edge_docker_entrypoint.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | export GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json
3 | gcloud config set component_manager/disable_update_check true &> /dev/null
4 | gcloud config set account "$ACCOUNT" &> /dev/null
5 |
6 | if [[ $1 == "bash" ]]
7 | then
8 | bash
9 | else
10 | vertex_edge.py "$@"
11 | fi
12 |
13 | chown -R "$HOST_UID":"$HOST_GID" . &> /dev/null
14 |
--------------------------------------------------------------------------------
/omniboard-screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/omniboard-screenshot.png
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.black]
2 | line-length = 120
3 |
4 | [tool.pylint.'MESSAGES CONTROL']
5 | max-line-length = 120
6 |
--------------------------------------------------------------------------------
/requirements-dev.txt:
--------------------------------------------------------------------------------
1 | # Packages used for development but not required for the library distribution
2 |
3 | pylint
4 | black
5 | twine
6 | build
7 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | ## Serialisation / deserialisation
2 |
3 | pyserde==0.4.0
4 |
5 | ## GCP
6 |
7 | google-cloud-container==2.4.1
8 | google-cloud-secret-manager==2.5.0
9 | google_cloud_aiplatform==1.1.1
10 | google_cloud_storage==1.38.0
11 |
12 | ## Data versioning
13 |
14 | #dvc[gs]==2.5.0
15 |
16 | ## Experiment tracking
17 |
18 | sacred==0.8.2
19 | pymongo==3.11.4
20 |
21 | ## Terminal UI
22 |
23 | questionary==1.10.0
24 |
25 | ## TODO: do we need Dill?
26 | #dill==0.3.4
27 |
28 | # Model templating
29 | cookiecutter==1.7.3
30 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | from setuptools import setup, find_packages
4 |
5 | setup(
6 |     name="vertex-edge",
7 | version="0.1.117",
8 | url="https://github.com/fuzzylabs/vertex-edge",
9 | package_dir={'': 'src'},
10 | packages=find_packages("src/"),
11 | scripts=["edge", "src/vertex_edge.py"],
12 | include_package_data=True,
13 | install_requires=[
14 | "pyserde==0.4.0",
15 | #"google-api-core==1.29.0",
16 | "google-cloud-container==2.4.1",
17 | "google-cloud-secret-manager==2.5.0",
18 | "google_cloud_aiplatform==1.1.1",
19 | "google-cloud-storage==1.38.0",
20 | "cookiecutter==1.7.3",
21 | #"dvc[gs]==2.5.0",
22 | "sacred==0.8.2",
23 | "pymongo==3.11.4",
24 | "questionary==1.10.0"
25 | ]
26 | )
27 |
--------------------------------------------------------------------------------
/src/edge/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/__init__.py
--------------------------------------------------------------------------------
/src/edge/command/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/__init__.py
--------------------------------------------------------------------------------
/src/edge/command/common/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/common/__init__.py
--------------------------------------------------------------------------------
/src/edge/command/common/precommand_check.py:
--------------------------------------------------------------------------------
1 | from edge.config import EdgeConfig
2 | from edge.exception import EdgeException
3 | from edge.gcloud import is_authenticated, project_exists, is_billing_enabled
4 | from edge.tui import SubStepTUI, StepTUI
5 |
6 |
7 | def check_gcloud_authenticated(project_id: str):
8 |     with SubStepTUI(message="Checking if you have authenticated with gcloud") as sub_step:
9 | _is_authenticated, _reason = is_authenticated(project_id)
10 | if not _is_authenticated:
11 | raise EdgeException(_reason)
12 |
13 |
14 | def check_project_exists(gcloud_project: str):
15 | with SubStepTUI(f"Checking if project '{gcloud_project}' exists") as sub_step:
16 | project_exists(gcloud_project)
17 |
18 |
19 | def check_billing_enabled(gcloud_project: str):
20 | with SubStepTUI(f"Checking if billing is enabled for project '{gcloud_project}'") as sub_step:
21 | if not is_billing_enabled(gcloud_project):
22 | raise EdgeException(
23 | f"Billing is not enabled for project '{gcloud_project}'. "
24 |                 f"Please enable billing for this project "
25 |                 f"by following these instructions: "
26 |                 f"https://cloud.google.com/billing/docs/how-to/modify-project"
27 | )
28 |
29 |
30 | def precommand_checks(config: EdgeConfig):
31 | gcloud_project = config.google_cloud_project.project_id
32 | with StepTUI(message="Checking your GCP environment", emoji="☁️") as step:
33 | check_gcloud_authenticated(gcloud_project)
34 | check_project_exists(gcloud_project)
35 | check_billing_enabled(gcloud_project)
36 |
--------------------------------------------------------------------------------
/src/edge/command/config/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/config/__init__.py
--------------------------------------------------------------------------------
/src/edge/command/config/subparser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import sys
3 |
4 | from edge.config import EdgeConfig
5 | from edge.exception import EdgeException
6 |
7 |
8 | def add_config_parser(subparsers):
9 | parser = subparsers.add_parser("config", help="Configuration related actions")
10 | actions = parser.add_subparsers(title="action", dest="action", required=True)
11 | actions.add_parser("get-region", help="Get configured region")
12 |
13 |
14 | def run_config_actions(args: argparse.Namespace):
15 | if args.action == "get-region":
16 | with EdgeConfig.context(silent=True) as config:
17 | print(config.google_cloud_project.region)
18 | sys.exit(0)
19 | else:
20 |         raise EdgeException("Unexpected config command")
21 |
--------------------------------------------------------------------------------
/src/edge/command/dvc/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/dvc/__init__.py
--------------------------------------------------------------------------------
/src/edge/command/dvc/init.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from edge.command.common.precommand_check import precommand_checks
4 | from edge.config import EdgeConfig
5 | from edge.state import EdgeState
6 | from edge.tui import TUI
7 | from edge.dvc import setup_dvc
8 | from edge.path import get_model_dvc_pipeline
9 |
10 |
11 | def dvc_init():
12 | intro = "Initialising data version control (DVC)"
13 | success_title = "DVC initialised successfully"
14 | success_message = f"""
15 | Now you can version your data using DVC. See https://dvc.org/doc for more details about how it can be used.
16 |
17 | What's next? We suggest you proceed with:
18 |
19 | Train and deploy a model (see 'Training a model' section of the README for more details):
20 | ./edge.sh model init fashion
21 | dvc repro {get_model_dvc_pipeline("fashion")}
22 | ./edge.sh model deploy fashion
23 |
24 | Happy herding! 🐏
25 | """.strip()
26 | failure_title = "DVC initialisation failed"
27 | failure_message = "See the errors above. See README for more details."
28 | with TUI(
29 | intro,
30 | success_title,
31 | success_message,
32 | failure_title,
33 | failure_message
34 | ) as tui:
35 | with EdgeConfig.context() as config:
36 | precommand_checks(config)
37 | with EdgeState.context(config) as state:
38 | setup_dvc(
39 | state.storage.bucket_path,
40 | config.storage_bucket.dvc_store_directory
41 | )
42 |
43 |
--------------------------------------------------------------------------------
/src/edge/command/dvc/subparser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from edge.command.dvc.init import dvc_init
3 | from edge.exception import EdgeException
4 |
5 |
6 | def add_dvc_parser(subparsers):
7 | parser = subparsers.add_parser("dvc", help="DVC related actions")
8 | actions = parser.add_subparsers(title="action", dest="action", required=True)
9 | actions.add_parser("init", help="Initialise DVC")
10 |
11 |
12 | def run_dvc_actions(args: argparse.Namespace):
13 | if args.action == "init":
14 | dvc_init()
15 | else:
16 | raise EdgeException("Unexpected DVC command")
17 |
--------------------------------------------------------------------------------
/src/edge/command/experiments/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/experiments/__init__.py
--------------------------------------------------------------------------------
/src/edge/command/experiments/get_dashboard.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | from edge.config import EdgeConfig
4 | from edge.state import EdgeState
5 |
6 |
7 | def get_dashboard():
8 | with EdgeConfig.context(silent=True) as config:
9 | with EdgeState.context(config, silent=True) as state:
10 | print(state.sacred.external_omniboard_string)
11 | sys.exit(0)
12 |
--------------------------------------------------------------------------------
/src/edge/command/experiments/get_mongodb.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | from edge.config import EdgeConfig
4 | from edge.sacred import get_connection_string
5 | from edge.state import EdgeState
6 |
7 |
8 | def get_mongodb():
9 | with EdgeConfig.context(silent=True) as config:
10 | with EdgeState.context(config, silent=True) as state:
11 | project_id = config.google_cloud_project.project_id
12 | secret_id = config.experiments.mongodb_connection_string_secret
13 | print(get_connection_string(project_id, secret_id))
14 | sys.exit(0)
15 |
--------------------------------------------------------------------------------
/src/edge/command/experiments/init.py:
--------------------------------------------------------------------------------
1 | from edge.command.common.precommand_check import precommand_checks
2 | from edge.config import EdgeConfig, SacredConfig
3 | from edge.enable_api import enable_service_api
4 | from edge.exception import EdgeException
5 | from edge.path import get_model_dvc_pipeline
6 | from edge.state import EdgeState
7 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark
8 | from edge.sacred import setup_sacred
9 | import questionary
10 |
11 |
12 | def experiments_init():
13 | intro = "Initialising experiment tracking"
14 | success_title = "Experiment tracking initialised successfully"
15 | success_message = ""
16 | failure_title = "Experiment tracking initialisation failed"
17 | failure_message = "See the errors above. See README for more details."
18 | with TUI(
19 | intro,
20 | success_title,
21 | success_message,
22 | failure_title,
23 | failure_message
24 | ) as tui:
25 | with EdgeConfig.context(to_save=True) as config:
26 | precommand_checks(config)
27 | with EdgeState.context(config, to_lock=True, to_save=True) as state:
28 | with StepTUI("Enabling required Google Cloud APIs", emoji="☁️"):
29 | with SubStepTUI("Enabling Kubernetes Engine API for experiment tracking"):
30 | enable_service_api("container.googleapis.com", config.google_cloud_project.project_id)
31 | with SubStepTUI("Enabling Secret Manager API for experiment tracking"):
32 | enable_service_api("secretmanager.googleapis.com", config.google_cloud_project.project_id)
33 | with StepTUI("Configuring experiment tracking", emoji="⚙️"):
34 | with SubStepTUI("Configuring Kubernetes cluster name on GCP", status=TUIStatus.NEUTRAL) as sub_step:
35 | sub_step.add_explanation("If a name for an existing cluster is provided, this cluster "
36 | "will be used. Otherwise, vertex:edge will create a new cluster with "
37 |                                              "GKE Autopilot.")
38 | previous_cluster_name = (
39 | config.experiments.gke_cluster_name if config.experiments is not None else "sacred"
40 | )
41 | cluster_name = questionary.text(
42 | "Choose a name for a kubernetes cluster to use:",
43 | default=previous_cluster_name,
44 | qmark=qmark,
45 | validate=(lambda x: x.strip() != "")
46 | ).ask()
47 | if cluster_name is None:
48 | raise EdgeException("Cluster name is required")
49 | sacred_config = SacredConfig(
50 | gke_cluster_name=cluster_name,
51 | mongodb_connection_string_secret="sacred-mongodb-connection-string"
52 | )
53 | config.experiments = sacred_config
54 | sacred_state = setup_sacred(
55 | config.google_cloud_project.project_id,
56 | config.google_cloud_project.region,
57 | config.experiments.gke_cluster_name,
58 | config.experiments.mongodb_connection_string_secret,
59 | )
60 | state.sacred = sacred_state
61 |                     tui.success_message = (
62 | f"Now you can track experiments, and view them in Omniboard dashboard "
63 | f"at {sacred_state.external_omniboard_string}\n\n"
64 | "What's next? We suggest you proceed with:\n\n"
65 | " Train and deploy a model (see 'Training a model' section of the README for more details):\n"
66 | " ./edge.sh model init fashion\n"
67 | f" dvc repro {get_model_dvc_pipeline('fashion')}\n"
68 | " ./edge.sh model deploy fashion\n\n"
69 | "Happy herding! 🐏"
70 | )
71 |
72 |
73 |
74 |
--------------------------------------------------------------------------------
/src/edge/command/experiments/subparser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | from edge.command.experiments.get_dashboard import get_dashboard
4 | from edge.command.experiments.get_mongodb import get_mongodb
5 | from edge.exception import EdgeException
6 | from edge.command.experiments.init import experiments_init
7 |
8 |
9 | def add_experiments_parser(subparsers):
10 | parser = subparsers.add_parser("experiments", help="Experiments related actions")
11 | actions = parser.add_subparsers(title="action", dest="action", required=True)
12 | actions.add_parser("init", help="Initialise experiments")
13 | actions.add_parser("get-dashboard", help="Get experiment tracker dashboard URL")
14 | actions.add_parser("get-mongodb", help="Get MongoDB connection string")
15 |
16 |
17 | def run_experiments_actions(args: argparse.Namespace):
18 | if args.action == "init":
19 | experiments_init()
20 | elif args.action == "get-dashboard":
21 | get_dashboard()
22 | elif args.action == "get-mongodb":
23 | get_mongodb()
24 | else:
25 | raise EdgeException("Unexpected experiments command")
26 |
--------------------------------------------------------------------------------
/src/edge/command/force_unlock.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | from edge.config import EdgeConfig
4 | from edge.state import EdgeState
5 |
6 |
7 | def force_unlock():
8 | with EdgeConfig.context() as config:
9 | EdgeState.unlock(
10 | config.google_cloud_project.project_id,
11 | config.storage_bucket.bucket_name,
12 | )
13 | sys.exit(0)
14 |
--------------------------------------------------------------------------------
/src/edge/command/init.py:
--------------------------------------------------------------------------------
1 | from edge.command.common.precommand_check import check_gcloud_authenticated, check_project_exists, check_billing_enabled
2 | from edge.config import GCProjectConfig, StorageBucketConfig, EdgeConfig
3 | from edge.enable_api import enable_service_api
4 | from edge.exception import EdgeException
5 | from edge.gcloud import is_authenticated, get_gcloud_account, get_gcloud_project, get_gcloud_region, get_gcp_regions, \
6 | project_exists, is_billing_enabled
7 | from edge.state import EdgeState
8 | from edge.storage import setup_storage
9 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark
10 | from edge.versions import get_gcloud_version, Version, get_kubectl_version, get_helm_version
11 | from edge.path import get_model_dvc_pipeline
12 | import questionary
13 |
14 |
15 | def edge_init():
16 | success_title = "Initialised successfully"
17 | success_message = f"""
18 | What's next? We suggest you proceed with:
19 |
20 | Commit the new vertex:edge configuration to git:
21 | git add edge.yaml && git commit -m "Initialise vertex:edge"
22 |
23 | Configure an experiment tracker (optional):
24 | ./edge.sh experiments init
25 |
26 | Configure data version control:
27 | ./edge.sh dvc init
28 |
29 | Train and deploy a model (see 'Training a model' section of the README for more details):
30 | ./edge.sh model init fashion
31 | dvc repro {get_model_dvc_pipeline("fashion")}
32 | ./edge.sh model deploy fashion
33 |
34 | Happy herding! 🐏
35 | """.strip()
36 | failure_title = "Initialisation failed"
37 | failure_message = "See the errors above. See README for more details."
38 | with TUI(
39 | "Initialising vertex:edge",
40 | success_title,
41 | success_message,
42 | failure_title,
43 | failure_message
44 | ) as tui:
45 | with StepTUI(message="Checking your local environment", emoji="🖥️") as step:
46 | with SubStepTUI("Checking gcloud version") as sub_step:
47 | gcloud_version = get_gcloud_version()
48 | expected_gcloud_version_string = "2021.05.21"
49 | expected_gcloud_version = Version.from_string(expected_gcloud_version_string)
50 | if not gcloud_version.is_at_least(expected_gcloud_version):
51 | raise EdgeException(
52 | f"We found gcloud version {str(gcloud_version)}, "
53 | f"but we require at least {str(expected_gcloud_version)}. "
54 | "Update gcloud by running `gcloud components update`."
55 | )
56 |
57 | try:
58 | gcloud_alpha_version = get_gcloud_version("alpha")
59 | expected_gcloud_alpha_version_string = "2021.06.00"
60 | expected_gcloud_alpha_version = Version.from_string(expected_gcloud_alpha_version_string)
61 | if not gcloud_alpha_version.is_at_least(expected_gcloud_alpha_version):
62 | raise EdgeException(
63 | f"We found gcloud alpha component version {str(gcloud_alpha_version)}, "
64 | f"but we require at least {str(expected_gcloud_alpha_version)}. "
65 | "Update gcloud by running `gcloud components update`."
66 | )
67 | except KeyError:
68 | raise EdgeException(
69 | f"We couldn't find the gcloud alpha components, "
70 | f"please install these by running `gcloud components install alpha`"
71 | )
72 |
73 | with SubStepTUI("Checking kubectl version") as sub_step:
74 | kubectl_version = get_kubectl_version()
75 | expected_kubectl_version_string = "v1.19.0"
76 | expected_kubectl_version = Version.from_string(expected_kubectl_version_string)
77 | if not kubectl_version.is_at_least(expected_kubectl_version):
78 | raise EdgeException(
79 |                         f"We found kubectl version {str(kubectl_version)}, "
80 | f"but we require at least {str(expected_kubectl_version)}. "
81 | "Please visit https://kubernetes.io/docs/tasks/tools/ for installation instructions."
82 | )
83 |
84 | with SubStepTUI("Checking helm version") as sub_step:
85 | helm_version = get_helm_version()
86 | expected_helm_version_string = "v3.5.2"
87 | expected_helm_version = Version.from_string(expected_helm_version_string)
88 | if not helm_version.is_at_least(expected_helm_version):
89 | raise EdgeException(
90 |                         f"We found helm version {str(helm_version)}, "
91 | f"but we require at least {str(expected_helm_version)}. "
92 | "Please visit https://helm.sh/docs/intro/install/ for installation instructions."
93 | )
94 |
95 | with StepTUI(message="Checking your GCP environment", emoji="☁️") as step:
96 | with SubStepTUI(message="Verifying GCloud configuration") as sub_step:
97 | gcloud_account = get_gcloud_account()
98 | if gcloud_account is None or gcloud_account == "":
99 | raise EdgeException(
100 | "gcloud account is unset. "
101 | "Run `gcloud auth login && gcloud auth application-default login` to authenticate "
102 | "with the correct account"
103 | )
104 |
105 | gcloud_project = get_gcloud_project()
106 | if gcloud_project is None or gcloud_project == "":
107 | raise EdgeException(
108 | "gcloud project id is unset. "
109 | "Run `gcloud config set project $PROJECT_ID` to set the correct project id"
110 | )
111 |
112 | gcloud_region = get_gcloud_region()
113 | if gcloud_region is None or gcloud_region == "":
114 | raise EdgeException(
115 | "gcloud region is unset. "
116 | "Run `gcloud config set compute/region $REGION` to set the correct region"
117 | )
118 |
119 | sub_step.update(status=TUIStatus.NEUTRAL)
120 | sub_step.set_dirty()
121 |
122 | if not questionary.confirm(f"Is this the correct GCloud account: {gcloud_account}", qmark=qmark).ask():
123 | raise EdgeException(
124 | "Run `gcloud auth login && gcloud auth application-default login` to authenticate "
125 | "with the correct account"
126 | )
127 | if not questionary.confirm(f"Is this the correct project id: {gcloud_project}", qmark=qmark).ask():
128 |                     raise EdgeException("Run `gcloud config set project $PROJECT_ID` to set the correct project id")
129 | if not questionary.confirm(f"Is this the correct region: {gcloud_region}", qmark=qmark).ask():
130 |                     raise EdgeException("Run `gcloud config set compute/region $REGION` to set the correct region")
131 |
132 | check_gcloud_authenticated(gcloud_project)
133 |
134 | with SubStepTUI(f"{gcloud_region} is available on Vertex AI") as sub_step:
135 | if gcloud_region not in get_gcp_regions(gcloud_project):
136 | formatted_regions = "\n ".join(get_gcp_regions(gcloud_project))
137 | raise EdgeException(
138 | "Vertex AI only works in certain regions. "
139 |                         "Please choose one of the following by running `gcloud config set compute/region $REGION`:\n"
140 | f" {formatted_regions}"
141 | )
142 |
143 | gcloud_config = GCProjectConfig(
144 | project_id=gcloud_project,
145 | region=gcloud_region,
146 | )
147 |
148 | check_project_exists(gcloud_project)
149 | check_billing_enabled(gcloud_project)
150 |
151 | with StepTUI(message="Initialising Google Storage and vertex:edge state file", emoji="💾") as step:
152 | with SubStepTUI("Enabling Storage API") as sub_step:
153 | enable_service_api("storage-component.googleapis.com", gcloud_project)
154 |
155 | with SubStepTUI("Configuring Google Storage bucket", status=TUIStatus.NEUTRAL) as sub_step:
156 | sub_step.set_dirty()
157 | storage_bucket_name = questionary.text(
158 | "Now you need to choose a name for a storage bucket that will be used for data version control, "
159 | "model assets and keeping track of the vertex:edge state\n "
160 | "NOTE: Storage bucket names must be unique and follow certain conventions. "
161 | "Please see the following guidelines for more information "
162 | "https://cloud.google.com/storage/docs/naming-buckets."
163 | "\n Enter Storage bucket name to use: ",
164 | qmark=qmark
165 |             ).ask()
166 |             if storage_bucket_name is None or storage_bucket_name.strip() == "":
167 | raise EdgeException("Storage bucket name is required")
168 |
169 | storage_config = StorageBucketConfig(
170 | bucket_name=storage_bucket_name,
171 | dvc_store_directory="dvcstore",
172 | vertex_jobs_directory="vertex",
173 | )
174 | storage_state = setup_storage(gcloud_project, gcloud_region, storage_bucket_name)
175 |
176 | _state = EdgeState(
177 | storage=storage_state
178 | )
179 |
180 | _config = EdgeConfig(
181 | google_cloud_project=gcloud_config,
182 | storage_bucket=storage_config,
183 | )
184 |
185 | skip_saving_state = False
186 | with SubStepTUI("Checking if vertex:edge state file exists") as sub_step:
187 | if EdgeState.exists(_config):
188 | sub_step.update(
189 | "The state file already exists. "
190 | "This means that vertex:edge has already been initialised using this storage bucket.",
191 | status=TUIStatus.WARNING
192 | )
193 | sub_step.set_dirty()
194 | if not questionary.confirm(
195 |                     "Do you want to delete the state and start over (this action is destructive!)",
196 | qmark=qmark,
197 | default=False,
198 | ).ask():
199 | skip_saving_state = True
200 |
201 | if skip_saving_state:
202 | with SubStepTUI("Saving state file skipped", status=TUIStatus.WARNING) as sub_step:
203 | pass
204 | else:
205 | with SubStepTUI("Saving state file") as sub_step:
206 | _state.save(_config)
207 |
208 | with StepTUI(message="Saving configuration", emoji="⚙️") as step:
209 | with SubStepTUI("Saving configuration to edge.yaml") as sub_step:
210 | _config.save("./edge.yaml")
211 |
--------------------------------------------------------------------------------
/src/edge/command/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/command/model/__init__.py
--------------------------------------------------------------------------------
/src/edge/command/model/deploy.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from serde.json import from_json
4 | from edge.command.common.precommand_check import precommand_checks
5 | from edge.config import EdgeConfig
6 | from edge.exception import EdgeException
7 | from edge.state import EdgeState
8 | from edge.train import TrainedModel
9 | from edge.tui import TUI, StepTUI, SubStepTUI
10 | from edge.vertex_deploy import vertex_deploy
11 | from edge.path import get_model_dvc_pipeline, get_vertex_model_json
12 |
13 |
14 | def model_deploy(model_name: str):
15 | intro = f"Deploying model '{model_name}' on Vertex AI"
16 | success_title = "Model deployed successfully"
17 | success_message = "Success"
18 | failure_title = "Model deployment failed"
19 | failure_message = "See the errors above. See README for more details."
20 | with EdgeConfig.context() as config:
21 | with TUI(
22 | intro,
23 | success_title,
24 | success_message,
25 | failure_title,
26 | failure_message
27 | ) as tui:
28 | precommand_checks(config)
29 | with EdgeState.context(config, to_lock=True, to_save=True) as state:
30 | with StepTUI("Checking model configuration", emoji="🐏"):
31 | with SubStepTUI("Checking that the model is initialised"):
32 | if model_name not in config.models:
33 | raise EdgeException("Model has not been initialised. "
34 | f"Run `./edge.sh model init {model_name}` to initialise.")
35 |                         if state.models is None or state.models.get(model_name) is None:
36 | raise EdgeException("Model is missing from vertex:edge state. "
37 | "This might mean that the model has not been initialised. "
38 | f"Run `./edge.sh model init {model_name}` to initialise.")
39 | endpoint_resource_name = state.models[model_name].endpoint_resource_name
40 | with SubStepTUI("Checking that the model has been trained"):
41 | if not os.path.exists(get_vertex_model_json(model_name)):
42 | raise EdgeException(f"{get_vertex_model_json(model_name)} does not exist. "
43 | "This means that the model has not been trained")
44 | with open(get_vertex_model_json(model_name)) as file:
45 | model = from_json(TrainedModel, file.read())
46 | if model.is_local:
47 | raise EdgeException("This model was trained locally, and hence cannot be deployed "
48 | "on Vertex AI")
49 | model_resource_name = model.model_name
50 |
51 | vertex_deploy(endpoint_resource_name, model_resource_name, model_name)
52 |
53 | state.models[model_name].deployed_model_resource_name = model_resource_name
54 |
55 | short_endpoint_resource_name = "/".join(endpoint_resource_name.split("/")[2:])
56 | tui.success_message = (
57 | "You can see the deployed model at "
58 | f"https://console.cloud.google.com/vertex-ai/"
59 | f"{short_endpoint_resource_name}?project={config.google_cloud_project.project_id}\n\n"
60 | "Happy herding! 🐏"
61 | )
62 |
63 |
--------------------------------------------------------------------------------
/src/edge/command/model/describe.py:
--------------------------------------------------------------------------------
1 | import sys
2 | from dataclasses import dataclass
3 | from serde import serialize
4 | from serde.yaml import to_yaml
5 | from edge.config import EdgeConfig, ModelConfig
6 | from edge.state import ModelState, EdgeState
7 |
8 |
9 | @serialize
10 | @dataclass
11 | class Description:
12 | config: ModelConfig
13 | state: ModelState
14 |
15 |
16 | def describe_model(model_name):
17 | with EdgeConfig.context(silent=True) as config:
18 | if model_name not in config.models:
19 | print(f"'{model_name}' model is not initialised. "
20 | f"Initialise it by running `./edge.sh model init {model_name}`")
21 | sys.exit(1)
22 | else:
23 | with EdgeState.context(config, silent=True) as state:
24 | description = Description(
25 | config.models[model_name],
26 | state.models[model_name]
27 | )
28 | print(to_yaml(description))
29 | sys.exit(0)
30 |
--------------------------------------------------------------------------------
/src/edge/command/model/get_endpoint.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | from edge.config import EdgeConfig
4 | from edge.state import EdgeState
5 | import questionary
6 |
7 |
8 | def get_model_endpoint(model_name: str):
9 | with EdgeConfig.context(silent=True) as config:
10 | if config.models is None or model_name not in config.models:
11 | questionary.print("Model is not initialised. Initialise it by running `./edge.sh model init`.",
12 | style="fg:ansired")
13 | sys.exit(1)
14 | with EdgeState.context(config, silent=True) as state:
15 | print(state.models[model_name].endpoint_resource_name)
16 | sys.exit(0)
17 |
--------------------------------------------------------------------------------
/src/edge/command/model/init.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | from edge.command.common.precommand_check import precommand_checks
4 | from edge.config import EdgeConfig, ModelConfig
5 | from edge.enable_api import enable_service_api
6 | from edge.endpoint import setup_endpoint
7 | from edge.exception import EdgeException
8 | from edge.state import EdgeState
9 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark
10 | from edge.path import get_model_dvc_pipeline
11 | import questionary
12 |
13 |
14 | def model_init(model_name: str):
15 | intro = f"Initialising model '{model_name}' on Vertex AI"
16 | success_title = "Model initialised successfully"
17 | success_message = f"""
18 | What's next? We suggest you proceed with:
19 |
20 | Train and deploy a model (see 'Training a model' section of the README for more details):
21 | dvc repro {get_model_dvc_pipeline(model_name)}
22 |     ./edge.sh model deploy {model_name}
23 |
24 | Happy herding! 🐏
25 | """.strip()
26 | failure_title = "Model initialisation failed"
27 | failure_message = "See the errors above. See README for more details."
28 | with TUI(
29 | intro,
30 | success_title,
31 | success_message,
32 | failure_title,
33 | failure_message
34 | ) as tui:
35 | with EdgeConfig.context(to_save=True) as config:
36 | precommand_checks(config)
37 | with EdgeState.context(config, to_lock=True, to_save=True) as state:
38 | with StepTUI("Enabling required Google Cloud APIs", emoji="☁️"):
39 | with SubStepTUI("Enabling Vertex AI API for model training and deployment"):
40 | enable_service_api("aiplatform.googleapis.com", config.google_cloud_project.project_id)
41 |
42 | with StepTUI(f"Configuring model '{model_name}'", emoji="⚙️"):
43 | with SubStepTUI(f"Checking if model '{model_name}' is configured") as sub_step:
44 | if model_name in config.models:
45 | sub_step.update(f"Model '{model_name}' is already configured", status=TUIStatus.WARNING)
46 | sub_step.set_dirty()
47 | if not questionary.confirm(
48 | f"Do you want to configure model '{model_name}' again?",
49 | qmark=qmark
50 | ).ask():
51 | raise EdgeException(f"Configuration for model '{model_name}' already exists")
52 | else:
53 | sub_step.update(
54 | message=f"Model '{model_name}' is not configured",
55 | status=TUIStatus.NEUTRAL
56 | )
57 | with SubStepTUI(f"Creating model '{model_name}' configuration"):
58 | model_config = ModelConfig(
59 | name=model_name,
60 | endpoint_name=f"{model_name}-endpoint"
61 | )
62 | config.models[model_name] = model_config
63 |
64 | endpoint_name = config.models[model_name].endpoint_name
65 |
66 | model_state = setup_endpoint(
67 | config.google_cloud_project.project_id,
68 | config.google_cloud_project.region,
69 | endpoint_name
70 | )
71 |
72 | directory_exists = False
73 | pipeline_exists = False
74 | with StepTUI("Checking project directory structure", emoji="📁"):
75 | with SubStepTUI(f"Checking that 'models/{model_name}' directory exists") as sub_step:
76 | if not (os.path.exists(f"models/{model_name}") and os.path.isdir(f"models/{model_name}")):
77 | sub_step.update(
78 |                             message=f"'models/{model_name}' directory does not exist",
79 | status=TUIStatus.NEUTRAL
80 | )
81 | else:
82 | directory_exists = True
83 | if directory_exists:
84 | with SubStepTUI(f"Checking that 'models/{model_name}/dvc.yaml' pipeline exists") as sub_step:
85 | if not os.path.exists(f"models/{model_name}/dvc.yaml"):
86 | sub_step.update(
87 | message=f"'models/{model_name}/dvc.yaml' pipeline does not exist",
88 | status=TUIStatus.NEUTRAL
89 | )
90 | else:
91 | pipeline_exists = True
92 |
93 |     if state.models is None:
94 |         state.models = {}
95 |     state.models[model_name] = model_state
96 |
97 | if not directory_exists or not pipeline_exists:
98 |         tui.success_message = f"Note that the 'models/{model_name}' directory or its dvc.yaml pipeline does not exist yet. " + tui.success_message
99 |
--------------------------------------------------------------------------------
/src/edge/command/model/list.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | from edge.config import EdgeConfig
4 |
5 |
6 | def list_models():
7 | with EdgeConfig.context(silent=True) as config:
8 | print("Configured models:")
9 | print("\n".join([f" - {x}" for x in config.models.keys()]))
10 | sys.exit(0)
11 |
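As a quick sanity check on the listing format used by `list_models`, the same join can be exercised with a hypothetical stand-in for `config.models`:

```python
# Hypothetical stand-in for config.models; the join mirrors list_models' formatting
models = {"fashion": {}, "hello-world": {}}
listing = "\n".join([f" - {x}" for x in models.keys()])
assert listing == " - fashion\n - hello-world"
```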
--------------------------------------------------------------------------------
/src/edge/command/model/remove.py:
--------------------------------------------------------------------------------
1 | import questionary
2 |
3 | from edge.command.common.precommand_check import precommand_checks
4 | from edge.config import EdgeConfig
5 | from edge.endpoint import tear_down_endpoint
6 | from edge.exception import EdgeException
7 | from edge.state import EdgeState
8 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark
9 |
10 |
11 | def remove_model(model_name: str):
12 | intro = f"Removing model '{model_name}' from vertex:edge"
13 | success_title = "Model removed successfully"
14 | success_message = "Success"
15 | failure_title = "Model removal failed"
16 | failure_message = "See the errors above. See README for more details."
17 | with TUI(
18 | intro,
19 | success_title,
20 | success_message,
21 | failure_title,
22 | failure_message
23 | ) as tui:
24 | with EdgeConfig.context(to_save=True) as config:
25 | precommand_checks(config)
26 | with EdgeState.context(config, to_save=True, to_lock=True) as state:
27 | with StepTUI(f"Checking model '{model_name}' configuration and state", emoji="🐏"):
28 | with SubStepTUI(f"Checking model '{model_name}' configuration"):
29 | if model_name not in config.models:
30 | raise EdgeException(f"'{model_name}' model is not in `edge.yaml` configuration, so it "
31 | f"cannot be removed.")
32 | with SubStepTUI(f"Checking model '{model_name}' state"):
33 | if model_name not in state.models:
34 |                             raise EdgeException(f"'{model_name}' is not in vertex:edge state, which suggests that "
35 |                                             f"it has not been initialised, so it cannot be removed.")
36 | with SubStepTUI("Confirming action", status=TUIStatus.WARNING) as sub_step:
37 | sub_step.add_explanation(f"This action will undeploy '{model_name}' model from Vertex AI, "
38 | f"delete the Vertex AI endpoint associated with '{model_name}' model, "
39 | f"and remove '{model_name}' model from vertex:edge config and "
40 | f"state.")
41 | if not questionary.confirm("Do you want to continue?", qmark=qmark, default=False).ask():
42 | raise EdgeException("Canceled by user")
43 |
44 | with StepTUI(f"Removing '{model_name}' model"):
45 | with SubStepTUI(f"Deleting '{state.models[model_name].endpoint_resource_name}' endpoint"):
46 | tear_down_endpoint(state.models[model_name].endpoint_resource_name)
47 | with SubStepTUI(f"Removing '{model_name}' model from config and state"):
48 | del config.models[model_name]
49 | del state.models[model_name]
50 |
51 |
--------------------------------------------------------------------------------
/src/edge/command/model/subparser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | from edge.command.model.deploy import model_deploy
4 | from edge.command.model.describe import describe_model
5 | from edge.command.model.get_endpoint import get_model_endpoint
6 | from edge.command.model.init import model_init
7 | from edge.command.model.list import list_models
8 | from edge.command.model.remove import remove_model
9 | from edge.command.model.template import create_model_from_template
10 | from edge.exception import EdgeException
11 |
12 |
13 | def add_model_parser(subparsers):
14 | parser = subparsers.add_parser("model", help="Model related actions")
15 | actions = parser.add_subparsers(title="action", dest="action", required=True)
16 |
17 | init_parser = actions.add_parser("init", help="Initialise model on Vertex AI")
18 | init_parser.add_argument("model_name", metavar="model-name", help="Model name")
19 |
20 | deploy_parser = actions.add_parser("deploy", help="Deploy model on Vertex AI")
21 | deploy_parser.add_argument("model_name", metavar="model-name", help="Model name")
22 |
23 | get_endpoint_parser = actions.add_parser("get-endpoint", help="Get Vertex AI endpoint URI")
24 | get_endpoint_parser.add_argument("model_name", metavar="model-name", help="Model name")
25 |
26 | actions.add_parser("list", help="List initialised models")
27 |
28 | describe_parser = actions.add_parser("describe", help="Describe an initialised model")
29 | describe_parser.add_argument("model_name", metavar="model-name", help="Model name")
30 |
31 | remove_parser = actions.add_parser("remove", help="Remove an initialised model from vertex:edge")
32 | remove_parser.add_argument("model_name", metavar="model-name", help="Model name")
33 |
34 | template_parser = actions.add_parser("template", help="Create a model pipeline from a template")
35 | template_parser.add_argument("model_name", metavar="model-name", help="Model name")
36 | template_parser.add_argument("-f", action="store_true",
37 | help="Force override a pipeline directory if already exists")
38 |
39 |
40 | def run_model_actions(args: argparse.Namespace):
41 | if args.action == "init":
42 | model_init(args.model_name)
43 | elif args.action == "deploy":
44 | model_deploy(args.model_name)
45 | elif args.action == "get-endpoint":
46 | get_model_endpoint(args.model_name)
47 | elif args.action == "list":
48 | list_models()
49 | elif args.action == "describe":
50 | describe_model(args.model_name)
51 | elif args.action == "remove":
52 | remove_model(args.model_name)
53 | elif args.action == "template":
54 | create_model_from_template(args.model_name, args.f)
55 | else:
56 | raise EdgeException("Unexpected model command")
57 |
--------------------------------------------------------------------------------
/src/edge/command/model/template.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from edge.command.common.precommand_check import precommand_checks
4 | from edge.config import EdgeConfig
5 | from edge.exception import EdgeException
6 | from edge.state import EdgeState
7 | from edge.tui import TUI, StepTUI, SubStepTUI, TUIStatus, qmark
8 | from cookiecutter.main import cookiecutter
9 | from cookiecutter.exceptions import OutputDirExistsException
10 | import questionary
11 |
12 |
13 | def create_model_from_template(model_name: str, force: bool = False):
14 | intro = f"Creating model pipeline '{model_name}' from a template"
15 | success_title = "Pipeline is created from a template"
16 | success_message = "Success"
17 | failure_title = "Pipeline creation failed"
18 | failure_message = "See the errors above. See README for more details."
19 | with EdgeConfig.context() as config:
20 | with TUI(
21 | intro,
22 | success_title,
23 | success_message,
24 | failure_title,
25 | failure_message
26 | ) as tui:
27 | precommand_checks(config)
28 | with EdgeState.context(config) as state:
29 | with StepTUI("Checking model configuration", emoji="🐏"):
30 | with SubStepTUI("Checking that the model is initialised"):
31 | if model_name not in config.models:
32 | raise EdgeException("Model has not been initialised. "
33 | f"Run `./edge.sh model init {model_name}` to initialise.")
34 |                         if state.models is None or state.models.get(model_name) is None:
35 | raise EdgeException("Model is missing from vertex:edge state. "
36 | "This might mean that the model has not been initialised. "
37 | f"Run `./edge.sh model init {model_name}` to initialise.")
38 | with StepTUI("Creating pipeline from a template", emoji="🐏"):
39 | with SubStepTUI("Choosing model pipeline template", status=TUIStatus.NEUTRAL) as substep:
40 | substep.set_dirty()
41 | templates = {
42 | "tensorflow": "tensorflow_model",
43 | }
44 |                         pipeline_template = questionary.select(
45 |                             "Choose model template",
46 |                             choices=list(templates.keys()),
47 |                             qmark=qmark
48 |                         ).ask()
49 | if pipeline_template is None:
50 | raise EdgeException("Pipeline template must be selected")
51 | pipeline_template = templates[pipeline_template]
52 | with SubStepTUI(f"Applying template '{pipeline_template}'"):
53 | try:
54 | cookiecutter(
55 | os.path.join(
56 | os.path.dirname(os.path.abspath(__file__)),
57 | f"../../templates/{pipeline_template}/"
58 | ),
59 | output_dir="models/",
60 | extra_context={
61 | "model_name": model_name
62 | },
63 | no_input=True,
64 | overwrite_if_exists=force,
65 | )
66 | except OutputDirExistsException as exc:
67 | raise EdgeException(
68 |                                 f"Pipeline directory 'models/{model_name}' already exists, so the template cannot be "
69 |                                 f"applied. If you want to override the existing pipeline, run "
70 |                                 f"`./edge.sh model template {model_name} -f`."
71 | )
72 |
--------------------------------------------------------------------------------
/src/edge/config.py:
--------------------------------------------------------------------------------
1 | import inspect
2 | import sys
3 | from dataclasses import dataclass, field
4 | from typing import TypeVar, Type, Optional, Dict
5 | from serde import serialize, deserialize
6 | from serde.yaml import from_yaml, to_yaml
7 | from contextlib import contextmanager
8 |
9 | from edge.tui import StepTUI, SubStepTUI
10 | from edge.path import get_default_config_path, get_default_config_path_from_model
11 |
12 |
13 | @deserialize
14 | @serialize
15 | @dataclass
16 | class GCProjectConfig:
17 | project_id: str
18 | region: str
19 |
20 |
21 | @deserialize
22 | @serialize
23 | @dataclass
24 | class StorageBucketConfig:
25 | bucket_name: str
26 | dvc_store_directory: str
27 | vertex_jobs_directory: str
28 |
29 |
30 | @deserialize
31 | @serialize
32 | @dataclass
33 | class SacredConfig:
34 | gke_cluster_name: str
35 | mongodb_connection_string_secret: str
36 |
37 |
38 | @deserialize
39 | @serialize
40 | @dataclass
41 | class ModelConfig:
42 | name: str
43 | endpoint_name: str
44 | training_container_image_uri: str = "europe-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6:latest"
45 | serving_container_image_uri: str = "europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest"
46 |
47 |
48 | T = TypeVar("T", bound="EdgeConfig")
49 |
50 |
51 | @deserialize
52 | @serialize
53 | @dataclass
54 | class EdgeConfig:
55 | google_cloud_project: GCProjectConfig
56 | storage_bucket: StorageBucketConfig
57 | experiments: Optional[SacredConfig] = None
58 | models: Dict[str, ModelConfig] = field(default_factory=dict)
59 |
60 | def save(self, path: str):
61 | with open(path, "w") as f:
62 | f.write(to_yaml(self))
63 |
64 | def __str__(self) -> str:
65 | return to_yaml(self)
66 |
67 | @classmethod
68 | def from_string(cls: Type[T], string: str) -> T:
69 | return from_yaml(EdgeConfig, string)
70 |
71 | @classmethod
72 | def load(cls: Type[T], path: str) -> T:
73 | with open(path) as f:
74 |             yaml_str = f.read()
75 |
76 | return from_yaml(EdgeConfig, yaml_str)
77 |
78 | @classmethod
79 | def load_default(cls: Type[T]) -> T:
80 | config_path = get_default_config_path_from_model(inspect.getframeinfo(sys._getframe(1)).filename)
81 | config = EdgeConfig.load(config_path)
82 | return config
83 |
84 | @classmethod
85 | @contextmanager
86 | def context(cls: Type[T], config_path: str = None, to_save: bool = False, silent: bool = False) -> T:
87 | if config_path is None:
88 | config_path = get_default_config_path()
89 | config = EdgeConfig.load(config_path)
90 | try:
91 | yield config
92 | finally:
93 | if to_save:
94 | with StepTUI("Saving vertex:edge configuration", emoji="💾", silent=silent):
95 | with SubStepTUI("Saving vertex:edge configuration", silent=silent):
96 | config.save(config_path)
97 |
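The save-on-exit behaviour of `EdgeConfig.context` can be sketched without pyserde or the TUI; `config_context` below is a hypothetical stand-in that persists a plain dict on exit when `to_save` is set:

```python
from contextlib import contextmanager

saves = []  # stands in for writing edge.yaml to disk

@contextmanager
def config_context(config: dict, to_save: bool = False):
    # Same shape as EdgeConfig.context: yield the config, persist on exit if asked
    try:
        yield config
    finally:
        if to_save:
            saves.append(dict(config))

with config_context({"models": {}}, to_save=True) as cfg:
    cfg["models"]["fashion"] = {"endpoint_name": "fashion-endpoint"}

assert saves == [{"models": {"fashion": {"endpoint_name": "fashion-endpoint"}}}]
```

The `finally` block is what guarantees the configuration is saved even if the body raises, matching how `EdgeConfig.context` writes the file before an `EdgeException` propagates.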
--------------------------------------------------------------------------------
/src/edge/dvc.py:
--------------------------------------------------------------------------------
1 | import glob
2 | import subprocess
3 | from typing import Optional
4 |
5 | import os
6 | import shutil
7 | from edge.exception import EdgeException
8 | from edge.tui import StepTUI, SubStepTUI, TUIStatus, qmark
9 | import questionary
10 |
11 |
12 | def dvc_exists() -> bool:
13 | return os.path.exists(".dvc") and os.path.isdir(".dvc")
14 |
15 |
16 | def dvc_init():
17 | with StepTUI("Initialising DVC", emoji="🔨"):
18 | with SubStepTUI("Initialising DVC"):
19 | try:
20 | subprocess.check_output("dvc init", shell=True, stderr=subprocess.DEVNULL)
21 | except subprocess.CalledProcessError as e:
22 | raise EdgeException(f"Unexpected error occurred while initialising DVC:\n{str(e)}")
23 |
24 |
25 | def dvc_destroy():
26 | with StepTUI("Destroying DVC", emoji="🔥"):
27 | with SubStepTUI("Deleting DVC configuration [.dvc]"):
28 | shutil.rmtree(".dvc")
29 | with SubStepTUI("Deleting DVC data [data/fashion-mnist/*.dvc]"):
30 | for f in glob.glob("data/fashion-mnist/*.dvc"):
31 | os.remove(f)
32 | with SubStepTUI("Deleting pipeline lock file [models/pipelines/fashion/dvc.lock]"):
33 | if os.path.exists("models/pipelines/fashion/dvc.lock"):
34 | os.remove("models/pipelines/fashion/dvc.lock")
35 |
36 |
37 | def dvc_remote_exists(path: str) -> (bool, bool):
38 | try:
39 | remotes_raw = subprocess.check_output("dvc remote list", shell=True, stderr=subprocess.DEVNULL).decode("utf-8")
40 | remotes = [x.split("\t") for x in remotes_raw.strip().split("\n") if len(x.split("\t")) == 2]
41 | for remote in remotes:
42 | if remote[0] == "storage":
43 | if remote[1] == path:
44 | return True, True
45 | else:
46 | return True, False
47 | return False, False
48 | except subprocess.CalledProcessError as e:
49 | raise EdgeException(f"Unexpected error occurred while adding remote storage to DVC:\n{str(e)}")
50 |
51 |
52 | def get_dvc_storage_path() -> Optional[str]:
53 | try:
54 | remotes_raw = subprocess.check_output("dvc remote list", shell=True, stderr=subprocess.DEVNULL).decode("utf-8")
55 | remotes = [x.split("\t") for x in remotes_raw.strip().split("\n") if len(x.split("\t")) == 2]
56 | for remote in remotes:
57 | if remote[0] == "storage":
58 | return remote[1].strip()
59 | return None
60 | except subprocess.CalledProcessError as e:
61 | raise EdgeException(f"Unexpected error occurred while getting DVC remote storage path:\n{str(e)}")
62 |
63 |
64 | def dvc_add_remote(path: str):
65 | with StepTUI("Configuring DVC remote storage", emoji="⚙️"):
66 | with SubStepTUI(f"Adding '{path}' as DVC remote storage URI") as sub_step:
67 | try:
68 | storage_exists, correct_path = dvc_remote_exists(path)
69 | if storage_exists:
70 | if correct_path:
71 | return
72 | else:
73 | sub_step.update(f"Modifying existing storage URI to '{path}'")
74 | subprocess.check_output(
75 | f"dvc remote modify storage url {path} && dvc remote default storage", shell=True,
76 | stderr=subprocess.DEVNULL
77 | )
78 | else:
79 | subprocess.check_output(f"dvc remote add storage {path} && dvc remote default storage", shell=True,
80 | stderr=subprocess.DEVNULL)
81 | except subprocess.CalledProcessError as e:
82 | raise EdgeException(f"Unexpected error occurred while adding remote storage to DVC:\n{str(e)}")
83 |
84 |
85 | def setup_dvc(bucket_path: str, dvc_store_directory: str):
86 | storage_path = os.path.join(bucket_path, dvc_store_directory)
87 | exists = False
88 | is_remote_correct = False
89 | to_destroy = False
90 | with StepTUI("Checking DVC configuration", emoji="🔍"):
91 | with SubStepTUI("Checking if DVC is already initialised") as sub_step:
92 | exists = dvc_exists()
93 | if not exists:
94 | sub_step.update(
95 | message="DVC is not initialised",
96 | status=TUIStatus.NEUTRAL
97 | )
98 | if exists:
99 | with SubStepTUI("Checking if DVC remote storage is configured") as sub_step:
100 | configured_storage_path = get_dvc_storage_path()
101 | is_remote_correct = storage_path == configured_storage_path
102 | if configured_storage_path is None:
103 | sub_step.update(
104 | f"DVC remote storage is not configured",
105 | status=TUIStatus.NEUTRAL
106 | )
107 | elif not is_remote_correct:
108 | sub_step.update(
109 | f"DVC remote storage does not match vertex:edge config",
110 | status=TUIStatus.WARNING
111 | )
112 | sub_step.set_dirty()
113 | sub_step.add_explanation(
114 | f"DVC remote storage is configured to '{configured_storage_path}', "
115 | f"but vertex:edge has been configured to use '{storage_path}'. "
116 | f"This might mean that DVC has been already initialised to work with "
117 | f"a different GCP environment. "
118 |                         f"If this is the case, we recommend reinitialising DVC from scratch. \n\n "
119 |                         f"Alternatively, you may have changed the bucket name on purpose in your GCP environment. "
120 |                         f"In that case, DVC does not need to be reinitialised, and your DVC config will be "
121 |                         f"updated to match your vertex:edge config."
122 | )
123 | to_destroy = questionary.confirm(
124 | "Do you want to destroy DVC and initialise it from scratch? (this action is destructive!)",
125 | default=False,
126 | qmark=qmark
127 | ).ask()
128 | if to_destroy is None:
129 | raise EdgeException("Canceled by user")
130 | if to_destroy:
131 | dvc_destroy()
132 | exists = False
133 | is_remote_correct = False
134 |
135 | # Checking again, DVC might have been destroyed by this point
136 | if not exists:
137 | dvc_init()
138 |
139 | if not is_remote_correct:
140 | dvc_add_remote(storage_path)
141 |
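The `dvc remote list` parsing shared by `dvc_remote_exists` and `get_dvc_storage_path` can be checked in isolation; the sample output below is hypothetical but follows the tab-separated `<name>\t<url>` lines dvc prints:

```python
def parse_dvc_remotes(remotes_raw: str):
    # Same splitting logic as in dvc.py: keep only "<name>\t<url>" pairs
    return [x.split("\t") for x in remotes_raw.strip().split("\n") if len(x.split("\t")) == 2]

sample = "storage\tgs://my-bucket/dvcstore\nbackup\tgs://other-bucket/dvc\n"
remotes = parse_dvc_remotes(sample)
assert remotes == [["storage", "gs://my-bucket/dvcstore"], ["backup", "gs://other-bucket/dvc"]]

# Resolve the configured storage path, as get_dvc_storage_path does
storage = next((url for name, url in remotes if name == "storage"), None)
assert storage == "gs://my-bucket/dvcstore"
```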
--------------------------------------------------------------------------------
/src/edge/enable_api.py:
--------------------------------------------------------------------------------
1 | """
2 | Enabling Google Cloud APIs
3 | """
4 | import json
5 | import os
6 | import subprocess
7 | from .exception import EdgeException
8 | from .config import EdgeConfig
9 |
10 |
11 | def enable_api(_config: EdgeConfig):
12 | """
13 | Enable all necessary APIs (deprecated)
14 |
15 | :param _config:
16 | :return:
17 | """
18 | print("# Enabling necessary Google Cloud APIs")
19 | project_id = _config.google_cloud_project.project_id
20 |
21 | print("## Kubernetes Engine")
22 | print("Required for installing the experiment tracker")
23 | os.system(f"gcloud services enable container.googleapis.com --project {project_id}")
24 |
25 | print("## Storage")
26 | print("Required for DVC remote storage, Vertex AI artifact storage, and Vertex:Edge state")
27 | os.system(f"gcloud services enable storage-component.googleapis.com --project {project_id}")
28 |
29 | print("## Vertex AI")
30 | print("Required for training and deploying on Vertex AI")
31 | os.system(f"gcloud services enable aiplatform.googleapis.com --project {project_id}")
32 |
33 | print("## Secret Manager")
34 | print("Required for secret sharing, including connection strings for the experiment tracker")
35 | os.system(f"gcloud services enable secretmanager.googleapis.com --project {project_id}")
36 |
37 | print("## Cloud Run")
38 | print("Required for deploying the webapp")
39 | os.system(f"gcloud services enable run.googleapis.com --project {project_id}")
40 |
41 |
42 | def is_service_api_enabled(service_name: str, project_id: str) -> bool:
43 | """
44 | Check if a [service_name] API is enabled
45 |
46 | :param service_name:
47 | :param project_id:
48 | :return:
49 | """
50 | try:
51 | enabled_services = json.loads(subprocess.check_output(
52 | f"gcloud services list --enabled --project {project_id} --format json",
53 | shell=True,
54 | stderr=subprocess.STDOUT
55 | ).decode("utf-8"))
56 | for service in enabled_services:
57 | if service_name in service["name"]:
58 | return True
59 | return False
60 | except subprocess.CalledProcessError as error:
61 | parse_enable_service_api_error(service_name, error)
62 | return False
63 |
64 |
65 | def enable_service_api(service: str, project_id: str):
66 | """
67 | Enable [service] API
68 |
69 | :param service:
70 | :param project_id:
71 | :return:
72 | """
73 | if not is_service_api_enabled(service, project_id):
74 | try:
75 | subprocess.check_output(
76 | f"gcloud services enable {service} --project {project_id}",
77 | shell=True,
78 | stderr=subprocess.STDOUT
79 | )
80 | except subprocess.CalledProcessError as error:
81 | parse_enable_service_api_error(service, error)
82 |
83 |
84 | def parse_enable_service_api_error(service: str, error: subprocess.CalledProcessError):
85 | """
86 | Parse errors coming from `gcloud services` commands
87 |
88 | :param service:
89 | :param error:
90 | :return:
91 | """
92 | output = error.output.decode("utf-8")
93 | if output.startswith("ERROR: (gcloud.services.enable) PERMISSION_DENIED"):
94 | raise EdgeException(f"Service '{service}' cannot be enabled because you have insufficient permissions "
95 | f"on Google Cloud")
96 |
97 | raise error
98 |
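The membership check in `is_service_api_enabled` reduces to a substring match over the JSON that `gcloud services list --enabled --format json` returns; the payload below is a hypothetical example of that shape:

```python
import json

# Hypothetical `gcloud services list --enabled --format json` payload
payload = json.dumps([
    {"name": "projects/123/services/aiplatform.googleapis.com"},
    {"name": "projects/123/services/storage-component.googleapis.com"},
])

def service_enabled(service_name: str, enabled_services: list) -> bool:
    # Same substring membership check as is_service_api_enabled
    return any(service_name in service["name"] for service in enabled_services)

enabled = json.loads(payload)
assert service_enabled("aiplatform.googleapis.com", enabled)
assert not service_enabled("run.googleapis.com", enabled)
```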
--------------------------------------------------------------------------------
/src/edge/endpoint.py:
--------------------------------------------------------------------------------
1 | """
2 | Performing operations on Vertex AI endpoints
3 | """
4 | import re
5 | from typing import Optional
6 | from google.cloud import aiplatform
7 | from google.api_core.exceptions import PermissionDenied
8 | from .config import EdgeConfig
9 | from .exception import EdgeException
10 | from .state import ModelState, EdgeState
11 | from .tui import StepTUI, SubStepTUI, TUIStatus
12 |
13 |
14 | def get_endpoint(sub_step: SubStepTUI, project_id: str, region: str, endpoint_name: str) -> Optional[str]:
15 | """
16 | Get Vertex AI endpoint resource name
17 |
18 | :param sub_step:
19 | :param project_id:
20 | :param region:
21 | :param endpoint_name:
22 | :return:
23 | """
24 | endpoints = aiplatform.Endpoint.list(
25 | filter=f'display_name="{endpoint_name}"',
26 | project=project_id,
27 | location=region,
28 | )
29 | if len(endpoints) > 1:
30 | sub_step.update(status=TUIStatus.WARNING)
31 | sub_step.add_explanation(
32 |             f"Multiple endpoints named '{endpoint_name}' were found; vertex:edge will use the first one"
33 | )
34 | elif len(endpoints) == 0:
35 | return None
36 | return endpoints[0].resource_name
37 |
38 |
39 | def create_endpoint(project_id: str, region: str, endpoint_name: str) -> str:
40 | """
41 | Create an endpoint on Vertex AI
42 |
43 | :param project_id:
44 | :param region:
45 | :param endpoint_name:
46 | :return:
47 | """
48 | try:
49 | endpoint = aiplatform.Endpoint.create(display_name=endpoint_name, project=project_id, location=region)
50 |
51 | return endpoint.resource_name
52 | except PermissionDenied as error:
53 | try:
54 | permission = re.search("Permission '(.*)' denied", error.args[0]).group(1)
55 | raise EdgeException(
56 | f"Endpoint '{endpoint_name}' could not be created in project '{project_id}' "
57 | f"because you have insufficient permission. Make sure you have '{permission}' permission."
58 | ) from error
59 | except AttributeError as attribute_error:
60 | raise error from attribute_error
61 |
62 |
63 | def setup_endpoint(project_id: str, region: str, endpoint_name: str) -> ModelState:
64 | """
65 | Setup procedure for Vertex AI endpoint
66 |
67 | :param project_id:
68 | :param region:
69 | :param endpoint_name:
70 | :return:
71 | """
72 | with StepTUI("Configuring Vertex AI endpoint", emoji="☁️"):
73 | with SubStepTUI(f"Checking if Vertex AI endpoint '{endpoint_name}' exists") as sub_step:
74 | endpoint_resource_name = get_endpoint(
75 | sub_step, project_id, region, endpoint_name
76 | )
77 | if endpoint_resource_name is None:
78 | sub_step.update(message=f"'{endpoint_name}' endpoint does not exist, creating...")
79 | endpoint_resource_name = create_endpoint(
80 | project_id, region, endpoint_name
81 | )
82 |                 sub_step.update(message=f"Created '{endpoint_name}' endpoint")
83 | return ModelState(endpoint_resource_name=endpoint_resource_name)
84 |
85 |
86 | def tear_down_endpoint(endpoint_resource_name: str):
87 | endpoint = aiplatform.Endpoint(endpoint_resource_name)
88 | endpoint.undeploy_all()
89 | endpoint.delete()
90 |
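The permission extraction in `create_endpoint` depends on the wording of the `PermissionDenied` message; a minimal check with a hypothetical error string shows what the regex captures:

```python
import re

# Hypothetical PermissionDenied message text from the Vertex AI client
message = "Permission 'aiplatform.endpoints.create' denied on resource '//aiplatform.googleapis.com/projects/...'."
match = re.search("Permission '(.*)' denied", message)
assert match is not None
assert match.group(1) == "aiplatform.endpoints.create"
```

If the client library ever changes this wording, `re.search` returns `None` and the `AttributeError` fallback in `create_endpoint` re-raises the original error.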
--------------------------------------------------------------------------------
/src/edge/exception.py:
--------------------------------------------------------------------------------
1 | class EdgeException(Exception):
2 | def __init__(self, mesg: str, fatal: bool = True):
3 |         self.fatal = fatal
4 | super(EdgeException, self).__init__(mesg)
5 |
--------------------------------------------------------------------------------
/src/edge/gcloud.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | import subprocess
4 | from typing import List
5 | from .exception import EdgeException
6 |
7 | # Regions that are supported for Vertex AI training and deployment
8 | regions = [
9 | "us-central1",
10 | "europe-west4",
11 | "asia-east1",
12 | "asia-northeast1",
13 | "asia-northeast3",
14 | "asia-southeast1",
15 | "australia-southeast1",
16 | "europe-west1",
17 | "europe-west2",
18 | "northamerica-northeast1",
19 | "us-west1",
20 | "us-east1",
21 | "us-east4",
22 | ]
23 |
24 |
25 | def get_gcp_regions(project: str) -> List[str]:
26 | return regions
27 |
28 |
29 | def get_gcloud_account() -> str:
30 | return (
31 | subprocess.check_output("gcloud config get-value account", shell=True, stderr=subprocess.DEVNULL)
32 | .decode("utf-8")
33 | .strip()
34 | )
35 |
36 |
37 | def get_gcloud_project() -> str:
38 | return (
39 | subprocess.check_output("gcloud config get-value project", shell=True, stderr=subprocess.DEVNULL)
40 | .decode("utf-8")
41 | .strip()
42 | )
43 |
44 |
45 | def get_gcloud_region() -> str:
46 | return (
47 | subprocess.check_output("gcloud config get-value compute/region", shell=True, stderr=subprocess.DEVNULL)
48 | .decode("utf-8")
49 | .strip()
50 | )
51 |
52 |
53 | def is_billing_enabled(project: str) -> bool:
54 | try:
55 | response = json.loads(
56 | subprocess.check_output(
57 | f"gcloud alpha billing projects describe {project} --format json", shell=True, stderr=subprocess.DEVNULL
58 | )
59 | )
60 | return response["billingEnabled"]
61 | except subprocess.CalledProcessError:
62 | raise EdgeException(
63 | f"Unable to access billing information for project '{project}'. "
64 | f"Please verify that the project ID is valid and your user has permissions "
65 | f"to access the billing information for this project.",
66 | fatal=False
67 | )
68 |
69 |
70 | def is_authenticated(project_id: str) -> (bool, str):
71 | """
72 | Check if gcloud is authenticated
73 | :return: is authenticated, and the reason if not
74 | """
75 | try:
76 | subprocess.check_output(f"gcloud auth print-access-token", shell=True, stderr=subprocess.DEVNULL)
77 | except subprocess.CalledProcessError:
78 | return False, "gcloud is not authenticated. Run `gcloud auth login`."
79 |
80 | try:
81 | credentials = json.loads(subprocess.check_output(
82 | f"gcloud auth application-default print-access-token --format json", shell=True, stderr=subprocess.DEVNULL
83 | ).decode("utf-8"))
84 | if credentials["quota_project_id"] != project_id:
85 | return False, f"Quota project id does not match '{project_id}'. Please generate new application default" \
86 | f" credentials by running `gcloud auth application-default login`"
87 | return True, ""
88 | except subprocess.CalledProcessError:
89 | return (
90 | False,
91 | "gcloud does not have application default credentials configured. "
92 | "Run `gcloud auth application-default login`.",
93 | )
94 |
95 |
96 | def project_exists(project_id: str) -> bool:
97 | try:
98 | subprocess.check_output(f"gcloud projects describe {project_id}", shell=True, stderr=subprocess.DEVNULL)
99 | return True
100 | except subprocess.CalledProcessError:
101 | raise EdgeException(
102 | f"Unable to find project {project_id}. "
103 | "This means it does not exist or you do not have permissions to access it. "
104 | "Please verify that the project ID is valid in Google Cloud Console."
105 | )
106 |
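The billing check in `is_billing_enabled` parses the JSON from `gcloud alpha billing projects describe`; with a hypothetical payload the lookup reduces to one key access:

```python
import json

# Hypothetical `gcloud alpha billing projects describe --format json` payload
payload = '{"billingAccountName": "billingAccounts/000000-AAAAAA-BBBBBB", "billingEnabled": true}'
response = json.loads(payload)
assert response["billingEnabled"] is True
```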
--------------------------------------------------------------------------------
/src/edge/k8s/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/k8s/__init__.py
--------------------------------------------------------------------------------
/src/edge/k8s/omniboard.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: Deployment
3 | metadata:
4 | name: omniboard-deployment
5 | labels:
6 | app: omniboard
7 | spec:
8 | replicas: 1
9 | selector:
10 | matchLabels:
11 | app: omniboard
12 | template:
13 | metadata:
14 | labels:
15 | app: omniboard
16 | spec:
17 | containers:
18 | - name: nginx
19 | image: vivekratnavel/omniboard
20 | env:
21 | - name: MONGO_URI
22 | valueFrom:
23 | secretKeyRef:
24 | name: mongodb-connection
25 | key: internal
26 | ports:
27 | - containerPort: 9000
28 | ---
29 | apiVersion: v1
30 | kind: Service
31 | metadata:
32 | name: omniboard-lb
33 | spec:
34 | type: LoadBalancer
35 | selector:
36 | app: omniboard
37 | ports:
38 | - protocol: TCP
39 | port: 9000
40 | targetPort: 9000
41 |
--------------------------------------------------------------------------------
/src/edge/path.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | # TODO: Document all
4 |
5 | def get_default_config_path():
6 | # TODO: Document env var
7 | path = os.environ.get("EDGE_CONFIG_PATH")
8 | if path is None:
9 | path = os.path.join(os.getcwd(), "edge.yaml")
10 | return path
11 |
12 |
13 | def get_default_config_path_from_model(caller: str):
14 | path = os.environ.get("EDGE_CONFIG_PATH")
15 | if path is None:
16 | path = os.path.join(os.path.dirname(caller), "../../", "edge.yaml")
17 | return path
18 |
19 |
20 | def get_model_path(model_name: str):
21 | return f"models/{model_name}"
22 |
23 |
24 | def get_model_dvc_pipeline(model_name: str):
25 | return os.path.join(get_model_path(model_name), "dvc.yaml")
26 |
27 |
28 | def get_vertex_model_json(model_name: str):
29 | return os.path.join(get_model_path(model_name), "trained_model.json")
30 |
31 |
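`get_default_config_path` falls back to `edge.yaml` in the working directory when `EDGE_CONFIG_PATH` is unset. The sketch below mirrors that logic with a hypothetical helper that takes the environment and working directory as parameters so the fallback can be tested without mutating `os.environ`:

```python
import os

def default_config_path(env: dict, cwd: str) -> str:
    # Same fallback as get_default_config_path, with env and cwd injectable
    path = env.get("EDGE_CONFIG_PATH")
    if path is None:
        path = os.path.join(cwd, "edge.yaml")
    return path

assert default_config_path({}, "/work") == "/work/edge.yaml"
assert default_config_path({"EDGE_CONFIG_PATH": "/etc/edge.yaml"}, "/work") == "/etc/edge.yaml"
```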
--------------------------------------------------------------------------------
/src/edge/sacred.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | import subprocess
4 | import time
5 | from edge.config import EdgeConfig
6 | from google.cloud import container_v1
7 | from google.cloud.container_v1 import Cluster
8 | from google.api_core.exceptions import NotFound, PermissionDenied
9 | from google.cloud import secretmanager_v1
10 |
11 | from edge.exception import EdgeException
12 | from edge.state import SacredState, EdgeState
13 | from sacred.observers import MongoObserver
14 | from sacred.experiment import Experiment
15 |
16 | from edge.tui import StepTUI, SubStepTUI, TUIStatus
17 |
18 |
19 | def create_cluster(project_id: str, region: str, cluster_name: str) -> Cluster:
20 | with SubStepTUI(f"Checking if '{cluster_name}' cluster exists") as sub_step:
21 | client = container_v1.ClusterManagerClient()
22 | try:
23 | cluster = client.get_cluster(
24 | project_id=project_id, name=f"projects/{project_id}/locations/{region}/clusters/{cluster_name}"
25 | )
26 | except NotFound:
27 | sub_step.update(message=f"Cluster '{cluster_name}' does not exist, creating... (may take a few minutes)")
28 | try:
29 | subprocess.check_output(
30 | f"gcloud container clusters create-auto {cluster_name} --project {project_id} --region {region}",
31 | shell=True, stderr=subprocess.STDOUT
32 | )
33 | except subprocess.CalledProcessError as exc:
34 | raise EdgeException(f"Error occurred while creating cluster '{cluster_name}'\n{exc.output}")
35 |
36 | cluster = client.get_cluster(
37 | project_id=project_id, name=f"projects/{project_id}/locations/{region}/clusters/{cluster_name}"
38 | )
39 | sub_step.update(message=f"Cluster '{cluster_name}' created", status=TUIStatus.SUCCESSFUL)
40 | return cluster
41 | return cluster
42 |
43 |
44 | def get_credentials(project_id: str, region: str, cluster_name: str):
45 | with SubStepTUI("Getting cluster credentials"):
46 | try:
47 | subprocess.check_output(
48 | f"gcloud container clusters get-credentials {cluster_name} --project {project_id} --region {region}",
49 | shell=True,
50 | stderr=subprocess.STDOUT,
51 | )
52 | except subprocess.CalledProcessError as e:
53 |             raise EdgeException(f"Error occurred while getting Kubernetes cluster credentials\n{e.output.decode('utf-8')}")
54 |
55 |
56 | def get_mongodb_password():
57 | with SubStepTUI("Getting MongoDB password"):
58 | try:
59 | return subprocess.check_output(
60 | 'kubectl get secret --namespace default mongodb -o jsonpath="{.data.mongodb-passwords}" | '
61 | 'base64 --decode',
62 | shell=True,
63 | ).decode("utf-8")
64 | except subprocess.CalledProcessError as e:
65 |             raise EdgeException(f"Error occurred while getting MongoDB password\n{e.output.decode('utf-8')}")
66 |
67 |
68 | def get_lb_ip(name) -> str:
69 | try:
70 | return subprocess.check_output(
71 | f'kubectl get service --namespace default {name} -o jsonpath="{{.status.loadBalancer.ingress[0].ip}}"',
72 | shell=True,
73 | ).decode("utf-8")
74 | except subprocess.CalledProcessError as e:
75 |         raise EdgeException(f"Error occurred while getting IP for {name}\n{e.output.decode('utf-8')}")
76 |
77 |
78 | def check_mongodb_installed() -> bool:
79 | helm_charts = json.loads(subprocess.check_output("helm list -o json", shell=True).decode("utf-8"))
80 | for chart in helm_charts:
81 | if chart["name"] == "mongodb":
82 | return True
83 | return False
84 |
85 |
86 | def check_mongodb_lb_installed() -> bool:
87 | try:
88 | subprocess.check_output("kubectl get service mongodb-lb -o json", stderr=subprocess.STDOUT, shell=True)
89 | except subprocess.CalledProcessError as e:
90 | if e.output.decode("utf-8") == 'Error from server (NotFound): services "mongodb-lb" not found\n':
91 | return False
92 | else:
93 | raise e
94 | return True
95 |
96 |
97 | def install_mongodb() -> (str, str):
98 | with SubStepTUI("Checking if MongoDB is installed on the cluster") as sub_step:
99 | try:
100 | if not check_mongodb_installed():
101 | sub_step.update("Installing MongoDB on the cluster")
102 | subprocess.check_output(
103 | """
104 | helm repo add bitnami https://charts.bitnami.com/bitnami
105 | helm upgrade -i --wait mongodb bitnami/mongodb --version 10.29.1 --set auth.username=sacred,auth.database=sacred
106 | """,
107 | shell=True,
108 | )
109 | sub_step.update("MongoDB is installed on the cluster", status=TUIStatus.SUCCESSFUL)
110 | except subprocess.CalledProcessError as e:
111 |             raise EdgeException(f"Error occurred while installing MongoDB with helm chart\n{e.output.decode('utf-8')}")
112 |
113 | with SubStepTUI("Making MongoDB externally available"):
114 | try:
115 | if not check_mongodb_lb_installed():
116 | subprocess.check_output(
117 | "kubectl expose deployment mongodb --name mongodb-lb --type LoadBalancer --port 60000 "
118 | "--target-port 27017",
119 | shell=True,
120 | )
121 | except subprocess.CalledProcessError as e:
122 |             raise EdgeException(f"Error occurred while exposing MongoDB\n{e.output.decode('utf-8')}")
123 |
124 | password = get_mongodb_password()
125 |
126 | with SubStepTUI("Getting MongoDB IP address (may take a few minutes)"):
127 | external_ip = get_lb_ip("mongodb-lb")
128 | while external_ip == "":
129 | time.sleep(5)
130 | external_ip = get_lb_ip("mongodb-lb")
131 |
132 | internal_connection_string = f"mongodb://sacred:{password}@mongodb/sacred"
133 | external_connection_string = f"mongodb://sacred:{password}@{external_ip}:60000/sacred"
134 |
135 | with SubStepTUI("Saving MongoDB credentials into kubernetes secrets") as sub_step:
136 | try:
137 | subprocess.check_output(
138 | "kubectl delete secret mongodb-connection", shell=True, stderr=subprocess.STDOUT
139 | )
140 | except subprocess.CalledProcessError as exc:
141 | if "NotFound" in exc.output.decode("utf-8"):
142 | pass # error expected if the secret was not previously created
143 | else:
144 | raise EdgeException(
145 | f"Error while trying to delete mongodb-connection secret\n{exc.output.decode('utf-8')}"
146 | )
147 | try:
148 | subprocess.check_output(
149 | f"kubectl create secret generic mongodb-connection "
150 | f"--from-literal=internal={internal_connection_string}",
151 | shell=True,
152 | )
153 | except subprocess.CalledProcessError as exc:
154 | raise EdgeException(
155 | f"Error while trying to create mongodb-connection secret\n{exc.output.decode('utf-8')}"
156 | )
157 |
158 | sub_step.update(status=TUIStatus.SUCCESSFUL)
159 |         sub_step.add_explanation("Internal connection string: mongodb://sacred:*****@mongodb/sacred")
160 |         sub_step.add_explanation(f"External connection string: mongodb://sacred:*****@{external_ip}:60000/sacred")
161 |         sub_step.add_explanation("You can get full connection strings by running `./edge.sh experiments get-mongodb`")
162 |
163 | return internal_connection_string, external_connection_string
164 |
165 |
166 | def install_omniboard() -> str:
167 | with SubStepTUI("Installing experiment tracker dashboard (Omniboard)"):
168 | try:
169 | subprocess.check_output("kubectl apply -f /omniboard.yaml", stderr=subprocess.STDOUT, shell=True)
170 | except subprocess.CalledProcessError as e:
171 |             raise EdgeException(f"Error occurred while applying Omniboard's configuration\n{e.output.decode('utf-8')}")
172 |
173 | with SubStepTUI("Getting Omniboard IP address (may take a few minutes)") as sub_step:
174 | external_ip = get_lb_ip("omniboard-lb")
175 | while external_ip == "":
176 | time.sleep(5)
177 | external_ip = get_lb_ip("omniboard-lb")
178 |
179 | sub_step.update(status=TUIStatus.SUCCESSFUL)
180 | sub_step.add_explanation(
181 | f"Omniboard is installed and available at http://{external_ip}:9000",
182 | )
183 | return f"http://{external_ip}:9000"
184 |
185 |
186 | def save_mongo_to_secretmanager(project_id: str, secret_id: str, connection_string: str):
187 | with SubStepTUI("Saving MongoDB credentials to Google Cloud Secret Manager") as sub_step:
188 | try:
189 | client = secretmanager_v1.SecretManagerServiceClient()
190 | try:
191 | client.access_secret_version(name=f"projects/{project_id}/secrets/{secret_id}/versions/latest")
192 | except NotFound:
193 | client.create_secret(
194 | request={
195 | "parent": f"projects/{project_id}",
196 | "secret_id": secret_id,
197 | "secret": {"replication": {"automatic": {}}},
198 | }
199 | )
200 |
201 | client.add_secret_version(
202 | request={
203 | "parent": f"projects/{project_id}/secrets/{secret_id}",
204 | "payload": {"data": connection_string.encode()},
205 | }
206 | )
207 | except PermissionDenied as exc:
208 | sub_step.update(status=TUIStatus.FAILED)
209 | sub_step.add_explanation(exc.message)
210 |
211 |
212 | def delete_mongo_to_secretmanager(_config: EdgeConfig):
213 | project_id = _config.google_cloud_project.project_id
214 | secret_id = _config.experiments.mongodb_connection_string_secret
215 | print("## Removing MongoDB connection string from Google Cloud Secret Manager")
216 | client = secretmanager_v1.SecretManagerServiceClient()
217 |
218 | try:
219 | client.access_secret_version(name=f"projects/{project_id}/secrets/{secret_id}/versions/latest")
220 | client.delete_secret(name=f"projects/{project_id}/secrets/{secret_id}")
221 | except NotFound:
222 | print("Secret does not exist")
223 | return
224 |
225 |
226 | def delete_cluster(_config: EdgeConfig):
227 | project_id = _config.google_cloud_project.project_id
228 | region = _config.google_cloud_project.region
229 | cluster_name = _config.experiments.gke_cluster_name
230 | print(f"## Deleting cluster '{cluster_name}'")
231 | client = container_v1.ClusterManagerClient()
232 | try:
233 | client.get_cluster(
234 | project_id=project_id, name=f"projects/{project_id}/locations/{region}/clusters/{cluster_name}"
235 | )
236 | os.system(f"gcloud container clusters delete {cluster_name} --project {project_id} --region {region}")
237 | except NotFound:
238 | print("Cluster does not exist")
239 |
240 |
241 | def setup_sacred(project_id: str, region: str, gke_cluster_name: str, secret_id: str) -> SacredState:
242 | with StepTUI("Installing experiment tracker", emoji="📔"):
243 | create_cluster(
244 | project_id, region, gke_cluster_name
245 | )
246 |
247 | get_credentials(
248 | project_id, region, gke_cluster_name
249 | )
250 |
251 | internal_mongo_string, external_mongo_string = install_mongodb()
252 |
253 | save_mongo_to_secretmanager(project_id, secret_id, external_mongo_string)
254 |
255 | external_omniboard_string = install_omniboard()
256 |
257 | return SacredState(external_omniboard_string=external_omniboard_string)
258 |
259 |
260 | def tear_down_sacred(_config: EdgeConfig, _state: EdgeState):
261 | print("# Tearing down Sacred+Omniboard")
262 |
263 | delete_mongo_to_secretmanager(_config)
264 | delete_cluster(_config)
265 |
266 |
267 | def get_connection_string(project_id: str, secret_id: str) -> str:
268 | client = secretmanager_v1.SecretManagerServiceClient()
269 |
270 | secret_name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
271 | response = client.access_secret_version(name=secret_name)
272 |
273 | return response.payload.data.decode("UTF-8")
274 |
275 |
276 | def track_experiment(config: EdgeConfig, state: EdgeState, experiment: Experiment):
277 | if config is None or state is None:
278 | print("Vertex:edge configuration is not provided, the experiment will not be tracked")
279 | return
280 |
281 | if state.sacred is None:
282 | print("Experiment tracker is not initialised in vertex:edge, the experiment will not be tracked")
283 | return
284 |
285 | project_id = config.google_cloud_project.project_id
286 | secret_id = config.experiments.mongodb_connection_string_secret
287 | mongo_connection_string = get_connection_string(project_id, secret_id)
288 | experiment.observers.append(MongoObserver(mongo_connection_string))
289 |
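The explanations printed by `install_mongodb` above mask the password in both connection strings. A small sketch of that masking, assuming the standard `mongodb://user:password@host/db` shape (the helper name is hypothetical, not part of the package):

```python
import re

def mask_connection_string(connection_string: str) -> str:
    # Replace the password component of mongodb://user:password@host/db with *****
    return re.sub(r"(mongodb://[^:]+:)[^@]+(@)", r"\1*****\2", connection_string)

print(mask_connection_string("mongodb://sacred:s3cret@10.0.0.5:60000/sacred"))
# mongodb://sacred:*****@10.0.0.5:60000/sacred
```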
--------------------------------------------------------------------------------
/src/edge/state.py:
--------------------------------------------------------------------------------
1 | import os.path
2 | from serde import serialize, deserialize
3 | from serde.yaml import to_yaml, from_yaml
4 | from dataclasses import dataclass
5 | from google.cloud import storage
6 |
7 | from edge.exception import EdgeException
8 | from edge.storage import get_bucket, StorageBucketState
9 | from edge.config import EdgeConfig
10 | from edge.tui import StepTUI, SubStepTUI
11 | from typing import Type, TypeVar, Optional, Dict
12 | from contextlib import contextmanager
13 |
14 |
15 | @deserialize
16 | @serialize
17 | @dataclass
18 | class SacredState:
19 | external_omniboard_string: str
20 |
21 |
22 | @deserialize
23 | @serialize
24 | @dataclass
25 | class ModelState:
26 | endpoint_resource_name: str
27 | deployed_model_resource_name: Optional[str] = None
28 |
29 |
30 | T = TypeVar("T", bound="EdgeState")
31 |
32 |
33 | @deserialize
34 | @serialize
35 | @dataclass
36 | class EdgeState:
37 | models: Optional[Dict[str, ModelState]] = None
38 | sacred: Optional[SacredState] = None
39 | storage: Optional[StorageBucketState] = None
40 |
41 | def save(self, _config: EdgeConfig):
42 | client = storage.Client(project=_config.google_cloud_project.project_id)
43 | bucket = client.bucket(_config.storage_bucket.bucket_name)
44 | blob = storage.Blob(".edge_state/edge_state.yaml", bucket)
45 | blob.upload_from_string(to_yaml(self))
46 |
47 |
48 | @classmethod
49 | def load(cls: Type[T], _config: EdgeConfig) -> T:
50 | client = storage.Client(project=_config.google_cloud_project.project_id)
51 | bucket = client.bucket(_config.storage_bucket.bucket_name)
52 | blob = storage.Blob(".edge_state/edge_state.yaml", bucket)
53 |
54 | if blob.exists():
55 | return from_yaml(EdgeState, blob.download_as_bytes(client).decode("utf-8"))
56 | else:
57 |             raise EdgeException(f"State file is not found in '{_config.storage_bucket.bucket_name}' bucket. "
58 |                                 f"Initialise vertex:edge state by running `./edge.py init`.")
59 |
60 | @classmethod
61 | @contextmanager
62 | def context(
63 | cls: Type[T],
64 | _config: EdgeConfig,
65 | to_lock: bool = False,
66 | to_save: bool = False,
67 | silent: bool = False
68 | ) -> T:
69 | with StepTUI("Loading vertex:edge state", emoji="💾", silent=silent):
70 | state = None
71 | locked = False
72 |
73 | if to_lock:
74 | with SubStepTUI("Locking state", silent=silent):
75 | locked = EdgeState.lock(_config.google_cloud_project.project_id,
76 | _config.storage_bucket.bucket_name)
77 |
78 | with SubStepTUI("Loading state", silent=silent):
79 | state = EdgeState.load(_config)
80 | try:
81 | yield state
82 | finally:
83 | if (to_save and state is not None) or locked:
84 | with StepTUI("Saving vertex:edge state", emoji="💾", silent=silent):
85 | if to_save and state is not None:
86 | with SubStepTUI("Saving state", silent=silent):
87 | state.save(_config)
88 | if locked:
89 | with SubStepTUI("Unlocking state", silent=silent):
90 | EdgeState.unlock(_config.google_cloud_project.project_id,
91 | _config.storage_bucket.bucket_name)
92 |
93 | @classmethod
94 | def exists(cls: Type[T], _config: EdgeConfig) -> bool:
95 | client = storage.Client(project=_config.google_cloud_project.project_id)
96 | bucket = client.bucket(_config.storage_bucket.bucket_name)
97 | blob = storage.Blob(".edge_state/edge_state.yaml", bucket)
98 | return blob.exists()
99 |
100 | @classmethod
101 | def lock(cls, project: str, bucket_name: str, blob_name: str = ".edge_state/edge_state.yaml") -> bool:
102 | """
103 | Lock the state file in Google Storage Bucket
104 |
105 | :param project:
106 | :param bucket_name:
107 | :param blob_name:
108 |         :return: bool -- True if the lock was acquired; raises EdgeException if the state is already locked
109 | """
110 | bucket = get_bucket(project, bucket_name)
111 | if bucket is None or not bucket.exists():
112 |             raise EdgeException("Google Storage Bucket does not exist. Initialise it by running `./edge.py init`.")
113 | blob = storage.Blob(f"{blob_name}.lock", bucket)
114 | if blob.exists():
115 | raise EdgeException("State file is already locked")
116 |
117 | blob.upload_from_string("locked")
118 | return True
119 |
120 | @classmethod
121 | def unlock(cls, project: str, bucket_name: str, blob_name: str = ".edge_state/edge_state.yaml"):
122 | bucket = get_bucket(project, bucket_name)
123 | blob = storage.Blob(f"{blob_name}.lock", bucket)
124 |
125 | if bucket is not None and blob.exists():
126 | blob.delete()
127 |
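`EdgeState.lock`/`unlock` above implement a simple advisory lock by creating a `<state file>.lock` blob next to the state file. The same protocol, sketched against the local filesystem (paths are hypothetical, purely for illustration):

```python
import tempfile
from pathlib import Path

def lock(state_path: Path) -> bool:
    # Acquire: fail if the .lock marker already exists, otherwise create it
    lock_path = Path(str(state_path) + ".lock")
    if lock_path.exists():
        raise RuntimeError("State file is already locked")
    lock_path.write_text("locked")
    return True

def unlock(state_path: Path) -> None:
    # Release: delete the marker if present (idempotent, like EdgeState.unlock)
    lock_path = Path(str(state_path) + ".lock")
    if lock_path.exists():
        lock_path.unlink()

with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "edge_state.yaml"
    lock(state)
    try:
        lock(state)  # a second acquisition fails while the marker exists
    except RuntimeError as e:
        print(e)     # State file is already locked
    unlock(state)
    lock(state)      # after unlocking, the lock can be taken again
```

Note that, as in the original, the exists-then-create sequence is not atomic; on GCS a race-free lock would need a generation precondition on the upload (e.g. `if_generation_match=0`).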
--------------------------------------------------------------------------------
/src/edge/storage.py:
--------------------------------------------------------------------------------
1 | import sys
2 | from typing import Optional
3 | from serde import serialize, deserialize
4 | from dataclasses import dataclass
5 | from google.api_core.exceptions import NotFound, Forbidden
6 | from google.cloud import storage
7 | from .config import EdgeConfig
8 | from .exception import EdgeException
9 | from .tui import (
10 | print_substep_not_done, print_substep_success, print_substep_failure, print_failure_explanation, print_substep,
11 | clear_last_line, SubStepTUI
12 | )
13 |
14 |
15 | @deserialize
16 | @serialize
17 | @dataclass
18 | class StorageBucketState:
19 | bucket_path: str
20 |
21 |
22 | def get_bucket(project_id: str, bucket_name: str) -> Optional[storage.Bucket]:
23 | try:
24 | client = storage.Client(project_id)
25 | bucket = client.get_bucket(bucket_name)
26 | return bucket
27 | except NotFound:
28 | return None
29 | except Forbidden:
30 | raise EdgeException(
31 | f"The bucket '{bucket_name}' exists, but you do not have permissions to access it. "
32 | "Maybe it belongs to another project? "
33 | "Please see the following guidelines for more information "
34 | "https://cloud.google.com/storage/docs/naming-buckets"
35 | )
36 |
37 |
38 | def get_bucket_uri(project_id: str, bucket_name: str) -> Optional[str]:
39 | bucket = get_bucket(project_id, bucket_name)
40 | if bucket is None:
41 | return None
42 | return f"gs://{bucket.name}/"
43 |
44 |
45 | def create_bucket(project_id: str, region: str, bucket_name: str) -> str:
46 | client = storage.Client(project_id)
47 | bucket = client.create_bucket(bucket_or_name=bucket_name, project=project_id, location=region)
48 | return f"gs://{bucket.name}/"
49 |
50 |
51 | def delete_bucket(project_id: str, region: str, bucket_name: str):
52 | client = storage.Client(project_id)
53 | bucket = client.get_bucket(bucket_name)
54 | print("## Deleting bucket content")
55 | bucket.delete_blobs(blobs=list(bucket.list_blobs()))
56 | print("## Deleting bucket")
57 | bucket.delete(force=True)
58 | print("Bucket deleted")
59 |
60 |
61 | def tear_down_storage(_config: EdgeConfig, _state):
62 | print("# Tearing down Google Storage")
63 | bucket_path = get_bucket_uri(
64 | _config.google_cloud_project.project_id,
65 | _config.storage_bucket.bucket_name,
66 | )
67 | if bucket_path is not None:
68 | delete_bucket(
69 | _config.google_cloud_project.project_id,
70 | _config.google_cloud_project.region,
71 | _config.storage_bucket.bucket_name,
72 | )
73 |
74 |
75 | def setup_storage(project_id: str, region: str, bucket_name: str) -> StorageBucketState:
76 |     with SubStepTUI(f"Checking if bucket '{bucket_name}' exists") as sub_step:
77 | try:
78 | bucket_path = get_bucket_uri(
79 | project_id,
80 | bucket_name,
81 | )
82 | if bucket_path is None:
83 | clear_last_line()
84 | sub_step.update(message=f"'{bucket_name}' does not exist, creating it")
85 | bucket_path = create_bucket(
86 | project_id,
87 | region,
88 | bucket_name,
89 | )
90 | return StorageBucketState(bucket_path)
91 |         except NotFound:
92 | raise EdgeException(f"The '{project_id}' project could not be found. It might mean that the quota project "
93 | f"is set to a different project. Please generate new application default "
94 | f"credentials by running `gcloud auth application-default login`")
95 | except ValueError as e:
96 | raise EdgeException(f"Unexpected error while setting up Storage bucket:\n{str(e)}")
97 |
--------------------------------------------------------------------------------
/src/edge/templates/tensorflow_model/cookiecutter.json:
--------------------------------------------------------------------------------
1 | {
2 | "model_name": "sklearn_model"
3 | }
--------------------------------------------------------------------------------
/src/edge/templates/tensorflow_model/{{cookiecutter.model_name}}/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/src/edge/templates/tensorflow_model/{{cookiecutter.model_name}}/__init__.py
--------------------------------------------------------------------------------
/src/edge/templates/tensorflow_model/{{cookiecutter.model_name}}/train.py:
--------------------------------------------------------------------------------
1 | from edge.train import Trainer
2 |
3 | class MyTrainer(Trainer):
4 | def main(self):
5 | self.set_parameter("example", 123)
6 |
7 | # Add model training logic here
8 |
9 | return 0 # return your model score here
10 |
11 | MyTrainer("{{cookiecutter.model_name}}").run()
12 |
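The template above relies on `edge.train.Trainer` driving `main()` from `run()`. A stripped-down, standalone mimic of that inversion-of-control pattern (no Vertex or Sacred involved; `MiniTrainer` is purely illustrative and not part of the package):

```python
import abc

class MiniTrainer(abc.ABC):
    # Illustrative stand-in for edge.train.Trainer: run() drives the user's main()
    def __init__(self, name: str):
        self.name = name
        self.params = {}

    def set_parameter(self, key, value):
        self.params[key] = value

    @abc.abstractmethod
    def main(self):
        ...

    def run(self):
        score = self.main()
        print(f"{self.name}: score={score}, params={self.params}")
        return score

class MyTrainer(MiniTrainer):
    def main(self):
        self.set_parameter("example", 123)
        return 0  # model score

MyTrainer("demo_model").run()  # demo_model: score=0, params={'example': 123}
```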
--------------------------------------------------------------------------------
/src/edge/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import abc
4 | import uuid
5 | import inspect
6 | import logging
7 | from typing import Optional, Any
8 | from dataclasses import dataclass
9 | from enum import Enum
10 |
11 | from serde import serialize, deserialize
12 | from serde.json import to_json
13 | from sacred import Experiment
14 | from sacred.observers import MongoObserver
15 | from google.cloud import secretmanager_v1
16 | from google.cloud.aiplatform import Model, CustomJob
17 |
18 | import edge.path
19 | #from edge.state import EdgeState
20 | from edge.config import EdgeConfig
21 | from edge.exception import EdgeException
22 |
23 | logging.basicConfig(level=logging.INFO)
24 |
25 | class TrainingTarget(Enum):
26 | LOCAL = "local"
27 | VERTEX = "vertex"
28 |
29 | @deserialize
30 | @serialize
31 | @dataclass
32 | class TrainedModel:
33 | model_name: Optional[str]
34 | is_local: bool = False
35 |
36 | @classmethod
37 | def from_vertex_model(cls, model: Model):
38 | return TrainedModel(
39 | model_name=model.resource_name,
40 | )
41 |
42 | @classmethod
43 | def from_local_model(cls):
44 | return TrainedModel(
45 | model_name=None,
46 | is_local=True,
47 | )
48 |
49 | """
50 | A Trainer encapsulates a model training script and its associated MLOps lifecycle
51 |
52 | TODO: Explain why it has been built in this way. Sacred forces us into this pattern, but at least we hide it from the user.
53 | TODO: How much can we abstract this? What if Sacred is replaced with something else?
54 | TODO: How can we be even better at handling experiment config?
55 | """
56 | class Trainer():
57 | # TODO: group together experiment variables and Vertex variables. Note when target is local, we don't need Vertex values
58 | experiment = None
59 | experiment_run = None
60 | edge_config = None
61 | #edge_state = None
62 | name = None
63 | # TODO: Remove hard-coded Git link
64 | pip_requirements = [
65 | "vertex-edge @ git+https://github.com/fuzzylabs/vertex-edge.git"
66 | ]
67 | vertex_staging_path = None
68 | vertex_output_path = None
69 | script_path = None
70 | mongo_connection_string = None
71 | target = TrainingTarget.LOCAL
72 | model_config = None
73 | model_id = None
74 |
75 | def __init__(self, name: str):
76 | self.name = name
77 |
78 | # We need the path to the training script itself
79 | self.script_path = inspect.getframeinfo(sys._getframe(1)).filename
80 |
81 | # Determine our target training environment
82 | if os.environ.get("RUN_ON_VERTEX") == "True":
83 | logging.info("Target training environment is Vertex")
84 | self.target = TrainingTarget.VERTEX
85 | else:
86 | logging.info("Target training environment is Local")
87 | self.target = TrainingTarget.LOCAL
88 |
89 | # Load the Edge configuration from the appropriate source
90 | # TODO: Document env var
91 | if os.environ.get("EDGE_CONFIG"):
92 |             logging.info("Edge config will be loaded from environment variable EDGE_CONFIG")
93 | self.edge_config = self._decode_config_string(os.environ.get("EDGE_CONFIG"))
94 | else:
95 | logging.info("Edge config will be loaded from edge.yaml")
96 | # TODO: This isn't very stable. We should search for the config file.
97 |             self.edge_config = EdgeConfig.load(edge.path.get_default_config_path_from_model(self.script_path))
98 |
99 | # Extract the model configuration and check if the model has been initialised
100 | if name in self.edge_config.models:
101 | self.model_config = self.edge_config.models[name]
102 | else:
103 |             raise EdgeException(f"Model with name '{name}' could not be found in Edge config. Perhaps it has not been initialised.")
104 |
105 | # Load the Edge state
106 | #self.edge_state = EdgeState.load(self.edge_config)
107 | #logging.info(f"Edge state: {self.edge_state}")
108 |
109 | if os.environ.get("MODEL_ID"):
110 | self.model_id = os.environ.get("MODEL_ID")
111 | else:
112 | self.model_id = uuid.uuid4()
113 |
114 | # Determine correct paths for Vertex running
115 | self.vertex_staging_path = "gs://" + os.path.join(
116 | self.edge_config.storage_bucket.bucket_name,
117 | self.edge_config.storage_bucket.vertex_jobs_directory
118 | )
119 | self.vertex_output_path = os.path.join(self.vertex_staging_path, str(self.model_id))
120 |
121 | # Set up experiment tracking for this training job
122 | # TODO: Restore Git support
123 | # TODO: If training target is Vertex, we don't need to init an experiment
124 | # TODO: Experiment initialisation in its own function (but *must* be called during construction)
125 | self.experiment = Experiment(name, save_git_info=True)
126 |
127 | # TODO: Document env var
128 | if os.environ.get("MONGO_CONNECTION_STRING"):
129 | self.mongo_connection_string = os.environ.get("MONGO_CONNECTION_STRING")
130 | else:
131 | self.mongo_connection_string = self._get_mongo_connection_string()
132 |
133 | if self.mongo_connection_string is not None:
134 | self.experiment.observers.append(MongoObserver(self.mongo_connection_string))
135 | else:
136 | logging.info("Experiment tracker has not been initialised")
137 |
138 | @self.experiment.main
139 | def ex_noop_main(c):
140 | pass
141 |
142 | """
143 | To be implemented by data scientist
144 | """
145 | @abc.abstractmethod
146 | def main(self):
147 | # TODO: A more user-friendly message
148 | raise NotImplementedError("The main method for this trainer has not been implemented")
149 |
150 | def set_parameter(self, key: str, value: Any):
151 | self.experiment_run.config[key] = value
152 |
153 | def get_parameter(self, key: str) -> Any:
154 | return self.experiment_run.config[key]
155 |
156 | def log_scalar(self, key: str, value: Any):
157 | self.experiment_run.log_scalar(key, value)
158 |
159 | def get_model_save_path(self):
160 | # TODO: Support local paths
161 | return self.vertex_output_path
162 |
163 | """
164 | Executes the training script and tracks experiment details
165 | """
166 | def run(self):
167 | json_path = os.path.join(
168 | os.path.dirname(self.script_path),
169 | "trained_model.json"
170 | )
171 |
172 | with open(json_path, "w") as train_json:
173 | if self.target == TrainingTarget.VERTEX:
174 | self._run_on_vertex()
175 |
176 | try:
177 | model = self._create_model_on_vertex()
178 | train_json.write(to_json(TrainedModel.from_vertex_model(model)))
179 |                 except Exception:
180 |                     logging.warning("Unable to capture saved model. This might mean the model has not been saved by the training script")
181 | else:
182 | self._run_locally()
183 | train_json.write(to_json(TrainedModel.from_local_model()))
184 |
185 | def _run_locally(self):
186 | self.experiment_run = self.experiment._create_run()
187 | result = self.main()
188 |
189 | self.experiment_run.log_scalar("score", result)
190 | self.experiment_run({})
191 |
192 | def _run_on_vertex(self):
193 | environment_variables = {
194 | "RUN_ON_VERTEX": "False",
195 | "EDGE_CONFIG": self._get_encoded_config(),
196 | "MODEL_ID": str(self.model_id)
197 | }
198 |
199 | if self.mongo_connection_string is not None:
200 | environment_variables["MONGO_CONNECTION_STRING"] = self.mongo_connection_string
201 |
202 | CustomJob.from_local_script(
203 | display_name=f"{self.name}-custom-training",
204 | script_path=self.script_path,
205 | container_uri=self.model_config.training_container_image_uri,
206 | requirements=self.pip_requirements,
207 | #args=training_script_args,
208 | replica_count=1,
209 | project=self.edge_config.google_cloud_project.project_id,
210 | location=self.edge_config.google_cloud_project.region,
211 | staging_bucket=self.vertex_staging_path,
212 | environment_variables=environment_variables
213 | ).run()
214 |
215 | def _create_model_on_vertex(self):
216 | return Model.upload(
217 | display_name=self.name,
218 | project=self.edge_config.google_cloud_project.project_id,
219 | location=self.edge_config.google_cloud_project.region,
220 | serving_container_image_uri=self.model_config.serving_container_image_uri,
221 | artifact_uri=self.get_model_save_path()
222 | )
223 |
224 | def _get_encoded_config(self) -> str:
225 | return str(self.edge_config).replace("\n", "\\n")
226 |
227 | def _decode_config_string(self, s: str) -> EdgeConfig:
228 | return EdgeConfig.from_string(s.replace("\\n", "\n"))
229 |
230 |     def _get_mongo_connection_string(self) -> Optional[str]:
231 | # Try to get the Mongo connection string, if available
232 | try:
233 | client = secretmanager_v1.SecretManagerServiceClient()
234 | secret_name = f"projects/{self.edge_config.google_cloud_project.project_id}/secrets/{self.edge_config.experiments.mongodb_connection_string_secret}/versions/latest"
235 | response = client.access_secret_version(name=secret_name)
236 | return response.payload.data.decode("UTF-8")
237 |         except Exception:
238 | return None
239 |
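`_get_encoded_config` and `_decode_config_string` above shuttle the YAML config through an environment variable by escaping newlines. The round trip, sketched on a plain string (function names here are illustrative stand-ins):

```python
def encode_config(yaml_text: str) -> str:
    # Escape newlines so the whole config fits in a single-line env var value
    return yaml_text.replace("\n", "\\n")

def decode_config(encoded: str) -> str:
    # Reverse the escaping on the training-job side
    return encoded.replace("\\n", "\n")

original = "google_cloud_project:\n  project_id: my-project\n"
assert decode_config(encode_config(original)) == original
```

Note the escaping is ambiguous if the YAML itself contains a literal backslash-n two-character sequence; base64-encoding the config would avoid that.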
--------------------------------------------------------------------------------
/src/edge/tui.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import traceback
3 | from typing import Optional
4 | import questionary
5 | from enum import Enum
6 | from edge.exception import EdgeException
7 |
8 | styles = {
9 | "heading": "bold underline",
10 | "step": "bold",
11 | "substep": "",
12 | "success": "fg:ansigreen",
13 | "failure": "fg:ansired",
14 | "warning": "fg:ansiyellow",
15 | }
16 |
17 | qmark = " ?"
18 |
19 |
20 | def print_heading(text: str):
21 | questionary.print(text, styles["heading"])
22 |
23 |
24 | def strfmt_step(text: str, emoji: str = "*"):
25 | return f"{emoji} {text}"
26 |
27 |
28 | def print_step(text: str, emoji: str = "*"):
29 | questionary.print(strfmt_step(text, emoji), styles["step"])
30 |
31 |
32 | def strfmt_substep(text):
33 | return f" {text}"
34 |
35 |
36 | def print_substep(text: str):
37 | questionary.print(strfmt_substep(f"◻️ {text}"), styles["substep"])
38 |
39 |
40 | def strfmt_substep_success(text):
41 |     return strfmt_substep(f"✔️ {text}")
42 |
43 |
44 | def strfmt_substep_failure(text):
45 | return strfmt_substep(f"❌ {text}")
46 |
47 |
48 | def strfmt_substep_warning(text):
49 | return strfmt_substep(f"⚠️ {text}")
50 |
51 |
52 | def strfmt_substep_not_done(text):
53 | return strfmt_substep(f"⏳ {text}")
54 |
55 |
56 | def print_substep_success(text: str):
57 | questionary.print(strfmt_substep_success(text), styles["success"])
58 |
59 |
60 | def print_substep_failure(text: str):
61 | questionary.print(strfmt_substep_failure(text), styles["failure"])
62 |
63 |
64 | def print_substep_not_done(text: str):
65 | questionary.print(strfmt_substep_not_done(text), styles["substep"])
66 |
67 |
68 | def print_substep_warning(text: str):
69 | questionary.print(strfmt_substep_warning(text), styles["warning"])
70 |
71 |
72 | def strfmt_failure_explanation(text: str):
73 | return f" - {text}"
74 |
75 |
76 | def print_failure_explanation(text: str):
77 | questionary.print(strfmt_failure_explanation(text), styles["failure"])
78 |
79 |
80 | def print_warning_explanation(text: str):
81 | questionary.print(strfmt_failure_explanation(text), styles["warning"])
82 |
83 |
84 | def clear_last_line():
85 | print("\033[1A\033[0K", end="\r")
86 |
87 |
88 | class TUIStatus(Enum):
89 | NEUTRAL = "neutral"
90 | PENDING = "pending"
91 | SUCCESSFUL = "successful"
92 | FAILED = "failed"
93 | WARNING = "warning"
94 |
95 |
96 | class TUI(object):
97 | def __init__(
98 | self,
99 | intro: str,
100 | success_title: str,
101 | success_message: str,
102 | failure_title: str,
103 | failure_message: str,
104 | ):
105 | self.intro = intro
106 | self.success_title = success_title
107 | self.success_message = success_message
108 | self.failure_title = failure_title
109 | self.failure_message = failure_message
110 |
111 | def __enter__(self):
112 | questionary.print(self.intro, "bold underline")
113 | return self
114 |
115 | def __exit__(self, exc_type, exc_val, exc_tb):
116 | if exc_type is None:
117 | print()
118 | questionary.print(self.success_title, style="fg:ansigreen")
119 | print()
120 | print(self.success_message)
121 | sys.exit(0)
122 | elif exc_type is EdgeException:
123 | print()
124 | questionary.print(self.failure_title, style="fg:ansired")
125 | print()
126 | questionary.print(self.failure_message, style="fg:ansired")
127 | sys.exit(1)
128 | else:
129 | return False
130 |
131 |
132 | class StepTUI(object):
133 | def __init__(self, message: str, emoji: str = "*", silent: bool = False):
134 | self.message = message
135 | self.emoji = emoji
136 | self.silent = silent
137 |
138 | def __enter__(self):
139 | self.print()
140 | return self
141 |
142 | def __exit__(self, exc_type, exc_val, exc_tb):
143 | return False # Do not suppress
144 |
145 | def print(self):
146 | if self.silent:
147 | return
148 | questionary.print(f"{self.emoji} {self.message}", "bold")
149 |
150 |
151 | class SubStepTUI(object):
152 | style = {
153 | TUIStatus.NEUTRAL: "",
154 | TUIStatus.PENDING: "",
155 | TUIStatus.SUCCESSFUL: styles["success"],
156 | TUIStatus.FAILED: styles["failure"],
157 | TUIStatus.WARNING: styles["warning"]
158 | }
159 |
160 | emoji = {
161 | TUIStatus.NEUTRAL: "🤔",
162 | TUIStatus.PENDING: "⏳",
163 | TUIStatus.SUCCESSFUL: "✔",
164 | TUIStatus.FAILED: "❌",
165 | TUIStatus.WARNING: "⚠️"
166 | }
167 |
168 | def __init__(self, message: str, status=TUIStatus.PENDING, silent: bool = False):
169 | self.message = message
170 | self.status = status
171 | self.written = False
172 | self._entered = False
173 | self._dirty = False
174 | self.silent = silent
175 |
176 | def __enter__(self):
177 | self._entered = True
178 | self.print()
179 | return self
180 |
181 | def __exit__(self, exc_type, exc_val, exc_tb):
182 | suppress = True
183 |         if exc_type is None:  # sub-step exited without errors
184 | if self.status == TUIStatus.PENDING:
185 | self.update(status=TUIStatus.SUCCESSFUL)
186 | elif exc_type is EdgeException:
187 | if exc_val.fatal:
188 | self.update(status=TUIStatus.FAILED)
189 | suppress = False
190 | else:
191 | self.update(status=TUIStatus.WARNING)
192 | self.add_explanation(str(exc_val))
193 | else:
194 | self.update(status=TUIStatus.FAILED)
195 | self.add_explanation(
196 | "Unexpected error occurred:\n\n"
197 | f"{''.join(traceback.format_tb(exc_tb))}\n"
198 | f" {exc_type.__name__}: {str(exc_val)}\n"
199 | f"\n Please raise an issue for this error at https://github.com/fuzzylabs/vertex-edge/issues"
200 | )
201 |             raise EdgeException("Unexpected error occurred (details reported above)")
202 |
203 | self._entered = False
204 |
205 | return suppress
206 |
207 | def print(self):
208 | if self.silent:
209 | return
210 | if not self._entered:
211 | return
212 | if self.written and not self._dirty:
213 | clear_last_line()
214 | line = f" {self.emoji[self.status]} {self.message}"
215 | questionary.print(line, self.style[self.status])
216 | self.written = True
217 |
218 | def update(self, message: Optional[str] = None, status: Optional[TUIStatus] = None):
219 | if message is not None:
220 | self.message = message
221 | if status is not None:
222 | self.status = status
223 | self.print()
224 |
225 | def add_explanation(self, text: str):
226 | self._dirty = True
227 | line = f" - {text}"
228 | questionary.print(line, self.style[self.status])
229 |
230 | def set_dirty(self):
231 | self._dirty = True
232 |
--------------------------------------------------------------------------------
/src/edge/versions.py:
--------------------------------------------------------------------------------
1 | import json
2 | import subprocess
3 | from dataclasses import dataclass
4 | from .exception import EdgeException
5 |
6 |
7 | @dataclass
8 | class Version:
9 | major: int
10 | minor: int
11 | patch: int
12 |
13 | @classmethod
14 | def from_string(cls, version_string: str):
15 |         version_string = version_string.split("+")[0].lstrip("v")  # drop build metadata and a leading "v"
16 | ns = [int(x) for x in version_string.split(".")]
17 | return Version(*ns)
18 |
19 | def is_at_least(self, other):
20 | if self.major > other.major:
21 | return True
22 | elif self.major == other.major:
23 | if self.minor > other.minor:
24 | return True
25 | elif self.minor == other.minor:
26 | if self.patch >= other.patch:
27 | return True
28 | return False
29 |
30 | def __str__(self):
31 | return f"{self.major}.{self.minor}.{self.patch}"
32 |
33 |
34 | def command_exist(command) -> bool:
35 | try:
36 | subprocess.check_output(f"which {command}", shell=True, stderr=subprocess.STDOUT)
37 | return True
38 | except subprocess.CalledProcessError:
39 | return False
40 |
41 |
42 | def get_version(command) -> str:
43 | try:
44 | version_string = subprocess.check_output(command, shell=True, stderr=subprocess.DEVNULL).decode("utf-8")
45 | return version_string
46 | except subprocess.CalledProcessError:
47 |         raise EdgeException(f"Unexpected error while trying to get the version with `{command}`")
48 |
49 |
50 | def get_gcloud_version(component: str = "core") -> Version:
51 | if not command_exist("gcloud"):
52 | raise EdgeException("Unable to locate gcloud. Please visit https://cloud.google.com/sdk/docs/install for installation instructions.")
53 | version_string = get_version("gcloud version --format json")
54 | return Version.from_string(json.loads(version_string)[component])
55 |
56 | def get_kubectl_version() -> Version:
57 | if not command_exist("kubectl"):
58 | raise EdgeException("Unable to locate kubectl. Please visit https://kubernetes.io/docs/tasks/tools/ for installation instructions.")
59 | version_string = get_version("kubectl version --client=true --short -o json")
60 | return Version.from_string(json.loads(version_string)["clientVersion"]["gitVersion"])
61 |
62 |
63 | def get_helm_version() -> Version:
64 | if not command_exist("helm"):
65 | raise EdgeException("Unable to locate helm. Please visit https://helm.sh/docs/intro/install/ for installation instructions.")
66 | version_string = get_version("helm version --short")
67 | return Version.from_string(version_string)
68 |
--------------------------------------------------------------------------------
/src/edge/vertex_deploy.py:
--------------------------------------------------------------------------------
1 | from google.cloud.aiplatform import Model, Endpoint
2 | from google.api_core.exceptions import NotFound
3 |
4 | from edge.exception import EdgeException
5 | from edge.tui import StepTUI, SubStepTUI
6 |
7 |
8 | def vertex_deploy(endpoint_resource_name: str, model_resource_name: str, model_name: str):
9 | with StepTUI(f"Deploying model '{model_name}'", emoji="🐏"):
10 | with SubStepTUI(f"Checking endpoint '{endpoint_resource_name}'"):
11 | try:
12 | endpoint = Endpoint(endpoint_name=endpoint_resource_name)
13 | except NotFound:
14 |                 raise EdgeException(f"Endpoint '{endpoint_resource_name}' was not found. Please reinitialise the "
15 |                                     f"model by running `./edge.py model init` to create it.")
16 | with SubStepTUI(f"Undeploying previous models from endpoint '{endpoint_resource_name}'"):
17 | endpoint.undeploy_all()
18 | with SubStepTUI(f"Deploying model '{model_resource_name}' on endpoint '{endpoint_resource_name}'"):
19 | try:
20 | model = Model(model_resource_name)
21 | except NotFound:
22 |                 raise EdgeException(f"Model '{model_resource_name}' was not found. You need to train a model "
23 |                                     f"by running `dvc repro ...`.")
24 | endpoint.deploy(model=model, traffic_percentage=100, machine_type="n1-standard-2")
25 |
--------------------------------------------------------------------------------
/src/vertex_edge.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | vertex:edge CLI tool
4 | """
5 | import argparse
6 | import warnings
7 | import logging
8 |
9 | from edge.command.force_unlock import force_unlock
10 | from edge.command.experiments.subparser import add_experiments_parser, run_experiments_actions
11 | from edge.command.init import edge_init
12 | from edge.command.dvc.subparser import add_dvc_parser, run_dvc_actions
13 | from edge.command.config.subparser import add_config_parser, run_config_actions
14 | from edge.command.model.subparser import add_model_parser, run_model_actions
15 |
16 | logging.disable(logging.WARNING)
17 | warnings.filterwarnings(
18 | "ignore",
19 | "Your application has authenticated using end user credentials from Google Cloud SDK without a quota project.",
20 | )
21 |
22 |
23 | if __name__ == "__main__":
24 | parser = argparse.ArgumentParser(description="Edge", formatter_class=argparse.RawTextHelpFormatter)
25 | parser.add_argument(
26 | "-c", "--config", type=str, default="edge.yaml", help="Path to the configuration file (default: edge.yaml)"
27 | )
28 |
29 | subparsers = parser.add_subparsers(title="command", dest="command", required=True)
30 | init_parser = subparsers.add_parser("init", help="Initialise vertex:edge")
31 | force_unlock_parser = subparsers.add_parser("force-unlock", help="Force unlock vertex:edge state")
32 |
33 | add_dvc_parser(subparsers)
34 | add_model_parser(subparsers)
35 | add_experiments_parser(subparsers)
36 | add_config_parser(subparsers)
37 |
38 | args = parser.parse_args()
39 |
40 | if args.command == "init":
41 | edge_init()
42 | elif args.command == "force-unlock":
43 | force_unlock()
44 | elif args.command == "dvc":
45 | run_dvc_actions(args)
46 | elif args.command == "model":
47 | run_model_actions(args)
48 | elif args.command == "experiments":
49 | run_experiments_actions(args)
50 |     elif args.command == "config":
51 |         run_config_actions(args)
52 |     else:
53 |         raise NotImplementedError(f"Command '{args.command}' is not implemented")
54 |
--------------------------------------------------------------------------------
/tutorials/setup.md:
--------------------------------------------------------------------------------
1 | # vertex:edge setup
2 |
3 | In this tutorial you'll see how to set up a new project using vertex:edge.
4 |
5 | ## Preparation
6 |
7 | The very first thing you'll need is a fresh directory in which to work. For instance:
8 |
9 | ```
10 | mkdir hello-world-vertex
11 | cd hello-world-vertex
12 | ```
13 |
14 | ## Setting up GCP environment
15 |
16 | Now you'll need a [GCP account](https://cloud.google.com), so sign up for one if you haven't already done so.
17 |
18 | Then within your GCP account, [create a new project](https://cloud.google.com/resource-manager/docs/creating-managing-projects). Take a note of the project ID; you can view it in the Google Cloud console's project selection dialog. Note that the project ID won't necessarily match the name that you chose for the project, as GCP often appends digits to the end of the name.
19 |
20 | Finally make sure you have [enabled billing](https://cloud.google.com/billing/docs/how-to/modify-project) for your new project too.
21 |
22 | ## Authenticating with GCP
23 |
24 | If you haven't got the `gcloud` command line tool, [install it now](https://cloud.google.com/sdk/docs/install).
25 |
26 | And then authenticate by running:
27 |
28 | ```
29 | gcloud auth login
30 | ```
31 |
32 | Next you need to configure the project ID. Replace `<your-project-id>` below with the ID of the project you created during 'Setting up GCP environment' above:
33 | 
34 | ```
35 | gcloud config set project <your-project-id>
36 | ```
37 |
38 | You'll also need to configure a region, replacing `<your-region>` below with your choice (for example, `europe-west4`). Please see the [GCP documentation](https://cloud.google.com/vertex-ai/docs/general/locations#feature-availability) to learn which regions are available for Vertex.
39 | 
40 | ```
41 | gcloud config set compute/region <your-region>
42 | ```
43 |
44 | **Note** `gcloud` might ask you if you want to enable the Google Compute API on the project. If so, type `y` to enable this.
45 |
46 | Finally, you need to run one more command to complete authentication:
47 |
48 | ```
49 | gcloud auth application-default login
50 | ```
51 |
52 | ## Installing vertex:edge
53 |
54 | We'll use pip to install **vertex:edge**. Before doing this, it's a good idea to run `pip install --upgrade pip` to ensure that you have the most recent version of pip.
55 |
56 | To install **vertex:edge**, run:
57 |
58 | ```
59 | pip install vertex-edge
60 | ```
61 |
62 | After doing that, you should have the `edge` command available. Try running:
63 |
64 | ```
65 | edge --help
66 | ```
67 |
68 | **Note** that when you run `edge` for the first time, it will download a Docker image (`fuzzylabs/edge`), which might take some time to complete. All Edge commands run inside Docker.
69 |
70 | ## Initialising vertex:edge
71 |
72 | Before you can use **vertex:edge** to train models, you'll need to initialise your project. This only needs to be done once, whenever you start a new project.
73 |
74 | Inside your project directory, run
75 |
76 | ```
77 | edge init
78 | ```
79 |
80 | As part of the initialisation process, vertex:edge will first verify that your GCP environment is set up correctly, and it will confirm your choice of project name and region so that you don't accidentally install things to the wrong GCP environment.
81 |
82 | It will ask you to choose a name for a cloud storage bucket. This bucket is used for a number of things:
83 |
84 | * Tracking the state of your project.
85 | * Storing model assets.
86 | * Storing versioned data.
87 |
88 | Keep in mind that on GCP, storage bucket names are **globally unique**, so you need to choose a name that isn't already taken. For more information please see the [official GCP documentation](https://cloud.google.com/storage/docs/naming-buckets).
89 |
90 | You might wonder what initialisation actually _does_:
91 |
92 | * It creates a configuration file in your project directory, called `edge.yaml`. The configuration includes details about your GCP environment, the models that you have created, and the cloud storage bucket.
93 | * It also creates a _state file_. This lives in the cloud storage bucket, and it is used by **vertex:edge** to keep track of everything that it has deployed or trained.
94 |
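To give a sense of its shape, here's roughly what an `edge.yaml` might look like after initialisation. The keys below are purely illustrative (the real file is generated and managed by `edge init`), so treat this as a sketch rather than a reference:

```yaml
# Hypothetical sketch -- field names are illustrative, not authoritative
project_id: hello-world-vertex    # your GCP project ID
region: europe-west4              # the region you configured with gcloud
storage_bucket: my-unique-bucket  # the globally unique bucket name you chose
models: {}                        # filled in as you create models
```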
95 | ## Next steps
96 |
97 | After all of the above, you'll have a new project directory `hello-world-vertex`, which will contain a configuration file `edge.yaml`. You'll also have a GCP project, inside which there will be a cloud storage bucket with a name that matches what you chose during `edge init`.
98 |
99 | You're now ready to train and deploy a model. See the [next tutorial](train_deploy.md) to learn how.
100 |
--------------------------------------------------------------------------------
/tutorials/train_deploy.md:
--------------------------------------------------------------------------------
1 | # Training a model
2 |
3 | In this tutorial you'll train and deploy a TensorFlow model to Google Vertex using the vertex:edge command line tool and Python library.
4 |
5 | Before following this tutorial, you should have already set up a GCP project and initialised vertex:edge. See the [setup tutorial](setup.md) for more information.
6 |
7 | ## Initialisation
8 |
9 | We're going to use the [TensorFlow](https://www.tensorflow.org) framework for this example, so let's go ahead and install that now:
10 |
11 | ```
12 | pip install tensorflow
13 | ```
14 |
15 | Next we initialise a new model, which makes **vertex:edge** aware that the model exists:
16 |
17 | ```
18 | edge model init hello-world
19 | ```
20 |
21 | If you check your `edge.yaml` file now, you will see that a model has been added to the `models` section:
22 |
23 | ```yaml
24 | models:
25 | hello-world:
26 | endpoint_name: hello-world-endpoint
27 | name: hello-world
28 | serving_container_image_uri: europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest
29 | training_container_image_uri: europe-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6:latest
30 | ```
31 |
32 | Note that you won't see anything new appear in the Google Cloud Console until after the model has actually been trained, which we'll do next.
33 |
34 | ## Writing a model training script
35 |
36 | To begin with, we can generate an outline of our model training code using a template:
37 |
38 | ```
39 | edge model template hello-world
40 | ```
41 |
42 | You will be asked which framework you want to use, so select `tensorflow`.
43 |
44 | There will now be a Python script named `train.py` inside `models/hello-world`. Open this script in your favourite editor or IDE. It looks like this:
45 |
46 | ```python
47 | from edge.train import Trainer
48 |
49 | class MyTrainer(Trainer):
50 | def main(self):
51 | self.set_parameter("example", 123)
52 |
53 | # Add model training logic here
54 |
55 | return 0 # return your model score here
56 |
57 | MyTrainer("hello-world").run()
58 | ```
59 |
60 | Every model training script needs to have the basic structure shown above. Let's break this down a little bit:
61 |
62 | * We start by importing the class `Trainer` from the **vertex:edge** library.
63 | * We define a training class. This class can have any name you like, as long as it extends `Trainer`.
64 | * The `Trainer` class provides a method called `main`, and this is where we write all of the model training logic.
65 | * We have the ability to set parameters and save performance metrics for experiment tracking - more on this shortly.
66 | * At the end, we just need one more line to instantiate and run our training class.
67 |
68 | Now let's create something a bit more interesting - a simple classifier:
69 |
70 | ```python
71 | TODO
72 | ```
73 |
74 | ## Training the model
75 |
76 | Now we can train the model simply by running
77 |
78 | ```
79 | python models/hello-world/train.py
80 | ```
81 |
82 | This will run the training script locally - i.e. on your computer. That's fine if your model is reasonably simple, but for more compute-intensive models we want to use the on-demand compute available in Google Vertex.
83 |
84 | The good news is that you don't need to modify the code in any way in order to train the model on Vertex, because **vertex:edge** figures out how to package the training script and run it for you. All you need to run is this:
85 |
86 | ```
87 | RUN_ON_VERTEX=True python models/hello-world/train.py
88 | ```
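
If you're wondering how a single environment variable can change where training runs: the usual pattern is simply to read the variable at start-up and branch on it. Here's a minimal, self-contained sketch of that pattern (this is not vertex:edge's actual implementation, and the helper name is made up):

```python
import os

def run_on_vertex() -> bool:
    """Report whether the RUN_ON_VERTEX environment variable requests remote training."""
    # Accept a few common truthy spellings; anything else means "train locally"
    return os.environ.get("RUN_ON_VERTEX", "").lower() in ("true", "1", "yes")
```

A training script can then call a helper like this once at start-up and either train locally or hand the job off to Vertex.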
89 |
90 | ## Deploying the model
91 |
92 | Once you've trained the model on Vertex as above, you can deploy it there too. One important thing to remember, however, is that models trained locally _cannot_ be deployed to Vertex.
93 |
94 | Because vertex:edge keeps track of all the models you've trained, it's very easy to deploy the most recently trained model, like this:
95 |
96 | ```
97 | edge model deploy hello-world
98 | ```
99 |
--------------------------------------------------------------------------------
/vertex-edge-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fuzzylabs/vertex-edge/97a0df5ebad0dc35e262d9f8269e6e33190f6ad1/vertex-edge-logo.png
--------------------------------------------------------------------------------