├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── notebooks ├── 0 - Label your dataset with Amazon SageMaker GroundTruth │ ├── 0a-Label-your-own-dataset.ipynb │ ├── 0b-Label-your-own-dataset with SM Processing.ipynb │ ├── README.md │ └── code │ │ └── preprocessing.py ├── 1 - Train and test on Amazon SageMaker Studio │ ├── README.md │ └── Train-and-test-on-Amazon-SageMaker-Studio.ipynb └── 2 - Train and deploy with Amazon SageMaker │ ├── Custom-YOLOv5-Train-and-Deploy-on-Amazon-SageMaker.ipynb │ ├── README.md │ └── helper-code │ └── detect.py └── src └── images └── banner-1.png /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. 
Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 
45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Daniel Mitchell 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![banner-image](src/images/banner-1.png) 2 | ## Train and deploy custom YOLOv5 Object Detection models on Amazon SageMaker 3 | 4 | Object detection allows us to identify and locate objects in images or videos. You may want to detect your company brand in pictures, find objects on a shelf, count the number of people in a shop, or cover many other everyday detection use cases. **You only look once (YOLO)** is a state-of-the-art, real-time object detection system [presented in 2015](https://arxiv.org/abs/1506.02640). YOLO has since become one of the most popular object detection algorithms. 5 | 6 | **[Amazon SageMaker](https://aws.amazon.com/sagemaker/)** is a fully managed service to build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. 7 | 8 | In this workshop you will learn how to use different Amazon SageMaker features to train and deploy custom [YOLOv5 models](https://github.com/ultralytics/yolov5). 9 | 10 | Here are the different sections you can find: 11 | 12 | **0. Label and prepare your dataset:** Before we start creating a custom model, we need data, which has to be labeled and organized in the expected format. 
For this task we will make use of [Amazon SageMaker Ground Truth](https://aws.amazon.com/sagemaker/data-labeling/?sagemaker-data-wrangler-whats-new.sort-by=item.additionalFields.postDateTime&sagemaker-data-wrangler-whats-new.sort-order=desc), a feature that helps you build and manage your own data labeling workflows and data labeling workforce. Once you have labeled your dataset, you can choose to convert it to the expected format locally or with [Amazon SageMaker Processing Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html). 13 | 14 | **1. Train and test on Amazon SageMaker Studio:** Once we have prepared the dataset, we can train the custom YOLOv5 model. In this section you will download your dataset and train and test the model locally on [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/). 15 | 16 | **2. Train and deploy with Amazon SageMaker:** Training and testing locally is a good way to iterate quickly, but for production you will probably want to train your models on more powerful instances and deploy your model to an endpoint, while managing as little infrastructure as possible. In this section you will learn how to make use of [Amazon SageMaker Training Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html) and [Amazon SageMaker Endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html) to train and deploy your custom model. 17 | 18 | 19 | --- 20 | 21 | This package depends on and may incorporate or retrieve a number of third-party 22 | software packages (such as open source packages) at install-time or build-time 23 | or run-time ("External Dependencies"). The External Dependencies are subject to 24 | license terms that you must accept in order to use this package. If you do not 25 | accept all of the applicable license terms, you should not use this package. 
We 26 | recommend that you consult your company’s open source approval policy before 27 | proceeding. 28 | 29 | Provided below is a list of External Dependencies and the applicable license 30 | identification as indicated by the documentation associated with the External 31 | Dependencies as of Amazon's most recent review. 32 | 33 | THIS INFORMATION IS PROVIDED FOR CONVENIENCE ONLY. AMAZON DOES NOT PROMISE THAT 34 | THE LIST OR THE APPLICABLE TERMS AND CONDITIONS ARE COMPLETE, ACCURATE, OR 35 | UP-TO-DATE, AND AMAZON WILL HAVE NO LIABILITY FOR ANY INACCURACIES. YOU SHOULD 36 | CONSULT THE DOWNLOAD SITES FOR THE EXTERNAL DEPENDENCIES FOR THE MOST COMPLETE 37 | AND UP-TO-DATE LICENSING INFORMATION. 38 | 39 | YOUR USE OF THE EXTERNAL DEPENDENCIES IS AT YOUR SOLE RISK. IN NO EVENT WILL 40 | AMAZON BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, 41 | INDIRECT, CONSEQUENTIAL, SPECIAL, INCIDENTAL, OR PUNITIVE DAMAGES (INCLUDING 42 | FOR ANY LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, OR 43 | COMPUTER FAILURE OR MALFUNCTION) ARISING FROM OR RELATING TO THE EXTERNAL 44 | DEPENDENCIES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, EVEN 45 | IF AMAZON HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THESE LIMITATIONS 46 | AND DISCLAIMERS APPLY EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW. 
47 | 48 | {YOLOv5 (https://github.com/ultralytics/yolov5) - GNU General Public License v3.0 or later} 49 | 50 | -------------------------------------------------------------------------------- /notebooks/0 - Label your dataset with Amazon SageMaker GroundTruth/0a-Label-your-own-dataset.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "4379a998-cad7-40fb-99bb-5a2948f7009d", 6 | "metadata": {}, 7 | "source": [ 8 | "# Label your dataset with Amazon SageMaker Ground Truth" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 12, 14 | "id": "f47a5f73-c8c5-4ce0-bb5d-b996066e47c4", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import boto3\n", 19 | "import json\n", 20 | "import numpy\n", 21 | "import os\n", 22 | "import sagemaker\n", 23 | "\n", 24 | "from sklearn.model_selection import train_test_split\n", 25 | "\n", 26 | "sm_client = boto3.client('sagemaker')\n", 27 | "s3_resource = boto3.resource('s3')\n", 28 | "sm_session = sagemaker.Session()" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "id": "792c33f6-78f7-42da-affe-b34bb34845e0", 34 | "metadata": {}, 35 | "source": [ 36 | "### Create a labeling job in Amazon SageMaker Ground Truth" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": "7892e313-d4da-4487-a3b7-c7f28a1a0935", 42 | "metadata": {}, 43 | "source": [ 44 | "To create your custom model on YOLOv5 you are going to need to label your custom dataset. To label an object detection dataset you may use Amazon SageMaker Ground Truth." 
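For orientation before you create the job: once a bounding-box labeling job completes, Ground Truth writes one JSON record per image to an output manifest, and the cells later in this notebook parse exactly these records. Here is a minimal illustrative record — the annotation key matches the labeling job name, and the path and numbers are made up:

```python
import json

# One illustrative line of a Ground Truth bounding-box output manifest.
# The "Object-Detection-Example" key matches the labeling job name used below;
# all values are made up for this sketch.
line = (
    '{"source-ref": "s3://mybucket/raw_images/dog1.jpg",'
    ' "Object-Detection-Example": {'
    '"image_size": [{"width": 640, "height": 480, "depth": 3}],'
    ' "annotations": [{"class_id": 0, "left": 100, "top": 50, "width": 200, "height": 150}]}}'
)

record = json.loads(line)
annotation = record["Object-Detection-Example"]["annotations"][0]
print(record["source-ref"])  # s3://mybucket/raw_images/dog1.jpg
print(annotation["class_id"], annotation["left"], annotation["top"])  # 0 100 50
```

Note that boxes are stored in pixels as left/top/width/height; the conversion cells later in this notebook turn them into YOLO's normalized format.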
45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "id": "bbdda986-63f2-4514-b6eb-57f935771daa", 50 | "metadata": {}, 51 | "source": [ 52 | "| ⚠️ WARNING: If you have already labeled an object detection dataset with Amazon SageMaker Ground Truth you can skip to the \"**Get Job Details**\" |\n", 53 | "| -- |" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "id": "d5523582-1a38-47cc-85d4-9c450ccedb87", 59 | "metadata": { 60 | "tags": [] 61 | }, 62 | "source": [ 63 | "#### Create a Labeling Workforce\n", 64 | "\n", 65 | "Follow the steps in the SageMaker Ground Truth documentation here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-create-private-console.html#create-workforce-labeling-job\n" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "id": "94e7cd63-e965-44d6-9f5b-0be8e7b9e8ad", 71 | "metadata": { 72 | "tags": [] 73 | }, 74 | "source": [ 75 | "#### Create your bounding box labeling job\n", 76 | "\n", 77 | "Follow the steps in the SageMaker Ground Truth documentation here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-create-labeling-job-console.html\n", 78 | "\n", 79 | "If using the AWS Console, you should create a labeling job with the following options:\n", 80 | "\n", 81 | "1. Job name: Set any unique name for the job name, for example \"Object-Detection-Example\".\n", 82 | "2. Leave the \"I want to specify a label attribute...\" option un-checked.\n", 83 | "3. Input data setup: Pick \"Automated data setup\".\n", 84 | "4. Input dataset location: Copy and paste the location of the single folder with your images in S3. Example: \"s3://mybucket/raw_images\".\n", 85 | "5. Output dataset location: Choose \"Same location as input dataset\".\n", 86 | "6. Data type: Choose \"Image\".\n", 87 | "7. IAM Role: Create a new role and give access to the S3 bucket where your images are located, or any S3 bucket.\n", 88 | "8. Now hit \"Complete data setup\" and wait for it to be ready.\n", 89 | "9. 
Task category: Choose \"Image\" and select \"Bounding box\", then hit \"Next\".\n", 90 | "10. Worker types: Select \"Private\" and choose your team for the \"Private teams\" option.\n", 91 | "11. For the Bounding box labeling tool: Enter a description and instructions, and for the \"Labels\" section add the relevant labels for your job. \n", 92 | "12. Finally choose \"Create\"." 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "id": "46aa584d-64c4-4e67-a576-59921a6c0f75", 98 | "metadata": {}, 99 | "source": [ 100 | "### Get Job Details" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "id": "a1703a57-9b6a-47b0-96e2-74eba601142a", 106 | "metadata": {}, 107 | "source": [ 108 | "Once you have finished labeling your images, let's retrieve the information we need to create our dataset in the format YOLOv5 expects" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 10, 114 | "id": "807670d9-3cee-47d5-9f30-0b74735f3367", 115 | "metadata": {}, 116 | "outputs": [], 117 | "source": [ 118 | "groundtruth_job_name = \"Object-Detection-Example\" ### <-- Replace with the name you used for your labeling job" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 11, 124 | "id": "810be406-7cc7-4098-b2c7-80eecd167523", 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "name": "stdout", 129 | "output_type": "stream", 130 | "text": [ 131 | "Job Status: Completed\n", 132 | "Manifest Uri: s3://buzecd-aiml-demos/ground-truth-tests/image-bounding-box/Output/Object-Detection-Example/manifests/output/output.manifest\n", 133 | "Labels Uri: s3://buzecd-aiml-demos/ground-truth-tests/image-bounding-box/Output/Object-Detection-Example/annotation-tool/data.json\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "response = sm_client.describe_labeling_job(\n", 139 | " LabelingJobName=groundtruth_job_name\n", 140 | ")\n", 141 | "\n", 142 | "labelingJobStatus = response[\"LabelingJobStatus\"]\n", 143 | "manifestUri = 
response[\"LabelingJobOutput\"][\"OutputDatasetS3Uri\"]\n", 144 | "labelsListUri = response[\"LabelCategoryConfigS3Uri\"]\n", 145 | "\n", 146 | "print(\"Job Status: \",labelingJobStatus)\n", 147 | "print(\"Manifest Uri: \", manifestUri)\n", 148 | "print(\"Labels Uri: \", labelsListUri)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "id": "249a30a9-32ec-4100-87f9-d0029ecea96e", 154 | "metadata": {}, 155 | "source": [ 156 | "### Get labels" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "id": "3a93a9c8-4528-46f2-8b7a-85c7bb089134", 162 | "metadata": {}, 163 | "source": [ 164 | "We need to retrieve the labels from the training job which are located in S3." 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 6, 170 | "id": "acf2cf92-96b0-4707-9799-8aafe5dca589", 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "def split_s3_path(s3_path):\n", 175 | " path_parts=s3_path.replace(\"s3://\",\"\").split(\"/\")\n", 176 | " bucket=path_parts.pop(0)\n", 177 | " key=\"/\".join(path_parts)\n", 178 | " return bucket, key\n", 179 | "\n", 180 | "def get_labels_list(labels_uri):\n", 181 | " labels = []\n", 182 | " bucket, key = split_s3_path(labels_uri)\n", 183 | " s3_resource.meta.client.download_file(bucket, key, 'labels.json')\n", 184 | " with open('labels.json') as f:\n", 185 | " data = json.load(f)\n", 186 | " for label in data[\"labels\"]:\n", 187 | " labels.append(label[\"label\"])\n", 188 | " return labels" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 7, 194 | "id": "eacbf363-06fe-460f-b559-be889ac59078", 195 | "metadata": {}, 196 | "outputs": [ 197 | { 198 | "name": "stdout", 199 | "output_type": "stream", 200 | "text": [ 201 | "Labels: ['Dog', 'Cat']\n" 202 | ] 203 | } 204 | ], 205 | "source": [ 206 | "labels = get_labels_list(labelsListUri)\n", 207 | "print(\"Labels: \",labels)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "id": 
"c40ef3ce-40d2-40cf-83c3-7412f37503ba", 213 | "metadata": {}, 214 | "source": [ 215 | "### Get manifest" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "id": "1f35b0b3-edbe-4e11-b8d7-1c6970581020", 221 | "metadata": {}, 222 | "source": [ 223 | "We need to retrieve the labeled manifest file from the training job which is located in S3" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 8, 229 | "id": "eb5c5536-cbf7-4f3d-9e75-4d423f05a5b4", 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "def get_manifest_file(manifest_uri):\n", 234 | " bucket, key = split_s3_path(manifest_uri)\n", 235 | " s3_resource.meta.client.download_file(bucket, key, 'output.manifest')\n", 236 | " return \"output.manifest\"" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 9, 242 | "id": "285684f2-26b9-4d03-8e62-76312459055c", 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "manifest = get_manifest_file(manifestUri)" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "id": "6f337cb0-f48d-4395-8928-0ddcf8a3ec8b", 252 | "metadata": {}, 253 | "source": [ 254 | "### Split manifest into training and validation" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "id": "cb8e4752-6c18-4fc4-986f-6e63f4532875", 260 | "metadata": {}, 261 | "source": [ 262 | "Now we have our manifest, let's split our data into training and validation" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "id": "860b8bc9-0e41-4d49-ac49-e3d223704170", 269 | "metadata": {}, 270 | "outputs": [], 271 | "source": [ 272 | "with open(manifest) as file:\n", 273 | " lines = file.readlines()\n", 274 | " data = numpy.array(lines)\n", 275 | " train_data, validation_data = train_test_split(data, test_size=0.2)\n", 276 | " \n", 277 | "print(\"The manifest contains {} annotations.\".format(len(data)))\n", 278 | "print(\"{} will be used for 
training.\".format(len(train_data)))\n", 279 | "print(\"{} will be used for validation.\".format(len(validation_data)))" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "id": "50a0779c-6aa5-4037-8790-7653f90fe4de", 285 | "metadata": {}, 286 | "source": [ 287 | "### Create YOLOv5 Training and Validation datasets" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "id": "89948cf6-8e54-4ecb-8b4a-3d6c29dd19ae", 293 | "metadata": {}, 294 | "source": [ 295 | "Lets download the images and create the annotation files in YOLOv5 expected format" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "id": "a63680b8-e8c9-4fbd-8944-403fb3479fd9", 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "dirs = [\"dataset/images/train\", \n", 306 | " \"dataset/labels/train\",\n", 307 | " \"dataset/images/validation\",\n", 308 | " \"dataset/labels/validation\"]\n", 309 | "\n", 310 | "for directory in dirs:\n", 311 | " !mkdir -p {directory}" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": null, 317 | "id": "05595658-d4fc-4fb2-8143-39167fb600d2", 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "def ground_truth_to_yolo(dataset, dataset_category):\n", 322 | " print(\"Downloading images and creating labels for the {} dataset\".format(dataset_category))\n", 323 | " for line in dataset:\n", 324 | " line = json.loads(line)\n", 325 | " \n", 326 | " # Variables\n", 327 | " object_s3_uri = line[\"source-ref\"]\n", 328 | " bucket, key = split_s3_path(object_s3_uri)\n", 329 | " image_filename = object_s3_uri.split(\"/\")[-1]\n", 330 | " txt_filename = '.'.join(image_filename.split(\".\")[:-1]) + \".txt\"\n", 331 | " txt_path = \"dataset/labels/{}/{}\".format(dataset_category, txt_filename)\n", 332 | " \n", 333 | " # Download image\n", 334 | " s3_resource.meta.client.download_file(bucket, key, \"dataset/images/{}/{}\".format(dataset_category,image_filename))\n", 335 | " \n", 
336 | " # Create txt with annotations\n", 337 | " with open(txt_path, 'w') as target:\n", 338 | " for annotation in line[groundtruth_job_name][\"annotations\"]:\n", 339 | " class_id = annotation[\"class_id\"]\n", 340 | " center_x = (annotation[\"left\"] + (annotation[\"width\"]/2)) / line[groundtruth_job_name][\"image_size\"][0][\"width\"]\n", 341 | " center_y = (annotation[\"top\"] + (annotation[\"height\"]/2)) / line[groundtruth_job_name][\"image_size\"][0][\"height\"]\n", 342 | " w = annotation[\"width\"] / line[groundtruth_job_name][\"image_size\"][0][\"width\"]\n", 343 | " h = annotation[\"height\"] / line[groundtruth_job_name][\"image_size\"][0][\"height\"]\n", 344 | " data = \"{} {} {} {} {}\\n\".format(class_id, center_x, center_y, w, h)\n", 345 | " target.write(data)" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "id": "b802e890-1622-43a4-86e2-e7276a816382", 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "ground_truth_to_yolo(train_data, \"train\")\n", 356 | "ground_truth_to_yolo(validation_data, \"validation\")" 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "id": "1a247340-635a-4fa1-b5c1-d884ea204091", 362 | "metadata": {}, 363 | "source": [ 364 | "### Validate the number of downloaded files" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": null, 370 | "id": "409f2780-b839-4906-8426-d0bcd6acd2ff", 371 | "metadata": {}, 372 | "outputs": [], 373 | "source": [ 374 | "def count_files(dirs):\n", 375 | " for directory in dirs:\n", 376 | " number = len([1 for x in list(os.scandir(directory)) if x.is_file()])\n", 377 | " print(\"There are {} elements in {}\".format(number, directory))\n", 378 | "\n", 379 | "count_files(dirs)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "id": "06c8037a-ac58-4b96-add5-34ba671a3166", 386 | "metadata": {}, 387 | "outputs": [], 388 | "source": [ 389 | "# TODO: Show images with bounding 
boxes" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "id": "7257e0ed-27a5-4443-9628-d291bacf9aa3", 395 | "metadata": {}, 396 | "source": [ 397 | "### Upload to S3 the labeled dataset" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "id": "fdd6a607-786f-4098-9162-c532e94fe6a3", 403 | "metadata": {}, 404 | "source": [ 405 | "Let's upload our dataset to S3, this will be used for the training job" 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": null, 411 | "id": "4a435d86-44be-4908-aef7-0f80a15d3834", 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [ 415 | "bucket = sm_session.default_bucket()\n", 416 | "#bucket = \"\" #Use this option if you want to use a specific S3 bucket\n", 417 | "dataset_s3_uri = sm_session.upload_data(\"dataset\", bucket, \"yolov5dataset\")\n", 418 | "print(\"Dataset located in: \",dataset_s3_uri)" 419 | ] 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "id": "6548b938-525f-450d-9382-9ce9595fcaa4", 424 | "metadata": {}, 425 | "source": [ 426 | "You have labeled your own custom dataset with Amazon SageMaker Ground Truth and split it a training and validation dataset in YOLOv5 expected format. For the next modules you will be able to use this dataset to train and deploy a custom YOLOv5 model" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "id": "944291de-7144-4585-8662-b0592ea2c98a", 432 | "metadata": {}, 433 | "source": [ 434 | "| ⚠️ WARNING: These are the details you will need to train your models based on the labeling job you completed. 
|\n", 435 | "| -- |" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": null, 441 | "id": "e7730b90-2fd6-4cdf-ba90-551cc28424bd", 442 | "metadata": {}, 443 | "outputs": [], 444 | "source": [ 445 | "print(\"Dataset S3 location: \", dataset_s3_uri)\n", 446 | "print(\"Labels: \", labels)" 447 | ] 448 | } 449 | ], 450 | "metadata": { 451 | "instance_type": "ml.t3.medium", 452 | "kernelspec": { 453 | "display_name": "Python 3 (Data Science)", 454 | "language": "python", 455 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" 456 | }, 457 | "language_info": { 458 | "codemirror_mode": { 459 | "name": "ipython", 460 | "version": 3 461 | }, 462 | "file_extension": ".py", 463 | "mimetype": "text/x-python", 464 | "name": "python", 465 | "nbconvert_exporter": "python", 466 | "pygments_lexer": "ipython3", 467 | "version": "3.7.10" 468 | } 469 | }, 470 | "nbformat": 4, 471 | "nbformat_minor": 5 472 | } 473 | -------------------------------------------------------------------------------- /notebooks/0 - Label your dataset with Amazon SageMaker GroundTruth/0b-Label-your-own-dataset with SM Processing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "d824ab39-fbc9-4076-9442-edca7af5bf3c", 6 | "metadata": {}, 7 | "source": [ 8 | "# Label your dataset with Amazon SageMaker Ground Truth and SM processing jobs" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "id": "e47ba6f4-e051-4b10-8cf8-29d55969cc3b", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import boto3\n", 19 | "import json\n", 20 | "import numpy\n", 21 | "import os\n", 22 | "import sagemaker\n", 23 | "\n", 24 | "from sagemaker.sklearn.processing import SKLearnProcessor\n", 25 | "from sagemaker.processing import ProcessingOutput\n", 26 | "\n", 27 | "sm_client = boto3.client('sagemaker')\n", 28 | 
"s3_resource = boto3.resource('s3')\n", 29 | "sm_session = sagemaker.Session()\n", 30 | "role = sagemaker.get_execution_role()\n", 31 | "region = boto3.Session().region_name" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "id": "fed07009-225d-4291-92ab-4fe6352bb7cd", 37 | "metadata": {}, 38 | "source": [ 39 | "### Create a labeling job in Amazon SageMaker Ground Truth" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "id": "2878ad12-24f2-4c08-bbb9-812713eba973", 45 | "metadata": {}, 46 | "source": [ 47 | "To create your custom model on YOLOv5 you are going to need to label your custom dataset. To label an object detection dataset you may use Amazon SageMaker Ground Truth." 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "id": "132888d6-089f-4703-9a0e-4d7bec664e63", 53 | "metadata": {}, 54 | "source": [ 55 | "| ⚠️ WARNING: If you have already labeled an object detection dataset with Amazon SageMaker Ground Truth you can skip to the \"**Get Job Details**\" |\n", 56 | "| -- |" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "id": "0334df93-d28e-47e7-8695-bf86354a4008", 62 | "metadata": {}, 63 | "source": [ 64 | "#### Create a Labeling Workforce\n", 65 | "\n", 66 | "Follow the steps in the SageMaker Ground Truth documentation here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-create-private-console.html#create-workforce-labeling-job\n" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "id": "873f1915-19a6-49ba-be92-a022c84738dc", 72 | "metadata": {}, 73 | "source": [ 74 | "#### Create your bounding box labeling job\n", 75 | "\n", 76 | "Follow the steps in the SageMaker Ground Truth documentation here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-create-labeling-job-console.html\n", 77 | "\n", 78 | "If using the AWS Console, you should create a labeling job with the following options:\n", 79 | "\n", 80 | "1. 
Job name: Set any unique name for the job name, for example \"Object-Detection-Example\".\n", 81 | "2. Leave the \"I want to specify a label attribute...\" option un-checked.\n", 82 | "3. Input data setup: Pick \"Automated data setup\".\n", 83 | "4. Input dataset location: Copy and paste the location of the single folder with your images in S3. Example: \"s3://mybucket/raw_images\".\n", 84 | "5. Output dataset location: Choose \"Same location as input dataset\".\n", 85 | "6. Data type: Choose \"Image\".\n", 86 | "7. IAM Role: Create a new role and give access to the S3 bucket where your images are located, or any S3 bucket.\n", 87 | "8. Now hit \"Complete data setup\" and wait for it to be ready.\n", 88 | "9. Task category: Choose \"Image\" and select \"Bounding box\", then hit \"Next\".\n", 89 | "10. Worker types: Select \"Private\" and choose your team for the \"Private teams\" option.\n", 90 | "11. For the Bounding box labeling tool: Enter a description and instructions, and for the \"Labels\" section add the relevant labels for your job. \n", 91 | "12. Finally choose \"Create\"." 
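The dataset conversion that the processing job performs later in this notebook boils down to a simple rescaling: each Ground Truth box, given in pixels as left/top/width/height, becomes a YOLO annotation line of class id plus center coordinates and size, all normalized by the image dimensions. A minimal sketch of that arithmetic, with illustrative numbers:

```python
# Convert one Ground Truth box (pixels: left, top, width, height) into a
# YOLO annotation line: class_id, x_center, y_center, width, height,
# all normalized to [0, 1]. The numbers here are illustrative.
img_w, img_h = 640, 480
class_id, left, top, box_w, box_h = 0, 100, 50, 200, 150

x_center = (left + box_w / 2) / img_w   # (100 + 100) / 640 = 0.3125
y_center = (top + box_h / 2) / img_h    # (50 + 75) / 480 ≈ 0.2604
yolo_line = "{} {} {} {} {}".format(class_id, x_center, y_center, box_w / img_w, box_h / img_h)
print(yolo_line)
```

One such line is written per object, into a `.txt` file that shares its base name with the image.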
92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "id": "4efa20cb-9e35-4a6b-be8d-8c0d3a910ad6", 97 | "metadata": {}, 98 | "source": [ 99 | "### Get Job Details and Labels" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "id": "5a174f40-80f4-46ac-b375-ca34324e9c1a", 105 | "metadata": {}, 106 | "source": [ 107 | "Once you have finished labeling your images, let's retrieve the information we need to create our processing job, which will create the dataset in the format YOLOv5 expects." 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "id": "6ce05191-bec6-473f-86d4-0824ca4925e9", 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "groundtruth_job_name = \"Object-Detection-Example\" ### <-- Replace with the name you used for your labeling job" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "id": "a489d7f7-59da-456d-9a4e-d092527f5a9e", 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "response = sm_client.describe_labeling_job(\n", 128 | "    LabelingJobName=groundtruth_job_name\n", 129 | ")\n", 130 | "\n", 131 | "labelingJobStatus = response[\"LabelingJobStatus\"]\n", 132 | "labelsListUri = response[\"LabelCategoryConfigS3Uri\"]\n", 133 | "\n", 134 | "print(\"Job Status: \",labelingJobStatus)\n", 135 | "print(\"Labels Uri: \", labelsListUri)" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "id": "a42cd1a5-1e75-43d0-801d-993746eba937", 141 | "metadata": {}, 142 | "source": [ 143 | "### Get labels" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "id": "158d4393-0101-4b8c-b088-e32ee4fe5621", 149 | "metadata": {}, 150 | "source": [ 151 | "We need to retrieve the labels from the labeling job; they are located in S3." 
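For reference, the label category file ("labels.json") downloaded below is a small JSON document. A sketch of its shape, matching what the parsing code in this notebook reads (label names are illustrative):

```python
import json

# Illustrative content of a Ground Truth label category configuration file.
# The "labels" list is the part the notebook's get_labels_list() iterates over.
labels_json = '{"document-version": "2018-11-28", "labels": [{"label": "Dog"}, {"label": "Cat"}]}'
labels = [entry["label"] for entry in json.loads(labels_json)["labels"]]
print(labels)  # ['Dog', 'Cat']
```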
152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "id": "e28cd619-5163-456d-b1e7-16320602f6f5", 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "def split_s3_path(s3_path):\n", 162 | " path_parts=s3_path.replace(\"s3://\",\"\").split(\"/\")\n", 163 | " bucket=path_parts.pop(0)\n", 164 | " key=\"/\".join(path_parts)\n", 165 | " return bucket, key\n", 166 | "\n", 167 | "def get_labels_list(labels_uri):\n", 168 | " labels = []\n", 169 | " bucket, key = split_s3_path(labels_uri)\n", 170 | " s3_resource.meta.client.download_file(bucket, key, 'labels.json')\n", 171 | " with open('labels.json') as f:\n", 172 | " data = json.load(f)\n", 173 | " for label in data[\"labels\"]:\n", 174 | " labels.append(label[\"label\"])\n", 175 | " return labels" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "id": "99536d91-042a-4f55-846c-b1c47060552e", 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "labels = get_labels_list(labelsListUri)\n", 186 | "print(\"Labels: \",labels)" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "id": "001766d4-67de-4c91-a1b8-249dd033814e", 192 | "metadata": {}, 193 | "source": [ 194 | "### Create a SageMaker Processing Job" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": null, 200 | "id": "bf2d5fc5-e088-43bd-9296-77216c13078a", 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "sklearn_processor = SKLearnProcessor(\n", 205 | " framework_version=\"1.0-1\",\n", 206 | " instance_type=\"ml.c5.xlarge\",\n", 207 | " env={'gt_job_name': groundtruth_job_name,\n", 208 | " 'region': region},\n", 209 | " instance_count=1,\n", 210 | " base_job_name=\"yolov5-process\",\n", 211 | " role=role,\n", 212 | " sagemaker_session = sm_session\n", 213 | ")" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "id": "df47018a-2205-4985-bb3a-767c8d06a72e", 220 | 
"metadata": {}, 221 | "outputs": [], 222 | "source": [ 223 | "sklearn_processor.run(\n", 224 | " outputs=[\n", 225 | " ProcessingOutput(output_name=\"train\", source=\"/opt/ml/processing/output/train\")\n", 226 | " ],\n", 227 | " code=\"code/preprocessing.py\",\n", 228 | ")" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "id": "6794675c-34c7-4318-b858-facc6bd8d05d", 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "dataset_s3_uri = sklearn_processor.jobs[-1].describe()[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "id": "e97a7784-f95a-4f8d-874b-ba232537cb2f", 244 | "metadata": {}, 245 | "source": [ 246 | "| ⚠️ WARNING: These are the details you will need to train your models based on the labeling job you completed. |\n", 247 | "| -- |" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": null, 253 | "id": "7f31131a-d18b-418d-9204-fb7eda2a3553", 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "print(\"Dataset S3 location: \", dataset_s3_uri)\n", 258 | "print(\"Labels: \", labels)" 259 | ] 260 | } 261 | ], 262 | "metadata": { 263 | "instance_type": "ml.t3.medium", 264 | "kernelspec": { 265 | "display_name": "Python 3 (Data Science)", 266 | "language": "python", 267 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" 268 | }, 269 | "language_info": { 270 | "codemirror_mode": { 271 | "name": "ipython", 272 | "version": 3 273 | }, 274 | "file_extension": ".py", 275 | "mimetype": "text/x-python", 276 | "name": "python", 277 | "nbconvert_exporter": "python", 278 | "pygments_lexer": "ipython3", 279 | "version": "3.7.10" 280 | } 281 | }, 282 | "nbformat": 4, 283 | "nbformat_minor": 5 284 | } 285 | -------------------------------------------------------------------------------- /notebooks/0 - Label your dataset with Amazon SageMaker 
GroundTruth/README.md: -------------------------------------------------------------------------------- 1 | ## Label and prepare your dataset with Amazon SageMaker Ground Truth 2 | 3 | In this section you will focus on preparing the dataset needed to train a custom YOLOv5 model. 4 | 5 | There are two available notebooks you can use to create your dataset: 6 | 7 | **A. Label and prepare your own dataset** : In this notebook, you will use Amazon SageMaker Ground Truth to create a labeling job and label your training images. After you have finished labeling your images, you will download them to the local environment, split them into training and validation datasets, convert the labels to the YOLO format, and upload the resulting datasets (needed to train your custom model) to S3. 8 | 9 | **B. Label and prepare your own dataset with SM Processing** : In this notebook, you will use Amazon SageMaker Ground Truth to create a labeling job and label your training images. After you have finished labeling your images, you will use an Amazon SageMaker Processing Job to split them into training and validation datasets, convert the labels to the YOLO format, and upload the resulting datasets (needed to train your custom model) to S3.
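The label conversion in both notebooks boils down to simple per-box arithmetic: Ground Truth stores absolute pixel coordinates (left, top, width, height), while YOLO expects a normalized box center plus normalized width and height. A minimal sketch of that math, mirroring the `ground_truth_to_yolo` helper in `code/preprocessing.py` (the sample box values below are illustrative):

```python
def gt_box_to_yolo(ann, img_w, img_h):
    """Convert a Ground Truth box (pixel left/top/width/height) to
    YOLO format: (class_id, cx, cy, w, h) with coords normalized to [0, 1]."""
    cx = (ann["left"] + ann["width"] / 2) / img_w
    cy = (ann["top"] + ann["height"] / 2) / img_h
    return (ann["class_id"], cx, cy, ann["width"] / img_w, ann["height"] / img_h)

# A 100x50 box with top-left corner at (200, 100) in a 640x480 image
# has its center at pixel (250, 125), i.e. roughly (0.39, 0.26) normalized:
box = {"class_id": 0, "left": 200, "top": 100, "width": 100, "height": 50}
class_id, cx, cy, w, h = gt_box_to_yolo(box, 640, 480)
print("{} {:.6f} {:.6f} {:.6f} {:.6f}".format(class_id, cx, cy, w, h))
```

Each output line is written to a `.txt` file that shares its base name with the image, one line per annotated object.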
10 | -------------------------------------------------------------------------------- /notebooks/0 - Label your dataset with Amazon SageMaker GroundTruth/code/preprocessing.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy 3 | import json 4 | import shutil 5 | import boto3 6 | from sklearn.model_selection import train_test_split 7 | 8 | envs = dict(os.environ) 9 | groundtruth_job_name = envs.get('gt_job_name') 10 | region = envs.get('region') 11 | print("Region: ",region) 12 | 13 | sm_client = boto3.client('sagemaker', region_name=region) 14 | s3_resource = boto3.resource('s3', region_name=region) 15 | 16 | response = sm_client.describe_labeling_job( 17 | LabelingJobName=groundtruth_job_name 18 | ) 19 | 20 | labelingJobStatus = response["LabelingJobStatus"] 21 | manifestUri = response["LabelingJobOutput"]["OutputDatasetS3Uri"] 22 | labelsListUri = response["LabelCategoryConfigS3Uri"] 23 | 24 | print("Manifest Uri: ", manifestUri) 25 | print("Labels Uri: ", labelsListUri) 26 | 27 | def split_s3_path(s3_path): 28 | path_parts=s3_path.replace("s3://","").split("/") 29 | bucket=path_parts.pop(0) 30 | key="/".join(path_parts) 31 | return bucket, key 32 | 33 | def get_labels_list(labels_uri): 34 | labels = [] 35 | bucket, key = split_s3_path(labels_uri) 36 | s3_resource.meta.client.download_file(bucket, key, 'labels.json') 37 | with open('labels.json') as f: 38 | data = json.load(f) 39 | for label in data["labels"]: 40 | labels.append(label["label"]) 41 | return labels 42 | 43 | labels = get_labels_list(labelsListUri) 44 | print("Labels: ",labels) 45 | 46 | def get_manifest_file(manifest_uri): 47 | bucket, key = split_s3_path(manifest_uri) 48 | s3_resource.meta.client.download_file(bucket, key, 'output.manifest') 49 | return "output.manifest" 50 | 51 | manifest = get_manifest_file(manifestUri) 52 | 53 | with open(manifest) as file: 54 | lines = file.readlines() 55 | data = numpy.array(lines) 56 | 
train_data, validation_data = train_test_split(data, test_size=0.2) 57 | 58 | print("The manifest contains {} annotations.".format(len(data))) 59 | print("{} will be used for training.".format(len(train_data))) 60 | print("{} will be used for validation.".format(len(validation_data))) 61 | 62 | os.makedirs("/opt/ml/processing/output/train/training_data", exist_ok=True) 63 | os.makedirs("/opt/ml/processing/output/train/training_data/images", exist_ok=True) 64 | os.makedirs("/opt/ml/processing/output/train/training_data/labels", exist_ok=True) 65 | os.makedirs("/opt/ml/processing/output/train/training_data/images/train", exist_ok=True) 66 | os.makedirs("/opt/ml/processing/output/train/training_data/labels/train", exist_ok=True) 67 | os.makedirs("/opt/ml/processing/output/train/training_data/images/validation", exist_ok=True) 68 | os.makedirs("/opt/ml/processing/output/train/training_data/labels/validation", exist_ok=True) 69 | 70 | 71 | def ground_truth_to_yolo(dataset, dataset_category): 72 | print("Downloading images and creating labels for the {} dataset".format(dataset_category)) 73 | for line in dataset: 74 | line = json.loads(line) 75 | # Variables 76 | object_s3_uri = line["source-ref"] 77 | bucket, key = split_s3_path(object_s3_uri) 78 | image_filename = object_s3_uri.split("/")[-1] 79 | txt_filename = '.'.join(image_filename.split(".")[:-1]) + ".txt" 80 | txt_path = "/opt/ml/processing/output/train/training_data/labels/{}/{}".format(dataset_category, txt_filename) 81 | 82 | # Download image 83 | s3_resource.meta.client.download_file(bucket, key, "/opt/ml/processing/output/train/training_data/images/{}/{}".format(dataset_category,image_filename)) 84 | 85 | # Create txt with annotations 86 | with open(txt_path, 'w') as target: 87 | for annotation in line[groundtruth_job_name]["annotations"]: 88 | class_id = annotation["class_id"] 89 | center_x = (annotation["left"] + (annotation["width"]/2)) / line[groundtruth_job_name]["image_size"][0]["width"] 90 | center_y = (annotation["top"] + (annotation["height"]/2)) / line[groundtruth_job_name]["image_size"][0]["height"]
91 | w = annotation["width"] / line[groundtruth_job_name]["image_size"][0]["width"] 92 | h = annotation["height"] / line[groundtruth_job_name]["image_size"][0]["height"] 93 | data = "{} {} {} {} {}\n".format(class_id, center_x, center_y, w, h) 94 | target.write(data) 95 | 96 | ground_truth_to_yolo(train_data, "train") 97 | ground_truth_to_yolo(validation_data, "validation") 98 | 99 | print("Completed running the processing job") -------------------------------------------------------------------------------- /notebooks/1 - Train and test on Amazon SageMaker Studio/README.md: -------------------------------------------------------------------------------- 1 | ## Train and test a custom YOLOv5 model on Amazon SageMaker Studio 2 | 3 | In the [Label your dataset with Amazon SageMaker GroundTruth]() notebooks, you prepared your custom dataset to train a YOLOv5 model (if you don't have a labeled dataset in YOLO format, please go to that section and create your dataset). 4 | 5 | In this section, you will download your generated dataset locally and start training your custom model. Once the training has finished, you will be able to test it and see the results of the predictions. 6 | -------------------------------------------------------------------------------- /notebooks/1 - Train and test on Amazon SageMaker Studio/Train-and-test-on-Amazon-SageMaker-Studio.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2bc39d14-7603-4588-8f35-e8c29320b90c", 6 | "metadata": { 7 | "tags": [] 8 | }, 9 | "source": [ 10 | "# Train and Test custom YOLOv5 on Amazon SageMaker Studio" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "369a9b33-2052-4c57-aa4f-b4b26e1c1d62", 16 | "metadata": {}, 17 | "source": [ 18 | "In this notebook we will train and test a custom YOLOv5 object detection CV model within Amazon SageMaker Studio. 
\n", 19 | "\n", 20 | "**Steps:**\n", 21 | "\n", 22 | "0. Initial configuration.\n", 23 | "1. Download a labeled dataset.\n", 24 | "3. Train the custom YOLOv5 model.\n", 25 | "4. Make predictions against the created model. \n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "id": "1e59f9ec-43f8-4040-bebc-4423afdc4b04", 31 | "metadata": {}, 32 | "source": [ 33 | "| ⚠️ WARNING: For this notebook to work, make sure to select the following settings in your jupyter environment: |\n", 34 | "| -- |\n", 35 | "Image: \"PyTorch 1.10 Python 3.8 GPU Optimized\"\n", 36 | "Instance_type: \"ml.g4dn.xlarge\" (fast launch)" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": "356f3d22-a857-40b5-8cb5-3185dd353233", 42 | "metadata": {}, 43 | "source": [ 44 | "## 0. Initial Configuration" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "id": "cea75758-365c-41a4-b229-01bb7689848b", 50 | "metadata": {}, 51 | "source": [ 52 | "#### Download the YOLOv5 repository" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": null, 58 | "id": "3b786198-5500-4fbe-bc2b-b36bbf109125", 59 | "metadata": { 60 | "tags": [] 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "!git clone --quiet https://github.com/ultralytics/yolov5\n", 65 | "!pip install -qr yolov5/requirements.txt" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "id": "a22293fe-b5c2-42c0-906a-add1fbf406e0", 72 | "metadata": { 73 | "tags": [] 74 | }, 75 | "outputs": [], 76 | "source": [ 77 | "import os\n", 78 | "import boto3\n", 79 | "import glob\n", 80 | "s3_resource = boto3.resource('s3')" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "id": "cddd1da6-e457-4e2b-acf3-d085643aa8c3", 86 | "metadata": {}, 87 | "source": [ 88 | "## 1. Download a labeled dataset with YOLOv5 expected format." 
89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "id": "cfefb767-e740-42e7-8b5b-5c30abeff17b", 94 | "metadata": {}, 95 | "source": [ 96 | "Before we train a custom YOLOv5 model, we need to have a labeled dataset. \n", 97 | "In the previous notebook \"0 - Label your dataset with Amazon SageMaker GroundTruth\" you will be able to label your own dataset and transform it into YOLOv5 expected format or use an example custom dataset. Once you have run through one of the two options you will have available the S3 dataset location and labels used." 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "id": "3856bee1-e3ed-4304-b7de-3ee4417430a4", 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "dataset_s3_uri = \"\"\n", 108 | "labels = []" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "id": "ee81a053-6502-453a-805a-5e640df0733f", 114 | "metadata": {}, 115 | "source": [ 116 | "#### Download the dataset" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": null, 122 | "id": "7f2122a6-1c42-4ce7-b8d4-6e72a81475f2", 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [ 126 | "def split_s3_path(s3_path):\n", 127 | " path_parts=s3_path.replace(\"s3://\",\"\").split(\"/\")\n", 128 | " bucket=path_parts.pop(0)\n", 129 | " key=\"/\".join(path_parts)\n", 130 | " return bucket, key" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "id": "a6794fb3-de08-4527-8a73-683175121493", 137 | "metadata": {}, 138 | "outputs": [], 139 | "source": [ 140 | "bucket,dataset_name = split_s3_path(dataset_s3_uri)\n", 141 | "bucket,dataset_name" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "id": "e10c3229-ac6d-4cfc-9588-38b4431b92c4", 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "def download_dataset(bucket_name, folder):\n", 152 | " bucket = s3_resource.Bucket(bucket_name) \n", 153 | " for obj 
in bucket.objects.filter(Prefix = folder):\n", 154 | " if not os.path.exists(os.path.dirname(obj.key)):\n", 155 | " os.makedirs(os.path.dirname(obj.key))\n", 156 | " bucket.download_file(obj.key, obj.key)" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": null, 162 | "id": "32a60468-0295-42ed-93d0-2efef95a016c", 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "download_dataset(bucket, dataset_name)" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "id": "6edd94f5-f78a-44a7-8c7d-a23cd3cfe90e", 172 | "metadata": {}, 173 | "source": [ 174 | "#### Lets explore our dataset" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "id": "44534876-38eb-4c01-a0ed-1447911ae23e", 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "for filename in glob.iglob(dataset_name + '**/**', recursive=True):\n", 185 | " print(filename)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "id": "d388275f-d930-44ac-be59-cd9e1312c7a4", 191 | "metadata": {}, 192 | "source": [ 193 | "#### Now let's add these data sources to the data library in the yolov5 folder for our model to train" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "id": "03755c69-4523-4d50-9cb2-e6a0b4ea2820", 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [ 203 | "with open(\"yolov5/data/custom-model.yaml\", 'w') as target:\n", 204 | " target.write(\"path: ../{}\\n\".format(dataset_name))\n", 205 | " target.write(\"train: images/train\\n\")\n", 206 | " target.write(\"val: images/validation\\n\")\n", 207 | " target.write(\"names:\\n\")\n", 208 | " for i, label in enumerate(labels):\n", 209 | " target.write(\" {}: {}\\n\".format(i, label))\n", 210 | " \n", 211 | "with open('yolov5/data/custom-model.yaml') as file:\n", 212 | " lines = file.readlines()\n", 213 | " for line in lines:\n", 214 | " print(line)" 215 | ] 216 | }, 217 | { 218 | "cell_type": 
"markdown", 219 | "id": "be7b8fbd-ccdb-4440-af14-ea428bcc1cce", 220 | "metadata": { 221 | "tags": [] 222 | }, 223 | "source": [ 224 | "## 3. Train the custom YOLOv5 model." 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "id": "ee2f8907-0640-4aa0-a4b0-80b061b43659", 231 | "metadata": { 232 | "tags": [] 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "!python yolov5/train.py --workers 4 --device 0 --img 640 --batch 8 --epochs 10 --data yolov5/data/custom-model.yaml --weights yolov5s.pt --cache" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "id": "27d5835f-e975-408f-97f1-3c47f44ba56e", 242 | "metadata": {}, 243 | "source": [ 244 | "## 4. Make inferences with the created model." 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "id": "3a3dfee9-e2fc-4058-a5aa-2ffe186d4a1f", 251 | "metadata": { 252 | "tags": [] 253 | }, 254 | "outputs": [], 255 | "source": [ 256 | "!python yolov5/detect.py --weights yolov5/runs/train/exp/weights/best.pt --img 640 --conf 0.5 --source \"\"" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "id": "660fdab9-e4d8-4a76-8e0e-4ca98fbecd4c", 262 | "metadata": {}, 263 | "source": [ 264 | "| ⚠️ WARNING: Remember to shut down the instance once you have finished with this notebook to prevent unnecessary charges. Head to the Running Terminals and Kernels tab and shut down the running instance. 
|\n", 265 | "| -- |" 266 | ] 267 | } 268 | ], 269 | "metadata": { 270 | "instance_type": "ml.g4dn.xlarge", 271 | "kernelspec": { 272 | "display_name": "Python 3 (PyTorch 1.10 Python 3.8 GPU Optimized)", 273 | "language": "python", 274 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/pytorch-1.10-gpu-py38" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.8.10" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 5 291 | } 292 | -------------------------------------------------------------------------------- /notebooks/2 - Train and deploy with Amazon SageMaker/Custom-YOLOv5-Train-and-Deploy-on-Amazon-SageMaker.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "078afd41-6103-4d36-8b39-2bf0cf662599", 6 | "metadata": {}, 7 | "source": [ 8 | "# Custom YOLOv5 Train and Deploy on Amazon SageMaker" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "c4b2cd73-cadd-4a8b-9c8d-55a36eafce19", 14 | "metadata": {}, 15 | "source": [ 16 | "In this notebook we will train and deploy custom YOLOv5 object detection CV model with Amazon SageMaker Training Jobs and Endpoints.\n", 17 | "\n", 18 | "**Steps:**\n", 19 | "\n", 20 | "0. Initial configuration.\n", 21 | "1. Locate a labeled dataset with YOLOv5 expected format.\n", 22 | "2. Train the custom YOLOv5 model with SageMaker Training Jobs.\n", 23 | "3. Deploy the model with SageMaker Endpoints." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "id": "33b14bf6-40e5-46b0-a0f6-8f1aa43fa9be", 29 | "metadata": {}, 30 | "source": [ 31 | "## 0. 
Initial Configuration" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "id": "7fd63dba-6e16-4bc7-a6a6-a3a80f6ab01a", 38 | "metadata": { 39 | "tags": [] 40 | }, 41 | "outputs": [], 42 | "source": [ 43 | "!pip install -qU sagemaker\n", 44 | "import json\n", 45 | "import numpy as np\n", 46 | "import pandas as pd\n", 47 | "import os\n", 48 | "import boto3\n", 49 | "import sagemaker\n", 50 | "import uuid\n", 51 | "import time\n", 52 | "import cv2\n", 53 | "import glob\n", 54 | "import matplotlib.pyplot as plt\n", 55 | "%matplotlib inline \n", 56 | "from sagemaker.pytorch.estimator import PyTorch\n", 57 | "from sagemaker.session import TrainingInput\n", 58 | "from sagemaker import get_execution_role\n", 59 | "from sagemaker.utils import name_from_base\n", 60 | "from sagemaker.pytorch import PyTorchModel\n", 61 | "from sagemaker.serializers import DataSerializer\n", 62 | "from sagemaker.deserializers import JSONDeserializer\n", 63 | "sm_session = sagemaker.Session()\n", 64 | "role = get_execution_role()\n", 65 | "s3_resource = boto3.resource('s3')" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "id": "c2699467-fa11-4fb6-8c4d-17941ea8f7ba", 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "!git clone --quiet https://github.com/ultralytics/yolov5\n", 76 | "!cp -r helper-code/* yolov5/" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "id": "ef53157e-e980-4f8e-bcf2-879d698de5ab", 82 | "metadata": {}, 83 | "source": [ 84 | "## 1. Locate a labeled dataset with YOLOv5 expected format." 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "id": "4810ead6-444b-47bf-a57e-c696e3f3ec44", 90 | "metadata": {}, 91 | "source": [ 92 | "Before we train a custom YOLOv5 model, we need to have a labeled dataset. 
In the previous notebook \"0 - Label your dataset with Amazon SageMaker GroundTruth\" you will be able to label your own dataset and transform it into YOLOv5 expected format or use an example custom dataset. Once you have run through one of the two options you will have available the S3 dataset location and labels used.\n" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "id": "8120ec27-1e47-4bd1-a925-1ef791f70be4", 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "dataset_s3_uri = \"\"\n", 103 | "labels = [\"\",\"\"]" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "id": "7f2a7284-1cb9-4fba-8333-76280927be70", 109 | "metadata": {}, 110 | "source": [ 111 | "### Download the dataset" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "id": "f19a7c76-163a-4894-a0c8-abc0478f6cfa", 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "def split_s3_path(s3_path):\n", 122 | " path_parts=s3_path.replace(\"s3://\",\"\").split(\"/\")\n", 123 | " bucket=path_parts.pop(0)\n", 124 | " key=\"/\".join(path_parts)\n", 125 | " return bucket, key\n", 126 | "\n", 127 | "def download_dataset(bucket_name, folder):\n", 128 | " bucket = s3_resource.Bucket(bucket_name)\n", 129 | " for obj in bucket.objects.filter(Prefix = folder):\n", 130 | " if not os.path.exists(os.path.dirname(obj.key)):\n", 131 | " os.makedirs(os.path.dirname(obj.key))\n", 132 | " if os.path.splitext(obj.key)[1]:\n", 133 | " bucket.download_file(obj.key, obj.key)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "id": "5fb56366-53e1-43fe-9ecf-77d39da90b83", 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "bucket,dataset_name = split_s3_path(dataset_s3_uri)\n", 144 | "download_dataset(bucket, dataset_name)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "id": "bb704665-7d73-4db5-b54e-e2b971ccfd51", 150 | "metadata": {}, 151 | 
"source": [ 152 | "### Lets explore our dataset" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "id": "3d89de35-c6ba-45ce-b221-cbf4e5ab0ee4", 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "for filename in glob.iglob(dataset_name + '**', recursive=True):\n", 163 | " print(filename)" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "id": "cfe17888-ce81-4c31-81cc-ce95a0212b7c", 169 | "metadata": {}, 170 | "source": [ 171 | "#### Now let's add these data sources to the data library in the yolov5 folder for our model to train" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "id": "84e1c6d9-dbce-4eb8-b168-fd0266b4786e", 178 | "metadata": {}, 179 | "outputs": [], 180 | "source": [ 181 | "with open(\"yolov5/data/custom-model.yaml\", 'w') as target:\n", 182 | " target.write(\"path: /opt/ml/input/data/training\\n\")\n", 183 | " target.write(\"train: images/train\\n\")\n", 184 | " target.write(\"val: images/validation\\n\")\n", 185 | " target.write(\"names:\\n\")\n", 186 | " for i, label in enumerate(labels):\n", 187 | " target.write(\" {}: {}\\n\".format(i, label))\n", 188 | " \n", 189 | "with open('yolov5/data/custom-model.yaml') as file:\n", 190 | " lines = file.readlines()\n", 191 | " for line in lines:\n", 192 | " print(line)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "id": "62a1251b-7e0f-4f96-bc11-d400a0ba04bd", 198 | "metadata": {}, 199 | "source": [ 200 | "## 3. Train the custom YOLOv5 model with SageMaker Training Jobs." 
201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "id": "b253a75e-6830-4577-91e0-4b4df8c268f9", 206 | "metadata": {}, 207 | "source": [ 208 | "#### First let's send our training data to S3" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "id": "7d788b55-71d1-45ac-9c8b-ccf8208819cb", 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "training_name = \"yolov5-t\"" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "id": "97834e07-2bb5-4bf4-9eb3-e7630197acab", 225 | "metadata": {}, 226 | "outputs": [], 227 | "source": [ 228 | "job_name = '{}-{}'.format(training_name,str(uuid.uuid4()))\n", 229 | "print(job_name)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": null, 235 | "id": "306f005b-2073-4496-a963-0ab1aadfb27a", 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "hyperparameters={\n", 240 | " \"workers\":\"8\",\n", 241 | " \"device\": \"0\",\n", 242 | " \"batch-size\": \"8\",\n", 243 | " \"epochs\": 50,\n", 244 | " \"data\": \"custom-model.yaml\",\n", 245 | " \"weights\": \"yolov5s.pt\",\n", 246 | " \"project\": \"/opt/ml/model\"\n", 247 | "}\n", 248 | "\n", 249 | "estimator = PyTorch(\n", 250 | " framework_version='1.11.0',\n", 251 | " py_version='py38',\n", 252 | " entry_point='train.py',\n", 253 | " source_dir='yolov5',\n", 254 | " hyperparameters=hyperparameters,\n", 255 | " instance_count=1,\n", 256 | " instance_type='ml.g4dn.xlarge',\n", 257 | " role=role,\n", 258 | " disable_profiler=True, \n", 259 | " debugger_hook_config=False\n", 260 | ")" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "id": "903f4558-1069-48ed-bf19-a208de4eccee", 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "train_input = TrainingInput(dataset_s3_uri)" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "id": 
"a238bc87-989b-4e83-9420-1c8a330d64a7", 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "estimator.fit(train_input, job_name=job_name)" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "id": "3d37759f-2c22-4f9a-bd6b-492dad543be1", 287 | "metadata": {}, 288 | "outputs": [], 289 | "source": [ 290 | "model_name = \"Model-\"+job_name\n", 291 | "model_data = 's3://{}/{}/output/model.tar.gz'.format(sm_session.default_bucket(), job_name)\n", 292 | "print(model_data)" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "id": "7637620a-bc37-495c-b4a1-b8b337f9f49d", 298 | "metadata": {}, 299 | "source": [ 300 | "## 4. Deploy your model to a SM Endpoint" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": null, 306 | "id": "5b40a48e-03f4-4e94-a262-f1de14934abb", 307 | "metadata": { 308 | "tags": [] 309 | }, 310 | "outputs": [], 311 | "source": [ 312 | "model = PyTorchModel(\n", 313 | " entry_point='detect.py',\n", 314 | " source_dir='yolov5',\n", 315 | " model_data=model_data,\n", 316 | " framework_version='1.11.0',\n", 317 | " py_version='py38',\n", 318 | " role=role,\n", 319 | " name=model_name\n", 320 | ")" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "id": "7b20c386-f663-4263-b8e7-e97759554948", 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.large')\n", 331 | "predictor.deserializer = JSONDeserializer()" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": null, 337 | "id": "549daa11-c68d-4905-ba75-a62f1a6b513b", 338 | "metadata": {}, 339 | "outputs": [], 340 | "source": [ 341 | "predictor.serializer =DataSerializer(content_type=\"image/png\")" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "id": "6b49fac4-534d-4ed1-a922-fbd03a867585", 347 | "metadata": {}, 348 | "source": [ 349 | "### Display predictions" 
350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "id": "b620af19-e252-4c25-823f-e817e41ad04c", 356 | "metadata": {}, 357 | "outputs": [], 358 | "source": [ 359 | "test_files_dir=\"test-images\"" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": null, 365 | "id": "36463c30-7767-4414-8b65-3e3d9b2460a1", 366 | "metadata": {}, 367 | "outputs": [], 368 | "source": [ 369 | "def draw_label(image, box, conf, label):\n", 370 | " bbox = np.array(box).astype(np.int32)\n", 371 | " cv2.rectangle(image, (bbox[0], bbox[1]), (bbox[2], bbox[3]), [255,0,0], 2, cv2.LINE_AA)\n", 372 | " cv2.putText(image, \"{}:{}\".format(label,str(conf)[0:4]), (bbox[0], bbox[1] - 10), 0, 1e-3 * image.shape[0], [255,0,0], 2)\n", 373 | " \n", 374 | "def resize_bb(old, new, min_b, max_b):\n", 375 | " old = np.array(old)\n", 376 | " new = np.array(new)\n", 377 | " min_b = np.array(min_b)\n", 378 | " max_b = np.array(max_b)\n", 379 | " min_xy = min_b/(old/new)\n", 380 | " max_xy = max_b/(old/new)\n", 381 | " return [int(min_xy[0]),int(min_xy[1]),int(max_xy[0]),int(max_xy[1])]\n", 382 | "\n", 383 | "def plot_image(img):\n", 384 | " dpi = 80\n", 385 | " figsize = img.shape[1] / float(dpi), img.shape[0] / float(dpi)\n", 386 | " fig = plt.figure(figsize=figsize)\n", 387 | " ax = fig.add_axes([0, 0, 1, 1])\n", 388 | " ax.axis('off')\n", 389 | " plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))\n", 390 | "\n", 391 | "def make_prediction(imgdir,image):\n", 392 | " # Get predictions\n", 393 | " img_path = \"{}/{}\".format(imgdir,image)\n", 394 | " data = open(img_path, 'rb').read()\n", 395 | " pr = json.loads(predictor.predict(data))\n", 396 | " df = pd.DataFrame(data=pr[\"data\"], index = pr[\"index\"], columns = pr[\"columns\"])\n", 397 | " \n", 398 | " # Display labels\n", 399 | " img = cv2.imread(img_path)\n", 400 | " imgHeight,imgWidth,_ = img.shape\n", 401 | "\n", 402 | " for index, row in df.iterrows():\n", 403 | " if row['confidence'] > 0.3:\n", 404 | 
" new_boxes = resize_bb([640,640],[imgWidth,imgHeight],[row['xmin'],row['ymin']],[row['xmax'],row['ymax']])\n", 405 | " draw_label(img, new_boxes,row[\"confidence\"],row['name'])\n", 406 | "\n", 407 | " plot_image(img)" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": null, 413 | "id": "8ca707e9-58a7-4173-9271-0f3c187cdeca", 414 | "metadata": {}, 415 | "outputs": [], 416 | "source": [ 417 | "for image in os.listdir(test_files_dir):\n", 418 | " if image.lower().endswith(('.png', '.jpg', '.jpeg')):\n", 419 | " make_prediction(test_files_dir,image)" 420 | ] 421 | } 422 | ], 423 | "metadata": { 424 | "instance_type": "ml.t3.medium", 425 | "kernelspec": { 426 | "display_name": "Python 3 (Data Science)", 427 | "language": "python", 428 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" 429 | }, 430 | "language_info": { 431 | "codemirror_mode": { 432 | "name": "ipython", 433 | "version": 3 434 | }, 435 | "file_extension": ".py", 436 | "mimetype": "text/x-python", 437 | "name": "python", 438 | "nbconvert_exporter": "python", 439 | "pygments_lexer": "ipython3", 440 | "version": "3.7.10" 441 | } 442 | }, 443 | "nbformat": 4, 444 | "nbformat_minor": 5 445 | } 446 | -------------------------------------------------------------------------------- /notebooks/2 - Train and deploy with Amazon SageMaker/README.md: -------------------------------------------------------------------------------- 1 | ## Train and Deploy a Custom YOLOv5 Model with Amazon SageMaker Training Jobs and Endpoints 2 | 3 | In the [Label your dataset with Amazon SageMaker GroundTruth notebooks](), you prepared your custom dataset to train a YOLOv5 model (if you don't have a labeled dataset in YOLO format, please go to that section and create your dataset). 4 | 5 | In this section, you will use Amazon SageMaker features to train and test your custom YOLOv5 model without having to worry about infrastructure management. 
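The notebooks point YOLOv5's `train.py` at the dataset through a small data-config YAML written into `yolov5/data/`. A minimal sketch of how such a file can be generated, mirroring the notebook's approach (the dataset path and label names below are illustrative):

```python
# Sketch: build the YOLOv5 data-config text the notebooks write to
# yolov5/data/custom-model.yaml. Paths and label names are illustrative.
def build_data_config(dataset_path, labels):
    lines = [
        "path: {}".format(dataset_path),  # dataset root inside the container
        "train: images/train",            # training images, relative to root
        "val: images/validation",         # validation images, relative to root
        "names:",
    ]
    # YOLOv5 maps each class index to a label name
    lines += ["  {}: {}".format(i, label) for i, label in enumerate(labels)]
    return "\n".join(lines) + "\n"

config = build_data_config("/opt/ml/input/data/training", ["cat", "dog"])
print(config)
```

The `path` entry points at the dataset root that the SageMaker Training Job mounts inside the container; `train` and `val` are resolved relative to it.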
6 | You will first locate a labeled custom dataset to be used and pass it through to a SageMaker Training Job. Once the training has finished, you will deploy the model to an endpoint on SageMaker and test it out. 7 | -------------------------------------------------------------------------------- /notebooks/2 - Train and deploy with Amazon SageMaker/helper-code/detect.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import platform 4 | import sys 5 | import torch 6 | import json 7 | import numpy as np 8 | import cv2 9 | 10 | def model_fn(model_dir): 11 | os.system("pip install seaborn") 12 | torch.hub._validate_not_a_forked_repo=lambda a,b,c: True 13 | model = torch.hub.load("ultralytics/yolov5", "custom", path="/opt/ml/model/exp/weights/best.pt", force_reload=True) 14 | print("Model Loaded") 15 | return model 16 | 17 | def input_fn(input_data, content_type): 18 | 19 | if content_type in ['image/png','image/jpeg']: 20 | img = np.frombuffer(input_data, dtype=np.uint8) 21 | img = cv2.imdecode(img, cv2.IMREAD_COLOR)[..., ::-1] 22 | img = cv2.resize(img, (640, 640)) 23 | return img 24 | else: 25 | raise ValueError('Unsupported ContentType: ' + content_type) 26 | 27 | 28 | def predict_fn(input_data, model): 29 | print("Making inference") 30 | results = model(input_data) 31 | print(results) 32 | df = results.pandas().xyxy[0] 33 | return df.to_json(orient="split") -------------------------------------------------------------------------------- /src/images/banner-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-train-and-deploy-yolov5/f88d6b81e515a15eb783f865980028cf205d76b5/src/images/banner-1.png --------------------------------------------------------------------------------
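Because `input_fn` in `detect.py` resizes every incoming image to 640x640 before inference, the endpoint returns box coordinates in that 640x640 space, and the deployment notebook's `resize_bb` maps them back to the original image size. A standalone sketch of that mapping (pure coordinate math, no AWS calls; function and variable names are illustrative):

```python
def rescale_box(model_size, orig_size, box):
    """Map an (xmin, ymin, xmax, ymax) box predicted on a model_size
    (width, height) image back to the original orig_size (width, height) image."""
    sx = orig_size[0] / model_size[0]  # horizontal scale factor
    sy = orig_size[1] / model_size[1]  # vertical scale factor
    xmin, ymin, xmax, ymax = box
    return [int(xmin * sx), int(ymin * sy), int(xmax * sx), int(ymax * sy)]

# A box in the middle of a 640x640 prediction maps to the middle
# of the original 1280x960 image:
print(rescale_box((640, 640), (1280, 960), (160, 160, 480, 480)))
# → [320, 240, 960, 720]
```

Skipping this step would draw the boxes in the wrong place for any test image that is not already 640x640.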