├── LICENSE
├── README.md
├── WenzhenZhu_FP.nb
├── checkpoint
│   ├── 2017-07-05T04:14:54_0_03_04905_4.34e-1.wlnet
│   ├── 2017-07-05_0_10_17700_3.08e-1.wlnet
│   └── hour9.wlnet
├── nets
│   └── enet.wlnet
├── preprocessings
│   ├── convertMaskIntoTensor.wl
│   └── convert_coco.wl
├── report
│   └── graphics
│       └── bboxPrediction.png
├── src
│   ├── Utility.wl
│   └── Visualization.wl
└── test
    └── test.nb

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Wenzhen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Instance Segmentation

## Introduction
**Instance Segmentation** is one of many interesting computer vision tasks; it combines **object detection** and **semantic segmentation**. It detects each object and, at the same time, generates a segmentation mask for it, which you can think of as classifying each pixel: does it belong to the object or not?

Therefore, solving object detection and semantic segmentation together is a good approach to instance segmentation. At the summer school, we borrowed the framework of Mask R-CNN to combine object detection and semantic segmentation in one pipeline, and produced some promising results.

Mask R-CNN is the latest step in a line of work: R-CNN -> Fast R-CNN -> Faster R-CNN -> Mask R-CNN. At each stage, researchers solved a bottleneck to get faster and better performance. The R stands for region-based, so R-CNN is a region-based convolutional neural network. Mask R-CNN has two stages: the first stage proposes valid bounding boxes, which you can think of as "blobby" image regions, since blobby regions are likely to contain objects. In the early work, researchers fed these warped image regions into a convolutional network with two heads at the output: a regression head that produces the bounding box, and an SVM-like head that classifies the region. People kept working on this and made the network more efficient with tricks such as swapping the order of the proposal layer and the convolutional layers to avoid unnecessary computation.

![enter image description here][1]
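To make the two-head idea concrete, here is a minimal sketch in the Wolfram neural net framework (not the actual R-CNN heads: the trunk, the input size, and the 20-class output are made up for illustration). A shared trunk feeds one head that regresses the four box coordinates and a second head that classifies the region:

    (* hypothetical two-head network: shared trunk, box-regression head, classification head *)
    twoHeadNet = NetGraph[
      <|
        "trunk" -> {ConvolutionLayer[16, 3], Ramp, PoolingLayer[2], FlattenLayer[]},
        "boxHead" -> LinearLayer[4],                       (* {x, y, w, h} *)
        "classHead" -> {LinearLayer[20], SoftmaxLayer[]}   (* class probabilities *)
      |>,
      {"trunk" -> "boxHead" -> NetPort["BBox"],
       "trunk" -> "classHead" -> NetPort["Class"]},
      "Input" -> NetEncoder[{"Image", {64, 64}}]
    ]

Training both heads jointly is what lets the later R-CNN variants share one expensive feature computation across the two tasks.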
There is a GitHub repo, [FastMaskRCNN][2], where several machine learning enthusiasts are trying to reproduce the paper in TensorFlow. I already obtained weights from >400k training steps, but I haven't tested them yet; I will keep working on it after summer school. The Mask R-CNN model is too big to finish within 2 weeks, especially since I am new to the TensorFlow framework. Here is the graph visualization from TensorBoard, which looks really complicated; I haven't figured out the whole pipeline yet.
![enter image description here][3]

## My Project
Building the Mask R-CNN network on the Mathematica side turned out to be too complicated to finish before the deadline, so in the last two days of summer school I did something simpler, inspired by Mask R-CNN's framework: use the bounding-box region and the corresponding mask to train a network that produces a binary mask (pixel-to-pixel). This step is essentially semantic segmentation. To get this pixel-to-pixel training dataset, I wrote a script that processes the 24k images of the [COCO][4] train2014 dataset and crops out the bounding-box region based on the annotation `json` file. Here is how the information is encoded for object instance annotations:

    annotation{
    "id" : int,
    "image_id" : int,
    "category_id" : int,
    "segmentation" : RLE or [polygon],
    "area" : float,
    "bbox" : [x,y,width,height],
    "iscrowd" : 0 or 1,
    }

The original annotation `json` data looks like this once imported:

    {"area" -> 54653.,
     "segmentation" -> {{312.29, 562.89, 402.25, 511.49, 400.96, 425.38,
        398.39, 372.69, 388.11, 332.85, 318.71, 325.14, 295.58, 305.86,
        269.88, 314.86, 258.31, 337.99, 217.19, 321.29, 182.49, 343.13,
        141.37, 348.27, 132.37, 358.55, 159.36, 377.83, 116.95, 421.53,
        167.07, 499.92, 232.61, 560.32, 300.72, 571.89}}, "iscrowd" -> 0,
     "bbox" -> {116.95, 305.86, 285.3, 266.03}, "image_id" -> 480023,
     "category_id" -> 58, "id" -> 86}

The code is very simple; we just need to do two things (a worked example of the coordinate conversion follows the code):

- Use the bounding box `{x, y, dw, dh}`, converted into image coordinates, to trim the image down to the bounding-box region.
- Use the list of points (the vertices of the polygon encoding the ground-truth segmentation mask) and the image dimensions to produce the corresponding mask for that region.

    convertAnnotationBBoxIntoImageCoord[bbox_List, h_]:= Module[
      {x1, y1, dw, dh},
      {x1, y1, dw, dh} = bbox;
      Transpose[{0, h} + {1, -1} Transpose[Partition[{x1, y1, x1 + dw, y1 + dh}, 2]]]
    ]
    (* flip each polygon's y coordinates: COCO's origin is the top-left corner,
       while image graphics coordinates start at the bottom-left *)
    convertAnnotationMask[vtx_List, h_]:=
      Map[Transpose[{0, h} + {1, -1} Transpose[Partition[#, 2]]]&, vtx]
    getMask[pts_List, w_, h_]:= Module[
      {vtx},
      vtx = convertAnnotationMask[pts, h];
      Binarize @ Rasterize[Graphics[{White, Polygon@@vtx}, PlotRange->{{0, w}, {0, h}},
        Background->Black], "Image", ImageSize->{w, h}]
    ]
    extractBBoxAndMask[img_, bbox_, segPts_List]:= Module[
      {bboxLocal, w, h, boxRegion, maskRegion},
      {w, h} = ImageDimensions[img];
      bboxLocal = convertAnnotationBBoxIntoImageCoord[bbox, h];
      boxRegion = ImageTrim[img, bboxLocal];
      maskRegion = ImageTrim[getMask[segPts, w, h], bboxLocal];
      {boxRegion, maskRegion}
    ]
    imgsAndMasks =
      extractBBoxAndMask[imgs[[#]], bboxes[[#]], maskCoord[[#]]] & /@ Range[10]

![enter image description here][5]
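As a quick sanity check of the coordinate conversion, take the `bbox` from the sample annotation above and assume (for illustration only) an image height of 640. The x values pass through while the y values get flipped:

    convertAnnotationBBoxIntoImageCoord[{116.95, 305.86, 285.3, 266.03}, 640]
    (* {{116.95, 334.14}, {402.25, 68.11}} -- the two opposite corners expected by ImageTrim *)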
For the network, I used ENet, which is very fast and efficient. The Mask R-CNN paper used an FCN, the standard network for semantic segmentation; I constructed that network in Mathematica as well and will try it later.

The ENet architecture is as follows:

![enter image description here][6]
![enter image description here][7]

The output is a 256 * 256 * 2 tensor produced by a softmax layer, so it encodes the mask as Pr[this pixel belongs to the object]. Therefore, I also needed to convert each mask image from {0, 1} binary values into {1, 2} class labels and save it as a `.dat` file for the training labels (this is what `preprocessings/convertMaskIntoTensor.wl` does). Because I was running out of time, I just trained with this simple input and output. A better way, suggested by Etienne, is to extract the output of YOLO's final convolutional layer and feed it in as a feature near the output, which I will definitely try soon.

I trained my network for only 9 hours on a single Tesla K80 GPU and already got very promising results.

My `instanceSegmentation[image, net, detectionThreshold, overlapThreshold]` works as follows:

1. Use the YOLO network as a detector to produce labels, bounding boxes, and probabilities.

2. Use the bounding boxes to crop each object region out of the image and feed it to our trained network.

3. Take the output tensor, convert it to a binary image, resize it back to the bounding-box dimensions, and place it at the bounding-box position in the full image using `ImagePad`.

    padRegionMask[trimmedMask_, bboxLocal_, w_, h_]:= Block[
      {x1, y1, x2, y2},
      {{x1, y1}, {x2, y2}} = bboxLocal;
      ImagePad[trimmedMask,
        {{x1, w - x2},
         {y1, h - y2}}]
    ]

    instanceSegmentation[img_, ennet_, detectionThreshold_, overlapThreshold_]:= Module[
      {labels, bboxes, probs, masks, coloredMasks, yoloVis, yoloRes, rectangles, centers},
      yoloRes = detection[img, detectionThreshold, overlapThreshold];
      rectangles = Transpose[yoloRes][[2]];
      {labels, bboxes, probs} = convertYoloResult[yoloRes];
      centers = Mean /@ bboxes;
      masks = produceMask[ennet, img, bboxes];
      coloredMasks = Flatten[{RandomColor[], Opacity[.45], #}& /@ masks];
      yoloVis = Transpose @ MapThread[
        {
          {Darker @ Green, Opacity[0], #2},
          Style[
            Inset[#1<>" \n("<>ToString@Round[#3, .01]<>")", #4, {Center, Center}],
            FontSize -> Scaled[.03], FontColor -> White, GrayLevel[0,1], Background -> GrayLevel[1,0]
          ]
        }&, {labels, rectangles, probs, centers}];
      HighlightImage[img, Join[{coloredMasks, yoloVis}], ImagePadding -> Scaled[.02]]
    ]

Ok, here are some results; I only started training today.

Many cute dogs

![dogs][8]

Me and my mentor

![Me with my mentor][9]

Me and my phone

![Me with my phone][10]

Me and my coffee

![Me with my coffee][11]

Me and my handbag

![Me with my handbag][12]

Me, my classmate, and his phone

![Me with my classmate and his phone][13]



## Some personal reflection
I enjoyed the summer school overall. I had been hoping to explore the TensorFlow framework and watch the Stanford CS231n class for a few months, but I was always occupied with classes, other projects, lab assignments, coding interviews, etc. I finally found some peaceful time to sit down and learn the things I had always wanted to learn. Along the way, I was also amazed by the neural network framework the Wolfram people have developed: a powerful and user-friendly framework that inherits the Wolfram Language's elegant syntax and interactivity. I still have some questions about the framework and plan to learn more about it.
## Future Direction
1. Use an FCN to do the mask semantic segmentation.
2. After obtaining the trained Mask R-CNN network, deploy it on a server and build an interesting iOS application.
3. Collaborate with medical school people and apply Mask R-CNN to medical imaging problems.


## Reference

1. Mask R-CNN
   Paper: https://arxiv.org/abs/1703.06870
   Code (under testing): https://github.com/CharlesShang/FastMaskRCNN
2. ENet: https://arxiv.org/abs/1606.02147
3. YOLO v2: https://pjreddie.com/darknet/yolo/
4. Project repo: https://github.com/zhuwenzhen/InstanceSegmentation


[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Framework.png&userId=524853
[2]: https://github.com/CharlesShang/FastMaskRCNN
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=png.png&userId=524853
[4]: http://mscoco.org/dataset/#download
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0515.14.08.png&userId=524853
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Enet.png&userId=524853
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0515.22.14.png&userId=524853
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0516.02.58.png&userId=524853
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0516.20.56.png&userId=524853
[10]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0516.20.47.png&userId=524853
[11]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0516.21.35.png&userId=524853
[12]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0516.25.30.png&userId=524853
[13]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2017-07-0516.25.38.png&userId=524853

--------------------------------------------------------------------------------
/checkpoint/2017-07-05T04:14:54_0_03_04905_4.34e-1.wlnet:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhuwenzhen/instance-segmentation/fc395b8f60ae518379bfd5959ff868d2766c8148/checkpoint/2017-07-05T04:14:54_0_03_04905_4.34e-1.wlnet

--------------------------------------------------------------------------------
/checkpoint/2017-07-05_0_10_17700_3.08e-1.wlnet:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhuwenzhen/instance-segmentation/fc395b8f60ae518379bfd5959ff868d2766c8148/checkpoint/2017-07-05_0_10_17700_3.08e-1.wlnet

--------------------------------------------------------------------------------
/checkpoint/hour9.wlnet:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhuwenzhen/instance-segmentation/fc395b8f60ae518379bfd5959ff868d2766c8148/checkpoint/hour9.wlnet

--------------------------------------------------------------------------------
/nets/enet.wlnet:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhuwenzhen/instance-segmentation/fc395b8f60ae518379bfd5959ff868d2766c8148/nets/enet.wlnet
--------------------------------------------------------------------------------
/preprocessings/convertMaskIntoTensor.wl:
--------------------------------------------------------------------------------
(* ::Package:: *)

dir = NotebookDirectory[];


json = Import[dir <> "instances_train2014.json"];


instance = Association /@ json[[1, 2, All]];
imgIdList = #["image_id"]& /@ instance;


(* resize a binarized mask to 256 x 256 and shift its {0, 1} pixel values to the class labels {1, 2} *)
convertToOutputFormat[mask_Image] :=
    ImageData[ImageResize[mask, {256, 256}], "Bit"] + 1

(* left-pad an integer id with zeros to 12 digits, matching the COCO file naming scheme *)
padZero[num_] := Module[
    {len},
    len = Length[IntegerDigits[num]];
    StringJoin @ Table["0", 12 - len] <> ToString[num]
]


(* quick test on the first image id *)
imageId = imgIdList[[1]];
mask = Import[dir <> "myTrain/mask/mask_" <> padZero @ imageId <> ".jpg"];


imgIdListNoDuplicates = Union @ imgIdList;


Length[imgIdListNoDuplicates]


(* convert every mask image into a .dat label tensor, reporting progress as we go *)
progress = 0;
SetSharedVariable[progress];

AbsoluteTiming[ParallelDo[
    (*1. Import mask i*)
    imageId = imgIdListNoDuplicates[[i]];
    mask = Import[dir <> "myTrain/mask/mask_" <> padZero @ imageId <> ".jpg"];
    (*2. Convert it and export it as a training label tensor*)
    outputTensor = convertToOutputFormat[mask];
    Export[dir <> "myTrain/mask_data/mask_" <> padZero@imageId <> ".dat", outputTensor];
    Print[++progress, " (", progress / Length[imgIdListNoDuplicates] * 100., " %)"],
    {i, Length[imgIdListNoDuplicates]}]]

--------------------------------------------------------------------------------
/preprocessings/convert_coco.wl:
--------------------------------------------------------------------------------
(* ::Package:: *)

(* ::Title:: *)
(*Convert Coco Into Bounding Boxes And Masks*)


(* ::Subtitle:: *)
(*Wenzhen Zhu*)


dir = ParentDirectory[NotebookDirectory[]]


(* ::Section:: *)
(*To Do*)


(* ::ItemNumbered:: *)
(*Some of the masks are not encoded in "Polygon" format; add a condition to skip them (or throw them out at the beginning) instead of parsing them. See the sketch after this list.*)


(* ::ItemNumbered:: *)
(*Brainstorm how to optimize this.*)
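(* ::Text:: *)
(*A minimal sketch for to-do item 1 (an untested assumption about the imported JSON shape): COCO stores crowd masks in run-length encoding, which imports with a "counts" key, while polygon masks import as plain coordinate lists, so dropping entries containing "counts" filters out the RLE masks up front. `data` is defined in the Set up section below, hence the sketch is left commented.*)


(*polygonOnly = Select[data, FreeQ[#["segmentation"], "counts"]&];*)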
(* ::Section:: *)
(*Helper*)


(* left-pad an integer id with zeros to 12 digits, matching the COCO file naming scheme *)
padZero[num_] := Module[
    {len},
    len = Length[IntegerDigits[num]];
    StringJoin @ Table["0", 12 - len] <> ToString[num]
]


(* flip each polygon's y coordinates: COCO's origin is the top-left corner,
   while image graphics coordinates start at the bottom-left *)
convertAnnotationMask[vtx_List, h_] :=
    Map[Transpose[{0, h} + {1, -1} Transpose[Partition[#, 2]]]&, vtx]


(* rasterize the annotation polygons into a white-on-black binary mask of size w x h *)
getMask[pts_List, w_, h_] := Module[
    {vtx},
    vtx = convertAnnotationMask[pts, h];
    Binarize @ Rasterize[Graphics[{White, Polygon@@vtx}, PlotRange -> {{0, w}, {0, h}},
        Background -> Black], "Image", ImageSize -> {w, h}]
]


(* convert a COCO box {x, y, dw, dh} (top-left origin) into the two-corner,
   bottom-left-origin form expected by ImageTrim *)
convertAnnotationBBoxIntoImageCoord[bbox_List, h_] := Module[
    {x1, y1, dw, dh},
    {x1, y1, dw, dh} = bbox;
    Transpose[{0, h} + {1, -1} Transpose[Partition[{x1, y1, x1 + dw, y1 + dh}, 2]]]
]


(* crop both the image and its rasterized mask down to the bounding-box region *)
extractBBoxAndMask[img_, bbox_, segPts_List] := Module[
    {bboxLocal, w, h, boxRegion, maskRegion},
    {w, h} = ImageDimensions[img];
    bboxLocal = convertAnnotationBBoxIntoImageCoord[bbox, h];
    boxRegion = ImageTrim[img, bboxLocal];
    maskRegion = ImageTrim[getMask[segPts, w, h], bboxLocal];
    {boxRegion, maskRegion}
]


(* ::Subsection:: *)
(*Yolo-extractBBox*)


extractBBox[img_, bbox_] := ImageTrim[img, bbox]


(* ::Section:: *)
(*Set up*)


json = Import[dir <> "/coco/annotations/instances_train2014.json"];


imgIdList = Import[dir <> "/dataset/imgIdAll_train2014.mx"];


data = Association /@ json[[1, 2, All]];


(* NB: keying on image_id keeps only one annotation per image;
   later entries with the same id overwrite earlier ones *)
dataDictionary = Association[#["image_id"] -> #& /@ data];


getNData[startInd_, endInd_] := dataDictionary[#]& /@ imgIdList[[startInd ;; endInd]]


(* process annotations 51-100 in this run; the previous batch is kept commented for reference *)
(*start = 11;
end = 50;*)
start = 51;
end = 100;
len = end - start;


nJson = getNData[start, end];


bboxList = #["bbox"]& /@ nJson;
maskCoordList = #["segmentation"]& /@ nJson;
imgIds = #["image_id"]& /@ nJson;
outputDir = dir <> "/dataset/sampledDataset/";
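(* ::Text:: *)
(*A quick sanity check on a single annotation before the batch run below (illustrative; the path assumes the COCO train2014 layout used in this file):*)


testImage = Import[dir <> "/coco/train2014/COCO_train2014_" <> padZero @ imgIds[[1]] <> ".jpg"];
extractBBoxAndMask[testImage, bboxList[[1]], maskCoordList[[1]]]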
(* ::Section:: *)
(*Convert*)


Do[
    (*1. Import image i*)
    imageId = imgIds[[i]];
    image = Import[dir <> "/coco/train2014/COCO_train2014_" <> padZero @ imageId <> ".jpg"];

    (*2. Look up its bounding box and mask polygon*)
    bbox = bboxList[[i]];
    maskCoord = maskCoordList[[i]];

    (*3. Crop the box region and the rasterized mask, then export both*)
    {imgTrim, maskTrim} = extractBBoxAndMask[image, bbox, maskCoord];

    Export[outputDir <> "bbox/" <> padZero@imageId <> ".png", imgTrim];
    Export[outputDir <> "mask/" <> padZero@imageId <> ".png", maskTrim];

    Echo[i],
    {i, 1, len + 1}
]

--------------------------------------------------------------------------------
/report/graphics/bboxPrediction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhuwenzhen/instance-segmentation/fc395b8f60ae518379bfd5959ff868d2766c8148/report/graphics/bboxPrediction.png

--------------------------------------------------------------------------------
/src/Utility.wl:
--------------------------------------------------------------------------------
(* ::Package:: *)

(* ::Title:: *)
(*Utility*)


(* ::Subsection:: *)
(*ReadOutput*)


(* keep only the first channel of the softmax output, Pr[pixel belongs to the object] *)
readOutput[outputTensor_] := outputTensor[[All, All, 1]]


(* ::Subsection:: *)
(*padRegionMask*)


(* pad a trimmed mask with background so it sits at its bounding-box position in the full w x h image *)
padRegionMask[trimmedMask_, bboxLocal_, w_, h_] := Block[
    {x1, y1, x2, y2},
    {{x1, y1}, {x2, y2}} = bboxLocal;
    ImagePad[trimmedMask,
        {{x1, w - x2},
         {y1, h - y2}}]
]


(* ::Subsection:: *)
(*Yolo Result Get*)


(* turn {label, Rectangle[...], prob} triples into {labels, corner lists, probs} *)
convertYoloResult[yoloRes_] := Transpose[yoloRes] /. Rectangle -> List


(* ::Subsection:: *)
(*Crop Regions*)


cropRegion[img_, bbox_] := Module[
    {w, h, regions},
    {w, h} = ImageDimensions[img];
    regions = ImageTrim[img, #]& /@ bbox
]


(* ::Subsection:: *)
(*produceMask*)


(* ::Text:: *)
(*net: the trained segmentation network*)
(*img: the full image*)
(*bbox: a list of bounding boxes*)


produceMask[net_, img_, bbox_] := Module[
    {w, h, regions, outputTensor, masks, regionMasks, resizedRegionMask},
    {w, h} = ImageDimensions[img];
    regions = ImageTrim[img, #]& /@ bbox;

    (* run the net on each cropped region and binarize its probability map *)
    outputTensor = readOutput /@ (net /@ regions);
    masks = Binarize /@ (Image /@ outputTensor);
    (* invert and lightly erode to clean up the mask boundary *)
    regionMasks = 1 - (Erosion[#, IdentityMatrix[3]]& /@ masks);
    (* resize each mask back to its region size, then pad it into full-image coordinates *)
    resizedRegionMask = MapThread[ImageResize[#1, #2]&, {regionMasks, ImageDimensions /@ regions}];
    MapThread[padRegionMask[#1, #2, w, h]&, {resizedRegionMask, bbox}]
]

--------------------------------------------------------------------------------
/src/Visualization.wl:
--------------------------------------------------------------------------------
(* ::Package:: *)

(* ::Title:: *)
(*Visualization*)


(* ::Section:: *)
(*Helper*)


padZero[num_] := Module[
    {len},
    len = Length[IntegerDigits[num]];
    StringJoin @ Table["0", 12 - len] <> ToString[num]
]


(* ::Section:: *)
(*Mask*)


(* ::Text:: *)
(*input: segmentation -> {{x11, y11, x12, y12, ..., x1n, y1n}, {x21, y21, x22, y22, ..., x2n, y2n}}*)
(*output: Graphics mask*)


convertAnnotationMask[vtx_List, h_] :=
    Map[Transpose[{0, h} + {1, -1} Transpose[Partition[#, 2]]]&, vtx]
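(* ::Text:: *)
(*For example (illustrative numbers), a single square polygon in an image of height 100 keeps its x values and flips its y values:*)


convertAnnotationMask[{{0, 0, 10, 0, 10, 10, 0, 10}}, 100]
(* {{{0, 100}, {10, 100}, {10, 90}, {0, 90}}} *)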
getMask[pts_List, w_, h_] := Module[
    {vtx},
    vtx = convertAnnotationMask[pts, h];
    Binarize @ Rasterize[Graphics[{White, Polygon@@vtx}, PlotRange -> {{0, w}, {0, h}},
        Background -> Black], "Image", ImageSize -> {w, h}]
]


(* ::Section:: *)
(*BBox*)


convertAnnotationBBoxIntoImageCoord[bbox_List, h_] := Module[
    {x1, y1, dw, dh},
    {x1, y1, dw, dh} = bbox;
    Transpose[{0, h} + {1, -1} Transpose[Partition[{x1, y1, x1 + dw, y1 + dh}, 2]]]
]


(* ::Section:: *)
(*Extract Mask And BBox*)


(* ::Text:: *)
(*Make the extraction of the b-box mask into a pipeline.*)


extractBBoxAndMask[img_, bbox_, segPts_List] := Module[
    {bboxLocal, w, h, boxRegion, maskRegion},
    {w, h} = ImageDimensions[img];
    bboxLocal = convertAnnotationBBoxIntoImageCoord[bbox, h];
    boxRegion = ImageTrim[img, bboxLocal];
    maskRegion = ImageTrim[getMask[segPts, w, h], bboxLocal];
    {boxRegion, maskRegion}
]


(* ::Section:: *)
(*InstanceSegmentation*)


(* an earlier version, kept for reference *)
(*instanceSegmentation[img_, ennet_, detectionThreshold_, overlapThreshold_]:= Module[
    {labels, bboxes, probs, masks, coloredMasks, yoloVis, yoloRes, centers},
    yoloRes = detection[img, detectionThreshold, overlapThreshold];
    {labels, bboxes, probs} = convertYoloResult[yoloRes];
    centers = Mean/@ bboxes;
    masks = produceMask[ennet, img, bboxes];
    coloredMasks = Flatten[{RandomColor[], Opacity[.4], #}&/@ masks];
    yoloVis = Transpose @ MapThread[
        {
            {Green, Opacity[0], #2},
            Style[
                Inset[#1<>" ("<>ToString@Round[#3, .01]<>")", #2[[1]], {Left, Top}],
                FontSize -> Scaled[.02], GrayLevel[0,1], Background->GrayLevel[1,0]
            ]
        }&, Transpose[yoloRes]];
    HighlightImage[img, Join[{coloredMasks, yoloVis}], ImagePadding -> Scaled[.02]]
]*)


convertYoloResult[yoloRes_] := Transpose[yoloRes] /. Rectangle -> List


instanceSegmentation[img_, ennet_, detectionThreshold_, overlapThreshold_] := Module[
    {labels, bboxes, probs, masks, coloredMasks, yoloVis, yoloRes, rectangles, centers},

    (* 1. run the YOLO detector *)
    yoloRes = detection[img, detectionThreshold, overlapThreshold];
    rectangles = Transpose[yoloRes][[2]];
    {labels, bboxes, probs} = convertYoloResult[yoloRes];
    centers = Mean /@ bboxes;

    (* 2.-3. predict a mask for each box region and pad it back into image coordinates *)
    masks = produceMask[ennet, img, bboxes];
    coloredMasks = Flatten[{RandomColor[], Opacity[.45], #}&/@ masks];

    (* label each box with its class and probability at the box center *)
    yoloVis = Transpose @ MapThread[
        {
            {Darker @ Green, Opacity[0], #2},
            Style[
                Inset[#1<>" \n("<>ToString@Round[#3, .01]<>")", #4, {Center, Center}],
                FontSize -> Scaled[.03], FontColor -> White, GrayLevel[0,1], Background -> GrayLevel[1,0]
            ]
        }&, {labels, rectangles, probs, centers}];
    HighlightImage[img, Join[{coloredMasks, yoloVis}], ImagePadding -> Scaled[.02], ImageSize -> Large]
]
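(* ::Text:: *)
(*A minimal end-to-end usage sketch. Everything here is illustrative: it assumes a YOLO wrapper detection[img, detectionThreshold, overlapThreshold] (used above but not defined in this file), the trained net checked into nets/enet.wlnet, and made-up threshold values, so it is left commented out.*)


(*
ennet = Import[FileNameJoin[{ParentDirectory[NotebookDirectory[]], "nets", "enet.wlnet"}]];
testImg = ExampleData[{"TestImage", "House"}];
instanceSegmentation[testImg, ennet, .15, .4]
*)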
(* ::Section::Closed:: *)
(*Bounding Box*)


(* convert class -> box rules into labeled boxes; assumes a global list of class names `labels` *)
labelBox[class_ -> box_] := Module[
    {coord, textCoord},
    coord = List @@ box;
    textCoord = {(coord[[1,1]] + coord[[2,1]])/2., coord[[1,2]] - 0.04};
    {{GeometricTransformation[Text[Style[labels[[class]], 20, Darker@Blue], textCoord],
        ReflectionTransform[{0,1}, textCoord]]},
     EdgeForm[Directive[Red, Thick]], Transparent, box}]

(* convert from {centerx, centery, width, height} on YOLO's 7 x 7 grid to a Rectangle object *)
coordToBox[center_, boxCord_, scaling_: 1] := Module[
    {bx, by, w, h},
    bx = (center[[1]] + boxCord[[1]])/7.;
    by = (center[[2]] + boxCord[[2]])/7.;
    w = boxCord[[3]]*scaling;
    h = boxCord[[4]]*scaling;
    Rectangle[{bx - w/2, by - h/2}, {bx + w/2, by + h/2}]
]

(* non-max suppression to eliminate overlapping boxes: within each class, visit boxes in
   order of decreasing probability and zero out any lower-confidence box whose
   intersection-over-union with a kept box reaches overlapThreshold
   (see the check after this section) *)
nonMaxSuppression[boxes_, overlapThreshold_, confidThreshold_] := Module[
    {boxesSorted, boxi, boxj},
    boxesSorted = GroupBy[boxes, #class&][All, SortBy[#prob&]/*Reverse];
    Do[
        Do[
            boxi = boxesSorted[[c, n]];
            If[boxi["prob"] != 0,
                Do[
                    boxj = boxesSorted[[c, m]];
                    (* if two boxes overlap largely, kill the box with lower confidence *)
                    If[RegionMeasure[RegionIntersection[boxi["coord"], boxj["coord"]]]/
                            RegionMeasure[RegionUnion[boxi["coord"], boxj["coord"]]] >= overlapThreshold,
                        boxesSorted = ReplacePart[boxesSorted, {c, m, "prob"} -> 0]
                    ],
                    {m, n + 1, Length[boxesSorted[[c]]]}
                ]
            ],
            {n, 1, Length[boxesSorted[[c]]]}
        ],
        {c, 1, Length@boxesSorted}
    ];
    boxesSorted[All, Select[#prob > 0&]]]

(* draw boxes with labels, reflecting into image coordinates *)
drawBoxes[img_, boxes_] := Module[
    {labeledBoxes},
    labeledBoxes = labelBox /@ Flatten[Thread /@ Normal@Normal@boxes[All, All, "coord"]];
    Graphics[GeometricTransformation[{Raster[ImageData[img], {{0,0},{1,1}}], labeledBoxes},
        ReflectionTransform[{0,1}, {0,1/2}]]]]

(* decode YOLO's output vector: 980 class probabilities (49 grid cells x 20 classes),
   98 box confidences (49 x 2), and 392 box coordinates (49 x 2 x 4) *)
postProcess[img_, vec_, boxScaling_: 0.7, confidentThreshold_: 0.15, overlapThreshold_: 0.4] := Module[
    {grid, prob, confid, boxCoord, boxes, boxNonMax},
    grid = Flatten[Table[{i, j}, {j, 0, 6}, {i, 0, 6}], 1];
    prob = Partition[vec[[1 ;; 980]], 20];
    confid = Partition[vec[[980 + 1 ;; 980 + 98]], 2];
    boxCoord = ArrayReshape[vec[[980 + 98 + 1 ;; -1]], {49, 2, 4}];
    boxes = Dataset @ Select[
        Flatten @ Table[
            <|"coord" -> coordToBox[grid[[i]], boxCoord[[i, b]], boxScaling],
              "class" -> c,
              "prob" -> If[# <= confidentThreshold, 0, #]& @ (prob[[i, c]]*confid[[i, b]])|>,
            {c, 1, 20}, {b, 1, 2}, {i, 1, 49}],
        #prob >= confidentThreshold&];
    boxNonMax = nonMaxSuppression[boxes, overlapThreshold, confidentThreshold];
    drawBoxes[Image[img], boxNonMax]]
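(* ::Text:: *)
(*A quick check of the overlap measure used in nonMaxSuppression: two unit squares offset by half a width have intersection 1/2 and union 3/2, so IoU = 1/3, and at the default overlapThreshold of .4 neither box would be suppressed.*)


RegionMeasure[RegionIntersection[Rectangle[{0, 0}, {1, 1}], Rectangle[{1/2, 0}, {3/2, 1}]]]/
    RegionMeasure[RegionUnion[Rectangle[{0, 0}, {1, 1}], Rectangle[{1/2, 0}, {3/2, 1}]]]
(* 1/3 *)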
(* ::Title:: *)
(*Utility*)


(* ::Text:: *)
(*The helpers below also appear in src/Utility.wl.*)


(* ::Subsection:: *)
(*ReadOutput*)


readOutput[outputTensor_] := outputTensor[[All, All, 1]]


(* ::Subsection:: *)
(*padRegionMask*)


padRegionMask[trimmedMask_, bboxLocal_, w_, h_] := Block[
    {x1, y1, x2, y2},
    {{x1, y1}, {x2, y2}} = bboxLocal;
    ImagePad[trimmedMask,
        {{x1, w - x2},
         {y1, h - y2}}]
]


(* ::Subsection:: *)
(*Yolo Result Get*)


convertYoloResult[yoloRes_] := Transpose[yoloRes] /. Rectangle -> List


(* ::Subsection:: *)
(*Crop Regions*)


cropRegion[img_, bbox_] := Module[
    {w, h, regions},
    {w, h} = ImageDimensions[img];
    regions = ImageTrim[img, #]& /@ bbox
]


(* ::Subsection:: *)
(*produceMask*)


(* ::Text:: *)
(*net: the trained segmentation network*)
(*img: the full image*)
(*bbox: a list of bounding boxes*)


produceMask[net_, img_, bbox_] := Module[
    {w, h, regions, outputTensor, masks, regionMasks, resizedRegionMask},
    {w, h} = ImageDimensions[img];
    regions = ImageTrim[img, #]& /@ bbox;

    (* run the net on each cropped region and binarize its probability map *)
    outputTensor = readOutput /@ (net /@ regions);
    masks = Binarize /@ (Image /@ outputTensor);
    (* invert and lightly erode to clean up the mask boundary *)
    regionMasks = 1 - (Erosion[#, IdentityMatrix[3]]& /@ masks);
    (* resize each mask back to its region size, then pad it into full-image coordinates *)
    resizedRegionMask = MapThread[ImageResize[#1, #2]&, {regionMasks, ImageDimensions /@ regions}];
    MapThread[padRegionMask[#1, #2, w, h]&, {resizedRegionMask, bbox}]
]
--------------------------------------------------------------------------------