├── .gitignore
├── LICENSE
├── README.md
├── image
├── Building_Footprint_Extraction
│ ├── README.md
│ ├── dataset.py
│ ├── metrics.py
│ ├── model.py
│ ├── requirements.txt
│ ├── train.py
│ ├── training.py
│ └── utils.py
├── Defect_Detection
│ ├── README.md
│ ├── datasets.py
│ ├── model.py
│ ├── requirements.txt
│ ├── train.py
│ └── utils.py
├── Road_Obstacle_Detection
│ ├── README.md
│ ├── dice_loss.py
│ ├── eval.py
│ ├── lf_loader.py
│ ├── train.py
│ ├── train_loss.txt
│ ├── utils.py
│ └── val_loss.txt
└── fastlane
│ ├── Image_Classification
│ └── ImageClassification.ipynb
│ ├── OCR
│ ├── OCR.ipynb
│ ├── charnet
│ │ ├── __init__.py
│ │ ├── config
│ │ │ ├── __init__.py
│ │ │ └── defaults.py
│ │ └── modeling
│ │ │ ├── __init__.py
│ │ │ ├── backbone
│ │ │ ├── __init__.py
│ │ │ ├── decoder.py
│ │ │ ├── hourglass.py
│ │ │ └── resnet.py
│ │ │ ├── layers
│ │ │ ├── __init__.py
│ │ │ ├── misc.py
│ │ │ └── scale.py
│ │ │ ├── model.py
│ │ │ ├── postprocessing.py
│ │ │ ├── rotated_nms.py
│ │ │ └── utils.py
│ ├── configs
│ │ └── icdar2015_hourglass88.yaml
│ ├── datasets
│ │ └── ICDAR2015
│ │ │ └── test
│ │ │ ├── GenericVocabulary.txt
│ │ │ └── char_dict.txt
│ ├── iou.py
│ └── sample.jpg
│ ├── Object_Detection
│ ├── ObjectDetection.ipynb
│ ├── dataset.py
│ ├── models.py
│ ├── skynews-boeing-737-plane_5435020.jpg
│ └── utils.py
│ └── README.md
└── video
├── Galbladder_Segmentation
├── Detectron2StepByStep.ipynb
├── DetectronGBScript.py
├── GallbladderFiles
│ ├── NOGO 1-16424 via_project_20May2021_17h2m.json
│ ├── NOGO 1_16504 via_project_22May2021_9h9m.json
│ ├── NOGO1_319 via_project_14May2021_13h54m.json
│ ├── d018a7fb_25Apr2021_13h18m36s nogo.json
│ ├── nogo 1-450 via_project_18May2021_19h36m.json
│ ├── nogo1_16859 via_project_22May2021_17h54m (66).json
│ ├── nogo1_16859 via_project_22May2021_17h54m.json
│ ├── nogo270via_project_11May2021_18h54m.json
│ ├── nogo310via_project_13May2021_16h20m.json
│ ├── via_project_13May2021_23h12m_nogo.json
│ ├── via_project_26Apr2021_20h18m nogo231.json
│ ├── via_project_2nogo.json
│ ├── via_project_3May2021_17h41m_nogo (1).json
│ ├── via_project_3May2021_17h41m_nogo.json
│ ├── via_project_vid04_nogo.json
│ ├── via_project_vid6_nogo_100.json
│ ├── via_project_vid8_nogo_100.json
│ ├── via_project_video_02_nogo_1-100.json
│ ├── video20_03260.nogo.json
│ ├── video_18_00979.nogo.json
│ └── video_24_09836_nogo.json
├── README.md
├── _launch.sh
├── _runner.sh
├── augCoords.json
├── grandproj.env
├── jsonOutput
│ ├── bladder_val_coco_format.json
│ ├── bladder_val_coco_format.json.lock
│ ├── coco_instances_results.json
│ └── instances_predictions.pth
├── parameters.txt
└── runt4v1Detectron.slrm
└── Traffic_Incident_Detection
├── .gitignore
├── .idea
├── .gitignore
├── deployment.xml
├── inspectionProfiles
│ └── profiles_settings.xml
├── misc.xml
├── modules.xml
├── vcs.xml
├── webServers.xml
└── yowo.iml
├── README.md
├── backbones_2d
├── DeepLabV3PlusPytorch
│ ├── LICENSE
│ ├── README.md
│ ├── datasets
│ │ ├── __init__.py
│ │ ├── cityscapes.py
│ │ ├── utils.py
│ │ └── voc.py
│ ├── main.py
│ ├── metrics
│ │ ├── __init__.py
│ │ └── stream_metrics.py
│ ├── network
│ │ ├── __init__.py
│ │ ├── _deeplab.py
│ │ ├── backbone
│ │ │ ├── __init__.py
│ │ │ ├── mobilenetv2.py
│ │ │ └── resnet.py
│ │ ├── modeling.py
│ │ └── utils.py
│ ├── predict.py
│ ├── resnet_2d.ipynb
│ └── utils
│ │ ├── __init__.py
│ │ ├── ext_transforms.py
│ │ ├── loss.py
│ │ ├── scheduler.py
│ │ ├── utils.py
│ │ └── visualizer.py
└── darknet.py
├── backbones_3d
├── mobilenet.py
├── mobilenetv2.py
├── resnet.py
├── resnext.py
├── shufflenet.py
└── shufflenetv2.py
├── cfg
├── ava.yaml
├── ava_categories_count.json
├── ava_categories_ratio.json
├── custom_config.py
├── defaults.py
├── dota_config.yaml
├── dota_train.yaml
├── jhmdb.yaml
├── parser.py
├── ucf24.yaml
├── ucf24_charmed-leaf-23_copy.yaml
├── ucf24_finalAnnots.mat
├── yolo.cfg
└── yolo_cfg.py
├── core
├── FocalLoss.py
├── cfam.py
├── detection_visualization.py
├── detection_visualization_obj_anom.py
├── eval_results.py
├── model.py
├── optimization.py
├── plot_ava_result.py
├── region_loss.py
└── utils.py
├── dataset_factory
├── ava_dataset.py
├── ava_eval_helper.py
├── ava_evaluation
│ ├── README.md
│ ├── __init__.py
│ ├── label_map_util.py
│ ├── metrics.py
│ ├── np_box_list.py
│ ├── np_box_list_ops.py
│ ├── np_box_mask_list.py
│ ├── np_box_mask_list_ops.py
│ ├── np_box_ops.py
│ ├── np_mask_ops.py
│ ├── object_detection_evaluation.py
│ ├── per_image_evaluation.py
│ └── standard_fields.py
├── ava_helper.py
├── clip.py
├── cv2_transform.py
├── dataset_utils.py
├── dota.py
├── generate_anchors.py
├── image.py
├── list_dataset.py
├── logging.py
├── meters.py
└── transform.py
├── dota_anchors.py
├── dota_dl.ipynb
├── main.py
├── main_dota.py
├── test_video_ava.py
└── video_mAP.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.DS_Store
2 | Image_Segmentation/Defect_Detection/ckpt
3 | *__pycache__
4 | Image_Segmentation/Defect_Detection/samples
5 | Image_Segmentation/Building_Footprint_Extraction/samples
6 | Image_Segmentation/Building_Footprint_Extraction/ckpt
7 | Image_Segmentation/Road_Obstacle_Detection/ckpt
8 | Image_Segmentation/Road_Obstacle_Detection/samples
9 | *.ipynb_checkpoints
10 | Image_Segmentation/Road_Obstacle_Detection/test.ipynb
11 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Vector Institute
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Computer Vision Project
2 |
3 | This project facilitated knowledge transfer between Vector and its industry sponsors. Specifically, the objectives were the following:
4 |
5 | 1. Learn about recent advances in deep learning for computer vision
6 | 2. Apply methods to novel use cases in industry
7 |
8 | Several use cases involving both images and videos are explored. These use cases reflect current industry needs, participants’ interests and expertise, and opportunities to translate academic advances into real-world applications:
9 |
10 | **Image Use Cases**
11 | 1. Unsupervised defect detection in manufacturing using autoencoders
12 | 2. Building footprint extraction using semantic segmentation
13 | 3. Road obstacle detection using semantic segmentation
14 |
15 | **Video Use Cases**
16 | 1. Semantic segmentation of videos from cholecystectomy procedures (gallbladder surgery)
17 | 2. Traffic incident detection in videos
18 |
19 | ## Additional Tooling
20 | In addition, the AI Engineering team has created a separate repository that serves as a toolkit for the Computer Vision project at Vector Institute. It includes various datasets readily loadable from the shared cluster as well as useful image/video tools such as data augmentation and visualization utilities. You can find the repository at https://github.com/VectorInstitute/vector_cv_tools
21 |
22 | ## Usage
23 | Each folder corresponding to a use case includes instructions to run the experiments. It should be noted that this repository is no longer maintained and solely serves as an artifact of the project.
24 |
25 | ## Citations
26 | Please cite the [Computer Vision: Applications in Manufacturing, Surgery, Traffic, Satellites, and Unlabelled Data Recognition Technical Report](https://vectorinstitute.ai/wp-content/uploads/2022/05/computer_vision_project_report_may252022.pdf) whenever you cite this GitHub repository.
27 |
28 | ## Acknowledgements
29 | Many thanks to our sponsor companies, researchers and Vector Institute staff for making this collaboration possible and providing academic support and computing infrastructure during all phases of this work. We would specifically like to thank the following individuals for their contributions.
30 |
31 | * Elham Ahmadi
32 | * Andrew Alberts-Scherer
33 | * Raghav Goyal
34 | * John Jewell
35 | * Shuja Khalid
36 | * Matthew Kowal
37 | * Andriy Levitskyy
38 | * Jinbiao Ning
39 | * Tristan Trim
40 | * Kuldeep Panjwani
41 | * Saeed Pouryazdian
42 | * Sim Sachar
43 | * Yilei Wu
44 | * An Zhou
45 |
--------------------------------------------------------------------------------
/image/Building_Footprint_Extraction/README.md:
--------------------------------------------------------------------------------
1 | # Building Footprint Extraction
2 |
3 | ## Overview
4 |
5 | As high-resolution satellite imagery becomes increasingly available in both the public and private domains, it enables a range of beneficial applications. Extraction of building footprints is a core component of many downstream applications of satellite imagery, such as humanitarian assistance and disaster response. This work offers a comparative study of methods for building footprint extraction in satellite imagery. The focus is to explore state-of-the-art semantic segmentation models using the SpaceNet 2 Building Detection Dataset. Four high-level approaches, and six total variants, are trained and evaluated, including U-Net, UNet++, Fully Convolutional Networks (FCN) and DeepLabv3. Intersection over Union (IoU) is used to quantify segmentation performance on a held-out test set. In our experiments, we found that DeepLabv3 with a ResNet-101 backbone is the most accurate of the surveyed approaches. In general, models that leverage pretraining achieve high accuracy and require minimal training, whereas models trained from scratch are less accurate and require longer training regimes.
6 |
7 | ## Dataset
8 | In order to benchmark the aforementioned approaches on building footprint extraction in satellite images, the [SpaceNet Building Detection V2 dataset](https://spacenet.ai/spacenet-buildings-dataset-v2/) is used. This dataset contains high-resolution satellite imagery and corresponding labels that specify the location of building footprints. The dataset includes 302,701 building labels across 10,593 multi-spectral satellite images of Las Vegas, Paris, Shanghai and Khartoum. The labels are binary and indicate whether each pixel belongs to a building or to the background.
9 |
10 |
11 |
12 |
13 |
14 | Figure 1: An example of images (left) and labels (right) in the Spacenet Building
15 | Detection V2.
16 |
17 |
18 |
19 | ## Experimental Setup
20 |
21 | The dataset is divided into training (80%), validation (10%) and test (10%) sets. Images are resized from 650x650 to 384x384 using bi-cubic interpolation and normalized using the mean and standard deviation of the ImageNet dataset.
22 | The proposed semantic segmentation models are trained on the training set, while the validation set is used to determine a stopping criterion. Lastly, the trained model is evaluated on the test set. Intersection over Union (IoU) is the metric used to evaluate model performance; it measures the overlap between the predicted and ground-truth masks and ranges from 0 to 1, where 1 denotes complete overlap.
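For reference, the preprocessing described above can be written with torchvision transforms roughly as follows (a minimal sketch, assuming torchvision ≥ 0.9; the actual transform objects are constructed in **train.py**, which is not shown here, and passed to the dataset class in **dataset.py**):

```
from torchvision import transforms

# Resize 650x650 tiles to 384x384 with bi-cubic interpolation and normalize
# with ImageNet statistics.
img_transform = transforms.Compose([
    transforms.Resize((384, 384), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Masks are resized with nearest-neighbour interpolation so the labels stay binary.
mask_transform = transforms.Resize((384, 384),
                                   interpolation=transforms.InterpolationMode.NEAREST)
```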
23 |
24 | ## Results
25 |
26 |
27 |
28 |
29 |
30 | Figure 2: IoU score on the test set for each approach.
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | Figure 3: A visualization of the predictions generated by each approach along with the input image (far left) and ground truth label (far right).
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 | Figure 4: Binary cross entropy loss for training set (top) and validation set
47 | (bottom) across epochs.
48 |
49 |
50 |
51 | ## Running Code
52 | To configure the environment to run the experiments, navigate to the base of this directory and execute the following commands:
53 |
54 | ```
55 | conda create -n new_env
56 | conda activate new_env
57 | pip install -r requirements.txt
58 | ```
59 |
60 | To obtain results for a specific architecture, simply pass the appropriate arguments to the **train.py** script:
61 | ```
62 | python train.py --model fcn50 --epochs 10 --batch_size 4 --data_path /path/to/spacenet
63 | ```
64 |
65 | The **train.py** script has the following arguments:
66 | - **model** (str): Architecture variant for the experiments. *required*
67 | - **data_path** (str): The root directory of the dataset. *required*
68 | - **epochs** (int): The number of epochs to train the model. Default 25.
69 | - **batch_size** (int): The batch size for training, validation and testing. Default 8.
70 | - **learning_rate** (float): Learning rate of the model. Default .0001.
71 | - **size** (int): Side length of the input image. Default 384.
72 | - **train_perc** (float): The proportion of samples used for training. Default .8.
73 | - **val_perc** (float): The proportion of samples used for validation. Default .1.
74 |
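Based on the model factory in **utils.py**, the accepted values for **--model** are `fcn50`, `fcn101`, `dlv350`, `dlv3101`, `unet` and `unetplus`. For example, to train the DeepLabv3 ResNet-101 variant with the default schedule:

```
python train.py --model dlv3101 --data_path /path/to/spacenet
```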
--------------------------------------------------------------------------------
/image/Building_Footprint_Extraction/dataset.py:
--------------------------------------------------------------------------------
1 |
2 |
3 | import os
4 |
5 | import numpy as np
6 | import torch
7 | from PIL import Image
8 | from torch.utils.data.dataset import Dataset
9 |
10 | class SpaceNet_Dataset(Dataset):
11 | def __init__(self, img_dir_list, mask_dir_list, img_transform = None, mask_transform=None):
12 | self.img_dir_list = img_dir_list
13 | self.mask_dir_list = mask_dir_list
14 |
15 | img_paths, mask_paths = [], []
16 |
17 | for img_dir, mask_dir in zip(img_dir_list, mask_dir_list):
18 | img_paths += [f"{img_dir}/{img_file}" for img_file in os.listdir(img_dir)]
19 | mask_paths += [f"{mask_dir}/{mask_file}" for mask_file in os.listdir(mask_dir)]
20 |
21 | self.img_paths = sorted(img_paths)
22 | self.mask_paths = [f"{new_mask_path}_mask.png" for new_mask_path in sorted([mask_path[:-9] for mask_path in mask_paths])]
23 |
24 | self.img_transform = img_transform
25 | self.mask_transform = mask_transform
26 |
27 |
28 |
29 | def __len__(self):
30 | return len(self.img_paths)
31 |
32 | def __getitem__(self, index):
33 | img_path = self.img_paths[index]
34 | mask_path = self.mask_paths[index]
35 |
36 | img = Image.open(img_path)
37 | img = self.img_transform(img)
38 |
39 | mask = Image.open(mask_path).convert("1")
40 | mask = self.mask_transform(mask)
41 | mask = torch.from_numpy(np.array(mask).astype(int)).unsqueeze(0)
42 |
43 | return img, mask
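# Note added for clarity (not part of the original file): mask filenames end in the
# 9-character suffix "_mask.png". The list comprehension above strips that suffix
# before sorting so the mask paths sort in the same order as the image paths, then
# re-appends it when building self.mask_paths.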
--------------------------------------------------------------------------------
/image/Building_Footprint_Extraction/metrics.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Function
3 |
4 |
5 | class DiceCoeff(Function):
6 | """Dice coeff for individual examples"""
7 |
8 | def forward(self, input, target):
9 | self.save_for_backward(input, target)
10 | eps = 0.0001
11 | self.inter = torch.dot(input.view(-1), target.view(-1))
12 | self.union = torch.sum(input) + torch.sum(target) + eps
13 |
14 | t = (2 * self.inter.float() + eps) / self.union.float()
15 | return t
16 |
17 | # This function has only a single output, so it gets only one gradient
18 | def backward(self, grad_output):
19 |
20 | input, target = self.saved_variables
21 | grad_input = grad_target = None
22 |
23 | if self.needs_input_grad[0]:
24 | grad_input = grad_output * 2 * (target * self.union - self.inter) \
25 | / (self.union * self.union)
26 | if self.needs_input_grad[1]:
27 | grad_target = None
28 |
29 | return grad_input, grad_target
30 |
31 |
32 | def dice_coeff(input, target):
33 | """Dice coeff for batches"""
34 | if input.is_cuda:
35 | s = torch.FloatTensor(1).cuda().zero_()
36 | else:
37 | s = torch.FloatTensor(1).zero_()
38 |
39 | for i, c in enumerate(zip(input, target)):
40 | s = s + DiceCoeff().forward(c[0], c[1])
41 |
42 | return s / (i + 1)
43 |
44 |
45 |
46 | class IoU(Function):
47 | """IoU for individual examples"""
48 | def forward(self, input, target):
49 | eps = 0.0001
50 | self.inter = torch.dot(input.view(-1), target.view(-1))
51 | self.union = torch.sum(input) + torch.sum(target) + eps-self.inter
52 |
53 | t = (self.inter.float()+eps) / self.union.float()
54 | return t
55 |
56 | def iou(input, target):
57 | """IoU for batches"""
58 | if input.is_cuda:
59 | s = torch.FloatTensor(1).cuda().zero_()
60 | else:
61 | s = torch.FloatTensor(1).zero_()
62 |
63 | for i, c in enumerate(zip(input, target)):
64 | s = s + IoU().forward(c[0], c[1])
65 |
66 | return s / (i + 1)
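# Illustrative usage, added for clarity (not part of the original file):
#   `input` and `target` are tensors of shape (N, 1, H, W) with values in {0, 1};
#   iou() loops over the batch dimension and returns the mean IoU as a 1-element tensor.
#
#   binary_preds = (outputs > threshold).float()
#   batch_iou = iou(binary_preds, targets.float())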
--------------------------------------------------------------------------------
/image/Building_Footprint_Extraction/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==0.11.0
2 | alabaster==0.7.12
3 | anaconda-client==1.7.2
4 | anaconda-navigator==1.7.0
5 | anaconda-project==0.8.3
6 | appdirs==1.4.3
7 | asn1crypto==0.24.0
8 | astor==0.8.1
9 | astroid==2.2.5
10 | astropy==3.2.1
11 | astunparse==1.6.3
12 | atomicwrites==1.3.0
13 | attrs==19.1.0
14 | Automat==0.7.0
15 | Babel==2.7.0
16 | backcall==0.1.0
17 | backports.os==0.1.1
18 | backports.shutil-get-terminal-size==1.0.0
19 | beautifulsoup4==4.7.1
20 | bitarray==0.9.3
21 | bkcharts==0.2
22 | blaze==0.11.3
23 | bleach==3.1.0
24 | bokeh==1.2.0
25 | boto==2.49.0
26 | Bottleneck==1.2.1
27 | cachetools==4.2.1
28 | certifi==2019.6.16
29 | cffi==1.12.3
30 | chardet==3.0.4
31 | Click==7.0
32 | cloudpickle==1.6.0
33 | clyent==1.2.2
34 | colorama==0.4.1
35 | conda==4.7.12
36 | conda-build==3.17.6
37 | conda-package-handling==1.6.0
38 | conda-verify==3.1.1
39 | constantly==15.1.0
40 | contextlib2==0.5.5
41 | convertdate==2.3.2
42 | cryptography==2.7
43 | cycler==0.10.0
44 | Cython==0.29.12
45 | cytoolz==0.10.0
46 | dask==2.1.0
47 | dataclasses==0.8
48 | datashape==0.5.4
49 | decorator==4.4.0
50 | defusedxml==0.6.0
51 | distributed==2.1.0
52 | dm-tree==0.1.5
53 | docutils==0.14
54 | entrypoints==0.3
55 | et-xmlfile==1.0.1
56 | fastcache==1.1.0
57 | filelock==3.0.12
58 | Flask==1.1.1
59 | Flask-Cors==3.0.7
60 | flatbuffers==1.12
61 | fredapi==0.4.3
62 | future==0.17.1
63 | gast==0.3.3
64 | gevent==1.4.0
65 | glob2==0.7
66 | gluonts==0.8.1
67 | gmpy2==2.0.8
68 | google-auth==1.27.0
69 | google-auth-oauthlib==0.4.2
70 | google-pasta==0.2.0
71 | googledrivedownloader==0.4
72 | graphviz==0.8.4
73 | greenlet==0.4.15
74 | grpcio==1.32.0
75 | h5py==2.10.0
76 | heapdict==1.0.0
77 | hijri-converter==2.2.2
78 | holidays==0.11.3.1
79 | html5lib==1.0.1
80 | hyperlink==18.0.0
81 | idna==2.8
82 | imageio==2.5.0
83 | imagesize==1.1.0
84 | importlib-metadata==0.17
85 | incremental==17.5.0
86 | ipykernel==5.1.1
87 | ipython==7.6.1
88 | ipython_genutils==0.2.0
89 | ipywidgets==7.5.0
90 | isodate==0.6.0
91 | isort==4.3.21
92 | itsdangerous==1.1.0
93 | jdcal==1.4.1
94 | jedi==0.13.3
95 | jeepney==0.4
96 | Jinja2==2.10.1
97 | joblib==0.13.2
98 | json5==0.8.4
99 | jsonschema==3.0.1
100 | jupyter==1.0.0
101 | jupyter-client==5.3.1
102 | jupyter-console==6.0.0
103 | jupyter-core==4.5.0
104 | jupyterlab==1.0.2
105 | jupyterlab-launcher==0.13.1
106 | jupyterlab-server==1.0.0
107 | Keras-Applications==1.0.8
108 | Keras-Preprocessing==1.1.2
109 | keyring==18.0.0
110 | kiwisolver==1.1.0
111 | korean-lunar-calendar==0.2.1
112 | lazy-object-proxy==1.4.1
113 | libarchive-c==2.8
114 | lief==0.9.0
115 | lightgbm==3.3.2
116 | llvmlite==0.29.0
117 | locket==0.2.0
118 | lxml==4.3.4
119 | Markdown==3.3.3
120 | MarkupSafe==1.1.1
121 | matplotlib==3.1.0
122 | mccabe==0.6.1
123 | mistune==0.8.4
124 | mkl-fft==1.0.12
125 | mkl-random==1.0.2
126 | mkl-service==2.0.2
127 | mock==3.0.5
128 | more-itertools==7.0.0
129 | mpmath==1.1.0
130 | msgpack==0.6.1
131 | multipledispatch==0.6.0
132 | mxnet-cu112==1.8.0.post0
133 | navigator-updater==0.1.0
134 | nbconvert==5.5.0
135 | nbformat==4.4.0
136 | networkx==2.3
137 | nltk==3.4.4
138 | nose==1.3.7
139 | notebook==6.0.0
140 | numba==0.45.0
141 | numexpr==2.6.9
142 | numpy==1.19.5
143 | numpydoc==0.9.1
144 | oauthlib==3.1.0
145 | odo==0.5.1
146 | olefile==0.46
147 | openpyxl==2.6.2
148 | opt-einsum==3.3.0
149 | packaging==19.0
150 | pandas==1.1.5
151 | pandocfilters==1.4.2
152 | parso==0.5.0
153 | partd==1.0.0
154 | path.py==12.0.1
155 | pathlib2==2.3.4
156 | patsy==0.5.1
157 | pep8==1.7.1
158 | pexpect==4.7.0
159 | pickleshare==0.7.5
160 | Pillow==8.2.0
161 | pkginfo==1.5.0.1
162 | plotly==5.6.0
163 | plotly-express==0.4.1
164 | pluggy==0.12.0
165 | ply==3.11
166 | prometheus-client==0.7.1
167 | promise==2.3
168 | prompt-toolkit==2.0.9
169 | protobuf==3.15.1
170 | psutil==5.6.3
171 | ptyprocess==0.6.0
172 | py==1.8.0
173 | pyasn1==0.4.4
174 | pyasn1-modules==0.2.2
175 | pycodestyle==2.5.0
176 | pycosat==0.6.3
177 | pycparser==2.19
178 | pycrypto==2.6.1
179 | pycurl==7.43.0.3
180 | pydantic==1.8.2
181 | pyflakes==2.1.1
182 | Pygments==2.4.2
183 | pylint==2.3.1
184 | PyMeeus==0.5.11
185 | pyodbc==4.0.26
186 | pyOpenSSL==19.0.0
187 | pyparsing==2.4.0
188 | pyrsistent==0.14.11
189 | PySocks==1.7.0
190 | pytest==5.0.1
191 | pytest-arraydiff==0.3
192 | pytest-astropy==0.5.0
193 | pytest-doctestplus==0.3.0
194 | pytest-openfiles==0.3.2
195 | pytest-remotedata==0.3.1
196 | python-dateutil==2.8.0
197 | pytz==2019.1
198 | PyWavelets==1.0.3
199 | PyYAML==5.1.1
200 | pyzmq==18.0.0
201 | QtAwesome==0.5.7
202 | qtconsole==4.5.1
203 | QtPy==1.8.0
204 | rdflib==5.0.0
205 | requests==2.22.0
206 | requests-oauthlib==1.3.0
207 | rope==0.14.0
208 | rsa==4.7.1
209 | ruamel_yaml==0.15.46
210 | scikit-image==0.15.0
211 | scikit-learn==0.21.2
212 | scipy==1.4.1
213 | seaborn==0.9.0
214 | SecretStorage==3.1.1
215 | Send2Trash==1.5.0
216 | service-identity==17.0.0
217 | simplegeneric==0.8.1
218 | singledispatch==3.4.0.3
219 | six==1.15.0
220 | snowballstemmer==1.9.0
221 | sortedcollections==1.1.2
222 | sortedcontainers==2.1.0
223 | soupsieve==1.8
224 | Sphinx==2.1.2
225 | sphinxcontrib-applehelp==1.0.1
226 | sphinxcontrib-devhelp==1.0.1
227 | sphinxcontrib-htmlhelp==1.0.2
228 | sphinxcontrib-jsmath==1.0.1
229 | sphinxcontrib-qthelp==1.0.2
230 | sphinxcontrib-serializinghtml==1.1.3
231 | sphinxcontrib-websupport==1.1.2
232 | spyder==3.3.6
233 | spyder-kernels==0.5.1
234 | SQLAlchemy==1.3.5
235 | statsmodels==0.10.0
236 | sympy==1.4
237 | tables==3.5.2
238 | tblib==1.4.0
239 | tenacity==8.0.1
240 | tensorboard==2.4.1
241 | tensorboard-plugin-wit==1.8.0
242 | tensorflow==2.4.1
243 | tensorflow-estimator==2.4.0
244 | tensorflow-gpu==2.4.1
245 | tensorflow-probability==0.12.1
246 | termcolor==1.1.0
247 | terminado==0.8.2
248 | testpath==0.4.2
249 | toolz==0.10.0
250 | torch-scatter==2.0.7
251 | tornado==6.0.3
252 | tqdm==4.32.1
253 | traitlets==4.3.2
254 | Twisted==18.7.0
255 | typed-ast==1.3.4
256 | typing==3.6.2
257 | typing-extensions==3.10.0.2
258 | unicodecsv==0.14.1
259 | urllib3==1.24.2
260 | wcwidth==0.1.7
261 | webencodings==0.5.1
262 | Werkzeug==0.15.4
263 | widgetsnbextension==3.5.0
264 | wrapt==1.12.1
265 | wurlitzer==1.0.2
266 | xlrd==1.2.0
267 | XlsxWriter==1.1.8
268 | xlwt==1.3.0
269 | yacs==0.1.8
270 | zict==1.0.0
271 | zipp==0.5.1
272 | zope.interface==4.5.0
273 |
--------------------------------------------------------------------------------
/image/Building_Footprint_Extraction/training.py:
--------------------------------------------------------------------------------
1 | import torch, numpy as np
2 | from utils import save_viz
3 | from metrics import iou
4 |
5 | def get_label_dist(loader):
6 | count_list = []
7 | for _, (_, lbl) in enumerate(loader):
8 | cnt = torch.bincount(lbl.int().flatten())
9 | count_list.append(cnt)
10 |
11 | cnts = torch.stack(count_list, dim=0).sum(dim=0).tolist()
12 | zero_count, one_count = cnts[0], cnts[1]
13 | perc = zero_count / (zero_count + one_count)
14 | return perc
15 |
16 |
17 | def train_fn(loader, model, opt, loss_fn, device):
18 | loss_list = []
19 | for batch_id, (data, targets) in enumerate(loader):
20 | data = data.to(device=device)
21 | targets = targets.float().to(device)
22 | predictions = model(data)['out']
23 | loss = loss_fn(predictions, targets)
24 | opt.zero_grad()
25 | loss.backward()
26 | opt.step()
27 | loss_list.append(loss.item())
28 |
29 | mean_loss = np.mean(loss_list)
30 | return mean_loss
31 |
32 |
33 | def val_fn(loader, model, loss_fn, device, color_map, sample_path, epoch, perc, viz):
34 | loss_list = []
35 | for batch_id, (data, targets) in enumerate(loader):
36 | data = data.to(device=device)
37 | targets = targets.float().to(device=device)
38 | with torch.no_grad():
39 | predictions = model(data)['out']
40 | loss = loss_fn(predictions, targets)
41 | loss_list.append(loss.item())
42 | if viz:
43 | save_viz(data, predictions, targets, color_map, epoch, sample_path, perc)
44 | viz = False
45 |
46 | mean_loss = np.mean(loss_list)
47 | return mean_loss
48 |
49 |
50 | def test_fn(loader, model, loss_fn, device, perc):
51 | target_list, pred_list = [], []
52 | for batch_id, (data, targets) in enumerate(loader):
53 | data = data.to(device=device)
54 | targets = targets.float().to(device=device)
55 | with torch.no_grad():
56 | pred = model(data)['out']
57 | pred_list.append(pred)
58 | target_list.append(targets)
59 |
60 | pred = torch.cat(pred_list, dim=0)
61 | target = torch.cat(target_list, dim=0)
62 | thresh = np.quantile(pred.flatten().cpu().numpy(), perc)
63 | test_loss = loss_fn(pred, target).item()
64 | pred = (pred > thresh).float()
65 | test_iou = iou(pred, target).item()
66 | return (
67 | test_loss, test_iou)
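# Note added for clarity (not part of the original file): test_fn binarizes the raw
# model outputs by thresholding at the `perc` quantile of the predicted scores, where
# `perc` is the background-pixel fraction returned by get_label_dist, so the predicted
# positive rate roughly matches the empirical building-pixel rate before IoU is computed.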
--------------------------------------------------------------------------------
/image/Building_Footprint_Extraction/utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | from torchvision.models.segmentation import fcn_resnet50, fcn_resnet101, deeplabv3_resnet50, deeplabv3_resnet101
4 | from torchvision.models.segmentation.deeplabv3 import DeepLabHead
5 | from torchvision.models.segmentation.fcn import FCNHead
6 |
7 | import numpy as np
8 | import matplotlib.pyplot as plt
9 |
10 | from model import UNET, UNETPlus
11 |
12 | def get_model(model_type, pretrained):
13 | model = None
14 | if model_type == "fcn50":
15 | model = get_model_fcn50(pretrained)
16 |
17 | elif model_type == "fcn101":
18 | model = get_model_fcn101(pretrained)
19 |
20 | elif model_type == "dlv350":
21 | model = get_model_dlv350(pretrained)
22 |
23 | elif model_type == "dlv3101":
24 | model = get_model_dlv3101(pretrained)
25 |
26 | elif model_type == "unet":
27 | model = UNET(in_channels=3, out_channels=1)
28 |
29 | elif model_type == "unetplus":
30 | model = UNETPlus(n_channels=3, n_classes=1)
31 |
32 | return model
33 |
34 | def get_model_fcn50(pretrained=True, c_out=1):
35 | # Prepare Model and Save to Checkpoint Directory
36 | model = fcn_resnet50(pretrained=pretrained)
37 |
38 | model.classifier = FCNHead(2048, c_out)
39 | model.aux_classifier = None
40 | model = nn.DataParallel(model)
41 | return model
42 |
43 | def get_model_fcn101(pretrained=True, c_out=1):
44 | # Prepare Model and Save to Checkpoint Directory
45 | model = fcn_resnet101(pretrained=pretrained)
46 | model.classifier = FCNHead(2048, c_out)
47 | model.aux_classifier = None
48 | model = nn.DataParallel(model)
49 | return model
50 |
51 | def get_model_dlv350(pretrained=True, c_out=1):
52 | # Prepare Model and Save to Checkpoint Directory
53 | model = deeplabv3_resnet50(pretrained=pretrained)
54 | model.classifier = DeepLabHead(2048, c_out)
55 | model.aux_classifier = None
56 | model = nn.DataParallel(model)
57 |
58 | return model
59 |
60 | def get_model_dlv3101(pretrained=True, c_out=1):
61 | # Prepare Model and Save to Checkpoint Directory
62 | model = deeplabv3_resnet101(pretrained=pretrained)
63 | model.classifier = DeepLabHead(2048, c_out)
64 | model.aux_classifier = None
65 | model = nn.DataParallel(model)
66 | return model
67 |
68 | def save_checkpoint(model, opt, epoch, path, train_loss_list=[], val_loss_list=[]):
69 | """Save Checkpoint"""
70 |
71 | torch.save({
72 | "model": model.state_dict(),
73 | "opt": opt.state_dict(),
74 | "epoch": epoch,
75 | "train_loss_list": train_loss_list,
76 | "val_loss_list": val_loss_list
77 | },
78 | path)
79 |
80 |
81 | def save_viz(img, out, lbl, color_map, epoch, sample_path, perc):
82 | img = img.cpu().numpy()
83 | out = out.cpu().numpy()
84 | lbl = lbl.cpu().numpy()
85 |
86 | thresh = np.quantile(out, perc)
87 |
88 | print("thresh", thresh)
89 |
90 | img = (img - np.min(img)) / (np.max(img) - np.min(img))
91 | rows = out.shape[2]
92 | cols = out.shape[3]
93 |
94 | masks = []
95 | masks_gt = []
96 | for index, (im, o, l) in enumerate(zip(img, out, lbl)):
97 | o, l = o.squeeze(), l.squeeze()
98 |
99 | o = (o > thresh).astype(int)
100 |
101 |
102 | mask = np.zeros((rows, cols, 3), dtype=np.uint8)
103 | mask_gt = np.zeros((rows, cols, 3), dtype=np.uint8)
104 |
105 | for j in range(rows):
106 | for i in range(cols):
107 | mask[j, i] = color_map[o[j, i]]
108 | mask_gt[j, i] = color_map[l[j, i]]
109 |
110 |
111 |
112 | f, axarr = plt.subplots(1, 3, figsize=(20, 20))
113 | im = np.moveaxis(im, 0, -1)
114 | axarr[0].imshow(im)
115 | axarr[0].title.set_text('Image')
116 | axarr[1].imshow(mask_gt)
117 | axarr[1].title.set_text('Label')
118 | axarr[2].imshow(mask)
119 | axarr[2].title.set_text('Prediction')
120 | f.savefig( f"{sample_path}/epoch_{str(epoch)}_{str(index)}.jpg")
--------------------------------------------------------------------------------
/image/Defect_Detection/README.md:
--------------------------------------------------------------------------------
1 | # Defect Detection
2 |
3 | ## Overview
4 |
5 | Anomaly detection is an important task in computer vision that is concerned with identifying anomalous images given a training set of only normal images. In anomaly segmentation, the concept of anomaly detection is extended to the pixel level in order to identify anomalous regions of images. There are many applications of anomaly detection, including biomedical image segmentation, video surveillance and defect detection. In particular, defect detection involves detecting abnormalities in manufacturing components and is therefore widely used in industry to enhance quality assurance and efficiency in the production process (Bergmann et al., 2019). However, having a person manually inspect each component is not feasible in most cases. To address this, systems have been proposed to automate the detection of defective components. These approaches generally take as input an image of a component and output a label or pixel-level mask that predicts whether the image or pixel is anomalous. Although initial approaches were generally ineffective, newer deep learning based approaches have shown very strong performance in anomaly detection and segmentation. Thus, these new methods have the potential to dramatically increase quality assurance and efficiency. Several datasets have been proposed as benchmarks for anomaly detection, such as MNIST, CIFAR and UCSD, whereas there are far fewer benchmark datasets for the anomaly segmentation task. To address this, the MVTec Anomaly Detection dataset was recently introduced as a benchmark for anomaly segmentation.
6 |
7 | MVTec is focused on industrial inspection; consisting of a training set of normal images of objects and textures as well as a test set with both normal and anomalous samples along with their corresponding labels. There are over 70 different types of defects across the anomalous images that are typical in the manufacturing process. The quality and practical nature of the MVTec dataset has made it a popular benchmark for recently proposed anomaly segmentation methods. The goal of this focus phase of the project is to apply state-of-the-art methods to accurately segment anomalies in the MVTec dataset. In doing so, we compared the performance of different anomaly segmentation methods in the industrial inspection setting. Additionally, we sought to optimize the performance of the methods by altering the hyperparameters and architectures of the approaches.
8 |
9 | ## Dataset
10 | The MVTec anomaly detection dataset contains 5354 high-resolution images from 15 different object and texture categories and includes 70 different types of defects, typical of the manufacturing process, across the anomalous images. For each category, there is a training set of normal images as well as a test set with both normal and anomalous samples along with their corresponding labels.
11 |
12 |
13 |
14 |
15 |
16 | Figure 1: An example of inlier images (left) and labels (right) for multiple object categories in the MVTec dataset.
17 |
18 |
19 |
20 | ## Experimental Setup
21 | The MVTec dataset object categories each include a training set of normal samples and a test set of both normal and anomalous samples. Models were optimized to reconstruct samples from the inlier distribution during the training phase. Subsequently, at test time, both normal and anomalous images are input to the model and the pixel-wise reconstruction error is used to identify anomalous regions. Specifically, the models were evaluated on the test data for each object category and the average area under the ROC curve (AUC) is reported. A small validation set of normal images is used to determine which model checkpoint yields the best set of parameters; specifically, 10% of images were randomly removed from the training set and used as the validation set. For testing, the entire test set was used and the average AUC across object categories is reported for each method.
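As a rough sketch of this scoring scheme (assuming a trained reconstruction model `model`, a batch of test images `x` and their binary defect masks `masks`, all hypothetical names; the actual evaluation loop lives in **utils.py**):

```
import torch
from sklearn.metrics import roc_auc_score

with torch.no_grad():
    recon = model(x)                             # autoencoder reconstruction of x
error_map = torch.mean((x - recon) ** 2, dim=1)  # (N, H, W) pixel-wise squared error

# Pixel-level AUC of the reconstruction error against the ground-truth masks.
auc = roc_auc_score(masks.flatten().cpu().numpy(),
                    error_map.flatten().cpu().numpy())
```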
22 |
23 | ## Results
24 |
25 |
26 |
27 |
28 |
29 | Figure 2: Average AUC score on the test set for each approach.
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 | Figure 3: A visualization of the predictions generated by the network for an anomalous sample.
38 |
39 |
40 |
41 | ## Running Code
42 | To configure the environment to run the experiments, navigate to the base of this directory and execute the following commands:
43 |
44 | ```
45 | conda create -n new_env
46 | conda activate new_env
47 | pip install -r requirements.txt
48 | ```
49 |
50 | To obtain results for a specific architecture, simply pass the appropriate arguments to the **train.py** script:
51 | ```
52 | python train.py --model vae --epochs 10 --ckpt_path /path/to/checkpoint/folder --data_path /path/to/mvtec
53 | ```
54 |
55 | The **train.py** script has the following arguments:
56 | - **model** (str): Architecture variant for the experiments: ae or vae. *required*
57 | - **data_path** (str): The root directory of the dataset. *required*
58 | - **ckpt_path** (str): The directory to save model checkpoints. *required*
59 | - **epochs** (int): The number of epochs to train the model. Default 100.
60 | - **batch_size** (int): The batch size for training and testing. Default 8.
61 | - **learning_rate** (float): Learning rate of the model. Default .001.
62 | - **size** (int): Side length of the input image. Default 128.
63 |
64 |
--------------------------------------------------------------------------------
/image/Defect_Detection/datasets.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | import glob
4 | from PIL import Image
5 | import numpy as np
6 |
7 | from torch.utils.data import Dataset
8 |
9 | class MVTecADDataset(Dataset):
10 | def __init__(self, img_dir, mode, transform, size=128):
11 | self.img_dir = img_dir
12 | self.mode = mode
13 | self.size = size
14 |
15 | if self.mode == "train":
16 | self.img_paths = glob.glob(f"{self.img_dir}/train/good/*.png")
17 | else:
18 |
19 | paths = glob.glob(f"{self.img_dir}/test/*/*.png")
20 |
21 | inlier_img_paths = glob.glob(f"{self.img_dir}/test/good/*.png")
22 | outlier_img_paths = list(set(paths) - set(inlier_img_paths))
23 | self.img_paths = inlier_img_paths + outlier_img_paths
24 | self.outlier_lbl_paths = [f"{self.img_dir}/ground_truth/{path.split('/')[-2]}/{path.split('/')[-1][:-4]}_mask.png" for path in outlier_img_paths]
25 |
26 | self.outlier_lbl = np.array([np.array(Image.open(path).convert('1').resize((self.size, self.size))) for path in self.outlier_lbl_paths])
27 |
28 |
29 | self.inlier_lbl = np.zeros(shape=(len(inlier_img_paths), self.outlier_lbl.shape[1], self.outlier_lbl.shape[2]))
30 |
31 | self.labels = torch.from_numpy(np.concatenate([self.inlier_lbl, self.outlier_lbl])).int()
32 |
33 |
34 | self.transform = transform
35 |
36 | def __getitem__(self, index):
37 | if self.mode == "test":
38 | x = Image.open(self.img_paths[index]).convert("RGB")
39 | if self.transform is not None:
40 | x = self.transform(x)
41 |
42 |
43 | y = self.labels[index]
44 | return x, y
45 | else:
46 | x = Image.open(self.img_paths[index]).convert('RGB')
47 | if self.transform is not None:
48 | x = self.transform(x)
49 | return x
50 |
51 | def __len__(self):
52 | return len(self.img_paths)
--------------------------------------------------------------------------------
/image/Defect_Detection/model.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 |
6 | class Decoder(nn.Module):
7 | """
8 | The model architecture is taken from https://github.com/pytorch/examples/issues/70
9 | """
10 |
11 | def __init__(self, in_channels, dec_channels, hidden_dim):
12 | self.in_channels = in_channels
13 | self.dec_channels = dec_channels
14 | self.hidden_dim = hidden_dim
15 |
16 | super().__init__()
17 | self.main = nn.Sequential(
18 | # input is Z, going into a convolution
19 | nn.ConvTranspose2d(self.hidden_dim, self.dec_channels * 16, 4, 1, 0, bias=False),
20 | nn.BatchNorm2d(self.dec_channels * 16),
21 | nn.ReLU(True),
22 | # state size. (NGF*16) x 4 x 4
23 | nn.ConvTranspose2d(self.dec_channels * 16, self.dec_channels * 8, 4, 2, 1, bias=False),
24 | nn.BatchNorm2d(self.dec_channels * 8),
25 | nn.ReLU(True),
26 | # state size. (NGF*8) x 8 x 8
27 | nn.ConvTranspose2d(self.dec_channels * 8, self.dec_channels * 4, 4, 2, 1, bias=False),
28 | nn.BatchNorm2d(self.dec_channels * 4),
29 | nn.ReLU(True),
30 | # state size. (NGF*4) x 16 x 16
31 | nn.ConvTranspose2d(self.dec_channels * 4, self.dec_channels * 2, 4, 2, 1, bias=False),
32 | nn.BatchNorm2d(self.dec_channels * 2),
33 | nn.ReLU(True),
34 | # state size. (NGF*2) x 32 x 32
35 | nn.ConvTranspose2d(self.dec_channels * 2, self.dec_channels, 4, 2, 1, bias=False),
36 | nn.BatchNorm2d(self.dec_channels),
37 | nn.ReLU(True),
38 | # state size. (NGF) x 64 x 64
39 | nn.ConvTranspose2d(self.dec_channels, self.in_channels, 4, 2, 1, bias=False),
40 | nn.Sigmoid()
41 | # state size. (NC) x 128 x 128
42 | )
43 |
44 | def forward(self, x):
45 | return self.main(x)
46 |
47 |
48 | class Encoder(nn.Module):
49 | """
50 | The model architecture is taken from https://github.com/pytorch/examples/issues/70
51 | """
52 |
53 | def __init__(self, in_channels, enc_channels, hidden_dim):
54 | self.in_channels = in_channels
55 | self.enc_channels = enc_channels
56 | self.hidden_dim = hidden_dim
57 |
58 | super().__init__()
59 | self.main = nn.Sequential(
60 | # input is (NC) x 128 x 128
61 | nn.Conv2d(self.in_channels, self.enc_channels, 4, stride=2, padding=1, bias=False),
62 | nn.LeakyReLU(0.2, inplace=True),
63 | # state size. (NDF) x 64 x 64
64 | nn.Conv2d(self.enc_channels, self.enc_channels * 2, 4, stride=2, padding=1, bias=False),
65 | nn.BatchNorm2d(self.enc_channels * 2),
66 | nn.LeakyReLU(0.2, inplace=True),
67 | # state size. (NDF*2) x 32 x 32
68 | nn.Conv2d(self.enc_channels * 2, self.enc_channels * 4, 4, stride=2, padding=1, bias=False),
69 | nn.BatchNorm2d(self.enc_channels * 4),
70 | nn.LeakyReLU(0.2, inplace=True),
71 | # state size. (NDF*4) x 16 x 16
72 | nn.Conv2d(self.enc_channels * 4, self.enc_channels * 8, 4, stride=2, padding=1, bias=False),
73 | nn.BatchNorm2d(self.enc_channels * 8),
74 | nn.LeakyReLU(0.2, inplace=True),
75 | # state size. (NDF*8) x 8 x 8
76 | nn.Conv2d(self.enc_channels * 8, self.enc_channels * 16, 4, stride=2, padding=1, bias=False),
77 | nn.BatchNorm2d(self.enc_channels * 16),
78 | nn.LeakyReLU(0.2, inplace=True),
79 | # state size. (NDF*16) x 4 x 4
80 | nn.Conv2d(self.enc_channels * 16, self.hidden_dim, 4, stride=1, padding=0, bias=False),
81 | nn.Flatten(),
82 | )
83 |
84 | def forward(self, x):
85 | return self.main(x)
86 |
87 | def vae_loss_fn(x, recon_batch, mu, logvar):
88 |
89 | recon_loss = ae_loss_fn(x, recon_batch)
90 |
91 | KLD = torch.mean(-0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(),dim=1),dim=0)
92 |
93 | return recon_loss + KLD
94 |
95 | def ae_loss_fn(x, recon_batch):
96 | """Function taken and modified from
97 | https://github.com/pytorch/examples/tree/master/vae
98 | """
99 | MSE = ((x - recon_batch) ** 2).mean()
100 | return MSE
101 |
102 | class ConvVAE(nn.Module):
103 |
104 | def __init__(self, in_channels=3, enc_channels=128, dec_channels=128, hidden_dim=100):
105 | super().__init__()
106 |
107 | self.in_channels = in_channels
108 | self.enc_channels = enc_channels
109 | self.dec_channels = dec_channels
110 | self.hidden_dim = hidden_dim
111 |
112 | self.encoder = Encoder(self.in_channels, self.enc_channels, self.hidden_dim*2)
113 | self.decoder = Decoder(self.in_channels, self.dec_channels, self.hidden_dim)
114 |
115 | def reparameterize(self, mu, logvar):
116 | std = (0.5 * logvar).exp()
117 | eps = torch.randn_like(std)
118 |
119 | return mu + eps * std
120 |
121 | def forward(self, x):
122 | enc_out = self.encoder(x)
123 | mu, logvar = enc_out[..., :self.hidden_dim], enc_out[..., self.hidden_dim:]
124 | z = self.reparameterize(mu, logvar)
125 | recon_batch = self.decoder(z.unsqueeze(-1).unsqueeze(-1))
126 | return recon_batch, mu, logvar
127 |
128 | class AE(nn.Module):
129 |
130 | def __init__(self, in_channels=3, enc_channels=128, dec_channels=128, hidden_dim=100):
131 | super().__init__()
132 | self.in_channels = in_channels
133 | self.enc_channels = enc_channels
134 | self.dec_channels = dec_channels
135 | self.hidden_dim = hidden_dim
136 |
137 | self.encoder = Encoder(self.in_channels, self.enc_channels, self.hidden_dim)
138 | self.decoder = Decoder(self.in_channels, self.dec_channels, self.hidden_dim)
139 |
140 | def forward(self, x):
141 | enc_out = self.encoder(x)
142 | recon_batch = self.decoder(enc_out.unsqueeze(-1).unsqueeze(-1))
143 | return recon_batch
144 |
145 |
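# Illustrative usage, added for clarity (not part of the original file): with the five
# stride-2 stages above, both networks expect 128x128 RGB inputs.
#
#   x = torch.randn(8, 3, 128, 128)
#   recon, mu, logvar = ConvVAE()(x)   # recon: (8, 3, 128, 128)
#   recon = AE()(x)
#
# The KLD term in vae_loss_fn is the closed-form KL divergence between the approximate
# posterior N(mu, exp(logvar)) and the standard normal prior.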
--------------------------------------------------------------------------------
/image/Defect_Detection/requirements.txt:
--------------------------------------------------------------------------------
1 | anyio==3.7.0
2 | argon2-cffi==21.3.0
3 | argon2-cffi-bindings==21.2.0
4 | arrow==1.2.3
5 | asttokens==2.2.1
6 | attrs==23.1.0
7 | backcall==0.2.0
8 | beautifulsoup4==4.12.2
9 | bleach==6.0.0
10 | certifi==2023.5.7
11 | cffi==1.15.1
12 | charset-normalizer==3.1.0
13 | comm==0.1.3
14 | debugpy==1.6.7
15 | decorator==5.1.1
16 | defusedxml==0.7.1
17 | exceptiongroup==1.1.1
18 | executing==1.2.0
19 | fastjsonschema==2.17.1
20 | fqdn==1.5.1
21 | idna==3.4
22 | importlib-metadata==6.7.0
23 | ipykernel==6.23.3
24 | ipython==8.14.0
25 | ipython-genutils==0.2.0
26 | ipywidgets==8.0.6
27 | isoduration==20.11.0
28 | jedi==0.18.2
29 | Jinja2==3.1.2
30 | joblib==1.2.0
31 | jsonpointer==2.4
32 | jsonschema==4.17.3
33 | jupyter==1.0.0
34 | jupyter-console==6.6.3
35 | jupyter-events==0.6.3
36 | jupyter_client==8.3.0
37 | jupyter_core==5.3.1
38 | jupyter_server==2.6.0
39 | jupyter_server_terminals==0.4.4
40 | jupyterlab-pygments==0.2.2
41 | jupyterlab-widgets==3.0.7
42 | MarkupSafe==2.1.3
43 | matplotlib-inline==0.1.6
44 | mistune==3.0.1
45 | nbclassic==1.0.0
46 | nbclient==0.8.0
47 | nbconvert==7.6.0
48 | nbformat==5.9.0
49 | nest-asyncio==1.5.6
50 | notebook==6.5.4
51 | notebook_shim==0.2.3
52 | numpy==1.25.0
53 | overrides==7.3.1
54 | packaging==23.1
55 | pandas==2.0.2
56 | pandocfilters==1.5.0
57 | parso==0.8.3
58 | pexpect==4.8.0
59 | pickleshare==0.7.5
60 | Pillow==9.5.0
61 | platformdirs==3.8.0
62 | prometheus-client==0.17.0
63 | prompt-toolkit==3.0.38
64 | psutil==5.9.5
65 | ptyprocess==0.7.0
66 | pure-eval==0.2.2
67 | pycparser==2.21
68 | Pygments==2.15.1
69 | pyrsistent==0.19.3
70 | python-dateutil==2.8.2
71 | python-json-logger==2.0.7
72 | pytz==2023.3
73 | PyYAML==6.0
74 | pyzmq==25.1.0
75 | qtconsole==5.4.3
76 | QtPy==2.3.1
77 | requests==2.31.0
78 | rfc3339-validator==0.1.4
79 | rfc3986-validator==0.1.1
80 | scikit-learn==1.2.2
81 | scipy==1.10.1
82 | Send2Trash==1.8.2
83 | six==1.16.0
84 | sniffio==1.3.0
85 | soupsieve==2.4.1
86 | stack-data==0.6.2
87 | terminado==0.17.1
88 | threadpoolctl==3.1.0
89 | tinycss2==1.2.1
90 | torch==1.11.0
91 | torchvision==0.12.0
92 | tornado==6.3.2
93 | traitlets==5.9.0
94 | typing_extensions==4.6.3
95 | tzdata==2023.3
96 | uri-template==1.3.0
97 | urllib3==2.0.3
98 | wcwidth==0.2.6
99 | webcolors==1.13
100 | webencodings==0.5.1
101 | websocket-client==1.6.1
102 | widgetsnbextension==4.0.7
103 | zipp==3.15.0
104 |
--------------------------------------------------------------------------------
/image/Defect_Detection/train.py:
--------------------------------------------------------------------------------
1 | # system imports
2 | import os
3 | import logging
4 | import glob
5 | from pathlib import Path
6 | import re
7 | import argparse
8 |
9 | # external dependencies
10 | import torch
11 | import torch.nn as nn
12 | from torch.optim import Adam
13 | from torchvision import transforms
14 | from torch.utils.data import DataLoader
15 |
16 | # relative imports
17 | from model import AE, ConvVAE, ae_loss_fn, vae_loss_fn
18 | from datasets import MVTecADDataset
19 | from utils import train_step, test_step, save_checkpoint
20 |
21 | parser = argparse.ArgumentParser(description="Feature Memory for Anomaly Detection")
22 |
23 | # basic config
24 | parser.add_argument('--model', type=str, help='Architecture variation for experiments. ae or vae.')
25 | parser.add_argument('--epochs', type=int, default=100, help=' The number of epochs to train the model.')
26 | parser.add_argument('--batch_size', type=int, default=8, help=' The batch size for training, validation and testing.')
27 | parser.add_argument('--learning_rate', type=float, default=.001, help='Learning rates of model.')
28 | parser.add_argument('--size', type=int, default=128, help='Side length of input image')
29 | parser.add_argument('--data_path', type=str, help='The root directory of the dataset.')
30 | parser.add_argument('--ckpt_path', type=str, help='The directory to save model checkpoints.')
31 |
32 | args = parser.parse_args()
33 |
34 | # Data Paths
35 |
36 | CLASSES = ["toothbrush",
37 | "pill",
38 | "leather",
39 | "hazelnut",
40 | "capsule",
41 | "cable",
42 | "bottle",
43 | "zipper",
44 | "tile",
45 | "transistor",
46 | "wood",
47 | "metal_nut",
48 | "screw",
49 | "carpet",
50 | "grid"]
51 |
52 | DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
53 |
54 | def main():
55 |
56 | transform = transforms.Compose([
57 | transforms.ToTensor(),
58 | transforms.Resize(size=(args.size, args.size)),
59 | transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
60 | ])
61 |
62 | test_auc_list = []
63 | for inlier in CLASSES:
64 | # Prepare Data
65 | print("class", inlier)
66 | current_epoch = 0
67 | ckpt_path = f"{args.ckpt_path}/{inlier}.pth"
68 | img_dir = f"{args.data_path}/{inlier}"
69 | train_dataset = MVTecADDataset(img_dir, "train", transform)
70 | test_dataset = MVTecADDataset(img_dir, "test", transform, args.size)
71 |
72 | train_loader = DataLoader(train_dataset, batch_size=args.batch_size, shuffle=True)
73 | test_loader = DataLoader(test_dataset, batch_size=args.batch_size, shuffle=True)
74 |
75 |
76 | model = ConvVAE() if args.model == "vae" else AE()
77 | model = torch.nn.DataParallel(model)
78 |
79 | optimizer = Adam(model.parameters(), lr=args.learning_rate)
80 | save_checkpoint(model, optimizer, epoch=current_epoch, path=ckpt_path)
81 |
82 | loss_fn = vae_loss_fn if args.model == "vae" else ae_loss_fn
83 |
84 | highest_auc = 0
85 | while True:
86 | ckpt = torch.load(ckpt_path)
87 | epoch = ckpt["epoch"]
88 |
89 | if epoch == args.epochs:
90 | break
91 |
92 | model = ConvVAE() if args.model == "vae" else AE()
93 | model = nn.DataParallel(model)
94 | model.load_state_dict(ckpt["model"])
95 | model.to(DEVICE)
96 |
97 | model.train()
98 | train_loss = train_step(train_loader, model, optimizer, loss_fn, DEVICE, args.model)
99 |
100 | model.eval()
101 | test_auc, test_loss = test_step(test_loader, model, loss_fn, DEVICE, args.model)
102 |
103 | print(f"Train Loss: {str(train_loss)} \t Test AUC: {str(test_auc)}")
104 |
105 | if test_auc > highest_auc:
106 | highest_auc = test_auc
107 |
108 | save_checkpoint(model, optimizer, epoch + 1, ckpt_path)
109 |
110 | test_auc_list.append(highest_auc)
111 |
112 |
113 |     print(f"Average AUC: {sum(test_auc_list) / len(test_auc_list)}")
114 |
115 | ######################################################
116 |
117 | if __name__ == "__main__":
118 | main()
--------------------------------------------------------------------------------
/image/Defect_Detection/utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 |
4 | from sklearn.metrics import roc_auc_score, roc_curve
5 |
6 | def get_auc(preds, lbls):
7 | preds = preds.flatten().cpu().numpy()
8 | lbls = lbls.flatten().cpu().numpy()
9 |
10 | auc = roc_auc_score(lbls, preds)
11 | return auc
12 |
13 | def save_checkpoint(model, opt, epoch, path):
14 | """Save Checkpoint"""
15 |
16 | torch.save({
17 | "model": model.state_dict(),
18 | "opt": opt.state_dict(),
19 | "epoch": epoch
20 | },
21 | path)
22 |
23 |
24 | def train_step(loader, model, optimizer, loss_fn, device, model_str):
25 |
26 | train_loss_list = []
27 |
28 | for i, data in enumerate(loader):
29 | data = data.to(device)
30 | optimizer.zero_grad()
31 | if model_str == "vae":
32 | recon, mu, logvar = model(data)
33 | loss = loss_fn(data, recon, mu, logvar)
34 | else:
35 | recon = model(data)
36 | loss = loss_fn(data, recon)
37 | loss.backward()
38 | optimizer.step()
39 | train_loss_list.append(loss.item())
40 |
41 | return np.mean(train_loss_list)
42 |
43 | def test_step(loader, model, loss_fn, device, model_str):
44 |
45 | loss_list, error_map_list, lbl_list = [], [], []
46 | for i, (data, lbl) in enumerate(loader):
47 | data, lbl = data.to(device), lbl.to(device)
48 |
49 | with torch.no_grad():
50 | if model_str == "vae":
51 | recon, mu, logvar = model(data)
52 | loss = loss_fn(data, recon, mu, logvar)
53 | else:
54 | recon = model(data)
55 | loss = loss_fn(data, recon)
56 | loss_list.append(loss.item())
57 | error_map = torch.mean((data - recon)**2, dim=1).unsqueeze(1)
58 | error_map_list.append(error_map)
59 | lbl_list.append(lbl)
60 |
61 | error_maps = torch.cat(error_map_list, dim=0)
62 | lbls = torch.cat(lbl_list, dim=0)
63 | preds = (error_maps - torch.min(error_maps)) / (torch.max(error_maps) - torch.min(error_maps))
64 |
65 | auc = get_auc(preds, lbls)
66 | loss = np.mean(loss_list)
67 |
68 | return auc, loss
69 |
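# Note added for clarity (not part of the original file): the min-max scaling of the
# error maps above is a monotonic transform, so it does not change the ROC AUC; it only
# keeps the anomaly scores in [0, 1] for easier inspection and visualization.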
--------------------------------------------------------------------------------
/image/Road_Obstacle_Detection/README.md:
--------------------------------------------------------------------------------
1 | # Road Obstacle Detection
2 |
3 | ## Overview
4 |
5 | Detecting obstacles on the road or railway is a critical part of the driving task that has not yet been mastered by fully autonomous vehicles. Semantic segmentation plays an important role in addressing the challenge of identifying the locations of obstacles. In this phase of the project, we explore the application of semantic segmentation methods to the task of detecting road obstacles using the Lost and Found dataset. The goal of the experiments is to determine which model architecture is best suited to road obstacle detection, a question of interest to both practitioners and researchers.
6 |
7 | ## Dataset
8 | The Lost and Found dataset was introduced to evaluate the performance of small road obstacle detection approaches. It includes 2k images recorded in 13 different challenging street scenarios and features 37 different obstacle types. Each object is labeled with a unique ID, allowing for later refinement into subcategories. An overview of the Lost and Found dataset is shown below; the labels are refined into three classes: drivable area, non-drivable area and obstacles.
9 |
10 |
11 |
12 |
13 |
14 | Figure 1: The Lost and Found Dataset.
15 |
16 |
17 |
18 | ## Results
19 |
20 |
21 |
22 |
23 |
24 | Figure 2: The validation cross entropy loss for each model across epochs.
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 | Figure 3: Visual results comparing prediction made by each model for a test image.
33 |
34 |
35 |
36 | ## Running Code
37 | To configure the environment to run the experiments, navigate to the base of this directory and execute the following commands:
38 |
39 | ```
40 | conda create -n new_env
41 | conda activate new_env
42 | pip install -r requirements.txt
43 | ```
44 |
45 | To obtain results for a specific architecture, simply pass the appropriate arguments to the **train.py** script:
46 | ```
47 | python train.py --epochs 10 --batch_size 4
48 | ```
49 |
50 | The **train.py** script has the following arguments:
51 | - **epochs** (int): The number of epochs to train the model.
52 | - **batch_size** (int): The batch size for training, validation and testing.
53 | - **learning_rate** (float): Learning rate of the model.
54 | - **height** (int): Height of the input image.
55 | - **width** (int): Width of the input image.
56 | - **train_perc** (float): The proportion of samples used for training.
57 | - **data_path** (str): The root directory of the dataset.
58 | - **ckpt_path** (str): Path of the checkpoint file.
59 | - **best_ckpt_path** (str): Path of the checkpoint file for the best-performing model on the validation set.
60 | - **sample_path** (str): Directory in which to save example images.
61 |
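The defaults in **train.py** point at the shared-cluster copy of the Lost and Found dataset; a typical invocation with a local copy looks like the following (paths are placeholders):

```
python train.py --epochs 10 --batch_size 4 --data_path /path/to/lostandfound --ckpt_path ckpt/run_1.pth --sample_path samples
```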
62 |
63 |
--------------------------------------------------------------------------------
/image/Road_Obstacle_Detection/dice_loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Function
3 |
4 |
5 | class DiceCoeff(Function):
6 | """Dice coeff for individual examples"""
7 |
8 | def forward(self, input, target):
9 | self.save_for_backward(input, target)
10 | eps = 0.0001
11 | self.inter = torch.dot(input.view(-1), target.view(-1))
12 | self.union = torch.sum(input) + torch.sum(target) + eps
13 |
14 | t = (2 * self.inter.float() + eps) / self.union.float()
15 | return t
16 |
17 | # This function has only a single output, so it gets only one gradient
18 | def backward(self, grad_output):
19 |
20 | input, target = self.saved_variables
21 | grad_input = grad_target = None
22 |
23 | if self.needs_input_grad[0]:
24 | grad_input = grad_output * 2 * (target * self.union - self.inter) \
25 | / (self.union * self.union)
26 | if self.needs_input_grad[1]:
27 | grad_target = None
28 |
29 | return grad_input, grad_target
30 |
31 |
32 | def dice_coeff(input, target):
33 | """Dice coeff for batches"""
34 | if input.is_cuda:
35 | s = torch.FloatTensor(1).cuda().zero_()
36 | else:
37 | s = torch.FloatTensor(1).zero_()
38 |
39 | for i, c in enumerate(zip(input, target)):
40 | s = s + DiceCoeff().forward(c[0], c[1])
41 |
42 | return s / (i + 1)
43 |
--------------------------------------------------------------------------------
/image/Road_Obstacle_Detection/eval.py:
--------------------------------------------------------------------------------
1 | import torch.nn.functional as F
2 | import torch
3 | from tqdm import tqdm
4 |
5 | from dice_loss import dice_coeff
6 |
7 |
8 | def eval_net(net, loader, device):
9 | """Evaluation without the densecrf with the dice coefficient"""
10 | net.eval()
11 | mask_type = torch.long
12 |     n_val = len(loader)  # the number of batches
13 | tot = 0
14 |
15 | with tqdm(total=n_val, desc='Validation round', unit='batch', leave=False) as pbar:
16 | for batch in loader:
17 |             imgs, true_masks = batch  # the loader yields (image, mask) pairs
18 |             # true_masks = (true_masks > 0.5).float()
19 | imgs = imgs.to(device=device, dtype=torch.float32)
20 | true_masks = true_masks.to(device=device, dtype=mask_type)
21 |
22 | with torch.no_grad():
23 | mask_pred = net(imgs)
24 |
25 |
26 | tot += F.cross_entropy(mask_pred, true_masks).item()
27 | pbar.update()
28 |
29 | net.train()
30 | return tot / n_val
31 |
--------------------------------------------------------------------------------
/image/Road_Obstacle_Detection/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pickle
3 | import argparse
4 | from tqdm import tqdm
5 | import torch.nn.functional as F
6 |
7 | import numpy as np
8 | from PIL import Image
9 | import torch.utils.data as data
10 |
11 | import matplotlib.pyplot as plt
12 |
13 | import torch
14 | import torch.utils.data as data
15 | import torch.nn as nn
16 | from torch.utils.data import DataLoader
17 | from torch.nn import CrossEntropyLoss
18 |
19 | from torchvision.datasets import Cityscapes
20 | from torchvision.utils import make_grid
21 |
22 | from lf_loader import lostandfoundLoader
23 |
24 | from eval import eval_net
25 |
26 | from utils import train_step, val_step, get_model, save_viz, save_checkpoint
27 |
28 | parser = argparse.ArgumentParser(description="Feature Memory for Anomaly Detection")
29 |
30 | # basic config
31 | parser.add_argument('--epochs', type=int, default=2, help=' The number of epochs to train the memory.')
32 | parser.add_argument('--batch_size', type=int, default=4, help=' The batch size for training, validation and testing.')
33 | parser.add_argument('--learning_rate', type=float, default=3e-4, help='Learning rates of model.')
34 | parser.add_argument('--height', type=int, default=128, help='Height of input image')
35 | parser.add_argument('--width', type=int, default=256, help='Width of input image')
36 | parser.add_argument('--train_perc', type=float, default=.9, help='Proportion of samples to use in training set')
37 | parser.add_argument('--data_path', type=str, default="/scratch/ssd002/datasets/lostandfound", help='The root directory of the dataset.')
38 | parser.add_argument('--ckpt_path', type=str, default="ckpt/run_1.pth", help='The file to save model checkpoints.')
39 | parser.add_argument('--best_ckpt_path', type=str, default="ckpt/best_run_1.pth", help='The file to save best model checkpoint.')
40 | parser.add_argument('--sample_path', type=str, default="samples", help='The directory in which to save example images.')
41 |
42 |
43 | args = parser.parse_args()
44 |
45 | # Global Variables
46 | IMG_SIZE = (args.height, args.width) #H, W
47 | DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
48 | CURRENT_EPOCH = 0
49 |
50 | LF_MAP = {
51 | 0: (0, 0, 0),
52 | 1: (255, 0, 0),
53 | 2: (0, 255, 0),
54 | 3: (0, 0, 255),
55 | }
56 |
57 | def main():
58 |
59 | # Prepare Dataset and Dataloader
60 | dataset = lostandfoundLoader(args.data_path, is_transform=True, augmentations=None)
61 |
62 | train_size = int(len(dataset) * args.train_perc)
63 | val_size = len(dataset) - train_size
64 | train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])
65 |
66 | train_dataloader = data.DataLoader(train_dataset, batch_size=args.batch_size, num_workers=2)
67 | val_dataloader = data.DataLoader(val_dataset, batch_size=args.batch_size, num_workers=2)
68 |
69 | model = get_model(pretrained=True)
70 |
71 | # Loss and Optimizer
72 | criterion = CrossEntropyLoss()
73 | opt = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
74 |
75 |     # Save initial checkpoint to be subsequently restored from
76 | save_checkpoint(model, opt, epoch=CURRENT_EPOCH, path=args.ckpt_path)
77 |
78 | train_loss_list = []
79 | val_loss_list = []
80 | max_val_loss = 1e10
81 | while True:
82 | # Load checkpoint
83 | ckpt = torch.load(args.ckpt_path)
84 |
85 | epoch = ckpt["epoch"]
86 |
87 | if epoch == args.epochs:
88 | break
89 |
90 | model = get_model(pretrained=False)
91 | model.load_state_dict(ckpt["model"])
92 | model.to(DEVICE)
93 |
94 | opt = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
95 | opt.load_state_dict(ckpt["opt"])
96 |
97 | model.train()
98 | train_loss = train_step(model, opt, criterion, train_dataloader, epoch, DEVICE)
99 | train_loss_list.append(train_loss)
100 |
101 | model.eval()
102 | val_loss = val_step(model, criterion, val_dataloader, epoch, DEVICE, LF_MAP, args.sample_path)
103 | val_loss_list.append(val_loss)
104 |
105 |
106 | with open("train_loss.txt", "a") as myfile:
107 | myfile.write(f"{str(epoch)}\t{str(train_loss)}\n")
108 |
109 | with open("val_loss.txt", "a") as myfile:
110 | myfile.write(f"{str(epoch)}\t{str(val_loss)}\n")
111 |
112 |         if val_loss < max_val_loss:  # keep the checkpoint with the lowest validation loss
113 |             max_val_loss = val_loss
114 |             torch.save({
115 |                 "model": model.state_dict(),
116 |                 "opt": opt.state_dict(),
117 |                 "epoch": epoch,
118 |             }, args.best_ckpt_path)
119 |
120 | save_checkpoint(model, opt, epoch + 1, args.ckpt_path)
121 | model.cpu()
122 |
123 |
124 | f, axarr = plt.subplots(1, 2, figsize=(20,20))
125 | axarr[0].plot(train_loss_list)
126 | axarr[0].title.set_text("Train Loss")
127 | axarr[1].plot(val_loss_list)
128 | axarr[1].title.set_text("Validation Loss")
129 |
130 | fig_path = f"{args.sample_path}/loss_figure.jpg"
131 | f.savefig(fig_path)
132 |
133 | if __name__ == "__main__":
134 | main()
--------------------------------------------------------------------------------
/image/Road_Obstacle_Detection/train_loss.txt:
--------------------------------------------------------------------------------
1 | 0 0.16359891243630725
2 | 1 0.08223168643055556
3 |
--------------------------------------------------------------------------------
/image/Road_Obstacle_Detection/utils.py:
--------------------------------------------------------------------------------
1 | import tqdm
2 |
3 | import numpy as np
4 | import matplotlib.pyplot as plt
5 |
6 | import torch
7 | import torch.nn as nn
8 |
9 | from torchvision.models.segmentation import fcn_resnet50
10 |
11 |
12 | def train_step(model, opt, criterion, dataloader, epoch, device):
13 | losses = []
14 | counter = 0
15 | for i, (img, lbl) in enumerate(dataloader):
16 | lbl = lbl.long()
17 | img, lbl = img.to(device), lbl.to(device)
18 | opt.zero_grad()
19 | out = model(img)["out"]
20 | loss = criterion(out, lbl)
21 | loss.backward()
22 | opt.step()
23 | losses.append(loss.item())
24 |
25 | return np.mean(losses)
26 |
27 | def val_step(model, criterion, dataloader, epoch, device, lf_map, sample_path):
28 | losses = []
29 | dices = []
30 | viz = True
31 | for i, (img, lbl) in enumerate(dataloader):
32 | lbl = lbl.long()
33 | img, lbl = img.to(device), lbl.to(device)
34 |
35 | with torch.no_grad():
36 | out = model(img)["out"]
37 |
38 | loss = criterion(out, lbl)
39 | losses.append(loss.item())
40 |
41 | if viz:
42 | save_viz(img, out, lbl, lf_map, epoch, sample_path)
43 | viz = False
44 |
45 | return np.mean(losses)
46 |
47 | def save_viz(img, out, lbl, color_map, epoch, sample_path):
48 | img = img.cpu().numpy()
49 | out = out.cpu().numpy()
50 | lbl = lbl.cpu().numpy()
51 | rows = out.shape[2]
52 | cols = out.shape[3]
53 |
54 | masks = []
55 | masks_gt = []
56 | for index, (im, o, l) in enumerate(zip(img, out, lbl)):
57 | mask = np.zeros((rows, cols, 3), dtype=np.uint8)
58 | mask_gt = np.zeros((rows, cols, 3), dtype=np.uint8)
59 | for j in range(rows):
60 | for i in range(cols):
61 |                 mask[j, i] = color_map[np.argmax(o[:, j, i], axis=0)]  # predicted class index per pixel
62 | mask_gt[j, i] = color_map[l[j, i]]
63 |
64 | mask_path = f"{sample_path}/epoch_{str(epoch)}_pred_{str(index)}.jpg"
65 | lbl_path = f"{sample_path}/epoch_{str(epoch)}_lbl_{str(index)}.jpg"
66 | img_path = f"{sample_path}/epoch_{str(epoch)}_img_{str(index)}.jpg"
67 | f, axarr = plt.subplots(1, 3, figsize=(20, 20))
68 | im = np.moveaxis(im, 0, -1)
69 | axarr[0].imshow(im)
70 | axarr[0].title.set_text('Image')
71 | axarr[1].imshow(mask_gt)
72 | axarr[1].title.set_text('Label')
73 | axarr[2].imshow(mask)
74 | axarr[2].title.set_text('Prediction')
75 | f.savefig( f"{sample_path}/epoch_{str(epoch)}_{str(index)}.jpg")
76 |
77 | def get_model(pretrained=False):
78 | # Prepare Model and Save to Checkpoint Directory
79 | model = fcn_resnet50(pretrained=pretrained)
80 |     model.classifier[4] = nn.Conv2d(512, 4, kernel_size=(1, 1), stride=(1, 1))  # 4-class head; a 1x1 conv needs no padding
81 | model.aux_classifier = None
82 | model = nn.DataParallel(model)
83 | return model
84 |
85 | def save_checkpoint(model, opt, epoch, path):
86 | """Save Checkpoint"""
87 |
88 | torch.save({
89 | "model": model.state_dict(),
90 | "opt": opt.state_dict(),
91 | "epoch": epoch,
92 | },
93 | path)
94 |
--------------------------------------------------------------------------------
/image/Road_Obstacle_Detection/val_loss.txt:
--------------------------------------------------------------------------------
1 | 0 0.08754148219640438
2 | 1 0.1392884962260723
3 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VectorInstitute/Computer_Vision_Project/337d2dd041b575a31304c2052370b816bf92b2be/image/fastlane/OCR/charnet/__init__.py
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/config/__init__.py:
--------------------------------------------------------------------------------
1 | from .defaults import _C as cfg
2 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/config/defaults.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Malong Technologies Co., Ltd.
2 | # All rights reserved.
3 | #
4 | # Contact: github@malong.com
5 | #
6 | # This source code is licensed under the LICENSE file in the root directory of this source tree.
7 |
8 | from yacs.config import CfgNode as CN
9 |
10 |
11 | _C = CN()
12 |
13 | _C.INPUT_SIZE = 2280
14 | _C.SIZE_DIVISIBILITY = 1
15 | _C.WEIGHT= ""
16 |
17 | _C.CHAR_DICT_FILE = ""
18 | _C.WORD_LEXICON_PATH = ""
19 |
20 | _C.WORD_MIN_SCORE = 0.95
21 | _C.WORD_NMS_IOU_THRESH = 0.15
22 | _C.CHAR_MIN_SCORE = 0.25
23 | _C.CHAR_NMS_IOU_THRESH = 0.3
24 | _C.MAGNITUDE_THRESH = 0.2
25 |
26 | _C.WORD_STRIDE = 4
27 | _C.CHAR_STRIDE = 4
28 | _C.NUM_CHAR_CLASSES = 68
29 |
30 | _C.WORD_DETECTOR_DILATION = 1
31 | _C.RESULTS_SEPARATOR = chr(31)
32 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VectorInstitute/Computer_Vision_Project/337d2dd041b575a31304c2052370b816bf92b2be/image/fastlane/OCR/charnet/modeling/__init__.py
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/backbone/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VectorInstitute/Computer_Vision_Project/337d2dd041b575a31304c2052370b816bf92b2be/image/fastlane/OCR/charnet/modeling/backbone/__init__.py
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/backbone/decoder.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Malong Technologies Co., Ltd.
2 | # All rights reserved.
3 | #
4 | # Contact: github@malong.com
5 | #
6 | # This source code is licensed under the LICENSE file in the root directory of this source tree.
7 |
8 | from torch import nn
9 | from collections import OrderedDict
10 | from torch.functional import F
11 |
12 |
13 | class Decoder(nn.Module):
14 | def __init__(self, in_channels_list, out_channels):
15 | super(Decoder, self).__init__()
16 | self.backbone_feature_reduction = nn.ModuleList()
17 | self.top_down_feature_reduction = nn.ModuleList()
18 | for i, in_channels in enumerate(in_channels_list[::-1]):
19 | self.backbone_feature_reduction.append(
20 | self._conv1x1_relu(in_channels, out_channels)
21 | )
22 | if i < len(in_channels_list) - 2:
23 | self.top_down_feature_reduction.append(
24 | self._conv1x1_relu(out_channels, out_channels)
25 | )
26 |
27 | def _conv1x1_relu(self, in_channels, out_channels):
28 | return nn.Sequential(OrderedDict([
29 | ("conv", nn.Conv2d(
30 | in_channels, out_channels,
31 | kernel_size=1, stride=1,
32 | bias=False
33 | )),
34 | ("relu", nn.ReLU())
35 | ]))
36 |
37 | def forward(self, x):
38 | x = x[::-1] # to lowest resolution first
39 | top_down_feature = None
40 | for i, feature in enumerate(x):
41 | feature = self.backbone_feature_reduction[i](feature)
42 | if i == 0:
43 | top_down_feature = feature
44 | else:
45 | upsampled_feature = F.interpolate(
46 | top_down_feature,
47 | size=feature.size()[-2:],
48 | mode='bilinear',
49 | align_corners=True
50 | )
51 | if i < len(x) - 1:
52 | top_down_feature = self.top_down_feature_reduction[i - 1](
53 | feature + upsampled_feature
54 | )
55 | else:
56 | top_down_feature = feature + upsampled_feature
57 | return top_down_feature
58 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/backbone/hourglass.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Malong Technologies Co., Ltd.
2 | # All rights reserved.
3 | #
4 | # Contact: github@malong.com
5 | #
6 | # This source code is licensed under the LICENSE file in the root directory of this source tree.
7 |
8 | import torch
9 | from torch import nn
10 | import torch.nn.functional as F
11 |
12 |
13 | _norm_func = lambda num_features: nn.BatchNorm2d(num_features, eps=1e-5)
14 |
15 |
16 | def _make_layer(in_channels, out_channels, num_blocks, **kwargs):
17 | blocks = []
18 | blocks.append(Residual(in_channels, out_channels))
19 | for _ in range(1, num_blocks):
20 | blocks.append(Residual(out_channels, out_channels, **kwargs))
21 | return nn.Sequential(*blocks)
22 |
23 |
24 | def _make_layer_revr(in_channels, out_channels, num_blocks, **kwargs):
25 | blocks = []
26 | for _ in range(num_blocks - 1):
27 | blocks.append(Residual(in_channels, in_channels, **kwargs))
28 | blocks.append(Residual(in_channels, out_channels, **kwargs))
29 | return nn.Sequential(*blocks)
30 |
31 |
32 | class Residual(nn.Module):
33 | def __init__(self, in_channels, out_channels, stride=1):
34 | super(Residual, self).__init__()
35 |
36 | self.conv_1 = nn.Sequential(
37 | nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride, bias=False),
38 | _norm_func(out_channels),
39 | nn.ReLU()
40 | )
41 | self.conv_2 = nn.Sequential(
42 | nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, stride=1, bias=False),
43 | _norm_func(out_channels)
44 | )
45 | if stride != 1 or in_channels != out_channels:
46 | self.skip = nn.Sequential(
47 | nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride, bias=False),
48 | _norm_func(out_channels)
49 | )
50 | else:
51 | self.skip = None
52 | self.out_relu = nn.ReLU()
53 |
54 | def forward(self, x):
55 | b1 = self.conv_2(self.conv_1(x))
56 | if self.skip is None:
57 | return self.out_relu(b1 + x)
58 | else:
59 | return self.out_relu(b1 + self.skip(x))
60 |
61 |
62 | class HourGlassBlock(nn.Module):
63 | def __init__(self, n, channels, blocks):
64 | super(HourGlassBlock, self).__init__()
65 |
66 | self.up_1 = _make_layer(channels[0], channels[0], blocks[0])
67 | self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
68 | self.low_1 = _make_layer(channels[0], channels[1], blocks[0])
69 | if n <= 1:
70 | self.low_2 = _make_layer(channels[1], channels[1], blocks[1])
71 | else:
72 | self.low_2 = HourGlassBlock(n - 1, channels[1:], blocks[1:])
73 | self.low_3 = _make_layer_revr(channels[1], channels[0], blocks[0])
74 |
75 | def forward(self, x):
76 | upsample = lambda input: F.interpolate(input, scale_factor=2, mode='bilinear', align_corners=True)
77 | up_1 = self.up_1(x)
78 | low = self.low_3(self.low_2(self.low_1(self.pool(x))))
79 | return upsample(low) + up_1
80 |
81 |
82 | class HourGlassNet(nn.Module):
83 | def __init__(self, n, channels, blocks):
84 | super(HourGlassNet, self).__init__()
85 | self.pre = nn.Sequential(
86 | nn.Conv2d(3, 128, kernel_size=7, stride=2, padding=3, bias=False),
87 | _norm_func(128),
88 | nn.ReLU(),
89 | Residual(128, 256, stride=2)
90 | )
91 | hourglass_blocks = []
92 | for _ in range(2):
93 | hourglass_blocks.append(
94 | HourGlassBlock(n, channels, blocks)
95 | )
96 | self.hourglass_blocks = nn.Sequential(*hourglass_blocks)
97 |
98 | def forward(self, x):
99 | return self.hourglass_blocks(self.pre(x))
100 |
101 |
102 | def hourglass88():
103 | return HourGlassNet(3, [256, 256, 256, 512], [2, 2, 2, 2])
104 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/layers/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Malong Technologies Co., Ltd.
2 | # All rights reserved.
3 | #
4 | # Contact: github@malong.com
5 | #
6 | # This source code is licensed under the LICENSE file in the root directory of this source tree.
7 |
8 | from .misc import Conv2d
9 | from .misc import ConvTranspose2d
10 | from .misc import BatchNorm2d
11 | from .misc import interpolate
12 | from .scale import Scale
13 |
14 |
15 | __all__ = [
16 | "Conv2d",
17 | "ConvTranspose2d",
18 | "interpolate",
19 | "BatchNorm2d",
20 | "Scale"
21 | ]
22 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/layers/misc.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | """
3 | helper class that supports empty tensors on some nn functions.
4 |
5 | Ideally, add support directly in PyTorch to empty tensors in
6 | those functions.
7 |
8 | This can be removed once https://github.com/pytorch/pytorch/issues/12013
9 | is implemented
10 | """
11 |
12 | import math
13 | import torch
14 | from torch.nn.modules.utils import _ntuple
15 |
16 |
17 | class _NewEmptyTensorOp(torch.autograd.Function):
18 | @staticmethod
19 | def forward(ctx, x, new_shape):
20 | ctx.shape = x.shape
21 | return x.new_empty(new_shape)
22 |
23 | @staticmethod
24 | def backward(ctx, grad):
25 | shape = ctx.shape
26 | return _NewEmptyTensorOp.apply(grad, shape), None
27 |
28 |
29 | class Conv2d(torch.nn.Conv2d):
30 | def forward(self, x):
31 | if x.numel() > 0:
32 | return super(Conv2d, self).forward(x)
33 | # get output shape
34 |
35 | output_shape = [
36 | (i + 2 * p - (di * (k - 1) + 1)) // d + 1
37 | for i, p, di, k, d in zip(
38 | x.shape[-2:], self.padding, self.dilation, self.kernel_size, self.stride
39 | )
40 | ]
41 | output_shape = [x.shape[0], self.weight.shape[0]] + output_shape
42 | return _NewEmptyTensorOp.apply(x, output_shape)
43 |
44 |
45 | class ConvTranspose2d(torch.nn.ConvTranspose2d):
46 | def forward(self, x):
47 | if x.numel() > 0:
48 | return super(ConvTranspose2d, self).forward(x)
49 | # get output shape
50 |
51 | output_shape = [
52 | (i - 1) * d - 2 * p + (di * (k - 1) + 1) + op
53 | for i, p, di, k, d, op in zip(
54 | x.shape[-2:],
55 | self.padding,
56 | self.dilation,
57 | self.kernel_size,
58 | self.stride,
59 | self.output_padding,
60 | )
61 | ]
62 | output_shape = [x.shape[0], self.bias.shape[0]] + output_shape
63 | return _NewEmptyTensorOp.apply(x, output_shape)
64 |
65 |
66 | class BatchNorm2d(torch.nn.BatchNorm2d):
67 | def forward(self, x):
68 | if x.numel() > 0:
69 | return super(BatchNorm2d, self).forward(x)
70 | # get output shape
71 | output_shape = x.shape
72 | return _NewEmptyTensorOp.apply(x, output_shape)
73 |
74 |
75 | def interpolate(
76 | input, size=None, scale_factor=None, mode="nearest", align_corners=None
77 | ):
78 | if input.numel() > 0:
79 | return torch.nn.functional.interpolate(
80 | input, size, scale_factor, mode, align_corners
81 | )
82 |
83 | def _check_size_scale_factor(dim):
84 | if size is None and scale_factor is None:
85 | raise ValueError("either size or scale_factor should be defined")
86 | if size is not None and scale_factor is not None:
87 | raise ValueError("only one of size or scale_factor should be defined")
88 | if (
89 | scale_factor is not None
90 | and isinstance(scale_factor, tuple)
91 | and len(scale_factor) != dim
92 | ):
93 | raise ValueError(
94 | "scale_factor shape must match input shape. "
95 | "Input is {}D, scale_factor size is {}".format(dim, len(scale_factor))
96 | )
97 |
98 | def _output_size(dim):
99 | _check_size_scale_factor(dim)
100 | if size is not None:
101 | return size
102 | scale_factors = _ntuple(dim)(scale_factor)
103 | # math.floor might return float in py2.7
104 | return [
105 | int(math.floor(input.size(i + 2) * scale_factors[i])) for i in range(dim)
106 | ]
107 |
108 | output_shape = tuple(_output_size(2))
109 | output_shape = input.shape[:-2] + output_shape
110 | return _NewEmptyTensorOp.apply(input, output_shape)
111 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/layers/scale.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Malong Technologies Co., Ltd.
2 | # All rights reserved.
3 | #
4 | # Contact: github@malong.com
5 | #
6 | # This source code is licensed under the LICENSE file in the root directory of this source tree.
7 |
8 | import torch
9 | from torch import nn
10 |
11 |
12 | class Scale(nn.Module):
13 | def __init__(self, init_value=1.0):
14 | super(Scale, self).__init__()
15 | self.scale = nn.Parameter(torch.FloatTensor([init_value]))
16 |
17 | def forward(self, input):
18 | return input * self.scale
19 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/charnet/modeling/utils.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Malong Technologies Co., Ltd.
2 | # All rights reserved.
3 | #
4 | # Contact: github@malong.com
5 | #
6 | # This source code is licensed under the LICENSE file in the root directory of this source tree.
7 |
8 | import math
9 |
10 |
11 | def rotate_rect(x1, y1, x2, y2, degree, center_x, center_y):
12 | points = [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]
13 | new_points = list()
14 | for point in points:
15 | dx = point[0] - center_x
16 | dy = point[1] - center_y
17 | new_x = center_x + dx * math.cos(degree) - dy * math.sin(degree)
18 | new_y = center_y + dx * math.sin(degree) + dy * math.cos(degree)
19 | new_points.append([(new_x), (new_y)])
20 | return new_points
21 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/configs/icdar2015_hourglass88.yaml:
--------------------------------------------------------------------------------
1 | INPUT_SIZE: 2280
2 | WEIGHT: "weights/icdar2015_hourglass88.pth"
3 | CHAR_DICT_FILE: "datasets/ICDAR2015/test/char_dict.txt"
4 | WORD_LEXICON_PATH: "datasets/ICDAR2015/test/GenericVocabulary.txt"
5 | RESULTS_SEPARATOR: ","
6 | SIZE_DIVISIBILITY: 128
7 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/datasets/ICDAR2015/test/char_dict.txt:
--------------------------------------------------------------------------------
1 | a0
2 | b1
3 | c2
4 | d3
5 | e4
6 | f5
7 | g6
8 | h7
9 | i8
10 | j9
11 | k10
12 | l11
13 | m12
14 | n13
15 | o14
16 | p15
17 | q16
18 | r17
19 | s18
20 | t19
21 | u20
22 | v21
23 | w22
24 | x23
25 | y24
26 | z25
27 | 026
28 | 127
29 | 228
30 | 329
31 | 430
32 | 531
33 | 632
34 | 733
35 | 834
36 | 935
37 | !36
38 | #37
39 | "38
40 | %39
41 | $40
42 | '41
43 | &42
44 | )43
45 | (44
46 | +45
47 | *46
48 | -47
49 | ,48
50 | /49
51 | .50
52 | ;51
53 | :52
54 | =53
55 | <54
56 | ?55
57 | >56
58 | @57
59 | [58
60 | ]59
61 | \60
62 | _61
63 | ^62
64 | `63
65 | {64
66 | }65
67 | |66
68 | ~67
69 |
--------------------------------------------------------------------------------
/image/fastlane/OCR/sample.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VectorInstitute/Computer_Vision_Project/337d2dd041b575a31304c2052370b816bf92b2be/image/fastlane/OCR/sample.jpg
--------------------------------------------------------------------------------
/image/fastlane/Object_Detection/dataset.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | from torch.utils.data import Dataset
3 | import pandas as pd
4 | import torch
5 | import numpy as np
6 | import os
7 | from PIL import Image
8 | from utils import iou_width_height
9 |
10 | class YOLODataset(Dataset):
11 | def __init__(
12 | self,
13 | csv_file,
14 | img_dir,
15 | label_dir,
16 | anchors,
17 | image_size=416,
18 | S=[13, 26, 52],
19 | C=20,
20 | transform=None,
21 | ):
22 | self.annotations = pd.read_csv(csv_file)
23 | self.img_dir = img_dir
24 | self.label_dir = label_dir
25 | self.image_size = image_size
26 | self.transform = transform
27 | self.S = S
28 | self.anchors = torch.tensor(anchors[0] + anchors[1] + anchors[2]) # for all 3 scales
29 | self.num_anchors = self.anchors.shape[0]
30 | self.num_anchors_per_scale = self.num_anchors // 3
31 | self.C = C
32 | self.ignore_iou_thresh = 0.5
33 |
34 | def __len__(self):
35 | return len(self.annotations)
36 |
37 | def __getitem__(self, index):
38 | label_path = os.path.join(self.label_dir, self.annotations.iloc[index, 1])
39 | bboxes = np.roll(np.loadtxt(fname=label_path, delimiter=" ", ndmin=2), 4, axis=1).tolist()
40 | img_path = os.path.join(self.img_dir, self.annotations.iloc[index, 0])
41 | image = np.array(Image.open(img_path).convert("RGB"))
42 |
43 | if self.transform:
44 | augmentations = self.transform(image=image, bboxes=bboxes)
45 | image = augmentations["image"]
46 | bboxes = augmentations["bboxes"]
47 |
48 | # Below assumes 3 scale predictions (as paper) and same num of anchors per scale
49 | targets = [torch.zeros((self.num_anchors // 3, S, S, 6)) for S in self.S]
50 | for box in bboxes:
51 | iou_anchors = iou_width_height(torch.tensor(box[2:4]), self.anchors)
52 | anchor_indices = iou_anchors.argsort(descending=True, dim=0)
53 | x, y, width, height, class_label = box
54 | has_anchor = [False] * 3 # each scale should have one anchor
55 | for anchor_idx in anchor_indices:
56 | scale_idx = anchor_idx // self.num_anchors_per_scale
57 | anchor_on_scale = anchor_idx % self.num_anchors_per_scale
58 | S = self.S[scale_idx]
59 | i, j = int(S * y), int(S * x) # which cell
60 | anchor_taken = targets[scale_idx][anchor_on_scale, i, j, 0]
61 | if not anchor_taken and not has_anchor[scale_idx]:
62 | targets[scale_idx][anchor_on_scale, i, j, 0] = 1
63 | x_cell, y_cell = S * x - j, S * y - i # both between [0,1]
64 | width_cell, height_cell = (
65 | width * S,
66 | height * S,
67 | ) # can be greater than 1 since it's relative to cell
68 | box_coordinates = torch.tensor(
69 | [x_cell, y_cell, width_cell, height_cell]
70 | )
71 | targets[scale_idx][anchor_on_scale, i, j, 1:5] = box_coordinates
72 | targets[scale_idx][anchor_on_scale, i, j, 5] = int(class_label)
73 | has_anchor[scale_idx] = True
74 |
75 | elif not anchor_taken and iou_anchors[anchor_idx] > self.ignore_iou_thresh:
76 | targets[scale_idx][anchor_on_scale, i, j, 0] = -1 # ignore prediction
77 |
78 | return image, tuple(targets)
79 |
--------------------------------------------------------------------------------
/image/fastlane/Object_Detection/models.py:
--------------------------------------------------------------------------------
1 | """
2 | Implementation of YOLOv3 architecture
3 | """
4 |
5 | import torch
6 | import torch.nn as nn
7 |
8 | """
9 | Information about architecture config:
10 | Tuple is structured by (filters, kernel_size, stride)
11 | Every conv is a same convolution.
12 | List is structured by "B" indicating a residual block followed by the number of repeats
13 | "S" is for scale prediction block and computing the yolo loss
14 | "U" is for upsampling the feature map and concatenating with a previous layer
15 | """
16 | config = [
17 | (32, 3, 1),
18 | (64, 3, 2),
19 | ["B", 1],
20 | (128, 3, 2),
21 | ["B", 2],
22 | (256, 3, 2),
23 | ["B", 8],
24 | (512, 3, 2),
25 | ["B", 8],
26 | (1024, 3, 2),
27 | ["B", 4], # To this point is Darknet-53
28 | (512, 1, 1),
29 | (1024, 3, 1),
30 | "S",
31 | (256, 1, 1),
32 | "U",
33 | (256, 1, 1),
34 | (512, 3, 1),
35 | "S",
36 | (128, 1, 1),
37 | "U",
38 | (128, 1, 1),
39 | (256, 3, 1),
40 | "S",
41 | ]
42 |
43 |
44 | class CNNBlock(nn.Module):
45 | def __init__(self, in_channels, out_channels, bn_act=True, **kwargs):
46 | super().__init__()
47 | self.conv = nn.Conv2d(in_channels, out_channels, bias=not bn_act, **kwargs)
48 | self.bn = nn.BatchNorm2d(out_channels)
49 | self.leaky = nn.LeakyReLU(0.1)
50 | self.use_bn_act = bn_act
51 |
52 | def forward(self, x):
53 | if self.use_bn_act:
54 | return self.leaky(self.bn(self.conv(x)))
55 | else:
56 | return self.conv(x)
57 |
58 |
59 | class ResidualBlock(nn.Module):
60 | def __init__(self, channels, use_residual=True, num_repeats=1):
61 | super().__init__()
62 | self.layers = nn.ModuleList()
63 | for repeat in range(num_repeats):
64 | self.layers += [
65 | nn.Sequential(
66 | CNNBlock(channels, channels // 2, kernel_size=1),
67 | CNNBlock(channels // 2, channels, kernel_size=3, padding=1),
68 | )
69 | ]
70 |
71 | self.use_residual = use_residual
72 | self.num_repeats = num_repeats
73 |
74 | def forward(self, x):
75 | for layer in self.layers:
76 | if self.use_residual:
77 | x = x + layer(x)
78 | else:
79 | x = layer(x)
80 |
81 | return x
82 |
83 |
84 | class ScalePrediction(nn.Module):
85 | def __init__(self, in_channels, num_classes):
86 | super().__init__()
87 | self.pred = nn.Sequential(
88 | CNNBlock(in_channels, 2 * in_channels, kernel_size=3, padding=1),
89 | CNNBlock(
90 | 2 * in_channels, (num_classes + 5) * 3, bn_act=False, kernel_size=1
91 | ),
92 | )
93 | self.num_classes = num_classes
94 |
95 | def forward(self, x):
96 | return (
97 | self.pred(x)
98 | .reshape(x.shape[0], 3, self.num_classes + 5, x.shape[2], x.shape[3])
99 | .permute(0, 1, 3, 4, 2)
100 | )
101 |
102 |
103 | class YOLOv3(nn.Module):
104 | def __init__(self, in_channels=3, num_classes=80):
105 | super().__init__()
106 | self.num_classes = num_classes
107 | self.in_channels = in_channels
108 | self.layers = self._create_conv_layers()
109 |
110 | def forward(self, x):
111 | outputs = [] # for each scale
112 | route_connections = []
113 | for layer in self.layers:
114 | if isinstance(layer, ScalePrediction):
115 | outputs.append(layer(x))
116 | continue
117 |
118 | x = layer(x)
119 |
120 | if isinstance(layer, ResidualBlock) and layer.num_repeats == 8:
121 | route_connections.append(x)
122 |
123 | elif isinstance(layer, nn.Upsample):
124 | x = torch.cat([x, route_connections[-1]], dim=1)
125 | route_connections.pop()
126 |
127 | return outputs
128 |
129 | def _create_conv_layers(self):
130 | layers = nn.ModuleList()
131 | in_channels = self.in_channels
132 |
133 | for module in config:
134 | if isinstance(module, tuple):
135 | out_channels, kernel_size, stride = module
136 | layers.append(
137 | CNNBlock(
138 | in_channels,
139 | out_channels,
140 | kernel_size=kernel_size,
141 | stride=stride,
142 | padding=1 if kernel_size == 3 else 0,
143 | )
144 | )
145 | in_channels = out_channels
146 |
147 | elif isinstance(module, list):
148 | num_repeats = module[1]
149 | layers.append(ResidualBlock(in_channels, num_repeats=num_repeats,))
150 |
151 | elif isinstance(module, str):
152 | if module == "S":
153 | layers += [
154 | ResidualBlock(in_channels, use_residual=False, num_repeats=1),
155 | CNNBlock(in_channels, in_channels // 2, kernel_size=1),
156 | ScalePrediction(in_channels // 2, num_classes=self.num_classes),
157 | ]
158 | in_channels = in_channels // 2
159 |
160 | elif module == "U":
161 | layers.append(nn.Upsample(scale_factor=2),)
162 | in_channels = in_channels * 3
163 |
164 | return layers
165 |
166 |
167 | if __name__ == "__main__":
168 | num_classes = 20
169 | IMAGE_SIZE = 416
170 | model = YOLOv3(num_classes=num_classes)
171 | x = torch.randn((2, 3, IMAGE_SIZE, IMAGE_SIZE))
172 | out = model(x)
173 | assert model(x)[0].shape == (2, 3, IMAGE_SIZE//32, IMAGE_SIZE//32, num_classes + 5)
174 | assert model(x)[1].shape == (2, 3, IMAGE_SIZE//16, IMAGE_SIZE//16, num_classes + 5)
175 | assert model(x)[2].shape == (2, 3, IMAGE_SIZE//8, IMAGE_SIZE//8, num_classes + 5)
176 | print("Success!")
177 |
--------------------------------------------------------------------------------
/image/fastlane/Object_Detection/skynews-boeing-737-plane_5435020.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VectorInstitute/Computer_Vision_Project/337d2dd041b575a31304c2052370b816bf92b2be/image/fastlane/Object_Detection/skynews-boeing-737-plane_5435020.jpg
--------------------------------------------------------------------------------
/image/fastlane/README.md:
--------------------------------------------------------------------------------
1 | # Vector Fastlane
2 |
3 | You need to have conda installed on your machine. Follow these instructions:
4 |
5 | ```
6 | conda create -n pytorch181 python=3.9
7 | conda activate pytorch181
8 | conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
9 | pip install albumentations scikit-learn scikit-image matplotlib opencv-python yacs joblib natsort h5py tqdm
10 | pip install gdown addict future pyyaml requests scipy yapf editdistance pyclipper pandas==1.4.0 shapely==2.0.1
11 | ```
12 |
13 | You can download the datasets and pretrained weights from this [link](https://drive.google.com/drive/folders/1qqK1uQsgkj0MT7yOhx33mTRlISy27QCA?usp=share_link).
14 |
--------------------------------------------------------------------------------
/video/Galbladder_Segmentation/GallbladderFiles/NOGO1_319 via_project_14May2021_13h54m.json:
--------------------------------------------------------------------------------
1 | {"_via_settings":{"ui":{"annotation_editor_height":25,"annotation_editor_fontsize":0.8,"leftsidebar_width":18,"image_grid":{"img_height":80,"rshape_fill":"none","rshape_fill_opacity":0.3,"rshape_stroke":"yellow","rshape_stroke_width":2,"show_region_shape":true,"show_image_policy":"all"},"image":{"region_label":"__via_region_id__","region_color":"__via_default_region_color__","region_label_font":"10px Sans","on_image_annotation_editor_placement":"NEAR_REGION"}},"core":{"buffer_size":18,"filepath":{},"default_filepath":""},"project":{"name":"NOGO1_319 via_project_14May2021_13h54m"}},"_via_img_metadata":{"frame_317_endo.png426401":{"filename":"frame_317_endo.png","size":426401,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[386,412,435,443,436,405,394],"all_points_y":[384,388,385,379,348,356,364]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[494,545,592,625,623,626,675,705,726,761,783,727,416,443,448,447,463,498],"all_points_y":[241,274,284,294,322,332,332,339,345,358,357,475,471,409,381,367,359,287]},"region_attributes":{}}],"file_attributes":{}},"frame_318_endo.png446373":{"filename":"frame_318_endo.png","size":446373,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[386,412,435,443,436,405,394],"all_points_y":[377,381,378,372,341,349,357]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[494,545,618,625,623,626,673,705,726,761,783,727,416,443,448,447,463,498],"all_points_y":[241,274,291,294,322,332,327,339,345,358,357,475,471,409,381,367,359,287]},"region_attributes":{}}],"file_attributes":{}},"frame_319_endo.png429032":{"filename":"frame_319_endo.png","size":429032,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[382,408,431,439,432,401,390],"all_points_y":[375,379,376,370,339,347,355]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[492,543,584,623,621,624,671,703,724,759,781,725,414,441,446,445,461,496],"all_points_y":[242,275,280,295,323,333,328,340,346,359,358,476,472,410,382,368,360,288]},"region_attributes":{}}],"file_attributes":{}},"frame_311_endo.png432467":{"filename":"frame_311_endo.png","size":432467,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[376,402,425,433,426,411,384],"all_points_y":[386,390,387,381,350,350,366]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[486,520,561,589,613,610,616,667,699,721,771,722,403,429,436,433,457,480],"all_points_y":[245,272,286,289,301,323,342,340,345,355,373,477,477,421,385,372,357,315]},"region_attributes":{}}],"file_attributes":{}},"frame_312_endo.png443219":{"filename":"frame_312_endo.png","size":443219,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[376,402,425,433,426,411,384],"all_points_y":[386,390,387,381,350,350,366]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[488,522,563,591,615,612,618,669,701,723,773,724,405,431,438,435,459,482],"all_points_y":[245,272,286,289,301,323,342,340,345,355,373,477,477,421,385,372,357,315]},"region_attributes":{}}],"file_attributes":{}},"frame_313_endo.png425114":{"filename":"frame_313_endo.png","size":425114,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[376,402,425,433,426,411,384],"all_points_y":[386,390,387,381,350,350,366]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[489,524,565,593,617,614,620,671,703,725,775,726,407,433,441,437,461,484],"all_points_y":[252,273,287,290,302,324,343,341,346,356,374,478,478
,422,399,373,358,316]},"region_attributes":{}}],"file_attributes":{}},"frame_314_endo.png438937":{"filename":"frame_314_endo.png","size":438937,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[379,405,428,436,429,414,387],"all_points_y":[389,393,390,384,353,353,369]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[491,526,567,595,619,616,622,673,705,727,777,728,409,435,443,439,463,486],"all_points_y":[254,275,289,292,304,326,345,343,348,358,376,480,480,424,401,375,360,318]},"region_attributes":{}}],"file_attributes":{}},"frame_315_endo.png391587":{"filename":"frame_315_endo.png","size":391587,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[379,405,428,436,429,414,387],"all_points_y":[389,393,390,384,353,353,369]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[488,539,586,619,617,620,669,699,720,755,777,724,409,437,442,441,457,492],"all_points_y":[248,281,291,301,329,339,339,346,352,365,364,475,473,416,388,374,366,294]},"region_attributes":{}}],"file_attributes":{}},"frame_316_endo.png438069":{"filename":"frame_316_endo.png","size":438069,"regions":[{"shape_attributes":{"name":"polygon","all_points_x":[379,405,428,436,429,414,387],"all_points_y":[389,393,390,384,353,353,369]},"region_attributes":{}},{"shape_attributes":{"name":"polygon","all_points_x":[490,541,588,621,619,622,671,701,722,757,779,723,412,439,444,443,459,494],"all_points_y":[243,276,286,296,324,334,334,341,347,360,359,477,473,411,383,369,361,289]},"region_attributes":{}}],"file_attributes":{}}},"_via_attributes":{"region":{},"file":{}},"_via_data_format_version":"2.0.10","_via_image_id_list":["frame_317_endo.png426401","frame_318_endo.png446373","frame_319_endo.png429032","frame_311_endo.png432467","frame_312_endo.png443219","frame_313_endo.png425114","frame_314_endo.png438937","frame_315_endo.png391587","frame_316_endo.png438069"]}
--------------------------------------------------------------------------------
/video/Galbladder_Segmentation/README.md:
--------------------------------------------------------------------------------
1 | # Gallbladder Segmentation
2 |
3 | To work with this project, it is a prerequisite to install detectron2 and its dependencies. The instructions for this are on the project website: https://github.com/facebookresearch/detectron2
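For reference, one common way to install detectron2 at the time of writing is a pip install straight from the repository; consult the linked instructions for the build matching your PyTorch and CUDA versions:

```
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
```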
4 |
5 | ## Training on custom classes for a personal project in detectron2:
6 |
7 | Detectron2 has a prespecified workflow for common machine learning datasets such as COCO, Pascal VOC, and Cityscapes. It also provides configurations for the tasks that can be performed on these datasets, such as object detection and the different types of segmentation (see the "detectron2/configs/" folder). However, some additions are required to use detectron2 with custom projects and external datasets. In our case, we are using detectron2 to detect the No-Go-Zone in laparoscopic surgery.
8 |
9 | To enable this, we first had to register the dataset under MetadataCatalog and DatasetCatalog. The dataset must be in a specific list-of-dictionaries format (keys: file_name, image_id, height, width, annotations). Next, we simply had to call the DatasetCatalog and MetadataCatalog objects to register the training and evaluation splits of the dataset and the classes within it. A tutorial for this can be found in the official detectron2 Colab notebook at "https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5", in the "Train on a custom dataset" section. A rough sketch of the registration step is shown below.
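As a rough illustration only (the `get_bladder_dicts` helper, the class name, and the exact dataset names are placeholders rather than this project's actual code; only `bladder_val` appears later in this README), the detectron2 calls look roughly like this:

```python
from detectron2.data import DatasetCatalog, MetadataCatalog


def get_bladder_dicts(split):
    # Hypothetical loader returning the list-of-dicts format described above:
    # each record carries file_name, image_id, height, width and an
    # "annotations" list of {bbox, bbox_mode, segmentation, category_id} dicts.
    ...


for split in ["train", "val"]:
    DatasetCatalog.register(f"bladder_{split}", lambda split=split: get_bladder_dicts(split))
    MetadataCatalog.get(f"bladder_{split}").set(thing_classes=["nogo"])
```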
10 |
11 | Also, to perform periodic evaluation during training, we followed the recommendations given for the build_evaluator method in detectron2's defaults.py module. We added a class called MyTrainer, which inherits DefaultTrainer from detectron2.engine. Another addition was the LossEvalHook class, which inherits HookBase from detectron2.engine.hooks. This lets us register our own events on which evaluation steps automatically take place during training; a sketch of the trainer is shown below.
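A minimal sketch of that trainer customization, using detectron2's standard APIs (the LossEvalHook call is left as a comment because its constructor is project-specific):

```python
import os

from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator


class MyTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # Periodic COCO-style evaluation on the registered validation split.
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        return COCOEvaluator(dataset_name, output_dir=output_folder)

    def build_hooks(self):
        hooks = super().build_hooks()
        # Insert the custom validation-loss hook before the final writer, e.g.
        # hooks.insert(-1, LossEvalHook(...))
        return hooks
```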
12 |
13 | ## Changes made to default demo workflow:
14 |
15 | Another change required was for creating an output video. In the VisualizationDemo class of predictor.py, we have to make sure the metadata it picks up is from our dataset, so we had to use the line
16 |
17 | `self.metadata = MetadataCatalog.get("bladder_val")`
18 |
19 | instead of the old line used to set the self.metadata variable in the __init__ method. To keep the colour the same across all frames, we had to add a line in video_visualizer.py which hard-codes the colour by building a list that repeats the same RGB code. For example, we added `colors = [[0, 0.502, 0.502]] * 10` after the line where the colour is obtained.
20 |
21 | For video output smoothing, we added some lines of code that accumulate the area of the segmentation predictions over an interval and output the result to the video, so the predictions look stable for that number of frames. These changes were made in video_visualizer.py and predictor.py. They mainly consist of a buffer value that sets the interval over which the prediction mask area is averaged, a way to retain the masks until the buffer criterion is met, and a signal to the draw_instance_predictions method in video_visualizer.py. The underlying idea is sketched below.
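The smoothing code itself is specific to this project, but the general idea can be sketched independently of detectron2's classes (the buffer size and threshold below are purely illustrative):

```python
from collections import deque

import numpy as np


class MaskSmoother:
    """Average binary masks over the last `buffer` frames so that the drawn
    prediction stays stable for that many frames."""

    def __init__(self, buffer=10):
        self.masks = deque(maxlen=buffer)

    def update(self, mask):
        self.masks.append(mask.astype(np.float32))
        # Re-threshold the running average to obtain a stable binary mask.
        return np.mean(self.masks, axis=0) > 0.5
```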
22 |
23 | ## Instructions to run training and inference:
24 |
25 | To run the training, there is a SLURM batch file called runt4v1Detectron.slrm. Essentially, that file runs the command:
26 |
27 | ```python
28 | python DetectronGBScript.py
29 | --wd
30 | --ims
31 | --lr
32 | --e
33 | --roi
34 | --d