├── README.md ├── README_zh.md ├── anchors.json ├── assets ├── dataset.png ├── loss_curve.png ├── net_structure.png └── qrcodes.png ├── data_generator ├── README.md ├── README_zh.md ├── generate_qrcode.py └── generate_training_data.py ├── data_loader └── dataset.py ├── evaluate.py ├── gradio_demo.py ├── models ├── README.md ├── loss.py └── yolov3.py ├── requirements.txt ├── test.py ├── test_images ├── 1.jpg ├── 2.jpg └── 3.jpg ├── train.py └── utils ├── anchor_generator.py ├── kmean.py └── util.py /README.md: -------------------------------------------------------------------------------- 1 | [中文](README_zh.md) 2 | # QRCode Detection 3 | Deep learning based QRCode detection. 4 | 5 | ## Introduction 6 | This project uses a deep learning algorithm to detect QRCodes in images. 7 | We achieve fast, high-precision detection with a yolov3-like detector. 8 | 9 | Features: 10 | 11 | + Fast detection, more than 190 fps on a GTX 1060. 12 | + High precision 13 | Evaluation results on the validation data 14 | |Precision|Recall|Mean IOU| 15 | | ---- | ---- |----| 16 | |0.987|0.819|0.798| 17 | 18 | + Flexible deployment 19 | 20 | ## Installation 21 | Make sure Python 3 is available on your machine. 22 | ```shell 23 | git clone https://github.com/cosimo17/QRCodeDetection.git 24 | cd QRCodeDetection 25 | pip install -r requirements.txt 26 | 27 | ``` 28 | ## Test 29 | To test with the pretrained model, please download the pretrained weight file from [here](https://drive.google.com/file/d/1lqlQySkYehgkVJjZtRnYAICla7qSnxeG/view?usp=sharing). 30 | ```shell 31 | python3 test.py \ 32 | -w yolo_qrcode.h5 \ 33 | -i test_images/1.jpg \ 34 | -o ./result_1.jpg 35 | ``` 36 | 37 | ## Training 38 | * Before starting training, please check [How to prepare the dataset](data_generator/README.md) 39 | * Run the k-means algorithm to generate prior anchor boxes 40 | ```shell 41 | python3 utils/kmean.py \ 42 | --root_dir your_dataset_dir \ 43 | -n 6 44 | ``` 45 | 46 | Execute the following command to start training: 47 | ```shell 48 | python3 train.py \ 49 | -d your_dataset_dir \ 50 | -b 64 \ 51 | -e 80 52 | ``` 53 | You can run ```python3 train.py --help``` to get help. 54 | During training, you can use tensorboard to visualize the loss curve. 55 | ```shell 56 | tensorboard --logdir=./logs 57 | ``` 58 | ![loss](assets/loss_curve.png)
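If you have annotated your own dataset, you can also fine-tune from the pretrained weights instead of training from scratch. A sketch of such an invocation (the dataset directory is a placeholder; `-w` loads an existing weight file and `-lr` sets the initial learning rate, both documented in `python3 train.py --help`):
```shell
python3 train.py \
    -d your_own_dataset_dir \
    -b 64 \
    -e 20 \
    -w yolo_qrcode.h5 \
    -lr 0.0001
```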
59 | 60 | ## Evaluate 61 | Execute the following command to evaluate the model's performance: 62 | ```shell 63 | python3 evaluate.py \ 64 | -d your_dataset_dir \ 65 | -b 64 \ 66 | --score_threshold 0.5 \ 67 | --iou_threshold 0.5 \ 68 | -w yolo_qrcode.h5 69 | ``` 70 | 71 | ## TODO 72 | - [ ] Integrate decode module 73 | - [ ] Support docker container 74 | - [ ] Support openvino 75 | - [ ] Support tensorrt 76 | - [ ] Support tflite 77 | -------------------------------------------------------------------------------- /README_zh.md: -------------------------------------------------------------------------------- 1 | [In English](README.md) 2 | # 二维码检测 3 | 基于深度学习的二维码检测 4 | 5 | ## 介绍 6 | 这是一个基于深度学习算法的二维码检测项目,通过一个类似yolov3的目标检测网络,实现了快速,高精度的二维码检测。 7 | 特性: 8 | + 快速, 在GTX 1060显卡上可以达到大于190的FPS 9 | + 高精度 10 | 在验证集上的测试结果 11 | |Precision|Recall|Mean IOU| 12 | | ---- | ---- |----| 13 | |0.987|0.819|0.798| 14 | 15 | + 多样化部署 16 | 17 | ## 安装 18 | ```shell 19 | git clone https://github.com/cosimo17/QRCodeDetection.git 20 | cd QRCodeDetection 21 | pip install -r requirements.txt 22 | ``` 23 | 24 | ## 测试 25 | 测试前,请先从 [这里](https://drive.google.com/file/d/1lqlQySkYehgkVJjZtRnYAICla7qSnxeG/view?usp=sharing) 下载预训练好的模型。 26 | ```shell 27 | python3 test.py \ 28 | -w yolo_qrcode.h5 \ 29 | -i test_images/1.jpg \ 30 | -o ./result_1.jpg 31 | ``` 32 | 33 | ## 训练 34 | * 训练自己的模型之前,请先查看[如何准备训练数据集](data_generator/README_zh.md) 35 | * 运行聚类算法,为数据集生成先验的锚点(anchor box) 36 | ```shell 37 | python3 utils/kmean.py \ 38 | --root_dir your_dataset_dir \ 39 | -n 6 40 | ``` 41 | 42 | * 使用如下命令启动训练 43 | ```shell 44 | python3 train.py \ 45 | -d your_dataset_dir \ 46 | -b 64 \ 47 | -e 80 48 | ``` 49 | 可以运行```python3 train.py --help```来查看参数含义和帮助信息 50 | 51 | * 在训练过程中,你可以使用tensorboard来监控loss的收敛曲线 52 | ```shell 53 | tensorboard --logdir=./logs 54 | ``` 55 | ![loss](assets/loss_curve.png) 56 | 57 | ## 评估 58 | 运行如下命令来评估模型的性能: 59 | ```shell 60 | python3 evaluate.py \ 61 | -d your_dataset_dir \ 62 | -b 64 \ 63 | --score_threshold 0.5 \ 64 | --iou_threshold 0.5 \ 65 | -w yolo_qrcode.h5 66 | ``` 67 | 68 | ## TODO 69 | - [ ] 集成解码模块 70 | - [ ] 支持docker 71 | - [ ] 支持openvino 72 | - [ ] 支持tensorrt 73 | - [ ] 支持tflite 74 | -------------------------------------------------------------------------------- /anchors.json: -------------------------------------------------------------------------------- 1 | { 2 | "anchors": [ 3 | [ 4 | 0.21708803813877386, 5 | 0.19077461920925146 6 | ], 7 | [ 8 | 0.14457228401955907, 9 | 0.14482037475851203 10 | ], 11 | [ 12 | 0.24115443125042058, 13 | 0.24174995569171495 14 | ], 15 | [ 16 | 0.2798744647627279, 17 | 0.28599593416090546 18 | ], 19 | [ 20 | 0.17124336032308185, 21 | 0.17623757236501922 22 | ], 23 | [ 24 | 0.18998317083452929, 25 | 0.2249816123708392 26 | ] 27 | ] 28 | } -------------------------------------------------------------------------------- /assets/dataset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosimo17/QRCodeDetection/865e5421c44d16db5ceb48e899cecb56823b3db9/assets/dataset.png -------------------------------------------------------------------------------- /assets/loss_curve.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosimo17/QRCodeDetection/865e5421c44d16db5ceb48e899cecb56823b3db9/assets/loss_curve.png
-------------------------------------------------------------------------------- /assets/net_structure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosimo17/QRCodeDetection/865e5421c44d16db5ceb48e899cecb56823b3db9/assets/net_structure.png -------------------------------------------------------------------------------- /assets/qrcodes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosimo17/QRCodeDetection/865e5421c44d16db5ceb48e899cecb56823b3db9/assets/qrcodes.png -------------------------------------------------------------------------------- /data_generator/README.md: -------------------------------------------------------------------------------- 1 | [中文](README_zh.md) 2 | # How to prepare training data 3 | 4 | You can use generated fake data to train this model, or collect your own dataset for training. 5 | We suggest training on fake data first, then fine-tuning on your own dataset. 6 | 7 | ## Generate fake dataset 8 | We provide two scripts for data generation. 9 | * Generate QRCode images 10 | ```shell 11 | mkdir qrcodes 12 | python3 data_generator/generate_qrcode.py \ 13 | -n 1500 \ 14 | -o qrcodes 15 | ``` 16 | 17 | * Prepare some background images (such as ImageNet or Open Images). 18 | 19 | * Generate training data 20 | ```shell 21 | python3 generate_training_data.py \ 22 | -fg qrcodes \ 23 | -bg your_dir \ 24 | -o training_ds \ 25 | -n 40000 \ 26 | --shape 256 27 | ``` 28 | The generated data looks like the following: 29 | ![dataset](../assets/dataset.png) 30 | We have already generated 40000 images and labels. You can download them from here: [dataset](https://drive.google.com/file/d/1Mv9fC8e4-IJq3MLQ_QA846o4TTjn-9ui/view?usp=sharing) 31 | 32 | ## Prepare your own dataset 33 | Of course, you can also prepare your own dataset. 34 | * Collect image data 35 | 36 | * Annotate your images 37 | You can use any tool you like to annotate your images. [Labelme](https://github.com/wkentaro/labelme) is a good choice. 38 | * Convert the label format 39 | After annotation, you should convert the label format. 40 | ``` 41 | training_ds 42 | ------------ 43 | | 44 | |---000001.jpg 45 | |---000001.txt 46 | |---000002.jpg 47 | |---000002.txt 48 | |---... 49 | |---... 50 | |---... 51 | |---xxxxxx.jpg 52 | |---xxxxxx.txt 53 | ``` 54 | For each image, there should be a txt file with the same name as the image. 55 | The format of the txt file: 56 | cx,cy,w,h,1.0,1.0 57 | cx,cy,w,h,1.0,1.0 58 | Each line represents one QRCode object. 59 | cx,cy are the center coordinates; w,h are the width and height. All coordinates are normalized to [0, 1].
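For reference, a minimal sketch of writing one label file in this format (the file name and box values below are made-up placeholders; adapt the conversion to your annotation tool's output). Note that the training loader (`data_loader/dataset.py`) only reads the first four values of each line:
```python
# Hypothetical example: write the labels for training_ds/000001.jpg.
# Each line: cx, cy, w, h, confidence, class (confidence and class are written as 1.0).
boxes = [(0.42, 0.37, 0.20, 0.18), (0.71, 0.64, 0.15, 0.16)]  # normalized cx, cy, w, h
with open('training_ds/000001.txt', 'w') as f:
    for cx, cy, w, h in boxes:
        f.write('{},{},{},{},1.0,1.0\n'.format(cx, cy, w, h))
```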
-------------------------------------------------------------------------------- /data_generator/README_zh.md: -------------------------------------------------------------------------------- 1 | [In English](README.md) 2 | # 如何准备训练数据集 3 | 4 | 你可以使用生成的虚拟数据集进行训练,也可以使用自己在实际场景中收集标注的数据进行训练。 5 | 建议先使用虚拟数据集进行训练,然后再在自己的数据集上进行微调。 6 | 7 | ## 生成数据集 8 | 这里提供了两个程序来生成虚拟的训练数据。 9 | * 生成二维码图片 10 | ```shell 11 | mkdir qrcodes 12 | # 生成1500张二维码图片,保存在qrcodes目录下 13 | python3 data_generator/generate_qrcode.py \ 14 | -n 1500 \ 15 | -o qrcodes 16 | ``` 17 | * 准备一些图片作为背景 18 | 19 | * 合成二维码 20 | ```shell 21 | python3 generate_training_data.py \ 22 | -fg qrcodes \ 23 | -bg your_dir \ 24 | -o training_ds \ 25 | -n 40000 \ 26 | --shape 256 27 | ``` 28 | 生成的数据集如下所示: 29 | ![数据集示意](../assets/dataset.png) 30 | 31 | 预先使用这两个脚本生成了40000张图片数据和标签,你可以从这里下载到它: [数据集](https://drive.google.com/file/d/1Mv9fC8e4-IJq3MLQ_QA846o4TTjn-9ui/view?usp=sharing) 32 | 33 | ## 创建自己的数据集 34 | 你也可以创建自己的数据集 35 | * 采集图片 36 | * 标注图片 37 | 你可以使用任何你喜欢的标注工具来标注自己的数据,比如[labelme](https://github.com/wkentaro/labelme) 等 38 | * 确保标注格式符合要求 39 | 请将标注工具生成的标签转换为如下格式: 40 | ``` 41 | training_ds 42 | ------------ 43 | | 44 | |---000001.jpg 45 | |---000001.txt 46 | |---000002.jpg 47 | |---000002.txt 48 | |---... 49 | |---... 50 | |---... 51 | |---xxxxxx.jpg 52 | |---xxxxxx.txt 53 | ``` 54 | 每一张图片,都应该有一个同名的txt文件与之对应 55 | txt的格式如下 56 | cx,cy,w,h,1.0,1.0 57 | cx,cy,w,h,1.0,1.0 58 | 每一行表示一个二维码对象 59 | cx,cy表示边界框的中心点坐标,w,h表示边界框的宽和高,所有的坐标都被归一化到[0-1]之间. 60 | -------------------------------------------------------------------------------- /data_generator/generate_qrcode.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | import qrcode 4 | import argparse 5 | 6 | chars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 7 | 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 8 | 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 9 | 'w', 'x', 'y', 'z', 10 | 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 11 | 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 12 | 'W', 'X', 'Y', 'Z', 13 | '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', 14 | '-', '+', '/', '?', ','] 15 | 16 | def get_args(): 17 | parser = argparse.ArgumentParser() 18 | parser.add_argument('--number', '-n', type=int, 19 | default=1000, help='How many qrcode images will be generated') 20 | parser.add_argument('--min_length', '-min', type=int, 21 | default=10, help='min length of the string encoded in qrcode') 22 | parser.add_argument('--max_length', '-max', type=int, 23 | default=25, help='max length of the string encoded in qrcode') 24 | parser.add_argument('--output_dir', '-o', type=str, required=True, 25 | help='Dir to save the result') 26 | parser.add_argument('--size', '-s', type=int, default=6, 27 | help='qrcode image pixel size') 28 | parser.add_argument('--version', '-v', type=int, default=1, 29 | help='version of the qrcode') 30 | args = parser.parse_args() 31 | return args 32 | 33 | def random_length(min_length, max_length): 34 | return np.random.randint(min_length, max_length) 35 | 36 | def random_index(length): 37 | return np.random.randint(0, len(chars), size=(length,)) 38 | 39 | def string_from_index(index): 40 | s = '' 41 | for i in index: 42 | s += chars[i] 43 | return s 44 | 45 | def string2qrcode(string, version, size): 46 | img = qrcode.make(string, version=version, box_size=size) 47 | return img 48 | 49 | def run(args): 50 | for i in range(args.number): 51 | 
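        # each QRCode encodes a random string whose length is drawn from [min_length, max_length)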
print("Generating {}/{} qrcode image".format(i, args.number)) 52 | length = random_length(args.min_length, args.max_length) 53 | index = random_index(length) 54 | string = string_from_index(index) 55 | img = string2qrcode(string, args.version, args.size) 56 | imgname = '{:04d}.jpg'.format(i) 57 | imgname = os.path.join(args.output_dir, imgname) 58 | img.save(imgname) 59 | 60 | def main(): 61 | args = get_args() 62 | run(args) 63 | 64 | if __name__ == '__main__': 65 | main() -------------------------------------------------------------------------------- /data_generator/generate_training_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import cv2 4 | import argparse 5 | import imgaug.augmenters as iaa 6 | import tqdm 7 | 8 | MIN_W = 32 9 | MIN_H = 32 10 | 11 | aug_seq = iaa.Sequential([ 12 | iaa.Crop(px=(0, 10)), 13 | iaa.GaussianBlur(sigma=(0.0, 4)), 14 | iaa.Sometimes(0.5, iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.5 * 255), per_channel=0.5)), 15 | iaa.Affine( 16 | scale={"x": (0.3, 0.5), "y": (0.3, 0.5)}, 17 | rotate=(-25, 25), 18 | shear=(-15, 15) 19 | ), 20 | iaa.PerspectiveTransform(scale=(0.01, 0.1)) 21 | ]) 22 | 23 | 24 | def augment_process_fg(image): 25 | image_aug = aug_seq.augment_image(image) 26 | return image_aug 27 | 28 | 29 | def get_args(): 30 | parser = argparse.ArgumentParser() 31 | parser.add_argument('--fg_dir', '-fg', type=str, help='path to foreground qrcode images') 32 | parser.add_argument('--bg_dir', '-bg', type=str, help='path to background images') 33 | parser.add_argument('--output', '-o', type=str, help='path to save the generated images') 34 | parser.add_argument('--number', '-n', type=int, help='how many images you want to generate') 35 | parser.add_argument('--size', '-s', type=str, default='(32,120)', help='size range of the qrcode image') 36 | parser.add_argument('--alpha', '-a', type=str, default='(10,30)', help='value range of the alpha parameter') 37 | parser.add_argument('--object_number', '-on', type=str, default='(1,5)', 38 | help='the number of qrcode image in one background image') 39 | parser.add_argument('--shape', type=int, default=256, help='training data shape') 40 | parser.add_argument('--debug', type=bool, default=False, help='debug mode') 41 | args = parser.parse_args() 42 | args.size = eval(args.size) # string to tuple 43 | args.alpha = eval(args.alpha) 44 | args.object_number = eval(args.object_number) 45 | return args 46 | 47 | 48 | class ImageLists(object): 49 | def __init__(self, root_dir, shape=None): 50 | imgnames = os.listdir(root_dir) 51 | imgnames = [os.path.join(root_dir, imgname) for imgname in imgnames] 52 | self.imgnames = imgnames 53 | self.shape = shape 54 | 55 | def __getitem__(self, item): 56 | item = item % len(self.imgnames) 57 | imgname = self.imgnames[item] 58 | img = cv2.imread(imgname) 59 | if self.shape is not None: 60 | img = cv2.resize(img, self.shape) 61 | return img 62 | 63 | def __len__(self): 64 | return len(self.imgnames) 65 | 66 | 67 | def random_position(bg_size, fg_size): 68 | bg_w, bg_h = bg_size 69 | w, h = fg_size 70 | xmin = np.random.randint(0, bg_w - w) 71 | ymin = np.random.randint(0, bg_h - h) 72 | xmax = xmin + w 73 | ymax = ymin + h 74 | bbox = [xmin, ymin, xmax, ymax] 75 | return bbox 76 | 77 | 78 | def random_resize(img, size_range): 79 | h, w = img.shape[:2] 80 | min_size, max_size = size_range 81 | new_w = np.random.randint(min_size, max_size) 82 | new_h = int(new_w * h / w) 83 | img = cv2.resize(img, 
(new_w, new_h)) 84 | return img 85 | 86 | 87 | def try_random_resize(fg_img, size_range): 88 | _fg_img = fg_img.copy() 89 | while True: 90 | fg_img = _fg_img.copy() 91 | # augment 92 | fg_img = augment_process_fg(fg_img) 93 | mask = np.argwhere(fg_img > 0) 94 | box = (np.min(mask[..., 0]), 95 | np.min(mask[..., 1]), 96 | np.max(mask[..., 0]), 97 | np.max(mask[..., 1])) 98 | if box[2] - box[0] < MIN_W or box[3] - box[1] < MIN_H: 99 | continue 100 | fg_img = fg_img[box[0]:box[2], box[1]:box[3], ...] 101 | fg_img = random_resize(fg_img, size_range) 102 | 103 | break 104 | return fg_img 105 | 106 | 107 | def try_random_position(fg_img, bg_size, exist_bbox): 108 | fg_size = [fg_img.shape[1], fg_img.shape[0]] 109 | count = 0 110 | max_count = 20 111 | while True: 112 | if count > max_count: 113 | return None 114 | bbox = random_position(bg_size, fg_size) 115 | intersection = False 116 | for bx in exist_bbox: 117 | if is_overlap(bx, bbox): 118 | intersection = True 119 | break 120 | if not intersection: 121 | break 122 | count += 1 123 | return bbox 124 | 125 | 126 | def is_overlap(box1, box2): 127 | if box1[0] > box2[2] or box2[0] > box1[2]: 128 | return False 129 | if box1[1] > box2[3] or box2[1] > box1[3]: 130 | return False 131 | return True 132 | 133 | 134 | def paste(fg_img, bg_img, bbox, alpha): 135 | # crop qrcode image 136 | fg_img = fg_img[0:bbox[3] - bbox[1], 0:bbox[2] - bbox[0], ...] 137 | mask = np.nonzero(fg_img > np.random.randint(15, 50)) 138 | bg_crop = bg_img[bbox[1]:bbox[3], bbox[0]:bbox[2], ...] 139 | # alpha fusion with random parameter 140 | alpha = np.random.randint(alpha[0], alpha[1]) / 100.0 141 | bg_crop[mask] = (bg_crop[mask] * alpha + fg_img[mask] * (1 - alpha)).astype(np.uint8) 142 | bg_img[bbox[1]:bbox[3], bbox[0]:bbox[2], :] = bg_crop 143 | return bg_img 144 | 145 | 146 | def normalize_coordinate(bbox, shape): 147 | """Convert absolute coordinates to relative coordinates""" 148 | xmin, ymin, xmax, ymax = bbox 149 | w = xmax - xmin 150 | h = ymax - ymin 151 | xmin /= shape[1] 152 | ymin /= shape[0] 153 | w /= shape[1] 154 | h /= shape[0] 155 | return xmin, ymin, w, h 156 | 157 | 158 | def save_result(output_dir, img, count, labels): 159 | imgname = '{:06d}.jpg'.format(count) 160 | imgname = os.path.join(output_dir, imgname) 161 | cv2.imwrite(imgname, img) 162 | labels_name = imgname.replace('.jpg', '.txt') 163 | with open(labels_name, 'w') as f: 164 | for i in range(len(labels) - 1): 165 | f.write(labels[i] + '\n') 166 | f.write(labels[-1]) 167 | 168 | 169 | def visualize(img, bbox): 170 | xmin, ymin, xmax, ymax = bbox 171 | cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 1) 172 | cv2.imshow('img', img) 173 | cv2.waitKey(0) 174 | 175 | 176 | def generate_training_data(args): 177 | """ 178 | Generate fake training data. 179 | 1. select one background image 180 | 2. select one or more qrcode images 181 | 3. do some image augment for qrcode image (add noise, blur, affine ...) 182 | 4. resize qrcode image to a random size 183 | 5. 
paste the qrcode image to a random location in the background image (alpha fusion) 184 | """ 185 | if not os.path.exists(args.output): 186 | os.mkdir(args.output) 187 | bg_imgs = ImageLists(args.bg_dir, [args.shape] * 2) 188 | fg_imgs = ImageLists(args.fg_dir) 189 | count = 0 190 | with tqdm.tqdm(total=args.number) as pbar: 191 | pbar.set_description('Generating {}/{} sample'.format(count, args.number)) 192 | while True: 193 | if count >= args.number: 194 | break 195 | # get background image 196 | bg_img = bg_imgs[count] 197 | exist_bbox = [] 198 | labels = [] 199 | for i in range(np.random.randint(args.object_number[0], args.object_number[1])): 200 | # get qrcode image 201 | fg_img = fg_imgs[count] 202 | fg_img = try_random_resize(fg_img, args.size) 203 | bbox = try_random_position(fg_img, [bg_img.shape[1], bg_img.shape[0]], exist_bbox) 204 | if bbox is None: 205 | continue 206 | synth_img = paste(fg_img, bg_img, bbox, args.alpha) 207 | exist_bbox.append(bbox) 208 | l, t, w, h = normalize_coordinate(bbox, bg_img.shape) 209 | cx = l + w / 2 210 | cy = t + h / 2 211 | if args.debug: 212 | visualize(synth_img, bbox) 213 | # cx, cy, w, h, conf, cls 214 | one_label = '{},{},{},{},{},{}'.format(cx, cy, w, h, 1.0, 0) 215 | labels.append(one_label) 216 | bg_img = synth_img 217 | count += 1 218 | save_result(args.output, bg_img, count, labels) 219 | pbar.update(1) 220 | 221 | 222 | def main(): 223 | args = get_args() 224 | generate_training_data(args) 225 | 226 | 227 | if __name__ == '__main__': 228 | main() 229 | -------------------------------------------------------------------------------- /data_loader/dataset.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import os 4 | import tensorflow as tf 5 | from functools import partial 6 | from utils import util 7 | 8 | 9 | def preprocess_img(imgname): 10 | """ 11 | Load an image via cv2 12 | """ 13 | img = cv2.imread(imgname.numpy().decode()).astype(np.float32) 14 | img /= 255.0 15 | return img 16 | 17 | 18 | def parse_label(txtname): 19 | """ 20 | Load bboxes from the txt file 21 | """ 22 | labels = [] 23 | with open(txtname.numpy().decode(), 'r') as f: 24 | lines = f.readlines() 25 | for l in lines: 26 | l = l.split(',')[:4] 27 | labels.append(l) 28 | labels = np.array(labels).astype(np.float32) 29 | return labels 30 | 31 | 32 | def tf_preprocess_img(filename): 33 | img = None 34 | [img, ] = tf.py_function(preprocess_img, [filename], [tf.float32]) 35 | return img 36 | 37 | 38 | def tf_preprocess_label(filename): 39 | label = None 40 | [label, ] = tf.py_function(parse_label, [filename], [tf.float32]) 41 | return label 42 | 43 | 44 | def yolo_label(bbox, grids, anchor_ratios, class_number): 45 | return util.bbox2yololabel(bboxs=bbox, 46 | grids=grids, 47 | anchor_ratios=anchor_ratios, 48 | class_number=class_number) 49 | 50 | 51 | def tf_create_yolo_label(bbox, grids, anchor_ratios, class_number): 52 | [label, ] = tf.py_function(yolo_label, [bbox, grids, anchor_ratios, class_number], [tf.float32]) 53 | return label 54 | 55 | 56 | def create_dataset(root_dir, grids, anchor_ratios, class_number, batch_size): 57 | """ 58 | Build dataset pipeline 59 | """ 60 | list_ds = tf.data.Dataset.list_files(root_dir + '/*.jpg', shuffle=False) 61 | imgs_ds = list_ds.map(tf_preprocess_img, num_parallel_calls=4) 62 | list_ds = tf.data.Dataset.list_files(root_dir + '/*.txt', shuffle=False) 63 | label_ds = list_ds.map(tf_preprocess_label, num_parallel_calls=4) 64 | label_ds = label_ds.map( 65 | 
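        # tf_create_yolo_label converts the raw [cx, cy, w, h] boxes of one image into a dense
        # YOLO target tensor of shape [grid_w, grid_h, n_anchors, 1 + class_number + 4]
        # (see utils.util.bbox2yololabel).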
partial(tf_create_yolo_label, grids=grids, anchor_ratios=anchor_ratios, class_number=class_number), num_parallel_calls=4) 66 | dataset = tf.data.Dataset.zip((imgs_ds, label_ds)) 67 | # slice all data. 70% for training, 30% for validation 68 | training_dataset = dataset.take(int(len(dataset)*0.7)).prefetch(batch_size*10).shuffle(batch_size*10).batch(batch_size) 69 | val_dataset = dataset.skip(int(len(dataset)*0.7)).prefetch(batch_size*10).batch(batch_size) 70 | return training_dataset, val_dataset 71 | -------------------------------------------------------------------------------- /evaluate.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import argparse 4 | from utils.util import * 5 | from models.yolov3 import yolov3 6 | import multiprocessing as mp 7 | import numpy as np 8 | import time 9 | 10 | 11 | def get_args(): 12 | parser = argparse.ArgumentParser() 13 | parser.add_argument('--data_dir', '-d', type=str, required=True, help='path to training dataset') 14 | parser.add_argument('--shape', type=str, default='(256,256)', help='input shape of network') 15 | parser.add_argument('--batch_size', '-b', type=int, default=32) 16 | parser.add_argument('--score_threshold', type=float, default=0.5) 17 | parser.add_argument('--iou_threshold', type=float, default=0.5) 18 | parser.add_argument('--anchors', '-a', type=str, default='anchors.json', 19 | help='anchors generated from kmean algorithm') 20 | parser.add_argument('--weights', '-w', type=str, default='yolo_qrcode.h5', help='pretrained weight') 21 | args = parser.parse_args() 22 | args.shape = eval(args.shape) 23 | return args 24 | 25 | 26 | def _load_img(name): 27 | img = cv2.imread(name) 28 | img = img.astype(np.float32) / 255.0 29 | return img 30 | 31 | 32 | def _load_label(name): 33 | labelname = name.replace('.jpg', '.txt') 34 | with open(labelname, 'r') as f: 35 | lines = f.readlines() 36 | labels = [] 37 | for line in lines: 38 | line = line.split(',')[:4] # cx,cy,w,h 39 | line = [float(v) for v in line] 40 | line = cxcy2xyxy(line) 41 | labels.append(line) 42 | labels = np.array(labels).astype(np.float32) 43 | return labels 44 | 45 | 46 | def loader(root_dir, batch_size, cpu): 47 | imgnames = os.listdir(root_dir) 48 | imgnames = [name for name in imgnames if name.endswith('.jpg')] 49 | imgnames.sort() 50 | imgnames = imgnames[int(len(imgnames) * 0.7):] # last 30% is validation dataset 51 | imgnames = [os.path.join(root_dir, name) for name in imgnames] 52 | indexes = np.arange(len(imgnames)) 53 | indexes = indexes[:batch_size * (len(indexes) // batch_size)] # drop last 54 | indexes = np.reshape(indexes, (-1, batch_size)) 55 | pool = mp.Pool(cpu) 56 | for i in range(indexes.shape[0]): 57 | index = indexes[i] 58 | _imgnames = [imgnames[idx] for idx in index] 59 | imgs = pool.map(_load_img, _imgnames) 60 | labels = pool.map(_load_label, _imgnames) 61 | imgs = np.array(imgs).astype(np.float32) 62 | yield imgs, labels 63 | 64 | 65 | def _metrics(pred_bboxes, true_boxes, iou_threshold=0.5): 66 | """ 67 | pred_bboxes: np.ndarray. [n,4]. format: normalized | xmin,ymin,xmax,ymax. 68 | true_boxes: list. [m,4]. format: normalized | xmin,ymin,xmax,ymax. 
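    Greedy one-to-one matching: each ground-truth box is matched with the first unused
    prediction whose IOU exceeds iou_threshold (a TP); unmatched ground truths count as FN,
    and unmatched predictions as FP. Returns TP, FP, FN and the mean IOU over the matched pairs.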
69 | """ 70 | TP = 0 # true positive 71 | TN = 0 # true negative 72 | FP = 0 # false positive 73 | FN = 0 # false negative 74 | IOU = 0 75 | used = [False for _ in range(len(pred_bboxes))] # mask indicate the pred box has matched with gt box or not 76 | for i in range(len(true_boxes)): 77 | detected = False 78 | for j in range(len(pred_bboxes)): 79 | _iou = general_iou(true_boxes[i], pred_bboxes[j]) 80 | if _iou > iou_threshold and not used[j]: 81 | TP += 1 82 | used[j] = True 83 | IOU += _iou 84 | detected = True 85 | break 86 | if not detected: 87 | FN += 1 88 | FP += (len(used) - sum(used)) # unmatched pred box. False positive pred. 89 | if TP > 0: 90 | mean_iou = IOU / TP 91 | else: 92 | mean_iou = 0 93 | return TP, FP, FN, mean_iou 94 | 95 | 96 | def run(): 97 | args = get_args() 98 | anchors = load_anchors(args.anchors) 99 | detecter = yolov3(input_shape=args.shape, anchor_number=len(anchors), weight=args.weights) 100 | anchors = gen_anchors([s // 32 for s in args.shape], anchors) 101 | TP, FP, FN = 0, 0, 0 102 | IOU = 0 103 | count = 0 104 | for imgs, gt_labels in loader(args.data_dir, args.batch_size, 3): 105 | print("Evaluating {}/{} sample".format(count, 12000)) 106 | # Forward 107 | outputs = detecter.predict(imgs) # [n,h,w,c] 108 | for i in range(len(outputs)): 109 | scores, classes, bboxes = decode(anchors, np.expand_dims(outputs[i], axis=0)) 110 | pred_scores, pred_bboxes = postprocess(scores, classes, bboxes) 111 | tp, fp, fn, iou = _metrics(pred_bboxes, gt_labels[i]) 112 | TP += tp 113 | FP += fp 114 | FN += fn 115 | IOU += iou 116 | count += len(imgs) 117 | precision = TP / (TP + FP) 118 | recall = TP / (TP + FN) 119 | mean_iou = IOU / count 120 | print("\n") 121 | print("--------------Evaluate Result-----------------") 122 | print("Model: {}".format(args.weights)) 123 | print("score_threshold: {}".format(args.score_threshold)) 124 | print("iou_threshold: {}".format(args.iou_threshold)) 125 | print("Precision: {:.3f} Recall: {:.3f} MeanIOU: {:.3f}".format(precision, recall, mean_iou)) 126 | 127 | 128 | if __name__ == '__main__': 129 | run() 130 | 131 | -------------------------------------------------------------------------------- /gradio_demo.py: -------------------------------------------------------------------------------- 1 | import gradio as gr 2 | from models.yolov3 import yolov3 3 | import numpy as np 4 | import cv2 5 | from utils.anchor_generator import gen_anchors 6 | import utils.util as util 7 | from functools import partial 8 | 9 | shape = (256, 256) 10 | anchors = util.load_anchors('./anchors.json') 11 | model = yolov3((256, 256), anchor_number=len(anchors), weight='yolo_qrcode.h5') 12 | anchors = gen_anchors([s // 32 for s in (256, 256)], anchors) 13 | 14 | 15 | def preprocess(img): 16 | src_img = img.copy() 17 | img = cv2.resize(img, shape) 18 | img = img.astype(np.float32) / 255.0 19 | img = np.expand_dims(img, axis=0) 20 | return src_img, img 21 | 22 | 23 | def draw_roi(img, scores, bboxes, name='qrcode'): 24 | h, w = img.shape[:2] 25 | label_w = 46 26 | label_h = 18 27 | bbox_color = (240, 146, 31) 28 | label_roi_color = np.array([192, 219, 103]) 29 | label_text_color = (255, 255, 255) 30 | for score, bbox in zip(scores, bboxes): 31 | xmin, ymin, xmax, ymax = bbox 32 | xmin = int(xmin * w) 33 | ymin = int(ymin * h) 34 | xmax = int(xmax * w) 35 | ymax = int(ymax * h) 36 | cv2.rectangle(img, (xmin, ymin), (xmax, ymax), bbox_color, 2) 37 | img[ymin - label_h:ymin, xmin:xmin + label_w, :] = label_roi_color 38 | cv2.putText(img, str(name), (xmin, ymin - 8), 
cv2.FONT_HERSHEY_SIMPLEX, 0.4, label_text_color, 1) 39 | return img 40 | 41 | 42 | def detection_info(bboxes, scores): 43 | infos = [] 44 | temp = 'QRCode{}\n confidence: {:.3f}\n xmin:{}, ymin:{}, xmax:{}, ymax:{}\n\n' 45 | for i in range(len(scores)): 46 | bbox = bboxes[i] 47 | score = scores[i] 48 | xmin, ymin, xmax, ymax = bbox 49 | xmin = int(xmin * 256) 50 | ymin = int(ymin * 256) 51 | xmax = int(xmax * 256) 52 | ymax = int(ymax * 256) 53 | infos.append(temp.format(i + 1, score, xmin, ymin, xmax, ymax)) 54 | return ''.join(infos) 55 | 56 | 57 | def detect(image): 58 | src_img, img = preprocess(image) 59 | pred = model.predict(img)[0] 60 | scores, classes, bboxes = util.decode(anchors, pred) 61 | scores, bboxes = util.postprocess(scores, classes, bboxes) 62 | src_img = draw_roi(src_img, scores, bboxes) 63 | return src_img, str(detection_info(bboxes, scores)) 64 | 65 | 66 | input_image = gr.Image() 67 | output_image = gr.Image() 68 | output_text = gr.Textbox() 69 | 70 | demo = gr.Interface( 71 | fn=detect, 72 | inputs=input_image, 73 | outputs=[output_image, output_text], 74 | ) 75 | 76 | demo.launch() 77 | -------------------------------------------------------------------------------- /models/README.md: -------------------------------------------------------------------------------- 1 | ## 网络结构如下 (Network structure is as follows) 2 | ![network structure](../assets/net_structure.png) 3 | Created by [netron](https://netron.app/). -------------------------------------------------------------------------------- /models/loss.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | def yolo_loss(y_true, y_pred): 5 | """ 6 | :param y_true: [n, gridw, gridh, anchor_per_grid, channel] 7 | :param y_pred: [n, gridw, gridh, anchor_per_grid, channel] 8 | :return: loss 9 | """ 10 | pred_scores = tf.math.sigmoid(y_pred[..., 0]) 11 | pred_cls = tf.math.softmax(y_pred[..., 1:3], axis=-1) 12 | epsilon = 0.0001 13 | pred_cls = tf.clip_by_value(pred_cls, epsilon, 1 - epsilon) # avoid log(0) in the cross entropy 14 | pred_xy = tf.math.tanh(y_pred[..., 3:5]) 15 | pred_wh = tf.math.tanh(y_pred[..., 5:]) 16 | 17 | true_scores = y_true[..., 0] 18 | true_cls = y_true[..., 1:3] 19 | true_xy = y_true[..., 3:5] 20 | true_wh = y_true[..., 5:] 21 | 22 | bce = tf.keras.losses.BinaryCrossentropy(from_logits=False) 23 | score_loss = bce(true_scores, pred_scores) 24 | 25 | cls_mask = true_scores + 0.005 # positive anchors weigh ~1.0, background anchors a weak 0.005 26 | cce = tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.NONE) 27 | cls_loss = cce(true_cls, pred_cls) * cls_mask 28 | cls_loss = tf.math.reduce_mean(cls_loss) 29 | 30 | se = lambda x, y: tf.reduce_sum(tf.math.square(x - y), axis=-1) 31 | xy_loss = se(true_xy, pred_xy) * true_scores # regression terms only apply to positive anchors 32 | wh_loss = se(true_wh, pred_wh) * true_scores 33 | bbox_loss = xy_loss + wh_loss 34 | bbox_loss = tf.math.reduce_mean(bbox_loss) 35 | 36 | loss = score_loss + 2 * cls_loss + 5 * bbox_loss # weighted sum; the bbox term is weighted highest 37 | loss *= 32 # global scale factor 38 | 39 | return loss 40 | -------------------------------------------------------------------------------- /models/yolov3.py: -------------------------------------------------------------------------------- 1 | import tensorflow.keras as keras 2 | from tensorflow.keras.models import Model 3 | import tensorflow as tf 4 | 5 | def bn_act(x, act='relu'): 6 | acts = {'relu': keras.layers.ReLU, 7 | 'leaky_relu': keras.layers.LeakyReLU, 8 | 'swish': keras.activations.swish} 9 | x = keras.layers.BatchNormalization()(x) 10 | x = acts[act]()(x) 11 | return x 12 | 13 | def head_layer(x, class_number=2, anchor_number=3): 14 | """ 15 | Head layer for prediction. 16 | Reference: https://pjreddie.com/media/files/papers/YOLOv3.pdf 17 | """ 18 | kernel = anchor_number * (1 + class_number + 4)
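    # Per anchor the head predicts 1 + class_number + 4 channels; with class_number=2 that is
    # [objectness, class_0, class_1, tx, ty, tw, th] -- the layout consumed by models/loss.py
    # and utils/util.decode().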
19 | output = keras.layers.Conv2D(kernel, (3,3), padding='SAME')(x) 20 | output = tf.reshape(output, [-1, output.shape[1], output.shape[2], anchor_number, output.shape[3]//anchor_number]) 21 | return output 22 | 23 | def downsize(x): 24 | """ 25 | maxpool to downsize the feature map 26 | """ 27 | return keras.layers.MaxPool2D(padding='same')(x) 28 | 29 | def yolov3(input_shape=(256,256), class_number=2, anchor_number=5, weight=''): 30 | """ 31 | yolov3-like network. Not the real yolov3. 32 | size: 256 -> 128 -> 64 -> 32 -> 16 -> 8 33 | kernel: 32 -> 64 -> 128 -> 256 -> 256 34 | """ 35 | input_layer = keras.layers.Input(shape=input_shape + (3,)) 36 | for _ in range(2): 37 | x = keras.layers.Conv2D(32, (3,3), padding='SAME')(input_layer) # note: each pass convolves input_layer itself, so only the last block stays on the model graph; changing this would break compatibility with the released weights 38 | x = bn_act(x) 39 | # downsize 40 | x = downsize(x) # 128 x 128 41 | 42 | for _ in range(3): 43 | x = keras.layers.Conv2D(64, (3,3), padding='SAME')(x) 44 | x = bn_act(x) 45 | # downsize 46 | x = downsize(x) # 64 x 64 47 | 48 | for _ in range(4): 49 | x = keras.layers.Conv2D(128, (3,3), padding='SAME')(x) 50 | x = bn_act(x) 51 | # downsize 52 | x = downsize(x) # 32 x 32 53 | for _ in range(4): 54 | x = keras.layers.Conv2D(128, (3,3), padding='SAME')(x) 55 | x = bn_act(x) 56 | # downsize 57 | x = downsize(x) # 16 x 16 58 | # low level feature 59 | f1 = x 60 | 61 | for _ in range(4): 62 | x = keras.layers.Conv2D(128, (3,3), padding='SAME')(x) 63 | x = bn_act(x) 64 | 65 | # feature fusion 66 | x = keras.layers.Concatenate()([f1, x]) 67 | 68 | # downsize via stride2 conv 69 | x = keras.layers.Conv2D(256, (3,3), strides=(2,2), padding='SAME')(x) # 8 x 8 70 | 71 | # 1x1 conv to summarize all channels 72 | x = keras.layers.Conv2D(256, (1,1), strides=(1,1))(x) 73 | 74 | output = head_layer(x, class_number=class_number, anchor_number=anchor_number) 75 | model = Model(inputs=input_layer, outputs=output) 76 | if weight != '': 77 | print('Load pretrained weight: {}'.format(weight)) 78 | model.load_weights(weight) 79 | return model 80 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | opencv-python 2 | qrcode 3 | imgaug 4 | tqdm 5 | numpy 6 | tensorflow-gpu 7 | scikit-learn 8 | gradio -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | from models.yolov3 import yolov3 2 | import numpy as np 3 | import cv2 4 | from utils.anchor_generator import gen_anchors 5 | import utils.util as util 6 | import argparse 7 | 8 | 9 | def get_args(): 10 | parser = argparse.ArgumentParser() 11 | parser.add_argument('--input', '-i', type=str, help='test image') 12 | parser.add_argument('--weight', '-w', type=str, help='h5 weight file') 13 | parser.add_argument('--shape', '-s', type=str, default='(256,256)', 14 | help='input shape. 
It should be equal to the training shape') 15 | parser.add_argument('--anchors', '-a', type=str, default='anchors.json', 16 | help='anchors generated by the k-means algorithm') 17 | parser.add_argument('--output', '-o', type=str, default='', help='output image') 18 | args = parser.parse_args() 19 | args.shape = eval(args.shape) 20 | return args 21 | 22 | 23 | def load_test_img(name, shape): 24 | img = cv2.imread(name) 25 | src_img = img.copy() 26 | img = cv2.resize(img, shape) 27 | img = img.astype(np.float32) / 255.0 28 | img = np.expand_dims(img, axis=0) 29 | return src_img, img 30 | 31 | 32 | def draw_roi(img, scores, bboxes, name='qrcode'): 33 | h, w = img.shape[:2] 34 | label_w = 46 35 | label_h = 18 36 | bbox_color = (240, 146, 31) 37 | label_roi_color = np.array([192, 219, 103]) 38 | label_text_color = (255, 255, 255) 39 | for score, bbox in zip(scores, bboxes): 40 | xmin, ymin, xmax, ymax = bbox 41 | xmin = int(xmin * w) 42 | ymin = int(ymin * h) 43 | xmax = int(xmax * w) 44 | ymax = int(ymax * h) 45 | cv2.rectangle(img, (xmin, ymin), (xmax, ymax), bbox_color, 2) 46 | img[ymin - label_h:ymin, xmin:xmin + label_w, :] = label_roi_color 47 | cv2.putText(img, str(name), (xmin, ymin - 8), cv2.FONT_HERSHEY_SIMPLEX, 0.4, label_text_color, 1) 48 | return img 49 | 50 | 51 | def main(): 52 | args = get_args() 53 | anchors = util.load_anchors(args.anchors) 54 | model = yolov3(args.shape, anchor_number=len(anchors), weight=args.weight) 55 | anchors = gen_anchors([s//32 for s in args.shape], anchors) 56 | test_img = args.input 57 | src_img, img = load_test_img(test_img, args.shape) 58 | pred = model.predict(img)[0] 59 | scores, classes, bboxes = util.decode(anchors, pred) 60 | scores, bboxes = util.postprocess(scores, classes, bboxes) 61 | src_img = draw_roi(src_img, scores, bboxes) 62 | cv2.imshow('qrcode_detection', src_img) 63 | cv2.waitKey(0) 64 | if args.output != '': 65 | cv2.imwrite(args.output, src_img) 66 | 67 | 68 | if __name__ == '__main__': 69 | main() 70 | -------------------------------------------------------------------------------- /test_images/1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosimo17/QRCodeDetection/865e5421c44d16db5ceb48e899cecb56823b3db9/test_images/1.jpg -------------------------------------------------------------------------------- /test_images/2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosimo17/QRCodeDetection/865e5421c44d16db5ceb48e899cecb56823b3db9/test_images/2.jpg -------------------------------------------------------------------------------- /test_images/3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosimo17/QRCodeDetection/865e5421c44d16db5ceb48e899cecb56823b3db9/test_images/3.jpg -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 4 | import argparse 5 | import tensorflow.keras as keras 6 | import tensorflow as tf 7 | from data_loader import dataset 8 | from models.yolov3 import yolov3 9 | from models.loss import yolo_loss 10 | from utils.util import load_anchors 11 | 12 | tf.get_logger().setLevel('WARNING') 13 | 14 | 15 | def get_args(): 16 | parser = argparse.ArgumentParser() 17 | parser.add_argument('--data_dir', '-d', type=str, help='path 
to training dataset') 18 | parser.add_argument('--shape', type=str, default='(256,256)', help='input shape of network') 19 | parser.add_argument('--epoch', '-e', type=int, default=40) 20 | parser.add_argument('--batch_size', '-b', type=int, default=32) 21 | parser.add_argument('--anchors', '-a', type=str, default='anchors.json', 22 | help='anchors generated by the k-means algorithm') 23 | parser.add_argument('--weights', '-w', type=str, default='', help='pretrained weight') 24 | parser.add_argument('--output', '-o', type=str, default='yolo_qrcode.h5', help='output weight') 25 | parser.add_argument('--val_interval', '-i', type=int, default=5, help='epochs between validation runs') 26 | parser.add_argument('--learning_rate', '-lr', type=float, default=0.001) 27 | args = parser.parse_args() 28 | args.shape = eval(args.shape) 29 | return args 30 | 31 | 32 | def scheduler(epoch, lr): 33 | if epoch % 5 == 1: # restart from the base learning rate every 5 epochs 34 | return 0.001 35 | else: 36 | return lr * 0.7 # otherwise decay by a factor of 0.7 37 | 38 | 39 | def score_acc(yt, yp): 40 | pred_scores = tf.math.sigmoid(yp[..., 0]) 41 | true_scores = yt[..., 0] 42 | pred_scores = tf.where(pred_scores > 0.5, 1, 0) 43 | acc = tf.reduce_mean( 44 | tf.cast(tf.math.equal(tf.cast(pred_scores, tf.float32), tf.cast(true_scores, tf.float32)), tf.float32)) 45 | return acc 46 | 47 | 48 | def cls_acc(yt, yp): 49 | pred_cls = yp[..., 1:3] 50 | pred_cls = tf.math.sigmoid(pred_cls) 51 | pred_cls = tf.math.argmax(pred_cls, axis=-1) 52 | 53 | true_cls = yt[..., 1:3] 54 | true_cls = tf.math.argmax(true_cls, axis=-1) 55 | acc = tf.reduce_mean(tf.cast(tf.math.equal(true_cls, pred_cls), tf.float32)) # compare the class indices directly 56 | return acc 57 | 58 | 59 | def train(): 60 | args = get_args() 61 | anchors = load_anchors(args.anchors) 62 | model = yolov3(input_shape=args.shape, anchor_number=len(anchors), weight=args.weights) 63 | model.compile(optimizer=keras.optimizers.Adam(args.learning_rate), loss=yolo_loss, 64 | metrics=[score_acc, cls_acc]) 65 | training_ds, val_ds = dataset.create_dataset(args.data_dir, [s // 32 for s in args.shape], anchor_ratios=anchors, 66 | class_number=2, batch_size=args.batch_size) 67 | lr_callback = tf.keras.callbacks.LearningRateScheduler(scheduler) 68 | tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="./logs") 69 | model.fit(x=training_ds, epochs=args.epoch, validation_data=val_ds, callbacks=[lr_callback, tensorboard_callback], 70 | validation_freq=args.val_interval) 71 | model.save(args.output) 72 | 73 | 74 | if __name__ == '__main__': 75 | train() 76 | -------------------------------------------------------------------------------- /utils/anchor_generator.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | ANCHOR = None 5 | def restack_anchors(xs, ys, ws, hs): 6 | xs = np.split(xs, xs.shape[2], axis=2) 7 | ys = np.split(ys, ys.shape[2], axis=2) 8 | ws = np.split(ws, ws.shape[2], axis=2) 9 | hs = np.split(hs, hs.shape[2], axis=2) 10 | anchors = np.concatenate([xs[0], ys[0], ws[0], hs[0]], axis=3) 11 | for i in range(1, len(xs)): 12 | _anchor = np.concatenate([xs[i], ys[i], ws[i], hs[i]], axis=3) 13 | anchors = np.concatenate([anchors, _anchor], axis=2) 14 | return anchors 15 | 16 | 17 | def gen_anchors(grids, anchor_ratios): 18 | """ 19 | :param grids: list. [grid_width, grid_height]. This is the spatial size of the network's final output map 20 | :param anchor_ratios: np.ndarray. 
[[anchor_w, anchor_h]*] 21 | :return: prior anchors 22 | """ 23 | global ANCHOR 24 | if ANCHOR is not None: 25 | return ANCHOR 26 | grid_w, grid_h = grids 27 | ys, xs = np.meshgrid(list(range(grid_w)), list(range(grid_h))) 28 | xs = xs * (1 / grid_w) + (1 / grid_w) * 0.5 29 | ys = ys * (1 / grid_h) + (1 / grid_h) * 0.5 30 | anchor_per_grid = len(anchor_ratios) 31 | ws = np.ones(shape=(grid_w, grid_h, anchor_per_grid, 1), dtype=np.float32) 32 | ws *= np.expand_dims(anchor_ratios[..., 0], 1) 33 | hs = np.ones(shape=(grid_w, grid_h, anchor_per_grid, 1), dtype=np.float32) 34 | hs *= np.expand_dims(anchor_ratios[..., 1], 1) 35 | xs = np.expand_dims(xs, axis=[2, 3]) 36 | xs = np.tile(xs, [1, 1, anchor_per_grid, 1]) 37 | ys = np.expand_dims(ys, axis=[2, 3]) 38 | ys = np.tile(ys, [1, 1, anchor_per_grid, 1]) 39 | anchors = restack_anchors(xs, ys, ws, hs) 40 | ANCHOR = anchors.astype(np.float32) 41 | return anchors.astype(np.float32) 42 | 43 | 44 | def test_generated_anchors(): 45 | """ 46 | Test case for this anchor generator 47 | """ 48 | grid = [4, 4] 49 | anchor_ratios = np.array([[1, 1], [2, 2]]) 50 | generated_anchors = gen_anchors(grid, anchor_ratios) 51 | 52 | anchors = np.zeros(shape=(4, 4, 2, 4), dtype=np.float32) 53 | for i in range(4): 54 | for j in range(4): 55 | anchors[i, j, 0, :] = np.array([i * 0.25 + 0.125, j * 0.25 + 0.125, 1, 1]) 56 | anchors[i, j, 1, :] = np.array([i * 0.25 + 0.125, j * 0.25 + 0.125, 2, 2]) 57 | assert np.allclose(generated_anchors, anchors) 58 | 59 | global ANCHOR 60 | ANCHOR = None 61 | grid = [5, 5] 62 | anchor_ratios = np.array([[1, 1], [2, 2], [0.6, 0.8]]) 63 | generated_anchors = gen_anchors(grid, anchor_ratios) 64 | 65 | anchors = np.zeros(shape=(5, 5, 3, 4), dtype=np.float32) 66 | for i in range(5): 67 | for j in range(5): 68 | anchors[i, j, 0, :] = np.array([i * 0.2 + 0.1, j * 0.2 + 0.1, 1, 1]) 69 | anchors[i, j, 1, :] = np.array([i * 0.2 + 0.1, j * 0.2 + 0.1, 2, 2]) 70 | anchors[i, j, 2, :] = np.array( 71 | [i * 0.2 + 0.1, j * 0.2 + 0.1, 0.6, 0.8]) 72 | assert np.allclose(generated_anchors, anchors) 73 | 74 | 75 | if __name__ == '__main__': 76 | test_generated_anchors() 77 | -------------------------------------------------------------------------------- /utils/kmean.py: -------------------------------------------------------------------------------- 1 | from sklearn.cluster import KMeans 2 | import numpy as np 3 | import argparse 4 | import json 5 | import os 6 | 7 | def get_args(): 8 | parser = argparse.ArgumentParser() 9 | parser.add_argument('--root_dir', type=str, help='path to the dataset') 10 | parser.add_argument('--n_clusters', '-n', type=int, default=6) 11 | parser.add_argument('--output', '-o', type=str, default='anchors.json') 12 | args = parser.parse_args() 13 | return args 14 | 15 | def load_data(root_dir): 16 | dataset = [] 17 | filenames = [name for name in os.listdir(root_dir) if name.endswith('.txt')] 18 | filenames = [os.path.join(root_dir, name) for name in filenames] 19 | for txt in filenames: 20 | with open(txt, 'r') as f: 21 | lines = f.readlines() 22 | for l in lines: 23 | l = l.split(',') 24 | w,h = l[2:4] 25 | w = float(w) 26 | h = float(h) 27 | dataset.append([w,h]) 28 | return dataset 29 | 30 | def mean_iou(shape1, shape2): 31 | w1, h1 = shape1[...,0], shape1[...,1] 32 | w2, h2 = shape2 33 | s1 = w1 * h1 34 | s2 = w2 * h2 35 | iou = np.minimum(s1, s2) / np.maximum(s1, s2) 36 | return iou 37 | 38 | def main(): 39 | args = get_args() 40 | dataset = load_data(args.root_dir) 41 | kmeans = KMeans(n_clusters=args.n_clusters, 
random_state=0, max_iter=500).fit(dataset) 42 | anchors = kmeans.cluster_centers_ 43 | print(anchors) 44 | anchors = anchors.tolist() 45 | anchors = { 46 | 'anchors': anchors 47 | } 48 | print("Save k-means anchors to {}".format(args.output)) 49 | with open(args.output, 'w') as f: 50 | json.dump(anchors, f, indent=2) 51 | 52 | if __name__ == '__main__': 53 | main() 54 | -------------------------------------------------------------------------------- /utils/util.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from .anchor_generator import gen_anchors 3 | import json 4 | import tensorflow as tf 5 | np.seterr(divide='raise') 6 | 7 | 8 | def load_anchors(anchor_file): 9 | with open(anchor_file, 'r') as f: 10 | anchors = json.load(f)['anchors'] 11 | return np.array(anchors) 12 | 13 | 14 | def general_iou(bbox1, bbox2): 15 | xmin1, ymin1, xmax1, ymax1 = bbox1 16 | xmin2, ymin2, xmax2, ymax2 = bbox2 17 | left = np.max([xmin1, xmin2]) 18 | right = np.min([xmax1, xmax2]) 19 | top = np.max([ymin1, ymin2]) 20 | bottom = np.min([ymax1, ymax2]) 21 | iw = np.max([(right - left), 0]) 22 | ih = np.max([(bottom - top), 0]) 23 | si = iw * ih 24 | s1 = (xmax1 - xmin1) * (ymax1 - ymin1) 25 | s2 = (xmax2 - xmin2) * (ymax2 - ymin2) 26 | _iou = si / (s1 + s2 - si) 27 | return _iou 28 | 29 | 30 | def cxcy2xyxy(bbox): 31 | cx, cy, w, h = bbox 32 | xmin = cx - w / 2 33 | ymin = cy - h / 2 34 | xmax = xmin + w 35 | ymax = ymin + h 36 | return xmin, ymin, xmax, ymax 37 | 38 | 39 | def iou(bbox1, bbox2): 40 | """ 41 | :param bbox1: np.ndarray. [gridw, gridh, anchor_per_grid, 4] 42 | :param bbox2: np.ndarray. [n, 4] 43 | box format: cx, cy, w, h. normalized to 0~1 44 | :return: ious. [n, gridw, gridh, anchor_per_grid, 1] 45 | """ 46 | cx1, cy1, w1, h1 = np.split(bbox1, 4, axis=-1) 47 | xmin1 = cx1 - w1 / 2 48 | ymin1 = cy1 - h1 / 2 49 | xmax1 = xmin1 + w1 50 | ymax1 = ymin1 + h1 51 | 52 | cx2, cy2, w2, h2 = np.split(bbox2, 4, axis=-1) 53 | xmin2 = cx2 - w2 / 2 54 | ymin2 = cy2 - h2 / 2 55 | xmax2 = xmin2 + w2 56 | ymax2 = ymin2 + h2 57 | 58 | ious = np.zeros(shape=(len(cx2), cx1.shape[0], cx1.shape[1], cx1.shape[2], 1), dtype=np.float32) 59 | for i in range(len(cx2)): 60 | left = np.maximum(xmin1, xmin2[i]) 61 | right = np.minimum(xmax1, xmax2[i]) 62 | top = np.maximum(ymin1, ymin2[i]) 63 | bottom = np.minimum(ymax1, ymax2[i]) 64 | iw = np.maximum((right - left), 0) 65 | ih = np.maximum((bottom - top), 0) 66 | s1 = w1 * h1 67 | s2 = w2[i] * h2[i] 68 | _iou = iw * ih / (s1 + s2 - (iw * ih)) 69 | ious[i] = _iou 70 | return ious 71 | 72 | 73 | def postprocess(scores, classes, bboxes, score_threshold=0.5, selected_cls=1, iou_threshold=0.6): 74 | scores = scores.flatten() 75 | classes = np.reshape(classes, [-1, 2]) 76 | bboxes = np.reshape(bboxes, [-1, 4]) 77 | idx = scores >= score_threshold 78 | 79 | scores = scores[idx] 80 | classes = classes[idx] 81 | bboxes = bboxes[idx] 82 | 83 | idx = classes[..., 1] > classes[..., 0] # keep boxes whose qrcode-class score beats the background score 84 | scores = scores[idx] 85 | classes = classes[idx] 86 | bboxes = bboxes[idx] 87 | 88 | idx = tf.image.non_max_suppression( 89 | bboxes, scores, max_output_size=10, iou_threshold=iou_threshold) 90 | scores = scores[idx.numpy()] 91 | classes = classes[idx.numpy()] 92 | bboxes = bboxes[idx.numpy()] 93 | return scores, bboxes 94 | 95 | 96 | def sigmoid(x): 97 | x = np.clip(x, -15.0, 15.0) 98 | return 1 / (1 + np.exp(-x)) 99 | 100 | 
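# decode() inverts encode() below: cx = tx * anchor_w + anchor_cx, cy = ty * anchor_h + anchor_cy,
# w = exp(tw) * anchor_w, h = exp(th) * anchor_h. Because tw and th are passed through tanh first,
# decoded widths and heights are bounded to roughly (1/e, e) times the anchor size.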
output from network. [n, w, h, anchor_per_grid, 7] 105 | """ 106 | scores, cls_conf, bbox = output[..., 0], output[..., 1:3], output[..., 3:] 107 | scores = sigmoid(scores) 108 | cls_conf = sigmoid(cls_conf) 109 | bbox[..., :2] = np.tanh(bbox[..., :2]) # xw 110 | bbox[..., 2:] = np.tanh(bbox[..., 2:]) # wh 111 | tx, ty, tw, th = np.split(bbox, 4, axis=-1) 112 | anchor_cx, anchor_cy, anchor_w, anchor_h = np.split(anchors, 4, axis=-1) 113 | cx = tx * anchor_w + anchor_cx 114 | cy = ty * anchor_h + anchor_cy 115 | w = np.exp(tw) * anchor_w 116 | h = np.exp(th) * anchor_h 117 | xmin = cx - w / 2 118 | ymin = cy - h / 2 119 | xmax = xmin + w 120 | ymax = ymin + h 121 | bboxes = np.concatenate([xmin, ymin, xmax, ymax], axis=-1) 122 | return scores, cls_conf, bboxes 123 | 124 | 125 | def encode(anchor, bbox): 126 | """ 127 | encode bbox coordinate to anchor relative offset 128 | reference: https://pjreddie.com/media/files/papers/YOLOv3.pdf && 129 | https://github.com/tensorflow/models/blob/master/research/object_detection/box_coders/faster_rcnn_box_coder.py 130 | """ 131 | anchor_cx, anchor_cy, anchor_w, anchor_h = anchor 132 | cx, cy, w, h = bbox 133 | tx = (cx - anchor_cx) / anchor_w # -0.5 ~ 0.5 134 | ty = (cy - anchor_cy) / anchor_h # -0.5 ~ 0.5 135 | tw = np.log(w / anchor_w) 136 | th = np.log(h / anchor_h) 137 | return np.array([tx, ty, tw, th]) 138 | 139 | 140 | def bbox2yololabel(bboxs, grids, anchor_ratios, class_number=2): 141 | """ 142 | bboxs: np.ndarray. [n,4] [[cx, cy, w,h]*]. bboxs are normalized to 0~1 143 | """ 144 | channel = len(anchor_ratios) * (1 + class_number + 4) 145 | labels = np.zeros(shape=(grids[0], grids[1], len(anchor_ratios), channel // len(anchor_ratios)), dtype=np.float32) 146 | labels[..., 1] = np.array([1.0]) 147 | anchors = gen_anchors(grids, anchor_ratios) 148 | ious = iou(anchors, bboxs) 149 | for i in range(ious.shape[0]): 150 | index = np.unravel_index(ious[i].argmax(), ious[i].shape)[:-1] 151 | # assign label to the anchor whose iou is the max one 152 | encoded_bbox = encode(anchors[index], bboxs[i]) 153 | scores = np.array([1.0]) 154 | cls_confidence = np.array([0.0, 1.0]) 155 | label = np.concatenate([scores, cls_confidence, encoded_bbox], axis=0) # [score, cls_socre, bbox] 156 | labels[index] = label 157 | return labels.astype(np.float32) 158 | 159 | 160 | def test_iou(): 161 | anchors = [[0.5, 0.5, 1, 1], [1, 1, 2, 2]] 162 | boxs = [[0.5, 0.5, 1, 1], [1, 1, 2, 2], [1.5, 1.5, 1, 1]] 163 | anchors = np.array(anchors) 164 | anchors = np.expand_dims(anchors, axis=[0, 1]) 165 | anchors = np.tile(anchors, [3, 3, 1, 1]) 166 | boxs = np.array(boxs) 167 | _ious = iou(anchors, boxs) 168 | assert np.allclose(_ious[0, :, :, 0, :], [1]) 169 | assert np.allclose(_ious[0, :, :, 1, :], [0.25]) 170 | assert np.allclose(_ious[1, :, :, 0, :], [0.25]) 171 | assert np.allclose(_ious[1, :, :, 1, :], [1]) 172 | assert np.allclose(_ious[2, :, :, 0, :], [0]) 173 | assert np.allclose(_ious[2, :, :, 1, :], [0.25]) 174 | 175 | 176 | if __name__ == '__main__': 177 | test_iou() 178 | --------------------------------------------------------------------------------