├── .gitignore ├── LICENSE ├── README.md ├── ctpn ├── __init__.py ├── demo.py ├── demo_pb.py ├── generate_pb.py ├── text.yml └── train_net.py ├── data ├── VOCdevkit2007 ├── demo │ ├── 006.jpg │ ├── 007.jpg │ ├── 008.jpg │ ├── 009.jpg │ └── 010.png ├── oriented_results │ ├── 006.jpg │ ├── 007.jpg │ ├── 008.jpg │ ├── 009.jpg │ └── 010.png └── results │ ├── 006.jpg │ ├── 007.jpg │ ├── 008.jpg │ ├── 009.jpg │ ├── 010.png │ ├── res_006.txt │ ├── res_007.txt │ ├── res_008.txt │ ├── res_009.txt │ └── res_010.txt ├── lib ├── __init__.py ├── datasets │ ├── __init__.py │ ├── factory.py │ ├── imdb.py │ └── pascal_voc.py ├── fast_rcnn │ ├── __init__.py │ ├── bbox_transform.py │ ├── config.py │ ├── nms_wrapper.py │ ├── test.py │ └── train.py ├── networks │ ├── VGGnet_test.py │ ├── VGGnet_train.py │ ├── __init__.py │ ├── factory.py │ └── network.py ├── prepare_training_data │ ├── ToVoc.py │ └── split_label.py ├── roi_data_layer │ ├── __init__.py │ ├── layer.py │ ├── minibatch.py │ └── roidb.py ├── rpn_msr │ ├── __init__.py │ ├── anchor_target_layer_tf.py │ ├── generate_anchors.py │ └── proposal_layer_tf.py ├── text_connector │ ├── __init__.py │ ├── detectors.py │ ├── other.py │ ├── text_connect_cfg.py │ ├── text_proposal_connector.py │ ├── text_proposal_connector_oriented.py │ └── text_proposal_graph_builder.py └── utils │ ├── __init__.py │ ├── bbox.c │ ├── bbox.pyx │ ├── blob.py │ ├── boxes_grid.py │ ├── cython_nms.c │ ├── cython_nms.pyx │ ├── gpu_nms.c │ ├── gpu_nms.cpp │ ├── gpu_nms.hpp │ ├── gpu_nms.pyx │ ├── make.sh │ ├── nms_kernel.cu │ ├── setup.py │ └── timer.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | cache/ 3 | pretrain/ 4 | VOCdevkit2007/ 5 | logs/ 6 | output/ 7 | build/ 8 | dist/ 9 | checkpoints/ 10 | .idea/ 11 | *.py[cod] 12 | *.c[cod] 13 | *.so 14 | *.swp 15 | *.pb 16 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 shaohui ruan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # text-detection-ctpn 2 | 3 | text detection mainly based on ctpn (connectionist text proposal network). It is implemented in tensorflow. 
I use ID card detection as an example to demonstrate the results, but note that this model can be applied to almost any horizontal scene text detection task. The original paper can be found [here](https://arxiv.org/abs/1609.03605), and the original Caffe implementation [here](https://github.com/tianzhi0549/CTPN). For more details about the paper and code, see this [blog](http://slade-ruan.me/2017/10/22/text-detection-ctpn/). If you have any questions, check the existing issues first; if the problem persists, open a new issue.
4 | ***
5 | # roadmap
6 | - [x] freeze the graph for convenient inference
7 | - [x] pure python, cython and cuda nms implementations
8 | - [x] loss function as described in the paper
9 | - [x] oriented text connector
10 | - [x] BLSTM
11 | ***
12 | # demo
13 | - for a quick demo, you don't have to build the library; simply use demo_pb.py for inference.
14 | - first, `git clone git@github.com:eragonruan/text-detection-ctpn.git --depth=1`
15 | - then, download the pb file from the [release page](https://github.com/eragonruan/text-detection-ctpn/releases)
16 | - put ctpn.pb in data/
17 | - put your images in data/demo (the results will be saved in data/results) and run the demo from the repository root:
18 | ```shell
19 | python ./ctpn/demo_pb.py
20 | ```
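- the demo writes one res_*.txt file per image to data/results, with one detected box per line in min_x,min_y,max_x,max_y order (see draw_boxes in ctpn/demo_pb.py and the sample files under data/results/). Below is a minimal, illustrative sketch of consuming these files downstream, e.g. to crop the detected regions; the helper name is hypothetical:
```python
import cv2


def crop_detections(image_path, res_path):
    """Crop every text region listed in a res_*.txt result file."""
    img = cv2.imread(image_path)
    crops = []
    with open(res_path) as f:
        for line in f:
            line = line.strip()  # result lines are terminated with '\r\n'
            if not line:
                continue
            min_x, min_y, max_x, max_y = map(int, line.split(','))
            crops.append(img[min_y:max_y, min_x:max_x])
    return crops


# e.g. crops = crop_detections('data/demo/006.jpg', 'data/results/res_006.txt')
```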
21 | ***
22 | # parameters
23 | there are some parameters you may need to modify according to your requirements; you can find them in ctpn/text.yml:
24 | - USE_GPU_NMS # whether to use the NMS implemented in CUDA or not
25 | - DETECT_MODE # H represents horizontal mode, O represents oriented mode; the default is H
26 | - checkpoints_path # the model I provide is in checkpoints/; if you train the model yourself, it will be saved in output/
27 | ***
28 | # training
29 | ## setup
30 | - requirements: python2.7, tensorflow1.3, cython0.24, opencv-python, easydict (installing Anaconda is recommended)
31 | - if you do not have a gpu device, follow the instructions [here](https://github.com/eragonruan/text-detection-ctpn/issues/43) to set up
32 | - if you have a gpu device, build the library with
33 | ```shell
34 | cd lib/utils
35 | chmod +x make.sh
36 | ./make.sh
37 | ```
38 | ## prepare data
39 | - First, download the pre-trained VGG net model and put it at data/pretrain/VGG_imagenet.npy. You can download it from [google drive](https://drive.google.com/drive/folders/0B_WmJoEtfQhDRl82b1dJTjB2ZGc?resourcekey=0-OjW5DtLUbX5xUob7fwRvEw&usp=sharing) or [baidu yun](https://pan.baidu.com/s/1kUNTl1l).
40 | - Second, prepare the training data as described in the paper, or download the data I prepared from [google drive](https://drive.google.com/drive/folders/0B_WmJoEtfQhDRl82b1dJTjB2ZGc?resourcekey=0-OjW5DtLUbX5xUob7fwRvEw&usp=sharing) or [baidu yun](https://pan.baidu.com/s/1kUNTl1l). Alternatively, prepare your own data according to the following steps.
41 | - Modify path and gt_path in prepare_training_data/split_label.py according to your dataset, and run
42 | ```shell
43 | cd lib/prepare_training_data
44 | python split_label.py
45 | ```
46 | - it will generate the prepared data in the current folder; then run
47 | ```shell
48 | python ToVoc.py
49 | ```
50 | - to convert the prepared training data into VOC format. It will generate a folder named TEXTVOC; move this folder to data/ and then run
51 | ```shell
52 | cd ../../data
53 | ln -s TEXTVOC VOCdevkit2007
54 | ```
55 | ## train
56 | Simply run
57 | ```shell
58 | python ./ctpn/train_net.py
59 | ```
60 | - you can modify some hyperparameters in ctpn/text.yml, or just use the parameters I set.
61 | - The model I provide in checkpoints/ was trained on a GTX 1070 for 50k iterations.
62 | - If you are using CUDA NMS, each iteration takes about 0.2s, so it takes about 2.5 hours to finish 50k iterations.
63 | ***
64 | # some results
65 | `NOTICE:` all the photos used below were collected from the internet. If this affects you, please contact me and I will delete them.
66 | 
67 | 
68 | ***
69 | ## oriented text connector
70 | - the oriented text connector has been implemented; it works, but still needs further improvement.
71 | - the left figure shows the result for DETECT_MODE H, the right figure for DETECT_MODE O
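- to experiment with the oriented connector, either set DETECT_MODE to O in ctpn/text.yml, or override the flag in code before running the demo. A minimal sketch, assuming the rest of the pipeline stays as in ctpn/demo.py:
```python
from lib.fast_rcnn.config import cfg, cfg_from_file

cfg_from_file('ctpn/text.yml')  # load the defaults shipped with the repo
cfg.TEST.DETECT_MODE = 'O'      # 'H' = horizontal, 'O' = oriented (see lib/fast_rcnn/config.py)
```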
72 | 
73 | 
74 | ***
75 | 
--------------------------------------------------------------------------------
/ctpn/__init__.py:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/ctpn/demo.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | 
3 | import cv2
4 | import glob
5 | import os
6 | import shutil
7 | import sys
8 | 
9 | import numpy as np
10 | import tensorflow as tf
11 | 
12 | sys.path.append(os.getcwd())
13 | from lib.networks.factory import get_network
14 | from lib.fast_rcnn.config import cfg, cfg_from_file
15 | from lib.fast_rcnn.test import test_ctpn
16 | from lib.utils.timer import Timer
17 | from lib.text_connector.detectors import TextDetector
18 | from lib.text_connector.text_connect_cfg import Config as TextLineCfg
19 | 
20 | 
21 | def resize_im(im, scale, max_scale=None):
22 |     f = float(scale) / min(im.shape[0], im.shape[1])  # scale the short side to `scale`
23 |     if max_scale is not None and f * max(im.shape[0], im.shape[1]) > max_scale:
24 |         f = float(max_scale) / max(im.shape[0], im.shape[1])  # cap the long side at `max_scale`
25 |     return cv2.resize(im, None, None, fx=f, fy=f, interpolation=cv2.INTER_LINEAR), f
26 | 
27 | 
28 | def draw_boxes(img, image_name, boxes, scale):
29 |     base_name = image_name.split('/')[-1]
30 |     with open('data/results/' + 'res_{}.txt'.format(base_name.split('.')[0]), 'w') as f:
31 |         for box in boxes:
32 |             if np.linalg.norm(box[0] - box[1]) < 5 or np.linalg.norm(box[3] - box[0]) < 5:
33 |                 continue
34 |             if box[8] >= 0.9:
35 |                 color = (0, 255, 0)
36 |             elif box[8] >= 0.8: color = (255, 0, 0)
37 |             else: color = (0, 0, 255)  # fallback so `color` is always defined
38 |             cv2.line(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), color, 2)
39 |             cv2.line(img, (int(box[0]), int(box[1])), (int(box[4]), int(box[5])), color, 2)
40 |             cv2.line(img, (int(box[6]), int(box[7])), (int(box[2]), int(box[3])), color, 2)
41 |             cv2.line(img, (int(box[4]), int(box[5])), (int(box[6]), int(box[7])), color, 2)
42 | 
43 |             min_x = min(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale))
44 |             min_y = min(int(box[1] / scale), int(box[3] / scale), int(box[5] / scale), int(box[7] / scale))
45 |             max_x = max(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale))
46 |             max_y = max(int(box[1] / scale), int(box[3] / scale), int(box[5] / scale), int(box[7] / scale))
47 | 
48 |             line = ','.join([str(min_x), str(min_y), str(max_x), str(max_y)]) + '\r\n'
49 |             f.write(line)
50 | 
51 |     img = cv2.resize(img, None, None, fx=1.0 / scale, fy=1.0 / scale, interpolation=cv2.INTER_LINEAR)
52 |     cv2.imwrite(os.path.join("data/results", base_name), img)
53 | 
54 | 
55 | def ctpn(sess, net, image_name):
56 |     timer = Timer()
57 |     timer.tic()
58 | 
59 |     img = cv2.imread(image_name)
60 |     img, scale = resize_im(img, scale=TextLineCfg.SCALE, max_scale=TextLineCfg.MAX_SCALE)
61 |     scores, boxes = test_ctpn(sess, net, img)
62 | 
63 |     textdetector = TextDetector()
64 |     boxes = textdetector.detect(boxes, scores[:, np.newaxis], img.shape[:2])
65 |     draw_boxes(img, image_name, boxes, scale)
66 |     timer.toc()
67 |     print(('Detection took {:.3f}s for '
68 |            '{:d} object proposals').format(timer.total_time, boxes.shape[0]))
69 | 
70 | 
71 | if __name__ == '__main__':
72 |     if os.path.exists("data/results/"):
73 |         shutil.rmtree("data/results/")
74 |     os.makedirs("data/results/")
75 | 
76 |     cfg_from_file('ctpn/text.yml')
77 | 
78 |     # init session
79 |     config = tf.ConfigProto(allow_soft_placement=True)
80 |     sess = tf.Session(config=config)
81 |     # load network
82 |     net = get_network("VGGnet_test")
83 |     # load model
84 |     print(('Loading network {:s}... '.format("VGGnet_test")), end=' ')
85 |     saver = tf.train.Saver()
86 | 
87 |     try:
88 |         ckpt = tf.train.get_checkpoint_state(cfg.TEST.checkpoints_path)
89 |         print('Restoring from {}...'.format(ckpt.model_checkpoint_path), end=' ')
90 |         saver.restore(sess, ckpt.model_checkpoint_path)
91 |         print('done')
92 |     except Exception:
93 |         raise RuntimeError('Check your pretrained model in {:s}'.format(cfg.TEST.checkpoints_path))
94 | 
95 |     im = 128 * np.ones((300, 300, 3), dtype=np.uint8)  # dummy image to warm up the network
96 |     for i in range(2):
97 |         _, _ = test_ctpn(sess, net, im)
98 | 
99 |     im_names = glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.png')) + \
100 |                glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.jpg'))
101 | 
102 |     for im_name in im_names:
103 |         print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')
104 |         print(('Demo for {:s}'.format(im_name)))
105 |         ctpn(sess, net, im_name)
--------------------------------------------------------------------------------
/ctpn/demo_pb.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | 
3 | import glob
4 | import os
5 | import shutil
6 | import sys
7 | 
8 | import cv2
9 | import numpy as np
10 | import tensorflow as tf
11 | from tensorflow.python.platform import gfile
12 | 
13 | sys.path.append(os.getcwd())
14 | from lib.fast_rcnn.config import cfg, cfg_from_file
15 | from lib.fast_rcnn.test import _get_blobs
16 | from lib.text_connector.detectors import TextDetector
17 | from lib.text_connector.text_connect_cfg import Config as TextLineCfg
18 | from lib.rpn_msr.proposal_layer_tf import proposal_layer
19 | 
20 | 
21 | def resize_im(im, scale, max_scale=None):
22 |     f = float(scale) / min(im.shape[0], im.shape[1])
23 |     if max_scale is not None and f * max(im.shape[0], im.shape[1]) > max_scale:
24 |         f = float(max_scale) / max(im.shape[0], im.shape[1])
25 |     return cv2.resize(im, None, None, fx=f, fy=f, interpolation=cv2.INTER_LINEAR), f
26 | 
27 | 
28 | def draw_boxes(img, image_name, boxes, scale):
29 |     base_name = image_name.split('/')[-1]
30 |     with open('data/results/' + 'res_{}.txt'.format(base_name.split('.')[0]), 'w') as f:
31 |         for box in boxes:
32 |             if np.linalg.norm(box[0] - box[1]) < 5 or np.linalg.norm(box[3] - box[0]) < 5:
33 |                 continue
34 |             if box[8] >= 0.9:
35 |                 color = (0, 255, 0)
36 |             elif box[8] >= 0.8: color = (255, 0, 0)
37 |             else: color = (0, 0, 255)  # fallback so `color` is always defined
38 |             cv2.line(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), color, 2)
39 |             cv2.line(img, (int(box[0]), int(box[1])), (int(box[4]), int(box[5])), color, 2)
40 |             cv2.line(img, (int(box[6]), int(box[7])), (int(box[2]), int(box[3])), color, 2)
41 |             cv2.line(img, (int(box[4]), int(box[5])), (int(box[6]), int(box[7])), color, 2)
42 | 
43 |             min_x = min(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale))
44 |             min_y = min(int(box[1] / scale), int(box[3] / scale),
int(box[5] / scale), int(box[7] / scale)) 45 | max_x = max(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale)) 46 | max_y = max(int(box[1] / scale), int(box[3] / scale), int(box[5] / scale), int(box[7] / scale)) 47 | 48 | line = ','.join([str(min_x), str(min_y), str(max_x), str(max_y)]) + '\r\n' 49 | f.write(line) 50 | 51 | img = cv2.resize(img, None, None, fx=1.0 / scale, fy=1.0 / scale, interpolation=cv2.INTER_LINEAR) 52 | cv2.imwrite(os.path.join("data/results", base_name), img) 53 | 54 | 55 | if __name__ == '__main__': 56 | 57 | if os.path.exists("data/results/"): 58 | shutil.rmtree("data/results/") 59 | os.makedirs("data/results/") 60 | 61 | cfg_from_file('ctpn/text.yml') 62 | 63 | # init session 64 | config = tf.ConfigProto(allow_soft_placement=True) 65 | sess = tf.Session(config=config) 66 | with gfile.FastGFile('data/ctpn.pb', 'rb') as f: 67 | graph_def = tf.GraphDef() 68 | graph_def.ParseFromString(f.read()) 69 | sess.graph.as_default() 70 | tf.import_graph_def(graph_def, name='') 71 | sess.run(tf.global_variables_initializer()) 72 | 73 | input_img = sess.graph.get_tensor_by_name('Placeholder:0') 74 | output_cls_prob = sess.graph.get_tensor_by_name('Reshape_2:0') 75 | output_box_pred = sess.graph.get_tensor_by_name('rpn_bbox_pred/Reshape_1:0') 76 | 77 | im_names = glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.png')) + \ 78 | glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.jpg')) 79 | 80 | for im_name in im_names: 81 | print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~') 82 | print(('Demo for {:s}'.format(im_name))) 83 | img = cv2.imread(im_name) 84 | img, scale = resize_im(img, scale=TextLineCfg.SCALE, max_scale=TextLineCfg.MAX_SCALE) 85 | blobs, im_scales = _get_blobs(img, None) 86 | if cfg.TEST.HAS_RPN: 87 | im_blob = blobs['data'] 88 | blobs['im_info'] = np.array( 89 | [[im_blob.shape[1], im_blob.shape[2], im_scales[0]]], 90 | dtype=np.float32) 91 | cls_prob, box_pred = sess.run([output_cls_prob, output_box_pred], feed_dict={input_img: blobs['data']}) 92 | rois, _ = proposal_layer(cls_prob, box_pred, blobs['im_info'], 'TEST', anchor_scales=cfg.ANCHOR_SCALES) 93 | 94 | scores = rois[:, 0] 95 | boxes = rois[:, 1:5] / im_scales[0] 96 | textdetector = TextDetector() 97 | boxes = textdetector.detect(boxes, scores[:, np.newaxis], img.shape[:2]) 98 | draw_boxes(img, im_name, boxes, scale) 99 | -------------------------------------------------------------------------------- /ctpn/generate_pb.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | 6 | import tensorflow as tf 7 | from tensorflow.python.framework.graph_util import convert_variables_to_constants 8 | 9 | sys.path.append(os.getcwd()) 10 | from lib.networks.factory import get_network 11 | from lib.fast_rcnn.config import cfg, cfg_from_file 12 | 13 | if __name__ == "__main__": 14 | cfg_from_file('ctpn/text.yml') 15 | 16 | config = tf.ConfigProto(allow_soft_placement=True) 17 | sess = tf.Session(config=config) 18 | net = get_network("VGGnet_test") 19 | print(('Loading network {:s}... 
'.format("VGGnet_test")), end=' ') 20 | saver = tf.train.Saver() 21 | try: 22 | ckpt = tf.train.get_checkpoint_state(cfg.TEST.checkpoints_path) 23 | print('Restoring from {}...'.format(ckpt.model_checkpoint_path), end=' ') 24 | saver.restore(sess, ckpt.model_checkpoint_path) 25 | print('done') 26 | except: 27 | raise 'Check your pretrained {:s}'.format(ckpt.model_checkpoint_path) 28 | print(' done.') 29 | 30 | print('all nodes are:\n') 31 | graph = tf.get_default_graph() 32 | input_graph_def = graph.as_graph_def() 33 | node_names = [node.name for node in input_graph_def.node] 34 | for x in node_names: 35 | print(x) 36 | output_node_names = 'Reshape_2,rpn_bbox_pred/Reshape_1' 37 | output_graph_def = convert_variables_to_constants(sess, input_graph_def, output_node_names.split(',')) 38 | output_graph = 'data/ctpn.pb' 39 | with tf.gfile.GFile(output_graph, 'wb') as f: 40 | f.write(output_graph_def.SerializeToString()) 41 | sess.close() 42 | -------------------------------------------------------------------------------- /ctpn/text.yml: -------------------------------------------------------------------------------- 1 | EXP_DIR: ctpn_end2end 2 | LOG_DIR: ctpn 3 | IS_MULTISCALE: False 4 | NET_NAME: VGGnet 5 | ANCHOR_SCALES: [16] 6 | NCLASSES: 2 7 | USE_GPU_NMS: True 8 | TRAIN: 9 | restore: 0 10 | max_steps: 50000 11 | SOLVER: Adam 12 | OHEM: False 13 | RPN_BATCHSIZE: 300 14 | BATCH_SIZE: 300 15 | LOG_IMAGE_ITERS: 100 16 | DISPLAY: 10 17 | SNAPSHOT_ITERS: 1000 18 | HAS_RPN: True 19 | LEARNING_RATE: 0.00001 20 | MOMENTUM: 0.9 21 | GAMMA: 0.1 22 | STEPSIZE: 30000 23 | IMS_PER_BATCH: 1 24 | BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True 25 | RPN_POSITIVE_OVERLAP: 0.7 26 | PROPOSAL_METHOD: gt 27 | BG_THRESH_LO: 0.0 28 | PRECLUDE_HARD_SAMPLES: True 29 | BBOX_INSIDE_WEIGHTS: [0, 1, 0, 1] 30 | RPN_BBOX_INSIDE_WEIGHTS: [0, 1, 0, 1] 31 | RPN_POSITIVE_WEIGHT: -1.0 32 | FG_FRACTION: 0.3 33 | WEIGHT_DECAY: 0.0005 34 | TEST: 35 | HAS_RPN: True 36 | DETECT_MODE: H 37 | checkpoints_path: checkpoints/ 38 | # checkpoints_path: output/ctpn_end2end/voc_2007_trainval 39 | -------------------------------------------------------------------------------- /ctpn/train_net.py: -------------------------------------------------------------------------------- 1 | import os.path 2 | import pprint 3 | import sys 4 | 5 | sys.path.append(os.getcwd()) 6 | from lib.fast_rcnn.train import get_training_roidb, train_net 7 | from lib.fast_rcnn.config import cfg_from_file, get_output_dir, get_log_dir 8 | from lib.datasets.factory import get_imdb 9 | from lib.networks.factory import get_network 10 | from lib.fast_rcnn.config import cfg 11 | 12 | if __name__ == '__main__': 13 | cfg_from_file('ctpn/text.yml') 14 | print('Using config:') 15 | pprint.pprint(cfg) 16 | imdb = get_imdb('voc_2007_trainval') 17 | print('Loaded dataset `{:s}` for training'.format(imdb.name)) 18 | roidb = get_training_roidb(imdb) 19 | 20 | output_dir = get_output_dir(imdb, None) 21 | log_dir = get_log_dir(imdb) 22 | print('Output will be saved to `{:s}`'.format(output_dir)) 23 | print('Logs will be saved to `{:s}`'.format(log_dir)) 24 | 25 | device_name = '/gpu:0' 26 | print(device_name) 27 | 28 | network = get_network('VGGnet_train') 29 | 30 | train_net(network, imdb, roidb, 31 | output_dir=output_dir, 32 | log_dir=log_dir, 33 | pretrained_model='data/pretrain/VGG_imagenet.npy', 34 | max_iters=int(cfg.TRAIN.max_steps), 35 | restore=bool(int(cfg.TRAIN.restore))) 36 | -------------------------------------------------------------------------------- /data/VOCdevkit2007: 
-------------------------------------------------------------------------------- 1 | /media/D/code/OCR/CTPN_LSTM/data/VOCdevkit -------------------------------------------------------------------------------- /data/demo/006.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/006.jpg -------------------------------------------------------------------------------- /data/demo/007.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/007.jpg -------------------------------------------------------------------------------- /data/demo/008.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/008.jpg -------------------------------------------------------------------------------- /data/demo/009.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/009.jpg -------------------------------------------------------------------------------- /data/demo/010.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/010.png -------------------------------------------------------------------------------- /data/oriented_results/006.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/006.jpg -------------------------------------------------------------------------------- /data/oriented_results/007.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/007.jpg -------------------------------------------------------------------------------- /data/oriented_results/008.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/008.jpg -------------------------------------------------------------------------------- /data/oriented_results/009.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/009.jpg -------------------------------------------------------------------------------- /data/oriented_results/010.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/010.png -------------------------------------------------------------------------------- /data/results/006.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/006.jpg -------------------------------------------------------------------------------- /data/results/007.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/007.jpg -------------------------------------------------------------------------------- /data/results/008.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/008.jpg -------------------------------------------------------------------------------- /data/results/009.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/009.jpg -------------------------------------------------------------------------------- /data/results/010.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/010.png -------------------------------------------------------------------------------- /data/results/res_006.txt: -------------------------------------------------------------------------------- 1 | 435,476,870,576 2 | 409,299,716,400 3 | 179,118,691,237 4 | 179,623,614,740 5 | 153,952,947,1069 6 | 102,0,512,30 7 | 230,800,921,906 8 | -------------------------------------------------------------------------------- /data/results/res_007.txt: -------------------------------------------------------------------------------- 1 | 0,653,254,675 2 | 872,654,1018,676 3 | 181,373,836,558 4 | 436,287,545,387 5 | 345,100,654,310 6 | -------------------------------------------------------------------------------- /data/results/res_008.txt: -------------------------------------------------------------------------------- 1 | 96,214,512,258 2 | 96,161,320,198 3 | 96,255,480,302 4 | 96,458,320,493 5 | 96,299,480,343 6 | 96,548,464,593 7 | 96,419,480,462 8 | 272,63,384,103 9 | 432,68,496,98 10 | 96,586,464,632 11 | 96,382,496,425 12 | 96,515,448,559 13 | 96,787,224,799 14 | 432,18,496,49 15 | 96,742,256,778 16 | 96,622,448,668 17 | 96,707,464,756 18 | 96,61,128,91 19 | 96,15,208,51 20 | 96,340,496,382 21 | 256,13,384,53 22 | 96,659,224,694 23 | 80,120,512,166 24 | -------------------------------------------------------------------------------- /data/results/res_009.txt: -------------------------------------------------------------------------------- 1 | 0,695,947,857 2 | 0,19,947,239 3 | 128,1057,768,1237 4 | 51,882,870,1035 5 | 230,253,691,458 6 | -------------------------------------------------------------------------------- /data/results/res_010.txt: -------------------------------------------------------------------------------- 1 | 40,60,260,79 2 | 113,204,193,216 3 | 33,130,266,151 4 | 120,179,180,197 5 | 60,106,240,126 6 | 40,84,260,103 7 | 26,153,273,174 8 | 33,10,260,41 9 | -------------------------------------------------------------------------------- /lib/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/lib/__init__.py -------------------------------------------------------------------------------- /lib/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from .imdb import imdb 2 | from .pascal_voc import pascal_voc 3 | from . import factory 4 | 5 | -------------------------------------------------------------------------------- /lib/datasets/factory.py: -------------------------------------------------------------------------------- 1 | from .pascal_voc import pascal_voc 2 | __sets = {} 3 | def _selective_search_IJCV_top_k(split, year, top_k): 4 | imdb = pascal_voc(split, year) 5 | imdb.roidb_handler = imdb.selective_search_IJCV_roidb 6 | imdb.config['top_k'] = top_k 7 | return imdb 8 | # Set up voc__ using selective search "fast" mode 9 | for year in ['2007', '2012', '0712']: 10 | for split in ['train', 'val', 'trainval', 'test']: 11 | name = 'voc_{}_{}'.format(year, split) 12 | __sets[name] = (lambda split=split, year=year: 13 | pascal_voc(split, year)) 14 | 15 | def get_imdb(name): 16 | """Get an imdb (image database) by name.""" 17 | if name not in __sets: 18 | print((list_imdbs())) 19 | raise KeyError('Unknown dataset: {}'.format(name)) 20 | return __sets[name]() 21 | 22 | def list_imdbs(): 23 | """List all registered imdbs.""" 24 | return list(__sets.keys()) 25 | -------------------------------------------------------------------------------- /lib/datasets/imdb.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path as osp 3 | import PIL 4 | import numpy as np 5 | import scipy.sparse 6 | from lib.utils.bbox import bbox_overlaps 7 | from lib.fast_rcnn.config import cfg 8 | 9 | class imdb(object): 10 | 11 | def __init__(self, name): 12 | self._name = name 13 | self._num_classes = 0 14 | self._classes = [] 15 | self._image_index = [] 16 | self._obj_proposer = 'selective_search' 17 | self._roidb = None 18 | print(self.default_roidb) 19 | self._roidb_handler = self.default_roidb 20 | # Use this dict for storing dataset specific config options 21 | self.config = {} 22 | 23 | @property 24 | def name(self): 25 | return self._name 26 | 27 | @property 28 | def num_classes(self): 29 | return len(self._classes) 30 | 31 | @property 32 | def classes(self): 33 | return self._classes 34 | 35 | @property 36 | def image_index(self): 37 | return self._image_index 38 | 39 | @property 40 | def roidb_handler(self): 41 | return self._roidb_handler 42 | 43 | @roidb_handler.setter 44 | def roidb_handler(self, val): 45 | self._roidb_handler = val 46 | 47 | def set_proposal_method(self, method): 48 | method = eval('self.' 
+ method + '_roidb') 49 | self.roidb_handler = method 50 | 51 | @property 52 | def roidb(self): 53 | # A roidb is a list of dictionaries, each with the following keys: 54 | # boxes 55 | # gt_overlaps 56 | # gt_classes 57 | # flipped 58 | if self._roidb is not None: 59 | return self._roidb 60 | self._roidb = self.roidb_handler() 61 | return self._roidb 62 | 63 | @property 64 | def cache_path(self): 65 | cache_path = osp.abspath(osp.join(cfg.DATA_DIR, 'cache')) 66 | if not os.path.exists(cache_path): 67 | os.makedirs(cache_path) 68 | return cache_path 69 | 70 | @property 71 | def num_images(self): 72 | return len(self.image_index) 73 | 74 | def image_path_at(self, i): 75 | raise NotImplementedError 76 | 77 | def default_roidb(self): 78 | raise NotImplementedError 79 | 80 | def _get_widths(self): 81 | return [PIL.Image.open(self.image_path_at(i)).size[0] 82 | for i in range(self.num_images)] 83 | 84 | def append_flipped_images(self): 85 | num_images = self.num_images 86 | widths = self._get_widths() 87 | for i in range(num_images): 88 | boxes = self.roidb[i]['boxes'].copy() 89 | oldx1 = boxes[:, 0].copy() 90 | oldx2 = boxes[:, 2].copy() 91 | boxes[:, 0] = widths[i] - oldx2 - 1 92 | boxes[:, 2] = widths[i] - oldx1 - 1 93 | for b in range(len(boxes)): 94 | if boxes[b][2]< boxes[b][0]: 95 | boxes[b][0] = 0 96 | assert (boxes[:, 2] >= boxes[:, 0]).all() 97 | entry = {'boxes' : boxes, 98 | 'gt_overlaps' : self.roidb[i]['gt_overlaps'], 99 | 'gt_classes' : self.roidb[i]['gt_classes'], 100 | 'flipped' : True} 101 | 102 | if 'gt_ishard' in self.roidb[i] and 'dontcare_areas' in self.roidb[i]: 103 | entry['gt_ishard'] = self.roidb[i]['gt_ishard'].copy() 104 | dontcare_areas = self.roidb[i]['dontcare_areas'].copy() 105 | oldx1 = dontcare_areas[:, 0].copy() 106 | oldx2 = dontcare_areas[:, 2].copy() 107 | dontcare_areas[:, 0] = widths[i] - oldx2 - 1 108 | dontcare_areas[:, 2] = widths[i] - oldx1 - 1 109 | entry['dontcare_areas'] = dontcare_areas 110 | 111 | self.roidb.append(entry) 112 | 113 | self._image_index = self._image_index * 2 114 | 115 | 116 | def create_roidb_from_box_list(self, box_list, gt_roidb): 117 | assert len(box_list) == self.num_images, \ 118 | 'Number of boxes must match number of ground-truth images' 119 | roidb = [] 120 | for i in range(self.num_images): 121 | boxes = box_list[i] 122 | num_boxes = boxes.shape[0] 123 | overlaps = np.zeros((num_boxes, self.num_classes), dtype=np.float32) 124 | 125 | if gt_roidb is not None and gt_roidb[i]['boxes'].size > 0: 126 | gt_boxes = gt_roidb[i]['boxes'] 127 | gt_classes = gt_roidb[i]['gt_classes'] 128 | gt_overlaps = bbox_overlaps(boxes.astype(np.float), 129 | gt_boxes.astype(np.float)) 130 | argmaxes = gt_overlaps.argmax(axis=1) 131 | maxes = gt_overlaps.max(axis=1) 132 | I = np.where(maxes > 0)[0] 133 | overlaps[I, gt_classes[argmaxes[I]]] = maxes[I] 134 | 135 | overlaps = scipy.sparse.csr_matrix(overlaps) 136 | roidb.append({ 137 | 'boxes' : boxes, 138 | 'gt_classes' : np.zeros((num_boxes,), dtype=np.int32), 139 | 'gt_overlaps' : overlaps, 140 | 'flipped' : False, 141 | 'seg_areas' : np.zeros((num_boxes,), dtype=np.float32), 142 | }) 143 | return roidb 144 | 145 | @staticmethod 146 | def merge_roidbs(a, b): 147 | assert len(a) == len(b) 148 | for i in range(len(a)): 149 | a[i]['boxes'] = np.vstack((a[i]['boxes'], b[i]['boxes'])) 150 | a[i]['gt_classes'] = np.hstack((a[i]['gt_classes'], 151 | b[i]['gt_classes'])) 152 | a[i]['gt_overlaps'] = scipy.sparse.vstack([a[i]['gt_overlaps'], 153 | b[i]['gt_overlaps']]) 154 | a[i]['seg_areas'] = 
np.hstack((a[i]['seg_areas'], 155 | b[i]['seg_areas'])) 156 | return a 157 | 158 | -------------------------------------------------------------------------------- /lib/datasets/pascal_voc.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | import os 3 | import numpy as np 4 | import scipy.sparse 5 | try: 6 | import cPickle as pickle 7 | except: 8 | import pickle 9 | import uuid 10 | import xml.etree.ElementTree as ET 11 | from .imdb import imdb 12 | from lib.fast_rcnn.config import cfg 13 | 14 | class pascal_voc(imdb): 15 | def __init__(self, image_set, year, devkit_path=None): 16 | imdb.__init__(self, 'voc_' + year + '_' + image_set) 17 | self._year = year 18 | self._image_set = image_set 19 | self._devkit_path = self._get_default_path() if devkit_path is None \ 20 | else devkit_path 21 | self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year) 22 | self._classes = ('__background__', # always index 0 23 | 'text') 24 | 25 | self._class_to_ind = dict(list(zip(self.classes, list(range(self.num_classes))))) 26 | self._image_ext = '.jpg' 27 | self._image_index = self._load_image_set_index() 28 | # Default to roidb handler 29 | #self._roidb_handler = self.selective_search_roidb 30 | self._roidb_handler = self.gt_roidb 31 | self._salt = str(uuid.uuid4()) 32 | self._comp_id = 'comp4' 33 | 34 | # PASCAL specific config options 35 | self.config = {'cleanup' : True, 36 | 'use_salt' : True, 37 | 'use_diff' : False, 38 | 'matlab_eval' : False, 39 | 'rpn_file' : None, 40 | 'min_size' : 2} 41 | 42 | assert os.path.exists(self._devkit_path), \ 43 | 'VOCdevkit path does not exist: {}'.format(self._devkit_path) 44 | assert os.path.exists(self._data_path), \ 45 | 'Path does not exist: {}'.format(self._data_path) 46 | 47 | def image_path_at(self, i): 48 | """ 49 | Return the absolute path to image i in the image sequence. 50 | """ 51 | return self.image_path_from_index(self._image_index[i]) 52 | 53 | def image_path_from_index(self, index): 54 | """ 55 | Construct an image path from the image's "index" identifier. 56 | """ 57 | image_path = os.path.join(self._data_path, 'JPEGImages', 58 | index + self._image_ext) 59 | assert os.path.exists(image_path), \ 60 | 'Path does not exist: {}'.format(image_path) 61 | return image_path 62 | 63 | def _load_image_set_index(self): 64 | """ 65 | Load the indexes listed in this dataset's image set file. 66 | """ 67 | # Example path to image set file: 68 | # self._devkit_path + /VOCdevkit2007/VOC2007/ImageSets/Main/val.txt 69 | image_set_file = os.path.join(self._data_path, 'ImageSets', 'Main', 70 | self._image_set + '.txt') 71 | assert os.path.exists(image_set_file), \ 72 | 'Path does not exist: {}'.format(image_set_file) 73 | with open(image_set_file) as f: 74 | image_index = [x.strip() for x in f.readlines()] 75 | return image_index 76 | 77 | def _get_default_path(self): 78 | """ 79 | Return the default path where PASCAL VOC is expected to be installed. 80 | """ 81 | return os.path.join(cfg.DATA_DIR, 'VOCdevkit' + self._year) 82 | 83 | def gt_roidb(self): 84 | """ 85 | Return the database of ground-truth regions of interest. 86 | 87 | This function loads/saves from/to a cache file to speed up future calls. 
88 | """ 89 | cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl') 90 | if os.path.exists(cache_file): 91 | with open(cache_file, 'rb') as fid: 92 | roidb = pickle.load(fid) 93 | print('{} gt roidb loaded from {}'.format(self.name, cache_file)) 94 | return roidb 95 | 96 | gt_roidb = [self._load_pascal_annotation(index) 97 | for index in self.image_index] 98 | with open(cache_file, 'wb') as fid: 99 | pickle.dump(gt_roidb, fid, pickle.HIGHEST_PROTOCOL) 100 | print('wrote gt roidb to {}'.format(cache_file)) 101 | 102 | return gt_roidb 103 | 104 | def rpn_roidb(self): 105 | if int(self._year) == 2007 or self._image_set != 'test': 106 | gt_roidb = self.gt_roidb() 107 | rpn_roidb = self._load_rpn_roidb(gt_roidb) 108 | roidb = imdb.merge_roidbs(gt_roidb, rpn_roidb) 109 | else: 110 | roidb = self._load_rpn_roidb(None) 111 | 112 | return roidb 113 | 114 | def _load_rpn_roidb(self, gt_roidb): 115 | filename = self.config['rpn_file'] 116 | print('loading {}'.format(filename)) 117 | assert os.path.exists(filename), \ 118 | 'rpn data not found at: {}'.format(filename) 119 | with open(filename, 'rb') as f: 120 | box_list = pickle.load(f) 121 | return self.create_roidb_from_box_list(box_list, gt_roidb) 122 | 123 | 124 | def _load_pascal_annotation(self, index): 125 | """ 126 | Load image and bounding boxes info from XML file in the PASCAL VOC 127 | format. 128 | """ 129 | filename = os.path.join(self._data_path, 'Annotations', index + '.xml') 130 | tree = ET.parse(filename) 131 | objs = tree.findall('object') 132 | num_objs = len(objs) 133 | 134 | boxes = np.zeros((num_objs, 4), dtype=np.uint16) 135 | gt_classes = np.zeros((num_objs), dtype=np.int32) 136 | overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32) 137 | # "Seg" area for pascal is just the box area 138 | seg_areas = np.zeros((num_objs), dtype=np.float32) 139 | ishards = np.zeros((num_objs), dtype=np.int32) 140 | 141 | # Load object bounding boxes into a data frame. 
142 | for ix, obj in enumerate(objs): 143 | bbox = obj.find('bndbox') 144 | # Make pixel indexes 0-based 145 | x1 = float(bbox.find('xmin').text) 146 | y1 = float(bbox.find('ymin').text) 147 | x2 = float(bbox.find('xmax').text) 148 | y2 = float(bbox.find('ymax').text) 149 | diffc = obj.find('difficult') 150 | difficult = 0 if diffc == None else int(diffc.text) 151 | ishards[ix] = difficult 152 | 153 | cls = self._class_to_ind[obj.find('name').text.lower().strip()] 154 | boxes[ix, :] = [x1, y1, x2, y2] 155 | gt_classes[ix] = cls 156 | overlaps[ix, cls] = 1.0 157 | seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1) 158 | 159 | overlaps = scipy.sparse.csr_matrix(overlaps) 160 | 161 | return {'boxes' : boxes, 162 | 'gt_classes': gt_classes, 163 | 'gt_ishard': ishards, 164 | 'gt_overlaps' : overlaps, 165 | 'flipped' : False, 166 | 'seg_areas' : seg_areas} 167 | 168 | def _get_comp_id(self): 169 | comp_id = (self._comp_id + '_' + self._salt if self.config['use_salt'] 170 | else self._comp_id) 171 | return comp_id 172 | 173 | def _get_voc_results_file_template(self): 174 | filename = self._get_comp_id() + '_det_' + self._image_set + '_{:s}.txt' 175 | filedir = os.path.join(self._devkit_path, 'results', 'VOC' + self._year, 'Main') 176 | if not os.path.exists(filedir): 177 | os.makedirs(filedir) 178 | path = os.path.join(filedir, filename) 179 | return path 180 | 181 | def _write_voc_results_file(self, all_boxes): 182 | for cls_ind, cls in enumerate(self.classes): 183 | if cls == '__background__': 184 | continue 185 | print('Writing {} VOC results file'.format(cls)) 186 | filename = self._get_voc_results_file_template().format(cls) 187 | with open(filename, 'wt') as f: 188 | for im_ind, index in enumerate(self.image_index): 189 | dets = all_boxes[cls_ind][im_ind] 190 | if dets == []: 191 | continue 192 | # the VOCdevkit expects 1-based indices 193 | for k in range(dets.shape[0]): 194 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'. 195 | format(index, dets[k, -1], 196 | dets[k, 0] + 1, dets[k, 1] + 1, 197 | dets[k, 2] + 1, dets[k, 3] + 1)) 198 | 199 | 200 | if __name__ == '__main__': 201 | d = pascal_voc('trainval', '2007') 202 | res = d.roidb 203 | from IPython import embed; embed() 204 | -------------------------------------------------------------------------------- /lib/fast_rcnn/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/lib/fast_rcnn/__init__.py -------------------------------------------------------------------------------- /lib/fast_rcnn/bbox_transform.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def bbox_transform(ex_rois, gt_rois): 4 | """ 5 | computes the distance from ground-truth boxes to the given boxes, normed by their size 6 | :param ex_rois: n * 4 numpy array, given boxes 7 | :param gt_rois: n * 4 numpy array, ground-truth boxes 8 | :return: deltas: n * 4 numpy array, ground-truth boxes 9 | """ 10 | ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0 11 | ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0 12 | ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths 13 | ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights 14 | 15 | assert np.min(ex_widths) > 0.1 and np.min(ex_heights) > 0.1, \ 16 | 'Invalid boxes found: {} {}'. 
\ 17 | format(ex_rois[np.argmin(ex_widths), :], ex_rois[np.argmin(ex_heights), :]) 18 | 19 | gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0 20 | gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0 21 | gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths 22 | gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights 23 | 24 | # warnings.catch_warnings() 25 | # warnings.filterwarnings('error') 26 | targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths 27 | targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights 28 | targets_dw = np.log(gt_widths / ex_widths) 29 | targets_dh = np.log(gt_heights / ex_heights) 30 | 31 | targets = np.vstack( 32 | (targets_dx, targets_dy, targets_dw, targets_dh)).transpose() 33 | 34 | return targets 35 | 36 | def bbox_transform_inv(boxes, deltas): 37 | 38 | boxes = boxes.astype(deltas.dtype, copy=False) 39 | 40 | widths = boxes[:, 2] - boxes[:, 0] + 1.0 41 | heights = boxes[:, 3] - boxes[:, 1] + 1.0 42 | ctr_x = boxes[:, 0] + 0.5 * widths 43 | ctr_y = boxes[:, 1] + 0.5 * heights 44 | 45 | dx = deltas[:, 0::4] 46 | dy = deltas[:, 1::4] 47 | dw = deltas[:, 2::4] 48 | dh = deltas[:, 3::4] 49 | 50 | pred_ctr_x = ctr_x[:, np.newaxis] 51 | pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis] 52 | pred_w = widths[:, np.newaxis] 53 | pred_h = np.exp(dh) * heights[:, np.newaxis] 54 | 55 | pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype) 56 | # x1 57 | pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w 58 | # y1 59 | pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h 60 | # x2 61 | pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w 62 | # y2 63 | pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h 64 | 65 | return pred_boxes 66 | 67 | def clip_boxes(boxes, im_shape): 68 | """ 69 | Clip boxes to image boundaries. 70 | """ 71 | 72 | # x1 >= 0 73 | boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0) 74 | # y1 >= 0 75 | boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0) 76 | # x2 < im_shape[1] 77 | boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0) 78 | # y2 < im_shape[0] 79 | boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0) 80 | return boxes 81 | -------------------------------------------------------------------------------- /lib/fast_rcnn/config.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path as osp 3 | import numpy as np 4 | from time import strftime, localtime 5 | from easydict import EasyDict as edict 6 | 7 | __C = edict() 8 | cfg = __C 9 | 10 | # Default GPU device id 11 | __C.GPU_ID = 0 12 | 13 | # Training options 14 | __C.IS_RPN = True 15 | __C.ANCHOR_SCALES = [16] 16 | __C.NCLASSES = 2 17 | __C.USE_GPU_NMS = True 18 | # multiscale training and testing 19 | __C.IS_MULTISCALE = False 20 | __C.IS_EXTRAPOLATING = True 21 | 22 | __C.REGION_PROPOSAL = 'RPN' 23 | 24 | __C.NET_NAME = 'VGGnet' 25 | __C.SUBCLS_NAME = 'voxel_exemplars' 26 | 27 | __C.TRAIN = edict() 28 | # Adam, Momentum, RMS 29 | __C.TRAIN.restore = 0 30 | __C.TRAIN.max_steps = 100000 31 | __C.TRAIN.SOLVER = 'Momentum' 32 | # learning rate 33 | __C.TRAIN.WEIGHT_DECAY = 0.0005 34 | __C.TRAIN.LEARNING_RATE = 0.001 35 | __C.TRAIN.MOMENTUM = 0.9 36 | __C.TRAIN.GAMMA = 0.1 37 | __C.TRAIN.STEPSIZE = 50000 38 | __C.TRAIN.DISPLAY = 10 39 | __C.TRAIN.LOG_IMAGE_ITERS = 100 40 | __C.TRAIN.OHEM = False 41 | __C.TRAIN.RANDOM_DOWNSAMPLE = False 42 | 43 | # Scales to compute real features 44 | __C.TRAIN.SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0) 45 | __C.TRAIN.KERNEL_SIZE = 5 46 | 
__C.TRAIN.ASPECTS= (1,) 47 | __C.TRAIN.SCALES = (600,) 48 | 49 | # Max pixel size of the longest side of a scaled input image 50 | __C.TRAIN.MAX_SIZE = 1000 51 | 52 | # Images to use per minibatch 53 | __C.TRAIN.IMS_PER_BATCH = 2 54 | 55 | # Minibatch size (number of regions of interest [ROIs]) 56 | __C.TRAIN.BATCH_SIZE = 128 57 | 58 | # Fraction of minibatch that is labeled foreground (i.e. class > 0) 59 | __C.TRAIN.FG_FRACTION = 0.25 60 | 61 | # Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH) 62 | __C.TRAIN.FG_THRESH = 0.5 63 | 64 | # Overlap threshold for a ROI to be considered background (class = 0 if 65 | # overlap in [LO, HI)) 66 | __C.TRAIN.BG_THRESH_HI = 0.5 67 | __C.TRAIN.BG_THRESH_LO = 0.1 68 | 69 | # Use horizontally-flipped images during training? 70 | __C.TRAIN.USE_FLIPPED = True 71 | 72 | # Train bounding-box regressors 73 | __C.TRAIN.BBOX_REG = True 74 | 75 | # Overlap required between a ROI and ground-truth box in order for that ROI to 76 | # be used as a bounding-box regression training example 77 | __C.TRAIN.BBOX_THRESH = 0.5 78 | 79 | # Iterations between snapshots 80 | __C.TRAIN.SNAPSHOT_ITERS = 5000 81 | 82 | # solver.prototxt specifies the snapshot path prefix, this adds an optional 83 | # infix to yield the path: [_]_iters_XYZ.caffemodel 84 | __C.TRAIN.SNAPSHOT_PREFIX = 'VGGnet_fast_rcnn' 85 | __C.TRAIN.SNAPSHOT_INFIX = '' 86 | 87 | # Use a prefetch thread in roi_data_layer.layer 88 | # So far I haven't found this useful; likely more engineering work is required 89 | __C.TRAIN.USE_PREFETCH = False 90 | 91 | # Normalize the targets (subtract empirical mean, divide by empirical stddev) 92 | __C.TRAIN.BBOX_NORMALIZE_TARGETS = True 93 | # Deprecated (inside weights) 94 | # used for assigning weights for each coords (x1, y1, w, h) 95 | __C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) 96 | # Normalize the targets using "precomputed" (or made up) means and stdevs 97 | # (BBOX_NORMALIZE_TARGETS must also be True) 98 | __C.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True 99 | __C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0) 100 | __C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2) 101 | # faster rcnn dont use pre-generated rois by selective search 102 | # __C.TRAIN.BBOX_NORMALIZE_STDS = (1, 1, 1, 1) 103 | 104 | # Train using these proposals 105 | __C.TRAIN.PROPOSAL_METHOD = 'selective_search' 106 | 107 | # Make minibatches from images that have similar aspect ratios (i.e. both 108 | # tall and thin or both short and wide) in order to avoid wasting computation 109 | # on zero-padding. 
110 | __C.TRAIN.ASPECT_GROUPING = True 111 | # preclude rois intersected with dontcare areas above the value 112 | __C.TRAIN.DONTCARE_AREA_INTERSECTION_HI = 0.5 113 | __C.TRAIN.PRECLUDE_HARD_SAMPLES = True 114 | # Use RPN to detect objects 115 | __C.TRAIN.HAS_RPN = True 116 | # IOU >= thresh: positive example 117 | __C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7 118 | # IOU < thresh: negative example 119 | __C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3 120 | # If an anchor statisfied by positive and negative conditions set to negative 121 | __C.TRAIN.RPN_CLOBBER_POSITIVES = False 122 | # Max number of foreground examples 123 | __C.TRAIN.RPN_FG_FRACTION = 0.5 124 | # Total number of examples 125 | __C.TRAIN.RPN_BATCHSIZE = 256 126 | # NMS threshold used on RPN proposals 127 | __C.TRAIN.RPN_NMS_THRESH = 0.7 128 | # Number of top scoring boxes to keep before apply NMS to RPN proposals 129 | __C.TRAIN.RPN_PRE_NMS_TOP_N = 12000 130 | # Number of top scoring boxes to keep after applying NMS to RPN proposals 131 | __C.TRAIN.RPN_POST_NMS_TOP_N = 2000 132 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale) 133 | __C.TRAIN.RPN_MIN_SIZE = 8 134 | # Deprecated (outside weights) 135 | __C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) 136 | # Give the positive RPN examples weight of p * 1 / {num positives} 137 | # and give negatives a weight of (1 - p) 138 | # Set to -1.0 to use uniform example weighting 139 | __C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0 140 | # __C.TRAIN.RPN_POSITIVE_WEIGHT = 0.5 141 | 142 | 143 | # 144 | # Testing options 145 | # 146 | 147 | __C.TEST = edict() 148 | __C.TEST.checkpoints_path = "checkpoints/" 149 | __C.TEST.DETECT_MODE = "H"#H/O for horizontal/oriented mode 150 | # Scales to use during testing (can list multiple scales) 151 | # Each scale is the pixel size of an image's shortest side 152 | __C.TEST.SCALES = (600,) 153 | 154 | # Max pixel size of the longest side of a scaled input image 155 | __C.TEST.MAX_SIZE = 1000 156 | 157 | # Overlap threshold used for non-maximum suppression (suppress boxes with 158 | # IoU >= this threshold) 159 | __C.TEST.NMS = 0.3 160 | 161 | # Experimental: treat the (K+1) units in the cls_score layer as linear 162 | # predictors (trained, eg, with one-vs-rest SVMs). 163 | __C.TEST.SVM = False 164 | 165 | # Test using bounding-box regressors 166 | __C.TEST.BBOX_REG = True 167 | 168 | # Propose boxes 169 | __C.TEST.HAS_RPN = True 170 | 171 | # Test using these proposals 172 | __C.TEST.PROPOSAL_METHOD = 'selective_search' 173 | 174 | ## NMS threshold used on RPN proposals 175 | __C.TEST.RPN_NMS_THRESH = 0.7 176 | ## Number of top scoring boxes to keep before apply NMS to RPN proposals 177 | #__C.TEST.RPN_PRE_NMS_TOP_N = 6000 178 | __C.TEST.RPN_PRE_NMS_TOP_N = 12000 179 | ## Number of top scoring boxes to keep after applying NMS to RPN proposals 180 | __C.TEST.RPN_POST_NMS_TOP_N = 1000 181 | #__C.TEST.RPN_POST_NMS_TOP_N = 2000 182 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale) 183 | __C.TEST.RPN_MIN_SIZE = 8 184 | 185 | 186 | # 187 | # MISC 188 | # 189 | 190 | # The mapping from image coordinates to feature map coordinates might cause 191 | # some boxes that are distinct in image space to become identical in feature 192 | # coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor 193 | # for identifying duplicate boxes. 194 | # 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16 195 | __C.DEDUP_BOXES = 1./16. 
196 | 197 | # Pixel mean values (BGR order) as a (1, 1, 3) array 198 | # We use the same pixel mean for all networks even though it's not exactly what 199 | # they were trained with 200 | __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]]) 201 | 202 | # For reproducibility 203 | #__C.RNG_SEED = 3 204 | __C.RNG_SEED = 3 205 | 206 | # A small number that's used many times 207 | __C.EPS = 1e-14 208 | 209 | # Root directory of project 210 | __C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..')) 211 | 212 | # Data directory 213 | __C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data')) 214 | 215 | # Model directory 216 | __C.MODELS_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'models', 'pascal_voc')) 217 | 218 | # Name (or path to) the matlab executable 219 | __C.MATLAB = 'matlab' 220 | 221 | # Place outputs under an experiments directory 222 | __C.EXP_DIR = 'default' 223 | __C.LOG_DIR = 'default' 224 | 225 | # Use GPU implementation of non-maximum suppression 226 | __C.USE_GPU_NMS = True 227 | 228 | 229 | 230 | def get_output_dir(imdb, weights_filename): 231 | """Return the directory where experimental artifacts are placed. 232 | If the directory does not exist, it is created. 233 | 234 | A canonical path is built using the name from an imdb and a network 235 | (if not None). 236 | """ 237 | outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name)) 238 | if weights_filename is not None: 239 | outdir = osp.join(outdir, weights_filename) 240 | if not os.path.exists(outdir): 241 | os.makedirs(outdir) 242 | return outdir 243 | 244 | def get_log_dir(imdb): 245 | """Return the directory where experimental artifacts are placed. 246 | If the directory does not exist, it is created. 247 | A canonical path is built using the name from an imdb and a network 248 | (if not None). 249 | """ 250 | log_dir = osp.abspath(\ 251 | osp.join(__C.ROOT_DIR, 'logs', __C.LOG_DIR, imdb.name, strftime("%Y-%m-%d-%H-%M-%S", localtime()))) 252 | if not os.path.exists(log_dir): 253 | os.makedirs(log_dir) 254 | return log_dir 255 | 256 | def _merge_a_into_b(a, b): 257 | """Merge config dictionary a into config dictionary b, clobbering the 258 | options in b whenever they are also specified in a. 259 | """ 260 | if type(a) is not edict: 261 | return 262 | 263 | for k, v in a.items(): 264 | # a must specify keys that are in b 265 | if k not in b: 266 | raise KeyError('{} is not a valid config key'.format(k)) 267 | 268 | # the types must match, too 269 | old_type = type(b[k]) 270 | if old_type is not type(v): 271 | if isinstance(b[k], np.ndarray): 272 | v = np.array(v, dtype=b[k].dtype) 273 | else: 274 | raise ValueError(('Type mismatch ({} vs. 
{}) ' 275 | 'for config key: {}').format(type(b[k]), 276 | type(v), k)) 277 | 278 | # recursively merge dicts 279 | if type(v) is edict: 280 | try: 281 | _merge_a_into_b(a[k], b[k]) 282 | except: 283 | print(('Error under config key: {}'.format(k))) 284 | raise 285 | else: 286 | b[k] = v 287 | 288 | def cfg_from_file(filename): 289 | """Load a config file and merge it into the default options.""" 290 | import yaml 291 | with open(filename, 'r') as f: 292 | yaml_cfg = edict(yaml.load(f)) 293 | 294 | _merge_a_into_b(yaml_cfg, __C) 295 | 296 | def cfg_from_list(cfg_list): 297 | """Set config keys via list (e.g., from command line).""" 298 | from ast import literal_eval 299 | assert len(cfg_list) % 2 == 0 300 | for k, v in zip(cfg_list[0::2], cfg_list[1::2]): 301 | key_list = k.split('.') 302 | d = __C 303 | for subkey in key_list[:-1]: 304 | assert subkey in d 305 | d = d[subkey] 306 | subkey = key_list[-1] 307 | assert subkey in d 308 | try: 309 | value = literal_eval(v) 310 | except: 311 | # handle the case when v is a string literal 312 | value = v 313 | assert type(value) == type(d[subkey]), \ 314 | 'type {} does not match original type {}'.format( 315 | type(value), type(d[subkey])) 316 | d[subkey] = value 317 | -------------------------------------------------------------------------------- /lib/fast_rcnn/nms_wrapper.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from .config import cfg 3 | pure_python_nms = False 4 | try: 5 | from lib.utils.gpu_nms import gpu_nms 6 | from ..utils.cython_nms import nms as cython_nms 7 | except ImportError: 8 | pure_python_nms = True 9 | 10 | 11 | def nms(dets, thresh): 12 | if dets.shape[0] == 0: 13 | return [] 14 | if pure_python_nms: 15 | # print("Fall back to pure python nms") 16 | return py_cpu_nms(dets, thresh) 17 | if cfg.USE_GPU_NMS: 18 | return gpu_nms(dets, thresh, device_id=cfg.GPU_ID) 19 | else: 20 | return cython_nms(dets, thresh) 21 | 22 | 23 | def py_cpu_nms(dets, thresh): 24 | x1 = dets[:, 0] 25 | y1 = dets[:, 1] 26 | x2 = dets[:, 2] 27 | y2 = dets[:, 3] 28 | scores = dets[:, 4] 29 | 30 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 31 | order = scores.argsort()[::-1] 32 | 33 | keep = [] 34 | while order.size > 0: 35 | i = order[0] 36 | keep.append(i) 37 | xx1 = np.maximum(x1[i], x1[order[1:]]) 38 | yy1 = np.maximum(y1[i], y1[order[1:]]) 39 | xx2 = np.minimum(x2[i], x2[order[1:]]) 40 | yy2 = np.minimum(y2[i], y2[order[1:]]) 41 | w = np.maximum(0.0, xx2 - xx1 + 1) 42 | h = np.maximum(0.0, yy2 - yy1 + 1) 43 | inter = w * h 44 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 45 | inds = np.where(ovr <= thresh)[0] 46 | order = order[inds + 1] 47 | return keep 48 | -------------------------------------------------------------------------------- /lib/fast_rcnn/test.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | from .config import cfg 4 | from lib.utils.blob import im_list_to_blob 5 | 6 | 7 | def _get_image_blob(im): 8 | im_orig = im.astype(np.float32, copy=True) 9 | im_orig -= cfg.PIXEL_MEANS 10 | 11 | im_shape = im_orig.shape 12 | im_size_min = np.min(im_shape[0:2]) 13 | im_size_max = np.max(im_shape[0:2]) 14 | 15 | processed_ims = [] 16 | im_scale_factors = [] 17 | 18 | for target_size in cfg.TEST.SCALES: 19 | im_scale = float(target_size) / float(im_size_min) 20 | # Prevent the biggest axis from being more than MAX_SIZE 21 | if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE: 22 | im_scale = 
float(cfg.TEST.MAX_SIZE) / float(im_size_max) 23 | im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale, 24 | interpolation=cv2.INTER_LINEAR) 25 | im_scale_factors.append(im_scale) 26 | processed_ims.append(im) 27 | 28 | # Create a blob to hold the input images 29 | blob = im_list_to_blob(processed_ims) 30 | 31 | return blob, np.array(im_scale_factors) 32 | 33 | 34 | def _get_blobs(im, rois): 35 | blobs = {'data' : None, 'rois' : None} 36 | blobs['data'], im_scale_factors = _get_image_blob(im) 37 | return blobs, im_scale_factors 38 | 39 | 40 | def test_ctpn(sess, net, im, boxes=None): 41 | blobs, im_scales = _get_blobs(im, boxes) 42 | if cfg.TEST.HAS_RPN: 43 | im_blob = blobs['data'] 44 | blobs['im_info'] = np.array( 45 | [[im_blob.shape[1], im_blob.shape[2], im_scales[0]]], 46 | dtype=np.float32) 47 | # forward pass 48 | if cfg.TEST.HAS_RPN: 49 | feed_dict = {net.data: blobs['data'], net.im_info: blobs['im_info'], net.keep_prob: 1.0} 50 | 51 | rois = sess.run([net.get_output('rois')[0]],feed_dict=feed_dict) 52 | rois=rois[0] 53 | 54 | scores = rois[:, 0] 55 | if cfg.TEST.HAS_RPN: 56 | assert len(im_scales) == 1, "Only single-image batch implemented" 57 | boxes = rois[:, 1:5] / im_scales[0] 58 | return scores,boxes 59 | -------------------------------------------------------------------------------- /lib/fast_rcnn/train.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import numpy as np 3 | import os 4 | import tensorflow as tf 5 | from lib.roi_data_layer.layer import RoIDataLayer 6 | from lib.utils.timer import Timer 7 | from lib.roi_data_layer import roidb as rdl_roidb 8 | from lib.fast_rcnn.config import cfg 9 | 10 | _DEBUG = False 11 | 12 | class SolverWrapper(object): 13 | def __init__(self, sess, network, imdb, roidb, output_dir, logdir, pretrained_model=None): 14 | """Initialize the SolverWrapper.""" 15 | self.net = network 16 | self.imdb = imdb 17 | self.roidb = roidb 18 | self.output_dir = output_dir 19 | self.pretrained_model = pretrained_model 20 | 21 | print('Computing bounding-box regression targets...') 22 | if cfg.TRAIN.BBOX_REG: 23 | self.bbox_means, self.bbox_stds = rdl_roidb.add_bbox_regression_targets(roidb) 24 | print('done') 25 | 26 | # For checkpoint 27 | self.saver = tf.train.Saver(max_to_keep=100,write_version=tf.train.SaverDef.V2) 28 | self.writer = tf.summary.FileWriter(logdir=logdir, 29 | graph=tf.get_default_graph(), 30 | flush_secs=5) 31 | 32 | def snapshot(self, sess, iter): 33 | net = self.net 34 | if cfg.TRAIN.BBOX_REG and 'bbox_pred' in net.layers and cfg.TRAIN.BBOX_NORMALIZE_TARGETS: 35 | # save original values 36 | with tf.variable_scope('bbox_pred', reuse=True): 37 | weights = tf.get_variable("weights") 38 | biases = tf.get_variable("biases") 39 | 40 | orig_0 = weights.eval() 41 | orig_1 = biases.eval() 42 | 43 | # scale and shift with bbox reg unnormalization; then save snapshot 44 | weights_shape = weights.get_shape().as_list() 45 | sess.run(weights.assign(orig_0 * np.tile(self.bbox_stds, (weights_shape[0],1)))) 46 | sess.run(biases.assign(orig_1 * self.bbox_stds + self.bbox_means)) 47 | 48 | if not os.path.exists(self.output_dir): 49 | os.makedirs(self.output_dir) 50 | 51 | infix = ('_' + cfg.TRAIN.SNAPSHOT_INFIX 52 | if cfg.TRAIN.SNAPSHOT_INFIX != '' else '') 53 | filename = (cfg.TRAIN.SNAPSHOT_PREFIX + infix + 54 | '_iter_{:d}'.format(iter+1) + '.ckpt') 55 | filename = os.path.join(self.output_dir, filename) 56 | 57 | self.saver.save(sess, filename) 58 | 
print('Wrote snapshot to: {:s}'.format(filename)) 59 | 60 | if cfg.TRAIN.BBOX_REG and 'bbox_pred' in net.layers: 61 | # restore net to original state 62 | sess.run(weights.assign(orig_0)) 63 | sess.run(biases.assign(orig_1)) 64 | 65 | def build_image_summary(self): 66 | # A simple graph for write image summary 67 | 68 | log_image_data = tf.placeholder(tf.uint8, [None, None, 3]) 69 | log_image_name = tf.placeholder(tf.string) 70 | # import tensorflow.python.ops.gen_logging_ops as logging_ops 71 | from tensorflow.python.ops import gen_logging_ops 72 | from tensorflow.python.framework import ops as _ops 73 | log_image = gen_logging_ops._image_summary(log_image_name, tf.expand_dims(log_image_data, 0), max_images=1) 74 | _ops.add_to_collection(_ops.GraphKeys.SUMMARIES, log_image) 75 | # log_image = tf.summary.image(log_image_name, tf.expand_dims(log_image_data, 0), max_outputs=1) 76 | return log_image, log_image_data, log_image_name 77 | 78 | 79 | def train_model(self, sess, max_iters, restore=False): 80 | """Network training loop.""" 81 | data_layer = get_data_layer(self.roidb, self.imdb.num_classes) 82 | total_loss,model_loss, rpn_cross_entropy, rpn_loss_box=self.net.build_loss(ohem=cfg.TRAIN.OHEM) 83 | # scalar summary 84 | tf.summary.scalar('rpn_reg_loss', rpn_loss_box) 85 | tf.summary.scalar('rpn_cls_loss', rpn_cross_entropy) 86 | tf.summary.scalar('model_loss', model_loss) 87 | tf.summary.scalar('total_loss',total_loss) 88 | summary_op = tf.summary.merge_all() 89 | 90 | log_image, log_image_data, log_image_name =\ 91 | self.build_image_summary() 92 | 93 | # optimizer 94 | lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False) 95 | if cfg.TRAIN.SOLVER == 'Adam': 96 | opt = tf.train.AdamOptimizer(cfg.TRAIN.LEARNING_RATE) 97 | elif cfg.TRAIN.SOLVER == 'RMS': 98 | opt = tf.train.RMSPropOptimizer(cfg.TRAIN.LEARNING_RATE) 99 | else: 100 | # lr = tf.Variable(0.0, trainable=False) 101 | momentum = cfg.TRAIN.MOMENTUM 102 | opt = tf.train.MomentumOptimizer(lr, momentum) 103 | 104 | global_step = tf.Variable(0, trainable=False) 105 | with_clip = True 106 | if with_clip: 107 | tvars = tf.trainable_variables() 108 | grads, norm = tf.clip_by_global_norm(tf.gradients(total_loss, tvars), 10.0) 109 | train_op = opt.apply_gradients(list(zip(grads, tvars)), global_step=global_step) 110 | else: 111 | train_op = opt.minimize(total_loss, global_step=global_step) 112 | 113 | # intialize variables 114 | sess.run(tf.global_variables_initializer()) 115 | restore_iter = 0 116 | 117 | # load vgg16 118 | if self.pretrained_model is not None and not restore: 119 | try: 120 | print(('Loading pretrained model ' 121 | 'weights from {:s}').format(self.pretrained_model)) 122 | self.net.load(self.pretrained_model, sess, True) 123 | except: 124 | raise Exception('Check your pretrained model {:s}'.format(self.pretrained_model)) 125 | 126 | # resuming a trainer 127 | if restore: 128 | try: 129 | ckpt = tf.train.get_checkpoint_state(self.output_dir) 130 | print('Restoring from {}...'.format(ckpt.model_checkpoint_path), end=' ') 131 | self.saver.restore(sess, ckpt.model_checkpoint_path) 132 | stem = os.path.splitext(os.path.basename(ckpt.model_checkpoint_path))[0] 133 | restore_iter = int(stem.split('_')[-1]) 134 | sess.run(global_step.assign(restore_iter)) 135 | print('done') 136 | except: 137 | raise 'Check your pretrained {:s}'.format(ckpt.model_checkpoint_path) 138 | 139 | last_snapshot_iter = -1 140 | timer = Timer() 141 | for iter in range(restore_iter, max_iters): 142 | timer.tic() 143 | # learning rate 144 | if iter 
!= 0 and iter % cfg.TRAIN.STEPSIZE == 0:
145 |                 sess.run(tf.assign(lr, lr.eval() * cfg.TRAIN.GAMMA))
146 |                 print(lr.eval())
147 | 
148 |             # get one batch
149 |             blobs = data_layer.forward()
150 | 
151 |             feed_dict={
152 |                 self.net.data: blobs['data'],
153 |                 self.net.im_info: blobs['im_info'],
154 |                 self.net.keep_prob: 0.5,
155 |                 self.net.gt_boxes: blobs['gt_boxes'],
156 |                 self.net.gt_ishard: blobs['gt_ishard'],
157 |                 self.net.dontcare_areas: blobs['dontcare_areas']
158 |             }
159 |             res_fetches=[]
160 |             fetch_list = [total_loss,model_loss, rpn_cross_entropy, rpn_loss_box,
161 |                           summary_op,
162 |                           train_op] + res_fetches
163 | 
164 |             total_loss_val,model_loss_val, rpn_loss_cls_val, rpn_loss_box_val, \
165 |                 summary_str, _ = sess.run(fetches=fetch_list, feed_dict=feed_dict)
166 | 
167 |             self.writer.add_summary(summary=summary_str, global_step=global_step.eval())
168 | 
169 |             _diff_time = timer.toc(average=False)
170 | 
171 | 
172 |             if (iter) % (cfg.TRAIN.DISPLAY) == 0:
173 |                 print('iter: %d / %d, total loss: %.4f, model loss: %.4f, rpn_loss_cls: %.4f, rpn_loss_box: %.4f, lr: %f'%\
174 |                     (iter, max_iters, total_loss_val,model_loss_val,rpn_loss_cls_val,rpn_loss_box_val,lr.eval()))
175 |                 print('speed: {:.3f}s / iter'.format(_diff_time))
176 | 
177 |             if (iter+1) % cfg.TRAIN.SNAPSHOT_ITERS == 0:
178 |                 last_snapshot_iter = iter
179 |                 self.snapshot(sess, iter)
180 | 
181 |         if last_snapshot_iter != iter:
182 |             self.snapshot(sess, iter)
183 | 
184 | def get_training_roidb(imdb):
185 |     """Returns a roidb (Region of Interest database) for use in training."""
186 |     if cfg.TRAIN.USE_FLIPPED:
187 |         print('Appending horizontally-flipped training examples...')
188 |         imdb.append_flipped_images()
189 |         print('done')
190 | 
191 |     print('Preparing training data...')
192 |     if cfg.TRAIN.HAS_RPN:
193 |         rdl_roidb.prepare_roidb(imdb)
194 |     else:
195 |         rdl_roidb.prepare_roidb(imdb)
196 |     print('done')
197 | 
198 |     return imdb.roidb
199 | 
200 | 
201 | def get_data_layer(roidb, num_classes):
202 |     """return a data layer."""
203 |     if cfg.TRAIN.HAS_RPN:
204 |         if cfg.IS_MULTISCALE:
205 |             # obsolete
206 |             # layer = GtDataLayer(roidb)
207 |             raise NotImplementedError("Calling caffe modules...")
208 | else: 209 | layer = RoIDataLayer(roidb, num_classes) 210 | else: 211 | layer = RoIDataLayer(roidb, num_classes) 212 | 213 | return layer 214 | 215 | 216 | 217 | def train_net(network, imdb, roidb, output_dir, log_dir, pretrained_model=None, max_iters=40000, restore=False): 218 | """Train a Fast R-CNN network.""" 219 | 220 | config = tf.ConfigProto(allow_soft_placement=True) 221 | config.gpu_options.allocator_type = 'BFC' 222 | config.gpu_options.per_process_gpu_memory_fraction = 0.75 223 | with tf.Session(config=config) as sess: 224 | sw = SolverWrapper(sess, network, imdb, roidb, output_dir, logdir= log_dir, pretrained_model=pretrained_model) 225 | print('Solving...') 226 | sw.train_model(sess, max_iters, restore=restore) 227 | print('done solving') 228 | -------------------------------------------------------------------------------- /lib/networks/VGGnet_test.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from .network import Network 3 | from lib.fast_rcnn.config import cfg 4 | 5 | 6 | class VGGnet_test(Network): 7 | def __init__(self, trainable=True): 8 | self.inputs = [] 9 | self.data = tf.placeholder(tf.float32, shape=[None, None, None, 3]) 10 | self.im_info = tf.placeholder(tf.float32, shape=[None, 3]) 11 | self.keep_prob = tf.placeholder(tf.float32) 12 | self.layers = dict({'data': self.data, 'im_info': self.im_info}) 13 | self.trainable = trainable 14 | self.setup() 15 | 16 | def setup(self): 17 | anchor_scales = cfg.ANCHOR_SCALES 18 | _feat_stride = [16, ] 19 | 20 | (self.feed('data') 21 | .conv(3, 3, 64, 1, 1, name='conv1_1') 22 | .conv(3, 3, 64, 1, 1, name='conv1_2') 23 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool1') 24 | .conv(3, 3, 128, 1, 1, name='conv2_1') 25 | .conv(3, 3, 128, 1, 1, name='conv2_2') 26 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool2') 27 | .conv(3, 3, 256, 1, 1, name='conv3_1') 28 | .conv(3, 3, 256, 1, 1, name='conv3_2') 29 | .conv(3, 3, 256, 1, 1, name='conv3_3') 30 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool3') 31 | .conv(3, 3, 512, 1, 1, name='conv4_1') 32 | .conv(3, 3, 512, 1, 1, name='conv4_2') 33 | .conv(3, 3, 512, 1, 1, name='conv4_3') 34 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool4') 35 | .conv(3, 3, 512, 1, 1, name='conv5_1') 36 | .conv(3, 3, 512, 1, 1, name='conv5_2') 37 | .conv(3, 3, 512, 1, 1, name='conv5_3')) 38 | 39 | (self.feed('conv5_3').conv(3, 3, 512, 1, 1, name='rpn_conv/3x3')) 40 | 41 | (self.feed('rpn_conv/3x3').Bilstm(512, 128, 512, name='lstm_o')) 42 | (self.feed('lstm_o').lstm_fc(512, len(anchor_scales) * 10 * 4, name='rpn_bbox_pred')) 43 | (self.feed('lstm_o').lstm_fc(512, len(anchor_scales) * 10 * 2, name='rpn_cls_score')) 44 | 45 | # shape is (1, H, W, Ax2) -> (1, H, WxA, 2) 46 | (self.feed('rpn_cls_score') 47 | .spatial_reshape_layer(2, name='rpn_cls_score_reshape') 48 | .spatial_softmax(name='rpn_cls_prob')) 49 | 50 | # shape is (1, H, WxA, 2) -> (1, H, W, Ax2) 51 | (self.feed('rpn_cls_prob') 52 | .spatial_reshape_layer(len(anchor_scales) * 10 * 2, name='rpn_cls_prob_reshape')) 53 | 54 | (self.feed('rpn_cls_prob_reshape', 'rpn_bbox_pred', 'im_info') 55 | .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois')) 56 | -------------------------------------------------------------------------------- /lib/networks/VGGnet_train.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | import tensorflow as tf 3 | from .network import Network 4 | from 
lib.fast_rcnn.config import cfg
5 | 
6 | class VGGnet_train(Network):
7 |     def __init__(self, trainable=True):
8 |         self.inputs = []
9 |         self.data = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='data')
10 |         self.im_info = tf.placeholder(tf.float32, shape=[None, 3], name='im_info')
11 |         self.gt_boxes = tf.placeholder(tf.float32, shape=[None, 5], name='gt_boxes')
12 |         self.gt_ishard = tf.placeholder(tf.int32, shape=[None], name='gt_ishard')
13 |         self.dontcare_areas = tf.placeholder(tf.float32, shape=[None, 4], name='dontcare_areas')
14 |         self.keep_prob = tf.placeholder(tf.float32)
15 |         self.layers = dict({'data':self.data, 'im_info':self.im_info, 'gt_boxes':self.gt_boxes,\
16 |                             'gt_ishard': self.gt_ishard, 'dontcare_areas': self.dontcare_areas})
17 |         self.trainable = trainable
18 |         self.setup()
19 | 
20 |     def setup(self):
21 | 
22 |         # n_classes = 21
23 |         n_classes = cfg.NCLASSES
24 |         # anchor_scales = [8, 16, 32]
25 |         anchor_scales = cfg.ANCHOR_SCALES
26 |         _feat_stride = [16, ]
27 | 
28 |         (self.feed('data')
29 |          .conv(3, 3, 64, 1, 1, name='conv1_1')
30 |          .conv(3, 3, 64, 1, 1, name='conv1_2')
31 |          .max_pool(2, 2, 2, 2, padding='VALID', name='pool1')
32 |          .conv(3, 3, 128, 1, 1, name='conv2_1')
33 |          .conv(3, 3, 128, 1, 1, name='conv2_2')
34 |          .max_pool(2, 2, 2, 2, padding='VALID', name='pool2')
35 |          .conv(3, 3, 256, 1, 1, name='conv3_1')
36 |          .conv(3, 3, 256, 1, 1, name='conv3_2')
37 |          .conv(3, 3, 256, 1, 1, name='conv3_3')
38 |          .max_pool(2, 2, 2, 2, padding='VALID', name='pool3')
39 |          .conv(3, 3, 512, 1, 1, name='conv4_1')
40 |          .conv(3, 3, 512, 1, 1, name='conv4_2')
41 |          .conv(3, 3, 512, 1, 1, name='conv4_3')
42 |          .max_pool(2, 2, 2, 2, padding='VALID', name='pool4')
43 |          .conv(3, 3, 512, 1, 1, name='conv5_1')
44 |          .conv(3, 3, 512, 1, 1, name='conv5_2')
45 |          .conv(3, 3, 512, 1, 1, name='conv5_3'))
46 |         #========= RPN ============
47 |         (self.feed('conv5_3')
48 |          .conv(3,3,512,1,1,name='rpn_conv/3x3'))
49 | 
50 |         (self.feed('rpn_conv/3x3').Bilstm(512,128,512,name='lstm_o'))
51 |         (self.feed('lstm_o').lstm_fc(512,len(anchor_scales) * 10 * 4, name='rpn_bbox_pred'))
52 |         (self.feed('lstm_o').lstm_fc(512,len(anchor_scales) * 10 * 2,name='rpn_cls_score'))
53 | 
54 |         # generating training labels on the fly
55 |         # output: rpn_labels(HxWxA, 2) rpn_bbox_targets(HxWxA, 4) rpn_bbox_inside_weights rpn_bbox_outside_weights
56 |         # label each anchor and compute its regression ground truth (also in delta form), plus the inside and outside weights
57 |         (self.feed('rpn_cls_score', 'gt_boxes', 'gt_ishard', 'dontcare_areas', 'im_info')
58 |          .anchor_target_layer(_feat_stride, anchor_scales, name = 'rpn-data' ))
59 | 
60 |         # shape is (1, H, W, Ax2) -> (1, H, WxA, 2)
61 |         # apply softmax to the scores computed above, giving per-anchor probabilities in [0, 1]
62 |         (self.feed('rpn_cls_score')
63 |          .spatial_reshape_layer(2, name = 'rpn_cls_score_reshape')
64 |          .spatial_softmax(name='rpn_cls_prob'))
65 | 
-------------------------------------------------------------------------------- /lib/networks/__init__.py: --------------------------------------------------------------------------------
1 | from .VGGnet_train import VGGnet_train
2 | from .VGGnet_test import VGGnet_test
3 | from . import factory
4 | 
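A quick shape walk-through may make the RPN head in VGGnet_train easier to read. The following sketch is an illustrative aside, not part of the repo (numpy instead of TensorFlow, a single anchor scale so A = 10, and an arbitrary 14x14 feature map); it mimics what spatial_reshape_layer and spatial_softmax do to the class scores:

import numpy as np

N, H, W, A = 1, 14, 14, 10                       # A = len(anchor_scales) * 10
rpn_cls_score = np.random.randn(N, H, W, A * 2)  # lstm_fc output, (1, H, W, Ax2)
# spatial_reshape_layer: (1, H, W, Ax2) -> (1, H, WxA, 2)
scores = rpn_cls_score.reshape(N, H, W * A, 2)
# spatial_softmax: a text/non-text probability for every anchor
e = np.exp(scores - scores.max(axis=-1, keepdims=True))
rpn_cls_prob = e / e.sum(axis=-1, keepdims=True)
assert np.allclose(rpn_cls_prob.sum(axis=-1), 1.0)
# reshaping back to (1, H, W, Ax2) is what feeds the proposal layer at test time
rpn_cls_prob_reshape = rpn_cls_prob.reshape(N, H, W, A * 2)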
-------------------------------------------------------------------------------- /lib/networks/factory.py: --------------------------------------------------------------------------------
1 | from .VGGnet_test import VGGnet_test
2 | from .VGGnet_train import VGGnet_train
3 | 
4 | def get_network(name):
5 |     """Get a network by name."""
6 |     if name.split('_')[0] == 'VGGnet':
7 |         if name.split('_')[1] == 'test':
8 |             return VGGnet_test()
9 |         elif name.split('_')[1] == 'train':
10 |             return VGGnet_train()
11 |         else:
12 |             raise KeyError('Unknown network: {}'.format(name))
13 |     else:
14 |         raise KeyError('Unknown network: {}'.format(name))
15 | 
-------------------------------------------------------------------------------- /lib/networks/network.py: --------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | import numpy as np
3 | import tensorflow as tf
4 | from lib.fast_rcnn.config import cfg
5 | from lib.rpn_msr.proposal_layer_tf import proposal_layer as proposal_layer_py
6 | from lib.rpn_msr.anchor_target_layer_tf import anchor_target_layer as anchor_target_layer_py
7 | DEFAULT_PADDING = 'SAME'
8 | 
9 | def layer(op):
10 |     def layer_decorated(self, *args, **kwargs):
11 |         # Automatically set a name if not provided.
12 |         name = kwargs.setdefault('name', self.get_unique_name(op.__name__))
13 |         # Figure out the layer inputs.
14 |         if len(self.inputs)==0:
15 |             raise RuntimeError('No input variables found for layer %s.'%name)
16 |         elif len(self.inputs)==1:
17 |             layer_input = self.inputs[0]
18 |         else:
19 |             layer_input = list(self.inputs)
20 |         # Perform the operation and get the output.
21 |         layer_output = op(self, layer_input, *args, **kwargs)
22 |         # Add to layer LUT.
23 |         self.layers[name] = layer_output
24 |         # This output is now the input for the next layer.
25 |         self.feed(layer_output)
26 |         # Return self for chained calls.
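# (Aside, not original code: the feed(layer_output) call above, plus the
# `return self` below, is what enables the fluent chains used in
# VGGnet_train/VGGnet_test -- each decorated op consumes the previous
# output, registers its own under `name` in self.layers, and hands `self`
# back for the next call in the chain.)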
27 | return self 28 | return layer_decorated 29 | 30 | class Network(object): 31 | def __init__(self, inputs, trainable=True): 32 | self.inputs = [] 33 | self.layers = dict(inputs) 34 | self.trainable = trainable 35 | self.setup() 36 | 37 | def setup(self): 38 | raise NotImplementedError('Must be subclassed.') 39 | 40 | def load(self, data_path, session, ignore_missing=False): 41 | data_dict = np.load(data_path,encoding='latin1').item() 42 | for key in data_dict: 43 | with tf.variable_scope(key, reuse=True): 44 | for subkey in data_dict[key]: 45 | try: 46 | var = tf.get_variable(subkey) 47 | session.run(var.assign(data_dict[key][subkey])) 48 | print("assign pretrain model "+subkey+ " to "+key) 49 | except ValueError: 50 | print("ignore "+key) 51 | if not ignore_missing: 52 | 53 | raise 54 | 55 | def feed(self, *args): 56 | assert len(args)!=0 57 | self.inputs = [] 58 | for layer in args: 59 | if isinstance(layer, str): 60 | try: 61 | layer = self.layers[layer] 62 | print(layer) 63 | except KeyError: 64 | print(list(self.layers.keys())) 65 | raise KeyError('Unknown layer name fed: %s'%layer) 66 | self.inputs.append(layer) 67 | return self 68 | 69 | def get_output(self, layer): 70 | try: 71 | layer = self.layers[layer] 72 | except KeyError: 73 | print(list(self.layers.keys())) 74 | raise KeyError('Unknown layer name fed: %s'%layer) 75 | return layer 76 | 77 | def get_unique_name(self, prefix): 78 | id = sum(t.startswith(prefix) for t,_ in list(self.layers.items()))+1 79 | return '%s_%d'%(prefix, id) 80 | 81 | def make_var(self, name, shape, initializer=None, trainable=True, regularizer=None): 82 | return tf.get_variable(name, shape, initializer=initializer, trainable=trainable, regularizer=regularizer) 83 | 84 | def validate_padding(self, padding): 85 | assert padding in ('SAME', 'VALID') 86 | 87 | 88 | @layer 89 | def Bilstm(self, input, d_i, d_h, d_o, name, trainable=True): 90 | img = input 91 | with tf.variable_scope(name) as scope: 92 | shape = tf.shape(img) 93 | N, H, W, C = shape[0], shape[1], shape[2], shape[3] 94 | img = tf.reshape(img, [N * H, W, C]) 95 | img.set_shape([None, None, d_i]) 96 | 97 | lstm_fw_cell = tf.contrib.rnn.LSTMCell(d_h, state_is_tuple=True) 98 | lstm_bw_cell = tf.contrib.rnn.LSTMCell(d_h, state_is_tuple=True) 99 | 100 | lstm_out, last_state = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell,lstm_bw_cell, img, dtype=tf.float32) 101 | lstm_out = tf.concat(lstm_out, axis=-1) 102 | 103 | lstm_out = tf.reshape(lstm_out, [N * H * W, 2*d_h]) 104 | 105 | init_weights = tf.truncated_normal_initializer(stddev=0.1) 106 | init_biases = tf.constant_initializer(0.0) 107 | weights = self.make_var('weights', [2*d_h, d_o], init_weights, trainable, \ 108 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)) 109 | biases = self.make_var('biases', [d_o], init_biases, trainable) 110 | outputs = tf.matmul(lstm_out, weights) + biases 111 | 112 | outputs = tf.reshape(outputs, [N, H, W, d_o]) 113 | return outputs 114 | 115 | @layer 116 | def lstm(self, input, d_i,d_h,d_o, name, trainable=True): 117 | img = input 118 | with tf.variable_scope(name) as scope: 119 | shape = tf.shape(img) 120 | N,H,W,C = shape[0], shape[1],shape[2], shape[3] 121 | img = tf.reshape(img,[N*H,W,C]) 122 | img.set_shape([None,None,d_i]) 123 | 124 | lstm_cell = tf.contrib.rnn.LSTMCell(d_h, state_is_tuple=True) 125 | initial_state = lstm_cell.zero_state(N*H, dtype=tf.float32) 126 | 127 | lstm_out, last_state = tf.nn.dynamic_rnn(lstm_cell, img, 128 | initial_state=initial_state,dtype=tf.float32) 129 | 130 | 
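# (Aside, not original code: lstm() here and Bilstm() above share one
# trick -- (N, H, W, C) is reshaped into a batch of N*H row-sequences of
# length W, the recurrence runs along W (in both directions for Bilstm),
# and the output is folded back to (N, H, W, d_o). This is how CTPN
# propagates horizontal context along each feature-map row before
# predicting per-column text proposals.)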
lstm_out = tf.reshape(lstm_out,[N*H*W,d_h]) 131 | 132 | 133 | init_weights = tf.truncated_normal_initializer(stddev=0.1) 134 | init_biases = tf.constant_initializer(0.0) 135 | weights = self.make_var('weights', [d_h, d_o], init_weights, trainable, \ 136 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)) 137 | biases = self.make_var('biases', [d_o], init_biases, trainable) 138 | outputs = tf.matmul(lstm_out, weights) + biases 139 | 140 | 141 | outputs = tf.reshape(outputs, [N,H,W,d_o]) 142 | return outputs 143 | 144 | @layer 145 | def lstm_fc(self, input, d_i, d_o, name, trainable=True): 146 | with tf.variable_scope(name) as scope: 147 | shape = tf.shape(input) 148 | N, H, W, C = shape[0], shape[1], shape[2], shape[3] 149 | input = tf.reshape(input, [N*H*W,C]) 150 | 151 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.01) 152 | init_biases = tf.constant_initializer(0.0) 153 | kernel = self.make_var('weights', [d_i, d_o], init_weights, trainable, 154 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)) 155 | biases = self.make_var('biases', [d_o], init_biases, trainable) 156 | 157 | _O = tf.matmul(input, kernel) + biases 158 | return tf.reshape(_O, [N, H, W, int(d_o)]) 159 | 160 | @layer 161 | def conv(self, input, k_h, k_w, c_o, s_h, s_w, name, biased=True,relu=True, padding=DEFAULT_PADDING, trainable=True): 162 | """ contribution by miraclebiu, and biased option""" 163 | self.validate_padding(padding) 164 | c_i = input.get_shape()[-1] 165 | convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding) 166 | with tf.variable_scope(name) as scope: 167 | 168 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.01) 169 | init_biases = tf.constant_initializer(0.0) 170 | kernel = self.make_var('weights', [k_h, k_w, c_i, c_o], init_weights, trainable, \ 171 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)) 172 | if biased: 173 | biases = self.make_var('biases', [c_o], init_biases, trainable) 174 | conv = convolve(input, kernel) 175 | if relu: 176 | bias = tf.nn.bias_add(conv, biases) 177 | return tf.nn.relu(bias, name=scope.name) 178 | return tf.nn.bias_add(conv, biases, name=scope.name) 179 | else: 180 | conv = convolve(input, kernel) 181 | if relu: 182 | return tf.nn.relu(conv, name=scope.name) 183 | return conv 184 | 185 | @layer 186 | def relu(self, input, name): 187 | return tf.nn.relu(input, name=name) 188 | 189 | @layer 190 | def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): 191 | self.validate_padding(padding) 192 | return tf.nn.max_pool(input, 193 | ksize=[1, k_h, k_w, 1], 194 | strides=[1, s_h, s_w, 1], 195 | padding=padding, 196 | name=name) 197 | 198 | @layer 199 | def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): 200 | self.validate_padding(padding) 201 | return tf.nn.avg_pool(input, 202 | ksize=[1, k_h, k_w, 1], 203 | strides=[1, s_h, s_w, 1], 204 | padding=padding, 205 | name=name) 206 | 207 | @layer 208 | def proposal_layer(self, input, _feat_stride, anchor_scales, cfg_key, name): 209 | if isinstance(input[0], tuple): 210 | input[0] = input[0][0] 211 | # input[0] shape is (1, H, W, Ax2) 212 | # rpn_rois <- (1 x H x W x A, 5) [0, x1, y1, x2, y2] 213 | with tf.variable_scope(name) as scope: 214 | blob,bbox_delta = tf.py_func(proposal_layer_py,[input[0],input[1],input[2], cfg_key, _feat_stride, anchor_scales],\ 215 | [tf.float32,tf.float32]) 216 | 217 | rpn_rois = tf.convert_to_tensor(tf.reshape(blob,[-1, 5]), name = 'rpn_rois') # shape is (1 x H x W x A, 2) 218 | 
rpn_targets = tf.convert_to_tensor(bbox_delta, name = 'rpn_targets') # shape is (1 x H x W x A, 4) 219 | self.layers['rpn_rois'] = rpn_rois 220 | self.layers['rpn_targets'] = rpn_targets 221 | 222 | return rpn_rois, rpn_targets 223 | 224 | 225 | @layer 226 | def anchor_target_layer(self, input, _feat_stride, anchor_scales, name): 227 | if isinstance(input[0], tuple): 228 | input[0] = input[0][0] 229 | 230 | with tf.variable_scope(name) as scope: 231 | # 'rpn_cls_score', 'gt_boxes', 'gt_ishard', 'dontcare_areas', 'im_info' 232 | rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights = \ 233 | tf.py_func(anchor_target_layer_py, 234 | [input[0],input[1],input[2],input[3],input[4], _feat_stride, anchor_scales], 235 | [tf.float32,tf.float32,tf.float32,tf.float32]) 236 | 237 | rpn_labels = tf.convert_to_tensor(tf.cast(rpn_labels,tf.int32), name = 'rpn_labels') # shape is (1 x H x W x A, 2) 238 | rpn_bbox_targets = tf.convert_to_tensor(rpn_bbox_targets, name = 'rpn_bbox_targets') # shape is (1 x H x W x A, 4) 239 | rpn_bbox_inside_weights = tf.convert_to_tensor(rpn_bbox_inside_weights , name = 'rpn_bbox_inside_weights') # shape is (1 x H x W x A, 4) 240 | rpn_bbox_outside_weights = tf.convert_to_tensor(rpn_bbox_outside_weights , name = 'rpn_bbox_outside_weights') # shape is (1 x H x W x A, 4) 241 | 242 | 243 | return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights 244 | 245 | @layer 246 | def reshape_layer(self, input, d, name): 247 | input_shape = tf.shape(input) 248 | if name == 'rpn_cls_prob_reshape': 249 | # 250 | # transpose: (1, AxH, W, 2) -> (1, 2, AxH, W) 251 | # reshape: (1, 2xA, H, W) 252 | # transpose: -> (1, H, W, 2xA) 253 | return tf.transpose(tf.reshape(tf.transpose(input,[0,3,1,2]), 254 | [ input_shape[0], 255 | int(d), 256 | tf.cast(tf.cast(input_shape[1],tf.float32)/tf.cast(d,tf.float32)*tf.cast(input_shape[3],tf.float32),tf.int32), 257 | input_shape[2] 258 | ]), 259 | [0,2,3,1],name=name) 260 | else: 261 | return tf.transpose(tf.reshape(tf.transpose(input,[0,3,1,2]), 262 | [ input_shape[0], 263 | int(d), 264 | tf.cast(tf.cast(input_shape[1],tf.float32)*(tf.cast(input_shape[3],tf.float32)/tf.cast(d,tf.float32)),tf.int32), 265 | input_shape[2] 266 | ]), 267 | [0,2,3,1],name=name) 268 | 269 | @layer 270 | def spatial_reshape_layer(self, input, d, name): 271 | input_shape = tf.shape(input) 272 | # transpose: (1, H, W, A x d) -> (1, H, WxA, d) 273 | return tf.reshape(input,\ 274 | [input_shape[0],\ 275 | input_shape[1], \ 276 | -1,\ 277 | int(d)]) 278 | 279 | 280 | @layer 281 | def lrn(self, input, radius, alpha, beta, name, bias=1.0): 282 | return tf.nn.local_response_normalization(input, 283 | depth_radius=radius, 284 | alpha=alpha, 285 | beta=beta, 286 | bias=bias, 287 | name=name) 288 | 289 | @layer 290 | def concat(self, inputs, axis, name): 291 | return tf.concat(concat_dim=axis, values=inputs, name=name) 292 | 293 | @layer 294 | def fc(self, input, num_out, name, relu=True, trainable=True): 295 | with tf.variable_scope(name) as scope: 296 | # only use the first input 297 | if isinstance(input, tuple): 298 | input = input[0] 299 | 300 | input_shape = input.get_shape() 301 | if input_shape.ndims == 4: 302 | dim = 1 303 | for d in input_shape[1:].as_list(): 304 | dim *= d 305 | feed_in = tf.reshape(tf.transpose(input,[0,3,1,2]), [-1, dim]) 306 | else: 307 | feed_in, dim = (input, int(input_shape[-1])) 308 | 309 | if name == 'bbox_pred': 310 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.001) 311 | init_biases = 
tf.constant_initializer(0.0) 312 | else: 313 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.01) 314 | init_biases = tf.constant_initializer(0.0) 315 | 316 | weights = self.make_var('weights', [dim, num_out], init_weights, trainable, \ 317 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)) 318 | biases = self.make_var('biases', [num_out], init_biases, trainable) 319 | 320 | op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b 321 | fc = op(feed_in, weights, biases, name=scope.name) 322 | return fc 323 | 324 | @layer 325 | def softmax(self, input, name): 326 | input_shape = tf.shape(input) 327 | if name == 'rpn_cls_prob': 328 | return tf.reshape(tf.nn.softmax(tf.reshape(input,[-1,input_shape[3]])),[-1,input_shape[1],input_shape[2],input_shape[3]],name=name) 329 | else: 330 | return tf.nn.softmax(input,name=name) 331 | 332 | @layer 333 | def spatial_softmax(self, input, name): 334 | input_shape = tf.shape(input) 335 | # d = input.get_shape()[-1] 336 | return tf.reshape(tf.nn.softmax(tf.reshape(input, [-1, input_shape[3]])), 337 | [-1, input_shape[1], input_shape[2], input_shape[3]], name=name) 338 | 339 | @layer 340 | def add(self,input,name): 341 | """contribution by miraclebiu""" 342 | return tf.add(input[0],input[1]) 343 | 344 | @layer 345 | def batch_normalization(self,input,name,relu=True,is_training=False): 346 | """contribution by miraclebiu""" 347 | if relu: 348 | temp_layer=tf.contrib.layers.batch_norm(input,scale=True,center=True,is_training=is_training,scope=name) 349 | return tf.nn.relu(temp_layer) 350 | else: 351 | return tf.contrib.layers.batch_norm(input,scale=True,center=True,is_training=is_training,scope=name) 352 | 353 | @layer 354 | def dropout(self, input, keep_prob, name): 355 | return tf.nn.dropout(input, keep_prob, name=name) 356 | 357 | def l2_regularizer(self, weight_decay=0.0005, scope=None): 358 | def regularizer(tensor): 359 | with tf.name_scope(scope, default_name='l2_regularizer', values=[tensor]): 360 | l2_weight = tf.convert_to_tensor(weight_decay, 361 | dtype=tensor.dtype.base_dtype, 362 | name='weight_decay') 363 | #return tf.mul(l2_weight, tf.nn.l2_loss(tensor), name='value') 364 | return tf.multiply(l2_weight, tf.nn.l2_loss(tensor), name='value') 365 | return regularizer 366 | 367 | def smooth_l1_dist(self, deltas, sigma2=9.0, name='smooth_l1_dist'): 368 | with tf.name_scope(name=name) as scope: 369 | deltas_abs = tf.abs(deltas) 370 | smoothL1_sign = tf.cast(tf.less(deltas_abs, 1.0/sigma2), tf.float32) 371 | return tf.square(deltas) * 0.5 * sigma2 * smoothL1_sign + \ 372 | (deltas_abs - 0.5 / sigma2) * tf.abs(smoothL1_sign - 1) 373 | 374 | 375 | 376 | def build_loss(self, ohem=False): 377 | # classification loss 378 | rpn_cls_score = tf.reshape(self.get_output('rpn_cls_score_reshape'), [-1, 2]) # shape (HxWxA, 2) 379 | rpn_label = tf.reshape(self.get_output('rpn-data')[0], [-1]) # shape (HxWxA) 380 | # ignore_label(-1) 381 | fg_keep = tf.equal(rpn_label, 1) 382 | rpn_keep = tf.where(tf.not_equal(rpn_label, -1)) 383 | rpn_cls_score = tf.gather(rpn_cls_score, rpn_keep) # shape (N, 2) 384 | rpn_label = tf.gather(rpn_label, rpn_keep) 385 | rpn_cross_entropy_n = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=rpn_label,logits=rpn_cls_score) 386 | 387 | # box loss 388 | rpn_bbox_pred = self.get_output('rpn_bbox_pred') # shape (1, H, W, Ax4) 389 | rpn_bbox_targets = self.get_output('rpn-data')[1] 390 | rpn_bbox_inside_weights = self.get_output('rpn-data')[2] 391 | rpn_bbox_outside_weights = self.get_output('rpn-data')[3] 392 | 
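# (Aside, not original code: smooth_l1_dist above with sigma2 = 9 is the
# usual piecewise smooth-L1 penalty,
#     f(x) = 0.5 * sigma2 * x^2      if |x| < 1 / sigma2
#     f(x) = |x| - 0.5 / sigma2      otherwise
# so e.g. f(0.1) = 0.5 * 9 * 0.01 = 0.045, and f(1.0) = 1 - 1/18 ~= 0.944.)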
rpn_bbox_pred = tf.gather(tf.reshape(rpn_bbox_pred, [-1, 4]), rpn_keep) # shape (N, 4) 393 | rpn_bbox_targets = tf.gather(tf.reshape(rpn_bbox_targets, [-1, 4]), rpn_keep) 394 | rpn_bbox_inside_weights = tf.gather(tf.reshape(rpn_bbox_inside_weights, [-1, 4]), rpn_keep) 395 | rpn_bbox_outside_weights = tf.gather(tf.reshape(rpn_bbox_outside_weights, [-1, 4]), rpn_keep) 396 | 397 | rpn_loss_box_n = tf.reduce_sum(rpn_bbox_outside_weights * self.smooth_l1_dist( 398 | rpn_bbox_inside_weights * (rpn_bbox_pred - rpn_bbox_targets)), reduction_indices=[1]) 399 | 400 | rpn_loss_box = tf.reduce_sum(rpn_loss_box_n) / (tf.reduce_sum(tf.cast(fg_keep, tf.float32)) + 1) 401 | rpn_cross_entropy = tf.reduce_mean(rpn_cross_entropy_n) 402 | 403 | 404 | model_loss = rpn_cross_entropy + rpn_loss_box 405 | 406 | regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) 407 | total_loss = tf.add_n(regularization_losses) + model_loss 408 | 409 | return total_loss,model_loss, rpn_cross_entropy, rpn_loss_box 410 | -------------------------------------------------------------------------------- /lib/prepare_training_data/ToVoc.py: -------------------------------------------------------------------------------- 1 | from xml.dom.minidom import Document 2 | import cv2 3 | import os 4 | import glob 5 | import shutil 6 | import numpy as np 7 | 8 | def generate_xml(name, lines, img_size, class_sets, doncateothers=True): 9 | doc = Document() 10 | 11 | def append_xml_node_attr(child, parent=None, text=None): 12 | ele = doc.createElement(child) 13 | if not text is None: 14 | text_node = doc.createTextNode(text) 15 | ele.appendChild(text_node) 16 | parent = doc if parent is None else parent 17 | parent.appendChild(ele) 18 | return ele 19 | 20 | img_name = name + '.jpg' 21 | # create header 22 | annotation = append_xml_node_attr('annotation') 23 | append_xml_node_attr('folder', parent=annotation, text='text') 24 | append_xml_node_attr('filename', parent=annotation, text=img_name) 25 | source = append_xml_node_attr('source', parent=annotation) 26 | append_xml_node_attr('database', parent=source, text='coco_text_database') 27 | append_xml_node_attr('annotation', parent=source, text='text') 28 | append_xml_node_attr('image', parent=source, text='text') 29 | append_xml_node_attr('flickrid', parent=source, text='000000') 30 | owner = append_xml_node_attr('owner', parent=annotation) 31 | append_xml_node_attr('name', parent=owner, text='ms') 32 | size = append_xml_node_attr('size', annotation) 33 | append_xml_node_attr('width', size, str(img_size[1])) 34 | append_xml_node_attr('height', size, str(img_size[0])) 35 | append_xml_node_attr('depth', size, str(img_size[2])) 36 | append_xml_node_attr('segmented', parent=annotation, text='0') 37 | 38 | # create objects 39 | objs = [] 40 | for line in lines: 41 | splitted_line = line.strip().lower().split() 42 | cls = splitted_line[0].lower() 43 | if not doncateothers and cls not in class_sets: 44 | continue 45 | cls = 'dontcare' if cls not in class_sets else cls 46 | if cls == 'dontcare': 47 | continue 48 | obj = append_xml_node_attr('object', parent=annotation) 49 | occlusion = int(0) 50 | x1, y1, x2, y2 = int(float(splitted_line[1]) + 1), int(float(splitted_line[2]) + 1), \ 51 | int(float(splitted_line[3]) + 1), int(float(splitted_line[4]) + 1) 52 | truncation = float(0) 53 | difficult = 1 if _is_hard(cls, truncation, occlusion, x1, y1, x2, y2) else 0 54 | truncted = 0 if truncation < 0.5 else 1 55 | 56 | append_xml_node_attr('name', parent=obj, text=cls) 57 | 
append_xml_node_attr('pose', parent=obj, text='none') 58 | append_xml_node_attr('truncated', parent=obj, text=str(truncted)) 59 | append_xml_node_attr('difficult', parent=obj, text=str(int(difficult))) 60 | bb = append_xml_node_attr('bndbox', parent=obj) 61 | append_xml_node_attr('xmin', parent=bb, text=str(x1)) 62 | append_xml_node_attr('ymin', parent=bb, text=str(y1)) 63 | append_xml_node_attr('xmax', parent=bb, text=str(x2)) 64 | append_xml_node_attr('ymax', parent=bb, text=str(y2)) 65 | 66 | o = {'class': cls, 'box': np.asarray([x1, y1, x2, y2], dtype=float), \ 67 | 'truncation': truncation, 'difficult': difficult, 'occlusion': occlusion} 68 | objs.append(o) 69 | 70 | return doc, objs 71 | 72 | 73 | def _is_hard(cls, truncation, occlusion, x1, y1, x2, y2): 74 | hard = False 75 | if y2 - y1 < 25 and occlusion >= 2: 76 | hard = True 77 | return hard 78 | if occlusion >= 3: 79 | hard = True 80 | return hard 81 | if truncation > 0.8: 82 | hard = True 83 | return hard 84 | return hard 85 | 86 | 87 | def build_voc_dirs(outdir): 88 | mkdir = lambda dir: os.makedirs(dir) if not os.path.exists(dir) else None 89 | mkdir(outdir) 90 | mkdir(os.path.join(outdir, 'Annotations')) 91 | mkdir(os.path.join(outdir, 'ImageSets')) 92 | mkdir(os.path.join(outdir, 'ImageSets', 'Layout')) 93 | mkdir(os.path.join(outdir, 'ImageSets', 'Main')) 94 | mkdir(os.path.join(outdir, 'ImageSets', 'Segmentation')) 95 | mkdir(os.path.join(outdir, 'JPEGImages')) 96 | mkdir(os.path.join(outdir, 'SegmentationClass')) 97 | mkdir(os.path.join(outdir, 'SegmentationObject')) 98 | return os.path.join(outdir, 'Annotations'), os.path.join(outdir, 'JPEGImages'), os.path.join(outdir, 'ImageSets', 99 | 'Main') 100 | 101 | 102 | if __name__ == '__main__': 103 | _outdir = 'TEXTVOC/VOC2007' 104 | _draw = bool(0) 105 | _dest_label_dir, _dest_img_dir, _dest_set_dir = build_voc_dirs(_outdir) 106 | _doncateothers = bool(1) 107 | for dset in ['train']: 108 | _labeldir = 'label_tmp' 109 | _imagedir = 're_image' 110 | class_sets = ('text', 'dontcare') 111 | class_sets_dict = dict((k, i) for i, k in enumerate(class_sets)) 112 | allclasses = {} 113 | fs = [open(os.path.join(_dest_set_dir, cls + '_' + dset + '.txt'), 'w') for cls in class_sets] 114 | ftrain = open(os.path.join(_dest_set_dir, dset + '.txt'), 'w') 115 | 116 | files = glob.glob(os.path.join(_labeldir, '*.txt')) 117 | files.sort() 118 | for file in files: 119 | path, basename = os.path.split(file) 120 | stem, ext = os.path.splitext(basename) 121 | with open(file, 'r') as f: 122 | lines = f.readlines() 123 | img_file = os.path.join(_imagedir, stem + '.jpg') 124 | 125 | print(img_file) 126 | img = cv2.imread(img_file) 127 | img_size = img.shape 128 | 129 | doc, objs = generate_xml(stem, lines, img_size, class_sets=class_sets, doncateothers=_doncateothers) 130 | 131 | cv2.imwrite(os.path.join(_dest_img_dir, stem + '.jpg'), img) 132 | xmlfile = os.path.join(_dest_label_dir, stem + '.xml') 133 | with open(xmlfile, 'w') as f: 134 | f.write(doc.toprettyxml(indent=' ')) 135 | 136 | ftrain.writelines(stem + '\n') 137 | 138 | cls_in_image = set([o['class'] for o in objs]) 139 | 140 | for obj in objs: 141 | cls = obj['class'] 142 | allclasses[cls] = 0 \ 143 | if not cls in list(allclasses.keys()) else allclasses[cls] + 1 144 | 145 | for cls in cls_in_image: 146 | if cls in class_sets: 147 | fs[class_sets_dict[cls]].writelines(stem + ' 1\n') 148 | for cls in class_sets: 149 | if cls not in cls_in_image: 150 | fs[class_sets_dict[cls]].writelines(stem + ' -1\n') 151 | 152 | 153 | (f.close() for f 
in fs) 154 | ftrain.close() 155 | 156 | print('~~~~~~~~~~~~~~~~~~~') 157 | print(allclasses) 158 | print('~~~~~~~~~~~~~~~~~~~') 159 | shutil.copyfile(os.path.join(_dest_set_dir, 'train.txt'), os.path.join(_dest_set_dir, 'val.txt')) 160 | shutil.copyfile(os.path.join(_dest_set_dir, 'train.txt'), os.path.join(_dest_set_dir, 'trainval.txt')) 161 | for cls in class_sets: 162 | shutil.copyfile(os.path.join(_dest_set_dir, cls + '_train.txt'), 163 | os.path.join(_dest_set_dir, cls + '_trainval.txt')) 164 | shutil.copyfile(os.path.join(_dest_set_dir, cls + '_train.txt'), 165 | os.path.join(_dest_set_dir, cls + '_val.txt')) 166 | -------------------------------------------------------------------------------- /lib/prepare_training_data/split_label.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import math 4 | import cv2 as cv 5 | 6 | path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/image' 7 | gt_path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/label' 8 | out_path = 're_image' 9 | if not os.path.exists(out_path): 10 | os.makedirs(out_path) 11 | files = os.listdir(path) 12 | files.sort() 13 | #files=files[:100] 14 | for file in files: 15 | _, basename = os.path.split(file) 16 | if basename.lower().split('.')[-1] not in ['jpg', 'png']: 17 | continue 18 | stem, ext = os.path.splitext(basename) 19 | gt_file = os.path.join(gt_path, 'gt_' + stem + '.txt') 20 | img_path = os.path.join(path, file) 21 | print(img_path) 22 | img = cv.imread(img_path) 23 | img_size = img.shape 24 | im_size_min = np.min(img_size[0:2]) 25 | im_size_max = np.max(img_size[0:2]) 26 | 27 | im_scale = float(600) / float(im_size_min) 28 | if np.round(im_scale * im_size_max) > 1200: 29 | im_scale = float(1200) / float(im_size_max) 30 | re_im = cv.resize(img, None, None, fx=im_scale, fy=im_scale, interpolation=cv.INTER_LINEAR) 31 | re_size = re_im.shape 32 | cv.imwrite(os.path.join(out_path, stem) + '.jpg', re_im) 33 | 34 | with open(gt_file, 'r') as f: 35 | lines = f.readlines() 36 | for line in lines: 37 | splitted_line = line.strip().lower().split(',') 38 | pt_x = np.zeros((4, 1)) 39 | pt_y = np.zeros((4, 1)) 40 | pt_x[0, 0] = int(float(splitted_line[0]) / img_size[1] * re_size[1]) 41 | pt_y[0, 0] = int(float(splitted_line[1]) / img_size[0] * re_size[0]) 42 | pt_x[1, 0] = int(float(splitted_line[2]) / img_size[1] * re_size[1]) 43 | pt_y[1, 0] = int(float(splitted_line[3]) / img_size[0] * re_size[0]) 44 | pt_x[2, 0] = int(float(splitted_line[4]) / img_size[1] * re_size[1]) 45 | pt_y[2, 0] = int(float(splitted_line[5]) / img_size[0] * re_size[0]) 46 | pt_x[3, 0] = int(float(splitted_line[6]) / img_size[1] * re_size[1]) 47 | pt_y[3, 0] = int(float(splitted_line[7]) / img_size[0] * re_size[0]) 48 | 49 | ind_x = np.argsort(pt_x, axis=0) 50 | pt_x = pt_x[ind_x] 51 | pt_y = pt_y[ind_x] 52 | 53 | if pt_y[0] < pt_y[1]: 54 | pt1 = (pt_x[0], pt_y[0]) 55 | pt3 = (pt_x[1], pt_y[1]) 56 | else: 57 | pt1 = (pt_x[1], pt_y[1]) 58 | pt3 = (pt_x[0], pt_y[0]) 59 | 60 | if pt_y[2] < pt_y[3]: 61 | pt2 = (pt_x[2], pt_y[2]) 62 | pt4 = (pt_x[3], pt_y[3]) 63 | else: 64 | pt2 = (pt_x[3], pt_y[3]) 65 | pt4 = (pt_x[2], pt_y[2]) 66 | 67 | xmin = int(min(pt1[0], pt2[0])) 68 | ymin = int(min(pt1[1], pt2[1])) 69 | xmax = int(max(pt2[0], pt4[0])) 70 | ymax = int(max(pt3[1], pt4[1])) 71 | 72 | if xmin < 0: 73 | xmin = 0 74 | if xmax > re_size[1] - 1: 75 | xmax = re_size[1] - 1 76 | if ymin < 0: 77 | ymin = 0 78 | if ymax > re_size[0] - 1: 79 | ymax 
= re_size[0] - 1 80 | 81 | width = xmax - xmin 82 | height = ymax - ymin 83 | 84 | # reimplement 85 | step = 16.0 86 | x_left = [] 87 | x_right = [] 88 | x_left.append(xmin) 89 | x_left_start = int(math.ceil(xmin / 16.0) * 16.0) 90 | if x_left_start == xmin: 91 | x_left_start = xmin + 16 92 | for i in np.arange(x_left_start, xmax, 16): 93 | x_left.append(i) 94 | x_left = np.array(x_left) 95 | 96 | x_right.append(x_left_start - 1) 97 | for i in range(1, len(x_left) - 1): 98 | x_right.append(x_left[i] + 15) 99 | x_right.append(xmax) 100 | x_right = np.array(x_right) 101 | 102 | idx = np.where(x_left == x_right) 103 | x_left = np.delete(x_left, idx, axis=0) 104 | x_right = np.delete(x_right, idx, axis=0) 105 | 106 | if not os.path.exists('label_tmp'): 107 | os.makedirs('label_tmp') 108 | with open(os.path.join('label_tmp', stem) + '.txt', 'a') as f: 109 | for i in range(len(x_left)): 110 | f.writelines("text\t") 111 | f.writelines(str(int(x_left[i]))) 112 | f.writelines("\t") 113 | f.writelines(str(int(ymin))) 114 | f.writelines("\t") 115 | f.writelines(str(int(x_right[i]))) 116 | f.writelines("\t") 117 | f.writelines(str(int(ymax))) 118 | f.writelines("\n") 119 | -------------------------------------------------------------------------------- /lib/roi_data_layer/__init__.py: -------------------------------------------------------------------------------- 1 | from . import roidb -------------------------------------------------------------------------------- /lib/roi_data_layer/layer.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from lib.fast_rcnn.config import cfg 3 | from lib.roi_data_layer.minibatch import get_minibatch 4 | 5 | class RoIDataLayer(object): 6 | """Fast R-CNN data layer used for training.""" 7 | 8 | def __init__(self, roidb, num_classes): 9 | """Set the roidb to be used by this layer during training.""" 10 | self._roidb = roidb 11 | self._num_classes = num_classes 12 | self._shuffle_roidb_inds() 13 | 14 | def _shuffle_roidb_inds(self): 15 | """Randomly permute the training roidb.""" 16 | self._perm = np.random.permutation(np.arange(len(self._roidb))) 17 | self._cur = 0 18 | 19 | def _get_next_minibatch_inds(self): 20 | """Return the roidb indices for the next minibatch.""" 21 | 22 | if cfg.TRAIN.HAS_RPN: 23 | if self._cur + cfg.TRAIN.IMS_PER_BATCH >= len(self._roidb): 24 | self._shuffle_roidb_inds() 25 | 26 | db_inds = self._perm[self._cur:self._cur + cfg.TRAIN.IMS_PER_BATCH] 27 | self._cur += cfg.TRAIN.IMS_PER_BATCH 28 | else: 29 | # sample images 30 | db_inds = np.zeros((cfg.TRAIN.IMS_PER_BATCH), dtype=np.int32) 31 | i = 0 32 | while (i < cfg.TRAIN.IMS_PER_BATCH): 33 | ind = self._perm[self._cur] 34 | num_objs = self._roidb[ind]['boxes'].shape[0] 35 | if num_objs != 0: 36 | db_inds[i] = ind 37 | i += 1 38 | 39 | self._cur += 1 40 | if self._cur >= len(self._roidb): 41 | self._shuffle_roidb_inds() 42 | 43 | return db_inds 44 | 45 | def _get_next_minibatch(self): 46 | """Return the blobs to be used for the next minibatch. 47 | 48 | If cfg.TRAIN.USE_PREFETCH is True, then blobs will be computed in a 49 | separate process and made available through self._blob_queue. 
50 | """ 51 | db_inds = self._get_next_minibatch_inds() 52 | minibatch_db = [self._roidb[i] for i in db_inds] 53 | return get_minibatch(minibatch_db, self._num_classes) 54 | 55 | def forward(self): 56 | """Get blobs and copy them into this layer's top blob vector.""" 57 | blobs = self._get_next_minibatch() 58 | return blobs 59 | -------------------------------------------------------------------------------- /lib/roi_data_layer/minibatch.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import numpy.random as npr 3 | import cv2 4 | import os 5 | from lib.fast_rcnn.config import cfg 6 | from lib.utils.blob import prep_im_for_blob, im_list_to_blob 7 | 8 | def get_minibatch(roidb, num_classes): 9 | """Given a roidb, construct a minibatch sampled from it.""" 10 | num_images = len(roidb) 11 | # Sample random scales to use for each image in this batch 12 | random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES), 13 | size=num_images) 14 | assert(cfg.TRAIN.BATCH_SIZE % num_images == 0), \ 15 | 'num_images ({}) must divide BATCH_SIZE ({})'. \ 16 | format(num_images, cfg.TRAIN.BATCH_SIZE) 17 | rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images 18 | fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image) 19 | 20 | # Get the input image blob, formatted for caffe 21 | im_blob, im_scales = _get_image_blob(roidb, random_scale_inds) 22 | 23 | blobs = {'data': im_blob} 24 | 25 | if cfg.TRAIN.HAS_RPN: 26 | assert len(im_scales) == 1, "Single batch only" 27 | assert len(roidb) == 1, "Single batch only" 28 | # gt boxes: (x1, y1, x2, y2, cls) 29 | gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0] 30 | gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32) 31 | gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0] 32 | gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds] 33 | blobs['gt_boxes'] = gt_boxes 34 | blobs['gt_ishard'] = roidb[0]['gt_ishard'][gt_inds] \ 35 | if 'gt_ishard' in roidb[0] else np.zeros(gt_inds.size, dtype=int) 36 | # blobs['gt_ishard'] = roidb[0]['gt_ishard'][gt_inds] 37 | blobs['dontcare_areas'] = roidb[0]['dontcare_areas'] * im_scales[0] \ 38 | if 'dontcare_areas' in roidb[0] else np.zeros([0, 4], dtype=float) 39 | blobs['im_info'] = np.array( 40 | [[im_blob.shape[1], im_blob.shape[2], im_scales[0]]], 41 | dtype=np.float32) 42 | blobs['im_name'] = os.path.basename(roidb[0]['image']) 43 | 44 | else: # not using RPN 45 | # Now, build the region of interest and label blobs 46 | rois_blob = np.zeros((0, 5), dtype=np.float32) 47 | labels_blob = np.zeros((0), dtype=np.float32) 48 | bbox_targets_blob = np.zeros((0, 4 * num_classes), dtype=np.float32) 49 | bbox_inside_blob = np.zeros(bbox_targets_blob.shape, dtype=np.float32) 50 | # all_overlaps = [] 51 | for im_i in range(num_images): 52 | labels, overlaps, im_rois, bbox_targets, bbox_inside_weights \ 53 | = _sample_rois(roidb[im_i], fg_rois_per_image, rois_per_image, 54 | num_classes) 55 | 56 | # Add to RoIs blob 57 | rois = _project_im_rois(im_rois, im_scales[im_i]) 58 | batch_ind = im_i * np.ones((rois.shape[0], 1)) 59 | rois_blob_this_image = np.hstack((batch_ind, rois)) 60 | rois_blob = np.vstack((rois_blob, rois_blob_this_image)) 61 | 62 | # Add to labels, bbox targets, and bbox loss blobs 63 | labels_blob = np.hstack((labels_blob, labels)) 64 | bbox_targets_blob = np.vstack((bbox_targets_blob, bbox_targets)) 65 | bbox_inside_blob = np.vstack((bbox_inside_blob, bbox_inside_weights)) 66 | # all_overlaps = np.hstack((all_overlaps, overlaps)) 67 | 68 | 
# For debug visualizations 69 | # _vis_minibatch(im_blob, rois_blob, labels_blob, all_overlaps) 70 | 71 | blobs['rois'] = rois_blob 72 | blobs['labels'] = labels_blob 73 | 74 | if cfg.TRAIN.BBOX_REG: 75 | blobs['bbox_targets'] = bbox_targets_blob 76 | blobs['bbox_inside_weights'] = bbox_inside_blob 77 | blobs['bbox_outside_weights'] = \ 78 | np.array(bbox_inside_blob > 0).astype(np.float32) 79 | 80 | return blobs 81 | 82 | def _sample_rois(roidb, fg_rois_per_image, rois_per_image, num_classes): 83 | """Generate a random sample of RoIs comprising foreground and background 84 | examples. 85 | """ 86 | # label = class RoI has max overlap with 87 | labels = roidb['max_classes'] 88 | overlaps = roidb['max_overlaps'] 89 | rois = roidb['boxes'] 90 | 91 | # Select foreground RoIs as those with >= FG_THRESH overlap 92 | fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0] 93 | # Guard against the case when an image has fewer than fg_rois_per_image 94 | # foreground RoIs 95 | fg_rois_per_this_image = np.minimum(fg_rois_per_image, fg_inds.size) 96 | # Sample foreground regions without replacement 97 | if fg_inds.size > 0: 98 | fg_inds = npr.choice( 99 | fg_inds, size=fg_rois_per_this_image, replace=False) 100 | 101 | # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI) 102 | bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) & 103 | (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0] 104 | # Compute number of background RoIs to take from this image (guarding 105 | # against there being fewer than desired) 106 | bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image 107 | bg_rois_per_this_image = np.minimum(bg_rois_per_this_image, 108 | bg_inds.size) 109 | # Sample foreground regions without replacement 110 | if bg_inds.size > 0: 111 | bg_inds = npr.choice( 112 | bg_inds, size=bg_rois_per_this_image, replace=False) 113 | 114 | # The indices that we're selecting (both fg and bg) 115 | keep_inds = np.append(fg_inds, bg_inds) 116 | # Select sampled values from various arrays: 117 | labels = labels[keep_inds] 118 | # Clamp labels for the background RoIs to 0 119 | labels[fg_rois_per_this_image:] = 0 120 | overlaps = overlaps[keep_inds] 121 | rois = rois[keep_inds] 122 | 123 | bbox_targets, bbox_inside_weights = _get_bbox_regression_labels( 124 | roidb['bbox_targets'][keep_inds, :], num_classes) 125 | 126 | return labels, overlaps, rois, bbox_targets, bbox_inside_weights 127 | 128 | def _get_image_blob(roidb, scale_inds): 129 | """Builds an input blob from the images in the roidb at the specified 130 | scales. 131 | """ 132 | num_images = len(roidb) 133 | processed_ims = [] 134 | im_scales = [] 135 | for i in range(num_images): 136 | im = cv2.imread(roidb[i]['image']) 137 | if roidb[i]['flipped']: 138 | im = im[:, ::-1, :] 139 | target_size = cfg.TRAIN.SCALES[scale_inds[i]] 140 | im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size, 141 | cfg.TRAIN.MAX_SIZE) 142 | im_scales.append(im_scale) 143 | processed_ims.append(im) 144 | 145 | # Create a blob to hold the input images 146 | blob = im_list_to_blob(processed_ims) 147 | 148 | return blob, im_scales 149 | 150 | def _project_im_rois(im_rois, im_scale_factor): 151 | """Project image RoIs into the rescaled training image.""" 152 | rois = im_rois * im_scale_factor 153 | return rois 154 | 155 | def _get_bbox_regression_labels(bbox_target_data, num_classes): 156 | """Bounding-box regression targets are stored in a compact form in the 157 | roidb. 
158 | 159 | This function expands those targets into the 4-of-4*K representation used 160 | by the network (i.e. only one class has non-zero targets). The loss weights 161 | are similarly expanded. 162 | 163 | Returns: 164 | bbox_target_data (ndarray): N x 4K blob of regression targets 165 | bbox_inside_weights (ndarray): N x 4K blob of loss weights 166 | """ 167 | clss = bbox_target_data[:, 0] 168 | bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32) 169 | bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32) 170 | inds = np.where(clss > 0)[0] 171 | for ind in inds: 172 | cls = clss[ind] 173 | start = 4 * cls 174 | end = start + 4 175 | bbox_targets[ind, start:end] = bbox_target_data[ind, 1:] 176 | bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS 177 | return bbox_targets, bbox_inside_weights 178 | 179 | def _vis_minibatch(im_blob, rois_blob, labels_blob, overlaps): 180 | """Visualize a mini-batch for debugging.""" 181 | import matplotlib.pyplot as plt 182 | for i in range(rois_blob.shape[0]): 183 | rois = rois_blob[i, :] 184 | im_ind = rois[0] 185 | roi = rois[1:] 186 | im = im_blob[im_ind, :, :, :].transpose((1, 2, 0)).copy() 187 | im += cfg.PIXEL_MEANS 188 | im = im[:, :, (2, 1, 0)] 189 | im = im.astype(np.uint8) 190 | cls = labels_blob[i] 191 | plt.imshow(im) 192 | print('class: ', cls, ' overlap: ', overlaps[i]) 193 | plt.gca().add_patch( 194 | plt.Rectangle((roi[0], roi[1]), roi[2] - roi[0], 195 | roi[3] - roi[1], fill=False, 196 | edgecolor='r', linewidth=3) 197 | ) 198 | plt.show() 199 | -------------------------------------------------------------------------------- /lib/roi_data_layer/roidb.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import PIL 3 | from lib.fast_rcnn.config import cfg 4 | from lib.fast_rcnn.bbox_transform import bbox_transform 5 | from lib.utils.bbox import bbox_overlaps 6 | 7 | def prepare_roidb(imdb): 8 | """Enrich the imdb's roidb by adding some derived quantities that 9 | are useful for training. This function precomputes the maximum 10 | overlap, taken over ground-truth boxes, between each ROI and 11 | each ground-truth box. The class with maximum overlap is also 12 | recorded. 13 | """ 14 | sizes = [PIL.Image.open(imdb.image_path_at(i)).size 15 | for i in range(imdb.num_images)] 16 | roidb = imdb.roidb 17 | for i in range(len(imdb.image_index)): 18 | roidb[i]['image'] = imdb.image_path_at(i) 19 | roidb[i]['width'] = sizes[i][0] 20 | roidb[i]['height'] = sizes[i][1] 21 | # need gt_overlaps as a dense array for argmax 22 | gt_overlaps = roidb[i]['gt_overlaps'].toarray() 23 | # max overlap with gt over classes (columns) 24 | max_overlaps = gt_overlaps.max(axis=1) 25 | # gt class that had the max overlap 26 | max_classes = gt_overlaps.argmax(axis=1) 27 | roidb[i]['max_classes'] = max_classes 28 | roidb[i]['max_overlaps'] = max_overlaps 29 | # sanity checks 30 | # max overlap of 0 => class should be zero (background) 31 | zero_inds = np.where(max_overlaps == 0)[0] 32 | assert all(max_classes[zero_inds] == 0) 33 | # max overlap > 0 => class should not be zero (must be a fg class) 34 | nonzero_inds = np.where(max_overlaps > 0)[0] 35 | assert all(max_classes[nonzero_inds] != 0) 36 | 37 | def add_bbox_regression_targets(roidb): 38 | """ 39 | Add information needed to train bounding-box regressors. 40 | For each roi find the corresponding gt box, and compute the distance. 
41 |     Then normalize the targets by subtracting the per-class mean and dividing by the per-class std.
42 |     """
43 |     assert len(roidb) > 0
44 |     assert 'max_classes' in roidb[0], 'Did you call prepare_roidb first?'
45 | 
46 |     num_images = len(roidb)
47 |     # Infer number of classes from the number of columns in gt_overlaps
48 |     num_classes = roidb[0]['gt_overlaps'].shape[1]
49 |     for im_i in range(num_images):
50 |         rois = roidb[im_i]['boxes']
51 |         max_overlaps = roidb[im_i]['max_overlaps']
52 |         max_classes = roidb[im_i]['max_classes']
53 |         roidb[im_i]['bbox_targets'] = \
54 |             _compute_targets(rois, max_overlaps, max_classes)
55 | 
56 |     if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
57 |         # Use fixed / precomputed "means" and "stds" instead of empirical values
58 |         means = np.tile(
59 |             np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (num_classes, 1))
60 |         stds = np.tile(
61 |             np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (num_classes, 1))
62 |     else:
63 |         # Compute values needed for means and stds
64 |         # var(x) = E(x^2) - E(x)^2
65 |         class_counts = np.zeros((num_classes, 1)) + cfg.EPS
66 |         sums = np.zeros((num_classes, 4))
67 |         squared_sums = np.zeros((num_classes, 4))
68 |         for im_i in range(num_images):
69 |             targets = roidb[im_i]['bbox_targets']
70 |             for cls in range(1, num_classes):
71 |                 cls_inds = np.where(targets[:, 0] == cls)[0]
72 |                 if cls_inds.size > 0:
73 |                     class_counts[cls] += cls_inds.size
74 |                     sums[cls, :] += targets[cls_inds, 1:].sum(axis=0)
75 |                     squared_sums[cls, :] += \
76 |                         (targets[cls_inds, 1:] ** 2).sum(axis=0)
77 | 
78 |         means = sums / class_counts
79 |         stds = np.sqrt(squared_sums / class_counts - means ** 2)
80 |         # a std that is too small will cause nan errors when dividing by it
81 |         assert np.min(stds) >= 0.01, \
82 |             'Boxes std is too small, std:{}'.format(stds)
83 | 
84 |     print('bbox target means:')
85 |     print(means)
86 |     print(means[1:, :].mean(axis=0)) # ignore bg class
87 |     print('bbox target stdevs:')
88 |     print(stds)
89 |     print(stds[1:, :].mean(axis=0)) # ignore bg class
90 | 
91 |     # Normalize targets
92 |     if cfg.TRAIN.BBOX_NORMALIZE_TARGETS:
93 |         print("Normalizing targets")
94 |         for im_i in range(num_images):
95 |             targets = roidb[im_i]['bbox_targets']
96 |             for cls in range(1, num_classes):
97 |                 cls_inds = np.where(targets[:, 0] == cls)[0]
98 |                 roidb[im_i]['bbox_targets'][cls_inds, 1:] -= means[cls, :]
99 |                 roidb[im_i]['bbox_targets'][cls_inds, 1:] /= stds[cls, :]
100 |     else:
101 |         print("NOT normalizing targets")
102 | 
103 |     # These values will be needed for making predictions
104 |     # (the predictions will need to be unnormalized and uncentered)
105 |     return means.ravel(), stds.ravel()
106 | 
107 | def _compute_targets(rois, overlaps, labels):
108 |     """
109 |     Compute bounding-box regression targets for an image.
110 |     For each roi, find the corresponding gt_box and compute the distance.
111 |     """
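# (Worked example, assuming the standard py-faster-rcnn bbox_transform
# parameterization: dx = (gt_ctr_x - ex_ctr_x) / ex_w, dw = log(gt_w / ex_w).
# For an ex ROI (0, 0, 15, 15) and a gt box (4, 0, 19, 15), both 16x16 with
# x-centers 8 and 12, the target is (dx, dy, dw, dh) = (0.25, 0, 0, 0).)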
111 | """ 112 | # Indices of ground-truth ROIs 113 | gt_inds = np.where(overlaps == 1)[0] 114 | if len(gt_inds) == 0: 115 | # Bail if the image has no ground-truth ROIs 116 | return np.zeros((rois.shape[0], 5), dtype=np.float32) 117 | # Indices of examples for which we try to make predictions 118 | ex_inds = np.where(overlaps >= cfg.TRAIN.BBOX_THRESH)[0] 119 | 120 | # Get IoU overlap between each ex ROI and gt ROI 121 | ex_gt_overlaps = bbox_overlaps( 122 | np.ascontiguousarray(rois[ex_inds, :], dtype=np.float), 123 | np.ascontiguousarray(rois[gt_inds, :], dtype=np.float)) 124 | 125 | # Find which gt ROI each ex ROI has max overlap with: 126 | # this will be the ex ROI's gt target 127 | gt_assignment = ex_gt_overlaps.argmax(axis=1) 128 | gt_rois = rois[gt_inds[gt_assignment], :] 129 | ex_rois = rois[ex_inds, :] 130 | 131 | targets = np.zeros((rois.shape[0], 5), dtype=np.float32) 132 | targets[ex_inds, 0] = labels[ex_inds] 133 | targets[ex_inds, 1:] = bbox_transform(ex_rois, gt_rois) 134 | return targets 135 | -------------------------------------------------------------------------------- /lib/rpn_msr/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/lib/rpn_msr/__init__.py -------------------------------------------------------------------------------- /lib/rpn_msr/anchor_target_layer_tf.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | import numpy as np 3 | import numpy.random as npr 4 | from .generate_anchors import generate_anchors 5 | from lib.utils.bbox import bbox_overlaps, bbox_intersections 6 | from lib.fast_rcnn.config import cfg 7 | from lib.fast_rcnn.bbox_transform import bbox_transform 8 | 9 | DEBUG = False 10 | def anchor_target_layer(rpn_cls_score, gt_boxes, gt_ishard, dontcare_areas, im_info, _feat_stride = [16,], anchor_scales = [16,]): 11 | """ 12 | Assign anchors to ground-truth targets. Produces anchor classification 13 | labels and bounding-box regression targets. 14 | Parameters 15 | ---------- 16 | rpn_cls_score: (1, H, W, Ax2) bg/fg scores of previous conv layer 17 | gt_boxes: (G, 5) vstack of [x1, y1, x2, y2, class] 18 | gt_ishard: (G, 1), 1 or 0 indicates difficult or not 19 | dontcare_areas: (D, 4), some areas may contains small objs but no labelling. 
D may be 0 20 | im_info: a list of [image_height, image_width, scale_ratios] 21 | _feat_stride: the downsampling ratio of feature map to the original input image 22 | anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16]) 23 | ---------- 24 | Returns 25 | ---------- 26 | rpn_labels : (HxWxA, 1), for each anchor, 0 denotes bg, 1 fg, -1 dontcare 27 | rpn_bbox_targets: (HxWxA, 4), distances of the anchors to the gt_boxes(may contains some transform) 28 | that are the regression objectives 29 | rpn_bbox_inside_weights: (HxWxA, 4) weights of each boxes, mainly accepts hyper param in cfg 30 | rpn_bbox_outside_weights: (HxWxA, 4) used to balance the fg/bg, 31 | beacuse the numbers of bgs and fgs mays significiantly different 32 | """ 33 | _anchors = generate_anchors(scales=np.array(anchor_scales))#生成基本的anchor,一共9个 34 | _num_anchors = _anchors.shape[0]#9个anchor 35 | 36 | if DEBUG: 37 | print('anchors:') 38 | print(_anchors) 39 | print('anchor shapes:') 40 | print(np.hstack(( 41 | _anchors[:, 2::4] - _anchors[:, 0::4], 42 | _anchors[:, 3::4] - _anchors[:, 1::4], 43 | ))) 44 | _counts = cfg.EPS 45 | _sums = np.zeros((1, 4)) 46 | _squared_sums = np.zeros((1, 4)) 47 | _fg_sum = 0 48 | _bg_sum = 0 49 | _count = 0 50 | 51 | # allow boxes to sit over the edge by a small amount 52 | _allowed_border = 0 53 | # map of shape (..., H, W) 54 | #height, width = rpn_cls_score.shape[1:3] 55 | 56 | im_info = im_info[0]#图像的高宽及通道数 57 | 58 | #在feature-map上定位anchor,并加上delta,得到在实际图像中anchor的真实坐标 59 | # Algorithm: 60 | # for each (H, W) location i 61 | # generate 9 anchor boxes centered on cell i 62 | # apply predicted bbox deltas at cell i to each of the 9 anchors 63 | # filter out-of-image anchors 64 | # measure GT overlap 65 | 66 | assert rpn_cls_score.shape[0] == 1, \ 67 | 'Only single item batches are supported' 68 | 69 | # map of shape (..., H, W) 70 | height, width = rpn_cls_score.shape[1:3]#feature-map的高宽 71 | 72 | if DEBUG: 73 | print('AnchorTargetLayer: height', height, 'width', width) 74 | print('') 75 | print('im_size: ({}, {})'.format(im_info[0], im_info[1])) 76 | print('scale: {}'.format(im_info[2])) 77 | print('height, width: ({}, {})'.format(height, width)) 78 | print('rpn: gt_boxes.shape', gt_boxes.shape) 79 | print('rpn: gt_boxes', gt_boxes) 80 | 81 | # 1. 
Generate proposals from bbox deltas and shifted anchors 82 | shift_x = np.arange(0, width) * _feat_stride 83 | shift_y = np.arange(0, height) * _feat_stride 84 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) # in W H order 85 | # K is H x W 86 | shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), 87 | shift_x.ravel(), shift_y.ravel())).transpose()#生成feature-map和真实image上anchor之间的偏移量 88 | # add A anchors (1, A, 4) to 89 | # cell K shifts (K, 1, 4) to get 90 | # shift anchors (K, A, 4) 91 | # reshape to (K*A, 4) shifted anchors 92 | A = _num_anchors#9个anchor 93 | K = shifts.shape[0]#50*37,feature-map的宽乘高的大小 94 | all_anchors = (_anchors.reshape((1, A, 4)) + 95 | shifts.reshape((1, K, 4)).transpose((1, 0, 2)))#相当于复制宽高的维度,然后相加 96 | all_anchors = all_anchors.reshape((K * A, 4)) 97 | total_anchors = int(K * A) 98 | 99 | # only keep anchors inside the image 100 | #仅保留那些还在图像内部的anchor,超出图像的都删掉 101 | inds_inside = np.where( 102 | (all_anchors[:, 0] >= -_allowed_border) & 103 | (all_anchors[:, 1] >= -_allowed_border) & 104 | (all_anchors[:, 2] < im_info[1] + _allowed_border) & # width 105 | (all_anchors[:, 3] < im_info[0] + _allowed_border) # height 106 | )[0] 107 | 108 | if DEBUG: 109 | print('total_anchors', total_anchors) 110 | print('inds_inside', len(inds_inside)) 111 | 112 | # keep only inside anchors 113 | anchors = all_anchors[inds_inside, :]#保留那些在图像内的anchor 114 | if DEBUG: 115 | print('anchors.shape', anchors.shape) 116 | 117 | #至此,anchor准备好了 118 | #-------------------------------------------------------------- 119 | # label: 1 is positive, 0 is negative, -1 is dont care 120 | # (A) 121 | labels = np.empty((len(inds_inside), ), dtype=np.float32) 122 | labels.fill(-1)#初始化label,均为-1 123 | 124 | # overlaps between the anchors and the gt boxes 125 | # overlaps (ex, gt), shape is A x G 126 | #计算anchor和gt-box的overlap,用来给anchor上标签 127 | overlaps = bbox_overlaps( 128 | np.ascontiguousarray(anchors, dtype=np.float), 129 | np.ascontiguousarray(gt_boxes, dtype=np.float))#假设anchors有x个,gt_boxes有y个,返回的是一个(x,y)的数组 130 | # 存放每一个anchor和每一个gtbox之间的overlap 131 | argmax_overlaps = overlaps.argmax(axis=1) # (A)#找到和每一个gtbox,overlap最大的那个anchor 132 | max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps] 133 | gt_argmax_overlaps = overlaps.argmax(axis=0) # G#找到每个位置上9个anchor中与gtbox,overlap最大的那个 134 | gt_max_overlaps = overlaps[gt_argmax_overlaps, 135 | np.arange(overlaps.shape[1])] 136 | gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0] 137 | 138 | if not cfg.TRAIN.RPN_CLOBBER_POSITIVES: 139 | # assign bg labels first so that positive labels can clobber them 140 | labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0#先给背景上标签,小于0.3overlap的 141 | 142 | # fg label: for each gt, anchor with highest overlap 143 | labels[gt_argmax_overlaps] = 1#每个位置上的9个anchor中overlap最大的认为是前景 144 | # fg label: above threshold IOU 145 | labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1#overlap大于0.7的认为是前景 146 | 147 | if cfg.TRAIN.RPN_CLOBBER_POSITIVES: 148 | # assign bg labels last so that negative labels can clobber positives 149 | labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0 150 | 151 | # preclude dontcare areas 152 | if dontcare_areas is not None and dontcare_areas.shape[0] > 0:#这里我们暂时不考虑有doncare_area的存在 153 | # intersec shape is D x A 154 | intersecs = bbox_intersections( 155 | np.ascontiguousarray(dontcare_areas, dtype=np.float), # D x 4 156 | np.ascontiguousarray(anchors, dtype=np.float) # A x 4 157 | ) 158 | intersecs_ = intersecs.sum(axis=0) # A x 1 159 | labels[intersecs_ > 
cfg.TRAIN.DONTCARE_AREA_INTERSECTION_HI] = -1 160 | 161 | #这里我们暂时不考虑难样本的问题 162 | # preclude hard samples that are highly occlusioned, truncated or difficult to see 163 | if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0: 164 | assert gt_ishard.shape[0] == gt_boxes.shape[0] 165 | gt_ishard = gt_ishard.astype(int) 166 | gt_hardboxes = gt_boxes[gt_ishard == 1, :] 167 | if gt_hardboxes.shape[0] > 0: 168 | # H x A 169 | hard_overlaps = bbox_overlaps( 170 | np.ascontiguousarray(gt_hardboxes, dtype=np.float), # H x 4 171 | np.ascontiguousarray(anchors, dtype=np.float)) # A x 4 172 | hard_max_overlaps = hard_overlaps.max(axis=0) # (A) 173 | labels[hard_max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = -1 174 | max_intersec_label_inds = hard_overlaps.argmax(axis=1) # H x 1 175 | labels[max_intersec_label_inds] = -1 # 176 | 177 | # subsample positive labels if we have too many 178 | #对正样本进行采样,如果正样本的数量太多的话 179 | # 限制正样本的数量不超过128个 180 | #TODO 这个后期可能还需要修改,毕竟如果使用的是字符的片段,那个正样本的数量是很多的。 181 | num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE) 182 | fg_inds = np.where(labels == 1)[0] 183 | if len(fg_inds) > num_fg: 184 | disable_inds = npr.choice( 185 | fg_inds, size=(len(fg_inds) - num_fg), replace=False)#随机去除掉一些正样本 186 | labels[disable_inds] = -1#变为-1 187 | 188 | # subsample negative labels if we have too many 189 | #对负样本进行采样,如果负样本的数量太多的话 190 | # 正负样本总数是256,限制正样本数目最多128, 191 | # 如果正样本数量小于128,差的那些就用负样本补上,凑齐256个样本 192 | num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1) 193 | bg_inds = np.where(labels == 0)[0] 194 | if len(bg_inds) > num_bg: 195 | disable_inds = npr.choice( 196 | bg_inds, size=(len(bg_inds) - num_bg), replace=False) 197 | labels[disable_inds] = -1 198 | #print "was %s inds, disabling %s, now %s inds" % ( 199 | #len(bg_inds), len(disable_inds), np.sum(labels == 0)) 200 | 201 | # 至此, 上好标签,开始计算rpn-box的真值 202 | #-------------------------------------------------------------- 203 | bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32) 204 | bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])#根据anchor和gtbox计算得真值(anchor和gtbox之间的偏差) 205 | 206 | 207 | bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32) 208 | bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)#内部权重,前景就给1,其他是0 209 | 210 | bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32) 211 | if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:#暂时使用uniform 权重,也就是正样本是1,负样本是0 212 | # uniform weighting of examples (given non-uniform sampling) 213 | num_examples = np.sum(labels >= 0) + 1 214 | # positive_weights = np.ones((1, 4)) * 1.0 / num_examples 215 | # negative_weights = np.ones((1, 4)) * 1.0 / num_examples 216 | positive_weights = np.ones((1, 4)) 217 | negative_weights = np.zeros((1, 4)) 218 | else: 219 | assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) & 220 | (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1)) 221 | positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT / 222 | (np.sum(labels == 1)) + 1) 223 | negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) / 224 | (np.sum(labels == 0)) + 1) 225 | bbox_outside_weights[labels == 1, :] = positive_weights#外部权重,前景是1,背景是0 226 | bbox_outside_weights[labels == 0, :] = negative_weights 227 | 228 | if DEBUG: 229 | _sums += bbox_targets[labels == 1, :].sum(axis=0) 230 | _squared_sums += (bbox_targets[labels == 1, :] ** 2).sum(axis=0) 231 | _counts += np.sum(labels == 1) 232 | means = _sums / _counts 233 | stds = np.sqrt(_squared_sums / _counts - means ** 2) 234 | print('means:') 
235 | print(means) 236 | print('stdevs:') 237 | print(stds) 238 | 239 | # map up to original set of anchors 240 | # 一开始是将超出图像范围的anchor直接丢掉的,现在在加回来 241 | labels = _unmap(labels, total_anchors, inds_inside, fill=-1)#这些anchor的label是-1,也即dontcare 242 | bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)#这些anchor的真值是0,也即没有值 243 | bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)#内部权重以0填充 244 | bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)#外部权重以0填充 245 | 246 | if DEBUG: 247 | print('rpn: max max_overlap', np.max(max_overlaps)) 248 | print('rpn: num_positive', np.sum(labels == 1)) 249 | print('rpn: num_negative', np.sum(labels == 0)) 250 | _fg_sum += np.sum(labels == 1) 251 | _bg_sum += np.sum(labels == 0) 252 | _count += 1 253 | print('rpn: num_positive avg', _fg_sum / _count) 254 | print('rpn: num_negative avg', _bg_sum / _count) 255 | 256 | # labels 257 | labels = labels.reshape((1, height, width, A))#reshap一下label 258 | rpn_labels = labels 259 | 260 | # bbox_targets 261 | bbox_targets = bbox_targets \ 262 | .reshape((1, height, width, A * 4))#reshape 263 | 264 | rpn_bbox_targets = bbox_targets 265 | # bbox_inside_weights 266 | bbox_inside_weights = bbox_inside_weights \ 267 | .reshape((1, height, width, A * 4)) 268 | 269 | rpn_bbox_inside_weights = bbox_inside_weights 270 | 271 | # bbox_outside_weights 272 | bbox_outside_weights = bbox_outside_weights \ 273 | .reshape((1, height, width, A * 4)) 274 | rpn_bbox_outside_weights = bbox_outside_weights 275 | 276 | return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights 277 | 278 | 279 | 280 | def _unmap(data, count, inds, fill=0): 281 | """ Unmap a subset of item (data) back to the original set of items (of 282 | size count) """ 283 | if len(data.shape) == 1: 284 | ret = np.empty((count, ), dtype=np.float32) 285 | ret.fill(fill) 286 | ret[inds] = data 287 | else: 288 | ret = np.empty((count, ) + data.shape[1:], dtype=np.float32) 289 | ret.fill(fill) 290 | ret[inds, :] = data 291 | return ret 292 | 293 | 294 | def _compute_targets(ex_rois, gt_rois): 295 | """Compute bounding-box regression targets for an image.""" 296 | 297 | assert ex_rois.shape[0] == gt_rois.shape[0] 298 | assert ex_rois.shape[1] == 4 299 | assert gt_rois.shape[1] == 5 300 | 301 | return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False) 302 | -------------------------------------------------------------------------------- /lib/rpn_msr/generate_anchors.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def generate_basic_anchors(sizes, base_size=16): 4 | base_anchor = np.array([0, 0, base_size - 1, base_size - 1], np.int32) 5 | anchors = np.zeros((len(sizes), 4), np.int32) 6 | index = 0 7 | for h, w in sizes: 8 | anchors[index] = scale_anchor(base_anchor, h, w) 9 | index += 1 10 | return anchors 11 | 12 | 13 | def scale_anchor(anchor, h, w): 14 | x_ctr = (anchor[0] + anchor[2]) * 0.5 15 | y_ctr = (anchor[1] + anchor[3]) * 0.5 16 | scaled_anchor = anchor.copy() 17 | scaled_anchor[0] = x_ctr - w / 2 # xmin 18 | scaled_anchor[2] = x_ctr + w / 2 # xmax 19 | scaled_anchor[1] = y_ctr - h / 2 # ymin 20 | scaled_anchor[3] = y_ctr + h / 2 # ymax 21 | return scaled_anchor 22 | 23 | 24 | def generate_anchors(base_size=16, ratios=[0.5, 1, 2], 25 | scales=2**np.arange(3, 6)): 26 | heights = [11, 16, 23, 33, 48, 68, 97, 139, 198, 283] 27 | widths = [16] 28 | sizes = [] 29 | for h in 
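The shift-and-broadcast step above is easiest to see on a toy feature map. A self-contained sketch with toy anchors and sizes (not the real CTPN values):

```python
import numpy as np

# Minimal sketch of the shift/broadcast step in anchor_target_layer, using a
# 2x3 feature map and 2 toy base anchors; shapes mirror (K, A, 4) -> (K*A, 4).
feat_stride = 16
height, width = 2, 3
base_anchors = np.array([[0, 0, 15, 15], [0, -8, 15, 23]])  # toy anchors, A=2

shift_x, shift_y = np.meshgrid(np.arange(width) * feat_stride,
                               np.arange(height) * feat_stride)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()  # (K, 4)

A, K = base_anchors.shape[0], shifts.shape[0]
all_anchors = (base_anchors.reshape(1, A, 4) +
               shifts.reshape(1, K, 4).transpose(1, 0, 2)).reshape(K * A, 4)
print(all_anchors.shape)  # (12, 4): one set of A anchors per feature map cell
```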
-------------------------------------------------------------------------------- /lib/rpn_msr/generate_anchors.py: --------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | def generate_basic_anchors(sizes, base_size=16):
4 | base_anchor = np.array([0, 0, base_size - 1, base_size - 1], np.int32)
5 | anchors = np.zeros((len(sizes), 4), np.int32)
6 | index = 0
7 | for h, w in sizes:
8 | anchors[index] = scale_anchor(base_anchor, h, w)
9 | index += 1
10 | return anchors
11 | 
12 | 
13 | def scale_anchor(anchor, h, w):
14 | x_ctr = (anchor[0] + anchor[2]) * 0.5
15 | y_ctr = (anchor[1] + anchor[3]) * 0.5
16 | scaled_anchor = anchor.copy()
17 | scaled_anchor[0] = x_ctr - w / 2 # xmin
18 | scaled_anchor[2] = x_ctr + w / 2 # xmax
19 | scaled_anchor[1] = y_ctr - h / 2 # ymin
20 | scaled_anchor[3] = y_ctr + h / 2 # ymax
21 | return scaled_anchor
22 | 
23 | 
24 | def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
25 | scales=2**np.arange(3, 6)):
26 | heights = [11, 16, 23, 33, 48, 68, 97, 139, 198, 283] # fixed CTPN anchor heights; the ratios/scales arguments are unused here
27 | widths = [16]
28 | sizes = []
29 | for h in heights:
30 | for w in widths:
31 | sizes.append((h, w))
32 | return generate_basic_anchors(sizes)
33 | 
34 | if __name__ == '__main__':
35 | import time
36 | t = time.time()
37 | a = generate_anchors()
38 | print(time.time() - t)
39 | print(a)
40 | from IPython import embed; embed()
41 | 
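Run from the repository root, the generator above can be inspected directly. All CTPN anchors share a 16px width and differ only in height (note the integer rounding in scale_anchor, so the heights come out close to, not exactly, the list above):

```python
# Sketch, assuming it is run from the repository root so `lib` is importable.
from lib.rpn_msr.generate_anchors import generate_anchors

anchors = generate_anchors()
print(anchors.shape)                       # (10, 4): one anchor per entry in heights
print(anchors[:, 2] - anchors[:, 0] + 1)   # widths: all 16
print(anchors[:, 3] - anchors[:, 1] + 1)   # heights: roughly 11 ... 283
```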
-------------------------------------------------------------------------------- /lib/rpn_msr/proposal_layer_tf.py: --------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | import numpy as np
3 | from .generate_anchors import generate_anchors
4 | from lib.fast_rcnn.config import cfg
5 | from lib.fast_rcnn.bbox_transform import bbox_transform_inv, clip_boxes
6 | from lib.fast_rcnn.nms_wrapper import nms
7 | 
8 | 
9 | DEBUG = False
10 | """
11 | Outputs object detection proposals by applying estimated bounding-box
12 | transformations to a set of regular boxes (called "anchors").
13 | """
14 | def proposal_layer(rpn_cls_prob_reshape, rpn_bbox_pred, im_info, cfg_key, _feat_stride = [16,], anchor_scales = [16,]):
15 | """
16 | Parameters
17 | ----------
18 | rpn_cls_prob_reshape: (1 , H , W , Ax2) outputs of RPN, prob of bg or fg
19 | NOTICE: the old version is ordered by (1, H, W, 2, A) !!!!
20 | rpn_bbox_pred: (1 , H , W , Ax4), rgs boxes output of RPN
21 | im_info: a list of [image_height, image_width, scale_ratios]
22 | cfg_key: 'TRAIN' or 'TEST'
23 | _feat_stride: the downsampling ratio of feature map to the original input image
24 | anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16])
25 | ----------
26 | Returns
27 | ----------
28 | rpn_rois : (1 x H x W x A, 5) e.g. [0, x1, y1, x2, y2]
29 | 
30 | # Algorithm:
31 | #
32 | # for each (H, W) location i
33 | # generate A anchor boxes centered on cell i
34 | # apply predicted bbox deltas at cell i to each of the A anchors
35 | # clip predicted boxes to image
36 | # remove predicted boxes with either height or width < threshold
37 | # sort all (proposal, score) pairs by score from highest to lowest
38 | # take top pre_nms_topN proposals before NMS
39 | # apply NMS with threshold 0.7 to remaining proposals
40 | # take after_nms_topN proposals after NMS
41 | # return the top proposals (-> RoIs top, scores top)
42 | #layer_params = yaml.load(self.param_str_)
43 | 
44 | """
45 | # cfg_key=cfg_key.decode('ascii')
46 | _anchors = generate_anchors(scales=np.array(anchor_scales))# generate the base anchors
47 | _num_anchors = _anchors.shape[0]# number of base anchors (A)
48 | 
49 | im_info = im_info[0]# original image height, width and scale ratio
50 | 
51 | assert rpn_cls_prob_reshape.shape[0] == 1, \
52 | 'Only single item batches are supported'
53 | 
54 | pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N# 12000: at most this many candidate boxes are kept before NMS
55 | post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N# 2000: at most this many boxes are kept after NMS
56 | nms_thresh = cfg[cfg_key].RPN_NMS_THRESH# NMS threshold, 0.7
57 | min_size = cfg[cfg_key].RPN_MIN_SIZE# minimum proposal size, currently 16: both height and width must be larger
58 | #TODO revisit this minimum size later, maybe change it to 8?
59 | 
60 | height, width = rpn_cls_prob_reshape.shape[1:3]# height and width of the feature map
61 | 
62 | # the first set of _num_anchors channels are bg probs
63 | # the second set are the fg probs, which we want
64 | # (1, H, W, A)
65 | scores = np.reshape(np.reshape(rpn_cls_prob_reshape, [1, height, width, _num_anchors, 2])[:,:,:,:,1],
66 | [1, height, width, _num_anchors])
67 | # keep only the object scores; the non-object ones are not needed
68 | # and reshape to 1*H*W*A
69 | 
70 | bbox_deltas = rpn_bbox_pred# the predictions are relative offsets that still have to be converted to real image coordinates
71 | #im_info = bottom[2].data[0, :]
72 | 
73 | if DEBUG:
74 | print('im_size: ({}, {})'.format(im_info[0], im_info[1]))
75 | print('scale: {}'.format(im_info[2]))
76 | 
77 | # 1. Generate proposals from bbox deltas and shifted anchors
78 | if DEBUG:
79 | print('score map size: {}'.format(scores.shape))
80 | 
81 | # Enumerate all shifts
82 | # as in anchor_target_layer_tf, generate the anchor shifts to obtain all anchors over the whole image
83 | shift_x = np.arange(0, width) * _feat_stride
84 | shift_y = np.arange(0, height) * _feat_stride
85 | shift_x, shift_y = np.meshgrid(shift_x, shift_y)
86 | shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
87 | shift_x.ravel(), shift_y.ravel())).transpose()
88 | 
89 | # Enumerate all shifted anchors:
90 | #
91 | # add A anchors (1, A, 4) to
92 | # cell K shifts (K, 1, 4) to get
93 | # shift anchors (K, A, 4)
94 | # reshape to (K*A, 4) shifted anchors
95 | A = _num_anchors
96 | K = shifts.shape[0]
97 | anchors = _anchors.reshape((1, A, 4)) + \
98 | shifts.reshape((1, K, 4)).transpose((1, 0, 2))
99 | anchors = anchors.reshape((K * A, 4))# all anchors over the whole image
100 | 
101 | # Transpose and reshape predicted bbox transformations to get them
102 | # into the same order as the anchors:
103 | # bbox deltas will be (1, 4 * A, H, W) format
104 | # transpose to (1, H, W, 4 * A)
105 | # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
106 | # in slowest to fastest order
107 | bbox_deltas = bbox_deltas.reshape((-1, 4)) #(HxWxA, 4)
108 | 
109 | # Same story for the scores:
110 | scores = scores.reshape((-1, 1))
111 | 
112 | # Convert anchors into proposals via bbox transformations
113 | proposals = bbox_transform_inv(anchors, bbox_deltas)# inverse transform: recover the real box coordinates on the image
114 | 
115 | # 2. clip predicted boxes to image
116 | proposals = clip_boxes(proposals, im_info[:2])# trim all proposals so they stay within the image boundary
117 | 
118 | # 3. remove predicted boxes with either height or width < threshold
119 | # (NOTE: convert min_size to input image scale stored in im_info[2])
120 | keep = _filter_boxes(proposals, min_size * im_info[2])# remove proposals smaller than the minimum size
121 | proposals = proposals[keep, :]# keep the remaining proposals
122 | scores = scores[keep]
123 | bbox_deltas=bbox_deltas[keep,:]
124 | 
125 | 
126 | # # remove irregular boxes, too fat too tall
127 | # keep = _filter_irregular_boxes(proposals)
128 | # proposals = proposals[keep, :]
129 | # scores = scores[keep]
130 | 
131 | # 4. sort all (proposal, score) pairs by score from highest to lowest
132 | # 5. take top pre_nms_topN (e.g. 6000)
133 | order = scores.ravel().argsort()[::-1]# sort by score in descending order
134 | if pre_nms_topN > 0: # keep at most pre_nms_topN (12000) proposals for NMS
135 | order = order[:pre_nms_topN]
136 | proposals = proposals[order, :]
137 | scores = scores[order]
138 | bbox_deltas=bbox_deltas[order,:]
139 | 
140 | 
141 | # 6. apply nms (e.g. threshold = 0.7)
142 | # 7. take after_nms_topN (e.g. 300)
143 | # 8. return the top proposals (-> RoIs top)
144 | keep = nms(np.hstack((proposals, scores)), nms_thresh)# apply NMS, keeping at most post_nms_topN (2000) proposals
145 | if post_nms_topN > 0:
146 | keep = keep[:post_nms_topN]
147 | proposals = proposals[keep, :]
148 | scores = scores[keep]
149 | bbox_deltas=bbox_deltas[keep,:]
150 | 
151 | 
152 | # Output rois blob
153 | # Our RPN implementation only supports a single input image, so all
154 | # batch inds are 0
155 | blob = np.hstack((scores.astype(np.float32, copy=False), proposals.astype(np.float32, copy=False)))
156 | 
157 | return blob,bbox_deltas
158 | 
159 | 
160 | def _filter_boxes(boxes, min_size):
161 | """Remove all boxes with any side smaller than min_size."""
162 | ws = boxes[:, 2] - boxes[:, 0] + 1
163 | hs = boxes[:, 3] - boxes[:, 1] + 1
164 | keep = np.where((ws >= min_size) & (hs >= min_size))[0]
165 | return keep
166 | 
167 | def _filter_irregular_boxes(boxes, min_ratio = 0.2, max_ratio = 5):
168 | """Remove all boxes whose aspect ratio falls outside [min_ratio, max_ratio]."""
169 | ws = boxes[:, 2] - boxes[:, 0] + 1
170 | hs = boxes[:, 3] - boxes[:, 1] + 1
171 | rs = ws / hs
172 | keep = np.where((rs <= max_ratio) & (rs >= min_ratio))[0]
173 | return keep
174 | 
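The clip / min-size / top-N stages of proposal_layer in miniature, on hand-written boxes (NMS is omitted here so the sketch stays dependency-free):

```python
import numpy as np

# Toy walk-through of the filtering pipeline: clip -> min-size filter ->
# score sort -> top-N. All boxes and sizes below are made up.
proposals = np.array([[ -5.,  3., 40., 30.],     # sticks out of the image
                      [ 10., 10., 20., 14.],     # too small (height < 16)
                      [  0.,  0., 99., 49.]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.6], dtype=np.float32)

h, w = 50, 100                                           # image size
proposals[:, 0::2] = proposals[:, 0::2].clip(0, w - 1)   # clip x coordinates
proposals[:, 1::2] = proposals[:, 1::2].clip(0, h - 1)   # clip y coordinates

ws = proposals[:, 2] - proposals[:, 0] + 1
hs = proposals[:, 3] - proposals[:, 1] + 1
keep = np.where((ws >= 16) & (hs >= 16))[0]   # _filter_boxes with min_size=16
proposals, scores = proposals[keep], scores[keep]

order = scores.argsort()[::-1][:2000]         # sort by score, take pre-NMS top-N
print(proposals[order])                       # the small middle box is gone
```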
(widths>(TextLineCfg.TEXT_PROPOSALS_WIDTH*TextLineCfg.MIN_NUM_PROPOSALS)))[0] -------------------------------------------------------------------------------- /lib/text_connector/other.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def threshold(coords, min_, max_): 5 | return np.maximum(np.minimum(coords, max_), min_) 6 | 7 | def clip_boxes(boxes, im_shape): 8 | """ 9 | Clip boxes to image boundaries. 10 | """ 11 | boxes[:, 0::2]=threshold(boxes[:, 0::2], 0, im_shape[1]-1) 12 | boxes[:, 1::2]=threshold(boxes[:, 1::2], 0, im_shape[0]-1) 13 | return boxes 14 | 15 | 16 | class Graph: 17 | def __init__(self, graph): 18 | self.graph=graph 19 | 20 | def sub_graphs_connected(self): 21 | sub_graphs=[] 22 | for index in range(self.graph.shape[0]): 23 | if not self.graph[:, index].any() and self.graph[index, :].any(): 24 | v=index 25 | sub_graphs.append([v]) 26 | while self.graph[v, :].any(): 27 | v=np.where(self.graph[v, :])[0][0] 28 | sub_graphs[-1].append(v) 29 | return sub_graphs 30 | 31 | -------------------------------------------------------------------------------- /lib/text_connector/text_connect_cfg.py: -------------------------------------------------------------------------------- 1 | class Config: 2 | SCALE=600 3 | MAX_SCALE=1200 4 | TEXT_PROPOSALS_WIDTH=16 5 | MIN_NUM_PROPOSALS = 2 6 | MIN_RATIO=0.5 7 | LINE_MIN_SCORE=0.9 8 | MAX_HORIZONTAL_GAP=50 9 | TEXT_PROPOSALS_MIN_SCORE=0.7 10 | TEXT_PROPOSALS_NMS_THRESH=0.2 11 | MIN_V_OVERLAPS=0.7 12 | MIN_SIZE_SIM=0.7 13 | 14 | 15 | -------------------------------------------------------------------------------- /lib/text_connector/text_proposal_connector.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from .other import clip_boxes 3 | from .text_proposal_graph_builder import TextProposalGraphBuilder 4 | 5 | class TextProposalConnector: 6 | def __init__(self): 7 | self.graph_builder=TextProposalGraphBuilder() 8 | 9 | def group_text_proposals(self, text_proposals, scores, im_size): 10 | graph=self.graph_builder.build_graph(text_proposals, scores, im_size) 11 | return graph.sub_graphs_connected() 12 | 13 | def fit_y(self, X, Y, x1, x2): 14 | len(X)!=0 15 | # if X only include one point, the function will get line y=Y[0] 16 | if np.sum(X==X[0])==len(X): 17 | return Y[0], Y[0] 18 | p=np.poly1d(np.polyfit(X, Y, 1)) 19 | return p(x1), p(x2) 20 | 21 | def get_text_lines(self, text_proposals, scores, im_size): 22 | # tp=text proposal 23 | tp_groups=self.group_text_proposals(text_proposals, scores, im_size) 24 | text_lines=np.zeros((len(tp_groups), 5), np.float32) 25 | 26 | for index, tp_indices in enumerate(tp_groups): 27 | text_line_boxes=text_proposals[list(tp_indices)] 28 | 29 | x0=np.min(text_line_boxes[:, 0]) 30 | x1=np.max(text_line_boxes[:, 2]) 31 | 32 | offset=(text_line_boxes[0, 2]-text_line_boxes[0, 0])*0.5 33 | 34 | lt_y, rt_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 1], x0+offset, x1-offset) 35 | lb_y, rb_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 3], x0+offset, x1-offset) 36 | 37 | # the score of a text line is the average score of the scores 38 | # of all text proposals contained in the text line 39 | score=scores[list(tp_indices)].sum()/float(len(tp_indices)) 40 | 41 | text_lines[index, 0]=x0 42 | text_lines[index, 1]=min(lt_y, rt_y) 43 | text_lines[index, 2]=x1 44 | text_lines[index, 3]=max(lb_y, rb_y) 45 | text_lines[index, 4]=score 46 | 47 | text_lines=clip_boxes(text_lines, 
-------------------------------------------------------------------------------- /lib/text_connector/text_proposal_connector.py: --------------------------------------------------------------------------------
1 | import numpy as np
2 | from .other import clip_boxes
3 | from .text_proposal_graph_builder import TextProposalGraphBuilder
4 | 
5 | class TextProposalConnector:
6 | def __init__(self):
7 | self.graph_builder=TextProposalGraphBuilder()
8 | 
9 | def group_text_proposals(self, text_proposals, scores, im_size):
10 | graph=self.graph_builder.build_graph(text_proposals, scores, im_size)
11 | return graph.sub_graphs_connected()
12 | 
13 | def fit_y(self, X, Y, x1, x2):
14 | assert len(X) != 0
15 | # if X only includes one point, the function returns the line y=Y[0]
16 | if np.sum(X==X[0])==len(X):
17 | return Y[0], Y[0]
18 | p=np.poly1d(np.polyfit(X, Y, 1))
19 | return p(x1), p(x2)
20 | 
21 | def get_text_lines(self, text_proposals, scores, im_size):
22 | # tp=text proposal
23 | tp_groups=self.group_text_proposals(text_proposals, scores, im_size)
24 | text_lines=np.zeros((len(tp_groups), 5), np.float32)
25 | 
26 | for index, tp_indices in enumerate(tp_groups):
27 | text_line_boxes=text_proposals[list(tp_indices)]
28 | 
29 | x0=np.min(text_line_boxes[:, 0])
30 | x1=np.max(text_line_boxes[:, 2])
31 | 
32 | offset=(text_line_boxes[0, 2]-text_line_boxes[0, 0])*0.5
33 | 
34 | lt_y, rt_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 1], x0+offset, x1-offset)
35 | lb_y, rb_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 3], x0+offset, x1-offset)
36 | 
37 | # the score of a text line is the average score of the scores
38 | # of all text proposals contained in the text line
39 | score=scores[list(tp_indices)].sum()/float(len(tp_indices))
40 | 
41 | text_lines[index, 0]=x0
42 | text_lines[index, 1]=min(lt_y, rt_y)
43 | text_lines[index, 2]=x1
44 | text_lines[index, 3]=max(lb_y, rb_y)
45 | text_lines[index, 4]=score
46 | 
47 | text_lines=clip_boxes(text_lines, im_size)
48 | 
49 | text_recs = np.zeros((len(text_lines), 9), np.float)
50 | index = 0
51 | for line in text_lines:
52 | xmin,ymin,xmax,ymax=line[0],line[1],line[2],line[3]
53 | text_recs[index, 0] = xmin
54 | text_recs[index, 1] = ymin
55 | text_recs[index, 2] = xmax
56 | text_recs[index, 3] = ymin
57 | text_recs[index, 4] = xmin
58 | text_recs[index, 5] = ymax
59 | text_recs[index, 6] = xmax
60 | text_recs[index, 7] = ymax
61 | text_recs[index, 8] = line[4]
62 | index = index + 1
63 | 
64 | return text_recs
65 | 
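fit_y above is ordinary least-squares line fitting; in isolation:

```python
import numpy as np

# What fit_y computes, on made-up proposal edges: fit a degree-1 polynomial
# and evaluate it at the line's left/right extremes.
X = np.array([10., 26., 42., 58.])   # left edges of the small boxes
Y = np.array([20., 21., 22., 23.])   # their top (or bottom) y coordinates

p = np.poly1d(np.polyfit(X, Y, 1))   # least-squares line y = kx + b
print(p(10.), p(58.))                # y at the leftmost/rightmost x
# Degenerate case handled in fit_y: if all X are equal, it returns Y[0] twice.
```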
-------------------------------------------------------------------------------- /lib/text_connector/text_proposal_connector_oriented.py: --------------------------------------------------------------------------------
1 | #coding:utf-8
2 | import numpy as np
3 | from .text_proposal_graph_builder import TextProposalGraphBuilder
4 | 
5 | class TextProposalConnector:
6 | """
7 | Connect text proposals into text lines
8 | """
9 | def __init__(self):
10 | self.graph_builder=TextProposalGraphBuilder()
11 | 
12 | def group_text_proposals(self, text_proposals, scores, im_size):
13 | graph=self.graph_builder.build_graph(text_proposals, scores, im_size)
14 | return graph.sub_graphs_connected()
15 | 
16 | def fit_y(self, X, Y, x1, x2):
17 | assert len(X) != 0
18 | # if X only includes one point, the function returns the line y=Y[0]
19 | if np.sum(X==X[0])==len(X):
20 | return Y[0], Y[0]
21 | p=np.poly1d(np.polyfit(X, Y, 1))
22 | return p(x1), p(x2)
23 | 
24 | def get_text_lines(self, text_proposals, scores, im_size):
25 | """
26 | text_proposals:boxes
27 | 
28 | """
29 | # tp=text proposal
30 | tp_groups=self.group_text_proposals(text_proposals, scores, im_size)# build the graph first to find out which small boxes make up each text line
31 | 
32 | text_lines=np.zeros((len(tp_groups), 8), np.float32)
33 | 
34 | for index, tp_indices in enumerate(tp_groups):
35 | text_line_boxes=text_proposals[list(tp_indices)]# all small boxes of this text line
36 | X = (text_line_boxes[:,0] + text_line_boxes[:,2]) / 2# center x, y coordinates of each small box
37 | Y = (text_line_boxes[:,1] + text_line_boxes[:,3]) / 2
38 | 
39 | z1 = np.polyfit(X,Y,1)# fit a straight line (least squares) through the box centers computed above
40 | 
41 | x0=np.min(text_line_boxes[:, 0])# smallest x coordinate of the text line
42 | x1=np.max(text_line_boxes[:, 2])# largest x coordinate of the text line
43 | 
44 | offset=(text_line_boxes[0, 2]-text_line_boxes[0, 0])*0.5# half the width of a small box
45 | 
46 | # fit a line through the top-left corners of all small boxes, then compute the y values at the leftmost and rightmost x of the text line
47 | lt_y, rt_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 1], x0+offset, x1-offset)
48 | # fit a line through the bottom-left corners of all small boxes, then compute the y values at the leftmost and rightmost x of the text line
49 | lb_y, rb_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 3], x0+offset, x1-offset)
50 | 
51 | score=scores[list(tp_indices)].sum()/float(len(tp_indices))# the text line score is the mean score of its small boxes
52 | 
53 | text_lines[index, 0]=x0
54 | text_lines[index, 1]=min(lt_y, rt_y)# smaller y coordinate of the top edge
55 | text_lines[index, 2]=x1
56 | text_lines[index, 3]=max(lb_y, rb_y)# larger y coordinate of the bottom edge
57 | text_lines[index, 4]=score# text line score
58 | text_lines[index, 5]=z1[0]# slope k and intercept b of the line fitted through the centers
59 | text_lines[index, 6]=z1[1]
60 | height = np.mean( (text_line_boxes[:,3]-text_line_boxes[:,1]) )# mean height of the small boxes
61 | text_lines[index, 7]= height + 2.5
62 | 
63 | text_recs = np.zeros((len(text_lines), 9), np.float)
64 | index = 0
65 | for line in text_lines:
66 | b1 = line[6] - line[7] / 2 # from the height and the center line, compute the intercepts of the top and bottom lines
67 | b2 = line[6] + line[7] / 2
68 | x1 = line[0]
69 | y1 = line[5] * line[0] + b1 # top left
70 | x2 = line[2]
71 | y2 = line[5] * line[2] + b1 # top right
72 | x3 = line[0]
73 | y3 = line[5] * line[0] + b2 # bottom left
74 | x4 = line[2]
75 | y4 = line[5] * line[2] + b2 # bottom right
76 | disX = x2 - x1
77 | disY = y2 - y1
78 | width = np.sqrt(disX * disX + disY * disY) # text line width
79 | 
80 | fTmp0 = y3 - y1 # text line height
81 | fTmp1 = fTmp0 * disY / width
82 | x = np.fabs(fTmp1 * disX / width) # compensate for the slope
83 | y = np.fabs(fTmp1 * disY / width)
84 | if line[5] < 0:
85 | x1 -= x
86 | y1 += y
87 | x4 += x
88 | y4 -= y
89 | else:
90 | x2 += x
91 | y2 += y
92 | x3 -= x
93 | y3 -= y
94 | text_recs[index, 0] = x1
95 | text_recs[index, 1] = y1
96 | text_recs[index, 2] = x2
97 | text_recs[index, 3] = y2
98 | text_recs[index, 4] = x3
99 | text_recs[index, 5] = y3
100 | text_recs[index, 6] = x4
101 | text_recs[index, 7] = y4
102 | text_recs[index, 8] = line[4]
103 | index = index + 1
104 | 
105 | return text_recs
106 | 
-------------------------------------------------------------------------------- /lib/text_connector/text_proposal_graph_builder.py: --------------------------------------------------------------------------------
1 | from .text_connect_cfg import Config as TextLineCfg
2 | from .other import Graph
3 | import numpy as np
4 | 
5 | 
6 | class TextProposalGraphBuilder:
7 | """
8 | Build Text proposals into a graph.
9 | """
10 | def get_successions(self, index):
11 | box=self.text_proposals[index]
12 | results=[]
13 | for left in range(int(box[0])+1, min(int(box[0])+TextLineCfg.MAX_HORIZONTAL_GAP+1, self.im_size[1])):
14 | adj_box_indices=self.boxes_table[left]
15 | for adj_box_index in adj_box_indices:
16 | if self.meet_v_iou(adj_box_index, index):
17 | results.append(adj_box_index)
18 | if len(results)!=0:
19 | return results
20 | return results
21 | 
22 | def get_precursors(self, index):
23 | box=self.text_proposals[index]
24 | results=[]
25 | for left in range(int(box[0])-1, max(int(box[0]-TextLineCfg.MAX_HORIZONTAL_GAP), 0)-1, -1):
26 | adj_box_indices=self.boxes_table[left]
27 | for adj_box_index in adj_box_indices:
28 | if self.meet_v_iou(adj_box_index, index):
29 | results.append(adj_box_index)
30 | if len(results)!=0:
31 | return results
32 | return results
33 | 
34 | def is_succession_node(self, index, succession_index):
35 | precursors=self.get_precursors(succession_index)
36 | if self.scores[index]>=np.max(self.scores[precursors]):
37 | return True
38 | return False
39 | 
40 | def meet_v_iou(self, index1, index2):
41 | def overlaps_v(index1, index2):
42 | h1=self.heights[index1]
43 | h2=self.heights[index2]
44 | y0=max(self.text_proposals[index2][1], self.text_proposals[index1][1])
45 | y1=min(self.text_proposals[index2][3], self.text_proposals[index1][3])
46 | return max(0, y1-y0+1)/min(h1, h2)
47 | 
48 | def size_similarity(index1, index2):
49 | h1=self.heights[index1]
50 | h2=self.heights[index2]
51 | return min(h1, h2)/max(h1, h2)
52 | 
53 | return overlaps_v(index1, index2)>=TextLineCfg.MIN_V_OVERLAPS and \
54 | size_similarity(index1, index2)>=TextLineCfg.MIN_SIZE_SIM
55 | 
56 | def build_graph(self, text_proposals, scores, im_size):
57 | self.text_proposals=text_proposals
58 | self.scores=scores
59 | self.im_size=im_size
60 | self.heights=text_proposals[:, 3]-text_proposals[:, 1]+1
61 | 
62 | boxes_table=[[] for _ in range(self.im_size[1])]
63 | for index, box in enumerate(text_proposals):
64 | boxes_table[int(box[0])].append(index)
65 | self.boxes_table=boxes_table
66 | 
67 | graph=np.zeros((text_proposals.shape[0], text_proposals.shape[0]), np.bool)
68 | 
69 | for index, box in enumerate(text_proposals):
70 | successions=self.get_successions(index)
71 | if len(successions)==0:
72 | continue
73 | succession_index=successions[np.argmax(scores[successions])]
74 | if self.is_succession_node(index, succession_index):
75 | # NOTE: a box can have multiple 
successions(precursors) if multiple successions(precursors) 76 | # have equal scores. 77 | graph[index, succession_index]=True 78 | return Graph(graph) 79 | -------------------------------------------------------------------------------- /lib/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from . import boxes_grid 2 | from . import blob 3 | from . import timer -------------------------------------------------------------------------------- /lib/utils/bbox.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Sergey Karayev 6 | # -------------------------------------------------------- 7 | 8 | cimport cython 9 | import numpy as np 10 | cimport numpy as np 11 | 12 | DTYPE = np.float 13 | ctypedef np.float_t DTYPE_t 14 | 15 | def bbox_overlaps( 16 | np.ndarray[DTYPE_t, ndim=2] boxes, 17 | np.ndarray[DTYPE_t, ndim=2] query_boxes): 18 | """ 19 | Parameters 20 | ---------- 21 | boxes: (N, 4) ndarray of float 22 | query_boxes: (K, 4) ndarray of float 23 | Returns 24 | ------- 25 | overlaps: (N, K) ndarray of overlap between boxes and query_boxes 26 | """ 27 | cdef unsigned int N = boxes.shape[0] 28 | cdef unsigned int K = query_boxes.shape[0] 29 | cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE) 30 | cdef DTYPE_t iw, ih, box_area 31 | cdef DTYPE_t ua 32 | cdef unsigned int k, n 33 | for k in range(K): 34 | box_area = ( 35 | (query_boxes[k, 2] - query_boxes[k, 0] + 1) * 36 | (query_boxes[k, 3] - query_boxes[k, 1] + 1) 37 | ) 38 | for n in range(N): 39 | iw = ( 40 | min(boxes[n, 2], query_boxes[k, 2]) - 41 | max(boxes[n, 0], query_boxes[k, 0]) + 1 42 | ) 43 | if iw > 0: 44 | ih = ( 45 | min(boxes[n, 3], query_boxes[k, 3]) - 46 | max(boxes[n, 1], query_boxes[k, 1]) + 1 47 | ) 48 | if ih > 0: 49 | ua = float( 50 | (boxes[n, 2] - boxes[n, 0] + 1) * 51 | (boxes[n, 3] - boxes[n, 1] + 1) + 52 | box_area - iw * ih 53 | ) 54 | overlaps[n, k] = iw * ih / ua 55 | return overlaps 56 | 57 | def bbox_intersections( 58 | np.ndarray[DTYPE_t, ndim=2] boxes, 59 | np.ndarray[DTYPE_t, ndim=2] query_boxes): 60 | """ 61 | For each query box compute the intersection ratio covered by boxes 62 | ---------- 63 | Parameters 64 | ---------- 65 | boxes: (N, 4) ndarray of float 66 | query_boxes: (K, 4) ndarray of float 67 | Returns 68 | ------- 69 | overlaps: (N, K) ndarray of intersec between boxes and query_boxes 70 | """ 71 | cdef unsigned int N = boxes.shape[0] 72 | cdef unsigned int K = query_boxes.shape[0] 73 | cdef np.ndarray[DTYPE_t, ndim=2] intersec = np.zeros((N, K), dtype=DTYPE) 74 | cdef DTYPE_t iw, ih, box_area 75 | cdef DTYPE_t ua 76 | cdef unsigned int k, n 77 | for k in range(K): 78 | box_area = ( 79 | (query_boxes[k, 2] - query_boxes[k, 0] + 1) * 80 | (query_boxes[k, 3] - query_boxes[k, 1] + 1) 81 | ) 82 | for n in range(N): 83 | iw = ( 84 | min(boxes[n, 2], query_boxes[k, 2]) - 85 | max(boxes[n, 0], query_boxes[k, 0]) + 1 86 | ) 87 | if iw > 0: 88 | ih = ( 89 | min(boxes[n, 3], query_boxes[k, 3]) - 90 | max(boxes[n, 1], query_boxes[k, 1]) + 1 91 | ) 92 | if ih > 0: 93 | intersec[n, k] = iw * ih / box_area 94 | return intersec -------------------------------------------------------------------------------- /lib/utils/blob.py: -------------------------------------------------------------------------------- 1 
| """Blob helper functions.""" 2 | import numpy as np 3 | import cv2 4 | from ..fast_rcnn.config import cfg 5 | 6 | def im_list_to_blob(ims): 7 | """Convert a list of images into a network input. 8 | 9 | Assumes images are already prepared (means subtracted, BGR order, ...). 10 | """ 11 | max_shape = np.array([im.shape for im in ims]).max(axis=0) 12 | num_images = len(ims) 13 | blob = np.zeros((num_images, max_shape[0], max_shape[1], 3), 14 | dtype=np.float32) 15 | for i in range(num_images): 16 | im = ims[i] 17 | blob[i, 0:im.shape[0], 0:im.shape[1], :] = im 18 | 19 | return blob 20 | 21 | def prep_im_for_blob(im, pixel_means, target_size, max_size): 22 | """Mean subtract and scale an image for use in a blob.""" 23 | im = im.astype(np.float32, copy=False) 24 | im -= pixel_means 25 | im_shape = im.shape 26 | im_size_min = np.min(im_shape[0:2]) 27 | im_size_max = np.max(im_shape[0:2]) 28 | im_scale = float(target_size) / float(im_size_min) 29 | # Prevent the biggest axis from being more than MAX_SIZE 30 | if np.round(im_scale * im_size_max) > max_size: 31 | im_scale = float(max_size) / float(im_size_max) 32 | if cfg.TRAIN.RANDOM_DOWNSAMPLE: 33 | r = 0.6 + np.random.rand() * 0.4 34 | im_scale *= r 35 | im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, 36 | interpolation=cv2.INTER_LINEAR) 37 | 38 | return im, im_scale 39 | -------------------------------------------------------------------------------- /lib/utils/boxes_grid.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Subcategory CNN 3 | # Copyright (c) 2015 CVGL Stanford 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Yu Xiang 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | import math 10 | # TODO: make fast_rcnn irrelevant 11 | # >>>> obsolete, because it depends on sth outside of this project 12 | from ..fast_rcnn.config import cfg 13 | # <<<< obsolete 14 | 15 | def get_boxes_grid(image_height, image_width): 16 | """ 17 | Return the boxes on image grid. 18 | calling this function when cfg.IS_MULTISCALE is True, otherwise, calling rdl_roidb.prepare_roidb(imdb) instead. 19 | """ 20 | 21 | # fixed a bug, change cfg.TRAIN.SCALES to cfg.TRAIN.SCALES_BASE 22 | # coz, here needs a ratio around 1.0, not the accutual size. 23 | # height and width of the feature map 24 | if cfg.NET_NAME == 'CaffeNet': 25 | height = np.floor((image_height * max(cfg.TRAIN.SCALES_BASE) - 1) / 4.0 + 1) 26 | height = np.floor((height - 1) / 2.0 + 1 + 0.5) 27 | height = np.floor((height - 1) / 2.0 + 1 + 0.5) 28 | 29 | width = np.floor((image_width * max(cfg.TRAIN.SCALES_BASE) - 1) / 4.0 + 1) 30 | width = np.floor((width - 1) / 2.0 + 1 + 0.5) 31 | width = np.floor((width - 1) / 2.0 + 1 + 0.5) 32 | elif cfg.NET_NAME == 'VGGnet': 33 | height = np.floor(image_height * max(cfg.TRAIN.SCALES_BASE) / 2.0 + 0.5) 34 | height = np.floor(height / 2.0 + 0.5) 35 | height = np.floor(height / 2.0 + 0.5) 36 | height = np.floor(height / 2.0 + 0.5) 37 | 38 | width = np.floor(image_width * max(cfg.TRAIN.SCALES_BASE) / 2.0 + 0.5) 39 | width = np.floor(width / 2.0 + 0.5) 40 | width = np.floor(width / 2.0 + 0.5) 41 | width = np.floor(width / 2.0 + 0.5) 42 | else: 43 | assert (1), 'The network architecture is not supported in utils.get_boxes_grid!' 
44 | 45 | # compute the grid box centers 46 | h = np.arange(height) 47 | w = np.arange(width) 48 | y, x = np.meshgrid(h, w, indexing='ij') 49 | centers = np.dstack((x, y)) 50 | centers = np.reshape(centers, (-1, 2)) 51 | num = centers.shape[0] 52 | 53 | # compute width and height of grid box 54 | area = cfg.TRAIN.KERNEL_SIZE * cfg.TRAIN.KERNEL_SIZE 55 | aspect = cfg.TRAIN.ASPECTS # height / width 56 | num_aspect = len(aspect) 57 | widths = np.zeros((1, num_aspect), dtype=np.float32) 58 | heights = np.zeros((1, num_aspect), dtype=np.float32) 59 | for i in range(num_aspect): 60 | widths[0,i] = math.sqrt(area / aspect[i]) 61 | heights[0,i] = widths[0,i] * aspect[i] 62 | 63 | # construct grid boxes 64 | centers = np.repeat(centers, num_aspect, axis=0) 65 | widths = np.tile(widths, num).transpose() 66 | heights = np.tile(heights, num).transpose() 67 | 68 | x1 = np.reshape(centers[:,0], (-1, 1)) - widths * 0.5 69 | x2 = np.reshape(centers[:,0], (-1, 1)) + widths * 0.5 70 | y1 = np.reshape(centers[:,1], (-1, 1)) - heights * 0.5 71 | y2 = np.reshape(centers[:,1], (-1, 1)) + heights * 0.5 72 | 73 | boxes_grid = np.hstack((x1, y1, x2, y2)) / cfg.TRAIN.SPATIAL_SCALE 74 | 75 | return boxes_grid, centers[:,0], centers[:,1] 76 | -------------------------------------------------------------------------------- /lib/utils/cython_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b): 12 | return a if a >= b else b 13 | 14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b): 15 | return a if a <= b else b 16 | 17 | def nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 18 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 19 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 20 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 21 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 22 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 23 | 24 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 25 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1] 26 | 27 | cdef int ndets = dets.shape[0] 28 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \ 29 | np.zeros((ndets), dtype=np.int) 30 | 31 | # nominal indices 32 | cdef int _i, _j 33 | # sorted indices 34 | cdef int i, j 35 | # temp variables for box i's (the box currently under consideration) 36 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 37 | # variables for computing overlap with box j (lower scoring box) 38 | cdef np.float32_t xx1, yy1, xx2, yy2 39 | cdef np.float32_t w, h 40 | cdef np.float32_t inter, ovr 41 | 42 | keep = [] 43 | for _i in range(ndets): 44 | i = order[_i] 45 | if suppressed[i] == 1: 46 | continue 47 | keep.append(i) 48 | ix1 = x1[i] 49 | iy1 = y1[i] 50 | ix2 = x2[i] 51 | iy2 = y2[i] 52 | iarea = areas[i] 53 | for _j in range(_i + 1, ndets): 54 | j = order[_j] 55 | if suppressed[j] == 1: 56 | continue 57 | xx1 = max(ix1, x1[j]) 58 | yy1 = max(iy1, y1[j]) 59 | xx2 = min(ix2, x2[j]) 60 | yy2 = min(iy2, y2[j]) 61 | w = max(0.0, xx2 - xx1 + 1) 62 | h = max(0.0, yy2 - yy1 + 1) 63 | inter = w * h 64 | ovr = inter / (iarea + areas[j] - 
inter) 65 | if ovr >= thresh: 66 | suppressed[j] = 1 67 | 68 | return keep 69 | 70 | def nms_new(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 71 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 72 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 73 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 74 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 75 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 76 | 77 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 78 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1] 79 | 80 | cdef int ndets = dets.shape[0] 81 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \ 82 | np.zeros((ndets), dtype=np.int) 83 | 84 | # nominal indices 85 | cdef int _i, _j 86 | # sorted indices 87 | cdef int i, j 88 | # temp variables for box i's (the box currently under consideration) 89 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 90 | # variables for computing overlap with box j (lower scoring box) 91 | cdef np.float32_t xx1, yy1, xx2, yy2 92 | cdef np.float32_t w, h 93 | cdef np.float32_t inter, ovr 94 | 95 | keep = [] 96 | for _i in range(ndets): 97 | i = order[_i] 98 | if suppressed[i] == 1: 99 | continue 100 | keep.append(i) 101 | ix1 = x1[i] 102 | iy1 = y1[i] 103 | ix2 = x2[i] 104 | iy2 = y2[i] 105 | iarea = areas[i] 106 | for _j in range(_i + 1, ndets): 107 | j = order[_j] 108 | if suppressed[j] == 1: 109 | continue 110 | xx1 = max(ix1, x1[j]) 111 | yy1 = max(iy1, y1[j]) 112 | xx2 = min(ix2, x2[j]) 113 | yy2 = min(iy2, y2[j]) 114 | w = max(0.0, xx2 - xx1 + 1) 115 | h = max(0.0, yy2 - yy1 + 1) 116 | inter = w * h 117 | ovr = inter / (iarea + areas[j] - inter) 118 | ovr1 = inter / iarea 119 | ovr2 = inter / areas[j] 120 | if ovr >= thresh or ovr1 > 0.95 or ovr2 > 0.95: 121 | suppressed[j] = 1 122 | 123 | return keep 124 | -------------------------------------------------------------------------------- /lib/utils/gpu_nms.hpp: -------------------------------------------------------------------------------- 1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 2 | int boxes_dim, float nms_overlap_thresh, int device_id); 3 | -------------------------------------------------------------------------------- /lib/utils/gpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | assert sizeof(int) == sizeof(np.int32_t) 12 | 13 | cdef extern from "gpu_nms.hpp": 14 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int) 15 | 16 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh, 17 | np.int32_t device_id=0): 18 | cdef int boxes_num = dets.shape[0] 19 | cdef int boxes_dim = dets.shape[1] 20 | cdef int num_out 21 | cdef np.ndarray[np.int32_t, ndim=1] \ 22 | keep = np.zeros(boxes_num, dtype=np.int32) 23 | cdef np.ndarray[np.float32_t, ndim=1] \ 24 | scores = dets[:, 4] 25 | cdef np.ndarray[np.int_t, ndim=1] \ 26 | order = scores.argsort()[::-1] 27 | cdef np.ndarray[np.float32_t, ndim=2] \ 28 | sorted_dets = dets[order, :] 29 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id) 30 | keep = keep[:num_out] 31 | return list(order[keep]) 32 
| -------------------------------------------------------------------------------- /lib/utils/make.sh: --------------------------------------------------------------------------------
1 | cython bbox.pyx
2 | cython cython_nms.pyx
3 | cython gpu_nms.pyx
4 | python setup.py build_ext --inplace
5 | rm -rf build
6 | 
-------------------------------------------------------------------------------- /lib/utils/nms_kernel.cu: --------------------------------------------------------------------------------
1 | // ------------------------------------------------------------------
2 | // Faster R-CNN
3 | // Copyright (c) 2015 Microsoft
4 | // Licensed under The MIT License [see fast-rcnn/LICENSE for details]
5 | // Written by Shaoqing Ren
6 | // ------------------------------------------------------------------
7 | 
8 | #include "gpu_nms.hpp"
9 | #include <vector>
10 | #include <iostream>
11 | 
12 | #define CUDA_CHECK(condition) \
13 | /* Code block avoids redefinition of cudaError_t error */ \
14 | do { \
15 | cudaError_t error = condition; \
16 | if (error != cudaSuccess) { \
17 | std::cout << cudaGetErrorString(error) << std::endl; \
18 | } \
19 | } while (0)
20 | 
21 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))
22 | int const threadsPerBlock = sizeof(unsigned long long) * 8;
23 | 
24 | __device__ inline float devIoU(float const * const a, float const * const b) {
25 | float left = max(a[0], b[0]), right = min(a[2], b[2]);
26 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]);
27 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);
28 | float interS = width * height;
29 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
30 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);
31 | return interS / (Sa + Sb - interS);
32 | }
33 | 
34 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh,
35 | const float *dev_boxes, unsigned long long *dev_mask) {
36 | const int row_start = blockIdx.y;
37 | const int col_start = blockIdx.x;
38 | 
39 | // if (row_start > col_start) return;
40 | 
41 | const int row_size =
42 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
43 | const int col_size =
44 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);
45 | 
46 | __shared__ float block_boxes[threadsPerBlock * 5];
47 | if (threadIdx.x < col_size) {
48 | block_boxes[threadIdx.x * 5 + 0] =
49 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
50 | block_boxes[threadIdx.x * 5 + 1] =
51 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
52 | block_boxes[threadIdx.x * 5 + 2] =
53 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
54 | block_boxes[threadIdx.x * 5 + 3] =
55 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
56 | block_boxes[threadIdx.x * 5 + 4] =
57 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
58 | }
59 | __syncthreads();
60 | 
61 | if (threadIdx.x < row_size) {
62 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
63 | const float *cur_box = dev_boxes + cur_box_idx * 5;
64 | int i = 0;
65 | unsigned long long t = 0;
66 | int start = 0;
67 | if (row_start == col_start) {
68 | start = threadIdx.x + 1;
69 | }
70 | for (i = start; i < col_size; i++) {
71 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
72 | t |= 1ULL << i;
73 | }
74 | }
75 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock);
76 | dev_mask[cur_box_idx * col_blocks + col_start] = t;
77 | }
78 | }
79 | 
80 | void _set_device(int device_id) {
81 | int current_device;
82 | CUDA_CHECK(cudaGetDevice(&current_device));
83 | if (current_device == device_id) {
84 | return;
85 | }
86 | // The call to cudaSetDevice must come before any calls to Get, which
87 | // may perform initialization using the GPU.
88 | CUDA_CHECK(cudaSetDevice(device_id));
89 | }
90 | 
91 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
92 | int boxes_dim, float nms_overlap_thresh, int device_id) {
93 | _set_device(device_id);
94 | 
95 | float* boxes_dev = NULL;
96 | unsigned long long* mask_dev = NULL;
97 | 
98 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock);
99 | 
100 | CUDA_CHECK(cudaMalloc(&boxes_dev,
101 | boxes_num * boxes_dim * sizeof(float)));
102 | CUDA_CHECK(cudaMemcpy(boxes_dev,
103 | boxes_host,
104 | boxes_num * boxes_dim * sizeof(float),
105 | cudaMemcpyHostToDevice));
106 | 
107 | CUDA_CHECK(cudaMalloc(&mask_dev,
108 | boxes_num * col_blocks * sizeof(unsigned long long)));
109 | 
110 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock),
111 | DIVUP(boxes_num, threadsPerBlock));
112 | dim3 threads(threadsPerBlock);
113 | nms_kernel<<<blocks, threads>>>(boxes_num,
114 | nms_overlap_thresh,
115 | boxes_dev,
116 | mask_dev);
117 | 
118 | std::vector<unsigned long long> mask_host(boxes_num * col_blocks);
119 | CUDA_CHECK(cudaMemcpy(&mask_host[0],
120 | mask_dev,
121 | sizeof(unsigned long long) * boxes_num * col_blocks,
122 | cudaMemcpyDeviceToHost));
123 | 
124 | std::vector<unsigned long long> remv(col_blocks);
125 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);
126 | 
127 | int num_to_keep = 0;
128 | for (int i = 0; i < boxes_num; i++) {
129 | int nblock = i / threadsPerBlock;
130 | int inblock = i % threadsPerBlock;
131 | 
132 | if (!(remv[nblock] & (1ULL << inblock))) {
133 | keep_out[num_to_keep++] = i;
134 | unsigned long long *p = &mask_host[0] + i * col_blocks;
135 | for (int j = nblock; j < col_blocks; j++) {
136 | remv[j] |= p[j];
137 | }
138 | }
139 | }
140 | *num_out = num_to_keep;
141 | 
142 | CUDA_CHECK(cudaFree(boxes_dev));
143 | CUDA_CHECK(cudaFree(mask_dev));
144 | }
145 | 
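cython_nms.pyx and nms_kernel.cu implement the same greedy suppression; a plain-numpy reference version, useful for sanity-checking either build:

```python
import numpy as np

# Reference sketch of the greedy NMS both builds compute; dets are made up.
def py_nms(dets, thresh):
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current top-scoring box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= thresh]   # drop boxes that overlap too much
    return keep

dets = np.array([[0, 0, 10, 10, 0.9], [1, 1, 11, 11, 0.8], [50, 50, 60, 60, 0.7]],
                dtype=np.float32)
print(py_nms(dets, 0.7))  # [0, 2]: the second box is suppressed by the first
```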
-------------------------------------------------------------------------------- /lib/utils/setup.py: --------------------------------------------------------------------------------
1 | from Cython.Build import cythonize
2 | import os
3 | from os.path import join as pjoin
4 | import numpy as np
5 | from distutils.core import setup
6 | from distutils.extension import Extension
7 | from Cython.Distutils import build_ext
8 | 
9 | def find_in_path(name, path):
10 | for dir in path.split(os.pathsep):
11 | binpath = pjoin(dir, name)
12 | if os.path.exists(binpath):
13 | return os.path.abspath(binpath)
14 | return None
15 | 
16 | def locate_cuda():
17 | # first check if the CUDAHOME env variable is in use
18 | if 'CUDAHOME' in os.environ:
19 | home = os.environ['CUDAHOME']
20 | nvcc = pjoin(home, 'bin', 'nvcc')
21 | else:
22 | # otherwise, search the PATH for NVCC
23 | default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin')
24 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path)
25 | if nvcc is None:
26 | raise EnvironmentError('The nvcc binary could not be '
27 | 'located in your $PATH. 
Either add it to your path, or set $CUDAHOME') 28 | home = os.path.dirname(os.path.dirname(nvcc)) 29 | 30 | cudaconfig = {'home':home, 'nvcc':nvcc, 31 | 'include': pjoin(home, 'include'), 32 | 'lib64': pjoin(home, 'lib64')} 33 | for k, v in cudaconfig.items(): 34 | #for k, v in cudaconfig.iteritems(): 35 | if not os.path.exists(v): 36 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v)) 37 | return cudaconfig 38 | 39 | CUDA = locate_cuda() 40 | 41 | 42 | try: 43 | numpy_include = np.get_include() 44 | except AttributeError: 45 | numpy_include = np.get_numpy_include() 46 | 47 | def customize_compiler_for_nvcc(self): 48 | self.src_extensions.append('.cu') 49 | default_compiler_so = self.compiler_so 50 | super = self._compile 51 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts): 52 | print(extra_postargs) 53 | if os.path.splitext(src)[1] == '.cu': 54 | # use the cuda for .cu files 55 | self.set_executable('compiler_so', CUDA['nvcc']) 56 | # use only a subset of the extra_postargs, which are 1-1 translated 57 | # from the extra_compile_args in the Extension class 58 | postargs = extra_postargs['nvcc'] 59 | else: 60 | postargs = extra_postargs['gcc'] 61 | 62 | super(obj, src, ext, cc_args, postargs, pp_opts) 63 | # reset the default compiler_so, which we might have changed for cuda 64 | self.compiler_so = default_compiler_so 65 | # inject our redefined _compile method into the class 66 | self._compile = _compile 67 | 68 | 69 | # run the customize_compiler 70 | class custom_build_ext(build_ext): 71 | def build_extensions(self): 72 | customize_compiler_for_nvcc(self.compiler) 73 | build_ext.build_extensions(self) 74 | 75 | ext_modules = [ 76 | Extension( 77 | "bbox", 78 | ["bbox.pyx"], 79 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 80 | include_dirs = [numpy_include] 81 | ), 82 | Extension( 83 | "cython_nms", 84 | ["cython_nms.pyx"], 85 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 86 | include_dirs = [numpy_include] 87 | ), 88 | Extension('gpu_nms', 89 | ['nms_kernel.cu', 'gpu_nms.pyx'], 90 | library_dirs=[CUDA['lib64']], 91 | libraries=['cudart'], 92 | language='c++', 93 | runtime_library_dirs=[CUDA['lib64']], 94 | extra_compile_args={'gcc': ["-Wno-unused-function"], 95 | 'nvcc': ['-arch=sm_35', 96 | '--ptxas-options=-v', 97 | '-c', 98 | '--compiler-options', 99 | "'-fPIC'"]}, 100 | include_dirs = [numpy_include, CUDA['include']] 101 | ), 102 | ] 103 | 104 | setup( 105 | ext_modules=ext_modules, 106 | cmdclass={'build_ext': custom_build_ext}, 107 | ) 108 | 109 | -------------------------------------------------------------------------------- /lib/utils/timer.py: -------------------------------------------------------------------------------- 1 | import time 2 | class Timer(object): 3 | def __init__(self): 4 | self.total_time = 0. 5 | self.calls = 0 6 | self.start_time = 0. 7 | self.diff = 0. 8 | self.average_time = 0. 
9 | 10 | def tic(self): 11 | self.start_time = time.time() 12 | 13 | def toc(self, average=True): 14 | self.diff = time.time() - self.start_time 15 | self.total_time += self.diff 16 | self.calls += 1 17 | self.average_time = self.total_time / self.calls 18 | if average: 19 | return self.average_time 20 | else: 21 | return self.diff 22 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | easydict==1.7 2 | tensorflow_gpu==1.3.0 3 | scipy==0.18.1 4 | numpy==1.11.1 5 | opencv_python==3.4.0.12 6 | Cython==0.27.3 7 | Pillow==5.0.0 8 | PyYAML==3.12 9 | --------------------------------------------------------------------------------
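A self-contained sketch of the horizontal connector in lib/text_connector, with hand-made proposals (toy values; the real pipeline feeds it RPN proposals, and the pinned numpy from requirements.txt is assumed since the code uses the old np.float/np.bool aliases):

```python
import numpy as np
from lib.text_connector.text_proposal_connector import TextProposalConnector

# Three adjacent 16px-wide proposals on one line should merge into one text line.
proposals = np.array([[10., 20., 25., 40.],
                      [26., 20., 41., 40.],
                      [42., 21., 57., 41.]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.95], dtype=np.float32)

connector = TextProposalConnector()
lines = connector.get_text_lines(proposals, scores, (600, 600))  # (im_h, im_w)
print(lines.shape)  # (1, 9): four corner points plus a score per text line
```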