├── .gitignore
├── LICENSE
├── README.md
├── ctpn
│   ├── __init__.py
│   ├── demo.py
│   ├── demo_pb.py
│   ├── generate_pb.py
│   ├── text.yml
│   └── train_net.py
├── data
│   ├── VOCdevkit2007
│   ├── demo
│   │   ├── 006.jpg
│   │   ├── 007.jpg
│   │   ├── 008.jpg
│   │   ├── 009.jpg
│   │   └── 010.png
│   ├── oriented_results
│   │   ├── 006.jpg
│   │   ├── 007.jpg
│   │   ├── 008.jpg
│   │   ├── 009.jpg
│   │   └── 010.png
│   └── results
│       ├── 006.jpg
│       ├── 007.jpg
│       ├── 008.jpg
│       ├── 009.jpg
│       ├── 010.png
│       ├── res_006.txt
│       ├── res_007.txt
│       ├── res_008.txt
│       ├── res_009.txt
│       └── res_010.txt
├── lib
│   ├── __init__.py
│   ├── datasets
│   │   ├── __init__.py
│   │   ├── factory.py
│   │   ├── imdb.py
│   │   └── pascal_voc.py
│   ├── fast_rcnn
│   │   ├── __init__.py
│   │   ├── bbox_transform.py
│   │   ├── config.py
│   │   ├── nms_wrapper.py
│   │   ├── test.py
│   │   └── train.py
│   ├── networks
│   │   ├── VGGnet_test.py
│   │   ├── VGGnet_train.py
│   │   ├── __init__.py
│   │   ├── factory.py
│   │   └── network.py
│   ├── prepare_training_data
│   │   ├── ToVoc.py
│   │   └── split_label.py
│   ├── roi_data_layer
│   │   ├── __init__.py
│   │   ├── layer.py
│   │   ├── minibatch.py
│   │   └── roidb.py
│   ├── rpn_msr
│   │   ├── __init__.py
│   │   ├── anchor_target_layer_tf.py
│   │   ├── generate_anchors.py
│   │   └── proposal_layer_tf.py
│   ├── text_connector
│   │   ├── __init__.py
│   │   ├── detectors.py
│   │   ├── other.py
│   │   ├── text_connect_cfg.py
│   │   ├── text_proposal_connector.py
│   │   ├── text_proposal_connector_oriented.py
│   │   └── text_proposal_graph_builder.py
│   └── utils
│       ├── __init__.py
│       ├── bbox.c
│       ├── bbox.pyx
│       ├── blob.py
│       ├── boxes_grid.py
│       ├── cython_nms.c
│       ├── cython_nms.pyx
│       ├── gpu_nms.c
│       ├── gpu_nms.cpp
│       ├── gpu_nms.hpp
│       ├── gpu_nms.pyx
│       ├── make.sh
│       ├── nms_kernel.cu
│       ├── setup.py
│       └── timer.py
└── requirements.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | cache/
3 | pretrain/
4 | VOCdevkit2007/
5 | logs/
6 | output/
7 | build/
8 | dist/
9 | checkpoints/
10 | .idea/
11 | *.py[cod]
12 | *.c[cod]
13 | *.so
14 | *.swp
15 | *.pb
16 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 shaohui ruan
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # text-detection-ctpn
2 |
3 | text detection mainly based on ctpn (connectionist text proposal network), implemented in tensorflow. I use ID card detection as an example to demonstrate the results, but it should be noted that this model can be used in almost every horizontal scene text detection task. The original paper can be found [here](https://arxiv.org/abs/1609.03605), and the original Caffe repo [here](https://github.com/tianzhi0549/CTPN). For more detail about the paper and code, see this [blog](http://slade-ruan.me/2017/10/22/text-detection-ctpn/). If you have any questions, check the existing issues first; if the problem persists, open a new issue.
4 | ***
5 | # roadmap
6 | - [x] freeze the graph for convenient inference
7 | - [x] pure python, cython nms and cuda nms
8 | - [x] loss function as referred in paper
9 | - [x] oriented text connector
10 | - [x] BLSTM
11 | ***
12 | # demo
13 | - for a quick demo, you don't have to build the library; simply use demo_pb.py for inference (a complete session sketch follows the command below).
14 | - first, git clone git@github.com:eragonruan/text-detection-ctpn.git --depth=1
15 | - then, download the pb file from [release](https://github.com/eragonruan/text-detection-ctpn/releases)
16 | - put ctpn.pb in data/
17 | - put your images in data/demo; the results will be saved in data/results. Then run the demo from the repo root:
18 | ```shell
19 | python ./ctpn/demo_pb.py
20 | ```
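Putting the steps above together, a typical quick-demo session might look like this (the local path to the downloaded ctpn.pb is illustrative — grab the file from the release page):
```shell
git clone git@github.com:eragonruan/text-detection-ctpn.git --depth=1
cd text-detection-ctpn
# download ctpn.pb from the releases page, then:
mv ~/Downloads/ctpn.pb data/
cp /path/to/your_image.jpg data/demo/
python ./ctpn/demo_pb.py  # boxes and result images are written to data/results/
```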
21 | ***
22 | # parameters
23 | there are some parameters you may need to modify according to your requirements; you can find them in ctpn/text.yml (an example override follows this list)
24 | - USE_GPU_NMS # whether to use nms implemented in cuda or not
25 | - DETECT_MODE # H represents horizontal mode, O represents oriented mode, default is H
26 | - checkpoints_path # the model I provide is in checkpoints/; if you train the model yourself, it will be saved in output/
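For instance, a minimal sketch of those settings as they appear in ctpn/text.yml (USE_GPU_NMS is a top-level key, the other two live under TEST):
```yaml
USE_GPU_NMS: False  # fall back to the CPU implementation if CUDA NMS is unavailable
TEST:
  DETECT_MODE: O  # O = oriented text connector, H = horizontal (the default)
  checkpoints_path: output/ctpn_end2end/voc_2007_trainval  # use a self-trained model instead of checkpoints/
```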
27 | ***
28 | # training
29 | ## setup
30 | - requirements: python 2.7, tensorflow 1.3, cython 0.24, opencv-python, easydict (installing Anaconda is recommended)
31 | - if you do not have a GPU device, follow this issue to [set up](https://github.com/eragonruan/text-detection-ctpn/issues/43)
32 | - if you have a GPU device, build the library with (a quick import check follows the commands)
33 | ```shell
34 | cd lib/utils
35 | chmod +x make.sh
36 | ./make.sh
37 | ```
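As a quick sanity check that the build produced the compiled extensions, the imports that lib/fast_rcnn/nms_wrapper.py attempts should now succeed from the repo root:
```shell
python -c "from lib.utils.cython_nms import nms; from lib.utils.gpu_nms import gpu_nms"
```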
38 | ## prepare data
39 | - First, download the pre-trained VGG net model and put it at data/pretrain/VGG_imagenet.npy. You can download it from [google drive](https://drive.google.com/drive/folders/0B_WmJoEtfQhDRl82b1dJTjB2ZGc?resourcekey=0-OjW5DtLUbX5xUob7fwRvEw&usp=sharing) or [baidu yun](https://pan.baidu.com/s/1kUNTl1l).
40 | - Second, prepare the training data as described in the paper, or download the data I prepared from [google drive](https://drive.google.com/drive/folders/0B_WmJoEtfQhDRl82b1dJTjB2ZGc?resourcekey=0-OjW5DtLUbX5xUob7fwRvEw&usp=sharing) or [baidu yun](https://pan.baidu.com/s/1kUNTl1l). Alternatively, prepare your own data according to the following steps.
41 | - Modify path and gt_path in prepare_training_data/split_label.py according to your dataset, and run
42 | ```shell
43 | cd lib/prepare_training_data
44 | python split_label.py
45 | ```
46 | - it will generate the prepared data in the current folder; then run
47 | ```shell
48 | python ToVoc.py
49 | ```
50 | - to convert the prepared training data into VOC format. It will generate a folder named TEXTVOC. Move this folder to data/ and then run (the expected final layout is shown after the commands)
51 | ```shell
52 | cd ../../data
53 | ln -s TEXTVOC VOCdevkit2007
54 | ```
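After the symlink step, training expects roughly the following layout (inferred from the paths used in lib/datasets/pascal_voc.py):
```
data/VOCdevkit2007 -> TEXTVOC
└── VOC2007
    ├── Annotations/          # one PASCAL-VOC .xml file per image
    ├── ImageSets
    │   └── Main
    │       └── trainval.txt  # image ids loaded by the voc_2007_trainval imdb
    └── JPEGImages/           # the .jpg training images
```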
55 | ## train
56 | Simply run
57 | ```shell
58 | python ./ctpn/train_net.py
59 | ```
60 | - you can modify some hyper-parameters in ctpn/text.yml, or just use the parameters I set (for example, to resume an interrupted run, see the sketch below).
61 | - The model I provide in checkpoints/ was trained on a GTX 1070 for 50k iterations.
62 | - If you are using CUDA NMS, each iteration takes about 0.2 s, so 50k iterations take roughly 2.8 hours (50,000 × 0.2 s ≈ 10,000 s).
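For example, to continue training from an existing snapshot instead of starting again from the VGG weights, flip the restore flag in ctpn/text.yml; ctpn/train_net.py simply passes bool(int(cfg.TRAIN.restore)) into train_net, and the flag is consumed inside lib/fast_rcnn/train.py (a minimal sketch):
```yaml
TRAIN:
  restore: 1  # 0 = initialize from data/pretrain/VGG_imagenet.npy, 1 = restore a saved snapshot
  max_steps: 50000
```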
63 | ***
64 | # some results
65 | `NOTICE:` all the photos used below were collected from the internet. If any of them affects you, please contact me and I will delete it.
66 | 
67 | 
68 | ***
69 | ## oriented text connector
70 | - the oriented text connector has been implemented; it's working, but still needs further improvement.
71 | - the left figure is the result for DETECT_MODE H, the right figure for DETECT_MODE O
72 | 
73 | 
74 | ***
75 |
--------------------------------------------------------------------------------
/ctpn/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/ctpn/demo.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 |
3 | import cv2
4 | import glob
5 | import os
6 | import shutil
7 | import sys
8 |
9 | import numpy as np
10 | import tensorflow as tf
11 |
12 | sys.path.append(os.getcwd())
13 | from lib.networks.factory import get_network
14 | from lib.fast_rcnn.config import cfg, cfg_from_file
15 | from lib.fast_rcnn.test import test_ctpn
16 | from lib.utils.timer import Timer
17 | from lib.text_connector.detectors import TextDetector
18 | from lib.text_connector.text_connect_cfg import Config as TextLineCfg
19 |
20 |
21 | def resize_im(im, scale, max_scale=None):
22 | f = float(scale) / min(im.shape[0], im.shape[1])
23 |     if max_scale is not None and f * max(im.shape[0], im.shape[1]) > max_scale:
24 | f = float(max_scale) / max(im.shape[0], im.shape[1])
25 | return cv2.resize(im, None, None, fx=f, fy=f, interpolation=cv2.INTER_LINEAR), f
26 |
27 |
28 | def draw_boxes(img, image_name, boxes, scale):
29 | base_name = image_name.split('/')[-1]
30 | with open('data/results/' + 'res_{}.txt'.format(base_name.split('.')[0]), 'w') as f:
31 | for box in boxes:
32 | if np.linalg.norm(box[0] - box[1]) < 5 or np.linalg.norm(box[3] - box[0]) < 5:
33 | continue
34 |             if box[8] >= 0.9:
35 |                 color = (0, 255, 0)
36 |             else:  # treat everything below 0.9 as the lower band, so `color` is always defined
37 |                 color = (255, 0, 0)
38 | cv2.line(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), color, 2)
39 | cv2.line(img, (int(box[0]), int(box[1])), (int(box[4]), int(box[5])), color, 2)
40 | cv2.line(img, (int(box[6]), int(box[7])), (int(box[2]), int(box[3])), color, 2)
41 | cv2.line(img, (int(box[4]), int(box[5])), (int(box[6]), int(box[7])), color, 2)
42 |
43 | min_x = min(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale))
44 | min_y = min(int(box[1] / scale), int(box[3] / scale), int(box[5] / scale), int(box[7] / scale))
45 | max_x = max(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale))
46 | max_y = max(int(box[1] / scale), int(box[3] / scale), int(box[5] / scale), int(box[7] / scale))
47 |
48 | line = ','.join([str(min_x), str(min_y), str(max_x), str(max_y)]) + '\r\n'
49 | f.write(line)
50 |
51 | img = cv2.resize(img, None, None, fx=1.0 / scale, fy=1.0 / scale, interpolation=cv2.INTER_LINEAR)
52 | cv2.imwrite(os.path.join("data/results", base_name), img)
53 |
54 |
55 | def ctpn(sess, net, image_name):
56 | timer = Timer()
57 | timer.tic()
58 |
59 | img = cv2.imread(image_name)
60 | img, scale = resize_im(img, scale=TextLineCfg.SCALE, max_scale=TextLineCfg.MAX_SCALE)
61 | scores, boxes = test_ctpn(sess, net, img)
62 |
63 | textdetector = TextDetector()
64 | boxes = textdetector.detect(boxes, scores[:, np.newaxis], img.shape[:2])
65 | draw_boxes(img, image_name, boxes, scale)
66 | timer.toc()
67 | print(('Detection took {:.3f}s for '
68 | '{:d} object proposals').format(timer.total_time, boxes.shape[0]))
69 |
70 |
71 | if __name__ == '__main__':
72 | if os.path.exists("data/results/"):
73 | shutil.rmtree("data/results/")
74 | os.makedirs("data/results/")
75 |
76 | cfg_from_file('ctpn/text.yml')
77 |
78 | # init session
79 | config = tf.ConfigProto(allow_soft_placement=True)
80 | sess = tf.Session(config=config)
81 | # load network
82 | net = get_network("VGGnet_test")
83 | # load model
84 | print(('Loading network {:s}... '.format("VGGnet_test")), end=' ')
85 | saver = tf.train.Saver()
86 |
87 | try:
88 | ckpt = tf.train.get_checkpoint_state(cfg.TEST.checkpoints_path)
89 | print('Restoring from {}...'.format(ckpt.model_checkpoint_path), end=' ')
90 | saver.restore(sess, ckpt.model_checkpoint_path)
91 | print('done')
92 |     except Exception:
93 |         raise RuntimeError('Check your pretrained model in {:s}'.format(cfg.TEST.checkpoints_path))
94 |
95 | im = 128 * np.ones((300, 300, 3), dtype=np.uint8)
96 | for i in range(2):
97 | _, _ = test_ctpn(sess, net, im)
98 |
99 | im_names = glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.png')) + \
100 | glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.jpg'))
101 |
102 | for im_name in im_names:
103 | print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')
104 | print(('Demo for {:s}'.format(im_name)))
105 | ctpn(sess, net, im_name)
106 |
--------------------------------------------------------------------------------
/ctpn/demo_pb.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 |
3 | import glob
4 | import os
5 | import shutil
6 | import sys
7 |
8 | import cv2
9 | import numpy as np
10 | import tensorflow as tf
11 | from tensorflow.python.platform import gfile
12 |
13 | sys.path.append(os.getcwd())
14 | from lib.fast_rcnn.config import cfg, cfg_from_file
15 | from lib.fast_rcnn.test import _get_blobs
16 | from lib.text_connector.detectors import TextDetector
17 | from lib.text_connector.text_connect_cfg import Config as TextLineCfg
18 | from lib.rpn_msr.proposal_layer_tf import proposal_layer
19 |
20 |
21 | def resize_im(im, scale, max_scale=None):
22 | f = float(scale) / min(im.shape[0], im.shape[1])
23 |     if max_scale is not None and f * max(im.shape[0], im.shape[1]) > max_scale:
24 | f = float(max_scale) / max(im.shape[0], im.shape[1])
25 | return cv2.resize(im, None, None, fx=f, fy=f, interpolation=cv2.INTER_LINEAR), f
26 |
27 |
28 | def draw_boxes(img, image_name, boxes, scale):
29 | base_name = image_name.split('/')[-1]
30 | with open('data/results/' + 'res_{}.txt'.format(base_name.split('.')[0]), 'w') as f:
31 | for box in boxes:
32 | if np.linalg.norm(box[0] - box[1]) < 5 or np.linalg.norm(box[3] - box[0]) < 5:
33 | continue
34 |             if box[8] >= 0.9:
35 |                 color = (0, 255, 0)
36 |             else:  # treat everything below 0.9 as the lower band, so `color` is always defined
37 |                 color = (255, 0, 0)
38 | cv2.line(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), color, 2)
39 | cv2.line(img, (int(box[0]), int(box[1])), (int(box[4]), int(box[5])), color, 2)
40 | cv2.line(img, (int(box[6]), int(box[7])), (int(box[2]), int(box[3])), color, 2)
41 | cv2.line(img, (int(box[4]), int(box[5])), (int(box[6]), int(box[7])), color, 2)
42 |
43 | min_x = min(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale))
44 | min_y = min(int(box[1] / scale), int(box[3] / scale), int(box[5] / scale), int(box[7] / scale))
45 | max_x = max(int(box[0] / scale), int(box[2] / scale), int(box[4] / scale), int(box[6] / scale))
46 | max_y = max(int(box[1] / scale), int(box[3] / scale), int(box[5] / scale), int(box[7] / scale))
47 |
48 | line = ','.join([str(min_x), str(min_y), str(max_x), str(max_y)]) + '\r\n'
49 | f.write(line)
50 |
51 | img = cv2.resize(img, None, None, fx=1.0 / scale, fy=1.0 / scale, interpolation=cv2.INTER_LINEAR)
52 | cv2.imwrite(os.path.join("data/results", base_name), img)
53 |
54 |
55 | if __name__ == '__main__':
56 |
57 | if os.path.exists("data/results/"):
58 | shutil.rmtree("data/results/")
59 | os.makedirs("data/results/")
60 |
61 | cfg_from_file('ctpn/text.yml')
62 |
63 | # init session
64 | config = tf.ConfigProto(allow_soft_placement=True)
65 | sess = tf.Session(config=config)
66 | with gfile.FastGFile('data/ctpn.pb', 'rb') as f:
67 | graph_def = tf.GraphDef()
68 | graph_def.ParseFromString(f.read())
69 | sess.graph.as_default()
70 | tf.import_graph_def(graph_def, name='')
71 |     sess.run(tf.global_variables_initializer())  # harmless no-op here: a frozen graph contains only constants
72 |
73 |     input_img = sess.graph.get_tensor_by_name('Placeholder:0')  # input image blob
74 |     output_cls_prob = sess.graph.get_tensor_by_name('Reshape_2:0')  # per-anchor text/non-text probabilities (frozen by generate_pb.py)
75 |     output_box_pred = sess.graph.get_tensor_by_name('rpn_bbox_pred/Reshape_1:0')  # per-anchor bbox regression deltas
76 |
77 | im_names = glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.png')) + \
78 | glob.glob(os.path.join(cfg.DATA_DIR, 'demo', '*.jpg'))
79 |
80 | for im_name in im_names:
81 | print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')
82 | print(('Demo for {:s}'.format(im_name)))
83 | img = cv2.imread(im_name)
84 | img, scale = resize_im(img, scale=TextLineCfg.SCALE, max_scale=TextLineCfg.MAX_SCALE)
85 | blobs, im_scales = _get_blobs(img, None)
86 | if cfg.TEST.HAS_RPN:
87 | im_blob = blobs['data']
88 | blobs['im_info'] = np.array(
89 | [[im_blob.shape[1], im_blob.shape[2], im_scales[0]]],
90 | dtype=np.float32)
91 | cls_prob, box_pred = sess.run([output_cls_prob, output_box_pred], feed_dict={input_img: blobs['data']})
92 | rois, _ = proposal_layer(cls_prob, box_pred, blobs['im_info'], 'TEST', anchor_scales=cfg.ANCHOR_SCALES)
93 |
94 | scores = rois[:, 0]
95 | boxes = rois[:, 1:5] / im_scales[0]
96 | textdetector = TextDetector()
97 | boxes = textdetector.detect(boxes, scores[:, np.newaxis], img.shape[:2])
98 | draw_boxes(img, im_name, boxes, scale)
99 |
--------------------------------------------------------------------------------
/ctpn/generate_pb.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 |
3 | import os
4 | import sys
5 |
6 | import tensorflow as tf
7 | from tensorflow.python.framework.graph_util import convert_variables_to_constants
8 |
9 | sys.path.append(os.getcwd())
10 | from lib.networks.factory import get_network
11 | from lib.fast_rcnn.config import cfg, cfg_from_file
12 |
13 | if __name__ == "__main__":
14 | cfg_from_file('ctpn/text.yml')
15 |
16 | config = tf.ConfigProto(allow_soft_placement=True)
17 | sess = tf.Session(config=config)
18 | net = get_network("VGGnet_test")
19 | print(('Loading network {:s}... '.format("VGGnet_test")), end=' ')
20 | saver = tf.train.Saver()
21 | try:
22 | ckpt = tf.train.get_checkpoint_state(cfg.TEST.checkpoints_path)
23 | print('Restoring from {}...'.format(ckpt.model_checkpoint_path), end=' ')
24 | saver.restore(sess, ckpt.model_checkpoint_path)
25 | print('done')
26 |     except Exception:
27 |         raise RuntimeError('Check your pretrained model in {:s}'.format(cfg.TEST.checkpoints_path))
28 | print(' done.')
29 |
30 | print('all nodes are:\n')
31 | graph = tf.get_default_graph()
32 | input_graph_def = graph.as_graph_def()
33 | node_names = [node.name for node in input_graph_def.node]
34 | for x in node_names:
35 | print(x)
36 | output_node_names = 'Reshape_2,rpn_bbox_pred/Reshape_1'
37 | output_graph_def = convert_variables_to_constants(sess, input_graph_def, output_node_names.split(','))
38 | output_graph = 'data/ctpn.pb'
39 | with tf.gfile.GFile(output_graph, 'wb') as f:
40 | f.write(output_graph_def.SerializeToString())
41 | sess.close()
42 |
--------------------------------------------------------------------------------
/ctpn/text.yml:
--------------------------------------------------------------------------------
1 | EXP_DIR: ctpn_end2end
2 | LOG_DIR: ctpn
3 | IS_MULTISCALE: False
4 | NET_NAME: VGGnet
5 | ANCHOR_SCALES: [16]
6 | NCLASSES: 2
7 | USE_GPU_NMS: True
8 | TRAIN:
9 | restore: 0
10 | max_steps: 50000
11 | SOLVER: Adam
12 | OHEM: False
13 | RPN_BATCHSIZE: 300
14 | BATCH_SIZE: 300
15 | LOG_IMAGE_ITERS: 100
16 | DISPLAY: 10
17 | SNAPSHOT_ITERS: 1000
18 | HAS_RPN: True
19 | LEARNING_RATE: 0.00001
20 | MOMENTUM: 0.9
21 | GAMMA: 0.1
22 | STEPSIZE: 30000
23 | IMS_PER_BATCH: 1
24 | BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
25 | RPN_POSITIVE_OVERLAP: 0.7
26 | PROPOSAL_METHOD: gt
27 | BG_THRESH_LO: 0.0
28 | PRECLUDE_HARD_SAMPLES: True
29 | BBOX_INSIDE_WEIGHTS: [0, 1, 0, 1]
30 | RPN_BBOX_INSIDE_WEIGHTS: [0, 1, 0, 1]
31 | RPN_POSITIVE_WEIGHT: -1.0
32 | FG_FRACTION: 0.3
33 | WEIGHT_DECAY: 0.0005
34 | TEST:
35 | HAS_RPN: True
36 | DETECT_MODE: H
37 | checkpoints_path: checkpoints/
38 | # checkpoints_path: output/ctpn_end2end/voc_2007_trainval
39 |
--------------------------------------------------------------------------------
/ctpn/train_net.py:
--------------------------------------------------------------------------------
1 | import os.path
2 | import pprint
3 | import sys
4 |
5 | sys.path.append(os.getcwd())
6 | from lib.fast_rcnn.train import get_training_roidb, train_net
7 | from lib.fast_rcnn.config import cfg_from_file, get_output_dir, get_log_dir
8 | from lib.datasets.factory import get_imdb
9 | from lib.networks.factory import get_network
10 | from lib.fast_rcnn.config import cfg
11 |
12 | if __name__ == '__main__':
13 | cfg_from_file('ctpn/text.yml')
14 | print('Using config:')
15 | pprint.pprint(cfg)
16 | imdb = get_imdb('voc_2007_trainval')
17 | print('Loaded dataset `{:s}` for training'.format(imdb.name))
18 | roidb = get_training_roidb(imdb)
19 |
20 | output_dir = get_output_dir(imdb, None)
21 | log_dir = get_log_dir(imdb)
22 | print('Output will be saved to `{:s}`'.format(output_dir))
23 | print('Logs will be saved to `{:s}`'.format(log_dir))
24 |
25 | device_name = '/gpu:0'
26 | print(device_name)
27 |
28 | network = get_network('VGGnet_train')
29 |
30 | train_net(network, imdb, roidb,
31 | output_dir=output_dir,
32 | log_dir=log_dir,
33 | pretrained_model='data/pretrain/VGG_imagenet.npy',
34 | max_iters=int(cfg.TRAIN.max_steps),
35 | restore=bool(int(cfg.TRAIN.restore)))
36 |
--------------------------------------------------------------------------------
/data/VOCdevkit2007:
--------------------------------------------------------------------------------
1 | /media/D/code/OCR/CTPN_LSTM/data/VOCdevkit
--------------------------------------------------------------------------------
/data/demo/006.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/006.jpg
--------------------------------------------------------------------------------
/data/demo/007.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/007.jpg
--------------------------------------------------------------------------------
/data/demo/008.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/008.jpg
--------------------------------------------------------------------------------
/data/demo/009.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/009.jpg
--------------------------------------------------------------------------------
/data/demo/010.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/demo/010.png
--------------------------------------------------------------------------------
/data/oriented_results/006.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/006.jpg
--------------------------------------------------------------------------------
/data/oriented_results/007.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/007.jpg
--------------------------------------------------------------------------------
/data/oriented_results/008.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/008.jpg
--------------------------------------------------------------------------------
/data/oriented_results/009.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/009.jpg
--------------------------------------------------------------------------------
/data/oriented_results/010.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/oriented_results/010.png
--------------------------------------------------------------------------------
/data/results/006.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/006.jpg
--------------------------------------------------------------------------------
/data/results/007.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/007.jpg
--------------------------------------------------------------------------------
/data/results/008.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/008.jpg
--------------------------------------------------------------------------------
/data/results/009.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/009.jpg
--------------------------------------------------------------------------------
/data/results/010.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/data/results/010.png
--------------------------------------------------------------------------------
/data/results/res_006.txt:
--------------------------------------------------------------------------------
1 | 435,476,870,576
2 | 409,299,716,400
3 | 179,118,691,237
4 | 179,623,614,740
5 | 153,952,947,1069
6 | 102,0,512,30
7 | 230,800,921,906
8 |
--------------------------------------------------------------------------------
/data/results/res_007.txt:
--------------------------------------------------------------------------------
1 | 0,653,254,675
2 | 872,654,1018,676
3 | 181,373,836,558
4 | 436,287,545,387
5 | 345,100,654,310
6 |
--------------------------------------------------------------------------------
/data/results/res_008.txt:
--------------------------------------------------------------------------------
1 | 96,214,512,258
2 | 96,161,320,198
3 | 96,255,480,302
4 | 96,458,320,493
5 | 96,299,480,343
6 | 96,548,464,593
7 | 96,419,480,462
8 | 272,63,384,103
9 | 432,68,496,98
10 | 96,586,464,632
11 | 96,382,496,425
12 | 96,515,448,559
13 | 96,787,224,799
14 | 432,18,496,49
15 | 96,742,256,778
16 | 96,622,448,668
17 | 96,707,464,756
18 | 96,61,128,91
19 | 96,15,208,51
20 | 96,340,496,382
21 | 256,13,384,53
22 | 96,659,224,694
23 | 80,120,512,166
24 |
--------------------------------------------------------------------------------
/data/results/res_009.txt:
--------------------------------------------------------------------------------
1 | 0,695,947,857
2 | 0,19,947,239
3 | 128,1057,768,1237
4 | 51,882,870,1035
5 | 230,253,691,458
6 |
--------------------------------------------------------------------------------
/data/results/res_010.txt:
--------------------------------------------------------------------------------
1 | 40,60,260,79
2 | 113,204,193,216
3 | 33,130,266,151
4 | 120,179,180,197
5 | 60,106,240,126
6 | 40,84,260,103
7 | 26,153,273,174
8 | 33,10,260,41
9 |
--------------------------------------------------------------------------------
/lib/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/lib/__init__.py
--------------------------------------------------------------------------------
/lib/datasets/__init__.py:
--------------------------------------------------------------------------------
1 | from .imdb import imdb
2 | from .pascal_voc import pascal_voc
3 | from . import factory
4 |
5 |
--------------------------------------------------------------------------------
/lib/datasets/factory.py:
--------------------------------------------------------------------------------
1 | from .pascal_voc import pascal_voc
2 | __sets = {}
3 | def _selective_search_IJCV_top_k(split, year, top_k):
4 | imdb = pascal_voc(split, year)
5 | imdb.roidb_handler = imdb.selective_search_IJCV_roidb
6 | imdb.config['top_k'] = top_k
7 | return imdb
8 | # Set up voc_<year>_<split> using selective search "fast" mode
9 | for year in ['2007', '2012', '0712']:
10 | for split in ['train', 'val', 'trainval', 'test']:
11 | name = 'voc_{}_{}'.format(year, split)
12 | __sets[name] = (lambda split=split, year=year:
13 | pascal_voc(split, year))
14 |
15 | def get_imdb(name):
16 | """Get an imdb (image database) by name."""
17 | if name not in __sets:
18 | print((list_imdbs()))
19 | raise KeyError('Unknown dataset: {}'.format(name))
20 | return __sets[name]()
21 |
22 | def list_imdbs():
23 | """List all registered imdbs."""
24 | return list(__sets.keys())
25 |
--------------------------------------------------------------------------------
/lib/datasets/imdb.py:
--------------------------------------------------------------------------------
1 | import os
2 | import os.path as osp
3 | import PIL
4 | import numpy as np
5 | import scipy.sparse
6 | from lib.utils.bbox import bbox_overlaps
7 | from lib.fast_rcnn.config import cfg
8 |
9 | class imdb(object):
10 |
11 | def __init__(self, name):
12 | self._name = name
13 | self._num_classes = 0
14 | self._classes = []
15 | self._image_index = []
16 | self._obj_proposer = 'selective_search'
17 | self._roidb = None
18 | print(self.default_roidb)
19 | self._roidb_handler = self.default_roidb
20 | # Use this dict for storing dataset specific config options
21 | self.config = {}
22 |
23 | @property
24 | def name(self):
25 | return self._name
26 |
27 | @property
28 | def num_classes(self):
29 | return len(self._classes)
30 |
31 | @property
32 | def classes(self):
33 | return self._classes
34 |
35 | @property
36 | def image_index(self):
37 | return self._image_index
38 |
39 | @property
40 | def roidb_handler(self):
41 | return self._roidb_handler
42 |
43 | @roidb_handler.setter
44 | def roidb_handler(self, val):
45 | self._roidb_handler = val
46 |
47 | def set_proposal_method(self, method):
48 | method = eval('self.' + method + '_roidb')
49 | self.roidb_handler = method
50 |
51 | @property
52 | def roidb(self):
53 | # A roidb is a list of dictionaries, each with the following keys:
54 | # boxes
55 | # gt_overlaps
56 | # gt_classes
57 | # flipped
58 | if self._roidb is not None:
59 | return self._roidb
60 | self._roidb = self.roidb_handler()
61 | return self._roidb
62 |
63 | @property
64 | def cache_path(self):
65 | cache_path = osp.abspath(osp.join(cfg.DATA_DIR, 'cache'))
66 | if not os.path.exists(cache_path):
67 | os.makedirs(cache_path)
68 | return cache_path
69 |
70 | @property
71 | def num_images(self):
72 | return len(self.image_index)
73 |
74 | def image_path_at(self, i):
75 | raise NotImplementedError
76 |
77 | def default_roidb(self):
78 | raise NotImplementedError
79 |
80 | def _get_widths(self):
81 | return [PIL.Image.open(self.image_path_at(i)).size[0]
82 | for i in range(self.num_images)]
83 |
84 | def append_flipped_images(self):
85 | num_images = self.num_images
86 | widths = self._get_widths()
87 | for i in range(num_images):
88 | boxes = self.roidb[i]['boxes'].copy()
89 | oldx1 = boxes[:, 0].copy()
90 | oldx2 = boxes[:, 2].copy()
91 | boxes[:, 0] = widths[i] - oldx2 - 1
92 | boxes[:, 2] = widths[i] - oldx1 - 1
93 | for b in range(len(boxes)):
94 |                 if boxes[b][2] < boxes[b][0]:
95 | boxes[b][0] = 0
96 | assert (boxes[:, 2] >= boxes[:, 0]).all()
97 | entry = {'boxes' : boxes,
98 | 'gt_overlaps' : self.roidb[i]['gt_overlaps'],
99 | 'gt_classes' : self.roidb[i]['gt_classes'],
100 | 'flipped' : True}
101 |
102 | if 'gt_ishard' in self.roidb[i] and 'dontcare_areas' in self.roidb[i]:
103 | entry['gt_ishard'] = self.roidb[i]['gt_ishard'].copy()
104 | dontcare_areas = self.roidb[i]['dontcare_areas'].copy()
105 | oldx1 = dontcare_areas[:, 0].copy()
106 | oldx2 = dontcare_areas[:, 2].copy()
107 | dontcare_areas[:, 0] = widths[i] - oldx2 - 1
108 | dontcare_areas[:, 2] = widths[i] - oldx1 - 1
109 | entry['dontcare_areas'] = dontcare_areas
110 |
111 | self.roidb.append(entry)
112 |
113 | self._image_index = self._image_index * 2
114 |
115 |
116 | def create_roidb_from_box_list(self, box_list, gt_roidb):
117 | assert len(box_list) == self.num_images, \
118 | 'Number of boxes must match number of ground-truth images'
119 | roidb = []
120 | for i in range(self.num_images):
121 | boxes = box_list[i]
122 | num_boxes = boxes.shape[0]
123 | overlaps = np.zeros((num_boxes, self.num_classes), dtype=np.float32)
124 |
125 | if gt_roidb is not None and gt_roidb[i]['boxes'].size > 0:
126 | gt_boxes = gt_roidb[i]['boxes']
127 | gt_classes = gt_roidb[i]['gt_classes']
128 | gt_overlaps = bbox_overlaps(boxes.astype(np.float),
129 | gt_boxes.astype(np.float))
130 | argmaxes = gt_overlaps.argmax(axis=1)
131 | maxes = gt_overlaps.max(axis=1)
132 | I = np.where(maxes > 0)[0]
133 | overlaps[I, gt_classes[argmaxes[I]]] = maxes[I]
134 |
135 | overlaps = scipy.sparse.csr_matrix(overlaps)
136 | roidb.append({
137 | 'boxes' : boxes,
138 | 'gt_classes' : np.zeros((num_boxes,), dtype=np.int32),
139 | 'gt_overlaps' : overlaps,
140 | 'flipped' : False,
141 | 'seg_areas' : np.zeros((num_boxes,), dtype=np.float32),
142 | })
143 | return roidb
144 |
145 | @staticmethod
146 | def merge_roidbs(a, b):
147 | assert len(a) == len(b)
148 | for i in range(len(a)):
149 | a[i]['boxes'] = np.vstack((a[i]['boxes'], b[i]['boxes']))
150 | a[i]['gt_classes'] = np.hstack((a[i]['gt_classes'],
151 | b[i]['gt_classes']))
152 | a[i]['gt_overlaps'] = scipy.sparse.vstack([a[i]['gt_overlaps'],
153 | b[i]['gt_overlaps']])
154 | a[i]['seg_areas'] = np.hstack((a[i]['seg_areas'],
155 | b[i]['seg_areas']))
156 | return a
157 |
158 |
--------------------------------------------------------------------------------
/lib/datasets/pascal_voc.py:
--------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | import os
3 | import numpy as np
4 | import scipy.sparse
5 | try:
6 |     import cPickle as pickle  # Python 2
7 | except ImportError:
8 |     import pickle  # Python 3
9 | import uuid
10 | import xml.etree.ElementTree as ET
11 | from .imdb import imdb
12 | from lib.fast_rcnn.config import cfg
13 |
14 | class pascal_voc(imdb):
15 | def __init__(self, image_set, year, devkit_path=None):
16 | imdb.__init__(self, 'voc_' + year + '_' + image_set)
17 | self._year = year
18 | self._image_set = image_set
19 | self._devkit_path = self._get_default_path() if devkit_path is None \
20 | else devkit_path
21 | self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
22 | self._classes = ('__background__', # always index 0
23 | 'text')
24 |
25 | self._class_to_ind = dict(list(zip(self.classes, list(range(self.num_classes)))))
26 | self._image_ext = '.jpg'
27 | self._image_index = self._load_image_set_index()
28 | # Default to roidb handler
29 | #self._roidb_handler = self.selective_search_roidb
30 | self._roidb_handler = self.gt_roidb
31 | self._salt = str(uuid.uuid4())
32 | self._comp_id = 'comp4'
33 |
34 | # PASCAL specific config options
35 | self.config = {'cleanup' : True,
36 | 'use_salt' : True,
37 | 'use_diff' : False,
38 | 'matlab_eval' : False,
39 | 'rpn_file' : None,
40 | 'min_size' : 2}
41 |
42 | assert os.path.exists(self._devkit_path), \
43 | 'VOCdevkit path does not exist: {}'.format(self._devkit_path)
44 | assert os.path.exists(self._data_path), \
45 | 'Path does not exist: {}'.format(self._data_path)
46 |
47 | def image_path_at(self, i):
48 | """
49 | Return the absolute path to image i in the image sequence.
50 | """
51 | return self.image_path_from_index(self._image_index[i])
52 |
53 | def image_path_from_index(self, index):
54 | """
55 | Construct an image path from the image's "index" identifier.
56 | """
57 | image_path = os.path.join(self._data_path, 'JPEGImages',
58 | index + self._image_ext)
59 | assert os.path.exists(image_path), \
60 | 'Path does not exist: {}'.format(image_path)
61 | return image_path
62 |
63 | def _load_image_set_index(self):
64 | """
65 | Load the indexes listed in this dataset's image set file.
66 | """
67 | # Example path to image set file:
68 | # self._devkit_path + /VOCdevkit2007/VOC2007/ImageSets/Main/val.txt
69 | image_set_file = os.path.join(self._data_path, 'ImageSets', 'Main',
70 | self._image_set + '.txt')
71 | assert os.path.exists(image_set_file), \
72 | 'Path does not exist: {}'.format(image_set_file)
73 | with open(image_set_file) as f:
74 | image_index = [x.strip() for x in f.readlines()]
75 | return image_index
76 |
77 | def _get_default_path(self):
78 | """
79 | Return the default path where PASCAL VOC is expected to be installed.
80 | """
81 | return os.path.join(cfg.DATA_DIR, 'VOCdevkit' + self._year)
82 |
83 | def gt_roidb(self):
84 | """
85 | Return the database of ground-truth regions of interest.
86 |
87 | This function loads/saves from/to a cache file to speed up future calls.
88 | """
89 | cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
90 | if os.path.exists(cache_file):
91 | with open(cache_file, 'rb') as fid:
92 | roidb = pickle.load(fid)
93 | print('{} gt roidb loaded from {}'.format(self.name, cache_file))
94 | return roidb
95 |
96 | gt_roidb = [self._load_pascal_annotation(index)
97 | for index in self.image_index]
98 | with open(cache_file, 'wb') as fid:
99 | pickle.dump(gt_roidb, fid, pickle.HIGHEST_PROTOCOL)
100 | print('wrote gt roidb to {}'.format(cache_file))
101 |
102 | return gt_roidb
103 |
104 | def rpn_roidb(self):
105 | if int(self._year) == 2007 or self._image_set != 'test':
106 | gt_roidb = self.gt_roidb()
107 | rpn_roidb = self._load_rpn_roidb(gt_roidb)
108 | roidb = imdb.merge_roidbs(gt_roidb, rpn_roidb)
109 | else:
110 | roidb = self._load_rpn_roidb(None)
111 |
112 | return roidb
113 |
114 | def _load_rpn_roidb(self, gt_roidb):
115 | filename = self.config['rpn_file']
116 | print('loading {}'.format(filename))
117 | assert os.path.exists(filename), \
118 | 'rpn data not found at: {}'.format(filename)
119 | with open(filename, 'rb') as f:
120 | box_list = pickle.load(f)
121 | return self.create_roidb_from_box_list(box_list, gt_roidb)
122 |
123 |
124 | def _load_pascal_annotation(self, index):
125 | """
126 | Load image and bounding boxes info from XML file in the PASCAL VOC
127 | format.
128 | """
129 | filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
130 | tree = ET.parse(filename)
131 | objs = tree.findall('object')
132 | num_objs = len(objs)
133 |
134 | boxes = np.zeros((num_objs, 4), dtype=np.uint16)
135 | gt_classes = np.zeros((num_objs), dtype=np.int32)
136 | overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
137 | # "Seg" area for pascal is just the box area
138 | seg_areas = np.zeros((num_objs), dtype=np.float32)
139 | ishards = np.zeros((num_objs), dtype=np.int32)
140 |
141 | # Load object bounding boxes into a data frame.
142 | for ix, obj in enumerate(objs):
143 | bbox = obj.find('bndbox')
144 | # Make pixel indexes 0-based
145 | x1 = float(bbox.find('xmin').text)
146 | y1 = float(bbox.find('ymin').text)
147 | x2 = float(bbox.find('xmax').text)
148 | y2 = float(bbox.find('ymax').text)
149 | diffc = obj.find('difficult')
150 |             difficult = 0 if diffc is None else int(diffc.text)
151 | ishards[ix] = difficult
152 |
153 | cls = self._class_to_ind[obj.find('name').text.lower().strip()]
154 | boxes[ix, :] = [x1, y1, x2, y2]
155 | gt_classes[ix] = cls
156 | overlaps[ix, cls] = 1.0
157 | seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
158 |
159 | overlaps = scipy.sparse.csr_matrix(overlaps)
160 |
161 | return {'boxes' : boxes,
162 | 'gt_classes': gt_classes,
163 | 'gt_ishard': ishards,
164 | 'gt_overlaps' : overlaps,
165 | 'flipped' : False,
166 | 'seg_areas' : seg_areas}
167 |
168 | def _get_comp_id(self):
169 | comp_id = (self._comp_id + '_' + self._salt if self.config['use_salt']
170 | else self._comp_id)
171 | return comp_id
172 |
173 | def _get_voc_results_file_template(self):
174 | filename = self._get_comp_id() + '_det_' + self._image_set + '_{:s}.txt'
175 | filedir = os.path.join(self._devkit_path, 'results', 'VOC' + self._year, 'Main')
176 | if not os.path.exists(filedir):
177 | os.makedirs(filedir)
178 | path = os.path.join(filedir, filename)
179 | return path
180 |
181 | def _write_voc_results_file(self, all_boxes):
182 | for cls_ind, cls in enumerate(self.classes):
183 | if cls == '__background__':
184 | continue
185 | print('Writing {} VOC results file'.format(cls))
186 | filename = self._get_voc_results_file_template().format(cls)
187 | with open(filename, 'wt') as f:
188 | for im_ind, index in enumerate(self.image_index):
189 | dets = all_boxes[cls_ind][im_ind]
190 |                 if len(dets) == 0:  # dets may be an empty list or an empty array
191 | continue
192 | # the VOCdevkit expects 1-based indices
193 | for k in range(dets.shape[0]):
194 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
195 | format(index, dets[k, -1],
196 | dets[k, 0] + 1, dets[k, 1] + 1,
197 | dets[k, 2] + 1, dets[k, 3] + 1))
198 |
199 |
200 | if __name__ == '__main__':
201 | d = pascal_voc('trainval', '2007')
202 | res = d.roidb
203 | from IPython import embed; embed()
204 |
--------------------------------------------------------------------------------
/lib/fast_rcnn/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/lib/fast_rcnn/__init__.py
--------------------------------------------------------------------------------
/lib/fast_rcnn/bbox_transform.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | def bbox_transform(ex_rois, gt_rois):
4 | """
5 | computes the distance from ground-truth boxes to the given boxes, normed by their size
6 | :param ex_rois: n * 4 numpy array, given boxes
7 | :param gt_rois: n * 4 numpy array, ground-truth boxes
8 |     :return: deltas: n * 4 numpy array, regression targets from ex_rois to gt_rois
9 | """
10 | ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
11 | ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
12 | ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
13 | ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
14 |
15 | assert np.min(ex_widths) > 0.1 and np.min(ex_heights) > 0.1, \
16 | 'Invalid boxes found: {} {}'. \
17 | format(ex_rois[np.argmin(ex_widths), :], ex_rois[np.argmin(ex_heights), :])
18 |
19 | gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
20 | gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
21 | gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
22 | gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
23 |
24 | # warnings.catch_warnings()
25 | # warnings.filterwarnings('error')
26 | targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
27 | targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
28 | targets_dw = np.log(gt_widths / ex_widths)
29 | targets_dh = np.log(gt_heights / ex_heights)
30 |
31 | targets = np.vstack(
32 | (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
33 |
34 | return targets
35 |
36 | def bbox_transform_inv(boxes, deltas):
37 |
38 | boxes = boxes.astype(deltas.dtype, copy=False)
39 |
40 | widths = boxes[:, 2] - boxes[:, 0] + 1.0
41 | heights = boxes[:, 3] - boxes[:, 1] + 1.0
42 | ctr_x = boxes[:, 0] + 0.5 * widths
43 | ctr_y = boxes[:, 1] + 0.5 * heights
44 |
45 |     dx = deltas[:, 0::4]  # unused below: CTPN keeps the anchor x-center fixed
46 |     dy = deltas[:, 1::4]
47 |     dw = deltas[:, 2::4]  # unused below: CTPN keeps the anchor width fixed
48 |     dh = deltas[:, 3::4]
49 | 
50 |     pred_ctr_x = ctr_x[:, np.newaxis]  # x-center taken directly from the anchor
51 |     pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
52 |     pred_w = widths[:, np.newaxis]  # width taken directly from the anchor (CTPN regresses only y and h)
53 |     pred_h = np.exp(dh) * heights[:, np.newaxis]
54 |
55 | pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
56 | # x1
57 | pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
58 | # y1
59 | pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
60 | # x2
61 | pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
62 | # y2
63 | pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
64 |
65 | return pred_boxes
66 |
67 | def clip_boxes(boxes, im_shape):
68 | """
69 | Clip boxes to image boundaries.
70 | """
71 |
72 | # x1 >= 0
73 | boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
74 | # y1 >= 0
75 | boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
76 | # x2 < im_shape[1]
77 | boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
78 | # y2 < im_shape[0]
79 | boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
80 | return boxes
81 |
--------------------------------------------------------------------------------
/lib/fast_rcnn/config.py:
--------------------------------------------------------------------------------
1 | import os
2 | import os.path as osp
3 | import numpy as np
4 | from time import strftime, localtime
5 | from easydict import EasyDict as edict
6 |
7 | __C = edict()
8 | cfg = __C
9 |
10 | # Default GPU device id
11 | __C.GPU_ID = 0
12 |
13 | # Training options
14 | __C.IS_RPN = True
15 | __C.ANCHOR_SCALES = [16]
16 | __C.NCLASSES = 2
17 | __C.USE_GPU_NMS = True
18 | # multiscale training and testing
19 | __C.IS_MULTISCALE = False
20 | __C.IS_EXTRAPOLATING = True
21 |
22 | __C.REGION_PROPOSAL = 'RPN'
23 |
24 | __C.NET_NAME = 'VGGnet'
25 | __C.SUBCLS_NAME = 'voxel_exemplars'
26 |
27 | __C.TRAIN = edict()
28 | # Adam, Momentum, RMS
29 | __C.TRAIN.restore = 0
30 | __C.TRAIN.max_steps = 100000
31 | __C.TRAIN.SOLVER = 'Momentum'
32 | # learning rate
33 | __C.TRAIN.WEIGHT_DECAY = 0.0005
34 | __C.TRAIN.LEARNING_RATE = 0.001
35 | __C.TRAIN.MOMENTUM = 0.9
36 | __C.TRAIN.GAMMA = 0.1
37 | __C.TRAIN.STEPSIZE = 50000
38 | __C.TRAIN.DISPLAY = 10
39 | __C.TRAIN.LOG_IMAGE_ITERS = 100
40 | __C.TRAIN.OHEM = False
41 | __C.TRAIN.RANDOM_DOWNSAMPLE = False
42 |
43 | # Scales to compute real features
44 | __C.TRAIN.SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)
45 | __C.TRAIN.KERNEL_SIZE = 5
46 | __C.TRAIN.ASPECTS= (1,)
47 | __C.TRAIN.SCALES = (600,)
48 |
49 | # Max pixel size of the longest side of a scaled input image
50 | __C.TRAIN.MAX_SIZE = 1000
51 |
52 | # Images to use per minibatch
53 | __C.TRAIN.IMS_PER_BATCH = 2
54 |
55 | # Minibatch size (number of regions of interest [ROIs])
56 | __C.TRAIN.BATCH_SIZE = 128
57 |
58 | # Fraction of minibatch that is labeled foreground (i.e. class > 0)
59 | __C.TRAIN.FG_FRACTION = 0.25
60 |
61 | # Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH)
62 | __C.TRAIN.FG_THRESH = 0.5
63 |
64 | # Overlap threshold for a ROI to be considered background (class = 0 if
65 | # overlap in [LO, HI))
66 | __C.TRAIN.BG_THRESH_HI = 0.5
67 | __C.TRAIN.BG_THRESH_LO = 0.1
68 |
69 | # Use horizontally-flipped images during training?
70 | __C.TRAIN.USE_FLIPPED = True
71 |
72 | # Train bounding-box regressors
73 | __C.TRAIN.BBOX_REG = True
74 |
75 | # Overlap required between a ROI and ground-truth box in order for that ROI to
76 | # be used as a bounding-box regression training example
77 | __C.TRAIN.BBOX_THRESH = 0.5
78 |
79 | # Iterations between snapshots
80 | __C.TRAIN.SNAPSHOT_ITERS = 5000
81 |
82 | # solver.prototxt specifies the snapshot path prefix, this adds an optional
83 | # infix to yield the path: [_]_iters_XYZ.caffemodel
84 | __C.TRAIN.SNAPSHOT_PREFIX = 'VGGnet_fast_rcnn'
85 | __C.TRAIN.SNAPSHOT_INFIX = ''
86 |
87 | # Use a prefetch thread in roi_data_layer.layer
88 | # So far I haven't found this useful; likely more engineering work is required
89 | __C.TRAIN.USE_PREFETCH = False
90 |
91 | # Normalize the targets (subtract empirical mean, divide by empirical stddev)
92 | __C.TRAIN.BBOX_NORMALIZE_TARGETS = True
93 | # Deprecated (inside weights)
94 | # used for assigning weights for each coords (x1, y1, w, h)
95 | __C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
96 | # Normalize the targets using "precomputed" (or made up) means and stdevs
97 | # (BBOX_NORMALIZE_TARGETS must also be True)
98 | __C.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True
99 | __C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
100 | __C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
101 | # faster rcnn doesn't use rois pre-generated by selective search
102 | # __C.TRAIN.BBOX_NORMALIZE_STDS = (1, 1, 1, 1)
103 |
104 | # Train using these proposals
105 | __C.TRAIN.PROPOSAL_METHOD = 'selective_search'
106 |
107 | # Make minibatches from images that have similar aspect ratios (i.e. both
108 | # tall and thin or both short and wide) in order to avoid wasting computation
109 | # on zero-padding.
110 | __C.TRAIN.ASPECT_GROUPING = True
111 | # preclude rois intersected with dontcare areas above the value
112 | __C.TRAIN.DONTCARE_AREA_INTERSECTION_HI = 0.5
113 | __C.TRAIN.PRECLUDE_HARD_SAMPLES = True
114 | # Use RPN to detect objects
115 | __C.TRAIN.HAS_RPN = True
116 | # IOU >= thresh: positive example
117 | __C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
118 | # IOU < thresh: negative example
119 | __C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
120 | # If an anchor satisfies both the positive and the negative condition, set it to negative
121 | __C.TRAIN.RPN_CLOBBER_POSITIVES = False
122 | # Max number of foreground examples
123 | __C.TRAIN.RPN_FG_FRACTION = 0.5
124 | # Total number of examples
125 | __C.TRAIN.RPN_BATCHSIZE = 256
126 | # NMS threshold used on RPN proposals
127 | __C.TRAIN.RPN_NMS_THRESH = 0.7
128 | # Number of top scoring boxes to keep before apply NMS to RPN proposals
129 | __C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
130 | # Number of top scoring boxes to keep after applying NMS to RPN proposals
131 | __C.TRAIN.RPN_POST_NMS_TOP_N = 2000
132 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
133 | __C.TRAIN.RPN_MIN_SIZE = 8
134 | # Deprecated (outside weights)
135 | __C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
136 | # Give the positive RPN examples weight of p * 1 / {num positives}
137 | # and give negatives a weight of (1 - p)
138 | # Set to -1.0 to use uniform example weighting
139 | __C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0
140 | # __C.TRAIN.RPN_POSITIVE_WEIGHT = 0.5
141 |
142 |
143 | #
144 | # Testing options
145 | #
146 |
147 | __C.TEST = edict()
148 | __C.TEST.checkpoints_path = "checkpoints/"
149 | __C.TEST.DETECT_MODE = "H"#H/O for horizontal/oriented mode
150 | # Scales to use during testing (can list multiple scales)
151 | # Each scale is the pixel size of an image's shortest side
152 | __C.TEST.SCALES = (600,)
153 |
154 | # Max pixel size of the longest side of a scaled input image
155 | __C.TEST.MAX_SIZE = 1000
156 |
157 | # Overlap threshold used for non-maximum suppression (suppress boxes with
158 | # IoU >= this threshold)
159 | __C.TEST.NMS = 0.3
160 |
161 | # Experimental: treat the (K+1) units in the cls_score layer as linear
162 | # predictors (trained, eg, with one-vs-rest SVMs).
163 | __C.TEST.SVM = False
164 |
165 | # Test using bounding-box regressors
166 | __C.TEST.BBOX_REG = True
167 |
168 | # Propose boxes
169 | __C.TEST.HAS_RPN = True
170 |
171 | # Test using these proposals
172 | __C.TEST.PROPOSAL_METHOD = 'selective_search'
173 |
174 | ## NMS threshold used on RPN proposals
175 | __C.TEST.RPN_NMS_THRESH = 0.7
176 | ## Number of top scoring boxes to keep before apply NMS to RPN proposals
177 | #__C.TEST.RPN_PRE_NMS_TOP_N = 6000
178 | __C.TEST.RPN_PRE_NMS_TOP_N = 12000
179 | ## Number of top scoring boxes to keep after applying NMS to RPN proposals
180 | __C.TEST.RPN_POST_NMS_TOP_N = 1000
181 | #__C.TEST.RPN_POST_NMS_TOP_N = 2000
182 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
183 | __C.TEST.RPN_MIN_SIZE = 8
184 |
185 |
186 | #
187 | # MISC
188 | #
189 |
190 | # The mapping from image coordinates to feature map coordinates might cause
191 | # some boxes that are distinct in image space to become identical in feature
192 | # coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor
193 | # for identifying duplicate boxes.
194 | # 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
195 | __C.DEDUP_BOXES = 1./16.
196 |
197 | # Pixel mean values (BGR order) as a (1, 1, 3) array
198 | # We use the same pixel mean for all networks even though it's not exactly what
199 | # they were trained with
200 | __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
201 |
202 | # For reproducibility
203 | #__C.RNG_SEED = 3
204 | __C.RNG_SEED = 3
205 |
206 | # A small number that's used many times
207 | __C.EPS = 1e-14
208 |
209 | # Root directory of project
210 | __C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))
211 |
212 | # Data directory
213 | __C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data'))
214 |
215 | # Model directory
216 | __C.MODELS_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'models', 'pascal_voc'))
217 |
218 | # Name (or path to) the matlab executable
219 | __C.MATLAB = 'matlab'
220 |
221 | # Place outputs under an experiments directory
222 | __C.EXP_DIR = 'default'
223 | __C.LOG_DIR = 'default'
224 |
225 | # Use GPU implementation of non-maximum suppression
226 | __C.USE_GPU_NMS = True
227 |
228 |
229 |
230 | def get_output_dir(imdb, weights_filename):
231 | """Return the directory where experimental artifacts are placed.
232 | If the directory does not exist, it is created.
233 |
234 | A canonical path is built using the name from an imdb and a network
235 | (if not None).
236 | """
237 | outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name))
238 | if weights_filename is not None:
239 | outdir = osp.join(outdir, weights_filename)
240 | if not os.path.exists(outdir):
241 | os.makedirs(outdir)
242 | return outdir
243 |
244 | def get_log_dir(imdb):
245 | """Return the directory where experimental artifacts are placed.
246 | If the directory does not exist, it is created.
247 | A canonical path is built using the name from an imdb and a network
248 | (if not None).
249 | """
250 | log_dir = osp.abspath(\
251 | osp.join(__C.ROOT_DIR, 'logs', __C.LOG_DIR, imdb.name, strftime("%Y-%m-%d-%H-%M-%S", localtime())))
252 | if not os.path.exists(log_dir):
253 | os.makedirs(log_dir)
254 | return log_dir
255 |
256 | def _merge_a_into_b(a, b):
257 | """Merge config dictionary a into config dictionary b, clobbering the
258 | options in b whenever they are also specified in a.
259 | """
260 | if type(a) is not edict:
261 | return
262 |
263 | for k, v in a.items():
264 | # a must specify keys that are in b
265 | if k not in b:
266 | raise KeyError('{} is not a valid config key'.format(k))
267 |
268 | # the types must match, too
269 | old_type = type(b[k])
270 | if old_type is not type(v):
271 | if isinstance(b[k], np.ndarray):
272 | v = np.array(v, dtype=b[k].dtype)
273 | else:
274 | raise ValueError(('Type mismatch ({} vs. {}) '
275 | 'for config key: {}').format(type(b[k]),
276 | type(v), k))
277 |
278 | # recursively merge dicts
279 | if type(v) is edict:
280 | try:
281 | _merge_a_into_b(a[k], b[k])
282 | except:
283 | print(('Error under config key: {}'.format(k)))
284 | raise
285 | else:
286 | b[k] = v
287 |
288 | def cfg_from_file(filename):
289 | """Load a config file and merge it into the default options."""
290 | import yaml
291 | with open(filename, 'r') as f:
292 |         yaml_cfg = edict(yaml.load(f))  # note: newer PyYAML versions require an explicit Loader argument here
293 |
294 | _merge_a_into_b(yaml_cfg, __C)
295 |
296 | def cfg_from_list(cfg_list):
297 | """Set config keys via list (e.g., from command line)."""
298 | from ast import literal_eval
299 | assert len(cfg_list) % 2 == 0
300 | for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
301 | key_list = k.split('.')
302 | d = __C
303 | for subkey in key_list[:-1]:
304 | assert subkey in d
305 | d = d[subkey]
306 | subkey = key_list[-1]
307 | assert subkey in d
308 | try:
309 | value = literal_eval(v)
310 | except:
311 | # handle the case when v is a string literal
312 | value = v
313 | assert type(value) == type(d[subkey]), \
314 | 'type {} does not match original type {}'.format(
315 | type(value), type(d[subkey]))
316 | d[subkey] = value
317 |
--------------------------------------------------------------------------------
/lib/fast_rcnn/nms_wrapper.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from .config import cfg
3 | pure_python_nms = False
4 | try:
5 | from lib.utils.gpu_nms import gpu_nms
6 | from ..utils.cython_nms import nms as cython_nms
7 | except ImportError:
8 | pure_python_nms = True
9 |
10 |
11 | def nms(dets, thresh):
12 | if dets.shape[0] == 0:
13 | return []
14 | if pure_python_nms:
15 | # print("Fall back to pure python nms")
16 | return py_cpu_nms(dets, thresh)
17 | if cfg.USE_GPU_NMS:
18 | return gpu_nms(dets, thresh, device_id=cfg.GPU_ID)
19 | else:
20 | return cython_nms(dets, thresh)
21 |
22 |
23 | def py_cpu_nms(dets, thresh):
24 | x1 = dets[:, 0]
25 | y1 = dets[:, 1]
26 | x2 = dets[:, 2]
27 | y2 = dets[:, 3]
28 | scores = dets[:, 4]
29 |
30 | areas = (x2 - x1 + 1) * (y2 - y1 + 1)
31 | order = scores.argsort()[::-1]
32 |
33 | keep = []
34 | while order.size > 0:
35 | i = order[0]
36 | keep.append(i)
37 | xx1 = np.maximum(x1[i], x1[order[1:]])
38 | yy1 = np.maximum(y1[i], y1[order[1:]])
39 | xx2 = np.minimum(x2[i], x2[order[1:]])
40 | yy2 = np.minimum(y2[i], y2[order[1:]])
41 | w = np.maximum(0.0, xx2 - xx1 + 1)
42 | h = np.maximum(0.0, yy2 - yy1 + 1)
43 | inter = w * h
44 | ovr = inter / (areas[i] + areas[order[1:]] - inter)
45 | inds = np.where(ovr <= thresh)[0]
46 | order = order[inds + 1]
47 | return keep
48 |
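A self-contained sanity check of the pure-Python fallback above, using toy boxes in [x1, y1, x2, y2, score] form:

    import numpy as np
    from lib.fast_rcnn.nms_wrapper import py_cpu_nms

    dets = np.array([[10, 10, 50, 50, 0.9],
                     [12, 12, 52, 52, 0.8],      # IoU ~0.83 with the first box -> suppressed
                     [100, 100, 140, 140, 0.7]], dtype=np.float32)
    print(py_cpu_nms(dets, thresh=0.5))          # [0, 2]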
--------------------------------------------------------------------------------
/lib/fast_rcnn/test.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import cv2
3 | from .config import cfg
4 | from lib.utils.blob import im_list_to_blob
5 |
6 |
7 | def _get_image_blob(im):
8 | im_orig = im.astype(np.float32, copy=True)
9 | im_orig -= cfg.PIXEL_MEANS
10 |
11 | im_shape = im_orig.shape
12 | im_size_min = np.min(im_shape[0:2])
13 | im_size_max = np.max(im_shape[0:2])
14 |
15 | processed_ims = []
16 | im_scale_factors = []
17 |
18 | for target_size in cfg.TEST.SCALES:
19 | im_scale = float(target_size) / float(im_size_min)
20 | # Prevent the biggest axis from being more than MAX_SIZE
21 | if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
22 | im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
23 | im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
24 | interpolation=cv2.INTER_LINEAR)
25 | im_scale_factors.append(im_scale)
26 | processed_ims.append(im)
27 |
28 | # Create a blob to hold the input images
29 | blob = im_list_to_blob(processed_ims)
30 |
31 | return blob, np.array(im_scale_factors)
32 |
33 |
34 | def _get_blobs(im, rois):
35 | blobs = {'data' : None, 'rois' : None}
36 | blobs['data'], im_scale_factors = _get_image_blob(im)
37 | return blobs, im_scale_factors
38 |
39 |
40 | def test_ctpn(sess, net, im, boxes=None):
41 | blobs, im_scales = _get_blobs(im, boxes)
42 | if cfg.TEST.HAS_RPN:
43 | im_blob = blobs['data']
44 | blobs['im_info'] = np.array(
45 | [[im_blob.shape[1], im_blob.shape[2], im_scales[0]]],
46 | dtype=np.float32)
47 | # forward pass
48 | if cfg.TEST.HAS_RPN:
49 | feed_dict = {net.data: blobs['data'], net.im_info: blobs['im_info'], net.keep_prob: 1.0}
50 |
51 |         rois = sess.run([net.get_output('rois')[0]], feed_dict=feed_dict)
52 |         rois = rois[0]
53 |
54 | scores = rois[:, 0]
55 | if cfg.TEST.HAS_RPN:
56 | assert len(im_scales) == 1, "Only single-image batch implemented"
57 | boxes = rois[:, 1:5] / im_scales[0]
58 |         return scores, boxes
59 |
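A minimal sketch of how `test_ctpn` is typically driven, in the spirit of `ctpn/demo.py`; the checkpoint path is an assumption, not fixed by this file:

    import cv2
    import tensorflow as tf
    from lib.networks.factory import get_network
    from lib.fast_rcnn.test import test_ctpn

    net = get_network('VGGnet_test')
    with tf.Session() as sess:
        # hypothetical checkpoint path; substitute your own trained model
        tf.train.Saver().restore(sess, 'checkpoints/VGGnet_fast_rcnn_iter_50000.ckpt')
        im = cv2.imread('data/demo/006.jpg')
        scores, boxes = test_ctpn(sess, net, im)  # boxes come back in original-image coordinates
        print(scores.shape, boxes.shape)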
--------------------------------------------------------------------------------
/lib/fast_rcnn/train.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import numpy as np
3 | import os
4 | import tensorflow as tf
5 | from lib.roi_data_layer.layer import RoIDataLayer
6 | from lib.utils.timer import Timer
7 | from lib.roi_data_layer import roidb as rdl_roidb
8 | from lib.fast_rcnn.config import cfg
9 |
10 | _DEBUG = False
11 |
12 | class SolverWrapper(object):
13 | def __init__(self, sess, network, imdb, roidb, output_dir, logdir, pretrained_model=None):
14 | """Initialize the SolverWrapper."""
15 | self.net = network
16 | self.imdb = imdb
17 | self.roidb = roidb
18 | self.output_dir = output_dir
19 | self.pretrained_model = pretrained_model
20 |
21 | print('Computing bounding-box regression targets...')
22 | if cfg.TRAIN.BBOX_REG:
23 | self.bbox_means, self.bbox_stds = rdl_roidb.add_bbox_regression_targets(roidb)
24 | print('done')
25 |
26 | # For checkpoint
27 | self.saver = tf.train.Saver(max_to_keep=100,write_version=tf.train.SaverDef.V2)
28 | self.writer = tf.summary.FileWriter(logdir=logdir,
29 | graph=tf.get_default_graph(),
30 | flush_secs=5)
31 |
32 | def snapshot(self, sess, iter):
33 | net = self.net
34 | if cfg.TRAIN.BBOX_REG and 'bbox_pred' in net.layers and cfg.TRAIN.BBOX_NORMALIZE_TARGETS:
35 | # save original values
36 | with tf.variable_scope('bbox_pred', reuse=True):
37 | weights = tf.get_variable("weights")
38 | biases = tf.get_variable("biases")
39 |
40 | orig_0 = weights.eval()
41 | orig_1 = biases.eval()
42 |
43 | # scale and shift with bbox reg unnormalization; then save snapshot
44 | weights_shape = weights.get_shape().as_list()
45 | sess.run(weights.assign(orig_0 * np.tile(self.bbox_stds, (weights_shape[0],1))))
46 | sess.run(biases.assign(orig_1 * self.bbox_stds + self.bbox_means))
47 |
48 | if not os.path.exists(self.output_dir):
49 | os.makedirs(self.output_dir)
50 |
51 | infix = ('_' + cfg.TRAIN.SNAPSHOT_INFIX
52 | if cfg.TRAIN.SNAPSHOT_INFIX != '' else '')
53 | filename = (cfg.TRAIN.SNAPSHOT_PREFIX + infix +
54 | '_iter_{:d}'.format(iter+1) + '.ckpt')
55 | filename = os.path.join(self.output_dir, filename)
56 |
57 | self.saver.save(sess, filename)
58 | print('Wrote snapshot to: {:s}'.format(filename))
59 |
60 | if cfg.TRAIN.BBOX_REG and 'bbox_pred' in net.layers:
61 | # restore net to original state
62 | sess.run(weights.assign(orig_0))
63 | sess.run(biases.assign(orig_1))
64 |
65 | def build_image_summary(self):
66 |         # a small helper graph for writing image summaries
67 |
68 | log_image_data = tf.placeholder(tf.uint8, [None, None, 3])
69 | log_image_name = tf.placeholder(tf.string)
70 | # import tensorflow.python.ops.gen_logging_ops as logging_ops
71 | from tensorflow.python.ops import gen_logging_ops
72 | from tensorflow.python.framework import ops as _ops
73 | log_image = gen_logging_ops._image_summary(log_image_name, tf.expand_dims(log_image_data, 0), max_images=1)
74 | _ops.add_to_collection(_ops.GraphKeys.SUMMARIES, log_image)
75 | # log_image = tf.summary.image(log_image_name, tf.expand_dims(log_image_data, 0), max_outputs=1)
76 | return log_image, log_image_data, log_image_name
77 |
78 |
79 | def train_model(self, sess, max_iters, restore=False):
80 | """Network training loop."""
81 | data_layer = get_data_layer(self.roidb, self.imdb.num_classes)
82 | total_loss,model_loss, rpn_cross_entropy, rpn_loss_box=self.net.build_loss(ohem=cfg.TRAIN.OHEM)
83 | # scalar summary
84 | tf.summary.scalar('rpn_reg_loss', rpn_loss_box)
85 | tf.summary.scalar('rpn_cls_loss', rpn_cross_entropy)
86 | tf.summary.scalar('model_loss', model_loss)
87 | tf.summary.scalar('total_loss',total_loss)
88 | summary_op = tf.summary.merge_all()
89 |
90 | log_image, log_image_data, log_image_name =\
91 | self.build_image_summary()
92 |
93 | # optimizer
94 | lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False)
95 |         if cfg.TRAIN.SOLVER == 'Adam':
96 |             opt = tf.train.AdamOptimizer(lr)  # use the decaying lr variable so STEPSIZE decay takes effect
97 |         elif cfg.TRAIN.SOLVER == 'RMS':
98 |             opt = tf.train.RMSPropOptimizer(lr)
99 | else:
100 | # lr = tf.Variable(0.0, trainable=False)
101 | momentum = cfg.TRAIN.MOMENTUM
102 | opt = tf.train.MomentumOptimizer(lr, momentum)
103 |
104 | global_step = tf.Variable(0, trainable=False)
105 | with_clip = True
106 | if with_clip:
107 | tvars = tf.trainable_variables()
108 | grads, norm = tf.clip_by_global_norm(tf.gradients(total_loss, tvars), 10.0)
109 | train_op = opt.apply_gradients(list(zip(grads, tvars)), global_step=global_step)
110 | else:
111 | train_op = opt.minimize(total_loss, global_step=global_step)
112 |
113 |         # initialize variables
114 | sess.run(tf.global_variables_initializer())
115 | restore_iter = 0
116 |
117 | # load vgg16
118 | if self.pretrained_model is not None and not restore:
119 | try:
120 | print(('Loading pretrained model '
121 | 'weights from {:s}').format(self.pretrained_model))
122 | self.net.load(self.pretrained_model, sess, True)
123 | except:
124 | raise Exception('Check your pretrained model {:s}'.format(self.pretrained_model))
125 |
126 |         # resume training from the latest checkpoint
127 | if restore:
128 | try:
129 | ckpt = tf.train.get_checkpoint_state(self.output_dir)
130 | print('Restoring from {}...'.format(ckpt.model_checkpoint_path), end=' ')
131 | self.saver.restore(sess, ckpt.model_checkpoint_path)
132 | stem = os.path.splitext(os.path.basename(ckpt.model_checkpoint_path))[0]
133 | restore_iter = int(stem.split('_')[-1])
134 | sess.run(global_step.assign(restore_iter))
135 | print('done')
136 | except:
137 |                 raise Exception('Check your pretrained model: {:s}'.format(ckpt.model_checkpoint_path))  # raising a bare string is invalid in Python 3
138 |
139 | last_snapshot_iter = -1
140 | timer = Timer()
141 | for iter in range(restore_iter, max_iters):
142 | timer.tic()
143 | # learning rate
144 | if iter != 0 and iter % cfg.TRAIN.STEPSIZE == 0:
145 | sess.run(tf.assign(lr, lr.eval() * cfg.TRAIN.GAMMA))
146 |                 print('learning rate decayed to {:e}'.format(lr.eval()))
147 |
148 | # get one batch
149 | blobs = data_layer.forward()
150 |
151 | feed_dict={
152 | self.net.data: blobs['data'],
153 | self.net.im_info: blobs['im_info'],
154 | self.net.keep_prob: 0.5,
155 | self.net.gt_boxes: blobs['gt_boxes'],
156 | self.net.gt_ishard: blobs['gt_ishard'],
157 | self.net.dontcare_areas: blobs['dontcare_areas']
158 | }
159 | res_fetches=[]
160 | fetch_list = [total_loss,model_loss, rpn_cross_entropy, rpn_loss_box,
161 | summary_op,
162 | train_op] + res_fetches
163 |
164 | total_loss_val,model_loss_val, rpn_loss_cls_val, rpn_loss_box_val, \
165 | summary_str, _ = sess.run(fetches=fetch_list, feed_dict=feed_dict)
166 |
167 | self.writer.add_summary(summary=summary_str, global_step=global_step.eval())
168 |
169 | _diff_time = timer.toc(average=False)
170 |
171 |
172 | if (iter) % (cfg.TRAIN.DISPLAY) == 0:
173 | print('iter: %d / %d, total loss: %.4f, model loss: %.4f, rpn_loss_cls: %.4f, rpn_loss_box: %.4f, lr: %f'%\
174 | (iter, max_iters, total_loss_val,model_loss_val,rpn_loss_cls_val,rpn_loss_box_val,lr.eval()))
175 | print('speed: {:.3f}s / iter'.format(_diff_time))
176 |
177 | if (iter+1) % cfg.TRAIN.SNAPSHOT_ITERS == 0:
178 | last_snapshot_iter = iter
179 | self.snapshot(sess, iter)
180 |
181 | if last_snapshot_iter != iter:
182 | self.snapshot(sess, iter)
183 |
184 | def get_training_roidb(imdb):
185 | """Returns a roidb (Region of Interest database) for use in training."""
186 | if cfg.TRAIN.USE_FLIPPED:
187 | print('Appending horizontally-flipped training examples...')
188 | imdb.append_flipped_images()
189 | print('done')
190 |
191 | print('Preparing training data...')
192 |     if cfg.TRAIN.HAS_RPN:
193 |         rdl_roidb.prepare_roidb(imdb)
194 |     else:  # both branches prepare the roidb the same way; kept for parity with Fast R-CNN
195 |         rdl_roidb.prepare_roidb(imdb)
196 | print('done')
197 |
198 | return imdb.roidb
199 |
200 |
201 | def get_data_layer(roidb, num_classes):
202 | """return a data layer."""
203 | if cfg.TRAIN.HAS_RPN:
204 | if cfg.IS_MULTISCALE:
205 | # obsolete
206 | # layer = GtDataLayer(roidb)
207 | raise "Calling caffe modules..."
208 | else:
209 | layer = RoIDataLayer(roidb, num_classes)
210 | else:
211 | layer = RoIDataLayer(roidb, num_classes)
212 |
213 | return layer
214 |
215 |
216 |
217 | def train_net(network, imdb, roidb, output_dir, log_dir, pretrained_model=None, max_iters=40000, restore=False):
218 | """Train a Fast R-CNN network."""
219 |
220 | config = tf.ConfigProto(allow_soft_placement=True)
221 | config.gpu_options.allocator_type = 'BFC'
222 | config.gpu_options.per_process_gpu_memory_fraction = 0.75
223 | with tf.Session(config=config) as sess:
224 | sw = SolverWrapper(sess, network, imdb, roidb, output_dir, logdir= log_dir, pretrained_model=pretrained_model)
225 | print('Solving...')
226 | sw.train_model(sess, max_iters, restore=restore)
227 | print('done solving')
228 |
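A condensed training driver in the spirit of `ctpn/train_net.py`; the imdb name and pretrained-model path below are assumptions:

    from lib.fast_rcnn.config import cfg_from_file, get_output_dir, get_log_dir
    from lib.fast_rcnn.train import get_training_roidb, train_net
    from lib.datasets.factory import get_imdb
    from lib.networks.factory import get_network

    cfg_from_file('ctpn/text.yml')
    imdb = get_imdb('voc_2007_trainval')          # assumes the VOC-formatted data produced by ToVoc.py
    roidb = get_training_roidb(imdb)
    train_net(get_network('VGGnet_train'), imdb, roidb,
              output_dir=get_output_dir(imdb, None),
              log_dir=get_log_dir(imdb),
              pretrained_model='data/pretrain/VGG_imagenet.npy',  # hypothetical path
              max_iters=50000, restore=False)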
--------------------------------------------------------------------------------
/lib/networks/VGGnet_test.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from .network import Network
3 | from lib.fast_rcnn.config import cfg
4 |
5 |
6 | class VGGnet_test(Network):
7 | def __init__(self, trainable=True):
8 | self.inputs = []
9 | self.data = tf.placeholder(tf.float32, shape=[None, None, None, 3])
10 | self.im_info = tf.placeholder(tf.float32, shape=[None, 3])
11 | self.keep_prob = tf.placeholder(tf.float32)
12 | self.layers = dict({'data': self.data, 'im_info': self.im_info})
13 | self.trainable = trainable
14 | self.setup()
15 |
16 | def setup(self):
17 | anchor_scales = cfg.ANCHOR_SCALES
18 | _feat_stride = [16, ]
19 |
20 | (self.feed('data')
21 | .conv(3, 3, 64, 1, 1, name='conv1_1')
22 | .conv(3, 3, 64, 1, 1, name='conv1_2')
23 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool1')
24 | .conv(3, 3, 128, 1, 1, name='conv2_1')
25 | .conv(3, 3, 128, 1, 1, name='conv2_2')
26 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool2')
27 | .conv(3, 3, 256, 1, 1, name='conv3_1')
28 | .conv(3, 3, 256, 1, 1, name='conv3_2')
29 | .conv(3, 3, 256, 1, 1, name='conv3_3')
30 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool3')
31 | .conv(3, 3, 512, 1, 1, name='conv4_1')
32 | .conv(3, 3, 512, 1, 1, name='conv4_2')
33 | .conv(3, 3, 512, 1, 1, name='conv4_3')
34 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool4')
35 | .conv(3, 3, 512, 1, 1, name='conv5_1')
36 | .conv(3, 3, 512, 1, 1, name='conv5_2')
37 | .conv(3, 3, 512, 1, 1, name='conv5_3'))
38 |
39 | (self.feed('conv5_3').conv(3, 3, 512, 1, 1, name='rpn_conv/3x3'))
40 |
41 | (self.feed('rpn_conv/3x3').Bilstm(512, 128, 512, name='lstm_o'))
42 | (self.feed('lstm_o').lstm_fc(512, len(anchor_scales) * 10 * 4, name='rpn_bbox_pred'))
43 | (self.feed('lstm_o').lstm_fc(512, len(anchor_scales) * 10 * 2, name='rpn_cls_score'))
44 |
45 | # shape is (1, H, W, Ax2) -> (1, H, WxA, 2)
46 | (self.feed('rpn_cls_score')
47 | .spatial_reshape_layer(2, name='rpn_cls_score_reshape')
48 | .spatial_softmax(name='rpn_cls_prob'))
49 |
50 | # shape is (1, H, WxA, 2) -> (1, H, W, Ax2)
51 | (self.feed('rpn_cls_prob')
52 | .spatial_reshape_layer(len(anchor_scales) * 10 * 2, name='rpn_cls_prob_reshape'))
53 |
54 | (self.feed('rpn_cls_prob_reshape', 'rpn_bbox_pred', 'im_info')
55 | .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois'))
56 |
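The factor of 10 in the `lstm_fc` output widths is CTPN's fixed count of vertical anchors per feature-map position; a quick back-of-envelope for the tensor sizes, assuming a toy 600x900 input:

    # VGG conv5_3 downsamples by _feat_stride = 16
    H_img, W_img, stride, A = 600, 900, 16, 10
    H, W = H_img // stride, W_img // stride      # 37 x 56 feature map
    print((1, H, W, A * 4))                      # rpn_bbox_pred: (1, 37, 56, 40)
    print((1, H, W, A * 2))                      # rpn_cls_score: (1, 37, 56, 20)
    print(H * W * A)                             # 20720 anchors before the proposal layer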
--------------------------------------------------------------------------------
/lib/networks/VGGnet_train.py:
--------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | import tensorflow as tf
3 | from .network import Network
4 | from lib.fast_rcnn.config import cfg
5 |
6 | class VGGnet_train(Network):
7 | def __init__(self, trainable=True):
8 | self.inputs = []
9 | self.data = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='data')
10 | self.im_info = tf.placeholder(tf.float32, shape=[None, 3], name='im_info')
11 | self.gt_boxes = tf.placeholder(tf.float32, shape=[None, 5], name='gt_boxes')
12 | self.gt_ishard = tf.placeholder(tf.int32, shape=[None], name='gt_ishard')
13 | self.dontcare_areas = tf.placeholder(tf.float32, shape=[None, 4], name='dontcare_areas')
14 | self.keep_prob = tf.placeholder(tf.float32)
15 | self.layers = dict({'data':self.data, 'im_info':self.im_info, 'gt_boxes':self.gt_boxes,\
16 | 'gt_ishard': self.gt_ishard, 'dontcare_areas': self.dontcare_areas})
17 | self.trainable = trainable
18 | self.setup()
19 |
20 | def setup(self):
21 |
22 | # n_classes = 21
23 | n_classes = cfg.NCLASSES
24 | # anchor_scales = [8, 16, 32]
25 | anchor_scales = cfg.ANCHOR_SCALES
26 | _feat_stride = [16, ]
27 |
28 | (self.feed('data')
29 | .conv(3, 3, 64, 1, 1, name='conv1_1')
30 | .conv(3, 3, 64, 1, 1, name='conv1_2')
31 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool1')
32 | .conv(3, 3, 128, 1, 1, name='conv2_1')
33 | .conv(3, 3, 128, 1, 1, name='conv2_2')
34 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool2')
35 | .conv(3, 3, 256, 1, 1, name='conv3_1')
36 | .conv(3, 3, 256, 1, 1, name='conv3_2')
37 | .conv(3, 3, 256, 1, 1, name='conv3_3')
38 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool3')
39 | .conv(3, 3, 512, 1, 1, name='conv4_1')
40 | .conv(3, 3, 512, 1, 1, name='conv4_2')
41 | .conv(3, 3, 512, 1, 1, name='conv4_3')
42 | .max_pool(2, 2, 2, 2, padding='VALID', name='pool4')
43 | .conv(3, 3, 512, 1, 1, name='conv5_1')
44 | .conv(3, 3, 512, 1, 1, name='conv5_2')
45 | .conv(3, 3, 512, 1, 1, name='conv5_3'))
46 | #========= RPN ============
47 | (self.feed('conv5_3')
48 | .conv(3,3,512,1,1,name='rpn_conv/3x3'))
49 |
50 | (self.feed('rpn_conv/3x3').Bilstm(512,128,512,name='lstm_o'))
51 | (self.feed('lstm_o').lstm_fc(512,len(anchor_scales) * 10 * 4, name='rpn_bbox_pred'))
52 | (self.feed('lstm_o').lstm_fc(512,len(anchor_scales) * 10 * 2,name='rpn_cls_score'))
53 |
54 | # generating training labels on the fly
55 | # output: rpn_labels(HxWxA, 2) rpn_bbox_targets(HxWxA, 4) rpn_bbox_inside_weights rpn_bbox_outside_weights
56 |         # assign each anchor a label and compute its regression targets (as deltas), plus inside/outside weights
57 | (self.feed('rpn_cls_score', 'gt_boxes', 'gt_ishard', 'dontcare_areas', 'im_info')
58 | .anchor_target_layer(_feat_stride, anchor_scales, name = 'rpn-data' ))
59 |
60 | # shape is (1, H, W, Ax2) -> (1, H, WxA, 2)
61 |         # apply softmax to the scores above, yielding per-anchor probabilities in [0, 1]
62 | (self.feed('rpn_cls_score')
63 | .spatial_reshape_layer(2, name = 'rpn_cls_score_reshape')
64 | .spatial_softmax(name='rpn_cls_prob'))
65 |
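The (1, H, W, Ax2) -> (1, H, WxA, 2) move performed by `spatial_reshape_layer` is a plain row-major reshape, so each anchor's two class scores stay adjacent; a toy numpy check:

    import numpy as np

    H, W, A = 2, 3, 10
    x = np.arange(H * W * A * 2).reshape(1, H, W, A * 2)
    y = x.reshape(1, H, W * A, 2)                # what spatial_reshape_layer(2) produces
    assert (y[0, 0, 0] == x[0, 0, 0, :2]).all()  # anchor 0's two scores stay paired
    print(y.shape)                               # (1, 2, 30, 2)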
--------------------------------------------------------------------------------
/lib/networks/__init__.py:
--------------------------------------------------------------------------------
1 | from .VGGnet_train import VGGnet_train
2 | from .VGGnet_test import VGGnet_test
3 | from . import factory
4 |
--------------------------------------------------------------------------------
/lib/networks/factory.py:
--------------------------------------------------------------------------------
1 | from .VGGnet_test import VGGnet_test
2 | from .VGGnet_train import VGGnet_train
3 |
4 | def get_network(name):
5 | """Get a network by name."""
6 | if name.split('_')[0] == 'VGGnet':
7 | if name.split('_')[1] == 'test':
8 | return VGGnet_test()
9 | elif name.split('_')[1] == 'train':
10 | return VGGnet_train()
11 | else:
12 |             raise KeyError('Unknown network: {}'.format(name))
13 | else:
14 |         raise KeyError('Unknown network: {}'.format(name))
15 |
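Usage is a one-liner; the two recognized names mirror the class names:

    from lib.networks.factory import get_network

    net = get_network('VGGnet_test')   # builds the inference graph
    # get_network('VGGnet_train') builds the training graph with label placeholders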
--------------------------------------------------------------------------------
/lib/networks/network.py:
--------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | import numpy as np
3 | import tensorflow as tf
4 | from lib.fast_rcnn.config import cfg
5 | from lib.rpn_msr.proposal_layer_tf import proposal_layer as proposal_layer_py
6 | from lib.rpn_msr.anchor_target_layer_tf import anchor_target_layer as anchor_target_layer_py
7 | DEFAULT_PADDING = 'SAME'
8 |
9 | def layer(op):
10 | def layer_decorated(self, *args, **kwargs):
11 | # Automatically set a name if not provided.
12 | name = kwargs.setdefault('name', self.get_unique_name(op.__name__))
13 | # Figure out the layer inputs.
14 | if len(self.inputs)==0:
15 | raise RuntimeError('No input variables found for layer %s.'%name)
16 | elif len(self.inputs)==1:
17 | layer_input = self.inputs[0]
18 | else:
19 | layer_input = list(self.inputs)
20 | # Perform the operation and get the output.
21 | layer_output = op(self, layer_input, *args, **kwargs)
22 | # Add to layer LUT.
23 | self.layers[name] = layer_output
24 | # This output is now the input for the next layer.
25 | self.feed(layer_output)
26 | # Return self for chained calls.
27 | return self
28 | return layer_decorated
29 |
30 | class Network(object):
31 | def __init__(self, inputs, trainable=True):
32 | self.inputs = []
33 | self.layers = dict(inputs)
34 | self.trainable = trainable
35 | self.setup()
36 |
37 | def setup(self):
38 | raise NotImplementedError('Must be subclassed.')
39 |
40 | def load(self, data_path, session, ignore_missing=False):
41 |         data_dict = np.load(data_path, encoding='latin1', allow_pickle=True).item()  # allow_pickle required on numpy >= 1.16.3
42 | for key in data_dict:
43 | with tf.variable_scope(key, reuse=True):
44 | for subkey in data_dict[key]:
45 | try:
46 | var = tf.get_variable(subkey)
47 | session.run(var.assign(data_dict[key][subkey]))
48 | print("assign pretrain model "+subkey+ " to "+key)
49 | except ValueError:
50 | print("ignore "+key)
51 |                     if not ignore_missing:
52 |                         raise
53 |
54 |
55 | def feed(self, *args):
56 | assert len(args)!=0
57 | self.inputs = []
58 | for layer in args:
59 | if isinstance(layer, str):
60 | try:
61 | layer = self.layers[layer]
62 | print(layer)
63 | except KeyError:
64 | print(list(self.layers.keys()))
65 | raise KeyError('Unknown layer name fed: %s'%layer)
66 | self.inputs.append(layer)
67 | return self
68 |
69 | def get_output(self, layer):
70 | try:
71 | layer = self.layers[layer]
72 | except KeyError:
73 | print(list(self.layers.keys()))
74 | raise KeyError('Unknown layer name fed: %s'%layer)
75 | return layer
76 |
77 | def get_unique_name(self, prefix):
78 | id = sum(t.startswith(prefix) for t,_ in list(self.layers.items()))+1
79 | return '%s_%d'%(prefix, id)
80 |
81 | def make_var(self, name, shape, initializer=None, trainable=True, regularizer=None):
82 | return tf.get_variable(name, shape, initializer=initializer, trainable=trainable, regularizer=regularizer)
83 |
84 | def validate_padding(self, padding):
85 | assert padding in ('SAME', 'VALID')
86 |
87 |
88 | @layer
89 | def Bilstm(self, input, d_i, d_h, d_o, name, trainable=True):
90 | img = input
91 | with tf.variable_scope(name) as scope:
92 | shape = tf.shape(img)
93 | N, H, W, C = shape[0], shape[1], shape[2], shape[3]
94 | img = tf.reshape(img, [N * H, W, C])
95 | img.set_shape([None, None, d_i])
96 |
97 | lstm_fw_cell = tf.contrib.rnn.LSTMCell(d_h, state_is_tuple=True)
98 | lstm_bw_cell = tf.contrib.rnn.LSTMCell(d_h, state_is_tuple=True)
99 |
100 | lstm_out, last_state = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell,lstm_bw_cell, img, dtype=tf.float32)
101 | lstm_out = tf.concat(lstm_out, axis=-1)
102 |
103 | lstm_out = tf.reshape(lstm_out, [N * H * W, 2*d_h])
104 |
105 | init_weights = tf.truncated_normal_initializer(stddev=0.1)
106 | init_biases = tf.constant_initializer(0.0)
107 | weights = self.make_var('weights', [2*d_h, d_o], init_weights, trainable, \
108 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY))
109 | biases = self.make_var('biases', [d_o], init_biases, trainable)
110 | outputs = tf.matmul(lstm_out, weights) + biases
111 |
112 | outputs = tf.reshape(outputs, [N, H, W, d_o])
113 | return outputs
114 |
115 | @layer
116 | def lstm(self, input, d_i,d_h,d_o, name, trainable=True):
117 | img = input
118 | with tf.variable_scope(name) as scope:
119 | shape = tf.shape(img)
120 | N,H,W,C = shape[0], shape[1],shape[2], shape[3]
121 | img = tf.reshape(img,[N*H,W,C])
122 | img.set_shape([None,None,d_i])
123 |
124 | lstm_cell = tf.contrib.rnn.LSTMCell(d_h, state_is_tuple=True)
125 | initial_state = lstm_cell.zero_state(N*H, dtype=tf.float32)
126 |
127 | lstm_out, last_state = tf.nn.dynamic_rnn(lstm_cell, img,
128 | initial_state=initial_state,dtype=tf.float32)
129 |
130 | lstm_out = tf.reshape(lstm_out,[N*H*W,d_h])
131 |
132 |
133 | init_weights = tf.truncated_normal_initializer(stddev=0.1)
134 | init_biases = tf.constant_initializer(0.0)
135 | weights = self.make_var('weights', [d_h, d_o], init_weights, trainable, \
136 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY))
137 | biases = self.make_var('biases', [d_o], init_biases, trainable)
138 | outputs = tf.matmul(lstm_out, weights) + biases
139 |
140 |
141 | outputs = tf.reshape(outputs, [N,H,W,d_o])
142 | return outputs
143 |
144 | @layer
145 | def lstm_fc(self, input, d_i, d_o, name, trainable=True):
146 | with tf.variable_scope(name) as scope:
147 | shape = tf.shape(input)
148 | N, H, W, C = shape[0], shape[1], shape[2], shape[3]
149 | input = tf.reshape(input, [N*H*W,C])
150 |
151 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.01)
152 | init_biases = tf.constant_initializer(0.0)
153 | kernel = self.make_var('weights', [d_i, d_o], init_weights, trainable,
154 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY))
155 | biases = self.make_var('biases', [d_o], init_biases, trainable)
156 |
157 | _O = tf.matmul(input, kernel) + biases
158 | return tf.reshape(_O, [N, H, W, int(d_o)])
159 |
160 | @layer
161 | def conv(self, input, k_h, k_w, c_o, s_h, s_w, name, biased=True,relu=True, padding=DEFAULT_PADDING, trainable=True):
162 | """ contribution by miraclebiu, and biased option"""
163 | self.validate_padding(padding)
164 | c_i = input.get_shape()[-1]
165 | convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
166 | with tf.variable_scope(name) as scope:
167 |
168 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.01)
169 | init_biases = tf.constant_initializer(0.0)
170 | kernel = self.make_var('weights', [k_h, k_w, c_i, c_o], init_weights, trainable, \
171 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY))
172 | if biased:
173 | biases = self.make_var('biases', [c_o], init_biases, trainable)
174 | conv = convolve(input, kernel)
175 | if relu:
176 | bias = tf.nn.bias_add(conv, biases)
177 | return tf.nn.relu(bias, name=scope.name)
178 | return tf.nn.bias_add(conv, biases, name=scope.name)
179 | else:
180 | conv = convolve(input, kernel)
181 | if relu:
182 | return tf.nn.relu(conv, name=scope.name)
183 | return conv
184 |
185 | @layer
186 | def relu(self, input, name):
187 | return tf.nn.relu(input, name=name)
188 |
189 | @layer
190 | def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING):
191 | self.validate_padding(padding)
192 | return tf.nn.max_pool(input,
193 | ksize=[1, k_h, k_w, 1],
194 | strides=[1, s_h, s_w, 1],
195 | padding=padding,
196 | name=name)
197 |
198 | @layer
199 | def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING):
200 | self.validate_padding(padding)
201 | return tf.nn.avg_pool(input,
202 | ksize=[1, k_h, k_w, 1],
203 | strides=[1, s_h, s_w, 1],
204 | padding=padding,
205 | name=name)
206 |
207 | @layer
208 | def proposal_layer(self, input, _feat_stride, anchor_scales, cfg_key, name):
209 | if isinstance(input[0], tuple):
210 | input[0] = input[0][0]
211 | # input[0] shape is (1, H, W, Ax2)
212 | # rpn_rois <- (1 x H x W x A, 5) [0, x1, y1, x2, y2]
213 | with tf.variable_scope(name) as scope:
214 | blob,bbox_delta = tf.py_func(proposal_layer_py,[input[0],input[1],input[2], cfg_key, _feat_stride, anchor_scales],\
215 | [tf.float32,tf.float32])
216 |
217 |             rpn_rois = tf.convert_to_tensor(tf.reshape(blob,[-1, 5]), name = 'rpn_rois') # shape is (1 x H x W x A, 5)
218 | rpn_targets = tf.convert_to_tensor(bbox_delta, name = 'rpn_targets') # shape is (1 x H x W x A, 4)
219 | self.layers['rpn_rois'] = rpn_rois
220 | self.layers['rpn_targets'] = rpn_targets
221 |
222 | return rpn_rois, rpn_targets
223 |
224 |
225 | @layer
226 | def anchor_target_layer(self, input, _feat_stride, anchor_scales, name):
227 | if isinstance(input[0], tuple):
228 | input[0] = input[0][0]
229 |
230 | with tf.variable_scope(name) as scope:
231 | # 'rpn_cls_score', 'gt_boxes', 'gt_ishard', 'dontcare_areas', 'im_info'
232 | rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights = \
233 | tf.py_func(anchor_target_layer_py,
234 | [input[0],input[1],input[2],input[3],input[4], _feat_stride, anchor_scales],
235 | [tf.float32,tf.float32,tf.float32,tf.float32])
236 |
237 | rpn_labels = tf.convert_to_tensor(tf.cast(rpn_labels,tf.int32), name = 'rpn_labels') # shape is (1 x H x W x A, 2)
238 | rpn_bbox_targets = tf.convert_to_tensor(rpn_bbox_targets, name = 'rpn_bbox_targets') # shape is (1 x H x W x A, 4)
239 | rpn_bbox_inside_weights = tf.convert_to_tensor(rpn_bbox_inside_weights , name = 'rpn_bbox_inside_weights') # shape is (1 x H x W x A, 4)
240 | rpn_bbox_outside_weights = tf.convert_to_tensor(rpn_bbox_outside_weights , name = 'rpn_bbox_outside_weights') # shape is (1 x H x W x A, 4)
241 |
242 |
243 | return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
244 |
245 | @layer
246 | def reshape_layer(self, input, d, name):
247 | input_shape = tf.shape(input)
248 | if name == 'rpn_cls_prob_reshape':
249 | #
250 | # transpose: (1, AxH, W, 2) -> (1, 2, AxH, W)
251 | # reshape: (1, 2xA, H, W)
252 | # transpose: -> (1, H, W, 2xA)
253 | return tf.transpose(tf.reshape(tf.transpose(input,[0,3,1,2]),
254 | [ input_shape[0],
255 | int(d),
256 | tf.cast(tf.cast(input_shape[1],tf.float32)/tf.cast(d,tf.float32)*tf.cast(input_shape[3],tf.float32),tf.int32),
257 | input_shape[2]
258 | ]),
259 | [0,2,3,1],name=name)
260 | else:
261 | return tf.transpose(tf.reshape(tf.transpose(input,[0,3,1,2]),
262 | [ input_shape[0],
263 | int(d),
264 | tf.cast(tf.cast(input_shape[1],tf.float32)*(tf.cast(input_shape[3],tf.float32)/tf.cast(d,tf.float32)),tf.int32),
265 | input_shape[2]
266 | ]),
267 | [0,2,3,1],name=name)
268 |
269 | @layer
270 | def spatial_reshape_layer(self, input, d, name):
271 | input_shape = tf.shape(input)
272 | # transpose: (1, H, W, A x d) -> (1, H, WxA, d)
273 | return tf.reshape(input,\
274 | [input_shape[0],\
275 | input_shape[1], \
276 | -1,\
277 | int(d)])
278 |
279 |
280 | @layer
281 | def lrn(self, input, radius, alpha, beta, name, bias=1.0):
282 | return tf.nn.local_response_normalization(input,
283 | depth_radius=radius,
284 | alpha=alpha,
285 | beta=beta,
286 | bias=bias,
287 | name=name)
288 |
289 | @layer
290 | def concat(self, inputs, axis, name):
291 |         return tf.concat(values=inputs, axis=axis, name=name)  # concat_dim was renamed to axis in TF 1.x
292 |
293 | @layer
294 | def fc(self, input, num_out, name, relu=True, trainable=True):
295 | with tf.variable_scope(name) as scope:
296 | # only use the first input
297 | if isinstance(input, tuple):
298 | input = input[0]
299 |
300 | input_shape = input.get_shape()
301 | if input_shape.ndims == 4:
302 | dim = 1
303 | for d in input_shape[1:].as_list():
304 | dim *= d
305 | feed_in = tf.reshape(tf.transpose(input,[0,3,1,2]), [-1, dim])
306 | else:
307 | feed_in, dim = (input, int(input_shape[-1]))
308 |
309 | if name == 'bbox_pred':
310 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.001)
311 | init_biases = tf.constant_initializer(0.0)
312 | else:
313 | init_weights = tf.truncated_normal_initializer(0.0, stddev=0.01)
314 | init_biases = tf.constant_initializer(0.0)
315 |
316 | weights = self.make_var('weights', [dim, num_out], init_weights, trainable, \
317 | regularizer=self.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY))
318 | biases = self.make_var('biases', [num_out], init_biases, trainable)
319 |
320 | op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b
321 | fc = op(feed_in, weights, biases, name=scope.name)
322 | return fc
323 |
324 | @layer
325 | def softmax(self, input, name):
326 | input_shape = tf.shape(input)
327 | if name == 'rpn_cls_prob':
328 | return tf.reshape(tf.nn.softmax(tf.reshape(input,[-1,input_shape[3]])),[-1,input_shape[1],input_shape[2],input_shape[3]],name=name)
329 | else:
330 | return tf.nn.softmax(input,name=name)
331 |
332 | @layer
333 | def spatial_softmax(self, input, name):
334 | input_shape = tf.shape(input)
335 | # d = input.get_shape()[-1]
336 | return tf.reshape(tf.nn.softmax(tf.reshape(input, [-1, input_shape[3]])),
337 | [-1, input_shape[1], input_shape[2], input_shape[3]], name=name)
338 |
339 | @layer
340 | def add(self,input,name):
341 | """contribution by miraclebiu"""
342 | return tf.add(input[0],input[1])
343 |
344 | @layer
345 | def batch_normalization(self,input,name,relu=True,is_training=False):
346 | """contribution by miraclebiu"""
347 | if relu:
348 | temp_layer=tf.contrib.layers.batch_norm(input,scale=True,center=True,is_training=is_training,scope=name)
349 | return tf.nn.relu(temp_layer)
350 | else:
351 | return tf.contrib.layers.batch_norm(input,scale=True,center=True,is_training=is_training,scope=name)
352 |
353 | @layer
354 | def dropout(self, input, keep_prob, name):
355 | return tf.nn.dropout(input, keep_prob, name=name)
356 |
357 | def l2_regularizer(self, weight_decay=0.0005, scope=None):
358 | def regularizer(tensor):
359 | with tf.name_scope(scope, default_name='l2_regularizer', values=[tensor]):
360 | l2_weight = tf.convert_to_tensor(weight_decay,
361 | dtype=tensor.dtype.base_dtype,
362 | name='weight_decay')
363 | #return tf.mul(l2_weight, tf.nn.l2_loss(tensor), name='value')
364 | return tf.multiply(l2_weight, tf.nn.l2_loss(tensor), name='value')
365 | return regularizer
366 |
367 | def smooth_l1_dist(self, deltas, sigma2=9.0, name='smooth_l1_dist'):
368 | with tf.name_scope(name=name) as scope:
369 | deltas_abs = tf.abs(deltas)
370 | smoothL1_sign = tf.cast(tf.less(deltas_abs, 1.0/sigma2), tf.float32)
371 | return tf.square(deltas) * 0.5 * sigma2 * smoothL1_sign + \
372 | (deltas_abs - 0.5 / sigma2) * tf.abs(smoothL1_sign - 1)
373 |
374 |
375 |
376 | def build_loss(self, ohem=False):
377 | # classification loss
378 | rpn_cls_score = tf.reshape(self.get_output('rpn_cls_score_reshape'), [-1, 2]) # shape (HxWxA, 2)
379 | rpn_label = tf.reshape(self.get_output('rpn-data')[0], [-1]) # shape (HxWxA)
380 | # ignore_label(-1)
381 | fg_keep = tf.equal(rpn_label, 1)
382 | rpn_keep = tf.where(tf.not_equal(rpn_label, -1))
383 | rpn_cls_score = tf.gather(rpn_cls_score, rpn_keep) # shape (N, 2)
384 | rpn_label = tf.gather(rpn_label, rpn_keep)
385 | rpn_cross_entropy_n = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=rpn_label,logits=rpn_cls_score)
386 |
387 | # box loss
388 | rpn_bbox_pred = self.get_output('rpn_bbox_pred') # shape (1, H, W, Ax4)
389 | rpn_bbox_targets = self.get_output('rpn-data')[1]
390 | rpn_bbox_inside_weights = self.get_output('rpn-data')[2]
391 | rpn_bbox_outside_weights = self.get_output('rpn-data')[3]
392 | rpn_bbox_pred = tf.gather(tf.reshape(rpn_bbox_pred, [-1, 4]), rpn_keep) # shape (N, 4)
393 | rpn_bbox_targets = tf.gather(tf.reshape(rpn_bbox_targets, [-1, 4]), rpn_keep)
394 | rpn_bbox_inside_weights = tf.gather(tf.reshape(rpn_bbox_inside_weights, [-1, 4]), rpn_keep)
395 | rpn_bbox_outside_weights = tf.gather(tf.reshape(rpn_bbox_outside_weights, [-1, 4]), rpn_keep)
396 |
397 | rpn_loss_box_n = tf.reduce_sum(rpn_bbox_outside_weights * self.smooth_l1_dist(
398 | rpn_bbox_inside_weights * (rpn_bbox_pred - rpn_bbox_targets)), reduction_indices=[1])
399 |
400 | rpn_loss_box = tf.reduce_sum(rpn_loss_box_n) / (tf.reduce_sum(tf.cast(fg_keep, tf.float32)) + 1)
401 | rpn_cross_entropy = tf.reduce_mean(rpn_cross_entropy_n)
402 |
403 |
404 | model_loss = rpn_cross_entropy + rpn_loss_box
405 |
406 | regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
407 | total_loss = tf.add_n(regularization_losses) + model_loss
408 |
409 | return total_loss,model_loss, rpn_cross_entropy, rpn_loss_box
410 |
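`smooth_l1_dist` above is the Fast R-CNN smooth-L1 with sigma^2 = 9; a numpy transcription for sanity-checking the two branches:

    import numpy as np

    def smooth_l1(d, sigma2=9.0):
        # 0.5 * sigma2 * d^2   if |d| < 1/sigma2
        # |d| - 0.5/sigma2     otherwise
        d = np.asarray(d, dtype=np.float32)
        small = np.abs(d) < 1.0 / sigma2
        return np.where(small, 0.5 * sigma2 * d * d, np.abs(d) - 0.5 / sigma2)

    print(smooth_l1([0.05, 0.5]))   # [0.01125  0.44444445]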
--------------------------------------------------------------------------------
/lib/prepare_training_data/ToVoc.py:
--------------------------------------------------------------------------------
1 | from xml.dom.minidom import Document
2 | import cv2
3 | import os
4 | import glob
5 | import shutil
6 | import numpy as np
7 |
8 | def generate_xml(name, lines, img_size, class_sets, doncateothers=True):
9 | doc = Document()
10 |
11 | def append_xml_node_attr(child, parent=None, text=None):
12 | ele = doc.createElement(child)
13 |         if text is not None:
14 | text_node = doc.createTextNode(text)
15 | ele.appendChild(text_node)
16 | parent = doc if parent is None else parent
17 | parent.appendChild(ele)
18 | return ele
19 |
20 | img_name = name + '.jpg'
21 | # create header
22 | annotation = append_xml_node_attr('annotation')
23 | append_xml_node_attr('folder', parent=annotation, text='text')
24 | append_xml_node_attr('filename', parent=annotation, text=img_name)
25 | source = append_xml_node_attr('source', parent=annotation)
26 | append_xml_node_attr('database', parent=source, text='coco_text_database')
27 | append_xml_node_attr('annotation', parent=source, text='text')
28 | append_xml_node_attr('image', parent=source, text='text')
29 | append_xml_node_attr('flickrid', parent=source, text='000000')
30 | owner = append_xml_node_attr('owner', parent=annotation)
31 | append_xml_node_attr('name', parent=owner, text='ms')
32 | size = append_xml_node_attr('size', annotation)
33 | append_xml_node_attr('width', size, str(img_size[1]))
34 | append_xml_node_attr('height', size, str(img_size[0]))
35 | append_xml_node_attr('depth', size, str(img_size[2]))
36 | append_xml_node_attr('segmented', parent=annotation, text='0')
37 |
38 | # create objects
39 | objs = []
40 | for line in lines:
41 | splitted_line = line.strip().lower().split()
42 | cls = splitted_line[0].lower()
43 | if not doncateothers and cls not in class_sets:
44 | continue
45 | cls = 'dontcare' if cls not in class_sets else cls
46 | if cls == 'dontcare':
47 | continue
48 | obj = append_xml_node_attr('object', parent=annotation)
49 | occlusion = int(0)
50 | x1, y1, x2, y2 = int(float(splitted_line[1]) + 1), int(float(splitted_line[2]) + 1), \
51 | int(float(splitted_line[3]) + 1), int(float(splitted_line[4]) + 1)
52 | truncation = float(0)
53 | difficult = 1 if _is_hard(cls, truncation, occlusion, x1, y1, x2, y2) else 0
54 | truncted = 0 if truncation < 0.5 else 1
55 |
56 | append_xml_node_attr('name', parent=obj, text=cls)
57 | append_xml_node_attr('pose', parent=obj, text='none')
58 | append_xml_node_attr('truncated', parent=obj, text=str(truncted))
59 | append_xml_node_attr('difficult', parent=obj, text=str(int(difficult)))
60 | bb = append_xml_node_attr('bndbox', parent=obj)
61 | append_xml_node_attr('xmin', parent=bb, text=str(x1))
62 | append_xml_node_attr('ymin', parent=bb, text=str(y1))
63 | append_xml_node_attr('xmax', parent=bb, text=str(x2))
64 | append_xml_node_attr('ymax', parent=bb, text=str(y2))
65 |
66 | o = {'class': cls, 'box': np.asarray([x1, y1, x2, y2], dtype=float), \
67 | 'truncation': truncation, 'difficult': difficult, 'occlusion': occlusion}
68 | objs.append(o)
69 |
70 | return doc, objs
71 |
72 |
73 | def _is_hard(cls, truncation, occlusion, x1, y1, x2, y2):
74 | hard = False
75 | if y2 - y1 < 25 and occlusion >= 2:
76 | hard = True
77 | return hard
78 | if occlusion >= 3:
79 | hard = True
80 | return hard
81 | if truncation > 0.8:
82 | hard = True
83 | return hard
84 | return hard
85 |
86 |
87 | def build_voc_dirs(outdir):
88 | mkdir = lambda dir: os.makedirs(dir) if not os.path.exists(dir) else None
89 | mkdir(outdir)
90 | mkdir(os.path.join(outdir, 'Annotations'))
91 | mkdir(os.path.join(outdir, 'ImageSets'))
92 | mkdir(os.path.join(outdir, 'ImageSets', 'Layout'))
93 | mkdir(os.path.join(outdir, 'ImageSets', 'Main'))
94 | mkdir(os.path.join(outdir, 'ImageSets', 'Segmentation'))
95 | mkdir(os.path.join(outdir, 'JPEGImages'))
96 | mkdir(os.path.join(outdir, 'SegmentationClass'))
97 | mkdir(os.path.join(outdir, 'SegmentationObject'))
98 | return os.path.join(outdir, 'Annotations'), os.path.join(outdir, 'JPEGImages'), os.path.join(outdir, 'ImageSets',
99 | 'Main')
100 |
101 |
102 | if __name__ == '__main__':
103 | _outdir = 'TEXTVOC/VOC2007'
104 |     _draw = False
105 | _dest_label_dir, _dest_img_dir, _dest_set_dir = build_voc_dirs(_outdir)
106 |     _doncateothers = True
107 | for dset in ['train']:
108 | _labeldir = 'label_tmp'
109 | _imagedir = 're_image'
110 | class_sets = ('text', 'dontcare')
111 | class_sets_dict = dict((k, i) for i, k in enumerate(class_sets))
112 | allclasses = {}
113 | fs = [open(os.path.join(_dest_set_dir, cls + '_' + dset + '.txt'), 'w') for cls in class_sets]
114 | ftrain = open(os.path.join(_dest_set_dir, dset + '.txt'), 'w')
115 |
116 | files = glob.glob(os.path.join(_labeldir, '*.txt'))
117 | files.sort()
118 | for file in files:
119 | path, basename = os.path.split(file)
120 | stem, ext = os.path.splitext(basename)
121 | with open(file, 'r') as f:
122 | lines = f.readlines()
123 | img_file = os.path.join(_imagedir, stem + '.jpg')
124 |
125 | print(img_file)
126 | img = cv2.imread(img_file)
127 | img_size = img.shape
128 |
129 | doc, objs = generate_xml(stem, lines, img_size, class_sets=class_sets, doncateothers=_doncateothers)
130 |
131 | cv2.imwrite(os.path.join(_dest_img_dir, stem + '.jpg'), img)
132 | xmlfile = os.path.join(_dest_label_dir, stem + '.xml')
133 | with open(xmlfile, 'w') as f:
134 | f.write(doc.toprettyxml(indent=' '))
135 |
136 | ftrain.writelines(stem + '\n')
137 |
138 | cls_in_image = set([o['class'] for o in objs])
139 |
140 | for obj in objs:
141 | cls = obj['class']
142 |                 # count occurrences per class (the old ternary started at 0 and undercounted by one)
143 |                 allclasses[cls] = allclasses.get(cls, 0) + 1
144 |
145 | for cls in cls_in_image:
146 | if cls in class_sets:
147 | fs[class_sets_dict[cls]].writelines(stem + ' 1\n')
148 | for cls in class_sets:
149 | if cls not in cls_in_image:
150 | fs[class_sets_dict[cls]].writelines(stem + ' -1\n')
151 |
152 |
153 |         for f in fs: f.close()  # the old generator expression was never consumed, leaving the files open
154 | ftrain.close()
155 |
156 | print('~~~~~~~~~~~~~~~~~~~')
157 | print(allclasses)
158 | print('~~~~~~~~~~~~~~~~~~~')
159 | shutil.copyfile(os.path.join(_dest_set_dir, 'train.txt'), os.path.join(_dest_set_dir, 'val.txt'))
160 | shutil.copyfile(os.path.join(_dest_set_dir, 'train.txt'), os.path.join(_dest_set_dir, 'trainval.txt'))
161 | for cls in class_sets:
162 | shutil.copyfile(os.path.join(_dest_set_dir, cls + '_train.txt'),
163 | os.path.join(_dest_set_dir, cls + '_trainval.txt'))
164 | shutil.copyfile(os.path.join(_dest_set_dir, cls + '_train.txt'),
165 | os.path.join(_dest_set_dir, cls + '_val.txt'))
166 |
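`generate_xml` consumes the whitespace-separated `class x1 y1 x2 y2` lines that `split_label.py` writes into `label_tmp/`; a toy invocation (image size made up, and assuming the module is importable rather than run as a script):

    from lib.prepare_training_data.ToVoc import generate_xml

    doc, objs = generate_xml('000001', ['text 48 120 63 152'],
                             img_size=(600, 900, 3),
                             class_sets=('text', 'dontcare'))
    print(objs[0]['box'])                    # [ 49. 121.  64. 153.] (1-based VOC coords)
    print(doc.toprettyxml(indent='    ')[:120])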
--------------------------------------------------------------------------------
/lib/prepare_training_data/split_label.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import math
4 | import cv2 as cv
5 |
6 | path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/image'
7 | gt_path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/label'
8 | out_path = 're_image'
9 | if not os.path.exists(out_path):
10 | os.makedirs(out_path)
11 | files = os.listdir(path)
12 | files.sort()
13 | #files=files[:100]
14 | for file in files:
15 | _, basename = os.path.split(file)
16 | if basename.lower().split('.')[-1] not in ['jpg', 'png']:
17 | continue
18 | stem, ext = os.path.splitext(basename)
19 | gt_file = os.path.join(gt_path, 'gt_' + stem + '.txt')
20 | img_path = os.path.join(path, file)
21 | print(img_path)
22 | img = cv.imread(img_path)
23 | img_size = img.shape
24 | im_size_min = np.min(img_size[0:2])
25 | im_size_max = np.max(img_size[0:2])
26 |
27 | im_scale = float(600) / float(im_size_min)
28 | if np.round(im_scale * im_size_max) > 1200:
29 | im_scale = float(1200) / float(im_size_max)
30 | re_im = cv.resize(img, None, None, fx=im_scale, fy=im_scale, interpolation=cv.INTER_LINEAR)
31 | re_size = re_im.shape
32 | cv.imwrite(os.path.join(out_path, stem) + '.jpg', re_im)
33 |
34 | with open(gt_file, 'r') as f:
35 | lines = f.readlines()
36 | for line in lines:
37 | splitted_line = line.strip().lower().split(',')
38 | pt_x = np.zeros((4, 1))
39 | pt_y = np.zeros((4, 1))
40 | pt_x[0, 0] = int(float(splitted_line[0]) / img_size[1] * re_size[1])
41 | pt_y[0, 0] = int(float(splitted_line[1]) / img_size[0] * re_size[0])
42 | pt_x[1, 0] = int(float(splitted_line[2]) / img_size[1] * re_size[1])
43 | pt_y[1, 0] = int(float(splitted_line[3]) / img_size[0] * re_size[0])
44 | pt_x[2, 0] = int(float(splitted_line[4]) / img_size[1] * re_size[1])
45 | pt_y[2, 0] = int(float(splitted_line[5]) / img_size[0] * re_size[0])
46 | pt_x[3, 0] = int(float(splitted_line[6]) / img_size[1] * re_size[1])
47 | pt_y[3, 0] = int(float(splitted_line[7]) / img_size[0] * re_size[0])
48 |
49 | ind_x = np.argsort(pt_x, axis=0)
50 | pt_x = pt_x[ind_x]
51 | pt_y = pt_y[ind_x]
52 |
53 | if pt_y[0] < pt_y[1]:
54 | pt1 = (pt_x[0], pt_y[0])
55 | pt3 = (pt_x[1], pt_y[1])
56 | else:
57 | pt1 = (pt_x[1], pt_y[1])
58 | pt3 = (pt_x[0], pt_y[0])
59 |
60 | if pt_y[2] < pt_y[3]:
61 | pt2 = (pt_x[2], pt_y[2])
62 | pt4 = (pt_x[3], pt_y[3])
63 | else:
64 | pt2 = (pt_x[3], pt_y[3])
65 | pt4 = (pt_x[2], pt_y[2])
66 |
67 | xmin = int(min(pt1[0], pt2[0]))
68 | ymin = int(min(pt1[1], pt2[1]))
69 | xmax = int(max(pt2[0], pt4[0]))
70 | ymax = int(max(pt3[1], pt4[1]))
71 |
72 | if xmin < 0:
73 | xmin = 0
74 | if xmax > re_size[1] - 1:
75 | xmax = re_size[1] - 1
76 | if ymin < 0:
77 | ymin = 0
78 | if ymax > re_size[0] - 1:
79 | ymax = re_size[0] - 1
80 |
81 | width = xmax - xmin
82 | height = ymax - ymin
83 |
84 | # reimplement
85 | step = 16.0
86 | x_left = []
87 | x_right = []
88 | x_left.append(xmin)
89 | x_left_start = int(math.ceil(xmin / 16.0) * 16.0)
90 | if x_left_start == xmin:
91 | x_left_start = xmin + 16
92 | for i in np.arange(x_left_start, xmax, 16):
93 | x_left.append(i)
94 | x_left = np.array(x_left)
95 |
96 | x_right.append(x_left_start - 1)
97 | for i in range(1, len(x_left) - 1):
98 | x_right.append(x_left[i] + 15)
99 | x_right.append(xmax)
100 | x_right = np.array(x_right)
101 |
102 | idx = np.where(x_left == x_right)
103 | x_left = np.delete(x_left, idx, axis=0)
104 | x_right = np.delete(x_right, idx, axis=0)
105 |
106 | if not os.path.exists('label_tmp'):
107 | os.makedirs('label_tmp')
108 | with open(os.path.join('label_tmp', stem) + '.txt', 'a') as f:
109 | for i in range(len(x_left)):
110 | f.writelines("text\t")
111 | f.writelines(str(int(x_left[i])))
112 | f.writelines("\t")
113 | f.writelines(str(int(ymin)))
114 | f.writelines("\t")
115 | f.writelines(str(int(x_right[i])))
116 | f.writelines("\t")
117 | f.writelines(str(int(ymax)))
118 | f.writelines("\n")
119 |
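The 16-pixel splitting above turns one ground-truth box into a row of fixed-width proposals aligned to the feature stride; a condensed, self-contained version for one toy box:

    import math
    import numpy as np

    xmin, xmax = 40, 105
    first = int(math.ceil(xmin / 16.0) * 16.0)   # first stride-aligned boundary inside the box
    first = first + 16 if first == xmin else first
    x_left = np.array([xmin] + list(np.arange(first, xmax, 16)))
    x_right = np.array([first - 1] + [x + 15 for x in x_left[1:-1]] + [xmax])
    print([(int(l), int(r)) for l, r in zip(x_left, x_right)])
    # [(40, 47), (48, 63), (64, 79), (80, 95), (96, 105)]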
--------------------------------------------------------------------------------
/lib/roi_data_layer/__init__.py:
--------------------------------------------------------------------------------
1 | from . import roidb
--------------------------------------------------------------------------------
/lib/roi_data_layer/layer.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from lib.fast_rcnn.config import cfg
3 | from lib.roi_data_layer.minibatch import get_minibatch
4 |
5 | class RoIDataLayer(object):
6 | """Fast R-CNN data layer used for training."""
7 |
8 | def __init__(self, roidb, num_classes):
9 | """Set the roidb to be used by this layer during training."""
10 | self._roidb = roidb
11 | self._num_classes = num_classes
12 | self._shuffle_roidb_inds()
13 |
14 | def _shuffle_roidb_inds(self):
15 | """Randomly permute the training roidb."""
16 | self._perm = np.random.permutation(np.arange(len(self._roidb)))
17 | self._cur = 0
18 |
19 | def _get_next_minibatch_inds(self):
20 | """Return the roidb indices for the next minibatch."""
21 |
22 | if cfg.TRAIN.HAS_RPN:
23 | if self._cur + cfg.TRAIN.IMS_PER_BATCH >= len(self._roidb):
24 | self._shuffle_roidb_inds()
25 |
26 | db_inds = self._perm[self._cur:self._cur + cfg.TRAIN.IMS_PER_BATCH]
27 | self._cur += cfg.TRAIN.IMS_PER_BATCH
28 | else:
29 | # sample images
30 | db_inds = np.zeros((cfg.TRAIN.IMS_PER_BATCH), dtype=np.int32)
31 | i = 0
32 | while (i < cfg.TRAIN.IMS_PER_BATCH):
33 | ind = self._perm[self._cur]
34 | num_objs = self._roidb[ind]['boxes'].shape[0]
35 | if num_objs != 0:
36 | db_inds[i] = ind
37 | i += 1
38 |
39 | self._cur += 1
40 | if self._cur >= len(self._roidb):
41 | self._shuffle_roidb_inds()
42 |
43 | return db_inds
44 |
45 | def _get_next_minibatch(self):
46 | """Return the blobs to be used for the next minibatch.
47 |
48 | If cfg.TRAIN.USE_PREFETCH is True, then blobs will be computed in a
49 | separate process and made available through self._blob_queue.
50 | """
51 | db_inds = self._get_next_minibatch_inds()
52 | minibatch_db = [self._roidb[i] for i in db_inds]
53 | return get_minibatch(minibatch_db, self._num_classes)
54 |
55 | def forward(self):
56 | """Get blobs and copy them into this layer's top blob vector."""
57 | blobs = self._get_next_minibatch()
58 | return blobs
59 |
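The index bookkeeping in `_get_next_minibatch_inds` reduces to a shuffled cursor over the roidb; isolated here with IMS_PER_BATCH = 1, the single-image batching CTPN uses:

    import numpy as np

    n, ims_per_batch = 5, 1
    perm, cur = np.random.permutation(n), 0
    for _ in range(7):                       # runs past one epoch: reshuffles and wraps
        if cur + ims_per_batch >= n:
            perm, cur = np.random.permutation(n), 0
        inds = perm[cur:cur + ims_per_batch]
        cur += ims_per_batch
        print(inds)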
--------------------------------------------------------------------------------
/lib/roi_data_layer/minibatch.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import numpy.random as npr
3 | import cv2
4 | import os
5 | from lib.fast_rcnn.config import cfg
6 | from lib.utils.blob import prep_im_for_blob, im_list_to_blob
7 |
8 | def get_minibatch(roidb, num_classes):
9 | """Given a roidb, construct a minibatch sampled from it."""
10 | num_images = len(roidb)
11 | # Sample random scales to use for each image in this batch
12 | random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES),
13 | size=num_images)
14 | assert(cfg.TRAIN.BATCH_SIZE % num_images == 0), \
15 | 'num_images ({}) must divide BATCH_SIZE ({})'. \
16 | format(num_images, cfg.TRAIN.BATCH_SIZE)
17 |     rois_per_image = cfg.TRAIN.BATCH_SIZE // num_images  # integer division: these counts are used for slicing
18 |     fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))
19 |
20 | # Get the input image blob, formatted for caffe
21 | im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
22 |
23 | blobs = {'data': im_blob}
24 |
25 | if cfg.TRAIN.HAS_RPN:
26 | assert len(im_scales) == 1, "Single batch only"
27 | assert len(roidb) == 1, "Single batch only"
28 | # gt boxes: (x1, y1, x2, y2, cls)
29 | gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
30 | gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
31 | gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
32 | gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
33 | blobs['gt_boxes'] = gt_boxes
34 | blobs['gt_ishard'] = roidb[0]['gt_ishard'][gt_inds] \
35 | if 'gt_ishard' in roidb[0] else np.zeros(gt_inds.size, dtype=int)
36 | # blobs['gt_ishard'] = roidb[0]['gt_ishard'][gt_inds]
37 | blobs['dontcare_areas'] = roidb[0]['dontcare_areas'] * im_scales[0] \
38 | if 'dontcare_areas' in roidb[0] else np.zeros([0, 4], dtype=float)
39 | blobs['im_info'] = np.array(
40 | [[im_blob.shape[1], im_blob.shape[2], im_scales[0]]],
41 | dtype=np.float32)
42 | blobs['im_name'] = os.path.basename(roidb[0]['image'])
43 |
44 | else: # not using RPN
45 | # Now, build the region of interest and label blobs
46 | rois_blob = np.zeros((0, 5), dtype=np.float32)
47 | labels_blob = np.zeros((0), dtype=np.float32)
48 | bbox_targets_blob = np.zeros((0, 4 * num_classes), dtype=np.float32)
49 | bbox_inside_blob = np.zeros(bbox_targets_blob.shape, dtype=np.float32)
50 | # all_overlaps = []
51 | for im_i in range(num_images):
52 | labels, overlaps, im_rois, bbox_targets, bbox_inside_weights \
53 | = _sample_rois(roidb[im_i], fg_rois_per_image, rois_per_image,
54 | num_classes)
55 |
56 | # Add to RoIs blob
57 | rois = _project_im_rois(im_rois, im_scales[im_i])
58 | batch_ind = im_i * np.ones((rois.shape[0], 1))
59 | rois_blob_this_image = np.hstack((batch_ind, rois))
60 | rois_blob = np.vstack((rois_blob, rois_blob_this_image))
61 |
62 | # Add to labels, bbox targets, and bbox loss blobs
63 | labels_blob = np.hstack((labels_blob, labels))
64 | bbox_targets_blob = np.vstack((bbox_targets_blob, bbox_targets))
65 | bbox_inside_blob = np.vstack((bbox_inside_blob, bbox_inside_weights))
66 | # all_overlaps = np.hstack((all_overlaps, overlaps))
67 |
68 | # For debug visualizations
69 | # _vis_minibatch(im_blob, rois_blob, labels_blob, all_overlaps)
70 |
71 | blobs['rois'] = rois_blob
72 | blobs['labels'] = labels_blob
73 |
74 | if cfg.TRAIN.BBOX_REG:
75 | blobs['bbox_targets'] = bbox_targets_blob
76 | blobs['bbox_inside_weights'] = bbox_inside_blob
77 | blobs['bbox_outside_weights'] = \
78 | np.array(bbox_inside_blob > 0).astype(np.float32)
79 |
80 | return blobs
81 |
82 | def _sample_rois(roidb, fg_rois_per_image, rois_per_image, num_classes):
83 | """Generate a random sample of RoIs comprising foreground and background
84 | examples.
85 | """
86 | # label = class RoI has max overlap with
87 | labels = roidb['max_classes']
88 | overlaps = roidb['max_overlaps']
89 | rois = roidb['boxes']
90 |
91 | # Select foreground RoIs as those with >= FG_THRESH overlap
92 | fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
93 | # Guard against the case when an image has fewer than fg_rois_per_image
94 | # foreground RoIs
95 | fg_rois_per_this_image = np.minimum(fg_rois_per_image, fg_inds.size)
96 | # Sample foreground regions without replacement
97 | if fg_inds.size > 0:
98 | fg_inds = npr.choice(
99 | fg_inds, size=fg_rois_per_this_image, replace=False)
100 |
101 | # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
102 | bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
103 | (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
104 | # Compute number of background RoIs to take from this image (guarding
105 | # against there being fewer than desired)
106 | bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
107 | bg_rois_per_this_image = np.minimum(bg_rois_per_this_image,
108 | bg_inds.size)
109 | # Sample foreground regions without replacement
110 | if bg_inds.size > 0:
111 | bg_inds = npr.choice(
112 | bg_inds, size=bg_rois_per_this_image, replace=False)
113 |
114 | # The indices that we're selecting (both fg and bg)
115 | keep_inds = np.append(fg_inds, bg_inds)
116 | # Select sampled values from various arrays:
117 | labels = labels[keep_inds]
118 | # Clamp labels for the background RoIs to 0
119 | labels[fg_rois_per_this_image:] = 0
120 | overlaps = overlaps[keep_inds]
121 | rois = rois[keep_inds]
122 |
123 | bbox_targets, bbox_inside_weights = _get_bbox_regression_labels(
124 | roidb['bbox_targets'][keep_inds, :], num_classes)
125 |
126 | return labels, overlaps, rois, bbox_targets, bbox_inside_weights
127 |
128 | def _get_image_blob(roidb, scale_inds):
129 | """Builds an input blob from the images in the roidb at the specified
130 | scales.
131 | """
132 | num_images = len(roidb)
133 | processed_ims = []
134 | im_scales = []
135 | for i in range(num_images):
136 | im = cv2.imread(roidb[i]['image'])
137 | if roidb[i]['flipped']:
138 | im = im[:, ::-1, :]
139 | target_size = cfg.TRAIN.SCALES[scale_inds[i]]
140 | im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
141 | cfg.TRAIN.MAX_SIZE)
142 | im_scales.append(im_scale)
143 | processed_ims.append(im)
144 |
145 | # Create a blob to hold the input images
146 | blob = im_list_to_blob(processed_ims)
147 |
148 | return blob, im_scales
149 |
150 | def _project_im_rois(im_rois, im_scale_factor):
151 | """Project image RoIs into the rescaled training image."""
152 | rois = im_rois * im_scale_factor
153 | return rois
154 |
155 | def _get_bbox_regression_labels(bbox_target_data, num_classes):
156 | """Bounding-box regression targets are stored in a compact form in the
157 | roidb.
158 |
159 | This function expands those targets into the 4-of-4*K representation used
160 | by the network (i.e. only one class has non-zero targets). The loss weights
161 | are similarly expanded.
162 |
163 | Returns:
164 | bbox_target_data (ndarray): N x 4K blob of regression targets
165 | bbox_inside_weights (ndarray): N x 4K blob of loss weights
166 | """
167 | clss = bbox_target_data[:, 0]
168 | bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
169 | bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
170 | inds = np.where(clss > 0)[0]
171 | for ind in inds:
172 |         cls = int(clss[ind])  # stored as float in the compact form; cast so the slice below is valid
173 | start = 4 * cls
174 | end = start + 4
175 | bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
176 | bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
177 | return bbox_targets, bbox_inside_weights
178 |
179 | def _vis_minibatch(im_blob, rois_blob, labels_blob, overlaps):
180 | """Visualize a mini-batch for debugging."""
181 | import matplotlib.pyplot as plt
182 | for i in range(rois_blob.shape[0]):
183 | rois = rois_blob[i, :]
184 | im_ind = rois[0]
185 | roi = rois[1:]
186 | im = im_blob[im_ind, :, :, :].transpose((1, 2, 0)).copy()
187 | im += cfg.PIXEL_MEANS
188 | im = im[:, :, (2, 1, 0)]
189 | im = im.astype(np.uint8)
190 | cls = labels_blob[i]
191 | plt.imshow(im)
192 | print('class: ', cls, ' overlap: ', overlaps[i])
193 | plt.gca().add_patch(
194 | plt.Rectangle((roi[0], roi[1]), roi[2] - roi[0],
195 | roi[3] - roi[1], fill=False,
196 | edgecolor='r', linewidth=3)
197 | )
198 | plt.show()
199 |
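The compact-to-expanded target layout in `_get_bbox_regression_labels`, shown standalone: one RoI of class 1 with num_classes = 2 lands its 4 targets in columns 4:8 and zeros elsewhere:

    import numpy as np

    bbox_target_data = np.array([[1, 0.1, 0.2, 0.3, 0.4]], dtype=np.float32)  # [cls, dx, dy, dw, dh]
    num_classes = 2
    bbox_targets = np.zeros((1, 4 * num_classes), dtype=np.float32)
    cls = int(bbox_target_data[0, 0])
    bbox_targets[0, 4 * cls:4 * cls + 4] = bbox_target_data[0, 1:]
    print(bbox_targets)                      # [[0. 0. 0. 0. 0.1 0.2 0.3 0.4]]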
--------------------------------------------------------------------------------
/lib/roi_data_layer/roidb.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import PIL
3 | from lib.fast_rcnn.config import cfg
4 | from lib.fast_rcnn.bbox_transform import bbox_transform
5 | from lib.utils.bbox import bbox_overlaps
6 |
7 | def prepare_roidb(imdb):
8 | """Enrich the imdb's roidb by adding some derived quantities that
9 | are useful for training. This function precomputes the maximum
10 | overlap, taken over ground-truth boxes, between each ROI and
11 | each ground-truth box. The class with maximum overlap is also
12 | recorded.
13 | """
14 | sizes = [PIL.Image.open(imdb.image_path_at(i)).size
15 | for i in range(imdb.num_images)]
16 | roidb = imdb.roidb
17 | for i in range(len(imdb.image_index)):
18 | roidb[i]['image'] = imdb.image_path_at(i)
19 | roidb[i]['width'] = sizes[i][0]
20 | roidb[i]['height'] = sizes[i][1]
21 | # need gt_overlaps as a dense array for argmax
22 | gt_overlaps = roidb[i]['gt_overlaps'].toarray()
23 | # max overlap with gt over classes (columns)
24 | max_overlaps = gt_overlaps.max(axis=1)
25 | # gt class that had the max overlap
26 | max_classes = gt_overlaps.argmax(axis=1)
27 | roidb[i]['max_classes'] = max_classes
28 | roidb[i]['max_overlaps'] = max_overlaps
29 | # sanity checks
30 | # max overlap of 0 => class should be zero (background)
31 | zero_inds = np.where(max_overlaps == 0)[0]
32 | assert all(max_classes[zero_inds] == 0)
33 | # max overlap > 0 => class should not be zero (must be a fg class)
34 | nonzero_inds = np.where(max_overlaps > 0)[0]
35 | assert all(max_classes[nonzero_inds] != 0)
36 |
37 | def add_bbox_regression_targets(roidb):
38 | """
39 | Add information needed to train bounding-box regressors.
40 | For each roi, find the corresponding gt box and compute the regression targets,
41 | then normalize the targets by subtracting the mean and dividing by the std.
42 | """
43 | assert len(roidb) > 0
44 | assert 'max_classes' in roidb[0], 'Did you call prepare_roidb first?'
45 |
46 | num_images = len(roidb)
47 | # Infer number of classes from the number of columns in gt_overlaps
48 | num_classes = roidb[0]['gt_overlaps'].shape[1]
49 | for im_i in range(num_images):
50 | rois = roidb[im_i]['boxes']
51 | max_overlaps = roidb[im_i]['max_overlaps']
52 | max_classes = roidb[im_i]['max_classes']
53 | roidb[im_i]['bbox_targets'] = \
54 | _compute_targets(rois, max_overlaps, max_classes)
55 |
56 | if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
57 | # Use fixed / precomputed "means" and "stds" instead of empirical values
58 | means = np.tile(
59 | np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (num_classes, 1))
60 | stds = np.tile(
61 | np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (num_classes, 1))
62 | else:
63 | # Compute values needed for means and stds
64 | # var(x) = E(x^2) - E(x)^2
65 | class_counts = np.zeros((num_classes, 1)) + cfg.EPS
66 | sums = np.zeros((num_classes, 4))
67 | squared_sums = np.zeros((num_classes, 4))
68 | for im_i in range(num_images):
69 | targets = roidb[im_i]['bbox_targets']
70 | for cls in range(1, num_classes):
71 | cls_inds = np.where(targets[:, 0] == cls)[0]
72 | if cls_inds.size > 0:
73 | class_counts[cls] += cls_inds.size
74 | sums[cls, :] += targets[cls_inds, 1:].sum(axis=0)
75 | squared_sums[cls, :] += \
76 | (targets[cls_inds, 1:] ** 2).sum(axis=0)
77 |
78 | means = sums / class_counts
79 | stds = np.sqrt(squared_sums / class_counts - means ** 2)
80 | # a std this small would blow up the normalized targets (nan/inf)
81 | assert np.min(stds) >= 0.01, \
82 | 'Boxes std is too small, std:{}'.format(stds)
83 |
84 | print('bbox target means:')
85 | print(means)
86 | print(means[1:, :].mean(axis=0)) # ignore bg class
87 | print('bbox target stdevs:')
88 | print(stds)
89 | print(stds[1:, :].mean(axis=0)) # ignore bg class
90 |
91 | # Normalize targets
92 | if cfg.TRAIN.BBOX_NORMALIZE_TARGETS:
93 | print("Normalizing targets")
94 | for im_i in range(num_images):
95 | targets = roidb[im_i]['bbox_targets']
96 | for cls in range(1, num_classes):
97 | cls_inds = np.where(targets[:, 0] == cls)[0]
98 | roidb[im_i]['bbox_targets'][cls_inds, 1:] -= means[cls, :]
99 | roidb[im_i]['bbox_targets'][cls_inds, 1:] /= stds[cls, :]
100 | else:
101 | print("NOT normalizing targets")
102 |
103 | # These values will be needed for making predictions
104 | # (the predicts will need to be unnormalized and uncentered)
105 | return means.ravel(), stds.ravel()
106 |
107 | def _compute_targets(rois, overlaps, labels):
108 | """
109 | Compute bounding-box regression targets for an image.
110 | for each roi find the corresponding gt_box, then compute the distance.
111 | """
112 | # Indices of ground-truth ROIs
113 | gt_inds = np.where(overlaps == 1)[0]
114 | if len(gt_inds) == 0:
115 | # Bail if the image has no ground-truth ROIs
116 | return np.zeros((rois.shape[0], 5), dtype=np.float32)
117 | # Indices of examples for which we try to make predictions
118 | ex_inds = np.where(overlaps >= cfg.TRAIN.BBOX_THRESH)[0]
119 |
120 | # Get IoU overlap between each ex ROI and gt ROI
121 | ex_gt_overlaps = bbox_overlaps(
122 | np.ascontiguousarray(rois[ex_inds, :], dtype=np.float),
123 | np.ascontiguousarray(rois[gt_inds, :], dtype=np.float))
124 |
125 | # Find which gt ROI each ex ROI has max overlap with:
126 | # this will be the ex ROI's gt target
127 | gt_assignment = ex_gt_overlaps.argmax(axis=1)
128 | gt_rois = rois[gt_inds[gt_assignment], :]
129 | ex_rois = rois[ex_inds, :]
130 |
131 | targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
132 | targets[ex_inds, 0] = labels[ex_inds]
133 | targets[ex_inds, 1:] = bbox_transform(ex_rois, gt_rois)
134 | return targets
135 |
--------------------------------------------------------------------------------
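add_bbox_regression_targets accumulates per-class sums and squared sums so it can derive the mean and std in a single pass over the roidb, using var(x) = E(x^2) - E(x)^2. A minimal numpy check of that identity (toy single-class targets, invented values):

import numpy as np

x = np.array([[0.1, 0.2], [0.3, -0.2], [0.5, 0.0]])  # toy targets, one class
count = x.shape[0]
sums = x.sum(axis=0)
squared_sums = (x ** 2).sum(axis=0)
means = sums / count
stds = np.sqrt(squared_sums / count - means ** 2)  # var(x) = E(x^2) - E(x)^2
assert np.allclose(stds, x.std(axis=0))            # matches the direct formula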
/lib/rpn_msr/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eragonruan/text-detection-ctpn/c04a571e2593fc361c1aff3127e58dc13fdc4e5a/lib/rpn_msr/__init__.py
--------------------------------------------------------------------------------
/lib/rpn_msr/anchor_target_layer_tf.py:
--------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | import numpy as np
3 | import numpy.random as npr
4 | from .generate_anchors import generate_anchors
5 | from lib.utils.bbox import bbox_overlaps, bbox_intersections
6 | from lib.fast_rcnn.config import cfg
7 | from lib.fast_rcnn.bbox_transform import bbox_transform
8 |
9 | DEBUG = False
10 | def anchor_target_layer(rpn_cls_score, gt_boxes, gt_ishard, dontcare_areas, im_info, _feat_stride = [16,], anchor_scales = [16,]):
11 | """
12 | Assign anchors to ground-truth targets. Produces anchor classification
13 | labels and bounding-box regression targets.
14 | Parameters
15 | ----------
16 | rpn_cls_score: (1, H, W, Ax2) bg/fg scores of previous conv layer
17 | gt_boxes: (G, 5) vstack of [x1, y1, x2, y2, class]
18 | gt_ishard: (G, 1), 1 or 0 indicates difficult or not
19 | dontcare_areas: (D, 4), some areas may contain small objects but no labels. D may be 0
20 | im_info: a list of [image_height, image_width, scale_ratios]
21 | _feat_stride: the downsampling ratio of feature map to the original input image
22 | anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16])
23 | ----------
24 | Returns
25 | ----------
26 | rpn_labels : (HxWxA, 1), for each anchor, 0 denotes bg, 1 fg, -1 dontcare
27 | rpn_bbox_targets: (HxWxA, 4), distances of the anchors to the gt_boxes(may contains some transform)
28 | that are the regression objectives
29 | rpn_bbox_inside_weights: (HxWxA, 4) weights of each box, mainly taken from hyper params in cfg
30 | rpn_bbox_outside_weights: (HxWxA, 4) used to balance fg/bg,
31 | because the numbers of bgs and fgs may differ significantly
32 | """
33 | _anchors = generate_anchors(scales=np.array(anchor_scales))# generate the base anchors (10 fixed-width anchors here, one per height)
34 | _num_anchors = _anchors.shape[0]# number of base anchors (10 here)
35 |
36 | if DEBUG:
37 | print('anchors:')
38 | print(_anchors)
39 | print('anchor shapes:')
40 | print(np.hstack((
41 | _anchors[:, 2::4] - _anchors[:, 0::4],
42 | _anchors[:, 3::4] - _anchors[:, 1::4],
43 | )))
44 | _counts = cfg.EPS
45 | _sums = np.zeros((1, 4))
46 | _squared_sums = np.zeros((1, 4))
47 | _fg_sum = 0
48 | _bg_sum = 0
49 | _count = 0
50 |
51 | # allow boxes to sit over the edge by a small amount
52 | _allowed_border = 0
53 | # map of shape (..., H, W)
54 | #height, width = rpn_cls_score.shape[1:3]
55 |
56 | im_info = im_info[0]# image height, width and scale ratio
57 |
58 | # locate the anchors on the feature map and add the shifts (deltas) to get their real coordinates in the input image
59 | # Algorithm:
60 | # for each (H, W) location i
61 | # generate A anchor boxes centered on cell i
62 | # apply predicted bbox deltas at cell i to each of the 9 anchors
63 | # filter out-of-image anchors
64 | # measure GT overlap
65 |
66 | assert rpn_cls_score.shape[0] == 1, \
67 | 'Only single item batches are supported'
68 |
69 | # map of shape (..., H, W)
70 | height, width = rpn_cls_score.shape[1:3]# height and width of the feature map
71 |
72 | if DEBUG:
73 | print('AnchorTargetLayer: height', height, 'width', width)
74 | print('')
75 | print('im_size: ({}, {})'.format(im_info[0], im_info[1]))
76 | print('scale: {}'.format(im_info[2]))
77 | print('height, width: ({}, {})'.format(height, width))
78 | print('rpn: gt_boxes.shape', gt_boxes.shape)
79 | print('rpn: gt_boxes', gt_boxes)
80 |
81 | # 1. Generate proposals from bbox deltas and shifted anchors
82 | shift_x = np.arange(0, width) * _feat_stride
83 | shift_y = np.arange(0, height) * _feat_stride
84 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) # in W H order
85 | # K is H x W
86 | shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
87 | shift_x.ravel(), shift_y.ravel())).transpose()# per-cell offsets mapping feature-map locations onto the input image
88 | # add A anchors (1, A, 4) to
89 | # cell K shifts (K, 1, 4) to get
90 | # shift anchors (K, A, 4)
91 | # reshape to (K*A, 4) shifted anchors
92 | A = _num_anchors# number of base anchors per location
93 | K = shifts.shape[0]# number of feature-map cells (width * height, e.g. 50*37)
94 | all_anchors = (_anchors.reshape((1, A, 4)) +
95 | shifts.reshape((1, K, 4)).transpose((1, 0, 2)))# broadcast the base anchors over all cells and add the shifts
96 | all_anchors = all_anchors.reshape((K * A, 4))
97 | total_anchors = int(K * A)
98 |
99 | # only keep anchors inside the image
100 | # anchors that cross the image boundary are dropped
101 | inds_inside = np.where(
102 | (all_anchors[:, 0] >= -_allowed_border) &
103 | (all_anchors[:, 1] >= -_allowed_border) &
104 | (all_anchors[:, 2] < im_info[1] + _allowed_border) & # width
105 | (all_anchors[:, 3] < im_info[0] + _allowed_border) # height
106 | )[0]
107 |
108 | if DEBUG:
109 | print('total_anchors', total_anchors)
110 | print('inds_inside', len(inds_inside))
111 |
112 | # keep only inside anchors
113 | anchors = all_anchors[inds_inside, :]# keep the anchors that lie inside the image
114 | if DEBUG:
115 | print('anchors.shape', anchors.shape)
116 |
117 | # at this point the anchors are ready
118 | #--------------------------------------------------------------
119 | # label: 1 is positive, 0 is negative, -1 is dont care
120 | # (A)
121 | labels = np.empty((len(inds_inside), ), dtype=np.float32)
122 | labels.fill(-1)# initialize all labels to -1 (don't care)
123 |
124 | # overlaps between the anchors and the gt boxes
125 | # overlaps (ex, gt), shape is A x G
126 | # compute the overlap between anchors and gt boxes, used for labeling the anchors
127 | overlaps = bbox_overlaps(
128 | np.ascontiguousarray(anchors, dtype=np.float),
129 | np.ascontiguousarray(gt_boxes, dtype=np.float))# with A anchors and G gt boxes this returns an (A, G) array
130 | # holding the overlap between every anchor and every gt box
131 | argmax_overlaps = overlaps.argmax(axis=1) # (A) for each anchor, the gt box with the highest overlap
132 | max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
133 | gt_argmax_overlaps = overlaps.argmax(axis=0) # (G) for each gt box, the anchor with the highest overlap
134 | gt_max_overlaps = overlaps[gt_argmax_overlaps,
135 | np.arange(overlaps.shape[1])]
136 | gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
137 |
138 | if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
139 | # assign bg labels first so that positive labels can clobber them
140 | labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0# label background first: anchors with overlap below the negative threshold (e.g. 0.3)
141 |
142 | # fg label: for each gt, anchor with highest overlap
143 | labels[gt_argmax_overlaps] = 1# the anchor with the highest overlap with each gt box is foreground
144 | # fg label: above threshold IOU
145 | labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1# anchors with overlap above the positive threshold (e.g. 0.7) are foreground
146 |
147 | if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
148 | # assign bg labels last so that negative labels can clobber positives
149 | labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
150 |
151 | # preclude dontcare areas
152 | if dontcare_areas is not None and dontcare_areas.shape[0] > 0:# dontcare areas are not considered here for now
153 | # intersec shape is D x A
154 | intersecs = bbox_intersections(
155 | np.ascontiguousarray(dontcare_areas, dtype=np.float), # D x 4
156 | np.ascontiguousarray(anchors, dtype=np.float) # A x 4
157 | )
158 | intersecs_ = intersecs.sum(axis=0) # A x 1
159 | labels[intersecs_ > cfg.TRAIN.DONTCARE_AREA_INTERSECTION_HI] = -1
160 |
161 | # hard samples are not considered here for now
162 | # preclude hard samples that are highly occlusioned, truncated or difficult to see
163 | if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
164 | assert gt_ishard.shape[0] == gt_boxes.shape[0]
165 | gt_ishard = gt_ishard.astype(int)
166 | gt_hardboxes = gt_boxes[gt_ishard == 1, :]
167 | if gt_hardboxes.shape[0] > 0:
168 | # H x A
169 | hard_overlaps = bbox_overlaps(
170 | np.ascontiguousarray(gt_hardboxes, dtype=np.float), # H x 4
171 | np.ascontiguousarray(anchors, dtype=np.float)) # A x 4
172 | hard_max_overlaps = hard_overlaps.max(axis=0) # (A)
173 | labels[hard_max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = -1
174 | max_intersec_label_inds = hard_overlaps.argmax(axis=1) # H x 1
175 | labels[max_intersec_label_inds] = -1
176 |
177 | # subsample positive labels if we have too many
178 | # cap the number of positives at RPN_FG_FRACTION * RPN_BATCHSIZE (e.g. 128)
179 | # TODO: this may need revisiting later; with character fragments the number
180 | # of positives can be very large.
181 | num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
182 | fg_inds = np.where(labels == 1)[0]
183 | if len(fg_inds) > num_fg:
184 | disable_inds = npr.choice(
185 | fg_inds, size=(len(fg_inds) - num_fg), replace=False)# randomly disable the surplus positives
186 | labels[disable_inds] = -1# set them to -1 (don't care)
187 |
188 | # subsample negative labels if we have too many
189 | # the batch holds RPN_BATCHSIZE samples in total (e.g. 256), at most half of them positive;
190 | # if there are fewer positives than that cap, the shortfall is filled
191 | # with negatives to complete the batch
192 | num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
193 | bg_inds = np.where(labels == 0)[0]
194 | if len(bg_inds) > num_bg:
195 | disable_inds = npr.choice(
196 | bg_inds, size=(len(bg_inds) - num_bg), replace=False)
197 | labels[disable_inds] = -1
198 | #print "was %s inds, disabling %s, now %s inds" % (
199 | #len(bg_inds), len(disable_inds), np.sum(labels == 0))
200 |
201 | # labels are done; now compute the ground-truth rpn box regression targets
202 | #--------------------------------------------------------------
203 | bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
204 | bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])# regression targets: offsets between each anchor and its assigned gt box
205 |
206 |
207 | bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
208 | bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)# inside weights: foreground anchors get 1, all others 0
209 |
210 | bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
211 | if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:# uniform weighting for now: positives get 1, negatives 0
212 | # uniform weighting of examples (given non-uniform sampling)
213 | num_examples = np.sum(labels >= 0) + 1
214 | # positive_weights = np.ones((1, 4)) * 1.0 / num_examples
215 | # negative_weights = np.ones((1, 4)) * 1.0 / num_examples
216 | positive_weights = np.ones((1, 4))
217 | negative_weights = np.zeros((1, 4))
218 | else:
219 | assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
220 | (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
221 | positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
222 | (np.sum(labels == 1) + 1))  # +1 guards against division by zero
223 | negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
224 | (np.sum(labels == 0) + 1))
225 | bbox_outside_weights[labels == 1, :] = positive_weights# outside weights: balance fg/bg (uniform case: fg 1, bg 0)
226 | bbox_outside_weights[labels == 0, :] = negative_weights
227 |
228 | if DEBUG:
229 | _sums += bbox_targets[labels == 1, :].sum(axis=0)
230 | _squared_sums += (bbox_targets[labels == 1, :] ** 2).sum(axis=0)
231 | _counts += np.sum(labels == 1)
232 | means = _sums / _counts
233 | stds = np.sqrt(_squared_sums / _counts - means ** 2)
234 | print('means:')
235 | print(means)
236 | print('stdevs:')
237 | print(stds)
238 |
239 | # map up to original set of anchors
240 | # anchors that fell outside the image were dropped at the start; map the results back to the full set here
241 | labels = _unmap(labels, total_anchors, inds_inside, fill=-1)# dropped anchors get label -1 (don't care)
242 | bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)# their regression targets are 0 (no value)
243 | bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)# inside weights padded with 0
244 | bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)# outside weights padded with 0
245 |
246 | if DEBUG:
247 | print('rpn: max max_overlap', np.max(max_overlaps))
248 | print('rpn: num_positive', np.sum(labels == 1))
249 | print('rpn: num_negative', np.sum(labels == 0))
250 | _fg_sum += np.sum(labels == 1)
251 | _bg_sum += np.sum(labels == 0)
252 | _count += 1
253 | print('rpn: num_positive avg', _fg_sum / _count)
254 | print('rpn: num_negative avg', _bg_sum / _count)
255 |
256 | # labels
257 | labels = labels.reshape((1, height, width, A))# reshape the labels
258 | rpn_labels = labels
259 |
260 | # bbox_targets
261 | bbox_targets = bbox_targets \
262 | .reshape((1, height, width, A * 4))#reshape
263 |
264 | rpn_bbox_targets = bbox_targets
265 | # bbox_inside_weights
266 | bbox_inside_weights = bbox_inside_weights \
267 | .reshape((1, height, width, A * 4))
268 |
269 | rpn_bbox_inside_weights = bbox_inside_weights
270 |
271 | # bbox_outside_weights
272 | bbox_outside_weights = bbox_outside_weights \
273 | .reshape((1, height, width, A * 4))
274 | rpn_bbox_outside_weights = bbox_outside_weights
275 |
276 | return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
277 |
278 |
279 |
280 | def _unmap(data, count, inds, fill=0):
281 | """ Unmap a subset of item (data) back to the original set of items (of
282 | size count) """
283 | if len(data.shape) == 1:
284 | ret = np.empty((count, ), dtype=np.float32)
285 | ret.fill(fill)
286 | ret[inds] = data
287 | else:
288 | ret = np.empty((count, ) + data.shape[1:], dtype=np.float32)
289 | ret.fill(fill)
290 | ret[inds, :] = data
291 | return ret
292 |
293 |
294 | def _compute_targets(ex_rois, gt_rois):
295 | """Compute bounding-box regression targets for an image."""
296 |
297 | assert ex_rois.shape[0] == gt_rois.shape[0]
298 | assert ex_rois.shape[1] == 4
299 | assert gt_rois.shape[1] == 5
300 |
301 | return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)
302 |
--------------------------------------------------------------------------------
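The anchor enumeration above (meshgrid shifts plus a (1, A, 4) + (K, 1, 4) broadcast) is worth seeing in isolation. A self-contained sketch with a tiny 2x3 feature map and two made-up base anchors:

import numpy as np

feat_stride = 16
height, width = 2, 3                       # tiny feature map for illustration
base_anchors = np.array([[0, 0, 15, 15],   # two made-up base anchors
                         [0, -8, 15, 23]])
shift_x, shift_y = np.meshgrid(np.arange(width) * feat_stride,
                               np.arange(height) * feat_stride)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()  # (K, 4)
A, K = base_anchors.shape[0], shifts.shape[0]
all_anchors = (base_anchors.reshape((1, A, 4)) +
               shifts.reshape((1, K, 4)).transpose((1, 0, 2))).reshape(K * A, 4)
print(all_anchors.shape)  # (12, 4): every base anchor placed at every cell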
/lib/rpn_msr/generate_anchors.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | def generate_basic_anchors(sizes, base_size=16):
4 | base_anchor = np.array([0, 0, base_size - 1, base_size - 1], np.int32)
5 | anchors = np.zeros((len(sizes), 4), np.int32)
6 | index = 0
7 | for h, w in sizes:
8 | anchors[index] = scale_anchor(base_anchor, h, w)
9 | index += 1
10 | return anchors
11 |
12 |
13 | def scale_anchor(anchor, h, w):
14 | x_ctr = (anchor[0] + anchor[2]) * 0.5
15 | y_ctr = (anchor[1] + anchor[3]) * 0.5
16 | scaled_anchor = anchor.copy()
17 | scaled_anchor[0] = x_ctr - w / 2 # xmin
18 | scaled_anchor[2] = x_ctr + w / 2 # xmax
19 | scaled_anchor[1] = y_ctr - h / 2 # ymin
20 | scaled_anchor[3] = y_ctr + h / 2 # ymax
21 | return scaled_anchor
22 |
23 |
24 | def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
25 | scales=2**np.arange(3, 6)):
26 | heights = [11, 16, 23, 33, 48, 68, 97, 139, 198, 283]
27 | widths = [16]
28 | sizes = []
29 | for h in heights:
30 | for w in widths:
31 | sizes.append((h, w))
32 | return generate_basic_anchors(sizes)
33 |
34 | if __name__ == '__main__':
35 | import time
36 | t = time.time()
37 | a = generate_anchors()
38 | print(time.time() - t)
39 | print(a)
40 | from IPython import embed; embed()
41 |
--------------------------------------------------------------------------------
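Note that, unlike stock Faster R-CNN, generate_anchors here ignores its base_size/ratios/scales arguments and always returns ten 16-pixel-wide anchors whose heights follow a roughly geometric series, matching CTPN's fixed-width proposal design. Running it stand-alone makes that concrete (assuming the repo root is on PYTHONPATH):

from lib.rpn_msr.generate_anchors import generate_anchors

anchors = generate_anchors()
print(anchors.shape)                        # (10, 4): one anchor per height
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
print(widths)   # all 16
print(heights)  # close to [11, 16, 23, ..., 283]; integer truncation in
                # scale_anchor shifts some values by a pixel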
/lib/rpn_msr/proposal_layer_tf.py:
--------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 | import numpy as np
3 | from .generate_anchors import generate_anchors
4 | from lib.fast_rcnn.config import cfg
5 | from lib.fast_rcnn.bbox_transform import bbox_transform_inv, clip_boxes
6 | from lib.fast_rcnn.nms_wrapper import nms
7 |
8 |
9 | DEBUG = False
10 | """
11 | Outputs object detection proposals by applying estimated bounding-box
12 | transformations to a set of regular boxes (called "anchors").
13 | """
14 | def proposal_layer(rpn_cls_prob_reshape, rpn_bbox_pred, im_info, cfg_key, _feat_stride = [16,], anchor_scales = [16,]):
15 | """
16 | Parameters
17 | ----------
18 | rpn_cls_prob_reshape: (1 , H , W , Ax2) outputs of RPN, prob of bg or fg
19 | NOTICE: the old version is ordered by (1, H, W, 2, A) !!!!
20 | rpn_bbox_pred: (1 , H , W , Ax4), rgs boxes output of RPN
21 | im_info: a list of [image_height, image_width, scale_ratios]
22 | cfg_key: 'TRAIN' or 'TEST'
23 | _feat_stride: the downsampling ratio of feature map to the original input image
24 | anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16])
25 | ----------
26 | Returns
27 | ----------
28 | rpn_rois : (1 x H x W x A, 5) e.g. [0, x1, y1, x2, y2]
29 |
30 | # Algorithm:
31 | #
32 | # for each (H, W) location i
33 | # generate A anchor boxes centered on cell i
34 | # apply predicted bbox deltas at cell i to each of the A anchors
35 | # clip predicted boxes to image
36 | # remove predicted boxes with either height or width < threshold
37 | # sort all (proposal, score) pairs by score from highest to lowest
38 | # take top pre_nms_topN proposals before NMS
39 | # apply NMS with threshold 0.7 to remaining proposals
40 | # take after_nms_topN proposals after NMS
41 | # return the top proposals (-> RoIs top, scores top)
42 | #layer_params = yaml.load(self.param_str_)
43 |
44 | """
45 | # cfg_key=cfg_key.decode('ascii')
46 | _anchors = generate_anchors(scales=np.array(anchor_scales))# generate the base anchors (10 here)
47 | _num_anchors = _anchors.shape[0]# number of base anchors
48 |
49 | im_info = im_info[0]# original image height, width and scale ratio
50 |
51 | assert rpn_cls_prob_reshape.shape[0] == 1, \
52 | 'Only single item batches are supported'
53 |
54 | pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N# max number of proposals kept before NMS (e.g. 12000)
55 | post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N# max number of proposals kept after NMS (e.g. 2000)
56 | nms_thresh = cfg[cfg_key].RPN_NMS_THRESH# NMS threshold (e.g. 0.7)
57 | min_size = cfg[cfg_key].RPN_MIN_SIZE# minimum proposal size (currently 16): both height and width must be at least this
58 | # TODO: revisit this minimum size later; reduce it to 8?
59 |
60 | height, width = rpn_cls_prob_reshape.shape[1:3]# feature map height and width
61 |
62 | # the first set of _num_anchors channels are bg probs
63 | # the second set are the fg probs, which we want
64 | # (1, H, W, A)
65 | scores = np.reshape(np.reshape(rpn_cls_prob_reshape, [1, height, width, _num_anchors, 2])[:,:,:,:,1],
66 | [1, height, width, _num_anchors])
67 | # extract the objectness scores (the non-object ones are ignored)
68 | # and reshape them to (1, H, W, A)
69 |
70 | bbox_deltas = rpn_bbox_pred# the predicted deltas are relative offsets; they still need converting to real image coordinates
71 | #im_info = bottom[2].data[0, :]
72 |
73 | if DEBUG:
74 | print('im_size: ({}, {})'.format(im_info[0], im_info[1]))
75 | print('scale: {}'.format(im_info[2]))
76 |
77 | # 1. Generate proposals from bbox deltas and shifted anchors
78 | if DEBUG:
79 | print('score map size: {}'.format(scores.shape))
80 |
81 | # Enumerate all shifts
82 | # as in anchor_target_layer_tf, generate the anchor shifts to obtain all anchors over the whole image
83 | shift_x = np.arange(0, width) * _feat_stride
84 | shift_y = np.arange(0, height) * _feat_stride
85 | shift_x, shift_y = np.meshgrid(shift_x, shift_y)
86 | shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
87 | shift_x.ravel(), shift_y.ravel())).transpose()
88 |
89 | # Enumerate all shifted anchors:
90 | #
91 | # add A anchors (1, A, 4) to
92 | # cell K shifts (K, 1, 4) to get
93 | # shift anchors (K, A, 4)
94 | # reshape to (K*A, 4) shifted anchors
95 | A = _num_anchors
96 | K = shifts.shape[0]
97 | anchors = _anchors.reshape((1, A, 4)) + \
98 | shifts.reshape((1, K, 4)).transpose((1, 0, 2))
99 | anchors = anchors.reshape((K * A, 4))# these are all anchors over the whole image
100 |
101 | # Transpose and reshape predicted bbox transformations to get them
102 | # into the same order as the anchors:
103 | # bbox deltas will be (1, 4 * A, H, W) format
104 | # transpose to (1, H, W, 4 * A)
105 | # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
106 | # in slowest to fastest order
107 | bbox_deltas = bbox_deltas.reshape((-1, 4)) #(HxWxA, 4)
108 |
109 | # Same story for the scores:
110 | scores = scores.reshape((-1, 1))
111 |
112 | # Convert anchors into proposals via bbox transformations
113 | proposals = bbox_transform_inv(anchors, bbox_deltas)# inverse transform: recover the real box coordinates on the image
114 |
115 | # 2. clip predicted boxes to image
116 | proposals = clip_boxes(proposals, im_info[:2])# clip the proposals to the image boundary
117 |
118 | # 3. remove predicted boxes with either height or width < threshold
119 | # (NOTE: convert min_size to input image scale stored in im_info[2])
120 | keep = _filter_boxes(proposals, min_size * im_info[2])# remove proposals smaller than the minimum size
121 | proposals = proposals[keep, :]# keep the remaining proposals
122 | scores = scores[keep]
123 | bbox_deltas=bbox_deltas[keep,:]
124 |
125 |
126 | # # remove irregular boxes, too fat too tall
127 | # keep = _filter_irregular_boxes(proposals)
128 | # proposals = proposals[keep, :]
129 | # scores = scores[keep]
130 |
131 | # 4. sort all (proposal, score) pairs by score from highest to lowest
132 | # 5. take top pre_nms_topN (e.g. 6000)
133 | order = scores.ravel().argsort()[::-1]# sort proposals by score, highest first
134 | if pre_nms_topN > 0: # keep at most pre_nms_topN (e.g. 12000) proposals for NMS
135 | order = order[:pre_nms_topN]
136 | proposals = proposals[order, :]
137 | scores = scores[order]
138 | bbox_deltas=bbox_deltas[order,:]
139 |
140 |
141 | # 6. apply nms (e.g. threshold = 0.7)
142 | # 7. take after_nms_topN (e.g. 300)
143 | # 8. return the top proposals (-> RoIs top)
144 | keep = nms(np.hstack((proposals, scores)), nms_thresh)# apply NMS, keeping at most post_nms_topN (e.g. 2000) proposals
145 | if post_nms_topN > 0:
146 | keep = keep[:post_nms_topN]
147 | proposals = proposals[keep, :]
148 | scores = scores[keep]
149 | bbox_deltas=bbox_deltas[keep,:]
150 |
151 |
152 | # Output rois blob
153 | # Our RPN implementation only supports a single input image, so all
154 | # batch inds are 0
155 | blob = np.hstack((scores.astype(np.float32, copy=False), proposals.astype(np.float32, copy=False)))
156 |
157 | return blob,bbox_deltas
158 |
159 |
160 | def _filter_boxes(boxes, min_size):
161 | """Remove all boxes with any side smaller than min_size."""
162 | ws = boxes[:, 2] - boxes[:, 0] + 1
163 | hs = boxes[:, 3] - boxes[:, 1] + 1
164 | keep = np.where((ws >= min_size) & (hs >= min_size))[0]
165 | return keep
166 |
167 | def _filter_irregular_boxes(boxes, min_ratio = 0.2, max_ratio = 5):
168 | """Remove boxes with extreme aspect ratios (too fat or too tall)."""
169 | ws = boxes[:, 2] - boxes[:, 0] + 1
170 | hs = boxes[:, 3] - boxes[:, 1] + 1
171 | rs = ws / hs
172 | keep = np.where((rs <= max_ratio) & (rs >= min_ratio))[0]
173 | return keep
174 |
--------------------------------------------------------------------------------
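One subtlety above is step 3: min_size is defined on the original image, so it is multiplied by the scale factor im_info[2] before filtering boxes that live in the rescaled image. A toy check of that logic (made-up boxes; the local filter mirrors _filter_boxes):

import numpy as np

def filter_boxes(boxes, min_size):
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    return np.where((ws >= min_size) & (hs >= min_size))[0]

boxes = np.array([[0, 0, 20, 20],    # 21x21: survives
                  [0, 0, 10, 40]])   # only 11 wide: dropped
min_size, im_scale = 16, 1.0         # with im_scale = 2.0 both would be dropped
print(filter_boxes(boxes, min_size * im_scale))  # [0]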
/lib/text_connector/__init__.py:
--------------------------------------------------------------------------------
1 | from .detectors import TextDetector
2 | from .text_connect_cfg import Config
3 |
--------------------------------------------------------------------------------
/lib/text_connector/detectors.py:
--------------------------------------------------------------------------------
1 | #coding:utf-8
2 | import numpy as np
3 | from lib.fast_rcnn.nms_wrapper import nms
4 | from lib.fast_rcnn.config import cfg
5 | from .text_proposal_connector import TextProposalConnector
6 | from .text_proposal_connector_oriented import TextProposalConnector as TextProposalConnectorOriented
7 | from .text_connect_cfg import Config as TextLineCfg
8 |
9 |
10 | class TextDetector:
11 | def __init__(self):
12 | self.mode= cfg.TEST.DETECT_MODE
13 | if self.mode == "H":
14 | self.text_proposal_connector=TextProposalConnector()
15 | elif self.mode == "O":
16 | self.text_proposal_connector=TextProposalConnectorOriented()
17 |
18 |
19 | def detect(self, text_proposals,scores,size):
20 | # drop proposals with low scores
21 | keep_inds=np.where(scores>TextLineCfg.TEXT_PROPOSALS_MIN_SCORE)[0]
22 | text_proposals, scores=text_proposals[keep_inds], scores[keep_inds]
23 |
24 | # sort the proposals by score
25 | sorted_indices=np.argsort(scores.ravel())[::-1]
26 | text_proposals, scores=text_proposals[sorted_indices], scores[sorted_indices]
27 |
28 | # apply NMS to the proposals
29 | keep_inds=nms(np.hstack((text_proposals, scores)), TextLineCfg.TEXT_PROPOSALS_NMS_THRESH)
30 | text_proposals, scores=text_proposals[keep_inds], scores[keep_inds]
31 |
32 | # connect the proposals into text lines to get the detection results
33 | text_recs=self.text_proposal_connector.get_text_lines(text_proposals, scores, size)
34 | keep_inds=self.filter_boxes(text_recs)
35 | return text_recs[keep_inds]
36 |
37 | def filter_boxes(self, boxes):
38 | heights=np.zeros((len(boxes), 1), np.float)
39 | widths=np.zeros((len(boxes), 1), np.float)
40 | scores=np.zeros((len(boxes), 1), np.float)
41 | index=0
42 | for box in boxes:
43 | heights[index]=(abs(box[5]-box[1])+abs(box[7]-box[3]))/2.0+1
44 | widths[index]=(abs(box[2]-box[0])+abs(box[6]-box[4]))/2.0+1
45 | scores[index] = box[8]
46 | index += 1
47 |
48 | return np.where((widths/heights>TextLineCfg.MIN_RATIO) & (scores>TextLineCfg.LINE_MIN_SCORE) &
49 | (widths>(TextLineCfg.TEXT_PROPOSALS_WIDTH*TextLineCfg.MIN_NUM_PROPOSALS)))[0]
--------------------------------------------------------------------------------
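For orientation, a hedged usage sketch of TextDetector: it expects the (N, 4) proposals and (N, 1) scores produced by the RPN stage plus the image size, and returns (M, 9) rows of four corner points and a score. The proposal values below are fabricated; in the real pipeline they come from test.py. This assumes the Cython utils have been built (lib/utils/make.sh) and that the config, usually loaded from text.yml, provides cfg.TEST.DETECT_MODE.

import numpy as np
from lib.text_connector.detectors import TextDetector

detector = TextDetector()  # mode ("H"/"O") is read from cfg.TEST.DETECT_MODE
# fabricated fixed-width proposals on one text line, scores above the 0.7 cutoff
boxes = np.array([[x, 100, x + 15, 130] for x in range(100, 260, 16)],
                 dtype=np.float32)
scores = np.full((len(boxes), 1), 0.95, dtype=np.float32)
text_recs = detector.detect(boxes, scores, (600, 800))  # (img_h, img_w)
print(text_recs.shape)  # (M, 9): x1,y1,...,x4,y4,score per detected text line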
/lib/text_connector/other.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def threshold(coords, min_, max_):
5 | return np.maximum(np.minimum(coords, max_), min_)
6 |
7 | def clip_boxes(boxes, im_shape):
8 | """
9 | Clip boxes to image boundaries.
10 | """
11 | boxes[:, 0::2]=threshold(boxes[:, 0::2], 0, im_shape[1]-1)
12 | boxes[:, 1::2]=threshold(boxes[:, 1::2], 0, im_shape[0]-1)
13 | return boxes
14 |
15 |
16 | class Graph:
17 | def __init__(self, graph):
18 | self.graph=graph
19 |
20 | def sub_graphs_connected(self):
21 | sub_graphs=[]
22 | for index in range(self.graph.shape[0]):
23 | if not self.graph[:, index].any() and self.graph[index, :].any():
24 | v=index
25 | sub_graphs.append([v])
26 | while self.graph[v, :].any():
27 | v=np.where(self.graph[v, :])[0][0]
28 | sub_graphs[-1].append(v)
29 | return sub_graphs
30 |
31 |
--------------------------------------------------------------------------------
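Graph.sub_graphs_connected walks the boolean adjacency matrix: every node with an outgoing edge but no incoming edge starts a chain, which is then followed link by link until it ends. A toy 4-node example (0 -> 1 -> 2, plus an isolated node 3):

import numpy as np
from lib.text_connector.other import Graph

adj = np.zeros((4, 4), dtype=bool)
adj[0, 1] = True   # 0 -> 1
adj[1, 2] = True   # 1 -> 2
print(Graph(adj).sub_graphs_connected())  # [[0, 1, 2]]; node 3 has no edges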
/lib/text_connector/text_connect_cfg.py:
--------------------------------------------------------------------------------
1 | class Config:
2 | SCALE=600
3 | MAX_SCALE=1200
4 | TEXT_PROPOSALS_WIDTH=16
5 | MIN_NUM_PROPOSALS = 2
6 | MIN_RATIO=0.5
7 | LINE_MIN_SCORE=0.9
8 | MAX_HORIZONTAL_GAP=50
9 | TEXT_PROPOSALS_MIN_SCORE=0.7
10 | TEXT_PROPOSALS_NMS_THRESH=0.2
11 | MIN_V_OVERLAPS=0.7
12 | MIN_SIZE_SIM=0.7
13 |
14 |
15 |
--------------------------------------------------------------------------------
/lib/text_connector/text_proposal_connector.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from .other import clip_boxes
3 | from .text_proposal_graph_builder import TextProposalGraphBuilder
4 |
5 | class TextProposalConnector:
6 | def __init__(self):
7 | self.graph_builder=TextProposalGraphBuilder()
8 |
9 | def group_text_proposals(self, text_proposals, scores, im_size):
10 | graph=self.graph_builder.build_graph(text_proposals, scores, im_size)
11 | return graph.sub_graphs_connected()
12 |
13 | def fit_y(self, X, Y, x1, x2):
14 | assert len(X) != 0
15 | # if all X values are identical, fall back to the constant line y=Y[0]
16 | if np.sum(X==X[0])==len(X):
17 | return Y[0], Y[0]
18 | p=np.poly1d(np.polyfit(X, Y, 1))
19 | return p(x1), p(x2)
20 |
21 | def get_text_lines(self, text_proposals, scores, im_size):
22 | # tp=text proposal
23 | tp_groups=self.group_text_proposals(text_proposals, scores, im_size)
24 | text_lines=np.zeros((len(tp_groups), 5), np.float32)
25 |
26 | for index, tp_indices in enumerate(tp_groups):
27 | text_line_boxes=text_proposals[list(tp_indices)]
28 |
29 | x0=np.min(text_line_boxes[:, 0])
30 | x1=np.max(text_line_boxes[:, 2])
31 |
32 | offset=(text_line_boxes[0, 2]-text_line_boxes[0, 0])*0.5
33 |
34 | lt_y, rt_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 1], x0+offset, x1-offset)
35 | lb_y, rb_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 3], x0+offset, x1-offset)
36 |
37 | # the score of a text line is the average score of the scores
38 | # of all text proposals contained in the text line
39 | score=scores[list(tp_indices)].sum()/float(len(tp_indices))
40 |
41 | text_lines[index, 0]=x0
42 | text_lines[index, 1]=min(lt_y, rt_y)
43 | text_lines[index, 2]=x1
44 | text_lines[index, 3]=max(lb_y, rb_y)
45 | text_lines[index, 4]=score
46 |
47 | text_lines=clip_boxes(text_lines, im_size)
48 |
49 | text_recs = np.zeros((len(text_lines), 9), np.float)
50 | index = 0
51 | for line in text_lines:
52 | xmin,ymin,xmax,ymax=line[0],line[1],line[2],line[3]
53 | text_recs[index, 0] = xmin
54 | text_recs[index, 1] = ymin
55 | text_recs[index, 2] = xmax
56 | text_recs[index, 3] = ymin
57 | text_recs[index, 4] = xmin
58 | text_recs[index, 5] = ymax
59 | text_recs[index, 6] = xmax
60 | text_recs[index, 7] = ymax
61 | text_recs[index, 8] = line[4]
62 | index = index + 1
63 |
64 | return text_recs
65 |
--------------------------------------------------------------------------------
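fit_y is a one-degree np.polyfit: it fits y = kx + b through the given points and evaluates that line at x1 and x2, with a constant-line fallback when every x is identical. A standalone check with points on a known line (the local function mirrors the method above):

import numpy as np

def fit_y(X, Y, x1, x2):
    if np.sum(X == X[0]) == len(X):    # all x identical: degenerate fit
        return Y[0], Y[0]
    p = np.poly1d(np.polyfit(X, Y, 1))
    return p(x1), p(x2)

X = np.array([0.0, 1.0, 2.0])
Y = 0.5 * X + 3.0                       # points exactly on y = 0.5x + 3
print(fit_y(X, Y, 10.0, 20.0))          # (8.0, 13.0)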
/lib/text_connector/text_proposal_connector_oriented.py:
--------------------------------------------------------------------------------
1 | #coding:utf-8
2 | import numpy as np
3 | from .text_proposal_graph_builder import TextProposalGraphBuilder
4 |
5 | class TextProposalConnector:
6 | """
7 | Connect text proposals into text lines
8 | """
9 | def __init__(self):
10 | self.graph_builder=TextProposalGraphBuilder()
11 |
12 | def group_text_proposals(self, text_proposals, scores, im_size):
13 | graph=self.graph_builder.build_graph(text_proposals, scores, im_size)
14 | return graph.sub_graphs_connected()
15 |
16 | def fit_y(self, X, Y, x1, x2):
17 | assert len(X) != 0
18 | # if all X values are identical, fall back to the constant line y=Y[0]
19 | if np.sum(X==X[0])==len(X):
20 | return Y[0], Y[0]
21 | p=np.poly1d(np.polyfit(X, Y, 1))
22 | return p(x1), p(x2)
23 |
24 | def get_text_lines(self, text_proposals, scores, im_size):
25 | """
26 | text_proposals: the (N, 4) boxes to be connected into text lines
27 |
28 | """
29 | # tp=text proposal
30 | tp_groups=self.group_text_proposals(text_proposals, scores, im_size)# first build the graph to find which small boxes make up each text line
31 |
32 | text_lines=np.zeros((len(tp_groups), 8), np.float32)
33 |
34 | for index, tp_indices in enumerate(tp_groups):
35 | text_line_boxes=text_proposals[list(tp_indices)]# all the small boxes of this text line
36 | X = (text_line_boxes[:,0] + text_line_boxes[:,2]) / 2# center x (and y below) of each small box
37 | Y = (text_line_boxes[:,1] + text_line_boxes[:,3]) / 2
38 |
39 | z1 = np.polyfit(X,Y,1)# least-squares line fit through the box centers
40 |
41 | x0=np.min(text_line_boxes[:, 0])# minimum x coordinate of the text line
42 | x1=np.max(text_line_boxes[:, 2])# maximum x coordinate of the text line
43 |
44 | offset=(text_line_boxes[0, 2]-text_line_boxes[0, 0])*0.5# half the width of a small box
45 |
46 | # fit a line through the top-left corners of all small boxes, then get the y values at the leftmost/rightmost x of the text line
47 | lt_y, rt_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 1], x0+offset, x1-offset)
48 | # fit a line through the bottom-left corners of all small boxes, then get the y values at the leftmost/rightmost x of the text line
49 | lb_y, rb_y=self.fit_y(text_line_boxes[:, 0], text_line_boxes[:, 3], x0+offset, x1-offset)
50 |
51 | score=scores[list(tp_indices)].sum()/float(len(tp_indices))# the mean score of the small boxes is the text line score
52 |
53 | text_lines[index, 0]=x0
54 | text_lines[index, 1]=min(lt_y, rt_y)# smaller y of the text line's top edge
55 | text_lines[index, 2]=x1
56 | text_lines[index, 3]=max(lb_y, rb_y)# larger y of the text line's bottom edge
57 | text_lines[index, 4]=score# text line score
58 | text_lines[index, 5]=z1[0]# slope k and intercept b of the line fitted through the centers
59 | text_lines[index, 6]=z1[1]
60 | height = np.mean( (text_line_boxes[:,3]-text_line_boxes[:,1]) )# mean height of the small boxes
61 | text_lines[index, 7]= height + 2.5
62 |
63 | text_recs = np.zeros((len(text_lines), 9), np.float)
64 | index = 0
65 | for line in text_lines:
66 | b1 = line[6] - line[7] / 2 # from the height and the fitted center line, derive the intercepts of the top and bottom edges
67 | b2 = line[6] + line[7] / 2
68 | x1 = line[0]
69 | y1 = line[5] * line[0] + b1 # top-left
70 | x2 = line[2]
71 | y2 = line[5] * line[2] + b1 # top-right
72 | x3 = line[0]
73 | y3 = line[5] * line[0] + b2 # bottom-left
74 | x4 = line[2]
75 | y4 = line[5] * line[2] + b2 # bottom-right
76 | disX = x2 - x1
77 | disY = y2 - y1
78 | width = np.sqrt(disX * disX + disY * disY) # text line width
79 |
80 | fTmp0 = y3 - y1 # text line height
81 | fTmp1 = fTmp0 * disY / width
82 | x = np.fabs(fTmp1 * disX / width) # compensation for the slanted edges
83 | y = np.fabs(fTmp1 * disY / width)
84 | if line[5] < 0:
85 | x1 -= x
86 | y1 += y
87 | x4 += x
88 | y4 -= y
89 | else:
90 | x2 += x
91 | y2 += y
92 | x3 -= x
93 | y3 -= y
94 | text_recs[index, 0] = x1
95 | text_recs[index, 1] = y1
96 | text_recs[index, 2] = x2
97 | text_recs[index, 3] = y2
98 | text_recs[index, 4] = x3
99 | text_recs[index, 5] = y3
100 | text_recs[index, 6] = x4
101 | text_recs[index, 7] = y4
102 | text_recs[index, 8] = line[4]
103 | index = index + 1
104 |
105 | return text_recs
106 |
--------------------------------------------------------------------------------
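The oriented connector stores each line as slope k = line[5], center intercept b = line[6] and height line[7], then shifts the intercept by ±height/2 to get the top and bottom edges before evaluating them at x0 and x1. A worked numeric example of that step (all numbers invented):

# center line y = 0.1 * x + 50, text height 20 (invented values)
k, b_center, h = 0.1, 50.0, 20.0
b_top, b_bottom = b_center - h / 2, b_center + h / 2   # 40.0 and 60.0
x0, x1 = 100.0, 200.0
top_left, top_right = k * x0 + b_top, k * x1 + b_top              # 50.0, 60.0
bottom_left, bottom_right = k * x0 + b_bottom, k * x1 + b_bottom  # 70.0, 80.0
print((top_left, top_right, bottom_left, bottom_right))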
/lib/text_connector/text_proposal_graph_builder.py:
--------------------------------------------------------------------------------
1 | from .text_connect_cfg import Config as TextLineCfg
2 | from .other import Graph
3 | import numpy as np
4 |
5 |
6 | class TextProposalGraphBuilder:
7 | """
8 | Build Text proposals into a graph.
9 | """
10 | def get_successions(self, index):
11 | box=self.text_proposals[index]
12 | results=[]
13 | for left in range(int(box[0])+1, min(int(box[0])+TextLineCfg.MAX_HORIZONTAL_GAP+1, self.im_size[1])):
14 | adj_box_indices=self.boxes_table[left]
15 | for adj_box_index in adj_box_indices:
16 | if self.meet_v_iou(adj_box_index, index):
17 | results.append(adj_box_index)
18 | if len(results)!=0:
19 | return results
20 | return results
21 |
22 | def get_precursors(self, index):
23 | box=self.text_proposals[index]
24 | results=[]
25 | for left in range(int(box[0])-1, max(int(box[0]-TextLineCfg.MAX_HORIZONTAL_GAP), 0)-1, -1):
26 | adj_box_indices=self.boxes_table[left]
27 | for adj_box_index in adj_box_indices:
28 | if self.meet_v_iou(adj_box_index, index):
29 | results.append(adj_box_index)
30 | if len(results)!=0:
31 | return results
32 | return results
33 |
34 | def is_succession_node(self, index, succession_index):
35 | precursors=self.get_precursors(succession_index)
36 | if self.scores[index]>=np.max(self.scores[precursors]):
37 | return True
38 | return False
39 |
40 | def meet_v_iou(self, index1, index2):
41 | def overlaps_v(index1, index2):
42 | h1=self.heights[index1]
43 | h2=self.heights[index2]
44 | y0=max(self.text_proposals[index2][1], self.text_proposals[index1][1])
45 | y1=min(self.text_proposals[index2][3], self.text_proposals[index1][3])
46 | return max(0, y1-y0+1)/min(h1, h2)
47 |
48 | def size_similarity(index1, index2):
49 | h1=self.heights[index1]
50 | h2=self.heights[index2]
51 | return min(h1, h2)/max(h1, h2)
52 |
53 | return overlaps_v(index1, index2)>=TextLineCfg.MIN_V_OVERLAPS and \
54 | size_similarity(index1, index2)>=TextLineCfg.MIN_SIZE_SIM
55 |
56 | def build_graph(self, text_proposals, scores, im_size):
57 | self.text_proposals=text_proposals
58 | self.scores=scores
59 | self.im_size=im_size
60 | self.heights=text_proposals[:, 3]-text_proposals[:, 1]+1
61 |
62 | boxes_table=[[] for _ in range(self.im_size[1])]
63 | for index, box in enumerate(text_proposals):
64 | boxes_table[int(box[0])].append(index)
65 | self.boxes_table=boxes_table
66 |
67 | graph=np.zeros((text_proposals.shape[0], text_proposals.shape[0]), np.bool)
68 |
69 | for index, box in enumerate(text_proposals):
70 | successions=self.get_successions(index)
71 | if len(successions)==0:
72 | continue
73 | succession_index=successions[np.argmax(scores[successions])]
74 | if self.is_succession_node(index, succession_index):
75 | # NOTE: a box can have multiple successions(precursors) if multiple successions(precursors)
76 | # have equal scores.
77 | graph[index, succession_index]=True
78 | return Graph(graph)
79 |
--------------------------------------------------------------------------------
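TextProposalGraphBuilder links box i to a successor j only when j starts within MAX_HORIZONTAL_GAP pixels to the right, their vertical IoU is at least MIN_V_OVERLAPS, their heights are similar, and i scores at least as high as j's other precursors. A small end-to-end check with three fabricated 16-pixel-wide proposals on one line:

import numpy as np
from lib.text_connector.text_proposal_graph_builder import TextProposalGraphBuilder

# three adjacent fixed-width proposals with nearly identical vertical extent
proposals = np.array([[100, 50, 115, 80],
                      [116, 51, 131, 81],
                      [132, 50, 147, 80]], dtype=np.float32)
scores = np.array([[0.9], [0.8], [0.9]], dtype=np.float32)
graph = TextProposalGraphBuilder().build_graph(proposals, scores, (600, 800))
print(graph.sub_graphs_connected())  # expect one chain: [[0, 1, 2]]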
/lib/utils/__init__.py:
--------------------------------------------------------------------------------
1 | from . import boxes_grid
2 | from . import blob
3 | from . import timer
--------------------------------------------------------------------------------
/lib/utils/bbox.pyx:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Sergey Karayev
6 | # --------------------------------------------------------
7 |
8 | cimport cython
9 | import numpy as np
10 | cimport numpy as np
11 |
12 | DTYPE = np.float
13 | ctypedef np.float_t DTYPE_t
14 |
15 | def bbox_overlaps(
16 | np.ndarray[DTYPE_t, ndim=2] boxes,
17 | np.ndarray[DTYPE_t, ndim=2] query_boxes):
18 | """
19 | Parameters
20 | ----------
21 | boxes: (N, 4) ndarray of float
22 | query_boxes: (K, 4) ndarray of float
23 | Returns
24 | -------
25 | overlaps: (N, K) ndarray of overlap between boxes and query_boxes
26 | """
27 | cdef unsigned int N = boxes.shape[0]
28 | cdef unsigned int K = query_boxes.shape[0]
29 | cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
30 | cdef DTYPE_t iw, ih, box_area
31 | cdef DTYPE_t ua
32 | cdef unsigned int k, n
33 | for k in range(K):
34 | box_area = (
35 | (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
36 | (query_boxes[k, 3] - query_boxes[k, 1] + 1)
37 | )
38 | for n in range(N):
39 | iw = (
40 | min(boxes[n, 2], query_boxes[k, 2]) -
41 | max(boxes[n, 0], query_boxes[k, 0]) + 1
42 | )
43 | if iw > 0:
44 | ih = (
45 | min(boxes[n, 3], query_boxes[k, 3]) -
46 | max(boxes[n, 1], query_boxes[k, 1]) + 1
47 | )
48 | if ih > 0:
49 | ua = float(
50 | (boxes[n, 2] - boxes[n, 0] + 1) *
51 | (boxes[n, 3] - boxes[n, 1] + 1) +
52 | box_area - iw * ih
53 | )
54 | overlaps[n, k] = iw * ih / ua
55 | return overlaps
56 |
57 | def bbox_intersections(
58 | np.ndarray[DTYPE_t, ndim=2] boxes,
59 | np.ndarray[DTYPE_t, ndim=2] query_boxes):
60 | """
61 | For each query box compute the intersection ratio covered by boxes
62 | ----------
63 | Parameters
64 | ----------
65 | boxes: (N, 4) ndarray of float
66 | query_boxes: (K, 4) ndarray of float
67 | Returns
68 | -------
69 | overlaps: (N, K) ndarray of intersec between boxes and query_boxes
70 | """
71 | cdef unsigned int N = boxes.shape[0]
72 | cdef unsigned int K = query_boxes.shape[0]
73 | cdef np.ndarray[DTYPE_t, ndim=2] intersec = np.zeros((N, K), dtype=DTYPE)
74 | cdef DTYPE_t iw, ih, box_area
75 | cdef DTYPE_t ua
76 | cdef unsigned int k, n
77 | for k in range(K):
78 | box_area = (
79 | (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
80 | (query_boxes[k, 3] - query_boxes[k, 1] + 1)
81 | )
82 | for n in range(N):
83 | iw = (
84 | min(boxes[n, 2], query_boxes[k, 2]) -
85 | max(boxes[n, 0], query_boxes[k, 0]) + 1
86 | )
87 | if iw > 0:
88 | ih = (
89 | min(boxes[n, 3], query_boxes[k, 3]) -
90 | max(boxes[n, 1], query_boxes[k, 1]) + 1
91 | )
92 | if ih > 0:
93 | intersec[n, k] = iw * ih / box_area
94 | return intersec
--------------------------------------------------------------------------------
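The Cython loop above is equivalent to the vectorized numpy below; the snippet is a readable reference (and a way to sanity-check the compiled module), not a replacement, since the Cython version is what the training code imports.

import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """IoU between every box (N, 4) and every query box (K, 4)."""
    b_area = ((boxes[:, 2] - boxes[:, 0] + 1) *
              (boxes[:, 3] - boxes[:, 1] + 1))[:, None]              # (N, 1)
    q_area = ((query_boxes[:, 2] - query_boxes[:, 0] + 1) *
              (query_boxes[:, 3] - query_boxes[:, 1] + 1))[None, :]  # (1, K)
    iw = (np.minimum(boxes[:, 2][:, None], query_boxes[:, 2][None, :]) -
          np.maximum(boxes[:, 0][:, None], query_boxes[:, 0][None, :]) + 1).clip(0)
    ih = (np.minimum(boxes[:, 3][:, None], query_boxes[:, 3][None, :]) -
          np.maximum(boxes[:, 1][:, None], query_boxes[:, 1][None, :]) + 1).clip(0)
    inter = iw * ih
    return inter / (b_area + q_area - inter)

a = np.array([[0., 0., 9., 9.]])
b = np.array([[5., 5., 14., 14.]])
print(bbox_overlaps_np(a, b))  # 25 / (100 + 100 - 25) = 0.142857...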
/lib/utils/blob.py:
--------------------------------------------------------------------------------
1 | """Blob helper functions."""
2 | import numpy as np
3 | import cv2
4 | from ..fast_rcnn.config import cfg
5 |
6 | def im_list_to_blob(ims):
7 | """Convert a list of images into a network input.
8 |
9 | Assumes images are already prepared (means subtracted, BGR order, ...).
10 | """
11 | max_shape = np.array([im.shape for im in ims]).max(axis=0)
12 | num_images = len(ims)
13 | blob = np.zeros((num_images, max_shape[0], max_shape[1], 3),
14 | dtype=np.float32)
15 | for i in range(num_images):
16 | im = ims[i]
17 | blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
18 |
19 | return blob
20 |
21 | def prep_im_for_blob(im, pixel_means, target_size, max_size):
22 | """Mean subtract and scale an image for use in a blob."""
23 | im = im.astype(np.float32, copy=False)
24 | im -= pixel_means
25 | im_shape = im.shape
26 | im_size_min = np.min(im_shape[0:2])
27 | im_size_max = np.max(im_shape[0:2])
28 | im_scale = float(target_size) / float(im_size_min)
29 | # Prevent the biggest axis from being more than MAX_SIZE
30 | if np.round(im_scale * im_size_max) > max_size:
31 | im_scale = float(max_size) / float(im_size_max)
32 | if cfg.TRAIN.RANDOM_DOWNSAMPLE:
33 | r = 0.6 + np.random.rand() * 0.4
34 | im_scale *= r
35 | im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
36 | interpolation=cv2.INTER_LINEAR)
37 |
38 | return im, im_scale
39 |
--------------------------------------------------------------------------------
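The scaling rule in prep_im_for_blob: scale so the short side hits target_size, but shrink further if that would push the long side past max_size. Worked numbers (target_size/max_size and the 480x1440 image are example values, not necessarily this repo's config):

import numpy as np

target_size, max_size = 600, 1000     # example values, not necessarily cfg's
im_shape = (480, 1440)                # invented H x W
im_size_min, im_size_max = min(im_shape), max(im_shape)
im_scale = float(target_size) / im_size_min        # 600 / 480 = 1.25
if np.round(im_scale * im_size_max) > max_size:    # 1.25 * 1440 = 1800 > 1000
    im_scale = float(max_size) / im_size_max       # fall back to 1000 / 1440
print(im_scale)  # ~0.694: here the long side, not the short one, binds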
/lib/utils/boxes_grid.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Subcategory CNN
3 | # Copyright (c) 2015 CVGL Stanford
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Yu Xiang
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 | import math
10 | # TODO: make fast_rcnn irrelevant
11 | # >>>> obsolete, because it depends on sth outside of this project
12 | from ..fast_rcnn.config import cfg
13 | # <<<< obsolete
14 |
15 | def get_boxes_grid(image_height, image_width):
16 | """
17 | Return the boxes on image grid.
18 | calling this function when cfg.IS_MULTISCALE is True, otherwise, calling rdl_roidb.prepare_roidb(imdb) instead.
19 | """
20 |
21 | # fixed a bug: changed cfg.TRAIN.SCALES to cfg.TRAIN.SCALES_BASE,
22 | # because a ratio around 1.0 is needed here, not the actual size.
23 | # height and width of the feature map
24 | if cfg.NET_NAME == 'CaffeNet':
25 | height = np.floor((image_height * max(cfg.TRAIN.SCALES_BASE) - 1) / 4.0 + 1)
26 | height = np.floor((height - 1) / 2.0 + 1 + 0.5)
27 | height = np.floor((height - 1) / 2.0 + 1 + 0.5)
28 |
29 | width = np.floor((image_width * max(cfg.TRAIN.SCALES_BASE) - 1) / 4.0 + 1)
30 | width = np.floor((width - 1) / 2.0 + 1 + 0.5)
31 | width = np.floor((width - 1) / 2.0 + 1 + 0.5)
32 | elif cfg.NET_NAME == 'VGGnet':
33 | height = np.floor(image_height * max(cfg.TRAIN.SCALES_BASE) / 2.0 + 0.5)
34 | height = np.floor(height / 2.0 + 0.5)
35 | height = np.floor(height / 2.0 + 0.5)
36 | height = np.floor(height / 2.0 + 0.5)
37 |
38 | width = np.floor(image_width * max(cfg.TRAIN.SCALES_BASE) / 2.0 + 0.5)
39 | width = np.floor(width / 2.0 + 0.5)
40 | width = np.floor(width / 2.0 + 0.5)
41 | width = np.floor(width / 2.0 + 0.5)
42 | else:
43 | assert False, 'The network architecture is not supported in utils.get_boxes_grid!'
44 |
45 | # compute the grid box centers
46 | h = np.arange(height)
47 | w = np.arange(width)
48 | y, x = np.meshgrid(h, w, indexing='ij')
49 | centers = np.dstack((x, y))
50 | centers = np.reshape(centers, (-1, 2))
51 | num = centers.shape[0]
52 |
53 | # compute width and height of grid box
54 | area = cfg.TRAIN.KERNEL_SIZE * cfg.TRAIN.KERNEL_SIZE
55 | aspect = cfg.TRAIN.ASPECTS # height / width
56 | num_aspect = len(aspect)
57 | widths = np.zeros((1, num_aspect), dtype=np.float32)
58 | heights = np.zeros((1, num_aspect), dtype=np.float32)
59 | for i in range(num_aspect):
60 | widths[0,i] = math.sqrt(area / aspect[i])
61 | heights[0,i] = widths[0,i] * aspect[i]
62 |
63 | # construct grid boxes
64 | centers = np.repeat(centers, num_aspect, axis=0)
65 | widths = np.tile(widths, num).transpose()
66 | heights = np.tile(heights, num).transpose()
67 |
68 | x1 = np.reshape(centers[:,0], (-1, 1)) - widths * 0.5
69 | x2 = np.reshape(centers[:,0], (-1, 1)) + widths * 0.5
70 | y1 = np.reshape(centers[:,1], (-1, 1)) - heights * 0.5
71 | y2 = np.reshape(centers[:,1], (-1, 1)) + heights * 0.5
72 |
73 | boxes_grid = np.hstack((x1, y1, x2, y2)) / cfg.TRAIN.SPATIAL_SCALE
74 |
75 | return boxes_grid, centers[:,0], centers[:,1]
76 |
--------------------------------------------------------------------------------
/lib/utils/cython_nms.pyx:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 | cimport numpy as np
10 |
11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
12 | return a if a >= b else b
13 |
14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
15 | return a if a <= b else b
16 |
17 | def nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh):
18 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
19 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
20 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
21 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
22 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
23 |
24 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
25 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]
26 |
27 | cdef int ndets = dets.shape[0]
28 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \
29 | np.zeros((ndets), dtype=np.int)
30 |
31 | # nominal indices
32 | cdef int _i, _j
33 | # sorted indices
34 | cdef int i, j
35 | # temp variables for box i's (the box currently under consideration)
36 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea
37 | # variables for computing overlap with box j (lower scoring box)
38 | cdef np.float32_t xx1, yy1, xx2, yy2
39 | cdef np.float32_t w, h
40 | cdef np.float32_t inter, ovr
41 |
42 | keep = []
43 | for _i in range(ndets):
44 | i = order[_i]
45 | if suppressed[i] == 1:
46 | continue
47 | keep.append(i)
48 | ix1 = x1[i]
49 | iy1 = y1[i]
50 | ix2 = x2[i]
51 | iy2 = y2[i]
52 | iarea = areas[i]
53 | for _j in range(_i + 1, ndets):
54 | j = order[_j]
55 | if suppressed[j] == 1:
56 | continue
57 | xx1 = max(ix1, x1[j])
58 | yy1 = max(iy1, y1[j])
59 | xx2 = min(ix2, x2[j])
60 | yy2 = min(iy2, y2[j])
61 | w = max(0.0, xx2 - xx1 + 1)
62 | h = max(0.0, yy2 - yy1 + 1)
63 | inter = w * h
64 | ovr = inter / (iarea + areas[j] - inter)
65 | if ovr >= thresh:
66 | suppressed[j] = 1
67 |
68 | return keep
69 |
70 | def nms_new(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh):
71 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
72 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
73 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
74 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
75 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
76 |
77 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
78 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]
79 |
80 | cdef int ndets = dets.shape[0]
81 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \
82 | np.zeros((ndets), dtype=np.int)
83 |
84 | # nominal indices
85 | cdef int _i, _j
86 | # sorted indices
87 | cdef int i, j
88 | # temp variables for box i's (the box currently under consideration)
89 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea
90 | # variables for computing overlap with box j (lower scoring box)
91 | cdef np.float32_t xx1, yy1, xx2, yy2
92 | cdef np.float32_t w, h
93 | cdef np.float32_t inter, ovr
94 |
95 | keep = []
96 | for _i in range(ndets):
97 | i = order[_i]
98 | if suppressed[i] == 1:
99 | continue
100 | keep.append(i)
101 | ix1 = x1[i]
102 | iy1 = y1[i]
103 | ix2 = x2[i]
104 | iy2 = y2[i]
105 | iarea = areas[i]
106 | for _j in range(_i + 1, ndets):
107 | j = order[_j]
108 | if suppressed[j] == 1:
109 | continue
110 | xx1 = max(ix1, x1[j])
111 | yy1 = max(iy1, y1[j])
112 | xx2 = min(ix2, x2[j])
113 | yy2 = min(iy2, y2[j])
114 | w = max(0.0, xx2 - xx1 + 1)
115 | h = max(0.0, yy2 - yy1 + 1)
116 | inter = w * h
117 | ovr = inter / (iarea + areas[j] - inter)
118 | ovr1 = inter / iarea
119 | ovr2 = inter / areas[j]
120 | if ovr >= thresh or ovr1 > 0.95 or ovr2 > 0.95:
121 | suppressed[j] = 1
122 |
123 | return keep
124 |
--------------------------------------------------------------------------------
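nms_new differs from nms only in its extra containment tests (ovr1, ovr2): a box is also suppressed when the intersection covers more than 95% of either box, which removes near-duplicate boxes nested inside larger ones even when plain IoU is low. A plain-Python sketch of the greedy loop with that extra test (toy boxes, invented values):

import numpy as np

def nms_py(dets, thresh, contain=0.95):
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1 + 1) * np.maximum(0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        suppress = (iou >= thresh) | (inter / areas[i] > contain) \
                   | (inter / areas[order[1:]] > contain)
        order = order[1:][~suppress]
    return keep

# a tiny box fully inside a big one: IoU is low, but containment is 1.0
dets = np.array([[0, 0, 99, 99, 0.9], [10, 10, 19, 19, 0.8]], dtype=np.float32)
print(nms_py(dets, thresh=0.7))  # [0]: the nested box is suppressed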
/lib/utils/gpu_nms.hpp:
--------------------------------------------------------------------------------
1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
2 | int boxes_dim, float nms_overlap_thresh, int device_id);
3 |
--------------------------------------------------------------------------------
/lib/utils/gpu_nms.pyx:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Faster R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 | cimport numpy as np
10 |
11 | assert sizeof(int) == sizeof(np.int32_t)
12 |
13 | cdef extern from "gpu_nms.hpp":
14 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int)
15 |
16 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh,
17 | np.int32_t device_id=0):
18 | cdef int boxes_num = dets.shape[0]
19 | cdef int boxes_dim = dets.shape[1]
20 | cdef int num_out
21 | cdef np.ndarray[np.int32_t, ndim=1] \
22 | keep = np.zeros(boxes_num, dtype=np.int32)
23 | cdef np.ndarray[np.float32_t, ndim=1] \
24 | scores = dets[:, 4]
25 | cdef np.ndarray[np.int_t, ndim=1] \
26 | order = scores.argsort()[::-1]
27 | cdef np.ndarray[np.float32_t, ndim=2] \
28 | sorted_dets = dets[order, :]
29 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id)
30 | keep = keep[:num_out]
31 | return list(order[keep])
32 |
--------------------------------------------------------------------------------
/lib/utils/make.sh:
--------------------------------------------------------------------------------
1 | cython bbox.pyx
2 | cython cython_nms.pyx
3 | cython gpu_nms.pyx
4 | python setup.py build_ext --inplace
5 | rm -rf build
6 |
--------------------------------------------------------------------------------
/lib/utils/nms_kernel.cu:
--------------------------------------------------------------------------------
1 | // ------------------------------------------------------------------
2 | // Faster R-CNN
3 | // Copyright (c) 2015 Microsoft
4 | // Licensed under The MIT License [see fast-rcnn/LICENSE for details]
5 | // Written by Shaoqing Ren
6 | // ------------------------------------------------------------------
7 |
8 | #include "gpu_nms.hpp"
9 | #include <vector>
10 | #include <iostream>
11 |
12 | #define CUDA_CHECK(condition) \
13 | /* Code block avoids redefinition of cudaError_t error */ \
14 | do { \
15 | cudaError_t error = condition; \
16 | if (error != cudaSuccess) { \
17 | std::cout << cudaGetErrorString(error) << std::endl; \
18 | } \
19 | } while (0)
20 |
21 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))
22 | int const threadsPerBlock = sizeof(unsigned long long) * 8;
23 |
24 | __device__ inline float devIoU(float const * const a, float const * const b) {
25 | float left = max(a[0], b[0]), right = min(a[2], b[2]);
26 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]);
27 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);
28 | float interS = width * height;
29 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
30 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);
31 | return interS / (Sa + Sb - interS);
32 | }
33 |
34 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh,
35 | const float *dev_boxes, unsigned long long *dev_mask) {
36 | const int row_start = blockIdx.y;
37 | const int col_start = blockIdx.x;
38 |
39 | // if (row_start > col_start) return;
40 |
41 | const int row_size =
42 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
43 | const int col_size =
44 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);
45 |
46 | __shared__ float block_boxes[threadsPerBlock * 5];
47 | if (threadIdx.x < col_size) {
48 | block_boxes[threadIdx.x * 5 + 0] =
49 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
50 | block_boxes[threadIdx.x * 5 + 1] =
51 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
52 | block_boxes[threadIdx.x * 5 + 2] =
53 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
54 | block_boxes[threadIdx.x * 5 + 3] =
55 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
56 | block_boxes[threadIdx.x * 5 + 4] =
57 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
58 | }
59 | __syncthreads();
60 |
61 | if (threadIdx.x < row_size) {
62 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
63 | const float *cur_box = dev_boxes + cur_box_idx * 5;
64 | int i = 0;
65 | unsigned long long t = 0;
66 | int start = 0;
67 | if (row_start == col_start) {
68 | start = threadIdx.x + 1;
69 | }
70 | for (i = start; i < col_size; i++) {
71 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
72 | t |= 1ULL << i;
73 | }
74 | }
75 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock);
76 | dev_mask[cur_box_idx * col_blocks + col_start] = t;
77 | }
78 | }
79 |
80 | void _set_device(int device_id) {
81 | int current_device;
82 | CUDA_CHECK(cudaGetDevice(&current_device));
83 | if (current_device == device_id) {
84 | return;
85 | }
86 | // The call to cudaSetDevice must come before any calls to Get, which
87 | // may perform initialization using the GPU.
88 | CUDA_CHECK(cudaSetDevice(device_id));
89 | }
90 |
91 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
92 | int boxes_dim, float nms_overlap_thresh, int device_id) {
93 | _set_device(device_id);
94 |
95 | float* boxes_dev = NULL;
96 | unsigned long long* mask_dev = NULL;
97 |
98 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock);
99 |
100 | CUDA_CHECK(cudaMalloc(&boxes_dev,
101 | boxes_num * boxes_dim * sizeof(float)));
102 | CUDA_CHECK(cudaMemcpy(boxes_dev,
103 | boxes_host,
104 | boxes_num * boxes_dim * sizeof(float),
105 | cudaMemcpyHostToDevice));
106 |
107 | CUDA_CHECK(cudaMalloc(&mask_dev,
108 | boxes_num * col_blocks * sizeof(unsigned long long)));
109 |
110 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock),
111 | DIVUP(boxes_num, threadsPerBlock));
112 | dim3 threads(threadsPerBlock);
113 | nms_kernel<<<blocks, threads>>>(boxes_num,
114 | nms_overlap_thresh,
115 | boxes_dev,
116 | mask_dev);
117 |
118 | std::vector<unsigned long long> mask_host(boxes_num * col_blocks);
119 | CUDA_CHECK(cudaMemcpy(&mask_host[0],
120 | mask_dev,
121 | sizeof(unsigned long long) * boxes_num * col_blocks,
122 | cudaMemcpyDeviceToHost));
123 |
124 | std::vector<unsigned long long> remv(col_blocks);
125 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);
126 |
127 | int num_to_keep = 0;
128 | for (int i = 0; i < boxes_num; i++) {
129 | int nblock = i / threadsPerBlock;
130 | int inblock = i % threadsPerBlock;
131 |
132 | if (!(remv[nblock] & (1ULL << inblock))) {
133 | keep_out[num_to_keep++] = i;
134 | unsigned long long *p = &mask_host[0] + i * col_blocks;
135 | for (int j = nblock; j < col_blocks; j++) {
136 | remv[j] |= p[j];
137 | }
138 | }
139 | }
140 | *num_out = num_to_keep;
141 |
142 | CUDA_CHECK(cudaFree(boxes_dev));
143 | CUDA_CHECK(cudaFree(mask_dev));
144 | }
145 |
--------------------------------------------------------------------------------
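The kernel writes, for every box, a bitmask over 64-box column blocks marking which boxes it overlaps above the threshold; the host loop in _nms then sweeps the score-sorted list, keeping a box unless a previously kept box has flagged it. For reference, a pure-NumPy sketch of the equivalent greedy NMS (illustrative only; it follows devIoU's +1 pixel area convention):

    import numpy as np

    def nms_reference(dets, thresh):
        # rows are [x1, y1, x2, y2, score], matching the CUDA path
        x1, y1, x2, y2, scores = (dets[:, i] for i in range(5))
        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]  # descending score
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(int(i))
            # IoU of the current best box against all remaining candidates
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
            iou = inter / (areas[i] + areas[order[1:]] - inter)
            order = order[1:][iou <= thresh]  # drop everything it suppresses
        return keep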
/lib/utils/setup.py:
--------------------------------------------------------------------------------
1 | from Cython.Build import cythonize
2 | import os
3 | from os.path import join as pjoin
4 | import numpy as np
5 | from distutils.core import setup
6 | from distutils.extension import Extension
7 | from Cython.Distutils import build_ext
8 |
9 | def find_in_path(name, path):
10 | for dir in path.split(os.pathsep):
11 | binpath = pjoin(dir, name)
12 | if os.path.exists(binpath):
13 | return os.path.abspath(binpath)
14 | return None
15 |
16 | def locate_cuda():
17 | # first check if the CUDAHOME env variable is in use
18 | if 'CUDAHOME' in os.environ:
19 | home = os.environ['CUDAHOME']
20 | nvcc = pjoin(home, 'bin', 'nvcc')
21 | else:
22 | # otherwise, search the PATH for NVCC
23 | default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin')
24 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path)
25 | if nvcc is None:
26 | raise EnvironmentError('The nvcc binary could not be '
27 | 'located in your $PATH. Either add it to your path, or set $CUDAHOME')
28 | home = os.path.dirname(os.path.dirname(nvcc))
29 |
30 | cudaconfig = {'home':home, 'nvcc':nvcc,
31 | 'include': pjoin(home, 'include'),
32 | 'lib64': pjoin(home, 'lib64')}
33 | for k, v in cudaconfig.items():
34 |         # verify that each discovered CUDA path actually exists
35 | if not os.path.exists(v):
36 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
37 | return cudaconfig
38 |
39 | CUDA = locate_cuda()
40 |
41 |
42 | try:
43 | numpy_include = np.get_include()
44 | except AttributeError:
45 | numpy_include = np.get_numpy_include()
46 |
47 | def customize_compiler_for_nvcc(self):
48 |     self.src_extensions.append('.cu')  # teach the compiler to accept CUDA sources
49 | default_compiler_so = self.compiler_so
50 |     default_compile = self._compile  # keep a reference to the stock _compile
51 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
52 |         # pick the arg set matching the source type: nvcc for .cu, gcc otherwise
53 | if os.path.splitext(src)[1] == '.cu':
54 | # use the cuda for .cu files
55 | self.set_executable('compiler_so', CUDA['nvcc'])
56 | # use only a subset of the extra_postargs, which are 1-1 translated
57 | # from the extra_compile_args in the Extension class
58 | postargs = extra_postargs['nvcc']
59 | else:
60 | postargs = extra_postargs['gcc']
61 |
62 |         default_compile(obj, src, ext, cc_args, postargs, pp_opts)
63 | # reset the default compiler_so, which we might have changed for cuda
64 | self.compiler_so = default_compiler_so
65 | # inject our redefined _compile method into the class
66 | self._compile = _compile
67 |
68 |
69 | # run the customize_compiler
70 | class custom_build_ext(build_ext):
71 | def build_extensions(self):
72 | customize_compiler_for_nvcc(self.compiler)
73 | build_ext.build_extensions(self)
74 |
75 | ext_modules = [
76 | Extension(
77 | "bbox",
78 | ["bbox.pyx"],
79 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
80 | include_dirs = [numpy_include]
81 | ),
82 | Extension(
83 | "cython_nms",
84 | ["cython_nms.pyx"],
85 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
86 | include_dirs = [numpy_include]
87 | ),
88 | Extension('gpu_nms',
89 | ['nms_kernel.cu', 'gpu_nms.pyx'],
90 | library_dirs=[CUDA['lib64']],
91 | libraries=['cudart'],
92 | language='c++',
93 | runtime_library_dirs=[CUDA['lib64']],
94 | extra_compile_args={'gcc': ["-Wno-unused-function"],
95 | 'nvcc': ['-arch=sm_35',
96 | '--ptxas-options=-v',
97 | '-c',
98 | '--compiler-options',
99 | "'-fPIC'"]},
100 | include_dirs = [numpy_include, CUDA['include']]
101 | ),
102 | ]
103 |
104 | setup(
105 | ext_modules=ext_modules,
106 | cmdclass={'build_ext': custom_build_ext},
107 | )
108 |
109 |
--------------------------------------------------------------------------------
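locate_cuda() honors CUDAHOME before falling back to searching $PATH, and the nvcc flags hard-code -arch=sm_35, which likely needs adjusting for other GPU generations. A hedged sketch of building against a non-default CUDA install (the /opt/cuda prefix is hypothetical):

    import os
    import subprocess

    # CUDAHOME redirects locate_cuda() to a specific toolkit install
    env = dict(os.environ, CUDAHOME='/opt/cuda')
    subprocess.check_call(['python', 'setup.py', 'build_ext', '--inplace'],
                          cwd='lib/utils', env=env)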
/lib/utils/timer.py:
--------------------------------------------------------------------------------
1 | import time
2 | class Timer(object):
3 | def __init__(self):
4 | self.total_time = 0.
5 | self.calls = 0
6 | self.start_time = 0.
7 | self.diff = 0.
8 | self.average_time = 0.
9 |
10 | def tic(self):
11 | self.start_time = time.time()
12 |
13 | def toc(self, average=True):
14 | self.diff = time.time() - self.start_time
15 | self.total_time += self.diff
16 | self.calls += 1
17 | self.average_time = self.total_time / self.calls
18 | if average:
19 | return self.average_time
20 | else:
21 | return self.diff
22 |
--------------------------------------------------------------------------------
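A minimal usage sketch for Timer; the sleep stands in for real work such as a forward pass:

    import time
    from timer import Timer  # lib/utils/timer.py

    timer = Timer()
    for _ in range(3):
        timer.tic()
        time.sleep(0.1)  # stand-in for the timed workload
        print('last call: %.3fs' % timer.toc(average=False))
    print('mean over %d calls: %.3fs' % (timer.calls, timer.average_time))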
/requirements.txt:
--------------------------------------------------------------------------------
1 | easydict==1.7
2 | tensorflow_gpu==1.3.0
3 | scipy==0.18.1
4 | numpy==1.11.1
5 | opencv_python==3.4.0.12
6 | Cython==0.27.3
7 | Pillow==5.0.0
8 | PyYAML==3.12
9 |
--------------------------------------------------------------------------------
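These pins date from the TensorFlow 1.x era (TF 1.3 predates Python 3.7 support), so they are best installed into a matching older environment. A standard install sketch:

    import subprocess
    import sys

    # install the pinned dependencies into the current interpreter's environment
    subprocess.check_call([sys.executable, '-m', 'pip',
                           'install', '-r', 'requirements.txt'])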