├── .gitignore
├── .gitmodules
├── README.md
├── config.py
├── data_loader.py
├── examples
│   ├── 2347567.jpg
│   └── 2405273.jpg
├── main.py
├── networks
│   ├── __init__.py
│   ├── base.py
│   ├── base_net
│   │   ├── __init__.py
│   │   └── net.py
│   ├── image_feat_net
│   │   ├── __init__.py
│   │   ├── net.py
│   │   ├── resnet101
│   │   │   ├── __init__.py
│   │   │   └── net.py
│   │   ├── roi_pooling_layer
│   │   │   ├── __init__.py
│   │   │   ├── roi_pooling.so
│   │   │   ├── roi_pooling_op.cc
│   │   │   ├── roi_pooling_op.cu.o
│   │   │   ├── roi_pooling_op.py
│   │   │   ├── roi_pooling_op_gpu.cu.cc
│   │   │   ├── roi_pooling_op_gpu.h
│   │   │   ├── roi_pooling_op_grad.py
│   │   │   ├── roi_pooling_op_test.py
│   │   │   └── work_sharder.h
│   │   └── vgg16
│   │       ├── __init__.py
│   │       └── net.py
│   ├── net_wrapper.py
│   ├── pair_net
│   │   ├── __init__.py
│   │   └── net.py
│   └── text_feat_net
│       ├── __init__.py
│       └── net.py
├── requirements.txt
├── setup.sh
├── test.py
└── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.npy
2 | **.png
3 | **.mat
4 | **.swn
5 | **.swp
6 | test.jpg
7 | train/*
8 | **.tar.gz
9 | **.swo
10 | **/__pycache__/**
11 | checkpoints/*
12 | visualization/*
13 | matlab_model/*
14 | nlvd_evaluation/*
15 | **.pyc
16 | **.pkl
17 | **.json
18 | scripts/*
19 | edge_boxes_with_python/*
20 |
21 |
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "nlvd_evaluation"]
2 | path = nlvd_evaluation
3 | url = https://github.com/YutingZhang/nlvd_evaluation.git
4 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Implementation of DBNet
2 |
3 | This repository is the **TensorFlow** implementation of DBNet, a method for localizing and detecting visual entities with natural language queries. DBNet is proposed in the following paper:
4 |
5 | **[Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries](https://arxiv.org/abs/1704.03944)**,
6 | [Yuting Zhang](http://www.ytzhang.net/), Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, [Honglak Lee](https://web.eecs.umich.edu/~honglak/index.html)
7 | In *IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*, 2017. **spotlight**
8 |
9 | Remarks:
10 |
11 | - The results in the above paper are obtained with the Caffe+MATLAB implementation, which is available at https://github.com/YutingZhang/dbnet-caffe-matlab
12 | - This repository uses the evaluation protocol published together with the above paper, the implementation of which is at https://github.com/YutingZhang/nlvd_evaluation . It has been included here as a git submodule (see below for instructions on cloning submodules).
13 |
14 | ## How to clone this repository
15 |
16 | **This Git repository has submodules.** Please use the following command to clone it:
17 |
18 |     git clone --recursive https://github.com/yuanluya/nldet_TensorFlow
19 |
20 | If you have cloned the repository without the `--recursive` flag, you can run `git submodule update --init --recursive` in your local repository folder.
21 |
22 | The evaluation submodule requires additional setup steps. Please refer to [`./nlvd_evaluation/README.md`](https://github.com/YutingZhang/nlvd_evaluation).
23 |
24 | ## Detection examples
25 |
26 | Here are two detection examples:
27 |
28 | ![Detection example 1](examples/2347567.jpg)
29 |
30 | ![Detection example 2](examples/2405273.jpg)
31 |
32 | ## Introduction to DBNet
33 |
34 | DBNet is a two-pathway deep neural network framework. It uses two separate pathways to extract visual and linguistic features, and a discriminative network to compute the matching score between an image region and a text phrase. DBNet is trained as a classifier with extensive use of negative samples. The training objective encourages better localization on single images, incorporates a broad range of text phrases, and properly pairs image regions with text phrases into positive and negative examples.
35 |
36 | For more details about DBNet, please refer to [the paper](https://arxiv.org/abs/1704.03944).
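
To make the two-pathway structure concrete, here is a minimal, hypothetical sketch of the scoring flow (the names below are illustrative, not this repository's API; the actual networks live under `networks/`):

    # Hypothetical sketch of DBNet's scoring flow, not the repository's API.
    def dbnet_scores(image, region_boxes, phrase,
                     image_pathway, text_pathway, discriminative_net):
        region_feats = image_pathway(image, region_boxes)  # one feature per region
        text_feat = text_pathway(phrase)                   # one feature per phrase
        # The discriminative network scores each (region, phrase) pair.
        return [discriminative_net(feat, text_feat) for feat in region_feats]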
37 |
38 | ## Prerequisites
39 |
40 | * Python 3.3+
41 | * [TensorFlow](https://www.tensorflow.org/install/install_linux) 1.x: `pip3 install --user tensorflow-gpu`
42 | * Python-OpenCV 3.2.0: `pip3 install --user opencv-python`
43 | * [Pyx 0.14.1](http://pyx.sourceforge.net/): `pip3 install --user pyx`
44 | * [Pillow (PIL) 4.0.0](http://pillow.readthedocs.io/en/3.4.x/index.html): `pip3 install --user Pillow`
45 |
46 | If you have admin/root access to your workstation, you can remove `--user` and use `sudo` to install them into the system folder.
47 |
48 | ## What is included
49 |
50 | - Demo using pretrained models (detection and visualization on individual images)
51 | - Training code
52 | - Evaluation code
53 |
54 | ## Data to download
55 |
56 | - The [Visual Genome Images](http://visualgenome.org/api/v0/api_home.html) dataset.
57 | - [Spell-corrected text annotations](http://www.ytzhang.net/files/dbnet/data/vg_v1_json.tar.gz) for Visual Genome.
58 |
59 | - *Remark:* if you have set up the evaluation toolbox in `./nlvd_evaluation`, the above data should already be available. You will only need to update the data paths in the configuration file (`config.py`).
60 |
61 | - [Cached EdgeBoxes](http://www.ytzhang.net/files/dbnet/data/vg-v1-edgebox.tar.gz) for the Visual Genome images
62 |
63 | ## Pretrained Models
64 |
65 | - VGGNet-16 and ResNet-101 Faster R-CNN models pretrained on PASCAL VOC
66 | - Our pretrained VGGNet-16- and ResNet-101-based DBNet models.
67 |
68 | The pretrained models can be obtained via [this link](http://www.ytzhang.net/files/dbnet/tensorflow/dbnet-pretrained.tar.gz). This model was trained from scratch according to the procedure described below, and it slightly outperforms the model used in the paper. Its evaluation results are summarized as follows.
69 |
70 | - Localization
71 |
72 | | IoU Threshold| 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
73 | | --- | --- | --- | --- | --- | --- | --- | --- |
74 | | Recall | 56.6 | 47.8 | 40.1 | 32.4 | 25.0| 17.6 | 10.7 |
75 |
76 | |Top Overlap Median | Top Overlap Mean |
77 | | --- | --- |
78 | | 0.174 | 0.270 |
79 |
80 | - Detection for the level-0 query set:
81 |
82 | |Threshold | gAP | mAP |
83 | | --- | --- | --- |
84 | | 0.3 | 25.3 | 49.8 |
85 | | 0.5 | 12.3 | 31.4 |
86 | | 0.7 | 2.6 | 12.4 |
87 |
88 | - Detection for the level-1 query set:
89 |
90 | |Threshold | gAP | mAP |
91 | | --- | --- | --- |
92 | | 0.3 | 22.8 | 46.7 |
93 | | 0.5 | 11.2 | 29.7 |
94 | | 0.7 | 2.4 | 12.0 |
95 |
96 |
97 | - Detection for the level-2 query set:
98 |
99 | |Threshold | gAP | mAP |
100 | | --- | --- | --- |
101 | | 0.3 | 9.6 | 28.4 |
102 | | 0.5 | 5.0 | 19.0 |
103 | | 0.7 | 1.2 | 8.2 |
104 |
105 | ## Code Overview
106 |
107 | - `main.py`: contains the main function to run the experiment and evaluate the model.
108 | - `test.py`: utilities for the test phase.
109 | - `utils.py`: utility functions to read data into the neural networks.
110 | - `config.py`: definitions of the model hyperparameters, and training and testing configurations.
111 | - `networks`: the subfolder for the model files.
112 |
113 | ## Usage
114 |
115 | You can use `python3 main.py` to run our code with the default configuration; see `config.py` for detailed configuration definitions. You can override the defaults in `config.py` by passing the corresponding command-line arguments to `main.py` (see the examples later in this section).
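
For example, to train with the ResNet-101 image pathway and a custom text learning rate (the flag values here are illustrative only):

    python3 main.py --MODE train --IMAGE_MODEL resnet101 --text_lr 1e-4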
116 |
117 | ### Detecting and Visualizing on Sample Images
118 |
119 | #### Demo on Images from Visual Genome (Quick Demo)
120 |
121 | You can run a quick demo on Visual Genome images with a user-specified query.
122 |
123 | - Download the Visual Genome images, text annotations, EdgeBox cache, and our pretrained model.
124 | - Set up `ENV_PATHS.IMAGE_PATH` and `ENV_PATHS.EDGEBOX_PATH` accordingly in `config.py`.
125 | - Put the pretrained model in the same directory as defined in `config.py`.
126 | - Create a json file with a list of image ids (just the numbers) that you want to run detection on, and set `--IMAGE_LIST_DIR` in `config.py` to this file's path (see the example after this list).
127 | - After that, run
128 |
129 | python3 main.py --MODE vis --PHRASE_INPUT 'your text query .'
130 |
131 | - You should be able to view the visualization of the detection results in the `visualization` folder created under the root of the project.
132 | - Make sure you have [Pyx](http://pyx.sourceforge.net/) and [PIL](http://pillow.readthedocs.io/en/3.4.x/index.html) installed to draw the result. `pdflatex` is also needed.
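
For reference, a minimal image-list json for the two images shipped in `examples/` (assuming they are used as Visual Genome image ids) could look like:

    [2347567, 2405273]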
133 |
134 | #### Demo on Other Images
135 |
136 | To perform detection on non-Visual-Genome images, an external region proposal method is needed. Our code supports EdgeBox. You can download [the EdgeBox python interface](https://github.com/dculibrk/edge_boxes_with_python) to the repository root and run our code. Please make sure that `ENV_PATHS.EDGE_BOX_RPN` points to the location of `edge_boxes.py`. The test procedure is the same as testing on Visual Genome images, except that you will need to use **absolute paths** in the json file rather than image ids to list the test images.
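
In this case, the json list contains absolute image paths instead of ids, e.g. (hypothetical paths):

    ["/home/user/photos/dog.jpg", "/home/user/photos/street.jpg"]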
137 |
138 | ### Training DBNet
139 |
140 | 1. Download images from [the Visual Genome website](http://visualgenome.org/api/v0/api_home.html) and our spell-checked text annotations.
141 | 2. Change `config.py` according to your data paths.
142 | 3. Either download our [trained model](#pretrained-models) for finetuning or perform training from scratch.
143 | 4. To finetune a pretrained model, please download it and make sure `config.py` has the correct paths to the two `.npy` files (one for the image pathway, and the other for the text pathway).
144 |
145 | #### Training from Scratch
146 |
147 | To train from scratch, we recommend using [Faster R-CNN](https://arxiv.org/abs/1506.01497) weights to initialize the image pathway and randomly initializing the text pathway with our default parameters. After that, the DBNet model can be trained in 3 phases.
148 |
149 | - Phase 1: Fix the image net and use the base learning rate for the text model until the loss converges.
150 |
151 | `python3 main.py --PHASE phase1 --text_lr 1e-4 --image_lr_conv 0 --image_lr_region 0 --IMAGE_FINE_TUNE_MODEL frcnn_Region_Feat_Net.npy --TEXT_FINE_TUNE_MODEL XXX.npy --MAX_ITERS 50000`
152 |
153 | - Phase 2: Tune both pathways together without changing the base learning rates. To try out other configurations, please change `config.py`.
154 |
155 | `python3 main.py --PHASE phase2 --text_lr 1e-4 --image_lr_conv 1e-3 --image_lr_region 1e-3 --INIT_SNAPSHOT phase1 --INIT_ITER 50000 --MAX_ITERS 150000`
156 |
157 | - Phase 3: Decrease the learning rate for all pathways by a factor of 10 and train the model further.
158 |
159 | `python3 main.py --PHASE phase3 --INIT_SNAPSHOT phase2 --INIT_ITER 200000 --MAX_ITERS 100000`
160 |
161 | Model snapshots will be saved every `--SAVE_ITERS` iterations to `--SNAPSHOT_DIR`. We name the snapshots `nldet_[PHASE]_[ITER]` (e.g., `nldet_phase1_50000`).
162 |
163 | ### Benchmarking on Visual Genome
164 |
165 | To test with a pretrained model, you can place the `.npy` files in the default directory and run `python3 main.py --MODE test`. To test TensorFlow models trained from scratch, please set the `--INIT_SNAPSHOT` and `--INIT_ITER` flags accordingly.
166 |
167 | The detection results will be saved in the subfolder `tmp_output` under the directory `nlvd_evaluation/results/vg_v1/dbnet_[IMAGE MODEL]/` in the `nlvd_evaluation` submodule. `IMAGE MODEL` refers to the model used in the image pathway and can be set by the `--IMAGE_MODEL` flag defined in `config.py`. By default `--IMAGE_MODEL` is set to `vgg16`; our model also supports `resnet101`. These temporary results will be merged and saved in a `.txt` file, which can be used by our evaluation code directly. As long as the results in `tmp_output` are saved, the testing process can be resumed at any time. Change the `--LEVEL` flag defined in `config.py` to perform the three-level tests in the paper.
168 |
169 | `python3 main.py --MODE test --LEVEL level_0 --INIT_SNAPSHOT phase3 --INIT_ITER 300000`
170 |
171 | ### Evaluation
172 |
173 | The evaluation and dataset development code is cloned from the [nlvd_evaluation](https://github.com/YutingZhang/nlvd_evaluation) repository as a submodule of this repository. You can refer to [this page](https://github.com/YutingZhang/nlvd_evaluation/tree/master/evaluation) for more detailed instructions on how to compute the performance metrics.
174 |
175 | ## Contributors
176 |
177 | This repository is mainly contributed by [Luyao Yuan](https://github.com/yuanluya) and [Binghao Deng](https://github.com/bhdeng). The evaluation code is provided by [Yijie Guo](https://github.com/guoyijie).
178 |
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | """ Configuration for the network
2 | """
3 | import os
4 | import os.path as osp
5 | import sys
6 | from easydict import EasyDict as edict
7 | import tensorflow as tf
8 |
9 | ###############################################
10 | #set global configuration for network training#
11 | ###############################################
12 | flags = tf.app.flags
13 | FLAGS = flags.FLAGS
14 | #model hyperparameters
15 | flags.DEFINE_string('IMAGE_MODEL', 'vgg16', 'which network to use for the image pathway: vgg16|resnet101')
16 | flags.DEFINE_float('text_lr', 1e-5, 'learning rate for the text end')
17 | flags.DEFINE_float('image_lr_conv', 1e-4, 'learning rate for the conv layers of the image pathway')
18 | flags.DEFINE_float('image_lr_region', 1e-4, 'learning rate for the region layers of the image pathway')
19 | flags.DEFINE_integer('batch_size', 2, 'number of images in a batch sent to the network')
20 | flags.DEFINE_integer('pair_net_batch_size', 128, 'number of image and text pair sent to the pair net in a subbatch')
21 | #device layout
22 | flags.DEFINE_integer('DEVICE_NUM', 0, 'GPU device ID')
23 | flags.DEFINE_integer('NUM_PROCESSORS', 4, 'Number of processor for data loading')
24 | flags.DEFINE_integer('DATA_LOADER_CAPACITY', 10, 'Maximum number of batches saved in the data loader')
25 | #training mode
26 | flags.DEFINE_string('MODE', "train", 'train|test|val|vis')
27 | flags.DEFINE_boolean('DEBUG', False, 'whether run in tensorflow debug mode')
28 | flags.DEFINE_string('PHASE', 'phase1', 'phase1|phase2|phase3')
29 | flags.DEFINE_string('IMAGE_FINE_TUNE_MODEL', 'Region_Feat_Net.npy',
30 | 'relative path to networks/image_feat_net/[IMAGE_MODEL]/net.py, depending on the choice of --IMAGE_MODEL')
31 | flags.DEFINE_string('TEXT_FINE_TUNE_MODEL', 'vgg16_Text_Feat_Net.npy', 'relative path to networks/text_feat_net/net.py')
32 | flags.DEFINE_string('INIT_SNAPSHOT', 'phase1', 'init train from which phase')
33 | flags.DEFINE_integer('INIT_ITER', 0, 'init train from which iteration, together with INIT_SNAPSHOT')
34 | flags.DEFINE_string('SNAPSHOT_DIR', 'checkpoints', 'directory for saving and loading model snapshots')
35 | flags.DEFINE_boolean('RESTORE_ALL', False, 'restore the model with all variables (including optimizer momentum state)')
36 | #test configs if testing
37 | flags.DEFINE_string('LEVEL', 'level_0', 'level_0|level_1|level_2')
38 | flags.DEFINE_integer('TOP_NUM_RPN', 500, 'do NMS among the top-k boxes ranked by prediction score')
39 | flags.DEFINE_boolean('INCLUDE_GT_BOX', False, 'include ground truth box in final test box')
40 | #visualization output
41 | flags.DEFINE_string('VIS_DIR', 'visualization', 'save image detection example')
42 | flags.DEFINE_string('PHRASE_INPUT', 'A man in red.', 'query phrase to do detections')
43 | flags.DEFINE_string('IMAGE_LIST_DIR', 'image_examples.json', 'a file specifying which images to visualize: '
44 | 'image ids if the images are in VISUAL GENOME, otherwise absolute paths of the images')
45 | flags.DEFINE_integer('VIS_NUM', 3, 'how many top detected regions to draw')
46 | #training infos
47 | flags.DEFINE_integer('MAX_ITERS', float('inf'), 'Maximum running iterations')
48 | flags.DEFINE_integer('PRINT_ITERS', 1, 'Frequency of printing training data')
49 | flags.DEFINE_integer('SAVE_ITERS', 2000, 'Frequency of saving checkpoints')
50 |
51 | ###############################################
52 | # set global configuration for data reading #
53 | ###############################################
54 | DATA_PATH = osp.abspath(osp.join(osp.dirname(__file__), 'data'))
55 | ENV_PATHS = edict()
56 |
57 | # need to be moved to data path
58 | ENV_PATHS.IMAGE_PATH = '/mnt/brain3/datasets/VisualGenome/images'
59 | ENV_PATHS.EDGEBOX_PATH = '/mnt/brain2/scratch/yutingzh/object-det-cache/nldet_cache/region_proposal_cache/vg/edgebox'
60 | ENV_PATHS.EDGE_BOX_RPN = '/mnt/brain1/scratch/yuanluya/nldet_tensorflow/edge_boxes_with_python'
61 | ENV_PATHS.RAW_DATA = osp.abspath(osp.join(DATA_PATH, 'region_description.json'))
62 | ENV_PATHS.METEOR = osp.abspath(osp.join(DATA_PATH, 'meteor.json')) #upper triangular matrix
63 | ENV_PATHS.FREQUENCY = osp.abspath(osp.join(DATA_PATH, 'freq.json'))
64 | ENV_PATHS.SPLIT = osp.abspath(osp.join(DATA_PATH, 'densecap_splits.json'))
65 | ENV_PATHS.LEVEL1_TEST = osp.abspath(osp.join(DATA_PATH, 'level1_im2p.json'))
66 | ENV_PATHS.LEVEL2_TEST = osp.abspath(osp.join(DATA_PATH, 'level2_im2p.json'))
67 |
68 | ###############################################
69 | # set global configuration for data sampling #
70 | ###############################################
71 | DS_CONFIG = edict()
72 |
73 | DS_CONFIG.thre_neg = 0.1
74 | DS_CONFIG.thre_pos = 0.9
75 | DS_CONFIG.pos_loss_weight = 1
76 | DS_CONFIG.neg_loss_weight = 1
77 | DS_CONFIG.rest_loss_weight = 1
78 | DS_CONFIG.meteor_thred = 0.3
79 | DS_CONFIG.text_tensor_sequence_length = 256
80 | DS_CONFIG.text_rand_sample_size = 100
81 | DS_CONFIG.target_size = 600
82 | DS_CONFIG.max_size = 1000
83 | DS_CONFIG.edge_box_high_rank_num = 100
84 | DS_CONFIG.edge_box_random_num = 50
85 |
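86 | # All FLAGS above can be overridden from the command line, e.g. (illustrative):
87 | #   python3 main.py --MODE test --IMAGE_MODEL resnet101 --LEVEL level_1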
--------------------------------------------------------------------------------
/data_loader.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from multiprocessing import Condition, Lock, Process, Manager
3 | import random
4 | #from utils import train_ids, test_ids, get_data
5 | from utils import train_ids, test_ids, get_data
6 | import pdb
7 |
8 |
9 | class DataLoader:
10 | """ Class for loading data
11 | Attributes:
12 | num_processor: an integer indicating the number of processors
13 | for loading the data, normally 4 is enough
14 | capacity: an integer indicating the capacity of the data load
15 | queue, default set to 10
16 | batch_size: an integer indicating the batch size for each
17 | extraction from the data load queue
18 | phase: a string indicating the phase of the data loading process,
19 | can only be 'train' or 'test'
20 | """
21 | def __init__(self, num_processor, batch_size, phase,
22 | batch_idx_init = 0, data_ids_init = train_ids, capacity = 10):
23 | self.num_processor = num_processor
24 | self.batch_size = batch_size
25 | self.data_load_capacity = capacity
26 | self.manager = Manager()
27 | self.batch_lock = Lock()
28 | self.mutex = Lock()
29 | self.cv_full = Condition(self.mutex)
30 | self.cv_empty = Condition(self.mutex)
31 | self.data_load_queue = self.manager.list()
32 | self.cur_batch = self.manager.list([batch_idx_init])
33 | self.processors = []
34 | if phase == 'train':
35 | self.data_ids = self.manager.list(data_ids_init)
36 | elif phase == 'test':
37 | self.data_ids = self.manager.list(test_ids)
38 | else:
39 | raise ValueError('Could not set phase to %s' % phase)
40 |
41 | def __load__(self):
42 | while True:
43 | image_dicts = []
44 | self.batch_lock.acquire()
45 | image_ids = self.data_ids[self.cur_batch[0] * self.batch_size :
46 | (self.cur_batch[0] + 1) * self.batch_size]
47 | self.cur_batch[0] += 1
48 | if (self.cur_batch[0] + 1) * self.batch_size >= len(self.data_ids):
49 | self.cur_batch[0] = 0
50 | random.shuffle(self.data_ids)
51 | self.batch_lock.release()
52 |
53 | data = get_data(image_ids)
54 |
55 | self.cv_full.acquire()
56 | while len(self.data_load_queue) > self.data_load_capacity: # loop guards against spurious wakeups
57 | self.cv_full.wait()
58 | self.data_load_queue.append(data)
59 | self.cv_empty.notify()
60 | self.cv_full.release()
61 |
62 | def start(self):
63 | for _ in range(self.num_processor):
64 | p = Process(target = self.__load__)
65 | p.start()
66 | self.processors.append(p)
67 |
68 | def get_batch(self):
69 | self.cv_empty.acquire()
70 | while len(self.data_load_queue) == 0: # loop guards against spurious wakeups
71 | self.cv_empty.wait()
72 | batch_data = self.data_load_queue.pop()
73 | self.cv_full.notify()
74 | self.cv_empty.release()
75 | return batch_data
76 |
77 | def get_status(self):
78 | self.batch_lock.acquire()
79 | current_cur_batch = self.cur_batch[0]
80 | current_data_ids = self.data_ids
81 | self.batch_lock.release()
82 | return {'batch_idx': int(current_cur_batch), 'data_ids': list(current_data_ids)}
83 |
84 | def stop(self):
85 | for p in self.processors:
86 | p.terminate()
87 |
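88 | # Usage sketch (mirroring main.py):
89 | #   loader = DataLoader(num_processor = 4, batch_size = 2, phase = 'train')
90 | #   loader.start()                    # spawn loader processes
91 | #   batch_data = loader.get_batch()   # blocks until a batch is available
92 | #   loader.stop()                     # terminate loader processes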
--------------------------------------------------------------------------------
/examples/2347567.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/examples/2347567.jpg
--------------------------------------------------------------------------------
/examples/2405273.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/examples/2405273.jpg
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python3
2 | import tensorflow as tf
3 | from tensorflow.python import debug as tf_debug
4 | from tensorflow.core.protobuf import config_pb2
5 | import json
6 | import os
7 | from config import FLAGS
8 | from networks.net_wrapper import NetWrapper
9 | from data_loader import DataLoader
10 | from test import test
11 | from utils import val_ids, test_ids, visualize
12 |
13 | import pdb
14 |
15 | def step(net, loader):
16 | batch_data = loader.get_batch()
17 | net.set_input(batch_data)
18 | net.forward_backward()
19 |
20 | def main(_):
21 | sess = tf.Session(config = tf.ConfigProto(allow_soft_placement = True,
22 | log_device_placement = False))
23 |
24 | #declare networks
25 | with tf.device('/gpu:%d' % FLAGS.DEVICE_NUM):
26 | net = NetWrapper(sess, FLAGS.IMAGE_MODEL, FLAGS.image_lr_conv, FLAGS.image_lr_region, FLAGS.text_lr,
27 | FLAGS.pair_net_batch_size, FLAGS.MODE,
28 | FLAGS.IMAGE_FINE_TUNE_MODEL, FLAGS.TEXT_FINE_TUNE_MODEL)
29 | net.build()
30 |
31 | if FLAGS.DEBUG:
32 | sess = tf_debug.LocalCLIDebugWrapperSession(sess)
33 | sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
34 |
35 | net.text_net.sess = sess
36 | init = tf.global_variables_initializer()
37 | sess.run(init)
38 |
39 | #train_writer = tf.summary.FileWriter('.' + '/train', sess.graph)
40 | #restore network
41 | if FLAGS.RESTORE_ALL:
42 | restore = []
43 | else:
44 | restore = net.varlist
45 |
46 | if net.load(sess, FLAGS.SNAPSHOT_DIR, 'nldet_%s_%d' % (FLAGS.INIT_SNAPSHOT, FLAGS.INIT_ITER), restore):
47 | print('[INIT]Successfully load model from %s_%d' % (FLAGS.INIT_SNAPSHOT, FLAGS.INIT_ITER))
48 | elif FLAGS.MODE == 'test':
49 | print('[INIT]No TensorFlow model found for %s, testing from initialization' % FLAGS.INIT_SNAPSHOT)
50 | else:
51 | print('[INIT]No TensorFlow model found for %s, training from scratch' % FLAGS.INIT_SNAPSHOT)
52 |
53 | if FLAGS.MODE == "train":
54 | resume_status = None
55 | status_dir = '%s/nldet_%s_%d/nldet_status_%s_%d.json' %\
56 | (FLAGS.SNAPSHOT_DIR, FLAGS.INIT_SNAPSHOT, FLAGS.INIT_ITER, FLAGS.INIT_SNAPSHOT, FLAGS.INIT_ITER)
57 | if os.path.exists(status_dir):
58 | resume_status = json.load(open(status_dir, 'r'))
59 | print('resume from %s' % status_dir)
60 | else:
61 | print('no resume data loader status found')
62 |
63 | # initialize data loader
64 | if resume_status is None:
65 | loader = DataLoader(FLAGS.NUM_PROCESSORS, FLAGS.batch_size, FLAGS.MODE, capacity = FLAGS.DATA_LOADER_CAPACITY)
66 | else:
67 | loader = DataLoader(FLAGS.NUM_PROCESSORS, FLAGS.batch_size, FLAGS.MODE,
68 | resume_status['batch_idx'], resume_status['data_ids'], FLAGS.DATA_LOADER_CAPACITY)
69 | loader.start()
70 |
71 | current_iter = FLAGS.INIT_ITER + 1
72 | while current_iter <= FLAGS.MAX_ITERS:
73 | step(net, loader)
74 | if current_iter % FLAGS.PRINT_ITERS == 0:
75 | net.get_output(current_iter)
76 | if current_iter % FLAGS.SAVE_ITERS == 0:
77 | net.save(sess, FLAGS.SNAPSHOT_DIR, 'nldet_%s_%d' % (FLAGS.PHASE, current_iter))
78 | saving_status = loader.get_status()
79 | json.dump(saving_status, open('%s/nldet_%s_%d/nldet_status_%s_%d.json' % \
80 | (FLAGS.SNAPSHOT_DIR, FLAGS.PHASE, current_iter, FLAGS.PHASE, current_iter), 'w'))
81 | print('save data loader status to nldet_status_%s_%d.json' % (FLAGS.PHASE, current_iter))
82 | current_iter += 1
83 | loader.stop()
84 |
85 | elif FLAGS.MODE == "test" or FLAGS.MODE == 'val':
86 | if FLAGS.MODE == 'test':
87 | traverse_ids = test_ids
88 | else:
89 | traverse_ids = val_ids
90 | if FLAGS.LEVEL != 'level_0':
91 | print('Validation set only supports level-0')
92 | return
93 | for idx, tid in enumerate(traverse_ids):
94 | print('[%d/%d]' % (idx + 1, len(traverse_ids)))
95 | result_dir = "nlvd_evaluation/results/vg_v1/dbnet_%s" % FLAGS.IMAGE_MODEL
96 | if os.path.exists('%s/tmp_output/%s_%d.txt' % (result_dir, FLAGS.LEVEL, tid)):
97 | print('FOUND EXISTING RESULT')
98 | continue
99 | test(net, tid, FLAGS.LEVEL, result_dir, top_num = FLAGS.TOP_NUM_RPN, gt_box = FLAGS.INCLUDE_GT_BOX)
100 | os.system('cat %s/tmp_output/%s* > %s/%s.txt' % (result_dir, FLAGS.LEVEL, result_dir, FLAGS.LEVEL))
101 |
102 | elif FLAGS.MODE == "vis":
103 | im_ids = json.load(open(FLAGS.IMAGE_LIST_DIR, 'r'))
104 | os.makedirs(FLAGS.VIS_DIR, exist_ok = True)
105 | for idx, im_id in enumerate(im_ids):
106 | detection_result = test(net, im_id, 'vis', None,
107 | top_num = FLAGS.TOP_NUM_RPN, query_phrase = [FLAGS.PHRASE_INPUT])
108 | visualize(im_id, detection_result, FLAGS.VIS_NUM, FLAGS.PHRASE_INPUT,
109 | os.path.join(FLAGS.VIS_DIR, 'vis_' + str(idx + 1)))
110 | return
111 |
112 | if __name__ == '__main__':
113 | tf.app.run()
114 |
--------------------------------------------------------------------------------
/networks/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/__init__.py
--------------------------------------------------------------------------------
/networks/base.py:
--------------------------------------------------------------------------------
1 | import os
2 | from glob import glob
3 | import tensorflow as tf
4 |
5 | class Model(object):
6 | """Abstract object representing an Reader model."""
7 | def __init__(self):
8 | return
9 |
10 | def save(self, sess, checkpoint_dir, dataset_name):
11 | self.saver = tf.train.Saver()
12 |
13 | print(" [*] Saving checkpoints...")
14 | model_name = type(self).__name__ or "Reader"
15 | model_dir = dataset_name
16 |
17 | checkpoint_dir = os.path.join(checkpoint_dir, model_dir)
18 | if not os.path.exists(checkpoint_dir):
19 | os.makedirs(checkpoint_dir)
20 | self.saver.save(sess, os.path.join(checkpoint_dir, model_name))
21 |
22 | def load(self, sess, checkpoint_dir, dataset_name, load_var):
23 |
24 | all_vars = tf.global_variables()
25 | if len(load_var) == 0:
26 | restore_vars = all_vars
27 | else:
28 | restore_vars = [var for var in all_vars if var in load_var]
29 | self.saver = tf.train.Saver(restore_vars)
30 | print(" [*] Loading checkpoints...")
31 | print(dataset_name)
32 | model_dir = dataset_name
33 | checkpoint_dir = os.path.join(checkpoint_dir, model_dir)
34 |
35 | ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
36 | if ckpt and ckpt.model_checkpoint_path:
37 | ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
38 | self.saver.restore(sess, os.path.join(checkpoint_dir, ckpt_name))
39 | return True
40 | else:
41 | return False
42 |
43 |
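44 | # Usage sketch (mirroring main.py): networks inheriting Model are saved and
45 | # restored with a session, a checkpoint root, and a snapshot name, e.g.
46 | #   net.save(sess, 'checkpoints', 'nldet_phase1_2000')
47 | #   net.load(sess, 'checkpoints', 'nldet_phase1_2000', restore_vars)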
--------------------------------------------------------------------------------
/networks/base_net/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/base_net/__init__.py
--------------------------------------------------------------------------------
/networks/base_net/net.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import os
3 | import tensorflow as tf
4 | import pdb
5 |
6 | class BaseNet:
7 | """ Class for basic net operations and structure
8 | """
9 | def __init__(self, sess):
10 | self.sess = sess
11 | self.ys = []
12 | self.xs = []
13 | self.grad_ys = None
14 | self.gradients_pool = {}
15 | self.average_grads = {}
16 | self.p_grads = []
17 | self.p_batch_sizes = 0.0
18 | self.current_batch_size = None
19 |
20 | #placeholders required outside for gradient accumulation
21 | self.batch_sizes = tf.placeholder(tf.float32, name = 'subbatch_sizes')
22 | self.batch_num = tf.placeholder(tf.int32, name = 'subbatch_nums')
23 |
24 | def accumulate(self):
25 | #accumulate gradients
26 | self.gradients = tf.gradients(self.ys, self.xs, grad_ys = self.grad_ys)
27 |
28 | for idx, var in enumerate(self.xs):
29 | self.gradients_pool[var] = tf.Variable(initial_value = np.zeros(1),
30 | validate_shape = False,
31 | trainable = False,
32 | dtype = tf.float32)
33 |
34 | def first_grad():
35 | ops = []
36 | for idx, var in enumerate(self.xs):
37 | ops.append(tf.assign(self.gradients_pool[var], self.gradients[idx],
38 | validate_shape = False))
39 | with tf.control_dependencies(ops):
40 | return tf.no_op()
41 |
42 | def normal_grad():
43 | ops = []
44 | for idx, var in enumerate(self.xs):
45 | ops.append(tf.assign_add(self.gradients_pool[var], self.gradients[idx]))
46 | with tf.control_dependencies(ops):
47 | return tf.no_op()
48 |
49 | #flow_control_list = [tf.contrib.framework.
50 | # convert_to_tensor_or_sparse_tensor(grad)
51 | # for grad in self.gradients]
52 | #with tf.control_dependencies(flow_control_list):
53 | self.accumulate_grad = tf.cond(tf.equal(self.batch_num, 0),
54 | first_grad, normal_grad)
55 |
56 | #calculate final gradients for this batch
57 | for var in self.xs:
58 | self.average_grads[var] = self.gradients_pool[var]
59 | # tf.div(self.gradients_pool[var],
60 | # self.batch_sizes))
61 |
62 | def backward(self, get_cpu_array = False):
63 | if get_cpu_array:
64 | self.p_grads = self.sess.run(list(self.average_grads.values()),
65 | feed_dict = {self.batch_sizes:
66 | self.p_batch_sizes})
67 | else:
68 | self.sess.run(list(self.average_grads.values()),
69 | feed_dict = {self.batch_sizes:
70 | self.p_batch_sizes})
71 | self.p_batch_sizes = 0.0
72 | return
73 |
74 | def get_input_gradients(self):
75 | return self.p_grads
76 |
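77 | # Usage sketch (wiring assumed to live in subclasses / net_wrapper.py):
78 | #   1. set self.ys (outputs), self.xs (variables), and optionally self.grad_ys
79 | #   2. call accumulate() once to build self.accumulate_grad
80 | #   3. run self.accumulate_grad for each sub-batch, feeding batch_num = 0 for
81 | #      the first sub-batch (overwrites the pool) and > 0 afterwards (adds to it)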
--------------------------------------------------------------------------------
/networks/image_feat_net/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/image_feat_net/__init__.py
--------------------------------------------------------------------------------
/networks/image_feat_net/net.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import numpy as np
3 | import os
4 | import inspect
5 | import pdb
6 |
7 | class ImageFeatNet:
8 | """ Network model for Image Feat Net
9 | Attributes:
10 | sess: TensorFlow session used to run the graph
11 | opt: momentum optimizers (opt_conv / opt_region) for the two variable groups
12 | max_batch_size: upper bound on the batch size (asserted in build())
13 | train: whether the network is built in training mode
14 | RegionNet_npy_path: path to the pretrained region network weights (.npy)
15 | """
16 | def __init__(self, sess, lr_conv, lr_region, max_batch_size = 32, train = True):
17 | self.lr_conv = lr_conv
18 | self.lr_region = lr_region
19 | self.opt_conv = tf.train.MomentumOptimizer(self.lr_conv, 0.9)
20 | self.opt_region = tf.train.MomentumOptimizer(self.lr_region, 0.9)
21 | #model hyperparameters
22 | self.sess = sess
23 | self.max_batch_size = max_batch_size
24 | self.train = train
25 |
26 | #physical inputs should be numpy arrays
27 | self.images = tf.placeholder(tf.float32, shape = [None, None, None, 3],
28 | name = 'image_inputs')
29 | #[batch_idx, xmin, ymin, xmax, ymax]
30 | self.rois = tf.placeholder(tf.float32, shape = [None, 5],
31 | name = 'roi_inputs')
32 | self.dropout_flag = tf.placeholder(tf.int32)
33 | self.p_images = None
34 | self.p_rois = None
35 |
36 | #physcial outputs
37 | self.p_region_feats = None
38 |
39 | def build(self, sub_net, output_grad = tf.placeholder(tf.float32),
40 | feature_dim = 4096, roi_size = 7, roi_scale = 0.0625,
41 | dropout_ratio = 0.3, weight_decay= 1e-4, batch_size = 16):
42 |
43 | #conv net base
44 | self.sub_net = sub_net
45 | self.output_grad = output_grad
46 |
47 | #optimization utility
48 | self.batch_size = batch_size
49 | assert self.batch_size < self.max_batch_size
50 |
51 | #model parameters
52 | self.parameters = {}
53 | self.parameters['feature_dim'] = feature_dim
54 | self.parameters['weight_decay'] = weight_decay
55 | self.parameters['dropout_ratio'] = dropout_ratio
56 | self.parameters['roi_size'] = roi_size
57 | self.parameters['roi_scale'] = roi_scale
58 | self.parameters['dropout_flag'] = self.dropout_flag
59 |
60 | #######################################################################
61 | ######################## NETWORK STARTS #########################
62 | #######################################################################
63 | self.roi_features = self.sub_net.build(self.images, self.rois, self.parameters)
64 | self.output = tf.Variable(initial_value = 1.0, trainable = False,
65 | validate_shape = False, dtype = tf.float32)
66 | self.get_output = tf.assign(self.output, self.roi_features,
67 | validate_shape = False)
68 |
69 | #gather weight decays
70 | self.wd = tf.add_n(tf.get_collection('img_net_weight_decay'),
71 | name = 'img_net_total_weight_decay')
72 | if self.sub_net.net_type == 'Vgg16':
73 | self.extra_update = [tf.no_op()]
74 | elif self.sub_net.net_type == 'Resnet101':
75 | self.extra_update = tf.get_collection('resnet_update_ops')
76 |
77 |
78 | def accumulate(self):
79 | self.ys = [self.wd, self.roi_features]
80 | self.grad_ys = [1.0, self.output_grad]
81 |
82 | self.gradients_conv = tf.gradients(self.ys, self.sub_net.varlist_conv, grad_ys = self.grad_ys)
83 | self.gradients_region = tf.gradients(self.ys, self.sub_net.varlist_region, grad_ys = self.grad_ys)
84 |
85 | self.grad_and_vars_conv = []
86 | self.grad_and_vars_region = []
87 |
88 | for idx, var in enumerate(self.sub_net.varlist_conv):
89 | self.grad_and_vars_conv.append((self.gradients_conv[idx], var))
90 | for idx, var in enumerate(self.sub_net.varlist_region):
91 | self.grad_and_vars_region.append((self.gradients_region[idx], var))
92 |
93 | #apply gradients
94 | with tf.control_dependencies(self.gradients_conv + self.gradients_region):
95 | self.train_op = tf.group(self.opt_conv.apply_gradients(self.grad_and_vars_conv),
96 | self.opt_region.apply_gradients(self.grad_and_vars_region), *self.extra_update)
97 |
98 | def set_input(self, images, rois):
99 | self.p_images = images
100 | self.p_rois = rois
101 |
102 | def get_output(self):
103 | return self.p_roi_features
104 |
105 | def forward(self, physical_output = False):
106 | if physical_output:
107 | [self.p_roi_features] = self.sess.run([self.get_output],
108 | feed_dict = {
109 | self.images: self.p_images,
110 | self.rois: self.p_rois,
111 | self.dropout_flag: 1})
112 | else:
113 | self.sess.run([self.get_output],
114 | feed_dict = {
115 | self.images: self.p_images,
116 | self.rois: self.p_rois,
117 | self.dropout_flag: 1})
118 | return
119 |
120 | def backward(self):
121 | self.sess.run([self.train_op],
122 | feed_dict = {
123 | self.images: self.p_images,
124 | self.rois: self.p_rois,
125 | self.dropout_flag: 0})
126 | return
127 |
128 |
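129 | # Usage sketch (call sites assumed to live in networks/net_wrapper.py):
130 | #   build(sub_net, ...) then accumulate() once; for each batch, call
131 | #   set_input(images, rois), forward(physical_output = True) to fetch region
132 | #   features, and backward() to apply gradients via train_op.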
--------------------------------------------------------------------------------
/networks/image_feat_net/resnet101/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/image_feat_net/resnet101/__init__.py
--------------------------------------------------------------------------------
/networks/image_feat_net/resnet101/net.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import inspect
3 | import os
4 | import tensorflow as tf
5 | from tensorflow.python.ops import control_flow_ops
6 | from tensorflow.python.training import moving_averages
7 | from ..roi_pooling_layer.roi_pooling_op import roi_pool
8 | from ..roi_pooling_layer.roi_pooling_op_grad import *
9 | import pdb
10 |
11 | MOVING_AVERAGE_DECAY = 0.9997
12 | BN_DECAY = MOVING_AVERAGE_DECAY
13 | BN_EPSILON = 1e-6
14 | CONV_WEIGHT_STDDEV = 0.1
15 | UPDATE_OPS_COLLECTION = 'resnet_update_ops' # must be grouped with training op
16 |
17 | class Resnet101:
18 | def __init__(self, RegionNet_npy_path='frcnn_Region_Feat_Net.npy', train=True):
19 | #load saved model
20 | try:
21 | path = inspect.getfile(Resnet101)
22 | path = os.path.abspath(os.path.join(path, os.pardir))
23 | RegionNet_npy_path = os.path.join(path, RegionNet_npy_path)
24 | self.data_dict = np.load(RegionNet_npy_path, encoding='latin1').item()
25 | print("Image Feat Net npy file loaded")
26 | except Exception:
27 | print('[WARNING!] Image Feat Net npy file not found, '
28 | 'we don\'t recommend training this network from scratch')
29 | self.data_dict = {}
30 | self.is_training = train
31 | self.varlist_conv = []
32 | self.varlist_region = []
33 | self.net_type = 'Resnet101'
34 | self.activation = tf.nn.relu
35 |
36 | self.cvar = {}
37 |
38 | def build(self, bgr, rois, parameters,
39 | num_blocks=[3, 4, 23, 3],
40 | use_bias=False):
41 |
42 | self.bgr = bgr
43 | self.rois = rois
44 | self.weight_decay = parameters['weight_decay']
45 | self.roi_size = parameters['roi_size']
46 | self.roi_scale = parameters['roi_scale']
47 |
48 | c = {}
49 | c['bottleneck'] = True
50 | c['is_training'] = tf.convert_to_tensor(self.is_training,
51 | dtype='bool',
52 | name='is_training')
53 | c['ksize'] = 3
54 | c['stride'] = 1
55 | c['use_bias'] = use_bias
56 | c['num_blocks'] = num_blocks
57 | c['stack_stride'] = 2
58 |
59 | with tf.variable_scope('scale1'):
60 | c['conv_filters_out'] = 64
61 | c['ksize'] = 7
62 | c['stride'] = 2
63 | self.conv1 = self.conv(tf.pad(self.bgr,
64 | [[0, 0], [3, 3], [3, 3], [0, 0]]), c, 'conv1', padding='VALID')
65 | self.bn_conv1 = self.bn(self.conv1, c, 'bn_conv1')
66 | self.scale1_feat = self.activation(self.bn_conv1)
67 |
68 | with tf.variable_scope('scale2'):
69 | self.scale1_pool = self._max_pool(tf.pad(self.scale1_feat,
70 | [[0, 0], [0, 1], [0, 1], [0, 0]]), ksize=3, stride=2)
71 | c['num_blocks'] = num_blocks[0]
72 | c['stack_stride'] = 1
73 | c['block_filters_internal'] = 64
74 | self.scale2_feat = self.stack(self.scale1_pool, c, '2')
75 |
76 | with tf.variable_scope('scale3'):
77 | c['num_blocks'] = num_blocks[1]
78 | c['block_filters_internal'] = 128
79 | c['stack_stride'] = 2
80 | self.scale3_feat = self.stack(self.scale2_feat, c, '3')
81 |
82 | with tf.variable_scope('scale4'):
83 | c['num_blocks'] = num_blocks[2]
84 | c['block_filters_internal'] = 256
85 | assert c['stack_stride'] == 2
86 | self.scale4_feat = self.stack(self.scale3_feat, c, '4')
87 |
88 |
89 | [self.rois_feat, _] = roi_pool(self.scale4_feat, self.rois,
90 | self.roi_size, self.roi_size,
91 | self.roi_scale)
92 |
93 | with tf.variable_scope('scale5'):
94 | c['num_blocks'] = num_blocks[3]
95 | c['block_filters_internal'] = 512
96 | assert c['stack_stride'] == 2
97 | self.scale5_feat = self.stack(self.rois_feat, c, '5', belong='region')
98 |
99 | # post-net
100 | self.final_feature = tf.reduce_mean(self.scale5_feat, reduction_indices=[1, 2], name="avg_pool")
101 |
102 | return self.final_feature
103 |
104 | def stack(self, x, c, stack_caffe_scale, belong='conv'):
105 | if c['num_blocks'] == 3:
106 | block_names = ['a', 'b', 'c']
107 | else:
108 | block_names = ['a'] + ['b' + str(i + 1) for i in range(c['num_blocks'] - 1)]
109 | for n in range(c['num_blocks']):
110 | s = c['stack_stride'] if n == 0 else 1
111 | c['block_stride'] = s
112 | with tf.variable_scope('block%d' % (n + 1)):
113 | x = self.block(x, c, stack_caffe_scale+block_names[n], belong)
114 | return x
115 |
116 |
117 | def block(self, x, c, block_caffe_name, belong='conv'):
118 | filters_in = x.get_shape()[-1]
119 |
120 | # Note: filters_out isn't how many filters are outputed.
121 | # That is the case when bottleneck=False but when bottleneck is
122 | # True, filters_internal*4 filters are outputted. filters_internal is how many filters
123 | # the 3x3 convs output internally.
124 | m = 4 if c['bottleneck'] else 1
125 | filters_out = m * c['block_filters_internal']
126 |
127 | shortcut = x # branch 1
128 |
129 | c['conv_filters_out'] = c['block_filters_internal']
130 |
131 | with tf.variable_scope('a'):
132 | c['ksize'] = 1
133 | c['stride'] = c['block_stride']
134 | x = self.conv(x, c, 'res'+block_caffe_name+'_branch2a', belong)
135 | self.cvar['res'+block_caffe_name+'_branch2a'] = x
136 | x = self.bn(x, c, 'bn'+block_caffe_name+'_branch2a', belong)
137 | self.cvar['bn'+block_caffe_name+'_branch2a'] = x
138 | x = self.activation(x)
139 |
140 | with tf.variable_scope('b'):
141 | c['ksize'] = 3
142 | c['stride'] = 1
143 | x = self.conv(x, c, 'res'+block_caffe_name+'_branch2b', belong)
144 | self.cvar['res'+block_caffe_name+'_branch2b'] = x
145 | x = self.bn(x, c, 'bn'+block_caffe_name+'_branch2b', belong)
146 | self.cvar['bn'+block_caffe_name+'_branch2b'] = x
147 | x = self.activation(x)
148 |
149 | with tf.variable_scope('c'):
150 | c['conv_filters_out'] = filters_out
151 | c['ksize'] = 1
152 | assert c['stride'] == 1
153 | x = self.conv(x, c, 'res'+block_caffe_name+'_branch2c', belong)
154 | self.cvar['res'+block_caffe_name+'_branch2c'] = x
155 | x = self.bn(x, c, 'bn'+block_caffe_name+'_branch2c', belong)
156 | self.cvar['bn'+block_caffe_name+'_branch2c'] = x
157 |
158 | with tf.variable_scope('shortcut'):
159 | if filters_out != filters_in or c['block_stride'] != 1:
160 | c['ksize'] = 1
161 | c['stride'] = c['block_stride']
162 | c['conv_filters_out'] = filters_out
163 | shortcut = self.conv(shortcut, c, 'res'+block_caffe_name+'_branch1', belong)
164 | self.cvar['res'+block_caffe_name+'_branch1'] = shortcut
165 | shortcut = self.bn(shortcut, c, 'bn'+block_caffe_name+'_branch1', belong)
166 | self.cvar['bn'+block_caffe_name+'_branch1'] = shortcut
167 |
168 | return self.activation(x + shortcut)
169 |
170 |
171 | def bn(self, x, c, caffe_name, belong='conv'):
172 | x_shape = x.get_shape()
173 | params_shape = x_shape[-1:]
174 |
175 | if c['use_bias']:
176 | bias = self._get_variable('bias', params_shape,
177 | initializer=tf.zeros_initializer())
178 | return x + bias
179 |
180 |
181 | axis = list(range(len(x_shape) - 1))
182 |
183 | beta = self._get_variable('beta',
184 | caffe_name,
185 | params_shape,
186 | key='offset',
187 | initializer=tf.zeros_initializer())
188 | gamma = self._get_variable('gamma',
189 | caffe_name,
190 | params_shape,
191 | key='scale',
192 | initializer=tf.ones_initializer())
193 |
194 | moving_mean = self._get_variable('moving_mean',
195 | caffe_name,
196 | params_shape,
197 | key='mean',
198 | initializer=tf.zeros_initializer(),
199 | trainable=False)
200 | moving_variance = self._get_variable('moving_variance',
201 | caffe_name,
202 | params_shape,
203 | key='variance',
204 | initializer=tf.ones_initializer(),
205 | trainable=False)
206 |
207 | if belong == 'conv':
208 | self.varlist_conv.extend([beta, gamma, moving_mean, moving_variance])
209 | elif belong == 'region':
210 | self.varlist_region.extend([beta, gamma, moving_mean, moving_variance])
211 |
212 | # These ops will only be performed when training.
213 | mean, variance = tf.nn.moments(x, axis)
214 | update_moving_mean = moving_averages.assign_moving_average(moving_mean,
215 | mean, BN_DECAY)
216 | update_moving_variance = moving_averages.assign_moving_average(
217 | moving_variance, variance, BN_DECAY)
218 | tf.add_to_collection(UPDATE_OPS_COLLECTION, update_moving_mean)
219 | tf.add_to_collection(UPDATE_OPS_COLLECTION, update_moving_variance)
220 |
221 | mean, variance = control_flow_ops.cond(
222 | c['is_training'], lambda: (mean, variance),
223 | lambda: (moving_mean, moving_variance))
224 |
225 | x = tf.nn.batch_normalization(x, mean, variance, beta, gamma, BN_EPSILON)
226 |
227 | return x
228 |
229 | def _get_variable(self,
230 | name,
231 | caffe_name,
232 | shape,
233 | initializer,
234 | key='weights',
235 | dtype='float',
236 | trainable=True):
237 |
238 | "A little wrapper around tf.get_variable to do weight decay and add to"
239 | "resnet collection"
240 | if self.data_dict.get(caffe_name):
241 | initializer = tf.constant_initializer(value = self.data_dict[caffe_name][key], dtype = tf.float32)
242 | else:
243 | print('[WARNING] Resnet block with caffe name '
244 | '%s:%s was randomly initialized' % (caffe_name, key))
245 | var = tf.get_variable(name, shape=shape, initializer=initializer,
246 | dtype=dtype, trainable=trainable)
247 | if self.weight_decay > 0:
248 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.weight_decay,
249 | name = 'weight_decay')
250 | tf.add_to_collection('img_net_weight_decay', weight_decay)
251 | return var
252 |
253 | def conv(self, x, c, caffe_name, belong='conv', padding='SAME'):
254 | ksize = c['ksize']
255 | stride = c['stride']
256 | filters_out = c['conv_filters_out']
257 |
258 | filters_in = x.get_shape()[-1]
259 | shape = [ksize, ksize, filters_in, filters_out]
260 | initializer = tf.truncated_normal_initializer(stddev=CONV_WEIGHT_STDDEV)
261 | weights = self._get_variable('weights',
262 | caffe_name,
263 | shape=shape,
264 | dtype='float32',
265 | initializer=initializer)
266 | if belong == 'conv':
267 | self.varlist_conv.append(weights)
268 | elif belong == 'region':
269 | self.varlist_region.append(weights)
270 | if ksize == 1 and stride == 2:
271 | padding = 'VALID'
272 | return tf.nn.conv2d(x, weights, [1, stride, stride, 1], padding=padding)
273 |
274 |
275 | def _max_pool(self, x, ksize=3, stride=2):
276 | return tf.nn.max_pool(x,
277 | ksize=[1, ksize, ksize, 1],
278 | strides=[1, stride, stride, 1],
279 | padding='VALID')
280 |
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/__init__.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling.so:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/image_feat_net/roi_pooling_layer/roi_pooling.so
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling_op.cc:
--------------------------------------------------------------------------------
1 | /* Copyright 2015 Google Inc. All Rights Reserved.
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 | ==============================================================================*/
15 |
16 | // An example Op.
17 |
18 | #include <stdio.h>
19 | #include <cfloat>
20 |
21 | #include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
22 | #include "tensorflow/core/framework/op.h"
23 | #include "tensorflow/core/framework/op_kernel.h"
24 | #include "tensorflow/core/framework/tensor_shape.h"
25 | #include "work_sharder.h"
26 |
27 | using namespace tensorflow;
28 | typedef Eigen::ThreadPoolDevice CPUDevice;
29 |
30 | REGISTER_OP("RoiPool")
31 | .Attr("T: {float, double}")
32 | .Attr("pooled_height: int")
33 | .Attr("pooled_width: int")
34 | .Attr("spatial_scale: float")
35 | .Input("bottom_data: T")
36 | .Input("bottom_rois: T")
37 | .Output("top_data: T")
38 | .Output("argmax: int32");
39 |
40 | REGISTER_OP("RoiPoolGrad")
41 | .Attr("T: {float, double}")
42 | .Attr("pooled_height: int")
43 | .Attr("pooled_width: int")
44 | .Attr("spatial_scale: float")
45 | .Input("bottom_data: T")
46 | .Input("bottom_rois: T")
47 | .Input("argmax: int32")
48 | .Input("grad: T")
49 | .Output("output: T");
50 |
51 | template <typename Device, typename T>
52 | class RoiPoolOp : public OpKernel {
53 | public:
54 | explicit RoiPoolOp(OpKernelConstruction* context) : OpKernel(context) {
55 | // Get the pool height
56 | OP_REQUIRES_OK(context,
57 | context->GetAttr("pooled_height", &pooled_height_));
58 | // Check that pooled_height is positive
59 | OP_REQUIRES(context, pooled_height_ >= 0,
60 | errors::InvalidArgument("Need pooled_height >= 0, got ",
61 | pooled_height_));
62 | // Get the pool width
63 | OP_REQUIRES_OK(context,
64 | context->GetAttr("pooled_width", &pooled_width_));
65 | // Check that pooled_width is positive
66 | OP_REQUIRES(context, pooled_width_ >= 0,
67 | errors::InvalidArgument("Need pooled_width >= 0, got ",
68 | pooled_width_));
69 | // Get the spatial scale
70 | OP_REQUIRES_OK(context,
71 | context->GetAttr("spatial_scale", &spatial_scale_));
72 | }
73 |
74 | void Compute(OpKernelContext* context) override
75 | {
76 | // Grab the input tensor
77 | const Tensor& bottom_data = context->input(0);
78 | const Tensor& bottom_rois = context->input(1);
79 | auto bottom_data_flat = bottom_data.flat<T>();
80 | auto bottom_rois_flat = bottom_rois.flat<T>();
81 |
82 | // data should have 4 dimensions.
83 | OP_REQUIRES(context, bottom_data.dims() == 4,
84 | errors::InvalidArgument("data must be 4-dimensional"));
85 |
86 | // rois should have 2 dimensions.
87 | OP_REQUIRES(context, bottom_rois.dims() == 2,
88 | errors::InvalidArgument("rois must be 2-dimensional"));
89 |
90 | // Number of ROIs
91 | int num_rois = bottom_rois.dim_size(0);
92 | // batch size
93 | int batch_size = bottom_data.dim_size(0);
94 | // data height
95 | int data_height = bottom_data.dim_size(1);
96 | // data width
97 | int data_width = bottom_data.dim_size(2);
98 | // Number of channels
99 | int num_channels = bottom_data.dim_size(3);
100 |
101 | // construct the output shape
102 | int dims[4];
103 | dims[0] = num_rois;
104 | dims[1] = pooled_height_;
105 | dims[2] = pooled_width_;
106 | dims[3] = num_channels;
107 | TensorShape output_shape;
108 | TensorShapeUtils::MakeShape(dims, 4, &output_shape);
109 |
110 | // Create output tensors
111 | Tensor* output_tensor = NULL;
112 | OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output_tensor));
113 | auto output = output_tensor->template flat<T>();
114 |
115 | Tensor* argmax_tensor = NULL;
116 | OP_REQUIRES_OK(context, context->allocate_output(1, output_shape, &argmax_tensor));
117 | auto argmax = argmax_tensor->template flat<int>();
118 |
119 | int pooled_height = pooled_height_;
120 | int pooled_width = pooled_width_;
121 | float spatial_scale = spatial_scale_;
122 |
123 | auto shard = [pooled_height, pooled_width, spatial_scale,
124 | num_rois, batch_size, data_height, data_width, num_channels,
125 | &bottom_data_flat, &bottom_rois_flat, &output, &argmax]
126 | (int64 start, int64 limit) {
127 | for (int64 b = start; b < limit; ++b)
128 | {
129 | // (n, ph, pw, c) is an element in the pooled output
130 | int n = b;
131 | int c = n % num_channels;
132 | n /= num_channels;
133 | int pw = n % pooled_width;
134 | n /= pooled_width;
135 | int ph = n % pooled_height;
136 | n /= pooled_height;
137 |
138 | const float* bottom_rois = bottom_rois_flat.data() + n * 5;
139 | int roi_batch_ind = bottom_rois[0];
140 | int roi_start_w = round(bottom_rois[1] * spatial_scale);
141 | int roi_start_h = round(bottom_rois[2] * spatial_scale);
142 | int roi_end_w = round(bottom_rois[3] * spatial_scale);
143 | int roi_end_h = round(bottom_rois[4] * spatial_scale);
144 |
145 | // Force malformed ROIs to be 1x1
146 | int roi_width = std::max(roi_end_w - roi_start_w + 1, 1);
147 | int roi_height = std::max(roi_end_h - roi_start_h + 1, 1);
148 | const T bin_size_h = static_cast<T>(roi_height)
149 | / static_cast<T>(pooled_height);
150 | const T bin_size_w = static_cast<T>(roi_width)
151 | / static_cast<T>(pooled_width);
152 |
153 | int hstart = static_cast<int>(floor(ph * bin_size_h));
154 | int wstart = static_cast<int>(floor(pw * bin_size_w));
155 | int hend = static_cast<int>(ceil((ph + 1) * bin_size_h));
156 | int wend = static_cast<int>(ceil((pw + 1) * bin_size_w));
157 |
158 | // Add roi offsets and clip to input boundaries
159 | hstart = std::min(std::max(hstart + roi_start_h, 0), data_height);
160 | hend = std::min(std::max(hend + roi_start_h, 0), data_height);
161 | wstart = std::min(std::max(wstart + roi_start_w, 0), data_width);
162 | wend = std::min(std::max(wend + roi_start_w, 0), data_width);
163 | bool is_empty = (hend <= hstart) || (wend <= wstart);
164 |
165 | // Define an empty pooling region to be zero
166 | float maxval = is_empty ? 0 : -FLT_MAX;
167 | // If nothing is pooled, argmax = -1 causes nothing to be backprop'd
168 | int maxidx = -1;
169 | const float* bottom_data = bottom_data_flat.data() + roi_batch_ind * num_channels * data_height * data_width;
170 | for (int h = hstart; h < hend; ++h) {
171 | for (int w = wstart; w < wend; ++w) {
172 | int bottom_index = (h * data_width + w) * num_channels + c;
173 | if (bottom_data[bottom_index] > maxval) {
174 | maxval = bottom_data[bottom_index];
175 | maxidx = bottom_index;
176 | }
177 | }
178 | }
179 | output(b) = maxval;
180 | argmax(b) = maxidx;
181 | }
182 | };
183 |
184 | const DeviceBase::CpuWorkerThreads& worker_threads =
185 | *(context->device()->tensorflow_cpu_worker_threads());
186 | const int64 shard_cost =
187 | num_rois * num_channels * pooled_height * pooled_width * spatial_scale;
188 | Shard(worker_threads.num_threads, worker_threads.workers,
189 | output.size(), shard_cost, shard);
190 | }
191 | private:
192 | int pooled_height_;
193 | int pooled_width_;
194 | float spatial_scale_;
195 | };
196 |
197 | bool ROIPoolForwardLaucher(
198 | const float* bottom_data, const float spatial_scale, const int num_rois, const int height,
199 | const int width, const int channels, const int pooled_height,
200 | const int pooled_width, const float* bottom_rois,
201 | float* top_data, int* argmax_data, const Eigen::GpuDevice& d);
202 |
203 | static void RoiPoolingKernel(
204 | OpKernelContext* context, const Tensor* bottom_data, const Tensor* bottom_rois,
205 | const float spatial_scale, const int num_rois, const int height,
206 | const int width, const int channels, const int pooled_height,
207 | const int pooled_width, const TensorShape& tensor_output_shape)
208 | {
209 | Tensor* output = nullptr;
210 | Tensor* argmax = nullptr;
211 | OP_REQUIRES_OK(context, context->allocate_output(0, tensor_output_shape, &output));
212 | OP_REQUIRES_OK(context, context->allocate_output(1, tensor_output_shape, &argmax));
213 |
214 | if (!context->status().ok()) {
215 | return;
216 | }
217 |
218 | ROIPoolForwardLaucher(
219 | bottom_data->flat<float>().data(), spatial_scale, num_rois, height,
220 | width, channels, pooled_height, pooled_width, bottom_rois->flat<float>().data(),
221 | output->flat<float>().data(), argmax->flat<int>().data(), context->eigen_device<Eigen::GpuDevice>());
222 | }
223 |
224 | template <class T>
225 | class RoiPoolOp<Eigen::GpuDevice, T> : public OpKernel {
226 | public:
227 | typedef Eigen::GpuDevice Device;
228 |
229 | explicit RoiPoolOp(OpKernelConstruction* context) : OpKernel(context) {
230 |
231 | // Get the pool height
232 | OP_REQUIRES_OK(context,
233 | context->GetAttr("pooled_height", &pooled_height_));
234 | // Check that pooled_height is positive
235 | OP_REQUIRES(context, pooled_height_ >= 0,
236 | errors::InvalidArgument("Need pooled_height >= 0, got ",
237 | pooled_height_));
238 | // Get the pool width
239 | OP_REQUIRES_OK(context,
240 | context->GetAttr("pooled_width", &pooled_width_));
241 | // Check that pooled_width is non-negative
242 | OP_REQUIRES(context, pooled_width_ >= 0,
243 | errors::InvalidArgument("Need pooled_width >= 0, got ",
244 | pooled_width_));
245 | // Get the spatial scale
246 | OP_REQUIRES_OK(context,
247 | context->GetAttr("spatial_scale", &spatial_scale_));
248 | }
249 |
250 | void Compute(OpKernelContext* context) override
251 | {
252 | // Grab the input tensor
253 | const Tensor& bottom_data = context->input(0);
254 | const Tensor& bottom_rois = context->input(1);
255 |
256 | // data should have 4 dimensions.
257 | OP_REQUIRES(context, bottom_data.dims() == 4,
258 | errors::InvalidArgument("data must be 4-dimensional"));
259 |
260 | // rois should have 2 dimensions.
261 | OP_REQUIRES(context, bottom_rois.dims() == 2,
262 | errors::InvalidArgument("rois must be 2-dimensional"));
263 |
264 | // Number of ROIs
265 | int num_rois = bottom_rois.dim_size(0);
266 | // batch size
267 | int batch_size = bottom_data.dim_size(0);
268 | // data height
269 | int data_height = bottom_data.dim_size(1);
270 | // data width
271 | int data_width = bottom_data.dim_size(2);
272 | // Number of channels
273 | int num_channels = bottom_data.dim_size(3);
274 |
275 | // construct the output shape
276 | int dims[4];
277 | dims[0] = num_rois;
278 | dims[1] = pooled_height_;
279 | dims[2] = pooled_width_;
280 | dims[3] = num_channels;
281 | TensorShape output_shape;
282 | TensorShapeUtils::MakeShape(dims, 4, &output_shape);
283 |
284 | RoiPoolingKernel(context, &bottom_data, &bottom_rois, spatial_scale_, num_rois, data_height,
285 | data_width, num_channels, pooled_height_, pooled_width_, output_shape);
286 |
287 | }
288 | private:
289 | int pooled_height_;
290 | int pooled_width_;
291 | float spatial_scale_;
292 | };
293 |
294 | // compute gradient
295 | template <class Device, class T>
296 | class RoiPoolGradOp : public OpKernel {
297 | public:
298 | explicit RoiPoolGradOp(OpKernelConstruction* context) : OpKernel(context) {
299 |
300 | // Get the pool height
301 | OP_REQUIRES_OK(context,
302 | context->GetAttr("pooled_height", &pooled_height_));
304 | // Check that pooled_height is non-negative
304 | OP_REQUIRES(context, pooled_height_ >= 0,
305 | errors::InvalidArgument("Need pooled_height >= 0, got ",
306 | pooled_height_));
307 | // Get the pool width
308 | OP_REQUIRES_OK(context,
309 | context->GetAttr("pooled_width", &pooled_width_));
311 | // Check that pooled_width is non-negative
311 | OP_REQUIRES(context, pooled_width_ >= 0,
312 | errors::InvalidArgument("Need pooled_width >= 0, got ",
313 | pooled_width_));
314 | // Get the spatial scale
315 | OP_REQUIRES_OK(context,
316 | context->GetAttr("spatial_scale", &spatial_scale_));
317 | }
318 |
319 | void Compute(OpKernelContext* context) override
320 | {
321 | // Grab the input tensor
322 | const Tensor& bottom_data = context->input(0);
323 | const Tensor& bottom_rois = context->input(1);
324 | const Tensor& argmax_data = context->input(2);
325 | const Tensor& out_backprop = context->input(3);
326 |
327 | auto bottom_data_flat = bottom_data.flat<float>();
328 | auto bottom_rois_flat = bottom_rois.flat<float>();
329 | auto argmax_data_flat = argmax_data.flat<int32>();
330 | auto out_backprop_flat = out_backprop.flat<float>();
331 |
332 | // data should have 4 dimensions.
333 | OP_REQUIRES(context, bottom_data.dims() == 4,
334 | errors::InvalidArgument("data must be 4-dimensional"));
335 |
336 | // rois should have 2 dimensions.
337 | OP_REQUIRES(context, bottom_rois.dims() == 2,
338 | errors::InvalidArgument("rois must be 2-dimensional"));
339 |
340 | OP_REQUIRES(context, argmax_data.dims() == 4,
341 | errors::InvalidArgument("argmax_data must be 4-dimensional"));
342 |
343 | OP_REQUIRES(context, out_backprop.dims() == 4,
344 | errors::InvalidArgument("out_backprop must be 4-dimensional"));
345 |
346 | // Number of ROIs
347 | int num_rois = bottom_rois.dim_size(0);
348 | // batch size
349 | int batch_size = bottom_data.dim_size(0);
350 | // data height
351 | int data_height = bottom_data.dim_size(1);
352 | // data width
353 | int data_width = bottom_data.dim_size(2);
354 | // Number of channels
355 | int num_channels = bottom_data.dim_size(3);
356 |
357 | // construct the output shape
358 | TensorShape output_shape = bottom_data.shape();
359 |
360 | // Create output tensors
361 | Tensor* output_tensor = NULL;
362 | OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output_tensor));
363 | auto output = output_tensor->template flat<float>();
364 |
365 | int pooled_height = pooled_height_;
366 | int pooled_width = pooled_width_;
367 | float spatial_scale = spatial_scale_;
368 |
369 | auto shard = [pooled_height, pooled_width, spatial_scale,
370 | num_rois, batch_size, data_height, data_width, num_channels,
371 | &bottom_data_flat, &bottom_rois_flat, &argmax_data_flat,
372 | &out_backprop_flat, &output](int64 start, int64 limit) {
373 | for (int64 b = start; b < limit; ++b)
374 | {
375 | // (n, h, w, c) coords in bottom data
376 | int n = b;
377 | int c = n % num_channels;
378 | n /= num_channels;
379 | int w = n % data_width;
380 | n /= data_width;
381 | int h = n % data_height;
382 | n /= data_height;
383 |
384 | float gradient = 0.0;
385 | // Accumulate gradient over all ROIs that pooled this element
386 | for (int roi_n = 0; roi_n < num_rois; ++roi_n)
387 | {
388 | const float* offset_bottom_rois = bottom_rois_flat.data() + roi_n * 5;
389 | int roi_batch_ind = offset_bottom_rois[0];
390 | // Skip if ROI's batch index doesn't match n
391 | if (n != roi_batch_ind) {
392 | continue;
393 | }
394 |
395 | int roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
396 | int roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
397 | int roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
398 | int roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
399 |
400 | // Skip if ROI doesn't include (h, w)
401 | const bool in_roi = (w >= roi_start_w && w <= roi_end_w &&
402 | h >= roi_start_h && h <= roi_end_h);
403 | if (!in_roi) {
404 | continue;
405 | }
406 |
407 | int offset = roi_n * pooled_height * pooled_width * num_channels;
408 | const float* offset_top_diff = out_backprop_flat.data() + offset;
409 | const int* offset_argmax_data = argmax_data_flat.data() + offset;
410 |
411 | // Compute feasible set of pooled units that could have pooled
412 | // this bottom unit
413 |
414 | // Force malformed ROIs to be 1x1
415 | int roi_width = std::max(roi_end_w - roi_start_w + 1, 1);
416 | int roi_height = std::max(roi_end_h - roi_start_h + 1, 1);
417 |
418 | const T bin_size_h = static_cast<T>(roi_height)
419 | / static_cast<T>(pooled_height);
420 | const T bin_size_w = static_cast<T>(roi_width)
421 | / static_cast<T>(pooled_width);
422 |
423 | int phstart = floor(static_cast<T>(h - roi_start_h) / bin_size_h);
424 | int phend = ceil(static_cast<T>(h - roi_start_h + 1) / bin_size_h);
425 | int pwstart = floor(static_cast<T>(w - roi_start_w) / bin_size_w);
426 | int pwend = ceil(static_cast<T>(w - roi_start_w + 1) / bin_size_w);
427 |
428 | phstart = std::min(std::max(phstart, 0), pooled_height);
429 | phend = std::min(std::max(phend, 0), pooled_height);
430 | pwstart = std::min(std::max(pwstart, 0), pooled_width);
431 | pwend = std::min(std::max(pwend, 0), pooled_width);
432 |
433 | for (int ph = phstart; ph < phend; ++ph) {
434 | for (int pw = pwstart; pw < pwend; ++pw) {
435 | if (offset_argmax_data[(ph * pooled_width + pw) * num_channels + c] == (h * data_width + w) * num_channels + c)
436 | {
437 | gradient += offset_top_diff[(ph * pooled_width + pw) * num_channels + c];
438 | }
439 | }
440 | }
441 | }
442 | output(b) = gradient;
443 | }
444 | };
445 |
446 | const DeviceBase::CpuWorkerThreads& worker_threads =
447 | *(context->device()->tensorflow_cpu_worker_threads());
448 | const int64 shard_cost =
449 | num_rois * num_channels * pooled_height * pooled_width * spatial_scale;
450 | Shard(worker_threads.num_threads, worker_threads.workers,
451 | output.size(), shard_cost, shard);
452 | }
453 | private:
454 | int pooled_height_;
455 | int pooled_width_;
456 | float spatial_scale_;
457 | };
458 |
459 | bool ROIPoolBackwardLaucher(const float* top_diff, const float spatial_scale, const int batch_size, const int num_rois,
460 | const int height, const int width, const int channels, const int pooled_height,
461 | const int pooled_width, const float* bottom_rois,
462 | float* bottom_diff, const int* argmax_data, const Eigen::GpuDevice& d);
463 |
464 | static void RoiPoolingGradKernel(
465 | OpKernelContext* context, const Tensor* bottom_data, const Tensor* bottom_rois, const Tensor* argmax_data, const Tensor* out_backprop,
466 | const float spatial_scale, const int batch_size, const int num_rois, const int height,
467 | const int width, const int channels, const int pooled_height,
468 | const int pooled_width, const TensorShape& tensor_output_shape)
469 | {
470 | Tensor* output = nullptr;
471 | OP_REQUIRES_OK(context, context->allocate_output(0, tensor_output_shape, &output));
472 |
473 | if (!context->status().ok()) {
474 | return;
475 | }
476 |
477 | ROIPoolBackwardLaucher(
478 | out_backprop->flat<float>().data(), spatial_scale, batch_size, num_rois, height,
479 | width, channels, pooled_height, pooled_width, bottom_rois->flat<float>().data(),
480 | output->flat<float>().data(), argmax_data->flat<int>().data(), context->eigen_device<Eigen::GpuDevice>());
481 | }
482 |
483 |
484 | template <class T>
485 | class RoiPoolGradOp<Eigen::GpuDevice, T> : public OpKernel {
486 | public:
487 | explicit RoiPoolGradOp(OpKernelConstruction* context) : OpKernel(context) {
488 |
489 | // Get the pool height
490 | OP_REQUIRES_OK(context,
491 | context->GetAttr("pooled_height", &pooled_height_));
493 | // Check that pooled_height is non-negative
493 | OP_REQUIRES(context, pooled_height_ >= 0,
494 | errors::InvalidArgument("Need pooled_height >= 0, got ",
495 | pooled_height_));
496 | // Get the pool width
497 | OP_REQUIRES_OK(context,
498 | context->GetAttr("pooled_width", &pooled_width_));
500 | // Check that pooled_width is non-negative
500 | OP_REQUIRES(context, pooled_width_ >= 0,
501 | errors::InvalidArgument("Need pooled_width >= 0, got ",
502 | pooled_width_));
503 | // Get the spatial scale
504 | OP_REQUIRES_OK(context,
505 | context->GetAttr("spatial_scale", &spatial_scale_));
506 | }
507 |
508 | void Compute(OpKernelContext* context) override
509 | {
510 | // Grab the input tensor
511 | const Tensor& bottom_data = context->input(0);
512 | const Tensor& bottom_rois = context->input(1);
513 | const Tensor& argmax_data = context->input(2);
514 | const Tensor& out_backprop = context->input(3);
515 |
516 | // data should have 4 dimensions.
517 | OP_REQUIRES(context, bottom_data.dims() == 4,
518 | errors::InvalidArgument("data must be 4-dimensional"));
519 |
520 | // rois should have 2 dimensions.
521 | OP_REQUIRES(context, bottom_rois.dims() == 2,
522 | errors::InvalidArgument("rois must be 2-dimensional"));
523 |
524 | OP_REQUIRES(context, argmax_data.dims() == 4,
525 | errors::InvalidArgument("argmax_data must be 4-dimensional"));
526 |
527 | OP_REQUIRES(context, out_backprop.dims() == 4,
528 | errors::InvalidArgument("out_backprop must be 4-dimensional"));
529 |
530 | // Number of ROIs
531 | int num_rois = bottom_rois.dim_size(0);
532 | // batch size
533 | int batch_size = bottom_data.dim_size(0);
534 | // data height
535 | int height = bottom_data.dim_size(1);
536 | // data width
537 | int width = bottom_data.dim_size(2);
538 | // Number of channels
539 | int channels = bottom_data.dim_size(3);
540 |
541 | // construct the output shape
542 | TensorShape output_shape = bottom_data.shape();
543 |
544 | RoiPoolingGradKernel(
545 | context, &bottom_data, &bottom_rois, &argmax_data, &out_backprop,
546 | spatial_scale_, batch_size, num_rois, height, width, channels, pooled_height_,
547 | pooled_width_, output_shape);
548 |
549 | }
550 | private:
551 | int pooled_height_;
552 | int pooled_width_;
553 | float spatial_scale_;
554 | };
555 |
556 | REGISTER_KERNEL_BUILDER(Name("RoiPool").Device(DEVICE_CPU).TypeConstraint<float>("T"), RoiPoolOp<CPUDevice, float>);
557 | REGISTER_KERNEL_BUILDER(Name("RoiPoolGrad").Device(DEVICE_CPU).TypeConstraint<float>("T"), RoiPoolGradOp<CPUDevice, float>);
558 | #if GOOGLE_CUDA
559 | REGISTER_KERNEL_BUILDER(Name("RoiPool").Device(DEVICE_GPU).TypeConstraint<float>("T"), RoiPoolOp<Eigen::GpuDevice, float>);
560 | REGISTER_KERNEL_BUILDER(Name("RoiPoolGrad").Device(DEVICE_GPU).TypeConstraint<float>("T"), RoiPoolGradOp<Eigen::GpuDevice, float>);
561 | #endif
561 | #endif
562 |
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling_op.cu.o:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/image_feat_net/roi_pooling_layer/roi_pooling_op.cu.o
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling_op.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import os.path as osp
3 |
4 | filename = osp.join(osp.dirname(__file__), 'roi_pooling.so')
5 | _roi_pooling_module = tf.load_op_library(filename)
6 | roi_pool = _roi_pooling_module.roi_pool
7 | roi_pool_grad = _roi_pooling_module.roi_pool_grad
8 |
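
A minimal usage sketch for these bindings (illustrative values only; it assumes `roi_pooling.so` has been compiled, e.g. via `setup.sh`, and that this module is imported as part of the `networks` package):

    # Sketch only: roi_pool signature per the test script further below --
    # roi_pool(features, rois, pooled_height, pooled_width, spatial_scale)
    import tensorflow as tf
    from networks.image_feat_net.roi_pooling_layer.roi_pooling_op import roi_pool

    feats = tf.placeholder(tf.float32, [None, None, None, 512])  # NHWC conv features
    rois = tf.placeholder(tf.float32, [None, 5])  # each row: [batch_idx, x1, y1, x2, y2]
    # pool every ROI into a 7x7 grid of per-channel maxima (7 and 1/16 are illustrative)
    pooled, argmax = roi_pool(feats, rois, 7, 7, 1.0 / 16)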
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling_op_gpu.cu.cc:
--------------------------------------------------------------------------------
1 | #if GOOGLE_CUDA
2 |
3 | #define EIGEN_USE_GPU
4 |
5 | #include <stdio.h>
6 | #include <cfloat>
7 | #include "roi_pooling_op_gpu.h"
8 |
9 | #define CUDA_1D_KERNEL_LOOP(i, n) \
10 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
11 | i += blockDim.x * gridDim.x)
12 |
13 | using std::max;
14 | using std::min;
15 |
16 | // namespace tensorflow {
17 | using namespace tensorflow;
18 |
19 | template <typename Dtype>
20 | __global__ void ROIPoolForward(const int nthreads, const Dtype* bottom_data,
21 | const Dtype spatial_scale, const int height, const int width,
22 | const int channels, const int pooled_height, const int pooled_width,
23 | const Dtype* bottom_rois, Dtype* top_data, int* argmax_data)
24 | {
25 | CUDA_1D_KERNEL_LOOP(index, nthreads)
26 | {
27 | // (n, ph, pw, c) is an element in the pooled output
28 | int n = index;
29 | int c = n % channels;
30 | n /= channels;
31 | int pw = n % pooled_width;
32 | n /= pooled_width;
33 | int ph = n % pooled_height;
34 | n /= pooled_height;
35 |
36 | bottom_rois += n * 5;
37 | int roi_batch_ind = bottom_rois[0];
38 | int roi_start_w = round(bottom_rois[1] * spatial_scale);
39 | int roi_start_h = round(bottom_rois[2] * spatial_scale);
40 | int roi_end_w = round(bottom_rois[3] * spatial_scale);
41 | int roi_end_h = round(bottom_rois[4] * spatial_scale);
42 |
43 | // Force malformed ROIs to be 1x1
44 | int roi_width = max(roi_end_w - roi_start_w + 1, 1);
45 | int roi_height = max(roi_end_h - roi_start_h + 1, 1);
46 | Dtype bin_size_h = static_cast<Dtype>(roi_height)
47 | / static_cast<Dtype>(pooled_height);
48 | Dtype bin_size_w = static_cast<Dtype>(roi_width)
49 | / static_cast<Dtype>(pooled_width);
50 |
51 | int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)
52 | * bin_size_h));
53 | int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)
54 | * bin_size_w));
55 | int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)
56 | * bin_size_h));
57 | int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)
58 | * bin_size_w));
59 |
60 | // Add roi offsets and clip to input boundaries
61 | hstart = min(max(hstart + roi_start_h, 0), height);
62 | hend = min(max(hend + roi_start_h, 0), height);
63 | wstart = min(max(wstart + roi_start_w, 0), width);
64 | wend = min(max(wend + roi_start_w, 0), width);
65 | bool is_empty = (hend <= hstart) || (wend <= wstart);
66 |
67 | // Define an empty pooling region to be zero
68 | Dtype maxval = is_empty ? 0 : -FLT_MAX;
69 | // If nothing is pooled, argmax = -1 causes nothing to be backprop'd
70 | int maxidx = -1;
71 | bottom_data += roi_batch_ind * channels * height * width;
72 | for (int h = hstart; h < hend; ++h) {
73 | for (int w = wstart; w < wend; ++w) {
74 | int bottom_index = (h * width + w) * channels + c;
75 | if (bottom_data[bottom_index] > maxval) {
76 | maxval = bottom_data[bottom_index];
77 | maxidx = bottom_index;
78 | }
79 | }
80 | }
81 | top_data[index] = maxval;
82 | if (argmax_data != nullptr)
83 | argmax_data[index] = maxidx;
84 | }
85 | }
86 |
87 | bool ROIPoolForwardLaucher(
88 | const float* bottom_data, const float spatial_scale, const int num_rois, const int height,
89 | const int width, const int channels, const int pooled_height,
90 | const int pooled_width, const float* bottom_rois,
91 | float* top_data, int* argmax_data, const Eigen::GpuDevice& d)
92 | {
93 | const int kThreadsPerBlock = 1024;
94 | const int output_size = num_rois * pooled_height * pooled_width * channels;
95 | cudaError_t err;
96 |
97 | ROIPoolForward<float><<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock,
98 | kThreadsPerBlock, 0, d.stream()>>>(
99 | output_size, bottom_data, spatial_scale, height, width, channels, pooled_height,
100 | pooled_width, bottom_rois, top_data, argmax_data);
101 |
102 | err = cudaGetLastError();
103 | if(cudaSuccess != err)
104 | {
105 | fprintf( stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString( err ) );
106 | exit( -1 );
107 | }
108 |
109 | return d.ok();
110 | }
111 |
112 |
113 | template <typename Dtype>
114 | __global__ void ROIPoolBackward(const int nthreads, const Dtype* top_diff,
115 | const int* argmax_data, const int num_rois, const Dtype spatial_scale,
116 | const int height, const int width, const int channels,
117 | const int pooled_height, const int pooled_width, Dtype* bottom_diff,
118 | const Dtype* bottom_rois) {
119 | CUDA_1D_KERNEL_LOOP(index, nthreads)
120 | {
121 | // (n, h, w, c) coords in bottom data
122 | int n = index;
123 | int c = n % channels;
124 | n /= channels;
125 | int w = n % width;
126 | n /= width;
127 | int h = n % height;
128 | n /= height;
129 |
130 | Dtype gradient = 0;
131 | // Accumulate gradient over all ROIs that pooled this element
132 | for (int roi_n = 0; roi_n < num_rois; ++roi_n)
133 | {
134 | const Dtype* offset_bottom_rois = bottom_rois + roi_n * 5;
135 | int roi_batch_ind = offset_bottom_rois[0];
136 | // Skip if ROI's batch index doesn't match n
137 | if (n != roi_batch_ind) {
138 | continue;
139 | }
140 |
141 | int roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
142 | int roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
143 | int roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
144 | int roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
145 |
146 | // Skip if ROI doesn't include (h, w)
147 | const bool in_roi = (w >= roi_start_w && w <= roi_end_w &&
148 | h >= roi_start_h && h <= roi_end_h);
149 | if (!in_roi) {
150 | continue;
151 | }
152 |
153 | int offset = roi_n * pooled_height * pooled_width * channels;
154 | const Dtype* offset_top_diff = top_diff + offset;
155 | const int* offset_argmax_data = argmax_data + offset;
156 |
157 | // Compute feasible set of pooled units that could have pooled
158 | // this bottom unit
159 |
160 | // Force malformed ROIs to be 1x1
161 | int roi_width = max(roi_end_w - roi_start_w + 1, 1);
162 | int roi_height = max(roi_end_h - roi_start_h + 1, 1);
163 |
164 | Dtype bin_size_h = static_cast<Dtype>(roi_height)
165 | / static_cast<Dtype>(pooled_height);
166 | Dtype bin_size_w = static_cast<Dtype>(roi_width)
167 | / static_cast<Dtype>(pooled_width);
168 |
169 | int phstart = floor(static_cast<Dtype>(h - roi_start_h) / bin_size_h);
170 | int phend = ceil(static_cast<Dtype>(h - roi_start_h + 1) / bin_size_h);
171 | int pwstart = floor(static_cast<Dtype>(w - roi_start_w) / bin_size_w);
172 | int pwend = ceil(static_cast<Dtype>(w - roi_start_w + 1) / bin_size_w);
173 |
174 | phstart = min(max(phstart, 0), pooled_height);
175 | phend = min(max(phend, 0), pooled_height);
176 | pwstart = min(max(pwstart, 0), pooled_width);
177 | pwend = min(max(pwend, 0), pooled_width);
178 |
179 | for (int ph = phstart; ph < phend; ++ph) {
180 | for (int pw = pwstart; pw < pwend; ++pw) {
181 | if (offset_argmax_data[(ph * pooled_width + pw) * channels + c] == (h * width + w) * channels + c)
182 | {
183 | gradient += offset_top_diff[(ph * pooled_width + pw) * channels + c];
184 | }
185 | }
186 | }
187 | }
188 | bottom_diff[index] = gradient;
189 | }
190 | }
191 |
192 |
193 | bool ROIPoolBackwardLaucher(const float* top_diff, const float spatial_scale, const int batch_size, const int num_rois,
194 | const int height, const int width, const int channels, const int pooled_height,
195 | const int pooled_width, const float* bottom_rois,
196 | float* bottom_diff, const int* argmax_data, const Eigen::GpuDevice& d)
197 | {
198 | const int kThreadsPerBlock = 1024;
199 | const int output_size = batch_size * height * width * channels;
200 | cudaError_t err;
201 |
202 | ROIPoolBackward<float><<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock,
203 | kThreadsPerBlock, 0, d.stream()>>>(
204 | output_size, top_diff, argmax_data, num_rois, spatial_scale, height, width, channels, pooled_height,
205 | pooled_width, bottom_diff, bottom_rois);
206 |
207 | err = cudaGetLastError();
208 | if(cudaSuccess != err)
209 | {
210 | fprintf( stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString( err ) );
211 | exit( -1 );
212 | }
213 |
214 | return d.ok();
215 | }
216 |
217 | // } // namespace tensorflow
218 |
219 | #endif // GOOGLE_CUDA
220 |
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling_op_gpu.h:
--------------------------------------------------------------------------------
1 | #if !GOOGLE_CUDA
2 | #error This file must only be included when building with Cuda support
3 | #endif
4 |
5 | #ifndef TENSORFLOW_USER_OPS_ROIPOOLING_OP_GPU_H_
6 | #define TENSORFLOW_USER_OPS_ROIPOOLING_OP_GPU_H_
7 |
8 | #define EIGEN_USE_GPU
9 |
10 | #include "tensorflow/core/framework/tensor_types.h"
11 | #include "tensorflow/core/platform/types.h"
12 |
13 | namespace tensorflow {
14 |
15 | // Run the forward pass of ROI max pooling, optionally writing the argmax
16 | // indices to the argmax_data array, if it is not nullptr. If argmax_data is
17 | // passed in as nullptr, the argmax indices are not written.
18 | bool ROIPoolForwardLaucher(
19 | const float* bottom_data, const float spatial_scale, const int num_rois, const int height,
20 | const int width, const int channels, const int pooled_height,
21 | const int pooled_width, const float* bottom_rois,
22 | float* top_data, int* argmax_data, const Eigen::GpuDevice& d);
23 |
24 | bool ROIPoolBackwardLaucher(const float* top_diff, const float spatial_scale, const int batch_size, const int num_rois,
25 | const int height, const int width, const int channels, const int pooled_height,
26 | const int pooled_width, const float* bottom_rois,
27 | float* bottom_diff, const int* argmax_data, const Eigen::GpuDevice& d);
28 |
29 | } // namespace tensorflow
30 |
31 | #endif  // TENSORFLOW_USER_OPS_ROIPOOLING_OP_GPU_H_
32 |
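
To make the bin arithmetic shared by these two launchers concrete, here is an illustrative NumPy re-implementation of the forward pass for a single ROI; it mirrors the CUDA kernel above but is only a sketch, not code from this repository:

    import numpy as np

    def roi_max_pool(feat, roi, pooled_h, pooled_w, spatial_scale):
        """feat: [H, W, C] feature map; roi: (x1, y1, x2, y2) in image coordinates."""
        x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in roi]
        roi_h = max(y2 - y1 + 1, 1)  # force malformed ROIs to be 1x1
        roi_w = max(x2 - x1 + 1, 1)
        bin_h, bin_w = roi_h / float(pooled_h), roi_w / float(pooled_w)
        H, W, C = feat.shape
        out = np.zeros((pooled_h, pooled_w, C), feat.dtype)  # empty bins pool to zero
        for ph in range(pooled_h):
            for pw in range(pooled_w):
                hs = min(max(int(np.floor(ph * bin_h)) + y1, 0), H)
                he = min(max(int(np.ceil((ph + 1) * bin_h)) + y1, 0), H)
                ws = min(max(int(np.floor(pw * bin_w)) + x1, 0), W)
                we = min(max(int(np.ceil((pw + 1) * bin_w)) + x1, 0), W)
                if he > hs and we > ws:
                    out[ph, pw] = feat[hs:he, ws:we].max(axis=(0, 1))
        return out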
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling_op_grad.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from tensorflow.python.framework import ops
3 | from . import roi_pooling_op
4 | import pdb
5 |
6 |
7 | @ops.RegisterShape("RoiPool")
8 | def _roi_pool_shape(op):
9 | """Shape function for the RoiPool op.
10 |
11 | """
12 | dims_data = op.inputs[0].get_shape().as_list()
13 | channels = dims_data[3]
14 | dims_rois = op.inputs[1].get_shape().as_list()
15 | num_rois = dims_rois[0]
16 |
17 | pooled_height = op.get_attr('pooled_height')
18 | pooled_width = op.get_attr('pooled_width')
19 |
20 | output_shape = tf.TensorShape([num_rois, pooled_height, pooled_width, channels])
21 | return [output_shape, output_shape]
22 |
23 | @ops.RegisterGradient("RoiPool")
24 | def _roi_pool_grad(op, grad, _):
25 | """The gradients for `roi_pool`.
26 | Args:
27 | op: The `roi_pool` `Operation` that we are differentiating, which we can use
28 | to find the inputs and outputs of the original op.
29 | grad: Gradient with respect to the output of the `roi_pool` op.
30 | Returns:
31 | Gradients with respect to the input of `roi_pool`.
32 | """
33 | data = op.inputs[0]
34 | rois = op.inputs[1]
35 | argmax = op.outputs[1]
36 | pooled_height = op.get_attr('pooled_height')
37 | pooled_width = op.get_attr('pooled_width')
38 | spatial_scale = op.get_attr('spatial_scale')
39 |
40 | # compute gradient
41 | data_grad = roi_pooling_op.roi_pool_grad(data, rois, argmax, grad, pooled_height, pooled_width, spatial_scale)
42 |
43 | return [data_grad, None] # gradient for the data input only; rois get no gradient
44 |
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/roi_pooling_op_test.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import numpy as np
3 | import roi_pooling_op
4 | import roi_pooling_op_grad
5 |
6 | import pdb
7 |
8 |
9 | def weight_variable(shape):
10 | initial = tf.truncated_normal(shape, stddev=0.1)
11 | return tf.Variable(initial)
12 |
13 | def conv2d(x, W):
14 | return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
15 |
16 | array = np.random.rand(32, 100, 100, 3)
17 | data = tf.convert_to_tensor(array, dtype=tf.float32)
18 | rois = tf.convert_to_tensor([[0, 10, 10, 20, 20], [31, 30, 30, 40, 40]], dtype=tf.float32)
19 |
20 | W = weight_variable([3, 3, 3, 1])
21 | h = conv2d(data, W)
22 |
23 | [y, argmax] = roi_pooling_op.roi_pool(h, rois, 6, 6, 1.0/3)
24 | #pdb.set_trace()
25 | y_data = tf.convert_to_tensor(np.ones((2, 6, 6, 1)), dtype=tf.float32)
26 | print(y_data, y, argmax)
27 |
28 | # Minimize the mean squared errors.
29 | loss = tf.reduce_mean(tf.square(y - y_data))
30 | optimizer = tf.train.GradientDescentOptimizer(0.5)
31 | train = optimizer.minimize(loss)
32 |
33 | init = tf.global_variables_initializer()
34 |
35 | # Launch the graph.
36 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
37 | sess.run(init)
38 | #pdb.set_trace()
39 | for step in range(10):
40 | sess.run(train)
41 | print(step, sess.run(W))
42 | print(sess.run(y))
43 |
44 | #with tf.device('/gpu:0'):
45 | # result = module.roi_pool(data, rois, 1, 1, 1.0/1)
46 | # print result.eval()
47 | #with tf.device('/cpu:0'):
48 | # run(init)
49 |
--------------------------------------------------------------------------------
/networks/image_feat_net/roi_pooling_layer/work_sharder.h:
--------------------------------------------------------------------------------
1 | /* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 | ==============================================================================*/
15 |
16 | #ifndef TENSORFLOW_UTIL_WORK_SHARDER_H_
17 | #define TENSORFLOW_UTIL_WORK_SHARDER_H_
18 |
19 | #include <functional>
20 |
21 | #include "tensorflow/core/lib/core/threadpool.h"
22 | #include "tensorflow/core/platform/types.h"
23 |
24 | namespace tensorflow {
25 |
26 | // Shards the "total" unit of work assuming each unit of work having
27 | // roughly "cost_per_unit". Each unit of work is indexed 0, 1, ...,
28 | // total - 1. Each shard contains 1 or more units of work and the
29 | // total cost of each shard is roughly the same. The calling thread and the
30 | // "workers" are used to compute each shard (calling work(start,
31 | // limit)). A common configuration is that "workers" is a thread pool
32 | // with at least "max_parallelism" threads.
33 | //
34 | // "cost_per_unit" is an estimate of the number of CPU cycles (or nanoseconds
35 | // if not CPU-bound) to complete a unit of work. Overestimating creates too
36 | // many shards and CPU time will be dominated by per-shard overhead, such as
37 | // Context creation. Underestimating may not fully make use of the specified
38 | // parallelism.
39 | //
40 | // "work" should be a callable taking (int64, int64) arguments.
41 | // work(start, limit) computes the work units from [start,
42 | // limit), i.e., [start, limit) is a shard.
43 | //
44 | // REQUIRES: max_parallelism >= 0
45 | // REQUIRES: workers != nullptr
46 | // REQUIRES: total >= 0
47 | // REQUIRES: cost_per_unit >= 0
48 | void Shard(int max_parallelism, thread::ThreadPool* workers, int64 total,
49 | int64 cost_per_unit, std::function<void(int64, int64)> work);
50 |
51 | } // end namespace tensorflow
52 |
53 | #endif // TENSORFLOW_UTIL_WORK_SHARDER_H_
54 |
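
As a rough Python analogy, `Shard` splits `[0, total)` into contiguous chunks and runs `work(start, limit)` over a pool of workers. The sketch below ignores `cost_per_unit`, which the real implementation uses to size the shards:

    from concurrent.futures import ThreadPoolExecutor

    def shard(max_parallelism, total, work):
        # one chunk per worker; work(start, limit) covers the half-open range [start, limit)
        step = max(1, (total + max_parallelism - 1) // max_parallelism)
        with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
            futures = [pool.submit(work, s, min(s + step, total))
                       for s in range(0, total, step)]
            for f in futures:
                f.result()  # propagate any exception raised inside a shard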
--------------------------------------------------------------------------------
/networks/image_feat_net/vgg16/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/image_feat_net/vgg16/__init__.py
--------------------------------------------------------------------------------
/networks/image_feat_net/vgg16/net.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import inspect
3 | import os
4 | import tensorflow as tf
5 | from ..roi_pooling_layer.roi_pooling_op import roi_pool
6 | from ..roi_pooling_layer.roi_pooling_op_grad import *
7 |
8 | class Vgg16:
9 | def __init__(self, lr, RegionNet_npy_path = 'frcnn_Region_Feat_Net.npy', train = True):
10 | #load saved model
11 | try:
12 | path = inspect.getfile(Vgg16)
13 | path = os.path.abspath(os.path.join(path, os.pardir))
14 | RegionNet_npy_path = os.path.join(path, RegionNet_npy_path)
15 | self.data_dict = np.load(RegionNet_npy_path, encoding='latin1').item()
16 | print("Image Feat Net npy file loaded")
17 | except:
18 | print('[WARNING!]Image Feat Net npy file not found, '
19 | 'we don\'t recommend training this network from scratch')
20 | self.data_dict = {}
21 | self.lr = lr
22 | self.train = train
23 | self.varlist_conv = []
24 | self.varlist_region = []
25 | self.net_type = 'Vgg16'
26 |
27 | def build(self, bgr, rois, parameters):
28 | #set placeholder
29 | self.bgr = bgr
30 | self.rois = rois
31 |
32 | #set parameters
33 | self.feature_dim = parameters['feature_dim']
34 | self.weight_decay = parameters['weight_decay']
35 | self.dropout_ratio = parameters['dropout_ratio']
36 | self.dropout_flag = parameters['dropout_flag']
37 | self.roi_size = parameters['roi_size']
38 | self.roi_scale = parameters['roi_scale']
39 | self.build_conv()
40 | self.build_region()
41 | return self.relu7
42 |
43 | def build_conv(self):
44 | """
45 | load variables from the npy file to build the VGG conv layers
46 |
47 | :param bgr: bgr image [batch, height, width, 3] values scaled [0, 1]
48 | """
49 | # input is expected to be BGR already; no RGB-to-BGR conversion is done here
50 | self.conv1_1 = self.conv_layer(self.bgr, "conv1_1")
51 | self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
52 | self.pool1 = self.max_pool(self.conv1_2, 'pool1')
53 |
54 | self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
55 | self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
56 | self.pool2 = self.max_pool(self.conv2_2, 'pool2')
57 |
58 | self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
59 | self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
60 | self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
61 | self.pool3 = self.max_pool(self.conv3_3, 'pool3')
62 |
63 | self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
64 | self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
65 | self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
66 | self.pool4 = self.max_pool(self.conv4_3, 'pool4')
67 |
68 | self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
69 | self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
70 | self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
71 |
72 | def build_region(self):
73 | [self.rois_feat, _] = roi_pool(self.conv5_3, self.rois,
74 | self.roi_size, self.roi_size,
75 | self.roi_scale)
76 |
77 | # reshape tensor so that every channel's map are expanded
78 | # with rows unchanged
79 | conv_channels = self.rois_feat.get_shape().as_list()[-1]
80 | self.rois_feat_reshape1 = tf.reshape(self.rois_feat,
81 | [-1, self.roi_size ** 2,
82 | conv_channels])
83 | self.rois_feat_transpose = tf.transpose(self.rois_feat_reshape1,
84 | perm = [0, 2, 1])
85 | self.rois_feat_reshape2 = tf.reshape(self.rois_feat_transpose,
86 | [-1, self.roi_size ** 2 *
87 | conv_channels])
88 |
89 | self.fc6 = self.fc_layer(self.rois_feat_reshape2, 'fc6',
90 | [self.roi_size ** 2 * 512, 4096])
91 | self.relu6 = tf.nn.relu(self.fc6)
92 |
93 | #hand write dropout
94 | if self.train:
95 | self.relu6 = dropout(self.relu6, self.dropout_flag,
96 | self.dropout_ratio, 'fc6_dropout')
97 |
98 | self.fc7 = self.fc_layer(self.relu6, "fc7", [4096, self.feature_dim])
99 | self.relu7 = tf.nn.relu(self.fc7)
100 |
101 | def conv_layer(self, bottom, name):
102 | with tf.variable_scope(name):
103 | filt = self.get_conv_filter(name)
104 |
105 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
106 |
107 | conv_biases = self.get_bias_conv(name)
108 | bias = tf.nn.bias_add(conv, conv_biases)
109 |
110 | relu = tf.nn.relu(bias)
111 | return relu
112 |
113 | def get_conv_filter(self, name):
114 | var = tf.Variable(self.data_dict[name]['weights'], name="filter",
115 | trainable = (self.lr > 0), dtype = tf.float32)
116 | wd = tf.multiply(tf.nn.l2_loss(var), self.weight_decay, name = 'weight_decay')
117 | tf.add_to_collection('img_net_weight_decay', wd)
118 | self.varlist_conv.append(var)
119 | return var
120 |
121 | def get_bias_conv(self, name):
122 | var = tf.Variable(self.data_dict[name]['biases'], name="biases",
123 | trainable = (self.lr > 0), dtype = tf.float32)
124 | self.varlist_conv.append(var)
125 | return var
126 |
127 | def max_pool(self, bottom, name):
128 | return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
129 |
130 | def fc_layer(self, bottom, name, shape):
131 | with tf.variable_scope(name) as scope:
132 | weights = self.get_fc_weight(name, shape)
133 | biases = self.get_bias(name, [shape[1]])
134 |
135 | # Fully connected layer. Note that the '+' operation automatically
136 | # broadcasts the biases.
137 | fc = tf.nn.bias_add(tf.matmul(bottom, weights), biases)
138 |
139 | return fc
140 |
141 | def get_bias(self, name, shape):
142 | if self.data_dict.get(name):
143 | init = tf.constant_initializer(
144 | value = self.data_dict[name]['biases'], dtype = tf.float32)
145 | else:
146 | init = tf.constant_initializer(0.015)
147 | print('[WARNING]Region Feat Net %s layer\'s bias is initialized to '
148 | 'a constant 0.015 with shape [%d]' % (name, shape[0]))
149 |
150 | var = tf.get_variable(name = 'bias', initializer = init,
151 | shape = shape, dtype = tf.float32)
152 | self.varlist_region.append(var)
153 | return var
154 |
155 | def get_fc_weight(self, name, shape):
156 | if self.data_dict.get(name):
157 | init = tf.constant_initializer(
158 | value = self.data_dict[name]['weights'], dtype = tf.float32)
159 | else:
160 | init = tf.random_normal_initializer(mean = 0.0, stddev = 0.0005)
161 | print('[WARNING]Region Feat Net %s layer\'s weights are '
162 | 'randomly initialized!' % name)
163 |
164 | var = tf.get_variable(name = 'weights', initializer = init,
165 | shape = shape, dtype = tf.float32)
166 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.weight_decay,
167 | name = 'weight_decay')
168 | tf.add_to_collection('img_net_weight_decay', weight_decay)
169 | self.varlist_region.append(var)
170 | return var
171 |
172 | def dropout(bottom, random_flag, dropout_ratio = 0.5, name = 'dropout'):
173 | with tf.variable_scope(name):
174 | drop_mask_r = tf.random_uniform(shape = tf.shape(bottom))
175 | drop_mask_r = tf.cast(tf.greater(drop_mask_r, dropout_ratio),
176 | tf.float32)
177 | drop_mask_v = tf.Variable(initial_value = np.zeros(1),
178 | validate_shape = False, trainable = False, dtype = tf.float32)
179 | # re-draw the mask only when random_flag == 1; otherwise reuse the stored
180 | # mask, so the dropout pattern stays fixed across accumulation sub-batches
181 | assign_dropout = tf.cond(tf.equal(random_flag, 1),
182 | lambda: tf.assign(drop_mask_v, drop_mask_r,
183 | validate_shape = False),
184 | lambda: tf.identity(drop_mask_v))
185 | return tf.div(tf.multiply(bottom, assign_dropout), (1 - dropout_ratio))
186 |
187 |
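
The reshape/transpose sequence in `build_region` flattens each ROI's features channel by channel, presumably to match the channel-major (NCHW-style) flattening that the converted Caffe fc6 weights expect. A NumPy sketch with hypothetical shapes (roi_size 7, 512 channels):

    import numpy as np

    rois_feat = np.random.rand(1, 7, 7, 512)  # [N, h, w, C], as returned by roi_pool
    r1 = rois_feat.reshape(-1, 7 * 7, 512)    # [N, h*w, C]
    r2 = r1.transpose(0, 2, 1)                # [N, C, h*w]: channel-major ordering
    flat = r2.reshape(-1, 7 * 7 * 512)        # [N, C*h*w], the input to fc6
    # the first 49 values are channel 0's full 7x7 map, row by row
    assert np.allclose(flat[0, :49], rois_feat[0, :, :, 0].ravel())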
--------------------------------------------------------------------------------
/networks/net_wrapper.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import numpy as np
3 | import pdb
4 | from .image_feat_net.vgg16.net import Vgg16
5 | from .image_feat_net.resnet101.net import Resnet101
6 | from .image_feat_net.net import ImageFeatNet
7 | from .text_feat_net.net import TextFeatNet
8 | from .pair_net.net import PairNet
9 | from .base import Model
10 |
11 | class NetWrapper(Model):
12 | """
13 | The class is for wrapping TextFeatNet, ImageFeatNet and PairNet
14 | together and provide a uniform interface for the network training
15 | steps.
16 | Attributes:
17 | sess: Tensorflow session
18 | opt_image: optimizer for image end
19 | opt_text: optimizer for text end
20 | train: a boolean indicating if the network is currently in the
21 | training phase, default is True
22 | pair_net_max_batch_size: an integer indicating the maximum batch
23 | size of the pair net, default is 500
24 | """
25 | def __init__(self, sess, image_net_type, image_lr_conv, image_lr_region, text_lr,
26 | pair_net_max_batch_size, train, image_init_npy, text_init_npy):
27 |
28 | self.image_net_type = image_net_type
29 | self.image_lr_conv = image_lr_conv
30 | self.image_lr_region = image_lr_region
31 | self.text_lr = text_lr
32 |
33 | self.sess = sess
34 | self.train = (train == 'train')
35 | self.pair_net = PairNet(sess, pair_net_max_batch_size,
36 | train = self.train)
37 | self.image_net = ImageFeatNet(sess, self.image_lr_conv,
38 | self.image_lr_region, train = self.train)
39 | self.text_net_opt_type = 'Adam'
40 | if self.image_lr_conv != 0 or self.image_lr_region != 0:
41 | self.text_net_opt_type = 'SGD'
42 | self.text_net = TextFeatNet(sess, self.text_lr, train = self.train,
43 | opt_type = self.text_net_opt_type, TextNet_npy_path = text_init_npy)
44 | if self.image_net_type == 'resnet101':
45 | self.text_feature_dim = 2049
46 | self.im_sub_net = Resnet101(RegionNet_npy_path = image_init_npy, train = self.train)
47 | elif self.image_net_type == 'vgg16':
48 | self.text_feature_dim = 4097
49 | self.im_sub_net = Vgg16(self.image_lr_conv,
50 | RegionNet_npy_path = image_init_npy, train = self.train)
51 | self.data_dict = None
52 | self.varlist = None
53 |
54 | def set_input(self, data):
55 | self.data_dict = data
56 |
57 | def build(self):
58 | with tf.variable_scope('Text_Network'):
59 | self.text_net.build(output_feature_dim = self.text_feature_dim)
60 |
61 | with tf.variable_scope('Image_Network'):
62 | if self.image_net_type == 'resnet101':
63 | self.image_net.build(self.im_sub_net, feature_dim = 2048, roi_size = 14)
64 | else:
65 | self.image_net.build(self.im_sub_net)
66 |
67 | with tf.variable_scope('Pair_Network'):
68 | self.pair_net.build(im_feat = self.image_net.output,
69 | dy_param = self.text_net.output,
70 | feature_dim = self.text_feature_dim - 1)
71 |
72 | self.image_net.output_grad = (
73 | self.pair_net.gradients_pool[self.image_net.output])
74 | self.text_net.output_grad = (
75 | self.pair_net.gradients_pool[self.text_net.output])
76 |
77 | self.image_net.accumulate()
78 | self.text_net.accumulate()
79 | self.varlist = self.image_net.sub_net.varlist_conv\
80 | + self.image_net.sub_net.varlist_region\
81 | + self.text_net.varlist\
82 | + self.text_net.varlist_relu
83 |
84 | def forward(self, compute_grads = True, compute_loss = True):
85 | self.image_net.set_input(self.data_dict['images'],
86 | self.data_dict['rois'])
87 | self.image_net.forward()
88 | self.text_net.set_input(self.data_dict['phrases'])
89 | self.text_net.forward()
90 | self.pair_net.set_input(self.data_dict['roi_ids'],
91 | self.data_dict['phrase_ids'],
92 | self.data_dict['labels'],
93 | self.data_dict['loss_weights'],
94 | self.data_dict['sources'])
95 | self.pair_net.forward(compute_grads, compute_loss)
96 |
97 | def backward(self):
98 | self.pair_net.backward()
99 | self.text_net.backward()
100 | self.image_net.backward()
101 |
102 | def forward_backward(self):
103 | self.forward()
104 | self.backward()
105 |
106 | def get_output(self, current_iter = 0):
107 | self.output = self.pair_net.get_output()
108 | if current_iter != 0:
109 | self.show_result(current_iter)
110 | return self.output
111 |
112 | def show_result(self, current_iter):
113 | self.prediction = self.output[1] > 0.5
114 | total_pos = np.sum(self.data_dict['labels'] == 1)
115 | total_predict = np.sum(self.prediction == 1)
116 | self.recall = (np.sum((self.data_dict['labels'] == 1) *
117 | (self.data_dict['labels'] ==
118 | self.prediction[:, 0]))
119 | / total_pos)
120 | self.precision = (np.sum((self.data_dict['labels'] == 1) *
121 | (self.data_dict['labels'] ==
122 | self.prediction[:, 0]))
123 | / total_predict)
124 | #print results
125 | print('Iter: %d' % current_iter)
126 | print('Looked images:', self.data_dict['image_ids'])
127 | print('\t[$$]Precision: %f, Recall: %f' % (self.precision,
128 | self.recall))
129 | print('\t[TL] Total loss is %f' % self.output[0])
130 | print('\t[PL]Raw positive loss is %f' % self.output[2])
131 | print('\t[NL]Raw negative loss is %f' % self.output[3])
132 | print('\t[RL]Raw rest loss is %f\n' % self.output[4])
133 |
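
A sketch of how `NetWrapper` is typically driven (the hyperparameter values here are illustrative; the actual training loop lives in `main.py` and the batch dictionary is produced by `data_loader.py`):

    import tensorflow as tf
    from networks.net_wrapper import NetWrapper

    sess = tf.Session()
    net = NetWrapper(sess, image_net_type='vgg16',
                     image_lr_conv=1e-4, image_lr_region=1e-4, text_lr=1e-4,
                     pair_net_max_batch_size=500, train='train',
                     image_init_npy='frcnn_Region_Feat_Net.npy',
                     text_init_npy='Text_Feat_Net.npy')
    net.build()
    sess.run(tf.global_variables_initializer())
    # `batch` is a dict with keys: images, rois, phrases, roi_ids, phrase_ids,
    # labels, loss_weights, sources (plus image_ids for logging), e.g. from data_loader:
    # net.set_input(batch); net.forward_backward()
    # loss, sim, pos_loss, neg_loss, rest_loss = net.get_output(current_iter=1)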
--------------------------------------------------------------------------------
/networks/pair_net/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/pair_net/__init__.py
--------------------------------------------------------------------------------
/networks/pair_net/net.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import numpy as np
3 | import os
4 | import pdb
5 | from ..base_net.net import BaseNet
6 |
7 | class PairNet(BaseNet):
8 | """ Network model for Pair Net
9 | Attributes:
10 | sess: TensorFlow session
11 | max_batch_size: maximum number of region-phrase pairs per sub-batch
12 | train: whether the network is in the training phase, default is True
13 | """
14 | def __init__(self, sess, max_batch_size, train = True):
15 | super(PairNet, self).__init__(sess)
16 | #model hyperparameters
17 | self.max_batch_size = max_batch_size
18 | self.train = train
19 | self.epsilon = 1e-6
20 |
21 | #physical inputs should be numpy arrays
22 | self.image_ids = None #[Total ~ M X N]
23 | self.text_ids = None #[Total]
24 | self.p_labels = None #[Total]
25 | self.p_loss_weights = None #[Total]
26 | self.p_sources = None #[Total]
27 | self.batch_total = tf.placeholder(tf.float32, name = 'batch_size')
28 | self.pos_batch_total = tf.placeholder(tf.float32,
29 | name = 'pos_batch_size')
30 | self.neg_batch_total = tf.placeholder(tf.float32,
31 | name = 'neg_batch_size')
32 | self.res_batch_total = tf.placeholder(tf.float32,
33 | name = 'res_batch_size')
34 |
35 | #physical outputs
36 | self.p_loss = None
37 | self.p_sim = None
38 |
39 | def build(self, im_feat = tf.placeholder(tf.float32, name = 'im_feat'),
40 | dy_param = tf.placeholder(tf.float32, name = 'dy_param'),
41 | feature_dim = 4096, image_dropout = 0.3,
42 | text_dropout = 0.3, weight_decay= 1e-7):
43 |
44 | #model parameters
45 | self.feature_dim = feature_dim
46 | self.weight_decay = weight_decay
47 | self.image_dropout = image_dropout
48 | self.text_dropout = text_dropout
49 | if not self.train:
50 | self.image_dropout = 0
51 | self.text_dropout = 0
52 |
53 | #used for data loader
54 | #[batch_size, feature_dim (+ 1)] for im_feat and dy_param
55 | self.im_feat = im_feat
56 | self.dy_param = dy_param
57 | self.im_idx = tf.placeholder(tf.int32, name = 'im_idx')
58 | self.txt_idx = tf.placeholder(tf.int32, name = 'txt_idx')
59 | self.labels = tf.placeholder(tf.int32, name = 'labels')
60 | self.loss_weights = tf.placeholder(tf.float32, name = 'loss_weights')
61 | self.sources = tf.placeholder(tf.int32, name = 'pair_sources')
62 |
63 | #######################################################################
64 | ######################## NETWORK STARTS #########################
65 | #######################################################################
66 | self.im_feat_chosen = tf.gather(self.im_feat, self.im_idx)
67 | self.dy_param_chosen = tf.gather(self.dy_param, self.txt_idx)
68 |
69 | self.im_feat_dropout = tf.nn.dropout(self.im_feat_chosen,
70 | 1 - self.image_dropout)
71 | self.dy_param_decay = self.weight_decay * tf.to_double(tf.nn.l2_loss(self.dy_param_chosen))
72 |
73 | #prepare kernel
74 | self.dy_kernel = tf.slice(self.dy_param_chosen,
75 | [0, 0], [-1, self.feature_dim])
76 | self.dy_bias = tf.slice(self.dy_param_chosen,
77 | [0, self.feature_dim], [-1, 1])
78 | self.dy_kernel_dropout = tf.nn.dropout(self.dy_kernel, 1 - self.text_dropout)
79 |
80 | #get binary classification score
81 | self.cls_pre = tf.reduce_sum(tf.multiply(self.im_feat_dropout,
82 | self.dy_kernel_dropout), 1)
83 | self.cls_single = tf.add(tf.expand_dims(self.cls_pre, -1), self.dy_bias)
84 | self.cls = tf.concat([-1 * self.cls_single, self.cls_single], axis = 1)
85 | #logit = lambda x: np.log(x) - np.log(1 - x)
86 | #self.cls = tf.clip_by_value(tf.add(tf.expand_dims(self.cls_pre, -1), self.dy_bias), logit(self.epsilon), logit(1 - self.epsilon))
87 |
88 | self.sim = tf.slice(tf.nn.softmax(self.cls), [0, 1], [-1, 1])
89 | self.full_labels = tf.one_hot(self.labels, 2)
90 | losses = tf.nn.softmax_cross_entropy_with_logits(
91 | labels = self.full_labels, logits = self.cls)
92 | self.losses = tf.expand_dims(tf.multiply(self.loss_weights, losses), -1)
93 | self.pos_mask = tf.expand_dims(tf.equal(self.sources, 1), -1)
94 | self.neg_mask = tf.expand_dims(tf.equal(self.sources, 0), -1)
95 | self.rest_mask = tf.expand_dims(tf.equal(self.sources, 2), -1)
96 |
97 | pos_mask = tf.cast(self.pos_mask, tf.float32)
98 | neg_mask = tf.cast(self.neg_mask, tf.float32)
99 | rest_mask = tf.cast(self.rest_mask, tf.float32)
100 |
101 | self.pos_loss = tf.reduce_sum(tf.multiply(self.losses, pos_mask)) / self.pos_batch_total
102 | self.neg_loss = tf.reduce_sum(tf.multiply(self.losses, neg_mask)) / self.neg_batch_total
103 | self.rest_loss = tf.reduce_sum(tf.multiply(self.losses, rest_mask)) / self.res_batch_total
104 | self.cls_loss = self.pos_loss + self.neg_loss + self.rest_loss
105 |
106 | #accumulate gradients
107 | self.current_batch_size = tf.to_float(tf.shape(self.im_feat_chosen)[0])
108 | self.loss = tf.to_float(self.dy_param_decay) + self.cls_loss
109 | self.xs = [self.im_feat, self.dy_param]
110 | self.ys = [self.loss]
111 | self.accumulate()
112 |
113 | def set_input(self, image_ids, text_ids, labels = None,
114 | loss_weights = None, sources = None):
115 | self.image_ids = image_ids
116 | self.text_ids = text_ids
117 | self.p_labels = labels
118 | self.p_loss_weights = loss_weights
119 | self.p_sources = sources
120 |
121 | def get_output(self):
122 | return (self.p_loss, self.p_sim, self.p_pos_loss, self.p_neg_loss,
123 | self.p_rest_loss)
124 |
125 | def forward(self, compute_grads = True, compute_loss = True):
126 | #initialize accumulating variables for this batch
127 | current_im_idx = 0
128 | self.p_loss = 0
129 | self.p_pos_loss = 0
130 | self.p_neg_loss = 0
131 | self.p_rest_loss = 0
132 | self.p_decay = 0
133 | self.p_sim = np.zeros((0, 1))
134 |
135 | #set parameters for this batch
136 | total_pos = np.sum(self.p_sources == 1)
137 | assert total_pos == np.sum(self.p_labels == 1)
138 | total_neg = np.sum(self.p_sources == 0)
139 | total_res = np.sum(self.p_sources == 2)
140 | self.p_batch_sizes = self.text_ids.shape[0] * 1.0
141 |
142 | self.batch_size = self.max_batch_size
143 | #start accumulate gradients for subbatches
144 | while current_im_idx < self.text_ids.shape[0]:
145 | if compute_grads:
146 | [p_loss, p_pos_loss, p_neg_loss, p_rest_loss, dy_param_decay, p_sim, _] = (
147 | self.sess.run(
148 | [self.loss, self.pos_loss, self.neg_loss, self.rest_loss, self.dy_param_decay, self.sim, self.accumulate_grad],
149 | feed_dict = {
150 | self.im_idx:
151 | self.image_ids[current_im_idx: current_im_idx + self.batch_size],
152 | self.txt_idx:
153 | self.text_ids[current_im_idx: current_im_idx + self.batch_size],
154 | self.labels:
155 | self.p_labels[current_im_idx: current_im_idx + self.batch_size],
156 | self.loss_weights:
157 | self.p_loss_weights[current_im_idx: current_im_idx + self.batch_size],
158 | self.sources:
159 | self.p_sources[current_im_idx: current_im_idx + self.batch_size],
160 | self.batch_total: self.p_batch_sizes,
161 | self.pos_batch_total: total_pos,
162 | self.neg_batch_total: total_neg,
163 | self.res_batch_total: total_res,
164 | self.batch_num: current_im_idx}))
165 | elif compute_loss:
166 | [p_loss, p_pos_loss, p_neg_loss, p_rest_loss, dy_param_decay, p_sim] = (
167 | self.sess.run(
168 | [self.loss, self.pos_loss, self.neg_loss,
169 | self.rest_loss, self.dy_param_decay, self.sim],
170 | feed_dict = {
171 | self.im_idx:
172 | self.image_ids[current_im_idx:
173 | current_im_idx +
174 | self.batch_size],
175 | self.txt_idx:
176 | self.text_ids[current_im_idx:
177 | current_im_idx +
178 | self.batch_size],
179 | self.labels:
180 | self.p_labels[current_im_idx:
181 | current_im_idx +
182 | self.batch_size],
183 | self.loss_weights:
184 | self.p_loss_weights[current_im_idx:
185 | current_im_idx +
186 | self.batch_size],
187 | self.sources:
188 | self.p_sources[current_im_idx:
189 | current_im_idx +
190 | self.batch_size],
191 | self.batch_total: self.p_batch_sizes,
192 | self.pos_batch_total: total_pos,
193 | self.res_batch_total: total_res,
194 | self.neg_batch_total: total_neg}))
195 | else:
196 | [p_sim] = self.sess.run([self.sim],
197 | feed_dict = {
198 | self.im_idx:
199 | self.image_ids[current_im_idx:
200 | current_im_idx +
201 | self.batch_size],
202 | self.txt_idx:
203 | self.text_ids[current_im_idx:
204 | current_im_idx +
205 | self.batch_size]})
206 | p_loss = None
207 | if compute_loss:
208 | self.p_loss += p_loss
209 | self.p_pos_loss += p_pos_loss
210 | self.p_neg_loss += p_neg_loss
211 | self.p_rest_loss += p_rest_loss
212 | self.p_decay += dy_param_decay
213 | self.p_sim = np.concatenate((self.p_sim, p_sim))
214 | current_im_idx += self.batch_size
215 |
216 | #avoid small subbatch
217 | if self.p_batch_sizes > current_im_idx + self.batch_size and\
218 | self.p_batch_sizes - current_im_idx - self.batch_size < 0.25 * self.batch_size:
219 | self.batch_size = int(self.p_batch_sizes - current_im_idx)
220 | return
221 |
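
The three masked terms in `build` average the positive, negative, and "rest" pair losses separately before summing, so each source contributes equally regardless of how many pairs it supplies. A NumPy sketch of the same bookkeeping (illustrative values; in the graph these terms are additionally accumulated over sub-batches):

    import numpy as np

    losses = np.random.rand(8)                    # weighted per-pair cross-entropy terms
    sources = np.array([1, 1, 0, 0, 0, 2, 2, 2])  # 1 = positive, 0 = negative, 2 = rest

    pos_loss = losses[sources == 1].sum() / (sources == 1).sum()
    neg_loss = losses[sources == 0].sum() / (sources == 0).sum()
    rest_loss = losses[sources == 2].sum() / (sources == 2).sum()
    cls_loss = pos_loss + neg_loss + rest_loss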
--------------------------------------------------------------------------------
/networks/text_feat_net/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuanluya/dbnet_tensorflow/991bba69fed28f96dd289bd393d566098f3185c2/networks/text_feat_net/__init__.py
--------------------------------------------------------------------------------
/networks/text_feat_net/net.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import numpy as np
3 | import os
4 | import pdb
5 | import inspect
6 |
7 | class TextFeatNet:
8 | """ Network for Text Feat Net
9 | Attributes:
10 | sess: Tensorflow session
11 | opt: Optimizer
12 | max_batch_size: maximum batch size for the text net
13 | train: whether the network is in the training phase
14 | sequence_length: maximum phrase length (number of characters)
15 | TextNet_npy_path: path to the pretrained npy snapshot
16 | """
17 | def __init__(self, sess, lr, max_batch_size = 32,
18 | train = True, sequence_length = 256, opt_type = 'Adam',
19 | TextNet_npy_path = 'Text_Feat_Net.npy'):
20 | #load saved model
21 | try:
22 | path = inspect.getfile(TextFeatNet)
23 | path = os.path.abspath(os.path.join(path, os.pardir))
24 | TextNet_npy_path = os.path.join(path, TextNet_npy_path)
25 | self.data_dict = np.load(TextNet_npy_path, encoding='latin1').item()
26 | print("Text Feat Net npy file loaded")
27 | except:
28 | self.data_dict = {}
29 | print('[WARNING]Text Feat Net npy file not found, training from scratch!')
30 | self.lr = lr
31 | self.backward_counter = 0
32 | self.opt_type = opt_type
33 | if self.opt_type == 'SGD':
34 | self.opt = tf.train.GradientDescentOptimizer(self.lr)
35 | self.opt_relu = tf.train.GradientDescentOptimizer(0.1 * self.lr)
36 | elif self.opt_type == 'Adam':
37 | self.opt = tf.train.AdamOptimizer(self.lr)
38 | self.opt_relu = tf.train.AdamOptimizer(0.1 * self.lr)
39 | self.varlist = []
40 | self.varlist_relu = []
41 | #model hyperparameters
42 | self.sess = sess
43 | self.sequence_length = sequence_length
44 | self.max_batch_size = max_batch_size
45 | self.train = train
46 |
47 | #physical outputs
48 | self.p_dy_param = None
49 |
50 | def build(self, output_grad = tf.placeholder(tf.float32),
51 | input_feature_dim = 74, weight_decay= 5e-4,
52 | relu_coef = 0.1, batch_size = 16,
53 | output_feature_dim = 4097):
54 |
55 | #optimization utility
56 | self.output_grad = output_grad
57 |
58 | self.batch_size = batch_size
59 | assert self.batch_size < self.max_batch_size
60 |
61 | #model parameters
62 | self.input_feature_dim = input_feature_dim
63 | self.output_feature_dim = output_feature_dim
64 | self.weight_decay = weight_decay
65 | self.relu_coef = relu_coef
66 |
67 | #physical inputs should be numpy arrays
68 | self.texts = tf.placeholder(tf.float32,
69 | shape = [None, 1, self.sequence_length,
70 | self.input_feature_dim])
71 | self.p_texts = None
72 |
73 | #######################################################################
74 | ######################## NETWORK STARTS #########################
75 | #######################################################################
76 |
77 | #conv1
78 | self.conv1 = self.conv_layer(self.texts, 'conv1',
79 | [1, 7, self.input_feature_dim, 256])
80 | self.conv1_relu = self.lrelu(self.conv1, self.relu_coef, 'conv1_relu')
81 | self.conv1_pool = tf.nn.max_pool(self.conv1_relu,
82 | [1, 1, 2, 1], [1, 1, 2, 1],
83 | padding = 'VALID')
84 |
85 | #conv2
86 | self.conv2_1 = self.conv_layer(self.conv1_pool, 'conv2_1',
87 | [1, 7, 256, 256])
88 | self.conv2_1_relu = self.lrelu(self.conv2_1, self.relu_coef, 'conv2_1_relu')
89 | self.conv2_2 = self.conv_layer(self.conv2_1_relu, 'conv2_2',
90 | [1, 3, 256, 256])
91 | self.conv2_2_relu = self.lrelu(self.conv2_2, self.relu_coef, 'conv2_2_relu')
92 | self.conv2_3 = self.conv_layer(self.conv2_2_relu, 'conv2_3',
93 | [1, 3, 256, 256])
94 | self.conv2_3_relu = self.lrelu(self.conv2_3, self.relu_coef, 'conv2_3_relu')
95 | self.conv2_3_pool = tf.nn.max_pool(self.conv2_3_relu,
96 | [1, 1, 2, 1], [1, 1, 2, 1],
97 | padding = 'VALID',
98 | name = 'conv2_3_pooling')
99 |
100 | #conv3
101 | self.conv3_1 = self.conv_layer(self.conv2_3_pool, 'conv3_1',
102 | [1, 3, 256, 512])
103 | self.conv3_1_relu = self.lrelu(self.conv3_1, self.relu_coef, 'conv3_1_relu')
104 | self.conv3_2 = self.conv_layer(self.conv3_1_relu, 'conv3_2',
105 | [1, 3, 512, 512])
106 | self.conv3_2_relu = self.lrelu(self.conv3_2, self.relu_coef, 'conv3_2_relu')
107 | self.conv3_2_pool = tf.nn.max_pool(self.conv3_2_relu,
108 | [1, 1, 2, 1], [1, 1, 2, 1],
109 | padding = 'VALID',
110 | name = 'conv3_2_pooling')
111 |
112 | #fully connected
113 | expand_size = self.conv3_2_pool.get_shape().as_list()
114 | self.conv3_2_reshape_1 = tf.reshape(self.conv3_2_pool,
115 | [-1, expand_size[1] * expand_size[2], expand_size[3] ])
116 | self.conv3_2_transpose = tf.transpose(self.conv3_2_reshape_1, [0, 2, 1])
117 | self.conv3_2_reshape_2 = tf.reshape(self.conv3_2_transpose,
118 | [-1, expand_size[1] * expand_size[2] * expand_size[3]])
119 |
120 | self.fc4 = self.fc_layer(self.conv3_2_reshape_2, 'fc4', 2048, 0.1, 0.01)
121 | self.fc4_relu = self.lrelu(self.fc4, self.relu_coef, 'fc4_relu')
122 |
123 | #dynamic filters
124 | self.pre_dy_fc1 = self.fc_layer(self.fc4_relu, 'pre_dy_fc1', 2048, bias_decay = True)
125 | self.pre_dy_fc1_relu = self.lrelu(self.pre_dy_fc1, self.relu_coef,
126 | 'pre_dy_fc1_relu', constant = False)
127 | self.pre_dy_fc2 = self.fc_layer(self.pre_dy_fc1_relu,
128 | 'pre_dy_fc2', 2048, bias_decay = True)
129 | self.pre_dy_fc2_relu = self.lrelu(self.pre_dy_fc2, 1.5 * self.relu_coef,
130 | 'pre_dy_fc2_relu', constant = False)
131 | self.dy_param = self.fc_layer(self.pre_dy_fc2_relu,
132 | 'dy_param', self.output_feature_dim, bias_decay = True)
133 |
134 | self.output = tf.Variable(initial_value = 1.0, trainable = False,
135 | validate_shape = False, dtype = tf.float32)
136 | self.get_output_op = tf.assign(self.output, self.dy_param,
137 | validate_shape = False) #an op, so it does not shadow the get_output() method
138 |
139 | #gather weight decays
140 | self.wd = tf.add_n(tf.get_collection('txt_net_weight_decay'),
141 | name = 'txt_net_total_weight_decay')
142 |
143 | def accumulate(self):
144 | #gradients calculation
145 | self.ys = [self.wd, self.dy_param]
146 | self.grad_ys = [1.0, self.output_grad]
147 |
148 | self.gradients = tf.gradients(self.ys, self.varlist, grad_ys = self.grad_ys)
149 | self.gradients_relu = tf.gradients(self.ys, self.varlist_relu, grad_ys = self.grad_ys)
150 |
151 | self.grad_and_vars = []
152 | self.grad_and_vars_relu = []
153 |
154 | for idx, var in enumerate(self.varlist):
155 | self.grad_and_vars.append((tf.clip_by_value(self.gradients[idx], -10, 10), var))
156 | for idx, var in enumerate(self.varlist_relu):
157 | self.grad_and_vars_relu.append((self.gradients_relu[idx], var))
158 |
159 | with tf.control_dependencies(self.gradients + self.gradients_relu):
160 | self.train_op = tf.group(self.opt.apply_gradients(self.grad_and_vars),
161 | self.opt_relu.apply_gradients(self.grad_and_vars_relu))
162 | self.safe_ops = {}
163 | for v in self.varlist:
164 | self.safe_ops[v] = tf.assign(v, tf.where(tf.is_finite(v), v, 1e-25 * tf.ones_like(v)))
165 |
166 | def set_input(self, texts):
167 | self.p_texts = texts
168 |
169 | def get_output(self):
170 | return self.p_dy_param
171 |
172 | def forward(self, physical_output = False):
173 | if physical_output:
174 | [self.p_dy_param] = self.sess.run([self.get_output_op],
175 | feed_dict = {self.texts:
176 | self.p_texts})
177 | else:
178 | self.sess.run([self.get_output_op],
179 | feed_dict = {self.texts: self.p_texts})
180 |
181 | return
182 |
183 | def backward(self):
184 | self.sess.run(self.train_op, feed_dict = {self.texts: self.p_texts})
185 | return
186 |
187 | #shape: [h, w, in_channel, out_channel]
188 | def conv_layer(self, bottom, name, shape,
189 | strides = [1, 1, 1, 1], weight_init_std = 0.1, bias_init_value = 0.01):
190 | conv_filter = self.get_weight(name, shape, weight_init_std)
191 | biases = self.get_bias(name, shape[3], bias_init_value)
192 | conv = tf.nn.conv2d(bottom, conv_filter, strides, 'SAME')
193 | result = tf.nn.bias_add(conv, biases)
194 | return result
195 |
196 | def fc_layer(self, bottom, name, output_shape, weight_init_std = 0.001, bias_init_value = 0.0, bias_decay = False):
197 | weights = self.get_weight(name, [bottom.get_shape()[1], output_shape], weight_init_std)
198 | biases = self.get_bias(name, [output_shape], bias_init_value, bias_decay)
199 | fc = tf.nn.bias_add(tf.matmul(bottom, weights), biases)
200 | return fc
201 |
202 | def lrelu(self, x, leak = 0.1, name = 'lrelu', constant = True):
203 | if not constant:
204 | if self.data_dict.get(name) is not None:
205 | init = tf.constant_initializer(
206 | value = self.data_dict[name], dtype = tf.float32)
207 | else:
208 | init = tf.constant_initializer(leak)
209 | x_shape = x.get_shape().as_list()
210 | x_shape[0] = 1
211 | with tf.variable_scope(name):
212 | leak = tf.get_variable(name = 'relu_params', initializer = init,
213 | shape = x_shape, dtype = tf.float32)
214 | self.varlist_relu.append(leak)
215 | f1 = 0.5 * (1 + leak)
216 | f2 = 0.5 * (1 - leak)
217 | return f1 * x + f2 * abs(x)
218 |
219 | def get_bias(self, name, shape, init_value = 0.0, weight_decay = False):
220 | if self.data_dict.get(name):
221 | init = tf.constant_initializer(
222 | value = self.data_dict[name]['biases'], dtype = tf.float32)
223 | else:
224 | init = tf.constant_initializer(init_value)
225 | print('[WARNING] %s/bias is randomly initialized!' % name)
226 | with tf.variable_scope(name):
227 | var = tf.get_variable(name = 'bias', initializer = init,
228 | shape = shape, dtype = tf.float32)
229 | if weight_decay: #honor the flag; the decay was previously added unconditionally
230 | decay = tf.multiply(tf.nn.l2_loss(var), self.weight_decay, name = 'weight_decay')
231 | tf.add_to_collection('txt_net_weight_decay', decay)
232 | self.varlist.append(var)
233 | return var
234 |
235 | def get_weight(self, name, shape, init_std = 0.01):
236 | if self.data_dict.get(name):
237 | init = tf.constant_initializer(
238 | value = self.data_dict[name]['weights'], dtype = tf.float32)
239 | else:
240 | init = tf.random_normal_initializer(mean = 0.0, stddev = init_std)
241 | print('[WARNING] %s/weights is randomly initialized!' % name)
242 | with tf.variable_scope(name):
243 | var = tf.get_variable(name = 'weights', initializer = init,
244 | shape = shape, dtype = tf.float32)
245 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.weight_decay,
246 | name = 'weight_decay')
247 | tf.add_to_collection('txt_net_weight_decay', weight_decay)
248 | self.varlist.append(var)
249 | return var
250 |
--------------------------------------------------------------------------------
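The text pathway above produces, for each phrase, the parameters of a dynamic filter (`dy_param`) that the pair network applies to region features. A minimal forward-pass sketch of this class (the `TextFeatNet` name, constructor arguments, and the all-zeros batch are assumptions for illustration, not code from this repository):

    import numpy as np

    net = TextFeatNet(...)        # assumed constructor from networks/text_feat_net/net.py
    net.build(batch_size = 8)     # wires the placeholders and the conv/fc stack

    # one-hot character tensor: [num_phrases, 1, sequence_length, input_feature_dim]
    batch = np.zeros((8, 1, net.sequence_length, 74), dtype = np.float32)
    net.set_input(batch)
    net.forward(physical_output = True)   # runs get_output_op and caches p_dy_param
    filters = net.get_output()            # per-phrase dynamic filter parameters

For training, `build` receives the gradient tensor flowing back from the pair network as `output_grad`; `accumulate` is then called once to create the clipped-gradient train op, and each `backward` call applies one update.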
/requirements.txt:
--------------------------------------------------------------------------------
1 | opencv-python
2 | easydict==1.7
3 | numpy==1.12.1
4 | scipy==0.19.0
5 | tensorflow-gpu==1.1.0
6 | Pillow
7 | pyx
8 |
--------------------------------------------------------------------------------
/setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 | # ===========================
3 | # Usage: ./setup.sh (model|data)?
4 |
5 | if wget --help | grep -q 'show-progress'; then
6 | WGET_FLAG="-q --show-progress"
7 | else
8 | WGET_FLAG=""
9 | fi
10 |
11 | # create a tmp directory for the downloading data
12 | TMP_DIR="./tmp_download"
13 | mkdir -p "${TMP_DIR}"
14 |
15 | # downloading model
16 | download_model()
17 | {
18 | # directory for model
19 | MODEL_TAR_BALL="${TMP_DIR}/pretrained_model.tar.gz"
20 | MODEL_DIR="${TMP_DIR}/pretrained_model"
21 | mkdir -p "${MODEL_DIR}"
22 |
23 | MODEL_URL="http://www.ytzhang.net/files/dbnet/tensorflow/dbnet-pretrained.tar.gz"
24 | echo "Downloading pre-trained models ..."
25 | wget ${WGET_FLAG} "${MODEL_URL}" -O "${MODEL_TAR_BALL}"
26 | echo "Uncompressing pre-trained models ..."
27 | tar -xzf "${MODEL_TAR_BALL}" -C "${MODEL_DIR}"
28 |
29 | # move model to default directories
30 | VGG_REGION_NET_DIR="./networks/image_feat_net/vgg16"
31 | RESNET_REGION_NET_DIR="./networks/image_feat_net/resnet101"
32 | TEXT_NET_DIR="./networks/text_feat_net"
33 | echo "Move pre-trained image network model to ${VGG_REGION_NET_DIR} ..."
34 | mv ${MODEL_DIR}/vgg16_Region_Feat_Net.npy "${VGG_REGION_NET_DIR}/Region_Feat_Net.npy"
35 | mv ${MODEL_DIR}/vgg16_frcnn_Region_Feat_Net.npy "${VGG_REGION_NET_DIR}/frcnn_Region_Feat_Net.npy"
36 | echo "Move pre-trained image network model to ${RESNET_REGION_NET_DIR} ..."
37 | mv ${MODEL_DIR}/resnet101_Region_Feat_Net.npy "${RESNET_REGION_NET_DIR}/Region_Feat_Net.npy"
38 | mv ${MODEL_DIR}/resnet101_frcnn_Region_Feat_Net.npy "${RESNET_REGION_NET_DIR}/frcnn_Region_Feat_Net.npy"
39 | echo "Move pre-trained text network model to ${TEXT_NET_DIR} ..."
40 | mv ${MODEL_DIR}/*Text*.npy "${TEXT_NET_DIR}"
41 | }
42 |
43 | # downloading data
44 | download_data()
45 | {
46 | # directory for data
47 | DATA_TAR_BALL="${TMP_DIR}/data.tar.gz"
48 | DATA_DIR="./data"
49 | mkdir -p "${DATA_DIR}"
50 |
51 | DATA_URL="http://www.ytzhang.net/files/dbnet/data/vg_v1_json_.tar.gz"
52 | echo "Downloading data ..."
53 | wget ${WGET_FLAG} "${DATA_URL}" -O "${DATA_TAR_BALL}"
54 | echo "Uncompressing data ..."
55 | tar -xzf "${DATA_TAR_BALL}" -C "${DATA_DIR}"
56 | }
57 |
58 | # default to download all
59 | if [ $# -eq 0 ]; then
60 | download_model
61 | download_data
62 | else
63 | case $1 in
64 | "model") download_model
65 | ;;
66 | "data") download_data
67 | ;;
68 | *) echo "Usage: ./setup.sh [OPTION]"
69 | echo ""
70 | echo "No option will download both model and data."
71 | echo ""
72 | echo "OPTION:\n\tmodel: only download the pre-trained models (.npy)"
73 | echo "\tdata: only download the data(.json)"
74 | ;;
75 | esac
76 | fi
77 |
78 | # clear the tmp files
79 | rm -rf "${TMP_DIR}"
80 |
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import os
3 | import json
4 | import scipy.io as sio
5 | import cv2
6 | import itertools
7 | import operator
8 | import time
9 | import sys
10 | import pdb
11 | from utils import get_scaled_im_tensor, get_scaled_roi,\
12 | get_txt_tensor, im2rid, rid2r
13 | from config import ENV_PATHS, DS_CONFIG
14 | sys.path.append(ENV_PATHS.EDGE_BOX_RPN)
15 |
16 | level1_im2p = json.load(open(ENV_PATHS.LEVEL1_TEST, 'r'))
17 | level2_im2p = json.load(open(ENV_PATHS.LEVEL2_TEST, 'r'))
18 |
19 | def get_edgeboxes_test(img_id, top_num):
20 | try:
21 | raw_boxes = sio.loadmat(os.path.join(ENV_PATHS.EDGEBOX_PATH,
22 | str(img_id) + '.mat'))['bbs'][0: top_num, :]
23 | except Exception: #fall back to computing edge boxes on the fly
24 | import edge_boxes
25 | raw_boxes_ = edge_boxes.get_windows([img_id])[0][0: top_num, :]
26 | raw_boxes = np.zeros(raw_boxes_.shape)
27 | raw_boxes[:, 0] = raw_boxes_[:, 1]
28 | raw_boxes[:, 1] = raw_boxes_[:, 0]
29 | raw_boxes[:, 2] = raw_boxes_[:, 3] - raw_boxes_[:, 1] + 1
30 | raw_boxes[:, 3] = raw_boxes_[:, 2] - raw_boxes_[:, 0] + 1
31 |
32 | #use a distinct name so the edge_boxes module imported above is not shadowed
33 | boxes = raw_boxes[:, 0:4]
34 | return boxes
35 |
36 | # NMS referenced from http://www.pyimagesearch.com/2015/02/16/faster-non-maximum-suppression-python/
37 | # for each box, in the format [x1, y1, x2, y2, score]
38 | def non_max_suppression(boxes, overlapThresh):
39 | # if there are no boxes, return an empty list
40 | if len(boxes) == 0:
41 | return []
42 |
43 | # if the bounding boxes are integers, convert them to floats --
44 | # this is important since we'll be doing a bunch of divisions
45 | if boxes.dtype.kind == "i":
46 | boxes = boxes.astype("float")
47 |
48 | # initialize the list of picked indexes
49 | pick = []
50 |
51 | # grab the coordinates of the bounding boxes
52 | x1 = boxes[:,0]
53 | y1 = boxes[:,1]
54 | x2 = boxes[:,2]
55 | y2 = boxes[:,3]
56 | scores = boxes[:,4]
57 |
58 | # compute the area of the bounding boxes and sort the bounding
59 | # boxes by the score
60 | area = (x2 - x1 + 1) * (y2 - y1 + 1)
61 | idxs = np.argsort(scores)
62 |
63 | # keep looping while some indexes still remain in the indexes
64 | # list
65 | while len(idxs) > 0:
66 | # grab the last index in the indexes list and add the
67 | # index value to the list of picked indexes
68 | last = len(idxs) - 1
69 | i = idxs[last]
70 | pick.append(i)
71 |
72 | # find the largest (x, y) coordinates for the start of
73 | # the bounding box and the smallest (x, y) coordinates
74 | # for the end of the bounding box
75 | xx1 = np.maximum(x1[i], x1[idxs[:last]])
76 | yy1 = np.maximum(y1[i], y1[idxs[:last]])
77 | xx2 = np.minimum(x2[i], x2[idxs[:last]])
78 | yy2 = np.minimum(y2[i], y2[idxs[:last]])
79 |
80 | # compute the width and height of the bounding box
81 | w = np.maximum(0, xx2 - xx1 + 1)
82 | h = np.maximum(0, yy2 - yy1 + 1)
83 |
84 | # compute the ratio of overlap
85 | overlap = (w * h) / (area[i] + area[idxs[:last]] - w * h)
86 |
87 | # delete all indexes from the index list that have overlap above the threshold
88 | idxs = np.delete(idxs, np.concatenate(([last],
89 | np.where(overlap > overlapThresh)[0])))
90 |
91 | # return only the bounding boxes that were picked
92 | # (each row keeps its score in the fifth column)
93 | return boxes[pick]
94 |
95 | def get_pairs_test(img_id, level, edge_box_max, gt_box):
96 | # (region_id, t_id)
97 | pair_list = []
98 | region_ids = im2rid.get(str(img_id))
99 | if region_ids is None:
100 | region_ids = []
101 | edgebox_regions = get_edgeboxes_test(img_id, edge_box_max)
102 | edgebox_id = "edgebox_%s" % str(img_id)
103 | region_id = "region_%s" % str(img_id)
104 |
105 | # need to determine how to define the ids
106 | # currently __
107 | region_dict = {}
108 | counter = 0
109 | for i in range(edgebox_regions.shape[0]):
110 | e_id = edgebox_id + "_%d" % counter
111 | region_dict[e_id] = edgebox_regions[i, :]
112 | counter += 1
113 |
114 | if gt_box:
115 | phrase_ids = []
116 | counter = 0
117 | for rid in region_ids:
118 | r_info = rid2r[str(rid)]
119 | r_coord = [r_info['x'], r_info['y'], r_info['width'], r_info['height']]
120 | r_id = region_id + "_%d" % counter
121 | region_dict[r_id] = np.array(r_coord)
122 | # generate phrase ids
123 | phrase_ids.append(r_info['categ_id'])
124 | counter += 1
125 | else:
126 | # generate phrase ids
127 | phrase_ids = [rid2r[str(r)]['categ_id'] for r in region_ids]
128 |
129 | if level == 'level_1':
130 | phrase_ids = level1_im2p[str(img_id)]
131 | elif level == 'level_2':
132 | phrase_ids = level2_im2p[str(img_id)]
133 | elif level == 'vis':
134 | phrase_ids = [-1]
135 | elif level != 'level_0':
136 | print('wrong LEVEL parameter: %s' % level)
137 | assert(0)
138 |
139 | # generate pair
140 | pair_list = [(r_id, t_id) for t_id in phrase_ids for r_id in region_dict]
141 |
142 | return pair_list, region_dict
143 |
144 | def get_data(img_id, level, edge_box_max, gt_box, query_phrase = None):
145 | image_tensor, scale, shape = get_scaled_im_tensor([img_id],
146 | DS_CONFIG.target_size,
147 | DS_CONFIG.max_size)
148 | all_rois = np.zeros((0,5))
149 |
150 | # start gathering data for the testing image
151 | pair_list, region_dict = get_pairs_test(img_id, level, edge_box_max, gt_box)
152 | rois_list = [pair[0] for pair in pair_list]
153 | phrases_list = [pair[1] for pair in pair_list]
154 | unique_rois_ids, inverse_region_ids = (
155 | np.unique(rois_list, return_inverse = True))
156 | test_rois = get_scaled_roi(unique_rois_ids, region_dict,
157 | scale[0], shape[0], 0)
158 | all_rois = np.concatenate((all_rois, test_rois))
159 |
160 | unique_phrase_ids, inverse_phrase_ids = (
161 | np.unique(phrases_list, return_inverse = True))
162 | phrase_tensor = get_txt_tensor(unique_phrase_ids, query_phrase)
163 |
164 | return (pair_list, region_dict,
165 | {'raw_phrase': query_phrase,
166 | 'images': image_tensor,
167 | 'phrases': phrase_tensor,
168 | 'rois': all_rois,
169 | 'phrase_ids': inverse_phrase_ids,
170 | 'roi_ids': inverse_region_ids,
171 | 'labels': None,
172 | 'loss_weights': None,
173 | 'sources': None})
174 |
175 | def test_output(img_id, phrase2r_dict, level, output_dir):
176 | os.makedirs("%s/tmp_output" % output_dir, exist_ok = True)
177 | f = open("%s/tmp_output/%s_%d.txt" % (output_dir, level, img_id), "w+")
178 | f.write(str(img_id) + ":")
179 | for t_id in phrase2r_dict:
180 | f.write("\n\t%s:" % t_id)
181 | # output the region information
182 | for region in phrase2r_dict[t_id]:
183 | #write in order [y1, x1, y2, x2]
184 | f.write(" [%d, %d, %d, %d, %.6f]" %
185 | (region[1], region[0], region[3], region[2], region[4]))
186 | f.write("\n")
187 | f.close()
188 |
189 | def test(net, img_id, level, output_dir, top_num = 10, gt_box = False, query_phrase = None):
190 | if query_phrase is not None:
191 | assert(level == 'vis')
192 | t0 = time.time()
193 | pair_list, region_dict, data_dict = get_data(img_id, level, top_num, gt_box, query_phrase)
194 | net.set_input(data_dict)
195 | net.forward(False, False)
196 | scores = net.get_output()[1]
197 | scores = [s[0] for s in scores]
198 | t1 = time.time()
199 | print ("run through the network takes %f" % (t1 - t0))
200 |
201 | # build region np array for nms
202 | phrase2r_dict = {}
203 | combined_region_score = [pair_list[i] + (scores[i],)
204 | for i in range(len(scores))]
205 | for key, group in itertools.groupby(combined_region_score,
206 | operator.itemgetter(1)):
207 | # [x, y, w, h, score]
208 | regions_info = np.array([np.append(region_dict[info[0]], info[2])
209 | for info in list(group)])
210 | # change from [x, y, w, h] to [x1, y1, x2, y2]
211 | regions_info[:,2] += regions_info[:,0] - 1
212 | regions_info[:,3] += regions_info[:,1] - 1
213 |
214 | # apply nms on the top score regions
215 | regions_info = np.array(
216 | sorted(regions_info, key = lambda row: row[4])[::-1])
217 | regions_info_nms = non_max_suppression(regions_info, 0.3)
218 | phrase2r_dict[key] = regions_info_nms
219 |
220 | t2 = time.time()
221 | print ("run through the nms takes %f" % (t2 - t1))
222 | if query_phrase is None:
223 | test_output(img_id, phrase2r_dict, level, output_dir)
224 | print ("FINISH TESTING %s" % str(img_id))
225 | return phrase2r_dict
226 |
227 |
--------------------------------------------------------------------------------
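`non_max_suppression` in `test.py` expects rows of `[x1, y1, x2, y2, score]` and keeps the highest-scoring box in each cluster of overlapping boxes. A small self-contained check of that behavior (a sketch; importing `test` assumes the dataset JSONs referenced in `config.py` are in place, since they are loaded at import time):

    import numpy as np
    from test import non_max_suppression

    boxes = np.array([[ 10.,  10.,  60.,  60., 0.9],
                      [ 12.,  12.,  62.,  62., 0.8],   # IoU ~ 0.86 with the first box
                      [100., 100., 160., 160., 0.7]])
    kept = non_max_suppression(boxes, overlapThresh = 0.3)
    # kept contains the 0.9 and 0.7 boxes; the 0.8 box is suppressed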
/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Details:
3 | load the phrase JSON and track the images
4 | generate the ground-truth (region, text phrase) pair dict
5 | resize images as needed
6 | region id --> (x, y, w, h)
7 | """
8 | import numpy as np
9 | import os.path as osp
10 | import cv2
11 | import string
12 | import json
13 | import re
14 | from PIL import Image
15 | import pyx
16 | import time
17 | import scipy.io as sio
18 | import scipy.ndimage.interpolation as sni
19 | from collections import defaultdict
20 | from config import DS_CONFIG, ENV_PATHS
21 |
22 | #all ids are integer type
23 | t0 = time.clock()
24 | print('loading dataset annotations ...')
25 | raw_data = json.load(open(ENV_PATHS.RAW_DATA, 'r'))
26 | im2rid = raw_data['im2rid']
27 | rid2r = raw_data['rid2r']
28 | tid2p = raw_data['tid2p']
29 | valid_tids = [int(k) for k in list(tid2p.keys())]
30 |
31 | meteor = np.array(json.load(open(ENV_PATHS.METEOR, 'r')))
32 | vocab = [c for c in string.printable if c not in string.ascii_uppercase]
33 | VGG_MEAN = [103.939, 116.779, 123.68]
34 | phrase_freq_temp = json.load(open(ENV_PATHS.FREQUENCY, 'r'))['freq']
35 | phrase_freq = np.zeros(len(phrase_freq_temp))
36 | for tid in valid_tids:
37 | phrase_freq[tid - 1] = phrase_freq_temp[tid - 1]
38 | total_frequency = np.sum(phrase_freq)
39 | phrase_prob = phrase_freq / total_frequency
40 |
41 | # split
42 | raw_split = json.load(open(ENV_PATHS.SPLIT, 'r'))
43 | train_ids = raw_split['train']
44 | test_ids = raw_split['test']
45 | val_ids = raw_split['val']
46 | print('dataset annotations loaded in %.2f s' % (time.clock() - t0))
47 |
48 | #[x, y, width, height]
49 | def IoU(region_current, object_current):
50 | totalarea = (object_current[2] * object_current[3] +
51 | region_current[2] * region_current[3])
52 |
53 | if region_current[0] <= object_current[0]:
54 | x_left = object_current[0]
55 | else:
56 | x_left = region_current[0]
57 |
58 | if region_current[1] <= object_current[1]:
59 | y_left = object_current[1]
60 | else:
61 | y_left = region_current[1]
62 |
63 | if (region_current[0] + region_current[2] >=
64 | object_current[0] + object_current[2]):
65 | x_right = object_current[0] + object_current[2]
66 | else:
67 | x_right = region_current[0] + region_current[2]
68 |
69 | if (region_current[1] + region_current[3] >=
70 | object_current[1] + object_current[3]):
71 | y_right = object_current[1] + object_current[3]
72 | else:
73 | y_right = region_current[1] + region_current[3]
74 |
75 | if x_right <= x_left:
76 | intersection = 0
77 | elif y_right <= y_left:
78 | intersection = 0
79 | else:
80 | intersection = (x_right - x_left) * (y_right - y_left)
81 | union = totalarea - intersection
82 |
83 | return 1.0 * intersection / union
84 |
85 | def get_edgeboxes(img_id):
86 | raw_boxes = sio.loadmat(osp.join(ENV_PATHS.EDGEBOX_PATH, str(img_id) + '.mat'))['bbs']
87 | chosen_boxes = np.zeros((0, 4))
88 | chosen_boxes = np.concatenate((chosen_boxes,
89 | raw_boxes[0: DS_CONFIG.edge_box_high_rank_num, 0:4]))
90 | rand_ids = (np.random.permutation(
91 | raw_boxes.shape[0] -
92 | DS_CONFIG.edge_box_high_rank_num)[0: DS_CONFIG.edge_box_random_num]
93 | + DS_CONFIG.edge_box_high_rank_num)
94 | chosen_boxes = np.concatenate((chosen_boxes, raw_boxes[rand_ids, 0:4]))
95 | return chosen_boxes
96 |
97 | def get_label_same_img(img_id):
98 | #(region_id, t_id): label<0|1|2>
99 | pair2dict = {}
100 | ambiguous = {}
101 | region_ids = im2rid[str(img_id)]
102 | edgebox_regions = get_edgeboxes(img_id)
103 |
104 | #get a local dictionary for all regions
105 | local_region_dict = {}
106 | counter = 0
107 | for i in range(edgebox_regions.shape[0]):
108 | local_region_dict[counter] = edgebox_regions[i, :]
109 | counter += 1
110 | for rid in region_ids:
111 | r_info = rid2r[str(rid)]
112 | r_coord = [r_info['x'], r_info['y'], r_info['width'], r_info['height']]
113 | local_region_dict[counter] = np.array(r_coord)
114 | counter += 1
115 |
116 | #get labels based on IOU
117 | for r1 in local_region_dict:
118 | r1_coord = local_region_dict[r1]
119 | ambiguous[r1] = []
120 | for r2 in region_ids:
121 | t = rid2r[str(r2)]['categ_id']
122 | r2_info = rid2r[str(r2)]
123 | r2_coord = [r2_info['x'], r2_info['y'],
124 | r2_info['width'], r2_info['height']]
125 | iou = IoU(r1_coord, r2_coord)
126 | if iou <= DS_CONFIG.thre_neg:
127 | #always get the maximum iou
128 | if (pair2dict.get((r1, t)) is not None and
129 | (pair2dict[(r1, t)] == 1 or pair2dict[(r1, t)] == 2)):
130 | continue
131 | pair2dict[(r1, t)] = 0
132 | elif iou >= DS_CONFIG.thre_pos:
133 | if t in ambiguous[r1]:
134 | ambiguous[r1].remove(t)
135 | pair2dict[(r1, t)] = 1
136 | else:
137 | #always get the maximum iou
138 | if (pair2dict.get((r1, t)) is not None and
139 | pair2dict[(r1,t)] == 1):
140 | continue
141 | pair2dict[(r1,t)] = 2
142 | ambiguous[r1].append(t)
143 |
144 | return pair2dict, local_region_dict, ambiguous
145 |
146 | def get_label_diff_img(img_id, ambiguous):
147 | pair2dict = {}
148 | region_ids = im2rid[str(img_id)]
149 | #get random sampled phrases out of the given image
150 | t_ids_in_image = [rid2r[str(r_id)]['categ_id'] for r_id in region_ids]
151 | rand_t_ids = []
152 | check_next_tid = 0
153 | temp_rand = (np.random.choice(len(phrase_prob),
154 | int(1.1 * DS_CONFIG.text_rand_sample_size),
155 | p = phrase_prob, replace = True) + 1)
156 | while (len(rand_t_ids) < DS_CONFIG.text_rand_sample_size
157 | and check_next_tid < len(temp_rand)):
158 | if temp_rand[check_next_tid] not in t_ids_in_image:
159 | rand_t_ids.append(temp_rand[check_next_tid])
160 | check_next_tid += 1
161 |
162 | for t_id in rand_t_ids:
163 | for r_id in ambiguous:
164 | current_ambiguous = ambiguous[r_id]
165 | pair2dict[(r_id, t_id)] = 0
166 | for a_id in current_ambiguous:
167 | #upper triangle matrix
168 | if max(t_id, a_id) - 1 >= meteor.shape[0]:
169 | continue
170 | elif meteor[min(t_id, a_id) - 1,
171 | max(t_id, a_id) - 1] > DS_CONFIG.meteor_thred:
172 | pair2dict[(r_id, t_id)] = 2
173 | break
174 | return pair2dict
175 |
176 | def get_label_together(img_id):
177 | pair2dict_same, local_region_dict, ambiguous = get_label_same_img(img_id)
178 | pair2dict_diff = get_label_diff_img(img_id, ambiguous)
179 | data_book = []
180 | #(region_id, categ_id, label, loss_weights, category)
181 | for b_id, current_p2d in enumerate([pair2dict_same, pair2dict_diff]):
182 | for k in current_p2d:
183 | if current_p2d[k] == 1:
184 | data_book.append([k[0], k[1], 1, DS_CONFIG.pos_loss_weight, 1])
185 | elif current_p2d[k] == 0 and b_id == 0:#same image negative
186 | data_book.append([k[0], k[1], 0, DS_CONFIG.neg_loss_weight, 0])
187 | elif current_p2d[k] == 0 and b_id == 1:#diff image negative
188 | data_book.append([k[0], k[1], 0, DS_CONFIG.rest_loss_weight, 2])
189 | return np.array(data_book), local_region_dict
190 |
191 | # input: a batch of image ids
192 | def get_scaled_im_tensor(img_ids, target_size, max_size):
193 | images = []
194 | scales = []
195 | img_shapes = []
196 | max_w = -1
197 | max_h = -1
198 | # load each image
199 | for img_id in img_ids:
200 | im_path = osp.join(ENV_PATHS.IMAGE_PATH, str(img_id) + '.jpg')
201 | try:
202 | img = cv2.imread(im_path).astype('float')
203 | except:
204 | img = cv2.imread(img_id).astype('float')
205 | img_shapes.append([img.shape[1], img.shape[0]]) #(limit_x, limit_y)
206 | # calculate scale
207 | old_short = min(img.shape[0: 2])
208 | old_long = max(img.shape[0: 2])
209 | new_scale = 1.0 * target_size / old_short
210 | if old_long * new_scale > max_size:
211 | new_scale = 1.0 * max_size / old_long
212 | # subtract mean from the image
213 | img[:, :, 0] = img[:, :, 0] - VGG_MEAN[0]
214 | img[:, :, 1] = img[:, :, 1] - VGG_MEAN[1]
215 | img[:, :, 2] = img[:, :, 2] - VGG_MEAN[2]
216 | # scale the image
217 | img = cv2.resize(img, None, fx = new_scale, fy = new_scale,
218 | interpolation = cv2.INTER_LINEAR)
219 | images.append(img)
220 | scales.append([new_scale, new_scale])
221 | # find the max shape
222 | if img.shape[0] > max_h:
223 | max_h = img.shape[0]
224 | if img.shape[1] > max_w:
225 | max_w = img.shape[1]
226 | # padding the image to be the max size with 0
227 | for idx, img in enumerate(images):
228 | resize_h = max_h - img.shape[0]
229 | resize_w = max_w - img.shape[1]
230 | images[idx] = cv2.copyMakeBorder(img, 0, resize_h, 0, resize_w,
231 | cv2.BORDER_CONSTANT, value=(0,0,0))
232 |
233 | return np.array(images), np.array(scales), np.array(img_shapes)
234 |
235 | def get_txt_tensor(phrase_ids, phrases = None):
236 | if phrases is None:
237 | phrases = [tid2p[str(int(id))] for id in phrase_ids]
238 | else:
239 | assert(phrase_ids[0] == -1 and len(phrase_ids) == 1)
240 | tensor = np.zeros([len(phrases), 1,
241 | DS_CONFIG.text_tensor_sequence_length, len(vocab)])
242 | for idx, line in enumerate(phrases):
243 | line = line.encode('ascii', errors='ignore')
244 | line = line.decode('utf-8')
245 | if not line.endswith('.'): #also safe for an empty string
246 | line = line + ' .'
247 | line = re.sub(' +', ' ', line)
248 | line = line.lower()
249 | line = [char for char in line if char in vocab]
250 | line = ''.join(line)
251 | #repeat the phrase to fixed length
252 | while len(line) < DS_CONFIG.text_tensor_sequence_length:
253 | line = line + ' ' + line
254 | for i in range(DS_CONFIG.text_tensor_sequence_length):
255 | tensor[idx, 0, i, vocab.index(line[i])] = 1
256 | return tensor
257 |
258 | #scale: [x_scale, y_scale]
259 | #shape: [limit_x, limit_y]
260 | #return: [xmin, ymin, xmax, ymax]
261 | #use this to decode local roi dict
262 | def get_scaled_roi(roi_ids, roi_dict, scale, shape, batch_idx, area_thred = 49):
263 | rois = []
264 | for idx in roi_ids:
265 | roi_coor = roi_dict[idx]
266 | if roi_coor[2] * roi_coor[3] < area_thred:
267 | continue
268 | temp_roi = [roi_coor[0] - 1, roi_coor[1] - 1,
269 | roi_coor[0] + roi_coor[2] - 2 , roi_coor[1] + roi_coor[3] - 2]
270 | rois.append([batch_idx, temp_roi[0] * scale[0], #1-base -> 0-base
271 | temp_roi[1] * scale[1],
272 | temp_roi[2] * scale[0],
273 | temp_roi[3] * scale[1]])
274 | return np.array(rois)
275 |
276 | #get all needed data from an image
277 | #image_tensor: [batch_size, height, width, 3]
278 | #phrase_tensor: [num_phrases, 1, sequence_length, vocab_size]
279 | #rois:[batch_idx, xmin, ymin, xmax, ymax]
280 | #pair: [rois_idx, phrase_idx]
281 | #labels|loss_weights: same length as pair
282 | def get_data(img_ids):
283 | image_tensor, scales, shapes = get_scaled_im_tensor(img_ids, DS_CONFIG.target_size,
284 | DS_CONFIG.max_size)
285 | all_labels = np.zeros((0,))
286 | all_sources = np.zeros((0,))
287 | all_loss_weights = np.zeros((0,))
288 | inverse_region_ids = np.zeros((0,))
289 | all_rois = np.zeros((0, 5))
290 | phrases_accumulate = np.zeros((0,))
291 | #unique roi index is calculated by batch,
292 | #when used globally should be offset
293 | unique_roi_index_offset = 0
294 | for idx, img_id in enumerate(img_ids):
295 | current_data_book, current_region_dict = get_label_together(img_id)
296 | current_unique_rois_ids, current_inverse_ids = (
297 | np.unique(current_data_book[:, 0], return_inverse = True))
298 | current_rois = get_scaled_roi(current_unique_rois_ids,
299 | current_region_dict,
300 | scales[idx, :], shapes[idx,:], idx)
301 | all_rois = np.concatenate((all_rois, current_rois))
302 | inverse_region_ids = np.concatenate((inverse_region_ids,
303 | current_inverse_ids + unique_roi_index_offset))
304 | unique_roi_index_offset += current_rois.shape[0]
305 | #phrase id is unique globally
306 | phrases_accumulate = np.concatenate((phrases_accumulate,
307 | current_data_book[:, 1]))
308 | all_labels = np.concatenate((all_labels, current_data_book[:, 2]))
309 | all_loss_weights = np.concatenate((all_loss_weights,
310 | current_data_book[:, 3]))
311 | all_sources = np.concatenate((all_sources, current_data_book[:, 4]))
312 | #get phrase tensor
313 | unique_phrase_ids, inverse_phrase_ids = np.unique(phrases_accumulate,
314 | return_inverse = True)
315 | phrase_tensor = get_txt_tensor(unique_phrase_ids)
316 |
317 | assert inverse_phrase_ids.shape[0] == inverse_region_ids.shape[0]
318 | assert inverse_phrase_ids.shape[0] == all_labels.shape[0]
319 |
320 | return {'image_ids': img_ids, #for tracking and debugging
321 | 'unique_phrase_ids': unique_phrase_ids, #distinct key: 'phrase_ids' below would otherwise overwrite it
322 | 'images': image_tensor,
323 | 'phrases': phrase_tensor,
324 | 'rois': all_rois,
325 | 'phrase_ids': inverse_phrase_ids,
326 | 'roi_ids': inverse_region_ids,
327 | 'labels': all_labels,
328 | 'loss_weights': all_loss_weights,
329 | 'sources': all_sources}
330 |
331 | def visualize(im_idx, phrase2ranked, visual_num, phrase, save_path):
332 | #read in image
333 | try:
334 | image = Image.open(osp.join(ENV_PATHS.IMAGE_PATH, str(im_idx) + '.jpg'))
335 | except:
336 | image = Image.open(im_idx)
337 | im_w, im_h = image.size
338 | ratio = 0.3
339 | pyx.text.set(mode="latex")
340 | pyx.text.preamble(r"\renewcommand{\familydefault}{\sfdefault}")
341 | canv = pyx.canvas.canvas()
342 | canv.insert(pyx.bitmap.bitmap(0, 0, image, width = ratio * im_w, height = ratio * im_h))
343 | assert(len(phrase2ranked) == 1)
344 | ranked = list(phrase2ranked.values())[0]
345 | for i in range(visual_num):
346 | (x1, y1, x2, y2, s) = ranked[i]
347 | w = int(x2 - x1)
348 | h = int(y2 - y1)
349 | canv.stroke(pyx.path.rect(ratio * x1, ratio * (im_h - y2), ratio * w, ratio * h),
350 | [pyx.style.linewidth(1.0), pyx.color.rgb.red])
351 | #insert score tab for each bbox
352 | pyx.unit.set(xscale = 3)
353 | tbox = pyx.text.text(ratio * x1, ratio * (im_h - y1), '[%f]:%s' % (s, phrase), [pyx.text.size.Huge])
354 | tpath = tbox.bbox().enlarged(3 * pyx.unit.x_pt).path()
355 | canv.draw(tpath, [pyx.deco.filled([pyx.color.cmyk.Yellow]), pyx.deco.stroked()])
356 | canv.insert(tbox)
357 |
358 | canv.writePDFfile(save_path)
359 |
360 | #utils.py is a library module with no standalone entry point (main() was undefined)
362 |
--------------------------------------------------------------------------------
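As a sanity check on the `[x, y, width, height]` convention used by `IoU` in `utils.py`, here is a worked example (importing `utils` assumes the dataset JSONs referenced in `config.py` are present, since they are loaded at import time):

    from utils import IoU

    region = [0, 0, 10, 10]   # 10x10 box at the origin
    obj    = [5, 5, 10, 10]   # same size, shifted by 5 in x and y
    # intersection = 5 * 5 = 25, union = 100 + 100 - 25 = 175
    print(IoU(region, obj))   # -> 0.142857...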