├── .gitignore ├── LICENSE ├── README.md ├── __init__.py ├── conda_packagelist.txt ├── configs ├── CornerNet-multi_scale.json ├── CornerNet.json ├── CornerNet_Saccade.json └── CornerNet_Squeeze.json ├── core ├── __init__.py ├── base.py ├── config.py ├── dbs │ ├── __init__.py │ ├── base.py │ ├── coco.py │ └── detection.py ├── detectors.py ├── external │ ├── .gitignore │ ├── Makefile │ ├── __init__.py │ ├── bbox.pyx │ ├── nms.pyx │ └── setup.py ├── models │ ├── CornerNet.py │ ├── CornerNet_Saccade.py │ ├── CornerNet_Squeeze.py │ ├── __init__.py │ └── py_utils │ │ ├── __init__.py │ │ ├── _cpools │ │ ├── .gitignore │ │ ├── __init__.py │ │ ├── setup.py │ │ └── src │ │ │ ├── bottom_pool.cpp │ │ │ ├── left_pool.cpp │ │ │ ├── right_pool.cpp │ │ │ └── top_pool.cpp │ │ ├── data_parallel.py │ │ ├── losses.py │ │ ├── modules.py │ │ ├── scatter_gather.py │ │ └── utils.py ├── nnet │ ├── __init__.py │ └── py_factory.py ├── paths.py ├── sample │ ├── __init__.py │ ├── cornernet.py │ ├── cornernet_saccade.py │ └── utils.py ├── test │ ├── __init__.py │ ├── cornernet.py │ └── cornernet_saccade.py ├── utils │ ├── __init__.py │ ├── timer.py │ └── tqdm.py └── vis_utils.py ├── demo.jpg ├── demo.py ├── evaluate.py └── train.py /.gitignore: -------------------------------------------------------------------------------- 1 | loss/ 2 | data/ 3 | cache/ 4 | tf_cache/ 5 | debug/ 6 | results/ 7 | 8 | misc/outputs 9 | 10 | evaluation/evaluate_object 11 | evaluation/analyze_object 12 | 13 | nnet/__pycache__/ 14 | 15 | *.swp 16 | 17 | *.pyc 18 | *.o* 19 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2019, Princeton University 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | * Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | * Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CornerNet-Lite: Training, Evaluation and Testing Code 2 | Code for reproducing results in the following paper: 3 | 4 | [**CornerNet-Lite: Efficient Keypoint Based Object Detection**](https://arxiv.org/abs/1904.08900) 5 | Hei Law, Yun Teng, Olga Russakovsky, Jia Deng 6 | *arXiv:1904.08900* 7 | 8 | ## Getting Started 9 | ### Software Requirement 10 | - Python 3.7 11 | - PyTorch 1.0.0 12 | - CUDA 10 13 | - GCC 4.9.2 or above 14 | 15 | ### Installing Dependencies 16 | Please first install [Anaconda](https://anaconda.org) and create an Anaconda environment using the provided package list `conda_packagelist.txt`. 17 | ``` 18 | conda create --name CornerNet_Lite --file conda_packagelist.txt --channel pytorch 19 | ``` 20 | 21 | After you create the environment, please activate it. 22 | ``` 23 | source activate CornerNet_Lite 24 | ``` 25 | 26 | ### Compiling Corner Pooling Layers 27 | Compile the C++ implementation of the corner pooling layers. (GCC 4.9.2 or above is required.) 28 | ``` 29 | cd <CornerNet-Lite dir>/core/models/py_utils/_cpools/ 30 | python setup.py install --user 31 | ``` 32 | 33 | ### Compiling NMS 34 | Compile the NMS code, which is originally from [Faster R-CNN](https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/nms/cpu_nms.pyx) and [Soft-NMS](https://github.com/bharatsingh430/soft-nms/blob/master/lib/nms/cpu_nms.pyx). 35 | ``` 36 | cd <CornerNet-Lite dir>/core/external 37 | make 38 | ``` 39 | 40 | ### Downloading Models 41 | In this repo, we provide models for the following detectors: 42 | - [CornerNet-Saccade](https://drive.google.com/file/d/1MQDyPRI0HgDHxHToudHqQ-2m8TVBciaa/view?usp=sharing) 43 | - [CornerNet-Squeeze](https://drive.google.com/file/d/1qM8BBYCLUBcZx_UmLT0qMXNTh-Yshp4X/view?usp=sharing) 44 | - [CornerNet](https://drive.google.com/file/d/1e8At_iZWyXQgLlMwHkB83kN-AN85Uff1/view?usp=sharing) 45 | 46 | Put the CornerNet-Saccade model under `<CornerNet-Lite dir>/cache/nnet/CornerNet_Saccade/`, the CornerNet-Squeeze model under `<CornerNet-Lite dir>/cache/nnet/CornerNet_Squeeze/` and the CornerNet model under `<CornerNet-Lite dir>/cache/nnet/CornerNet/`. (Note that we use underscores instead of dashes in the directory names for CornerNet-Saccade and CornerNet-Squeeze.) 47 | 48 | Note: The CornerNet model is the same as the one in the original [CornerNet repo](https://github.com/princeton-vl/CornerNet). We just ported it to this new repo. 49 | 50 | ### Running the Demo Script 51 | After downloading the models, you should be able to use the detectors on your own images. We provide a demo script `demo.py` to test if the repo is installed correctly. 52 | ``` 53 | python demo.py 54 | ``` 55 | This script applies CornerNet-Saccade to `demo.jpg` and writes the results to `demo_out.jpg`. 56 | 57 | In the demo script, the default detector is CornerNet-Saccade. You can modify the demo script to test different detectors. For example, if you want to test CornerNet-Squeeze: 58 | ```python 59 | #!/usr/bin/env python 60 | 61 | import cv2 62 | from core.detectors import CornerNet_Squeeze 63 | from core.vis_utils import draw_bboxes 64 | 65 | detector = CornerNet_Squeeze() 66 | image = cv2.imread("demo.jpg") 67 | 68 | bboxes = detector(image) 69 | image = draw_bboxes(image, bboxes) 70 | cv2.imwrite("demo_out.jpg", image) 71 | ``` 72 |
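`detector(image)` returns a dictionary that maps each COCO class name to an array of detections, where the first four columns of each row are the box corners `(x1, y1, x2, y2)` and the fifth is the confidence score (see `core/base.py` and `core/dbs/coco.py`). As a minimal sketch of working with the raw detections (the 0.5 threshold below is an arbitrary value for illustration, not a repo default):
```python
#!/usr/bin/env python

import cv2
from core.detectors import CornerNet_Saccade

detector = CornerNet_Saccade()
image    = cv2.imread("demo.jpg")

# bboxes maps class names (e.g. "person") to arrays with rows [x1, y1, x2, y2, score, ...]
bboxes = detector(image)
for name, dets in bboxes.items():
    # keep only the detections above an arbitrary confidence threshold
    for x1, y1, x2, y2, score in dets[dets[:, 4] > 0.5][:, :5]:
        print("{}: ({:.0f}, {:.0f})-({:.0f}, {:.0f}), score {:.2f}".format(name, x1, y1, x2, y2, score))
```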
73 | ### Using CornerNet-Lite in Your Project 74 | It is also easy to use CornerNet-Lite in your project. You will need to change the directory name from `CornerNet-Lite` to `CornerNet_Lite`. Otherwise, you won't be able to import CornerNet-Lite. 75 | ``` 76 | Your project 77 | │ README.md 78 | │ ... 79 | │ foo.py 80 | │ 81 | └───CornerNet_Lite 82 | │ 83 | └───directory1 84 | │ 85 | └───... 86 | ``` 87 | 88 | In `foo.py`, you can easily import CornerNet-Saccade by adding: 89 | ```python 90 | import cv2 91 | from CornerNet_Lite import CornerNet_Saccade 92 | 93 | def foo(): 94 | cornernet = CornerNet_Saccade() 95 | # CornerNet_Saccade is ready to use 96 | 97 | image = cv2.imread('/path/to/your/image') 98 | bboxes = cornernet(image) 99 | ``` 100 | 101 | If you want to train or evaluate the detectors on COCO, please move on to the following steps. 102 | 103 | ## Training and Evaluation 104 | 105 | ### Installing MS COCO APIs 106 | ``` 107 | mkdir -p <CornerNet-Lite dir>/data 108 | cd <CornerNet-Lite dir>/data 109 | git clone git@github.com:cocodataset/cocoapi.git coco 110 | cd <CornerNet-Lite dir>/data/coco/PythonAPI 111 | make install 112 | ``` 113 | 114 | ### Downloading MS COCO Data 115 | - Download the training/validation split we use in our paper from [here](https://drive.google.com/file/d/1dop4188xo5lXDkGtOZUzy2SHOD_COXz4/view?usp=sharing) (originally from [Faster R-CNN](https://github.com/rbgirshick/py-faster-rcnn/tree/master/data)) 116 | - Unzip the file and place `annotations` under `<CornerNet-Lite dir>/data/coco` 117 | - Download the images (2014 Train, 2014 Val, 2017 Test) from [here](http://cocodataset.org/#download) 118 | - Create 3 directories, `trainval2014`, `minival2014` and `testdev2017`, under `<CornerNet-Lite dir>/data/coco/images/` 119 | - Copy the training/validation/testing images to the corresponding directories according to the annotation files 120 | 121 | To train and evaluate a network, you will need to create a configuration file, which defines the hyperparameters, and a model file, which defines the network architecture. The configuration file should be in JSON format and placed in `<CornerNet-Lite dir>/configs/`. Each configuration file should have a corresponding model file in `<CornerNet-Lite dir>/core/models/`; i.e., if there is a `<model>.json` in `<CornerNet-Lite dir>/configs/`, there should be a `<model>.py` in `<CornerNet-Lite dir>/core/models/`. There is only one exception, which we will mention later. 122 | 123 | ### Training and Evaluating a Model 124 | To train a model: 125 | ``` 126 | python train.py <model> 127 | ``` 128 | 129 | We provide the configuration files and the model files for CornerNet-Saccade, CornerNet-Squeeze and CornerNet in this repo. Please check the configuration files in `<CornerNet-Lite dir>/configs/`. 130 | 131 | To train CornerNet-Saccade: 132 | ``` 133 | python train.py CornerNet_Saccade 134 | ``` 135 | Please adjust the batch size in `CornerNet_Saccade.json` to accommodate the number of GPUs that are available to you. 136 | 137 | To evaluate the trained model: 138 | ``` 139 | python evaluate.py CornerNet_Saccade --testiter 500000 --split <split> 140 | ``` 141 | 142 | If you want to test different hyperparameters during evaluation and do not want to overwrite the original configuration file, you can do so by creating a configuration file with a suffix (`<model>-<suffix>.json`). There is no need to create a `<model>-<suffix>.py` in `<CornerNet-Lite dir>/core/models/`. 143 | 144 | To use the new configuration file: 145 | ``` 146 | python evaluate.py <model> --testiter <iter> --split <split> --suffix <suffix> 147 | ``` 148 |
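For example, to evaluate CornerNet-Saccade with modified test-time hyperparameters, you could copy its configuration file under a hypothetical suffix (`minitest` below is only an illustration, not a file shipped with this repo), edit the values in the `db` section (e.g. `test_scales` or `att_thresholds`), and pass the suffix to `evaluate.py`:
```
cp configs/CornerNet_Saccade.json configs/CornerNet_Saccade-minitest.json
# edit the "db" section of CornerNet_Saccade-minitest.json, then run:
python evaluate.py CornerNet_Saccade --testiter 500000 --split <split> --suffix minitest
```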
149 | We also include a configuration file for CornerNet under the multi-scale setting, `CornerNet-multi_scale.json`, in this repo. 150 | 151 | To use the multi-scale configuration file: 152 | ``` 153 | python evaluate.py CornerNet --testiter <iter> --split <split> --suffix multi_scale 154 | ```
-------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | from .core.detectors import CornerNet, CornerNet_Squeeze, CornerNet_Saccade 2 | from .core.vis_utils import draw_bboxes 3 |
-------------------------------------------------------------------------------- /conda_packagelist.txt: -------------------------------------------------------------------------------- 1 | # This file may be used to create an environment using: 2 | # $ conda create --name <env> --file <this file> 3 | # platform: linux-64 4 | blas=1.0=mkl 5 | bzip2=1.0.6=h14c3975_5 6 | ca-certificates=2018.12.5=0 7 | cairo=1.14.12=h8948797_3 8 | certifi=2018.11.29=py37_0 9 | cffi=1.11.5=py37he75722e_1 10 | cuda100=1.0=0 11 | cycler=0.10.0=py37_0 12 | cython=0.28.5=py37hf484d3e_0 13 | dbus=1.13.2=h714fa37_1 14 | expat=2.2.6=he6710b0_0 15 | ffmpeg=4.0=hcdf2ecd_0 16 | fontconfig=2.13.0=h9420a91_0 17 | freeglut=3.0.0=hf484d3e_5 18 | freetype=2.9.1=h8a8886c_1 19 | glib=2.56.2=hd408876_0 20 | graphite2=1.3.12=h23475e2_2 21 | gst-plugins-base=1.14.0=hbbd80ab_1 22 | gstreamer=1.14.0=hb453b48_1 23 | harfbuzz=1.8.8=hffaf4a1_0 24 | hdf5=1.10.2=hba1933b_1 25 | icu=58.2=h9c2bf20_1 26 | intel-openmp=2019.0=118 27 | jasper=2.0.14=h07fcdf6_1 28 | jpeg=9b=h024ee3a_2 29 | kiwisolver=1.0.1=py37hf484d3e_0 30 | libedit=3.1.20170329=h6b74fdf_2 31 | libffi=3.2.1=hd88cf55_4 32 | libgcc-ng=8.2.0=hdf63c60_1 33 | libgfortran-ng=7.3.0=hdf63c60_0 34 | libglu=9.0.0=hf484d3e_1 35 | libopencv=3.4.2=hb342d67_1 36 | libopus=1.2.1=hb9ed12e_0 37 | libpng=1.6.35=hbc83047_0 38 | libstdcxx-ng=8.2.0=hdf63c60_1 39 | libtiff=4.0.9=he85c1e1_2 40 | libuuid=1.0.3=h1bed415_2 41 | libvpx=1.7.0=h439df22_0 42 | libxcb=1.13=h1bed415_1 43 | libxml2=2.9.8=h26e45fe_1 44 | matplotlib=3.0.2=py37h5429711_0 45 | mkl=2018.0.3=1 46 | mkl_fft=1.0.6=py37h7dd41cf_0 47 | mkl_random=1.0.1=py37h4414c95_1 48 | ncurses=6.1=hf484d3e_0 49 | ninja=1.8.2=py37h6bb024c_1 50 | numpy=1.15.4=py37h1d66e8a_0 51 | numpy-base=1.15.4=py37h81de0dd_0 52 | olefile=0.46=py37_0 53 | opencv=3.4.2=py37h6fd60c2_1 54 | openssl=1.1.1a=h7b6447c_0 55 | pcre=8.42=h439df22_0 56 | pillow=5.2.0=py37heded4f4_0 57 | pip=10.0.1=py37_0 58 | pixman=0.34.0=hceecf20_3 59 | py-opencv=3.4.2=py37hb342d67_1 60 | pycparser=2.18=py37_1 61 | pyparsing=2.2.0=py37_1 62 | pyqt=5.9.2=py37h05f1152_2 63 | python=3.7.1=h0371630_3 64 | python-dateutil=2.7.3=py37_0 65 | pytorch=1.0.0=py3.7_cuda10.0.130_cudnn7.4.1_1 66 | pytz=2018.5=py37_0 67 | qt=5.9.7=h5867ecd_1 68 | readline=7.0=h7b6447c_5 69 | scikit-learn=0.19.1=py37hedc7406_0 70 | scipy=1.1.0=py37hfa4b5c9_1 71 | setuptools=40.2.0=py37_0 72 | sip=4.19.8=py37hf484d3e_0 73 | six=1.11.0=py37_1 74 | sqlite=3.25.3=h7b6447c_0 75 | tk=8.6.8=hbc83047_0 76 | torchvision=0.2.1=py37_1 77 | tornado=5.1=py37h14c3975_0 78 | tqdm=4.25.0=py37h28b3542_0 79 | wheel=0.31.1=py37_0 80 | xz=5.2.4=h14c3975_4 81 | zlib=1.2.11=ha838bed_2 82 |
-------------------------------------------------------------------------------- /configs/CornerNet-multi_scale.json: -------------------------------------------------------------------------------- 1 | { 2 | "system": { 3 | "dataset": "COCO", 4 | "batch_size": 49, 5 | "sampling_function": "cornernet", 6 | 7 | "train_split": "trainval", 8 | "val_split": "minival", 9 | 10 | "learning_rate": 0.00025, 11 | "decay_rate": 10, 12 | 13 |
"val_iter": 100, 14 | 15 | "opt_algo": "adam", 16 | "prefetch_size": 5, 17 | 18 | "max_iter": 500000, 19 | "stepsize": 450000, 20 | "snapshot": 5000, 21 | 22 | "chunk_sizes": [4, 5, 5, 5, 5, 5, 5, 5, 5, 5], 23 | 24 | "data_dir": "./data" 25 | }, 26 | 27 | "db": { 28 | "rand_scale_min": 0.6, 29 | "rand_scale_max": 1.4, 30 | "rand_scale_step": 0.1, 31 | "rand_scales": null, 32 | 33 | "rand_crop": true, 34 | "rand_color": true, 35 | 36 | "border": 128, 37 | "gaussian_bump": true, 38 | 39 | "input_size": [511, 511], 40 | "output_sizes": [[128, 128]], 41 | 42 | "test_scales": [0.5, 0.75, 1, 1.25, 1.5], 43 | 44 | "top_k": 100, 45 | "categories": 80, 46 | "ae_threshold": 0.5, 47 | "nms_threshold": 0.5, 48 | 49 | "merge_bbox": true, 50 | "weight_exp": 10, 51 | 52 | "max_per_image": 100 53 | } 54 | } 55 | -------------------------------------------------------------------------------- /configs/CornerNet.json: -------------------------------------------------------------------------------- 1 | { 2 | "system": { 3 | "dataset": "COCO", 4 | "batch_size": 49, 5 | "sampling_function": "cornernet", 6 | 7 | "train_split": "trainval", 8 | "val_split": "minival", 9 | 10 | "learning_rate": 0.00025, 11 | "decay_rate": 10, 12 | 13 | "val_iter": 100, 14 | 15 | "opt_algo": "adam", 16 | "prefetch_size": 5, 17 | 18 | "max_iter": 500000, 19 | "stepsize": 450000, 20 | "snapshot": 5000, 21 | 22 | "chunk_sizes": [4, 5, 5, 5, 5, 5, 5, 5, 5, 5], 23 | 24 | "data_dir": "./data" 25 | }, 26 | 27 | "db": { 28 | "rand_scale_min": 0.6, 29 | "rand_scale_max": 1.4, 30 | "rand_scale_step": 0.1, 31 | "rand_scales": null, 32 | 33 | "rand_crop": true, 34 | "rand_color": true, 35 | 36 | "border": 128, 37 | "gaussian_bump": true, 38 | "gaussian_iou": 0.3, 39 | 40 | "input_size": [511, 511], 41 | "output_sizes": [[128, 128]], 42 | 43 | "test_scales": [1], 44 | 45 | "top_k": 100, 46 | "categories": 80, 47 | "ae_threshold": 0.5, 48 | "nms_threshold": 0.5, 49 | 50 | "max_per_image": 100 51 | } 52 | } 53 | -------------------------------------------------------------------------------- /configs/CornerNet_Saccade.json: -------------------------------------------------------------------------------- 1 | { 2 | "system": { 3 | "dataset": "COCO", 4 | "batch_size": 48, 5 | "sampling_function": "cornernet_saccade", 6 | 7 | "train_split": "trainval", 8 | "val_split": "minival", 9 | 10 | "learning_rate": 0.00025, 11 | "decay_rate": 10, 12 | 13 | "val_iter": 100, 14 | 15 | "opt_algo": "adam", 16 | "prefetch_size": 5, 17 | 18 | "max_iter": 500000, 19 | "stepsize": 450000, 20 | "snapshot": 5000, 21 | 22 | "chunk_sizes": [12, 12, 12, 12] 23 | }, 24 | 25 | "db": { 26 | "rand_scale_min": 0.5, 27 | "rand_scale_max": 1.1, 28 | "rand_scale_step": 0.1, 29 | "rand_scales": null, 30 | 31 | "rand_full_crop": true, 32 | "gaussian_bump": true, 33 | "gaussian_iou": 0.5, 34 | 35 | "min_scale": 16, 36 | "view_sizes": [], 37 | 38 | "height_mult": 31, 39 | "width_mult": 31, 40 | 41 | "input_size": [255, 255], 42 | "output_sizes": [[64, 64]], 43 | 44 | "att_max_crops": 30, 45 | "att_scales": [[1, 2, 4]], 46 | "att_thresholds": [0.3], 47 | 48 | "top_k": 12, 49 | "num_dets": 12, 50 | "categories": 80, 51 | "ae_threshold": 0.3, 52 | "nms_threshold": 0.5, 53 | 54 | "max_per_image": 100 55 | } 56 | } 57 | -------------------------------------------------------------------------------- /configs/CornerNet_Squeeze.json: -------------------------------------------------------------------------------- 1 | { 2 | "system": { 3 | "dataset": "COCO", 4 | "batch_size": 55, 5 | 
"sampling_function": "cornernet", 6 | 7 | "train_split": "trainval", 8 | "val_split": "minival", 9 | 10 | "learning_rate": 0.00025, 11 | "decay_rate": 10, 12 | 13 | "val_iter": 100, 14 | 15 | "opt_algo": "adam", 16 | "prefetch_size": 5, 17 | 18 | "max_iter": 500000, 19 | "stepsize": 450000, 20 | "snapshot": 5000, 21 | 22 | "chunk_sizes": [13, 14, 14, 14], 23 | 24 | "data_dir": "./data" 25 | }, 26 | 27 | "db": { 28 | "rand_scale_min": 0.6, 29 | "rand_scale_max": 1.4, 30 | "rand_scale_step": 0.1, 31 | "rand_scales": null, 32 | 33 | "rand_crop": true, 34 | "rand_color": true, 35 | 36 | "border": 128, 37 | "gaussian_bump": true, 38 | "gaussian_iou": 0.3, 39 | 40 | "input_size": [511, 511], 41 | "output_sizes": [[64, 64]], 42 | 43 | "test_scales": [1], 44 | "test_flipped": false, 45 | 46 | "top_k": 20, 47 | "num_dets": 100, 48 | "categories": 80, 49 | "ae_threshold": 0.5, 50 | "nms_threshold": 0.5, 51 | 52 | "max_per_image": 100 53 | } 54 | } 55 | -------------------------------------------------------------------------------- /core/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/princeton-vl/CornerNet-Lite/6a54505d830a9d6afe26e99f0864b5d06d0bbbaf/core/__init__.py -------------------------------------------------------------------------------- /core/base.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | from .nnet.py_factory import NetworkFactory 4 | 5 | class Base(object): 6 | def __init__(self, db, nnet, func, model=None): 7 | super(Base, self).__init__() 8 | 9 | self._db = db 10 | self._nnet = nnet 11 | self._func = func 12 | 13 | if model is not None: 14 | self._nnet.load_pretrained_params(model) 15 | 16 | self._nnet.cuda() 17 | self._nnet.eval_mode() 18 | 19 | def _inference(self, image, *args, **kwargs): 20 | return self._func(self._db, self._nnet, image.copy(), *args, **kwargs) 21 | 22 | def __call__(self, image, *args, **kwargs): 23 | categories = self._db.configs["categories"] 24 | bboxes = self._inference(image, *args, **kwargs) 25 | return {self._db.cls2name(j): bboxes[j] for j in range(1, categories + 1)} 26 | 27 | def load_cfg(cfg_file): 28 | with open(cfg_file, "r") as f: 29 | cfg = json.load(f) 30 | 31 | cfg_sys = cfg["system"] 32 | cfg_db = cfg["db"] 33 | return cfg_sys, cfg_db 34 | 35 | def load_nnet(cfg_sys, model): 36 | return NetworkFactory(cfg_sys, model) 37 | -------------------------------------------------------------------------------- /core/config.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | 4 | class SystemConfig(object): 5 | def __init__(self): 6 | self._configs = {} 7 | self._configs["dataset"] = None 8 | self._configs["sampling_function"] = "coco_detection" 9 | 10 | # Training Config 11 | self._configs["display"] = 5 12 | self._configs["snapshot"] = 400 13 | self._configs["stepsize"] = 5000 14 | self._configs["learning_rate"] = 0.001 15 | self._configs["decay_rate"] = 10 16 | self._configs["max_iter"] = 100000 17 | self._configs["val_iter"] = 20 18 | self._configs["batch_size"] = 1 19 | self._configs["snapshot_name"] = None 20 | self._configs["prefetch_size"] = 100 21 | self._configs["pretrain"] = None 22 | self._configs["opt_algo"] = "adam" 23 | self._configs["chunk_sizes"] = None 24 | 25 | # Directories 26 | self._configs["data_dir"] = "./data" 27 | self._configs["cache_dir"] = "./cache" 28 | self._configs["config_dir"] = "./config" 29 | 
self._configs["result_dir"] = "./results" 30 | 31 | # Split 32 | self._configs["train_split"] = "training" 33 | self._configs["val_split"] = "validation" 34 | self._configs["test_split"] = "testdev" 35 | 36 | # Rng 37 | self._configs["data_rng"] = np.random.RandomState(123) 38 | self._configs["nnet_rng"] = np.random.RandomState(317) 39 | 40 | @property 41 | def chunk_sizes(self): 42 | return self._configs["chunk_sizes"] 43 | 44 | @property 45 | def train_split(self): 46 | return self._configs["train_split"] 47 | 48 | @property 49 | def val_split(self): 50 | return self._configs["val_split"] 51 | 52 | @property 53 | def test_split(self): 54 | return self._configs["test_split"] 55 | 56 | @property 57 | def full(self): 58 | return self._configs 59 | 60 | @property 61 | def sampling_function(self): 62 | return self._configs["sampling_function"] 63 | 64 | @property 65 | def data_rng(self): 66 | return self._configs["data_rng"] 67 | 68 | @property 69 | def nnet_rng(self): 70 | return self._configs["nnet_rng"] 71 | 72 | @property 73 | def opt_algo(self): 74 | return self._configs["opt_algo"] 75 | 76 | @property 77 | def prefetch_size(self): 78 | return self._configs["prefetch_size"] 79 | 80 | @property 81 | def pretrain(self): 82 | return self._configs["pretrain"] 83 | 84 | @property 85 | def result_dir(self): 86 | result_dir = os.path.join(self._configs["result_dir"], self.snapshot_name) 87 | if not os.path.exists(result_dir): 88 | os.makedirs(result_dir) 89 | return result_dir 90 | 91 | @property 92 | def dataset(self): 93 | return self._configs["dataset"] 94 | 95 | @property 96 | def snapshot_name(self): 97 | return self._configs["snapshot_name"] 98 | 99 | @property 100 | def snapshot_dir(self): 101 | snapshot_dir = os.path.join(self.cache_dir, "nnet", self.snapshot_name) 102 | 103 | if not os.path.exists(snapshot_dir): 104 | os.makedirs(snapshot_dir) 105 | return snapshot_dir 106 | 107 | @property 108 | def snapshot_file(self): 109 | snapshot_file = os.path.join(self.snapshot_dir, self.snapshot_name + "_{}.pkl") 110 | return snapshot_file 111 | 112 | @property 113 | def config_dir(self): 114 | return self._configs["config_dir"] 115 | 116 | @property 117 | def batch_size(self): 118 | return self._configs["batch_size"] 119 | 120 | @property 121 | def max_iter(self): 122 | return self._configs["max_iter"] 123 | 124 | @property 125 | def learning_rate(self): 126 | return self._configs["learning_rate"] 127 | 128 | @property 129 | def decay_rate(self): 130 | return self._configs["decay_rate"] 131 | 132 | @property 133 | def stepsize(self): 134 | return self._configs["stepsize"] 135 | 136 | @property 137 | def snapshot(self): 138 | return self._configs["snapshot"] 139 | 140 | @property 141 | def display(self): 142 | return self._configs["display"] 143 | 144 | @property 145 | def val_iter(self): 146 | return self._configs["val_iter"] 147 | 148 | @property 149 | def data_dir(self): 150 | return self._configs["data_dir"] 151 | 152 | @property 153 | def cache_dir(self): 154 | if not os.path.exists(self._configs["cache_dir"]): 155 | os.makedirs(self._configs["cache_dir"]) 156 | return self._configs["cache_dir"] 157 | 158 | def update_config(self, new): 159 | for key in new: 160 | if key in self._configs: 161 | self._configs[key] = new[key] 162 | return self 163 | -------------------------------------------------------------------------------- /core/dbs/__init__.py: -------------------------------------------------------------------------------- 1 | from .coco import COCO 2 | 3 | datasets = { 4 | "COCO": 
COCO 5 | } 6 | 7 | -------------------------------------------------------------------------------- /core/dbs/base.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | 4 | class BASE(object): 5 | def __init__(self): 6 | self._split = None 7 | self._db_inds = [] 8 | self._image_ids = [] 9 | 10 | self._mean = np.zeros((3, ), dtype=np.float32) 11 | self._std = np.ones((3, ), dtype=np.float32) 12 | self._eig_val = np.ones((3, ), dtype=np.float32) 13 | self._eig_vec = np.zeros((3, 3), dtype=np.float32) 14 | 15 | self._configs = {} 16 | self._configs["data_aug"] = True 17 | 18 | self._data_rng = None 19 | 20 | @property 21 | def configs(self): 22 | return self._configs 23 | 24 | @property 25 | def mean(self): 26 | return self._mean 27 | 28 | @property 29 | def std(self): 30 | return self._std 31 | 32 | @property 33 | def eig_val(self): 34 | return self._eig_val 35 | 36 | @property 37 | def eig_vec(self): 38 | return self._eig_vec 39 | 40 | @property 41 | def db_inds(self): 42 | return self._db_inds 43 | 44 | @property 45 | def split(self): 46 | return self._split 47 | 48 | def update_config(self, new): 49 | for key in new: 50 | if key in self._configs: 51 | self._configs[key] = new[key] 52 | 53 | def image_ids(self, ind): 54 | return self._image_ids[ind] 55 | 56 | def image_path(self, ind): 57 | pass 58 | 59 | def write_result(self, ind, all_bboxes, all_scores): 60 | pass 61 | 62 | def evaluate(self, name): 63 | pass 64 | 65 | def shuffle_inds(self, quiet=False): 66 | if self._data_rng is None: 67 | self._data_rng = np.random.RandomState(os.getpid()) 68 | 69 | if not quiet: 70 | print("shuffling indices...") 71 | rand_perm = self._data_rng.permutation(len(self._db_inds)) 72 | self._db_inds = self._db_inds[rand_perm] 73 | -------------------------------------------------------------------------------- /core/dbs/coco.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import numpy as np 4 | 5 | from .detection import DETECTION 6 | from ..paths import get_file_path 7 | 8 | # COCO bounding boxes are 0-indexed 9 | 10 | class COCO(DETECTION): 11 | def __init__(self, db_config, split=None, sys_config=None): 12 | assert split is None or sys_config is not None 13 | super(COCO, self).__init__(db_config) 14 | 15 | self._mean = np.array([0.40789654, 0.44719302, 0.47026115], dtype=np.float32) 16 | self._std = np.array([0.28863828, 0.27408164, 0.27809835], dtype=np.float32) 17 | self._eig_val = np.array([0.2141788, 0.01817699, 0.00341571], dtype=np.float32) 18 | self._eig_vec = np.array([ 19 | [-0.58752847, -0.69563484, 0.41340352], 20 | [-0.5832747, 0.00994535, -0.81221408], 21 | [-0.56089297, 0.71832671, 0.41158938] 22 | ], dtype=np.float32) 23 | 24 | self._coco_cls_ids = [ 25 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 26 | 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 27 | 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 28 | 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 29 | 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 30 | 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 31 | 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 32 | 82, 84, 85, 86, 87, 88, 89, 90 33 | ] 34 | 35 | self._coco_cls_names = [ 36 | 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 37 | 'bus', 'train', 'truck', 'boat', 'traffic light', 38 | 'fire hydrant', 'stop sign', 'parking meter', 'bench', 39 | 'bird', 'cat', 'dog', 'horse','sheep', 'cow', 'elephant', 40 | 'bear', 'zebra','giraffe', 'backpack', 'umbrella', 41 | 'handbag', 'tie', 
'suitcase', 'frisbee', 'skis', 42 | 'snowboard','sports ball', 'kite', 'baseball bat', 43 | 'baseball glove', 'skateboard', 'surfboard', 44 | 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 45 | 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 46 | 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 47 | 'donut', 'cake', 'chair', 'couch', 'potted plant', 48 | 'bed', 'dining table', 'toilet', 'tv', 'laptop', 49 | 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 50 | 'oven', 'toaster', 'sink', 'refrigerator', 'book', 51 | 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 52 | 'toothbrush' 53 | ] 54 | 55 | self._cls2coco = {ind + 1: coco_id for ind, coco_id in enumerate(self._coco_cls_ids)} 56 | self._coco2cls = {coco_id: cls_id for cls_id, coco_id in self._cls2coco.items()} 57 | self._coco2name = {cls_id: cls_name for cls_id, cls_name in zip(self._coco_cls_ids, self._coco_cls_names)} 58 | self._name2coco = {cls_name: cls_id for cls_name, cls_id in self._coco2name.items()} 59 | 60 | if split is not None: 61 | coco_dir = os.path.join(sys_config.data_dir, "coco") 62 | 63 | self._split = { 64 | "trainval": "trainval2014", 65 | "minival": "minival2014", 66 | "testdev": "testdev2017" 67 | }[split] 68 | self._data_dir = os.path.join(coco_dir, "images", self._split) 69 | self._anno_file = os.path.join(coco_dir, "annotations", "instances_{}.json".format(self._split)) 70 | 71 | self._detections, self._eval_ids = self._load_coco_annos() 72 | self._image_ids = list(self._detections.keys()) 73 | self._db_inds = np.arange(len(self._image_ids)) 74 | 75 | def _load_coco_annos(self): 76 | from pycocotools.coco import COCO 77 | 78 | coco = COCO(self._anno_file) 79 | self._coco = coco 80 | 81 | class_ids = coco.getCatIds() 82 | image_ids = coco.getImgIds() 83 | 84 | eval_ids = {} 85 | detections = {} 86 | for image_id in image_ids: 87 | image = coco.loadImgs(image_id)[0] 88 | dets = [] 89 | 90 | eval_ids[image["file_name"]] = image_id 91 | for class_id in class_ids: 92 | annotation_ids = coco.getAnnIds(imgIds=image["id"], catIds=class_id) 93 | annotations = coco.loadAnns(annotation_ids) 94 | category = self._coco2cls[class_id] 95 | for annotation in annotations: 96 | det = annotation["bbox"] + [category] 97 | det[2] += det[0] 98 | det[3] += det[1] 99 | dets.append(det) 100 | 101 | file_name = image["file_name"] 102 | if len(dets) == 0: 103 | detections[file_name] = np.zeros((0, 5), dtype=np.float32) 104 | else: 105 | detections[file_name] = np.array(dets, dtype=np.float32) 106 | return detections, eval_ids 107 | 108 | def image_path(self, ind): 109 | if self._data_dir is None: 110 | raise ValueError("Data directory is not set") 111 | 112 | db_ind = self._db_inds[ind] 113 | file_name = self._image_ids[db_ind] 114 | return os.path.join(self._data_dir, file_name) 115 | 116 | def detections(self, ind): 117 | db_ind = self._db_inds[ind] 118 | file_name = self._image_ids[db_ind] 119 | return self._detections[file_name].copy() 120 | 121 | def cls2name(self, cls): 122 | coco = self._cls2coco[cls] 123 | return self._coco2name[coco] 124 | 125 | def _to_float(self, x): 126 | return float("{:.2f}".format(x)) 127 | 128 | def convert_to_coco(self, all_bboxes): 129 | detections = [] 130 | for image_id in all_bboxes: 131 | coco_id = self._eval_ids[image_id] 132 | for cls_ind in all_bboxes[image_id]: 133 | category_id = self._cls2coco[cls_ind] 134 | for bbox in all_bboxes[image_id][cls_ind]: 135 | bbox[2] -= bbox[0] 136 | bbox[3] -= bbox[1] 137 | 138 | score = bbox[4] 139 | bbox = 
list(map(self._to_float, bbox[0:4])) 140 | 141 | detection = { 142 | "image_id": coco_id, 143 | "category_id": category_id, 144 | "bbox": bbox, 145 | "score": float("{:.2f}".format(score)) 146 | } 147 | 148 | detections.append(detection) 149 | return detections 150 | 151 | def evaluate(self, result_json, cls_ids, image_ids): 152 | from pycocotools.cocoeval import COCOeval 153 | 154 | if self._split == "testdev": 155 | return None 156 | 157 | coco = self._coco 158 | 159 | eval_ids = [self._eval_ids[image_id] for image_id in image_ids] 160 | cat_ids = [self._cls2coco[cls_id] for cls_id in cls_ids] 161 | 162 | coco_dets = coco.loadRes(result_json) 163 | coco_eval = COCOeval(coco, coco_dets, "bbox") 164 | coco_eval.params.imgIds = eval_ids 165 | coco_eval.params.catIds = cat_ids 166 | coco_eval.evaluate() 167 | coco_eval.accumulate() 168 | coco_eval.summarize() 169 | return coco_eval.stats[0], coco_eval.stats[12:] 170 | -------------------------------------------------------------------------------- /core/dbs/detection.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from .base import BASE 4 | 5 | class DETECTION(BASE): 6 | def __init__(self, db_config): 7 | super(DETECTION, self).__init__() 8 | 9 | # Configs for training 10 | self._configs["categories"] = 80 11 | self._configs["rand_scales"] = [1] 12 | self._configs["rand_scale_min"] = 0.8 13 | self._configs["rand_scale_max"] = 1.4 14 | self._configs["rand_scale_step"] = 0.2 15 | 16 | # Configs for both training and testing 17 | self._configs["input_size"] = [383, 383] 18 | self._configs["output_sizes"] = [[96, 96], [48, 48], [24, 24], [12, 12]] 19 | 20 | self._configs["score_threshold"] = 0.05 21 | self._configs["nms_threshold"] = 0.7 22 | self._configs["max_per_set"] = 40 23 | self._configs["max_per_image"] = 100 24 | self._configs["top_k"] = 20 25 | self._configs["ae_threshold"] = 1 26 | self._configs["nms_kernel"] = 3 27 | self._configs["num_dets"] = 1000 28 | 29 | self._configs["nms_algorithm"] = "exp_soft_nms" 30 | self._configs["weight_exp"] = 8 31 | self._configs["merge_bbox"] = False 32 | 33 | self._configs["data_aug"] = True 34 | self._configs["lighting"] = True 35 | 36 | self._configs["border"] = 64 37 | self._configs["gaussian_bump"] = False 38 | self._configs["gaussian_iou"] = 0.7 39 | self._configs["gaussian_radius"] = -1 40 | self._configs["rand_crop"] = False 41 | self._configs["rand_color"] = False 42 | self._configs["rand_center"] = True 43 | 44 | self._configs["init_sizes"] = [192, 255] 45 | self._configs["view_sizes"] = [] 46 | 47 | self._configs["min_scale"] = 16 48 | self._configs["max_scale"] = 32 49 | 50 | self._configs["att_sizes"] = [[16, 16], [32, 32], [64, 64]] 51 | self._configs["att_ranges"] = [[96, 256], [32, 96], [0, 32]] 52 | self._configs["att_ratios"] = [16, 8, 4] 53 | self._configs["att_scales"] = [1, 1.5, 2] 54 | self._configs["att_thresholds"] = [0.3, 0.3, 0.3, 0.3] 55 | self._configs["att_nms_ks"] = [3, 3, 3] 56 | self._configs["att_max_crops"] = 8 57 | self._configs["ref_dets"] = True 58 | 59 | # Configs for testing 60 | self._configs["test_scales"] = [1] 61 | self._configs["test_flipped"] = True 62 | 63 | self.update_config(db_config) 64 | 65 | if self._configs["rand_scales"] is None: 66 | self._configs["rand_scales"] = np.arange( 67 | self._configs["rand_scale_min"], 68 | self._configs["rand_scale_max"], 69 | self._configs["rand_scale_step"] 70 | ) 71 | -------------------------------------------------------------------------------- 
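Taken together, the configuration helpers and dataset classes above can be driven directly from Python. A minimal sketch (assuming `pycocotools` is installed and COCO is laid out under `./data` as described in the README):
```python
from core.base import load_cfg
from core.config import SystemConfig
from core.dbs import datasets

# split a JSON config into its "system" and "db" halves
cfg_sys, cfg_db = load_cfg("configs/CornerNet_Saccade.json")
sys_cfg = SystemConfig().update_config(cfg_sys)

# "trainval", "minival" and "testdev" are the split names defined in core/dbs/coco.py
db = datasets["COCO"](cfg_db, split="minival", sys_config=sys_cfg)
print(len(db.db_inds), "images; first image at", db.image_path(0))
```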
/core/detectors.py: -------------------------------------------------------------------------------- 1 | from .base import Base, load_cfg, load_nnet 2 | from .paths import get_file_path 3 | from .config import SystemConfig 4 | from .dbs.coco import COCO 5 | 6 | class CornerNet(Base): 7 | def __init__(self): 8 | from .test.cornernet import cornernet_inference 9 | from .models.CornerNet import model 10 | 11 | cfg_path = get_file_path("..", "configs", "CornerNet.json") 12 | model_path = get_file_path("..", "cache", "nnet", "CornerNet", "CornerNet_500000.pkl") 13 | 14 | cfg_sys, cfg_db = load_cfg(cfg_path) 15 | sys_cfg = SystemConfig().update_config(cfg_sys) 16 | coco = COCO(cfg_db) 17 | 18 | cornernet = load_nnet(sys_cfg, model()) 19 | super(CornerNet, self).__init__(coco, cornernet, cornernet_inference, model=model_path) 20 | 21 | class CornerNet_Squeeze(Base): 22 | def __init__(self): 23 | from .test.cornernet import cornernet_inference 24 | from .models.CornerNet_Squeeze import model 25 | 26 | cfg_path = get_file_path("..", "configs", "CornerNet_Squeeze.json") 27 | model_path = get_file_path("..", "cache", "nnet", "CornerNet_Squeeze", "CornerNet_Squeeze_500000.pkl") 28 | 29 | cfg_sys, cfg_db = load_cfg(cfg_path) 30 | sys_cfg = SystemConfig().update_config(cfg_sys) 31 | coco = COCO(cfg_db) 32 | 33 | cornernet = load_nnet(sys_cfg, model()) 34 | super(CornerNet_Squeeze, self).__init__(coco, cornernet, cornernet_inference, model=model_path) 35 | 36 | class CornerNet_Saccade(Base): 37 | def __init__(self): 38 | from .test.cornernet_saccade import cornernet_saccade_inference 39 | from .models.CornerNet_Saccade import model 40 | 41 | cfg_path = get_file_path("..", "configs", "CornerNet_Saccade.json") 42 | model_path = get_file_path("..", "cache", "nnet", "CornerNet_Saccade", "CornerNet_Saccade_500000.pkl") 43 | 44 | cfg_sys, cfg_db = load_cfg(cfg_path) 45 | sys_cfg = SystemConfig().update_config(cfg_sys) 46 | coco = COCO(cfg_db) 47 | 48 | cornernet = load_nnet(sys_cfg, model()) 49 | super(CornerNet_Saccade, self).__init__(coco, cornernet, cornernet_saccade_inference, model=model_path) 50 | -------------------------------------------------------------------------------- /core/external/.gitignore: -------------------------------------------------------------------------------- 1 | bbox.c 2 | bbox.cpython-35m-x86_64-linux-gnu.so 3 | bbox.cpython-36m-x86_64-linux-gnu.so 4 | 5 | nms.c 6 | nms.cpython-35m-x86_64-linux-gnu.so 7 | nms.cpython-36m-x86_64-linux-gnu.so 8 | -------------------------------------------------------------------------------- /core/external/Makefile: -------------------------------------------------------------------------------- 1 | all: 2 | python setup.py build_ext --inplace 3 | rm -rf build 4 | -------------------------------------------------------------------------------- /core/external/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/princeton-vl/CornerNet-Lite/6a54505d830a9d6afe26e99f0864b5d06d0bbbaf/core/external/__init__.py -------------------------------------------------------------------------------- /core/external/bbox.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Sergey Karayev 6 | # -------------------------------------------------------- 7 | 8 | cimport 
cython 9 | import numpy as np 10 | cimport numpy as np 11 | 12 | DTYPE = np.float 13 | ctypedef np.float_t DTYPE_t 14 | 15 | def bbox_overlaps( 16 | np.ndarray[DTYPE_t, ndim=2] boxes, 17 | np.ndarray[DTYPE_t, ndim=2] query_boxes): 18 | """ 19 | Parameters 20 | ---------- 21 | boxes: (N, 4) ndarray of float 22 | query_boxes: (K, 4) ndarray of float 23 | Returns 24 | ------- 25 | overlaps: (N, K) ndarray of overlap between boxes and query_boxes 26 | """ 27 | cdef unsigned int N = boxes.shape[0] 28 | cdef unsigned int K = query_boxes.shape[0] 29 | cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE) 30 | cdef DTYPE_t iw, ih, box_area 31 | cdef DTYPE_t ua 32 | cdef unsigned int k, n 33 | for k in range(K): 34 | box_area = ( 35 | (query_boxes[k, 2] - query_boxes[k, 0] + 1) * 36 | (query_boxes[k, 3] - query_boxes[k, 1] + 1) 37 | ) 38 | for n in range(N): 39 | iw = ( 40 | min(boxes[n, 2], query_boxes[k, 2]) - 41 | max(boxes[n, 0], query_boxes[k, 0]) + 1 42 | ) 43 | if iw > 0: 44 | ih = ( 45 | min(boxes[n, 3], query_boxes[k, 3]) - 46 | max(boxes[n, 1], query_boxes[k, 1]) + 1 47 | ) 48 | if ih > 0: 49 | ua = float( 50 | (boxes[n, 2] - boxes[n, 0] + 1) * 51 | (boxes[n, 3] - boxes[n, 1] + 1) + 52 | box_area - iw * ih 53 | ) 54 | overlaps[n, k] = iw * ih / ua 55 | return overlaps 56 | -------------------------------------------------------------------------------- /core/external/nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b): 12 | return a if a >= b else b 13 | 14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b): 15 | return a if a <= b else b 16 | 17 | def nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 18 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 19 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 20 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 21 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 22 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 23 | 24 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 25 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1] 26 | 27 | cdef int ndets = dets.shape[0] 28 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \ 29 | np.zeros((ndets), dtype=np.int) 30 | 31 | # nominal indices 32 | cdef int _i, _j 33 | # sorted indices 34 | cdef int i, j 35 | # temp variables for box i's (the box currently under consideration) 36 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 37 | # variables for computing overlap with box j (lower scoring box) 38 | cdef np.float32_t xx1, yy1, xx2, yy2 39 | cdef np.float32_t w, h 40 | cdef np.float32_t inter, ovr 41 | 42 | keep = [] 43 | for _i in range(ndets): 44 | i = order[_i] 45 | if suppressed[i] == 1: 46 | continue 47 | keep.append(i) 48 | ix1 = x1[i] 49 | iy1 = y1[i] 50 | ix2 = x2[i] 51 | iy2 = y2[i] 52 | iarea = areas[i] 53 | for _j in range(_i + 1, ndets): 54 | j = order[_j] 55 | if suppressed[j] == 1: 56 | continue 57 | xx1 = max(ix1, x1[j]) 58 | yy1 = max(iy1, y1[j]) 59 | xx2 = min(ix2, x2[j]) 60 | yy2 = min(iy2, y2[j]) 61 | w = max(0.0, xx2 - xx1 + 1) 62 | h = 
max(0.0, yy2 - yy1 + 1) 63 | inter = w * h 64 | ovr = inter / (iarea + areas[j] - inter) 65 | if ovr >= thresh: 66 | suppressed[j] = 1 67 | 68 | return keep 69 | 70 | def soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0): 71 | cdef unsigned int N = boxes.shape[0] 72 | cdef float iw, ih, box_area 73 | cdef float ua 74 | cdef int pos = 0 75 | cdef float maxscore = 0 76 | cdef int maxpos = 0 77 | cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov 78 | 79 | for i in range(N): 80 | maxscore = boxes[i, 4] 81 | maxpos = i 82 | 83 | tx1 = boxes[i,0] 84 | ty1 = boxes[i,1] 85 | tx2 = boxes[i,2] 86 | ty2 = boxes[i,3] 87 | ts = boxes[i,4] 88 | 89 | pos = i + 1 90 | # get max box 91 | while pos < N: 92 | if maxscore < boxes[pos, 4]: 93 | maxscore = boxes[pos, 4] 94 | maxpos = pos 95 | pos = pos + 1 96 | 97 | # add max box as a detection 98 | boxes[i,0] = boxes[maxpos,0] 99 | boxes[i,1] = boxes[maxpos,1] 100 | boxes[i,2] = boxes[maxpos,2] 101 | boxes[i,3] = boxes[maxpos,3] 102 | boxes[i,4] = boxes[maxpos,4] 103 | 104 | # swap ith box with position of max box 105 | boxes[maxpos,0] = tx1 106 | boxes[maxpos,1] = ty1 107 | boxes[maxpos,2] = tx2 108 | boxes[maxpos,3] = ty2 109 | boxes[maxpos,4] = ts 110 | 111 | tx1 = boxes[i,0] 112 | ty1 = boxes[i,1] 113 | tx2 = boxes[i,2] 114 | ty2 = boxes[i,3] 115 | ts = boxes[i,4] 116 | 117 | pos = i + 1 118 | # NMS iterations, note that N changes if detection boxes fall below threshold 119 | while pos < N: 120 | x1 = boxes[pos, 0] 121 | y1 = boxes[pos, 1] 122 | x2 = boxes[pos, 2] 123 | y2 = boxes[pos, 3] 124 | s = boxes[pos, 4] 125 | 126 | area = (x2 - x1 + 1) * (y2 - y1 + 1) 127 | iw = (min(tx2, x2) - max(tx1, x1) + 1) 128 | if iw > 0: 129 | ih = (min(ty2, y2) - max(ty1, y1) + 1) 130 | if ih > 0: 131 | ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih) 132 | ov = iw * ih / ua #iou between max box and detection box 133 | 134 | if method == 1: # linear 135 | if ov > Nt: 136 | weight = 1 - ov 137 | else: 138 | weight = 1 139 | elif method == 2: # gaussian 140 | weight = np.exp(-(ov * ov)/sigma) 141 | else: # original NMS 142 | if ov > Nt: 143 | weight = 0 144 | else: 145 | weight = 1 146 | 147 | boxes[pos, 4] = weight*boxes[pos, 4] 148 | 149 | # if box score falls below threshold, discard the box by swapping with last box 150 | # update N 151 | if boxes[pos, 4] < threshold: 152 | boxes[pos,0] = boxes[N-1, 0] 153 | boxes[pos,1] = boxes[N-1, 1] 154 | boxes[pos,2] = boxes[N-1, 2] 155 | boxes[pos,3] = boxes[N-1, 3] 156 | boxes[pos,4] = boxes[N-1, 4] 157 | N = N - 1 158 | pos = pos - 1 159 | 160 | pos = pos + 1 161 | 162 | keep = [i for i in range(N)] 163 | return keep 164 | 165 | def soft_nms_merge(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0, float weight_exp=6): 166 | cdef unsigned int N = boxes.shape[0] 167 | cdef float iw, ih, box_area 168 | cdef float ua 169 | cdef int pos = 0 170 | cdef float maxscore = 0 171 | cdef int maxpos = 0 172 | cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov 173 | cdef float mx1,mx2,my1,my2,mts,mbs,mw 174 | 175 | for i in range(N): 176 | maxscore = boxes[i, 4] 177 | maxpos = i 178 | 179 | tx1 = boxes[i,0] 180 | ty1 = boxes[i,1] 181 | tx2 = boxes[i,2] 182 | ty2 = boxes[i,3] 183 | ts = boxes[i,4] 184 | 185 | pos = i + 1 186 | # get max box 187 | while pos < N: 188 | if maxscore < boxes[pos, 4]: 189 | maxscore = boxes[pos, 4] 190 | maxpos = pos 191 | pos = pos + 1 192 | 193 | # add max box 
as a detection 194 | boxes[i,0] = boxes[maxpos,0] 195 | boxes[i,1] = boxes[maxpos,1] 196 | boxes[i,2] = boxes[maxpos,2] 197 | boxes[i,3] = boxes[maxpos,3] 198 | boxes[i,4] = boxes[maxpos,4] 199 | 200 | mx1 = boxes[i, 0] * boxes[i, 5] 201 | my1 = boxes[i, 1] * boxes[i, 5] 202 | mx2 = boxes[i, 2] * boxes[i, 6] 203 | my2 = boxes[i, 3] * boxes[i, 6] 204 | mts = boxes[i, 5] 205 | mbs = boxes[i, 6] 206 | 207 | # swap ith box with position of max box 208 | boxes[maxpos,0] = tx1 209 | boxes[maxpos,1] = ty1 210 | boxes[maxpos,2] = tx2 211 | boxes[maxpos,3] = ty2 212 | boxes[maxpos,4] = ts 213 | 214 | tx1 = boxes[i,0] 215 | ty1 = boxes[i,1] 216 | tx2 = boxes[i,2] 217 | ty2 = boxes[i,3] 218 | ts = boxes[i,4] 219 | 220 | pos = i + 1 221 | # NMS iterations, note that N changes if detection boxes fall below threshold 222 | while pos < N: 223 | x1 = boxes[pos, 0] 224 | y1 = boxes[pos, 1] 225 | x2 = boxes[pos, 2] 226 | y2 = boxes[pos, 3] 227 | s = boxes[pos, 4] 228 | 229 | area = (x2 - x1 + 1) * (y2 - y1 + 1) 230 | iw = (min(tx2, x2) - max(tx1, x1) + 1) 231 | if iw > 0: 232 | ih = (min(ty2, y2) - max(ty1, y1) + 1) 233 | if ih > 0: 234 | ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih) 235 | ov = iw * ih / ua #iou between max box and detection box 236 | 237 | if method == 1: # linear 238 | if ov > Nt: 239 | weight = 1 - ov 240 | else: 241 | weight = 1 242 | elif method == 2: # gaussian 243 | weight = np.exp(-(ov * ov)/sigma) 244 | else: # original NMS 245 | if ov > Nt: 246 | weight = 0 247 | else: 248 | weight = 1 249 | 250 | mw = (1 - weight) ** weight_exp 251 | mx1 = mx1 + boxes[pos, 0] * boxes[pos, 5] * mw 252 | my1 = my1 + boxes[pos, 1] * boxes[pos, 5] * mw 253 | mx2 = mx2 + boxes[pos, 2] * boxes[pos, 6] * mw 254 | my2 = my2 + boxes[pos, 3] * boxes[pos, 6] * mw 255 | mts = mts + boxes[pos, 5] * mw 256 | mbs = mbs + boxes[pos, 6] * mw 257 | 258 | boxes[pos, 4] = weight*boxes[pos, 4] 259 | 260 | # if box score falls below threshold, discard the box by swapping with last box 261 | # update N 262 | if boxes[pos, 4] < threshold: 263 | boxes[pos,0] = boxes[N-1, 0] 264 | boxes[pos,1] = boxes[N-1, 1] 265 | boxes[pos,2] = boxes[N-1, 2] 266 | boxes[pos,3] = boxes[N-1, 3] 267 | boxes[pos,4] = boxes[N-1, 4] 268 | N = N - 1 269 | pos = pos - 1 270 | 271 | pos = pos + 1 272 | 273 | boxes[i, 0] = mx1 / mts 274 | boxes[i, 1] = my1 / mts 275 | boxes[i, 2] = mx2 / mbs 276 | boxes[i, 3] = my2 / mbs 277 | 278 | keep = [i for i in range(N)] 279 | return keep 280 | -------------------------------------------------------------------------------- /core/external/setup.py: -------------------------------------------------------------------------------- 1 | import numpy 2 | from distutils.core import setup 3 | from distutils.extension import Extension 4 | from Cython.Build import cythonize 5 | 6 | extensions = [ 7 | Extension( 8 | "bbox", 9 | ["bbox.pyx"], 10 | extra_compile_args=["-Wno-cpp", "-Wno-unused-function"] 11 | ), 12 | Extension( 13 | "nms", 14 | ["nms.pyx"], 15 | extra_compile_args=["-Wno-cpp", "-Wno-unused-function"] 16 | ) 17 | ] 18 | 19 | setup( 20 | name="coco", 21 | ext_modules=cythonize(extensions), 22 | include_dirs=[numpy.get_include()] 23 | ) 24 | -------------------------------------------------------------------------------- /core/models/CornerNet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from .py_utils import TopPool, BottomPool, LeftPool, RightPool 5 | 6 | from .py_utils.utils import convolution, 
residual, corner_pool 7 | from .py_utils.losses import CornerNet_Loss 8 | from .py_utils.modules import hg_module, hg, hg_net 9 | 10 | def make_pool_layer(dim): 11 | return nn.Sequential() 12 | 13 | def make_hg_layer(inp_dim, out_dim, modules): 14 | layers = [residual(inp_dim, out_dim, stride=2)] 15 | layers += [residual(out_dim, out_dim) for _ in range(1, modules)] 16 | return nn.Sequential(*layers) 17 | 18 | class model(hg_net): 19 | def _pred_mod(self, dim): 20 | return nn.Sequential( 21 | convolution(3, 256, 256, with_bn=False), 22 | nn.Conv2d(256, dim, (1, 1)) 23 | ) 24 | 25 | def _merge_mod(self): 26 | return nn.Sequential( 27 | nn.Conv2d(256, 256, (1, 1), bias=False), 28 | nn.BatchNorm2d(256) 29 | ) 30 | 31 | def __init__(self): 32 | stacks = 2 33 | pre = nn.Sequential( 34 | convolution(7, 3, 128, stride=2), 35 | residual(128, 256, stride=2) 36 | ) 37 | hg_mods = nn.ModuleList([ 38 | hg_module( 39 | 5, [256, 256, 384, 384, 384, 512], [2, 2, 2, 2, 2, 4], 40 | make_pool_layer=make_pool_layer, 41 | make_hg_layer=make_hg_layer 42 | ) for _ in range(stacks) 43 | ]) 44 | cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)]) 45 | inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)]) 46 | cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)]) 47 | inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)]) 48 | 49 | hgs = hg(pre, hg_mods, cnvs, inters, cnvs_, inters_) 50 | 51 | tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)]) 52 | br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)]) 53 | 54 | tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)]) 55 | br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)]) 56 | for tl_heat, br_heat in zip(tl_heats, br_heats): 57 | torch.nn.init.constant_(tl_heat[-1].bias, -2.19) 58 | torch.nn.init.constant_(br_heat[-1].bias, -2.19) 59 | 60 | tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)]) 61 | br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)]) 62 | 63 | tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)]) 64 | br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)]) 65 | 66 | super(model, self).__init__( 67 | hgs, tl_modules, br_modules, tl_heats, br_heats, 68 | tl_tags, br_tags, tl_offs, br_offs 69 | ) 70 | 71 | self.loss = CornerNet_Loss(pull_weight=1e-1, push_weight=1e-1) 72 | -------------------------------------------------------------------------------- /core/models/CornerNet_Saccade.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from .py_utils import TopPool, BottomPool, LeftPool, RightPool 5 | 6 | from .py_utils.utils import convolution, residual, corner_pool 7 | from .py_utils.losses import CornerNet_Saccade_Loss 8 | from .py_utils.modules import saccade_net, saccade_module, saccade 9 | 10 | def make_pool_layer(dim): 11 | return nn.Sequential() 12 | 13 | def make_hg_layer(inp_dim, out_dim, modules): 14 | layers = [residual(inp_dim, out_dim, stride=2)] 15 | layers += [residual(out_dim, out_dim) for _ in range(1, modules)] 16 | return nn.Sequential(*layers) 17 | 18 | class model(saccade_net): 19 | def _pred_mod(self, dim): 20 | return nn.Sequential( 21 | convolution(3, 256, 256, with_bn=False), 22 | nn.Conv2d(256, dim, (1, 1)) 23 | ) 24 | 25 | def _merge_mod(self): 26 | return nn.Sequential( 27 | nn.Conv2d(256, 256, (1, 1), 
bias=False), 28 | nn.BatchNorm2d(256) 29 | ) 30 | 31 | def __init__(self): 32 | stacks = 3 33 | pre = nn.Sequential( 34 | convolution(7, 3, 128, stride=2), 35 | residual(128, 256, stride=2) 36 | ) 37 | hg_mods = nn.ModuleList([ 38 | saccade_module( 39 | 3, [256, 384, 384, 512], [1, 1, 1, 1], 40 | make_pool_layer=make_pool_layer, 41 | make_hg_layer=make_hg_layer 42 | ) for _ in range(stacks) 43 | ]) 44 | cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)]) 45 | inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)]) 46 | cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)]) 47 | inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)]) 48 | 49 | att_mods = nn.ModuleList([ 50 | nn.ModuleList([ 51 | nn.Sequential( 52 | convolution(3, 384, 256, with_bn=False), 53 | nn.Conv2d(256, 1, (1, 1)) 54 | ), 55 | nn.Sequential( 56 | convolution(3, 384, 256, with_bn=False), 57 | nn.Conv2d(256, 1, (1, 1)) 58 | ), 59 | nn.Sequential( 60 | convolution(3, 256, 256, with_bn=False), 61 | nn.Conv2d(256, 1, (1, 1)) 62 | ) 63 | ]) for _ in range(stacks) 64 | ]) 65 | for att_mod in att_mods: 66 | for att in att_mod: 67 | torch.nn.init.constant_(att[-1].bias, -2.19) 68 | 69 | hgs = saccade(pre, hg_mods, cnvs, inters, cnvs_, inters_) 70 | 71 | tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)]) 72 | br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)]) 73 | 74 | tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)]) 75 | br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)]) 76 | for tl_heat, br_heat in zip(tl_heats, br_heats): 77 | torch.nn.init.constant_(tl_heat[-1].bias, -2.19) 78 | torch.nn.init.constant_(br_heat[-1].bias, -2.19) 79 | 80 | tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)]) 81 | br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)]) 82 | 83 | tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)]) 84 | br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)]) 85 | 86 | super(model, self).__init__( 87 | hgs, tl_modules, br_modules, tl_heats, br_heats, 88 | tl_tags, br_tags, tl_offs, br_offs, att_mods 89 | ) 90 | 91 | self.loss = CornerNet_Saccade_Loss(pull_weight=1e-1, push_weight=1e-1) 92 | -------------------------------------------------------------------------------- /core/models/CornerNet_Squeeze.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from .py_utils import TopPool, BottomPool, LeftPool, RightPool 5 | 6 | from .py_utils.utils import convolution, corner_pool, residual 7 | from .py_utils.losses import CornerNet_Loss 8 | from .py_utils.modules import hg_module, hg, hg_net 9 | 10 | class fire_module(nn.Module): 11 | def __init__(self, inp_dim, out_dim, sr=2, stride=1): 12 | super(fire_module, self).__init__() 13 | self.conv1 = nn.Conv2d(inp_dim, out_dim // sr, kernel_size=1, stride=1, bias=False) 14 | self.bn1 = nn.BatchNorm2d(out_dim // sr) 15 | self.conv_1x1 = nn.Conv2d(out_dim // sr, out_dim // 2, kernel_size=1, stride=stride, bias=False) 16 | self.conv_3x3 = nn.Conv2d(out_dim // sr, out_dim // 2, kernel_size=3, padding=1, 17 | stride=stride, groups=out_dim // sr, bias=False) 18 | self.bn2 = nn.BatchNorm2d(out_dim) 19 | self.skip = (stride == 1 and inp_dim == out_dim) 20 | self.relu = nn.ReLU(inplace=True) 21 | 22 | def forward(self, x): 23 | conv1 = self.conv1(x) 24 | bn1 = 
self.bn1(conv1) 25 | conv2 = torch.cat((self.conv_1x1(bn1), self.conv_3x3(bn1)), 1) 26 | bn2 = self.bn2(conv2) 27 | if self.skip: 28 | return self.relu(bn2 + x) 29 | else: 30 | return self.relu(bn2) 31 | 32 | def make_pool_layer(dim): 33 | return nn.Sequential() 34 | 35 | def make_unpool_layer(dim): 36 | return nn.ConvTranspose2d(dim, dim, kernel_size=4, stride=2, padding=1) 37 | 38 | def make_layer(inp_dim, out_dim, modules): 39 | layers = [fire_module(inp_dim, out_dim)] 40 | layers += [fire_module(out_dim, out_dim) for _ in range(1, modules)] 41 | return nn.Sequential(*layers) 42 | 43 | def make_layer_revr(inp_dim, out_dim, modules): 44 | layers = [fire_module(inp_dim, inp_dim) for _ in range(modules - 1)] 45 | layers += [fire_module(inp_dim, out_dim)] 46 | return nn.Sequential(*layers) 47 | 48 | def make_hg_layer(inp_dim, out_dim, modules): 49 | layers = [fire_module(inp_dim, out_dim, stride=2)] 50 | layers += [fire_module(out_dim, out_dim) for _ in range(1, modules)] 51 | return nn.Sequential(*layers) 52 | 53 | class model(hg_net): 54 | def _pred_mod(self, dim): 55 | return nn.Sequential( 56 | convolution(1, 256, 256, with_bn=False), 57 | nn.Conv2d(256, dim, (1, 1)) 58 | ) 59 | 60 | def _merge_mod(self): 61 | return nn.Sequential( 62 | nn.Conv2d(256, 256, (1, 1), bias=False), 63 | nn.BatchNorm2d(256) 64 | ) 65 | 66 | def __init__(self): 67 | stacks = 2 68 | pre = nn.Sequential( 69 | convolution(7, 3, 128, stride=2), 70 | residual(128, 256, stride=2), 71 | residual(256, 256, stride=2) 72 | ) 73 | hg_mods = nn.ModuleList([ 74 | hg_module( 75 | 4, [256, 256, 384, 384, 512], [2, 2, 2, 2, 4], 76 | make_pool_layer=make_pool_layer, 77 | make_unpool_layer=make_unpool_layer, 78 | make_up_layer=make_layer, 79 | make_low_layer=make_layer, 80 | make_hg_layer_revr=make_layer_revr, 81 | make_hg_layer=make_hg_layer 82 | ) for _ in range(stacks) 83 | ]) 84 | cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)]) 85 | inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)]) 86 | cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)]) 87 | inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)]) 88 | 89 | hgs = hg(pre, hg_mods, cnvs, inters, cnvs_, inters_) 90 | 91 | tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)]) 92 | br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)]) 93 | 94 | tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)]) 95 | br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)]) 96 | for tl_heat, br_heat in zip(tl_heats, br_heats): 97 | torch.nn.init.constant_(tl_heat[-1].bias, -2.19) 98 | torch.nn.init.constant_(br_heat[-1].bias, -2.19) 99 | 100 | tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)]) 101 | br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)]) 102 | 103 | tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)]) 104 | br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)]) 105 | 106 | super(model, self).__init__( 107 | hgs, tl_modules, br_modules, tl_heats, br_heats, 108 | tl_tags, br_tags, tl_offs, br_offs 109 | ) 110 | 111 | self.loss = CornerNet_Loss(pull_weight=1e-1, push_weight=1e-1) 112 | -------------------------------------------------------------------------------- /core/models/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/princeton-vl/CornerNet-Lite/6a54505d830a9d6afe26e99f0864b5d06d0bbbaf/core/models/__init__.py -------------------------------------------------------------------------------- /core/models/py_utils/__init__.py: -------------------------------------------------------------------------------- 1 | from ._cpools import TopPool, BottomPool, LeftPool, RightPool 2 | -------------------------------------------------------------------------------- /core/models/py_utils/_cpools/.gitignore: -------------------------------------------------------------------------------- 1 | build/ 2 | cpools.egg-info/ 3 | dist/ 4 | -------------------------------------------------------------------------------- /core/models/py_utils/_cpools/__init__.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | from torch import nn 4 | from torch.autograd import Function 5 | 6 | import top_pool, bottom_pool, left_pool, right_pool 7 | 8 | class TopPoolFunction(Function): 9 | @staticmethod 10 | def forward(ctx, input): 11 | output = top_pool.forward(input)[0] 12 | ctx.save_for_backward(input) 13 | return output 14 | 15 | @staticmethod 16 | def backward(ctx, grad_output): 17 | input = ctx.saved_variables[0] 18 | output = top_pool.backward(input, grad_output)[0] 19 | return output 20 | 21 | class BottomPoolFunction(Function): 22 | @staticmethod 23 | def forward(ctx, input): 24 | output = bottom_pool.forward(input)[0] 25 | ctx.save_for_backward(input) 26 | return output 27 | 28 | @staticmethod 29 | def backward(ctx, grad_output): 30 | input = ctx.saved_variables[0] 31 | output = bottom_pool.backward(input, grad_output)[0] 32 | return output 33 | 34 | class LeftPoolFunction(Function): 35 | @staticmethod 36 | def forward(ctx, input): 37 | output = left_pool.forward(input)[0] 38 | ctx.save_for_backward(input) 39 | return output 40 | 41 | @staticmethod 42 | def backward(ctx, grad_output): 43 | input = ctx.saved_variables[0] 44 | output = left_pool.backward(input, grad_output)[0] 45 | return output 46 | 47 | class RightPoolFunction(Function): 48 | @staticmethod 49 | def forward(ctx, input): 50 | output = right_pool.forward(input)[0] 51 | ctx.save_for_backward(input) 52 | return output 53 | 54 | @staticmethod 55 | def backward(ctx, grad_output): 56 | input = ctx.saved_variables[0] 57 | output = right_pool.backward(input, grad_output)[0] 58 | return output 59 | 60 | class TopPool(nn.Module): 61 | def forward(self, x): 62 | return TopPoolFunction.apply(x) 63 | 64 | class BottomPool(nn.Module): 65 | def forward(self, x): 66 | return BottomPoolFunction.apply(x) 67 | 68 | class LeftPool(nn.Module): 69 | def forward(self, x): 70 | return LeftPoolFunction.apply(x) 71 | 72 | class RightPool(nn.Module): 73 | def forward(self, x): 74 | return RightPoolFunction.apply(x) 75 | -------------------------------------------------------------------------------- /core/models/py_utils/_cpools/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | from torch.utils.cpp_extension import BuildExtension, CppExtension 3 | 4 | setup( 5 | name="cpools", 6 | ext_modules=[ 7 | CppExtension("top_pool", ["src/top_pool.cpp"]), 8 | CppExtension("bottom_pool", ["src/bottom_pool.cpp"]), 9 | CppExtension("left_pool", ["src/left_pool.cpp"]), 10 | CppExtension("right_pool", ["src/right_pool.cpp"]) 11 | ], 12 | cmdclass={ 13 | "build_ext": BuildExtension 14 | } 15 | ) 16 | 
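# --- Editor's note (a sketch, not part of the repo file): a minimal post-build sanity check. ---
# After `python setup.py install --user`, each extension should import by its module
# name, and its forward pass should equal a running max along the pooled direction.
# `torch.cummax` needs PyTorch >= 1.5, newer than this repo's PyTorch 1.0 pin, so this
# is only an offline check:
#
#   import torch, bottom_pool
#   x = torch.rand(2, 3, 8, 8)
#   y = bottom_pool.forward(x)[0]                    # row i holds max over rows 0..i
#   assert torch.equal(y, torch.cummax(x, dim=2).values)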
-------------------------------------------------------------------------------- /core/models/py_utils/_cpools/src/bottom_pool.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/extension.h> 2 | 3 | #include <vector> 4 | 5 | std::vector<at::Tensor> pool_forward( 6 | at::Tensor input 7 | ) { 8 | // Initialize output 9 | at::Tensor output = at::zeros_like(input); 10 | 11 | // Get height 12 | int64_t height = input.size(2); 13 | 14 | output.copy_(input); 15 | 16 | for (int64_t ind = 1; ind < height; ind <<= 1) { // doubling stride: running max over rows 0..i in O(log H) passes 17 | at::Tensor max_temp = at::slice(output, 2, ind, height); 18 | at::Tensor cur_temp = at::slice(output, 2, ind, height); 19 | at::Tensor next_temp = at::slice(output, 2, 0, height-ind); 20 | at::max_out(max_temp, cur_temp, next_temp); 21 | } 22 | 23 | return { 24 | output 25 | }; 26 | } 27 | 28 | std::vector<at::Tensor> pool_backward( 29 | at::Tensor input, 30 | at::Tensor grad_output 31 | ) { 32 | auto output = at::zeros_like(input); 33 | 34 | int32_t batch = input.size(0); 35 | int32_t channel = input.size(1); 36 | int32_t height = input.size(2); 37 | int32_t width = input.size(3); 38 | 39 | auto max_val = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat)); // note: scratch buffers are hard-coded to CUDA, so the backward pass requires a GPU 40 | auto max_ind = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kLong)); 41 | 42 | auto input_temp = input.select(2, 0); 43 | max_val.copy_(input_temp); 44 | 45 | max_ind.fill_(0); 46 | 47 | auto output_temp = output.select(2, 0); 48 | auto grad_output_temp = grad_output.select(2, 0); 49 | output_temp.copy_(grad_output_temp); 50 | 51 | auto un_max_ind = max_ind.unsqueeze(2); // view of max_ind: the in-place updates below stay visible through it 52 | auto gt_mask = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kByte)); 53 | auto max_temp = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat)); 54 | for (int32_t ind = 0; ind < height - 1; ++ind) { // route each row's gradient to the row where the forward max originated 55 | input_temp = input.select(2, ind + 1); 56 | at::gt_out(gt_mask, input_temp, max_val); 57 | 58 | at::masked_select_out(max_temp, input_temp, gt_mask); 59 | max_val.masked_scatter_(gt_mask, max_temp); 60 | max_ind.masked_fill_(gt_mask, ind + 1); 61 | 62 | grad_output_temp = grad_output.select(2, ind + 1).unsqueeze(2); 63 | output.scatter_add_(2, un_max_ind, grad_output_temp); 64 | } 65 | 66 | return { 67 | output 68 | }; 69 | } 70 | 71 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 72 | m.def( 73 | "forward", &pool_forward, "Bottom Pool Forward", 74 | py::call_guard<py::gil_scoped_release>() 75 | ); 76 | m.def( 77 | "backward", &pool_backward, "Bottom Pool Backward", 78 | py::call_guard<py::gil_scoped_release>() 79 | ); 80 | } 81 | -------------------------------------------------------------------------------- /core/models/py_utils/_cpools/src/left_pool.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/extension.h> 2 | 3 | #include <vector> 4 | 5 | std::vector<at::Tensor> pool_forward( 6 | at::Tensor input 7 | ) { 8 | // Initialize output 9 | at::Tensor output = at::zeros_like(input); 10 | 11 | // Get width 12 | int64_t width = input.size(3); 13 | 14 | output.copy_(input); 15 | 16 | for (int64_t ind = 1; ind < width; ind <<= 1) { 17 | at::Tensor max_temp = at::slice(output, 3, 0, width-ind); 18 | at::Tensor cur_temp = at::slice(output, 3, 0, width-ind); 19 | at::Tensor next_temp = at::slice(output, 3, ind, width); 20 | at::max_out(max_temp, cur_temp, next_temp); 21 | } 22 | 23 | return { 24 | output 25 | }; 26 | } 27 | 28 | std::vector<at::Tensor> pool_backward( 29 | at::Tensor input, 30 | at::Tensor grad_output 31 | ) { 32 | auto output = at::zeros_like(input); 33 | 34 | int32_t batch = 
input.size(0); 35 | int32_t channel = input.size(1); 36 | int32_t height = input.size(2); 37 | int32_t width = input.size(3); 38 | 39 | auto max_val = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat)); 40 | auto max_ind = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kLong)); 41 | 42 | auto input_temp = input.select(3, width - 1); 43 | max_val.copy_(input_temp); 44 | 45 | max_ind.fill_(width - 1); 46 | 47 | auto output_temp = output.select(3, width - 1); 48 | auto grad_output_temp = grad_output.select(3, width - 1); 49 | output_temp.copy_(grad_output_temp); 50 | 51 | auto un_max_ind = max_ind.unsqueeze(3); 52 | auto gt_mask = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kByte)); 53 | auto max_temp = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat)); 54 | for (int32_t ind = 1; ind < width; ++ind) { 55 | input_temp = input.select(3, width - ind - 1); 56 | at::gt_out(gt_mask, input_temp, max_val); 57 | 58 | at::masked_select_out(max_temp, input_temp, gt_mask); 59 | max_val.masked_scatter_(gt_mask, max_temp); 60 | max_ind.masked_fill_(gt_mask, width - ind - 1); 61 | 62 | grad_output_temp = grad_output.select(3, width - ind - 1).unsqueeze(3); 63 | output.scatter_add_(3, un_max_ind, grad_output_temp); 64 | } 65 | 66 | return { 67 | output 68 | }; 69 | } 70 | 71 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 72 | m.def( 73 | "forward", &pool_forward, "Left Pool Forward", 74 | py::call_guard<py::gil_scoped_release>() 75 | ); 76 | m.def( 77 | "backward", &pool_backward, "Left Pool Backward", 78 | py::call_guard<py::gil_scoped_release>() 79 | ); 80 | } 81 | -------------------------------------------------------------------------------- /core/models/py_utils/_cpools/src/right_pool.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/extension.h> 2 | 3 | #include <vector> 4 | 5 | std::vector<at::Tensor> pool_forward( 6 | at::Tensor input 7 | ) { 8 | // Initialize output 9 | at::Tensor output = at::zeros_like(input); 10 | 11 | // Get width 12 | int64_t width = input.size(3); 13 | 14 | output.copy_(input); 15 | 16 | for (int64_t ind = 1; ind < width; ind <<= 1) { 17 | at::Tensor max_temp = at::slice(output, 3, ind, width); 18 | at::Tensor cur_temp = at::slice(output, 3, ind, width); 19 | at::Tensor next_temp = at::slice(output, 3, 0, width-ind); 20 | at::max_out(max_temp, cur_temp, next_temp); 21 | } 22 | 23 | return { 24 | output 25 | }; 26 | } 27 | 28 | std::vector<at::Tensor> pool_backward( 29 | at::Tensor input, 30 | at::Tensor grad_output 31 | ) { 32 | at::Tensor output = at::zeros_like(input); 33 | 34 | int32_t batch = input.size(0); 35 | int32_t channel = input.size(1); 36 | int32_t height = input.size(2); 37 | int32_t width = input.size(3); 38 | 39 | auto max_val = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat)); 40 | auto max_ind = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kLong)); 41 | 42 | auto input_temp = input.select(3, 0); 43 | max_val.copy_(input_temp); 44 | 45 | max_ind.fill_(0); 46 | 47 | auto output_temp = output.select(3, 0); 48 | auto grad_output_temp = grad_output.select(3, 0); 49 | output_temp.copy_(grad_output_temp); 50 | 51 | auto un_max_ind = max_ind.unsqueeze(3); 52 | auto gt_mask = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kByte)); 53 | auto max_temp = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat)); 54 | for (int32_t ind = 0; ind < width - 1; ++ind) { 55 | input_temp = input.select(3, ind + 1); 
56 | at::gt_out(gt_mask, input_temp, max_val); 57 | 58 | at::masked_select_out(max_temp, input_temp, gt_mask); 59 | max_val.masked_scatter_(gt_mask, max_temp); 60 | max_ind.masked_fill_(gt_mask, ind + 1); 61 | 62 | grad_output_temp = grad_output.select(3, ind + 1).unsqueeze(3); 63 | output.scatter_add_(3, un_max_ind, grad_output_temp); 64 | } 65 | 66 | return { 67 | output 68 | }; 69 | } 70 | 71 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 72 | m.def( 73 | "forward", &pool_forward, "Right Pool Forward", 74 | py::call_guard<py::gil_scoped_release>() 75 | ); 76 | m.def( 77 | "backward", &pool_backward, "Right Pool Backward", 78 | py::call_guard<py::gil_scoped_release>() 79 | ); 80 | } 81 | -------------------------------------------------------------------------------- /core/models/py_utils/_cpools/src/top_pool.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/extension.h> 2 | 3 | #include <vector> 4 | 5 | std::vector<at::Tensor> top_pool_forward( 6 | at::Tensor input 7 | ) { 8 | // Initialize output 9 | at::Tensor output = at::zeros_like(input); 10 | 11 | // Get height 12 | int64_t height = input.size(2); 13 | 14 | output.copy_(input); 15 | 16 | for (int64_t ind = 1; ind < height; ind <<= 1) { 17 | at::Tensor max_temp = at::slice(output, 2, 0, height-ind); 18 | at::Tensor cur_temp = at::slice(output, 2, 0, height-ind); 19 | at::Tensor next_temp = at::slice(output, 2, ind, height); 20 | at::max_out(max_temp, cur_temp, next_temp); 21 | } 22 | 23 | return { 24 | output 25 | }; 26 | } 27 | 28 | std::vector<at::Tensor> top_pool_backward( 29 | at::Tensor input, 30 | at::Tensor grad_output 31 | ) { 32 | auto output = at::zeros_like(input); 33 | 34 | int32_t batch = input.size(0); 35 | int32_t channel = input.size(1); 36 | int32_t height = input.size(2); 37 | int32_t width = input.size(3); 38 | 39 | auto max_val = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat)); 40 | auto max_ind = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kLong)); 41 | 42 | auto input_temp = input.select(2, height - 1); 43 | max_val.copy_(input_temp); 44 | 45 | max_ind.fill_(height - 1); 46 | 47 | auto output_temp = output.select(2, height - 1); 48 | auto grad_output_temp = grad_output.select(2, height - 1); 49 | output_temp.copy_(grad_output_temp); 50 | 51 | auto un_max_ind = max_ind.unsqueeze(2); 52 | auto gt_mask = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kByte)); 53 | auto max_temp = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat)); 54 | for (int32_t ind = 1; ind < height; ++ind) { 55 | input_temp = input.select(2, height - ind - 1); 56 | at::gt_out(gt_mask, input_temp, max_val); 57 | 58 | at::masked_select_out(max_temp, input_temp, gt_mask); 59 | max_val.masked_scatter_(gt_mask, max_temp); 60 | max_ind.masked_fill_(gt_mask, height - ind - 1); 61 | 62 | grad_output_temp = grad_output.select(2, height - ind - 1).unsqueeze(2); 63 | output.scatter_add_(2, un_max_ind, grad_output_temp); 64 | } 65 | 66 | return { 67 | output 68 | }; 69 | } 70 | 71 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 72 | m.def( 73 | "forward", &top_pool_forward, "Top Pool Forward", 74 | py::call_guard<py::gil_scoped_release>() 75 | ); 76 | m.def( 77 | "backward", &top_pool_backward, "Top Pool Backward", 78 | py::call_guard<py::gil_scoped_release>() 79 | ); 80 | } 81 | -------------------------------------------------------------------------------- /core/models/py_utils/data_parallel.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.nn.modules import Module 3 | from 
torch.nn.parallel.scatter_gather import gather 4 | from torch.nn.parallel.replicate import replicate 5 | from torch.nn.parallel.parallel_apply import parallel_apply 6 | 7 | from .scatter_gather import scatter_kwargs 8 | 9 | class DataParallel(Module): 10 | r"""Implements data parallelism at the module level. 11 | 12 | This container parallelizes the application of the given module by 13 | splitting the input across the specified devices by chunking in the batch 14 | dimension. In the forward pass, the module is replicated on each device, 15 | and each replica handles a portion of the input. During the backwards 16 | pass, gradients from each replica are summed into the original module. 17 | 18 | The batch size should be larger than the number of GPUs used. It should 19 | also be an integer multiple of the number of GPUs so that each chunk is the 20 | same size (so that each GPU processes the same number of samples). 21 | 22 | See also: :ref:`cuda-nn-dataparallel-instead` 23 | 24 | Arbitrary positional and keyword inputs are allowed to be passed into 25 | DataParallel EXCEPT Tensors. All variables will be scattered on dim 26 | specified (default 0). Primitive types will be broadcasted, but all 27 | other types will be a shallow copy and can be corrupted if written to in 28 | the model's forward pass. 29 | 30 | Args: 31 | module: module to be parallelized 32 | device_ids: CUDA devices (default: all devices) 33 | output_device: device location of output (default: device_ids[0]) 34 | 35 | Example:: 36 | 37 | >>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2]) 38 | >>> output = net(input_var) 39 | """ 40 | 41 | # TODO: update notes/cuda.rst when this class handles 8+ GPUs well 42 | 43 | def __init__(self, module, device_ids=None, output_device=None, dim=0, chunk_sizes=None): 44 | super(DataParallel, self).__init__() 45 | 46 | if not torch.cuda.is_available(): 47 | self.module = module 48 | self.device_ids = [] 49 | return 50 | 51 | if device_ids is None: 52 | device_ids = list(range(torch.cuda.device_count())) 53 | if output_device is None: 54 | output_device = device_ids[0] 55 | self.dim = dim 56 | self.module = module 57 | self.device_ids = device_ids 58 | self.chunk_sizes = chunk_sizes 59 | self.output_device = output_device 60 | if len(self.device_ids) == 1: 61 | self.module.cuda(device_ids[0]) 62 | 63 | def forward(self, *inputs, **kwargs): 64 | if not self.device_ids: 65 | return self.module(*inputs, **kwargs) 66 | inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes) 67 | if len(self.device_ids) == 1: 68 | return self.module(*inputs[0], **kwargs[0]) 69 | replicas = self.replicate(self.module, self.device_ids[:len(inputs)]) 70 | outputs = self.parallel_apply(replicas, inputs, kwargs) 71 | return self.gather(outputs, self.output_device) 72 | 73 | def replicate(self, module, device_ids): 74 | return replicate(module, device_ids) 75 | 76 | def scatter(self, inputs, kwargs, device_ids, chunk_sizes): 77 | return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes) 78 | 79 | def parallel_apply(self, replicas, inputs, kwargs): 80 | return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)]) 81 | 82 | def gather(self, outputs, output_device): 83 | return gather(outputs, output_device, dim=self.dim) 84 | 85 | 86 | def data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None): 87 | r"""Evaluates module(input) in parallel across the GPUs given in device_ids. 
88 | 89 | This is the functional version of the DataParallel module. 90 | 91 | Args: 92 | module: the module to evaluate in parallel 93 | inputs: inputs to the module 94 | device_ids: GPU ids on which to replicate module 95 | output_device: GPU location of the output. Use -1 to indicate the CPU. 96 | (default: device_ids[0]) 97 | Returns: 98 | a Variable containing the result of module(input) located on 99 | output_device 100 | """ 101 | if not isinstance(inputs, tuple): 102 | inputs = (inputs,) 103 | 104 | if device_ids is None: 105 | device_ids = list(range(torch.cuda.device_count())) 106 | 107 | if output_device is None: 108 | output_device = device_ids[0] 109 | 110 | inputs, module_kwargs = scatter_kwargs(inputs, module_kwargs, device_ids, dim) 111 | if len(device_ids) == 1: 112 | return module(*inputs[0], **module_kwargs[0]) 113 | used_device_ids = device_ids[:len(inputs)] 114 | replicas = replicate(module, used_device_ids) 115 | outputs = parallel_apply(replicas, inputs, module_kwargs, used_device_ids) 116 | return gather(outputs, output_device, dim) 117 | -------------------------------------------------------------------------------- /core/models/py_utils/losses.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from .utils import _tranpose_and_gather_feat 5 | 6 | def _sigmoid(x): 7 | return torch.clamp(x.sigmoid_(), min=1e-4, max=1-1e-4)  # clamp keeps the log terms in the focal loss finite 8 | 9 | def _ae_loss(tag0, tag1, mask): 10 | num = mask.sum(dim=1, keepdim=True).float() 11 | tag0 = tag0.squeeze() 12 | tag1 = tag1.squeeze() 13 | 14 | tag_mean = (tag0 + tag1) / 2 15 | 16 | tag0 = torch.pow(tag0 - tag_mean, 2) / (num + 1e-4) 17 | tag0 = tag0[mask].sum() 18 | tag1 = torch.pow(tag1 - tag_mean, 2) / (num + 1e-4) 19 | tag1 = tag1[mask].sum() 20 | pull = tag0 + tag1  # pull loss: both corners of an object are drawn toward their mean embedding 21 | 22 | mask = mask.unsqueeze(1) + mask.unsqueeze(2) 23 | mask = mask.eq(2) 24 | num = num.unsqueeze(2) 25 | num2 = (num - 1) * num 26 | dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2) 27 | dist = 1 - torch.abs(dist) 28 | dist = nn.functional.relu(dist, inplace=True) 29 | dist = dist - 1 / (num + 1e-4) 30 | dist = dist / (num2 + 1e-4) 31 | dist = dist[mask] 32 | push = dist.sum()  # push loss: mean embeddings of different objects are pushed at least 1 apart 33 | return pull, push 34 | 35 | def _off_loss(off, gt_off, mask): 36 | num = mask.float().sum() 37 | mask = mask.unsqueeze(2).expand_as(gt_off) 38 | 39 | off = off[mask] 40 | gt_off = gt_off[mask] 41 | 42 | off_loss = nn.functional.smooth_l1_loss(off, gt_off, reduction="sum") 43 | off_loss = off_loss / (num + 1e-4) 44 | return off_loss 45 | 46 | def _focal_loss_mask(preds, gt, mask): 47 | pos_inds = gt.eq(1) 48 | neg_inds = gt.lt(1) 49 | 50 | neg_weights = torch.pow(1 - gt[neg_inds], 4)  # down-weight negatives near ground-truth peaks 51 | 52 | pos_mask = mask[pos_inds] 53 | neg_mask = mask[neg_inds] 54 | 55 | loss = 0 56 | for pred in preds: 57 | pos_pred = pred[pos_inds] 58 | neg_pred = pred[neg_inds] 59 | 60 | pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, 2) * pos_mask 61 | neg_loss = torch.log(1 - neg_pred) * torch.pow(neg_pred, 2) * neg_weights * neg_mask 62 | 63 | num_pos = pos_inds.float().sum() 64 | pos_loss = pos_loss.sum() 65 | neg_loss = neg_loss.sum() 66 | 67 | if pos_pred.nelement() == 0: 68 | loss = loss - neg_loss 69 | else: 70 | loss = loss - (pos_loss + neg_loss) / num_pos 71 | return loss 72 | 73 | def _focal_loss(preds, gt): 74 | pos_inds = gt.eq(1) 75 | neg_inds = gt.lt(1) 76 | 77 | neg_weights = torch.pow(1 - gt[neg_inds], 4) 78 | 79 | loss = 0 80 | for pred in preds: 81 | pos_pred = pred[pos_inds] 82 | neg_pred = 
pred[neg_inds] 83 | 84 | pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, 2) 85 | neg_loss = torch.log(1 - neg_pred) * torch.pow(neg_pred, 2) * neg_weights 86 | 87 | num_pos = pos_inds.float().sum() 88 | pos_loss = pos_loss.sum() 89 | neg_loss = neg_loss.sum() 90 | 91 | if pos_pred.nelement() == 0: 92 | loss = loss - neg_loss 93 | else: 94 | loss = loss - (pos_loss + neg_loss) / num_pos 95 | return loss 96 | 97 | class CornerNet_Saccade_Loss(nn.Module): 98 | def __init__(self, pull_weight=1, push_weight=1, off_weight=1, focal_loss=_focal_loss_mask): 99 | super(CornerNet_Saccade_Loss, self).__init__() 100 | 101 | self.pull_weight = pull_weight 102 | self.push_weight = push_weight 103 | self.off_weight = off_weight 104 | self.focal_loss = focal_loss 105 | self.ae_loss = _ae_loss 106 | self.off_loss = _off_loss 107 | 108 | def forward(self, outs, targets): 109 | tl_heats = outs[0] 110 | br_heats = outs[1] 111 | tl_tags = outs[2] 112 | br_tags = outs[3] 113 | tl_offs = outs[4] 114 | br_offs = outs[5] 115 | atts = outs[6] 116 | 117 | gt_tl_heat = targets[0] 118 | gt_br_heat = targets[1] 119 | gt_mask = targets[2] 120 | gt_tl_off = targets[3] 121 | gt_br_off = targets[4] 122 | gt_tl_ind = targets[5] 123 | gt_br_ind = targets[6] 124 | gt_tl_valid = targets[7] 125 | gt_br_valid = targets[8] 126 | gt_atts = targets[9] 127 | 128 | # focal loss 129 | focal_loss = 0 130 | 131 | tl_heats = [_sigmoid(t) for t in tl_heats] 132 | br_heats = [_sigmoid(b) for b in br_heats] 133 | 134 | focal_loss += self.focal_loss(tl_heats, gt_tl_heat, gt_tl_valid) 135 | focal_loss += self.focal_loss(br_heats, gt_br_heat, gt_br_valid) 136 | 137 | atts = [[_sigmoid(a) for a in att] for att in atts] 138 | atts = [[att[ind] for att in atts] for ind in range(len(gt_atts))] 139 | 140 | att_loss = 0 141 | for att, gt_att in zip(atts, gt_atts): 142 | att_loss += _focal_loss(att, gt_att) / max(len(att), 1) 143 | 144 | # tag loss 145 | pull_loss = 0 146 | push_loss = 0 147 | tl_tags = [_tranpose_and_gather_feat(tl_tag, gt_tl_ind) for tl_tag in tl_tags] 148 | br_tags = [_tranpose_and_gather_feat(br_tag, gt_br_ind) for br_tag in br_tags] 149 | for tl_tag, br_tag in zip(tl_tags, br_tags): 150 | pull, push = self.ae_loss(tl_tag, br_tag, gt_mask) 151 | pull_loss += pull 152 | push_loss += push 153 | pull_loss = self.pull_weight * pull_loss 154 | push_loss = self.push_weight * push_loss 155 | 156 | off_loss = 0 157 | tl_offs = [_tranpose_and_gather_feat(tl_off, gt_tl_ind) for tl_off in tl_offs] 158 | br_offs = [_tranpose_and_gather_feat(br_off, gt_br_ind) for br_off in br_offs] 159 | for tl_off, br_off in zip(tl_offs, br_offs): 160 | off_loss += self.off_loss(tl_off, gt_tl_off, gt_mask) 161 | off_loss += self.off_loss(br_off, gt_br_off, gt_mask) 162 | off_loss = self.off_weight * off_loss 163 | 164 | loss = (focal_loss + att_loss + pull_loss + push_loss + off_loss) / max(len(tl_heats), 1) 165 | return loss.unsqueeze(0) 166 | 167 | class CornerNet_Loss(nn.Module): 168 | def __init__(self, pull_weight=1, push_weight=1, off_weight=1, focal_loss=_focal_loss): 169 | super(CornerNet_Loss, self).__init__() 170 | 171 | self.pull_weight = pull_weight 172 | self.push_weight = push_weight 173 | self.off_weight = off_weight 174 | self.focal_loss = focal_loss 175 | self.ae_loss = _ae_loss 176 | self.off_loss = _off_loss 177 | 178 | def forward(self, outs, targets): 179 | tl_heats = outs[0] 180 | br_heats = outs[1] 181 | tl_tags = outs[2] 182 | br_tags = outs[3] 183 | tl_offs = outs[4] 184 | br_offs = outs[5] 185 | 186 | gt_tl_heat = 
targets[0] 187 | gt_br_heat = targets[1] 188 | gt_mask = targets[2] 189 | gt_tl_off = targets[3] 190 | gt_br_off = targets[4] 191 | gt_tl_ind = targets[5] 192 | gt_br_ind = targets[6] 193 | 194 | # focal loss 195 | focal_loss = 0 196 | 197 | tl_heats = [_sigmoid(t) for t in tl_heats] 198 | br_heats = [_sigmoid(b) for b in br_heats] 199 | 200 | focal_loss += self.focal_loss(tl_heats, gt_tl_heat) 201 | focal_loss += self.focal_loss(br_heats, gt_br_heat) 202 | 203 | # tag loss 204 | pull_loss = 0 205 | push_loss = 0 206 | tl_tags = [_tranpose_and_gather_feat(tl_tag, gt_tl_ind) for tl_tag in tl_tags] 207 | br_tags = [_tranpose_and_gather_feat(br_tag, gt_br_ind) for br_tag in br_tags] 208 | for tl_tag, br_tag in zip(tl_tags, br_tags): 209 | pull, push = self.ae_loss(tl_tag, br_tag, gt_mask) 210 | pull_loss += pull 211 | push_loss += push 212 | pull_loss = self.pull_weight * pull_loss 213 | push_loss = self.push_weight * push_loss 214 | 215 | off_loss = 0 216 | tl_offs = [_tranpose_and_gather_feat(tl_off, gt_tl_ind) for tl_off in tl_offs] 217 | br_offs = [_tranpose_and_gather_feat(br_off, gt_br_ind) for br_off in br_offs] 218 | for tl_off, br_off in zip(tl_offs, br_offs): 219 | off_loss += self.off_loss(tl_off, gt_tl_off, gt_mask) 220 | off_loss += self.off_loss(br_off, gt_br_off, gt_mask) 221 | off_loss = self.off_weight * off_loss 222 | 223 | loss = (focal_loss + pull_loss + push_loss + off_loss) / max(len(tl_heats), 1) 224 | return loss.unsqueeze(0) 225 | -------------------------------------------------------------------------------- /core/models/py_utils/modules.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from .utils import residual, upsample, merge, _decode 5 | 6 | def _make_layer(inp_dim, out_dim, modules): 7 | layers = [residual(inp_dim, out_dim)] 8 | layers += [residual(out_dim, out_dim) for _ in range(1, modules)] 9 | return nn.Sequential(*layers) 10 | 11 | def _make_layer_revr(inp_dim, out_dim, modules): 12 | layers = [residual(inp_dim, inp_dim) for _ in range(modules - 1)] 13 | layers += [residual(inp_dim, out_dim)] 14 | return nn.Sequential(*layers) 15 | 16 | def _make_pool_layer(dim): 17 | return nn.MaxPool2d(kernel_size=2, stride=2) 18 | 19 | def _make_unpool_layer(dim): 20 | return upsample(scale_factor=2) 21 | 22 | def _make_merge_layer(dim): 23 | return merge() 24 | 25 | class hg_module(nn.Module): 26 | def __init__( 27 | self, n, dims, modules, make_up_layer=_make_layer, 28 | make_pool_layer=_make_pool_layer, make_hg_layer=_make_layer, 29 | make_low_layer=_make_layer, make_hg_layer_revr=_make_layer_revr, 30 | make_unpool_layer=_make_unpool_layer, make_merge_layer=_make_merge_layer 31 | ): 32 | super(hg_module, self).__init__() 33 | 34 | curr_mod = modules[0] 35 | next_mod = modules[1] 36 | 37 | curr_dim = dims[0] 38 | next_dim = dims[1] 39 | 40 | self.n = n 41 | self.up1 = make_up_layer(curr_dim, curr_dim, curr_mod) 42 | self.max1 = make_pool_layer(curr_dim) 43 | self.low1 = make_hg_layer(curr_dim, next_dim, curr_mod) 44 | self.low2 = hg_module( 45 | n - 1, dims[1:], modules[1:], 46 | make_up_layer=make_up_layer, 47 | make_pool_layer=make_pool_layer, 48 | make_hg_layer=make_hg_layer, 49 | make_low_layer=make_low_layer, 50 | make_hg_layer_revr=make_hg_layer_revr, 51 | make_unpool_layer=make_unpool_layer, 52 | make_merge_layer=make_merge_layer 53 | ) if n > 1 else make_low_layer(next_dim, next_dim, next_mod) 54 | self.low3 = make_hg_layer_revr(next_dim, curr_dim, curr_mod) 55 | self.up2 = 
make_unpool_layer(curr_dim) 56 | self.merg = make_merge_layer(curr_dim) 57 | 58 | def forward(self, x): 59 | up1 = self.up1(x) 60 | max1 = self.max1(x) 61 | low1 = self.low1(max1) 62 | low2 = self.low2(low1) 63 | low3 = self.low3(low2) 64 | up2 = self.up2(low3) 65 | merg = self.merg(up1, up2) 66 | return merg 67 | 68 | class hg(nn.Module): 69 | def __init__(self, pre, hg_modules, cnvs, inters, cnvs_, inters_): 70 | super(hg, self).__init__() 71 | 72 | self.pre = pre 73 | self.hgs = hg_modules 74 | self.cnvs = cnvs 75 | 76 | self.inters = inters 77 | self.inters_ = inters_ 78 | self.cnvs_ = cnvs_ 79 | 80 | def forward(self, x): 81 | inter = self.pre(x) 82 | 83 | cnvs = [] 84 | for ind, (hg_, cnv_) in enumerate(zip(self.hgs, self.cnvs)): 85 | hg = hg_(inter) 86 | cnv = cnv_(hg) 87 | cnvs.append(cnv) 88 | 89 | if ind < len(self.hgs) - 1: 90 | inter = self.inters_[ind](inter) + self.cnvs_[ind](cnv) 91 | inter = nn.functional.relu_(inter) 92 | inter = self.inters[ind](inter) 93 | return cnvs 94 | 95 | class hg_net(nn.Module): 96 | def __init__( 97 | self, hg, tl_modules, br_modules, tl_heats, br_heats, 98 | tl_tags, br_tags, tl_offs, br_offs 99 | ): 100 | super(hg_net, self).__init__() 101 | 102 | self._decode = _decode 103 | 104 | self.hg = hg 105 | 106 | self.tl_modules = tl_modules 107 | self.br_modules = br_modules 108 | 109 | self.tl_heats = tl_heats 110 | self.br_heats = br_heats 111 | 112 | self.tl_tags = tl_tags 113 | self.br_tags = br_tags 114 | 115 | self.tl_offs = tl_offs 116 | self.br_offs = br_offs 117 | 118 | def _train(self, *xs): 119 | image = xs[0] 120 | cnvs = self.hg(image) 121 | 122 | tl_modules = [tl_mod_(cnv) for tl_mod_, cnv in zip(self.tl_modules, cnvs)] 123 | br_modules = [br_mod_(cnv) for br_mod_, cnv in zip(self.br_modules, cnvs)] 124 | tl_heats = [tl_heat_(tl_mod) for tl_heat_, tl_mod in zip(self.tl_heats, tl_modules)] 125 | br_heats = [br_heat_(br_mod) for br_heat_, br_mod in zip(self.br_heats, br_modules)] 126 | tl_tags = [tl_tag_(tl_mod) for tl_tag_, tl_mod in zip(self.tl_tags, tl_modules)] 127 | br_tags = [br_tag_(br_mod) for br_tag_, br_mod in zip(self.br_tags, br_modules)] 128 | tl_offs = [tl_off_(tl_mod) for tl_off_, tl_mod in zip(self.tl_offs, tl_modules)] 129 | br_offs = [br_off_(br_mod) for br_off_, br_mod in zip(self.br_offs, br_modules)] 130 | return [tl_heats, br_heats, tl_tags, br_tags, tl_offs, br_offs] 131 | 132 | def _test(self, *xs, **kwargs): 133 | image = xs[0] 134 | cnvs = self.hg(image) 135 | 136 | tl_mod = self.tl_modules[-1](cnvs[-1]) 137 | br_mod = self.br_modules[-1](cnvs[-1]) 138 | 139 | tl_heat, br_heat = self.tl_heats[-1](tl_mod), self.br_heats[-1](br_mod) 140 | tl_tag, br_tag = self.tl_tags[-1](tl_mod), self.br_tags[-1](br_mod) 141 | tl_off, br_off = self.tl_offs[-1](tl_mod), self.br_offs[-1](br_mod) 142 | 143 | outs = [tl_heat, br_heat, tl_tag, br_tag, tl_off, br_off] 144 | return self._decode(*outs, **kwargs), tl_heat, br_heat, tl_tag, br_tag 145 | 146 | def forward(self, *xs, test=False, **kwargs): 147 | if not test: 148 | return self._train(*xs, **kwargs) 149 | return self._test(*xs, **kwargs) 150 | 151 | class saccade_module(nn.Module): 152 | def __init__( 153 | self, n, dims, modules, make_up_layer=_make_layer, 154 | make_pool_layer=_make_pool_layer, make_hg_layer=_make_layer, 155 | make_low_layer=_make_layer, make_hg_layer_revr=_make_layer_revr, 156 | make_unpool_layer=_make_unpool_layer, make_merge_layer=_make_merge_layer 157 | ): 158 | super(saccade_module, self).__init__() 159 | 160 | curr_mod = modules[0] 161 | next_mod = 
modules[1] 162 | 163 | curr_dim = dims[0] 164 | next_dim = dims[1] 165 | 166 | self.n = n 167 | self.up1 = make_up_layer(curr_dim, curr_dim, curr_mod) 168 | self.max1 = make_pool_layer(curr_dim) 169 | self.low1 = make_hg_layer(curr_dim, next_dim, curr_mod) 170 | self.low2 = saccade_module( 171 | n - 1, dims[1:], modules[1:], 172 | make_up_layer=make_up_layer, 173 | make_pool_layer=make_pool_layer, 174 | make_hg_layer=make_hg_layer, 175 | make_low_layer=make_low_layer, 176 | make_hg_layer_revr=make_hg_layer_revr, 177 | make_unpool_layer=make_unpool_layer, 178 | make_merge_layer=make_merge_layer 179 | ) if n > 1 else make_low_layer(next_dim, next_dim, next_mod) 180 | self.low3 = make_hg_layer_revr(next_dim, curr_dim, curr_mod) 181 | self.up2 = make_unpool_layer(curr_dim) 182 | self.merg = make_merge_layer(curr_dim) 183 | 184 | def forward(self, x): 185 | up1 = self.up1(x) 186 | max1 = self.max1(x) 187 | low1 = self.low1(max1) 188 | if self.n > 1: 189 | low2, mergs = self.low2(low1) 190 | else: 191 | low2, mergs = self.low2(low1), [] 192 | low3 = self.low3(low2) 193 | up2 = self.up2(low3) 194 | merg = self.merg(up1, up2) 195 | mergs.append(merg) 196 | return merg, mergs 197 | 198 | class saccade(nn.Module): 199 | def __init__(self, pre, hg_modules, cnvs, inters, cnvs_, inters_): 200 | super(saccade, self).__init__() 201 | 202 | self.pre = pre 203 | self.hgs = hg_modules 204 | self.cnvs = cnvs 205 | 206 | self.inters = inters 207 | self.inters_ = inters_ 208 | self.cnvs_ = cnvs_ 209 | 210 | def forward(self, x): 211 | inter = self.pre(x) 212 | 213 | cnvs = [] 214 | atts = [] 215 | for ind, (hg_, cnv_) in enumerate(zip(self.hgs, self.cnvs)): 216 | hg, ups = hg_(inter) 217 | cnv = cnv_(hg) 218 | cnvs.append(cnv) 219 | atts.append(ups) 220 | 221 | if ind < len(self.hgs) - 1: 222 | inter = self.inters_[ind](inter) + self.cnvs_[ind](cnv) 223 | inter = nn.functional.relu_(inter) 224 | inter = self.inters[ind](inter) 225 | return cnvs, atts 226 | 227 | class saccade_net(nn.Module): 228 | def __init__( 229 | self, hg, tl_modules, br_modules, tl_heats, br_heats, 230 | tl_tags, br_tags, tl_offs, br_offs, att_modules, up_start=0 231 | ): 232 | super(saccade_net, self).__init__() 233 | 234 | self._decode = _decode 235 | 236 | self.hg = hg 237 | 238 | self.tl_modules = tl_modules 239 | self.br_modules = br_modules 240 | self.tl_heats = tl_heats 241 | self.br_heats = br_heats 242 | self.tl_tags = tl_tags 243 | self.br_tags = br_tags 244 | self.tl_offs = tl_offs 245 | self.br_offs = br_offs 246 | 247 | self.att_modules = att_modules 248 | self.up_start = up_start 249 | 250 | def _train(self, *xs): 251 | image = xs[0] 252 | 253 | cnvs, ups = self.hg(image) 254 | ups = [up[self.up_start:] for up in ups] 255 | 256 | tl_modules = [tl_mod_(cnv) for tl_mod_, cnv in zip(self.tl_modules, cnvs)] 257 | br_modules = [br_mod_(cnv) for br_mod_, cnv in zip(self.br_modules, cnvs)] 258 | tl_heats = [tl_heat_(tl_mod) for tl_heat_, tl_mod in zip(self.tl_heats, tl_modules)] 259 | br_heats = [br_heat_(br_mod) for br_heat_, br_mod in zip(self.br_heats, br_modules)] 260 | tl_tags = [tl_tag_(tl_mod) for tl_tag_, tl_mod in zip(self.tl_tags, tl_modules)] 261 | br_tags = [br_tag_(br_mod) for br_tag_, br_mod in zip(self.br_tags, br_modules)] 262 | tl_offs = [tl_off_(tl_mod) for tl_off_, tl_mod in zip(self.tl_offs, tl_modules)] 263 | br_offs = [br_off_(br_mod) for br_off_, br_mod in zip(self.br_offs, br_modules)] 264 | atts = [[att_mod_(u) for att_mod_, u in zip(att_mods, up)] for att_mods, up in zip(self.att_modules, ups)] 265 | 
return [tl_heats, br_heats, tl_tags, br_tags, tl_offs, br_offs, atts] 266 | 267 | def _test(self, *xs, no_att=False, **kwargs): 268 | image = xs[0] 269 | cnvs, ups = self.hg(image) 270 | ups = [up[self.up_start:] for up in ups] 271 | 272 | if not no_att: 273 | atts = [att_mod_(up) for att_mod_, up in zip(self.att_modules[-1], ups[-1])] 274 | atts = [torch.sigmoid(att) for att in atts] 275 | 276 | tl_mod = self.tl_modules[-1](cnvs[-1]) 277 | br_mod = self.br_modules[-1](cnvs[-1]) 278 | 279 | tl_heat, br_heat = self.tl_heats[-1](tl_mod), self.br_heats[-1](br_mod) 280 | tl_tag, br_tag = self.tl_tags[-1](tl_mod), self.br_tags[-1](br_mod) 281 | tl_off, br_off = self.tl_offs[-1](tl_mod), self.br_offs[-1](br_mod) 282 | 283 | outs = [tl_heat, br_heat, tl_tag, br_tag, tl_off, br_off] 284 | if not no_att: 285 | return self._decode(*outs, **kwargs), atts 286 | else: 287 | return self._decode(*outs, **kwargs) 288 | 289 | def forward(self, *xs, test=False, **kwargs): 290 | if not test: 291 | return self._train(*xs, **kwargs) 292 | return self._test(*xs, **kwargs) 293 | -------------------------------------------------------------------------------- /core/models/py_utils/scatter_gather.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.autograd import Variable 3 | from torch.nn.parallel._functions import Scatter, Gather 4 | 5 | 6 | def scatter(inputs, target_gpus, dim=0, chunk_sizes=None): 7 | r""" 8 | Slices variables into approximately equal chunks and 9 | distributes them across given GPUs. Duplicates 10 | references to objects that are not variables. Does not 11 | support Tensors. 12 | """ 13 | def scatter_map(obj): 14 | if isinstance(obj, Variable): 15 | return Scatter.apply(target_gpus, chunk_sizes, dim, obj) 16 | assert not torch.is_tensor(obj), "Tensors not supported in scatter." 
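# Editor's note: from here scatter_map recurses structurally -- tuples, lists and dicts are scattered element-wise and re-zipped so each GPU receives one chunk with the original nesting preserved, while any other object is replicated as-is to every device; unlike the stock PyTorch scatter, this variant threads chunk_sizes through Scatter.apply to allow unequal per-GPU batch chunks.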
17 | if isinstance(obj, tuple): 18 | return list(zip(*map(scatter_map, obj))) 19 | if isinstance(obj, list): 20 | return list(map(list, zip(*map(scatter_map, obj)))) 21 | if isinstance(obj, dict): 22 | return list(map(type(obj), zip(*map(scatter_map, obj.items())))) 23 | return [obj for targets in target_gpus] 24 | 25 | return scatter_map(inputs) 26 | 27 | 28 | def scatter_kwargs(inputs, kwargs, target_gpus, dim=0, chunk_sizes=None): 29 | r"""Scatter with support for kwargs dictionary""" 30 | inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else [] 31 | kwargs = scatter(kwargs, target_gpus, dim, chunk_sizes) if kwargs else [] 32 | if len(inputs) < len(kwargs): 33 | inputs.extend([() for _ in range(len(kwargs) - len(inputs))]) 34 | elif len(kwargs) < len(inputs): 35 | kwargs.extend([{} for _ in range(len(inputs) - len(kwargs))]) 36 | inputs = tuple(inputs) 37 | kwargs = tuple(kwargs) 38 | return inputs, kwargs 39 | -------------------------------------------------------------------------------- /core/models/py_utils/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | def _gather_feat(feat, ind, mask=None): 5 | dim = feat.size(2) 6 | ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim) 7 | feat = feat.gather(1, ind) 8 | if mask is not None: 9 | mask = mask.unsqueeze(2).expand_as(feat) 10 | feat = feat[mask] 11 | feat = feat.view(-1, dim) 12 | return feat 13 | 14 | def _nms(heat, kernel=1): 15 | pad = (kernel - 1) // 2 16 | 17 | hmax = nn.functional.max_pool2d(heat, (kernel, kernel), stride=1, padding=pad) 18 | keep = (hmax == heat).float() 19 | return heat * keep 20 | 21 | def _tranpose_and_gather_feat(feat, ind): 22 | feat = feat.permute(0, 2, 3, 1).contiguous() 23 | feat = feat.view(feat.size(0), -1, feat.size(3)) 24 | feat = _gather_feat(feat, ind) 25 | return feat 26 | 27 | def _topk(scores, K=20): 28 | batch, cat, height, width = scores.size() 29 | 30 | topk_scores, topk_inds = torch.topk(scores.view(batch, -1), K) 31 | 32 | topk_clses = (topk_inds / (height * width)).int() 33 | 34 | topk_inds = topk_inds % (height * width) 35 | topk_ys = (topk_inds / width).int().float() 36 | topk_xs = (topk_inds % width).int().float() 37 | return topk_scores, topk_inds, topk_clses, topk_ys, topk_xs 38 | 39 | def _decode( 40 | tl_heat, br_heat, tl_tag, br_tag, tl_regr, br_regr, 41 | K=100, kernel=1, ae_threshold=1, num_dets=1000, no_border=False 42 | ): 43 | batch, cat, height, width = tl_heat.size() 44 | 45 | tl_heat = torch.sigmoid(tl_heat) 46 | br_heat = torch.sigmoid(br_heat) 47 | 48 | # perform nms on heatmaps 49 | tl_heat = _nms(tl_heat, kernel=kernel) 50 | br_heat = _nms(br_heat, kernel=kernel) 51 | 52 | tl_scores, tl_inds, tl_clses, tl_ys, tl_xs = _topk(tl_heat, K=K) 53 | br_scores, br_inds, br_clses, br_ys, br_xs = _topk(br_heat, K=K) 54 | 55 | tl_ys = tl_ys.view(batch, K, 1).expand(batch, K, K) 56 | tl_xs = tl_xs.view(batch, K, 1).expand(batch, K, K) 57 | br_ys = br_ys.view(batch, 1, K).expand(batch, K, K) 58 | br_xs = br_xs.view(batch, 1, K).expand(batch, K, K) 59 | 60 | if no_border: 61 | tl_ys_binds = (tl_ys == 0) 62 | tl_xs_binds = (tl_xs == 0) 63 | br_ys_binds = (br_ys == height - 1) 64 | br_xs_binds = (br_xs == width - 1) 65 | 66 | if tl_regr is not None and br_regr is not None: 67 | tl_regr = _tranpose_and_gather_feat(tl_regr, tl_inds) 68 | tl_regr = tl_regr.view(batch, K, 1, 2) 69 | br_regr = _tranpose_and_gather_feat(br_regr, br_inds) 70 | br_regr = br_regr.view(batch, 
1, K, 2) 71 | 72 | tl_xs = tl_xs + tl_regr[..., 0] 73 | tl_ys = tl_ys + tl_regr[..., 1] 74 | br_xs = br_xs + br_regr[..., 0] 75 | br_ys = br_ys + br_regr[..., 1] 76 | 77 | # all possible boxes based on top k corners (ignoring class) 78 | bboxes = torch.stack((tl_xs, tl_ys, br_xs, br_ys), dim=3) 79 | 80 | tl_tag = _tranpose_and_gather_feat(tl_tag, tl_inds) 81 | tl_tag = tl_tag.view(batch, K, 1) 82 | br_tag = _tranpose_and_gather_feat(br_tag, br_inds) 83 | br_tag = br_tag.view(batch, 1, K) 84 | dists = torch.abs(tl_tag - br_tag) 85 | 86 | tl_scores = tl_scores.view(batch, K, 1).expand(batch, K, K) 87 | br_scores = br_scores.view(batch, 1, K).expand(batch, K, K) 88 | scores = (tl_scores + br_scores) / 2 89 | 90 | # reject boxes based on classes 91 | tl_clses = tl_clses.view(batch, K, 1).expand(batch, K, K) 92 | br_clses = br_clses.view(batch, 1, K).expand(batch, K, K) 93 | cls_inds = (tl_clses != br_clses) 94 | 95 | # reject boxes based on distances 96 | dist_inds = (dists > ae_threshold) 97 | 98 | # reject boxes based on widths and heights 99 | width_inds = (br_xs < tl_xs) 100 | height_inds = (br_ys < tl_ys) 101 | 102 | if no_border: 103 | scores[tl_ys_binds] = -1 104 | scores[tl_xs_binds] = -1 105 | scores[br_ys_binds] = -1 106 | scores[br_xs_binds] = -1 107 | 108 | scores[cls_inds] = -1 109 | scores[dist_inds] = -1 110 | scores[width_inds] = -1 111 | scores[height_inds] = -1 112 | 113 | scores = scores.view(batch, -1) 114 | scores, inds = torch.topk(scores, num_dets) 115 | scores = scores.unsqueeze(2) 116 | 117 | bboxes = bboxes.view(batch, -1, 4) 118 | bboxes = _gather_feat(bboxes, inds) 119 | 120 | clses = tl_clses.contiguous().view(batch, -1, 1) 121 | clses = _gather_feat(clses, inds).float() 122 | 123 | tl_scores = tl_scores.contiguous().view(batch, -1, 1) 124 | tl_scores = _gather_feat(tl_scores, inds).float() 125 | br_scores = br_scores.contiguous().view(batch, -1, 1) 126 | br_scores = _gather_feat(br_scores, inds).float() 127 | 128 | detections = torch.cat([bboxes, scores, tl_scores, br_scores, clses], dim=2) 129 | return detections 130 | 131 | class upsample(nn.Module): 132 | def __init__(self, scale_factor): 133 | super(upsample, self).__init__() 134 | self.scale_factor = scale_factor 135 | 136 | def forward(self, x): 137 | return nn.functional.interpolate(x, scale_factor=self.scale_factor) 138 | 139 | class merge(nn.Module): 140 | def forward(self, x, y): 141 | return x + y 142 | 143 | class convolution(nn.Module): 144 | def __init__(self, k, inp_dim, out_dim, stride=1, with_bn=True): 145 | super(convolution, self).__init__() 146 | 147 | pad = (k - 1) // 2 148 | self.conv = nn.Conv2d(inp_dim, out_dim, (k, k), padding=(pad, pad), stride=(stride, stride), bias=not with_bn) 149 | self.bn = nn.BatchNorm2d(out_dim) if with_bn else nn.Sequential() 150 | self.relu = nn.ReLU(inplace=True) 151 | 152 | def forward(self, x): 153 | conv = self.conv(x) 154 | bn = self.bn(conv) 155 | relu = self.relu(bn) 156 | return relu 157 | 158 | class residual(nn.Module): 159 | def __init__(self, inp_dim, out_dim, k=3, stride=1): 160 | super(residual, self).__init__() 161 | p = (k - 1) // 2 162 | 163 | self.conv1 = nn.Conv2d(inp_dim, out_dim, (k, k), padding=(p, p), stride=(stride, stride), bias=False) 164 | self.bn1 = nn.BatchNorm2d(out_dim) 165 | self.relu1 = nn.ReLU(inplace=True) 166 | 167 | self.conv2 = nn.Conv2d(out_dim, out_dim, (k, k), padding=(p, p), bias=False) 168 | self.bn2 = nn.BatchNorm2d(out_dim) 169 | 170 | self.skip = nn.Sequential( 171 | nn.Conv2d(inp_dim, out_dim, (1, 1), stride=(stride, 
stride), bias=False), 172 | nn.BatchNorm2d(out_dim) 173 | ) if stride != 1 or inp_dim != out_dim else nn.Sequential() 174 | self.relu = nn.ReLU(inplace=True) 175 | 176 | def forward(self, x): 177 | conv1 = self.conv1(x) 178 | bn1 = self.bn1(conv1) 179 | relu1 = self.relu1(bn1) 180 | 181 | conv2 = self.conv2(relu1) 182 | bn2 = self.bn2(conv2) 183 | 184 | skip = self.skip(x) 185 | return self.relu(bn2 + skip) 186 | 187 | class corner_pool(nn.Module): 188 | def __init__(self, dim, pool1, pool2): 189 | super(corner_pool, self).__init__() 190 | self._init_layers(dim, pool1, pool2) 191 | 192 | def _init_layers(self, dim, pool1, pool2): 193 | self.p1_conv1 = convolution(3, dim, 128) 194 | self.p2_conv1 = convolution(3, dim, 128) 195 | 196 | self.p_conv1 = nn.Conv2d(128, dim, (3, 3), padding=(1, 1), bias=False) 197 | self.p_bn1 = nn.BatchNorm2d(dim) 198 | 199 | self.conv1 = nn.Conv2d(dim, dim, (1, 1), bias=False) 200 | self.bn1 = nn.BatchNorm2d(dim) 201 | self.relu1 = nn.ReLU(inplace=True) 202 | 203 | self.conv2 = convolution(3, dim, dim) 204 | 205 | self.pool1 = pool1() 206 | self.pool2 = pool2() 207 | 208 | def forward(self, x): 209 | # pool 1 210 | p1_conv1 = self.p1_conv1(x) 211 | pool1 = self.pool1(p1_conv1) 212 | 213 | # pool 2 214 | p2_conv1 = self.p2_conv1(x) 215 | pool2 = self.pool2(p2_conv1) 216 | 217 | # pool 1 + pool 2 218 | p_conv1 = self.p_conv1(pool1 + pool2) 219 | p_bn1 = self.p_bn1(p_conv1) 220 | 221 | conv1 = self.conv1(x) 222 | bn1 = self.bn1(conv1) 223 | relu1 = self.relu1(p_bn1 + bn1) 224 | 225 | conv2 = self.conv2(relu1) 226 | return conv2 227 | -------------------------------------------------------------------------------- /core/nnet/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/princeton-vl/CornerNet-Lite/6a54505d830a9d6afe26e99f0864b5d06d0bbbaf/core/nnet/__init__.py -------------------------------------------------------------------------------- /core/nnet/py_factory.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import pickle 4 | import importlib 5 | import torch.nn as nn 6 | 7 | from ..models.py_utils.data_parallel import DataParallel 8 | 9 | torch.manual_seed(317) 10 | 11 | class Network(nn.Module): 12 | def __init__(self, model, loss): 13 | super(Network, self).__init__() 14 | 15 | self.model = model 16 | self.loss = loss 17 | 18 | def forward(self, xs, ys, **kwargs): 19 | preds = self.model(*xs, **kwargs) 20 | loss = self.loss(preds, ys, **kwargs) 21 | return loss 22 | 23 | # for model backward compatibility 24 | # previously model was wrapped by DataParallel module 25 | class DummyModule(nn.Module): 26 | def __init__(self, model): 27 | super(DummyModule, self).__init__() 28 | self.module = model 29 | 30 | def forward(self, *xs, **kwargs): 31 | return self.module(*xs, **kwargs) 32 | 33 | class NetworkFactory(object): 34 | def __init__(self, system_config, model, distributed=False, gpu=None): 35 | super(NetworkFactory, self).__init__() 36 | 37 | self.system_config = system_config 38 | 39 | self.gpu = gpu 40 | self.model = DummyModule(model) 41 | self.loss = model.loss 42 | self.network = Network(self.model, self.loss) 43 | 44 | if distributed: 45 | from apex.parallel import DistributedDataParallel, convert_syncbn_model 46 | torch.cuda.set_device(gpu) 47 | self.network = self.network.cuda(gpu) 48 | self.network = convert_syncbn_model(self.network) 49 | self.network = DistributedDataParallel(self.network) 50 | else: 
51 | self.network = DataParallel(self.network, chunk_sizes=system_config.chunk_sizes) 52 | 53 | total_params = 0 54 | for params in self.model.parameters(): 55 | num_params = 1 56 | for x in params.size(): 57 | num_params *= x 58 | total_params += num_params 59 | print("total parameters: {}".format(total_params)) 60 | 61 | if system_config.opt_algo == "adam": 62 | self.optimizer = torch.optim.Adam( 63 | filter(lambda p: p.requires_grad, self.model.parameters()) 64 | ) 65 | elif system_config.opt_algo == "sgd": 66 | self.optimizer = torch.optim.SGD( 67 | filter(lambda p: p.requires_grad, self.model.parameters()), 68 | lr=system_config.learning_rate, 69 | momentum=0.9, weight_decay=0.0001 70 | ) 71 | else: 72 | raise ValueError("unknown optimizer") 73 | 74 | def cuda(self): 75 | self.model.cuda() 76 | 77 | def train_mode(self): 78 | self.network.train() 79 | 80 | def eval_mode(self): 81 | self.network.eval() 82 | 83 | def _t_cuda(self, xs): 84 | if type(xs) is list: 85 | return [x.cuda(self.gpu, non_blocking=True) for x in xs] 86 | return xs.cuda(self.gpu, non_blocking=True) 87 | 88 | def train(self, xs, ys, **kwargs): 89 | xs = [self._t_cuda(x) for x in xs] 90 | ys = [self._t_cuda(y) for y in ys] 91 | 92 | self.optimizer.zero_grad() 93 | loss = self.network(xs, ys) 94 | loss = loss.mean() 95 | loss.backward() 96 | self.optimizer.step() 97 | 98 | return loss 99 | 100 | def validate(self, xs, ys, **kwargs): 101 | with torch.no_grad(): 102 | xs = [self._t_cuda(x) for x in xs] 103 | ys = [self._t_cuda(y) for y in ys] 104 | 105 | loss = self.network(xs, ys) 106 | loss = loss.mean() 107 | return loss 108 | 109 | def test(self, xs, **kwargs): 110 | with torch.no_grad(): 111 | xs = [self._t_cuda(x) for x in xs] 112 | return self.model(*xs, **kwargs) 113 | 114 | def set_lr(self, lr): 115 | print("setting learning rate to: {}".format(lr)) 116 | for param_group in self.optimizer.param_groups: 117 | param_group["lr"] = lr 118 | 119 | def load_pretrained_params(self, pretrained_model): 120 | print("loading from {}".format(pretrained_model)) 121 | with open(pretrained_model, "rb") as f: 122 | params = torch.load(f) 123 | self.model.load_state_dict(params) 124 | 125 | def load_params(self, iteration): 126 | cache_file = self.system_config.snapshot_file.format(iteration) 127 | print("loading model from {}".format(cache_file)) 128 | with open(cache_file, "rb") as f: 129 | params = torch.load(f) 130 | self.model.load_state_dict(params) 131 | 132 | def save_params(self, iteration): 133 | cache_file = self.system_config.snapshot_file.format(iteration) 134 | print("saving model to {}".format(cache_file)) 135 | with open(cache_file, "wb") as f: 136 | params = self.model.state_dict() 137 | torch.save(params, f) 138 | -------------------------------------------------------------------------------- /core/paths.py: -------------------------------------------------------------------------------- 1 | import pkg_resources 2 | 3 | _package_name = __name__ 4 | 5 | def get_file_path(*paths): 6 | path = "/".join(paths) 7 | return pkg_resources.resource_filename(_package_name, path) 8 | -------------------------------------------------------------------------------- /core/sample/__init__.py: -------------------------------------------------------------------------------- 1 | from .cornernet import cornernet 2 | from .cornernet_saccade import cornernet_saccade 3 | 4 | def data_sampling_func(sys_configs, db, k_ind, data_aug=True, debug=False): 5 | return globals()[sys_configs.sampling_function](sys_configs, db, k_ind, 
data_aug, debug) 6 | -------------------------------------------------------------------------------- /core/sample/cornernet.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import math 3 | import numpy as np 4 | import torch 5 | 6 | from .utils import random_crop, draw_gaussian, gaussian_radius, normalize_, color_jittering_, lighting_ 7 | 8 | def _resize_image(image, detections, size): 9 | detections = detections.copy() 10 | height, width = image.shape[0:2] 11 | new_height, new_width = size 12 | 13 | image = cv2.resize(image, (new_width, new_height)) 14 | 15 | height_ratio = new_height / height 16 | width_ratio = new_width / width 17 | detections[:, 0:4:2] *= width_ratio 18 | detections[:, 1:4:2] *= height_ratio 19 | return image, detections 20 | 21 | def _clip_detections(image, detections): 22 | detections = detections.copy() 23 | height, width = image.shape[0:2] 24 | 25 | detections[:, 0:4:2] = np.clip(detections[:, 0:4:2], 0, width - 1) 26 | detections[:, 1:4:2] = np.clip(detections[:, 1:4:2], 0, height - 1) 27 | keep_inds = ((detections[:, 2] - detections[:, 0]) > 0) & \ 28 | ((detections[:, 3] - detections[:, 1]) > 0) 29 | detections = detections[keep_inds] 30 | return detections 31 | 32 | def cornernet(system_configs, db, k_ind, data_aug, debug): 33 | data_rng = system_configs.data_rng 34 | batch_size = system_configs.batch_size 35 | 36 | categories = db.configs["categories"] 37 | input_size = db.configs["input_size"] 38 | output_size = db.configs["output_sizes"][0] 39 | 40 | border = db.configs["border"] 41 | lighting = db.configs["lighting"] 42 | rand_crop = db.configs["rand_crop"] 43 | rand_color = db.configs["rand_color"] 44 | rand_scales = db.configs["rand_scales"] 45 | gaussian_bump = db.configs["gaussian_bump"] 46 | gaussian_iou = db.configs["gaussian_iou"] 47 | gaussian_rad = db.configs["gaussian_radius"] 48 | 49 | max_tag_len = 128 50 | 51 | # allocating memory 52 | images = np.zeros((batch_size, 3, input_size[0], input_size[1]), dtype=np.float32) 53 | tl_heatmaps = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32) 54 | br_heatmaps = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32) 55 | tl_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32) 56 | br_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32) 57 | tl_tags = np.zeros((batch_size, max_tag_len), dtype=np.int64) 58 | br_tags = np.zeros((batch_size, max_tag_len), dtype=np.int64) 59 | tag_masks = np.zeros((batch_size, max_tag_len), dtype=np.uint8) 60 | tag_lens = np.zeros((batch_size, ), dtype=np.int32) 61 | 62 | db_size = db.db_inds.size 63 | for b_ind in range(batch_size): 64 | if not debug and k_ind == 0: 65 | db.shuffle_inds() 66 | 67 | db_ind = db.db_inds[k_ind] 68 | k_ind = (k_ind + 1) % db_size 69 | 70 | # reading image 71 | image_path = db.image_path(db_ind) 72 | image = cv2.imread(image_path) 73 | 74 | # reading detections 75 | detections = db.detections(db_ind) 76 | 77 | # cropping an image randomly 78 | if not debug and rand_crop: 79 | image, detections = random_crop(image, detections, rand_scales, input_size, border=border) 80 | 81 | image, detections = _resize_image(image, detections, input_size) 82 | detections = _clip_detections(image, detections) 83 | 84 | width_ratio = output_size[1] / input_size[1] 85 | height_ratio = output_size[0] / input_size[0] 86 | 87 | # flipping an image randomly 88 | if not debug and np.random.uniform() > 0.5: 89 | image[:] = 
image[:, ::-1, :] 90 | width = image.shape[1] 91 | detections[:, [0, 2]] = width - detections[:, [2, 0]] - 1 92 | 93 | if not debug: 94 | image = image.astype(np.float32) / 255. 95 | if rand_color: 96 | color_jittering_(data_rng, image) 97 | if lighting: 98 | lighting_(data_rng, image, 0.1, db.eig_val, db.eig_vec) 99 | normalize_(image, db.mean, db.std) 100 | images[b_ind] = image.transpose((2, 0, 1)) 101 | 102 | for ind, detection in enumerate(detections): 103 | category = int(detection[-1]) - 1 104 | 105 | xtl, ytl = detection[0], detection[1] 106 | xbr, ybr = detection[2], detection[3] 107 | 108 | fxtl = (xtl * width_ratio) 109 | fytl = (ytl * height_ratio) 110 | fxbr = (xbr * width_ratio) 111 | fybr = (ybr * height_ratio) 112 | 113 | xtl = int(fxtl) 114 | ytl = int(fytl) 115 | xbr = int(fxbr) 116 | ybr = int(fybr) 117 | 118 | if gaussian_bump: 119 | width = detection[2] - detection[0] 120 | height = detection[3] - detection[1] 121 | 122 | width = math.ceil(width * width_ratio) 123 | height = math.ceil(height * height_ratio) 124 | 125 | if gaussian_rad == -1: 126 | radius = gaussian_radius((height, width), gaussian_iou) 127 | radius = max(0, int(radius)) 128 | else: 129 | radius = gaussian_rad 130 | 131 | draw_gaussian(tl_heatmaps[b_ind, category], [xtl, ytl], radius) 132 | draw_gaussian(br_heatmaps[b_ind, category], [xbr, ybr], radius) 133 | else: 134 | tl_heatmaps[b_ind, category, ytl, xtl] = 1 135 | br_heatmaps[b_ind, category, ybr, xbr] = 1 136 | 137 | tag_ind = tag_lens[b_ind] 138 | tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl] 139 | br_regrs[b_ind, tag_ind, :] = [fxbr - xbr, fybr - ybr] 140 | tl_tags[b_ind, tag_ind] = ytl * output_size[1] + xtl 141 | br_tags[b_ind, tag_ind] = ybr * output_size[1] + xbr 142 | tag_lens[b_ind] += 1 143 | 144 | for b_ind in range(batch_size): 145 | tag_len = tag_lens[b_ind] 146 | tag_masks[b_ind, :tag_len] = 1 147 | 148 | images = torch.from_numpy(images) 149 | tl_heatmaps = torch.from_numpy(tl_heatmaps) 150 | br_heatmaps = torch.from_numpy(br_heatmaps) 151 | tl_regrs = torch.from_numpy(tl_regrs) 152 | br_regrs = torch.from_numpy(br_regrs) 153 | tl_tags = torch.from_numpy(tl_tags) 154 | br_tags = torch.from_numpy(br_tags) 155 | tag_masks = torch.from_numpy(tag_masks) 156 | 157 | return { 158 | "xs": [images], 159 | "ys": [tl_heatmaps, br_heatmaps, tag_masks, tl_regrs, br_regrs, tl_tags, br_tags] 160 | }, k_ind 161 | -------------------------------------------------------------------------------- /core/sample/cornernet_saccade.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import math 3 | import torch 4 | import numpy as np 5 | 6 | from .utils import draw_gaussian, gaussian_radius, normalize_, color_jittering_, lighting_, crop_image 7 | 8 | def bbox_overlaps(a_dets, b_dets): 9 | a_widths = a_dets[:, 2] - a_dets[:, 0] 10 | a_heights = a_dets[:, 3] - a_dets[:, 1] 11 | a_areas = a_widths * a_heights 12 | 13 | b_widths = b_dets[:, 2] - b_dets[:, 0] 14 | b_heights = b_dets[:, 3] - b_dets[:, 1] 15 | b_areas = b_widths * b_heights 16 | 17 | return a_areas / b_areas 18 | 19 | def clip_detections(border, detections): 20 | detections = detections.copy() 21 | 22 | y0, y1, x0, x1 = border 23 | det_xs = detections[:, 0:4:2] 24 | det_ys = detections[:, 1:4:2] 25 | np.clip(det_xs, x0, x1 - 1, out=det_xs) 26 | np.clip(det_ys, y0, y1 - 1, out=det_ys) 27 | 28 | keep_inds = ((det_xs[:, 1] - det_xs[:, 0]) > 0) & \ 29 | ((det_ys[:, 1] - det_ys[:, 0]) > 0) 30 | keep_inds = np.where(keep_inds)[0] 31 | return 
detections[keep_inds], keep_inds 32 | 33 | def crop_image_dets(image, dets, ind, input_size, output_size=None, random_crop=True, rand_center=True): 34 | if ind is not None: 35 | det_x0, det_y0, det_x1, det_y1 = dets[ind, 0:4] 36 | else: 37 | det_x0, det_y0, det_x1, det_y1 = None, None, None, None 38 | 39 | input_height, input_width = input_size 40 | image_height, image_width = image.shape[0:2] 41 | 42 | centered = rand_center and np.random.uniform() > 0.5 43 | if not random_crop or image_width <= input_width: 44 | xc = image_width // 2 45 | elif ind is None or not centered: 46 | xmin = max(det_x1 - input_width, 0) if ind is not None else 0 47 | xmax = min(image_width - input_width, det_x0) if ind is not None else image_width - input_width 48 | xrand = np.random.randint(int(xmin), int(xmax) + 1) 49 | xc = xrand + input_width // 2 50 | else: 51 | xmin = max((det_x0 + det_x1) // 2 - np.random.randint(0, 15), 0) 52 | xmax = min((det_x0 + det_x1) // 2 + np.random.randint(0, 15), image_width - 1) 53 | xc = np.random.randint(int(xmin), int(xmax) + 1) 54 | 55 | if not random_crop or image_height <= input_height: 56 | yc = image_height // 2 57 | elif ind is None or not centered: 58 | ymin = max(det_y1 - input_height, 0) if ind is not None else 0 59 | ymax = min(image_height - input_height, det_y0) if ind is not None else image_height - input_height 60 | yrand = np.random.randint(int(ymin), int(ymax) + 1) 61 | yc = yrand + input_height // 2 62 | else: 63 | ymin = max((det_y0 + det_y1) // 2 - np.random.randint(0, 15), 0) 64 | ymax = min((det_y0 + det_y1) // 2 + np.random.randint(0, 15), image_height - 1) 65 | yc = np.random.randint(int(ymin), int(ymax) + 1) 66 | 67 | image, border, offset = crop_image(image, [yc, xc], input_size, output_size=output_size) 68 | dets[:, 0:4:2] -= offset[1] 69 | dets[:, 1:4:2] -= offset[0] 70 | return image, dets, border 71 | 72 | def scale_image_detections(image, dets, scale): 73 | height, width = image.shape[0:2] 74 | 75 | new_height = int(height * scale) 76 | new_width = int(width * scale) 77 | 78 | image = cv2.resize(image, (new_width, new_height)) 79 | dets = dets.copy() 80 | dets[:, 0:4] *= scale 81 | return image, dets 82 | 83 | def ref_scale(detections, random_crop=False): 84 | if detections.shape[0] == 0: 85 | return None, None 86 | 87 | if random_crop and np.random.uniform() > 0.7: 88 | return None, None 89 | 90 | ref_ind = np.random.randint(detections.shape[0]) 91 | ref_det = detections[ref_ind].copy() 92 | ref_h = ref_det[3] - ref_det[1] 93 | ref_w = ref_det[2] - ref_det[0] 94 | ref_hw = max(ref_h, ref_w) 95 | 96 | if ref_hw > 96: 97 | return np.random.randint(low=96, high=255) / ref_hw, ref_ind 98 | elif ref_hw > 32: 99 | return np.random.randint(low=32, high=97) / ref_hw, ref_ind 100 | return np.random.randint(low=16, high=33) / ref_hw, ref_ind 101 | 102 | def create_attention_mask(atts, ratios, sizes, detections): 103 | for det in detections: 104 | width = det[2] - det[0] 105 | height = det[3] - det[1] 106 | 107 | max_hw = max(width, height) 108 | for att, ratio, size in zip(atts, ratios, sizes): 109 | if max_hw >= size[0] and max_hw <= size[1]: 110 | x = (det[0] + det[2]) / 2 111 | y = (det[1] + det[3]) / 2 112 | x = (x / ratio).astype(np.int32) 113 | y = (y / ratio).astype(np.int32) 114 | att[y, x] = 1 115 | 116 | def cornernet_saccade(system_configs, db, k_ind, data_aug, debug): 117 | data_rng = system_configs.data_rng 118 | batch_size = system_configs.batch_size 119 | 120 | categories = db.configs["categories"] 121 | input_size = 
db.configs["input_size"] 122 | output_size = db.configs["output_sizes"][0] 123 | rand_scales = db.configs["rand_scales"] 124 | rand_crop = db.configs["rand_crop"] 125 | rand_center = db.configs["rand_center"] 126 | view_sizes = db.configs["view_sizes"] 127 | 128 | gaussian_iou = db.configs["gaussian_iou"] 129 | gaussian_rad = db.configs["gaussian_radius"] 130 | 131 | att_ratios = db.configs["att_ratios"] 132 | att_ranges = db.configs["att_ranges"] 133 | att_sizes = db.configs["att_sizes"] 134 | 135 | min_scale = db.configs["min_scale"] 136 | max_scale = db.configs["max_scale"] 137 | max_objects = 128 138 | 139 | images = np.zeros((batch_size, 3, input_size[0], input_size[1]), dtype=np.float32) 140 | tl_heats = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32) 141 | br_heats = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32) 142 | tl_valids = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32) 143 | br_valids = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32) 144 | tl_regrs = np.zeros((batch_size, max_objects, 2), dtype=np.float32) 145 | br_regrs = np.zeros((batch_size, max_objects, 2), dtype=np.float32) 146 | tl_tags = np.zeros((batch_size, max_objects), dtype=np.int64) 147 | br_tags = np.zeros((batch_size, max_objects), dtype=np.int64) 148 | tag_masks = np.zeros((batch_size, max_objects), dtype=np.uint8) 149 | tag_lens = np.zeros((batch_size, ), dtype=np.int32) 150 | attentions = [np.zeros((batch_size, 1, att_size[0], att_size[1]), dtype=np.float32) for att_size in att_sizes] 151 | 152 | db_size = db.db_inds.size 153 | for b_ind in range(batch_size): 154 | if not debug and k_ind == 0: 155 | # if k_ind == 0: 156 | db.shuffle_inds() 157 | 158 | db_ind = db.db_inds[k_ind] 159 | k_ind = (k_ind + 1) % db_size 160 | 161 | image_path = db.image_path(db_ind) 162 | image = cv2.imread(image_path) 163 | 164 | orig_detections = db.detections(db_ind) 165 | keep_inds = np.arange(orig_detections.shape[0]) 166 | 167 | # clip the detections 168 | detections = orig_detections.copy() 169 | border = [0, image.shape[0], 0, image.shape[1]] 170 | detections, clip_inds = clip_detections(border, detections) 171 | keep_inds = keep_inds[clip_inds] 172 | 173 | scale, ref_ind = ref_scale(detections, random_crop=rand_crop) 174 | scale = np.random.choice(rand_scales) if scale is None else scale 175 | 176 | orig_detections[:, 0:4:2] *= scale 177 | orig_detections[:, 1:4:2] *= scale 178 | 179 | image, detections = scale_image_detections(image, detections, scale) 180 | ref_detection = detections[ref_ind].copy() 181 | 182 | image, detections, border = crop_image_dets(image, detections, ref_ind, input_size, rand_center=rand_center) 183 | 184 | detections, clip_inds = clip_detections(border, detections) 185 | keep_inds = keep_inds[clip_inds] 186 | 187 | width_ratio = output_size[1] / input_size[1] 188 | height_ratio = output_size[0] / input_size[0] 189 | 190 | # flipping an image randomly 191 | if not debug and np.random.uniform() > 0.5: 192 | image[:] = image[:, ::-1, :] 193 | width = image.shape[1] 194 | detections[:, [0, 2]] = width - detections[:, [2, 0]] - 1 195 | create_attention_mask([att[b_ind, 0] for att in attentions], att_ratios, att_ranges, detections) 196 | 197 | if debug: 198 | dimage = image.copy() 199 | for det in detections.astype(np.int32): 200 | cv2.rectangle(dimage, 201 | (det[0], det[1]), 202 | (det[2], det[3]), 203 | (0, 255, 0), 2 204 | ) 205 | 
cv2.imwrite('debug/{:03d}.jpg'.format(b_ind), dimage) 206 | overlaps = bbox_overlaps(detections, orig_detections[keep_inds]) > 0.5 207 | 208 | if not debug: 209 | image = image.astype(np.float32) / 255. 210 | color_jittering_(data_rng, image) 211 | lighting_(data_rng, image, 0.1, db.eig_val, db.eig_vec) 212 | normalize_(image, db.mean, db.std) 213 | images[b_ind] = image.transpose((2, 0, 1)) 214 | 215 | for ind, (detection, overlap) in enumerate(zip(detections, overlaps)): 216 | category = int(detection[-1]) - 1 217 | 218 | xtl, ytl = detection[0], detection[1] 219 | xbr, ybr = detection[2], detection[3] 220 | 221 | det_height = int(ybr) - int(ytl) 222 | det_width = int(xbr) - int(xtl) 223 | det_max = max(det_height, det_width) 224 | 225 | valid = det_max >= min_scale 226 | 227 | fxtl = (xtl * width_ratio) 228 | fytl = (ytl * height_ratio) 229 | fxbr = (xbr * width_ratio) 230 | fybr = (ybr * height_ratio) 231 | 232 | xtl = int(fxtl) 233 | ytl = int(fytl) 234 | xbr = int(fxbr) 235 | ybr = int(fybr) 236 | 237 | width = detection[2] - detection[0] 238 | height = detection[3] - detection[1] 239 | 240 | width = math.ceil(width * width_ratio) 241 | height = math.ceil(height * height_ratio) 242 | 243 | if gaussian_rad == -1: 244 | radius = gaussian_radius((height, width), gaussian_iou) 245 | radius = max(0, int(radius)) 246 | else: 247 | radius = gaussian_rad 248 | 249 | if overlap and valid: 250 | draw_gaussian(tl_heats[b_ind, category], [xtl, ytl], radius) 251 | draw_gaussian(br_heats[b_ind, category], [xbr, ybr], radius) 252 | 253 | tag_ind = tag_lens[b_ind] 254 | tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl] 255 | br_regrs[b_ind, tag_ind, :] = [fxbr - xbr, fybr - ybr] 256 | tl_tags[b_ind, tag_ind] = ytl * output_size[1] + xtl 257 | br_tags[b_ind, tag_ind] = ybr * output_size[1] + xbr 258 | tag_lens[b_ind] += 1 259 | else: 260 | draw_gaussian(tl_valids[b_ind, category], [xtl, ytl], radius) 261 | draw_gaussian(br_valids[b_ind, category], [xbr, ybr], radius) 262 | 263 | tl_valids = (tl_valids == 0).astype(np.float32) 264 | br_valids = (br_valids == 0).astype(np.float32) 265 | 266 | for b_ind in range(batch_size): 267 | tag_len = tag_lens[b_ind] 268 | tag_masks[b_ind, :tag_len] = 1 269 | 270 | images = torch.from_numpy(images) 271 | tl_heats = torch.from_numpy(tl_heats) 272 | br_heats = torch.from_numpy(br_heats) 273 | tl_regrs = torch.from_numpy(tl_regrs) 274 | br_regrs = torch.from_numpy(br_regrs) 275 | tl_tags = torch.from_numpy(tl_tags) 276 | br_tags = torch.from_numpy(br_tags) 277 | tag_masks = torch.from_numpy(tag_masks) 278 | tl_valids = torch.from_numpy(tl_valids) 279 | br_valids = torch.from_numpy(br_valids) 280 | attentions = [torch.from_numpy(att) for att in attentions] 281 | 282 | return { 283 | "xs": [images], 284 | "ys": [tl_heats, br_heats, tag_masks, tl_regrs, br_regrs, tl_tags, br_tags, tl_valids, br_valids, attentions] 285 | }, k_ind 286 | -------------------------------------------------------------------------------- /core/sample/utils.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import random 4 | 5 | def grayscale(image): 6 | return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) 7 | 8 | def normalize_(image, mean, std): 9 | image -= mean 10 | image /= std 11 | 12 | def lighting_(data_rng, image, alphastd, eigval, eigvec): 13 | alpha = data_rng.normal(scale=alphastd, size=(3, )) 14 | image += np.dot(eigvec, eigval * alpha) 15 | 16 | def blend_(alpha, image1, image2): 17 | image1 *= alpha 18 | 
image2 *= (1 - alpha) 19 | image1 += image2 20 | 21 | def saturation_(data_rng, image, gs, gs_mean, var): 22 | alpha = 1. + data_rng.uniform(low=-var, high=var) 23 | blend_(alpha, image, gs[:, :, None]) 24 | 25 | def brightness_(data_rng, image, gs, gs_mean, var): 26 | alpha = 1. + data_rng.uniform(low=-var, high=var) 27 | image *= alpha 28 | 29 | def contrast_(data_rng, image, gs, gs_mean, var): 30 | alpha = 1. + data_rng.uniform(low=-var, high=var) 31 | blend_(alpha, image, gs_mean) 32 | 33 | def color_jittering_(data_rng, image): 34 | functions = [brightness_, contrast_, saturation_] 35 | random.shuffle(functions) 36 | 37 | gs = grayscale(image) 38 | gs_mean = gs.mean() 39 | for f in functions: 40 | f(data_rng, image, gs, gs_mean, 0.4) 41 | 42 | def gaussian2D(shape, sigma=1): 43 | m, n = [(ss - 1.) / 2. for ss in shape] 44 | y, x = np.ogrid[-m:m+1,-n:n+1] 45 | 46 | h = np.exp(-(x * x + y * y) / (2 * sigma * sigma)) 47 | h[h < np.finfo(h.dtype).eps * h.max()] = 0 48 | return h 49 | 50 | def draw_gaussian(heatmap, center, radius, k=1): 51 | diameter = 2 * radius + 1 52 | gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6) 53 | 54 | x, y = center 55 | 56 | height, width = heatmap.shape[0:2] 57 | 58 | left, right = min(x, radius), min(width - x, radius + 1) 59 | top, bottom = min(y, radius), min(height - y, radius + 1) 60 | 61 | masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] 62 | masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right] 63 | np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap) 64 | 65 | def gaussian_radius(det_size, min_overlap): 66 | height, width = det_size 67 | 68 | a1 = 1 69 | b1 = (height + width) 70 | c1 = width * height * (1 - min_overlap) / (1 + min_overlap) 71 | sq1 = np.sqrt(b1 ** 2 - 4 * a1 * c1) 72 | r1 = (b1 - sq1) / (2 * a1) 73 | 74 | a2 = 4 75 | b2 = 2 * (height + width) 76 | c2 = (1 - min_overlap) * width * height 77 | sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2) 78 | r2 = (b2 - sq2) / (2 * a2) 79 | 80 | a3 = 4 * min_overlap 81 | b3 = -2 * min_overlap * (height + width) 82 | c3 = (min_overlap - 1) * width * height 83 | sq3 = np.sqrt(b3 ** 2 - 4 * a3 * c3) 84 | r3 = (b3 + sq3) / (2 * a3) 85 | return min(r1, r2, r3) 86 | 87 | def _get_border(border, size): 88 | i = 1 89 | while size - border // i <= border // i: 90 | i *= 2 91 | return border // i 92 | 93 | def random_crop(image, detections, random_scales, view_size, border=64): 94 | view_height, view_width = view_size 95 | image_height, image_width = image.shape[0:2] 96 | 97 | scale = np.random.choice(random_scales) 98 | height = int(view_height * scale) 99 | width = int(view_width * scale) 100 | 101 | cropped_image = np.zeros((height, width, 3), dtype=image.dtype) 102 | 103 | w_border = _get_border(border, image_width) 104 | h_border = _get_border(border, image_height) 105 | 106 | ctx = np.random.randint(low=w_border, high=image_width - w_border) 107 | cty = np.random.randint(low=h_border, high=image_height - h_border) 108 | 109 | x0, x1 = max(ctx - width // 2, 0), min(ctx + width // 2, image_width) 110 | y0, y1 = max(cty - height // 2, 0), min(cty + height // 2, image_height) 111 | 112 | left_w, right_w = ctx - x0, x1 - ctx 113 | top_h, bottom_h = cty - y0, y1 - cty 114 | 115 | # crop image 116 | cropped_ctx, cropped_cty = width // 2, height // 2 117 | x_slice = slice(cropped_ctx - left_w, cropped_ctx + right_w) 118 | y_slice = slice(cropped_cty - top_h, cropped_cty + bottom_h) 119 | cropped_image[y_slice, x_slice, :] = image[y0:y1, x0:x1, 
:] 120 | 121 | # crop detections 122 | cropped_detections = detections.copy() 123 | cropped_detections[:, 0:4:2] -= x0 124 | cropped_detections[:, 1:4:2] -= y0 125 | cropped_detections[:, 0:4:2] += cropped_ctx - left_w 126 | cropped_detections[:, 1:4:2] += cropped_cty - top_h 127 | 128 | return cropped_image, cropped_detections 129 | 130 | def crop_image(image, center, size, output_size=None): 131 | if output_size == None: 132 | output_size = size 133 | 134 | cty, ctx = center 135 | height, width = size 136 | o_height, o_width = output_size 137 | im_height, im_width = image.shape[0:2] 138 | cropped_image = np.zeros((o_height, o_width, 3), dtype=image.dtype) 139 | 140 | x0, x1 = max(0, ctx - width // 2), min(ctx + width // 2, im_width) 141 | y0, y1 = max(0, cty - height // 2), min(cty + height // 2, im_height) 142 | 143 | left, right = ctx - x0, x1 - ctx 144 | top, bottom = cty - y0, y1 - cty 145 | 146 | cropped_cty, cropped_ctx = o_height // 2, o_width // 2 147 | y_slice = slice(cropped_cty - top, cropped_cty + bottom) 148 | x_slice = slice(cropped_ctx - left, cropped_ctx + right) 149 | cropped_image[y_slice, x_slice, :] = image[y0:y1, x0:x1, :] 150 | 151 | border = np.array([ 152 | cropped_cty - top, 153 | cropped_cty + bottom, 154 | cropped_ctx - left, 155 | cropped_ctx + right 156 | ], dtype=np.float32) 157 | 158 | offset = np.array([ 159 | cty - o_height // 2, 160 | ctx - o_width // 2 161 | ]) 162 | 163 | return cropped_image, border, offset 164 | -------------------------------------------------------------------------------- /core/test/__init__.py: -------------------------------------------------------------------------------- 1 | from .cornernet import cornernet 2 | from .cornernet_saccade import cornernet_saccade 3 | 4 | def test_func(sys_config, db, nnet, result_dir, debug=False): 5 | return globals()[sys_config.sampling_function](db, nnet, result_dir, debug=debug) 6 | -------------------------------------------------------------------------------- /core/test/cornernet.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import json 4 | import numpy as np 5 | import torch 6 | 7 | from tqdm import tqdm 8 | 9 | from ..utils import Timer 10 | from ..vis_utils import draw_bboxes 11 | from ..sample.utils import crop_image 12 | from ..external.nms import soft_nms, soft_nms_merge 13 | 14 | def rescale_dets_(detections, ratios, borders, sizes): 15 | xs, ys = detections[..., 0:4:2], detections[..., 1:4:2] 16 | xs /= ratios[:, 1][:, None, None] 17 | ys /= ratios[:, 0][:, None, None] 18 | xs -= borders[:, 2][:, None, None] 19 | ys -= borders[:, 0][:, None, None] 20 | np.clip(xs, 0, sizes[:, 1][:, None, None], out=xs) 21 | np.clip(ys, 0, sizes[:, 0][:, None, None], out=ys) 22 | 23 | def decode(nnet, images, K, ae_threshold=0.5, kernel=3, num_dets=1000): 24 | detections = nnet.test([images], ae_threshold=ae_threshold, test=True, K=K, kernel=kernel, num_dets=num_dets)[0] 25 | return detections.data.cpu().numpy() 26 | 27 | def cornernet(db, nnet, result_dir, debug=False, decode_func=decode): 28 | debug_dir = os.path.join(result_dir, "debug") 29 | if not os.path.exists(debug_dir): 30 | os.makedirs(debug_dir) 31 | 32 | if db.split != "trainval2014": 33 | db_inds = db.db_inds[:100] if debug else db.db_inds 34 | else: 35 | db_inds = db.db_inds[:100] if debug else db.db_inds[:5000] 36 | 37 | num_images = db_inds.size 38 | categories = db.configs["categories"] 39 | 40 | timer = Timer() 41 | top_bboxes = {} 42 | for ind in tqdm(range(0, 
num_images), ncols=80, desc="locating kps"): 43 | db_ind = db_inds[ind] 44 | 45 | image_id = db.image_ids(db_ind) 46 | image_path = db.image_path(db_ind) 47 | image = cv2.imread(image_path) 48 | 49 | timer.tic() 50 | top_bboxes[image_id] = cornernet_inference(db, nnet, image) 51 | timer.toc() 52 | 53 | if debug: 54 | image_path = db.image_path(db_ind) 55 | image = cv2.imread(image_path) 56 | bboxes = { 57 | db.cls2name(j): top_bboxes[image_id][j] 58 | for j in range(1, categories + 1) 59 | } 60 | image = draw_bboxes(image, bboxes) 61 | debug_file = os.path.join(debug_dir, "{}.jpg".format(db_ind)) 62 | cv2.imwrite(debug_file, image) 63 | print('average time: {}'.format(timer.average_time)) 64 | 65 | result_json = os.path.join(result_dir, "results.json") 66 | detections = db.convert_to_coco(top_bboxes) 67 | with open(result_json, "w") as f: 68 | json.dump(detections, f) 69 | 70 | cls_ids = list(range(1, categories + 1)) 71 | image_ids = [db.image_ids(ind) for ind in db_inds] 72 | db.evaluate(result_json, cls_ids, image_ids) 73 | return 0 74 | 75 | def cornernet_inference(db, nnet, image, decode_func=decode): 76 | K = db.configs["top_k"] 77 | ae_threshold = db.configs["ae_threshold"] 78 | nms_kernel = db.configs["nms_kernel"] 79 | num_dets = db.configs["num_dets"] 80 | test_flipped = db.configs["test_flipped"] 81 | 82 | input_size = db.configs["input_size"] 83 | output_size = db.configs["output_sizes"][0] 84 | 85 | scales = db.configs["test_scales"] 86 | weight_exp = db.configs["weight_exp"] 87 | merge_bbox = db.configs["merge_bbox"] 88 | categories = db.configs["categories"] 89 | nms_threshold = db.configs["nms_threshold"] 90 | max_per_image = db.configs["max_per_image"] 91 | nms_algorithm = { 92 | "nms": 0, 93 | "linear_soft_nms": 1, 94 | "exp_soft_nms": 2 95 | }[db.configs["nms_algorithm"]] 96 | 97 | height, width = image.shape[0:2] 98 | 99 | height_scale = (input_size[0] + 1) // output_size[0] 100 | width_scale = (input_size[1] + 1) // output_size[1] 101 | 102 | im_mean = torch.cuda.FloatTensor(db.mean).reshape(1, 3, 1, 1) 103 | im_std = torch.cuda.FloatTensor(db.std).reshape(1, 3, 1, 1) 104 | 105 | detections = [] 106 | for scale in scales: 107 | new_height = int(height * scale) 108 | new_width = int(width * scale) 109 | new_center = np.array([new_height // 2, new_width // 2]) 110 | 111 | inp_height = new_height | 127 112 | inp_width = new_width | 127 113 | 114 | images = np.zeros((1, 3, inp_height, inp_width), dtype=np.float32) 115 | ratios = np.zeros((1, 2), dtype=np.float32) 116 | borders = np.zeros((1, 4), dtype=np.float32) 117 | sizes = np.zeros((1, 2), dtype=np.float32) 118 | 119 | out_height, out_width = (inp_height + 1) // height_scale, (inp_width + 1) // width_scale 120 | height_ratio = out_height / inp_height 121 | width_ratio = out_width / inp_width 122 | 123 | resized_image = cv2.resize(image, (new_width, new_height)) 124 | resized_image, border, offset = crop_image(resized_image, new_center, [inp_height, inp_width]) 125 | 126 | resized_image = resized_image / 255. 
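# Note on the padding above: `inp_height = new_height | 127` sets the low
# seven bits, so inp_height + 1 is always a multiple of 128 and
# (inp_height + 1) // height_scale is an exact integer heatmap size.
# For example, assuming the usual 511/128 input/output configuration
# (height_scale = (511 + 1) // 128 = 4), new_height = 300 pads to
# inp_height = 383 and the heatmap height is (383 + 1) // 4 = 96.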
127 | 128 | images[0] = resized_image.transpose((2, 0, 1)) 129 | borders[0] = border 130 | sizes[0] = [int(height * scale), int(width * scale)] 131 | ratios[0] = [height_ratio, width_ratio] 132 | 133 | if test_flipped: 134 | images = np.concatenate((images, images[:, :, :, ::-1]), axis=0) 135 | images = torch.from_numpy(images).cuda() 136 | images -= im_mean 137 | images /= im_std 138 | 139 | dets = decode_func(nnet, images, K, ae_threshold=ae_threshold, kernel=nms_kernel, num_dets=num_dets) 140 | if test_flipped: 141 | dets[1, :, [0, 2]] = out_width - dets[1, :, [2, 0]] 142 | dets = dets.reshape(1, -1, 8) 143 | 144 | rescale_dets_(dets, ratios, borders, sizes) 145 | dets[:, :, 0:4] /= scale 146 | detections.append(dets) 147 | 148 | detections = np.concatenate(detections, axis=1) 149 | 150 | classes = detections[..., -1] 151 | classes = classes[0] 152 | detections = detections[0] 153 | 154 | # reject detections with negative scores 155 | keep_inds = (detections[:, 4] > -1) 156 | detections = detections[keep_inds] 157 | classes = classes[keep_inds] 158 | 159 | top_bboxes = {} 160 | for j in range(categories): 161 | keep_inds = (classes == j) 162 | top_bboxes[j + 1] = detections[keep_inds][:, 0:7].astype(np.float32) 163 | if merge_bbox: 164 | soft_nms_merge(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm, weight_exp=weight_exp) 165 | else: 166 | soft_nms(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm) 167 | top_bboxes[j + 1] = top_bboxes[j + 1][:, 0:5] 168 | 169 | scores = np.hstack([top_bboxes[j][:, -1] for j in range(1, categories + 1)]) 170 | if len(scores) > max_per_image: 171 | kth = len(scores) - max_per_image 172 | thresh = np.partition(scores, kth)[kth] 173 | for j in range(1, categories + 1): 174 | keep_inds = (top_bboxes[j][:, -1] >= thresh) 175 | top_bboxes[j] = top_bboxes[j][keep_inds] 176 | return top_bboxes 177 | -------------------------------------------------------------------------------- /core/test/cornernet_saccade.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import math 4 | import json 5 | import torch 6 | import torch.nn as nn 7 | import numpy as np 8 | 9 | from tqdm import tqdm 10 | 11 | from ..utils import Timer 12 | from ..vis_utils import draw_bboxes 13 | from ..external.nms import soft_nms 14 | 15 | def crop_image_gpu(image, center, size, out_image): 16 | cty, ctx = center 17 | height, width = size 18 | o_height, o_width = out_image.shape[1:3] 19 | im_height, im_width = image.shape[1:3] 20 | 21 | scale = o_height / max(height, width) 22 | x0, x1 = max(0, ctx - width // 2), min(ctx + width // 2, im_width) 23 | y0, y1 = max(0, cty - height // 2), min(cty + height // 2, im_height) 24 | 25 | left, right = ctx - x0, x1 - ctx 26 | top, bottom = cty - y0, y1 - cty 27 | 28 | cropped_cty, cropped_ctx = o_height // 2, o_width // 2 29 | out_y0, out_y1 = cropped_cty - int(top * scale), cropped_cty + int(bottom * scale) 30 | out_x0, out_x1 = cropped_ctx - int(left * scale), cropped_ctx + int(right * scale) 31 | 32 | new_height = out_y1 - out_y0 33 | new_width = out_x1 - out_x0 34 | image = image[:, y0:y1, x0:x1].unsqueeze(0) 35 | out_image[:, out_y0:out_y1, out_x0:out_x1] = nn.functional.interpolate( 36 | image, size=[new_height, new_width], mode='bilinear' 37 | )[0] 38 | 39 | return np.array([cty - height // 2, ctx - width // 2]) 40 | 41 | def remap_dets_(detections, scales, offsets): 42 | xs, ys = detections[..., 0:4:2], detections[..., 1:4:2] 43 | 44 | xs /= scales.reshape(-1, 1, 1) 
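# ys is divided by the same per-patch scale below; together with the offset
# shifts that follow, this maps detections from cropped-patch coordinates
# back to full-image coordinates, modifying the detections array in place.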
45 | ys /= scales.reshape(-1, 1, 1) 46 | xs += offsets[:, 1][:, None, None] 47 | ys += offsets[:, 0][:, None, None] 48 | 49 | def att_nms(atts, ks): 50 | pads = [(k - 1) // 2 for k in ks] 51 | pools = [nn.functional.max_pool2d(att, (k, k), stride=1, padding=pad) for k, att, pad in zip(ks, atts, pads)] 52 | keeps = [(att == pool).float() for att, pool in zip(atts, pools)] 53 | atts = [att * keep for att, keep in zip(atts, keeps)] 54 | return atts 55 | 56 | def batch_decode(db, nnet, images, no_att=False): 57 | K = db.configs["top_k"] 58 | ae_threshold = db.configs["ae_threshold"] 59 | kernel = db.configs["nms_kernel"] 60 | num_dets = db.configs["num_dets"] 61 | 62 | att_nms_ks = db.configs["att_nms_ks"] 63 | att_ranges = db.configs["att_ranges"] 64 | 65 | num_images = images.shape[0] 66 | detections = [] 67 | attentions = [[] for _ in range(len(att_ranges))] 68 | 69 | batch_size = 32 70 | for b_ind in range(math.ceil(num_images / batch_size)): 71 | b_start = b_ind * batch_size 72 | b_end = min(num_images, (b_ind + 1) * batch_size) 73 | 74 | b_images = images[b_start:b_end] 75 | b_outputs = nnet.test( 76 | [b_images], ae_threshold=ae_threshold, K=K, kernel=kernel, 77 | test=True, num_dets=num_dets, no_border=True, no_att=no_att 78 | ) 79 | if no_att: 80 | b_detections = b_outputs 81 | else: 82 | b_detections = b_outputs[0] 83 | b_attentions = b_outputs[1] 84 | b_attentions = att_nms(b_attentions, att_nms_ks) 85 | b_attentions = [b_attention.data.cpu().numpy() for b_attention in b_attentions] 86 | 87 | b_detections = b_detections.data.cpu().numpy() 88 | 89 | detections.append(b_detections) 90 | if not no_att: 91 | for attention, b_attention in zip(attentions, b_attentions): 92 | attention.append(b_attention) 93 | 94 | if not no_att: 95 | attentions = [np.concatenate(atts, axis=0) for atts in attentions] if detections else None 96 | detections = np.concatenate(detections, axis=0) if detections else np.zeros((0, num_dets, 8)) 97 | return detections, attentions 98 | 99 | def decode_atts(db, atts, att_scales, scales, offsets, height, width, thresh, ignore_same=False): 100 | att_ranges = db.configs["att_ranges"] 101 | att_ratios = db.configs["att_ratios"] 102 | input_size = db.configs["input_size"] 103 | 104 | next_ys, next_xs, next_scales, next_scores = [], [], [], [] 105 | 106 | num_atts = atts[0].shape[0] 107 | for aind in range(num_atts): 108 | for att, att_range, att_ratio, att_scale in zip(atts, att_ranges, att_ratios, att_scales): 109 | ys, xs = np.where(att[aind, 0] > thresh) 110 | scores = att[aind, 0, ys, xs] 111 | 112 | ys = ys * att_ratio / scales[aind] + offsets[aind, 0] 113 | xs = xs * att_ratio / scales[aind] + offsets[aind, 1] 114 | 115 | keep = (ys >= 0) & (ys < height) & (xs >= 0) & (xs < width) 116 | ys, xs, scores = ys[keep], xs[keep], scores[keep] 117 | 118 | next_scale = att_scale * scales[aind] 119 | if (ignore_same and att_scale <= 1) or scales[aind] > 2 or next_scale > 4: 120 | continue 121 | 122 | next_scales += [next_scale] * len(xs) 123 | next_scores += scores.tolist() 124 | next_ys += ys.tolist() 125 | next_xs += xs.tolist() 126 | next_ys = np.array(next_ys) 127 | next_xs = np.array(next_xs) 128 | next_scales = np.array(next_scales) 129 | next_scores = np.array(next_scores) 130 | return np.stack((next_ys, next_xs, next_scales, next_scores), axis=1) 131 | 132 | def get_ref_locs(dets): 133 | keep = dets[:, 4] > 0.5 134 | dets = dets[keep] 135 | 136 | ref_xs = (dets[:, 0] + dets[:, 2]) / 2 137 | ref_ys = (dets[:, 1] + dets[:, 3]) / 2 138 | 139 | ref_maxhws = 
np.maximum(dets[:, 2] - dets[:, 0], dets[:, 3] - dets[:, 1]) 140 | ref_scales = np.zeros_like(ref_maxhws) 141 | ref_scores = dets[:, 4] 142 | 143 | large_inds = ref_maxhws > 96 144 | medium_inds = (ref_maxhws > 32) & (ref_maxhws <= 96) 145 | small_inds = ref_maxhws <= 32 146 | 147 | ref_scales[large_inds] = 192 / ref_maxhws[large_inds] 148 | ref_scales[medium_inds] = 64 / ref_maxhws[medium_inds] 149 | ref_scales[small_inds] = 24 / ref_maxhws[small_inds] 150 | 151 | new_locations = np.stack((ref_ys, ref_xs, ref_scales, ref_scores), axis=1) 152 | new_locations[:, 3] = 1 153 | return new_locations 154 | 155 | def get_locs(db, nnet, image, im_mean, im_std, att_scales, thresh, sizes, ref_dets=True): 156 | att_ranges = db.configs["att_ranges"] 157 | att_ratios = db.configs["att_ratios"] 158 | input_size = db.configs["input_size"] 159 | 160 | height, width = image.shape[1:3] 161 | 162 | locations = [] 163 | for size in sizes: 164 | scale = size / max(height, width) 165 | location = [height // 2, width // 2, scale] 166 | locations.append(location) 167 | 168 | locations = np.array(locations, dtype=np.float32) 169 | images, offsets = prepare_images(db, image, locations, flipped=False) 170 | 171 | images -= im_mean 172 | images /= im_std 173 | 174 | dets, atts = batch_decode(db, nnet, images) 175 | 176 | scales = locations[:, 2] 177 | next_locations = decode_atts(db, atts, att_scales, scales, offsets, height, width, thresh) 178 | 179 | rescale_dets_(db, dets) 180 | remap_dets_(dets, scales, offsets) 181 | 182 | dets = dets.reshape(-1, 8) 183 | keep = dets[:, 4] > 0.3 184 | dets = dets[keep] 185 | 186 | if ref_dets: 187 | ref_locations = get_ref_locs(dets) 188 | next_locations = np.concatenate((next_locations, ref_locations), axis=0) 189 | next_locations = location_nms(next_locations, thresh=16) 190 | return dets, next_locations, atts 191 | 192 | def location_nms(locations, thresh=15): 193 | next_locations = [] 194 | sorted_inds = np.argsort(locations[:, -1])[::-1] 195 | 196 | locations = locations[sorted_inds] 197 | ys = locations[:, 0] 198 | xs = locations[:, 1] 199 | scales = locations[:, 2] 200 | 201 | dist_ys = np.absolute(ys.reshape(-1, 1) - ys.reshape(1, -1)) 202 | dist_xs = np.absolute(xs.reshape(-1, 1) - xs.reshape(1, -1)) 203 | dists = np.minimum(dist_ys, dist_xs) 204 | ratios = scales.reshape(-1, 1) / scales.reshape(1, -1) 205 | while dists.shape[0] > 0: 206 | next_locations.append(locations[0]) 207 | 208 | scale = scales[0] 209 | dist = dists[0] 210 | ratio = ratios[0] 211 | 212 | keep = (dist > (thresh / scale)) | (ratio > 1.2) | (ratio < 0.8) 213 | 214 | locations = locations[keep] 215 | 216 | scales = scales[keep] 217 | dists = dists[keep, :] 218 | dists = dists[:, keep] 219 | ratios = ratios[keep, :] 220 | ratios = ratios[:, keep] 221 | return np.stack(next_locations) if next_locations else np.zeros((0, 4)) 222 | 223 | def prepare_images(db, image, locs, flipped=True): 224 | input_size = db.configs["input_size"] 225 | num_patches = locs.shape[0] 226 | 227 | images = torch.cuda.FloatTensor(num_patches, 3, input_size[0], input_size[1]).fill_(0) 228 | offsets = np.zeros((num_patches, 2), dtype=np.float32) 229 | for ind, (y, x, scale) in enumerate(locs[:, :3]): 230 | crop_height = int(input_size[0] / scale) 231 | crop_width = int(input_size[1] / scale) 232 | offsets[ind] = crop_image_gpu(image, [int(y), int(x)], [crop_height, crop_width], images[ind]) 233 | return images, offsets 234 | 235 | def rescale_dets_(db, dets): 236 | input_size = db.configs["input_size"] 237 | output_size = 
db.configs["output_sizes"][0] 238 | 239 | ratios = [o / i for o, i in zip(output_size, input_size)] 240 | dets[..., 0:4:2] /= ratios[1] 241 | dets[..., 1:4:2] /= ratios[0] 242 | 243 | def cornernet_saccade(db, nnet, result_dir, debug=False, decode_func=batch_decode): 244 | debug_dir = os.path.join(result_dir, "debug") 245 | if not os.path.exists(debug_dir): 246 | os.makedirs(debug_dir) 247 | 248 | if db.split != "trainval2014": 249 | db_inds = db.db_inds[:500] if debug else db.db_inds 250 | else: 251 | db_inds = db.db_inds[:100] if debug else db.db_inds[:5000] 252 | 253 | num_images = db_inds.size 254 | categories = db.configs["categories"] 255 | 256 | timer = Timer() 257 | top_bboxes = {} 258 | for k_ind in tqdm(range(0, num_images), ncols=80, desc="locating kps"): 259 | db_ind = db_inds[k_ind] 260 | 261 | image_id = db.image_ids(db_ind) 262 | image_path = db.image_path(db_ind) 263 | image = cv2.imread(image_path) 264 | 265 | timer.tic() 266 | top_bboxes[image_id] = cornernet_saccade_inference(db, nnet, image) 267 | timer.toc() 268 | 269 | if debug: 270 | image_path = db.image_path(db_ind) 271 | image = cv2.imread(image_path) 272 | bboxes = { 273 | db.cls2name(j): top_bboxes[image_id][j] 274 | for j in range(1, categories + 1) 275 | } 276 | image = draw_bboxes(image, bboxes) 277 | debug_file = os.path.join(debug_dir, "{}.jpg".format(db_ind)) 278 | cv2.imwrite(debug_file, image) 279 | print('average time: {}'.format(timer.average_time)) 280 | 281 | result_json = os.path.join(result_dir, "results.json") 282 | detections = db.convert_to_coco(top_bboxes) 283 | with open(result_json, "w") as f: 284 | json.dump(detections, f) 285 | 286 | cls_ids = list(range(1, categories + 1)) 287 | image_ids = [db.image_ids(ind) for ind in db_inds] 288 | db.evaluate(result_json, cls_ids, image_ids) 289 | return 0 290 | 291 | def cornernet_saccade_inference(db, nnet, image, decode_func=batch_decode): 292 | init_sizes = db.configs["init_sizes"] 293 | ref_dets = db.configs["ref_dets"] 294 | 295 | att_thresholds = db.configs["att_thresholds"] 296 | att_scales = db.configs["att_scales"] 297 | att_max_crops = db.configs["att_max_crops"] 298 | 299 | categories = db.configs["categories"] 300 | nms_threshold = db.configs["nms_threshold"] 301 | max_per_image = db.configs["max_per_image"] 302 | nms_algorithm = { 303 | "nms": 0, 304 | "linear_soft_nms": 1, 305 | "exp_soft_nms": 2 306 | }[db.configs["nms_algorithm"]] 307 | 308 | num_iterations = len(att_thresholds) 309 | 310 | im_mean = torch.cuda.FloatTensor(db.mean).reshape(1, 3, 1, 1) 311 | im_std = torch.cuda.FloatTensor(db.std).reshape(1, 3, 1, 1) 312 | 313 | detections = [] 314 | height, width = image.shape[0:2] 315 | 316 | image = image / 255. 
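# The normalized image is converted to CHW and uploaded to the GPU once;
# crops for all subsequent saccade iterations are then taken directly on
# the GPU by crop_image_gpu (via prepare_images), which avoids repeated
# host-to-device copies of the full image.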
317 | image = image.transpose((2, 0, 1)).copy() 318 | image = torch.from_numpy(image).cuda(non_blocking=True) 319 | 320 | dets, locations, atts = get_locs( 321 | db, nnet, image, im_mean, im_std, 322 | att_scales[0], att_thresholds[0], 323 | init_sizes, ref_dets=ref_dets 324 | ) 325 | 326 | detections = [dets] 327 | num_patches = locations.shape[0] 328 | 329 | num_crops = 0 330 | for ind in range(1, num_iterations + 1): 331 | if num_patches == 0: 332 | break 333 | 334 | if num_crops + num_patches > att_max_crops: 335 | max_crops = min(att_max_crops - num_crops, num_patches) 336 | locations = locations[:max_crops] 337 | 338 | num_patches = locations.shape[0] 339 | num_crops += locations.shape[0] 340 | no_att = (ind == num_iterations) 341 | 342 | images, offsets = prepare_images(db, image, locations, flipped=False) 343 | images -= im_mean 344 | images /= im_std 345 | 346 | dets, atts = decode_func(db, nnet, images, no_att=no_att) 347 | dets = dets.reshape(num_patches, -1, 8) 348 | 349 | rescale_dets_(db, dets) 350 | remap_dets_(dets, locations[:, 2], offsets) 351 | 352 | dets = dets.reshape(-1, 8) 353 | keeps = (dets[:, 4] > -1) 354 | dets = dets[keeps] 355 | 356 | detections.append(dets) 357 | 358 | if num_crops == att_max_crops: 359 | break 360 | 361 | if ind < num_iterations: 362 | att_threshold = att_thresholds[ind] 363 | att_scale = att_scales[ind] 364 | 365 | next_locations = decode_atts( 366 | db, atts, att_scale, locations[:, 2], offsets, height, width, att_threshold, ignore_same=True 367 | ) 368 | 369 | if ref_dets: 370 | ref_locations = get_ref_locs(dets) 371 | next_locations = np.concatenate((next_locations, ref_locations), axis=0) 372 | next_locations = location_nms(next_locations, thresh=16) 373 | 374 | locations = next_locations 375 | num_patches = locations.shape[0] 376 | 377 | detections = np.concatenate(detections, axis=0) 378 | classes = detections[..., -1] 379 | 380 | top_bboxes = {} 381 | for j in range(categories): 382 | keep_inds = (classes == j) 383 | top_bboxes[j + 1] = detections[keep_inds][:, 0:7].astype(np.float32) 384 | keep_inds = soft_nms(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm, sigma=0.7) 385 | top_bboxes[j + 1] = top_bboxes[j + 1][keep_inds, 0:5] 386 | 387 | scores = np.hstack([top_bboxes[j][:, -1] for j in range(1, categories + 1)]) 388 | if len(scores) > max_per_image: 389 | kth = len(scores) - max_per_image 390 | thresh = np.partition(scores, kth)[kth] 391 | for j in range(1, categories + 1): 392 | keep_inds = (top_bboxes[j][:, -1] >= thresh) 393 | top_bboxes[j] = top_bboxes[j][keep_inds] 394 | return top_bboxes 395 | -------------------------------------------------------------------------------- /core/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .tqdm import stdout_to_tqdm 2 | from .timer import Timer 3 | -------------------------------------------------------------------------------- /core/utils/timer.py: -------------------------------------------------------------------------------- 1 | import time 2 | 3 | class Timer(object): 4 | """A simple timer.""" 5 | def __init__(self): 6 | self.total_time = 0. 7 | self.calls = 0 8 | self.start_time = 0. 9 | self.diff = 0. 10 | self.average_time = 0. 
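# Usage sketch: call tic() before the timed section and toc() after it;
# toc() returns the running average by default, or the last measured
# interval with toc(average=False). For example:
#   timer = Timer()
#   timer.tic()
#   run_inference()  # hypothetical timed call
#   avg_seconds = timer.toc()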
11 | 12 | def tic(self): 13 | # using time.time instead of time.clock because time.clock 14 | # does not normalize for multithreading 15 | self.start_time = time.time() 16 | 17 | def toc(self, average=True): 18 | self.diff = time.time() - self.start_time 19 | self.total_time += self.diff 20 | self.calls += 1 21 | self.average_time = self.total_time / self.calls 22 | if average: 23 | return self.average_time 24 | else: 25 | return self.diff 26 | -------------------------------------------------------------------------------- /core/utils/tqdm.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | import contextlib 4 | 5 | from tqdm import tqdm 6 | 7 | class TqdmFile(object): 8 | dummy_file = None 9 | def __init__(self, dummy_file): 10 | self.dummy_file = dummy_file 11 | 12 | def write(self, x): 13 | if len(x.rstrip()) > 0: 14 | tqdm.write(x, file=self.dummy_file) 15 | 16 | @contextlib.contextmanager 17 | def stdout_to_tqdm(): 18 | save_stdout = sys.stdout 19 | try: 20 | sys.stdout = TqdmFile(sys.stdout) 21 | yield save_stdout 22 | except Exception as exc: 23 | raise exc 24 | finally: 25 | sys.stdout = save_stdout 26 | -------------------------------------------------------------------------------- /core/vis_utils.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | def draw_bboxes(image, bboxes, font_size=0.5, thresh=0.5, colors=None): 5 | """Draws bounding boxes on an image. 6 | 7 | Args: 8 | image: An image in OpenCV format 9 | bboxes: A dictionary representing bounding boxes of different object 10 | categories, where the keys are the names of the categories and the 11 | values are the bounding boxes. The bounding boxes of each category should be 12 | stored in a 2D NumPy array, where each row is a bounding box (x1, y1, 13 | x2, y2, score). 14 | font_size: (Optional) Font size of the category names. 15 | thresh: (Optional) Only bounding boxes with scores above the threshold 16 | will be drawn. 17 | colors: (Optional) Color of the bounding boxes for each category. If it is 18 | not provided, this function will use a random color for each category. 19 | 20 | Returns: 21 | An image with bounding boxes.
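Example (hypothetical category name and values):
    bboxes = {"person": np.array([[10., 20., 120., 240., 0.9]], dtype=np.float32)}
    canvas = draw_bboxes(image, bboxes, thresh=0.5)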
22 | """ 23 | 24 | image = image.copy() 25 | for cat_name in bboxes: 26 | keep_inds = bboxes[cat_name][:, -1] > thresh 27 | cat_size = cv2.getTextSize(cat_name, cv2.FONT_HERSHEY_SIMPLEX, font_size, 2)[0] 28 | 29 | if colors is None: 30 | color = np.random.random((3, )) * 0.6 + 0.4 31 | color = (color * 255).astype(np.int32).tolist() 32 | else: 33 | color = colors[cat_name] 34 | 35 | for bbox in bboxes[cat_name][keep_inds]: 36 | bbox = bbox[0:4].astype(np.int32) 37 | if bbox[1] - cat_size[1] - 2 < 0: 38 | cv2.rectangle(image, 39 | (bbox[0], bbox[1] + 2), 40 | (bbox[0] + cat_size[0], bbox[1] + cat_size[1] + 2), 41 | color, -1 42 | ) 43 | cv2.putText(image, cat_name, 44 | (bbox[0], bbox[1] + cat_size[1] + 2), 45 | cv2.FONT_HERSHEY_SIMPLEX, font_size, (0, 0, 0), thickness=1 46 | ) 47 | else: 48 | cv2.rectangle(image, 49 | (bbox[0], bbox[1] - cat_size[1] - 2), 50 | (bbox[0] + cat_size[0], bbox[1] - 2), 51 | color, -1 52 | ) 53 | cv2.putText(image, cat_name, 54 | (bbox[0], bbox[1] - 2), 55 | cv2.FONT_HERSHEY_SIMPLEX, font_size, (0, 0, 0), thickness=1 56 | ) 57 | cv2.rectangle(image, 58 | (bbox[0], bbox[1]), 59 | (bbox[2], bbox[3]), 60 | color, 2 61 | ) 62 | return image 63 | -------------------------------------------------------------------------------- /demo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/princeton-vl/CornerNet-Lite/6a54505d830a9d6afe26e99f0864b5d06d0bbbaf/demo.jpg -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import cv2 4 | from core.detectors import CornerNet_Saccade 5 | from core.vis_utils import draw_bboxes 6 | 7 | detector = CornerNet_Saccade() 8 | image = cv2.imread("demo.jpg") 9 | 10 | bboxes = detector(image) 11 | image = draw_bboxes(image, bboxes) 12 | cv2.imwrite("demo_out.jpg", image) 13 | -------------------------------------------------------------------------------- /evaluate.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import os 3 | import json 4 | import torch 5 | import pprint 6 | import argparse 7 | import importlib 8 | 9 | from core.dbs import datasets 10 | from core.test import test_func 11 | from core.config import SystemConfig 12 | from core.nnet.py_factory import NetworkFactory 13 | 14 | torch.backends.cudnn.benchmark = False 15 | 16 | def parse_args(): 17 | parser = argparse.ArgumentParser(description="Evaluation Script") 18 | parser.add_argument("cfg_file", help="config file", type=str) 19 | parser.add_argument("--testiter", dest="testiter", 20 | help="test at iteration i", 21 | default=None, type=int) 22 | parser.add_argument("--split", dest="split", 23 | help="which split to use", 24 | default="validation", type=str) 25 | parser.add_argument("--suffix", dest="suffix", default=None, type=str) 26 | parser.add_argument("--debug", action="store_true") 27 | 28 | args = parser.parse_args() 29 | return args 30 | 31 | def make_dirs(directories): 32 | for directory in directories: 33 | if not os.path.exists(directory): 34 | os.makedirs(directory) 35 | 36 | def test(db, system_config, model, args): 37 | split = args.split 38 | testiter = args.testiter 39 | debug = args.debug 40 | suffix = args.suffix 41 | 42 | result_dir = system_config.result_dir 43 | result_dir = os.path.join(result_dir, str(testiter), split) 44 | 45 | if suffix is not None: 46 | result_dir = 
os.path.join(result_dir, suffix) 47 | 48 | make_dirs([result_dir]) 49 | 50 | test_iter = system_config.max_iter if testiter is None else testiter 51 | print("loading parameters at iteration: {}".format(test_iter)) 52 | 53 | print("building neural network...") 54 | nnet = NetworkFactory(system_config, model) 55 | print("loading parameters...") 56 | nnet.load_params(test_iter) 57 | 58 | nnet.cuda() 59 | nnet.eval_mode() 60 | test_func(system_config, db, nnet, result_dir, debug=debug) 61 | 62 | def main(args): 63 | if args.suffix is None: 64 | cfg_file = os.path.join("./configs", args.cfg_file + ".json") 65 | else: 66 | cfg_file = os.path.join("./configs", args.cfg_file + "-{}.json".format(args.suffix)) 67 | print("cfg_file: {}".format(cfg_file)) 68 | 69 | with open(cfg_file, "r") as f: 70 | config = json.load(f) 71 | 72 | config["system"]["snapshot_name"] = args.cfg_file 73 | system_config = SystemConfig().update_config(config["system"]) 74 | 75 | model_file = "core.models.{}".format(args.cfg_file) 76 | model_file = importlib.import_module(model_file) 77 | model = model_file.model() 78 | 79 | train_split = system_config.train_split 80 | val_split = system_config.val_split 81 | test_split = system_config.test_split 82 | 83 | split = { 84 | "training": train_split, 85 | "validation": val_split, 86 | "testing": test_split 87 | }[args.split] 88 | 89 | print("loading all datasets...") 90 | dataset = system_config.dataset 91 | print("split: {}".format(split)) 92 | testing_db = datasets[dataset](config["db"], split=split, sys_config=system_config) 93 | 94 | print("system config...") 95 | pprint.pprint(system_config.full) 96 | 97 | print("db config...") 98 | pprint.pprint(testing_db.configs) 99 | 100 | test(testing_db, system_config, model, args) 101 | 102 | if __name__ == "__main__": 103 | args = parse_args() 104 | main(args) 105 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import os 3 | import json 4 | import torch 5 | import numpy as np 6 | import queue 7 | import pprint 8 | import random 9 | import argparse 10 | import importlib 11 | import threading 12 | import traceback 13 | import torch.distributed as dist 14 | import torch.multiprocessing as mp 15 | 16 | from tqdm import tqdm 17 | from torch.multiprocessing import Process, Queue, Pool 18 | 19 | from core.dbs import datasets 20 | from core.utils import stdout_to_tqdm 21 | from core.config import SystemConfig 22 | from core.sample import data_sampling_func 23 | from core.nnet.py_factory import NetworkFactory 24 | 25 | torch.backends.cudnn.enabled = True 26 | torch.backends.cudnn.benchmark = True 27 | 28 | def parse_args(): 29 | parser = argparse.ArgumentParser(description="Training Script") 30 | parser.add_argument("cfg_file", help="config file", type=str) 31 | parser.add_argument("--iter", dest="start_iter", 32 | help="train at iteration i", 33 | default=0, type=int) 34 | parser.add_argument("--workers", default=4, type=int) 35 | parser.add_argument("--initialize", action="store_true") 36 | 37 | parser.add_argument("--distributed", action="store_true") 38 | parser.add_argument("--world-size", default=-1, type=int, 39 | help="number of nodes of distributed training") 40 | parser.add_argument("--rank", default=0, type=int, 41 | help="node rank for distributed training") 42 | parser.add_argument("--dist-url", default=None, type=str, 43 | help="url used to set up distributed training") 44 
| parser.add_argument("--dist-backend", default="nccl", type=str) 45 | 46 | args = parser.parse_args() 47 | return args 48 | 49 | def prefetch_data(system_config, db, queue, sample_data, data_aug): 50 | ind = 0 51 | print("start prefetching data...") 52 | np.random.seed(os.getpid()) 53 | while True: 54 | try: 55 | data, ind = sample_data(system_config, db, ind, data_aug=data_aug) 56 | queue.put(data) 57 | except Exception as e: 58 | traceback.print_exc() 59 | raise e 60 | 61 | def _pin_memory(ts): 62 | if type(ts) is list: 63 | return [t.pin_memory() for t in ts] 64 | return ts.pin_memory() 65 | 66 | def pin_memory(data_queue, pinned_data_queue, sema): 67 | while True: 68 | data = data_queue.get() 69 | 70 | data["xs"] = [_pin_memory(x) for x in data["xs"]] 71 | data["ys"] = [_pin_memory(y) for y in data["ys"]] 72 | 73 | pinned_data_queue.put(data) 74 | 75 | if sema.acquire(blocking=False): 76 | return 77 | 78 | def init_parallel_jobs(system_config, dbs, queue, fn, data_aug): 79 | tasks = [Process(target=prefetch_data, args=(system_config, db, queue, fn, data_aug)) for db in dbs] 80 | for task in tasks: 81 | task.daemon = True 82 | task.start() 83 | return tasks 84 | 85 | def terminate_tasks(tasks): 86 | for task in tasks: 87 | task.terminate() 88 | 89 | def train(training_dbs, validation_db, system_config, model, args): 90 | # reading arguments from command 91 | start_iter = args.start_iter 92 | distributed = args.distributed 93 | world_size = args.world_size 94 | initialize = args.initialize 95 | gpu = args.gpu 96 | rank = args.rank 97 | 98 | # reading arguments from json file 99 | batch_size = system_config.batch_size 100 | learning_rate = system_config.learning_rate 101 | max_iteration = system_config.max_iter 102 | pretrained_model = system_config.pretrain 103 | stepsize = system_config.stepsize 104 | snapshot = system_config.snapshot 105 | val_iter = system_config.val_iter 106 | display = system_config.display 107 | decay_rate = system_config.decay_rate 108 | stepsize = system_config.stepsize 109 | 110 | print("Process {}: building model...".format(rank)) 111 | nnet = NetworkFactory(system_config, model, distributed=distributed, gpu=gpu) 112 | if initialize: 113 | nnet.save_params(0) 114 | exit(0) 115 | 116 | # queues storing data for training 117 | training_queue = Queue(system_config.prefetch_size) 118 | validation_queue = Queue(5) 119 | 120 | # queues storing pinned data for training 121 | pinned_training_queue = queue.Queue(system_config.prefetch_size) 122 | pinned_validation_queue = queue.Queue(5) 123 | 124 | # allocating resources for parallel reading 125 | training_tasks = init_parallel_jobs(system_config, training_dbs, training_queue, data_sampling_func, True) 126 | if val_iter: 127 | validation_tasks = init_parallel_jobs(system_config, [validation_db], validation_queue, data_sampling_func, False) 128 | 129 | training_pin_semaphore = threading.Semaphore() 130 | validation_pin_semaphore = threading.Semaphore() 131 | training_pin_semaphore.acquire() 132 | validation_pin_semaphore.acquire() 133 | 134 | training_pin_args = (training_queue, pinned_training_queue, training_pin_semaphore) 135 | training_pin_thread = threading.Thread(target=pin_memory, args=training_pin_args) 136 | training_pin_thread.daemon = True 137 | training_pin_thread.start() 138 | 139 | validation_pin_args = (validation_queue, pinned_validation_queue, validation_pin_semaphore) 140 | validation_pin_thread = threading.Thread(target=pin_memory, args=validation_pin_args) 141 | validation_pin_thread.daemon = True 
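# Both pin-memory threads run as daemons and also poll a semaphore: the
# non-blocking acquire in pin_memory() fails until train() releases
# training_pin_semaphore / validation_pin_semaphore at the end, at which
# point each thread returns cleanly after handling its next batch.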
142 | validation_pin_thread.start() 143 | 144 | if pretrained_model is not None: 145 | if not os.path.exists(pretrained_model): 146 | raise ValueError("pretrained model does not exist") 147 | print("Process {}: loading from pretrained model".format(rank)) 148 | nnet.load_pretrained_params(pretrained_model) 149 | 150 | if start_iter: 151 | nnet.load_params(start_iter) 152 | learning_rate /= (decay_rate ** (start_iter // stepsize)) 153 | nnet.set_lr(learning_rate) 154 | print("Process {}: training starts from iteration {} with learning_rate {}".format(rank, start_iter + 1, learning_rate)) 155 | else: 156 | nnet.set_lr(learning_rate) 157 | 158 | if rank == 0: 159 | print("training start...") 160 | nnet.cuda() 161 | nnet.train_mode() 162 | with stdout_to_tqdm() as save_stdout: 163 | for iteration in tqdm(range(start_iter + 1, max_iteration + 1), file=save_stdout, ncols=80): 164 | training = pinned_training_queue.get(block=True) 165 | training_loss = nnet.train(**training) 166 | 167 | if display and iteration % display == 0: 168 | print("Process {}: training loss at iteration {}: {}".format(rank, iteration, training_loss.item())) 169 | del training_loss 170 | 171 | if val_iter and validation_db.db_inds.size and iteration % val_iter == 0: 172 | nnet.eval_mode() 173 | validation = pinned_validation_queue.get(block=True) 174 | validation_loss = nnet.validate(**validation) 175 | print("Process {}: validation loss at iteration {}: {}".format(rank, iteration, validation_loss.item())) 176 | nnet.train_mode() 177 | 178 | if iteration % snapshot == 0 and rank == 0: 179 | nnet.save_params(iteration) 180 | 181 | if iteration % stepsize == 0: 182 | learning_rate /= decay_rate 183 | nnet.set_lr(learning_rate) 184 | 185 | # sending signals to stop the pin-memory threads 186 | training_pin_semaphore.release() 187 | validation_pin_semaphore.release() 188 | 189 | # terminating data fetching processes 190 | terminate_tasks(training_tasks) 191 | if val_iter: terminate_tasks(validation_tasks)  # validation tasks exist only when val_iter is set 192 | 193 | def main(gpu, ngpus_per_node, args): 194 | args.gpu = gpu 195 | if args.distributed: 196 | args.rank = args.rank * ngpus_per_node + gpu 197 | dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, 198 | world_size=args.world_size, rank=args.rank) 199 | 200 | rank = args.rank 201 | 202 | cfg_file = os.path.join("./configs", args.cfg_file + ".json") 203 | with open(cfg_file, "r") as f: 204 | config = json.load(f) 205 | 206 | config["system"]["snapshot_name"] = args.cfg_file 207 | system_config = SystemConfig().update_config(config["system"]) 208 | 209 | model_file = "core.models.{}".format(args.cfg_file) 210 | model_file = importlib.import_module(model_file) 211 | model = model_file.model() 212 | 213 | train_split = system_config.train_split 214 | val_split = system_config.val_split 215 | 216 | print("Process {}: loading all datasets...".format(rank)) 217 | dataset = system_config.dataset 218 | workers = args.workers 219 | print("Process {}: using {} workers".format(rank, workers)) 220 | training_dbs = [datasets[dataset](config["db"], split=train_split, sys_config=system_config) for _ in range(workers)] 221 | validation_db = datasets[dataset](config["db"], split=val_split, sys_config=system_config) 222 | 223 | if rank == 0: 224 | print("system config...") 225 | pprint.pprint(system_config.full) 226 | 227 | print("db config...") 228 | pprint.pprint(training_dbs[0].configs) 229 | 230 | print("len of db: {}".format(len(training_dbs[0].db_inds))) 231 | print("distributed: {}".format(args.distributed)) 232 | 233 | 
train(training_dbs, validation_db, system_config, model, args) 234 | 235 | if __name__ == "__main__": 236 | args = parse_args() 237 | 238 | distributed = args.distributed 239 | world_size = args.world_size 240 | 241 | if distributed and world_size <= 0: 242 | raise ValueError("world size must be greater than 0 in distributed training") 243 | 244 | ngpus_per_node = torch.cuda.device_count() 245 | if distributed: 246 | args.world_size = ngpus_per_node * args.world_size 247 | mp.spawn(main, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) 248 | else: 249 | main(None, ngpus_per_node, args) 250 | --------------------------------------------------------------------------------