├── .gitignore ├── README.md ├── data └── cache │ ├── caltech │ ├── test │ ├── train_gt │ └── train_nogt │ ├── cityperson │ ├── train_h50 │ └── val_500 │ └── widerface │ ├── note.txt~ │ ├── test │ ├── train │ └── val ├── docs ├── face.jpg └── pipeline.png ├── download_weights.sh ├── eval_caltech ├── AS.mat ├── ResultsEval │ └── gt-new │ │ └── gt-Reasonable.mat ├── bbApply.m ├── bbGt.m ├── dbEval.m ├── dbInfo.m ├── extract_img_anno.m ├── getPrmDflt.m └── vbb.m ├── eval_city ├── cocoapi │ ├── .gitignore │ ├── LuaAPI │ │ ├── CocoApi.lua │ │ ├── MaskApi.lua │ │ ├── cocoDemo.lua │ │ ├── env.lua │ │ ├── init.lua │ │ └── rocks │ │ │ └── coco-scm-1.rockspec │ ├── MatlabAPI │ │ ├── CocoApi.m │ │ ├── CocoEval.m │ │ ├── CocoUtils.m │ │ ├── MaskApi.m │ │ ├── cocoDemo.m │ │ ├── evalDemo.m │ │ ├── gason.m │ │ └── private │ │ │ ├── gasonMex.cpp │ │ │ ├── gasonMex.mexa64 │ │ │ ├── gasonMex.mexmaci64 │ │ │ ├── getPrmDflt.m │ │ │ ├── maskApiMex.c │ │ │ ├── maskApiMex.mexa64 │ │ │ └── maskApiMex.mexmaci64 │ ├── PythonAPI │ │ ├── Makefile │ │ ├── pycocoDemo.ipynb │ │ ├── pycocoEvalDemo.ipynb │ │ ├── pycocotools │ │ │ ├── __init__.py │ │ │ ├── _mask.pyx │ │ │ ├── coco.py │ │ │ ├── cocoeval.py │ │ │ └── mask.py │ │ └── setup.py │ ├── README.txt │ ├── common │ │ ├── gason.cpp │ │ ├── gason.h │ │ ├── maskApi.c │ │ └── maskApi.h │ └── license.txt ├── dt_txt2json.m ├── eval_script │ ├── coco.py │ ├── eval_MR_multisetup.py │ ├── eval_demo.py │ └── readme.txt ├── readme.txt └── val_gt.json ├── generate_cache_caltech.py ├── generate_cache_city.py ├── generate_cache_wider.py ├── keras_csp ├── __init__.py ├── bbox_process.py ├── bbox_transform.py ├── config.py ├── data_augment.py ├── data_generators.py ├── keras_layer_L2Normalization.py ├── losses.py ├── mobilenet.py ├── nms │ ├── .gitignore │ ├── __init__.py │ ├── cpu_nms.pyx │ ├── gpu_nms.hpp │ ├── gpu_nms.pyx │ ├── nms_kernel.cu │ └── py_cpu_nms.py ├── nms_wrapper.py ├── parallel_model.py ├── resnet50.py └── utilsfunc.py ├── requirements.txt ├── setup.py ├── test_caltech.py ├── test_city.py ├── test_wider_ms.py ├── train_caltech.py ├── train_city.py └── train_wider.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | build/ 3 | data/ 4 | __pycache__ 5 | output 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection 2 | Keras implementation of [CSP], accepted by CVPR 2019. 3 | ## Introduction 4 | This paper provides a new perspective for pedestrian detection, where detection is formulated as Center and Scale Prediction (CSP); the pipeline is illustrated below. For more details, please refer to our [paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_High-Level_Semantic_Feature_Detection_A_New_Perspective_for_Pedestrian_Detection_CVPR_2019_paper.pdf). 5 | ![img01](./docs/pipeline.png) 6 | 7 | Besides the superiority on pedestrian detection demonstrated in the paper, we take a step further towards the generality of CSP and validate it on face detection. Experimental results on the WiderFace benchmark also show the competitiveness of CSP.
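To make the formulation concrete, here is a minimal, hypothetical Keras sketch of a CSP-style prediction head: a shared 3x3 convolution over the backbone feature map feeding 1x1 convolutions that predict a center heatmap, a scale map, and an offset map (for faces, the scale branch predicts both height and width). The layer widths and names below are illustrative assumptions, not the exact architecture defined in [./keras_csp/resnet50.py](./keras_csp/resnet50.py) or [./keras_csp/mobilenet.py](./keras_csp/mobilenet.py):
```
# Illustrative CSP-style head; sizes and names are assumptions, not the repo's exact code.
from keras.layers import Input, Conv2D
from keras.models import Model

feat = Input(shape=(None, None, 256))  # backbone feature map (e.g. stride 4)
x = Conv2D(256, (3, 3), padding='same', activation='relu')(feat)
center = Conv2D(1, (1, 1), activation='sigmoid', name='center')(x)  # center heatmap
scale = Conv2D(1, (1, 1), name='scale')(x)    # log-scale (height); 2 channels for h+w
offset = Conv2D(2, (1, 1), name='offset')(x)  # sub-stride x/y offsets
model = Model(inputs=feat, outputs=[center, scale, offset])
```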
8 | ![img02](./docs/face.jpg) 9 | 10 | 11 | ### Dependencies 12 | 13 | * Python >= 3.6 14 | * Tensorflow >= 1.1.3 15 | * Keras >= 2.0.6 16 | * OpenCV >= 3.4.1.15 (note that versions other than 3.4.1.15 will result in different performance on Caltech) 17 | 18 | ## Contents 19 | 1. [Installation](#installation) 20 | 2. [Preparation](#preparation) 21 | 3. [Training](#training) 22 | 4. [Test](#test) 23 | 5. [Evaluation](#evaluation) 24 | 6. [Models](#models) 25 | 26 | ### Installation 27 | 1. Get the code. We will refer to the cloned directory as `$CSP`. 28 | ``` 29 | git clone https://github.com/liuwei16/CSP.git 30 | ``` 31 | 2. Install the requirements. 32 | ``` 33 | pip install -r requirements.txt 34 | ``` 35 | 36 | 3. Build the dependencies. 37 | ``` 38 | python setup.py build_ext --inplace 39 | ``` 40 | 41 | 4. Download the pretrained resnet50 weights (basenet only): 42 | ``` 43 | ./download_weights.sh 44 | ``` 45 | 46 | 47 | ### Preparation 48 | 1. Download the dataset. 49 | 50 | For pedestrian detection, we train and test our model on [Caltech](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/) and [CityPersons](https://bitbucket.org/shanshanzhang/citypersons); please download the datasets first. By default, we assume the datasets are stored in `./data/`. 51 | 52 | 2. Dataset preparation. 53 | 54 | For Caltech, you can run [./eval_caltech/extract_img_anno.m](./eval_caltech/extract_img_anno.m) to extract the official seq files into images. Training and testing are based on the new annotations provided by [Shanshan2016CVPR](https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/people-detection-pose-estimation-and-tracking/how-far-are-we-from-solving-pedestrian-detection/). We use the train_10x setting (42782 images) for training; the official test set has 4024 images. By default, we assume that images and annotations are stored in `./data/caltech`, and the directory structure is 55 | ``` 56 | *DATA_PATH 57 | *train_3 58 | *annotations_new 59 | *set00_V000_I00002.txt 60 | *... 61 | *images 62 | *set00_V000_I00002.jpg 63 | *... 64 | *test 65 | *annotations_new 66 | *set06_V000_I00029.jpg.txt 67 | *... 68 | *images 69 | *set06_V000_I00029.jpg 70 | *... 71 | ``` 72 | For CityPersons, we use the training set (2975 images) for training and test on the validation set (500 images). We assume that images and annotations are stored in `./data/citypersons`, and the directory structure is 73 | ``` 74 | *DATA_PATH 75 | *annotations 76 | *anno_train.mat 77 | *anno_val.mat 78 | *images 79 | *train 80 | *val 81 | ``` 82 | We have provided the cache files of the training and validation subsets. Optionally, you can also run [./generate_cache_caltech.py](./generate_cache_caltech.py) and [./generate_cache_city.py](./generate_cache_city.py) to create the cache files for training and validation. By default, we assume the cache files are stored in `./data/cache/` (a quick way to inspect them is sketched at the end of this section). For Caltech, we split the training set into images with and without any pedestrian instance, resulting in 9105 and 33767 images, respectively. 83 | 84 | 3. Download the initialized models. 85 | 86 | We use the backbones [ResNet-50](https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5) and [MobileNet_v1](https://github.com/fchollet/deep-learning-models/releases/download/v0.6/) in our experiments. By default, we assume the weight files are stored in `./data/models/`.
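As a quick sanity check after preparation, you can inspect a cache file directly. The sketch below assumes, as the `generate_cache_*.py` scripts suggest, that each cache file is a pickled list of per-image dicts; the key names `filepath` and `bboxes` are assumptions to verify against those scripts:
```
# Hedged example: inspect a pickled annotation cache (assumed format).
import pickle

with open('data/cache/cityperson/train_h50', 'rb') as f:
    records = pickle.load(f)  # if the cache was written by Python 2, add encoding='latin1'

print('%d images' % len(records))
first = records[0]  # assumed: dict with 'filepath', 'bboxes', ... entries
print(first['filepath'], 'with', len(first['bboxes']), 'boxes')
```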
87 | 88 | ### Training 89 | If needed, first set the training parameters in [./keras_csp/config.py](./keras_csp/config.py). 90 | 1. Train on Caltech. 91 | 92 | Run [./train_caltech.py](./train_caltech.py) to start training. The weight files of all epochs will be saved in `./output/valmodels/caltech`. 93 | 94 | 2. Train on CityPersons. 95 | 96 | Run [./train_city.py](./train_city.py) to start training. The weight files of all epochs will be saved in `./output/valmodels/city`. 97 | 98 | ### Test 99 | 1. Caltech. 100 | 101 | Run [./test_caltech.py](./test_caltech.py) to get the detection results. You can test from epoch 51, and the results will be saved in `./output/valresults/caltech`. 102 | 103 | 2. CityPersons. 104 | 105 | Run [./test_city.py](./test_city.py) to get the detection results. You can test from epoch 51, and the results will be saved in `./output/valresults/city`. 106 | 107 | ### Evaluation 108 | 1. Caltech. 109 | 110 | Run [./eval_caltech/dbEval.m](./eval_caltech/dbEval.m) to get the miss rates of the detections in `pth.resDir`, which is defined in line 25. The evaluation results will be saved as `eval-newReasonable.txt` in `./eval_caltech/ResultsEval`. 111 | 112 | 2. CityPersons. 113 | 114 | (1) Run [./eval_city/dt_txt2json.m](./eval_city/dt_txt2json.m) to convert the `.txt` files to `.json`. The specific `main_path` is defined in line 3. 115 | 116 | (2) Run [./eval_city/eval_script/eval_demo.py](./eval_city/eval_script/eval_demo.py) to get the miss rates of the detections in `main_path`, which is defined in line 9. 117 | 118 | ### Models 119 | To reproduce the results in our paper, we have provided the models trained on the different datasets. You can download them through [BaiduYun](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw) (Code: jcgd). For Caltech, please make sure that the OpenCV version is 3.4.1.15; other versions will read the same image into different data values, resulting in slightly different performance. 120 | 1. For Caltech 121 | 122 | ResNet-50 initialized from ImageNet: 123 | 124 | Height prediction: [model_CSP/caltech/fromimgnet/h/nooffset](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw) 125 | 126 | Height+Offset prediction: [model_CSP/caltech/fromimgnet/h/withoffset](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw) 127 | 128 | Height+Width prediction: [model_CSP/caltech/fromimgnet/h+w/](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw) 129 | 130 | ResNet-50 initialized from CityPersons: 131 | 132 | Height+Offset prediction: [model_CSP/caltech/fromcity/](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw) 133 | 134 | 2. For CityPersons: 135 | 136 | Height prediction: [model_CSP/cityperson/nooffset](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw) 137 | 138 | Height+Offset prediction: [model_CSP/cityperson/withoffset](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw) 139 | 140 | On top of this codebase, we also ran 10 trials of Height+Offset prediction. Generally, models converge after epoch 50. For Caltech and CityPersons, we test the results from epoch 50 to 120 and from epoch 50 to 150, respectively, and report the best result (*MR* under the Reasonable setting) in the following table.
141 | 142 | | Trial | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 143 | |:-----:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:--:| 144 | |Caltech| 4.98 | 4.75 | 4.57 | 4.84 | 4.72 | 4.15 | 5.17 | 4.60 | 4.63 | 4.91 | 145 | |CityPersons| 11.31 | 11.17 | 11.42 | 11.69 | 11.56 | 11.05 | 11.59 | 11.78 | 11.27 | 10.62 | 146 | 147 | 148 | ### Extension--Face Detection 149 | 1. Data preparation 150 | 151 | First download the [WiderFace](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) dataset and put it in `./data/WiderFace`. We have provided the cache files in `./data/cache/widerface`, or you can run [./generate_cache_wider.py](./generate_cache_wider.py) to create them. 152 | 153 | 2. Training and Test 154 | 155 | For face detection, CSP is required to predict both the height and width of each instance, since faces come with various aspect ratios. Run [./train_wider.py](./train_wider.py) to start training and [./test_wider_ms.py](./test_wider_ms.py) for multi-scale testing. As a common practice, the model trained on the official training set is evaluated on both the validation and test sets, and the results are submitted to [WiderFace](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/). To reproduce the result in the benchmark, we provide the model for Height+Width+Offset prediction in [model_CSP/widerface/](https://pan.baidu.com/s/1SSPQnbDP6zf9xf8eCDi3Fw). 156 | 157 | Note that we adopt data-augmentation and multi-scale testing strategies similar to those in [PyramidBox](https://arxiv.org/pdf/1803.07737.pdf), which helps us achieve better performance on this benchmark. 158 | 159 | ## Citation 160 | If you find our work useful in your research, please consider citing: 161 | ``` 162 | @inproceedings{liu2018high, 163 | title={High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection}, 164 | author={Liu, Wei and Liao, Shengcai and Ren, Weiqiang and Hu, Weidong and Yu, Yinan}, 165 | booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 166 | year={2019} 167 | } 168 | ``` 169 | 170 | 171 | 172 | -------------------------------------------------------------------------------- /data/cache/caltech/test: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/caltech/test -------------------------------------------------------------------------------- /data/cache/caltech/train_gt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/caltech/train_gt -------------------------------------------------------------------------------- /data/cache/caltech/train_nogt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/caltech/train_nogt -------------------------------------------------------------------------------- /data/cache/cityperson/train_h50: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/cityperson/train_h50 -------------------------------------------------------------------------------- /data/cache/cityperson/val_500: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/cityperson/val_500 -------------------------------------------------------------------------------- /data/cache/widerface/note.txt~: -------------------------------------------------------------------------------- 1 | filter w>=5 and h>=5 2 | train: 3 | 12880 images and 12880 valid images and 159424 boxes 4 | 12880 images and 12876 valid images and 154301 boxes 5 | 6 | val: 7 | 3226 images and 3226 valid images and 39708 boxes 8 | 9 | test: 10 | 16097 images 11 | 12 | 13 | 14 | 15 | -------------------------------------------------------------------------------- /data/cache/widerface/test: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/widerface/test -------------------------------------------------------------------------------- /data/cache/widerface/train: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/widerface/train -------------------------------------------------------------------------------- /data/cache/widerface/val: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/data/cache/widerface/val -------------------------------------------------------------------------------- /docs/face.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/docs/face.jpg -------------------------------------------------------------------------------- /docs/pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/docs/pipeline.png -------------------------------------------------------------------------------- /download_weights.sh: -------------------------------------------------------------------------------- 1 | mkdir -p data/models 2 | wget -O ./data/models/resnet50_weights_tf_dim_ordering_tf_kernels.h5 https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5 --no-check-certificate 3 | -------------------------------------------------------------------------------- /eval_caltech/AS.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/eval_caltech/AS.mat -------------------------------------------------------------------------------- /eval_caltech/ResultsEval/gt-new/gt-Reasonable.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/eval_caltech/ResultsEval/gt-new/gt-Reasonable.mat -------------------------------------------------------------------------------- /eval_caltech/bbGt.m: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/eval_caltech/bbGt.m 
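The `note.txt~` file above records the box filter used when building the WiderFace cache: boxes narrower or shorter than 5 pixels are dropped, which appears to be why the training set goes from 12880 valid images and 159424 boxes to 12876 valid images and 154301 boxes. A minimal, hypothetical sketch of such a filter (the `(x, y, w, h)` box format here is an assumption, not taken from the repo):
```
# Hypothetical sketch of the "filter w>=5 and h>=5" rule described in note.txt~.
def filter_small_boxes(boxes, min_w=5, min_h=5):
    """Keep only boxes at least min_w x min_h; boxes are (x, y, w, h) tuples."""
    return [b for b in boxes if b[2] >= min_w and b[3] >= min_h]

anno = {'img_a.jpg': [(10, 10, 40, 60), (3, 3, 4, 4)],
        'img_b.jpg': [(0, 0, 2, 2)]}
kept = {name: filter_small_boxes(bs) for name, bs in anno.items()}
valid = [name for name, bs in kept.items() if bs]  # images that still have boxes
print(len(valid), 'valid images,', sum(map(len, kept.values())), 'boxes')
```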
-------------------------------------------------------------------------------- /eval_caltech/dbInfo.m: -------------------------------------------------------------------------------- 1 | function [pth,setIds,vidIds,skip,ext] = dbInfo( name1 ) 2 | % Specifies data amount and location. 3 | % 4 | % 'name' specifies the name of the dataset. Valid options include: 'Usa', 5 | % 'UsaTest', 'UsaTrain', 'InriaTrain', 'InriaTest', 'Japan', 'TudBrussels', 6 | % 'ETH', and 'Daimler'. If dbInfo() is called without specifying the 7 | % dataset name, defaults to the last used name (or 'UsaTest' on first call 8 | % to dbInfo()). Finally, one can specify a subset of a dataset by appending 9 | % digits to the end of the name (e.g. 'UsaTest01' indicates the first set of 10 | % 'UsaTest' and 'UsaTest01005' indicates the first set, fifth video). 11 | % 12 | % USAGE 13 | % [pth,setIds,vidIds,skip,ext] = dbInfo( [name] ) 14 | % 15 | % INPUTS 16 | % name - ['UsaTest'] specify dataset, caches last passed in name 17 | % 18 | % OUTPUTS 19 | % pth - directory containing database 20 | % setIds - integer ids of each set 21 | % vidIds - [1xnSets] cell of vectors of integer ids of each video 22 | % skip - specify subset of frames to use for evaluation 23 | % ext - file extension determining image format ('jpg' or 'png') 24 | % 25 | % EXAMPLE 26 | % [pth,setIds,vidIds,skip,ext] = dbInfo 27 | % 28 | % See also 29 | % 30 | % Caltech Pedestrian Dataset Version 3.2.1 31 | % Copyright 2014 Piotr Dollar. [pdollar-at-gmail.com] 32 | % Licensed under the Simplified BSD License [see external/bsd.txt] 33 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 34 | %modification version: 35 | % 2017.01.18:the default of skip is set to be 3 instead of 30 36 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 37 | 38 | persistent name; % cache last used name 39 | if(nargin && ~isempty(name1)), name=lower(name1); else 40 | if(isempty(name)), name='usa'; end; end; name1=name; 41 | 42 | vidId=str2double(name1(end-2:end)); % check if name ends in 3 ints 43 | if(isnan(vidId)), vidId=[]; else name1=name1(1:end-3); end 44 | setId=str2double(name1(end-1:end)); % check if name ends in 2 ints 45 | if(isnan(setId)), setId=[]; else name1=name1(1:end-2); end 46 | 47 | switch name1 48 | case 'usa' % Caltech Pedestrian Datasets (all) 49 | setIds=0:10; subdir='USA'; skip=30; ext='jpg'; 50 | vidIds={0:14 0:5 0:11 0:12 0:11 0:12 0:18 0:11 0:10 0:11 0:11}; 51 | case 'usatrain' % Caltech Pedestrian Datasets (training) 52 | setIds=0:5; subdir='USA'; skip=30; ext='jpg'; 53 | vidIds={0:14 0:5 0:11 0:12 0:11 0:12}; 54 | case 'usatest' % Caltech Pedestrian Datasets (testing) 55 | setIds=6:10; subdir='USA'; skip=30; ext='jpg'; 56 | vidIds={0:18 0:11 0:10 0:11 0:11}; 57 | case 'kaist' % KAIST dataset (testing) 58 | setIds=6:11; subdir='kaist'; skip=20; ext='jpg'; 59 | vidIds={0:4 0:2 0:2 0 0:1 0:1}; 60 | case 'inriatrain' % INRIA peds (training) 61 | setIds=0; subdir='INRIA'; skip=1; ext='png'; vidIds={0:1}; 62 | case 'inriatest' % INRIA peds (testing) 63 | setIds=1; subdir='INRIA'; skip=1; ext='png'; vidIds={0}; 64 | case 'japan' % Caltech Japan data (not publicly available) 65 | setIds=0:12; subdir='Japan'; skip=30; ext='jpg'; 66 | vidIds={0:5 0:5 0:3 0:5 0:5 0:5 0:5 0:5 0:4 0:4 0:5 0:5 0:4}; 67 | case 'tudbrussels' % TUD-Brussels dataset 68 | setIds=0; subdir='TudBrussels'; skip=1; ext='png'; vidIds={0}; 69 | case 'eth' % ETH dataset 70 | setIds=0:2; subdir='ETH'; skip=1; ext='png'; vidIds={0 0 0}; 71 | case 'daimler' % Daimler dataset 72 | setIds=0; subdir='Daimler'; skip=1;
ext='png'; vidIds={0}; 73 | case 'pietro' 74 | setIds=0; subdir='Pietro'; skip=1; ext='jpg'; vidIds={0}; 75 | otherwise, error('unknown data type: %s',name); 76 | end 77 | 78 | % optionally select only specific set/vid if name ended in ints 79 | if(~isempty(setId)), setIds=setIds(setId); vidIds=vidIds(setId); end 80 | if(~isempty(vidId)), vidIds={vidIds{1}(vidId)}; end 81 | 82 | % actual directory where data is contained 83 | pth=fileparts(mfilename('fullpath')); 84 | pth=[pth filesep 'data-' subdir]; 85 | 86 | end 87 | -------------------------------------------------------------------------------- /eval_caltech/extract_img_anno.m: -------------------------------------------------------------------------------- 1 | % extract_img_anno() 2 | % -------------------------------------------------------- 3 | % RPN_BF 4 | % Copyright (c) 2016, Liliang Zhang 5 | % Licensed under The MIT License [see LICENSE for details] 6 | % -------------------------------------------------------- 7 | 8 | dataDir='./datasets/caltech/'; 9 | addpath(genpath('./code3.2.1')); 10 | addpath(genpath('./toolbox')); 11 | for s=1:2 12 | if(s==1), type='test'; skip=[];continue; else type='train'; skip=3; end 13 | dbInfo(['Usa' type]); 14 | if(exist([dataDir type '/annotations'],'dir')), continue; end 15 | dbExtract([dataDir type],1,skip); 16 | end 17 | 18 | -------------------------------------------------------------------------------- /eval_caltech/getPrmDflt.m: -------------------------------------------------------------------------------- 1 | function varargout = getPrmDflt( prm, dfs, checkExtra ) 2 | % Helper to set default values (if not already set) of parameter struct. 3 | % 4 | % Takes input parameters and a list of 'name'/default pairs, and for each 5 | % 'name' for which prm has no value (prm.(name) is not a field or 'name' 6 | % does not appear in prm list), getPrmDflt assigns the given default 7 | % value. If default value for variable 'name' is 'REQ', and value for 8 | % 'name' is not given, an error is thrown. See below for usage details. 9 | % 10 | % USAGE (nargout==1) 11 | % prm = getPrmDflt( prm, dfs, [checkExtra] ) 12 | % 13 | % USAGE (nargout>1) 14 | % [ param1 ... paramN ] = getPrmDflt( prm, dfs, [checkExtra] ) 15 | % 16 | % INPUTS 17 | % prm - param struct or cell of form {'name1' v1 'name2' v2 ...} 18 | % dfs - cell of form {'name1' def1 'name2' def2 ...} 19 | % checkExtra - [0] if 1 throw error if prm contains params not in dfs 20 | % if -1 if prm contains params not in dfs adds them 21 | % 22 | % OUTPUTS (nargout==1) 23 | % prm - parameter struct with fields 'name1' through 'nameN' assigned 24 | % 25 | % OUTPUTS (nargout>1) 26 | % param1 - value assigned to parameter with 'name1' 27 | % ... 28 | % paramN - value assigned to parameter with 'nameN' 29 | % 30 | % EXAMPLE 31 | % dfs = { 'x','REQ', 'y',0, 'z',[], 'eps',1e-3 }; 32 | % prm = getPrmDflt( struct('x',1,'y',1), dfs ) 33 | % [ x y z eps ] = getPrmDflt( {'x',2,'y',1}, dfs ) 34 | % 35 | % See also INPUTPARSER 36 | % 37 | % Piotr's Computer Vision Matlab Toolbox Version 2.60 38 | % Copyright 2014 Piotr Dollar. 
[pdollar-at-gmail.com] 39 | % Licensed under the Simplified BSD License [see external/bsd.txt] 40 | 41 | if( mod(length(dfs),2) ), error('odd number of default parameters'); end 42 | if nargin<=2, checkExtra = 0; end 43 | 44 | % get the input parameters as two cell arrays: prmVal and prmField 45 | if iscell(prm) && length(prm)==1, prm=prm{1}; end 46 | if iscell(prm) 47 | if(mod(length(prm),2)), error('odd number of parameters in prm'); end 48 | prmField = prm(1:2:end); prmVal = prm(2:2:end); 49 | else 50 | if(~isstruct(prm)), error('prm must be a struct or a cell'); end 51 | prmVal = struct2cell(prm); prmField = fieldnames(prm); 52 | end 53 | 54 | % get and update default values using quick for loop 55 | dfsField = dfs(1:2:end); dfsVal = dfs(2:2:end); 56 | if checkExtra>0 57 | for i=1:length(prmField) 58 | j = find(strcmp(prmField{i},dfsField)); 59 | if isempty(j), error('parameter %s is not valid', prmField{i}); end 60 | dfsVal(j) = prmVal(i); 61 | end 62 | elseif checkExtra<0 63 | for i=1:length(prmField) 64 | j = find(strcmp(prmField{i},dfsField)); 65 | if isempty(j), j=length(dfsVal)+1; dfsField{j}=prmField{i}; end 66 | dfsVal(j) = prmVal(i); 67 | end 68 | else 69 | for i=1:length(prmField) 70 | dfsVal(strcmp(prmField{i},dfsField)) = prmVal(i); 71 | end 72 | end 73 | 74 | % check for missing values 75 | if any(strcmp('REQ',dfsVal)) 76 | cmpArray = find(strcmp('REQ',dfsVal)); 77 | error(['Required field ''' dfsField{cmpArray(1)} ''' not specified.'] ); 78 | end 79 | 80 | % set output 81 | if nargout==1 82 | varargout{1} = cell2struct( dfsVal, dfsField, 2 ); 83 | else 84 | varargout = dfsVal; 85 | end 86 | -------------------------------------------------------------------------------- /eval_city/cocoapi/.gitignore: -------------------------------------------------------------------------------- 1 | images/ 2 | annotations/ 3 | results/ 4 | external/ 5 | .DS_Store 6 | 7 | MatlabAPI/analyze*/ 8 | MatlabAPI/visualize*/ 9 | 10 | PythonAPI/pycocotools/__init__.pyc 11 | PythonAPI/pycocotools/_mask.c 12 | PythonAPI/pycocotools/_mask.so 13 | PythonAPI/pycocotools/coco.pyc 14 | PythonAPI/pycocotools/cocoeval.pyc 15 | PythonAPI/pycocotools/mask.pyc 16 | -------------------------------------------------------------------------------- /eval_city/cocoapi/LuaAPI/MaskApi.lua: -------------------------------------------------------------------------------- 1 | --[[---------------------------------------------------------------------------- 2 | 3 | Interface for manipulating masks stored in RLE format. 4 | 5 | For an overview of RLE please see http://mscoco.org/dataset/#download. 6 | Additionally, more detailed information can be found in the Matlab MaskApi.m: 7 | https://github.com/pdollar/coco/blob/master/MatlabAPI/MaskApi.m 8 | 9 | The following API functions are defined: 10 | encode - Encode binary masks using RLE. 11 | decode - Decode binary masks encoded via RLE. 12 | merge - Compute union or intersection of encoded masks. 13 | iou - Compute intersection over union between masks. 14 | nms - Compute non-maximum suppression between ordered masks. 15 | area - Compute area of encoded masks. 16 | toBbox - Get bounding boxes surrounding encoded masks. 17 | frBbox - Convert bounding boxes to encoded masks. 18 | frPoly - Convert polygon to encoded mask. 19 | drawCirc - Draw circle into image (alters input). 20 | drawLine - Draw line into image (alters input). 21 | drawMasks - Draw masks into image (alters input). 
22 | 23 | Usage: 24 | Rs = MaskApi.encode( masks ) 25 | masks = MaskApi.decode( Rs ) 26 | R = MaskApi.merge( Rs, [intersect=false] ) 27 | o = MaskApi.iou( dt, gt, [iscrowd=false] ) 28 | keep = MaskApi.nms( dt, thr ) 29 | a = MaskApi.area( Rs ) 30 | bbs = MaskApi.toBbox( Rs ) 31 | Rs = MaskApi.frBbox( bbs, h, w ) 32 | R = MaskApi.frPoly( poly, h, w ) 33 | MaskApi.drawCirc( img, x, y, rad, clr ) 34 | MaskApi.drawLine( img, x0, y0, x1, y1, rad, clr ) 35 | MaskApi.drawMasks( img, masks, [maxn=n], [alpha=.4], [clrs] ) 36 | For detailed usage information please see cocoDemo.lua. 37 | 38 | In the API the following formats are used: 39 | R,Rs - [table] Run-length encoding of binary mask(s) 40 | masks - [nxhxw] Binary mask(s) 41 | bbs - [nx4] Bounding box(es) stored as [x y w h] 42 | poly - Polygon stored as {[x1 y1 x2 y2...],[x1 y1 ...],...} 43 | dt,gt - May be either bounding boxes or encoded masks 44 | Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel). 45 | 46 | Common Objects in COntext (COCO) Toolbox. version 3.0 47 | Data, paper, and tutorials available at: http://mscoco.org/ 48 | Code written by Pedro O. Pinheiro and Piotr Dollar, 2016. 49 | Licensed under the Simplified BSD License [see coco/license.txt] 50 | 51 | ------------------------------------------------------------------------------]] 52 | 53 | local ffi = require 'ffi' 54 | local coco = require 'coco.env' 55 | 56 | coco.MaskApi = {} 57 | local MaskApi = coco.MaskApi 58 | 59 | coco.libmaskapi = ffi.load(package.searchpath('libmaskapi',package.cpath)) 60 | local libmaskapi = coco.libmaskapi 61 | 62 | -------------------------------------------------------------------------------- 63 | 64 | MaskApi.encode = function( masks ) 65 | local n, h, w = masks:size(1), masks:size(2), masks:size(3) 66 | masks = masks:type('torch.ByteTensor'):transpose(2,3) 67 | local data = masks:contiguous():data() 68 | local Qs = MaskApi._rlesInit(n) 69 | libmaskapi.rleEncode(Qs[0],data,h,w,n) 70 | return MaskApi._rlesToLua(Qs,n) 71 | end 72 | 73 | MaskApi.decode = function( Rs ) 74 | local Qs, n, h, w = MaskApi._rlesFrLua(Rs) 75 | local masks = torch.ByteTensor(n,w,h):zero():contiguous() 76 | libmaskapi.rleDecode(Qs,masks:data(),n) 77 | MaskApi._rlesFree(Qs,n) 78 | return masks:transpose(2,3) 79 | end 80 | 81 | MaskApi.merge = function( Rs, intersect ) 82 | intersect = intersect or 0 83 | local Qs, n, h, w = MaskApi._rlesFrLua(Rs) 84 | local Q = MaskApi._rlesInit(1) 85 | libmaskapi.rleMerge(Qs,Q,n,intersect) 86 | MaskApi._rlesFree(Qs,n) 87 | return MaskApi._rlesToLua(Q,1)[1] 88 | end 89 | 90 | MaskApi.iou = function( dt, gt, iscrowd ) 91 | if not iscrowd then iscrowd = NULL else 92 | iscrowd = iscrowd:type('torch.ByteTensor'):contiguous():data() 93 | end 94 | if torch.isTensor(gt) and torch.isTensor(dt) then 95 | local nDt, k = dt:size(1), dt:size(2); assert(k==4) 96 | local nGt, k = gt:size(1), gt:size(2); assert(k==4) 97 | local dDt = dt:type('torch.DoubleTensor'):contiguous():data() 98 | local dGt = gt:type('torch.DoubleTensor'):contiguous():data() 99 | local o = torch.DoubleTensor(nGt,nDt):contiguous() 100 | libmaskapi.bbIou(dDt,dGt,nDt,nGt,iscrowd,o:data()) 101 | return o:transpose(1,2) 102 | else 103 | local qDt, nDt = MaskApi._rlesFrLua(dt) 104 | local qGt, nGt = MaskApi._rlesFrLua(gt) 105 | local o = torch.DoubleTensor(nGt,nDt):contiguous() 106 | libmaskapi.rleIou(qDt,qGt,nDt,nGt,iscrowd,o:data()) 107 | MaskApi._rlesFree(qDt,nDt); MaskApi._rlesFree(qGt,nGt) 108 | return o:transpose(1,2) 109 | end 110 | end 111 | 112 | MaskApi.nms 
= function( dt, thr ) 113 | if torch.isTensor(dt) then 114 | local n, k = dt:size(1), dt:size(2); assert(k==4) 115 | local Q = dt:type('torch.DoubleTensor'):contiguous():data() 116 | local kp = torch.IntTensor(n):contiguous() 117 | libmaskapi.bbNms(Q,n,kp:data(),thr) 118 | return kp 119 | else 120 | local Q, n = MaskApi._rlesFrLua(dt) 121 | local kp = torch.IntTensor(n):contiguous() 122 | libmaskapi.rleNms(Q,n,kp:data(),thr) 123 | MaskApi._rlesFree(Q,n) 124 | return kp 125 | end 126 | end 127 | 128 | MaskApi.area = function( Rs ) 129 | local Qs, n, h, w = MaskApi._rlesFrLua(Rs) 130 | local a = torch.IntTensor(n):contiguous() 131 | libmaskapi.rleArea(Qs,n,a:data()) 132 | MaskApi._rlesFree(Qs,n) 133 | return a 134 | end 135 | 136 | MaskApi.toBbox = function( Rs ) 137 | local Qs, n, h, w = MaskApi._rlesFrLua(Rs) 138 | local bb = torch.DoubleTensor(n,4):contiguous() 139 | libmaskapi.rleToBbox(Qs,bb:data(),n) 140 | MaskApi._rlesFree(Qs,n) 141 | return bb 142 | end 143 | 144 | MaskApi.frBbox = function( bbs, h, w ) 145 | if bbs:dim()==1 then bbs=bbs:view(1,bbs:size(1)) end 146 | local n, k = bbs:size(1), bbs:size(2); assert(k==4) 147 | local data = bbs:type('torch.DoubleTensor'):contiguous():data() 148 | local Qs = MaskApi._rlesInit(n) 149 | libmaskapi.rleFrBbox(Qs[0],data,h,w,n) 150 | return MaskApi._rlesToLua(Qs,n) 151 | end 152 | 153 | MaskApi.frPoly = function( poly, h, w ) 154 | local n = #poly 155 | local Qs, Q = MaskApi._rlesInit(n), MaskApi._rlesInit(1) 156 | for i,p in pairs(poly) do 157 | local xy = p:type('torch.DoubleTensor'):contiguous():data() 158 | libmaskapi.rleFrPoly(Qs[i-1],xy,p:size(1)/2,h,w) 159 | end 160 | libmaskapi.rleMerge(Qs,Q[0],n,0) 161 | MaskApi._rlesFree(Qs,n) 162 | return MaskApi._rlesToLua(Q,1)[1] 163 | end 164 | 165 | -------------------------------------------------------------------------------- 166 | 167 | MaskApi.drawCirc = function( img, x, y, rad, clr ) 168 | assert(img:isContiguous() and img:dim()==3) 169 | local k, h, w, data = img:size(1), img:size(2), img:size(3), img:data() 170 | for dx=-rad,rad do for dy=-rad,rad do 171 | local xi, yi = torch.round(x+dx), torch.round(y+dy) 172 | if dx*dx+dy*dy<=rad*rad and xi>=0 and yi>=0 and xi<w and yi<h then 173 | for c=1,k do data[(c-1)*h*w+yi*w+xi]=clr[c] end 174 | end 175 | end end 176 | end -------------------------------------------------------------------------------- /eval_city/cocoapi/LuaAPI/rocks/coco-scm-1.rockspec: -------------------------------------------------------------------------------- 15 | dependencies = { 16 | "lua >= 5.1", 17 | "torch >= 7.0", 18 | "lua-cjson" 19 | } 20 | 21 | build = { 22 | type = "builtin", 23 | modules = { 24 | ["coco.env"] = "LuaAPI/env.lua", 25 | ["coco.init"] = "LuaAPI/init.lua", 26 | ["coco.MaskApi"] = "LuaAPI/MaskApi.lua", 27 | ["coco.CocoApi"] = "LuaAPI/CocoApi.lua", 28 | libmaskapi = { 29 | sources = { "common/maskApi.c" }, 30 | incdirs = { "common/" } 31 | } 32 | } 33 | } 34 | 35 | -- luarocks make LuaAPI/rocks/coco-scm-1.rockspec 36 | -- https://github.com/pdollar/coco/raw/master/LuaAPI/rocks/coco-scm-1.rockspec 37 | -------------------------------------------------------------------------------- /eval_city/cocoapi/MatlabAPI/MaskApi.m: -------------------------------------------------------------------------------- 1 | classdef MaskApi 2 | % Interface for manipulating masks stored in RLE format. 3 | % 4 | % RLE is a simple yet efficient format for storing binary masks. RLE 5 | % first divides a vector (or vectorized image) into a series of piecewise 6 | % constant regions and then for each piece simply stores the length of 7 | % that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would 8 | % be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] 9 | % (note that the odd counts are always the numbers of zeros).
Instead of 10 | % storing the counts directly, additional compression is achieved with a 11 | % variable bitrate representation based on a common scheme called LEB128. 12 | % 13 | % Compression is greatest given large piecewise constant regions. 14 | % Specifically, the size of the RLE is proportional to the number of 15 | % *boundaries* in M (or for an image the number of boundaries in the y 16 | % direction). Assuming fairly simple shapes, the RLE representation is 17 | % O(sqrt(n)) where n is number of pixels in the object. Hence space usage 18 | % is substantially lower, especially for large simple objects (large n). 19 | % 20 | % Many common operations on masks can be computed directly using the RLE 21 | % (without need for decoding). This includes computations such as area, 22 | % union, intersection, etc. All of these operations are linear in the 23 | % size of the RLE, in other words they are O(sqrt(n)) where n is the area 24 | % of the object. Computing these operations on the original mask is O(n). 25 | % Thus, using the RLE can result in substantial computational savings. 26 | % 27 | % The following API functions are defined: 28 | % encode - Encode binary masks using RLE. 29 | % decode - Decode binary masks encoded via RLE. 30 | % merge - Compute union or intersection of encoded masks. 31 | % iou - Compute intersection over union between masks. 32 | % nms - Compute non-maximum suppression between ordered masks. 33 | % area - Compute area of encoded masks. 34 | % toBbox - Get bounding boxes surrounding encoded masks. 35 | % frBbox - Convert bounding boxes to encoded masks. 36 | % frPoly - Convert polygon to encoded mask. 37 | % 38 | % Usage: 39 | % Rs = MaskApi.encode( masks ) 40 | % masks = MaskApi.decode( Rs ) 41 | % R = MaskApi.merge( Rs, [intersect=false] ) 42 | % o = MaskApi.iou( dt, gt, [iscrowd=false] ) 43 | % keep = MaskApi.nms( dt, thr ) 44 | % a = MaskApi.area( Rs ) 45 | % bbs = MaskApi.toBbox( Rs ) 46 | % Rs = MaskApi.frBbox( bbs, h, w ) 47 | % R = MaskApi.frPoly( poly, h, w ) 48 | % 49 | % In the API the following formats are used: 50 | % R,Rs - [struct] Run-length encoding of binary mask(s) 51 | % masks - [hxwxn] Binary mask(s) (must have type uint8) 52 | % bbs - [nx4] Bounding box(es) stored as [x y w h] 53 | % poly - Polygon stored as {[x1 y1 x2 y2...],[x1 y1 ...],...} 54 | % dt,gt - May be either bounding boxes or encoded masks 55 | % Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel). 56 | % 57 | % Finally, a note about the intersection over union (iou) computation. 58 | % The standard iou of a ground truth (gt) and detected (dt) object is 59 | % iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt)) 60 | % For "crowd" regions, we use a modified criteria. If a gt object is 61 | % marked as "iscrowd", we allow a dt to match any subregion of the gt. 62 | % Choosing gt' in the crowd gt that best matches the dt can be done using 63 | % gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing 64 | % iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt) 65 | % For crowd gt regions we use this modified criteria above for the iou. 66 | % 67 | % To compile use the following (some precompiled binaries are included): 68 | % mex('CFLAGS=\$CFLAGS -Wall -std=c99','-largeArrayDims',... 69 | % 'private/maskApiMex.c','../common/maskApi.c',... 70 | % '-I../common/','-outdir','private'); 71 | % Please do not contact us for help with compiling. 72 | % 73 | % Microsoft COCO Toolbox. 
version 2.0 74 | % Data, paper, and tutorials available at: http://mscoco.org/ 75 | % Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 76 | % Licensed under the Simplified BSD License [see coco/license.txt] 77 | 78 | methods( Static ) 79 | function Rs = encode( masks ) 80 | Rs = maskApiMex( 'encode', masks ); 81 | end 82 | 83 | function masks = decode( Rs ) 84 | masks = maskApiMex( 'decode', Rs ); 85 | end 86 | 87 | function R = merge( Rs, varargin ) 88 | R = maskApiMex( 'merge', Rs, varargin{:} ); 89 | end 90 | 91 | function o = iou( dt, gt, varargin ) 92 | o = maskApiMex( 'iou', dt', gt', varargin{:} ); 93 | end 94 | 95 | function keep = nms( dt, thr ) 96 | keep = maskApiMex('nms',dt',thr); 97 | end 98 | 99 | function a = area( Rs ) 100 | a = maskApiMex( 'area', Rs ); 101 | end 102 | 103 | function bbs = toBbox( Rs ) 104 | bbs = maskApiMex( 'toBbox', Rs )'; 105 | end 106 | 107 | function Rs = frBbox( bbs, h, w ) 108 | Rs = maskApiMex( 'frBbox', bbs', h, w ); 109 | end 110 | 111 | function R = frPoly( poly, h, w ) 112 | R = maskApiMex( 'frPoly', poly, h , w ); 113 | end 114 | end 115 | 116 | end 117 | -------------------------------------------------------------------------------- /eval_city/cocoapi/MatlabAPI/cocoDemo.m: -------------------------------------------------------------------------------- 1 | %% Demo for the CocoApi (see CocoApi.m) 2 | 3 | %% initialize COCO api (please specify dataType/annType below) 4 | annTypes = { 'instances', 'captions', 'person_keypoints' }; 5 | dataType='val2014'; annType=annTypes{1}; % specify dataType/annType 6 | annFile=sprintf('../annotations/%s_%s.json',annType,dataType); 7 | coco=CocoApi(annFile); 8 | 9 | %% display COCO categories and supercategories 10 | if( ~strcmp(annType,'captions') ) 11 | cats = coco.loadCats(coco.getCatIds()); 12 | nms={cats.name}; fprintf('COCO categories: '); 13 | fprintf('%s, ',nms{:}); fprintf('\n'); 14 | nms=unique({cats.supercategory}); fprintf('COCO supercategories: '); 15 | fprintf('%s, ',nms{:}); fprintf('\n'); 16 | end 17 | 18 | %% get all images containing given categories, select one at random 19 | catIds = coco.getCatIds('catNms',{'person','dog','skateboard'}); 20 | imgIds = coco.getImgIds('catIds',catIds); 21 | imgId = imgIds(randi(length(imgIds))); 22 | 23 | %% load and display image 24 | img = coco.loadImgs(imgId); 25 | I = imread(sprintf('../images/%s/%s',dataType,img.file_name)); 26 | figure(1); imagesc(I); axis('image'); set(gca,'XTick',[],'YTick',[]) 27 | 28 | %% load and display annotations 29 | annIds = coco.getAnnIds('imgIds',imgId,'catIds',catIds,'iscrowd',[]); 30 | anns = coco.loadAnns(annIds); coco.showAnns(anns); 31 | -------------------------------------------------------------------------------- /eval_city/cocoapi/MatlabAPI/evalDemo.m: -------------------------------------------------------------------------------- 1 | %% Demo demonstrating the algorithm result formats for COCO 2 | 3 | %% select results type for demo (either bbox or segm) 4 | type = {'segm','bbox','keypoints'}; type = type{1}; % specify type here 5 | fprintf('Running demo for *%s* results.\n\n',type); 6 | 7 | %% initialize COCO ground truth api 8 | dataDir='../'; prefix='instances'; dataType='val2014'; 9 | if(strcmp(type,'keypoints')), prefix='person_keypoints'; end 10 | annFile=sprintf('%s/annotations/%s_%s.json',dataDir,prefix,dataType); 11 | cocoGt=CocoApi(annFile); 12 | 13 | %% initialize COCO detections api 14 | resFile='%s/results/%s_%s_fake%s100_results.json'; 15 | 
resFile=sprintf(resFile,dataDir,prefix,dataType,type); 16 | cocoDt=cocoGt.loadRes(resFile); 17 | 18 | %% visualize gt and dt side by side 19 | imgIds=sort(cocoGt.getImgIds()); imgIds=imgIds(1:100); 20 | imgId = imgIds(randi(100)); img = cocoGt.loadImgs(imgId); 21 | I = imread(sprintf('%s/images/val2014/%s',dataDir,img.file_name)); 22 | figure(1); subplot(1,2,1); imagesc(I); axis('image'); axis off; 23 | annIds = cocoGt.getAnnIds('imgIds',imgId); title('ground truth') 24 | anns = cocoGt.loadAnns(annIds); cocoGt.showAnns(anns); 25 | figure(1); subplot(1,2,2); imagesc(I); axis('image'); axis off; 26 | annIds = cocoDt.getAnnIds('imgIds',imgId); title('results') 27 | anns = cocoDt.loadAnns(annIds); cocoDt.showAnns(anns); 28 | 29 | %% load raw JSON and show exact format for results 30 | fprintf('results structures have the following format:\n'); 31 | res = gason(fileread(resFile)); disp(res) 32 | 33 | %% the following command can be used to save the results back to disk 34 | if(0), f=fopen(resFile,'w'); fwrite(f,gason(res)); fclose(f); end 35 | 36 | %% run COCO evaluation code (see CocoEval.m) 37 | cocoEval=CocoEval(cocoGt,cocoDt,type); 38 | cocoEval.params.imgIds=imgIds; 39 | cocoEval.evaluate(); 40 | cocoEval.accumulate(); 41 | cocoEval.summarize(); 42 | 43 | %% generate Derek Hoiem style analysis of false positives (slow) 44 | if(0), cocoEval.analyze(); end 45 | -------------------------------------------------------------------------------- /eval_city/cocoapi/MatlabAPI/gason.m: -------------------------------------------------------------------------------- 1 | function out = gason( in ) 2 | % Convert between JSON strings and corresponding JSON objects. 3 | % 4 | % This parser is based on Gason written and maintained by Ivan Vashchaev: 5 | % https://github.com/vivkin/gason 6 | % Gason is a "lightweight and fast JSON parser for C++". Please see the 7 | % above link for license information and additional details about Gason. 8 | % 9 | % Given a JSON string, gason calls the C++ parser and converts the output 10 | % into an appropriate Matlab structure. As the parsing is performed in mex 11 | % the resulting parser is blazingly fast. Large JSON structs (100MB+) take 12 | % only a few seconds to parse (compared to hours for pure Matlab parsers). 13 | % 14 | % Given a JSON object, gason calls the C++ encoder to convert the object 15 | % back into a JSON string representation. Nearly any Matlab struct, cell 16 | % array, or numeric array represents a valid JSON object. Note that gason() 17 | % can be used to go both from JSON string to JSON object and back. 18 | % 19 | % Gason requires C++11 to compile (for GCC this requires version 4.7 or 20 | % later). The following command compiles the parser (may require tweaking): 21 | % mex('CXXFLAGS=\$CXXFLAGS -std=c++11 -Wall','-largeArrayDims',... 22 | % 'private/gasonMex.cpp','../common/gason.cpp',... 23 | % '-I../common/','-outdir','private'); 24 | % Note the use of the "-std=c++11" flag. A number of precompiled binaries 25 | % are included, please do not contact us for help with compiling. If needed 26 | % you can specify a compiler by adding the option 'CXX="/usr/bin/g++"'. 27 | % 28 | % Note that by default JSON arrays that contain only numbers are stored as 29 | % regular Matlab arrays. Likewise, JSON arrays that contain only objects of 30 | % the same type are stored as Matlab struct arrays. This is much faster and 31 | % can use considerably less memory than always using Matlab cell arrays.
32 | % 33 | % USAGE 34 | % object = gason( string ) 35 | % string = gason( object ) 36 | % 37 | % INPUTS/OUTPUTS 38 | % string - JSON string 39 | % object - JSON object 40 | % 41 | % EXAMPLE 42 | % o = struct('first',{'piotr','ty'},'last',{'dollar','lin'}) 43 | % s = gason( o ) % convert JSON object -> JSON string 44 | % p = gason( s ) % convert JSON string -> JSON object 45 | % 46 | % See also 47 | % 48 | % Microsoft COCO Toolbox. version 2.0 49 | % Data, paper, and tutorials available at: http://mscoco.org/ 50 | % Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 51 | % Licensed under the Simplified BSD License [see coco/license.txt] 52 | 53 | out = gasonMex( 'convert', in ); 54 | -------------------------------------------------------------------------------- /eval_city/cocoapi/MatlabAPI/private/gasonMex.cpp: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #include "gason.h" 8 | #include "mex.h" 9 | #include "string.h" 10 | #include "math.h" 11 | #include 12 | #include 13 | #include 14 | typedef std::ostringstream ostrm; 15 | typedef unsigned long siz; 16 | typedef unsigned short ushort; 17 | 18 | siz length( const JsonValue &a ) { 19 | // get number of elements in JSON_ARRAY or JSON_OBJECT 20 | siz k=0; auto n=a.toNode(); while(n) { k++; n=n->next; } return k; 21 | } 22 | 23 | bool isRegularObjArray( const JsonValue &a ) { 24 | // check if all JSON_OBJECTs in JSON_ARRAY have the same fields 25 | JsonValue o=a.toNode()->value; siz k, n; const char **keys; 26 | n=length(o); keys=new const char*[n]; 27 | k=0; for(auto j:o) keys[k++]=j->key; 28 | for( auto i:a ) { 29 | if(length(i->value)!=n) return false; k=0; 30 | for(auto j:i->value) if(strcmp(j->key,keys[k++])) return false; 31 | } 32 | delete [] keys; return true; 33 | } 34 | 35 | mxArray* json( const JsonValue &o ) { 36 | // convert JsonValue to Matlab mxArray 37 | siz k, m, n; mxArray *M; const char **keys; 38 | switch( o.getTag() ) { 39 | case JSON_NUMBER: 40 | return mxCreateDoubleScalar(o.toNumber()); 41 | case JSON_STRING: 42 | return mxCreateString(o.toString()); 43 | case JSON_ARRAY: { 44 | if(!o.toNode()) return mxCreateDoubleMatrix(1,0,mxREAL); 45 | JsonValue o0=o.toNode()->value; JsonTag tag=o0.getTag(); 46 | n=length(o); bool isRegular=true; 47 | for(auto i:o) isRegular=isRegular && i->value.getTag()==tag; 48 | if( isRegular && tag==JSON_OBJECT && isRegularObjArray(o) ) { 49 | m=length(o0); keys=new const char*[m]; 50 | k=0; for(auto j:o0) keys[k++]=j->key; 51 | M = mxCreateStructMatrix(1,n,m,keys); 52 | k=0; for(auto i:o) { m=0; for(auto j:i->value) 53 | mxSetFieldByNumber(M,k,m++,json(j->value)); k++; } 54 | delete [] keys; return M; 55 | } else if( isRegular && tag==JSON_NUMBER ) { 56 | M = mxCreateDoubleMatrix(1,n,mxREAL); double *p=mxGetPr(M); 57 | k=0; for(auto i:o) p[k++]=i->value.toNumber(); return M; 58 | } else { 59 | M = mxCreateCellMatrix(1,n); 60 | k=0; for(auto i:o) mxSetCell(M,k++,json(i->value)); 61 | return M; 62 | } 63 | } 64 | case JSON_OBJECT: 65 | if(!o.toNode()) return mxCreateStructMatrix(1,0,0,NULL); 66 | n=length(o); keys=new const char*[n]; 67 | k=0; for(auto i:o) 
keys[k++]=i->key; 68 | M = mxCreateStructMatrix(1,1,n,keys); k=0; 69 | for(auto i:o) mxSetFieldByNumber(M,0,k++,json(i->value)); 70 | delete [] keys; return M; 71 | case JSON_TRUE: 72 | return mxCreateDoubleScalar(1); 73 | case JSON_FALSE: 74 | return mxCreateDoubleScalar(0); 75 | case JSON_NULL: 76 | return mxCreateDoubleMatrix(0,0,mxREAL); 77 | default: return NULL; 78 | } 79 | } 80 | 81 | template ostrm& json( ostrm &S, T *A, siz n ) { 82 | // convert numeric array to JSON string with casting 83 | if(n==0) { S<<"[]"; return S; } if(n==1) { S< ostrm& json( ostrm &S, T *A, siz n ) { 89 | // convert numeric array to JSON string without casting 90 | return json(S,A,n); 91 | } 92 | 93 | ostrm& json( ostrm &S, const char *A ) { 94 | // convert char array to JSON string (handle escape characters) 95 | #define RPL(a,b) case a: { S << b; A++; break; } 96 | S << "\""; while( *A>0 ) switch( *A ) { 97 | RPL('"',"\\\""); RPL('\\',"\\\\"); RPL('/',"\\/"); RPL('\b',"\\b"); 98 | RPL('\f',"\\f"); RPL('\n',"\\n"); RPL('\r',"\\r"); RPL('\t',"\\t"); 99 | default: S << *A; A++; 100 | } 101 | S << "\""; return S; 102 | } 103 | 104 | ostrm& json( ostrm& S, const JsonValue *o ) { 105 | // convert JsonValue to JSON string 106 | switch( o->getTag() ) { 107 | case JSON_NUMBER: S << o->toNumber(); return S; 108 | case JSON_TRUE: S << "true"; return S; 109 | case JSON_FALSE: S << "false"; return S; 110 | case JSON_NULL: S << "null"; return S; 111 | case JSON_STRING: return json(S,o->toString()); 112 | case JSON_ARRAY: 113 | S << "["; for(auto i:*o) { 114 | json(S,&i->value) << (i->next ? "," : ""); } 115 | S << "]"; return S; 116 | case JSON_OBJECT: 117 | S << "{"; for(auto i:*o) { 118 | json(S,i->key) << ":"; 119 | json(S,&i->value) << (i->next ? "," : ""); } 120 | S << "}"; return S; 121 | default: return S; 122 | } 123 | } 124 | 125 | ostrm& json( ostrm& S, const mxArray *M ) { 126 | // convert Matlab mxArray to JSON string 127 | siz i, j, m, n=mxGetNumberOfElements(M); 128 | void *A=mxGetData(M); ostrm *nms; 129 | switch( mxGetClassID(M) ) { 130 | case mxDOUBLE_CLASS: return json(S,(double*) A,n); 131 | case mxSINGLE_CLASS: return json(S,(float*) A,n); 132 | case mxINT64_CLASS: return json(S,(int64_t*) A,n); 133 | case mxUINT64_CLASS: return json(S,(uint64_t*) A,n); 134 | case mxINT32_CLASS: return json(S,(int32_t*) A,n); 135 | case mxUINT32_CLASS: return json(S,(uint32_t*) A,n); 136 | case mxINT16_CLASS: return json(S,(int16_t*) A,n); 137 | case mxUINT16_CLASS: return json(S,(uint16_t*) A,n); 138 | case mxINT8_CLASS: return json(S,(int8_t*) A,n); 139 | case mxUINT8_CLASS: return json(S,(uint8_t*) A,n); 140 | case mxLOGICAL_CLASS: return json(S,(uint8_t*) A,n); 141 | case mxCHAR_CLASS: return json(S,mxArrayToString(M)); 142 | case mxCELL_CLASS: 143 | S << "["; for(i=0; i0) json(S,mxGetCell(M,n-1)); S << "]"; return S; 145 | case mxSTRUCT_CLASS: 146 | if(n==0) { S<<"{}"; return S; } m=mxGetNumberOfFields(M); 147 | if(m==0) { S<<"["; for(i=0; i1) S<<"["; nms=new ostrm[m]; 149 | for(j=0; j1) S<<"]"; delete [] nms; return S; 156 | default: 157 | mexErrMsgTxt( "Unknown type." 
); return S; 158 | } 159 | } 160 | 161 | mxArray* mxCreateStringRobust( const char* str ) { 162 | // convert char* to Matlab string (robust version of mxCreateString) 163 | mxArray *M; ushort *c; mwSize n[2]={1,strlen(str)}; 164 | M=mxCreateCharArray(2,n); c=(ushort*) mxGetData(M); 165 | for( siz i=0; i1 ) mexErrMsgTxt("One output expected."); 182 | 183 | if(!strcmp(action,"convert")) { 184 | if( nr!=1 ) mexErrMsgTxt("One input expected."); 185 | if( mxGetClassID(pr[0])==mxCHAR_CLASS ) { 186 | // object = mexFunction( string ) 187 | char *str = mxArrayToStringRobust(pr[0]); 188 | int status = jsonParse(str, &endptr, &val, allocator); 189 | if( status != JSON_OK) mexErrMsgTxt(jsonStrError(status)); 190 | pl[0] = json(val); mxFree(str); 191 | } else { 192 | // string = mexFunction( object ) 193 | ostrm S; S << std::setprecision(12); json(S,pr[0]); 194 | pl[0]=mxCreateStringRobust(S.str().c_str()); 195 | } 196 | 197 | } else if(!strcmp(action,"split")) { 198 | // strings = mexFunction( string, k ) 199 | if( nr!=2 ) mexErrMsgTxt("Two input expected."); 200 | char *str = mxArrayToStringRobust(pr[0]); 201 | int status = jsonParse(str, &endptr, &val, allocator); 202 | if( status != JSON_OK) mexErrMsgTxt(jsonStrError(status)); 203 | if( val.getTag()!=JSON_ARRAY ) mexErrMsgTxt("Array expected"); 204 | siz i=0, t=0, n=length(val), k=(siz) mxGetScalar(pr[1]); 205 | k=(k>n)?n:(k<1)?1:k; k=ceil(n/ceil(double(n)/k)); 206 | pl[0]=mxCreateCellMatrix(1,k); ostrm S; S<value); t--; if(!o->next) t=0; S << (t ? "," : "]"); 210 | if(!t) mxSetCell(pl[0],i++,mxCreateStringRobust(S.str().c_str())); 211 | } 212 | 213 | } else if(!strcmp(action,"merge")) { 214 | // string = mexFunction( strings ) 215 | if( nr!=1 ) mexErrMsgTxt("One input expected."); 216 | if(!mxIsCell(pr[0])) mexErrMsgTxt("Cell array expected."); 217 | siz n = mxGetNumberOfElements(pr[0]); 218 | ostrm S; S << std::setprecision(12); S << "["; 219 | for( siz i=0; ivalue) << (j->next ? "," : ""); 225 | mxFree(str); if(i1) 14 | % [ param1 ... paramN ] = getPrmDflt( prm, dfs, [checkExtra] ) 15 | % 16 | % INPUTS 17 | % prm - param struct or cell of form {'name1' v1 'name2' v2 ...} 18 | % dfs - cell of form {'name1' def1 'name2' def2 ...} 19 | % checkExtra - [0] if 1 throw error if prm contains params not in dfs 20 | % if -1 if prm contains params not in dfs adds them 21 | % 22 | % OUTPUTS (nargout==1) 23 | % prm - parameter struct with fields 'name1' through 'nameN' assigned 24 | % 25 | % OUTPUTS (nargout>1) 26 | % param1 - value assigned to parameter with 'name1' 27 | % ... 28 | % paramN - value assigned to parameter with 'nameN' 29 | % 30 | % EXAMPLE 31 | % dfs = { 'x','REQ', 'y',0, 'z',[], 'eps',1e-3 }; 32 | % prm = getPrmDflt( struct('x',1,'y',1), dfs ) 33 | % [ x y z eps ] = getPrmDflt( {'x',2,'y',1}, dfs ) 34 | % 35 | % See also INPUTPARSER 36 | % 37 | % Piotr's Computer Vision Matlab Toolbox Version 2.60 38 | % Copyright 2014 Piotr Dollar. 
[pdollar-at-gmail.com] 39 | % Licensed under the Simplified BSD License [see external/bsd.txt] 40 | 41 | if( mod(length(dfs),2) ), error('odd number of default parameters'); end 42 | if nargin<=2, checkExtra = 0; end 43 | 44 | % get the input parameters as two cell arrays: prmVal and prmField 45 | if iscell(prm) && length(prm)==1, prm=prm{1}; end 46 | if iscell(prm) 47 | if(mod(length(prm),2)), error('odd number of parameters in prm'); end 48 | prmField = prm(1:2:end); prmVal = prm(2:2:end); 49 | else 50 | if(~isstruct(prm)), error('prm must be a struct or a cell'); end 51 | prmVal = struct2cell(prm); prmField = fieldnames(prm); 52 | end 53 | 54 | % get and update default values using quick for loop 55 | dfsField = dfs(1:2:end); dfsVal = dfs(2:2:end); 56 | if checkExtra>0 57 | for i=1:length(prmField) 58 | j = find(strcmp(prmField{i},dfsField)); 59 | if isempty(j), error('parameter %s is not valid', prmField{i}); end 60 | dfsVal(j) = prmVal(i); 61 | end 62 | elseif checkExtra<0 63 | for i=1:length(prmField) 64 | j = find(strcmp(prmField{i},dfsField)); 65 | if isempty(j), j=length(dfsVal)+1; dfsField{j}=prmField{i}; end 66 | dfsVal(j) = prmVal(i); 67 | end 68 | else 69 | for i=1:length(prmField) 70 | dfsVal(strcmp(prmField{i},dfsField)) = prmVal(i); 71 | end 72 | end 73 | 74 | % check for missing values 75 | if any(strcmp('REQ',dfsVal)) 76 | cmpArray = find(strcmp('REQ',dfsVal)); 77 | error(['Required field ''' dfsField{cmpArray(1)} ''' not specified.'] ); 78 | end 79 | 80 | % set output 81 | if nargout==1 82 | varargout{1} = cell2struct( dfsVal, dfsField, 2 ); 83 | else 84 | varargout = dfsVal; 85 | end 86 | -------------------------------------------------------------------------------- /eval_city/cocoapi/MatlabAPI/private/maskApiMex.c: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #include "mex.h" 8 | #include "maskApi.h" 9 | #include 10 | 11 | void checkType( const mxArray *M, mxClassID id ) { 12 | if(mxGetClassID(M)!=id) mexErrMsgTxt("Invalid type."); 13 | } 14 | 15 | mxArray* toMxArray( const RLE *R, siz n ) { 16 | const char *fs[] = {"size", "counts"}; 17 | mxArray *M=mxCreateStructMatrix(1,n,2,fs); 18 | for( siz i=0; i1) mexErrMsgTxt(err); 35 | for( i=0; i<*n; i++ ) { 36 | mxArray *S, *C; double *s; void *c; 37 | S=mxGetFieldByNumber(M,i,O[0]); checkType(S,mxDOUBLE_CLASS); 38 | C=mxGetFieldByNumber(M,i,O[1]); s=mxGetPr(S); c=mxGetData(C); 39 | h=(siz)s[0]; w=(siz)s[1]; m=mxGetNumberOfElements(C); 40 | if(same && i>0 && (h!=R[0].h || w!=R[0].w)) mexErrMsgTxt(err); 41 | if( mxGetClassID(C)==mxDOUBLE_CLASS ) { 42 | rleInit(R+i,h,w,m,0); 43 | for(j=0; j=2) ? (mxGetScalar(pr[1])>0) : false; 74 | rleMerge(R,&M,n,intersect); pl[0]=toMxArray(&M,1); rleFree(&M); 75 | 76 | } else if(!strcmp(action,"area")) { 77 | R=frMxArray(pr[0],&n,0); 78 | pl[0]=mxCreateNumericMatrix(1,n,mxUINT32_CLASS,mxREAL); 79 | uint *a=(uint*) mxGetPr(pl[0]); rleArea(R,n,a); 80 | 81 | } else if(!strcmp(action,"iou")) { 82 | if(nr>2) checkType(pr[2],mxUINT8_CLASS); siz nDt, nGt; 83 | byte *iscrowd = nr>2 ? 
(byte*) mxGetPr(pr[2]) : NULL; 84 | if(mxIsStruct(pr[0]) || mxIsStruct(pr[1])) { 85 | RLE *dt=frMxArray(pr[0],&nDt,1), *gt=frMxArray(pr[1],&nGt,1); 86 | pl[0]=mxCreateNumericMatrix(nDt,nGt,mxDOUBLE_CLASS,mxREAL); 87 | double *o=mxGetPr(pl[0]); rleIou(dt,gt,nDt,nGt,iscrowd,o); 88 | rlesFree(&dt,nDt); rlesFree(>,nGt); 89 | } else { 90 | checkType(pr[0],mxDOUBLE_CLASS); checkType(pr[1],mxDOUBLE_CLASS); 91 | double *dt=mxGetPr(pr[0]); nDt=mxGetN(pr[0]); 92 | double *gt=mxGetPr(pr[1]); nGt=mxGetN(pr[1]); 93 | pl[0]=mxCreateNumericMatrix(nDt,nGt,mxDOUBLE_CLASS,mxREAL); 94 | double *o=mxGetPr(pl[0]); bbIou(dt,gt,nDt,nGt,iscrowd,o); 95 | } 96 | 97 | } else if(!strcmp(action,"nms")) { 98 | siz n; uint *keep; double thr=(double) mxGetScalar(pr[1]); 99 | if(mxIsStruct(pr[0])) { 100 | RLE *dt=frMxArray(pr[0],&n,1); 101 | pl[0]=mxCreateNumericMatrix(1,n,mxUINT32_CLASS,mxREAL); 102 | keep=(uint*) mxGetPr(pl[0]); rleNms(dt,n,keep,thr); 103 | rlesFree(&dt,n); 104 | } else { 105 | checkType(pr[0],mxDOUBLE_CLASS); 106 | double *dt=mxGetPr(pr[0]); n=mxGetN(pr[0]); 107 | pl[0]=mxCreateNumericMatrix(1,n,mxUINT32_CLASS,mxREAL); 108 | keep=(uint*) mxGetPr(pl[0]); bbNms(dt,n,keep,thr); 109 | } 110 | 111 | } else if(!strcmp(action,"toBbox")) { 112 | R=frMxArray(pr[0],&n,0); 113 | pl[0]=mxCreateNumericMatrix(4,n,mxDOUBLE_CLASS,mxREAL); 114 | BB bb=mxGetPr(pl[0]); rleToBbox(R,bb,n); 115 | 116 | } else if(!strcmp(action,"frBbox")) { 117 | checkType(pr[0],mxDOUBLE_CLASS); 118 | double *bb=mxGetPr(pr[0]); n=mxGetN(pr[0]); 119 | h=(siz)mxGetScalar(pr[1]); w=(siz)mxGetScalar(pr[2]); 120 | rlesInit(&R,n); rleFrBbox(R,bb,h,w,n); pl[0]=toMxArray(R,n); 121 | 122 | } else if(!strcmp(action,"frPoly")) { 123 | checkType(pr[0],mxCELL_CLASS); n=mxGetNumberOfElements(pr[0]); 124 | h=(siz)mxGetScalar(pr[1]); w=(siz)mxGetScalar(pr[2]); rlesInit(&R,n); 125 | for(siz i=0; i 4 | 5 | #define JSON_ZONE_SIZE 4096 6 | #define JSON_STACK_SIZE 32 7 | 8 | const char *jsonStrError(int err) { 9 | switch (err) { 10 | #define XX(no, str) \ 11 | case JSON_##no: \ 12 | return str; 13 | JSON_ERRNO_MAP(XX) 14 | #undef XX 15 | default: 16 | return "unknown"; 17 | } 18 | } 19 | 20 | void *JsonAllocator::allocate(size_t size) { 21 | size = (size + 7) & ~7; 22 | 23 | if (head && head->used + size <= JSON_ZONE_SIZE) { 24 | char *p = (char *)head + head->used; 25 | head->used += size; 26 | return p; 27 | } 28 | 29 | size_t allocSize = sizeof(Zone) + size; 30 | Zone *zone = (Zone *)malloc(allocSize <= JSON_ZONE_SIZE ? 
JSON_ZONE_SIZE : allocSize); 31 | if (zone == nullptr) 32 | return nullptr; 33 | zone->used = allocSize; 34 | if (allocSize <= JSON_ZONE_SIZE || head == nullptr) { 35 | zone->next = head; 36 | head = zone; 37 | } else { 38 | zone->next = head->next; 39 | head->next = zone; 40 | } 41 | return (char *)zone + sizeof(Zone); 42 | } 43 | 44 | void JsonAllocator::deallocate() { 45 | while (head) { 46 | Zone *next = head->next; 47 | free(head); 48 | head = next; 49 | } 50 | } 51 | 52 | static inline bool isspace(char c) { 53 | return c == ' ' || (c >= '\t' && c <= '\r'); 54 | } 55 | 56 | static inline bool isdelim(char c) { 57 | return c == ',' || c == ':' || c == ']' || c == '}' || isspace(c) || !c; 58 | } 59 | 60 | static inline bool isdigit(char c) { 61 | return c >= '0' && c <= '9'; 62 | } 63 | 64 | static inline bool isxdigit(char c) { 65 | return (c >= '0' && c <= '9') || ((c & ~' ') >= 'A' && (c & ~' ') <= 'F'); 66 | } 67 | 68 | static inline int char2int(char c) { 69 | if (c <= '9') 70 | return c - '0'; 71 | return (c & ~' ') - 'A' + 10; 72 | } 73 | 74 | static double string2double(char *s, char **endptr) { 75 | char ch = *s; 76 | if (ch == '-') 77 | ++s; 78 | 79 | double result = 0; 80 | while (isdigit(*s)) 81 | result = (result * 10) + (*s++ - '0'); 82 | 83 | if (*s == '.') { 84 | ++s; 85 | 86 | double fraction = 1; 87 | while (isdigit(*s)) { 88 | fraction *= 0.1; 89 | result += (*s++ - '0') * fraction; 90 | } 91 | } 92 | 93 | if (*s == 'e' || *s == 'E') { 94 | ++s; 95 | 96 | double base = 10; 97 | if (*s == '+') 98 | ++s; 99 | else if (*s == '-') { 100 | ++s; 101 | base = 0.1; 102 | } 103 | 104 | unsigned int exponent = 0; 105 | while (isdigit(*s)) 106 | exponent = (exponent * 10) + (*s++ - '0'); 107 | 108 | double power = 1; 109 | for (; exponent; exponent >>= 1, base *= base) 110 | if (exponent & 1) 111 | power *= base; 112 | 113 | result *= power; 114 | } 115 | 116 | *endptr = s; 117 | return ch == '-' ? 
-result : result; 118 | } 119 | 120 | static inline JsonNode *insertAfter(JsonNode *tail, JsonNode *node) { 121 | if (!tail) 122 | return node->next = node; 123 | node->next = tail->next; 124 | tail->next = node; 125 | return node; 126 | } 127 | 128 | static inline JsonValue listToValue(JsonTag tag, JsonNode *tail) { 129 | if (tail) { 130 | auto head = tail->next; 131 | tail->next = nullptr; 132 | return JsonValue(tag, head); 133 | } 134 | return JsonValue(tag, nullptr); 135 | } 136 | 137 | int jsonParse(char *s, char **endptr, JsonValue *value, JsonAllocator &allocator) { 138 | JsonNode *tails[JSON_STACK_SIZE]; 139 | JsonTag tags[JSON_STACK_SIZE]; 140 | char *keys[JSON_STACK_SIZE]; 141 | JsonValue o; 142 | int pos = -1; 143 | bool separator = true; 144 | JsonNode *node; 145 | *endptr = s; 146 | 147 | while (*s) { 148 | while (isspace(*s)) { 149 | ++s; 150 | if (!*s) break; 151 | } 152 | *endptr = s++; 153 | switch (**endptr) { 154 | case '-': 155 | if (!isdigit(*s) && *s != '.') { 156 | *endptr = s; 157 | return JSON_BAD_NUMBER; 158 | } 159 | case '0': 160 | case '1': 161 | case '2': 162 | case '3': 163 | case '4': 164 | case '5': 165 | case '6': 166 | case '7': 167 | case '8': 168 | case '9': 169 | o = JsonValue(string2double(*endptr, &s)); 170 | if (!isdelim(*s)) { 171 | *endptr = s; 172 | return JSON_BAD_NUMBER; 173 | } 174 | break; 175 | case '"': 176 | o = JsonValue(JSON_STRING, s); 177 | for (char *it = s; *s; ++it, ++s) { 178 | int c = *it = *s; 179 | if (c == '\\') { 180 | c = *++s; 181 | switch (c) { 182 | case '\\': 183 | case '"': 184 | case '/': 185 | *it = c; 186 | break; 187 | case 'b': 188 | *it = '\b'; 189 | break; 190 | case 'f': 191 | *it = '\f'; 192 | break; 193 | case 'n': 194 | *it = '\n'; 195 | break; 196 | case 'r': 197 | *it = '\r'; 198 | break; 199 | case 't': 200 | *it = '\t'; 201 | break; 202 | case 'u': 203 | c = 0; 204 | for (int i = 0; i < 4; ++i) { 205 | if (isxdigit(*++s)) { 206 | c = c * 16 + char2int(*s); 207 | } else { 208 | *endptr = s; 209 | return JSON_BAD_STRING; 210 | } 211 | } 212 | if (c < 0x80) { 213 | *it = c; 214 | } else if (c < 0x800) { 215 | *it++ = 0xC0 | (c >> 6); 216 | *it = 0x80 | (c & 0x3F); 217 | } else { 218 | *it++ = 0xE0 | (c >> 12); 219 | *it++ = 0x80 | ((c >> 6) & 0x3F); 220 | *it = 0x80 | (c & 0x3F); 221 | } 222 | break; 223 | default: 224 | *endptr = s; 225 | return JSON_BAD_STRING; 226 | } 227 | } else if ((unsigned int)c < ' ' || c == '\x7F') { 228 | *endptr = s; 229 | return JSON_BAD_STRING; 230 | } else if (c == '"') { 231 | *it = 0; 232 | ++s; 233 | break; 234 | } 235 | } 236 | if (!isdelim(*s)) { 237 | *endptr = s; 238 | return JSON_BAD_STRING; 239 | } 240 | break; 241 | case 't': 242 | if (!(s[0] == 'r' && s[1] == 'u' && s[2] == 'e' && isdelim(s[3]))) 243 | return JSON_BAD_IDENTIFIER; 244 | o = JsonValue(JSON_TRUE); 245 | s += 3; 246 | break; 247 | case 'f': 248 | if (!(s[0] == 'a' && s[1] == 'l' && s[2] == 's' && s[3] == 'e' && isdelim(s[4]))) 249 | return JSON_BAD_IDENTIFIER; 250 | o = JsonValue(JSON_FALSE); 251 | s += 4; 252 | break; 253 | case 'n': 254 | if (!(s[0] == 'u' && s[1] == 'l' && s[2] == 'l' && isdelim(s[3]))) 255 | return JSON_BAD_IDENTIFIER; 256 | o = JsonValue(JSON_NULL); 257 | s += 3; 258 | break; 259 | case ']': 260 | if (pos == -1) 261 | return JSON_STACK_UNDERFLOW; 262 | if (tags[pos] != JSON_ARRAY) 263 | return JSON_MISMATCH_BRACKET; 264 | o = listToValue(JSON_ARRAY, tails[pos--]); 265 | break; 266 | case '}': 267 | if (pos == -1) 268 | return JSON_STACK_UNDERFLOW; 269 | if (tags[pos] != JSON_OBJECT) 
270 | return JSON_MISMATCH_BRACKET; 271 | if (keys[pos] != nullptr) 272 | return JSON_UNEXPECTED_CHARACTER; 273 | o = listToValue(JSON_OBJECT, tails[pos--]); 274 | break; 275 | case '[': 276 | if (++pos == JSON_STACK_SIZE) 277 | return JSON_STACK_OVERFLOW; 278 | tails[pos] = nullptr; 279 | tags[pos] = JSON_ARRAY; 280 | keys[pos] = nullptr; 281 | separator = true; 282 | continue; 283 | case '{': 284 | if (++pos == JSON_STACK_SIZE) 285 | return JSON_STACK_OVERFLOW; 286 | tails[pos] = nullptr; 287 | tags[pos] = JSON_OBJECT; 288 | keys[pos] = nullptr; 289 | separator = true; 290 | continue; 291 | case ':': 292 | if (separator || keys[pos] == nullptr) 293 | return JSON_UNEXPECTED_CHARACTER; 294 | separator = true; 295 | continue; 296 | case ',': 297 | if (separator || keys[pos] != nullptr) 298 | return JSON_UNEXPECTED_CHARACTER; 299 | separator = true; 300 | continue; 301 | case '\0': 302 | continue; 303 | default: 304 | return JSON_UNEXPECTED_CHARACTER; 305 | } 306 | 307 | separator = false; 308 | 309 | if (pos == -1) { 310 | *endptr = s; 311 | *value = o; 312 | return JSON_OK; 313 | } 314 | 315 | if (tags[pos] == JSON_OBJECT) { 316 | if (!keys[pos]) { 317 | if (o.getTag() != JSON_STRING) 318 | return JSON_UNQUOTED_KEY; 319 | keys[pos] = o.toString(); 320 | continue; 321 | } 322 | if ((node = (JsonNode *) allocator.allocate(sizeof(JsonNode))) == nullptr) 323 | return JSON_ALLOCATION_FAILURE; 324 | tails[pos] = insertAfter(tails[pos], node); 325 | tails[pos]->key = keys[pos]; 326 | keys[pos] = nullptr; 327 | } else { 328 | if ((node = (JsonNode *) allocator.allocate(sizeof(JsonNode) - sizeof(char *))) == nullptr) 329 | return JSON_ALLOCATION_FAILURE; 330 | tails[pos] = insertAfter(tails[pos], node); 331 | } 332 | tails[pos]->value = o; 333 | } 334 | return JSON_BREAKING_BAD; 335 | } 336 | -------------------------------------------------------------------------------- /eval_city/cocoapi/common/gason.h: -------------------------------------------------------------------------------- 1 | // https://github.com/vivkin/gason - pulled January 10, 2016 2 | #pragma once 3 | 4 | #include 5 | #include 6 | #include 7 | 8 | enum JsonTag { 9 | JSON_NUMBER = 0, 10 | JSON_STRING, 11 | JSON_ARRAY, 12 | JSON_OBJECT, 13 | JSON_TRUE, 14 | JSON_FALSE, 15 | JSON_NULL = 0xF 16 | }; 17 | 18 | struct JsonNode; 19 | 20 | #define JSON_VALUE_PAYLOAD_MASK 0x00007FFFFFFFFFFFULL 21 | #define JSON_VALUE_NAN_MASK 0x7FF8000000000000ULL 22 | #define JSON_VALUE_TAG_MASK 0xF 23 | #define JSON_VALUE_TAG_SHIFT 47 24 | 25 | union JsonValue { 26 | uint64_t ival; 27 | double fval; 28 | 29 | JsonValue(double x) 30 | : fval(x) { 31 | } 32 | JsonValue(JsonTag tag = JSON_NULL, void *payload = nullptr) { 33 | assert((uintptr_t)payload <= JSON_VALUE_PAYLOAD_MASK); 34 | ival = JSON_VALUE_NAN_MASK | ((uint64_t)tag << JSON_VALUE_TAG_SHIFT) | (uintptr_t)payload; 35 | } 36 | bool isDouble() const { 37 | return (int64_t)ival <= (int64_t)JSON_VALUE_NAN_MASK; 38 | } 39 | JsonTag getTag() const { 40 | return isDouble() ? 
JSON_NUMBER : JsonTag((ival >> JSON_VALUE_TAG_SHIFT) & JSON_VALUE_TAG_MASK); 41 | } 42 | uint64_t getPayload() const { 43 | assert(!isDouble()); 44 | return ival & JSON_VALUE_PAYLOAD_MASK; 45 | } 46 | double toNumber() const { 47 | assert(getTag() == JSON_NUMBER); 48 | return fval; 49 | } 50 | char *toString() const { 51 | assert(getTag() == JSON_STRING); 52 | return (char *)getPayload(); 53 | } 54 | JsonNode *toNode() const { 55 | assert(getTag() == JSON_ARRAY || getTag() == JSON_OBJECT); 56 | return (JsonNode *)getPayload(); 57 | } 58 | }; 59 | 60 | struct JsonNode { 61 | JsonValue value; 62 | JsonNode *next; 63 | char *key; 64 | }; 65 | 66 | struct JsonIterator { 67 | JsonNode *p; 68 | 69 | void operator++() { 70 | p = p->next; 71 | } 72 | bool operator!=(const JsonIterator &x) const { 73 | return p != x.p; 74 | } 75 | JsonNode *operator*() const { 76 | return p; 77 | } 78 | JsonNode *operator->() const { 79 | return p; 80 | } 81 | }; 82 | 83 | inline JsonIterator begin(JsonValue o) { 84 | return JsonIterator{o.toNode()}; 85 | } 86 | inline JsonIterator end(JsonValue) { 87 | return JsonIterator{nullptr}; 88 | } 89 | 90 | #define JSON_ERRNO_MAP(XX) \ 91 | XX(OK, "ok") \ 92 | XX(BAD_NUMBER, "bad number") \ 93 | XX(BAD_STRING, "bad string") \ 94 | XX(BAD_IDENTIFIER, "bad identifier") \ 95 | XX(STACK_OVERFLOW, "stack overflow") \ 96 | XX(STACK_UNDERFLOW, "stack underflow") \ 97 | XX(MISMATCH_BRACKET, "mismatch bracket") \ 98 | XX(UNEXPECTED_CHARACTER, "unexpected character") \ 99 | XX(UNQUOTED_KEY, "unquoted key") \ 100 | XX(BREAKING_BAD, "breaking bad") \ 101 | XX(ALLOCATION_FAILURE, "allocation failure") 102 | 103 | enum JsonErrno { 104 | #define XX(no, str) JSON_##no, 105 | JSON_ERRNO_MAP(XX) 106 | #undef XX 107 | }; 108 | 109 | const char *jsonStrError(int err); 110 | 111 | class JsonAllocator { 112 | struct Zone { 113 | Zone *next; 114 | size_t used; 115 | } *head = nullptr; 116 | 117 | public: 118 | JsonAllocator() = default; 119 | JsonAllocator(const JsonAllocator &) = delete; 120 | JsonAllocator &operator=(const JsonAllocator &) = delete; 121 | JsonAllocator(JsonAllocator &&x) : head(x.head) { 122 | x.head = nullptr; 123 | } 124 | JsonAllocator &operator=(JsonAllocator &&x) { 125 | head = x.head; 126 | x.head = nullptr; 127 | return *this; 128 | } 129 | ~JsonAllocator() { 130 | deallocate(); 131 | } 132 | void *allocate(size_t size); 133 | void deallocate(); 134 | }; 135 | 136 | int jsonParse(char *str, char **endptr, JsonValue *value, JsonAllocator &allocator); 137 | -------------------------------------------------------------------------------- /eval_city/cocoapi/common/maskApi.c: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #include "maskApi.h" 8 | #include 9 | #include 10 | 11 | uint umin( uint a, uint b ) { return (ab) ? 
a : b; } 13 | 14 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) { 15 | R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m); 16 | siz j; if(cnts) for(j=0; jcnts[j]=cnts[j]; 17 | } 18 | 19 | void rleFree( RLE *R ) { 20 | free(R->cnts); R->cnts=0; 21 | } 22 | 23 | void rlesInit( RLE **R, siz n ) { 24 | siz i; *R = (RLE*) malloc(sizeof(RLE)*n); 25 | for(i=0; i0 ) { 61 | c=umin(ca,cb); cc+=c; ct=0; 62 | ca-=c; if(!ca && a0) { 83 | crowd=iscrowd!=NULL && iscrowd[g]; 84 | if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; } 85 | siz ka, kb, a, b; uint c, ca, cb, ct, i, u; int va, vb; 86 | ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0; 87 | cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1; 88 | while( ct>0 ) { 89 | c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0; 90 | ca-=c; if(!ca && athr) keep[j]=0; 105 | } 106 | } 107 | } 108 | 109 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) { 110 | double h, w, i, u, ga, da; siz g, d; int crowd; 111 | for( g=0; gthr) keep[j]=0; 129 | } 130 | } 131 | } 132 | 133 | void rleToBbox( const RLE *R, BB bb, siz n ) { 134 | siz i; for( i=0; id?1:c=dy && xs>xe) || (dxye); 173 | if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; } 174 | s = dx>=dy ? (double)(ye-ys)/dx : (double)(xe-xs)/dy; 175 | if(dx>=dy) for( d=0; d<=dx; d++ ) { 176 | t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++; 177 | } else for( d=0; d<=dy; d++ ) { 178 | t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++; 179 | } 180 | } 181 | /* get points along y-boundary and downsample */ 182 | free(x); free(y); k=m; m=0; double xd, yd; 183 | x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k); 184 | for( j=1; jw-1 ) continue; 187 | yd=(double)(v[j]h) yd=h; yd=ceil(yd); 189 | x[m]=(int) xd; y[m]=(int) yd; m++; 190 | } 191 | /* compute rle encoding given y-boundary points */ 192 | k=m; a=malloc(sizeof(uint)*(k+1)); 193 | for( j=0; j0) b[m++]=a[j++]; else { 199 | j++; if(jm, p=0; long x; int more; 206 | char *s=malloc(sizeof(char)*m*6); 207 | for( i=0; icnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1; 209 | while( more ) { 210 | char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0; 211 | if(more) c |= 0x20; c+=48; s[p++]=c; 212 | } 213 | } 214 | s[p]=0; return s; 215 | } 216 | 217 | void rleFrString( RLE *R, char *s, siz h, siz w ) { 218 | siz m=0, p=0, k; long x; int more; uint *cnts; 219 | while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0; 220 | while( s[p] ) { 221 | x=0; k=0; more=1; 222 | while( more ) { 223 | char c=s[p]-48; x |= (c & 0x1f) << 5*k; 224 | more = c & 0x20; p++; k++; 225 | if(!more && (c & 0x10)) x |= -1 << 5*k; 226 | } 227 | if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x; 228 | } 229 | rleInit(R,h,w,m,cnts); free(cnts); 230 | } 231 | -------------------------------------------------------------------------------- /eval_city/cocoapi/common/maskApi.h: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 
5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #pragma once 8 | 9 | typedef unsigned int uint; 10 | typedef unsigned long siz; 11 | typedef unsigned char byte; 12 | typedef double* BB; 13 | typedef struct { siz h, w, m; uint *cnts; } RLE; 14 | 15 | /* Initialize/destroy RLE. */ 16 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ); 17 | void rleFree( RLE *R ); 18 | 19 | /* Initialize/destroy RLE array. */ 20 | void rlesInit( RLE **R, siz n ); 21 | void rlesFree( RLE **R, siz n ); 22 | 23 | /* Encode binary masks using RLE. */ 24 | void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n ); 25 | 26 | /* Decode binary masks encoded via RLE. */ 27 | void rleDecode( const RLE *R, byte *mask, siz n ); 28 | 29 | /* Compute union or intersection of encoded masks. */ 30 | void rleMerge( const RLE *R, RLE *M, siz n, int intersect ); 31 | 32 | /* Compute area of encoded masks. */ 33 | void rleArea( const RLE *R, siz n, uint *a ); 34 | 35 | /* Compute intersection over union between masks. */ 36 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ); 37 | 38 | /* Compute non-maximum suppression between bounding masks */ 39 | void rleNms( RLE *dt, siz n, uint *keep, double thr ); 40 | 41 | /* Compute intersection over union between bounding boxes. */ 42 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ); 43 | 44 | /* Compute non-maximum suppression between bounding boxes */ 45 | void bbNms( BB dt, siz n, uint *keep, double thr ); 46 | 47 | /* Get bounding boxes surrounding encoded masks. */ 48 | void rleToBbox( const RLE *R, BB bb, siz n ); 49 | 50 | /* Convert bounding boxes to encoded masks. */ 51 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ); 52 | 53 | /* Convert polygon to encoded mask. */ 54 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ); 55 | 56 | /* Get compressed string representation of encoded mask. */ 57 | char* rleToString( const RLE *R ); 58 | 59 | /* Convert from compressed string representation of encoded mask. */ 60 | void rleFrString( RLE *R, char *s, siz h, siz w ); 61 | -------------------------------------------------------------------------------- /eval_city/cocoapi/license.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2014, Piotr Dollar and Tsung-Yi Lin 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | 1. Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 2. Redistributions in binary form must reproduce the above copyright notice, 10 | this list of conditions and the following disclaimer in the documentation 11 | and/or other materials provided with the distribution. 12 | 13 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 14 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 15 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 16 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 17 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 18 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 19 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 20 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 21 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 22 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 23 | 24 | The views and conclusions contained in the software and documentation are those 25 | of the authors and should not be interpreted as representing official policies, 26 | either expressed or implied, of the FreeBSD Project. 27 | -------------------------------------------------------------------------------- /eval_city/dt_txt2json.m: -------------------------------------------------------------------------------- 1 | clear 2 | addpath('./cocoapi/MatlabAPI') 3 | main_path = '../output/valresults/city/h/off'; 4 | subdir = dir(main_path); 5 | for j = 3 : length(subdir) 6 | ndt=0; 7 | dt_coco = struct(); 8 | dt_path = fullfile(main_path, subdir(j).name); 9 | % res = load([dt_path,'/val_500_det.txt'],'%f'); 10 | res = load([dt_path,'/val_det.txt'],'%f'); 11 | num_imgs = max(res(:,1)); 12 | out = [dt_path,'/val_dt.json']; 13 | if exist(out,'file') 14 | continue 15 | end 16 | for i = 1:num_imgs 17 | bbs = res(res(:,1)==i,:); 18 | for ibb=1:size(bbs,1) 19 | ndt=ndt+1; 20 | bb=bbs(ibb,2:6); 21 | dt_coco(ndt).image_id=i; 22 | dt_coco(ndt).category_id=1; 23 | dt_coco(ndt).bbox=bb(1:4); 24 | dt_coco(ndt).score=bb(5); 25 | end 26 | end 27 | dt_string = gason(dt_coco); 28 | fp = fopen(out,'w'); 29 | fprintf(fp,'%s',dt_string); 30 | fclose(fp); 31 | end -------------------------------------------------------------------------------- /eval_city/eval_script/eval_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | from coco import COCO 3 | from eval_MR_multisetup import COCOeval 4 | 5 | annType = 'bbox' #specify type here 6 | 7 | #initialize COCO ground truth api 8 | annFile = '../val_gt.json' 9 | main_path = '../../output/valresults/city/h/off' 10 | for f in sorted(os.listdir(main_path)): 11 | print(f) 12 | # initialize COCO detections api 13 | dt_path = os.path.join(main_path, f) 14 | resFile = os.path.join(dt_path,'val_dt.json') 15 | respath = os.path.join(dt_path,'results.txt') 16 | # if os.path.exists(respath): 17 | # continue 18 | ## running evaluation 19 | res_file = open(respath, "w") 20 | for id_setup in range(0,1): 21 | cocoGt = COCO(annFile) 22 | cocoDt = cocoGt.loadRes(resFile) 23 | imgIds = sorted(cocoGt.getImgIds()) 24 | cocoEval = COCOeval(cocoGt,cocoDt,annType) 25 | cocoEval.params.imgIds = imgIds 26 | cocoEval.evaluate(id_setup) 27 | cocoEval.accumulate() 28 | cocoEval.summarize(id_setup,res_file) 29 | 30 | res_file.close() -------------------------------------------------------------------------------- /eval_city/eval_script/readme.txt: -------------------------------------------------------------------------------- 1 | ################################ 2 | This script is created to produce miss rate numbers by making minor changes to the COCO python evaluation script [1]. It is a python re-implementation of the Caltech evaluation code, which is written in matlab. This python script produces exactly the same numbers as the matlab code. 
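For orientation, here is a minimal sketch (not part of this repo; the authoritative logic lives in eval_MR_multisetup.py) of how the log-average miss rate is usually computed, assuming the curve points are sorted by increasing FPPI and the standard 9 reference points log-spaced over [1e-2, 1e0]:

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    # Hypothetical helper: sample the miss-rate curve at 9 FPPI values
    # evenly spaced in log-space over [1e-2, 1e0], then take the
    # geometric mean of the sampled miss rates (the reported "MR").
    fppi, miss_rate = np.asarray(fppi), np.asarray(miss_rate)
    refs = np.logspace(-2.0, 0.0, num=9)
    samples = []
    for ref in refs:
        idx = np.where(fppi <= ref)[0]
        # No curve point at or below this FPPI: fall back to the worst miss rate.
        samples.append(miss_rate[idx[-1]] if idx.size else miss_rate.max())
    return np.exp(np.log(np.maximum(samples, 1e-10)).mean())
```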
3 | [1] https://github.com/pdollar/coco/tree/master/PythonAPI 4 | [2] http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/code/code3.2.1.zip 5 | ################################# 6 | Usage 7 | 1. Prepare detection results in COCO format, and write them in a single .json file. 8 | 2. Run eval_demo.py. 9 | 3. Detailed evaluations will be written to results.txt. 10 | ################################# 11 | 12 | -------------------------------------------------------------------------------- /eval_city/readme.txt: -------------------------------------------------------------------------------- 1 | Submitting results 2 | Please write the detection results for all test images in a single .json file (see dt_mat2json.m), and then send to shanshan.zhang@njust.edu.cn. 3 | You'll receive the evaulation results via e-mail, and then you decide whether to publish it or not. 4 | If you would like to publish your results on the dashboard, please also specify a name of your method. 5 | 6 | Metrics 7 | We use the same protocol as in [1] for evaluation. As a numerical measure of the performance, log-average miss rate (MR) is computed by averaging over the precision range of [10e-2; 10e0] FPPI (false positives per image). 8 | For detailed evaluation, we consider the following 4 subsets: 9 | 1. 'Reasonable': height [50, inf]; visibility [0.65, inf] 10 | 2. 'Reasonable_small': height [50, 75]; visibility [0.65, inf] 11 | 3. 'Reasonable_occ=heavy': height [50, inf]; visibility [0.2, 0.65] 12 | 4. 'All': height [20, inf]; visibility [0.2, inf] 13 | 14 | 15 | Reference 16 | [1] P. Dollar, C. Wojek, B. Schiele and P. Perona. Pedestrian Detection: An Evaluation of the State of the Art. TPAMI, 2012. 17 | -------------------------------------------------------------------------------- /generate_cache_caltech.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pickle 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | root_dir = 'data/caltech/train_3' 7 | # root_dir = 'data/caltech/test' 8 | all_img_path = os.path.join(root_dir, 'images') 9 | all_anno_path = os.path.join(root_dir, 'annotations_new/') 10 | res_path_gt = 'data/cache/caltech/train_gt' 11 | res_path_nogt = 'data/cache/caltech/train_nogt' 12 | 13 | rows, cols = 480, 640 14 | image_data_gt, image_data_nogt = [], [] 15 | 16 | valid_count = 0 17 | iggt_count = 0 18 | box_count = 0 19 | files = sorted(os.listdir(all_anno_path)) 20 | for l in range(len(files)): 21 | gtname = files[l] 22 | imgname = files[l].split('.')[0] + '.jpg' 23 | img_path = os.path.join(all_img_path, imgname) 24 | gt_path = os.path.join(all_anno_path, gtname) 25 | 26 | boxes = [] 27 | ig_boxes = [] 28 | with open(gt_path, 'rb') as fid: 29 | lines = fid.readlines() 30 | if len(lines) > 1: 31 | for i in range(1, len(lines)): 32 | info = lines[i].strip().split(' ') 33 | label = info[0] 34 | occ, ignore = info[5], info[10] 35 | x1, y1 = max(int(float(info[1])), 0), max(int(float(info[2])), 0) 36 | w, h = min(int(float(info[3])), cols - x1 - 1), min(int(float(info[4])), rows - y1 - 1) 37 | box = np.array([int(x1), int(y1), int(x1) + int(w), int(y1) + int(h)]) 38 | if int(ignore) == 0: 39 | boxes.append(box) 40 | else: 41 | ig_boxes.append(box) 42 | boxes = np.array(boxes) 43 | ig_boxes = np.array(ig_boxes) 44 | 45 | annotation = {} 46 | annotation['filepath'] = img_path 47 | box_count += len(boxes) 48 | iggt_count += len(ig_boxes) 49 | annotation['bboxes'] = boxes 50 | annotation['ignoreareas'] = ig_boxes 51 | if 
len(boxes) == 0: 52 | image_data_nogt.append(annotation) 53 | else: 54 | image_data_gt.append(annotation) 55 | valid_count += 1 56 | print('{} images and {} valid images, {} valid gt and {} ignored gt'.format(len(files), valid_count, box_count, 57 | iggt_count)) 58 | 59 | if not os.path.exists(res_path_gt): 60 | with open(res_path_gt, 'wb') as fid: 61 | pickle.dump(image_data_gt, fid, pickle.HIGHEST_PROTOCOL) 62 | if not os.path.exists(res_path_nogt): 63 | with open(res_path_nogt, 'wb') as fid: 64 | pickle.dump(image_data_nogt, fid, pickle.HIGHEST_PROTOCOL) 65 | -------------------------------------------------------------------------------- /generate_cache_city.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import pickle 4 | import numpy as np 5 | from scipy import io as scio 6 | import time 7 | import matplotlib.pyplot as plt 8 | import re 9 | 10 | root_dir = 'data/cityperson' 11 | all_img_path = os.path.join(root_dir, 'images') 12 | all_anno_path = os.path.join(root_dir, 'annotations') 13 | types = ['train'] 14 | rows, cols = 1024, 2048 15 | 16 | for type in types: 17 | anno_path = os.path.join(all_anno_path, 'anno_' + type + '.mat') 18 | res_path = os.path.join('data/cache/cityperson', type) 19 | image_data = [] 20 | annos = scio.loadmat(anno_path) 21 | index = 'anno_' + type + '_aligned' 22 | valid_count = 0 23 | iggt_count = 0 24 | box_count = 0 25 | for l in range(len(annos[index][0])): 26 | anno = annos[index][0][l] 27 | cityname = anno[0][0][0][0].encode() 28 | imgname = anno[0][0][1][0].encode() 29 | gts = anno[0][0][2] 30 | img_path = os.path.join(all_img_path, type + '/' + cityname + '/' + imgname) 31 | boxes = [] 32 | ig_boxes = [] 33 | vis_boxes = [] 34 | for i in range(len(gts)): 35 | label, x1, y1, w, h = gts[i, :5] 36 | x1, y1 = max(int(x1), 0), max(int(y1), 0) 37 | w, h = min(int(w), cols - x1 - 1), min(int(h), rows - y1 - 1) 38 | xv1, yv1, wv, hv = gts[i, 6:] 39 | xv1, yv1 = max(int(xv1), 0), max(int(yv1), 0) 40 | wv, hv = min(int(wv), cols - xv1 - 1), min(int(hv), rows - yv1 - 1) 41 | 42 | if label == 1 and h >= 50: 43 | box = np.array([int(x1), int(y1), int(x1) + int(w), int(y1) + int(h)]) 44 | boxes.append(box) 45 | vis_box = np.array([int(xv1), int(yv1), int(xv1) + int(wv), int(yv1) + int(hv)]) 46 | vis_boxes.append(vis_box) 47 | else: 48 | ig_box = np.array([int(x1), int(y1), int(x1) + int(w), int(y1) + int(h)]) 49 | ig_boxes.append(ig_box) 50 | boxes = np.array(boxes) 51 | vis_boxes = np.array(vis_boxes) 52 | ig_boxes = np.array(ig_boxes) 53 | 54 | if len(boxes) > 0: 55 | valid_count += 1 56 | annotation = {} 57 | annotation['filepath'] = img_path 58 | box_count += len(boxes) 59 | iggt_count += len(ig_boxes) 60 | annotation['bboxes'] = boxes 61 | annotation['vis_bboxes'] = vis_boxes 62 | annotation['ignoreareas'] = ig_boxes 63 | image_data.append(annotation) 64 | if not os.path.exists(res_path): 65 | with open(res_path, 'wb') as fid: 66 | pickle.dump(image_data, fid, pickle.HIGHEST_PROTOCOL) 67 | print('{} has {} images and {} valid images, {} valid gt and {} ignored gt'.format(type, len(annos[index][0]), 68 | valid_count, box_count, 69 | iggt_count)) 70 | -------------------------------------------------------------------------------- /generate_cache_wider.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import pickle 4 | import numpy as np 5 | import matplotlib.pyplot as plt 6 | 7 | root_dir = 'data/WiderFace/' 8 | img_path = 
os.path.join(root_dir, 'WIDER_train/images') 9 | anno_path = os.path.join(root_dir, 'wider_face_split', 'wider_face_train_bbx_gt.txt') 10 | # anno_path = os.path.join(root_dir, 'wider_face_split','wider_face_test_filelist.txt') 11 | 12 | res_path = 'data/cache/train' 13 | 14 | image_data = [] 15 | valid_count = 0 16 | img_count = 0 17 | box_count = 0 18 | with open(anno_path, 'rb') as fid: 19 | lines = fid.readlines() 20 | num_lines = len(lines) 21 | 22 | index = 0 23 | while index < num_lines: 24 | filename = lines[index].strip() 25 | img_count += 1 26 | if img_count % 1000 == 0: 27 | print(img_count) 28 | num_obj = int(lines[index + 1]) 29 | filepath = os.path.join(img_path, filename) 30 | img = cv2.imread(filepath) 31 | img_height, img_width = img.shape[:2] 32 | boxes = [] 33 | if num_obj > 0: 34 | for i in range(num_obj): 35 | info = lines[index + 2 + i].strip().split(' ') 36 | x1, y1 = max(int(info[0]), 0), max(int(info[1]), 0) 37 | w, h = min(int(info[2]), img_width - x1 - 1), min(int(info[3]), img_height - y1 - 1) 38 | if w >= 5 and h >= 5: 39 | box = np.array([x1, y1, x1 + w, y1 + h]) 40 | boxes.append(box) 41 | boxes = np.array(boxes) 42 | box_count += len(boxes) 43 | if len(boxes) > 0: 44 | valid_count += 1 45 | annotation = {} 46 | annotation['filepath'] = filepath 47 | annotation['bboxes'] = boxes 48 | image_data.append(annotation) 49 | index += (2 + num_obj) 50 | 51 | print('{} images and {} valid images and {} boxes'.format(img_count, valid_count, box_count)) 52 | with open(res_path, 'wb') as fid: 53 | pickle.dump(image_data, fid, pickle.HIGHEST_PROTOCOL) 54 | -------------------------------------------------------------------------------- /keras_csp/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/keras_csp/__init__.py -------------------------------------------------------------------------------- /keras_csp/bbox_process.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from keras_csp.nms_wrapper import nms 3 | 4 | 5 | def parse_det(Y, C, score=0.1, down=4, scale='h'): 6 | seman = Y[0][0, :, :, 0] 7 | if scale == 'h': 8 | height = np.exp(Y[1][0, :, :, 0]) * down 9 | width = 0.41 * height 10 | elif scale == 'w': 11 | width = np.exp(Y[1][0, :, :, 0]) * down 12 | height = width / 0.41 13 | elif scale == 'hw': 14 | height = np.exp(Y[1][0, :, :, 0]) * down 15 | width = np.exp(Y[1][0, :, :, 1]) * down 16 | y_c, x_c = np.where(seman > score) 17 | boxs = [] 18 | if len(y_c) > 0: 19 | for i in range(len(y_c)): 20 | h = height[y_c[i], x_c[i]] 21 | w = width[y_c[i], x_c[i]] 22 | s = seman[y_c[i], x_c[i]] 23 | x1, y1 = max(0, (x_c[i] + 0.5) * down - w / 2), max(0, (y_c[i] + 0.5) * down - h / 2) 24 | boxs.append([x1, y1, min(x1 + w, C.size_test[1]), min(y1 + h, C.size_test[0]), s]) 25 | boxs = np.asarray(boxs, dtype=np.float32) 26 | keep = nms(boxs, 0.5, usegpu=False, gpu_id=0) 27 | boxs = boxs[keep, :] 28 | return boxs 29 | 30 | 31 | def parse_det_top(Y, C, score=0.1): 32 | seman = Y[0][0, :, :, 0] 33 | height = Y[1][0, :, :, 0] 34 | y_c, x_c = np.where(seman > score) 35 | boxs = [] 36 | if len(y_c) > 0: 37 | for i in range(len(y_c)): 38 | h = np.exp(height[y_c[i], x_c[i]]) * 4 39 | w = 0.41 * h 40 | s = seman[y_c[i], x_c[i]] 41 | x1, y1 = max(0, x_c[i] * 4 + 2 - w / 2), max(0, y_c[i] * 4 + 2) 42 | boxs.append([x1, y1, min(x1 + w, C.size_test[1]), min(y1 + h, C.size_test[0]), s]) 43 | 
boxs = np.asarray(boxs, dtype=np.float32) 44 | keep = nms(boxs, 0.5, usegpu=False, gpu_id=0) 45 | boxs = boxs[keep, :] 46 | return boxs 47 | 48 | 49 | def parse_det_bottom(Y, C, score=0.1): 50 | seman = Y[0][0, :, :, 0] 51 | height = Y[1][0, :, :, 0] 52 | y_c, x_c = np.where(seman > score) 53 | boxs = [] 54 | if len(y_c) > 0: 55 | for i in range(len(y_c)): 56 | h = np.exp(height[y_c[i], x_c[i]]) * 4 57 | w = 0.41 * h 58 | s = seman[y_c[i], x_c[i]] 59 | x1, y1 = max(0, x_c[i] * 4 + 2 - w / 2), max(0, y_c[i] * 4 + 2 - h) 60 | boxs.append([x1, y1, min(x1 + w, C.size_test[1]), min(y1 + h, C.size_test[0]), s]) 61 | boxs = np.asarray(boxs, dtype=np.float32) 62 | keep = nms(boxs, 0.5, usegpu=False, gpu_id=0) 63 | boxs = boxs[keep, :] 64 | return boxs 65 | 66 | 67 | def parse_det_offset(Y, C, score=0.1, down=4): 68 | seman = Y[0][0, :, :, 0] 69 | height = Y[1][0, :, :, 0] 70 | offset_y = Y[2][0, :, :, 0] 71 | offset_x = Y[2][0, :, :, 1] 72 | y_c, x_c = np.where(seman > score) 73 | boxs = [] 74 | if len(y_c) > 0: 75 | for i in range(len(y_c)): 76 | h = np.exp(height[y_c[i], x_c[i]]) * down 77 | w = 0.41 * h 78 | o_y = offset_y[y_c[i], x_c[i]] 79 | o_x = offset_x[y_c[i], x_c[i]] 80 | s = seman[y_c[i], x_c[i]] 81 | x1, y1 = max(0, (x_c[i] + o_x + 0.5) * down - w / 2), max(0, (y_c[i] + o_y + 0.5) * down - h / 2) 82 | boxs.append([x1, y1, min(x1 + w, C.size_test[1]), min(y1 + h, C.size_test[0]), s]) 83 | boxs = np.asarray(boxs, dtype=np.float32) 84 | keep = nms(boxs, 0.5, usegpu=False, gpu_id=0) 85 | boxs = boxs[keep, :] 86 | return boxs 87 | 88 | 89 | def parse_wider_offset(Y, C, score=0.1, down=4, nmsthre=0.5): 90 | seman = Y[0][0, :, :, 0] 91 | height = Y[1][0, :, :, 0] 92 | width = Y[1][0, :, :, 1] 93 | offset_y = Y[2][0, :, :, 0] 94 | offset_x = Y[2][0, :, :, 1] 95 | y_c, x_c = np.where(seman > score) 96 | boxs = [] 97 | if len(y_c) > 0: 98 | for i in range(len(y_c)): 99 | h = np.exp(height[y_c[i], x_c[i]]) * down 100 | w = np.exp(width[y_c[i], x_c[i]]) * down 101 | o_y = offset_y[y_c[i], x_c[i]] 102 | o_x = offset_x[y_c[i], x_c[i]] 103 | s = seman[y_c[i], x_c[i]] 104 | x1, y1 = max(0, (x_c[i] + o_x + 0.5) * down - w / 2), max(0, (y_c[i] + o_y + 0.5) * down - h / 2) 105 | x1, y1 = min(x1, C.size_test[1]), min(y1, C.size_test[0]) 106 | boxs.append([x1, y1, min(x1 + w, C.size_test[1]), min(y1 + h, C.size_test[0]), s]) 107 | boxs = np.asarray(boxs, dtype=np.float32) 108 | # keep = nms(boxs, nmsthre, usegpu=False, gpu_id=0) 109 | # boxs = boxs[keep, :] 110 | boxs = soft_bbox_vote(boxs, thre=nmsthre) 111 | return boxs 112 | 113 | 114 | def soft_bbox_vote(det, thre=0.35, score=0.05): 115 | if det.shape[0] <= 1: 116 | return det 117 | order = det[:, 4].ravel().argsort()[::-1] 118 | det = det[order, :] 119 | dets = [] 120 | while det.shape[0] > 0: 121 | # IOU 122 | area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1) 123 | xx1 = np.maximum(det[0, 0], det[:, 0]) 124 | yy1 = np.maximum(det[0, 1], det[:, 1]) 125 | xx2 = np.minimum(det[0, 2], det[:, 2]) 126 | yy2 = np.minimum(det[0, 3], det[:, 3]) 127 | w = np.maximum(0.0, xx2 - xx1 + 1) 128 | h = np.maximum(0.0, yy2 - yy1 + 1) 129 | inter = w * h 130 | o = inter / (area[0] + area[:] - inter) 131 | 132 | # get needed merge det and delete these det 133 | merge_index = np.where(o >= thre)[0] 134 | det_accu = det[merge_index, :] 135 | det_accu_iou = o[merge_index] 136 | det = np.delete(det, merge_index, 0) 137 | 138 | if merge_index.shape[0] <= 1: 139 | try: 140 | dets = np.row_stack((dets, det_accu)) 141 | except: 142 | dets = det_accu 143 | 
continue 144 | else: 145 | soft_det_accu = det_accu.copy() 146 | soft_det_accu[:, 4] = soft_det_accu[:, 4] * (1 - det_accu_iou) 147 | soft_index = np.where(soft_det_accu[:, 4] >= score)[0] 148 | soft_det_accu = soft_det_accu[soft_index, :] 149 | 150 | det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4)) 151 | max_score = np.max(det_accu[:, 4]) 152 | det_accu_sum = np.zeros((1, 5)) 153 | det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4], axis=0) / np.sum(det_accu[:, -1:]) 154 | det_accu_sum[:, 4] = max_score 155 | 156 | if soft_det_accu.shape[0] > 0: 157 | det_accu_sum = np.row_stack((soft_det_accu, det_accu_sum)) 158 | 159 | try: 160 | dets = np.row_stack((dets, det_accu_sum)) 161 | except: 162 | dets = det_accu_sum 163 | 164 | order = dets[:, 4].ravel().argsort()[::-1] 165 | dets = dets[order, :] 166 | return dets 167 | 168 | 169 | def bbox_vote(det, thre): 170 | if det.shape[0] <= 1: 171 | return det 172 | order = det[:, 4].ravel().argsort()[::-1] 173 | det = det[order, :] 174 | # det = det[np.where(det[:, 4] > 0.2)[0], :] 175 | dets = [] 176 | while det.shape[0] > 0: 177 | # IOU 178 | area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1) 179 | xx1 = np.maximum(det[0, 0], det[:, 0]) 180 | yy1 = np.maximum(det[0, 1], det[:, 1]) 181 | xx2 = np.minimum(det[0, 2], det[:, 2]) 182 | yy2 = np.minimum(det[0, 3], det[:, 3]) 183 | w = np.maximum(0.0, xx2 - xx1 + 1) 184 | h = np.maximum(0.0, yy2 - yy1 + 1) 185 | inter = w * h 186 | o = inter / (area[0] + area[:] - inter) 187 | 188 | # get needed merge det and delete these det 189 | merge_index = np.where(o >= thre)[0] 190 | det_accu = det[merge_index, :] 191 | det = np.delete(det, merge_index, 0) 192 | 193 | if merge_index.shape[0] <= 1: 194 | try: 195 | dets = np.row_stack((dets, det_accu)) 196 | except: 197 | dets = det_accu 198 | continue 199 | else: 200 | det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4)) 201 | max_score = np.max(det_accu[:, 4]) 202 | det_accu_sum = np.zeros((1, 5)) 203 | det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4], axis=0) / np.sum(det_accu[:, -1:]) 204 | det_accu_sum[:, 4] = max_score 205 | try: 206 | dets = np.row_stack((dets, det_accu_sum)) 207 | except: 208 | dets = det_accu_sum 209 | 210 | return dets 211 | -------------------------------------------------------------------------------- /keras_csp/bbox_transform.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | 11 | def bbox_transform(ex_rois, gt_rois): 12 | ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0 13 | ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0 14 | ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths 15 | ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights 16 | 17 | gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0 18 | gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0 19 | gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths 20 | gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights 21 | 22 | targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths 23 | targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights 24 | targets_dw = np.log(gt_widths / ex_widths) 25 | targets_dh = np.log(gt_heights / ex_heights) 26 | 27 | targets = np.vstack( 28 | (targets_dx, targets_dy, targets_dw, targets_dh)).transpose() 29 | return targets 30 | 31 
| 32 | def bbox_transform_inv(boxes, deltas): 33 | if boxes.shape[0] == 0: 34 | return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype) 35 | 36 | boxes = boxes.astype(deltas.dtype, copy=False) 37 | 38 | widths = boxes[:, 2] - boxes[:, 0] + 1.0 39 | heights = boxes[:, 3] - boxes[:, 1] + 1.0 40 | ctr_x = boxes[:, 0] + 0.5 * widths 41 | ctr_y = boxes[:, 1] + 0.5 * heights 42 | 43 | dx = deltas[:, 0::4] 44 | dy = deltas[:, 1::4] 45 | dw = deltas[:, 2::4] 46 | dh = deltas[:, 3::4] 47 | 48 | pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis] 49 | pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis] 50 | pred_w = np.exp(dw) * widths[:, np.newaxis] 51 | pred_h = np.exp(dh) * heights[:, np.newaxis] 52 | 53 | pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype) 54 | # x1 55 | pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w 56 | # y1 57 | pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h 58 | # x2 59 | pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w 60 | # y2 61 | pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h 62 | 63 | return pred_boxes 64 | 65 | 66 | def compute_targets(ex_rois, gt_rois, classifier_regr_std, std): 67 | """Compute bounding-box regression targets for an image.""" 68 | 69 | assert ex_rois.shape[0] == gt_rois.shape[0] 70 | assert ex_rois.shape[1] == 4 71 | assert gt_rois.shape[1] == 4 72 | 73 | targets = bbox_transform(ex_rois, gt_rois) 74 | # Optionally normalize targets by a precomputed mean and stdev 75 | if std: 76 | targets = targets / np.array(classifier_regr_std) 77 | return targets 78 | 79 | 80 | def clip_boxes(boxes, im_shape): 81 | # boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0) 82 | # # y1 >= 0 83 | # boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0) 84 | # # x2 < im_shape[1] 85 | # boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0) 86 | # # y2 < im_shape[0] 87 | # boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0) 88 | boxes[:, 0] = np.maximum(np.minimum(boxes[:, 0], im_shape[1] - 1), 0) 89 | # y1 >= 0 90 | boxes[:, 1] = np.maximum(np.minimum(boxes[:, 1], im_shape[0] - 1), 0) 91 | # x2 < im_shape[1] 92 | boxes[:, 2] = np.maximum(np.minimum(boxes[:, 2], im_shape[1] - 1), 0) 93 | # y2 < im_shape[0] 94 | boxes[:, 3] = np.maximum(np.minimum(boxes[:, 3], im_shape[0] - 1), 0) 95 | return boxes 96 | -------------------------------------------------------------------------------- /keras_csp/config.py: -------------------------------------------------------------------------------- 1 | class Config(object): 2 | def __init__(self): 3 | self.gpu_ids = '0' 4 | self.onegpu = 2 5 | self.num_epochs = 150 6 | self.add_epoch = 0 7 | self.iter_per_epoch = 2000 8 | self.init_lr = 1e-4 9 | self.alpha = 0.999 10 | 11 | # setting for network architechture 12 | self.network = 'resnet50' # or 'mobilenet' 13 | self.point = 'center' # or 'top', 'bottom 14 | self.scale = 'h' # or 'w', 'hw' 15 | self.num_scale = 1 # 1 for height (or width) prediction, 2 for height+width prediction 16 | self.offset = False # append offset prediction or not 17 | self.down = 4 # downsampling rate of the feature map for detection 18 | self.radius = 2 # surrounding areas of positives for the scale map 19 | 20 | # setting for data augmentation 21 | self.use_horizontal_flips = True 22 | self.brightness = (0.5, 2) 23 | self.size_train = (336, 448) 24 | 25 | # image channel-wise mean to subtract, the order is BGR 26 | self.img_channel_mean = [103.939, 116.779, 123.68] 27 | 
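For orientation, the train/test scripts (e.g., train_city.py) typically instantiate this Config and override fields before building the model; a hypothetical minimal sketch follows (the values are illustrative, not the repo's defaults):

```python
from keras_csp.config import Config

C = Config()
C.gpu_ids = '0,1'            # train on two GPUs
C.onegpu = 2                 # images per GPU
C.size_train = (640, 1280)   # (height, width) of training crops
C.offset = True              # also learn the center-offset branch
C.scale = 'h'                # predict height only; width follows as 0.41 * h

# Effective batch size is images-per-GPU times the number of GPUs.
batch_size = C.onegpu * len(C.gpu_ids.split(','))
```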
-------------------------------------------------------------------------------- /keras_csp/data_augment.py: -------------------------------------------------------------------------------- 1 | 2 | import cv2 3 | import numpy as np 4 | import copy 5 | 6 | def _brightness(image, min=0.5, max=2.0): 7 | ''' 8 | Randomly change the brightness of the input image. 9 | 10 | Protected against overflow. 11 | ''' 12 | hsv = cv2.cvtColor(image,cv2.COLOR_RGB2HSV) 13 | 14 | random_br = np.random.uniform(min,max) 15 | 16 | #To protect against overflow: Calculate a mask for all pixels 17 | #where adjustment of the brightness would exceed the maximum 18 | #brightness value and set the value to the maximum at those pixels. 19 | mask = hsv[:,:,2] * random_br > 255 20 | v_channel = np.where(mask, 255, hsv[:,:,2] * random_br) 21 | hsv[:,:,2] = v_channel 22 | 23 | return cv2.cvtColor(hsv,cv2.COLOR_HSV2RGB) 24 | 25 | def resize_image(image, gts,igs, scale=[0.4,1.5]): 26 | height, width = image.shape[0:2] 27 | ratio = np.random.uniform(scale[0], scale[1]) 28 | # if len(gts)>0 and np.max(gts[:,3]-gts[:,1])>300: 29 | # ratio = np.random.uniform(scale[0], 1.0) 30 | new_height, new_width = int(ratio*height), int(ratio*width) 31 | image = cv2.resize(image, (new_width, new_height)) 32 | if len(gts)>0: 33 | gts = np.asarray(gts,dtype=float) 34 | gts[:, 0:4:2] *= ratio 35 | gts[:, 1:4:2] *= ratio 36 | 37 | if len(igs)>0: 38 | igs = np.asarray(igs, dtype=float) 39 | igs[:, 0:4:2] *= ratio 40 | igs[:, 1:4:2] *= ratio 41 | 42 | return image, gts, igs 43 | 44 | def random_crop(image, gts, igs, crop_size, limit=8): 45 | img_height, img_width = image.shape[0:2] 46 | crop_h, crop_w = crop_size 47 | 48 | if len(gts)>0: 49 | sel_id = np.random.randint(0, len(gts)) 50 | sel_center_x = int((gts[sel_id, 0] + gts[sel_id, 2]) / 2.0) 51 | sel_center_y = int((gts[sel_id, 1] + gts[sel_id, 3]) / 2.0) 52 | else: 53 | sel_center_x = int(np.random.randint(0, img_width - crop_w+1) + crop_w * 0.5) 54 | sel_center_y = int(np.random.randint(0, img_height - crop_h+1) + crop_h * 0.5) 55 | 56 | crop_x1 = max(sel_center_x - int(crop_w * 0.5), int(0)) 57 | crop_y1 = max(sel_center_y - int(crop_h * 0.5), int(0)) 58 | diff_x = max(crop_x1 + crop_w - img_width, int(0)) 59 | crop_x1 -= diff_x 60 | diff_y = max(crop_y1 + crop_h - img_height, int(0)) 61 | crop_y1 -= diff_y 62 | cropped_image = np.copy(image[crop_y1:crop_y1 + crop_h, crop_x1:crop_x1 + crop_w]) 63 | # crop detections 64 | if len(igs)>0: 65 | igs[:, 0:4:2] -= crop_x1 66 | igs[:, 1:4:2] -= crop_y1 67 | igs[:, 0:4:2] = np.clip(igs[:, 0:4:2], 0, crop_w) 68 | igs[:, 1:4:2] = np.clip(igs[:, 1:4:2], 0, crop_h) 69 | keep_inds = ((igs[:, 2] - igs[:, 0]) >=8) & \ 70 | ((igs[:, 3] - igs[:, 1]) >=8) 71 | igs = igs[keep_inds] 72 | if len(gts)>0: 73 | ori_gts = np.copy(gts) 74 | gts[:, 0:4:2] -= crop_x1 75 | gts[:, 1:4:2] -= crop_y1 76 | gts[:, 0:4:2] = np.clip(gts[:, 0:4:2], 0, crop_w) 77 | gts[:, 1:4:2] = np.clip(gts[:, 1:4:2], 0, crop_h) 78 | 79 | before_area = (ori_gts[:, 2] - ori_gts[:, 0]) * (ori_gts[:, 3] - ori_gts[:, 1]) 80 | after_area = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1]) 81 | 82 | keep_inds = ((gts[:, 2] - gts[:, 0]) >=limit) & \ 83 | (after_area >= 0.5 * before_area) 84 | gts = gts[keep_inds] 85 | 86 | return cropped_image, gts, igs 87 | 88 | 89 | def random_pave(image, gts, igs, pave_size, limit=8): 90 | img_height, img_width = image.shape[0:2] 91 | pave_h, pave_w = pave_size 92 | # paved_image = np.zeros((pave_h, pave_w, 3), dtype=image.dtype) 93 | paved_image = 
np.ones((pave_h, pave_w, 3), dtype=image.dtype) * np.mean(image, dtype=int) 94 | pave_x = int(np.random.randint(0, pave_w - img_width + 1)) 95 | pave_y = int(np.random.randint(0, pave_h - img_height + 1)) 96 | paved_image[pave_y:pave_y + img_height, pave_x:pave_x + img_width] = image 97 | # pave detections 98 | if len(igs) > 0: 99 | igs[:, 0:4:2] += pave_x 100 | igs[:, 1:4:2] += pave_y 101 | keep_inds = ((igs[:, 2] - igs[:, 0]) >= 8) & \ 102 | ((igs[:, 3] - igs[:, 1]) >= 8) 103 | igs = igs[keep_inds] 104 | 105 | if len(gts) > 0: 106 | gts[:, 0:4:2] += pave_x 107 | gts[:, 1:4:2] += pave_y 108 | keep_inds = ((gts[:, 2] - gts[:, 0]) >= limit) 109 | gts = gts[keep_inds] 110 | 111 | return paved_image, gts, igs 112 | 113 | def augment(img_data, c): 114 | assert 'filepath' in img_data 115 | assert 'bboxes' in img_data 116 | img_data_aug = copy.deepcopy(img_data) 117 | img = cv2.imread(img_data_aug['filepath']) 118 | img_height, img_width = img.shape[:2] 119 | 120 | # random brightness 121 | if c.brightness and np.random.randint(0, 2) == 0: 122 | img = _brightness(img, min=c.brightness[0], max=c.brightness[1]) 123 | # random horizontal flip 124 | if c.use_horizontal_flips and np.random.randint(0, 2) == 0: 125 | img = cv2.flip(img, 1) 126 | if len(img_data_aug['bboxes']) > 0: 127 | img_data_aug['bboxes'][:, [0, 2]] = img_width - img_data_aug['bboxes'][:, [2, 0]] 128 | if len(img_data_aug['ignoreareas']) > 0: 129 | img_data_aug['ignoreareas'][:, [0, 2]] = img_width - img_data_aug['ignoreareas'][:, [2, 0]] 130 | 131 | gts = np.copy(img_data_aug['bboxes']) 132 | igs = np.copy(img_data_aug['ignoreareas']) 133 | 134 | img, gts, igs = resize_image(img, gts, igs, scale=[0.4,1.5]) 135 | if img.shape[0]>=c.size_train[0]: 136 | img, gts, igs = random_crop(img, gts, igs, c.size_train,limit=16) 137 | else: 138 | img, gts, igs = random_pave(img, gts, igs, c.size_train,limit=16) 139 | 140 | img_data_aug['bboxes'] = gts 141 | img_data_aug['ignoreareas'] = igs 142 | 143 | img_data_aug['width'] = c.size_train[1] 144 | img_data_aug['height'] = c.size_train[0] 145 | 146 | return img_data_aug, img 147 | 148 | def augment_wider(img_data, c): 149 | assert 'filepath' in img_data 150 | assert 'bboxes' in img_data 151 | img_data_aug = copy.deepcopy(img_data) 152 | img = cv2.imread(img_data_aug['filepath']) 153 | img_height, img_width = img.shape[:2] 154 | 155 | # random brightness 156 | if c.brightness and np.random.randint(0, 2) == 0: 157 | img = _brightness(img, min=c.brightness[0], max=c.brightness[1]) 158 | # random horizontal flip 159 | if c.use_horizontal_flips and np.random.randint(0, 2) == 0: 160 | img = cv2.flip(img, 1) 161 | if len(img_data_aug['bboxes']) > 0: 162 | img_data_aug['bboxes'][:, [0, 2]] = img_width - img_data_aug['bboxes'][:, [2, 0]] 163 | gts = np.copy(img_data_aug['bboxes']) 164 | 165 | scales = np.asarray([16, 32, 64, 128, 256]) 166 | crop_p = c.size_train[0] 167 | if len(gts) > 0: 168 | sel_id = np.random.randint(0, len(gts)) 169 | s_face = np.sqrt((gts[sel_id, 2] - gts[sel_id, 0]) * (gts[sel_id, 3] - gts[sel_id, 1])) 170 | index = np.random.randint(0, np.argmin(np.abs(scales - s_face)) + 1) 171 | s_tar = np.random.uniform(np.power(2, 4 + index)*1.5, np.power(2, 4 + index) * 2) 172 | ratio = s_tar / s_face 173 | new_height, new_width = int(ratio * img_height), int(ratio * img_width) 174 | img = cv2.resize(img, (new_width, new_height)) 175 | gts = np.asarray(gts, dtype=float) * ratio 176 | 177 | crop_x1 = np.random.randint(0, int(gts[sel_id, 0])+1) 178 | crop_x1 = np.minimum(crop_x1, 
np.maximum(0, new_width - crop_p)) 179 | crop_y1 = np.random.randint(0, int(gts[sel_id, 1])+1) 180 | crop_y1 = np.minimum(crop_y1, np.maximum(0, new_height - crop_p)) 181 | img = img[crop_y1:crop_y1 + crop_p, crop_x1:crop_x1 + crop_p] 182 | # crop detections 183 | if len(gts) > 0: 184 | ori_gts = np.copy(gts) 185 | gts[:, 0:4:2] -= crop_x1 186 | gts[:, 1:4:2] -= crop_y1 187 | gts[:, 0:4:2] = np.clip(gts[:, 0:4:2], 0, crop_p) 188 | gts[:, 1:4:2] = np.clip(gts[:, 1:4:2], 0, crop_p) 189 | 190 | before_area = (ori_gts[:, 2] - ori_gts[:, 0]) * (ori_gts[:, 3] - ori_gts[:, 1]) 191 | after_area = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1]) 192 | 193 | keep_inds = (after_area >= 0.5 * before_area) 194 | keep_inds_ig = (after_area < 0.5 * before_area) 195 | igs = gts[keep_inds_ig] 196 | gts = gts[keep_inds] 197 | 198 | if len(igs) > 0: 199 | w, h = igs[:, 2] - igs[:, 0], igs[:, 3] - igs[:, 1] 200 | igs = igs[np.logical_and(w >= 12, h >= 12), :] 201 | if len(gts) > 0: 202 | w, h = gts[:, 2] - gts[:, 0], gts[:, 3] - gts[:, 1] 203 | gts = gts[np.logical_and(w >= 12, h >= 12), :] 204 | else: 205 | img = img[0:crop_p, 0:crop_p] 206 | 207 | if np.minimum(img.shape[0], img.shape[1]) < c.size_train[0]: 208 | img, gts, igs = random_pave(img, gts, igs, c.size_train) 209 | 210 | img_data_aug['bboxes'] = gts 211 | img_data_aug['ignoreareas'] = igs 212 | return img_data_aug, img 213 | -------------------------------------------------------------------------------- /keras_csp/keras_layer_L2Normalization.py: -------------------------------------------------------------------------------- 1 | ''' 2 | A custom Keras layer to perform L2-normalization. 3 | 4 | Copyright (C) 2017 Pierluigi Ferrari 5 | 6 | This program is free software: you can redistribute it and/or modify 7 | it under the terms of the GNU General Public License as published by 8 | the Free Software Foundation, either version 3 of the License, or 9 | (at your option) any later version. 10 | 11 | This program is distributed in the hope that it will be useful, 12 | but WITHOUT ANY WARRANTY; without even the implied warranty of 13 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 14 | GNU General Public License for more details. 15 | 16 | You should have received a copy of the GNU General Public License 17 | along with this program. If not, see . 18 | ''' 19 | 20 | import keras.backend as K 21 | from keras.engine.topology import InputSpec 22 | from keras.engine.topology import Layer 23 | import numpy as np 24 | 25 | 26 | class L2Normalization(Layer): 27 | ''' 28 | Performs L2 normalization on the input tensor with a learnable scaling parameter 29 | as described in the paper "Parsenet: Looking Wider to See Better" (see references) 30 | and as used in the original SSD model. 31 | 32 | Arguments: 33 | gamma_init (int): The initial scaling parameter. Defaults to 20 following the 34 | SSD paper. 35 | 36 | Input shape: 37 | 4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'` 38 | or `(batch, height, width, channels)` if `dim_ordering = 'tf'`. 39 | 40 | Returns: 41 | The scaled tensor. Same shape as the input tensor. 
42 | 43 | References: 44 | http://cs.unc.edu/~wliu/papers/parsenet.pdf 45 | ''' 46 | 47 | def __init__(self, gamma_init=20, **kwargs): 48 | if K.image_dim_ordering() == 'tf': 49 | self.axis = 3 50 | else: 51 | self.axis = 1 52 | self.gamma_init = gamma_init 53 | super(L2Normalization, self).__init__(**kwargs) 54 | 55 | def build(self, input_shape): 56 | self.input_spec = [InputSpec(shape=input_shape)] 57 | gamma = self.gamma_init * np.ones((input_shape[self.axis],)) 58 | self.gamma = K.variable(gamma, name='{}_gamma'.format(self.name)) 59 | self.trainable_weights = [self.gamma] 60 | super(L2Normalization, self).build(input_shape) 61 | 62 | def call(self, x, mask=None): 63 | output = K.l2_normalize(x, self.axis) 64 | output *= self.gamma 65 | return output 66 | -------------------------------------------------------------------------------- /keras_csp/losses.py: -------------------------------------------------------------------------------- 1 | from keras import backend as K 2 | from keras.objectives import categorical_crossentropy 3 | 4 | if K.image_dim_ordering() == 'tf': 5 | import tensorflow as tf 6 | 7 | epsilon = 1e-4 8 | 9 | 10 | def cls_center(y_true, y_pred): 11 | classification_loss = K.binary_crossentropy(y_pred[:, :, :, 0], y_true[:, :, :, 2]) 12 | # firstly we compute the focal weight 13 | positives = y_true[:, :, :, 2] 14 | negatives = y_true[:, :, :, 1] - y_true[:, :, :, 2] 15 | foreground_weight = positives * (1.0 - y_pred[:, :, :, 0]) ** 2.0 16 | # foreground_weight = positives 17 | background_weight = negatives * ((1.0 - y_true[:, :, :, 0]) ** 4.0) * (y_pred[:, :, :, 0] ** 2.0) 18 | # background_weight = negatives * ((1.0 - y_true[:, :, :, 0])**4.0)*(0.01 ** 2.0) 19 | 20 | # foreground_weight = y_true[:, :, :, 0] * (1- y_pred[:, :, :, 0]) ** 2.0 21 | # background_weight = negatives * y_pred[:, :, :, 0] ** 2.0 22 | 23 | focal_weight = foreground_weight + background_weight 24 | 25 | assigned_boxes = tf.reduce_sum(y_true[:, :, :, 2]) 26 | class_loss = 0.01 * tf.reduce_sum(focal_weight * classification_loss) / tf.maximum(1.0, assigned_boxes) 27 | 28 | # assigned_boxes = tf.reduce_sum(tf.reduce_sum(y_true[:, :, :, 1], axis=-1), axis=-1) 29 | # class_loss = tf.reduce_sum(tf.reduce_sum(classification_loss, axis=-1), axis=-1) / tf.maximum(1.0, assigned_boxes) 30 | 31 | return class_loss 32 | 33 | 34 | def regr_h(y_true, y_pred): 35 | absolute_loss = tf.abs(y_true[:, :, :, 0] - y_pred[:, :, :, 0]) / (y_true[:, :, :, 0] + 1e-10) 36 | square_loss = 0.5 * ((y_true[:, :, :, 0] - y_pred[:, :, :, 0]) / (y_true[:, :, :, 0] + 1e-10)) ** 2 37 | 38 | l1_loss = y_true[:, :, :, 1] * tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5) 39 | 40 | assigned_boxes = tf.reduce_sum(y_true[:, :, :, 1]) 41 | class_loss = tf.reduce_sum(l1_loss) / tf.maximum(1.0, assigned_boxes) 42 | 43 | return class_loss 44 | 45 | 46 | def regr_hw(y_true, y_pred): 47 | absolute_loss = tf.abs(y_true[:, :, :, :2] - y_pred[:, :, :, :]) / (y_true[:, :, :, :2] + 1e-10) 48 | square_loss = 0.5 * ((y_true[:, :, :, :2] - y_pred[:, :, :, :]) / (y_true[:, :, :, :2] + 1e-10)) ** 2 49 | loss = y_true[:, :, :, 2] * tf.reduce_sum(tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5), 50 | axis=-1) 51 | assigned_boxes = tf.reduce_sum(y_true[:, :, :, 2]) 52 | class_loss = tf.reduce_sum(loss) / tf.maximum(1.0, assigned_boxes) 53 | 54 | return class_loss 55 | 56 | 57 | def regr_offset(y_true, y_pred): 58 | absolute_loss = tf.abs(y_true[:, :, :, :2] - y_pred[:, :, :, :]) 59 | square_loss = 0.5 * 
(y_true[:, :, :, :2] - y_pred[:, :, :, :]) ** 2 60 | l1_loss = y_true[:, :, :, 2] * tf.reduce_sum( 61 | tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5), axis=-1) 62 | 63 | assigned_boxes = tf.reduce_sum(y_true[:, :, :, 2]) 64 | class_loss = 0.1 * tf.reduce_sum(l1_loss) / tf.maximum(1.0, assigned_boxes) 65 | 66 | return class_loss 67 | -------------------------------------------------------------------------------- /keras_csp/nms/.gitignore: -------------------------------------------------------------------------------- 1 | *.c 2 | *.cpp 3 | *.so 4 | -------------------------------------------------------------------------------- /keras_csp/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dominikandreas/CSP/e4a5a3770609896d3ab4b164c434ce80746225e6/keras_csp/nms/__init__.py -------------------------------------------------------------------------------- /keras_csp/nms/cpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b): 12 | return a if a >= b else b 13 | 14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b): 15 | return a if a <= b else b 16 | 17 | def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 18 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 19 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 20 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 21 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 22 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 23 | 24 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 25 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1] 26 | 27 | cdef int ndets = dets.shape[0] 28 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \ 29 | np.zeros((ndets), dtype=np.int) 30 | 31 | # nominal indices 32 | cdef int _i, _j 33 | # sorted indices 34 | cdef int i, j 35 | # temp variables for box i's (the box currently under consideration) 36 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 37 | # variables for computing overlap with box j (lower scoring box) 38 | cdef np.float32_t xx1, yy1, xx2, yy2 39 | cdef np.float32_t w, h 40 | cdef np.float32_t inter, ovr 41 | 42 | keep = [] 43 | for _i in range(ndets): 44 | i = order[_i] 45 | if suppressed[i] == 1: 46 | continue 47 | keep.append(i) 48 | ix1 = x1[i] 49 | iy1 = y1[i] 50 | ix2 = x2[i] 51 | iy2 = y2[i] 52 | iarea = areas[i] 53 | for _j in range(_i + 1, ndets): 54 | j = order[_j] 55 | if suppressed[j] == 1: 56 | continue 57 | xx1 = max(ix1, x1[j]) 58 | yy1 = max(iy1, y1[j]) 59 | xx2 = min(ix2, x2[j]) 60 | yy2 = min(iy2, y2[j]) 61 | w = max(0.0, xx2 - xx1 + 1) 62 | h = max(0.0, yy2 - yy1 + 1) 63 | inter = w * h 64 | ovr = inter / (iarea + areas[j] - inter) 65 | if ovr >= thresh: 66 | suppressed[j] = 1 67 | 68 | return keep 69 | -------------------------------------------------------------------------------- /keras_csp/nms/gpu_nms.hpp: -------------------------------------------------------------------------------- 1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int 
boxes_num, 2 | int boxes_dim, float nms_overlap_thresh, int device_id); 3 | -------------------------------------------------------------------------------- /keras_csp/nms/gpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | assert sizeof(int) == sizeof(np.int32_t) 12 | 13 | cdef extern from "gpu_nms.hpp": 14 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int) 15 | 16 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh, 17 | np.int32_t device_id=0): 18 | cdef int boxes_num = dets.shape[0] 19 | cdef int boxes_dim = dets.shape[1] 20 | cdef int num_out 21 | cdef np.ndarray[np.int32_t, ndim=1] \ 22 | keep = np.zeros(boxes_num, dtype=np.int32) 23 | cdef np.ndarray[np.float32_t, ndim=1] \ 24 | scores = dets[:, 4] 25 | cdef np.ndarray[np.int_t, ndim=1] \ 26 | order = scores.argsort()[::-1] 27 | cdef np.ndarray[np.float32_t, ndim=2] \ 28 | sorted_dets = dets[order, :] 29 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id) 30 | keep = keep[:num_out] 31 | return list(order[keep]) 32 | -------------------------------------------------------------------------------- /keras_csp/nms/nms_kernel.cu: -------------------------------------------------------------------------------- 1 | // ------------------------------------------------------------------ 2 | // Faster R-CNN 3 | // Copyright (c) 2015 Microsoft 4 | // Licensed under The MIT License [see fast-rcnn/LICENSE for details] 5 | // Written by Shaoqing Ren 6 | // ------------------------------------------------------------------ 7 | 8 | #include "gpu_nms.hpp" 9 | #include <vector> 10 | #include <iostream> 11 | 12 | #define CUDA_CHECK(condition) \ 13 | /* Code block avoids redefinition of cudaError_t error */ \ 14 | do { \ 15 | cudaError_t error = condition; \ 16 | if (error != cudaSuccess) { \ 17 | std::cout << cudaGetErrorString(error) << std::endl; \ 18 | } \ 19 | } while (0) 20 | 21 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 22 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 23 | 24 | __device__ inline float devIoU(float const * const a, float const * const b) { 25 | float left = max(a[0], b[0]), right = min(a[2], b[2]); 26 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]); 27 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f); 28 | float interS = width * height; 29 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1); 30 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1); 31 | return interS / (Sa + Sb - interS); 32 | } 33 | 34 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh, 35 | const float *dev_boxes, unsigned long long *dev_mask) { 36 | const int row_start = blockIdx.y; 37 | const int col_start = blockIdx.x; 38 | 39 | // if (row_start > col_start) return; 40 | 41 | const int row_size = 42 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock); 43 | const int col_size = 44 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 45 | 46 | __shared__ float block_boxes[threadsPerBlock * 5]; 47 | if (threadIdx.x < col_size) { 48 | block_boxes[threadIdx.x * 5 + 0] = 49 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0]; 50 
| block_boxes[threadIdx.x * 5 + 1] = 51 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1]; 52 | block_boxes[threadIdx.x * 5 + 2] = 53 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2]; 54 | block_boxes[threadIdx.x * 5 + 3] = 55 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3]; 56 | block_boxes[threadIdx.x * 5 + 4] = 57 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4]; 58 | } 59 | __syncthreads(); 60 | 61 | if (threadIdx.x < row_size) { 62 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 63 | const float *cur_box = dev_boxes + cur_box_idx * 5; 64 | int i = 0; 65 | unsigned long long t = 0; 66 | int start = 0; 67 | if (row_start == col_start) { 68 | start = threadIdx.x + 1; 69 | } 70 | for (i = start; i < col_size; i++) { 71 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) { 72 | t |= 1ULL << i; 73 | } 74 | } 75 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock); 76 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 77 | } 78 | } 79 | 80 | void _set_device(int device_id) { 81 | int current_device; 82 | CUDA_CHECK(cudaGetDevice(&current_device)); 83 | if (current_device == device_id) { 84 | return; 85 | } 86 | // The call to cudaSetDevice must come before any calls to Get, which 87 | // may perform initialization using the GPU. 88 | CUDA_CHECK(cudaSetDevice(device_id)); 89 | } 90 | 91 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 92 | int boxes_dim, float nms_overlap_thresh, int device_id) { 93 | _set_device(device_id); 94 | 95 | float* boxes_dev = NULL; 96 | unsigned long long* mask_dev = NULL; 97 | 98 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock); 99 | 100 | CUDA_CHECK(cudaMalloc(&boxes_dev, 101 | boxes_num * boxes_dim * sizeof(float))); 102 | CUDA_CHECK(cudaMemcpy(boxes_dev, 103 | boxes_host, 104 | boxes_num * boxes_dim * sizeof(float), 105 | cudaMemcpyHostToDevice)); 106 | 107 | CUDA_CHECK(cudaMalloc(&mask_dev, 108 | boxes_num * col_blocks * sizeof(unsigned long long))); 109 | 110 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock), 111 | DIVUP(boxes_num, threadsPerBlock)); 112 | dim3 threads(threadsPerBlock); 113 | nms_kernel<<<blocks, threads>>>(boxes_num, 114 | nms_overlap_thresh, 115 | boxes_dev, 116 | mask_dev); 117 | 118 | std::vector<unsigned long long> mask_host(boxes_num * col_blocks); 119 | CUDA_CHECK(cudaMemcpy(&mask_host[0], 120 | mask_dev, 121 | sizeof(unsigned long long) * boxes_num * col_blocks, 122 | cudaMemcpyDeviceToHost)); 123 | 124 | std::vector<unsigned long long> remv(col_blocks); 125 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 126 | 127 | int num_to_keep = 0; 128 | for (int i = 0; i < boxes_num; i++) { 129 | int nblock = i / threadsPerBlock; 130 | int inblock = i % threadsPerBlock; 131 | 132 | if (!(remv[nblock] & (1ULL << inblock))) { 133 | keep_out[num_to_keep++] = i; 134 | unsigned long long *p = &mask_host[0] + i * col_blocks; 135 | for (int j = nblock; j < col_blocks; j++) { 136 | remv[j] |= p[j]; 137 | } 138 | } 139 | } 140 | *num_out = num_to_keep; 141 | 142 | CUDA_CHECK(cudaFree(boxes_dev)); 143 | CUDA_CHECK(cudaFree(mask_dev)); 144 | } 145 | -------------------------------------------------------------------------------- /keras_csp/nms/py_cpu_nms.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # 
-------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | def py_cpu_nms(dets, thresh): 11 | """Pure Python NMS baseline.""" 12 | x1 = dets[:, 0] 13 | y1 = dets[:, 1] 14 | x2 = dets[:, 2] 15 | y2 = dets[:, 3] 16 | scores = dets[:, 4] 17 | 18 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 19 | order = scores.argsort()[::-1] 20 | 21 | keep = [] 22 | while order.size > 0: 23 | i = order[0] 24 | keep.append(i) 25 | xx1 = np.maximum(x1[i], x1[order[1:]]) 26 | yy1 = np.maximum(y1[i], y1[order[1:]]) 27 | xx2 = np.minimum(x2[i], x2[order[1:]]) 28 | yy2 = np.minimum(y2[i], y2[order[1:]]) 29 | 30 | w = np.maximum(0.0, xx2 - xx1 + 1) 31 | h = np.maximum(0.0, yy2 - yy1 + 1) 32 | inter = w * h 33 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 34 | 35 | inds = np.where(ovr <= thresh)[0] 36 | order = order[inds + 1] 37 | 38 | return keep 39 | -------------------------------------------------------------------------------- /keras_csp/nms_wrapper.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | import pyximport; pyximport.install() 8 | from keras_csp.nms.gpu_nms import gpu_nms 9 | from keras_csp.nms.cpu_nms import cpu_nms 10 | import numpy as np 11 | 12 | def soft_nms(dets, sigma=0.5, Nt=0.3, threshold=0.001, method=1): 13 | # NOTE: cpu_soft_nms is not defined in keras_csp/nms/cpu_nms.pyx above, so calling soft_nms raises a NameError; a soft-NMS implementation (the original py-faster-rcnn ships one) must be added before this function is usable. 14 | keep = cpu_soft_nms(np.ascontiguousarray(dets, dtype=np.float32), 15 | np.float32(sigma), np.float32(Nt), 16 | np.float32(threshold), 17 | np.uint8(method)) 18 | return keep 19 | 20 | def nms(dets, thresh, usegpu, gpu_id): 21 | """Dispatch to either CPU or GPU NMS implementations.""" 22 | 23 | if dets.shape[0] == 0: 24 | return [] 25 | if usegpu: 26 | return gpu_nms(dets, thresh, device_id=gpu_id) 27 | else: 28 | return cpu_nms(dets, thresh) 29 | -------------------------------------------------------------------------------- /keras_csp/parallel_model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Mask R-CNN 3 | Multi-GPU Support for Keras. 4 | 5 | Copyright (c) 2017 Matterport, Inc. 6 | Licensed under the MIT License (see LICENSE for details) 7 | Written by Waleed Abdulla 8 | 9 | Ideas and small code snippets from these sources: 10 | https://github.com/fchollet/keras/issues/2436 11 | https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012 12 | https://github.com/avolkov1/keras_experiments/blob/master/keras_exp/multigpu/ 13 | https://github.com/fchollet/keras/blob/master/keras/utils/training_utils.py 14 | """ 15 | 16 | import tensorflow as tf 17 | import keras.backend as K 18 | import keras.layers as KL 19 | import keras.models as KM 20 | 21 | 22 | class ParallelModel(KM.Model): 23 | """Subclasses the standard Keras Model and adds multi-GPU support. 24 | It works by creating a copy of the model on each GPU. Then it slices 25 | the inputs and sends a slice to each copy of the model, and then 26 | merges the outputs together and applies the loss on the combined 27 | outputs. 28 | """ 29 | 30 | def __init__(self, keras_model, gpu_count): 31 | """Class constructor. 32 | keras_model: The Keras model to parallelize 33 | gpu_count: Number of GPUs.
Must be > 1 34 | """ 35 | self.inner_model = keras_model 36 | self.gpu_count = gpu_count 37 | merged_outputs = self.make_parallel() 38 | super(ParallelModel, self).__init__(inputs=self.inner_model.inputs, 39 | outputs=merged_outputs) 40 | 41 | def __getattribute__(self, attrname): 42 | """Redirect loading and saving methods to the inner model. That's where 43 | the weights are stored.""" 44 | if 'load' in attrname or 'save' in attrname: 45 | return getattr(self.inner_model, attrname) 46 | return super(ParallelModel, self).__getattribute__(attrname) 47 | 48 | def summary(self, *args, **kwargs): 49 | """Override summary() to display summaries of both, the wrapper 50 | and inner models.""" 51 | super(ParallelModel, self).summary(*args, **kwargs) 52 | self.inner_model.summary(*args, **kwargs) 53 | 54 | def make_parallel(self): 55 | """Creates a new wrapper model that consists of multiple replicas of 56 | the original model placed on different GPUs. 57 | """ 58 | # Slice inputs. Slice inputs on the CPU to avoid sending a copy 59 | # of the full inputs to all GPUs. Saves on bandwidth and memory. 60 | input_slices = {name: tf.split(x, self.gpu_count) 61 | for name, x in zip(self.inner_model.input_names, 62 | self.inner_model.inputs)} 63 | 64 | output_names = self.inner_model.output_names 65 | outputs_all = [] 66 | for i in range(len(self.inner_model.outputs)): 67 | outputs_all.append([]) 68 | 69 | # Run the model call() on each GPU to place the ops there 70 | for i in range(self.gpu_count): 71 | with tf.device('/gpu:%d' % i): 72 | with tf.name_scope('tower_%d' % i): 73 | # Run a slice of inputs through this replica 74 | zipped_inputs = list(zip(self.inner_model.input_names, 75 | self.inner_model.inputs)) 76 | inputs = [ 77 | KL.Lambda(lambda s: input_slices[name][i], 78 | output_shape=lambda s: (None,)+s[1:])(tensor) 79 | for name, tensor in zipped_inputs] 80 | # Create the model replica and get the outputs 81 | outputs = self.inner_model(inputs) 82 | if not isinstance(outputs, list): 83 | outputs = [outputs] 84 | # Save the outputs for merging back together later 85 | for l, o in enumerate(outputs): 86 | outputs_all[l].append(o) 87 | 88 | # Merge outputs on CPU 89 | with tf.device('/cpu:0'): 90 | merged = [] 91 | for outputs, name in zip(outputs_all, output_names): 92 | # If outputs are numbers without dimensions, add a batch dim. 93 | def add_dim(tensor): 94 | """Add a dimension to tensors that don't have any.""" 95 | if K.int_shape(tensor) == (): 96 | return KL.Lambda(lambda t: K.reshape(t, [1, 1]))(tensor) 97 | return tensor 98 | outputs = list(map(add_dim, outputs)) 99 | 100 | # Concatenate 101 | merged.append(KL.Concatenate(axis=0, name=name)(outputs)) 102 | return merged 103 | 104 | 105 | if __name__ == "__main__": 106 | # Testing code below. It creates a simple model to train on MNIST and 107 | # tries to run it on 2 GPUs. It saves the graph so it can be viewed 108 | # in TensorBoard. Run it as: 109 | # 110 | # python3 parallel_model.py 111 | 112 | import os 113 | import numpy as np 114 | import keras.optimizers 115 | from keras.datasets import mnist 116 | from keras.preprocessing.image import ImageDataGenerator 117 | 118 | GPU_COUNT = 2 119 | 120 | # Root directory of the project 121 | ROOT_DIR = os.getcwd() 122 | 123 | # Directory to save logs and trained model 124 | MODEL_DIR = os.path.join(ROOT_DIR, "logs/parallel") 125 | 126 | def build_model(x_train, num_classes): 127 | # Reset default graph. 
Keras leaves old ops in the graph, 128 | # which are ignored for execution but clutter graph 129 | # visualization in TensorBoard. 130 | tf.reset_default_graph() 131 | 132 | inputs = KL.Input(shape=x_train.shape[1:], name="input_image") 133 | x = KL.Conv2D(32, (3, 3), activation='relu', padding="same", 134 | name="conv1")(inputs) 135 | x = KL.Conv2D(64, (3, 3), activation='relu', padding="same", 136 | name="conv2")(x) 137 | x = KL.MaxPooling2D(pool_size=(2, 2), name="pool1")(x) 138 | x = KL.Flatten(name="flat1")(x) 139 | x = KL.Dense(128, activation='relu', name="dense1")(x) 140 | x = KL.Dense(num_classes, activation='softmax', name="dense2")(x) 141 | 142 | return KM.Model(inputs, x, "digit_classifier_model") 143 | 144 | # Load MNIST Data 145 | (x_train, y_train), (x_test, y_test) = mnist.load_data() 146 | x_train = np.expand_dims(x_train, -1).astype('float32') / 255 147 | x_test = np.expand_dims(x_test, -1).astype('float32') / 255 148 | 149 | print(('x_train shape:', x_train.shape)) 150 | print(('x_test shape:', x_test.shape)) 151 | 152 | # Build data generator and model 153 | datagen = ImageDataGenerator() 154 | model = build_model(x_train, 10) 155 | 156 | # Add multi-GPU support. 157 | model = ParallelModel(model, GPU_COUNT) 158 | 159 | optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, clipnorm=5.0) 160 | 161 | model.compile(loss='sparse_categorical_crossentropy', 162 | optimizer=optimizer, metrics=['accuracy']) 163 | 164 | model.summary() 165 | 166 | # Train 167 | model.fit_generator( 168 | datagen.flow(x_train, y_train, batch_size=64), 169 | steps_per_epoch=50, epochs=10, verbose=1, 170 | validation_data=(x_test, y_test), 171 | callbacks=[keras.callbacks.TensorBoard(log_dir=MODEL_DIR, 172 | write_graph=True)] 173 | ) 174 | -------------------------------------------------------------------------------- /keras_csp/utilsfunc.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | 5 | def format_img_size(img, C): 6 | """ formats the image size based on config """ 7 | img_min_side = float(C.im_size) 8 | (height, width, _) = img.shape 9 | 10 | if width <= height: 11 | ratio = img_min_side / width 12 | new_height = int(ratio * height) 13 | new_width = int(img_min_side) 14 | else: 15 | ratio = img_min_side / height 16 | new_width = int(ratio * width) 17 | new_height = int(img_min_side) 18 | img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC) 19 | return img, ratio 20 | 21 | 22 | def format_img_channels(img, C): 23 | """ formats the image channels based on config """ 24 | # img = img[:, :, (2, 1, 0)] 25 | img = img.astype(np.float32) 26 | img[:, :, 0] -= C.img_channel_mean[0] 27 | img[:, :, 1] -= C.img_channel_mean[1] 28 | img[:, :, 2] -= C.img_channel_mean[2] 29 | # img /= C.img_scaling_factor 30 | # img = np.transpose(img, (2, 0, 1)) 31 | img = np.expand_dims(img, axis=0) 32 | return img 33 | 34 | 35 | def format_img_batch(img, C): 36 | """ formats the image channels based on config """ 37 | # img = img[:, :, (2, 1, 0)] 38 | img = img.astype(np.float32) 39 | img[:, :, :, 0] -= C.img_channel_mean[0] 40 | img[:, :, :, 1] -= C.img_channel_mean[1] 41 | img[:, :, :, 2] -= C.img_channel_mean[2] 42 | # img /= C.img_scaling_factor 43 | # img = np.transpose(img, (2, 0, 1)) 44 | # img = np.expand_dims(img, axis=0) 45 | return img 46 | 47 | 48 | def format_img(img, C): 49 | """ formats an image for model prediction based on config """ 50 | # img, ratio = format_img_size(img, C) 51 | img = 
format_img_channels(img, C) 52 | return img # return img, ratio 53 | 54 | 55 | def format_img_inria(img, C): 56 | img_h, img_w = img.shape[:2] 57 | # img_h_new, img_w_new = int(round(img_h/16)*16), int(round(img_w/16)*16) 58 | # img = cv2.resize(img, (img_w_new, img_h_new)) 59 | img_h_new, img_w_new = int(np.ceil(img_h / 16) * 16), int(np.ceil(img_w / 16) * 16) 60 | paved_image = np.zeros((img_h_new, img_w_new, 3), dtype=img.dtype) 61 | paved_image[0:img_h, 0:img_w] = img 62 | img = format_img_channels(paved_image, C) 63 | return img 64 | 65 | 66 | def format_img_ratio(img, C, ratio): 67 | img = img.astype(np.float32) 68 | img[:, :, 0] -= C.img_channel_mean[0] 69 | img[:, :, 1] -= C.img_channel_mean[1] 70 | img[:, :, 2] -= C.img_channel_mean[2] 71 | img = cv2.resize(img, None, None, fx=ratio, fy=ratio) 72 | # img = cv2.resize(img, None, None, fx=ratio, fy=ratio, interpolation=cv2.INTER_CUBIC) 73 | img = np.expand_dims(img, axis=0) 74 | return img # return img, ratio 75 | 76 | 77 | def preprocess_input_test(x): 78 | x = x.astype(np.float32) 79 | x /= 255. 80 | x -= 0.5 81 | x *= 2. 82 | x = np.expand_dims(x, axis=0) 83 | return x 84 | 85 | 86 | # Method to transform the coordinates of the bounding box to its original size 87 | def get_real_coordinates(ratio, x1, y1, x2, y2): 88 | real_x1 = int(round(x1 // ratio)) 89 | real_y1 = int(round(y1 // ratio)) 90 | real_x2 = int(round(x2 // ratio)) 91 | real_y2 = int(round(y2 // ratio)) 92 | 93 | return (real_x1, real_y1, real_x2, real_y2) 94 | 95 | 96 | def intersection(ai, bi, area): 97 | x = max(ai[0], bi[0]) 98 | y = max(ai[1], bi[1]) 99 | w = min(ai[2], bi[2]) - x 100 | h = min(ai[3], bi[3]) - y 101 | if w < 0 or h < 0: 102 | return 0 103 | return w * h / area 104 | 105 | 106 | def box_grid_overlap(bboxes, get_img_output_length): 107 | width, height = 960, 540 108 | (resized_width, resized_height) = 960, 540 109 | # for VGG16, calculate the output map size 110 | (output_width, output_height) = get_img_output_length(resized_width, resized_height) 111 | 112 | downscale = float(16) 113 | 114 | num_bboxes = len(bboxes) 115 | # initialise output objectiveness 116 | y_grid_overlap = np.zeros((output_height, output_width)) 117 | 118 | if num_bboxes > 0: 119 | # get the GT box coordinates, and resize to account for image resizing 120 | gta = np.zeros((num_bboxes, 4)) 121 | gta[:, 0], gta[:, 1], gta[:, 2], gta[:, 3] = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2] + bboxes[:, 0], bboxes[:, 122 | 3] + bboxes[ 123 | :, 1] 124 | # for bbox_num in range(num_bboxes): 125 | # # get the GT box coordinates, and resize to account for image resizing 126 | # gta[bbox_num, 0] = bboxes[bbox_num][0] 127 | # gta[bbox_num, 1] = bboxes[bbox_num][1] 128 | # gta[bbox_num, 2] = bboxes[bbox_num][2] 129 | # gta[bbox_num, 3] = bboxes[bbox_num][3] 130 | for ix in range(output_width): 131 | x1_anc = downscale * ix 132 | x2_anc = downscale * (ix + 1) 133 | for jy in range(output_height): 134 | y1_anc = downscale * jy 135 | y2_anc = downscale * (jy + 1) 136 | best_op = 0 137 | for b in range(num_bboxes): 138 | grid = [x1_anc, y1_anc, x2_anc, y2_anc] 139 | op = intersection(grid, gta[b, :], downscale ** 2) 140 | best_op = op if op > best_op else best_op 141 | y_grid_overlap[jy, ix] = best_op 142 | 143 | y_grid_overlap = np.expand_dims(y_grid_overlap.reshape((1, -1)), axis=0) 144 | return y_grid_overlap 145 | 146 | 147 | def integrate_motion_score(bboxes, probs, pred, stride=16): 148 | if len(bboxes) == 0: 149 | return [] 150 | probs = probs.reshape((-1, 1)) 151 | pred_anchor_score = 
np.zeros((probs.shape[0], 1)) 152 | for i in range(len(bboxes)): 153 | x1, y1, x2, y2 = int(bboxes[i][0] / stride), int(bboxes[i][1] / stride), int(bboxes[i][2] / stride), int( 154 | bboxes[i][3] / stride) 155 | pred_anchor_score[i, 0] = np.sum(pred[y1:y2, x1:x2]) / ((x2 - x1) * (y2 - y1)) 156 | # alpha, belta = 2/0.7, 0.1 157 | # pred_anchor_score = np.where(pred_anchor_score>0.7, pred_anchor_score*alpha, pred_anchor_score) 158 | # pred_anchor_score = np.maximum(pred_anchor_score*alpha, np.ones_like(pred_anchor_score)*belta) 159 | # all_probs = pred_anchor_score 160 | all_probs = probs * pred_anchor_score 161 | return all_probs 162 | 163 | 164 | def box_encoder_pp(anchors, boxes, Y1): 165 | A = np.copy(anchors[:, :, :, :4]) 166 | A = A.reshape((-1, 4)) 167 | 168 | # 1 calculate the iou scores 169 | max_overlaps = np.zeros((anchors.shape[0] * anchors.shape[1] * anchors.shape[2],), dtype=np.float32) 170 | if len(boxes) > 0: 171 | boxes[:, 2] += boxes[:, 0] 172 | boxes[:, 3] += boxes[:, 1] 173 | overlaps = bbox_overlaps(np.ascontiguousarray(A, dtype=np.float64), 174 | np.ascontiguousarray(boxes, dtype=np.float64)) 175 | max_overlaps = overlaps.max(axis=1) 176 | # normalize the iou scores 177 | if np.max(max_overlaps) > 0: 178 | max_overlaps = (max_overlaps - np.min(max_overlaps)) / np.max(max_overlaps) 179 | # 2 calculate the rpn scores 180 | rpn_score = Y1.reshape((-1)).astype(np.float32) 181 | inds = np.where(max_overlaps == 0) 182 | rpn_score[inds] = np.min(rpn_score) 183 | scores = (rpn_score + max_overlaps) / 2 184 | scores = np.expand_dims(scores.reshape((1, -1)).astype(np.float32), axis=0) 185 | return scores 186 | 187 | 188 | def box_encoder_iou(anchors, boxes): 189 | A = np.copy(anchors[:, :, :, :4]) 190 | A = A.reshape((-1, 4)) 191 | 192 | max_overlaps = np.zeros((anchors.shape[0] * anchors.shape[1] * anchors.shape[2],), dtype=np.float32) 193 | if len(boxes) > 0: 194 | boxes[:, 2] += boxes[:, 0] 195 | boxes[:, 3] += boxes[:, 1] 196 | overlaps = bbox_overlaps(np.ascontiguousarray(A, dtype=np.float64), 197 | np.ascontiguousarray(boxes, dtype=np.float64)) 198 | max_overlaps = overlaps.max(axis=1) 199 | scores = np.expand_dims(max_overlaps.reshape((1, -1)).astype(np.float32), axis=0) 200 | return scores 201 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | tensorflow-gpu==1.14.0 2 | easydict==1.6 3 | joblib==0.10.3 4 | numpy==1.16.2 5 | opencv-python==4.1.0.25 6 | Pillow==6.1.0 7 | keras==2.0.6 8 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import os 2 | from Cython.Distutils import build_ext 3 | from distutils.core import setup, Extension 4 | from Cython.Build import cythonize 5 | import numpy as np 6 | 7 | try: 8 | numpy_include = np.get_include() 9 | except AttributeError: 10 | numpy_include = np.get_numpy_include() 11 | 12 | 13 | def customize_compiler_for_nvcc(self): 14 | """inject deep into distutils to customize how the dispatch 15 | to gcc/nvcc works. 16 | If you subclass UnixCCompiler, it's not trivial to get your subclass 17 | injected in, and still have the right customizations (i.e. 18 | distutils.sysconfig.customize_compiler) run on it. So instead of going 19 | the OO route, I have this. 
Note, it's kind of like a weird functional 20 | subclassing going on.""" 21 | 22 | # tell the compiler it can process .cu 23 | self.src_extensions.append('.cu') 24 | 25 | # save references to the default compiler_so and _compile methods 26 | default_compiler_so = self.compiler_so 27 | super = self._compile 28 | 29 | # now redefine the _compile method. This gets executed for each 30 | # object but distutils doesn't have the ability to change compilers 31 | # based on source extension: we add it. 32 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts): 33 | if os.path.splitext(src)[1] == '.cu': 34 | # use the cuda for .cu files 35 | self.set_executable('compiler_so', CUDA['nvcc']) 36 | # use only a subset of the extra_postargs, which are 1-1 translated 37 | # from the extra_compile_args in the Extension class 38 | postargs = extra_postargs['nvcc'] 39 | else: 40 | postargs = extra_postargs['gcc'] 41 | 42 | super(obj, src, ext, cc_args, postargs, pp_opts) 43 | # reset the default compiler_so, which we might have changed for cuda 44 | self.compiler_so = default_compiler_so 45 | 46 | # inject our redefined _compile method into the class 47 | self._compile = _compile 48 | 49 | 50 | # run the customize_compiler 51 | class custom_build_ext(build_ext): 52 | def build_extensions(self): 53 | customize_compiler_for_nvcc(self.compiler) 54 | build_ext.build_extensions(self) 55 | 56 | 57 | def find_in_path(name, path): 58 | "Find a file in a search path" 59 | # Adapted from 60 | # http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/ 61 | for dir in path.split(os.pathsep): 62 | binpath = os.path.join(dir, name) 63 | if os.path.exists(binpath): 64 | return os.path.abspath(binpath) 65 | return None 66 | 67 | 68 | def locate_cuda(): 69 | """Locate the CUDA environment on the system 70 | Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64' 71 | and values giving the absolute path to each directory. 72 | Starts by looking for the CUDAHOME env variable. If not found, everything 73 | is based on finding 'nvcc' in the PATH. 74 | """ 75 | 76 | # first check if the CUDAHOME env variable is in use 77 | if 'CUDAHOME' in os.environ: 78 | home = os.environ['CUDAHOME'] 79 | nvcc = os.path.join(home, 'bin', 'nvcc') 80 | else: 81 | # otherwise, search the PATH for NVCC 82 | default_path = os.path.join(os.sep, 'usr', 'local', 'cuda', 'bin') 83 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path) 84 | if nvcc is None: 85 | raise EnvironmentError('The nvcc binary could not be ' 86 | 'located in your $PATH. 
Either add it to your path, or set $CUDAHOME') 87 | home = os.path.dirname(os.path.dirname(nvcc)) 88 | 89 | cudaconfig = {'home': home, 'nvcc': nvcc, 90 | 'include': os.path.join(home, 'include'), 91 | 'lib64': os.path.join(home, 'lib64')} 92 | for k, v in cudaconfig.items(): 93 | if not os.path.exists(v): 94 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v)) 95 | 96 | return cudaconfig 97 | 98 | 99 | CUDA = locate_cuda() 100 | 101 | ext_modules = [ 102 | Extension( 103 | "keras_csp.nms.cpu_nms", 104 | ["keras_csp/nms/cpu_nms.pyx"], 105 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 106 | include_dirs=[numpy_include] 107 | ), 108 | Extension('keras_csp.nms.gpu_nms', 109 | ['keras_csp/nms/nms_kernel.cu', 'keras_csp/nms/gpu_nms.pyx'], 110 | library_dirs=[CUDA['lib64']], 111 | libraries=['cudart'], 112 | language='c++', 113 | runtime_library_dirs=[CUDA['lib64']], 114 | # this syntax is specific to this build system 115 | # we're only going to use certain compiler args with nvcc and not with 116 | # gcc the implementation of this trick is in customize_compiler() below 117 | extra_compile_args={'gcc': ["-Wno-unused-function"], 118 | 'nvcc': ['-arch=sm_35', 119 | '--ptxas-options=-v', 120 | '-c', 121 | '--compiler-options', 122 | "'-fPIC'"]}, 123 | include_dirs=[numpy_include, CUDA['include']] 124 | ) 125 | ] 126 | 127 | setup( 128 | ext_modules=ext_modules, 129 | cmdclass={'build_ext': custom_build_ext}, 130 | ) 131 | -------------------------------------------------------------------------------- /test_caltech.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import pickle 4 | from keras.layers import Input 5 | from keras.models import Model 6 | from keras_csp import config, bbox_process 7 | from keras_csp.utilsfunc import * 8 | from keras_csp import resnet50 as nn 9 | 10 | os.environ["CUDA_VISIBLE_DEVICES"] = '1' 11 | C = config.Config() 12 | C.offset = True 13 | cache_path = 'data/cache/caltech/test' 14 | with open(cache_path, 'rb') as fid: 15 | val_data = pickle.load(fid, encoding='latin1') 16 | num_imgs = len(val_data) 17 | print('num of val samples: {}'.format(num_imgs)) 18 | 19 | C.size_test = (480, 640) 20 | input_shape_img = (C.size_test[0], C.size_test[1], 3) 21 | 22 | img_input = Input(shape=input_shape_img) 23 | 24 | # define the network prediction 25 | preds = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 26 | model = Model(img_input, preds) 27 | 28 | if C.offset: 29 | w_path = 'output/valmodels/caltech/%s/off' % (C.scale) 30 | out_path = 'output/valresults/caltech/%s/off' % (C.scale) 31 | else: 32 | w_path = 'output/valmodels/caltech/%s/nooff' % (C.scale) 33 | out_path = 'output/valresults/caltech/%s/nooff' % (C.scale) 34 | 35 | if not os.path.exists(out_path): 36 | os.makedirs(out_path) 37 | files = sorted(os.listdir(w_path)) 38 | for w_ind in range(51, 121): 39 | for f in files: 40 | if f.split('_')[0] == 'net' and int(f.split('_')[1][1:]) == w_ind: 41 | cur_file = f 42 | break 43 | weight1 = os.path.join(w_path, cur_file) 44 | print('load weights from {}'.format(weight1)) 45 | model.load_weights(weight1, by_name=True) 46 | res_path = os.path.join(out_path, '%03d' % int(str(w_ind))) 47 | 48 | print(res_path) 49 | if not os.path.exists(res_path): 50 | os.mkdir(res_path) 51 | for st in range(6, 11): 52 | set_path = os.path.join(res_path, 'set' + '%02d' % st) 53 | if not os.path.exists(set_path): 54 | os.mkdir(set_path) 55 | 56 | 
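# How the evaluation loop below works (a summary inferred from the code, not
# part of the original sources): the Caltech toolkit consumes one text file per
# video, set##/V###.txt, whose rows are [frame_number, x, y, w, h, score].
# res_all is reset whenever a video's first sampled frame is reached (filenames
# I00029, I00059, ... map to frame 30, 60, ... under 30-frame sampling) and is
# flushed to disk once the next image belongs to a different video or the image
# list ends; boxes are converted from [x1, y1, x2, y2] to [x, y, w, h] in place
# right before writing.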
start_time = time.time() 57 | for f in range(num_imgs): 58 | filepath = val_data[f]['filepath'] 59 | filepath_next = val_data[f + 1]['filepath'] if f < num_imgs - 1 else val_data[f]['filepath'] 60 | set = filepath.split('/')[-1].split('_')[0] 61 | video = filepath.split('/')[-1].split('_')[1] 62 | frame_number = int(filepath.split('/')[-1].split('_')[2][1:6]) + 1 63 | frame_number_next = int(filepath_next.split('/')[-1].split('_')[2][1:6]) + 1 64 | set_path = os.path.join(res_path, set) 65 | video_path = os.path.join(set_path, video + '.txt') 66 | if os.path.exists(video_path): 67 | continue 68 | if frame_number == 30: 69 | res_all = [] 70 | img = cv2.imread(filepath) 71 | x_rcnn = format_img(img, C) 72 | Y = model.predict(x_rcnn) 73 | 74 | if C.offset: 75 | boxes = bbox_process.parse_det_offset(Y, C, score=0.01, down=4) 76 | else: 77 | boxes = bbox_process.parse_det(Y, C, score=0.01, down=4, scale=C.scale) 78 | 79 | if len(boxes) > 0: 80 | f_res = np.repeat(frame_number, len(boxes), axis=0).reshape((-1, 1)) 81 | boxes[:, [2, 3]] -= boxes[:, [0, 1]] 82 | res_all += np.concatenate((f_res, boxes), axis=-1).tolist() 83 | if frame_number_next == 30 or f == num_imgs - 1: 84 | np.savetxt(video_path, np.array(res_all), fmt='%6f') 85 | print(time.time() - start_time) 86 | -------------------------------------------------------------------------------- /test_city.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import pickle 4 | from keras.layers import Input 5 | from keras.models import Model 6 | from keras_csp import config, bbox_process 7 | from keras_csp.utilsfunc import * 8 | 9 | os.environ["CUDA_VISIBLE_DEVICES"] = '1' 10 | C = config.Config() 11 | C.offset = True 12 | cache_path = 'data/cache/cityperson/val_500' 13 | with open(cache_path, 'rb') as fid: 14 | val_data = pickle.load(fid, encoding='latin1') 15 | num_imgs = len(val_data) 16 | print('num of val samples: {}'.format(num_imgs)) 17 | 18 | C.size_test = (1024, 2048) 19 | input_shape_img = (C.size_test[0], C.size_test[1], 3) 20 | img_input = Input(shape=input_shape_img) 21 | 22 | # define the base network (resnet here, can be MobileNet, etc) 23 | if C.network == 'resnet50': 24 | from keras_csp import resnet50 as nn 25 | elif C.network == 'mobilenet': 26 | from keras_csp import mobilenet as nn 27 | else: 28 | raise NotImplementedError('Not support network: {}'.format(C.network)) 29 | 30 | # define the network prediction 31 | preds = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 32 | model = Model(img_input, preds) 33 | 34 | if C.offset: 35 | w_path = 'output/valmodels/city/%s/off' % (C.scale) 36 | out_path = 'output/valresults/city/%s/off' % (C.scale) 37 | else: 38 | w_path = 'output/valmodels/city/%s/nooff' % (C.scale) 39 | out_path = 'output/valresults/city/%s/nooff' % (C.scale) 40 | if not os.path.exists(out_path): 41 | os.makedirs(out_path) 42 | files = sorted(os.listdir(w_path)) 43 | # evaluate the checkpoint from epoch 150 only; widen the range to sweep more epochs 44 | for w_ind in range(150, 151): 45 | for f in files: 46 | if f.split('_')[0] == 'net' and int(f.split('_')[1][1:]) == w_ind: 47 | cur_file = f 48 | break 49 | weight1 = os.path.join(w_path, cur_file) 50 | print('load weights from {}'.format(weight1)) 51 | model.load_weights(weight1, by_name=True) 52 | res_path = os.path.join(out_path, '%03d' % int(str(w_ind))) 53 | if not os.path.exists(res_path): 54 | os.makedirs(res_path) 55 | print(res_path) 56 | res_file = os.path.join(res_path, 'val_det.txt') 57 | 
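# Output format note (inferred from the code): each row appended below and
# written to val_det.txt is [image_index (1-based), x, y, w, h, score];
# presumably eval_city/dt_txt2json.m then converts this file into the JSON
# detections consumed by the CityPersons miss-rate scripts under
# eval_city/eval_script.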
res_all = [] 58 | start_time = time.time() 59 | for f in range(num_imgs): 60 | filepath = val_data[f]['filepath'] 61 | img = cv2.imread(filepath) 62 | if img is None: 63 | raise RuntimeError("image at %s not found" % filepath) 64 | x_rcnn = format_img(img, C) 65 | Y = model.predict(x_rcnn) 66 | 67 | if C.offset: 68 | boxes = bbox_process.parse_det_offset(Y, C, score=0.1, down=4) 69 | else: 70 | boxes = bbox_process.parse_det(Y, C, score=0.1, down=4, scale=C.scale) 71 | if len(boxes) > 0: 72 | f_res = np.repeat(f + 1, len(boxes), axis=0).reshape((-1, 1)) 73 | boxes[:, [2, 3]] -= boxes[:, [0, 1]] 74 | res_all += np.concatenate((f_res, boxes), axis=-1).tolist() 75 | np.savetxt(res_file, np.array(res_all), fmt='%6f') 76 | print(time.time() - start_time) 77 | -------------------------------------------------------------------------------- /test_wider_ms.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import pickle 4 | from keras.layers import Input 5 | from keras.models import Model 6 | from keras_csp import config, bbox_process 7 | from keras_csp.utilsfunc import * 8 | 9 | os.environ["CUDA_VISIBLE_DEVICES"] = '0' 10 | C = config.Config() 11 | C.offset = True 12 | C.scale = 'hw' 13 | C.num_scale = 2 14 | cache_path = 'data/cache/widerface/val' 15 | with open(cache_path, 'rb') as fid: 16 | val_data = pickle.load(fid, encoding='latin1') 17 | num_imgs = len(val_data) 18 | print('num of val samples: {}'.format(num_imgs)) 19 | 20 | C.size_test = [0, 0] 21 | input_shape_img = (None, None, 3) 22 | img_input = Input(shape=input_shape_img) 23 | 24 | # define the base network (resnet here, can be MobileNet, etc) 25 | from keras_csp import resnet50 as nn 26 | 27 | # define the network prediction 28 | preds = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 29 | model = Model(img_input, preds) 30 | 31 | if C.offset: 32 | w_path = 'output/valmodels/wider/%s/off' % (C.scale) 33 | out_path = 'output/valresults/wider/%s/off' % (C.scale) 34 | else: 35 | w_path = 'output/valmodels/wider/%s/nooff' % (C.scale) 36 | out_path = 'output/valresults/wider/%s/nooff' % (C.scale) 37 | if not os.path.exists(out_path): 38 | os.makedirs(out_path) 39 | files = sorted(os.listdir(w_path)) 40 | # evaluate the checkpoint from epoch 382 only; widen the range to sweep more epochs 41 | for w_ind in range(382, 383): 42 | for f in files: 43 | if f.split('_')[0] == 'net' and int(f.split('_')[1][1:]) == w_ind: 44 | cur_file = f 45 | break 46 | weight1 = os.path.join(w_path, cur_file) 47 | print('load weights from {}'.format(weight1)) 48 | model.load_weights(weight1, by_name=True) 49 | res_path = os.path.join(out_path, '%03d' % int(str(w_ind))) 50 | if not os.path.exists(res_path): 51 | os.makedirs(res_path) 52 | print(res_path) 53 | 54 | start_time = time.time() 55 | for f in range(num_imgs): 56 | filepath = val_data[f]['filepath'] 57 | event = filepath.split('/')[-2] 58 | event_path = os.path.join(res_path, event) 59 | if not os.path.exists(event_path): 60 | os.mkdir(event_path) 61 | filename = filepath.split('/')[-1].split('.')[0] 62 | txtpath = os.path.join(event_path, filename + '.txt') 63 | if os.path.exists(txtpath): 64 | continue 65 | 66 | img = cv2.imread(filepath) 67 | 68 | 69 | def detect_face(img, scale=1, flip=False): 70 | img_h, img_w = img.shape[:2] 71 | img_h_new, img_w_new = int(np.ceil(scale * img_h / 16) * 16), int(np.ceil(scale * img_w / 16) * 16) 72 | scale_h, scale_w = img_h_new / img_h, img_w_new / img_w 73 | 74 | img_s = cv2.resize(img, None, None, 
fx=scale_w, fy=scale_h, interpolation=cv2.INTER_LINEAR) 75 | # img_h, img_w = img_s.shape[:2] 76 | # print frame_number 77 | C.size_test[0] = img_h_new 78 | C.size_test[1] = img_w_new 79 | 80 | if flip: 81 | img_sf = cv2.flip(img_s, 1) 82 | # x_rcnn = format_img_pad(img_sf, C) 83 | x_rcnn = format_img(img_sf, C) 84 | else: 85 | # x_rcnn = format_img_pad(img_s, C) 86 | x_rcnn = format_img(img_s, C) 87 | Y = model.predict(x_rcnn) 88 | boxes = bbox_process.parse_wider_offset(Y, C, score=0.05, nmsthre=0.6) 89 | if len(boxes) > 0: 90 | keep_index = np.where(np.minimum(boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]) >= 12)[0] 91 | boxes = boxes[keep_index, :] 92 | if len(boxes) > 0: 93 | if flip: 94 | boxes[:, [0, 2]] = img_s.shape[1] - boxes[:, [2, 0]] 95 | boxes[:, 0:4:2] = boxes[:, 0:4:2] / scale_w 96 | boxes[:, 1:4:2] = boxes[:, 1:4:2] / scale_h 97 | else: 98 | boxes = np.empty(shape=[0, 5], dtype=np.float32) 99 | return boxes 100 | 101 | 102 | def im_det_ms_pyramid(image, max_im_shrink): 103 | # shrink detecting and shrink only detect big face 104 | det_s = np.row_stack((detect_face(image, 0.5), detect_face(image, 0.5, flip=True))) 105 | index = np.where(np.maximum(det_s[:, 2] - det_s[:, 0] + 1, det_s[:, 3] - det_s[:, 1] + 1) > 64)[0] 106 | det_s = det_s[index, :] 107 | 108 | det_temp = np.row_stack((detect_face(image, 0.75), detect_face(image, 0.75, flip=True))) 109 | index = np.where(np.maximum(det_temp[:, 2] - det_temp[:, 0] + 1, det_temp[:, 3] - det_temp[:, 1] + 1) > 32)[ 110 | 0] 111 | det_temp = det_temp[index, :] 112 | det_s = np.row_stack((det_s, det_temp)) 113 | 114 | det_temp = np.row_stack((detect_face(image, 0.25), detect_face(image, 0.25, flip=True))) 115 | index = np.where(np.maximum(det_temp[:, 2] - det_temp[:, 0] + 1, det_temp[:, 3] - det_temp[:, 1] + 1) > 96)[ 116 | 0] 117 | det_temp = det_temp[index, :] 118 | det_s = np.row_stack((det_s, det_temp)) 119 | 120 | st = [1.25, 1.5, 1.75, 2.0, 2.25] 121 | for i in range(len(st)): 122 | if (st[i] <= max_im_shrink): 123 | det_temp = np.row_stack((detect_face(image, st[i]), detect_face(image, st[i], flip=True))) 124 | # Enlarged images are only used to detect small faces. 
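# Size gates applied per enlargement factor (a plain-text summary of the
# elif chain that follows): 1.25 -> keep min side < 128 px, 1.5 -> < 96,
# 1.75 -> < 64, 2.0 -> < 48, 2.25 -> < 32. The larger the upsampling
# factor, the smaller the faces it is trusted with, mirroring the shrunken
# scales above, which keep only large faces.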
125 | if st[i] == 1.25: 126 | index = np.where( 127 | np.minimum(det_temp[:, 2] - det_temp[:, 0] + 1, det_temp[:, 3] - det_temp[:, 1] + 1) < 128)[ 128 | 0] 129 | det_temp = det_temp[index, :] 130 | elif st[i] == 1.5: 131 | index = np.where( 132 | np.minimum(det_temp[:, 2] - det_temp[:, 0] + 1, det_temp[:, 3] - det_temp[:, 1] + 1) < 96)[ 133 | 0] 134 | det_temp = det_temp[index, :] 135 | elif st[i] == 1.75: 136 | index = np.where( 137 | np.minimum(det_temp[:, 2] - det_temp[:, 0] + 1, det_temp[:, 3] - det_temp[:, 1] + 1) < 64)[ 138 | 0] 139 | det_temp = det_temp[index, :] 140 | elif st[i] == 2.0: 141 | index = np.where( 142 | np.minimum(det_temp[:, 2] - det_temp[:, 0] + 1, det_temp[:, 3] - det_temp[:, 1] + 1) < 48)[ 143 | 0] 144 | det_temp = det_temp[index, :] 145 | elif st[i] == 2.25: 146 | index = np.where( 147 | np.minimum(det_temp[:, 2] - det_temp[:, 0] + 1, det_temp[:, 3] - det_temp[:, 1] + 1) < 32)[ 148 | 0] 149 | det_temp = det_temp[index, :] 150 | det_s = np.row_stack((det_s, det_temp)) 151 | return det_s 152 | 153 | 154 | max_im_shrink = (0x7fffffff / 577.0 / (img.shape[0] * img.shape[1])) ** 0.5 # the max size of input image 155 | shrink = max_im_shrink if max_im_shrink < 1 else 1 156 | det0 = detect_face(img) 157 | det1 = detect_face(img, flip=True) 158 | det2 = im_det_ms_pyramid(img, max_im_shrink) 159 | # merge all test results via bounding box voting 160 | det = np.row_stack((det0, det1, det2)) 161 | keep_index = np.where(np.minimum(det[:, 2] - det[:, 0], det[:, 3] - det[:, 1]) >= 3)[0] 162 | det = det[keep_index, :] 163 | dets = bbox_process.soft_bbox_vote(det, thre=0.4) 164 | keep_index = np.where((dets[:, 2] - dets[:, 0] + 1) * (dets[:, 3] - dets[:, 1] + 1) >= 6 ** 2)[0] 165 | dets = dets[keep_index, :] 166 | 167 | with open(txtpath, 'w') as f: 168 | f.write('{:s}\n'.format(filename)) 169 | f.write('{:d}\n'.format(len(dets))) 170 | for line in dets: 171 | f.write('{:.0f} {:.0f} {:.0f} {:.0f} {:.3f}\n'. 
172 | format(line[0], line[1], line[2] - line[0] + 1, line[3] - line[1] + 1, line[4])) 173 | print(time.time() - start_time) 174 | -------------------------------------------------------------------------------- /train_caltech.py: -------------------------------------------------------------------------------- 1 | import random 2 | import sys, os 3 | import time 4 | import numpy as np 5 | import pickle 6 | from keras.utils import generic_utils 7 | from keras.optimizers import Adam 8 | from keras.layers import Input 9 | from keras.models import Model 10 | from keras_csp import config, data_generators 11 | from keras_csp import losses as losses 12 | 13 | # get the config parameters 14 | C = config.Config() 15 | C.gpu_ids = '0' 16 | C.onegpu = 16 17 | C.size_train = (336, 448) 18 | C.init_lr = 1e-4 19 | C.num_epochs = 120 20 | C.offset = True 21 | 22 | num_gpu = len(C.gpu_ids.split(',')) 23 | batchsize = C.onegpu * num_gpu 24 | os.environ["CUDA_VISIBLE_DEVICES"] = C.gpu_ids 25 | 26 | # get the training data 27 | cache_ped = 'data/cache/caltech/train_gt' 28 | cache_emp = 'data/cache/caltech/train_nogt' 29 | with open(cache_ped, 'rb') as fid: 30 | ped_data = pickle.load(fid, encoding='latin1') 31 | with open(cache_emp, 'rb') as fid: 32 | emp_data = pickle.load(fid, encoding='latin1') 33 | num_imgs_ped = len(ped_data) 34 | num_imgs_emp = len(emp_data) 35 | print(('num of ped and emp samples: {} {}'.format(num_imgs_ped, num_imgs_emp))) 36 | data_gen_train = data_generators.get_data_hybrid(ped_data, emp_data, C, batchsize=batchsize, hyratio=0.5) 37 | 38 | # define the base network (resnet here, can be MobileNet, etc) 39 | if C.network == 'resnet50': 40 | from keras_csp import resnet50 as nn 41 | 42 | weight_path = 'data/models/resnet50_weights_tf_dim_ordering_tf_kernels.h5' 43 | elif C.network == 'mobilenet': 44 | from keras_csp import mobilenet as nn 45 | 46 | weight_path = 'data/models/mobilenet_1_0_224_tf_no_top.h5' 47 | else: 48 | raise NotImplementedError('Not support network: {}'.format(C.network)) 49 | 50 | input_shape_img = (C.size_train[0], C.size_train[1], 3) 51 | img_input = Input(shape=input_shape_img) 52 | # define the network prediction 53 | preds = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 54 | preds_tea = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 55 | 56 | model = Model(img_input, preds) 57 | if num_gpu > 1: 58 | from keras_csp.parallel_model import ParallelModel 59 | 60 | model = ParallelModel(model, int(num_gpu)) 61 | model_stu = Model(img_input, preds) 62 | model_tea = Model(img_input, preds_tea) 63 | 64 | model.load_weights(weight_path, by_name=True) 65 | model_tea.load_weights(weight_path, by_name=True) 66 | print('load weights from {}'.format(weight_path)) 67 | 68 | if C.offset: 69 | out_path = 'output/valmodels/caltech/%s/off2' % (C.scale) 70 | else: 71 | out_path = 'output/valmodels/caltech/%s/nooff' % (C.scale) 72 | 73 | if not os.path.exists(out_path): 74 | os.makedirs(out_path) 75 | res_file = os.path.join(out_path, 'records.txt') 76 | 77 | optimizer = Adam(lr=C.init_lr) 78 | if C.offset: 79 | model.compile(optimizer=optimizer, loss=[losses.cls_center, losses.regr_h, losses.regr_offset]) 80 | else: 81 | if C.scale == 'hw': 82 | model.compile(optimizer=optimizer, loss=[losses.cls_center, losses.regr_hw]) 83 | else: 84 | model.compile(optimizer=optimizer, loss=[losses.cls_center, losses.regr_h]) 85 | 86 | epoch_length = int(C.iter_per_epoch / batchsize) 87 | iter_num = 0 88 | add_epoch = 0 89 | 
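# Note on the array defined next: it reuses the name `losses`, shadowing the
# keras_csp.losses module imported above. This is safe only because
# model.compile(...) has already captured the loss functions; the module is
# unreachable by that name from this point on. Columns hold the per-iteration
# [center cls, scale regr, offset regr] losses.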
losses = np.zeros((epoch_length, 3)) 90 | 91 | best_loss = np.Inf 92 | print(('Starting training with lr {} and alpha {}'.format(C.init_lr, C.alpha))) 93 | start_time = time.time() 94 | total_loss_r, cls_loss_r1, regr_loss_r1, offset_loss_r1 = [], [], [], [] 95 | for epoch_num in range(C.num_epochs): 96 | progbar = generic_utils.Progbar(epoch_length) 97 | print(('Epoch {}/{}'.format(epoch_num + 1 + add_epoch, C.num_epochs + C.add_epoch))) 98 | while True: 99 | try: 100 | X, Y = next(data_gen_train) 101 | loss_s1 = model.train_on_batch(X, Y) 102 | 103 | for l in model_tea.layers: 104 | weights_tea = l.get_weights() 105 | if len(weights_tea) > 0: 106 | if num_gpu > 1: 107 | weights_stu = model_stu.get_layer(name=l.name).get_weights() 108 | else: 109 | weights_stu = model.get_layer(name=l.name).get_weights() 110 | weights_tea = [C.alpha * w_tea + (1 - C.alpha) * w_stu for (w_tea, w_stu) in 111 | zip(weights_tea, weights_stu)] 112 | l.set_weights(weights_tea) 113 | # print loss_s1 114 | losses[iter_num, 0] = loss_s1[1] 115 | losses[iter_num, 1] = loss_s1[2] 116 | if C.offset: 117 | losses[iter_num, 2] = loss_s1[3] 118 | else: 119 | losses[iter_num, 2] = 0 120 | 121 | iter_num += 1 122 | if iter_num % 20 == 0: 123 | progbar.update(iter_num, 124 | [('cls', np.mean(losses[:iter_num, 0])), ('regr_h', np.mean(losses[:iter_num, 1])), 125 | ('offset', np.mean(losses[:iter_num, 2]))]) 126 | if iter_num == epoch_length: 127 | cls_loss1 = np.mean(losses[:, 0]) 128 | regr_loss1 = np.mean(losses[:, 1]) 129 | offset_loss1 = np.mean(losses[:, 2]) 130 | total_loss = cls_loss1 + regr_loss1 + offset_loss1 131 | 132 | total_loss_r.append(total_loss) 133 | cls_loss_r1.append(cls_loss1) 134 | regr_loss_r1.append(regr_loss1) 135 | offset_loss_r1.append(offset_loss1) 136 | print(('Total loss: {}'.format(total_loss))) 137 | print(('Elapsed time: {}'.format(time.time() - start_time))) 138 | 139 | iter_num = 0 140 | start_time = time.time() 141 | 142 | if total_loss < best_loss: 143 | print(('Total loss decreased from {} to {}, saving weights'.format(best_loss, total_loss))) 144 | best_loss = total_loss 145 | model_tea.save_weights( 146 | os.path.join(out_path, 'net_e{}_l{}.hdf5'.format(epoch_num + 1 + add_epoch, total_loss))) 147 | break 148 | except Exception as e: 149 | print(('Exception: {}'.format(e))) 150 | continue 151 | records = np.concatenate((np.asarray(total_loss_r).reshape((-1, 1)), 152 | np.asarray(cls_loss_r1).reshape((-1, 1)), 153 | np.asarray(regr_loss_r1).reshape((-1, 1)), 154 | np.asarray(offset_loss_r1).reshape((-1, 1)),), 155 | axis=-1) 156 | np.savetxt(res_file, np.array(records), fmt='%.6f') 157 | print('Training complete, exiting.') 158 | -------------------------------------------------------------------------------- /train_city.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import random 3 | import sys, os 4 | import time 5 | import numpy as np 6 | import pickle 7 | from keras.utils import generic_utils 8 | from keras.optimizers import Adam 9 | from keras.layers import Input 10 | from keras.models import Model 11 | from keras_csp import config, data_generators 12 | from keras_csp import losses as losses 13 | 14 | # get the config parameters 15 | C = config.Config() 16 | C.gpu_ids = '0,1,2,3' 17 | C.onegpu = 2 18 | C.size_train = (640, 1280) 19 | C.init_lr = 2e-4 20 | C.num_epochs = 150 21 | C.offset = True 22 | 23 | num_gpu = len(C.gpu_ids.split(',')) 24 | batchsize = C.onegpu * num_gpu 25 | os.environ["CUDA_VISIBLE_DEVICES"] = 
C.gpu_ids 26 | 27 | # get the training data 28 | cache_path = 'data/cache/cityperson/train_h50' 29 | with open(cache_path, 'rb') as fid: 30 | train_data = pickle.load(fid, encoding='latin1') 31 | num_imgs_train = len(train_data) 32 | random.shuffle(train_data) 33 | print('num of training samples: {}'.format(num_imgs_train)) 34 | data_gen_train = data_generators.get_data(train_data, C, batchsize=batchsize) 35 | 36 | # define the base network (resnet here, can be MobileNet, etc) 37 | if C.network == 'resnet50': 38 | from keras_csp import resnet50 as nn 39 | 40 | weight_path = 'data/models/resnet50_weights_tf_dim_ordering_tf_kernels.h5' 41 | 42 | if C.offset: 43 | out_path = 'output/valmodels/city/%s/off' % (C.scale) 44 | else: 45 | out_path = 'output/valmodels/city/%s/nooff' % (C.scale) 46 | if not os.path.exists(out_path): 47 | os.makedirs(out_path) 48 | epoch = 0 49 | else: 50 | checkpoint_paths = glob.glob(out_path + "/net*.hdf5") 51 | checkpoint_names = [f.split("/")[-1] for f in checkpoint_paths] 52 | epochs = [*map(int, [f.split("net_e")[1].split("_")[0] for f in checkpoint_names if "net_e" in f])] 53 | max_epoch_idx = np.argmax(epochs) 54 | epoch = epochs[max_epoch_idx] 55 | weight_path = checkpoint_paths[max_epoch_idx] 56 | 57 | input_shape_img = (C.size_train[0], C.size_train[1], 3) 58 | img_input = Input(shape=input_shape_img) 59 | # define the network prediction 60 | preds = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 61 | preds_tea = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 62 | model = Model(img_input, preds) 63 | 64 | if num_gpu > 1: 65 | from keras_csp.parallel_model import ParallelModel 66 | 67 | model = ParallelModel(model, int(num_gpu)) 68 | model_stu = Model(img_input, preds) 69 | model_tea = Model(img_input, preds_tea) 70 | 71 | model.load_weights(weight_path, by_name=True) 72 | model_tea.load_weights(weight_path, by_name=True) 73 | print('load weights from {}'.format(weight_path)) 74 | 75 | res_file = os.path.join(out_path, 'records.txt') 76 | 77 | optimizer = Adam(lr=C.init_lr) 78 | if C.offset: 79 | model.compile(optimizer=optimizer, loss=[losses.cls_center, losses.regr_h, losses.regr_offset]) 80 | else: 81 | if C.scale == 'hw': 82 | model.compile(optimizer=optimizer, loss=[losses.cls_center, losses.regr_hw]) 83 | else: 84 | model.compile(optimizer=optimizer, loss=[losses.cls_center, losses.regr_h]) 85 | 86 | epoch_length = int(C.iter_per_epoch / batchsize) 87 | iter_num = 0 88 | add_epoch = 0 89 | losses = np.zeros((epoch_length, 3)) 90 | 91 | best_loss = np.Inf 92 | print(('Starting training with lr {} and alpha {}'.format(C.init_lr, C.alpha))) 93 | start_time = time.time() 94 | total_loss_r, cls_loss_r1, regr_loss_r1, offset_loss_r1 = [], [], [], [] 95 | for epoch_num in range(epoch, C.num_epochs): 96 | progbar = generic_utils.Progbar(epoch_length) 97 | print(('Epoch {}/{}'.format(epoch_num + 1 + add_epoch, C.num_epochs + C.add_epoch))) 98 | while True: 99 | X, Y = next(data_gen_train) 100 | loss_s1 = model.train_on_batch(X, Y) 101 | 102 | for l in model_tea.layers: 103 | weights_tea = l.get_weights() 104 | if len(weights_tea) > 0: 105 | if num_gpu > 1: 106 | weights_stu = model_stu.get_layer(name=l.name).get_weights() 107 | else: 108 | weights_stu = model.get_layer(name=l.name).get_weights() 109 | weights_tea = [C.alpha * w_tea + (1 - C.alpha) * w_stu for (w_tea, w_stu) in 110 | zip(weights_tea, weights_stu)] 111 | l.set_weights(weights_tea) 112 | # print loss_s1 113 | losses[iter_num, 0] = 
loss_s1[1] 114 | losses[iter_num, 1] = loss_s1[2] 115 | if C.offset: 116 | losses[iter_num, 2] = loss_s1[3] 117 | else: 118 | losses[iter_num, 2] = 0 119 | 120 | iter_num += 1 121 | if iter_num % 20 == 0: 122 | progbar.update(iter_num, 123 | [('cls', np.mean(losses[:iter_num, 0])), ('regr_h', np.mean(losses[:iter_num, 1])), 124 | ('offset', np.mean(losses[:iter_num, 2]))]) 125 | if iter_num == epoch_length: 126 | cls_loss1 = np.mean(losses[:, 0]) 127 | regr_loss1 = np.mean(losses[:, 1]) 128 | offset_loss1 = np.mean(losses[:, 2]) 129 | total_loss = cls_loss1 + regr_loss1 + offset_loss1 130 | 131 | total_loss_r.append(total_loss) 132 | cls_loss_r1.append(cls_loss1) 133 | regr_loss_r1.append(regr_loss1) 134 | offset_loss_r1.append(offset_loss1) 135 | print(('Total loss: {}'.format(total_loss))) 136 | print(('Elapsed time: {}'.format(time.time() - start_time))) 137 | 138 | iter_num = 0 139 | start_time = time.time() 140 | 141 | if total_loss < best_loss: 142 | print(('Total loss decreased from {} to {}, saving weights'.format(best_loss, total_loss))) 143 | best_loss = total_loss 144 | model_tea.save_weights( 145 | os.path.join(out_path, 'net_e{}_l{}.hdf5'.format(epoch_num + 1 + add_epoch, total_loss))) 146 | break 147 | 148 | records = np.concatenate((np.asarray(total_loss_r).reshape((-1, 1)), 149 | np.asarray(cls_loss_r1).reshape((-1, 1)), 150 | np.asarray(regr_loss_r1).reshape((-1, 1)), 151 | np.asarray(offset_loss_r1).reshape((-1, 1)),), 152 | axis=-1) 153 | np.savetxt(res_file, np.array(records), fmt='%.6f') 154 | print('Training complete, exiting.') 155 | -------------------------------------------------------------------------------- /train_wider.py: -------------------------------------------------------------------------------- 1 | import random 2 | import sys, os 3 | import time 4 | import numpy as np 5 | import pickle 6 | from keras.utils import generic_utils 7 | from keras.optimizers import Adam 8 | from keras.layers import Input 9 | from keras.models import Model 10 | from keras_csp import config, data_generators 11 | from keras_csp import losses as losses 12 | 13 | # get the config parameters 14 | C = config.Config() 15 | C.gpu_ids = '0,1,2,3,4,5,6,7' 16 | C.onegpu = 4 17 | C.size_train = (704, 704) 18 | C.init_lr = 2e-4 19 | C.offset = True 20 | C.scale = 'hw' 21 | C.num_scale = 2 22 | C.num_epochs = 400 23 | C.iter_per_epoch = 4000 24 | num_gpu = len(C.gpu_ids.split(',')) 25 | batchsize = C.onegpu * num_gpu 26 | os.environ["CUDA_VISIBLE_DEVICES"] = C.gpu_ids 27 | 28 | # get the training data 29 | cache_path = 'data/cache/widerface/train' 30 | with open(cache_path, 'rb') as fid: 31 | train_data = pickle.load(fid, encoding='latin1') 32 | num_imgs_train = len(train_data) 33 | print('num of training samples: {}'.format(num_imgs_train)) 34 | data_gen_train = data_generators.get_data_wider(train_data, C, batchsize=batchsize) 35 | 36 | # define the base network (resnet here, can be MobileNet, etc) 37 | if C.network == 'resnet50': 38 | from keras_csp import resnet50 as nn 39 | 40 | weight_path = 'data/models/resnet50_weights_tf_dim_ordering_tf_kernels.h5' 41 | 42 | input_shape_img = (C.size_train[0], C.size_train[1], 3) 43 | img_input = Input(shape=input_shape_img) 44 | # define the network prediction 45 | preds = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 46 | preds_tea = nn.nn_p3p4p5(img_input, offset=C.offset, num_scale=C.num_scale, trainable=True) 47 | 48 | model = Model(img_input, preds) 49 | if num_gpu > 1: 50 | from keras_csp.parallel_model 
epoch_length = int(C.iter_per_epoch / batchsize)
iter_num = 0
add_epoch = 0
# note: this array shadows the imported `losses` module; that is safe only
# because the model has already been compiled above
losses = np.zeros((epoch_length, 3))

best_loss = np.inf
print('Starting training with lr {} and alpha {}'.format(C.init_lr, C.alpha))
start_time = time.time()
total_loss_r, cls_loss_r1, regr_loss_r1, offset_loss_r1 = [], [], [], []
for epoch_num in range(C.num_epochs):
    progbar = generic_utils.Progbar(epoch_length)
    print('Epoch {}/{}'.format(epoch_num + 1 + add_epoch, C.num_epochs + C.add_epoch))
    while True:
        try:
            X, Y = next(data_gen_train)
            loss_s1 = model.train_on_batch(X, Y)

            # update the teacher as an exponential moving average of the student
            for l in model_tea.layers:
                weights_tea = l.get_weights()
                if len(weights_tea) > 0:
                    if num_gpu > 1:
                        weights_stu = model_stu.get_layer(name=l.name).get_weights()
                    else:
                        weights_stu = model.get_layer(name=l.name).get_weights()
                    weights_tea = [C.alpha * w_tea + (1 - C.alpha) * w_stu
                                   for (w_tea, w_stu) in zip(weights_tea, weights_stu)]
                    l.set_weights(weights_tea)

            losses[iter_num, 0] = loss_s1[1]
            losses[iter_num, 1] = loss_s1[2]
            if C.offset:
                losses[iter_num, 2] = loss_s1[3]
            else:
                losses[iter_num, 2] = 0

            iter_num += 1
            if iter_num % 20 == 0:
                progbar.update(iter_num,
                               [('cls', np.mean(losses[:iter_num, 0])),
                                ('regr_hw', np.mean(losses[:iter_num, 1])),
                                ('offset', np.mean(losses[:iter_num, 2]))])

            if iter_num == epoch_length:
                cls_loss1 = np.mean(losses[:, 0])
                regr_loss1 = np.mean(losses[:, 1])
                offset_loss1 = np.mean(losses[:, 2])
                total_loss = cls_loss1 + regr_loss1 + offset_loss1

                total_loss_r.append(total_loss)
                cls_loss_r1.append(cls_loss1)
                regr_loss_r1.append(regr_loss1)
                offset_loss_r1.append(offset_loss1)
                print('Total loss: {}'.format(total_loss))
                print('Elapsed time: {}'.format(time.time() - start_time))

                iter_num = 0
                start_time = time.time()

                if total_loss < best_loss:
                    print('Total loss decreased from {} to {}, saving weights'.format(best_loss, total_loss))
                    best_loss = total_loss
                    model_tea.save_weights(
                        os.path.join(out_path, 'net_e{}_l{}.hdf5'.format(epoch_num + 1 + add_epoch, total_loss)))
                break
        except Exception as e:
            # skip over occasional bad batches rather than aborting a long run
            print('Exception: {}'.format(e))
            continue

records = np.concatenate((np.asarray(total_loss_r).reshape((-1, 1)),
                          np.asarray(cls_loss_r1).reshape((-1, 1)),
                          np.asarray(regr_loss_r1).reshape((-1, 1)),
                          np.asarray(offset_loss_r1).reshape((-1, 1))),
                         axis=-1)
np.savetxt(res_file, np.array(records), fmt='%.6f')
print('Training complete, exiting.')
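The `records.txt` written above holds one row per completed epoch with four columns: total, classification, scale-regression, and offset loss. A minimal sketch for inspecting it after a run; the path assumes the WIDER settings above (`scale='hw'`, `offset=True`):
```
import numpy as np

rec = np.loadtxt('output/valmodels/wider/hw/off/records.txt')
rec = np.atleast_2d(rec)  # keep the shape stable when only one epoch has finished
total, cls, regr, offset = rec.T
print('epochs recorded:', rec.shape[0])
print('lowest total loss at row:', int(np.argmin(total)) + 1)
```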
--------------------------------------------------------------------------------
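Once training finishes, the saved teacher weights can be restored into a fresh graph for inference; the test scripts (`test_wider_ms.py` and friends) build on this same pattern. A minimal sketch, assuming the `nn_p3p4p5` signature used by the training scripts; the checkpoint filename is a hypothetical instance of the `net_e{epoch}_l{loss}.hdf5` naming scheme, so substitute your own:
```
from keras.layers import Input
from keras.models import Model
from keras_csp import resnet50 as nn

# hypothetical checkpoint produced by train_wider.py; replace with a real file
weight_file = 'output/valmodels/wider/hw/off/net_e100_l0.05.hdf5'

img_input = Input(shape=(704, 704, 3))
preds = nn.nn_p3p4p5(img_input, offset=True, num_scale=2, trainable=True)
model = Model(img_input, preds)
model.load_weights(weight_file, by_name=True)
```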