├── .gitignore ├── README.md ├── callback.py ├── clean.sh ├── cntk-py27.yml ├── cntk-py35.yml ├── datasets ├── cls_dogs_vs_cats.py ├── cls_dogs_vs_cats.sh ├── cls_rvl_cdip.py ├── cls_rvl_cdip_check.sh ├── cls_rvl_cdip_convert.sh ├── cls_tiny_imagenet.py ├── cls_tiny_imagenet_class_list.sh ├── cls_tiny_imagenet_convert.sh ├── document.conf ├── ocr_documents.py ├── ocr_documents_generator.py ├── ocr_documents_preprocess.py ├── ocr_documents_statistics.py └── ocr_mnist.py ├── images ├── Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf ├── res1.png ├── res2.png ├── res3.png ├── res4.png └── res5.png ├── keras-tf-py27.yml ├── keras-tf-py35.yml ├── models ├── CNN_C128_C256_M2_C256_C256_M2_C512_D_2.py ├── CNN_C128_C256_M2_C512_D.py ├── CNN_C32_C64_C128_C.py ├── CNN_C32_C64_C128_C2.py ├── CNN_C32_C64_C128_D.py ├── CNN_C32_C64_C64_Cd64_C128_D.py ├── CNN_C32_C64_M2_C128_D.py ├── CNN_C32_C64_M2_C64_C64_M2_C128_D.py ├── CNN_C32_C64_M2_C64_C64_M2_C128_D_2.py ├── CNN_C32_Cd64_C64_Cd64_C128_D.py ├── CNN_C64_C128_M2_C128_C128_M2_C256_D_2.py ├── CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7.py ├── CNN_C64_C128_M2_C128_C128_M2_C256_D_3.py ├── CNN_C64_C128_M2_C256_D.py ├── VGG16_AVG.py ├── VGG16_AVG_r.py ├── VGG16_C4096_C4096_AVG.py ├── VGG16_D256.py ├── VGG16_D4096_D4096.py ├── VGG16_block4_D4096_D4096.py ├── __init__.py ├── simple_document_classification.py └── vgg.py ├── train.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | Graph/ 3 | logs/ 4 | *.npz 5 | datasets/ocr 6 | *.zip 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pretrained document features for document OCR, classification and segmentation 2 | 3 | The objective of this repository is to develop pretrained features for document images to be used in document classification, segmentation, OCR and analysis. The pretrained features are trained on the results of an OCR engine, such as Tesseract. 4 | 5 | 6 | 7 | 8 | [PDF paper](images/Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf) 9 | 10 | 11 | ## Features 12 | 13 | - **Python 2 and Python 3 support** 14 | 15 | - **Tensorflow and CNTK support** 16 | 17 | To run the training with Tensorflow, activate the Python environment `source activate keras-tf-py27` and set the backend value to `tensorflow` in `~/.keras/keras.json`. 18 | 19 | To run the training with CNTK, activate the Python environment `source activate cntk-py27` and set the backend value to `cntk` in `~/.keras/keras.json`. 20 | 21 | - **Multi-GPU support** 22 | 23 | To enable parallel training on multiple GPUs (a minimal Keras sketch follows this feature list): 24 | ``` 25 | python train.py -p 26 | ``` 27 | 28 | For CNTK, start parallel workers to use all GPUs: 29 | ``` 30 | mpiexec --npernode 4 python train.py -p 31 | ``` 32 | 33 | - **TensorBoard visualization**: train and validation loss, objectness accuracy per layer scale, class accuracy per layer scale, regression accuracy, object mAP score, target mAP score, original image, objectness map, multi-layer detections, detections after non-max suppression, target and groundtruth.
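As a rough illustration of the multi-GPU option above, the sketch below shows what the `-p` flag amounts to with the Tensorflow backend: wrapping the Keras model with `keras.utils.multi_gpu_model` (available in the pinned Keras 2.1.5) so that each batch is split across the available GPUs. This is a minimal sketch, not the repository's actual `train.py` code; the helper name and GPU count are placeholders.

```python
# Minimal sketch of data-parallel training with the Tensorflow backend.
# `maybe_parallelize` is a hypothetical helper, not part of this repository.
from keras.utils import multi_gpu_model

def maybe_parallelize(model, parallel=False, gpus=2):
    # With `-p`, replicate the model on `gpus` devices (requires gpus >= 2);
    # each batch is split across the replicas and the gradients are merged.
    if parallel:
        return multi_gpu_model(model, gpus=gpus)
    return model
```

For CNTK, parallelism comes from the `mpiexec` workers shown above instead. The losses, accuracies and image summaries listed under the TensorBoard feature can then be browsed with, e.g., `tensorboard --logdir=./logs`, pointing at the log directory passed with `-l`.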
34 | 35 | 36 | ## Install requirements 37 | 38 | - Ubuntu 17.04 39 | 40 | - GPU support: [NVIDIA driver](http://www.nvidia.fr/download/driverResults.aspx/131287/fr), [Cuda 9.0](https://developer.nvidia.com/cuda-90-download-archive) and [Cudnn 7.0.4](https://developer.nvidia.com/rdp/form/cudnn-download-survey) (required by CNTK) 41 | 42 | - [CNTK install with MKL/OpenMPI/Protobuf/Zlib/LibZip/Boost/Swig/Anaconda3/Python support](https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-Linux) 43 | 44 | Create the cntk-py35 and cntk-py27 Conda environments following their specs. 45 | 46 | Build: `../../configure --with-swig=/usr/local/swig-3.0.10 --with-py35-path=$HOME/anaconda3/envs/cntk-py35 --with-py27-path=$HOME/anaconda3/envs/cntk-py27` 47 | 48 | Update these environments to add Keras and the other libraries required by the current code: 49 | ``` 50 | conda env update --file cntk-py27.yml 51 | conda env update --file cntk-py35.yml 52 | ``` 53 | 54 | - Tensorflow and Python 2.7 55 | ``` 56 | conda env update --file keras-tf-py27.yml 57 | ``` 58 | 59 | - Tensorflow and Python 3.5 60 | ``` 61 | conda env update --file keras-tf-py35.yml 62 | ``` 63 | 64 | - HDF5 to save weights with Keras 65 | 66 | ``` 67 | sudo apt-get install libhdf5-dev 68 | ``` 69 | 70 | 71 | 72 | ## Run 73 | 74 | Activate one of the Conda environments: 75 | ``` 76 | source activate cntk-py27 77 | source activate cntk-py35 78 | source activate keras-tf-py27 79 | source activate keras-tf-py35 80 | ``` 81 | 82 | For help on available options: 83 | 84 | ``` 85 | python train.py -h 86 | python3 train.py -h 87 | 88 | Using TensorFlow backend 89 | Using CNTK backend 90 | Selected GPU[3] GeForce GTX 1080 Ti as the process wide default device. 91 | usage: train.py [-h] [-b BATCH_SIZE] [-p] [-e EPOCHS] [-l LOGS] [-m MODEL] 92 | [-lr LEARNING_RATE] [-s STRIDE_SCALE] [-d DATASET] [-w WHITE] 93 | [-n] [--pos_weight POS_WEIGHT] [--iou IOU] 94 | [--nms_iou NMS_IOU] [-i INPUT_DIM] [-r RESIZE] [--no-save] 95 | [--resume RESUME_MODEL] 96 | 97 | optional arguments: 98 | -h, --help show this help message and exit 99 | -b BATCH_SIZE, --batch_size BATCH_SIZE 100 | # of images per batch 101 | -p, --parallel Enable multi GPUs 102 | -e EPOCHS, --epochs EPOCHS 103 | # of training epochs 104 | -l LOGS, --logs LOGS log directory 105 | -m MODEL, --model MODEL 106 | model 107 | -lr LEARNING_RATE, --learning_rate LEARNING_RATE 108 | learning rate 109 | -s STRIDE_SCALE, --stride_scale STRIDE_SCALE 110 | Stride scale. If zero, default stride scale. 111 | -d DATASET, --dataset DATASET 112 | dataset 113 | -w WHITE, --white WHITE 114 | white probability for MNIST dataset 115 | -n, --noise noise for MNIST dataset 116 | --pos_weight POS_WEIGHT 117 | weight for positive objects 118 | --iou IOU iou threshold to consider a position to be positive.
If 119 | -1, positive only if object included in the layer 120 | field 121 | --bb_positive BB_POSITIVE 122 | Possible values: iou-treshold, in-anchor, best-anchor 123 | --nms_iou NMS_IOU iou treshold for non max suppression 124 | -i INPUT_DIM, --input_dim INPUT_DIM 125 | network input dim 126 | -r RESIZE, --resize RESIZE 127 | resize input images 128 | --no-save save model and data to files 129 | --resume RESUME_MODEL 130 | --n_cpu N_CPU number of CPU threads to use during data generation 131 | ``` 132 | 133 | ## OCR Training 134 | 135 | ### Toy dataset with MNIST "ocr_mnist" 136 | 137 | 138 | Train image recognition of digits on a white background (inverted MNIST images): 139 | 140 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | 141 | | --- | --- | --- | --- | --- | 142 | | `python train.py` | 100 | 99.2827 | 1.60e-10 | 99.93 | 143 | | With noise `python train.py -n` | 99.62 | 98.92 | 4.65e-6 | 98.41 | 144 | 145 | 146 | 147 | 148 | With stride 12 instead of default 28: 149 | 150 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 151 | | --- | --- | --- | --- | --- | --- | 152 | | `python train.py -s 6 --iou .15` | 96.37 | 36.25 | 0.010 | **99.97** | 100 | 153 | | `python train.py -s 6 --iou .2` | 98.42 | 28.56 | 0.012 | 99.75 | 100 | 154 | | `python train.py -s 6 --iou .25` | 97.05 | 36.42 | 0.015 | 99.52 | 100 | 155 | | `python train.py -s 6 --iou .3` | 98.35 | 92.78 | 0.0013 | 99.88 | 100 | 156 | | `python train.py -s 6 --iou .35` | 98.99| 83.72| 0.0069 | 99.22 | 100 | 157 | | `python train.py -s 6 --iou .4` | 98.70 | 94.96| 0.0066 | 98.37 | 100 | 158 | | `python train.py -s 6 --iou .5` | 96.71 | 95.46 | 0.0062| 91.09 | 95.71 | 159 | | `python train.py -s 6 --iou .6` | 99.92| 98.23| 4.8e-05 | 51.80 | 54.32 | 160 | | `python train.py -s 6 --iou .8` | 99.90 | **97.90** | 7.67e-05 | 8.5 | 10.63 | 161 | | `python train.py -s 6 --iou .95` | **99.94** | 97.27 | 3.7-07 | 10.80 | 12.21 | 162 | | `python train.py -s 6 --iou .99` | 99.91 | 97.66 | 7.06e-07 | 9.3 | 11.71 | 163 | 164 | 165 | With stride 4: 166 | 167 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 168 | | --- | --- | --- | --- | --- | --- | 169 | | `python train.py -s 2 --iou .2` | 98.51 | 72.71 | 0.034 | 99.99 | 100 | 170 | | `python train.py -s 2 --iou .25` | 98.63 | 78.53 | 0.018 | **100** | 100 | 171 | | `python train.py -s 2 --iou .3` | 97.88 | 94.54 | 0.0098 | 99.89 | 100 | 172 | | `python train.py -s 2 --iou .4` | 96.85 | 97.41 | 0.0098 | 99.93 | 100 | 173 | | `python train.py -s 2 --iou .5` | 94.14 | 98.81 | 0.0099 | 99.61 | 100 | 174 | | `python train.py -s 2 --iou .6` | 99.80 | 98.57 | 0.00031 | 99.93 | 100 | 175 | | `python train.py -s 2 --iou .7` | 99.64 | 98.21 | 0.0016 | 99.77 | 100 | 176 | | `python train.py -s 2 --iou .8` | 100 | 98.19 | 1.7e-8 | 82.24 | 100 | 177 | | `python train.py -s 2 --iou .8 -e 30` | 99.98 | 99.35 | 1.73e-9 | 91.05 | 100 | 178 | 179 | 180 | Train on scale ranges [14-28]: 181 | 182 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 183 | | --- | --- | --- | --- | --- | --- | 184 | | `python train.py -r 14-28 -s 6 --iou .25 -e 30` | 99.10 | 89.37 | 0.0017 | 99.58 | 100 | 185 | 186 | 187 | With bigger net: 188 | 189 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 190 | | --- | --- | --- | --- | --- | --- | 191 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .5` | 99.59 | 98.02 | 0.00078 | 92.32 | 94.89 | 192 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .4` | 99.17 | 97.23 | 0.0047 | 99.79 
| 100 | 193 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .3` | 99.74 | 96.84 | 0.00043 | **100** | 100 | 194 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .2` | 97.57 | 91.14 | 0.0016 | 99.98 | 100 | 195 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .15` | 98.02 | 83.85 | 0.0083 | 99.95 | 100 | 196 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 2 --iou .5` | 99.80 | 98.87 | 0.00053 | **100** | 100 | 197 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 2 --iou .25` | 99.48 |95.78 | 0.00054 | 100 | 100 | 198 | | `python train.py -r 14-28 -m CNN_C64_C128_M2_C256_D -s 6 --iou .25 -e 30` | 96.58 | 91.42 | 0.0045 | 99.85 | 100 | 199 | 200 | 201 | Train on scale 56x56: 202 | 203 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 204 | | --- | --- | --- | --- | --- | --- | 205 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D` | 99.98 | 99.22 | 7.4e-09 | 99.97 | 100 | 206 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .2` | 98.86 | 78.63 | 0.011 | 99.89 | 100 | 207 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .3` | 99.36 | 94.60 | 0.0036 | 99.97 | 100 | 208 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .4` | 99.23 | 91.11 | 0.048 | **100** | 100 | 209 | 210 | 211 | Train for two stage networks (scales 28 and 56): 212 | 213 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 214 | | --- | --- | --- | --- | --- | --- | 215 | | `python train.py -r 28,56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2` | 99.99/1.0 | 98.62/96.69 | 1.06e-08/4.18e-05 | 99.97 | 100 | 216 | | `python train.py -r 28,56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 6 -e 50` | 99.51/97.76 | 89.83/95.22 | 0.0048/0.016 | 99.44 | 100 | 217 | | `python train.py -r 28,56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 4 -e 30` | 99.39/97.46 | 85.21/92.19 | 0.0054/0.022 | 99.64 | 100 | 218 | 219 | 220 | 221 | Train on scale ranges [28-56], two stages [14-28,28-56] and [14, 56]: 222 | 223 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 224 | | --- | --- | --- | --- | --- | --- | 225 | | `python train.py -r 28-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .25 -e 30` | 98.99 | 93.92 | 0.0018 | 99.89 | 100 | 226 | | `python train.py -r 14-28,28-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 6 --iou .25 -e 30` | 98.92/98.04 | 64.06/91.08 | 0.0037/0.0056 | 98.82 | 99.90 | 227 | | `python train.py -r 14-28,28-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 6 --iou .2 -e 30` | 98.57/97.73 | 58.30/79.84 | 0.0058/0.0036 | 98.31 | 99.90 | 228 | | `python train.py -r 14-28,28-56 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 -s 6 --iou .25 -e 30` | 99.10 / 98.16 | 93.64 / 95.28 | 0.0016 / 0.0014 | 98.42 | 99.93 | 229 | | `python train.py -r 14-28,28-56 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 -s 6 --iou .25 -e 50` | 99.26 / 98.78 | 93.91 / 94.02 | 0.0010 / 0.0014 | 98.81 | 99.93 | 230 | | `python train.py -r 14-28,28-56 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 -s 6 --iou .2 -e 50` | 99.05/98.05 | 89.88/91.97 | 0.0021/0.0022 | 99.11 | 99.97 | 231 | | `python train.py -r 14-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .02 -e 30` | 97.58 | 30.17 | 0.10 | 75.07 | 100 | 232 | | `python train.py -r 14-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .05 -e 30` | 97.92 | 53.20 | 0.027 | 75.49 | 100 | 233 | | `python train.py -r 14-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .1 -e 30` | 97.82 | 58.44 | 0.0057 | 87.45 | 92.67 | 234 | | `python train.py -r 14-56 -m 
CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .2 -e 30` | 98.82 | 79.23 | 0.0010 | 72.36 | 75.78 | 235 | 236 | 237 | Train on lower resolution (digit resize parameter): 238 | 239 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 240 | | --- | --- | --- | --- | --- | --- | 241 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_D` | 100 | 99.04 | 2.2-12 | 99.91 | 100 | 242 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_D -s 4` | 97.12 | 94.50 | 0.012 | 99.91 | 100 | 243 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_C` | 100 | 98.75 | 1.9-05 | 97.02 | 100 | 244 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_C -s 4` | 98.00 | 91.69 | 0.023 | 93.87 | 100 | 245 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D` | 99.99 | 96.78 | 8.4e-5 | 99.85 | 100 | 246 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D -s 4` | 98.58 | 73.07 | 0.0087 | 98.61 | 100 | 247 | | `python train.py -e 30 -r 7-14 --iou .25 -m CNN_C32_C64_C128_D -s 4` | 99.07 | 75.34 | 0.012 | 98.98 | 100 | 248 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C` | 99.31 | 93.61 | 0.0035 | 92.52 | 100 | 249 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C -s 4` | 97.22 | 24.87 | 0.0060 | 97.68 | 100 | 250 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C2 -s 4` | 98.49 | 47.93 | 0.0088 | 98.91 | 100 | 251 | | `python train.py -e 30 -r 7-28 -s 6 -m CNN_C32_C64_C64_Cd64_C128_D --iou .02` | 96.51 | 24.42 | 0.12 | 64.43 | 66.47 | 252 | | ` python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .2` | 99.12 | 91.01 | 0.0040 | 84.87 | 77.18 | 253 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | 98.40 | 77.86 | 0.029 | 88.68 | 85.71 | 254 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .1` | 98.20 | 56.96 | 0.086 | 87.51 | 95.34 | 255 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .05 -lr 0.001` | 97.71 | 38.91 | 0.032 | 77.98 | 100 | 256 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .02 --lr 0.0001` | 96.79 | 18.59 | 0.10 | 77.28 | 100 | 257 | | `python train.py -e 30 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .1` | 97.47 | 73.70 | 0.010 | 87.19 | 95.45 | 258 | | `python train.py -e 30 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .2` | 99.08 | 92.84 | 0.0074 | 81.01 | 76.47 | 259 | | `python train.py -e 50 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | 98.71 | 88.02 | 0.0046 | 87.79 | 84.76 |  260 | | `python train.py -e 50 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .1` | 97.97 | 79.19 | 0.0096 | 89.17 | 95.24 |  261 | 262 | 263 | 264 | Train on larger images (1000 or 1500 rather than 700): 265 | 266 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 267 | | --- | --- | --- | --- | --- | --- | 268 | | `python train.py -e 30 -i 1000 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D -s 4` | 98.80 | 52.92 | 0.0081 | 98.78 | 100 | 269 | | `python train.py -e 30 -i 1000 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C -s 4` | 98.24 | 20.36 | 0.011 | 97.46 | 100 | 270 | | `python train.py -e 30 -i 1500 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D -s 4` | 98.61 | 47.04 | 0.0076 | 98.36 | 100 | 271 | | `python train.py -e 30 -i 1000 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 4` | 98.93 | 89.25 | 0.0031 | 81.39 | 76.23 | 272 | | `python train.py -e 30 -i 1500 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 3 -b 1` | 99.04 | 91.46 | 0.0063 | 82.33 | 76.95 | 273 | | 
`python train.py -e 50 -i 1500 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 3 -b 1` | 98.78 | 91.20 | 0.011 | 82.93 | 76.38 | 274 | | `python train.py -e 50 -i 1500 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 4 -b 1` | 98.96 | 92.69 | 0.0015 | 80.29 | 76.97 | 275 | 276 | 277 | ### OCR Dataset "ocr_documents" 278 | 279 | Create a document configuration file `document.conf` in JSON specifying the directory in which document files are in JPG: 280 | 281 | ```json 282 | { 283 | "directory": "/sharedfiles/ocr_documents", 284 | "namespace": "ivalua.xml", 285 | "page_tag": "page", 286 | "char_tag": "char", 287 | "x1_attribute": "x1", 288 | "y1_attribute": "y1", 289 | "x2_attribute": "x2", 290 | "y2_attribute": "y2" 291 | } 292 | ``` 293 | 294 | Use Tesseract OCR to produce the XML files: 295 | 296 | ``` 297 | sudo apt-get install tesseract-ocr tesseract-ocr-fra 298 | python datasets/ocr_documents_preprocess.py 299 | ``` 300 | 301 | Get document statistics with `python ocr_documents_statistics.py`. 302 | 303 | By default, input size is 700, this means 3500x2500 input images will be cropped to 700x420 : 304 | 305 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 306 | | --- | --- | --- | --- | --- | --- | 307 | | `python train.py -e 50 -d ocr_documents -s 2 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 97.00/97.76 | 69.11/71.78 | 0.027/0.016 | 58.82 | 91.22 | 308 | | `python train.py -e 50 -d ocr_documents -s 2 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 97.89/98.44 | 75.39/72.75 | 0.020/0.011 | 68.09 | 84.47 | 309 | | `python train.py -e 50 -d ocr_documents -s 2 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 98.19 | 81.43 | 0.014 | **64.69** | 65.40 | 310 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 97.52/ 97.58 | 72.18/77.03 | 0.028/0.015 | **67.05** | 86.07 | 311 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 98.24/98.25 | 79.01/79.47 | 0.019/0.10 | 66.25 | 78.15 | 312 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 98.60/98.90 | 80.17/78.93 | 0.015/0.0075 | 62.71 | 66.42 | 313 | | `python train.py -e 50 -d ocr_documents -s 4 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 97.90/97.50 | 72.05/74.58 | 0.029/0.017 | 62.87 | 89.77 | 314 | | `python train.py -e 50 -d ocr_documents -s 4 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 98.42/97.99 | 78.35/79.15 | 0.021/0.012 | **66.30** | 83.94 | 315 | | `python train.py -e 50 -d ocr_documents -s 4 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 98.88/98.61 | 77.64/81.11 | 0.017/0.0077 | 60.26 | 69.35 | 316 | | `python train.py -e 50 -d ocr_documents -s 5 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 98.47/97.36 | 70.94/77.87 | 0.031/0.018 | **59.33** | 85.87 | 317 | | `python train.py -e 50 -d ocr_documents -s 5 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 98.92/97.76 | 67.94/80.13 | 0.021/0.014 | 51.87 | 77.52 | 318 | | `python train.py -e 50 -d ocr_documents -s 5 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 99.09/98.45 | 70.41/83.67 | 0.018/0.0097 | 44.59 | 61.57 | 319 | 320 | 321 | With more capacity: 322 | 323 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 324 | | --- | --- | --- | --- | --- | --- | 325 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 --iou 0.2` (1) | 98.45/98.66 | 83.27/85.42 | 0.018/0.0097 | 70.11 | 78.15 | 326 | 327 | (1) Model 
Tensorflow `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-05-28_20:03_CNN_C64_C128_M2_C128_C128_M2_C256_D_2.h5` 328 | 329 | Model CNTK `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-04_12:05_CNN_C64_C128_M2_C128_C128_M2_C256_D_2.h5` and `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-13_21.37_CNN_C64_C128_M2_C128_C128_M2_C256_D_2.dnn` 330 | 331 | 332 | 333 | To train on lower resolution, resize input images to 1000 (downsize by 3.5) and change input size by the same factor, to 200, in order to get 200x120 crops : 334 | 335 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP | 336 | | --- | --- | --- | --- | --- | --- | 337 | | `python train.py -e 150 -d ocr_documents -r 1000 -i 200 -s 6 -m CNN_C64_C128_M2_C256_D --iou .25` | 98.90 | 34.14 | 0.013 | 8.82 | 29.58 | 338 | | `python train.py -e 150 -d ocr_documents -r 1000 -i 200 -s 6 -m CNN_C64_C128_M2_C256_D --iou .2` | | |  |  |  | 339 | | `python train.py -e 150 -d ocr_documents -r 1000 -i 200 -s 1 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7 --iou 0.2 -b 1` (2) | 98.02/99.85 | 72.54/.00 | 0.013/0.0017 | 48.81 | 69.38 | 340 | | `python train.py -e 50 -d ocr_documents -r 1000 -i 200 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | 98.32 | 45.78 | 0.018 | 36.17 | 69.74 | 341 | | `python train.py -e 50 -d ocr_documents -r 1000 -i 200 -s 4 -m CNN_C32_C64_C128_D --iou .15` | 96.87 | 61.79 | 0.023 | 46.89 | 69.08 | 342 | | `python train.py -e 50 -d ocr_documents -r 1000 -i 200 -s 4 -m CNN_C32_C64_C128_D --iou .2` | 97.20 | 62.90 | 0.016 | 42.25 | 61.84 | 343 | | `python train.py -e 150 -d ocr_documents -r 1700 -i 400 -s 6 -m CNN_C64_C128_M2_C256_D --iou .25` | 98.38 | 86.83 | 0.012 | 31.76 | 43.46 | 344 | | `python train.py -e 150 -d ocr_documents -r 1700 -i 400 -s 6 -m CNN_C64_C128_M2_C256_D --iou .2` | 97.72 | 83.86 | 0.016 | 42.00 |59.83 | 345 | 346 | (2) `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-22_13:02_CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7.h5` 347 | 348 | 349 | 350 | For OCR training on full document images: 351 | 352 | | Command | Obj acc | Class acc | Reg acc | 353 | | --- | --- | --- | --- | 354 | | `python train.py -e 50 -d ocr_documents_generator -i 2000 -r 2000 -s 3 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 --iou 0.2` | S3 | | | 355 | | `python train.py -e 50 -d ocr_documents_generator --n_cpu 8 -i 1000 -r 1000 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | S3 | | | 356 | | `python train.py -e 50 -d ocr_documents_generator --n_cpu 8 -i 1000 -r 1000 -s 4 -m CNN_C32_C64_C128_D --iou .2` (3) | 98.49 | 69.11 | 0.0158 | 357 | | `python train.py -e 50 -d ocr_documents_generator --n_cpu 8 -i 1500 -r 1500 -s 4 -m CNN_C32_C64_C128_D --iou .2` | V1 Good | | | 358 | 359 | 360 | (3) `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-25_15:44_CNN_C32_C64_C128_D.h5` 361 | 362 | ## Classification Training 363 | 364 | ### Cats and dogs dataset 365 | 366 | Download dataset from https://www.kaggle.com/c/dogs-vs-cats/data 367 | 368 | ``` 369 | unzip /sharedfiles/train.zip -d /sharedfiles 370 | ./datasets/cls_dogs_vs_cats.sh /sharedfiles/train 371 | ``` 372 | 373 | | Command | Class acc | 374 | | --- | --- | 375 | | `python train.py -d cls_dogs_vs_cats -i 150 -m VGG16_D256 -lr 0.001 -b 16` | 91.82 | 376 | 377 | ### Tiny ImageNet dataset 378 | 379 | Download [dataset](https://tiny-imagenet.herokuapp.com/) 380 | 381 | ``` 382 | wget http://cs231n.stanford.edu/tiny-imagenet-200.zip -P /sharedfiles 383 | unzip 
/sharedfiles/tiny-imagenet-200.zip -d /sharedfiles/ 384 | ./datasets/cls_tiny_imagenet_convert.sh /sharedfiles/tiny-imagenet-200 385 | python train.py -d cls_tiny_imagenet -i 150 -m VGG16_D4096_D4096 -lr 0.001 -b 64 -e 150 -p 386 | ``` 387 | 388 | ### RVL-CDIP dataset 389 | 390 | ``` 391 | wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/rvl-cdip.tar.gz -P /sharedfiles 392 | # aws s3 cp s3://christopherbourez/public/rvl-cdip.tar.gz /sharedfiles/ 393 | mkdir /sharedfiles/rvl_cdip 394 | tar xvzf /sharedfiles/rvl-cdip.tar.gz -C /sharedfiles/rvl_cdip 395 | ./datasets/cls_rvl_cdip_convert.sh /sharedfiles/rvl_cdip 396 | # remove corrupted tiff 397 | rm /sharedfiles/rvl_cdip/test/scientific_publication/2500126531_2500126536.tif 398 | ``` 399 | 400 | | Command | Class acc | 401 | | --- | --- | 402 | | `python train.py -d cls_rvl_cdip -i 150 -m VGG16_D4096_D4096 -lr 0.0001 -b 64 -e 25 -p` | 90.2 | 403 | -------------------------------------------------------------------------------- /callback.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | from keras.callbacks import Callback 3 | from keras import backend as K 4 | from datasets import get_layer_sizes, iou 5 | import numpy as np 6 | import cv2 7 | import math 8 | 9 | colors = [(86, 0, 240), (173, 225, 61), (54, 137, 255),\ 10 | (151, 0, 255), (243, 223, 48), (0, 117, 255),\ 11 | (58, 184, 14), (86, 67, 140), (121, 82, 6),\ 12 | (174, 29, 128), (115, 154, 81), (86, 255, 234)] 13 | np_colors=np.array(colors) 14 | 15 | def compute_eligible_rectangles(output_maps, layer_strides, layer_offsets, layer_fields, stride_margin, num_classes, layer_sizes): 16 | res = [] 17 | for i in range(output_maps[0].shape[0]): 18 | eligible = [] 19 | for o , output_map in enumerate(output_maps): 20 | dim = layer_fields[o] 21 | if stride_margin: 22 | dim = dim - layer_strides[o] 23 | 24 | # class_prob_map = output_map[0, :, :, 0:nb_classes ] # (15, 25, nb_classes) 25 | # class_map = np.argmax(class_prob_map, axis=-1) # (15, 25) 26 | objectness_map = output_map[i, :, :, num_classes ] # (15, 25) 27 | reg = output_map[i, :, :, num_classes+1:num_classes+5] 28 | 29 | for y in range(objectness_map.shape[0]): 30 | for x in range(objectness_map.shape[1]): 31 | if objectness_map[y, x] > 0.5: 32 | w_2 = int(dim * 2**(-reg[y,x,3] -1)) # half width 33 | h_2 = int(dim * 2**(-reg[y,x,2] -1)) # half height 34 | x1 = layer_offsets[o] + x * layer_strides[o] + reg[y,x,1] * dim - w_2 35 | y1 = layer_offsets[o] + y * layer_strides[o] + reg[y,x,0] * dim - h_2 36 | x2 = layer_offsets[o] + x * layer_strides[o] + reg[y,x,1] * dim + w_2 37 | y2 = layer_offsets[o] + y * layer_strides[o] + reg[y,x,0] * dim + h_2 38 | eligible.append( [objectness_map[y, x], y1, x1, 2 * h_2 , 2 * w_2, o ] ) 39 | res.append(eligible) 40 | return res 41 | 42 | 43 | def non_max_suppression(rectangles, nms_iou): 44 | res = [] 45 | for eligible in rectangles: 46 | valid = [] 47 | if len(eligible) > 0: 48 | index = np.argsort(- np.array(eligible)[:,0]) 49 | valid.append(eligible[0]) 50 | for i in index: 51 | if np.max(iou( np.array(valid)[:,1:], np.array( [eligible[i][1:5]] ))) > nms_iou: 52 | continue 53 | else: 54 | valid.append( eligible[i] ) 55 | res.append(valid) 56 | return res 57 | 58 | 59 | def compute_map_score_and_mean_distance(val_gt, detections, overlap_threshold = 0.5): 60 | precision_recall = [] 61 | fp, tp = 0, 0 62 | distance = 0.0 63 | 64 | # unflatten groundtruth, flatten detections for ordering 65 | nb_groundtruth 
= 0 66 | groundtruth = [] 67 | gt_detected = [] 68 | flattened_detections = [] 69 | for image_id in range(len(detections)): 70 | gt = [] 71 | for r in val_gt: 72 | if r[0] == image_id: 73 | gt.append( r[1:5] ) 74 | nb_groundtruth = nb_groundtruth + 1 75 | groundtruth.append(gt) 76 | gt_detected.append(np.zeros((len(gt)))) 77 | for d in range(len(detections[image_id])): 78 | flattened_detections.append( (detections[image_id][d][0], image_id, d ) ) 79 | 80 | # order detections 81 | if len(flattened_detections) > 0: 82 | index = np.argsort(- np.array(flattened_detections)[:,0]) 83 | 84 | # compute recall and precision for increasingly large subset of detections 85 | for i in index: # iterate through all predictions 86 | image_id = flattened_detections[i][1] 87 | d = flattened_detections[i][2] 88 | detection = np.array([ detections[image_id][d][1:5] ]) 89 | 90 | gt = np.array(groundtruth[image_id]) 91 | if len(gt) == 0: 92 | fp = fp + 1 93 | else: 94 | iou_scores = iou(gt, detection) 95 | m = np.argmax(iou_scores) 96 | if iou_scores[m] > overlap_threshold: 97 | if gt_detected[image_id][m] == 0: # not yet detected 98 | gt_detected[image_id][m] = 1 99 | tp = tp + 1 100 | distance = distance + math.sqrt( (gt[m][0] + gt[m][2]/2 - detection[0][0] - detection[0][2]/2)**2 + (gt[m][1] + gt[m][3]/2 - detection[0][1] - detection[0][3]/2)**2 ) 101 | else: # detected twice 102 | fp = fp + 1 103 | else: 104 | fp = fp + 1 105 | precision_recall.append( ( tp/max(tp+fp, 1), tp/max(nb_groundtruth,1) ) ) 106 | 107 | # filling the dips 108 | interpolated_precision_recall = [] 109 | for i in range(len(precision_recall)): 110 | if precision_recall[i][0] >= max( [ p for p, _ in precision_recall[i:] ] ): 111 | interpolated_precision_recall.append(precision_recall[i]) 112 | 113 | mAP = 0 114 | previous_r = 0 115 | for p, r in interpolated_precision_recall: 116 | mAP += p * (r - previous_r) 117 | previous_r = r 118 | 119 | return mAP, distance / max(tp, 1) 120 | 121 | 122 | 123 | class TensorBoard(Callback): 124 | """TensorBoard basic visualizations. 125 | [TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) 126 | is a visualization tool provided with TensorFlow. 127 | This callback writes a log for TensorBoard, which allows 128 | you to visualize dynamic graphs of your training and test 129 | metrics, as well as activation histograms for the different 130 | layers in your model. 131 | If you have installed TensorFlow with pip, you should be able 132 | to launch TensorBoard from the command line: 133 | ```sh 134 | tensorboard --logdir=/full_path_to_your_logs 135 | ``` 136 | When using a backend other than TensorFlow, TensorBoard will still work 137 | (if you have TensorFlow installed), but the only feature available will 138 | be the display of the losses and metrics plots. 139 | # Arguments 140 | log_dir: the path of the directory where to save the log 141 | files to be parsed by TensorBoard. 142 | histogram_freq: frequency (in epochs) at which to compute activation 143 | and weight histograms for the layers of the model. If set to 0, 144 | histograms won't be computed. Validation data (or split) must be 145 | specified for histogram visualizations. 146 | write_graph: whether to visualize the graph in TensorBoard. 147 | The log file can become quite large when 148 | write_graph is set to True. 149 | write_grads: whether to visualize gradient histograms in TensorBoard. 150 | `histogram_freq` must be greater than 0. 
151 | batch_size: size of batch of inputs to feed to the network 152 | for histograms computation. 153 | write_images: whether to write model weights to visualize as 154 | image in TensorBoard. 155 | embeddings_freq: frequency (in epochs) at which selected embedding 156 | layers will be saved. 157 | embeddings_layer_names: a list of names of layers to keep eye on. If 158 | None or empty list all the embedding layer will be watched. 159 | embeddings_metadata: a dictionary which maps layer name to a file name 160 | in which metadata for this embedding layer is saved. See the 161 | [details](https://www.tensorflow.org/how_tos/embedding_viz/#metadata_optional) 162 | about metadata files format. In case if the same metadata file is 163 | used for all embedding layers, string can be passed. 164 | """ 165 | 166 | def __init__(self, val_gt, classes, stride_margin, layer_strides, layer_offsets, layer_fields, nms_iou = .5, 167 | log_dir='./logs', 168 | histogram_freq=0, 169 | batch_size=32, 170 | max_validation_size=10000, 171 | write_graph=True, 172 | write_grads=False, 173 | write_images=False, 174 | write_output_images=False, 175 | enable_boundingbox=False, 176 | enable_segmentation=False, 177 | batch_display_freq=100, 178 | embeddings_freq=0, 179 | embeddings_layer_names=None, 180 | embeddings_metadata=None, val_data=None): 181 | super(TensorBoard, self).__init__() 182 | global tf, projector 183 | try: 184 | import tensorflow as tf 185 | from tensorflow.contrib.tensorboard.plugins import projector 186 | except ImportError: 187 | raise ImportError('You need the TensorFlow module installed to use TensorBoard.') 188 | 189 | self.val_gt = val_gt 190 | self.classes = classes 191 | self.num_classes = len(classes) 192 | self.stride_margin = stride_margin 193 | self.layer_strides = layer_strides 194 | self.layer_offsets = layer_offsets 195 | self.layer_fields = layer_fields 196 | self.epoch = 0 197 | self.nms_iou = nms_iou 198 | self.log_dir = log_dir 199 | self.histogram_freq = histogram_freq 200 | self.merged = None 201 | self.write_output_images = write_output_images 202 | self.enable_boundingbox = enable_boundingbox 203 | self.enable_segmentation = enable_segmentation 204 | self.batch_display_freq = batch_display_freq 205 | self.write_graph = write_graph 206 | self.write_grads = write_grads 207 | self.write_images = write_images 208 | self.embeddings_freq = embeddings_freq 209 | self.embeddings_layer_names = embeddings_layer_names 210 | self.embeddings_metadata = embeddings_metadata or {} 211 | self.batch_size = batch_size 212 | self.max_validation_size = max_validation_size 213 | self.val_data = val_data 214 | 215 | def set_model(self, model): 216 | self.model = model 217 | if K.backend() == 'tensorflow': 218 | self.sess = K.get_session() 219 | 220 | if self.write_output_images: 221 | self.log_image_data = tf.placeholder(tf.uint8, [None, None, 3]) 222 | self.log_image_name = tf.placeholder(tf.string) 223 | from tensorflow.python.ops import gen_logging_ops 224 | from tensorflow.python.framework import ops as _ops 225 | self.log_image = gen_logging_ops._image_summary(self.log_image_name, tf.expand_dims(self.log_image_data, 0), max_images=1) 226 | _ops.add_to_collection(_ops.GraphKeys.SUMMARIES, self.log_image) 227 | 228 | if self.histogram_freq and self.merged is None: 229 | for layer in self.model.layers: 230 | 231 | for weight in layer.weights: 232 | mapped_weight_name = weight.name.replace(':', '_') 233 | tf.summary.histogram(mapped_weight_name, weight) 234 | if self.write_grads: 235 | grads = 
model.optimizer.get_gradients(model.total_loss, 236 | weight) 237 | 238 | def is_indexed_slices(grad): 239 | return type(grad).__name__ == 'IndexedSlices' 240 | grads = [ 241 | grad.values if is_indexed_slices(grad) else grad 242 | for grad in grads] 243 | tf.summary.histogram('{}_grad'.format(mapped_weight_name), grads) 244 | if self.write_images: 245 | w_img = tf.squeeze(weight) 246 | shape = K.int_shape(w_img) 247 | if len(shape) == 2: # dense layer kernel case 248 | if shape[0] > shape[1]: 249 | w_img = tf.transpose(w_img) 250 | shape = K.int_shape(w_img) 251 | w_img = tf.reshape(w_img, [1, 252 | shape[0], 253 | shape[1], 254 | 1]) 255 | elif len(shape) == 3: # convnet case 256 | if K.image_data_format() == 'channels_last': 257 | # switch to channels_first to display 258 | # every kernel as a separate image 259 | w_img = tf.transpose(w_img, perm=[2, 0, 1]) 260 | shape = K.int_shape(w_img) 261 | w_img = tf.reshape(w_img, [shape[0], 262 | shape[1], 263 | shape[2], 264 | 1]) 265 | elif len(shape) == 1: # bias case 266 | w_img = tf.reshape(w_img, [1, 267 | shape[0], 268 | 1, 269 | 1]) 270 | else: 271 | # not possible to handle 3D convnets etc. 272 | continue 273 | 274 | shape = K.int_shape(w_img) 275 | assert len(shape) == 4 and shape[-1] in [1, 3, 4] 276 | tf.summary.image(mapped_weight_name, w_img) 277 | 278 | if hasattr(layer, 'output'): 279 | tf.summary.histogram('{}_out'.format(layer.name), 280 | layer.output) 281 | self.merged = tf.summary.merge_all() 282 | 283 | if self.write_graph: 284 | self.writer = tf.summary.FileWriter(self.log_dir, 285 | self.sess.graph) 286 | else: 287 | self.writer = tf.summary.FileWriter(self.log_dir) 288 | 289 | if self.embeddings_freq: 290 | embeddings_layer_names = self.embeddings_layer_names 291 | 292 | if not embeddings_layer_names: 293 | embeddings_layer_names = [layer.name for layer in self.model.layers 294 | if type(layer).__name__ == 'Embedding'] 295 | 296 | embeddings = {layer.name: layer.weights[0] 297 | for layer in self.model.layers 298 | if layer.name in embeddings_layer_names} 299 | 300 | self.saver = tf.train.Saver(list(embeddings.values())) 301 | 302 | embeddings_metadata = {} 303 | 304 | if not isinstance(self.embeddings_metadata, str): 305 | embeddings_metadata = self.embeddings_metadata 306 | else: 307 | embeddings_metadata = {layer_name: self.embeddings_metadata 308 | for layer_name in embeddings.keys()} 309 | 310 | config = projector.ProjectorConfig() 311 | self.embeddings_ckpt_path = os.path.join(self.log_dir, 312 | 'keras_embedding.ckpt') 313 | 314 | for layer_name, tensor in embeddings.items(): 315 | embedding = config.embeddings.add() 316 | embedding.tensor_name = tensor.name 317 | 318 | if layer_name in embeddings_metadata: 319 | embedding.metadata_path = embeddings_metadata[layer_name] 320 | 321 | projector.visualize_embeddings(self.writer, config) 322 | 323 | def on_batch_end(self, batch, logs=None): 324 | """Called at the end of a batch. 325 | # Arguments 326 | batch: integer, index of batch within the current epoch. 327 | logs: dictionary of logs. 
328 | """ 329 | if batch % self.batch_display_freq != 0: 330 | return 331 | 332 | logs = logs or {} 333 | batch_size = logs.get('size', 0) 334 | # print(self.model.output.shape) 335 | # for layer in self.model.layers: 336 | # print(layer.name) 337 | # print(self.model.output.name) 338 | 339 | # self.infer = K.function([self.model.input]+ [K.learning_phase()], [self.model.output] ) 340 | # start_batch = batch * batch_size 341 | # output_map = self.infer([self.train_data[start_batch:(start_batch+1)], 1]) # [Tensor((1, 25, 15, nb_classes + 1))] 342 | 343 | # self.writer.flush() 344 | 345 | 346 | def on_epoch_end(self, epoch, logs=None): 347 | self.epoch = epoch + 1 348 | logs = logs or {} 349 | 350 | if not self.validation_data: 351 | # creating validation data from validation generator 352 | print("Feeding callback validation data with Generator") 353 | j = 0 354 | imgs = [] 355 | tags = [[] for s in range(len(self.layer_offsets))] 356 | for i in self.val_data: 357 | imgs.append(i[0]) 358 | for s in range(len(self.layer_offsets)): 359 | tags[s].append(i[1][s] ) 360 | j = j + 1 361 | if j > 10: 362 | break 363 | 364 | np_imgs = np.concatenate(imgs, axis=0) 365 | np_tags = [] 366 | for s in range(len(self.layer_offsets)): 367 | np_tags.append( np.concatenate( tags[s], axis=0 ) ) 368 | self.validation_data = [np_imgs] + np_tags + [ np.ones(np_imgs.shape[0]), 0.0] 369 | 370 | if not self.validation_data and self.histogram_freq: 371 | raise ValueError('If printing histograms, validation_data must be ' 372 | 'provided, and cannot be a generator.') 373 | if self.validation_data and self.histogram_freq: 374 | if epoch % self.histogram_freq == 0: 375 | 376 | val_data = self.validation_data 377 | tensors = (self.model.inputs + 378 | self.model.targets + 379 | self.model.sample_weights) 380 | 381 | if self.model.uses_learning_phase: 382 | tensors += [K.learning_phase()] 383 | 384 | assert len(val_data) == len(tensors) 385 | val_size = val_data[0].shape[0] 386 | i = 0 387 | while i < val_size: 388 | step = min(self.batch_size, val_size - i) 389 | if self.model.uses_learning_phase: 390 | # do not slice the learning phase 391 | batch_val = [x[i:i + step] for x in val_data[:-1]] 392 | batch_val.append(val_data[-1]) 393 | else: 394 | batch_val = [x[i:i + step] for x in val_data] 395 | assert len(batch_val) == len(tensors) 396 | feed_dict = dict(zip(tensors, batch_val)) 397 | result = self.sess.run([self.merged], feed_dict=feed_dict) 398 | summary_str = result[0] 399 | self.writer.add_summary(summary_str, epoch) 400 | i += self.batch_size 401 | 402 | if self.embeddings_freq and self.embeddings_ckpt_path: 403 | if epoch % self.embeddings_freq == 0: 404 | self.saver.save(self.sess, 405 | self.embeddings_ckpt_path, 406 | epoch) 407 | 408 | for name, value in logs.items(): 409 | if name in ['batch', 'size']: 410 | continue 411 | summary = tf.Summary() 412 | summary_value = summary.value.add() 413 | summary_value.simple_value = value.item() 414 | summary_value.tag = name 415 | self.writer.add_summary(summary, epoch) 416 | 417 | 418 | if self.validation_data and self.write_output_images: 419 | ######### original image 420 | # from skimage.io import imsave 421 | # import os 422 | # import numpy as np 423 | # if not os.path.exists(self.log_dir): 424 | # os.mkdir(self.log_dir) 425 | val_img_data = self.validation_data[0] 426 | val_size = min(val_img_data.shape[0], self.max_validation_size) 427 | tensors = (self.model.inputs) 428 | img_shape = val_img_data[0].shape 429 | layer_sizes = get_layer_sizes(img_shape, 
self.layer_offsets, self.layer_strides) 430 | detections, target_detections = [], [] 431 | i = 0 432 | while i < val_size: 433 | step = min(self.batch_size, val_size - i) 434 | batch_val = [val_img_data[i:i + step], 1] 435 | if self.model.uses_learning_phase: 436 | tensors += [K.learning_phase()] 437 | batch_val.append(1) 438 | feed_dict = dict(zip(tensors, batch_val)) 439 | 440 | if self.enable_boundingbox or self.enable_segmentation: 441 | output_maps = self.sess.run(self.model.outputs, feed_dict=feed_dict) 442 | 443 | if self.enable_boundingbox: 444 | eligible = compute_eligible_rectangles(output_maps, 445 | self.layer_strides, self.layer_offsets, self.layer_fields, 446 | self.stride_margin, self.num_classes, layer_sizes) 447 | valid = non_max_suppression(eligible, self.nms_iou) 448 | 449 | # compute targets for display 450 | target = compute_eligible_rectangles([self.validation_data[s+1][i:i+step] for s in range(len(layer_sizes))], 451 | self.layer_strides, self.layer_offsets, self.layer_fields, 452 | self.stride_margin, self.num_classes, layer_sizes) 453 | 454 | if i <= 10: 455 | # display results on a few images 456 | for image_id in range(step): 457 | image = (val_img_data[i + image_id] * 255.)#.astype(np.uint8) 458 | if image.shape[2] == 1: 459 | image = np.tile(image,(1,1,3)) 460 | # imsave(os.path.join(self.log_dir, str(image_id) + '_input.png'), image) 461 | t = (self.epoch - 1) * val_img_data.shape[0] + i + image_id 462 | 463 | log_image_summary_op = self.sess.run(self.log_image, \ 464 | feed_dict={self.log_image_name: "1-input", self.log_image_data: image}) 465 | self.writer.add_summary(log_image_summary_op, global_step=t) 466 | 467 | if self.enable_boundingbox: 468 | # draw objectness 469 | image_ = np.copy(image) 470 | for o, output_map in enumerate(output_maps): 471 | objectness = output_map[image_id, :, :, self.num_classes: self.num_classes+1 ] * 255. 
472 | log_image_summary_op = self.sess.run(self.log_image, \ 473 | feed_dict={self.log_image_name: "2-objectness-" + str(o), self.log_image_data: np.tile( objectness,(1,1,3)) }) 474 | self.writer.add_summary(log_image_summary_op, global_step=t) 475 | dim = self.layer_fields[o] 476 | cv2.rectangle(image_, (0, 0), (dim, dim), colors[o % len(colors)], 2) 477 | if self.stride_margin: 478 | dim = dim - self.layer_strides[o] 479 | cv2.rectangle(image_, (0, 0), (dim, dim), colors[o % len(colors)], 2) 480 | 481 | # draw eligible rectangles (before non max suppression) 482 | for r in eligible[image_id]: 483 | cv2.rectangle(image_, (int(r[2]), int(r[1])), (int(r[2]+r[4]), int(r[1]+r[3])), colors[r[5] % len(colors)], 2) 484 | log_image_summary_op = self.sess.run(self.log_image, \ 485 | feed_dict={self.log_image_name: "3-result", self.log_image_data: image_}) 486 | self.writer.add_summary(log_image_summary_op, global_step=t) 487 | 488 | # display results (after non max suppression) 489 | res_image = np.copy(image) 490 | for r in valid[image_id]: 491 | cv2.rectangle(res_image, (int(r[2]), int(r[1])), (int(r[2] + r[4]), int(r[1] + r[3])), colors[0], 2) 492 | 493 | log_image_summary_op = self.sess.run(self.log_image, \ 494 | feed_dict={self.log_image_name: "4-after-nms", self.log_image_data: res_image}) 495 | self.writer.add_summary(log_image_summary_op, global_step=t) 496 | 497 | # display target label 498 | target_image = np.copy(image) 499 | for r in target[image_id]: 500 | cv2.rectangle(target_image, (int(r[2]), int(r[1])), (int(r[2]+r[4]), int(r[1]+r[3])), colors[r[5] % len(colors)], 2) 501 | log_image_summary_op = self.sess.run(self.log_image, \ 502 | feed_dict={self.log_image_name: "5-target", self.log_image_data: target_image}) 503 | self.writer.add_summary(log_image_summary_op, global_step=t) 504 | 505 | # display groundtruth boxes 506 | for r in self.val_gt: 507 | if r[0] == i + image_id: 508 | cv2.rectangle(image, (int(r[2]), int(r[1])), (int(r[2]+r[4]), int(r[1]+r[3])), (86 / 255., 0, 240/255.), 2) 509 | log_image_summary_op = self.sess.run(self.log_image, \ 510 | feed_dict={self.log_image_name: "6-groundtruth", self.log_image_data: image}) 511 | self.writer.add_summary(log_image_summary_op, global_step=t) 512 | 513 | if self.enable_segmentation: 514 | # draw segmentation maps 515 | for o, output_map in enumerate(output_maps): 516 | output_map_labels = np.argmax( output_map[image_id], axis=-1 ) 517 | output_map_color = np_colors[output_map_labels] 518 | log_image_summary_op = self.sess.run(self.log_image, \ 519 | feed_dict={self.log_image_name: "2-segmentation-" + str(o), self.log_image_data: output_map_color }) 520 | self.writer.add_summary(log_image_summary_op, global_step=t) 521 | 522 | target_image = np.copy(image) 523 | target_map_labels = np.argmax(self.validation_data[o+1][i+image_id], axis=-1) 524 | target_map_color = np_colors[target_map_labels] 525 | log_image_summary_op = self.sess.run(self.log_image, \ 526 | feed_dict={self.log_image_name: "3-target-segmentation-" + str(o), self.log_image_data: target_map_color }) # cv2.addWeighted(target_image,0.1,cv2.resize(target_map_color, (target_image.shape[1], target_image.shape[0])),0.9,0, dtype=cv2.CV_32F) 527 | self.writer.add_summary(log_image_summary_op, global_step=t) 528 | 529 | # next batch 530 | if self.enable_boundingbox: 531 | detections = detections + valid 532 | target_detections = target_detections + non_max_suppression(target, self.nms_iou) 533 | i += step 534 | 535 | # compute statistics on full val dataset 536 | if 
self.enable_boundingbox: 537 | # mAP score and mean distance 538 | map, mean_distance = compute_map_score_and_mean_distance(self.val_gt, detections) 539 | summary = tf.Summary() 540 | summary_value = summary.value.add() 541 | summary_value.simple_value = map 542 | summary_value.tag = "validation_average_precision" 543 | self.writer.add_summary(summary, global_step=self.epoch) 544 | 545 | summary = tf.Summary() 546 | summary_value = summary.value.add() 547 | summary_value.simple_value = mean_distance 548 | summary_value.tag = "validation_mean_distance" 549 | self.writer.add_summary(summary, global_step=self.epoch) 550 | 551 | # target mAP score 552 | summary = tf.Summary() 553 | summary_value = summary.value.add() 554 | summary_value.simple_value, _ = compute_map_score_and_mean_distance(self.val_gt, target_detections) 555 | summary_value.tag = "target_average_precision" 556 | self.writer.add_summary(summary, global_step=self.epoch) 557 | 558 | self.writer.flush() 559 | 560 | def on_train_end(self, _): 561 | self.writer.close() 562 | 563 | 564 | class ParallelSaveCallback(Callback): 565 | 566 | def __init__(self, model, file): 567 | self.model_to_save = model 568 | self.file_path = file 569 | 570 | def on_epoch_end(self, epoch, logs=None): 571 | self.model_to_save.save(self.file_path + '_%d.h5' % epoch) 572 | -------------------------------------------------------------------------------- /clean.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | echo "Removing models from /sharedfiles" 3 | rm /sharedfiles/models/* 4 | 5 | echo "Removing datasets from /sharedfiles" 6 | rm /sharedfiles/datasets/* 7 | 8 | read -p "Remove TF logs? [y/n]" -n 1 -r 9 | echo # (optional) move to a new line 10 | if [[ $REPLY =~ ^[Yy]$ ]] 11 | then 12 | echo "Removing TF logs" 13 | rm -r ./Graph/* 14 | fi 15 | -------------------------------------------------------------------------------- /cntk-py27.yml: -------------------------------------------------------------------------------- 1 | name: cntk-py27 2 | dependencies: 3 | - pip=8.1.2=py27_0 4 | - python=2.7.11=5 5 | - opencv=3.1.0 6 | - pip: 7 | - lxml==4.2.0 8 | - keras==2.1.5 9 | - pytesseract 10 | -------------------------------------------------------------------------------- /cntk-py35.yml: -------------------------------------------------------------------------------- 1 | name: cntk-py35 2 | dependencies: 3 | - pip=8.1.2=py35_0 4 | - python=3.5.2=0 5 | - opencv=3.1.0 6 | - pip: 7 | - lxml==4.2.0 8 | - keras==2.1.5 9 | - pytesseract 10 | -------------------------------------------------------------------------------- /datasets/cls_dogs_vs_cats.py: -------------------------------------------------------------------------------- 1 | from keras.preprocessing.image import ImageDataGenerator 2 | 3 | class Dataset: 4 | 5 | def __init__(self, batch_size=3, input_dim=150, **kwargs): 6 | local_keys = locals() 7 | self.enable_classification = True 8 | self.enable_boundingbox = False 9 | self.enable_segmentation = False 10 | 11 | #classes 12 | self.classes = [ 'dogs', 'cats' ] 13 | self.num_classes = len(self.classes) 14 | print("Nb classes: " + str(self.num_classes)) 15 | 16 | self.img_h = input_dim 17 | self.img_w = input_dim 18 | self.input_shape = ( self.img_h, self.img_w , 3) 19 | # self.stride_margin = True 20 | train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True) 21 | val_datagen = ImageDataGenerator(rescale=1./255) 22 | test_datagen = 
ImageDataGenerator(rescale=1./255) 23 | 24 | self.train = train_datagen.flow_from_directory('/sharedfiles/dogs_vs_cats/train', 25 | target_size=(self.img_w, self.img_h), 26 | batch_size=batch_size, 27 | class_mode='categorical', classes=self.classes) 28 | 29 | self.val = test_datagen.flow_from_directory('/sharedfiles/dogs_vs_cats/validation', 30 | target_size=(self.img_w, self.img_h), 31 | batch_size=batch_size, 32 | class_mode='categorical', classes=self.classes) 33 | 34 | self.test = test_datagen.flow_from_directory('/sharedfiles/dogs_vs_cats/validation', 35 | target_size=(self.img_w, self.img_h), 36 | batch_size=batch_size, 37 | class_mode='categorical', classes=self.classes) 38 | 39 | # for compatibility 40 | self.gt_test = [] 41 | self.stride_margin = 0 42 | -------------------------------------------------------------------------------- /datasets/cls_dogs_vs_cats.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | SOURCE_DIR=$1 3 | mkdir -p /sharedfiles/dogs_vs_cats/train/dogs 4 | mkdir -p /sharedfiles/dogs_vs_cats/train/cats 5 | mkdir -p /sharedfiles/dogs_vs_cats/validation/dogs 6 | mkdir -p /sharedfiles/dogs_vs_cats/validation/cats 7 | 8 | for i in $(seq 0 9999) ; do cp $SOURCE_DIR/dog.$i.jpg /sharedfiles/dogs_vs_cats/train/dogs/ ; done 9 | for i in $(seq 0 9999) ; do cp $SOURCE_DIR/cat.$i.jpg /sharedfiles/dogs_vs_cats/train/cats/ ; done 10 | for i in $(seq 10000 12499) ; do cp $SOURCE_DIR/cat.$i.jpg /sharedfiles/dogs_vs_cats/validation/cats/ ; done 11 | for i in $(seq 10000 12499) ; do cp $SOURCE_DIR/dog.$i.jpg /sharedfiles/dogs_vs_cats/validation/dogs/ ; done 12 | -------------------------------------------------------------------------------- /datasets/cls_rvl_cdip.py: -------------------------------------------------------------------------------- 1 | from keras.preprocessing.image import ImageDataGenerator 2 | 3 | class Dataset: 4 | 5 | def __init__(self, batch_size=3, input_dim=150, **kwargs): 6 | local_keys = locals() 7 | self.enable_classification = True 8 | self.enable_boundingbox = False 9 | self.enable_segmentation = False 10 | 11 | #classes 12 | self.classes = ["letter","form", "email", "handwritten", "advertisement", \ 13 | "scientific_report", "scientific_publication", "specification", \ 14 | "file_folder", "news_article", "budget", "invoice", \ 15 | "presentation", "questionnaire", "resume", "memo" ] 16 | self.num_classes = len(self.classes) 17 | print("Nb classes: " + str(self.num_classes)) 18 | 19 | self.img_h = input_dim 20 | self.img_w = input_dim 21 | self.input_shape = ( self.img_h, self.img_w , 3) 22 | # self.stride_margin = True 23 | train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True) 24 | val_datagen = ImageDataGenerator(rescale=1./255) 25 | test_datagen = ImageDataGenerator(rescale=1./255) 26 | 27 | self.train = train_datagen.flow_from_directory('/sharedfiles/rvl_cdip/train', 28 | target_size=(self.img_w, self.img_h), 29 | batch_size=batch_size, 30 | class_mode='categorical', classes=self.classes) 31 | 32 | self.val = test_datagen.flow_from_directory('/sharedfiles/rvl_cdip/val', 33 | target_size=(self.img_w, self.img_h), 34 | batch_size=batch_size, 35 | class_mode='categorical', classes=self.classes) 36 | 37 | self.test = test_datagen.flow_from_directory('/sharedfiles/rvl_cdip/test', 38 | target_size=(self.img_w, self.img_h), 39 | batch_size=batch_size, 40 | class_mode='categorical', classes=self.classes) 41 | # for compatibility 42 | self.gt_test = [] 
43 | self.stride_margin = 0 44 | -------------------------------------------------------------------------------- /datasets/cls_rvl_cdip_check.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ls /sharedfiles/rvl_cdip/train/ | wc -l 3 | 4 | for split in train test val 5 | do 6 | for d in /sharedfiles/rvl_cdip/$split/* 7 | do 8 | echo $d 9 | ls $d/ | wc -l 10 | done 11 | done 12 | echo "Train" 13 | find /sharedfiles/rvl_cdip/train/ -type f | wc -l 14 | echo "Val" 15 | find /sharedfiles/rvl_cdip/val/ -type f | wc -l 16 | echo "Test" 17 | find /sharedfiles/rvl_cdip/test/ -type f | wc -l 18 | 19 | 20 | 21 | -------------------------------------------------------------------------------- /datasets/cls_rvl_cdip_convert.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | SOURCE_DIR=$1 3 | 4 | classes=(letter 5 | form 6 | email 7 | handwritten 8 | advertisement 9 | scientific\ report 10 | scientific\ publication 11 | specification 12 | file\ folder 13 | news\ article 14 | budget 15 | invoice 16 | presentation 17 | questionnaire 18 | resume 19 | memo) 20 | 21 | for LABELFILE in $SOURCE_DIR/labels/* 22 | do 23 | split=`basename ${LABELFILE%%.*}` 24 | echo "SPLIT: $split" 25 | TARGET=$SOURCE_DIR/$split 26 | mkdir -p $TARGET 27 | 28 | for c in "${classes[@]}" 29 | do 30 | mkdir $TARGET/${c// /_} 31 | done 32 | 33 | IFS=$'\n' 34 | for next in `cat $LABELFILE` 35 | do 36 | echo " $next" 37 | FILEPATH="$(cut -d' ' -f1 <<< $next)" 38 | LABEL="$(cut -d' ' -f2 <<< $next)" 39 | mv $SOURCE_DIR/images/$FILEPATH $TARGET/${classes[$LABEL]// /_}/$(basename $FILEPATH) 40 | done 41 | done 42 | exit 0 43 | -------------------------------------------------------------------------------- /datasets/cls_tiny_imagenet.py: -------------------------------------------------------------------------------- 1 | from keras.preprocessing.image import ImageDataGenerator 2 | 3 | class Dataset: 4 | 5 | def __init__(self, batch_size=3, input_dim=150, **kwargs): 6 | local_keys = locals() 7 | self.enable_classification = True 8 | self.enable_boundingbox = False 9 | self.enable_segmentation = False 10 | 11 | #classes 12 | self.classes = [ 'n02124075', 'n04067472', 'n04540053', 'n04099969', 'n07749582', 'n01641577', 'n02802426', 'n09246464', 'n07920052', 'n03970156', 'n03891332', 'n02106662', 'n03201208', 'n02279972', 'n02132136', 'n04146614', 'n07873807', 'n02364673', 'n04507155', 'n03854065', 'n03838899', 'n03733131', 'n01443537', 'n07875152', 'n03544143', 'n09428293', 'n03085013', 'n02437312', 'n07614500', 'n03804744', 'n04265275', 'n02963159', 'n02486410', 'n01944390', 'n09256479', 'n02058221', 'n04275548', 'n02321529', 'n02769748', 'n02099712', 'n07695742', 'n02056570', 'n02281406', 'n01774750', 'n02509815', 'n03983396', 'n07753592', 'n04254777', 'n02233338', 'n04008634', 'n02823428', 'n02236044', 'n03393912', 'n07583066', 'n04074963', 'n01629819', 'n09332890', 'n02481823', 'n03902125', 'n03404251', 'n09193705', 'n03637318', 'n04456115', 'n02666196', 'n03796401', 'n02795169', 'n02123045', 'n01855672', 'n01882714', 'n02917067', 'n02988304', 'n04398044', 'n02843684', 'n02423022', 'n02669723', 'n04465501', 'n02165456', 'n03770439', 'n02099601', 'n04486054', 'n02950826', 'n03814639', 'n04259630', 'n03424325', 'n02948072', 'n03179701', 'n03400231', 'n02206856', 'n03160309', 'n01984695', 'n03977966', 'n03584254', 'n04023962', 'n02814860', 'n01910747', 'n04596742', 'n03992509', 'n04133789', 'n03937543', 'n02927161', 'n01945685', 
'n02395406', 'n02125311', 'n03126707', 'n04532106', 'n02268443', 'n02977058', 'n07734744', 'n03599486', 'n04562935', 'n03014705', 'n04251144', 'n04356056', 'n02190166', 'n03670208', 'n02002724', 'n02074367', 'n04285008', 'n04560804', 'n04366367', 'n02403003', 'n07615774', 'n04501370', 'n03026506', 'n02906734', 'n01770393', 'n04597913', 'n03930313', 'n04118538', 'n04179913', 'n04311004', 'n02123394', 'n04070727', 'n02793495', 'n02730930', 'n02094433', 'n04371430', 'n04328186', 'n03649909', 'n04417672', 'n03388043', 'n01774384', 'n02837789', 'n07579787', 'n04399382', 'n02791270', 'n03089624', 'n02814533', 'n04149813', 'n07747607', 'n03355925', 'n01983481', 'n04487081', 'n03250847', 'n03255030', 'n02892201', 'n02883205', 'n03100240', 'n02415577', 'n02480495', 'n01698640', 'n01784675', 'n04376876', 'n03444034', 'n01917289', 'n01950731', 'n03042490', 'n07711569', 'n04532670', 'n03763968', 'n07768694', 'n02999410', 'n03617480', 'n06596364', 'n01768244', 'n02410509', 'n03976657', 'n01742172', 'n03980874', 'n02808440', 'n02226429', 'n02231487', 'n02085620', 'n01644900', 'n02129165', 'n02699494', 'n03837869', 'n02815834', 'n07720875', 'n02788148', 'n02909870', 'n03706229', 'n07871810', 'n03447447', 'n02113799', 'n12267677', 'n03662601', 'n02841315', 'n07715103', 'n02504458' ] 13 | self.num_classes = len(self.classes) 14 | print("Nb classes: " + str(self.num_classes)) 15 | 16 | self.img_h = input_dim 17 | self.img_w = input_dim 18 | self.input_shape = ( self.img_h, self.img_w , 3) 19 | # self.stride_margin = True 20 | train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True) 21 | val_datagen = ImageDataGenerator(rescale=1./255) 22 | test_datagen = ImageDataGenerator(rescale=1./255) 23 | 24 | self.train = train_datagen.flow_from_directory('/sharedfiles/tiny-imagenet-200/train', 25 | target_size=(self.img_w, self.img_h), 26 | batch_size=batch_size, 27 | class_mode='categorical', classes=self.classes) 28 | 29 | self.val = test_datagen.flow_from_directory('/sharedfiles/tiny-imagenet-200/val', 30 | target_size=(self.img_w, self.img_h), 31 | batch_size=batch_size, 32 | class_mode='categorical', classes=self.classes) 33 | 34 | self.test = test_datagen.flow_from_directory('/sharedfiles/tiny-imagenet-200/val', 35 | target_size=(self.img_w, self.img_h), 36 | batch_size=batch_size, 37 | class_mode='categorical', classes=self.classes) 38 | 39 | # for compatibility 40 | self.gt_test = [] 41 | self.stride_margin = 0 42 | -------------------------------------------------------------------------------- /datasets/cls_tiny_imagenet_class_list.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | for next in `cat /sharedfiles/tiny-imagenet-200/wnids.txt` 3 | do 4 | l="$l, '$next'" 5 | echo " $next" 6 | done 7 | echo $l 8 | 9 | exit 0 10 | -------------------------------------------------------------------------------- /datasets/cls_tiny_imagenet_convert.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | SOURCE_DIR=$1 3 | 4 | # create class directories for validation dataset 5 | for c in $SOURCE_DIR/train/* 6 | do 7 | CLASS=`basename $c` 8 | echo $CLASS 9 | mkdir $SOURCE_DIR/val/$CLASS 10 | done 11 | 12 | # copy images at the class directory as asked by Keras Image Flow 13 | IFS=$'\n' 14 | for next in `cat $SOURCE_DIR/val/val_annotations.txt` 15 | do 16 | next=`echo $next | tr -s ' '` 17 | echo " $next" 18 | FILEPATH="$(cut -d' ' -f1 <<< $next)" 19 | LABEL="$(cut -d' ' 
-f2 <<< $next)" 20 | cp $SOURCE_DIR/val/images/$FILEPATH $SOURCE_DIR/val/$LABEL/$FILEPATH 21 | done 22 | 23 | exit 0 24 | -------------------------------------------------------------------------------- /datasets/document.conf: -------------------------------------------------------------------------------- 1 | { 2 | "directory": "/sharedfiles/ocr_documents", 3 | "namespace": "ivalua.xml", 4 | "page_tag": "page", 5 | "char_tag": "char", 6 | "x1_attribute": "x1", 7 | "y1_attribute": "y1", 8 | "x2_attribute": "x2", 9 | "y2_attribute": "y2" 10 | } 11 | -------------------------------------------------------------------------------- /datasets/ocr_documents.py: -------------------------------------------------------------------------------- 1 | import json 2 | import glob 3 | from lxml import etree 4 | import numpy as np 5 | # from scipy import misc 6 | import cv2 7 | from . import load_from_local_file, save_to_local_file, compute_grids, compute_grids_ 8 | import math, os 9 | 10 | class Dataset: 11 | 12 | def __init__(self, name = "", input_dim=700, resize="", layer_offsets = [14], layer_strides = [28], layer_fields=[28], iou_treshold = .3, save=True, **kwargs): 13 | local_keys = locals() 14 | self.enable_classification = False 15 | self.enable_boundingbox = True 16 | self.enable_segmentation = False 17 | 18 | #classes 19 | self.classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", \ 20 | "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", \ 21 | "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", \ 22 | "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "(", ")", "%"] 23 | self.num_classes = len(self.classes) 24 | print("Nb classes: " + str(self.num_classes)) 25 | 26 | self.img_h = input_dim 27 | self.img_w = int(.6 * input_dim) 28 | self.input_shape = ( self.img_h, self.img_w , 1) 29 | self.stride_margin = True 30 | 31 | if load_from_local_file(**local_keys): 32 | return 33 | 34 | with open('datasets/document.conf') as config_file: 35 | config = json.load(config_file) 36 | 37 | xml_all_files = glob.glob(config["directory"] + "/*.xml") 38 | # xml_all_files = xml_all_files[:100] 39 | num_files = len(xml_all_files) 40 | num_train = int(0.9 * num_files) 41 | print("{} files in OCR dataset, split into TRAIN 90% and VAL 10%".format(num_files)) 42 | xml_train_files = xml_all_files[0:num_train] 43 | xml_test_files = xml_all_files[num_train:] 44 | 45 | def create_dataset(xml_files, name): 46 | nb_images = len(xml_files) 47 | print("{} files in {} dataset".format(nb_images, name)) 48 | groundtruth = [] 49 | tiles = np.ones( [nb_images, self.img_h , self.img_w, 1], dtype = 'float32') 50 | ns = {'d': config["namespace"]} 51 | i = 0 52 | for xml_file in xml_files: 53 | if i >= nb_images: 54 | break 55 | root = etree.parse( xml_file ) 56 | print("{}/{} - {}".format(i, nb_images, xml_file)) 57 | 58 | pages = root.findall(".//d:" + config["page_tag"], ns) 59 | for p, page in enumerate(pages): 60 | if i >= nb_images: 61 | break 62 | 63 | prefix = "" 64 | if len(pages) > 1: 65 | prefix = "-" + str(p) 66 | img_path = xml_file[:-4] + prefix + ".jpg" 67 | image = cv2.imread(img_path, 0) 68 | 69 | if (image is None) or (image.shape != (int(page.get("height")), int(page.get("width")))) : 70 | print("Read Error " + img_path) 71 | continue 72 | 73 | image = image / 255. 74 | 75 | f = 1. 
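# when a resize target is given, the block below computes a single scale factor
# f = resize / max(height, width), so the longer page side matches the requested size;
# the same factor f is applied further down to every character box coordinate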
76 | if resize != "": 77 | r1 = int(resize) / image.shape[0] 78 | r2 = int(resize) / image.shape[1] 79 | f = min(r1, r2) 80 | image = cv2.resize(image, None, fx=f, fy=f, interpolation=cv2.INTER_NEAREST) 81 | print(image.shape) 82 | 83 | # find a good random crop 84 | e = 0 85 | while e < 100: 86 | e = e +1 87 | if image.shape[1] > self.img_w: 88 | x_ = np.random.choice(image.shape[1] - self.img_w) 89 | w_ = self.img_w 90 | else: 91 | x_ = 0 92 | w_ = image.shape[1] 93 | if image.shape[0] > self.img_h: 94 | y_ = np.random.choice(image.shape[0] - self.img_h) 95 | h_ = self.img_h 96 | else: 97 | y_ = 0 98 | h_ = image.shape[0] 99 | chars = page.findall(".//d:" + config["char_tag"], ns) 100 | nb_chars = 0 101 | for c in chars: 102 | x1 = float(c.get(config["x1_attribute"])) * f - x_ 103 | y1 = float(c.get(config["y1_attribute"])) * f - y_ 104 | x2 = float(c.get(config["x2_attribute"])) * f - x_ 105 | y2 = float(c.get(config["y2_attribute"])) * f - y_ 106 | if (x1 > 0) and (x2 < w_) and (y1 > 0) and (y2 < h_) : 107 | nb_chars = nb_chars + 1 108 | if nb_chars > 10: 109 | break 110 | 111 | tiles[i, :h_, :w_, 0] = image[y_:y_+h_, x_:x_+w_] 112 | 113 | chars = page.findall(".//d:" + config["char_tag"], ns) 114 | for c in chars: 115 | x1 = float(c.get(config["x1_attribute"])) * f - x_ 116 | y1 = float(c.get(config["y1_attribute"])) * f - y_ 117 | x2 = float(c.get(config["x2_attribute"])) * f - x_ 118 | y2 = float(c.get(config["y2_attribute"])) * f - y_ 119 | if (x1 < 0) or (x2 > w_) or (y1 < 0) or (y2 > h_) or ( min(y2 - y1, x2 - x1) <= 0.0 ): # or ( max(x2 - x1, y2 - y1) < (layer_fields[0] - layer_strides[0]) / 2 ) 120 | continue 121 | # discard too small chars 122 | # if max(x2 - x1, y2 - y1) < 7: 123 | # continue 124 | if (c.text in self.classes): 125 | groundtruth.append((i, y1, x1, y2 - y1, x2 - x1, self.classes.index(c.text))) 126 | 127 | i = i + 1 128 | 129 | grids = compute_grids_(0, nb_images, groundtruth, layer_offsets, layer_strides, layer_fields, self.input_shape, self.stride_margin, iou_treshold, self.num_classes) 130 | return tiles, grids, np.array(groundtruth) 131 | 132 | x_train, y_train, gt_train = create_dataset(xml_train_files, "TRAIN") 133 | x_test, y_test, gt_test = create_dataset(xml_test_files, "TEST") 134 | 135 | self.x_train = x_train 136 | self.y_train = y_train 137 | self.gt_train = gt_train 138 | self.x_test = x_test 139 | self.y_test = y_test 140 | self.gt_test = gt_test 141 | 142 | save_to_local_file(**local_keys) 143 | -------------------------------------------------------------------------------- /datasets/ocr_documents_generator.py: -------------------------------------------------------------------------------- 1 | import json 2 | import glob 3 | from lxml import etree 4 | import numpy as np 5 | import cv2 6 | import math, os 7 | import random 8 | import gc 9 | from . 
import compute_grids, compute_grids_ 10 | from keras.utils.data_utils import Sequence 11 | 12 | class FlowGenerator(Sequence): 13 | def __init__(self, config, files, name, layer_offsets, layer_strides, layer_fields, resize, classes, target_size=(150, 150), batch_size=1, iou_treshold = .3, stride_margin= True): #rescale=1./255, shear_range=0.2, zoom_range=0.2, brightness=0.1, rotation=5.0, zoom=0.1 14 | 15 | self.config = config 16 | self.file_path_list = files 17 | self.name = name 18 | self.layer_offsets, self.layer_strides, self.layer_fields = layer_offsets, layer_strides, layer_fields 19 | self.classes = classes 20 | self.num_classes = len(classes) 21 | self.resize = resize 22 | self.img_h = target_size[0] 23 | self.img_w = target_size[1] 24 | self.batch_size = batch_size 25 | self.iou_treshold = iou_treshold 26 | self.stride_margin = stride_margin 27 | # self.brightness = brightness 28 | # self.rotation = rotation 29 | # self.zoom = zoom 30 | 31 | 32 | def __len__(self): 33 | return len(self.file_path_list) // self.batch_size 34 | 35 | def __getitem__(self, i): 36 | X = np.zeros((self.batch_size, self.img_h, self.img_w, 1), dtype='float32') 37 | groundtruth = [] # for mAP score computation and verification 38 | 39 | ns = {'d': self.config["namespace"]} 40 | for n, xml_file in enumerate(self.file_path_list[i*self.batch_size:(i+1)*self.batch_size]): 41 | # print(self.name, i, n, "/", self.batch_size, xml_file) 42 | root = etree.parse( xml_file ) 43 | pages = root.findall(".//d:" + self.config["page_tag"], ns) 44 | # for p, page in enumerate(pages): 45 | p =0 46 | page = pages[0] 47 | page_size = (int(page.get("height")), int(page.get("width"))) 48 | prefix = "" 49 | if len(pages) > 1: 50 | prefix = "-" + str(p) 51 | img_path = xml_file[:-4] + prefix + ".jpg" 52 | image = cv2.imread(img_path, 0) 53 | if (image is None) or (image.shape != page_size) : 54 | print("Read Error " + img_path) 55 | continue 56 | image = image / 255. 57 | 58 | f = 1. 
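# same page preparation as in ocr_documents.Dataset: optionally rescale by
# f = resize / max(height, width), then retry up to 100 random crops (loop below)
# until one contains more than 10 character boxes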
59 | if self.resize != "": 60 | r1 = int(self.resize) / image.shape[0] 61 | r2 = int(self.resize) / image.shape[1] 62 | f = min(r1, r2) 63 | image = cv2.resize(image, None, fx=f, fy=f, interpolation=cv2.INTER_NEAREST) 64 | 65 | # find a good random crop 66 | e = 0 67 | while e < 100: 68 | e = e +1 69 | if image.shape[1] > self.img_w: 70 | x_ = np.random.choice(image.shape[1] - self.img_w) 71 | w_ = self.img_w 72 | else: 73 | x_ = 0 74 | w_ = image.shape[1] 75 | if image.shape[0] > self.img_h: 76 | y_ = np.random.choice(image.shape[0] - self.img_h) 77 | h_ = self.img_h 78 | else: 79 | y_ = 0 80 | h_ = image.shape[0] 81 | chars = page.findall(".//d:" + self.config["char_tag"], ns) 82 | nb_chars = 0 83 | for c in chars: 84 | x1 = float(c.get(self.config["x1_attribute"])) * f - x_ 85 | y1 = float(c.get(self.config["y1_attribute"])) * f - y_ 86 | x2 = float(c.get(self.config["x2_attribute"])) * f - x_ 87 | y2 = float(c.get(self.config["y2_attribute"])) * f - y_ 88 | if (x1 > 0) and (x2 < w_) and (y1 > 0) and (y2 < h_) : 89 | nb_chars = nb_chars + 1 90 | if nb_chars > 10: 91 | break 92 | 93 | X[n, :h_, :w_ , 0] = image[y_:y_+h_, x_:x_+w_] 94 | 95 | chars = page.findall(".//d:" + self.config["char_tag"], ns) 96 | # print(" Nb chars:", len(chars)) 97 | for c in chars: 98 | x1 = int(float(c.get(self.config["x1_attribute"])) * f - x_) 99 | y1 = int(float(c.get(self.config["y1_attribute"])) * f - y_) 100 | x2 = int(float(c.get(self.config["x2_attribute"])) * f - x_) 101 | y2 = int(float(c.get(self.config["y2_attribute"])) * f - y_) 102 | if (x1 < 0) or (x2 > w_) or (y1 < 0) or (y2 > h_) or ( min(y2 - y1, x2 - x1) <= 0.0 ): # or ( max(x2 - x1, y2 - y1) < (self.layer_fields[0] - self.layer_strides[0]) / 2 ): 103 | continue 104 | # discard too small chars 105 | # if max(x2 - x1, y2 - y1) < 7: 106 | # continue 107 | if c.text in self.classes: 108 | groundtruth.append((i *self.batch_size + n, y1, x1, y2 - y1, x2 - x1, self.classes.index(c.text))) 109 | 110 | grids = compute_grids_(i *self.batch_size, self.batch_size, groundtruth, self.layer_offsets, self.layer_strides, self.layer_fields, (self.img_h, self.img_w), self.stride_margin, self.iou_treshold, self.num_classes) 111 | 112 | return X, grids #, np.array(groundtruth) 113 | 114 | def on_epoch_end(self): 115 | # Shuffle dataset for next epoch 116 | random.shuffle(self.file_path_list) 117 | # Fix memory leak (Keras bug) 118 | gc.collect() 119 | 120 | 121 | 122 | class Dataset: 123 | 124 | def __init__(self, name = "", batch_size=1, input_dim=1000, resize="", layer_offsets = [14], layer_strides = [28], layer_fields=[28], iou_treshold = .3, **kwargs): 125 | local_keys = locals() 126 | self.enable_classification = False 127 | self.enable_boundingbox = True 128 | self.enable_segmentation = False 129 | 130 | #classes 131 | self.classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", \ 132 | "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", \ 133 | "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", \ 134 | "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "(", ")", "%"] 135 | self.num_classes = len(self.classes) 136 | print("Nb classes: " + str(self.num_classes)) 137 | 138 | self.img_h = input_dim 139 | self.img_w = int(.7 * input_dim) 140 | self.input_shape = ( self.img_h, self.img_w , 1) 141 | self.stride_margin = True 142 | 143 | with open('datasets/document.conf') as config_file: 144 | config = json.load(config_file) 145 | xml_all_files = 
glob.glob(config["directory"] + "/*.xml") 146 | num_files = len(xml_all_files) 147 | num_train = int(0.9 * num_files) 148 | print("{} files in OCR dataset, split into TRAIN 90% and VAL 10%".format(num_files)) 149 | xml_train_files = xml_all_files[0:num_train] 150 | xml_test_files = xml_all_files[num_train:] 151 | 152 | self.train = FlowGenerator(config, xml_train_files, "TRAIN", layer_offsets, layer_strides, layer_fields, resize, self.classes, target_size=(self.img_h, self.img_w), 153 | batch_size=batch_size, iou_treshold = iou_treshold, stride_margin= self.stride_margin) #, rescale=1./255, shear_range=0.2, zoom_range=0.2 154 | self.val = FlowGenerator(config, xml_test_files, "VAL", layer_offsets, layer_strides, layer_fields, resize, self.classes, target_size=(self.img_h, self.img_w), 155 | batch_size=batch_size, iou_treshold = iou_treshold, stride_margin= self.stride_margin) #, rescale=1./255 156 | self.test = FlowGenerator(config, xml_test_files, "TEST", layer_offsets, layer_strides, layer_fields, resize, self.classes, target_size=(self.img_h, self.img_w), 157 | batch_size=batch_size, iou_treshold = iou_treshold, stride_margin= self.stride_margin) #, rescale=1./255 158 | 159 | # for compatibility 160 | self.gt_test = [] 161 | self.stride_margin = 0 162 | -------------------------------------------------------------------------------- /datasets/ocr_documents_preprocess.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import pytesseract 3 | import json 4 | import glob 5 | from lxml import etree 6 | import time 7 | import lxml.builder 8 | 9 | with open('datasets/document.conf') as config_file: 10 | config = json.load(config_file) 11 | 12 | pdf_files = glob.glob(config["directory"] +"/*.jpg") 13 | for i, filename in enumerate(pdf_files): 14 | start = time.time() 15 | # read the image and get the dimensions 16 | img = cv2.imread(filename) 17 | h, w, _ = img.shape # assumes color image 18 | 19 | # run tesseract, returning the bounding boxes 20 | boxes = pytesseract.image_to_boxes(img) # also include any config options you use 21 | 22 | root = etree.Element("root", nsmap={None : config["namespace"]}) 23 | p = etree.Element(config["page_tag"], height=str(h), width=str(w)) 24 | root.append( p ) 25 | 26 | 27 | # draw the bounding boxes on the image 28 | for b in boxes.splitlines(): 29 | b = b.split(' ') 30 | # img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2) 31 | c = etree.Element(config["char_tag"]) 32 | c.attrib["x1"] = str( min(int(b[1]), int(b[3])) ) 33 | c.attrib["y1"] = str( min( h - int(b[2]), h - int(b[4])) ) 34 | c.attrib["x2"] = str( max(int(b[1]), int(b[3])) ) 35 | c.attrib["y2"] = str( max( h - int(b[2]), h - int(b[4])) ) 36 | c.text = b[0] 37 | p.append( c ) 38 | 39 | print(filename[:-4] + ".xml", time.time() - start) 40 | etree.ElementTree(root).write(filename[:-4] + ".xml", pretty_print=True, xml_declaration=True, encoding="utf-8") 41 | # print(etree.tostring(root, pretty_print=True)) 42 | # cv2.imwrite(str(i) + ".jpg", img) 43 | -------------------------------------------------------------------------------- /datasets/ocr_documents_statistics.py: -------------------------------------------------------------------------------- 1 | import glob 2 | from lxml import etree 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | import json 6 | 7 | with open('datasets/document.conf') as config_file: 8 | config = json.load(config_file) 9 | 10 | xml_files = 
glob.glob(config["directory"] + "/*.xml") 11 | num_files = len(xml_files) 12 | print("{} files in dataset".format(num_files)) 13 | widths, heights = [], [] 14 | ns = {'d': config["namespace"]} 15 | i = 0 16 | for xml_file in xml_files: 17 | print(xml_file) 18 | root = etree.parse(xml_file) 19 | 20 | page = root.find(".//d:" + config["page_tag"], ns) 21 | page_size = [page.get("height"), page.get("width")] 22 | 23 | chars = root.findall(".//d:" + config["char_tag"], ns) 24 | for c in chars: 25 | widths.append( int(c.get(config["x2_attribute"])) - int(c.get(config["x1_attribute"])) ) 26 | heights.append( int(c.get(config["y2_attribute"])) - int(c.get(config["y1_attribute"])) ) 27 | 28 | i = i + 1 29 | 30 | print(np.histogram(np.asarray(widths), bins='auto')) 31 | print(np.histogram(np.asarray(heights), bins='auto')) 32 | plt.hist(np.asarray(widths), bins='auto') 33 | plt.show() 34 | plt.hist(np.asarray(heights), bins='auto') 35 | plt.show() 36 | -------------------------------------------------------------------------------- /datasets/ocr_mnist.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import keras 3 | from keras.datasets import mnist 4 | import numpy as np 5 | import os 6 | from skimage import transform 7 | from . import compute_grids, compute_grids_, compute_grids_local, load_from_local_file, save_to_local_file 8 | import math 9 | 10 | class Dataset: 11 | 12 | def __init__(self, name, layer_offsets = [14, 28], layer_strides = [28, 56], layer_fields=[28, 56], 13 | input_dim=700, resize="", white_prob = 0., bb_positive="iou-treshold" , iou_treshold = .3, save=True, noise=False, **kwargs): 14 | local_keys = locals() 15 | self.enable_classification = False 16 | self.enable_boundingbox = True 17 | self.enable_segmentation = False 18 | 19 | if resize == "": 20 | digit_dim = [["28"]] 21 | else: 22 | digit_dim = [r.split("-") for r in resize.split(",")] 23 | 24 | assert len(layer_offsets) == len(digit_dim), "Number of layers in network do not match number of digit scales" 25 | 26 | self.img_h = int(input_dim) 27 | self.img_w = int(input_dim * .6) 28 | self.input_shape = ( self.img_h, self.img_w , 1) 29 | 30 | grid_dim = int(digit_dim[0][-1]) 31 | nb_images_y = self.img_h // grid_dim 32 | nb_images_x = self.img_w // grid_dim 33 | 34 | self.num_classes = 10 35 | self.classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"] 36 | self.stride_margin= False 37 | 38 | if load_from_local_file(**local_keys): 39 | return 40 | 41 | (x_train, y_train), (x_test, y_test) = mnist.load_data() 42 | 43 | x_train = x_train.reshape(x_train.shape[0], 28, 28, 1) 44 | x_test = x_test.reshape(x_test.shape[0], 28, 28, 1) 45 | 46 | x_train = x_train.astype('float32') 47 | x_test = x_test.astype('float32') 48 | x_train /= 255 49 | x_test /= 255 50 | 51 | y_train = keras.utils.to_categorical(y_train, 10) 52 | y_test = keras.utils.to_categorical(y_test, 10) 53 | 54 | if noise: 55 | NUM_DISTORTIONS_DB = 100000 56 | num_distortions=80 57 | distortions = [] 58 | dist_size = (9, 9) 59 | all_digits = x_train.reshape([-1, 28, 28]) 60 | num_digits = all_digits.shape[0] 61 | for i in range(NUM_DISTORTIONS_DB): 62 | rand_digit = np.random.randint(num_digits) 63 | rand_x = np.random.randint(28-dist_size[1]) 64 | rand_y = np.random.randint(28-dist_size[0]) 65 | 66 | digit = all_digits[rand_digit] 67 | distortion = digit[rand_y:rand_y + dist_size[0], 68 | rand_x:rand_x + dist_size[1]] 69 | assert distortion.shape ==
dist_size 70 | #plt.imshow(distortion, cmap='gray') 71 | #plt.show() 72 | distortions += [distortion] 73 | print("Created distortions") 74 | 75 | def add_distortions(image): 76 | canvas = np.zeros_like(image) 77 | for i in range(num_distortions): 78 | rand_distortion = distortions[np.random.randint(NUM_DISTORTIONS_DB)] 79 | rand_x = np.random.randint(image.shape[1]-dist_size[1]) 80 | rand_y = np.random.randint(image.shape[0]-dist_size[0]) 81 | canvas[rand_y:rand_y+dist_size[0], 82 | rand_x:rand_x+dist_size[1], 0] = - rand_distortion 83 | canvas += image 84 | return np.clip(canvas, 0, 1) 85 | 86 | 87 | def create_tile(x, y): 88 | total_digits = x.shape[0] 89 | nb_images = int(total_digits / nb_images_x / nb_images_y / (1-white_prob) ) 90 | tiles = np.ones( [nb_images, self.img_h , self.img_w, x.shape[3]], dtype = 'float32') 91 | occupations = np.zeros( [nb_images, self.img_h , self.img_w, 1], dtype = 'float32') 92 | groundtruth = [] # for mAP score computation and verification 93 | 94 | i = 0 95 | for tile in range(nb_images): 96 | for s in reversed(range(len(layer_offsets))): 97 | nb_samples = 0 98 | img_dim = digit_dim[s] 99 | anchor_dim = layer_fields[s] 100 | while nb_samples < (1. - white_prob) * nb_images_x * nb_images_y / len(digit_dim) * grid_dim / int(img_dim[-1]): 101 | # pick a random row, col on the scale grid 102 | row = np.random.choice(nb_images_y ) 103 | col = np.random.choice(nb_images_x ) 104 | if len(img_dim) > 1: 105 | dim = int(math.ceil(int(img_dim[0]) + np.random.rand() * (int(img_dim[1]) - int(img_dim[0])) )) 106 | else: 107 | dim = int(img_dim[0]) 108 | xc = (col + .5) * grid_dim 109 | yc = (row + .5) * grid_dim 110 | x_ = int(xc - dim/2) 111 | y_ = int(yc - dim/2) 112 | x_range = slice(x_, x_ + dim) 113 | y_range = slice(y_, y_ + dim) 114 | if (x_ < 0) or (y_ < 0) or (x_ + dim > self.img_w) or (y_ + dim > self.img_h): 115 | continue 116 | 117 | # if position available add it 118 | if np.sum(occupations[ tile, y_range, x_range, 0 ]) == 0.: 119 | resized_x = transform.resize(x[i], (dim, dim), mode='constant') 120 | tiles[ tile, y_range, x_range, ...] = 1.0 - resized_x # change for white background 121 | groundtruth.append((tile, y_, x_, dim, dim, np.argmax(y[i]))) 122 | occupations[ tile, y_range, x_range, ...] 
= 1.0 123 | i = (i + 1) % total_digits 124 | nb_samples = nb_samples + 1 125 | 126 | if noise: 127 | tiles[ tile ] = add_distortions(tiles[ tile ]) 128 | 129 | import time 130 | now = time.time() 131 | grids = compute_grids(0, nb_images, groundtruth, layer_offsets, layer_strides, layer_fields, self.input_shape, self.stride_margin, iou_treshold, self.num_classes, bb_positive="iou-treshold") 132 | if False: # timing eval 133 | t1 = time.time() -now 134 | now = time.time() 135 | grids2 = compute_grids_local(0, nb_images, groundtruth, layer_offsets, layer_strides, layer_fields, self.input_shape, self.stride_margin, iou_treshold, self.num_classes) 136 | print("grids", t1) 137 | print("grids2", time.time() -now) 138 | print("is nan", np.isnan(grids2[0].min())) 139 | for s in range(len(grids)): 140 | for l in range(grids[s].shape[-1]): 141 | print(np.allclose(grids[s][...,l],grids2[s][...,l])) 142 | print(np.allclose(np.argmax(grids[0][...,:10], axis=3), np.argmax(grids2[0][...,:10], axis=3))) 143 | 144 | return tiles, grids, np.array(groundtruth) 145 | 146 | x_train, y_train, gt_train = create_tile(x_train, y_train) 147 | x_test, y_test, gt_test = create_tile(x_test, y_test) 148 | 149 | self.x_train = x_train 150 | self.y_train = y_train 151 | self.gt_train = gt_train 152 | self.x_test = x_test 153 | self.y_test = y_test 154 | self.gt_test = gt_test 155 | 156 | save_to_local_file(**local_keys) 157 | 158 | # from skimage.io import imsave 159 | # import os 160 | # if not os.path.exists("logs"): #args.logs 161 | # os.mkdir("logs") 162 | # image_id = 1 163 | # image = (x_train[image_id, :, :, 0] * 255.).astype(np.uint8) 164 | # imsave(os.path.join("logs", str(image_id) + '_input.png'), image) 165 | -------------------------------------------------------------------------------- /images/Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf -------------------------------------------------------------------------------- /images/res1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res1.png -------------------------------------------------------------------------------- /images/res2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res2.png -------------------------------------------------------------------------------- /images/res3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res3.png -------------------------------------------------------------------------------- /images/res4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res4.png -------------------------------------------------------------------------------- /images/res5.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res5.png -------------------------------------------------------------------------------- /keras-tf-py27.yml: -------------------------------------------------------------------------------- 1 | name: keras-tf-py27 2 | dependencies: 3 | - pip=8.1.2=py27_0 4 | - python=2.7.11=5 5 | - h5py 6 | - hdf5 7 | - pip: 8 | - numpy==1.14.2 9 | - scikit-image==0.13.1 10 | - lxml==4.2.0 11 | - keras==2.1.5 12 | - opencv-python 13 | - "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp27-none-linux_x86_64.whl" 14 | - pytesseract 15 | -------------------------------------------------------------------------------- /keras-tf-py35.yml: -------------------------------------------------------------------------------- 1 | name: keras-tf-py35 2 | dependencies: 3 | - pip=8.1.2=py35_0 4 | - python=3.5.2=0 5 | - h5py 6 | - hdf5 7 | - pip: 8 | - numpy==1.14.2 9 | - scikit-image==0.13.1 10 | - keras==2.1.5 11 | - lxml==4.2.0 12 | - opencv-python 13 | - "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp35-cp35m-linux_x86_64.whl" 14 | - pytesseract 15 | -------------------------------------------------------------------------------- /models/CNN_C128_C256_M2_C256_C256_M2_C512_D_2.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale] 14 | self.offsets = [14, 28] 15 | self.fields = [28, 56] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | # stage 1 22 | model = Sequential() # 28, 28, 1 23 | model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', 24 | input_shape=input_shape, padding='valid')) # 28, 28, 1 25 | model.add(Conv2D(256, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 27 | s1 = model(image_input) 28 | 29 | # stage 2 30 | s2 = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='valid')(s1) 31 | s2 = Conv2D(256, (3, 3), activation='relu', padding='valid')(s2) 32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2) 33 | 34 | # output 1 35 | f1 = Dropout(0.25)(s1) 36 | f1 = Conv2D(512, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale), 37 | padding="valid", activation='relu')(f1) 38 | f1 = Dropout(0.5)(f1) 39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1))) 40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 43 | 44 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4]) 45 | 46 | # output 2 47 | f2 = Dropout(0.25)(s2) 48 | f2 = Conv2D(512, kernel_size=(11,11), 
strides=(self.stride_scale,self.stride_scale), 49 | padding="valid", activation='relu')(f2) 50 | f2 = Dropout(0.5)(f2) 51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2))) 52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 55 | 56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4]) 57 | 58 | # print(model.summary()) 59 | # if K._BACKEND=='tensorflow': 60 | # for layer in model.layers: 61 | # print(layer.get_output_at(0).get_shape().as_list()) 62 | self.model = Model(image_input, [output1, output2]) 63 | self.model.strides = self.strides 64 | self.model.offsets = self.offsets 65 | self.model.fields = self.fields 66 | return self.model 67 | -------------------------------------------------------------------------------- /models/CNN_C128_C256_M2_C512_D.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [2 * self.stride_scale] 14 | self.offsets = [14] 15 | self.fields = [28] 16 | 17 | def build(self, input_shape, num_classes): 18 | image_input = Input(shape=input_shape, name='image_input') 19 | 20 | model = Sequential() # 28, 28, 1 21 | model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', 22 | input_shape=input_shape, padding='valid')) # 28, 28, 1 23 | model.add(Conv2D(256, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 24 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 25 | model.add(Dropout(0.25)) 26 | model.add(Conv2D(512, kernel_size=(12,12), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu')) 27 | model.add(Dropout(0.5)) 28 | 29 | features = model(image_input) 30 | 31 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 32 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 33 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 34 | 35 | output = concatenate([output1, output2, output3], name="output") 36 | 37 | # print(model.summary()) 38 | # if K._BACKEND=='tensorflow': 39 | # for layer in model.layers: 40 | # print(layer.get_output_at(0).get_shape().as_list()) 41 | self.model = Model(image_input, output) 42 | self.model.strides = self.strides 43 | self.model.offsets = self.offsets 44 | self.model.fields = self.fields 45 | return self.model 46 | -------------------------------------------------------------------------------- /models/CNN_C32_C64_C128_C.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, 
MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [self.stride_scale] 14 | self.offsets = [7] 15 | self.fields = [14] 16 | 17 | def build(self, input_shape, num_classes): 18 | image_input = Input(shape=input_shape, name='image_input') 19 | 20 | model = Sequential() # 28, 28, 1 21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 22 | input_shape=input_shape, padding='valid')) # 28, 28, 1 23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 24 | model.add(Dropout(0.25)) 25 | model.add(Conv2D(128, kernel_size=(10,10), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu')) 26 | model.add(Dropout(0.5)) 27 | model.add(Conv2D(128, kernel_size=(1,1), padding="valid", activation='relu')) 28 | model.add(Conv2D(64, kernel_size=(1,1), padding="valid", activation='relu')) 29 | 30 | features = model(image_input) 31 | 32 | output1 = Dense(num_classes, activation = "softmax")(features) 33 | output2 = Dense(1, activation = "sigmoid")(features) 34 | output3 = Dense(2, activation = "tanh")(features) 35 | output4 = Dense(2, activation = "sigmoid")(features) 36 | 37 | output = concatenate([output1, output2, output3, output4], name="output") 38 | 39 | # print(model.summary()) 40 | # if K._BACKEND=='tensorflow': 41 | # for layer in model.layers: 42 | # print(layer.get_output_at(0).get_shape().as_list()) 43 | self.model = Model(image_input, output) 44 | self.model.strides = self.strides 45 | self.model.offsets = self.offsets 46 | self.model.fields = self.fields 47 | return self.model 48 | -------------------------------------------------------------------------------- /models/CNN_C32_C64_C128_C2.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [self.stride_scale] 14 | self.offsets = [7] 15 | self.fields = [14] 16 | 17 | def build(self, input_shape, num_classes): 18 | image_input = Input(shape=input_shape, name='image_input') 19 | 20 | model = Sequential() # 28, 28, 1 21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 22 | input_shape=input_shape, padding='valid')) # 28, 28, 1 23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 24 | model.add(Dropout(0.25)) 25 | model.add(Conv2D(128, kernel_size=(10,10), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu')) 26 | model.add(Dropout(0.5)) 27 | model.add(Conv2D(256, kernel_size=(1,1), padding="valid", activation='relu')) 28 | model.add(Conv2D(128, kernel_size=(1,1), padding="valid", activation='relu')) 29 | 30 | features = model(image_input) 31 | 32 | output1 = Dense(num_classes, activation = "softmax")(features) 33 | output2 = Dense(1, activation = "sigmoid")(features) 34 | output3 = Dense(2, activation = "tanh")(features) 35 | output4 = Dense(2, activation = "sigmoid")(features) 36 | 37 | output = concatenate([output1, output2, output3, output4], name="output") 38 | 39 | # print(model.summary()) 40 | # if 
K._BACKEND=='tensorflow': 41 | # for layer in model.layers: 42 | # print(layer.get_output_at(0).get_shape().as_list()) 43 | self.model = Model(image_input, output) 44 | self.model.strides = self.strides 45 | self.model.offsets = self.offsets 46 | self.model.fields = self.fields 47 | return self.model 48 | -------------------------------------------------------------------------------- /models/CNN_C32_C64_C128_D.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [self.stride_scale] 14 | self.offsets = [7] 15 | self.fields = [14] 16 | 17 | def build(self, input_shape, num_classes): 18 | image_input = Input(shape=input_shape, name='image_input') 19 | 20 | model = Sequential() # 28, 28, 1 21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 22 | input_shape=input_shape, padding='valid')) # 28, 28, 1 23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 24 | model.add(Dropout(0.25)) 25 | model.add(Conv2D(128, kernel_size=(10,10), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu')) 26 | model.add(Dropout(0.5)) 27 | 28 | features = model(image_input) 29 | 30 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 31 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 32 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 33 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 34 | 35 | output = concatenate([output1, output2, output3, output4], name="output") 36 | 37 | # print(model.summary()) 38 | # if K._BACKEND=='tensorflow': 39 | # for layer in model.layers: 40 | # print(layer.get_output_at(0).get_shape().as_list()) 41 | self.model = Model(image_input, output) 42 | self.model.strides = self.strides 43 | self.model.offsets = self.offsets 44 | self.model.fields = self.fields 45 | return self.model 46 | -------------------------------------------------------------------------------- /models/CNN_C32_C64_C64_Cd64_C128_D.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 6 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [self.stride_scale] 14 | self.offsets = [14] 15 | self.fields = [28] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | model = Sequential() # 28, 28, 1 22 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 23 | input_shape=input_shape, padding='valid')) # 28, 28, 1 24 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 25 | model.add(Conv2D(64, 
kernel_size=(3, 3), activation='relu', padding='valid')) # 28, 28, 1 26 | model.add(Conv2D(64, (5, 5), activation='relu', dilation_rate=(2, 2), padding='valid')) # 28, 28, 1 27 | model.add(Dropout(0.25)) 28 | model.add(Conv2D(128, kernel_size=(14,14), strides=(self.stride_scale,self.stride_scale), 29 | padding="valid", activation='relu')) 30 | model.add(Dropout(0.5)) 31 | 32 | features = model(image_input) 33 | 34 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 35 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 36 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 37 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 38 | 39 | output = concatenate([output1, output2, output3, output4], name="output") 40 | 41 | # print(model.summary()) 42 | # if K._BACKEND=='tensorflow': 43 | # for layer in model.layers: 44 | # print(layer.get_output_at(0).get_shape().as_list()) 45 | self.model = Model(image_input, output) 46 | self.model.strides = self.strides 47 | self.model.offsets = self.offsets 48 | self.model.fields = self.fields 49 | return self.model 50 | -------------------------------------------------------------------------------- /models/CNN_C32_C64_M2_C128_D.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [2 * self.stride_scale] 14 | self.offsets = [14] 15 | self.fields = [28] 16 | 17 | def build(self, input_shape, num_classes): 18 | image_input = Input(shape=input_shape, name='image_input') 19 | 20 | model = Sequential() # 28, 28, 1 21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 22 | input_shape=input_shape, padding='valid')) # 28, 28, 1 23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 24 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 25 | model.add(Dropout(0.25)) 26 | model.add(Conv2D(128, kernel_size=(12,12), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu')) 27 | model.add(Dropout(0.5)) 28 | 29 | features = model(image_input) 30 | 31 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 32 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 33 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 34 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 35 | 36 | output = concatenate([output1, output2, output3, output4], name="output") 37 | 38 | # print(model.summary()) 39 | # if K._BACKEND=='tensorflow': 40 | # for layer in model.layers: 41 | # print(layer.get_output_at(0).get_shape().as_list()) 42 | self.model = Model(image_input, output) 43 | self.model.strides = self.strides 44 | 
self.model.offsets = self.offsets 45 | self.model.fields = self.fields 46 | return self.model 47 | -------------------------------------------------------------------------------- /models/CNN_C32_C64_M2_C64_C64_M2_C128_D.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [4 * self.stride_scale] 14 | self.offsets = [28] 15 | self.fields = [56] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | model = Sequential() # 28, 28, 1 22 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 23 | input_shape=input_shape, padding='valid')) # 28, 28, 1 24 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 25 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 26 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')) # 28, 28, 1 27 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 28 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 29 | model.add(Dropout(0.25)) 30 | model.add(Conv2D(128, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale), 31 | padding="valid", activation='relu')) 32 | model.add(Dropout(0.5)) 33 | 34 | features = model(image_input) 35 | 36 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 37 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 38 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 39 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 40 | 41 | output = concatenate([output1, output2, output3, output4], name="output") 42 | 43 | # print(model.summary()) 44 | # if K._BACKEND=='tensorflow': 45 | # for layer in model.layers: 46 | # print(layer.get_output_at(0).get_shape().as_list()) 47 | self.model = Model(image_input, output) 48 | self.model.strides = self.strides 49 | self.model.offsets = self.offsets 50 | self.model.fields = self.fields 51 | return self.model 52 | -------------------------------------------------------------------------------- /models/CNN_C32_C64_M2_C64_C64_M2_C128_D_2.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale] 14 | self.offsets = [14, 28] 15 | self.fields = [28, 56] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | # stage 1 22 | model = Sequential() # 28, 28, 1 
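# stage 1 acts as a shared trunk (two 3x3 convs + 2x2 max-pooling); its feature map s1
# feeds both the first detection head (field 28, stride 2 * stride_scale)
# and stage 2 below (field 56, stride 4 * stride_scale)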
23 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 24 | input_shape=input_shape, padding='valid')) # 28, 28, 1 25 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 27 | s1 = model(image_input) 28 | 29 | # stage 2 30 | s2 = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')(s1) 31 | s2 = Conv2D(64, (3, 3), activation='relu', padding='valid')(s2) 32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2) 33 | 34 | # output 1 35 | f1 = Dropout(0.25)(s1) 36 | f1 = Conv2D(128, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale), 37 | padding="valid", activation='relu')(f1) 38 | f1 = Dropout(0.5)(f1) 39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 43 | 44 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4]) 45 | 46 | # output 2 47 | f2 = Dropout(0.25)(s2) 48 | f2 = Conv2D(128, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale), 49 | padding="valid", activation='relu')(f2) 50 | f2 = Dropout(0.5)(f2) 51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 55 | 56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4]) 57 | 58 | # print(model.summary()) 59 | # if K._BACKEND=='tensorflow': 60 | # for layer in model.layers: 61 | # print(layer.get_output_at(0).get_shape().as_list()) 62 | self.model = Model(image_input, [output1, output2]) 63 | self.model.strides = self.strides 64 | self.model.offsets = self.offsets 65 | self.model.fields = self.fields 66 | return self.model 67 | -------------------------------------------------------------------------------- /models/CNN_C32_Cd64_C64_Cd64_C128_D.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 6 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [self.stride_scale] 14 | self.offsets = [14] 15 | self.fields = [28] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | model = Sequential() # 28, 28, 1 22 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', 23 | input_shape=input_shape, padding='valid')) # 28, 28, 1 24 | model.add(Conv2D(64, (3, 3), dilation_rate=(2, 2), activation='relu', padding='valid')) # 28, 28, 1 25 | 
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')) # 28, 28, 1 26 | model.add(Conv2D(64, (4, 4), dilation_rate=(2, 2), activation='relu', padding='valid')) # 28, 28, 1 27 | model.add(Dropout(0.25)) 28 | model.add(Conv2D(128, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale), 29 | padding="valid", activation='relu')) 30 | model.add(Dropout(0.5)) 31 | 32 | features = model(image_input) 33 | 34 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 35 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 36 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 37 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 38 | 39 | output = concatenate([output1, output2, output3, output4], name="output") 40 | 41 | # print(model.summary()) 42 | # if K._BACKEND=='tensorflow': 43 | # for layer in model.layers: 44 | # print(layer.get_output_at(0).get_shape().as_list()) 45 | self.model = Model(image_input, output) 46 | self.model.strides = self.strides 47 | self.model.offsets = self.offsets 48 | self.model.fields = self.fields 49 | return self.model 50 | -------------------------------------------------------------------------------- /models/CNN_C64_C128_M2_C128_C128_M2_C256_D_2.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale] 14 | self.offsets = [14, 28] 15 | self.fields = [28, 56] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | # stage 1 22 | model = Sequential() # 28, 28, 1 23 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', 24 | input_shape=input_shape, padding='valid')) # 28, 28, 1 25 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 27 | s1 = model(image_input) 28 | 29 | # stage 2 30 | s2 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s1) 31 | s2 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s2) 32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2) 33 | 34 | # output 1 35 | f1 = Dropout(0.25)(s1) 36 | f1 = Conv2D(256, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale), 37 | padding="valid", activation='relu')(f1) 38 | f1 = Dropout(0.5)(f1) 39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1))) 40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 43 | 44 | output1 = 
concatenate([output1_1, output1_2, output1_3, output1_4]) 45 | 46 | # output 2 47 | f2 = Dropout(0.25)(s2) 48 | f2 = Conv2D(256, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale), 49 | padding="valid", activation='relu')(f2) 50 | f2 = Dropout(0.5)(f2) 51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2))) 52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 55 | 56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4]) 57 | 58 | # print(model.summary()) 59 | # if K._BACKEND=='tensorflow': 60 | # for layer in model.layers: 61 | # print(layer.get_output_at(0).get_shape().as_list()) 62 | self.model = Model(image_input, [output1, output2]) 63 | self.model.strides = self.strides 64 | self.model.offsets = self.offsets 65 | self.model.fields = self.fields 66 | return self.model 67 | -------------------------------------------------------------------------------- /models/CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale] 14 | self.offsets = [7, 20] 15 | self.fields = [14, 40] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | # stage 1 22 | model = Sequential() # 28, 28, 1 23 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', 24 | input_shape=input_shape, padding='valid')) # 28, 28, 1 25 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 27 | s1 = model(image_input) 28 | 29 | # stage 2 30 | s2 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s1) 31 | s2 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s2) 32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2) 33 | 34 | # output 1 35 | f1 = Dropout(0.25)(s1) 36 | f1 = Conv2D(256, kernel_size=(5,5), strides=(self.stride_scale,self.stride_scale), 37 | padding="valid", activation='relu')(f1) 38 | f1 = Dropout(0.5)(f1) 39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1))) 40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 43 | 44 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4]) 45 | 46 | # output 2 47 | f2 = Dropout(0.25)(s2) 48 | f2 = Conv2D(256, kernel_size=(7,7), 
strides=(self.stride_scale,self.stride_scale), 49 | padding="valid", activation='relu')(f2) 50 | f2 = Dropout(0.5)(f2) 51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2))) 52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 55 | 56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4]) 57 | 58 | # print(model.summary()) 59 | # if K._BACKEND=='tensorflow': 60 | # for layer in model.layers: 61 | # print(layer.get_output_at(0).get_shape().as_list()) 62 | self.model = Model(image_input, [output1, output2]) 63 | self.model.strides = self.strides 64 | self.model.offsets = self.offsets 65 | self.model.fields = self.fields 66 | return self.model 67 | -------------------------------------------------------------------------------- /models/CNN_C64_C128_M2_C128_C128_M2_C256_D_3.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [self.stride_scale, 2 * self.stride_scale, 4 * self.stride_scale] 14 | self.offsets = [7, 14, 28] 15 | self.fields = [14, 28, 56] 16 | 17 | 18 | def build(self, input_shape, num_classes): 19 | image_input = Input(shape=input_shape, name='image_input') 20 | 21 | # stage 1 22 | model = Sequential() # 28, 28, 1 23 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', 24 | input_shape=input_shape, padding='valid')) # 28, 28, 1 25 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 26 | s1 = model(image_input) 27 | 28 | # stage 2 29 | s2 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s1) 30 | s2 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s2) 31 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2) 32 | 33 | # stage 3 34 | s3 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s1) 35 | s3 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s3) 36 | s3 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s3) 37 | 38 | # output 1 39 | f1 = Dropout(0.25)(s1) 40 | f1 = Conv2D(256, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale), 41 | padding="valid", activation='relu')(f1) 42 | f1 = Dropout(0.5)(f1) 43 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1))) 44 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 45 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 46 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1))) 47 | 48 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4]) 49 | 50 | # output 2 51 | f2 = Dropout(0.25)(s2) 52 | f2 = Conv2D(256, kernel_size=(11,11), 
strides=(self.stride_scale,self.stride_scale), 53 | padding="valid", activation='relu')(f2) 54 | f2 = Dropout(0.5)(f2) 55 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2))) 56 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 57 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 58 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2))) 59 | 60 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4]) 61 | 62 | # output 3 63 | f3 = Dropout(0.25)(s3) 64 | f3 = Conv2D(256, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale), 65 | padding="valid", activation='relu')(f3) 66 | f3 = Dropout(0.5)(f3) 67 | output3_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f3))) 68 | output3_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f3))) 69 | output3_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f3))) 70 | output3_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f3))) 71 | 72 | output3 = concatenate([output3_1, output3_2, output3_3, output3_4]) 73 | 74 | # print(model.summary()) 75 | # if K._BACKEND=='tensorflow': 76 | # for layer in model.layers: 77 | # print(layer.get_output_at(0).get_shape().as_list()) 78 | self.model = Model(image_input, [output1, output2, output3]) 79 | self.model.strides = self.strides 80 | self.model.offsets = self.offsets 81 | self.model.fields = self.fields 82 | return self.model 83 | -------------------------------------------------------------------------------- /models/CNN_C64_C128_M2_C256_D.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | if stride_scale == 0: 9 | self.stride_scale = 14 10 | else: 11 | self.stride_scale = stride_scale 12 | 13 | self.strides = [2 * self.stride_scale] 14 | self.offsets = [14] 15 | self.fields = [28] 16 | 17 | def build(self, input_shape, num_classes): 18 | image_input = Input(shape=input_shape, name='image_input') 19 | 20 | model = Sequential() # 28, 28, 1 21 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', 22 | input_shape=input_shape, padding='valid')) # 28, 28, 1 23 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 28, 28, 1 24 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1 25 | model.add(Dropout(0.25)) 26 | model.add(Conv2D(256, kernel_size=(12,12), strides=(self.stride_scale, self.stride_scale), 27 | padding="valid", activation='relu')) 28 | model.add(Dropout(0.5)) 29 | 30 | features = model(image_input) 31 | 32 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 33 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 34 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = 
"relu")(features))) 35 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features))) 36 | 37 | output = concatenate([output1, output2, output3, output4], name="output") 38 | 39 | # print(model.summary()) 40 | # if K._BACKEND=='tensorflow': 41 | # for layer in model.layers: 42 | # print(layer.get_output_at(0).get_shape().as_list()) 43 | self.model = Model(image_input, output) 44 | self.model.strides = self.strides 45 | self.model.offsets = self.offsets 46 | self.model.fields = self.fields 47 | return self.model 48 | -------------------------------------------------------------------------------- /models/VGG16_AVG.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate, GlobalAveragePooling2D 4 | from keras import applications 5 | 6 | class Network: 7 | 8 | def __init__(self, stride_scale = 0): 9 | # if stride_scale == 0: 10 | # self.stride_scale = 14 11 | # else: 12 | # self.stride_scale = stride_scale 13 | # 14 | # self.strides = [2 * self.stride_scale] 15 | self.strides = [32] 16 | self.offsets = [16] 17 | self.fields = [150] 18 | 19 | def build(self, input_shape, num_classes): 20 | 21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape) 22 | model = Sequential() 23 | for l in vgg.layers: 24 | #l.trainable = False 25 | model.add(l) 26 | 27 | model.add(Conv2D(num_classes, (3, 3))) 28 | model.add(GlobalAveragePooling2D()) 29 | model.add(Activation('softmax')) 30 | 31 | print(model.summary()) 32 | 33 | image_input = Input(shape=input_shape, name='image_input') 34 | 35 | self.model = Model(image_input, model(image_input)) 36 | self.model.strides = self.strides 37 | self.model.offsets = self.offsets 38 | self.model.fields = self.fields 39 | return self.model 40 | -------------------------------------------------------------------------------- /models/VGG16_AVG_r.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate, GlobalAveragePooling2D 4 | from keras import applications 5 | 6 | class Network: 7 | 8 | def __init__(self, stride_scale = 0): 9 | # if stride_scale == 0: 10 | # self.stride_scale = 14 11 | # else: 12 | # self.stride_scale = stride_scale 13 | # 14 | # self.strides = [2 * self.stride_scale] 15 | self.strides = [32] 16 | self.offsets = [16] 17 | self.fields = [150] 18 | 19 | def build(self, input_shape, num_classes): 20 | 21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape) 22 | model = Sequential() 23 | for l in vgg.layers: 24 | #l.trainable = False 25 | model.add(l) 26 | 27 | model.add(Conv2D(num_classes, (1, 1))) 28 | model.add(GlobalAveragePooling2D()) 29 | model.add(Activation('softmax')) 30 | 31 | print(model.summary()) 32 | 33 | image_input = Input(shape=input_shape, name='image_input') 34 | 35 | self.model = Model(image_input, model(image_input)) 36 | self.model.strides = self.strides 37 | self.model.offsets = self.offsets 38 | self.model.fields = self.fields 39 | return self.model 40 | -------------------------------------------------------------------------------- /models/VGG16_C4096_C4096_AVG.py: 
-------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate, GlobalAveragePooling2D 4 | from keras import applications 5 | 6 | class Network: 7 | 8 | def __init__(self, stride_scale = 0): 9 | # if stride_scale == 0: 10 | # self.stride_scale = 14 11 | # else: 12 | # self.stride_scale = stride_scale 13 | # 14 | # self.strides = [2 * self.stride_scale] 15 | self.strides = [32] 16 | self.offsets = [16] 17 | self.fields = [150] 18 | 19 | def build(self, input_shape, num_classes): 20 | 21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape) 22 | model = Sequential() 23 | for l in vgg.layers: 24 | #l.trainable = False 25 | model.add(l) 26 | 27 | model.add(Conv2D(4096, (3, 3), activation='relu')) 28 | model.add(Dropout(0.5)) 29 | model.add(Conv2D(4096, (3, 3), activation='relu')) 30 | model.add(Dropout(0.5)) 31 | model.add(Conv2D(num_classes, (3, 3))) 32 | model.add(GlobalAveragePooling2D()) 33 | model.add(Activation('softmax')) 34 | 35 | print(model.summary()) 36 | 37 | image_input = Input(shape=input_shape, name='image_input') 38 | 39 | self.model = Model(image_input, model(image_input)) 40 | self.model.strides = self.strides 41 | self.model.offsets = self.offsets 42 | self.model.fields = self.fields 43 | return self.model 44 | -------------------------------------------------------------------------------- /models/VGG16_D256.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | from keras import applications 5 | 6 | class Network: 7 | 8 | def __init__(self, stride_scale = 0): 9 | # if stride_scale == 0: 10 | # self.stride_scale = 14 11 | # else: 12 | # self.stride_scale = stride_scale 13 | # 14 | # self.strides = [2 * self.stride_scale] 15 | self.strides = [32] 16 | self.offsets = [ ( 5 - 1) / 2.0 ] 17 | self.fields = [150] 18 | 19 | def build(self, input_shape, num_classes): 20 | 21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape) 22 | model = Sequential() 23 | for l in vgg.layers: 24 | l.trainable = False 25 | model.add(l) 26 | 27 | model.add(Flatten()) 28 | model.add(Dense(256, activation='relu')) 29 | model.add(Dropout(0.5)) 30 | model.add(Dense(num_classes, activation='softmax')) 31 | 32 | image_input = Input(shape=input_shape, name='image_input') 33 | 34 | self.model = Model(image_input, model(image_input)) 35 | self.model.strides = self.strides 36 | self.model.offsets = self.offsets 37 | self.model.fields = self.fields 38 | return self.model 39 | -------------------------------------------------------------------------------- /models/VGG16_D4096_D4096.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | from keras import applications 5 | 6 | class Network: 7 | 8 | def __init__(self, stride_scale = 0): 9 | # if stride_scale == 0: 10 | # self.stride_scale = 14 11 | # else: 12 | # self.stride_scale = stride_scale 13 | # 14 | # self.strides = [2 * 
self.stride_scale] 15 | self.strides = [32] 16 | self.offsets = [16] 17 | self.fields = [150] 18 | 19 | def build(self, input_shape, num_classes): 20 | 21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape) 22 | model = Sequential() 23 | for l in vgg.layers: 24 | #l.trainable = False 25 | model.add(l) 26 | 27 | model.add(Flatten()) 28 | model.add(Dense(4096, activation='relu')) 29 | model.add(Dropout(0.5)) 30 | model.add(Dense(4096, activation='relu')) 31 | model.add(Dropout(0.5)) 32 | model.add(Dense(num_classes, activation='softmax')) 33 | 34 | print(model.summary()) 35 | 36 | image_input = Input(shape=input_shape, name='image_input') 37 | 38 | self.model = Model(image_input, model(image_input)) 39 | self.model.strides = self.strides 40 | self.model.offsets = self.offsets 41 | self.model.fields = self.fields 42 | return self.model 43 | -------------------------------------------------------------------------------- /models/VGG16_block4_D4096_D4096.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | from keras import applications 5 | 6 | class Network: 7 | 8 | def __init__(self, stride_scale = 0): 9 | # if stride_scale == 0: 10 | # self.stride_scale = 14 11 | # else: 12 | # self.stride_scale = stride_scale 13 | # 14 | # self.strides = [2 * self.stride_scale] 15 | self.strides = [32] 16 | self.offsets = [16] 17 | self.fields = [150] 18 | 19 | def build(self, input_shape, num_classes): 20 | 21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape) 22 | model = Sequential() 23 | for l in vgg.layers[:-4]: 24 | #l.trainable = False 25 | model.add(l) 26 | 27 | model.add(Flatten()) 28 | model.add(Dense(4096, activation='relu')) 29 | model.add(Dropout(0.5)) 30 | model.add(Dense(4096, activation='relu')) 31 | model.add(Dropout(0.5)) 32 | model.add(Dense(num_classes, activation='softmax')) 33 | 34 | print(model.summary()) 35 | 36 | image_input = Input(shape=input_shape, name='image_input') 37 | 38 | self.model = Model(image_input, model(image_input)) 39 | self.model.strides = self.strides 40 | self.model.offsets = self.offsets 41 | self.model.fields = self.fields 42 | return self.model 43 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | import models.CNN_C32_C64_M2_C128_D 2 | import models.CNN_C32_C64_M2_C64_C64_M2_C128_D 3 | import models.CNN_C32_C64_M2_C64_C64_M2_C128_D_2 4 | import models.CNN_C64_C128_M2_C256_D 5 | import models.CNN_C128_C256_M2_C512_D 6 | import models.CNN_C64_C128_M2_C128_C128_M2_C256_D_2 7 | import models.CNN_C64_C128_M2_C128_C128_M2_C256_D_3 8 | import models.CNN_C128_C256_M2_C256_C256_M2_C512_D_2 9 | import models.CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7 10 | import models.CNN_C32_C64_C128_D 11 | import models.CNN_C32_C64_C128_C 12 | import models.CNN_C32_C64_C128_C2 13 | import models.CNN_C32_C64_C64_Cd64_C128_D 14 | import models.CNN_C32_Cd64_C64_Cd64_C128_D 15 | import models.vgg 16 | import models.VGG16_D256 17 | import models.VGG16_D4096_D4096 18 | import models.VGG16_block4_D4096_D4096 19 | import models.VGG16_AVG 20 | import models.VGG16_AVG_r 21 | import models.VGG16_C4096_C4096_AVG 22 | 23 | 24 | def get(**kwargs): 25 | if 
kwargs.get("name") not in globals(): 26 | raise KeyError('Unknown network: {}'.format(kwargs)) 27 | 28 | return globals()[kwargs.get("name")].Network(kwargs.get("stride_scale")) 29 | -------------------------------------------------------------------------------- /models/simple_document_classification.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | # if stride_scale == 0: 9 | # self.stride_scale = 14 10 | # else: 11 | # self.stride_scale = stride_scale 12 | # 13 | # self.strides = [2 * self.stride_scale] 14 | self.strides = [0] 15 | self.offsets = [ ( 5 - 1) / 2.0 ] 16 | self.fields = [150] 17 | 18 | def build(self, input_shape, num_classes): 19 | 20 | assert input_shape == (150, 150, 3) , "incorrect input shape " + input_shape 21 | image_input = Input(shape=input_shape, name='image_input') 22 | 23 | model = Sequential() 24 | # Convolution + Pooling Layer 25 | model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape, activation='relu')) 26 | model.add(MaxPooling2D(pool_size=(2, 2))) 27 | # Convolution + Pooling Layer 28 | model.add(Conv2D(32, (3, 3), padding='same', activation='relu')) 29 | model.add(MaxPooling2D(pool_size=(2, 2))) 30 | # Convolution + Pooling Layer 31 | model.add(Conv2D(64, (3, 3), padding='same', activation='relu')) 32 | model.add(MaxPooling2D(pool_size=(2, 2))) 33 | # Convolution + Pooling Layer 34 | model.add(Conv2D(64, (3, 3), padding='same', activation='relu')) 35 | model.add(MaxPooling2D(pool_size=(2, 2))) 36 | 37 | # Flattening 38 | model.add(Flatten()) 39 | # Fully connection 40 | model.add(Dense(64, activation='relu')) 41 | model.add(Dropout(.6)) 42 | model.add(Dense(64, activation='relu')) 43 | model.add(Dense(64, activation='relu')) 44 | model.add(Dropout(.3)) 45 | model.add(Dense(num_classes, activation='softmax', name='predictions')) 46 | 47 | # GlobalAveragePooling2D() 48 | 49 | self.model = Model(image_input, model(image_input)) 50 | self.model.strides = self.strides 51 | self.model.offsets = self.offsets 52 | self.model.fields = self.fields 53 | return self.model 54 | -------------------------------------------------------------------------------- /models/vgg.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential, Model 2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input 3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate 4 | 5 | class Network: 6 | 7 | def __init__(self, stride_scale = 0): 8 | # if stride_scale == 0: 9 | # self.stride_scale = 14 10 | # else: 11 | # self.stride_scale = stride_scale 12 | # 13 | # self.strides = [2 * self.stride_scale] 14 | self.strides = [32] 15 | self.offsets = [16] 16 | self.fields = [224] 17 | 18 | def build(self, input_shape, num_classes): 19 | 20 | assert input_shape == (224, 224, 3) , "incorrect input shape " + input_shape 21 | image_input = Input(shape=input_shape, name='image_input') 22 | 23 | model = Sequential() 24 | # block 1 25 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=input_shape, padding='same', name='block1_conv1')) # 224 x 224 26 | model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')) 27 | model.add(MaxPooling2D(pool_size=(2, 2), 
strides=(2,2), padding="valid", name='block1_pool')) # 112x112 28 | # model.add(Dropout(0.25)) 29 | 30 | # block 2 31 | model.add(Conv2D(128, kernel_size=(3,3), padding="same", activation='relu', name='block2_conv1')) 32 | model.add(Conv2D(128, kernel_size=(3,3), padding="same", activation='relu', name='block2_conv2')) 33 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block2_pool')) # 56 x 56 34 | # model.add(Dropout(0.5)) 35 | 36 | # block 3 37 | model.add(Conv2D(256, kernel_size=(3,3), padding="same", activation='relu', name='block3_conv1')) 38 | model.add(Conv2D(256, kernel_size=(3,3), padding="same", activation='relu', name='block3_conv2')) 39 | model.add(Conv2D(256, kernel_size=(3,3), padding="same", activation='relu', name='block3_conv3')) 40 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block3_pool')) # 28 x 28 41 | 42 | # block 4 43 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block4_conv1')) 44 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block4_conv2')) 45 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block4_conv3')) 46 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block4_pool')) # 14 x 14 47 | 48 | # block 5 49 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block5_conv1')) 50 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block5_conv2')) 51 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block5_conv3')) 52 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block5_pool')) # 7 x 7 53 | 54 | model.add(Flatten(name='flatten')) 55 | model.add(Dense(4096, activation='relu', name='fc1')) 56 | model.add(Dropout(0.5)) 57 | model.add(Dense(4096, activation='relu', name='fc2')) 58 | model.add(Dropout(0.5)) 59 | model.add(Dense(num_classes, activation='softmax', name='predictions')) 60 | 61 | # GlobalAveragePooling2D() 62 | 63 | self.model = Model(image_input, model(image_input)) 64 | self.model.strides = self.strides 65 | self.model.offsets = self.offsets 66 | self.model.fields = self.fields 67 | return self.model 68 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os 3 | import time 4 | import datetime 5 | import argparse 6 | from utils import check_config 7 | check_config() 8 | import models, datasets 9 | 10 | import keras 11 | from keras import backend as K 12 | from keras.metrics import categorical_accuracy 13 | from keras.utils.multi_gpu_utils import multi_gpu_model 14 | from keras.utils.vis_utils import plot_model 15 | 16 | ap = argparse.ArgumentParser() 17 | ap.add_argument("-b", "--batch_size", type=int, 18 | default=3, help="# of images per batch") 19 | ap.add_argument("-p", "--parallel", default=False, 20 | help="Enable multi GPUs", action='store_true') 21 | ap.add_argument("-e", "--epochs", type=int, default=12, 22 | help="# of training epochs") 23 | ap.add_argument("-l", "--logs", type=str, default="logs", 24 | help="log directory") 25 | ap.add_argument("-m", "--model", type=str, 26 | default="CNN_C32_C64_M2_C128_D", help="model") 27 | ap.add_argument("-lr", "--learning_rate", type=float, 28 | default=0.001, help="learning rate") 29 | 
ap.add_argument("-s", "--stride_scale", type=int, 30 | default=0, help="Stride scale. If zero, default stride scale.") 31 | ap.add_argument("-d", "--dataset", type=str, default="ocr_mnist", 32 | help="dataset") 33 | ap.add_argument("-w", "--white", type=float, default=0.9, 34 | help="white probability for MNIST dataset") 35 | ap.add_argument("-n", "--noise", default=False, 36 | help="noise for MNIST dataset", action='store_true') 37 | ap.add_argument("--pos_weight", type=float, default=100., 38 | help="weight for positive objects") 39 | ap.add_argument("--iou", type=float, default=.3, 40 | help="iou treshold to consider a position to be positive. If -1, positive only if \ 41 | object included in the layer field") 42 | ap.add_argument("--bb_positive", type=str, default="iou-treshold", 43 | help="Possible values: iou-treshold, in-anchor, best-anchor") 44 | ap.add_argument("--nms_iou", type=float, default=.2, 45 | help="iou treshold for non max suppression") 46 | ap.add_argument("-i", "--input_dim", type=int, default=700, 47 | help="network input dim") 48 | ap.add_argument("-r", "--resize", type=str, default="", 49 | help="resize input images") 50 | ap.add_argument('--no-save', dest='save', action='store_false', 51 | help="save model and data to files") 52 | ap.add_argument('--resume', dest='resume_model', type=str, default="") 53 | ap.add_argument('--n_cpu', type=int, default=1, 54 | help='number of CPU threads to use during data generation') 55 | args = ap.parse_args() 56 | print(args) 57 | 58 | assert K.image_data_format() == 'channels_last' , "image data format channel_last" 59 | 60 | model = models.get(name = args.model, stride_scale = args.stride_scale) 61 | print("#"*14 +" MODEL "+ "#"*14) 62 | print("### Stride scale: " + str(model.stride_scale)) 63 | for s in model.strides: print("### Stride: " + str(s)) 64 | print("#" * 35) 65 | 66 | dataset = datasets.get(name = args.dataset, layer_strides = model.strides, layer_offsets = model.offsets, 67 | layer_fields = model.fields, white_prob = args.white, bb_positive = args.bb_positive, iou_treshold=args.iou, save=args.save, 68 | batch_size = args.batch_size, input_dim=args.input_dim, resize=args.resize, noise=args.noise) 69 | 70 | # model initialization and parallel computing 71 | if not args.parallel: 72 | print("[INFO] training with 1 device...") 73 | built_model = model.build(input_shape = dataset.input_shape, num_classes = dataset.num_classes) 74 | else: 75 | if K._BACKEND=='tensorflow': 76 | from tensorflow.python.client import device_lib 77 | def get_available_gpus(): 78 | local_device_protos = device_lib.list_local_devices() 79 | return [x.name for x in local_device_protos if x.device_type == 'GPU'] 80 | ngpus = len(get_available_gpus()) 81 | print("[INFO] training with {} GPUs...".format(ngpus)) 82 | import tensorflow as tf 83 | with tf.device("/cpu:0"): 84 | original_built_model = model.build(input_shape = dataset.input_shape, num_classes= dataset.num_classes) 85 | built_model = multi_gpu_model(original_built_model, gpus=ngpus) 86 | elif K._BACKEND=='cntk': 87 | built_model = model.build(input_shape = dataset.input_shape, num_classes= dataset.num_classes) 88 | else: 89 | print("Multi GPU not available on this backend.") 90 | 91 | # import numpy as np 92 | # class_weights = np.ones(dataset.num_classes) 93 | 94 | # model compilation with loss and accuracy 95 | def custom_loss(y_true, y_pred): 96 | final_loss = 0. 
97 | if dataset.enable_boundingbox: 98 | obj_true = y_true[...,dataset.num_classes] 99 | obj_pred = y_pred[...,dataset.num_classes] 100 | # (1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0)) 101 | log_weight = 1. + (args.pos_weight - 1.) * obj_true 102 | obj = (1. - obj_true) * obj_pred + log_weight * (K.log(1. + K.exp(-K.abs(obj_pred))) + K.relu(- obj_pred)) 103 | 104 | obj = K.square(obj_pred - obj_true) 105 | 106 | prob = y_pred[...,0:dataset.num_classes] 107 | # scale predictions so that the class probas of each sample sum to 1 108 | prob /= K.sum(prob, axis=-1, keepdims=True) 109 | # clip to prevent NaN's and Inf's 110 | prob = K.clip(prob, K.epsilon(), 1 - K.epsilon()) 111 | # calc 112 | loss = y_true[...,0:dataset.num_classes] * K.log(prob) #* class_weights 113 | cat = -K.sum(loss, -1, keepdims=True) 114 | 115 | reg = K.sum(K.square(y_true[..., dataset.num_classes+1:dataset.num_classes+5] - y_pred[...,dataset.num_classes+1:dataset.num_classes+5]), axis=-1, keepdims=True) 116 | 117 | # if args.best_position_classification: 118 | # mask = K.cast( K.less_equal( y_true[..., dataset.num_classes+5:(dataset.num_classes+6)], model.strides[0] * 1.42 / 2 ), K.floatx()) 119 | 120 | mask = K.cast( K.equal( y_true[..., dataset.num_classes:(dataset.num_classes+1)], 1.0 ), K.floatx()) 121 | 122 | final_loss = final_loss + obj + K.sum(cat * mask) / K.maximum(K.sum(mask), 1.0) + 100 * K.sum(reg * mask) / K.maximum(K.sum(mask), 1.0) 123 | 124 | if dataset.enable_classification or dataset.enable_segmentation: 125 | final_loss = final_loss + K.categorical_crossentropy(y_true, y_pred) 126 | 127 | return final_loss 128 | 129 | 130 | # metrics 131 | metrics = [] 132 | if dataset.enable_boundingbox: 133 | 134 | def obj_accuracy(y_true, y_pred): 135 | acc = K.cast(K.equal( y_true[...,dataset.num_classes], K.round(y_pred[...,dataset.num_classes])), K.floatx()) 136 | return K.mean(acc) 137 | metrics.append(obj_accuracy) 138 | 139 | def class_accuracy(y_true, y_pred): 140 | mask = K.cast( K.equal(y_true[...,dataset.num_classes], 1.0 ), K.floatx() ) 141 | acc = K.cast(K.equal(K.argmax(y_true[...,0:dataset.num_classes], axis=-1), K.argmax(y_pred[...,0:dataset.num_classes], axis=-1)), K.floatx()) 142 | if K.backend() == "cntk": 143 | acc = K.expand_dims(acc) 144 | return K.sum(acc * mask) / K.maximum(K.sum(mask), 1.0) 145 | metrics.append(class_accuracy) 146 | 147 | def reg_accuracy(y_true, y_pred): 148 | mask = K.cast( K.equal(y_true[...,dataset.num_classes], 1.0 ), K.floatx() ) 149 | reg = K.sum(K.square(y_true[...,dataset.num_classes+1:dataset.num_classes+3] - y_pred[...,dataset.num_classes+1:dataset.num_classes+3]), axis=-1) 150 | if K.backend() == "cntk": 151 | reg = K.expand_dims(reg) 152 | return K.sum(reg * mask) / K.maximum(K.sum(mask), 1.0) 153 | metrics.append(reg_accuracy) 154 | 155 | if dataset.enable_classification or dataset.enable_segmentation: 156 | metrics.append(categorical_accuracy) 157 | 158 | # model compilation 159 | # built_model.compile(loss=custom_loss, optimizer=keras.optimizers.Adam(lr=args.learning_rate), metrics=metrics) 160 | built_model.compile(optimizer=keras.optimizers.Adam(lr=args.learning_rate), loss=custom_loss, metrics=metrics) 161 | 162 | if args.resume_model: 163 | print("Resuming model from weights in " + args.resume_model) 164 | built_model.load_weights(args.resume_model, by_name=True) 165 | # plot_model(built_model, to_file='model_plot.png', show_shapes=True, show_layer_names=True) 166 | 167 | # parallel computing on CNTK 168 | if args.parallel and 
(K._BACKEND=='cntk'): 169 | import cntk as C 170 | built_model._make_train_function() 171 | trainer = built_model.train_function.trainer 172 | assert (trainer is not None), "Cannot find a trainer in Keras Model!" 173 | learner_no = len(trainer.parameter_learners) 174 | assert (learner_no > 0), "No learner in the trainer." 175 | if(learner_no > 1): 176 | warnings.warn("Unexpected multiple learners in a trainer.") 177 | learner = trainer.parameter_learners[0] 178 | dist_learner = C.train.distributed.data_parallel_distributed_learner(learner, \ 179 | num_quantization_bits=32, distributed_after=0) 180 | built_model.train_function.trainer = C.trainer.Trainer( 181 | trainer.model, [trainer.loss_function, trainer.evaluation_function], [dist_learner]) 182 | rank = C.Communicator.rank() 183 | workers = C.Communicator.num_workers() 184 | print("[INFO] CNTK training with {} GPUs...".format(workers)) 185 | total_items = dataset.x_train.shape[0] 186 | start = rank * total_items//workers 187 | end = min((rank+1) * total_items // workers, total_items) 188 | x_train, y_train = dataset.x_train[start : end], dataset.y_train[start : end] 189 | 190 | 191 | start_time = time.time() 192 | 193 | # Callbacks: save and tensorboard display 194 | callbacks = [] 195 | 196 | if args.save: 197 | from keras.callbacks import ModelCheckpoint 198 | if not os.path.exists("/sharedfiles/models"): 199 | os.makedirs("/sharedfiles/models") 200 | fname = "/sharedfiles/models/" + datetime.datetime.fromtimestamp(start_time).strftime('%Y-%m-%d_%H:%M_') + args.model + ".h5" 201 | if args.parallel: # http://github.com/keras-team/keras/issues/8649 202 | from callback import ParallelSaveCallback 203 | checkpoint = ParallelSaveCallback(original_built_model,fname) 204 | else: 205 | if dataset.enable_boundingbox: 206 | checkpoint = ModelCheckpoint(fname, monitor='val_loss', verbose=1, save_best_only=True, mode='min') 207 | else: 208 | checkpoint = ModelCheckpoint(fname, monitor='val_categorical_accuracy', verbose=1, save_best_only=True, mode='max') 209 | callbacks.append(checkpoint) 210 | 211 | if K._BACKEND=='tensorflow': 212 | from callback import TensorBoard 213 | log_dir = './Graph/' + time.strftime("%Y-%m-%d_%H:%M:%S") 214 | tensorboard = TensorBoard(dataset.gt_test, dataset.classes, dataset.stride_margin, model.strides, model.offsets, model.fields, args.nms_iou, 215 | log_dir=log_dir, 216 | histogram_freq=0, 217 | batch_size=args.batch_size, 218 | max_validation_size=100, 219 | write_output_images=True, 220 | enable_segmentation=dataset.enable_segmentation, 221 | enable_boundingbox=dataset.enable_boundingbox, 222 | write_graph=False, 223 | write_images=False, 224 | val_data=dataset.val if hasattr(dataset,"val") else None 225 | ) 226 | print("Log saved in ", log_dir) 227 | tensorboard.set_model(built_model) 228 | callbacks.append(tensorboard) 229 | 230 | # training section 231 | if hasattr(dataset, "x_train"): 232 | built_model.fit(dataset.x_train, dataset.y_train, 233 | batch_size=args.batch_size, 234 | epochs=args.epochs, 235 | verbose=1, 236 | validation_data=(dataset.x_test, dataset.y_test), 237 | callbacks=callbacks) 238 | else: 239 | built_model.fit_generator(dataset.train, 240 | epochs=args.epochs, 241 | verbose=1, 242 | workers=args.n_cpu, 243 | use_multiprocessing=False, 244 | max_queue_size=10, 245 | shuffle=True, 246 | validation_data=dataset.val, 247 | callbacks=callbacks) 248 | 249 | # # save model 250 | # if args.save: 251 | # if not os.path.exists("/sharedfiles/models"): 252 | # os.makedirs("/sharedfiles/models") 253 | 
# fname = "/sharedfiles/models/" + datetime.datetime.fromtimestamp(start_time).strftime('%Y-%m-%d_%H:%M_') + args.model + ".h5" 254 | # built_model.save_weights(fname) 255 | # print("Model weights saved in " + fname) 256 | 257 | # evaluate section 258 | if hasattr(dataset, "x_test"): 259 | score = built_model.evaluate(dataset.x_test, dataset.y_test, batch_size=args.batch_size, verbose=0) 260 | else: 261 | score = built_model.evaluate_generator(dataset.test) 262 | 263 | print("Test loss and accuracy values:") 264 | for s in score: 265 | print(s) 266 | duration = time.time() - start_time 267 | print('Total Duration (%.3f sec)' % duration) 268 | 269 | if K._BACKEND=='tensorflow': 270 | print("Log saved in ", log_dir) 271 | 272 | if K._BACKEND=='cntk' and args.save: 273 | import cntk as C 274 | C.combine(built_model.outputs).save(fname[:-3]+'.dnn') 275 | 276 | if K._BACKEND=='cntk' and args.parallel: 277 | C.Communicator.finalize() 278 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import re 4 | 5 | def check_config(write_json=False): 6 | import json 7 | with open(os.path.expanduser('~') + '/.keras/keras.json') as data_file: 8 | data = json.load(data_file) 9 | backend = data["backend"] 10 | 11 | r = re.search('/envs/(cntk|keras-tf)-py([0-9])([0-9])/bin/python', sys.executable) 12 | conda_env = { "tensorflow" : "keras-tf" , "cntk": "cntk" } 13 | 14 | if backend not in conda_env: 15 | sys.exit("Backend not supported.") 16 | else: 17 | env_name = conda_env[backend] + "-py" + str(sys.version_info.major) + str(sys.version_info.minor) 18 | 19 | if r is None or (sys.version_info.major != int(r.group(2)) or sys.version_info.minor != int(r.group(3))): 20 | sys.exit(""" 21 | To create corresponding environment: 22 | 23 | conda env update --file """ + env_name + """.yml 24 | 25 | To activate Conda environnement 26 | 27 | source activate """ + env_name + """ 28 | 29 | """) 30 | else: 31 | if conda_env[backend] != r.group(1): 32 | for b,e in conda_env.items(): 33 | if e == r.group(1): 34 | os.environ["KERAS_BACKEND"] = b 35 | if write_json: 36 | print("Modifying ~/.keras/keras.json to " + b) 37 | with open(os.path.expanduser('~') + '/.keras/keras.json', "w") as data_file: 38 | json.dump(data, data_file) 39 | --------------------------------------------------------------------------------