├── .gitignore
├── README.md
├── callback.py
├── clean.sh
├── cntk-py27.yml
├── cntk-py35.yml
├── datasets
│   ├── cls_dogs_vs_cats.py
│   ├── cls_dogs_vs_cats.sh
│   ├── cls_rvl_cdip.py
│   ├── cls_rvl_cdip_check.sh
│   ├── cls_rvl_cdip_convert.sh
│   ├── cls_tiny_imagenet.py
│   ├── cls_tiny_imagenet_class_list.sh
│   ├── cls_tiny_imagenet_convert.sh
│   ├── document.conf
│   ├── ocr_documents.py
│   ├── ocr_documents_generator.py
│   ├── ocr_documents_preprocess.py
│   ├── ocr_documents_statistics.py
│   └── ocr_mnist.py
├── images
│   ├── Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf
│   ├── res1.png
│   ├── res2.png
│   ├── res3.png
│   ├── res4.png
│   └── res5.png
├── keras-tf-py27.yml
├── keras-tf-py35.yml
├── models
│   ├── CNN_C128_C256_M2_C256_C256_M2_C512_D_2.py
│   ├── CNN_C128_C256_M2_C512_D.py
│   ├── CNN_C32_C64_C128_C.py
│   ├── CNN_C32_C64_C128_C2.py
│   ├── CNN_C32_C64_C128_D.py
│   ├── CNN_C32_C64_C64_Cd64_C128_D.py
│   ├── CNN_C32_C64_M2_C128_D.py
│   ├── CNN_C32_C64_M2_C64_C64_M2_C128_D.py
│   ├── CNN_C32_C64_M2_C64_C64_M2_C128_D_2.py
│   ├── CNN_C32_Cd64_C64_Cd64_C128_D.py
│   ├── CNN_C64_C128_M2_C128_C128_M2_C256_D_2.py
│   ├── CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7.py
│   ├── CNN_C64_C128_M2_C128_C128_M2_C256_D_3.py
│   ├── CNN_C64_C128_M2_C256_D.py
│   ├── VGG16_AVG.py
│   ├── VGG16_AVG_r.py
│   ├── VGG16_C4096_C4096_AVG.py
│   ├── VGG16_D256.py
│   ├── VGG16_D4096_D4096.py
│   ├── VGG16_block4_D4096_D4096.py
│   ├── __init__.py
│   ├── simple_document_classification.py
│   └── vgg.py
├── train.py
└── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | Graph/
3 | logs/
4 | *.npz
5 | datasets/ocr
6 | *.zip
7 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Pretrained document features for document OCR, classification and segmentation
2 |
3 | The objective of this repository is to develop pretrained features for document images, to be used in document classification, segmentation, OCR and analysis. The pretrained features are trained on the results of an OCR engine, such as Tesseract.
4 |
5 |
6 |
7 |
8 | [PDF paper](images/Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf)
9 |
10 |
11 | ## Features
12 |
13 | - **Python 2 and Python 3 support**
14 |
15 | - **Tensorflow and CNTK support**
16 |
17 | To run the training with Tensorflow, activate the Python environment `source activate keras-tf-py27` and set the `backend` value to `tensorflow` in `~/.keras/keras.json`.
18 | 
19 | To run the training with CNTK, activate the Python environment `source activate cntk-py27` and set the `backend` value to `cntk` in `~/.keras/keras.json`.
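
A quick way to check which backend Keras is actually using (a minimal sketch, independent of this repository):

```python
# Prints the name of the active Keras backend ('tensorflow' or 'cntk'),
# as configured in ~/.keras/keras.json or via the KERAS_BACKEND environment variable.
from keras import backend as K
print(K.backend())
```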
20 |
21 | - **Multi-GPUs support**
22 |
23 | To enable parallel computing with multi-gpus:
24 | ```
25 | python train.py -p
26 | ```
27 |
28 | For CNTK, start parallel workers to use all GPUs:
29 | ```
30 | mpiexec --npernode 4 python train.py -p
31 | ```
32 |
33 | - **TensorBoard visualization**: train and validation loss, objectness accuracy per layer scale, class accuracy per layer scale, regression accuracy, object mAP score, target mAP score, original image, objectness map, multi-layer detections, detections after non-max suppression, target and groundtruth.
34 |
35 |
36 | ## Install requirements
37 |
38 | - Ubuntu 17.04
39 |
40 | - GPU support: [NVIDIA driver](http://www.nvidia.fr/download/driverResults.aspx/131287/fr), [Cuda 9.0](https://developer.nvidia.com/cuda-90-download-archive) and [Cudnn 7.0.4](https://developer.nvidia.com/rdp/form/cudnn-download-survey) (requirement by CNTK)
41 |
42 | - [CNTK install with MKL/OpenMPI/Protobuf/Zlib/LibZip/Boost/Swig/Anaconda3/Python support](https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-Linux)
43 |
44 | Create cntk-py35 and cntk-py27 Conda environments following their specs.
45 |
46 | Build: `../../configure --with-swig=/usr/local/swig-3.0.10 --with-py35-path=$HOME/anaconda3/envs/cntk-py35 --with-py27-path=$HOME/anaconda3/envs/cntk-py27`
47 |
48 | Update these environments to add Keras and the other libraries required by the current code:
49 | ```
50 | conda env update --file cntk-py27.yml
51 | conda env update --file cntk-py35.yml
52 | ```
53 |
54 | - Tensorflow and Python 2.7
55 | ```
56 | conda env update --file keras-tf-py27.yml
57 | ```
58 |
59 | - Tensorflow and Python 3.5
60 | ```
61 | conda env update --file keras-tf-py35.yml
62 | ```
63 |
64 | - HDF5 to save weights with Keras
65 |
66 | ```
67 | sudo apt-get install libhdf5-dev
68 | ```
69 |
70 |
71 |
72 | ## Run
73 |
74 | Activate one of the Conda environments:
75 | ```
76 | source activate cntk-py27
77 | source activate cntk-py35
78 | source activate keras-tf-py27
79 | source activate keras-tf-py35
80 | ```
81 |
82 | For help on available options:
83 |
84 | ```
85 | python train.py -h
86 | python3 train.py -h
87 |
88 | Using TensorFlow backend.
89 | Using CNTK backend
90 | Selected GPU[3] GeForce GTX 1080 Ti as the process wide default device.
91 | usage: train.py [-h] [-b BATCH_SIZE] [-p] [-e EPOCHS] [-l LOGS] [-m MODEL]
92 | [-lr LEARNING_RATE] [-s STRIDE_SCALE] [-d DATASET] [-w WHITE]
93 | [-n] [--pos_weight POS_WEIGHT] [--iou IOU]
94 | [--nms_iou NMS_IOU] [-i INPUT_DIM] [-r RESIZE] [--no-save]
95 | [--resume RESUME_MODEL]
96 |
97 | optional arguments:
98 | -h, --help show this help message and exit
99 | -b BATCH_SIZE, --batch_size BATCH_SIZE
100 | # of images per batch
101 | -p, --parallel Enable multi GPUs
102 | -e EPOCHS, --epochs EPOCHS
103 | # of training epochs
104 | -l LOGS, --logs LOGS log directory
105 | -m MODEL, --model MODEL
106 | model
107 | -lr LEARNING_RATE, --learning_rate LEARNING_RATE
108 | learning rate
109 | -s STRIDE_SCALE, --stride_scale STRIDE_SCALE
110 | Stride scale. If zero, default stride scale.
111 | -d DATASET, --dataset DATASET
112 | dataset
113 | -w WHITE, --white WHITE
114 | white probability for MNIST dataset
115 | -n, --noise noise for MNIST dataset
116 | --pos_weight POS_WEIGHT
117 | weight for positive objects
118 |   --iou IOU             iou threshold to consider a position to be positive. If
119 |                         -1, positive only if object included in the layer
120 |                         field
121 | --bb_positive BB_POSITIVE
122 | Possible values: iou-treshold, in-anchor, best-anchor
123 |   --nms_iou NMS_IOU     iou threshold for non max suppression
124 | -i INPUT_DIM, --input_dim INPUT_DIM
125 | network input dim
126 | -r RESIZE, --resize RESIZE
127 | resize input images
128 | --no-save save model and data to files
129 | --resume RESUME_MODEL
130 | --n_cpu N_CPU number of CPU threads to use during data generation
131 | ```
132 |
133 | ## OCR Training
134 |
135 | ### Toy dataset with MNIST "ocr_mnist"
136 |
137 |
138 | Train image recognition of digits on a white background (inverted MNIST images; see the short inversion sketch after the table):
139 |
140 | | Command | Obj acc | Class acc | Reg acc | Obj mAP |
141 | | --- | --- | --- | --- | --- |
142 | | `python train.py` | 100 | 99.2827 | 1.60e-10 | 99.93 |
143 | | With noise `python train.py -n` | 99.62 | 98.92 | 4.65e-6 | 98.41 |
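
For reference, the "white background" setting corresponds to inverted MNIST digits. A minimal illustration of the inversion step (this is only a sketch, not the repository's `ocr_mnist` generator):

```python
# Illustration only: invert MNIST digits so they appear dark on a white background.
from keras.datasets import mnist

(x_train, _), _ = mnist.load_data()  # light digits on a black background, values in [0, 255]
x_white = 255 - x_train              # dark digits on a white background
print(x_white.shape, x_white.dtype)
```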
144 |
145 |
146 |
147 |
148 | With stride 12 instead of default 28:
149 |
150 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
151 | | --- | --- | --- | --- | --- | --- |
152 | | `python train.py -s 6 --iou .15` | 96.37 | 36.25 | 0.010 | **99.97** | 100 |
153 | | `python train.py -s 6 --iou .2` | 98.42 | 28.56 | 0.012 | 99.75 | 100 |
154 | | `python train.py -s 6 --iou .25` | 97.05 | 36.42 | 0.015 | 99.52 | 100 |
155 | | `python train.py -s 6 --iou .3` | 98.35 | 92.78 | 0.0013 | 99.88 | 100 |
156 | | `python train.py -s 6 --iou .35` | 98.99| 83.72| 0.0069 | 99.22 | 100 |
157 | | `python train.py -s 6 --iou .4` | 98.70 | 94.96| 0.0066 | 98.37 | 100 |
158 | | `python train.py -s 6 --iou .5` | 96.71 | 95.46 | 0.0062| 91.09 | 95.71 |
159 | | `python train.py -s 6 --iou .6` | 99.92| 98.23| 4.8e-05 | 51.80 | 54.32 |
160 | | `python train.py -s 6 --iou .8` | 99.90 | **97.90** | 7.67e-05 | 8.5 | 10.63 |
161 | | `python train.py -s 6 --iou .95` | **99.94** | 97.27 | 3.7e-07 | 10.80 | 12.21 |
162 | | `python train.py -s 6 --iou .99` | 99.91 | 97.66 | 7.06e-07 | 9.3 | 11.71 |
163 |
164 |
165 | With stride 4:
166 |
167 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
168 | | --- | --- | --- | --- | --- | --- |
169 | | `python train.py -s 2 --iou .2` | 98.51 | 72.71 | 0.034 | 99.99 | 100 |
170 | | `python train.py -s 2 --iou .25` | 98.63 | 78.53 | 0.018 | **100** | 100 |
171 | | `python train.py -s 2 --iou .3` | 97.88 | 94.54 | 0.0098 | 99.89 | 100 |
172 | | `python train.py -s 2 --iou .4` | 96.85 | 97.41 | 0.0098 | 99.93 | 100 |
173 | | `python train.py -s 2 --iou .5` | 94.14 | 98.81 | 0.0099 | 99.61 | 100 |
174 | | `python train.py -s 2 --iou .6` | 99.80 | 98.57 | 0.00031 | 99.93 | 100 |
175 | | `python train.py -s 2 --iou .7` | 99.64 | 98.21 | 0.0016 | 99.77 | 100 |
176 | | `python train.py -s 2 --iou .8` | 100 | 98.19 | 1.7e-8 | 82.24 | 100 |
177 | | `python train.py -s 2 --iou .8 -e 30` | 99.98 | 99.35 | 1.73e-9 | 91.05 | 100 |
178 |
179 |
180 | Train on scale ranges [14-28]:
181 |
182 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
183 | | --- | --- | --- | --- | --- | --- |
184 | | `python train.py -r 14-28 -s 6 --iou .25 -e 30` | 99.10 | 89.37 | 0.0017 | 99.58 | 100 |
185 |
186 |
187 | With bigger net:
188 |
189 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
190 | | --- | --- | --- | --- | --- | --- |
191 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .5` | 99.59 | 98.02 | 0.00078 | 92.32 | 94.89 |
192 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .4` | 99.17 | 97.23 | 0.0047 | 99.79 | 100 |
193 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .3` | 99.74 | 96.84 | 0.00043 | **100** | 100 |
194 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .2` | 97.57 | 91.14 | 0.0016 | 99.98 | 100 |
195 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 6 --iou .15` | 98.02 | 83.85 | 0.0083 | 99.95 | 100 |
196 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 2 --iou .5` | 99.80 | 98.87 | 0.00053 | **100** | 100 |
197 | | `python train.py -m CNN_C64_C128_M2_C256_D -s 2 --iou .25` | 99.48 | 95.78 | 0.00054 | 100 | 100 |
198 | | `python train.py -r 14-28 -m CNN_C64_C128_M2_C256_D -s 6 --iou .25 -e 30` | 96.58 | 91.42 | 0.0045 | 99.85 | 100 |
199 |
200 |
201 | Train on scale 56x56:
202 |
203 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
204 | | --- | --- | --- | --- | --- | --- |
205 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D` | 99.98 | 99.22 | 7.4e-09 | 99.97 | 100 |
206 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .2` | 98.86 | 78.63 | 0.011 | 99.89 | 100 |
207 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .3` | 99.36 | 94.60 | 0.0036 | 99.97 | 100 |
208 | | `python train.py -r 56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .4` | 99.23 | 91.11 | 0.048 | **100** | 100 |
209 |
210 |
211 | Train for two stage networks (scales 28 and 56):
212 |
213 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
214 | | --- | --- | --- | --- | --- | --- |
215 | | `python train.py -r 28,56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2` | 99.99/1.0 | 98.62/96.69 | 1.06e-08/4.18e-05 | 99.97 | 100 |
216 | | `python train.py -r 28,56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 6 -e 50` | 99.51/97.76 | 89.83/95.22 | 0.0048/0.016 | 99.44 | 100 |
217 | | `python train.py -r 28,56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 4 -e 30` | 99.39/97.46 | 85.21/92.19 | 0.0054/0.022 | 99.64 | 100 |
218 |
219 |
220 |
221 | Train on scale ranges [28-56], two stages [14-28,28-56] and [14, 56]:
222 |
223 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
224 | | --- | --- | --- | --- | --- | --- |
225 | | `python train.py -r 28-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .25 -e 30` | 98.99 | 93.92 | 0.0018 | 99.89 | 100 |
226 | | `python train.py -r 14-28,28-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 6 --iou .25 -e 30` | 98.92/98.04 | 64.06/91.08 | 0.0037/0.0056 | 98.82 | 99.90 |
227 | | `python train.py -r 14-28,28-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 -s 6 --iou .2 -e 30` | 98.57/97.73 | 58.30/79.84 | 0.0058/0.0036 | 98.31 | 99.90 |
228 | | `python train.py -r 14-28,28-56 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 -s 6 --iou .25 -e 30` | 99.10 / 98.16 | 93.64 / 95.28 | 0.0016 / 0.0014 | 98.42 | 99.93 |
229 | | `python train.py -r 14-28,28-56 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 -s 6 --iou .25 -e 50` | 99.26 / 98.78 | 93.91 / 94.02 | 0.0010 / 0.0014 | 98.81 | 99.93 |
230 | | `python train.py -r 14-28,28-56 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 -s 6 --iou .2 -e 50` | 99.05/98.05 | 89.88/91.97 | 0.0021/0.0022 | 99.11 | 99.97 |
231 | | `python train.py -r 14-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .02 -e 30` | 97.58 | 30.17 | 0.10 | 75.07 | 100 |
232 | | `python train.py -r 14-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .05 -e 30` | 97.92 | 53.20 | 0.027 | 75.49 | 100 |
233 | | `python train.py -r 14-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .1 -e 30` | 97.82 | 58.44 | 0.0057 | 87.45 | 92.67 |
234 | | `python train.py -r 14-56 -m CNN_C32_C64_M2_C64_C64_M2_C128_D -s 6 --iou .2 -e 30` | 98.82 | 79.23 | 0.0010 | 72.36 | 75.78 |
235 |
236 |
237 | Train on lower resolution (digit resize parameter):
238 |
239 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
240 | | --- | --- | --- | --- | --- | --- |
241 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_D` | 100 | 99.04 | 2.2e-12 | 99.91 | 100 |
242 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_D -s 4` | 97.12 | 94.50 | 0.012 | 99.91 | 100 |
243 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_C` | 100 | 98.75 | 1.9e-05 | 97.02 | 100 |
244 | | `python train.py -e 30 -r 14 -m CNN_C32_C64_C128_C -s 4` | 98.00 | 91.69 | 0.023 | 93.87 | 100 |
245 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D` | 99.99 | 96.78 | 8.4e-5 | 99.85 | 100 |
246 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D -s 4` | 98.58 | 73.07 | 0.0087 | 98.61 | 100 |
247 | | `python train.py -e 30 -r 7-14 --iou .25 -m CNN_C32_C64_C128_D -s 4` | 99.07 | 75.34 | 0.012 | 98.98 | 100 |
248 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C` | 99.31 | 93.61 | 0.0035 | 92.52 | 100 |
249 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C -s 4` | 97.22 | 24.87 | 0.0060 | 97.68 | 100 |
250 | | `python train.py -e 30 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C2 -s 4` | 98.49 | 47.93 | 0.0088 | 98.91 | 100 |
251 | | `python train.py -e 30 -r 7-28 -s 6 -m CNN_C32_C64_C64_Cd64_C128_D --iou .02` | 96.51 | 24.42 | 0.12 | 64.43 | 66.47 |
252 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .2` | 99.12 | 91.01 | 0.0040 | 84.87 | 77.18 |
253 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | 98.40 | 77.86 | 0.029 | 88.68 | 85.71 |
254 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .1` | 98.20 | 56.96 | 0.086 | 87.51 | 95.34 |
255 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .05 -lr 0.001` | 97.71 | 38.91 | 0.032 | 77.98 | 100 |
256 | | `python train.py -e 30 -r 7-28 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .02 --lr 0.0001` | 96.79 | 18.59 | 0.10 | 77.28 | 100 |
257 | | `python train.py -e 30 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .1` | 97.47 | 73.70 | 0.010 | 87.19 | 95.45 |
258 | | `python train.py -e 30 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .2` | 99.08 | 92.84 | 0.0074 | 81.01 | 76.47 |
259 | | `python train.py -e 50 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | 98.71 | 88.02 | 0.0046 | 87.79 | 84.76 |
260 | | `python train.py -e 50 -r 7-28 -s 3 -m CNN_C32_C64_C64_Cd64_C128_D --iou .1` | 97.97 | 79.19 | 0.0096 | 89.17 | 95.24 |
261 |
262 |
263 |
264 | Train on larger images (1000 or 1500 rather than 700):
265 |
266 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
267 | | --- | --- | --- | --- | --- | --- |
268 | | `python train.py -e 30 -i 1000 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D -s 4` | 98.80 | 52.92 | 0.0081 | 98.78 | 100 |
269 | | `python train.py -e 30 -i 1000 -r 7-14 --iou .2 -m CNN_C32_C64_C128_C -s 4` | 98.24 | 20.36 | 0.011 | 97.46 | 100 |
270 | | `python train.py -e 30 -i 1500 -r 7-14 --iou .2 -m CNN_C32_C64_C128_D -s 4` | 98.61 | 47.04 | 0.0076 | 98.36 | 100 |
271 | | `python train.py -e 30 -i 1000 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 4` | 98.93 | 89.25 | 0.0031 | 81.39 | 76.23 |
272 | | `python train.py -e 30 -i 1500 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 3 -b 1` | 99.04 | 91.46 | 0.0063 | 82.33 | 76.95 |
273 | | `python train.py -e 50 -i 1500 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 3 -b 1` | 98.78 | 91.20 | 0.011 | 82.93 | 76.38 |
274 | | `python train.py -e 50 -i 1500 -r 7-28 --iou .2 -m CNN_C32_C64_C64_Cd64_C128_D -s 4 -b 1` | 98.96 | 92.69 | 0.0015 | 80.29 | 76.97 |
275 |
276 |
277 | ### OCR Dataset "ocr_documents"
278 |
279 | Create a JSON document configuration file `document.conf` specifying the directory that contains the document image files in JPG format:
280 |
281 | ```json
282 | {
283 | "directory": "/sharedfiles/ocr_documents",
284 | "namespace": "ivalua.xml",
285 | "page_tag": "page",
286 | "char_tag": "char",
287 | "x1_attribute": "x1",
288 | "y1_attribute": "y1",
289 | "x2_attribute": "x2",
290 | "y2_attribute": "y2"
291 | }
292 | ```
293 |
294 | Use Tesseract OCR to produce the XML files:
295 |
296 | ```
297 | sudo apt-get install tesseract-ocr tesseract-ocr-fra
298 | python datasets/ocr_documents_preprocess.py
299 | ```
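
For reference, a minimal sketch of reading the character boxes back from one of the generated XML files with lxml, assuming the element and attribute names follow `document.conf` above (the exact layout produced by `ocr_documents_preprocess.py` may differ):

```python
# Hypothetical example: list the character bounding boxes of one page.
# The namespace, tags and attribute names are taken from document.conf; the file name is made up.
from lxml import etree

ns = "ivalua.xml"
tree = etree.parse("/sharedfiles/ocr_documents/example.xml")
for char in tree.iter("{%s}char" % ns):
    print(char.text, char.get("x1"), char.get("y1"), char.get("x2"), char.get("y2"))
```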
300 |
301 | Get document statistics with `python datasets/ocr_documents_statistics.py`.
302 |
303 | By default, the input size is 700, which means 3500x2500 input images are cropped to 700x420:
304 |
305 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
306 | | --- | --- | --- | --- | --- | --- |
307 | | `python train.py -e 50 -d ocr_documents -s 2 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 97.00/97.76 | 69.11/71.78 | 0.027/0.016 | 58.82 | 91.22 |
308 | | `python train.py -e 50 -d ocr_documents -s 2 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 97.89/98.44 | 75.39/72.75 | 0.020/0.011 | 68.09 | 84.47 |
309 | | `python train.py -e 50 -d ocr_documents -s 2 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 98.19 | 81.43 | 0.014 | **64.69** | 65.40 |
310 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 97.52/ 97.58 | 72.18/77.03 | 0.028/0.015 | **67.05** | 86.07 |
311 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 98.24/98.25 | 79.01/79.47 | 0.019/0.10 | 66.25 | 78.15 |
312 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 98.60/98.90 | 80.17/78.93 | 0.015/0.0075 | 62.71 | 66.42 |
313 | | `python train.py -e 50 -d ocr_documents -s 4 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 97.90/97.50 | 72.05/74.58 | 0.029/0.017 | 62.87 | 89.77 |
314 | | `python train.py -e 50 -d ocr_documents -s 4 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 98.42/97.99 | 78.35/79.15 | 0.021/0.012 | **66.30** | 83.94 |
315 | | `python train.py -e 50 -d ocr_documents -s 4 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 98.88/98.61 | 77.64/81.11 | 0.017/0.0077 | 60.26 | 69.35 |
316 | | `python train.py -e 50 -d ocr_documents -s 5 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.15` | 98.47/97.36 | 70.94/77.87 | 0.031/0.018 | **59.33** | 85.87 |
317 | | `python train.py -e 50 -d ocr_documents -s 5 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.2` | 98.92/97.76 | 67.94/80.13 | 0.021/0.014 | 51.87 | 77.52 |
318 | | `python train.py -e 50 -d ocr_documents -s 5 -m CNN_C32_C64_M2_C64_C64_M2_C128_D_2 --iou 0.25` | 99.09/98.45 | 70.41/83.67 | 0.018/0.0097 | 44.59 | 61.57 |
319 |
320 |
321 | With more capacity:
322 |
323 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
324 | | --- | --- | --- | --- | --- | --- |
325 | | `python train.py -e 50 -d ocr_documents -s 3 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 --iou 0.2` (1) | 98.45/98.66 | 83.27/85.42 | 0.018/0.0097 | 70.11 | 78.15 |
326 |
327 | (1) Model Tensorflow `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-05-28_20:03_CNN_C64_C128_M2_C128_C128_M2_C256_D_2.h5`
328 |
329 | Model CNTK `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-04_12:05_CNN_C64_C128_M2_C128_C128_M2_C256_D_2.h5` and `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-13_21.37_CNN_C64_C128_M2_C128_C128_M2_C256_D_2.dnn`
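
To inspect one of the downloaded `.h5` files outside of `train.py --resume`, a minimal sketch (passing `compile=False` skips restoring the custom training loss; adapt the file name to the file you downloaded):

```python
# Load a downloaded Keras model file and print its architecture.
from keras.models import load_model

model = load_model("2018-05-28_20:03_CNN_C64_C128_M2_C128_C128_M2_C256_D_2.h5", compile=False)
model.summary()
```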
330 |
331 |
332 |
333 | To train on lower resolution, resize the input images to 1000 (a 3.5x downsize) and change the input size by the same factor, to 200, in order to get 200x120 crops:
334 |
335 | | Command | Obj acc | Class acc | Reg acc | Obj mAP | Target mAP |
336 | | --- | --- | --- | --- | --- | --- |
337 | | `python train.py -e 150 -d ocr_documents -r 1000 -i 200 -s 6 -m CNN_C64_C128_M2_C256_D --iou .25` | 98.90 | 34.14 | 0.013 | 8.82 | 29.58 |
338 | | `python train.py -e 150 -d ocr_documents -r 1000 -i 200 -s 6 -m CNN_C64_C128_M2_C256_D --iou .2` | | | | | |
339 | | `python train.py -e 150 -d ocr_documents -r 1000 -i 200 -s 1 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7 --iou 0.2 -b 1` (2) | 98.02/99.85 | 72.54/.00 | 0.013/0.0017 | 48.81 | 69.38 |
340 | | `python train.py -e 50 -d ocr_documents -r 1000 -i 200 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | 98.32 | 45.78 | 0.018 | 36.17 | 69.74 |
341 | | `python train.py -e 50 -d ocr_documents -r 1000 -i 200 -s 4 -m CNN_C32_C64_C128_D --iou .15` | 96.87 | 61.79 | 0.023 | 46.89 | 69.08 |
342 | | `python train.py -e 50 -d ocr_documents -r 1000 -i 200 -s 4 -m CNN_C32_C64_C128_D --iou .2` | 97.20 | 62.90 | 0.016 | 42.25 | 61.84 |
343 | | `python train.py -e 150 -d ocr_documents -r 1700 -i 400 -s 6 -m CNN_C64_C128_M2_C256_D --iou .25` | 98.38 | 86.83 | 0.012 | 31.76 | 43.46 |
344 | | `python train.py -e 150 -d ocr_documents -r 1700 -i 400 -s 6 -m CNN_C64_C128_M2_C256_D --iou .2` | 97.72 | 83.86 | 0.016 | 42.00 | 59.83 |
345 |
346 | (2) `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-22_13:02_CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7.h5`
347 |
348 |
349 |
350 | For OCR training on full document images:
351 |
352 | | Command | Obj acc | Class acc | Reg acc |
353 | | --- | --- | --- | --- |
354 | | `python train.py -e 50 -d ocr_documents_generator -i 2000 -r 2000 -s 3 -m CNN_C64_C128_M2_C128_C128_M2_C256_D_2 --iou 0.2` | S3 | | |
355 | | `python train.py -e 50 -d ocr_documents_generator --n_cpu 8 -i 1000 -r 1000 -s 4 -m CNN_C32_C64_C64_Cd64_C128_D --iou .15` | S3 | | |
356 | | `python train.py -e 50 -d ocr_documents_generator --n_cpu 8 -i 1000 -r 1000 -s 4 -m CNN_C32_C64_C128_D --iou .2` (3) | 98.49 | 69.11 | 0.0158 |
357 | | `python train.py -e 50 -d ocr_documents_generator --n_cpu 8 -i 1500 -r 1500 -s 4 -m CNN_C32_C64_C128_D --iou .2` | V1 Good | | |
358 |
359 |
360 | (3) `wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/2018-06-25_15:44_CNN_C32_C64_C128_D.h5`
361 |
362 | ## Classification Training
363 |
364 | ### Cats and dogs dataset
365 |
366 | Download the dataset from https://www.kaggle.com/c/dogs-vs-cats/data
367 |
368 | ```
369 | unzip /sharedfiles/train.zip -d /sharedfiles
370 | ./datasets/cls_dogs_vs_cats.sh /sharedfiles/train
371 | ```
372 |
373 | | Command | Class acc |
374 | | --- | --- |
375 | | `python train.py -d cls_dogs_vs_cats -i 150 -m VGG16_D256 -lr 0.001 -b 16` | 91.82 |
376 |
377 | ### Tiny ImageNet dataset
378 |
379 | Download the [dataset](https://tiny-imagenet.herokuapp.com/):
380 |
381 | ```
382 | wget http://cs231n.stanford.edu/tiny-imagenet-200.zip -P /sharedfiles
383 | unzip /sharedfiles/tiny-imagenet-200.zip -d /sharedfiles/
384 | ./datasets/cls_tiny_imagenet_convert.sh /sharedfiles/tiny-imagenet-200
385 | python train.py -d cls_tiny_imagenet -i 150 -m VGG16_D4096_D4096 -lr 0.001 -b 64 -e 150 -p
386 | ```
387 |
388 | ### RVL-CDIP dataset
389 |
390 | ```
391 | wget https://s3-eu-west-1.amazonaws.com/christopherbourez/public/rvl-cdip.tar.gz -P /sharedfiles
392 | # aws s3 cp s3://christopherbourez/public/rvl-cdip.tar.gz /sharedfiles/
393 | mkdir /sharedfiles/rvl_cdip
394 | tar xvzf /sharedfiles/rvl-cdip.tar.gz -C /sharedfiles/rvl_cdip
395 | ./datasets/cls_rvl_cdip_convert.sh /sharedfiles/rvl_cdip
396 | # remove corrupted tiff
397 | rm /sharedfiles/rvl_cdip/test/scientific_publication/2500126531_2500126536.tif
398 | ```
399 |
400 | | Command | Class acc |
401 | | --- | --- |
402 | | `python train.py -d cls_rvl_cdip -i 150 -m VGG16_D4096_D4096 -lr 0.0001 -b 64 -e 25 -p` | 90.2 |
403 |
--------------------------------------------------------------------------------
/callback.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | from keras.callbacks import Callback
3 | from keras import backend as K
4 | from datasets import get_layer_sizes, iou
5 | import numpy as np
6 | import cv2
7 | import math, os  # os is needed for the embeddings checkpoint path below
8 |
9 | colors = [(86, 0, 240), (173, 225, 61), (54, 137, 255),\
10 | (151, 0, 255), (243, 223, 48), (0, 117, 255),\
11 | (58, 184, 14), (86, 67, 140), (121, 82, 6),\
12 | (174, 29, 128), (115, 154, 81), (86, 255, 234)]
13 | np_colors=np.array(colors)
14 |
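# Decode the raw multi-scale output maps into candidate boxes, one list per image.
# Each candidate is [objectness, y1, x1, height, width, layer_index] in input-image pixels;
# only grid cells whose objectness exceeds 0.5 are kept.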
15 | def compute_eligible_rectangles(output_maps, layer_strides, layer_offsets, layer_fields, stride_margin, num_classes, layer_sizes):
16 | res = []
17 | for i in range(output_maps[0].shape[0]):
18 | eligible = []
19 | for o , output_map in enumerate(output_maps):
20 | dim = layer_fields[o]
21 | if stride_margin:
22 | dim = dim - layer_strides[o]
23 |
24 | # class_prob_map = output_map[0, :, :, 0:nb_classes ] # (15, 25, nb_classes)
25 | # class_map = np.argmax(class_prob_map, axis=-1) # (15, 25)
26 | objectness_map = output_map[i, :, :, num_classes ] # (15, 25)
27 | reg = output_map[i, :, :, num_classes+1:num_classes+5]
28 |
29 | for y in range(objectness_map.shape[0]):
30 | for x in range(objectness_map.shape[1]):
31 | if objectness_map[y, x] > 0.5:
32 | w_2 = int(dim * 2**(-reg[y,x,3] -1)) # half width
33 | h_2 = int(dim * 2**(-reg[y,x,2] -1)) # half height
34 | x1 = layer_offsets[o] + x * layer_strides[o] + reg[y,x,1] * dim - w_2
35 | y1 = layer_offsets[o] + y * layer_strides[o] + reg[y,x,0] * dim - h_2
36 | x2 = layer_offsets[o] + x * layer_strides[o] + reg[y,x,1] * dim + w_2
37 | y2 = layer_offsets[o] + y * layer_strides[o] + reg[y,x,0] * dim + h_2
38 | eligible.append( [objectness_map[y, x], y1, x1, 2 * h_2 , 2 * w_2, o ] )
39 | res.append(eligible)
40 | return res
41 |
42 |
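# Greedy per-image non-max suppression: walk the candidates in decreasing score order
# and drop any box whose IoU with an already kept box exceeds nms_iou.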
43 | def non_max_suppression(rectangles, nms_iou):
44 | res = []
45 | for eligible in rectangles:
46 | valid = []
47 | if len(eligible) > 0:
48 |             index = np.argsort(- np.array(eligible)[:,0])
49 |             valid.append(eligible[index[0]]) # seed with the highest-scoring box
50 | for i in index:
51 | if np.max(iou( np.array(valid)[:,1:], np.array( [eligible[i][1:5]] ))) > nms_iou:
52 | continue
53 | else:
54 | valid.append( eligible[i] )
55 | res.append(valid)
56 | return res
57 |
58 |
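# Average precision (area under the interpolated precision/recall curve) over the whole
# validation set, plus the mean center distance between matched detections and groundtruths.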
59 | def compute_map_score_and_mean_distance(val_gt, detections, overlap_threshold = 0.5):
60 | precision_recall = []
61 | fp, tp = 0, 0
62 | distance = 0.0
63 |
64 | # unflatten groundtruth, flatten detections for ordering
65 | nb_groundtruth = 0
66 | groundtruth = []
67 | gt_detected = []
68 | flattened_detections = []
69 | for image_id in range(len(detections)):
70 | gt = []
71 | for r in val_gt:
72 | if r[0] == image_id:
73 | gt.append( r[1:5] )
74 | nb_groundtruth = nb_groundtruth + 1
75 | groundtruth.append(gt)
76 | gt_detected.append(np.zeros((len(gt))))
77 | for d in range(len(detections[image_id])):
78 | flattened_detections.append( (detections[image_id][d][0], image_id, d ) )
79 |
80 | # order detections
81 | if len(flattened_detections) > 0:
82 | index = np.argsort(- np.array(flattened_detections)[:,0])
83 |
84 | # compute recall and precision for increasingly large subset of detections
85 | for i in index: # iterate through all predictions
86 | image_id = flattened_detections[i][1]
87 | d = flattened_detections[i][2]
88 | detection = np.array([ detections[image_id][d][1:5] ])
89 |
90 | gt = np.array(groundtruth[image_id])
91 | if len(gt) == 0:
92 | fp = fp + 1
93 | else:
94 | iou_scores = iou(gt, detection)
95 | m = np.argmax(iou_scores)
96 | if iou_scores[m] > overlap_threshold:
97 | if gt_detected[image_id][m] == 0: # not yet detected
98 | gt_detected[image_id][m] = 1
99 | tp = tp + 1
100 | distance = distance + math.sqrt( (gt[m][0] + gt[m][2]/2 - detection[0][0] - detection[0][2]/2)**2 + (gt[m][1] + gt[m][3]/2 - detection[0][1] - detection[0][3]/2)**2 )
101 | else: # detected twice
102 | fp = fp + 1
103 | else:
104 | fp = fp + 1
105 | precision_recall.append( ( tp/max(tp+fp, 1), tp/max(nb_groundtruth,1) ) )
106 |
107 | # filling the dips
108 | interpolated_precision_recall = []
109 | for i in range(len(precision_recall)):
110 | if precision_recall[i][0] >= max( [ p for p, _ in precision_recall[i:] ] ):
111 | interpolated_precision_recall.append(precision_recall[i])
112 |
113 | mAP = 0
114 | previous_r = 0
115 | for p, r in interpolated_precision_recall:
116 | mAP += p * (r - previous_r)
117 | previous_r = r
118 |
119 | return mAP, distance / max(tp, 1)
120 |
121 |
122 |
123 | class TensorBoard(Callback):
124 | """TensorBoard basic visualizations.
125 | [TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard)
126 | is a visualization tool provided with TensorFlow.
127 | This callback writes a log for TensorBoard, which allows
128 | you to visualize dynamic graphs of your training and test
129 | metrics, as well as activation histograms for the different
130 | layers in your model.
131 | If you have installed TensorFlow with pip, you should be able
132 | to launch TensorBoard from the command line:
133 | ```sh
134 | tensorboard --logdir=/full_path_to_your_logs
135 | ```
136 | When using a backend other than TensorFlow, TensorBoard will still work
137 | (if you have TensorFlow installed), but the only feature available will
138 | be the display of the losses and metrics plots.
139 | # Arguments
140 | log_dir: the path of the directory where to save the log
141 | files to be parsed by TensorBoard.
142 | histogram_freq: frequency (in epochs) at which to compute activation
143 | and weight histograms for the layers of the model. If set to 0,
144 | histograms won't be computed. Validation data (or split) must be
145 | specified for histogram visualizations.
146 | write_graph: whether to visualize the graph in TensorBoard.
147 | The log file can become quite large when
148 | write_graph is set to True.
149 | write_grads: whether to visualize gradient histograms in TensorBoard.
150 | `histogram_freq` must be greater than 0.
151 | batch_size: size of batch of inputs to feed to the network
152 | for histograms computation.
153 | write_images: whether to write model weights to visualize as
154 | image in TensorBoard.
155 | embeddings_freq: frequency (in epochs) at which selected embedding
156 | layers will be saved.
157 | embeddings_layer_names: a list of names of layers to keep eye on. If
158 | None or empty list all the embedding layer will be watched.
159 | embeddings_metadata: a dictionary which maps layer name to a file name
160 | in which metadata for this embedding layer is saved. See the
161 | [details](https://www.tensorflow.org/how_tos/embedding_viz/#metadata_optional)
162 | about metadata files format. In case if the same metadata file is
163 | used for all embedding layers, string can be passed.
164 | """
165 |
166 | def __init__(self, val_gt, classes, stride_margin, layer_strides, layer_offsets, layer_fields, nms_iou = .5,
167 | log_dir='./logs',
168 | histogram_freq=0,
169 | batch_size=32,
170 | max_validation_size=10000,
171 | write_graph=True,
172 | write_grads=False,
173 | write_images=False,
174 | write_output_images=False,
175 | enable_boundingbox=False,
176 | enable_segmentation=False,
177 | batch_display_freq=100,
178 | embeddings_freq=0,
179 | embeddings_layer_names=None,
180 | embeddings_metadata=None, val_data=None):
181 | super(TensorBoard, self).__init__()
182 | global tf, projector
183 | try:
184 | import tensorflow as tf
185 | from tensorflow.contrib.tensorboard.plugins import projector
186 | except ImportError:
187 | raise ImportError('You need the TensorFlow module installed to use TensorBoard.')
188 |
189 | self.val_gt = val_gt
190 | self.classes = classes
191 | self.num_classes = len(classes)
192 | self.stride_margin = stride_margin
193 | self.layer_strides = layer_strides
194 | self.layer_offsets = layer_offsets
195 | self.layer_fields = layer_fields
196 | self.epoch = 0
197 | self.nms_iou = nms_iou
198 | self.log_dir = log_dir
199 | self.histogram_freq = histogram_freq
200 | self.merged = None
201 | self.write_output_images = write_output_images
202 | self.enable_boundingbox = enable_boundingbox
203 | self.enable_segmentation = enable_segmentation
204 | self.batch_display_freq = batch_display_freq
205 | self.write_graph = write_graph
206 | self.write_grads = write_grads
207 | self.write_images = write_images
208 | self.embeddings_freq = embeddings_freq
209 | self.embeddings_layer_names = embeddings_layer_names
210 | self.embeddings_metadata = embeddings_metadata or {}
211 | self.batch_size = batch_size
212 | self.max_validation_size = max_validation_size
213 | self.val_data = val_data
214 |
215 | def set_model(self, model):
216 | self.model = model
217 | if K.backend() == 'tensorflow':
218 | self.sess = K.get_session()
219 |
220 | if self.write_output_images:
221 | self.log_image_data = tf.placeholder(tf.uint8, [None, None, 3])
222 | self.log_image_name = tf.placeholder(tf.string)
223 | from tensorflow.python.ops import gen_logging_ops
224 | from tensorflow.python.framework import ops as _ops
225 | self.log_image = gen_logging_ops._image_summary(self.log_image_name, tf.expand_dims(self.log_image_data, 0), max_images=1)
226 | _ops.add_to_collection(_ops.GraphKeys.SUMMARIES, self.log_image)
227 |
228 | if self.histogram_freq and self.merged is None:
229 | for layer in self.model.layers:
230 |
231 | for weight in layer.weights:
232 | mapped_weight_name = weight.name.replace(':', '_')
233 | tf.summary.histogram(mapped_weight_name, weight)
234 | if self.write_grads:
235 | grads = model.optimizer.get_gradients(model.total_loss,
236 | weight)
237 |
238 | def is_indexed_slices(grad):
239 | return type(grad).__name__ == 'IndexedSlices'
240 | grads = [
241 | grad.values if is_indexed_slices(grad) else grad
242 | for grad in grads]
243 | tf.summary.histogram('{}_grad'.format(mapped_weight_name), grads)
244 | if self.write_images:
245 | w_img = tf.squeeze(weight)
246 | shape = K.int_shape(w_img)
247 | if len(shape) == 2: # dense layer kernel case
248 | if shape[0] > shape[1]:
249 | w_img = tf.transpose(w_img)
250 | shape = K.int_shape(w_img)
251 | w_img = tf.reshape(w_img, [1,
252 | shape[0],
253 | shape[1],
254 | 1])
255 | elif len(shape) == 3: # convnet case
256 | if K.image_data_format() == 'channels_last':
257 | # switch to channels_first to display
258 | # every kernel as a separate image
259 | w_img = tf.transpose(w_img, perm=[2, 0, 1])
260 | shape = K.int_shape(w_img)
261 | w_img = tf.reshape(w_img, [shape[0],
262 | shape[1],
263 | shape[2],
264 | 1])
265 | elif len(shape) == 1: # bias case
266 | w_img = tf.reshape(w_img, [1,
267 | shape[0],
268 | 1,
269 | 1])
270 | else:
271 | # not possible to handle 3D convnets etc.
272 | continue
273 |
274 | shape = K.int_shape(w_img)
275 | assert len(shape) == 4 and shape[-1] in [1, 3, 4]
276 | tf.summary.image(mapped_weight_name, w_img)
277 |
278 | if hasattr(layer, 'output'):
279 | tf.summary.histogram('{}_out'.format(layer.name),
280 | layer.output)
281 | self.merged = tf.summary.merge_all()
282 |
283 | if self.write_graph:
284 | self.writer = tf.summary.FileWriter(self.log_dir,
285 | self.sess.graph)
286 | else:
287 | self.writer = tf.summary.FileWriter(self.log_dir)
288 |
289 | if self.embeddings_freq:
290 | embeddings_layer_names = self.embeddings_layer_names
291 |
292 | if not embeddings_layer_names:
293 | embeddings_layer_names = [layer.name for layer in self.model.layers
294 | if type(layer).__name__ == 'Embedding']
295 |
296 | embeddings = {layer.name: layer.weights[0]
297 | for layer in self.model.layers
298 | if layer.name in embeddings_layer_names}
299 |
300 | self.saver = tf.train.Saver(list(embeddings.values()))
301 |
302 | embeddings_metadata = {}
303 |
304 | if not isinstance(self.embeddings_metadata, str):
305 | embeddings_metadata = self.embeddings_metadata
306 | else:
307 | embeddings_metadata = {layer_name: self.embeddings_metadata
308 | for layer_name in embeddings.keys()}
309 |
310 | config = projector.ProjectorConfig()
311 | self.embeddings_ckpt_path = os.path.join(self.log_dir,
312 | 'keras_embedding.ckpt')
313 |
314 | for layer_name, tensor in embeddings.items():
315 | embedding = config.embeddings.add()
316 | embedding.tensor_name = tensor.name
317 |
318 | if layer_name in embeddings_metadata:
319 | embedding.metadata_path = embeddings_metadata[layer_name]
320 |
321 | projector.visualize_embeddings(self.writer, config)
322 |
323 | def on_batch_end(self, batch, logs=None):
324 | """Called at the end of a batch.
325 | # Arguments
326 | batch: integer, index of batch within the current epoch.
327 | logs: dictionary of logs.
328 | """
329 | if batch % self.batch_display_freq != 0:
330 | return
331 |
332 | logs = logs or {}
333 | batch_size = logs.get('size', 0)
334 | # print(self.model.output.shape)
335 | # for layer in self.model.layers:
336 | # print(layer.name)
337 | # print(self.model.output.name)
338 |
339 | # self.infer = K.function([self.model.input]+ [K.learning_phase()], [self.model.output] )
340 | # start_batch = batch * batch_size
341 | # output_map = self.infer([self.train_data[start_batch:(start_batch+1)], 1]) # [Tensor((1, 25, 15, nb_classes + 1))]
342 |
343 | # self.writer.flush()
344 |
345 |
346 | def on_epoch_end(self, epoch, logs=None):
347 | self.epoch = epoch + 1
348 | logs = logs or {}
349 |
350 | if not self.validation_data:
351 | # creating validation data from validation generator
352 | print("Feeding callback validation data with Generator")
353 | j = 0
354 | imgs = []
355 | tags = [[] for s in range(len(self.layer_offsets))]
356 | for i in self.val_data:
357 | imgs.append(i[0])
358 | for s in range(len(self.layer_offsets)):
359 | tags[s].append(i[1][s] )
360 | j = j + 1
361 | if j > 10:
362 | break
363 |
364 | np_imgs = np.concatenate(imgs, axis=0)
365 | np_tags = []
366 | for s in range(len(self.layer_offsets)):
367 | np_tags.append( np.concatenate( tags[s], axis=0 ) )
368 | self.validation_data = [np_imgs] + np_tags + [ np.ones(np_imgs.shape[0]), 0.0]
369 |
370 | if not self.validation_data and self.histogram_freq:
371 | raise ValueError('If printing histograms, validation_data must be '
372 | 'provided, and cannot be a generator.')
373 | if self.validation_data and self.histogram_freq:
374 | if epoch % self.histogram_freq == 0:
375 |
376 | val_data = self.validation_data
377 | tensors = (self.model.inputs +
378 | self.model.targets +
379 | self.model.sample_weights)
380 |
381 | if self.model.uses_learning_phase:
382 | tensors += [K.learning_phase()]
383 |
384 | assert len(val_data) == len(tensors)
385 | val_size = val_data[0].shape[0]
386 | i = 0
387 | while i < val_size:
388 | step = min(self.batch_size, val_size - i)
389 | if self.model.uses_learning_phase:
390 | # do not slice the learning phase
391 | batch_val = [x[i:i + step] for x in val_data[:-1]]
392 | batch_val.append(val_data[-1])
393 | else:
394 | batch_val = [x[i:i + step] for x in val_data]
395 | assert len(batch_val) == len(tensors)
396 | feed_dict = dict(zip(tensors, batch_val))
397 | result = self.sess.run([self.merged], feed_dict=feed_dict)
398 | summary_str = result[0]
399 | self.writer.add_summary(summary_str, epoch)
400 | i += self.batch_size
401 |
402 | if self.embeddings_freq and self.embeddings_ckpt_path:
403 | if epoch % self.embeddings_freq == 0:
404 | self.saver.save(self.sess,
405 | self.embeddings_ckpt_path,
406 | epoch)
407 |
408 | for name, value in logs.items():
409 | if name in ['batch', 'size']:
410 | continue
411 | summary = tf.Summary()
412 | summary_value = summary.value.add()
413 | summary_value.simple_value = value.item()
414 | summary_value.tag = name
415 | self.writer.add_summary(summary, epoch)
416 |
417 |
418 | if self.validation_data and self.write_output_images:
419 | ######### original image
420 | # from skimage.io import imsave
421 | # import os
422 | # import numpy as np
423 | # if not os.path.exists(self.log_dir):
424 | # os.mkdir(self.log_dir)
425 | val_img_data = self.validation_data[0]
426 | val_size = min(val_img_data.shape[0], self.max_validation_size)
427 |             tensors = list(self.model.inputs)  # copy, so the learning-phase tensor appended below does not end up in model.inputs
428 | img_shape = val_img_data[0].shape
429 | layer_sizes = get_layer_sizes(img_shape, self.layer_offsets, self.layer_strides)
430 | detections, target_detections = [], []
431 | i = 0
432 | while i < val_size:
433 | step = min(self.batch_size, val_size - i)
434 | batch_val = [val_img_data[i:i + step], 1]
435 | if self.model.uses_learning_phase:
436 | tensors += [K.learning_phase()]
437 | batch_val.append(1)
438 | feed_dict = dict(zip(tensors, batch_val))
439 |
440 | if self.enable_boundingbox or self.enable_segmentation:
441 | output_maps = self.sess.run(self.model.outputs, feed_dict=feed_dict)
442 |
443 | if self.enable_boundingbox:
444 | eligible = compute_eligible_rectangles(output_maps,
445 | self.layer_strides, self.layer_offsets, self.layer_fields,
446 | self.stride_margin, self.num_classes, layer_sizes)
447 | valid = non_max_suppression(eligible, self.nms_iou)
448 |
449 | # compute targets for display
450 | target = compute_eligible_rectangles([self.validation_data[s+1][i:i+step] for s in range(len(layer_sizes))],
451 | self.layer_strides, self.layer_offsets, self.layer_fields,
452 | self.stride_margin, self.num_classes, layer_sizes)
453 |
454 | if i <= 10:
455 | # display results on a few images
456 | for image_id in range(step):
457 | image = (val_img_data[i + image_id] * 255.)#.astype(np.uint8)
458 | if image.shape[2] == 1:
459 | image = np.tile(image,(1,1,3))
460 | # imsave(os.path.join(self.log_dir, str(image_id) + '_input.png'), image)
461 | t = (self.epoch - 1) * val_img_data.shape[0] + i + image_id
462 |
463 | log_image_summary_op = self.sess.run(self.log_image, \
464 | feed_dict={self.log_image_name: "1-input", self.log_image_data: image})
465 | self.writer.add_summary(log_image_summary_op, global_step=t)
466 |
467 | if self.enable_boundingbox:
468 | # draw objectness
469 | image_ = np.copy(image)
470 | for o, output_map in enumerate(output_maps):
471 | objectness = output_map[image_id, :, :, self.num_classes: self.num_classes+1 ] * 255.
472 | log_image_summary_op = self.sess.run(self.log_image, \
473 | feed_dict={self.log_image_name: "2-objectness-" + str(o), self.log_image_data: np.tile( objectness,(1,1,3)) })
474 | self.writer.add_summary(log_image_summary_op, global_step=t)
475 | dim = self.layer_fields[o]
476 | cv2.rectangle(image_, (0, 0), (dim, dim), colors[o % len(colors)], 2)
477 | if self.stride_margin:
478 | dim = dim - self.layer_strides[o]
479 | cv2.rectangle(image_, (0, 0), (dim, dim), colors[o % len(colors)], 2)
480 |
481 | # draw eligible rectangles (before non max suppression)
482 | for r in eligible[image_id]:
483 | cv2.rectangle(image_, (int(r[2]), int(r[1])), (int(r[2]+r[4]), int(r[1]+r[3])), colors[r[5] % len(colors)], 2)
484 | log_image_summary_op = self.sess.run(self.log_image, \
485 | feed_dict={self.log_image_name: "3-result", self.log_image_data: image_})
486 | self.writer.add_summary(log_image_summary_op, global_step=t)
487 |
488 | # display results (after non max suppression)
489 | res_image = np.copy(image)
490 | for r in valid[image_id]:
491 | cv2.rectangle(res_image, (int(r[2]), int(r[1])), (int(r[2] + r[4]), int(r[1] + r[3])), colors[0], 2)
492 |
493 | log_image_summary_op = self.sess.run(self.log_image, \
494 | feed_dict={self.log_image_name: "4-after-nms", self.log_image_data: res_image})
495 | self.writer.add_summary(log_image_summary_op, global_step=t)
496 |
497 | # display target label
498 | target_image = np.copy(image)
499 | for r in target[image_id]:
500 | cv2.rectangle(target_image, (int(r[2]), int(r[1])), (int(r[2]+r[4]), int(r[1]+r[3])), colors[r[5] % len(colors)], 2)
501 | log_image_summary_op = self.sess.run(self.log_image, \
502 | feed_dict={self.log_image_name: "5-target", self.log_image_data: target_image})
503 | self.writer.add_summary(log_image_summary_op, global_step=t)
504 |
505 | # display groundtruth boxes
506 | for r in self.val_gt:
507 | if r[0] == i + image_id:
508 | cv2.rectangle(image, (int(r[2]), int(r[1])), (int(r[2]+r[4]), int(r[1]+r[3])), (86 / 255., 0, 240/255.), 2)
509 | log_image_summary_op = self.sess.run(self.log_image, \
510 | feed_dict={self.log_image_name: "6-groundtruth", self.log_image_data: image})
511 | self.writer.add_summary(log_image_summary_op, global_step=t)
512 |
513 | if self.enable_segmentation:
514 | # draw segmentation maps
515 | for o, output_map in enumerate(output_maps):
516 | output_map_labels = np.argmax( output_map[image_id], axis=-1 )
517 | output_map_color = np_colors[output_map_labels]
518 | log_image_summary_op = self.sess.run(self.log_image, \
519 | feed_dict={self.log_image_name: "2-segmentation-" + str(o), self.log_image_data: output_map_color })
520 | self.writer.add_summary(log_image_summary_op, global_step=t)
521 |
522 | target_image = np.copy(image)
523 | target_map_labels = np.argmax(self.validation_data[o+1][i+image_id], axis=-1)
524 | target_map_color = np_colors[target_map_labels]
525 | log_image_summary_op = self.sess.run(self.log_image, \
526 | feed_dict={self.log_image_name: "3-target-segmentation-" + str(o), self.log_image_data: target_map_color }) # cv2.addWeighted(target_image,0.1,cv2.resize(target_map_color, (target_image.shape[1], target_image.shape[0])),0.9,0, dtype=cv2.CV_32F)
527 | self.writer.add_summary(log_image_summary_op, global_step=t)
528 |
529 | # next batch
530 | if self.enable_boundingbox:
531 | detections = detections + valid
532 | target_detections = target_detections + non_max_suppression(target, self.nms_iou)
533 | i += step
534 |
535 | # compute statistics on full val dataset
536 | if self.enable_boundingbox:
537 | # mAP score and mean distance
538 | map, mean_distance = compute_map_score_and_mean_distance(self.val_gt, detections)
539 | summary = tf.Summary()
540 | summary_value = summary.value.add()
541 | summary_value.simple_value = map
542 | summary_value.tag = "validation_average_precision"
543 | self.writer.add_summary(summary, global_step=self.epoch)
544 |
545 | summary = tf.Summary()
546 | summary_value = summary.value.add()
547 | summary_value.simple_value = mean_distance
548 | summary_value.tag = "validation_mean_distance"
549 | self.writer.add_summary(summary, global_step=self.epoch)
550 |
551 | # target mAP score
552 | summary = tf.Summary()
553 | summary_value = summary.value.add()
554 | summary_value.simple_value, _ = compute_map_score_and_mean_distance(self.val_gt, target_detections)
555 | summary_value.tag = "target_average_precision"
556 | self.writer.add_summary(summary, global_step=self.epoch)
557 |
558 | self.writer.flush()
559 |
560 | def on_train_end(self, _):
561 | self.writer.close()
562 |
563 |
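# Saves the underlying template model at the end of each epoch; useful when training with the
# multi-GPU wrapper (-p), where saving the wrapped parallel model directly is usually not desired.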
564 | class ParallelSaveCallback(Callback):
565 |
566 | def __init__(self, model, file):
567 | self.model_to_save = model
568 | self.file_path = file
569 |
570 | def on_epoch_end(self, epoch, logs=None):
571 | self.model_to_save.save(self.file_path + '_%d.h5' % epoch)
572 |
--------------------------------------------------------------------------------
/clean.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | echo "Removing models from /sharedfiles"
3 | rm /sharedfiles/models/*
4 |
5 | echo "Removing datasets from /sharedfiles"
6 | rm /sharedfiles/datasets/*
7 |
8 | read -p "Remove TF logs? [y/n]" -n 1 -r
9 | echo # (optional) move to a new line
10 | if [[ $REPLY =~ ^[Yy]$ ]]
11 | then
12 | echo "Removing TF logs"
13 | rm -r ./Graph/*
14 | fi
15 |
--------------------------------------------------------------------------------
/cntk-py27.yml:
--------------------------------------------------------------------------------
1 | name: cntk-py27
2 | dependencies:
3 | - pip=8.1.2=py27_0
4 | - python=2.7.11=5
5 | - opencv=3.1.0
6 | - pip:
7 | - lxml==4.2.0
8 | - keras==2.1.5
9 | - pytesseract
10 |
--------------------------------------------------------------------------------
/cntk-py35.yml:
--------------------------------------------------------------------------------
1 | name: cntk-py35
2 | dependencies:
3 | - pip=8.1.2=py35_0
4 | - python=3.5.2=0
5 | - opencv=3.1.0
6 | - pip:
7 | - lxml==4.2.0
8 | - keras==2.1.5
9 | - pytesseract
10 |
--------------------------------------------------------------------------------
/datasets/cls_dogs_vs_cats.py:
--------------------------------------------------------------------------------
1 | from keras.preprocessing.image import ImageDataGenerator
2 |
3 | class Dataset:
4 |
5 | def __init__(self, batch_size=3, input_dim=150, **kwargs):
6 | local_keys = locals()
7 | self.enable_classification = True
8 | self.enable_boundingbox = False
9 | self.enable_segmentation = False
10 |
11 | #classes
12 | self.classes = [ 'dogs', 'cats' ]
13 | self.num_classes = len(self.classes)
14 | print("Nb classes: " + str(self.num_classes))
15 |
16 | self.img_h = input_dim
17 | self.img_w = input_dim
18 | self.input_shape = ( self.img_h, self.img_w , 3)
19 | # self.stride_margin = True
20 | train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
21 | val_datagen = ImageDataGenerator(rescale=1./255)
22 | test_datagen = ImageDataGenerator(rescale=1./255)
23 |
24 | self.train = train_datagen.flow_from_directory('/sharedfiles/dogs_vs_cats/train',
25 | target_size=(self.img_w, self.img_h),
26 | batch_size=batch_size,
27 | class_mode='categorical', classes=self.classes)
28 |
29 | self.val = test_datagen.flow_from_directory('/sharedfiles/dogs_vs_cats/validation',
30 | target_size=(self.img_w, self.img_h),
31 | batch_size=batch_size,
32 | class_mode='categorical', classes=self.classes)
33 |
34 | self.test = test_datagen.flow_from_directory('/sharedfiles/dogs_vs_cats/validation',
35 | target_size=(self.img_w, self.img_h),
36 | batch_size=batch_size,
37 | class_mode='categorical', classes=self.classes)
38 |
39 | # for compatibility
40 | self.gt_test = []
41 | self.stride_margin = 0
42 |
--------------------------------------------------------------------------------
/datasets/cls_dogs_vs_cats.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | SOURCE_DIR=$1
3 | mkdir -p /sharedfiles/dogs_vs_cats/train/dogs
4 | mkdir -p /sharedfiles/dogs_vs_cats/train/cats
5 | mkdir -p /sharedfiles/dogs_vs_cats/validation/dogs
6 | mkdir -p /sharedfiles/dogs_vs_cats/validation/cats
7 |
8 | for i in $(seq 0 9999) ; do cp $SOURCE_DIR/dog.$i.jpg /sharedfiles/dogs_vs_cats/train/dogs/ ; done
9 | for i in $(seq 0 9999) ; do cp $SOURCE_DIR/cat.$i.jpg /sharedfiles/dogs_vs_cats/train/cats/ ; done
10 | for i in $(seq 10000 12499) ; do cp $SOURCE_DIR/cat.$i.jpg /sharedfiles/dogs_vs_cats/validation/cats/ ; done
11 | for i in $(seq 10000 12499) ; do cp $SOURCE_DIR/dog.$i.jpg /sharedfiles/dogs_vs_cats/validation/dogs/ ; done
12 |
--------------------------------------------------------------------------------
/datasets/cls_rvl_cdip.py:
--------------------------------------------------------------------------------
1 | from keras.preprocessing.image import ImageDataGenerator
2 |
3 | class Dataset:
4 |
5 | def __init__(self, batch_size=3, input_dim=150, **kwargs):
6 | local_keys = locals()
7 | self.enable_classification = True
8 | self.enable_boundingbox = False
9 | self.enable_segmentation = False
10 |
11 | #classes
12 | self.classes = ["letter","form", "email", "handwritten", "advertisement", \
13 | "scientific_report", "scientific_publication", "specification", \
14 | "file_folder", "news_article", "budget", "invoice", \
15 | "presentation", "questionnaire", "resume", "memo" ]
16 | self.num_classes = len(self.classes)
17 | print("Nb classes: " + str(self.num_classes))
18 |
19 | self.img_h = input_dim
20 | self.img_w = input_dim
21 | self.input_shape = ( self.img_h, self.img_w , 3)
22 | # self.stride_margin = True
23 | train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
24 | val_datagen = ImageDataGenerator(rescale=1./255)
25 | test_datagen = ImageDataGenerator(rescale=1./255)
26 |
27 | self.train = train_datagen.flow_from_directory('/sharedfiles/rvl_cdip/train',
28 | target_size=(self.img_w, self.img_h),
29 | batch_size=batch_size,
30 | class_mode='categorical', classes=self.classes)
31 |
32 | self.val = test_datagen.flow_from_directory('/sharedfiles/rvl_cdip/val',
33 | target_size=(self.img_w, self.img_h),
34 | batch_size=batch_size,
35 | class_mode='categorical', classes=self.classes)
36 |
37 | self.test = test_datagen.flow_from_directory('/sharedfiles/rvl_cdip/test',
38 | target_size=(self.img_w, self.img_h),
39 | batch_size=batch_size,
40 | class_mode='categorical', classes=self.classes)
41 | # for compatibility
42 | self.gt_test = []
43 | self.stride_margin = 0
44 |
--------------------------------------------------------------------------------
/datasets/cls_rvl_cdip_check.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | ls /sharedfiles/rvl_cdip/train/ | wc -l
3 |
4 | for split in train test val
5 | do
6 | for d in /sharedfiles/rvl_cdip/$split/*
7 | do
8 | echo $d
9 | ls $d/ | wc -l
10 | done
11 | done
12 | echo "Train"
13 | find /sharedfiles/rvl_cdip/train/ -type f | wc -l
14 | echo "Val"
15 | find /sharedfiles/rvl_cdip/val/ -type f | wc -l
16 | echo "Test"
17 | find /sharedfiles/rvl_cdip/test/ -type f | wc -l
18 |
19 |
20 |
21 |
--------------------------------------------------------------------------------
/datasets/cls_rvl_cdip_convert.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | SOURCE_DIR=$1
3 |
4 | classes=(letter
5 | form
6 | email
7 | handwritten
8 | advertisement
9 | scientific\ report
10 | scientific\ publication
11 | specification
12 | file\ folder
13 | news\ article
14 | budget
15 | invoice
16 | presentation
17 | questionnaire
18 | resume
19 | memo)
20 |
21 | for LABELFILE in $SOURCE_DIR/labels/*
22 | do
23 | split=`basename ${LABELFILE%%.*}`
24 | echo "SPLIT: $split"
25 | TARGET=$SOURCE_DIR/$split
26 | mkdir -p $TARGET
27 |
28 | for c in "${classes[@]}"
29 | do
30 | mkdir $TARGET/${c// /_}
31 | done
32 |
33 | IFS=$'\n'
34 | for next in `cat $LABELFILE`
35 | do
36 | echo " $next"
37 | FILEPATH="$(cut -d' ' -f1 <<< $next)"
38 | LABEL="$(cut -d' ' -f2 <<< $next)"
39 | mv $SOURCE_DIR/images/$FILEPATH $TARGET/${classes[$LABEL]// /_}/$(basename $FILEPATH)
40 | done
41 | done
42 | exit 0
43 |
--------------------------------------------------------------------------------
/datasets/cls_tiny_imagenet.py:
--------------------------------------------------------------------------------
1 | from keras.preprocessing.image import ImageDataGenerator
2 |
3 | class Dataset:
4 |
5 | def __init__(self, batch_size=3, input_dim=150, **kwargs):
6 | local_keys = locals()
7 | self.enable_classification = True
8 | self.enable_boundingbox = False
9 | self.enable_segmentation = False
10 |
11 | #classes
12 | self.classes = [ 'n02124075', 'n04067472', 'n04540053', 'n04099969', 'n07749582', 'n01641577', 'n02802426', 'n09246464', 'n07920052', 'n03970156', 'n03891332', 'n02106662', 'n03201208', 'n02279972', 'n02132136', 'n04146614', 'n07873807', 'n02364673', 'n04507155', 'n03854065', 'n03838899', 'n03733131', 'n01443537', 'n07875152', 'n03544143', 'n09428293', 'n03085013', 'n02437312', 'n07614500', 'n03804744', 'n04265275', 'n02963159', 'n02486410', 'n01944390', 'n09256479', 'n02058221', 'n04275548', 'n02321529', 'n02769748', 'n02099712', 'n07695742', 'n02056570', 'n02281406', 'n01774750', 'n02509815', 'n03983396', 'n07753592', 'n04254777', 'n02233338', 'n04008634', 'n02823428', 'n02236044', 'n03393912', 'n07583066', 'n04074963', 'n01629819', 'n09332890', 'n02481823', 'n03902125', 'n03404251', 'n09193705', 'n03637318', 'n04456115', 'n02666196', 'n03796401', 'n02795169', 'n02123045', 'n01855672', 'n01882714', 'n02917067', 'n02988304', 'n04398044', 'n02843684', 'n02423022', 'n02669723', 'n04465501', 'n02165456', 'n03770439', 'n02099601', 'n04486054', 'n02950826', 'n03814639', 'n04259630', 'n03424325', 'n02948072', 'n03179701', 'n03400231', 'n02206856', 'n03160309', 'n01984695', 'n03977966', 'n03584254', 'n04023962', 'n02814860', 'n01910747', 'n04596742', 'n03992509', 'n04133789', 'n03937543', 'n02927161', 'n01945685', 'n02395406', 'n02125311', 'n03126707', 'n04532106', 'n02268443', 'n02977058', 'n07734744', 'n03599486', 'n04562935', 'n03014705', 'n04251144', 'n04356056', 'n02190166', 'n03670208', 'n02002724', 'n02074367', 'n04285008', 'n04560804', 'n04366367', 'n02403003', 'n07615774', 'n04501370', 'n03026506', 'n02906734', 'n01770393', 'n04597913', 'n03930313', 'n04118538', 'n04179913', 'n04311004', 'n02123394', 'n04070727', 'n02793495', 'n02730930', 'n02094433', 'n04371430', 'n04328186', 'n03649909', 'n04417672', 'n03388043', 'n01774384', 'n02837789', 'n07579787', 'n04399382', 'n02791270', 'n03089624', 'n02814533', 'n04149813', 'n07747607', 'n03355925', 'n01983481', 'n04487081', 'n03250847', 'n03255030', 'n02892201', 'n02883205', 'n03100240', 'n02415577', 'n02480495', 'n01698640', 'n01784675', 'n04376876', 'n03444034', 'n01917289', 'n01950731', 'n03042490', 'n07711569', 'n04532670', 'n03763968', 'n07768694', 'n02999410', 'n03617480', 'n06596364', 'n01768244', 'n02410509', 'n03976657', 'n01742172', 'n03980874', 'n02808440', 'n02226429', 'n02231487', 'n02085620', 'n01644900', 'n02129165', 'n02699494', 'n03837869', 'n02815834', 'n07720875', 'n02788148', 'n02909870', 'n03706229', 'n07871810', 'n03447447', 'n02113799', 'n12267677', 'n03662601', 'n02841315', 'n07715103', 'n02504458' ]
13 | self.num_classes = len(self.classes)
14 | print("Nb classes: " + str(self.num_classes))
15 |
16 | self.img_h = input_dim
17 | self.img_w = input_dim
18 | self.input_shape = ( self.img_h, self.img_w , 3)
19 | # self.stride_margin = True
20 | train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
21 | val_datagen = ImageDataGenerator(rescale=1./255)
22 | test_datagen = ImageDataGenerator(rescale=1./255)
23 |
24 | self.train = train_datagen.flow_from_directory('/sharedfiles/tiny-imagenet-200/train',
25 | target_size=(self.img_w, self.img_h),
26 | batch_size=batch_size,
27 | class_mode='categorical', classes=self.classes)
28 |
29 |         self.val = val_datagen.flow_from_directory('/sharedfiles/tiny-imagenet-200/val',
30 | target_size=(self.img_w, self.img_h),
31 | batch_size=batch_size,
32 | class_mode='categorical', classes=self.classes)
33 |
34 |         self.test = test_datagen.flow_from_directory('/sharedfiles/tiny-imagenet-200/val',  # no held-out test split: validation images are reused
35 | target_size=(self.img_w, self.img_h),
36 | batch_size=batch_size,
37 | class_mode='categorical', classes=self.classes)
38 |
39 | # for compatibility
40 | self.gt_test = []
41 | self.stride_margin = 0
42 |
--------------------------------------------------------------------------------
/datasets/cls_tiny_imagenet_class_list.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | for next in `cat /sharedfiles/tiny-imagenet-200/wnids.txt`
3 | do
4 |   if [ -z "$l" ]; then l="'$next'"; else l="$l, '$next'"; fi  # avoid a leading comma on the first entry
5 | echo " $next"
6 | done
7 | echo $l
8 |
9 | exit 0
10 |
--------------------------------------------------------------------------------
/datasets/cls_tiny_imagenet_convert.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | SOURCE_DIR=$1
3 |
4 | # create class directories for validation dataset
5 | for c in $SOURCE_DIR/train/*
6 | do
7 | CLASS=`basename $c`
8 | echo $CLASS
9 | mkdir $SOURCE_DIR/val/$CLASS
10 | done
11 |
12 | # copy validation images into their class directories, as expected by Keras' flow_from_directory
13 | IFS=$'\n'
14 | for next in `cat $SOURCE_DIR/val/val_annotations.txt`
15 | do
16 | next=`echo $next | tr -s ' '`
17 | echo " $next"
18 | FILEPATH="$(cut -d' ' -f1 <<< $next)"
19 | LABEL="$(cut -d' ' -f2 <<< $next)"
20 | cp $SOURCE_DIR/val/images/$FILEPATH $SOURCE_DIR/val/$LABEL/$FILEPATH
21 | done
22 |
23 | exit 0
24 |
--------------------------------------------------------------------------------
/datasets/document.conf:
--------------------------------------------------------------------------------
1 | {
2 | "directory": "/sharedfiles/ocr_documents",
3 | "namespace": "ivalua.xml",
4 | "page_tag": "page",
5 | "char_tag": "char",
6 | "x1_attribute": "x1",
7 | "y1_attribute": "y1",
8 | "x2_attribute": "x2",
9 | "y2_attribute": "y2"
10 | }
11 |
--------------------------------------------------------------------------------
/datasets/ocr_documents.py:
--------------------------------------------------------------------------------
1 | import json
2 | import glob
3 | from lxml import etree
4 | import numpy as np
5 | # from scipy import misc
6 | import cv2
7 | from . import load_from_local_file, save_to_local_file, compute_grids, compute_grids_
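# helpers from datasets/__init__.py: load_from_local_file/save_to_local_file cache the prepared
# arrays locally, while compute_grids/compute_grids_ encode groundtruth boxes into per-scale target grids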
8 | import math, os
9 |
10 | class Dataset:
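    # Builds the OCR detection dataset in memory: for every page image, character boxes come
    # from the Tesseract XML written by ocr_documents_preprocess.py; a random crop containing
    # more than a handful of characters is kept and encoded into detection target grids.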
11 |
12 | def __init__(self, name = "", input_dim=700, resize="", layer_offsets = [14], layer_strides = [28], layer_fields=[28], iou_treshold = .3, save=True, **kwargs):
13 | local_keys = locals()
14 | self.enable_classification = False
15 | self.enable_boundingbox = True
16 | self.enable_segmentation = False
17 |
18 |         # classes: digits, lower- and upper-case letters, parentheses and the percent sign
19 | self.classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", \
20 | "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", \
21 | "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", \
22 | "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "(", ")", "%"]
23 | self.num_classes = len(self.classes)
24 | print("Nb classes: " + str(self.num_classes))
25 |
26 | self.img_h = input_dim
27 | self.img_w = int(.6 * input_dim)
28 | self.input_shape = ( self.img_h, self.img_w , 1)
29 | self.stride_margin = True
30 |
31 | if load_from_local_file(**local_keys):
32 | return
33 |
34 | with open('datasets/document.conf') as config_file:
35 | config = json.load(config_file)
36 |
37 | xml_all_files = glob.glob(config["directory"] + "/*.xml")
38 | # xml_all_files = xml_all_files[:100]
39 | num_files = len(xml_all_files)
40 | num_train = int(0.9 * num_files)
41 | print("{} files in OCR dataset, split into TRAIN 90% and VAL 10%".format(num_files))
42 | xml_train_files = xml_all_files[0:num_train]
43 | xml_test_files = xml_all_files[num_train:]
44 |
45 | def create_dataset(xml_files, name):
46 | nb_images = len(xml_files)
47 | print("{} files in {} dataset".format(nb_images, name))
48 |             groundtruth = []  # (image_index, y, x, height, width, class_index), kept for mAP computation
49 | tiles = np.ones( [nb_images, self.img_h , self.img_w, 1], dtype = 'float32')
50 | ns = {'d': config["namespace"]}
51 | i = 0
52 | for xml_file in xml_files:
53 | if i >= nb_images:
54 | break
55 | root = etree.parse( xml_file )
56 | print("{}/{} - {}".format(i, nb_images, xml_file))
57 |
58 | pages = root.findall(".//d:" + config["page_tag"], ns)
59 | for p, page in enumerate(pages):
60 | if i >= nb_images:
61 | break
62 |
63 | prefix = ""
64 | if len(pages) > 1:
65 | prefix = "-" + str(p)
66 | img_path = xml_file[:-4] + prefix + ".jpg"
67 | image = cv2.imread(img_path, 0)
68 |
69 | if (image is None) or (image.shape != (int(page.get("height")), int(page.get("width")))) :
70 | print("Read Error " + img_path)
71 | continue
72 |
73 | image = image / 255.
74 |
75 | f = 1.
76 | if resize != "":
77 |                     r1 = float(resize) / image.shape[0]  # float() keeps the ratio fractional under Python 2
78 |                     r2 = float(resize) / image.shape[1]
79 | f = min(r1, r2)
80 | image = cv2.resize(image, None, fx=f, fy=f, interpolation=cv2.INTER_NEAREST)
81 | print(image.shape)
82 |
83 |                     # find a good random crop: retry up to 100 times until the window contains more than 10 characters
84 | e = 0
85 | while e < 100:
86 | e = e +1
87 | if image.shape[1] > self.img_w:
88 | x_ = np.random.choice(image.shape[1] - self.img_w)
89 | w_ = self.img_w
90 | else:
91 | x_ = 0
92 | w_ = image.shape[1]
93 | if image.shape[0] > self.img_h:
94 | y_ = np.random.choice(image.shape[0] - self.img_h)
95 | h_ = self.img_h
96 | else:
97 | y_ = 0
98 | h_ = image.shape[0]
99 | chars = page.findall(".//d:" + config["char_tag"], ns)
100 | nb_chars = 0
101 | for c in chars:
102 | x1 = float(c.get(config["x1_attribute"])) * f - x_
103 | y1 = float(c.get(config["y1_attribute"])) * f - y_
104 | x2 = float(c.get(config["x2_attribute"])) * f - x_
105 | y2 = float(c.get(config["y2_attribute"])) * f - y_
106 | if (x1 > 0) and (x2 < w_) and (y1 > 0) and (y2 < h_) :
107 | nb_chars = nb_chars + 1
108 | if nb_chars > 10:
109 | break
110 |
111 | tiles[i, :h_, :w_, 0] = image[y_:y_+h_, x_:x_+w_]
112 |
113 | chars = page.findall(".//d:" + config["char_tag"], ns)
114 | for c in chars:
115 | x1 = float(c.get(config["x1_attribute"])) * f - x_
116 | y1 = float(c.get(config["y1_attribute"])) * f - y_
117 | x2 = float(c.get(config["x2_attribute"])) * f - x_
118 | y2 = float(c.get(config["y2_attribute"])) * f - y_
119 | if (x1 < 0) or (x2 > w_) or (y1 < 0) or (y2 > h_) or ( min(y2 - y1, x2 - x1) <= 0.0 ): # or ( max(x2 - x1, y2 - y1) < (layer_fields[0] - layer_strides[0]) / 2 )
120 | continue
121 | # discard too small chars
122 | # if max(x2 - x1, y2 - y1) < 7:
123 | # continue
124 | if (c.text in self.classes):
125 | groundtruth.append((i, y1, x1, y2 - y1, x2 - x1, self.classes.index(c.text)))
126 |
127 | i = i + 1
128 |
129 | grids = compute_grids_(0, nb_images, groundtruth, layer_offsets, layer_strides, layer_fields, self.input_shape, self.stride_margin, iou_treshold, self.num_classes)
130 | return tiles, grids, np.array(groundtruth)
131 |
132 | x_train, y_train, gt_train = create_dataset(xml_train_files, "TRAIN")
133 | x_test, y_test, gt_test = create_dataset(xml_test_files, "TEST")
134 |
135 | self.x_train = x_train
136 | self.y_train = y_train
137 | self.gt_train = gt_train
138 | self.x_test = x_test
139 | self.y_test = y_test
140 | self.gt_test = gt_test
141 |
142 | save_to_local_file(**local_keys)
143 |
--------------------------------------------------------------------------------
/datasets/ocr_documents_generator.py:
--------------------------------------------------------------------------------
1 | import json
2 | import glob
3 | from lxml import etree
4 | import numpy as np
5 | import cv2
6 | import math, os
7 | import random
8 | import gc
9 | from . import compute_grids, compute_grids_
10 | from keras.utils.data_utils import Sequence
11 |
12 | class FlowGenerator(Sequence):
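    # Keras Sequence that builds (image batch, target grids) pairs on the fly from the XML/JPEG
    # pairs, instead of materialising the whole dataset in memory as datasets/ocr_documents.py does.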
13 | def __init__(self, config, files, name, layer_offsets, layer_strides, layer_fields, resize, classes, target_size=(150, 150), batch_size=1, iou_treshold = .3, stride_margin= True): #rescale=1./255, shear_range=0.2, zoom_range=0.2, brightness=0.1, rotation=5.0, zoom=0.1
14 |
15 | self.config = config
16 | self.file_path_list = files
17 | self.name = name
18 | self.layer_offsets, self.layer_strides, self.layer_fields = layer_offsets, layer_strides, layer_fields
19 | self.classes = classes
20 | self.num_classes = len(classes)
21 | self.resize = resize
22 | self.img_h = target_size[0]
23 | self.img_w = target_size[1]
24 | self.batch_size = batch_size
25 | self.iou_treshold = iou_treshold
26 | self.stride_margin = stride_margin
27 | # self.brightness = brightness
28 | # self.rotation = rotation
29 | # self.zoom = zoom
30 |
31 |
32 | def __len__(self):
33 | return len(self.file_path_list) // self.batch_size
34 |
35 | def __getitem__(self, i):
36 | X = np.zeros((self.batch_size, self.img_h, self.img_w, 1), dtype='float32')
37 | groundtruth = [] # for mAP score computation and verification
38 |
39 | ns = {'d': self.config["namespace"]}
40 | for n, xml_file in enumerate(self.file_path_list[i*self.batch_size:(i+1)*self.batch_size]):
41 | # print(self.name, i, n, "/", self.batch_size, xml_file)
42 | root = etree.parse( xml_file )
43 | pages = root.findall(".//d:" + self.config["page_tag"], ns)
44 | # for p, page in enumerate(pages):
45 |             p = 0  # only the first page of each document is used
46 | page = pages[0]
47 | page_size = (int(page.get("height")), int(page.get("width")))
48 | prefix = ""
49 | if len(pages) > 1:
50 | prefix = "-" + str(p)
51 | img_path = xml_file[:-4] + prefix + ".jpg"
52 | image = cv2.imread(img_path, 0)
53 | if (image is None) or (image.shape != page_size) :
54 | print("Read Error " + img_path)
55 | continue
56 | image = image / 255.
57 |
58 | f = 1.
59 | if self.resize != "":
60 |                 r1 = float(self.resize) / image.shape[0]  # float() keeps the ratio fractional under Python 2
61 |                 r2 = float(self.resize) / image.shape[1]
62 | f = min(r1, r2)
63 | image = cv2.resize(image, None, fx=f, fy=f, interpolation=cv2.INTER_NEAREST)
64 |
65 |             # find a good random crop: retry up to 100 times until the window contains more than 10 characters
66 | e = 0
67 | while e < 100:
68 | e = e +1
69 | if image.shape[1] > self.img_w:
70 | x_ = np.random.choice(image.shape[1] - self.img_w)
71 | w_ = self.img_w
72 | else:
73 | x_ = 0
74 | w_ = image.shape[1]
75 | if image.shape[0] > self.img_h:
76 | y_ = np.random.choice(image.shape[0] - self.img_h)
77 | h_ = self.img_h
78 | else:
79 | y_ = 0
80 | h_ = image.shape[0]
81 | chars = page.findall(".//d:" + self.config["char_tag"], ns)
82 | nb_chars = 0
83 | for c in chars:
84 | x1 = float(c.get(self.config["x1_attribute"])) * f - x_
85 | y1 = float(c.get(self.config["y1_attribute"])) * f - y_
86 | x2 = float(c.get(self.config["x2_attribute"])) * f - x_
87 | y2 = float(c.get(self.config["y2_attribute"])) * f - y_
88 | if (x1 > 0) and (x2 < w_) and (y1 > 0) and (y2 < h_) :
89 | nb_chars = nb_chars + 1
90 | if nb_chars > 10:
91 | break
92 |
93 | X[n, :h_, :w_ , 0] = image[y_:y_+h_, x_:x_+w_]
94 |
95 | chars = page.findall(".//d:" + self.config["char_tag"], ns)
96 | # print(" Nb chars:", len(chars))
97 | for c in chars:
98 | x1 = int(float(c.get(self.config["x1_attribute"])) * f - x_)
99 | y1 = int(float(c.get(self.config["y1_attribute"])) * f - y_)
100 | x2 = int(float(c.get(self.config["x2_attribute"])) * f - x_)
101 | y2 = int(float(c.get(self.config["y2_attribute"])) * f - y_)
102 | if (x1 < 0) or (x2 > w_) or (y1 < 0) or (y2 > h_) or ( min(y2 - y1, x2 - x1) <= 0.0 ): # or ( max(x2 - x1, y2 - y1) < (self.layer_fields[0] - self.layer_strides[0]) / 2 ):
103 | continue
104 | # discard too small chars
105 | # if max(x2 - x1, y2 - y1) < 7:
106 | # continue
107 | if c.text in self.classes:
108 | groundtruth.append((i *self.batch_size + n, y1, x1, y2 - y1, x2 - x1, self.classes.index(c.text)))
109 |
110 | grids = compute_grids_(i *self.batch_size, self.batch_size, groundtruth, self.layer_offsets, self.layer_strides, self.layer_fields, (self.img_h, self.img_w), self.stride_margin, self.iou_treshold, self.num_classes)
111 |
112 | return X, grids #, np.array(groundtruth)
113 |
114 | def on_epoch_end(self):
115 | # Shuffle dataset for next epoch
116 | random.shuffle(self.file_path_list)
117 | # Fix memory leak (Keras bug)
118 | gc.collect()
119 |
120 |
121 |
122 | class Dataset:
123 |
124 | def __init__(self, name = "", batch_size=1, input_dim=1000, resize="", layer_offsets = [14], layer_strides = [28], layer_fields=[28], iou_treshold = .3, **kwargs):
125 | local_keys = locals()
126 | self.enable_classification = False
127 | self.enable_boundingbox = True
128 | self.enable_segmentation = False
129 |
130 |         # classes: digits, lower- and upper-case letters, parentheses and the percent sign
131 | self.classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", \
132 | "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", \
133 | "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", \
134 | "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "(", ")", "%"]
135 | self.num_classes = len(self.classes)
136 | print("Nb classes: " + str(self.num_classes))
137 |
138 | self.img_h = input_dim
139 | self.img_w = int(.7 * input_dim)
140 | self.input_shape = ( self.img_h, self.img_w , 1)
141 | self.stride_margin = True
142 |
143 | with open('datasets/document.conf') as config_file:
144 | config = json.load(config_file)
145 | xml_all_files = glob.glob(config["directory"] + "/*.xml")
146 | num_files = len(xml_all_files)
147 | num_train = int(0.9 * num_files)
148 | print("{} files in OCR dataset, split into TRAIN 90% and VAL 10%".format(num_files))
149 | xml_train_files = xml_all_files[0:num_train]
150 | xml_test_files = xml_all_files[num_train:]
151 |
152 | self.train = FlowGenerator(config, xml_train_files, "TRAIN", layer_offsets, layer_strides, layer_fields, resize, self.classes, target_size=(self.img_h, self.img_w),
153 | batch_size=batch_size, iou_treshold = iou_treshold, stride_margin= self.stride_margin) #, rescale=1./255, shear_range=0.2, zoom_range=0.2
154 | self.val = FlowGenerator(config, xml_test_files, "VAL", layer_offsets, layer_strides, layer_fields, resize, self.classes, target_size=(self.img_h, self.img_w),
155 | batch_size=batch_size, iou_treshold = iou_treshold, stride_margin= self.stride_margin) #, rescale=1./255
156 | self.test = FlowGenerator(config, xml_test_files, "TEST", layer_offsets, layer_strides, layer_fields, resize, self.classes, target_size=(self.img_h, self.img_w),
157 | batch_size=batch_size, iou_treshold = iou_treshold, stride_margin= self.stride_margin) #, rescale=1./255
158 |
159 | # for compatibility
160 | self.gt_test = []
161 | self.stride_margin = 0
162 |
--------------------------------------------------------------------------------
/datasets/ocr_documents_preprocess.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import pytesseract
3 | import json
4 | import glob
5 | from lxml import etree
6 | import time
7 | import lxml.builder
8 |
9 | with open('datasets/document.conf') as config_file:
10 | config = json.load(config_file)
11 |
12 | image_files = glob.glob(config["directory"] + "/*.jpg")
13 | for i, filename in enumerate(image_files):
14 | start = time.time()
15 | # read the image and get the dimensions
16 | img = cv2.imread(filename)
17 | h, w, _ = img.shape # assumes color image
18 |
19 | # run tesseract, returning the bounding boxes
20 | boxes = pytesseract.image_to_boxes(img) # also include any config options you use
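    # image_to_boxes returns one line per recognised character, "<char> x1 y1 x2 y2 page",
    # with coordinates measured from the bottom-left corner of the image, hence the h - y flips below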
21 |
22 | root = etree.Element("root", nsmap={None : config["namespace"]})
23 | p = etree.Element(config["page_tag"], height=str(h), width=str(w))
24 | root.append( p )
25 |
26 |
27 |     # convert each Tesseract box into an XML char element (the cv2.rectangle drawing below is left commented out)
28 | for b in boxes.splitlines():
29 | b = b.split(' ')
30 | # img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)
31 | c = etree.Element(config["char_tag"])
32 | c.attrib["x1"] = str( min(int(b[1]), int(b[3])) )
33 | c.attrib["y1"] = str( min( h - int(b[2]), h - int(b[4])) )
34 | c.attrib["x2"] = str( max(int(b[1]), int(b[3])) )
35 | c.attrib["y2"] = str( max( h - int(b[2]), h - int(b[4])) )
36 | c.text = b[0]
37 | p.append( c )
38 |
39 | print(filename[:-4] + ".xml", time.time() - start)
40 | etree.ElementTree(root).write(filename[:-4] + ".xml", pretty_print=True, xml_declaration=True, encoding="utf-8")
41 | # print(etree.tostring(root, pretty_print=True))
42 | # cv2.imwrite(str(i) + ".jpg", img)
43 |
--------------------------------------------------------------------------------
/datasets/ocr_documents_statistics.py:
--------------------------------------------------------------------------------
1 | import glob
2 | from lxml import etree
3 | import numpy as np
4 | import matplotlib.pyplot as plt
5 | import json
6 |
7 | with open('datasets/document.conf') as config_file:
8 | config = json.load(config_file)
9 |
10 | xml_files = glob.glob(config["directory"] + "/*.xml")
11 | num_files = len(xml_files)
12 | print("{} files in dataset".format(num_files))
13 | widths, heights = [], []
14 | ns = {'d': config["namespace"]}
15 | i = 0
16 | for xml_file in xml_files:
17 | print(xml_file)
18 | root = etree.parse(xml_file)
19 |
20 | page = root.find(".//d:" + config["page_tag"], ns)
21 | page_size = [page.get("height"), page.get("width")]
22 |
23 | chars = root.findall(".//d:" + config["char_tag"], ns)
24 | for c in chars:
25 |         widths.append( int(c.get(config["x2_attribute"])) - int(c.get(config["x1_attribute"])) )
26 |         heights.append( int(c.get(config["y2_attribute"])) - int(c.get(config["y1_attribute"])) )
27 |
28 | i = i + 1
29 |
30 | print(np.histogram(np.asarray(widths), bins='auto'))
31 | print(np.histogram(np.asarray(heights), bins='auto'))
32 | plt.hist(np.asarray(widths), bins='auto')
33 | plt.show()
34 | plt.hist(np.asarray(heights), bins='auto')
35 | plt.show()
36 |
--------------------------------------------------------------------------------
/datasets/ocr_mnist.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import keras
3 | from keras.datasets import mnist
4 | import numpy as np
5 | import os
6 | from skimage import transform
7 | from . import compute_grids, compute_grids_, compute_grids_local, load_from_local_file, save_to_local_file
8 | import math
9 |
10 | class Dataset:
11 |
12 | def __init__(self, name, layer_offsets = [14, 28], layer_strides = [28, 56], layer_fields=[28, 56],
13 | input_dim=700, resize="", white_prob = 0., bb_positive="iou-treshold" , iou_treshold = .3, save=True, noise=False, **kwargs):
14 | local_keys = locals()
15 | self.enable_classification = False
16 | self.enable_boundingbox = True
17 | self.enable_segmentation = False
18 |
19 | if resize == "":
20 | digit_dim = [["28"]]
21 | else:
22 | digit_dim = [r.split("-") for r in resize.split(",")]
23 |
24 |         assert len(layer_offsets) == len(digit_dim), "Number of layers in the network does not match the number of digit scales"
25 |
26 | self.img_h = int(input_dim)
27 | self.img_w = int(input_dim * .6)
28 | self.input_shape = ( self.img_h, self.img_w , 1)
29 |
30 | grid_dim = int(digit_dim[0][-1])
31 | nb_images_y = self.img_h // grid_dim
32 | nb_images_x = self.img_w // grid_dim
33 |
34 | self.num_classes = 10
35 | self.classes = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
36 | self.stride_margin= False
37 |
38 | if load_from_local_file(**local_keys):
39 | return
40 |
41 | (x_train, y_train), (x_test, y_test) = mnist.load_data()
42 |
43 | x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
44 | x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
45 |
46 | x_train = x_train.astype('float32')
47 | x_test = x_test.astype('float32')
48 | x_train /= 255
49 | x_test /= 255
50 |
51 | y_train = keras.utils.to_categorical(y_train, 10)
52 | y_test = keras.utils.to_categorical(y_test, 10)
53 |
54 | if noise:
55 | NUM_DISTORTIONS_DB = 100000
56 | num_distortions=80
57 | distortions = []
58 | dist_size = (9, 9)
59 | all_digits = x_train.reshape([-1, 28, 28])
60 | num_digits = all_digits.shape[0]
61 | for i in range(NUM_DISTORTIONS_DB):
62 | rand_digit = np.random.randint(num_digits)
63 | rand_x = np.random.randint(28-dist_size[1])
64 | rand_y = np.random.randint(28-dist_size[0])
65 |
66 | digit = all_digits[rand_digit]
67 | distortion = digit[rand_y:rand_y + dist_size[0],
68 | rand_x:rand_x + dist_size[1]]
69 | assert distortion.shape == dist_size
70 | #plt.imshow(distortion, cmap='gray')
71 | #plt.show()
72 | distortions += [distortion]
73 | print("Created distortions")
74 |
75 | def add_distortions(image):
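                # pastes num_distortions random 9x9 patches (cut from other digits) onto the tile
                # as background clutter, then clips the result back to the [0, 1] range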
76 | canvas = np.zeros_like(image)
77 | for i in range(num_distortions):
78 | rand_distortion = distortions[np.random.randint(NUM_DISTORTIONS_DB)]
79 | rand_x = np.random.randint(image.shape[1]-dist_size[1])
80 | rand_y = np.random.randint(image.shape[0]-dist_size[0])
81 | canvas[rand_y:rand_y+dist_size[0],
82 | rand_x:rand_x+dist_size[1], 0] = - rand_distortion
83 | canvas += image
84 | return np.clip(canvas, 0, 1)
85 |
86 |
87 | def create_tile(x, y):
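            # Scatters MNIST digits over large white canvases: each digit is resized to a random
            # scale for its layer, placed on a free grid cell, and recorded as a groundtruth box
            # (tile_index, y, x, height, width, class); targets are then encoded with compute_grids.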
88 | total_digits = x.shape[0]
89 | nb_images = int(total_digits / nb_images_x / nb_images_y / (1-white_prob) )
90 | tiles = np.ones( [nb_images, self.img_h , self.img_w, x.shape[3]], dtype = 'float32')
91 | occupations = np.zeros( [nb_images, self.img_h , self.img_w, 1], dtype = 'float32')
92 | groundtruth = [] # for mAP score computation and verification
93 |
94 | i = 0
95 | for tile in range(nb_images):
96 | for s in reversed(range(len(layer_offsets))):
97 | nb_samples = 0
98 | img_dim = digit_dim[s]
99 | anchor_dim = layer_fields[s]
100 | while nb_samples < (1. - white_prob) * nb_images_x * nb_images_y / len(digit_dim) * grid_dim / int(img_dim[-1]):
101 | # pick a random row, col on the scale grid
102 | row = np.random.choice(nb_images_y )
103 | col = np.random.choice(nb_images_x )
104 | if len(img_dim) > 1:
105 | dim = int(math.ceil(int(img_dim[0]) + np.random.rand() * (int(img_dim[1]) - int(img_dim[0])) ))
106 | else:
107 | dim = int(img_dim[0])
108 | xc = (col + .5) * grid_dim
109 | yc = (row + .5) * grid_dim
110 | x_ = int(xc - dim/2)
111 | y_ = int(yc - dim/2)
112 | x_range = slice(x_, x_ + dim)
113 | y_range = slice(y_, y_ + dim)
114 | if (x_ < 0) or (y_ < 0) or (x_ + dim > self.img_w) or (y_ + dim > self.img_h):
115 | continue
116 |
117 | # if position available add it
118 | if np.sum(occupations[ tile, y_range, x_range, 0 ]) == 0.:
119 | resized_x = transform.resize(x[i], (dim, dim), mode='constant')
120 | tiles[ tile, y_range, x_range, ...] = 1.0 - resized_x # change for white background
121 | groundtruth.append((tile, y_, x_, dim, dim, np.argmax(y[i])))
122 | occupations[ tile, y_range, x_range, ...] = 1.0
123 | i = (i + 1) % total_digits
124 | nb_samples = nb_samples + 1
125 |
126 | if noise:
127 | tiles[ tile ] = add_distortions(tiles[ tile ])
128 |
129 | import time
130 | now = time.time()
131 |             grids = compute_grids(0, nb_images, groundtruth, layer_offsets, layer_strides, layer_fields, self.input_shape, self.stride_margin, iou_treshold, self.num_classes, bb_positive=bb_positive)  # propagate the bb_positive setting chosen in __init__
132 | if False: # timing eval
133 | t1 = time.time() -now
134 | now = time.time()
135 | grids2 = compute_grids_local(0, nb_images, groundtruth, layer_offsets, layer_strides, layer_fields, self.input_shape, self.stride_margin, iou_treshold, self.num_classes)
136 | print("grids", t1)
137 | print("grids2", time.time() -now)
138 | print("is nan", np.isnan(grids2[0].min()))
139 | for s in range(len(grids)):
140 | for l in range(grids[s].shape[-1]):
141 | print(np.allclose(grids[s][...,l],grids2[s][...,l]))
142 | print(np.allclose(np.argmax(grids[0][...,:10], axis=3), np.argmax(grids2[0][...,:10], axis=3)))
143 |
144 | return tiles, grids, np.array(groundtruth)
145 |
146 | x_train, y_train, gt_train = create_tile(x_train, y_train)
147 | x_test, y_test, gt_test = create_tile(x_test, y_test)
148 |
149 | self.x_train = x_train
150 | self.y_train = y_train
151 | self.gt_train = gt_train
152 | self.x_test = x_test
153 | self.y_test = y_test
154 | self.gt_test = gt_test
155 |
156 | save_to_local_file(**local_keys)
157 |
158 | # from skimage.io import imsave
159 | # import os
160 | # if not os.path.exists("logs"): #args.logs
161 | # os.mkdir("logs")
162 | # image_id = 1
163 | # image = (x_train[image_id, :, :, 0] * 255.).astype(np.uint8)
164 | # imsave(os.path.join("logs", str(image_id) + '_input.png'), image)
165 |
--------------------------------------------------------------------------------
/images/Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/Object_detection_deep_learning_networks_for_Optical_Character_Recognition.pdf
--------------------------------------------------------------------------------
/images/res1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res1.png
--------------------------------------------------------------------------------
/images/res2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res2.png
--------------------------------------------------------------------------------
/images/res3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res3.png
--------------------------------------------------------------------------------
/images/res4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res4.png
--------------------------------------------------------------------------------
/images/res5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ivalua/object_detection_ocr/7faffd6c42bd2f22ac0e1449a44f44e1c4d71e52/images/res5.png
--------------------------------------------------------------------------------
/keras-tf-py27.yml:
--------------------------------------------------------------------------------
1 | name: keras-tf-py27
2 | dependencies:
3 | - pip=8.1.2=py27_0
4 | - python=2.7.11=5
5 | - h5py
6 | - hdf5
7 | - pip:
8 | - numpy==1.14.2
9 | - scikit-image==0.13.1
10 | - lxml==4.2.0
11 | - keras==2.1.5
12 | - opencv-python
13 | - "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp27-none-linux_x86_64.whl"
14 | - pytesseract
15 |
--------------------------------------------------------------------------------
/keras-tf-py35.yml:
--------------------------------------------------------------------------------
1 | name: keras-tf-py35
2 | dependencies:
3 | - pip=8.1.2=py35_0
4 | - python=3.5.2=0
5 | - h5py
6 | - hdf5
7 | - pip:
8 | - numpy==1.14.2
9 | - scikit-image==0.13.1
10 | - keras==2.1.5
11 | - lxml==4.2.0
12 | - opencv-python
13 | - "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp35-cp35m-linux_x86_64.whl"
14 | - pytesseract
15 |
--------------------------------------------------------------------------------
/models/CNN_C128_C256_M2_C256_C256_M2_C512_D_2.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
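        # per-scale output geometry (presumably: stride between output cells, offset of the first
        # cell centre, and receptive-field size, all in input pixels), read back by the datasets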
13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale]
14 | self.offsets = [14, 28]
15 | self.fields = [28, 56]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | # stage 1
22 | model = Sequential() # 28, 28, 1
23 | model.add(Conv2D(128, kernel_size=(3, 3), activation='relu',
24 | input_shape=input_shape, padding='valid')) # 28, 28, 1
25 | model.add(Conv2D(256, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1
27 | s1 = model(image_input)
28 |
29 | # stage 2
30 | s2 = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='valid')(s1)
31 | s2 = Conv2D(256, (3, 3), activation='relu', padding='valid')(s2)
32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2)
33 |
34 | # output 1
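        # four heads per scale: class softmax, a sigmoid objectness score, and two 2-d regression
        # outputs (tanh and sigmoid, presumably box-centre offset and box size)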
35 | f1 = Dropout(0.25)(s1)
36 | f1 = Conv2D(512, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale),
37 | padding="valid", activation='relu')(f1)
38 | f1 = Dropout(0.5)(f1)
39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1)))
40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
43 |
44 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4])
45 |
46 | # output 2
47 | f2 = Dropout(0.25)(s2)
48 | f2 = Conv2D(512, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale),
49 | padding="valid", activation='relu')(f2)
50 | f2 = Dropout(0.5)(f2)
51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2)))
52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
55 |
56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4])
57 |
58 | # print(model.summary())
59 | # if K._BACKEND=='tensorflow':
60 | # for layer in model.layers:
61 | # print(layer.get_output_at(0).get_shape().as_list())
62 | self.model = Model(image_input, [output1, output2])
63 | self.model.strides = self.strides
64 | self.model.offsets = self.offsets
65 | self.model.fields = self.fields
66 | return self.model
67 |
--------------------------------------------------------------------------------
/models/CNN_C128_C256_M2_C512_D.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [2 * self.stride_scale]
14 | self.offsets = [14]
15 | self.fields = [28]
16 |
17 | def build(self, input_shape, num_classes):
18 | image_input = Input(shape=input_shape, name='image_input')
19 |
20 | model = Sequential() # 28, 28, 1
21 | model.add(Conv2D(128, kernel_size=(3, 3), activation='relu',
22 | input_shape=input_shape, padding='valid')) # 28, 28, 1
23 | model.add(Conv2D(256, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
24 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1
25 | model.add(Dropout(0.25))
26 | model.add(Conv2D(512, kernel_size=(12,12), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu'))
27 | model.add(Dropout(0.5))
28 |
29 | features = model(image_input)
30 |
31 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
32 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
33 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
34 |
35 | output = concatenate([output1, output2, output3], name="output")
36 |
37 | # print(model.summary())
38 | # if K._BACKEND=='tensorflow':
39 | # for layer in model.layers:
40 | # print(layer.get_output_at(0).get_shape().as_list())
41 | self.model = Model(image_input, output)
42 | self.model.strides = self.strides
43 | self.model.offsets = self.offsets
44 | self.model.fields = self.fields
45 | return self.model
46 |
--------------------------------------------------------------------------------
/models/CNN_C32_C64_C128_C.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [self.stride_scale]
14 | self.offsets = [7]
15 | self.fields = [14]
16 |
17 | def build(self, input_shape, num_classes):
18 | image_input = Input(shape=input_shape, name='image_input')
19 |
20 | model = Sequential() # 28, 28, 1
21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
22 | input_shape=input_shape, padding='valid')) # 28, 28, 1
23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
24 | model.add(Dropout(0.25))
25 | model.add(Conv2D(128, kernel_size=(10,10), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu'))
26 | model.add(Dropout(0.5))
27 | model.add(Conv2D(128, kernel_size=(1,1), padding="valid", activation='relu'))
28 | model.add(Conv2D(64, kernel_size=(1,1), padding="valid", activation='relu'))
29 |
30 | features = model(image_input)
31 |
32 | output1 = Dense(num_classes, activation = "softmax")(features)
33 | output2 = Dense(1, activation = "sigmoid")(features)
34 | output3 = Dense(2, activation = "tanh")(features)
35 | output4 = Dense(2, activation = "sigmoid")(features)
36 |
37 | output = concatenate([output1, output2, output3, output4], name="output")
38 |
39 | # print(model.summary())
40 | # if K._BACKEND=='tensorflow':
41 | # for layer in model.layers:
42 | # print(layer.get_output_at(0).get_shape().as_list())
43 | self.model = Model(image_input, output)
44 | self.model.strides = self.strides
45 | self.model.offsets = self.offsets
46 | self.model.fields = self.fields
47 | return self.model
48 |
--------------------------------------------------------------------------------
/models/CNN_C32_C64_C128_C2.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [self.stride_scale]
14 | self.offsets = [7]
15 | self.fields = [14]
16 |
17 | def build(self, input_shape, num_classes):
18 | image_input = Input(shape=input_shape, name='image_input')
19 |
20 | model = Sequential() # 28, 28, 1
21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
22 | input_shape=input_shape, padding='valid')) # 28, 28, 1
23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
24 | model.add(Dropout(0.25))
25 | model.add(Conv2D(128, kernel_size=(10,10), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu'))
26 | model.add(Dropout(0.5))
27 | model.add(Conv2D(256, kernel_size=(1,1), padding="valid", activation='relu'))
28 | model.add(Conv2D(128, kernel_size=(1,1), padding="valid", activation='relu'))
29 |
30 | features = model(image_input)
31 |
32 | output1 = Dense(num_classes, activation = "softmax")(features)
33 | output2 = Dense(1, activation = "sigmoid")(features)
34 | output3 = Dense(2, activation = "tanh")(features)
35 | output4 = Dense(2, activation = "sigmoid")(features)
36 |
37 | output = concatenate([output1, output2, output3, output4], name="output")
38 |
39 | # print(model.summary())
40 | # if K._BACKEND=='tensorflow':
41 | # for layer in model.layers:
42 | # print(layer.get_output_at(0).get_shape().as_list())
43 | self.model = Model(image_input, output)
44 | self.model.strides = self.strides
45 | self.model.offsets = self.offsets
46 | self.model.fields = self.fields
47 | return self.model
48 |
--------------------------------------------------------------------------------
/models/CNN_C32_C64_C128_D.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [self.stride_scale]
14 | self.offsets = [7]
15 | self.fields = [14]
16 |
17 | def build(self, input_shape, num_classes):
18 | image_input = Input(shape=input_shape, name='image_input')
19 |
20 | model = Sequential() # 28, 28, 1
21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
22 | input_shape=input_shape, padding='valid')) # 28, 28, 1
23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
24 | model.add(Dropout(0.25))
25 | model.add(Conv2D(128, kernel_size=(10,10), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu'))
26 | model.add(Dropout(0.5))
27 |
28 | features = model(image_input)
29 |
30 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
31 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
32 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
33 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
34 |
35 | output = concatenate([output1, output2, output3, output4], name="output")
36 |
37 | # print(model.summary())
38 | # if K._BACKEND=='tensorflow':
39 | # for layer in model.layers:
40 | # print(layer.get_output_at(0).get_shape().as_list())
41 | self.model = Model(image_input, output)
42 | self.model.strides = self.strides
43 | self.model.offsets = self.offsets
44 | self.model.fields = self.fields
45 | return self.model
46 |
--------------------------------------------------------------------------------
/models/CNN_C32_C64_C64_Cd64_C128_D.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 6
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [self.stride_scale]
14 | self.offsets = [14]
15 | self.fields = [28]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | model = Sequential() # 28, 28, 1
22 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
23 | input_shape=input_shape, padding='valid')) # 28, 28, 1
24 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
25 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')) # 28, 28, 1
26 | model.add(Conv2D(64, (5, 5), activation='relu', dilation_rate=(2, 2), padding='valid')) # 28, 28, 1
27 | model.add(Dropout(0.25))
28 | model.add(Conv2D(128, kernel_size=(14,14), strides=(self.stride_scale,self.stride_scale),
29 | padding="valid", activation='relu'))
30 | model.add(Dropout(0.5))
31 |
32 | features = model(image_input)
33 |
34 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
35 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
36 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
37 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
38 |
39 | output = concatenate([output1, output2, output3, output4], name="output")
40 |
41 | # print(model.summary())
42 | # if K._BACKEND=='tensorflow':
43 | # for layer in model.layers:
44 | # print(layer.get_output_at(0).get_shape().as_list())
45 | self.model = Model(image_input, output)
46 | self.model.strides = self.strides
47 | self.model.offsets = self.offsets
48 | self.model.fields = self.fields
49 | return self.model
50 |
--------------------------------------------------------------------------------
/models/CNN_C32_C64_M2_C128_D.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [2 * self.stride_scale]
14 | self.offsets = [14]
15 | self.fields = [28]
16 |
17 | def build(self, input_shape, num_classes):
18 | image_input = Input(shape=input_shape, name='image_input')
19 |
20 | model = Sequential() # 28, 28, 1
21 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
22 | input_shape=input_shape, padding='valid')) # 28, 28, 1
23 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
24 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1
25 | model.add(Dropout(0.25))
26 | model.add(Conv2D(128, kernel_size=(12,12), strides=(self.stride_scale, self.stride_scale), padding="valid", activation='relu'))
27 | model.add(Dropout(0.5))
28 |
29 | features = model(image_input)
30 |
31 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
32 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
33 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
34 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
35 |
36 | output = concatenate([output1, output2, output3, output4], name="output")
37 |
38 | # print(model.summary())
39 | # if K._BACKEND=='tensorflow':
40 | # for layer in model.layers:
41 | # print(layer.get_output_at(0).get_shape().as_list())
42 | self.model = Model(image_input, output)
43 | self.model.strides = self.strides
44 | self.model.offsets = self.offsets
45 | self.model.fields = self.fields
46 | return self.model
47 |
--------------------------------------------------------------------------------
/models/CNN_C32_C64_M2_C64_C64_M2_C128_D.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [4 * self.stride_scale]
14 | self.offsets = [28]
15 | self.fields = [56]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | model = Sequential() # 28, 28, 1
22 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
23 | input_shape=input_shape, padding='valid')) # 28, 28, 1
24 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
25 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1
26 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')) # 28, 28, 1
27 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
28 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1
29 | model.add(Dropout(0.25))
30 | model.add(Conv2D(128, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale),
31 | padding="valid", activation='relu'))
32 | model.add(Dropout(0.5))
33 |
34 | features = model(image_input)
35 |
36 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
37 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
38 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
39 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
40 |
41 | output = concatenate([output1, output2, output3, output4], name="output")
42 |
43 | # print(model.summary())
44 | # if K._BACKEND=='tensorflow':
45 | # for layer in model.layers:
46 | # print(layer.get_output_at(0).get_shape().as_list())
47 | self.model = Model(image_input, output)
48 | self.model.strides = self.strides
49 | self.model.offsets = self.offsets
50 | self.model.fields = self.fields
51 | return self.model
52 |
--------------------------------------------------------------------------------
/models/CNN_C32_C64_M2_C64_C64_M2_C128_D_2.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale]
14 | self.offsets = [14, 28]
15 | self.fields = [28, 56]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | # stage 1
22 | model = Sequential() # 28, 28, 1
23 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
24 | input_shape=input_shape, padding='valid')) # 28, 28, 1
25 | model.add(Conv2D(64, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1
27 | s1 = model(image_input)
28 |
29 | # stage 2
30 | s2 = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')(s1)
31 | s2 = Conv2D(64, (3, 3), activation='relu', padding='valid')(s2)
32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2)
33 |
34 | # output 1
35 | f1 = Dropout(0.25)(s1)
36 | f1 = Conv2D(128, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale),
37 | padding="valid", activation='relu')(f1)
38 | f1 = Dropout(0.5)(f1)
39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
43 |
44 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4])
45 |
46 | # output 2
47 | f2 = Dropout(0.25)(s2)
48 | f2 = Conv2D(128, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale),
49 | padding="valid", activation='relu')(f2)
50 | f2 = Dropout(0.5)(f2)
51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
55 |
56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4])
57 |
58 | # print(model.summary())
59 | # if K._BACKEND=='tensorflow':
60 | # for layer in model.layers:
61 | # print(layer.get_output_at(0).get_shape().as_list())
62 | self.model = Model(image_input, [output1, output2])
63 | self.model.strides = self.strides
64 | self.model.offsets = self.offsets
65 | self.model.fields = self.fields
66 | return self.model
67 |
--------------------------------------------------------------------------------
/models/CNN_C32_Cd64_C64_Cd64_C128_D.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 6
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [self.stride_scale]
14 | self.offsets = [14]
15 | self.fields = [28]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | model = Sequential() # 28, 28, 1
22 | model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
23 | input_shape=input_shape, padding='valid')) # 28, 28, 1
24 | model.add(Conv2D(64, (3, 3), dilation_rate=(2, 2), activation='relu', padding='valid')) # 28, 28, 1
25 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')) # 28, 28, 1
26 | model.add(Conv2D(64, (4, 4), dilation_rate=(2, 2), activation='relu', padding='valid')) # 28, 28, 1
27 | model.add(Dropout(0.25))
28 | model.add(Conv2D(128, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale),
29 | padding="valid", activation='relu'))
30 | model.add(Dropout(0.5))
31 |
32 | features = model(image_input)
33 |
34 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
35 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
36 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
37 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
38 |
39 | output = concatenate([output1, output2, output3, output4], name="output")
40 |
41 | # print(model.summary())
42 | # if K._BACKEND=='tensorflow':
43 | # for layer in model.layers:
44 | # print(layer.get_output_at(0).get_shape().as_list())
45 | self.model = Model(image_input, output)
46 | self.model.strides = self.strides
47 | self.model.offsets = self.offsets
48 | self.model.fields = self.fields
49 | return self.model
50 |
--------------------------------------------------------------------------------
/models/CNN_C64_C128_M2_C128_C128_M2_C256_D_2.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale]
14 | self.offsets = [14, 28]
15 | self.fields = [28, 56]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | # stage 1
22 | model = Sequential() # 28, 28, 1
23 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu',
24 | input_shape=input_shape, padding='valid')) # 28, 28, 1
25 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 28, 28, 1
26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 14, 14, 1
27 | s1 = model(image_input)
28 |
29 | # stage 2
30 | s2 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s1)
31 | s2 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s2)
32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2)
33 |
34 | # output 1
35 | f1 = Dropout(0.25)(s1)
36 | f1 = Conv2D(256, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale),
37 | padding="valid", activation='relu')(f1)
38 | f1 = Dropout(0.5)(f1)
39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1)))
40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
43 |
44 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4])
45 |
46 | # output 2
47 | f2 = Dropout(0.25)(s2)
48 | f2 = Conv2D(256, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale),
49 | padding="valid", activation='relu')(f2)
50 | f2 = Dropout(0.5)(f2)
51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2)))
52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
55 |
56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4])
57 |
58 | # print(model.summary())
59 | # if K._BACKEND=='tensorflow':
60 | # for layer in model.layers:
61 | # print(layer.get_output_at(0).get_shape().as_list())
62 | self.model = Model(image_input, [output1, output2])
63 | self.model.strides = self.strides
64 | self.model.offsets = self.offsets
65 | self.model.fields = self.fields
66 | return self.model
67 |
--------------------------------------------------------------------------------
/models/CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [2 * self.stride_scale, 4 * self.stride_scale]
14 | self.offsets = [7, 20]
15 | self.fields = [14, 40]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | # stage 1
22 | model = Sequential() # shapes below traced for a 28 x 28 x 1 patch
23 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu',
24 | input_shape=input_shape, padding='valid')) # 26 x 26 x 64 (valid padding)
25 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 24 x 24 x 128
26 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 12 x 12 x 128
27 | s1 = model(image_input)
28 |
29 | # stage 2
30 | s2 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s1)
31 | s2 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s2)
32 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2)
33 |
34 | # output 1
35 | f1 = Dropout(0.25)(s1)
36 | f1 = Conv2D(256, kernel_size=(5,5), strides=(self.stride_scale,self.stride_scale),
37 | padding="valid", activation='relu')(f1)
38 | f1 = Dropout(0.5)(f1)
39 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1)))
40 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
41 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
42 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
43 |
44 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4])
45 |
46 | # output 2
47 | f2 = Dropout(0.25)(s2)
48 | f2 = Conv2D(256, kernel_size=(7,7), strides=(self.stride_scale,self.stride_scale),
49 | padding="valid", activation='relu')(f2)
50 | f2 = Dropout(0.5)(f2)
51 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2)))
52 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
53 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
54 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
55 |
56 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4])
57 |
58 | # print(model.summary())
59 | # if K._BACKEND=='tensorflow':
60 | # for layer in model.layers:
61 | # print(layer.get_output_at(0).get_shape().as_list())
62 | self.model = Model(image_input, [output1, output2])
63 | self.model.strides = self.strides
64 | self.model.offsets = self.offsets
65 | self.model.fields = self.fields
66 | return self.model
67 |
--------------------------------------------------------------------------------
/models/CNN_C64_C128_M2_C128_C128_M2_C256_D_3.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [self.stride_scale, 2 * self.stride_scale, 4 * self.stride_scale]
14 | self.offsets = [7, 14, 28]
15 | self.fields = [14, 28, 56]
16 |
17 |
18 | def build(self, input_shape, num_classes):
19 | image_input = Input(shape=input_shape, name='image_input')
20 |
21 | # stage 1
22 | model = Sequential() # shapes below traced for a 28 x 28 x 1 patch
23 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu',
24 | input_shape=input_shape, padding='valid')) # 26 x 26 x 64 (valid padding)
25 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 24 x 24 x 128
26 | s1 = model(image_input)
27 |
28 | # stage 2
29 | s2 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s1)
30 | s2 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s2)
31 | s2 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s2)
32 |
33 | # stage 3
34 | s3 = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='valid')(s2) # stack on stage 2 so this scale reaches the 4 * stride_scale stride declared above
35 | s3 = Conv2D(128, (3, 3), activation='relu', padding='valid')(s3)
36 | s3 = MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")(s3)
37 |
38 | # output 1
39 | f1 = Dropout(0.25)(s1)
40 | f1 = Conv2D(256, kernel_size=(12,12), strides=(self.stride_scale,self.stride_scale),
41 | padding="valid", activation='relu')(f1)
42 | f1 = Dropout(0.5)(f1)
43 | output1_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f1)))
44 | output1_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
45 | output1_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
46 | output1_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f1)))
47 |
48 | output1 = concatenate([output1_1, output1_2, output1_3, output1_4])
49 |
50 | # output 2
51 | f2 = Dropout(0.25)(s2)
52 | f2 = Conv2D(256, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale),
53 | padding="valid", activation='relu')(f2)
54 | f2 = Dropout(0.5)(f2)
55 | output2_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f2)))
56 | output2_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
57 | output2_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
58 | output2_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f2)))
59 |
60 | output2 = concatenate([output2_1, output2_2, output2_3, output2_4])
61 |
62 | # output 3
63 | f3 = Dropout(0.25)(s3)
64 | f3 = Conv2D(256, kernel_size=(11,11), strides=(self.stride_scale,self.stride_scale),
65 | padding="valid", activation='relu')(f3)
66 | f3 = Dropout(0.5)(f3)
67 | output3_1 = Dense(num_classes, activation = "softmax")(Dense(128, activation = "relu")(Dense(256, activation = "relu")(f3)))
68 | output3_2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f3)))
69 | output3_3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f3)))
70 | output3_4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(f3)))
71 |
72 | output3 = concatenate([output3_1, output3_2, output3_3, output3_4])
73 |
74 | # print(model.summary())
75 | # if K._BACKEND=='tensorflow':
76 | # for layer in model.layers:
77 | # print(layer.get_output_at(0).get_shape().as_list())
78 | self.model = Model(image_input, [output1, output2, output3])
79 | self.model.strides = self.strides
80 | self.model.offsets = self.offsets
81 | self.model.fields = self.fields
82 | return self.model
83 |
--------------------------------------------------------------------------------
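In the multi-scale models, `strides`, `offsets` and `fields` describe the geometry of each prediction layer: successive scales double the stride, the offset and the receptive field (e.g. `[stride_scale, 2 * stride_scale, 4 * stride_scale]`, offsets `[7, 14, 28]`, fields `[14, 28, 56]` above). A plausible reading, used here only for illustration, is that grid index `j` of scale `k` is centered at pixel `offsets[k] + j * strides[k]` and looks at a `fields[k] x fields[k]` window; the authoritative encoding lives in the dataset and TensorBoard callback code.

```python
# Illustrative only: map a grid index back to image coordinates under the
# assumed convention center = offset + index * stride (check datasets/ for the
# authoritative encoding). Defaults correspond to stride_scale = 14.
def grid_to_pixel(index, scale, strides=(14, 28, 56), offsets=(7, 14, 28)):
    return offsets[scale] + index * strides[scale]

print(grid_to_pixel(index=3, scale=1))  # -> 98 with the example geometry above
```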
/models/CNN_C64_C128_M2_C256_D.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | if stride_scale == 0:
9 | self.stride_scale = 14
10 | else:
11 | self.stride_scale = stride_scale
12 |
13 | self.strides = [2 * self.stride_scale]
14 | self.offsets = [14]
15 | self.fields = [28]
16 |
17 | def build(self, input_shape, num_classes):
18 | image_input = Input(shape=input_shape, name='image_input')
19 |
20 | model = Sequential() # shapes below traced for a 28 x 28 x 1 patch
21 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu',
22 | input_shape=input_shape, padding='valid')) # 26 x 26 x 64 (valid padding)
23 | model.add(Conv2D(128, (3, 3), activation='relu', padding='valid')) # 24 x 24 x 128
24 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid")) # 12 x 12 x 128
25 | model.add(Dropout(0.25))
26 | model.add(Conv2D(256, kernel_size=(12,12), strides=(self.stride_scale, self.stride_scale),
27 | padding="valid", activation='relu'))
28 | model.add(Dropout(0.5))
29 |
30 | features = model(image_input)
31 |
32 | output1 = Dense(num_classes, activation = "softmax")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
33 | output2 = Dense(1, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
34 | output3 = Dense(2, activation = "tanh")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
35 | output4 = Dense(2, activation = "sigmoid")(Dense(64, activation = "relu")(Dense(128, activation = "relu")(features)))
36 |
37 | output = concatenate([output1, output2, output3, output4], name="output")
38 |
39 | # print(model.summary())
40 | # if K._BACKEND=='tensorflow':
41 | # for layer in model.layers:
42 | # print(layer.get_output_at(0).get_shape().as_list())
43 | self.model = Model(image_input, output)
44 | self.model.strides = self.strides
45 | self.model.offsets = self.offsets
46 | self.model.fields = self.fields
47 | return self.model
48 |
--------------------------------------------------------------------------------
/models/VGG16_AVG.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate, GlobalAveragePooling2D
4 | from keras import applications
5 |
6 | class Network:
7 |
8 | def __init__(self, stride_scale = 0):
9 | # if stride_scale == 0:
10 | # self.stride_scale = 14
11 | # else:
12 | # self.stride_scale = stride_scale
13 | #
14 | # self.strides = [2 * self.stride_scale]
15 | self.strides = [32]
16 | self.offsets = [16]
17 | self.fields = [150]
18 |
19 | def build(self, input_shape, num_classes):
20 |
21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
22 | model = Sequential()
23 | for l in vgg.layers:
24 | #l.trainable = False
25 | model.add(l)
26 |
27 | model.add(Conv2D(num_classes, (3, 3)))
28 | model.add(GlobalAveragePooling2D())
29 | model.add(Activation('softmax'))
30 |
31 | print(model.summary())
32 |
33 | image_input = Input(shape=input_shape, name='image_input')
34 |
35 | self.model = Model(image_input, model(image_input))
36 | self.model.strides = self.strides
37 | self.model.offsets = self.offsets
38 | self.model.fields = self.fields
39 | return self.model
40 |
--------------------------------------------------------------------------------
/models/VGG16_AVG_r.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate, GlobalAveragePooling2D
4 | from keras import applications
5 |
6 | class Network:
7 |
8 | def __init__(self, stride_scale = 0):
9 | # if stride_scale == 0:
10 | # self.stride_scale = 14
11 | # else:
12 | # self.stride_scale = stride_scale
13 | #
14 | # self.strides = [2 * self.stride_scale]
15 | self.strides = [32]
16 | self.offsets = [16]
17 | self.fields = [150]
18 |
19 | def build(self, input_shape, num_classes):
20 |
21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
22 | model = Sequential()
23 | for l in vgg.layers:
24 | #l.trainable = False
25 | model.add(l)
26 |
27 | model.add(Conv2D(num_classes, (1, 1)))
28 | model.add(GlobalAveragePooling2D())
29 | model.add(Activation('softmax'))
30 |
31 | print(model.summary())
32 |
33 | image_input = Input(shape=input_shape, name='image_input')
34 |
35 | self.model = Model(image_input, model(image_input))
36 | self.model.strides = self.strides
37 | self.model.offsets = self.offsets
38 | self.model.fields = self.fields
39 | return self.model
40 |
--------------------------------------------------------------------------------
/models/VGG16_C4096_C4096_AVG.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate, GlobalAveragePooling2D
4 | from keras import applications
5 |
6 | class Network:
7 |
8 | def __init__(self, stride_scale = 0):
9 | # if stride_scale == 0:
10 | # self.stride_scale = 14
11 | # else:
12 | # self.stride_scale = stride_scale
13 | #
14 | # self.strides = [2 * self.stride_scale]
15 | self.strides = [32]
16 | self.offsets = [16]
17 | self.fields = [150]
18 |
19 | def build(self, input_shape, num_classes):
20 |
21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
22 | model = Sequential()
23 | for l in vgg.layers:
24 | #l.trainable = False
25 | model.add(l)
26 |
27 | model.add(Conv2D(4096, (3, 3), activation='relu'))
28 | model.add(Dropout(0.5))
29 | model.add(Conv2D(4096, (3, 3), activation='relu'))
30 | model.add(Dropout(0.5))
31 | model.add(Conv2D(num_classes, (3, 3)))
32 | model.add(GlobalAveragePooling2D())
33 | model.add(Activation('softmax'))
34 |
35 | print(model.summary())
36 |
37 | image_input = Input(shape=input_shape, name='image_input')
38 |
39 | self.model = Model(image_input, model(image_input))
40 | self.model.strides = self.strides
41 | self.model.offsets = self.offsets
42 | self.model.fields = self.fields
43 | return self.model
44 |
--------------------------------------------------------------------------------
/models/VGG16_D256.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 | from keras import applications
5 |
6 | class Network:
7 |
8 | def __init__(self, stride_scale = 0):
9 | # if stride_scale == 0:
10 | # self.stride_scale = 14
11 | # else:
12 | # self.stride_scale = stride_scale
13 | #
14 | # self.strides = [2 * self.stride_scale]
15 | self.strides = [32]
16 | self.offsets = [ ( 5 - 1) / 2.0 ]
17 | self.fields = [150]
18 |
19 | def build(self, input_shape, num_classes):
20 |
21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
22 | model = Sequential()
23 | for l in vgg.layers:
24 | l.trainable = False
25 | model.add(l)
26 |
27 | model.add(Flatten())
28 | model.add(Dense(256, activation='relu'))
29 | model.add(Dropout(0.5))
30 | model.add(Dense(num_classes, activation='softmax'))
31 |
32 | image_input = Input(shape=input_shape, name='image_input')
33 |
34 | self.model = Model(image_input, model(image_input))
35 | self.model.strides = self.strides
36 | self.model.offsets = self.offsets
37 | self.model.fields = self.fields
38 | return self.model
39 |
--------------------------------------------------------------------------------
/models/VGG16_D4096_D4096.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 | from keras import applications
5 |
6 | class Network:
7 |
8 | def __init__(self, stride_scale = 0):
9 | # if stride_scale == 0:
10 | # self.stride_scale = 14
11 | # else:
12 | # self.stride_scale = stride_scale
13 | #
14 | # self.strides = [2 * self.stride_scale]
15 | self.strides = [32]
16 | self.offsets = [16]
17 | self.fields = [150]
18 |
19 | def build(self, input_shape, num_classes):
20 |
21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
22 | model = Sequential()
23 | for l in vgg.layers:
24 | #l.trainable = False
25 | model.add(l)
26 |
27 | model.add(Flatten())
28 | model.add(Dense(4096, activation='relu'))
29 | model.add(Dropout(0.5))
30 | model.add(Dense(4096, activation='relu'))
31 | model.add(Dropout(0.5))
32 | model.add(Dense(num_classes, activation='softmax'))
33 |
34 | print(model.summary())
35 |
36 | image_input = Input(shape=input_shape, name='image_input')
37 |
38 | self.model = Model(image_input, model(image_input))
39 | self.model.strides = self.strides
40 | self.model.offsets = self.offsets
41 | self.model.fields = self.fields
42 | return self.model
43 |
--------------------------------------------------------------------------------
/models/VGG16_block4_D4096_D4096.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 | from keras import applications
5 |
6 | class Network:
7 |
8 | def __init__(self, stride_scale = 0):
9 | # if stride_scale == 0:
10 | # self.stride_scale = 14
11 | # else:
12 | # self.stride_scale = stride_scale
13 | #
14 | # self.strides = [2 * self.stride_scale]
15 | self.strides = [32]
16 | self.offsets = [16]
17 | self.fields = [150]
18 |
19 | def build(self, input_shape, num_classes):
20 |
21 | vgg = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
22 | model = Sequential()
23 | for l in vgg.layers[:-4]:
24 | #l.trainable = False
25 | model.add(l)
26 |
27 | model.add(Flatten())
28 | model.add(Dense(4096, activation='relu'))
29 | model.add(Dropout(0.5))
30 | model.add(Dense(4096, activation='relu'))
31 | model.add(Dropout(0.5))
32 | model.add(Dense(num_classes, activation='softmax'))
33 |
34 | print(model.summary())
35 |
36 | image_input = Input(shape=input_shape, name='image_input')
37 |
38 | self.model = Model(image_input, model(image_input))
39 | self.model.strides = self.strides
40 | self.model.offsets = self.offsets
41 | self.model.fields = self.fields
42 | return self.model
43 |
--------------------------------------------------------------------------------
/models/__init__.py:
--------------------------------------------------------------------------------
1 | import models.CNN_C32_C64_M2_C128_D
2 | import models.CNN_C32_C64_M2_C64_C64_M2_C128_D
3 | import models.CNN_C32_C64_M2_C64_C64_M2_C128_D_2
4 | import models.CNN_C64_C128_M2_C256_D
5 | import models.CNN_C128_C256_M2_C512_D
6 | import models.CNN_C64_C128_M2_C128_C128_M2_C256_D_2
7 | import models.CNN_C64_C128_M2_C128_C128_M2_C256_D_3
8 | import models.CNN_C128_C256_M2_C256_C256_M2_C512_D_2
9 | import models.CNN_C64_C128_M2_C128_C128_M2_C256_D_2_S7
10 | import models.CNN_C32_C64_C128_D
11 | import models.CNN_C32_C64_C128_C
12 | import models.CNN_C32_C64_C128_C2
13 | import models.CNN_C32_C64_C64_Cd64_C128_D
14 | import models.CNN_C32_Cd64_C64_Cd64_C128_D
15 | import models.vgg
16 | import models.VGG16_D256
17 | import models.VGG16_D4096_D4096
18 | import models.VGG16_block4_D4096_D4096
19 | import models.VGG16_AVG
20 | import models.VGG16_AVG_r
21 | import models.VGG16_C4096_C4096_AVG
22 |
23 |
24 | def get(**kwargs):
25 | if kwargs.get("name") not in globals():
26 | raise KeyError('Unknown network: {}'.format(kwargs))
27 |
28 | return globals()[kwargs.get("name")].Network(kwargs.get("stride_scale"))
29 |
--------------------------------------------------------------------------------
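`models/__init__.py` resolves a network class by its module name, so the `-m` flag of `train.py` must match one of the imports above. A minimal usage sketch mirroring the calls made in `train.py` (the model name, input shape and class count are example values only):

```python
import models

# resolve and build a network by name, as train.py does
network = models.get(name="CNN_C64_C128_M2_C256_D", stride_scale=7)
built = network.build(input_shape=(700, 700, 1), num_classes=10)

# build() attaches the per-scale geometry to the Keras model
print(built.strides, built.offsets, built.fields)
```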
/models/simple_document_classification.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | # if stride_scale == 0:
9 | # self.stride_scale = 14
10 | # else:
11 | # self.stride_scale = stride_scale
12 | #
13 | # self.strides = [2 * self.stride_scale]
14 | self.strides = [0]
15 | self.offsets = [ ( 5 - 1) / 2.0 ]
16 | self.fields = [150]
17 |
18 | def build(self, input_shape, num_classes):
19 |
20 | assert input_shape == (150, 150, 3), "incorrect input shape " + str(input_shape)
21 | image_input = Input(shape=input_shape, name='image_input')
22 |
23 | model = Sequential()
24 | # Convolution + Pooling Layer
25 | model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape, activation='relu'))
26 | model.add(MaxPooling2D(pool_size=(2, 2)))
27 | # Convolution + Pooling Layer
28 | model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
29 | model.add(MaxPooling2D(pool_size=(2, 2)))
30 | # Convolution + Pooling Layer
31 | model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
32 | model.add(MaxPooling2D(pool_size=(2, 2)))
33 | # Convolution + Pooling Layer
34 | model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
35 | model.add(MaxPooling2D(pool_size=(2, 2)))
36 |
37 | # Flattening
38 | model.add(Flatten())
39 | # Fully connection
40 | model.add(Dense(64, activation='relu'))
41 | model.add(Dropout(.6))
42 | model.add(Dense(64, activation='relu'))
43 | model.add(Dense(64, activation='relu'))
44 | model.add(Dropout(.3))
45 | model.add(Dense(num_classes, activation='softmax', name='predictions'))
46 |
47 | # GlobalAveragePooling2D()
48 |
49 | self.model = Model(image_input, model(image_input))
50 | self.model.strides = self.strides
51 | self.model.offsets = self.offsets
52 | self.model.fields = self.fields
53 | return self.model
54 |
--------------------------------------------------------------------------------
/models/vgg.py:
--------------------------------------------------------------------------------
1 | from keras.models import Sequential, Model
2 | from keras.layers import Dense, Dropout, Flatten, Reshape, Input
3 | from keras.layers import Conv2D, MaxPooling2D, Activation, concatenate
4 |
5 | class Network:
6 |
7 | def __init__(self, stride_scale = 0):
8 | # if stride_scale == 0:
9 | # self.stride_scale = 14
10 | # else:
11 | # self.stride_scale = stride_scale
12 | #
13 | # self.strides = [2 * self.stride_scale]
14 | self.strides = [32]
15 | self.offsets = [16]
16 | self.fields = [224]
17 |
18 | def build(self, input_shape, num_classes):
19 |
20 | assert input_shape == (224, 224, 3), "incorrect input shape " + str(input_shape)
21 | image_input = Input(shape=input_shape, name='image_input')
22 |
23 | model = Sequential()
24 | # block 1
25 | model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=input_shape, padding='same', name='block1_conv1')) # 224 x 224
26 | model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2'))
27 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block1_pool')) # 112x112
28 | # model.add(Dropout(0.25))
29 |
30 | # block 2
31 | model.add(Conv2D(128, kernel_size=(3,3), padding="same", activation='relu', name='block2_conv1'))
32 | model.add(Conv2D(128, kernel_size=(3,3), padding="same", activation='relu', name='block2_conv2'))
33 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block2_pool')) # 56 x 56
34 | # model.add(Dropout(0.5))
35 |
36 | # block 3
37 | model.add(Conv2D(256, kernel_size=(3,3), padding="same", activation='relu', name='block3_conv1'))
38 | model.add(Conv2D(256, kernel_size=(3,3), padding="same", activation='relu', name='block3_conv2'))
39 | model.add(Conv2D(256, kernel_size=(3,3), padding="same", activation='relu', name='block3_conv3'))
40 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block3_pool')) # 28 x 28
41 |
42 | # block 4
43 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block4_conv1'))
44 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block4_conv2'))
45 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block4_conv3'))
46 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block4_pool')) # 14 x 14
47 |
48 | # block 5
49 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block5_conv1'))
50 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block5_conv2'))
51 | model.add(Conv2D(512, kernel_size=(3,3), padding="same", activation='relu', name='block5_conv3'))
52 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2), padding="valid", name='block5_pool')) # 7 x 7
53 |
54 | model.add(Flatten(name='flatten'))
55 | model.add(Dense(4096, activation='relu', name='fc1'))
56 | model.add(Dropout(0.5))
57 | model.add(Dense(4096, activation='relu', name='fc2'))
58 | model.add(Dropout(0.5))
59 | model.add(Dense(num_classes, activation='softmax', name='predictions'))
60 |
61 | # GlobalAveragePooling2D()
62 |
63 | self.model = Model(image_input, model(image_input))
64 | self.model.strides = self.strides
65 | self.model.offsets = self.offsets
66 | self.model.fields = self.fields
67 | return self.model
68 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import os
3 | import time
4 | import datetime
5 | import argparse
6 | from utils import check_config
7 | check_config()
8 | import models, datasets
9 |
10 | import keras
11 | from keras import backend as K
12 | from keras.metrics import categorical_accuracy
13 | from keras.utils.multi_gpu_utils import multi_gpu_model
14 | from keras.utils.vis_utils import plot_model
15 |
16 | ap = argparse.ArgumentParser()
17 | ap.add_argument("-b", "--batch_size", type=int,
18 | default=3, help="# of images per batch")
19 | ap.add_argument("-p", "--parallel", default=False,
20 | help="Enable multi GPUs", action='store_true')
21 | ap.add_argument("-e", "--epochs", type=int, default=12,
22 | help="# of training epochs")
23 | ap.add_argument("-l", "--logs", type=str, default="logs",
24 | help="log directory")
25 | ap.add_argument("-m", "--model", type=str,
26 | default="CNN_C32_C64_M2_C128_D", help="model")
27 | ap.add_argument("-lr", "--learning_rate", type=float,
28 | default=0.001, help="learning rate")
29 | ap.add_argument("-s", "--stride_scale", type=int,
30 | default=0, help="Stride scale. If zero, default stride scale.")
31 | ap.add_argument("-d", "--dataset", type=str, default="ocr_mnist",
32 | help="dataset")
33 | ap.add_argument("-w", "--white", type=float, default=0.9,
34 | help="white probability for MNIST dataset")
35 | ap.add_argument("-n", "--noise", default=False,
36 | help="noise for MNIST dataset", action='store_true')
37 | ap.add_argument("--pos_weight", type=float, default=100.,
38 | help="weight for positive objects")
39 | ap.add_argument("--iou", type=float, default=.3,
40 | help="iou treshold to consider a position to be positive. If -1, positive only if \
41 | object included in the layer field")
42 | ap.add_argument("--bb_positive", type=str, default="iou-treshold",
43 | help="Possible values: iou-treshold, in-anchor, best-anchor")
44 | ap.add_argument("--nms_iou", type=float, default=.2,
45 | help="iou treshold for non max suppression")
46 | ap.add_argument("-i", "--input_dim", type=int, default=700,
47 | help="network input dim")
48 | ap.add_argument("-r", "--resize", type=str, default="",
49 | help="resize input images")
50 | ap.add_argument('--no-save', dest='save', action='store_false',
51 | help="save model and data to files")
52 | ap.add_argument('--resume', dest='resume_model', type=str, default="")
53 | ap.add_argument('--n_cpu', type=int, default=1,
54 | help='number of CPU threads to use during data generation')
55 | args = ap.parse_args()
56 | print(args)
57 |
58 | assert K.image_data_format() == 'channels_last', "image data format must be channels_last"
59 |
60 | model = models.get(name = args.model, stride_scale = args.stride_scale)
61 | print("#"*14 +" MODEL "+ "#"*14)
62 | print("### Stride scale: " + str(model.stride_scale))
63 | for s in model.strides: print("### Stride: " + str(s))
64 | print("#" * 35)
65 |
66 | dataset = datasets.get(name = args.dataset, layer_strides = model.strides, layer_offsets = model.offsets,
67 | layer_fields = model.fields, white_prob = args.white, bb_positive = args.bb_positive, iou_treshold=args.iou, save=args.save,
68 | batch_size = args.batch_size, input_dim=args.input_dim, resize=args.resize, noise=args.noise)
69 |
70 | # model initialization and parallel computing
71 | if not args.parallel:
72 | print("[INFO] training with 1 device...")
73 | built_model = model.build(input_shape = dataset.input_shape, num_classes = dataset.num_classes)
74 | else:
75 | if K._BACKEND=='tensorflow':
76 | from tensorflow.python.client import device_lib
77 | def get_available_gpus():
78 | local_device_protos = device_lib.list_local_devices()
79 | return [x.name for x in local_device_protos if x.device_type == 'GPU']
80 | ngpus = len(get_available_gpus())
81 | print("[INFO] training with {} GPUs...".format(ngpus))
82 | import tensorflow as tf
83 | with tf.device("/cpu:0"):
84 | original_built_model = model.build(input_shape = dataset.input_shape, num_classes= dataset.num_classes)
85 | built_model = multi_gpu_model(original_built_model, gpus=ngpus)
86 | elif K._BACKEND=='cntk':
87 | built_model = model.build(input_shape = dataset.input_shape, num_classes= dataset.num_classes)
88 | else:
89 | print("Multi GPU not available on this backend.")
90 |
91 | # import numpy as np
92 | # class_weights = np.ones(dataset.num_classes)
93 |
94 | # model compilation with loss and accuracy
95 | def custom_loss(y_true, y_pred):
96 | final_loss = 0.
97 | if dataset.enable_boundingbox:
98 | obj_true = y_true[...,dataset.num_classes]
99 | obj_pred = y_pred[...,dataset.num_classes]
100 | # (1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0))
101 | log_weight = 1. + (args.pos_weight - 1.) * obj_true
102 | obj = (1. - obj_true) * obj_pred + log_weight * (K.log(1. + K.exp(-K.abs(obj_pred))) + K.relu(- obj_pred))
103 |
104 | obj = K.square(obj_pred - obj_true) # note: this squared-error term overrides the weighted cross-entropy computed above, so pos_weight has no effect here
105 |
106 | prob = y_pred[...,0:dataset.num_classes]
107 | # scale predictions so that the class probas of each sample sum to 1
108 | prob /= K.sum(prob, axis=-1, keepdims=True)
109 | # clip to prevent NaN's and Inf's
110 | prob = K.clip(prob, K.epsilon(), 1 - K.epsilon())
111 | # calc
112 | loss = y_true[...,0:dataset.num_classes] * K.log(prob) #* class_weights
113 | cat = -K.sum(loss, -1, keepdims=True)
114 |
115 | reg = K.sum(K.square(y_true[..., dataset.num_classes+1:dataset.num_classes+5] - y_pred[...,dataset.num_classes+1:dataset.num_classes+5]), axis=-1, keepdims=True)
116 |
117 | # if args.best_position_classification:
118 | # mask = K.cast( K.less_equal( y_true[..., dataset.num_classes+5:(dataset.num_classes+6)], model.strides[0] * 1.42 / 2 ), K.floatx())
119 |
120 | mask = K.cast( K.equal( y_true[..., dataset.num_classes:(dataset.num_classes+1)], 1.0 ), K.floatx())
121 |
122 | final_loss = final_loss + obj + K.sum(cat * mask) / K.maximum(K.sum(mask), 1.0) + 100 * K.sum(reg * mask) / K.maximum(K.sum(mask), 1.0)
123 |
124 | if dataset.enable_classification or dataset.enable_segmentation:
125 | final_loss = final_loss + K.categorical_crossentropy(y_true, y_pred)
126 |
127 | return final_loss
128 |
129 |
130 | # metrics
131 | metrics = []
132 | if dataset.enable_boundingbox:
133 |
134 | def obj_accuracy(y_true, y_pred):
135 | acc = K.cast(K.equal( y_true[...,dataset.num_classes], K.round(y_pred[...,dataset.num_classes])), K.floatx())
136 | return K.mean(acc)
137 | metrics.append(obj_accuracy)
138 |
139 | def class_accuracy(y_true, y_pred):
140 | mask = K.cast( K.equal(y_true[...,dataset.num_classes], 1.0 ), K.floatx() )
141 | acc = K.cast(K.equal(K.argmax(y_true[...,0:dataset.num_classes], axis=-1), K.argmax(y_pred[...,0:dataset.num_classes], axis=-1)), K.floatx())
142 | if K.backend() == "cntk":
143 | acc = K.expand_dims(acc)
144 | return K.sum(acc * mask) / K.maximum(K.sum(mask), 1.0)
145 | metrics.append(class_accuracy)
146 |
147 | def reg_accuracy(y_true, y_pred):
148 | mask = K.cast( K.equal(y_true[...,dataset.num_classes], 1.0 ), K.floatx() )
149 | reg = K.sum(K.square(y_true[...,dataset.num_classes+1:dataset.num_classes+3] - y_pred[...,dataset.num_classes+1:dataset.num_classes+3]), axis=-1)
150 | if K.backend() == "cntk":
151 | reg = K.expand_dims(reg)
152 | return K.sum(reg * mask) / K.maximum(K.sum(mask), 1.0)
153 | metrics.append(reg_accuracy)
154 |
155 | if dataset.enable_classification or dataset.enable_segmentation:
156 | metrics.append(categorical_accuracy)
157 |
158 | # model compilation
159 | # built_model.compile(loss=custom_loss, optimizer=keras.optimizers.Adam(lr=args.learning_rate), metrics=metrics)
160 | built_model.compile(optimizer=keras.optimizers.Adam(lr=args.learning_rate), loss=custom_loss, metrics=metrics)
161 |
162 | if args.resume_model:
163 | print("Resuming model from weights in " + args.resume_model)
164 | built_model.load_weights(args.resume_model, by_name=True)
165 | # plot_model(built_model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
166 |
167 | # parallel computing on CNTK
168 | if args.parallel and (K._BACKEND=='cntk'):
169 | import cntk as C
170 | built_model._make_train_function()
171 | trainer = built_model.train_function.trainer
172 | assert (trainer is not None), "Cannot find a trainer in Keras Model!"
173 | learner_no = len(trainer.parameter_learners)
174 | assert (learner_no > 0), "No learner in the trainer."
175 | if learner_no > 1:
176 | import warnings; warnings.warn("Unexpected multiple learners in a trainer.")  # local import: warnings is not imported at the top of the file
177 | learner = trainer.parameter_learners[0]
178 | dist_learner = C.train.distributed.data_parallel_distributed_learner(learner, \
179 | num_quantization_bits=32, distributed_after=0)
180 | built_model.train_function.trainer = C.trainer.Trainer(
181 | trainer.model, [trainer.loss_function, trainer.evaluation_function], [dist_learner])
182 | rank = C.Communicator.rank()
183 | workers = C.Communicator.num_workers()
184 | print("[INFO] CNTK training with {} GPUs...".format(workers))
185 | total_items = dataset.x_train.shape[0]
186 | start = rank * total_items//workers
187 | end = min((rank+1) * total_items // workers, total_items)
188 | x_train, y_train = dataset.x_train[start : end], dataset.y_train[start : end]
189 |
190 |
191 | start_time = time.time()
192 |
193 | # Callbacks: save and tensorboard display
194 | callbacks = []
195 |
196 | if args.save:
197 | from keras.callbacks import ModelCheckpoint
198 | if not os.path.exists("/sharedfiles/models"):
199 | os.makedirs("/sharedfiles/models")
200 | fname = "/sharedfiles/models/" + datetime.datetime.fromtimestamp(start_time).strftime('%Y-%m-%d_%H:%M_') + args.model + ".h5"
201 | if args.parallel: # http://github.com/keras-team/keras/issues/8649
202 | from callback import ParallelSaveCallback
203 | checkpoint = ParallelSaveCallback(original_built_model,fname)
204 | else:
205 | if dataset.enable_boundingbox:
206 | checkpoint = ModelCheckpoint(fname, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
207 | else:
208 | checkpoint = ModelCheckpoint(fname, monitor='val_categorical_accuracy', verbose=1, save_best_only=True, mode='max')
209 | callbacks.append(checkpoint)
210 |
211 | if K._BACKEND=='tensorflow':
212 | from callback import TensorBoard
213 | log_dir = './Graph/' + time.strftime("%Y-%m-%d_%H:%M:%S")
214 | tensorboard = TensorBoard(dataset.gt_test, dataset.classes, dataset.stride_margin, model.strides, model.offsets, model.fields, args.nms_iou,
215 | log_dir=log_dir,
216 | histogram_freq=0,
217 | batch_size=args.batch_size,
218 | max_validation_size=100,
219 | write_output_images=True,
220 | enable_segmentation=dataset.enable_segmentation,
221 | enable_boundingbox=dataset.enable_boundingbox,
222 | write_graph=False,
223 | write_images=False,
224 | val_data=dataset.val if hasattr(dataset,"val") else None
225 | )
226 | print("Log saved in ", log_dir)
227 | tensorboard.set_model(built_model)
228 | callbacks.append(tensorboard)
229 |
230 | # training section
231 | if hasattr(dataset, "x_train"):
232 | built_model.fit(dataset.x_train, dataset.y_train,
233 | batch_size=args.batch_size,
234 | epochs=args.epochs,
235 | verbose=1,
236 | validation_data=(dataset.x_test, dataset.y_test),
237 | callbacks=callbacks)
238 | else:
239 | built_model.fit_generator(dataset.train,
240 | epochs=args.epochs,
241 | verbose=1,
242 | workers=args.n_cpu,
243 | use_multiprocessing=False,
244 | max_queue_size=10,
245 | shuffle=True,
246 | validation_data=dataset.val,
247 | callbacks=callbacks)
248 |
249 | # # save model
250 | # if args.save:
251 | # if not os.path.exists("/sharedfiles/models"):
252 | # os.makedirs("/sharedfiles/models")
253 | # fname = "/sharedfiles/models/" + datetime.datetime.fromtimestamp(start_time).strftime('%Y-%m-%d_%H:%M_') + args.model + ".h5"
254 | # built_model.save_weights(fname)
255 | # print("Model weights saved in " + fname)
256 |
257 | # evaluate section
258 | if hasattr(dataset, "x_test"):
259 | score = built_model.evaluate(dataset.x_test, dataset.y_test, batch_size=args.batch_size, verbose=0)
260 | else:
261 | score = built_model.evaluate_generator(dataset.test)
262 |
263 | print("Test loss and accuracy values:")
264 | for s in score:
265 | print(s)
266 | duration = time.time() - start_time
267 | print('Total Duration (%.3f sec)' % duration)
268 |
269 | if K._BACKEND=='tensorflow':
270 | print("Log saved in ", log_dir)
271 |
272 | if K._BACKEND=='cntk' and args.save:
273 | import cntk as C
274 | C.combine(built_model.outputs).save(fname[:-3]+'.dnn')
275 |
276 | if K._BACKEND=='cntk' and args.parallel:
277 | C.Communicator.finalize()
278 |
--------------------------------------------------------------------------------
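A point worth noting in `train.py`: `custom_loss` and the `class_accuracy` / `reg_accuracy` metrics only average the classification and regression terms over grid positions whose objectness target equals 1, dividing by `max(sum(mask), 1)` so the ratio stays defined when an image contains no positives. A NumPy-only sketch of that masking (illustrative values, not repository code):

```python
import numpy as np

num_classes = 10
y_true = np.zeros((4, 4, num_classes + 5))
y_true[1, 2, num_classes] = 1.0                        # a single positive position
mask = (y_true[..., num_classes] == 1.0).astype(float)

per_position_loss = np.random.rand(4, 4)               # stand-in for the class/reg term
masked_mean = (per_position_loss * mask).sum() / max(mask.sum(), 1.0)
print(masked_mean)
```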
/utils.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import os
3 | import re
4 |
5 | def check_config(write_json=False):
6 | import json
7 | with open(os.path.expanduser('~') + '/.keras/keras.json') as data_file:
8 | data = json.load(data_file)
9 | backend = data["backend"]
10 |
11 | r = re.search('/envs/(cntk|keras-tf)-py([0-9])([0-9])/bin/python', sys.executable)
12 | conda_env = { "tensorflow" : "keras-tf" , "cntk": "cntk" }
13 |
14 | if backend not in conda_env:
15 | sys.exit("Backend not supported.")
16 | else:
17 | env_name = conda_env[backend] + "-py" + str(sys.version_info.major) + str(sys.version_info.minor)
18 |
19 | if r is None or (sys.version_info.major != int(r.group(2)) or sys.version_info.minor != int(r.group(3))):
20 | sys.exit("""
21 | To create the corresponding environment:
22 |
23 | conda env update --file """ + env_name + """.yml
24 |
25 | To activate the Conda environment:
26 |
27 | source activate """ + env_name + """
28 |
29 | """)
30 | else:
31 | if conda_env[backend] != r.group(1):
32 | for b,e in conda_env.items():
33 | if e == r.group(1):
34 | os.environ["KERAS_BACKEND"] = b
35 | if write_json:
36 | print("Modifying ~/.keras/keras.json to " + b)
37 | data["backend"] = b  # persist the backend switch before rewriting the file
38 | with open(os.path.expanduser('~') + '/.keras/keras.json', "w") as data_file:
39 | json.dump(data, data_file)
40 |
--------------------------------------------------------------------------------