├── README.md
├── data.tar.gz
├── image
    ├── 1.jpg
    └── 2.jpg
└── source
    ├── main.py
    ├── models
        └── ace.py
    ├── train.sh
    └── utils
        ├── __init__.py
        ├── basic.py
        ├── char.txt
        └── data_loader.py


/README.md:
--------------------------------------------------------------------------------
 1 | # Aggregation Cross-Entropy for Sequence Recognition
 2 | This repository contains the code for the paper **Aggregation Cross-Entropy for Sequence Recognition**. Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu and Lele Xie. CVPR. 2019. [\[Paper\]](https://arxiv.org/abs/1904.08364)
 3 | 
 4 | Connectionist temporal classification (CTC) and attention mechanism are the most popular methods for sequence-learning problem. However, CTC relies on a sophisticated forward-backward algorithm for transcription, which prevents it from addressing two-dimensional (2D) prediction problem, whereas the attention mechanism leans on a complex attention module to fulfill its functionality, resulting in additional network parameters and runtime. 
 5 | 
 6 | In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective. The ACE loss function exhibits competitive performance to CTC and the attention mechanism, with much quicker implementation (as it involves only four fundamental formulas), faster inference\back-propagation (approximately *O(1)* in parallel), less storage requirement (no parameter and negligible runtime memory), and convenient employment (by replacing CTC with ACE). Furthermore, the proposed ACE loss function exhibits two noteworthy properties: (1) it can be directly applied for 2D prediction by flattening the 2D prediction into 1D prediction as the input and (2) it requires only characters and their numbers in the sequence annotation for supervision, which allows it to advance beyond sequence recognition, e.g., counting problem.
 7 | 
 8 | ![](./image/1.jpg)
 9 | Figure 1: Illustration of proposed ACE loss function. Generally, the 1D and 2D predictions are generated by integrated CNN-LSTM and FCN model, respectively. For the ACE loss function, the 2D prediction is further flattened to 1D prediction. During aggregation, the 1D predictions at all time-steps are accumulated for each class independently. After normalization, the prediction, together with the ground-truth, is utilized for loss estimation based on cross-entropy.
10 | 
11 | ![](./image/2.jpg)
12 | Figure 2: Toy example to show the advantage of ACE loss function. Resnet-50 trained with ACE loss function is able to recognize shuffled characters in the images. For each sub-image, the right column shows the 2D prediction of the recognition model for the text images. It is noteworthy that they have similar character distributions in the 2D space.
13 | 
14 | ## Requirements
15 | - [Python 3.6](https://www.python.org/) 
16 | - [tensorflow 1.13+](https://pytorch.org/) 
17 | - [OpenCV](https://opencv.org/)
18 | 
19 | ## Data Preparation
20 | tar -xzvf data.tar.gz
21 | 
22 | ## Training and Testing
23 | Start training: (in 'source/' folder)
24 | ```bash
25 |   sh train.sh
26 | ```
27 | - The training process should take **about 10s** for 100 iterations on a 1080Ti.
28 | 
29 | ## Citation
30 | ```
31 | @inproceedings{xie2019ace,
32 |   title     = {Aggregation Cross-Entropy for Sequence Recognition},
33 |   author    = {Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu and Lele Xie},
34 |   booktitle = {CVPR}, 
35 |   year      = {2019},
36 | }
37 | ```
38 | 
39 | ## Attention
40 | The project is only free for academic research purposes.
41 | 
42 | ## Reference
43 | https://github.com/summerlvsong/Aggregation-Cross-Entropy


--------------------------------------------------------------------------------
/data.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsing-cv/Aggregation-Cross-Entropy-for-Sequence-Recognition/b3fdcdbcf02eea5b01959dffe1bc6380c7dbbff8/data.tar.gz


--------------------------------------------------------------------------------
/image/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsing-cv/Aggregation-Cross-Entropy-for-Sequence-Recognition/b3fdcdbcf02eea5b01959dffe1bc6380c7dbbff8/image/1.jpg


--------------------------------------------------------------------------------
/image/2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsing-cv/Aggregation-Cross-Entropy-for-Sequence-Recognition/b3fdcdbcf02eea5b01959dffe1bc6380c7dbbff8/image/2.jpg


--------------------------------------------------------------------------------
/source/main.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | from __future__ import print_function, division
  3 | import torch
  4 | import argparse
  5 | import numpy as np
  6 | import tensorflow as tf
  7 | import tensorflow.keras.layers as KL
  8 | from models.ace import ACE
  9 | from utils.data_loader import ImageDataset
 10 | tf.enable_eager_execution()
 11 | # import torch.nn as nn
 12 | # from torch import optim
 13 | # import torch.nn.functional as F
 14 | # from models.seq_module import ACE
 15 | # from torch.autograd import Variable
 16 | # from models.solver import seq_solver
 17 | from utils.basic import timeSince
 18 | # from torch.utils.data import DataLoader
 19 | # from utils.data_loader import ImageDataset
 20 | 
 21 | parser = argparse.ArgumentParser()
 22 | parser.add_argument('--model_path', type=str, default='../log/snapshot/model-{:0>2d}.pkl')
 23 | parser.add_argument('--total_epoch', type=int, default=50, help='total epoch number')
 24 | parser.add_argument('--train_path', type=str, default='../data/train.txt')
 25 | parser.add_argument('--test_path', type=str, default='../data/test.txt')
 26 | parser.add_argument('--train_batch_size', type=int, default=50, help='training batch size')
 27 | parser.add_argument('--test_batch_size', type=int, default=50, help='testing batch size')
 28 | parser.add_argument('--last_epoch', type=int, default=0, help='last epoch')
 29 | parser.add_argument('--class_num', type=int, default=26, help='class number')
 30 | parser.add_argument('--dict', type=str, default='_abcdefghijklmnopqrstuvwxyz')
 31 | opt = parser.parse_args()
 32 | 
 33 | class _Bottleneck(tf.keras.Model):
 34 |     def __init__(self, filters, block, 
 35 |                  downsampling=False, stride=1, **kwargs):
 36 |         super(_Bottleneck, self).__init__(**kwargs)
 37 | 
 38 |         filters1, filters2, filters3 = filters
 39 |         conv_name_base = 'res' + block + '_branch'
 40 |         bn_name_base   = 'bn'  + block + '_branch'
 41 | 
 42 |         self.downsampling = downsampling
 43 |         self.stride = stride
 44 |         self.out_channel = filters3
 45 |         
 46 |         self.conv2a = KL.Conv2D(filters1, (1, 1), strides=(stride, stride),
 47 |                                 kernel_initializer='he_normal',
 48 |                                 name=conv_name_base + '2a')
 49 |         self.bn2a = KL.BatchNormalization(name=bn_name_base + '2a')
 50 | 
 51 |         self.conv2b = KL.Conv2D(filters2, (3, 3), padding='same',
 52 |                                 kernel_initializer='he_normal',
 53 |                                 name=conv_name_base + '2b')
 54 |         self.bn2b = KL.BatchNormalization(name=bn_name_base + '2b')
 55 | 
 56 |         self.conv2c = KL.Conv2D(filters3, (1, 1),
 57 |                                 kernel_initializer='he_normal',
 58 |                                 name=conv_name_base + '2c')
 59 |         self.bn2c = KL.BatchNormalization(name=bn_name_base + '2c')
 60 |          
 61 |         if self.downsampling:
 62 |             self.conv_shortcut = KL.Conv2D(filters3, (1, 1), strides=(stride, stride),
 63 |                                             kernel_initializer='he_normal',
 64 |                                             name=conv_name_base + '1')
 65 |             self.bn_shortcut = KL.BatchNormalization(name=bn_name_base + '1')     
 66 |     
 67 |     def __call__(self, inputs, training=False):
 68 |         x = self.conv2a(inputs)
 69 |         x = self.bn2a(x, training=training)
 70 |         x = tf.nn.relu(x)
 71 |         
 72 |         x = self.conv2b(x)
 73 |         x = self.bn2b(x, training=training)
 74 |         x = tf.nn.relu(x)
 75 |         
 76 |         x = self.conv2c(x)
 77 |         x = self.bn2c(x, training=training)
 78 |         
 79 |         if self.downsampling:
 80 |             shortcut = self.conv_shortcut(inputs)
 81 |             shortcut = self.bn_shortcut(shortcut, training=training)
 82 |         else:
 83 |             shortcut = inputs
 84 |             
 85 |         x += shortcut
 86 |         x = tf.nn.relu(x)
 87 |         
 88 |         return x     
 89 |         
 90 | 
 91 | class ResNet(tf.keras.Model):
 92 |     def __init__(self, depth, **kwargs):
 93 |         super(ResNet, self).__init__(**kwargs)
 94 |         if depth not in [50, 101]:
 95 |             raise AssertionError('depth must be 50 or 101.')
 96 |         self.depth = depth
 97 |         self.padding = KL.ZeroPadding2D((3, 3))
 98 |         self.conv1 = KL.Conv2D(64, (7, 7), strides=(2, 2), kernel_initializer='he_normal', name='conv1')
 99 |         self.bn_conv1 = KL.BatchNormalization(name='bn_conv1')
100 |         self.max_pool = KL.MaxPooling2D((3, 3), strides=(2, 2), padding='same')
101 |         
102 |         self.res2a = _Bottleneck([64, 64, 256], block='2a', downsampling=True, stride=1)
103 |         self.res2b = _Bottleneck([64, 64, 256], block='2b')
104 |         self.res2c = _Bottleneck([64, 64, 256], block='2c')
105 |         
106 |         self.res3a = _Bottleneck([128, 128, 512], block='3a', downsampling=True, stride=2)
107 |         self.res3b = _Bottleneck([128, 128, 512], block='3b')
108 |         self.res3c = _Bottleneck([128, 128, 512], block='3c')
109 |         self.res3d = _Bottleneck([128, 128, 512], block='3d')
110 |         
111 |         self.res4a = _Bottleneck([256, 256, 1024], block='4a', downsampling=True, stride=2)
112 |         self.res4b = _Bottleneck([256, 256, 1024], block='4b')
113 |         self.res4c = _Bottleneck([256, 256, 1024], block='4c')
114 |         self.res4d = _Bottleneck([256, 256, 1024], block='4d')
115 |         self.res4e = _Bottleneck([256, 256, 1024], block='4e')
116 |         self.res4f = _Bottleneck([256, 256, 1024], block='4f')
117 |         if self.depth == 101:
118 |             self.res4g = _Bottleneck([256, 256, 1024], block='4g')
119 |             self.res4h = _Bottleneck([256, 256, 1024], block='4h')
120 |             self.res4i = _Bottleneck([256, 256, 1024], block='4i')
121 |             self.res4j = _Bottleneck([256, 256, 1024], block='4j')
122 |             self.res4k = _Bottleneck([256, 256, 1024], block='4k')
123 |             self.res4l = _Bottleneck([256, 256, 1024], block='4l')
124 |             self.res4m = _Bottleneck([256, 256, 1024], block='4m')
125 |             self.res4n = _Bottleneck([256, 256, 1024], block='4n')
126 |             self.res4o = _Bottleneck([256, 256, 1024], block='4o')
127 |             self.res4p = _Bottleneck([256, 256, 1024], block='4p')
128 |             self.res4q = _Bottleneck([256, 256, 1024], block='4q')
129 |             self.res4r = _Bottleneck([256, 256, 1024], block='4r')
130 |             self.res4s = _Bottleneck([256, 256, 1024], block='4s')
131 |             self.res4t = _Bottleneck([256, 256, 1024], block='4t')
132 |             self.res4u = _Bottleneck([256, 256, 1024], block='4u')
133 |             self.res4v = _Bottleneck([256, 256, 1024], block='4v')
134 |             self.res4w = _Bottleneck([256, 256, 1024], block='4w') 
135 |         
136 |         self.res5a = _Bottleneck([512, 512, 2048], block='5a', downsampling=True, stride=2)
137 |         self.res5b = _Bottleneck([512, 512, 2048], block='5b')
138 |         self.res5c = _Bottleneck([512, 512, 2048], block='5c')
139 |         
140 |         self.out_channel = (256, 512, 1024, 2048)
141 |     
142 |     def __call__(self, inputs, training=True):
143 |         x = self.padding(inputs)
144 |         x = self.conv1(x)
145 |         x = self.bn_conv1(x, training=training)
146 |         x = tf.nn.relu(x)
147 |         x = self.max_pool(x)
148 |         
149 |         x = self.res2a(x, training=training)
150 |         x = self.res2b(x, training=training)
151 |         C2 = x = self.res2c(x, training=training)
152 |         
153 |         x = self.res3a(x, training=training)
154 |         x = self.res3b(x, training=training)
155 |         x = self.res3c(x, training=training)
156 |         C3 = x = self.res3d(x, training=training)
157 |         
158 |         x = self.res4a(x, training=training)
159 |         x = self.res4b(x, training=training)
160 |         x = self.res4c(x, training=training)
161 |         x = self.res4d(x, training=training)
162 |         x = self.res4e(x, training=training)
163 |         x = self.res4f(x, training=training)
164 |         if self.depth == 101:
165 |             x = self.res4g(x, training=training)
166 |             x = self.res4h(x, training=training)
167 |             x = self.res4i(x, training=training)
168 |             x = self.res4j(x, training=training)
169 |             x = self.res4k(x, training=training)
170 |             x = self.res4l(x, training=training)
171 |             x = self.res4m(x, training=training)
172 |             x = self.res4n(x, training=training)
173 |             x = self.res4o(x, training=training)
174 |             x = self.res4p(x, training=training)
175 |             x = self.res4q(x, training=training)
176 |             x = self.res4r(x, training=training)
177 |             x = self.res4s(x, training=training)
178 |             x = self.res4t(x, training=training)
179 |             x = self.res4u(x, training=training)
180 |             x = self.res4v(x, training=training)
181 |             x = self.res4w(x, training=training) 
182 |         C4 = x
183 |         
184 |         # x = self.res5a(x, training=training)
185 |         # x = self.res5b(x, training=training)
186 |         # C5 = x = self.res5c(x, training=training)
187 |         
188 |         # return C2, C3, C4, C5
189 |         return C4
190 | 
191 | class ResnetEncoderDecoder(tf.keras.Model):
192 |     def __init__(self):
193 |         super(ResnetEncoderDecoder, self).__init__()
194 |         self.resnet = ResNet(50)
195 |         self.out = tf.keras.layers.Dense(opt.class_num+1)
196 |         self.loss_layer = ACE(opt.dict)
197 | 
198 |     def call(self, inputs, training=True):
199 |         input, label = inputs[0], inputs[1]
200 |         input = self.resnet(input)
201 |         # print ("input shape", input.shape)
202 |         input = tf.nn.softmax(self.out(input),dim=-1)
203 | 
204 |         return self.loss_layer([input,label])
205 | 
206 | 
207 | if __name__ == "__main__":
208 | 
209 |     model = ResnetEncoderDecoder()
210 | 
211 | 
212 |     optimizer = tf.train.RMSPropOptimizer(learning_rate=0.0001)
213 |     checkpoint_path = "./checkpoints/train"
214 |     ckpt = tf.train.Checkpoint(model=model,
215 |                             optimizer=optimizer,
216 |                             step=tf.train.get_or_create_global_step())
217 |     ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)
218 | 
219 |     train_set = ImageDataset(data_path = opt.train_path, char_path="utils/char.txt", batch_size=128, training=True).data_generation()
220 |     test_set = ImageDataset(data_path = opt.test_path, char_path="utils/char.txt", batch_size=128, training=False).data_generation()
221 | 
222 |     start_epoch = 0
223 |     if ckpt_manager.latest_checkpoint:
224 |         ckpt.restore(ckpt_manager.latest_checkpoint)
225 |         start_epoch = int(ckpt_manager.latest_checkpoint.split('-')[-1])
226 |         print (f'Latest checkpoint restored!!\n\tModel path is {ckpt_manager.latest_checkpoint}')
227 | 
228 |     epochs = 100000
229 |     for epoch in range(start_epoch, epochs):
230 |         loss_history = []
231 |         for step, (inputs, labels) in enumerate(train_set):
232 |             # print (inputs)
233 |             with tf.GradientTape() as tape:
234 |                 loss = model(inputs["images"], training=True)
235 |                 correct_count, len_total, pre_total = model.loss_layer.result_analysis(step)
236 |                 recall = float(correct_count) / len_total
237 |                 precision = correct_count / (pre_total+0.000001)
238 |                 print(f'Epoch: {epoch:3d} it: {step:6d}, loss: {loss:.4f}, recall: {recall:.4f}, precision: {precision:.4f}')
239 | 
240 |             grads = tape.gradient(loss, model.variables)
241 |             optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
242 | 
243 |             loss_history.append(loss.numpy())
244 | 			# if step == 0:
245 | 			# 	loss_aver = loss
246 | 			# loss_aver = 0.9*loss_aver+0.1*loss			  
247 | 			# if step == len(self.lmdb_train)-1:
248 |         ckpt_manager.save()
249 |     # the_solver = seq_solver(model = model,
250 |     #                     lmdb = [lmdb_train, lmdb_test],
251 |     #                     optimizer = optimizer, 
252 |     #                     scheduler = scheduler,
253 |     #                     total_epoch = opt.total_epoch,
254 |     #                     model_path = opt.model_path,
255 |     #                     last_epoch = opt.last_epoch)
256 | 
257 |     # the_solver.forward()
258 | 
259 | 


--------------------------------------------------------------------------------
/source/models/ace.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8 -*-
 2 | import math
 3 | import torch
 4 | import random
 5 | import itertools
 6 | import numpy as np
 7 | import tensorflow as tf
 8 | 
 9 | class ACE(tf.keras.Model):
10 | 
11 |     def __init__(self, dictionary, name='aggregrate_cross_entropy', **kwargs):
12 |         super(ACE, self).__init__(name=name, **kwargs)
13 |         self.softmax = None
14 |         self.label = None
15 |         self.dict = dictionary
16 | 
17 |     def call(self, inputs):
18 |         input, label = inputs[0], inputs[1]
19 |         self.bs,self.h,self.w,_ = input.shape.as_list()
20 |         T_ = self.h*self.w
21 | 
22 |         input = tf.reshape(input, (self.bs,T_,-1))
23 |         input = input + 1e-10
24 | 
25 |         self.softmax = input
26 |         nums,dist = label[:,0],label[:,1:]
27 |         nums = T_ - nums
28 |         
29 |         self.label = tf.concat([tf.expand_dims(nums, -1),dist], 1)
30 | 
31 |         # ACE Implementation (four fundamental formulas)
32 |         input = tf.reduce_sum(input, axis=1)
33 |         input = input/T_
34 |         label = label/T_
35 |         loss = (-tf.reduce_sum(tf.math.log(input)*label))/self.bs
36 | 
37 |         return loss
38 | 
39 | 
40 |     def decode_batch(self):
41 |         out_best = tf.argmax(self.softmax, 2).numpy()
42 |         pre_result = [0]*self.bs
43 |         for j in range(self.bs):
44 |             pre_result[j] = out_best[j][out_best[j]!=0].astype(np.int32)
45 |         return pre_result
46 | 
47 | 
48 |     def vis(self,iteration):
49 |         sn = np.random.randint(0,self.bs-1)
50 |         print(f'Test image {iteration*50+sn:4d}')
51 |         pred = tf.argmax(self.softmax, 2).numpy()
52 |         pred = pred[sn].astype(np.int32).tolist() # sample #0
53 |         pred_string = ''.join([f'{self.dict[pn]:2s}' for pn in pred])
54 |         pred_string_set = [pred_string[i:i+self.w*2] for i in range(0, len(pred_string), self.w*2)]
55 |         print('Prediction: ')
56 |         for pre_str in pred_string_set:
57 |             print(pre_str)
58 |         label = self.label.numpy().astype(np.int32) # (batch_size, num_classes)
59 |         label = ''.join([f'{self.dict[idx]:2s}:{pn:2d}    ' for idx, pn in enumerate(label[sn]) if idx != 0 and pn != 0])
60 |         label = 'Label: ' + label
61 |         print(label)
62 | 
63 |     def result_analysis(self, iteration):
64 |         prediction = self.decode_batch()
65 |         correct_count = 0
66 |         pre_total = 0
67 |         len_total = self.label[:,1:].numpy().sum()
68 |         label_data = self.label.numpy()
69 |         for idx, pre_list in enumerate(prediction):
70 |             for pw in pre_list:
71 |                 if label_data[idx][pw] > 0:
72 |                     correct_count = correct_count + 1
73 |                     label_data[idx][pw] -= 1
74 | 
75 |             pre_total += len(pre_list)  
76 | 
77 |         if np.random.random() < 0.05:
78 |             self.vis(iteration)
79 | 
80 |         return correct_count, len_total, pre_total  


--------------------------------------------------------------------------------
/source/train.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | filename="../log/log/log_`date +%y_%m_%d_%H_%M_%S`.txt"
4 | CUDA_VISIBLE_DEVICES=0 python -u main.py \
5 | 	2>&1 | tee $filename
6 | 
7 | 
8 | 
9 | 	


--------------------------------------------------------------------------------
/source/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsing-cv/Aggregation-Cross-Entropy-for-Sequence-Recognition/b3fdcdbcf02eea5b01959dffe1bc6380c7dbbff8/source/utils/__init__.py


--------------------------------------------------------------------------------
/source/utils/basic.py:
--------------------------------------------------------------------------------
 1 | import time
 2 | import math
 3 | 
 4 | 
 5 | def asMinutes(s):
 6 |     m = math.floor(s / 60)
 7 |     s -= m * 60
 8 |     return '%dm %ds' % (m, s)
 9 | 
10 | 
11 | def timeSince(since):
12 |     now = time.time()
13 |     s = now - since
14 |     return '%s' % (asMinutes(s))
15 | 


--------------------------------------------------------------------------------
/source/utils/char.txt:
--------------------------------------------------------------------------------
 1 | a
 2 | b
 3 | c
 4 | d
 5 | e
 6 | f
 7 | g
 8 | h
 9 | i
10 | j
11 | k
12 | l
13 | m
14 | n
15 | o
16 | p
17 | q
18 | r
19 | s
20 | t
21 | u
22 | v
23 | w
24 | x
25 | y
26 | z
27 | _


--------------------------------------------------------------------------------
/source/utils/data_loader.py:
--------------------------------------------------------------------------------
 1 | import cv2
 2 | import numpy as np
 3 | import tensorflow as tf
 4 | import unicodedata
 5 | import multiprocessing
 6 | 
 7 | class ImageDataset():
 8 |     """Face Landmarks dataset."""
 9 | 
10 |     def __init__(self, data_path, char_path, batch_size=10, training=True, transform=None):
11 |         """
12 |         Args:
13 |             data_path (string): Path to the files with images and their annotations.
14 |             length (string): image number.
15 |             class_num (int): class number.
16 |         """
17 |         with open(data_path) as fh:
18 |             self.img_and_label = fh.readlines()
19 |         with open(char_path) as f:
20 |             self.char_id_map = {char.strip():idx for idx,char in enumerate(f)}
21 |             print (self.char_id_map)
22 |         self.length = len(self.img_and_label)
23 |         self.class_num = len(self.char_id_map)
24 |         self.indexes = np.arange(self.length)
25 |         self.batch_size = batch_size
26 |         self.transform = transform if training else None
27 |         
28 | 
29 |     def __len__(self):
30 |         return self.length
31 | 
32 |     def __getitem__(self, index):
33 |         img_and_label = self.img_and_label[index].strip()
34 |         pth, word = img_and_label.split(' ') # image path and its annotation
35 | 
36 |         image = cv2.imread(pth)#,0)
37 |         image = cv2.pyrDown(image).astype('float32') # 100*100
38 |         
39 |         word = [ord(var)-97 for var in word] # a->0
40 | 
41 |         label = np.zeros((self.class_num)).astype('float32')
42 | 
43 |         for ln in word:
44 |             label[int(ln+1)] += 1 # label construction for ACE
45 | 
46 |         label[0] = len(word)
47 |         return pth,image,label
48 | 
49 |  
50 |     def data_generation(self):
51 |         steps_of_per_epoch = self.length//self.batch_size
52 |         while True:
53 |             with multiprocessing.Pool(processes=8) as pool:
54 |                 np.random.shuffle(self.img_and_label)
55 |                 for i, data in enumerate(range(steps_of_per_epoch)):
56 |                     paths = []
57 |                     images = []
58 |                     labels = []
59 |                     index_batch = self.indexes[i * self.batch_size:(i + 1) * self.batch_size]
60 |                     for idx in index_batch:
61 |                         # print (idx)
62 |                         path,image,label = self.__getitem__(idx)
63 |                         # print ("label", label)
64 |                         paths.append(path)
65 |                         images.append(image)
66 |                         labels.append(label)
67 | 
68 |                     yield {"path":np.array(paths), "images": np.array(images)}, np.array(labels)
69 | 
70 | 
71 | 


--------------------------------------------------------------------------------