├── .gitignore
├── README.md
├── classifier
│   ├── cascade.xml
│   ├── params.xml
│   ├── stage0.xml
│   ├── stage1.xml
│   ├── stage2.xml
│   ├── stage3.xml
│   ├── stage4.xml
│   ├── stage5.xml
│   └── stage6.xml
├── preview.jpg
├── results
│   ├── 1-cv.jpg
│   ├── 1-sk.jpg
│   ├── 2-cv.jpg
│   ├── 2-sk.jpg
│   ├── 3-cv.jpg
│   ├── 3-sk.jpg
│   ├── 4-cv.jpg
│   ├── 4-sk.jpg
│   ├── 6-cv.jpg
│   ├── 6-sk.jpg
│   ├── 7-cv.jpg
│   ├── 7-sk.jpg
│   ├── 8-cv.jpg
│   ├── 8-sk.jpg
│   └── cv.jpg
├── src
│   ├── .gitignore
│   ├── detect.py
│   ├── load_labels.py
│   ├── main.py
│   └── recognize.py
└── test
    ├── 1.jpg
    ├── 2.jpg
    ├── 3.jpg
    ├── 4.jpg
    ├── 6.jpg
    ├── 7.jpg
    └── 8.jpg
/.gitignore:
--------------------------------------------------------------------------------
1 | asset2
2 | MNIST
3 | *.pyc
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Digit Detection & Recognition
2 |
3 | ### What is it?
4 |
5 | Digit detection and recognition with AdaBoost and SVM.
6 |
7 | 
8 |
9 | ### How it works
10 |
11 | 1. Train a cascade classifier for detection. The cascade classifier in `classifier/cascade.xml` is trained with 7000 positive samples and 9000 negative samples in 10 stages.
12 | 2. Train an SVM with the MNIST database.
13 | 3. Detect the digits in the image.
14 | 4. Scale each detected region to the same size as the MNIST samples, then use the trained SVM to recognize (classify) the digit. For better results, deskew the images using their image moments first, then classify on their HOG descriptors.
15 |
16 | ### Dependencies
17 |
18 | These scripts need Python 2.7 (the code uses Python 2 `print` statements) and the following libraries to work:
19 |
20 | 1. pillow (~2.8.1)
21 | 2. numpy (~1.9.0)
22 | 3. python-opencv (~2.4.11)
23 | 4. scikit-learn (~0.15.2)
24 | The simplest way to install all of them is to install [python(x,y)](https://code.google.com/p/pythonxy/wiki/Downloads?tm=2).
25 |
26 | If you can't install python(x,y), you can install Python, numpy and python-opencv separately, then install pip, pillow and scikit-learn.
27 |
28 | 1. Install Python. Use the installer from [Python's website](https://www.python.org/downloads/).
29 | 2. Install numpy. Use the installer from [scipy's website](http://www.scipy.org/scipylib/download.html). (You don't need scipy to run this project, so you can install numpy alone.)
30 | 3. Install python-opencv. Download the release from [its sourceforge site](http://sourceforge.net/projects/opencvlibrary/files/). (Choose the release for your operating system, then choose version 2.4.11.) The executable is just an archive. Extract the files, then copy `cv2.pyd` to the `lib/site-packages` folder of your Python installation.
31 | 4. Install pip. Download [the script for installing pip](https://bootstrap.pypa.io/get-pip.py), open cmd (or a terminal if you are using Linux/Mac OS X), go to the directory containing the downloaded script, and run `python get-pip.py`.
32 | 5. Install pillow. Run `pip install pillow`.
33 | 6. Install scikit-learn. Run `pip install scikit-learn`.
34 |
35 | If you are running the code under Linux/Mac OS X and the scripts throw `AttributeError: __float__`, make sure your pillow has JPEG support (consult [Pillow's documentation](http://pillow.readthedocs.org/en/latest/installation.html)). For example, try:
36 |
37 | ```
38 | sudo apt-get install libjpeg-dev
39 | sudo pip uninstall pillow
40 | sudo pip install pillow
41 | ```
42 |
43 | If you have any problem installing the dependencies, contact the author.
44 |
45 | ### How to generate the results
46 |
47 | Enter the `src` directory and run:
48 |
49 | ```
50 | python main.py
51 | ```
52 |
53 | It uses the images (`.jpg` only) under the `test` directory to produce the results, which are written to the `results` directory. Results generated with OpenCV have `-cv` in their filenames, and results generated with sklearn have `-sk`.
54 |
55 |
56 | ### Directory structure
57 |
58 | ```
59 | .
60 | ├─ README.md
61 | ├─ doc (documentation, reports)
62 | │   └── ...
63 | ├─ classifier (OpenCV cascade classifier)
64 | │   ├── cascade.xml (the classifier parameter file)
65 | │   └── ...
66 | ├─ MNIST (The MNIST database)
67 | │   ├── train-images.idx3-ubyte
68 | │   └── train-labels.idx1-ubyte
69 | ├─ test (test images)
70 | │   └── ...
71 | ├─ results (the results)
72 | │   └── ...
73 | └─ src (the python source code)
74 |     ├── detect.py (detection code)
75 |     ├── load_labels.py (script to load MNIST data)
76 |     ├── recognize.py (recognition code)
77 |     └── main.py (generate the results)
78 | ```
79 |
80 | ### About
81 |
82 | * [Github repository](https://github.com/joyeecheung/digit-detection-recognition)
83 | * Author: Qiuyi Zhang
84 | * Time: Jul. 2015
--------------------------------------------------------------------------------
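Step 4 of the README mentions HOG descriptors. As a rough illustration of the idea, the sketch below builds a magnitude-weighted histogram of gradient directions for a single cell in plain Python. It mirrors the 16-bin quantization used in `src/recognize.py`, but the function name `orientation_histogram`, the hand-written gradient values, and the clamping of the last bin are assumptions made for this example, not part of the project.

```python
import math

BIN_N = 16  # same number of orientation bins as recognize.py


def orientation_histogram(gx, gy, bin_n=BIN_N):
    """Magnitude-weighted histogram of gradient directions for one cell.

    gx, gy: equal-length sequences of horizontal and vertical gradients.
    """
    hist = [0.0] * bin_n
    for dx, dy in zip(gx, gy):
        mag = math.hypot(dx, dy)
        ang = math.atan2(dy, dx) % (2 * math.pi)  # map angle to [0, 2*pi)
        # min() guards against an angle that rounds up to exactly 2*pi
        idx = min(int(bin_n * ang / (2 * math.pi)), bin_n - 1)
        hist[idx] += mag
    return hist


# Three made-up gradients: pointing right, pointing down, pointing left.
hist = orientation_histogram([1.0, 0.0, -2.0], [0.0, 1.0, 0.0])
```

In the project itself the gradients come from `cv2.Sobel`, and four such cell histograms are concatenated into one 64-element descriptor per digit.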
/classifier/cascade.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | BOOST
5 | HAAR
6 | 28
7 | 28
8 |
9 | GAB
10 | 9.9500000476837158e-001
11 | 5.0000000000000000e-001
12 | 9.4999999999999996e-001
13 | 1
14 | 100
15 |
16 | 0
17 | 1
18 | BASIC
19 | 7
20 |
21 |
22 | <_>
23 | 2
24 | -1.8347458541393280e-001
25 |
26 | <_>
27 |
28 | 0 -1 4 7.3850885033607483e-002
29 |
30 | -9.1880136728286743e-001 7.2553080320358276e-001
31 | <_>
32 |
33 | 0 -1 16 -1.0428477823734283e-001
34 |
35 | 7.3532676696777344e-001 -8.7772518396377563e-001
36 |
37 | <_>
38 | 4
39 | -1.8423573970794678e+000
40 |
41 | <_>
42 |
43 | 0 -1 14 2.7671821415424347e-002
44 |
45 | -8.7964987754821777e-001 5.1012891530990601e-001
46 | <_>
47 |
48 | 0 -1 13 -1.5674557536840439e-002
49 |
50 | 4.4505974650382996e-001 -8.3683210611343384e-001
51 | <_>
52 |
53 | 0 -1 21 -1.1460572568466887e-004
54 |
55 | 3.2763624191284180e-001 -8.8302916288375854e-001
56 | <_>
57 |
58 | 0 -1 5 3.4735631942749023e-001
59 |
60 | -5.2473807334899902e-001 6.4939862489700317e-001
61 |
62 | <_>
63 | 5
64 | -6.9036191701889038e-001
65 |
66 | <_>
67 |
68 | 0 -1 7 3.8202914595603943e-001
69 |
70 | -6.2518489360809326e-001 5.9963268041610718e-001
71 | <_>
72 |
73 | 0 -1 20 -1.1428447032812983e-004
74 |
75 | 3.5979631543159485e-001 -8.4728848934173584e-001
76 | <_>
77 |
78 | 0 -1 24 -6.0800916799053084e-006
79 |
80 | -9.8068064451217651e-001 2.6229214668273926e-001
81 | <_>
82 |
83 | 0 -1 11 -2.0428642164915800e-003
84 |
85 | -9.7884273529052734e-001 2.2761307656764984e-001
86 | <_>
87 |
88 | 0 -1 22 -9.1749490820802748e-005
89 |
90 | 2.9220622777938843e-001 -8.3195126056671143e-001
91 |
92 | <_>
93 | 4
94 | -9.6762913465499878e-001
95 |
96 | <_>
97 |
98 | 0 -1 10 6.8016932345926762e-005
99 |
100 | -6.9405460357666016e-001 4.2146533727645874e-001
101 | <_>
102 |
103 | 0 -1 6 2.5270378682762384e-003
104 |
105 | 2.9075825214385986e-001 -9.3532639741897583e-001
106 | <_>
107 |
108 | 0 -1 25 -5.9558178691077046e-006
109 |
110 | -8.4807384014129639e-001 3.0522018671035767e-001
111 | <_>
112 |
113 | 0 -1 25 5.9294161474099383e-006
114 |
115 | 3.5653164982795715e-001 -9.9828380346298218e-001
116 |
117 | <_>
118 | 3
119 | -1.4790147542953491e-001
120 |
121 | <_>
122 |
123 | 0 -1 25 -6.0298630160104949e-006
124 |
125 | -8.6544436216354370e-001 3.7327477335929871e-001
126 | <_>
127 |
128 | 0 -1 17 -5.3857154853176326e-005
129 |
130 | 4.1779288649559021e-001 -6.8576753139495850e-001
131 | <_>
132 |
133 | 0 -1 6 -7.4640125967562199e-004
134 |
135 | -9.8402875661849976e-001 2.9975003004074097e-001
136 |
137 | <_>
138 | 5
139 | -6.4234280586242676e-001
140 |
141 | <_>
142 |
143 | 0 -1 12 2.6030194014310837e-002
144 |
145 | -5.9079927206039429e-001 5.5949991941452026e-001
146 | <_>
147 |
148 | 0 -1 18 -1.1487156734801829e-004
149 |
150 | 3.5113725066184998e-001 -8.3308726549148560e-001
151 | <_>
152 |
153 | 0 -1 3 4.8153925687074661e-002
154 |
155 | -7.1412664651870728e-001 3.0667838454246521e-001
156 | <_>
157 |
158 | 0 -1 2 -5.9005141258239746e-002
159 |
160 | -9.4212603569030762e-001 2.5226300954818726e-001
161 | <_>
162 |
163 | 0 -1 8 8.0452084541320801e-002
164 |
165 | 2.5081482529640198e-001 -9.6162217855453491e-001
166 |
167 | <_>
168 | 6
169 | -8.3931213617324829e-001
170 |
171 | <_>
172 |
173 | 0 -1 0 1.5571638869005255e-005
174 |
175 | -8.0575537681579590e-001 2.6608934998512268e-001
176 | <_>
177 |
178 | 0 -1 15 -2.3084910935722291e-004
179 |
180 | 2.3701831698417664e-001 -8.9803802967071533e-001
181 | <_>
182 |
183 | 0 -1 1 2.1151141263544559e-003
184 |
185 | 2.5339540839195251e-001 -9.8738276958465576e-001
186 | <_>
187 |
188 | 0 -1 23 5.9348781178414356e-006
189 |
190 | 2.0503005385398865e-001 -8.4154272079467773e-001
191 | <_>
192 |
193 | 0 -1 19 -1.4691188698634505e-004
194 |
195 | 2.4083861708641052e-001 -8.0533689260482788e-001
196 | <_>
197 |
198 | 0 -1 9 -8.8780790567398071e-002
199 |
200 | -9.5632034540176392e-001 1.6521719098091125e-001
201 |
202 | <_>
203 |
204 | <_>
205 | 0 0 18 2 -1.
206 | <_>
207 | 9 0 9 2 2.
208 | 0
209 | <_>
210 |
211 | <_>
212 | 0 0 20 2 -1.
213 | <_>
214 | 10 0 10 2 2.
215 | 0
216 | <_>
217 |
218 | <_>
219 | 0 0 28 28 -1.
220 | <_>
221 | 14 0 14 28 2.
222 | 0
223 | <_>
224 |
225 | <_>
226 | 0 0 21 10 -1.
227 | <_>
228 | 0 5 21 5 2.
229 | 0
230 | <_>
231 |
232 | <_>
233 | 0 6 14 17 -1.
234 | <_>
235 | 7 6 7 17 2.
236 | 0
237 | <_>
238 |
239 | <_>
240 | 0 6 27 17 -1.
241 | <_>
242 | 9 6 9 17 3.
243 | 0
244 | <_>
245 |
246 | <_>
247 | 0 27 27 1 -1.
248 | <_>
249 | 9 27 9 1 3.
250 | 0
251 | <_>
252 |
253 | <_>
254 | 1 6 27 18 -1.
255 | <_>
256 | 10 6 9 18 3.
257 | 0
258 | <_>
259 |
260 | <_>
261 | 2 0 26 23 -1.
262 | <_>
263 | 15 0 13 23 2.
264 | 0
265 | <_>
266 |
267 | <_>
268 | 6 0 22 28 -1.
269 | <_>
270 | 6 14 22 14 2.
271 | 0
272 | <_>
273 |
274 | <_>
275 | 7 0 2 6 -1.
276 | <_>
277 | 8 0 1 6 2.
278 | 0
279 | <_>
280 |
281 | <_>
282 | 8 0 20 2 -1.
283 | <_>
284 | 18 0 10 2 2.
285 | 0
286 | <_>
287 |
288 | <_>
289 | 10 20 16 8 -1.
290 | <_>
291 | 10 20 8 4 2.
292 | <_>
293 | 18 24 8 4 2.
294 | 0
295 | <_>
296 |
297 | <_>
298 | 10 20 10 8 -1.
299 | <_>
300 | 10 24 10 4 2.
301 | 0
302 | <_>
303 |
304 | <_>
305 | 11 0 11 10 -1.
306 | <_>
307 | 11 5 11 5 2.
308 | 0
309 | <_>
310 |
311 | <_>
312 | 11 22 5 6 -1.
313 | <_>
314 | 11 25 5 3 2.
315 | 0
316 | <_>
317 |
318 | <_>
319 | 14 5 14 18 -1.
320 | <_>
321 | 21 5 7 18 2.
322 | 0
323 | <_>
324 |
325 | <_>
326 | 21 22 6 4 -1.
327 | <_>
328 | 21 24 6 2 2.
329 | 0
330 | <_>
331 |
332 | <_>
333 | 22 12 6 4 -1.
334 | <_>
335 | 24 12 2 4 3.
336 | 0
337 | <_>
338 |
339 | <_>
340 | 23 2 4 10 -1.
341 | <_>
342 | 25 2 2 10 2.
343 | 0
344 | <_>
345 |
346 | <_>
347 | 23 5 2 8 -1.
348 | <_>
349 | 24 5 1 8 2.
350 | 0
351 | <_>
352 |
353 | <_>
354 | 23 8 2 8 -1.
355 | <_>
356 | 24 8 1 8 2.
357 | 0
358 | <_>
359 |
360 | <_>
361 | 23 14 2 10 -1.
362 | <_>
363 | 24 14 1 10 2.
364 | 0
365 | <_>
366 |
367 | <_>
368 | 24 0 4 8 -1.
369 | <_>
370 | 24 4 4 4 2.
371 | 0
372 | <_>
373 |
374 | <_>
375 | 24 0 4 10 -1.
376 | <_>
377 | 24 5 4 5 2.
378 | 0
379 | <_>
380 |
381 | <_>
382 | 24 16 4 12 -1.
383 | <_>
384 | 26 16 2 12 2.
385 | 0
386 |
387 |
--------------------------------------------------------------------------------
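The feature list at the end of `cascade.xml` describes each Haar feature as a set of weighted rectangles (`x y w h weight`). The sketch below shows, under simplifying assumptions, how such a feature can be evaluated with an integral image. It uses the first rectangle pair from the listing (`0 0 18 2 -1.` and `9 0 9 2 2.`). The helper names and the synthetic window are hypothetical, and the normalization that OpenCV applies to the detection window is deliberately left out.

```python
def integral_image(pixels):
    """Return an (h+1) x (w+1) summed-area table for a list of pixel rows."""
    h, w = len(pixels), len(pixels[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += pixels[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels inside the rectangle, using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_value(ii, rects):
    """Weighted sum of rectangle sums; rects holds (x, y, w, h, weight)."""
    return sum(wt * rect_sum(ii, x, y, w, h) for x, y, w, h, wt in rects)


# First feature in the listing above.
FEATURE_0 = [(0, 0, 18, 2, -1.0), (9, 0, 9, 2, 2.0)]

# Synthetic 28x28 window: dark (0) left of column 9, bright (1) elsewhere.
window = [[1 if x >= 9 else 0 for x in range(28)] for _ in range(28)]
value = haar_value(integral_image(window), FEATURE_0)

# A uniform window gives no response for this feature.
flat = [[1] * 28 for _ in range(28)]
flat_value = haar_value(integral_image(flat), FEATURE_0)
```

The two rectangles cancel on a uniform patch and respond when the right half of the strip is brighter than the left, which is the edge pattern this feature encodes.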
/classifier/params.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | BOOST
5 | HAAR
6 | 28
7 | 28
8 |
9 | GAB
10 | 9.9500000476837158e-001
11 | 5.0000000000000000e-001
12 | 9.4999999999999996e-001
13 | 1
14 | 100
15 |
16 | 0
17 | 1
18 | BASIC
19 |
20 |
--------------------------------------------------------------------------------
/classifier/stage0.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | 2
5 | -1.8347458541393280e-001
6 |
7 | <_>
8 |
9 | 0 -1 8540 7.3850885033607483e-002
10 |
11 | -9.1880136728286743e-001 7.2553080320358276e-001
12 | <_>
13 |
14 | 0 -1 228204 -1.0428477823734283e-001
15 |
16 | 7.3532676696777344e-001 -8.7772518396377563e-001
17 |
18 |
--------------------------------------------------------------------------------
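Each stage file lists a stage threshold followed by weak classifiers, each with a feature index, a node threshold, and two leaf values. A minimal sketch of how one stage could be evaluated is given below, using the numbers from `stage0.xml`. It assumes the usual decision-stump convention (take the first leaf when the feature value is below the node threshold, otherwise the second) and ignores the window normalization OpenCV performs. The function name and the feature values passed in are made up for the example.

```python
# Stage 0, copied from stage0.xml:
# (feature index, node threshold, first leaf, second leaf)
STAGE0_STUMPS = [
    (8540, 7.3850885033607483e-02,
     -9.1880136728286743e-01, 7.2553080320358276e-01),
    (228204, -1.0428477823734283e-01,
     7.3532676696777344e-01, -8.7772518396377563e-01),
]
STAGE0_THRESHOLD = -1.8347458541393280e-01


def stage_passes(feature_values, stumps, stage_threshold):
    """Sum the leaf chosen by each stump and compare to the stage threshold.

    feature_values: mapping from feature index to its (assumed) value.
    Returns (passed, total).
    """
    total = 0.0
    for feat_idx, node_thr, first_leaf, second_leaf in stumps:
        if feature_values[feat_idx] < node_thr:
            total += first_leaf
        else:
            total += second_leaf
    return total >= stage_threshold, total


# Hypothetical feature values for two candidate windows.
accepted, score_a = stage_passes({8540: 0.1, 228204: -0.2},
                                 STAGE0_STUMPS, STAGE0_THRESHOLD)
rejected, score_b = stage_passes({8540: 0.0, 228204: 0.0},
                                 STAGE0_STUMPS, STAGE0_THRESHOLD)
```

A window has to pass every stage in order to count as a detection, so most non-digit windows can be discarded after only a few stump evaluations.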
/classifier/stage1.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | 4
5 | -1.8423573970794678e+000
6 |
7 | <_>
8 |
9 | 0 -1 189714 2.7671821415424347e-002
10 |
11 | -8.7964987754821777e-001 5.1012891530990601e-001
12 | <_>
13 |
14 | 0 -1 188081 -1.5674557536840439e-002
15 |
16 | 4.4505974650382996e-001 -8.3683210611343384e-001
17 | <_>
18 |
19 | 0 -1 291829 -1.1460572568466887e-004
20 |
21 | 3.2763624191284180e-001 -8.8302916288375854e-001
22 | <_>
23 |
24 | 0 -1 8687 3.4735631942749023e-001
25 |
26 | -5.2473807334899902e-001 6.4939862489700317e-001
27 |
28 |
--------------------------------------------------------------------------------
/classifier/stage2.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | 5
5 | -6.9036191701889038e-001
6 |
7 | <_>
8 |
9 | 0 -1 29435 3.8202914595603943e-001
10 |
11 | -6.2518489360809326e-001 5.9963268041610718e-001
12 | <_>
13 |
14 | 0 -1 291303 -1.1428447032812983e-004
15 |
16 | 3.5979631543159485e-001 -8.4728848934173584e-001
17 | <_>
18 |
19 | 0 -1 293629 -6.0800916799053084e-006
20 |
21 | -9.8068064451217651e-001 2.6229214668273926e-001
22 | <_>
23 |
24 | 0 -1 147239 -2.0428642164915800e-003
25 |
26 | -9.7884273529052734e-001 2.2761307656764984e-001
27 | <_>
28 |
29 | 0 -1 292668 -9.1749490820802748e-005
30 |
31 | 2.9220622777938843e-001 -8.3195126056671143e-001
32 |
33 |
--------------------------------------------------------------------------------
/classifier/stage3.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | 4
5 | -9.6762913465499878e-001
6 |
7 | <_>
8 |
9 | 0 -1 130883 6.8016932345926762e-005
10 |
11 | -6.9405460357666016e-001 4.2146533727645874e-001
12 | <_>
13 |
14 | 0 -1 21092 2.5270378682762384e-003
15 |
16 | 2.9075825214385986e-001 -9.3532639741897583e-001
17 | <_>
18 |
19 | 0 -1 295867 -5.9558178691077046e-006
20 |
21 | -8.4807384014129639e-001 3.0522018671035767e-001
22 | <_>
23 |
24 | 0 -1 295867 5.9294161474099383e-006
25 |
26 | 3.5653164982795715e-001 -9.9828380346298218e-001
27 |
28 |
--------------------------------------------------------------------------------
/classifier/stage4.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | 3
5 | -1.4790147542953491e-001
6 |
7 | <_>
8 |
9 | 0 -1 295867 -6.0298630160104949e-006
10 |
11 | -8.6544436216354370e-001 3.7327477335929871e-001
12 | <_>
13 |
14 | 0 -1 285506 -5.3857154853176326e-005
15 |
16 | 4.1779288649559021e-001 -6.8576753139495850e-001
17 | <_>
18 |
19 | 0 -1 21092 -7.4640125967562199e-004
20 |
21 | -9.8402875661849976e-001 2.9975003004074097e-001
22 |
23 |
--------------------------------------------------------------------------------
/classifier/stage5.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | 5
5 | -6.4234280586242676e-001
6 |
7 | <_>
8 |
9 | 0 -1 188053 2.6030194014310837e-002
10 |
11 | -5.9079927206039429e-001 5.5949991941452026e-001
12 | <_>
13 |
14 | 0 -1 288794 -1.1487156734801829e-004
15 |
16 | 3.5113725066184998e-001 -8.3308726549148560e-001
17 | <_>
18 |
19 | 0 -1 1308 4.8153925687074661e-002
20 |
21 | -7.1412664651870728e-001 3.0667838454246521e-001
22 | <_>
23 |
24 | 0 -1 1161 -5.9005141258239746e-002
25 |
26 | -9.4212603569030762e-001 2.5226300954818726e-001
27 | <_>
28 |
29 | 0 -1 42335 8.0452084541320801e-002
30 |
31 | 2.5081482529640198e-001 -9.6162217855453491e-001
32 |
33 |
--------------------------------------------------------------------------------
/classifier/stage6.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | 6
5 | -8.3931213617324829e-001
6 |
7 | <_>
8 |
9 | 0 -1 749 1.5571638869005255e-005
10 |
11 | -8.0575537681579590e-001 2.6608934998512268e-001
12 | <_>
13 |
14 | 0 -1 200828 -2.3084910935722291e-004
15 |
16 | 2.3701831698417664e-001 -8.9803802967071533e-001
17 | <_>
18 |
19 | 0 -1 841 2.1151141263544559e-003
20 |
21 | 2.5339540839195251e-001 -9.8738276958465576e-001
22 | <_>
23 |
24 | 0 -1 293627 5.9348781178414356e-006
25 |
26 | 2.0503005385398865e-001 -8.4154272079467773e-001
27 | <_>
28 |
29 | 0 -1 290785 -1.4691188698634505e-004
30 |
31 | 2.4083861708641052e-001 -8.0533689260482788e-001
32 | <_>
33 |
34 | 0 -1 115473 -8.8780790567398071e-002
35 |
36 | -9.5632034540176392e-001 1.6521719098091125e-001
37 |
38 |
--------------------------------------------------------------------------------
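The per-stage files refer to features by large indices (for example `8540` and `228204`), while `cascade.xml` uses small ones (`4` and `16`) for the same weak classifiers. For the values listed here, sorting the distinct indices used across all stages and renumbering them from zero reproduces the indices in `cascade.xml`. The sketch below checks that observation; whether OpenCV's training tool performs exactly this step internally is an assumption, and `compact_indices` is a hypothetical helper name.

```python
# Feature indices referenced by stage0.xml ... stage6.xml, in file order.
STAGE_FEATURES = [
    [8540, 228204],
    [189714, 188081, 291829, 8687],
    [29435, 291303, 293629, 147239, 292668],
    [130883, 21092, 295867, 295867],
    [295867, 285506, 21092],
    [188053, 288794, 1308, 1161, 42335],
    [749, 200828, 841, 293627, 290785, 115473],
]


def compact_indices(stage_features):
    """Map each distinct feature index to its rank in sorted order."""
    used = sorted({idx for stage in stage_features for idx in stage})
    return {old: new for new, old in enumerate(used)}


remap = compact_indices(STAGE_FEATURES)
remapped_stage0 = [remap[i] for i in STAGE_FEATURES[0]]
remapped_stage1 = [remap[i] for i in STAGE_FEATURES[1]]
```

The mapping has 26 entries, which matches the 26 features listed at the end of `cascade.xml`.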
/preview.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/preview.jpg
--------------------------------------------------------------------------------
/results/1-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/1-cv.jpg
--------------------------------------------------------------------------------
/results/1-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/1-sk.jpg
--------------------------------------------------------------------------------
/results/2-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/2-cv.jpg
--------------------------------------------------------------------------------
/results/2-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/2-sk.jpg
--------------------------------------------------------------------------------
/results/3-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/3-cv.jpg
--------------------------------------------------------------------------------
/results/3-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/3-sk.jpg
--------------------------------------------------------------------------------
/results/4-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/4-cv.jpg
--------------------------------------------------------------------------------
/results/4-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/4-sk.jpg
--------------------------------------------------------------------------------
/results/6-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/6-cv.jpg
--------------------------------------------------------------------------------
/results/6-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/6-sk.jpg
--------------------------------------------------------------------------------
/results/7-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/7-cv.jpg
--------------------------------------------------------------------------------
/results/7-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/7-sk.jpg
--------------------------------------------------------------------------------
/results/8-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/8-cv.jpg
--------------------------------------------------------------------------------
/results/8-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/8-sk.jpg
--------------------------------------------------------------------------------
/results/cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/cv.jpg
--------------------------------------------------------------------------------
/src/.gitignore:
--------------------------------------------------------------------------------
1 | *.jpg
--------------------------------------------------------------------------------
/src/detect.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import cv2
5 | from PIL import Image, ImageDraw
6 |
7 |
8 | def detect(im, xml):
9 |     digit_cascade = cv2.CascadeClassifier(xml)
10 |     digits = digit_cascade.detectMultiScale(im)
11 |     return digits
12 |
13 |
14 | def annotate_detection(im, regions, color=128):
15 |     clone = im.copy()
16 |     draw = ImageDraw.Draw(clone)
17 |     for (x, y, w, h) in regions:
18 |         draw.rectangle((x, y, x+w, y+h), outline=color)
19 |     return clone
20 |
21 |
22 | def crop_detection(im, regions):
23 |     return [im.crop((x, y, x+w, y+h)) for (x, y, w, h) in regions]
24 |
25 | if __name__ == '__main__':
26 |     img = cv2.imread('../test/7.jpg')
27 |     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
28 |     im = Image.open('../test/7.jpg')
29 |     digits = detect(gray, '../classifier/cascade.xml')
30 |     result = annotate_detection(im, digits)
31 |     result.show()
32 |
--------------------------------------------------------------------------------
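`detect.py` receives regions as `(x, y, w, h)` and converts them to the `(left, top, right, bottom)` boxes that Pillow's `crop` and `rectangle` expect. The sketch below isolates that conversion so it can be checked without OpenCV or Pillow installed. The helper names `to_box` and `box_area` and the sample regions are made up for illustration.

```python
def to_box(region):
    """Convert an (x, y, w, h) region to a (left, top, right, bottom) box."""
    x, y, w, h = region
    return (x, y, x + w, y + h)


def box_area(box):
    """Area in pixels of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


# Made-up detections in the same layout that detect() returns.
regions = [(10, 20, 28, 28), (50, 5, 14, 30)]
boxes = [to_box(r) for r in regions]
```

An empty list of regions simply yields an empty list of boxes, which corresponds to an image where the cascade finds no digits.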
/src/load_labels.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import struct
5 | import numpy as np
6 | from PIL import Image
7 | import argparse
8 |
9 | def get_labels(file):
10 |     magic, num = struct.unpack(">II", file.read(8))
11 |     if magic != 2049:
12 |         raise ValueError('Magic number mismatch, expected 2049,' +
13 |                          ' got %d' % magic)
14 |
15 |     return np.fromfile(file, dtype=np.int8), num
16 |
17 |
18 | def get_images(file):
19 |     magic, num, rows, cols = struct.unpack(">IIII", file.read(16))
20 |     if magic != 2051:
21 |         raise ValueError('Magic number mismatch, expected 2051,' +
22 |                          ' got %d' % magic)
23 |     images = np.fromfile(file, dtype=np.uint8).reshape(num, rows * cols)
24 |     return images, num, rows, cols
25 |
26 |
27 | def get_data(label_filename, image_filename):
28 |     with open(label_filename, 'rb') as label_file:
29 |         labels, num_labels = get_labels(label_file)
30 |
31 |     with open(image_filename, 'rb') as image_file:
32 |         images, num_images, rows, cols = get_images(image_file)
33 |
34 |     if num_labels != num_images:
35 |         print '[WARNING]: Number of images and labels mismatch'
36 |
37 |     return images, labels, num_labels, rows, cols
38 |
39 | if __name__ == '__main__':
40 |     parser = argparse.ArgumentParser()
41 |     parser.add_argument("label_file", type=str)
42 |     parser.add_argument("image_file", type=str)
43 |
44 |     args = parser.parse_args()
45 |
46 |     images, labels, num, rows, cols = get_data(args.label_file,
47 |                                                args.image_file)
48 |     print 'First:', labels[0]
49 |     Image.fromarray(images[0].reshape(rows, cols)).show()
50 |     print 'Last:', labels[-1]
51 |     Image.fromarray(images[-1].reshape(rows, cols)).show()
52 |     print 'Length', len(labels)
53 |
--------------------------------------------------------------------------------
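`load_labels.py` reads the MNIST IDX label format: an 8-byte big-endian header (magic number 2049, then the item count) followed by one byte per label. The sketch below parses the same layout from an in-memory stream using only the standard library, so the header logic can be tried without downloading MNIST. The function name `parse_idx_labels` and the three fake labels are assumptions for this example, and the length check is an extra that the original script does not perform.

```python
import io
import struct


def parse_idx_labels(stream):
    """Parse an IDX1 label stream and return the labels as a list of ints."""
    magic, count = struct.unpack(">II", stream.read(8))
    if magic != 2049:
        raise ValueError("Magic number mismatch, expected 2049, got %d" % magic)
    labels = list(bytearray(stream.read(count)))
    if len(labels) != count:
        raise ValueError("Expected %d labels, got %d" % (count, len(labels)))
    return labels


# Build a tiny fake label file: header plus three label bytes.
fake_file = io.BytesIO(struct.pack(">II", 2049, 3) + bytes(bytearray([5, 0, 4])))
labels = parse_idx_labels(fake_file)
```

The image file follows the same pattern with magic number 2051 and a 16-byte header that also carries the row and column counts.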
/src/main.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | from PIL import Image, ImageFont
5 | import cv2
6 | import numpy as np
7 |
8 | from detect import detect, crop_detection, annotate_detection
9 | from load_labels import get_data
10 | from recognize import cvtrain, sktrain, preprocess
11 | from recognize import annotate_recognition
12 | from glob import glob
13 | import os
14 |
15 | SAMPLE_SIZE = (28, 28)
16 | SZ = 28
17 | LABEL_FILE = '../MNIST/train-labels.idx1-ubyte'
18 | IMAGE_FILE = '../MNIST/train-images.idx3-ubyte'
19 | CASCADE_FILE = '../classifier/cascade.xml'
20 | TEST_FILES = '../test/'
21 | RESULT_FILES = '../results/'
22 |
23 | FONT_FILE = 'arial.ttf'
24 | FONT_SIZE = 30
25 | TEST_FONT = '5'
26 | TRAIN_SIZE = 10000
27 |
28 | bin_n = 16 # Number of bins
29 | svm_params = dict(kernel_type=cv2.SVM_LINEAR,
30 |                   svm_type=cv2.SVM_C_SVC,
31 |                   C=2.67, gamma=5.383)
32 |
33 | affine_flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR
34 |
35 |
36 | def main():
37 |     images, labels, num, rows, cols = get_data(LABEL_FILE,
38 |                                                IMAGE_FILE)
39 |     print 'Training OpenCV SVM...'
40 |     svc1 = cvtrain(images[:TRAIN_SIZE], labels[:TRAIN_SIZE], num, rows, cols)
41 |
42 |     print 'Training sklearn SVM...'
43 |     svc2 = sktrain(images[:TRAIN_SIZE], labels[:TRAIN_SIZE])
44 |
45 |     filenames = glob(TEST_FILES + "/*.jpg")
46 |     for filename in filenames:
47 |         print 'Processing', filename
48 |         img = cv2.imread(filename)
49 |         gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
50 |         im = Image.open(filename)
51 |         digits = detect(gray, CASCADE_FILE)
52 |         results = crop_detection(im.copy(), digits)
53 |         test = [np.float32(i.resize(SAMPLE_SIZE)).ravel() for i in results]
54 |
55 |         testdata = preprocess(test, rows, cols).reshape(-1, bin_n * 4)
56 |         yhat1 = svc1.predict_all(testdata)
57 |         yhat1 = yhat1.astype(np.uint8).ravel()
58 |         yhat2 = svc2.predict(test)
59 |
60 |         font = ImageFont.truetype(FONT_FILE, FONT_SIZE)
61 |         detected = annotate_detection(im.copy(), digits)
62 |
63 |         basename = os.path.basename(filename)
64 |         resultname = RESULT_FILES + '/' + basename
65 |
66 |         print 'OpenCV results'
67 |         recognized = annotate_recognition(detected, digits, yhat1, font)
68 |         recognized.show()
69 |         recognized.save(resultname.replace('.jpg', '-cv.jpg'))
70 |
71 |         print 'sklearn results'
72 |         recognized = annotate_recognition(detected, digits, yhat2, font)
73 |         recognized.show()
74 |         recognized.save(resultname.replace('.jpg', '-sk.jpg'))
75 |
76 |
77 | if __name__ == '__main__':
78 |     main()
79 |
--------------------------------------------------------------------------------
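`main.py` derives the output names by calling `resultname.replace('.jpg', '-cv.jpg')`, which would also rewrite a `.jpg` that happens to appear elsewhere in the path. One possible alternative, shown here purely as a sketch and not as the author's code, splits the extension off explicitly. The function name `result_path` is hypothetical.

```python
import os


def result_path(test_path, result_dir, suffix):
    """Build '<result_dir>/<stem><suffix><ext>' from a test image path."""
    stem, ext = os.path.splitext(os.path.basename(test_path))
    return os.path.join(result_dir, stem + suffix + ext)


cv_path = result_path('../test/7.jpg', '../results', '-cv')
sk_path = result_path('../test/7.jpg', '../results', '-sk')
```

With the paths used in `main.py` this yields `7-cv.jpg` and `7-sk.jpg` inside the results directory, the same names the original code produces.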
/src/recognize.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | from PIL import ImageDraw
5 |
6 | import cv2
7 | import numpy as np
8 | from sklearn import svm
9 |
10 | SAMPLE_SIZE = (28, 28)
11 | SZ = 28
12 | TEST_FONT = '5'
13 |
14 | bin_n = 16 # Number of bins
15 | svm_params = dict(kernel_type=cv2.SVM_LINEAR,
16 |                   svm_type=cv2.SVM_C_SVC)
17 |
18 | affine_flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR
19 |
20 |
21 | def deskew(img):
22 |     m = cv2.moments(img)
23 |     if abs(m['mu02']) < 1e-2:
24 |         return img.copy()
25 |     skew = m['mu11']/m['mu02']
26 |     M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
27 |     img = cv2.warpAffine(img, M, (SZ, SZ), flags=affine_flags)
28 |     return img
29 |
30 |
31 | def hog(img):
32 |     gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
33 |     gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
34 |     mag, ang = cv2.cartToPolar(gx, gy)
35 |     # quantize the gradient angles into bin_n (16) orientation bins
36 |     bins = np.int32(bin_n * ang / (2 * np.pi))
37 |     bin_cells = bins[:10, :10], bins[10:, :10], bins[:10, 10:], bins[10:, 10:]
38 |     mag_cells = mag[:10, :10], mag[10:, :10], mag[:10, 10:], mag[10:, 10:]
39 |     hists = [np.bincount(b.ravel(), m.ravel(), bin_n)
40 |              for b, m in zip(bin_cells, mag_cells)]
41 |     hist = np.hstack(hists)  # hist is a 64-element vector (4 cells x 16 bins)
42 |     return hist
43 |
44 |
45 | def cvtrain(images, labels, num, rows, cols):
46 |     svc = cv2.SVM()
47 |     traindata = preprocess(images, rows, cols)
48 |     responses = np.float32(labels[:, None])
49 |     svc.train(traindata, responses, params=svm_params)
50 |     return svc
51 |
52 |
53 | def sktrain(images, labels):
54 |     svc = svm.SVC(kernel='linear')
55 |     svc.fit(images, labels)
56 |     return svc
57 |
58 |
59 | def preprocess(images, rows, cols):
60 |     deskewed = [deskew(im.reshape(rows, cols)) for im in images]
61 |     hogdata = [hog(im) for im in deskewed]
62 |     return np.float32(hogdata).reshape(-1, 64)
63 |
64 |
65 | def get_font_size(font):
66 |     return max(font.getsize(TEST_FONT))
67 |
68 |
69 | def annotate_recognition(im, regions, labels, font, color=255):
70 |     clone = im.copy()
71 |     draw = ImageDraw.Draw(clone)
72 |     size = get_font_size(font)
73 |     for idx, (x, y, w, h) in enumerate(regions):
74 |         draw.text(
75 |             (x+w-size, y+h-size), str(labels[idx]), font=font, fill=color)
76 |     return clone
77 |
--------------------------------------------------------------------------------
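The `deskew` function in `recognize.py` estimates the slant of a digit as `mu11 / mu02`, the ratio of two central image moments, and then shears the image to cancel it. The sketch below recomputes that ratio in plain Python on tiny hand-made images so the quantity can be inspected without OpenCV. The function names and the 3x3 test images are made up for the example, and the shear itself (done with `cv2.warpAffine` in the project) is not reproduced.

```python
def central_moments(img):
    """Return (mu11, mu02) for an image given as a list of pixel rows."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    if m00 == 0:
        return 0.0, 0.0
    cx, cy = m10 / m00, m01 / m00  # intensity-weighted centroid
    mu11 = mu02 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            mu11 += (x - cx) * (y - cy) * v
            mu02 += (y - cy) ** 2 * v
    return mu11, mu02


def skew_of(img):
    """Slant estimate mu11 / mu02, or 0.0 when mu02 is close to zero."""
    mu11, mu02 = central_moments(img)
    if abs(mu02) < 1e-2:  # same guard as deskew() in recognize.py
        return 0.0
    return mu11 / mu02


diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
vertical = [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]]
diagonal_skew = skew_of(diagonal)
vertical_skew = skew_of(vertical)
```

An upright stroke gives a skew of zero, while a stroke that drifts one pixel sideways per row gives a skew of one, which is the shear factor `deskew` feeds into its affine matrix.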
/test/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/1.jpg
--------------------------------------------------------------------------------
/test/2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/2.jpg
--------------------------------------------------------------------------------
/test/3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/3.jpg
--------------------------------------------------------------------------------
/test/4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/4.jpg
--------------------------------------------------------------------------------
/test/6.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/6.jpg
--------------------------------------------------------------------------------
/test/7.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/7.jpg
--------------------------------------------------------------------------------
/test/8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/8.jpg
--------------------------------------------------------------------------------