├── .gitignore
├── README.md
├── classifier
    ├── cascade.xml
    ├── params.xml
    ├── stage0.xml
    ├── stage1.xml
    ├── stage2.xml
    ├── stage3.xml
    ├── stage4.xml
    ├── stage5.xml
    └── stage6.xml
├── preview.jpg
├── results
    ├── 1-cv.jpg
    ├── 1-sk.jpg
    ├── 2-cv.jpg
    ├── 2-sk.jpg
    ├── 3-cv.jpg
    ├── 3-sk.jpg
    ├── 4-cv.jpg
    ├── 4-sk.jpg
    ├── 6-cv.jpg
    ├── 6-sk.jpg
    ├── 7-cv.jpg
    ├── 7-sk.jpg
    ├── 8-cv.jpg
    ├── 8-sk.jpg
    └── cv.jpg
├── src
    ├── .gitignore
    ├── detect.py
    ├── load_labels.py
    ├── main.py
    └── recognize.py
└── test
    ├── 1.jpg
    ├── 2.jpg
    ├── 3.jpg
    ├── 4.jpg
    ├── 6.jpg
    ├── 7.jpg
    └── 8.jpg


/.gitignore:
--------------------------------------------------------------------------------
1 | asset2
2 | MNIST
3 | *.pyc


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## Digit Detection & Recognition
 2 | 
 3 | ### What is it?
 4 | 
 5 | Digit detection and recognition with AdaBoost and SVM.
 6 | 
 7 | ![](preview.jpg)
 8 | 
 9 | ### How it works
10 | 
11 | 1. Train a cascade classifier for detection. The cascade classifier in `classifier/cascade.xml` is trained with 7000 positive samples and 9000 negative samples in 10 stages.
 12 | 2. Train an SVM with the MNIST database.
13 | 3. Detect the digits in the image.
 14 | 4. For each detected region, scale it to the same size as the samples in MNIST, then use the trained SVM to recognize (classify) the digit. For better results, deskew each image using its image moments first, then use HOG descriptors as the features for classification.
15 | 
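The deskewing mentioned in step 4 can be sketched in plain Python. This is a minimal illustration, not the code in `src/recognize.py`: it assumes a grayscale image stored as a list of rows, and the helper names `central_moments` and `deskew` are hypothetical. It estimates the slant as `mu11 / mu02` and undoes it with a horizontal shear around the vertical midpoint, using nearest-neighbour sampling.

```python
def central_moments(img):
    """Return the central moments (mu11, mu02) of a grayscale image (list of rows)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    if m00 == 0:
        return 0.0, 0.0
    cx, cy = m10 / m00, m01 / m00
    mu11 = mu02 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            mu11 += (x - cx) * (y - cy) * v
            mu02 += (y - cy) * (y - cy) * v
    return mu11, mu02


def deskew(img):
    """Shear the image horizontally so that slanted strokes become upright."""
    h, w = len(img), len(img[0])
    mu11, mu02 = central_moments(img)
    if abs(mu02) < 1e-2:
        # Nothing to measure the slant from; return an unchanged copy.
        return [row[:] for row in img]
    skew = mu11 / mu02
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source column that lands on (x, y).
            sx = int(round(x + skew * (y - 0.5 * h)))
            if 0 <= sx < w:
                out[y][x] = img[y][sx]
    return out
```

With OpenCV available, the same idea is usually written with `cv2.moments` and `cv2.warpAffine`, which also interpolates between pixels.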
16 | ### Dependencies
17 | 
 18 | These scripts need Python 2.7+ and the following libraries to work:
19 | 
 20 | 1. pillow (~2.8.1)
 21 | 2. numpy (~1.9.0)
 22 | 3. python-opencv (~2.4.11)
23 | 4. scikit-learn (~0.15.2)
24 | The simplest way to install all of them is to install [python(x,y)](https://code.google.com/p/pythonxy/wiki/Downloads?tm=2).
25 | 
 26 | If you can't install python(x,y), you can install python, numpy and python-opencv separately, then install pip, pillow and scikit-learn.
27 | 
28 | 1. Install python. Just use the installer from [python's website](https://www.python.org/downloads/)
29 | 2. Install numpy. Just use the installer from [scipy's website](http://www.scipy.org/scipylib/download.html). (You don't need scipy to run this project, so you can just install numpy alone).
30 | 3. Install python-opencv. Download the release from [its sourceforge site](http://sourceforge.net/projects/opencvlibrary/files/). (Choose the release based on your operating system, then choose version 2.4.11). The executable is just an archive. Extract the files, then copy `cv2.pyd` to the `lib/site-packages` folder on your python installation path.
 31 | 4. Install pip. Download [the script for installing pip](https://bootstrap.pypa.io/get-pip.py), open cmd (or a terminal if you are using Linux/Mac OS X), go to the directory where the downloaded script resides, and run `python get-pip.py`.
 32 | 5. Install pillow. Run `pip install pillow`.
 33 | 6. Install scikit-learn. Run `pip install scikit-learn`.
34 | 
 35 | If you are running the code under Linux/Mac OS X and the scripts throw `AttributeError: __float__`, make sure your pillow has JPEG support (consult [Pillow's documentation](http://pillow.readthedocs.org/en/latest/installation.html)). For example, try:
36 | 
37 | ```
38 | sudo apt-get install libjpeg-dev
39 | sudo pip uninstall pillow
40 | sudo pip install pillow
41 | ```
42 | 
43 | If you have any problem installing the dependencies, contact the author.
44 | 
45 | ### How to generate the results
46 | 
 47 | Enter the `src` directory and run
48 | 
49 | ```
50 | python main.py
51 | ```
52 | 
 53 | It will use the images (`.jpg` only) under the `test` directory to produce the results. The results will show up in the `results` directory. Results generated with OpenCV will have `-cv` in their filenames, and results generated with sklearn will have `-sk` in their filenames.
54 | 
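The naming rule above can be illustrated with a small sketch. The helper `result_paths` and its `results_dir` parameter are hypothetical and are not taken from `main.py`; the sketch only assumes that each `.jpg` test image yields one `-cv` and one `-sk` result with the same base name.

```python
import os


def result_paths(image_path, results_dir="results"):
    """Map a test image path to its OpenCV and sklearn result paths.

    Returns None for files that are not `.jpg`, since only those are processed.
    """
    stem, ext = os.path.splitext(os.path.basename(image_path))
    if ext != ".jpg":
        return None
    return {
        "cv": os.path.join(results_dir, stem + "-cv.jpg"),
        "sk": os.path.join(results_dir, stem + "-sk.jpg"),
    }
```

For example, `result_paths("test/1.jpg")` points at `1-cv.jpg` and `1-sk.jpg` inside the results directory.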
55 | 
56 | ### Directory structure
57 | 
58 | ```
59 | .
60 | ├─ README.md
 61 | ├─ doc (documentation, reports)
62 | │   └── ...
63 | ├─ classifier (OpenCV cascade classifier)
64 | │   ├── cascade.xml (the classifier parameter file)
65 | │   └── ...
66 | ├─ MNIST (The MNIST database)
67 | │   ├── train-images.idx3-ubyte
68 | │   └── train-labels.idx1-ubyte
69 | ├─ test (test images)
70 | │   └── ...
71 | ├─ results (the results)
72 | │   └── ...
73 | └─ src (the python source code)
74 |     ├── detect.py (detection code)
75 |     ├── load_labels.py (script to load MNIST data)
76 |     ├── recognize.py (recognition code)
77 |     └── main.py (generate the results)
78 | ```
79 | 
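The `classifier/cascade.xml` file listed above is a plain OpenCV XML storage file, so its layout can be inspected with the standard library alone. The following is a rough sketch; the function name `summarize_cascade` is hypothetical, and it assumes the `<cascade>`, `<stageNum>`, `<stages>`, `<weakClassifiers>` and `<features>` elements seen in the checked-in file.

```python
import xml.etree.ElementTree as ET


def summarize_cascade(xml_text):
    """Return (declared stage count, weak classifiers per stage, feature count)."""
    root = ET.fromstring(xml_text)
    cascade = root.find("cascade")
    stage_num = int(cascade.findtext("stageNum"))
    # Each stage and each weak classifier is stored as an element named "_".
    weak_counts = [
        len(stage.find("weakClassifiers").findall("_"))
        for stage in cascade.find("stages").findall("_")
    ]
    n_features = len(cascade.find("features").findall("_"))
    return stage_num, weak_counts, n_features
```

To run it on the real file from the `src` directory, pass the file contents, e.g. `summarize_cascade(open("../classifier/cascade.xml").read())`.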
80 | ### About
81 | 
 82 | * [GitHub repository](https://github.com/joyeecheung/digit-detection-recognition)
83 | * Author: Qiuyi Zhang
84 | * Time: Jul. 2015


--------------------------------------------------------------------------------
/classifier/cascade.xml:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0"?>
  2 | <opencv_storage>
  3 | <cascade>
  4 |   <stageType>BOOST</stageType>
  5 |   <featureType>HAAR</featureType>
  6 |   <height>28</height>
  7 |   <width>28</width>
  8 |   <stageParams>
  9 |     <boostType>GAB</boostType>
 10 |     <minHitRate>9.9500000476837158e-001</minHitRate>
 11 |     <maxFalseAlarm>5.0000000000000000e-001</maxFalseAlarm>
 12 |     <weightTrimRate>9.4999999999999996e-001</weightTrimRate>
 13 |     <maxDepth>1</maxDepth>
 14 |     <maxWeakCount>100</maxWeakCount></stageParams>
 15 |   <featureParams>
 16 |     <maxCatCount>0</maxCatCount>
 17 |     <featSize>1</featSize>
 18 |     <mode>BASIC</mode></featureParams>
 19 |   <stageNum>7</stageNum>
 20 |   <stages>
 21 |     <!-- stage 0 -->
 22 |     <_>
 23 |       <maxWeakCount>2</maxWeakCount>
 24 |       <stageThreshold>-1.8347458541393280e-001</stageThreshold>
 25 |       <weakClassifiers>
 26 |         <_>
 27 |           <internalNodes>
 28 |             0 -1 4 7.3850885033607483e-002</internalNodes>
 29 |           <leafValues>
 30 |             -9.1880136728286743e-001 7.2553080320358276e-001</leafValues></_>
 31 |         <_>
 32 |           <internalNodes>
 33 |             0 -1 16 -1.0428477823734283e-001</internalNodes>
 34 |           <leafValues>
 35 |             7.3532676696777344e-001 -8.7772518396377563e-001</leafValues></_></weakClassifiers></_>
 36 |     <!-- stage 1 -->
 37 |     <_>
 38 |       <maxWeakCount>4</maxWeakCount>
 39 |       <stageThreshold>-1.8423573970794678e+000</stageThreshold>
 40 |       <weakClassifiers>
 41 |         <_>
 42 |           <internalNodes>
 43 |             0 -1 14 2.7671821415424347e-002</internalNodes>
 44 |           <leafValues>
 45 |             -8.7964987754821777e-001 5.1012891530990601e-001</leafValues></_>
 46 |         <_>
 47 |           <internalNodes>
 48 |             0 -1 13 -1.5674557536840439e-002</internalNodes>
 49 |           <leafValues>
 50 |             4.4505974650382996e-001 -8.3683210611343384e-001</leafValues></_>
 51 |         <_>
 52 |           <internalNodes>
 53 |             0 -1 21 -1.1460572568466887e-004</internalNodes>
 54 |           <leafValues>
 55 |             3.2763624191284180e-001 -8.8302916288375854e-001</leafValues></_>
 56 |         <_>
 57 |           <internalNodes>
 58 |             0 -1 5 3.4735631942749023e-001</internalNodes>
 59 |           <leafValues>
 60 |             -5.2473807334899902e-001 6.4939862489700317e-001</leafValues></_></weakClassifiers></_>
 61 |     <!-- stage 2 -->
 62 |     <_>
 63 |       <maxWeakCount>5</maxWeakCount>
 64 |       <stageThreshold>-6.9036191701889038e-001</stageThreshold>
 65 |       <weakClassifiers>
 66 |         <_>
 67 |           <internalNodes>
 68 |             0 -1 7 3.8202914595603943e-001</internalNodes>
 69 |           <leafValues>
 70 |             -6.2518489360809326e-001 5.9963268041610718e-001</leafValues></_>
 71 |         <_>
 72 |           <internalNodes>
 73 |             0 -1 20 -1.1428447032812983e-004</internalNodes>
 74 |           <leafValues>
 75 |             3.5979631543159485e-001 -8.4728848934173584e-001</leafValues></_>
 76 |         <_>
 77 |           <internalNodes>
 78 |             0 -1 24 -6.0800916799053084e-006</internalNodes>
 79 |           <leafValues>
 80 |             -9.8068064451217651e-001 2.6229214668273926e-001</leafValues></_>
 81 |         <_>
 82 |           <internalNodes>
 83 |             0 -1 11 -2.0428642164915800e-003</internalNodes>
 84 |           <leafValues>
 85 |             -9.7884273529052734e-001 2.2761307656764984e-001</leafValues></_>
 86 |         <_>
 87 |           <internalNodes>
 88 |             0 -1 22 -9.1749490820802748e-005</internalNodes>
 89 |           <leafValues>
 90 |             2.9220622777938843e-001 -8.3195126056671143e-001</leafValues></_></weakClassifiers></_>
 91 |     <!-- stage 3 -->
 92 |     <_>
 93 |       <maxWeakCount>4</maxWeakCount>
 94 |       <stageThreshold>-9.6762913465499878e-001</stageThreshold>
 95 |       <weakClassifiers>
 96 |         <_>
 97 |           <internalNodes>
 98 |             0 -1 10 6.8016932345926762e-005</internalNodes>
 99 |           <leafValues>
100 |             -6.9405460357666016e-001 4.2146533727645874e-001</leafValues></_>
101 |         <_>
102 |           <internalNodes>
103 |             0 -1 6 2.5270378682762384e-003</internalNodes>
104 |           <leafValues>
105 |             2.9075825214385986e-001 -9.3532639741897583e-001</leafValues></_>
106 |         <_>
107 |           <internalNodes>
108 |             0 -1 25 -5.9558178691077046e-006</internalNodes>
109 |           <leafValues>
110 |             -8.4807384014129639e-001 3.0522018671035767e-001</leafValues></_>
111 |         <_>
112 |           <internalNodes>
113 |             0 -1 25 5.9294161474099383e-006</internalNodes>
114 |           <leafValues>
115 |             3.5653164982795715e-001 -9.9828380346298218e-001</leafValues></_></weakClassifiers></_>
116 |     <!-- stage 4 -->
117 |     <_>
118 |       <maxWeakCount>3</maxWeakCount>
119 |       <stageThreshold>-1.4790147542953491e-001</stageThreshold>
120 |       <weakClassifiers>
121 |         <_>
122 |           <internalNodes>
123 |             0 -1 25 -6.0298630160104949e-006</internalNodes>
124 |           <leafValues>
125 |             -8.6544436216354370e-001 3.7327477335929871e-001</leafValues></_>
126 |         <_>
127 |           <internalNodes>
128 |             0 -1 17 -5.3857154853176326e-005</internalNodes>
129 |           <leafValues>
130 |             4.1779288649559021e-001 -6.8576753139495850e-001</leafValues></_>
131 |         <_>
132 |           <internalNodes>
133 |             0 -1 6 -7.4640125967562199e-004</internalNodes>
134 |           <leafValues>
135 |             -9.8402875661849976e-001 2.9975003004074097e-001</leafValues></_></weakClassifiers></_>
136 |     <!-- stage 5 -->
137 |     <_>
138 |       <maxWeakCount>5</maxWeakCount>
139 |       <stageThreshold>-6.4234280586242676e-001</stageThreshold>
140 |       <weakClassifiers>
141 |         <_>
142 |           <internalNodes>
143 |             0 -1 12 2.6030194014310837e-002</internalNodes>
144 |           <leafValues>
145 |             -5.9079927206039429e-001 5.5949991941452026e-001</leafValues></_>
146 |         <_>
147 |           <internalNodes>
148 |             0 -1 18 -1.1487156734801829e-004</internalNodes>
149 |           <leafValues>
150 |             3.5113725066184998e-001 -8.3308726549148560e-001</leafValues></_>
151 |         <_>
152 |           <internalNodes>
153 |             0 -1 3 4.8153925687074661e-002</internalNodes>
154 |           <leafValues>
155 |             -7.1412664651870728e-001 3.0667838454246521e-001</leafValues></_>
156 |         <_>
157 |           <internalNodes>
158 |             0 -1 2 -5.9005141258239746e-002</internalNodes>
159 |           <leafValues>
160 |             -9.4212603569030762e-001 2.5226300954818726e-001</leafValues></_>
161 |         <_>
162 |           <internalNodes>
163 |             0 -1 8 8.0452084541320801e-002</internalNodes>
164 |           <leafValues>
165 |             2.5081482529640198e-001 -9.6162217855453491e-001</leafValues></_></weakClassifiers></_>
166 |     <!-- stage 6 -->
167 |     <_>
168 |       <maxWeakCount>6</maxWeakCount>
169 |       <stageThreshold>-8.3931213617324829e-001</stageThreshold>
170 |       <weakClassifiers>
171 |         <_>
172 |           <internalNodes>
173 |             0 -1 0 1.5571638869005255e-005</internalNodes>
174 |           <leafValues>
175 |             -8.0575537681579590e-001 2.6608934998512268e-001</leafValues></_>
176 |         <_>
177 |           <internalNodes>
178 |             0 -1 15 -2.3084910935722291e-004</internalNodes>
179 |           <leafValues>
180 |             2.3701831698417664e-001 -8.9803802967071533e-001</leafValues></_>
181 |         <_>
182 |           <internalNodes>
183 |             0 -1 1 2.1151141263544559e-003</internalNodes>
184 |           <leafValues>
185 |             2.5339540839195251e-001 -9.8738276958465576e-001</leafValues></_>
186 |         <_>
187 |           <internalNodes>
188 |             0 -1 23 5.9348781178414356e-006</internalNodes>
189 |           <leafValues>
190 |             2.0503005385398865e-001 -8.4154272079467773e-001</leafValues></_>
191 |         <_>
192 |           <internalNodes>
193 |             0 -1 19 -1.4691188698634505e-004</internalNodes>
194 |           <leafValues>
195 |             2.4083861708641052e-001 -8.0533689260482788e-001</leafValues></_>
196 |         <_>
197 |           <internalNodes>
198 |             0 -1 9 -8.8780790567398071e-002</internalNodes>
199 |           <leafValues>
200 |             -9.5632034540176392e-001 1.6521719098091125e-001</leafValues></_></weakClassifiers></_></stages>
201 |   <features>
202 |     <_>
203 |       <rects>
204 |         <_>
205 |           0 0 18 2 -1.</_>
206 |         <_>
207 |           9 0 9 2 2.</_></rects>
208 |       <tilted>0</tilted></_>
209 |     <_>
210 |       <rects>
211 |         <_>
212 |           0 0 20 2 -1.</_>
213 |         <_>
214 |           10 0 10 2 2.</_></rects>
215 |       <tilted>0</tilted></_>
216 |     <_>
217 |       <rects>
218 |         <_>
219 |           0 0 28 28 -1.</_>
220 |         <_>
221 |           14 0 14 28 2.</_></rects>
222 |       <tilted>0</tilted></_>
223 |     <_>
224 |       <rects>
225 |         <_>
226 |           0 0 21 10 -1.</_>
227 |         <_>
228 |           0 5 21 5 2.</_></rects>
229 |       <tilted>0</tilted></_>
230 |     <_>
231 |       <rects>
232 |         <_>
233 |           0 6 14 17 -1.</_>
234 |         <_>
235 |           7 6 7 17 2.</_></rects>
236 |       <tilted>0</tilted></_>
237 |     <_>
238 |       <rects>
239 |         <_>
240 |           0 6 27 17 -1.</_>
241 |         <_>
242 |           9 6 9 17 3.</_></rects>
243 |       <tilted>0</tilted></_>
244 |     <_>
245 |       <rects>
246 |         <_>
247 |           0 27 27 1 -1.</_>
248 |         <_>
249 |           9 27 9 1 3.</_></rects>
250 |       <tilted>0</tilted></_>
251 |     <_>
252 |       <rects>
253 |         <_>
254 |           1 6 27 18 -1.</_>
255 |         <_>
256 |           10 6 9 18 3.</_></rects>
257 |       <tilted>0</tilted></_>
258 |     <_>
259 |       <rects>
260 |         <_>
261 |           2 0 26 23 -1.</_>
262 |         <_>
263 |           15 0 13 23 2.</_></rects>
264 |       <tilted>0</tilted></_>
265 |     <_>
266 |       <rects>
267 |         <_>
268 |           6 0 22 28 -1.</_>
269 |         <_>
270 |           6 14 22 14 2.</_></rects>
271 |       <tilted>0</tilted></_>
272 |     <_>
273 |       <rects>
274 |         <_>
275 |           7 0 2 6 -1.</_>
276 |         <_>
277 |           8 0 1 6 2.</_></rects>
278 |       <tilted>0</tilted></_>
279 |     <_>
280 |       <rects>
281 |         <_>
282 |           8 0 20 2 -1.</_>
283 |         <_>
284 |           18 0 10 2 2.</_></rects>
285 |       <tilted>0</tilted></_>
286 |     <_>
287 |       <rects>
288 |         <_>
289 |           10 20 16 8 -1.</_>
290 |         <_>
291 |           10 20 8 4 2.</_>
292 |         <_>
293 |           18 24 8 4 2.</_></rects>
294 |       <tilted>0</tilted></_>
295 |     <_>
296 |       <rects>
297 |         <_>
298 |           10 20 10 8 -1.</_>
299 |         <_>
300 |           10 24 10 4 2.</_></rects>
301 |       <tilted>0</tilted></_>
302 |     <_>
303 |       <rects>
304 |         <_>
305 |           11 0 11 10 -1.</_>
306 |         <_>
307 |           11 5 11 5 2.</_></rects>
308 |       <tilted>0</tilted></_>
309 |     <_>
310 |       <rects>
311 |         <_>
312 |           11 22 5 6 -1.</_>
313 |         <_>
314 |           11 25 5 3 2.</_></rects>
315 |       <tilted>0</tilted></_>
316 |     <_>
317 |       <rects>
318 |         <_>
319 |           14 5 14 18 -1.</_>
320 |         <_>
321 |           21 5 7 18 2.</_></rects>
322 |       <tilted>0</tilted></_>
323 |     <_>
324 |       <rects>
325 |         <_>
326 |           21 22 6 4 -1.</_>
327 |         <_>
328 |           21 24 6 2 2.</_></rects>
329 |       <tilted>0</tilted></_>
330 |     <_>
331 |       <rects>
332 |         <_>
333 |           22 12 6 4 -1.</_>
334 |         <_>
335 |           24 12 2 4 3.</_></rects>
336 |       <tilted>0</tilted></_>
337 |     <_>
338 |       <rects>
339 |         <_>
340 |           23 2 4 10 -1.</_>
341 |         <_>
342 |           25 2 2 10 2.</_></rects>
343 |       <tilted>0</tilted></_>
344 |     <_>
345 |       <rects>
346 |         <_>
347 |           23 5 2 8 -1.</_>
348 |         <_>
349 |           24 5 1 8 2.</_></rects>
350 |       <tilted>0</tilted></_>
351 |     <_>
352 |       <rects>
353 |         <_>
354 |           23 8 2 8 -1.</_>
355 |         <_>
356 |           24 8 1 8 2.</_></rects>
357 |       <tilted>0</tilted></_>
358 |     <_>
359 |       <rects>
360 |         <_>
361 |           23 14 2 10 -1.</_>
362 |         <_>
363 |           24 14 1 10 2.</_></rects>
364 |       <tilted>0</tilted></_>
365 |     <_>
366 |       <rects>
367 |         <_>
368 |           24 0 4 8 -1.</_>
369 |         <_>
370 |           24 4 4 4 2.</_></rects>
371 |       <tilted>0</tilted></_>
372 |     <_>
373 |       <rects>
374 |         <_>
375 |           24 0 4 10 -1.</_>
376 |         <_>
377 |           24 5 4 5 2.</_></rects>
378 |       <tilted>0</tilted></_>
379 |     <_>
380 |       <rects>
381 |         <_>
382 |           24 16 4 12 -1.</_>
383 |         <_>
384 |           26 16 2 12 2.</_></rects>
385 |       <tilted>0</tilted></_></features></cascade>
386 | </opencv_storage>
387 | 


--------------------------------------------------------------------------------
/classifier/params.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <params>
 4 |   <stageType>BOOST</stageType>
 5 |   <featureType>HAAR</featureType>
 6 |   <height>28</height>
 7 |   <width>28</width>
 8 |   <stageParams>
 9 |     <boostType>GAB</boostType>
10 |     <minHitRate>9.9500000476837158e-001</minHitRate>
11 |     <maxFalseAlarm>5.0000000000000000e-001</maxFalseAlarm>
12 |     <weightTrimRate>9.4999999999999996e-001</weightTrimRate>
13 |     <maxDepth>1</maxDepth>
14 |     <maxWeakCount>100</maxWeakCount></stageParams>
15 |   <featureParams>
16 |     <maxCatCount>0</maxCatCount>
17 |     <featSize>1</featSize>
18 |     <mode>BASIC</mode></featureParams></params>
19 | </opencv_storage>
20 | 


--------------------------------------------------------------------------------
/classifier/stage0.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <stage0>
 4 |   <maxWeakCount>2</maxWeakCount>
 5 |   <stageThreshold>-1.8347458541393280e-001</stageThreshold>
 6 |   <weakClassifiers>
 7 |     <_>
 8 |       <internalNodes>
 9 |         0 -1 8540 7.3850885033607483e-002</internalNodes>
10 |       <leafValues>
11 |         -9.1880136728286743e-001 7.2553080320358276e-001</leafValues></_>
12 |     <_>
13 |       <internalNodes>
14 |         0 -1 228204 -1.0428477823734283e-001</internalNodes>
15 |       <leafValues>
16 |         7.3532676696777344e-001 -8.7772518396377563e-001</leafValues></_></weakClassifiers></stage0>
17 | </opencv_storage>
18 | 


--------------------------------------------------------------------------------
/classifier/stage1.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <stage1>
 4 |   <maxWeakCount>4</maxWeakCount>
 5 |   <stageThreshold>-1.8423573970794678e+000</stageThreshold>
 6 |   <weakClassifiers>
 7 |     <_>
 8 |       <internalNodes>
 9 |         0 -1 189714 2.7671821415424347e-002</internalNodes>
10 |       <leafValues>
11 |         -8.7964987754821777e-001 5.1012891530990601e-001</leafValues></_>
12 |     <_>
13 |       <internalNodes>
14 |         0 -1 188081 -1.5674557536840439e-002</internalNodes>
15 |       <leafValues>
16 |         4.4505974650382996e-001 -8.3683210611343384e-001</leafValues></_>
17 |     <_>
18 |       <internalNodes>
19 |         0 -1 291829 -1.1460572568466887e-004</internalNodes>
20 |       <leafValues>
21 |         3.2763624191284180e-001 -8.8302916288375854e-001</leafValues></_>
22 |     <_>
23 |       <internalNodes>
24 |         0 -1 8687 3.4735631942749023e-001</internalNodes>
25 |       <leafValues>
26 |         -5.2473807334899902e-001 6.4939862489700317e-001</leafValues></_></weakClassifiers></stage1>
27 | </opencv_storage>
28 | 


--------------------------------------------------------------------------------
/classifier/stage2.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <stage2>
 4 |   <maxWeakCount>5</maxWeakCount>
 5 |   <stageThreshold>-6.9036191701889038e-001</stageThreshold>
 6 |   <weakClassifiers>
 7 |     <_>
 8 |       <internalNodes>
 9 |         0 -1 29435 3.8202914595603943e-001</internalNodes>
10 |       <leafValues>
11 |         -6.2518489360809326e-001 5.9963268041610718e-001</leafValues></_>
12 |     <_>
13 |       <internalNodes>
14 |         0 -1 291303 -1.1428447032812983e-004</internalNodes>
15 |       <leafValues>
16 |         3.5979631543159485e-001 -8.4728848934173584e-001</leafValues></_>
17 |     <_>
18 |       <internalNodes>
19 |         0 -1 293629 -6.0800916799053084e-006</internalNodes>
20 |       <leafValues>
21 |         -9.8068064451217651e-001 2.6229214668273926e-001</leafValues></_>
22 |     <_>
23 |       <internalNodes>
24 |         0 -1 147239 -2.0428642164915800e-003</internalNodes>
25 |       <leafValues>
26 |         -9.7884273529052734e-001 2.2761307656764984e-001</leafValues></_>
27 |     <_>
28 |       <internalNodes>
29 |         0 -1 292668 -9.1749490820802748e-005</internalNodes>
30 |       <leafValues>
31 |         2.9220622777938843e-001 -8.3195126056671143e-001</leafValues></_></weakClassifiers></stage2>
32 | </opencv_storage>
33 | 


--------------------------------------------------------------------------------
/classifier/stage3.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <stage3>
 4 |   <maxWeakCount>4</maxWeakCount>
 5 |   <stageThreshold>-9.6762913465499878e-001</stageThreshold>
 6 |   <weakClassifiers>
 7 |     <_>
 8 |       <internalNodes>
 9 |         0 -1 130883 6.8016932345926762e-005</internalNodes>
10 |       <leafValues>
11 |         -6.9405460357666016e-001 4.2146533727645874e-001</leafValues></_>
12 |     <_>
13 |       <internalNodes>
14 |         0 -1 21092 2.5270378682762384e-003</internalNodes>
15 |       <leafValues>
16 |         2.9075825214385986e-001 -9.3532639741897583e-001</leafValues></_>
17 |     <_>
18 |       <internalNodes>
19 |         0 -1 295867 -5.9558178691077046e-006</internalNodes>
20 |       <leafValues>
21 |         -8.4807384014129639e-001 3.0522018671035767e-001</leafValues></_>
22 |     <_>
23 |       <internalNodes>
24 |         0 -1 295867 5.9294161474099383e-006</internalNodes>
25 |       <leafValues>
26 |         3.5653164982795715e-001 -9.9828380346298218e-001</leafValues></_></weakClassifiers></stage3>
27 | </opencv_storage>
28 | 


--------------------------------------------------------------------------------
/classifier/stage4.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <stage4>
 4 |   <maxWeakCount>3</maxWeakCount>
 5 |   <stageThreshold>-1.4790147542953491e-001</stageThreshold>
 6 |   <weakClassifiers>
 7 |     <_>
 8 |       <internalNodes>
 9 |         0 -1 295867 -6.0298630160104949e-006</internalNodes>
10 |       <leafValues>
11 |         -8.6544436216354370e-001 3.7327477335929871e-001</leafValues></_>
12 |     <_>
13 |       <internalNodes>
14 |         0 -1 285506 -5.3857154853176326e-005</internalNodes>
15 |       <leafValues>
16 |         4.1779288649559021e-001 -6.8576753139495850e-001</leafValues></_>
17 |     <_>
18 |       <internalNodes>
19 |         0 -1 21092 -7.4640125967562199e-004</internalNodes>
20 |       <leafValues>
21 |         -9.8402875661849976e-001 2.9975003004074097e-001</leafValues></_></weakClassifiers></stage4>
22 | </opencv_storage>
23 | 


--------------------------------------------------------------------------------
/classifier/stage5.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <stage5>
 4 |   <maxWeakCount>5</maxWeakCount>
 5 |   <stageThreshold>-6.4234280586242676e-001</stageThreshold>
 6 |   <weakClassifiers>
 7 |     <_>
 8 |       <internalNodes>
 9 |         0 -1 188053 2.6030194014310837e-002</internalNodes>
10 |       <leafValues>
11 |         -5.9079927206039429e-001 5.5949991941452026e-001</leafValues></_>
12 |     <_>
13 |       <internalNodes>
14 |         0 -1 288794 -1.1487156734801829e-004</internalNodes>
15 |       <leafValues>
16 |         3.5113725066184998e-001 -8.3308726549148560e-001</leafValues></_>
17 |     <_>
18 |       <internalNodes>
19 |         0 -1 1308 4.8153925687074661e-002</internalNodes>
20 |       <leafValues>
21 |         -7.1412664651870728e-001 3.0667838454246521e-001</leafValues></_>
22 |     <_>
23 |       <internalNodes>
24 |         0 -1 1161 -5.9005141258239746e-002</internalNodes>
25 |       <leafValues>
26 |         -9.4212603569030762e-001 2.5226300954818726e-001</leafValues></_>
27 |     <_>
28 |       <internalNodes>
29 |         0 -1 42335 8.0452084541320801e-002</internalNodes>
30 |       <leafValues>
31 |         2.5081482529640198e-001 -9.6162217855453491e-001</leafValues></_></weakClassifiers></stage5>
32 | </opencv_storage>
33 | 


--------------------------------------------------------------------------------
/classifier/stage6.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <opencv_storage>
 3 | <stage6>
 4 |   <maxWeakCount>6</maxWeakCount>
 5 |   <stageThreshold>-8.3931213617324829e-001</stageThreshold>
 6 |   <weakClassifiers>
 7 |     <_>
 8 |       <internalNodes>
 9 |         0 -1 749 1.5571638869005255e-005</internalNodes>
10 |       <leafValues>
11 |         -8.0575537681579590e-001 2.6608934998512268e-001</leafValues></_>
12 |     <_>
13 |       <internalNodes>
14 |         0 -1 200828 -2.3084910935722291e-004</internalNodes>
15 |       <leafValues>
16 |         2.3701831698417664e-001 -8.9803802967071533e-001</leafValues></_>
17 |     <_>
18 |       <internalNodes>
19 |         0 -1 841 2.1151141263544559e-003</internalNodes>
20 |       <leafValues>
21 |         2.5339540839195251e-001 -9.8738276958465576e-001</leafValues></_>
22 |     <_>
23 |       <internalNodes>
24 |         0 -1 293627 5.9348781178414356e-006</internalNodes>
25 |       <leafValues>
26 |         2.0503005385398865e-001 -8.4154272079467773e-001</leafValues></_>
27 |     <_>
28 |       <internalNodes>
29 |         0 -1 290785 -1.4691188698634505e-004</internalNodes>
30 |       <leafValues>
31 |         2.4083861708641052e-001 -8.0533689260482788e-001</leafValues></_>
32 |     <_>
33 |       <internalNodes>
34 |         0 -1 115473 -8.8780790567398071e-002</internalNodes>
35 |       <leafValues>
36 |         -9.5632034540176392e-001 1.6521719098091125e-001</leafValues></_></weakClassifiers></stage6>
37 | </opencv_storage>
38 | 


--------------------------------------------------------------------------------
/preview.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/preview.jpg


--------------------------------------------------------------------------------
/results/1-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/1-cv.jpg


--------------------------------------------------------------------------------
/results/1-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/1-sk.jpg


--------------------------------------------------------------------------------
/results/2-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/2-cv.jpg


--------------------------------------------------------------------------------
/results/2-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/2-sk.jpg


--------------------------------------------------------------------------------
/results/3-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/3-cv.jpg


--------------------------------------------------------------------------------
/results/3-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/3-sk.jpg


--------------------------------------------------------------------------------
/results/4-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/4-cv.jpg


--------------------------------------------------------------------------------
/results/4-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/4-sk.jpg


--------------------------------------------------------------------------------
/results/6-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/6-cv.jpg


--------------------------------------------------------------------------------
/results/6-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/6-sk.jpg


--------------------------------------------------------------------------------
/results/7-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/7-cv.jpg


--------------------------------------------------------------------------------
/results/7-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/7-sk.jpg


--------------------------------------------------------------------------------
/results/8-cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/8-cv.jpg


--------------------------------------------------------------------------------
/results/8-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/8-sk.jpg


--------------------------------------------------------------------------------
/results/cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/cv.jpg


--------------------------------------------------------------------------------
/src/.gitignore:
--------------------------------------------------------------------------------
1 | *.jpg


--------------------------------------------------------------------------------
/src/detect.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | 
 4 | import cv2
 5 | from PIL import Image, ImageDraw
 6 | 
 7 | 
 8 | def detect(im, xml):
 9 |     digit_cascade = cv2.CascadeClassifier(xml)
10 |     digits = digit_cascade.detectMultiScale(im)
11 |     return digits
12 | 
13 | 
14 | def annotate_detection(im, regions, color=128):
15 |     clone = im.copy()
16 |     draw = ImageDraw.Draw(clone)
17 |     for (x, y, w, h) in regions:
18 |         draw.rectangle((x, y, x+w, y+h), outline=color)
19 |     return clone
20 | 
21 | 
22 | def crop_detection(im, regions):
23 |     return [im.crop((x, y, x+w, y+h)) for (x, y, w, h) in regions]
24 | 
25 | if __name__ == '__main__':
26 |     img = cv2.imread('../test/7.jpg')
27 |     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
28 |     im = Image.open('../test/7.jpg')
29 |     digits = detect(gray, '../classifier/cascade.xml')
30 |     result = annotate_detection(im, digits)
31 |     result.show()
32 | 


--------------------------------------------------------------------------------
/src/load_labels.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | 
 4 | import struct
 5 | import numpy as np
 6 | from PIL import Image
 7 | import argparse
 8 | 
 9 | def get_labels(file):
10 |     magic, num = struct.unpack(">II", file.read(8))
11 |     if magic != 2049:
12 |         raise ValueError('Magic number mismatch, expected 2049,' +
13 |                          ' got %d' % magic)
14 | 
15 |     return np.fromfile(file, dtype=np.uint8), num
16 | 
17 | 
18 | def get_images(file):
19 |     magic, num, rows, cols = struct.unpack(">IIII", file.read(16))
20 |     if magic != 2051:
21 |         raise ValueError('Magic number mismatch, expected 2051,' +
22 |                          ' got %d' % magic)
23 |     images = np.fromfile(file, dtype=np.uint8).reshape(num, rows * cols)
24 |     return images, num, rows, cols
25 | 
26 | 
27 | def get_data(label_filename, image_filename):
28 |     with open(label_filename, 'rb') as label_file:
29 |         labels, num_labels = get_labels(label_file)
30 | 
31 |     with open(image_filename, 'rb') as image_file:
32 |         images, num_images, rows, cols = get_images(image_file)
33 | 
34 |     if num_labels != num_images:
35 |         print '[WARNING]: Number of images and labels do not match'
36 | 
37 |     return images, labels, num_labels, rows, cols
38 | 
39 | if __name__ == '__main__':
40 |     parser = argparse.ArgumentParser()
41 |     parser.add_argument("label_file", type=str)
42 |     parser.add_argument("image_file", type=str)
43 | 
44 |     args = parser.parse_args()
45 | 
46 |     images, labels, num, rows, cols = get_data(args.label_file,
47 |                                                args.image_file)
48 |     print 'First:', labels[0]
49 |     Image.fromarray(images[0].reshape(rows, cols)).show()
50 |     print 'Last:', labels[-1]
51 |     Image.fromarray(images[-1].reshape(rows, cols)).show()
52 |     print 'Length', len(labels)
53 | 


--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | 
 4 | from PIL import Image, ImageFont
 5 | import cv2
 6 | import numpy as np
 7 | 
 8 | from detect import detect, crop_detection, annotate_detection
 9 | from load_labels import get_data
10 | from recognize import cvtrain, sktrain, preprocess
11 | from recognize import annotate_recognition
12 | from glob import glob
13 | import os
14 | 
15 | SAMPLE_SIZE = (28, 28)
16 | SZ = 28
17 | LABEL_FILE = '../MNIST/train-labels.idx1-ubyte'
18 | IMAGE_FILE = '../MNIST/train-images.idx3-ubyte'
19 | CASCADE_FILE = '../classifier/cascade.xml'
20 | TEST_FILES = '../test/'
21 | RESULT_FILES = '../results/'
22 | 
23 | FONT_FILE = 'arial.ttf'
24 | FONT_SIZE = 30
25 | TEST_FONT = '5'
26 | TRAIN_SIZE = 10000
27 | 
28 | bin_n = 16  # Number of HOG orientation bins per cell (as in recognize.py)
29 | svm_params = dict(kernel_type=cv2.SVM_LINEAR,
30 |                   svm_type=cv2.SVM_C_SVC,
31 |                   C=2.67, gamma=5.383)
32 | 
33 | affine_flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR
34 | 
35 | 
36 | def main():
37 |     images, labels, num, rows, cols = get_data(LABEL_FILE,
38 |                                                IMAGE_FILE)
39 |     print 'Training OpenCV SVM...'
40 |     svc1 = cvtrain(images[:TRAIN_SIZE], labels[:TRAIN_SIZE], num, rows, cols)
41 | 
42 |     print 'Training sklearn SVM...'
43 |     svc2 = sktrain(images[:TRAIN_SIZE], labels[:TRAIN_SIZE])
44 | 
45 |     filenames = glob(os.path.join(TEST_FILES, '*.jpg'))
46 |     for filename in filenames:
47 |         print 'Processing', filename
48 |         img = cv2.imread(filename)
49 |         gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
50 |         im = Image.open(filename)
51 |         digits = detect(gray, CASCADE_FILE)
52 |         results = crop_detection(im.copy(), digits)
53 |         test = [np.float32(i.convert('L').resize(SAMPLE_SIZE)).ravel() for i in results]
54 | 
55 |         testdata = preprocess(test, rows, cols).reshape(-1, bin_n * 4)
56 |         yhat1 = svc1.predict_all(testdata)
57 |         yhat1 = yhat1.astype(np.uint8).ravel()
58 |         yhat2 = svc2.predict(test)
59 | 
60 |         font = ImageFont.truetype(FONT_FILE, FONT_SIZE)
61 |         detected = annotate_detection(im.copy(), digits)
62 | 
63 |         basename = os.path.basename(filename)
64 |         resultname = os.path.join(RESULT_FILES, basename)
65 | 
66 |         print 'OpenCV results'
67 |         recognized = annotate_recognition(detected, digits, yhat1, font)
68 |         recognized.show()
69 |         recognized.save(resultname.replace('.jpg', '-cv.jpg'))
70 | 
71 |         print 'sklearn results'
72 |         recognized = annotate_recognition(detected, digits, yhat2, font)
73 |         recognized.show()
74 |         recognized.save(resultname.replace('.jpg', '-sk.jpg'))
75 | 
76 | 
77 | if __name__ == '__main__':
78 |     main()
79 | 


--------------------------------------------------------------------------------
/src/recognize.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | 
 4 | from PIL import ImageDraw
 5 | 
 6 | import cv2
 7 | import numpy as np
 8 | from sklearn import svm
 9 | 
10 | SAMPLE_SIZE = (28, 28)
11 | SZ = 28
12 | TEST_FONT = '5'
13 | 
14 | bin_n = 16  # Number of HOG orientation bins per cell
15 | svm_params = dict(kernel_type=cv2.SVM_LINEAR,
16 |                   svm_type=cv2.SVM_C_SVC)
17 | 
18 | affine_flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR
19 | 
20 | 
21 | def deskew(img):
22 |     m = cv2.moments(img)
23 |     if abs(m['mu02']) < 1e-2:
24 |         return img.copy()
25 |     skew = m['mu11']/m['mu02']  # shear estimated from second-order moments
26 |     M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
27 |     img = cv2.warpAffine(img, M, (SZ, SZ), flags=affine_flags)
28 |     return img
29 | 
30 | 
31 | def hog(img):
32 |     gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
33 |     gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
34 |     mag, ang = cv2.cartToPolar(gx, gy)
35 |     # quantize gradient angles into bin_n (16) integer bins, 0..15
36 |     bins = np.int32(bin_n * ang / (2 * np.pi))
37 |     bin_cells = bins[:10, :10], bins[10:, :10], bins[:10, 10:], bins[10:, 10:]
38 |     mag_cells = mag[:10, :10], mag[10:, :10], mag[:10, 10:], mag[10:, 10:]
39 |     hists = [np.bincount(b.ravel(), m.ravel(), bin_n)
40 |              for b, m in zip(bin_cells, mag_cells)]
41 |     hist = np.hstack(hists)     # 64-element vector: 4 cells x bin_n bins
42 |     return hist
43 | 
44 | 
45 | def cvtrain(images, labels, num, rows, cols):
46 |     svc = cv2.SVM()
47 |     traindata = preprocess(images, rows, cols)
48 |     responses = np.float32(labels[:, None])
49 |     svc.train(traindata, responses, params=svm_params)
50 |     return svc
51 | 
52 | 
53 | def sktrain(images, labels):
54 |     svc = svm.SVC(kernel='linear')
55 |     svc.fit(images, labels)
56 |     return svc
57 | 
58 | 
59 | def preprocess(images, rows, cols):
60 |     deskewed = [deskew(im.reshape(rows, cols)) for im in images]
61 |     hogdata = [hog(im) for im in deskewed]
62 |     return np.float32(hogdata).reshape(-1, 64)
63 | 
64 | 
65 | def get_font_size(font):
66 |     return max(font.getsize(TEST_FONT))
67 | 
68 | 
69 | def annotate_recognition(im, regions, labels, font, color=255):
70 |     clone = im.copy()
71 |     draw = ImageDraw.Draw(clone)
72 |     size = get_font_size(font)
73 |     for idx, (x, y, w, h) in enumerate(regions):
74 |         draw.text(
75 |             (x+w-size, y+h-size), str(labels[idx]), font=font, fill=color)
76 |     return clone
77 | 


--------------------------------------------------------------------------------
/test/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/1.jpg


--------------------------------------------------------------------------------
/test/2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/2.jpg


--------------------------------------------------------------------------------
/test/3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/3.jpg


--------------------------------------------------------------------------------
/test/4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/4.jpg


--------------------------------------------------------------------------------
/test/6.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/6.jpg


--------------------------------------------------------------------------------
/test/7.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/7.jpg


--------------------------------------------------------------------------------
/test/8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/8.jpg


--------------------------------------------------------------------------------