├── .gitignore
├── README.md
├── classifier
│   ├── cascade.xml
│   ├── params.xml
│   ├── stage0.xml
│   ├── stage1.xml
│   ├── stage2.xml
│   ├── stage3.xml
│   ├── stage4.xml
│   ├── stage5.xml
│   └── stage6.xml
├── preview.jpg
├── results
│   ├── 1-cv.jpg
│   ├── 1-sk.jpg
│   ├── 2-cv.jpg
│   ├── 2-sk.jpg
│   ├── 3-cv.jpg
│   ├── 3-sk.jpg
│   ├── 4-cv.jpg
│   ├── 4-sk.jpg
│   ├── 6-cv.jpg
│   ├── 6-sk.jpg
│   ├── 7-cv.jpg
│   ├── 7-sk.jpg
│   ├── 8-cv.jpg
│   ├── 8-sk.jpg
│   └── cv.jpg
├── src
│   ├── .gitignore
│   ├── detect.py
│   ├── load_labels.py
│   ├── main.py
│   └── recognize.py
└── test
    ├── 1.jpg
    ├── 2.jpg
    ├── 3.jpg
    ├── 4.jpg
    ├── 6.jpg
    ├── 7.jpg
    └── 8.jpg


/.gitignore:
--------------------------------------------------------------------------------
asset2
MNIST
*.pyc


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Digit Detection & Recognition

### What is it?

Digit detection and recognition with AdaBoost and SVM.

![](preview.jpg)

### How it works

1. Train a cascade classifier for detection. The cascade classifier in `classifier/cascade.xml` was trained with 7000 positive samples and 9000 negative samples in 10 stages.
2. Train an SVM with the MNIST database.
3. Detect the digits in the image.
4. Scale each detected region to the size of the MNIST samples (28x28), then use the trained SVM to recognize (classify) the digit. For better results, deskew the images using their image moments first, then classify on their HOG descriptors.

### Dependencies

These scripts need Python 2.7 and the following libraries to work:

1. pillow (~2.8.1)
2. numpy (~1.9.0)
3. python-opencv (~2.4.11)
4. scikit-learn (~0.15.2)

The simplest way to install all of them is to install [python(x,y)](https://code.google.com/p/pythonxy/wiki/Downloads?tm=2).
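To see which of the libraries listed above are already importable, a small check along these lines may help. It is only a sketch: the import names (`cv2` for python-opencv, `PIL` for pillow, `sklearn` for scikit-learn) are the usual ones, and `installed_versions` is a hypothetical helper that is not part of this repository.

```python
import importlib

# Import names are assumptions: pillow imports as PIL, python-opencv as cv2,
# scikit-learn as sklearn.
REQUIRED = ("numpy", "cv2", "PIL", "sklearn")


def installed_versions(modules=REQUIRED):
    """Map each module name to its version string, or None if it is missing."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every package exposes __version__; fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions().items()):
        print("%-8s %s" % (name, version if version else "NOT INSTALLED"))
```

A module reported as `NOT INSTALLED` still has to be installed by one of the routes described in this section; the check does not compare against the version numbers listed above.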

If you can't install python(x,y), you can install Python, numpy and python-opencv separately, then install pip, pillow and scikit-learn.

1. Install Python. Use the installer from [Python's website](https://www.python.org/downloads/).
2. Install numpy. Use the installer from [scipy's website](http://www.scipy.org/scipylib/download.html). (You don't need scipy to run this project, so you can install numpy alone.)
3. Install python-opencv. Download the release from [its sourceforge site](http://sourceforge.net/projects/opencvlibrary/files/). (Choose the release for your operating system, then choose version 2.4.11.) The executable is just an archive. Extract the files, then copy `cv2.pyd` to the `lib/site-packages` folder under your Python installation path.
4. Install pip. Download [the script for installing pip](https://bootstrap.pypa.io/get-pip.py), open cmd (or a terminal if you are using Linux/Mac OS X), go to the directory that contains the downloaded script, and run `python get-pip.py`.
5. Install pillow. Run `pip install pillow`.
6. Install scikit-learn. Run `pip install scikit-learn`.

If you are running the code under Linux/Mac OS X and the scripts throw `AttributeError: __float__`, make sure your pillow has JPEG support (consult [Pillow's documentation](http://pillow.readthedocs.org/en/latest/installation.html)), e.g. try:

```
sudo apt-get install libjpeg-dev
sudo pip uninstall pillow
sudo pip install pillow
```

If you have any problems installing the dependencies, contact the author.

### How to generate the results

Enter the `src` directory and run

```
python main.py
```

It uses the images (`.jpg` only) in the `test` directory to produce the results, which are written to the `results` directory. Results generated with OpenCV have `-cv` in their filenames and results generated with sklearn have `-sk` in their filenames.
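The naming scheme for the result files can be sketched as follows. `result_paths` is a hypothetical helper, not a function from this repository; `main.py` builds the same names with a string `replace` on `.jpg` instead.

```python
import os


def result_paths(test_path, results_dir):
    """Return the (OpenCV, sklearn) output paths for one test image."""
    stem, ext = os.path.splitext(os.path.basename(test_path))
    cv_path = os.path.join(results_dir, stem + "-cv" + ext)
    sk_path = os.path.join(results_dir, stem + "-sk" + ext)
    return cv_path, sk_path


if __name__ == "__main__":
    # Paths are relative to the src directory, as in main.py.
    for path in result_paths("../test/7.jpg", "../results"):
        print(path)
```

Splitting on the extension rather than replacing the substring `.jpg` keeps the suffix at the end of the name even if `.jpg` happens to appear elsewhere in the path.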

### Directory structure

```
.
├─ README.md
├─ doc                    (documentation, reports)
│   └── ...
├─ classifier             (OpenCV cascade classifier)
│   ├── cascade.xml       (the classifier parameter file)
│   └── ...
├─ MNIST                  (the MNIST database)
│   ├── train-images.idx3-ubyte
│   └── train-labels.idx1-ubyte
├─ test                   (test images)
│   └── ...
├─ results                (the results)
│   └── ...
└─ src                    (the Python source code)
    ├── detect.py         (detection code)
    ├── load_labels.py    (script to load MNIST data)
    ├── recognize.py      (recognition code)
    └── main.py           (generate the results)
```

### About

* [GitHub repository](https://github.com/joyeecheung/digit-detection-recognition)
* Author: Qiuyi Zhang
* Time: Jul. 2015


--------------------------------------------------------------------------------
/classifier/cascade.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | BOOST 5 | HAAR 6 | 28 7 | 28 8 | 9 | GAB 10 | 9.9500000476837158e-001 11 | 5.0000000000000000e-001 12 | 9.4999999999999996e-001 13 | 1 14 | 100 15 | 16 | 0 17 | 1 18 | BASIC 19 | 7 20 | 21 | 22 | <_> 23 | 2 24 | -1.8347458541393280e-001 25 | 26 | <_> 27 | 28 | 0 -1 4 7.3850885033607483e-002 29 | 30 | -9.1880136728286743e-001 7.2553080320358276e-001 31 | <_> 32 | 33 | 0 -1 16 -1.0428477823734283e-001 34 | 35 | 7.3532676696777344e-001 -8.7772518396377563e-001 36 | 37 | <_> 38 | 4 39 | -1.8423573970794678e+000 40 | 41 | <_> 42 | 43 | 0 -1 14 2.7671821415424347e-002 44 | 45 | -8.7964987754821777e-001 5.1012891530990601e-001 46 | <_> 47 | 48 | 0 -1 13 -1.5674557536840439e-002 49 | 50 | 4.4505974650382996e-001 -8.3683210611343384e-001 51 | <_> 52 | 53 | 0 -1 21 -1.1460572568466887e-004 54 | 55 | 3.2763624191284180e-001 -8.8302916288375854e-001 56 | <_> 57 | 58 | 0 -1 5 3.4735631942749023e-001 59 | 60 | -5.2473807334899902e-001 6.4939862489700317e-001 61 | 62 | <_> 63 | 5 
64 | -6.9036191701889038e-001 65 | 66 | <_> 67 | 68 | 0 -1 7 3.8202914595603943e-001 69 | 70 | -6.2518489360809326e-001 5.9963268041610718e-001 71 | <_> 72 | 73 | 0 -1 20 -1.1428447032812983e-004 74 | 75 | 3.5979631543159485e-001 -8.4728848934173584e-001 76 | <_> 77 | 78 | 0 -1 24 -6.0800916799053084e-006 79 | 80 | -9.8068064451217651e-001 2.6229214668273926e-001 81 | <_> 82 | 83 | 0 -1 11 -2.0428642164915800e-003 84 | 85 | -9.7884273529052734e-001 2.2761307656764984e-001 86 | <_> 87 | 88 | 0 -1 22 -9.1749490820802748e-005 89 | 90 | 2.9220622777938843e-001 -8.3195126056671143e-001 91 | 92 | <_> 93 | 4 94 | -9.6762913465499878e-001 95 | 96 | <_> 97 | 98 | 0 -1 10 6.8016932345926762e-005 99 | 100 | -6.9405460357666016e-001 4.2146533727645874e-001 101 | <_> 102 | 103 | 0 -1 6 2.5270378682762384e-003 104 | 105 | 2.9075825214385986e-001 -9.3532639741897583e-001 106 | <_> 107 | 108 | 0 -1 25 -5.9558178691077046e-006 109 | 110 | -8.4807384014129639e-001 3.0522018671035767e-001 111 | <_> 112 | 113 | 0 -1 25 5.9294161474099383e-006 114 | 115 | 3.5653164982795715e-001 -9.9828380346298218e-001 116 | 117 | <_> 118 | 3 119 | -1.4790147542953491e-001 120 | 121 | <_> 122 | 123 | 0 -1 25 -6.0298630160104949e-006 124 | 125 | -8.6544436216354370e-001 3.7327477335929871e-001 126 | <_> 127 | 128 | 0 -1 17 -5.3857154853176326e-005 129 | 130 | 4.1779288649559021e-001 -6.8576753139495850e-001 131 | <_> 132 | 133 | 0 -1 6 -7.4640125967562199e-004 134 | 135 | -9.8402875661849976e-001 2.9975003004074097e-001 136 | 137 | <_> 138 | 5 139 | -6.4234280586242676e-001 140 | 141 | <_> 142 | 143 | 0 -1 12 2.6030194014310837e-002 144 | 145 | -5.9079927206039429e-001 5.5949991941452026e-001 146 | <_> 147 | 148 | 0 -1 18 -1.1487156734801829e-004 149 | 150 | 3.5113725066184998e-001 -8.3308726549148560e-001 151 | <_> 152 | 153 | 0 -1 3 4.8153925687074661e-002 154 | 155 | -7.1412664651870728e-001 3.0667838454246521e-001 156 | <_> 157 | 158 | 0 -1 2 -5.9005141258239746e-002 159 | 160 | 
-9.4212603569030762e-001 2.5226300954818726e-001 161 | <_> 162 | 163 | 0 -1 8 8.0452084541320801e-002 164 | 165 | 2.5081482529640198e-001 -9.6162217855453491e-001 166 | 167 | <_> 168 | 6 169 | -8.3931213617324829e-001 170 | 171 | <_> 172 | 173 | 0 -1 0 1.5571638869005255e-005 174 | 175 | -8.0575537681579590e-001 2.6608934998512268e-001 176 | <_> 177 | 178 | 0 -1 15 -2.3084910935722291e-004 179 | 180 | 2.3701831698417664e-001 -8.9803802967071533e-001 181 | <_> 182 | 183 | 0 -1 1 2.1151141263544559e-003 184 | 185 | 2.5339540839195251e-001 -9.8738276958465576e-001 186 | <_> 187 | 188 | 0 -1 23 5.9348781178414356e-006 189 | 190 | 2.0503005385398865e-001 -8.4154272079467773e-001 191 | <_> 192 | 193 | 0 -1 19 -1.4691188698634505e-004 194 | 195 | 2.4083861708641052e-001 -8.0533689260482788e-001 196 | <_> 197 | 198 | 0 -1 9 -8.8780790567398071e-002 199 | 200 | -9.5632034540176392e-001 1.6521719098091125e-001 201 | 202 | <_> 203 | 204 | <_> 205 | 0 0 18 2 -1. 206 | <_> 207 | 9 0 9 2 2. 208 | 0 209 | <_> 210 | 211 | <_> 212 | 0 0 20 2 -1. 213 | <_> 214 | 10 0 10 2 2. 215 | 0 216 | <_> 217 | 218 | <_> 219 | 0 0 28 28 -1. 220 | <_> 221 | 14 0 14 28 2. 222 | 0 223 | <_> 224 | 225 | <_> 226 | 0 0 21 10 -1. 227 | <_> 228 | 0 5 21 5 2. 229 | 0 230 | <_> 231 | 232 | <_> 233 | 0 6 14 17 -1. 234 | <_> 235 | 7 6 7 17 2. 236 | 0 237 | <_> 238 | 239 | <_> 240 | 0 6 27 17 -1. 241 | <_> 242 | 9 6 9 17 3. 243 | 0 244 | <_> 245 | 246 | <_> 247 | 0 27 27 1 -1. 248 | <_> 249 | 9 27 9 1 3. 250 | 0 251 | <_> 252 | 253 | <_> 254 | 1 6 27 18 -1. 255 | <_> 256 | 10 6 9 18 3. 257 | 0 258 | <_> 259 | 260 | <_> 261 | 2 0 26 23 -1. 262 | <_> 263 | 15 0 13 23 2. 264 | 0 265 | <_> 266 | 267 | <_> 268 | 6 0 22 28 -1. 269 | <_> 270 | 6 14 22 14 2. 271 | 0 272 | <_> 273 | 274 | <_> 275 | 7 0 2 6 -1. 276 | <_> 277 | 8 0 1 6 2. 278 | 0 279 | <_> 280 | 281 | <_> 282 | 8 0 20 2 -1. 283 | <_> 284 | 18 0 10 2 2. 285 | 0 286 | <_> 287 | 288 | <_> 289 | 10 20 16 8 -1. 290 | <_> 291 | 10 20 8 4 2. 
292 | <_> 293 | 18 24 8 4 2. 294 | 0 295 | <_> 296 | 297 | <_> 298 | 10 20 10 8 -1. 299 | <_> 300 | 10 24 10 4 2. 301 | 0 302 | <_> 303 | 304 | <_> 305 | 11 0 11 10 -1. 306 | <_> 307 | 11 5 11 5 2. 308 | 0 309 | <_> 310 | 311 | <_> 312 | 11 22 5 6 -1. 313 | <_> 314 | 11 25 5 3 2. 315 | 0 316 | <_> 317 | 318 | <_> 319 | 14 5 14 18 -1. 320 | <_> 321 | 21 5 7 18 2. 322 | 0 323 | <_> 324 | 325 | <_> 326 | 21 22 6 4 -1. 327 | <_> 328 | 21 24 6 2 2. 329 | 0 330 | <_> 331 | 332 | <_> 333 | 22 12 6 4 -1. 334 | <_> 335 | 24 12 2 4 3. 336 | 0 337 | <_> 338 | 339 | <_> 340 | 23 2 4 10 -1. 341 | <_> 342 | 25 2 2 10 2. 343 | 0 344 | <_> 345 | 346 | <_> 347 | 23 5 2 8 -1. 348 | <_> 349 | 24 5 1 8 2. 350 | 0 351 | <_> 352 | 353 | <_> 354 | 23 8 2 8 -1. 355 | <_> 356 | 24 8 1 8 2. 357 | 0 358 | <_> 359 | 360 | <_> 361 | 23 14 2 10 -1. 362 | <_> 363 | 24 14 1 10 2. 364 | 0 365 | <_> 366 | 367 | <_> 368 | 24 0 4 8 -1. 369 | <_> 370 | 24 4 4 4 2. 371 | 0 372 | <_> 373 | 374 | <_> 375 | 24 0 4 10 -1. 376 | <_> 377 | 24 5 4 5 2. 378 | 0 379 | <_> 380 | 381 | <_> 382 | 24 16 4 12 -1. 383 | <_> 384 | 26 16 2 12 2. 
385 | 0 386 | 387 | -------------------------------------------------------------------------------- /classifier/params.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | BOOST 5 | HAAR 6 | 28 7 | 28 8 | 9 | GAB 10 | 9.9500000476837158e-001 11 | 5.0000000000000000e-001 12 | 9.4999999999999996e-001 13 | 1 14 | 100 15 | 16 | 0 17 | 1 18 | BASIC 19 | 20 | -------------------------------------------------------------------------------- /classifier/stage0.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 2 5 | -1.8347458541393280e-001 6 | 7 | <_> 8 | 9 | 0 -1 8540 7.3850885033607483e-002 10 | 11 | -9.1880136728286743e-001 7.2553080320358276e-001 12 | <_> 13 | 14 | 0 -1 228204 -1.0428477823734283e-001 15 | 16 | 7.3532676696777344e-001 -8.7772518396377563e-001 17 | 18 | -------------------------------------------------------------------------------- /classifier/stage1.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 4 5 | -1.8423573970794678e+000 6 | 7 | <_> 8 | 9 | 0 -1 189714 2.7671821415424347e-002 10 | 11 | -8.7964987754821777e-001 5.1012891530990601e-001 12 | <_> 13 | 14 | 0 -1 188081 -1.5674557536840439e-002 15 | 16 | 4.4505974650382996e-001 -8.3683210611343384e-001 17 | <_> 18 | 19 | 0 -1 291829 -1.1460572568466887e-004 20 | 21 | 3.2763624191284180e-001 -8.8302916288375854e-001 22 | <_> 23 | 24 | 0 -1 8687 3.4735631942749023e-001 25 | 26 | -5.2473807334899902e-001 6.4939862489700317e-001 27 | 28 | -------------------------------------------------------------------------------- /classifier/stage2.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 5 | -6.9036191701889038e-001 6 | 7 | <_> 8 | 9 | 0 -1 29435 3.8202914595603943e-001 10 | 11 | -6.2518489360809326e-001 5.9963268041610718e-001 12 | <_> 13 | 14 | 0 -1 291303 -1.1428447032812983e-004 15 | 16 | 
3.5979631543159485e-001 -8.4728848934173584e-001 17 | <_> 18 | 19 | 0 -1 293629 -6.0800916799053084e-006 20 | 21 | -9.8068064451217651e-001 2.6229214668273926e-001 22 | <_> 23 | 24 | 0 -1 147239 -2.0428642164915800e-003 25 | 26 | -9.7884273529052734e-001 2.2761307656764984e-001 27 | <_> 28 | 29 | 0 -1 292668 -9.1749490820802748e-005 30 | 31 | 2.9220622777938843e-001 -8.3195126056671143e-001 32 | 33 | -------------------------------------------------------------------------------- /classifier/stage3.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 4 5 | -9.6762913465499878e-001 6 | 7 | <_> 8 | 9 | 0 -1 130883 6.8016932345926762e-005 10 | 11 | -6.9405460357666016e-001 4.2146533727645874e-001 12 | <_> 13 | 14 | 0 -1 21092 2.5270378682762384e-003 15 | 16 | 2.9075825214385986e-001 -9.3532639741897583e-001 17 | <_> 18 | 19 | 0 -1 295867 -5.9558178691077046e-006 20 | 21 | -8.4807384014129639e-001 3.0522018671035767e-001 22 | <_> 23 | 24 | 0 -1 295867 5.9294161474099383e-006 25 | 26 | 3.5653164982795715e-001 -9.9828380346298218e-001 27 | 28 | -------------------------------------------------------------------------------- /classifier/stage4.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 3 5 | -1.4790147542953491e-001 6 | 7 | <_> 8 | 9 | 0 -1 295867 -6.0298630160104949e-006 10 | 11 | -8.6544436216354370e-001 3.7327477335929871e-001 12 | <_> 13 | 14 | 0 -1 285506 -5.3857154853176326e-005 15 | 16 | 4.1779288649559021e-001 -6.8576753139495850e-001 17 | <_> 18 | 19 | 0 -1 21092 -7.4640125967562199e-004 20 | 21 | -9.8402875661849976e-001 2.9975003004074097e-001 22 | 23 | -------------------------------------------------------------------------------- /classifier/stage5.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 5 | -6.4234280586242676e-001 6 | 7 | <_> 8 | 9 | 0 -1 188053 2.6030194014310837e-002 10 | 11 | 
-5.9079927206039429e-001 5.5949991941452026e-001 12 | <_> 13 | 14 | 0 -1 288794 -1.1487156734801829e-004 15 | 16 | 3.5113725066184998e-001 -8.3308726549148560e-001 17 | <_> 18 | 19 | 0 -1 1308 4.8153925687074661e-002 20 | 21 | -7.1412664651870728e-001 3.0667838454246521e-001 22 | <_> 23 | 24 | 0 -1 1161 -5.9005141258239746e-002 25 | 26 | -9.4212603569030762e-001 2.5226300954818726e-001 27 | <_> 28 | 29 | 0 -1 42335 8.0452084541320801e-002 30 | 31 | 2.5081482529640198e-001 -9.6162217855453491e-001 32 | 33 | -------------------------------------------------------------------------------- /classifier/stage6.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 6 5 | -8.3931213617324829e-001 6 | 7 | <_> 8 | 9 | 0 -1 749 1.5571638869005255e-005 10 | 11 | -8.0575537681579590e-001 2.6608934998512268e-001 12 | <_> 13 | 14 | 0 -1 200828 -2.3084910935722291e-004 15 | 16 | 2.3701831698417664e-001 -8.9803802967071533e-001 17 | <_> 18 | 19 | 0 -1 841 2.1151141263544559e-003 20 | 21 | 2.5339540839195251e-001 -9.8738276958465576e-001 22 | <_> 23 | 24 | 0 -1 293627 5.9348781178414356e-006 25 | 26 | 2.0503005385398865e-001 -8.4154272079467773e-001 27 | <_> 28 | 29 | 0 -1 290785 -1.4691188698634505e-004 30 | 31 | 2.4083861708641052e-001 -8.0533689260482788e-001 32 | <_> 33 | 34 | 0 -1 115473 -8.8780790567398071e-002 35 | 36 | -9.5632034540176392e-001 1.6521719098091125e-001 37 | 38 | -------------------------------------------------------------------------------- /preview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/preview.jpg -------------------------------------------------------------------------------- /results/1-cv.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/1-cv.jpg -------------------------------------------------------------------------------- /results/1-sk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/1-sk.jpg -------------------------------------------------------------------------------- /results/2-cv.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/2-cv.jpg -------------------------------------------------------------------------------- /results/2-sk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/2-sk.jpg -------------------------------------------------------------------------------- /results/3-cv.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/3-cv.jpg -------------------------------------------------------------------------------- /results/3-sk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/3-sk.jpg -------------------------------------------------------------------------------- /results/4-cv.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/4-cv.jpg -------------------------------------------------------------------------------- /results/4-sk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/4-sk.jpg -------------------------------------------------------------------------------- /results/6-cv.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/6-cv.jpg -------------------------------------------------------------------------------- /results/6-sk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/6-sk.jpg -------------------------------------------------------------------------------- /results/7-cv.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/7-cv.jpg -------------------------------------------------------------------------------- /results/7-sk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/7-sk.jpg -------------------------------------------------------------------------------- /results/8-cv.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/8-cv.jpg


--------------------------------------------------------------------------------
/results/8-sk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/8-sk.jpg


--------------------------------------------------------------------------------
/results/cv.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/results/cv.jpg


--------------------------------------------------------------------------------
/src/.gitignore:
--------------------------------------------------------------------------------
*.jpg


--------------------------------------------------------------------------------
/src/detect.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import cv2
from PIL import Image, ImageDraw


def detect(im, xml):
    """Run the cascade classifier stored in `xml` on a grayscale image.

    Returns the detected regions as (x, y, w, h) rectangles.
    """
    digit_cascade = cv2.CascadeClassifier(xml)
    digits = digit_cascade.detectMultiScale(im)
    return digits


def annotate_detection(im, regions, color=128):
    """Return a copy of the PIL image with each region outlined."""
    clone = im.copy()
    draw = ImageDraw.Draw(clone)
    for (x, y, w, h) in regions:
        draw.rectangle((x, y, x+w, y+h), outline=color)
    return clone


def crop_detection(im, regions):
    """Return one cropped PIL image per detected region."""
    return [im.crop((x, y, x+w, y+h)) for (x, y, w, h) in regions]


if __name__ == '__main__':
    # Paths follow the repository layout used by main.py
    # (run this script from the src directory).
    img = cv2.imread('../test/7.jpg')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    im = Image.open('../test/7.jpg')
    digits = detect(gray, '../classifier/cascade.xml')
    result = annotate_detection(im, digits)
    result.show()


--------------------------------------------------------------------------------
/src/load_labels.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import argparse
import struct

import numpy as np
from PIL import Image


def get_labels(file):
    """Read an MNIST label file: two big-endian uint32 header fields
    (magic number 2049, item count) followed by one byte per label."""
    magic, num = struct.unpack(">II", file.read(8))
    if magic != 2049:
        raise ValueError('Magic number mismatch, expected 2049,' +
                         ' got %d' % magic)

    return np.fromfile(file, dtype=np.int8), num


def get_images(file):
    """Read an MNIST image file: four big-endian uint32 header fields
    (magic number 2051, count, rows, cols) followed by the pixel bytes.
    Each image is returned as one flattened row of rows * cols values."""
    magic, num, rows, cols = struct.unpack(">IIII", file.read(16))
    if magic != 2051:
        raise ValueError('Magic number mismatch, expected 2051,' +
                         ' got %d' % magic)
    images = np.fromfile(file, dtype=np.uint8).reshape(num, rows * cols)
    return images, num, rows, cols


def get_data(label_filename, image_filename):
    with open(label_filename, 'rb') as label_file:
        labels, num_labels = get_labels(label_file)

    with open(image_filename, 'rb') as image_file:
        images, num_images, rows, cols = get_images(image_file)

    if num_labels != num_images:
        print '[WARNING]: Number of images and labels mismatch'

    return images, labels, num_labels, rows, cols


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("label_file", type=str)
    parser.add_argument("image_file", type=str)

    args = parser.parse_args()

    images, labels, num, rows, cols = get_data(args.label_file,
                                               args.image_file)
    # Show the first and last samples as a quick sanity check.
    print 'First:', labels[0]
    Image.fromarray(images[0].reshape(rows, cols)).show()
    print 'Last:', labels[-1]
    Image.fromarray(images[-1].reshape(rows, cols)).show()
    print 'Length', len(labels)


--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from glob import glob
import os

from PIL import Image, ImageFont
import cv2
import numpy as np

from detect import detect, crop_detection, annotate_detection
from load_labels import get_data
from recognize import cvtrain, sktrain, preprocess
from recognize import annotate_recognition

SAMPLE_SIZE = (28, 28)  # size of the MNIST samples
SZ = 28
LABEL_FILE = '../MNIST/train-labels.idx1-ubyte'
IMAGE_FILE = '../MNIST/train-images.idx3-ubyte'
CASCADE_FILE = '../classifier/cascade.xml'
TEST_FILES = '../test/'
RESULT_FILES = '../results/'

FONT_FILE = 'arial.ttf'
FONT_SIZE = 30
TEST_FONT = '5'
TRAIN_SIZE = 10000  # number of MNIST samples used for training

bin_n = 16  # Number of bins
# Note: cvtrain() uses the svm_params defined in recognize.py, not these.
svm_params = dict(kernel_type=cv2.SVM_LINEAR,
                  svm_type=cv2.SVM_C_SVC,
                  C=2.67, gamma=5.383)

affine_flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR


def main():
    images, labels, num, rows, cols = get_data(LABEL_FILE,
                                               IMAGE_FILE)
    print 'Training OpenCV SVM...'
    svc1 = cvtrain(images[:TRAIN_SIZE], labels[:TRAIN_SIZE], num, rows, cols)

    print 'Training sklearn SVM...'
    svc2 = sktrain(images[:TRAIN_SIZE], labels[:TRAIN_SIZE])

    filenames = glob(os.path.join(TEST_FILES, '*.jpg'))
    for filename in filenames:
        print 'Processing', filename
        img = cv2.imread(filename)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        im = Image.open(filename)
        digits = detect(gray, CASCADE_FILE)
        if len(digits) == 0:
            # Nothing to classify; the SVMs cannot predict on empty input.
            print 'No digits detected in', filename
            continue

        # Crop every detection and scale it to the MNIST sample size.
        results = crop_detection(im.copy(), digits)
        test = [np.float32(i.resize(SAMPLE_SIZE)).ravel() for i in results]

        # OpenCV SVM: deskewed HOG features; sklearn SVM: raw pixels.
        testdata = preprocess(test, rows, cols).reshape(-1, bin_n * 4)
        yhat1 = svc1.predict_all(testdata)
        yhat1 = yhat1.astype(np.uint8).ravel()
        yhat2 = svc2.predict(test)

        font = ImageFont.truetype(FONT_FILE, FONT_SIZE)
        detected = annotate_detection(im.copy(), digits)

        basename = os.path.basename(filename)
        resultname = os.path.join(RESULT_FILES, basename)

        print 'OpenCV results'
        recognized = annotate_recognition(detected, digits, yhat1, font)
        recognized.show()
        recognized.save(resultname.replace('.jpg', '-cv.jpg'))

        print 'sklearn results'
        recognized = annotate_recognition(detected, digits, yhat2, font)
        recognized.show()
        recognized.save(resultname.replace('.jpg', '-sk.jpg'))


if __name__ == '__main__':
    main()


--------------------------------------------------------------------------------
/src/recognize.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from PIL import ImageDraw

import cv2
import numpy as np
from sklearn import svm

SAMPLE_SIZE = (28, 28)
SZ = 28  # side length of a sample image
TEST_FONT = '5'  # reference glyph used to measure the font size

bin_n = 16  # Number of bins
svm_params = dict(kernel_type=cv2.SVM_LINEAR,
                  svm_type=cv2.SVM_C_SVC)

affine_flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR


def deskew(img):
    """Straighten a slanted digit using its second-order image moments."""
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        # Almost no vertical spread: skew is undefined, leave the image as is.
        return img.copy()
    skew = m['mu11']/m['mu02']
    M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
    img = cv2.warpAffine(img, M, (SZ, SZ), flags=affine_flags)
    return img


def hog(img):
    """Histogram of oriented gradients over four cells of the image."""
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    # quantize the gradient directions into bin_n (16) bins
    bins = np.int32(bin_n * ang / (2 * np.pi))
    # split the image into four cells at row/column 10
    bin_cells = bins[:10, :10], bins[10:, :10], bins[:10, 10:], bins[10:, 10:]
    mag_cells = mag[:10, :10], mag[10:, :10], mag[:10, 10:], mag[10:, 10:]
    # one magnitude-weighted histogram per cell
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n)
             for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)  # hist is a 64-element vector (4 cells x 16 bins)
    return hist


def cvtrain(images, labels, num, rows, cols):
    """Train an OpenCV SVM on deskewed HOG features."""
    svc = cv2.SVM()
    traindata = preprocess(images, rows, cols)
    responses = np.float32(labels[:, None])
    svc.train(traindata, responses, params=svm_params)
    return svc


def sktrain(images, labels):
    """Train a scikit-learn linear SVM on the raw pixel values."""
    svc = svm.SVC(kernel='linear')
    svc.fit(images, labels)
    return svc


def preprocess(images, rows, cols):
    """Deskew each flattened image and return its HOG descriptor."""
    deskewed = [deskew(im.reshape(rows, cols)) for im in images]
    hogdata = [hog(im) for im in deskewed]
    return np.float32(hogdata).reshape(-1, 64)


def get_font_size(font):
    return max(font.getsize(TEST_FONT))


def annotate_recognition(im, regions, labels, font, color=255):
    """Return a copy of the image with each region's predicted label
    drawn near the region's bottom-right corner."""
    clone = im.copy()
    draw = ImageDraw.Draw(clone)
    size = get_font_size(font)
    for idx, (x, y, w, h) in enumerate(regions):
        draw.text(
            (x+w-size, y+h-size), str(labels[idx]), font=font, fill=color)
    return clone


--------------------------------------------------------------------------------
/test/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/1.jpg -------------------------------------------------------------------------------- /test/2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/2.jpg -------------------------------------------------------------------------------- /test/3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/3.jpg -------------------------------------------------------------------------------- /test/4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/4.jpg -------------------------------------------------------------------------------- /test/6.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/6.jpg -------------------------------------------------------------------------------- /test/7.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/7.jpg -------------------------------------------------------------------------------- /test/8.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joyeecheung/digit-detection-recognition/c86e65c98f2e478499bc2360599a9bfdd32e6802/test/8.jpg 
--------------------------------------------------------------------------------