├── .gitignore
├── LICENSE
├── README.md
├── model
│   └── fmodelwts.h5
└── src
├── model.py
└── ocr.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *__pycache__
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Atharva Hudlikar
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [](https://forthebadge.com)
2 | [](https://forthebadge.com)
3 |
4 | [](https://github.com/Naereen/StrapDown.js/blob/master/LICENSE)
5 | [](https://www.linkedin.com/in/atharva-hudlikar/)
6 |
7 | # Optical Character Recognition
8 | This project reads the text present in an image and predicts the characters written in it.
9 |
10 | ## Process
11 | The code first divides the image into segments, each containing a single character. A pretrained model then predicts the character in each segment, and the predicted characters are concatenated into a single output string.
12 |
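In outline, the pipeline in `src/ocr.py` works like the sketch below. This is a condensed, illustrative version (the real code also pads, blurs, and size-filters the segments); `recognize` and `CLASSES` are illustrative names, not part of the actual API:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

CLASSES = list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')   # class index -> character
model = load_model('fmodelwts.h5')                       # CNN trained by src/model.py

def recognize(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)
    # each contour's bounding box is treated as one character segment
    contours = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
    chars = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        crop = cv2.resize(gray[y:y+h, x:x+w], (64, 64), interpolation=cv2.INTER_AREA)
        probs = model.predict(crop.reshape(1, 64, 64, 1) / 255.0)
        chars.append(CLASSES[int(np.argmax(probs))])      # most likely character
    return ''.join(chars)
```
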
13 | ## [Directory Tree](https://xiaoluoboding.github.io/repository-tree/)
14 | ```
15 | ├─ model
16 | │ └─ fmodelwts.h5
17 | ├─ src
18 | │ ├─ model.py
19 | │ └─ ocr.py
20 | ├─ .gitignore
21 | ├─ LICENSE
22 | └─ README.md
23 | ```
24 |
25 | ## Setting up the OCR
26 | Let's start by cloning the repository
27 | ```bash
28 | $ git clone https://github.com/Mastermind0100/Optical-Character-Recognizer.git
29 | $ cd Optical-Character-Recognizer
30 | ```
31 | Great! You are set up with the repository.
32 | Let's dive into it!
33 |
34 | ## How to Use the OCR
35 | 1. Copy the following files into the directory you are using for your project:
36 | * ocr.py
37 | * fmodelwts.h5
38 |
39 | 2. In your code, add the following lines:
40 | ```python
41 | import ocr
42 | ocr.predict(image)
43 | ```
44 |
45 | 3. This will print the text it detects in the image you pass to the 'predict' function.
46 |
47 | 4. If you want the function to return the predicted text instead of printing it, change Line 78 of 'ocr.py' to:
48 |
49 | ```python
50 | return final
51 | ```
52 | Your code then needs to capture the returned value in a variable, so the call from Step 2 becomes:
53 | ```python
54 | text = ocr.predict(image)
55 | ```
56 |
57 | * The 'image' you pass as the argument of the **predict** function is the array you get after loading the image with the [imread](https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/) function in [opencv](https://opencv-python-tutroals.readthedocs.io/en/latest/index.html), as shown in the usage sketch below. But you knew that, right?
58 |
59 | * Note that this is a relatively basic OCR: it does not detect spaces or segment the words in a sentence for you. While work on this is in progress, some image pre-processing (as in the sketch below) can make it work for your use case. Watch out for further updates!
60 |
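Putting the pieces together, a minimal usage sketch might look like this (the file name `sample.png` is a placeholder, and the commented-out pre-processing line is just one example of what can help):

```python
import cv2
import ocr   # needs ocr.py and fmodelwts.h5 in your working directory

# imread returns the image as a BGR numpy array (or None if the path is wrong)
image = cv2.imread('sample.png')

# optional pre-processing, e.g. upscaling small text before recognition
# image = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

ocr.predict(image)            # prints the detected text
# text = ocr.predict(image)   # use this form if you changed Line 78 of ocr.py to `return final`
```
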
61 | ## Want to train on your own Dataset?
62 |
63 | Go ahead! Fire up 'model.py' and use your own dataset. Hopefully the code is self-explanatory.
64 | P.S. The dataset I used was the [NIST](https://s3.amazonaws.com/nist-srd/SD19/by_class.zip) dataset. Download the 2nd Edition and have fun manually arranging the data :)
65 |
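For reference, `model.py` loads the data with Keras' `flow_from_directory`, which expects one sub-folder per class under the training and validation directories; class indices are assigned in alphanumeric order of the folder names, which is what keeps the 0-9, A-Z mapping in `arr_result` consistent. A hypothetical layout (the `nist_final` name comes from `model.py`; everything below it is up to you):

```
nist_final
├─ training
│  ├─ 0
│  ├─ 1
│  ├─ ...
│  └─ Z
└─ validation
   ├─ 0
   ├─ ...
   └─ Z
```
With roughly 72,000 training and 21,600 validation images at a batch size of 16, the hard-coded `steps_per_epoch = 4500` and `validation_steps = 1350` correspond to about one full pass over the data per epoch; adjust them if your dataset size differs.
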
66 | ## Output
67 | The Original photo looks like this:
68 |
69 | 
70 |
71 | Mid Processing Output:
72 |
73 | 
74 |
75 | Final Text Output (Spyder Console):
76 |
77 | 
78 |
79 | ## License
80 | [](http://badges.mit-license.org)
81 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
82 |
--------------------------------------------------------------------------------
/model/fmodelwts.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Mastermind0100/Optical-Character-Recognizer/66611a0ad6616bb55500eda772921c6f40ade9f9/model/fmodelwts.h5
--------------------------------------------------------------------------------
/src/model.py:
--------------------------------------------------------------------------------
1 | """
2 | @author: Atharva
3 | """
4 | # This script trains the CNN used by ocr.py to recognize typed characters (digits 0-9 and uppercase A-Z)
5 |
6 | import numpy as np
7 | import cv2
8 | from keras.models import load_model
9 | from keras.models import Sequential
10 | from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
11 | from keras.preprocessing import image
12 | from keras.preprocessing.image import ImageDataGenerator
13 | import scipy.fftpack
14 |
15 | trdata = 71999   # number of training images; steps_per_epoch below is roughly trdata / batch
16 | vltdata = 21600  # number of validation images; validation_steps below is vltdata / batch
17 | batch = 16
18 | #tst = cv2.inpaint(tst, thresh2,3, cv2.INPAINT_TELEA)
19 | arr_result = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
20 |
21 | training_data = 'nist_final/training'
22 | validation_data = 'nist_final/validation'
23 |
24 | model=Sequential()
25 | model.add(Conv2D(32,(3,3),input_shape=(64,64,1),activation='relu'))
26 | model.add(MaxPooling2D(pool_size=(2,2)))
27 | model.add(Conv2D(32,(3,3),activation='relu'))
28 | model.add(MaxPooling2D(pool_size=(2,2)))
29 | model.add(Flatten())
30 | model.add(Dense(units=128,activation='relu'))
31 | model.add(Dropout(0.5))
32 | model.add(Dense(units=36,activation='softmax'))  # softmax output over the 36 classes (0-9, A-Z)
33 | model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
34 |
35 | train_datagen=ImageDataGenerator(rescale = 1./255,
36 | shear_range = 0.2,
37 | zoom_range = 0.2,
38 | horizontal_flip = False)
39 |
40 | test_datagen=ImageDataGenerator(rescale = 1./255)
41 |
42 | training_set=train_datagen.flow_from_directory(directory = training_data,
43 | target_size = (64, 64),
44 | color_mode='grayscale',
45 | batch_size = batch,
46 | class_mode = 'sparse')
47 |
48 | test_set=test_datagen.flow_from_directory(directory = validation_data,
49 | target_size = (64, 64),
50 | color_mode='grayscale',
51 | batch_size = batch,
52 | class_mode = 'sparse')
53 |
54 | model.fit_generator(training_set,steps_per_epoch = 4500,  # on newer Keras/TF2, use model.fit(...), which accepts generators directly
55 | epochs = 15,
56 | validation_data = test_set,
57 | validation_steps = 1350)
58 |
59 | model.save('fmodelwts.h5')
--------------------------------------------------------------------------------
/src/ocr.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Thu Jun 6 16:40:46 2019
5 |
6 | @author: Atharva
7 | """
8 | import numpy as np
9 | import cv2
10 | from tensorflow.keras.models import load_model
11 | from tensorflow.keras.preprocessing import image
12 | from PIL import Image
13 |
14 | arr_out = []  # predicted characters are collected here (module-level, so results accumulate across calls to predict)
15 | arr_result = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
16 |
17 | model=load_model('fmodelwts.h5')  # pretrained CNN from model.py; expects 64x64 grayscale inputs
18 |
19 | def sortcnts(cnts): # to sort the contours left to right
20 |
21 | boundingBoxes = [cv2.boundingRect(c) for c in cnts]
22 | (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
23 | key=lambda b:b[1][0], reverse=False))
24 |
25 | return (cnts)
26 |
27 | def test(a,b,c,d,imd): # predict the character inside the bounding box (x=a, y=b, w=c, h=d) of the grayscale image imd
28 | test=imd[b:b+d,a:a+c]
29 | _,test_image = cv2.threshold(test,100,255,cv2.THRESH_BINARY)
30 | test_image= cv2.copyMakeBorder(test_image,10,10,10,10,cv2.BORDER_CONSTANT,value=(255,255,255))
31 | test_image = cv2.medianBlur(test_image.copy(),3)
32 | test_image = cv2.resize(test_image.copy(),(64,64),interpolation = cv2.INTER_AREA)
33 | t = test_image.copy()
34 | cv2.resize(test_image,(64,64))
35 | test_image=(image.img_to_array(test_image))/255
36 | test_image=np.expand_dims(test_image, axis = 0)
37 | result=model.predict(test_image)
38 | np.reshape(result, 36)
39 | high = np.amax(test_image)
40 | low = np.amin(test_image)
41 | if high != low:
42 | maxval = np.amax(result)
43 | index = np.where(result == maxval)
44 | arr_out.append(arr_result[index[1][0]])
45 |
46 | def predict(input_img): # input_img is a BGR image array, e.g. as returned by cv2.imread
47 | im = input_img.copy()
48 | img = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
49 |
50 | # Code for enhancing the image--------------------------------------------------
51 |
52 | blur = cv2.bilateralFilter(img.copy(),9,75,75)
53 | _, thresh = cv2.threshold(blur.copy(), 100, 255, cv2.THRESH_BINARY)
54 |
55 | contours, h = cv2.findContours(thresh.copy(),cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]  # works with both OpenCV 3.x and 4.x return signatures
56 |
57 | sum = 0
58 | maxar = 0
59 | for cnt in contours:
60 | x,y,w,h = cv2.boundingRect(cnt)
61 | sum += (w*h)
62 |
63 |
64 | avg = sum/len(contours)  # average bounding-box area (informational; the limits below are hard-coded)
65 | maxar = 10000  # only contours whose bounding-box area lies between minar and maxar are treated as characters
66 | minar = 1000
67 | for cnt in contours:
68 | x,y,w,h = cv2.boundingRect(cnt)
69 | if w*h < maxar and w*h > minar:
70 | test(x,y,w,h,img)
71 |
72 | final = ""
73 | i = 0
74 | for ch in reversed(arr_out):
75 | i += 1
76 | final = final+ch
77 |
78 | print('\n',final)
79 |
80 | cv2.waitKey()
81 | cv2.destroyAllWindows()
82 |
--------------------------------------------------------------------------------