├── .gitignore
├── LICENSE
├── README.md
├── model
│   └── fmodelwts.h5
└── src
├── model.py
└── ocr.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *__pycache__
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Atharva Hudlikar
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [](https://forthebadge.com)
2 | [](https://forthebadge.com)
3 |
4 | [](https://github.com/Naereen/StrapDown.js/blob/master/LICENSE)
5 | [](https://www.linkedin.com/in/atharva-hudlikar/)
6 |
7 | # Optical Character Recognition
8 | This project reads the text present in an image and predicts the characters written in it.
9 |
10 | ## Process
11 | The code first divides the image into segments, each containing a single character. A pretrained model then predicts the character in each segment, and the predicted characters are concatenated into a single output string.
12 |
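In outline, the pipeline in `src/ocr.py` works like the sketch below. This is a condensed, illustrative version (the real code also pads, blurs, and size-filters the segments); `recognize` and `CLASSES` are illustrative names, not part of the actual API:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

CLASSES = list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')   # class index -> character
model = load_model('fmodelwts.h5')                       # CNN trained by src/model.py

def recognize(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)
    # each contour's bounding box is treated as one character segment
    contours = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
    chars = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        crop = cv2.resize(gray[y:y+h, x:x+w], (64, 64), interpolation=cv2.INTER_AREA)
        probs = model.predict(crop.reshape(1, 64, 64, 1) / 255.0)
        chars.append(CLASSES[int(np.argmax(probs))])      # most likely character
    return ''.join(chars)
```
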
13 | ## [Directory Tree](https://xiaoluoboding.github.io/repository-tree/)
14 | ```
15 | ├─ model
16 | │ └─ fmodelwts.h5
17 | ├─ src
18 | │ ├─ model.py
19 | │ └─ ocr.py
20 | ├─ .gitignore
21 | ├─ LICENSE
22 | └─ README.md
23 | ```
24 |
25 | ## Setting up the OCR
26 | Let's start by cloning the repository
27 | ```bash
28 | $ git clone https://github.com/Mastermind0100/Optical-Character-Recognizer.git
29 | $ cd Optical-Character-Recognizer
30 | ```
31 | Great! You are set up with the repository.
32 | Let's dive into it!
33 |
34 | ## How to Use the OCR
35 | 1. Copy the following files into the directory you are using for your project:
36 | * ocr.py
37 | * fmodelwts.h5
38 |
39 | 2. In your code, add the following lines:
40 | ```python
41 | import ocr
42 | ocr.predict(image)
43 | ```
44 |
45 | 3. This will print the text it detects in the image you pass to the 'predict' function.
46 |
47 | 4. If you want the function to return the predicted text instead of printing it, change Line 78 of 'ocr.py' to:
48 |
49 | ```python
50 | return final
51 | ```
52 | Your code then needs to capture the returned value in a variable, so the call from Step 2 becomes:
53 | ```python
54 | text = ocr.predict(image)
55 | ```
56 |
57 | * The 'image' you pass as the argument of the **predict** function is the array you get after loading the image with the [imread](https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/) function in [opencv](https://opencv-python-tutroals.readthedocs.io/en/latest/index.html), as shown in the usage sketch below. But you knew that, right?
58 |
59 | * Note that this is a relatively basic OCR: it does not detect spaces or segment the words in a sentence for you. While work on this is in progress, some image pre-processing (as in the sketch below) can make it work for your use case. Watch out for further updates!
60 |
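Putting the pieces together, a minimal usage sketch might look like this (the file name `sample.png` is a placeholder, and the commented-out pre-processing line is just one example of what can help):

```python
import cv2
import ocr   # needs ocr.py and fmodelwts.h5 in your working directory

# imread returns the image as a BGR numpy array (or None if the path is wrong)
image = cv2.imread('sample.png')

# optional pre-processing, e.g. upscaling small text before recognition
# image = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

ocr.predict(image)            # prints the detected text
# text = ocr.predict(image)   # use this form if you changed Line 78 of ocr.py to `return final`
```
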
61 | ## Want to train on your own Dataset?
62 |
63 | Go ahead! Fire up 'model.py' and use your own dataset. Hopefully the code is self-explanatory.
64 | P.S. The dataset I used was the [NIST](https://s3.amazonaws.com/nist-srd/SD19/by_class.zip) dataset. Download the 2nd Edition and have fun manually arranging the data :)
65 |
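For reference, `model.py` loads the data with Keras' `flow_from_directory`, which expects one sub-folder per class under the training and validation directories; class indices are assigned in alphanumeric order of the folder names, which is what keeps the 0-9, A-Z mapping in `arr_result` consistent. A hypothetical layout (the `nist_final` name comes from `model.py`; everything below it is up to you):

```
nist_final
├─ training
│  ├─ 0
│  ├─ 1
│  ├─ ...
│  └─ Z
└─ validation
   ├─ 0
   ├─ ...
   └─ Z
```
With roughly 72,000 training and 21,600 validation images at a batch size of 16, the hard-coded `steps_per_epoch = 4500` and `validation_steps = 1350` correspond to about one full pass over the data per epoch; adjust them if your dataset size differs.
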
66 | ## Output
67 | The Original photo looks like this:
68 |
69 | 
70 |
71 | Mid Processing Output:
72 |
73 | 
74 |
75 | Final Text Output (Spyder Console):
76 |
77 | 
78 |
79 | ## License
80 | [](http://badges.mit-license.org)
81 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
82 |
--------------------------------------------------------------------------------
/model/fmodelwts.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Mastermind0100/Optical-Character-Recognizer/66611a0ad6616bb55500eda772921c6f40ade9f9/model/fmodelwts.h5
--------------------------------------------------------------------------------
/src/model.py:
--------------------------------------------------------------------------------
1 | """
2 | @author: Atharva
3 | """
4 | # This script trains the CNN used by ocr.py to recognize typed characters (digits 0-9 and uppercase A-Z)
5 |
6 | import numpy as np
7 | import cv2
8 | from keras.models import load_model
9 | from keras.models import Sequential
10 | from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
11 | from keras.preprocessing import image
12 | from keras.preprocessing.image import ImageDataGenerator
13 | import scipy.fftpack
14 |
15 | trdata = 71999   # number of training images; steps_per_epoch below is roughly trdata / batch
16 | vltdata = 21600  # number of validation images; validation_steps below is vltdata / batch
17 | batch = 16
18 | #tst = cv2.inpaint(tst, thresh2,3, cv2.INPAINT_TELEA)
19 | arr_result = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
20 |
21 | training_data = 'nist_final/training'
22 | validation_data = 'nist_final/validation'
23 |
24 | model=Sequential()
25 | model.add(Conv2D(32,(3,3),input_shape=(64,64,1),activation='relu'))
26 | model.add(MaxPooling2D(pool_size=(2,2)))
27 | model.add(Conv2D(32,(3,3),activation='relu'))
28 | model.add(MaxPooling2D(pool_size=(2,2)))
29 | model.add(Flatten())
30 | model.add(Dense(units=128,activation='relu'))
31 | model.add(Dropout(0.5))
32 | model.add(Dense(units=36,activation='softmax'))  # softmax output over the 36 classes (0-9, A-Z)
33 | model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
34 |
35 | train_datagen=ImageDataGenerator(rescale = 1./255,
36 | shear_range = 0.2,
37 | zoom_range = 0.2,
38 | horizontal_flip = False)
39 |
40 | test_datagen=ImageDataGenerator(rescale = 1./255)
41 |
42 | training_set=train_datagen.flow_from_directory(directory = training_data,
43 | target_size = (64, 64),
44 | color_mode='grayscale',
45 | batch_size = batch,
46 | class_mode = 'sparse')
47 |
48 | test_set=test_datagen.flow_from_directory(directory = validation_data,
49 | target_size = (64, 64),
50 | color_mode='grayscale',
51 | batch_size = batch,
52 | class_mode = 'sparse')
53 |
54 | model.fit_generator(training_set,steps_per_epoch = 4500,  # on newer Keras/TF2, use model.fit(...), which accepts generators directly
55 | epochs = 15,
56 | validation_data = test_set,
57 | validation_steps = 1350)
58 |
59 | model.save('fmodelwts.h5')
--------------------------------------------------------------------------------
/src/ocr.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Thu Jun 6 16:40:46 2019
5 |
6 | @author: Atharva
7 | """
8 | import numpy as np
9 | import cv2
10 | from tensorflow.keras.models import load_model
11 | from tensorflow.keras.preprocessing import image
12 | from PIL import Image
13 |
14 | arr_out = []  # predicted characters are collected here (module-level, so results accumulate across calls to predict)
15 | arr_result = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
16 |
17 | model=load_model('fmodelwts.h5')  # pretrained CNN from model.py; expects 64x64 grayscale inputs
18 |
19 | def sortcnts(cnts): # to sort the contours left to right
20 |
21 | boundingBoxes = [cv2.boundingRect(c) for c in cnts]
22 | (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
23 | key=lambda b:b[1][0], reverse=False))
24 |
25 | return (cnts)
26 |
27 | def test(a,b,c,d,imd): # predict the character inside the bounding box (x=a, y=b, w=c, h=d) of the grayscale image imd
28 | test=imd[b:b+d,a:a+c]
29 | _,test_image = cv2.threshold(test,100,255,cv2.THRESH_BINARY)
30 | test_image= cv2.copyMakeBorder(test_image,10,10,10,10,cv2.BORDER_CONSTANT,value=(255,255,255))
31 | test_image = cv2.medianBlur(test_image.copy(),3)
32 | test_image = cv2.resize(test_image.copy(),(64,64),interpolation = cv2.INTER_AREA)
33 | t = test_image.copy()
34 | cv2.resize(test_image,(64,64))
35 | test_image=(image.img_to_array(test_image))/255
36 | test_image=np.expand_dims(test_image, axis = 0)
37 | result=model.predict(test_image)
38 | np.reshape(result, 36)
39 | high = np.amax(test_image)
40 | low = np.amin(test_image)
41 | if high != low:
42 | maxval = np.amax(result)
43 | index = np.where(result == maxval)
44 | arr_out.append(arr_result[index[1][0]])
45 |
46 | def predict(input_img): # input_img is a BGR image array, e.g. as returned by cv2.imread
47 | im = input_img.copy()
48 | img = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
49 |
50 | # Code for enhancing the image--------------------------------------------------
51 |
52 | blur = cv2.bilateralFilter(img.copy(),9,75,75)
53 | _, thresh = cv2.threshold(blur.copy(), 100, 255, cv2.THRESH_BINARY)
54 |
55 | contours, h = cv2.findContours(thresh.copy(),cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]  # works with both OpenCV 3.x and 4.x return signatures
56 |
57 | sum = 0
58 | maxar = 0
59 | for cnt in contours:
60 | x,y,w,h = cv2.boundingRect(cnt)
61 | sum += (w*h)
62 |
63 |
64 | avg = sum/len(contours)  # average bounding-box area (informational; the limits below are hard-coded)
65 | maxar = 10000  # only contours whose bounding-box area lies between minar and maxar are treated as characters
66 | minar = 1000
67 | for cnt in contours:
68 | x,y,w,h = cv2.boundingRect(cnt)
69 | if w*h < maxar and w*h > minar:
70 | test(x,y,w,h,img)
71 |
72 | final = ""
73 | i = 0
74 | for ch in reversed(arr_out):
75 | i += 1
76 | final = final+ch
77 |
78 | print('\n',final)
79 |
80 | cv2.waitKey()
81 | cv2.destroyAllWindows()
82 |
--------------------------------------------------------------------------------