├── .gitignore
├── LICENSE
├── README.md
├── model
│   └── fmodelwts.h5
└── src
    ├── model.py
    └── ocr.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*__pycache__
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Atharva Hudlikar

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[![forthebadge](https://forthebadge.com/images/badges/made-with-python.svg)](https://forthebadge.com)
[![forthebadge](https://forthebadge.com/images/badges/makes-people-smile.svg)](https://forthebadge.com)

[![GitHub license](https://img.shields.io/github/license/Mastermind0100/Optical-Character-Recognizer.svg)](https://github.com/Mastermind0100/Optical-Character-Recognizer/blob/master/LICENSE)
[![LinkedIn-profile](https://img.shields.io/badge/LinkedIn-Atharva-blue.svg)](https://www.linkedin.com/in/atharva-hudlikar/)

# Optical Character Recognition
This project reads the text present in an image and predicts what is written in it.

## Process
The code first divides the image into segments, each containing a single character. A pretrained model then predicts the character present in each segment, and the predicted characters are joined and output as a string.
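In condensed form, the pipeline in `src/ocr.py` looks roughly like this (a simplified sketch, not the exact code; `recognize`, `model`, and `classes` are illustrative names standing in for the loaded Keras model and its label list):

```python
import cv2

def recognize(img, model, classes):
    # binarize, find character-sized contours, classify each one
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(thresh, cv2.RETR_TREE,
                                   cv2.CHAIN_APPROX_SIMPLE)[-2:]
    boxes = sorted(cv2.boundingRect(c) for c in contours)  # left to right
    chars = []
    for x, y, w, h in boxes:
        roi = cv2.resize(gray[y:y + h, x:x + w], (64, 64)) / 255.0
        probs = model.predict(roi.reshape(1, 64, 64, 1))
        chars.append(classes[probs.argmax()])
    return ''.join(chars)
```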
## [Directory Tree](https://xiaoluoboding.github.io/repository-tree/)
```
├─ model
│  └─ fmodelwts.h5
├─ src
│  ├─ model.py
│  └─ ocr.py
├─ .gitignore
├─ LICENSE
└─ README.md
```

## Setting up the OCR
Let's start by cloning the repository:
```bash
$ git clone https://github.com/Mastermind0100/Optical-Character-Recognizer.git
$ cd Optical-Character-Recognizer
```
Great! You are set up with the repository.
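The scripts assume a Python 3 environment with NumPy, OpenCV, TensorFlow/Keras, and Pillow (used by Keras' image utilities). If you are missing any of them, something like this should do it:
```bash
$ pip install numpy opencv-python tensorflow pillow
```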
Let's dive into it!

## How to Use the OCR
1. Copy the following files into the directory you are using for your project:
    * ocr.py
    * fmodelwts.h5

2. In your code, add the following lines:
```python
import ocr
ocr.predict(image)
```

3. This will print the text that the OCR detects in the image you pass to the `predict` function.

4. If you want the function to return the predicted text instead of printing it, replace the `print('\n', final)` line at the end of `predict` in `ocr.py` with:
```python
return final
```
Your code then needs to capture the result in a variable, so the call from Step 2 becomes:
```python
text = ocr.predict(image)
```

* The `image` that you pass as the argument of the **predict** function is the array you get after loading the image with the [imread](https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/) function in [opencv](https://opencv-python-tutroals.readthedocs.io/en/latest/index.html). But you knew that, right?

* Note that this is a relatively basic OCR: it does not detect spaces or segment the words in a sentence for you. While work on this is in progress, some image pre-processing on your side can make it work for your use case. Watch out for further updates!
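Putting the steps together, a minimal driver script might look like this (`plate.jpg` is just a placeholder file name):

```python
import cv2
import ocr  # ocr.py and fmodelwts.h5 must sit in the same directory

image = cv2.imread('plate.jpg')  # BGR array, as ocr.predict expects
ocr.predict(image)               # prints the recognized text
```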
## Want to train on your own Dataset?

Go ahead! Fire up `model.py` and point it at your own dataset; hopefully the code is self-explanatory.
P.S. The dataset I used was the [NIST](https://s3.amazonaws.com/nist-srd/SD19/by_class.zip) dataset. Download the 2nd Edition and have fun manually arranging the data :)
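`model.py` reads images with Keras' `flow_from_directory`, so it expects one subfolder per character class under the training and validation roots hard-coded in the script. An illustrative layout (folder names are the class labels):

```
nist_final/
├─ training/
│  ├─ 0/   # images of the character '0'
│  ├─ 1/
│  ├─ ...
│  └─ Z/
└─ validation/
   ├─ 0/
   ├─ ...
   └─ Z/
```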
## Output
The original photo looks like this:

![plate1](https://user-images.githubusercontent.com/36445600/60267373-bca10200-9907-11e9-83ae-0a5e7b4ebb4e.jpg)

Mid-processing output:

![up1](https://user-images.githubusercontent.com/36445600/60267398-c75b9700-9907-11e9-8db5-18642455dbff.png)

Final text output (Spyder console):

![up2](https://user-images.githubusercontent.com/36445600/60267456-e6f2bf80-9907-11e9-8d8f-df9e9b6221ea.png)

## License
[![License](http://img.shields.io/:license-mit-blue.svg?style=flat)](http://badges.mit-license.org)
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
--------------------------------------------------------------------------------
/model/fmodelwts.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Mastermind0100/Optical-Character-Recognizer/66611a0ad6616bb55500eda772921c6f40ade9f9/model/fmodelwts.h5
--------------------------------------------------------------------------------
/src/model.py:
--------------------------------------------------------------------------------
"""
@author: Atharva
"""
# Trains the model to recognize typed characters
# (digits 0-9 and letters A-Z, 36 classes in total).

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.preprocessing.image import ImageDataGenerator

trdata = 71999   # number of training images
vltdata = 21600  # number of validation images
batch = 16

# class labels, in the alphabetical folder order used by flow_from_directory
arr_result = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']

training_data = 'nist_final/training'
validation_data = 'nist_final/validation'

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(64, 64, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(units=128, activation='relu'))
model.add(Dropout(0.5))
# softmax (rather than sigmoid) so the 36 outputs form a probability distribution
model.add(Dense(units=36, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=False)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(directory=training_data,
                                                 target_size=(64, 64),
                                                 color_mode='grayscale',
                                                 batch_size=batch,
                                                 class_mode='sparse')

test_set = test_datagen.flow_from_directory(directory=validation_data,
                                            target_size=(64, 64),
                                            color_mode='grayscale',
                                            batch_size=batch,
                                            class_mode='sparse')

model.fit_generator(training_set,
                    steps_per_epoch=4500,   # ~ trdata / batch
                    epochs=15,
                    validation_data=test_set,
                    validation_steps=1350)  # ~ vltdata / batch

model.save('fmodelwts.h5')
--------------------------------------------------------------------------------
/src/ocr.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Jun 6 16:40:46 2019

@author: Atharva
"""
import numpy as np
import cv2
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

arr_out = []  # characters predicted for the current image
arr_result = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']

model = load_model('fmodelwts.h5')

def sortcnts(cnts):  # sort the contours left to right by bounding-box x coordinate
    boundingBoxes = [cv2.boundingRect(c) for c in cnts]
    (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
                                        key=lambda b: b[1][0], reverse=False))
    return cnts
def test(a, b, c, d, imd):  # predict the character in one region of interest
    # crop the bounding box (x=a, y=b, w=c, h=d) from the grayscale image
    roi = imd[b:b+d, a:a+c]
    _, test_image = cv2.threshold(roi, 100, 255, cv2.THRESH_BINARY)
    # pad with a white border so the character doesn't touch the edges
    test_image = cv2.copyMakeBorder(test_image, 10, 10, 10, 10,
                                    cv2.BORDER_CONSTANT, value=(255, 255, 255))
    test_image = cv2.medianBlur(test_image, 3)
    test_image = cv2.resize(test_image, (64, 64), interpolation=cv2.INTER_AREA)
    test_image = image.img_to_array(test_image) / 255
    test_image = np.expand_dims(test_image, axis=0)  # shape (1, 64, 64, 1)
    result = model.predict(test_image)
    high = np.amax(test_image)
    low = np.amin(test_image)
    if high != low:  # skip blank (single-intensity) regions with no character
        arr_out.append(arr_result[int(np.argmax(result))])

def predict(input_img):
    del arr_out[:]  # reset predictions left over from a previous call
    im = input_img.copy()
    img = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    # Enhance the image: smooth noise while preserving edges, then binarize
    blur = cv2.bilateralFilter(img, 9, 75, 75)
    _, thresh = cv2.threshold(blur, 100, 255, cv2.THRESH_BINARY)

    # findContours returns (image, contours, hierarchy) in OpenCV 3 but
    # (contours, hierarchy) in OpenCV 4; taking the last two works for both
    contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE,
                                           cv2.CHAIN_APPROX_SIMPLE)[-2:]

    # keep only contours whose bounding box is plausibly character-sized
    maxar = 10000
    minar = 1000
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        if minar < w*h < maxar:
            test(x, y, w, h, img)

    final = ""
    for ch in reversed(arr_out):
        final = final + ch

    print('\n', final)  # change this line to `return final` to get the text back
--------------------------------------------------------------------------------