├── LICENSE
├── README.md
├── frozen_east_text_detection.pb
├── images
│   ├── car_wash.png
│   ├── lebron_james.jpg
│   └── sign.jpg
└── opencv_text_detection_image.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Abhishek Singh
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # EAST Detector for Text Detection
2 |
3 | OpenCV’s EAST (Efficient and Accurate Scene Text Detector) text detector is a deep learning model based on a novel architecture and training pattern. It is capable of
4 | - running in near real-time at 13 FPS on 720p images, and
5 | - obtaining state-of-the-art text detection accuracy.
6 |
7 | [Link to paper](https://arxiv.org/pdf/1704.03155.pdf)
8 |
9 | OpenCV’s text detector implementation of EAST is quite robust, capable of localizing text even when it’s blurred, reflective, or partially obscured.
10 |
11 | Many natural scene text detection challenges are described by Celine Mancas-Thillou and Bernard Gosselin in their excellent 2007 paper, [Natural Scene Text Understanding](https://www.tcts.fpms.ac.be/publications/regpapers/2007/VS_cmtbg2007.pdf):
12 |
13 | - **Image/sensor noise**: Sensor noise from a handheld camera is typically higher than that of a traditional scanner. Additionally, low-priced cameras will typically interpolate the pixels of their raw sensors to produce real colors.
14 |
15 | - **Viewing angles**: Natural scene text can naturally have viewing angles that are not parallel to the text, making the text harder to recognize.
16 | - **Blurring**: Uncontrolled environments tend to have blur, especially if the end user is utilizing a smartphone that does not have some form of stabilization.
17 |
18 | - **Lighting conditions**: We cannot make any assumptions regarding our lighting conditions in natural scene images. It may be near dark, the flash on the camera may be on, or the sun may be shining brightly, saturating the entire image.
19 |
20 | - **Resolution**: Not all cameras are created equal — we may be dealing with cameras with sub-par resolution.
21 |
22 | - **Non-paper objects**: Most, but not all, paper is not reflective (at least in the context of paper you are trying to scan). Text in natural scenes may be reflective, including logos, signs, etc.
23 |
24 | - **Non-planar objects**: Consider what happens when you wrap text around a bottle — the text on the surface becomes distorted and deformed. While humans may still be able to easily “detect” and read the text, our algorithms will struggle. We need to be able to handle such use cases.
25 |
26 | - **Unknown layout**: We cannot use any a priori information to give our algorithms “clues” as to where the text resides.
27 |
28 |
29 | ## Contributing
30 | Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
31 |
32 | ### Thanks to [Adrian's Blog](https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/) for a comprehensive blog post on the EAST Detector.
33 |
34 | ## License
35 | [MIT](https://choosealicense.com/licenses/mit/)
--------------------------------------------------------------------------------
/frozen_east_text_detection.pb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZER-0-NE/EAST-Detector-for-text-detection-using-OpenCV/5b6c8d025778e5402c327a4a3f484a16ce7dda84/frozen_east_text_detection.pb
--------------------------------------------------------------------------------
/images/car_wash.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZER-0-NE/EAST-Detector-for-text-detection-using-OpenCV/5b6c8d025778e5402c327a4a3f484a16ce7dda84/images/car_wash.png
--------------------------------------------------------------------------------
/images/lebron_james.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZER-0-NE/EAST-Detector-for-text-detection-using-OpenCV/5b6c8d025778e5402c327a4a3f484a16ce7dda84/images/lebron_james.jpg
--------------------------------------------------------------------------------
/images/sign.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZER-0-NE/EAST-Detector-for-text-detection-using-OpenCV/5b6c8d025778e5402c327a4a3f484a16ce7dda84/images/sign.jpg
--------------------------------------------------------------------------------
/opencv_text_detection_image.py:
--------------------------------------------------------------------------------
1 | # USAGE
2 | # python3 opencv_text_detection_image.py --image images/lebron_james.jpg --east frozen_east_text_detection.pb
3 |
4 | # import the necessary packages
5 | from imutils.object_detection import non_max_suppression
6 | import numpy as np
7 | import argparse
8 | import time
9 | import cv2
10 |
11 | # construct the argument parser and parse the arguments
12 | ap = argparse.ArgumentParser()
13 | ap.add_argument("-i", "--image", type=str,
14 | 	help="path to input image")
15 | ap.add_argument("-east", "--east", type=str,
16 | 	help="path to input EAST text detector")
17 | ap.add_argument("-c", "--min-confidence", type=float, default=0.5,
18 | 	help="minimum probability required to inspect a region")
19 | ap.add_argument("-w", "--width", type=int, default=320,
20 | 	help="resized image width (should be a multiple of 32)")
21 | ap.add_argument("-e", "--height", type=int, default=320,
22 | 	help="resized image height (should be a multiple of 32)")
23 | args = vars(ap.parse_args())
24 |
25 | # load the input image and grab the image dimensions
26 | image = cv2.imread(args["image"])
27 | orig = image.copy()
28 | (H, W) = image.shape[:2]
29 |
30 | # set the new width and height and then determine the ratio of change
31 | # for both the width and height
32 | (newW, newH) = (args["width"], args["height"])
33 | rW = W / float(newW)
34 | rH = H / float(newH)
35 |
36 | # resize the image and grab the new image dimensions
37 | image = cv2.resize(image, (newW, newH))
38 | (H, W) = image.shape[:2]
39 |
40 | # define the two output layer names for the EAST detector model that
41 | # we are interested in -- the first is the output probabilities and the
42 | # second can be used to derive the bounding box coordinates of text
43 | layerNames = [
44 | 	"feature_fusion/Conv_7/Sigmoid",
45 | 	"feature_fusion/concat_3"]
46 |
47 | # load the pre-trained EAST text detector
48 | print("[INFO] loading EAST text
detector...")
49 | net = cv2.dnn.readNet(args["east"])
50 |
51 | # construct a blob from the image and then perform a forward pass of
52 | # the model to obtain the two output layer sets
53 | blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
54 | 	(123.68, 116.78, 103.94), swapRB=True, crop=False)
55 | start = time.time()
56 | net.setInput(blob)
57 | (scores, geometry) = net.forward(layerNames)
58 | end = time.time()
59 |
60 | # show timing information on text prediction
61 | print("[INFO] text detection took {:.6f} seconds".format(end - start))
62 |
63 | # grab the number of rows and columns from the scores volume, then
64 | # initialize our set of bounding box rectangles and corresponding
65 | # confidence scores
66 | (numRows, numCols) = scores.shape[2:4]
67 | rects = []
68 | confidences = []
69 |
70 | # loop over the number of rows
71 | for y in range(0, numRows):
72 | 	# extract the scores (probabilities), followed by the geometrical
73 | 	# data used to derive potential bounding box coordinates that
74 | 	# surround text
75 | 	scoresData = scores[0, 0, y]
76 | 	xData0 = geometry[0, 0, y]
77 | 	xData1 = geometry[0, 1, y]
78 | 	xData2 = geometry[0, 2, y]
79 | 	xData3 = geometry[0, 3, y]
80 | 	anglesData = geometry[0, 4, y]
81 |
82 | 	# loop over the number of columns
83 | 	for x in range(0, numCols):
84 | 		# if our score does not have sufficient probability, ignore it
85 | 		if scoresData[x] < args["min_confidence"]:
86 | 			continue
87 |
88 | 		# compute the offset factor as our resulting feature maps will
89 | 		# be 4x smaller than the input image
90 | 		(offsetX, offsetY) = (x * 4.0, y * 4.0)
91 |
92 | 		# extract the rotation angle for the prediction and then
93 | 		# compute the sine and cosine
94 | 		angle = anglesData[x]
95 | 		cos = np.cos(angle)
96 | 		sin = np.sin(angle)
97 |
98 | 		# use the geometry volume to derive the width and height of
99 | 		# the bounding box
100 | 		h = xData0[x] + xData2[x]
101 | 		w = xData1[x] + xData3[x]
102 |
103 | 		# compute both the starting and ending (x,
y)-coordinates for
104 | 		# the text prediction bounding box
105 | 		endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
106 | 		endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
107 | 		startX = int(endX - w)
108 | 		startY = int(endY - h)
109 |
110 | 		# add the bounding box coordinates and probability score to
111 | 		# our respective lists
112 | 		rects.append((startX, startY, endX, endY))
113 | 		confidences.append(scoresData[x])
114 |
115 | # apply non-maxima suppression to suppress weak, overlapping bounding
116 | # boxes
117 | boxes = non_max_suppression(np.array(rects), probs=confidences)
118 |
119 | # loop over the bounding boxes
120 | for (startX, startY, endX, endY) in boxes:
121 | 	# scale the bounding box coordinates based on the respective
122 | 	# ratios
123 | 	startX = int(startX * rW)
124 | 	startY = int(startY * rH)
125 | 	endX = int(endX * rW)
126 | 	endY = int(endY * rH)
127 |
128 | 	# draw the bounding box on the image
129 | 	cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 2)
130 |
131 | # show the output image
132 | cv2.imshow("Text Detection", orig)
133 | cv2.waitKey(0)
134 |
--------------------------------------------------------------------------------
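The row/column decoding loop in `opencv_text_detection_image.py` can be factored into a standalone, testable helper. A minimal sketch follows; the function name `decode_predictions` and the synthetic test values are hypothetical (not part of the repository), but the geometry math mirrors the script's loop, assuming the same 4x-downsampled EAST output layout (score map plus four edge distances and a rotation angle per cell):

```python
import numpy as np

def decode_predictions(scores, geometry, min_confidence=0.5):
    """Decode EAST score/geometry volumes into boxes and confidences.

    scores:   shape (1, 1, numRows, numCols) -- text/no-text probabilities
    geometry: shape (1, 5, numRows, numCols) -- distances to the top,
              right, bottom, and left box edges, plus a rotation angle
    Returns (rects, confidences) suitable for non-maxima suppression.
    """
    (numRows, numCols) = scores.shape[2:4]
    rects, confidences = [], []
    for y in range(numRows):
        scoresData = scores[0, 0, y]
        d0, d1, d2, d3 = (geometry[0, i, y] for i in range(4))
        anglesData = geometry[0, 4, y]
        for x in range(numCols):
            # skip cells below the confidence threshold
            if scoresData[x] < min_confidence:
                continue
            # feature maps are 4x smaller than the (resized) input image
            offsetX, offsetY = (x * 4.0, y * 4.0)
            cos, sin = np.cos(anglesData[x]), np.sin(anglesData[x])
            # box size from the edge distances
            h = d0[x] + d2[x]
            w = d1[x] + d3[x]
            # ending corner, rotated by the predicted angle, then the
            # starting corner offset back by the box size
            endX = int(offsetX + (cos * d1[x]) + (sin * d2[x]))
            endY = int(offsetY - (sin * d1[x]) + (cos * d2[x]))
            rects.append((int(endX - w), int(endY - h), endX, endY))
            confidences.append(float(scoresData[x]))
    return rects, confidences
```

The returned `rects` and `confidences` can be passed directly to `imutils.object_detection.non_max_suppression`, exactly as the script does with its inline lists; extracting the helper simply makes the decoding step unit-testable without running the network.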