├── src
│   ├── requirements.txt
│   ├── test.png
│   ├── .DS_Store
│   ├── testImages
│   │   ├── .DS_Store
│   │   ├── calendartest.png
│   │   └── scheduletest.png
│   ├── text.csv
│   └── uImage.py
├── .DS_Store
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── README.md
└── LICENSE.md
/src/requirements.txt:
--------------------------------------------------------------------------------
# Tesseract itself is a system package (e.g. `brew install tesseract`), not a pip package.
pillow
pytesseract
numpy
opencv-python
scikit-image
matplotlib
--------------------------------------------------------------------------------
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Connorula/Uniform-Image-Processing-Library/HEAD/.DS_Store
--------------------------------------------------------------------------------
/src/test.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Connorula/Uniform-Image-Processing-Library/HEAD/src/test.png
--------------------------------------------------------------------------------
/src/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Connorula/Uniform-Image-Processing-Library/HEAD/src/.DS_Store
--------------------------------------------------------------------------------
/src/testImages/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Connorula/Uniform-Image-Processing-Library/HEAD/src/testImages/.DS_Store
--------------------------------------------------------------------------------
/src/testImages/calendartest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Connorula/Uniform-Image-Processing-Library/HEAD/src/testImages/calendartest.png
--------------------------------------------------------------------------------
/src/testImages/scheduletest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Connorula/Uniform-Image-Processing-Library/HEAD/src/testImages/scheduletest.png
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | If you'd like to contribute to this project, simply create a pull request. All further questions, comments, or suggestions can be emailed to cdevlin@andover.edu.
2 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
6 |
7 | ## Our Standards
8 |
9 | Examples of behavior that contributes to creating a positive environment include:
10 |
11 | * Using welcoming and inclusive language
12 | * Being respectful of differing viewpoints and experiences
13 | * Gracefully accepting constructive criticism
14 | * Focusing on what is best for the community
15 | * Showing empathy towards other community members
16 |
17 | Examples of unacceptable behavior by participants include:
18 |
19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances
20 | * Trolling, insulting/derogatory comments, and personal or political attacks
21 | * Public or private harassment
22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission
23 | * Other conduct which could reasonably be considered inappropriate in a professional setting
24 |
25 | ## Our Responsibilities
26 |
27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
28 |
29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
30 |
31 | ## Scope
32 |
33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
34 |
35 | ## Enforcement
36 |
37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at libre@rmrm.io. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
38 |
39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
40 |
41 | ## Attribution
42 |
43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
44 |
45 | [homepage]: http://contributor-covenant.org
46 | [version]: http://contributor-covenant.org/version/1/4/
47 |
--------------------------------------------------------------------------------
/src/text.csv:
--------------------------------------------------------------------------------
1 | comp sci,mathClass
2 | "2:00 - 2:45
3 | CSCG30
4 | N. Zufelt
5 | MORSEHL-103","11:15 - 12:00
6 | MTHSQOB
7 |
8 | C. Odden
9 | MORSEHL-301"
266 |
--------------------------------------------------------------------------------
/src/uImage.py:
--------------------------------------------------------------------------------
from pathlib import Path
import csv
import os
import sys  # used only by the commented-out getIntersection sketch

import cv2
import numpy as np  # used only by the commented-out sketches
from PIL import Image
from pytesseract import image_to_string


class uImageBlueprint(object):
    """Template listing the rectangular sections of an image that hold text."""

    def __init__(self, imageTypeName, fileType):
        self.imageTypeName = imageTypeName
        self.fileType = fileType
        # Header name -> list of [x1, y1, x2, y2] boxes. Created per instance so
        # that two blueprints never share the same dictionary.
        self.subSections = {}

    # def getIntersection(line1, line2):
    #     s1 = np.array(line1[0])
    #     e1 = np.array(line1[1])
    #
    #     s2 = np.array(line2[0])
    #     e2 = np.array(line2[1])
    #
    #     a1 = (s1[1] - e1[1]) / (s1[0] - e1[0])
    #     b1 = s1[1] - (a1 * s1[0])
    #
    #     a2 = (s2[1] - e2[1]) / (s2[0] - e2[0])
    #     b2 = s2[1] - (a2 * s2[0])
    #
    #     if abs(a1 - a2) < sys.float_info.epsilon:
    #         return False
    #
    #     x = (b2 - b1) / (a1 - a2)
    #     y = a1 * x + b1
    #     return (x, y)

    def findSections(self, fileName, threshold, boxArea):
        self.subSections = {}
        img = cv2.imread(str(fileName))
        if img is None:
            # cv2.imread returns None instead of raising when it cannot read a file.
            print("File not found.")
            return None
        imgray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # threshold is 240 for iCalendar and 127 for PA Schedules
        ret, thresh = cv2.threshold(imgray, threshold, 255, cv2.THRESH_BINARY)
        # OpenCV 3 returns (image, contours, hierarchy) and OpenCV 4 returns
        # (contours, hierarchy); the last two values are the same in both.
        contours, hierarchy = cv2.findContours(
            thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS)[-2:]
        counterHeader = 0
        for counter, cnt in enumerate(contours):
            if cv2.contourArea(cnt) <= boxArea:
                continue
            # First guess: treat the 1st and 3rd contour points as opposite corners.
            coord = [int(cnt[0][0][0]), int(cnt[0][0][1]),
                     int(cnt[2][0][0]), int(cnt[2][0][1])]
            width = coord[2] - coord[0]
            height = coord[3] - coord[1]
            if width <= 10 or height <= 10:
                if width <= 0 or height <= 0:
                    continue
                # Box too thin: fall back to the contour's extreme points.
                extLeft = cnt[cnt[:, :, 0].argmin()][0]
                extRight = cnt[cnt[:, :, 0].argmax()][0]
                extTop = cnt[cnt[:, :, 1].argmin()][0]
                extBot = cnt[cnt[:, :, 1].argmax()][0]
                coord = [int(extLeft[0]), int(extTop[1]),
                         int(extRight[0]), int(extBot[1])]
            self.addSection(coord, counterHeader)
            # Draw the contour, both corner markers and the section number.
            img = cv2.drawContours(img, contours, counter, (0, 255, 0), 10)
            cv2.rectangle(img, (coord[0], coord[1]), (coord[0] + 5, coord[1] + 5), (255, 0, 0), 3)
            cv2.rectangle(img, (coord[2], coord[3]), (coord[2] + 5, coord[3] + 5), (0, 0, 255), 3)
            middleCoord = ((coord[0] + coord[2]) // 2, (coord[1] + coord[3]) // 2)
            cv2.putText(img, str(counterHeader), middleCoord, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 255)
            counterHeader += 1
        # The array is in OpenCV's BGR order but Pillow reads it as RGB, so red and
        # blue swap in test.png: top-left markers show red, bottom-right show blue.
        showimg = Image.fromarray(img)
        showimg.save('test.png')

    # def findSectionsColors(self):
    #     # hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    #     # lower_range = np.array([178, 179, 0])
    #     # upper_range = np.array([255, 255, 255])

    def addSection(self, coordinates, name):
        # A box is valid only if it has positive width and height.
        if coordinates[2] - coordinates[0] > 0 and coordinates[3] - coordinates[1] > 0:
            self.subSections.setdefault(name, []).append(coordinates)
        else:
            print("Invalid coordinates")

    def removeSection(self, headerName, index):
        newSubSections = {}
        for header, boxes in self.subSections.items():
            if header not in headerName:
                kept = list(boxes)
            elif index[0] == -1:
                # -1 removes every box stored under the listed headers.
                kept = []
            else:
                kept = [box for i, box in enumerate(boxes) if i not in index]
            # Headers left without any boxes are dropped entirely.
            if kept:
                newSubSections[header] = kept
        self.subSections = newSubSections

    def renameSection(self, headerName, newHeaderName, index):
        try:
            if index == -1:
                if headerName == newHeaderName:
                    # Nothing to do; also avoids appending to a list while iterating it.
                    return None
                if newHeaderName not in self.subSections:
                    self.subSections[newHeaderName] = self.subSections.pop(headerName)
                else:
                    self.subSections[newHeaderName].extend(self.subSections.pop(headerName))
            else:
                self.addSection(self.subSections[headerName].pop(index), newHeaderName)
        except KeyError:
            return None

    def getSectionText(self, headerName, fileName, index):
        try:
            image = Image.open(fileName)
            box = tuple(self.subSections[headerName][index])
            return image_to_string(image.crop(box))
        except FileNotFoundError:
            print("File not found.")
            return None
        except KeyError:
            print("Dictionary does not contain this key")
            return None
        except IndexError:
            print("Index out of range")
            return None

    def processImage(self, imageFile, csvFile):
        try:
            image = Image.open(imageFile)
        except FileNotFoundError:
            print("File not found.")
            return None
        csvHeaders = list(self.subSections)
        maxLength = max([1] + [len(self.subSections[header]) for header in csvHeaders])
        # Append to an existing CSV; otherwise create it and write the header row.
        isNewFile = not Path(csvFile).is_file()
        # newline="" is what the csv module expects; it avoids blank rows on Windows.
        with open(csvFile, "w" if isNewFile else "a", newline="") as csv_file:
            writer = csv.writer(csv_file, delimiter=',')
            if isNewFile:
                writer.writerow(csvHeaders)
            for counter in range(maxLength):
                csvText = []
                for header in csvHeaders:
                    boxes = self.subSections[header]
                    if counter < len(boxes):
                        csvText.append(image_to_string(image.crop(tuple(boxes[counter]))))
                    else:
                        csvText.append("")
                writer.writerow(csvText)

    def processFolder(self, csvFile, path):
        folderPath = os.getcwd() + path
        for fileName in os.listdir(folderPath):
            if self.fileType and fileName.endswith(self.fileType):
                self.processImage(folderPath + "/" + fileName, csvFile)


# test = uImageBlueprint("PASchedule","png")
# # test.addSection([0,0,1,1],"bio")
# # test.addSection([1,1,2,2],"bio")
# test.addSection([500,2271,989,2578],"comp sci")
# test.addSection([0,1325,495,1650],"mathClass")
# # test.findSections("scheduletest.png",127,30000)
# # test.removeSection(["bio","comp sci"],[5])
# # print(test.subSections)
# # test.renameSection("bio","comp sci")
# # print(test.getSectionText("comp sci","scheduletest.png",0))
# # test.processImage("scheduletest.png","text.csv")
# test.processFolder("newTest.csv", "/images")
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Uniform-Image-Processing-Library
2 | A Python library which facilitates the processing of numerous images with a uniform layout (e.g. calendars, schedules, etc.) by inputting their textual data into a CSV file.
3 |
4 | ## In-Depth Explanation
5 | The user creates a `uImageBlueprint` object, which serves as the template for all image processing. The object stores the sections of the image that the user wants to read into a data file. These sections live in a dictionary whose keys are the CSV headers and whose values are lists of section coordinates; the text found in each section is written under the matching header. Sections can be defined in two ways. The user can add them manually with the `addSection` function, which requires the approximate pixel coordinates of each section's top-left and bottom-right corners (an image editor such as GIMP can report a pixel's location). Alternatively, the `findSections` function automatically segments the image into rectangles according to a threshold value (the grayscale cutoff that separates the dividing lines from the rest of the image) and a minimum segment area (the approximate minimum size of each section). If the automatic result is imperfect, the user can add or remove individual sections afterwards. Once the blueprint is satisfactory, the `processImage` and `processFolder` functions process a single image or a folder of images, respectively: the text within each section is read and written to a CSV file whose headers are the keys of the section dictionary.
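
The mapping from the section dictionary to CSV rows can be sketched without any image processing. The snippet below is only an illustration, not the library's code: `rows_from_blueprint` and `fake_ocr` are hypothetical names, the second `mathClass` box is made up, and `fake_ocr` stands in for the real OCR call.

```python
# Hypothetical stand-in for a blueprint's section dictionary:
# CSV header -> list of [x1, y1, x2, y2] boxes.
sub_sections = {
    "comp sci": [[500, 2271, 989, 2578]],
    "mathClass": [[0, 1325, 495, 1650], [0, 1650, 495, 1975]],
}


def fake_ocr(box):
    """Placeholder for OCR: just describes the box it was given."""
    return "text@{}".format(tuple(box))


def rows_from_blueprint(sections, read_text):
    """Build CSV rows: one column per header, one row per box index."""
    headers = list(sections)
    depth = max([1] + [len(boxes) for boxes in sections.values()])
    rows = [headers]
    for i in range(depth):
        rows.append([
            read_text(sections[h][i]) if i < len(sections[h]) else ""
            for h in headers
        ])
    return rows


rows = rows_from_blueprint(sub_sections, fake_ocr)
for row in rows:
    print(row)
```

A header with fewer boxes than the others simply gets empty cells in the later rows, which is why the second row of the `comp sci` column is blank here.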
6 |
7 | ## Dependencies:
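8 | You will need to install the following libraries: Pillow, Matplotlib, Tesseract, PyTesseract, OpenCV, and NumPy. Pathlib is part of the standard library from Python 3.4 onwards, so the `pip install pathlib` line below is only needed on older interpreters.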
9 | ```
10 | pip install pillow
11 | pip install pytesseract
12 | pip install numpy
13 | pip install matplotlib
14 | pip install pathlib
15 | pip install opencv-python
16 | ```
17 | It's a little complicated to install Tesseract on Macs, but [here's](http://benschmidt.org/dighist13/?page_id=129) a very simple and easy-to-follow guide (only go up to `brew install tesseract` in Step 3).
18 |
19 | ## Installation
20 | To use the uImage library in your code, put `from uImage import uImageBlueprint` at the top of your Python file; the examples below assume the class has been imported this way. Make sure that *uImage.py* is in the same folder as your script.
21 |
22 | ## Documentation
23 | ### `addSection(self, coordinates, name)`
24 | The addSection function adds a subsection with specified coordinates to the subsection dictionary under either a preexisting key or a new one.
25 | * coordinates: This parameter requires a list of 4 integers, where the first and second integers are the coordinates (x1,y1) of the subsection's top-left corner and the third and fourth integers are the coordinates (x2,y2) of the subsection's bottom-right corner. If x1 >= x2 or y1 >= y2, the function prints "Invalid coordinates" to the console and does not add the section.
26 | * name: This parameter requires a string. If the key name already exists in the subSections dictionary, the section's coordinates will be appended to the key's list of coordinates.
27 | #### Example
28 | ```
29 | test = uImageBlueprint("PASchedule","png")
30 | test.addSection([0,0,1,1],"bio")
31 | test.addSection([1,1,2,2],"bio")
32 | test.addSection([500,2271,989,2578],"comp sci")
33 | test.addSection([0,1325,495,1650],"mathClass")
34 | print(test.subSections)
35 | ```
36 | This returns:
37 | ```
38 | {'bio': [[0, 0, 1, 1], [1, 1, 2, 2]], 'comp sci': [[500, 2271, 989, 2578]], 'mathClass': [[0, 1325, 495, 1650]]}
39 | ```
40 |
41 | ### `removeSection(self, headerName, index)`
42 | The removeSection function removes one or more subsections from the uImageBlueprint's subSections dictionary.
43 | * headerName: This parameter requires a list containing the key values of the sections the user wishes to remove.
44 | * index: This parameter requires a list containing the indices of the subsections the user wishes to remove. If the list only contains -1, all the sections of the specified header(s) will be removed. If the indices are out of range for a particular header, nothing will occur.
45 | #### Example
46 | ```
47 | test = uImageBlueprint("PASchedule","png")
48 | test.addSection([0,0,1,1],"bio")
49 | test.addSection([1,1,2,2],"bio")
50 | test.addSection([500,2271,989,2578],"comp sci")
51 | test.addSection([0,1325,495,1650],"mathClass")
52 | test.removeSection(["bio","mathClass"],[1])
53 | print("First Deletion: " + str(test.subSections))
54 | test.removeSection(["bio","mathClass"],[-1])
55 | print("Second Deletion: " + str(test.subSections))
56 | ```
57 | This returns:
58 | ```
59 | First Deletion: {'bio': [[0, 0, 1, 1]], 'comp sci': [[500, 2271, 989, 2578]], 'mathClass': [[0, 1325, 495, 1650]]}
60 | Second Deletion: {'comp sci': [[500, 2271, 989, 2578]]}
61 | ```
62 |
63 | ### `renameSection(self, headerName, newHeaderName, index)`
64 | The renameSection function renames an entire section or a particular coordinate of a section.
65 | * headerName: This parameter requires a string which corresponds to a key of the subSections dictionary.
66 | * newHeaderName: This parameter requires a string. If it is not a preexisting key, a new key is created and the appropriate coordinates are stored under it. If the key already exists, the coordinates are appended to the end of that key's list of coordinates.
67 | * index: This parameter requires an integer. If the integer is -1, every coordinate stored under headerName is moved to newHeaderName. Otherwise, only the coordinate at position index in headerName's list of coordinates is moved to newHeaderName.
68 | #### Example
69 | ```
70 | test = uImageBlueprint("PASchedule","png")
71 | test.addSection([0,0,1,1],"bio")
72 | test.addSection([1,1,2,2],"bio")
73 | test.addSection([500,2271,989,2578],"comp sci")
74 | test.addSection([0,1325,495,1650],"mathClass")
75 | test.renameSection("bio","comp sci", -1)
76 | print(test.subSections)
77 | test.renameSection("comp sci", "bio", 1)
78 | print(test.subSections)
79 | ```
80 | This prints the following to the console:
81 | ```
82 | {'comp sci': [[500, 2271, 989, 2578], [0, 0, 1, 1], [1, 1, 2, 2]], 'mathClass': [[0, 1325, 495, 1650]]}
83 | {'comp sci': [[500, 2271, 989, 2578], [1, 1, 2, 2]], 'mathClass': [[0, 1325, 495, 1650]], 'bio': [[0, 0, 1, 1]]}
84 | ```
85 |
86 | ### `getSectionText(self, headerName, fileName, index)`
87 | This function returns the text within a particular section of the image.
88 | * headerName: This parameter requires a string which corresponds to a key of the subSections dictionary.
89 | * fileName: Name of the file you wish to read the text from. Note that if the file is not in the current directory, you must add the path to the fileName.
90 | * index: This parameter requires an integer. It specifies which element of the coordinate list stored in the subSections dictionary under headerName you wish to retrieve the text from.
91 | #### Example
92 | ```
93 | test = uImageBlueprint("PASchedule","png")
94 | test.addSection([0,0,1,1],"bio")
95 | test.addSection([1,1,2,2],"bio")
96 | test.addSection([500,2271,989,2578],"comp sci")
97 | test.addSection([0,1325,495,1650],"mathClass")
98 | print(test.getSectionText("comp sci", "scheduletest.png",0))
99 | ```
100 | This prints the following to the console:
101 | ```
102 | 2:00 - 2:45
103 | CSCG30
104 | N. Zufelt
105 | MORSEHL-103
106 | ```
107 |
108 | ### `findSections(self,fileName,threshold,boxArea)`
109 | This function automatically finds the sections of an image and stores them in the subSections dictionary of the uImageBlueprint object, replacing any sections already in it. Each section found is stored under an integer key (0, 1, 2, ...).
110 | * fileName: This parameter requires the name of the image file you wish to segment into sections. Note that if the file is not in the current directory, you must add the path to the fileName.
111 | * threshold: This parameter requires an integer from 0 to 255. The image is converted to grayscale and every pixel brighter than the threshold becomes white while the rest become black, so this value determines which lines survive as contours.
112 | * boxArea: This parameter also requires an integer, an approximation of the minimum section area in pixels; contours with a smaller area are discarded in order to filter out non-sectional data.
113 | Note that the threshold and boxArea will vary from blueprint to blueprint. To make it easier to find suitable values, the function creates an image called *test.png* which shows the contours in green, the top-left corner of each box as a small red square, the bottom-right corner as a small blue square, and the section name of the box in red in the middle. I recommend you start at a boxArea of 0, so nothing gets filtered out, and adjust the threshold until you find a satisfactory value. From there, you can adjust the boxArea to filter out any unwanted sections. If all else fails, manually remove/add sections using the appropriate functions until you've reached the blueprint you want.
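
To build some intuition for the threshold parameter, here is a small pure-Python sketch of a binary threshold. It is an illustration under assumptions, not the library's code: `binarize` is a hypothetical helper and the pixel values are made up. The two thresholds tried are the ones the source comments mention (127 for PA schedules, 240 for iCalendar images).

```python
def binarize(gray_rows, threshold):
    """Mimic a binary threshold: pixels brighter than `threshold` become 255."""
    return [[255 if px > threshold else 0 for px in row] for row in gray_rows]


# Made-up grayscale strip: white cell, light-grey divider, white cell,
# dark divider, white cell.
strip = [[255, 200, 255, 60, 255]]

low = binarize(strip, 127)
high = binarize(strip, 240)
print(127, low)    # only the dark divider turns black
print(240, high)   # the light-grey divider turns black as well
```

With a threshold of 127 the light-grey divider is washed out to white, so the two cells around it would merge into one section; raising the threshold to 240 keeps that divider visible.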
114 | #### Example
115 | ```
116 | test = uImageBlueprint("PASchedule","png")
117 | test.findSections("scheduletest.png",127,30000)
118 | print(test.subSections)
119 | ```
120 | This prints the following:
121 | ```
122 | {0: [[1984, 2584, 2473, 2895]], 1: [[500, 2584, 988, 2895]], 2: [[5, 2584, 494, 2895]], 3: [[1983, 2271, 2474, 2578]], 4: [[1489, 2271, 1978, 2579]], 5: [[994, 2272, 1484, 2895]], 6: [[500, 2271, 989, 2578]], 7: [[4, 2271, 495, 2578]], 8: [[1983, 1958, 2474, 2266]], 9: [[1489, 1958, 1978, 2266]], 10: [[994, 1958, 1484, 2266]], 11: [[500, 1958, 989, 2266]], 12: [[4, 1959, 495, 2265]], 13: [[1983, 1646, 2474, 1953]], 14: [[1489, 1646, 1978, 1953]], 15: [[993, 1647, 1485, 1952]], 16: [[500, 1646, 989, 1954]], 17: [[4, 1646, 495, 1953]], 18: [[1983, 1333, 2474, 1641]], 19: [[1489, 1333, 1978, 1641]], 20: [[993, 1333, 1485, 1641]], 21: [[500, 1333, 989, 1641]], 22: [[4, 1333, 495, 1641]], 23: [[1983, 1021, 2474, 1328]], 24: [[993, 1021, 1485, 1329]], 25: [[500, 1021, 988, 1328]], 26: [[4, 1021, 495, 1328]], 27: [[1489, 1021, 1979, 1328]], 28: [[1490, 709, 1978, 1016]], 29: [[1983, 708, 2474, 1016]], 30: [[993, 708, 1485, 1016]], 31: [[499, 709, 989, 1016]], 32: [[4, 708, 495, 1016]], 33: [[1983, 396, 2474, 704]], 34: [[1490, 396, 1978, 703]], 35: [[500, 396, 989, 704]], 36: [[4, 396, 495, 704]], 37: [[1983, 83, 2474, 391]], 38: [[1489, 83, 1979, 390]], 39: [[993, 84, 1485, 390]], 40: [[500, 83, 989, 391]], 41: [[4, 83, 495, 391]], 42: [[1984, 5, 2474, 77]], 43: [[1491, 4, 1978, 77]], 44: [[995, 4, 1484, 78]], 45: [[501, 4, 988, 78]], 46: [[4, 5, 494, 78]]}
123 | ```
124 |
125 | ### `processImage(self,imageFile,csvFile)`
126 | This function processes a single image using the corresponding uImageBlueprint, and stores the textual data of each section in a CSV file.
127 | * imageFile: This parameter requires the name of the image file you wish to process. If the file does not exist within the current directory, pass the file path instead.
128 | * csvFile: This parameter requires the name of the CSV file you wish to add the data to. If the CSV file already exists in the current directory, the data will be appended to the preexisting CSV's end. Otherwise, a new CSV file will be created under the parameter's name.
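
The create-or-append behaviour described for csvFile can be sketched with the standard library alone. This is a minimal illustration, not the library's code: `write_rows` is a hypothetical helper and the cell values are placeholders for OCR output.

```python
import csv
import os
import tempfile


def write_rows(csv_path, headers, rows):
    """Create the CSV with a header row, or append rows if it already exists."""
    is_new = not os.path.isfile(csv_path)
    with open(csv_path, "w" if is_new else "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(headers)
        writer.writerows(rows)


csv_path = os.path.join(tempfile.mkdtemp(), "text.csv")
write_rows(csv_path, ["comp sci", "mathClass"], [["a1", "b1"]])  # creates the file
write_rows(csv_path, ["comp sci", "mathClass"], [["a2", "b2"]])  # appends to it

with open(csv_path, newline="") as handle:
    contents = list(csv.reader(handle))
print(contents)
```

Because the header row is only written when the file is created, rows appended later line up with the existing headers only if the blueprint's sections are in the same order.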
129 | #### Example
130 | ```
131 | test = uImageBlueprint("PASchedule","png")
132 | test.findSections("scheduletest.png",127,30000)
133 | test.processImage("scheduletest.png","text.csv")
134 | ```
135 | This will write the text found in each section of scheduletest.png, as defined by the test blueprint, into a CSV called *text.csv*.
136 |
137 | ### `processFolder(self,csvFile, path)`
138 | This function processes a folder of images using the corresponding uImageBlueprint, and stores all of their textual data into a CSV file.
139 | * csvFile: This parameter requires the name of the CSV file you wish to add the data to. If the CSV file already exists in the current directory, the data will be appended to the preexisting CSV's end. Otherwise, a new CSV file will be created under the parameter's name.
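140 | * path: This parameter requires the path of the folder containing the images, relative to the current directory and beginning with a slash (e.g. "/images"), because it is appended directly to the current working directory. If the images are contained within the current directory, pass a blank string for path.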
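
How the path argument and the blueprint's file type select images can be sketched as follows. This is an illustration under assumptions, not the library's code: `matching_images` is a hypothetical helper, and a temporary directory stands in for the current working directory.

```python
import os
import tempfile


def matching_images(base_dir, path, file_type):
    """List files in base_dir + path whose names end with file_type."""
    folder = base_dir + path  # path is appended as-is, hence the leading slash
    return sorted(
        folder + "/" + name
        for name in os.listdir(folder)
        if name.endswith(file_type)
    )


# Temporary directory standing in for the current working directory.
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "images"))
for name in ("a.png", "b.png", "notes.txt"):
    open(os.path.join(base, "images", name), "w").close()

found = matching_images(base, "/images", "png")
print([os.path.basename(p) for p in found])
```

Only the two `.png` files are selected; `notes.txt` is skipped because its name does not end with the file type.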
141 | #### Example
142 | ```
143 | test = uImageBlueprint("PASchedule","png")
144 | test.addSection([500,2271,989,2578],"comp sci")
145 | test.addSection([0,1325,495,1650],"mathClass")
146 | test.processFolder("newCSV.csv", "/images")
147 | ```
148 | This processes all the images in ./images according to the test uImageBlueprint and stores their text in a CSV file called newCSV.csv.
149 |
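One way to picture the folder walk is the sketch below, which collects the image files in a folder and treats an empty `path` as the current directory, as described above. `list_images` is a hypothetical helper, and matching files by extension is an assumption based on the `"png"` argument passed to `uImageBlueprint`; the library may select files differently.

```python
import os
import tempfile


def list_images(path, extension):
    """Return the paths of files in `path` that end with `extension`."""
    folder = path or "."  # an empty string means the current directory
    suffix = "." + extension.lstrip(".").lower()
    return [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(suffix)  # case-insensitive match
    ]


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.png", "b.txt", "c.PNG"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in list_images(folder, "png")]

print(found)  # ['a.png', 'c.PNG']
```
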
150 | ## FAQs
151 | - findSections is not working for my image!
152 |
153 | A few things help findSections and processImage/processFolder perform well. First, use images with a DPI of at least 300; there are many ways to achieve this, and tutorials can be found online. Second, crop the image as closely as possible to its border to limit the surrounding white space. Finally, keep in mind that findSections is not perfect and will often require you to add or remove sections manually to get a satisfactory result.
154 |
155 | ## Possible Changes:
156 | Feel free to add different functionalities to either the blueprint object or the processing function.
157 | - [ ] Add the ability to define which subsections correspond to headers and which correspond to the headers' data.
158 | - [ ] Add a section which returns the dominant color of the subsection.
159 | - [ ] Test whether handwritten files work.
160 | - [ ] Improve the contour recognition of the findSections function.
161 | - [ ] Add the ability to have sections that aren't just rectangles.
162 | - [ ] Add a function which returns a list of contour intersections, so the user can more easily add/remove sections.
163 | - [ ] Or come up with anything you think would be useful (different outputs, added functionality, etc.). This project is still in its early stages, so don't be afraid to contribute something drastically different!
164 |
165 | ## License
166 | This project is licensed under the MIT License. See the LICENSE.md file for details.
167 |
168 | ## Authors
169 | Connor Devlin - cdevlin@andover.edu
170 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
Connorula/Uniform-Image-Processing-Library is licensed under the MIT License

A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.