37 |
38 | ### Features
39 |
40 | * Editing transcriptions of lines
41 | * Commenting on line and page level
42 | * Use [standardized comment tags](https://github.com/UB-Mannheim/ocr-gt-tools/wiki/Error-Tags) to mark common problems
43 | * [Cheatsheet](./doc/screenshots/cheatsheet-2016-05-04.png)
44 | * Zoom in / Zoom out
45 | * Filter visible elements
46 | * Select multiple lines and apply tags.
47 |
48 | ### Installation
49 |
50 | See [INSTALL.md](./INSTALL.md).
51 |
52 | ### About the code
53 |
54 | The server-side code is written in Perl.
55 |
56 | The frontend is written in HTML and JavaScript.
57 |
58 | ## Usage
59 |
60 | - Open 'ocr-gt-tools/index.html' in a browser.
61 | - Open the Kitodo 'Page Previews' in a second window.
62 | - Search for the book from which you created the hOCR file.
63 | - Drag and drop an image from the Kitodo 'Page Preview' window onto the window showing 'ocr-gt-tools/index.html'.
64 | - The Perl script 'ocr-gt-tools.cgi' creates all required files in the background, which takes a few seconds.
65 | - The script returns a JSON object to 'index.html' via Ajax.
66 | - 'index.html' then loads the generated 'correction.html' and 'anmerkungen.txt' inline via Ajax.
67 | - The 'Speichern' (save) button becomes active once you have edited a comment or a text line.
68 |
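As a rough illustration of the last step, the sketch below shows one way the state of the 'Speichern' button could follow edits. It is a minimal sketch under stated assumptions, not the project's actual code: `createPage`, `editLine`, `editPageComment` and `isSaveEnabled` are hypothetical helpers, and only the `changed` flag mirrors a property that the frontend's `App.prototype.confirmExit` reads from the current page.

```javascript
// Hypothetical page model: only the `changed` flag is taken from the frontend code;
// every other name here is an assumption made for illustration.
function createPage(imageUrl) {
    return { imageUrl: imageUrl, lines: [], pageComment: '', changed: false };
}

// Store a new transcription for one line; mark the page dirty only on a real change.
function editLine(page, index, text) {
    if (page.lines[index] !== text) {
        page.lines[index] = text;
        page.changed = true;
    }
    return page;
}

// Store a new page-level comment; an identical value leaves the page clean.
function editPageComment(page, text) {
    if (page.pageComment !== text) {
        page.pageComment = text;
        page.changed = true;
    }
    return page;
}

// The save button would be enabled only for a loaded page with unsaved changes.
function isSaveEnabled(page) {
    return Boolean(page && page.changed);
}
```

With this shape, a freshly loaded page keeps the button disabled, and the first differing line or comment enables it.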
69 | ## Contributing
70 |
71 | ### Expand the wiki
72 |
73 | We are using the wiki to collect [transcription hints for unusual
74 | glyphs](wiki/Special-Characters) and [frequent errors](wiki/Error-Tags).
75 |
76 | ### Pull Requests
77 |
78 | Bug fixes, new functions, suggestions for new features and other user feedback
79 | are appreciated.
80 |
81 | The source code is available from https://github.com/UB-Mannheim/ocr-gt-tools.
82 | Please submit your code contributions as pull requests on GitHub.
83 |
84 | ### Bug reports
85 |
86 | Please feel free to [open
87 | issues](https://github.com/UB-Mannheim/ocr-gt-tools/issues) for any bug you
88 | encounter and features you'd like to have.
89 |
90 |
91 | ## Acknowledgments
92 |
93 | This is free software. You may use it under the terms of the
94 | GNU Affero General Public License (AGPL) version 3 or newer.
95 | See [LICENSE](LICENSE) for details.
96 |
97 | This project bundles other free software:
98 |
99 | * [EB Garamond Font](https://www.google.com/fonts/specimen/EB+Garamond) (SIL Open Font License)
100 | * [Font Awesome by Dave Gandy](http://fontawesome.io/) (SIL OFL 1.1, MIT)
101 | * [bootstrap](http://getbootstrap.com/) (MIT)
102 | * [clipboard.js](https://github.com/zenorocha/clipboard.js) (MIT)
103 | * [handlebars.js](https://github.com/wycats/handlebars.js) (MIT)
104 | * [hocr-extract-images](https://github.com/tmbdev/hocr-tools) (Apache)
105 | * [jQuery](http://jquery.com/) (MIT)
106 | * [ocropus-gtedit](https://github.com/tmbdev/ocropy) (Apache)
107 | * [reset-css](https://github.com/shannonmoeller/reset-css) (Public Domain)
108 |
--------------------------------------------------------------------------------
/ocr-gt-tools.styl:
--------------------------------------------------------------------------------
1 | selectedColor = red
2 | paperColor = #b9af96
3 |
4 | @media (min-width: 768px)
5 | body
6 | padding-top 0
7 | .container-fluid
8 | margin-left 100px
9 | .navbar-collapse
10 | height auto
11 | border-top 0
12 | box-shadow none
13 | max-height none
14 | padding-left 0
15 | padding-right 0
16 | &.collapse
17 | display block !important
18 | width auto !important
19 | padding-bottom 0
20 | overflow visible !important
21 | &.in
22 | overflow-x visible
23 | .navbar-nav
24 | &.navbar-right
25 | &:last-child
26 | margin-right 0
27 | .navbar
28 | max-width 96px
29 | height 100vh
30 | margin-right 0
31 | margin-left 0
32 | float left
33 | position fixed
34 | z-index 10001
35 | &:after
36 | clear both
37 | .btn-group
38 | display block
39 | .dropdown-menu
40 | top 0
41 | left 100%
42 | padding 0
43 | width 220px
44 | .btn.disabled
45 | opacity .35
46 | .navbar-nav,
47 | .navbar-nav > li,
48 | .navbar-left,
49 | .navbar-right,
50 | .navbar-header
51 | float none !important
52 | .navbar-right
53 | .dropdown-menu
54 | left 0
55 | right auto
56 | .navbar-default
57 | .dropdown-menu
58 | i
59 | display inline-block
60 | float left
61 | top 2px
62 | font-size 2em
63 | .modal-admin
64 | width 80%
65 | left 100px
66 | position fixed
67 | max-height 100vh
68 | overflow scroll
69 | .line
70 | .list-group-item
71 | padding 0 0 0 0
72 | .btn
73 | i
74 | padding-top 0
75 | padding-bottom 0
76 | font-size: 20px
77 |
78 | #dropzone
79 | position fixed
80 | height 90vh
81 | margin 5vh
82 | padding-top 40vh
83 | font-size 300%
84 | text-align center
85 | &.droppable
86 | border-style dashed
87 | border-width 10px
88 | border-color blue
89 |
90 | #waiting-animation
91 | position: fixed
92 | height: 100vh
93 | width: 100vw
94 | z-index: 10000
95 | a
96 | position: absolute
97 | height: 32px
98 | width: 32px
99 | img
100 | height: 32px !important
101 | max-width: 32px !important
102 | display: block !important
103 |
104 | .hidden
105 | display none
106 |
107 | .view-hidden
108 | display none
109 |
110 | .selected
111 | background: selectedColor
112 | .list-group-item.image
113 | background: selectedColor
114 | .panel
115 | background: selectedColor
116 |
117 | textarea
118 | input[type='text']
119 | display inline-block
120 | white-space pre-wrap
121 | border none
122 | padding 0
123 | color #000060
124 | font-family 'EB Garamond', serif
125 | font-size 20px
126 | min-height 24px
127 | width 100%
128 | // Important to make textareas auto-expand
129 | height auto
130 | overflow hidden
131 | resize none
132 |
133 | .line-comment, .line-comment *,
134 | #page-comment textarea
135 | background-color lightyellow
136 | font-family 'EB Garamond', serif
137 | font-size 20px
138 | min-height 24px
139 |
140 | #file-correction
141 | .panel
142 | margin 0
143 | .panel-heading
144 | padding 0
145 | h4
146 | padding-top 7.5px
147 | .col-sm-1
148 | width:10%
149 | padding-left 0
150 | padding-right 0
151 | .btn-group > .btn
152 | padding: 0
153 | width: 35px
154 | .btn
155 | text-align left
156 | i
157 | padding: 5px;
158 | .col-sm-11
159 | width: 88%
160 | padding-left: 0
161 |
162 |
163 | #right-sidebar
164 | position fixed
165 | right 0
166 | .list-group-item
167 | i
168 | padding-right: 5px;
169 |
170 | #select-bar
171 | position fixed
172 | background-color #eee
173 | border-radius 0 0 10px 10px
174 | border-left: 2px solid blue
175 | border-right 2px solid blue
176 | border-bottom 2px solid blue
177 | top 0
178 | left 200px
179 | z-index: 1001
180 | .close
181 | opacity .5
182 | display inline-block
183 | padding-right 5px
184 | font-size xx-large
185 |
186 | #cheatsheet-modal
187 | // disable selection
188 | -webkit-touch-callout: none;
189 | -webkit-user-select: none;
190 | -khtml-user-select: none;
191 | -moz-user-select: none;
192 | -ms-user-select: none;
193 | user-select: none;
194 | h4
195 | float: left
196 | .cheatsheet-entry
197 | float: left
198 | margin-left 25%
199 | font-size: 200%
200 | max-width: 64px
201 | img
202 | height: 64px
203 | th
204 | font-weight: bold
205 | td:nth-child(1)
206 | width: 40%
207 | td
208 | vertical-align: middle
209 | font-size: 120%
210 | button.code
211 | width 100%
212 | font-family monospace
213 | font-size 48px
214 | height 64px
215 | .clipboard
216 | float: left
217 | font-size: 50%
218 |
219 | .select-col
220 | padding-left: 20px
221 | // margin-top: 32px
222 | input[type="checkbox"]
223 | width 23px
224 | height 23px
225 | padding: 0
226 | margin: 4px 0 0 0
227 | // padding-top 32px
228 |
229 |
230 | #work-info
231 | overflow-y: scroll
232 | max-height: 200px;
233 |
234 | // vim: sw=4 ts=4 noet :
235 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | # Add node_modules/.bin to $PATH so the CLI tools
2 | # installed locally by npm can be used
3 | export PATH := $(PWD)/node_modules/.bin:$(PATH)
4 |
5 | include \
6 | dev/apache.mk \
7 | dev/debian.mk \
8 | dev/docker.mk \
9 | dev/plackup.mk
10 | #
11 | # Define all the CLI tools to use
12 | #
13 |
14 | # Standard UNIX tools, recurse, create parents, force delete
15 | MKDIR = mkdir -p
16 | RM = rm -rf
17 | CP = cp -r
18 | GIT_CLONE = git clone --depth 1
19 |
20 | # cURL to download files
21 | CURL = curl -s
22 | # clean-css is a CSS minifier and optimizer
23 | CLEANCSS = cleancss
24 | # UglifyJS minifies, merges and optimizes JavaScript
25 | UGLIFYJS = uglifyjs
26 | # webfont-dl is a tool to download web fonts from the Google Fonts API
27 | WEBFONTDL = webfont-dl --eot=omit --ttf=data --woff1=data
28 | # Pug is a templating engine
29 | PUG = pug --pretty
30 | # Stylus is a CSS compiler
31 | STYLUS = stylus
32 |
33 | #
34 | # Define lists of assets
35 | #
36 |
37 | # URLs of Web Fonts to embed
38 | FONT_URLS = https://fonts.googleapis.com/css?family=EB+Garamond&subset=latin,latin-ext
39 | # Font files (eot, ttf, woff...) to bundle
40 | FONT_FILES = node_modules/font-awesome/fonts/fontawesome-webfont.* \
41 | node_modules/bootstrap/fonts/glyphicons-halflings-regular.*
42 | # URLs of CSS to download
43 | CSS_URLS = https://getbootstrap.com/docs/3.4/examples/dashboard/dashboard.css
44 | # CSS files to bundle into one minified `dist/vendor.css`
45 | # NOTE: Our CSS should not be bundled here
46 | CSS_FILES = node_modules/reset-css/reset.css \
47 | node_modules/bootstrap/dist/css/bootstrap.css \
48 | node_modules/notie/dist/notie.css \
49 | node_modules/font-awesome/css/font-awesome.css
50 | # JS scripts to bundle into one minified `dist/vendor.js`
51 | # NOTE: JavaScript developed by us should not be bundled here
52 | VENDOR_JS_FILES = node_modules/jquery/dist/jquery.js \
53 | node_modules/async/dist/async.min.js \
54 | node_modules/bootstrap/dist/js/bootstrap.js \
55 | node_modules/handlebars/dist/handlebars.min.js \
56 | node_modules/clipboard/dist/clipboard.js \
57 | node_modules/notie/dist/notie.js
58 |
59 | JS_FILES = js/*.js js/**/*.js ocr-gt-tools.js
60 | # The HTML files, described in the Pug shorthand / templating language
61 | PUG_FILES = ocr-gt-tools.pug
62 |
63 | #
64 | # Define the list of phony targets, i.e. targets that are always considered out of date (the CLI API)
65 | #
66 | # clean-js clean-html clean-fonts clean-css \
67 |
68 | .PHONY: debug clean vendor
69 |
70 | #
71 | # Debugging
72 | #
73 | print-%: ; @echo $*=$($*)
74 |
75 | __: clean dist
76 |
77 | _.%: ; $(MAKE) -C . clean-$* dist
78 |
79 | debug:
80 | @grep '^[A-Z0-9_]\+\s*=' Makefile \
81 | |grep -o '^[A-Z0-9_]*' \
82 | |xargs -I{} $(MAKE) -s -C . print-{}
83 |
84 | #
85 | # Dependencies to execute ocropy / hocr-tools in a CGI environment
86 | #
87 |
88 | vendor: dist/vendor/hocr-tools dist/vendor/ocropy
89 |
90 | dist/vendor/hocr-tools:
91 | $(MKDIR) dist/vendor
92 | $(GIT_CLONE) https://github.com/UB-Mannheim/hocr-tools $@
93 |
94 | dist/vendor/ocropy:
95 | $(MKDIR) dist/vendor
96 | $(GIT_CLONE) https://github.com/tmbdev/ocropy $@
97 |
98 | #
99 | # Set up dist folder
100 | #
101 |
102 | dist: \
103 | dist/special-chars.json\
104 | dist/error-tags.json\
105 | dist/vendor\
106 | dist/log\
107 | dist/vendor.css\
108 | dist/vendor.js\
109 | dist/fonts\
110 | dist/index.html\
111 | dist/ocr-gt-tools.js\
112 | dist/ocr-gt-tools.css\
113 | dist/ocr-gt-tools.cgi
114 |
115 | dist/%.json: doc/%.json
116 | $(CP) $< $@
117 |
118 | dist/log:
119 | $(MKDIR) $@
120 |
121 | dist/ocr-gt-tools.cgi: ocr-gt-tools.cgi
122 | $(CP) $< $@
123 | chmod a+x $@
124 |
125 | #$(UGLIFYJS) --compress --output $@ $^
126 | dist/ocr-gt-tools.js: $(JS_FILES)
127 | cat $^ > $@
128 |
129 | dist/ocr-gt-tools.css: ocr-gt-tools.styl
130 | $(STYLUS) < $< > $@
131 |
132 | dist/fonts:
133 | $(MKDIR) $@
134 | $(CP) ${FONT_FILES} $@
135 |
136 | dist/fonts.css: dist/fonts
137 | $(WEBFONTDL) -o $@ --font-out=dist/fonts $(FONT_URLS) && wait
138 |
139 | dist/vendor.css: ${CSS_FILES} dist/fonts.css
140 | cat dist/fonts.css ${CSS_FILES} \
141 | | sed 's,\.\./fonts,fonts,g' \
142 | > dist/temp.css
143 | $(CURL) ${CSS_URLS} >> dist/temp.css
144 | $(CLEANCSS) --skip-rebase --output $@ dist/temp.css
145 | $(RM) dist/temp.css
146 |
147 | dist/vendor.js: ${VENDOR_JS_FILES}
148 | $(UGLIFYJS) --output $@ \
149 | --prefix 1 \
150 | --source-map $@.map \
151 | --source-map-url vendor.js.map \
152 | $^
153 |
154 | # sed "s,\(=.\)dist/,\1,g" $< | $(PUG) > $@
155 | dist/index.html: ${PUG_FILES}
156 | $(MKDIR) dist
157 | $(PUG) < $< > $@
158 |
159 | #
160 | # Clean up, delete files
161 | #
162 |
163 | clean-fonts:
164 | $(RM) dist/fonts dist/fonts.css
165 |
166 | clean-%:
167 | $(RM) dist/$* dist/*.$* dist/*.$*.map
168 |
169 | clean: clean-js clean-css clean-fonts clean-html
170 |
171 | realclean:
172 | $(RM) node_modules
173 | $(RM) dist
174 |
175 | test:
176 | bash ./test.sh
177 |
--------------------------------------------------------------------------------
/dist/error-tags.json:
--------------------------------------------------------------------------------
1 | {
2 | "wrong-image-section": {
3 | "name": {
4 | "en": "Incorrectly captured image section which contains no straight line of text but empty spaces or page margins etc.",
5 | "de": "Falsch erfasste Bildausschnitte, die keine Textzeile enthalten, sondern leere Seitenbereiche oder Seitenränder usw."
6 | },
7 | "total": 443,
8 | "frequencyAvg": 5.7,
9 | "id": "wrong-image-section"
10 | },
11 | "text-blocked": {
12 | "name": {
13 | "en": "Blocked text with visible blanks between the letters",
14 | "de": "Gesperrter Text mit sichtbaren Leerzeichen zwischen den Buchstaben"
15 | },
16 | "total": 51,
17 | "frequencyAvg": 0.7,
18 | "id": "text-blocked"
19 | },
20 | "text-italic": {
21 | "name": {
22 | "en": "Line completely or in parts in italics",
23 | "de": "Zeile vollständig oder teilweise in kursiver Schrift"
24 | },
25 | "total": 156,
26 | "frequencyAvg": 2,
27 | "id": "text-italic"
28 | },
29 | "initial": {
30 | "name": {
31 | "en": "initial character",
32 | "de": "Initiale"
33 | },
34 | "comment": {
35 | "en": "This most likely causes wrong-image-section"
36 | },
37 | "total": 10,
38 | "frequencyAvg": 0.1,
39 | "id": "initial"
40 | },
41 | "letter-faded": {
42 | "name": {
43 | "en": "Partially or completely faded letters",
44 | "de": "Teilweise oder vollständig ausgebleichte Buchstaben"
45 | },
46 | "total": 6,
47 | "frequencyAvg": 0.1,
48 | "id": "letter-faded"
49 | },
50 | "notes-within-line": {
51 | "name": {
52 | "en": "Notes on page margin captured within a text line",
53 | "de": "Anmerkungen am Seitenrand mit Textzeile erfasst"
54 | },
55 | "total": 93,
56 | "frequencyAvg": 1.2,
57 | "id": "notes-within-line"
58 | },
59 | "notes-separate": {
60 | "name": {
61 | "en": "Notes on page margin captured as separate lines",
62 | "de": "Anmerkungen am Seitenrand als separate Zeilen erfasst"
63 | },
64 | "total": 95,
65 | "frequencyAvg": 1.2,
66 | "id": "notes-separate"
67 | },
68 | "letter-handling-unclear": {
69 | "name": {
70 | "en": "Characters whose treatment is not yet clear",
71 | "de": "Buchstaben deren Behandlung noch nicht klar ist"
72 | },
73 | "comment": {
74 | "de": "Zum Beispiel q mit Akut, que-Ligatur"
75 | },
76 | "total": 17,
77 | "frequencyAvg": 0.2,
78 | "id": "letter-handling-unclear"
79 | },
80 | "line-incomplete": {
81 | "name": {
82 | "en": "line not captured completely",
83 | "de": "Zeile nicht vollständig erfasst"
84 | },
85 | "comment": {
86 | "de": "Zeile wurde zwar korrekt erfasst, aber Buchstaben links oder rechts in der Zeile fehlen",
87 | "en": "Line was captured correctly, but letters in the left or right of the line are missing"
88 | },
89 | "total": 33,
90 | "frequencyAvg": 0.4,
91 | "id": "line-incomplete"
92 | },
93 | "line-incorrect": {
94 | "name": {
95 | "en": "line not captured correctly",
96 | "de": "Zeile nicht richtig erfasst"
97 | },
98 | "comment": {
99 | "en": "more than just one line inside the image; line lies at an angle in image",
100 | "de": "Mehrere Zeilen im Bild erfasst; Zeile liegt schräg im Bild"
101 | },
102 | "total": 57,
103 | "frequencyAvg": 0.7,
104 | "id": "line-incorrect"
105 | },
106 | "line-captured-twice": {
107 | "name": {
108 | "en": "line partially or completely captured twice",
109 | "de": "Zeile teilweise oder vollständig doppelt erfasst"
110 | },
111 | "total": 33,
112 | "frequencyAvg": 0.4,
113 | "id": "line-captured-twice"
114 | },
115 | "text-greek": {
116 | "name": {
117 | "en": "Greek text",
118 | "de": "Griechischer Text"
119 | },
120 | "total": 3,
121 | "frequencyAvg": 0,
122 | "id": "text-greek"
123 | },
124 | "letter-unidentified": {
125 | "name": {
126 | "en": "Letter not yet identified",
127 | "de": "Noch nicht genau identifizierter Buchstabe"
128 | },
129 | "comment": {
130 | "en": "May be similar with letter-handling-unclear"
131 | },
132 | "total": 12,
133 | "frequencyAvg": 0.2,
134 | "id": "letter-unidentified"
135 | },
136 | "letter-unreadable": {
137 | "name": {
138 | "en": "letter not faded but still unreadable",
139 | "de": "Buchstabe nicht ausgebleicht aber trotzdem unleserlich"
140 | },
141 | "total": 2,
142 | "frequencyAvg": 0,
143 | "id": "letter-unreadable"
144 | },
145 | "line-not-in-order": {
146 | "name": {
147 | "en": "line not captured in correct order",
148 | "de": "Zeile nicht in richtiger Reihenfolge erfasst"
149 | },
150 | "total": 4,
151 | "frequencyAvg": 0.1,
152 | "id": "line-not-in-order"
153 | },
154 | "dividing-line": {
155 | "name": {
156 | "en": "Dividing line captured as line",
157 | "de": "Trennlinie als Zeile erfasst"
158 | },
159 | "total": 2,
160 | "id": "dividing-line"
161 | }
162 | }
163 |
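Some entries above carry a `comment` in only one language (for example `initial` has only `en`), and some have no `comment` at all. A minimal sketch of how a consumer might pick a display string with a language fallback follows; `localizedText` and `sampleTags` are hypothetical names introduced for illustration and are not part of the project's code.

```javascript
// Hypothetical helper: return field[lang], else field[fallbackLang], else ''.
// `field` is a {en: ..., de: ...} object as used for "name" and "comment" above.
function localizedText(field, lang, fallbackLang) {
    if (!field) return '';
    if (typeof field[lang] === 'string') return field[lang];
    if (typeof field[fallbackLang] === 'string') return field[fallbackLang];
    return '';
}

// Two entries copied from the data above, reduced to the fields used here.
var sampleTags = {
    'initial': {
        'name': { 'en': 'initial character', 'de': 'Initiale' },
        'comment': { 'en': 'This most likely causes wrong-image-section' },
        'id': 'initial'
    },
    'text-greek': {
        'name': { 'en': 'Greek text', 'de': 'Griechischer Text' },
        'id': 'text-greek'
    }
};

// German name exists, German comment does not, so the comment falls back to English.
var initialName = localizedText(sampleTags['initial'].name, 'de', 'en');
var initialComment = localizedText(sampleTags['initial'].comment, 'de', 'en');
// No comment object at all yields an empty string.
var greekComment = localizedText(sampleTags['text-greek'].comment, 'de', 'en');
```

Returning an empty string instead of `undefined` keeps template output clean when a tag has no comment.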
--------------------------------------------------------------------------------
/doc/error-tags.json:
--------------------------------------------------------------------------------
1 | {
2 | "wrong-image-section": {
3 | "name": {
4 | "en": "Incorrectly captured image section which contains no straight line of text but empty spaces or page margins etc.",
5 | "de": "Falsch erfasste Bildausschnitte, die keine Textzeile enthalten, sondern leere Seitenbereiche oder Seitenränder usw."
6 | },
7 | "total": 443,
8 | "frequencyAvg": 5.7,
9 | "id": "wrong-image-section"
10 | },
11 | "text-blocked": {
12 | "name": {
13 | "en": "Blocked text with visible blanks between the letters",
14 | "de": "Gesperrter Text mit sichtbaren Leerzeichen zwischen den Buchstaben"
15 | },
16 | "total": 51,
17 | "frequencyAvg": 0.7,
18 | "id": "text-blocked"
19 | },
20 | "text-italic": {
21 | "name": {
22 | "en": "Line completely or in parts in italics",
23 | "de": "Zeile vollständig oder teilweise in kursiver Schrift"
24 | },
25 | "total": 156,
26 | "frequencyAvg": 2,
27 | "id": "text-italic"
28 | },
29 | "initial": {
30 | "name": {
31 | "en": "initial character",
32 | "de": "Initiale"
33 | },
34 | "comment": {
35 | "en": "This most likely causes wrong-image-section"
36 | },
37 | "total": 10,
38 | "frequencyAvg": 0.1,
39 | "id": "initial"
40 | },
41 | "letter-faded": {
42 | "name": {
43 | "en": "Partially or completely faded letters",
44 | "de": "Teilweise oder vollständig ausgebleichte Buchstaben"
45 | },
46 | "total": 6,
47 | "frequencyAvg": 0.1,
48 | "id": "letter-faded"
49 | },
50 | "notes-within-line": {
51 | "name": {
52 | "en": "Notes on page margin captured within a text line",
53 | "de": "Anmerkungen am Seitenrand mit Textzeile erfasst"
54 | },
55 | "total": 93,
56 | "frequencyAvg": 1.2,
57 | "id": "notes-within-line"
58 | },
59 | "notes-separate": {
60 | "name": {
61 | "en": "Notes on page margin captured as separate lines",
62 | "de": "Anmerkungen am Seitenrand als separate Zeilen erfasst"
63 | },
64 | "total": 95,
65 | "frequencyAvg": 1.2,
66 | "id": "notes-separate"
67 | },
68 | "letter-handling-unclear": {
69 | "name": {
70 | "en": "Characters whose treatment is not yet clear",
71 | "de": "Buchstaben deren Behandlung noch nicht klar ist"
72 | },
73 | "comment": {
74 | "de": "Zum Beispiel q mit Akut, que-Ligatur"
75 | },
76 | "total": 17,
77 | "frequencyAvg": 0.2,
78 | "id": "letter-handling-unclear"
79 | },
80 | "line-incomplete": {
81 | "name": {
82 | "en": "line not captured completely",
83 | "de": "Zeile nicht vollständig erfasst"
84 | },
85 | "comment": {
86 | "de": "Zeile wurde zwar korrekt erfasst, aber Buchstaben links oder rechts in der Zeile fehlen",
87 | "en": "Line was captured correctly, but letters in the left or right of the line are missing"
88 | },
89 | "total": 33,
90 | "frequencyAvg": 0.4,
91 | "id": "line-incomplete"
92 | },
93 | "line-incorrect": {
94 | "name": {
95 | "en": "line not captured correctly",
96 | "de": "Zeile nicht richtig erfasst"
97 | },
98 | "comment": {
99 | "en": "more than just one line inside the image; line lies at an angle in image",
100 | "de": "Mehrere Zeilen im Bild erfasst; Zeile liegt schräg im Bild"
101 | },
102 | "total": 57,
103 | "frequencyAvg": 0.7,
104 | "id": "line-incorrect"
105 | },
106 | "line-captured-twice": {
107 | "name": {
108 | "en": "line partially or completely captured twice",
109 | "de": "Zeile teilweise oder vollständig doppelt erfasst"
110 | },
111 | "total": 33,
112 | "frequencyAvg": 0.4,
113 | "id": "line-captured-twice"
114 | },
115 | "text-greek": {
116 | "name": {
117 | "en": "Greek text",
118 | "de": "Griechischer Text"
119 | },
120 | "total": 3,
121 | "frequencyAvg": 0,
122 | "id": "text-greek"
123 | },
124 | "letter-unidentified": {
125 | "name": {
126 | "en": "Letter not yet identified",
127 | "de": "Noch nicht genau identifizierter Buchstabe"
128 | },
129 | "comment": {
130 | "en": "May be similar with letter-handling-unclear"
131 | },
132 | "total": 12,
133 | "frequencyAvg": 0.2,
134 | "id": "letter-unidentified"
135 | },
136 | "letter-unreadable": {
137 | "name": {
138 | "en": "letter not faded but still unreadable",
139 | "de": "Buchstabe nicht ausgebleicht aber trotzdem unleserlich"
140 | },
141 | "total": 2,
142 | "frequencyAvg": 0,
143 | "id": "letter-unreadable"
144 | },
145 | "line-not-in-order": {
146 | "name": {
147 | "en": "line not captured in correct order",
148 | "de": "Zeile nicht in richtiger Reihenfolge erfasst"
149 | },
150 | "total": 4,
151 | "frequencyAvg": 0.1,
152 | "id": "line-not-in-order"
153 | },
154 | "dividing-line": {
155 | "name": {
156 | "en": "Dividing line captured as line",
157 | "de": "Trennlinie als Zeile erfasst"
158 | },
159 | "total": 2,
160 | "id": "dividing-line"
161 | }
162 | }
163 |
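Each entry carries a `total` count, and most (but not all) carry a `frequencyAvg`; `dividing-line` has none. The sketch below shows one possible way to rank tags by `total` while tolerating the missing field. `rankErrorTags` and `sampleErrorTags` are hypothetical names used only for this illustration.

```javascript
// Hypothetical helper: turn the tag map into a list sorted by descending total.
// A missing frequencyAvg is reported as null rather than guessed.
function rankErrorTags(tags) {
    return Object.keys(tags)
        .map(function (key) {
            var tag = tags[key];
            return {
                id: tag.id || key,
                total: typeof tag.total === 'number' ? tag.total : 0,
                frequencyAvg: typeof tag.frequencyAvg === 'number' ? tag.frequencyAvg : null
            };
        })
        .sort(function (a, b) { return b.total - a.total; });
}

// Three entries copied from the data above, reduced to the numeric fields.
var sampleErrorTags = {
    'dividing-line': { 'total': 2, 'id': 'dividing-line' },
    'text-italic': { 'total': 156, 'frequencyAvg': 2, 'id': 'text-italic' },
    'wrong-image-section': { 'total': 443, 'frequencyAvg': 5.7, 'id': 'wrong-image-section' }
};

var rankedTags = rankErrorTags(sampleErrorTags);
```

Sorting a derived list leaves the original object untouched, so the same data can still be looked up by tag id.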
--------------------------------------------------------------------------------
/INSTALL.md:
--------------------------------------------------------------------------------
1 | # Installation Instructions
2 |
3 | * [Docker Quickstart](#docker-quickstart)
4 | * [Install dependencies](#install-dependencies)
5 | * [Create configuration](#create-configuration)
6 | * [Deploy on a server](#deploy-on-a-server)
7 | * [On Apache](#on-apache)
8 | * [Bundled standalone server](#bundled-standalone-server)
9 | * [Testing the server](#testing-the-server)
10 | * [Developing the frontend](#developing-the-frontend)
11 | * [Perl](#perl)
12 | * [Log-Files / Error-Log-Files](#log-files--error-log-files)
13 |
14 | ## Docker Quickstart
15 |
16 | To get the tool up and running in a docker container:
17 |
18 | ```
19 | git clone https://github.com/UB-Mannheim/ocr-gt-tools
20 | cd ocr-gt-tools
21 | ./dev/run-docker.sh " + xhr.responseText + "");
81 |     }
82 |     this.$el.trigger.apply(this.$el, arguments);
83 |     return this;
84 | };
85 |
86 | App.prototype.confirmExit = function confirmExit() {
87 |     if (this.currentPage && this.currentPage.changed) {
88 |         notie.alert(2, "Ungesicherte Inhalte vorhanden, bitte zuerst speichern!", 5);
89 |         return "Ungesicherte Inhalte vorhanden, bitte zuerst speichern!";
90 |     }
91 | };
92 |
93 | App.prototype.onHashChange = function onHashChange(e) {
94 |     e.preventDefault();
95 |     var newHash = window.location.hash;
96 |     console.log(e.oldURL);
97 |     if (!e.oldURL) {
98 |         console.info('HashChange (initial) -> ', newHash);
99 |     } else {
100 |         var oldHash = e.oldURL.substr(e.oldURL.indexOf('#'));
101 |         console.info('HashChange', oldHash, ' -> ', newHash);
102 |         if (oldHash === newHash) {
103 |             return;
104 |         }
105 |         if (this.confirmExit()) {
106 |             window.location.hash = '#' + this.currentPage.imageUrl;
107 |             return;
108 |         }
109 |     }
110 |     if (newHash.length > 2)
111 |         this.loadPage(newHash.substr(1));
112 | };
113 |
114 | App.prototype.showHistory = function() {
115 |     var self = this;
116 |     this.history.load(function(err) {
117 |         if (err) {
118 |             return self.emit('app:ajaxError', err);
119 |         }
120 |         self.historyView.render();
121 |     });
122 | };
123 |
124 | App.prototype.render = function() {
125 |
126 |     var self = this;
127 |
128 |     // Select mode initially off
129 |     this.selectMode = false;
130 |
131 |     // Setup event handlers for drag and drop
132 |     // TODO
133 |     setupDragAndDrop();
134 |
135 |     // Render views
136 |     this.waitingAnimation.render();
137 |     this.cheatsheetView.render();
138 |     this.toolbar.render();
139 |     this.selectbar.render();
140 |     this.dropzone.render();
141 |
142 |     this.on('app:loading', function hideSidebar() { self.sidebar.$el.addClass('hidden'); });
143 |     this.on('app:loading', function hidePageView() { self.pageView.$el.addClass('hidden'); });
144 |     this.on('app:loaded', function showSidebar() { self.sidebar.$el.removeClass('hidden'); });
145 |     this.on('app:loaded', function showPageView() { self.pageView.$el.removeClass('hidden'); });
146 | };
147 |
148 | App.prototype.init = function init() {
149 |     var self = this;
150 |
151 |     // window global events
152 |     window.onbeforeunload = self.confirmExit.bind(self);
153 |     window.onhashchange = self.onHashChange.bind(self);
154 |
155 |     // Set up views
156 |     this.pageView = new PageView({'el': "#file-correction",});
157 |     this.dropzone = new Dropzone({'el': '#dropzone'});
158 |     this.toolbar = new Toolbar({'el': '#toolbar'});
159 |     this.sidebar = new Sidebar({'el': '#right-sidebar'});
160 |     this.selectbar = new Selectbar({'el': '#select-bar'});
161 |     this.waitingAnimation = new WaitingAnimation({
162 |         'el': "#waiting-animation",
163 |         'model': this.cheatsheet
164 |     });
165 |     this.historyView = new HistoryView({
166 |         'el': "#history-modal",
167 |         'model': this.history,
168 |         'tpl': this.templates.historyItem,
169 |     });
170 |     this.cheatsheetView = new CheatsheetView({
171 |         'el': "#cheatsheet-modal",
172 |         'model': this.cheatsheet,
173 |         'tpl': this.templates.cheatsheetEntry,
174 |     });
175 |
176 |     // Load cheatsheet and errorTags
177 |     async.each([this.cheatsheet, this.errorTags], function(model, done) {
178 |         model.load(done);
179 |     }, function(err) {
180 |         if (err) return self.emit('app:ajaxError', err);
181 |         self.settings.load();
182 |         self.render();
183 |         // Trigger hash change
184 |         $(window).trigger('hashchange');
185 |         self.$el.trigger('app:initialized');
186 |     });
187 | };
188 |
189 | App.prototype.savePage = function savePage() {
190 |     var self = this;
191 |     let currentPage = window.app.currentPage;
192 |     if (!currentPage) {
193 |         notie.alert(1, "Nichts zu speichern", 1);
194 |     } else {
195 |         this.emit('app:saving');
196 |         window.app.currentPage.save(function(err) {
197 |             if (err) {
198 |                 self.emit('app:ajaxError', err);
199 |             } else {
200 |                 self.emit('app:saved');
201 |             }
202 |         });
203 |     }
204 | };
205 |
206 | App.prototype.loadPage = function loadPage(url) {
207 |     var self = this;
208 |     if (self.confirmExit()) return;
209 |     this.emit('app:loading');
210 |     this.currentPage = new Page(url);
211 |     this.currentPage.load(function(err) {
212 |         if (err) {
213 |             return self.emit('app:ajaxError', err);
214 |         }
215 |         self.pageView.model = self.currentPage;
216 |         self.pageView.render();
217 |         self.sidebar.model = self.currentPage;
218 |         self.sidebar.render();
219 |         self.emit('app:loaded');
220 |     });
221 | };
222 |
--------------------------------------------------------------------------------
/doc/user-scripts/scrape-wiki.user.js:
--------------------------------------------------------------------------------
1 | // ==UserScript==
2 | // @name Extract Special Characters
3 | // @namespace http://github.com/kba/
4 | // @include https://github.com/UB-Mannheim/ocr-gt-tools/wiki/Special-Characters
5 | // @include https://github.com/UB-Mannheim/ocr-gt-tools/wiki/Error-Tags
6 | // @description Extract special character data from ocr-gt-tools wiki
7 | // @version 1
8 | // @require https://code.jquery.com/jquery-2.2.3.min.js
9 | // @require https://cdnjs.cloudflare.com/ajax/libs/z-schema/3.17.0/ZSchema-browser.js
10 | // @grant GM_addStyle
11 | // @grant GM_setClipboard
12 | // ==/UserScript==
13 | /*globals GM_addStyle */
14 | /*globals ZSchema */
15 |
16 | var CSS = `
17 | pre.schema-error
18 | {
19 |     background: #a00;
20 |     color: white;
21 |     white-space: pre-wrap;
22 | }
23 | div#glyph-bar
24 | {
25 |     font-size: x-large;
26 |     position:fixed;
27 |     bottom: 0;
28 |     height: 48px;
29 |     border: 2px solid black;
30 |     background: white;
31 |     width: 100%;
32 | }
33 | div#glyph-bar .left * { float: left; }
34 | div#glyph-bar .right * { float: right; }
35 | div#glyph-bar *
36 | {
37 |     height: 100%;
38 |     font-size: x-large;
39 | }
40 | div#glyph-bar input[type='text']
41 | {
42 |     font-family: "Garamond", "Bookman", serif;
43 | }
44 | div#schema-bar
45 | {
46 |     position: fixed;
47 |     z-index: 3000;
48 |     top: 0;
49 |     background: #900;
50 |     color: white !important;
51 |     width: 100%;
52 |     font-size: x-large;
53 |     height: 48px;
54 |     border: 2px solid black;
55 | }
56 | div#schema-invalid
57 | {
58 |     display: none;
59 | }
60 | div#schema-invalid a
61 | {
62 |     display: inline-block;
63 |     color: white !important;
64 |     float: none;
65 |     margin: 0 2px;
66 | }
67 | `;
68 |
69 | var SCHEMAS = {
70 |     'Special-Characters': {
71 |         'type': 'object',
72 |         "additionalProperties": false,
73 |         'properties': {
74 |             'id': {
75 |                 'type': 'string',
76 |                 'pattern': '^[a-z0-9-]+$',
77 |             },
78 |             'sample': {
79 |                 'type': 'array',
80 |                 'items': {
81 |                     'type': 'string',
82 |                     'pattern': '^
${escapeHTML(JSON.stringify(err, null, 2))}`);
267 | $("#schema-invalid").show().append(
268 | `[${ $("#schema-invalid a").length + 1}]`);
269 | }
270 |
271 | $(function() {
272 | GM_addStyle(CSS);
273 | $("body").prepend(
274 | `
275 |
15 | P R E F A C E.
16 |
17 | dillinguer les principes des
18 |
19 | confequences & les regles des
20 |
21 | exceprjons s & c’et’t ce qu: fait
22 |
23 | une infljrution Il y a long rems
24 |
25 | que j’en voi la necelllre & que
26 |
27 | je desire qu’jl y en ait en tou-
28 |
29 | tes les marieres qu’jl importe
30 |
31 | de favoir. Ceft aum ce qui m’a
32 |
33 | porte ä compol'er le Carechle
34 |
35 | me historjque & le trajtä de la
36 |
37 | merhode des etudes. Sans ce ('e-
38 |
39 | cours on marche ä tärons , on
40 |
41 | commence par de petits details,
42 |
43 | on fuit Paurorjte du premier
44 |
45 | venu , on ne forme que des
46 |
47 | doutes sc des opinions incertai-
48 |
49 | nes.
50 |
51 |
52 |
53 |
54 | Tel eft l*etat des pures prati-
55 |
56 | ciens, qui n'apprennent la juer-
57 |
58 | pruclence canonjque que com-
59 |
60 | meles artjfans apprcnnent les
61 |
62 | metjers leg plus vle Sen voyant
63 |
64 | travailler leurs mairres , & re-
65 |
66 | tenant ce quïls leur difent ä
67 |
68 |
69 |
70 |
71 | lll]
72 |
73 |
74 |
77 |
78 |
79 |
80 |