├── .gitignore
├── Dockerfile
├── README.md
├── docker-compose.yml
├── index.html
└── larex.properties


/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 | 
3 | *.iml


--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
 1 | ARG BASE_IMAGE_TAG=latest
 2 | 
 3 | FROM uniwuezpd/ocr4all_base:$BASE_IMAGE_TAG
 4 | 
 5 | ARG OCR4LL_BRANCH=master
 6 | ARG LAREX_BRANCH=master
 7 | ARG OCR4ALL_HELPER_SCRIPTS_BRANCH=master
 8 | 
 9 | ENV OCR4ALL_VERSION="0.6.1"
10 | ENV LAREX_VERSION="0.7.0"
11 | 
12 | # Install the helper scripts so that they are available to the Java environment
13 | RUN git clone -b ${OCR4ALL_HELPER_SCRIPTS_BRANCH} https://github.com/OCR4all/OCR4all_helper-scripts /opt/OCR4all_helper-scripts
14 | WORKDIR /opt/OCR4all_helper-scripts
15 | RUN python3 -m pip install .
16 | 
17 | # Clone OCR4all and LAREX
18 | RUN git clone --depth 1 --branch ${OCR4LL_BRANCH} https://github.com/OCR4all/OCR4all /tmp/OCR4all
19 | RUN git clone --depth 1 --branch ${LAREX_BRANCH} https://github.com/OCR4all/LAREX /tmp/LAREX
20 | # Build OCR4all and LAREX
21 | WORKDIR /tmp/OCR4all
22 | RUN mvn clean install -f pom.xml
23 | RUN cp target/ocr4all.war /usr/local/tomcat/webapps/.
24 | WORKDIR /tmp/LAREX
25 | RUN mvn clean install -f pom.xml
26 | RUN cp target/Larex.war /usr/local/tomcat/webapps/.
27 | 
28 | RUN rm -r /tmp/*
29 | 
30 | # Add a landing page so that the server root URL (without a tool path) links to the web tools
31 | COPY index.html /usr/local/tomcat/webapps/ROOT/index.html
32 | 
33 | # Copy larex.properties
34 | COPY larex.properties /larex.properties
35 | ENV LAREX_CONFIG=/larex.properties
36 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # OCR4all - Docker image 
  2 | 
  3 | Provides OCR (optical character recognition) services through web applications.
  4 | 
  5 | ## Getting Started
  6 | 
  7 | These instructions will get you a [Docker container](https://www.docker.com/what-container) that runs the project.
  8 | 
  9 | ### Prerequisites
 10 | 
 11 | [Docker](https://www.docker.com) (for installation instructions see the [Official Installation Guide](https://docs.docker.com/install/))
 12 | 
 13 | ### Installing
 14 | 
 15 | #### Get the Docker Image
 16 | From Docker Hub:
 17 | * Execute the following command: `docker pull uniwuezpd/ocr4all`
 18 | 
 19 | or
 20 | 
 21 | From Source:
 22 | * Download the [Dockerfile](Dockerfile) together with `index.html` and `larex.properties` (the Dockerfile copies both into the image), or clone this repository, and enter the directory that contains them with a command line tool.
 23 | 
 24 | * Execute the following command inside the directory: `docker build -t <IMAGE_NAME> .`
 25 | 
 26 | (We recommend `uniwuezpd/ocr4all` as the image name.)
 27 | 
 28 | #### Initialize Container
 29 | A container can now be created from the image with the following command:
 30 | ```
 31 | docker run \
 32 |     -p 8080:8080 \
 33 |     -u `id -u root`:`id -g $USER` \
 34 |     --name ocr4all \
 35 |     -v <OCR_DATA_DIR>:/var/ocr4all/data \
 36 |     -v <OCR_MODEL_DIR>:/var/ocr4all/models/custom \
 37 |     -it <IMAGE_NAME>
 38 | ```
 39 | 
 40 | Explanation of variables used above:
 41 | * `<IMAGE_NAME>` - Name of the Docker image e.g. uniwuezpd/ocr4all
 42 | * `<OCR_DATA_DIR>` - Directory in which the OCR data is located on your local machine
 43 | * `<OCR_MODEL_DIR>` - Directory in which the OCR models are located on your local machine
 44 | 
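As a sketch, assuming a POSIX shell, the placeholders could be filled in by a small wrapper. The function name `run_ocr4all` and any host paths you pass to it are hypothetical. It creates the host directories first, because `docker run -v` otherwise creates missing source directories itself (owned by root), and it resolves them to absolute paths for the bind mounts.

```shell
# Hypothetical helper that fills in the placeholders of the docker run
# command above. Usage: run_ocr4all <OCR_DATA_DIR> <OCR_MODEL_DIR> [IMAGE_NAME]
run_ocr4all() {
    data_dir=$1
    model_dir=$2
    image=${3:-uniwuezpd/ocr4all}

    # Create the host directories first; otherwise "docker run -v" creates
    # missing source directories itself, owned by root.
    mkdir -p "$data_dir" "$model_dir" || return 1
    # Bind mounts need absolute host paths
    data_dir=$(cd "$data_dir" && pwd) || return 1
    model_dir=$(cd "$model_dir" && pwd) || return 1

    # Same options as above; "id -g" yields the current user's group,
    # which corresponds to `id -g $USER` in the original command
    docker run \
        -p 8080:8080 \
        -u "$(id -u root):$(id -g)" \
        --name ocr4all \
        -v "$data_dir":/var/ocr4all/data \
        -v "$model_dir":/var/ocr4all/models/custom \
        -it "$image"
}
```

For example: `run_ocr4all "$HOME/ocr4all/data" "$HOME/ocr4all/models"` (hypothetical paths).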
 45 | The container starts automatically after the `docker run` command is executed.
 46 | 
 47 | To start the container again later, use `docker ps -a` to list all containers with their IDs and names, then run `docker start -ia ocr4all` to start the container and attach to it.
 48 | 
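The two cases (first run and later restarts) could be told apart with a small helper. This is a sketch; the function name `start_ocr4all` is hypothetical, and it relies on `docker container inspect` failing when no container with the given name exists.

```shell
# Hypothetical helper: restart the existing "ocr4all" container if it exists,
# otherwise ask the user to create it with the docker run command above.
# Usage: start_ocr4all [CONTAINER_NAME]
start_ocr4all() {
    name=${1:-ocr4all}
    if docker container inspect "$name" >/dev/null 2>&1; then
        # Container exists: start it and attach to it
        docker start -ia "$name"
    else
        echo "No container named '$name' found; create it with 'docker run' first." >&2
        return 1
    fi
}
```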
 49 | You can now access the project via the following URL: http://localhost:8080/ocr4all/
 50 | 
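Tomcat may need some time to deploy `ocr4all.war` and `Larex.war` after the container starts, so the URL may not answer immediately. A minimal sketch that polls the URL with `curl` until it responds; the function name, the 120-second default limit and the two-second interval are assumptions, not values taken from this project.

```shell
# Hypothetical helper: poll a URL until it answers with a successful HTTP
# status or the time limit is reached.
# Usage: wait_for_ocr4all [URL] [MAX_SECONDS]
wait_for_ocr4all() {
    url=${1:-http://localhost:8080/ocr4all/}
    max_wait=${2:-120}
    waited=0
    # curl -f fails on HTTP errors (e.g. 404 while the webapp is still
    # deploying); -s keeps the output quiet while polling
    until curl -fs -o /dev/null "$url"; do
        if [ "$waited" -ge "$max_wait" ]; then
            echo "Gave up waiting for $url after ${max_wait}s" >&2
            return 1
        fi
        sleep 2
        waited=$((waited + 2))
    done
    echo "$url is reachable"
}
```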
 51 | ### Updating
 52 | #### From Docker Hub:
 53 | 
 54 | If the image was previously pulled from Docker Hub, it can be updated from there as well.
 55 | 
 56 | The following command updates the image (an existing container keeps using the old image until it is removed and created again):
 57 | ```
 58 | docker pull uniwuezpd/ocr4all
 59 | ```
 60 | 
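A sketch of the complete update, assuming the container was created with the name `ocr4all` as in "Initialize Container"; the function name is hypothetical. The OCR data and models survive because they live in the mounted host directories, but any other files stored only inside the container are lost when it is removed.

```shell
# Hypothetical helper: pull the latest image and remove the old container so
# that the next "docker run" (see "Initialize Container") uses the new image.
# Usage: update_ocr4all [IMAGE_NAME] [CONTAINER_NAME]
update_ocr4all() {
    image=${1:-uniwuezpd/ocr4all}
    name=${2:-ocr4all}
    docker pull "$image" || return 1
    # Stopping/removing fails harmlessly if the container does not exist
    docker stop "$name" >/dev/null 2>&1
    docker rm "$name" >/dev/null 2>&1
    echo "Updated $image; create the container again with 'docker run'."
}
```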
 61 | #### From Source:
 62 | 
 63 | To update the source code of the project, the image currently needs to be rebuilt.
 64 | 
 65 | To do so, first remove the existing image (any container created from it, e.g. `ocr4all`, must be removed beforehand with `docker rm ocr4all`):
 66 | ```
 67 | docker image rm <IMAGE_NAME>
 68 | ```
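 69 | Afterwards, follow the installation guide above as for a new, clean installation. Pass `--no-cache` to `docker build` so that the `git clone` steps in the Dockerfile fetch the latest sources instead of reusing cached layers.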
 70 | 
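The removal and rebuild steps could be combined into one command. A sketch, assuming the container is named `ocr4all` and the command is run in the directory that contains the Dockerfile; the function name and argument order are hypothetical. It passes `--no-cache` because the Dockerfile fetches the sources with `git clone` in `RUN` steps whose cached layers would otherwise be reused, and it sets two of the build arguments the Dockerfile declares (`OCR4LL_BRANCH`, spelled exactly as in the Dockerfile, and `LAREX_BRANCH`).

```shell
# Hypothetical helper: rebuild the image from source without reusing cached
# layers, so that the "git clone" steps in the Dockerfile fetch fresh sources.
# Usage: rebuild_ocr4all [IMAGE_NAME] [OCR4ALL_BRANCH] [LAREX_BRANCH]
rebuild_ocr4all() {
    image=${1:-uniwuezpd/ocr4all}
    ocr4all_branch=${2:-master}
    larex_branch=${3:-master}

    # The image can only be removed once no container uses it
    # (this deletes the container named "ocr4all", if any)
    docker rm ocr4all >/dev/null 2>&1
    docker image rm "$image" >/dev/null 2>&1

    # OCR4LL_BRANCH is spelled exactly as the ARG in the Dockerfile
    docker build --no-cache \
        --build-arg OCR4LL_BRANCH="$ocr4all_branch" \
        --build-arg LAREX_BRANCH="$larex_branch" \
        -t "$image" .
}
```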
 71 | ## Development
 72 | 
 73 | If you want shell access to your Docker container for development or testing purposes, create the container with the following command (note the `--entrypoint` option):
 74 | ```
 75 | docker run \
 76 |     -p 8080:8080 \
 77 |     --entrypoint /bin/bash \
 78 |     -v <OCR_DATA_DIR>:/var/ocr4all/data \
 79 |     -v <OCR_MODEL_DIR>:/var/ocr4all/models/custom \
 80 |     -it <IMAGE_NAME>
 81 | ```
 82 | 
 83 | The container starts automatically after the `docker run` command is executed.
 84 | 
 85 | To start the container again later, use `docker ps -a` to list all containers with their IDs, then run `docker start <CONTAINER_ID>`. To regain shell access, use `docker attach <CONTAINER_ID>`.
 86 | 
 87 | Because the entrypoint is overridden, the services do not start automatically; run the following command inside the container after it has started:
 88 | ```
 89 | /usr/bin/supervisord
 90 | ```
 91 | 
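Instead of attaching and starting the services by hand, the same step could be run from the host with `docker exec`. A sketch, assuming the container ID (or name) is passed as the first argument; `-d` runs `supervisord` detached so that the command returns immediately. The function name is hypothetical.

```shell
# Hypothetical helper for the development container: start it (it keeps
# /bin/bash as entrypoint) and launch the services via supervisord.
# Usage: start_dev_services <CONTAINER_ID>
start_dev_services() {
    if [ -z "$1" ]; then
        echo "usage: start_dev_services <CONTAINER_ID>" >&2
        return 2
    fi
    docker start "$1" || return 1
    # Run supervisord inside the container without attaching to it
    docker exec -d "$1" /usr/bin/supervisord
}
```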
 92 | For information on how to update the project, take a look at the commands in the [Dockerfile](Dockerfile).
 93 | 
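For example, the Dockerfile steps that build the OCR4all web application (`git clone`, `mvn clean install`, copying `ocr4all.war` into Tomcat's `webapps` directory) might be repeated inside the development container roughly as follows. This is a sketch: the function name and the use of a `mktemp -d` directory instead of `/tmp/OCR4all` are assumptions, and it presumes `git` and `mvn`, which the Dockerfile itself uses, are available in the container. With Tomcat's default `autoDeploy` setting a replaced WAR file is usually redeployed automatically; otherwise restart the services. LAREX could be handled the same way with `Larex.war`.

```shell
# Hypothetical helper, to be run inside the development container: rebuild
# OCR4all from a branch and replace the deployed WAR (mirrors the Dockerfile).
# Usage: update_ocr4all_webapp [BRANCH]
update_ocr4all_webapp() {
    branch=${1:-master}
    src=$(mktemp -d) || return 1
    git clone --depth 1 --branch "$branch" https://github.com/OCR4all/OCR4all "$src" &&
        (cd "$src" && mvn clean install -f pom.xml) &&
        cp "$src/target/ocr4all.war" /usr/local/tomcat/webapps/.
    status=$?
    # Clean up the checkout, as the Dockerfile does with "rm -r /tmp/*"
    rm -rf "$src"
    return "$status"
}
```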
 94 | ## Built With
 95 | 
 96 | * [Docker](https://www.docker.com) - Platform and Software Deployment
 97 | * [Maven](https://maven.apache.org/) - Dependency Management
 98 | * [Spring](https://spring.io/) - Java Framework
 99 | * [Materialize](http://materializecss.com/) - Front-end Framework
100 | * [jQuery](https://jquery.com/) - JavaScript Library
101 | 
102 | ## Included Projects
103 | 
104 | * [LAREX](https://github.com/chreul/LAREX) - Layout analysis on early printed books
105 | * [OCRopus](https://github.com/tmbdev/ocropy) - Collection of document analysis programs
106 | * [calamari](https://github.com/ChWick/calamari) - OCR Engine based on OCRopy and Kraken
107 | 
108 | 


--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
 1 | version: "3"
 2 | services:
 3 |   latest:
 4 |     build:
 5 |       context: .
 6 |     image: uniwuezpd/ocr4all:latest
 7 |   staging:
 8 |     build:
 9 |       context: .
10 |     image: uniwuezpd/ocr4all:staging
11 |   dev:
12 |     build:
13 |       context: .
14 |     image: uniwuezpd/ocr4all:dev
15 | 


--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
 1 | <!doctype html>
 2 | <html lang="en">
 3 | 	<head>
 4 | 		<meta charset="UTF-8">
 5 | 		<title>OCR4all</title>
 6 | 		<style>
 7 | 			* {
 8 | 				box-sizing:border-box;
 9 | 				font-family: Georgia, serif;
10 | 			}
11 | 			body {
12 | 				margin: 0px;
13 | 				padding: 0px;
14 | 			}
15 | 			h1 {
16 | 				background-color:#283593;
17 | 				color: white;
18 | 				padding: 20px 40px;
19 | 				margin-top: 0;
20 | 				font-size: 60px;
21 | 				font-weight:normal;
22 | 			}
23 | 			article {
24 | 				box-shadow: 0 2px 2px 0 rgba(0,0,0,0.14),0 1px 5px 0 rgba(0,0,0,0.12),0 3px 1px -2px rgba(0,0,0,0.2);
25 | 				border-top: 1px solid #ddd;
26 | 				border-right: 1px solid #ddd;
27 | 				border-left: 1px solid #ddd;
28 | 				margin: .5rem 0 1rem 0;
29 | 				width: 90%;
30 | 				max-width: 1280px;
31 | 				padding: 10px 30px;
32 | 				margin: 0px auto;
33 | 			}
34 | 			ul {
35 | 				list-style:none;	
36 | 				padding: 0;
37 | 			}
38 | 			li {
39 | 				border-top: 1px solid #ddd;
40 | 				border-right: 1px solid #ddd;
41 | 				border-left: 1px solid #ddd;
42 | 				margin: 0;
43 | 			}
44 | 			a {
45 | 				width:100%;
46 | 				padding: 5px 20px;
47 | 				display: block;
48 | 				text-decoration: none;
49 | 				color: #5c6bc0;
50 | 			}
51 | 			a:hover{
52 | 				background-color: #5c6bc0;
53 | 				color: white;
54 | 			}
55 | 			li:last-child {
56 | 				border-bottom: 1px solid #ddd;
57 | 			}
58 | 			@media only screen and (min-width: 601px){
59 | 				article {
60 | 				    width: 85%;
61 | 				    max-width: none;
62 | 				}
63 | 			}
69 | 		</style>
70 | 	</head>
71 | 	<body>
72 | 		<h1>OCR4all - Tools</h1>
73 | 		<article>
74 | 			<p>Web tools:</p>
75 | 			<ul>
76 | 				<li><a href="/ocr4all">OCR4all - Main web interface</a></li>
77 | 				<li><a href="/Larex">Larex - Segmentation and Ground Truth production tool</a></li>
78 | 			</ul>
79 | 		</article>
80 | 	</body>
81 | </html>
82 | 


--------------------------------------------------------------------------------
/larex.properties:
--------------------------------------------------------------------------------
 1 | ###### Configuration file for LAREX ######
 2 | ### How to: ###
 3 | # Comments: #
 4 | # <setting>=<value>
 5 | # Empty or commented out settings are interpreted as default
 6 | 
 7 | # Set the accessible modes in the LAREX GUI <value>=[[segment][edit][lines][text]]
 8 | # A combination of the modes "segment", "edit", "lines" and "text" can be set as
 9 | # a space separated string.
10 | # e.g. modes:segment lines
11 | # The order of those modes in the string also determines which mode is opened
12 | # on startup, with the first in the list being opened as main mode.
13 | # The mode "segment" can be replaced with "edit" in order to hide all auto
14 | # segmentation features. ("edit" will be ignored if both are present)
15 | # [Default]modes:segment lines text
16 | # LAREX will display any of those modes
17 | #modes=<value>
18 | 
19 | # Set the file path of the books folder.
20 | # e.g. bookpath:/home/user/books (Linux)
21 | # e.g. bookpath:C:\Users\user\Documents\books (Windows)
 22 | # LAREX will load the books from this folder.
23 | # [default <LAREX>/src/main/webapp/resources/books]
24 | bookpath=/var/ocr4all/library
25 | 
26 | # Save the pageXML locally <mode>=[bookpath|savedir|none]
27 | # bookpath: save the pageXML in the bookpath
28 | # savedir: save the pageXML in a defined savedir
29 | # none: do not save the pageXML locally [default]
30 | # e.g. localsave:bookpath
31 | localsave=bookpath
32 | 
33 | # Save location for the localsave mode "savedir"
34 | # Will be used if localsave mode is set to "savedir"
35 | # e.g. savedir:/home/user/save (Linux)
36 | # e.g. savedir:C:\Users\user\Documents\save (Windows)
37 | #savedir=<path>
38 | 
39 | # Download the pageXML in the web browser after saving
40 | # <value>=[true|false]
41 | # true: download pageXML after saving [default]
42 | # false: no action after saving
43 | # e.g. websave:true
44 | websave=false
45 | 
46 | # Filter to select specific images via their sub extensions.
47 | # LAREX will only display images that include that sub extension and will group images with the
48 | # same base name up to the sub extension together. Comprised of a list of sub extensions,
 49 | # separated by spaces. Use "." to refer to images without sub extension. [default: no filter]:
50 | # <value>=<extensions>
51 | # e.g. imagefilter:bin nrm
52 | # Pages folder input: image.bin.png, image.png, image2.bin.png, image3.bin
53 | # Filtered pages: image.bin.png, image2.bin.png
54 | # (Images will point to *.bin.png, but will be named without the .bin. image.bin.png=image.png
55 | # 	Images with the same base name will be grouped together, with the same order as described in the filter)
56 | imagefilter=bin nrm desp
57 | 
58 | # Enable/Disable OCR4all UI mode
59 | # This setting allows displaying and/or hiding certain UI elements when LAREX is used in combination with
60 | # OCR4all.
61 | # enable: enable OCR4all UI mode
62 | # disable: disable OCR4all UI mode [default]
63 | # e.g. ocr4all:enable
64 | ocr4all=enable
65 | 
66 | # Set Colors for DiffView in TextViewer mode
67 | # This setting allows adjusting the colors for better contrast or better readability regarding color blindness.
68 | # All valid CSS colors are accepted.
69 | # e.g.:diff_insert_color=green
70 | #
71 | # Defaults: diff_insert_color=#58e123
72 | #           diff_delete_color=#e56123
73 | #diff_insert_color=<value>
74 | #diff_delete_color=<value>
75 | 


--------------------------------------------------------------------------------