├── .gitignore
├── docker-compose.yml
├── Dockerfile
├── index.html
├── larex.properties
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 |
3 | *.iml
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | version: "3"
2 | services:
3 |   latest:
4 |     build:
5 |       context: .
6 |     image: uniwuezpd/ocr4all:latest
7 |   staging:
8 |     build:
9 |       context: .
10 |     image: uniwuezpd/ocr4all:staging
11 |   dev:
12 |     build:
13 |       context: .
14 |     image: uniwuezpd/ocr4all:dev
15 |
--------------------------------------------------------------------------------
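The compose file above defines three services (`latest`, `staging`, `dev`) that share one build context and differ only in the image tag. A minimal dry-run sketch of building them one at a time, assuming the classic `docker-compose` CLI is installed; the helper name `print_compose_builds` is hypothetical, and it only prints the commands rather than running them:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one build command per service
# defined in docker-compose.yml instead of executing it.
print_compose_builds() {
  for service in latest staging dev; do
    echo "docker-compose build $service"
  done
}

print_compose_builds
```

Piping the output to `sh` would run the builds for real. Since the compose file passes no build arguments, each service would be built with the defaults declared in the Dockerfile.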
/Dockerfile:
--------------------------------------------------------------------------------
1 | ARG BASE_IMAGE_TAG=latest
2 |
3 | FROM uniwuezpd/ocr4all_base:$BASE_IMAGE_TAG
4 |
5 | ARG OCR4LL_BRANCH=master
6 | ARG LAREX_BRANCH=master
7 | ARG OCR4ALL_HELPER_SCRIPTS_BRANCH=master
8 |
9 | ENV OCR4ALL_VERSION="0.6.1"
10 | ENV LAREX_VERSION="0.7.0"
11 |
12 | # Install the helper scripts so that they are available to the Java environment
13 | RUN git clone -b ${OCR4ALL_HELPER_SCRIPTS_BRANCH} https://github.com/OCR4all/OCR4all_helper-scripts /opt/OCR4all_helper-scripts
14 | WORKDIR /opt/OCR4all_helper-scripts
15 | RUN python3 -m pip install .
16 |
17 | # Clone OCR4all and LAREX
18 | RUN git clone --depth 1 --branch ${OCR4LL_BRANCH} https://github.com/OCR4all/OCR4all /tmp/OCR4all
19 | RUN git clone --depth 1 --branch ${LAREX_BRANCH} https://github.com/OCR4all/LAREX /tmp/LAREX
20 | # Build OCR4all and LAREX
21 | WORKDIR /tmp/OCR4all
22 | RUN mvn clean install -f pom.xml
23 | RUN cp target/ocr4all.war /usr/local/tomcat/webapps/.
24 | WORKDIR /tmp/LAREX
25 | RUN mvn clean install -f pom.xml
26 | RUN cp target/Larex.war /usr/local/tomcat/webapps/.
27 |
28 | RUN rm -r /tmp/*
29 |
30 | # Provide a landing page so the server root URL works without a tool-specific path
31 | COPY index.html /usr/share/tomcat/webapps/ROOT/index.html
32 |
33 | # Copy larex.properties
34 | COPY larex.properties /larex.properties
35 | ENV LAREX_CONFIG=/larex.properties
36 |
--------------------------------------------------------------------------------
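The Dockerfile above declares build arguments for the branch of each cloned repository, all defaulting to `master`. A minimal sketch of how those defaults could be overridden at build time with `docker build --build-arg`; the helper name `print_build_cmd` and the branch name `development` are hypothetical, and the helper only prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a docker build command that overrides
# the ARG defaults declared in the Dockerfile. Nothing is executed.
# $1 = image tag (e.g. dev), $2 = Git branch used for all three repositories.
print_build_cmd() {
  tag="$1"
  branch="$2"
  printf 'docker build'
  # The ARG names are copied verbatim from the Dockerfile, including the
  # spelling OCR4LL_BRANCH.
  for arg in OCR4LL_BRANCH LAREX_BRANCH OCR4ALL_HELPER_SCRIPTS_BRANCH; do
    printf ' --build-arg %s=%s' "$arg" "$branch"
  done
  printf ' -t uniwuezpd/ocr4all:%s .\n' "$tag"
}

# "development" is an assumed branch name used only for illustration.
print_build_cmd dev development
```

Because `git clone --branch` fails for a branch that does not exist, the same branch name would have to be present in all three repositories for this particular invocation to build.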
/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | OCR4All
6 |
70 |
71 |
72 | OCR4all - Tools
73 |
74 | Web tools:
75 |
79 |
80 |
81 |
82 |
--------------------------------------------------------------------------------
/larex.properties:
--------------------------------------------------------------------------------
1 | ###### Configuration file for LAREX ######
2 | ### How to: ###
3 | # Comments: #<comment>
4 | # <setting>=<value>
5 | # Empty or commented out settings are interpreted as default
6 |
7 | # Set the accessible modes in the LAREX GUI =[[segment][edit][lines][text]]
8 | # A combination of the modes "segment", "edit", "lines" and "text" can be set as
9 | # a space separated string.
10 | # e.g. modes:segment lines
11 | # The order of those modes in the string also determines which mode is opened
12 | # on startup, with the first in the list being opened as main mode.
13 | # The mode "segment" can be replaced with "edit" in order to hide all auto
14 | # segmentation features. ("edit" will be ignored if both are present)
15 | # [Default]modes:segment lines text
16 | # LAREX will display any of those modes
17 | #modes=
18 |
19 | # Set the file path of the books folder.
20 | # e.g. bookpath:/home/user/books (Linux)
21 | # e.g. bookpath:C:\Users\user\Documents\books (Windows)
22 | # LAREX will load the books off of this folder.
23 | # [default /src/main/webapp/resources/books]
24 | bookpath=/var/ocr4all/library
25 |
26 | # Save the pageXML locally =[bookpath|savedir|none]
27 | # bookpath: save the pageXML in the bookpath
28 | # savedir: save the pageXML in a defined savedir
29 | # none: do not save the pageXML locally [default]
30 | # e.g. localsave:bookpath
31 | localsave=bookpath
32 |
33 | # Save location for the localsave mode "savedir"
34 | # Will be used if localsave mode is set to "savedir"
35 | # e.g. savedir:/home/user/save (Linux)
36 | # e.g. savedir:C:\Users\user\Documents\save (Windows)
37 | #savedir=
38 |
39 | # Download the pageXML in the web browser after saving
40 | # =[true|false]
41 | # true: download pageXML after saving [default]
42 | # false: no action after saving
43 | # e.g. websave:true
44 | websave=false
45 |
46 | # Filter to select specific images via their sub extensions.
47 | # LAREX will only display images that include that sub extension and will group images with the
48 | # same base name up to the sub extension together. Comprised of a list of sub extensions,
49 | # divided by space. Use "." to refer to images without sub extension. [default: no filter]:
50 | # =
51 | # e.g. imagefilter:bin nrm
52 | # Pages folder input: image.bin.png, image.png, image2.bin.png, image3.bin
53 | # Filtered pages: image.bin.png, image2.bin.png
54 | # (Images will point to *.bin.png, but will be named without the .bin. image.bin.png=image.png
55 | # Images with the same base name will be grouped together, with the same order as described in the filter)
56 | imagefilter=bin nrm desp
57 |
58 | # Enable/Disable OCR4all UI mode
59 | # This setting allows displaying and/or hiding certain UI elements when LAREX is used in combination with
60 | # OCR4all.
61 | # enable: enable OCR4all UI mode
62 | # disable: disable OCR4all UI mode [default]
63 | # e.g. ocr4all:enable
64 | ocr4all=enable
65 |
66 | # Set Colors for DiffView in TextViewer mode
67 | # This setting allows adjusting the colors for better contrast or better readability regarding color blindness.
68 | # All valid CSS colors are accepted.
69 | # e.g.:diff_insert_color=green
70 | #
71 | # Defaults: diff_insert_color=#58e123
72 | # diff_delete_color=#e56123
73 | #diff_insert_color=
74 | #diff_delete_color=
75 |
--------------------------------------------------------------------------------
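In the properties file above, commented-out and empty settings fall back to their defaults, so only the uncommented lines change LAREX's behaviour. A minimal sketch for listing just those active settings; the helper name `active_settings` and the temporary sample file are hypothetical, and the sample reuses a few lines from the file above:

```shell
#!/bin/sh
# Write a small sample in the same format as larex.properties
# (a temporary file used only for this illustration).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
# Set the file path of the books folder.
bookpath=/var/ocr4all/library

#savedir=
websave=false
EOF

# Print only active settings: drop blank lines and lines starting with '#'.
active_settings() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

active_settings "$sample"
```

Run against the sample, this prints the `bookpath` and `websave` lines and skips the comment, the blank line, and the commented-out `savedir`.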
/README.md:
--------------------------------------------------------------------------------
1 | # OCR4all - Docker image
2 |
3 | Provides OCR (optical character recognition) services through web applications.
4 |
5 | ## Getting Started
6 |
7 | These instructions will get you a [Docker container](https://www.docker.com/what-container) that runs the project.
8 |
9 | ### Prerequisites
10 |
11 | [Docker](https://www.docker.com) (for installation instructions see the [Official Installation Guide](https://docs.docker.com/install/))
12 |
13 | ### Installing
14 |
15 | #### Get the Docker Image
16 | From Docker Hub:
17 | * Execute the following command: ```docker pull uniwuezpd/ocr4all```
18 |
19 | or
20 |
21 | From Source:
22 | * Download the [Dockerfile](Dockerfile) together with `index.html` and `larex.properties` (the build copies both into the image), then change into the directory that contains them in a terminal.
23 |
24 | * Execute the following command inside the directory: ```docker build -t <IMAGE_NAME> .```
25 |
26 | (We recommend uniwuezpd/ocr4all as image name)
27 |
28 | #### Initialize Container
29 | With the help of the image a container can now be created with the following command:
30 | ```
31 | docker run \
32 |     -p 8080:8080 \
33 |     -u `id -u root`:`id -g $USER` \
34 |     --name ocr4all \
35 |     -v <OCR_DATA_DIR>:/var/ocr4all/data \
36 |     -v <OCR_MODEL_DIR>:/var/ocr4all/models/custom \
37 |     -it <IMAGE_NAME>
38 | ```
39 |
40 | Explanation of variables used above:
41 | * `<IMAGE_NAME>` - Name of the Docker image e.g. uniwuezpd/ocr4all
42 | * `<OCR_DATA_DIR>` - Directory in which the OCR data is located on your local machine
43 | * `<OCR_MODEL_DIR>` - Directory in which the OCR models are located on your local machine
44 |
45 | The container will be started by default after executing the `docker run` command.
46 |
47 | If you want to start the container again later, use `docker ps -a` to list all available containers with their Container IDs, then run `docker start -ia ocr4all` to start it.
48 |
49 | You can now access the project via the following URL: http://localhost:8080/ocr4all/
50 |
51 | ### Updating
52 | #### From Docker Hub:
53 |
54 | If the image was previously pulled from Docker Hub, it can be updated from there as well.
55 |
56 | The following command will update the image:
57 | ```
58 | docker pull uniwuezpd/ocr4all
59 | ```
60 |
61 | #### From Source:
62 |
63 | To update the source code of the project you currently need to reinstall the image.
64 |
65 | First remove the existing image by executing the following command:
66 | ```
67 | docker image rm <IMAGE_NAME>
68 | ```
69 | Afterwards, follow the installation guide above as you would for a clean installation.
70 |
71 | ## Development
72 |
73 | If you want shell access to your Docker container for development or testing purposes, create the container with the following command (note the `--entrypoint` option):
74 | ```
75 | docker run \
76 |     -p 8080:8080 \
77 |     --entrypoint /bin/bash \
78 |     -v <OCR_DATA_DIR>:/var/ocr4all/data \
79 |     -v <OCR_MODEL_DIR>:/var/ocr4all/models/custom \
80 |     -it <IMAGE_NAME>
81 | ```
82 |
83 | The container will be started by default after executing the `docker run` command.
84 |
85 | If you want to start the container again later, use `docker ps -a` to list all available containers with their Container IDs, then run `docker start <CONTAINER_ID>` to start the desired container. To gain shell access again, use `docker attach <CONTAINER_ID>`.
86 |
87 | Because the entrypoint has changed, processes will not start automatically and the following command needs to be executed after the container startup:
88 | ```
89 | /usr/bin/supervisord
90 | ```
91 |
92 | For information on how to update the project, see the commands in the [Dockerfile](Dockerfile).
93 |
94 | ## Built With
95 |
96 | * [Docker](https://www.docker.com) - Platform and Software Deployment
97 | * [Maven](https://maven.apache.org/) - Dependency Management
98 | * [Spring](https://spring.io/) - Java Framework
99 | * [Materialize](http://materializecss.com/) - Front-end Framework
100 | * [jQuery](https://jquery.com/) - JavaScript Library
101 |
102 | ## Included Projects
103 |
104 | * [LAREX](https://github.com/chreul/LAREX) - Layout analysis on early printed books
105 | * [OCRopus](https://github.com/tmbdev/ocropy) - Collection of document analysis programs
106 | * [calamari](https://github.com/ChWick/calamari) - OCR Engine based on OCRopy and Kraken
107 |
108 |
--------------------------------------------------------------------------------
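The `docker run` command in the README takes three user-supplied values: the image name, the host directory holding the OCR data, and the host directory holding the OCR models. A minimal dry-run sketch of filling them in; the helper name `print_run_cmd` and the `/srv/ocr4all/...` host paths are hypothetical, and the helper only prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the docker run command from the README
# with its three variables filled in. Nothing is executed.
# $1 = image name, $2 = host data directory, $3 = host models directory.
print_run_cmd() {
  printf 'docker run -p 8080:8080'
  # Kept literal so the user and group IDs are resolved when the printed
  # command is eventually run, not when it is printed.
  printf ' -u `id -u root`:`id -g $USER`'
  printf ' --name ocr4all'
  printf ' -v %s:/var/ocr4all/data' "$2"
  printf ' -v %s:/var/ocr4all/models/custom' "$3"
  printf ' -it %s\n' "$1"
}

# The host paths below are placeholders chosen for illustration.
print_run_cmd uniwuezpd/ocr4all /srv/ocr4all/data /srv/ocr4all/models
```

The two `-v` options are bind mounts, so data written under `/var/ocr4all/data` inside the container ends up in the host data directory.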