├── .gitignore
├── Dockerfile
├── README.md
├── docker-compose.yml
├── index.html
└── larex.properties


/.gitignore:
--------------------------------------------------------------------------------
.idea/

*.iml


--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
ARG BASE_IMAGE_TAG=latest

FROM uniwuezpd/ocr4all_base:$BASE_IMAGE_TAG

ARG OCR4LL_BRANCH=master
ARG LAREX_BRANCH=master
ARG OCR4ALL_HELPER_SCRIPTS_BRANCH=master

ENV OCR4ALL_VERSION="0.6.1"
ENV LAREX_VERSION="0.7.0"

# Install the helper scripts so that all scripts are available to the Java environment
RUN git clone -b ${OCR4ALL_HELPER_SCRIPTS_BRANCH} https://github.com/OCR4all/OCR4all_helper-scripts /opt/OCR4all_helper-scripts
WORKDIR /opt/OCR4all_helper-scripts
RUN python3 -m pip install .

# Clone OCR4all and LAREX
RUN git clone --depth 1 --branch ${OCR4LL_BRANCH} https://github.com/OCR4all/OCR4all /tmp/OCR4all
RUN git clone --depth 1 --branch ${LAREX_BRANCH} https://github.com/OCR4all/LAREX /tmp/LAREX
# Build OCR4all and LAREX and deploy the WAR files to Tomcat
WORKDIR /tmp/OCR4all
RUN mvn clean install -f pom.xml
RUN cp target/ocr4all.war /usr/local/tomcat/webapps/.
WORKDIR /tmp/LAREX
RUN mvn clean install -f pom.xml
RUN cp target/Larex.war /usr/local/tomcat/webapps/.

# Remove the cloned sources and build output
RUN rm -r /tmp/*

# Provide an index.html so the tools can be reached without the tool-specific part of the URL
COPY index.html /usr/share/tomcat/webapps/ROOT/index.html

# Copy larex.properties and point LAREX at it via LAREX_CONFIG
COPY larex.properties /larex.properties
ENV LAREX_CONFIG=/larex.properties


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# OCR4all - Docker image

Provides OCR (optical character recognition) services through web applications.

## Getting Started

These instructions will get you a [Docker container](https://www.docker.com/what-container) that runs the project.

### Prerequisites

[Docker](https://www.docker.com) (for installation instructions see the [Official Installation Guide](https://docs.docker.com/install/))

### Installing

#### Get the Docker Image
From Docker Hub:
* Execute the following command: ```docker pull uniwuezpd/ocr4all```

or

From Source:
* Download the [Dockerfile](IdeaProjects/docker_image/Dockerfile) first and change into the directory that contains it using a command line tool.

* Execute the following command inside the directory: ``` docker build -t . ```

(We recommend uniwuezpd/ocr4all as the image name.)

#### Initialize Container
Using the image, a container can now be created with the following command:
```
docker run \
    -p 8080:8080 \
    -u `id -u root`:`id -g $USER` \
    --name ocr4all \
    -v :/var/ocr4all/data \
    -v :/var/ocr4all/models/custom \
    -it
```

Explanation of the variables used above:
* `` - Name of the Docker image, e.g. uniwuezpd/ocr4all
* `` - Directory in which the OCR data is located on your local machine
* `` - Directory in which the OCR models are located on your local machine

The container is started automatically once the `docker run` command has been executed.
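
As a rough illustration of the command above, the following sketch fills in the image name and the two host directories with assumed values. The paths under `$HOME` and the `DRY_RUN` switch are hypothetical and not part of the project; `id -g` without an argument is used for the current user's group. By default the script only prints the resulting command.

```shell
#!/bin/sh
# Assumed (hypothetical) values; replace them with your own.
IMAGE="uniwuezpd/ocr4all"
DATA_DIR="$HOME/ocr4all/data"       # host directory holding the OCR data
MODEL_DIR="$HOME/ocr4all/models"    # host directory holding the OCR models

# Compose the docker run command described above as the positional parameters.
set -- docker run \
    -p 8080:8080 \
    -u "$(id -u root):$(id -g)" \
    --name ocr4all \
    -v "$DATA_DIR:/var/ocr4all/data" \
    -v "$MODEL_DIR:/var/ocr4all/models/custom" \
    -it "$IMAGE"

# DRY_RUN is a hypothetical switch: print the command unless DRY_RUN=0 is set.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
else
    "$@"
fi
```

Running the script with `DRY_RUN=0` executes the command instead of printing it.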

If you want to start the container again later, use `docker ps -a` to list all available containers with their container IDs and then use `docker start -ia ocr4all` to start the desired container.

You can now access the project via the following URL: http://localhost:8080/ocr4all/

### Updating
#### From Docker Hub:

If the image was previously pulled from Docker Hub, it can be updated from there.

The following command will update the image:
```
docker pull uniwuezpd/ocr4all
```

#### From Source:

To update the source code of the project you currently need to reinstall the image.

First remove the existing image with the following command:
```
docker image rm
```
Afterwards you can follow the installation guide above, as for a clean installation.

## Development

If you want shell access to your Docker container for development or testing purposes, the container needs to be created with the following command (including the `--entrypoint` option):
```
docker run \
    -p 8080:8080 \
    --entrypoint /bin/bash \
    -v :/var/ocr4all/data \
    -v :/var/ocr4all/models/custom \
    -it
```

The container is started automatically once the `docker run` command has been executed.

If you want to start the container again later, use `docker ps -a` to list all available containers with their container IDs and then use `docker start ` to start the desired container. To gain shell access again, use `docker attach `.

Because the entrypoint has changed, the processes will not start automatically, and the following command needs to be executed after the container has started:
```
/usr/bin/supervisord
```

For information on how to update the project, take a look at the commands within the [Dockerfile](IdeaProjects/docker_image/Dockerfile).
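
The Dockerfile declares build arguments for the branches it clones, so a development build can be pointed at specific branches. The sketch below composes such a `docker build` call; the image tag, the shell variable names and the `DRY_RUN` switch are assumptions for illustration, while the `--build-arg` names are the ones declared in the Dockerfile (including its spelling `OCR4LL_BRANCH`). By default the script only prints the command.

```shell
#!/bin/sh
# Assumed (hypothetical) values; replace them with the tag and branches you need.
IMAGE="uniwuezpd/ocr4all:dev"
OCR4ALL_BRANCH="master"
LAREX_BRANCH="master"
HELPER_SCRIPTS_BRANCH="master"

# The build-arg names must match the ARG names declared in the Dockerfile.
set -- docker build \
    --build-arg "OCR4LL_BRANCH=$OCR4ALL_BRANCH" \
    --build-arg "LAREX_BRANCH=$LAREX_BRANCH" \
    --build-arg "OCR4ALL_HELPER_SCRIPTS_BRANCH=$HELPER_SCRIPTS_BRANCH" \
    -t "$IMAGE" .

# DRY_RUN is a hypothetical switch: print the command unless DRY_RUN=0 is set.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
else
    "$@"
fi
```

Run it from the directory that contains the Dockerfile, since `.` is passed as the build context.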

## Built With

* [Docker](https://www.docker.com) - Platform and Software Deployment
* [Maven](https://maven.apache.org/) - Dependency Management
* [Spring](https://spring.io/) - Java Framework
* [Materialize](http://materializecss.com/) - Front-end Framework
* [jQuery](https://jquery.com/) - JavaScript Library

## Included Projects

* [LAREX](https://github.com/chreul/LAREX) - Layout analysis on early printed books
* [OCRopus](https://github.com/tmbdev/ocropy) - Collection of document analysis programs
* [calamari](https://github.com/ChWick/calamari) - OCR Engine based on OCRopy and Kraken


--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
version: "3"
services:
  latest:
    build:
      context: .
    image: uniwuezpd/ocr4all:latest
  staging:
    build:
      context: .
    image: uniwuezpd/ocr4all:staging
  dev:
    build:
      context: .
    image: uniwuezpd/ocr4all:dev


--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
OCR4All

OCR4all - Tools

--------------------------------------------------------------------------------
/larex.properties:
--------------------------------------------------------------------------------
###### Configuration file for LAREX ######
### How to: ###
# Comments: #
# =
# Empty or commented out settings are interpreted as default

# Set the accessible modes in the LAREX GUI =[[segment][edit][lines][text]]
# A combination of the modes "segment", "edit", "lines" and "text" can be set as
# a space separated string.
# e.g. modes:segment lines
# The order of those modes in the string also determines which mode is opened
# on startup, with the first in the list being opened as main mode.
# The mode "segment" can be replaced with "edit" in order to hide all auto
# segmentation features. ("edit" will be ignored if both are present)
# [Default]modes:segment lines text
# LAREX will display any of those modes
#modes=

# Set the file path of the books folder.
# e.g. bookpath:/home/user/books (Linux)
# e.g. bookpath:C:\Users\user\Documents\books (Windows)
# LAREX will load the books from this folder.
# [default /src/main/webapp/resources/books]
bookpath=/var/ocr4all/library

# Save the pageXML locally =[bookpath|savedir|none]
# bookpath: save the pageXML in the bookpath
# savedir: save the pageXML in a defined savedir
# none: do not save the pageXML locally [default]
# e.g. localsave:bookpath
localsave=bookpath

# Save location for the localsave mode "savedir"
# Will be used if localsave mode is set to "savedir"
# e.g. savedir:/home/user/save (Linux)
# e.g. savedir:C:\Users\user\Documents\save (Windows)
#savedir=

# Download the pageXML in the web browser after saving
# =[true|false]
# true: download pageXML after saving [default]
# false: no action after saving
# e.g. websave:true
websave=false

# Filter to select specific images via their sub extensions.
# LAREX will only display images that include that sub extension and will group images with the
# same base name up to the sub extension together. The filter is a list of sub extensions
# separated by spaces. Use "." to refer to images without sub extension. [default: no filter]:
# =
# e.g. imagefilter:bin nrm
# Pages folder input: image.bin.png, image.png, image2.bin.png, image3.bin
# Filtered pages: image.bin.png, image2.bin.png
# (Images will point to *.bin.png, but will be named without the .bin. image.bin.png=image.png
# Images with the same base name will be grouped together, in the same order as described in the filter)
imagefilter=bin nrm desp

# Enable/Disable OCR4all UI mode
# This setting allows displaying and/or hiding certain UI elements when LAREX is used in combination with
# OCR4all.
# enable: enable OCR4all UI mode
# disable: disable OCR4all UI mode [default]
# e.g. ocr4all:enable
ocr4all=enable

# Set Colors for DiffView in TextViewer mode
# This setting allows adjusting the colors for better contrast or better readability regarding color blindness.
# All valid CSS colors are accepted.
# e.g.:diff_insert_color=green
#
# Defaults: diff_insert_color=#58e123
# diff_delete_color=#e56123
#diff_insert_color=
#diff_delete_color=

--------------------------------------------------------------------------------