├── package ├── linux │ ├── debian │ │ ├── compat │ │ ├── patches │ │ │ └── series │ │ ├── source │ │ │ └── format │ │ ├── libosra.install │ │ ├── osra.install │ │ ├── libosra-java.install │ │ ├── osra-common.install │ │ ├── libosra-dev.install │ │ ├── changelog │ │ ├── watch │ │ ├── copyright.new │ │ ├── substvars │ │ ├── copyright │ │ ├── rules │ │ ├── rules.in │ │ ├── control │ │ └── control.in │ ├── osra.sh │ ├── osra.pc.in │ ├── INSTALL │ ├── osra.pc │ ├── plugins │ │ └── bkchem │ │ │ ├── convert_clipboard_image.xml │ │ │ └── convert_clipboard_image.py │ ├── install.sh │ ├── suse │ │ ├── osra.spec │ │ └── osra.spec.in │ └── create_model_ga.py ├── osx │ └── setup_env ├── android │ └── runosra.java └── win32 │ └── osra.nsi ├── src ├── test.jpg ├── output ├── unpaper.h ├── config.h.in ├── osra_grayscale.h ├── osra_ocr_tesseract.cpp ├── osra_stl.h ├── osra_reaction.h ├── osra_stl.cpp ├── osra_anisotropic.h ├── detect.cpp ├── osra_thin.h ├── recall.cpp ├── osra_rgroup.cpp ├── osra_fragments.h ├── osra_ocr.h ├── Makefile ├── osra_java.cpp ├── osra_anisotropic.cpp ├── osra_lib.h ├── mcdlutil.h ├── osra_openbabel.h ├── osra_segment.h ├── osra_fragments.cpp ├── osra.cpp └── osra.h ├── test ├── test.png ├── bugs │ ├── ocrad_api_regression_test │ │ ├── README │ │ ├── Makefile │ │ └── osra_ocr.cpp │ ├── gocr_quality_regression_test │ │ ├── README │ │ ├── Makefile │ │ └── osra_gocr.cpp │ ├── gcc_and_graphicsmagick_test │ │ ├── Makefile │ │ └── test.cpp │ └── tesseract_init_test │ │ ├── Makefile │ │ └── test.cpp ├── run_all_tests.pl └── benchmark ├── m4 ├── README ├── ac_cxx_namespaces.m4 ├── ac_cxx_have_stl.m4 └── ax_cxx_compile_stdcxx_11.m4 ├── .gitignore ├── jni └── Application.mk ├── dict ├── Makefile ├── superatom.txt └── spelling.txt ├── addons ├── lib_sample │ ├── Makefile │ └── lib_sample.cpp ├── java │ └── net │ │ └── sf │ │ └── osra │ │ ├── OsraLibJni.java │ │ ├── OsraLibJnati.java │ │ └── OsraLib.java ├── lib_java_sample │ └── net │ │ └── sf │ │ └── osra │ │ └── 
OsraLibTest.java └── valgrind.supp ├── README ├── doc └── Makefile ├── Makefile.inc.in ├── pom.xml.in ├── install-sh └── Makefile.in /package/linux/debian/compat: -------------------------------------------------------------------------------- 1 | 7 2 | -------------------------------------------------------------------------------- /package/linux/debian/patches/series: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /package/linux/debian/source/format: -------------------------------------------------------------------------------- 1 | 3.0 (quilt) 2 | -------------------------------------------------------------------------------- /package/linux/debian/libosra.install: -------------------------------------------------------------------------------- 1 | /usr/lib/libosra.so* 2 | -------------------------------------------------------------------------------- /package/linux/debian/osra.install: -------------------------------------------------------------------------------- 1 | /usr/bin 2 | /usr/share/man 3 | -------------------------------------------------------------------------------- /package/linux/debian/libosra-java.install: -------------------------------------------------------------------------------- 1 | /usr/lib/libosra_java.so* 2 | -------------------------------------------------------------------------------- /package/linux/osra.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | /opt/local/osra/2.1.0/osra-bin $* 3 | -------------------------------------------------------------------------------- /src/test.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/edbeard/pyosra/HEAD/src/test.jpg -------------------------------------------------------------------------------- /test/test.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/edbeard/pyosra/HEAD/test/test.png -------------------------------------------------------------------------------- /package/linux/debian/osra-common.install: -------------------------------------------------------------------------------- 1 | /usr/share/osra 2 | /usr/share/doc 3 | -------------------------------------------------------------------------------- /package/linux/debian/libosra-dev.install: -------------------------------------------------------------------------------- 1 | /usr/lib/libosra.a 2 | /usr/lib/pkgconfig 3 | /usr/include 4 | -------------------------------------------------------------------------------- /m4/README: -------------------------------------------------------------------------------- 1 | Helper scripts in this directory have been taken from autoconf-archive (http://ac-archive.sourceforge.net/). 2 | -------------------------------------------------------------------------------- /src/output: -------------------------------------------------------------------------------- 1 | c1c2C(=O)c3ccc(c(Nc4cc(c(N)c5C(=O)c6c(C(=O)c45)cccc6)C)c3C(=O)c2ccc1)C 2 | c1(ccccc1)Nc1c2C(=O)c3ccccc3C(=O)c2c(Nc2cc(c(N)c3C(=O)c4c(C(=O)c23)cccc4)C)c(C)c1 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | src/*.o 2 | Makefile 3 | .md5 4 | cmake-build-debug 5 | CMakeLists.txt 6 | config.log 7 | config.status 8 | Makefile.inc 9 | osra_lib_failing.cpp 10 | pom.xml 11 | 12 | -------------------------------------------------------------------------------- /package/linux/debian/changelog: -------------------------------------------------------------------------------- 1 | osra (2.0.0-1) unstable; urgency=low 2 | 3 | * Initial release (closes: #682760) 4 | 5 | -- Dmitry Katsubo Mon, 15 Dec 2014 17:05:38 +0200 6 | 
-------------------------------------------------------------------------------- /jni/Application.mk: -------------------------------------------------------------------------------- 1 | APP_MODULES := osra 2 | APP_PROJECT_PATH := /home/igor/workspace/osra 3 | APP_OPTIM := release 4 | #APP_ABI := armeabi armeabi-v7a 5 | APP_BUILD_SCRIPT := /home/igor/osra/jni/Android.mk 6 | -------------------------------------------------------------------------------- /test/bugs/ocrad_api_regression_test/README: -------------------------------------------------------------------------------- 1 | You can download the test image file corresponding to this test from here: 2 | https://sourceforge.net/projects/osra/files/contrib/bugs/images/apodaca.png/download 3 | -------------------------------------------------------------------------------- /test/bugs/gocr_quality_regression_test/README: -------------------------------------------------------------------------------- 1 | You can download the test image file corresponding to this test from here: 2 | https://sourceforge.net/projects/osra/files/contrib/bugs/images/st_test2_2.bmp/download 3 | -------------------------------------------------------------------------------- /package/linux/osra.pc.in: -------------------------------------------------------------------------------- 1 | prefix=@prefix@ 2 | exec_prefix=@exec_prefix@ 3 | 4 | Name: @PACKAGE_NAME@ 5 | Description: Chemical structure recognition library 6 | Version: @PACKAGE_VERSION@ 7 | 8 | Requires: gocr ocrad openbabel-2.0 GraphicsMagick++ 9 | Libs: -l@libdir@ @LIBS@ 10 | Cflags: 11 | -------------------------------------------------------------------------------- /package/linux/debian/watch: -------------------------------------------------------------------------------- 1 | # You can run the "uscan" command to check for upstream updates and more. 
2 | # See uscan(1) for format 3 | 4 | # Compulsory line, this is a version 3 file 5 | version=3 6 | 7 | # Uncomment to find new files on sourceforge, for devscripts >= 2.9 8 | http://sf.net/osra/osra-(.*)\.tgz 9 | -------------------------------------------------------------------------------- /package/linux/INSTALL: -------------------------------------------------------------------------------- 1 | To install run (as a root or via sudo) 2 | ./install.sh 3 | 4 | It will copy the contents of "package" into /opt/local/osra 5 | and the wrap-around shell script "osra" into /usr/local/bin 6 | 7 | Starting with version 2.1.0 Ghostscript is no longer necessary 8 | to process PDF and PS files. 9 | -------------------------------------------------------------------------------- /package/linux/debian/copyright.new: -------------------------------------------------------------------------------- 1 | Format-Specification: http://svn.debian.org/wsvn/dep/web/deps/dep5.mdwn?op=file&rev=135 2 | Name: osra 3 | Maintainer: Dmitry Katsubo 4 | Source: https://osra.svn.sourceforge.net/svnroot/osra/ 5 | 6 | Files: * 7 | Copyright: 2007-2013 Igor Filippov 8 | License: GPL-2+ 9 | -------------------------------------------------------------------------------- /test/bugs/ocrad_api_regression_test/Makefile: -------------------------------------------------------------------------------- 1 | CXX := g++ 2 | LD := g++ 3 | 4 | CXXFLAGS := -g3 -O2 5 | CPPFLAGS := -I/usr/include/ocrad 6 | LDFLAGS := -L/usr/lib 7 | LIBS := -locrad 8 | 9 | OBJ = osra_ocr.o 10 | 11 | .PHONY: all clean 12 | 13 | .SUFFIXES: .c .cpp 14 | 15 | all: test 16 | 17 | test: $(OBJ) 18 | $(LD) $(LDFLAGS) -o $@ $(OBJ) $(LIBS) 19 | 20 | clean: 21 | $(RM) -f *.o test 22 | -------------------------------------------------------------------------------- /package/linux/osra.pc: -------------------------------------------------------------------------------- 1 | prefix=/usr/local 2 | exec_prefix=${prefix} 3 | 4 | Name: osra 5 | 
Description: Chemical structure recognition library 6 | Version: 2.1.0 7 | 8 | Requires: gocr ocrad openbabel-2.0 GraphicsMagick++ 9 | Libs: -l${exec_prefix}/lib -lPgm2asc -lGraphicsMagick -lGraphicsMagick++ -lopenbabel -lpoppler-cpp -lpoppler -lfontconfig -lfreetype -lpthread -locrad -lpotrace -lm 10 | Cflags: 11 | -------------------------------------------------------------------------------- /test/bugs/gocr_quality_regression_test/Makefile: -------------------------------------------------------------------------------- 1 | CXX := g++ 2 | LD := g++ 3 | 4 | CXXFLAGS := -g3 -O2 5 | CPPFLAGS := -I/usr/include/gocr 6 | LDFLAGS := -L/usr/lib 7 | LIBS := -lPgm2asc -lnetpbm 8 | 9 | OBJ = osra_gocr.o 10 | 11 | .PHONY: all clean 12 | 13 | .SUFFIXES: .c .cpp 14 | 15 | all: test 16 | 17 | test: $(OBJ) 18 | $(LD) $(LDFLAGS) -o $@ $(OBJ) $(LIBS) 19 | 20 | clean: 21 | $(RM) -f *.o test 22 | -------------------------------------------------------------------------------- /package/osx/setup_env: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | defaults write ~/.MacOSX/environment OSRA /usr/local/bin/ 3 | chown $USER ~/.MacOSX/environment.plist 4 | chmod a+r ~/.MacOSX/environment.plist 5 | launchctl setenv OSRA /usr/local/bin/ 6 | chmod a+rx /usr/local/bin/osra 7 | chmod a+rx /usr/local/lib/libopenbabel.3.dylib 8 | chmod a+rx /usr/local/lib/libopenbabel.dylib 9 | chmod a+rx /usr/local/lib/libopenbabel.la 10 | chmod -R a+rx /opt/local/osra/ -------------------------------------------------------------------------------- /test/bugs/gcc_and_graphicsmagick_test/Makefile: -------------------------------------------------------------------------------- 1 | CXX := g++ 2 | LD := g++ 3 | RM := /bin/rm 4 | 5 | CXXFLAGS := -g3 -O2 6 | CPPFLAGS := -I/usr/include/GraphicsMagick 7 | LDFLAGS := -L/usr/lib 8 | 9 | LIBS := -lGraphicsMagick++ 10 | 11 | .PHONY: all clean 12 | 13 | .SUFFIXES: .c .cpp 14 | 15 | OBJ = test.o 16 | 17 | all: test 18 | 19 
| test: $(OBJ) 20 | $(LD) $(LDFLAGS) -o $@ $(OBJ) $(LIBS) 21 | 22 | clean: 23 | $(RM) -f *.o test 24 | -------------------------------------------------------------------------------- /dict/Makefile: -------------------------------------------------------------------------------- 1 | # 2 | # This makefile targets the installation of architecture-independent data. 3 | # 4 | 5 | include ../Makefile.inc 6 | 7 | install: spelling.txt superatom.txt 8 | $(INSTALL_DIR) $(DESTDIR)$(datadir) 9 | $(INSTALL_DATA) $? $(DESTDIR)$(datadir) 10 | 11 | uninstall: 12 | $(RM) -f $(DESTDIR)$(datadir)/spelling.txt 13 | $(RM) -f $(DESTDIR)$(datadir)/superatom.txt 14 | 15 | ../Makefile.inc: ../Makefile.inc.in ../config.status 16 | cd .. && ./config.status 17 | -------------------------------------------------------------------------------- /addons/lib_sample/Makefile: -------------------------------------------------------------------------------- 1 | CXX := g++ 2 | LD := g++ 3 | 4 | CXXFLAGS := -g3 -ggdb -O0 -DOSRA_LIB 5 | CPPFLAGS := -I../../src 6 | LDFLAGS := -L../../src -L/usr/local/lib 7 | 8 | LIBS := -losra 9 | 10 | .PHONY: all clean 11 | 12 | .SUFFIXES: .c .cpp 13 | 14 | OBJ = lib_sample.o 15 | 16 | all: lib_sample 17 | 18 | # LD_LIBRARY_PATH=../../src ./lib_sample /usr/local/lib 19 | 20 | lib_sample: $(OBJ) 21 | $(LD) $(LDFLAGS) -o $@ $(OBJ) $(LIBS) 22 | 23 | clean: 24 | rm -f *.o lib_sample 25 | -------------------------------------------------------------------------------- /package/linux/plugins/bkchem/convert_clipboard_image.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Noel M. O'Boyle, Igor V. 
Filippov 7 | 8 | Takes an image of a molecule from the clipboard and converts it using OSRA (you need to install OSRA 9 | and set the environment variable OSRA to point to the executable) 10 | 11 | 12 | 13 | 14 | convert_clipboard_image.py 15 | Convert Image To Mol 16 | 17 | 18 | -------------------------------------------------------------------------------- /package/linux/install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | if (( $EUID != 0 )); then 3 | echo "The installation must be run as root" 1>&2 4 | exit 1 5 | fi 6 | mkdir -p /opt/local/osra/2.1.0 || { echo "Cannot create /opt/local/osra folder" 1>&2; exit 1; } 7 | mkdir -p /usr/local/bin || { echo "Cannot create /usr/local/bin folder" 1>&2; exit 1; } 8 | cp --remove-destination package/* /opt/local/osra/2.1.0/ || { echo "Cannot copy to /opt/local/osra folder" 1>&2; exit 1; } 9 | echo "Installing binary files in /opt/local/osra" 10 | cp --remove-destination osra /usr/local/bin || { echo "Cannot copy to /usr/local/bin" 1>&2; exit 1; } 11 | echo "Installing osra script in /usr/local/bin" 12 | 13 | -------------------------------------------------------------------------------- /addons/java/net/sf/osra/OsraLibJni.java: -------------------------------------------------------------------------------- 1 | package net.sf.osra; 2 | 3 | import javax.management.RuntimeErrorException; 4 | 5 | /** 6 | * Pure JNI bridge for OSRA library. 
7 | * 8 | * @author Dmitry Katsubo 9 | */ 10 | class OsraLibJni extends OsraLib { 11 | 12 | private static final String NAME = "osra_java"; 13 | 14 | static { 15 | try { 16 | System.loadLibrary(NAME); 17 | } 18 | catch (UnsatisfiedLinkError e) { 19 | throw new RuntimeErrorException(e, "Check that lib" + NAME + ".so/" + NAME 20 | + ".dll is in PATH or in java.library.path (" + System.getProperty("java.library.path") + ")"); 21 | } 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /test/bugs/tesseract_init_test/Makefile: -------------------------------------------------------------------------------- 1 | CXX := g++ 2 | LD := g++ 3 | RM := /bin/rm 4 | 5 | CXXFLAGS := -g3 -ggdb -O0 6 | CPPFLAGS := 7 | LDFLAGS := -L/usr/lib 8 | 9 | LIBS := -ltesseract_api 10 | 11 | .PHONY: all clean 12 | 13 | .SUFFIXES: .c .cpp 14 | 15 | OBJ = test.o 16 | 17 | all: 18 | rm -f *.o 19 | $(MAKE) test_local 20 | rm -f *.o 21 | $(MAKE) test_global 22 | ./test_global > 1 23 | ./test_local > 2 24 | diff -au 1 2 25 | 26 | test_local: $(OBJ) 27 | $(LD) $(LDFLAGS) -o $@ $(OBJ) $(LIBS) 28 | 29 | test_global: CXXFLAGS += -DTESS_GLOBAL_INSTANCE 30 | test_global: $(OBJ) 31 | $(LD) $(LDFLAGS) -o $@ $(OBJ) $(LIBS) 32 | 33 | clean: 34 | $(RM) -f *.o test_* 35 | -------------------------------------------------------------------------------- /package/linux/debian/substvars: -------------------------------------------------------------------------------- 1 | # Note: underscore (_) is not allowed in variable name 2 | binary:Depends=libpotrace0 (>= 1.8), libopenbabel4 (>= 2.3), libgraphicsmagick++3 (>= 1.3), libtesseract3 (>= 3.01), libcuneiform0 (>= 1.1) 3 | common:Description=OSRA is a utility designed to convert graphical representations of chemical${Newline}structures into SMILES or SDF.${Newline}OSRA can read a document in any of the over 90 graphical formats parseable by${Newline}GraphicsMagick and generate the SMILES or SDF representation of the 
molecular${Newline}structure images encountered within that document.${Newline}Authors:${Newline} ${common:Authors} 4 | common:Authors=Igor Filippov 5 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | Pyosra: A Python wrapper for OSRA 2 | ================================= 3 | 4 | 5 | Description: 6 | ------------ 7 | 8 | - Pyosra is a Python wrapper that exposes the functionality of the OSRA structure 9 | recognition tool to a Python environment, using pybind11. 10 | 11 | - It adds the capability to extract R-Group structures. 12 | 13 | - Pyosra is designed to work in tandem with ChemSchematicResolver, a tool that can 14 | extract many chemical diagrams and their descriptive labels directly from scientific 15 | articles, in an automated fashion. 16 | 17 | - ChemSchematicResolver is open-source and documentation can be found at www.chemschematicresolver.org . 18 | The source code can be found at https://github.com/edbeard/ChemSchematicResolver . 19 | 20 | - The documentation for OSRA can be found at https://cactus.nci.nih.gov/osra/ . 21 | -------------------------------------------------------------------------------- /addons/lib_java_sample/net/sf/osra/OsraLibTest.java: -------------------------------------------------------------------------------- 1 | package net.sf.osra; 2 | 3 | import java.io.BufferedInputStream; 4 | import java.io.FileInputStream; 5 | import java.io.IOException; 6 | import java.io.InputStream; 7 | import java.io.StringWriter; 8 | 9 | import org.apache.commons.io.IOUtils; 10 | import org.junit.Test; 11 | 12 | /** 13 | * Sample usage of the library. 
14 | * 15 | * @author Dmitry Katsubo 16 | */ 17 | public class OsraLibTest { 18 | 19 | @Test 20 | public void testProcessImage() throws IOException { 21 | StringWriter writer = new StringWriter(); 22 | InputStream is = new BufferedInputStream(new FileInputStream("test/test.png")); 23 | 24 | byte[] imageData = IOUtils.toByteArray(is); 25 | 26 | int result = OsraLibJni.processImage(imageData, writer, 0, false, 0, 0, 0, false, false, "sdf", "inchi", true, 27 | true, true, true, true); 28 | 29 | System.out.println("OSRA completed with result:" + result + " structure:\n" + writer.toString() + "\n"); 30 | } 31 | } 32 | -------------------------------------------------------------------------------- /test/run_all_tests.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | # 3 | # Simple test script that traverses the given folder and runs OSRA for every TIFF file found. 4 | # 5 | 6 | use strict; 7 | 8 | use File::Find; 9 | use File::Path; 10 | use File::Basename; 11 | use File::Spec::Functions qw(abs2rel); 12 | 13 | if ($#ARGV != 0) { 14 | print "Usage: $0 \n"; 15 | exit 0; 16 | } 17 | 18 | my $test_dir = $ARGV[0]; 19 | my $out_dir = 'run5'; 20 | 21 | local $| = 1; 22 | local $/; 23 | 24 | find({ no_chdir => 1, wanted => sub { 25 | return if -d $_ || (-f $_ && !/\.tif/); 26 | 27 | my $location = $out_dir . '/' . abs2rel($File::Find::dir, $test_dir); 28 | 29 | mkpath($location) or die "Can't create the directory $location: $?" unless -d $location; 30 | 31 | $location .= '/' . basename($_) . 
'.out'; 32 | 33 | print "process $_ --> $location\n"; 34 | 35 | open IN, "../src/osra -c -p -b -f sdf '$_' |" or die; 36 | my $mol = <IN>; 37 | close IN; 38 | 39 | open OUT, ">", $location or die; 40 | print OUT $mol; 41 | close OUT; 42 | } }, $test_dir); 43 | -------------------------------------------------------------------------------- /package/linux/debian/copyright: -------------------------------------------------------------------------------- 1 | This work was packaged for Debian by: 2 | 3 | Dmitry Katsubo on Tue, 06 Jul 2010 12:48:38 -0500 4 | 5 | It was downloaded from: 6 | 7 | https://sourceforge.net/projects/osra/files/osra/ 8 | 9 | Upstream Author: 10 | 11 | Igor Filippov 12 | 13 | Copyright: 14 | 15 | Copyright (C) 2007-2013 Igor Filippov 16 | 17 | License: 18 | 19 | This package is free software; you can redistribute it and/or modify 20 | it under the terms of the GNU General Public License as published by 21 | the Free Software Foundation; either version 2 of the License, or 22 | (at your option) any later version. 23 | 24 | This package is distributed in the hope that it will be useful, 25 | but WITHOUT ANY WARRANTY; without even the implied warranty of 26 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 27 | GNU General Public License for more details. 28 | 29 | You should have received a copy of the GNU General Public License 30 | along with this program. If not, see 31 | 32 | On Debian systems, the complete text of the GNU General Public License 33 | version 2 can be found in "/usr/share/common-licenses/GPL-2". 34 | -------------------------------------------------------------------------------- /package/linux/debian/rules: -------------------------------------------------------------------------------- 1 | #!/usr/bin/make -f 2 | 3 | # Uncomment this to turn on verbose mode. 
4 | #export DH_VERBOSE=1 5 | 6 | DPKG_EXPORT_BUILDFLAGS=1 7 | include /usr/share/dpkg/buildflags.mk 8 | 9 | %: 10 | dh $@ 11 | 12 | override_dh_auto_configure: 13 | CFLAGS="$(CFLAGS)" CXXFLAGS="$(CXXFLAGS)" LDFLAGS="$(LDFLAGS)" dh_auto_configure -- --enable-docs --enable-lib --enable-java --with-tesseract --with-cuneiform --datadir='$${datarootdir}/$${PACKAGE_NAME}' --docdir='$${datarootdir}/doc/$${PACKAGE_NAME}' 14 | 15 | override_dh_install: 16 | # Check that *.install files have the same names as expected by debuild: 17 | [ -e debian/osra.install ] || ln -s osra.install debian/osra.install 18 | [ -e debian/libosra2.install ] || ln -s libosra.install debian/libosra2.install 19 | [ -e debian/libosra-dev.install ] || ln -s libosra-dev.install debian/libosra-dev.install 20 | [ -e debian/libosra-java2.install ] || ln -s libosra-java.install debian/libosra-java2.install 21 | 22 | # Continue with normal operation: 23 | dh_install 24 | 25 | # The default file "debian/substvars" is replaced by package-specific "debian/${package}.substvars", so in order not to mess with symlinks, we define an additional file with variable substitution: 26 | override_dh_gencontrol: 27 | dh_gencontrol -- -Tdebian/substvars 28 | -------------------------------------------------------------------------------- /m4/ac_cxx_namespaces.m4: -------------------------------------------------------------------------------- 1 | # =========================================================================== 2 | # http://autoconf-archive.cryp.to/ac_cxx_namespaces.html 3 | # =========================================================================== 4 | # 5 | # SYNOPSIS 6 | # 7 | # AC_CXX_NAMESPACES 8 | # 9 | # DESCRIPTION 10 | # 11 | # If the compiler can prevent names clashes using namespaces, define 12 | # HAVE_NAMESPACES. 
13 | # 14 | # LICENSE 15 | # 16 | # Copyright (c) 2008 Todd Veldhuizen 17 | # Copyright (c) 2008 Luc Maisonobe 18 | # 19 | # Copying and distribution of this file, with or without modification, are 20 | # permitted in any medium without royalty provided the copyright notice 21 | # and this notice are preserved. 22 | 23 | AC_DEFUN([AC_CXX_NAMESPACES], 24 | [AC_CACHE_CHECK(whether the compiler implements namespaces, 25 | ac_cv_cxx_namespaces, 26 | [AC_LANG_SAVE 27 | AC_LANG_CPLUSPLUS 28 | AC_TRY_COMPILE([namespace Outer { namespace Inner { int i = 0; }}], 29 | [using namespace Outer::Inner; return i;], 30 | ac_cv_cxx_namespaces=yes, ac_cv_cxx_namespaces=no) 31 | AC_LANG_RESTORE 32 | ]) 33 | if test "$ac_cv_cxx_namespaces" = yes; then 34 | AC_DEFINE(HAVE_NAMESPACES,,[define if the compiler implements namespaces]) 35 | fi 36 | ]) 37 | -------------------------------------------------------------------------------- /m4/ac_cxx_have_stl.m4: -------------------------------------------------------------------------------- 1 | # =========================================================================== 2 | # http://autoconf-archive.cryp.to/ac_cxx_have_stl.html 3 | # =========================================================================== 4 | # 5 | # SYNOPSIS 6 | # 7 | # AC_CXX_HAVE_STL 8 | # 9 | # DESCRIPTION 10 | # 11 | # If the compiler supports the Standard Template Library, define HAVE_STL. 12 | # 13 | # LICENSE 14 | # 15 | # Copyright (c) 2008 Todd Veldhuizen 16 | # Copyright (c) 2008 Luc Maisonobe 17 | # 18 | # Copying and distribution of this file, with or without modification, are 19 | # permitted in any medium without royalty provided the copyright notice 20 | # and this notice are preserved. 
21 | 22 | AC_DEFUN([AC_CXX_HAVE_STL], 23 | [AC_CACHE_CHECK(whether the compiler supports Standard Template Library, 24 | ac_cv_cxx_have_stl, 25 | [AC_REQUIRE([AC_CXX_NAMESPACES]) 26 | AC_LANG_SAVE 27 | AC_LANG_CPLUSPLUS 28 | AC_TRY_COMPILE([#include <list> 29 | #include <deque> 30 | #ifdef HAVE_NAMESPACES 31 | using namespace std; 32 | #endif],[list<int> x; x.push_back(5); 33 | list<int>::iterator iter = x.begin(); if (iter != x.end()) ++iter; return 0;], 34 | ac_cv_cxx_have_stl=yes, ac_cv_cxx_have_stl=no) 35 | AC_LANG_RESTORE 36 | ]) 37 | if test "$ac_cv_cxx_have_stl" = yes; then 38 | AC_DEFINE(HAVE_STL,,[define if the compiler supports Standard Template Library]) 39 | fi 40 | ]) 41 | -------------------------------------------------------------------------------- /package/linux/debian/rules.in: -------------------------------------------------------------------------------- 1 | #!/usr/bin/make -f 2 | 3 | # Uncomment this to turn on verbose mode. 4 | #export DH_VERBOSE=1 5 | 6 | DPKG_EXPORT_BUILDFLAGS=1 7 | include /usr/share/dpkg/buildflags.mk 8 | 9 | %: 10 | dh $@ 11 | 12 | override_dh_auto_configure: 13 | CFLAGS="$(CFLAGS)" CXXFLAGS="$(CXXFLAGS)" LDFLAGS="$(LDFLAGS)" dh_auto_configure -- --enable-docs --enable-lib --enable-java --with-tesseract --with-cuneiform --datadir='$${datarootdir}/$${PACKAGE_NAME}' --docdir='$${datarootdir}/doc/$${PACKAGE_NAME}' 14 | 15 | override_dh_install: 16 | # Check that *.install files have the same names as expected by debuild: 17 | [ -e debian/@PACKAGE_NAME@.install ] || @LN_S@ osra.install debian/@PACKAGE_NAME@.install 18 | [ -e debian/lib@PACKAGE_NAME@@LIB_MAJOR_VERSION@.install ] || @LN_S@ libosra.install debian/lib@PACKAGE_NAME@@LIB_MAJOR_VERSION@.install 19 | [ -e debian/lib@PACKAGE_NAME@-dev.install ] || @LN_S@ libosra-dev.install debian/lib@PACKAGE_NAME@-dev.install 20 | [ -e debian/lib@PACKAGE_NAME@-java@LIB_MAJOR_VERSION@.install ] || @LN_S@ libosra-java.install debian/lib@PACKAGE_NAME@-java@LIB_MAJOR_VERSION@.install 21 | 22 | # Continue with 
normal operation: 23 | dh_install 24 | 25 | # The default file "debian/substvars" is replaced by package-specific "debian/${package}.substvars", so in order not to mess with symlinks, we define an additional file with variable substitution: 26 | override_dh_gencontrol: 27 | dh_gencontrol -- -Tdebian/substvars 28 | -------------------------------------------------------------------------------- /src/unpaper.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | #include <Magick++.h> // Magick::Image 21 | 22 | // Header: unpaper.h 23 | // 24 | // Defines types and functions for unpaper image adjustment module. 
25 | // 26 | 27 | // 28 | // Section: Functions 29 | // 30 | 31 | // Function: unpaper() 32 | // 33 | // Performs unpaper image adjustment based on http://unpaper.berlios.de/ 34 | // 35 | // Parameters: 36 | // picture - image object 37 | // 38 | // Returns: 39 | // 0 in case of success or non-zero error code otherwise 40 | int unpaper(Magick::Image &picture, double &radians, int &unpaper_dx, int &unpaper_dy); 41 | -------------------------------------------------------------------------------- /addons/java/net/sf/osra/OsraLibJnati.java: -------------------------------------------------------------------------------- 1 | package net.sf.osra; 2 | 3 | import java.io.IOException; 4 | import java.io.InputStream; 5 | import java.util.PropertyResourceBundle; 6 | 7 | import net.sf.jnati.NativeCodeException; 8 | import net.sf.jnati.deploy.NativeLibraryLoader; 9 | 10 | /** 11 | * JNI bridge for OSRA library based on JNATI library. 12 | * 13 | * @author Dmitry Katsubo 14 | */ 15 | public class OsraLibJnati extends OsraLibJni { 16 | 17 | private static final String NAME = "osra"; 18 | 19 | private static final String VERSION; 20 | 21 | public static String getVersion() { 22 | return VERSION; 23 | } 24 | 25 | static { 26 | try { 27 | VERSION = getVersionFromResource(); 28 | 29 | NativeLibraryLoader.loadLibrary(NAME, VERSION); 30 | } 31 | catch (NativeCodeException e) { 32 | // Unable to handle this. 33 | throw new RuntimeException(e); 34 | } 35 | catch (IOException e) { 36 | // Unable to handle this. 37 | throw new RuntimeException(e); 38 | } 39 | } 40 | 41 | private static final String MAVEN_PROPERTIES = "META-INF/maven/net.sf.osra/osra/pom.properties"; 42 | 43 | /** 44 | * This function looks up the OSRA version from Maven properties file. 
45 | */ 46 | private static String getVersionFromResource() throws IOException { 47 | InputStream is = OsraLibJnati.class.getClassLoader().getResourceAsStream(MAVEN_PROPERTIES); 48 | 49 | try { 50 | PropertyResourceBundle resourceBundle = new PropertyResourceBundle(is); 51 | 52 | return resourceBundle.getString("version"); 53 | } 54 | finally { 55 | is.close(); 56 | } 57 | } 58 | } 59 | -------------------------------------------------------------------------------- /src/config.h.in: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | /* Tell CImg library that there is not going to be a X11-capable display attached. It doesn't matter for Linux hosts, but matters for OS X. */ 21 | #define cimg_display_type 0 22 | 23 | /* Define to the full name of this package. */ 24 | #undef PACKAGE_NAME 25 | 26 | /* Define to the full name and version of this package. 
*/ 27 | #undef PACKAGE_STRING 28 | 29 | /* Define to the version of this package. */ 30 | #undef PACKAGE_VERSION 31 | 32 | /* Define the location of data files. */ 33 | #undef DATA_DIR 34 | 35 | /* Is tesseract library present? */ 36 | #undef HAVE_TESSERACT_LIB 37 | 38 | /* Is cuneiform library present? */ 39 | #undef HAVE_CUNEIFORM_LIB 40 | -------------------------------------------------------------------------------- /doc/Makefile: -------------------------------------------------------------------------------- 1 | # 2 | # This makefile targets the compilation and installation of project documentation. 3 | # 4 | # We provide compiled documentation on purpose, to reduce the number of dependencies (see linux/debian/control). 5 | # 6 | 7 | include ../Makefile.inc 8 | 9 | HTML_DOC_DIR=html 10 | NATURALDOCS_DIR=nd 11 | 12 | all: 13 | 14 | ifdef NATURALDOCS 15 | all: $(HTML_DOC_DIR)/index.html 16 | 17 | $(HTML_DOC_DIR)/index.html: 18 | mkdir -p $(HTML_DOC_DIR) $(NATURALDOCS_DIR) 19 | $(NATURALDOCS) -i ../src -o HTML $(HTML_DOC_DIR) -p $(NATURALDOCS_DIR) 20 | endif 21 | 22 | ifdef XSLTPROC 23 | all: $(NAME).1 $(NAME).html 24 | 25 | $(NAME).1: manual.sgml 26 | # This command has been taken from http://www.debian.org/doc/manuals/maint-guide/ch-dother.en.html#s-manpagexml: 27 | $(XSLTPROC) --nonet \ 28 | --param make.year.ranges 1 \ 29 | --param make.single.year.ranges 1 \ 30 | --param man.charmap.use.subset 0 \ 31 | http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl \ 32 | $? 33 | 34 | $(NAME).html: manual.sgml 35 | $(XSLTPROC) --nonet \ 36 | --param make.year.ranges 1 \ 37 | --param make.single.year.ranges 1 \ 38 | --param man.charmap.use.subset 0 \ 39 | http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl \ 40 | $? > $@ 41 | 42 | install: $(NAME).1 43 | $(INSTALL_DIR) $(DESTDIR)$(mandir)/man1 44 | $(INSTALL_DATA) $?
$(DESTDIR)$(mandir)/man1 45 | endif 46 | 47 | uninstall: 48 | $(RM) -f $(DESTDIR)$(mandir)/man1/$(NAME).1 49 | 50 | clean: 51 | $(RM) -Rf $(HTML_DOC_DIR) $(NAME).1 $(NAME).html 52 | 53 | distclean: 54 | $(RM) -Rf $(NATURALDOCS_DIR) manual.sgml 55 | 56 | ../Makefile.inc: ../Makefile.inc.in ../config.status 57 | cd .. && ./config.status 58 | -------------------------------------------------------------------------------- /test/benchmark: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | #\ 3 | exec /usr/local/bin/csts -f "$0" ${1+"$@"} 4 | #source identifier.tcl 5 | #ens::identifier::create 6 | 7 | #cmdtrace on 8 | 9 | set dir_ideal [lindex $argv 0] 10 | set dir_result [lindex $argv 1] 11 | 12 | set tanimoto_list {} 13 | set false_positive_names {} 14 | set false_positive_counts {} 15 | set missing_names {} 16 | set missing_counts {} 17 | set identical 0 18 | set total 0 19 | set filelist [exec find $dir_ideal -name *.sdf] 20 | foreach file1 $filelist { 21 | set fh1 [molfile open $file1 r] 22 | set count1 [molfile count $fh1] 23 | set file2 $file1 24 | # regsub {.MOL} $file2 {.TIF.sdf} file2 25 | regsub $dir_ideal $file2 $dir_result file2 26 | set fh2 [molfile open $file2 r] 27 | set count2 [molfile count $fh2] 28 | set k 0 29 | set hashmap [dict create] 30 | molfile loop $fh1 eh1 { 31 | if { [ens get $eh1 E_NATOMS]==0 } continue 32 | incr k 33 | incr total 34 | ens hadd $eh1 35 | ens purge $eh1 E_STDINCHI 36 | set key1 "" 37 | if {[catch {ens need $eh1 E_STDINCHI; set key1 [ens show $eh1 E_STDINCHI]}]} {puts "$file1 $k"} 38 | if {$key1 ne ""} {dict set hashmap $key1 1} 39 | } 40 | set k 0 41 | molfile loop $fh2 eh2 { 42 | if { [ens get $eh2 E_NATOMS]==0 } continue 43 | incr k 44 | ens hadd $eh2 45 | ens purge $eh2 E_STDINCHI 46 | set key2 "" 47 | if {[catch {ens need $eh2 E_STDINCHI; set key2 [ens show $eh2 E_STDINCHI]}]} {puts "$file2 $k"} 48 | if {$key2 ne "" && [dict exists $hashmap $key2]} {incr identical} 49 | } 
50 | molfile close $fh1 51 | molfile close $fh2 52 | } 53 | 54 | puts "Identical structures: $identical" 55 | puts "Total structures: $total" 56 | -------------------------------------------------------------------------------- /m4/ax_cxx_compile_stdcxx_11.m4: -------------------------------------------------------------------------------- 1 | # ============================================================================ 2 | # http://www.gnu.org/software/autoconf-archive/ax_cxx_compile_stdcxx_11.html 3 | # ============================================================================ 4 | # 5 | # SYNOPSIS 6 | # 7 | # AX_CXX_COMPILE_STDCXX_11([ext|noext], [mandatory|optional]) 8 | # 9 | # DESCRIPTION 10 | # 11 | # Check for baseline language coverage in the compiler for the C++11 12 | # standard; if necessary, add switches to CXX to enable support. 13 | # 14 | # This macro is a convenience alias for calling the AX_CXX_COMPILE_STDCXX 15 | # macro with the version set to C++11. The two optional arguments are 16 | # forwarded literally as the second and third argument respectively. 17 | # Please see the documentation for the AX_CXX_COMPILE_STDCXX macro for 18 | # more information. If you want to use this macro, you also need to 19 | # download the ax_cxx_compile_stdcxx.m4 file. 20 | # 21 | # LICENSE 22 | # 23 | # Copyright (c) 2008 Benjamin Kosnik 24 | # Copyright (c) 2012 Zack Weinberg 25 | # Copyright (c) 2013 Roy Stogner 26 | # Copyright (c) 2014, 2015 Google Inc.; contributed by Alexey Sokolov 27 | # Copyright (c) 2015 Paul Norman 28 | # Copyright (c) 2015 Moritz Klammler 29 | # 30 | # Copying and distribution of this file, with or without modification, are 31 | # permitted in any medium without royalty provided the copyright notice 32 | # and this notice are preserved. This file is offered as-is, without any 33 | # warranty. 
34 | 35 | #serial 15 36 | 37 | include([ax_cxx_compile_stdcxx.m4]) 38 | 39 | AC_DEFUN([AX_CXX_COMPILE_STDCXX_11], [AX_CXX_COMPILE_STDCXX([11], [$1], [$2])]) 40 | -------------------------------------------------------------------------------- /test/bugs/gcc_and_graphicsmagick_test/test.cpp: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition 3 | 4 | This is a U.S. Government work (2007-2010) and is therefore not subject to 5 | copyright. However, portions of this work were obtained from a GPL or 6 | GPL-compatible source. 7 | Created by Igor Filippov, 2007-2010 (igorf@helix.nih.gov) 8 | 9 | This program is free software; you can redistribute it and/or modify it under 10 | the terms of the GNU General Public License as published by the Free Software 11 | Foundation; either version 2 of the License, or (at your option) any later 12 | version. 13 | 14 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 15 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 16 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 17 | 18 | You should have received a copy of the GNU General Public License along with 19 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 20 | St, Fifth Floor, Boston, MA 02110-1301, USA 21 | *****************************************************************************/ 22 | 23 | #include 24 | 25 | #include 26 | #include 27 | 28 | using namespace std; 29 | 30 | int main(int argc, char **argv) 31 | { 32 | string fileName = argc > 1 ? argv[1] : "aaaa"; 33 | string type; 34 | 35 | Magick::InitializeMagick(*argv); 36 | 37 | try 38 | { 39 | Magick::Image image; 40 | image.ping(fileName); 41 | type = image.magick(); 42 | } 43 | catch (...) 
44 | { 45 | cerr << "Cannot open file '" << fileName << "'" << endl; 46 | exit(1); 47 | } 48 | 49 | cerr << "File type is '" << type << "'" << endl; 50 | 51 | return 0; 52 | } 53 | -------------------------------------------------------------------------------- /package/linux/debian/control: -------------------------------------------------------------------------------- 1 | Source: osra 2 | Section: science 3 | Priority: optional 4 | Maintainer: Dmitry Katsubo 5 | Build-Depends: debhelper (>= 9), 6 | cimg-dev (>= 1.2.7), 7 | libc6-dev (>= 2.7), 8 | libstdc++-dev, 9 | libtclap-dev (>= 1.2), 10 | libpotrace-dev (>= 1.8), 11 | libgocr-dev (>= 0.49), 12 | libocrad-dev (>= 0.20), 13 | libopenbabel-dev (>= 2.3), 14 | libgraphicsmagick++1-dev (>= 1.3), 15 | libcuneiform-dev (>= 1.1), 16 | libtesseract-dev (>= 3.01), 17 | java2-sdk, 18 | docbook-xsl (>= 1.74.0), 19 | docbook-xml, 20 | xsltproc, 21 | naturaldocs 22 | Standards-Version: 3.9.6 23 | Homepage: http://osra.sourceforge.net/ 24 | XS-Vcs-Svn: https://osra.svn.sourceforge.net/svnroot/osra/ 25 | Vcs-browser: http://osra.svn.sourceforge.net/viewvc/osra/ 26 | 27 | Package: osra 28 | Architecture: any 29 | Depends: ${misc:Depends}, ${shlibs:Depends}, ${binary:Depends}, osra-common (= ${source:Version}) 30 | Recommends: gocr, ocrad, potrace 31 | Description: Command line chemical structure recognition tool (OSRA) 32 | ${common:Description} 33 | 34 | Package: osra-common 35 | Architecture: all 36 | Depends: ${misc:Depends} 37 | Description: Shared files for chemical structure recognition tool (OSRA) 38 | ${common:Description} 39 | 40 | Package: libosra2 41 | Architecture: any 42 | Depends: ${misc:Depends}, ${shlibs:Depends}, ${binary:Depends}, osra-common (= ${source:Version}) 43 | Description: Chemical structure recognition library (OSRA) 44 | ${common:Description} 45 | 46 | Package: libosra-dev 47 | Architecture: any 48 | Section: libdevel 49 | Depends: ${misc:Depends}, libosra2 (= ${binary:Version}) 50 | Description: 
Development headers to consume the OSRA library 51 | ${common:Description} 52 | 53 | Package: libosra-java2 54 | Architecture: any 55 | Section: java 56 | Depends: ${misc:Depends}, ${shlibs:Depends}, ${binary:Depends}, osra-common (= ${source:Version}) 57 | Description: Chemical structure recognition library for Java (OSRA) 58 | ${common:Description} 59 | -------------------------------------------------------------------------------- /src/osra_grayscale.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 
14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_grayscale.h 21 | // 22 | // Declares grayscale conversion functions 23 | // 24 | 25 | #include <Magick++.h> 26 | 27 | using namespace Magick; 28 | 29 | // 30 | // Section: Functions 31 | // 32 | 33 | // Function: getBgColor() 34 | // 35 | // Detects the background color of the image 36 | // 37 | // Parameters: 38 | // image - a reference to the image object 39 | // 40 | // Returns: 41 | // a Color object corresponding to the background color 42 | const Color getBgColor(const Image &image); 43 | 44 | // Function: convert_to_gray() 45 | // 46 | // Converts image to grayscale 47 | // 48 | // Parameters: 49 | // image - reference to Image object 50 | // invert - flag set if the image is white-on-black 51 | // adaptive - flag set if adaptive thresholding is enforced 52 | // verbose - flag set if verbose reporting is on 53 | // 54 | // Returns: 55 | // a boolean flag indicating whether adaptive thresholding is indicated 56 | bool convert_to_gray(Image &image, bool invert, bool adaptive, bool verbose); 57 | -------------------------------------------------------------------------------- /src/osra_ocr_tesseract.cpp: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version.
10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | #include // NULL 21 | #include // free() 22 | #include // isalnum() 23 | #include // strlen() 24 | 25 | #include // std::string 26 | 27 | #include 28 | 29 | const char UNKNOWN_CHAR = '_'; 30 | 31 | // Global variable: 32 | tesseract::TessBaseAPI tess; 33 | 34 | void osra_tesseract_init() 35 | { 36 | tess.Init(NULL, "eng", tesseract::OEM_DEFAULT, NULL, 0, NULL, NULL, false); 37 | } 38 | 39 | void osra_tesseract_destroy() 40 | { 41 | tess.End(); 42 | } 43 | 44 | char osra_tesseract_ocr(unsigned char *pixmap, int width, int height, const std::string &char_filter) 45 | { 46 | char result = UNKNOWN_CHAR; 47 | 48 | char *text = tess.TesseractRect(pixmap, 1, width, 0, 0, width, height); 49 | 50 | // TODO: Why text length should be exactly 3? Give examples... 51 | if (text != NULL && strlen(text) == 3 && isalnum(text[0]) && (char_filter.empty() || char_filter.find(text[0], 0) != std::string::npos)) 52 | result = text[0]; 53 | 54 | free(text); 55 | 56 | return result; 57 | } 58 | -------------------------------------------------------------------------------- /Makefile.inc.in: -------------------------------------------------------------------------------- 1 | # 2 | # This makefile is included from all other makefiles. 3 | # 4 | # Notes: 5 | # - Use $(CXX) variable for compiler. 6 | # - Use $(LINK.cpp) for linker. 
Currently it is equal to "$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)", but if you need real linker detection, use libtool + LT_PATH_LD macros 7 | # - Pass optimization flags as follows: 8 | # CXXFLAGS="-g -O3" ./configure 9 | # There are no defaults for any options as there is no guarantee, that target compiler supports them, see http://www.gnu.org/software/autoconf/manual/make/Command-Variables.html. 10 | # 11 | 12 | NAME := @PACKAGE_NAME@ 13 | VERSION := @PACKAGE_VERSION@ 14 | LIB_MAJOR_VERSION := @LIB_MAJOR_VERSION@ 15 | LIB_MINOR_VERSION := @LIB_MINOR_VERSION@ 16 | LIB_PATCH_VERSION := @LIB_PATCH_VERSION@ 17 | NAME_VERSION := $(NAME)-$(VERSION) 18 | 19 | prefix := @prefix@ 20 | exec_prefix := @exec_prefix@ 21 | bindir := @bindir@ 22 | libdir := @libdir@ 23 | includedir := @includedir@ 24 | datarootdir := @datarootdir@ 25 | datadir := @datadir@ 26 | docdir := @docdir@ 27 | mandir := @mandir@ 28 | 29 | TARGET_CPU := @build_cpu@ 30 | TARGET_OS := @build_os@ 31 | 32 | CXX := @CXX@ 33 | RM := @RM@ 34 | LN_S := @LN_S@ 35 | RANLIB := @RANLIB@ 36 | AR := @AR@ 37 | INSTALL := @INSTALL@ 38 | INSTALL_PROGRAM := @INSTALL_PROGRAM@ 39 | INSTALL_DATA := @INSTALL_DATA@ 40 | INSTALL_DIR := ${INSTALL} -d -m 755 41 | NATURALDOCS := @NATURALDOCS@ 42 | XSLTPROC := @XSLTPROC@ 43 | TESSERACT_LIB := @TESSERACT_LIB@ 44 | OSRA_LIB := @OSRA_LIB@ 45 | OSRA_JAVA := @OSRA_JAVA@ 46 | 47 | # Notes: see "configure --enable-static-linking" to enable static linking; use "configure --enable-profiling" to include extra debug info. 
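# Example (a sketch that only combines the options mentioned in the notes above; adjust the flags to what your compiler supports):
#   CXXFLAGS="-g -O3" ./configure --enable-static-linking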
48 | 49 | CXXFLAGS := @CXXFLAGS@ 50 | CPPFLAGS := @CPPFLAGS@ 51 | LDFLAGS := @LDFLAGS@ 52 | # Important that this variable is re-evaluated each time when used: 53 | LDSHAREDFLAGS = @LDSHAREDFLAGS@ 54 | 55 | EXEEXT := @EXEEXT@ 56 | SHAREDEXT := @SHAREDEXT@ 57 | 58 | LIBS := @LIBS@ 59 | 60 | PHONY_TARGETS := all install uninstall clean distclean 61 | 62 | .PHONY: $(PHONY_TARGETS) 63 | 64 | .SUFFIXES: .c .cpp 65 | -------------------------------------------------------------------------------- /package/linux/debian/control.in: -------------------------------------------------------------------------------- 1 | Source: @PACKAGE_NAME@ 2 | Section: science 3 | Priority: optional 4 | Maintainer: Dmitry Katsubo 5 | Build-Depends: debhelper (>= 9), 6 | cimg-dev (>= 1.2.7), 7 | libc6-dev (>= 2.7), 8 | libstdc++-dev, 9 | libtclap-dev (>= 1.2), 10 | libpotrace-dev (>= 1.8), 11 | libgocr-dev (>= 0.49), 12 | libocrad-dev (>= 0.20), 13 | libopenbabel-dev (>= 2.3), 14 | libgraphicsmagick++1-dev (>= 1.3), 15 | libcuneiform-dev (>= 1.1), 16 | libtesseract-dev (>= 3.01), 17 | java2-sdk, 18 | docbook-xsl (>= 1.74.0), 19 | docbook-xml, 20 | xsltproc, 21 | naturaldocs 22 | Standards-Version: 3.9.6 23 | Homepage: http://osra.sourceforge.net/ 24 | XS-Vcs-Svn: https://osra.svn.sourceforge.net/svnroot/osra/ 25 | Vcs-browser: http://osra.svn.sourceforge.net/viewvc/osra/ 26 | 27 | Package: @PACKAGE_NAME@ 28 | Architecture: any 29 | Depends: ${misc:Depends}, ${shlibs:Depends}, ${binary:Depends}, @PACKAGE_NAME@-common (= ${source:Version}) 30 | Recommends: gocr, ocrad, potrace 31 | Description: Command line chemical structure recognition tool (OSRA) 32 | ${common:Description} 33 | 34 | Package: @PACKAGE_NAME@-common 35 | Architecture: all 36 | Depends: ${misc:Depends} 37 | Description: Shared files for chemical structure recognition tool (OSRA) 38 | ${common:Description} 39 | 40 | Package: lib@PACKAGE_NAME@@LIB_MAJOR_VERSION@ 41 | Architecture: any 42 | Depends: ${misc:Depends}, 
${shlibs:Depends}, ${binary:Depends}, @PACKAGE_NAME@-common (= ${source:Version}) 43 | Description: Chemical structure recognition library (OSRA) 44 | ${common:Description} 45 | 46 | Package: lib@PACKAGE_NAME@-dev 47 | Architecture: any 48 | Section: libdevel 49 | Depends: ${misc:Depends}, lib@PACKAGE_NAME@@LIB_MAJOR_VERSION@ (= ${binary:Version}) 50 | Description: Development headers to consume the OSRA library 51 | ${common:Description} 52 | 53 | Package: lib@PACKAGE_NAME@-java@LIB_MAJOR_VERSION@ 54 | Architecture: any 55 | Section: java 56 | Depends: ${misc:Depends}, ${shlibs:Depends}, ${binary:Depends}, @PACKAGE_NAME@-common (= ${source:Version}) 57 | Description: Chemical structure recognition library for Java (OSRA) 58 | ${common:Description} 59 | -------------------------------------------------------------------------------- /src/osra_stl.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 
14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_stl.h 21 | // 22 | // STL Helpers 23 | // 24 | 25 | #include <vector> // std::vector 26 | #include <ostream> // std::ostream 27 | 28 | #include "osra_labels.h" 29 | #include "osra_fragments.h" 30 | 31 | // Function: operator<<() 32 | // 33 | // Helper template method to print vectors. 34 | namespace std 35 | { 36 | std::ostream& operator<<(std::ostream &os, const letters_t &letter); 37 | 38 | std::ostream& operator<<(std::ostream &os, const label_t &label); 39 | 40 | std::ostream& operator<<(std::ostream &os, const atom_t &atom); 41 | 42 | std::ostream& operator<<(std::ostream &os, const bond_t &bond); 43 | 44 | std::ostream& operator<<(std::ostream &os, const fragment_t &fragment); 45 | 46 | template <class T> 47 | std::ostream& operator<<(std::ostream &os, const std::vector<T> &v) 48 | { 49 | os << '['; 50 | if (!v.empty()) 51 | { 52 | typedef typename std::vector<T>::const_iterator const_iterator; 53 | 54 | const_iterator last = v.end(); 55 | std::copy(v.begin(), --last, std::ostream_iterator<T>(os, ", ")); 56 | os << *last; 57 | } 58 | os << ']'; 59 | 60 | return os; 61 | } 62 | } 63 | -------------------------------------------------------------------------------- /src/osra_reaction.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the
License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_reaction.h 21 | // 22 | // Defines functions dealing with generating a reaction type output 23 | // 24 | 25 | #define SUBSTITUTE_REACTION_FORMAT "mol" 26 | 27 | 28 | // 29 | // Section: Functions 30 | // 31 | 32 | // Function: arrange_reactions 33 | // 34 | // Create a reaction representation for input vector of structures 35 | // 36 | // Parameters: 37 | // arrows - a vector of arrow_t objects representing arrows found during segmentation 38 | // page_of_boxes - a vector of box_t objects representing bounding boxes of molecules 39 | // pluses - a vector of plus sign centers 40 | // results - a vector of strings to represent output results 41 | // page_of_structures - input vector of reactants, intermediates and products 42 | // output_format - format of the returned result, i.e.
rsmi or cmlr 43 | // 44 | 45 | 46 | void arrange_reactions(std::vector &arrows, const std::vector &page_of_boxes, 47 | const std::vector &pluses, std::vector &results, 48 | std::vector &rbox, const std::vector &page_of_structures, 49 | const std::string &output_format); 50 | -------------------------------------------------------------------------------- /addons/java/net/sf/osra/OsraLib.java: -------------------------------------------------------------------------------- 1 | package net.sf.osra; 2 | 3 | import java.io.Writer; 4 | 5 | /** 6 | * JNI bridge for OSRA library. 7 | */ 8 | public class OsraLib { 9 | 10 | /** 11 | * Process the given image with OSRA library. For more information see the corresponding CLI options. 13 | * 14 | * @param imageData 15 | * the image binary data 16 | * @param outputStructureWriter 17 | * the writer to output the found structures in given format 18 | * @param rotate 19 | * rotate image, degrees 20 | * @param invert 21 | * force color inversion (for white-on-black images) 22 | * @param input_resolution 23 | * force processing at a specific resolution, dpi 24 | * @param threshold 25 | * black-white binarization threshold 0.0-1.0 26 | * @param do_unpaper 27 | * perform unpaper image pre-processing, rounds 28 | * @param jaggy 29 | * perform image downsampling 30 | * @param adaptive_option 31 | * perform adaptive thresholding (more CPU-intensive) 32 | * @param format 33 | * one of the formats, accepted by OpenBabel ("sdf", "smi", "can"). 34 | * @param embeddedFormat 35 | * format to be embedded into SDF ("inchi", "smi", "can"). 
36 | * @param outputConfidence 37 | * include confidence 38 | * @param show_resolution_guess 39 | * include image resolution estimate 40 | * @param show_page 41 | * include page number 42 | * @param outputCoordinates 43 | * include box coordinates 44 | * @param outputAvgBondLength 45 | * include average bond length 46 | * @return 0, if the call succeeded or negative value in case of error 47 | */ 48 | public static native int processImage(byte[] imageData, Writer outputStructureWriter, 49 | int rotate, boolean invert, int input_resolution, double threshold, int do_unpaper, boolean jaggy, boolean adaptive_option, 50 | String format, 51 | String embeddedFormat, 52 | boolean outputConfidence, 53 | boolean show_resolution_guess, 54 | boolean show_page, 55 | boolean outputCoordinates, 56 | boolean outputAvgBondLength); 57 | } 58 | 59 | -------------------------------------------------------------------------------- /pom.xml.in: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 11 | 12 | 4.0.0 13 | 14 | net.sf.osra 15 | osra 16 | jar 17 | @PACKAGE_VERSION@ 18 | 19 | OSRA 20 | http://osra.sourceforge.net/ 21 | 22 | 23 | https://osra.svn.sourceforge.net/svnroot/osra/trunk 24 | 25 | 26 | 27 | SourceForge 28 | https://sourceforge.net/tracker/?group_id=203833&atid=987182 29 | 30 | 31 | 32 | 33 | GPLv2 License 34 | http://www.gnu.org/licenses/gpl-2.0.txt 35 | 36 | 37 | 38 | 39 | 40 | net.sf.jnati 41 | jnati-deploy 42 | 0.4 43 | 44 | 45 | log4j 46 | log4j 47 | 48 | 49 | 50 | 51 | junit 52 | junit 53 | 4.8.2 54 | test 55 | 56 | 57 | commons-io 58 | commons-io 59 | 1.4 60 | test 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | maven-deploy-plugin 69 | 2.7 70 | 71 | 72 | 73 | 74 | addons/java 75 | 76 | 77 | 78 | -------------------------------------------------------------------------------- /src/osra_stl.cpp: -------------------------------------------------------------------------------- 1 | 
/****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | #include "osra_stl.h" 21 | 22 | // Function: operator<<() 23 | // 24 | // Helper template method to print various object types. 
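// Usage sketch (an illustration, not part of the original sources): with
// osra_stl.h included, streaming a std::vector<int> holding 1, 2 and 3 via
// "std::cout << v" is expected to print "[1, 2, 3]", because the vector
// overload writes every element except the last followed by ", ", then the
// last element, all enclosed in square brackets.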
25 | namespace std 26 | { 27 | std::ostream& operator<<(std::ostream &os, const letters_t &letter) 28 | { 29 | os << "{letter char:" << letter.a << " x:" << letter.x << " y:" << letter.y << " r:" << letter.r << " free:" << letter.free << '}'; 30 | 31 | return os; 32 | } 33 | 34 | std::ostream& operator<<(std::ostream &os, const label_t &label) 35 | { 36 | os << "{label s:" << label.a << " box:" << label.x1 << "x" << label.y1 << "-" << label.x2 << "x" << label.y2 << '}'; 37 | 38 | return os; 39 | } 40 | 41 | std::ostream& operator<<(std::ostream &os, const atom_t &atom) 42 | { 43 | os << "{atom label:" << atom.label << " x:" << atom.x << " y:" << atom.y << " n:" << atom.n << " anum:" << atom.anum << '}'; 44 | 45 | return os; 46 | } 47 | 48 | std::ostream& operator<<(std::ostream &os, const bond_t &bond) 49 | { 50 | os << "{bond a:" << bond.a << " b:" << bond.b << " type:" << bond.type << " exists:" << bond.exists 51 | << " arom:" << bond.arom << " hash:" << bond.hash << " wedge:" << bond.wedge << " up:" << bond.up << " down:" << bond.down<< '}'; 52 | 53 | return os; 54 | } 55 | 56 | std::ostream& operator<<(std::ostream &os, const fragment_t &fragment) 57 | { 58 | os << "{fragment " << fragment.x1 << "x" << fragment.y1 << "-" << fragment.x2 << "x" << fragment.y2 << " atoms.size:" << fragment.atom.size() << '}'; 59 | 60 | return os; 61 | } 62 | } 63 | -------------------------------------------------------------------------------- /src/osra_anisotropic.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your 
option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_anisotropic.h 21 | // 22 | // Defines types and functions for anisotropic smoothing module. 23 | // 24 | 25 | #include <Magick++.h> // Magick::Image 26 | 27 | // 28 | // Section: Functions 29 | // 30 | 31 | // Function: anisotropic_smoothing() 32 | // 33 | // Performs Greycstoration anisotropic smoothing on an image according to the specified parameters 34 | // 35 | // Parameters: 36 | // image - image object 37 | // width - width of image 38 | // height - height of image 39 | // amplitude - amplitude of smoothing 40 | // sharpness - sharpness parameter 41 | // anisotropy - anisotropy parameter 42 | // alpha - alpha parameter for smoothing 43 | // sigma - sigma parameter for smoothing 44 | // 45 | // Returns: 46 | // image object 47 | // 48 | // See also: 49 | // 50 | Magick::Image anisotropic_smoothing(const Magick::Image &image, int width, int height, const float amplitude, 51 | const float sharpness, const float anisotropy, const float alpha, const float sigma); 52 | 53 | // Function: anisotropic_scaling() 54 | // 55 | // Performs Greycstoration anisotropic scaling on an image 56 | // 57 | // Parameters: 58 | // image - image object 59 | // width - width of image 60 | // height - height of image 61 | // nw - new width 62 | // nh - new height 63 | // 64 | // Returns: 65 | // image object 66 | // 67 | // See also: 68 | // 69 | Magick::Image anisotropic_scaling(const
Magick::Image &image, int width, int height, int nw, int nh); 70 | 71 | -------------------------------------------------------------------------------- /src/detect.cpp: -------------------------------------------------------------------------------- 1 | // g++ -I/usr/local/include/openbabel-2.0/ -static detect.cpp -o detect -L/usr/local/lib -lopenbabel -lz -linchi 2 | #include <openbabel/mol.h> // OpenBabel::OBMol 3 | #include <openbabel/obconversion.h> // OpenBabel::OBConversion 4 | #include <dirent.h> // opendir(), readdir(), closedir() 5 | #include <iostream> // std::cout, std::cerr 6 | #include <set> // std::set 7 | #include <string> // std::string 8 | 9 | void collect_inchi(std::set<std::string> &inchi1, const std::string &name1) 10 | { 11 | OpenBabel::OBConversion obconversion; 12 | obconversion.SetInFormat("sdf"); 13 | obconversion.SetOutFormat("inchi"); 14 | obconversion.SetOptions("K", obconversion.OUTOPTIONS); 15 | OpenBabel::OBMol mol; 16 | bool notatend = obconversion.ReadFile(&mol, name1); 17 | while (notatend) 18 | { 19 | std::string inchi = obconversion.WriteString(&mol); 20 | if (!inchi.empty()) 21 | inchi1.insert(inchi); 22 | mol.Clear(); 23 | notatend = obconversion.Read(&mol); 24 | } 25 | } 26 | 27 | void print_errors(const std::set<std::string> &inchi1, const std::string &name2) 28 | { 29 | OpenBabel::OBConversion obconversion; 30 | obconversion.SetInFormat("sdf"); 31 | obconversion.SetOutFormat("inchi"); 32 | obconversion.SetOptions("K", obconversion.OUTOPTIONS); 33 | OpenBabel::OBMol mol; 34 | bool notatend = obconversion.ReadFile(&mol, name2); 35 | int i = 0; 36 | while (notatend) 37 | { 38 | std::string inchi = obconversion.WriteString(&mol); 39 | if (inchi.empty() || inchi1.find(inchi) == inchi1.end()) 40 | { 41 | std::cout << name2 << " " << i << std::endl; 42 | } 43 | mol.Clear(); 44 | notatend = obconversion.Read(&mol); 45 | i++; 46 | } 47 | } 48 | 49 | 50 | 51 | int main(int argc,char **argv) 52 | { 53 | 54 | if(argc<3) 55 | { 56 | std::cerr << "Usage: " << argv[0] <<" ground_truth/ computed/" << std::endl; 57 | return 1; 58 | } 59 | 60 | OpenBabel::obErrorLog.StopLogging(); 61 | 62 | std::string folder1(argv[1]); 63 | std::string folder2(argv[2]); 64 | DIR *dir; 65 |
struct dirent *ent; 66 | size_t total = 0, identical = 0, computed = 0; 67 | if ((dir = opendir (folder1.c_str())) != NULL) 68 | { 69 | while ((ent = readdir (dir)) != NULL) 70 | { 71 | std::string name1 = folder1 + ent->d_name; 72 | std::string name2 = folder2 + ent->d_name; 73 | if (name1.size() > 4 && name1.substr(name1.size()-4) == ".sdf") 74 | { 75 | std::set<std::string> inchi1; 76 | collect_inchi(inchi1,name1); 77 | print_errors(inchi1,name2); 78 | } 79 | } 80 | closedir (dir); 81 | } 82 | else 83 | { 84 | std::cerr << "Unable to open directory " << argv[1] << std::endl; 85 | return 1; 86 | } 87 | 88 | 89 | return(0); 90 | } 91 | -------------------------------------------------------------------------------- /src/osra_thin.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 
14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_thin.h 21 | // 22 | // Image thinning routines and noise factor computation 23 | // 24 | #include <Magick++.h> 25 | 26 | using namespace Magick; 27 | 28 | // 29 | // Section: Functions 30 | // 31 | 32 | // Function: noise_factor() 33 | // 34 | // Computes attributes of the line thickness histogram 35 | // 36 | // Parameters: 37 | // image - image to be processed 38 | // width - image width 39 | // height - image height 40 | // bgColor - background color 41 | // THRESHOLD_BOND - black-white binarization threshold 42 | // resolution - resolution at which the image is being processed 43 | // max - position of the maximum of the thickness histogram (most common thickness) 44 | // nf45 - ratio of the number of lines with thickness 4 to the number of lines with thickness 5 45 | // 46 | // Returns: 47 | // Ratio of the number of lines with thickness 2 to the number of lines with thickness 3 48 | // or, if max == 2, ratio of the number of lines with thickness 1 to the number of lines with thickness 2 49 | // or, if max == 1, ratio of the number of lines with thickness 2 to the number of lines with thickness 1 50 | double noise_factor(const Image &image, int width, int height, const ColorGray &bgColor, double THRESHOLD_BOND, 51 | int resolution, int &max, double &nf45); 52 | 53 | // Function: thin_image() 54 | // 55 | // Performs image thinning based on Rosenfeld's algorithm 56 | // 57 | // Parameters: 58 | // box - original image 59 | // THRESHOLD_BOND - black-white binarization threshold 60 | // bgColor - background color 61 | // 62 | // Returns: 63 | // Thinned image 64 | Image thin_image(const Image &box, double THRESHOLD_BOND, const ColorGray &bgColor); 65 | 
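The Returns section of noise_factor() above describes which thickness ratio is reported for each value of max. A minimal sketch of that selection, assuming the thickness histogram has already been computed, could look as follows. The names thickness_summary_t, count_at(), ratio() and summarize_thickness() are hypothetical and not part of OSRA, and returning 0 for an empty denominator is an assumption of this sketch, not documented OSRA behavior.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; it is not part of OSRA.
// hist[t] holds the number of lines of thickness t (index 0 is unused).
struct thickness_summary_t
{
  int max;     // most common thickness (position of the histogram maximum)
  double nf;   // ratio selected according to max, as in the Returns section
  double nf45; // lines of thickness 4 divided by lines of thickness 5
};

// Returns hist[t], or 0 when t lies outside the histogram.
static double count_at(const std::vector<double> &hist, std::size_t t)
{
  return t < hist.size() ? hist[t] : 0.0;
}

// Assumption: an empty denominator yields 0 instead of a division by zero.
static double ratio(double num, double den)
{
  return den > 0 ? num / den : 0.0;
}

thickness_summary_t summarize_thickness(const std::vector<double> &hist)
{
  // Find the first position (starting at thickness 1) holding the largest count.
  std::size_t best = 1;
  for (std::size_t t = 2; t < hist.size(); t++)
    if (hist[t] > count_at(hist, best))
      best = t;

  thickness_summary_t s;
  s.max = (int) best;

  if (best == 1)
    s.nf = ratio(count_at(hist, 2), count_at(hist, 1));
  else if (best == 2)
    s.nf = ratio(count_at(hist, 1), count_at(hist, 2));
  else
    s.nf = ratio(count_at(hist, 2), count_at(hist, 3));

  s.nf45 = ratio(count_at(hist, 4), count_at(hist, 5));
  return s;
}
```

In the declared noise_factor() the most common thickness and the 4-to-5 ratio are passed back through the max and nf45 reference parameters; the sketch groups them in a struct only to stay self-contained.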
-------------------------------------------------------------------------------- /src/recall.cpp: -------------------------------------------------------------------------------- 1 | // g++ -I/usr/local/include/openbabel-2.0/ -static recall.cpp -o recall -L/usr/local/lib -lopenbabel -lz -linchi 2 | #include <openbabel/mol.h> // OpenBabel::OBMol 3 | #include <openbabel/obconversion.h> // OpenBabel::OBConversion 4 | #include <dirent.h> // opendir(), readdir(), closedir() 5 | #include <iostream> // std::cout, std::cerr 6 | #include <set> // std::set 7 | #include <string> // std::string 8 | 9 | void collect_inchi(std::set<std::string> &inchi1, const std::string &name1, int &count) 10 | { 11 | count = 0; 12 | OpenBabel::OBConversion obconversion; 13 | obconversion.SetInFormat("sdf"); 14 | obconversion.SetOutFormat("inchi"); 15 | obconversion.SetOptions("K", obconversion.OUTOPTIONS); 16 | OpenBabel::OBMol mol; 17 | bool notatend = obconversion.ReadFile(&mol, name1); 18 | while (notatend) 19 | { 20 | std::string inchi = obconversion.WriteString(&mol); 21 | if (!inchi.empty()) 22 | inchi1.insert(inchi); 23 | count++; 24 | mol.Clear(); 25 | notatend = obconversion.Read(&mol); 26 | } 27 | } 28 | 29 | template <class InputIterator1, class InputIterator2> 30 | size_t size_intersection (InputIterator1 first1, InputIterator1 last1, 31 | InputIterator2 first2, InputIterator2 last2) 32 | { 33 | size_t result = 0; 34 | while (first1!=last1 && first2!=last2) 35 | { 36 | if (*first1<*first2) ++first1; 37 | else if (*first2<*first1) ++first2; 38 | else { 39 | ++result; ++first1; ++first2; 40 | } 41 | } 42 | return result; 43 | } 44 | 45 | int main(int argc,char **argv) 46 | { 47 | 48 | if(argc<3) 49 | { 50 | std::cerr << "Usage: " << argv[0] <<" ground_truth/ computed/" << std::endl; 51 | return 1; 52 | } 53 | 54 | OpenBabel::obErrorLog.StopLogging(); 55 | 56 | std::string folder1(argv[1]); 57 | std::string folder2(argv[2]); 58 | DIR *dir; 59 | struct dirent *ent; 60 | size_t total = 0, identical = 0, computed = 0; 61 | if ((dir = opendir (folder1.c_str())) != NULL) 62 | { 63 | while ((ent = readdir (dir)) != NULL) 64 | { 65 | std::string name1 = folder1 + ent->d_name; 66 | std::string name2 = folder2 + ent->d_name; 67 | if (name1.size() > 4 &&
name1.substr(name1.size()-4) == ".sdf") 68 | { 69 | std::set<std::string> inchi1,inchi2; 70 | int count1, count2; 71 | collect_inchi(inchi1,name1, count1); 72 | collect_inchi(inchi2,name2, count2); 73 | total += inchi1.size(); 74 | computed += count2 - (count1 - inchi1.size()); 75 | identical += size_intersection(inchi1.begin(), inchi1.end(), inchi2.begin(), inchi2.end()); 76 | } 77 | } 78 | closedir (dir); 79 | } 80 | else 81 | { 82 | std::cerr << "Unable to open directory " << argv[1] << std::endl; 83 | return 1; 84 | } 85 | 86 | // std::cout << total <<" "<< identical << " " << double(identical) / total << " " << double(identical) / computed << std::endl; 87 | 88 | return(0); 89 | } 90 | -------------------------------------------------------------------------------- /src/osra_rgroup.cpp: -------------------------------------------------------------------------------- 1 | #include "iostream" 2 | #include "stdio.h" 3 | #include "osra_common.h" 4 | #include "osra_lib.h" 5 | #include "string" 6 | #include 7 | #include 8 | #include 9 | 10 | namespace py = pybind11; 11 | 12 | PYBIND11_MODULE(osra_rgroup, m){ 13 | 14 | m.doc() = "Python Wrapper of OSRA."; 15 | // Function for resolving R-groups 16 | m.def("read_rgroup", &read_rgroup, "Extension of OSRA to resolve generic chemical diagrams (employed by Chemical Schematic Diagram Extractor)", 17 | py::arg("list_of_rgroup_maps"), 18 | py::arg("input_file"), 19 | py::arg("image_data") = "a", 20 | py::arg("image_length") = 4, 21 | py::arg("output_file") = "", 22 | py::arg("rotate") = 0, 23 | py::arg("invert") = false, 24 | py::arg("input_resolution") = 0, 25 | py::arg("threshold") = 0., 26 | py::arg("do_unpaper") = 0, 27 | py::arg("jaggy") = false, 28 | py::arg("adaptive_option") = false, 29 | py::arg("output_format") = "smi", 30 | py::arg("embedded_format") = "", 31 | py::arg("show_confidence") = false, 32 | py::arg("show_resolution_guess") = false, 33 | py::arg("show_page") = false, 34 | py::arg("show_coordinates") = false, 35 | 
py::arg("show_avg_bond_length") = false, 36 | py::arg("show_learning") = false, 37 | py::arg("osra_dir") = "/usr/local/bin", 38 | py::arg("spelling_file") = "", 39 | py::arg("superatom_file") = "", 40 | py::arg("debug") = false, 41 | py::arg("verbose") = false, 42 | py::arg("output_image_file_prefix") = "", 43 | py::arg("resize") = "", 44 | py::arg("preview") = "" 45 | ); 46 | 47 | // Function for directly resolving images 48 | m.def("read_diagram", &read_diagram, "Python wrapper of OSRA for resolving chemical diagrams", 49 | py::arg("input_file"), 50 | py::arg("image_data") = "a", 51 | py::arg("image_length") = 4, 52 | py::arg("output_file") = "", 53 | py::arg("rotate") = 0, 54 | py::arg("invert") = false, 55 | py::arg("input_resolution") = 0, 56 | py::arg("threshold") = 0., 57 | py::arg("do_unpaper") = 0, 58 | py::arg("jaggy") = false, 59 | py::arg("adaptive_option") = false, 60 | py::arg("output_format") = "smi", 61 | py::arg("embedded_format") = "", 62 | py::arg("show_confidence") = false, 63 | py::arg("show_resolution_guess") = false, 64 | py::arg("show_page") = false, 65 | py::arg("show_coordinates") = false, 66 | py::arg("show_avg_bond_length") = false, 67 | py::arg("show_learning") = false, 68 | py::arg("osra_dir") = "/usr/local/bin", 69 | py::arg("spelling_file") = "", 70 | py::arg("superatom_file") = "", 71 | py::arg("debug") = false, 72 | py::arg("verbose") = false, 73 | py::arg("output_image_file_prefix") = "", 74 | py::arg("resize") = "", 75 | py::arg("preview") = "" 76 | ); 77 | } -------------------------------------------------------------------------------- /addons/lib_sample/lib_sample.cpp: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or 
modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | #include <stddef.h> // NULL 21 | #include <stdlib.h> // malloc(), free() 22 | 23 | #include <iostream> // std::cout 24 | #include <fstream> // std::ifstream 25 | #include <sstream> // std::ostringstream 26 | 27 | #include <osra_lib.h> 28 | 29 | using namespace std; 30 | 31 | int main(int argc, char **argv) 32 | { 33 | if (argc < 2) 34 | { 35 | cout << "Usage: " << argv[0] << " [image_file_name]" << endl; 36 | return 1; 37 | } 38 | 39 | ifstream is(argv[1]); 40 | 41 | if (!is.is_open()) 42 | { 43 | cout << "Failed to open a file '" << argv[1] << '\'' << endl; 44 | return 2; 45 | } 46 | 47 | // Learn the file size: 48 | is.seekg(0, ios::end); 49 | const int buf_size = (int) is.tellg(); 50 | is.seekg(0, ios::beg); 51 | 52 | // Allocate memory: 53 | char* buf = (char*) malloc(buf_size); 54 | 55 | if (buf == NULL) 56 | { 57 | cout << "Failed to allocate " << buf_size << " bytes of memory" << endl; 58 | is.close(); 59 | return 3; 60 | } 61 | 62 | is.read(buf, buf_size); 63 | 64 | // Call OSRA: 65 | const int result = osra_process_image( 66 | buf, 67 | buf_size, 68 | cout, 69 | 0, 70 | false, 71 | 0, 72 | 0, 73 | 0, 74 | false, 75 | false, 76 | "sdf", 77 | "", 78 | true, 79 | false, 80 | false, 81 | true, 82 | true, 83 | "", 84 | "", 85 | "", 86 | false, 87 | false 88 | ); 89 | 90 | // Release 
the allocated resources: 91 | is.close(); 92 | free(buf); 93 | 94 | return result; 95 | } 96 | -------------------------------------------------------------------------------- /addons/valgrind.supp: -------------------------------------------------------------------------------- 1 | # 2 | # Here is an example of running valgrind, suppressing the warnings for 3rd-party libraries: 3 | # 4 | # valgrind --leak-check=full --show-reachable=yes --read-var-info=yes --track-origins=yes --gen-suppressions=all --suppressions=valgrind.supp osra my_test.tif 2> out 5 | # 6 | # For more information about suppressions, see http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress 7 | # 8 | # To perform time profiling one needs to compile (and link) the sources with "gcc -pg" (see Makefile.inc for more details). 9 | # 10 | 11 | ################## 12 | # Basic libraries 13 | ################## 14 | 15 | { 16 | Uninitialised value in libdl 17 | Memcheck:Cond 18 | ... 19 | obj:/lib/ld-*.so 20 | } 21 | 22 | { 23 | Uninitialised value in libltdl 24 | Memcheck:Cond 25 | ... 26 | obj:/usr/lib/libltdl.so.* 27 | } 28 | 29 | { 30 | Invalid read of size 8 in libltdl 31 | Memcheck:Addr8 32 | ... 33 | obj:/usr/lib/libltdl.so.* 34 | } 35 | 36 | { 37 | Memory leak in library initialization code for all libraries 38 | Memcheck:Leak 39 | ... 40 | fun:_dl_init 41 | } 42 | 43 | ################## 44 | # GraphicsMagick 45 | ################## 46 | 47 | { 48 | Conditional jump or move depends on uninitialised value(s) in libGraphicsMagick 49 | Memcheck:Cond 50 | ... 51 | obj:/usr/lib/libGraphicsMagick.so.* 52 | } 53 | 54 | { 55 | Use of uninitialised value of size 4 in libGraphicsMagick 56 | Memcheck:Value4 57 | ... 58 | obj:/usr/lib/libGraphicsMagick.so.* 59 | } 60 | 61 | { 62 | Invalid read of size 8 in libGraphicsMagick 63 | Memcheck:Addr8 64 | ... 
65 | obj:/usr/lib/libGraphicsMagick.so.* 66 | } 67 | 68 | { 69 | Conditional jump or move depends on uninitialised value(s) in libGraphicsMagick++ 70 | Memcheck:Cond 71 | ... 72 | obj:/usr/lib/libGraphicsMagick++.so.* 73 | } 74 | 75 | { 76 | Use of uninitialised value of size 4 in libGraphicsMagick++ 77 | Memcheck:Value4 78 | ... 79 | obj:/usr/lib/libGraphicsMagick++.so.* 80 | } 81 | 82 | { 83 | Invalid read of size 4 in libGraphicsMagick++ 84 | Memcheck:Addr4 85 | ... 86 | obj:/usr/lib/libGraphicsMagick++.so.* 87 | } 88 | 89 | { 90 | Invalid read of size 8 in libGraphicsMagick++ 91 | Memcheck:Addr8 92 | ... 93 | obj:/usr/lib/libGraphicsMagick++.so.* 94 | } 95 | 96 | ################## 97 | # OpenBabel 98 | ################## 99 | 100 | { 101 | Conditional jump or move depends on uninitialised value(s) in libopenbabel 102 | Memcheck:Cond 103 | ... 104 | obj:/usr/lib/libopenbabel.so.* 105 | } 106 | 107 | { 108 | Use of uninitialised value of size 4 in libopenbabel 109 | Memcheck:Value4 110 | ... 111 | obj:/usr/lib/libopenbabel.so.* 112 | } 113 | 114 | { 115 | Invalid read of size 4 in libopenbabel 116 | Memcheck:Addr4 117 | ... 118 | obj:/usr/lib/libopenbabel.so.* 119 | } 120 | 121 | { 122 | Invalid read of size 8 in libopenbabel 123 | Memcheck:Addr8 124 | ... 125 | obj:/usr/lib/libopenbabel.so.* 126 | } 127 | 128 | { 129 | Use of uninitialised value of size 4 in libopenbabel 130 | Memcheck:Value4 131 | ... 132 | obj:/usr/lib/openbabel/*.so 133 | } 134 | 135 | { 136 | Invalid read of size 8 in libopenbabel 137 | Memcheck:Addr8 138 | ... 
139 | obj:/usr/lib/openbabel/*.so 140 | } 141 | -------------------------------------------------------------------------------- /package/android/runosra.java: -------------------------------------------------------------------------------- 1 | package cadd.osra.main; 2 | 3 | import java.io.ByteArrayOutputStream; 4 | import java.io.File; 5 | import java.io.IOException; 6 | import java.io.InputStream; 7 | import java.io.OutputStream; 8 | import java.net.URLEncoder; 9 | 10 | import android.app.Activity; 11 | import android.content.Intent; 12 | import android.net.Uri; 13 | import android.os.Bundle; 14 | import android.widget.TextView; 15 | 16 | public class runosra extends Activity { 17 | /** Called when the activity is first created. */ 18 | @Override 19 | public void onCreate(Bundle savedInstanceState) { 20 | super.onCreate(savedInstanceState); 21 | setContentView(R.layout.main); 22 | 23 | try { 24 | writeToStream(getAssets().open("spelling.txt"), openFileOutput("spelling.txt", MODE_WORLD_READABLE )); 25 | } catch (Exception e) { 26 | e.printStackTrace(); 27 | } 28 | try { 29 | writeToStream(getAssets().open("superatom.txt"), openFileOutput("superatom.txt", MODE_WORLD_READABLE)); 30 | } catch (Exception e) { 31 | e.printStackTrace(); 32 | } 33 | byte [] rawimage=null; 34 | try { 35 | rawimage=writeToArray(getAssets().open("c.jpg")); 36 | } catch (Exception e) { 37 | e.printStackTrace(); 38 | } 39 | 40 | 41 | 42 | File spelling=getFileStreamPath("spelling.txt"); 43 | File superatom=getFileStreamPath("superatom.txt"); 44 | //File image=getFileStreamPath("chemnav.png"); 45 | //TextView tv = new TextView(this); 46 | String [] jargv= {"osra","-f","inchi","-l",spelling.getAbsolutePath(),"-a",superatom.getAbsolutePath()}; 47 | String inchi=nativeosra(jargv,rawimage); 48 | String enc_inchi=""; 49 | try { 50 | enc_inchi=URLEncoder.encode(inchi, "UTF-8"); 51 | } catch (Exception e) { 52 | e.printStackTrace(); 53 | } 54 | if (enc_inchi.length()!=0) { 55 | //String 
base_url="http://en.wikipedia.org/wiki/Special:Search?fulltext=Search&search="; 56 | String base_url="http://129.43.27.140/cgi-bin/lookup/results?type=inchi&context_all=all&query="; 57 | String url=base_url+enc_inchi; 58 | Uri uri = Uri.parse(url); 59 | Intent intent = new Intent(Intent.ACTION_VIEW, uri); 60 | startActivity(intent); 61 | 62 | } 63 | //tv.setText( enc_inchi ); 64 | //setContentView(tv); 65 | 66 | } 67 | 68 | public native String nativeosra(String [] jargv, byte [] rawimage); 69 | static { 70 | System.loadLibrary("openbabel"); 71 | System.loadLibrary("osra"); 72 | } 73 | 74 | public static void writeToStream(InputStream in , OutputStream out) throws IOException 75 | { 76 | byte[] bytes = new byte[2048]; 77 | 78 | for (int c = in.read(bytes); c != -1; c = in.read(bytes)) { 79 | out.write(bytes,0, c); 80 | } 81 | in.close(); 82 | out.close(); 83 | } 84 | 85 | public static byte [] writeToArray(InputStream in) throws IOException 86 | { 87 | ByteArrayOutputStream out=new ByteArrayOutputStream(); 88 | byte[] bytes = new byte[2048]; 89 | 90 | for (int c = in.read(bytes); c != -1; c = in.read(bytes)) { 91 | out.write(bytes,0, c); 92 | } 93 | in.close(); 94 | //byte [] prep = new byte[out.size()]; 95 | byte [] arr=out.toByteArray(); 96 | out.close(); 97 | return arr; 98 | } 99 | } 100 | -------------------------------------------------------------------------------- /src/osra_fragments.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 
10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_fragments.h 21 | // 22 | // Declares operations on molecular fragments 23 | // 24 | #ifndef OSRA_FRAGMENTS_H 25 | #define OSRA_FRAGMENTS_H 26 | 27 | #include "osra.h" 28 | 29 | //struct: fragment_s 30 | // used to split a chemical structure into unconnected molecules. 31 | struct fragment_s 32 | { 33 | //int: x1,y1,x2,y2 34 | //top left and bottom right coordinates of the fragment 35 | int x1, y1, x2, y2; 36 | //array: atom 37 | //vector of atom indices for atoms in a molecule of this fragment 38 | std::vector<int> atom; 39 | }; 40 | //typedef: fragment_t 41 | //defines fragment_t type based on fragment_s struct 42 | typedef struct fragment_s fragment_t; 43 | 44 | // 45 | // Section: Functions 46 | // 47 | 48 | // Function: find_fragments() 49 | // 50 | // Finds disjoint fragments in a molecule 51 | // 52 | // Parameters: 53 | // bond - vector of bonds 54 | // n_bond - number of bonds 55 | // atom - vector of atoms 56 | // 57 | // Returns: 58 | // vector of vectors of atom ids which belong to different fragments 59 | std::vector<std::vector<int> > find_fragments(const std::vector<bond_t> &bond, int n_bond, const std::vector<atom_t> &atom); 60 | 61 | // Function: reconnect_fragments() 62 | // 63 | // Reconnects atoms from different fragments if they are less than 1.1 average bond lengths apart 64 | // 65 | // Parameters: 66 | // bond - vector of bonds 67 | // n_bond - number of bonds 68 | // atom - vector of atoms 69 | // avg -
average bond length 70 | // 71 | // Returns: 72 | // New number of bonds 73 | int reconnect_fragments(std::vector<bond_t> &bond, int n_bond, std::vector<atom_t> &atom, double avg); 74 | 75 | // Function: populate_fragments() 76 | // 77 | // Transforms vector of vectors of atom ids into a vector of fragments 78 | // 79 | // Parameters: 80 | // frags - vector of vector of atom ids 81 | // atom - vector of atoms 82 | // 83 | // Returns: 84 | // vector of fragments 85 | std::vector<fragment_t> populate_fragments(const std::vector<std::vector<int> > &frags, const std::vector<atom_t> &atom); 86 | 87 | // Function: comp_fragments() 88 | // 89 | // Comparison function used for sorting fragments according to their positions in the picture: top-down, left to right 90 | // 91 | // Parameters: 92 | // aa, bb - two fragments to compare 93 | // 94 | // Returns: 95 | // True if fragment aa is higher or to the left of fragment bb. 96 | // False otherwise. 97 | bool comp_fragments(const fragment_t &aa, const fragment_t &bb); 98 | 99 | #endif // OSRA_FRAGMENTS_H 100 | -------------------------------------------------------------------------------- /test/bugs/tesseract_init_test/test.cpp: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition 3 | 4 | This is a U.S. Government work (2007-2010) and is therefore not subject to 5 | copyright. However, portions of this work were obtained from a GPL or 6 | GPL-compatible source. 7 | Created by Igor Filippov, 2007-2010 (igorf@helix.nih.gov) 8 | 9 | This program is free software; you can redistribute it and/or modify it under 10 | the terms of the GNU General Public License as published by the Free Software 11 | Foundation; either version 2 of the License, or (at your option) any later 12 | version. 
13 | 14 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 15 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 16 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 17 | 18 | You should have received a copy of the GNU General Public License along with 19 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 20 | St, Fifth Floor, Boston, MA 02110-1301, USA 21 | *****************************************************************************/ 22 | 23 | #include <stddef.h> // NULL 24 | #include <stdlib.h> // malloc(), free() 25 | #include <string.h> // strlen() 26 | 27 | #include <iostream> // std::cout 28 | 29 | #include <tesseract/baseapi.h> 30 | 31 | using namespace std; 32 | 33 | const char* PICTURE[][50] = 34 | { 35 | { 36 | "##.........", 37 | "##.........", 38 | "...........", 39 | "...........", 40 | "..........#", 41 | "..........#", 42 | "........###", 43 | "......#####", 44 | "....##.####", 45 | "...########", 46 | "###########" 47 | }, 48 | { 49 | "##.........", 50 | "##.........", 51 | "...........", 52 | "...........", 53 | "..........#", 54 | "..........#", 55 | "........###", 56 | "......#####", 57 | "....##.####", 58 | "...########", 59 | "###########" 60 | }, 61 | { 62 | "##.........", 63 | ".#.........", 64 | "...........", 65 | "...........", 66 | "..........#", 67 | "..........#", 68 | "..........#", 69 | "......#####", 70 | ".....######", 71 | "....#######", 72 | "###########" 73 | }, 74 | { 75 | "...........", 76 | "...........", 77 | "...........", 78 | "...........", 79 | "..........#", 80 | "........#.#", 81 | "........###", 82 | "......#####", 83 | "..#########", 84 | ".##########", 85 | ".##########" 86 | } 87 | }; 88 | // Create a binary pixel image and recognize it using the given engine: 89 | void recognize(int n, tesseract::TessBaseAPI &t) 90 | { 91 | int height = 0; 92 | int width = strlen(PICTURE[n][0]); 93 | 94 | while (PICTURE[n][height] != NULL) 95 | height++; 96 | 97 | cout << "Picture " << n +
1 << ": width x height = " << width << "x" << height << endl; 98 | 99 | unsigned char *pixmap = (unsigned char *) malloc(width * height); 100 | 101 | for (int row = 0; row < height; row++) 102 | for (int col = 0; col < width; col++) 103 | pixmap[row * width + col] = PICTURE[n][row][col] == '#' ? 255 : 0; 104 | 105 | char* text = t.TesseractRect(pixmap, 1, width, 0, 0, width, height); 106 | 107 | cout << "Result: " << text << '.' << endl; 108 | 109 | delete [] text; // TesseractRect() returns memory that must be released with delete [] 110 | free(pixmap); 111 | } 112 | 113 | #ifdef TESS_GLOBAL_INSTANCE 114 | // Global variable: 115 | tesseract::TessBaseAPI tess; 116 | 117 | int main(int argc, char **argv) 118 | { 119 | tess.Init(NULL, "eng", NULL, 0, false); 120 | 121 | for (unsigned int n = 0; n < sizeof(PICTURE) / sizeof(PICTURE[0]); n++) 122 | { 123 | recognize(n, tess); 124 | 125 | tess.Clear(); 126 | } 127 | 128 | tess.End(); 129 | } 130 | #else 131 | int main(int argc, char **argv) 132 | { 133 | for (unsigned int n = 0; n < sizeof(PICTURE) / sizeof(PICTURE[0]); n++) 134 | { 135 | tesseract::TessBaseAPI tess; 136 | tess.Init(NULL, "eng", NULL, 0, false); 137 | 138 | recognize(n, tess); 139 | 140 | tess.End(); 141 | } 142 | } 143 | #endif 144 | -------------------------------------------------------------------------------- /dict/superatom.txt: -------------------------------------------------------------------------------- 1 | # Translations of superatom labels to SMILES. 2 | # First atom of SMILES string should be the one connected to the rest of 3 | # the molecule. 4 | # Empty lines and lines starting with # are ignored. 5 | # Also check spelling.txt to see that the superatom label 6 | # is correctly spelled. 
7 | 8 | Me C 9 | MeO OC 10 | MeS SC 11 | MeN NC 12 | CF CF 13 | CF3 C(F)(F)F 14 | CN C#N 15 | F3CN NC(F)(F)F 16 | Ph c1ccccc1 17 | NO N=O 18 | NO2 N(=O)=O 19 | N(OH)CH3 N(O)C 20 | SO3H S(=O)(=O)O 21 | COOH C(=O)O 22 | nBu CCCC 23 | EtO OCC 24 | OiBu OCC(C)C 25 | iPr C(C)C 26 | tBu C(C)(C)C 27 | Ac C(=O)C 28 | AcO OC(=O)C 29 | NHAc NC(=O)C 30 | OR O* 31 | #BzO OCc1ccccc1 32 | BzO OC(=O)C1=CC=CC=C1 33 | THPO O[C@@H]1OCCCC1 34 | 35 | CHO C=O 36 | NOH NO 37 | 38 | # Added release 1.3.0 39 | CO2Et C(=O)OCC 40 | CO2Me C(=O)OC 41 | MeO2S S(=O)(=O)C 42 | NMe2 N(C)C 43 | CO2R C(=O)O* 44 | ZNH NC(=O)OCC1=CC=CC=C1 45 | HOCH2 CO 46 | H2NCH2 CN 47 | Et CC 48 | BnO OCC1=CC=CC=C1 49 | AmNH NCCCCC 50 | AmO OCCCCC 51 | AmO2C C(=O)OCCCCC 52 | AmS SCCCCC 53 | BnNH NCC1=CC=CC=C1 54 | BnO2C C(=O)OCC1=CC=CC=C1 55 | Bu3Sn [Sn](CCCC)(CCCC)CCCC 56 | BuNH NCCCC 57 | BuO OCCCC 58 | BuO2C C(=O)OCCCC 59 | BuS SCCCC 60 | CBr3 C(Br)(Br)Br 61 | CbzNH NC(=O)OCC1=CC=CC=C1 62 | CCl3 C(Cl)(Cl)Cl 63 | ClSO2 S(=O)(=O)Cl 64 | COBr C(=O)Br 65 | COBu C(=O)CCCC 66 | COCF3 C(=O)C(F)(F)F 67 | COCl C(=O)Cl 68 | COCO C(=O)C=O 69 | COEt C(=O)CC 70 | COF C(=O)F 71 | COMe C(=O)C 72 | OCOMe OC(=O)C 73 | CONH2 C(=O)N 74 | CONHEt C(=O)NCC 75 | CONHMe C(=O)NC 76 | COSH C(=O)S 77 | Et2N N(CC)CC 78 | Et3N N(CC)(CC)CC 79 | EtNH NCC 80 | H2NSO2 S(=O)(N)=O 81 | HONH NO 82 | Me2N N(C)C 83 | NCO N=C=O 84 | NCS N=C=S 85 | NHAm NCCCCC 86 | NHBn NCC1=CC=CC=C1 87 | NHBu NCCCC 88 | NHEt NCC 89 | NHOH NO 90 | NHPr NCCC 91 | NO N=O 92 | POEt2 P(OCC)OCC 93 | POEt3 P(OCC)(OCC)OCC 94 | POOEt2 P(=O)(OCC)OCC 95 | PrNH NCCC 96 | SEt SCC 97 | 98 | BOC C(=O)OC(C)(C)C 99 | MsO OS(=O)(=O)C 100 | OTos OS(=O)(=O)c1ccc(C)cc1 101 | Tos S(=O)(=O)c1ccc(C)cc1 102 | C8H CCCCCCCC 103 | C6H CCCCCC 104 | CH2CH3 CC 105 | N(CH2CH3)2 N(CC)CC 106 | N(CH2CH2CH3)2 N(CCC)CCC 107 | C(CH3)3 C(C)(C)C 108 | COCH3 C(=O)C 109 | CH(CH3)2 C(C)C 110 | OCF3 OC(F)(F)F 111 | OCCl3 OC(Cl)(Cl)Cl 112 | OCF2H OC(F)F 113 | SO2Me S(=O)(=O)C 114 | OCH2CO2H OCC(=O)O 115 | 
OCH2CO2Et OCC(=O)OCC 116 | BOC2N N(C(=O)OC(C)(C)C)C(=O)OC(C)(C)C 117 | BOCHN NC(=O)OC(C)(C)C 118 | NHCbz NC(=O)OCc1ccccc1 119 | OCH2CF3 OCC(F)(F)F 120 | NHSO2BU NS(=O)(=O)CCCC 121 | NHSO2Me NS(=O)(=O)C 122 | MeO2SO OS(=O)(=O)C 123 | NHCOOEt NC(=O)OCC 124 | NHCH3 NC 125 | H4NOOC C(=O)ON 126 | C3H7 CCC 127 | C2H5 CC 128 | NHNH2 NN 129 | OCH2CH2OH OCCO 130 | OCH2CHOHCH2OH OCC(O)CO 131 | OCH2CHOHCH2NH OCC(O)CN 132 | NHNHCOCH3 NNC(=O)C 133 | NHNHCOCF3 NNC(=O)C(F)(F)F 134 | NHCOCF3 NC(=O)C(F)(F)F 135 | CO2CysPr C(=O)ON[C@H](CS)C(=O)CCC 136 | HOH2C CO 137 | H3CHN NC 138 | H3CO2C C(=O)OC 139 | CF3CH2 CC(F)(F)F 140 | OBOC OC(=O)OC(C)(C)C 141 | Bn2N N(Cc1ccccc1)Cc1ccccc1 142 | F5S S(F)(F)(F)(F)F 143 | PPh2 P(c1ccccc1)c1ccccc1 144 | PPh3 P(c1ccccc1)(c1ccccc1)c1ccccc1 145 | OCH2Ph OCc1ccccc1 146 | CH2OMe COC 147 | PMBN NCc1ccc(OC)cc1 148 | SO2 S(=O)=O 149 | NH3Cl NCl 150 | CF2CF3 C(F)(F)C(F)(F)F 151 | CF2CF2H C(F)(F)C(F)(F) 152 | Bn Cc1ccccc1 153 | OCH2Ph OCc1ccccc1 154 | COOCH2Ph C(=O)OCc1ccccc1 155 | Ph3CO OC(c1ccccc1)(c1ccccc1)c1ccccc1 156 | Ph3C C(c1ccccc1)(c1ccccc1)c1ccccc1 157 | Me2NO2S S(C)(C)N(=O)=O 158 | SO3Na S(=O)(=O)(=O)[Na] 159 | OSO2Ph OS(=O)(=O)c1ccccc1 160 | (CH2)5Br CCCCCBr 161 | OPh Oc1ccccc1 162 | SPh Sc1ccccc1 163 | NHPh Nc1ccccc1 164 | 165 | CONEt2 C(=O)N(CC)CC 166 | CONMe2 C(=O)N(C)C 167 | EtO2CHN NC(=O)OCC 168 | H4NO3S S(=O)(=O)ON 169 | TMS [Si](C)(C)(C) 170 | COCOOCH2CH3 C(=O)C(=O)OCC 171 | OCH2CN OCC#N 172 | 173 | 174 | Xx [*] 175 | X [*] 176 | Y [*] 177 | Z [*] 178 | R [*] 179 | R1 [*] 180 | R2 [*] 181 | R3 [*] 182 | R4 [*] 183 | R5 [*] 184 | R6 [*] 185 | R7 [*] 186 | R8 [*] 187 | R9 [*] 188 | R10 [*] 189 | Y2 [*] 190 | D [*] 191 | 192 | -------------------------------------------------------------------------------- /package/linux/plugins/bkchem/convert_clipboard_image.py: -------------------------------------------------------------------------------- 1 | """Authors: Noel O'Boyle and Igor V. Filippov 2 | Copied of......hmmm... 
Inspired by the "fetch from webbook" plugin :-) 3 | Converts an image from clipboard to a molecule using OSRA 4 | """ 5 | 6 | import os 7 | import popen2 8 | import oasa_bridge 9 | import dialogs 10 | import tempfile 11 | import Pmw 12 | import StringIO 13 | import os, sys 14 | 15 | 16 | def err_mess_box(mess, title="Error"): #Pops up error OK-box 17 | message = "" 18 | for m in mess: 19 | message=message+m+"\n" 20 | dialog = Pmw.Dialog(App.paper, buttons=('OK',), 21 | defaultbutton='OK', title=title) 22 | 23 | w = Pmw.LabeledWidget(dialog.interior(), labelpos='n', label_text=message) 24 | w.pack(expand=1, fill='both', padx=4, pady=4) 25 | dialog.activate() 26 | 27 | def run_osra(osra): 28 | sdf = " " 29 | filedes, filename = tempfile.mkstemp(suffix='.png') 30 | 31 | if os.name=="posix": 32 | import pygtk 33 | pygtk.require('2.0') 34 | import gtk, gobject 35 | clipboard = gtk.clipboard_get() 36 | image=clipboard.wait_for_image() 37 | if not image: 38 | return sdf 39 | try: 40 | image.save(filename,"png") 41 | except: 42 | return sdf 43 | else: 44 | import ImageGrab 45 | image = ImageGrab.grabclipboard() 46 | if not image: 47 | return sdf 48 | try: 49 | image.save(filename) 50 | except: 51 | return sdf 52 | 53 | try: 54 | stdout, stdin, stderr = popen2.popen3('"%s" -f sdf %s' % (osra, filename)) 55 | except: 56 | os.remove(filename) 57 | return sdf 58 | 59 | sdf = stdout.read() 60 | #os.remove(filename) 61 | return sdf 62 | 63 | 64 | def present_mol(sdf): 65 | if not sdf.rstrip().endswith("$$$$"): 66 | return 0 67 | try: 68 | mol = StringIO.StringIO(sdf) 69 | molec = oasa_bridge.read_molfile(mol, App.paper) 70 | mol.close() 71 | except: 72 | return 0 73 | if len(molec.atoms)<2: 74 | return 0 75 | averagey = sum([atom.y for atom in molec.atoms]) / float(len(molec.atoms)) 76 | for atom in molec.atoms: 77 | atom.y = 2*averagey - atom.y 78 | N = 0 79 | for minimol in molec.get_disconnected_subgraphs(): 80 | N += 1 81 | App.paper.stack.append(minimol) 82 | minimol.draw() 
83 | App.paper.add_bindings() 84 | App.paper.start_new_undo_record() 85 | return N 86 | 87 | 88 | 89 | osra = os.environ.get("OSRA", None) 90 | if osra and os.path.isfile(osra): 91 | if os.name=="posix": 92 | r, w = os.pipe() # these are file descriptors, not file objects 93 | pid = os.fork() 94 | if pid: 95 | # we are the parent 96 | os.close(w) # use os.close() to close a file descriptor 97 | r = os.fdopen(r) # turn r into a file object 98 | dialog = dialogs.progress_dialog(App, title="Progress") 99 | dialog.update(0.1, top_text = "Calling OSRA...", bottom_text = "Image processing in progress") 100 | sdf = r.read() 101 | dialog.update(0.9, top_text = "Adding molecules to workspace...", bottom_text = "Almost there!") 102 | N = present_mol(sdf) 103 | dialog.close() 104 | if N<1: 105 | err_mess_box(["Image could not be converted to a molecule."]) 106 | # else: 107 | # err_mess_box(["%d molecule%s added" % (N, ["s", ""][N==1])], "Info") 108 | os.waitpid(pid, 0) # make sure the child process gets cleaned up 109 | else: 110 | # we are the child 111 | os.close(r) 112 | w = os.fdopen(w, 'w') 113 | sdf = run_osra(osra) 114 | w.write(sdf) 115 | w.close() 116 | sys.exit(0) 117 | else: 118 | dialog = dialogs.progress_dialog(App, title="Progress") 119 | dialog.update(0.1, top_text = "Calling OSRA...", bottom_text = "Image processing in progress") 120 | sdf = run_osra(osra) 121 | dialog.update(0.9, top_text = "Adding molecules to workspace...", bottom_text = "Almost there!") 122 | N = present_mol(sdf) 123 | dialog.close() 124 | if N<1: 125 | err_mess_box(["Image could not be converted to a molecule."]) 126 | # else: 127 | # err_mess_box(["%d molecule%s added" % (N, ["s", ""][N==1])], "Info") 128 | 129 | else: 130 | err_mess_box(["You need to set the environment variable " \ 131 | "OSRA to point to the OSRA executable.\n" \ 132 | "When setting the variable, do not include quotation " \ 133 | "marks around the path."]) 134 | 135 | 136 | 
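The plugin above is written for Python 2 (`popen2`, `StringIO`, `pygtk`). As a rough illustration only, the OSRA call made in `run_osra` and the `$$$$` sanity check made in `present_mol` could be sketched for Python 3 as follows. The names `run_osra_py3` and `looks_like_sdf` are hypothetical and not part of the plugin; the `-f sdf` option is the one used in the call above.

```python
import os
import subprocess
import sys


def run_osra_py3(osra_cmd, image_path):
    """Run `<osra_cmd> -f sdf <image_path>` and return its stdout as text.

    osra_cmd is a list such as ["/usr/bin/osra"] (hypothetical path).
    Returns "" if the process cannot be started, in the same spirit as
    run_osra above, which returns a blank string on failure.
    """
    try:
        result = subprocess.run(
            list(osra_cmd) + ["-f", "sdf", image_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
    except OSError:
        return ""
    return result.stdout


def looks_like_sdf(sdf):
    """Same check as present_mol: a complete SD record ends with '$$$$'."""
    return sdf.rstrip().endswith("$$$$")


if __name__ == "__main__":
    # Usage: OSRA=/path/to/osra python this_file.py image.png
    osra = os.environ.get("OSRA")
    if osra and len(sys.argv) > 1:
        sdf = run_osra_py3([osra], sys.argv[1])
        print(sdf if looks_like_sdf(sdf) else "no structure recognised")
```

Passing the command as a list avoids the shell quoting that the `'"%s" -f sdf %s'` format string has to do by hand, which is one reason this sketch prefers `subprocess` over building a command string.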
-------------------------------------------------------------------------------- /src/osra_ocr.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_ocr.h 21 | // 22 | // Defines types and functions for OSRA OCR module. 23 | // 24 | 25 | #include <string> // std::string 26 | #include <map> // std::map 27 | 28 | #include <Magick++.h> // Magick::Image, Magick::ColorGray 29 | 30 | // 31 | // Section: Functions 32 | // 33 | 34 | // Function: osra_ocr_init() 35 | // 36 | // Initialises OCR engine. Should be called at e.g. program startup. 37 | // 38 | void osra_ocr_init(); 39 | 40 | // Function: osra_ocr_destroy() 41 | // 42 | // Releases all resources allocated by OCR engine. 
43 | // 44 | void osra_ocr_destroy(); 45 | 46 | // Function: get_atom_label() 47 | // 48 | // OCR engine function, does single character recognition 49 | // 50 | // Parameters: 51 | // image - image object 52 | // bg - gray-level background color 53 | // x1, y1, x2, y2 - coordinates of the character box 54 | // THRESHOLD - graylevel threshold for image binarization 55 | // dropx, dropy - coordinates of drop point from where breadth-first algorithm will search for single connected component 56 | // which is hopefully the character we are trying to recognize 57 | // no_filtering - do not apply character filter 58 | // numbers - only allow numbers in the output 0..9 59 | // 60 | // Returns: 61 | // recognized character or 0 62 | char get_atom_label(const Magick::Image &image, const Magick::ColorGray &bg, int x1, int y1, int x2, int y2, 63 | double THRESHOLD, int dropx, int dropy, bool no_filtering, bool verbose, bool numbers = false); 64 | 65 | 66 | // Function: get_atom_label_rgroup() 67 | // 68 | // OCR engine function, does single character recognition 69 | // 70 | // Parameters: 71 | // image - image object 72 | // bg - gray-level background color 73 | // x1, y1, x2, y2 - coordinates of the character box 74 | // THRESHOLD - graylevel threshold for image binarization 75 | // dropx, dropy - coordinates of drop point from where breadth-first algorithm will search for single connected component 76 | // which is hopefully the character we are trying to recognize 77 | // no_filtering - do not apply character filter 78 | // numbers - only allow numbers in the output 0..9 79 | // 80 | // Returns: 81 | // recognized character or 0 82 | char get_atom_label_rgroup(const Magick::Image &image, const Magick::ColorGray &bg, int x1, int y1, int x2, int y2, 83 | double THRESHOLD, int dropx, int dropy, bool no_filtering, bool verbose, std::string rgroup, bool numbers = false); 84 | 85 | // Function: fix_atom_name() 86 | // 87 | // Corrects common OCR errors by using spelling 
dictionary 88 | // 89 | // Parameters: 90 | // s - Original atomic label as returned by OCR engine. 91 | // n - The number of bonds attached to the atom. 92 | // fix - spelling dictionary 93 | // superatom - dictionary of superatom labels mapped to SMILES 94 | // debug - enables output of debugging information to stdout 95 | // 96 | // Returns: 97 | // corrected atomic label 98 | const std::string fix_atom_name(const std::string &s, int n, const std::map<std::string, std::string> &fix, 99 | const std::map<std::string, std::string> &superatom, bool debug); 100 | 101 | bool detect_bracket(int x, int y, unsigned char *pic); 102 | -------------------------------------------------------------------------------- /test/bugs/ocrad_api_regression_test/osra_ocr.cpp: -------------------------------------------------------------------------------- 1 | 2 | #include <iostream> // std::cout 3 | 4 | #include <cstdlib> // malloc(), free() 5 | #include <cstring> // strlen(), memset() 6 | 7 | #include "ocradlib.h" 8 | 9 | // Required by OCRAD, not used here: 10 | #include <cstdio> 11 | #include <vector> 12 | 13 | #include "common.h" 14 | #include "rectangle.h" 15 | #include "bitmap.h" 16 | #include "blob.h" 17 | #include "character.h" 18 | 19 | using namespace std; 20 | 21 | /* Actual max height is 9, but we leave some more (12) for extensions; unused rows stay NULL: */ 22 | const char* TESTS[][12] = 23 | { 24 | /* Test1: "N" is not detected */ 25 | { 26 | "11100001", 27 | "11110001", 28 | "11110001", 29 | "11011001", 30 | "11011101", 31 | "11001111", 32 | "11000111", 33 | "11000111", 34 | "11000011", 35 | }, 36 | /* Test2: "N" is not detected */ 37 | { 38 | "11000011", 39 | "11100011", 40 | "11110011", 41 | "11110011", 42 | "10011011", 43 | "10011111", 44 | "10001111", 45 | "10000111", 46 | "10000111", 47 | }, 48 | /* Test3: Detected as "r": */ 49 | { 50 | "000000010", 51 | "000000111", 52 | "000001110", 53 | "000011100", 54 | "000111000", 55 | "001111000", 56 | "011110000", 57 | "111100000", 58 | "011000000", 59 | }, 60 | /* Test4: Detected as "r": */ 61 | { 62 | "00000011111", 63 | "00000111100", 64 | "00011110000", 65 | "01111100000", 66 | "11110000000", 67 | 
"11000000000", 68 | }, 69 | /* Test5: Detected as "r": */ 70 | { 71 | "00111", 72 | "01110", 73 | "01110", 74 | "01110", 75 | "11100", 76 | "11100", 77 | "11100", 78 | "11000", 79 | "11000", 80 | }, 81 | /* Test6: Detected as "t": */ 82 | { 83 | "00111", 84 | "00111", 85 | "00110", 86 | "01110", 87 | "01110", 88 | "01110", 89 | "11100", 90 | "11100", 91 | "11100" 92 | } 93 | }; 94 | 95 | char run_test(int n) 96 | { 97 | int height = 0; 98 | int width = strlen(TESTS[n][0]); 99 | 100 | while (TESTS[n][height] != NULL) 101 | { 102 | height++; 103 | } 104 | 105 | const char** image = TESTS[n]; 106 | 107 | cout << "Test " << n + 1 << ": width x height = " << width << "x" << height << endl; 108 | 109 | // Blob for recognition attempt via Character::recognize1(): 110 | Blob* b = new Blob(0, 0, width-1, height-1); 111 | 112 | // OCRAD_Pixmap for recognition attempt via OCRAD_result_first_character(): 113 | struct OCRAD_Pixmap* opix = new OCRAD_Pixmap(); 114 | 115 | unsigned char* bitmap_data = (unsigned char*) malloc(width * height); 116 | unsigned char* greymap_data = (unsigned char*) malloc(width * height); 117 | 118 | memset(bitmap_data, 0, width * height); 119 | memset(greymap_data, 255, width * height); 120 | 121 | opix->height = height; 122 | opix->width = width; 123 | 124 | opix->mode = OCRAD_bitmap; 125 | opix->data = bitmap_data; 126 | // opix->mode = OCRAD_greymap; opix->data = greymap_data; 127 | 128 | for (int row = 0; row < height; row++) 129 | { 130 | for (int col = 0; col < width; col++) 131 | if (image[row][col] == '1') 132 | { 133 | b->set_bit(row, col, true); 134 | bitmap_data[row * width + col] = 1; 135 | greymap_data[row * width + col] = 0; 136 | } 137 | } 138 | 139 | b->find_holes(); 140 | 141 | Control control; 142 | Character a(b); 143 | 144 | // The Blob object was delegated to Character, which will free it on destruction: 145 | b = NULL; 146 | 147 | a.recognize1(control.charset, Rectangle::Rectangle(a.left(), a.top(), a.right(), a.bottom())); 148 | 
char c1 = a.byte_result(); 149 | 150 | // Was the character recognised by OCRAD? 151 | cout << "+ recognised via Character::recognize1(): " << c1 << endl; 152 | 153 | char c2 = 0; 154 | OCRAD_Descriptor * const ocrdes = OCRAD_open(); 155 | 156 | if (ocrdes && OCRAD_get_errno(ocrdes) == OCRAD_ok && 157 | OCRAD_set_image(ocrdes, opix, 0) == 0 && 158 | ( height >= 10 || OCRAD_scale( ocrdes, 2 ) == 0 ) && 159 | OCRAD_recognize(ocrdes, 0) == 0 ) 160 | c2 = OCRAD_result_first_character(ocrdes); 161 | 162 | OCRAD_close(ocrdes); 163 | 164 | delete opix; 165 | free(bitmap_data); 166 | free(greymap_data); 167 | 168 | cout << "+ recognised via OCRAD_result_first_character(): " << c2 << endl; 169 | return c2; // the function is declared to return char; report the library API result 170 | } 171 | 172 | int main() 173 | { 174 | for (unsigned int n = 0; n < 6; n++) 175 | { 176 | run_test(n); 177 | } 178 | } 179 | -------------------------------------------------------------------------------- /package/linux/suse/osra.spec: -------------------------------------------------------------------------------- 1 | # 2 | # spec file for package OSRA 3 | # 4 | # Copyright (c) 2009 SUSE LINUX Products GmbH, Nuernberg, Germany. 5 | # 6 | # All modifications and additions to the file contributed by third parties 7 | # remain the property of their copyright owners, unless otherwise agreed 8 | # upon. The license for this file, and modifications and additions to the 9 | # file, is the same license as for the pristine package itself (unless the 10 | # license for the pristine package is not an Open Source License, in which 11 | # case the license is the MIT License). An "Open Source License" is a 12 | # license that conforms to the Open Source Definition (Version 1.9) 13 | # published by the Open Source Initiative. 
14 | 15 | # Please submit bugfixes or comments via http://bugs.opensuse.org/ 16 | # 17 | 18 | # norootforbuild 19 | 20 | %define name osra 21 | %define version 2.1.0 22 | 23 | %define builddep glibc-devel, libstdc++45-devel, tclap >= 1.2, potrace-devel >= 1.8, gocr-devel >= 0.49, ocrad-devel >= 0.20, libopenbabel-devel >= 2.3, libGraphicsMagick++-devel >= 1.3.10, cuneiform-devel >= 1.1.0, tesseract-devel >= 3.01, docbook-xsl-stylesheets >= 1.74.0, libxslt 24 | %define binarydep potrace-lib >= 1.8, libopenbabel3 >= 2.3, libGraphicsMagick++3 >= 1.3.10, cuneiform >= 1.1.0, tesseract >= 3.01, %{name}-common = %{version} 25 | 26 | Name: %{name} 27 | BuildRequires: %{builddep} 28 | Url: http://osra.sourceforge.net/ 29 | Summary: A command line chemical structure recognition tool 30 | Version: %{version} 31 | Release: 1.0 32 | Group: Productivity/Graphics/Other 33 | Requires: %{binarydep} 34 | License: GPL v2 or later 35 | Source0: %{name}-%{version}.tar.bz2 36 | #Patch0: Makefile.in.patch 37 | BuildRoot: %{_tmppath}/%{name}-%{version}-build 38 | BuildArch: x86_64 39 | 40 | %description 41 | OSRA is a utility designed to convert graphical representations of chemical structures into SMILES or SDF. 42 | OSRA can read a document in any of the over 90 graphical formats parseable by GraphicsMagick and generate 43 | the SMILES or SDF representation of the molecular structure images encountered within that document. 44 | 45 | Authors: 46 | -------- 47 | Igor Filippov 48 | 49 | %package common 50 | Summary: OSRA shared files 51 | Group: Productivity/Graphics/Other 52 | BuildArch: noarch 53 | 54 | %description common 55 | This package contains the shared files for OSRA executable / library. 56 | 57 | %package lib2 58 | Summary: OSRA C++ library 59 | Group: Development/Libraries/C and C++ 60 | Requires: %{binarydep} 61 | 62 | %description lib2 63 | This package contains the dynamic library needed to consume OSRA functionality 64 | from C++ programs. 
65 | 66 | %package lib-java2 67 | Summary: OSRA Java library 68 | Group: Development/Libraries/C and C++ 69 | Requires: %{binarydep} 70 | 71 | %description lib-java2 72 | This package contains the dynamic library needed to consume OSRA functionality 73 | from Java programs. 74 | 75 | %package devel 76 | Summary: OSRA static library and header files mandatory for development 77 | Group: Development/Libraries/C and C++ 78 | Requires: %{name}-lib2 = %{version} 79 | 80 | %description devel 81 | This package contains all necessary include files and libraries needed 82 | to develop applications on the top of OSRA. 83 | 84 | %prep 85 | %setup -n %{name}-%{version} 86 | #%patch0 -p0 87 | 88 | %build 89 | # See http://stackoverflow.com/questions/3113472/how-to-make-an-rpm-spec-that-installs-libraries-to-usr-lib-xor-usr-lib64-based 90 | # See http://www.rpm.org/api/4.4.2.2/config_macros.html 91 | %configure --enable-docs --enable-lib --enable-java --with-tesseract --with-cuneiform --datadir=%{_datadir}/%{name} --docdir=%{_datadir}/doc/packages/%{name} 92 | %__make 93 | 94 | %install 95 | # See http://fedoraproject.org/wiki/PackagingGuidelines#Why_the_.25makeinstall_macro_should_not_be_used 96 | %__make install DESTDIR=%{buildroot} 97 | 98 | %clean 99 | %__rm -rf $RPM_BUILD_ROOT 100 | 101 | %define _sharedir %{_prefix}/share 102 | 103 | %files 104 | %defattr(-, root, root) 105 | %{_prefix}/bin/%{name} 106 | %{_mandir}/man?/%{name}.* 107 | 108 | %files common 109 | %{_sharedir}/%{name} 110 | %{_sharedir}/doc 111 | 112 | %files lib2 113 | %defattr(-,root,root) 114 | %{_libdir}/lib%{name}.so* 115 | 116 | %files lib-java2 117 | %defattr(-,root,root) 118 | %{_libdir}/lib%{name}_java.so* 119 | 120 | %files devel 121 | %defattr(-,root,root) 122 | %{_libdir}/lib%{name}.a 123 | %{_libdir}/pkgconfig 124 | %{_includedir} 125 | 126 | # spec file ends here 127 | 128 | %changelog 129 | * Thu Jul 01 2011 dma_k@mail.ru 130 | - Initial SuSE package 131 | 
-------------------------------------------------------------------------------- /src/Makefile: -------------------------------------------------------------------------------- 1 | # 2 | # This makefile is responsible for building the executable. 3 | # 4 | 5 | include ../Makefile.inc 6 | include Makefile.dep 7 | 8 | .PHONY: clean_obj 9 | 10 | LIB_VERSION := $(LIB_MAJOR_VERSION).$(LIB_MINOR_VERSION).$(LIB_PATCH_VERSION) 11 | 12 | TARGETS := osra$(EXEEXT) 13 | 14 | OBJ_LIB := osra_lib.o osra_grayscale.o osra_fragments.o osra_segment.o osra_labels.o osra_thin.o osra_common.o osra_stl.o osra_structure.o osra_anisotropic.o osra_ocr.o osra_openbabel.o mcdlutil.o unpaper.o osra_reaction.o 15 | 16 | ifdef TESSERACT_LIB 17 | OBJ_LIB += osra_ocr_tesseract.o 18 | endif 19 | 20 | OBJ_CLI := $(OBJ_LIB) osra.o 21 | 22 | ifdef OSRA_LIB 23 | TARGETS += libosra.a libosra$(SHAREDEXT) 24 | endif 25 | 26 | ifdef OSRA_JAVA 27 | OBJ_JAVA := $(OBJ_LIB) osra_java.o 28 | TARGETS += libosra_java$(SHAREDEXT) 29 | endif 30 | 31 | ifdef OSRA_ANDROID 32 | OBJ_ANDROID := $(OBJ_LIB) osra.o 33 | endif 34 | 35 | all: 36 | # From here: http://stackoverflow.com/questions/5584872/complex-conditions-check-in-makefile/5586785#5586785 37 | ifneq ($(or $(OSRA_LIB),$(OSRA_JAVA),$(OSRA_ANDROID)),) 38 | $(RM) -f $(OBJ_CLI) 39 | endif 40 | $(MAKE) osra$(EXEEXT) 41 | ifdef OSRA_LIB 42 | $(RM) -f $(OBJ_LIB) 43 | $(MAKE) libosra.a 44 | $(MAKE) libosra$(SHAREDEXT) 45 | endif 46 | ifdef OSRA_JAVA 47 | $(RM) -f $(OBJ_JAVA) 48 | $(MAKE) libosra_java$(SHAREDEXT) 49 | endif 50 | ifdef OSRA_ANDROID 51 | $(RM) -f $(OBJ_ANDROID) 52 | $(MAKE) libosra_andriod$(SHAREDEXT) 53 | endif 54 | # We have to update the timestamps of the targets, otherwise the "install" target will try to re-link and cause missing symbols: 55 | touch $(TARGETS) 56 | 57 | osra$(EXEEXT): $(OBJ_CLI) 58 | $(LINK.cpp) -o $@ $(OBJ_CLI) $(LIBS) 59 | 60 | ifdef OSRA_LIB 61 | libosra.a: CXXFLAGS += -fPIC -DOSRA_LIB 62 | libosra.a: $(OBJ_LIB) 63 | $(AR) cru $@ 
$(OBJ_LIB) 64 | $(RANLIB) $@ 65 | 66 | libosra$(SHAREDEXT): CXXFLAGS += -fPIC -DOSRA_LIB 67 | libosra$(SHAREDEXT): $(OBJ_LIB) 68 | $(LINK.cpp) $(LDSHAREDFLAGS) -o $@ $(OBJ_LIB) $(LIBS) 69 | endif 70 | 71 | ifdef OSRA_JAVA 72 | libosra_java$(SHAREDEXT): CXXFLAGS += -fPIC -DOSRA_LIB -DOSRA_JAVA 73 | libosra_java$(SHAREDEXT): $(OBJ_JAVA) 74 | $(LINK.cpp) $(LDSHAREDFLAGS) -o $@ $(OBJ_JAVA) $(LIBS) 75 | endif 76 | 77 | ifdef OSRA_ANDROID 78 | libosra_andriod$(SHAREDEXT): CXXFLAGS += -fPIC -DOSRA_LIB -DOSRA_ANDROID 79 | libosra_andriod$(SHAREDEXT): $(OBJ_ANDROID) 80 | $(LINK.cpp) $(LDSHAREDFLAGS) -o $@ $(OBJ_ANDROID) $(LIBS) 81 | endif 82 | 83 | Makefile.dep: ../Makefile.inc $(wildcard *.cpp) 84 | $(CXX) $(CPPFLAGS) -MM $^ > Makefile.dep 85 | 86 | # Correct installation for Cygwin/MinGW also needs handling of libosra.dll.a, which is not done here: 87 | install: $(TARGETS) 88 | $(INSTALL_DIR) $(DESTDIR)$(bindir) 89 | $(INSTALL_PROGRAM) osra$(EXEEXT) $(DESTDIR)$(bindir) 90 | ifdef OSRA_LIB 91 | $(INSTALL_DIR) $(DESTDIR)$(libdir) $(DESTDIR)$(includedir) $(DESTDIR)$(libdir)/pkgconfig 92 | $(INSTALL_PROGRAM) libosra$(SHAREDEXT) $(DESTDIR)$(libdir)/libosra$(SHAREDEXT).$(LIB_VERSION) 93 | $(LN_S) -f libosra$(SHAREDEXT).$(LIB_VERSION) $(DESTDIR)$(libdir)/libosra$(SHAREDEXT).$(LIB_MAJOR_VERSION) 94 | $(LN_S) -f libosra$(SHAREDEXT).$(LIB_MAJOR_VERSION) $(DESTDIR)$(libdir)/libosra$(SHAREDEXT) 95 | $(INSTALL_DATA) libosra.a $(DESTDIR)$(libdir) 96 | $(INSTALL_DATA) osra_lib.h $(DESTDIR)$(includedir) 97 | $(INSTALL_DATA) ../package/linux/osra.pc $(DESTDIR)$(libdir)/pkgconfig 98 | endif 99 | ifdef OSRA_JAVA 100 | $(INSTALL_DIR) $(DESTDIR)$(libdir) 101 | $(INSTALL_PROGRAM) libosra_java$(SHAREDEXT) $(DESTDIR)$(libdir)/libosra_java$(SHAREDEXT).$(LIB_VERSION) 102 | $(LN_S) -f libosra_java$(SHAREDEXT).$(LIB_VERSION) $(DESTDIR)$(libdir)/libosra_java$(SHAREDEXT).$(LIB_MAJOR_VERSION) 103 | $(LN_S) -f libosra_java$(SHAREDEXT).$(LIB_MAJOR_VERSION) $(DESTDIR)$(libdir)/libosra_java$(SHAREDEXT) 104 
| endif 105 | 106 | # "install" and "make" tools automatically detect the extension for executables, but "rm" needs extension correction. 107 | uninstall: 108 | -$(RM) -f \ 109 | $(DESTDIR)$(bindir)/osra$(EXEEXT) \ 110 | $(DESTDIR)$(libdir)/libosra$(SHAREDEXT).$(LIB_VERSION) \ 111 | $(DESTDIR)$(libdir)/libosra$(SHAREDEXT).$(LIB_MAJOR_VERSION) \ 112 | $(DESTDIR)$(libdir)/libosra$(SHAREDEXT) \ 113 | $(DESTDIR)$(libdir)/libosra.a \ 114 | $(DESTDIR)$(libdir)/libosra_java$(SHAREDEXT).$(LIB_VERSION) \ 115 | $(DESTDIR)$(libdir)/libosra_java$(SHAREDEXT).$(LIB_MAJOR_VERSION) \ 116 | $(DESTDIR)$(libdir)/libosra_java$(SHAREDEXT) \ 117 | $(DESTDIR)$(includedir)/osra_lib.h \ 118 | $(DESTDIR)$(libdir)/pkgconfig/osra.pc 119 | 120 | clean: 121 | -$(RM) -f *.o osra$(EXEEXT) libosra*.* 122 | 123 | distclean: clean 124 | -$(RM) -f config.h Makefile.dep 125 | 126 | ../Makefile.inc: ../Makefile.inc.in ../config.status 127 | cd .. && ./config.status 128 | -------------------------------------------------------------------------------- /package/linux/suse/osra.spec.in: -------------------------------------------------------------------------------- 1 | # 2 | # spec file for package OSRA 3 | # 4 | # Copyright (c) 2009 SUSE LINUX Products GmbH, Nuernberg, Germany. 5 | # 6 | # All modifications and additions to the file contributed by third parties 7 | # remain the property of their copyright owners, unless otherwise agreed 8 | # upon. The license for this file, and modifications and additions to the 9 | # file, is the same license as for the pristine package itself (unless the 10 | # license for the pristine package is not an Open Source License, in which 11 | # case the license is the MIT License). An "Open Source License" is a 12 | # license that conforms to the Open Source Definition (Version 1.9) 13 | # published by the Open Source Initiative. 
14 | 15 | # Please submit bugfixes or comments via http://bugs.opensuse.org/ 16 | # 17 | 18 | # norootforbuild 19 | 20 | %define name @PACKAGE_NAME@ 21 | %define version @PACKAGE_VERSION@ 22 | 23 | %define builddep glibc-devel, libstdc++45-devel, tclap >= 1.2, potrace-devel >= 1.8, gocr-devel >= 0.49, ocrad-devel >= 0.20, libopenbabel-devel >= 2.3, libGraphicsMagick++-devel >= 1.3.10, cuneiform-devel >= 1.1.0, tesseract-devel >= 3.01, docbook-xsl-stylesheets >= 1.74.0, libxslt 24 | %define binarydep potrace-lib >= 1.8, libopenbabel3 >= 2.3, libGraphicsMagick++3 >= 1.3.10, cuneiform >= 1.1.0, tesseract >= 3.01, %{name}-common = %{version} 25 | 26 | Name: %{name} 27 | BuildRequires: %{builddep} 28 | Url: http://osra.sourceforge.net/ 29 | Summary: A command line chemical structure recognition tool 30 | Version: %{version} 31 | Release: 1.0 32 | Group: Productivity/Graphics/Other 33 | Requires: %{binarydep} 34 | License: GPL v2 or later 35 | Source0: %{name}-%{version}.tar.bz2 36 | #Patch0: Makefile.in.patch 37 | BuildRoot: %{_tmppath}/%{name}-%{version}-build 38 | BuildArch: @build_cpu@ 39 | 40 | %description 41 | OSRA is a utility designed to convert graphical representations of chemical structures into SMILES or SDF. 42 | OSRA can read a document in any of the over 90 graphical formats parseable by GraphicsMagick and generate 43 | the SMILES or SDF representation of the molecular structure images encountered within that document. 44 | 45 | Authors: 46 | -------- 47 | Igor Filippov 48 | 49 | %package common 50 | Summary: OSRA shared files 51 | Group: Productivity/Graphics/Other 52 | BuildArch: noarch 53 | 54 | %description common 55 | This package contains the shared files for OSRA executable / library. 
56 | 57 | %package lib@LIB_MAJOR_VERSION@ 58 | Summary: OSRA C++ library 59 | Group: Development/Libraries/C and C++ 60 | Requires: %{binarydep} 61 | 62 | %description lib@LIB_MAJOR_VERSION@ 63 | This package contains the dynamic library needed to consume OSRA functionality 64 | from C++ programs. 65 | 66 | %package lib-java@LIB_MAJOR_VERSION@ 67 | Summary: OSRA Java library 68 | Group: Development/Libraries/C and C++ 69 | Requires: %{binarydep} 70 | 71 | %description lib-java@LIB_MAJOR_VERSION@ 72 | This package contains the dynamic library needed to consume OSRA functionality 73 | from Java programs. 74 | 75 | %package devel 76 | Summary: OSRA static library and header files mandatory for development 77 | Group: Development/Libraries/C and C++ 78 | Requires: %{name}-lib@LIB_MAJOR_VERSION@ = %{version} 79 | 80 | %description devel 81 | This package contains all necessary include files and libraries needed 82 | to develop applications on the top of OSRA. 83 | 84 | %prep 85 | %setup -n %{name}-%{version} 86 | #%patch0 -p0 87 | 88 | %build 89 | # See http://stackoverflow.com/questions/3113472/how-to-make-an-rpm-spec-that-installs-libraries-to-usr-lib-xor-usr-lib64-based 90 | # See http://www.rpm.org/api/4.4.2.2/config_macros.html 91 | %configure --enable-docs --enable-lib --enable-java --with-tesseract --with-cuneiform --datadir=%{_datadir}/%{name} --docdir=%{_datadir}/doc/packages/%{name} 92 | %__make 93 | 94 | %install 95 | # See http://fedoraproject.org/wiki/PackagingGuidelines#Why_the_.25makeinstall_macro_should_not_be_used 96 | %__make install DESTDIR=%{buildroot} 97 | 98 | %clean 99 | %__rm -rf $RPM_BUILD_ROOT 100 | 101 | %define _sharedir %{_prefix}/share 102 | 103 | %files 104 | %defattr(-, root, root) 105 | %{_prefix}/bin/%{name} 106 | %{_mandir}/man?/%{name}.* 107 | 108 | %files common 109 | %{_sharedir}/%{name} 110 | %{_sharedir}/doc 111 | 112 | %files lib@LIB_MAJOR_VERSION@ 113 | %defattr(-,root,root) 114 | %{_libdir}/lib%{name}.so* 115 | 116 | %files 
lib-java@LIB_MAJOR_VERSION@ 117 | %defattr(-,root,root) 118 | %{_libdir}/lib%{name}_java.so* 119 | 120 | %files devel 121 | %defattr(-,root,root) 122 | %{_libdir}/lib%{name}.a 123 | %{_libdir}/pkgconfig 124 | %{_includedir} 125 | 126 | # spec file ends here 127 | 128 | %changelog 129 | * Thu Jul 01 2011 dma_k@mail.ru 130 | - Initial SuSE package 131 | -------------------------------------------------------------------------------- /src/osra_java.cpp: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 
14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | #include "config.h" // PACKAGE_VERSION 20 | 21 | #ifdef OSRA_JAVA 22 | /* Fix for jlong definition in jni.h on some versions of gcc on Windows */ 23 | #if defined(__GNUC__) && !defined(__INTEL_COMPILER) 24 | typedef long long __int64; 25 | #endif 26 | 27 | #include <jni.h> 28 | 29 | #include <stdlib.h> // calloc(), free() 30 | 31 | #include <string> // std::string 32 | #include <ostream> // std::ostream 33 | #include <sstream> // std::ostringstream 34 | 35 | 36 | 37 | 38 | #include "osra_lib.h" 39 | 40 | extern "C" { 41 | /* 42 | * Class: net_sf_osra_OsraLib 43 | * Method: processImage 44 | * Signature: ([BLjava/io/Writer;IZIDIZZLjava/lang/String;Ljava/lang/String;ZZZZZ)I 45 | */ 46 | JNIEXPORT jint JNICALL Java_net_sf_osra_OsraLib_processImage(JNIEnv *, jclass, jbyteArray, jobject, jint, jboolean,jint,jdouble,jint, jboolean, jboolean,jstring, jstring, jboolean, jboolean,jboolean, jboolean, jboolean); 47 | 48 | /* 49 | * Class: net_sf_osra_OsraLib 50 | * Method: getVersion 51 | * Signature: ()Ljava/lang/String; 52 | */ 53 | JNIEXPORT jstring JNICALL Java_net_sf_osra_OsraLib_getVersion(JNIEnv *, jclass); 54 | } 55 | 56 | JNIEXPORT jint JNICALL Java_net_sf_osra_OsraLib_processImage(JNIEnv *j_env, jclass j_class, 57 | jbyteArray j_image_data, 58 | jobject j_writer, 59 | jint j_rotate, 60 | jboolean j_invert, 61 | jint j_input_resolution, 62 | jdouble j_threshold, 63 | jint j_do_unpaper, 64 | jboolean j_jaggy, 65 | jboolean j_adaptive_option, 66 | jstring j_output_format, 67 | jstring j_embedded_format, 68 | jboolean j_output_confidence, 69 | jboolean j_show_resolution_guess, 70 | jboolean j_show_page, 71 | jboolean j_output_coordinates, 72 | jboolean j_output_avg_bond_length) 73 | { 74 | const char *output_format = 
j_env->GetStringUTFChars(j_output_format, NULL); 75 | const char *embedded_format = j_env->GetStringUTFChars(j_embedded_format, NULL); 76 | const char *image_data = (char *) j_env->GetByteArrayElements(j_image_data, NULL); 77 | 78 | int result = -1; 79 | 80 | if (image_data != NULL) 81 | { 82 | // Perhaps there is a more optimal way to bridge from std:ostream to java.io.Writer. 83 | // See http://stackoverflow.com/questions/524524/creating-an-ostream/524590#524590 84 | std::ostringstream structure_output_stream; 85 | 86 | result = osra_process_image( 87 | image_data, 88 | j_env->GetArrayLength(j_image_data), 89 | structure_output_stream, 90 | j_rotate, 91 | j_invert, 92 | j_input_resolution, 93 | j_threshold, 94 | j_do_unpaper, 95 | j_jaggy, 96 | j_adaptive_option, 97 | output_format, 98 | embedded_format, 99 | j_output_confidence, 100 | j_show_resolution_guess, 101 | j_show_page, 102 | j_output_coordinates, 103 | j_output_avg_bond_length, 104 | "." 105 | ); 106 | 107 | j_env->ReleaseByteArrayElements(j_image_data, (jbyte *) image_data, JNI_ABORT); 108 | 109 | // Locate java.io.Writer#write(String) method: 110 | jclass j_writer_class = j_env->FindClass("java/io/Writer"); 111 | jmethodID write_method_id = j_env->GetMethodID(j_writer_class, "write", "(Ljava/lang/String;)V"); 112 | 113 | jstring j_string = j_env->NewStringUTF(structure_output_stream.str().c_str()); 114 | 115 | j_env->CallVoidMethod(j_writer, write_method_id, j_string); 116 | 117 | j_env->DeleteLocalRef(j_writer_class); 118 | j_env->DeleteLocalRef(j_string); 119 | } 120 | 121 | j_env->ReleaseStringUTFChars(j_output_format, output_format); 122 | j_env->ReleaseStringUTFChars(j_embedded_format, embedded_format); 123 | 124 | return result; 125 | } 126 | 127 | JNIEXPORT jstring JNICALL Java_net_sf_osra_OsraLib_getVersion(JNIEnv *j_env, jclass j_class) 128 | { 129 | return j_env->NewStringUTF(PACKAGE_VERSION); 130 | } 131 | #endif 132 | 
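JNI identifies a native method by a descriptor string built from the Java parameter and return types, which is what the `Signature:` comments in this file record. As a rough cross-check, such a descriptor can be derived mechanically. The sketch below assumes that the Java-side declaration of `processImage` mirrors the native parameter list shown above (`byte[]`, `java.io.Writer`, `int`, `boolean`, ...); that is an assumption, since `OsraLib.java` is not reproduced here. `jni_descriptor` and `PROCESS_IMAGE_PARAMS` are hypothetical names introduced only for this illustration.

```python
# Primitive Java types and their JNI descriptor characters.
_PRIMITIVES = {
    "boolean": "Z", "byte": "B", "char": "C", "short": "S",
    "int": "I", "long": "J", "float": "F", "double": "D", "void": "V",
}


def _type_descriptor(java_type):
    """Return the JNI descriptor of one Java type, e.g. 'byte[]' -> '[B'."""
    if java_type.endswith("[]"):
        return "[" + _type_descriptor(java_type[:-2])
    if java_type in _PRIMITIVES:
        return _PRIMITIVES[java_type]
    # Reference types use the fully qualified name with '/' separators.
    return "L" + java_type.replace(".", "/") + ";"


def jni_descriptor(param_types, return_type):
    """Build a method descriptor such as '(I)V' from Java type names."""
    params = "".join(_type_descriptor(t) for t in param_types)
    return "(" + params + ")" + _type_descriptor(return_type)


# Assumed Java parameter list, read off the native definition above.
PROCESS_IMAGE_PARAMS = [
    "byte[]", "java.io.Writer", "int", "boolean", "int", "double", "int",
    "boolean", "boolean", "java.lang.String", "java.lang.String",
    "boolean", "boolean", "boolean", "boolean", "boolean",
]

print(jni_descriptor(PROCESS_IMAGE_PARAMS, "int"))
# ([BLjava/io/Writer;IZIDIZZLjava/lang/String;Ljava/lang/String;ZZZZZ)I
print(jni_descriptor([], "java.lang.String"))
# ()Ljava/lang/String;
```

The second call reproduces the `()Ljava/lang/String;` descriptor documented for `getVersion`. On a real build, `javap -s` on the compiled class would be the authoritative source for these strings.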
--------------------------------------------------------------------------------
/src/osra_anisotropic.cpp:
--------------------------------------------------------------------------------
/******************************************************************************
 OSRA: Optical Structure Recognition Application

 Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com)

 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
 Foundation; either version 2 of the License, or (at your option) any later
 version.

 This program is distributed in the hope that it will be useful, but WITHOUT ANY
 WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
 PARTICULAR PURPOSE.  See the GNU General Public License for more details.

 You should have received a copy of the GNU General Public License along with
 this program; if not, write to the Free Software Foundation, Inc., 51 Franklin
 St, Fifth Floor, Boston, MA 02110-1301, USA
 *****************************************************************************/

#define cimg_use_magick
#define cimg_plugin "greycstoration.h"

#include "CImg.h"

using namespace cimg_library;
using namespace Magick;

Image anisotropic_smoothing(const Image &image, int width, int height, const float amplitude, const float sharpness,
                            const float anisotropy, const float alpha, const float sigma)
{

  Image res(Geometry(width, height), "white");
  res.type(GrayscaleType);
#pragma omp critical
  {
    // Copy the Magick++ image into an 8-bit single-channel CImg buffer:
    CImg<unsigned char> source(width, height, 1, 1, 0);
    unsigned char color[1] = { 0 };
    unsigned char cc;
    ColorGray c;

    for (int i = 0; i < width; i++)
      for (int j = 0; j < height; j++)
        {
          c = image.pixelColor(i, j);
          color[0] = (unsigned char) (255 * c.shade());
          source.draw_point(i, j, color);
        }
    CImg<unsigned char> dest(source);
    const float gfact = 1.;
    //const float amplitude = 5.; // 20
    // const float sharpness = 0.3;
    //const float anisotropy = 1.;
    //const float alpha = .2; //0.6
    //const float sigma = 1.1; // 2.
    const float dl = 0.8;
    const float da = 30.;
    const float gauss_prec = 2.;
    const unsigned int interp = 0;
    const bool fast_approx = true;
    const unsigned int tile = 512;
    const unsigned int btile = 4;
    const unsigned int threads = 1; // orig - 2

    dest.greycstoration_run(amplitude, sharpness, anisotropy, alpha, sigma, gfact, dl, da, gauss_prec, interp,
                            fast_approx, tile, btile, threads);
    // Poll until the GREYCstoration run has finished:
    do
      {
        cimg::wait(1);
      }
    while (dest.greycstoration_is_running());

    // Copy the smoothed pixels back into the Magick++ result image:
    for (int i = 0; i < width; i++)
      for (int j = 0; j < height; j++)
        {
          cc = dest(i, j);
          c.shade(1. * cc / 255);
          res.pixelColor(i, j, c);
        }
  }
  return (res);
}

Image anisotropic_scaling(const Image &image, int width, int height, int nw, int nh)
{

  Image res(Geometry(nw, nh), "white");
  res.type(GrayscaleType);
#pragma omp critical
  {
    CImg<unsigned char> source(width, height, 1, 1, 0);
    unsigned char color[1] = { 0 };
    unsigned char cc;
    ColorGray c;


    for (int i = 0; i < width; i++)
      for (int j = 0; j < height; j++)
        {
          c = image.pixelColor(i, j);
          color[0] = (unsigned char) (255 * c.shade());
          source.draw_point(i, j, color);
        }

    //const float gfact = (sizeof(T) == 2) ? 1.0f / 256 : 1.0f;
    const float gfact = 1.;
    const float amplitude = 20.; // 40 20!
    const float sharpness = 0.2; // 0.2! 0.3
    const float anisotropy = 1.;
    const float alpha = .6; //0.6! 0.8
    const float sigma = 2.; //1.1 2.!
    const float dl = 0.8;
    const float da = 30.;
    const float gauss_prec = 2.;
    const unsigned int interp = 0;
    const bool fast_approx = true;
    const unsigned int tile = 512; // 512 0
    const unsigned int btile = 4;
    const unsigned int threads = 1; // 2 1

    const unsigned int init = 5;
    CImg<unsigned char> mask;

    mask.assign(source.dimx(), source.dimy(), 1, 1, 255);
    mask = !mask.resize(nw, nh, 1, 1, 4);
    source.resize(nw, nh, 1, -100, init);
    CImg<unsigned char> dest(source);

    dest.greycstoration_run(mask, amplitude, sharpness, anisotropy, alpha, sigma, gfact, dl, da, gauss_prec, interp,
                            fast_approx, tile, btile, threads);
    do
      {
        cimg::wait(1);
      }
    while (dest.greycstoration_is_running());

    for (int i = 0; i < nw; i++)
      for (int j = 0; j < nh; j++)
        {
          cc = dest(i, j);
          c.shade(1. * cc / 255);
          res.pixelColor(i, j, c);
        }
  }
  return (res);
}

--------------------------------------------------------------------------------
/src/osra_lib.h:
--------------------------------------------------------------------------------
/******************************************************************************
 OSRA: Optical Structure Recognition Application

 Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com)

 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
 Foundation; either version 2 of the License, or (at your option) any later
 version.

 This program is distributed in the hope that it will be useful, but WITHOUT ANY
 WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
 PARTICULAR PURPOSE.  See the GNU General Public License for more details.
14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // Header: osra_lib.h 21 | // 22 | // Defines types and functions of OSRA library. 23 | // 24 | 25 | #include // std::string 26 | #include // std:ostream 27 | #include 28 | 29 | // 30 | // Section: Functions 31 | // 32 | 33 | // Function: osra_process_image() 34 | // 35 | // Parameters: 36 | // image_data - the binary image 37 | // 38 | // Returns: 39 | // 0, if processing was completed successfully 40 | int osra_process_image( 41 | #ifdef OSRA_LIB 42 | const char *image_data, 43 | int image_length, 44 | std::ostream &structure_output_stream, 45 | #else 46 | const std::string &input_file = "/home/edward/Documents/CSDE_Git/osra_install_pkg/osra-2.1.0-1/src/test.jpg", 47 | const std::string &output_file = "/home/edward/Documents/CSDE_Git/osra_install_pkg/osra-2.1.0-1/src/output", 48 | #endif 49 | int rotate = 0, 50 | bool invert = false, 51 | int input_resolution = 0, 52 | double threshold = 0, 53 | int do_unpaper = 0, 54 | bool jaggy = false, 55 | bool adaptive_option = false, 56 | std::string output_format = "smi", 57 | std::string embedded_format = "", 58 | bool show_confidence = false, 59 | bool show_resolution_guess = false, 60 | bool show_page = false, 61 | bool show_coordinates = false, 62 | bool show_avg_bond_length = false, 63 | bool show_learning = false, 64 | const std::string &osra_dir = "/usr/local/bin", 65 | const std::string &spelling_file = "", 66 | const std::string &superatom_file = "", 67 | bool debug = false, 68 | bool verbose = false, 69 | const std::string &output_image_file_prefix = "", 70 | const std::string &resize = "", 71 | const std::string &preview = "" 72 | ); 73 | 74 | // Instatiate and populate map and vector 75 | std::vector > 
initialize_rgroup(); 76 | 77 | /// Function for reading a single OSRA diagram 78 | std::string read_diagram( 79 | const std::string &input_file, 80 | const char *image_data = "a", 81 | int image_length = 4, 82 | const std::string &output_file = "/tmp", 83 | int rotate = 0, 84 | bool invert = false, 85 | int input_resolution = 0, 86 | double threshold = 0, 87 | int do_unpaper = 0, 88 | bool jaggy = false, 89 | bool adaptive_option = false, 90 | std::string output_format = "smi", 91 | std::string embedded_format = "", 92 | bool show_confidence = false, 93 | bool show_resolution_guess = false, 94 | bool show_page = false, 95 | bool show_coordinates = false, 96 | bool show_avg_bond_length = false, 97 | bool show_learning = false, 98 | const std::string &osra_dir = "/usr/local/bin", 99 | const std::string &spelling_file = "", 100 | const std::string &superatom_file = "", 101 | bool debug = false, 102 | bool verbose = false, 103 | const std::string &output_image_file_prefix = "", 104 | const std::string &resize = "", 105 | const std::string &preview = "" 106 | ); 107 | 108 | 109 | std::vector read_rgroup( 110 | std::vector > list_of_rgroup_maps, 111 | const std::string &input_file, 112 | const char *image_data = "a", 113 | int image_length = 4, 114 | const std::string &output_file = "/tmp", 115 | int rotate = 0, 116 | bool invert = false, 117 | int input_resolution = 0, 118 | double threshold = 0, 119 | int do_unpaper = 0, 120 | bool jaggy = false, 121 | bool adaptive_option = false, 122 | std::string output_format = "smi", 123 | std::string embedded_format = "", 124 | bool show_confidence = false, 125 | bool show_resolution_guess = false, 126 | bool show_page = false, 127 | bool show_coordinates = false, 128 | bool show_avg_bond_length = false, 129 | bool show_learning = false, 130 | const std::string &osra_dir = "/usr/local/bin", 131 | const std::string &spelling_file = "", 132 | const std::string &superatom_file = "", 133 | bool debug = false, 134 | bool verbose = 
false, 135 | const std::string &output_image_file_prefix = "", 136 | const std::string &resize = "", 137 | const std::string &preview = "" 138 | ); 139 | 140 | void test_osra_lib( 141 | #ifdef OSRA_LIB 142 | const std::string &output = "osra lib on", 143 | #else 144 | const std::string &output = "osra lib off", 145 | #endif 146 | // std::string output = "normal test", 147 | int pointless = 0 148 | ); -------------------------------------------------------------------------------- /test/bugs/gocr_quality_regression_test/osra_gocr.cpp: -------------------------------------------------------------------------------- 1 | 2 | #include 3 | 4 | #include 5 | #include 6 | 7 | extern "C" { 8 | #include 9 | } 10 | 11 | using namespace std; 12 | 13 | /* Actual max height is 12, but we leave some more for extensions: */ 14 | const char* TESTS[][50] = 15 | { 16 | /* These show where 0.45 is better */ 17 | /* Test1: "3" is not detected */ 18 | { 19 | "##############", 20 | "####......####", 21 | "###........###", 22 | "###..#......##", 23 | "########...###", 24 | "########...###", 25 | "########...###", 26 | "########..####", 27 | "######....####", 28 | "#####.....####", 29 | "#######.....##", 30 | "#########....#", 31 | "#########....#", 32 | "##########...#", 33 | "#########...##", 34 | "##...###....##", 35 | "#..........###", 36 | "#.........####", 37 | "###..#.#######" 38 | }, 39 | /* Test2: "3" is not detected */ 40 | { 41 | "##############", 42 | "####......####", 43 | "###........###", 44 | "###..#......##", 45 | "########...###", 46 | "########...###", 47 | "########...###", 48 | "########..####", 49 | "######....####", 50 | "#####.....####", 51 | "#######.....##", 52 | "#########....#", 53 | "#########....#", 54 | "##########...#", 55 | "#########...##", 56 | "##...###....##", 57 | "#..........###", 58 | "#.........####", 59 | "###..#.#######", 60 | "##############", 61 | }, 62 | /* Test3: "3" is not detected */ 63 | { 64 | "############", 65 | "###......###", 66 | 
"###.......##", 67 | "###........#", 68 | "#######....#", 69 | "#######....#", 70 | "#######...##", 71 | "#######...##", 72 | "######....##", 73 | "#####.....##", 74 | "######.....#", 75 | "#######.....", 76 | "########....", 77 | "########...#", 78 | "########...#", 79 | "#...###....#", 80 | "#.........##", 81 | "#........###", 82 | "###.....####", 83 | "############" 84 | }, 85 | /* The rest show where 0.48 is better */ 86 | /* Test4: nothing should be detected */ 87 | { 88 | "#######################", 89 | "#######################", 90 | "#######################", 91 | "#########..#..#########", 92 | "########.........######", 93 | "########........#######", 94 | "#########......########", 95 | "##########....#########", 96 | "###########...#########", 97 | ".##########...#########", 98 | ".##########...#########", 99 | ".#########....#########", 100 | "...#..........#########", 101 | "..............#########", 102 | "..............#########", 103 | ".########.....#########", 104 | ".#########....#########", 105 | ".#########....#########", 106 | "##########....#########", 107 | "##########....#########", 108 | "#########.....#########", 109 | "########......#########", 110 | "#######........########" 111 | }, 112 | /* Test5: "N" should be detected */ 113 | { 114 | "###############", 115 | "#....#####....#", 116 | "##....#####....", 117 | "##.....####..##", 118 | "##.....#####.##", 119 | "##...#..###..##", 120 | "##..##...##..##", 121 | "##..###...#..##", 122 | "##..####.....##", 123 | "##..#####....##", 124 | "##..#####....##", 125 | "##..######...##", 126 | "#.....#####..##", 127 | "#....######..##", 128 | "###############" 129 | }, 130 | /* Test6: "C" should be detected */ 131 | { 132 | "########.............", 133 | "#####..#.............", 134 | "#####................", 135 | "#####................", 136 | "####.................", 137 | "####........#########", 138 | "####........#########", 139 | "##..........#########", 140 | "##..........#########", 141 
| "##.........##########", 142 | "##.........##########", 143 | "##.........##########", 144 | "##.........##########", 145 | "##.........##########", 146 | "##.........##########", 147 | "##.........##########", 148 | "##..........#########", 149 | "##.#........#########", 150 | "#.#.........#########", 151 | "###............##.##.", 152 | "####............#..#.", 153 | "####.................", 154 | "#####................" 155 | } 156 | }; 157 | 158 | job_t *JOB; 159 | job_t *OCR_JOB; 160 | 161 | char run_test(int n) 162 | { 163 | int height = 0; 164 | int width = strlen(TESTS[n][0]); 165 | 166 | while (TESTS[n][height] != NULL) 167 | { 168 | height++; 169 | } 170 | 171 | const char** image = TESTS[n]; 172 | 173 | cout << "Test " << n + 1 << ": width x height = " << width << "x" << height << endl; 174 | 175 | job_t job; 176 | 177 | job_init(&job); 178 | job_init_image(&job); 179 | 180 | job.cfg.cfilter = (char *) "oOcCnNHFsSBuUgMeEXYZRPp23456789"; 181 | job.src.p.x = width; 182 | job.src.p.y = height; 183 | job.src.p.bpp = 1; 184 | job.src.p.p = (unsigned char *) malloc(job.src.p.x * job.src.p.y); 185 | 186 | for (int row = 0; row < height; row++) 187 | { 188 | for (int col = 0; col < width; col++) 189 | job.src.p.p[row * width + col] = image[row][col] == '#' ? 255 : 0; 190 | } 191 | 192 | JOB = &job; 193 | OCR_JOB = &job; 194 | 195 | try 196 | { 197 | pgm2asc(&job); 198 | } 199 | catch (...) 
    {
    }

  char *l = (char *) job.res.linelist.start.next->data;

  char c = 0;

  if (l != NULL)
    c = l[0];

  if (isalnum(c))
    {
      // Character recognition succeeded for GOCR:
      cout << "Found c=" << c << endl;
    }
  else
    {
      cout << "Failed c=" << c << endl;
    }

  // Return the recognized character (0 if nothing was recognized):
  return c;
}

int main()
{
  for (unsigned int n = 0; n < 6; n++)
    {
      run_test(n);
    }
}

--------------------------------------------------------------------------------
/src/mcdlutil.h:
--------------------------------------------------------------------------------
/*-*-C++-*-

**********************************************************************
Copyright (C) 2007,2008 by Sergei V. Trepalin sergey_trepalin@chemical-block.com
Copyright (C) 2007,2008 by Andrei Gakh andrei.gakh@nnsa.doe.gov

This file is part of the Open Babel project.
For more information, see

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
***********************************************************************
*/
/*
  The diagram is generated using templates, which are stored in the SD file templates.sdf.
  This is an ordinary SD file, which contains chemical structures and may contain data;
  only the chemical structures are used. A subgraph isomorphism search is executed and the coordinates
  of atoms are determined from the templates. See Molecules, 11, 129-141 (2006) for a description of the algorithm.
  Structures in the SD file are converted in the following manner:
  1. All atoms, except explicit hydrogens, are replaced with the generic ANY_ATOM (matched with any atom in the subgraph isomorphism search).
  2. All bonds are replaced with the generic ANY_BOND, which can be matched with any bond in the molecule.
  3. All hydrogens are removed, but they are still used in the search: a query atom and a structure atom are considered
     to match if the structure atom carries at least as many hydrogens as the query atom. Explicitly defined hydrogens
     on the query therefore make it possible to exclude substituent attachment at atoms that are sterically hidden in the templates.
  If the file is not found, predefined templates are used.
*/


namespace OpenBabel
{

  //common constants
  static const int MAXBONDS=300;
  static const int MAXFRAGS=200;
  static const int MAXCHARS=1000;
  static const int MAX_DEPTH=10;
  static const int NELEMMAX=120;
#define NELEMMCDL 121



  // Return valency by hydrogen for given atomic position in the Periodic Table
  int hydrogenValency(int na);
  int maxValency(int na);

  //Alternate overloaded methods
  int alternate(OBMol * pmol, const int nH[], int bondOrders []); //This method does not work!
  //Atom numbering in the connection matrix arrays iA1 and iA2 must be zero-based, so the first atom has index zero
  int alternate(const std::vector aPosition,const std::vector aCharge,
    const std::vector aRad,const std::vector nHydr, const std::vector iA1,
    const std::vector iA2, std::vector & bondOrders, int nAtoms, int nBonds);

  //Diagram generation overloaded methods
  void generateDiagram(OBMol * pmol);
  //Atom numbering in the connection matrix arrays iA1 and iA2 must be zero-based, so the first atom has index zero
  void generateDiagram(const std::vector iA1, const std::vector iA2,
    std::vector& rx, std::vector& ry, int nAtoms, int nBonds);
  void generateDiagram(OBMol * pmol, std::ostream & ofs); //for testing purposes only

  //Fragment search - pure subgraph isomorphism
  bool fragmentSearch(OBMol * query, OBMol * structure);
  bool fragmentSearch(const std::vector aPositionQuery, const std::vector iA1Query,
    const std::vector iA2Query, const std::vector bondTypesQuery, const std::vector aPositionStructure, const std::vector iA1Structure,
    const std::vector iA2Structure, const std::vector bondTypesStructure, int nAtomsQuery, int nBondsQuery, int nAtomsStructure, int nBondsStructure);
  ///Equivalence list generation
  void equivalenceList(OBMol * pmol, std::vector& eqList);
  void equivalenceList(const std::vector aPosition,const std::vector aCharge,
    const std::vector aRad, const std::vector iA1, const std::vector iA2,
    const std::vector bondTypes, std::vector& eqList, int nAtoms, int nBonds);
  //Fragment addition
  void addFragment(OBMol * molecule, OBMol * fragment, int molAN, int fragAN, int molBN, int fragBN, bool isAddition);

  //routines below have no common meaning, but are necessary to process stereo information
  void createStereoLists(OBMol * pmol, std::vector& bondStereoList, std::vector& atomStereoList, std::vector& eqList);
  std::string getAtomMCDL(OBMol * pmol, int ntatoms, const std::vector ix, const std::vector aNumber, const std::vector atomStereoList, const std::vector eqList);
  std::string getBondMCDL(OBMol * pmol, int nbStore, int ntatoms, const std::vector ix, const std::vector aNumber, int bonds[MAXBONDS][4], const std::vector bondStereoList, const std::vector eqList);
  void implementAtomStereo(std::vector& iA1, std::vector& iA2, std::vector& stereoBonds, const std::vectorrx, const std::vector ry, int acount, int bcount, std::string astereo);
  void implementBondStereo(const std::vector iA1, const std::vector iA2, std::vector& rx, std::vector& ry, int acount, int bcount, std::string bstereo);


  int groupRedraw(OBMol * pmol, int bondN, int atomN, bool atomNInGroup);
  //int groupRedrawFrameAtom(OBMol * pmol, int bondN, int atomInFrame);

  int canonizeMCDL(const std::string atomBlock, std::vector & structureList);
  bool parseFormula(const std::string formulaString, std::vector & enumber, int & valency);

  void prepareTest(OBMol * pmol, std::ostream & ofs);

} // namespace OpenBabel

//
//! \utilities for MCDL format and other useful utilities

--------------------------------------------------------------------------------
/src/osra_openbabel.h:
--------------------------------------------------------------------------------
/******************************************************************************
 OSRA: Optical Structure Recognition

 Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com)

 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
 Foundation; either version 2 of the License, or (at your option) any later
 version.
10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | #ifndef OSRA_OPENBABEL_H 21 | #define OSRA_OPENBABEL_H 22 | 23 | #include // std::string 24 | #include // std::map 25 | #include // std::vector 26 | 27 | #include "osra.h" 28 | #include "osra_segment.h" 29 | 30 | 31 | // Header: osra_openbabel.h 32 | // 33 | // Defines types and functions for OSRA OpenBabel module. 34 | // 35 | 36 | 37 | //struct: molecule_statistics_s 38 | // contains the statistical information about molecule used for analysis of recognition accuracy 39 | struct molecule_statistics_s 40 | { 41 | // int: rotors 42 | // number of rotors in molecule 43 | int rotors; 44 | // int: num_fragments 45 | // number of contiguous fragments in molecule 46 | int fragments; 47 | // int: rings56 48 | // accumulated number of 5- and 6- rings in molecule 49 | int rings56; 50 | // int: rings456 51 | // accumulated number of 4, 5, and 6-member rings in molecule 52 | int rings456; 53 | // int: num_atoms 54 | // number of atoms in molecule 55 | int num_atoms; 56 | // int: num_bonds 57 | // number of bonds in molecule 58 | int num_bonds; 59 | // int: num_organic_non_carbon_atoms 60 | // number of organic atoms which are not carbon or hydrogen 61 | int num_organic_non_carbon_atoms; 62 | // int: num_small_angles 63 | // number of bond angles smaller than 20 degrees 64 | int num_small_angles; 65 | }; 66 | 67 | // typedef: molecule_statistics_t 68 | // defines molecule_statistics_t type based on molecule_statistics_s 
struct 69 | typedef struct molecule_statistics_s molecule_statistics_t; 70 | 71 | // 72 | // Section: Functions 73 | // 74 | 75 | // Function: osra_openbabel_init() 76 | // 77 | // Performs OpenBabel library engine sanity check. Should be called at e.g. program startup. 78 | // 79 | // Returns: 80 | // non-zero value in case of error 81 | int osra_openbabel_init(); 82 | 83 | // Function: calculate_molecule_statistics() 84 | // 85 | // Converts vectors of atoms and bonds into a molecular object and calculates the molecule statistics. 86 | // Note: this function changes the atoms! 87 | // 88 | // Parameters: 89 | // atom - vector of atoms 90 | // bond - vector of bonds 91 | // n_bond - total number of bonds 92 | // avg_bond_length - average bond length as measured from the image (to be included into output if provided) 93 | // superatom - dictionary of superatom labels mapped to SMILES 94 | // verbose - print debug info 95 | // 96 | // Returns: 97 | // calculated molecule statistics 98 | molecule_statistics_t calculate_molecule_statistics( 99 | std::vector &atom, const std::vector &bond, int n_bond, 100 | double avg_bond_length, const std::map &superatom, bool verbose); 101 | 102 | // Function: get_formatted_structure() 103 | // 104 | // Converts vectors of atoms and bonds into a molecular object and encodes the molecular into a text presentation (SMILES, MOL file, ...), 105 | // specified by given format. 106 | // 107 | // Parameters: 108 | // atom - vector of atoms 109 | // bond - vector of bonds 110 | // n_bond - total number of bonds 111 | // format - output format for molecular representation - i.e. 
SMI, SDF 112 | // embedded_format - output format to be embedded into SDF (is only valid if output format is SDF); the only embedded formats supported now are "inchi", "smi", and "can" 113 | // molecule_statistics - the molecule statistics (returned to the caller) 114 | // confidence - confidence score (returned to the caller) 115 | // show_confidence - toggles confidence score inclusion into output 116 | // avg_bond_length - average bond length as measured from the image 117 | // scaled_avg_bond_length - average bond length scaled to the original resolution of the image 118 | // show_avg_bond_length - toggles average bond length inclusion into output 119 | // resolution - resolution at which image is being processed in DPI (to be included into output if provided) 120 | // page - page number (to be included into output if provided) 121 | // surrounding_box - the coordinates of surrounding image box that contains the structure (to be included into output if provided) 122 | // superatom - dictionary of superatom labels mapped to SMILES 123 | // verbose - print debug info 124 | // 125 | // Returns: 126 | // string containing SMILES, SDF or other representation of the molecule 127 | const std::string get_formatted_structure( 128 | std::vector &atom, const std::vector &bond, int n_bond, 129 | const std::string &format, const std::string &second_format, 130 | molecule_statistics_t &molecule_statistics, 131 | double &confidence, bool show_confidence, 132 | double avg_bond_length, double scaled_avg_bond_length, bool show_avg_bond_length, 133 | const int * const resolution, const int * const page, const box_t * const surrounding_box, 134 | const std::map &superatom, int n_letters, bool show_learning, 135 | int resolution_iteration, bool verbose, const std::vector& brackets); 136 | 137 | #endif 138 | -------------------------------------------------------------------------------- /install-sh: -------------------------------------------------------------------------------- 1 
| #! /bin/sh 2 | # 3 | # install - install a program, script, or datafile 4 | # This comes from X11R5 (mit/util/scripts/install.sh). 5 | # 6 | # Copyright 1991 by the Massachusetts Institute of Technology 7 | # 8 | # Permission to use, copy, modify, distribute, and sell this software and its 9 | # documentation for any purpose is hereby granted without fee, provided that 10 | # the above copyright notice appear in all copies and that both that 11 | # copyright notice and this permission notice appear in supporting 12 | # documentation, and that the name of M.I.T. not be used in advertising or 13 | # publicity pertaining to distribution of the software without specific, 14 | # written prior permission. M.I.T. makes no representations about the 15 | # suitability of this software for any purpose. It is provided "as is" 16 | # without express or implied warranty. 17 | # 18 | # Calling this script install-sh is preferred over install.sh, to prevent 19 | # `make' implicit rules from creating a file called install from it 20 | # when there is no Makefile. 21 | # 22 | # This script is compatible with the BSD install script, but was written 23 | # from scratch. It can only install one file at a time, a restriction 24 | # shared with many OS's install programs. 25 | 26 | 27 | # set DOITPROG to echo to test this script 28 | 29 | # Don't use :- since 4.3BSD and earlier shells don't like it. 30 | doit="${DOITPROG-}" 31 | 32 | 33 | # put in absolute paths if you don't have them in your path; or use env. vars. 
34 | 35 | mvprog="${MVPROG-mv}" 36 | cpprog="${CPPROG-cp}" 37 | chmodprog="${CHMODPROG-chmod}" 38 | chownprog="${CHOWNPROG-chown}" 39 | chgrpprog="${CHGRPPROG-chgrp}" 40 | stripprog="${STRIPPROG-strip}" 41 | rmprog="${RMPROG-rm}" 42 | mkdirprog="${MKDIRPROG-mkdir}" 43 | 44 | transformbasename="" 45 | transform_arg="" 46 | instcmd="$mvprog" 47 | chmodcmd="$chmodprog 0755" 48 | chowncmd="" 49 | chgrpcmd="" 50 | stripcmd="" 51 | rmcmd="$rmprog -f" 52 | mvcmd="$mvprog" 53 | src="" 54 | dst="" 55 | dir_arg="" 56 | 57 | while [ x"$1" != x ]; do 58 | case $1 in 59 | -c) instcmd="$cpprog" 60 | shift 61 | continue;; 62 | 63 | -d) dir_arg=true 64 | shift 65 | continue;; 66 | 67 | -m) chmodcmd="$chmodprog $2" 68 | shift 69 | shift 70 | continue;; 71 | 72 | -o) chowncmd="$chownprog $2" 73 | shift 74 | shift 75 | continue;; 76 | 77 | -g) chgrpcmd="$chgrpprog $2" 78 | shift 79 | shift 80 | continue;; 81 | 82 | -s) stripcmd="$stripprog" 83 | shift 84 | continue;; 85 | 86 | -t=*) transformarg=`echo $1 | sed 's/-t=//'` 87 | shift 88 | continue;; 89 | 90 | -b=*) transformbasename=`echo $1 | sed 's/-b=//'` 91 | shift 92 | continue;; 93 | 94 | *) if [ x"$src" = x ] 95 | then 96 | src=$1 97 | else 98 | # this colon is to work around a 386BSD /bin/sh bug 99 | : 100 | dst=$1 101 | fi 102 | shift 103 | continue;; 104 | esac 105 | done 106 | 107 | if [ x"$src" = x ] 108 | then 109 | echo "install: no input file specified" 110 | exit 1 111 | else 112 | true 113 | fi 114 | 115 | if [ x"$dir_arg" != x ]; then 116 | dst=$src 117 | src="" 118 | 119 | if [ -d $dst ]; then 120 | instcmd=: 121 | else 122 | instcmd=mkdir 123 | fi 124 | else 125 | 126 | # Waiting for this to be detected by the "$instcmd $src $dsttmp" command 127 | # might cause directories to be created, which would be especially bad 128 | # if $src (and thus $dsttmp) contains '*'. 
129 | 130 | if [ -f $src -o -d $src ] 131 | then 132 | true 133 | else 134 | echo "install: $src does not exist" 135 | exit 1 136 | fi 137 | 138 | if [ x"$dst" = x ] 139 | then 140 | echo "install: no destination specified" 141 | exit 1 142 | else 143 | true 144 | fi 145 | 146 | # If destination is a directory, append the input filename; if your system 147 | # does not like double slashes in filenames, you may need to add some logic 148 | 149 | if [ -d $dst ] 150 | then 151 | dst="$dst"/`basename $src` 152 | else 153 | true 154 | fi 155 | fi 156 | 157 | ## this sed command emulates the dirname command 158 | dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'` 159 | 160 | # Make sure that the destination directory exists. 161 | # this part is taken from Noah Friedman's mkinstalldirs script 162 | 163 | # Skip lots of stat calls in the usual case. 164 | if [ ! -d "$dstdir" ]; then 165 | defaultIFS=' 166 | ' 167 | IFS="${IFS-${defaultIFS}}" 168 | 169 | oIFS="${IFS}" 170 | # Some sh's can't handle IFS=/ for some reason. 171 | IFS='%' 172 | set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'` 173 | IFS="${oIFS}" 174 | 175 | pathcomp='' 176 | 177 | while [ $# -ne 0 ] ; do 178 | pathcomp="${pathcomp}${1}" 179 | shift 180 | 181 | if [ ! -d "${pathcomp}" ] ; 182 | then 183 | $mkdirprog "${pathcomp}" 184 | else 185 | true 186 | fi 187 | 188 | pathcomp="${pathcomp}/" 189 | done 190 | fi 191 | 192 | if [ x"$dir_arg" != x ] 193 | then 194 | $doit $instcmd $dst && 195 | 196 | if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi && 197 | if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi && 198 | if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi && 199 | if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi 200 | else 201 | 202 | # If we're going to rename the final executable, determine the name now. 
203 | 204 | if [ x"$transformarg" = x ] 205 | then 206 | dstfile=`basename $dst` 207 | else 208 | dstfile=`basename $dst $transformbasename | 209 | sed $transformarg`$transformbasename 210 | fi 211 | 212 | # don't allow the sed command to completely eliminate the filename 213 | 214 | if [ x"$dstfile" = x ] 215 | then 216 | dstfile=`basename $dst` 217 | else 218 | true 219 | fi 220 | 221 | # Make a temp file name in the proper directory. 222 | 223 | dsttmp=$dstdir/#inst.$$# 224 | 225 | # Move or copy the file name to the temp name 226 | 227 | $doit $instcmd $src $dsttmp && 228 | 229 | trap "rm -f ${dsttmp}" 0 && 230 | 231 | # and set any options; do chmod last to preserve setuid bits 232 | 233 | # If any of these fail, we abort the whole thing. If we want to 234 | # ignore errors from any of these, just make sure not to ignore 235 | # errors from the above "$doit $instcmd $src $dsttmp" command. 236 | 237 | if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi && 238 | if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi && 239 | if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi && 240 | if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi && 241 | 242 | # Now rename the file to the real destination. 243 | 244 | $doit $rmcmd -f $dstdir/$dstfile && 245 | $doit $mvcmd $dsttmp $dstdir/$dstfile 246 | 247 | fi && 248 | 249 | 250 | exit 0 251 | -------------------------------------------------------------------------------- /Makefile.in: -------------------------------------------------------------------------------- 1 | # 2 | # This makefile simply redirects all targets to "src" directory, but also includes 3 | # some specific package-wide rules. 
4 | # 5 | # http://mad-scientist.net/make/rules.html 6 | 7 | include Makefile.inc 8 | 9 | SHELL=/bin/bash 10 | 11 | # These targets are used to invoke make recursively but do some additional actions if necessary: 12 | SPECIAL_PHONY_TARGETS := $(addsuffix .subdir,$(PHONY_TARGETS)) 13 | 14 | .PHONY: proper tarball dist package_deb package_rpm $(SPECIAL_PHONY_TARGETS) 15 | 16 | $(SPECIAL_PHONY_TARGETS): %.subdir: 17 | $(MAKE) -C src $* 18 | $(MAKE) -C dict $* 19 | $(MAKE) -C doc $* 20 | 21 | all: all.subdir 22 | 23 | install: install.subdir 24 | $(INSTALL_DIR) $(DESTDIR)$(docdir) 25 | $(INSTALL_DATA) README $(DESTDIR)$(docdir) 26 | 27 | uninstall: uninstall.subdir 28 | $(RM) -f $(DESTDIR)$(docdir)/README 29 | # We do not uninstall empty directories 30 | 31 | # Should remove all Makefile targets: 32 | clean: clean.subdir 33 | $(RM) -f *.deb *.rpm *.srpm $(NAME)*.tar.bz2 34 | 35 | # Cleanup all autogenerated files (e.g. used before creating a distro tarball). 36 | distclean: clean distclean.subdir 37 | $(RM) -rf config.status config.cache config.log autom4te.cache 38 | # Everything what is generated by configure script: 39 | $(RM) -f Makefile Makefile.inc doc/manual.sgml package/linux/osra.pc package/linux/debian/control package/linux/debian/rules package/linux/suse/osra.spec pom.xml 40 | 41 | # This rule is responsible for correcting permissions after checkout from VCS. 42 | proper: 43 | chmod 755 configure 44 | 45 | # This rule creates a tarball from current directory. 46 | # $(TAR_NAME) should be defined before calling this rule: 47 | tarball: distclean proper 48 | tar -C .. -cj --exclude-vcs $(shell basename `pwd`) -f $(TAR_NAME) 49 | 50 | # Create a tarball snapshot from current source directory (usually to be uploaded to FTP server). 51 | dist: TAR_NAME := ../$(NAME_VERSION).tar.bz2 52 | # Generate md5: 53 | dist: tarball 54 | cat $(TAR_NAME) | md5sum > $(TAR_NAME).md5 55 | @echo "Archive $(TAR_NAME) was created." 
56 | 57 | # Use default value if TMP variable is not defined: 58 | BUILD_DIR := $(shell echo $${TMP:-/tmp}/build-$${RANDOM})/$(NAME_VERSION) 59 | 60 | # Create DEB package: 61 | package_deb: TAR_NAME := ../$(NAME)_$(VERSION).orig.tar.bz2 62 | package_deb: clean 63 | mkdir -p $(BUILD_DIR) 64 | 65 | cp -a . $(BUILD_DIR) 66 | cp -a package/linux/debian $(BUILD_DIR) 67 | 68 | # Debian build requires the original tarball name to correspond to mask "NAME_VERSION.orig.(tar|tar.bz2|tar.gz|lzma)": 69 | TAR_NAME=$(TAR_NAME) $(MAKE) -C $(BUILD_DIR) tarball 70 | 71 | # Run dpkg build in isolated environment (preserve the exit code): 72 | pushd $(BUILD_DIR); \ 73 | debuild -e JAVA_HOME -sa -us -uc; \ 74 | exit_code=$$?; \ 75 | popd; \ 76 | exit $$exit_code; 77 | cp $(BUILD_DIR)/../*.deb $(BUILD_DIR)/$(TAR_NAME) . 78 | $(RM) -rf $(BUILD_DIR) 79 | 80 | # Create RPM package: 81 | package_rpm: clean 82 | mkdir -p $(BUILD_DIR)/{SOURCES,SPECS,BUILD,RPMS,SRPMS,$(NAME_VERSION)} 83 | 84 | echo "%_topdir $(BUILD_DIR)" > ~/.rpmmacros 85 | 86 | cp -a . $(BUILD_DIR)/$(NAME_VERSION) 87 | cp package/linux/suse/osra.spec $(BUILD_DIR)/SPECS 88 | 89 | # If you change the tarname here, change also "Source0" in package/linux/suse/osra.spec: 90 | TAR_NAME=$(BUILD_DIR)/SOURCES/$(NAME_VERSION).tar.bz2 $(MAKE) -C $(BUILD_DIR)/$(NAME_VERSION) tarball 91 | 92 | # Run RPM build in isolated environment (preserve the exit code): 93 | pushd $(BUILD_DIR)/SPECS; \ 94 | rpmbuild -ba --target=$(TARGET_CPU) osra.spec; \ 95 | exit_code=$$?; \ 96 | popd; \ 97 | exit $$exit_code; 98 | $(RM) -f ~/.rpmmacros 99 | # The RPM location depends on target platform and concrete path is unknown in advance: 100 | find $(BUILD_DIR)/RPMS -iname '$(NAME)*.rpm' -exec cp '{}' . 
\; 101 | # Rename to .srpm, because otherwise *.rpm mask does not match unique file: 102 | cp $(BUILD_DIR)/SRPMS/$(NAME)*.src.rpm $(NAME_VERSION).srpm 103 | $(RM) -rf $(BUILD_DIR) 104 | 105 | MVN_COMMON_OPTS := $(MVN_EXTRA_OPTS) -B deploy:deploy-file -Durl=$(REPOSITORY_URL) -DrepositoryId=$(REPOSITORY_ID) -DgroupId=net.sf.osra -Dversion=$(VERSION) 106 | 107 | # Maven repository deployment rules for DEB packages: 108 | deploy_deb: 109 | mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)_*.deb` -DartifactId=$(NAME) -Dclassifier=$(TARGET_CPU) -Dpackaging=deb 110 | mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)-common_*.deb` -DartifactId=$(NAME)-common -Dclassifier=all -Dpackaging=deb 111 | mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)_*.tar.bz2` -DartifactId=$(NAME) -Dclassifier=sources -Dpackaging=tar.bz2 112 | 113 | -[ -f lib$(NAME)$(LIB_MAJOR_VERSION)_*.deb ] && mvn $(MVN_COMMON_OPTS) -Dfile=`echo lib$(NAME)$(LIB_MAJOR_VERSION)_*.deb` -DartifactId=$(NAME)-lib -Dclassifier=$(TARGET_CPU) -Dpackaging=deb 114 | -[ -f lib$(NAME)-java$(LIB_MAJOR_VERSION)_*.deb ] && mvn $(MVN_COMMON_OPTS) -Dfile=`echo lib$(NAME)-java$(LIB_MAJOR_VERSION)_*.deb` -DartifactId=$(NAME)-lib-java -Dclassifier=$(TARGET_CPU) -Dpackaging=deb 115 | -[ -f lib$(NAME)-dev_*.deb ] && mvn $(MVN_COMMON_OPTS) -Dfile=`echo lib$(NAME)-dev_*.deb` -DartifactId=$(NAME)-devel -Dclassifier=$(TARGET_CPU) -Dpackaging=deb 116 | 117 | # Maven repository deployment rules for RPM packages: 118 | deploy_rpm: 119 | mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME_VERSION)-*.rpm` -DartifactId=$(NAME) -Dclassifier=$(TARGET_CPU) -Dpackaging=rpm 120 | mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)-common-*.rpm` -DartifactId=$(NAME)-common -Dclassifier=noarch -Dpackaging=rpm 121 | mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)-*.srpm` -DartifactId=$(NAME) -Dclassifier=sources -Dpackaging=srpm 122 | 123 | -[ -f $(NAME)-lib$(LIB_MAJOR_VERSION)-*.rpm ] && mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)-lib$(LIB_MAJOR_VERSION)-*.rpm` 
-DartifactId=$(NAME)-lib -Dclassifier=$(TARGET_CPU) -Dpackaging=rpm 124 | -[ -f $(NAME)-lib-java$(LIB_MAJOR_VERSION)-*.rpm ] && mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)-lib-java$(LIB_MAJOR_VERSION)-*.rpm` -DartifactId=$(NAME)-lib-java -Dclassifier=$(TARGET_CPU) -Dpackaging=rpm 125 | -[ -f $(NAME)-devel-*.rpm ] && mvn $(MVN_COMMON_OPTS) -Dfile=`echo $(NAME)-devel-*.rpm` -DartifactId=$(NAME)-devel -Dclassifier=$(TARGET_CPU) -Dpackaging=rpm 126 | 127 | # Maven repository deployment rules for JAR package: 128 | deploy_jar: 129 | mvn -B -DaltDeploymentRepository=$(REPOSITORY_ID)::default::$(REPOSITORY_URL) source:jar deploy 130 | 131 | beautify: 132 | astyle --style=gnu --suffix=none --recursive "*.cpp" "*.h" 133 | 134 | Makefile.inc: Makefile.inc.in config.status 135 | ./config.status 136 | 137 | config.status: configure 138 | @echo "Your Makefile.inc is older than configure script. As this file is generated by configure, it is strongly advised to re-run configure to update it." 139 | # ./configure 140 | 141 | configure: configure.ac aclocal.m4 142 | @echo "Your configure script is older than configure.ac. It is strongly advised to re-run autoconf to update it." 143 | # autoconf 144 | -------------------------------------------------------------------------------- /src/osra_segment.h: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 
10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | // Header: osra_segment.h 20 | // 21 | // Declares page segmentation functions 22 | // 23 | 24 | #ifndef OSRA_SEGMENT_H 25 | #define OSRA_SEGMENT_H 26 | 27 | #include <list> // std::list 28 | #include <vector> // std::vector 29 | #include 30 | #include <math.h> // fabs(double) 31 | #include <float.h> // FLT_MAX 32 | #include <limits.h> // INT_MAX 33 | #include <algorithm> // std::min(double, double), std::max(double, double) 34 | #include <Magick++.h> 35 | 36 | using namespace Magick; 37 | 38 | // struct: point_s 39 | // a point of the image, used by image segmentation routines 40 | struct point_s 41 | { 42 | // int: x,y 43 | // coordinates of the image point 44 | int x, y; 45 | explicit point_s(int a, int b) : x(a), y(b) {} 46 | point_s() {} 47 | }; 48 | // typedef: point_t 49 | // defines point_t type based on point_s struct 50 | typedef struct point_s point_t; 51 | 52 | // struct: box_s 53 | // encompassing box structure for image segmentation 54 | struct box_s 55 | { 56 | // int: x1, y1, x2, y2 57 | // coordinates of top-left and bottom-right corners 58 | int x1, y1, x2, y2; 59 | // array: c 60 | // vector of points in the box 61 | std::vector<point_t> c; 62 | }; 63 | // typedef: box_t 64 | // defines box_t type based on box_s struct 65 | typedef struct box_s box_t; 66 | 67 | // struct: arrow_s 68 | // coordinates of tail and head of an arrow 69 | struct arrow_s 70 | { 71 | arrow_s(point_t _head, point_t _tail,int _min_x,int _min_y,int _max_x,int _max_y) : 72 | 
head(_head),tail(_tail),min_x(_min_x),min_y(_min_y),max_x(_max_x),max_y(_max_y),linebreak(false),reversible(false),remove(false),agent("") {} 73 | arrow_s() {} 74 | // point_t: tail, head 75 | // tail and head of an arrow as points 76 | point_t tail,head; 77 | int min_x,min_y,max_x,max_y; 78 | std::string agent; 79 | bool linebreak; 80 | bool reversible; 81 | bool remove; 82 | }; 83 | // typedef: arrow_t 84 | // defines arrow_t type based on arrow_s struct 85 | typedef struct arrow_s arrow_t; 86 | 87 | struct plus_s 88 | { 89 | point_t center; 90 | int min_x,min_y,max_x,max_y; 91 | }; 92 | typedef struct plus_s plus_t; 93 | 94 | // 95 | // Section: Functions 96 | // 97 | 98 | // Function: find_segments() 99 | // 100 | // Performs page segmentation to different regions (text/graphics/linear etc.) 101 | // 102 | // Parameters: 103 | // image - page image 104 | // threshold - black-white binarization threshold 105 | // bgColor - background color 106 | // adaptive - flag set if adaptive thresholding has been used in grayscale conversion 107 | // is_reaction - flag set if we're looking for reaction-specific symbols (arrows, plus signs etc.) 
108 | // arrows - a vector of arrows found during segmentation 109 | // pluses - a vector of plus centers found during segmentation 110 | // verbose - flag set for verbose reporting 111 | // 112 | // Returns: 113 | // A list of clusters, each of which is a list of connected segments each of which is a list of points 114 | std::list<std::list<std::list<point_t> > > find_segments(const Image &image, double threshold, const ColorGray &bgColor, bool adaptive, bool is_reaction, std::vector<arrow_t> &arrows, std::vector<plus_t> &pluses, bool verbose); 115 | 116 | // Function: prune_clusters() 117 | // 118 | // Prunes the list of clusters and retains only molecular structure images 119 | // 120 | // Parameters: 121 | // clusters - a list of clusters detected by find_segments() 122 | // boxes - a vector of box_t objects for molecular structure images 123 | // brackets - a vector of points which potentially belong to brackets 124 | // 125 | // Returns: 126 | // Number of molecular structure images 127 | int prune_clusters(std::list<std::list<std::list<point_t> > > &clusters, std::vector<box_t> &boxes, std::set > &brackets); 128 | 129 | 130 | template <class T> 131 | void build_hist(const T &seg, std::vector<int> &hist, const int len, int &top_pos, int &top_value,point_t &head,point_t &tail, point_t &center, int &min_x, int &min_y, int &max_x, int &max_y) 132 | { 133 | int l=seg.size(); 134 | typename T::const_iterator j; 135 | center.x=0; center.y=0; 136 | min_x = INT_MAX; 137 | min_y = INT_MAX; 138 | max_x = 0; 139 | max_y = 0; 140 | for (j=seg.begin(); j!=seg.end(); j++) 141 | { 142 | center.x += j->x; 143 | center.y += j->y; 144 | min_x = std::min(min_x, j->x); 145 | min_y = std::min(min_y, j->y); 146 | max_x = std::max(max_x, j->x); 147 | max_y = std::max(max_y, j->y); 148 | } 149 | center.x /=l; // Find the center of mass for the segment margin 150 | center.y /=l; 151 | 152 | for (j=seg.begin(); j!=seg.end(); j++) 153 | { 154 | int dx = j->x-center.x; 155 | int dy = j->y-center.y; 156 | double r=(double)sqrt(dx*dx+dy*dy); 157 | double theta=0.; 158 | if (dx!=0 || dy!=0) 159 | theta = 
atan2(dy,dx); 160 | int bin = (theta+M_PI)*len/(2*M_PI); 161 | if (bin>=len) bin -= len; 162 | hist[bin]++; // build a histogram of occurrences in polar coordinates 163 | if (hist[bin]>=top_value) 164 | { 165 | top_pos = bin; // find the position of the highest peak 166 | top_value = hist[bin]; 167 | } 168 | } 169 | 170 | double r_max=0; 171 | for (j=seg.begin(); j!=seg.end(); j++) 172 | { 173 | int dx = j->x-center.x; 174 | int dy = j->y-center.y; 175 | double r=(double)sqrt(dx*dx+dy*dy); 176 | double theta=0.; 177 | if (dx!=0 || dy!=0) 178 | theta = atan2(dy,dx); 179 | int bin = (theta+M_PI)*len/(2*M_PI); 180 | if (bin>=len) bin -= len; 181 | if (bin == top_pos && r>r_max) 182 | { 183 | head = *j; 184 | r_max = r; 185 | } 186 | } 187 | 188 | r_max=0; 189 | for (j=seg.begin(); j!=seg.end(); j++) 190 | { 191 | int dx = j->x-head.x; 192 | int dy = j->y-head.y; 193 | double r=(double)sqrt(dx*dx+dy*dy); 194 | if (r>r_max) 195 | { 196 | r_max = r; 197 | tail = *j; 198 | } 199 | } 200 | } 201 | #endif 202 | -------------------------------------------------------------------------------- /src/osra_fragments.cpp: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 
14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | // File osra_fragments.cpp 21 | // 22 | // Defines operations on molecular fragments 23 | // 24 | 25 | #include <float.h> // FLT_MAX 26 | #include <limits.h> // INT_MAX 27 | #include <iostream> // std::ostream, std::cout 28 | 29 | #include "osra.h" 30 | #include "osra_common.h" 31 | #include "osra_fragments.h" 32 | 33 | double atom_distance(const std::vector<atom_t> &atom, int a, int b) 34 | { 35 | return (distance(atom[a].x, atom[a].y, atom[b].x, atom[b].y)); 36 | } 37 | 38 | /** 39 | * TODO: Returning the vector from the stack causes copy constructor to trigger, which is inefficient. 40 | * Consider passing the vector as a reference. 41 | */ 42 | std::vector<std::vector<int> > find_fragments( 43 | const std::vector<bond_t> &bond, int n_bond, const std::vector<atom_t> &atom) 44 | { 45 | std::vector<std::vector<int> > frags; 46 | std::vector<int> pool; 47 | int n = 0; 48 | 49 | for (int i = 0; i < n_bond; i++) 50 | if (bond[i].exists && atom[bond[i].a].exists && atom[bond[i].b].exists) 51 | pool.push_back(i); 52 | 53 | while (!pool.empty()) 54 | { 55 | frags.resize(n + 1); 56 | frags[n].push_back(bond[pool.back()].a); 57 | frags[n].push_back(bond[pool.back()].b); 58 | pool.pop_back(); 59 | bool found = true; 60 | 61 | while (found) 62 | { 63 | found = false; 64 | unsigned int i = 0; 65 | while (i < pool.size()) 66 | { 67 | bool found_a = false; 68 | bool found_b = false; 69 | bool newfound = false; 70 | for (unsigned int k = 0; k < frags[n].size(); k++) 71 | { 72 | if (frags[n][k] == bond[pool[i]].a) 73 | found_a = true; 74 | else if (frags[n][k] == bond[pool[i]].b) 75 | found_b = true; 76 | } 77 | if (found_a && !found_b) 78 | { 79 | frags[n].push_back(bond[pool[i]].b); 80 | pool.erase(pool.begin() + i); 81 | found = true; 82 | newfound = true; 83 | 
} 84 | if (!found_a && found_b) 85 | { 86 | frags[n].push_back(bond[pool[i]].a); 87 | pool.erase(pool.begin() + i); 88 | found = true; 89 | newfound = true; 90 | } 91 | if (found_a && found_b) 92 | { 93 | pool.erase(pool.begin() + i); 94 | newfound = true; 95 | } 96 | if (!newfound) 97 | i++; 98 | } 99 | } 100 | n++; 101 | } 102 | return (frags); 103 | } 104 | 105 | int reconnect_fragments(std::vector<bond_t> &bond, int n_bond, std::vector<atom_t> &atom, double avg) 106 | { 107 | std::vector<std::vector<int> > frags = find_fragments(bond, n_bond, atom); 108 | 109 | if (frags.size() <= 3) 110 | { 111 | for (unsigned int i = 0; i < frags.size(); i++) 112 | if (frags[i].size() > 2) 113 | for (unsigned int j = i + 1; j < frags.size(); j++) 114 | if (frags[j].size() > 2) 115 | { 116 | double l = FLT_MAX; 117 | int atom1 = 0, atom2 = 0; 118 | for (unsigned int ii = 0; ii < frags[i].size(); ii++) 119 | for (unsigned int jj = 0; jj < frags[j].size(); jj++) 120 | { 121 | double d = atom_distance(atom, frags[i][ii], frags[j][jj]); 122 | if (d < l) 123 | { 124 | l = d; 125 | atom1 = frags[i][ii]; 126 | atom2 = frags[j][jj]; 127 | } 128 | } 129 | if (l < 1.1 * avg && l > avg / 3) 130 | { 131 | //cout< populate_fragments( 160 | const std::vector<std::vector<int> > &frags, const std::vector<atom_t> &atom) 161 | { 162 | std::vector<fragment_t> r; 163 | 164 | for (unsigned int i = 0; i < frags.size(); i++) 165 | { 166 | fragment_t f; 167 | f.x1 = INT_MAX; 168 | f.x2 = 0; 169 | f.y1 = INT_MAX; 170 | f.y2 = 0; 171 | 172 | for (unsigned j = 0; j < frags[i].size(); j++) 173 | { 174 | f.atom.push_back(frags[i][j]); 175 | if (atom[frags[i][j]].min_x < f.x1) 176 | f.x1 = atom[frags[i][j]].min_x; 177 | if (atom[frags[i][j]].max_x > f.x2) 178 | f.x2 = atom[frags[i][j]].max_x; 179 | if (atom[frags[i][j]].min_y < f.y1) 180 | f.y1 = atom[frags[i][j]].min_y; 181 | if (atom[frags[i][j]].max_y > f.y2) 182 | f.y2 = atom[frags[i][j]].max_y; 183 | //cout<<"Atoms2: "< bb.y2) 196 | return (false); 197 | if (aa.x1 > bb.x1) 198 | return (false); 199 | if (aa.x1 < bb.x1) 
200 | return (true); 201 | 202 | return (false); 203 | } 204 | -------------------------------------------------------------------------------- /dict/spelling.txt: -------------------------------------------------------------------------------- 1 | # Spelling corrections for atom labels and abbreviations 2 | # that might not be correctly parsed by OCR engine 3 | # You can run osra with -d option to check the spelling correction 4 | # process - the output looks like: 5 | # OCR string --> Corrected String --> Final Output 6 | # Note that by default OSRA might try as many as 3 different resolutions 7 | # so you may have quite a bit of output to look through. Try a specific 8 | # resolution (with -r switch) or choose the best match. 9 | # Empty lines are ignored and lines starting with # are comments 10 | 11 | Ci Cl 12 | Cf Cl 13 | Cll Cl 14 | 15 | HN N 16 | NH N 17 | M N 18 | Hm N 19 | MN N 20 | N2 N 21 | NM N 22 | NH2 N 23 | H2N N 24 | NHZ N 25 | HZN N 26 | NH3 N 27 | nu N 28 | Hu N 29 | lU N 30 | HlU N 31 | lUH N 32 | H2Y N 33 | RN N 34 | MH N 35 | M2M N 36 | M N 37 | HnN N 38 | nIH N 39 | NHX N 40 | mH N 41 | NN N 42 | hN N 43 | Nh N 44 | 45 | OH O 46 | oH O 47 | Ho O 48 | HO O 49 | ol O 50 | On O 51 | on O 52 | no O 53 | nO O 54 | ON O 55 | oN O 56 | O4 O 57 | OM O 58 | Un O 59 | 4O O 60 | Mo O 61 | nU O 62 | UnH O 63 | nUH O 64 | 65 | Meo MeO 66 | oMe MeO 67 | oMg MeO 68 | omg MeO 69 | Mgo MeO 70 | leo MeO 71 | ohle MeO 72 | lleo MeO 73 | olllle MeO 74 | OMe MeO 75 | OM8 MeO 76 | OMo MeO 77 | OMB MeO 78 | OCH3 MeO 79 | OCHS MeO 80 | H3CO MeO 81 | OM4 MeO 82 | ocH MeO 83 | OM6 MeO 84 | M6O MeO 85 | OMR MeO 86 | OMoo MeO 87 | OmB MeO 88 | MoO MeO 89 | M*O MeO 90 | McoO MeO 91 | Ome MeO 92 | MgO MeO 93 | M8O MeO 94 | MBO MeO 95 | 96 | NC CN 97 | YC CN 98 | Nc CN 99 | cN CN 100 | 101 | nBU nBu 102 | neU nBu 103 | ngU nBu 104 | n8U nBu 105 | BU nBu 106 | 107 | Eto EtO 108 | oEt EtO 109 | Elo EtO 110 | oEl EtO 111 | ElO EtO 112 | OEl EtO 113 | OC2H EtO 114 | OCH2CH3 
EtO 115 | CH3CH2O EtO 116 | 117 | olgU OiBu 118 | oleU OiBu 119 | OlBU OiBu 120 | 121 | npr iPr 122 | llpll iPr 123 | lpl iPr 124 | npl iPr 125 | lPl iPr 126 | nPl iPr 127 | 128 | tBU tBu 129 | llBU tBu 130 | lBU tBu 131 | 132 | CooH COOH 133 | HooC COOH 134 | Co2H COOH 135 | CO2H COOH 136 | HOOC COOH 137 | CO2n COOH 138 | co2H COOH 139 | CO2 COOH 140 | 141 | 142 | AC Ac 143 | pC Ac 144 | pc Ac 145 | 146 | ACo AcO 147 | opC AcO 148 | pcO AcO 149 | ACO AcO 150 | oCO AcO 151 | OoC AcO 152 | OpC AcO 153 | pCO AcO 154 | RCO AcO 155 | ORC AcO 156 | OnC AcO 157 | OAc AcO 158 | nCO AcO 159 | Rco AcO 160 | oRc AcO 161 | OAC AcO 162 | 163 | Bl Br 164 | el Br 165 | BC Br 166 | BF Br 167 | 168 | # Need to distinguish between a recognized label for methyl and 169 | # the default empty string for carbon 170 | CH3 Me 171 | H3C Me 172 | CH Me 173 | CH2 Me 174 | HC Me 175 | hle Me 176 | M8 Me 177 | MB Me 178 | MR Me 179 | Me Me 180 | H2C Me 181 | 3C Me 182 | 183 | pl Ar 184 | nl Ar 185 | 186 | oX Ox 187 | 188 | NoZ NO2 189 | o2N NO2 190 | No2 NO2 191 | No NO2 192 | O2N NO2 193 | NOZ NO2 194 | MO2 NO2 195 | 196 | F3C CF3 197 | CF CF3 198 | FC CF3 199 | Co CF3 200 | F8l CF3 201 | CFS CF3 202 | FSC CF3 203 | 204 | F3Co F3CN 205 | 206 | S3 S 207 | Se S 208 | lS S 209 | 8 S 210 | SH S 211 | HS S 212 | 5 S 213 | 214 | O2S SO2 215 | 216 | lH H 217 | 1H H 218 | 219 | AcNH NHAc 220 | AcHN NHAc 221 | ACNH NHAc 222 | NHnC NHAc 223 | pCNH NHAc 224 | NHpC NHAc 225 | lCnuH NHAc 226 | NHAC NHAc 227 | 228 | OlHP THPO 229 | lHPO THPO 230 | lNpo THPO 231 | olHp THPO 232 | 233 | 234 | NlOHCH3 NOHCH3 235 | 236 | HO3S SO3H 237 | so3H SO3H 238 | Ho3s SO3H 239 | SO3 SO3H 240 | SUn3 SO3H 241 | 242 | NMe MeN 243 | NHMe MeN 244 | NHMF MeN 245 | NHlME MeN 246 | 247 | RO OR 248 | oR OR 249 | Ro OR 250 | 251 | lHPO THPO 252 | OlHP THPO 253 | 254 | NCOlRlH3 N(OH)CH3 255 | 256 | pZO BzO 257 | p2O BzO 258 | OBX BzO 259 | BZO BzO 260 | B2O BzO 261 | OB2 BzO 262 | OBz BzO 263 | OBZ BzO 264 | Blo BzO 265 | BZC BzO 
266 | EBZO BzO 267 | CBZ BzO 268 | B2C BzO 269 | 270 | Sl Si 271 | 272 | CO2El CO2Et 273 | COOEl CO2Et 274 | COOEt CO2Et 275 | COOC2H CO2Et 276 | CO2CH2CH3 CO2Et 277 | COOCH2CH3 CO2Et 278 | CO2C2H5 CO2Et 279 | COEl CO2Et 280 | COHEl CO2Et 281 | COOMe CO2Me 282 | COOCH3 CO2Me 283 | CO2CH3 CO2Me 284 | HUn2C COOH 285 | CO2E CO2Et 286 | EO EtO 287 | HO2C COOH 288 | CUn2H COOH 289 | CnU2H COOH 290 | MeHN MeN 291 | n2N N 292 | El Et 293 | CH2CH3 Et 294 | CH3CH2 Et 295 | C2H5 Et 296 | H5C2 Et 297 | OBn BnO 298 | HNZ ZNH 299 | ZHN ZNH 300 | HNAm AmNH 301 | OAm AmO 302 | AmOOC AmO2C 303 | CO2Am AmO2C 304 | COOAm AmO2C 305 | SAm AmS 306 | HNBn BnNH 307 | BnN BnNH 308 | CO2Bn BnO2C 309 | BnOOC BnO2C 310 | COOBn BnO2C 311 | SnBu3 Bu3Sn 312 | HNBu BuNH 313 | OBu BuO 314 | CO2Bu BuO2C 315 | COOBu BuO2C 316 | BuOOC BuO2C 317 | SBu BuS 318 | Br3C CBr3 319 | HNCbz CbzNH 320 | Cl3C CCl3 321 | OCH CHO 322 | OHC CHO 323 | O2SCl ClSO2 324 | SO2Cl ClSO2 325 | MeO2C CO2Me 326 | OSO2Me MeO2SO 327 | BrOC COBr 328 | BuOC COBu 329 | F3COC COCF3 330 | ClOC COCl 331 | OCOC COCO 332 | EtOC COEt 333 | FOC COF 334 | MeOC COMe 335 | H3COC COMe 336 | Et2NOC CONEt2 337 | NH2OC CONH2 338 | EtHNOC CONHEt 339 | MeHNOC CONHMe 340 | Me2NOC CONMe2 341 | HSOC COSH 342 | NEt2 Et2N 343 | El2N Et2N 344 | NE2 Et2N 345 | E2N Et2N 346 | NEt3 Et3N 347 | HNEt EtNH 348 | SO2NH2 H2NSO2 349 | H2NO2S H2NSO2 350 | SO2N H2NSO2 351 | HNOH HONH 352 | NMe2 Me2N 353 | MeNH MeN 354 | HNMe MeN 355 | MeOOC CO2Me 356 | OMs MsO 357 | OMS MsO 358 | OCN NCO 359 | SCN NCS 360 | AmHN NHAm 361 | BnHN NHBn 362 | BuHN NHBu 363 | EtHN NHEt 364 | HOHN NHOH 365 | PrHN NHPr 366 | ON NO 367 | Et2OP POEt2 368 | Et3OP POEt3 369 | Et2OOP POOEt2 370 | HNPr PrNH 371 | EtS SEt 372 | SMe MeS 373 | SCH3 MeS 374 | Pll Ph 375 | Pl Ph 376 | ElO2C CO2Et 377 | EtOOC CO2Et 378 | ElOOC CO2Et 379 | 380 | OlOS OTos 381 | CH2CH CH2CH3 382 | CHCH3 CH2CH3 383 | H3CHC CH2CH3 384 | H3CH2C CH2CH3 385 | CCH3CH2l2N N(CH2CH3)2 386 | NCCH2CH3l2 N(CH2CH3)2 387 | 
CCH3CHl2N N(CH2CH3)2 388 | NCCH2CH2CH3l2 N(CH2CH2CH3)2 389 | CCCH3l3 C(CH3)3 390 | CHCCH3l2 CH(CH3)2 391 | OCH2CO2El OCH2CO2Et 392 | CBOCl2N BOC2N 393 | NHBOC BOCHN 394 | NBOC BOCHN 395 | NHCRZ NHCbz 396 | NClZ NHCbz 397 | F3CO OCF3 398 | Cl3CO OCCl3 399 | NSO2BU NHSO2BU 400 | NHSO2CH3 NHSO2Me 401 | ElOCHN EtO2CHN 402 | ElO2CHl EtO2CHN 403 | NHCOOEl NHCOOEt 404 | OEl2 OEt2 405 | CH2OH HOCH2 406 | NHCH NHCH3 407 | NCH3 NHCH3 408 | NO3S H4NO3S 409 | NOOC H4NOOC 410 | C3H C3H7 411 | C2H C2H5 412 | NNH2 NHNH2 413 | H3CS MeS 414 | NHNHCOCH NHNHCOCH3 415 | NNCOCH3 NHNHCOCH3 416 | NHNHCOCF NHNHCOCF3 417 | NNCOCF3 NHNHCOCF3 418 | CO2CYSP CO2CysPr 419 | COCYSPl CO2CysPr 420 | CO2CYSPl CO2CysPr 421 | CF3CH CF3CH2 422 | PPll2 PPh2 423 | Pll2P PPh2 424 | Ph2P PPh2 425 | CO2M8 CO2Me 426 | OCH2Pll OCH2Ph 427 | PMoN PMBN 428 | lCO AcO 429 | XCO AcO 430 | OXC AcO 431 | CH3O MeO 432 | O3S SO3H 433 | NXeOH2C CH2OMe 434 | CH2ONXe CH2OMe 435 | CHOMe CH2OMe 436 | MeOHC CH2OMe 437 | CH3OCH2 CH2OMe 438 | CH2OCH3 CH2OMe 439 | N3Cl NH3Cl 440 | MeCHlN MeN 441 | NCHlMe MeN 442 | ElCHlN NHEt 443 | NCHlEl NHEt 444 | OCH2Pll OCH2Ph 445 | OCH2P OCH2Ph 446 | COOCH2P COOCH2Ph 447 | NeO MeO 448 | ONe MeO 449 | OCPh3 Ph3CO 450 | SO2CH3 SO2Me 451 | H3CO2S SO2Me 452 | CH3SO2 SO2Me 453 | POIOEII2 POOEt2 454 | SO3NI SO3Na 455 | OSO2Me MsO 456 | CH2I5Bl (CH2)5Br 457 | ICH2I5Bl (CH2)5Br 458 | CH2I5 (CH2)5 459 | TOS Tos 460 | PhO OPh 461 | PhS SPh 462 | PhHN NHPh 463 | 464 | Rl R1 465 | Rlo R10 466 | Rg R9 467 | Rp R4 468 | 2 Z 469 | RlO R10 470 | Y2 Y2 471 | 472 | PMRN PMBN 473 | 474 | * Xx 475 | ** Xx 476 | *** Xx 477 | 478 | -------------------------------------------------------------------------------- /src/osra.cpp: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | OSRA: Optical Structure Recognition Application 3 | 4 | Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com) 
5 | 6 | This program is free software; you can redistribute it and/or modify it under 7 | the terms of the GNU General Public License as published by the Free Software 8 | Foundation; either version 2 of the License, or (at your option) any later 9 | version. 10 | 11 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 12 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 13 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 14 | 15 | You should have received a copy of the GNU General Public License along with 16 | this program; if not, write to the Free Software Foundation, Inc., 51 Franklin 17 | St, Fifth Floor, Boston, MA 02110-1301, USA 18 | *****************************************************************************/ 19 | 20 | #include // strncpy() 21 | #include // dirname() 22 | 23 | #include 24 | 25 | #include "osra_lib.h" 26 | #include "config.h" // PACKAGE_VERSION 27 | 28 | int main(int argc, 29 | char **argv 30 | ) 31 | { 32 | TCLAP::CmdLine cmd("OSRA: Optical Structure Recognition Application, created by Igor Filippov, 2013", ' ', 33 | PACKAGE_VERSION); 34 | 35 | // 36 | // Image pre-processing options 37 | // 38 | TCLAP::ValueArg rotate_option("R", "rotate", "Rotate image clockwise by specified number of degrees", false, 0, 39 | "0..360"); 40 | cmd.add(rotate_option); 41 | 42 | TCLAP::SwitchArg invert_option("n", "negate", "Invert color (white on black)", false); 43 | cmd.add(invert_option); 44 | 45 | TCLAP::ValueArg resolution_option("r", "resolution", "Resolution in dots per inch", false, 0, "default: auto"); 46 | cmd.add(resolution_option); 47 | 48 | TCLAP::ValueArg threshold_option("t", "threshold", "Gray level threshold", false, 0, "0.2..0.8"); 49 | cmd.add(threshold_option); 50 | 51 | TCLAP::ValueArg do_unpaper_option("u", "unpaper", "Pre-process image with unpaper algorithm, rounds", false, 0, 52 | "default: 0 rounds"); 53 | cmd.add(do_unpaper_option); 54 | 55 | 
  TCLAP::SwitchArg jaggy_option("j", "jaggy", "Additional thinning/scaling down of low quality documents", false);
  cmd.add(jaggy_option);

  TCLAP::SwitchArg adaptive_option("i", "adaptive", "Adaptive thresholding pre-processing, useful for low light/low contrast images", false);
  cmd.add(adaptive_option);

  //
  // Output format options
  //
  TCLAP::ValueArg<std::string> output_format_option("f", "format", "Output format", false, "can", "can/smi/sdf");
  cmd.add(output_format_option);

  TCLAP::ValueArg<std::string> embedded_format_option("", "embedded-format", "Embedded format", false, "", "inchi/smi/can");
  cmd.add(embedded_format_option);

  TCLAP::SwitchArg show_confidence_option("p", "print", "Print out confidence estimate", false);
  cmd.add(show_confidence_option);

  TCLAP::SwitchArg show_resolution_guess_option("g", "guess", "Print out resolution guess", false);
  cmd.add(show_resolution_guess_option);

  TCLAP::SwitchArg show_page_option("e", "page", "Show page number for PDF/PS/TIFF documents (only for SDF/SMI/CAN output format)", false);
  cmd.add(show_page_option);

  TCLAP::SwitchArg show_coordinates_option("c", "coordinates", "Show surrounding box coordinates (only for SDF/SMI/CAN output format)", false);
  cmd.add(show_coordinates_option);

  TCLAP::SwitchArg show_avg_bond_length_option("b", "bond", "Show average bond length in pixels (only for SDF/SMI/CAN output format)", false);
  cmd.add(show_avg_bond_length_option);

  //
  // Dictionaries options
  //
  TCLAP::ValueArg<std::string> spelling_file_option("l", "spelling", "Spelling correction dictionary", false, "", "configfile");
  cmd.add(spelling_file_option);

  TCLAP::ValueArg<std::string> superatom_file_option("a", "superatom", "Superatom label map to SMILES", false, "", "configfile");
  cmd.add(superatom_file_option);

  //
  // Debugging options
  //
  TCLAP::SwitchArg debug_option("d", "debug", "Print out debug information on spelling corrections", false);
  cmd.add(debug_option);

  TCLAP::SwitchArg verbose_option("v", "verbose", "Be verbose and print the program flow", false);
  cmd.add(verbose_option);

  TCLAP::ValueArg<std::string> output_image_file_prefix_option("o", "output", "Write recognized structures to image files with given prefix", false, "", "filename prefix");
  cmd.add(output_image_file_prefix_option);

  TCLAP::ValueArg<std::string> resize_option("s", "size", "Resize image on output", false, "", "dimensions, 300x400");
  cmd.add(resize_option);

  TCLAP::ValueArg<std::string> preview_option("", "preview", "Preview Image", false, "", "filename");
  cmd.add(preview_option);
  //
  // Input-output options
  //
  TCLAP::UnlabeledValueArg<std::string> input_file_option("in", "input file", true, "", "filename");
  cmd.add(input_file_option);

  TCLAP::ValueArg<std::string> output_file_option("w", "write", "Write recognized structures to text file", false, "", "filename");
  cmd.add(output_file_option);

  TCLAP::SwitchArg show_learning_option("", "learn", "Print out all structure guesses with confidence parameters", false);
  cmd.add(show_learning_option);

  cmd.parse(argc, argv);

  // Calculating the current dir:
  char progname[1024];
  strncpy(progname, cmd.getProgramName().c_str(), sizeof(progname) - 1);
  progname[sizeof(progname) - 1] = '\0';
  std::string osra_dir = dirname(progname);

  int result = osra_process_image(
                 input_file_option.getValue(),
                 output_file_option.getValue(),
                 rotate_option.getValue(),
                 invert_option.getValue(),
                 resolution_option.getValue(),
                 threshold_option.getValue(),
                 do_unpaper_option.getValue(),
                 jaggy_option.getValue(),
                 adaptive_option.getValue(),
                 output_format_option.getValue(),
                 embedded_format_option.getValue(),
                 show_confidence_option.getValue(),
                 show_resolution_guess_option.getValue(),
                 show_page_option.getValue(),
                 show_coordinates_option.getValue(),
                 show_avg_bond_length_option.getValue(),
                 show_learning_option.getValue(),
                 osra_dir,
                 spelling_file_option.getValue(),
                 superatom_file_option.getValue(),
                 debug_option.getValue(),
                 verbose_option.getValue(),
                 output_image_file_prefix_option.getValue(),
                 resize_option.getValue(),
                 preview_option.getValue()
               );

  return result;
}

--------------------------------------------------------------------------------
/src/osra.h:
--------------------------------------------------------------------------------
/******************************************************************************
 OSRA: Optical Structure Recognition Application

 Created by Igor Filippov, 2007-2013 (igor.v.filippov@gmail.com)

 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
 Foundation; either version 2 of the License, or (at your option) any later
 version.

 This program is distributed in the hope that it will be useful, but WITHOUT ANY
 WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
 PARTICULAR PURPOSE. See the GNU General Public License for more details.

 You should have received a copy of the GNU General Public License along with
 this program; if not, write to the Free Software Foundation, Inc., 51 Franklin
 St, Fifth Floor, Boston, MA 02110-1301, USA
 *****************************************************************************/

// Header: osra.h
//
// Defines types and functions exported from main module to other modules.
//
#ifndef OSRA_H
#define OSRA_H

#include <string> // std::string
#include <vector> // std::vector

#include <Magick++.h> // Magick::Image, Magick::ColorGray

extern "C" {
#include <potracelib.h>
}

using namespace Magick;

// struct: atom_s
// Contains information about prospective atom
struct atom_s
{
  atom_s(double xx=0, double yy=0, const potrace_path_t* p=NULL) :
    x(xx),y(yy),min_x(xx),min_y(yy),max_x(xx),max_y(yy),curve(p),label(" "),n(0),anum(0), exists(false),corner(false),terminal(false),charge(0) {}
  // doubles: x, y
  // coordinates within the image clip
  double x, y;
  // string: label
  // atomic label
  std::string label;
  // int: n
  // counter of created OBAtom objects in
  int n;
  // int: anum
  // atomic number
  int anum;
  // pointer: curve
  // pointer to the curve found by Potrace
  const potrace_path_t *curve;
  // bools: exists, corner, terminal
  // atom exists, atom is at the corner (has two bonds leading to it), atom is a terminal atom
  bool exists, corner, terminal;
  // int: charge
  // electric charge on the atom
  int charge;
  // int: min_x, min_y, max_x, max_y
  // box coordinates
  int min_x, min_y,max_x,max_y;
};
// typedef: atom_t
// defines atom_t type based on atom_s struct
typedef struct atom_s atom_t;

// struct: bond_s
// contains information about prospective bond between two atoms
struct bond_s
{
  bond_s(int i=0, int j=0, const potrace_path_t* p=NULL) :
    a(i),b(j),curve(p),type(1),exists(true),hash(false),wedge(false),up(false),down(false),Small(false),arom(false),conjoined(false) {}
  // ints: a, b, type
  // starting atom, ending atom, bond type (1=single, 2=double, 3=triple)
  int a, b, type;
  // pointer: curve
  // pointer to the curve found by Potrace
  const potrace_path_t *curve;
  // bools: exists, hash, wedge, up,
  // down, Small, arom
  // bond existence and type flags
  bool exists;
  bool hash;
  bool wedge;
  bool up;
  bool down;
  bool Small;
  bool arom;
  // bool: conjoined
  // true for a double bond which is joined at one end on the image
  bool conjoined;
};
// typedef: bond_t
// defines bond_t type based on bond_s struct
typedef struct bond_s bond_t;

// Section: Constants
//
// Constants: global defines
//
// MAX_ATOMS - maximum size of the vector holding prospective atoms
// MAX_FONT_HEIGHT - maximum font height at a resolution of 150 dpi
// MAX_FONT_WIDTH - maximum font width at a resolution of 150 dpi
// MIN_FONT_HEIGHT - minimum font height
// BG_PICK_POINTS - number of points to randomly pick to determine background color
// D_T_TOLERANCE - cosine tolerance to find parallel bonds for double-triple bond extraction
// V_DISPLACEMENT - threshold vertical displacement in pixels
// DIR_CHANGE - threshold direction change in pixels
// THRESHOLD_GLOBAL - gray-level threshold for image binarization
// THRESHOLD_LOW_RES - gray-level threshold for low resolutions (72 dpi)
// MAX_RATIO - maximum black/white fill ratio for prospective molecular structures
// MIN_ASPECT - minimum aspect ratio
// MAX_ASPECT - maximum aspect ratio
// MIN_A_COUNT - minimum number of atoms
// MAX_A_COUNT - maximum number of atoms
// MIN_B_COUNT - minimum number of bonds
// MAX_B_COUNT - maximum number of bonds
// MIN_CHAR_POINTS - minimum number of black and white pixels in a character box
// MAX_BOND_THICKNESS - maximum bond thickness
// SMALL_PICTURE_AREA - threshold area of the image to be considered a small picture
// NUM_RESOLUTIONS - number of resolutions to try
// MAX_DASH - maximum size of a dash in a dashed bond
// CC_BOND_LENGTH - average carbon-carbon bond length
// FRAME - border around structure in a segmented image
// SEPARATOR_ASPECT - aspect ratio for a prospective separator line
// SEPARATOR_AREA - area for a prospective separator line
// MAX_DIST - maximum distance in pixels between neighboring segments in image segmentation routines
// MAX_AREA_RATIO - maximum area ratio for connected components in image segmentation
// SINGLE_IMAGE_DIST - default distance between connected components in a single structure image
// THRESHOLD_LEVEL - threshold level for feature matrix for image segmentation
// TEXT_LINE_SIZE - maximum atomic label size in characters
// PARTS_IN_MARGIN - take only every other pixel on a connected component margin for speed
// BORDER_COUNT - threshold number of pixels on a box border to be considered a table
// MAX_SEGMENTS - maximum number of connected component segments
// MAX_FRAGMENTS - maximum number of fragments
// STRUCTURE_COUNT - threshold number of structures to compute limits on average bond length
// SPELLING_TXT - spelling file for OCR corrections
// SUPERATOM_TXT - superatom file for mapping labels to SMILES
#define PI 3.14159265358979323846
#define MAX_ATOMS 10000
#define MAX_FONT_HEIGHT 22
#define MAX_FONT_WIDTH 21
#define MIN_FONT_HEIGHT 5
#define BG_PICK_POINTS 1000
#define D_T_TOLERANCE 0.95
#define V_DISPLACEMENT 3
#define DIR_CHANGE 2
#define THRESHOLD_GLOBAL 0.4
#define THRESHOLD_LOW_RES 0.2
#define MAX_RATIO 0.2
#define MIN_ASPECT 0.1
#define MAX_ASPECT 10.
#define MIN_A_COUNT 5
#define MAX_A_COUNT 250
#define MIN_B_COUNT 5
#define MAX_B_COUNT 250
#define MIN_CHAR_POINTS 2
#define MAX_BOND_THICKNESS 10
#define SMALL_PICTURE_AREA 6000
#define NUM_RESOLUTIONS 5
#define MAX_DASH 80
#define CC_BOND_LENGTH 1.5120
#define FRAME 5
#define SEPARATOR_ASPECT 100
#define SEPARATOR_AREA 300
#define MAX_DIST 50
#define MAX_AREA_RATIO 50
#define SINGLE_IMAGE_DIST 1000
#define MAX_DISTANCE_BETWEEN_ARROWS 200
#define THRESHOLD_LEVEL 4
#define TEXT_LINE_SIZE 8
#define PARTS_IN_MARGIN 2
#define BORDER_COUNT 3000
#define MAX_SEGMENTS 10000
#define MAX_FRAGMENTS 10
#define STRUCTURE_COUNT 20
#define SPELLING_TXT "spelling.txt"
#define SUPERATOM_TXT "superatom.txt"
#define RECOGNIZED_CHARS "oOcCnNHFsSBuUgMeEXYZRPp23456789AmThD"

#define ERROR_SPELLING_FILE_IS_MISSING -1
#define ERROR_SUPERATOM_FILE_IS_MISSING -2
#define ERROR_OUTPUT_FILE_OPEN_FAILED -3
// This error code may be returned if ImageMagick was not able to find the .mgk files.
// Check that MAGICK_CONFIGURE_PATH points to the location of *.mgk configuration files (check here http://www.imagemagick.org/script/resources.php).
#define ERROR_UNKNOWN_IMAGE_TYPE -4
#define ERROR_ILLEGAL_ARGUMENT_COMBINATION -5
// This error code usually means:
// (a) You have no /usr/lib/openbabel/x.x.x/smilesformat.so library installed. Install the format libraries / check http://openbabel.org/docs/dev/Installation/install.html#environment-variables
// (b) The format libraries are installed, but do not correspond to /usr/lib/libopenbabel.so.y.y.y. Check they correspond to the same OpenBabel version.
// (c) You need to preload OpenBabel e.g.
using LD_PRELOAD=/usr/lib/libopenbabel.so 197 | #define ERROR_UNKNOWN_OPENBABEL_FORMAT -6 198 | 199 | #endif 200 | -------------------------------------------------------------------------------- /package/linux/create_model_ga.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | import sys 3 | import operator 4 | from operator import itemgetter 5 | from os import listdir 6 | from os.path import isfile, join 7 | import random 8 | import math 9 | 10 | from openbabel import * 11 | 12 | #verify_model = [-0.02285364975052934, 0.20014736496836238, 0.24359177367611942, 0.09879557707190285, 0.23658236605437208, 0.07640365396444981, -0.016708707126055426, 0.291372680023203, 0.19743466243758914, -0.04143248625528295, 0.13283691497820155, -0.09404435905499846, -0.34011551678018254, -0.036998642457270414, 0.3366565862477758, 0.2528949626509886, 0.3523503866589532, 0.3013887466571869, 0.2457272087651062, -0.08224552150295372, 0.0386321456632419, 0.2269247796030229, 0.191691888047917, 0.029364205782588967, -0.09207024117341821, 0.024551588143053422] 13 | 14 | verify_model = [-0.11469143725730054, 0.15723547931889853, 0.19765680222250673, 0.249101590474403, 0.1897669087341134, 0.19588348907301223, 0.3354622208036507, 0.16779269801176255, -0.21232000222198893, 0.016958281784354032, -0.08672059360133752, -0.05105752296619957, -0.349912750824004, 0.18836317536530647, 0.22316782354758827, 0.27741998968081166, 0.25710999274481955, 0.27968899280120096, 0.12695166847876285, -0.10020778884718293, 0.05150631410596443, 0.22283571763712148, 0.23130179826714167, 0.1049054095759948, 0.05333970810460394, -0.12491056666737535] 15 | 16 | #verify_model = [-0.1545855719726278, -0.16679291864636722, 0.2779073764931343, -0.183848833684335, 0.010790075194024773, 0.29094165316568404, 0.055324497605819506, -0.2104820104189514, -0.1781856691338483, 0.12170164214195042, -0.03319968208305941, -0.17050232311223057, -0.3855170942775288, 
-0.07088710430614285, 0.24005317771967355, 0.2759926472483148, 0.25348276233777095, 0.23427258354038655, -0.1175747967837222, -0.18681840394577787, 0.06103578120099978, 0.24422743725717977, 0.25207495568639754, -0.09625789745569688, -0.01025153552468599, 0.19182292957981223] 17 | 18 | #verify_model = [-0.18431997080588122, -0.13632503439995766, 0.2891372503939111, 0.169268288671698, 0.07457361791041998, 0.1896239880096002, 0.17921064798323905, 0.24807741146917148, -0.1415236210886208, 0.017500171104361622, -0.1444618582502517, -0.019936025471384962, -0.39685240156986173, -0.22908080638789957, 0.27059782240339336, 0.17386007711539425, 0.21106985232185135, 0.2865651377997317, 0.09715634915474097, 0.008962730235627716, 0.030950650868271857, 0.2256707621011711, 0.19237430308515277, -0.22938527889531524, 0.15229124226660562, 0.2099925925427031] 19 | 20 | def normalize(x): 21 | n = sum(map( operator.mul, x, x)) 22 | n = math.sqrt(n) 23 | y = [a/n for a in x] 24 | return y 25 | 26 | def trial_confidence(x,c): 27 | return sum(map( operator.mul, x, c)) 28 | 29 | def model_recall(N,res_iter_all,probabilities,target,inchi_list,total): 30 | recall_model = 0 31 | k = 0 32 | for i in range(N): 33 | total_probabilities = [0.,0.,0.,0.,0.] 34 | n_probabilities = [0.,0.,0.,0.,0.] 
35 | recall_inchi = set() 36 | for r in range(0,5): 37 | for j in range(len(res_iter_all[i])): 38 | if (res_iter_all[i][j] == r): 39 | total_probabilities[r] += probabilities[k+j] 40 | n_probabilities[r] += 1 41 | maxp = 0 42 | maxr = 0 43 | for r in range(0,5): 44 | if (n_probabilities[r]>0 and maxp < total_probabilities[r]/n_probabilities[r]): 45 | maxp = total_probabilities[r]/n_probabilities[r] 46 | maxr = r 47 | first = True 48 | for r in range(0,5): 49 | if (n_probabilities[r]>0 and maxp == total_probabilities[r]/n_probabilities[r] and (r == 2 or r == 3) and first): 50 | maxr = r 51 | first = False 52 | 53 | for j in range(len(res_iter_all[i])): 54 | if (res_iter_all[i][j] == maxr and target[k+j] == 1): 55 | recall_inchi.add(inchi_list[k+j]) 56 | 57 | k += len(res_iter_all[i]) 58 | recall_model += len(recall_inchi); 59 | 60 | return 1.*recall_model/total 61 | 62 | def mutation(n): 63 | d = [] 64 | for i in range(n): 65 | d.append(2.*random.random()-1.) 66 | return normalize(d) 67 | 68 | def crossover(population): 69 | n = len(population) 70 | i = random.randint(0,n-1) 71 | j = random.randint(0,n-1) 72 | v = population[i][1] 73 | u = population[j][1] 74 | m = random.randint(0,len(v)-1) 75 | r = v[:m] 76 | r.extend(u[m:]) 77 | return normalize(r) 78 | 79 | obconversion1 = OBConversion() 80 | obconversion1.SetInFormat("sdf") 81 | obconversion1.SetOutFormat("inchi") 82 | obmol1 = OBMol() 83 | obconversion2 = OBConversion() 84 | obconversion2.SetInFormat("sdf") 85 | obconversion2.SetOutFormat("inchi") 86 | obmol2 = OBMol() 87 | 88 | result = OBPlugin.ListAsString("fingerprints") 89 | assert "FP2" in result, result 90 | fingerprinter = OBFingerprint.FindFingerprint("FP2") 91 | v1 = vectorUnsignedInt() 92 | v2 = vectorUnsignedInt() 93 | obErrorLog.StopLogging() 94 | 95 | path1 = sys.argv[1] 96 | path2 = sys.argv[2] 97 | files = [ f for f in listdir(path1) if isfile(join(path1,f)) ] 98 | 99 | total = 0; 100 | recall = 0; 101 | target = [] 102 | train = [] 103 | 
confidence = [] 104 | resolutions = [] 105 | single = [] 106 | res_iter_all = [] 107 | probabilities = [] 108 | inchi_list = [] 109 | for f in files: 110 | file1 = join(path1,f); 111 | inchi_set1 = set() 112 | notatend1 = obconversion1.ReadFile(obmol1,file1) 113 | while notatend1: 114 | obmol1.AddHydrogens() 115 | inchi1 = obconversion1.WriteString(obmol1) 116 | if inchi1: 117 | inchi_set1.add(inchi1) 118 | total += 1; 119 | obmol1 = OBMol() 120 | notatend1 = obconversion1.Read(obmol1) 121 | 122 | file2 = join(path2,f); 123 | inchi_set2 = set() 124 | if isfile(file2): 125 | notatend2 = obconversion2.ReadFile(obmol2,file2) 126 | data_file = [] 127 | resolution = [] 128 | res_iter = [] 129 | while notatend2: 130 | obmol2.AddHydrogens() 131 | line = obmol2.GetData("Confidence_parameters").GetValue() 132 | data = [int(d) for d in line.split(",")] 133 | data_file.append(data) 134 | inchi2 = obconversion2.WriteString(obmol2) 135 | result = 0 136 | if inchi2: 137 | inchi_set2.add(inchi2) 138 | if inchi2 in inchi_set1: 139 | result = 1 140 | target.append(result) 141 | inchi_list.append(inchi2); 142 | resolution.append(int(obmol2.GetData("Resolution").GetValue())) 143 | res_iter.append(int(obmol2.GetData("Resolution_iteration").GetValue())) 144 | obmol2 = OBMol() 145 | notatend2 = obconversion2.Read(obmol2) 146 | resolutions.append(resolution) 147 | res_iter_all.append(res_iter) 148 | for d in data_file: 149 | train.append(d) 150 | recall += len(inchi_set1.intersection(inchi_set2)) 151 | 152 | N = len(files); 153 | ideal_recall = 1.*recall/total 154 | print "Ideal: ",ideal_recall 155 | population = [] 156 | 157 | if len(sys.argv)>3 and sys.argv[3] == "-verify": 158 | c = verify_model 159 | probabilities = [] 160 | for t in train: 161 | probabilities.append(trial_confidence(t,c)) 162 | r = model_recall(N,res_iter_all,probabilities,target,inchi_list,total) 163 | print "Model: ",r 164 | exit(0) 165 | 166 | for i in range(100): 167 | c = mutation(len(train[0])) 168 | 
probabilities = [] 169 | for t in train: 170 | probabilities.append(trial_confidence(t,c)) 171 | r = model_recall(N,res_iter_all,probabilities,target,inchi_list,total) 172 | population.append([r,c]) 173 | 174 | population.sort(key=itemgetter(0),reverse=True) 175 | round = 1 176 | print round,population[0][0] 177 | 178 | while population[0][0] < ideal_recall - 0.001 and round < 1000: 179 | # keep top 10 180 | new_population = population[0:10] 181 | # mutation 10 182 | for i in range(10): 183 | c = mutation(len(train[0])) 184 | probabilities = [] 185 | for t in train: 186 | probabilities.append(trial_confidence(t,c)) 187 | r = model_recall(N,res_iter_all,probabilities,target,inchi_list,total) 188 | new_population.append([r,c]) 189 | # crossover 80 190 | for i in range(80): 191 | c = crossover(population) 192 | probabilities = [] 193 | for t in train: 194 | probabilities.append(trial_confidence(t,c)) 195 | r = model_recall(N,res_iter_all,probabilities,target,inchi_list,total) 196 | new_population.append([r,c]) 197 | 198 | new_population.sort(key=itemgetter(0),reverse=True) 199 | population = new_population 200 | round += 1 201 | print round,population[0][0] 202 | 203 | print population[0][1] 204 | 205 | 206 | 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | -------------------------------------------------------------------------------- /package/win32/osra.nsi: -------------------------------------------------------------------------------- 1 | !define DOT_VERSION "2.1.0" 2 | !define DASH_VERSION "2-1-0" 3 | 4 | !include Sections.nsh 5 | ; include for some of the windows messages defines 6 | !include "winmessages.nsh" 7 | ; HKLM (all users) vs HKCU (current user) defines 8 | !define env_hklm 'HKLM "SYSTEM\CurrentControlSet\Control\Session Manager\Environment"' 9 | !define env_hkcu 'HKCU "Environment"' 10 | 11 | !include "TextFunc.nsh" 12 | !insertmacro LineFind 13 | 14 | ; The name of the installer 15 | Name "Optical Structure Recognition Application" 16 | 17 | ; 
The file to write 18 | OutFile "osra-setup-${DASH_VERSION}.exe" 19 | 20 | ; The default installation directory 21 | InstallDir $PROGRAMFILES\osra\${DOT_VERSION} 22 | 23 | ; Registry key to check for directory (so if you install again, it will 24 | ; overwrite the old one automatically) 25 | InstallDirRegKey HKLM "Software\osra\${DOT_VERSION}" "Install_Dir" 26 | 27 | LicenseData "license.txt" 28 | 29 | ; Request application privileges for Windows Vista 30 | RequestExecutionLevel admin 31 | 32 | ;-------------------------------- 33 | 34 | ; Pages 35 | 36 | Page license 37 | Page components 38 | Page directory 39 | Page instfiles 40 | 41 | UninstPage uninstConfirm 42 | UninstPage instfiles 43 | 44 | ;-------------------------------- 45 | 46 | ; The stuff to install 47 | Section "osra (required)" 48 | 49 | SectionIn RO 50 | 51 | ; Set output path to the installation directory. 52 | SetOutPath $INSTDIR 53 | 54 | ; Put file there 55 | File "osra-bin.exe" 56 | File "pthreadGC2.dll" 57 | File "README.txt" 58 | File "spelling.txt" 59 | File "superatom.txt" 60 | call createOSRAbat 61 | 62 | 63 | ; Write the installation path into the registry 64 | WriteRegStr HKLM SOFTWARE\osra\${DOT_VERSION} "Install_Dir" "$INSTDIR" 65 | WriteRegStr HKLM SOFTWARE\osra "Install_Dir" "$INSTDIR" 66 | 67 | ; Write the uninstall keys for Windows 68 | WriteRegStr HKLM "Software\Microsoft\Windows\CurrentVersion\Uninstall\osra" "DisplayName" "OSRA ${DOT_VERSION}" 69 | WriteRegStr HKLM "Software\Microsoft\Windows\CurrentVersion\Uninstall\osra" "UninstallString" '"$INSTDIR\uninstall.exe"' 70 | WriteRegDWORD HKLM "Software\Microsoft\Windows\CurrentVersion\Uninstall\osra" "NoModify" 1 71 | WriteRegDWORD HKLM "Software\Microsoft\Windows\CurrentVersion\Uninstall\osra" "NoRepair" 1 72 | WriteUninstaller "uninstall.exe" 73 | 74 | 75 | ; set variable 76 | WriteRegExpandStr ${env_hklm} OSRA "$INSTDIR" 77 | ; make sure windows knows about the change 78 | SendMessage ${HWND_BROADCAST} ${WM_WININICHANGE} 0 
"STR:Environment" /TIMEOUT=5000 79 | 80 | SectionEnd 81 | 82 | Section /o "BIOVIA Draw plugin" symyx_draw 83 | call getSymyxPath 84 | strcmp $1 "" no_symyx 85 | SetOutPath "$1\AddIns" 86 | File "plugins\symyx_draw\OSRAAction.xml" 87 | SetOutPath "$1\AddIns\OSRAAction" 88 | File "plugins\symyx_draw\OSRAAction\README.txt" 89 | File "plugins\symyx_draw\OSRAAction\OSRAAction.dll" 90 | Goto done 91 | no_symyx: 92 | MessageBox MB_OK "BIOVIA Draw not found" IDOK done 93 | done: 94 | SectionEnd 95 | 96 | 97 | ; Uninstaller 98 | 99 | Section "Uninstall" 100 | # call userInfo plugin to get user info. The plugin puts the result in the stack 101 | userInfo::getAccountType 102 | 103 | # pop the result from the stack into $0 104 | pop $0 105 | 106 | # compare the result with the string "Admin" to see if the user is admin. 107 | # If match, jump 3 lines down. 108 | strCmp $0 "Admin" +3 109 | 110 | # if there is not a match, print message and return 111 | messageBox MB_OK "Please run this with Administrator privileges" 112 | Quit 113 | ReadRegStr $0 HKLM SOFTWARE\osra\${DOT_VERSION} "Install_Dir" 114 | strcpy $INSTDIR $0 115 | ; Remove registry keys 116 | DeleteRegKey HKLM "Software\Microsoft\Windows\CurrentVersion\Uninstall\osra" 117 | DeleteRegKey HKLM SOFTWARE\osra\${DOT_VERSION} 118 | ; delete variable 119 | DeleteRegValue ${env_hklm} OSRA 120 | ; make sure windows knows about the change 121 | SendMessage ${HWND_BROADCAST} ${WM_WININICHANGE} 0 "STR:Environment" /TIMEOUT=5000 122 | 123 | ; Remove files and uninstaller 124 | Delete $INSTDIR\osra-bin.exe 125 | Delete $INSTDIR\pthreadGC2.dll 126 | Delete $INSTDIR\README.txt 127 | Delete $INSTDIR\osra.bat 128 | Delete $INSTDIR\superatom.txt 129 | Delete $INSTDIR\spelling.txt 130 | Delete $INSTDIR\uninstall.exe 131 | RMDir "$INSTDIR" 132 | call un.getSymyxPath 133 | strcmp $1 "" no_symyx 134 | Delete "$1\AddIns\OSRAAction.xml" 135 | Delete "$1\AddIns\OSRAAction\README.txt" 136 | Delete "$1\AddIns\OSRAAction\OSRAAction.dll" 137 | 
Delete "$1\AddIns\OSRAAction\OSRAAction.dll.config" 138 | RMDir "$1\AddIns\OSRAAction" 139 | no_symyx: 140 | SectionEnd 141 | 142 | Function getSymyxPath 143 | Push "$PROGRAMFILES\BIOVIA" 144 | Push "BIOVIADraw.exe" 145 | Call FindIt 146 | Pop $R1 147 | Push "$R1" 148 | Call GetParent 149 | Pop $R0 150 | StrCpy $1 "$R0\" 151 | IfFileExists $1BIOVIADraw.exe fin 152 | Push "$PROGRAMFILES64\BIOVIA" 153 | Push "BIOVIADraw.exe" 154 | Call FindIt 155 | Pop $R1 156 | Push "$R1" 157 | Call GetParent 158 | Pop $R0 159 | StrCpy $1 "$R0\" 160 | IfFileExists $1BIOVIADraw.exe fin 161 | Push "$PROGRAMFILES\Accelrys" 162 | Push "AccelrysDraw.exe" 163 | Call FindIt 164 | Pop $R1 165 | Push "$R1" 166 | Call GetParent 167 | Pop $R0 168 | StrCpy $1 "$R0\" 169 | IfFileExists $1AccelrysDraw.exe fin 170 | Push "$PROGRAMFILES64\Accelrys" 171 | Push "AccelrysDraw.exe" 172 | Call FindIt 173 | Pop $R1 174 | Push "$R1" 175 | Call GetParent 176 | Pop $R0 177 | StrCpy $1 "$R0\" 178 | IfFileExists $1AccelrysDraw.exe fin 179 | Push "$PROGRAMFILES\Symyx" 180 | Push "SymyxDraw.exe" 181 | Call FindIt 182 | Pop $R1 183 | Push "$R1" 184 | Call GetParent 185 | Pop $R0 186 | StrCpy $1 "$R0\" 187 | IfFileExists $1SymyxDraw.exe fin 188 | Push "$PROGRAMFILES64\Symyx" 189 | Push "SymyxDraw.exe" 190 | Call FindIt 191 | Pop $R1 192 | Push "$R1" 193 | Call GetParent 194 | Pop $R0 195 | StrCpy $1 "$R0\" 196 | IfFileExists $1SymyxDraw.exe fin 197 | StrCpy $1 "" 198 | fin: 199 | ;$1 contains the folder of Symyx Draw or empty 200 | FunctionEnd 201 | 202 | 203 | Function un.getSymyxPath 204 | Push "$PROGRAMFILES\BIOVIA" 205 | Push "BIOVIADraw.exe" 206 | Call un.FindIt 207 | Pop $R1 208 | Push "$R1" 209 | Call un.GetParent 210 | Pop $R0 211 | StrCpy $1 "$R0\" 212 | IfFileExists $1BIOVIADraw.exe fin 213 | Push "$PROGRAMFILES64\BIOVIA" 214 | Push "BIOVIADraw.exe" 215 | Call un.FindIt 216 | Pop $R1 217 | Push "$R1" 218 | Call un.GetParent 219 | Pop $R0 220 | StrCpy $1 "$R0\" 221 | IfFileExists $1BIOVIADraw.exe fin 222 
| Push "$PROGRAMFILES\Accelrys" 223 | Push "AccelrysDraw.exe" 224 | Call un.FindIt 225 | Pop $R1 226 | Push "$R1" 227 | Call un.GetParent 228 | Pop $R0 229 | StrCpy $1 "$R0\" 230 | IfFileExists $1AccelrysDraw.exe fin 231 | Push "$PROGRAMFILES64\Accelrys" 232 | Push "AccelrysDraw.exe" 233 | Call un.FindIt 234 | Pop $R1 235 | Push "$R1" 236 | Call un.GetParent 237 | Pop $R0 238 | StrCpy $1 "$R0\" 239 | IfFileExists $1AccelrysDraw.exe fin 240 | Push "$PROGRAMFILES\Symyx" 241 | Push "SymyxDraw.exe" 242 | Call un.FindIt 243 | Pop $R1 244 | Push "$R1" 245 | Call un.GetParent 246 | Pop $R0 247 | StrCpy $1 "$R0\" 248 | IfFileExists $1SymyxDraw.exe fin 249 | Push "$PROGRAMFILES64\Symyx" 250 | Push "SymyxDraw.exe" 251 | Call un.FindIt 252 | Pop $R1 253 | Push "$R1" 254 | Call un.GetParent 255 | Pop $R0 256 | StrCpy $1 "$R0\" 257 | IfFileExists $1SymyxDraw.exe fin 258 | StrCpy $1 "" 259 | fin: 260 | ;$1 contains the folder of Symyx Draw or empty 261 | FunctionEnd 262 | 263 | 264 | Function createOSRAbat 265 | fileOpen $0 "$INSTDIR\osra.bat" w 266 | fileWrite $0 '\ 267 | @echo off$\r$\n\ 268 | setlocal$\r$\n\ 269 | set exec_dir=%~dp0%$\r$\n\ 270 | set PATH=%exec_dir%;$1\bin;$1\lib;%PATH%$\r$\n\ 271 | "%exec_dir%osra-bin.exe" %*$\r$\n\ 272 | endlocal$\r$\n\ 273 | ' 274 | fileClose $0 275 | FunctionEnd 276 | 277 | 278 | Function .onInit 279 | # call userInfo plugin to get user info. The plugin puts the result in the stack 280 | userInfo::getAccountType 281 | 282 | # pop the result from the stack into $0 283 | pop $0 284 | 285 | # compare the result with the string "Admin" to see if the user is admin. 286 | # If match, jump 3 lines down. 
287 | strCmp $0 "Admin" +3 288 | 289 | # if there is not a match, print message and return 290 | messageBox MB_OK "Please run this with Administrator privileges" 291 | Quit 292 | call getSymyxPath 293 | strcmp $1 "" no_symyx 294 | SectionGetFlags "${symyx_draw}" $0 295 | IntOp $0 $0 | ${SF_SELECTED} 296 | SectionSetFlags "${symyx_draw}" $0 297 | no_symyx: 298 | FunctionEnd 299 | 300 | Function FindIt 301 | Exch $R0 302 | Exch 303 | Exch $R1 304 | Push $R2 305 | Push $R3 306 | Push $R4 307 | Push $R5 308 | Push $R6 309 | 310 | StrCpy $R6 -1 311 | StrCpy $R3 1 312 | 313 | Push $R1 314 | 315 | nextDir: 316 | Pop $R1 317 | IntOp $R3 $R3 - 1 318 | ClearErrors 319 | FindFirst $R5 $R2 "$R1\*.*" 320 | 321 | nextFile: 322 | StrCmp $R2 "." gotoNextFile 323 | StrCmp $R2 ".." gotoNextFile 324 | 325 | StrCmp $R2 $R0 0 isDir 326 | StrCpy $R6 "$R1\$R2" 327 | loop: 328 | StrCmp $R3 0 done 329 | Pop $R1 330 | IntOp $R3 $R3 - 1 331 | Goto loop 332 | 333 | isDir: 334 | 335 | IfFileExists "$R1\$R2\*.*" 0 gotoNextFile 336 | IntOp $R3 $R3 + 1 337 | Push "$R1\$R2" 338 | 339 | gotoNextFile: 340 | FindNext $R5 $R2 341 | IfErrors 0 nextFile 342 | 343 | done: 344 | FindClose $R5 345 | StrCmp $R3 0 0 nextDir 346 | StrCpy $R0 $R6 347 | 348 | Pop $R6 349 | Pop $R5 350 | Pop $R4 351 | Pop $R3 352 | Pop $R2 353 | Pop $R1 354 | Exch $R0 355 | FunctionEnd 356 | 357 | Function GetParent 358 | 359 | Exch $R0 360 | Push $R1 361 | Push $R2 362 | Push $R3 363 | 364 | StrCpy $R1 0 365 | StrLen $R2 $R0 366 | 367 | loop: 368 | IntOp $R1 $R1 + 1 369 | IntCmp $R1 $R2 get 0 get 370 | StrCpy $R3 $R0 1 -$R1 371 | StrCmp $R3 "\" get 372 | Goto loop 373 | 374 | get: 375 | StrCpy $R0 $R0 -$R1 376 | 377 | Pop $R3 378 | Pop $R2 379 | Pop $R1 380 | Exch $R0 381 | 382 | FunctionEnd 383 | 384 | Function un.FindIt 385 | Exch $R0 386 | Exch 387 | Exch $R1 388 | Push $R2 389 | Push $R3 390 | Push $R4 391 | Push $R5 392 | Push $R6 393 | 394 | StrCpy $R6 -1 395 | StrCpy $R3 1 396 | 397 | Push $R1 398 | 399 | nextDir: 400 | 
Pop $R1 401 | IntOp $R3 $R3 - 1 402 | ClearErrors 403 | FindFirst $R5 $R2 "$R1\*.*" 404 | 405 | nextFile: 406 | StrCmp $R2 "." gotoNextFile 407 | StrCmp $R2 ".." gotoNextFile 408 | 409 | StrCmp $R2 $R0 0 isDir 410 | StrCpy $R6 "$R1\$R2" 411 | loop: 412 | StrCmp $R3 0 done 413 | Pop $R1 414 | IntOp $R3 $R3 - 1 415 | Goto loop 416 | 417 | isDir: 418 | 419 | IfFileExists "$R1\$R2\*.*" 0 gotoNextFile 420 | IntOp $R3 $R3 + 1 421 | Push "$R1\$R2" 422 | 423 | gotoNextFile: 424 | FindNext $R5 $R2 425 | IfErrors 0 nextFile 426 | 427 | done: 428 | FindClose $R5 429 | StrCmp $R3 0 0 nextDir 430 | StrCpy $R0 $R6 431 | 432 | Pop $R6 433 | Pop $R5 434 | Pop $R4 435 | Pop $R3 436 | Pop $R2 437 | Pop $R1 438 | Exch $R0 439 | FunctionEnd 440 | 441 | Function un.GetParent 442 | 443 | Exch $R0 444 | Push $R1 445 | Push $R2 446 | Push $R3 447 | 448 | StrCpy $R1 0 449 | StrLen $R2 $R0 450 | 451 | loop: 452 | IntOp $R1 $R1 + 1 453 | IntCmp $R1 $R2 get 0 get 454 | StrCpy $R3 $R0 1 -$R1 455 | StrCmp $R3 "\" get 456 | Goto loop 457 | 458 | get: 459 | StrCpy $R0 $R0 -$R1 460 | 461 | Pop $R3 462 | Pop $R2 463 | Pop $R1 464 | Exch $R0 465 | 466 | FunctionEnd --------------------------------------------------------------------------------