├── .gitignore
├── .gitmodules
├── LICENSE
├── README.md
├── TODO.md
├── build
│   ├── dldt
│   │   └── dldt_setup.sh
│   ├── ffmpeg
│   │   ├── ffmpeg_premake.sh
│   │   └── ffmpeg_setup.sh
│   ├── openblas
│   │   └── openblas_setup.sh
│   └── opencv
│       └── opencv_setup.sh
├── create_wheel
│   ├── LICENSE
│   ├── LICENSE_MIT
│   ├── README.md
│   ├── cv2
│   │   └── __init__.py
│   └── setup.py
├── download_all_stuff.sh
├── requirements.txt
├── tests
│   ├── README.md
│   ├── dislike.jpg
│   ├── examples.ipynb
│   ├── helloworld.png
│   ├── pixellink.py
│   ├── prepare_and_run_tests.sh
│   ├── requirements.txt
│   ├── short_video.mp4
│   ├── tests.py
│   ├── text-detection-0004.bin.sha256sum
│   ├── text-detection-0004.xml.sha256sum
│   ├── text-recognition-0012.bin.sha256sum
│   ├── text-recognition-0012.xml.sha256sum
│   └── text_recognition.py
└── tests_openvino
    ├── README.md
    ├── pixellink.py
    ├── prepare_and_run_tests.sh
    ├── requirements.txt
    ├── tests.py
    ├── text-detection-0004.bin.sha256sum
    ├── text-detection-0004.xml.sha256sum
    └── text_recognition.py

/.gitignore:
--------------------------------------------------------------------------------
1 | openblas/*
2 | !openblas/.gitkeep
3 | 
4 | dldt/*
5 | !dldt/.gitkeep
6 | 
7 | opencv/*
8 | !opencv/.gitkeep
9 | 
10 | ffmpeg/*
11 | !ffmpeg/.gitkeep
12 | 
13 | build/*
14 | !build/opencv
15 | build/opencv/*
16 | !build/opencv/opencv_setup.sh
17 | 
18 | !build/dldt
19 | build/dldt/*
20 | !build/dldt/dldt_setup.sh
21 | !build/dldt/*.patch
22 | 
23 | !build/ffmpeg
24 | build/ffmpeg/*
25 | !build/ffmpeg/ffmpeg_setup.sh
26 | !build/ffmpeg/ffmpeg_premake.sh
27 | 
28 | !build/openblas
29 | build/openblas/*
30 | !build/openblas/openblas_setup.sh
31 | 
32 | create_wheel/*
33 | !create_wheel/LICENSE*
34 | !create_wheel/README.md
35 | !create_wheel/setup.py
36 | !create_wheel/cv2
37 | create_wheel/cv2/*
38 | !create_wheel/cv2/__init__.py
39 | 
40 | tests/venv_t
41 | tests/rateme*
42 | venv
43 | 
44 | tests_openvino/venv_t
45 | tests_openvino/venv_d
46 | tests_openvino/rateme*
47 | tests_openvino/*.bin
48 | tests_openvino/*.xml
49 | 
50 | *cache*
51 | .ipynb_checkpoints
52 | *tar.gz
53 | *tar.bz2
54 | *.zip
55 | *.swp
56 | *.whl
57 | *.xml
58 | TODO.txt
59 | *.bin
60 | *.weights
61 | 
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "dldt"]
2 |     path = dldt
3 |     url = https://github.com/openvinotoolkit/openvino
4 | [submodule "opencv"]
5 |     path = opencv
6 |     url = https://github.com/opencv/opencv
7 | [submodule "ffmpeg"]
8 |     path = ffmpeg
9 |     url = https://github.com/FFmpeg/FFmpeg
10 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 Kabakov Borys
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [![Downloads](https://pepy.tech/badge/opencv-python-inference-engine)](https://pepy.tech/project/opencv-python-inference-engine) [![Downloads](https://pepy.tech/badge/opencv-python-inference-engine/month)](https://pepy.tech/project/opencv-python-inference-engine/month) [![Downloads](https://pepy.tech/badge/opencv-python-inference-engine/week)](https://pepy.tech/project/opencv-python-inference-engine/week)
2 | 
3 | # opencv-python-inference-engine
4 | 
5 | ---
6 | 
7 | $${\color{red}It \space is \space deprecated \space now, \space all \space future \space updates \space are \space improbable}$$
8 | 
9 | A lot has changed during my military leave:
10 | 
11 | 1. Everything [changed since OpenVINO 2021.1](https://github.com/openvinotoolkit/openvino/releases/tag/2022.1.0): now there should be [two separate libs](https://opencv.org/how-to-use-opencv-with-openvino/), and the building process and the inference engine API have changed dramatically, *without backwards compatibility* (btw, opencv-python now has *official* python builds).
12 | 2. OpenVINO now has [small package installations via pip](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html), so the main reason for creating this package is gone.
13 | 3. Just [look at the current official way](https://github.com/banderlog/opencv-python-inference-engine/tree/dev/create_wheel/cv2) of importing the cv2 cxx lib into the python module via 5 python scripts -- I do not want to mess with this growing lump of crutches.
14 | 
15 | My advice is to use the openvino and opencv-python packages; see the rewritten examples in [`tests_openvino`](https://github.com/banderlog/opencv-python-inference-engine/tree/master/tests_openvino).
16 | 
17 | ---
18 | 
19 | This is an *unofficial* pre-built OpenCV package for Python with the inference engine part of [OpenVINO](https://github.com/openvinotoolkit/openvino).
20 | 
21 | ## Installing from `pip3`
22 | 
23 | Remove any previously installed versions of `cv2`, then:
24 | 
25 | ```bash
26 | pip3 install opencv-python-inference-engine
27 | ```
28 | 
29 | ## Examples of usage
30 | 
31 | Please see the `examples.ipynb` in the `tests` folder.
32 | 
33 | You will need to preprocess data as the model requires and decode the output. A description of the decoding *should* be in the model documentation, with examples in the OpenVINO documentation; however, in some cases the original article may be the only source of information. Some models are very simple to encode/decode, others are tough (e.g., PixelLink in tests).
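
For illustration, a minimal sketch of the typical flow (the file names below are placeholders, and the preprocessing parameters must come from the concrete model's documentation):

```python
import cv2

# load an OpenVINO IR model pair (placeholder file names)
net = cv2.dnn.readNet('model.xml', 'model.bin')
# be explicit about backend and target
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

img = cv2.imread('image.jpg')
# preprocess exactly as the model requires (input size, scale, mean, channel order)
blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(1280, 768))
net.setInput(blob)
outs = net.forward(net.getUnconnectedOutLayersNames())
# decoding `outs` is model-specific -- see the model's documentation
```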
34 | 
35 | 
36 | ## Downloading intel models
37 | 
38 | The official way is awkward, because you need to git clone the whole [model_zoo](https://github.com/opencv/open_model_zoo) ([details](https://github.com/opencv/open_model_zoo/issues/522)).
39 | 
40 | It is better to find a model description [here](https://github.com/opencv/open_model_zoo/blob/master/models/intel/index.md) and download it manually from [here](https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/).
41 | 
42 | 
43 | ## Description
44 | 
45 | 
46 | ### Why
47 | 
48 | I needed the ability to quickly deploy a small package that is able to run models from [Intel's model zoo](https://github.com/openvinotoolkit/open_model_zoo) and use [Movidius NCS](https://software.intel.com/en-us/neural-compute-stick).
49 | The well-known [opencv-python](https://github.com/skvark/opencv-python) can't do this.
50 | The official way is to use OpenVINO, but it is big and clumsy (just try to use it with a python venv, or to download it quickly on a cloud instance).
51 | 
52 | 
53 | ### Limitations
54 | 
55 | + The package comes without contrib modules.
56 | + You need to [add udev rules](https://www.intel.com/content/www/us/en/support/articles/000057005/boards-and-kits.html) if you want a working MYRIAD plugin.
57 | + It was tested on Ubuntu 18.04, Ubuntu 18.10 as a Windows 10 Subsystem, and Gentoo.
58 | + It will not work for Ubuntu 16.04 and below (except v4.1.0.4).
59 | + I have not made builds for Windows or macOS.
60 | + It is built with `ffmpeg` and `v4l` support (`ffmpeg` libs included).
61 | + No GTK/QT support -- use `matplotlib` for plotting your results.
62 | + It is 64-bit only.
63 | 
64 | ### Main differences from `opencv-python-headless`
65 | 
66 | + Usage of `AVX2` instructions
67 | + No `JPEG 2000`, `WEBP`, `OpenEXR` support
68 | + `TBB` used as a parallel framework
69 | + Inference Engine with `MYRIAD` plugin
70 | 
71 | ### Main differences from OpenVINO
72 | 
73 | + No model-optimizer
74 | + No [ITT](https://software.intel.com/en-us/articles/intel-itt-api-open-source)
75 | + No [IPP](https://software.intel.com/en-us/ipp)
76 | + No [Intel Media SDK](https://software.intel.com/en-us/media-sdk)
77 | + No [OpenVINO IE API](https://github.com/opencv/dldt/tree/2020/inference-engine/ie_bridges/python/src/openvino/inference_engine)
78 | + No python2 support (it is dead)
79 | + No Gstreamer (use ffmpeg)
80 | + No GTK (+16 MB, plus a lot of problems and extra work to compile the Qt/GTK libs from sources)
81 | 
82 | For additional info, read the `cv2.getBuildInformation()` output.
83 | 
84 | ### Versioning
85 | 
86 | `YYYY.MM.DD`, because it is the simplest way to track opencv/openvino versions.
87 | 
88 | ## Compiling from source
89 | 
90 | You will need ~7GB RAM and ~10GB of disk space.
91 | 
92 | I am using an Ubuntu 18.04 (python 3.6) [multipass](https://multipass.run/) instance: `multipass launch -c 6 -d 10G -m 7G 18.04`.
93 | 
94 | ### Requirements
95 | 
96 | Collected from the [opencv](https://docs.opencv.org/master/d7/d9f/tutorial_linux_install.html), [dldt](https://docs.opencv.org/master/d7/d9f/tutorial_linux_install.html),
97 | [ffmpeg](https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu), and [ngraph](https://www.ngraph.ai/documentation/buildlb) build guides:
98 | 
99 | ```bash
100 | # We need a newer `cmake` for dldt (fastest way I know)
101 | # >=cmake-3.16
102 | sudo apt remove --purge cmake
103 | hash -r
104 | sudo snap install cmake --classic
105 | 
106 | # nasm for ffmpeg
107 | # libusb-1.0-0-dev for the MYRIAD plugin
108 | sudo apt update
109 | sudo apt install build-essential git pkg-config python3-dev nasm python3 virtualenv libusb-1.0-0-dev chrpath shellcheck
110 | 
111 | # for ngraph
112 | # `dldt/_deps/ext_onnx-src/onnx/gen_proto.py` has a `#!/usr/bin/env python` shebang and will throw an error otherwise
113 | sudo ln -s /usr/bin/python3 /usr/bin/python
114 | ```
115 | 
116 | ### Preparing
117 | 
118 | ```bash
119 | git clone https://github.com/banderlog/opencv-python-inference-engine
120 | cd opencv-python-inference-engine
121 | # git checkout dev
122 | ./download_all_stuff.sh
123 | ```
124 | 
125 | ### Compilation
126 | 
127 | ```bash
128 | cd build/ffmpeg
129 | ./ffmpeg_setup.sh &&
130 | ./ffmpeg_premake.sh &&
131 | make -j6 &&
132 | make install
133 | 
134 | cd ../dldt
135 | ./dldt_setup.sh &&
136 | make -j6
137 | 
138 | # NB: check the `-D INF_ENGINE_RELEASE` value,
139 | # it should be in the form YYYYAABBCC (e.g., 2020.1.0.2 -> 2020010002)
140 | cd ../opencv
141 | ./opencv_setup.sh &&
142 | make -j6
143 | ```
144 | 
145 | ### Wheel creation
146 | 
147 | ```bash
148 | # get all compiled libs together
149 | cd ../../
150 | cp build/opencv/lib/python3/cv2.cpython*.so create_wheel/cv2/cv2.so
151 | 
152 | cp dldt/bin/intel64/Release/lib/*.so create_wheel/cv2/
153 | cp dldt/bin/intel64/Release/lib/*.mvcmd create_wheel/cv2/
154 | cp dldt/bin/intel64/Release/lib/plugins.xml create_wheel/cv2/
155 | cp dldt/inference-engine/temp/tbb/lib/libtbb.so.2 create_wheel/cv2/
156 | 
157 | cp build/ffmpeg/binaries/lib/*.so create_wheel/cv2/
158 | 
159 | # change RPATH
160 | cd create_wheel
161 | for i in cv2/*.so; do chrpath -r '$ORIGIN' $i; done
162 | 
163 | # the final .whl will be in /create_wheel/dist/
164 | # NB: check the version in `setup.py`
165 | ../venv/bin/python3 setup.py bdist_wheel
166 | ```
167 | 
168 | ### Optional things to play with
169 | 
170 | + [dldt build instruction](https://github.com/openvinotoolkit/openvino/wiki/CMakeOptionsForCustomCompilation)
171 | + [dldt cmake flags](https://github.com/openvinotoolkit/openvino/blob/master/inference-engine/cmake/features.cmake)
172 | + [opencv cmake flags](https://github.com/opencv/opencv/blob/master/CMakeLists.txt)
173 | 
174 | **NB:** removing `QUIET` from `find_package()` in the project CMake files can help to solve some problems -- cmake will start to log them.
175 | 
176 | 
177 | #### GTK2
178 | 
179 | Make the following changes in `opencv-python-inference-engine/build/opencv/opencv_setup.sh` (a sketch of both edits follows below):
180 | 1. change the string `-D WITH_GTK=OFF \` to `-D WITH_GTK=ON \`
181 | 2. `export PKG_CONFIG_PATH=$ABS_PORTION/build/ffmpeg/binaries/lib/pkgconfig:$PKG_CONFIG_PATH` -- you will need to
182 | add the absolute paths to the system `.pc` files. On Ubuntu 18.04 they are here:
183 | `/usr/lib/x86_64-linux-gnu/pkgconfig/:/usr/share/pkgconfig/:/usr/local/lib/pkgconfig/:/usr/lib/pkgconfig/`
184 | 
185 | Exporting `PKG_CONFIG_PATH` for `ffmpeg` somehow messes with the default values.
186 | 
187 | It will add ~16MB to the package.
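
A minimal sketch of the two edits (the `sed` assumes the repo root as the working directory; the `export` shows how the modified line inside `opencv_setup.sh` could look with the Ubuntu 18.04 default `.pc` locations appended):

```bash
# 1: flip the GTK flag inside the cmake invocation
sed -i 's/WITH_GTK=OFF/WITH_GTK=ON/' build/opencv/opencv_setup.sh

# 2: keep the ffmpeg pkgconfig dir first, then the system .pc locations
export PKG_CONFIG_PATH=$ABS_PORTION/build/ffmpeg/binaries/lib/pkgconfig:/usr/lib/x86_64-linux-gnu/pkgconfig/:/usr/share/pkgconfig/:/usr/local/lib/pkgconfig/:/usr/lib/pkgconfig/
```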
188 | 
189 | #### Integrated Performance Primitives
190 | 
191 | Just set `-D WITH_IPP=ON` in `opencv_setup.sh`.
192 | 
193 | It will add +30MB to the final `cv2.so` size, and it will boost _some_ opencv functions.
194 | 
195 | [Intel's official IPP benchmarks](https://software.intel.com/en-us/ipp/benchmarks) (may ask for registration)
196 | 
197 | #### MKL
198 | 
199 | You need to download an MKL-DNN release and set two flags: `-D GEMM=MKL` and `-D MKLROOT` ([details](https://github.com/opencv/dldt/issues/327)).
200 | 
201 | OpenVINO comes with a 30MB `libmkl_tiny_tbb.so`, but [you will not be able to compile it](https://github.com/intel/mkl-dnn/issues/674), because it is made from the proprietary MKL.
202 | 
203 | Our opensource MKL-DNN experiment will end with a 125MB `libmklml_gnu.so` and inference speed comparable with the 5MB openblas ([details](https://github.com/banderlog/opencv-python-inference-engine/issues/5)).
204 | 
205 | 
206 | #### CUDA
207 | 
208 | I did not try it. It cannot be universal: it will only work with the certain combination of GPU+CUDA+cuDNN for which it was compiled.
209 | 
210 | + [Compile OpenCV's 'dnn' module with NVIDIA GPU support](https://www.pyimagesearch.com/2020/02/10/opencv-dnn-with-nvidia-gpus-1549-faster-yolo-ssd-and-mask-r-cnn/)
211 | + [Use OpenCV's 'dnn' module with NVIDIA GPUs, CUDA, and cuDNN](https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/)
212 | 
213 | 
214 | #### OpenMP
215 | 
216 | It is possible to compile OpenBLAS, dldt, and OpenCV with OpenMP. I am not sure that the result would be better than now, but who knows.
217 | 
--------------------------------------------------------------------------------
/TODO.md:
--------------------------------------------------------------------------------
1 | # TODO list
2 | 
3 | + Auto value for `-D INF_ENGINE_RELEASE`: https://github.com/openvinotoolkit/openvino/issues/1435
4 | + 
5 | + `ENABLE_AVX512F`: how often do you see such CPUs in clouds?
6 | + `avresample` from ffmpeg to opencv, do we need it?
7 | 
--------------------------------------------------------------------------------
/build/dldt/dldt_setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # https://github.com/openvinotoolkit/openvino/wiki/CMakeOptionsForCustomCompilation
4 | # https://github.com/openvinotoolkit/openvino/issues/4527
5 | # -D ENABLE_OPENCV=OFF \
6 | # https://github.com/openvinotoolkit/openvino/issues/5100
7 | # -D BUILD_SHARED_LIBS=OFF \
8 | # -D BUILD_SHARED_LIBS=ON \
9 | # https://github.com/openvinotoolkit/openvino/issues/5209
10 | # -D NGRAPH_TOOLS_ENABLE=OFF \
11 | cmake -D CMAKE_BUILD_TYPE=Release \
12 |       -D THREADING=TBB \
13 |       -D ENABLE_MKL_DNN=ON \
14 |       -D GEMM=JIT \
15 |       -D ENABLE_FASTER_BUILD=ON \
16 |       -D ENABLE_LTO=ON \
17 |       -D ENABLE_VPU=ON \
18 |       -D ENABLE_MYRIAD=ON \
19 |       -D ENABLE_SSE42=ON \
20 |       -D ENABLE_AVX2=ON \
21 |       -D ENABLE_AVX512F=OFF \
22 |       -D BUILD_TESTS=OFF \
23 |       -D ENABLE_ALTERNATIVE_TEMP=OFF \
24 |       -D ENABLE_CLDNN=OFF \
25 |       -D ENABLE_CLDNN_TESTS=OFF \
26 |       -D ENABLE_DOCS=OFF \
27 |       -D ENABLE_GAPI_TESTS=OFF \
28 |       -D ENABLE_GNA=OFF \
29 |       -D ENABLE_OPENCV=OFF \
30 |       -D ENABLE_PROFILING_ITT=OFF \
31 |       -D ENABLE_PYTHON=OFF \
32 |       -D ENABLE_SAMPLES=OFF \
33 |       -D ENABLE_SPEECH_DEMO=OFF \
34 |       -D ENABLE_TESTS=OFF \
35 |       -D GAPI_TEST_PERF=OFF \
36 |       -D NGRAPH_ONNX_IMPORT_ENABLE=ON \
37 |       -D NGRAPH_TEST_UTIL_ENABLE=OFF \
38 |       -D NGRAPH_TOOLS_ENABLE=OFF \
39 |       -D NGRAPH_UNIT_TEST_ENABLE=OFF \
40 |       -D SELECTIVE_BUILD=OFF ../../dldt/
41 | 
--------------------------------------------------------------------------------
/build/ffmpeg/ffmpeg_premake.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # Build ffmpeg shared libraries without version suffix
4 | # from
5 | 
6 | OLD1='SLIBNAME_WITH_VERSION=$(SLIBNAME).$(LIBVERSION)'
7 | OLD2='SLIBNAME_WITH_MAJOR=$(SLIBNAME).$(LIBMAJOR)'
8 | OLD3='SLIB_INSTALL_NAME=$(SLIBNAME_WITH_VERSION)'
9 | OLD4='SLIB_INSTALL_LINKS=$(SLIBNAME_WITH_MAJOR) $(SLIBNAME)'
10 | 
11 | NEW1='SLIBNAME_WITH_VERSION=$(SLIBNAME)'
12 | NEW2='SLIBNAME_WITH_MAJOR=$(SLIBNAME)'
13 | NEW3='SLIB_INSTALL_NAME=$(SLIBNAME)'
14 | NEW4='SLIB_INSTALL_LINKS='
15 | 
16 | 
17 | sed -i -e "s/${OLD1}/${NEW1}/" -e "s/${OLD2}/${NEW2}/" -e "s/${OLD3}/${NEW3}/" -e "s/${OLD4}/${NEW4}/" ./ffbuild/config.mak
18 | 
--------------------------------------------------------------------------------
/build/ffmpeg/ffmpeg_setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # deprecated: --enable-avresample, switch to libswresample
3 | # The libswresample library performs highly optimized audio resampling,
4 | # rematrixing and sample format conversion operations.
5 | 
6 | PATH_TO_SCRIPT=`dirname $(realpath $0)`
7 | 
8 | ../../ffmpeg/configure \
9 |     --prefix=$PATH_TO_SCRIPT/binaries \
10 |     --disable-programs \
11 |     --disable-avdevice \
12 |     --disable-postproc \
13 |     --disable-static \
14 |     --disable-avdevice \
15 |     --disable-swresample \
16 |     --disable-postproc \
17 |     --disable-avfilter \
18 |     --disable-alsa \
19 |     --disable-appkit \
20 |     --disable-avfoundation \
21 |     --disable-bzlib \
22 |     --disable-coreimage \
23 |     --disable-iconv \
24 |     --disable-lzma \
25 |     --disable-sndio \
26 |     --disable-schannel \
27 |     --disable-sdl2 \
28 |     --disable-securetransport \
29 |     --disable-xlib \
30 |     --disable-zlib \
31 |     --disable-audiotoolbox \
32 |     --disable-amf \
33 |     --disable-cuvid \
34 |     --disable-d3d11va \
35 |     --disable-dxva2 \
36 |     --disable-ffnvcodec \
37 |     --disable-nvdec \
38 |     --disable-nvenc \
39 |     --disable-v4l2-m2m \
40 |     --disable-vaapi \
41 |     --disable-vdpau \
42 |     --disable-videotoolbox \
43 |     --disable-doc \
44 |     --disable-static \
45 |     --enable-pic \
46 |     --enable-shared \
47 | 
--------------------------------------------------------------------------------
/build/openblas/openblas_setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # Please refer here for details:
4 | # 
5 | # If you compile it with `make FC=gfortran`,
6 | # you'll need `libgfortran.so.4` and `libquadmath.so.0`
7 | 
8 | cmake -D NO_LAPACKE=1 \
9 |       -D CMAKE_BUILD_TYPE=Release \
10 |       -D NOFORTRAN=ON \
11 |       -D BUILD_RELAPACK=OFF \
12 |       -D NO_AFFINITY=1 \
13 |       -D USE_OPENMP=0 \
14 |       -D NO_WARMUP=1 \
15 |       -D NUM_THREADS=64 \
16 |       -D GEMM_MULTITHREAD_THRESHOLD=64 \
17 |       -D BUILD_SHARED_LIBS=ON \
18 |       -D CMAKE_INSTALL_PREFIX=./ ../../openblas/
19 | 
--------------------------------------------------------------------------------
/build/opencv/opencv_setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # for CPU_BASELINE and CPU_DISPATCH see https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options
4 | # they should match the ones from dldt/inference-engine/src/extension/cmake/OptimizationFlags.cmake
5 | #
6 | # -DINF_ENGINE_RELEASE= should match the dldt version
7 | # See
8 | # From
9 | # "Force IE version, should be in form YYYYAABBCC (e.g. 2020.1.0.2 -> 2020010002)")
10 | 
11 | tmp=$(pwd)
12 | ABS_PORTION=${tmp%%"/build/opencv"}
13 | 
14 | FFMPEG_PATH=$ABS_PORTION/build/ffmpeg/binaries
15 | export LD_LIBRARY_PATH=$FFMPEG_PATH/lib/:$LD_LIBRARY_PATH
16 | export PKG_CONFIG_PATH=$FFMPEG_PATH/lib/pkgconfig:$PKG_CONFIG_PATH
17 | export PKG_CONFIG_LIBDIR=$FFMPEG_PATH/lib/:$PKG_CONFIG_LIBDIR
18 | 
19 | # grep "5" from "Python 3.5.2"
20 | PY_VER=`$ABS_PORTION/venv/bin/python3 --version | sed -rn "s/Python .\.(.)\..$/\1/p"`
21 | PY_LIB_PATH=`find $ABS_PORTION/venv/lib/ -iname libpython3.${PY_VER}m.so`
22 | 
23 | 
24 | cmake -D CMAKE_BUILD_TYPE=RELEASE \
25 |       -D BUILD_DOCS=OFF \
26 |       -D BUILD_EXAMPLES=OFF \
27 |       -D BUILD_JPEG=OFF \
28 |       -D BUILD_JPEG=OFF \
29 |       -D BUILD_PERF_TESTS=OFF \
30 |       -D BUILD_SHARED_LIBS=OFF \
31 |       -D BUILD_TESTS=OFF \
32 |       -D BUILD_opencv_apps=OFF \
33 |       -D BUILD_opencv_java=OFF \
34 |       -D BUILD_opencv_python2.7=OFF \
35 |       -D BUILD_opencv_python2=OFF \
36 |       -D BUILD_opencv_python3=ON \
37 |       -D BUILD_opencv_world=OFF \
38 |       -D CMAKE_INSTALL_PREFIX=./binaries/ \
39 |       -D CPU_BASELINE=SSE4_2 \
40 |       -D CPU_DISPATCH=AVX,AVX2,FP16,AVX512 \
41 |       -D CV_TRACE=OFF \
42 |       -D ENABLE_CXX11=ON \
43 |       -D ENABLE_PRECOMPILED_HEADERS=OFF \
44 |       -D FFMPEG_INCLUDE_DIRS=$FFMPEG_PATH/include \
45 |       -D INF_ENGINE_INCLUDE_DIRS=$ABS_PORTION/dldt/inference-engine/include \
46 |       -D INF_ENGINE_LIB_DIRS=$ABS_PORTION/dldt/bin/intel64/Release/lib \
47 |       -D INF_ENGINE_RELEASE=2021040200 \
48 |       -D INSTALL_CREATE_DISTRIB=ON \
49 |       -D INSTALL_C_EXAMPLES=OFF \
50 |       -D INSTALL_PYTHON_EXAMPLES=OFF \
51 |       -D JPEG_INCLUDE_DIR=$JPEG_INCLUDE_DIR \
52 |       -D JPEG_LIBRARY=$JPEG_LIBRARY \
53 |       -D OPENCV_ENABLE_NONFREE=OFF \
54 |       -D OPENCV_FORCE_3RDPARTY_BUILD=ON \
55 |       -D OPENCV_SKIP_PYTHON_LOADER=ON \
56 |       -D PYTHON3_EXECUTABLE=$ABS_PORTION/venv/bin/python3 \
57 |       -D PYTHON3_LIBRARY:PATH=$PY_LIB_PATH \
58 |       -D PYTHON3_NUMPY_INCLUDE_DIRS:PATH=$ABS_PORTION/venv/lib/python3.${PY_VER}/site-packages/numpy/core/include \
59 |       -D PYTHON3_PACKAGES_PATH=$ABS_PORTION/venv/lib/python3.${PY_VER}/site-packages \
60 |       -D PYTHON_DEFAULT_EXECUTABLE=$ABS_PORTION/venv/bin/python3 \
61 |       -D PYTHON_INCLUDE_DIR=/usr/include/python3.${PY_VER} \
62 |       -D WITH_1394=OFF \
63 |       -D WITH_CUDA=OFF \
64 |       -D WITH_EIGEN=OFF \
65 |       -D WITH_FFMPEG=ON \
66 |       -D WITH_GSTREAMER=OFF \
67 |       -D WITH_GTK=OFF \
68 |       -D WITH_INF_ENGINE=ON \
69 |       -D WITH_IPP=OFF \
70 |       -D WITH_ITT=OFF \
71 |       -D WITH_JASPER=OFF \
72 |       -D WITH_NGRAPH=ON \
73 |       -D WITH_OPENEXR=OFF \
74 |       -D WITH_OPENMP=OFF \
75 |       -D WITH_PNG=ON \
76 |       -D WITH_PROTOBUF=ON \
77 |       -D WITH_QT=OFF \
78 |       -D WITH_TBB=ON \
79 |       -D WITH_V4L=ON \
80 |       -D WITH_VTK=OFF \
81 |       -D WITH_WEBP=OFF \
82 |       -D ngraph_DIR=$ABS_PORTION/build/dldt/ngraph ../../opencv
--------------------------------------------------------------------------------
/create_wheel/LICENSE:
--------------------------------------------------------------------------------
1 | 
2 | Apache License
3 | Version 2.0, January 2004
4 | http://www.apache.org/licenses/
5 | 
6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 | 
8 | 1. Definitions.
9 | 
10 | "License" shall mean the terms and conditions for use, reproduction,
11 | and distribution as defined by Sections 1 through 9 of this document.
12 | 
13 | "Licensor" shall mean the copyright owner or entity authorized by
14 | the copyright owner that is granting the License.
15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 
135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 
194 | You may obtain a copy of the License at
195 | 
196 | http://www.apache.org/licenses/LICENSE-2.0
197 | 
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
203 | 
--------------------------------------------------------------------------------
/create_wheel/LICENSE_MIT:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 Kabakov Borys
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/create_wheel/README.md:
--------------------------------------------------------------------------------
1 | # README
2 | 
3 | This is a pre-built [OpenCV](https://github.com/opencv/opencv) package for Python3 with the [Inference Engine](https://github.com/openvinotoolkit/openvino) module.
4 | You need that module if you want to run models from [Intel's model zoo](https://github.com/openvinotoolkit/open_model_zoo).
5 | 
6 | It is built with `ffmpeg` and `v4l` but without GTK/QT (use matplotlib for plotting your results).
7 | Contrib modules and haarcascades are not included.
8 | 
9 | For additional info visit the [project homepage](https://github.com/banderlog/opencv-python-inference-engine)
--------------------------------------------------------------------------------
/create_wheel/cv2/__init__.py:
--------------------------------------------------------------------------------
1 | import importlib
2 | from .cv2 import *
3 | 
4 | # wildcard import above does not import "private" variables like __version__
5 | # this makes them available
6 | globals().update(importlib.import_module('cv2.cv2').__dict__)
--------------------------------------------------------------------------------
/create_wheel/setup.py:
--------------------------------------------------------------------------------
1 | import setuptools
2 | 
3 | 
4 | with open("README.md", "r") as fh:
5 |     long_description = fh.read()
6 | 
7 | 
8 | # This creates a list which is empty but returns a length of 1.
9 | # Should make the wheel a binary distribution and platlib compliant.
10 | # from
11 | class EmptyListWithLength(list):
12 |     def __len__(self):
13 |         return 1
14 | 
15 | 
16 | setuptools.setup(
17 |     name='opencv-python-inference-engine',
18 |     version='2022.01.05',
19 |     url="https://github.com/banderlog/opencv-python-inference-engine",
20 |     maintainer="Kabakov Borys",
21 |     license='MIT, Apache 2.0',
22 |     description="Wrapper package for OpenCV with Inference Engine python bindings",
23 |     long_description=long_description,
24 |     long_description_content_type="text/markdown",
25 |     ext_modules=EmptyListWithLength(),
26 |     packages=['cv2'],
27 |     package_data={'cv2': ['*.so*', '*.mvcmd', '*.xml']},
28 |     include_package_data=True,
29 |     install_requires=['numpy'],
30 |     classifiers=[
31 |         'Development Status :: 5 - Production/Stable',
32 |         'Environment :: Console',
33 |         'Intended Audience :: Developers',
34 |         'Intended Audience :: Education',
35 |         'Intended Audience :: Information Technology',
36 |         'Intended Audience :: Science/Research',
37 |         'Programming Language :: Python :: 3',
38 |         'Programming Language :: C++',
39 |         'Operating System :: POSIX :: Linux',
40 |         'Topic :: Scientific/Engineering',
41 |         'Topic :: Scientific/Engineering :: Image Recognition',
42 |         'Topic :: Software Development',
43 |     ],
44 | )
--------------------------------------------------------------------------------
/download_all_stuff.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # colors
4 | end="\033[0m"
5 | red="\033[0;31m"
6 | green="\033[0;32m"
7 | 
8 | green () {
9 |     echo -e "${green}${1}${end}"
10 | }
11 | 
12 | 
13 | red () {
14 |     echo -e "${red}${1}${end}"
15 | }
16 | 
17 | 
18 | ROOT_DIR=$(pwd)
19 | 
20 | # check Ubuntu version (a 20.04 build will not work on 18.04)
21 | if test $(lsb_release -rs) != 18.04; then
22 |     red "\n!!! You are NOT on Ubuntu 18.04 !!!\n"
23 | fi
24 | 
25 | green "RESET GIT SUBMODULES"
26 | # git checkout dev
27 | # for an update use `git submodule update --init --recursive --jobs=4`
28 | # cd into the submodule dir and `git fetch --tags && git checkout tags/`
29 | git submodule update --init --recursive --depth=1 --jobs=4
30 | # the command to restore changes differs between git versions (e.g., `restore`)
31 | git submodule foreach --recursive git checkout .
32 | # remove untracked
33 | git submodule foreach --recursive git clean -dxf
34 | 
35 | green "CLEAN BUILD DIRS"
36 | find build/dldt/ -mindepth 1 -not -name 'dldt_setup.sh' -not -name '*.patch' -delete
37 | find build/opencv/ -mindepth 1 -not -name 'opencv_setup.sh' -delete
38 | find build/ffmpeg/ -mindepth 1 -not -name 'ffmpeg_*.sh' -delete
39 | 
40 | green "CLEAN WHEEL DIR"
41 | find create_wheel/cv2/ -type f -not -name '__init__.py' -delete
42 | rm -drf create_wheel/build
43 | rm -drf create_wheel/dist
44 | rm -drf create_wheel/*egg-info
45 | 
46 | green "CREATE VENV"
47 | cd $ROOT_DIR
48 | 
49 | if [[ ! -d ./venv ]]; then
50 |     virtualenv --clear --always-copy -p /usr/bin/python3 ./venv
51 |     ./venv/bin/pip3 install -r requirements.txt
52 | fi
53 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | 
--------------------------------------------------------------------------------
/tests/README.md:
--------------------------------------------------------------------------------
1 | # Tests for opencv-python-inference-engine wheel
2 | 
3 | ## Requirements
4 | 
5 | `sudo apt install virtualenv`
6 | 
7 | ## Usage
8 | 
9 | ### Features
10 | 
11 | Just run the bash script and read its output.
12 | 
13 | ```bash
14 | cd tests
15 | ./prepare_and_run_tests.sh
16 | ```
17 | 
18 | ### Inference speed
19 | 
20 | Something like the snippet below. The general idea is to test only the inference speed, without preprocessing and decoding.
21 | Also, the 1st inference must not be counted, because it loads everything into memory.
22 | 
23 | **NB:** be strict about Backend and Target
24 | 
25 | ```python
26 | import cv2
27 | 
28 | class PixelLinkDetectorTest():
29 |     """ Cut version of PixelLinkDetector """
30 |     def __init__(self, xml_model_path: str):
31 |         self.net = cv2.dnn.readNet(xml_model_path, xml_model_path[:-3] + 'bin')
32 | 
33 |     def detect(self, img: 'np.ndarray'):
34 |         blob = cv2.dnn.blobFromImage(img, 1, (1280, 768))
35 |         self.net.setInput(blob)
36 |         out_layer_names = self.net.getUnconnectedOutLayersNames()
37 |         return self.net.forward(out_layer_names)
38 | 
39 | 
40 | # check opencv version
41 | cv2.__version__
42 | 
43 | # read img and network
44 | img = cv2.imread('helloworld.png')
45 | detector = PixelLinkDetectorTest('text-detection-0004.xml')
46 | 
47 | # select target & backend, please read the documentation for details:
48 | #
49 | detector.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
50 | detector.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
51 | 
52 | # 1st inference does not count
53 | links, pixels = detector.detect(img)
54 | 
55 | # use magic function
56 | %timeit links, pixels = detector.detect(img)
57 | ```
58 | 
59 | 
60 | ## Models
61 | 
62 | + [rateme](https://github.com/banderlog/rateme) (YOLO3)
63 | + [text-detection-0004](https://github.com/opencv/open_model_zoo/blob/master/models/intel/text-detection-0004/description/text-detection-0004.md)
64 | + [text-recognition-0012](https://github.com/opencv/open_model_zoo/blob/master/models/intel/text-recognition-0012/description/text-recognition-0012.md)
65 | 
66 | ## Files
67 | 
68 | + `short_video.mp4` from [here](https://www.pexels.com/video/a-cattails-fluff-floats-in-air-2156021/) (free)
69 | + `dislike.jpg` from the [rateme repository](https://github.com/banderlog/rateme/blob/master/test_imgs/dislike.jpg)
70 | + `helloworld.png` -- I either made it or forgot where I downloaded it from
--------------------------------------------------------------------------------
/tests/dislike.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/banderlog/opencv-python-inference-engine/0abe6990b938275c48ad990f2b484dd03ad0f39d/tests/dislike.jpg
--------------------------------------------------------------------------------
/tests/helloworld.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/banderlog/opencv-python-inference-engine/0abe6990b938275c48ad990f2b484dd03ad0f39d/tests/helloworld.png
--------------------------------------------------------------------------------
/tests/pixellink.py:
--------------------------------------------------------------------------------
1 | """ Wrapper class for Intel's PixelLink realisation (text segmentation NN)
2 | text-detection-00[34]
3 | 
4 | For text-detection-002 you'll need to uncomment a string in detect()
5 | """
6 | import cv2
7 | import numpy as np
8 | from skimage.morphology import label
9 | from skimage.measure import regionprops
10 | from typing import List, Tuple
11 | from skimage.measure._regionprops import RegionProperties
12 | 
13 | 
14 | class PixelLinkDetector():
15 |     """ Wrapper class for Intel's version of the PixelLink text detector
16 | 
17 |     See https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/ \
18 |         text-detection-0004/description/text-detection-0004.md
19 | 
20 |     :param xml_model_path: path to XML file
21 | 
22 |     **Example:**
23 | 
24 |     .. code-block:: python
25 |         detector = PixelLinkDetector('text-detection-0004.xml')
26 |         img = cv2.imread('tmp.jpg')
27 |         # ~250ms on i7-6700K
28 |         detector.detect(img)
29 |         # ~2ms
30 |         bboxes = detector.decode()
31 |     """
32 |     def __init__(self, xml_model_path: str, txt_threshold=0.5):
33 |         """
34 |         :param xml_model_path: path to model's XML file
35 |         :param txt_threshold: confidence, defaults to ``0.5``
36 |         """
37 |         self._net = cv2.dnn.readNet(xml_model_path, xml_model_path[:-3] + 'bin')
38 |         self._txt_threshold = txt_threshold
39 | 
40 |     def detect(self, img: np.ndarray) -> None:
41 |         """ Get PixelLink's outputs (BxCxHxW):
42 |         + [1x16x192x320] - logits related to linkage between pixels and their neighbors
43 |         + [1x2x192x320] - logits related to text/no-text classification for each pixel
44 | 
45 |         B - batch size
46 |         C - number of channels
47 |         H - image height
48 |         W - image width
49 | 
50 |         :param img: image as ``numpy.ndarray``
51 |         """
52 |         self._img_shape = img.shape
53 |         blob = cv2.dnn.blobFromImage(img, 1, (1280, 768))
54 |         self._net.setInput(blob)
55 |         out_layer_names = self._net.getUnconnectedOutLayersNames()
56 |         # for text-detection-002
57 |         # self.pixels, self.links = self._net.forward(out_layer_names)
58 |         # for text-detection-00[34]
59 |         self.links, self.pixels = self._net.forward(out_layer_names)
60 | 
61 |     def get_mask(self) -> np.ndarray:
62 |         """ Get binary mask of detected text pixels
63 |         """
64 |         pixel_mask = self._get_pixel_scores() >= self._txt_threshold
65 |         return pixel_mask.astype(np.uint8)
66 | 
67 |     def _logsumexp(self, a: np.ndarray, axis=-1) -> np.ndarray:
68 |         """ Castrated function from scipy
69 |         https://github.com/scipy/scipy/blob/v1.6.2/scipy/special/_logsumexp.py
70 | 
71 |         Compute the log of the sum of exponentials of input elements.
72 |         """
73 |         a_max = np.amax(a, axis=axis, keepdims=True)
74 |         tmp = np.exp(a - a_max)
75 |         s = np.sum(tmp, axis=axis, keepdims=True)
76 |         out = np.log(s)
77 |         out += a_max
78 |         return out
79 | 
80 |     def _get_pixel_scores(self) -> np.ndarray:
81 |         """ get softmaxed properly shaped pixel scores """
82 |         # move channels to the end
83 |         tmp = np.transpose(self.pixels, (0, 2, 3, 1))
84 |         # softmax from scipy
85 |         tmp = np.exp(tmp - self._logsumexp(tmp, axis=-1))
86 |         # select single batch, single channel values
87 |         return tmp[0, :, :, 1]
88 | 
89 |     def _get_txt_regions(self, pixel_mask: np.ndarray) -> List[RegionProperties]:
90 |         """ kernels are class dependent """
91 |         img_h, img_w = self._img_shape[:2]
92 |         _, mask = cv2.threshold(pixel_mask, 0, 1, cv2.THRESH_BINARY)
93 |         # transmutations
94 |         # kernel size should be image size dependent (default (21,21))
95 |         # on a small image it will connect separate words
96 |         txt_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
97 |         mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, txt_kernel)
98 |         # connect regions on mask of original img size
99 |         mask = cv2.resize(mask, (img_w, img_h), interpolation=cv2.INTER_NEAREST)
100 |         # Label connected regions of an integer array
101 |         mask = label(mask, background=0, connectivity=2)
102 |         # Measure properties of labeled image regions.
103 |         txt_regions = regionprops(mask)
104 |         return txt_regions
105 | 
106 |     def _get_txt_bboxes(self, txt_regions: List[RegionProperties]) -> List[Tuple[int, int, int, int]]:
107 |         """ Filter text areas by area and height
108 | 
109 |         :return: ``[(ymin, xmin, ymax, xmax)]``
110 |         """
111 |         min_area = 0
112 |         min_height = 4
113 |         boxes = []
114 |         for p in txt_regions:
115 |             if p.area > min_area:
116 |                 bbox = p.bbox
117 |                 if (bbox[2] - bbox[0]) > min_height:
118 |                     boxes.append(bbox)
119 |         return boxes
120 | 
121 |     def decode(self) -> List[Tuple[int, int, int, int]]:
122 |         """ Decode PixelLink's output
123 | 
124 |         :return: bounding_boxes
125 | 
126 |         .. note::
127 |             bounding_boxes format: [ymin, xmin, ymax, xmax]
128 | 
129 |         """
130 |         mask = self.get_mask()
131 |         bboxes = self._get_txt_bboxes(self._get_txt_regions(mask))
132 |         # sort by xmin, ymin
133 |         bboxes = sorted(bboxes, key=lambda x: (x[1], x[0]))
134 |         return bboxes
135 | 
--------------------------------------------------------------------------------
/tests/prepare_and_run_tests.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | green="\033[0;32m"
4 | red="\033[0;31m"
5 | end="\033[0m"
6 | 
7 | green () {
8 |     echo -e "${green}${1}${end}"
9 | }
10 | 
11 | red () {
12 |     echo -e "${red}${1}${end}"
13 | }
14 | 
15 | 
16 | # check if (no ARG given and no appropriate compiled files exist) or
17 | # (some args provided but arg1 is not an existing file)
18 | # of course, you could shoot yourself in the foot here in different ways
19 | if ([ ! $# -ge 1 ] && ! $(ls ../create_wheel/dist/opencv_python_inference_engine*.whl &> /dev/null)) ||
20 |    ([ $# -ge 1 ] && [ ! -f $1 ]); then
21 |     red "How are you supposed to run wheel tests without a wheel?"
22 |     red "Compile it or provide it as ARG1 to this script"
23 |     exit 1
24 | fi
25 | 
26 | echo "======================================================================"
27 | green "CREATE SEPARATE TEST VENV"
28 | if [ ! -d ./venv_t ]; then
29 |     virtualenv --clear --always-copy -p /usr/bin/python3 ./venv_t
30 | fi
31 | 
32 | 
33 | green "INSTALLING DEPENDENCIES"
34 | if [ $1 ]; then
35 |     # install ARGV1
36 |     green "Installing from provided path"
37 |     WHEEL="$1"
38 | else
39 |     # install compiled wheel
40 |     green "Installing from default path"
41 |     WHEEL=$(realpath ../create_wheel/dist/opencv_python_inference_engine*.whl)
42 | fi
43 | 
44 | ./venv_t/bin/pip3 install --force-reinstall "$WHEEL"
45 | ./venv_t/bin/pip3 install -r requirements.txt
46 | 
47 | 
48 | green "GET MODELS"
49 | 
50 | if [ ! -d "rateme" ]; then
51 |     ./venv_t/bin/pip3 install "https://github.com/banderlog/rateme/releases/download/v0.1.1/rateme-0.1.1.tar.gz"
52 | fi
53 | 
54 | # urls, filenames and checksums are from:
55 | # +
56 | # +
57 | declare -a models=("text-detection-0004.xml"
58 |                    "text-detection-0004.bin"
59 |                    "text-recognition-0012.xml"
60 |                    "text-recognition-0012.bin")
61 | 
62 | url_start="https://download.01.org/opencv/2020/openvinotoolkit/2020.1/open_model_zoo/models_bin/1"
63 | 
64 | for i in "${models[@]}"; do
65 |     # if there is no such file
66 |     if [ ! -f $i ]; then
67 |         # download it
68 |         wget "${url_start}/${i%.*}/FP32/${i}"
69 |     else
70 |         # verify its checksum
71 |         sha256sum -c "${i}.sha256sum" || red "PROBLEMS ^^^"
72 |     fi
73 | done
74 | 
75 | green "For \"$WHEEL\""
76 | green "RUN TESTS with ./venv_t/bin/python ./tests.py"
77 | ./venv_t/bin/python ./tests.py
78 | 
--------------------------------------------------------------------------------
/tests/requirements.txt:
--------------------------------------------------------------------------------
1 | scikit-image
2 | 
--------------------------------------------------------------------------------
/tests/short_video.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/banderlog/opencv-python-inference-engine/0abe6990b938275c48ad990f2b484dd03ad0f39d/tests/short_video.mp4
--------------------------------------------------------------------------------
/tests/tests.py:
--------------------------------------------------------------------------------
1 | import unittest
2 | import cv2
3 | from pixellink import PixelLinkDetector
4 | from text_recognition import TextRecognizer
5 | from rateme.utils import RateMe
6 | 
7 | 
8 | class TestPackage(unittest.TestCase):
9 | 
10 |     def test_dnn_module(self):
11 |         model = RateMe()
12 |         img = cv2.imread('dislike.jpg')
13 |         answer = model.predict(img)
14 |         self.assertEqual(answer, 'dislike')
15 |         print('rateme: passed')
16 | 
17 |     def test_inference_engine(self):
18 |         img = cv2.imread('helloworld.png')
19 |         detector4 = PixelLinkDetector('text-detection-0004.xml')
20 |         detector4.detect(img)
21 |         bboxes = detector4.decode()
22 | 
23 |         recognizer12 = TextRecognizer('./text-recognition-0012.xml')
24 |         answer = recognizer12.do_ocr(img, bboxes)
25 |         self.assertEqual(answer, ['hello', 'world'])
26 |         print('text detection and recognition: passed')
27 | 
28 |     def test_ffmpeg(self):
29 |         cap = cv2.VideoCapture('short_video.mp4')
30 |         answer, img = cap.read()
31 |         self.assertTrue(answer)
32 |         print('video opening: passed')
33 | 
34 | 
35 | if __name__ == '__main__':
36 |     unittest.main()
37 | 
--------------------------------------------------------------------------------
/tests/text-detection-0004.bin.sha256sum:
--------------------------------------------------------------------------------
1 | 6da6456f27123be2d9a0e68bb73a7750f6aaee2f0af75d7f34ec6fa97f6727dc text-detection-0004.bin
2 | 
-------------------------------------------------------------------------------- /tests/text-detection-0004.xml.sha256sum: -------------------------------------------------------------------------------- 1 | 244f836e36d63c9bd45b2123f4b9e4672cae6be348c15cac857d75a8b9852dd7 text-detection-0004.xml 2 | -------------------------------------------------------------------------------- /tests/text-recognition-0012.bin.sha256sum: -------------------------------------------------------------------------------- 1 | b0d99549692baeea3e83709a671844a365b15bd40e36d9a5d3ef5368a69d2897 text-recognition-0012.bin 2 | -------------------------------------------------------------------------------- /tests/text-recognition-0012.xml.sha256sum: -------------------------------------------------------------------------------- 1 | 54fd8ae6ea5ae11fdeb85f5c6b701793c28883f1e3dd8c3a531c43db6c3713ea text-recognition-0012.xml 2 | -------------------------------------------------------------------------------- /tests/text_recognition.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | from typing import List 4 | 5 | 6 | class TextRecognizer(): 7 | def __init__(self, xml_model_path: str): 8 | """ Class for the Intels' OCR model pipeline 9 | 10 | See https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/ \ 11 | text-recognition-0012/description/text-recognition-0012.md 12 | 13 | :param xml_model_path: path to model's XML file 14 | """ 15 | # load model 16 | self._net = cv2.dnn.readNetFromModelOptimizer(xml_model_path, xml_model_path[:-3] + 'bin') 17 | 18 | def _get_confidences(self, img: np.ndarray, box: tuple) -> np.ndarray: 19 | """ get OCR prediction confidences from a part of image in memory 20 | 21 | :param img: BGR image 22 | :param box: (ymin ,xmin ,ymax, xmax) 23 | 24 | :return: blob with the shape [30, 1, 37] in the format [WxBxL], where: 25 | W - output sequence length 26 | B - batch size 27 | L - confidence distribution across alphanumeric symbols: 28 | "0123456789abcdefghijklmnopqrstuvwxyz#", where # - special 29 | blank character for CTC decoding algorithm. 30 | """ 31 | y1, x1, y2, x2 = box 32 | img = img[y1:y2, x1:x2] 33 | img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 34 | blob = cv2.dnn.blobFromImage(img, 1, (120, 32)) 35 | self._net.setInput(blob) 36 | outs = self._net.forward() 37 | return outs 38 | 39 | def do_ocr(self, img: np.ndarray, bboxes: List[tuple]) -> List[str]: 40 | """ Run OCR pipeline with greedy decoder for each single word (bbox) 41 | 42 | :param img: BGR image 43 | :param bboxes: list of sepaate word bboxes (ymin ,xmin ,ymax, xmax) 44 | 45 | :return: recognized words 46 | 47 | For TF version use: 48 | 49 | .. 
code-block:: python 50 | 51 | # 30 is `confs.shape[0]` it is fixed 52 | a, b = tf.nn.ctc_beam_search_decoder(confs, np.array([30])) 53 | idx_no_blanks = tf.sparse.to_dense(a[0])[0].numpy() 54 | word = ''.join(char_vec[idxs_no_blanks]) 55 | """ 56 | words = [] 57 | # net could detect only these chars 58 | char_vec = np.array(list("0123456789abcdefghijklmnopqrstuvwxyz#")) 59 | 60 | for box in bboxes: 61 | # confidence distribution across symbols 62 | confs = self._get_confidences(img, box) 63 | # get maximal confidence for the whole beam width aka greedy decoder 64 | idxs = confs[:, 0, :].argmax(axis=1) 65 | # drop blank characters '#' with id == 36 in charvec 66 | # isupposedly we taking only separate words as input 67 | idxs_no_blanks = idxs[idxs != 36] 68 | # joint to string 69 | word = ''.join(char_vec[idxs_no_blanks]) 70 | words.append(word) 71 | 72 | return words 73 | -------------------------------------------------------------------------------- /tests_openvino/README.md: -------------------------------------------------------------------------------- 1 | Only way to download models -- through model downloader, no manual download anymore: 2 | - 3 | - 4 | - 5 | - 6 | - (you need to clone github repo to get them) 7 | 8 | Sometimes models are backwards compatible to new OpenVINO version, sometimes no. 9 | Sometimes new model versions became unworkable. 10 | 11 | IE API for network upload and usage now deprecated, one should use openvino API instead: 12 | - see differences of `pixellink.py` and `text_recognition.py` between `tests` and `tests_openvino` folders 13 | - 14 | -------------------------------------------------------------------------------- /tests_openvino/pixellink.py: -------------------------------------------------------------------------------- 1 | """ Wrapper class for Intel's PixelLink realisation (text segmentation NN) 2 | text-detection-00[34] 3 | 4 | For text-detection-002 you'll need to uncomment string in detect() 5 | """ 6 | import cv2 7 | import numpy as np 8 | from openvino.runtime import Core 9 | from skimage.morphology import label 10 | from skimage.measure import regionprops 11 | from typing import List, Tuple 12 | from skimage.measure._regionprops import RegionProperties 13 | 14 | 15 | class PixelLinkDetector(): 16 | """ Wrapper class for Intel's version of PixelLink text detector 17 | 18 | See https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/ \ 19 | text-detection-0004/description/text-detection-0004.md 20 | 21 | :param xml_model_path: path to XML file 22 | 23 | **Example:** 24 | 25 | .. 
--------------------------------------------------------------------------------
/tests_openvino/pixellink.py:
--------------------------------------------------------------------------------
1 | """ Wrapper class for Intel's PixelLink implementation (text segmentation NN)
2 | text-detection-00[34]
3 | 
4 | For text-detection-002 you'll need to swap the output layers in detect()
5 | """
6 | import cv2
7 | import numpy as np
8 | from openvino.runtime import Core
9 | from skimage.morphology import label
10 | from skimage.measure import regionprops
11 | from typing import List, Tuple
12 | from skimage.measure._regionprops import RegionProperties
13 | 
14 | 
15 | class PixelLinkDetector():
16 |     """ Wrapper class for Intel's version of the PixelLink text detector
17 | 
18 |     See https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/ \
19 |         text-detection-0004/description/text-detection-0004.md
20 | 
21 |     :param xml_model_path: path to model's XML file
22 | 
23 |     **Example:**
24 | 
25 |     .. code-block:: python
26 | 
27 |         detector = PixelLinkDetector('text-detection-0004.xml')
28 |         img = cv2.imread('tmp.jpg')
29 |         # ~250ms on i7-6700K
30 |         detector.detect(img)
31 |         # ~2ms
32 |         bboxes = detector.decode()
33 |     """
34 |     def __init__(self, xml_model_path: str, txt_threshold=0.5):
35 |         """
36 |         :param xml_model_path: path to model's XML file
37 |         :param txt_threshold: text confidence threshold, defaults to ``0.5``
38 |         """
39 |         ie = Core()
40 |         model = ie.read_model(xml_model_path)
41 |         self._net = ie.compile_model(model=model, device_name="CPU")
42 |         self._txt_threshold = txt_threshold
43 | 
44 |     def detect(self, img: np.ndarray) -> None:
45 |         """ Get PixelLink's outputs (BxCxHxW):
46 |         + [1x16x192x320] - logits related to linkage between pixels and their neighbors
47 |         + [1x2x192x320] - logits related to text/no-text classification for each pixel
48 | 
49 |         B - batch size
50 |         C - number of channels
51 |         H - image height
52 |         W - image width
53 | 
54 |         :param img: image as ``numpy.ndarray``
55 |         """
56 |         output_layer_1 = self._net.output(0)
57 |         output_layer_2 = self._net.output(1)
58 |         self._img_shape = img.shape
59 |         blob = cv2.dnn.blobFromImage(img, 1, (1280, 768))
60 |         out = self._net([blob])
61 |         # for text-detection-002 the two outputs come in the opposite order,
62 |         # i.e. swap links and pixels below
63 |         self.links = out[output_layer_1]
64 |         self.pixels = out[output_layer_2]
65 | 
66 |     def get_mask(self) -> np.ndarray:
67 |         """ Get binary mask of detected text pixels
68 |         """
69 |         pixel_mask = self._get_pixel_scores() >= self._txt_threshold
70 |         return pixel_mask.astype(np.uint8)
71 | 
72 |     def _logsumexp(self, a: np.ndarray, axis=-1) -> np.ndarray:
73 |         """ Stripped-down version of the scipy function
74 |         https://github.com/scipy/scipy/blob/v1.6.2/scipy/special/_logsumexp.py
75 | 
76 |         Compute the log of the sum of exponentials of input elements.
85 | """ 86 | a_max = np.amax(a, axis=axis, keepdims=True) 87 | tmp = np.exp(a - a_max) 88 | s = np.sum(tmp, axis=axis, keepdims=True) 89 | out = np.log(s) 90 | out += a_max 91 | return out 92 | 93 | def _get_pixel_scores(self) -> np.ndarray: 94 | """ get softmaxed properly shaped pixel scores """ 95 | # move channels to the end 96 | tmp = np.transpose(self.pixels, (0, 2, 3, 1)) 97 | # softmax from scipy 98 | tmp = np.exp(tmp - self._logsumexp(tmp, axis=-1)) 99 | # select single batch, single chanel values 100 | return tmp[0, :, :, 1] 101 | 102 | def _get_txt_regions(self, pixel_mask: np.ndarray) -> List[RegionProperties]: 103 | """ kernels are class dependent """ 104 | img_h, img_w = self._img_shape[:2] 105 | _, mask = cv2.threshold(pixel_mask, 0, 1, cv2.THRESH_BINARY) 106 | # transmutatioins 107 | # kernel size should be image size dependant (default (21,21)) 108 | # on small image it will connect separate words 109 | txt_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2)) 110 | mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, txt_kernel) 111 | # connect regions on mask of original img size 112 | mask = cv2.resize(mask, (img_w, img_h), interpolation=cv2.INTER_NEAREST) 113 | # Label connected regions of an integer array 114 | mask = label(mask, background=0, connectivity=2) 115 | # Measure properties of labeled image regions. 116 | txt_regions = regionprops(mask) 117 | return txt_regions 118 | 119 | def _get_txt_bboxes(self, txt_regions: List[RegionProperties]) -> List[Tuple[int, int, int, int]]: 120 | """ Filter text area by area and height 121 | 122 | :return: ``[(ymin, xmin, ymax, xmax)]`` 123 | """ 124 | min_area = 0 125 | min_height = 4 126 | boxes = [] 127 | for p in txt_regions: 128 | if p.area > min_area: 129 | bbox = p.bbox 130 | if (bbox[2] - bbox[0]) > min_height: 131 | boxes.append(bbox) 132 | return boxes 133 | 134 | def decode(self) -> List[Tuple[int, int, int, int]]: 135 | """ Decode PixelLink's output 136 | 137 | :return: bounding_boxes 138 | 139 | .. note:: 140 | bounding_boxes format: [ymin ,xmin ,ymax, xmax] 141 | 142 | """ 143 | mask = self.get_mask() 144 | bboxes = self._get_txt_bboxes(self._get_txt_regions(mask)) 145 | # sort by xmin, ymin 146 | bboxes = sorted(bboxes, key=lambda x: (x[1], x[0])) 147 | return bboxes 148 | -------------------------------------------------------------------------------- /tests_openvino/prepare_and_run_tests.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | green="\033[0;32m" 4 | red="\033[0;31m" 5 | end="\033[0m" 6 | 7 | green () { 8 | echo -e "${green}${1}${end}" 9 | } 10 | 11 | red () { 12 | echo -e "${red}${1}${end}" 13 | } 14 | 15 | 16 | echo "======================================================================" 17 | green "CREATE VENV WITH OPENCV AND OPENVINO RUNTIME" 18 | if [ ! -d ./venv_t ]; then 19 | virtualenv --clear --always-copy -p /usr/bin/python3 ./venv_t 20 | fi 21 | green "CREATE SEPARATE VENV WITH OPENVINO DEV TO USE MODEL DOWNLOADER" 22 | if [ ! -d ./venv_d ]; then 23 | virtualenv --clear --always-copy -p /usr/bin/python3 ./venv_d 24 | fi 25 | 26 | 27 | green "INSTALLING DEPENDENCIES" 28 | ./venv_t/bin/pip3 install -r requirements.txt 29 | ./venv_d/bin/pip3 install openvino-dev==2022.3.0 30 | 31 | 32 | green "GET MODELS" 33 | if [ ! 
-f "rateme-0.1.1.tar.gz" ]; then 34 | wget "https://github.com/banderlog/rateme/releases/download/v0.1.1/rateme-0.1.1.tar.gz" 35 | fi 36 | ./venv_t/bin/pip3 install --no-deps "rateme-0.1.1.tar.gz" 37 | 38 | # download models from intel 39 | if [ ! -f "intel/text-recognition-0012/FP32/text-recognition-0012.bin" ]; then 40 | ./venv_d/bin/omz_downloader --precision FP32 -o ./ --name text-recognition-0012 41 | fi 42 | 43 | # particularly that new model does not work or something changed in decoder 44 | declare -a models=("text-detection-0004.xml" 45 | "text-detection-0004.bin") 46 | 47 | url_start="https://download.01.org/opencv/2020/openvinotoolkit/2020.1/open_model_zoo/models_bin/1" 48 | 49 | for i in "${models[@]}"; do 50 | # if no such file 51 | if [ ! -f $i ]; then 52 | # download 53 | wget "${url_start}/${i%.*}/FP32/${i}" 54 | else 55 | # checksum 56 | sha256sum -c "${i}.sha256sum" || red "PROBLEMS ^^^" 57 | fi 58 | done 59 | 60 | 61 | 62 | green "For \"$WHEEL\"" 63 | green "RUN TESTS with ./venv_t/bin/python ./tests.py" 64 | ./venv_t/bin/python ./tests.py 65 | -------------------------------------------------------------------------------- /tests_openvino/requirements.txt: -------------------------------------------------------------------------------- 1 | scikit-image 2 | openvino==2022.3.0 3 | opencv-python 4 | -------------------------------------------------------------------------------- /tests_openvino/tests.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | import cv2 3 | from pixellink import PixelLinkDetector 4 | from text_recognition import TextRecognizer 5 | from rateme.utils import RateMe 6 | 7 | 8 | class TestPackage(unittest.TestCase): 9 | 10 | def test_dnn_module(self): 11 | model = RateMe() 12 | img = cv2.imread('../tests/dislike.jpg') 13 | answer = model.predict(img) 14 | self.assertEqual(answer, 'dislike') 15 | print('rateme: passed') 16 | 17 | def test_inference_engine(self): 18 | img = cv2.imread('../tests/helloworld.png') 19 | detector4 = PixelLinkDetector('text-detection-0004.xml') 20 | detector4.detect(img) 21 | bboxes = detector4.decode() 22 | 23 | recognizer12 = TextRecognizer('intel/text-recognition-0012/FP32/text-recognition-0012.xml') 24 | answer = recognizer12.do_ocr(img, bboxes) 25 | self.assertEqual(answer, ['hello', 'world']) 26 | print('text detection and recognition: passed') 27 | 28 | 29 | if __name__ == '__main__': 30 | unittest.main() 31 | -------------------------------------------------------------------------------- /tests_openvino/text-detection-0004.bin.sha256sum: -------------------------------------------------------------------------------- 1 | 6da6456f27123be2d9a0e68bb73a7750f6aaee2f0af75d7f34ec6fa97f6727dc text-detection-0004.bin 2 | -------------------------------------------------------------------------------- /tests_openvino/text-detection-0004.xml.sha256sum: -------------------------------------------------------------------------------- 1 | 244f836e36d63c9bd45b2123f4b9e4672cae6be348c15cac857d75a8b9852dd7 text-detection-0004.xml 2 | -------------------------------------------------------------------------------- /tests_openvino/text_recognition.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | from openvino.runtime import Core 3 | import numpy as np 4 | from typing import List 5 | 6 | 7 | class TextRecognizer(): 8 | def __init__(self, xml_model_path: str): 9 | """ Class for the Intels' OCR model pipeline 10 | 11 | See 
--------------------------------------------------------------------------------
/tests_openvino/text_recognition.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | from openvino.runtime import Core
3 | import numpy as np
4 | from typing import List
5 | 
6 | 
7 | class TextRecognizer():
8 |     def __init__(self, xml_model_path: str):
9 |         """ Class for Intel's OCR model pipeline
10 | 
11 |         See https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/ \
12 |             text-recognition-0012/description/text-recognition-0012.md
13 | 
14 |         :param xml_model_path: path to model's XML file
15 |         """
16 |         # load model
17 |         ie = Core()
18 |         model = ie.read_model(xml_model_path)
19 |         self._net = ie.compile_model(model=model, device_name="CPU")
20 | 
21 |     def _get_confidences(self, img: np.ndarray, box: tuple) -> np.ndarray:
22 |         """ get OCR prediction confidences from a part of an image in memory
23 | 
24 |         :param img: BGR image
25 |         :param box: (ymin, xmin, ymax, xmax)
26 | 
27 |         :return: blob with the shape [30, 1, 37] in the format [WxBxL], where:
28 |             W - output sequence length
29 |             B - batch size
30 |             L - confidence distribution across alphanumeric symbols:
31 |                 "0123456789abcdefghijklmnopqrstuvwxyz#", where # is a special
32 |                 blank character for the CTC decoding algorithm
33 |         """
34 |         y1, x1, y2, x2 = box
35 |         img = img[y1:y2, x1:x2]
36 |         img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
37 |         blob = cv2.dnn.blobFromImage(img, 1, (120, 32))
38 |         outs = self._net([blob])
39 |         # index the result with the output layer to get the actual blob
40 |         return outs[self._net.output(0)]
41 | 
42 |     def do_ocr(self, img: np.ndarray, bboxes: List[tuple]) -> List[str]:
43 |         """ Run the OCR pipeline with a greedy decoder for each single word (bbox)
44 | 
45 |         :param img: BGR image
46 |         :param bboxes: list of separate word bboxes (ymin, xmin, ymax, xmax)
47 | 
48 |         :return: recognized words
49 | 
50 |         For the TF version use:
51 | 
52 |         .. code-block:: python
53 | 
54 |             # 30 is `confs.shape[0]`, it is fixed
55 |             a, b = tf.nn.ctc_beam_search_decoder(confs, np.array([30]))
56 |             idxs_no_blanks = tf.sparse.to_dense(a[0])[0].numpy()
57 |             word = ''.join(char_vec[idxs_no_blanks])
58 |         """
59 |         words = []
60 |         # the net can detect only these chars
61 |         char_vec = np.array(list("0123456789abcdefghijklmnopqrstuvwxyz#"))
62 | 
63 |         for box in bboxes:
64 |             # confidence distribution across symbols
65 |             confs = self._get_confidences(img, box)
66 |             # take the index with maximal confidence at each step, aka greedy decoding
67 |             idxs = confs[:, 0, :].argmax(axis=1)
68 |             # drop blank characters '#' (id == 36 in char_vec);
69 |             # supposedly we take only separate words as input
70 |             idxs_no_blanks = idxs[idxs != 36]
71 |             # join to string
72 |             word = ''.join(char_vec[idxs_no_blanks])
73 |             words.append(word)
74 | 
75 |         return words
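76 | 
77 | 
78 | if __name__ == '__main__':
79 |     # A minimal self-contained sketch of the greedy CTC decoding used above,
80 |     # run on hypothetical confidences instead of a real model output
81 |     # (index mapping: '0'..'9' == 0..9, 'a'..'z' == 10..35, blank '#' == 36)
82 |     confs = np.zeros((30, 1, 37))
83 |     confs[:, 0, 36] = 1.0  # blank everywhere by default
84 |     for t, idx in enumerate([17, 14, 21, 21, 24]):  # h, e, l, l, o
85 |         confs[t, 0, idx] = 2.0
86 |     idxs = confs[:, 0, :].argmax(axis=1)
87 |     chars = np.array(list("0123456789abcdefghijklmnopqrstuvwxyz#"))
88 |     print(''.join(chars[idxs[idxs != 36]]))  # -> hello
89 | 
--------------------------------------------------------------------------------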