├── ESP32-CAM-pinout-new.png ├── LICENSE ├── README.md ├── edge-impulse-esp32-cam-bare ├── camera_index.h ├── camera_pins.h ├── dl_lib_matrix3d.h ├── dl_lib_matrix3dq.h ├── edge-impulse-esp32-cam-bare.ino ├── esp_image.hpp ├── frmn.h ├── image_util.c ├── image_util.h └── mtmn.h ├── edge-impulse-esp32-cam ├── camera_index.h ├── camera_pins.h ├── dl_lib_matrix3d.h ├── dl_lib_matrix3dq.h ├── edge-impulse-esp32-cam.ino ├── esp_image.hpp ├── frmn.h ├── image_util.c ├── image_util.h └── mtmn.h ├── ei-esp32-cam-cat-dog-arduino-1.0.4.zip └── esp32-cam-edge-impulse.png /ESP32-CAM-pinout-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alankrantas/edge-impulse-esp32-cam-image-classification/bea4de2a83737349598063bc4ded2949cdcc5b25/ESP32-CAM-pinout-new.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Alan Wang 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Live Image Classification on ESP32-CAM and ST7735 TFT using MobileNet v1 from Edge Impulse (TinyML) 2 | 3 | ![41Ub2S0SjXL _AC_](https://user-images.githubusercontent.com/44191076/153631624-e13576b3-b440-4cd0-8a42-fd29cbe25a2d.jpg) 4 | 5 | This example runs a micro neural network model on the 10-dollar Ai-Thinker ESP32-CAM board and shows the image classification results on a small TFT LCD display. 6 | 7 | > Note that I am not testing or improving this any further. This is simply a proof of concept and a demonstration that you can make simple, practical edge AI devices without making them overly complicated. 8 | 9 | This is modified from [ESP32 Cam and Edge Impulse](https://github.com/edgeimpulse/example-esp32-cam) with simplified code, TFT support and the necessary libraries copied from Espressif's [esp-face](https://github.com/Yuri-R-Studio/esp-face). ```esp-face``` has since been refactored into [esp-dl](https://github.com/espressif/esp-dl) to support their other products, which broke the original example. The original example also requires a WiFi connection and suffers from image lag, which makes it difficult to use. My version works more like a hand-held point-and-shoot camera. 10 | 11 | > See the original example repo or [this article](https://www.survivingwithandroid.com/tinyml-esp32-cam-edge-image-classification-with-edge-impulse/) about how to generate your own model on Edge Impulse. 
You can also still run the original example by copying all the libraries in this example into the project directory, then re-opening the .ino script. 12 | 13 | ![demo](https://user-images.githubusercontent.com/44191076/154735134-12b59e38-79d6-4890-945c-db0604b0444e.JPG) 14 | 15 | See the [video demonstration](https://www.youtube.com/watch?v=UoWfiEZE0Y4) 16 | 17 | [中文版介紹 (introduction in Chinese)](https://alankrantas.medium.com/tinyml-%E5%BD%B1%E5%83%8F%E8%BE%A8%E8%AD%98%E5%BE%AE%E5%9E%8B%E5%8C%96-%E5%9C%A8%E6%88%90%E6%9C%AC-10-%E7%BE%8E%E5%85%83%E7%9A%84-esp32-cam-%E9%96%8B%E7%99%BC%E6%9D%BF%E4%B8%8A%E5%8D%B3%E6%99%82%E5%88%86%E9%A1%9E%E8%B2%93%E7%8B%97-%E5%BE%9E%E6%AD%A4%E8%B7%9F%E9%BA%BB%E7%85%A9%E7%9A%84-wifi-%E9%80%A3%E7%B7%9A%E8%AA%AA%E6%8B%9C%E6%8B%9C-%E4%BD%BF%E7%94%A8-mobilenet-v1-%E6%A8%A1%E5%9E%8B%E8%88%87%E9%81%B7%E7%A7%BB%E5%AD%B8%E7%BF%92-10fb02da83e9) 18 | 19 | ## Setup 20 | 21 | The following is needed in your Arduino IDE: 22 | 23 | * [Arduino-ESP32 board support](https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json) (select ```Ai Thinker ESP32-CAM```) 24 | * [Adafruit GFX Library](https://github.com/adafruit/Adafruit-GFX-Library) 25 | * [Adafruit ST7735 and ST7789 Library](https://github.com/adafruit/Adafruit-ST7735-Library) 26 | * [Import](https://docs.arduino.cc/software/ide-v1/tutorials/installing-libraries) the model library you've generated from Edge Impulse Studio 27 | * Download [edge-impulse-esp32-cam](https://github.com/alankrantas/edge-impulse-esp32-cam-image-classification/tree/main/edge-impulse-esp32-cam) from this repo and open the ```.ino``` file in the directory. 28 | 29 | Note that you won't be able to read any serial output if you use Arduino IDE 2.0! 
30 | 31 | ## Wiring 32 | 33 | ![pinout](https://github.com/alankrantas/edge-impulse-esp32-cam-image-classification/raw/main/ESP32-CAM-pinout-new.png) 34 | 35 | ![wiring](https://github.com/alankrantas/edge-impulse-esp32-cam-image-classification/raw/main/esp32-cam-edge-impulse.png) 36 | 37 | For the ESP32-CAM, the side with the reset button is "up". The whole system is powered from a power module that can output both 5V and 3.3V: the ESP32-CAM is powered by 5V and the TFT by 3.3V. I use a 7.5V 1A charger (power modules require 6.5V+ to provide a stable 5V). The power module I use only outputs 500 mA max, but you don't need much since we don't use WiFi. 38 | 39 | | USB-TTL pins | ESP32-CAM | 40 | | --- | --- | 41 | | Tx | GPIO 3 (U0R) | 42 | | Rx | GPIO 1 (U0T) | 43 | | GND | GND | 44 | 45 | The USB-TTL's GND should be connected to the breadboard, not the ESP32-CAM itself. If you want to upload code, disconnect the power, connect GPIO 0 to GND (also on the breadboard), then power it up; it will then be in flash mode. (The alternative is to remove the ESP32-CAM itself and use the ESP32-CAM-MB programmer board.) 46 | 47 | | TFT pins | ESP32-CAM | 48 | | --- | --- | 49 | | SCK (SCL) | GPIO 14 | 50 | | MOSI (SDA) | GPIO 13 | 51 | | RESET (RST) | GPIO 12 | 52 | | DC | GPIO 2 | 53 | | CS | GPIO 15 | 54 | | BL (back light) | 3V3 | 55 | 56 | The script will display a 120x120 image on the TFT, so any 160x128 or 128x128 version can be used. But you might want to change the parameter in ```tft.initR(INITR_GREENTAB);``` to ```INITR_REDTAB``` or ```INITR_BLACKTAB``` to get correct text colors. 57 | 58 | | Button | ESP32-CAM | 59 | | --- | --- | 60 | | BTN | 3V3 | 61 | | BTN | GPIO 4 | 62 | 63 | Note that since the button pin is shared with the flash LED (it is the only available pin left; GPIO 16 is camera-related), the button has to be **pulled down** with two 10 KΩ resistors. 
64 | 65 | ## The Example Model - Cat & Dog Classification 66 | 67 | My demo model used Microsoft's [Kaggle Cats and Dogs Dataset](https://www.microsoft.com/en-us/download/details.aspx?id=54765), which has 12,500 cats and 12,500 dogs. 24,969 photos were successfully uploaded and split into an 80/20 training/test set. The variety of the images is perfect since we are not doing YOLO- or SSD-style object detection. 68 | 69 | ![下載](https://user-images.githubusercontent.com/44191076/154785876-b65de5e1-acba-4c2a-9c25-01d02e9b7a2b.png) 70 | 71 | The model I chose was ```MobileNetV1 96x96 0.25 (no final dense layer, 0.1 dropout)``` with transfer learning. Since free Edge Impulse accounts have a training time limit of 20 minutes per job, I could only train the model for 5 cycles. (You can go [ask for more](https://forum.edgeimpulse.com/t/err-deadlineexceeded-ways-to-fix-this/2354/2) though...) I imagine that if you have only a dozen images per class, you can try better models or longer training cycles. 72 | 73 | Anyway, I got ```89.8%``` accuracy for the training set and ```86.97%``` for the test set, which seems decent enough. 74 | 75 | ![1](https://user-images.githubusercontent.com/44191076/153631673-96b90c0b-5745-43b9-9e5f-9a426d8bfe61.png) 76 | 77 | Also, the ESP32-CAM was not an officially supported board when I created this project, so I could not use the EON Tuner for further fine-tuning. 78 | 79 | You can find my published Edge Impulse project here: [esp32-cam-cat-dog](https://studio.edgeimpulse.com/public/76904/latest). [ei-esp32-cam-cat-dog-arduino-1.0.4.zip](https://github.com/alankrantas/edge-impulse-esp32-cam-image-classification/blob/main/ei-esp32-cam-cat-dog-arduino-1.0.4.zip) is the downloaded Arduino library, which can be imported into the Arduino IDE. 80 | 81 | The camera captures 240x240 images and resizes them to 96x96 for the model input, and also resizes the original image to 120x120 for the TFT display. 
The model inference time (prediction time) is 2607 ms (2.6 secs) per image, which is not very fast, but the results are mostly good. I don't know yet whether different image sets or models may affect the results. 82 | 83 | > Note: the demo model has only two classes - dog and cat - thus it will try to "predict" whatever it sees as either a dog or a cat. A better model should have a third class of "neither dog nor cat" to avoid invalid responses. 84 | 85 | ## Boilerplate Version 86 | 87 | The [edge-impulse-esp32-cam-bare](https://github.com/alankrantas/edge-impulse-esp32-cam-image-classification/tree/main/edge-impulse-esp32-cam-bare) is the version that doesn't use any external devices. The model runs in a non-stop loop. You can point the camera at the images and read the prediction via the serial port (use Arduino IDE 1.x). 88 | 89 | ![bogdan-farca-CEx86maLUSc-unsplash](https://user-images.githubusercontent.com/44191076/153636524-9b2edab9-7c50-4aa1-9d6e-74477d67011f.jpg) 90 | 91 | ![richard-brutyo-Sg3XwuEpybU-unsplash](https://user-images.githubusercontent.com/44191076/153636561-16f7fb47-dcfc-4988-8772-85dcc5acfdac.jpg) 92 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam-bare/camera_pins.h: -------------------------------------------------------------------------------- 1 | 2 | #if defined(CAMERA_MODEL_WROVER_KIT) 3 | #define PWDN_GPIO_NUM -1 4 | #define RESET_GPIO_NUM -1 5 | #define XCLK_GPIO_NUM 21 6 | #define SIOD_GPIO_NUM 26 7 | #define SIOC_GPIO_NUM 27 8 | 9 | #define Y9_GPIO_NUM 35 10 | #define Y8_GPIO_NUM 34 11 | #define Y7_GPIO_NUM 39 12 | #define Y6_GPIO_NUM 36 13 | #define Y5_GPIO_NUM 19 14 | #define Y4_GPIO_NUM 18 15 | #define Y3_GPIO_NUM 5 16 | #define Y2_GPIO_NUM 4 17 | #define VSYNC_GPIO_NUM 25 18 | #define HREF_GPIO_NUM 23 19 | #define PCLK_GPIO_NUM 22 20 | 21 | #elif defined(CAMERA_MODEL_ESP_EYE) 22 | #define PWDN_GPIO_NUM -1 23 | #define RESET_GPIO_NUM -1 24 | #define XCLK_GPIO_NUM 4 25 | #define 
SIOD_GPIO_NUM 18 26 | #define SIOC_GPIO_NUM 23 27 | 28 | #define Y9_GPIO_NUM 36 29 | #define Y8_GPIO_NUM 37 30 | #define Y7_GPIO_NUM 38 31 | #define Y6_GPIO_NUM 39 32 | #define Y5_GPIO_NUM 35 33 | #define Y4_GPIO_NUM 14 34 | #define Y3_GPIO_NUM 13 35 | #define Y2_GPIO_NUM 34 36 | #define VSYNC_GPIO_NUM 5 37 | #define HREF_GPIO_NUM 27 38 | #define PCLK_GPIO_NUM 25 39 | 40 | #elif defined(CAMERA_MODEL_M5STACK_PSRAM) 41 | #define PWDN_GPIO_NUM -1 42 | #define RESET_GPIO_NUM 15 43 | #define XCLK_GPIO_NUM 27 44 | #define SIOD_GPIO_NUM 25 45 | #define SIOC_GPIO_NUM 23 46 | 47 | #define Y9_GPIO_NUM 19 48 | #define Y8_GPIO_NUM 36 49 | #define Y7_GPIO_NUM 18 50 | #define Y6_GPIO_NUM 39 51 | #define Y5_GPIO_NUM 5 52 | #define Y4_GPIO_NUM 34 53 | #define Y3_GPIO_NUM 35 54 | #define Y2_GPIO_NUM 32 55 | #define VSYNC_GPIO_NUM 22 56 | #define HREF_GPIO_NUM 26 57 | #define PCLK_GPIO_NUM 21 58 | 59 | #elif defined(CAMERA_MODEL_M5STACK_V2_PSRAM) 60 | #define PWDN_GPIO_NUM -1 61 | #define RESET_GPIO_NUM 15 62 | #define XCLK_GPIO_NUM 27 63 | #define SIOD_GPIO_NUM 22 64 | #define SIOC_GPIO_NUM 23 65 | 66 | #define Y9_GPIO_NUM 19 67 | #define Y8_GPIO_NUM 36 68 | #define Y7_GPIO_NUM 18 69 | #define Y6_GPIO_NUM 39 70 | #define Y5_GPIO_NUM 5 71 | #define Y4_GPIO_NUM 34 72 | #define Y3_GPIO_NUM 35 73 | #define Y2_GPIO_NUM 32 74 | #define VSYNC_GPIO_NUM 25 75 | #define HREF_GPIO_NUM 26 76 | #define PCLK_GPIO_NUM 21 77 | 78 | #elif defined(CAMERA_MODEL_M5STACK_WIDE) 79 | #define PWDN_GPIO_NUM -1 80 | #define RESET_GPIO_NUM 15 81 | #define XCLK_GPIO_NUM 27 82 | #define SIOD_GPIO_NUM 22 83 | #define SIOC_GPIO_NUM 23 84 | 85 | #define Y9_GPIO_NUM 19 86 | #define Y8_GPIO_NUM 36 87 | #define Y7_GPIO_NUM 18 88 | #define Y6_GPIO_NUM 39 89 | #define Y5_GPIO_NUM 5 90 | #define Y4_GPIO_NUM 34 91 | #define Y3_GPIO_NUM 35 92 | #define Y2_GPIO_NUM 32 93 | #define VSYNC_GPIO_NUM 25 94 | #define HREF_GPIO_NUM 26 95 | #define PCLK_GPIO_NUM 21 96 | 97 | #elif defined(CAMERA_MODEL_M5STACK_ESP32CAM) 98 | 
#define PWDN_GPIO_NUM -1 99 | #define RESET_GPIO_NUM 15 100 | #define XCLK_GPIO_NUM 27 101 | #define SIOD_GPIO_NUM 25 102 | #define SIOC_GPIO_NUM 23 103 | 104 | #define Y9_GPIO_NUM 19 105 | #define Y8_GPIO_NUM 36 106 | #define Y7_GPIO_NUM 18 107 | #define Y6_GPIO_NUM 39 108 | #define Y5_GPIO_NUM 5 109 | #define Y4_GPIO_NUM 34 110 | #define Y3_GPIO_NUM 35 111 | #define Y2_GPIO_NUM 17 112 | #define VSYNC_GPIO_NUM 22 113 | #define HREF_GPIO_NUM 26 114 | #define PCLK_GPIO_NUM 21 115 | 116 | #elif defined(CAMERA_MODEL_AI_THINKER) 117 | #define PWDN_GPIO_NUM 32 118 | #define RESET_GPIO_NUM -1 119 | #define XCLK_GPIO_NUM 0 120 | #define SIOD_GPIO_NUM 26 121 | #define SIOC_GPIO_NUM 27 122 | 123 | #define Y9_GPIO_NUM 35 124 | #define Y8_GPIO_NUM 34 125 | #define Y7_GPIO_NUM 39 126 | #define Y6_GPIO_NUM 36 127 | #define Y5_GPIO_NUM 21 128 | #define Y4_GPIO_NUM 19 129 | #define Y3_GPIO_NUM 18 130 | #define Y2_GPIO_NUM 5 131 | #define VSYNC_GPIO_NUM 25 132 | #define HREF_GPIO_NUM 23 133 | #define PCLK_GPIO_NUM 22 134 | 135 | #elif defined(CAMERA_MODEL_TTGO_T_JOURNAL) 136 | #define PWDN_GPIO_NUM 0 137 | #define RESET_GPIO_NUM 15 138 | #define XCLK_GPIO_NUM 27 139 | #define SIOD_GPIO_NUM 25 140 | #define SIOC_GPIO_NUM 23 141 | 142 | #define Y9_GPIO_NUM 19 143 | #define Y8_GPIO_NUM 36 144 | #define Y7_GPIO_NUM 18 145 | #define Y6_GPIO_NUM 39 146 | #define Y5_GPIO_NUM 5 147 | #define Y4_GPIO_NUM 34 148 | #define Y3_GPIO_NUM 35 149 | #define Y2_GPIO_NUM 17 150 | #define VSYNC_GPIO_NUM 22 151 | #define HREF_GPIO_NUM 26 152 | #define PCLK_GPIO_NUM 21 153 | 154 | #else 155 | #error "Camera model not selected" 156 | #endif 157 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam-bare/dl_lib_matrix3d.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #if CONFIG_SPIRAM_SUPPORT || 
CONFIG_ESP32_SPIRAM_SUPPORT 11 | #include "freertos/FreeRTOS.h" 12 | #define DL_SPIRAM_SUPPORT 1 13 | #else 14 | #define DL_SPIRAM_SUPPORT 0 15 | #endif 16 | 17 | 18 | #ifndef max 19 | #define max(x, y) (((x) < (y)) ? (y) : (x)) 20 | #endif 21 | 22 | #ifndef min 23 | #define min(x, y) (((x) < (y)) ? (x) : (y)) 24 | #endif 25 | 26 | typedef float fptp_t; 27 | typedef uint8_t uc_t; 28 | 29 | typedef enum 30 | { 31 | DL_SUCCESS = 0, 32 | DL_FAIL = 1, 33 | } dl_error_type; 34 | 35 | typedef enum 36 | { 37 | PADDING_VALID = 0, /*!< Valid padding */ 38 | PADDING_SAME = 1, /*!< Same padding, from right to left, free input */ 39 | PADDING_SAME_DONT_FREE_INPUT = 2, /*!< Same padding, from right to left, do not free input */ 40 | PADDING_SAME_MXNET = 3, /*!< Same padding, from left to right */ 41 | } dl_padding_type; 42 | 43 | typedef enum 44 | { 45 | DL_POOLING_MAX = 0, /*!< Max pooling */ 46 | DL_POOLING_AVG = 1, /*!< Average pooling */ 47 | } dl_pooling_type; 48 | /* 49 | * Matrix for 3d 50 | * @Warning: the sequence of variables is fixed, cannot be modified, otherwise there will be errors in esp_dsp_dot_float 51 | */ 52 | typedef struct 53 | { 54 | int w; /*!< Width */ 55 | int h; /*!< Height */ 56 | int c; /*!< Channel */ 57 | int n; /*!< Number of filter, input and output must be 1 */ 58 | int stride; /*!< Step between lines */ 59 | fptp_t *item; /*!< Data */ 60 | } dl_matrix3d_t; 61 | 62 | typedef struct 63 | { 64 | int w; /*!< Width */ 65 | int h; /*!< Height */ 66 | int c; /*!< Channel */ 67 | int n; /*!< Number of filter, input and output must be 1 */ 68 | int stride; /*!< Step between lines */ 69 | uc_t *item; /*!< Data */ 70 | } dl_matrix3du_t; 71 | 72 | typedef enum 73 | { 74 | UPSAMPLE_NEAREST_NEIGHBOR = 0, /*!< Use nearest neighbor interpolation as the upsample method*/ 75 | UPSAMPLE_BILINEAR = 1, /*!< Use nearest bilinear interpolation as the upsample method*/ 76 | } dl_upsample_type; 77 | 78 | typedef struct 79 | { 80 | int stride_x; /*!< Strides of width */ 
81 | int stride_y; /*!< Strides of height */ 82 | dl_padding_type padding; /*!< Padding type */ 83 | } dl_matrix3d_mobilenet_config_t; 84 | 85 | /* 86 | * @brief Allocate a zero-initialized space. Must use 'dl_lib_free' to free the memory. 87 | * 88 | * @param cnt Count of units. 89 | * @param size Size of unit. 90 | * @param align Align of memory. If not required, set 0. 91 | * @return Pointer of allocated memory. Null for failed. 92 | */ 93 | static void *dl_lib_calloc(int cnt, int size, int align) 94 | { 95 | int total_size = cnt * size + align + sizeof(void *); 96 | void *res = malloc(total_size); 97 | if (NULL == res) 98 | { 99 | #if DL_SPIRAM_SUPPORT 100 | res = heap_caps_malloc(total_size, MALLOC_CAP_8BIT | MALLOC_CAP_SPIRAM); 101 | } 102 | if (NULL == res) 103 | { 104 | printf("Item psram alloc failed. Size: %d x %d\n", cnt, size); 105 | #else 106 | printf("Item alloc failed. Size: %d x %d, SPIRAM_FLAG: %d\n", cnt, size, DL_SPIRAM_SUPPORT); 107 | #endif 108 | return NULL; 109 | } 110 | bzero(res, total_size); 111 | void **data = (void **)res + 1; 112 | void **aligned; 113 | if (align) 114 | aligned = (void **)(((size_t)data + (align - 1)) & -align); 115 | else 116 | aligned = data; 117 | 118 | aligned[-1] = res; 119 | return (void *)aligned; 120 | } 121 | 122 | /** 123 | * @brief Free the memory space allocated by 'dl_lib_calloc' 124 | * 125 | */ 126 | static inline void dl_lib_free(void *d) 127 | { 128 | if (NULL == d) 129 | return; 130 | 131 | free(((void **)d)[-1]); 132 | } 133 | 134 | /* 135 | * @brief Allocate a 3D matrix with float items, the access sequence is NHWC 136 | * 137 | * @param n Number of matrix3d, for filters it is out channels, for others it is 1 138 | * @param w Width of matrix3d 139 | * @param h Height of matrix3d 140 | * @param c Channel of matrix3d 141 | * @return 3d matrix 142 | */ 143 | static inline dl_matrix3d_t *dl_matrix3d_alloc(int n, int w, int h, int c) 144 | { 145 | dl_matrix3d_t *r = (dl_matrix3d_t *)dl_lib_calloc(1, 
sizeof(dl_matrix3d_t), 0); 146 | if (NULL == r) 147 | { 148 | printf("internal r failed.\n"); 149 | return NULL; 150 | } 151 | fptp_t *items = (fptp_t *)dl_lib_calloc(n * w * h * c, sizeof(fptp_t), 0); 152 | if (NULL == items) 153 | { 154 | printf("matrix3d item alloc failed.\n"); 155 | dl_lib_free(r); 156 | return NULL; 157 | } 158 | 159 | r->w = w; 160 | r->h = h; 161 | r->c = c; 162 | r->n = n; 163 | r->stride = w * c; 164 | r->item = items; 165 | 166 | return r; 167 | } 168 | 169 | /* 170 | * @brief Allocate a 3D matrix with 8-bits items, the access sequence is NHWC 171 | * 172 | * @param n Number of matrix3d, for filters it is out channels, for others it is 1 173 | * @param w Width of matrix3d 174 | * @param h Height of matrix3d 175 | * @param c Channel of matrix3d 176 | * @return 3d matrix 177 | */ 178 | static inline dl_matrix3du_t *dl_matrix3du_alloc(int n, int w, int h, int c) 179 | { 180 | dl_matrix3du_t *r = (dl_matrix3du_t *)dl_lib_calloc(1, sizeof(dl_matrix3du_t), 0); 181 | if (NULL == r) 182 | { 183 | printf("internal r failed.\n"); 184 | return NULL; 185 | } 186 | uc_t *items = (uc_t *)dl_lib_calloc(n * w * h * c, sizeof(uc_t), 0); 187 | if (NULL == items) 188 | { 189 | printf("matrix3du item alloc failed.\n"); 190 | dl_lib_free(r); 191 | return NULL; 192 | } 193 | 194 | r->w = w; 195 | r->h = h; 196 | r->c = c; 197 | r->n = n; 198 | r->stride = w * c; 199 | r->item = items; 200 | 201 | return r; 202 | } 203 | 204 | /* 205 | * @brief Free a matrix3d 206 | * 207 | * @param m matrix3d with float items 208 | */ 209 | static inline void dl_matrix3d_free(dl_matrix3d_t *m) 210 | { 211 | if (NULL == m) 212 | return; 213 | if (NULL == m->item) 214 | { 215 | dl_lib_free(m); 216 | return; 217 | } 218 | dl_lib_free(m->item); 219 | dl_lib_free(m); 220 | } 221 | 222 | /* 223 | * @brief Free a matrix3d 224 | * 225 | * @param m matrix3d with 8-bits items 226 | */ 227 | static inline void dl_matrix3du_free(dl_matrix3du_t *m) 228 | { 229 | if (NULL == m) 230 | 
return; 231 | if (NULL == m->item) 232 | { 233 | dl_lib_free(m); 234 | return; 235 | } 236 | dl_lib_free(m->item); 237 | dl_lib_free(m); 238 | } 239 | 240 | 241 | /* 242 | * @brief Dot product with a vector and matrix 243 | * 244 | * @param out Space to put the result 245 | * @param in input vector 246 | * @param f filter matrix 247 | */ 248 | void dl_matrix3dff_dot_product(dl_matrix3d_t *out, dl_matrix3d_t *in, dl_matrix3d_t *f); 249 | 250 | /** 251 | * @brief Do a softmax operation on a matrix3d 252 | * 253 | * @param in Input matrix3d 254 | */ 255 | void dl_matrix3d_softmax(dl_matrix3d_t *m); 256 | 257 | /** 258 | * @brief Copy a range of float items from an existing matrix to a preallocated matrix 259 | * 260 | * @param dst The destination slice matrix 261 | * @param src The source matrix to slice 262 | * @param x X-offset of the origin of the returned matrix within the sliced matrix 263 | * @param y Y-offset of the origin of the returned matrix within the sliced matrix 264 | * @param w Width of the resulting matrix 265 | * @param h Height of the resulting matrix 266 | */ 267 | void dl_matrix3d_slice_copy(dl_matrix3d_t *dst, 268 | dl_matrix3d_t *src, 269 | int x, 270 | int y, 271 | int w, 272 | int h); 273 | 274 | /** 275 | * @brief Copy a range of 8-bits items from an existing matrix to a preallocated matrix 276 | * 277 | * @param dst The destination slice matrix 278 | * @param src The source matrix to slice 279 | * @param x X-offset of the origin of the returned matrix within the sliced matrix 280 | * @param y Y-offset of the origin of the returned matrix within the sliced matrix 281 | * @param w Width of the resulting matrix 282 | * @param h Height of the resulting matrix 283 | */ 284 | void dl_matrix3du_slice_copy(dl_matrix3du_t *dst, 285 | dl_matrix3du_t *src, 286 | int x, 287 | int y, 288 | int w, 289 | int h); 290 | 291 | /** 292 | * @brief Transform a sliced matrix block from nhwc to nchw; the block needs to be contiguous in memory. 
293 | * 294 | * @param out The destination sliced matrix in nchw 295 | * @param in The source sliced matrix in nhwc 296 | */ 297 | void dl_matrix3d_sliced_transform_nchw(dl_matrix3d_t *out, 298 | dl_matrix3d_t *in); 299 | 300 | /** 301 | * @brief Do a general CNN layer pass, dimension is (number, width, height, channel) 302 | * 303 | * @param in Input matrix3d 304 | * @param filter Weights of the neurons 305 | * @param bias Bias for the CNN layer 306 | * @param stride_x The step length of the convolution window in x(width) direction 307 | * @param stride_y The step length of the convolution window in y(height) direction 308 | * @param padding One of VALID or SAME 309 | * @param mode Do convolution using the C implementation or the Xtensa implementation, 0 or 1 respectively. 310 | * If ESP_PLATFORM is not defined, this value is not used. Default is 0 311 | * @return dl_matrix3d_t* The result of CNN layer 312 | */ 313 | dl_matrix3d_t *dl_matrix3d_conv(dl_matrix3d_t *in, 314 | dl_matrix3d_t *filter, 315 | dl_matrix3d_t *bias, 316 | int stride_x, 317 | int stride_y, 318 | int padding, 319 | int mode); 320 | 321 | /** 322 | * @brief Do a global average pooling layer pass, dimension is (number, width, height, channel) 323 | * 324 | * @param in Input matrix3d 325 | * 326 | * @return The result of global average pooling layer 327 | */ 328 | dl_matrix3d_t *dl_matrix3d_global_pool(dl_matrix3d_t *in); 329 | 330 | /** 331 | * @brief Calculate pooling layer of a feature map 332 | * 333 | * @param in Input matrix, size (1, w, h, c) 334 | * @param f_w Window width 335 | * @param f_h Window height 336 | * @param stride_x Stride in horizontal direction 337 | * @param stride_y Stride in vertical direction 338 | * @param padding Padding type: PADDING_VALID and PADDING_SAME 339 | * @param pooling_type Pooling type: DL_POOLING_MAX and DL_POOLING_AVG 340 | * @return dl_matrix3d_t* Resulting matrix, size (1, w', h', c) 341 | */ 342 | dl_matrix3d_t *dl_matrix3d_pooling(dl_matrix3d_t *in, 343 | int f_w, 344 | 
int f_h, 345 | int stride_x, 346 | int stride_y, 347 | dl_padding_type padding, 348 | dl_pooling_type pooling_type); 349 | /** 350 | * @brief Do a batch normalization operation, update the input matrix3d: input = input * scale + offset 351 | * 352 | * @param m Input matrix3d 353 | * @param scale scale matrix3d, scale = gamma/((moving_variance+sigma)^(1/2)) 354 | * @param offset Offset matrix3d, offset = beta-(moving_mean*gamma/((moving_variance+sigma)^(1/2))) 355 | */ 356 | void dl_matrix3d_batch_normalize(dl_matrix3d_t *m, 357 | dl_matrix3d_t *scale, 358 | dl_matrix3d_t *offset); 359 | 360 | /** 361 | * @brief Add a pair of matrix3d item-by-item: res=in_1+in_2 362 | * 363 | * @param in_1 First Floating point input matrix3d 364 | * @param in_2 Second Floating point input matrix3d 365 | * 366 | * @return dl_matrix3d_t* Added data 367 | */ 368 | dl_matrix3d_t *dl_matrix3d_add(dl_matrix3d_t *in_1, dl_matrix3d_t *in_2); 369 | 370 | /** 371 | * @brief Concatenate the channels of two matrix3ds into a new matrix3d 372 | * 373 | * @param in_1 First Floating point input matrix3d 374 | * @param in_2 Second Floating point input matrix3d 375 | * 376 | * @return dl_matrix3d_t* A newly allocated matrix3d with values in_1|in_2 377 | */ 378 | dl_matrix3d_t *dl_matrix3d_concat(dl_matrix3d_t *in_1, dl_matrix3d_t *in_2); 379 | 380 | /** 381 | * @brief Concatenate the channels of four matrix3ds into a new matrix3d 382 | * 383 | * @param in_1 First Floating point input matrix3d 384 | * @param in_2 Second Floating point input matrix3d 385 | * @param in_3 Third Floating point input matrix3d 386 | * @param in_4 Fourth Floating point input matrix3d 387 | * 388 | * @return A newly allocated matrix3d with values in_1|in_2|in_3|in_4 389 | */ 390 | dl_matrix3d_t *dl_matrix3d_concat_4(dl_matrix3d_t *in_1, 391 | dl_matrix3d_t *in_2, 392 | dl_matrix3d_t *in_3, 393 | dl_matrix3d_t *in_4); 394 | 395 | /** 396 | * @brief Concatenate the channels of eight matrix3ds into a new matrix3d 397 | 
398 | * @param in_1 First Floating point input matrix3d 399 | * @param in_2 Second Floating point input matrix3d 400 | * @param in_3 Third Floating point input matrix3d 401 | * @param in_4 Fourth Floating point input matrix3d 402 | * @param in_5 Fifth Floating point input matrix3d 403 | * @param in_6 Sixth Floating point input matrix3d 404 | * @param in_7 Seventh Floating point input matrix3d 405 | * @param in_8 Eighth Floating point input matrix3d 406 | * 407 | * @return A newly allocated matrix3d with values in_1|in_2|in_3|in_4|in_5|in_6|in_7|in_8 408 | */ 409 | dl_matrix3d_t *dl_matrix3d_concat_8(dl_matrix3d_t *in_1, 410 | dl_matrix3d_t *in_2, 411 | dl_matrix3d_t *in_3, 412 | dl_matrix3d_t *in_4, 413 | dl_matrix3d_t *in_5, 414 | dl_matrix3d_t *in_6, 415 | dl_matrix3d_t *in_7, 416 | dl_matrix3d_t *in_8); 417 | 418 | /** 419 | * @brief Do a mobilefacenet block forward, dimension is (number, width, height, channel) 420 | * 421 | * @param in Input matrix3d 422 | * @param pw Weights of the pointwise conv layer 423 | * @param pw_bn_scale The scale params of the batch_normalize layer after the pointwise conv layer 424 | * @param pw_bn_offset The offset params of the batch_normalize layer after the pointwise conv layer 425 | * @param dw Weights of the depthwise conv layer 426 | * @param dw_bn_scale The scale params of the batch_normalize layer after the depthwise conv layer 427 | * @param dw_bn_offset The offset params of the batch_normalize layer after the depthwise conv layer 428 | * @param pw_linear Weights of the pointwise linear conv layer 429 | * @param pw_linear_bn_scale The scale params of the batch_normalize layer after the pointwise linear conv layer 430 | * @param pw_linear_bn_offset The offset params of the batch_normalize layer after the pointwise linear conv layer 431 | * @param stride_x The step length of the convolution window in x(width) direction 432 | * @param stride_y The step length of the convolution window in y(height) direction 433 | * @param 
padding One of VALID or SAME 434 | * @param mode Do convolution using the C implementation or the Xtensa implementation, 0 or 1 respectively. 435 | If ESP_PLATFORM is not defined, this value is not used. Default is 0 436 | * @return The result of a mobilefacenet block 437 | */ 438 | dl_matrix3d_t *dl_matrix3d_mobilefaceblock(dl_matrix3d_t *in, 439 | dl_matrix3d_t *pw, 440 | dl_matrix3d_t *pw_bn_scale, 441 | dl_matrix3d_t *pw_bn_offset, 442 | dl_matrix3d_t *dw, 443 | dl_matrix3d_t *dw_bn_scale, 444 | dl_matrix3d_t *dw_bn_offset, 445 | dl_matrix3d_t *pw_linear, 446 | dl_matrix3d_t *pw_linear_bn_scale, 447 | dl_matrix3d_t *pw_linear_bn_offset, 448 | int stride_x, 449 | int stride_y, 450 | int padding, 451 | int mode, 452 | int shortcut); 453 | 454 | /** 455 | * @brief Do a mobilefacenet block forward with 1x1 split conv, dimension is (number, width, height, channel) 456 | * 457 | * @param in Input matrix3d 458 | * @param pw_1 Weights of the pointwise conv layer 1 459 | * @param pw_2 Weights of the pointwise conv layer 2 460 | * @param pw_bn_scale The scale params of the batch_normalize layer after the pointwise conv layer 461 | * @param pw_bn_offset The offset params of the batch_normalize layer after the pointwise conv layer 462 | * @param dw Weights of the depthwise conv layer 463 | * @param dw_bn_scale The scale params of the batch_normalize layer after the depthwise conv layer 464 | * @param dw_bn_offset The offset params of the batch_normalize layer after the depthwise conv layer 465 | * @param pw_linear_1 Weights of the pointwise linear conv layer 1 466 | * @param pw_linear_2 Weights of the pointwise linear conv layer 2 467 | * @param pw_linear_bn_scale The scale params of the batch_normalize layer after the pointwise linear conv layer 468 | * @param pw_linear_bn_offset The offset params of the batch_normalize layer after the pointwise linear conv layer 469 | * @param stride_x The step length of the convolution window in x(width) direction 470 | * @param stride_y The step 
length of the convolution window in y(height) direction 471 | * @param padding One of VALID or SAME 472 | * @param mode Do convolution using the C implementation or the Xtensa implementation, 0 or 1 respectively. 473 | * If ESP_PLATFORM is not defined, this value is not used. Default is 0 474 | * @return The result of a mobilefacenet block 475 | */ 476 | dl_matrix3d_t *dl_matrix3d_mobilefaceblock_split(dl_matrix3d_t *in, 477 | dl_matrix3d_t *pw_1, 478 | dl_matrix3d_t *pw_2, 479 | dl_matrix3d_t *pw_bn_scale, 480 | dl_matrix3d_t *pw_bn_offset, 481 | dl_matrix3d_t *dw, 482 | dl_matrix3d_t *dw_bn_scale, 483 | dl_matrix3d_t *dw_bn_offset, 484 | dl_matrix3d_t *pw_linear_1, 485 | dl_matrix3d_t *pw_linear_2, 486 | dl_matrix3d_t *pw_linear_bn_scale, 487 | dl_matrix3d_t *pw_linear_bn_offset, 488 | int stride_x, 489 | int stride_y, 490 | int padding, 491 | int mode, 492 | int shortcut); 493 | 494 | /** 495 | * @brief Initialize the matrix3d feature map to bias 496 | * 497 | * @param out The matrix3d feature map that needs to be initialized 498 | * @param bias The bias of a convolution operation 499 | */ 500 | void dl_matrix3d_init_bias(dl_matrix3d_t *out, dl_matrix3d_t *bias); 501 | 502 | /** 503 | * @brief Do an elementwise multiplication of two matrix3ds 504 | * 505 | * @param out Preallocated matrix3d, size (n, w, h, c) 506 | * @param in1 Input matrix 1, size (n, w, h, c) 507 | * @param in2 Input matrix 2, size (n, w, h, c) 508 | */ 509 | void dl_matrix3d_multiply(dl_matrix3d_t *out, dl_matrix3d_t *in1, dl_matrix3d_t *in2); 510 | 511 | // 512 | // Activation 513 | // 514 | 515 | /** 516 | * @brief Do a standard relu operation, update the input matrix3d 517 | * 518 | * @param m Floating point input matrix3d 519 | */ 520 | void dl_matrix3d_relu(dl_matrix3d_t *m); 521 | 522 | /** 523 | * @brief Do a relu (Rectifier Linear Unit) operation, update the input matrix3d 524 | * 525 | * @param in Floating point input matrix3d 526 | * @param clip If value is higher than this, it will be clipped to this value 
527 | */ 528 | void dl_matrix3d_relu_clip(dl_matrix3d_t *m, fptp_t clip); 529 | 530 | /** 531 | * @brief Do a PReLU (Parametric ReLU) operation, update the input matrix3d 532 | * 533 | * @param in Floating point input matrix3d 534 | * @param alpha If a value is less than zero, it is multiplied by this factor 535 | */ 536 | void dl_matrix3d_p_relu(dl_matrix3d_t *in, dl_matrix3d_t *alpha); 537 | 538 | /** 539 | * @brief Do a leaky ReLU (Rectified Linear Unit) operation, update the input matrix3d 540 | * 541 | * @param m Floating point input matrix3d 542 | * @param alpha If a value is less than zero, it is multiplied by this factor 543 | */ 544 | void dl_matrix3d_leaky_relu(dl_matrix3d_t *m, fptp_t alpha); 545 | 546 | // 547 | // Conv 1x1 548 | // 549 | /** 550 | * @brief Do 1x1 convolution with a matrix3d 551 | * 552 | * @param out Preallocated matrix3d, size (1, w, h, n) 553 | * @param in Input matrix, size (1, w, h, c) 554 | * @param filter 1x1 filter, size (n, 1, 1, c) 555 | */ 556 | void dl_matrix3dff_conv_1x1(dl_matrix3d_t *out, 557 | dl_matrix3d_t *in, 558 | dl_matrix3d_t *filter); 559 | 560 | /** 561 | * @brief Do 1x1 convolution with a matrix3d, with bias adding 562 | * 563 | * @param out Preallocated matrix3d, size (1, w, h, n) 564 | * @param in Input matrix, size (1, w, h, c) 565 | * @param filter 1x1 filter, size (n, 1, 1, c) 566 | * @param bias Bias, size (1, 1, 1, n) 567 | */ 568 | void dl_matrix3dff_conv_1x1_with_bias(dl_matrix3d_t *out, 569 | dl_matrix3d_t *in, 570 | dl_matrix3d_t *filter, 571 | dl_matrix3d_t *bias); 572 | 573 | /** 574 | * @brief Do 1x1 convolution with an 8-bit fixed point matrix 575 | * 576 | * @param out Preallocated matrix3d, size (1, w, h, n) 577 | * @param in Input matrix, size (1, w, h, c) 578 | * @param filter 1x1 filter, size (n, 1, 1, c) 579 | */ 580 | void dl_matrix3duf_conv_1x1(dl_matrix3d_t *out, 581 | dl_matrix3du_t *in, 582 | dl_matrix3d_t *filter); 583 | 584 | /** 585 | * @brief Do 1x1
convolution with an 8-bit fixed point matrix, with bias adding 586 | * 587 | * @param out Preallocated matrix3d, size (1, w, h, n) 588 | * @param in Input matrix, size (1, w, h, c) 589 | * @param filter 1x1 filter, size (n, 1, 1, c) 590 | * @param bias Bias, size (1, 1, 1, n) 591 | */ 592 | void dl_matrix3duf_conv_1x1_with_bias(dl_matrix3d_t *out, 593 | dl_matrix3du_t *in, 594 | dl_matrix3d_t *filter, 595 | dl_matrix3d_t *bias); 596 | 597 | // 598 | // Conv 3x3 599 | // 600 | 601 | /** 602 | * @brief Do 3x3 convolution with a matrix3d, without padding 603 | * 604 | * @param out Preallocated matrix3d, size (1, w, h, n) 605 | * @param in Input matrix, size (1, w, h, c) 606 | * @param f 3x3 filter, size (n, 3, 3, c) 607 | * @param step_x Stride of width 608 | * @param step_y Stride of height 609 | */ 610 | void dl_matrix3dff_conv_3x3_op(dl_matrix3d_t *out, 611 | dl_matrix3d_t *in, 612 | dl_matrix3d_t *f, 613 | int step_x, 614 | int step_y); 615 | 616 | /** 617 | * @brief Do 3x3 convolution with a matrix3d, with bias adding 618 | * 619 | * @param input Input matrix, size (1, w, h, c) 620 | * @param filter 3x3 filter, size (n, 3, 3, c) 621 | * @param bias Bias, size (1, 1, 1, n) 622 | * @param stride_x Stride of width 623 | * @param stride_y Stride of height 624 | * @param padding Padding type 625 | * @return dl_matrix3d_t* Resulting matrix3d 626 | */ 627 | dl_matrix3d_t *dl_matrix3dff_conv_3x3(dl_matrix3d_t *in, 628 | dl_matrix3d_t *filter, 629 | dl_matrix3d_t *bias, 630 | int stride_x, 631 | int stride_y, 632 | dl_padding_type padding); 633 | 634 | // 635 | // Conv Common 636 | // 637 | 638 | /** 639 | * @brief Do a general convolution layer pass with an 8-bit fixed point matrix, size is (number, width, height, channel) 640 | * 641 | * @param in Input image 642 | * @param filter Weights of the neurons 643 | * @param bias Bias for the CNN layer 644 | * @param stride_x The step length of the convolution window in x(width) direction 645 | * @param stride_y The step 
length of the convolution window in y(height) direction 646 | * @param padding Padding type 647 | * @return dl_matrix3d_t* Resulting matrix3d 648 | */ 649 | dl_matrix3d_t *dl_matrix3duf_conv_common(dl_matrix3du_t *in, 650 | dl_matrix3d_t *filter, 651 | dl_matrix3d_t *bias, 652 | int stride_x, 653 | int stride_y, 654 | dl_padding_type padding); 655 | 656 | /** 657 | * @brief Do a general convolution layer pass, size is (number, width, height, channel) 658 | * 659 | * @param in Input image 660 | * @param filter Weights of the neurons 661 | * @param bias Bias for the CNN layer 662 | * @param stride_x The step length of the convolution window in x(width) direction 663 | * @param stride_y The step length of the convolution window in y(height) direction 664 | * @param padding Padding type 665 | * @return dl_matrix3d_t* Resulting matrix3d 666 | */ 667 | dl_matrix3d_t *dl_matrix3dff_conv_common(dl_matrix3d_t *in, 668 | dl_matrix3d_t *filter, 669 | dl_matrix3d_t *bias, 670 | int stride_x, 671 | int stride_y, 672 | dl_padding_type padding); 673 | 674 | // 675 | // Depthwise 3x3 676 | // 677 | 678 | /** 679 | * @brief Do 3x3 depthwise convolution with a float matrix3d 680 | * 681 | * @param in Input matrix, size (1, w, h, c) 682 | * @param filter 3x3 filter, size (1, 3, 3, c) 683 | * @param stride_x Stride of width 684 | * @param stride_y Stride of height 685 | * @param padding Padding type, 0: valid, 1: same 686 | * @return dl_matrix3d_t* Resulting float matrix3d 687 | */ 688 | dl_matrix3d_t *dl_matrix3dff_depthwise_conv_3x3(dl_matrix3d_t *in, 689 | dl_matrix3d_t *filter, 690 | int stride_x, 691 | int stride_y, 692 | int padding); 693 | 694 | /** 695 | * @brief Do 3x3 depthwise convolution with a 8-bit fixed point matrix 696 | * 697 | * @param in Input matrix, size (1, w, h, c) 698 | * @param filter 3x3 filter, size (1, 3, 3, c) 699 | * @param stride_x Stride of width 700 | * @param stride_y Stride of height 701 | * @param padding Padding type, 0: valid, 1: same 702 | * 
@return dl_matrix3d_t* Resulting float matrix3d 703 | */ 704 | dl_matrix3d_t *dl_matrix3duf_depthwise_conv_3x3(dl_matrix3du_t *in, 705 | dl_matrix3d_t *filter, 706 | int stride_x, 707 | int stride_y, 708 | int padding); 709 | 710 | /** 711 | * @brief Do 3x3 depthwise convolution with a float matrix3d, without padding 712 | * 713 | * @param out Preallocated matrix3d, size (1, w, h, n) 714 | * @param in Input matrix, size (1, w, h, c) 715 | * @param f 3x3 filter, size (1, 3, 3, c) 716 | * @param step_x Stride of width 717 | * @param step_y Stride of height 718 | */ 719 | void dl_matrix3dff_depthwise_conv_3x3_op(dl_matrix3d_t *out, 720 | dl_matrix3d_t *in, 721 | dl_matrix3d_t *f, 722 | int step_x, 723 | int step_y); 724 | 725 | // 726 | // Depthwise Common 727 | // 728 | 729 | /** 730 | * @brief Do a depthwise CNN layer pass, dimension is (number, width, height, channel) 731 | * 732 | * @param in Input matrix3d 733 | * @param filter Weights of the neurons 734 | * @param stride_x The step length of the convolution window in x(width) direction 735 | * @param stride_y The step length of the convolution window in y(height) direction 736 | * @param padding One of VALID or SAME 737 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 738 | * If ESP_PLATFORM is not defined, this value is not used. 
Default is 0 739 | * @return The result of depthwise CNN layer 740 | */ 741 | dl_matrix3d_t *dl_matrix3dff_depthwise_conv_common(dl_matrix3d_t *in, 742 | dl_matrix3d_t *filter, 743 | int stride_x, 744 | int stride_y, 745 | dl_padding_type padding); 746 | 747 | // 748 | // FC 749 | // 750 | /** 751 | * @brief Do a general fully connected layer pass, dimension is (number, width, height, channel) 752 | * 753 | * @param in Input matrix3d, size is (1, w, 1, 1) 754 | * @param filter Weights of the neurons, size is (1, w, h, 1) 755 | * @param bias Bias for the fc layer, size is (1, 1, 1, h) 756 | * @return The result of fc layer, size is (1, 1, 1, h) 757 | */ 758 | void dl_matrix3dff_fc(dl_matrix3d_t *out, 759 | dl_matrix3d_t *in, 760 | dl_matrix3d_t *filter); 761 | 762 | /** 763 | * @brief Do fully connected layer forward, with bias adding 764 | * 765 | * @param out Preallocated resulting matrix, size (1, 1, 1, h) 766 | * @param in Input matrix, size (1, 1, 1, w) 767 | * @param filter Filter matrix, size (1, w, h, 1) 768 | * @param bias Bias matrix, size (1, 1, 1, h) 769 | */ 770 | void dl_matrix3dff_fc_with_bias(dl_matrix3d_t *out, 771 | dl_matrix3d_t *in, 772 | dl_matrix3d_t *filter, 773 | dl_matrix3d_t *bias); 774 | 775 | // 776 | // Mobilenet 777 | // 778 | 779 | /** 780 | * @brief Do a mobilenet block forward, dimension is (number, width, height, channel) 781 | * 782 | * @param in Input matrix3d 783 | * @param filter Weights of the neurons 784 | * @param stride_x The step length of the convolution window in x(width) direction 785 | * @param stride_y The step length of the convolution window in y(height) direction 786 | * @param padding One of VALID or SAME 787 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 788 | * If ESP_PLATFORM is not defined, this value is not used. 
Default is 0 789 | * @return The result of depthwise CNN layer 790 | */ 791 | dl_matrix3d_t *dl_matrix3dff_mobilenet(dl_matrix3d_t *in, 792 | dl_matrix3d_t *dilate_filter, 793 | dl_matrix3d_t *dilate_prelu, 794 | dl_matrix3d_t *depthwise_filter, 795 | dl_matrix3d_t *depthwise_prelu, 796 | dl_matrix3d_t *compress_filter, 797 | dl_matrix3d_t *bias, 798 | dl_matrix3d_mobilenet_config_t config); 799 | 800 | /** 801 | * @brief Do a mobilenet block forward, dimension is (number, width, height, channel) 802 | * 803 | * @param in Input matrix3du 804 | * @param filter Weights of the neurons 805 | * @param stride_x The step length of the convolution window in x(width) direction 806 | * @param stride_y The step length of the convolution window in y(height) direction 807 | * @param padding One of VALID or SAME 808 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 809 | * If ESP_PLATFORM is not defined, this value is not used. Default is 0 810 | * @return The result of depthwise CNN layer 811 | */ 812 | dl_matrix3d_t *dl_matrix3duf_mobilenet(dl_matrix3du_t *in, 813 | dl_matrix3d_t *dilate_filter, 814 | dl_matrix3d_t *dilate_prelu, 815 | dl_matrix3d_t *depthwise_filter, 816 | dl_matrix3d_t *depthwise_prelu, 817 | dl_matrix3d_t *compress_filter, 818 | dl_matrix3d_t *bias, 819 | dl_matrix3d_mobilenet_config_t config); 820 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam-bare/edge-impulse-esp32-cam-bare.ino: -------------------------------------------------------------------------------- 1 | /* 2 | Live Image Classification on ESP32-CAM using MobileNet v1 from Edge Impulse 3 | Modified from https://github.com/edgeimpulse/example-esp32-cam. 4 | 5 | Note: 6 | Do not use Arduino IDE 2.0 or you won't be able to see the serial output! 
7 | */ 8 | 9 | #include // replace with your deployed Edge Impulse library 10 | 11 | #define CAMERA_MODEL_AI_THINKER 12 | 13 | #include "img_converters.h" 14 | #include "image_util.h" 15 | #include "esp_camera.h" 16 | #include "camera_pins.h" 17 | 18 | dl_matrix3du_t *resized_matrix = NULL; 19 | ei_impulse_result_t result = {0}; 20 | 21 | // setup 22 | void setup() { 23 | Serial.begin(115200); 24 | 25 | // cam config 26 | camera_config_t config; 27 | config.ledc_channel = LEDC_CHANNEL_0; 28 | config.ledc_timer = LEDC_TIMER_0; 29 | config.pin_d0 = Y2_GPIO_NUM; 30 | config.pin_d1 = Y3_GPIO_NUM; 31 | config.pin_d2 = Y4_GPIO_NUM; 32 | config.pin_d3 = Y5_GPIO_NUM; 33 | config.pin_d4 = Y6_GPIO_NUM; 34 | config.pin_d5 = Y7_GPIO_NUM; 35 | config.pin_d6 = Y8_GPIO_NUM; 36 | config.pin_d7 = Y9_GPIO_NUM; 37 | config.pin_xclk = XCLK_GPIO_NUM; 38 | config.pin_pclk = PCLK_GPIO_NUM; 39 | config.pin_vsync = VSYNC_GPIO_NUM; 40 | config.pin_href = HREF_GPIO_NUM; 41 | config.pin_sscb_sda = SIOD_GPIO_NUM; 42 | config.pin_sscb_scl = SIOC_GPIO_NUM; 43 | config.pin_pwdn = PWDN_GPIO_NUM; 44 | config.pin_reset = RESET_GPIO_NUM; 45 | config.xclk_freq_hz = 20000000; 46 | config.pixel_format = PIXFORMAT_JPEG; 47 | config.frame_size = FRAMESIZE_240X240; 48 | config.jpeg_quality = 10; 49 | config.fb_count = 1; 50 | 51 | // camera init 52 | esp_err_t err = esp_camera_init(&config); 53 | if (err != ESP_OK) { 54 | Serial.printf("Camera init failed with error 0x%x", err); 55 | return; 56 | } 57 | 58 | sensor_t * s = esp_camera_sensor_get(); 59 | // initial sensors are flipped vertically and colors are a bit saturated 60 | if (s->id.PID == OV3660_PID) { 61 | s->set_vflip(s, 1); // flip it back 62 | s->set_brightness(s, 1); // up the brightness just a bit 63 | s->set_saturation(s, 0); // lower the saturation 64 | } 65 | 66 | Serial.println("Camera Ready!"); 67 | } 68 | 69 | // main loop 70 | void loop() { 71 | 72 | // capture a image and classify it 73 | String result = classify(); 74 | 75 | // 
display result 76 | Serial.printf("Result: %s\n", result.c_str()); 77 | } 78 | 79 | // classify labels 80 | String classify() { 81 | 82 | // run image capture once to force clear buffer 83 | // otherwise the captured image below would only show up the next time you press the button! 84 | capture_quick(); 85 | 86 | // capture image from camera 87 | if (!capture()) return "Error"; 88 | 89 | Serial.println("Getting image..."); 90 | signal_t signal; 91 | signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT; 92 | signal.get_data = &raw_feature_get_data; 93 | 94 | Serial.println("Run classifier..."); 95 | // Feed signal to the classifier 96 | EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false /* debug */); 97 | // --- Free memory --- 98 | dl_matrix3du_free(resized_matrix); 99 | 100 | // --- "res" holds the returned error code; the predictions are stored in "result" --- 101 | ei_printf("run_classifier returned: %d\n", res); 102 | if (res != 0) return "Error"; 103 | 104 | // --- print the predictions --- 105 | ei_printf("Predictions (DSP: %d ms., Classification: %d ms., Anomaly: %d ms.): \n", 106 | result.timing.dsp, result.timing.classification, result.timing.anomaly); 107 | int index = 0; 108 | float score = 0.0; 109 | for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) { 110 | // record the most probable label 111 | if (result.classification[ix].value > score) { 112 | score = result.classification[ix].value; 113 | index = ix; 114 | } 115 | ei_printf("    %s: \t%f\r\n", result.classification[ix].label, result.classification[ix].value); 116 | } 117 | 118 | #if EI_CLASSIFIER_HAS_ANOMALY == 1 119 | ei_printf("    anomaly score: %f\r\n", result.anomaly); 120 | #endif 121 | 122 | // --- return the most probable label --- 123 | return String(result.classification[index].label); 124 | } 125 | 126 | // quick capture (to clear buffer) 127 | void capture_quick() { 128 | camera_fb_t *fb = NULL; 129 | fb = esp_camera_fb_get(); 130 | if (!fb) return; 131 | 
esp_camera_fb_return(fb); 132 | } 133 | 134 | // capture image from cam 135 | bool capture() { 136 | 137 | Serial.println("Capture image..."); 138 | esp_err_t res = ESP_OK; 139 | camera_fb_t *fb = NULL; 140 | fb = esp_camera_fb_get(); 141 | if (!fb) { 142 | Serial.println("Camera capture failed"); 143 | return false; 144 | } 145 | 146 | // --- Convert frame to RGB888 --- 147 | Serial.println("Converting to RGB888..."); 148 | // Allocate rgb888_matrix buffer 149 | dl_matrix3du_t *rgb888_matrix = dl_matrix3du_alloc(1, fb->width, fb->height, 3); 150 | fmt2rgb888(fb->buf, fb->len, fb->format, rgb888_matrix->item); 151 | 152 | // --- Resize the RGB888 frame to 96x96 in this example --- 153 | Serial.println("Resizing the frame buffer..."); 154 | resized_matrix = dl_matrix3du_alloc(1, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT, 3); 155 | image_resize_linear(resized_matrix->item, rgb888_matrix->item, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT, 3, fb->width, fb->height); 156 | 157 | // --- Free memory --- 158 | dl_matrix3du_free(rgb888_matrix); 159 | esp_camera_fb_return(fb); 160 | 161 | return true; 162 | } 163 | 164 | int raw_feature_get_data(size_t offset, size_t out_len, float *signal_ptr) { 165 | 166 | size_t pixel_ix = offset * 3; 167 | size_t bytes_left = out_len; 168 | size_t out_ptr_ix = 0; 169 | 170 | // read byte for byte 171 | while (bytes_left != 0) { 172 | // grab the values and convert to r/g/b 173 | uint8_t r, g, b; 174 | r = resized_matrix->item[pixel_ix]; 175 | g = resized_matrix->item[pixel_ix + 1]; 176 | b = resized_matrix->item[pixel_ix + 2]; 177 | 178 | // then convert to out_ptr format 179 | float pixel_f = (r << 16) + (g << 8) + b; 180 | signal_ptr[out_ptr_ix] = pixel_f; 181 | 182 | // and go to the next pixel 183 | out_ptr_ix++; 184 | pixel_ix += 3; 185 | bytes_left--; 186 | } 187 | 188 | return 0; 189 | } 190 | -------------------------------------------------------------------------------- 
/edge-impulse-esp32-cam-bare/esp_image.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * ESPRESSIF MIT License 3 | * 4 | * Copyright (c) 2018 5 | * 6 | * Permission is hereby granted for use on ESPRESSIF SYSTEMS products only, in which case, 7 | * it is free of charge, to any person obtaining a copy of this software and associated 8 | * documentation files (the "Software"), to deal in the Software without restriction, including 9 | * without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 10 | * and/or sell copies of the Software, and to permit persons to whom the Software is furnished 11 | * to do so, subject to the following conditions: 12 | * 13 | * The above copyright notice and this permission notice shall be included in all copies or 14 | * substantial portions of the Software. 15 | * 16 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 18 | * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 19 | * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 20 | * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 21 | * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
22 | * 23 | */ 24 | #pragma once 25 | 26 | #ifdef __cplusplus 27 | extern "C" 28 | { 29 | #endif 30 | 31 | #include 32 | #include 33 | #include 34 | 35 | #ifdef __cplusplus 36 | } 37 | #endif 38 | 39 | typedef enum 40 | { 41 | IMAGE_RESIZE_BILINEAR = 0, /*!< resize by bilinear interpolation of four pixels */ 42 | IMAGE_RESIZE_MEAN, /*!< resize by taking the mean of four pixels */ 43 | IMAGE_RESIZE_NEAREST /*!< resize by taking the nearest pixel */ 44 | } image_resize_t; 45 | 46 | template <typename T> 47 | class Image 48 | { 49 | public: 50 | /** 51 | * @brief Convert an RGB565 pixel to RGB888 52 | * 53 | * @param input Pixel value in RGB565 54 | * @param output Pixel value in RGB888 55 | */ 56 | static inline void pixel_rgb565_to_rgb888(uint16_t input, T *output) 57 | { 58 | output[2] = (input & 0x1F00) >> 5; //blue 59 | output[1] = ((input & 0x7) << 5) | ((input & 0xE000) >> 11); //green 60 | output[0] = input & 0xF8; //red 61 | }; 62 | 63 | /** 64 | * @brief Resize an RGB565 image to an RGB888 image 65 | * 66 | * @param dst_image The destination image 67 | * @param y_start The start y index of where resized image located 68 | * @param y_end The end y index of where resized image located 69 | * @param x_start The start x index of where resized image located 70 | * @param x_end The end x index of where resized image located 71 | * @param channel The channel number of image 72 | * @param src_image The source image 73 | * @param src_h The height of source image 74 | * @param src_w The width of source image 75 | * @param dst_w The width of destination image 76 | * @param shift_left The bit number of left shifting 77 | * @param type The resize type 78 | */ 79 | static void resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint16_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 80 | 81 | /** 82 | * @brief Resize an RGB888 image to an RGB888 image 83 | * 84 | * @param dst_image The destination image 85 | * @param y_start The start y index of where resized image located 86 | * @param y_end The end y index of where resized image located 87 | * @param x_start The start x index of where resized image located 88 | * @param x_end The end x index of
where resized image located 89 | * @param channel The channel number of image 90 | * @param src_image The source image 91 | * @param src_h The height of source image 92 | * @param src_w The width of source image 93 | * @param dst_w The width of destination image 94 | * @param shift_left The bit number of left shifting 95 | * @param type The resize type 96 | */ 97 | static void resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint8_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 98 | // static void resize_to_rgb565(uint16_t *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint16_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 99 | // static void resize_to_rgb565(uint16_t *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint8_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 100 | }; 101 | 102 | template <typename T> 103 | void Image<T>::resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint16_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type) 104 | { 105 | assert(channel == 3); 106 | float scale_y = (float)src_h / (y_end - y_start); 107 | float scale_x = (float)src_w / (x_end - x_start); 108 | int temp[13]; 109 | 110 | switch (type) 111 | { 112 | case IMAGE_RESIZE_BILINEAR: 113 | for (size_t y = y_start; y < y_end; y++) 114 | { 115 | float ratio_y[2]; 116 | ratio_y[0] = (float)((y + 0.5) * scale_y - 0.5); // y 117 | int src_y = (int)ratio_y[0]; // y1 118 | ratio_y[0] -= src_y; // y - y1 119 | 120 | if (src_y < 0) 121 | { 122 | ratio_y[0] = 0; 123 | src_y = 0; 124 | } 125 | if (src_y > src_h - 2) 126 | { 127 | ratio_y[0] = 0; 128 | src_y = src_h - 2; 129 | } 130 | ratio_y[1] = 1 - ratio_y[0]; // y2 - y 131 | 132 | int _dst_i = y * dst_w; 133 | 134 | int _src_row_0 = src_y * src_w; 135 | int _src_row_1 = 
_src_row_0 + src_w; 136 | 137 | for (size_t x = x_start; x < x_end; x++) 138 | { 139 | float ratio_x[2]; 140 | ratio_x[0] = (float)((x + 0.5) * scale_x - 0.5); // x 141 | int src_x = (int)ratio_x[0]; // x1 142 | ratio_x[0] -= src_x; // x - x1 143 | 144 | if (src_x < 0) 145 | { 146 | ratio_x[0] = 0; 147 | src_x = 0; 148 | } 149 | if (src_x > src_w - 2) 150 | { 151 | ratio_x[0] = 0; 152 | src_x = src_w - 2; 153 | } 154 | ratio_x[1] = 1 - ratio_x[0]; // x2 - x 155 | 156 | int dst_i = (_dst_i + x) * channel; 157 | 158 | int src_row_0 = _src_row_0 + src_x; 159 | int src_row_1 = _src_row_1 + src_x; 160 | 161 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_0], temp); 162 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_0 + 1], temp + 3); 163 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_1], temp + 6); 164 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_1 + 1], temp + 9); 165 | 166 | for (int c = 0; c < channel; c++) 167 | { 168 | temp[12] = round(temp[c] * ratio_x[1] * ratio_y[1] + temp[channel + c] * ratio_x[0] * ratio_y[1] + temp[channel + channel + c] * ratio_x[1] * ratio_y[0] + temp[channel + channel + channel + c] * ratio_x[0] * ratio_y[0]); 169 | dst_image[dst_i + c] = (shift_left > 0) ?
(temp[12] << shift_left) : (temp[12] >> -shift_left); 170 | } 171 | } 172 | } 173 | break; 174 | 175 | case IMAGE_RESIZE_MEAN: 176 | shift_left -= 2; 177 | for (int y = y_start; y < y_end; y++) 178 | { 179 | int _dst_i = y * dst_w; 180 | 181 | float _src_row_0 = rintf(y * scale_y) * src_w; 182 | float _src_row_1 = _src_row_0 + src_w; 183 | 184 | for (int x = x_start; x < x_end; x++) 185 | { 186 | int dst_i = (_dst_i + x) * channel; 187 | 188 | int src_row_0 = (_src_row_0 + rintf(x * scale_x)); 189 | int src_row_1 = (_src_row_1 + rintf(x * scale_x)); 190 | 191 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_0], temp); 192 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_0 + 1], temp + 3); 193 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_1], temp + 6); 194 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_row_1 + 1], temp + 9); 195 | 196 | dst_image[dst_i] = (shift_left > 0) ? ((temp[0] + temp[3] + temp[6] + temp[9]) << shift_left) : ((temp[0] + temp[3] + temp[6] + temp[9]) >> -shift_left); 197 | dst_image[dst_i + 1] = (shift_left > 0) ? ((temp[1] + temp[4] + temp[7] + temp[10]) << shift_left) : ((temp[1] + temp[4] + temp[7] + temp[10]) >> -shift_left); 198 | dst_image[dst_i + 2] = (shift_left > 0) ? ((temp[2] + temp[5] + temp[8] + temp[11]) << shift_left) : ((temp[2] + temp[5] + temp[8] + temp[11]) >> -shift_left); 199 | } 200 | } 201 | 202 | break; 203 | 204 | case IMAGE_RESIZE_NEAREST: 205 | for (size_t y = y_start; y < y_end; y++) 206 | { 207 | int _dst_i = y * dst_w; 208 | float _src_i = rintf(y * scale_y) * src_w; 209 | 210 | for (size_t x = x_start; x < x_end; x++) 211 | { 212 | int dst_i = (_dst_i + x) * channel; 213 | int src_i = _src_i + rintf(x * scale_x); 214 | 215 | Image<T>::pixel_rgb565_to_rgb888(src_image[src_i], temp); 216 | 217 | dst_image[dst_i] = (shift_left > 0) ? (temp[0] << shift_left) : (temp[0] >> -shift_left); 218 | dst_image[dst_i + 1] = (shift_left > 0) ?
(temp[1] << shift_left) : (temp[1] >> -shift_left); 219 | dst_image[dst_i + 2] = (shift_left > 0) ? (temp[2] << shift_left) : (temp[2] >> -shift_left); 220 | } 221 | } 222 | break; 223 | 224 | default: 225 | break; 226 | } 227 | } 228 | 229 | template <typename T> 230 | void Image<T>::resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint8_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type) 231 | { 232 | float scale_y = (float)src_h / (y_end - y_start); 233 | float scale_x = (float)src_w / (x_end - x_start); 234 | int temp; 235 | 236 | switch (type) 237 | { 238 | case IMAGE_RESIZE_BILINEAR: 239 | for (size_t y = y_start; y < y_end; y++) 240 | { 241 | float ratio_y[2]; 242 | ratio_y[0] = (float)((y + 0.5) * scale_y - 0.5); // y 243 | int src_y = (int)ratio_y[0]; // y1 244 | ratio_y[0] -= src_y; // y - y1 245 | 246 | if (src_y < 0) 247 | { 248 | ratio_y[0] = 0; 249 | src_y = 0; 250 | } 251 | if (src_y > src_h - 2) 252 | { 253 | ratio_y[0] = 0; 254 | src_y = src_h - 2; 255 | } 256 | ratio_y[1] = 1 - ratio_y[0]; // y2 - y 257 | 258 | int _dst_i = y * dst_w; 259 | 260 | int _src_row_0 = src_y * src_w; 261 | int _src_row_1 = _src_row_0 + src_w; 262 | 263 | for (size_t x = x_start; x < x_end; x++) 264 | { 265 | float ratio_x[2]; 266 | ratio_x[0] = (float)((x + 0.5) * scale_x - 0.5); // x 267 | int src_x = (int)ratio_x[0]; // x1 268 | ratio_x[0] -= src_x; // x - x1 269 | 270 | if (src_x < 0) 271 | { 272 | ratio_x[0] = 0; 273 | src_x = 0; 274 | } 275 | if (src_x > src_w - 2) 276 | { 277 | ratio_x[0] = 0; 278 | src_x = src_w - 2; 279 | } 280 | ratio_x[1] = 1 - ratio_x[0]; // x2 - x 281 | 282 | int dst_i = (_dst_i + x) * channel; 283 | 284 | int src_row_0 = (_src_row_0 + src_x) * channel; 285 | int src_row_1 = (_src_row_1 + src_x) * channel; 286 | 287 | for (int c = 0; c < channel; c++) 288 | { 289 | temp = round(src_image[src_row_0 + c] * ratio_x[1] * ratio_y[1] + src_image[src_row_0 + channel + c] * ratio_x[0] *
ratio_y[1] + src_image[src_row_1 + c] * ratio_x[1] * ratio_y[0] + src_image[src_row_1 + channel + c] * ratio_x[0] * ratio_y[0]); 290 | dst_image[dst_i + c] = (shift_left > 0) ? (temp << shift_left) : (temp >> -shift_left); 291 | } 292 | } 293 | } 294 | break; 295 | 296 | case IMAGE_RESIZE_MEAN: 297 | shift_left -= 2; 298 | 299 | for (size_t y = y_start; y < y_end; y++) 300 | { 301 | int _dst_i = y * dst_w; 302 | 303 | float _src_row_0 = rintf(y * scale_y) * src_w; 304 | float _src_row_1 = _src_row_0 + src_w; 305 | 306 | for (size_t x = x_start; x < x_end; x++) 307 | { 308 | int dst_i = (_dst_i + x) * channel; 309 | 310 | int src_row_0 = (_src_row_0 + rintf(x * scale_x)) * channel; 311 | int src_row_1 = (_src_row_1 + rintf(x * scale_x)) * channel; 312 | 313 | for (size_t c = 0; c < channel; c++) 314 | { 315 | temp = (int)src_image[src_row_0 + c] + (int)src_image[src_row_0 + channel + c] + (int)src_image[src_row_1 + c] + (int)src_image[src_row_1 + channel + c]; 316 | dst_image[dst_i + c] = (shift_left > 0) ? (temp << shift_left) : (temp >> -shift_left); 317 | } 318 | } 319 | } 320 | break; 321 | 322 | case IMAGE_RESIZE_NEAREST: 323 | for (size_t y = y_start; y < y_end; y++) 324 | { 325 | int _dst_i = y * dst_w; 326 | float _src_i = rintf(y * scale_y) * src_w; 327 | 328 | for (size_t x = x_start; x < x_end; x++) 329 | { 330 | int dst_i = (_dst_i + x) * channel; 331 | int src_i = (_src_i + rintf(x * scale_x)) * channel; 332 | 333 | for (size_t c = 0; c < channel; c++) 334 | { 335 | dst_image[dst_i + c] = (shift_left > 0) ? 
((T)src_image[src_i + c] << shift_left) : ((T)src_image[src_i + c] >> -shift_left); 336 | } 337 | } 338 | } 339 | break; 340 | 341 | default: 342 | break; 343 | } 344 | } -------------------------------------------------------------------------------- /edge-impulse-esp32-cam-bare/frmn.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #if __cplusplus 4 | extern "C" 5 | { 6 | #endif 7 | 8 | #include "dl_lib_matrix3d.h" 9 | #include "dl_lib_matrix3dq.h" 10 | 11 | /** 12 | * @brief Forward the face recognition process with frmn model. Calculate in float. 13 | * 14 | * @param in Image matrix, rgb888 format, size is 56x56, normalized 15 | * @return dl_matrix3d_t* Face ID feature vector, size is 512 16 | */ 17 | dl_matrix3d_t *frmn(dl_matrix3d_t *in); 18 | 19 | /**@{*/ 20 | /** 21 | * @brief Forward the face recognition process with specified model. Calculate in quantization. 22 | * 23 | * @param in Image matrix, rgb888 format, size is 56x56, normalized 24 | * @param mode 0: C implement; 1: handwrite xtensa instruction implement 25 | * @return Face ID feature vector, size is 512 26 | */ 27 | dl_matrix3dq_t *frmn_q(dl_matrix3dq_t *in, dl_conv_mode mode); 28 | 29 | dl_matrix3dq_t *frmn2p_q(dl_matrix3dq_t *in, dl_conv_mode mode); 30 | 31 | dl_matrix3dq_t *mfn56_42m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 32 | 33 | dl_matrix3dq_t *mfn56_72m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 34 | 35 | dl_matrix3dq_t *mfn56_112m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 36 | 37 | dl_matrix3dq_t *mfn56_156m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 38 | 39 | /**@}*/ 40 | 41 | #if __cplusplus 42 | } 43 | #endif 44 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam-bare/image_util.h: -------------------------------------------------------------------------------- 1 | /* 2 | * ESPRESSIF MIT License 3 | * 4 | * Copyright (c) 2018 5 | * 6 | * Permission is hereby granted for 
use on ESPRESSIF SYSTEMS products only, in which case, 7 | * it is free of charge, to any person obtaining a copy of this software and associated 8 | * documentation files (the "Software"), to deal in the Software without restriction, including 9 | * without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 10 | * and/or sell copies of the Software, and to permit persons to whom the Software is furnished 11 | * to do so, subject to the following conditions: 12 | * 13 | * The above copyright notice and this permission notice shall be included in all copies or 14 | * substantial portions of the Software. 15 | * 16 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 18 | * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 19 | * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 20 | * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 21 | * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 22 | * 23 | */ 24 | #pragma once 25 | #ifdef __cplusplus 26 | extern "C" 27 | { 28 | #endif 29 | #include 30 | #include 31 | #include "mtmn.h" 32 | 33 | #define LANDMARKS_NUM (10) 34 | 35 | #define MAX_VALID_COUNT_PER_IMAGE (30) 36 | 37 | #define DL_IMAGE_MIN(A, B) ((A) < (B) ? (A) : (B)) 38 | #define DL_IMAGE_MAX(A, B) ((A) < (B) ? 
(B) : (A)) 39 | 40 | #define RGB565_MASK_RED 0xF800 41 | #define RGB565_MASK_GREEN 0x07E0 42 | #define RGB565_MASK_BLUE 0x001F 43 | 44 | typedef enum 45 | { 46 | BINARY, /*!< binary */ 47 | } en_threshold_mode; 48 | 49 | typedef struct 50 | { 51 | fptp_t landmark_p[LANDMARKS_NUM]; /*!< landmark struct */ 52 | } landmark_t; 53 | 54 | typedef struct 55 | { 56 | fptp_t box_p[4]; /*!< box struct */ 57 | } box_t; 58 | 59 | typedef struct tag_box_list 60 | { 61 | uint8_t *category; /*!< The category of the corresponding box */ 62 | fptp_t *score; /*!< The confidence score of the class corresponding to the box */ 63 | box_t *box; /*!< Anchor boxes or predicted boxes*/ 64 | landmark_t *landmark; /*!< The landmarks corresponding to the box */ 65 | int len; /*!< The num of the boxes */ 66 | } box_array_t; 67 | 68 | typedef struct tag_image_box 69 | { 70 | struct tag_image_box *next; /*!< Next image_box_t */ 71 | uint8_t category; 72 | fptp_t score; /*!< The confidence score of the class corresponding to the box */ 73 | box_t box; /*!< Anchor boxes or predicted boxes */ 74 | box_t offset; /*!< The predicted anchor-based offset */ 75 | landmark_t landmark; /*!< The landmarks corresponding to the box */ 76 | } image_box_t; 77 | 78 | typedef struct tag_image_list 79 | { 80 | image_box_t *head; /*!< The current head of the image_list */ 81 | image_box_t *origin_head; /*!< The original head of the image_list */ 82 | int len; /*!< Length of the image_list */ 83 | } image_list_t; 84 | 85 | /** 86 | * @brief Get the width and height of the box. 87 | * 88 | * @param box Input box 89 | * @param w Resulting width of the box 90 | * @param h Resulting height of the box 91 | */ 92 | static inline void image_get_width_and_height(box_t *box, float *w, float *h) 93 | { 94 | *w = box->box_p[2] - box->box_p[0] + 1; 95 | *h = box->box_p[3] - box->box_p[1] + 1; 96 | } 97 | 98 | /** 99 | * @brief Get the area of the box. 
100 | * 101 | * @param box Input box 102 | * @param area Resulting area of the box 103 | */ 104 | static inline void image_get_area(box_t *box, float *area) 105 | { 106 | float w, h; 107 | image_get_width_and_height(box, &w, &h); 108 | *area = w * h; 109 | } 110 | 111 | /** 112 | * @brief calibrate the boxes by offset 113 | * 114 | * @param image_list Input boxes 115 | * @param image_height Height of the original image 116 | * @param image_width Width of the original image 117 | */ 118 | static inline void image_calibrate_by_offset(image_list_t *image_list, int image_height, int image_width) 119 | { 120 | for (image_box_t *head = image_list->head; head; head = head->next) 121 | { 122 | float w, h; 123 | image_get_width_and_height(&(head->box), &w, &h); 124 | head->box.box_p[0] = DL_IMAGE_MAX(0, head->box.box_p[0] + head->offset.box_p[0] * w); 125 | head->box.box_p[1] = DL_IMAGE_MAX(0, head->box.box_p[1] + head->offset.box_p[1] * w); 126 | head->box.box_p[2] += head->offset.box_p[2] * w; 127 | if (head->box.box_p[2] > image_width) 128 | { 129 | head->box.box_p[2] = image_width - 1; 130 | head->box.box_p[0] = image_width - w; 131 | } 132 | head->box.box_p[3] += head->offset.box_p[3] * h; 133 | if (head->box.box_p[3] > image_height) 134 | { 135 | head->box.box_p[3] = image_height - 1; 136 | head->box.box_p[1] = image_height - h; 137 | } 138 | } 139 | } 140 | 141 | /** 142 | * @brief calibrate the landmarks 143 | * 144 | * @param image_list Input landmarks 145 | */ 146 | static inline void image_landmark_calibrate(image_list_t *image_list) 147 | { 148 | for (image_box_t *head = image_list->head; head; head = head->next) 149 | { 150 | float w, h; 151 | image_get_width_and_height(&(head->box), &w, &h); 152 | head->landmark.landmark_p[0] = head->box.box_p[0] + head->landmark.landmark_p[0] * w; 153 | head->landmark.landmark_p[1] = head->box.box_p[1] + head->landmark.landmark_p[1] * h; 154 | 155 | head->landmark.landmark_p[2] = head->box.box_p[0] + 
head->landmark.landmark_p[2] * w; 156 | head->landmark.landmark_p[3] = head->box.box_p[1] + head->landmark.landmark_p[3] * h; 157 | 158 | head->landmark.landmark_p[4] = head->box.box_p[0] + head->landmark.landmark_p[4] * w; 159 | head->landmark.landmark_p[5] = head->box.box_p[1] + head->landmark.landmark_p[5] * h; 160 | 161 | head->landmark.landmark_p[6] = head->box.box_p[0] + head->landmark.landmark_p[6] * w; 162 | head->landmark.landmark_p[7] = head->box.box_p[1] + head->landmark.landmark_p[7] * h; 163 | 164 | head->landmark.landmark_p[8] = head->box.box_p[0] + head->landmark.landmark_p[8] * w; 165 | head->landmark.landmark_p[9] = head->box.box_p[1] + head->landmark.landmark_p[9] * h; 166 | } 167 | } 168 | 169 | /** 170 | * @brief Convert a rectangular box into a square box 171 | * 172 | * @param boxes Input boxes 173 | * @param width Width of the original image 174 | * @param height Height of the original image 175 | */ 176 | static inline void image_rect2sqr(box_array_t *boxes, int width, int height) 177 | { 178 | for (int i = 0; i < boxes->len; i++) 179 | { 180 | box_t *box = &(boxes->box[i]); 181 | 182 | int x1 = round(box->box_p[0]); 183 | int y1 = round(box->box_p[1]); 184 | int x2 = round(box->box_p[2]); 185 | int y2 = round(box->box_p[3]); 186 | 187 | int w = x2 - x1 + 1; 188 | int h = y2 - y1 + 1; 189 | int l = DL_IMAGE_MAX(w, h); 190 | 191 | box->box_p[0] = DL_IMAGE_MAX(round(DL_IMAGE_MAX(0, x1) + 0.5 * (w - l)), 0); 192 | box->box_p[1] = DL_IMAGE_MAX(round(DL_IMAGE_MAX(0, y1) + 0.5 * (h - l)), 0); 193 | 194 | box->box_p[2] = box->box_p[0] + l - 1; 195 | if (box->box_p[2] > width) 196 | { 197 | box->box_p[2] = width - 1; 198 | box->box_p[0] = width - l; 199 | } 200 | box->box_p[3] = box->box_p[1] + l - 1; 201 | if (box->box_p[3] > height) 202 | { 203 | box->box_p[3] = height - 1; 204 | box->box_p[1] = height - l; 205 | } 206 | } 207 | } 208 | 209 | /**@{*/ 210 | /** 211 | * @brief Convert RGB565 image to RGB888 image 212 | * 213 | * @param in Input RGB565 
image 214 | * @param dst Resulting RGB888 image 215 | */ 216 | static inline void rgb565_to_888(uint16_t in, uint8_t *dst) 217 | { /*{{{*/ 218 | in = (in & 0xFF) << 8 | (in & 0xFF00) >> 8; 219 | dst[2] = (in & RGB565_MASK_BLUE) << 3; // blue 220 | dst[1] = (in & RGB565_MASK_GREEN) >> 3; // green 221 | dst[0] = (in & RGB565_MASK_RED) >> 8; // red 222 | 223 | // dst[0] = (in & 0x1F00) >> 5; 224 | // dst[1] = ((in & 0x7) << 5) | ((in & 0xE000) >> 11); 225 | // dst[2] = in & 0xF8; 226 | } /*}}}*/ 227 | 228 | static inline void rgb565_to_888_q16(uint16_t in, int16_t *dst) 229 | { /*{{{*/ 230 | in = (in & 0xFF) << 8 | (in & 0xFF00) >> 8; 231 | dst[2] = (in & RGB565_MASK_BLUE) << 3; // blue 232 | dst[1] = (in & RGB565_MASK_GREEN) >> 3; // green 233 | dst[0] = (in & RGB565_MASK_RED) >> 8; // red 234 | 235 | // dst[0] = (in & 0x1F00) >> 5; 236 | // dst[1] = ((in & 0x7) << 5) | ((in & 0xE000) >> 11); 237 | // dst[2] = in & 0xF8; 238 | } /*}}}*/ 239 | /**@}*/ 240 | 241 | /** 242 | * @brief Convert RGB888 image to RGB565 image 243 | * 244 | * @param in Resulting RGB565 image 245 | * @param r The red channel of the Input RGB888 image 246 | * @param g The green channel of the Input RGB888 image 247 | * @param b The blue channel of the Input RGB888 image 248 | */ 249 | static inline void rgb888_to_565(uint16_t *in, uint8_t r, uint8_t g, uint8_t b) 250 | { /*{{{*/ 251 | uint16_t rgb565 = 0; 252 | rgb565 = ((r >> 3) << 11); 253 | rgb565 |= ((g >> 2) << 5); 254 | rgb565 |= (b >> 3); 255 | rgb565 = (rgb565 & 0xFF) << 8 | (rgb565 & 0xFF00) >> 8; 256 | *in = rgb565; 257 | } /*}}}*/ 258 | 259 | /** 260 | * @brief Filter out the resulting boxes whose confidence score is lower than the threshold and convert the boxes to the actual boxes on the original image.((x, y, w, h) -> (x1, y1, x2, y2)) 261 | * 262 | * @param score Confidence score of the boxes 263 | * @param offset The predicted anchor-based offset 264 | * @param landmark The landmarks corresponding to the box 265 | * @param width 
Width of the original image 266 | * @param height Height of the original image 267 | * @param anchor_number Anchor number of the detection output feature map 268 | * @param anchors_size The anchor size 269 | * @param score_threshold Threshold of the confidence score 270 | * @param stride 271 | * @param resized_height_scale 272 | * @param resized_width_scale 273 | * @param do_regression 274 | * @return image_list_t* 275 | */ 276 | image_list_t *image_get_valid_boxes(fptp_t *score, 277 | fptp_t *offset, 278 | fptp_t *landmark, 279 | int width, 280 | int height, 281 | int anchor_number, 282 | int *anchors_size, 283 | fptp_t score_threshold, 284 | int stride, 285 | fptp_t resized_height_scale, 286 | fptp_t resized_width_scale, 287 | bool do_regression); 288 | /** 289 | * @brief Sort the resulting box lists by their confidence score. 290 | * 291 | * @param image_sorted_list The sorted box list. 292 | * @param insert_list The box list that has not been sorted. 293 | */ 294 | void image_sort_insert_by_score(image_list_t *image_sorted_list, const image_list_t *insert_list); 295 | 296 | /** 297 | * @brief Run NMS algorithm 298 | * 299 | * @param image_list The input boxes list 300 | * @param nms_threshold NMS threshold 301 | * @param same_area The flag of boxes with same area 302 | */ 303 | void image_nms_process(image_list_t *image_list, fptp_t nms_threshold, int same_area); 304 | 305 | /** 306 | * @brief Resize an image to half size 307 | * 308 | * @param dimage The output image 309 | * @param dw Width of the output image 310 | * @param dh Height of the output image 311 | * @param dc Channel of the output image 312 | * @param simage Source image 313 | * @param sw Width of the source image 314 | * @param sc Channel of the source image 315 | */ 316 | void image_zoom_in_twice(uint8_t *dimage, 317 | int dw, 318 | int dh, 319 | int dc, 320 | uint8_t *simage, 321 | int sw, 322 | int sc); 323 | 324 | /** 325 | * @brief Resize the image in RGB888 format via bilinear 
interpolation 326 | * 327 | * @param dst_image The output image 328 | * @param src_image Source image 329 | * @param dst_w Width of the output image 330 | * @param dst_h Height of the output image 331 | * @param dst_c Channel of the output image 332 | * @param src_w Width of the source image 333 | * @param src_h Height of the source image 334 | */ 335 | void image_resize_linear(uint8_t *dst_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h); 336 | 337 | /** 338 | * @brief Crop, rotate and zoom the image in RGB888 format. 339 | * 340 | * @param corp_image The output image 341 | * @param src_image Source image 342 | * @param rotate_angle Rotation angle 343 | * @param ratio Scaling ratio 344 | * @param center Center of rotation 345 | */ 346 | void image_cropper(uint8_t *corp_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h, float rotate_angle, float ratio, float *center); 347 | 348 | /** 349 | * @brief Convert the rgb565 image to the rgb888 image 350 | * 351 | * @param m The output rgb888 image 352 | * @param bmp The input rgb565 image 353 | * @param count Total pixels of the rgb565 image 354 | */ 355 | void image_rgb565_to_888(uint8_t *m, uint16_t *bmp, int count); 356 | 357 | /** 358 | * @brief Convert the rgb888 image to the rgb565 image 359 | * 360 | * @param bmp The output rgb565 image 361 | * @param m The input rgb888 image 362 | * @param count Total pixels of the rgb888 image 363 | */ 364 | void image_rgb888_to_565(uint16_t *bmp, uint8_t *m, int count); 365 | 366 | /** 367 | * @brief Draw rectangles on the rgb565 image 368 | * 369 | * @param buf Input image 370 | * @param boxes Rectangle boxes 371 | * @param width Width of the input image 372 | */ 373 | void draw_rectangle_rgb565(uint16_t *buf, box_array_t *boxes, int width); 374 | 375 | /** 376 | * @brief Draw rectangles on the rgb888 image 377 | * 378 | * @param buf Input image 379 | * @param boxes Rectangle boxes 380 | * @param width Width of the 
input image 381 | */ 382 | void draw_rectangle_rgb888(uint8_t *buf, box_array_t *boxes, int width); 383 | 384 | /** 385 | * @brief Get the pixel difference of two images 386 | * 387 | * @param dst The output pixel difference 388 | * @param src1 Input image 1 389 | * @param src2 Input image 2 390 | * @param count Total pixels of the input image 391 | */ 392 | void image_abs_diff(uint8_t *dst, uint8_t *src1, uint8_t *src2, int count); 393 | 394 | /** 395 | * @brief Binarize an image to 0 and value. 396 | * 397 | * @param dst The output image 398 | * @param src Source image 399 | * @param threshold Threshold of binarization 400 | * @param value The value of binarization 401 | * @param count Total pixels of the input image 402 | * @param mode Threshold mode 403 | */ 404 | void image_threshold(uint8_t *dst, uint8_t *src, int threshold, int value, int count, en_threshold_mode mode); 405 | 406 | /** 407 | * @brief Erode the image 408 | * 409 | * @param dst The output image 410 | * @param src Source image 411 | * @param src_w Width of the source image 412 | * @param src_h Height of the source image 413 | * @param src_c Channel of the source image 414 | */ 415 | void image_erode(uint8_t *dst, uint8_t *src, int src_w, int src_h, int src_c); 416 | 417 | typedef float matrixType; 418 | typedef struct 419 | { 420 | int w; /*!< width */ 421 | int h; /*!< height */ 422 | matrixType **array; /*!< array */ 423 | } Matrix; 424 | 425 | /** 426 | * @brief Allocate a 2d matrix 427 | * 428 | * @param h Height of matrix 429 | * @param w Width of matrix 430 | * @return Matrix* 2d matrix 431 | */ 432 | Matrix *matrix_alloc(int h, int w); 433 | 434 | /** 435 | * @brief Free a 2d matrix 436 | * 437 | * @param m 2d matrix 438 | */ 439 | void matrix_free(Matrix *m); 440 | 441 | /** 442 | * @brief Get the similarity matrix of similarity transformation 443 | * 444 | * @param srcx Source x coordinates 445 | * @param srcy Source y coordinates 446 | * @param dstx Destination x coordinates 447 | * 
@param dsty Destination y coordinates 448 | * @param num The number of the coordinates 449 | * @return Matrix* The resulting transformation matrix 450 | */ 451 | Matrix *get_similarity_matrix(float *srcx, float *srcy, float *dstx, float *dsty, int num); 452 | 453 | /** 454 | * @brief Get the affine transformation matrix 455 | * 456 | * @param srcx Source x coordinates 457 | * @param srcy Source y coordinates 458 | * @param dstx Destination x coordinates 459 | * @param dsty Destination y coordinates 460 | * @return Matrix* The resulting transformation matrix 461 | */ 462 | Matrix *get_affine_transform(float *srcx, float *srcy, float *dstx, float *dsty); 463 | 464 | /** 465 | * @brief Applies an affine transformation to an image 466 | * 467 | * @param img Input image 468 | * @param crop Dst output image that has the size dsize and the same type as src 469 | * @param M Affine transformation matrix 470 | */ 471 | void warp_affine(dl_matrix3du_t *img, dl_matrix3du_t *crop, Matrix *M); 472 | 473 | /** 474 | * @brief Resize the image in RGB888 format via bilinear interpolation, and quantify the output image 475 | * 476 | * @param dst_image Quantized output image 477 | * @param src_image Input image 478 | * @param dst_w Width of the output image 479 | * @param dst_h Height of the output image 480 | * @param dst_c Channel of the output image 481 | * @param src_w Width of the input image 482 | * @param src_h Height of the input image 483 | * @param shift Shift parameter of quantization. 484 | */ 485 | void image_resize_linear_q(qtp_t *dst_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h, int shift); 486 | 487 | /** 488 | * @brief Preprocess the input image of object detection model. The process is like this: resize -> normalize -> quantify 489 | * 490 | * @param image Input image, RGB888 format. 491 | * @param input_w Width of the input image. 492 | * @param input_h Height of the input image. 
493 | * @param target_size Target size of the model input image. 494 | * @param exponent Exponent of the quantized model input image. 495 | * @param process_mode Process mode. 0: resize with padding to keep height == width. 1: resize without padding, height != width. 496 | * @return dl_matrix3dq_t* The resulting preprocessed image. 497 | */ 498 | dl_matrix3dq_t *image_resize_normalize_quantize(uint8_t *image, int input_w, int input_h, int target_size, int exponent, int process_mode); 499 | 500 | /** 501 | * @brief Resize the image in RGB565 format via mean neighbour interpolation, and quantify the output image 502 | * 503 | * @param dimage Quantized output image. 504 | * @param simage Input image. 505 | * @param dw Width of the allocated output image memory. 506 | * @param dc Channel of the allocated output image memory. 507 | * @param sw Width of the input image. 508 | * @param sh Height of the input image. 509 | * @param tw Target width of the output image. 510 | * @param th Target height of the output image. 511 | * @param shift Shift parameter of quantization. 512 | */ 513 | void image_resize_shift_fast(qtp_t *dimage, uint16_t *simage, int dw, int dc, int sw, int sh, int tw, int th, int shift); 514 | 515 | /** 516 | * @brief Resize the image in RGB565 format via nearest neighbour interpolation, and quantify the output image 517 | * 518 | * @param dimage Quantized output image. 519 | * @param simage Input image. 520 | * @param dw Width of the allocated output image memory. 521 | * @param dc Channel of the allocated output image memory. 522 | * @param sw Width of the input image. 523 | * @param sh Height of the input image. 524 | * @param tw Target width of the output image. 525 | * @param th Target height of the output image. 526 | * @param shift Shift parameter of quantization. 
527 | */ 528 | void image_resize_nearest_shift(qtp_t *dimage, uint16_t *simage, int dw, int dc, int sw, int sh, int tw, int th, int shift); 529 | 530 | /** 531 | * @brief Crop the image in RGB565 format and resize it to target size, then quantify the output image 532 | * 533 | * @param dimage Quantized output image. 534 | * @param simage Input image. 535 | * @param dw Target size of the output image. 536 | * @param sw Width of the input image. 537 | * @param sh Height of the input image. 538 | * @param x1 The x coordinate of the upper left corner of the cropped area 539 | * @param y1 The y coordinate of the upper left corner of the cropped area 540 | * @param x2 The x coordinate of the lower right corner of the cropped area 541 | * @param y2 The y coordinate of the lower right corner of the cropped area 542 | * @param shift Shift parameter of quantization. 543 | */ 544 | void image_crop_shift_fast(qtp_t *dimage, uint16_t *simage, int dw, int sw, int sh, int x1, int y1, int x2, int y2, int shift); 545 | 546 | #ifdef __cplusplus 547 | } 548 | #endif 549 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam-bare/mtmn.h: -------------------------------------------------------------------------------- 1 | /* 2 | * ESPRESSIF MIT License 3 | * 4 | * Copyright (c) 2018 5 | * 6 | * Permission is hereby granted for use on ESPRESSIF SYSTEMS products only, in which case, 7 | * it is free of charge, to any person obtaining a copy of this software and associated 8 | * documentation files (the "Software"), to deal in the Software without restriction, including 9 | * without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 10 | * and/or sell copies of the Software, and to permit persons to whom the Software is furnished 11 | * to do so, subject to the following conditions: 12 | * 13 | * The above copyright notice and this permission notice shall be included in all copies or 14 | * substantial 
portions of the Software. 15 | * 16 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 18 | * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 19 | * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 20 | * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 21 | * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 22 | * 23 | */ 24 | #pragma once 25 | 26 | #ifdef __cplusplus 27 | extern "C" 28 | { 29 | #endif 30 | #include "dl_lib_matrix3d.h" 31 | #include "dl_lib_matrix3dq.h" 32 | 33 | /** 34 | * Detection results with MTMN. 35 | * 36 | */ 37 | typedef struct 38 | { 39 | dl_matrix3d_t *category; /*!< Classification result after softmax, channel is 2 */ 40 | dl_matrix3d_t *offset; /*!< Bounding box offset of 2 points: top-left and bottom-right, channel is 4 */ 41 | dl_matrix3d_t *landmark; /*!< Offsets of 5 landmarks: 42 | * - Left eye 43 | * - Left mouth corner 44 | * - Nose 45 | * - Right eye 46 | * - Right mouth corner 47 | * 48 | * channel is 10 49 | * */ 50 | } mtmn_net_t; 51 | 52 | 53 | /** 54 | * @brief Free a mtmn_net_t 55 | * 56 | * @param p A mtmn_net_t pointer 57 | * 58 | */ 59 | 60 | void mtmn_net_t_free(mtmn_net_t *p); 61 | 62 | /** 63 | * @brief Forward the pnet process, coarse detection. Calculate in float. 64 | * 65 | * @param in Image matrix, rgb888 format, size is 320x240 66 | * @return Scores for every pixel, and the corresponding box offsets. 67 | */ 68 | mtmn_net_t *pnet_lite_f(dl_matrix3du_t *in); 69 | 70 | /** 71 | * @brief Forward the rnet process, refine the boxes from pnet. Calculate in float. 72 | * 73 | * @param in Image matrix, rgb888 format 74 | * @param threshold Score threshold to detect human face 75 | * @return Scores for every box, and the corresponding box offsets. 
76 | */ 77 | mtmn_net_t *rnet_lite_f_with_score_verify(dl_matrix3du_t *in, float threshold); 78 | 79 | /** 80 | * @brief Forward the onet process, refine the boxes from rnet. Calculate in float. 81 | * 82 | * @param in Image matrix, rgb888 format 83 | * @param threshold Score threshold to detect human face 84 | * @return Scores for every box, and the corresponding box offsets and landmarks. 85 | */ 86 | mtmn_net_t *onet_lite_f_with_score_verify(dl_matrix3du_t *in, float threshold); 87 | 88 | /** 89 | * @brief Forward the pnet process, coarse detection. Calculate in quantization. 90 | * 91 | * @param in Image matrix, rgb888 format, size is 320x240 92 | * @return Scores for every pixel, and the corresponding box offsets. 93 | */ 94 | mtmn_net_t *pnet_lite_q(dl_matrix3du_t *in, dl_conv_mode mode); 95 | 96 | /** 97 | * @brief Forward the rnet process, refine the boxes from pnet. Calculate in quantization. 98 | * 99 | * @param in Image matrix, rgb888 format 100 | * @param threshold Score threshold to detect human face 101 | * @return Scores for every box, and the corresponding box offsets. 102 | */ 103 | mtmn_net_t *rnet_lite_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 104 | 105 | /** 106 | * @brief Forward the onet process, refine the boxes from rnet. Calculate in quantization. 107 | * 108 | * @param in Image matrix, rgb888 format 109 | * @param threshold Score threshold to detect human face 110 | * @return Scores for every box, and the corresponding box offsets and landmarks. 111 | */ 112 | mtmn_net_t *onet_lite_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 113 | 114 | /** 115 | * @brief Forward the pnet process, coarse detection. Calculate in quantization. 116 | * 117 | * @param in Image matrix, rgb888 format, size is 320x240 118 | * @return Scores for every pixel, and the corresponding box offsets. 
119 | */ 120 | mtmn_net_t *pnet_heavy_q(dl_matrix3du_t *in, dl_conv_mode mode); 121 | 122 | /** 123 | * @brief Forward the rnet process, refine the boxes from pnet. Calculate in quantization. 124 | * 125 | * @param in Image matrix, rgb888 format 126 | * @param threshold Score threshold to detect human face 127 | * @return Scores for every box, and the corresponding box offsets. 128 | */ 129 | mtmn_net_t *rnet_heavy_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 130 | 131 | /** 132 | * @brief Forward the onet process, refine the boxes from rnet. Calculate in quantization. 133 | * 134 | * @param in Image matrix, rgb888 format 135 | * @param threshold Score threshold to detect human face 136 | * @return Scores for every box, and the corresponding box offsets and landmarks. 137 | */ 138 | mtmn_net_t *onet_heavy_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 139 | 140 | #ifdef __cplusplus 141 | } 142 | #endif 143 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam/camera_pins.h: -------------------------------------------------------------------------------- 1 | 2 | #if defined(CAMERA_MODEL_WROVER_KIT) 3 | #define PWDN_GPIO_NUM -1 4 | #define RESET_GPIO_NUM -1 5 | #define XCLK_GPIO_NUM 21 6 | #define SIOD_GPIO_NUM 26 7 | #define SIOC_GPIO_NUM 27 8 | 9 | #define Y9_GPIO_NUM 35 10 | #define Y8_GPIO_NUM 34 11 | #define Y7_GPIO_NUM 39 12 | #define Y6_GPIO_NUM 36 13 | #define Y5_GPIO_NUM 19 14 | #define Y4_GPIO_NUM 18 15 | #define Y3_GPIO_NUM 5 16 | #define Y2_GPIO_NUM 4 17 | #define VSYNC_GPIO_NUM 25 18 | #define HREF_GPIO_NUM 23 19 | #define PCLK_GPIO_NUM 22 20 | 21 | #elif defined(CAMERA_MODEL_ESP_EYE) 22 | #define PWDN_GPIO_NUM -1 23 | #define RESET_GPIO_NUM -1 24 | #define XCLK_GPIO_NUM 4 25 | #define SIOD_GPIO_NUM 18 26 | #define SIOC_GPIO_NUM 23 27 | 28 | #define Y9_GPIO_NUM 36 29 | #define Y8_GPIO_NUM 37 30 | #define Y7_GPIO_NUM 38 31 | #define 
Y6_GPIO_NUM 39 32 | #define Y5_GPIO_NUM 35 33 | #define Y4_GPIO_NUM 14 34 | #define Y3_GPIO_NUM 13 35 | #define Y2_GPIO_NUM 34 36 | #define VSYNC_GPIO_NUM 5 37 | #define HREF_GPIO_NUM 27 38 | #define PCLK_GPIO_NUM 25 39 | 40 | #elif defined(CAMERA_MODEL_M5STACK_PSRAM) 41 | #define PWDN_GPIO_NUM -1 42 | #define RESET_GPIO_NUM 15 43 | #define XCLK_GPIO_NUM 27 44 | #define SIOD_GPIO_NUM 25 45 | #define SIOC_GPIO_NUM 23 46 | 47 | #define Y9_GPIO_NUM 19 48 | #define Y8_GPIO_NUM 36 49 | #define Y7_GPIO_NUM 18 50 | #define Y6_GPIO_NUM 39 51 | #define Y5_GPIO_NUM 5 52 | #define Y4_GPIO_NUM 34 53 | #define Y3_GPIO_NUM 35 54 | #define Y2_GPIO_NUM 32 55 | #define VSYNC_GPIO_NUM 22 56 | #define HREF_GPIO_NUM 26 57 | #define PCLK_GPIO_NUM 21 58 | 59 | #elif defined(CAMERA_MODEL_M5STACK_V2_PSRAM) 60 | #define PWDN_GPIO_NUM -1 61 | #define RESET_GPIO_NUM 15 62 | #define XCLK_GPIO_NUM 27 63 | #define SIOD_GPIO_NUM 22 64 | #define SIOC_GPIO_NUM 23 65 | 66 | #define Y9_GPIO_NUM 19 67 | #define Y8_GPIO_NUM 36 68 | #define Y7_GPIO_NUM 18 69 | #define Y6_GPIO_NUM 39 70 | #define Y5_GPIO_NUM 5 71 | #define Y4_GPIO_NUM 34 72 | #define Y3_GPIO_NUM 35 73 | #define Y2_GPIO_NUM 32 74 | #define VSYNC_GPIO_NUM 25 75 | #define HREF_GPIO_NUM 26 76 | #define PCLK_GPIO_NUM 21 77 | 78 | #elif defined(CAMERA_MODEL_M5STACK_WIDE) 79 | #define PWDN_GPIO_NUM -1 80 | #define RESET_GPIO_NUM 15 81 | #define XCLK_GPIO_NUM 27 82 | #define SIOD_GPIO_NUM 22 83 | #define SIOC_GPIO_NUM 23 84 | 85 | #define Y9_GPIO_NUM 19 86 | #define Y8_GPIO_NUM 36 87 | #define Y7_GPIO_NUM 18 88 | #define Y6_GPIO_NUM 39 89 | #define Y5_GPIO_NUM 5 90 | #define Y4_GPIO_NUM 34 91 | #define Y3_GPIO_NUM 35 92 | #define Y2_GPIO_NUM 32 93 | #define VSYNC_GPIO_NUM 25 94 | #define HREF_GPIO_NUM 26 95 | #define PCLK_GPIO_NUM 21 96 | 97 | #elif defined(CAMERA_MODEL_M5STACK_ESP32CAM) 98 | #define PWDN_GPIO_NUM -1 99 | #define RESET_GPIO_NUM 15 100 | #define XCLK_GPIO_NUM 27 101 | #define SIOD_GPIO_NUM 25 102 | #define SIOC_GPIO_NUM 23 103 | 
104 | #define Y9_GPIO_NUM 19 105 | #define Y8_GPIO_NUM 36 106 | #define Y7_GPIO_NUM 18 107 | #define Y6_GPIO_NUM 39 108 | #define Y5_GPIO_NUM 5 109 | #define Y4_GPIO_NUM 34 110 | #define Y3_GPIO_NUM 35 111 | #define Y2_GPIO_NUM 17 112 | #define VSYNC_GPIO_NUM 22 113 | #define HREF_GPIO_NUM 26 114 | #define PCLK_GPIO_NUM 21 115 | 116 | #elif defined(CAMERA_MODEL_AI_THINKER) 117 | #define PWDN_GPIO_NUM 32 118 | #define RESET_GPIO_NUM -1 119 | #define XCLK_GPIO_NUM 0 120 | #define SIOD_GPIO_NUM 26 121 | #define SIOC_GPIO_NUM 27 122 | 123 | #define Y9_GPIO_NUM 35 124 | #define Y8_GPIO_NUM 34 125 | #define Y7_GPIO_NUM 39 126 | #define Y6_GPIO_NUM 36 127 | #define Y5_GPIO_NUM 21 128 | #define Y4_GPIO_NUM 19 129 | #define Y3_GPIO_NUM 18 130 | #define Y2_GPIO_NUM 5 131 | #define VSYNC_GPIO_NUM 25 132 | #define HREF_GPIO_NUM 23 133 | #define PCLK_GPIO_NUM 22 134 | 135 | #elif defined(CAMERA_MODEL_TTGO_T_JOURNAL) 136 | #define PWDN_GPIO_NUM 0 137 | #define RESET_GPIO_NUM 15 138 | #define XCLK_GPIO_NUM 27 139 | #define SIOD_GPIO_NUM 25 140 | #define SIOC_GPIO_NUM 23 141 | 142 | #define Y9_GPIO_NUM 19 143 | #define Y8_GPIO_NUM 36 144 | #define Y7_GPIO_NUM 18 145 | #define Y6_GPIO_NUM 39 146 | #define Y5_GPIO_NUM 5 147 | #define Y4_GPIO_NUM 34 148 | #define Y3_GPIO_NUM 35 149 | #define Y2_GPIO_NUM 17 150 | #define VSYNC_GPIO_NUM 22 151 | #define HREF_GPIO_NUM 26 152 | #define PCLK_GPIO_NUM 21 153 | 154 | #else 155 | #error "Camera model not selected" 156 | #endif 157 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam/dl_lib_matrix3d.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #if CONFIG_SPIRAM_SUPPORT || CONFIG_ESP32_SPIRAM_SUPPORT 11 | #include "freertos/FreeRTOS.h" 12 | #define DL_SPIRAM_SUPPORT 1 13 | #else 14 | #define DL_SPIRAM_SUPPORT 0 15 | #endif 16 | 17 | 18 | #ifndef 
max 19 | #define max(x, y) (((x) < (y)) ? (y) : (x)) 20 | #endif 21 | 22 | #ifndef min 23 | #define min(x, y) (((x) < (y)) ? (x) : (y)) 24 | #endif 25 | 26 | typedef float fptp_t; 27 | typedef uint8_t uc_t; 28 | 29 | typedef enum 30 | { 31 | DL_SUCCESS = 0, 32 | DL_FAIL = 1, 33 | } dl_error_type; 34 | 35 | typedef enum 36 | { 37 | PADDING_VALID = 0, /*!< Valid padding */ 38 | PADDING_SAME = 1, /*!< Same padding, from right to left, free input */ 39 | PADDING_SAME_DONT_FREE_INPUT = 2, /*!< Same padding, from right to left, do not free input */ 40 | PADDING_SAME_MXNET = 3, /*!< Same padding, from left to right */ 41 | } dl_padding_type; 42 | 43 | typedef enum 44 | { 45 | DL_POOLING_MAX = 0, /*!< Max pooling */ 46 | DL_POOLING_AVG = 1, /*!< Average pooling */ 47 | } dl_pooling_type; 48 | /* 49 | * Matrix for 3d 50 | * @Warning: the sequence of variables is fixed and cannot be modified, otherwise there will be errors in esp_dsp_dot_float 51 | */ 52 | typedef struct 53 | { 54 | int w; /*!< Width */ 55 | int h; /*!< Height */ 56 | int c; /*!< Channel */ 57 | int n; /*!< Number of filters; for input and output it must be 1 */ 58 | int stride; /*!< Step between lines */ 59 | fptp_t *item; /*!< Data */ 60 | } dl_matrix3d_t; 61 | 62 | typedef struct 63 | { 64 | int w; /*!< Width */ 65 | int h; /*!< Height */ 66 | int c; /*!< Channel */ 67 | int n; /*!< Number of filters; for input and output it must be 1 */ 68 | int stride; /*!< Step between lines */ 69 | uc_t *item; /*!< Data */ 70 | } dl_matrix3du_t; 71 | 72 | typedef enum 73 | { 74 | UPSAMPLE_NEAREST_NEIGHBOR = 0, /*!< Use nearest neighbor interpolation as the upsample method*/ 75 | UPSAMPLE_BILINEAR = 1, /*!< Use bilinear interpolation as the upsample method*/ 76 | } dl_upsample_type; 77 | 78 | typedef struct 79 | { 80 | int stride_x; /*!< Strides of width */ 81 | int stride_y; /*!< Strides of height */ 82 | dl_padding_type padding; /*!< Padding type */ 83 | } dl_matrix3d_mobilenet_config_t; 84 | 85 | /* 86 | * @brief Allocate a 
zero-initialized space. Must use 'dl_lib_free' to free the memory. 87 | * 88 | * @param cnt Count of units. 89 | * @param size Size of unit. 90 | * @param align Align of memory. If not required, set 0. 91 | * @return Pointer of allocated memory. Null for failed. 92 | */ 93 | static void *dl_lib_calloc(int cnt, int size, int align) 94 | { 95 | int total_size = cnt * size + align + sizeof(void *); 96 | void *res = malloc(total_size); 97 | if (NULL == res) 98 | { 99 | #if DL_SPIRAM_SUPPORT 100 | res = heap_caps_malloc(total_size, MALLOC_CAP_8BIT | MALLOC_CAP_SPIRAM); 101 | } 102 | if (NULL == res) 103 | { 104 | printf("Item psram alloc failed. Size: %d x %d\n", cnt, size); 105 | #else 106 | printf("Item alloc failed. Size: %d x %d, SPIRAM_FLAG: %d\n", cnt, size, DL_SPIRAM_SUPPORT); 107 | #endif 108 | return NULL; 109 | } 110 | bzero(res, total_size); 111 | void **data = (void **)res + 1; 112 | void **aligned; 113 | if (align) 114 | aligned = (void **)(((size_t)data + (align - 1)) & -align); 115 | else 116 | aligned = data; 117 | 118 | aligned[-1] = res; 119 | return (void *)aligned; 120 | } 121 | 122 | /** 123 | * @brief Free the memory space allocated by 'dl_lib_calloc' 124 | * 125 | */ 126 | static inline void dl_lib_free(void *d) 127 | { 128 | if (NULL == d) 129 | return; 130 | 131 | free(((void **)d)[-1]); 132 | } 133 | 134 | /* 135 | * @brief Allocate a 3D matrix with float items, the access sequence is NHWC 136 | * 137 | * @param n Number of matrix3d, for filters it is out channels, for others it is 1 138 | * @param w Width of matrix3d 139 | * @param h Height of matrix3d 140 | * @param c Channel of matrix3d 141 | * @return 3d matrix 142 | */ 143 | static inline dl_matrix3d_t *dl_matrix3d_alloc(int n, int w, int h, int c) 144 | { 145 | dl_matrix3d_t *r = (dl_matrix3d_t *)dl_lib_calloc(1, sizeof(dl_matrix3d_t), 0); 146 | if (NULL == r) 147 | { 148 | printf("internal r failed.\n"); 149 | return NULL; 150 | } 151 | fptp_t *items = (fptp_t *)dl_lib_calloc(n * w * h * 
c, sizeof(fptp_t), 0); 152 | if (NULL == items) 153 | { 154 | printf("matrix3d item alloc failed.\n"); 155 | dl_lib_free(r); 156 | return NULL; 157 | } 158 | 159 | r->w = w; 160 | r->h = h; 161 | r->c = c; 162 | r->n = n; 163 | r->stride = w * c; 164 | r->item = items; 165 | 166 | return r; 167 | } 168 | 169 | /* 170 | * @brief Allocate a 3D matrix with 8-bit items, the access sequence is NHWC 171 | * 172 | * @param n Number of matrix3d, for filters it is out channels, for others it is 1 173 | * @param w Width of matrix3d 174 | * @param h Height of matrix3d 175 | * @param c Channel of matrix3d 176 | * @return 3d matrix 177 | */ 178 | static inline dl_matrix3du_t *dl_matrix3du_alloc(int n, int w, int h, int c) 179 | { 180 | dl_matrix3du_t *r = (dl_matrix3du_t *)dl_lib_calloc(1, sizeof(dl_matrix3du_t), 0); 181 | if (NULL == r) 182 | { 183 | printf("internal r failed.\n"); 184 | return NULL; 185 | } 186 | uc_t *items = (uc_t *)dl_lib_calloc(n * w * h * c, sizeof(uc_t), 0); 187 | if (NULL == items) 188 | { 189 | printf("matrix3du item alloc failed.\n"); 190 | dl_lib_free(r); 191 | return NULL; 192 | } 193 | 194 | r->w = w; 195 | r->h = h; 196 | r->c = c; 197 | r->n = n; 198 | r->stride = w * c; 199 | r->item = items; 200 | 201 | return r; 202 | } 203 | 204 | /* 205 | * @brief Free a matrix3d 206 | * 207 | * @param m matrix3d with float items 208 | */ 209 | static inline void dl_matrix3d_free(dl_matrix3d_t *m) 210 | { 211 | if (NULL == m) 212 | return; 213 | if (NULL == m->item) 214 | { 215 | dl_lib_free(m); 216 | return; 217 | } 218 | dl_lib_free(m->item); 219 | dl_lib_free(m); 220 | } 221 | 222 | /* 223 | * @brief Free a matrix3d 224 | * 225 | * @param m matrix3d with 8-bit items 226 | */ 227 | static inline void dl_matrix3du_free(dl_matrix3du_t *m) 228 | { 229 | if (NULL == m) 230 | return; 231 | if (NULL == m->item) 232 | { 233 | dl_lib_free(m); 234 | return; 235 | } 236 | dl_lib_free(m->item); 237 | dl_lib_free(m); 238 | } 239 | 240 | 241 | /* 242 | * @brief Dot
product with a vector and matrix 243 | * 244 | * @param out Space to put the result 245 | * @param in input vector 246 | * @param f filter matrix 247 | */ 248 | void dl_matrix3dff_dot_product(dl_matrix3d_t *out, dl_matrix3d_t *in, dl_matrix3d_t *f); 249 | 250 | /** 251 | * @brief Do a softmax operation on a matrix3d 252 | * 253 | * @param in Input matrix3d 254 | */ 255 | void dl_matrix3d_softmax(dl_matrix3d_t *m); 256 | 257 | /** 258 | * @brief Copy a range of float items from an existing matrix to a preallocated matrix 259 | * 260 | * @param dst The destination slice matrix 261 | * @param src The source matrix to slice 262 | * @param x X-offset of the origin of the returned matrix within the sliced matrix 263 | * @param y Y-offset of the origin of the returned matrix within the sliced matrix 264 | * @param w Width of the resulting matrix 265 | * @param h Height of the resulting matrix 266 | */ 267 | void dl_matrix3d_slice_copy(dl_matrix3d_t *dst, 268 | dl_matrix3d_t *src, 269 | int x, 270 | int y, 271 | int w, 272 | int h); 273 | 274 | /** 275 | * @brief Copy a range of 8-bit items from an existing matrix to a preallocated matrix 276 | * 277 | * @param dst The destination slice matrix 278 | * @param src The source matrix to slice 279 | * @param x X-offset of the origin of the returned matrix within the sliced matrix 280 | * @param y Y-offset of the origin of the returned matrix within the sliced matrix 281 | * @param w Width of the resulting matrix 282 | * @param h Height of the resulting matrix 283 | */ 284 | void dl_matrix3du_slice_copy(dl_matrix3du_t *dst, 285 | dl_matrix3du_t *src, 286 | int x, 287 | int y, 288 | int w, 289 | int h); 290 | 291 | /** 292 | * @brief Transform a sliced matrix block from nhwc to nchw, the block needs to be memory contiguous.
293 | * 294 | * @param out The destination sliced matrix in nchw 295 | * @param in The source sliced matrix in nhwc 296 | */ 297 | void dl_matrix3d_sliced_transform_nchw(dl_matrix3d_t *out, 298 | dl_matrix3d_t *in); 299 | 300 | /** 301 | * @brief Do a general CNN layer pass, dimension is (number, width, height, channel) 302 | * 303 | * @param in Input matrix3d 304 | * @param filter Weights of the neurons 305 | * @param bias Bias for the CNN layer 306 | * @param stride_x The step length of the convolution window in x(width) direction 307 | * @param stride_y The step length of the convolution window in y(height) direction 308 | * @param padding One of VALID or SAME 309 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 310 | * If ESP_PLATFORM is not defined, this value is not used. Default is 0 311 | * @return dl_matrix3d_t* The result of CNN layer 312 | */ 313 | dl_matrix3d_t *dl_matrix3d_conv(dl_matrix3d_t *in, 314 | dl_matrix3d_t *filter, 315 | dl_matrix3d_t *bias, 316 | int stride_x, 317 | int stride_y, 318 | int padding, 319 | int mode); 320 | 321 | /** 322 | * @brief Do a global average pooling layer pass, dimension is (number, width, height, channel) 323 | * 324 | * @param in Input matrix3d 325 | * 326 | * @return The result of global average pooling layer 327 | */ 328 | dl_matrix3d_t *dl_matrix3d_global_pool(dl_matrix3d_t *in); 329 | 330 | /** 331 | * @brief Calculate pooling layer of a feature map 332 | * 333 | * @param in Input matrix, size (1, w, h, c) 334 | * @param f_w Window width 335 | * @param f_h Window height 336 | * @param stride_x Stride in horizontal direction 337 | * @param stride_y Stride in vertical direction 338 | * @param padding Padding type: PADDING_VALID and PADDING_SAME 339 | * @param pooling_type Pooling type: DL_POOLING_MAX and POOLING_AVG 340 | * @return dl_matrix3d_t* Resulting matrix, size (1, w', h', c) 341 | */ 342 | dl_matrix3d_t *dl_matrix3d_pooling(dl_matrix3d_t *in, 343 | int f_w, 344 | 
int f_h, 345 | int stride_x, 346 | int stride_y, 347 | dl_padding_type padding, 348 | dl_pooling_type pooling_type); 349 | /** 350 | * @brief Do a batch normalization operation, update the input matrix3d: input = input * scale + offset 351 | * 352 | * @param m Input matrix3d 353 | * @param scale scale matrix3d, scale = gamma/((moving_variance+sigma)^(1/2)) 354 | * @param offset Offset matrix3d, offset = beta-(moving_mean*gamma/((moving_variance+sigma)^(1/2))) 355 | */ 356 | void dl_matrix3d_batch_normalize(dl_matrix3d_t *m, 357 | dl_matrix3d_t *scale, 358 | dl_matrix3d_t *offset); 359 | 360 | /** 361 | * @brief Add a pair of matrix3d item-by-item: res=in_1+in_2 362 | * 363 | * @param in_1 First Floating point input matrix3d 364 | * @param in_2 Second Floating point input matrix3d 365 | * 366 | * @return dl_matrix3d_t* Added data 367 | */ 368 | dl_matrix3d_t *dl_matrix3d_add(dl_matrix3d_t *in_1, dl_matrix3d_t *in_2); 369 | 370 | /** 371 | * @brief Concatenate the channels of two matrix3ds into a new matrix3d 372 | * 373 | * @param in_1 First Floating point input matrix3d 374 | * @param in_2 Second Floating point input matrix3d 375 | * 376 | * @return dl_matrix3d_t* A newly allocated matrix3d with values in_1|in_2 377 | */ 378 | dl_matrix3d_t *dl_matrix3d_concat(dl_matrix3d_t *in_1, dl_matrix3d_t *in_2); 379 | 380 | /** 381 | * @brief Concatenate the channels of four matrix3ds into a new matrix3d 382 | * 383 | * @param in_1 First Floating point input matrix3d 384 | * @param in_2 Second Floating point input matrix3d 385 | * @param in_3 Third Floating point input matrix3d 386 | * @param in_4 Fourth Floating point input matrix3d 387 | * 388 | * @return A newly allocated matrix3d with values in_1|in_2|in_3|in_4 389 | */ 390 | dl_matrix3d_t *dl_matrix3d_concat_4(dl_matrix3d_t *in_1, 391 | dl_matrix3d_t *in_2, 392 | dl_matrix3d_t *in_3, 393 | dl_matrix3d_t *in_4); 394 | 395 | /** 396 | * @brief Concatenate the channels of eight matrix3ds into a new matrix3d 397 | *
398 | * @param in_1 First Floating point input matrix3d 399 | * @param in_2 Second Floating point input matrix3d 400 | * @param in_3 Third Floating point input matrix3d 401 | * @param in_4 Fourth Floating point input matrix3d 402 | * @param in_5 Fifth Floating point input matrix3d 403 | * @param in_6 Sixth Floating point input matrix3d 404 | * @param in_7 Seventh Floating point input matrix3d 405 | * @param in_8 Eighth Floating point input matrix3d 406 | * 407 | * @return A newly allocated matrix3d with values in_1|in_2|in_3|in_4|in_5|in_6|in_7|in_8 408 | */ 409 | dl_matrix3d_t *dl_matrix3d_concat_8(dl_matrix3d_t *in_1, 410 | dl_matrix3d_t *in_2, 411 | dl_matrix3d_t *in_3, 412 | dl_matrix3d_t *in_4, 413 | dl_matrix3d_t *in_5, 414 | dl_matrix3d_t *in_6, 415 | dl_matrix3d_t *in_7, 416 | dl_matrix3d_t *in_8); 417 | 418 | /** 419 | * @brief Do a mobilefacenet block forward, dimension is (number, width, height, channel) 420 | * 421 | * @param in Input matrix3d 422 | * @param pw Weights of the pointwise conv layer 423 | * @param pw_bn_scale The scale params of the batch_normalize layer after the pointwise conv layer 424 | * @param pw_bn_offset The offset params of the batch_normalize layer after the pointwise conv layer 425 | * @param dw Weights of the depthwise conv layer 426 | * @param dw_bn_scale The scale params of the batch_normalize layer after the depthwise conv layer 427 | * @param dw_bn_offset The offset params of the batch_normalize layer after the depthwise conv layer 428 | * @param pw_linear Weights of the pointwise linear conv layer 429 | * @param pw_linear_bn_scale The scale params of the batch_normalize layer after the pointwise linear conv layer 430 | * @param pw_linear_bn_offset The offset params of the batch_normalize layer after the pointwise linear conv layer 431 | * @param stride_x The step length of the convolution window in x(width) direction 432 | * @param stride_y The step length of the convolution window in y(height) direction 433 | * @param
padding One of VALID or SAME 434 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 435 | * If ESP_PLATFORM is not defined, this value is not used. Default is 0 436 | * @return The result of a mobilefacenet block 437 | */ 438 | dl_matrix3d_t *dl_matrix3d_mobilefaceblock(dl_matrix3d_t *in, 439 | dl_matrix3d_t *pw, 440 | dl_matrix3d_t *pw_bn_scale, 441 | dl_matrix3d_t *pw_bn_offset, 442 | dl_matrix3d_t *dw, 443 | dl_matrix3d_t *dw_bn_scale, 444 | dl_matrix3d_t *dw_bn_offset, 445 | dl_matrix3d_t *pw_linear, 446 | dl_matrix3d_t *pw_linear_bn_scale, 447 | dl_matrix3d_t *pw_linear_bn_offset, 448 | int stride_x, 449 | int stride_y, 450 | int padding, 451 | int mode, 452 | int shortcut); 453 | 454 | /** 455 | * @brief Do a mobilefacenet block forward with 1x1 split conv, dimension is (number, width, height, channel) 456 | * 457 | * @param in Input matrix3d 458 | * @param pw_1 Weights of the pointwise conv layer 1 459 | * @param pw_2 Weights of the pointwise conv layer 2 460 | * @param pw_bn_scale The scale params of the batch_normalize layer after the pointwise conv layer 461 | * @param pw_bn_offset The offset params of the batch_normalize layer after the pointwise conv layer 462 | * @param dw Weights of the depthwise conv layer 463 | * @param dw_bn_scale The scale params of the batch_normalize layer after the depthwise conv layer 464 | * @param dw_bn_offset The offset params of the batch_normalize layer after the depthwise conv layer 465 | * @param pw_linear_1 Weights of the pointwise linear conv layer 1 466 | * @param pw_linear_2 Weights of the pointwise linear conv layer 2 467 | * @param pw_linear_bn_scale The scale params of the batch_normalize layer after the pointwise linear conv layer 468 | * @param pw_linear_bn_offset The offset params of the batch_normalize layer after the pointwise linear conv layer 469 | * @param stride_x The step length of the convolution window in x(width) direction 470 | * @param stride_y The step 
length of the convolution window in y(height) direction 471 | * @param padding One of VALID or SAME 472 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 473 | * If ESP_PLATFORM is not defined, this value is not used. Default is 0 474 | * @return The result of a mobilefacenet block 475 | */ 476 | dl_matrix3d_t *dl_matrix3d_mobilefaceblock_split(dl_matrix3d_t *in, 477 | dl_matrix3d_t *pw_1, 478 | dl_matrix3d_t *pw_2, 479 | dl_matrix3d_t *pw_bn_scale, 480 | dl_matrix3d_t *pw_bn_offset, 481 | dl_matrix3d_t *dw, 482 | dl_matrix3d_t *dw_bn_scale, 483 | dl_matrix3d_t *dw_bn_offset, 484 | dl_matrix3d_t *pw_linear_1, 485 | dl_matrix3d_t *pw_linear_2, 486 | dl_matrix3d_t *pw_linear_bn_scale, 487 | dl_matrix3d_t *pw_linear_bn_offset, 488 | int stride_x, 489 | int stride_y, 490 | int padding, 491 | int mode, 492 | int shortcut); 493 | 494 | /** 495 | * @brief Initialize the matrix3d feature map to bias 496 | * 497 | * @param out The matrix3d feature map that needs to be initialized 498 | * @param bias The bias of a convolution operation 499 | */ 500 | void dl_matrix3d_init_bias(dl_matrix3d_t *out, dl_matrix3d_t *bias); 501 | 502 | /** 503 | * @brief Do an elementwise multiplication of two matrix3ds 504 | * 505 | * @param out Preallocated matrix3d, size (n, w, h, c) 506 | * @param in1 Input matrix 1, size (n, w, h, c) 507 | * @param in2 Input matrix 2, size (n, w, h, c) 508 | */ 509 | void dl_matrix3d_multiply(dl_matrix3d_t *out, dl_matrix3d_t *in1, dl_matrix3d_t *in2); 510 | 511 | // 512 | // Activation 513 | // 514 | 515 | /** 516 | * @brief Do a standard relu operation, update the input matrix3d 517 | * 518 | * @param m Floating point input matrix3d 519 | */ 520 | void dl_matrix3d_relu(dl_matrix3d_t *m); 521 | 522 | /** 523 | * @brief Do a relu (Rectifier Linear Unit) operation, update the input matrix3d 524 | * 525 | * @param in Floating point input matrix3d 526 | * @param clip If value is higher than this, it will be clipped to this value
527 | */ 528 | void dl_matrix3d_relu_clip(dl_matrix3d_t *m, fptp_t clip); 529 | 530 | /** 531 | * @brief Do a Prelu (Rectifier Linear Unit) operation, update the input matrix3d 532 | * 533 | * @param in Floating point input matrix3d 534 | * @param alpha If value is less than zero, it will be updated by multiplying this factor 535 | */ 536 | void dl_matrix3d_p_relu(dl_matrix3d_t *in, dl_matrix3d_t *alpha); 537 | 538 | /** 539 | * @brief Do a leaky relu (Rectifier Linear Unit) operation, update the input matrix3d 540 | * 541 | * @param in Floating point input matrix3d 542 | * @param alpha If value is less than zero, it will be updated by multiplying this factor 543 | */ 544 | void dl_matrix3d_leaky_relu(dl_matrix3d_t *m, fptp_t alpha); 545 | 546 | // 547 | // Conv 1x1 548 | // 549 | /** 550 | * @brief Do 1x1 convolution with a matrix3d 551 | * 552 | * @param out Preallocated matrix3d, size (1, w, h, n) 553 | * @param in Input matrix, size (1, w, h, c) 554 | * @param filter 1x1 filter, size (n, 1, 1, c) 555 | */ 556 | void dl_matrix3dff_conv_1x1(dl_matrix3d_t *out, 557 | dl_matrix3d_t *in, 558 | dl_matrix3d_t *filter); 559 | 560 | /** 561 | * @brief Do 1x1 convolution with a matrix3d, with bias adding 562 | * 563 | * @param out Preallocated matrix3d, size (1, w, h, n) 564 | * @param in Input matrix, size (1, w, h, c) 565 | * @param filter 1x1 filter, size (n, 1, 1, c) 566 | * @param bias Bias, size (1, 1, 1, n) 567 | */ 568 | void dl_matrix3dff_conv_1x1_with_bias(dl_matrix3d_t *out, 569 | dl_matrix3d_t *in, 570 | dl_matrix3d_t *filter, 571 | dl_matrix3d_t *bias); 572 | 573 | /** 574 | * @brief Do 1x1 convolution with an 8-bit fixed point matrix 575 | * 576 | * @param out Preallocated matrix3d, size (1, w, h, n) 577 | * @param in Input matrix, size (1, w, h, c) 578 | * @param filter 1x1 filter, size (n, 1, 1, c) 579 | */ 580 | void dl_matrix3duf_conv_1x1(dl_matrix3d_t *out, 581 | dl_matrix3du_t *in, 582 | dl_matrix3d_t *filter); 583 | 584 | /** 585 | * @brief Do 1x1 
convolution with an 8-bit fixed point matrix, with bias adding 586 | * 587 | * @param out Preallocated matrix3d, size (1, w, h, n) 588 | * @param in Input matrix, size (1, w, h, c) 589 | * @param filter 1x1 filter, size (n, 1, 1, c) 590 | * @param bias Bias, size (1, 1, 1, n) 591 | */ 592 | void dl_matrix3duf_conv_1x1_with_bias(dl_matrix3d_t *out, 593 | dl_matrix3du_t *in, 594 | dl_matrix3d_t *filter, 595 | dl_matrix3d_t *bias); 596 | 597 | // 598 | // Conv 3x3 599 | // 600 | 601 | /** 602 | * @brief Do 3x3 convolution with a matrix3d, without padding 603 | * 604 | * @param out Preallocated matrix3d, size (1, w, h, n) 605 | * @param in Input matrix, size (1, w, h, c) 606 | * @param f 3x3 filter, size (n, 3, 3, c) 607 | * @param step_x Stride of width 608 | * @param step_y Stride of height 609 | */ 610 | void dl_matrix3dff_conv_3x3_op(dl_matrix3d_t *out, 611 | dl_matrix3d_t *in, 612 | dl_matrix3d_t *f, 613 | int step_x, 614 | int step_y); 615 | 616 | /** 617 | * @brief Do 3x3 convolution with a matrix3d, with bias adding 618 | * 619 | * @param input Input matrix, size (1, w, h, c) 620 | * @param filter 3x3 filter, size (n, 3, 3, c) 621 | * @param bias Bias, size (1, 1, 1, n) 622 | * @param stride_x Stride of width 623 | * @param stride_y Stride of height 624 | * @param padding Padding type 625 | * @return dl_matrix3d_t* Resulting matrix3d 626 | */ 627 | dl_matrix3d_t *dl_matrix3dff_conv_3x3(dl_matrix3d_t *in, 628 | dl_matrix3d_t *filter, 629 | dl_matrix3d_t *bias, 630 | int stride_x, 631 | int stride_y, 632 | dl_padding_type padding); 633 | 634 | // 635 | // Conv Common 636 | // 637 | 638 | /** 639 | * @brief Do a general convolution layer pass with an 8-bit fixed point matrix, size is (number, width, height, channel) 640 | * 641 | * @param in Input image 642 | * @param filter Weights of the neurons 643 | * @param bias Bias for the CNN layer 644 | * @param stride_x The step length of the convolution window in x(width) direction 645 | * @param stride_y The step 
length of the convolution window in y(height) direction 646 | * @param padding Padding type 647 | * @return dl_matrix3d_t* Resulting matrix3d 648 | */ 649 | dl_matrix3d_t *dl_matrix3duf_conv_common(dl_matrix3du_t *in, 650 | dl_matrix3d_t *filter, 651 | dl_matrix3d_t *bias, 652 | int stride_x, 653 | int stride_y, 654 | dl_padding_type padding); 655 | 656 | /** 657 | * @brief Do a general convolution layer pass, size is (number, width, height, channel) 658 | * 659 | * @param in Input image 660 | * @param filter Weights of the neurons 661 | * @param bias Bias for the CNN layer 662 | * @param stride_x The step length of the convolution window in x(width) direction 663 | * @param stride_y The step length of the convolution window in y(height) direction 664 | * @param padding Padding type 665 | * @return dl_matrix3d_t* Resulting matrix3d 666 | */ 667 | dl_matrix3d_t *dl_matrix3dff_conv_common(dl_matrix3d_t *in, 668 | dl_matrix3d_t *filter, 669 | dl_matrix3d_t *bias, 670 | int stride_x, 671 | int stride_y, 672 | dl_padding_type padding); 673 | 674 | // 675 | // Depthwise 3x3 676 | // 677 | 678 | /** 679 | * @brief Do 3x3 depthwise convolution with a float matrix3d 680 | * 681 | * @param in Input matrix, size (1, w, h, c) 682 | * @param filter 3x3 filter, size (1, 3, 3, c) 683 | * @param stride_x Stride of width 684 | * @param stride_y Stride of height 685 | * @param padding Padding type, 0: valid, 1: same 686 | * @return dl_matrix3d_t* Resulting float matrix3d 687 | */ 688 | dl_matrix3d_t *dl_matrix3dff_depthwise_conv_3x3(dl_matrix3d_t *in, 689 | dl_matrix3d_t *filter, 690 | int stride_x, 691 | int stride_y, 692 | int padding); 693 | 694 | /** 695 | * @brief Do 3x3 depthwise convolution with an 8-bit fixed point matrix 696 | * 697 | * @param in Input matrix, size (1, w, h, c) 698 | * @param filter 3x3 filter, size (1, 3, 3, c) 699 | * @param stride_x Stride of width 700 | * @param stride_y Stride of height 701 | * @param padding Padding type, 0: valid, 1: same 702 | *
@return dl_matrix3d_t* Resulting float matrix3d 703 | */ 704 | dl_matrix3d_t *dl_matrix3duf_depthwise_conv_3x3(dl_matrix3du_t *in, 705 | dl_matrix3d_t *filter, 706 | int stride_x, 707 | int stride_y, 708 | int padding); 709 | 710 | /** 711 | * @brief Do 3x3 depthwise convolution with a float matrix3d, without padding 712 | * 713 | * @param out Preallocated matrix3d, size (1, w, h, n) 714 | * @param in Input matrix, size (1, w, h, c) 715 | * @param f 3x3 filter, size (1, 3, 3, c) 716 | * @param step_x Stride of width 717 | * @param step_y Stride of height 718 | */ 719 | void dl_matrix3dff_depthwise_conv_3x3_op(dl_matrix3d_t *out, 720 | dl_matrix3d_t *in, 721 | dl_matrix3d_t *f, 722 | int step_x, 723 | int step_y); 724 | 725 | // 726 | // Depthwise Common 727 | // 728 | 729 | /** 730 | * @brief Do a depthwise CNN layer pass, dimension is (number, width, height, channel) 731 | * 732 | * @param in Input matrix3d 733 | * @param filter Weights of the neurons 734 | * @param stride_x The step length of the convolution window in x(width) direction 735 | * @param stride_y The step length of the convolution window in y(height) direction 736 | * @param padding One of VALID or SAME 737 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 738 | * If ESP_PLATFORM is not defined, this value is not used. 
Default is 0 739 | * @return The result of depthwise CNN layer 740 | */ 741 | dl_matrix3d_t *dl_matrix3dff_depthwise_conv_common(dl_matrix3d_t *in, 742 | dl_matrix3d_t *filter, 743 | int stride_x, 744 | int stride_y, 745 | dl_padding_type padding); 746 | 747 | // 748 | // FC 749 | // 750 | /** 751 | * @brief Do a general fully connected layer pass, dimension is (number, width, height, channel) 752 | * 753 | * @param in Input matrix3d, size is (1, w, 1, 1) 754 | * @param filter Weights of the neurons, size is (1, w, h, 1) 755 | * @param out Preallocated resulting matrix, size is (1, 1, 1, h) 756 | * 757 | */ 758 | void dl_matrix3dff_fc(dl_matrix3d_t *out, 759 | dl_matrix3d_t *in, 760 | dl_matrix3d_t *filter); 761 | 762 | /** 763 | * @brief Do fully connected layer forward, with bias adding 764 | * 765 | * @param out Preallocated resulting matrix, size (1, 1, 1, h) 766 | * @param in Input matrix, size (1, 1, 1, w) 767 | * @param filter Filter matrix, size (1, w, h, 1) 768 | * @param bias Bias matrix, size (1, 1, 1, h) 769 | */ 770 | void dl_matrix3dff_fc_with_bias(dl_matrix3d_t *out, 771 | dl_matrix3d_t *in, 772 | dl_matrix3d_t *filter, 773 | dl_matrix3d_t *bias); 774 | 775 | // 776 | // Mobilenet 777 | // 778 | 779 | /** 780 | * @brief Do a mobilenet block forward, dimension is (number, width, height, channel) 781 | * 782 | * @param in Input matrix3d 783 | * @param filter Weights of the neurons 784 | * @param stride_x The step length of the convolution window in x(width) direction 785 | * @param stride_y The step length of the convolution window in y(height) direction 786 | * @param padding One of VALID or SAME 787 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 788 | * If ESP_PLATFORM is not defined, this value is not used.
Default is 0 789 | * @return The result of depthwise CNN layer 790 | */ 791 | dl_matrix3d_t *dl_matrix3dff_mobilenet(dl_matrix3d_t *in, 792 | dl_matrix3d_t *dilate_filter, 793 | dl_matrix3d_t *dilate_prelu, 794 | dl_matrix3d_t *depthwise_filter, 795 | dl_matrix3d_t *depthwise_prelu, 796 | dl_matrix3d_t *compress_filter, 797 | dl_matrix3d_t *bias, 798 | dl_matrix3d_mobilenet_config_t config); 799 | 800 | /** 801 | * @brief Do a mobilenet block forward, dimension is (number, width, height, channel) 802 | * 803 | * @param in Input matrix3du 804 | * @param filter Weights of the neurons 805 | * @param stride_x The step length of the convolution window in x(width) direction 806 | * @param stride_y The step length of the convolution window in y(height) direction 807 | * @param padding One of VALID or SAME 808 | * @param mode Do convolution using C implement or xtensa implement, 0 or 1, with respect 809 | * If ESP_PLATFORM is not defined, this value is not used. Default is 0 810 | * @return The result of depthwise CNN layer 811 | */ 812 | dl_matrix3d_t *dl_matrix3duf_mobilenet(dl_matrix3du_t *in, 813 | dl_matrix3d_t *dilate_filter, 814 | dl_matrix3d_t *dilate_prelu, 815 | dl_matrix3d_t *depthwise_filter, 816 | dl_matrix3d_t *depthwise_prelu, 817 | dl_matrix3d_t *compress_filter, 818 | dl_matrix3d_t *bias, 819 | dl_matrix3d_mobilenet_config_t config); 820 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam/edge-impulse-esp32-cam.ino: -------------------------------------------------------------------------------- 1 | /* 2 | Live Image Classification on ESP32-CAM and ST7735 TFT display 3 | using MobileNet v1 from Edge Impulse 4 | Modified from https://github.com/edgeimpulse/example-esp32-cam. 5 | 6 | Note: 7 | The ST7735 TFT size has to be at least 120x120. 8 | Do not use Arduino IDE 2.0 or you won't be able to see the serial output! 
9 | */ 10 | 11 | #include <esp32-cam-cat-dog_inferencing.h> // replace with your deployed Edge Impulse library 12 | 13 | #define CAMERA_MODEL_AI_THINKER 14 | 15 | #include "img_converters.h" 16 | #include "image_util.h" 17 | #include "esp_camera.h" 18 | #include "camera_pins.h" 19 | 20 | #include <Adafruit_GFX.h> // Core graphics library 21 | #include <Adafruit_ST7735.h> // Hardware-specific library for ST7735 22 | 23 | #define TFT_SCLK 14 // SCL 24 | #define TFT_MOSI 13 // SDA 25 | #define TFT_RST 12 // RES (RESET) 26 | #define TFT_DC 2 // Data Command control pin 27 | #define TFT_CS 15 // Chip select control pin 28 | // BL (back light) and VCC -> 3V3 29 | 30 | #define BTN 4 // button (shared with flash led) 31 | 32 | dl_matrix3du_t *resized_matrix = NULL; 33 | ei_impulse_result_t result = {0}; 34 | 35 | Adafruit_ST7735 tft = Adafruit_ST7735(TFT_CS, TFT_DC, TFT_MOSI, TFT_SCLK, TFT_RST); 36 | 37 | // setup 38 | void setup() { 39 | Serial.begin(115200); 40 | 41 | // button 42 | pinMode(BTN, INPUT); 43 | 44 | // TFT display init 45 | tft.initR(INITR_GREENTAB); // you might need to use INITR_REDTAB or INITR_BLACKTAB to get correct text colors 46 | tft.setRotation(0); 47 | tft.fillScreen(ST77XX_BLACK); 48 | 49 | // cam config 50 | camera_config_t config; 51 | config.ledc_channel = LEDC_CHANNEL_0; 52 | config.ledc_timer = LEDC_TIMER_0; 53 | config.pin_d0 = Y2_GPIO_NUM; 54 | config.pin_d1 = Y3_GPIO_NUM; 55 | config.pin_d2 = Y4_GPIO_NUM; 56 | config.pin_d3 = Y5_GPIO_NUM; 57 | config.pin_d4 = Y6_GPIO_NUM; 58 | config.pin_d5 = Y7_GPIO_NUM; 59 | config.pin_d6 = Y8_GPIO_NUM; 60 | config.pin_d7 = Y9_GPIO_NUM; 61 | config.pin_xclk = XCLK_GPIO_NUM; 62 | config.pin_pclk = PCLK_GPIO_NUM; 63 | config.pin_vsync = VSYNC_GPIO_NUM; 64 | config.pin_href = HREF_GPIO_NUM; 65 | config.pin_sscb_sda = SIOD_GPIO_NUM; 66 | config.pin_sscb_scl = SIOC_GPIO_NUM; 67 | config.pin_pwdn = PWDN_GPIO_NUM; 68 | config.pin_reset = RESET_GPIO_NUM; 69 | config.xclk_freq_hz = 20000000; 70 | config.pixel_format = PIXFORMAT_JPEG; 71 | config.frame_size = FRAMESIZE_240X240; 72 |
config.jpeg_quality = 10; 73 | config.fb_count = 1; 74 | 75 | // camera init 76 | esp_err_t err = esp_camera_init(&config); 77 | if (err != ESP_OK) { 78 | Serial.printf("Camera init failed with error 0x%x", err); 79 | return; 80 | } 81 | 82 | sensor_t * s = esp_camera_sensor_get(); 83 | // initial sensors are flipped vertically and colors are a bit saturated 84 | if (s->id.PID == OV3660_PID) { 85 | s->set_vflip(s, 1); // flip it back 86 | s->set_brightness(s, 1); // up the brightness just a bit 87 | s->set_saturation(s, 0); // lower the saturation 88 | } 89 | 90 | Serial.println("Camera Ready!...(standby, press button to start)"); 91 | tft_drawtext(4, 4, "Standby", 1, ST77XX_BLUE); 92 | } 93 | 94 | // main loop 95 | void loop() { 96 | 97 | // wait until the button is pressed 98 | while (!digitalRead(BTN)); 99 | delay(100); 100 | 101 | // capture an image and classify it 102 | String result = classify(); 103 | 104 | // display result 105 | Serial.printf("Result: %s\n", result.c_str()); 106 | tft_drawtext(4, 120 - 16, result, 2, ST77XX_GREEN); 107 | } 108 | 109 | // classify labels 110 | String classify() { 111 | 112 | // run image capture once to force clear buffer 113 | // otherwise the captured image below would only show up next time you pressed the button!
capture_quick(); 115 | 116 | // capture image from camera 117 | if (!capture()) return "Error"; 118 | tft_drawtext(4, 4, "Classifying...", 1, ST77XX_CYAN); 119 | 120 | Serial.println("Getting image..."); 121 | signal_t signal; 122 | signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT; 123 | signal.get_data = &raw_feature_get_data; 124 | 125 | Serial.println("Run classifier..."); 126 | // Feed signal to the classifier 127 | EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false /* debug */); 128 | // --- Free memory --- 129 | dl_matrix3du_free(resized_matrix); 130 | 131 | // --- "res" holds the returned status; the prediction data is stored in "result" --- 132 | ei_printf("run_classifier returned: %d\n", res); 133 | if (res != 0) return "Error"; 134 | 135 | // --- print the predictions --- 136 | ei_printf("Predictions (DSP: %d ms., Classification: %d ms., Anomaly: %d ms.): \n", 137 | result.timing.dsp, result.timing.classification, result.timing.anomaly); 138 | int index = 0; 139 | float score = 0.0; 140 | for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) { 141 | // record the most probable label 142 | if (result.classification[ix].value > score) { 143 | score = result.classification[ix].value; 144 | index = ix; 145 | } 146 | ei_printf(" %s: \t%f\r\n", result.classification[ix].label, result.classification[ix].value); 147 | tft_drawtext(4, 12 + 8 * ix, String(result.classification[ix].label) + " " + String(result.classification[ix].value * 100) + "%", 1, ST77XX_ORANGE); 148 | } 149 | 150 | #if EI_CLASSIFIER_HAS_ANOMALY == 1 151 | ei_printf(" anomaly score: %f\r\n", result.anomaly); 152 | #endif 153 | 154 | // --- return the most probable label --- 155 | return String(result.classification[index].label); 156 | } 157 | 158 | // quick capture (to clear buffer) 159 | void capture_quick() { 160 | camera_fb_t *fb = NULL; 161 | fb = esp_camera_fb_get(); 162 | if (!fb) return; 163 | esp_camera_fb_return(fb); 164 | } 165 | 166 | // capture
image from cam 167 | bool capture() { 168 | 169 | Serial.println("Capture image..."); 170 | esp_err_t res = ESP_OK; 171 | camera_fb_t *fb = NULL; 172 | fb = esp_camera_fb_get(); 173 | if (!fb) { 174 | Serial.println("Camera capture failed"); 175 | return false; 176 | } 177 | 178 | // --- Convert frame to RGB888 --- 179 | Serial.println("Converting to RGB888..."); 180 | // Allocate rgb888_matrix buffer 181 | dl_matrix3du_t *rgb888_matrix = dl_matrix3du_alloc(1, fb->width, fb->height, 3); 182 | fmt2rgb888(fb->buf, fb->len, fb->format, rgb888_matrix->item); 183 | 184 | // --- Resize the RGB888 frame to 96x96 in this example --- 185 | Serial.println("Resizing the frame buffer..."); 186 | resized_matrix = dl_matrix3du_alloc(1, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT, 3); 187 | image_resize_linear(resized_matrix->item, rgb888_matrix->item, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT, 3, fb->width, fb->height); 188 | 189 | // --- Convert frame to RGB565 and display on the TFT --- 190 | Serial.println("Converting to RGB565 and display on TFT..."); 191 | uint8_t *rgb565 = (uint8_t *) malloc(240 * 240 * 3); 192 | jpg2rgb565(fb->buf, fb->len, rgb565, JPG_SCALE_2X); // scale to half size 193 | tft.drawRGBBitmap(0, 0, (uint16_t*)rgb565, 120, 120); 194 | 195 | // --- Free memory --- 196 | free(rgb565); 197 | dl_matrix3du_free(rgb888_matrix); 198 | esp_camera_fb_return(fb); 199 | 200 | return true; 201 | } 202 | 203 | int raw_feature_get_data(size_t offset, size_t out_len, float *signal_ptr) { 204 | 205 | size_t pixel_ix = offset * 3; 206 | size_t bytes_left = out_len; 207 | size_t out_ptr_ix = 0; 208 | 209 | // read pixel by pixel 210 | while (bytes_left != 0) { 211 | // grab the values and convert to r/g/b 212 | uint8_t r, g, b; 213 | r = resized_matrix->item[pixel_ix]; 214 | g = resized_matrix->item[pixel_ix + 1]; 215 | b = resized_matrix->item[pixel_ix + 2]; 216 | 217 | // then convert to out_ptr format 218 | float pixel_f = (r << 16) + (g << 8) + b;
219 | signal_ptr[out_ptr_ix] = pixel_f; 220 | 221 | // and go to the next pixel 222 | out_ptr_ix++; 223 | pixel_ix += 3; 224 | bytes_left--; 225 | } 226 | 227 | return 0; 228 | } 229 | 230 | // draw text on TFT 231 | void tft_drawtext(int16_t x, int16_t y, String text, uint8_t font_size, uint16_t color) { 232 | tft.setCursor(x, y); 233 | tft.setTextSize(font_size); // font size 1 = 6x8, 2 = 12x16, 3 = 18x24 234 | tft.setTextColor(color); 235 | tft.setTextWrap(true); 236 | tft.print(text); 237 | } 238 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam/esp_image.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | * ESPRESSIF MIT License 3 | * 4 | * Copyright (c) 2018 5 | * 6 | * Permission is hereby granted for use on ESPRESSIF SYSTEMS products only, in which case, 7 | * it is free of charge, to any person obtaining a copy of this software and associated 8 | * documentation files (the "Software"), to deal in the Software without restriction, including 9 | * without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 10 | * and/or sell copies of the Software, and to permit persons to whom the Software is furnished 11 | * to do so, subject to the following conditions: 12 | * 13 | * The above copyright notice and this permission notice shall be included in all copies or 14 | * substantial portions of the Software. 15 | * 16 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 18 | * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR 19 | * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 21 | * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 22 | * 23 | */ 24 | #pragma once 25 | 26 | #ifdef __cplusplus 27 | extern "C" 28 | { 29 | #endif 30 | 31 | #include <stdint.h> 32 | #include <math.h> 33 | #include <assert.h> 34 | 35 | #ifdef __cplusplus 36 | } 37 | #endif 38 | 39 | typedef enum 40 | { 41 | IMAGE_RESIZE_BILINEAR = 0, /*!< bilinear interpolation */ 42 | IMAGE_RESIZE_MEAN, /*!< mean of the four nearest pixels */ 43 | IMAGE_RESIZE_NEAREST /*!< nearest neighbour */ 44 | } image_resize_t; 45 | 46 | template <typename T> 47 | class Image 48 | { 49 | public: 50 | /** 51 | * @brief Convert a RGB565 pixel to RGB888 52 | * 53 | * @param input Pixel value in RGB565 54 | * @param output Pixel value in RGB888 55 | */ 56 | static inline void pixel_rgb565_to_rgb888(uint16_t input, T *output) 57 | { 58 | output[2] = (input & 0x1F00) >> 5; //blue 59 | output[1] = ((input & 0x7) << 5) | ((input & 0xE000) >> 11); //green 60 | output[0] = input & 0xF8; //red 61 | }; 62 | 63 | /** 64 | * @brief Resize a RGB565 image to a RGB888 image 65 | * 66 | * @param dst_image The destination image 67 | * @param y_start The start y index of where resized image located 68 | * @param y_end The end y index of where resized image located 69 | * @param x_start The start x index of where resized image located 70 | * @param x_end The end x index of where resized image located 71 | * @param channel The channel number of image 72 | * @param src_image The source image 73 | * @param src_h The height of source image 74 | * @param src_w The width of source image 75 | * @param dst_w The width of destination image 76 | * @param shift_left The bit number of left shifting 77 | * @param type The resize type 78 | */ 79 | static void resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint16_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 80 | 81 | /** 82 | * @brief Resize a RGB888 image to a RGB888 image 83 | * 84 | * @param
dst_image The destination image 85 | * @param y_start The start y index of where resized image located 86 | * @param y_end The end y index of where resized image located 87 | * @param x_start The start x index of where resized image located 88 | * @param x_end The end x index of where resized image located 89 | * @param channel The channel number of image 90 | * @param src_image The source image 91 | * @param src_h The height of source image 92 | * @param src_w The width of source image 93 | * @param dst_w The width of destination image 94 | * @param shift_left The bit number of left shifting 95 | * @param type The resize type 96 | */ 97 | static void resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint8_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 98 | // static void resize_to_rgb565(uint16_t *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint16_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 99 | // static void resize_to_rgb565(uint16_t *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint8_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type); 100 | }; 101 | 102 | template <typename T> 103 | void Image<T>::resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint16_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type) 104 | { 105 | assert(channel == 3); 106 | float scale_y = (float)src_h / (y_end - y_start); 107 | float scale_x = (float)src_w / (x_end - x_start); 108 | int temp[13]; 109 | 110 | switch (type) 111 | { 112 | case IMAGE_RESIZE_BILINEAR: 113 | for (size_t y = y_start; y < y_end; y++) 114 | { 115 | float ratio_y[2]; 116 | ratio_y[0] = (float)((y + 0.5) * scale_y - 0.5); // y 117 | int src_y = (int)ratio_y[0]; // y1 118 | ratio_y[0] -= src_y; // y - y1 119 | 120 | if (src_y < 0) 121 | { 122 | ratio_y[0]
= 0; 123 | src_y = 0; 124 | } 125 | if (src_y > src_h - 2) 126 | { 127 | ratio_y[0] = 0; 128 | src_y = src_h - 2; 129 | } 130 | ratio_y[1] = 1 - ratio_y[0]; // y2 - y 131 | 132 | int _dst_i = y * dst_w; 133 | 134 | int _src_row_0 = src_y * src_w; 135 | int _src_row_1 = _src_row_0 + src_w; 136 | 137 | for (size_t x = x_start; x < x_end; x++) 138 | { 139 | float ratio_x[2]; 140 | ratio_x[0] = (float)((x + 0.5) * scale_x - 0.5); // x 141 | int src_x = (int)ratio_x[0]; // x1 142 | ratio_x[0] -= src_x; // x - x1 143 | 144 | if (src_x < 0) 145 | { 146 | ratio_x[0] = 0; 147 | src_x = 0; 148 | } 149 | if (src_x > src_w - 2) 150 | { 151 | ratio_x[0] = 0; 152 | src_x = src_w - 2; 153 | } 154 | ratio_x[1] = 1 - ratio_x[0]; // x2 - x 155 | 156 | int dst_i = (_dst_i + x) * channel; 157 | 158 | int src_row_0 = _src_row_0 + src_x; 159 | int src_row_1 = _src_row_1 + src_x; 160 | 161 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_0], temp); 162 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_0 + 1], temp + 3); 163 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_1], temp + 6); 164 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_1 + 1], temp + 9); 165 | 166 | for (int c = 0; c < channel; c++) 167 | { 168 | temp[12] = round(temp[c] * ratio_x[1] * ratio_y[1] + temp[channel + c] * ratio_x[0] * ratio_y[1] + temp[channel + channel + c] * ratio_x[1] * ratio_y[0] + temp[channel + channel + channel + c] * ratio_x[0] * ratio_y[0]); 169 | dst_image[dst_i + c] = (shift_left > 0) ?
(temp[12] << shift_left) : (temp[12] >> -shift_left); 170 | } 171 | } 172 | } 173 | break; 174 | 175 | case IMAGE_RESIZE_MEAN: 176 | shift_left -= 2; 177 | for (int y = y_start; y < y_end; y++) 178 | { 179 | int _dst_i = y * dst_w; 180 | 181 | float _src_row_0 = rintf(y * scale_y) * src_w; 182 | float _src_row_1 = _src_row_0 + src_w; 183 | 184 | for (int x = x_start; x < x_end; x++) 185 | { 186 | int dst_i = (_dst_i + x) * channel; 187 | 188 | int src_row_0 = (_src_row_0 + rintf(x * scale_x)); 189 | int src_row_1 = (_src_row_1 + rintf(x * scale_x)); 190 | 191 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_0], temp); 192 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_0 + 1], temp + 3); 193 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_1], temp + 6); 194 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_row_1 + 1], temp + 9); 195 | 196 | dst_image[dst_i] = (shift_left > 0) ? ((temp[0] + temp[3] + temp[6] + temp[9]) << shift_left) : ((temp[0] + temp[3] + temp[6] + temp[9]) >> -shift_left); 197 | dst_image[dst_i + 1] = (shift_left > 0) ? ((temp[1] + temp[4] + temp[7] + temp[10]) << shift_left) : ((temp[1] + temp[4] + temp[7] + temp[10]) >> -shift_left); 198 | dst_image[dst_i + 2] = (shift_left > 0) ? ((temp[2] + temp[5] + temp[8] + temp[11]) << shift_left) : ((temp[2] + temp[5] + temp[8] + temp[11]) >> -shift_left); 199 | } 200 | } 201 | 202 | break; 203 | 204 | case IMAGE_RESIZE_NEAREST: 205 | for (size_t y = y_start; y < y_end; y++) 206 | { 207 | int _dst_i = y * dst_w; 208 | float _src_i = rintf(y * scale_y) * src_w; 209 | 210 | for (size_t x = x_start; x < x_end; x++) 211 | { 212 | int dst_i = (_dst_i + x) * channel; 213 | int src_i = _src_i + rintf(x * scale_x); 214 | 215 | Image<int>::pixel_rgb565_to_rgb888(src_image[src_i], temp); 216 | 217 | dst_image[dst_i] = (shift_left > 0) ? (temp[0] << shift_left) : (temp[0] >> -shift_left); 218 | dst_image[dst_i + 1] = (shift_left > 0) ?
(temp[1] << shift_left) : (temp[1] >> -shift_left); 219 | dst_image[dst_i + 2] = (shift_left > 0) ? (temp[2] << shift_left) : (temp[2] >> -shift_left); 220 | } 221 | } 222 | break; 223 | 224 | default: 225 | break; 226 | } 227 | } 228 | 229 | template <typename T> 230 | void Image<T>::resize_to_rgb888(T *dst_image, int y_start, int y_end, int x_start, int x_end, int channel, uint8_t *src_image, int src_h, int src_w, int dst_w, int shift_left, image_resize_t type) 231 | { 232 | float scale_y = (float)src_h / (y_end - y_start); 233 | float scale_x = (float)src_w / (x_end - x_start); 234 | int temp; 235 | 236 | switch (type) 237 | { 238 | case IMAGE_RESIZE_BILINEAR: 239 | for (size_t y = y_start; y < y_end; y++) 240 | { 241 | float ratio_y[2]; 242 | ratio_y[0] = (float)((y + 0.5) * scale_y - 0.5); // y 243 | int src_y = (int)ratio_y[0]; // y1 244 | ratio_y[0] -= src_y; // y - y1 245 | 246 | if (src_y < 0) 247 | { 248 | ratio_y[0] = 0; 249 | src_y = 0; 250 | } 251 | if (src_y > src_h - 2) 252 | { 253 | ratio_y[0] = 0; 254 | src_y = src_h - 2; 255 | } 256 | ratio_y[1] = 1 - ratio_y[0]; // y2 - y 257 | 258 | int _dst_i = y * dst_w; 259 | 260 | int _src_row_0 = src_y * src_w; 261 | int _src_row_1 = _src_row_0 + src_w; 262 | 263 | for (size_t x = x_start; x < x_end; x++) 264 | { 265 | float ratio_x[2]; 266 | ratio_x[0] = (float)((x + 0.5) * scale_x - 0.5); // x 267 | int src_x = (int)ratio_x[0]; // x1 268 | ratio_x[0] -= src_x; // x - x1 269 | 270 | if (src_x < 0) 271 | { 272 | ratio_x[0] = 0; 273 | src_x = 0; 274 | } 275 | if (src_x > src_w - 2) 276 | { 277 | ratio_x[0] = 0; 278 | src_x = src_w - 2; 279 | } 280 | ratio_x[1] = 1 - ratio_x[0]; // x2 - x 281 | 282 | int dst_i = (_dst_i + x) * channel; 283 | 284 | int src_row_0 = (_src_row_0 + src_x) * channel; 285 | int src_row_1 = (_src_row_1 + src_x) * channel; 286 | 287 | for (int c = 0; c < channel; c++) 288 | { 289 | temp = round(src_image[src_row_0 + c] * ratio_x[1] * ratio_y[1] + src_image[src_row_0 + channel + c] * ratio_x[0] *
ratio_y[1] + src_image[src_row_1 + c] * ratio_x[1] * ratio_y[0] + src_image[src_row_1 + channel + c] * ratio_x[0] * ratio_y[0]); 290 | dst_image[dst_i + c] = (shift_left > 0) ? (temp << shift_left) : (temp >> -shift_left); 291 | } 292 | } 293 | } 294 | break; 295 | 296 | case IMAGE_RESIZE_MEAN: 297 | shift_left -= 2; 298 | 299 | for (size_t y = y_start; y < y_end; y++) 300 | { 301 | int _dst_i = y * dst_w; 302 | 303 | float _src_row_0 = rintf(y * scale_y) * src_w; 304 | float _src_row_1 = _src_row_0 + src_w; 305 | 306 | for (size_t x = x_start; x < x_end; x++) 307 | { 308 | int dst_i = (_dst_i + x) * channel; 309 | 310 | int src_row_0 = (_src_row_0 + rintf(x * scale_x)) * channel; 311 | int src_row_1 = (_src_row_1 + rintf(x * scale_x)) * channel; 312 | 313 | for (size_t c = 0; c < channel; c++) 314 | { 315 | temp = (int)src_image[src_row_0 + c] + (int)src_image[src_row_0 + channel + c] + (int)src_image[src_row_1 + c] + (int)src_image[src_row_1 + channel + c]; 316 | dst_image[dst_i + c] = (shift_left > 0) ? (temp << shift_left) : (temp >> -shift_left); 317 | } 318 | } 319 | } 320 | break; 321 | 322 | case IMAGE_RESIZE_NEAREST: 323 | for (size_t y = y_start; y < y_end; y++) 324 | { 325 | int _dst_i = y * dst_w; 326 | float _src_i = rintf(y * scale_y) * src_w; 327 | 328 | for (size_t x = x_start; x < x_end; x++) 329 | { 330 | int dst_i = (_dst_i + x) * channel; 331 | int src_i = (_src_i + rintf(x * scale_x)) * channel; 332 | 333 | for (size_t c = 0; c < channel; c++) 334 | { 335 | dst_image[dst_i + c] = (shift_left > 0) ? 
((T)src_image[src_i + c] << shift_left) : ((T)src_image[src_i + c] >> -shift_left); 336 | } 337 | } 338 | } 339 | break; 340 | 341 | default: 342 | break; 343 | } 344 | } -------------------------------------------------------------------------------- /edge-impulse-esp32-cam/frmn.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #if __cplusplus 4 | extern "C" 5 | { 6 | #endif 7 | 8 | #include "dl_lib_matrix3d.h" 9 | #include "dl_lib_matrix3dq.h" 10 | 11 | /** 12 | * @brief Forward the face recognition process with frmn model. Calculate in float. 13 | * 14 | * @param in Image matrix, rgb888 format, size is 56x56, normalized 15 | * @return dl_matrix3d_t* Face ID feature vector, size is 512 16 | */ 17 | dl_matrix3d_t *frmn(dl_matrix3d_t *in); 18 | 19 | /**@{*/ 20 | /** 21 | * @brief Forward the face recognition process with specified model. Calculate in quantization. 22 | * 23 | * @param in Image matrix, rgb888 format, size is 56x56, normalized 24 | * @param mode 0: C implement; 1: handwrite xtensa instruction implement 25 | * @return Face ID feature vector, size is 512 26 | */ 27 | dl_matrix3dq_t *frmn_q(dl_matrix3dq_t *in, dl_conv_mode mode); 28 | 29 | dl_matrix3dq_t *frmn2p_q(dl_matrix3dq_t *in, dl_conv_mode mode); 30 | 31 | dl_matrix3dq_t *mfn56_42m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 32 | 33 | dl_matrix3dq_t *mfn56_72m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 34 | 35 | dl_matrix3dq_t *mfn56_112m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 36 | 37 | dl_matrix3dq_t *mfn56_156m_q(dl_matrix3dq_t *in, dl_conv_mode mode); 38 | 39 | /**@}*/ 40 | 41 | #if __cplusplus 42 | } 43 | #endif 44 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam/image_util.h: -------------------------------------------------------------------------------- 1 | /* 2 | * ESPRESSIF MIT License 3 | * 4 | * Copyright (c) 2018 5 | * 6 | * Permission is hereby granted for use on 
ESPRESSIF SYSTEMS products only, in which case, 7 | * it is free of charge, to any person obtaining a copy of this software and associated 8 | * documentation files (the "Software"), to deal in the Software without restriction, including 9 | * without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 10 | * and/or sell copies of the Software, and to permit persons to whom the Software is furnished 11 | * to do so, subject to the following conditions: 12 | * 13 | * The above copyright notice and this permission notice shall be included in all copies or 14 | * substantial portions of the Software. 15 | * 16 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 18 | * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 19 | * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 20 | * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 21 | * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 22 | * 23 | */ 24 | #pragma once 25 | #ifdef __cplusplus 26 | extern "C" 27 | { 28 | #endif 29 | #include <stdint.h> 30 | #include <math.h> 31 | #include "mtmn.h" 32 | 33 | #define LANDMARKS_NUM (10) 34 | 35 | #define MAX_VALID_COUNT_PER_IMAGE (30) 36 | 37 | #define DL_IMAGE_MIN(A, B) ((A) < (B) ? (A) : (B)) 38 | #define DL_IMAGE_MAX(A, B) ((A) < (B) ?
(B) : (A)) 39 | 40 | #define RGB565_MASK_RED 0xF800 41 | #define RGB565_MASK_GREEN 0x07E0 42 | #define RGB565_MASK_BLUE 0x001F 43 | 44 | typedef enum 45 | { 46 | BINARY, /*!< binary */ 47 | } en_threshold_mode; 48 | 49 | typedef struct 50 | { 51 | fptp_t landmark_p[LANDMARKS_NUM]; /*!< landmark struct */ 52 | } landmark_t; 53 | 54 | typedef struct 55 | { 56 | fptp_t box_p[4]; /*!< box struct */ 57 | } box_t; 58 | 59 | typedef struct tag_box_list 60 | { 61 | uint8_t *category; /*!< The category of the corresponding box */ 62 | fptp_t *score; /*!< The confidence score of the class corresponding to the box */ 63 | box_t *box; /*!< Anchor boxes or predicted boxes*/ 64 | landmark_t *landmark; /*!< The landmarks corresponding to the box */ 65 | int len; /*!< The num of the boxes */ 66 | } box_array_t; 67 | 68 | typedef struct tag_image_box 69 | { 70 | struct tag_image_box *next; /*!< Next image_box_t */ 71 | uint8_t category; 72 | fptp_t score; /*!< The confidence score of the class corresponding to the box */ 73 | box_t box; /*!< Anchor boxes or predicted boxes */ 74 | box_t offset; /*!< The predicted anchor-based offset */ 75 | landmark_t landmark; /*!< The landmarks corresponding to the box */ 76 | } image_box_t; 77 | 78 | typedef struct tag_image_list 79 | { 80 | image_box_t *head; /*!< The current head of the image_list */ 81 | image_box_t *origin_head; /*!< The original head of the image_list */ 82 | int len; /*!< Length of the image_list */ 83 | } image_list_t; 84 | 85 | /** 86 | * @brief Get the width and height of the box. 87 | * 88 | * @param box Input box 89 | * @param w Resulting width of the box 90 | * @param h Resulting height of the box 91 | */ 92 | static inline void image_get_width_and_height(box_t *box, float *w, float *h) 93 | { 94 | *w = box->box_p[2] - box->box_p[0] + 1; 95 | *h = box->box_p[3] - box->box_p[1] + 1; 96 | } 97 | 98 | /** 99 | * @brief Get the area of the box. 
100 | * 101 | * @param box Input box 102 | * @param area Resulting area of the box 103 | */ 104 | static inline void image_get_area(box_t *box, float *area) 105 | { 106 | float w, h; 107 | image_get_width_and_height(box, &w, &h); 108 | *area = w * h; 109 | } 110 | 111 | /** 112 | * @brief calibrate the boxes by offset 113 | * 114 | * @param image_list Input boxes 115 | * @param image_height Height of the original image 116 | * @param image_width Width of the original image 117 | */ 118 | static inline void image_calibrate_by_offset(image_list_t *image_list, int image_height, int image_width) 119 | { 120 | for (image_box_t *head = image_list->head; head; head = head->next) 121 | { 122 | float w, h; 123 | image_get_width_and_height(&(head->box), &w, &h); 124 | head->box.box_p[0] = DL_IMAGE_MAX(0, head->box.box_p[0] + head->offset.box_p[0] * w); 125 | head->box.box_p[1] = DL_IMAGE_MAX(0, head->box.box_p[1] + head->offset.box_p[1] * w); 126 | head->box.box_p[2] += head->offset.box_p[2] * w; 127 | if (head->box.box_p[2] > image_width) 128 | { 129 | head->box.box_p[2] = image_width - 1; 130 | head->box.box_p[0] = image_width - w; 131 | } 132 | head->box.box_p[3] += head->offset.box_p[3] * h; 133 | if (head->box.box_p[3] > image_height) 134 | { 135 | head->box.box_p[3] = image_height - 1; 136 | head->box.box_p[1] = image_height - h; 137 | } 138 | } 139 | } 140 | 141 | /** 142 | * @brief calibrate the landmarks 143 | * 144 | * @param image_list Input landmarks 145 | */ 146 | static inline void image_landmark_calibrate(image_list_t *image_list) 147 | { 148 | for (image_box_t *head = image_list->head; head; head = head->next) 149 | { 150 | float w, h; 151 | image_get_width_and_height(&(head->box), &w, &h); 152 | head->landmark.landmark_p[0] = head->box.box_p[0] + head->landmark.landmark_p[0] * w; 153 | head->landmark.landmark_p[1] = head->box.box_p[1] + head->landmark.landmark_p[1] * h; 154 | 155 | head->landmark.landmark_p[2] = head->box.box_p[0] + 
head->landmark.landmark_p[2] * w; 156 | head->landmark.landmark_p[3] = head->box.box_p[1] + head->landmark.landmark_p[3] * h; 157 | 158 | head->landmark.landmark_p[4] = head->box.box_p[0] + head->landmark.landmark_p[4] * w; 159 | head->landmark.landmark_p[5] = head->box.box_p[1] + head->landmark.landmark_p[5] * h; 160 | 161 | head->landmark.landmark_p[6] = head->box.box_p[0] + head->landmark.landmark_p[6] * w; 162 | head->landmark.landmark_p[7] = head->box.box_p[1] + head->landmark.landmark_p[7] * h; 163 | 164 | head->landmark.landmark_p[8] = head->box.box_p[0] + head->landmark.landmark_p[8] * w; 165 | head->landmark.landmark_p[9] = head->box.box_p[1] + head->landmark.landmark_p[9] * h; 166 | } 167 | } 168 | 169 | /** 170 | * @brief Convert a rectangular box into a square box 171 | * 172 | * @param boxes Input box 173 | * @param width Width of the orignal image 174 | * @param height height of the orignal image 175 | */ 176 | static inline void image_rect2sqr(box_array_t *boxes, int width, int height) 177 | { 178 | for (int i = 0; i < boxes->len; i++) 179 | { 180 | box_t *box = &(boxes->box[i]); 181 | 182 | int x1 = round(box->box_p[0]); 183 | int y1 = round(box->box_p[1]); 184 | int x2 = round(box->box_p[2]); 185 | int y2 = round(box->box_p[3]); 186 | 187 | int w = x2 - x1 + 1; 188 | int h = y2 - y1 + 1; 189 | int l = DL_IMAGE_MAX(w, h); 190 | 191 | box->box_p[0] = DL_IMAGE_MAX(round(DL_IMAGE_MAX(0, x1) + 0.5 * (w - l)), 0); 192 | box->box_p[1] = DL_IMAGE_MAX(round(DL_IMAGE_MAX(0, y1) + 0.5 * (h - l)), 0); 193 | 194 | box->box_p[2] = box->box_p[0] + l - 1; 195 | if (box->box_p[2] > width) 196 | { 197 | box->box_p[2] = width - 1; 198 | box->box_p[0] = width - l; 199 | } 200 | box->box_p[3] = box->box_p[1] + l - 1; 201 | if (box->box_p[3] > height) 202 | { 203 | box->box_p[3] = height - 1; 204 | box->box_p[1] = height - l; 205 | } 206 | } 207 | } 208 | 209 | /**@{*/ 210 | /** 211 | * @brief Convert RGB565 image to RGB888 image 212 | * 213 | * @param in Input RGB565 
image 214 | * @param dst Resulting RGB888 image 215 | */ 216 | static inline void rgb565_to_888(uint16_t in, uint8_t *dst) 217 | { /*{{{*/ 218 | in = (in & 0xFF) << 8 | (in & 0xFF00) >> 8; 219 | dst[2] = (in & RGB565_MASK_BLUE) << 3; // blue 220 | dst[1] = (in & RGB565_MASK_GREEN) >> 3; // green 221 | dst[0] = (in & RGB565_MASK_RED) >> 8; // red 222 | 223 | // dst[0] = (in & 0x1F00) >> 5; 224 | // dst[1] = ((in & 0x7) << 5) | ((in & 0xE000) >> 11); 225 | // dst[2] = in & 0xF8; 226 | } /*}}}*/ 227 | 228 | static inline void rgb565_to_888_q16(uint16_t in, int16_t *dst) 229 | { /*{{{*/ 230 | in = (in & 0xFF) << 8 | (in & 0xFF00) >> 8; 231 | dst[2] = (in & RGB565_MASK_BLUE) << 3; // blue 232 | dst[1] = (in & RGB565_MASK_GREEN) >> 3; // green 233 | dst[0] = (in & RGB565_MASK_RED) >> 8; // red 234 | 235 | // dst[0] = (in & 0x1F00) >> 5; 236 | // dst[1] = ((in & 0x7) << 5) | ((in & 0xE000) >> 11); 237 | // dst[2] = in & 0xF8; 238 | } /*}}}*/ 239 | /**@}*/ 240 | 241 | /** 242 | * @brief Convert RGB888 image to RGB565 image 243 | * 244 | * @param in Resulting RGB565 image 245 | * @param r The red channel of the Input RGB888 image 246 | * @param g The green channel of the Input RGB888 image 247 | * @param b The blue channel of the Input RGB888 image 248 | */ 249 | static inline void rgb888_to_565(uint16_t *in, uint8_t r, uint8_t g, uint8_t b) 250 | { /*{{{*/ 251 | uint16_t rgb565 = 0; 252 | rgb565 = ((r >> 3) << 11); 253 | rgb565 |= ((g >> 2) << 5); 254 | rgb565 |= (b >> 3); 255 | rgb565 = (rgb565 & 0xFF) << 8 | (rgb565 & 0xFF00) >> 8; 256 | *in = rgb565; 257 | } /*}}}*/ 258 | 259 | /** 260 | * @brief Filter out the resulting boxes whose confidence score is lower than the threshold and convert the boxes to the actual boxes on the original image.((x, y, w, h) -> (x1, y1, x2, y2)) 261 | * 262 | * @param score Confidence score of the boxes 263 | * @param offset The predicted anchor-based offset 264 | * @param landmark The landmarks corresponding to the box 265 | * @param width 
Width of the original image 266 | * @param height Height of the original image 267 | * @param anchor_number Anchor number of the detection output feature map 268 | * @param anchors_size The anchor size 269 | * @param score_threshold Threshold of the confidence score 270 | * @param stride 271 | * @param resized_height_scale 272 | * @param resized_width_scale 273 | * @param do_regression 274 | * @return image_list_t* 275 | */ 276 | image_list_t *image_get_valid_boxes(fptp_t *score, 277 | fptp_t *offset, 278 | fptp_t *landmark, 279 | int width, 280 | int height, 281 | int anchor_number, 282 | int *anchors_size, 283 | fptp_t score_threshold, 284 | int stride, 285 | fptp_t resized_height_scale, 286 | fptp_t resized_width_scale, 287 | bool do_regression); 288 | /** 289 | * @brief Sort the resulting box lists by their confidence score. 290 | * 291 | * @param image_sorted_list The sorted box list. 292 | * @param insert_list The box list that has not been sorted. 293 | */ 294 | void image_sort_insert_by_score(image_list_t *image_sorted_list, const image_list_t *insert_list); 295 | 296 | /** 297 | * @brief Run the NMS algorithm 298 | * 299 | * @param image_list The input box list 300 | * @param nms_threshold NMS threshold 301 | * @param same_area The flag of boxes with the same area 302 | */ 303 | void image_nms_process(image_list_t *image_list, fptp_t nms_threshold, int same_area); 304 | 305 | /** 306 | * @brief Resize an image to half size 307 | * 308 | * @param dimage The output image 309 | * @param dw Width of the output image 310 | * @param dh Height of the output image 311 | * @param dc Channel of the output image 312 | * @param simage Source image 313 | * @param sw Width of the source image 314 | * @param sc Channel of the source image 315 | */ 316 | void image_zoom_in_twice(uint8_t *dimage, 317 | int dw, 318 | int dh, 319 | int dc, 320 | uint8_t *simage, 321 | int sw, 322 | int sc); 323 | 324 | /** 325 | * @brief Resize the image in RGB888 format via bilinear
interpolation 326 | * 327 | * @param dst_image The output image 328 | * @param src_image Source image 329 | * @param dst_w Width of the output image 330 | * @param dst_h Height of the output image 331 | * @param dst_c Channel of the output image 332 | * @param src_w Width of the source image 333 | * @param src_h Height of the source image 334 | */ 335 | void image_resize_linear(uint8_t *dst_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h); 336 | 337 | /** 338 | * @brief Crop, rotate and zoom the image in RGB888 format, 339 | * 340 | * @param corp_image The output image 341 | * @param src_image Source image 342 | * @param rotate_angle Rotate angle 343 | * @param ratio scaling ratio 344 | * @param center Center of rotation 345 | */ 346 | void image_cropper(uint8_t *corp_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h, float rotate_angle, float ratio, float *center); 347 | 348 | /** 349 | * @brief Convert the rgb565 image to the rgb888 image 350 | * 351 | * @param m The output rgb888 image 352 | * @param bmp The input rgb565 image 353 | * @param count Total pixels of the rgb565 image 354 | */ 355 | void image_rgb565_to_888(uint8_t *m, uint16_t *bmp, int count); 356 | 357 | /** 358 | * @brief Convert the rgb888 image to the rgb565 image 359 | * 360 | * @param bmp The output rgb565 image 361 | * @param m The input rgb888 image 362 | * @param count Total pixels of the rgb565 image 363 | */ 364 | void image_rgb888_to_565(uint16_t *bmp, uint8_t *m, int count); 365 | 366 | /** 367 | * @brief draw rectangle on the rgb565 image 368 | * 369 | * @param buf Input image 370 | * @param boxes Rectangle Boxes 371 | * @param width Width of the input image 372 | */ 373 | void draw_rectangle_rgb565(uint16_t *buf, box_array_t *boxes, int width); 374 | 375 | /** 376 | * @brief draw rectangle on the rgb888 image 377 | * 378 | * @param buf Input image 379 | * @param boxes Rectangle Boxes 380 | * @param width Width of the 
input image 381 | */ 382 | void draw_rectangle_rgb888(uint8_t *buf, box_array_t *boxes, int width); 383 | 384 | /** 385 | * @brief Get the pixel difference of two images 386 | * 387 | * @param dst The output pixel difference 388 | * @param src1 Input image 1 389 | * @param src2 Input image 2 390 | * @param count Total pixels of the input image 391 | */ 392 | void image_abs_diff(uint8_t *dst, uint8_t *src1, uint8_t *src2, int count); 393 | 394 | /** 395 | * @brief Binarize an image to 0 and value. 396 | * 397 | * @param dst The output image 398 | * @param src Source image 399 | * @param threshold Threshold of binarization 400 | * @param value The value of binarization 401 | * @param count Total pixels of the input image 402 | * @param mode Threshold mode 403 | */ 404 | void image_threshold(uint8_t *dst, uint8_t *src, int threshold, int value, int count, en_threshold_mode mode); 405 | 406 | /** 407 | * @brief Erode the image 408 | * 409 | * @param dst The output image 410 | * @param src Source image 411 | * @param src_w Width of the source image 412 | * @param src_h Height of the source image 413 | * @param src_c Channel of the source image 414 | */ 415 | void image_erode(uint8_t *dst, uint8_t *src, int src_w, int src_h, int src_c); 416 | 417 | typedef float matrixType; 418 | typedef struct 419 | { 420 | int w; /*!< width */ 421 | int h; /*!< height */ 422 | matrixType **array; /*!< array */ 423 | } Matrix; 424 | 425 | /** 426 | * @brief Allocate a 2d matrix 427 | * 428 | * @param h Height of matrix 429 | * @param w Width of matrix 430 | * @return Matrix* 2d matrix 431 | */ 432 | Matrix *matrix_alloc(int h, int w); 433 | 434 | /** 435 | * @brief Free a 2d matrix 436 | * 437 | * @param m 2d matrix 438 | */ 439 | void matrix_free(Matrix *m); 440 | 441 | /** 442 | * @brief Get the similarity matrix of similarity transformation 443 | * 444 | * @param srcx Source x coordinates 445 | * @param srcy Source y coordinates 446 | * @param dstx Destination x coordinates 447 | * 
@param dsty Destination y coordinates 448 | * @param num The number of the coordinates 449 | * @return Matrix* The resulting transformation matrix 450 | */ 451 | Matrix *get_similarity_matrix(float *srcx, float *srcy, float *dstx, float *dsty, int num); 452 | 453 | /** 454 | * @brief Get the affine transformation matrix 455 | * 456 | * @param srcx Source x coordinates 457 | * @param srcy Source y coordinates 458 | * @param dstx Destination x coordinates 459 | * @param dsty Destination y coordinates 460 | * @return Matrix* The resulting transformation matrix 461 | */ 462 | Matrix *get_affine_transform(float *srcx, float *srcy, float *dstx, float *dsty); 463 | 464 | /** 465 | * @brief Applies an affine transformation to an image 466 | * 467 | * @param img Input image 468 | * @param crop Dst output image that has the size dsize and the same type as src 469 | * @param M Affine transformation matrix 470 | */ 471 | void warp_affine(dl_matrix3du_t *img, dl_matrix3du_t *crop, Matrix *M); 472 | 473 | /** 474 | * @brief Resize the image in RGB888 format via bilinear interpolation, and quantify the output image 475 | * 476 | * @param dst_image Quantized output image 477 | * @param src_image Input image 478 | * @param dst_w Width of the output image 479 | * @param dst_h Height of the output image 480 | * @param dst_c Channel of the output image 481 | * @param src_w Width of the input image 482 | * @param src_h Height of the input image 483 | * @param shift Shift parameter of quantization. 484 | */ 485 | void image_resize_linear_q(qtp_t *dst_image, uint8_t *src_image, int dst_w, int dst_h, int dst_c, int src_w, int src_h, int shift); 486 | 487 | /** 488 | * @brief Preprocess the input image of object detection model. The process is like this: resize -> normalize -> quantify 489 | * 490 | * @param image Input image, RGB888 format. 491 | * @param input_w Width of the input image. 492 | * @param input_h Height of the input image. 
493 | * @param target_size Target size of the model input image. 494 | * @param exponent Exponent of the quantized model input image. 495 | * @param process_mode Process mode. 0: resize with padding to keep height == width. 1: resize without padding, height != width. 496 | * @return dl_matrix3dq_t* The resulting preprocessed image. 497 | */ 498 | dl_matrix3dq_t *image_resize_normalize_quantize(uint8_t *image, int input_w, int input_h, int target_size, int exponent, int process_mode); 499 | 500 | /** 501 | * @brief Resize the image in RGB565 format via mean neighbour interpolation, and quantify the output image 502 | * 503 | * @param dimage Quantized output image. 504 | * @param simage Input image. 505 | * @param dw Width of the allocated output image memory. 506 | * @param dc Channel of the allocated output image memory. 507 | * @param sw Width of the input image. 508 | * @param sh Height of the input image. 509 | * @param tw Target width of the output image. 510 | * @param th Target height of the output image. 511 | * @param shift Shift parameter of quantization. 512 | */ 513 | void image_resize_shift_fast(qtp_t *dimage, uint16_t *simage, int dw, int dc, int sw, int sh, int tw, int th, int shift); 514 | 515 | /** 516 | * @brief Resize the image in RGB565 format via nearest neighbour interpolation, and quantify the output image 517 | * 518 | * @param dimage Quantized output image. 519 | * @param simage Input image. 520 | * @param dw Width of the allocated output image memory. 521 | * @param dc Channel of the allocated output image memory. 522 | * @param sw Width of the input image. 523 | * @param sh Height of the input image. 524 | * @param tw Target width of the output image. 525 | * @param th Target height of the output image. 526 | * @param shift Shift parameter of quantization. 
527 | */ 528 | void image_resize_nearest_shift(qtp_t *dimage, uint16_t *simage, int dw, int dc, int sw, int sh, int tw, int th, int shift); 529 | 530 | /** 531 | * @brief Crop the image in RGB565 format and resize it to the target size, then quantize the output image 532 | * 533 | * @param dimage Quantized output image. 534 | * @param simage Input image. 535 | * @param dw Target size of the output image. 536 | * @param sw Width of the input image. 537 | * @param sh Height of the input image. 538 | * @param x1 The x coordinate of the upper left corner of the cropped area 539 | * @param y1 The y coordinate of the upper left corner of the cropped area 540 | * @param x2 The x coordinate of the lower right corner of the cropped area 541 | * @param y2 The y coordinate of the lower right corner of the cropped area 542 | * @param shift Shift parameter of quantization. 543 | */ 544 | void image_crop_shift_fast(qtp_t *dimage, uint16_t *simage, int dw, int sw, int sh, int x1, int y1, int x2, int y2, int shift); 545 | 546 | #ifdef __cplusplus 547 | } 548 | #endif 549 | -------------------------------------------------------------------------------- /edge-impulse-esp32-cam/mtmn.h: -------------------------------------------------------------------------------- 1 | /* 2 | * ESPRESSIF MIT License 3 | * 4 | * Copyright (c) 2018 5 | * 6 | * Permission is hereby granted for use on ESPRESSIF SYSTEMS products only, in which case, 7 | * it is free of charge, to any person obtaining a copy of this software and associated 8 | * documentation files (the "Software"), to deal in the Software without restriction, including 9 | * without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 10 | * and/or sell copies of the Software, and to permit persons to whom the Software is furnished 11 | * to do so, subject to the following conditions: 12 | * 13 | * The above copyright notice and this permission notice shall be included in all copies or 14 | * substantial
portions of the Software. 15 | * 16 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 18 | * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 19 | * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 20 | * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 21 | * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 22 | * 23 | */ 24 | #pragma once 25 | 26 | #ifdef __cplusplus 27 | extern "C" 28 | { 29 | #endif 30 | #include "dl_lib_matrix3d.h" 31 | #include "dl_lib_matrix3dq.h" 32 | 33 | /** 34 | * Detection results with MTMN. 35 | * 36 | */ 37 | typedef struct 38 | { 39 | dl_matrix3d_t *category; /*!< Classification result after softmax, channel is 2 */ 40 | dl_matrix3d_t *offset; /*!< Bounding box offset of 2 points: top-left and bottom-right, channel is 4 */ 41 | dl_matrix3d_t *landmark; /*!< Offsets of 5 landmarks: 42 | * - Left eye 43 | * - Mouth left side 44 | * - Nose 45 | * - Right eye 46 | * - Mouth right side 47 | * 48 | * channel is 10 49 | * */ 50 | } mtmn_net_t; 51 | 52 | 53 | /** 54 | * @brief Free a mtmn_net_t 55 | * 56 | * @param p A mtmn_net_t pointer 57 | * 58 | */ 59 | 60 | void mtmn_net_t_free(mtmn_net_t *p); 61 | 62 | /** 63 | * @brief Run the pnet forward pass for coarse detection. Calculated in floating point. 64 | * 65 | * @param in Image matrix, rgb888 format, size is 320x240 66 | * @return Scores for every pixel, and the corresponding box offsets. 67 | */ 68 | mtmn_net_t *pnet_lite_f(dl_matrix3du_t *in); 69 | 70 | /** 71 | * @brief Run the rnet forward pass to refine the boxes from pnet. Calculated in floating point. 72 | * 73 | * @param in Image matrix, rgb888 format 74 | * @param threshold Score threshold to detect a human face 75 | * @return Scores for every box, and the corresponding box offsets.
76 | */ 77 | mtmn_net_t *rnet_lite_f_with_score_verify(dl_matrix3du_t *in, float threshold); 78 | 79 | /** 80 | * @brief Run the onet forward pass to refine the boxes from rnet. Calculated in floating point. 81 | * 82 | * @param in Image matrix, rgb888 format 83 | * @param threshold Score threshold to detect a human face 84 | * @return Scores for every box, the corresponding box offsets, and landmarks. 85 | */ 86 | mtmn_net_t *onet_lite_f_with_score_verify(dl_matrix3du_t *in, float threshold); 87 | 88 | /** 89 | * @brief Run the pnet forward pass for coarse detection. Calculated in quantized arithmetic. 90 | * 91 | * @param in Image matrix, rgb888 format, size is 320x240 92 | * @return Scores for every pixel, and the corresponding box offsets. 93 | */ 94 | mtmn_net_t *pnet_lite_q(dl_matrix3du_t *in, dl_conv_mode mode); 95 | 96 | /** 97 | * @brief Run the rnet forward pass to refine the boxes from pnet. Calculated in quantized arithmetic. 98 | * 99 | * @param in Image matrix, rgb888 format 100 | * @param threshold Score threshold to detect a human face 101 | * @return Scores for every box, and the corresponding box offsets. 102 | */ 103 | mtmn_net_t *rnet_lite_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 104 | 105 | /** 106 | * @brief Run the onet forward pass to refine the boxes from rnet. Calculated in quantized arithmetic. 107 | * 108 | * @param in Image matrix, rgb888 format 109 | * @param threshold Score threshold to detect a human face 110 | * @return Scores for every box, the corresponding box offsets, and landmarks. 111 | */ 112 | mtmn_net_t *onet_lite_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 113 | 114 | /** 115 | * @brief Run the pnet forward pass for coarse detection. Calculated in quantized arithmetic. 116 | * 117 | * @param in Image matrix, rgb888 format, size is 320x240 118 | * @return Scores for every pixel, and the corresponding box offsets.
119 | */ 120 | mtmn_net_t *pnet_heavy_q(dl_matrix3du_t *in, dl_conv_mode mode); 121 | 122 | /** 123 | * @brief Run the rnet forward pass to refine the boxes from pnet. Calculated in quantized arithmetic. 124 | * 125 | * @param in Image matrix, rgb888 format 126 | * @param threshold Score threshold to detect a human face 127 | * @return Scores for every box, and the corresponding box offsets. 128 | */ 129 | mtmn_net_t *rnet_heavy_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 130 | 131 | /** 132 | * @brief Run the onet forward pass to refine the boxes from rnet. Calculated in quantized arithmetic. 133 | * 134 | * @param in Image matrix, rgb888 format 135 | * @param threshold Score threshold to detect a human face 136 | * @return Scores for every box, the corresponding box offsets, and landmarks. 137 | */ 138 | mtmn_net_t *onet_heavy_q_with_score_verify(dl_matrix3du_t *in, float threshold, dl_conv_mode mode); 139 | 140 | #ifdef __cplusplus 141 | } 142 | #endif 143 | -------------------------------------------------------------------------------- /ei-esp32-cam-cat-dog-arduino-1.0.4.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alankrantas/edge-impulse-esp32-cam-image-classification/bea4de2a83737349598063bc4ded2949cdcc5b25/ei-esp32-cam-cat-dog-arduino-1.0.4.zip -------------------------------------------------------------------------------- /esp32-cam-edge-impulse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alankrantas/edge-impulse-esp32-cam-image-classification/bea4de2a83737349598063bc4ded2949cdcc5b25/esp32-cam-edge-impulse.png --------------------------------------------------------------------------------