├── .gitignore
├── CMakeLists.txt
├── images
│   ├── clean.jpg
│   ├── matched.jpg
│   └── original.jpg
├── main.cpp
└── readme.md

/.gitignore:
--------------------------------------------------------------------------------
build
--------------------------------------------------------------------------------
/CMakeLists.txt:
--------------------------------------------------------------------------------
cmake_minimum_required(VERSION 2.8)
project(Kayak)
add_definitions(-std=c++11)
find_package(OpenCV REQUIRED)
add_executable(kayak main.cpp)
target_link_libraries(kayak ${OpenCV_LIBS})
--------------------------------------------------------------------------------
/images/clean.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepanbujnak/kayak/ddbbd54a5d8fea74bf4047276ed47c477f92aed4/images/clean.jpg
--------------------------------------------------------------------------------
/images/matched.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepanbujnak/kayak/ddbbd54a5d8fea74bf4047276ed47c477f92aed4/images/matched.jpg
--------------------------------------------------------------------------------
/images/original.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepanbujnak/kayak/ddbbd54a5d8fea74bf4047276ed47c477f92aed4/images/original.jpg
--------------------------------------------------------------------------------
/main.cpp:
--------------------------------------------------------------------------------
#include <cassert>
#include <cmath>
#include <iostream>

#include <opencv2/opencv.hpp>

// Sectors is our constant determining how each character will be split into a
// grid. Here we've got a 3x3 grid; each cell is going to be an individual
// input to the ANN.
static const cv::Size Sectors(3, 3);

// Structure capturing each character.
struct Char {
  int x;          // X coordinate of the top left corner of the character in
                  // the original image
  int y;          // Y coordinate of the top left corner of the character in
                  // the original image
  int w;          // Character width
  int h;          // Character height
  int predicted;  // The ASCII value of the character predicted by the ANN
  cv::Mat mat;    // Image of the character
};

// Helper structure used by the query search.
struct Location {
  int x;      // Row of the text matrix
  int y;      // Column of the text matrix
  int dir_x;  // Search direction for `x`
  int dir_y;  // Search direction for `y`

  Location(int x, int y, int dir_x, int dir_y)
    : x(x), y(y), dir_x(dir_x), dir_y(dir_y) {}
};

// A matrix with each character indexed by its actual position.
// This is necessary for the search that will be conducted after character
// recognition.
typedef std::vector<std::vector<Char>> Text;

//----------------------------------------------------------------------------
// isRowBlank
//----------------------------------------------------------------------------
// Iterates over all columns in a row specified by the `row_idx` parameter and
// returns true if the row is blank, i.e. all the pixels in the row are white.
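// Note: the matrix passed in is the thresholded image produced in main(), so
// every pixel is either 0 (ink) or 255 (background); "blank" therefore means
// that no zero-valued pixel occurs anywhere in the row.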
static bool
isRowBlank(const cv::Mat &mat, int row_idx) {
  const auto &row = mat.ptr(row_idx);

  for (int i = 0; i < mat.cols; ++i) {
    if (row[i] == 0) {
      return false;
    }
  }

  return true;
}

//----------------------------------------------------------------------------
// isColBlank
//----------------------------------------------------------------------------
// Iterates over all rows in a column specified by the `col_idx` parameter and
// returns true if the column is blank, i.e. all the pixels in the column
// are white.
static bool
isColBlank(const cv::Mat &mat, int col_idx) {
  for (int i = 0; i < mat.rows; ++i) {
    const auto &row = mat.ptr(i);

    if (row[col_idx] == 0) {
      return false;
    }
  }

  return true;
}

//----------------------------------------------------------------------------
// calculateInputs
//----------------------------------------------------------------------------
// The function resizes the character image to the size specified by `Sectors`,
// e.g. if Sectors is 3x3 then the resized image has 9 pixels.
// Each pixel is then used as an input to the neural network.
static void
calculateInputs(const cv::Mat &mat, cv::Mat &inputs) {
  cv::Mat small, flat;

  cv::resize(mat, small, Sectors);
  flat = small.reshape(0, 1);

  assert(flat.cols == inputs.cols);

  for (int i = 0; i < flat.cols; ++i) {
    inputs.at<double>(0, i) = static_cast<double>(flat.at<uchar>(0, i));
  }
}

//----------------------------------------------------------------------------
// detectText
//----------------------------------------------------------------------------
// Here we merely extract the character images from the large, global image.
// Since the image is thresholded we can separate characters by empty (white)
// rows and columns.
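// The segmentation happens in two passes: first the image is cut into
// horizontal lines wherever a run of non-blank rows ends, then each line is
// cut into characters wherever a run of non-blank columns ends. Each Char
// keeps its top-left corner in the coordinates of the original image so the
// matches can be circled on it later.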
static void
detectText(const cv::Mat &mat, Text &text) {
  std::vector<std::pair<int, cv::Mat>> lines;
  bool was_blank = true;
  int line_start = 0;
  int char_start = 0;

  // Detect lines
  for (int i = 0; i < mat.rows; ++i) {
    bool is_blank = isRowBlank(mat, i);

    if (!is_blank && was_blank) {
      line_start = i;
      was_blank = false;
    } else if (is_blank && !was_blank) {
      auto line_mat = mat(cv::Rect(0, line_start, mat.cols, i - line_start));
      lines.push_back(std::make_pair(line_start, line_mat));
      was_blank = true;
    }
  }

  was_blank = true;

  // Detect chars
  for (const auto &line: lines) {
    std::vector<Char> chars;

    for (int i = 0; i < line.second.cols; ++i) {
      bool is_blank = isColBlank(line.second, i);

      if (!is_blank && was_blank) {
        char_start = i;
        was_blank = false;
      } else if (is_blank && !was_blank) {
        Char cr;

        // Save locations
        cr.x = char_start;
        cr.y = line.first;
        cr.w = i - char_start;
        cr.h = line.second.rows;
        cr.mat = line.second(cv::Rect(cr.x, 0, cr.w, cr.h));

        chars.push_back(cr);
        was_blank = true;
      }
    }

    text.push_back(chars);
  }
}

//----------------------------------------------------------------------------
// readText
//----------------------------------------------------------------------------
// Initialize and train the ANN.
// Our topology is going to be:
//   Inputs: the area of Sectors. If Sectors is 3x3 then there are 9 inputs.
//   Hidden: rule of thumb: 1 layer with (size(inputs) + size(outputs)) / 2 neurons.
//   Outputs: 1 (we only need one character)
//---
// After the ANN is initialized and trained we predict by running the extracted
// characters through the neural network and then assigning the label of the
// nearest match.
static void
readText(Text &text, const cv::Mat &train_in, const cv::Mat &train_out) {
  auto nb_inputs = Sectors.area();
  CvANN_MLP ann((cv::Mat_<int>(1, 3) << nb_inputs, ((nb_inputs + 1) / 2), 1));
  ann.train(train_in, train_out, cv::Mat());

  // Predict
  for (int i = 0; i < text.size(); ++i) {
    for (int j = 0; j < text[i].size(); ++j) {
      cv::Mat inputs(1, Sectors.area(), CV_64F);
      calculateInputs(text[i][j].mat, inputs);

      cv::Mat outputs(1, 1, CV_64F);
      ann.predict(inputs, outputs);

      auto predicted = outputs.at<double>(0, 0);

      int nearest_dst = 9999;
      int nearest_val = -1;
      for (int k = 0; k < train_out.rows; ++k) {
        int dst = std::abs(predicted - train_out.at<double>(k, 0));

        if (dst < nearest_dst) {
          nearest_val = train_out.at<double>(k, 0);
          nearest_dst = dst;
        }
      }

      text[i][j].predicted = nearest_val;
    }
  }
}

//----------------------------------------------------------------------------
// searchInDir
//----------------------------------------------------------------------------
// Searches the character matrix, in the direction specified by the `loc`
// parameter, for the specified query.
// It also does some boundary checking to be sure that we don't run off the
// canvas.
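// The caller (colorMatches) has already verified that the character at the
// starting position matches query[0], so the scan below starts at offset 1.
// Because the path is a straight line and the puzzle forms a regular grid,
// checking that the final cell lies inside the matrix is enough for the
// boundary test.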
static bool
searchInDir(const Text &text, const std::string &query, const Location &loc) {
  int x = loc.x + loc.dir_x;
  int y = loc.y + loc.dir_y;

  int last_x = loc.x + ((static_cast<int>(query.size()) - 1) * loc.dir_x);
  int last_y = loc.y + ((static_cast<int>(query.size()) - 1) * loc.dir_y);

  if (last_x < 0 || last_x >= text.size() ||
      last_y < 0 || last_y >= text[last_x].size()) {
    return false;
  }

  for (int i = 1; i < query.size(); ++i) {
    if (query[i] != text[x][y].predicted) {
      return false;
    }

    x += loc.dir_x;
    y += loc.dir_y;
  }

  return true;
}

//----------------------------------------------------------------------------
// colorMatch
//----------------------------------------------------------------------------
// Draw a red circle around each character of the match. We can do this because
// we stored each character's position in the original image earlier.
static void
colorMatch(cv::Mat &image, const Text &text, const Location &loc, int len) {
  int x = loc.x;
  int y = loc.y;

  for (int i = 0; i < len; ++i) {
    auto cr = text[x][y];
    auto center_x = cr.x + (cr.w / 2);
    auto center_y = cr.y + (cr.h / 2);

    cv::Point center(center_x, center_y);
    cv::circle(image, center, cr.h, cv::Scalar(0, 0, 255), 2);

    x += loc.dir_x;
    y += loc.dir_y;
  }
}

//----------------------------------------------------------------------------
// colorMatches
//----------------------------------------------------------------------------
// Run the search and then circle all matched characters.
static void
colorMatches(cv::Mat &image, const Text &text, const std::string &query) {
  static const std::vector<std::pair<int, int>> directions {
    {1, 0}, {1, 1}, {0, 1}, {-1, 1}, {-1, 0}, {-1, -1}, {0, -1}
  };

  for (int i = 0; i < text.size(); ++i) {
    for (int j = 0; j < text[i].size(); ++j) {
      // Only if the first letter matches
      if (text[i][j].predicted == query[0]) {
        for (const auto &dir: directions) {
          Location location(i, j, dir.first, dir.second);

          if (searchInDir(text, query, location)) {
            colorMatch(image, text, location, query.size());
          }
        }
      }
    }
  }
}

//----------------------------------------------------------------------------
// Main
//----------------------------------------------------------------------------
int
main(int argc, char *argv[]) {
  if (argc != 2) {
    std::cerr << "usage: " << argv[0] << " <image>" << std::endl;
    return -1;
  }

  // Load the original image in color. We need a color image so we can draw
  // red circles on it later.
  cv::Mat image = cv::imread(argv[1], CV_LOAD_IMAGE_COLOR);
  if (!image.data) {
    std::cerr << "No image data" << std::endl;
    return -1;
  }

  // Convert the color image into grayscale
  cv::Mat grayscale;
  cv::cvtColor(image, grayscale, cv::COLOR_BGR2GRAY);

  // Set pixels to either black or white, by a threshold
  cv::Mat binary;
  cv::threshold(grayscale, binary, 127, 255, cv::THRESH_BINARY);

  // Recognize all the characters
  Text text;
  detectText(binary, text);

  // Handpicked characters used for training of the ANN.
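  // The indices below assume the layout of the cropped puzzle image
  // (presumably images/clean.jpg): the first detected row is taken to contain
  // an 'A' at position 0, a 'Y' at position 2 and a 'K' at position 3, and
  // those three glyphs are labelled with their ASCII codes for training.
  // A different input image would need different indices here.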
  auto K = text[0][3].mat;
  auto A = text[0][0].mat;
  auto Y = text[0][2].mat;

  // Load the inputs into the corresponding data structures.
  cv::Mat train_inputs(3, Sectors.area(), CV_64F);
  cv::Mat row;

  row = train_inputs.row(0);
  calculateInputs(K, row);
  row = train_inputs.row(1);
  calculateInputs(A, row);
  row = train_inputs.row(2);
  calculateInputs(Y, row);

  // Load the outputs into the corresponding data structures.
  cv::Mat train_outputs(3, 1, CV_64F);
  train_outputs.at<double>(0, 0) = 'K';
  train_outputs.at<double>(1, 0) = 'A';
  train_outputs.at<double>(2, 0) = 'Y';

  // Do the actual prediction
  readText(text, train_inputs, train_outputs);

  // Circle all connected characters matching the query string
  colorMatches(image, text, "KAYAK");

  // Save the annotated image (the on-screen display code is left commented out)
  /*cv::namedWindow("Display Image", cv::WINDOW_AUTOSIZE);
  cv::imshow("Display Image", image);
  cv::waitKey(0);*/
  cv::imwrite("matched.jpg", image);

  return 0;
}
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
The Kayak puzzle
================

So I saw this funny word search on the internet that was composed only of the letters **K**, **A** and **Y**.
The point of the word search was to find the word **KAYAK**. Seemingly impossible, and the author disclosed
that the word **KAYAK** appears in the glyph matrix only once.

After I spent a couple of minutes solving the puzzle I figured it would be interesting to write
a program to solve it for me, since I had never done any [OCR](http://en.wikipedia.org/wiki/Optical_character_recognition) before and wanted to try it.

![An image of the KAYAK word search](images/original.jpg "An image of the KAYAK word search")

How did I proceed
=================

The very first step was to remove the title and help text from the word search so it could be processed
by the program without distractions. I did this using Mac OS X's Preview.app. Here's the image after cropping:

![The cropped image](images/clean.jpg "The cropped image")

Then, using [OpenCV](http://opencv.org/), I separated the individual glyphs into a matrix, storing each glyph at the index
corresponding to its position in the glyph grid of the image.

After that, I initialized an [Artificial Neural Network](http://en.wikipedia.org/wiki/Artificial_neural_network) with the following topology: 9 inputs, 5 neurons in one hidden layer and one output. I resized each glyph into a 3x3 pixel matrix, each pixel corresponding to one input, and selected 3 glyphs for training the **ANN**.

After the **ANN** was trained I could predict each individual glyph and get its ASCII code instead. Once predicted,
it was relatively straightforward to search for the string. Here's the output image after the search was done:

![A matched KAYAK word](images/matched.jpg "A matched KAYAK word")

Implementation
==============

The implementation can be seen in the [main.cpp](main.cpp) file. The code is messy and not very clear, but hopefully you'll find it useful.
--------------------------------------------------------------------------------