├── LICENSE.txt
├── README.md
├── data
│   └── data.csv
├── img
│   ├── banner.png
│   ├── example01.png
│   ├── example02.png
│   ├── example03.png
│   └── example04.png
└── src
    ├── image.py
    └── main.c

/LICENSE.txt:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
![](img/banner.png)

A beginner programmer has protected his computer terminal from malicious bots using a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) based authentication system that he wrote in `C`.
To write the CAPTCHA he thought it would be a great idea to use the MNIST data set as input for the system; he used the labeled data set to display a random number of ASCII-formatted digits whose corresponding sequence is known (the program knows `0244` is the answer, so if the user types this sequence it's not a bot). He is not aware of sophisticated hacks associated with the programming language of choice, and he doesn't know about machine learning programs that can defeat his authentication program. You were hired as a penetration tester to defeat his program; he is pretty sure his system is robust and foolproof. You have two choices: hack the program using ML methods, or exploit vulnerabilities in the program related to `The C Programming Language`.

The challenge is to provide a solution to this problem (hack the MNIST based ASCII CAPTCHA) by using any of the following methods:

* Computer hacking techniques related to the program itself, the properties of the program, or the computer memory while the program is running (the program holds explicit references to the correct sequence). You will need the source code (don't worry, it is short and clear).
* ML related methods. In this alternative, you will need the data set (`data/data.csv`); you do not need to know `C` or to read the source code.

The program consists of a single `C` file (`src/main.c`), and you are not supposed to modify it. The attacks must come from other programs (a bot that gives you the correct sequence; the only input the bot can receive by hand is the ASCII banner printed by `main.c`). A description of the program follows:

To compile the program, just execute `make src/main`.

`./main min max [-w | -b] [-d | -r]` receives four command line arguments.

1. The first pair (`min`, `max`) is the range for the selection of the length of the digits of the CAPTCHA.
So the number of digits in the CAPTCHA can be any number between `min` and `max`, inclusive.
2. `[-w | -b]` tells the program how every single digit must be displayed in the terminal. If `-w` is selected, the digits are filled with spaces and the rest of the banner is filled with ASCII characters (a special range of values that excludes the space character). Otherwise, if `-b` is selected, the digits are filled with characters and the rest of the banner consists of spaces.
3. `[-d | -r]` tells the program which character to fill the banner with, depending on the previous selection. If `-d` is selected, a single fixed character fills the banner. If `-r` is selected, the banner is filled with random characters.

### Some examples

1. `./main 2 5 -w -d`

   ![](img/example01.png)

2. `./main 2 5 -w -r`

   ![](img/example02.png)

3. `./main 2 5 -b -d`

   ![](img/example03.png)

4. `./main 2 5 -b -r`

   ![](img/example04.png)

### Solve the challenge

To solve the challenge, just fork the repository and write your solution. You'll learn a lot from it.
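
As a possible starting point for the ML route, here is a minimal sketch (not part of the original repository) of how a bot might cut the printed banner into one 28 x 28 grid per digit. The function name `split_banner` is hypothetical, and the code assumes that the caller passes the 28 banner rows as one string and that digit pixels are always the minority, which is how it guesses between `-w` and `-b`.

```python
import math

SIZE = 28  # matches MNIST_LINE_SIZE in src/main.c


def split_banner(banner, size=SIZE):
    """Split an ASCII banner into per-digit grids where 1 marks a digit pixel."""
    rows = banner.split("\n")[:size]
    if len(rows) < size:
        raise ValueError("expected at least %d banner rows" % size)

    # Trailing spaces may have been stripped, so round the width up
    # to a whole number of digit boxes and pad every row to it.
    longest = max(len(row) for row in rows)
    n_digits = max(1, math.ceil(longest / size))
    width = n_digits * size
    rows = [row.ljust(width) for row in rows]

    # Heuristic (an assumption): the background dominates, so if spaces
    # are the minority the digits themselves are drawn with spaces (-w).
    spaces = sum(row.count(" ") for row in rows)
    ink_is_space = spaces * 2 < width * size

    digits = []
    for d in range(n_digits):
        grid = [
            [int((ch == " ") == ink_is_space) for ch in row[d * size:(d + 1) * size]]
            for row in rows
        ]
        digits.append(grid)
    return digits
```

Each grid can then be compared with the rows of `data/data.csv` (binarized the same way) by whatever classifier you prefer; that step is left out here.
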
`Happy Hacking`


--------------------------------------------------------------------------------
/img/banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scvalencia/MNIST_ASCII_challenge/60f7880f2d5aebe2420b472a4af7c7f2e0ee9c45/img/banner.png
--------------------------------------------------------------------------------
/img/example01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scvalencia/MNIST_ASCII_challenge/60f7880f2d5aebe2420b472a4af7c7f2e0ee9c45/img/example01.png
--------------------------------------------------------------------------------
/img/example02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scvalencia/MNIST_ASCII_challenge/60f7880f2d5aebe2420b472a4af7c7f2e0ee9c45/img/example02.png
--------------------------------------------------------------------------------
/img/example03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scvalencia/MNIST_ASCII_challenge/60f7880f2d5aebe2420b472a4af7c7f2e0ee9c45/img/example03.png
--------------------------------------------------------------------------------
/img/example04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scvalencia/MNIST_ASCII_challenge/60f7880f2d5aebe2420b472a4af7c7f2e0ee9c45/img/example04.png
--------------------------------------------------------------------------------
/src/image.py:
--------------------------------------------------------------------------------
import cv2
import numpy

def write_MNIST_files():

    file_object = open('../data/data.csv', 'r')

    file_object.readline()

    counters = {_ : 0 for _ in range(10)}

    folders = {
        0 : 'zero', 1 : 'one', 2 : 'two', 3
        : 'three',
        4 : 'four', 5 : 'five', 6 : 'six', 7 : 'seven',
        8 : 'eight', 9 : 'nine'
    }

    for line in file_object:

        # Build a real list: in Python 3, map() returns an iterator that cannot be indexed
        parsed = [int(x.strip()) for x in line.split(',')]
        label = parsed[0]

        # 8-bit pixels, the depth cv2.imwrite expects for a grayscale PNG
        image_array = numpy.array(parsed[1:], dtype=numpy.uint8)
        image_array = image_array.reshape(28, 28)

        imagefilename = "../img/data/" + folders[label] + "/file" + "_" + str(counters[label]) + ".png"
        cv2.imwrite(imagefilename, image_array)
        counters[label] = counters[label] + 1
--------------------------------------------------------------------------------
/src/main.c:
--------------------------------------------------------------------------------
//
//  main.c
//  MNIST_ASCII_Explorer
//
//  Created by Sebastian Valencia on 12/29/16.
//

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Total records in the file */
#define MNIST_DATA_COUNT 42000

/* What's the minimum value from a and b */
#define MIN(a, b) ((a) < (b)) ? (a) : (b)

const char* FILE_NAME = "../data/data.csv";

const int MAX_BUFFER_SIZE = 1024 * 7 - 216; // By trial and error

/* Pixel count per file record */
const int PIXELS_COUNT = 784;

/* The dimensions of the box */
const int MNIST_LINE_SIZE = 28;

/* Random coloring seed for the ASCII CAPTCHA */
const char ASCII_colors[] = {'.', ',', ';', ':', '"', '\'', '`', '+', '*', '%', '#', '@'};

/* Predefined coloring constant */
const char DOT = '*';

/*
 * This block defines the global variables of the program
 * min_digits, and max_digits, form a range that defines the number of digits
 * in the final CAPTCHA.
 *
 * white, if true, means that the digits should be filled with the space character, while
 * the surroundings must have other non-blank characters; that depends on the variable
 * dots: if this variable is true, those non-blank characters are going to be DOT,
 * otherwise, random characters taken from ASCII_colors.
 */

int min_digits, max_digits;
bool white;
bool dots;

/*
 * This block defines the basic data structures to handle MNIST data, that is, what is
 * taken from the file, and what's going to be processed throughout the program.
 *
 * raw_MNIST_line is a single line from the CSV file; it has the label (the real digit)
 * that number represents, and a list of integers holding the "brightness" of each of the
 * 784 pixels; those are numbers between 0 and 255 (with higher numbers meaning darker).
 *
 * MNIST_array is the collection of raw_MNIST_lines retrieved from the file.
 */

struct raw_MNIST_line {
    int label;
    int* pixels;
};

struct raw_MNIST_line MNIST_array[MNIST_DATA_COUNT];

/*
 * Creates and returns a new element of type raw_MNIST_line; it creates an object whose fields
 * are given as parameters
 */

struct raw_MNIST_line* alloc_raw_MNIST_line(int label, int size, int* pixels) {

    struct raw_MNIST_line* new_item = (struct raw_MNIST_line*)
        malloc(sizeof(struct raw_MNIST_line));

    /* Normalize the count of pixels */
    size = MIN(size, PIXELS_COUNT);

    new_item->label = label;
    new_item->pixels = (int*) malloc(sizeof(int) * size);

    for(int i = 0; i < size; i++)
        *((new_item->pixels) + i) = *(pixels + i);

    return new_item;
}

/*
 * Creates a raw_MNIST_line element from a line of the CSV file, where the first item
 * is the label, and the rest of the values are the pixels of the element
 */

struct raw_MNIST_line*
parse_raw_MNIST_line(char* line) {
    int label = 0, i = 0;
    int* pixels = (int*) malloc(sizeof(int) * PIXELS_COUNT);

    char* parse;
    parse = strtok(line, ",");

    /* string to int */
    label = (int) strtol(parse, (char **) NULL, 10);

    for(i = 0; i < PIXELS_COUNT && parse != NULL; i++) {
        parse = strtok (NULL, ",");
        *(pixels + i) = (int) strtol(parse, (char **) NULL, 10);
    }

    return alloc_raw_MNIST_line(label, i, pixels);
}

/*
 * Skips the header of the CSV file
 */

void skip_header(FILE* file, char* line) {
    fgets(line, MAX_BUFFER_SIZE, file);
    fgets(line, MAX_BUFFER_SIZE, file);
}

/*
 * Reads the file and populates MNIST_array. If I/O processing fails, it reports an
 * error, and the program exits with a failure status
 */

void handle_file() {
    FILE * file;
    char line[MAX_BUFFER_SIZE];

    /* Error handling */
    file = fopen(FILE_NAME, "r");

    if(!file) {
        perror("fopen");
        exit(1);
    }

    skip_header(file, line);

    /* Create items and populate the array */
    int i = 0;
    while(fgets(line, MAX_BUFFER_SIZE, file) && i < MNIST_DATA_COUNT) {
        struct raw_MNIST_line* item = parse_raw_MNIST_line(line);
        MNIST_array[i++] = *item;
    }

    fclose(file);

}

/*
 * Accepts an input and compares it to the actual value displayed.
 * It uses the banner holding the number of displayed digits, and the actual digits in order
 */

bool receive_CAPTCHA(int banner_size, struct raw_MNIST_line banner[]) {
    bool accepted = true;
    int num;

    printf(">> ");

    for(int i = 0; i < banner_size; i++) {

        int actual = banner[i].label;

        /* scanf a single digit */
        scanf("%1d", &num);

        /* DON'T SHORT CIRCUIT THE LOOP */
        if(num != actual)
            accepted = false;
    }

    return accepted;
}

/*
 * Tells the user whether or not the given input corresponds to the displayed digits
 */

void report_CAPTCHA(bool status) {
    printf("%s\n", status ? "accepted" : "wrong");
}

/*
 * Returns a random integer between min_num and min_num + max_num - 1, both bounds inclusive
 */

int randr(int min_num, int max_num) {
    return (rand() % max_num) + min_num;
}

/*
 * Depending on the state of the global variables, and on the given value for brightness,
 * this function gives a value that corresponds (taking into account the semantic constraints
 * imposed by the global variables) to the character to display in the CAPTCHA banner
 */

char classify_brightness(int brightness) {
    char ans = ' ';
    char random_fill = ASCII_colors[randr(0, 10)];

    if(!white && !dots)
        ans = (brightness == 0) ? ' ' : random_fill;
    else if(!white && dots)
        ans = (brightness == 0) ? ' ' : DOT;
    else if(white && !dots)
        ans = (brightness == 0) ? random_fill : ' ';
    else /* white && dots */
        ans = (brightness == 0) ? DOT : ' ';

    return ans;
}

/*
 * Displays a single MNIST digit given the data structure that holds it.
 * It displays a 28 x 28 ASCII image
 */

void display_MNIST_line(struct raw_MNIST_line* item) {
    printf("%d\n", item->label);

    for(int i = 0; i < PIXELS_COUNT; i++) {

        int brightness = item->pixels[i];

        char current = classify_brightness(brightness);

        if((i + MNIST_LINE_SIZE + 1) % MNIST_LINE_SIZE == 0)
            printf("%c\n", current);
        else
            printf("%c", current);

    }

    printf("\n\n");
}

/*
 * Displays the MNIST data set based CAPTCHA banner, using min_digits and max_digits
 * to define its size, and receives the CAPTCHA as seen by the user;
 * depending on the banner, it returns a boolean that tells whether the user was right
 */

bool display_MNIST_captcha() {

    /* Banner size, banner and actual answer for the CAPTCHA creation */
    int banner_size = randr(min_digits, max_digits - 1);
    struct raw_MNIST_line banner[banner_size];
    int answer[banner_size];

    /* Filling the banner and the actual answer */
    for(int i = 0; i < banner_size; i++) {
        int index = randr(0, MNIST_DATA_COUNT - 1);
        struct raw_MNIST_line item = MNIST_array[index];
        banner[i] = item;
        answer[i] = item.label;
    }

    /* Display the banner */
    for(int i = 0; i < MNIST_LINE_SIZE * MNIST_LINE_SIZE; i += MNIST_LINE_SIZE) {
        for(int j = 0; j < banner_size; j++) {
            for(int k = 0; k != MNIST_LINE_SIZE; k++) {
                int brightness = banner[j].pixels[k + i];
                char current = classify_brightness(brightness);
                printf("%c", current);
            }

        }

        printf("\n");
    }

    bool user_input = receive_CAPTCHA(banner_size, banner);
    return user_input;
}

/*
 * A function that checks if the given string is a non-negative integer
 */

bool isinteger(char const *str) {
    /* Handling
       the empty string */
    if (!*str)
        return false;

    /* Check for non-digit chars in the rest of the string */
    while (*str)
        if (!isdigit(*str))
            return false;
        else
            ++str;

    return true;
}

/*
 * A boring function to handle command line arguments. If something weird happens
 * the program will stop. The responsibility of this procedure is to populate
 * the global variables depending on the arguments
 */

void handle_args(int argc, char const *argv[]) {

    /* args to temporary variables */
    char const* min_digits_arg = argv[1];
    char const* max_digits_arg = argv[2];
    char const* white_arg = argv[3];
    char const* dots_arg = argv[4];

    /* Is the range consistent? */
    if(!isinteger(min_digits_arg) || !isinteger(max_digits_arg)) {
        printf("Error: first two args must be digits [0-9]\n");
        exit(1);
    } else {
        min_digits = (int) strtol(min_digits_arg, (char **) NULL, 10);
        max_digits = (int) strtol(max_digits_arg, (char **) NULL, 10);

        if(max_digits < min_digits) {
            printf("Error: second arg must be bigger than the first one\n");
            exit(1);
        }
    }

    /* Is white_arg -b or -w? */
    if(strcmp(white_arg, "-b") != 0 && strcmp(white_arg, "-w") != 0) {
        printf("Error: third arg must be -w for space filled digits, or -b for filled digits\n");
        exit(1);
    } else
        white = (strcmp(white_arg, "-w") == 0);

    /* Is dots_arg -d or -r?
    */
    if(strcmp(dots_arg, "-d") != 0 && strcmp(dots_arg, "-r") != 0) {
        printf("Error: fourth arg must be -d to fill non-white space with dots, ");
        printf("or -r to fill it with random ASCII characters\n");
        exit(1);
    } else
        dots = (strcmp(dots_arg, "-d") == 0);
}

/*
 * The main function takes care of the arguments, global variable instantiation, handling the
 * MNIST file and the logic behind the CAPTCHA (displaying the CAPTCHA banner), taking the user
 * input, and reporting the status of the input vs the banner, in a loop until the user input
 * corresponds to the actual digits displayed.
 *
 * ./main m n [-b | -w] [-d | -r]
 */

int main(int argc, char const *argv[]) {

    handle_args(argc, argv);
    handle_file();
    srand(time(NULL));

    bool value = false;

    while(!value) {
        value = display_MNIST_captcha();
        report_CAPTCHA(value);
    }

    return 0;
}
--------------------------------------------------------------------------------
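
A note on the digit count, offered as a reading of `src/main.c` rather than as documented behavior: `display_MNIST_captcha` calls `randr(min_digits, max_digits - 1)`, and `randr` returns `(rand() % max_num) + min_num`. The sketch below models that arithmetic in Python to list the banner sizes that can actually occur. The helper name `banner_sizes` is hypothetical and does not exist in the repository.

```python
def banner_sizes(min_digits, max_digits):
    """List every value randr(min_digits, max_digits - 1) can return in src/main.c."""
    max_num = max_digits - 1
    if max_num <= 0:
        # In C, rand() % 0 is undefined behavior, so there is nothing to model.
        raise ValueError("max_digits must be at least 2")
    # rand() is non-negative, so rand() % max_num covers 0 .. max_num - 1.
    return [min_digits + offset for offset in range(max_num)]


if __name__ == "__main__":
    for lo, hi in [(2, 5), (3, 5), (0, 5)]:
        print(lo, hi, banner_sizes(lo, hi))
```

Under this model the README example `./main 2 5` happens to yield 2 to 5 digits, but other argument pairs do not map onto `min` to `max`, which a bot may want to take into account when it reads the banner width.
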