├── Makefile
├── README.md
├── benchmarks
    ├── benchmark.c
    ├── benchmark.h
    └── linux-perf-events.h
├── shuffle_table.c
├── shuffle_table.h
├── test.c
├── utf8_code.h
└── utils.h


/Makefile:
--------------------------------------------------------------------------------
1 | test: test.c utf8_code.h utils.h shuffle_table.h
2 | 	gcc -o test test.c
3 |
4 | benchmark: benchmarks/benchmark.c utf8_code.h shuffle_table.h
5 | 	gcc -std=c99 -march=native -O3 -Wall -o benchmark benchmarks/benchmark.c -I. -Ibenchmarks -D_GNU_SOURCE
6 |
7 |


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # fastdecode-utf-8
2 | SIMD-accelerated UTF-8 to UTF-32 conversion
3 |
4 |
5 | This is a UTF-8 decoder (to UTF-32) using Intel SIMD instructions (so far SSE only). It's based on an idea I had for doing rapid selection in variable-width data.
6 |
7 | Given a packed sequence of variable-length, byte-encoded values where the length is encoded in the first byte, selection of the k-th value is remarkably easy despite the variability.
8 | A mapping P can be formed in-place where each initial byte points to the next, which can be manipulated by the SIMD shuffle instruction.
9 |
10 | Example: A sequence of initial (shown as lengths) and continuation (shown as ".") bytes:
11 |
12 |     S = 2,.,1,3,.,.,2,.,1,1,1
13 |
14 | is transformed by simple arithmetic to
15 |
16 |     P = 2,.,3,6,.,.,8,.,9,10,11
17 |
18 | where, starting with position 0, each entry points to the next in sequence (a linked list, but with 4-bit pointers).
19 |
20 | The pshufb instruction dereferences such a list of indexes to the values at those indexes in a source register, so applying P above will shift the initial byte of the second value to the first byte (shifting left by one), and we can compose P^2, P^4, etc. to shift by other amounts.
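To make the composition concrete, here is a minimal scalar sketch of the idea. It is not code from this repository: `shuffle16` is a hypothetical stand-in that mimics what pshufb does to one 16-byte register, and `build_p` and `kth_offset` are names made up for illustration. It assumes the lengths are already known per byte (0 marks a continuation byte) and ignores pointers that run past the end of the register, which simply wrap around here as they would with pshufb.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 16 };

/* Scalar stand-in for pshufb: lane i of the result is src[idx[i] & 15],
   or 0 when the high bit of idx[i] is set. dst may alias src or idx. */
static void shuffle16(uint8_t dst[LANES], const uint8_t src[LANES],
                      const uint8_t idx[LANES]) {
    uint8_t tmp[LANES];
    for (int i = 0; i < LANES; i++)
        tmp[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
    memcpy(dst, tmp, LANES);
}

/* Build P from per-byte lengths: len[i] is 1..4 on an initial byte and
   0 on a continuation byte. Initial bytes point at the next initial byte;
   continuation lanes get 0x80 so a shuffle through them yields 0. */
static void build_p(uint8_t p[LANES], const uint8_t len[LANES]) {
    for (int i = 0; i < LANES; i++) {
        assert(len[i] <= 4);
        p[i] = len[i] ? (uint8_t)(i + len[i]) : 0x80;
    }
}

/* Offset of the k-th value after position 0, i.e. lane 0 of P^k,
   computed by repeated squaring (about log2(k) rounds of shuffles). */
static uint8_t kth_offset(const uint8_t len[LANES], unsigned k) {
    uint8_t base[LANES], acc[LANES];
    build_p(base, len);
    for (int i = 0; i < LANES; i++)
        acc[i] = (uint8_t)i;                 /* identity mapping */
    while (k) {
        if (k & 1)
            shuffle16(acc, base, acc);       /* acc[i] = base[acc[i]] */
        shuffle16(base, base, base);         /* base = base composed with base */
        k >>= 1;
    }
    return acc[0];
}
```

With the example lengths above padded out to 16 lanes with 1-byte values (`{2,0,1,3,0,0,2,0,1,1,1,1,1,1,1,1}`), `kth_offset(len, 1)` is 2 and `kth_offset(len, 4)` is 8, matching a walk along the list 0, 2, 3, 6, 8. Lanes that started as continuation bytes end up holding junk after squaring, which is harmless because no initial byte ever points at them.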
21 | That composability is the basis of selecting the k-th value in O(log k) time despite the variability in its position: we simply raise P to the k-th power, and the first byte then contains the offset of the k-th element. It's the power of permutation.
22 |
23 | These shuffles, once constructed, allow gathering bytes from multiple characters at a time into fixed positions, or generalized suffix-summing, where each position aggregates information from the k consecutive values that follow it. They're quite fun.
24 |
25 | One approach shown here is to aggregate, for each UTF-8 character, its own length and the lengths of the 3 characters that follow it into a code byte, which is used to look up a decoding shuffle from a table.
26 |
27 | The aggregation process is similar to the brute-force SIMD prefix sum algorithm: aggregate adjacent elements into pairs (using P), and then aggregate pairs into quadruples (using P^2). We then iterate through quadruples by building P^4 and storing it to a byte array.
28 |
29 | Another approach is to shift the first, e.g., 4 characters into fixed positions by adding 3 fixed-size dummy elements to the beginning of the array and building P^3 from that. Shifting everything left by 3 puts the offsets of the first 4 non-dummy characters into bytes 0-3, and they are used to build a decoding shuffle into the 4xuint32 output. This approach seems slower than the first, but it looks like it will scale better to wider registers (e.g. AVX512). The code-LUT approach is limited by the fact that only 4 2-bit lengths fit in a code byte.
30 |
31 | However, prepending dummy positions means truncating bytes from the right, so some care is needed.
32 |
33 | (Code for the dummy-positions approach is not in the initial check-in; I will remove this sentence when it is.)
34 |
35 | ## BENCHMARKS
36 |
37 | From the "benchmark" target. Measured with clang-4.0 on Skylake; the input is 65536 bytes of UTF-8 data. Cycles are reported per input byte.
38 |
39 | | Input        | Cycles/Byte | Output chars |
40 | |--------------|-------------|--------------|
41 | | ASCII        | 2.618       | 65536        |
42 | | UTF-8 3-byte | 2.218       | 21846        |
43 | | Random 1/3B  | 3.39        | 32926        |
44 |
45 | ASCII data is randomly generated bytes in the ASCII range (0-127).
46 |
47 | 3-byte data is the letter 'ㄱ' repeated 21845 times, followed by a single 'a' as filler.
48 |
49 | Random data is a simple 50/50 mix of 1-byte and 3-byte sequences, but it's enough to show that branch prediction is likely biasing the simpler benchmarks.
50 |
51 | The benchmark framework is due to Daniel Lemire.
52 |
53 | ### Performance Notes
54 |
55 | The code has a number of branch points and variable-length loops. It traverses the data 4 Unicode characters at a time, where the width of those 4 characters can vary from 4 to 16 bytes (yielding 1-4 iterations per SSE register), and it handles ASCII-only input as a special case (4 bytes are converted directly to uint32_t's, without the expensive masking and bitfield-compression steps). So far, experiments with unrolling the traversal and dispatching multiple convert/shuffle-compress operations at a time (see the "unroll" branch) have been slower than the branchy code.
56 |
57 | ## RELATED WORK
58 |
59 | This project had its inspiration in work we did for validation (https://github.com/lemire/fastvalidate-utf-8), and we still have an issue open on that project about whether we should do decoding as well, but the two techniques are not integrated yet. Note that this code therefore assumes its input is already valid UTF-8.
60 |
61 | ## BUILDING
62 |
63 |     $ make test
64 |     $ make benchmark
65 |
66 | "test" and "benchmark" are the only targets right now.
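As a companion to the code-byte approach described earlier, the following sketch shows how four character lengths could map to a code byte and then to one row of the decoding shuffle. `pack_code` and `shuffle_row` are hypothetical names used only here; the row construction follows the same loop as `decoder_permutation` in shuffle_table.c, and the packing (2 bits per character, first character in the low bits, storing length minus one) is inferred from the `extract` macro there rather than taken from the decoder itself.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four character lengths (each 1..4) into one code byte:
   2 bits per character, first character in the low bits, storing len-1. */
static uint8_t pack_code(const uint8_t len[4]) {
    uint8_t code = 0;
    for (int i = 0; i < 4; i++) {
        assert(len[i] >= 1 && len[i] <= 4);
        code |= (uint8_t)((len[i] - 1) << (2 * i));
    }
    return code;
}

/* Rebuild the 16-byte shuffle for one code byte. Each character gets a
   4-byte output slot; its source bytes are listed last-byte-first, and
   unused lanes are 0xFF (written as -1 in the table), which makes pshufb
   write a zero byte. */
static void shuffle_row(uint8_t code, uint8_t row[16]) {
    uint8_t next = 0;                        /* next unread input byte */
    for (int i = 0; i < 4; i++) {
        int c = (code >> (2 * i)) & 3;       /* length - 1 of character i */
        uint8_t *slot = row + 4 * i;
        for (int j = c; j >= 0; j--)
            slot[j] = next++;
        for (int j = c + 1; j < 4; j++)
            slot[j] = 0xFF;
    }
}
```

For a run such as "aㄱbㄴ" (lengths 1, 3, 1, 3) the code byte is 136, and the rebuilt row agrees with entry 136 of shuffle_table.h: `0,-1,-1,-1, 3,2,1,-1, 4,-1,-1,-1, 7,6,5,-1`.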
67 |


--------------------------------------------------------------------------------
/benchmarks/benchmark.c:
--------------------------------------------------------------------------------
1 | #include <assert.h>
2 | #include <stdbool.h>
3 | #include <stddef.h>
4 | #include <stdint.h>
5 | #include <stdio.h>
6 | #include <stdlib.h>
7 | #include <string.h>
8 |
9 | #include "benchmark.h"
10 | #include "utf8_code.h"
11 |
12 | #include <x86intrin.h>
13 | /*
14 |  * legal utf-8 byte sequence
15 |  * http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf - page 94
16 |  *
17 |  *  Code Points        1st      2s       3s       4s
18 |  * U+0000..U+007F     00..7F
19 |  * U+0080..U+07FF     C2..DF   80..BF
20 |  * U+0800..U+0FFF     E0       A0..BF   80..BF
21 |  * U+1000..U+CFFF     E1..EC   80..BF   80..BF
22 |  * U+D000..U+D7FF     ED       80..9F   80..BF
23 |  * U+E000..U+FFFF     EE..EF   80..BF   80..BF
24 |  * U+10000..U+3FFFF   F0       90..BF   80..BF   80..BF
25 |  * U+40000..U+FFFFF   F1..F3   80..BF   80..BF   80..BF
26 |  * U+100000..U+10FFFF F4       80..8F   80..BF   80..BF
27 |  *
28 |  */
29 |
30 | void populate(char *data, size_t N) {
31 |   for (size_t i = 0; i < N; i++)
32 |     data[i] = rand() & 0x7f; // random byte in the ASCII range 0..127
33 | }
34 |
35 | void populate3(char *data, size_t N) {
36 |   size_t i;
37 |   for (i = 0; i+3 < N; i+=3)
38 |     memcpy(data+i, "ㄱ", 3); // one 3-byte UTF-8 character
39 |
40 |   for(; i < N; i++)
41 |     data[i]='a'; // ASCII filler for the tail
42 | }
43 |
44 | void populate_r(char *data, size_t N) {
45 |   size_t i;
46 |   for(i=0; i+3 < N; ) {
47 |     if( rand() & 4096 ) { // roughly 50/50 choice between 3-byte and 1-byte
48 |       memcpy(data+i, "ㄱ", 3);
49 |       i+= 3;
50 |     } else {
51 |       data[i]='a';
52 |       i+=1;
53 |     }
54 |   }
55 |
56 |   for(; i < N; i++)
57 |     data[i]='z';
58 | }
59 |
60 |
61 | size_t decode(char *txt, int length, uint32_t *out)
62 | {
63 |   uint32_t *ptmp = out;
64 |   printf("N %d txt %.4s\n", length, txt + length -4); // %.4s: txt is not NUL-terminated
65 |   int l = utf32_decode((uint8_t *) txt, length, &ptmp);
66 |   printf("done\n%8x\n%8x\n%8x\n%8x\n", *(ptmp-4),*(ptmp-3),*(ptmp-2),*(ptmp-1));
67 |   printf("nout %td\n", ptmp-out); // pointer difference is a ptrdiff_t
68 |   return l;
69 | }
70 |
71 | void demo(size_t N) {
72 |   printf("string size = %zu \n", N);
73 |   char *data = (char *)malloc(N);
74 |   uint32_t *out = (uint32_t *)malloc(N*sizeof(uint32_t));
75 |
76 |   int expected = 0; // it is all ascii?
77 |   int repeat = 5;
78 |   printf("ascii\n");
79 |
80 |   BEST_TIME(decode(data, N, out), expected, populate(data, N), repeat, N,
81 |             true);
82 |
83 |   printf("length 3 utf-8, with filler ascii at end\n");
84 |   BEST_TIME(decode(data, N, out), expected, populate3(data, N), repeat, N,
85 |             true);
86 |
87 |   BEST_TIME(decode(data, N, out), expected, populate_r(data, N), repeat, N,
88 |             true);
89 | /*
90 | #ifdef __linux__
91 |   BEST_TIME_LINUX(validate_utf8_fast(data, N), expected, populate(data, N),
92 |                   repeat, N, true);
93 | #endif
94 | */
95 |   printf("\n\n");
96 |
97 |   free(data); free(out);
98 | }
99 |
100 | int main() {
101 |   demo(65536);
102 |   printf("We are feeding ascii so it is always going to be ok.\n");
103 | }
104 |


--------------------------------------------------------------------------------
/benchmarks/benchmark.h:
--------------------------------------------------------------------------------
1 | #ifndef _BENCHMARK_H_
2 | #define _BENCHMARK_H_
3 | #include "linux-perf-events.h"
4 | #include <stdint.h>
5 | #include <time.h>
6 | #ifdef __x86_64__
7 | const char *unitname = "cycles";
8 |
9 | #define RDTSC_START(cycles) \
10 |   do { \
11 |     uint32_t cyc_high, cyc_low; \
12 |     __asm volatile("cpuid\n" \
13 |                    "rdtsc\n" \
14 |                    "mov %%edx, %0\n" \
15 |                    "mov %%eax, %1" \
16 |                    : "=r"(cyc_high), "=r"(cyc_low) \
17 |                    : \
18 |                    : /* no read only */ \
19 |                    "%rax", "%rbx", "%rcx", "%rdx" /* clobbers */ \
20 |                    ); \
21 |     (cycles) = ((uint64_t)cyc_high << 32) | cyc_low; \
22 |   } while (0)
23 |
24 | #define RDTSC_STOP(cycles) \
25 |   do { \
26 |     uint32_t cyc_high, cyc_low; \
27 |     __asm volatile("rdtscp\n" \
28 |                    "mov %%edx, %0\n" \
29 |                    "mov %%eax, %1\n" \
30 |                    "cpuid" \
31 |                    : "=r"(cyc_high), "=r"(cyc_low) \
32 |                    : /* no read only registers */ \
33 |                    : "%rax", "%rbx", "%rcx", "%rdx" /* clobbers */ \
34 |                    ); \
35 |     (cycles) = ((uint64_t)cyc_high << 32) | cyc_low; \
36 |   } while (0)
37 |
38 | #else
39 | const char *unitname = " (clock units) ";
40 |
41 |
#define RDTSC_START(cycles) \ 42 | do { \ 43 | cycles = clock(); \ 44 | } while (0) 45 | 46 | #define RDTSC_STOP(cycles) \ 47 | do { \ 48 | cycles = clock(); \ 49 | } while (0) 50 | #endif 51 | 52 | static __attribute__((noinline)) uint64_t rdtsc_overhead_func(uint64_t dummy) { 53 | return dummy; 54 | } 55 | 56 | uint64_t global_rdtsc_overhead = (uint64_t)UINT64_MAX; 57 | 58 | #define RDTSC_SET_OVERHEAD(test, repeat) \ 59 | do { \ 60 | uint64_t cycles_start, cycles_final, cycles_diff; \ 61 | uint64_t min_diff = UINT64_MAX; \ 62 | for (int i = 0; i < repeat; i++) { \ 63 | __asm volatile("" ::: /* pretend to clobber */ "memory"); \ 64 | RDTSC_START(cycles_start); \ 65 | test; \ 66 | RDTSC_STOP(cycles_final); \ 67 | cycles_diff = (cycles_final - cycles_start); \ 68 | if (cycles_diff < min_diff) \ 69 | min_diff = cycles_diff; \ 70 | } \ 71 | global_rdtsc_overhead = min_diff; \ 72 | } while (0) 73 | 74 | /* 75 | * Prints the best number of operations per cycle where 76 | * test is the function call, answer is the expected answer generated by 77 | * test, repeat is the number of times we should repeat and size is the 78 | * number of operations represented by test. 
79 | */ 80 | #define BEST_TIME(test, expected, pre, repeat, size, verbose) \ 81 | do { \ 82 | if (global_rdtsc_overhead == UINT64_MAX) { \ 83 | RDTSC_SET_OVERHEAD(rdtsc_overhead_func(1), repeat); \ 84 | } \ 85 | if (verbose) \ 86 | printf("%-60s\t: ", #test); \ 87 | fflush(NULL); \ 88 | uint64_t cycles_start, cycles_final, cycles_diff; \ 89 | uint64_t min_diff = (uint64_t)-1; \ 90 | uint64_t sum_diff = 0; \ 91 | for (int i = 0; i < repeat; i++) { \ 92 | pre; \ 93 | __asm volatile("" ::: /* pretend to clobber */ "memory"); \ 94 | RDTSC_START(cycles_start); \ 95 | if (test != expected) { \ 96 | printf("not expected (%d , %d )", (int)test, (int)expected); \ 97 | break; \ 98 | } \ 99 | RDTSC_STOP(cycles_final); \ 100 | cycles_diff = (cycles_final - cycles_start - global_rdtsc_overhead); \ 101 | if (cycles_diff < min_diff) \ 102 | min_diff = cycles_diff; \ 103 | sum_diff += cycles_diff; \ 104 | } \ 105 | uint64_t S = size; \ 106 | float cycle_per_op = (min_diff) / (double)S; \ 107 | float avg_cycle_per_op = (sum_diff) / ((double)S * repeat); \ 108 | if (verbose) \ 109 | printf(" %.3f %s per operation (best) ", cycle_per_op, unitname); \ 110 | if (verbose) \ 111 | printf("\t%.3f %s per operation (avg) ", avg_cycle_per_op, unitname); \ 112 | if (verbose) \ 113 | printf("\n"); \ 114 | if (!verbose) \ 115 | printf(" %.3f ", cycle_per_op); \ 116 | fflush(NULL); \ 117 | } while (0) 118 | 119 | #ifdef __linux__ 120 | 121 | uint64_t global_rdtsc_overhead_linux = (uint64_t)UINT64_MAX; 122 | 123 | #define RDTSC_SET_OVERHEAD_LINUX(test, repeat) \ 124 | do { \ 125 | uint64_t cycles_diff; \ 126 | uint64_t min_diff = UINT64_MAX; \ 127 | struct LinuxEvents cycles; \ 128 | LinuxEvents_init(&cycles); \ 129 | for (int i = 0; i < repeat; i++) { \ 130 | __asm volatile("" ::: /* pretend to clobber */ "memory"); \ 131 | LinuxEvents_start(&cycles); \ 132 | test; \ 133 | cycles_diff = LinuxEvents_end(&cycles); \ 134 | if (cycles_diff < min_diff) \ 135 | min_diff = cycles_diff; \ 136 | } \ 137 
| global_rdtsc_overhead_linux = min_diff; \
138 |   } while (0)
139 |
140 | #define BEST_TIME_LINUX(test, expected, pre, repeat, size, verbose) \
141 |   do { \
142 |     if (global_rdtsc_overhead_linux == UINT64_MAX) { \
143 |       RDTSC_SET_OVERHEAD_LINUX(rdtsc_overhead_func(1), repeat); \
144 |     } \
145 |     if (verbose) \
146 |       printf("%-60s \t: ", #test); \
147 |     fflush(NULL); \
148 |     uint64_t cycles_diff; \
149 |     uint64_t min_diff = (uint64_t)-1; \
150 |     uint64_t sum_diff = 0; \
151 |     struct LinuxEvents cycles; \
152 |     LinuxEvents_init(&cycles); \
153 |     for (int i = 0; i < repeat; i++) { \
154 |       pre; \
155 |       __asm volatile("" ::: /* pretend to clobber */ "memory"); \
156 |       LinuxEvents_start(&cycles); \
157 |       if (test != expected) { \
158 |         printf("not expected (%d , %d )", (int)test, (int)expected); \
159 |         break; \
160 |       } \
161 |       \
162 |       cycles_diff = LinuxEvents_end(&cycles) - global_rdtsc_overhead_linux; \
163 |       if (cycles_diff < min_diff) \
164 |         min_diff = cycles_diff; \
165 |       sum_diff += cycles_diff; \
166 |     } \
167 |     LinuxEvents_close(&cycles); \
168 |     uint64_t S = size; \
169 |     float cycle_per_op = (min_diff) / (double)S; \
170 |     float avg_cycle_per_op = (sum_diff) / ((double)S * repeat); \
171 |     if (verbose) \
172 |       printf(" %.3f %s per operation (best) ", cycle_per_op, unitname); \
173 |     if (verbose) \
174 |       printf("\t%.3f %s per operation (avg) ", avg_cycle_per_op, unitname); \
175 |     if (verbose) \
176 |       printf(" (linux counter) \n"); \
177 |     if (!verbose) \
178 |       printf(" %.3f ", cycle_per_op); \
179 |     fflush(NULL); \
180 |   } while (0)
181 | #endif
182 |
183 | #endif
184 |


--------------------------------------------------------------------------------
/benchmarks/linux-perf-events.h:
--------------------------------------------------------------------------------
1 | // https://github.com/WojciechMula/toys/blob/master/000helpers/linux-perf-events.h
2 | #pragma once
3 | #ifdef __linux__
4 | #ifndef _GNU_SOURCE
5 | #define _GNU_SOURCE /* See feature_test_macros(7) */
6
| #endif 7 | #include 8 | #include /* For SYS_xxx definitions */ 9 | #include // for __NR_perf_event_open 10 | #include 11 | #include // for perf event constants 12 | #include // for ioctl 13 | #include // for syscall 14 | 15 | struct LinuxEvents { 16 | 17 | int fd; 18 | struct perf_event_attr attribs; 19 | }; 20 | 21 | void LinuxEvents_init(struct LinuxEvents *L) { 22 | L->fd = 0; 23 | memset(&L->attribs, 0, sizeof(L->attribs)); 24 | (L->attribs).type = PERF_TYPE_HARDWARE; 25 | (L->attribs).size = sizeof(L->attribs); 26 | (L->attribs).config = PERF_COUNT_HW_CPU_CYCLES; 27 | (L->attribs).disabled = 1; 28 | (L->attribs).exclude_kernel = 1; 29 | (L->attribs).exclude_hv = 1; 30 | 31 | const int pid = 0; // the current process 32 | const int cpu = -1; // all CPUs 33 | const int group = -1; // no group 34 | const unsigned long flags = 0; 35 | L->fd = syscall(__NR_perf_event_open, &(L->attribs), pid, cpu, group, flags); 36 | if (L->fd == -1) { 37 | printf("perf_event_open"); 38 | } 39 | } 40 | void LinuxEvents_close(struct LinuxEvents *L) { close(L->fd); } 41 | 42 | void LinuxEvents_start(struct LinuxEvents *L) { 43 | if (ioctl(L->fd, PERF_EVENT_IOC_RESET, 0) == -1) { 44 | printf("ioctl(PERF_EVENT_IOC_RESET)"); 45 | } 46 | 47 | if (ioctl(L->fd, PERF_EVENT_IOC_ENABLE, 0) == -1) { 48 | printf("ioctl(PERF_EVENT_IOC_ENABLE)"); 49 | } 50 | } 51 | 52 | unsigned long LinuxEvents_end(struct LinuxEvents *L) { 53 | if (ioctl(L->fd, PERF_EVENT_IOC_DISABLE, 0) == -1) { 54 | printf("ioctl(PERF_EVENT_IOC_DISABLE)"); 55 | } 56 | 57 | unsigned long result; 58 | if (read(L->fd, &result, sizeof(result)) == -1) { 59 | printf("read"); 60 | } 61 | 62 | return result; 63 | } 64 | 65 | #endif 66 | -------------------------------------------------------------------------------- /shuffle_table.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | //#include "streamvbyte_shuffle_tables_decode.h" 6 | 7 | static uint8_t 
shuffleTable[256][16]; 8 | 9 | #define extract(c,i) (3 & (c >> 2*i)) 10 | 11 | static void decoder_permutation(uint8_t *table) { 12 | uint8_t *p = table; 13 | for(int code = 0; code < 256; code++) { 14 | int byte = 0; 15 | for(int i = 0; i < 4; i++ ) { 16 | int c = extract(code, i); 17 | int j; 18 | for( j = c; j >= 0; j-- ) 19 | p[j] = byte++; 20 | for( j=c+1; j < 4; j++ ) 21 | p[j] = -1; 22 | p+=4; 23 | } 24 | } 25 | } 26 | 27 | void dump(char * xc) { 28 | 29 | for( int i=0; i<15; i++) 30 | printf("%2i,", xc[i]); 31 | printf("%2i", xc[15]); 32 | 33 | } 34 | 35 | int main() { 36 | decoder_permutation(&shuffleTable[0][0]); 37 | 38 | printf( "static uint8_t shuffleTable[256][16] = {\n{"); 39 | 40 | for(int i=0; i<256;i++) { 41 | dump((char *) &shuffleTable[i][0]); 42 | printf(i < 255 ? " },\n{" :" }\n};\n"); 43 | } 44 | 45 | } 46 | 47 | 48 | -------------------------------------------------------------------------------- /shuffle_table.h: -------------------------------------------------------------------------------- 1 | static uint8_t shuffleTable[256][16] = { 2 | { 0,-1,-1,-1, 1,-1,-1,-1, 2,-1,-1,-1, 3,-1,-1,-1 }, 3 | { 1, 0,-1,-1, 2,-1,-1,-1, 3,-1,-1,-1, 4,-1,-1,-1 }, 4 | { 2, 1, 0,-1, 3,-1,-1,-1, 4,-1,-1,-1, 5,-1,-1,-1 }, 5 | { 3, 2, 1, 0, 4,-1,-1,-1, 5,-1,-1,-1, 6,-1,-1,-1 }, 6 | { 0,-1,-1,-1, 2, 1,-1,-1, 3,-1,-1,-1, 4,-1,-1,-1 }, 7 | { 1, 0,-1,-1, 3, 2,-1,-1, 4,-1,-1,-1, 5,-1,-1,-1 }, 8 | { 2, 1, 0,-1, 4, 3,-1,-1, 5,-1,-1,-1, 6,-1,-1,-1 }, 9 | { 3, 2, 1, 0, 5, 4,-1,-1, 6,-1,-1,-1, 7,-1,-1,-1 }, 10 | { 0,-1,-1,-1, 3, 2, 1,-1, 4,-1,-1,-1, 5,-1,-1,-1 }, 11 | { 1, 0,-1,-1, 4, 3, 2,-1, 5,-1,-1,-1, 6,-1,-1,-1 }, 12 | { 2, 1, 0,-1, 5, 4, 3,-1, 6,-1,-1,-1, 7,-1,-1,-1 }, 13 | { 3, 2, 1, 0, 6, 5, 4,-1, 7,-1,-1,-1, 8,-1,-1,-1 }, 14 | { 0,-1,-1,-1, 4, 3, 2, 1, 5,-1,-1,-1, 6,-1,-1,-1 }, 15 | { 1, 0,-1,-1, 5, 4, 3, 2, 6,-1,-1,-1, 7,-1,-1,-1 }, 16 | { 2, 1, 0,-1, 6, 5, 4, 3, 7,-1,-1,-1, 8,-1,-1,-1 }, 17 | { 3, 2, 1, 0, 7, 6, 5, 4, 8,-1,-1,-1, 9,-1,-1,-1 }, 18 | { 
0,-1,-1,-1, 1,-1,-1,-1, 3, 2,-1,-1, 4,-1,-1,-1 }, 19 | { 1, 0,-1,-1, 2,-1,-1,-1, 4, 3,-1,-1, 5,-1,-1,-1 }, 20 | { 2, 1, 0,-1, 3,-1,-1,-1, 5, 4,-1,-1, 6,-1,-1,-1 }, 21 | { 3, 2, 1, 0, 4,-1,-1,-1, 6, 5,-1,-1, 7,-1,-1,-1 }, 22 | { 0,-1,-1,-1, 2, 1,-1,-1, 4, 3,-1,-1, 5,-1,-1,-1 }, 23 | { 1, 0,-1,-1, 3, 2,-1,-1, 5, 4,-1,-1, 6,-1,-1,-1 }, 24 | { 2, 1, 0,-1, 4, 3,-1,-1, 6, 5,-1,-1, 7,-1,-1,-1 }, 25 | { 3, 2, 1, 0, 5, 4,-1,-1, 7, 6,-1,-1, 8,-1,-1,-1 }, 26 | { 0,-1,-1,-1, 3, 2, 1,-1, 5, 4,-1,-1, 6,-1,-1,-1 }, 27 | { 1, 0,-1,-1, 4, 3, 2,-1, 6, 5,-1,-1, 7,-1,-1,-1 }, 28 | { 2, 1, 0,-1, 5, 4, 3,-1, 7, 6,-1,-1, 8,-1,-1,-1 }, 29 | { 3, 2, 1, 0, 6, 5, 4,-1, 8, 7,-1,-1, 9,-1,-1,-1 }, 30 | { 0,-1,-1,-1, 4, 3, 2, 1, 6, 5,-1,-1, 7,-1,-1,-1 }, 31 | { 1, 0,-1,-1, 5, 4, 3, 2, 7, 6,-1,-1, 8,-1,-1,-1 }, 32 | { 2, 1, 0,-1, 6, 5, 4, 3, 8, 7,-1,-1, 9,-1,-1,-1 }, 33 | { 3, 2, 1, 0, 7, 6, 5, 4, 9, 8,-1,-1,10,-1,-1,-1 }, 34 | { 0,-1,-1,-1, 1,-1,-1,-1, 4, 3, 2,-1, 5,-1,-1,-1 }, 35 | { 1, 0,-1,-1, 2,-1,-1,-1, 5, 4, 3,-1, 6,-1,-1,-1 }, 36 | { 2, 1, 0,-1, 3,-1,-1,-1, 6, 5, 4,-1, 7,-1,-1,-1 }, 37 | { 3, 2, 1, 0, 4,-1,-1,-1, 7, 6, 5,-1, 8,-1,-1,-1 }, 38 | { 0,-1,-1,-1, 2, 1,-1,-1, 5, 4, 3,-1, 6,-1,-1,-1 }, 39 | { 1, 0,-1,-1, 3, 2,-1,-1, 6, 5, 4,-1, 7,-1,-1,-1 }, 40 | { 2, 1, 0,-1, 4, 3,-1,-1, 7, 6, 5,-1, 8,-1,-1,-1 }, 41 | { 3, 2, 1, 0, 5, 4,-1,-1, 8, 7, 6,-1, 9,-1,-1,-1 }, 42 | { 0,-1,-1,-1, 3, 2, 1,-1, 6, 5, 4,-1, 7,-1,-1,-1 }, 43 | { 1, 0,-1,-1, 4, 3, 2,-1, 7, 6, 5,-1, 8,-1,-1,-1 }, 44 | { 2, 1, 0,-1, 5, 4, 3,-1, 8, 7, 6,-1, 9,-1,-1,-1 }, 45 | { 3, 2, 1, 0, 6, 5, 4,-1, 9, 8, 7,-1,10,-1,-1,-1 }, 46 | { 0,-1,-1,-1, 4, 3, 2, 1, 7, 6, 5,-1, 8,-1,-1,-1 }, 47 | { 1, 0,-1,-1, 5, 4, 3, 2, 8, 7, 6,-1, 9,-1,-1,-1 }, 48 | { 2, 1, 0,-1, 6, 5, 4, 3, 9, 8, 7,-1,10,-1,-1,-1 }, 49 | { 3, 2, 1, 0, 7, 6, 5, 4,10, 9, 8,-1,11,-1,-1,-1 }, 50 | { 0,-1,-1,-1, 1,-1,-1,-1, 5, 4, 3, 2, 6,-1,-1,-1 }, 51 | { 1, 0,-1,-1, 2,-1,-1,-1, 6, 5, 4, 3, 7,-1,-1,-1 }, 52 | { 2, 1, 0,-1, 3,-1,-1,-1, 7, 6, 5, 4, 8,-1,-1,-1 }, 53 | { 3, 
2, 1, 0, 4,-1,-1,-1, 8, 7, 6, 5, 9,-1,-1,-1 }, 54 | { 0,-1,-1,-1, 2, 1,-1,-1, 6, 5, 4, 3, 7,-1,-1,-1 }, 55 | { 1, 0,-1,-1, 3, 2,-1,-1, 7, 6, 5, 4, 8,-1,-1,-1 }, 56 | { 2, 1, 0,-1, 4, 3,-1,-1, 8, 7, 6, 5, 9,-1,-1,-1 }, 57 | { 3, 2, 1, 0, 5, 4,-1,-1, 9, 8, 7, 6,10,-1,-1,-1 }, 58 | { 0,-1,-1,-1, 3, 2, 1,-1, 7, 6, 5, 4, 8,-1,-1,-1 }, 59 | { 1, 0,-1,-1, 4, 3, 2,-1, 8, 7, 6, 5, 9,-1,-1,-1 }, 60 | { 2, 1, 0,-1, 5, 4, 3,-1, 9, 8, 7, 6,10,-1,-1,-1 }, 61 | { 3, 2, 1, 0, 6, 5, 4,-1,10, 9, 8, 7,11,-1,-1,-1 }, 62 | { 0,-1,-1,-1, 4, 3, 2, 1, 8, 7, 6, 5, 9,-1,-1,-1 }, 63 | { 1, 0,-1,-1, 5, 4, 3, 2, 9, 8, 7, 6,10,-1,-1,-1 }, 64 | { 2, 1, 0,-1, 6, 5, 4, 3,10, 9, 8, 7,11,-1,-1,-1 }, 65 | { 3, 2, 1, 0, 7, 6, 5, 4,11,10, 9, 8,12,-1,-1,-1 }, 66 | { 0,-1,-1,-1, 1,-1,-1,-1, 2,-1,-1,-1, 4, 3,-1,-1 }, 67 | { 1, 0,-1,-1, 2,-1,-1,-1, 3,-1,-1,-1, 5, 4,-1,-1 }, 68 | { 2, 1, 0,-1, 3,-1,-1,-1, 4,-1,-1,-1, 6, 5,-1,-1 }, 69 | { 3, 2, 1, 0, 4,-1,-1,-1, 5,-1,-1,-1, 7, 6,-1,-1 }, 70 | { 0,-1,-1,-1, 2, 1,-1,-1, 3,-1,-1,-1, 5, 4,-1,-1 }, 71 | { 1, 0,-1,-1, 3, 2,-1,-1, 4,-1,-1,-1, 6, 5,-1,-1 }, 72 | { 2, 1, 0,-1, 4, 3,-1,-1, 5,-1,-1,-1, 7, 6,-1,-1 }, 73 | { 3, 2, 1, 0, 5, 4,-1,-1, 6,-1,-1,-1, 8, 7,-1,-1 }, 74 | { 0,-1,-1,-1, 3, 2, 1,-1, 4,-1,-1,-1, 6, 5,-1,-1 }, 75 | { 1, 0,-1,-1, 4, 3, 2,-1, 5,-1,-1,-1, 7, 6,-1,-1 }, 76 | { 2, 1, 0,-1, 5, 4, 3,-1, 6,-1,-1,-1, 8, 7,-1,-1 }, 77 | { 3, 2, 1, 0, 6, 5, 4,-1, 7,-1,-1,-1, 9, 8,-1,-1 }, 78 | { 0,-1,-1,-1, 4, 3, 2, 1, 5,-1,-1,-1, 7, 6,-1,-1 }, 79 | { 1, 0,-1,-1, 5, 4, 3, 2, 6,-1,-1,-1, 8, 7,-1,-1 }, 80 | { 2, 1, 0,-1, 6, 5, 4, 3, 7,-1,-1,-1, 9, 8,-1,-1 }, 81 | { 3, 2, 1, 0, 7, 6, 5, 4, 8,-1,-1,-1,10, 9,-1,-1 }, 82 | { 0,-1,-1,-1, 1,-1,-1,-1, 3, 2,-1,-1, 5, 4,-1,-1 }, 83 | { 1, 0,-1,-1, 2,-1,-1,-1, 4, 3,-1,-1, 6, 5,-1,-1 }, 84 | { 2, 1, 0,-1, 3,-1,-1,-1, 5, 4,-1,-1, 7, 6,-1,-1 }, 85 | { 3, 2, 1, 0, 4,-1,-1,-1, 6, 5,-1,-1, 8, 7,-1,-1 }, 86 | { 0,-1,-1,-1, 2, 1,-1,-1, 4, 3,-1,-1, 6, 5,-1,-1 }, 87 | { 1, 0,-1,-1, 3, 2,-1,-1, 5, 4,-1,-1, 7, 6,-1,-1 }, 88 | { 2, 1, 
0,-1, 4, 3,-1,-1, 6, 5,-1,-1, 8, 7,-1,-1 }, 89 | { 3, 2, 1, 0, 5, 4,-1,-1, 7, 6,-1,-1, 9, 8,-1,-1 }, 90 | { 0,-1,-1,-1, 3, 2, 1,-1, 5, 4,-1,-1, 7, 6,-1,-1 }, 91 | { 1, 0,-1,-1, 4, 3, 2,-1, 6, 5,-1,-1, 8, 7,-1,-1 }, 92 | { 2, 1, 0,-1, 5, 4, 3,-1, 7, 6,-1,-1, 9, 8,-1,-1 }, 93 | { 3, 2, 1, 0, 6, 5, 4,-1, 8, 7,-1,-1,10, 9,-1,-1 }, 94 | { 0,-1,-1,-1, 4, 3, 2, 1, 6, 5,-1,-1, 8, 7,-1,-1 }, 95 | { 1, 0,-1,-1, 5, 4, 3, 2, 7, 6,-1,-1, 9, 8,-1,-1 }, 96 | { 2, 1, 0,-1, 6, 5, 4, 3, 8, 7,-1,-1,10, 9,-1,-1 }, 97 | { 3, 2, 1, 0, 7, 6, 5, 4, 9, 8,-1,-1,11,10,-1,-1 }, 98 | { 0,-1,-1,-1, 1,-1,-1,-1, 4, 3, 2,-1, 6, 5,-1,-1 }, 99 | { 1, 0,-1,-1, 2,-1,-1,-1, 5, 4, 3,-1, 7, 6,-1,-1 }, 100 | { 2, 1, 0,-1, 3,-1,-1,-1, 6, 5, 4,-1, 8, 7,-1,-1 }, 101 | { 3, 2, 1, 0, 4,-1,-1,-1, 7, 6, 5,-1, 9, 8,-1,-1 }, 102 | { 0,-1,-1,-1, 2, 1,-1,-1, 5, 4, 3,-1, 7, 6,-1,-1 }, 103 | { 1, 0,-1,-1, 3, 2,-1,-1, 6, 5, 4,-1, 8, 7,-1,-1 }, 104 | { 2, 1, 0,-1, 4, 3,-1,-1, 7, 6, 5,-1, 9, 8,-1,-1 }, 105 | { 3, 2, 1, 0, 5, 4,-1,-1, 8, 7, 6,-1,10, 9,-1,-1 }, 106 | { 0,-1,-1,-1, 3, 2, 1,-1, 6, 5, 4,-1, 8, 7,-1,-1 }, 107 | { 1, 0,-1,-1, 4, 3, 2,-1, 7, 6, 5,-1, 9, 8,-1,-1 }, 108 | { 2, 1, 0,-1, 5, 4, 3,-1, 8, 7, 6,-1,10, 9,-1,-1 }, 109 | { 3, 2, 1, 0, 6, 5, 4,-1, 9, 8, 7,-1,11,10,-1,-1 }, 110 | { 0,-1,-1,-1, 4, 3, 2, 1, 7, 6, 5,-1, 9, 8,-1,-1 }, 111 | { 1, 0,-1,-1, 5, 4, 3, 2, 8, 7, 6,-1,10, 9,-1,-1 }, 112 | { 2, 1, 0,-1, 6, 5, 4, 3, 9, 8, 7,-1,11,10,-1,-1 }, 113 | { 3, 2, 1, 0, 7, 6, 5, 4,10, 9, 8,-1,12,11,-1,-1 }, 114 | { 0,-1,-1,-1, 1,-1,-1,-1, 5, 4, 3, 2, 7, 6,-1,-1 }, 115 | { 1, 0,-1,-1, 2,-1,-1,-1, 6, 5, 4, 3, 8, 7,-1,-1 }, 116 | { 2, 1, 0,-1, 3,-1,-1,-1, 7, 6, 5, 4, 9, 8,-1,-1 }, 117 | { 3, 2, 1, 0, 4,-1,-1,-1, 8, 7, 6, 5,10, 9,-1,-1 }, 118 | { 0,-1,-1,-1, 2, 1,-1,-1, 6, 5, 4, 3, 8, 7,-1,-1 }, 119 | { 1, 0,-1,-1, 3, 2,-1,-1, 7, 6, 5, 4, 9, 8,-1,-1 }, 120 | { 2, 1, 0,-1, 4, 3,-1,-1, 8, 7, 6, 5,10, 9,-1,-1 }, 121 | { 3, 2, 1, 0, 5, 4,-1,-1, 9, 8, 7, 6,11,10,-1,-1 }, 122 | { 0,-1,-1,-1, 3, 2, 1,-1, 7, 6, 5, 4, 9, 
8,-1,-1 }, 123 | { 1, 0,-1,-1, 4, 3, 2,-1, 8, 7, 6, 5,10, 9,-1,-1 }, 124 | { 2, 1, 0,-1, 5, 4, 3,-1, 9, 8, 7, 6,11,10,-1,-1 }, 125 | { 3, 2, 1, 0, 6, 5, 4,-1,10, 9, 8, 7,12,11,-1,-1 }, 126 | { 0,-1,-1,-1, 4, 3, 2, 1, 8, 7, 6, 5,10, 9,-1,-1 }, 127 | { 1, 0,-1,-1, 5, 4, 3, 2, 9, 8, 7, 6,11,10,-1,-1 }, 128 | { 2, 1, 0,-1, 6, 5, 4, 3,10, 9, 8, 7,12,11,-1,-1 }, 129 | { 3, 2, 1, 0, 7, 6, 5, 4,11,10, 9, 8,13,12,-1,-1 }, 130 | { 0,-1,-1,-1, 1,-1,-1,-1, 2,-1,-1,-1, 5, 4, 3,-1 }, 131 | { 1, 0,-1,-1, 2,-1,-1,-1, 3,-1,-1,-1, 6, 5, 4,-1 }, 132 | { 2, 1, 0,-1, 3,-1,-1,-1, 4,-1,-1,-1, 7, 6, 5,-1 }, 133 | { 3, 2, 1, 0, 4,-1,-1,-1, 5,-1,-1,-1, 8, 7, 6,-1 }, 134 | { 0,-1,-1,-1, 2, 1,-1,-1, 3,-1,-1,-1, 6, 5, 4,-1 }, 135 | { 1, 0,-1,-1, 3, 2,-1,-1, 4,-1,-1,-1, 7, 6, 5,-1 }, 136 | { 2, 1, 0,-1, 4, 3,-1,-1, 5,-1,-1,-1, 8, 7, 6,-1 }, 137 | { 3, 2, 1, 0, 5, 4,-1,-1, 6,-1,-1,-1, 9, 8, 7,-1 }, 138 | { 0,-1,-1,-1, 3, 2, 1,-1, 4,-1,-1,-1, 7, 6, 5,-1 }, 139 | { 1, 0,-1,-1, 4, 3, 2,-1, 5,-1,-1,-1, 8, 7, 6,-1 }, 140 | { 2, 1, 0,-1, 5, 4, 3,-1, 6,-1,-1,-1, 9, 8, 7,-1 }, 141 | { 3, 2, 1, 0, 6, 5, 4,-1, 7,-1,-1,-1,10, 9, 8,-1 }, 142 | { 0,-1,-1,-1, 4, 3, 2, 1, 5,-1,-1,-1, 8, 7, 6,-1 }, 143 | { 1, 0,-1,-1, 5, 4, 3, 2, 6,-1,-1,-1, 9, 8, 7,-1 }, 144 | { 2, 1, 0,-1, 6, 5, 4, 3, 7,-1,-1,-1,10, 9, 8,-1 }, 145 | { 3, 2, 1, 0, 7, 6, 5, 4, 8,-1,-1,-1,11,10, 9,-1 }, 146 | { 0,-1,-1,-1, 1,-1,-1,-1, 3, 2,-1,-1, 6, 5, 4,-1 }, 147 | { 1, 0,-1,-1, 2,-1,-1,-1, 4, 3,-1,-1, 7, 6, 5,-1 }, 148 | { 2, 1, 0,-1, 3,-1,-1,-1, 5, 4,-1,-1, 8, 7, 6,-1 }, 149 | { 3, 2, 1, 0, 4,-1,-1,-1, 6, 5,-1,-1, 9, 8, 7,-1 }, 150 | { 0,-1,-1,-1, 2, 1,-1,-1, 4, 3,-1,-1, 7, 6, 5,-1 }, 151 | { 1, 0,-1,-1, 3, 2,-1,-1, 5, 4,-1,-1, 8, 7, 6,-1 }, 152 | { 2, 1, 0,-1, 4, 3,-1,-1, 6, 5,-1,-1, 9, 8, 7,-1 }, 153 | { 3, 2, 1, 0, 5, 4,-1,-1, 7, 6,-1,-1,10, 9, 8,-1 }, 154 | { 0,-1,-1,-1, 3, 2, 1,-1, 5, 4,-1,-1, 8, 7, 6,-1 }, 155 | { 1, 0,-1,-1, 4, 3, 2,-1, 6, 5,-1,-1, 9, 8, 7,-1 }, 156 | { 2, 1, 0,-1, 5, 4, 3,-1, 7, 6,-1,-1,10, 9, 8,-1 }, 157 | { 3, 2, 1, 
0, 6, 5, 4,-1, 8, 7,-1,-1,11,10, 9,-1 }, 158 | { 0,-1,-1,-1, 4, 3, 2, 1, 6, 5,-1,-1, 9, 8, 7,-1 }, 159 | { 1, 0,-1,-1, 5, 4, 3, 2, 7, 6,-1,-1,10, 9, 8,-1 }, 160 | { 2, 1, 0,-1, 6, 5, 4, 3, 8, 7,-1,-1,11,10, 9,-1 }, 161 | { 3, 2, 1, 0, 7, 6, 5, 4, 9, 8,-1,-1,12,11,10,-1 }, 162 | { 0,-1,-1,-1, 1,-1,-1,-1, 4, 3, 2,-1, 7, 6, 5,-1 }, 163 | { 1, 0,-1,-1, 2,-1,-1,-1, 5, 4, 3,-1, 8, 7, 6,-1 }, 164 | { 2, 1, 0,-1, 3,-1,-1,-1, 6, 5, 4,-1, 9, 8, 7,-1 }, 165 | { 3, 2, 1, 0, 4,-1,-1,-1, 7, 6, 5,-1,10, 9, 8,-1 }, 166 | { 0,-1,-1,-1, 2, 1,-1,-1, 5, 4, 3,-1, 8, 7, 6,-1 }, 167 | { 1, 0,-1,-1, 3, 2,-1,-1, 6, 5, 4,-1, 9, 8, 7,-1 }, 168 | { 2, 1, 0,-1, 4, 3,-1,-1, 7, 6, 5,-1,10, 9, 8,-1 }, 169 | { 3, 2, 1, 0, 5, 4,-1,-1, 8, 7, 6,-1,11,10, 9,-1 }, 170 | { 0,-1,-1,-1, 3, 2, 1,-1, 6, 5, 4,-1, 9, 8, 7,-1 }, 171 | { 1, 0,-1,-1, 4, 3, 2,-1, 7, 6, 5,-1,10, 9, 8,-1 }, 172 | { 2, 1, 0,-1, 5, 4, 3,-1, 8, 7, 6,-1,11,10, 9,-1 }, 173 | { 3, 2, 1, 0, 6, 5, 4,-1, 9, 8, 7,-1,12,11,10,-1 }, 174 | { 0,-1,-1,-1, 4, 3, 2, 1, 7, 6, 5,-1,10, 9, 8,-1 }, 175 | { 1, 0,-1,-1, 5, 4, 3, 2, 8, 7, 6,-1,11,10, 9,-1 }, 176 | { 2, 1, 0,-1, 6, 5, 4, 3, 9, 8, 7,-1,12,11,10,-1 }, 177 | { 3, 2, 1, 0, 7, 6, 5, 4,10, 9, 8,-1,13,12,11,-1 }, 178 | { 0,-1,-1,-1, 1,-1,-1,-1, 5, 4, 3, 2, 8, 7, 6,-1 }, 179 | { 1, 0,-1,-1, 2,-1,-1,-1, 6, 5, 4, 3, 9, 8, 7,-1 }, 180 | { 2, 1, 0,-1, 3,-1,-1,-1, 7, 6, 5, 4,10, 9, 8,-1 }, 181 | { 3, 2, 1, 0, 4,-1,-1,-1, 8, 7, 6, 5,11,10, 9,-1 }, 182 | { 0,-1,-1,-1, 2, 1,-1,-1, 6, 5, 4, 3, 9, 8, 7,-1 }, 183 | { 1, 0,-1,-1, 3, 2,-1,-1, 7, 6, 5, 4,10, 9, 8,-1 }, 184 | { 2, 1, 0,-1, 4, 3,-1,-1, 8, 7, 6, 5,11,10, 9,-1 }, 185 | { 3, 2, 1, 0, 5, 4,-1,-1, 9, 8, 7, 6,12,11,10,-1 }, 186 | { 0,-1,-1,-1, 3, 2, 1,-1, 7, 6, 5, 4,10, 9, 8,-1 }, 187 | { 1, 0,-1,-1, 4, 3, 2,-1, 8, 7, 6, 5,11,10, 9,-1 }, 188 | { 2, 1, 0,-1, 5, 4, 3,-1, 9, 8, 7, 6,12,11,10,-1 }, 189 | { 3, 2, 1, 0, 6, 5, 4,-1,10, 9, 8, 7,13,12,11,-1 }, 190 | { 0,-1,-1,-1, 4, 3, 2, 1, 8, 7, 6, 5,11,10, 9,-1 }, 191 | { 1, 0,-1,-1, 5, 4, 3, 2, 9, 8, 7, 
6,12,11,10,-1 }, 192 | { 2, 1, 0,-1, 6, 5, 4, 3,10, 9, 8, 7,13,12,11,-1 }, 193 | { 3, 2, 1, 0, 7, 6, 5, 4,11,10, 9, 8,14,13,12,-1 }, 194 | { 0,-1,-1,-1, 1,-1,-1,-1, 2,-1,-1,-1, 6, 5, 4, 3 }, 195 | { 1, 0,-1,-1, 2,-1,-1,-1, 3,-1,-1,-1, 7, 6, 5, 4 }, 196 | { 2, 1, 0,-1, 3,-1,-1,-1, 4,-1,-1,-1, 8, 7, 6, 5 }, 197 | { 3, 2, 1, 0, 4,-1,-1,-1, 5,-1,-1,-1, 9, 8, 7, 6 }, 198 | { 0,-1,-1,-1, 2, 1,-1,-1, 3,-1,-1,-1, 7, 6, 5, 4 }, 199 | { 1, 0,-1,-1, 3, 2,-1,-1, 4,-1,-1,-1, 8, 7, 6, 5 }, 200 | { 2, 1, 0,-1, 4, 3,-1,-1, 5,-1,-1,-1, 9, 8, 7, 6 }, 201 | { 3, 2, 1, 0, 5, 4,-1,-1, 6,-1,-1,-1,10, 9, 8, 7 }, 202 | { 0,-1,-1,-1, 3, 2, 1,-1, 4,-1,-1,-1, 8, 7, 6, 5 }, 203 | { 1, 0,-1,-1, 4, 3, 2,-1, 5,-1,-1,-1, 9, 8, 7, 6 }, 204 | { 2, 1, 0,-1, 5, 4, 3,-1, 6,-1,-1,-1,10, 9, 8, 7 }, 205 | { 3, 2, 1, 0, 6, 5, 4,-1, 7,-1,-1,-1,11,10, 9, 8 }, 206 | { 0,-1,-1,-1, 4, 3, 2, 1, 5,-1,-1,-1, 9, 8, 7, 6 }, 207 | { 1, 0,-1,-1, 5, 4, 3, 2, 6,-1,-1,-1,10, 9, 8, 7 }, 208 | { 2, 1, 0,-1, 6, 5, 4, 3, 7,-1,-1,-1,11,10, 9, 8 }, 209 | { 3, 2, 1, 0, 7, 6, 5, 4, 8,-1,-1,-1,12,11,10, 9 }, 210 | { 0,-1,-1,-1, 1,-1,-1,-1, 3, 2,-1,-1, 7, 6, 5, 4 }, 211 | { 1, 0,-1,-1, 2,-1,-1,-1, 4, 3,-1,-1, 8, 7, 6, 5 }, 212 | { 2, 1, 0,-1, 3,-1,-1,-1, 5, 4,-1,-1, 9, 8, 7, 6 }, 213 | { 3, 2, 1, 0, 4,-1,-1,-1, 6, 5,-1,-1,10, 9, 8, 7 }, 214 | { 0,-1,-1,-1, 2, 1,-1,-1, 4, 3,-1,-1, 8, 7, 6, 5 }, 215 | { 1, 0,-1,-1, 3, 2,-1,-1, 5, 4,-1,-1, 9, 8, 7, 6 }, 216 | { 2, 1, 0,-1, 4, 3,-1,-1, 6, 5,-1,-1,10, 9, 8, 7 }, 217 | { 3, 2, 1, 0, 5, 4,-1,-1, 7, 6,-1,-1,11,10, 9, 8 }, 218 | { 0,-1,-1,-1, 3, 2, 1,-1, 5, 4,-1,-1, 9, 8, 7, 6 }, 219 | { 1, 0,-1,-1, 4, 3, 2,-1, 6, 5,-1,-1,10, 9, 8, 7 }, 220 | { 2, 1, 0,-1, 5, 4, 3,-1, 7, 6,-1,-1,11,10, 9, 8 }, 221 | { 3, 2, 1, 0, 6, 5, 4,-1, 8, 7,-1,-1,12,11,10, 9 }, 222 | { 0,-1,-1,-1, 4, 3, 2, 1, 6, 5,-1,-1,10, 9, 8, 7 }, 223 | { 1, 0,-1,-1, 5, 4, 3, 2, 7, 6,-1,-1,11,10, 9, 8 }, 224 | { 2, 1, 0,-1, 6, 5, 4, 3, 8, 7,-1,-1,12,11,10, 9 }, 225 | { 3, 2, 1, 0, 7, 6, 5, 4, 9, 8,-1,-1,13,12,11,10 }, 226 | { 
0,-1,-1,-1, 1,-1,-1,-1, 4, 3, 2,-1, 8, 7, 6, 5 }, 227 | { 1, 0,-1,-1, 2,-1,-1,-1, 5, 4, 3,-1, 9, 8, 7, 6 }, 228 | { 2, 1, 0,-1, 3,-1,-1,-1, 6, 5, 4,-1,10, 9, 8, 7 }, 229 | { 3, 2, 1, 0, 4,-1,-1,-1, 7, 6, 5,-1,11,10, 9, 8 }, 230 | { 0,-1,-1,-1, 2, 1,-1,-1, 5, 4, 3,-1, 9, 8, 7, 6 }, 231 | { 1, 0,-1,-1, 3, 2,-1,-1, 6, 5, 4,-1,10, 9, 8, 7 }, 232 | { 2, 1, 0,-1, 4, 3,-1,-1, 7, 6, 5,-1,11,10, 9, 8 }, 233 | { 3, 2, 1, 0, 5, 4,-1,-1, 8, 7, 6,-1,12,11,10, 9 }, 234 | { 0,-1,-1,-1, 3, 2, 1,-1, 6, 5, 4,-1,10, 9, 8, 7 }, 235 | { 1, 0,-1,-1, 4, 3, 2,-1, 7, 6, 5,-1,11,10, 9, 8 }, 236 | { 2, 1, 0,-1, 5, 4, 3,-1, 8, 7, 6,-1,12,11,10, 9 }, 237 | { 3, 2, 1, 0, 6, 5, 4,-1, 9, 8, 7,-1,13,12,11,10 }, 238 | { 0,-1,-1,-1, 4, 3, 2, 1, 7, 6, 5,-1,11,10, 9, 8 }, 239 | { 1, 0,-1,-1, 5, 4, 3, 2, 8, 7, 6,-1,12,11,10, 9 }, 240 | { 2, 1, 0,-1, 6, 5, 4, 3, 9, 8, 7,-1,13,12,11,10 }, 241 | { 3, 2, 1, 0, 7, 6, 5, 4,10, 9, 8,-1,14,13,12,11 }, 242 | { 0,-1,-1,-1, 1,-1,-1,-1, 5, 4, 3, 2, 9, 8, 7, 6 }, 243 | { 1, 0,-1,-1, 2,-1,-1,-1, 6, 5, 4, 3,10, 9, 8, 7 }, 244 | { 2, 1, 0,-1, 3,-1,-1,-1, 7, 6, 5, 4,11,10, 9, 8 }, 245 | { 3, 2, 1, 0, 4,-1,-1,-1, 8, 7, 6, 5,12,11,10, 9 }, 246 | { 0,-1,-1,-1, 2, 1,-1,-1, 6, 5, 4, 3,10, 9, 8, 7 }, 247 | { 1, 0,-1,-1, 3, 2,-1,-1, 7, 6, 5, 4,11,10, 9, 8 }, 248 | { 2, 1, 0,-1, 4, 3,-1,-1, 8, 7, 6, 5,12,11,10, 9 }, 249 | { 3, 2, 1, 0, 5, 4,-1,-1, 9, 8, 7, 6,13,12,11,10 }, 250 | { 0,-1,-1,-1, 3, 2, 1,-1, 7, 6, 5, 4,11,10, 9, 8 }, 251 | { 1, 0,-1,-1, 4, 3, 2,-1, 8, 7, 6, 5,12,11,10, 9 }, 252 | { 2, 1, 0,-1, 5, 4, 3,-1, 9, 8, 7, 6,13,12,11,10 }, 253 | { 3, 2, 1, 0, 6, 5, 4,-1,10, 9, 8, 7,14,13,12,11 }, 254 | { 0,-1,-1,-1, 4, 3, 2, 1, 8, 7, 6, 5,12,11,10, 9 }, 255 | { 1, 0,-1,-1, 5, 4, 3, 2, 9, 8, 7, 6,13,12,11,10 }, 256 | { 2, 1, 0,-1, 6, 5, 4, 3,10, 9, 8, 7,14,13,12,11 }, 257 | { 3, 2, 1, 0, 7, 6, 5, 4,11,10, 9, 8,15,14,13,12 } 258 | }; 259 | -------------------------------------------------------------------------------- /test.c: 
--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>


#include "utf8_code.h"

// Print cnt code points in hex on one line.
void dump32(uint32_t *c, size_t cnt) {
    for(size_t i = 0; i < cnt; i++ )
        printf("%8x,", c[i]);
    printf("\n");
}

int main() {

    // reference: https://unicode-table.com/en/3139/
    // 0x61, 0x3131, 0x62, 0x3134, 0x3137, 0x3139, ...
    uint8_t utf8[] = "aㄱbㄴcdefㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ";

    uint32_t out[1024];

    int nbytes = 0;
    uint32_t *pout = out;
    nbytes = utf32_code_ptr(utf8, &pout);
    printf("nbytes %d\n", nbytes);
    dump32(out, 8);


    uint8_t ascii[] = "abcdefghijklmnop";
    pout = out;
    nbytes = utf32_code_ptr(ascii, &pout);
    printf("nbytes %d\n", nbytes);
    dump32(out, 5);

    pout = out;
    uint8_t longtxt[] = "abcdefghijklmnopjㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎz";
    uint8_t *plong = longtxt;
    utf32_decode(plong, sizeof(longtxt), &pout);

    dump32(out, pout-out);
}

--------------------------------------------------------------------------------
/utf8_code.h:
--------------------------------------------------------------------------------
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <immintrin.h>

#include "shuffle_table.h"
#include "utils.h"

// Payload-bit masks, looked up by the upper nibble of each UTF-8 byte:
// 0x0-0x7 ASCII, 0x8-0xB continuation, 0xC-0xD 2-byte lead,
// 0xE 3-byte lead, 0xF 4-byte lead.
#define mask _mm_setr_epi8(0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,\
                           0x3F,0x3F,0x3F,0x3F,\
                           0x1F,0x1F,\
                           0x0F, 0x07)



// Character length minus one, looked up by the upper nibble of each byte;
// continuation bytes map to -1.
#define len _mm_setr_epi8(0,0,0,0,0,0,0,0,-1,-1,-1,-1,1,1,2,3)
#define iota1 _mm_setr_epi8(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)

// shared routines

// Combine the masked payload bytes of each 32-bit lane (least-significant
// byte first) into one code point: b0 + (b1 << 6) + (b2 << 12) + (b3 << 18).
static inline __m128i compress( __m128i bitfields ) {
    const __m128i pack16 = _mm_maddubs_epi16(bitfields, _mm_set1_epi32(0x40014001));
    return _mm_madd_epi16(pack16, _mm_set1_epi32(0x10000001));
}

// Upper nibble of every byte.
static inline __m128i make_upper4(__m128i utf8) {return _mm_and_si128(_mm_srli_epi64(utf8, 4), _mm_set1_epi8(0x0F));}

// Shift x left by i bytes (0 <= i <= 16), filling with zeros.
static inline __m128i shift( __m128i x, int i ) {
    static const char ix[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
    return _mm_shuffle_epi8(x, _mm_loadu_si128((const __m128i *)(ix + i)));
}

// Decode characters from the 16 bytes at p, four at a time, writing four
// uint32 values to *pout per step. Returns the number of input bytes consumed.
static inline int utf32_code_ptr(void *p, uint32_t **pout) {
    __m128i utf8 = _mm_loadu_si128((const __m128i *)p);
    __m128i upper4 = make_upper4(utf8);
    __m128i lengths = _mm_shuffle_epi8(len, upper4);

    // P[i] is the offset of the character following the one that starts at i;
    // P2 and P4 skip two and four characters respectively.
    __m128i P = _mm_adds_epu8(lengths, iota1);
    __m128i P2 = _mm_shuffle_epi8(P,P);
    __m128i P4 = _mm_shuffle_epi8(P2,P2);

    char jmp[16];
    _mm_storeu_si128((__m128i *) jmp, P4);

    // iterate through jmp
    int i, multibyte = 0;
    __m128i masked;
    uint8_t bcode[16];

    for(i = 0; i < 16 && jmp[i] > i; i = jmp[i])
    {
        int bytes = jmp[i] - i;
        __m128i mout;
        if( bytes == 4 ) {
            // four characters in four bytes: all single-byte, just widen
            mout = _mm_cvtepu8_epi32(shift(utf8, i));
        } else {
            if( !multibyte ) {
                // code byte: four 2-bit (length - 1) fields, for the character
                // at each position and its three successors
                __m128i code12 = _mm_or_si128(lengths,_mm_slli_epi64(_mm_shuffle_epi8(lengths, P), 2));
                __m128i code = _mm_or_si128(code12,_mm_slli_epi64(_mm_shuffle_epi8(code12, P2), 4));
                _mm_storeu_si128( (__m128i *)bcode, code );
                masked = _mm_and_si128(_mm_shuffle_epi8(mask,upper4), utf8);
                multibyte = 1;
            }

            // unaligned load: the table rows are not guaranteed to be 16-byte aligned
            __m128i Shuf = _mm_loadu_si128((const __m128i *) &shuffleTable[bcode[i]][0]);
            __m128i bitfields = _mm_shuffle_epi8(shift(masked, i), Shuf);
            mout = compress(bitfields);
        }
        _mm_storeu_si128((__m128i *)*pout, mout);
        *pout += 4;
    }
    return i;
}

// Decode length bytes at txt into *pout, advancing *pout past the output.
// Returns the number of input bytes left undecoded (0 when all were consumed).
static inline size_t utf32_decode(uint8_t *txt, size_t length, uint32_t **pout)
{
    int n = 0;
    uint8_t *p = txt;
    size_t consumed;
    for(consumed = 0;
        consumed+16 <= length && ((n = utf32_code_ptr(p+consumed, pout)));
        consumed += n)
        ;

    int l = length - consumed;
    p += consumed;

    while(l > 0) // 1-2 iterations to consume end
    {
        uint8_t inbuf[16];
        uint32_t outbuf[16];
        uint32_t *pbuf = outbuf;
        int chunk = l < 16 ? l : 16; // never copy more than inbuf holds
        memset(inbuf, 0, 16);
        memcpy(inbuf, p, chunk);
        int nused = utf32_code_ptr(inbuf, &pbuf);
        if( nused <= 0 )
            break; // no progress (e.g. malformed input): stop instead of looping forever
        int nout = pbuf-outbuf;
        int nextra = 0;
        if( nused >= l ) {
            nextra = nused - l; // one byte per extra
            l = 0;
        } else {
            nextra = 0;
            p += nused;
            l -= nused;
        }

        for(int i = 0; i < nout - nextra; i++) {
            **pout = outbuf[i];
            (*pout)++;
        }
    }
    return l;
}

--------------------------------------------------------------------------------
/utils.h:
--------------------------------------------------------------------------------
// Print the 16 bytes of x as signed decimals, prefixed with tag.
void dump(const __m128i x, char * tag) {
    printf( "%6s: ", tag);
    char * xc = (char *) &x;
    for( int i =0; i < 16; i++)
        printf("%2i,", xc[i]);

    printf("\n");
}


// Print the 16 bytes of x in hex, prefixed with tag.
void dumpx(const __m128i x, char * tag) {
    printf( "%6s: ", tag);
    unsigned char * xc = (unsigned char *) &x;
    for( int i =0; i < 16; i++)
        printf("%2x,", xc[i]);

    printf("\n");
}

--------------------------------------------------------------------------------