├── .github └── workflows │ └── test.yml ├── .gitignore ├── README.md ├── include ├── strcasecmpeq_sse2.h ├── strcmp_x64.h ├── strcmpeq_sse2.h ├── strcmpeq_sse4.h ├── strcmpeq_x64.h ├── strlen_sse2.h └── strlen_sse4.h ├── src ├── strcasecmpeq_sse2.asm ├── strcmp_x64.asm ├── strcmpeq_sse2.asm ├── strcmpeq_sse4.asm ├── strcmpeq_x64.asm ├── strlen_sse2.asm └── strlen_sse4.asm └── test ├── Makefile ├── sse_level.c ├── sse_level.h ├── test_strcasecmpeq.c ├── test_strcmp.c ├── test_strcmpeq.c └── test_strlen.c /.github/workflows/test.yml: -------------------------------------------------------------------------------- 1 | name: Test 2 | 3 | on: [push] 4 | 5 | jobs: 6 | Test: 7 | runs-on: ubuntu-latest 8 | strategy: 9 | matrix: 10 | cc: [gcc, clang] 11 | steps: 12 | - name: Checkout 13 | uses: actions/checkout@v3 14 | - name: Install dependencies 15 | run: | 16 | sudo apt update 17 | sudo apt install nasm 18 | - name: Run tests 19 | env: 20 | CC: ${{matrix.cc}} 21 | working-directory: test 22 | run: make 23 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | test/test_strcasecmpeq 3 | test/test_strcmp 4 | test/test_strcmpeq 5 | test/test_strlen 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SSE string functions for x86-64 Linux 2 | 3 | [![Test](https://github.com/aklomp/sse-strings/actions/workflows/test.yml/badge.svg)](https://github.com/aklomp/sse-strings/actions/workflows/test.yml) 4 | 5 | November 5, 2014 6 | 7 | The story of this page is that I was [nerd sniped](https://xkcd.com/356/). 
8 | During the course of optimizing a `strcasecmp()` bottleneck at work, I 9 | discovered the 10 | [assembly language versions of basic C string routines](http://www.strchr.com/strcmp_and_strlen_using_sse_4.2) 11 | at strchr.com, and they tickled me to get back into assembly language. 12 | Something about a problem just difficult enough to be interesting, and just 13 | easy enough to be tractable as a side project. 14 | 15 | I programmed in x86 assembly language as a teenager, but that was on 16 | a 486 that ran 16-bit DOS (with segments and offsets), with only a 16-bit 17 | assembler available to me, where I had to splice in my own 32-bit opcodes with 18 | `db` directives. Still, I have many fond memories of small moments of magic. 19 | Like finally getting my screen13 scrolling routines to run properly inside of 20 | my PowerBasic demo programs. I later went on to do some assembly work on the 21 | Z80 processor of my TI-83 calculator (a weird platform), and even did some 22 | Win32 assembly with MASM, but then my interest faded. I found assembly language 23 | to make for fantastic rainy afternoon puzzles, but lacking in power to make 24 | actual complex, useful things. 25 | 26 | Now that the '90s are over and open-source tooling is easily available, and we 27 | have fast 64-bit multicore CPUs that run everything in a single clock cycle, 28 | playing around with assembly is a lot more fun. Let's get to work! I present on 29 | this page a selection of small string routines written for 64-bit Linux 30 | platforms. I doubt that these are competitive in speed with what common 31 | libraries use (they unroll their loops, are concerned with things like data 32 | dependencies and cache performance, and actually benchmark their code), but 33 | it's been a fun puzzle. 34 | 35 | All routines shown are written for [NASM](http://www.nasm.us) in Intel assembly 36 | syntax (the One True Form) and assume 64-bit (System V AMD64 ABI) calling 37 | conventions. 
These are so much nicer than the 32-bit calling conventions. The 38 | first two arguments (that's all we need here) are passed in `rdi` and `rsi`, 39 | return values go in `rax`, and the only registers the callee needs to save are 40 | `rbp`, `rbx` and `r12–r15` 41 | ([source](https://en.wikipedia.org/wiki/X86_calling_conventions)). The routines 42 | use SSE vector instructions where applicable, and consist of 43 | position-independent code, so that they can be included in shared libraries. 44 | Because this is the bright new x64 universe for which NASM has good defaults, 45 | we don't need to set up segments, but can get away with declaring our functions 46 | and our data `global`. 47 | 48 | ## Assembling and linking 49 | 50 | Let's say you have `strlen.asm`. Assemble it into an object file named 51 | `strlen.o` like so: 52 | 53 | ```sh 54 | nasm -f elf64 strlen.asm 55 | ``` 56 | 57 | Include the appropriate header file in your C program so the compiler knows how 58 | to call the function. In the linker stage, simply add the object file to the 59 | list of object files on the compiler's command line to link it into the 60 | final binary. 61 | 62 | ## Caveat 63 | 64 | The SSE functions are all unaware of how long a string is, and will read 16 bytes 65 | of memory at a time until a zero byte is found. This means that they can read 66 | up to 15 bytes past the terminating zero byte, which could cause a memory access 67 | violation if the read crosses into an unmapped page. 68 | 69 | All these functions are only lightly tested. In production code, use the 70 | functions provided by your regular libc; they'll probably be faster and 71 | battle-hardened. This is a hobby project, no guarantees given or implied, etc. 72 | 73 | ## License 74 | 75 | Because code must have a license, I declare these functions to be licensed 76 | under the [2-clause BSD](http://opensource.org/licenses/BSD-2-Clause) license. 
77 | 78 | ## strlen\_sse2 79 | 80 | Determine the length of a zero-terminated string using only SSE2-level 81 | instructions. This code looks a lot like the code at strchr.com, but that's 82 | because this really is the most condensed, simplified way to write it. When 83 | you're working at this level, there is not much opportunity for stylistic 84 | flourish. 85 | 86 | ## strlen\_sse4 87 | 88 | Determine the length of a zero-terminated string using the SSE4.2 `pcmpistri` 89 | string instruction. Thanks to Lord Sméagol for suggesting the improvement of 90 | using `lea` to perform an addition without touching the flags. 91 | 92 | ## strcmp\_x64 93 | 94 | Compare zero-terminated string `a` against string `b`. Returns zero if the 95 | strings are equal, less than zero if `a < b`, and greater than zero if `a > b`. 96 | 97 | Actually, this function makes stronger guarantees about its return value, so 98 | that it is identical (and thus comparable) to glibc's `strcmp`. It will return 99 | the *signed difference* between the first differing bytes of `a` and 100 | `b`, computed as `a - b` on unsigned byte values. For example, if `a` is `0xFF` and `b` is `0`, it returns `255`. 101 | 102 | ## strcmpeq\_x64 103 | 104 | Comparing strings for equality is one of the most popular uses for `strcmp`, 105 | but is slightly inefficient because of the extra arithmetic performed on the 106 | differing bytes. If we're only interested in equality, we can take some 107 | shortcuts. The family of `strcmpeq` functions is a nonstandard variation of 108 | `strcmp` that only checks whether two strings are equal or not, and returns `0` 109 | or `1` (which can be typed to a compiler as a C99 `bool`). 110 | 111 | ## strcmpeq\_sse2 112 | 113 | Return `1` if zero-terminated strings `a` and `b` are equal, `0` if not. Uses 114 | SSE2-level vector instructions to do the least amount of work per byte. Bails 115 | out at the first sign that the strings must differ. 
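
For readers who find intrinsics easier to follow than assembly, here is a rough C sketch of the same idea: load 16 bytes from each string, build one bitmask of zero bytes and one of differing bytes, and stop as soon as either settles the answer. This is not the code in `src/strcmpeq_sse2.asm`. The function name `strcmpeq_sse2_sketch` is made up for illustration, and the tail handling is a simplified variant that masks off everything after the first terminator instead of comparing `bsf` positions as the assembly does.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical C rendering of the strcmpeq_sse2 idea (illustration only).
 * Like the assembly, it reads 16 bytes at a time and may read past the
 * terminator, so the caller must ensure that memory is readable. */
int
strcmpeq_sse2_sketch (const char *a, const char *b)
{
	const __m128i zero = _mm_setzero_si128();

	for (size_t i = 0; ; i += 16) {
		__m128i va = _mm_loadu_si128((const __m128i *)(a + i));
		__m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

		/* One bit per byte: set where that byte is 0x00. */
		unsigned na = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, zero));
		unsigned nb = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(vb, zero));

		/* One bit per byte: set where a and b differ. */
		unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
		unsigned diff = ~eq & 0xFFFFu;

		unsigned nul = na | nb;
		if (nul == 0) {
			/* Fast path: no terminator in this block. */
			if (diff != 0)
				return 0;
			continue;
		}

		/* Tail: only bytes up to and including the first terminator
		 * count; anything after it is garbage. */
		unsigned first = nul & (0u - nul);     /* lowest set bit */
		unsigned upto = first | (first - 1u);  /* bits 0..first */
		return (diff & upto) == 0;
	}
}
```

When trying this out, give the strings some slack (for example `char a[32] = "hello";`) so that the 16-byte loads stay inside memory you own.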
116 | 117 | ## strcmpeq\_sse4 118 | 119 | Compare two strings for equality using SSE4.2's `pcmpistri` instruction, which 120 | was virtually made for the job. It handles zero-termination and sets all the 121 | right flags for us. Compared to the SSE2 code above, this is extremely compact. 122 | 123 | ## strcasecmpeq\_sse2 124 | 125 | Compare zero-terminated strings `s1` and `s2` in case-insensitive fashion. 126 | Return `1` if they are equal and `0` if they are not. The case-insensitive part 127 | is a 1970s-style hack: it does no actual locale-dependent or language-aware 128 | conversion; all characters in the `A–Z` range are simply arithmetically 129 | transposed to characters in the `a–z` range. It's fast if you just care about 130 | comparing plain ASCII strings. 131 | -------------------------------------------------------------------------------- /include/strcasecmpeq_sse2.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | extern int strcasecmpeq_sse2 (const char *s1, const char *s2); 4 | -------------------------------------------------------------------------------- /include/strcmp_x64.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | extern int strcmp_x64 (const char *s1, const char *s2); 4 | -------------------------------------------------------------------------------- /include/strcmpeq_sse2.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | extern int strcmpeq_sse2 (const char *s1, const char *s2); 4 | -------------------------------------------------------------------------------- /include/strcmpeq_sse4.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | extern int strcmpeq_sse4 (const char *s1, const char *s2); 4 | -------------------------------------------------------------------------------- /include/strcmpeq_x64.h: 
-------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | extern int strcmpeq_x64 (const char *s1, const char *s2); 4 | -------------------------------------------------------------------------------- /include/strlen_sse2.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | 5 | extern size_t strlen_sse2 (const char *s); 6 | -------------------------------------------------------------------------------- /include/strlen_sse4.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | 5 | extern size_t strlen_sse4 (const char *s); 6 | -------------------------------------------------------------------------------- /src/strcasecmpeq_sse2.asm: -------------------------------------------------------------------------------- 1 | global aldiff:data 16 2 | global uppera:data 16 3 | global upperz:data 16 4 | global strcasecmpeq_sse2:function 5 | 6 | align 16 7 | aldiff: times 16 db 'a' - 'A' ; difference between lower and uppercase 8 | uppera: times 16 db 'A' - 1 ; we test for "greater than"; off by one 9 | upperz: times 16 db 'Z' 10 | 11 | %macro lcase 1 12 | movdqa xmm6, %1 ; xmm6 := xxx 13 | movdqa xmm7, %1 ; xmm7 := xxx 14 | pcmpgtb xmm6, xmm3 ; xmm6 := (xxx >= 'A') 15 | pcmpgtb xmm7, xmm4 ; xmm7 := (xxx > 'Z') 16 | pandn xmm7, xmm6 ; xmm7 := (xxx >= 'A' && !(xxx > 'Z')) 17 | pand xmm7, xmm5 ; xmm7 &= aldiff 18 | paddusb %1, xmm7 ; xxx += xmm7 19 | %endmacro 20 | 21 | strcasecmpeq_sse2: 22 | 23 | xor rdx, rdx 24 | pxor xmm0, xmm0 25 | 26 | movdqa xmm3, [rel uppera] 27 | movdqa xmm4, [rel upperz] 28 | movdqa xmm5, [rel aldiff] 29 | 30 | .loop: movdqu xmm1, [rdi + rdx] ; load strings 31 | movdqu xmm2, [rsi + rdx] 32 | 33 | movdqa xmm6, xmm1 ; copy strings 34 | movdqa xmm7, xmm2 35 | 36 | pcmpeqb xmm6, xmm0 ; check copies for 0x00 37 | pcmpeqb xmm7, xmm0 38 | 39 | pmovmskb eax, xmm6 ; fetch 
bitmasks 40 | pmovmskb ecx, xmm7 41 | 42 | ; If both masks are zero, neither strings contain a null byte and we 43 | ; take a fast track where we just check the vectors for equality: 44 | 45 | test ax, cx 46 | jnz .tail 47 | 48 | lcase xmm1 ; lowercase the strings 49 | lcase xmm2 50 | 51 | pcmpeqb xmm1, xmm2 ; compare strings directly 52 | pmovmskb eax, xmm1 ; fetch bitmask 53 | 54 | not ax ; if the inverse of the bitmask is not zero, 55 | test ax, ax ; the strings differ in at least one place 56 | jnz .dif 57 | 58 | add rdx, 16 ; match so far; go in for the next loop round 59 | jmp .loop 60 | 61 | .tail: ; If one mask is zero, one string contains a null byte but the other 62 | ; doesn't; strings are unequal. Must test for this case because bsf is 63 | ; not defined when run on a zero input: 64 | 65 | test ax, ax 66 | jz .dif 67 | 68 | test cx, cx 69 | jz .dif 70 | 71 | ; Neither mask is zero, so both vectors contain a null byte somewhere; 72 | ; take a slow path where we compensate for the null bytes: 73 | 74 | bsf ax, ax ; find bit positions of first null bytes 75 | bsf cx, cx 76 | 77 | cmp ax, cx ; if the nulls do not start at the same byte, 78 | jne .dif ; the strings have different lengths; exit 79 | 80 | lcase xmm1 ; lowercase the strings 81 | lcase xmm2 82 | 83 | pcmpeqb xmm1, xmm2 ; compare strings 84 | pmovmskb eax, xmm1 ; fetch bitmask 85 | 86 | ; If the inverse of the bitmask is zero, full match. 
87 | ; Must again do this check because bsf is not defined for zero input: 88 | 89 | not ax ; does not affect the flags 90 | test ax, ax ; test for zero 91 | jz .eql 92 | 93 | bsf ax, ax ; get first nonmatching position 94 | cmp ax, cx ; if this is less than where the nulls start, 95 | jl .dif ; the strings differ 96 | 97 | ; The strings are equal up to the null byte; fallthrough: 98 | 99 | .eql: mov ax, 1 ; upper bits are known to be zero 100 | ret 101 | 102 | .dif: xor rax, rax 103 | ret 104 | -------------------------------------------------------------------------------- /src/strcmp_x64.asm: -------------------------------------------------------------------------------- 1 | global strcmp_x64:function 2 | 3 | strcmp_x64: 4 | 5 | xor eax, eax 6 | xor ecx, ecx 7 | xor rdx, rdx 8 | 9 | .loop: mov cl, [rsi + rdx] ; fetch bytes 10 | mov al, [rdi + rdx] 11 | 12 | mov r8b, cl ; move byte to temporary 13 | or r8b, al ; if both bytes are null, 14 | jz .ret ; strings are equal 15 | 16 | sub ax, cx ; subtract bytes, 17 | jne .ext ; result from -255 to 255 18 | 19 | inc rdx ; ax was found to be zero, 20 | jmp .loop ; go for next round 21 | 22 | .ext: cwde ; sign-extend ax to eax 23 | cdqe ; sign-extend eax to rax 24 | 25 | .ret: ret 26 | -------------------------------------------------------------------------------- /src/strcmpeq_sse2.asm: -------------------------------------------------------------------------------- 1 | global strcmpeq_sse2:function 2 | 3 | strcmpeq_sse2: 4 | 5 | xor rdx, rdx 6 | pxor xmm0, xmm0 7 | 8 | .loop: movdqu xmm1, [rdi + rdx] ; load strings 9 | movdqu xmm2, [rsi + rdx] 10 | 11 | movdqa xmm6, xmm1 ; copy strings 12 | movdqa xmm7, xmm2 13 | 14 | pcmpeqb xmm6, xmm0 ; check copies for 0x00 15 | pcmpeqb xmm7, xmm0 16 | 17 | pmovmskb eax, xmm6 ; fetch bitmasks 18 | pmovmskb ecx, xmm7 19 | 20 | ; If both masks are zero, neither strings contain a null byte and we 21 | ; take a fast track where we just check the vectors for equality: 22 | 23 | test ax, 
cx 24 | jnz .tail 25 | 26 | pcmpeqb xmm1, xmm2 ; compare strings directly 27 | pmovmskb eax, xmm1 ; fetch bitmask 28 | 29 | not ax ; if the inverse of the bitmask is not zero, 30 | test ax, ax ; the strings differ in at least one place 31 | jnz .dif 32 | 33 | add rdx, 16 ; match so far; go in for the next loop round 34 | jmp .loop 35 | 36 | .tail: ; If one mask is zero, one string contains a null byte but the other 37 | ; doesn't; strings are unequal. Must test for this case because bsf is 38 | ; not defined when run on a zero input: 39 | 40 | test ax, ax 41 | jz .dif 42 | 43 | test cx, cx 44 | jz .dif 45 | 46 | ; Neither mask is zero, so both vectors contain a null byte somewhere; 47 | ; take a slow path where we compensate for the null bytes: 48 | 49 | bsf ax, ax ; find bit positions of first null bytes 50 | bsf cx, cx 51 | 52 | cmp ax, cx ; if the nulls do not start at the same byte, 53 | jne .dif ; the strings have different lengths; exit 54 | 55 | pcmpeqb xmm1, xmm2 ; compare strings 56 | pmovmskb eax, xmm1 ; fetch bitmask 57 | 58 | ; If the inverse of the bitmask is zero, full match. 
59 | ; Must again do this check because bsf is not defined for zero input: 60 | 61 | not ax ; does not affect the flags 62 | test ax, ax ; test for zero 63 | jz .eql 64 | 65 | bsf ax, ax ; get first nonmatching position 66 | cmp ax, cx ; if this is less than where the nulls start, 67 | jl .dif ; the strings differ 68 | 69 | ; The strings are equal up to the null byte; fallthrough: 70 | 71 | .eql: mov ax, 1 ; upper bits are known to be zero 72 | ret 73 | 74 | .dif: xor rax, rax 75 | ret 76 | -------------------------------------------------------------------------------- /src/strcmpeq_sse4.asm: -------------------------------------------------------------------------------- 1 | global strcmpeq_sse4:function 2 | 3 | strcmpeq_sse4: 4 | 5 | xor rax, rax 6 | xor rdx, rdx 7 | 8 | .loop: movdqu xmm1, [rdi + rdx] 9 | pcmpistri xmm1, [rsi + rdx], 0x18 ; EQUAL_EACH | NEGATIVE_POLARITY 10 | jc .dif 11 | jz .eql 12 | add rdx, 16 13 | jmp .loop 14 | 15 | .eql: inc eax 16 | .dif: ret 17 | -------------------------------------------------------------------------------- /src/strcmpeq_x64.asm: -------------------------------------------------------------------------------- 1 | global strcmpeq_x64:function 2 | 3 | strcmpeq_x64: 4 | 5 | xor rax, rax 6 | xor rdx, rdx 7 | 8 | .loop: mov cl, [rsi + rdx] ; fetch bytes 9 | mov ch, [rdi + rdx] 10 | 11 | test cx, cx ; if both bytes are null, 12 | jz .eql ; strings are equal 13 | 14 | cmp cl, ch ; compare bytes, 15 | jne .dif ; quit if not equal 16 | 17 | inc rdx ; else go for next round 18 | jmp .loop 19 | 20 | .eql: inc eax 21 | .dif: ret 22 | -------------------------------------------------------------------------------- /src/strlen_sse2.asm: -------------------------------------------------------------------------------- 1 | global strlen_sse2:function 2 | 3 | strlen_sse2: 4 | 5 | xor eax, eax ; zero the string offset 6 | pxor xmm0, xmm0 ; zero the comparison register 7 | 8 | .loop: movdqu xmm1, [rdi + rax] ; unaligned string read 9 | 
pcmpeqb xmm1, xmm0 ; compare string against zeroes 10 | pmovmskb ecx, xmm1 ; create bitmask from result 11 | lea eax, [eax + 16] ; inc offset without touching flags 12 | test ecx, ecx ; set flags based on bitmask 13 | jz .loop ; no bits set means no zeroes found 14 | 15 | bsf ecx, ecx ; find position of first set bit 16 | lea rax, [rax + rcx - 16] ; 64-bit add position to offset 17 | ret 18 | -------------------------------------------------------------------------------- /src/strlen_sse4.asm: -------------------------------------------------------------------------------- 1 | global strlen_sse4:function 2 | 3 | strlen_sse4: 4 | 5 | xor eax, eax 6 | pxor xmm0, xmm0 7 | 8 | .loop: pcmpistri xmm0, [rdi + rax], 0x08 ; EQUAL_EACH 9 | lea rax, [rax + 16] ; inc offset without touching flags 10 | jnz .loop ; branch based on pcmpistri's flags 11 | 12 | lea rax, [rax + rcx - 16] ; subtract spurious final increment 13 | ret 14 | -------------------------------------------------------------------------------- /test/Makefile: -------------------------------------------------------------------------------- 1 | CFLAGS += -std=c99 -Wall -Wextra -Werror -pedantic 2 | 3 | PROGS = \ 4 | test_strlen \ 5 | test_strcmp \ 6 | test_strcmpeq \ 7 | test_strcasecmpeq 8 | 9 | STRLEN_OBJS := test_strlen.o strlen_sse2.o strlen_sse4.o sse_level.o 10 | STRCMP_OBJS := test_strcmp.o strcmp_x64.o 11 | STRCMPEQ_OBJS := test_strcmpeq.o strcmpeq_x64.o strcmpeq_sse2.o strcmpeq_sse4.o sse_level.o 12 | STRCASECMPEQ_OBJS := test_strcasecmpeq.o strcasecmpeq_sse2.o sse_level.o 13 | 14 | # Default to silent mode, run 'make V=1' for a verbose build. 15 | ifneq ($(V),1) 16 | Q := @ 17 | MAKEFLAGS += --no-print-directory 18 | endif 19 | 20 | .PHONY: all test clean 21 | 22 | # The default target rebuilds all tests and runs them. 23 | test: clean all 24 | $(Q)for i in $(PROGS); do printf " TEST $$i\n"; ./$$i || exit; done 25 | 26 | all: $(PROGS) 27 | 28 | # Set per-program object file dependencies. 
29 | test_strlen: $(STRLEN_OBJS) 30 | test_strcmp: $(STRCMP_OBJS) 31 | test_strcmpeq: $(STRCMPEQ_OBJS) 32 | test_strcasecmpeq: $(STRCASECMPEQ_OBJS) 33 | 34 | $(PROGS): 35 | @printf " LD $@\n" 36 | $(Q)$(CC) -o $@ $^ 37 | 38 | %.o: %.c 39 | @printf " CC $@\n" 40 | $(Q)$(CC) $(CFLAGS) -o $@ -c $^ 41 | 42 | str%.o: ../src/str%.asm 43 | @printf " NASM $@\n" 44 | $(Q)nasm -f elf64 -o $@ $^ 45 | 46 | clean: 47 | @printf " CLEAN\n" 48 | $(Q)$(RM) $(STRLEN_OBJS) $(STRCMP_OBJS) $(STRCMPEQ_OBJS) $(STRCASECMPEQ_OBJS) $(PROGS) 49 | -------------------------------------------------------------------------------- /test/sse_level.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | bool 5 | have_sse2 (void) 6 | { 7 | // Return true if SSE2 is supported, false if not. 8 | unsigned int eax, ebx, ecx, edx = 0; 9 | 10 | (void)__get_cpuid(1, &eax, &ebx, &ecx, &edx); 11 | return (edx & bit_SSE2); 12 | } 13 | 14 | bool 15 | have_sse42 (void) 16 | { 17 | // Return true if SSE4.2 is supported, false if not. 
18 | unsigned int eax, ebx, ecx = 0, edx; 19 | 20 | (void)__get_cpuid(1, &eax, &ebx, &ecx, &edx); 21 | 22 | // Clang and GCC use different names for this constant 23 | // (bit_SSE4_2 vs bit_SSE42), so just use the raw value: 24 | return (ecx & 0x00100000); 25 | } 26 | -------------------------------------------------------------------------------- /test/sse_level.h: -------------------------------------------------------------------------------- 1 | bool have_sse2 (void); 2 | bool have_sse42 (void); 3 | -------------------------------------------------------------------------------- /test/test_strcasecmpeq.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "sse_level.h" 7 | #include "../include/strcasecmpeq_sse2.h" 8 | 9 | static int 10 | strcasecmpeq_ref (const char *a, const char *b) 11 | { 12 | // Our reference function: 13 | return (strcasecmp(a, b) == 0); 14 | } 15 | 16 | static int 17 | test_strcasecmpeq (const char *name, int (*mystrcasecmpeq)(const char *a, const char *b)) 18 | { 19 | int exp, got; 20 | char b1[50], b2[50]; 21 | 22 | // Generate 50 * 50 strings: 23 | for (size_t i = 0; i < sizeof (b1); i++) { 24 | for (size_t k = 0; k < i; k++) { 25 | b1[k] = 'A' + i; 26 | } 27 | b1[i] = '\0'; 28 | for (size_t j = 0; j < sizeof (b2); j++) { 29 | for (size_t k = 0; k < j; k++) { 30 | b2[k] = 'a' + j; 31 | } 32 | b2[j] = '\0'; 33 | 34 | if ((exp = strcasecmpeq_ref(b1, b2)) == (got = mystrcasecmpeq(b1, b2))) { 35 | continue; 36 | } 37 | printf("FAIL: %s: '%s' '%s': expected %d, got %d\n", name, b1, b2, exp, got); 38 | return 1; 39 | } 40 | } 41 | return 0; 42 | } 43 | 44 | static int 45 | test_sse2() 46 | { 47 | if (have_sse2()) { 48 | return test_strcasecmpeq("strcasecmpeq_sse2", strcasecmpeq_sse2); 49 | } 50 | puts("WARN: not testing SSE2 routines"); 51 | return 0; 52 | } 53 | 54 | int 55 | main () 56 | { 57 | int ret = 0; 58 | 59 | ret |= test_sse2(); 60 | 
61 | return ret; 62 | } 63 | -------------------------------------------------------------------------------- /test/test_strcmp.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "../include/strcmp_x64.h" 5 | 6 | static int 7 | test_strcmp (const char *name, int (*mystrcmp)(const char *a, const char *b)) 8 | { 9 | int exp, got; 10 | char b1[50], b2[50]; 11 | 12 | // Generate 50 * 50 strings: 13 | for (size_t i = 0; i < sizeof (b1); i++) { 14 | for (size_t k = 0; k < i; k++) { 15 | b1[k] = 'A' + i; 16 | } 17 | b1[i] = '\0'; 18 | for (size_t j = 0; j < sizeof (b2); j++) { 19 | for (size_t k = 0; k < j; k++) { 20 | b2[k] = 'a' + j; 21 | } 22 | b2[j] = '\0'; 23 | 24 | if ((exp = strcmp(b1, b2)) == (got = mystrcmp(b1, b2))) { 25 | continue; 26 | } 27 | printf("FAIL: %s: '%s' '%s': expected %d, got %d\n", name, b1, b2, exp, got); 28 | return 1; 29 | } 30 | } 31 | return 0; 32 | } 33 | 34 | int 35 | main () 36 | { 37 | int ret = 0; 38 | 39 | ret |= test_strcmp("strcmp_x64", strcmp_x64); 40 | 41 | return ret; 42 | } 43 | -------------------------------------------------------------------------------- /test/test_strcmpeq.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include "sse_level.h" 6 | #include "../include/strcmpeq_x64.h" 7 | #include "../include/strcmpeq_sse2.h" 8 | #include "../include/strcmpeq_sse4.h" 9 | 10 | static int 11 | strcmpeq_ref (const char *a, const char *b) 12 | { 13 | // Our reference function: 14 | return (strcmp(a, b) == 0); 15 | } 16 | 17 | static int 18 | test_strcmpeq (const char *name, int (*mystrcmpeq)(const char *a, const char *b)) 19 | { 20 | int exp, got; 21 | char b1[50], b2[50]; 22 | 23 | // Generate 50 * 50 strings: 24 | for (size_t i = 0; i < sizeof (b1); i++) { 25 | for (size_t k = 0; k < i; k++) { 26 | b1[k] = 'A' + i; 27 | } 28 | b1[i] = '\0'; 29 | for (size_t j = 0; j < 
sizeof (b2); j++) { 30 | for (size_t k = 0; k < j; k++) { 31 | b2[k] = 'A' + j; 32 | } 33 | b2[j] = '\0'; 34 | 35 | if ((exp = strcmpeq_ref(b1, b2)) == (got = mystrcmpeq(b1, b2))) { 36 | continue; 37 | } 38 | printf("FAIL: %s: '%s' '%s': expected %d, got %d\n", name, b1, b2, exp, got); 39 | return 1; 40 | } 41 | } 42 | return 0; 43 | } 44 | 45 | static int 46 | test_x64 (void) 47 | { 48 | return test_strcmpeq("strcmpeq_x64", strcmpeq_x64); 49 | } 50 | 51 | static int 52 | test_sse2 (void) 53 | { 54 | if (have_sse2()) { 55 | return test_strcmpeq("strcmpeq_sse2", strcmpeq_sse2); 56 | } 57 | puts("WARN: not testing SSE2 routines"); 58 | return 0; 59 | } 60 | 61 | static int 62 | test_sse4 (void) 63 | { 64 | if (have_sse42()) { 65 | return test_strcmpeq("strcmpeq_sse4", strcmpeq_sse4); 66 | } 67 | puts("WARN: not testing SSE4.2 routines"); 68 | return 0; 69 | } 70 | 71 | int 72 | main () 73 | { 74 | int ret = 0; 75 | 76 | ret |= test_x64(); 77 | ret |= test_sse2(); 78 | ret |= test_sse4(); 79 | 80 | return ret; 81 | } 82 | -------------------------------------------------------------------------------- /test/test_strlen.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "sse_level.h" 5 | #include "../include/strlen_sse2.h" 6 | #include "../include/strlen_sse4.h" 7 | 8 | static int 9 | test_strlen (const char *name, size_t (*mystrlen)(const char *s)) 10 | { 11 | size_t got; 12 | char buf[50]; 13 | 14 | // Test strings of various known length: 15 | for (size_t i = 0; i < sizeof (buf); i++) { 16 | for (size_t j = 0; j < i; j++) { 17 | buf[j] = 'A'; 18 | } 19 | buf[i] = '\0'; 20 | 21 | if ((got = mystrlen(buf)) != i) { 22 | printf("FAIL: %s: '%s': expected %zu, got %zu\n", name, buf, i, got); 23 | return 1; 24 | } 25 | } 26 | return 0; 27 | } 28 | 29 | static int 30 | test_sse2 (void) 31 | { 32 | if (have_sse2()) { 33 | return test_strlen("strlen_sse2", strlen_sse2); 34 | } 35 | puts("WARN: not 
testing SSE2 routines"); 36 | return 0; 37 | } 38 | 39 | static int 40 | test_sse4 (void) 41 | { 42 | if (have_sse42()) { 43 | return test_strlen("strlen_sse4", strlen_sse4); 44 | } 45 | puts("WARN: not testing SSE4.2 routines"); 46 | return 0; 47 | } 48 | 49 | int 50 | main () 51 | { 52 | int ret = 0; 53 | 54 | ret |= test_sse2(); 55 | ret |= test_sse4(); 56 | 57 | return ret; 58 | } 59 | --------------------------------------------------------------------------------