├── .gitignore ├── .github ├── FUNDING.yml └── workflows │ └── main.yml ├── lib ├── terminal │ └── terminal.zig ├── string │ ├── utils │ │ ├── memory │ │ │ ├── memory.zig │ │ │ └── memory.test.zig │ │ ├── grapheme │ │ │ ├── grapheme.zig │ │ │ └── grapheme.test.zig │ │ ├── codepoint │ │ │ ├── codepoint.test.zig │ │ │ └── codepoint.zig │ │ ├── utf8 │ │ │ ├── utf8.test.zig │ │ │ └── utf8.zig │ │ └── ascii │ │ │ ├── ascii.zig │ │ │ └── ascii.test.zig │ └── string.zig └── io.zig ├── LICENSE ├── README.md └── docs ├── index.md └── string └── utils ├── utf8.md ├── ascii.md └── codepoint.md /.gitignore: -------------------------------------------------------------------------------- 1 | .vscode 2 | .zig-cache 3 | zig-out -------------------------------------------------------------------------------- /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | github: [Super-ZIG, maysara-elshewehy] 4 | ko_fi: codeguild 5 | -------------------------------------------------------------------------------- /lib/terminal/terminal.zig: -------------------------------------------------------------------------------- 1 | // terminal.zig — Terminal handling module for I/O library. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/terminal 5 | // author : https://github.com/maysara-elshewehy 6 | // 7 | // Developed with ❤️ by Maysara. 
8 | 9 | 10 | 11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 12 | 13 | 14 | 15 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 16 | 17 | 18 | 19 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 20 | 21 | test { 22 | } 23 | 24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/memory/memory.zig: -------------------------------------------------------------------------------- 1 | // Copyright (c) 2025 Maysara, All rights reserved. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/memory 5 | // 6 | // owner : https://github.com/maysara-elshewehy 7 | // email : maysara.elshewehy@gmail.com 8 | // 9 | // Made with ❤️ by Maysara 10 | 11 | 12 | 13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 14 | 15 | 16 | 17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 18 | 19 | 20 | 21 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗ 22 | 23 | 24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/grapheme/grapheme.zig: -------------------------------------------------------------------------------- 1 | // Copyright (c) 2025 Maysara, All rights reserved. 
2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/grapheme 5 | // 6 | // owner : https://github.com/maysara-elshewehy 7 | // email : maysara.elshewehy@gmail.com 8 | // 9 | // Made with ❤️ by Maysara 10 | 11 | 12 | 13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 14 | 15 | 16 | 17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 18 | 19 | 20 | 21 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗ 22 | 23 | 24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/memory/memory.test.zig: -------------------------------------------------------------------------------- 1 | // Copyright (c) 2025 Maysara, All rights reserved. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/memory 5 | // 6 | // owner : https://github.com/maysara-elshewehy 7 | // email : maysara.elshewehy@gmail.com 8 | // 9 | // Made with ❤️ by Maysara 10 | 11 | 12 | 13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 14 | 15 | 16 | 17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 18 | 19 | 20 | 21 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 22 | 23 | 24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/grapheme/grapheme.test.zig: -------------------------------------------------------------------------------- 1 | // Copyright (c) 2025 Maysara, All rights reserved. 
2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/grapheme 5 | // 6 | // owner : https://github.com/maysara-elshewehy 7 | // email : maysara.elshewehy@gmail.com 8 | // 9 | // Made with ❤️ by Maysara 10 | 11 | 12 | 13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 14 | 15 | 16 | 17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 18 | 19 | 20 | 21 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 22 | 23 | 24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /.github/workflows/main.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | 3 | on: 4 | push: 5 | branches: [main] 6 | pull_request: 7 | branches: [main] 8 | workflow_dispatch: 9 | 10 | permissions: 11 | contents: read 12 | 13 | jobs: 14 | build: 15 | runs-on: ubuntu-latest 16 | steps: 17 | - uses: actions/checkout@v4 18 | 19 | - name: Download Zig binary 20 | run: wget https://ziglang.org/download/0.14.0/zig-linux-x86_64-0.14.0.tar.xz 21 | 22 | - name: Extract Zig 23 | run: tar -xf zig-linux-x86_64-0.14.0.tar.xz 24 | 25 | - name: Add Zig to PATH 26 | run: echo "ZIG=$PWD/zig-linux-x86_64-0.14.0/zig" >> $GITHUB_ENV 27 | 28 | - name: Verify version 29 | run: $ZIG version 30 | 31 | - name: Run tests 32 | run: $ZIG build test 33 | -------------------------------------------------------------------------------- /lib/io.zig: -------------------------------------------------------------------------------- 1 | // io.zig — Central entry point for I/O library. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io 5 | // author : https://github.com/maysara-elshewehy 6 | // 7 | // Developed with ❤️ by Maysara. 
8 | 9 | 10 | 11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 12 | 13 | /// Provides utilities for string manipulation and operations. 14 | pub const string = @import("./string/string.zig"); 15 | 16 | /// Provides utilities for terminal input/output operations. 17 | pub const terminal = @import("./terminal/terminal.zig"); 18 | 19 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 20 | 21 | 22 | 23 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 24 | 25 | test { 26 | _ = @import("./string/string.zig"); 27 | _ = @import("./terminal/terminal.zig"); 28 | } 29 | 30 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 Maysara 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /lib/string/string.zig: -------------------------------------------------------------------------------- 1 | // string.zig — String handling module for I/O library. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string 5 | // author : https://github.com/maysara-elshewehy 6 | // 7 | // Developed with ❤️ by Maysara. 8 | 9 | 10 | 11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 12 | 13 | /// A utility module for efficient string manipulation, providing various tools 14 | /// for handling ASCII, UTF-8, codepoints, graphemes, and memory operations. 
15 | pub const utils = .{ 16 | .ascii = @import("./utils/ascii/ascii.zig"), 17 | .utf8 = @import("./utils/utf8/utf8.zig"), 18 | .codepoint = @import("./utils/codepoint/codepoint.zig"), 19 | .grapheme = @import("./utils/grapheme/grapheme.zig"), 20 | .memory = @import("./utils/memory/memory.zig"), 21 | }; 22 | 23 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 24 | 25 | 26 | 27 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 28 | 29 | test { 30 | // utils 31 | _ = @import("./utils/ascii/ascii.test.zig"); 32 | _ = @import("./utils/utf8/utf8.test.zig"); 33 | _ = @import("./utils/codepoint/codepoint.test.zig"); 34 | _ = @import("./utils/grapheme/grapheme.test.zig"); 35 | _ = @import("./utils/memory/memory.test.zig"); 36 | } 37 | 38 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/codepoint/codepoint.test.zig: -------------------------------------------------------------------------------- 1 | // codepoint.test.zig ! 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/utils/codepoint 5 | // author : https://github.com/maysara-elshewehy 6 | // 7 | // Developed with ❤️ by Maysara. 
8 | 9 | 10 | 11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 12 | 13 | const Codepoint = @import("./codepoint.zig"); 14 | const testing = @import("std").testing; 15 | 16 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 17 | 18 | 19 | 20 | // ╔══════════════════════════════════════ INIT ══════════════════════════════════════╗ 21 | 22 | const SAMPLE = "A©€😀"; 23 | const RESULT = [_]struct { utf8: []const u8, cp: u21 } { 24 | .{ .utf8 = "A", .cp = 0x00041 }, 25 | .{ .utf8 = "©", .cp = 0x000A9 }, 26 | .{ .utf8 = "€", .cp = 0x020AC }, 27 | .{ .utf8 = "😀", .cp = 0x1F600 }, 28 | }; 29 | 30 | const INVALID_CP = 0x110000; 31 | const INVALID_UTF8 = "\xFF"; 32 | 33 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 34 | 35 | 36 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 37 | 38 | test "Codepoint" { 39 | for(0..RESULT.len) |i| { 40 | // init (using codepoint) 41 | if(Codepoint.init(RESULT[i].cp)) |cp| { 42 | try testing.expectEqual(RESULT[i].utf8.len, cp.len); 43 | try testing.expectEqual(RESULT[i].cp, cp.src); 44 | } else return error.CodepointInitError; 45 | 46 | // fromUtf8 (using UTF-8 slice) 47 | if(Codepoint.fromUtf8(RESULT[i].utf8)) |cp| { 48 | try testing.expectEqual(RESULT[i].utf8.len, cp.len); 49 | try testing.expectEqual(RESULT[i].cp, cp.src); 50 | } else return error.CodepointFromUtf8Error; 51 | } 52 | 53 | // must fail 54 | if(Codepoint.init(INVALID_CP)) |_| return error.CodepointInitMustFail; 55 | if(Codepoint.fromUtf8(INVALID_UTF8)) |_| return error.CodepointFromUtf8MustFail; 56 | } 57 | 58 | test "Utf8Iterator" { 59 | // must fail 60 | if(Codepoint.Utf8Iterator.init(INVALID_UTF8)) |_| return error.Utf8IteratorInitMustFail; 61 | var iterator = Codepoint.Utf8Iterator.init(SAMPLE) orelse return error.Utf8IteratorInitError; 62 | 63 | for(0..RESULT.len) |i| { 64 | 65 | // codepoint 66 | 
if(iterator.nextCodepoint()) |cp| { 67 | try testing.expectEqual(RESULT[i].utf8.len, cp.len); 68 | try testing.expectEqual(RESULT[i].cp, cp.src); 69 | 70 | // reset 71 | iterator.pos -= cp.len; 72 | } else return error.NextCodepointError; 73 | 74 | // utf8 slice 75 | if(iterator.nextSlice()) |slice| { 76 | try testing.expectEqual(RESULT[i].utf8.len, slice.len); 77 | try testing.expectEqualStrings(RESULT[i].utf8, slice); 78 | 79 | // reset 80 | iterator.pos -= slice.len; 81 | } else return error.NextSliceError; 82 | 83 | // length 84 | if(iterator.nextLength()) |len| { 85 | try testing.expectEqual(RESULT[i].utf8.len, len); 86 | } else return error.NextLengthError; 87 | } 88 | 89 | // position 90 | try testing.expectEqual(SAMPLE.len, iterator.pos); 91 | } 92 | 93 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/utf8/utf8.test.zig: -------------------------------------------------------------------------------- 1 | // Copyright (c) 2025 Maysara, All rights reserved. 
2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/utf8 5 | // 6 | // owner : https://github.com/maysara-elshewehy 7 | // email : maysara.elshewehy@gmail.com 8 | // 9 | // Made with ❤️ by Maysara 10 | 11 | 12 | 13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 14 | 15 | const std = @import("std"); 16 | const testing = std.testing; 17 | const utf8 = @import("./utf8.zig"); 18 | 19 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 20 | 21 | 22 | 23 | // ╔══════════════════════════════════════ INIT ══════════════════════════════════════╗ 24 | 25 | const tests = [_]struct { slice: []const u8, codepoint: u21, } { 26 | .{ .slice = "A", .codepoint = 0x00041 }, 27 | .{ .slice = "©", .codepoint = 0x000A9 }, 28 | .{ .slice = "€", .codepoint = 0x020AC }, 29 | .{ .slice = "😀", .codepoint = 0x1F600 }, 30 | }; 31 | 32 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 33 | 34 | 35 | 36 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 37 | 38 | // ┌────────────────────────── Conversion ────────────────────────┐ 39 | 40 | test "utf8.encode" { 41 | var buf: [4]u8 = undefined; 42 | 43 | for (tests) |t| { 44 | const len = utf8.encode(t.codepoint, &buf); 45 | try testing.expectEqual(t.slice.len, len); 46 | for (0..len) |i| 47 | try testing.expectEqual(t.slice[i], buf[i]); 48 | } 49 | } 50 | 51 | test "utf8.decode" { 52 | for (tests) |t| { 53 | const cp = utf8.decode(t.slice); 54 | try testing.expectEqual(t.codepoint, cp); 55 | try testing.expectEqual(t.slice.len, utf8.getCodepointLength(cp)); 56 | } 57 | } 58 | 59 | // └──────────────────────────────────────────────────────────────┘ 60 | 61 | 62 | // ┌────────────────────────── Properties ────────────────────────┐ 63 | 64 | test "utf8.getCodepointLength" { 65 | try testing.expectEqual(@as(u3, 1), 
utf8.getCodepointLength('A')); 66 | try testing.expectEqual(@as(u3, 2), utf8.getCodepointLength(0x00A9)); 67 | try testing.expectEqual(@as(u3, 3), utf8.getCodepointLength(0x20AC)); 68 | try testing.expectEqual(@as(u3, 4), utf8.getCodepointLength(0x1F600)); 69 | try testing.expectEqual(null, utf8.getCodepointLengthOrNull(0x110000)); 70 | } 71 | 72 | test "utf8.getSequenceLength" { 73 | try testing.expectEqual(@as(u3, 1), utf8.getSequenceLength('A')); 74 | try testing.expectEqual(@as(u3, 2), utf8.getSequenceLength(0xC2)); 75 | try testing.expectEqual(@as(u3, 3), utf8.getSequenceLength(0xE2)); 76 | try testing.expectEqual(@as(u3, 4), utf8.getSequenceLength(0xF0)); 77 | try testing.expectEqual(null, utf8.getSequenceLengthOrNull(0xF8)); 78 | } 79 | 80 | test "utf8.isValidSlice" { 81 | // Valid UTF-8 sequences 82 | try testing.expect(utf8.isValidSlice("")); 83 | try testing.expect(utf8.isValidSlice("Hello")); 84 | try testing.expect(utf8.isValidSlice("Hello 世界")); 85 | try testing.expect(utf8.isValidSlice("🌍🌎🌏")); 86 | 87 | // Invalid UTF-8 sequences 88 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xFF})); 89 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xC0, 0x80})); 90 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xE0, 0x80})); 91 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xF0, 0x80, 0x80})); 92 | } 93 | 94 | test "utf8.isValidCodepoint" { 95 | // Valid codepoints 96 | try testing.expect(utf8.isValidCodepoint('A')); 97 | try testing.expect(utf8.isValidCodepoint('🌍')); 98 | try testing.expect(utf8.isValidCodepoint('世')); 99 | 100 | // Invalid codepoint (beyond U+10FFFF) 101 | try testing.expect(!utf8.isValidCodepoint(0x110000)); 102 | 103 | // ... 
104 | } 105 | 106 | // └──────────────────────────────────────────────────────────────┘ 107 | 108 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 |
3 |
4 |

5 | Input / Output 6 |

7 |
8 | 9 |

10 | Version 11 | 12 | CI 13 | 14 | Github Repo Issues 15 | 16 | license 17 | 18 | GitHub Repo stars 19 |

20 | 21 |

22 | 23 | When simplicity meets efficiency 24 | 25 |

26 | 27 |
28 | 29 | 30 | 31 | part of SuperZIG framework 32 | 33 | 34 | 35 |
36 | 37 |
38 | 39 |
40 |
41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | - **🍃 Zero dependencies**—meticulously crafted code. 49 | 50 | - **🚀 Blazing fast**—almost as fast as light! 51 | 52 | - **🌍 Universal compatibility**—Windows, Linux, and macOS. 53 | 54 | - **🛡️ Battle-tested**—ready for production. 55 | 56 |
57 |
58 | 59 |
60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | - ### API 68 | 69 | - #### String 70 | - ##### Types 71 | > - ##### view 72 | > - ##### fixed 73 | > - ##### managed 74 | > - ##### unmanaged 75 | 76 | - ##### Utils 77 | - ##### [ascii](https://super-zig.github.io/io/string/utils/ascii) 78 | - ##### [utf8](https://super-zig.github.io/io/string/utils/utf8) 79 | - ##### [codepoint](https://super-zig.github.io/io/string/utils/codepoint) 80 | > - ##### grapheme 81 | > - ##### memory 82 | 83 | - #### Terminal 84 | - ##### App 85 | > - ##### cli 86 | > - ##### prompts 87 | 88 | - ##### Utils 89 | > - ##### ansi 90 | > - ##### print 91 | > - ##### info 92 | > - ##### settings 93 | > - ##### events 94 | 95 |
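A minimal usage sketch for the `ascii` utilities listed above. It assumes the package is exposed to your build as a module named `io`; that name is hypothetical and depends on how `lib/io.zig` is wired into your `build.zig`. The function names are taken from `lib/string/utils/ascii/ascii.zig`.

```zig
const std = @import("std");
// Assumption: `lib/io.zig` is wired into the build as a module named "io".
const io = @import("io");

const ascii = io.string.utils.ascii;

test "ascii utils: usage sketch" {
    // Case conversion leaves non-letters unchanged.
    try std.testing.expectEqual(@as(u8, 'Z'), ascii.toUpper('z'));
    try std.testing.expectEqual(@as(u8, 'z'), ascii.toLower('Z'));
    try std.testing.expectEqual(@as(u8, '7'), ascii.toUpper('7'));

    // Classification helpers return plain booleans.
    try std.testing.expect(ascii.isHex('f'));
    try std.testing.expect(!ascii.isHex('g'));
    try std.testing.expect(ascii.isWhitespace('\t'));
    try std.testing.expect(ascii.isControl(0x7F));
}
```

Every function here takes a single `u8`, so the helpers operate on bytes. Non-ASCII text is handled by the `utf8` and `codepoint` utilities instead.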
96 |
97 | 98 |
99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | - ### Benchmark 107 | 108 | > [See benchmark results comparing `SuperZIG.io` with popular alternatives.](https://github.com/Super-ZIG/io-bench) 109 | 110 |
111 |
112 | 113 |
114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 |
122 |
123 | 124 | 125 | 126 |
127 | 128 | -------------------------------------------------------------------------------- /docs/index.md: -------------------------------------------------------------------------------- 1 | 2 |
3 |
4 |

5 | Input / Output 6 |

7 |
8 | 9 |

10 | Version 11 | 12 | CI 13 | 14 | Github Repo Issues 15 | 16 | license 17 | 18 | GitHub Repo stars 19 |

20 | 21 |

22 | 23 | When simplicity meets efficiency 24 | 25 |

26 | 27 |
28 | 29 | 30 | 31 | part of SuperZIG framework 32 | 33 | 34 | 35 |
36 | 37 |
38 | 39 |
40 |
41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | - **🍃 Zero dependencies**—meticulously crafted code. 49 | 50 | - **🚀 Blazing fast**—almost as fast as light! 51 | 52 | - **🌍 Universal compatibility**—Windows, Linux, and macOS. 53 | 54 | - **🛡️ Battle-tested**—ready for production. 55 | 56 |
57 |
58 | 59 |
60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | - ### API 68 | 69 | - #### String 70 | - ##### Types 71 | > - ##### view 72 | > - ##### fixed 73 | > - ##### managed 74 | > - ##### unmanaged 75 | 76 | - ##### Utils 77 | - ##### [ascii](https://super-zig.github.io/io/string/utils/ascii) 78 | - ##### [utf8](https://super-zig.github.io/io/string/utils/utf8) 79 | - ##### [codepoint](https://super-zig.github.io/io/string/utils/codepoint) 80 | > - ##### grapheme 81 | > - ##### memory 82 | 83 | - #### Terminal 84 | - ##### App 85 | > - ##### cli 86 | > - ##### prompts 87 | 88 | - ##### Utils 89 | > - ##### ansi 90 | > - ##### print 91 | > - ##### info 92 | > - ##### settings 93 | > - ##### events 94 | 95 |
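A short sketch of walking a UTF-8 string with the `codepoint` utility listed above. The module name `io` is an assumption (it depends on how `lib/io.zig` is wired into your build). `Utf8Iterator`, `nextCodepoint`, and the `src`, `len`, and `pos` fields are taken from `lib/string/utils/codepoint/codepoint.zig`.

```zig
const std = @import("std");
// Assumption: `lib/io.zig` is wired into the build as a module named "io".
const io = @import("io");

const Codepoint = io.string.utils.codepoint;

test "codepoint iterator: usage sketch" {
    // `init` returns null for empty or invalid UTF-8 input.
    var it = Codepoint.Utf8Iterator.init("A©€😀") orelse return error.InvalidUtf8;

    var count: usize = 0;
    var bytes: usize = 0;

    // Bound the loop by the byte position rather than relying on
    // end-of-input behavior, which this sketch does not assume.
    while (it.pos < it.src.len) {
        const cp = it.nextCodepoint() orelse break;
        count += 1;
        bytes += cp.len; // UTF-8 length of this codepoint (1-4 bytes)
    }

    try std.testing.expectEqual(@as(usize, 4), count);
    try std.testing.expectEqual(@as(usize, 10), bytes);
}
```

The sample string is the one used in `codepoint.test.zig`: four codepoints encoded in 1, 2, 3, and 4 bytes respectively, 10 bytes in total.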
96 |
97 | 98 |
99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | - ### Benchmark 107 | 108 | > [See benchmark results comparing `SuperZIG.io` with popular alternatives.](https://github.com/Super-ZIG/io-bench) 109 | 110 |
111 |
112 | 113 |
114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 |
122 |
123 | 124 | 125 | 126 |
127 | 128 | -------------------------------------------------------------------------------- /lib/string/utils/ascii/ascii.zig: -------------------------------------------------------------------------------- 1 | // ascii.zig — ASCII handling module for I/O library. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/utils/ascii 5 | // author : https://github.com/maysara-elshewehy 6 | // 7 | // Developed with ❤️ by Maysara. 8 | 9 | 10 | 11 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗ 12 | 13 | // ┌────────────────────────── Conversion ────────────────────────┐ 14 | 15 | /// Converts a character to uppercase. 16 | /// If the character is not a lowercase letter, it is returned unchanged. 17 | pub fn toUpper(c: u8) u8 { 18 | // credit: std.ascii.toUpper 19 | const mask = @as(u8, @intFromBool(@call(.always_inline, isLower, .{c}))) << 5; 20 | return c ^ mask; 21 | } 22 | 23 | /// Converts a character to lowercase. 24 | /// If the character is not an uppercase letter, it is returned unchanged. 25 | pub fn toLower(c: u8) u8 { 26 | // credit: std.ascii.toLower 27 | const mask = @as(u8, @intFromBool(@call(.always_inline, isUpper, .{c}))) << 5; 28 | return c | mask; 29 | } 30 | 31 | // └──────────────────────────────────────────────────────────────┘ 32 | 33 | 34 | // ┌────────────────────────── Properties ────────────────────────┐ 35 | 36 | /// Returns true if the character is an uppercase letter (`A-Z`). 37 | pub fn isUpper(c: u8) bool { 38 | return switch (c) { 39 | 'A'...'Z' => true, 40 | else => false, 41 | }; 42 | } 43 | 44 | /// Returns true if the character is a lowercase letter (`a-z`). 45 | pub fn isLower(c: u8) bool { 46 | return switch (c) { 47 | 'a'...'z' => true, 48 | else => false, 49 | }; 50 | } 51 | 52 | /// Returns true if the character is an alphabetic letter (`A-Z`, `a-z`). 
53 | pub fn isAlphabetic(c: u8) bool { 54 | return switch (c) { 55 | 'A'...'Z', 'a'...'z' => true, 56 | else => false, 57 | }; 58 | } 59 | 60 | /// Returns true if the character is a numeric digit (`0-9`). 61 | pub fn isDigit(c: u8) bool { 62 | return switch (c) { 63 | '0'...'9' => true, 64 | else => false, 65 | }; 66 | } 67 | 68 | /// Returns true if the character is alphanumeric (`A-Z`, `a-z`, `0-9`). 69 | pub fn isAlphanumeric(c: u8) bool { 70 | return switch (c) { 71 | 'A'...'Z', 'a'...'z', '0'...'9' => true, 72 | else => false, 73 | }; 74 | } 75 | 76 | /// Returns true if the character is a hexadecimal digit (`0-9`, `A-F`, `a-f`). 77 | pub fn isHex(c: u8) bool { 78 | return switch (c) { 79 | '0'...'9', 'A'...'F', 'a'...'f' => true, 80 | else => false, 81 | }; 82 | } 83 | 84 | /// Returns true if the character is an octal digit (`0-7`). 85 | pub fn isOctal(c: u8) bool { 86 | return switch (c) { 87 | '0'...'7' => true, 88 | else => false, 89 | }; 90 | } 91 | 92 | /// Returns true if the character is a binary digit (`0`, `1`). 93 | pub fn isBinary(c: u8) bool { 94 | return switch (c) { 95 | '0', '1' => true, 96 | else => false, 97 | }; 98 | } 99 | 100 | /// Returns true if the character is a punctuation symbol 101 | /// (any printable ASCII character that is not a letter, digit, or space). 102 | pub fn isPunctuation(c: u8) bool { 103 | return switch (c) { 104 | '!'...'/', ':'...'@', '['...'`', '{'...'~' => true, 105 | else => false, 106 | }; 107 | } 108 | 109 | /// Returns true if the character is a whitespace character 110 | /// (space, tab, newline, carriage return). 111 | pub fn isWhitespace(c: u8) bool { 112 | return switch (c) { 113 | ' ', '\t', '\n', '\r' => true, 114 | else => false, 115 | }; 116 | } 117 | 118 | /// Returns true if the character is printable 119 | /// (ASCII 0x20-0x7E, i.e., space through tilde). 
120 | pub fn isPrintable(c: u8) bool { 121 | return switch (c) { 122 | ' '...'~' => true, 123 | else => false, 124 | }; 125 | } 126 | 127 | /// Returns true if the character is a control character 128 | /// (ASCII 0x00-0x1F or 0x7F). 129 | pub fn isControl(c: u8) bool { 130 | return (c <= 0x1F) or (c == 0x7F); 131 | } 132 | 133 | // └──────────────────────────────────────────────────────────────┘ 134 | 135 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/ascii/ascii.test.zig: -------------------------------------------------------------------------------- 1 | // ascii.test.zig ! 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/utils/ascii 5 | // author : https://github.com/maysara-elshewehy 6 | // 7 | // Developed with ❤️ by Maysara. 8 | 9 | 10 | 11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 12 | 13 | const std = @import("std"); 14 | const testing = std.testing; 15 | const ascii = @import("./ascii.zig"); 16 | 17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 18 | 19 | 20 | 21 | // ╔══════════════════════════════════════ INIT ══════════════════════════════════════╗ 22 | 23 | const uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; 24 | const lowercase = "abcdefghijklmnopqrstuvwxyz"; 25 | const digits = "0123456789"; 26 | const letters = uppercase ++ lowercase; 27 | 28 | const punctuation = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"; 29 | const printable = letters ++ digits ++ punctuation ++ " "; 30 | 31 | const whitespace = " \t\n\r"; 32 | const control = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"; 33 | const non_ascii = "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"; 34 | 35 | // 
╚══════════════════════════════════════════════════════════════════════════════════╝ 36 | 37 | 38 | 39 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗ 40 | 41 | // ┌────────────────────────── Conversion ────────────────────────┐ 42 | 43 | test "toUpper" { 44 | for(0..lowercase.len) |i| { 45 | try testing.expectEqual(uppercase[i], ascii.toUpper(lowercase[i])); 46 | } 47 | 48 | // unchanged 49 | for(uppercase ++ digits ++ punctuation) |c| { 50 | try testing.expectEqual(c, ascii.toUpper(c)); 51 | } 52 | } 53 | 54 | test "toLower" { 55 | for(0..uppercase.len) |i| { 56 | try testing.expectEqual(lowercase[i], ascii.toLower(uppercase[i])); 57 | } 58 | 59 | // unchanged 60 | for(lowercase ++ digits ++ punctuation) |c| { 61 | try testing.expectEqual(c, ascii.toLower(c)); 62 | } 63 | } 64 | 65 | // └──────────────────────────────────────────────────────────────┘ 66 | 67 | 68 | // ┌────────────────────────── Properties ────────────────────────┐ 69 | 70 | test "isUpper" { 71 | for(uppercase) |c| { 72 | try testing.expect(ascii.isUpper(c)); 73 | } 74 | 75 | // false 76 | for(lowercase) |c| { 77 | try testing.expect(!ascii.isUpper(c)); 78 | } 79 | } 80 | 81 | test "isLower" { 82 | for(lowercase) |c| { 83 | try testing.expect(ascii.isLower(c)); 84 | } 85 | 86 | // false 87 | for(uppercase) |c| { 88 | try testing.expect(!ascii.isLower(c)); 89 | } 90 | } 91 | 92 | test "isAlphabetic" { 93 | for(letters) |c| { 94 | try testing.expect(ascii.isAlphabetic(c)); 95 | } 96 | 97 | // false 98 | for(digits) |c| { 99 | try testing.expect(!ascii.isAlphabetic(c)); 100 | } 101 | } 102 | 103 | test "isDigit" { 104 | for(digits) |c| { 105 | try testing.expect(ascii.isDigit(c)); 106 | } 107 | 108 | // false 109 | for(letters) |c| { 110 | try testing.expect(!ascii.isDigit(c)); 111 | } 112 | } 113 | 114 | test "isAlphanumeric" { 115 | for(letters ++ digits) |c| { 116 | try testing.expect(ascii.isAlphanumeric(c)); 117 | } 118 | 119 | // false 120 | 
for(punctuation) |c| { 121 | try testing.expect(!ascii.isAlphanumeric(c)); 122 | } 123 | } 124 | 125 | test "isHex" { 126 | for("0123456789ABCDEFabcdef") |c| { 127 | try testing.expect(ascii.isHex(c)); 128 | } 129 | 130 | for(non_ascii) |c| { 131 | try testing.expect(!ascii.isHex(c)); 132 | } 133 | } 134 | 135 | test "isOctal" { 136 | for("01234567") |c| { 137 | try testing.expect(ascii.isOctal(c)); 138 | } 139 | 140 | for(non_ascii) |c| { 141 | try testing.expect(!ascii.isOctal(c)); 142 | } 143 | } 144 | 145 | test "isBinary" { 146 | for("01") |c| { 147 | try testing.expect(ascii.isBinary(c)); 148 | } 149 | 150 | for(non_ascii) |c| { 151 | try testing.expect(!ascii.isBinary(c)); 152 | } 153 | } 154 | 155 | test "isPunctuation" { 156 | for(punctuation) |c| { 157 | try testing.expect(ascii.isPunctuation(c)); 158 | } 159 | 160 | // false 161 | for(letters ++ digits) |c| { 162 | try testing.expect(!ascii.isPunctuation(c)); 163 | } 164 | } 165 | 166 | test "isWhitespace" { 167 | for(whitespace) |c| { 168 | try testing.expect(ascii.isWhitespace(c)); 169 | } 170 | 171 | // false 172 | for(non_ascii) |c| { 173 | try testing.expect(!ascii.isWhitespace(c)); 174 | } 175 | } 176 | 177 | test "isPrintable" { 178 | for(printable) |c| { 179 | try testing.expect(ascii.isPrintable(c)); 180 | } 181 | 182 | // false 183 | for(non_ascii) |c| { 184 | try testing.expect(!ascii.isPrintable(c)); 185 | } 186 | } 187 | 188 | test "isControl" { 189 | for(control) |c| { 190 | try testing.expect(ascii.isControl(c)); 191 | } 192 | 193 | // false 194 | for(non_ascii) |c| { 195 | try testing.expect(!ascii.isControl(c)); 196 | } 197 | } 198 | 199 | // └──────────────────────────────────────────────────────────────┘ 200 | 201 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/codepoint/codepoint.zig: 
-------------------------------------------------------------------------------- 1 | // codepoint.zig — Codepoint handling module for I/O library. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/utils/codepoint 5 | // author : https://github.com/maysara-elshewehy 6 | // 7 | // Developed with ❤️ by Maysara. 8 | 9 | 10 | 11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗ 12 | 13 | const utf8 = @import("../utf8/utf8.zig"); 14 | const Codepoint = @This(); 15 | 16 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 17 | 18 | 19 | 20 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗ 21 | 22 | // ┌─────────────────────────── Fields ───────────────────────────┐ 23 | 24 | /// Numeric value of the Unicode codepoint (U+0000 to U+10FFFF). 25 | src: u21 = 0, 26 | 27 | /// Length of this codepoint in UTF-8 (1-4 bytes). 28 | len: u3 = 0, 29 | 30 | // └──────────────────────────────────────────────────────────────┘ 31 | 32 | 33 | // ┌────────────────────────── Methods ───────────────────────────┐ 34 | 35 | /// Initializes a Codepoint from a Unicode scalar value if valid. 36 | /// Returns null if the codepoint is invalid according to UTF-8. 37 | pub fn init(cp: u21) ?Codepoint { 38 | return if(@call(.always_inline, utf8.isValidCodepoint, .{cp})) 39 | @call(.always_inline, unsafe_init, .{cp}) else null; 40 | } 41 | 42 | /// Initializes a Codepoint from a Unicode scalar value. 43 | /// Assumes the input is a valid codepoint. 44 | pub fn unsafe_init(cp: u21) Codepoint { 45 | return .{ 46 | .src = cp, 47 | .len = @call(.always_inline, utf8.getCodepointLength, .{cp}) 48 | }; 49 | } 50 | 51 | /// Initializes a Codepoint from a UTF-8 encoded byte slice if valid. 52 | /// Returns null if the slice is empty or contains an invalid UTF-8 sequence. 
53 | pub fn fromUtf8(slice: []const u8) ?Codepoint { 54 | return if(slice.len == 0 or !@call(.always_inline, utf8.isValidSlice, .{slice})) null 55 | else @call(.always_inline, unsafe_fromUtf8, .{slice}); 56 | } 57 | 58 | /// Initializes a Codepoint from a UTF-8 encoded byte slice. 59 | /// Assumes the input is a valid UTF-8 slice. 60 | pub fn unsafe_fromUtf8(slice: []const u8) Codepoint { 61 | return if(@call(.always_inline, utf8.getSequenceLengthOrNull, .{slice[0]})) |len| 62 | .{ 63 | .src = utf8.decode(slice[0..len]), 64 | .len = len 65 | } else .{}; 66 | } 67 | 68 | // └──────────────────────────────────────────────────────────────┘ 69 | 70 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ 71 | 72 | 73 | 74 | // ╔══════════════════════════════════════ ITER ══════════════════════════════════════╗ 75 | 76 | pub const Utf8Iterator = struct { 77 | 78 | // ┌─────────────────────────── Fields ───────────────────────────┐ 79 | 80 | /// The UTF-8 encoded string that the iterator will traverse. 81 | src: []const u8, 82 | 83 | /// The current byte position in the string. 84 | pos: usize = 0, 85 | 86 | // └──────────────────────────────────────────────────────────────┘ 87 | 88 | 89 | // ┌────────────────────────── Methods ───────────────────────────┐ 90 | 91 | /// Initializes a new Utf8Iterator from the given UTF-8 slice if valid. 92 | /// Returns null if the slice is empty or contains invalid UTF-8. 93 | pub fn init(slice: []const u8) ?Utf8Iterator { 94 | return if(slice.len == 0 or !utf8.isValidSlice(slice)) null 95 | else @call(.always_inline, Utf8Iterator.unsafe_init, .{slice}); 96 | } 97 | 98 | /// Initializes a new Utf8Iterator from the given UTF-8 slice. 99 | /// Assumes the input is a valid UTF-8 slice. 100 | pub fn unsafe_init(slice: []const u8) Utf8Iterator { 101 | return .{ .src = slice }; 102 | } 103 | 104 | /// Returns the next Codepoint and increments the position. 
105 | pub fn nextCodepoint(self: *Utf8Iterator) ?Codepoint { 106 | if(@call(.always_inline, Utf8Iterator.peekCodepoint, .{self.*})) |cp| { 107 | self.pos += cp.len; 108 | return cp; 109 | } else return null; 110 | } 111 | 112 | /// Returns the next UTF-8 slice and increments the position. 113 | pub fn nextSlice(self: *Utf8Iterator) ?[]const u8 { 114 | if(@call(.always_inline, Utf8Iterator.peekSlice, .{self.*})) |slice| { 115 | self.pos += slice.len; 116 | return slice; 117 | } else return null; 118 | } 119 | 120 | /// Returns the next codepoint length and increments the position. 121 | pub fn nextLength(self: *Utf8Iterator) ?u3 { 122 | if(@call(.always_inline, Utf8Iterator.peekLength, .{self.*})) |len| { 123 | self.pos += len; 124 | return len; 125 | } else return null; 126 | } 127 | 128 | /// Returns the next Codepoint without incrementing the position. 129 | pub fn peekCodepoint(self: Utf8Iterator) ?Codepoint { 130 | if(@call(.always_inline, Utf8Iterator.peekLength, .{self})) |len| { 131 | return Codepoint { 132 | .src = utf8.decode(self.src[self.pos..self.pos+len]), 133 | .len = len 134 | }; 135 | } else return null; 136 | } 137 | 138 | /// Returns the next UTF-8 slice without incrementing the position. 139 | pub fn peekSlice(self: Utf8Iterator) ?[]const u8 { 140 | if(@call(.always_inline, Utf8Iterator.peekLength, .{self})) |len| { 141 | return self.src[self.pos..self.pos+len]; 142 | } else return null; 143 | } 144 | 145 | /// Returns the next codepoint length without incrementing the position. 
146 | pub fn peekLength(self: Utf8Iterator) ?u3 { 147 | return if(self.pos == self.src.len) null 148 | else @call(.always_inline, utf8.getSequenceLengthOrNull, .{self.src[self.pos]}); 149 | } 150 | 151 | // └──────────────────────────────────────────────────────────────┘ 152 | }; 153 | 154 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /lib/string/utils/utf8/utf8.zig: -------------------------------------------------------------------------------- 1 | // Copyright (c) 2025 Maysara, All rights reserved. 2 | // 3 | // repo : https://github.com/Super-ZIG/io 4 | // docs : https://super-zig.github.io/io/string/utils/utf8 5 | // 6 | // owner : https://github.com/maysara-elshewehy 7 | // email : maysara.elshewehy@gmail.com 8 | // 9 | // Made with ❤️ by Maysara 10 | 11 | 12 | 13 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗ 14 | 15 | // ┌────────────────────────── Conversion ────────────────────────┐ 16 | 17 | /// Encodes a single Unicode `codepoint` to a UTF-8 sequence. 18 | /// Returns the number of bytes written. 19 | /// 20 | /// Assumes the input `codepoint` is valid and the output slice is large enough. 
21 | pub fn encode(cp: u21, out: []u8) u3 { 22 | const length = @call(.always_inline, getCodepointLength, .{cp}); 23 | 24 | switch (length) { 25 | 1 => { 26 | out[0] = @truncate(cp); 27 | }, 28 | 29 | 2 => { 30 | out[0] = @truncate(0xC0 | (cp >> 6 )); 31 | out[1] = @truncate(0x80 | (cp & 0x3F)); 32 | }, 33 | 34 | 3 => { 35 | out[0] = @truncate(0xE0 | (cp >> 12 )); 36 | out[1] = @truncate(0x80 | ((cp >> 6) & 0x3F)); 37 | out[2] = @truncate(0x80 | (cp & 0x3F)); 38 | }, 39 | 40 | else => { 41 | out[0] = @truncate(0xF0 | (cp >> 18 )); 42 | out[1] = @truncate(0x80 | ((cp >> 12) & 0x3F)); 43 | out[2] = @truncate(0x80 | ((cp >> 6) & 0x3F)); 44 | out[3] = @truncate(0x80 | (cp & 0x3F)); 45 | } 46 | } 47 | 48 | return length; 49 | } 50 | 51 | /// Decodes a UTF-8 sequence to a Unicode `codepoint`. 52 | /// Returns the decoded codepoint. 53 | /// 54 | /// Assumes the input slice is a valid UTF-8 sequence of length 1-4. 55 | pub fn decode(slice: []const u8) u21 { 56 | return switch (slice.len) { 57 | 1 => @as(u21, 58 | slice[0]), 59 | 60 | 2 => (@as(u21, 61 | (slice[0] & 0x1F)) << 6) | (slice[1] & 0x3F), 62 | 63 | 3 => (((@as(u21, 64 | (slice[0] & 0x0F)) << 6) | (slice[1] & 0x3F)) << 6) | (slice[2] & 0x3F), 65 | 66 | else => (((((@as(u21, 67 | (slice[0] & 0x07)) << 6) | (slice[1] & 0x3F)) << 6) | (slice[2] & 0x3F)) << 6) | (slice[3] & 0x3F) 68 | }; 69 | } 70 | 71 | // └──────────────────────────────────────────────────────────────┘ 72 | 73 | 74 | // ┌────────────────────────── Properties ────────────────────────┐ 75 | 76 | 77 | /// Returns the number of bytes (1-4) needed to encode a `codepoint` in UTF-8 format. 78 | pub fn getCodepointLength(cp: u21) u3 { 79 | return switch (cp) { 80 | 0x00000...0x00007F => @as(u3, 1), 81 | 0x00080...0x0007FF => @as(u3, 2), 82 | 0x00800...0x00FFFF => @as(u3, 3), 83 | else => @as(u3, 4), 84 | }; 85 | } 86 | 87 | /// Returns the number of bytes (1-4) needed to encode a `codepoint` in UTF-8 format, 88 | /// or null if the codepoint is invalid. 
89 | pub fn getCodepointLengthOrNull(cp: u21) ?u3 { 90 | return if (cp > 0x10FFFF) null else @call(.always_inline, getCodepointLength, .{cp}); 91 | } 92 | 93 | /// Returns the expected number of bytes (1-4) in a UTF-8 sequence based on the first byte. 94 | pub fn getSequenceLength(first_byte: u8) u3 { 95 | return switch (first_byte) { 96 | 0x00...0x7F => @as(u3, 1), 97 | 0xC0...0xDF => @as(u3, 2), 98 | 0xE0...0xEF => @as(u3, 3), 99 | else => @as(u3, 4), 100 | }; 101 | } 102 | 103 | /// Returns the expected number of bytes (1-4) in a UTF-8 sequence based on the first byte, 104 | /// or null if the first byte is not a valid starter. 105 | pub fn getSequenceLengthOrNull(first_byte: u8) ?u3 { 106 | return if (first_byte > 0xF7) null 107 | else @call(.always_inline, getSequenceLength, .{first_byte}); 108 | } 109 | 110 | /// Returns true if the provided slice contains valid UTF-8 data. 111 | pub fn isValidSlice(utf8: []const u8) bool { 112 | // Inspired by: std.unicode.utf8ValidateSliceImpl 113 | // Todo: optimize or remove it (This was for learning purposes). 114 | 115 | // default lowest and highest continuation byte 116 | const lo_cb = 0b10000000; 117 | const hi_cb = 0b10111111; 118 | 119 | var remaining = utf8; 120 | vectorized: { 121 | const chunk_len = @import("std").simd.suggestVectorLength(u8) orelse break :vectorized; 122 | const Chunk = @Vector(chunk_len, u8); 123 | 124 | // Fast path. Check for and skip ASCII characters at the start of the input. 125 | while (remaining.len >= chunk_len) { 126 | const chunk: Chunk = remaining[0..chunk_len].*; 127 | const mask: Chunk = @splat(0x80); 128 | if (@reduce(.Or, chunk & mask == mask)) { 129 | // found a non ASCII byte 130 | break; 131 | } 132 | remaining = remaining[chunk_len..]; 133 | } 134 | } 135 | 136 | // The first nibble is used to identify the continuation byte range to 137 | // accept. The second nibble is the size. 
138 | const xx = 0xF1; // invalid: size 1 139 | const as = 0xF0; // ASCII: size 1 140 | const s1 = 0x02; // accept 0, size 2 141 | const s2 = 0x13; // accept 1, size 3 142 | const s3 = 0x03; // accept 0, size 3 143 | const s4 = 0x23; // accept 2, size 3 144 | const s5 = 0x34; // accept 3, size 4 145 | const s6 = 0x04; // accept 0, size 4 146 | const s7 = 0x44; // accept 4, size 4 147 | 148 | // Information about the first byte in a UTF-8 sequence. 149 | const first = comptime ([_]u8{as} ** 128) ++ ([_]u8{xx} ** 64) ++ [_]u8{ 150 | xx, xx, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, 151 | s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, 152 | s2, s3, s3, s3, s3, s3, s3, s3, s3, s3, s3, s3, s3, s4, s3, s3, 153 | s5, s6, s6, s6, s7, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, 154 | }; 155 | 156 | const n = remaining.len; 157 | var i: usize = 0; 158 | while (i < n) { 159 | const first_byte = remaining[i]; 160 | if (first_byte < 0x80) { 161 | i += 1; 162 | continue; 163 | } 164 | 165 | const info = first[first_byte]; 166 | if (info == xx) { 167 | return false; // Illegal starter byte. 168 | } 169 | 170 | const size = info & 7; 171 | if (i + size > n) { 172 | return false; // Short or invalid. 173 | } 174 | 175 | // Figure out the acceptable low and high continuation bytes, starting 176 | // with our defaults. 
177 | var accept_lo: u8 = lo_cb; 178 | var accept_hi: u8 = hi_cb; 179 | 180 | switch (info >> 4) { 181 | 0 => {}, 182 | 1 => accept_lo = 0xA0, 183 | 2 => accept_hi = 0x9F, 184 | 3 => accept_lo = 0x90, 185 | 4 => accept_hi = 0x8F, 186 | else => unreachable, 187 | } 188 | 189 | const c1 = remaining[i + 1]; 190 | if (c1 < accept_lo or accept_hi < c1) { 191 | return false; 192 | } 193 | 194 | switch (size) { 195 | 2 => i += 2, 196 | 3 => { 197 | const c2 = remaining[i + 2]; 198 | if (c2 < lo_cb or hi_cb < c2) return false; 199 | i += 3; 200 | }, 201 | 4 => { 202 | const c2 = remaining[i + 2]; 203 | if (c2 < lo_cb or hi_cb < c2) return false; 204 | const c3 = remaining[i + 3]; 205 | if (c3 < lo_cb or hi_cb < c3) return false; 206 | i += 4; 207 | }, 208 | else => unreachable, 209 | } 210 | } 211 | 212 | return true; 213 | } 214 | 215 | /// Returns true if the provided codepoint is valid for UTF-8 encoding. 216 | pub fn isValidCodepoint(cp: u21) bool { 217 | return cp <= 0x10FFFF; 218 | } 219 | 220 | // └──────────────────────────────────────────────────────────────┘ 221 | 222 | // ╚══════════════════════════════════════════════════════════════════════════════════╝ -------------------------------------------------------------------------------- /docs/string/utils/utf8.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 |
19 |

20 | UTF-8 21 |

22 |
23 | 24 |


35 | 36 |

37 | 38 | When simplicity meets efficiency 39 | 40 |

41 | 42 |
43 | 44 | 45 | 46 | part of 47 | SuperZig::io library 48 | 49 | 50 | 51 |
52 | 53 |
56 |
57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | - **🍃 Zero dependencies**—meticulously crafted code. 65 | 66 | - **🚀 Blazing fast**—almost as fast as light! 67 | 68 | - **🌍 Universal compatibility**—Windows, Linux, and macOS. 69 | 70 | - **🛡️ Battle-tested**—ready for production. 71 | 72 |
73 |
- ### Quick Start 🔥

    > If you have not already added the library to your project, please review the [installation guide](https://github.com/Super-ZIG/io/wiki/installation) for more information.

    ```zig
    const utf8 = @import("io").string.utils.utf8;
    ```

    > Convert a UTF-8 slice to a codepoint

    ```zig
    _ = utf8.decode("🌟"); // 👉 0x1F31F
    ```

    > Convert a codepoint to a UTF-8 slice

    ```zig
    var buf: [4]u8 = undefined;
    _ = utf8.encode(0x1F31F, &buf); // 👉 4, and buf now holds "🌟"
    ```

    > Get codepoint length

    ```zig
    _ = utf8.getCodepointLength(0x1F31F); // 👉 4
    ```

    > Get UTF-8 sequence length

    ```zig
    _ = utf8.getSequenceLength("🌟"[0]); // 👉 4
    ```
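The values quoted in the Quick Start (`0x1F31F`, 4 bytes, `"🌟"`) can be cross-checked independently of this library. The following is a minimal sketch that uses only `std.unicode` from the Zig standard library (assuming Zig 0.14, the version named in the benchmark notes); it does not call `io` at all.

```zig
const std = @import("std");

test "round-trip U+1F31F through std.unicode" {
    var buf: [4]u8 = undefined;

    // std.unicode.utf8Encode returns the number of bytes written, or an error.
    const len = try std.unicode.utf8Encode(0x1F31F, &buf);
    try std.testing.expectEqual(@as(u3, 4), len);
    try std.testing.expectEqualStrings("🌟", buf[0..len]);

    // Decode the same bytes again through a validated view.
    const view = try std.unicode.Utf8View.init(buf[0..len]);
    var it = view.iterator();
    try std.testing.expectEqual(@as(?u21, 0x1F31F), it.nextCodepoint());
    try std.testing.expectEqual(@as(?u21, null), it.nextCodepoint());
}
```

Saved to a file, this can be run with `zig test <file>.zig`.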
118 |
- ### API

    - #### Encoding / Decoding

        | Function | Return | Description |
        | -------- | ------ | ----------- |
        | encode   | `u3`   | Encodes a single Unicode `codepoint` to a `UTF-8 sequence` and returns the number of bytes written. Assumes the codepoint is valid and the output slice is large enough. |
        | decode   | `u21`  | Decodes a `UTF-8 sequence` to a Unicode `codepoint` and returns it. Assumes the input is a valid sequence of 1-4 bytes. |

    - #### Properties

        | Function                 | Return | Description |
        | ------------------------ | ------ | ----------- |
        | getCodepointLength       | `u3`   | Returns the number of bytes (`1-4`) needed to encode a `codepoint` in UTF-8 format. |
        | getCodepointLengthOrNull | `?u3`  | Same as `getCodepointLength`, but returns `null` if the codepoint is invalid. |
        | getSequenceLength        | `u3`   | Returns the expected number of bytes (`1-4`) in a `UTF-8 sequence` based on the first byte. |
        | getSequenceLengthOrNull  | `?u3`  | Same as `getSequenceLength`, but returns `null` if the first byte is not a valid starter. |

    - #### Validation

        | Function         | Return | Description |
        | ---------------- | ------ | ----------- |
        | isValidSlice     | `bool` | Returns true if the provided slice contains valid UTF-8 data. |
        | isValidCodepoint | `bool` | Returns true if the provided codepoint is valid for UTF-8 encoding. |
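The sequence-length functions listed above classify a sequence by its first byte. As an illustration of that idea only, here is a standalone sketch of a strict leading-byte classifier. `sequenceLength` is a hypothetical helper written for this example, not the library's implementation, and its accepted ranges follow the well-formed UTF-8 table of the Unicode Standard rather than anything stated in this repository.

```zig
const std = @import("std");

/// Hypothetical helper (not part of the library): maps a leading byte to the
/// length of the UTF-8 sequence it starts, or null if it cannot start one.
fn sequenceLength(first_byte: u8) ?u3 {
    return switch (first_byte) {
        0x00...0x7F => @as(u3, 1), // ASCII
        0xC2...0xDF => @as(u3, 2), // 0xC0 and 0xC1 could only start overlong forms
        0xE0...0xEF => @as(u3, 3),
        0xF0...0xF4 => @as(u3, 4), // 0xF5 and above would exceed U+10FFFF
        else => null, // continuation bytes (0x80...0xBF) and invalid starters
    };
}

test "sequenceLength classifies leading bytes" {
    try std.testing.expectEqual(@as(?u3, 1), sequenceLength('A'));
    try std.testing.expectEqual(@as(?u3, 2), sequenceLength("é"[0]));
    try std.testing.expectEqual(@as(?u3, 3), sequenceLength("€"[0]));
    try std.testing.expectEqual(@as(?u3, 4), sequenceLength("🌟"[0]));
    try std.testing.expectEqual(@as(?u3, null), sequenceLength(0x80));
    try std.testing.expectEqual(@as(?u3, null), sequenceLength(0xFF));
}
```

A strict classifier like this rejects a continuation byte in the leading position, which is useful when the input has not been validated beforehand.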
154 |
- ### Benchmark

    > A quick summary with sample performance test results comparing the _**`SuperZIG`.`io`.`string`.`utils`.`utf8`**_ implementation with its popular competitors.

    - #### vs `std.unicode`

        > _**In summary**, `io` is roughly **3 to 5 times** faster than `std` in the runs below. ✨_

        - #### Debug Build (`zig build run --release=safe -- utf8`)

            | Benchmark | Runs   | Total Time | Avg Time | Speed |
            | --------- | ------ | ---------- | -------- | ----- |
            | std_x10   | 100000 | 92.7ms     | 927ns    | x1.00 |
            | io_x10    | 100000 | 31.9ms     | 319ns    | x2.91 |
            | std_x100  | 21485  | 1.959s     | 91.188us | x1.00 |
            | io_x100   | 96186  | 1.997s     | 20.768us | x4.39 |
            | std_x1000 | 218    | 2.067s     | 9.482ms  | x1.00 |
            | io_x1000  | 961    | 1.87s      | 1.946ms  | x4.87 |

        - #### Release Build (`zig build run --release=fast -- utf8`)

            | Benchmark | Runs   | Total Time | Avg Time | Speed |
            | --------- | ------ | ---------- | -------- | ----- |
            | std_x10   | 100000 | 102.6ms    | 1.026us  | x1.00 |
            | io_x10    | 100000 | 29.1ms     | 291ns    | x3.53 |
            | std_x100  | 20653  | 1.915s     | 92.771us | x1.00 |
            | io_x100   | 100000 | 1.796s     | 17.962us | x5.16 |
            | std_x1000 | 232    | 2.028s     | 8.742ms  | x1.00 |
            | io_x1000  | 1176   | 2.07s      | 1.76ms   | x4.96 |

    > **It is normal for the values to differ each time the benchmark is run, but in general the speed ratios will remain close.**

    > The benchmarks were run on **Windows 11 v24H2** with an **11th Gen Intel® Core™ i5-1155G7 × 8** processor and **32GB** of RAM.
    >
    > The version of Zig used is **0.14.0**.
    >
    > The source code of this benchmark is at **[bench/string/utils/utf8.zig](https://github.com/Super-ZIG/io-bench/tree/main/src/bench/string/utils/utf8.zig)**.
203 |
206 | 207 | 208 | 209 | 210 | 211 | 212 | 213 |
214 |
215 | 216 | 217 | 218 |
219 | 220 | -------------------------------------------------------------------------------- /docs/string/utils/ascii.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 |
19 |

20 | ASCII 21 |

22 |
23 | 24 |


35 | 36 |

37 | 38 | When simplicity meets efficiency 39 | 40 |

41 | 42 |
43 | 44 | 45 | 46 | part of 47 | SuperZig::io library 48 | 49 | 50 | 51 |
52 | 53 |
56 |
57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | - **🍃 Zero dependencies**—meticulously crafted code. 65 | 66 | - **🚀 Blazing fast**—almost as fast as light! 67 | 68 | - **🌍 Universal compatibility**—Windows, Linux, and macOS. 69 | 70 | - **🛡️ Battle-tested**—ready for production. 71 | 72 |
73 |
- ### Quick Start 🔥

    > If you have not already added the library to your project, please review the [installation guide](https://github.com/Super-ZIG/io/wiki/installation) for more information.

    ```zig
    const ascii = @import("io").string.utils.ascii;
    ```

    > Convert characters

    ```zig
    _ = ascii.toUpper('a'); // 👉 'A'
    _ = ascii.toLower('A'); // 👉 'a'
    ```

    > Check character properties

    ```zig
    _ = ascii.isUpper('A');       // 👉 true
    _ = ascii.isLower('a');       // 👉 true
    _ = ascii.isDigit('1');       // 👉 true
    _ = ascii.isHex('F');         // 👉 true
    _ = ascii.isWhitespace(' ');  // 👉 true
    _ = ascii.isPunctuation('!'); // 👉 true
    // ...
    ```
112 |
- ### API

    - #### Conversion

        | Function | Return | Description |
        | -------- | ------ | ----------- |
        | toUpper  | `u8`   | Converts a character to `uppercase`. If it is not a `lowercase` letter, it is returned `unchanged`. |
        | toLower  | `u8`   | Converts a character to `lowercase`. If it is not an `uppercase` letter, it is returned `unchanged`. |

    - #### Properties

        | Function       | Return | Description |
        | -------------- | ------ | ----------- |
        | isUpper        | `bool` | Returns true if the character is an uppercase letter (`A-Z`). |
        | isLower        | `bool` | Returns true if the character is a lowercase letter (`a-z`). |
        | isAlphabetic   | `bool` | Returns true if the character is an alphabetic letter (`A-Z`, `a-z`). |
        | isDigit        | `bool` | Returns true if the character is a numeric digit (`0-9`). |
        | isAlphanumeric | `bool` | Returns true if the character is alphanumeric (`A-Z`, `a-z`, `0-9`). |
        | isHex          | `bool` | Returns true if the character is a hexadecimal digit (`0-9`, `A-F`, `a-f`). |
        | isOctal        | `bool` | Returns true if the character is an octal digit (`0-7`). |
        | isBinary       | `bool` | Returns true if the character is a binary digit (`0-1`). |
        | isPunctuation  | `bool` | Returns true if the character is a punctuation symbol (`!`, `@`, `#`, `$`, `%`, `^`, `&`, `*`, ..). |
        | isWhitespace   | `bool` | Returns true if the character is a whitespace character (`space`, `tab`, `newline`, `carriage return`). |
        | isPrintable    | `bool` | Returns true if the character is printable (`A-Z`, `a-z`, `0-9`, `punctuation marks`, `space`). |
        | isControl      | `bool` | Returns true if the character is a control character (`ASCII 0x00-0x1F or 0x7F`). |
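Predicates and conversions like the ones in the tables above usually reduce to simple range checks on the byte value. The sketch below shows two of them as standalone functions and checks them against `std.ascii` for every possible byte. The names `isHex` and `toUpper` here are hypothetical stand-ins written for this example; they are not the library's source.

```zig
const std = @import("std");

/// Hypothetical stand-in: true for `0-9`, `A-F`, `a-f`.
fn isHex(c: u8) bool {
    return switch (c) {
        '0'...'9', 'A'...'F', 'a'...'f' => true,
        else => false,
    };
}

/// Hypothetical stand-in: lowercase and uppercase ASCII letters differ only
/// in bit 0x20, so clearing that bit converts `a-z` to `A-Z`.
fn toUpper(c: u8) u8 {
    return if (c >= 'a' and c <= 'z') c & ~@as(u8, 0x20) else c;
}

test "sketches agree with std.ascii for every byte" {
    for (0..256) |i| {
        const c: u8 = @intCast(i);
        try std.testing.expectEqual(std.ascii.isHex(c), isHex(c));
        try std.testing.expectEqual(std.ascii.toUpper(c), toUpper(c));
    }
}
```

Testing all 256 byte values is cheap and also covers the non-ASCII range (`0x80-0xFF`), where every predicate should return false and every conversion should return its input unchanged.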
149 |
- ### Benchmark

    > A quick summary with sample performance test results comparing the _**`SuperZIG`.`io`.`string`.`utils`.`ascii`**_ implementation with its popular competitors.

    - #### vs `std.ascii`

        > _**In summary**, the two run at **the same speed** because they share almost the same code. ✨_

        - #### Debug Build (`zig build run --release=safe -- ascii`)

            | Benchmark | Runs   | Total Time | Avg Time | Speed |
            | --------- | ------ | ---------- | -------- | ----- |
            | std_x10   | 100000 | 2.2ms      | 22ns     | x1.00 |
            | io_x10    | 100000 | 1.9ms      | 19ns     | x1.16 |
            | std_x100  | 100000 | 5.9ms      | 59ns     | x1.00 |
            | io_x100   | 100000 | 5.2ms      | 52ns     | x1.13 |
            | std_x1000 | 100000 | 30.3ms     | 303ns    | x1.00 |
            | io_x1000  | 100000 | 30.5ms     | 305ns    | x0.99 |

        - #### Release Build (`zig build run --release=fast -- ascii`)

            | Benchmark | Runs   | Total Time | Avg Time | Speed |
            | --------- | ------ | ---------- | -------- | ----- |
            | std_x10   | 100000 | 1.6ms      | 16ns     | x1.00 |
            | io_x10    | 100000 | 1.5ms      | 15ns     | x1.07 |
            | std_x100  | 100000 | 5.2ms      | 52ns     | x1.00 |
            | io_x100   | 100000 | 5.2ms      | 52ns     | x1.00 |
            | std_x1000 | 100000 | 31.1ms     | 311ns    | x1.00 |
            | io_x1000  | 100000 | 31ms       | 310ns    | x1.00 |

    > **It is normal for the values to differ each time the benchmark is run, but in general the speed ratios will remain close.**

    > The benchmarks were run on **Windows 11 v24H2** with an **11th Gen Intel® Core™ i5-1155G7 × 8** processor and **32GB** of RAM.
    >
    > The version of Zig used is **0.14.0**.
    >
    > The source code of this benchmark is at **[bench/string/utils/ascii.zig](https://github.com/Super-ZIG/io-bench/tree/main/src/bench/string/utils/ascii.zig)**.
198 |
201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 |
209 |
210 | 211 | 212 | 213 |
214 | 215 | -------------------------------------------------------------------------------- /docs/string/utils/codepoint.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 |
19 |

20 | Codepoint 21 |

22 |
23 | 24 |


35 | 36 |

37 | 38 | When simplicity meets efficiency 39 | 40 |

41 | 42 |
43 | 44 | 45 | 46 | part of 47 | SuperZig::io library 48 | 49 | 50 | 51 |
52 | 53 |
56 |
57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | - **🍃 Zero dependencies**—meticulously crafted code. 65 | 66 | - **🚀 Blazing fast**—almost as fast as light! 67 | 68 | - **🌍 Universal compatibility**—Windows, Linux, and macOS. 69 | 70 | - **🛡️ Battle-tested**—ready for production. 71 | 72 |
73 |
- ### Quick Start 🔥

    > If you have not already added the library to your project, please review the [installation guide](https://github.com/Super-ZIG/io/wiki/installation) for more information.

    ```zig
    const codepoint = @import("io").string.utils.codepoint;
    ```

    > Initialize a Codepoint from a Unicode codepoint value or from a UTF-8 slice.

    ```zig
    _ = codepoint.init(0x1F31F).?;  // 👉 .{ .src = 0x1F31F, .len = 4 }
    _ = codepoint.fromUtf8("🌟").?; // 👉 .{ .src = 0x1F31F, .len = 4 }
    ```

    > Iterate over a UTF-8 slice, either slice by slice or codepoint by codepoint.

    ```zig
    var iter = codepoint.Utf8Iterator.init("..").?; // 👉 .{ .src = "..", .pos = 0 }

    while(iter.nextSlice()) |slice| { .. }
    while(iter.nextCodepoint()) |cp| { .. }
    ```
109 |
- ### API

    - #### Codepoint

        - ##### Fields

            | Field | Type  | Description |
            | ----- | ----- | ----------- |
            | `src` | `u21` | Numeric value of the Unicode codepoint (U+0000 to U+10FFFF). |
            | `len` | `u3`  | Length of this codepoint in UTF-8 (1-4 bytes). |

        - ##### Initialization

            | Function        | Return  | Description |
            | --------------- | ------- | ----------- |
            | init            | `?Self` | Initializes a Codepoint from a Unicode `codepoint` value if valid, otherwise returns `null`. |
            | unsafe_init     | `Self`  | Initializes a Codepoint from a Unicode `codepoint` value. Assumes the input is valid. |
            | fromUtf8        | `?Self` | Initializes a Codepoint from a `UTF-8 encoded slice` if valid. Returns `null` if the slice is empty or invalid. |
            | unsafe_fromUtf8 | `Self`  | Initializes a Codepoint from a `UTF-8 encoded slice`. Assumes the input is valid. |

    - #### Utf8Iterator

        - ##### Fields

            | Field | Type         | Description |
            | ----- | ------------ | ----------- |
            | `src` | `[]const u8` | The UTF-8 encoded string that the iterator will traverse. |
            | `pos` | `usize`      | The current byte position in the string. |

        - ##### Initialization

            | Function    | Return  | Description |
            | ----------- | ------- | ----------- |
            | init        | `?Self` | Initializes a new Utf8Iterator from the given `UTF-8 slice` if valid. Returns `null` if the slice is empty or invalid. |
            | unsafe_init | `Self`  | Initializes a new Utf8Iterator from the given `UTF-8 slice`. Assumes the input is valid. |

        - ##### Next

            | Function      | Return        | Description |
            | ------------- | ------------- | ----------- |
            | nextCodepoint | `?Codepoint`  | Returns the next `Codepoint` **and** increments the position. |
            | nextSlice     | `?[]const u8` | Returns the next `UTF-8 slice` **and** increments the position. |
            | nextLength    | `?u3`         | Returns the next `Codepoint` length **and** increments the position. |

        - ##### Peek

            | Function      | Return        | Description |
            | ------------- | ------------- | ----------- |
            | peekCodepoint | `?Codepoint`  | Returns the next `Codepoint` **without** incrementing the position. |
            | peekSlice     | `?[]const u8` | Returns the next `UTF-8 slice` **without** incrementing the position. |
            | peekLength    | `?u3`         | Returns the next `Codepoint` length **without** incrementing the position. |
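The difference between the `peek*` and `next*` families can be shown with a small standalone model. The sketch below is a simplified stand-in, not the library's `Utf8Iterator`: `SliceIterator` is a hypothetical name, it relies on `std.unicode.utf8ByteSequenceLength` from the Zig standard library (assuming Zig 0.14) instead of this library's `utf8` module, and it assumes its input is already valid UTF-8.

```zig
const std = @import("std");

/// Hypothetical, simplified model of a peek/next iterator over UTF-8 bytes.
const SliceIterator = struct {
    src: []const u8,
    pos: usize = 0,

    /// Returns the next UTF-8 sequence without advancing the position.
    fn peekSlice(self: SliceIterator) ?[]const u8 {
        if (self.pos >= self.src.len) return null;
        const len = std.unicode.utf8ByteSequenceLength(self.src[self.pos]) catch return null;
        if (self.pos + len > self.src.len) return null;
        return self.src[self.pos .. self.pos + len];
    }

    /// Returns the next UTF-8 sequence and advances past it.
    fn nextSlice(self: *SliceIterator) ?[]const u8 {
        const slice = self.peekSlice() orelse return null;
        self.pos += slice.len;
        return slice;
    }
};

test "peek does not advance, next does" {
    var it = SliceIterator{ .src = "a€🌟" };
    try std.testing.expectEqualStrings("a", it.peekSlice().?);
    try std.testing.expectEqualStrings("a", it.nextSlice().?);
    try std.testing.expectEqualStrings("€", it.nextSlice().?);
    try std.testing.expectEqualStrings("🌟", it.nextSlice().?);
    try std.testing.expect(it.nextSlice() == null);
}
```

Building `next` on top of `peek` keeps the bounds logic in one place, which mirrors how the method pairs are described in the tables above.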
172 |
- ### Benchmark

    > A quick summary with sample performance test results comparing the _**`SuperZIG`.`io`.`string`.`utils`.`codepoint`**_ implementation with its popular competitors.

    - #### vs `std.unicode`

        > _**In summary**, `io` is roughly **4 to 5 times** faster than `std` in release builds, and up to about **2 times** faster in debug builds, in the runs below. ✨_

        - #### Debug Build (`zig build run -- codepoint`)

            | Benchmark | Runs   | Total Time | Avg Time | Speed |
            | --------- | ------ | ---------- | -------- | ----- |
            | std_x10   | 100000 | 87.4ms     | 874ns    | x1.00 |
            | io_x10    | 100000 | 65.6ms     | 656ns    | x1.33 |
            | std_x100  | 23412  | 2.108s     | 90.082us | x1.00 |
            | io_x100   | 46583  | 1.952s     | 41.918us | x2.15 |
            | std_x1000 | 234    | 2.061s     | 8.81ms   | x1.00 |
            | io_x1000  | 457    | 2.1s       | 4.596ms  | x1.92 |

        - #### Release Build (`zig build run --release=fast -- codepoint`)

            | Benchmark | Runs   | Total Time | Avg Time | Speed |
            | --------- | ------ | ---------- | -------- | ----- |
            | std_x10   | 100000 | 84.9ms     | 849ns    | x1.00 |
            | io_x10    | 100000 | 22ms       | 220ns    | x3.86 |
            | std_x100  | 25531  | 1.967s     | 77.053us | x1.00 |
            | io_x100   | 100000 | 1.56s      | 15.608us | x4.94 |
            | std_x1000 | 263    | 2.107s     | 8.012ms  | x1.00 |
            | io_x1000  | 1233   | 1.966s     | 1.594ms  | x5.02 |

    > **It is normal for the values to differ each time the benchmark is run, but in general the speed ratios will remain close.**

    > The benchmarks were run on **Windows 11 v24H2** with an **11th Gen Intel® Core™ i5-1155G7 × 8** processor and **32GB** of RAM.
    >
    > The version of Zig used is **0.14.0**.
    >
    > The source code of this benchmark is at **[bench/string/utils/codepoint.zig](https://github.com/Super-ZIG/io-bench/tree/main/src/bench/string/utils/codepoint.zig)**.
221 |
224 | 225 | 226 | 227 | 228 | 229 | 230 | 231 |
232 |
233 | 234 | 235 | 236 |
237 | 238 | --------------------------------------------------------------------------------