├── .gitignore
├── .github
│   ├── FUNDING.yml
│   └── workflows
│       └── main.yml
├── lib
│   ├── terminal
│   │   └── terminal.zig
│   ├── string
│   │   ├── utils
│   │   │   ├── memory
│   │   │   │   ├── memory.zig
│   │   │   │   └── memory.test.zig
│   │   │   ├── grapheme
│   │   │   │   ├── grapheme.zig
│   │   │   │   └── grapheme.test.zig
│   │   │   ├── codepoint
│   │   │   │   ├── codepoint.test.zig
│   │   │   │   └── codepoint.zig
│   │   │   ├── utf8
│   │   │   │   ├── utf8.test.zig
│   │   │   │   └── utf8.zig
│   │   │   └── ascii
│   │   │       ├── ascii.zig
│   │   │       └── ascii.test.zig
│   │   └── string.zig
│   └── io.zig
├── LICENSE
├── README.md
└── docs
    ├── index.md
    └── string
        └── utils
            ├── utf8.md
            ├── ascii.md
            └── codepoint.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .vscode
2 | .zig-cache
3 | zig-out
--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 |
3 | github: [Super-ZIG, maysara-elshewehy]
4 | ko_fi: codeguild
5 |
--------------------------------------------------------------------------------
/lib/terminal/terminal.zig:
--------------------------------------------------------------------------------
1 | // terminal.zig — Terminal handling module for I/O library.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/terminal
5 | // author : https://github.com/maysara-elshewehy
6 | //
7 | // Developed with ❤️ by Maysara.
8 |
9 |
10 |
11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
12 |
13 |
14 |
15 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
16 |
17 |
18 |
19 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
20 |
21 | test {
22 | }
23 |
24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/lib/string/utils/memory/memory.zig:
--------------------------------------------------------------------------------
1 | // Copyright (c) 2025 Maysara, All rights reserved.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/memory
5 | //
6 | // owner : https://github.com/maysara-elshewehy
7 | // email : maysara.elshewehy@gmail.com
8 | //
9 | // Made with ❤️ by Maysara
10 |
11 |
12 |
13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
14 |
15 |
16 |
17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
18 |
19 |
20 |
21 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗
22 |
23 |
24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/lib/string/utils/grapheme/grapheme.zig:
--------------------------------------------------------------------------------
1 | // Copyright (c) 2025 Maysara, All rights reserved.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/grapheme
5 | //
6 | // owner : https://github.com/maysara-elshewehy
7 | // email : maysara.elshewehy@gmail.com
8 | //
9 | // Made with ❤️ by Maysara
10 |
11 |
12 |
13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
14 |
15 |
16 |
17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
18 |
19 |
20 |
21 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗
22 |
23 |
24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/lib/string/utils/memory/memory.test.zig:
--------------------------------------------------------------------------------
1 | // Copyright (c) 2025 Maysara, All rights reserved.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/memory
5 | //
6 | // owner : https://github.com/maysara-elshewehy
7 | // email : maysara.elshewehy@gmail.com
8 | //
9 | // Made with ❤️ by Maysara
10 |
11 |
12 |
13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
14 |
15 |
16 |
17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
18 |
19 |
20 |
21 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
22 |
23 |
24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/lib/string/utils/grapheme/grapheme.test.zig:
--------------------------------------------------------------------------------
1 | // Copyright (c) 2025 Maysara, All rights reserved.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/grapheme
5 | //
6 | // owner : https://github.com/maysara-elshewehy
7 | // email : maysara.elshewehy@gmail.com
8 | //
9 | // Made with ❤️ by Maysara
10 |
11 |
12 |
13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
14 |
15 |
16 |
17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
18 |
19 |
20 |
21 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
22 |
23 |
24 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/.github/workflows/main.yml:
--------------------------------------------------------------------------------
1 | name: CI
2 |
3 | on:
4 |   push:
5 |     branches: [main]
6 |   pull_request:
7 |     branches: [main]
8 |   workflow_dispatch:
9 |
10 | permissions:
11 |   contents: read
12 |
13 | jobs:
14 |   build:
15 |     runs-on: ubuntu-latest
16 |     steps:
17 |       - uses: actions/checkout@v4
18 |
19 |       - name: Download Zig binary
20 |         run: wget https://ziglang.org/download/0.14.0/zig-linux-x86_64-0.14.0.tar.xz
21 |
22 |       - name: Extract Zig
23 |         run: tar -xf zig-linux-x86_64-0.14.0.tar.xz
24 |
25 |       - name: Export Zig binary path
26 |         run: echo "ZIG=$PWD/zig-linux-x86_64-0.14.0/zig" >> $GITHUB_ENV
27 |
28 |       - name: Verify version
29 |         run: $ZIG version
30 |
31 |       - name: Run tests
32 |         run: $ZIG build test
33 |
--------------------------------------------------------------------------------
/lib/io.zig:
--------------------------------------------------------------------------------
1 | // io.zig — Central entry point for I/O library.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io
5 | // author : https://github.com/maysara-elshewehy
6 | //
7 | // Developed with ❤️ by Maysara.
8 |
9 |
10 |
11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
12 |
13 | /// Provides utilities for string manipulation and operations.
14 | pub const string = @import("./string/string.zig");
15 |
16 | /// Provides utilities for terminal input/output operations.
17 | pub const terminal = @import("./terminal/terminal.zig");
18 |
19 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
20 |
21 |
22 |
23 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
24 |
25 | test {
26 | _ = @import("./string/string.zig");
27 | _ = @import("./terminal/terminal.zig");
28 | }
29 |
30 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 Maysara
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/lib/string/string.zig:
--------------------------------------------------------------------------------
1 | // string.zig — String handling module for I/O library.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string
5 | // author : https://github.com/maysara-elshewehy
6 | //
7 | // Developed with ❤️ by Maysara.
8 |
9 |
10 |
11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
12 |
13 | /// A utility module for efficient string manipulation, providing various tools
14 | /// for handling ASCII, UTF-8, codepoints, graphemes, and memory operations.
15 | pub const utils = .{
16 | .ascii = @import("./utils/ascii/ascii.zig"),
17 | .utf8 = @import("./utils/utf8/utf8.zig"),
18 | .codepoint = @import("./utils/codepoint/codepoint.zig"),
19 | .grapheme = @import("./utils/grapheme/grapheme.zig"),
20 | .memory = @import("./utils/memory/memory.zig"),
21 | };
22 |
23 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
24 |
25 |
26 |
27 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
28 |
29 | test {
30 | // utils
31 | _ = @import("./utils/ascii/ascii.test.zig");
32 | _ = @import("./utils/utf8/utf8.test.zig");
33 | _ = @import("./utils/codepoint/codepoint.test.zig");
34 | _ = @import("./utils/grapheme/grapheme.test.zig");
35 | _ = @import("./utils/memory/memory.test.zig");
36 | }
37 |
38 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
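The `utils` constant in `string.zig` groups several imported files under one anonymous struct so callers can write `utils.ascii`, `utils.utf8`, and so on. A minimal sketch of that pattern follows. It is self-contained, so `std.ascii` and `std.unicode` stand in for the project's own files; that substitution is an assumption made only so the snippet compiles on its own.

```zig
const std = @import("std");

// Hypothetical stand-in for `utils`: standard-library namespaces replace
// the project's `./utils/...` imports so the sketch has no local files.
const utils = .{
    .ascii = std.ascii,
    .unicode = std.unicode,
};

test "namespaces grouped in an anonymous struct" {
    // Each field holds a namespace, so its declarations are reached by
    // ordinary member access.
    try std.testing.expect(utils.ascii.isDigit('7'));
    try std.testing.expect(utils.unicode.utf8ValidateSlice("hello"));
}
```

Running `zig test` on a file with this content exercises both lookups. The top-level `test { _ = @import(...); }` blocks in the listing serve a different purpose: they pull the `.test.zig` files into the test build.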
/lib/string/utils/codepoint/codepoint.test.zig:
--------------------------------------------------------------------------------
1 | // codepoint.test.zig !
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/utils/codepoint
5 | // author : https://github.com/maysara-elshewehy
6 | //
7 | // Developed with ❤️ by Maysara.
8 |
9 |
10 |
11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
12 |
13 | const Codepoint = @import("./codepoint.zig");
14 | const testing = @import("std").testing;
15 |
16 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
17 |
18 |
19 |
20 | // ╔══════════════════════════════════════ INIT ══════════════════════════════════════╗
21 |
22 | const SAMPLE = "A©€😀";
23 | const RESULT = [_]struct { utf8: []const u8, cp: u21 } {
24 | .{ .utf8 = "A", .cp = 0x00041 },
25 | .{ .utf8 = "©", .cp = 0x000A9 },
26 | .{ .utf8 = "€", .cp = 0x020AC },
27 | .{ .utf8 = "😀", .cp = 0x1F600 },
28 | };
29 |
30 | const INVALID_CP = 0x110000;
31 | const INVALID_UTF8 = "\xFF";
32 |
33 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
34 |
35 |
36 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
37 |
38 | test "Codepoint" {
39 | for(0..RESULT.len) |i| {
40 | // init (using codepoint)
41 | if(Codepoint.init(RESULT[i].cp)) |cp| {
42 | try testing.expectEqual(RESULT[i].utf8.len, cp.len);
43 | try testing.expectEqual(RESULT[i].cp, cp.src);
44 | } else return error.CodepointInitError;
45 |
46 | // fromUtf8 (using UTF-8 slice)
47 | if(Codepoint.fromUtf8(RESULT[i].utf8)) |cp| {
48 | try testing.expectEqual(RESULT[i].utf8.len, cp.len);
49 | try testing.expectEqual(RESULT[i].cp, cp.src);
50 | } else return error.CodepointFromUtf8Error;
51 | }
52 |
53 | // must fail
54 | if(Codepoint.init(INVALID_CP)) |_| return error.CodepointInitMustFail;
55 | if(Codepoint.fromUtf8(INVALID_UTF8)) |_| return error.CodepointFromUtf8MustFail;
56 | }
57 |
58 | test "Utf8Iterator" {
59 | // must fail
60 | if(Codepoint.Utf8Iterator.init(INVALID_UTF8)) |_| return error.Utf8IteratorInitMustFail;
61 | var iterator = Codepoint.Utf8Iterator.init(SAMPLE) orelse return error.Utf8IteratorInitError;
62 |
63 | for(0..RESULT.len) |i| {
64 |
65 | // codepoint
66 | if(iterator.nextCodepoint()) |cp| {
67 | try testing.expectEqual(RESULT[i].utf8.len, cp.len);
68 | try testing.expectEqual(RESULT[i].cp, cp.src);
69 |
70 | // reset
71 | iterator.pos -= cp.len;
72 | } else return error.NextCodepointError;
73 |
74 | // utf8 slice
75 | if(iterator.nextSlice()) |slice| {
76 | try testing.expectEqual(RESULT[i].utf8.len, slice.len);
77 | try testing.expectEqualStrings(RESULT[i].utf8, slice);
78 |
79 | // reset
80 | iterator.pos -= slice.len;
81 | } else return error.NextSliceError;
82 |
83 | // length
84 | if(iterator.nextLength()) |len| {
85 | try testing.expectEqual(RESULT[i].utf8.len, len);
86 | } else return error.NextLengthError;
87 | }
88 |
89 | // position
90 | try testing.expectEqual(SAMPLE.len, iterator.pos);
91 | }
92 |
93 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
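The `Utf8Iterator` test above validates the input once at `init` and then walks it one codepoint at a time. The same flow can be sketched with the standard library's `std.unicode.Utf8View`, used here as a stand-in because the project's iterator is only partly visible in this listing. The sample string and expected values are taken from the test data above.

```zig
const std = @import("std");

test "walk a UTF-8 string one codepoint at a time" {
    const sample = "A©€😀";
    const expected = [_]u21{ 0x41, 0xA9, 0x20AC, 0x1F600 };

    // Utf8View.init validates the whole slice up front and fails on bad input.
    try std.testing.expectError(error.InvalidUtf8, std.unicode.Utf8View.init("\xFF"));
    const view = try std.unicode.Utf8View.init(sample);

    var it = view.iterator();
    var i: usize = 0;
    while (it.nextCodepoint()) |cp| : (i += 1) {
        try std.testing.expectEqual(expected[i], cp);
    }

    // Four codepoints were consumed and the cursor sits at the end of the
    // input, which is 1 + 2 + 3 + 4 = 10 bytes long.
    try std.testing.expectEqual(expected.len, i);
    try std.testing.expectEqual(sample.len, it.i);
    try std.testing.expectEqual(@as(usize, 10), sample.len);
}
```

Note one difference in error style: the standard-library view reports invalid input through an error union, while the project's `init` returns an optional, as the `orelse` in the test above shows.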
/lib/string/utils/utf8/utf8.test.zig:
--------------------------------------------------------------------------------
1 | // Copyright (c) 2025 Maysara, All rights reserved.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/utils/utf8
5 | //
6 | // owner : https://github.com/maysara-elshewehy
7 | // email : maysara.elshewehy@gmail.com
8 | //
9 | // Made with ❤️ by Maysara
10 |
11 |
12 |
13 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
14 |
15 | const std = @import("std");
16 | const testing = std.testing;
17 | const utf8 = @import("./utf8.zig");
18 |
19 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
20 |
21 |
22 |
23 | // ╔══════════════════════════════════════ INIT ══════════════════════════════════════╗
24 |
25 | const tests = [_]struct { slice: []const u8, codepoint: u21, } {
26 | .{ .slice = "A", .codepoint = 0x00041 },
27 | .{ .slice = "©", .codepoint = 0x000A9 },
28 | .{ .slice = "€", .codepoint = 0x020AC },
29 | .{ .slice = "😀", .codepoint = 0x1F600 },
30 | };
31 |
32 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
33 |
34 |
35 |
36 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
37 |
38 | // ┌────────────────────────── Conversion ────────────────────────┐
39 |
40 | test "utf8.encode" {
41 | var buf: [4]u8 = undefined;
42 |
43 | for (tests) |t| {
44 | const len = utf8.encode(t.codepoint, &buf);
45 | try testing.expectEqual(t.slice.len, len);
46 | for (0..len) |i|
47 | try testing.expectEqual(t.slice[i], buf[i]);
48 | }
49 | }
50 |
51 | test "utf8.decode" {
52 | for (tests) |t| {
53 | const cp = utf8.decode(t.slice);
54 | try testing.expectEqual(t.codepoint, cp);
55 | try testing.expectEqual(t.slice.len, utf8.getCodepointLength(cp));
56 | }
57 | }
58 |
59 | // └──────────────────────────────────────────────────────────────┘
60 |
61 |
62 | // ┌────────────────────────── Properties ────────────────────────┐
63 |
64 | test "utf8.getCodepointLength" {
65 | try testing.expectEqual(@as(u3, 1), utf8.getCodepointLength('A'));
66 | try testing.expectEqual(@as(u3, 2), utf8.getCodepointLength(0x00A9));
67 | try testing.expectEqual(@as(u3, 3), utf8.getCodepointLength(0x20AC));
68 | try testing.expectEqual(@as(u3, 4), utf8.getCodepointLength(0x1F600));
69 | try testing.expectEqual(null, utf8.getCodepointLengthOrNull(0x110000));
70 | }
71 |
72 | test "utf8.getSequenceLength" {
73 | try testing.expectEqual(@as(u3, 1), utf8.getSequenceLength('A'));
74 | try testing.expectEqual(@as(u3, 2), utf8.getSequenceLength(0xC2));
75 | try testing.expectEqual(@as(u3, 3), utf8.getSequenceLength(0xE2));
76 | try testing.expectEqual(@as(u3, 4), utf8.getSequenceLength(0xF0));
77 | try testing.expectEqual(null, utf8.getSequenceLengthOrNull(0xF8));
78 | }
79 |
80 | test "utf8.isValidSlice" {
81 | // Valid UTF-8 sequences
82 | try testing.expect(utf8.isValidSlice(""));
83 | try testing.expect(utf8.isValidSlice("Hello"));
84 | try testing.expect(utf8.isValidSlice("Hello 世界"));
85 | try testing.expect(utf8.isValidSlice("🌍🌎🌏"));
86 |
87 | // Invalid UTF-8 sequences
88 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xFF}));
89 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xC0, 0x80}));
90 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xE0, 0x80}));
91 | try testing.expect(!utf8.isValidSlice(&[_]u8{0xF0, 0x80, 0x80}));
92 | }
93 |
94 | test "utf8.isValidCodepoint" {
95 | // Valid codepoints
96 | try testing.expect(utf8.isValidCodepoint('A'));
97 | try testing.expect(utf8.isValidCodepoint('🌍'));
98 | try testing.expect(utf8.isValidCodepoint('世'));
99 |
100 | // Invalid codepoints (beyond U+10FFFF)
101 | try testing.expect(!utf8.isValidCodepoint(0x110000));
102 |
103 | // ...
104 | }
105 |
106 | // └──────────────────────────────────────────────────────────────┘
107 |
108 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
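The tests above check that encoding and decoding are inverse operations and that a sequence's length can be read from either the codepoint or the lead byte. Since `utf8.zig` itself is not shown in this listing, the sketch below illustrates the same round trip with `std.unicode` as a stand-in. The function names are the standard library's, not this project's, and Zig 0.14 (the version pinned in CI) is assumed.

```zig
const std = @import("std");
const unicode = std.unicode;

test "encode/decode round trip for U+20AC" {
    var buf: [4]u8 = undefined;

    // Encode the codepoint; the return value is the number of bytes written.
    const len = try unicode.utf8Encode(0x20AC, &buf);
    try std.testing.expectEqual(@as(u3, 3), len);
    try std.testing.expectEqualSlices(u8, "\xE2\x82\xAC", buf[0..len]);

    // The lead byte alone determines the sequence length.
    try std.testing.expectEqual(@as(u3, 3), try unicode.utf8ByteSequenceLength(buf[0]));

    // Decoding the bytes yields the original codepoint.
    try std.testing.expectEqual(@as(u21, 0x20AC), try unicode.utf8Decode(buf[0..len]));

    // Overlong encodings and out-of-range codepoints are rejected.
    try std.testing.expect(!unicode.utf8ValidateSlice(&[_]u8{ 0xC0, 0x80 }));
    try std.testing.expectError(error.CodepointTooLarge, unicode.utf8CodepointSequenceLength(0x110000));
}
```

The rejected inputs mirror the ones in the `isValidSlice` and `getCodepointLength` tests: `0xC0 0x80` is an overlong form of U+0000, and `0x110000` lies just past the last valid codepoint.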
/README.md:
--------------------------------------------------------------------------------
23 | When simplicity meets efficiency
24 |
31 | part of SuperZIG framework
32 |
48 | - **🍃 Zero dependencies**—meticulously crafted code.
49 |
50 | - **🚀 Blazing fast**—almost as fast as light!
51 |
52 | - **🌍 Universal compatibility**—Windows, Linux, and macOS.
53 |
54 | - **🛡️ Battle-tested**—ready for production.
55 |
67 | - ### API
68 |
69 | - #### String
70 | - ##### Types
71 | > - ##### view
72 | > - ##### fixed
73 | > - ##### managed
74 | > - ##### unmanaged
75 |
76 | - ##### Utils
77 | - ##### [ascii](https://super-zig.github.io/io/string/utils/ascii)
78 | - ##### [utf8](https://super-zig.github.io/io/string/utils/utf8)
79 | - ##### [codepoint](https://super-zig.github.io/io/string/utils/codepoint)
80 | > - ##### grapheme
81 | > - ##### memory
82 |
83 | - #### Terminal
84 | - ##### App
85 | > - ##### cli
86 | > - ##### prompts
87 |
88 | - ##### Utils
89 | > - ##### ansi
90 | > - ##### print
91 | > - ##### info
92 | > - ##### settings
93 | > - ##### events
94 |
106 | - ### Benchmark
107 |
108 | > [See benchmark results comparing `SuperZIG.io` with popular alternatives.](https://github.com/Super-ZIG/io-bench)
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
23 | When simplicity meets efficiency
24 |
31 | part of SuperZIG framework
32 |
48 | - **🍃 Zero dependencies**—meticulously crafted code.
49 |
50 | - **🚀 Blazing fast**—almost as fast as light!
51 |
52 | - **🌍 Universal compatibility**—Windows, Linux, and macOS.
53 |
54 | - **🛡️ Battle-tested**—ready for production.
55 |
67 | - ### API
68 |
69 | - #### String
70 | - ##### Types
71 | > - ##### view
72 | > - ##### fixed
73 | > - ##### managed
74 | > - ##### unmanaged
75 |
76 | - ##### Utils
77 | - ##### [ascii](https://super-zig.github.io/io/string/utils/ascii)
78 | - ##### [utf8](https://super-zig.github.io/io/string/utils/utf8)
79 | - ##### [codepoint](https://super-zig.github.io/io/string/utils/codepoint)
80 | > - ##### grapheme
81 | > - ##### memory
82 |
83 | - #### Terminal
84 | - ##### App
85 | > - ##### cli
86 | > - ##### prompts
87 |
88 | - ##### Utils
89 | > - ##### ansi
90 | > - ##### print
91 | > - ##### info
92 | > - ##### settings
93 | > - ##### events
94 |
106 | - ### Benchmark
107 |
108 | > [See benchmark results comparing `SuperZIG.io` with popular alternatives.](https://github.com/Super-ZIG/io-bench)
--------------------------------------------------------------------------------
/lib/string/utils/ascii/ascii.zig:
--------------------------------------------------------------------------------
1 | // ascii.zig — ASCII handling module for I/O library.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/utils/ascii
5 | // author : https://github.com/maysara-elshewehy
6 | //
7 | // Developed with ❤️ by Maysara.
8 |
9 |
10 |
11 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗
12 |
13 | // ┌────────────────────────── Conversion ────────────────────────┐
14 |
15 | /// Converts a character to uppercase.
16 | /// If the character is not a lowercase letter, it is returned unchanged.
17 | pub fn toUpper(c: u8) u8 {
18 | // credit: std.ascii.toUpper
19 | const mask = @as(u8, @intFromBool(@call(.always_inline, isLower, .{c}))) << 5;
20 | return c ^ mask;
21 | }
22 |
23 | /// Converts a character to lowercase.
24 | /// If the character is not an uppercase letter, it is returned unchanged.
25 | pub fn toLower(c: u8) u8 {
26 | // credit: std.ascii.toLower
27 | const mask = @as(u8, @intFromBool(@call(.always_inline, isUpper, .{c}))) << 5;
28 | return c | mask;
29 | }
30 |
31 | // └──────────────────────────────────────────────────────────────┘
32 |
33 |
34 | // ┌────────────────────────── Properties ────────────────────────┐
35 |
36 | /// Returns true if the character is an uppercase letter (`A-Z`).
37 | pub fn isUpper(c: u8) bool {
38 | return switch (c) {
39 | 'A'...'Z' => true,
40 | else => false,
41 | };
42 | }
43 |
44 | /// Returns true if the character is a lowercase letter (`a-z`).
45 | pub fn isLower(c: u8) bool {
46 | return switch (c) {
47 | 'a'...'z' => true,
48 | else => false,
49 | };
50 | }
51 |
52 | /// Returns true if the character is an alphabetic letter (`A-Z`, `a-z`).
53 | pub fn isAlphabetic(c: u8) bool {
54 | return switch (c) {
55 | 'A'...'Z', 'a'...'z' => true,
56 | else => false,
57 | };
58 | }
59 |
60 | /// Returns true if the character is a numeric digit (`0-9`).
61 | pub fn isDigit(c: u8) bool {
62 | return switch (c) {
63 | '0'...'9' => true,
64 | else => false,
65 | };
66 | }
67 |
68 | /// Returns true if the character is alphanumeric (`A-Z`, `a-z`, `0-9`).
69 | pub fn isAlphanumeric(c: u8) bool {
70 | return switch (c) {
71 | 'A'...'Z', 'a'...'z', '0'...'9' => true,
72 | else => false,
73 | };
74 | }
75 |
76 | /// Returns true if the character is a hexadecimal digit (`0-9`, `A-F`, `a-f`).
77 | pub fn isHex(c: u8) bool {
78 | return switch (c) {
79 | '0'...'9', 'A'...'F', 'a'...'f' => true,
80 | else => false,
81 | };
82 | }
83 |
84 | /// Returns true if the character is an octal digit (`0-7`).
85 | pub fn isOctal(c: u8) bool {
86 | return switch (c) {
87 | '0'...'7' => true,
88 | else => false,
89 | };
90 | }
91 |
92 | /// Returns true if the character is a binary digit (`0`, `1`).
93 | pub fn isBinary(c: u8) bool {
94 | return switch (c) {
95 | '0', '1' => true,
96 | else => false,
97 | };
98 | }
99 |
100 | /// Returns true if the character is a punctuation symbol
101 | /// (any printable ASCII character that is not a letter, digit, or space).
102 | pub fn isPunctuation(c: u8) bool {
103 | return switch (c) {
104 | '!'...'/', ':'...'@', '['...'`', '{'...'~' => true,
105 | else => false,
106 | };
107 | }
108 |
109 | /// Returns true if the character is a whitespace character
110 | /// (space, tab, newline, carriage return).
111 | pub fn isWhitespace(c: u8) bool {
112 | return switch (c) {
113 | ' ', '\t', '\n', '\r' => true,
114 | else => false,
115 | };
116 | }
117 |
118 | /// Returns true if the character is printable
119 | /// (ASCII 0x20-0x7E, i.e., space through tilde).
120 | pub fn isPrintable(c: u8) bool {
121 | return switch (c) {
122 | ' '...'~' => true,
123 | else => false,
124 | };
125 | }
126 |
127 | /// Returns true if the character is a control character
128 | /// (ASCII 0x00-0x1F or 0x7F).
129 | pub fn isControl(c: u8) bool {
130 | return (c <= 0x1F) or (c == 0x7F);
131 | }
132 |
133 | // └──────────────────────────────────────────────────────────────┘
134 |
135 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
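`toUpper` and `toLower` above rely on the fact that an ASCII letter and its other-case counterpart differ only in bit 5 (0x20). The sketch below isolates that trick in a self-contained file. It copies the range check and mask logic from the listing and leaves out the `@call(.always_inline, ...)` wrapper, which affects inlining only.

```zig
const std = @import("std");

/// Returns true for `a-z` (same range check as `isLower` in the listing).
fn isLower(c: u8) bool {
    return switch (c) {
        'a'...'z' => true,
        else => false,
    };
}

/// Flips bit 5 (0x20) only when `c` is a lowercase letter.
fn toUpper(c: u8) u8 {
    // The bool becomes 0 or 1, then shifts to 0x00 or 0x20.
    const mask = @as(u8, @intFromBool(isLower(c))) << 5;
    return c ^ mask;
}

test "bit 5 separates ASCII letter cases" {
    // 'a' is 0x61 and 'A' is 0x41; they differ only in bit 5.
    try std.testing.expectEqual(@as(u8, 0x20), 'a' ^ 'A');
    try std.testing.expectEqual(@as(u8, 'Q'), toUpper('q'));

    // Non-letters produce a zero mask and pass through unchanged.
    try std.testing.expectEqual(@as(u8, '~'), toUpper('~'));
    try std.testing.expectEqual(@as(u8, '7'), toUpper('7'));
}
```

Building the mask from the range check keeps the conversion branch-free: the XOR always runs, and characters outside `a-z` are XORed with zero.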
/lib/string/utils/ascii/ascii.test.zig:
--------------------------------------------------------------------------------
1 | // ascii.test.zig !
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/utils/ascii
5 | // author : https://github.com/maysara-elshewehy
6 | //
7 | // Developed with ❤️ by Maysara.
8 |
9 |
10 |
11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
12 |
13 | const std = @import("std");
14 | const testing = std.testing;
15 | const ascii = @import("./ascii.zig");
16 |
17 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
18 |
19 |
20 |
21 | // ╔══════════════════════════════════════ INIT ══════════════════════════════════════╗
22 |
23 | const uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
24 | const lowercase = "abcdefghijklmnopqrstuvwxyz";
25 | const digits = "0123456789";
26 | const letters = uppercase ++ lowercase;
27 |
28 | const punctuation = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
29 | const printable = letters ++ digits ++ punctuation ++ " ";
30 |
31 | const whitespace = " \t\n\r";
32 | const control = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f";
33 | const non_ascii = "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f";
34 |
35 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
36 |
37 |
38 |
39 | // ╔══════════════════════════════════════ TEST ══════════════════════════════════════╗
40 |
41 | // ┌────────────────────────── Conversion ────────────────────────┐
42 |
43 | test "toUpper" {
44 | for(0..lowercase.len) |i| {
45 | try testing.expectEqual(uppercase[i], ascii.toUpper(lowercase[i]));
46 | }
47 |
48 | // unchanged
49 | for(uppercase ++ digits ++ punctuation) |c| {
50 | try testing.expectEqual(c, ascii.toUpper(c));
51 | }
52 | }
53 |
54 | test "toLower" {
55 | for(0..uppercase.len) |i| {
56 | try testing.expectEqual(lowercase[i], ascii.toLower(uppercase[i]));
57 | }
58 |
59 | // unchanged
60 | for(lowercase ++ digits ++ punctuation) |c| {
61 | try testing.expectEqual(c, ascii.toLower(c));
62 | }
63 | }
64 |
65 | // └──────────────────────────────────────────────────────────────┘
66 |
67 |
68 | // ┌────────────────────────── Properties ────────────────────────┐
69 |
70 | test "isUpper" {
71 | for(uppercase) |c| {
72 | try testing.expect(ascii.isUpper(c));
73 | }
74 |
75 | // false
76 | for(lowercase) |c| {
77 | try testing.expect(!ascii.isUpper(c));
78 | }
79 | }
80 |
81 | test "isLower" {
82 | for(lowercase) |c| {
83 | try testing.expect(ascii.isLower(c));
84 | }
85 |
86 | // false
87 | for(uppercase) |c| {
88 | try testing.expect(!ascii.isLower(c));
89 | }
90 | }
91 |
92 | test "isAlphabetic" {
93 | for(letters) |c| {
94 | try testing.expect(ascii.isAlphabetic(c));
95 | }
96 |
97 | // false
98 | for(digits) |c| {
99 | try testing.expect(!ascii.isAlphabetic(c));
100 | }
101 | }
102 |
103 | test "isDigit" {
104 | for(digits) |c| {
105 | try testing.expect(ascii.isDigit(c));
106 | }
107 |
108 | // false
109 | for(letters) |c| {
110 | try testing.expect(!ascii.isDigit(c));
111 | }
112 | }
113 |
114 | test "isAlphanumeric" {
115 | for(letters ++ digits) |c| {
116 | try testing.expect(ascii.isAlphanumeric(c));
117 | }
118 |
119 | // false
120 | for(punctuation) |c| {
121 | try testing.expect(!ascii.isAlphanumeric(c));
122 | }
123 | }
124 |
125 | test "isHex" {
126 | for("0123456789ABCDEFabcdef") |c| {
127 | try testing.expect(ascii.isHex(c));
128 | }
129 |
130 | for(non_ascii) |c| {
131 | try testing.expect(!ascii.isHex(c));
132 | }
133 | }
134 |
135 | test "isOctal" {
136 | for("01234567") |c| {
137 | try testing.expect(ascii.isOctal(c));
138 | }
139 |
140 | for(non_ascii) |c| {
141 | try testing.expect(!ascii.isOctal(c));
142 | }
143 | }
144 |
145 | test "isBinary" {
146 | for("01") |c| {
147 | try testing.expect(ascii.isBinary(c));
148 | }
149 |
150 | for(non_ascii) |c| {
151 | try testing.expect(!ascii.isBinary(c));
152 | }
153 | }
154 |
155 | test "isPunctuation" {
156 | for(punctuation) |c| {
157 | try testing.expect(ascii.isPunctuation(c));
158 | }
159 |
160 | // false
161 | for(letters ++ digits) |c| {
162 | try testing.expect(!ascii.isPunctuation(c));
163 | }
164 | }
165 |
166 | test "isWhitespace" {
167 | for(whitespace) |c| {
168 | try testing.expect(ascii.isWhitespace(c));
169 | }
170 |
171 | // false
172 | for(non_ascii) |c| {
173 | try testing.expect(!ascii.isWhitespace(c));
174 | }
175 | }
176 |
177 | test "isPrintable" {
178 | for(printable) |c| {
179 | try testing.expect(ascii.isPrintable(c));
180 | }
181 |
182 | // false
183 | for(non_ascii) |c| {
184 | try testing.expect(!ascii.isPrintable(c));
185 | }
186 | }
187 |
188 | test "isControl" {
189 | for(control) |c| {
190 | try testing.expect(ascii.isControl(c));
191 | }
192 |
193 | // false
194 | for(non_ascii) |c| {
195 | try testing.expect(!ascii.isControl(c));
196 | }
197 | }
198 |
199 | // └──────────────────────────────────────────────────────────────┘
200 |
201 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/lib/string/utils/codepoint/codepoint.zig:
--------------------------------------------------------------------------------
1 | // codepoint.zig — Codepoint handling module for I/O library.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/utils/codepoint
5 | // author : https://github.com/maysara-elshewehy
6 | //
7 | // Developed with ❤️ by Maysara.
8 |
9 |
10 |
11 | // ╔══════════════════════════════════════ PACK ══════════════════════════════════════╗
12 |
13 | const utf8 = @import("../utf8/utf8.zig");
14 | const Codepoint = @This();
15 |
16 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
17 |
18 |
19 |
20 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗
21 |
22 | // ┌─────────────────────────── Fields ───────────────────────────┐
23 |
24 | /// Numeric value of the Unicode codepoint (U+0000 to U+10FFFF).
25 | src: u21 = 0,
26 |
27 | /// Length of this codepoint in UTF-8 (1-4 bytes).
28 | len: u3 = 0,
29 |
30 | // └──────────────────────────────────────────────────────────────┘
31 |
32 |
33 | // ┌────────────────────────── Methods ───────────────────────────┐
34 |
35 | /// Initializes a Codepoint from a Unicode scalar value if valid.
36 | /// Returns null if the codepoint is invalid according to UTF-8.
37 | pub fn init(cp: u21) ?Codepoint {
38 | return if(@call(.always_inline, utf8.isValidCodepoint, .{cp}))
39 | @call(.always_inline, unsafe_init, .{cp}) else null;
40 | }
41 |
42 | /// Initializes a Codepoint from a Unicode scalar value.
43 | /// Assumes the input is a valid codepoint.
44 | pub fn unsafe_init(cp: u21) Codepoint {
45 | return .{
46 | .src = cp,
47 | .len = @call(.always_inline, utf8.getCodepointLength, .{cp})
48 | };
49 | }
50 |
51 | /// Initializes a Codepoint from a UTF-8 encoded byte slice if valid.
52 | /// Returns null if the slice is empty or contains an invalid UTF-8 sequence.
53 | pub fn fromUtf8(slice: []const u8) ?Codepoint {
54 | return if(slice.len == 0 or !@call(.always_inline, utf8.isValidSlice, .{slice})) null
55 | else @call(.always_inline, unsafe_fromUtf8, .{slice});
56 | }
57 |
58 | /// Initializes a Codepoint from a UTF-8 encoded byte slice.
59 | /// Assumes the input is a valid UTF-8 slice.
60 | pub fn unsafe_fromUtf8(slice: []const u8) Codepoint {
61 | return if(@call(.always_inline, utf8.getSequenceLengthOrNull, .{slice[0]})) |len|
62 | .{
63 | .src = utf8.decode(slice[0..len]),
64 | .len = len
65 | } else .{};
66 | }
67 |
68 | // └──────────────────────────────────────────────────────────────┘
69 |
70 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
71 |
72 |
73 |
74 | // ╔══════════════════════════════════════ ITER ══════════════════════════════════════╗
75 |
76 | pub const Utf8Iterator = struct {
77 |
78 | // ┌─────────────────────────── Fields ───────────────────────────┐
79 |
80 | /// The UTF-8 encoded string that the iterator will traverse.
81 | src: []const u8,
82 |
83 | /// The current byte position in the string.
84 | pos: usize = 0,
85 |
86 | // └──────────────────────────────────────────────────────────────┘
87 |
88 |
89 | // ┌────────────────────────── Methods ───────────────────────────┐
90 |
91 | /// Initializes a new Utf8Iterator from the given UTF-8 slice if valid.
92 | /// Returns null if the slice is empty or contains invalid UTF-8.
93 | pub fn init(slice: []const u8) ?Utf8Iterator {
94 | return if(slice.len == 0 or !utf8.isValidSlice(slice)) null
95 | else @call(.always_inline, Utf8Iterator.unsafe_init, .{slice});
96 | }
97 |
98 | /// Initializes a new Utf8Iterator from the given UTF-8 slice.
99 | /// Assumes the input is a valid UTF-8 slice.
100 | pub fn unsafe_init(slice: []const u8) Utf8Iterator {
101 | return .{ .src = slice };
102 | }
103 |
104 | /// Returns the next Codepoint and increments the position.
105 | pub fn nextCodepoint(self: *Utf8Iterator) ?Codepoint {
106 | if(@call(.always_inline, Utf8Iterator.peekCodepoint, .{self.*})) |cp| {
107 | self.pos += cp.len;
108 | return cp;
109 | } else return null;
110 | }
111 |
112 | /// Returns the next UTF-8 slice and increments the position.
113 | pub fn nextSlice(self: *Utf8Iterator) ?[]const u8 {
114 | if(@call(.always_inline, Utf8Iterator.peekSlice, .{self.*})) |slice| {
115 | self.pos += slice.len;
116 | return slice;
117 | } else return null;
118 | }
119 |
120 | /// Returns the next codepoint length and increments the position.
121 | pub fn nextLength(self: *Utf8Iterator) ?u3 {
122 | if(@call(.always_inline, Utf8Iterator.peekLength, .{self.*})) |len| {
123 | self.pos += len;
124 | return len;
125 | } else return null;
126 | }
127 |
128 | /// Returns the next Codepoint without incrementing the position.
129 | pub fn peekCodepoint(self: Utf8Iterator) ?Codepoint {
130 | if(@call(.always_inline, Utf8Iterator.peekLength, .{self})) |len| {
131 | return Codepoint {
132 | .src = utf8.decode(self.src[self.pos..self.pos+len]),
133 | .len = len
134 | };
135 | } else return null;
136 | }
137 |
138 | /// Returns the next UTF-8 slice without incrementing the position.
139 | pub fn peekSlice(self: Utf8Iterator) ?[]const u8 {
140 | if(@call(.always_inline, Utf8Iterator.peekLength, .{self})) |len| {
141 | return self.src[self.pos..self.pos+len];
142 | } else return null;
143 | }
144 |
145 | /// Returns the next codepoint length without incrementing the position.
146 | pub fn peekLength(self: Utf8Iterator) ?u3 {
147 | return if(self.pos == self.src.len) null
148 | else @call(.always_inline, utf8.getSequenceLengthOrNull, .{self.src[self.pos]});
149 | }
150 |
151 | // └──────────────────────────────────────────────────────────────┘
152 | };
153 |
154 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/lib/string/utils/utf8/utf8.zig:
--------------------------------------------------------------------------------
1 | // Copyright (c) 2025 Maysara, All rights reserved.
2 | //
3 | // repo : https://github.com/Super-ZIG/io
4 | // docs : https://super-zig.github.io/io/string/utils/utf8
5 | //
6 | // owner : https://github.com/maysara-elshewehy
7 | // email : maysara.elshewehy@gmail.com
8 | //
9 | // Made with ❤️ by Maysara
10 |
11 |
12 |
13 | // ╔══════════════════════════════════════ CORE ══════════════════════════════════════╗
14 |
15 | // ┌────────────────────────── Conversion ────────────────────────┐
16 |
17 | /// Encodes a single Unicode `codepoint` to a UTF-8 sequence.
18 | /// Returns the number of bytes written.
19 | ///
20 | /// Assumes the input `codepoint` is valid and the output slice is large enough.
21 | pub fn encode(cp: u21, out: []u8) u3 {
22 | const length = @call(.always_inline, getCodepointLength, .{cp});
23 |
24 | switch (length) {
25 | 1 => {
26 | out[0] = @truncate(cp);
27 | },
28 |
29 | 2 => {
30 | out[0] = @truncate(0xC0 | (cp >> 6 ));
31 | out[1] = @truncate(0x80 | (cp & 0x3F));
32 | },
33 |
34 | 3 => {
35 | out[0] = @truncate(0xE0 | (cp >> 12 ));
36 | out[1] = @truncate(0x80 | ((cp >> 6) & 0x3F));
37 | out[2] = @truncate(0x80 | (cp & 0x3F));
38 | },
39 |
40 | else => {
41 | out[0] = @truncate(0xF0 | (cp >> 18 ));
42 | out[1] = @truncate(0x80 | ((cp >> 12) & 0x3F));
43 | out[2] = @truncate(0x80 | ((cp >> 6) & 0x3F));
44 | out[3] = @truncate(0x80 | (cp & 0x3F));
45 | }
46 | }
47 |
48 | return length;
49 | }
50 |
51 | /// Decodes a UTF-8 sequence to a Unicode `codepoint`.
52 | /// Returns the decoded codepoint.
53 | ///
54 | /// Assumes the input slice is a valid UTF-8 sequence of length 1-4.
55 | pub fn decode(slice: []const u8) u21 {
56 | return switch (slice.len) {
57 | 1 => @as(u21,
58 | slice[0]),
59 |
60 | 2 => (@as(u21,
61 | (slice[0] & 0x1F)) << 6) | (slice[1] & 0x3F),
62 |
63 | 3 => (((@as(u21,
64 | (slice[0] & 0x0F)) << 6) | (slice[1] & 0x3F)) << 6) | (slice[2] & 0x3F),
65 |
66 | else => (((((@as(u21,
67 | (slice[0] & 0x07)) << 6) | (slice[1] & 0x3F)) << 6) | (slice[2] & 0x3F)) << 6) | (slice[3] & 0x3F)
68 | };
69 | }
70 |
71 | // └──────────────────────────────────────────────────────────────┘
72 |
73 |
74 | // ┌────────────────────────── Properties ────────────────────────┐
75 |
76 |
77 | /// Returns the number of bytes (1-4) needed to encode a `codepoint` in UTF-8 format.
78 | pub fn getCodepointLength(cp: u21) u3 {
79 | return switch (cp) {
80 | 0x00000...0x00007F => @as(u3, 1),
81 | 0x00080...0x0007FF => @as(u3, 2),
82 | 0x00800...0x00FFFF => @as(u3, 3),
83 | else => @as(u3, 4),
84 | };
85 | }
86 |
87 | /// Returns the number of bytes (1-4) needed to encode a `codepoint` in UTF-8 format,
88 | /// or null if the codepoint is invalid.
89 | pub fn getCodepointLengthOrNull(cp: u21) ?u3 {
90 | return if (cp > 0x10FFFF) null else @call(.always_inline, getCodepointLength, .{cp});
91 | }
92 |
93 | /// Returns the expected number of bytes (1-4) in a UTF-8 sequence based on the first byte.
94 | pub fn getSequenceLength(first_byte: u8) u3 {
95 | return switch (first_byte) {
96 | 0x00...0x7F => @as(u3, 1),
97 | 0xC0...0xDF => @as(u3, 2),
98 | 0xE0...0xEF => @as(u3, 3),
99 | else => @as(u3, 4),
100 | };
101 | }
102 |
103 | /// Returns the expected number of bytes (1-4) in a UTF-8 sequence based on the first byte,
104 | /// or null if the first byte is not a valid starter.
105 | pub fn getSequenceLengthOrNull(first_byte: u8) ?u3 {
106 | return if ((first_byte >= 0x80 and first_byte <= 0xBF) or first_byte > 0xF7) null
107 | else @call(.always_inline, getSequenceLength, .{first_byte});
108 | }
109 |
110 | /// Returns true if the provided slice contains valid UTF-8 data.
111 | pub fn isValidSlice(utf8: []const u8) bool {
112 | // Inspired by: std.unicode.utf8ValidateSliceImpl
113 | // Todo: optimize or remove it (This was for learning purposes).
114 |
115 | // default lowest and highest continuation byte
116 | const lo_cb = 0b10000000;
117 | const hi_cb = 0b10111111;
118 |
119 | var remaining = utf8;
120 | vectorized: {
121 | const chunk_len = @import("std").simd.suggestVectorLength(u8) orelse break :vectorized;
122 | const Chunk = @Vector(chunk_len, u8);
123 |
124 | // Fast path. Check for and skip ASCII characters at the start of the input.
125 | while (remaining.len >= chunk_len) {
126 | const chunk: Chunk = remaining[0..chunk_len].*;
127 | const mask: Chunk = @splat(0x80);
128 | if (@reduce(.Or, chunk & mask == mask)) {
129 | // found a non ASCII byte
130 | break;
131 | }
132 | remaining = remaining[chunk_len..];
133 | }
134 | }
135 |
136 | // The first nibble is used to identify the continuation byte range to
137 | // accept. The second nibble is the size.
138 | const xx = 0xF1; // invalid: size 1
139 | const as = 0xF0; // ASCII: size 1
140 | const s1 = 0x02; // accept 0, size 2
141 | const s2 = 0x13; // accept 1, size 3
142 | const s3 = 0x03; // accept 0, size 3
143 | const s4 = 0x23; // accept 2, size 3
144 | const s5 = 0x34; // accept 3, size 4
145 | const s6 = 0x04; // accept 0, size 4
146 | const s7 = 0x44; // accept 4, size 4
147 |
148 | // Information about the first byte in a UTF-8 sequence.
149 | const first = comptime ([_]u8{as} ** 128) ++ ([_]u8{xx} ** 64) ++ [_]u8{
150 | xx, xx, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1,
151 | s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1, s1,
152 | s2, s3, s3, s3, s3, s3, s3, s3, s3, s3, s3, s3, s3, s4, s3, s3,
153 | s5, s6, s6, s6, s7, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx, xx,
154 | };
155 |
156 | const n = remaining.len;
157 | var i: usize = 0;
158 | while (i < n) {
159 | const first_byte = remaining[i];
160 | if (first_byte < 0x80) {
161 | i += 1;
162 | continue;
163 | }
164 |
165 | const info = first[first_byte];
166 | if (info == xx) {
167 | return false; // Illegal starter byte.
168 | }
169 |
170 | const size = info & 7;
171 | if (i + size > n) {
172 | return false; // Short or invalid.
173 | }
174 |
175 | // Figure out the acceptable low and high continuation bytes, starting
176 | // with our defaults.
177 | var accept_lo: u8 = lo_cb;
178 | var accept_hi: u8 = hi_cb;
179 |
180 | switch (info >> 4) {
181 | 0 => {},
182 | 1 => accept_lo = 0xA0,
183 | 2 => accept_hi = 0x9F,
184 | 3 => accept_lo = 0x90,
185 | 4 => accept_hi = 0x8F,
186 | else => unreachable,
187 | }
188 |
189 | const c1 = remaining[i + 1];
190 | if (c1 < accept_lo or accept_hi < c1) {
191 | return false;
192 | }
193 |
194 | switch (size) {
195 | 2 => i += 2,
196 | 3 => {
197 | const c2 = remaining[i + 2];
198 | if (c2 < lo_cb or hi_cb < c2) return false;
199 | i += 3;
200 | },
201 | 4 => {
202 | const c2 = remaining[i + 2];
203 | if (c2 < lo_cb or hi_cb < c2) return false;
204 | const c3 = remaining[i + 3];
205 | if (c3 < lo_cb or hi_cb < c3) return false;
206 | i += 4;
207 | },
208 | else => unreachable,
209 | }
210 | }
211 |
212 | return true;
213 | }
214 |
215 | /// Returns true if the provided codepoint is valid for UTF-8 encoding.
216 | pub fn isValidCodepoint(cp: u21) bool {
217 | return cp <= 0x10FFFF;
218 | }
219 |
220 | // └──────────────────────────────────────────────────────────────┘
221 |
222 | // ╚══════════════════════════════════════════════════════════════════════════════════╝
--------------------------------------------------------------------------------
/docs/string/utils/utf8.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 | UTF-8
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | When simplicity meets efficiency
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 | part of
47 | SuperZig::io library
48 |
49 |
50 |
51 |
52 |
53 |
54 |

55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 | - **🍃 Zero dependencies**—meticulously crafted code.
65 |
66 | - **🚀 Blazing fast**—almost as fast as light!
67 |
68 | - **🌍 Universal compatibility**—Windows, Linux, and macOS.
69 |
70 | - **🛡️ Battle-tested**—ready for production.
71 |
72 |
73 |
74 |

75 |
76 |
77 |
78 |
79 |
80 |
81 |
82 |
83 |
84 | - ### Quick Start 🔥
85 |
86 | > If you have not already added the library to your project, please review the [installation guide](https://github.com/Super-ZIG/io/wiki/installation) for more information.
87 |
88 | ```zig
89 | const utf8 = @import("io").string.utils.utf8;
90 | ```
91 |
92 | > Convert slice to codepoint
93 |
94 | ```zig
95 | _ = utf8.decode("🌟"); // 👉 0x1F31F
96 | ```
97 |
98 | > Convert codepoint to slice
99 |
100 | ```zig
101 | var buf: [4]u8 = undefined;
102 | _ = utf8.encode(0x1F31F, &buf); // 👉 4 (buf now holds "🌟")
103 | ```
104 |
105 | > Get codepoint length
106 |
107 | ```zig
108 | _ = utf8.getCodepointLength(0x1F31F); // 👉 4
109 | ```
110 |
111 | > Get UTF-8 sequence length
112 |
113 | ```zig
114 | _ = utf8.getSequenceLength("🌟"[0]); // 👉 4
115 | ```
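
To make the byte layout behind the encode example concrete, here is a self-contained sketch that mirrors the 4-byte branch of `encode` in `utf8.zig`. The helper name `encode4` is hypothetical and not part of the library; it assumes the codepoint lies in the range that needs four bytes (U+10000 to U+10FFFF).

```zig
const std = @import("std");

/// Hypothetical helper (not part of the library): packs a codepoint that
/// needs four bytes into its UTF-8 form. The caller must ensure the range.
fn encode4(cp: u21) [4]u8 {
    var out: [4]u8 = undefined;
    out[0] = @truncate(0xF0 | (cp >> 18)); // 11110xxx: top 3 bits
    out[1] = @truncate(0x80 | ((cp >> 12) & 0x3F)); // 10xxxxxx: next 6 bits
    out[2] = @truncate(0x80 | ((cp >> 6) & 0x3F)); // 10xxxxxx: next 6 bits
    out[3] = @truncate(0x80 | (cp & 0x3F)); // 10xxxxxx: low 6 bits
    return out;
}

test "U+1F31F encodes to F0 9F 8C 9F" {
    const bytes = encode4(0x1F31F);
    try std.testing.expectEqualSlices(u8, "\xF0\x9F\x8C\x9F", &bytes);
    try std.testing.expectEqualStrings("🌟", &bytes);
}
```

Running this with `zig test` checks the bit arithmetic against the literal emoji bytes.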
116 |
117 |
118 |
119 |

120 |
121 |
122 |
123 |
124 |
125 |
126 |
127 |
128 | - ### API
129 |
130 | - #### Encoding / Decoding
131 |
132 | | Function | Return | Description |
133 | | -------- | ------ | --------------------------------------------------------------------------------------------- |
134 | | encode | `u3` | Encodes a single Unicode `codepoint` to a `UTF-8 sequence` and returns the number of bytes written. |
135 | | decode | `u21` | Decodes a `UTF-8 sequence` to a Unicode `codepoint` and returns it. |
136 |
137 | - #### Properties
138 |
139 | | Function | Return | Description |
140 | | ------------------------ | ------ | -------------------------------------------------------------------------------------------- |
141 | | getCodepointLength | `u3` | Returns the number of bytes (`1-4`) needed to encode a `codepoint` in UTF-8 format. |
142 | | getCodepointLengthOrNull | `?u3` | Returns the number of bytes (`1-4`) needed to encode a `codepoint` in UTF-8 format, or `null` if the codepoint is invalid. |
143 | | getSequenceLength | `u3` | Returns the number of bytes (`1-4`) in a `UTF-8 sequence` based on the first byte. |
144 | | getSequenceLengthOrNull | `?u3` | Returns the number of bytes (`1-4`) in a `UTF-8 sequence` based on the first byte, or `null` if that byte is not a valid starter. |
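
As an illustration of how a sequence length can be read off the first byte, the sketch below classifies starter bytes under a strict reading of the UTF-8 rules. The function name `starterLength` is hypothetical, and the library's own functions may differ in exactly which bytes they reject.

```zig
const std = @import("std");

/// Hypothetical strict classifier (not the library's code): returns the
/// sequence length announced by `first_byte`, or null when the byte can
/// never start a well-formed UTF-8 sequence.
fn starterLength(first_byte: u8) ?u3 {
    return switch (first_byte) {
        0x00...0x7F => 1, // ASCII
        0xC2...0xDF => 2, // 0xC0 and 0xC1 would only produce overlong forms
        0xE0...0xEF => 3,
        0xF0...0xF4 => 4, // 0xF5 and above would exceed U+10FFFF
        else => null, // continuation bytes (0x80-0xBF) and invalid starters
    };
}

test "starter bytes announce the sequence length" {
    try std.testing.expect(starterLength('a').? == 1);
    try std.testing.expect(starterLength("🌟"[0]).? == 4);
    try std.testing.expect(starterLength(0x80) == null);
    try std.testing.expect(starterLength(0xFF) == null);
}
```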
145 |
146 | - #### Validation
147 |
148 | | Function | Return | Description |
149 | | ---------------- | ------ | ---------------------------------------------------------------------- |
150 | | isValidSlice | `bool` | Returns true if the provided slice contains only valid `UTF-8 sequences`. |
151 | | isValidCodepoint | `bool` | Returns true if the provided code point is valid for `UTF-8 encoding`. |
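
The kinds of input a strict slice validator rejects can be shown with a short test. Because `isValidSlice` is described in the source as inspired by the standard library's validator, this sketch uses `std.unicode.utf8ValidateSlice` as a stand-in. The wrapper name `isValid` is hypothetical, and it is an assumption that the library's function gives the same results on these inputs.

```zig
const std = @import("std");

/// Hypothetical wrapper around the standard library validator, used here
/// as a stand-in for the library's `isValidSlice`.
fn isValid(bytes: []const u8) bool {
    return std.unicode.utf8ValidateSlice(bytes);
}

test "well-formed input passes, malformed input fails" {
    try std.testing.expect(isValid("plain ASCII"));
    try std.testing.expect(isValid("\xF0\x9F\x8C\x9F")); // U+1F31F
    try std.testing.expect(!isValid("\xC0\x80")); // overlong encoding of U+0000
    try std.testing.expect(!isValid("\xED\xA0\x80")); // surrogate U+D800
    try std.testing.expect(!isValid("\xF0\x9F\x8C")); // truncated 4-byte sequence
    try std.testing.expect(!isValid("\x80")); // lone continuation byte
}
```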
152 |
153 |
154 |
155 |

156 |
157 |
158 |
159 |
160 |
161 |
162 |
163 |
164 | - ### Benchmark
165 |
166 | > A quick summary of sample benchmark results comparing the _**`SuperZIG`.`io`.`string`.`utils`.`utf8`**_ implementation with popular alternatives.
167 |
168 | - #### vs `std.unicode`
169 |
170 | > _**In summary**, `io` is roughly **3 to 5 times** faster than `std` in the runs below, with the largest gains on the bigger inputs. ✨_
171 |
172 | - #### ReleaseSafe Build (`zig build run --release=safe -- utf8`)
173 |
174 | | Benchmark | Runs | Total Time | Avg Time | Speed |
175 | | --------- | ------ | ---------- | -------- | ----- |
176 | | std_x10 | 100000 | 92.7ms | 927ns | x1.00 |
177 | | io_x10 | 100000 | 31.9ms | 319ns | x2.91 |
178 | | std_x100 | 21485 | 1.959s | 91.188us | x1.00 |
179 | | io_x100 | 96186 | 1.997s | 20.768us | x4.39 |
180 | | std_x1000 | 218 | 2.067s | 9.482ms | x1.00 |
181 | | io_x1000 | 961 | 1.87s | 1.946ms | x4.87 |
182 |
183 | - #### Release Build (`zig build run --release=fast -- utf8`)
184 |
185 | | Benchmark | Runs | Total Time | Avg Time | Speed |
186 | | --------- | ------ | ---------- | -------- | ----- |
187 | | std_x10 | 100000 | 102.6ms | 1.026us | x1.00 |
188 | | io_x10 | 100000 | 29.1ms | 291ns | x3.53 |
189 | | std_x100 | 20653 | 1.915s | 92.771us | x1.00 |
190 | | io_x100 | 100000 | 1.796s | 17.962us | x5.16 |
191 | | std_x1000 | 232 | 2.028s | 8.742ms | x1.00 |
192 | | io_x1000 | 1176 | 2.07s | 1.76ms | x4.96 |
193 |
194 | > **Values differ from run to run, but the relative speedups generally stay close to these.**
195 |
196 | > The benchmarks were run on **Windows 11 v24H2** with an **11th Gen Intel® Core™ i5-1155G7 × 8** processor and **32GB** of RAM.
197 | >
198 | > The version of zig used is **0.14.0**.
199 | >
200 | > The source code of this benchmark is at **[bench/string/utils/utf8.zig](https://github.com/Super-ZIG/io-bench/tree/main/src/bench/string/utils/utf8.zig)**.
201 |
202 |
203 |
204 |

205 |
206 |
207 |
208 |
209 |
210 |
211 |
212 |
213 |
214 |
219 |
220 |
--------------------------------------------------------------------------------
/docs/string/utils/ascii.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 | ASCII
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | When simplicity meets efficiency
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 | part of
47 | SuperZig::io library
48 |
49 |
50 |
51 |
52 |
53 |
54 |

55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 | - **🍃 Zero dependencies**—meticulously crafted code.
65 |
66 | - **🚀 Blazing fast**—almost as fast as light!
67 |
68 | - **🌍 Universal compatibility**—Windows, Linux, and macOS.
69 |
70 | - **🛡️ Battle-tested**—ready for production.
71 |
72 |
73 |
74 |

75 |
76 |
77 |
78 |
79 |
80 |
81 |
82 |
83 |
84 | - ### Quick Start 🔥
85 |
86 | > If you have not already added the library to your project, please review the [installation guide](https://github.com/Super-ZIG/io/wiki/installation) for more information.
87 |
88 | ```zig
89 | const ascii = @import("io").string.utils.ascii;
90 | ```
91 |
92 | > Convert characters
93 |
94 | ```zig
95 | _ = ascii.toUpper('a'); // 👉 'A'
96 | _ = ascii.toLower('A'); // 👉 'a'
97 | ```
98 |
99 | > Check character properties
100 |
101 | ```zig
102 | _ = ascii.isUpper('A'); // 👉 true
103 | _ = ascii.isLower('a'); // 👉 true
104 | _ = ascii.isDigit('1'); // 👉 true
105 | _ = ascii.isHex('F'); // 👉 true
106 | _ = ascii.isWhitespace(' '); // 👉 true
107 | _ = ascii.isPunctuation('!'); // 👉 true
108 | // ...
109 | ```
110 |
111 |
112 |
113 |

114 |
115 |
116 |
117 |
118 |
119 |
120 |
121 |
122 | - ### API
123 |
124 | - #### Conversion
125 |
126 | | Function | Return | Description |
127 | | -------- | ------ | ---------------------------------------------------------------------------------------------- |
128 | | toUpper | `u8` | Converts a character to `uppercase`. If it is not a `lowercase` letter, it is returned `unchanged`. |
129 | | toLower | `u8` | Converts a character to `lowercase`. If it is not an `uppercase` letter, it is returned `unchanged`. |
130 |
131 | - #### Properties
132 |
133 | | Function | Return | Description |
134 | | -------------- | ------ | ------------------------------------------------------------------------------------------------------- |
135 | | isUpper | `bool` | Returns true if the character is an uppercase letter (`A-Z`). |
136 | | isLower | `bool` | Returns true if the character is a lowercase letter (`a-z`). |
137 | | isAlphabetic | `bool` | Returns true if the character is an alphabetic letter (`A-Z`, `a-z`). |
138 | | isDigit | `bool` | Returns true if the character is a numeric digit (`0-9`). |
139 | | isAlphanumeric | `bool` | Returns true if the character is alphanumeric (`A-Z`, `a-z`, `0-9`). |
140 | | isHex | `bool` | Returns true if the character is a hexadecimal digit (`0-9`, `A-F`, `a-f`). |
141 | | isOctal | `bool` | Returns true if the character is an octal digit (`0-7`). |
142 | | isBinary | `bool` | Returns true if the character is a binary digit (`0-1`). |
143 | | isPunctuation | `bool` | Returns true if the character is a punctuation symbol (`!`, `@`, `#`, `$`, `%`, `^`, `&`, `*`, ..). |
144 | | isWhitespace | `bool` | Returns true if the character is a whitespace character (`space`, `tab`, `newline`, `carriage return`). |
145 | | isPrintable | `bool` | Returns true if the character is printable (`A-Z`, `a-z`, `0-9`, `punctuation marks`, `space`). |
146 | | isControl | `bool` | Returns true if the character is a control character (`ASCII 0x00-0x1F or 0x7F`). |
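
The ranges listed for `isControl` and `isPrintable` split the 128 ASCII values into two disjoint classes. The sketch below spells that out with two hypothetical local functions built directly from the ranges in the table; they are not the library's implementation. It also cross-checks the control range against `std.ascii.isControl`.

```zig
const std = @import("std");

/// Control characters: 0x00-0x1F and 0x7F (DEL).
fn isControl(c: u8) bool {
    return c <= 0x1F or c == 0x7F;
}

/// Printable characters: space (0x20) through tilde (0x7E).
fn isPrintable(c: u8) bool {
    return c >= 0x20 and c <= 0x7E;
}

test "every ASCII byte is either control or printable, never both" {
    var c: u8 = 0;
    while (c < 0x80) : (c += 1) {
        try std.testing.expect(isControl(c) != isPrintable(c));
        try std.testing.expectEqual(std.ascii.isControl(c), isControl(c));
    }
    // Bytes outside the ASCII range fall into neither class.
    try std.testing.expect(!isControl(0xE9) and !isPrintable(0xE9));
}
```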
147 |
148 |
149 |
150 |

151 |
152 |
153 |
154 |
155 |
156 |
157 |
158 |
159 | - ### Benchmark
160 |
161 | > A quick summary of sample benchmark results comparing the _**`SuperZIG`.`io`.`string`.`utils`.`ascii`**_ implementation with popular alternatives.
162 |
163 | - #### vs `std.ascii`
164 |
165 | > _**In summary**, the two run at **the same speed** because they share almost the same code. ✨_
166 |
167 | - #### ReleaseSafe Build (`zig build run --release=safe -- ascii`)
168 |
169 | | Benchmark | Runs | Total Time | Avg Time | Speed |
170 | | --------- | ------ | ---------- | -------- | ----- |
171 | | std_x10 | 100000 | 2.2ms | 22ns | x1.00 |
172 | | io_x10 | 100000 | 1.9ms | 19ns | x1.16 |
173 | | std_x100 | 100000 | 5.9ms | 59ns | x1.00 |
174 | | io_x100 | 100000 | 5.2ms | 52ns | x1.13 |
175 | | std_x1000 | 100000 | 30.3ms | 303ns | x1.00 |
176 | | io_x1000 | 100000 | 30.5ms | 305ns | x0.99 |
177 |
178 | - #### Release Build (`zig build run --release=fast -- ascii`)
179 |
180 | | Benchmark | Runs | Total Time | Avg Time | Speed |
181 | | --------- | ------ | ---------- | -------- | ----- |
182 | | std_x10 | 100000 | 1.6ms | 16ns | x1.00 |
183 | | io_x10 | 100000 | 1.5ms | 15ns | x1.07 |
184 | | std_x100 | 100000 | 5.2ms | 52ns | x1.00 |
185 | | io_x100 | 100000 | 5.2ms | 52ns | x1.00 |
186 | | std_x1000 | 100000 | 31.1ms | 311ns | x1.00 |
187 | | io_x1000 | 100000 | 31ms | 310ns | x1.00 |
188 |
189 | > **Values differ from run to run, but the relative speedups generally stay close to these.**
190 |
191 | > The benchmarks were run on **Windows 11 v24H2** with an **11th Gen Intel® Core™ i5-1155G7 × 8** processor and **32GB** of RAM.
192 | >
193 | > The version of zig used is **0.14.0**.
194 | >
195 | > The source code of this benchmark is at **[bench/string/utils/ascii.zig](https://github.com/Super-ZIG/io-bench/tree/main/src/bench/string/utils/ascii.zig)**.
196 |
197 |
198 |
199 |

200 |
201 |
202 |
203 |
204 |
205 |
206 |
207 |
208 |
209 |
214 |
215 |
--------------------------------------------------------------------------------
/docs/string/utils/codepoint.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 | Codepoint
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | When simplicity meets efficiency
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 | part of
47 | SuperZig::io library
48 |
49 |
50 |
51 |
52 |
53 |
54 |

55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 | - **🍃 Zero dependencies**—meticulously crafted code.
65 |
66 | - **🚀 Blazing fast**—almost as fast as light!
67 |
68 | - **🌍 Universal compatibility**—Windows, Linux, and macOS.
69 |
70 | - **🛡️ Battle-tested**—ready for production.
71 |
72 |
73 |
74 |

75 |
76 |
77 |
78 |
79 |
80 |
81 |
82 |
83 |
84 | - ### Quick Start 🔥
85 |
86 | > If you have not already added the library to your project, please review the [installation guide](https://github.com/Super-ZIG/io/wiki/installation) for more information.
87 |
88 | ```zig
89 | const codepoint = @import("io").string.utils.codepoint;
90 | ```
91 |
92 | > Initialize a Codepoint from a Unicode codepoint value or a UTF-8 slice.
93 |
94 | ```zig
95 | _ = codepoint.init(0x1F31F).?; // 👉 .{ .src = 0x1F31F, .len = 4 }
96 | _ = codepoint.fromUtf8("🌟").?; // 👉 .{ .src = 0x1F31F, .len = 4 }
97 | ```
98 |
99 | > Iterate over a UTF-8 slice.
100 |
101 | ```zig
102 | var iter = codepoint.Utf8Iterator.init("..").?; // 👉 .{ .src = "..", .pos = 0 }
103 |
104 | while(iter.nextSlice()) |slice| { _ = slice; } // each `slice` is one UTF-8 sequence
105 | // Alternatively, on a fresh iterator: while(iter.nextCodepoint()) |cp| { _ = cp; }
106 | ```
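
The difference between the `peek*` and `next*` families is easiest to see in a small, self-contained model. The sketch below is a minimal stand-in for `Utf8Iterator`, not the library's code. The type name `MiniIter` is hypothetical, it assumes the input is valid UTF-8, and it uses `std.unicode.utf8ByteSequenceLength` in place of the library's length lookup.

```zig
const std = @import("std");

/// Hypothetical minimal model of a UTF-8 iterator; assumes `src` is valid UTF-8.
const MiniIter = struct {
    src: []const u8,
    pos: usize = 0,

    /// Length of the sequence starting at `pos`, without moving `pos`.
    fn peekLength(self: MiniIter) ?u3 {
        if (self.pos >= self.src.len) return null;
        return std.unicode.utf8ByteSequenceLength(self.src[self.pos]) catch null;
    }

    /// Returns the sequence starting at `pos` and advances past it.
    fn nextSlice(self: *MiniIter) ?[]const u8 {
        const len = self.peekLength() orelse return null;
        const slice = self.src[self.pos .. self.pos + len];
        self.pos += len;
        return slice;
    }
};

test "peek leaves the position alone, next advances it" {
    var it = MiniIter{ .src = "a🌟" };
    try std.testing.expectEqual(@as(?u3, 1), it.peekLength());
    try std.testing.expectEqual(@as(usize, 0), it.pos);
    try std.testing.expectEqualStrings("a", it.nextSlice().?);
    try std.testing.expectEqualStrings("🌟", it.nextSlice().?);
    try std.testing.expect(it.nextSlice() == null);
}
```

Once an iterator has been drained, every further `next*` call returns null, so a second loop over the same iterator sees no items.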
107 |
108 |
109 |
110 |

111 |
112 |
113 |
114 |
115 |
116 |
117 |
118 |
119 | - ### API
120 |
121 | - #### Codepoint
122 |
123 | - ##### Fields
124 |
125 | | Field | Type | Description |
126 | | ----- | ----- | ------------------------------------------------------------ |
127 | | `src` | `u21` | Numeric value of the Unicode codepoint (U+0000 to U+10FFFF). |
128 | | `len` | `u3` | Length of this codepoint in UTF-8 (1-4 bytes). |
129 |
130 | - ##### Initialization
131 |
132 | | Function | Return | Description |
133 | | --------------- | ------- | ------------------------------------------------------------------ |
134 | | init | `?Self` | Initializes a Codepoint from a Unicode `codepoint` value if valid. |
135 | | unsafe_init | `Self` | Initializes a Codepoint from a Unicode `codepoint` value. |
136 | | fromUtf8 | `?Self` | Initializes a Codepoint from a `UTF-8 encoded slice` if valid. |
137 | | unsafe_fromUtf8 | `Self` | Initializes a Codepoint from a `UTF-8 encoded slice`. |
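
The `len` field records how many UTF-8 bytes a codepoint occupies, which follows from its numeric range. The sketch below uses the same range boundaries as `getCodepointLength` in `utf8.zig`. The local name `codepointLength` is hypothetical, and like the original it assumes the input does not exceed U+10FFFF.

```zig
const std = @import("std");

/// Hypothetical local copy of the range table: bytes needed to encode `cp`.
fn codepointLength(cp: u21) u3 {
    return switch (cp) {
        0x0000...0x007F => 1, // ASCII
        0x0080...0x07FF => 2,
        0x0800...0xFFFF => 3,
        else => 4, // U+10000 and above
    };
}

test "length grows with the codepoint range" {
    try std.testing.expectEqual(@as(u3, 1), codepointLength('A'));
    try std.testing.expectEqual(@as(u3, 2), codepointLength(0x00E9)); // é
    try std.testing.expectEqual(@as(u3, 3), codepointLength(0x20AC)); // €
    try std.testing.expectEqual(@as(u3, 4), codepointLength(0x1F31F)); // 🌟
}
```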
138 |
139 | - #### Utf8Iterator
140 |
141 | - ##### Fields
142 |
143 | | Field | Type | Description |
144 | | ----- | ------------ | --------------------------------------------------------- |
145 | | `src` | `[]const u8` | The UTF-8 encoded string that the iterator will traverse. |
146 | | `pos` | `usize` | The current byte position in the string. |
147 |
148 | - ##### Initialization
149 |
150 | | Function | Return | Description |
151 | | ----------- | ------- | --------------------------------------------------------------------- |
152 | | init | `?Self` | Initializes a new Utf8Iterator from the given `UTF-8 slice` if valid. |
153 | | unsafe_init | `Self` | Initializes a new Utf8Iterator from the given `UTF-8 slice`. |
154 |
155 | - ##### Next
156 |
157 | | Function | Return | Description |
158 | | ------------- | ------------ | -------------------------------------------------------------------- |
159 | | nextCodepoint | `?Codepoint` | Returns the next `Codepoint` **and** increments the position. |
160 | | nextSlice | `?[]const u8` | Returns the next `UTF-8 slice` **and** increments the position. |
161 | | nextLength | `?u3` | Returns the next `Codepoint` length **and** increments the position. |
162 |
163 | - ##### Peek
164 |
165 | | Function | Return | Description |
166 | | ------------- | ------------ | -------------------------------------------------------------------------- |
167 | | peekCodepoint | `?Codepoint` | Returns the next `Codepoint` **without** incrementing the position. |
168 | | peekSlice | `?[]const u8` | Returns the next `UTF-8 slice` **without** incrementing the position. |
169 | | peekLength | `?u3` | Returns the next `Codepoint` length **without** incrementing the position. |
170 |
171 |
172 |
173 |

174 |
175 |
176 |
177 |
178 |
179 |
180 |
181 |
182 | - ### Benchmark
183 |
184 | > A quick summary of sample benchmark results comparing the _**`SuperZIG`.`io`.`string`.`utils`.`codepoint`**_ implementation with popular alternatives.
185 |
186 | - #### vs `std.unicode`
187 |
188 | > _**In summary**, `io` is roughly **4 to 5 times** faster than `std` in release builds and about **1.3 to 2 times** faster in debug builds in the runs below. ✨_
189 |
190 | - #### Debug Build (`zig build run -- codepoint`)
191 |
192 | | Benchmark | Runs | Total Time | Avg Time | Speed |
193 | | --------- | ------ | ---------- | -------- | ----- |
194 | | std_x10 | 100000 | 87.4ms | 874ns | x1.00 |
195 | | io_x10 | 100000 | 65.6ms | 656ns | x1.33 |
196 | | std_x100 | 23412 | 2.108s | 90.082us | x1.00 |
197 | | io_x100 | 46583 | 1.952s | 41.918us | x2.15 |
198 | | std_x1000 | 234 | 2.061s | 8.81ms | x1.00 |
199 | | io_x1000 | 457 | 2.1s | 4.596ms | x1.92 |
200 |
201 | - #### Release Build (`zig build run --release=fast -- codepoint`)
202 |
203 | | Benchmark | Runs | Total Time | Avg Time | Speed |
204 | | --------- | ------ | ---------- | -------- | ----- |
205 | | std_x10 | 100000 | 84.9ms | 849ns | x1.00 |
206 | | io_x10 | 100000 | 22ms | 220ns | x3.86 |
207 | | std_x100 | 25531 | 1.967s | 77.053us | x1.00 |
208 | | io_x100 | 100000 | 1.56s | 15.608us | x4.94 |
209 | | std_x1000 | 263 | 2.107s | 8.012ms | x1.00 |
210 | | io_x1000 | 1233 | 1.966s | 1.594ms | x5.02 |
211 |
212 | > **Values differ from run to run, but the relative speedups generally stay close to these.**
213 |
214 | > The benchmarks were run on **Windows 11 v24H2** with an **11th Gen Intel® Core™ i5-1155G7 × 8** processor and **32GB** of RAM.
215 | >
216 | > The version of zig used is **0.14.0**.
217 | >
218 | > The source code of this benchmark is at **[bench/string/utils/codepoint.zig](https://github.com/Super-ZIG/io-bench/tree/main/src/bench/string/utils/codepoint.zig)**.
219 |
220 |
221 |
222 |

223 |
224 |
225 |
226 |
227 |
228 |
229 |
230 |
231 |
232 |
237 |
238 |
--------------------------------------------------------------------------------