├── .gitignore
├── README.md
├── build.zig
├── build.zig.zon
└── src
    ├── Ast.zig
    ├── Parse.zig
    ├── Plan.zig
    ├── executor.zig
    ├── executor
    │   ├── join_ops.zig
    │   ├── modify_ops.zig
    │   ├── scan_ops.zig
    │   ├── simple_ops.zig
    │   └── step_ops.zig
    ├── graphon.zig
    ├── main.zig
    ├── parser_test.zig
    ├── storage.zig
    ├── storage
    │   └── rocksdb.zig
    ├── test_helpers.zig
    ├── tokenizer.zig
    ├── types.zig
    └── vendor
        └── snaptest.zig
/.gitignore:
--------------------------------------------------------------------------------
1 | .zig-cache/
2 | zig-out/
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Graphon
2 |
3 | A very small graph database.
4 |
5 | ```gql
6 | MATCH (db:Database {name: 'graphon'})<-[:Wrote]-(p:Person)
7 | RETURN p.name
8 | ```
9 |
10 | Can be queried with [GQL](https://www.iso.org/standard/76120.html), the ISO-standard graph query language.
11 |
12 | ## Getting started
13 |
14 | Graphon is a single binary that implements almost the entire GQL standard to specification. You can query it either from Neo4j client libraries or by making an HTTP request in any language.
15 |
16 | To start a database, just download the binary and run it.
17 |
18 | ```sh-session
19 | $ graphon
20 | $ curl "http://127.0.0.1:7687/?query=RETURN%2055"
21 | 55
22 | ```
23 |
24 | The recommended way to explore a running Graphon database is through the CLI.
25 |
26 | ```sh-session
27 | $ graphon-cli
28 | Connected to http://127.0.0.1:7687
29 | > RETURN 100 * 3
30 | 300
31 | ```
32 |
33 | ## Features
34 |
35 | Graphon implements the [GQL](https://www.gqlstandards.org/home) language for graph queries, which is defined in [ISO/IEC 39075:2024](https://www.iso.org/standard/76120.html). This standard was published in April 2024, so there aren't many resources on it yet.
You can find some documentation on the [Google Spanner](https://cloud.google.com/spanner/docs/reference/standard-sql/graph-intro) website.
36 |
37 | A simple graph query looks like this:
38 |
39 | ```gql
40 | MATCH (a:User {name: 'Eric'})-[:Likes]->(f:Food)
41 | RETURN f.name, f.calories
42 | ```
43 |
44 | GQL is a powerful language. Here is a larger example that demonstrates a few features:
45 |
46 | - **Pattern Matching:** Finds a variable-length path (trail) between follower and influencer nodes, allowing a chain of between one and three `Follows` relationships.
47 | - **Complex Filtering:** Uses the `WHERE` clause to filter for influencers who have created popular posts (with more than 100 likes).
48 | - **Aggregation:** `OPTIONAL MATCH` finds recent posts created by the influencer to enrich the output, and `WITH` implicitly aggregates them.
49 | - **Structured Output:** Returns distinct named results including names, the titles of popular posts, the count of recent posts, and the entire trail of connections.
50 | - **Ordering and Limiting:** Orders and limits the output to the top 10 results.
51 |
52 | ```gql
53 | MATCH TRAIL (follower:Person) ((nodes)-[:Follows]->()){1,3} (influencer:Person),
54 |       (influencer)-[:Created]->(post:Post),
55 |       (follower)-[:Likes]->(post)
56 | WHERE post.likes_count > 100
57 | OPTIONAL MATCH (influencer)-[:Created]->(otherPost:Post)
58 | WHERE otherPost.creation_date > DATE '2024-01-01'
59 | WITH follower, influencer, post, nodes, COUNT(otherPost) AS recentPosts
60 | RETURN DISTINCT follower.name AS FollowerName,
61 |        influencer.name AS InfluencerName,
62 |        post.title AS PopularPost,
63 |        recentPosts AS RecentPostCount,
64 |        nodes AS FollowerTrail
65 | ORDER BY RecentPostCount DESC, InfluencerName
66 | LIMIT 10;
67 | ```
68 |
69 | You can also insert, modify, and delete graph data.
70 | 71 | ```gql 72 | // Insert nodes and edges 73 | INSERT (a:Building {address: '285 Fulton St', city: 'New York', state: 'NY', zipcode: 10007}) 74 | INSERT (a)-[:Nearby]-(:Geography {name: 'Hudson River', type: 'water'}) 75 | INSERT (a)-[:Nearby]-(:Geography {name: 'The Battery', type: 'park'}) 76 | 77 | // Modify properties 78 | MATCH (p:Person {name: 'Eric'}) SET p.age = 23 79 | 80 | // Delete a node and attached edges 81 | MATCH (x:Account)-[:Invoice {unpaid: true}]->(:Account {id: 627}) 82 | DETACH DELETE x 83 | ``` 84 | 85 | Graphon can be queried via HTTP (results sent in JSON format) or [Bolt](https://neo4j.com/docs/bolt/current/) sessions. Concurrent transactions implement [snapshot isolation](https://jepsen.io/consistency/models/snapshot-isolation) to ensure consistency. 86 | 87 | The core GQL language includes graph pattern-matching queries, transactional updates, catalog changes, and list data types. 88 | 89 | These features are explicitly _not_ supported right now: 90 | 91 | - Having multiple directories and schemas in one database 92 | - Having multiple graphs in one database 93 | - Typed graphs, nodes, and edges (i.e., closed type schemas) 94 | - Named procedures 95 | - The datetime data type and storing time zones 96 | - Identifiers (variable names) using non-ASCII characters 97 | 98 | You could consider using Graphon when you want something small and low-overhead, yet still powerful. 99 | 100 | ## Limitations 101 | 102 | Graphon is a very small project. It tries to be fast where possible, but the query planner is not going to be very advanced. It won't perfectly optimize every query out there. 103 | 104 | I made this database primarily out of personal interest, to experiment with algorithms, and to learn what goes into a modern database. There will be bugs. Also, the on-disk format is unstable. 
**Do not use Graphon as a store for production data.**
105 |
106 | ## Architecture
107 |
108 | The database is written in Zig, with RocksDB as the foundational storage layer.
109 |
110 | 1. **Session manager:** Listens for requests over HTTP and Bolt protocols, creates new sessions.
111 | 2. **Tokenizer and parser:** Convert text queries into an abstract syntax tree.
112 | 3. **Query planner:** Translates each query into an optimized, low-level query plan.
113 | 4. **Execution engine:** Safely executes query plans with specific graph algorithms behind an interruptible, streaming API.
114 | 5. **Storage and transactions:** Move and fetch data from durable storage, hold transaction locks, and page files via RocksDB.
115 |
116 | ### Query plans
117 |
118 | Query plans are constructed out of the following operations. The design here was influenced by other databases, particularly the internal representations of [Postgres](https://github.com/postgres/postgres/blob/REL_16_3/src/backend/commands/explain.c#L1177-L1180) and [Neo4j](https://neo4j.com/docs/cypher-manual/current/planning-and-tuning/operators/operators-detail/).
119 |
120 | - `NodeScan`: Scan for nodes in a graph, optionally providing labels.
121 | - `EdgeScan`: Scan for edges in a graph, optionally providing labels.
122 | - `NodeById`: Fetch the node with a given ID.
123 | - `EdgeById`: Fetch the edge with a given ID.
124 | - `Step`: Traverse the graph for edges from a node.
125 | - `StepBetween`: Traverse the graph for edges between two nodes.
126 | - `Begin`: Marker node for the start of the right subtree of a repeat or join operator.
127 | - `Argument`: Marks a variable for the node or edge being repeated in a path.
128 | - `Repeat`: Repeat the sub-pattern, used for trail and path queries.
129 | - `ShortestPath`: Finds the shortest path(s) between two nodes.
130 | - `Join`: Take rows from the left subquery, execute the tree on the right subquery, and return both.
131 | - `SemiJoin`: Return rows from the left subquery where the right subquery is not null.
132 | - `Anti`: Test for the absence of a pattern, yielding a single row.
133 | - `Project`: Execute expressions or remap variable names.
134 | - `ProjectEndpoints`: Find the endpoints of an edge.
135 | - `EmptyResult`: Retrieve all results and drop them, used as the last operator in mutations.
136 | - `Filter`: Filter results by label presence or conditional expression.
137 | - `Limit`: Limit the count of result rows.
138 | - `Distinct`: Remove duplicate rows from the result.
139 | - `Skip`: Skip rows from the result.
140 | - `Sort`: Sort results by a provided key.
141 | - `Top`: Return some number of top rows by a provided key in sorted order (sort then limit).
142 | - `UnionAll`: Concatenate results from the left and right subqueries.
143 | - `InsertNode`: Insert a graph node with labels and properties.
144 | - `InsertEdge`: Insert an edge with direction, labels, and properties between two nodes.
145 | - `Update`: Set, add, or remove labels and properties from nodes and edges.
146 | - `Delete`: Delete a node or edge.
147 | - `Aggregate`: Compute aggregations, grouping by one or more columns.
148 | - `GroupAggregate`: Compute aggregations, where the result table is already ordered into groups.
149 |
150 | Particular attention must be paid to graph algorithms in order to implement certain kinds of queries efficiently, especially those that traverse paths or trails. We'll add new types of backend operations as Graphon's query language increases in expressivity.
--------------------------------------------------------------------------------
/build.zig:
--------------------------------------------------------------------------------
1 | const std = @import("std");
2 |
3 | // Although this function looks imperative, note that its job is to
4 | // declaratively construct a build graph that will be executed by an external
5 | // runner.
6 | pub fn build(b: *std.Build) void {
7 |     // Standard target options allow the person running `zig build` to choose
8 |     // what target to build for. Here we do not override the defaults, which
9 |     // means any target is allowed, and the default is native. Other options
10 |     // for restricting the supported target set are available.
11 |     const target = b.standardTargetOptions(.{});
12 |
13 |     // Standard optimization options allow the person running `zig build` to select
14 |     // between Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall.
15 |     const optimize = b.standardOptimizeOption(.{ .preferred_optimize_mode = .ReleaseFast });
16 |
17 |     const lib = b.addStaticLibrary(.{
18 |         .name = "graphon",
19 |         // In this case the main source file is merely a path; however, in more
20 |         // complicated build scripts, this could be a generated file.
21 |         .root_source_file = b.path("src/graphon.zig"),
22 |         .target = target,
23 |         .optimize = optimize,
24 |     });
25 |
26 |     // This declares intent for the library to be installed into the standard
27 |     // location when the user invokes the "install" step (the default step when
28 |     // running `zig build`).
29 |     b.installArtifact(lib);
30 |
31 |     const exe = b.addExecutable(.{
32 |         .name = "graphon",
33 |         .root_source_file = b.path("src/main.zig"),
34 |         .target = target,
35 |         .optimize = optimize,
36 |     });
37 |     exe.linkLibC();
38 |     exe.linkSystemLibrary("rocksdb");
39 |
40 |     // This declares intent for the executable to be installed into the
41 |     // standard location when the user invokes the "install" step (the default
42 |     // step when running `zig build`).
43 |     b.installArtifact(exe);
44 |
45 |     // This *creates* a Run step in the build graph, to be executed when another
46 |     // step is evaluated that depends on it. The next line below will establish
47 |     // such a dependency.
48 | const run_cmd = b.addRunArtifact(exe); 49 | 50 | // By making the run step depend on the install step, it will be run from the 51 | // installation directory rather than directly from within the cache directory. 52 | // This is not necessary, however, if the application depends on other installed 53 | // files, this ensures they will be present and in the expected location. 54 | run_cmd.step.dependOn(b.getInstallStep()); 55 | 56 | // This allows the user to pass arguments to the application in the build 57 | // command itself, like this: `zig build run -- arg1 arg2 etc` 58 | if (b.args) |args| { 59 | run_cmd.addArgs(args); 60 | } 61 | 62 | // This creates a build step. It will be visible in the `zig build --help` menu, 63 | // and can be selected like this: `zig build run` 64 | // This will evaluate the `run` step rather than the default, which is "install". 65 | const run_step = b.step("run", "Run the app"); 66 | run_step.dependOn(&run_cmd.step); 67 | 68 | // Creates a step for unit testing. This only builds the test executable 69 | // but does not run it. 70 | const lib_unit_tests = b.addTest(.{ 71 | .root_source_file = b.path("src/graphon.zig"), 72 | .target = target, 73 | .optimize = optimize, 74 | }); 75 | lib_unit_tests.linkLibC(); 76 | lib_unit_tests.linkSystemLibrary("rocksdb"); 77 | 78 | const run_lib_unit_tests = b.addRunArtifact(lib_unit_tests); 79 | 80 | const exe_unit_tests = b.addTest(.{ 81 | .root_source_file = b.path("src/main.zig"), 82 | .target = target, 83 | .optimize = optimize, 84 | }); 85 | exe_unit_tests.linkLibC(); 86 | exe_unit_tests.linkSystemLibrary("rocksdb"); 87 | 88 | const run_exe_unit_tests = b.addRunArtifact(exe_unit_tests); 89 | 90 | // Similar to creating the run step earlier, this exposes a `test` step to 91 | // the `zig build --help` menu, providing a way for the user to request 92 | // running the unit tests. 
93 | const test_step = b.step("test", "Run unit tests"); 94 | test_step.dependOn(&run_lib_unit_tests.step); 95 | test_step.dependOn(&run_exe_unit_tests.step); 96 | } 97 | -------------------------------------------------------------------------------- /build.zig.zon: -------------------------------------------------------------------------------- 1 | .{ 2 | .name = "graphon", 3 | .version = "0.0.0", 4 | .minimum_zig_version = "0.13.0", 5 | 6 | // This field is optional. 7 | // Each dependency must either provide a `url` and `hash`, or a `path`. 8 | // `zig build --fetch` can be used to fetch all dependencies of a package, recursively. 9 | // Once all dependencies are fetched, `zig build` no longer requires 10 | // internet connectivity. 11 | .dependencies = .{ 12 | // See `zig fetch --save ` for a command-line interface for adding dependencies. 13 | //.example = .{ 14 | // // When updating this field to a new URL, be sure to delete the corresponding 15 | // // `hash`, otherwise you are communicating that you expect to find the old hash at 16 | // // the new URL. 17 | // .url = "https://example.com/foo.tar.gz", 18 | // 19 | // // This is computed from the file contents of the directory of files that is 20 | // // obtained after fetching `url` and applying the inclusion rules given by 21 | // // `paths`. 22 | // // 23 | // // This field is the source of truth; packages do not come from a `url`; they 24 | // // come from a `hash`. `url` is just one of many possible mirrors for how to 25 | // // obtain a package matching this `hash`. 26 | // // 27 | // // Uses the [multihash](https://multiformats.io/multihash/) format. 28 | // .hash = "...", 29 | // 30 | // // When this is provided, the package is found in a directory relative to the 31 | // // build root. In this case the package's hash is irrelevant and therefore not 32 | // // computed. This field and `url` are mutually exclusive. 
33 | // .path = "foo", 34 | 35 | // // When this is set to `true`, a package is declared to be lazily 36 | // // fetched. This makes the dependency only get fetched if it is 37 | // // actually used. 38 | // .lazy = false, 39 | //}, 40 | }, 41 | 42 | // Specifies the set of files and directories that are included in this package. 43 | // Only files and directories listed here are included in the `hash` that 44 | // is computed for this package. 45 | // Paths are relative to the build root. Use the empty string (`""`) to refer to 46 | // the build root itself. 47 | // A directory listed here means that all files within, recursively, are included. 48 | .paths = .{ 49 | "build.zig", 50 | "build.zig.zon", 51 | "src", 52 | "LICENSE", 53 | "README.md", 54 | }, 55 | } 56 | -------------------------------------------------------------------------------- /src/Ast.zig: -------------------------------------------------------------------------------- 1 | //! Abstract syntax tree for GQL expressions, based on ISO/IEC 39075:2024. 2 | //! 3 | //! The flat-layout data structures defined in this file are borrowed from Zig's 4 | //! `std.zig.Ast` struct. This is available under the MIT license. 5 | //! https://github.com/ziglang/zig/blob/0.13.0/LICENSE 6 | //! 7 | //! Of course, the actual parsing was rewritten to parse GQL instead of Zig. 
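The "flat-layout data structures" mentioned in the Ast.zig header above are struct-of-arrays lists (`std.MultiArrayList`): token tags and byte offsets live in parallel arrays rather than in an array of records. A rough Python sketch of the idea, for illustration only (the class and field names here are hypothetical, not Graphon's API):

```python
# Illustrative sketch of struct-of-arrays token storage. Instead of a list
# of (tag, start) records, tags and byte offsets are kept in two parallel
# arrays, which is denser and friendlier to scans over a single field.
class TokenList:
    def __init__(self):
        self.tags = []    # one entry per token, e.g. "kw_match", "l_paren"
        self.starts = []  # byte offset of each token in the source

    def append(self, tag, start):
        self.tags.append(tag)
        self.starts.append(start)

    def __len__(self):
        return len(self.tags)

# Tokens for the start of "MATCH (p:Person) ...":
tokens = TokenList()
for tag, start in [("kw_match", 0), ("l_paren", 6), ("identifier", 7)]:
    tokens.append(tag, start)
```

Accessing `tokens.tags` alone then touches only the tag array, which is the access pattern the parser below relies on via `tokens.items(.tag)`.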
8 | 9 | // Copyright (c) Zig contributors 10 | // 11 | // Permission is hereby granted, free of charge, to any person obtaining a copy 12 | // of this software and associated documentation files (the "Software"), to deal 13 | // in the Software without restriction, including without limitation the rights 14 | // to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 15 | // copies of the Software, and to permit persons to whom the Software is 16 | // furnished to do so, subject to the following conditions: 17 | // 18 | // The above copyright notice and this permission notice shall be included in 19 | // all copies or substantial portions of the Software. 20 | // 21 | // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 22 | // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 23 | // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 24 | // AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 25 | // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 26 | // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 27 | // THE SOFTWARE. 28 | 29 | const std = @import("std"); 30 | const mem = std.mem; 31 | const Allocator = std.mem.Allocator; 32 | const assert = std.debug.assert; 33 | 34 | const Ast = @This(); 35 | const Tokenizer = @import("tokenizer.zig").Tokenizer; 36 | const Token = @import("tokenizer.zig").Token; 37 | const Parse = @import("Parse.zig"); 38 | 39 | /// Reference to externally-owned data. 40 | source: [:0]const u8, 41 | 42 | tokens: TokenList.Slice, 43 | /// The root AST node is assumed to be index 0. Since there can be no 44 | /// references to the root node, this means 0 is available to indicate null. 
45 | nodes: NodeList.Slice, 46 | extra_data: []Node.Index, 47 | 48 | errors: []const Error, 49 | 50 | pub const TokenIndex = u32; 51 | pub const ByteOffset = u32; 52 | 53 | pub const TokenList = std.MultiArrayList(struct { 54 | tag: Token.Tag, 55 | start: ByteOffset, 56 | }); 57 | pub const NodeList = std.MultiArrayList(Node); 58 | 59 | pub const Location = struct { 60 | line: usize, 61 | column: usize, 62 | line_start: usize, 63 | line_end: usize, 64 | }; 65 | 66 | pub const Span = struct { 67 | start: u32, 68 | end: u32, 69 | main: u32, 70 | }; 71 | 72 | pub fn deinit(tree: *Ast, gpa: Allocator) void { 73 | tree.tokens.deinit(gpa); 74 | tree.nodes.deinit(gpa); 75 | gpa.free(tree.extra_data); 76 | gpa.free(tree.errors); 77 | tree.* = undefined; 78 | } 79 | 80 | /// Result should be freed with tree.deinit() when there are 81 | /// no more references to any of the tokens or nodes. 82 | pub fn parse(gpa: Allocator, source: [:0]const u8) Allocator.Error!Ast { 83 | var tokens = Ast.TokenList{}; 84 | defer tokens.deinit(gpa); 85 | 86 | // Empirically, the zig std lib has an 8:1 ratio of source bytes to token count. 
87 |     const estimated_token_count = source.len / 8;
88 |     try tokens.ensureTotalCapacity(gpa, estimated_token_count);
89 |
90 |     var tokenizer = Tokenizer.init(source); // GQL tokenizer from tokenizer.zig, not std.zig.Tokenizer
91 |     while (true) {
92 |         const token = tokenizer.next();
93 |         try tokens.append(gpa, .{
94 |             .tag = token.tag,
95 |             .start = @intCast(token.loc.start),
96 |         });
97 |         if (token.tag == .eof) break;
98 |     }
99 |
100 |     var parser: Parse = .{
101 |         .source = source,
102 |         .gpa = gpa,
103 |         .token_tags = tokens.items(.tag),
104 |         .token_starts = tokens.items(.start),
105 |         .errors = .{},
106 |         .nodes = .{},
107 |         .extra_data = .{},
108 |         .scratch = .{},
109 |         .tok_i = 0,
110 |     };
111 |     defer parser.errors.deinit(gpa);
112 |     defer parser.nodes.deinit(gpa);
113 |     defer parser.extra_data.deinit(gpa);
114 |     defer parser.scratch.deinit(gpa);
115 |
116 |     // Empirically, Zig source code has a 2:1 ratio of tokens to AST nodes.
117 |     // Make sure at least 1 so we can use appendAssumeCapacity on the root node below.
118 |     const estimated_node_count = (tokens.len + 2) / 2;
119 |     try parser.nodes.ensureTotalCapacity(gpa, estimated_node_count);
120 |
121 |     try parser.parseRoot();
122 |
123 |     // TODO experiment with compacting the MultiArrayList slices here
124 |     return Ast{
125 |         .source = source,
126 |         .tokens = tokens.toOwnedSlice(),
127 |         .nodes = parser.nodes.toOwnedSlice(),
128 |         .extra_data = try parser.extra_data.toOwnedSlice(gpa),
129 |         .errors = try parser.errors.toOwnedSlice(gpa),
130 |     };
131 | }
132 |
133 | /// Returns an extra offset for column and byte offset of errors that
134 | /// should point after the token in the error message.
135 | pub fn errorOffset(tree: Ast, parse_error: Error) u32 {
136 |     return if (parse_error.token_is_prev)
137 |         @as(u32, @intCast(tree.tokenSlice(parse_error.token).len))
138 |     else
139 |         0;
140 | }
141 |
142 | pub fn tokenLocation(self: Ast, start_offset: ByteOffset, token_index: TokenIndex) Location {
143 |     var loc = Location{
144 |         .line = 0,
145 |         .column = 0,
146 |         .line_start = start_offset,
147 |         .line_end = self.source.len,
148 |     };
149 |     const token_start = self.tokens.items(.start)[token_index];
150 |
151 |     // Scan line by line until we go past the token start.
152 |     while (std.mem.indexOfScalarPos(u8, self.source, loc.line_start, '\n')) |i| {
153 |         if (i >= token_start) {
154 |             break; // Went past
155 |         }
156 |         loc.line += 1;
157 |         loc.line_start = i + 1;
158 |     }
159 |
160 |     const offset = loc.line_start;
161 |     for (self.source[offset..], 0..) |c, i| {
162 |         if (i + offset == token_start) {
163 |             loc.line_end = i + offset;
164 |             while (loc.line_end < self.source.len and self.source[loc.line_end] != '\n') {
165 |                 loc.line_end += 1;
166 |             }
167 |             return loc;
168 |         }
169 |         if (c == '\n') {
170 |             loc.line += 1;
171 |             loc.column = 0;
172 |             loc.line_start = i + 1;
173 |         } else {
174 |             loc.column += 1;
175 |         }
176 |     }
177 |     return loc;
178 | }
179 |
180 | pub fn tokenSlice(tree: Ast, token_index: TokenIndex) []const u8 {
181 |     const token_starts = tree.tokens.items(.start);
182 |     const token_tags = tree.tokens.items(.tag);
183 |     const token_tag = token_tags[token_index];
184 |
185 |     // Many tokens can be determined entirely by their tag.
186 |     if (token_tag.lexeme()) |lexeme| {
187 |         return lexeme;
188 |     }
189 |
190 |     // For some tokens, re-tokenization is needed to find the end.
191 |     var tokenizer: Tokenizer = .{ // GQL tokenizer from tokenizer.zig, not std.zig.Tokenizer
192 |         .buffer = tree.source,
193 |         .index = token_starts[token_index],
194 |     };
195 |     const token = tokenizer.next();
196 |     assert(token.tag == token_tag);
197 |     return tree.source[token.loc.start..token.loc.end];
198 | }
199 |
200 | pub fn extraData(tree: Ast, index: usize, comptime T: type) T {
201 |     const fields = std.meta.fields(T);
202 |     var result: T = undefined;
203 |     inline for (fields, 0..) |field, i| {
204 |         comptime assert(field.type == Node.Index);
205 |         @field(result, field.name) = tree.extra_data[index + i];
206 |     }
207 |     return result;
208 | }
209 |
210 | pub fn rootDecls(tree: Ast) []const Node.Index {
211 |     // Root is always index 0.
212 |     const nodes_data = tree.nodes.items(.data);
213 |     return tree.extra_data[nodes_data[0].lhs..nodes_data[0].rhs];
214 | }
215 |
216 | pub fn renderError(tree: Ast, parse_error: Error, stream: anytype) !void {
217 |     const token_tags = tree.tokens.items(.tag);
218 |     switch (parse_error.tag) {
219 |         .chained_comparison_operators => {
220 |             return stream.writeAll("comparison operators cannot be chained");
221 |         },
222 |         // TODO: Other errors
223 |
224 |         .expected_token => {
225 |             const found_tag = token_tags[parse_error.token + @intFromBool(parse_error.token_is_prev)];
226 |             const expected_symbol = parse_error.extra.expected_tag.symbol();
227 |             switch (found_tag) {
228 |                 .invalid => return stream.print("expected '{s}', found invalid bytes", .{
229 |                     expected_symbol,
230 |                 }),
231 |                 else => return stream.print("expected '{s}', found '{s}'", .{
232 |                     expected_symbol, found_tag.symbol(),
233 |                 }),
234 |             }
235 |         },
236 |     }
237 | }
238 |
239 | pub fn tokensOnSameLine(tree: Ast, token1: TokenIndex, token2: TokenIndex) bool {
240 |     const token_starts = tree.tokens.items(.start);
241 |     const source = tree.source[token_starts[token1]..token_starts[token2]];
242 |     return mem.indexOfScalar(u8, source, '\n') == null;
243 | }
244 |
245 | pub const Error = struct {
246 |     tag: Tag,
247 |
is_note: bool = false, 248 | /// True if `token` points to the token before the token causing an issue. 249 | token_is_prev: bool = false, 250 | token: TokenIndex, 251 | extra: union { 252 | none: void, 253 | expected_tag: Token.Tag, 254 | } = .{ .none = {} }, 255 | 256 | pub const Tag = enum { 257 | chained_comparison_operators, 258 | 259 | /// `expected_tag` is populated. 260 | expected_token, 261 | }; 262 | }; 263 | 264 | pub const Node = struct { 265 | tag: Tag, 266 | main_token: TokenIndex, 267 | data: Data, 268 | 269 | pub const Index = u32; 270 | 271 | comptime { 272 | // Goal is to keep this under one byte for efficiency. 273 | assert(@sizeOf(Tag) == 1); 274 | } 275 | 276 | pub const Tag = enum { 277 | /// Node stored in lhs. 278 | root, 279 | 280 | /// Section 14.3, `extra_data[lhs..rhs]`. 281 | /// `(MATCH | LET | FOR | FILTER | ORDER BY ...)* RETURN ...` 282 | simple_query_statement, 283 | 284 | /// Section 14.4, `extra_data[lhs..rhs]`. 285 | /// `MATCH ` 286 | /// `OPTIONAL MATCH ` 287 | /// `OPTIONAL MATCH { ... }`, `OPTIONAL MATCH ( ... )` 288 | match_statement, 289 | 290 | /// Section 14.11, `extra_data[lhs..rhs]`. 291 | return_statement, 292 | 293 | /// Section 14.11, `lhs AS rhs`. 294 | return_alias, 295 | 296 | /// Section 16.4, for now just lhs is a ``. 297 | graph_pattern, 298 | 299 | /// Section 16.7, `extra_data[lhs..rhs]`. 300 | /// List of ``, each possibly quantified. 301 | path_pattern, 302 | 303 | /// Section 16.7, `extra_data[lhs..rhs]`. 304 | /// List of node or edge patterns, or parenthesized path patterns. 305 | path_primary, 306 | 307 | /// Section 16.7, `lhs?`. 308 | questioned_path_primary, 309 | 310 | /// Section 16.11, `lhs*`, `lhs+`, `lhs[x]`, `lhs{x, y}`. GraphPatternQuantifier[rhs] 311 | quantified_path_primary, 312 | 313 | /// Section 16.7, `(Name:Label Predicate)`. `ElementPatternFiller[lhs]`. 314 | /// rhs is the closing parenthesis. 315 | node_pattern, 316 | 317 | /// Section 16.7. ` Name:Label Predicate `. 
`ElementPatternFiller[lhs]`. 318 | /// main_token is the left half, and rhs is the right half. 319 | /// Both lhs and rhs can be null in case of an abbreviated edge pattern. 320 | edge_pattern, 321 | 322 | /// Both lhs and rhs unused. 323 | /// Most identifiers will not have explicit AST nodes, however for expressions 324 | /// which could be one of many different kinds of AST nodes, there will be an 325 | /// identifier AST node for it. 326 | identifier, 327 | 328 | /// Both lhs and rhs unused. 329 | number_literal, 330 | 331 | /// Both lhs and rhs unused. 332 | string_literal, 333 | }; 334 | 335 | pub const Data = struct { 336 | lhs: Index, 337 | rhs: Index, 338 | }; 339 | }; 340 | 341 | pub fn nodeToSpan(tree: *const Ast, node: u32) Span { 342 | return tokensToSpan( 343 | tree, 344 | tree.firstToken(node), 345 | tree.lastToken(node), 346 | tree.nodes.items(.main_token)[node], 347 | ); 348 | } 349 | 350 | pub fn tokenToSpan(tree: *const Ast, token: Ast.TokenIndex) Span { 351 | return tokensToSpan(tree, token, token, token); 352 | } 353 | 354 | pub fn tokensToSpan(tree: *const Ast, start: Ast.TokenIndex, end: Ast.TokenIndex, main: Ast.TokenIndex) Span { 355 | const token_starts = tree.tokens.items(.start); 356 | var start_tok = start; 357 | var end_tok = end; 358 | 359 | if (tree.tokensOnSameLine(start, end)) { 360 | // do nothing 361 | } else if (tree.tokensOnSameLine(start, main)) { 362 | end_tok = main; 363 | } else if (tree.tokensOnSameLine(main, end)) { 364 | start_tok = main; 365 | } else { 366 | start_tok = main; 367 | end_tok = main; 368 | } 369 | const start_off = token_starts[start_tok]; 370 | const end_off = token_starts[end_tok] + @as(u32, @intCast(tree.tokenSlice(end_tok).len)); 371 | return Span{ .start = start_off, .end = end_off, .main = token_starts[main] }; 372 | } 373 | 374 | test { 375 | _ = Parse; 376 | } 377 | -------------------------------------------------------------------------------- /src/Parse.zig: 
-------------------------------------------------------------------------------- 1 | //! Parser for GQL. This is based on the grammar defined in ISO/IEC 39075:2024, 2 | //! with some minor simplifications. 3 | //! 4 | //! The overall structure of this file and specific parts are taken from the Zig 5 | //! parser in `std.zig.Parse`. This is available under the MIT license. 6 | //! https://github.com/ziglang/zig/blob/0.13.0/LICENSE 7 | 8 | // Copyright (c) Zig contributors 9 | // 10 | // Permission is hereby granted, free of charge, to any person obtaining a copy 11 | // of this software and associated documentation files (the "Software"), to deal 12 | // in the Software without restriction, including without limitation the rights 13 | // to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 14 | // copies of the Software, and to permit persons to whom the Software is 15 | // furnished to do so, subject to the following conditions: 16 | // 17 | // The above copyright notice and this permission notice shall be included in 18 | // all copies or substantial portions of the Software. 19 | // 20 | // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 21 | // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 22 | // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 23 | // AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 24 | // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 25 | // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 26 | // THE SOFTWARE. 
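The parsing routines of Parse.zig are elided from this listing. As a hedged illustration of the recursive-descent style it borrows from `std.zig.Parse`, here is a tiny Python sketch that parses the `(name:Label)` node-pattern shape described in Ast.zig; the token regex and helper names are hypothetical, and real GQL node patterns (ISO/IEC 39075:2024, section 16.7) are far richer than this:

```python
# Minimal recursive-descent sketch (illustrative only): parse a GQL-style
# node pattern such as "(a:Person)" into a small dict.
import re

TOKEN_RE = re.compile(r"\(|\)|:|[A-Za-z_][A-Za-z0-9_]*")

def parse_node_pattern(text):
    toks = TOKEN_RE.findall(text)
    pos = 0

    def expect(t):
        # Consume one token, failing loudly if it isn't the expected one.
        nonlocal pos
        if pos >= len(toks) or toks[pos] != t:
            raise SyntaxError(f"expected {t!r}")
        pos += 1

    expect("(")
    name = None
    if pos < len(toks) and toks[pos] not in (":", ")"):
        name = toks[pos]; pos += 1
    label = None
    if pos < len(toks) and toks[pos] == ":":
        pos += 1
        label = toks[pos]; pos += 1
    expect(")")
    return {"name": name, "label": label}
```

Both the variable name and the label are optional, mirroring the abbreviated patterns `(x)` and `(:Food)` that appear in the README's examples.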
27 | 28 | const std = @import("std"); 29 | 30 | const Parse = @This(); 31 | const Allocator = std.mem.Allocator; 32 | 33 | const Token = @import("tokenizer.zig").Token; 34 | const Ast = @import("Ast.zig"); 35 | const Node = Ast.Node; 36 | const AstError = Ast.Error; 37 | const TokenIndex = Ast.TokenIndex; 38 | 39 | test { 40 | _ = @import("parser_test.zig"); 41 | } 42 | -------------------------------------------------------------------------------- /src/Plan.zig: -------------------------------------------------------------------------------- 1 | //! Plan of execution for a graph database query. 2 | 3 | const std = @import("std"); 4 | const Allocator = std.mem.Allocator; 5 | 6 | const types = @import("types.zig"); 7 | const EdgeDirection = types.EdgeDirection; 8 | const Value = types.Value; 9 | 10 | const Plan = @This(); 11 | 12 | /// Operators that define the query plan. 13 | ops: std.ArrayListUnmanaged(Operator) = .{}, 14 | 15 | /// Results that will be returned by the query, one per value column. 16 | results: std.ArrayListUnmanaged(u16) = .{}, 17 | 18 | pub fn deinit(self: *Plan, allocator: Allocator) void { 19 | for (self.ops.items) |*op| op.deinit(allocator); 20 | self.ops.deinit(allocator); 21 | self.results.deinit(allocator); 22 | } 23 | 24 | /// Pretty-print the provided query plan. 
25 | ///
26 | /// As an example, given the query
27 | ///
28 | /// ```
29 | /// MATCH (a:Person)-[b:Friend]-(c:Person)
30 | /// WHERE
31 | ///   c.age > a.age + 3
32 | ///   AND EXISTS ((c)-[:FavoriteFood]->(:Food {name: 'Pizza'}))
33 | /// RETURN a.name, c.name, b.duration AS duration
34 | /// ```
35 | ///
36 | /// One possible query plan is:
37 | ///
38 | /// ```
39 | /// Plan{%4, %5, %6}
40 | ///   Project %4: %0.name, %5: %2.name, %6: %1.duration
41 | ///   SemiJoin
42 | ///     Filter %3.name = 'Pizza'
43 | ///     Step (%2)-[:FavoriteFood]->(%3)
44 | ///     Argument %2
45 | ///   Begin
46 | ///   Filter %2:Person, %2.age > %0.age + 3
47 | ///   Step (%0)-[%1:Friend]-(%2)
48 | ///   NodeScan (%0:Person)
49 | /// ```
50 | pub fn print(self: Plan, writer: anytype) !void {
51 |     try writer.writeAll("Plan{");
52 |     var first = true;
53 |     for (self.results.items) |r| {
54 |         if (!first) try writer.writeAll(", ");
55 |         try writer.print("%{}", .{r});
56 |         first = false;
57 |     }
58 |     try writer.writeByte('}');
59 |
60 |     var level: usize = 1;
61 |     var idx = self.ops.items.len;
62 |     while (idx > 0) : (idx -= 1) {
63 |         const op = self.ops.items[idx - 1];
64 |         if (op == .begin and level > 1) {
65 |             level -= 1;
66 |         }
67 |         try writer.writeByte('\n');
68 |         try writer.writeByteNTimes(' ', 2 * level);
69 |         try op.print(writer);
70 |         if (op.hasSubquery()) {
71 |             level += 1;
72 |         }
73 |     }
74 | }
75 |
76 | fn idents_chk(ret: *u16, values: anytype) void {
77 |     inline for (values) |v| {
78 |         if (@as(?u16, v)) |value| {
79 |             if (value + 1 > ret.*) {
80 |                 ret.* = value + 1;
81 |             }
82 |         }
83 |     }
84 | }
85 |
86 | /// Return the number of identifiers in the plan.
87 | pub fn idents(self: Plan) u16 { 88 | var ret: u16 = 0; 89 | for (self.ops.items) |op| { 90 | switch (op) { 91 | .node_scan => |n| idents_chk(&ret, .{n.ident}), 92 | .edge_scan => |n| idents_chk(&ret, .{n.ident}), 93 | .step => |n| idents_chk(&ret, .{ n.ident_edge, n.ident_dest }), 94 | .argument => |n| idents_chk(&ret, .{n}), 95 | .project => |n| { 96 | for (n.items) |c| idents_chk(&ret, .{c.ident}); 97 | }, 98 | .insert_node => |n| idents_chk(&ret, .{n.ident}), 99 | .insert_edge => |n| idents_chk(&ret, .{n.ident}), 100 | else => {}, 101 | } 102 | } 103 | return ret; 104 | } 105 | 106 | /// Given a join-style operation's index, return the index of the matching 'Begin' 107 | /// operation or null if not found. 108 | pub fn subqueryBegin(self: Plan, op_index: u32) ?u32 { 109 | std.debug.assert(self.ops.items[op_index].hasSubquery()); 110 | var level: u32 = 1; 111 | var i = op_index + 1; 112 | while (i < self.ops.items.len) : (i += 1) { 113 | const op = self.ops.items[i]; 114 | if (op == .begin) { 115 | if (level == 1) { 116 | return i; 117 | } 118 | level -= 1; 119 | } else if (op.hasSubquery()) { 120 | level += 1; 121 | } 122 | } 123 | return null; 124 | } 125 | 126 | /// A single step in the query plan, which may depend on previous steps. 
127 | pub const Operator = union(enum) { 128 | node_scan: Scan, 129 | edge_scan: Scan, 130 | node_by_id: LookupId, 131 | edge_by_id: LookupId, 132 | step: Step, 133 | // step_between, 134 | begin, 135 | argument: u16, // unimplemented 136 | repeat, // unimplemented 137 | // shortest_path, 138 | join, 139 | semi_join, 140 | anti, 141 | project: std.ArrayListUnmanaged(ProjectClause), 142 | // project_endpoints: ProjectEndpoints, 143 | empty_result, 144 | filter: std.ArrayListUnmanaged(FilterClause), 145 | limit: u64, 146 | distinct: std.ArrayListUnmanaged(u16), // unimplemented 147 | skip: u64, 148 | sort: std.MultiArrayList(SortClause), // unimplemented 149 | top: u64, // unimplemented 150 | union_all, 151 | // update, 152 | insert_node: InsertNode, 153 | insert_edge: InsertEdge, 154 | // delete, 155 | // aggregate, 156 | // group_aggregate, 157 | 158 | pub fn deinit(self: *Operator, allocator: Allocator) void { 159 | switch (self.*) { 160 | .node_scan => |*n| n.deinit(allocator), 161 | .edge_scan => |*n| n.deinit(allocator), 162 | .node_by_id => {}, 163 | .edge_by_id => {}, 164 | .step => |*n| n.deinit(allocator), 165 | .begin => {}, 166 | .argument => {}, 167 | .repeat => {}, 168 | .join => {}, 169 | .semi_join => {}, 170 | .anti => {}, 171 | .project => |*n| { 172 | for (n.items) |*c| c.deinit(allocator); 173 | n.deinit(allocator); 174 | }, 175 | .empty_result => {}, 176 | .filter => |*n| { 177 | for (n.items) |*c| c.deinit(allocator); 178 | n.deinit(allocator); 179 | }, 180 | .limit => {}, 181 | .distinct => |*n| n.deinit(allocator), 182 | .skip => {}, 183 | .sort => |*n| n.deinit(allocator), 184 | .top => {}, 185 | .union_all => {}, 186 | .insert_node => |*n| n.deinit(allocator), 187 | .insert_edge => |*n| n.deinit(allocator), 188 | } 189 | self.* = undefined; 190 | } 191 | 192 | /// Pretty-print a query plan node. 
193 | pub fn print(self: Operator, writer: anytype) !void { 194 | const node_name = switch (self) { 195 | .node_scan => "NodeScan", 196 | .edge_scan => "EdgeScan", 197 | .node_by_id => "NodeById", 198 | .edge_by_id => "EdgeById", 199 | .step => "Step", 200 | .begin => "Begin", 201 | .argument => "Argument", 202 | .repeat => "Repeat", 203 | .join => "Join", 204 | .semi_join => "SemiJoin", 205 | .anti => "Anti", 206 | .project => "Project", 207 | .empty_result => "EmptyResult", 208 | .filter => "Filter", 209 | .limit => "Limit", 210 | .distinct => "Distinct", 211 | .skip => "Skip", 212 | .sort => "Sort", 213 | .top => "Top", 214 | .union_all => "UnionAll", 215 | .insert_node => "InsertNode", 216 | .insert_edge => "InsertEdge", 217 | }; 218 | try writer.writeAll(node_name); 219 | switch (self) { 220 | .node_scan => |n| { 221 | try writer.writeByte(' '); 222 | try print_node_spec(writer, n.ident, n.label); 223 | }, 224 | .edge_scan => |n| { 225 | try writer.writeByte(' '); 226 | try print_edge_spec(writer, .any, n.ident, n.label); 227 | }, 228 | .node_by_id => |n| { 229 | try writer.print(" %{} -> %{}", .{ n.ident_id, n.ident_ref }); 230 | }, 231 | .edge_by_id => |n| { 232 | try writer.print(" %{} -> %{}", .{ n.ident_id, n.ident_ref }); 233 | }, 234 | .step => |n| { 235 | try writer.writeByte(' '); 236 | try print_node_spec(writer, n.ident_src, null); 237 | try print_edge_spec(writer, n.direction, n.ident_edge, n.edge_label); 238 | try print_node_spec(writer, n.ident_dest, null); 239 | }, 240 | .begin => {}, 241 | .argument => |n| { 242 | try writer.print(" %{}", .{n}); 243 | }, 244 | .repeat => std.debug.panic("repeat unimplemented", .{}), 245 | .join => std.debug.panic("join unimplemented", .{}), 246 | .semi_join => std.debug.panic("semi_join unimplemented", .{}), 247 | .anti => {}, 248 | .project => |n| { 249 | var first = true; 250 | for (n.items) |c| { 251 | if (first) { 252 | try writer.writeByte(' '); 253 | } else { 254 | try writer.writeAll(", "); 255 | } 256 | 
try writer.print("%{}: ", .{c.ident}); 257 | try c.exp.print(writer); 258 | first = false; 259 | } 260 | }, 261 | .empty_result => {}, 262 | .filter => |n| { 263 | var first = true; 264 | for (n.items) |c| { 265 | if (first) { 266 | try writer.writeByte(' '); 267 | } else { 268 | try writer.writeAll(", "); 269 | } 270 | try c.print(writer); 271 | first = false; 272 | } 273 | }, 274 | .limit => |n| { 275 | try writer.print(" {}", .{n}); 276 | }, 277 | .distinct => |n| { 278 | var first = true; 279 | for (n.items) |i| { 280 | if (first) { 281 | try writer.writeByte(' '); 282 | } else { 283 | try writer.writeAll(", "); 284 | } 285 | try writer.print("%{}", .{i}); 286 | first = false; 287 | } 288 | }, 289 | .skip => |n| { 290 | try writer.print(" {}", .{n}); 291 | }, 292 | .sort => |n| { 293 | for (0..n.len) |i| { 294 | if (i == 0) { 295 | try writer.writeByte(' '); 296 | } else { 297 | try writer.writeAll(", "); 298 | } 299 | const s = n.get(i); 300 | try writer.print("%{} {s}", .{ s.ident, if (s.desc) "desc" else "asc" }); 301 | } 302 | }, 303 | .top => |n| { 304 | try writer.print(" {}", .{n}); 305 | }, 306 | .union_all => {}, 307 | .insert_node => |n| { 308 | try writer.writeAll(" ("); 309 | if (n.ident) |i| { 310 | try writer.print("%{}", .{i}); 311 | } 312 | try print_labels(writer, n.labels.items); 313 | try print_properties(writer, n.properties); 314 | try writer.writeByte(')'); 315 | }, 316 | .insert_edge => |n| { 317 | try writer.writeByte(' '); 318 | try print_node_spec(writer, n.ident_src, null); 319 | const direction: EdgeDirection = if (n.directed) .right else .undirected; 320 | try writer.writeAll(direction.leftPart()); 321 | if (n.ident) |i| { 322 | try writer.print("%{}", .{i}); 323 | } 324 | try print_labels(writer, n.labels.items); 325 | try print_properties(writer, n.properties); 326 | try writer.writeAll(direction.rightPart()); 327 | try print_node_spec(writer, n.ident_dest, null); 328 | }, 329 | } 330 | } 331 | 332 | /// Return if this operator 
type has a subquery. 333 | pub fn hasSubquery(self: Operator) bool { 334 | return switch (self) { 335 | .repeat, .semi_join, .join, .union_all => true, 336 | else => false, 337 | }; 338 | } 339 | }; 340 | 341 | fn print_node_spec(writer: anytype, ident: ?u16, label: ?[]u8) !void { 342 | try writer.writeByte('('); 343 | if (ident) |i| { 344 | try writer.print("%{}", .{i}); 345 | } 346 | if (label) |l| { 347 | try writer.print(":{s}", .{l}); 348 | } 349 | try writer.writeByte(')'); 350 | } 351 | 352 | fn print_edge_spec(writer: anytype, direction: EdgeDirection, ident: ?u16, label: ?[]u8) !void { 353 | try writer.writeAll(direction.leftPart()); 354 | if (ident) |i| { 355 | try writer.print("%{}", .{i}); 356 | } 357 | if (label) |l| { 358 | try writer.print(":{s}", .{l}); 359 | } 360 | try writer.writeAll(direction.rightPart()); 361 | } 362 | 363 | fn print_labels(writer: anytype, labels: [][]u8) !void { 364 | var first = true; 365 | for (labels) |l| { 366 | if (first) { 367 | try writer.writeByte(':'); 368 | } else { 369 | try writer.writeByte('&'); 370 | } 371 | try writer.writeAll(l); 372 | first = false; 373 | } 374 | } 375 | 376 | pub fn print_properties(writer: anytype, properties: Properties) !void { 377 | if (properties.len > 0) { 378 | try writer.writeAll(" {"); 379 | for (0..properties.len) |i| { 380 | if (i > 0) { 381 | try writer.writeAll(", "); 382 | } 383 | const p = properties.get(i); 384 | try writer.print("{s}: ", .{p.key}); 385 | try p.value.print(writer); 386 | } 387 | try writer.writeByte('}'); 388 | } 389 | } 390 | 391 | pub const Scan = struct { 392 | ident: u16, // Name of the bound variable. 393 | label: ?[]u8, 394 | 395 | pub fn deinit(self: *Scan, allocator: Allocator) void { 396 | if (self.label) |l| { 397 | allocator.free(l); 398 | } 399 | } 400 | }; 401 | 402 | pub const LookupId = struct { 403 | ident_ref: u16, // Name of the bound entity reference (output). 404 | ident_id: u16, // Name of the ID to look up (input). 
405 | }; 406 | 407 | pub const Step = struct { 408 | ident_src: u16, // Name of the starting node (input). 409 | ident_edge: ?u16, // Name of the edge, to be bound (output). 410 | ident_dest: ?u16, // Name of the ending node, to be bound (output). 411 | direction: EdgeDirection, 412 | edge_label: ?[]u8, // Label to traverse on the edge. 413 | 414 | pub fn deinit(self: *Step, allocator: Allocator) void { 415 | if (self.edge_label) |l| { 416 | allocator.free(l); 417 | } 418 | } 419 | }; 420 | 421 | /// A new variable assignment made in a Project operator. 422 | pub const ProjectClause = struct { 423 | ident: u16, 424 | exp: Exp, 425 | 426 | pub fn deinit(self: *ProjectClause, allocator: Allocator) void { 427 | self.exp.deinit(allocator); 428 | } 429 | }; 430 | 431 | /// A filter clause that can be applied to the query. 432 | pub const FilterClause = union(enum) { 433 | /// Include a row if the expression is truthy. 434 | bool_exp: Exp, 435 | 436 | /// Check that a node or edge has the given label. 437 | ident_label: struct { 438 | ident: u16, 439 | label: []u8, 440 | }, 441 | 442 | pub fn deinit(self: *FilterClause, allocator: Allocator) void { 443 | switch (self.*) { 444 | .bool_exp => |*n| n.deinit(allocator), 445 | .ident_label => |*n| allocator.free(n.label), 446 | } 447 | self.* = undefined; 448 | } 449 | 450 | /// Pretty-print a filter clause. 
451 | pub fn print(self: FilterClause, writer: anytype) !void { 452 | switch (self) { 453 | .bool_exp => |e| try e.print(writer), 454 | .ident_label => |f| try writer.print("%{}: {s}", .{ f.ident, f.label }), 455 | } 456 | } 457 | }; 458 | 459 | pub const SortClause = struct { 460 | ident: u16, 461 | desc: bool, 462 | }; 463 | 464 | pub const InsertNode = struct { 465 | ident: ?u16, 466 | labels: std.ArrayListUnmanaged([]u8), 467 | properties: Properties, 468 | 469 | pub fn deinit(self: *InsertNode, allocator: Allocator) void { 470 | for (self.labels.items) |s| allocator.free(s); 471 | self.labels.deinit(allocator); 472 | 473 | for (self.properties.items(.key)) |k| allocator.free(k); 474 | for (self.properties.items(.value)) |*v| v.deinit(allocator); 475 | self.properties.deinit(allocator); 476 | } 477 | }; 478 | 479 | pub const InsertEdge = struct { 480 | ident: ?u16, 481 | ident_src: u16, 482 | ident_dest: u16, 483 | directed: bool, 484 | labels: std.ArrayListUnmanaged([]u8), 485 | properties: Properties, 486 | 487 | pub fn deinit(self: *InsertEdge, allocator: Allocator) void { 488 | for (self.labels.items) |s| allocator.free(s); 489 | self.labels.deinit(allocator); 490 | 491 | for (self.properties.items(.key)) |k| allocator.free(k); 492 | for (self.properties.items(.value)) |*v| v.deinit(allocator); 493 | self.properties.deinit(allocator); 494 | } 495 | }; 496 | 497 | /// A list of properties for a node or edge. 498 | pub const Properties = std.MultiArrayList(struct { key: []u8, value: Exp }); 499 | 500 | /// A low-level expression used by query plan operators. 501 | pub const Exp = union(enum) { 502 | literal: Value, 503 | ident: u32, 504 | parameter: u32, 505 | binop: *BinopExp, 506 | 507 | pub fn deinit(self: *Exp, allocator: Allocator) void { 508 | switch (self.*) { 509 | .literal => |*v| v.deinit(allocator), 510 | .binop => |b| { 511 | b.deinit(allocator); 512 | allocator.destroy(b); // Needed because binop is a pointer. 
513 | }, 514 | else => {}, 515 | } 516 | self.* = undefined; 517 | } 518 | 519 | /// Pretty-print an expression. 520 | pub fn print(self: Exp, writer: anytype) !void { 521 | switch (self) { 522 | .literal => |v| try v.print(writer), 523 | .ident => |i| try writer.print("%{}", .{i}), 524 | .parameter => |n| try writer.print("${}", .{n}), 525 | .binop => |b| { 526 | try writer.writeByte('('); 527 | try b.left.print(writer); 528 | try writer.print(" {s} ", .{b.op.string()}); 529 | try b.right.print(writer); 530 | try writer.writeByte(')'); 531 | }, 532 | } 533 | } 534 | }; 535 | 536 | pub const BinopExp = struct { 537 | op: Binop, 538 | left: Exp, 539 | right: Exp, 540 | 541 | pub fn deinit(self: *BinopExp, allocator: Allocator) void { 542 | self.left.deinit(allocator); 543 | self.right.deinit(allocator); 544 | self.* = undefined; 545 | } 546 | }; 547 | 548 | pub const Binop = enum { 549 | add, 550 | sub, 551 | eql, 552 | neq, 553 | 554 | fn string(self: Binop) []const u8 { 555 | return switch (self) { 556 | .add => "+", 557 | .sub => "-", 558 | .eql => "=", 559 | .neq => "<>", 560 | }; 561 | } 562 | }; 563 | 564 | const Snap = @import("vendor/snaptest.zig").Snap; 565 | const snap = Snap.snap; 566 | 567 | /// Check the value of a query plan, for snapshot testing. 
568 | fn check_plan_snapshot(plan: Plan, want: Snap) !void { 569 | var buf = std.ArrayList(u8).init(std.testing.allocator); 570 | defer buf.deinit(); 571 | try plan.print(buf.writer()); 572 | try want.diff(buf.items); 573 | } 574 | 575 | test "can create, free and print plan" { 576 | const allocator = std.testing.allocator; 577 | // MATCH (n) RETURN n; 578 | var plan = Plan{}; 579 | defer plan.deinit(allocator); 580 | 581 | try plan.results.append(allocator, 0); 582 | try plan.ops.append(allocator, Operator{ 583 | .node_scan = Scan{ 584 | .ident = 0, 585 | .label = null, 586 | }, 587 | }); 588 | 589 | try check_plan_snapshot(plan, snap(@src(), 590 | \\Plan{%0} 591 | \\ NodeScan (%0) 592 | )); 593 | 594 | try plan.ops.append(allocator, Operator{ 595 | .step = Step{ 596 | .ident_src = 0, 597 | .ident_edge = 1, 598 | .ident_dest = 2, 599 | .direction = .right_or_undirected, 600 | .edge_label = try allocator.dupe(u8, "Likes"), 601 | }, 602 | }); 603 | plan.results.items[0] = 1; 604 | try plan.results.append(allocator, 2); 605 | 606 | try check_plan_snapshot(plan, snap(@src(), 607 | \\Plan{%1, %2} 608 | \\ Step (%0)~[%1:Likes]~>(%2) 609 | \\ NodeScan (%0) 610 | )); 611 | 612 | try std.testing.expectEqual(3, plan.idents()); 613 | } 614 | -------------------------------------------------------------------------------- /src/executor.zig: -------------------------------------------------------------------------------- 1 | //! Execute query plans against a storage engine. 
2 | 3 | const std = @import("std"); 4 | const Allocator = std.mem.Allocator; 5 | 6 | const Plan = @import("Plan.zig"); 7 | const types = @import("types.zig"); 8 | const Value = types.Value; 9 | const storage = @import("storage.zig"); 10 | 11 | const join_ops = @import("executor/join_ops.zig"); 12 | const modify_ops = @import("executor/modify_ops.zig"); 13 | const scan_ops = @import("executor/scan_ops.zig"); 14 | const simple_ops = @import("executor/simple_ops.zig"); 15 | const step_ops = @import("executor/step_ops.zig"); 16 | 17 | const test_helpers = @import("test_helpers.zig"); 18 | 19 | const operator_impls = blk: { 20 | // Specify implementations of operators here. 21 | // Format: { op, state type, destructor, run function } 22 | const operator_impls_raw = .{ 23 | .{ Plan.Operator.node_scan, scan_ops.NodeScanState, scan_ops.NodeScanState.deinit, scan_ops.runNodeScan }, 24 | .{ Plan.Operator.edge_scan, scan_ops.EdgeScanState, scan_ops.EdgeScanState.deinit, scan_ops.runEdgeScan }, 25 | .{ Plan.Operator.node_by_id, void, null, simple_ops.runNodeById }, 26 | .{ Plan.Operator.edge_by_id, void, null, simple_ops.runEdgeById }, 27 | .{ Plan.Operator.step, step_ops.StepState, step_ops.StepState.deinit, step_ops.runStep }, 28 | .{ Plan.Operator.begin, bool, null, join_ops.runBegin }, 29 | .{ Plan.Operator.join, join_ops.JoinState, null, join_ops.runJoin }, 30 | .{ Plan.Operator.semi_join, void, null, join_ops.runSemiJoin }, 31 | .{ Plan.Operator.anti, bool, null, simple_ops.runAnti }, 32 | .{ Plan.Operator.project, void, null, simple_ops.runProject }, 33 | .{ Plan.Operator.empty_result, void, null, simple_ops.runEmptyResult }, 34 | .{ Plan.Operator.filter, void, null, simple_ops.runFilter }, 35 | .{ Plan.Operator.limit, u64, null, simple_ops.runLimit }, 36 | .{ Plan.Operator.skip, bool, null, simple_ops.runSkip }, 37 | .{ Plan.Operator.union_all, bool, null, join_ops.runUnionAll }, 38 | .{ Plan.Operator.insert_node, void, null, modify_ops.runInsertNode }, 39 | .{ 
Plan.Operator.insert_edge, void, null, modify_ops.runInsertEdge }, 40 | }; 41 | 42 | var impls: std.EnumMap(std.meta.Tag(Plan.Operator), OperatorImpl) = .{}; 43 | 44 | for (operator_impls_raw) |impl_spec| { 45 | const spec_tag, const spec_state, const spec_deinit, const spec_run = impl_spec; 46 | const Impl = struct { 47 | fn init(allocator: Allocator) Allocator.Error!OperatorState { 48 | const state = try allocator.create(spec_state); 49 | switch (@typeInfo(spec_state)) { 50 | .Struct => state.* = std.mem.zeroInit(spec_state, .{}), 51 | else => state.* = std.mem.zeroes(spec_state), 52 | } 53 | return OperatorState.of(spec_state, state, spec_deinit); 54 | } 55 | fn run(op: Plan.Operator, state: *anyopaque, exec: *Executor, op_index: u32) Error!bool { 56 | const op1 = @field(op, @tagName(impl_spec[0])); 57 | const state1 = @as(*spec_state, @ptrCast(@alignCast(state))); 58 | return spec_run(op1, state1, exec, op_index); 59 | } 60 | }; 61 | impls.put(spec_tag, OperatorImpl{ .init = &Impl.init, .run = &Impl.run }); 62 | } 63 | 64 | break :blk impls; 65 | }; 66 | 67 | /// Type-erased implementation of an operator. 68 | const OperatorImpl = struct { 69 | init: *const fn (allocator: Allocator) Allocator.Error!OperatorState, 70 | run: *const fn (op: Plan.Operator, state: *anyopaque, exec: *Executor, op_index: u32) Error!bool, 71 | }; 72 | 73 | /// Type-erased state attached to a query plan operator while it is running. 
74 | const OperatorState = struct { 75 | ptr: ?*anyopaque, 76 | destroy: *const fn (self: *anyopaque, allocator: Allocator) void, 77 | 78 | fn of(comptime T: type, ptr: *T, comptime deinit: ?*const fn (self: *T, allocator: Allocator) void) OperatorState { 79 | return .{ 80 | .ptr = ptr, 81 | .destroy = &struct { 82 | fn opaque_destroy(self: *anyopaque, allocator: Allocator) void { 83 | const state: *T = @ptrCast(@alignCast(self)); 84 | if (deinit) |func| { 85 | func(state, allocator); 86 | } 87 | allocator.destroy(state); 88 | } 89 | }.opaque_destroy, 90 | }; 91 | } 92 | }; 93 | 94 | /// Error type returned by running a query plan. 95 | pub const Error = storage.Error || error{ 96 | MalformedPlan, 97 | WrongType, 98 | }; 99 | 100 | /// State corresponding to a query plan while it is executing. 101 | pub const Executor = struct { 102 | plan: *const Plan, 103 | txn: storage.Transaction, 104 | 105 | /// State for each operator in the plan. 106 | states: []OperatorState, 107 | 108 | /// Value assignments, implicitly represents the current row. 109 | assignments: []Value, 110 | 111 | /// Whether the implicit "initial operator" has returned yet. 112 | init_op: bool, 113 | 114 | /// Create a new executor for the given plan, within a storage transaction. 115 | pub fn init(plan: *const Plan, txn: storage.Transaction) !Executor { 116 | var exec = try init1(plan, txn); 117 | errdefer exec.deinit(); 118 | for (plan.ops.items, 0..) 
|_, i| { 119 | try exec.resetState(@intCast(i)); 120 | } 121 | return exec; 122 | } 123 | 124 | fn init1(plan: *const Plan, txn: storage.Transaction) !Executor { 125 | const idents = plan.idents(); 126 | const assignments = try txn.allocator.alloc(Value, idents); 127 | errdefer txn.allocator.free(assignments); 128 | for (assignments) |*a| a.* = .null; 129 | 130 | const states = try txn.allocator.alloc(OperatorState, plan.ops.items.len); 131 | for (states) |*state| { 132 | state.* = .{ .ptr = null, .destroy = undefined }; 133 | } 134 | 135 | return .{ 136 | .plan = plan, 137 | .txn = txn, 138 | .states = states, 139 | .assignments = assignments, 140 | .init_op = false, 141 | }; 142 | } 143 | 144 | pub fn deinit(self: *Executor) void { 145 | for (self.states) |s| { 146 | if (s.ptr) |p| { 147 | s.destroy(p, self.txn.allocator); 148 | } 149 | } 150 | self.txn.allocator.free(self.states); 151 | for (self.assignments) |*v| { 152 | v.deinit(self.txn.allocator); 153 | } 154 | self.txn.allocator.free(self.assignments); 155 | self.* = undefined; 156 | } 157 | 158 | /// Reset the state of the operator in the given index in the plan. 159 | pub fn resetState(self: *Executor, op_index: u32) Allocator.Error!void { 160 | var state = &self.states[op_index]; 161 | if (state.ptr) |p| { 162 | state.destroy(p, self.txn.allocator); 163 | state.* = .{ .ptr = null, .destroy = undefined }; 164 | } 165 | if (operator_impls.get(self.plan.ops.items[op_index])) |impl| { 166 | state.* = try impl.init(self.txn.allocator); 167 | } 168 | } 169 | 170 | /// Reset the state of all operators in the given range in the plan. Useful 171 | /// for resetting a subquery in a join. 
172 | pub fn resetStateRange(self: *Executor, start_index: u32, end_index: u32) Allocator.Error!void { 173 | std.debug.assert(start_index <= end_index); 174 | var i = start_index; 175 | while (i < end_index) : (i += 1) { 176 | try self.resetState(i); 177 | } 178 | } 179 | 180 | /// Run the operators before the given index in the plan. 181 | /// 182 | /// Returns false if the set of rows is exhausted for this operator. This is 183 | /// similar to an iterator API, but the actual values are stored in the 184 | /// executor's assignment buffer. 185 | /// 186 | /// After returning false, next() should not be called on the same operator 187 | /// again until its state has been reset. 188 | pub fn next(self: *Executor, end_index: u32) Error!bool { 189 | if (end_index > self.plan.ops.items.len) { 190 | std.debug.panic("operator end_index out of bounds: {d}", .{end_index}); 191 | } else if (end_index == 0) { 192 | const initialized = self.init_op; 193 | self.init_op = true; 194 | return !initialized; 195 | } 196 | 197 | const op_index = end_index - 1; 198 | const op = self.plan.ops.items[op_index]; 199 | if (operator_impls.get(op)) |impl| { 200 | return impl.run(op, self.states[op_index].ptr.?, self, op_index); 201 | } else { 202 | std.debug.panic("unimplemented operator {s}", .{@tagName(op)}); 203 | } 204 | } 205 | 206 | /// Return the next row from the plan, or false if the plan is exhausted. 207 | pub fn run(self: *Executor) Error!?Result { 208 | const has_next = try self.next(@intCast(self.plan.ops.items.len)); 209 | if (!has_next) return null; 210 | var values = try self.txn.allocator.alloc(Value, self.plan.results.items.len); 211 | for (values) |*v| v.* = .null; 212 | errdefer { 213 | for (values) |*v| { 214 | v.deinit(self.txn.allocator); 215 | } 216 | self.txn.allocator.free(values); 217 | } 218 | for (self.plan.results.items, 0..) 
|r, i| { 219 | values[i] = try self.assignments[r].dupe(self.txn.allocator); 220 | } 221 | return .{ .values = values }; 222 | } 223 | }; 224 | 225 | pub const Result = struct { 226 | /// A single row in the set of results from an executed query. 227 | values: []Value, 228 | 229 | pub fn deinit(self: *Result, allocator: Allocator) void { 230 | for (self.values) |*v| { 231 | v.deinit(allocator); 232 | } 233 | allocator.free(self.values); 234 | self.* = undefined; 235 | } 236 | }; 237 | 238 | test Executor { 239 | var tmp = test_helpers.tmp(); 240 | defer tmp.cleanup(); 241 | 242 | const store = try tmp.store("test.db"); 243 | defer store.db.close(); 244 | 245 | const txn = store.txn(); 246 | defer txn.close(); 247 | 248 | // Run an empty plan. 249 | const plan = Plan{}; 250 | var exec = try Executor.init(&plan, txn); 251 | defer exec.deinit(); 252 | try std.testing.expect(try exec.run() != null); 253 | try std.testing.expect(try exec.run() == null); 254 | } 255 | 256 | /// Evaluate an expression given assignments. 
257 | pub fn evaluate(exp: Plan.Exp, assignments: []const Value, allocator: Allocator) Allocator.Error!Value { 258 | return switch (exp) { 259 | .literal => |v| v.dupe(allocator), 260 | .ident => |i| assignments[i].dupe(allocator), 261 | .parameter => |_| std.debug.panic("parameters not implemented yet", .{}), 262 | .binop => |binop| { 263 | var lhs = try evaluate(binop.left, assignments, allocator); 264 | defer lhs.deinit(allocator); 265 | var rhs = try evaluate(binop.right, assignments, allocator); 266 | defer rhs.deinit(allocator); 267 | return switch (binop.op) { 268 | .add => try lhs.add(rhs, allocator), 269 | .sub => lhs.sub(rhs), 270 | .eql => .{ .bool = lhs.eql(rhs) }, 271 | .neq => .{ .bool = !lhs.eql(rhs) }, 272 | }; 273 | }, 274 | }; 275 | } 276 | 277 | test evaluate { 278 | const allocator = std.testing.allocator; 279 | 280 | try std.testing.expectEqual( 281 | Value{ .int64 = 12 }, 282 | try evaluate(Plan.Exp{ .literal = Value{ .int64 = 12 } }, &.{}, allocator), 283 | ); 284 | 285 | try std.testing.expectEqual( 286 | Value{ .int64 = 13 }, 287 | try evaluate(Plan.Exp{ .ident = 0 }, &.{Value{ .int64 = 13 }}, allocator), 288 | ); 289 | 290 | var bop = Plan.BinopExp{ 291 | .op = Plan.Binop.sub, 292 | .left = Plan.Exp{ .literal = Value{ .int64 = 500 } }, 293 | .right = Plan.Exp{ .ident = 0 }, 294 | }; 295 | try std.testing.expectEqual( 296 | Value{ .int64 = 420 }, 297 | try evaluate(Plan.Exp{ .binop = &bop }, &.{Value{ .int64 = 80 }}, allocator), 298 | ); 299 | } 300 | -------------------------------------------------------------------------------- /src/executor/join_ops.zig: -------------------------------------------------------------------------------- 1 | //! Operators implemented in this file take a subquery, starting with 'Begin'. 
2 | 3 | const std = @import("std"); 4 | const Allocator = std.mem.Allocator; 5 | 6 | const executor = @import("../executor.zig"); 7 | const Plan = @import("../Plan.zig"); 8 | 9 | /// Subquery execution will eventually trickle down to a 'Begin' operator, which 10 | /// needs to return true exactly once to act as the start of a query. 11 | pub fn runBegin(_: void, state: *bool, _: *executor.Executor, _: u32) !bool { 12 | if (state.*) return false; 13 | state.* = true; 14 | return true; 15 | } 16 | 17 | pub const JoinState = enum(u8) { 18 | left = 0, 19 | right, 20 | }; 21 | 22 | pub fn runJoin(_: void, state: *JoinState, exec: *executor.Executor, op_index: u32) !bool { 23 | const j = exec.plan.subqueryBegin(op_index) orelse return error.MalformedPlan; 24 | 25 | while (true) { 26 | switch (state.*) { 27 | .left => { 28 | if (!try exec.next(j)) return false; 29 | try exec.resetStateRange(j, op_index); 30 | state.* = .right; 31 | }, 32 | .right => { 33 | if (try exec.next(op_index)) { 34 | // Return this row, but stay on the right subquery. 35 | return true; 36 | } else { 37 | state.* = .left; 38 | } 39 | }, 40 | } 41 | } 42 | } 43 | 44 | pub fn runSemiJoin(_: void, _: *void, exec: *executor.Executor, op_index: u32) !bool { 45 | const j = exec.plan.subqueryBegin(op_index) orelse return error.MalformedPlan; 46 | 47 | while (true) { 48 | // Fetch a row from the left subquery. 49 | if (!try exec.next(j)) 50 | return false; 51 | 52 | // Then reset the right subquery and try to fetch at least one row. 53 | try exec.resetStateRange(j, op_index); 54 | if (try exec.next(op_index)) 55 | return true; 56 | } 57 | } 58 | 59 | /// This returns all values from the left subquery, then all values from the 60 | /// right subquery. 
61 | pub fn runUnionAll(_: void, state: *bool, exec: *executor.Executor, op_index: u32) !bool { 62 | if (!state.*) { 63 | const j = exec.plan.subqueryBegin(op_index) orelse return error.MalformedPlan; 64 | const has_next_left = try exec.next(j); 65 | if (has_next_left) { 66 | return true; 67 | } 68 | state.* = true; // We finished the left subquery, move on below 69 | } 70 | return try exec.next(op_index); 71 | } 72 | -------------------------------------------------------------------------------- /src/executor/modify_ops.zig: -------------------------------------------------------------------------------- 1 | //! Execute query plan operators that modify graph data: nodes and edges. 2 | 3 | const std = @import("std"); 4 | const Allocator = std.mem.Allocator; 5 | 6 | const executor = @import("../executor.zig"); 7 | const Plan = @import("../Plan.zig"); 8 | const types = @import("../types.zig"); 9 | 10 | pub fn runInsertNode(op: Plan.InsertNode, _: *void, exec: *executor.Executor, op_index: u32) !bool { 11 | if (!try exec.next(op_index)) return false; 12 | 13 | var node = types.Node{ .id = types.ElementId.generate() }; 14 | defer node.deinit(exec.txn.allocator); 15 | 16 | for (op.labels.items) |label| { 17 | try node.labels.put(exec.txn.allocator, label, void{}); 18 | } 19 | node.properties = try evaluateProperties(op.properties, exec.assignments, exec.txn.allocator); 20 | 21 | try exec.txn.putNode(node); 22 | if (op.ident) |ident| { 23 | exec.assignments[ident] = .{ .node_ref = node.id }; 24 | } 25 | return true; 26 | } 27 | 28 | pub fn runInsertEdge(op: Plan.InsertEdge, _: *void, exec: *executor.Executor, op_index: u32) !bool { 29 | if (!try exec.next(op_index)) return false; 30 | 31 | const src_id = switch (exec.assignments[op.ident_src]) { 32 | .node_ref => |n| n, 33 | else => return executor.Error.WrongType, 34 | }; 35 | const dest_id = switch (exec.assignments[op.ident_dest]) { 36 | .node_ref => |n| n, 37 | else => return executor.Error.WrongType, 38 | }; 39 | 
40 | var edge = types.Edge{ 41 | .id = types.ElementId.generate(), 42 | .endpoints = .{ src_id, dest_id }, 43 | .directed = op.directed, 44 | }; 45 | defer edge.deinit(exec.txn.allocator); 46 | 47 | for (op.labels.items) |label| { 48 | try edge.labels.put(exec.txn.allocator, label, void{}); 49 | } 50 | edge.properties = try evaluateProperties(op.properties, exec.assignments, exec.txn.allocator); 51 | 52 | try exec.txn.putEdge(edge); 53 | if (op.ident) |ident| { 54 | exec.assignments[ident] = .{ .edge_ref = edge.id }; 55 | } 56 | return true; 57 | } 58 | 59 | /// Evaluate property expressions in a query plan. 60 | fn evaluateProperties( 61 | properties: Plan.Properties, 62 | assignments: []const types.Value, 63 | allocator: Allocator, 64 | ) Allocator.Error!std.StringArrayHashMapUnmanaged(types.Value) { 65 | var ret: std.StringArrayHashMapUnmanaged(types.Value) = .{}; 66 | errdefer types.freeProperties(allocator, &ret); 67 | for (properties.items(.key), properties.items(.value)) |k, v| { 68 | const key = try allocator.dupe(u8, k); 69 | errdefer allocator.free(key); 70 | var value = try executor.evaluate(v, assignments, allocator); 71 | errdefer value.deinit(allocator); 72 | try ret.put(allocator, key, value); 73 | } 74 | return ret; 75 | } 76 | -------------------------------------------------------------------------------- /src/executor/scan_ops.zig: -------------------------------------------------------------------------------- 1 | const std = @import("std"); 2 | const Allocator = std.mem.Allocator; 3 | 4 | const executor = @import("../executor.zig"); 5 | const storage = @import("../storage.zig"); 6 | const Plan = @import("../Plan.zig"); 7 | const types = @import("../types.zig"); 8 | 9 | const test_helpers = @import("../test_helpers.zig"); 10 | 11 | pub const NodeScanState = struct { 12 | it: ?storage.ScanIterator(types.Node), 13 | 14 | pub fn deinit(self: *NodeScanState, _: Allocator) void { 15 | if (self.it) |it| it.close(); 16 | self.* = undefined; 17 | } 18 | 
}; 19 | 20 | pub fn runNodeScan(op: Plan.Scan, state: *NodeScanState, exec: *executor.Executor, op_index: u32) !bool { 21 | if (state.it == null) { 22 | const has_next = try exec.next(op_index); 23 | if (!has_next) return false; 24 | state.it = try exec.txn.iterateNodes(); 25 | } 26 | const it = &state.it.?; 27 | while (true) { 28 | var next_node: types.Node = try it.next() orelse return false; 29 | defer next_node.deinit(exec.txn.allocator); 30 | 31 | if (op.label == null or next_node.labels.get(op.label.?) != null) { 32 | exec.assignments[op_index] = types.Value{ .node_ref = next_node.id }; 33 | return true; 34 | } 35 | } 36 | } 37 | 38 | pub const EdgeScanState = struct { 39 | it: ?storage.ScanIterator(types.Edge), 40 | 41 | pub fn deinit(self: *EdgeScanState, _: Allocator) void { 42 | if (self.it) |it| it.close(); 43 | self.* = undefined; 44 | } 45 | }; 46 | 47 | pub fn runEdgeScan(op: Plan.Scan, state: *EdgeScanState, exec: *executor.Executor, op_index: u32) !bool { 48 | if (state.it == null) { 49 | const has_next = try exec.next(op_index); 50 | if (!has_next) return false; 51 | state.it = try exec.txn.iterateEdges(); 52 | } 53 | const it = &state.it.?; 54 | while (true) { 55 | var next_edge: types.Edge = try it.next() orelse return false; 56 | defer next_edge.deinit(exec.txn.allocator); 57 | 58 | if (op.label == null or next_edge.labels.get(op.label.?) 
!= null) { 59 | exec.assignments[op_index] = types.Value{ .edge_ref = next_edge.id }; 60 | return true; 61 | } 62 | } 63 | } 64 | 65 | test "node scan" { 66 | var tmp = test_helpers.tmp(); 67 | defer tmp.cleanup(); 68 | 69 | const store = try tmp.store("test.db"); 70 | defer store.db.close(); 71 | 72 | const txn = store.txn(); 73 | defer txn.close(); 74 | 75 | const allocator = std.testing.allocator; 76 | var plan = Plan{}; 77 | defer plan.deinit(allocator); 78 | 79 | try plan.results.append(allocator, 0); 80 | try plan.ops.append(allocator, Plan.Operator{ 81 | .node_scan = Plan.Scan{ 82 | .ident = 0, 83 | .label = null, 84 | }, 85 | }); 86 | 87 | { 88 | // Currently, there are no nodes in the graph to scan through. 89 | var exec = try executor.Executor.init(&plan, txn); 90 | defer exec.deinit(); 91 | try std.testing.expect(try exec.run() == null); 92 | } 93 | 94 | const n = types.Node{ .id = types.ElementId.generate() }; 95 | try txn.putNode(n); 96 | 97 | { 98 | // There is now one node. 
99 | var exec = try executor.Executor.init(&plan, txn); 100 | defer exec.deinit(); 101 | var result = try exec.run() orelse unreachable; 102 | defer result.deinit(allocator); 103 | try std.testing.expectEqual(n.id, result.values[0].node_ref); 104 | try std.testing.expect(try exec.run() == null); 105 | } 106 | } 107 | -------------------------------------------------------------------------------- /src/executor/simple_ops.zig: -------------------------------------------------------------------------------- 1 | const std = @import("std"); 2 | const Allocator = std.mem.Allocator; 3 | 4 | const executor = @import("../executor.zig"); 5 | const Plan = @import("../Plan.zig"); 6 | 7 | pub fn runNodeById(op: Plan.LookupId, _: *void, exec: *executor.Executor, op_index: u32) !bool { 8 | if (!try exec.next(op_index)) return false; 9 | switch (exec.assignments[op.ident_id]) { 10 | .id => |id| { 11 | var node = try exec.txn.getNode(id); 12 | if (node == null) 13 | return false; 14 | node.?.deinit(exec.txn.allocator); 15 | exec.assignments[op.ident_ref] = .{ .node_ref = id }; 16 | return true; 17 | }, 18 | else => return false, // Type error 19 | } 20 | } 21 | 22 | pub fn runEdgeById(op: Plan.LookupId, _: *void, exec: *executor.Executor, op_index: u32) !bool { 23 | if (!try exec.next(op_index)) return false; 24 | switch (exec.assignments[op.ident_id]) { 25 | .id => |id| { 26 | var edge = try exec.txn.getEdge(id); 27 | if (edge == null) 28 | return false; 29 | edge.?.deinit(exec.txn.allocator); 30 | exec.assignments[op.ident_ref] = .{ .edge_ref = id }; 31 | return true; 32 | }, 33 | else => return false, // Type error 34 | } 35 | } 36 | 37 | pub fn runAnti(_: void, state: *bool, exec: *executor.Executor, op_index: u32) !bool { 38 | // Anti only returns up to one row, so we keep track of this with the state. 39 | if (state.*) return false; 40 | state.* = true; 41 | 42 | // Return a row if and only if there are no rows. 
43 | return !(try exec.next(op_index)); 44 | } 45 | 46 | pub fn runProject(op: std.ArrayListUnmanaged(Plan.ProjectClause), _: *void, exec: *executor.Executor, op_index: u32) !bool { 47 | if (!try exec.next(op_index)) return false; 48 | for (op.items) |clause| { 49 | // This allows later assignment clauses to depend on earlier ones in the list. 50 | const new_value = try executor.evaluate(clause.exp, exec.assignments, exec.txn.allocator); 51 | exec.assignments[clause.ident].deinit(exec.txn.allocator); 52 | exec.assignments[clause.ident] = new_value; 53 | } 54 | return true; 55 | } 56 | 57 | pub fn runEmptyResult(_: void, _: *void, exec: *executor.Executor, op_index: u32) !bool { 58 | // Consume all results, and then do not return them. 59 | while (try exec.next(op_index)) {} 60 | return false; 61 | } 62 | 63 | pub fn runFilter(op: std.ArrayListUnmanaged(Plan.FilterClause), _: *void, exec: *executor.Executor, op_index: u32) !bool { 64 | filter: while (true) { 65 | if (!try exec.next(op_index)) return false; 66 | 67 | for (op.items) |clause| { 68 | switch (clause) { 69 | .bool_exp => |exp| { 70 | var value = try executor.evaluate(exp, exec.assignments, exec.txn.allocator); 71 | defer value.deinit(exec.txn.allocator); 72 | if (!value.truthy()) { 73 | continue :filter; 74 | } 75 | }, 76 | .ident_label => |ident_label| { 77 | // Must be a reference, otherwise we return a type error. 
78 | switch (exec.assignments[ident_label.ident]) { 79 | .node_ref => |node_id| { 80 | var node = try exec.txn.getNode(node_id) orelse continue :filter; 81 | defer node.deinit(exec.txn.allocator); 82 | node.labels.get(ident_label.label) orelse continue :filter; 83 | }, 84 | .edge_ref => |edge_id| { 85 | var edge = try exec.txn.getEdge(edge_id) orelse continue :filter; 86 | defer edge.deinit(exec.txn.allocator); 87 | edge.labels.get(ident_label.label) orelse continue :filter; 88 | }, 89 | else => return executor.Error.WrongType, 90 | } 91 | }, 92 | } 93 | } 94 | 95 | // If we got to this point, all filter clauses have passed. 96 | return true; 97 | } 98 | } 99 | 100 | pub fn runLimit(op: u64, state: *u64, exec: *executor.Executor, op_index: u32) !bool { 101 | if (state.* >= op) { 102 | return false; 103 | } else { 104 | state.* += 1; 105 | return try exec.next(op_index); 106 | } 107 | } 108 | 109 | pub fn runSkip(op: u64, state: *bool, exec: *executor.Executor, op_index: u32) !bool { 110 | if (!state.*) { 111 | state.* = true; // Mark this as having done the skip 112 | for (0..op) |_| { 113 | if (!try exec.next(op_index)) { 114 | return false; 115 | } 116 | } 117 | } 118 | return try exec.next(op_index); 119 | } 120 | -------------------------------------------------------------------------------- /src/executor/step_ops.zig: -------------------------------------------------------------------------------- 1 | const std = @import("std"); 2 | const Allocator = std.mem.Allocator; 3 | 4 | const executor = @import("../executor.zig"); 5 | const storage = @import("../storage.zig"); 6 | const Plan = @import("../Plan.zig"); 7 | const types = @import("../types.zig"); 8 | 9 | const test_helpers = @import("../test_helpers.zig"); 10 | 11 | const StepFsm = enum { 12 | init, 13 | iter_out_before_in, // Needs a separate state because they are noncontiguous. 
14 | iterating, 15 | }; 16 | 17 | pub const StepState = struct { 18 | fsm: StepFsm = .init, 19 | it: ?storage.AdjIterator, 20 | 21 | pub fn deinit(self: *StepState, _: Allocator) void { 22 | if (self.it) |it| it.close(); 23 | self.* = undefined; 24 | } 25 | }; 26 | 27 | pub fn runStep(op: Plan.Step, state: *StepState, exec: *executor.Executor, op_index: u32) !bool { 28 | while (true) { 29 | if (state.fsm == .iterating or state.fsm == .iter_out_before_in) { 30 | // We are processing an existing iterator. 31 | var it = state.it.?; 32 | while (try it.next()) |entry| { 33 | var ok = true; 34 | 35 | // Check that the label matches, if needed. 36 | if (op.edge_label) |expected_label| { 37 | var edge = try exec.txn.getEdge(entry.edge_id); 38 | if (edge != null) { 39 | defer edge.?.deinit(exec.txn.allocator); 40 | if (!edge.?.labels.contains(expected_label)) { 41 | ok = false; 42 | } 43 | } else ok = false; 44 | } 45 | 46 | if (ok) { 47 | // We've found a matching edge. 48 | if (op.ident_edge) |i| exec.assignments[i] = .{ .edge_ref = entry.edge_id }; 49 | if (op.ident_dest) |i| exec.assignments[i] = .{ .node_ref = entry.dest_node_id }; 50 | return true; 51 | } 52 | } 53 | 54 | // We've finished iterating, exhausting this branch. 55 | it.close(); 56 | state.it = null; 57 | 58 | switch (state.fsm) { 59 | .iterating => state.fsm = .init, 60 | .iter_out_before_in => { 61 | state.fsm = .iterating; 62 | // Set up the next iterator to EdgeInOut.in direction. 63 | switch (exec.assignments[op.ident_src]) { 64 | .node_ref => |src_node_id| { 65 | state.it = try exec.txn.iterateAdj(src_node_id, .in, .in); 66 | }, 67 | // Type error, this should never happen. 68 | else => return false, 69 | } 70 | }, 71 | else => unreachable, 72 | } 73 | } else { 74 | std.debug.assert(state.fsm == .init); 75 | // Grab the next source node from the previous operator. 
76 | const has_next = try exec.next(op_index); 77 | if (!has_next) return false; 78 | 79 | state.fsm = .iterating; 80 | const min_inout: types.EdgeInOut, const max_inout: types.EdgeInOut = switch (op.direction) { 81 | .left => .{ .in, .in }, 82 | .right => .{ .out, .out }, 83 | .undirected => .{ .simple, .simple }, 84 | .left_or_undirected => .{ .simple, .in }, 85 | .right_or_undirected => .{ .out, .simple }, 86 | .left_or_right => blk: { 87 | // Special case: there are two iterators to run, use this state. 88 | state.fsm = .iter_out_before_in; 89 | break :blk .{ .out, .out }; 90 | }, 91 | .any => .{ .out, .in }, 92 | }; 93 | switch (exec.assignments[op.ident_src]) { 94 | .node_ref => |src_node_id| { 95 | state.it = try exec.txn.iterateAdj(src_node_id, min_inout, max_inout); 96 | }, 97 | // Type error reading the type of ident_src. 98 | else => return false, 99 | } 100 | } 101 | } 102 | } 103 | 104 | test "triangle steps" { 105 | var tmp = test_helpers.tmp(); 106 | defer tmp.cleanup(); 107 | 108 | const store = try tmp.store("test.db"); 109 | defer store.db.close(); 110 | 111 | const txn = store.txn(); 112 | defer txn.close(); 113 | 114 | const allocator = std.testing.allocator; 115 | var plan = Plan{}; 116 | defer plan.deinit(allocator); 117 | 118 | try plan.results.appendSlice(allocator, &[_]u16{ 0, 1, 2 }); 119 | try plan.ops.append(allocator, Plan.Operator{ 120 | .node_scan = Plan.Scan{ 121 | .ident = 0, 122 | .label = null, 123 | }, 124 | }); 125 | try plan.ops.append(allocator, Plan.Operator{ 126 | .step = Plan.Step{ 127 | .ident_src = 0, 128 | .ident_edge = 1, 129 | .ident_dest = 2, 130 | .direction = .right, 131 | .edge_label = null, 132 | }, 133 | }); 134 | 135 | const n1 = types.Node{ .id = .{ .value = 1 } }; 136 | const n2 = types.Node{ .id = .{ .value = 2 } }; 137 | const n3 = types.Node{ .id = .{ .value = 3 } }; 138 | try txn.putNode(n1); 139 | try txn.putNode(n2); 140 | try txn.putNode(n3); 141 | 142 | { 143 | // No edges found. 
144 | var exec = try executor.Executor.init(&plan, txn); 145 | defer exec.deinit(); 146 | try std.testing.expect(try exec.run() == null); 147 | } 148 | 149 | const e1 = types.Edge{ 150 | .id = .{ .value = 11 }, 151 | .endpoints = .{ n1.id, n2.id }, 152 | .directed = true, 153 | }; 154 | try txn.putEdge(e1); 155 | 156 | { 157 | // There is now one directed edge. 158 | var exec = try executor.Executor.init(&plan, txn); 159 | defer exec.deinit(); 160 | var result = try exec.run() orelse unreachable; 161 | defer result.deinit(allocator); 162 | try std.testing.expectEqual(n1.id, result.values[0].node_ref); 163 | try std.testing.expectEqual(e1.id, result.values[1].edge_ref); 164 | try std.testing.expectEqual(n2.id, result.values[2].node_ref); 165 | try std.testing.expect(try exec.run() == null); 166 | } 167 | 168 | const e2 = types.Edge{ 169 | .id = .{ .value = 12 }, 170 | .endpoints = .{ n2.id, n3.id }, 171 | .directed = true, 172 | }; 173 | try txn.putEdge(e2); 174 | 175 | { 176 | // We should see two edges now. 177 | var exec = try executor.Executor.init(&plan, txn); 178 | defer exec.deinit(); 179 | var result = try exec.run() orelse unreachable; 180 | defer result.deinit(allocator); 181 | var result2 = try exec.run() orelse unreachable; 182 | defer result2.deinit(allocator); 183 | try std.testing.expect(try exec.run() == null); 184 | } 185 | 186 | // Try doing a two-edge traversal. 
187 | try plan.results.appendSlice(allocator, &[_]u16{ 3, 4 }); 188 | try plan.ops.append(allocator, Plan.Operator{ 189 | .step = Plan.Step{ 190 | .ident_src = 2, 191 | .ident_edge = 3, 192 | .ident_dest = 4, 193 | .direction = .right, 194 | .edge_label = null, 195 | }, 196 | }); 197 | 198 | { 199 | var exec = try executor.Executor.init(&plan, txn); 200 | defer exec.deinit(); 201 | var result = try exec.run() orelse unreachable; 202 | defer result.deinit(allocator); 203 | try std.testing.expectEqual(n1.id, result.values[0].node_ref); 204 | try std.testing.expectEqual(e1.id, result.values[1].edge_ref); 205 | try std.testing.expectEqual(n2.id, result.values[2].node_ref); 206 | try std.testing.expectEqual(e2.id, result.values[3].edge_ref); 207 | try std.testing.expectEqual(n3.id, result.values[4].node_ref); 208 | try std.testing.expect(try exec.run() == null); 209 | } 210 | } 211 | -------------------------------------------------------------------------------- /src/graphon.zig: -------------------------------------------------------------------------------- 1 | //! Graphon is a very small graph database. 2 | 3 | comptime { // Trigger tests to run on these modules. 
4 | _ = @import("executor.zig"); 5 | _ = @import("Plan.zig"); 6 | _ = @import("storage.zig"); 7 | _ = @import("tokenizer.zig"); 8 | _ = @import("types.zig"); 9 | } 10 | -------------------------------------------------------------------------------- /src/main.zig: -------------------------------------------------------------------------------- 1 | const std = @import("std"); 2 | const allocator = std.heap.c_allocator; 3 | 4 | const rocksdb = @import("storage/rocksdb.zig"); 5 | const graphon = @import("graphon.zig"); 6 | 7 | fn rocksdb_insert_perf() !void { 8 | const db = try rocksdb.DB.open("/tmp/graphon"); 9 | defer db.close(); 10 | std.debug.print("opened database at /tmp/graphon\n", .{}); 11 | 12 | const n_keys = 2_000_000; 13 | const size_of_key = 128; 14 | 15 | var prng = std.rand.DefaultPrng.init(0); 16 | const rand = prng.random(); 17 | var buf: [size_of_key]u8 = undefined; 18 | var timer = try std.time.Timer.start(); 19 | var total_time: u64 = 0; 20 | for (0..n_keys) |i| { 21 | if ((i + 1) % 100_000 == 0) { 22 | const elapsed = timer.lap(); 23 | std.debug.print("putting key {d} / lap {}\n", .{ i + 1, std.fmt.fmtDuration(elapsed) }); 24 | total_time += elapsed; 25 | } 26 | rand.bytes(buf[0..]); 27 | try db.put(.default, buf[0..], buf[0..]); 28 | } 29 | total_time += timer.lap(); 30 | std.debug.print("total time: {}\n", .{std.fmt.fmtDuration(total_time)}); 31 | } 32 | 33 | pub fn main() !void { 34 | var args = try std.process.argsWithAllocator(allocator); 35 | defer args.deinit(); 36 | 37 | const C = enum { 38 | help, 39 | rocksdb_insert_perf, 40 | shell, 41 | }; 42 | 43 | _ = args.next(); // skip program name 44 | const command = std.meta.stringToEnum(C, args.next() orelse "help") orelse { 45 | std.debug.print("invalid command\n", .{}); 46 | std.posix.exit(1); 47 | }; 48 | switch (command) { 49 | .help => { 50 | std.debug.print("usage: graphon <command>\n", .{}); 51 | return; 52 | }, 53 | .shell => { 54 | // Open a GQL shell into a temporary database. 
55 | @panic("not implemented"); 56 | }, 57 | .rocksdb_insert_perf => { 58 | try rocksdb_insert_perf(); 59 | }, 60 | } 61 | } 62 | -------------------------------------------------------------------------------- /src/parser_test.zig: -------------------------------------------------------------------------------- 1 | //! GQL parsing tests for Parse.zig 2 | -------------------------------------------------------------------------------- /src/storage.zig: -------------------------------------------------------------------------------- 1 | //! Storage engine built on top of RocksDB. Serializes graph-structured data. 2 | 3 | const std = @import("std"); 4 | const Allocator = std.mem.Allocator; 5 | const builtin = @import("builtin"); 6 | 7 | const rocksdb = @import("storage/rocksdb.zig"); 8 | 9 | const types = @import("types.zig"); 10 | const ElementId = types.ElementId; 11 | const Node = types.Node; 12 | const Edge = types.Edge; 13 | 14 | const test_helpers = @import("test_helpers.zig"); 15 | 16 | pub const Error = rocksdb.Error || error{ 17 | CorruptedIndex, 18 | EdgeDataMismatch, 19 | EndOfStream, 20 | InvalidValueTag, 21 | }; 22 | 23 | /// This is the main storage engine type. 24 | /// 25 | /// Based on a RocksDB backend, this object stores nodes and edges. It is also 26 | /// able to index into the graph data to provide fast lookups, and it is 27 | /// responsible for maintaining consistent indices. 
28 | pub const Storage = struct { 29 | db: rocksdb.DB, 30 | allocator: Allocator = if (builtin.is_test) std.testing.allocator else std.heap.c_allocator, 31 | 32 | pub fn txn(self: Storage) Transaction { 33 | return .{ .inner = self.db.begin(), .allocator = self.allocator }; 34 | } 35 | }; 36 | 37 | fn decodeNode(id: ElementId, allocator: Allocator, slice: []const u8) Error!Node { 38 | var stream = std.io.fixedBufferStream(slice); 39 | const reader = stream.reader(); 40 | 41 | var labels = try types.decodeLabels(allocator, reader); 42 | errdefer labels.deinit(allocator); 43 | var properties = try types.decodeProperties(allocator, reader); 44 | errdefer properties.deinit(allocator); 45 | 46 | return Node{ .id = id, .labels = labels, .properties = properties }; 47 | } 48 | 49 | fn decodeEdge(id: ElementId, allocator: Allocator, slice: []const u8) Error!Edge { 50 | var stream = std.io.fixedBufferStream(slice); 51 | const reader = stream.reader(); 52 | 53 | const endpoints = [2]ElementId{ try ElementId.decode(reader), try ElementId.decode(reader) }; 54 | const directed = try reader.readByte() == 1; 55 | var labels = try types.decodeLabels(allocator, reader); 56 | errdefer labels.deinit(allocator); 57 | var properties = try types.decodeProperties(allocator, reader); 58 | errdefer properties.deinit(allocator); 59 | 60 | return Edge{ 61 | .id = id, 62 | .endpoints = endpoints, 63 | .directed = directed, 64 | .labels = labels, 65 | .properties = properties, 66 | }; 67 | } 68 | 69 | /// An isolated transaction inside the storage engine. 70 | /// 71 | /// This uses RocksDB transactions to implement snapshot isolation using 72 | /// optimistic concurrency control. 73 | pub const Transaction = struct { 74 | inner: rocksdb.Transaction, 75 | allocator: Allocator, 76 | 77 | /// Close the inner transaction object. 
78 | pub fn close(self: Transaction) void { 79 | self.inner.close(); 80 | } 81 | 82 | pub fn commit(self: Transaction) !void { 83 | try self.inner.commit(); 84 | } 85 | 86 | /// Get a node from the storage engine. Returns `null` if not found. 87 | pub fn getNode(self: Transaction, id: ElementId) !?Node { 88 | const value = try self.inner.get(.node, &id.toBytes(), false) orelse return null; 89 | defer value.close(); 90 | return try decodeNode(id, self.allocator, value.bytes()); 91 | } 92 | 93 | /// Get an edge from the storage engine. Returns `null` if not found. 94 | pub fn getEdge(self: Transaction, id: ElementId) !?Edge { 95 | const value = try self.inner.get(.edge, &id.toBytes(), false) orelse return null; 96 | defer value.close(); 97 | return try decodeEdge(id, self.allocator, value.bytes()); 98 | } 99 | 100 | /// Put a node into the storage engine. 101 | pub fn putNode(self: Transaction, node: Node) !void { 102 | var list = std.ArrayList(u8).init(self.allocator); 103 | defer list.deinit(); 104 | 105 | const writer = list.writer(); 106 | try types.encodeLabels(node.labels, writer); 107 | try types.encodeProperties(node.properties, writer); 108 | try self.inner.put(.node, &node.id.toBytes(), list.items); 109 | } 110 | 111 | /// Put an edge into the storage engine. 112 | pub fn putEdge(self: Transaction, edge: Edge) !void { 113 | var list = std.ArrayList(u8).init(self.allocator); 114 | defer list.deinit(); 115 | 116 | // Note: We call get() on each node to trigger transaction conflicts. 117 | // This is important if a node is deleted. 
118 | for (edge.endpoints) |id| { 119 | const value = try self.inner.get(.node, &id.toBytes(), false) orelse return Error.NotFound; 120 | value.close(); 121 | } 122 | 123 | var already_exists = false; 124 | var old_edge_opt = try self.getEdge(edge.id); 125 | if (old_edge_opt) |*old_edge| { 126 | defer old_edge.deinit(self.allocator); 127 | if (old_edge.endpoints[0].value != edge.endpoints[0].value or 128 | old_edge.endpoints[1].value != edge.endpoints[1].value or 129 | old_edge.directed != edge.directed) 130 | { 131 | return Error.EdgeDataMismatch; 132 | } 133 | already_exists = true; 134 | } 135 | 136 | // Add the actual edge into the database. 137 | const writer = list.writer(); 138 | try edge.endpoints[0].encode(writer); 139 | try edge.endpoints[1].encode(writer); 140 | try writer.writeByte(@intFromBool(edge.directed)); 141 | try types.encodeLabels(edge.labels, writer); 142 | try types.encodeProperties(edge.properties, writer); 143 | try self.inner.put(.edge, &edge.id.toBytes(), list.items); 144 | 145 | // Update adjacency lists for the two endpoints. 146 | if (!already_exists) { 147 | const adj = AdjEntry.fromEdge(edge); 148 | try self.inner.put(.adj, &adj.packIntoKey(), &edge.endpoints[1].toBytes()); 149 | try self.inner.put(.adj, &adj.reverse().packIntoKey(), &edge.endpoints[0].toBytes()); 150 | } 151 | } 152 | 153 | /// Remove a node from the storage engine. 154 | pub fn deleteNode(self: Transaction, id: ElementId) !void { 155 | var node = try self.getNode(id) orelse return Error.NotFound; 156 | defer node.deinit(self.allocator); 157 | try self.inner.delete(.node, &id.toBytes()); 158 | 159 | // Update adjacency lists. 160 | var it = try self.iterateAdj(id, .out, .in); 161 | defer it.close(); 162 | while (try it.next()) |entry| { 163 | try self.inner.delete(.adj, &entry.packIntoKey()); 164 | try self.inner.delete(.adj, &entry.reverse().packIntoKey()); 165 | } 166 | } 167 | 168 | /// Remove an edge from the storage engine. 
169 | pub fn deleteEdge(self: Transaction, id: ElementId) !void { 170 | var edge = try self.getEdge(id) orelse return Error.NotFound; 171 | defer edge.deinit(self.allocator); 172 | try self.inner.delete(.edge, &id.toBytes()); 173 | 174 | // Update adjacency lists. 175 | const adj = AdjEntry.fromEdge(edge); 176 | try self.inner.delete(.adj, &adj.packIntoKey()); 177 | try self.inner.delete(.adj, &adj.reverse().packIntoKey()); 178 | } 179 | 180 | /// Iterate over all nodes. 181 | pub fn iterateNodes(self: Transaction) !ScanIterator(types.Node) { 182 | return .{ 183 | .inner = self.inner.iterate(.node, null, null), 184 | .allocator = self.allocator, 185 | ._decode = decodeNode, 186 | }; 187 | } 188 | 189 | /// Iterate over all edges. 190 | pub fn iterateEdges(self: Transaction) !ScanIterator(types.Edge) { 191 | return .{ 192 | .inner = self.inner.iterate(.edge, null, null), 193 | .allocator = self.allocator, 194 | ._decode = decodeEdge, 195 | }; 196 | } 197 | 198 | /// Iterate over a subset of the adjacency list. 199 | /// 200 | /// This function does not access any of the node or edge data, or check 201 | /// that IDs exist. It will not trigger any transaction conflicts. 
202 | pub fn iterateAdj( 203 | self: Transaction, 204 | node_id: ElementId, 205 | min_inout: types.EdgeInOut, 206 | max_inout: types.EdgeInOut, 207 | ) !AdjIterator { 208 | std.debug.assert(@intFromEnum(min_inout) <= @intFromEnum(max_inout)); 209 | var bounds: []u8 = try self.allocator.alloc(u8, 26); 210 | var lower_bound = bounds[0..13]; 211 | var upper_bound = bounds[13..26]; 212 | lower_bound[0..12].* = node_id.toBytes(); 213 | lower_bound[12] = @intFromEnum(min_inout); 214 | upper_bound[0..12].* = node_id.toBytes(); 215 | upper_bound[12] = @intFromEnum(max_inout) + 1; 216 | return .{ 217 | .inner = self.inner.iterate(.adj, lower_bound, upper_bound), 218 | .bounds = bounds, 219 | .allocator = self.allocator, 220 | }; 221 | } 222 | }; 223 | 224 | pub fn ScanIterator(comptime T: type) type { 225 | return struct { 226 | const Self = @This(); 227 | 228 | inner: rocksdb.Iterator, 229 | allocator: Allocator, 230 | _decode: *const fn (ElementId, Allocator, []const u8) Error!T, 231 | 232 | pub fn close(self: Self) void { 233 | self.inner.close(); 234 | } 235 | 236 | pub fn next(self: Self) Error!?T { 237 | if (!self.inner.valid()) return null; 238 | const key = self.inner.key(); 239 | if (key.len != 12) { 240 | return Error.CorruptedIndex; 241 | } 242 | const id = ElementId.fromBytes(key[0..12].*); 243 | const node = try self._decode(id, self.allocator, self.inner.value()); 244 | self.inner.next(); 245 | return node; 246 | } 247 | }; 248 | } 249 | 250 | /// An entry returned by scanning the adjacency list of a node. 
251 | pub const AdjEntry = struct { 252 | src_node_id: ElementId, 253 | inout: types.EdgeInOut, 254 | edge_id: ElementId, 255 | dest_node_id: ElementId, 256 | 257 | pub fn fromEdge(edge: Edge) AdjEntry { 258 | return .{ 259 | .src_node_id = edge.endpoints[0], 260 | .inout = if (edge.directed) .out else .simple, 261 | .edge_id = edge.id, 262 | .dest_node_id = edge.endpoints[1], 263 | }; 264 | } 265 | 266 | pub fn reverse(self: AdjEntry) AdjEntry { 267 | return .{ 268 | .src_node_id = self.dest_node_id, 269 | .inout = switch (self.inout) { 270 | .in => .out, 271 | .out => .in, 272 | .simple => .simple, 273 | }, 274 | .edge_id = self.edge_id, 275 | .dest_node_id = self.src_node_id, 276 | }; 277 | } 278 | 279 | pub fn packIntoKey(self: AdjEntry) [25]u8 { 280 | var key: [25]u8 = undefined; 281 | key[0..12].* = self.src_node_id.toBytes(); 282 | key[12] = @intFromEnum(self.inout); 283 | key[13..25].* = self.edge_id.toBytes(); 284 | return key; 285 | } 286 | 287 | pub fn unpackFromKeyValue(key: []const u8, value: []const u8) !AdjEntry { 288 | if (key.len != 25 or value.len != 12) { 289 | return Error.CorruptedIndex; 290 | } 291 | return .{ 292 | .src_node_id = ElementId.fromBytes(key[0..12].*), 293 | .inout = std.meta.intToEnum(types.EdgeInOut, key[12]) catch return Error.CorruptedIndex, 294 | .edge_id = ElementId.fromBytes(key[13..25].*), 295 | .dest_node_id = ElementId.fromBytes(value[0..12].*), 296 | }; 297 | } 298 | }; 299 | 300 | /// Iterator through the adjacency list of a node. 
301 | pub const AdjIterator = struct { 302 | inner: rocksdb.Iterator, 303 | bounds: []u8, 304 | allocator: Allocator, 305 | 306 | pub fn close(self: AdjIterator) void { 307 | self.inner.close(); 308 | self.allocator.free(self.bounds); 309 | } 310 | 311 | pub fn next(self: *AdjIterator) !?AdjEntry { 312 | if (!self.inner.valid()) return null; 313 | const key = self.inner.key(); 314 | const value = self.inner.value(); 315 | const result = try AdjEntry.unpackFromKeyValue(key, value); 316 | self.inner.next(); 317 | return result; 318 | } 319 | }; 320 | 321 | test "put node and edge" { 322 | var tmp = test_helpers.tmp(); 323 | defer tmp.cleanup(); 324 | const db = try rocksdb.DB.open(tmp.path("test.db")); 325 | defer db.close(); 326 | 327 | const store = Storage{ .db = db }; 328 | const txn = store.txn(); 329 | defer txn.close(); 330 | 331 | const n = Node{ .id = ElementId.generate() }; 332 | const e = Edge{ .id = ElementId.generate(), .endpoints = .{ n.id, n.id }, .directed = false }; 333 | 334 | try txn.putNode(n); 335 | try txn.putEdge(e); 336 | 337 | var n2 = try txn.getNode(n.id) orelse std.debug.panic("n not found", .{}); 338 | defer n2.deinit(txn.allocator); 339 | var e2 = try txn.getEdge(e.id) orelse std.debug.panic("e not found", .{}); 340 | defer e2.deinit(txn.allocator); 341 | 342 | try std.testing.expectEqual(n.id, n2.id); 343 | try std.testing.expectEqual(e.id, e2.id); 344 | try std.testing.expectEqual(e.endpoints, e2.endpoints); 345 | 346 | try txn.deleteNode(n.id); 347 | try txn.deleteEdge(e.id); 348 | try std.testing.expectEqual(null, try txn.getNode(n.id)); 349 | try std.testing.expectEqual(null, try txn.getEdge(e.id)); 350 | } 351 | 352 | test "iterate adjacency" { 353 | var tmp = test_helpers.tmp(); 354 | defer tmp.cleanup(); 355 | const db = try rocksdb.DB.open(tmp.path("test.db")); 356 | defer db.close(); 357 | 358 | const store = Storage{ .db = db }; 359 | 360 | const n1 = Node{ .id = ElementId.generate() }; 361 | const n2 = Node{ .id = 
ElementId.generate() }; 362 | const n3 = Node{ .id = ElementId.generate() }; 363 | const e1 = Edge{ .id = ElementId.generate(), .endpoints = .{ n1.id, n2.id }, .directed = false }; 364 | const e2 = Edge{ .id = ElementId.generate(), .endpoints = .{ n2.id, n3.id }, .directed = false }; 365 | 366 | { 367 | const txn = store.txn(); 368 | defer txn.close(); 369 | 370 | try txn.putNode(n1); 371 | try txn.putNode(n2); 372 | try txn.putNode(n3); 373 | try txn.putEdge(e1); 374 | try txn.putEdge(e2); 375 | try txn.commit(); 376 | } 377 | 378 | const txn = store.txn(); 379 | defer txn.close(); 380 | 381 | { 382 | var it = try txn.iterateAdj(n1.id, .simple, .simple); 383 | defer it.close(); 384 | const entry = (try it.next()).?; 385 | try std.testing.expectEqual(n1.id, entry.src_node_id); 386 | try std.testing.expectEqual(n2.id, entry.dest_node_id); 387 | try std.testing.expectEqual(e1.id, entry.edge_id); 388 | try std.testing.expectEqual(null, try it.next()); 389 | } 390 | 391 | { 392 | var it = try txn.iterateAdj(n2.id, .simple, .simple); 393 | defer it.close(); 394 | try std.testing.expect(try it.next() != null); 395 | try std.testing.expect(try it.next() != null); 396 | try std.testing.expectEqual(null, try it.next()); 397 | } 398 | 399 | { 400 | var it = try txn.iterateAdj(n2.id, .out, .out); 401 | defer it.close(); 402 | try std.testing.expectEqual(null, try it.next()); 403 | } 404 | 405 | // Open a second transaction to delete n2, ensure no interference. 
406 | { 407 | const txn2 = store.txn(); 408 | defer txn2.close(); 409 | try txn2.deleteNode(n2.id); 410 | try txn2.commit(); 411 | } 412 | 413 | var n2_fetch = try txn.getNode(n2.id) orelse std.debug.panic("n2 not found", .{}); 414 | n2_fetch.deinit(txn.allocator); 415 | { 416 | var it = try txn.iterateAdj(n1.id, .simple, .simple); 417 | defer it.close(); 418 | try std.testing.expect(try it.next() != null); 419 | try std.testing.expectEqual(null, try it.next()); 420 | } 421 | 422 | try std.testing.expectEqual(Error.Busy, txn.commit()); 423 | } 424 | -------------------------------------------------------------------------------- /src/storage/rocksdb.zig: -------------------------------------------------------------------------------- 1 | //! Friendly Zig wrapper types for RocksDB's C API. 2 | 3 | const std = @import("std"); 4 | const c = @cImport(@cInclude("rocksdb/c.h")); 5 | const allocator = std.heap.c_allocator; 6 | 7 | const test_helpers = @import("../test_helpers.zig"); 8 | 9 | /// Type for known errors from RocksDB. 10 | /// Based on the `Code` enum defined in `include/rocksdb/status.h`. 11 | /// 12 | /// In addition to errors returned as a string from API methods, we also can 
14 | pub const Error = error{ 15 | NotFound, 16 | Corruption, 17 | NotSupported, 18 | InvalidArgument, 19 | IOError, 20 | MergeInProgress, 21 | Incomplete, 22 | ShutdownInProgress, 23 | TimedOut, 24 | Aborted, 25 | Busy, 26 | Expired, 27 | TryAgain, 28 | CompactionTooLarge, 29 | ColumnFamilyDropped, 30 | UnknownStatus, 31 | OutOfMemory, 32 | }; 33 | 34 | const log = std.log.scoped(.rocksdb); 35 | 36 | inline fn slice_starts_with(slice: []const u8, prefix: []const u8) bool { 37 | if (slice.len < prefix.len) return false; 38 | return std.mem.eql(u8, slice[0..prefix.len], prefix); 39 | } 40 | 41 | test "slice_starts_with" { 42 | const slice = "hello, world!"; 43 | try std.testing.expect(slice_starts_with(slice, "")); 44 | try std.testing.expect(slice_starts_with(slice, "hello")); 45 | try std.testing.expect(!slice_starts_with(slice, "world")); 46 | } 47 | 48 | /// Parse a RocksDB error string into a status, logging it. Consumes the string. 49 | fn parse_rocks_error(err: [*:0]u8) Error { 50 | defer c.rocksdb_free(err); // free the memory when done 51 | log.info("{s}", .{err}); 52 | 53 | const slice = std.mem.span(err); 54 | if (slice.len == 0) return Error.UnknownStatus; 55 | switch (slice[0]) { 56 | 'C' => { 57 | if (slice_starts_with(slice, "Corruption: ")) return Error.Corruption; 58 | if (slice_starts_with(slice, "Compaction too large: ")) return Error.CompactionTooLarge; 59 | if (slice_starts_with(slice, "Column family dropped: ")) return Error.ColumnFamilyDropped; 60 | }, 61 | 'I' => { 62 | if (slice_starts_with(slice, "Invalid argument: ")) return Error.InvalidArgument; 63 | if (slice_starts_with(slice, "IO error: ")) return Error.IOError; 64 | }, 65 | 'M' => { 66 | if (slice_starts_with(slice, "Merge in progress: ")) return Error.MergeInProgress; 67 | }, 68 | 'N' => { 69 | if (slice_starts_with(slice, "NotFound: ")) return Error.NotFound; 70 | if (slice_starts_with(slice, "Not implemented: ")) return Error.NotSupported; 71 | }, 72 | 'O' => { 73 | if 
(slice_starts_with(slice, "Operation timed out: ")) return Error.TimedOut; 74 | if (slice_starts_with(slice, "Operation aborted: ")) return Error.Aborted; 75 | if (slice_starts_with(slice, "Operation expired: ")) return Error.Expired; 76 | if (slice_starts_with(slice, "Operation failed. Try again.: ")) return Error.TryAgain; 77 | }, 78 | 'R' => { 79 | if (slice_starts_with(slice, "Result incomplete: ")) return Error.Incomplete; 80 | if (slice_starts_with(slice, "Resource busy: ")) return Error.Busy; 81 | }, 82 | 'S' => { 83 | if (slice_starts_with(slice, "Shutdown in progress: ")) return Error.ShutdownInProgress; 84 | }, 85 | else => {}, 86 | } 87 | return Error.UnknownStatus; 88 | } 89 | 90 | test "parse error from rocksdb_open" { 91 | const options = c.rocksdb_options_create().?; 92 | defer c.rocksdb_options_destroy(options); 93 | 94 | var err: ?[*:0]u8 = null; 95 | const db = c.rocksdb_open(options, "<~~not@a/valid&file>", &err); 96 | try std.testing.expectEqual(null, db); 97 | 98 | const status = parse_rocks_error(err.?); 99 | try std.testing.expectEqual(Error.IOError, status); 100 | } 101 | 102 | /// Constant set of column families defined for the database. 103 | pub const ColumnFamily = enum(u8) { 104 | /// The default column family, required by RocksDB when opening a database. 105 | /// We keep graph metadata here. 106 | default, 107 | /// Nodes in the graph. 108 | node, 109 | /// Edges in the graph. 110 | edge, 111 | /// Forward and backward adjacency lists for nodes. 112 | adj, 113 | }; 114 | 115 | /// A handle to a RocksDB database. 
116 | pub const DB = struct { 117 | db: *c.rocksdb_t, 118 | otxn_db: *c.rocksdb_optimistictransactiondb_t, 119 | write_opts: *c.rocksdb_writeoptions_t, 120 | read_opts: *c.rocksdb_readoptions_t, 121 | otxn_opts: *c.rocksdb_optimistictransaction_options_t, 122 | cf_handles: std.EnumArray(ColumnFamily, *c.rocksdb_column_family_handle_t), 123 | 124 | /// Open a RocksDB database with the given name, creating it if it does not exist. 125 | pub fn open(name: []const u8) !DB { 126 | const nameZ = try allocator.dupeZ(u8, name); 127 | defer allocator.free(nameZ); 128 | 129 | const options = c.rocksdb_options_create().?; 130 | defer c.rocksdb_options_destroy(options); 131 | c.rocksdb_options_set_create_if_missing(options, 1); 132 | c.rocksdb_options_set_create_missing_column_families(options, 1); 133 | c.rocksdb_options_set_compression(options, c.rocksdb_lz4_compression); 134 | c.rocksdb_options_set_bottommost_compression(options, c.rocksdb_zstd_compression); 135 | c.rocksdb_options_increase_parallelism(options, @as(c_int, @intCast(std.Thread.getCpuCount() catch 2))); 136 | c.rocksdb_options_set_compaction_style(options, c.rocksdb_level_compaction); 137 | c.rocksdb_options_optimize_level_style_compaction(options, 512 * 1024 * 1024); 138 | c.rocksdb_options_set_write_buffer_size(options, 256 * 1024 * 1024); 139 | 140 | // Set 512 MiB in-memory block cache for reads (default: 32 MiB). 
141 | { 142 | const table_options = c.rocksdb_block_based_options_create().?; 143 | defer c.rocksdb_block_based_options_destroy(table_options); 144 | const cache = c.rocksdb_cache_create_lru(512 * 1024 * 1024).?; 145 | defer c.rocksdb_cache_destroy(cache); 146 | c.rocksdb_block_based_options_set_block_cache(table_options, cache); 147 | c.rocksdb_options_set_block_based_table_factory(options, table_options); 148 | } 149 | 150 | // pre-create options to avoid repeated allocations 151 | const write_opts = c.rocksdb_writeoptions_create().?; 152 | c.rocksdb_writeoptions_disable_WAL(write_opts, 1); 153 | errdefer c.rocksdb_writeoptions_destroy(write_opts); 154 | 155 | const read_opts = c.rocksdb_readoptions_create().?; 156 | c.rocksdb_readoptions_set_async_io(read_opts, 1); 157 | errdefer c.rocksdb_readoptions_destroy(read_opts); 158 | 159 | const otxn_opts = c.rocksdb_optimistictransaction_options_create().?; 160 | c.rocksdb_optimistictransaction_options_set_set_snapshot(otxn_opts, 1); 161 | errdefer c.rocksdb_optimistictransaction_options_destroy(otxn_opts); 162 | 163 | // Define column families and their options. 164 | var cf_names = std.EnumArray(ColumnFamily, [*:0]const u8).initUndefined(); 165 | var cf_names_it = cf_names.iterator(); 166 | while (cf_names_it.next()) |entry| { 167 | entry.value.* = @tagName(entry.key); 168 | } 169 | var cf_options = std.EnumArray(ColumnFamily, *const c.rocksdb_options_t).initFill(options); 170 | var cf_handles = std.EnumArray(ColumnFamily, *c.rocksdb_column_family_handle_t).initUndefined(); 171 | 172 | var err: ?[*:0]u8 = null; 173 | const otxn_db = c.rocksdb_optimistictransactiondb_open_column_families( 174 | options, 175 | nameZ.ptr, 176 | cf_names.values.len, 177 | &cf_names.values, 178 | &cf_options.values, 179 | @ptrCast(&cf_handles.values), // Cast the array type into a ?* pointer. 180 | &err, 181 | ); 182 | if (err) |e| return parse_rocks_error(e); 183 | 184 | // Should not be null because otxn_db is only null on error. 
185 | const db = c.rocksdb_optimistictransactiondb_get_base_db(otxn_db); 186 | 187 | return DB{ 188 | .db = db.?, 189 | .otxn_db = otxn_db.?, 190 | .write_opts = write_opts, 191 | .read_opts = read_opts, 192 | .otxn_opts = otxn_opts, 193 | .cf_handles = cf_handles, 194 | }; 195 | } 196 | 197 | /// Close the database, releasing all resources. 198 | pub fn close(self: DB) void { 199 | for (self.cf_handles.values) |cf| { 200 | c.rocksdb_column_family_handle_destroy(cf); 201 | } 202 | c.rocksdb_optimistictransactiondb_close_base_db(self.db); 203 | c.rocksdb_optimistictransactiondb_close(self.otxn_db); 204 | c.rocksdb_writeoptions_destroy(self.write_opts); 205 | c.rocksdb_readoptions_destroy(self.read_opts); c.rocksdb_optimistictransaction_options_destroy(self.otxn_opts); 206 | } 207 | 208 | /// Put a key-value pair into the database. 209 | pub fn put(self: DB, cf: ColumnFamily, key: []const u8, value: []const u8) !void { 210 | var err: ?[*:0]u8 = null; 211 | c.rocksdb_put_cf( 212 | self.db, 213 | self.write_opts, 214 | self.cf_handles.get(cf), 215 | key.ptr, 216 | key.len, 217 | value.ptr, 218 | value.len, 219 | &err, 220 | ); 221 | if (err) |e| return parse_rocks_error(e); 222 | } 223 | 224 | /// Get a value from the database by key. 225 | pub fn get(self: DB, cf: ColumnFamily, key: []const u8) !?PinnableSlice { 226 | var err: ?[*:0]u8 = null; 227 | const value = c.rocksdb_get_pinned_cf( 228 | self.db, 229 | self.read_opts, 230 | self.cf_handles.get(cf), 231 | key.ptr, 232 | key.len, 233 | &err, 234 | ); 235 | if (err) |e| return parse_rocks_error(e); 236 | const val = value orelse return null; 237 | return PinnableSlice{ .rep = val }; 238 | } 239 | 240 | /// Iterate over the database by inclusive-exclusive range. 241 | /// 242 | /// Make sure that the slices for the lower and upper bounds point to valid 243 | /// memory while the iterator is active. If the bounds are freed before the 244 | /// iterator is destroyed, it will lead to undefined behavior.
245 | pub fn iterate(self: DB, cf: ColumnFamily, lower_bound: ?[]const u8, upper_bound: ?[]const u8) Iterator { 246 | const opts = c.rocksdb_readoptions_create().?; 247 | c.rocksdb_readoptions_set_async_io(opts, 1); 248 | if (lower_bound) |key| 249 | c.rocksdb_readoptions_set_iterate_lower_bound(opts, key.ptr, key.len); 250 | if (upper_bound) |key| 251 | c.rocksdb_readoptions_set_iterate_upper_bound(opts, key.ptr, key.len); 252 | const it = c.rocksdb_create_iterator_cf(self.db, opts, self.cf_handles.get(cf)).?; 253 | c.rocksdb_iter_seek_to_first(it); 254 | return Iterator{ .rep = it, .opts = opts }; 255 | } 256 | 257 | /// Delete a key from the database. 258 | pub fn delete(self: DB, cf: ColumnFamily, key: []const u8) !void { 259 | var err: ?[*:0]u8 = null; 260 | c.rocksdb_delete_cf(self.db, self.write_opts, self.cf_handles.get(cf), key.ptr, key.len, &err); 261 | if (err) |e| return parse_rocks_error(e); 262 | } 263 | 264 | /// Delete a range of keys from the database. The range is inclusive-exclusive. 265 | pub fn deleteRange(self: DB, cf: ColumnFamily, lower_bound: []const u8, upper_bound: []const u8) !void { 266 | var err: ?[*:0]u8 = null; 267 | c.rocksdb_delete_range_cf( 268 | self.db, 269 | self.write_opts, 270 | self.cf_handles.get(cf), 271 | lower_bound.ptr, 272 | lower_bound.len, 273 | upper_bound.ptr, 274 | upper_bound.len, 275 | &err, 276 | ); 277 | if (err) |e| return parse_rocks_error(e); 278 | } 279 | 280 | /// Begin a new optimistic transaction on the database. 281 | pub fn begin(self: DB) Transaction { 282 | const txn = c.rocksdb_optimistictransaction_begin(self.otxn_db, self.write_opts, self.otxn_opts, null).?; 283 | // The snapshot exists because we enabled set_snapshot in otxn_opts. 284 | const snapshot = c.rocksdb_transaction_get_snapshot(txn).?; 285 | return Transaction{ .txn = txn, .snap = snapshot, .cf_handles = self.cf_handles }; 286 | } 287 | }; 288 | 289 | /// A transaction on a RocksDB database. 
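/// 
/// The doc comment continues with an illustrative usage sketch (a hedged
/// example, not from the original source; assumes an open `DB` value `db`
/// inside a function that can return errors):
///
/// ```
/// const txn = db.begin();
/// defer txn.close();
/// try txn.put(.default, "key", "value");
/// txn.commit() catch |err| switch (err) {
///     error.Busy => {}, // Optimistic conflict detected at commit; caller may retry.
///     else => return err,
/// };
/// ```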
290 | /// Transactions are not thread-safe and should not be shared between threads. 291 | pub const Transaction = struct { 292 | txn: *c.rocksdb_transaction_t, 293 | snap: *const c.rocksdb_snapshot_t, 294 | cf_handles: std.EnumArray(ColumnFamily, *c.rocksdb_column_family_handle_t), 295 | 296 | /// Release the transaction. 297 | pub fn close(self: Transaction) void { 298 | c.rocksdb_transaction_destroy(self.txn); 299 | } 300 | 301 | /// See `DB.put()`. 302 | pub fn put(self: Transaction, cf: ColumnFamily, key: []const u8, value: []const u8) !void { 303 | var err: ?[*:0]u8 = null; 304 | c.rocksdb_transaction_put_cf( 305 | self.txn, 306 | self.cf_handles.get(cf), 307 | key.ptr, 308 | key.len, 309 | value.ptr, 310 | value.len, 311 | &err, 312 | ); 313 | if (err) |e| return parse_rocks_error(e); 314 | } 315 | 316 | /// See `DB.get()`. 317 | /// 318 | /// This function uses the GetForUpdate() operation so that the underlying 319 | /// RocksDB transaction engine can detect read-write conflicts. This is the 320 | /// only way to trigger conflict checks, as `iterate()` does not do them. 321 | /// 322 | /// If `exclusive` is true, the transaction is recorded as having written to 323 | /// this key. 324 | pub fn get(self: Transaction, cf: ColumnFamily, key: []const u8, exclusive: bool) !?PinnableSlice { 325 | const opts = c.rocksdb_readoptions_create().?; 326 | defer c.rocksdb_readoptions_destroy(opts); 327 | c.rocksdb_readoptions_set_snapshot(opts, self.snap); // Use snapshot in transaction. 328 | c.rocksdb_readoptions_set_async_io(opts, 1); 329 | var err: ?[*:0]u8 = null; 330 | const value = c.rocksdb_transaction_get_pinned_for_update_cf( 331 | self.txn, 332 | opts, 333 | self.cf_handles.get(cf), 334 | key.ptr, 335 | key.len, 336 | @intFromBool(exclusive), 337 | &err, 338 | ); 339 | if (err) |e| return parse_rocks_error(e); 340 | const val = value orelse return null; 341 | return PinnableSlice{ .rep = val }; 342 | } 343 | 344 | /// See `DB.iterate()`.
345 | pub fn iterate(self: Transaction, cf: ColumnFamily, lower_bound: ?[]const u8, upper_bound: ?[]const u8) Iterator { 346 | const opts = c.rocksdb_readoptions_create().?; 347 | c.rocksdb_readoptions_set_snapshot(opts, self.snap); // Use snapshot in transaction. 348 | c.rocksdb_readoptions_set_async_io(opts, 1); 349 | if (lower_bound) |key| 350 | c.rocksdb_readoptions_set_iterate_lower_bound(opts, key.ptr, key.len); 351 | if (upper_bound) |key| 352 | c.rocksdb_readoptions_set_iterate_upper_bound(opts, key.ptr, key.len); 353 | const it = c.rocksdb_transaction_create_iterator_cf(self.txn, opts, self.cf_handles.get(cf)).?; 354 | c.rocksdb_iter_seek_to_first(it); 355 | return Iterator{ .rep = it, .opts = opts }; 356 | } 357 | 358 | /// See `DB.delete()`. 359 | pub fn delete(self: Transaction, cf: ColumnFamily, key: []const u8) !void { 360 | var err: ?[*:0]u8 = null; 361 | c.rocksdb_transaction_delete_cf(self.txn, self.cf_handles.get(cf), key.ptr, key.len, &err); 362 | if (err) |e| return parse_rocks_error(e); 363 | } 364 | 365 | /// Commit the transaction and write all batched changes atomically. 366 | /// 367 | /// This will fail if there are any optimistic transaction conflicts. The 368 | /// error returned will be `Busy`. Otherwise, if the memtable history size 369 | /// is not large enough, it will return `TryAgain`. 370 | pub fn commit(self: Transaction) !void { 371 | var err: ?[*:0]u8 = null; 372 | c.rocksdb_transaction_commit(self.txn, &err); 373 | if (err) |e| return parse_rocks_error(e); 374 | } 375 | 376 | /// Roll back the transaction and discard all batched writes. 377 | pub fn rollback(self: Transaction) !void { 378 | var err: ?[*:0]u8 = null; 379 | c.rocksdb_transaction_rollback(self.txn, &err); 380 | if (err) |e| return parse_rocks_error(e); 381 | } 382 | 383 | /// Set a savepoint, allowing the transaction to be rolled back to this point.
384 | pub fn set_savepoint(self: Transaction) void { 385 | c.rocksdb_transaction_set_savepoint(self.txn); 386 | } 387 | 388 | /// Rollback to the last savepoint, discarding all writes since then. 389 | pub fn rollback_to_savepoint(self: Transaction) !void { 390 | var err: ?[*:0]u8 = null; 391 | c.rocksdb_transaction_rollback_to_savepoint(self.txn, &err); 392 | if (err) |e| return parse_rocks_error(e); 393 | } 394 | }; 395 | 396 | /// An iterator over a range of keys in a RocksDB database. 397 | pub const Iterator = struct { 398 | rep: *c.rocksdb_iterator_t, 399 | opts: *c.rocksdb_readoptions_t, 400 | 401 | /// Check if the current position of the iterator is valid. 402 | pub fn valid(self: Iterator) bool { 403 | return c.rocksdb_iter_valid(self.rep) != 0; 404 | } 405 | 406 | /// Advance the iterator. This invalidates any previous key or value slice. 407 | pub fn next(self: Iterator) void { 408 | c.rocksdb_iter_next(self.rep); 409 | } 410 | 411 | /// Get the key at the current position of the iterator. 412 | pub fn key(self: Iterator) []const u8 { 413 | var klen: usize = undefined; 414 | const kptr = c.rocksdb_iter_key(self.rep, &klen); 415 | std.debug.assert(kptr != null); 416 | return kptr[0..klen]; 417 | } 418 | 419 | /// Get the value at the current position of the iterator. 420 | pub fn value(self: Iterator) []const u8 { 421 | var vlen: usize = undefined; 422 | const vptr = c.rocksdb_iter_value(self.rep, &vlen); 423 | std.debug.assert(vptr != null); 424 | return vptr[0..vlen]; 425 | } 426 | 427 | /// Release the iterator. 428 | pub fn close(self: Iterator) void { 429 | c.rocksdb_iter_destroy(self.rep); 430 | c.rocksdb_readoptions_destroy(self.opts); 431 | } 432 | }; 433 | 434 | /// A pinnable slice, which can reference memory that is directly owned by RocksDB. 435 | pub const PinnableSlice = struct { 436 | rep: *c.rocksdb_pinnableslice_t, 437 | 438 | /// Reference the value as a Zig slice. 
439 | pub fn bytes(self: PinnableSlice) []const u8 { 440 | var vlen: usize = undefined; 441 | const vptr = c.rocksdb_pinnableslice_value(self.rep, &vlen); 442 | // Note: vptr cannot be null here, since self.rep is not null. 443 | std.debug.assert(vptr != null); 444 | return vptr[0..vlen]; 445 | } 446 | 447 | /// Release the reference to memory associated with this slice. 448 | pub fn close(self: PinnableSlice) void { 449 | c.rocksdb_pinnableslice_destroy(self.rep); 450 | } 451 | }; 452 | 453 | test "get and put value" { 454 | var tmp = test_helpers.tmp(); 455 | defer tmp.cleanup(); 456 | const db = try DB.open(tmp.path("test.db")); 457 | defer db.close(); 458 | 459 | try std.testing.expectEqual(null, try db.get(.default, "hello")); 460 | 461 | try db.put(.default, "hello", "world"); 462 | { 463 | const value = try db.get(.default, "hello") orelse 464 | std.debug.panic("value for 'hello' not found", .{}); 465 | defer value.close(); 466 | try std.testing.expectEqualSlices(u8, "world", value.bytes()); 467 | } 468 | 469 | try db.delete(.default, "hello"); 470 | try std.testing.expectEqual(null, try db.get(.default, "hello")); 471 | } 472 | 473 | test "iterate range" { 474 | var tmp = test_helpers.tmp(); 475 | defer tmp.cleanup(); 476 | const db = try DB.open(tmp.path("test.db")); 477 | defer db.close(); 478 | 479 | try db.put(.default, "a", "1"); 480 | try db.put(.default, "aa", "2"); 481 | try db.put(.default, "aaa", "3"); 482 | try db.put(.default, "aab", "4"); 483 | try db.put(.default, "ab", "5"); 484 | { 485 | const it = db.iterate(.default, "aa", "ab"); 486 | defer it.close(); 487 | try std.testing.expect(it.valid()); 488 | try std.testing.expectEqualSlices(u8, "aa", it.key()); 489 | try std.testing.expectEqualSlices(u8, "2", it.value()); 490 | it.next(); 491 | try std.testing.expect(it.valid()); 492 | try std.testing.expectEqualSlices(u8, "aaa", it.key()); 493 | try std.testing.expectEqualSlices(u8, "3", it.value()); 494 | it.next(); 495 | try 
std.testing.expect(it.valid()); 496 | try std.testing.expectEqualSlices(u8, "aab", it.key()); 497 | try std.testing.expectEqualSlices(u8, "4", it.value()); 498 | it.next(); 499 | try std.testing.expect(!it.valid()); 500 | } 501 | 502 | try db.deleteRange(.default, "aa", "aab"); 503 | { 504 | const it = db.iterate(.default, "aa", "ab"); 505 | defer it.close(); 506 | try std.testing.expect(it.valid()); 507 | try std.testing.expectEqualSlices(u8, "aab", it.key()); 508 | try std.testing.expectEqualSlices(u8, "4", it.value()); 509 | it.next(); 510 | try std.testing.expect(!it.valid()); 511 | } 512 | } 513 | 514 | test "transaction" { 515 | var tmp = test_helpers.tmp(); 516 | defer tmp.cleanup(); 517 | const db = try DB.open(tmp.path("test.db")); 518 | defer db.close(); 519 | 520 | const tx1 = db.begin(); 521 | const tx2 = db.begin(); 522 | defer tx1.close(); 523 | defer tx2.close(); 524 | try tx1.put(.default, "x", "1"); 525 | 526 | // Outside the transaction, we shouldn't see the value yet. 527 | try std.testing.expectEqual(null, try db.get(.default, "x")); 528 | try std.testing.expectEqual(null, try tx2.get(.default, "x", false)); 529 | 530 | try tx1.commit(); 531 | 532 | // After commit, we should see the value. 533 | { 534 | const value = try db.get(.default, "x") orelse 535 | std.debug.panic("value not found", .{}); 536 | defer value.close(); 537 | try std.testing.expectEqualSlices(u8, value.bytes(), "1"); 538 | } 539 | 540 | { 541 | const it = db.iterate(.default, "x", null); 542 | defer it.close(); 543 | try std.testing.expect(it.valid()); 544 | try std.testing.expectEqualSlices(u8, "x", it.key()); 545 | try std.testing.expectEqualSlices(u8, "1", it.value()); 546 | } 547 | 548 | // But tx2 should still not be able to see the value. 
549 | try std.testing.expectEqual(null, try tx2.get(.default, "x", false)); 550 | 551 | { 552 | const it = tx2.iterate(.default, "x", null); 553 | defer it.close(); 554 | try std.testing.expect(!it.valid()); 555 | } 556 | 557 | // If tx2 then modifies "x", it should cause a conflict. 558 | try tx2.put(.default, "x", "2"); 559 | try std.testing.expectError(Error.Busy, tx2.commit()); 560 | try tx2.rollback(); 561 | } 562 | -------------------------------------------------------------------------------- /src/test_helpers.zig: -------------------------------------------------------------------------------- 1 | //! Helper functions to make testing less repetitive. 2 | 3 | const std = @import("std"); 4 | const testing = std.testing; 5 | 6 | const rocksdb = @import("storage/rocksdb.zig"); 7 | const storage = @import("storage.zig"); 8 | 9 | pub const SimpleTmpDir = struct { 10 | tmp_dir: testing.TmpDir, 11 | paths: std.ArrayList([]const u8), 12 | 13 | pub fn cleanup(self: *SimpleTmpDir) void { 14 | for (self.paths.items) |p| { 15 | testing.allocator.free(p); 16 | } 17 | self.paths.deinit(); 18 | self.tmp_dir.cleanup(); 19 | } 20 | 21 | pub fn path(self: *SimpleTmpDir, subpath: []const u8) []const u8 { 22 | const dir_path = self.tmp_dir.dir.realpathAlloc(testing.allocator, ".") catch 23 | std.debug.panic("realpathAlloc failed", .{}); 24 | defer testing.allocator.free(dir_path); 25 | const full_path = std.fmt.allocPrint(testing.allocator, "{s}/{s}", .{ dir_path, subpath }) catch 26 | std.debug.panic("failed to allocPrint", .{}); 27 | self.paths.append(full_path) catch 28 | std.debug.panic("failed to append full_path", .{}); 29 | return full_path; 30 | } 31 | 32 | pub fn store(self: *SimpleTmpDir, subpath: []const u8) !storage.Storage { 33 | const db = try rocksdb.DB.open(self.path(subpath)); 34 | return .{ .db = db }; 35 | } 36 | }; 37 | 38 | pub fn tmp() SimpleTmpDir { 39 | const tmp_dir = testing.tmpDir(.{}); 40 | const paths = std.ArrayList([]const 
u8).init(testing.allocator); 41 | return SimpleTmpDir{ .tmp_dir = tmp_dir, .paths = paths }; 42 | } 43 | -------------------------------------------------------------------------------- /src/types.zig: -------------------------------------------------------------------------------- 1 | //! Definition of common types used in modeling property graphs. 2 | 3 | const std = @import("std"); 4 | const Allocator = std.mem.Allocator; 5 | const random = std.crypto.random; 6 | 7 | /// Unique element ID for a node or edge. Element IDs are random 96-bit integers. 8 | pub const ElementId = struct { 9 | value: u96, 10 | 11 | /// Generates a new random element ID. 12 | pub fn generate() ElementId { 13 | return ElementId{ .value = random.int(u96) }; 14 | } 15 | 16 | /// Return this element ID as a big-endian byte array. 17 | pub fn toBytes(self: ElementId) [12]u8 { 18 | var buf: [12]u8 = undefined; 19 | std.mem.writeInt(u96, &buf, self.value, .big); 20 | return buf; 21 | } 22 | 23 | /// Create an element ID from a big-endian byte array. 24 | pub fn fromBytes(bytes: [12]u8) ElementId { 25 | return .{ .value = std.mem.readInt(u96, &bytes, .big) }; 26 | } 27 | 28 | /// Return this element ID as a 16-character URL-safe base64 string.
29 | pub fn toString(self: ElementId) [16]u8 { 30 | var buf: [16]u8 = undefined; 31 | var x = self.value; 32 | var i = buf.len; 33 | while (i > 0) { 34 | i -= 1; 35 | const c = x & 0x3f; 36 | buf[i] = switch (c) { 37 | 0...25 => 'A' + @as(u8, @intCast(c)), 38 | 26...51 => 'a' + @as(u8, @intCast(c - 26)), 39 | 52...61 => '0' + @as(u8, @intCast(c - 52)), 40 | 62 => '-', 41 | 63 => '_', 42 | else => unreachable, 43 | }; 44 | x >>= 6; 45 | } 46 | return buf; 47 | } 48 | 49 | pub fn encode(self: ElementId, writer: anytype) !void { 50 | try writer.writeAll(&self.toBytes()); 51 | } 52 | 53 | pub fn decode(reader: anytype) !ElementId { 54 | var buf: [12]u8 = undefined; 55 | try reader.readNoEof(&buf); 56 | return ElementId.fromBytes(buf); 57 | } 58 | 59 | pub fn next(self: ElementId) ElementId { 60 | return ElementId{ .value = self.value + 1 }; 61 | } 62 | }; 63 | 64 | test ElementId { 65 | const id = ElementId{ .value = 238093323431135580 }; 66 | try std.testing.expectEqualStrings(&id.toString(), "AAAAAANN4J2-gUlc"); 67 | try std.testing.expect(ElementId.generate().value != ElementId.generate().value); 68 | } 69 | 70 | /// Edge direction as expressed in a path pattern. 71 | pub const EdgeDirection = enum { 72 | left, // <-[]- 73 | right, // -[]-> 74 | undirected, // ~[]~ 75 | left_or_undirected, // <~[]~ 76 | right_or_undirected, // ~[]~> 77 | left_or_right, // <-[]-> 78 | any, // -[]- 79 | 80 | /// Returns the left part of the edge direction as a string. 81 | pub fn leftPart(self: EdgeDirection) [:0]const u8 { 82 | return switch (self) { 83 | .left => "<-[", 84 | .right => "-[", 85 | .undirected => "~[", 86 | .left_or_undirected => "<~[", 87 | .right_or_undirected => "~[", 88 | .left_or_right => "<-[", 89 | .any => "-[", 90 | }; 91 | } 92 | 93 | /// Returns the right part of the edge direction as a string. 
94 | pub fn rightPart(self: EdgeDirection) [:0]const u8 { 95 | return switch (self) { 96 | .left => "]-", 97 | .right => "]->", 98 | .undirected => "]~", 99 | .left_or_undirected => "]~", 100 | .right_or_undirected => "]~>", 101 | .left_or_right => "]->", 102 | .any => "]-", 103 | }; 104 | } 105 | }; 106 | 107 | /// Whether an edge is going in or out of a node. Stored in adjacency lists. 108 | pub const EdgeInOut = enum(u8) { 109 | /// A directed edge pointing out from a node. 110 | out = 0, 111 | /// An undirected edge. 112 | simple = 1, 113 | /// A directed edge pointing into a node. 114 | in = 2, 115 | }; 116 | 117 | /// The dynamically-typed kind of a value. 118 | pub const ValueKind = enum(u8) { 119 | string = 1, 120 | // bytes 121 | int64 = 2, 122 | // [various integers] 123 | float64 = 3, 124 | // [various floating-points] 125 | // date, datetime, duration 126 | node_ref = 4, 127 | edge_ref = 5, 128 | id = 6, 129 | bool = 7, 130 | null = 8, 131 | }; 132 | 133 | /// Encode a length-delimited byte buffer. 134 | pub fn encodeBytes(bytes: []const u8, writer: anytype) !void { 135 | try writer.writeInt(u32, @intCast(bytes.len), .big); 136 | try writer.writeAll(bytes); 137 | } 138 | 139 | /// Decode a length-delimited byte buffer. 
140 | pub fn decodeBytes(allocator: Allocator, reader: anytype) ![]const u8 { 141 | const len: usize = @intCast(try reader.readInt(u32, .big)); 142 | const buf = try allocator.alloc(u8, len); 143 | errdefer allocator.free(buf); 144 | try reader.readNoEof(buf); 145 | return buf; 146 | } 147 | 148 | pub fn encodeLabels(labels: std.StringArrayHashMapUnmanaged(void), writer: anytype) !void { 149 | try writer.writeInt(u32, @intCast(labels.count()), .big); 150 | for (labels.keys()) |label| { 151 | try encodeBytes(label, writer); 152 | } 153 | } 154 | 155 | pub fn decodeLabels(allocator: Allocator, reader: anytype) !std.StringArrayHashMapUnmanaged(void) { 156 | const len: usize = @intCast(try reader.readInt(u32, .big)); 157 | var labels: std.StringArrayHashMapUnmanaged(void) = .{}; 158 | errdefer freeLabels(allocator, &labels); 159 | for (0..len) |_| { 160 | const label = try decodeBytes(allocator, reader); 161 | try labels.put(allocator, label, void{}); 162 | } 163 | return labels; 164 | } 165 | 166 | pub fn freeLabels(allocator: Allocator, labels: *std.StringArrayHashMapUnmanaged(void)) void { 167 | for (labels.keys()) |label| { 168 | allocator.free(label); 169 | } 170 | labels.deinit(allocator); 171 | } 172 | 173 | pub fn encodeProperties(properties: std.StringArrayHashMapUnmanaged(Value), writer: anytype) !void { 174 | try writer.writeInt(u32, @intCast(properties.count()), .big); 175 | var it = properties.iterator(); 176 | while (it.next()) |entry| { 177 | try encodeBytes(entry.key_ptr.*, writer); 178 | try entry.value_ptr.encode(writer); 179 | } 180 | } 181 | 182 | pub fn decodeProperties(allocator: Allocator, reader: anytype) !std.StringArrayHashMapUnmanaged(Value) { 183 | const len: usize = @intCast(try reader.readInt(u32, .big)); 184 | var properties: std.StringArrayHashMapUnmanaged(Value) = .{}; 185 | errdefer freeProperties(allocator, &properties); 186 | for (0..len) |_| { 187 | const key = try decodeBytes(allocator, reader); 188 | errdefer allocator.free(key); 
189 | var value = try Value.decode(allocator, reader); 190 | errdefer value.deinit(allocator); 191 | try properties.put(allocator, key, value); 192 | } 193 | return properties; 194 | } 195 | 196 | pub fn freeProperties(allocator: Allocator, properties: *std.StringArrayHashMapUnmanaged(Value)) void { 197 | for (properties.values()) |*value| { 198 | value.deinit(allocator); 199 | } 200 | for (properties.keys()) |key| { 201 | allocator.free(key); 202 | } 203 | properties.deinit(allocator); 204 | } 205 | 206 | /// The main value type for graph properties and binding table elements. 207 | /// 208 | /// This is the full list of data types supported by Graphon and its 209 | /// expression language. Values can be assigned to properties or constructed 210 | /// during execution of a query. 211 | /// 212 | /// Reference: ISO/IEC 39075:2024, Section 18.9. 213 | pub const Value = union(ValueKind) { 214 | string: []const u8, // Binary-safe string. 215 | int64: i64, 216 | float64: f64, 217 | node_ref: ElementId, // Reference to a node (must exist). 218 | edge_ref: ElementId, // Reference to an edge (must exist). 219 | id: ElementId, // Not necessarily populated by node or edge. 220 | bool: bool, 221 | null, 222 | 223 | pub fn deinit(self: *Value, allocator: Allocator) void { 224 | switch (self.*) { 225 | .string => |s| allocator.free(s), 226 | else => {}, 227 | } 228 | self.* = undefined; 229 | } 230 | 231 | /// Duplicate a value, using the provided allocator. 232 | pub fn dupe(self: Value, allocator: Allocator) !Value { 233 | return switch (self) { 234 | .string => |s| .{ .string = try allocator.dupe(u8, s) }, 235 | else => self, 236 | }; 237 | } 238 | 239 | /// Pretty-print a value to a writer.
240 | pub fn print(self: Value, writer: anytype) !void { 241 | switch (self) { 242 | .string => |s| try writer.print("'{s}'", .{s}), 243 | .int64 => |n| try writer.print("{}", .{n}), 244 | .float64 => |f| try writer.print("{}", .{f}), 245 | .node_ref => |id| try writer.print("{s}", .{id.toString()}), 246 | .edge_ref => |id| try writer.print("{s}", .{id.toString()}), 247 | .id => |id| try writer.print("{s}", .{id.toString()}), 248 | .bool => |b| try writer.print("{s}", .{if (b) "true" else "false"}), 249 | .null => try writer.print("null", .{}), 250 | } 251 | } 252 | 253 | /// Encode this value to a binary format for storage or transmission. 254 | /// 255 | /// On failure due to out-of-memory, this function may leave the provided 256 | /// buffer in an invalid or partially-written state. 257 | pub fn encode(self: Value, writer: anytype) Allocator.Error!void { 258 | const tag: u8 = @intFromEnum(self); 259 | try writer.writeByte(tag); 260 | switch (self) { 261 | .string => |s| try encodeBytes(s, writer), 262 | .int64 => |n| try writer.writeInt(i64, n, .big), 263 | .float64 => |f| try writer.writeInt(u64, @bitCast(f), .big), 264 | .node_ref => |id| try id.encode(writer), 265 | .edge_ref => |id| try id.encode(writer), 266 | .id => |id| try id.encode(writer), 267 | .bool => |b| try writer.writeByte(if (b) 1 else 0), 268 | .null => {}, 269 | } 270 | } 271 | 272 | /// Decode a value encoded by `Value.encode()`. 
273 | pub fn decode(allocator: Allocator, reader: anytype) !Value { 274 | const tag_int = try reader.readByte(); 275 | const tag = std.meta.intToEnum(ValueKind, tag_int) catch { 276 | return error.InvalidValueTag; 277 | }; 278 | switch (tag) { 279 | .string => { 280 | const s = try decodeBytes(allocator, reader); 281 | return .{ .string = s }; 282 | }, 283 | .int64 => { 284 | const n = try reader.readInt(i64, .big); 285 | return .{ .int64 = n }; 286 | }, 287 | .float64 => { 288 | const bits = try reader.readInt(u64, .big); 289 | return .{ .float64 = @bitCast(bits) }; 290 | }, 291 | .node_ref => { 292 | const id = try ElementId.decode(reader); 293 | return .{ .node_ref = id }; 294 | }, 295 | .edge_ref => { 296 | const id = try ElementId.decode(reader); 297 | return .{ .edge_ref = id }; 298 | }, 299 | .id => { 300 | const id = try ElementId.decode(reader); 301 | return .{ .id = id }; 302 | }, 303 | .bool => { 304 | const b = try reader.readByte(); 305 | return .{ .bool = b != 0 }; 306 | }, 307 | .null => return .null, 308 | } 309 | } 310 | 311 | /// Add two values together, allocating a result. 312 | pub fn add(a: Value, b: Value, allocator: Allocator) Allocator.Error!Value { 313 | return switch (a) { 314 | .string => |a_| switch (b) { 315 | .string => |b_| { 316 | const len = a_.len + b_.len; 317 | const buf = try allocator.alloc(u8, len); 318 | std.mem.copyForwards(u8, buf, a_); 319 | std.mem.copyForwards(u8, buf[a_.len..], b_); 320 | return .{ .string = buf }; 321 | }, 322 | else => .null, 323 | }, 324 | .int64 => |a_| switch (b) { 325 | .int64 => |b_| .{ .int64 = a_ + b_ }, 326 | .float64 => |b_| .{ .float64 = @as(f64, @floatFromInt(a_)) + b_ }, 327 | else => .null, 328 | }, 329 | .float64 => |a_| switch (b) { 330 | .int64 => |b_| .{ .float64 = a_ + @as(f64, @floatFromInt(b_)) }, 331 | .float64 => |b_| .{ .float64 = a_ + b_ }, 332 | else => .null, 333 | }, 334 | else => .null, 335 | }; 336 | } 337 | 338 | /// Subtract two values. 
339 | pub fn sub(a: Value, b: Value) Value { 340 | return switch (a) { 341 | .int64 => |a_| switch (b) { 342 | .int64 => |b_| .{ .int64 = a_ - b_ }, 343 | .float64 => |b_| .{ .float64 = @as(f64, @floatFromInt(a_)) - b_ }, 344 | else => .null, 345 | }, 346 | .float64 => |a_| switch (b) { 347 | .int64 => |b_| .{ .float64 = a_ - @as(f64, @floatFromInt(b_)) }, 348 | .float64 => |b_| .{ .float64 = a_ - b_ }, 349 | else => .null, 350 | }, 351 | else => .null, 352 | }; 353 | } 354 | 355 | /// Check if two values are equal. 356 | pub fn eql(a: Value, b: Value) bool { 357 | return switch (a) { 358 | .string => |a_| switch (b) { 359 | .string => |b_| std.mem.eql(u8, a_, b_), 360 | else => false, 361 | }, 362 | .int64 => |a_| switch (b) { 363 | .int64 => |b_| a_ == b_, 364 | .float64 => |b_| @as(f64, @floatFromInt(a_)) == b_, 365 | else => false, 366 | }, 367 | .float64 => |a_| switch (b) { 368 | .int64 => |b_| a_ == @as(f64, @floatFromInt(b_)), 369 | .float64 => |b_| a_ == b_, 370 | else => false, 371 | }, 372 | .node_ref => |a_| switch (b) { 373 | .node_ref => |b_| a_.value == b_.value, 374 | else => false, 375 | }, 376 | .edge_ref => |a_| switch (b) { 377 | .edge_ref => |b_| a_.value == b_.value, 378 | else => false, 379 | }, 380 | .id => |a_| switch (b) { 381 | .id => |b_| a_.value == b_.value, 382 | else => false, 383 | }, 384 | .bool => |a_| switch (b) { 385 | .bool => |b_| a_ == b_, 386 | else => false, 387 | }, 388 | .null => b == .null, 389 | }; 390 | } 391 | 392 | /// Returns whether a value is truthy. 393 | /// 394 | /// All values are generally truthy, except for the following values: false, 395 | /// 0, -0, "", null, and NaN. 396 | pub fn truthy(self: Value) bool { 397 | return switch (self) { 398 | .string => |s| s.len > 0, 399 | .int64 => |n| n != 0, 400 | .float64 => |f| f != 0 and !std.math.isNan(f), 401 | .node_ref, .edge_ref, .id => true, 402 | .bool => |b| b, 403 | .null => false, 404 | }; 405 | } 406 | }; 407 | 408 | /// A property graph node. 
409 | /// 410 | /// Reference: ISO/IEC 39075:2024, Section 4.3.5.1. 411 | pub const Node = struct { 412 | id: ElementId, 413 | labels: std.StringArrayHashMapUnmanaged(void) = .{}, 414 | properties: std.StringArrayHashMapUnmanaged(Value) = .{}, 415 | 416 | pub fn deinit(self: *Node, allocator: Allocator) void { 417 | freeLabels(allocator, &self.labels); 418 | freeProperties(allocator, &self.properties); 419 | self.* = undefined; 420 | } 421 | }; 422 | 423 | /// A property graph edge. 424 | /// 425 | /// Reference: ISO/IEC 39075:2024, Section 4.3.5.1. 426 | pub const Edge = struct { 427 | id: ElementId, 428 | endpoints: [2]ElementId, 429 | directed: bool, 430 | labels: std.StringArrayHashMapUnmanaged(void) = .{}, 431 | properties: std.StringArrayHashMapUnmanaged(Value) = .{}, 432 | 433 | pub fn deinit(self: *Edge, allocator: Allocator) void { 434 | freeLabels(allocator, &self.labels); 435 | freeProperties(allocator, &self.properties); 436 | self.* = undefined; 437 | } 438 | }; 439 | -------------------------------------------------------------------------------- /src/vendor/snaptest.zig: -------------------------------------------------------------------------------- 1 | // From Tigerbeetle (commit 588123f), licensed under Apache 2.0. 2 | // https://github.com/tigerbeetle/tigerbeetle/blob/588123f219f1f3e324f9293ae3975845e087d5f5/src/testing/snaptest.zig 3 | 4 | //! A tiny pattern/library for testing with expectations ([1], [2]). 5 | //! 6 | //! On a high level, this is a replacement for `std.testing.expectEqual` which: 7 | //! 8 | //! - is less cumbersome to use for complex types, 9 | //! - gives somewhat more useful feedback on a test failure without much investment, 10 | //! - drastically reduces the time to update the tests after refactors, 11 | //! - encourages creation of reusable visualizations for data structures. 12 | //! 13 | //! Implementation-wise, `snaptest` provides a `Snap` type, which can be thought of as a Zig string 14 | //! 
literal which also remembers its location in the source file, can be diffed with other strings, 15 | //! and, crucially, can _update its own source code_ to match the expected value. 16 | //! 17 | //! Example usage: 18 | //! 19 | //! ``` 20 | //! const Snap = @import("snaptest.zig").Snap; 21 | //! const snap = Snap.snap; 22 | //! 23 | //! fn check_addition(x: u32, y: u32, want: Snap) !void { 24 | //! const got = x + y; 25 | //! try want.diff_fmt("{}", .{got}); 26 | //! } 27 | //! 28 | //! test "addition" { 29 | //! try check_addition(2, 2, snap(@src(), 30 | //! \\8 31 | //! )); 32 | //! } 33 | //! ``` 34 | //! 35 | //! Running this test fails, printing the diff between actual result (`4`) and what's specified in 36 | //! the source code. 37 | //! 38 | //! Re-running the test with `SNAP_UPDATE=1` environmental variable auto-magically updates the 39 | //! source code to say `\\4`. Alternatively, you can use `snap(...).update()` to auto-update just a 40 | //! single test. 41 | //! 42 | //! Note the `@src()` argument passed to the `snap(...)` invocation --- that's how it knows which 43 | //! lines to update. 44 | //! 45 | //! Snapshots can use `<snap:ignore>` marker to ignore part of input: 46 | //! 47 | //! ``` 48 | //! test "time" { 49 | //! var buf: [32]u8 = undefined; 50 | //! const time = try std.fmt.bufPrint(&buf, "it's {}ms", .{ 51 | //! std.time.milliTimestamp(), 52 | //! }); 53 | //! try Snap.snap(@src(), 54 | //! \\it's <snap:ignore>ms 55 | //! ).diff(time); 56 | //! } 57 | //! ``` 58 | //! 59 | //! TODO: 60 | //! - This doesn't actually `diff` things yet :o) But running with `SNAP_UPDATE=1` and then using 61 | //! `git diff` is a workable substitute. 62 | //! - Only one test can be updated at a time. To update several, we need to return 63 | //! `error.SkipZigTest` on mismatch and adjust offsets appropriately. 64 | //! 65 | //! [1]: https://blog.janestreet.com/using-ascii-waveforms-to-test-hardware-designs/ 66 | //! 
[2]: https://ianthehenry.com/posts/my-kind-of-repl/ 67 | const std = @import("std"); 68 | const assert = std.debug.assert; 69 | const builtin = @import("builtin"); 70 | const SourceLocation = std.builtin.SourceLocation; 71 | 72 | const Cut = struct { 73 | prefix: []const u8, 74 | suffix: []const u8, 75 | }; 76 | 77 | /// Splits the `haystack` around the first occurrence of `needle`, returning parts before and after. 78 | /// 79 | /// This is a Zig version of Go's `strings.Cut` / Rust's `str::split_once`. Cut turns out to be a 80 | /// surprisingly versatile primitive for ad-hoc string processing. Often `std.mem.indexOf` and 81 | /// `std.mem.split` can be replaced with shorter and clearer code using `cut`. 82 | pub fn cut(haystack: []const u8, needle: []const u8) ?Cut { 83 | const index = std.mem.indexOf(u8, haystack, needle) orelse return null; 84 | 85 | return Cut{ 86 | .prefix = haystack[0..index], 87 | .suffix = haystack[index + needle.len ..], 88 | }; 89 | } 90 | 91 | comptime { 92 | assert(builtin.is_test); 93 | } 94 | 95 | // Set to `true` to update all snapshots. 96 | const update_all: bool = false; 97 | 98 | pub const Snap = struct { 99 | location: SourceLocation, 100 | text: []const u8, 101 | update_this: bool = false, 102 | 103 | /// Creates a new Snap. 104 | /// 105 | /// For the update logic to work, *must* be formatted as: 106 | /// 107 | /// ``` 108 | /// snap(@src(), 109 | /// \\Text of the snapshot. 110 | /// ) 111 | /// ``` 112 | pub fn snap(location: SourceLocation, text: []const u8) Snap { 113 | return Snap{ .location = location, .text = text }; 114 | } 115 | 116 | /// Builder-lite method to update just this particular snapshot. 
117 | pub fn update(snapshot: *const Snap) Snap { 118 | return Snap{ 119 | .location = snapshot.location, 120 | .text = snapshot.text, 121 | .update_this = true, 122 | }; 123 | } 124 | 125 | /// To update a snapshot, use whichever you prefer: 126 | /// - `.update()` method on a particular snap, 127 | /// - `update_all` const in this file, 128 | /// - `SNAP_UPDATE` env var. 129 | fn should_update(snapshot: *const Snap) bool { 130 | return snapshot.update_this or update_all or 131 | std.process.hasEnvVarConstant("SNAP_UPDATE"); 132 | } 133 | 134 | // Compare the snapshot with a formatted string. 135 | pub fn diff_fmt(snapshot: *const Snap, comptime fmt: []const u8, fmt_args: anytype) !void { 136 | const got = try std.fmt.allocPrint(std.testing.allocator, fmt, fmt_args); 137 | defer std.testing.allocator.free(got); 138 | 139 | try snapshot.diff(got); 140 | } 141 | 142 | // Compare the snapshot with the json serialization of a `value`. 143 | pub fn diff_json(snapshot: *const Snap, value: anytype) !void { 144 | var got = std.ArrayList(u8).init(std.testing.allocator); 145 | defer got.deinit(); 146 | 147 | try std.json.stringify(value, .{}, got.writer()); 148 | try snapshot.diff(got.items); 149 | } 150 | 151 | // Compare the snapshot with a given string. 152 | pub fn diff(snapshot: *const Snap, got: []const u8) !void { 153 | if (equal_excluding_ignored(got, snapshot.text)) return; 154 | 155 | std.debug.print( 156 | \\Snapshot differs. 
157 | \\Want: 158 | \\---- 159 | \\{s} 160 | \\---- 161 | \\Got: 162 | \\---- 163 | \\{s} 164 | \\---- 165 | \\ 166 | , 167 | .{ 168 | snapshot.text, 169 | got, 170 | }, 171 | ); 172 | 173 | if (!snapshot.should_update()) { 174 | std.debug.print( 175 | "Rerun with SNAP_UPDATE=1 environmental variable to update the snapshot.\n", 176 | .{}, 177 | ); 178 | return error.SnapDiff; 179 | } 180 | 181 | var arena = std.heap.ArenaAllocator.init(std.testing.allocator); 182 | defer arena.deinit(); 183 | 184 | const allocator = arena.allocator(); 185 | 186 | const file_text = 187 | try std.fs.cwd().readFileAlloc(allocator, snapshot.location.file, 1024 * 1024); 188 | var file_text_updated = try std.ArrayList(u8).initCapacity(allocator, file_text.len); 189 | 190 | const line_zero_based = snapshot.location.line - 1; 191 | const range = snap_range(file_text, line_zero_based); 192 | 193 | const snapshot_prefix = file_text[0..range.start]; 194 | const snapshot_text = file_text[range.start..range.end]; 195 | const snapshot_suffix = file_text[range.end..]; 196 | 197 | const indent = get_indent(snapshot_text); 198 | 199 | try file_text_updated.appendSlice(snapshot_prefix); 200 | { 201 | var lines = std.mem.split(u8, got, "\n"); 202 | while (lines.next()) |line| { 203 | try file_text_updated.writer().print("{s}\\\\{s}\n", .{ indent, line }); 204 | } 205 | } 206 | try file_text_updated.appendSlice(snapshot_suffix); 207 | 208 | try std.fs.cwd().writeFile(.{ .sub_path = snapshot.location.file, .data = file_text_updated.items }); 209 | 210 | std.debug.print("Updated {s}\n", .{snapshot.location.file}); 211 | return error.SnapUpdated; 212 | } 213 | }; 214 | 215 | fn equal_excluding_ignored(got: []const u8, snapshot: []const u8) bool { 216 | var got_rest = got; 217 | var snapshot_rest = snapshot; 218 | 219 | // Don't allow ignoring suffixes and prefixes, as that makes it easy to miss trailing or leading 220 | // data. 
assert(!std.mem.startsWith(u8, snapshot, "<snap:ignore>")); 222 | assert(!std.mem.endsWith(u8, snapshot, "<snap:ignore>")); 223 | 224 | for (0..10) |_| { 225 | // Cut the part before the first ignore, it should be equal between two strings... 226 | const snapshot_cut = cut(snapshot_rest, "<snap:ignore>") orelse break; 227 | const got_cut = cut(got_rest, snapshot_cut.prefix) orelse return false; 228 | if (got_cut.prefix.len != 0) return false; 229 | got_rest = got_cut.suffix; 230 | snapshot_rest = snapshot_cut.suffix; 231 | 232 | // ...then find the next part that should match, and cut up to that. 233 | const next_match = if (cut(snapshot_rest, "<snap:ignore>")) |snapshot_cut_next| 234 | snapshot_cut_next.prefix 235 | else 236 | snapshot_rest; 237 | assert(next_match.len > 0); 238 | snapshot_rest = cut(snapshot_rest, next_match).?.suffix; 239 | 240 | const got_cut_next = cut(got_rest, next_match) orelse return false; 241 | const ignored = got_cut_next.prefix; 242 | // If <snap:ignore> matched an empty string, or several lines, report it as an error. 
if (ignored.len == 0) return false; 244 | if (std.mem.indexOf(u8, ignored, "\n") != null) return false; 245 | got_rest = got_cut_next.suffix; 246 | } else @panic("more than 10 ignores"); 247 | 248 | return std.mem.eql(u8, got_rest, snapshot_rest); 249 | } 250 | 251 | test equal_excluding_ignored { 252 | const TestCase = struct { got: []const u8, snapshot: []const u8 }; 253 | 254 | const cases_ok: []const TestCase = &.{ 255 | .{ .got = "ABA", .snapshot = "ABA" }, 256 | .{ .got = "ABBA", .snapshot = "A<snap:ignore>A" }, 257 | .{ .got = "ABBACABA", .snapshot = "AB<snap:ignore>CA<snap:ignore>A" }, 258 | }; 259 | for (cases_ok) |case| { 260 | try std.testing.expect(equal_excluding_ignored(case.got, case.snapshot)); 261 | } 262 | 263 | const cases_err: []const TestCase = &.{ 264 | .{ .got = "ABA", .snapshot = "ACA" }, 265 | .{ .got = "ABBA", .snapshot = "A<snap:ignore>C" }, 266 | .{ .got = "ABBACABA", .snapshot = "AB<snap:ignore>DA<snap:ignore>BA" }, 267 | .{ .got = "ABBACABA", .snapshot = "AB<snap:ignore>BA<snap:ignore>DA" }, 268 | .{ .got = "ABA", .snapshot = "A<snap:ignore>B<snap:ignore>A" }, 269 | .{ .got = "A\nB\nA", .snapshot = "A<snap:ignore>A" }, 270 | }; 271 | for (cases_err) |case| { 272 | try std.testing.expect(!equal_excluding_ignored(case.got, case.snapshot)); 273 | } 274 | } 275 | 276 | const Range = struct { start: usize, end: usize }; 277 | 278 | /// Extracts the range of the snapshot. Assumes that the snapshot is formatted as 279 | /// 280 | /// ``` 281 | /// snap(@src(), 282 | /// \\first line 283 | /// \\second line 284 | /// ) 285 | /// ``` 286 | /// 287 | /// We could make this more robust by using `std.zig.Ast`, but sticking to manual string processing 288 | /// is simpler, and an enforced, consistent style of snapshots is a good thing. 289 | /// 290 | /// While we expect to find a snapshot after a given line, this is not guaranteed (the file could 291 | /// have been modified between compilation and running the test), but should be rare enough to 292 | /// just fail with an assertion. 
293 | fn snap_range(text: []const u8, src_line: u32) Range { 294 | var offset: usize = 0; 295 | var line_number: u32 = 0; 296 | 297 | var lines = std.mem.split(u8, text, "\n"); 298 | const snap_start = while (lines.next()) |line| : (line_number += 1) { 299 | if (line_number == src_line) { 300 | assert(std.mem.indexOf(u8, line, "@src()") != null); 301 | } 302 | if (line_number == src_line + 1) { 303 | assert(is_multiline_string(line)); 304 | break offset; 305 | } 306 | offset += line.len + 1; // 1 for \n 307 | } else unreachable; 308 | 309 | lines = std.mem.split(u8, text[snap_start..], "\n"); 310 | const snap_end = while (lines.next()) |line| { 311 | if (!is_multiline_string(line)) { 312 | break offset; 313 | } 314 | offset += line.len + 1; // 1 for \n 315 | } else unreachable; 316 | 317 | return Range{ .start = snap_start, .end = snap_end }; 318 | } 319 | 320 | fn is_multiline_string(line: []const u8) bool { 321 | for (line, 0..) |c, i| { 322 | switch (c) { 323 | ' ' => {}, 324 | '\\' => return (i + 1 < line.len and line[i + 1] == '\\'), 325 | else => return false, 326 | } 327 | } 328 | return false; 329 | } 330 | 331 | fn get_indent(line: []const u8) []const u8 { 332 | for (line, 0..) |c, i| { 333 | if (c != ' ') return line[0..i]; 334 | } 335 | return line; 336 | } 337 | --------------------------------------------------------------------------------
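The `cut` primitive above is the workhorse of `equal_excluding_ignored`. As an illustrative sketch only (not part of the repository, and assuming the `cut` function and `Cut` struct from `src/vendor/snaptest.zig` are in scope), a standalone test of its contract might look like:

```zig
const std = @import("std");

// Illustrative sketch: exercises the `cut` helper from snaptest.zig.
// `cut` splits the haystack around the FIRST occurrence of the needle,
// and returns null when the needle is absent.
test "cut splits around the first needle occurrence" {
    const parts = cut("name=graphon", "=").?;
    try std.testing.expectEqualStrings("name", parts.prefix);
    try std.testing.expectEqualStrings("graphon", parts.suffix);

    // Only the first separator splits; later ones stay in the suffix.
    const first = cut("a,b,c", ",").?;
    try std.testing.expectEqualStrings("a", first.prefix);
    try std.testing.expectEqualStrings("b,c", first.suffix);

    // A missing needle yields null rather than an empty split.
    try std.testing.expect(cut("graphon", "=") == null);
}
```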