25 |
26 | ## Contents
27 |
28 |
29 |
30 | ## Summary
31 |
32 | A description of what the Engineering Plan does in 1-2 paragraphs.
33 |
34 | ## Implementation
35 |
36 | How is the change or the feature going to be implemented? Which parts need to be
37 | changed and how?
38 |
39 | ## Tests
40 |
41 | How is the change or the feature going to be tested? Outline what test cases
42 | will be implemented and, roughly, how (e.g. as unit tests, integration tests
43 | etc.). Describe what the corner cases are and how they are accounted for in the
44 | tests.
45 |
46 | ## Migration
47 |
48 | If the change is breaking, even if in parts, are (semi-)automated migrations
49 | possible? If not, how are we going to move users forward (e.g. announce early to
50 | prepare users in time for the change to be activated)? If migrations are
51 | possible, how are they going to work?
52 |
53 | ## Documentation
54 |
55 | How are the changes going to be documented?
56 |
57 | ## Implementation Plan
58 |
59 | An iterative plan for implementing the change or new feature.
60 |
61 | Think hard to not skip over potential challenges. Break the implementation down
62 | into a step-by-step plan where the scope of every step is clear and where there
63 | is no, or little, uncertainty about each individual step and the sequence
64 | overall.
65 |
66 | **Estimates:** Every step/task must come with an estimate of 1, 2, 3, 4 or 5
67 | days.
68 |
69 | **Phases:** For big changes, it can make sense to break down the overall plan
70 | into implementation phases.
71 |
72 | **Integration:** Explicit integration points must be included in the plan. What
73 | are we going to submit for review and integration at what point? What can be
74 | integrated early?
75 |
76 | In order for a plan to be approved, there must be _extremely high_ confidence
77 | that the plan will work out.
78 |
79 | ## Open Questions
80 |
81 | What are unresolved questions? Which areas may we have forgotten about or
82 | neglected in the plan?
83 |
--------------------------------------------------------------------------------
/engineering-plans/0001-graphql-query-prefetching.md:
--------------------------------------------------------------------------------
1 | # PLAN-0001: GraphQL Query Prefetching
2 |
3 |
22 |
23 | This is not really a plan, as it was written and discussed before we adopted
24 | the RFC process, but it contains important implementation details of how we
25 | process GraphQL queries.
26 |
27 | ## Contents
28 |
29 |
30 |
31 |
32 | ## Implementation Details for prefetch queries
33 |
34 |
35 | ### Goal
36 |
37 | For a GraphQL query of the form
38 |
39 | ```graphql
40 | query {
41 | parents(filter) {
42 | id
43 | children(filter) {
44 | id
45 | }
46 | }
47 | }
48 | ```
49 |
50 | we want to generate only two SQL queries: one to get the parents, and one
51 | to get the children for all those parents. The fact that `children` is
52 | nested under `parents` requires that we add a filter to the `children`
53 | query that restricts children to those that are related to the parents we
54 | fetched in the first query to get the parents. How exactly we filter the
55 | `children` query depends on how the relationship between parents and
56 | children is modeled in the GraphQL schema, and on whether one (or both) of
57 | the types involved are interfaces.
58 |
59 | The rest of this writeup is concerned with how to generate the query for
60 | `children`, assuming we already retrieved the list of all parents.
61 |
62 | The bulk of the implementation of this feature can be found in
63 | `graphql/src/store/prefetch.rs`, `store/postgres/src/jsonb_queries.rs`, and
64 | `store/postgres/src/relational_queries.rs`.
65 |
66 |
67 | ### Handling first/skip
68 |
69 | We never get all the `children` for a parent; instead we always have a
70 | `first` and `skip` argument in the children filter. Those arguments need to
71 | be applied to each parent individually by ranking the children for each
72 | parent according to the order defined by the `children` query. If the same
73 | child matches multiple parents, we need to make sure that it is considered
74 | separately for each parent as it might appear at different ranks for
75 | different parents. In SQL, we use a lateral join, essentially a for
76 | loop. For children that store the id of their parent in `parent_id`, we'd
77 | run the following query:
78 |
79 | ```sql
80 | select c.*, p.id
81 | from unnest({parent_ids}) as p(id)
82 | cross join lateral
83 | (select *
84 | from children c
85 | where c.parent_id = p.id
86 | and .. other conditions on c ..
87 | order by c.{sort_key}
88 | limit {first}
89 | offset {skip}) c
90 | order by c.{sort_key}
91 | ```
92 |
93 | ### Handling parent/child relationships
94 |
95 | How we get the children for a set of parents depends on how the
96 | relationship between the two is modeled. The interesting parameters there
97 | are whether parents store a list or a single child, and whether that field
98 | is derived, together with the same for children.
99 |
100 | There are a total of 16 combinations of these four boolean variables; the
101 | four combinations where both parent and child derive their fields are not
102 | permissible. Whether the child derives its parent field also doesn't
103 | matter: when the parent field is not derived, we have to use it, since it
104 | is the only place that stores the parent -> child relationship, and when
105 | the parent field is derived, the child field cannot itself be derived.
106 |
107 | That leaves us with eight combinations of whether the parent
108 | and child store a list or a scalar value, and whether the parent is
109 | derived. For details on the GraphQL schema for each row in this table, see the
110 | section at the end. The `Join cond` indicates how we can find the children
111 | for a given parent. The table refers to the four different kinds of join
112 | condition we might need as types A, B, C, and D.
113 |
114 | | Case | Parent list? | Parent derived? | Child list? | Join cond | Type |
115 | |------|--------------|-----------------|-------------|----------------------------|------|
116 | | 1 | TRUE | TRUE | TRUE | child.parents ∋ parent.id | A |
117 | | 2 | FALSE | TRUE | TRUE | child.parents ∋ parent.id | A |
118 | | 3 | TRUE | TRUE | FALSE | child.parent = parent.id | B |
119 | | 4 | FALSE | TRUE | FALSE | child.parent = parent.id | B |
120 | | 5 | TRUE | FALSE | TRUE | child.id ∈ parent.children | C |
121 | | 6 | TRUE | FALSE | FALSE | child.id ∈ parent.children | C |
122 | | 7 | FALSE | FALSE | TRUE | child.id = parent.child | D |
123 | | 8 | FALSE | FALSE | FALSE | child.id = parent.child | D |
124 |
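To make the four types concrete, here is a rough sketch of how they could be carried around in code; the names are illustrative, and the actual representation in `relational_queries.rs` differs in detail:

```rust
// Illustrative only: one variant per join condition from the table above.
enum JoinCond {
    // Type A: the child stores a list of parent ids (child.parents ∋ parent.id)
    ChildHasParentList { parent_field: String },
    // Type B: the child stores a single parent id (child.parent = parent.id)
    ChildHasParent { parent_field: String },
    // Type C: the parent stores a list of child ids (child.id ∈ parent.children)
    ParentHasChildList { child_id_matrix: Vec<Vec<String>> },
    // Type D: the parent stores a single child id (child.id = parent.child)
    ParentHasChild { child_ids: Vec<String> },
}
```
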
125 | In addition to how the data about the parent/child relationship is stored,
126 | the multiplicity of the parent/child relationship also influences query
127 | generation: if each parent can have at most a single child, queries can be
128 | much simpler than if we have to account for multiple children per parent,
129 | which requires paginating them. We also need to detect cases where the
130 | mappings created multiple children per parent. We do this by adding a
131 | clause `limit {parent_ids.len} + 1` to the query, so that if there is one
132 | parent with multiple children, we will select it, but still protect
133 | ourselves against mappings that produce catastrophically bad data with huge
134 | numbers of children per parent. The GraphQL execution logic will detect
135 | that there is a parent with multiple children, and generate an error.
136 |
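For illustration, the detection step implied by that `limit` clause could look like the following standalone sketch (not the actual graph-node code): any parent id that shows up more than once in the result of a "single child per parent" query means the mappings created bad data.

```rust
use std::collections::HashSet;

// Returns the first parent id that occurs more than once among the rows
// returned for a query that expects at most one child per parent.
fn find_duplicate_parent(parent_ids_of_rows: &[String]) -> Option<&String> {
    let mut seen = HashSet::new();
    parent_ids_of_rows.iter().find(|id| !seen.insert(id.as_str()))
}
```
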
137 | When we query children, we already have a list of all parents from running
138 | a previous query. To find the children, we need the id of the parent that
139 | each child is related to and, when the parent stores the ids of its
140 | children directly (types C and D), the child ids for each parent id.
141 |
142 | The following queries all produce a relation that has the same columns as
143 | the table holding children, plus a column holding the id of the parent that
144 | the child belongs to.
145 |
146 | #### Type A
147 |
148 | Use when parent is derived and child stores a list of parents
149 |
150 | Data needed to generate:
151 |
152 | - children: name of child table
153 | - parent_ids: list of parent ids
154 | - parent_field: name of parents field (array) in child table
155 | - single: boolean to indicate whether a parent has at most one child or
156 | not
157 |
158 | The implementation uses an `EntityLink::Direct` for joins of this type.
159 |
160 | ##### Multiple children per parent
161 | ```sql
162 | select c.*, p.id as parent_id
163 | from unnest({parent_ids}) as p(id)
164 | cross join lateral
165 | (select *
166 | from children c
167 | where p.id = any(c.{parent_field})
168 | and .. other conditions on c ..
169 | order by c.{sort_key}
170 | limit {first} offset {skip}) c
171 | order by c.{sort_key}
172 | ```
173 |
174 | ##### Single child per parent
175 | ```sql
176 | select c.*, p.id as parent_id
177 | from unnest({parent_ids}) as p(id),
178 | children c
179 | where c.{parent_field} @> array[p.id]
180 | and .. other conditions on c ..
181 | limit {parent_ids.len} + 1
182 | ```
183 |
184 | #### Type B
185 |
186 | Use when parent is derived and child stores a single parent
187 |
188 | Data needed to generate:
189 |
190 | - children: name of child table
191 | - parent_ids: list of parent ids
192 | - parent_field: name of parent field (scalar) in child table
193 | - single: boolean to indicate whether a parent has at most one child or
194 | not
195 |
196 | The implementation uses an `EntityLink::Direct` for joins of this type.
197 |
198 | ##### Multiple children per parent
199 | ```sql
200 | select c.*, p.id as parent_id
201 | from unnest({parent_ids}) as p(id)
202 | cross join lateral
203 | (select *
204 | from children c
205 | where p.id = c.{parent_field}
206 | and .. other conditions on c ..
207 | order by c.{sort_key}
208 | limit {first} offset {skip}) c
209 | order by c.{sort_key}
210 | ```
211 |
212 | ##### Single child per parent
213 |
214 | ```sql
215 | select c.*, c.{parent_field} as parent_id
216 | from children c
217 | where c.{parent_field} = any({parent_ids})
218 | and .. other conditions on c ..
219 | limit {parent_ids.len} + 1
220 | ```
221 |
222 | Alternatively, this is worth a try, too:
223 | ```sql
224 | select c.*, c.{parent_field} as parent_id
225 | from unnest({parent_ids}) as p(id), children c
226 | where c.{parent_field} = p.id
227 | and .. other conditions on c ..
228 | limit {parent_ids.len} + 1
229 | ```
230 |
231 | #### Type C
232 |
233 | Use when the parent stores a list of its children.
234 |
235 | Data needed to generate:
236 |
237 | - children: name of child table
238 | - parent_ids: list of parent ids
239 | - child\_id_matrix: array of arrays where `child_id_matrix[i]` is an array
240 | containing the ids of the children for `parent_id[i]`
241 |
242 | The implementation uses an `EntityLink::Parent` for joins of this type.
243 |
244 | ##### Multiple children per parent
245 |
246 | ```sql
247 | select c.*, p.id as parent_id
248 | from rows from (unnest({parent_ids}), reduce_dim({child_id_matrix}))
249 | as p(id, child_ids)
250 | cross join lateral
251 | (select *
252 | from children c
253 | where c.id = any(p.child_ids)
254 | and .. other conditions on c ..
255 | order by c.{sort_key}
256 | limit {first} offset {skip}) c
257 | order by c.{sort_key}
258 | ```
259 |
260 | Note that `reduce_dim` is a custom function that is not part of [ANSI
261 | SQL:2016](https://en.wikipedia.org/wiki/SQL:2016) but is needed as there is
262 | no standard way to decompose a matrix into a table where each row contains
263 | one row of the matrix. The `ROWS FROM` construct is also not part of ANSI
264 | SQL.
265 |
266 | ##### Single child per parent
267 |
268 | Not possible with relations of this type
269 |
270 | #### Type D
271 |
272 | Use when parent is not a list and not derived
273 |
274 | Data needed to generate:
275 |
276 | - children: name of child table
277 | - parent_ids: list of parent ids
278 | - child_ids: list of the id of the child for each parent such that
279 | `child_ids[i]` is the id of the child for `parent_id[i]`
280 |
281 | The implementation uses an `EntityLink::Parent` for joins of this type.
282 |
283 | ##### Multiple children per parent
284 |
285 | Not possible with relations of this type
286 |
287 | ##### Single child per parent
288 |
289 | ```sql
290 | select c.*, p.id as parent_id
291 | from rows from (unnest({parent_ids}), unnest({child_ids})) as p(id, child_id),
292 | children c
293 | where c.id = p.child_id
294 | and .. other conditions on c ..
295 | ```
296 |
297 | The `ROWS FROM` construct is not part of ANSI SQL.
298 |
299 | ### Handling interfaces
300 |
301 | If the GraphQL type of the children is an interface, we need to take
302 | special care to form correct queries. Whether the parents are
303 | implementations of an interface or not does not matter, as we will have a
304 | full list of parents already loaded into memory when we build the query for
305 | the children. Whether the GraphQL type of the parents is an interface may
306 | influence from which parent attribute we get child ids for queries of type
307 | C and D.
308 |
309 | When the GraphQL type of the children is an interface, we resolve the
310 | interface type into the concrete types implementing it, produce a query for
311 | each concrete child type and combine those queries via `union all`.
312 |
313 | Since implementations of the same interface will generally differ in the
314 | schema they use, we can not form a `union all` of all the data in the
315 | tables for these concrete types, but have to first query only attributes
316 | that we know will be common to all entities implementing the interface,
317 | most notably the `vid` (a unique identifier that identifies the precise
318 | version of an entity), and then later fill in the details of each entity by
319 | converting it directly to JSON. A second reason to pass entities as JSON
320 | from the database is that it is impossible with Diesel to execute queries
321 | where the number and types of the columns of the result are not known at
322 | compile time.
323 |
324 | We need to be careful, though, not to convert to JSONB too early, as that
325 | is slow when done for large numbers of rows. Deferring conversion is
326 | responsible for some of the complexity in these queries.
327 |
328 | In the following, we only go through the queries for relational storage;
329 | for JSONB storage, there are similar considerations, though they are
330 | somewhat simpler as the `union all` in the below queries turns into
331 | an `entity = any(..)` clause with JSONB storage, and because we do not need
332 | to convert to JSONB data.
333 |
334 | That means that when we deal with children that are an interface, we will
335 | first select only the following columns from each concrete child type
336 | (where exactly they come from depends on how the parent/child relationship
337 | is modeled)
338 |
339 | ```sql
340 | select '{__typename}' as entity, c.vid, c.id, c.{sort_key}, p.id as parent_id
341 | ```
342 |
343 | and then use that data to fill in the complete details of each concrete
344 | entity. The query `type_query(children)` is the query from the previous
345 | section according to the concrete type of `children`, but without the
346 | `select`, `limit`, `offset` or `order by` clauses. The overall structure of
347 | this query then is
348 |
349 | ```sql
350 | with matches as (
351 | select '{children.object}' as entity, c.vid, c.id,
352 | c.{sort_key}, p.id as parent_id
353 | from .. type_query(children) ..
354 | union all
355 | .. range over all child types ..
356 | order by {sort_key}
357 | limit {first} offset {skip})
358 | select m.*, to_jsonb(c.*) as data
359 | from matches m, {children.table} c
360 | where c.vid = m.vid and m.entity = '{children.object}'
361 | union all
362 | .. range over all child tables ..
363 | order by {sort_key}
364 | ```
365 |
366 | The list `all_parent_ids` must contain the ids of all the parents for which
367 | we want to find children.
368 |
369 | We have one `children` object for each concrete GraphQL type that we need
370 | to query, where `children.table` is the name of the database table in which
371 | these entities are stored, and `children.object` is the GraphQL typename
372 | for these children.
373 |
374 | The code uses an `EntityCollection::Window` containing multiple
375 | `EntityWindow` instances to represent the most general form of querying for
376 | the children of a set of parents, the query given above.
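
As a rough sketch (simplified, with illustrative field names; the real definitions in the store code carry more information), those types can be pictured like this:

```rust
// Simplified picture of the types named above; details are illustrative.
enum EntityLink {
    Direct, // join types A and B: the child row stores the parent id(s)
    Parent, // join types C and D: the parent row stores the child id(s)
}

struct EntityWindow {
    child_type: String,      // concrete GraphQL type of this window's children
    parent_ids: Vec<String>, // ids of the parents we already fetched
    link: EntityLink,        // how children are linked to their parents
}

enum EntityCollection {
    All(Vec<String>),          // toplevel queries: no parent restriction
    Window(Vec<EntityWindow>), // child queries: one window per concrete child type
}
```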
377 |
378 | When there is only one window, we can simplify the above query. The
379 | simplification basically inlines the `matches` CTE. That is important as
380 | CTEs in Postgres before Postgres 12 are optimization fences, even when
381 | they are only used once. We therefore reduce the two queries that Postgres
382 | executes above to one for the fairly common case that the children are not
383 | an interface. For each type of parent/child relationship, the resulting
384 | query is essentially the same as the one given in the section
385 | `Handling parent/child relationships`, except that the `select` clause is
386 | changed to `select '{window.child_type}' as entity, to_jsonb(c.*) as data`:
387 |
388 | ```sql
389 | select '..' as entity, to_jsonb(e.*) as data, p.id as parent_id
390 | from {expand_parents}
391 | cross join lateral
392 | (select *
393 | from children c
394 | where {linked_children}
395 | and .. other conditions on c ..
396 | order by c.{sort_key}
397 | limit {first} offset {skip}) c
398 | order by c.{sort_key}
399 | ```
400 |
401 | Toplevel queries, i.e., queries where we have no parents and therefore do
402 | not restrict the children we return by parent ids, are represented in the
403 | code by an `EntityCollection::All`. If the GraphQL type of the children is
404 | an interface with multiple implementers, we can simplify the query by
405 | avoiding ranking and just using an ordinary `order by` clause:
406 |
407 | ```sql
408 | with matches as (
409 | -- Get uniform info for all matching children
410 | select '{entity_type}' as entity, id, vid, {sort_key}
411 | from {entity_table} c
412 | where {query_filter}
413 | union all
414 | ... range over all entity types
415 | order by {sort_key} offset {query.skip} limit {query.first})
416 | -- Get the full entity for each match
417 | select m.entity, to_jsonb(c.*) as data, c.id, c.{sort_key}
418 | from matches m, {entity_table} c
419 | where c.vid = m.vid and m.entity = '{entity_type}'
420 | union all
421 | ... range over all entity types
422 | -- Make sure we return the children for each parent in the correct order
423 | order by c.{sort_key}, c.id
424 | ```
425 |
426 | And finally, for the very common case of a toplevel GraphQL query for a
427 | concrete type, not an interface, we can further simplify this, again by
428 | essentially inlining the `matches` CTE to:
429 |
430 | ```sql
431 | select '{entity_type}' as entity, to_jsonb(c.*) as data
432 | from {entity_table} c
433 | where {query.filter}
434 | order by {query.order} offset {query.skip} limit {query.first}
435 | ```
436 |
437 | ## Boring list of possible GraphQL models
438 |
439 | These are the eight ways in which a parent/child relationship can be
440 | modeled. For brevity, I left out the `id` attribute on each parent and
441 | child type.
442 |
443 | This list assumes that parent and child types are concrete types, i.e.,
444 | that any interfaces involved in this query have already been resolved into
445 | their implementations and we are dealing with one pair of concrete
446 | parent/child types.
447 |
448 | ```graphql
449 | # Case 1
450 | type Parent {
451 | children: [Child] @derived
452 | }
453 |
454 | type Child {
455 | parents: [Parent]
456 | }
457 |
458 | # Case 2
459 | type Parent {
460 | child: Child @derived
461 | }
462 |
463 | type Child {
464 | parents: [Parent]
465 | }
466 |
467 | # Case 3
468 | type Parent {
469 | children: [Child] @derived
470 | }
471 |
472 | type Child {
473 | parent: Parent
474 | }
475 |
476 | # Case 4
477 | type Parent {
478 | child: Child @derived
479 | }
480 |
481 | type Child {
482 | parent: Parent
483 | }
484 |
485 | # Case 5
486 | type Parent {
487 | children: [Child]
488 | }
489 |
490 | type Child {
491 | # doesn't matter
492 | }
493 |
494 | # Case 6
495 | type Parent {
496 | children: [Child]
497 | }
498 |
499 | type Child {
500 | # doesn't matter
501 | }
502 |
503 | # Case 7
504 | type Parent {
505 | child: Child
506 | }
507 |
508 | type Child {
509 | # doesn't matter
510 | }
511 |
512 | # Case 8
513 | type Parent {
514 | child: Child
515 | }
516 |
517 | type Child {
518 | # doesn't matter
519 | }
520 | ```
521 |
522 | ## Resources
523 |
524 | * [PostgreSQL Manual](https://www.postgresql.org/docs/12/index.html)
525 | * [Browsable SQL Grammar](https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html)
526 | * [Wikipedia entry on ANSI SQL:2016](https://en.wikipedia.org/wiki/SQL:2016). The actual standard is not freely available.
527 |
--------------------------------------------------------------------------------
/engineering-plans/0002-ethereum-tracing-cache.md:
--------------------------------------------------------------------------------
1 | # PLAN-0002: Ethereum Tracing Cache
2 |
3 |
22 |
23 | ## Summary
24 |
25 | Implements RFC-0002: Ethereum Tracing Cache
26 |
27 | ## Implementation
28 |
29 | These changes happen within or near `ethereum_adapter.rs`, `store.rs` and `db_schema.rs`.
30 |
31 | ### Limitations
32 | The problem of reorg turns out to be a particularly tricky one for the cache, mostly due to ranges of blocks being requested rather than individual hashes. To sidestep this problem, only blocks that are older than the reorg threshold will be eligible for caching.
33 |
34 | Additionally, there are some subgraphs which may require traces from all or a substantial number of blocks and that don't make effective use of filtering. In particular, subgraphs which specify a call handler without a contract address fall into this category. In order to prevent the cache from bloating, any use of Ethereum traces which does not filter on a contract address will bypass the cache.
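
Taken together, these two limitations amount to a simple eligibility test; a minimal sketch (the names are placeholders, not actual graph-node identifiers):

```rust
// A trace request goes through the cache only when it filters on a contract
// address and only for blocks at least `reorg_threshold` behind the chain head.
fn is_cacheable(block_number: u64, chain_head: u64, reorg_threshold: u64, has_contract_filter: bool) -> bool {
    has_contract_filter && block_number + reorg_threshold <= chain_head
}
```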
35 |
36 | ### EthereumTraceCache
37 |
38 | The implementation introduces the following trait, which is implemented primarily by `Store`.
39 |
40 | ```rust
41 | use std::ops::RangeInclusive;
42 | 
43 | // `Address` and `Trace` would be the Ethereum address and trace types (e.g. from the `web3` crate).
44 | struct TracesInRange {
45 |     range: RangeInclusive<u64>,
46 |     traces: Vec<Trace>,
47 | }
48 | 
49 | pub trait EthereumTraceCache: Send + Sync + 'static {
50 |     /// Attempts to retrieve traces from the cache. Returns ranges which were retrieved.
51 |     /// The results may not cover the entire range of blocks. It is up to the caller to decide
52 |     /// what to do with ranges of blocks that are not cached.
53 |     fn traces_for_blocks(&self, contract_address: Option<Address>, blocks: RangeInclusive<u64>)
54 |         -> Box<dyn Future<Item = Vec<TracesInRange>, Error = Error> + Send>;
55 |     fn add(&self, contract_address: Option<Address>, traces: Vec<TracesInRange>);
56 | }
57 | ```
56 |
57 | #### Block schema
58 |
59 | Each cached block will exist as its own row in the database in an `eth_traces_cache` table.
60 |
61 | ```rust
62 | table! {
63 |     eth_traces_cache(id) {
64 |         id -> Integer,
65 |         network -> Text,
66 |         block_number -> Integer,
67 |         contract_address -> Bytea,
68 |         traces -> Jsonb,
69 |     }
70 | }
71 | ```
70 |
71 | A multi-column index will be added on network, block_number, and contract_address.
72 |
73 | In the `eth_traces_cache` table, the `network` column has very low cardinality. It is inefficient, for example, to store the string `mainnet` millions of times and to consider this value when querying. A data-oriented approach would be to partition these tables on the value of the network. The hash partitioning available in Postgres 11 is expected to be useful here, but the necessary dependencies won't be ready in time for this RFC. This may be revisited in the future.
74 |
75 | #### Valid Cache Range
76 | Because the absence of trace data for a block is a valid cache result, the database must maintain, in an `eth_traces_meta` table, a data structure indicating which ranges of the cache are valid. This table also makes it possible to eventually clean out old data.
77 |
78 | This is the schema for that structure:
79 | ```rust
80 | table! {
81 |     eth_traces_meta(id) {
82 |         id -> Integer,
83 |         network -> Text,
84 |         start_block -> Integer,
85 |         end_block -> Integer,
86 |         contract_address -> Nullable<Bytea>,
87 |         accessed_at -> Date,
88 |     }
89 | }
90 | ```
87 |
88 | When inserting data into the cache, removing data from the cache, or reading the cache, a serialized transaction must be used to preserve atomicity between the valid cache range structure and the cached blocks. Care must be taken not to rely on any data read outside of the serialized transaction, and to ensure that the serialized transaction does not span any async contexts that rely on a `Future` outside of the database itself. The definition of the `EthereumTraceCache` trait is designed to uphold these guarantees.
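
As a sketch of what this could look like with Diesel's transaction builder (the surrounding function and the queries are illustrative, not the actual implementation):

```rust
use diesel::pg::PgConnection;
use diesel::prelude::*;

fn read_cache(conn: &PgConnection) -> QueryResult<()> {
    // Both the valid-range lookup and the cached-block lookup happen inside
    // one serializable transaction so the two stay consistent with each other.
    conn.build_transaction().serializable().run(|| {
        // 1. read the valid ranges from eth_traces_meta
        // 2. read the cached blocks from eth_traces_cache
        // (queries omitted in this sketch)
        Ok(())
    })
}
```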
89 |
90 | In order to preserve space in the database, whenever a valid cache range is added, adjacent and overlapping ranges are merged into it.
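
A minimal standalone sketch of that merge step (independent of the actual storage code):

```rust
use std::ops::RangeInclusive;

// Insert `new` into the set of valid block ranges, collapsing adjacent and
// overlapping ranges into a single entry.
fn merge_range(mut ranges: Vec<RangeInclusive<u64>>, new: RangeInclusive<u64>) -> Vec<RangeInclusive<u64>> {
    ranges.push(new);
    ranges.sort_by_key(|r| *r.start());
    let mut merged: Vec<RangeInclusive<u64>> = Vec::new();
    for r in ranges {
        match merged.last_mut() {
            // Extend the previous range if `r` overlaps it or is directly adjacent.
            Some(prev) if *r.start() <= prev.end().saturating_add(1) => {
                if r.end() > prev.end() {
                    *prev = *prev.start()..=*r.end();
                }
            }
            _ => merged.push(r),
        }
    }
    merged
}
```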
91 |
92 | ### Cache usage
93 | The primary user of the cache is `EthereumAdapter` in the `traces` function.
94 |
95 | The correct algorithm for retrieving traces from the cache is surprisingly nuanced. The complication arises from the interaction between multiple subgraphs which may require overlapping sets of contract addresses. The rate at which indexing of these subgraphs proceeds can cause different ranges of the cache to be valid for a contract address within a single query.
96 |
97 | We want to minimize the cost of external requests for trace data. It is likely better to:
98 | * Make fewer requests
99 | * Not ask for trace data that is already cached
100 | * Ask for trace data for multiple contract addresses within the same block when possible.
101 |
102 | There is one flow of data which upholds these invariants. In doing so it makes a tradeoff of increasing latency for the execution of a specific subgraph, but increases throughput of the whole system.
103 |
104 | Within this graph:
105 | * Edges which are labelled refer to some subset of the output data.
106 | * Edges which are not labelled refer to the entire set of the output data.
107 | * Each node executes once for each contiguous range of blocks. That is, it merges all incoming data before executing, and executes the minimum possible times.
108 | * The example given is just for 2 addresses. The actual code must work on sets of addresses.
109 |
110 | ```mermaid
111 | graph LR;
112 | A[Block Range for Contract A & B]
113 | A --> |Above Reorg Threshold| E
114 | D[Get Cache A]
115 | A --> |Below Reorg Threshold A| D
116 | A --> |Below Reorg Threshold B| H
117 | E[Ethereum A & B]
118 | F[Ethereum A]
119 | G[Ethereum B]
120 | H[Get Cache B]
121 | D --> |Found| M
122 | H --> |Found| M
123 | M[Result]
124 | D --> |Missing| N
125 | H --> |Missing| N
126 | N[Overlap]
127 | N --> |A & B| E
128 | N --> |A| F
129 | N --> |B| G
130 | E --> M
131 | K[Set Cache A]
132 | L[Set Cache B]
133 | E --> |B Below Reorg Threshold| L
134 | E --> |A Below Reorg Threshold| K
135 | F --> K
136 | G --> L
137 | F --> M
138 | G --> M
139 | ```
140 |
141 |
142 | This construction is designed to make the fewest and most efficient calls possible. It is not as complicated as it looks: the actual construction can be expressed as sequential steps, with a set of filters preceding each step.
143 |
144 | ### Useful dependencies
145 | The feature deals a lot with ranges and sets. Operations like summing, subtracting, merging, and finding overlaps are used frequently. [nested_intervals](https://crates.io/crates/nested_intervals) is a crate which provides some of these operations.
146 |
147 |
148 | ## Tests
149 |
150 | ### Benchmark
151 | A temporary benchmark will be added for indexing a simple subgraph which uses call handlers. The benchmark will be run in these scenarios:
152 | * Sync before changes
153 | * Re-sync before changes
154 | * Sync after changes
155 | * Re-sync after changes
156 |
157 | ### Ranges
158 | Due to the complexity of the resource minimizing data workflow, it will be useful to have mocks for the cache and database which record their calls, and check that expected calls are made for tricky data sets.
159 |
160 | ### Database
161 | A real database integration test will be added to test the add/remove from cache implementation to verify that it correctly merges blocks, handles concurrency issues, etc.
162 |
163 |
164 | ## Migration
165 |
166 | None
167 |
168 | ## Documentation
169 |
170 | None, aside from code comments
171 |
172 | ## Implementation Plan
173 |
174 | These estimates are inflated to account for the author's lack of experience with Postgres, Ethereum, Futures 0.1, and The Graph in general.
175 |
176 | - (1) Create benchmarks
177 | - Postgres Cache
178 | - (0.5) Block Cache
179 | - (0.5) Trace Serialization/Deserialization
180 | - (1.0) Ranges Cache
181 | - (0.5) Concurrency/Transactions
182 | - (0.5) Tests against Postgres
183 | - Data Flow
184 | - (3) Implementation
185 | - (1) Unit tests
186 | - (0.5) Run Benchmarks
187 |
188 | Total: 8.5
189 |
190 |
191 |
--------------------------------------------------------------------------------
/engineering-plans/0003-remove-jsonb-storage.md:
--------------------------------------------------------------------------------
1 | # PLAN-0003: Remove JSONB Storage
2 |
3 |
22 |
23 | ## Summary
24 |
25 | Remove JSONB storage from `graph-node`. That means that we want to remove
26 | the old storage scheme, and only use relational storage going
27 | forward. At a high level, removal has to touch the following areas:
28 |
29 | * user subgraphs in the hosted service
30 | * user subgraphs in self-hosted `graph-node` instances
31 | * subgraph metadata in `subgraphs.entities` (see [this issue](https://github.com/graphprotocol/graph-node/issues/1394))
32 | * the `graph-node` code base
33 |
34 | Because it touches so many areas and different things, JSONB storage
35 | removal will need to happen in several steps, the last being actual removal
36 | of JSONB code. The first three steps above are independent of each other
37 | and can be done in parallel.
38 |
39 | ## Implementation
40 |
41 | ### User Subgraphs in the Hosted Service
42 |
43 | We will need to communicate to users that they need to update their
44 | subgraphs if they still use JSONB storage. Currently, there are ~ 580
45 | subgraphs
46 | ([list](https://gist.github.com/lutter/2e7a7716b70b4144fe0b6a5f1c9066bc))
47 | belonging to 220 different organizations using JSONB storage. It is quite
48 | likely that the vast majority of them are not needed anymore and are simply
49 | left over from somebody trying something out.
50 |
51 | We should contact users and tell them that we will delete their subgraph
52 | after a certain date (say 2020-02-01) _unless_ they deploy a new version of
53 | the subgraph (with an explanation of why, of course). Redeploying their
54 | subgraph is all that is needed for those updates.
55 |
56 | ### Self-hosted User Subgraphs
57 |
58 | We will need to tell users that the 'old' JSONB storage is deprecated and
59 | support for it will be removed as of some target date, and that they need
60 | to redeploy their subgraph.
61 |
62 | Users will need some documentation/tooling to help them understand
63 | * which of their deployed subgraphs still use JSONB storage
64 | * how to remove old subgraphs
65 | * how to remove old deployments
66 |
67 | ### Subgraph Metadata in `subgraphs.entities`
68 |
69 | We can treat the `subgraphs` schema like a normal subgraph, with the
70 | exception that some entities must not be versioned. For that, we will need
71 | to adapt the code so that it is possible to write entities to the store
72 | without recording their version (or, more generally, so that there will only
73 | be one version of the entity, tagged with the block range `[0,)`).
74 |
75 | We will manually create the DDL for the `subgraphs.graphql` schema and run
76 | that as part of a database migration. In that migration, we will also copy
77 | the existing metadata from `subgraphs.entities` and
78 | `subgraphs.entity_history` into their new tables.
79 |
80 | ### The Code Base
81 |
82 | Delete all code handling JSONB storage. This will mostly affect
83 | `entities.rs` and `jsonb_queries.rs` in `graph-store-postgres`, but there
84 | are also smaller cleanups, such as removing the annotations on `Entity`
85 | that serialize it to the JSON format that JSONB uses.
86 |
87 | ## Tests
88 |
89 | Most of the code-level changes are covered by the existing test suite. The
90 | major exception is that the migration of subgraph metadata needs to be
91 | tested and checked manually, using a recent dump of the production
92 | database.
93 |
94 | ## Migration
95 |
96 | See above on migrating data in the `subgraphs` schema.
97 |
98 | ## Documentation
99 |
100 | No user-facing documentation is needed.
101 |
102 | ## Implementation Plan
103 |
104 | _No estimates yet as we should first agree on this general course of
105 | action_
106 |
107 | * Notify hosted users to update their subgraph or have it deleted by date X
108 | * Mark JSONB storage as deprecated and announce when it will be removed
109 | * Provide tool to ship with `graph-node` to delete unused deployments and
110 | unneeded subgraphs
111 | * Add affordance to not version entities to relational storage code
112 | * Write SQL migrations to create new subgraph metadata schema and copy
113 | existing data
114 | * Delete old JSONB code
115 | * On start of `graph-node`, add a check for any deployments that still use
116 |   JSONB storage and log warning messages telling users to redeploy (once
117 |   the JSONB code has been deleted, this data cannot be accessed anymore)
118 |
119 | ## Open Questions
120 |
121 | None
122 |
--------------------------------------------------------------------------------
/engineering-plans/0004-subgraph-grafting.md:
--------------------------------------------------------------------------------
1 | # PLAN-0004: Subgraph Grafting
2 |
3 |
23 |
24 | ## Contents
25 |
26 |
27 |
28 | ## Summary
29 |
30 | This feature makes it possible to resume indexing of a failed subgraph by
31 | creating a new subgraph that uses the failed subgraph's data up to a block
32 | that the subgraph writer deems reliable, and continue indexing using the
33 | new subgraph's mappings. We call this operation *grafting* subgraphs.
34 |
35 | The main use case for this feature is to allow subgraph developers to move
36 | quickly past an error in a subgraph, especially during development. The
37 | overarching goal should always be to replace a subgraph that was grafted
38 | with one that was indexed from scratch as soon as that is reasonably
39 | possible.
40 |
41 | ## Subgraph manifest changes
42 |
43 | The user-facing control for this feature will be a new optional
44 | `graft` field in the subgraph manifest. It specifies the base subgraph and
45 | block number in that subgraph that should be used as the basis of the graft:
46 |
47 | ```yaml
48 | description: My really very good subgraph
49 | graft:
50 | base: Qm...
51 | block: 123456
52 | ```
53 |
54 | - It is a deploy-time error if the source subgraph does not exist.
55 | - The block at which the graft happens must be final. Since requiring that
56 |   the graft point is at least `ETHEREUM_REORG_THRESHOLD` blocks old would require
57 | waiting for an unreasonable amount of time before grafting becomes
58 | possible, grafting will be allowed onto any block the base subgraph has
59 | already indexed. If an attempt is made to revert that block, the grafted
60 | subgraph will fail with an error.
61 | - The base subgraph and the grafted subgraph must have GraphQL schemas that
62 | are compatible with copying data. The grafted subgraph can add and remove
63 | entity types and attributes from the base subgraph's schema, as well as
64 | make non-nullable attributes in the base subgraph nullable. In all other
65 | respects, the two schemas must be identical.
66 | - It is an error to graft onto a base subgraph that uses JSONB storage
67 | (i.e., the base subgraph must use relational storage)
68 |
69 | ## Implementation
70 |
71 | - All the magic will happen inside `Store.create_subgraph_deployment`, which
72 |   will receive an additional argument of type `DeploymentMode` that encapsulates
73 |   the data from the subgraph manifest described above (see the sketch after this list)
74 | - Subgraph creation will first set up the subgraph metadata as we do today,
75 | and then
76 | 1. create the `Layout` and run its DDL in the database
77 | 2. copy the data from the source subgraph into the grafted subgraph using
78 |      `insert into new.table select * from old.table`. It would probably be
79 | faster to copy the data with `create table new.table as table
80 | old.table`; that requires that we manually create constraints and
81 | indexes after copying the data. As that requires a deeper change in
82 | the code that creates the database schema for a subgraph, we'll leave
83 | that as a possible improvement
84 | 3. copy dynamic data sources from the source subgraph. The copying will
85 | create new `DynamicEthereumContractDataSource` entities, and all their
86 | child entities, by creating a new id for the dynamic data source and
87 | adjusting the id's of child entities accordingly
88 | 4. rewind the new subgraph to the desired block by reverting to that
89 | block (both data and metadata)
90 | 5. set the `latestEthereumBlockHash` and `latestEthereumBlockNumber` of
91 | the new subgraph's `SubgraphDeployment` to the desired block
92 | - Since we are changing the manifest format, `graph-cli` will need
93 | validation for that. `graph-node`'s manifest validation will also need to
94 | be updated.
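
As a sketch of the shape this argument could take (the variant and field names are illustrative and may differ from the final definition):

```rust
// Hypothetical shape of the extra argument to Store.create_subgraph_deployment.
enum DeploymentMode {
    // Index from scratch, as happens today.
    Fresh,
    // Copy the base subgraph's data up to `block`, then continue indexing
    // with the new subgraph's mappings.
    Graft { base: String /* IPFS hash of the base subgraph, e.g. Qm... */, block: u64 },
}
```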
95 |
96 | ## Tests
97 |
98 | Details TBD, but the tests need to cover at least subgraph manifests
99 | without a `graft` section and each of the supported `modes`.
100 |
101 | ## Migration
102 |
103 | This is purely additive and requires no migration.
104 |
105 | ## Documentation
106 |
107 | The `graft` field in subgraph manifests will be documented [in the
108 | public
109 | docs](https://thegraph.com/docs/define-a-subgraph#the-subgraph-manifest)
110 | and the [manifest reference](https://github.com/graphprotocol/graph-node/blob/master/docs/subgraph-manifest.md)
111 |
112 | ## Implementation Plan
113 |
114 | - (1) Compare two `Layout`s and report some sort of useful error if they are
115 | not equal/compatible
116 | - (1) Copy subgraph data after creation of the `Layout`; copy dynamic data
117 | sources and adjust their id's
118 | - (0.5) Modify `SubgraphDeployment` entity to have optional `graftBase`
119 | and `graftBlock` attributes and store them (requires database migration)
120 | - (0.5) Rewind an existing subgraph, and set the subgraph's `latestEthereumBlock`
121 | to that
122 | - (0.5) Extract graft details from the subgraph manifest and pass that into
123 | `create_subgraph`. Have `create_subgraph` use the above steps to copy the
124 | source subgraph if the deployment mode asks for that. Check that source
125 | subgraph has advanced past the graft point/block
126 | - (0.5) Change revert logic to make sure a revert never goes past a graft
127 | point
128 | - (0.5) Validate `graft` construct in subgraph manifest
129 | - (0.5) Update documentation
130 | - (0.5) Update validation in `graph-cli`
131 |
132 | ## Open Questions
133 |
134 | - Should grafted subgraphs be marked somehow as it will be very hard to
135 | recreate them in other `graph-node` installations? Recreating such a
136 | subgraph would require deploying the source subgraph, letting it index
137 | past the source block and then deploying the target subgraph. Since the
138 | source subgraph could itself be the result of grafting, it might be
139 | necessary to perform these operations recursively
140 |
--------------------------------------------------------------------------------
/engineering-plans/approved.md:
--------------------------------------------------------------------------------
1 | # Approved Engineering Plans
2 |
3 | - [PLAN-0001: GraphQL Query Prefetching](./0001-graphql-query-prefetching.md)
4 | - [PLAN-0002: Ethereum Tracing Cache](./0002-ethereum-tracing-cache.md)
5 | - [PLAN-0003: Remove JSONB Storage](./0003-remove-jsonb-storage.md)
6 |
--------------------------------------------------------------------------------
/engineering-plans/index.md:
--------------------------------------------------------------------------------
1 | # Engineering Plans
2 |
3 | ## What is an Engineering Plan?
4 |
5 | Engineering Plans are plans to turn an [RFC](../rfcs/index.md) into an
6 | implementation in the core Graph Protocol tools like Graph Node, Graph CLI and
7 | Graph TS. Every substantial development effort that follows an RFC is planned in
8 | the form of an Engineering Plan.
9 |
10 | ## Engineering Plan process
11 |
12 | ### 1. Create a new Engineering Plan
13 |
14 | Like RFCs, Engineering Plans are numbered, starting at `0001`. To create a new
15 | plan, create a new branch of the `rfcs` repository. Check the existing plans to
16 | identify the next number to use. Then, copy the [Engineering Plan
17 | template](https://github.com/graphprotocol/rfcs/blob/master/engineering-plans/0000-template.md)
18 | to a new file in the `engineering-plans/` directory. For example:
19 |
20 | ```sh
21 | cp engineering-plans/0000-template.md engineering-plans/0015-fulltext-search.md
22 | ```
23 |
24 | Write the Engineering Plan, commit it to the branch and open a [pull
25 | request](https://github.com/graphprotocol/rfcs/pulls) in the `rfcs` repository.
26 |
27 | In addition to the Engineering Plan itself, the pull request must include the
28 | following changes:
29 |
30 | - a link to the Engineering Plan on the [Approved Engineering Plans](./approved.md) page, and
31 | - a link to the Engineering Plan under `Approved Engineering Plans` in `SUMMARY.md`.
32 |
33 | ### 2. Engineering Plan review
34 |
35 | After an Engineering Plan has been submitted through a pull request, it is
36 | reviewed. At the time of writing, every Engineering Plan needs to be approved by
37 |
38 | - the Tech Lead, and
39 | - at least one member of the core development team.
40 |
41 | ### 3. Engineering Plan approval
42 |
43 | Once an Engineering Plan is approved, the Engineering Plan meta data (see the
44 | [template](https://github.com/graphprotocol/rfcs/blob/master/engineering-plans/0000-template.md))
45 | is updated and the pull request is merged by the original author or a Graph
46 | Protocol team member.
47 |
--------------------------------------------------------------------------------
/engineering-plans/obsolete.md:
--------------------------------------------------------------------------------
1 | # Obsolete Engineering Plans
2 |
3 | Obsolete Engineering Plans are moved to the `engineering-plans/obsolete`
4 | directory in the `rfcs` repository. They are listed below for reference.
5 |
6 | - No Engineering Plans have been obsoleted yet.
7 |
--------------------------------------------------------------------------------
/engineering-plans/rejected.md:
--------------------------------------------------------------------------------
1 | # Rejected Engineering Plans
2 |
3 | Rejected Engineering Plans can be found by filtering open and closed pull
4 | requests by those that are labeled with `rejected`. This list can be [found
5 | here](https://github.com/graphprotocol/rfcs/issues?q=label:engineering-plan+label:rejected).
6 |
--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | This repository / book describes the process for proposing changes to Graph
4 | Protocol in the form of [RFCs](./rfcs/index.md) and [Engineering
5 | Plans](./engineering-plans/index.md).
6 |
7 | It also includes all approved, rejected and obsolete RFCs and Engineering Plans.
8 | For more details, see the following pages:
9 |
10 | - [RFCs](./rfcs/index.md)
11 | - [Engineering Plans](./engineering-plans/index.md)
12 |
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
1 | {
2 | "name": "@graphprotocol/rfcs",
3 | "description": "",
4 | "version": "0.1.0",
5 | "author": "Graph Protocol, Inc.",
6 | "scripts": {
7 | "build": "curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs > rustup.sh && chmod +x rustup.sh && ./rustup.sh -y && source $HOME/.cargo/env && cargo install mdbook mdbook-toc mdbook-mermaid && mdbook build && cp -R book public"
8 | }
9 | }
10 |
--------------------------------------------------------------------------------
/rfcs/0000-template.md:
--------------------------------------------------------------------------------
1 | # RFC-0000: Template
2 |
3 |
22 |
23 | ## Contents
24 |
25 |
26 |
27 | ## Summary
28 |
29 | A brief summary of the proposal in 1-3 paragraphs.
30 |
31 | ## Goals & Motivation
32 |
33 | What are the reasons for proposing the change? Why is it needed? What will the
34 | benefits be and for whom?
35 |
36 | ## Urgency
37 |
38 | How urgent is this proposal? How soon or late should or must an implementation
39 | of this be available?
40 |
41 | ## Terminology
42 |
43 | What terminology are we introducing? If this RFC is for a new feature, what is
44 | this feature going to be referred to? The goal is that we all speak the same
45 | language and use the same terms when talking about the change.
46 |
47 | ## Detailed Design
48 |
49 | This is the main section of the RFC. What does the proposal include? What are
50 | the proposed interfaces/APIs? How are different affected parties, such as users,
51 | developers or node operators affected by the change and how are they going to
52 | use it?
53 |
54 | ## Compatibility
55 |
56 | Is this proposal backwards-compatible or is it a breaking change? If it is
57 | breaking, how could this be mitigated (think: migrations, announcing ahead of
58 | time like with hard forks, etc.)?
59 |
60 | ## Drawbacks and Risks
61 |
62 | Why might we _not_ want to do this? What cost would implementing this proposal
63 | incur? What risks would be introduced by going down this path?
64 |
65 | ## Alternatives
66 |
67 | What other designs have been considered, if any? For what reasons have they not
68 | been chosen? Are there workarounds that make this change less necessary?
69 |
70 | ## Open Questions
71 |
72 | What are unresolved questions?
73 |
--------------------------------------------------------------------------------
/rfcs/0001-subgraph-composition.md:
--------------------------------------------------------------------------------
1 | # RFC-0001: Subgraph Composition
2 |
3 |
22 |
23 |
24 | ## Summary
25 |
26 | Subgraph composition enables referencing, extending and querying entities across
27 | subgraph boundaries.
28 |
29 | ## Goals & Motivation
30 |
31 | The high-level goal of subgraph composition is to be able to compose subgraph
32 | schemas and data hierarchically. Imagine umbrella subgraphs that combine all the
33 | data from a domain (e.g. DeFi, job markets, music) through one unified, coherent
34 | API. This could allow reuse and governance at different levels and go all the
35 | way to the top, fulfilling the vision of _the_ Graph.
36 |
37 | The ability to reference, extend and query entities across subgraph boundaries
38 | enables several use cases:
39 |
40 | 1. Linking entities across subgraphs.
41 | 2. Extending entities defined in other subgraphs by adding new fields.
42 | 3. Breaking down data silos by composing subgraphs and defining richer schemas
43 | without indexing the same data over and over again.
44 |
45 | Subgraph composition is needed to avoid duplicated work, both in terms of
46 | developing subgraphs as well as indexing them. It is an essential part of the
47 | overall vision behind The Graph, as it allows combining isolated subgraphs into
48 | a complete, connected graph of the (decentralized) world's data.
49 |
50 | Subgraph developers will benefit from the ability to reference data from other
51 | subgraphs, saving them development time and enabling richer data models. dApp
52 | developers will be able to leverage this to build more compelling applications.
53 | Node operators will benefit from subgraph composition by having better insight
54 | into which subgraphs are queried together, allowing them to make more informed
55 | decisions about which subgraphs to index.
56 |
57 | ## Urgency
58 |
59 | Due to the high impact of this feature and its important role in fulfilling the
60 | vision behind The Graph, it would be good to start working on this as early as
61 | possible.
62 |
63 | ## Terminology
64 |
65 | The feature is referred to by _query-time subgraph composition_, short:
66 | _subgraph composition_.
67 |
68 | Terms introduced and used in this RFC:
69 |
70 | - _Imported schema_: The schema of another subgraph from which types are
71 | imported.
72 | - _Imported type_: An entity type imported from another subgraph schema.
73 | - _Extended type_: An entity type imported from another subgraph schema and
74 | extended in the subgraph that imports it.
75 | - _Local schema_: The schema of the subgraph that imports from another subgraph.
76 | - _Local type_: A type defined in the local schema.
77 |
78 | ## Detailed Design
79 |
80 | The sections below make the assumption that there is a subgraph with the name
81 | `ethereum/mainnet` that includes an `Address` entity type.
82 |
83 | ### Composing Subgraphs By Importing Types
84 |
85 | In order to reference entity types from another subgraph, a developer would
86 | first import these types from the other subgraph's schema.
87 |
88 | Types can be imported either from a subgraph name or from a subgraph ID.
89 | Importing from a subgraph name means that the exact version of the imported
90 | subgraph will be identified at query time and its schema may change in arbitrary
91 | ways over time. Importing from a subgraph ID guarantees that the schema will
92 | never change but also means that the import points to a subgraph version that
93 | may become outdated over time.
94 |
95 | Let's say a DAO subgraph contains a `Proposal` type that has a `proposer` field
96 | that should link to an Ethereum address (think: Ethereum accounts or contracts)
97 | and a `transaction` field that should link to an Ethereum transaction. The
98 | developer would then write the DAO subgraph schema as follows:
99 |
100 | ```graphql
101 | type _Schema_
102 | @import(
103 | types: ["Address", { name: "Transaction", as: "EthereumTransaction" }],
104 | from: { name: "ethereum/mainnet" }
105 | )
106 |
107 | type Proposal @entity {
108 | id: ID!
109 | proposer: Address!
110 | transaction: EthereumTransaction!
111 | }
112 | ```
113 |
114 | This would then allow queries that follow the references to addresses and
115 | transactions, like
116 |
117 | ```graphql
118 | {
119 | proposals {
120 | proposer {
121 | balance
122 | address
123 | }
124 | transaction {
125 | hash
126 | block {
127 | number
128 | }
129 | }
130 | }
131 | }
132 | ```
133 |
134 | ### Extending Types From Imported Schemas
135 |
136 | Extending types from another subgraph involves several steps:
137 |
138 | 1. Importing the entity types from the other subgraph.
139 | 2. Extending these types with custom fields.
140 | 3. Managing (e.g. creating) extended entities in subgraph mappings.
141 |
142 | Let's say the DAO subgraph wants to extend the Ethereum `Address` type to
143 | include the proposals created by each respective account. To achieve this, the
144 | developer would write the following schema:
145 |
146 | ```graphql
147 | type _Schema_
148 | @import(
149 | types: ["Address"],
150 | from: { name: "ethereum/mainnet" }
151 | )
152 |
153 | type Proposal @entity {
154 | id: ID!
155 | proposer: Address!
156 | }
157 |
158 | extend type Address {
159 | proposals: [Proposal!]! @derivedFrom(field: "proposal")
160 | }
161 | ```
162 |
163 | This makes queries like the following possible, where the query can go "back"
164 | from addresses to proposal entities, despite the Ethereum `Address` type
165 | originally being defined in the `ethereum/mainnet` subgraph.
166 |
167 | ```graphql
168 | {
169 | addresses {
170 | id
171 | proposals {
172 | id
173 | proposer {
174 | id
175 | }
176 | }
177 | }
178 | ```
179 |
180 | In the above case, the `proposals` field on the extended type is derived, which
181 | means that an implementation wouldn't have to create a local extension type in
182 | the store. However, if `proposals` was defined as
183 |
184 | ```graphql
185 | extend type Address {
186 | proposals: [Proposal!]!
187 | }
188 | ```
189 |
190 | then the subgraph mappings would have to create partial `Address`
191 | entities with `id` and `proposals` fields for all addresses from which proposals
192 | were created. At query time, these entity instances would have to be merged with
193 | the original `Address` entities from the `ethereum/mainnet` subgraph.
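
One way to picture that merge, as a deliberately simplified sketch (real entities are typed values rather than strings, and the names here are illustrative):

```rust
use std::collections::BTreeMap;

// Treat entities as flat field maps for the purpose of this sketch.
type Entity = BTreeMap<String, String>;

// Query-time merge: fields written by the extending subgraph are layered on
// top of the entity loaded from the imported subgraph.
fn merge(imported: Entity, extension: Entity) -> Entity {
    let mut merged = imported;
    merged.extend(extension);
    merged
}
```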
194 |
195 | ### Subgraph Availability
196 |
197 | In the decentralized network, queries will be split and routed through the
198 | network based on what indexers are available and which subgraphs they index. At
199 | that point, failure to find an indexer for a subgraph that types were imported
200 | from will result in a query error. The error that a non-nullable field resolved
201 | to null bubbles up to the next nullable parent, in accordance with the [GraphQL
202 | Spec](https://graphql.github.io/graphql-spec/draft/#sec-Errors.Error-result-format).
203 |
204 | Until the network is reality, we are dealing with individual Graph Nodes and
205 | querying subgraphs where imported entity types are not also indexed on the same
206 | node should be handled with more tolerance. This RFC proposes that entity
207 | reference fields that refer to imported types are converted to being optional in
208 | the generated API schema. If the subgraph that the type is imported from is not
209 | available on a node, such fields should resolve to `null`.
210 |
211 | ### Interfaces
212 |
213 | Subgraph composition also supports interfaces in the ways outlined below.
214 |
215 | #### Interfaces Can Be Imported From Other Subgraphs
216 |
217 | The syntax for this is the same as that for importing types:
218 |
219 | ```graphql
220 | type _Schema_
221 | @import(types: ["ERC20"], from: { name: "graphprotocol/erc20" })
222 | ```
223 |
224 | #### Local Types Can Implement Imported Interfaces
225 |
226 | This is achieved by importing the interface from another subgraph schema
227 | and implementing it in entity types:
228 |
229 | ```graphql
230 | type _Schema_
231 | @import(types: ["ERC20"], from: { name: "graphprotocol/erc20" })
232 |
233 | type MyToken implements ERC20 @entity {
234 | # ...
235 | }
236 | ```
237 |
238 | #### Imported Types Can Be Extended To Implement Local Interfaces
239 |
240 | This is achieved by importing the types from another subgraph schema, defining a
241 | local interface and using `extend` to implement the interface on the imported
242 | types:
243 |
244 | ```graphql
245 | type _Schema_
246 | @import(types: [{ name: "Token", as: "LPT" }], from: { name: "livepeer/livepeer" })
247 | @import(types: [{ name: "Token", as: "Rep" }], from: { name: "augur/augur" })
248 |
249 | interface Token {
250 | id: ID!
251 | balance: BigInt!
252 | }
253 |
254 | extend type LPT implements Token {
255 |   # ...
256 | }
257 | extend type Rep implements Token {
258 | # ...
259 | }
260 | ```
261 |
262 | #### Imported Types Can Be Extended To Implement Imported Interfaces
263 |
264 | This is a combination of importing an interface, importing the types and
265 | extending them to implement the interface:
266 |
267 | ```graphql
268 | type _Schema_
269 | @import(types: ["Token"], from: { name: "graphprotocol/token" })
270 | @import(types: [{ name: "Token", as: "LPT" }], from: { name: "livepeer/livepeer" })
271 | @import(types: [{ name: "Token", as: "Rep" }], from: { name: "augur/augur" })
272 |
273 | extend type LPT implements Token {
274 |   # ...
275 | }
276 | extend type Rep implements Token {
277 | # ...
278 | }
279 | ```
280 |
281 | #### Implementation Concerns For Interface Support
282 |
283 | Querying across types from different subgraphs that implement the same interface
284 | may require a smart algorithm, especially when it comes to pagination. For
285 | instance, if the first 1000 entities for an interface are queried, this range of
286 | 1000 entities may be divided up between different local and imported types
287 | arbitrarily.
288 |
289 | A naive algorithm could request 1000 entities from each subgraph, applying the
290 | selected filters and order, combine the results, and cut off everything after
291 | the first 1000 items. This would require a minimal number of requests but would
292 | involve significant overfetching.
293 |
294 | Another algorithm could fetch just the first item from each subgraph and, based
295 | on that information, divide up the range in more optimal ways than the previous
296 | algorithm, satisfying the query with more requests but with less overfetching.
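
To make the trade-off concrete, here is a minimal sketch of the naive variant in TypeScript; the real implementation would live in Graph Node and operate on its store, and the entity shape and `fetchPage` helper are hypothetical.

```typescript
interface InterfaceEntity {
  id: string
  // ...fields shared by all types implementing the interface
}

// Hypothetical per-subgraph fetch of the first `first` matching entities,
// already filtered and ordered by that subgraph's store.
type FetchPage = (subgraph: string, first: number) => Promise<InterfaceEntity[]>

// Naive cross-subgraph pagination: overfetch `first` items from every
// subgraph, merge the results, re-apply the ordering, and truncate.
async function queryInterfaceNaive(
  subgraphs: string[],
  first: number,
  fetchPage: FetchPage,
  compare: (a: InterfaceEntity, b: InterfaceEntity) => number
): Promise<InterfaceEntity[]> {
  const pages = await Promise.all(subgraphs.map(s => fetchPage(s, first)))
  return pages
    .flat()            // combine the per-subgraph results
    .sort(compare)     // re-apply the requested order across subgraphs
    .slice(0, first)   // cut off everything after the first `first` items
}
```

The smarter variant described above would replace the single overfetching round trip with an initial probe of each subgraph followed by narrower range requests.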
297 |
298 | ## Compatibility
299 |
300 | Subgraph composition is a purely additive, non-breaking change. Existing
301 | subgraphs remain valid without any migrations being necessary.
302 |
303 | ## Drawbacks And Risks
304 |
305 | Reasons that could speak against implementing this feature:
306 |
307 | - Schema parsing and validation become more complicated. In particular, validation
308 |   of imported schemas may not always be possible, depending on whether and when
309 |   the referenced subgraph is available on the Graph Node.
310 |
311 | - Query execution becomes more complicated. The subgraph a type belongs to must
312 |   be identified, and local as well as imported versions of extended entities have
313 |   to be queried separately and merged before returning data to the client.
314 |
315 | ## Alternatives
316 |
317 | No alternative designs are proposed in this RFC.
318 |
319 | There are other ways to compose subgraph schemas using GraphQL technologies such
320 | as [schema
321 | stitching](https://www.apollographql.com/docs/graphql-tools/schema-stitching/)
322 | or [Apollo
323 | Federation](https://www.apollographql.com/docs/apollo-server/federation/introduction/).
324 | However, schema stitching is being deprecated, and Apollo Federation requires a
325 | centralized server to extend and merge the GraphQL APIs. Both of these
326 | solutions slow down queries.
327 |
328 | Another reason not to use these is that GraphQL will only be _one_ of several
329 | query languages supported in the future. Composition therefore has to be
330 | implemented in a query-language-agnostic way.
331 |
332 | ## Open Questions
333 |
334 | - Right now, interfaces require unique IDs across all the concrete entity types
335 | that implement them. This is not something we can guarantee any longer if
336 | these concrete types live in different subgraphs. So we have to handle this at
337 | query time (or must somehow disallow it, returning a query error).
338 |
339 |   It is also unclear what an individual interface entity lookup would look like
340 | if IDs are no longer guaranteed to be unique:
341 | ```graphql
342 | someInterface(id: "?????") {
343 | }
344 | ```
345 |
--------------------------------------------------------------------------------
/rfcs/0002-ethereum-tracing-cache.md:
--------------------------------------------------------------------------------
1 | # RFC-0002: Ethereum Tracing Cache
2 |
3 |
22 |
23 | ## Summary
24 |
25 | This RFC proposes the creation of a local Ethereum tracing cache to speed up indexing of subgraphs which use block and/or call handlers.
26 |
27 | ## Motivation
28 |
29 | When indexing a subgraph that uses block and/or call handlers, it is necessary to extract calls from the trace of each block that a Graph Node indexes. Acquiring and processing traces from Ethereum nodes is expensive in both money and time.
30 |
31 | When developing a subgraph it is common to make changes and deploy those changes to a production Graph Node for testing. Each time a change is deployed, the Graph Node must re-sync the subgraph using the same traces that were used for the previous sync of the subgraph. The cost of acquiring the traces each time a change is deployed impacts a subgraph developer's ability to iterate and test quickly.
32 |
33 | ## Urgency
34 |
35 | None
36 |
37 | ## Terminology
38 |
39 | _Ethereum cache_: The new API proposed here.
40 |
41 | ## Detailed Design
42 |
43 | There is an existing `EthereumCallCache` for caching `eth_call` built into Graph Node today. This cache will be extended to support traces, and renamed to `EthereumCache`.
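
As a rough, non-normative sketch of what the extended cache surface could look like, shown in TypeScript for brevity even though Graph Node is written in Rust; the method names and the trace shape below are assumptions, not an existing API:

```typescript
// Minimal stand-in for a parity-style trace entry.
interface Trace {
  from: string
  to: string
  input: string
  blockNumber: number
}

// Hypothetical shape of `EthereumCache`: the existing eth_call cache keyed by
// (contract, call data, block) plus a new trace cache keyed by the block and
// the filter that produced the traces (e.g. the addresses a subgraph's call
// handlers care about).
interface EthereumCache {
  getCall(contract: string, encodedCall: string, blockHash: string): Uint8Array | undefined
  setCall(contract: string, encodedCall: string, blockHash: string, returnValue: Uint8Array): void

  getTraces(blockHash: string, traceFilter: string): Trace[] | undefined
  setTraces(blockHash: string, traceFilter: string, traces: Trace[]): void
}
```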
44 |
45 | ## Compatibility
46 |
47 | This change is backwards compatible. Existing code can continue to use the parity tracing API. Because the cache is local, each indexing node may delete the cache should the format or implementation of caching change. When the cache has been invalidated, the code will fall back to the existing methods for retrieving a trace and will repopulate the cache.
48 |
49 | ## Drawbacks and Risks
50 |
51 | Subgraphs which are not being actively developed will incur the overhead of storing traces, but will never reap the benefit of reading them back from the cache.
52 |
53 | If this drawback is significant, it may be necessary to extend `EthereumCache` to provide a custom score for cache invalidation other than the current date. For example, `trace_filter` calls could be invalidated based on the latest update time for a subgraph requiring the trace. It is expected that a subgraph which has been updated recently is more likely to be updated again soon than a subgraph which has not been updated recently.
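
A toy illustration of such recency-based scoring follows; nothing in it is part of the proposal, and the half-life constant and eviction threshold are arbitrary examples.

```typescript
// Traces belonging to subgraphs that were updated recently score higher and
// are therefore kept in the cache longer.
function invalidationScore(lastSubgraphUpdate: Date, now: Date = new Date()): number {
  const ageInDays = (now.getTime() - lastSubgraphUpdate.getTime()) / 86_400_000
  // Half-life style decay: a subgraph last updated 30 days ago scores half as
  // much as one updated today.
  return Math.pow(0.5, ageInDays / 30)
}

// Entries whose score falls below a threshold become candidates for eviction.
const shouldEvict = invalidationScore(new Date("2019-06-01")) < 0.1
```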
54 |
55 | ## Alternatives
56 |
57 | None
58 |
59 | ## Open Questions
60 |
61 | None
--------------------------------------------------------------------------------
/rfcs/0003-mutations.md:
--------------------------------------------------------------------------------
1 | # RFC-0003: Mutations
2 |
3 |
19 |
20 | ## Contents
21 |
22 |
23 |
24 | ## Summary
25 |
26 | GraphQL mutations allow developers to add executable functions to their schema. Callers can invoke these functions using GraphQL queries. An introduction to how mutations are defined and work can be found [here](https://graphql.org/learn/queries/#mutations). This RFC will assume the reader understands how to use GraphQL mutations in a traditional Web2 application. This proposal describes how mutations are added to The Graph's toolchain, and used to replace Web3 write operations the same way The Graph has replaced Web3 read operations.
27 |
28 | ## Goals & Motivation
29 |
30 | The Graph has created a read semantic layer that describes smart contract protocols, which has made it easier to build applications on top of complex protocols. Since dApps have two primary interactions with Web3 protocols (reading & writing), the next logical addition is write support.
31 |
32 | Protocol developers that use a subgraph still often publish a Javascript wrapper library for their dApp developers (examples: [DAOstack](https://github.com/daostack/client), [ENS](https://github.com/ensdomains/ensjs), [LivePeer](https://github.com/livepeer/livepeerjs/tree/master/packages/sdk), [DAI](https://github.com/makerdao/dai.js/tree/dev/packages/dai), [Uniswap](https://github.com/Uniswap/uniswap-sdk)). This is done to help speed up dApp development and promote consistency with protocol usage patterns. With the addition of mutations to the Graph Protocol's GraphQL tooling, Web3 reading & writing can now both be invoked through GraphQL queries. dApp developers can now simply refer to a single GraphQL schema that defines the entire protocol.
33 |
34 | ## Urgency
35 |
36 | This is urgent from a developer experience point of view. This addition eliminates the need for protocol developers to manually wrap GraphQL query interfaces alongside developer-friendly write functions. Additionally, mutations provide a solution for optimistic UI updates, which is something dApp developers have been seeking for a long time (see [here](https://github.com/aragon/nest/issues/21)). Lastly, with the whole protocol now defined in GraphQL, existing application-layer code generators can now be used to hasten dApp development ([some examples](https://dev.to/graphqleditor/top-3-graphql-code-generators-1gnj)).
37 |
38 | ## Terminology
39 |
40 | * _Mutations_: Collection of mutations.
41 | * _Mutation_: A GraphQL mutation.
42 | * _Mutations Schema_: A GraphQL schema that defines a `type Mutation`, which contains all mutations. Additionally this schema can define other types to be used by the mutations, such as `input` and `interface` types.
43 | * _Mutations Manifest_: A YAML manifest file that is used to add mutations to an existing subgraph manifest. This manifest can be stored in an external YAML file, or within the subgraph manifest's YAML file under the `mutations` property.
44 | * _Mutation Resolvers_: Code module that contains all resolvers.
45 | * _Resolver_: Function that is used to execute a mutation's logic.
46 | * _Mutation Context_: A context object that's created for every mutation that's executed. It's passed as the 3rd argument to the resolver function.
47 | * _Mutation States_: A collection of mutation states. One is created for each mutation being executed in a given query.
48 | * _Mutation State_: The state of a mutation being executed. Also referred to in this document as "_State_". It is an aggregate of the core & extended states (see below). dApp developers can subscribe to the mutation's state upon execution of the mutation query. See the `useMutation` examples below.
49 | * _Core State_: Default properties present within every mutation state. Some examples: `events: Event[]`, `uuid: string`, and `progress: number`.
50 | * _Extended State_: Properties the mutation developer defines. These are added alongside the core state properties in the mutation state. There are no bounds to what a developer can define here. See examples below.
51 | * _State Events_: Events emitted by mutation resolvers. Also referred to in this document as "_Events_". Events are defined by a `name: string` and a `payload: any`. These events, once emitted, are given to reducer functions which then update the state accordingly.
52 | * _Core Events_: Default events available to all mutations. Some examples: `PROGRESS_UPDATE`, `TRANSACTION_CREATED`, `TRANSACTION_COMPLETED`.
53 | * _Extended Events_: Events the mutation developer defines. See examples below.
54 | * _State Reducers_: A collection of state reducer functions.
55 | * _State Reducer_: Reducers are responsible for translating events into state updates. They take the form of a function that has the inputs [event, current state], and returns the new state post-event. Also referred to in this document as "_Reducer(s)_".
56 | * _Core Reducers_: Default reducers that handle the processing of the core events.
57 | * _Extended Reducers_: Reducers the mutation developer defines. These reducers can be defined for any event, core or extended. The core & extended reducers are run one after another if both are defined for a given core event. See examples below.
58 | * _State Updater_: The state updater object is used by the resolvers to dispatch events. It's passed to the resolvers through the mutation context like so: `context.graph.state`.
59 | * _State Builder_: An object responsible for (1) initializing the state with initial values and (2) defining reducers for events.
60 | * _Core State Builder_: A state builder that's defined by default. It's responsible for initializing the core state properties, and processing the core events with its reducers.
61 | * _Extended State Builder_: A state builder defined by the mutation developer. It's responsible for initializing the extended state properties, and processing the extended events with its reducers.
62 | * _Mutations Config_: Collection of config properties required by the mutation resolvers. Also referred to in this document as "_Config_". All resolvers share the same config. It's passed to the resolver through the mutation context like so: `context.graph.config`.
63 | * _Config Property_: A single property within the config (ex: ipfs, ethereum, etc).
64 | * _Config Generator_: A function that takes a config argument, and returns a config property. For example, "localhost:5001" as a config argument gets turned into a new IPFS client by the config generator.
65 | * _Config Argument_: An initialization argument that's passed into the config generator function. This config argument is provided by the dApp developer.
66 | * _Optimistic Response_: A response given to the dApp that predicts what the outcome of the mutation's execution will be. If it is incorrect, it will be overwritten with the actual result.
67 |
68 | ## Detailed Design
69 |
70 | The sections below illustrate how a developer would add mutations to an existing subgraph, and then add those mutations to a dApp.
71 |
72 | ### Mutations Manifest
73 |
74 | The subgraph manifest (`subgraph.yaml`) now has an extra property named `mutations` which is the mutations manifest.
75 |
76 | `subgraph.yaml`
77 | ```yaml
78 | specVersion: ...
79 | ...
80 | mutations:
81 | repository: https://npmjs.com/package/...
82 | schema:
83 | file: ./mutations/schema.graphql
84 | resolvers:
85 | apiVersion: 0.0.1
86 | kind: javascript/es5
87 | file: ./mutations/index.js
88 | types: ./mutations/index.d.ts
89 | dataSources: ...
90 | ...
91 | ```
92 |
93 | Alternatively, the mutation manifest can be external like so:
94 | `subgraph.yaml`
95 | ```yaml
96 | specVersion: ...
97 | ...
98 | mutations:
99 | file: ./mutations/mutations.yaml
100 | dataSources: ...
101 | ...
102 | ```
103 | `mutations/mutations.yaml`
104 | ```yaml
105 | specVersion: ...
106 | repository: https://npmjs.com/package/...
107 | schema:
108 | file: ./schema.graphql
109 | resolvers:
110 | apiVersion: 0.0.1
111 | kind: javascript/es5
112 | file: ./index.js
113 | types: ./index.d.ts
114 | ```
115 |
116 | NOTE: `resolvers.types` is required. More on this below.
117 |
118 | ### Mutations Schema
119 |
120 | The mutations schema defines all of the mutations in the subgraph. The mutations schema builds on the subgraph schema, allowing the use of types from the subgraph schema, as well as defining new types that are used only in the context of mutations. For example, starting from a base subgraph schema:
121 | `schema.graphql`
122 | ```graphql
123 | type MyEntity @entity {
124 | id: ID!
125 | name: String!
126 | value: BigInt!
127 | }
128 | ```
129 |
130 | Developers can define mutations that reference these subgraph schema types. Additionally new `input` and `interface` types can be defined for the mutations to use:
131 | `mutations/schema.graphql`
132 | ```graphql
133 | input MyEntityOptions {
134 | name: String!
135 | value: BigInt!
136 | }
137 |
138 | interface NewNameSet {
139 | oldName: String!
140 | newName: String!
141 | }
142 |
143 | type Mutation {
144 | createEntity(
145 | options: MyEntityOptions!
146 | ): MyEntity!
147 |
148 | setEntityName(
149 | entity: MyEntity!
150 | name: String!
151 | ): NewNameSet!
152 | }
153 | ```
154 |
155 | `graph-cli` handles the parsing and validating of these two schemas. It verifies that the mutations schema defines a `type Mutation` and that all of the mutations within it are defined in the resolvers module (see next section).
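
For illustration only, a stripped-down version of such a check could look like the following sketch built on `graphql-js`; the function name and the exact rules enforced by `graph-cli` are assumptions.

```typescript
import { parse, ObjectTypeDefinitionNode } from "graphql"

// Verify that the mutations schema defines `type Mutation` and that every
// mutation field has a matching resolver in the resolvers module.
function validateMutationsSchema(
  schemaSource: string,
  resolvers: { Mutation: { [field: string]: Function } }
): string[] {
  const errors: string[] = []
  const doc = parse(schemaSource)

  const mutationType = doc.definitions.find(
    (def): def is ObjectTypeDefinitionNode =>
      def.kind === "ObjectTypeDefinition" && def.name.value === "Mutation"
  )
  if (!mutationType) {
    return ["Mutations schema must define `type Mutation`"]
  }

  for (const field of mutationType.fields || []) {
    if (!resolvers.Mutation[field.name.value]) {
      errors.push(`Missing resolver for mutation \`${field.name.value}\``)
    }
  }
  return errors
}
```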
156 |
157 | ### Mutation Resolvers
158 |
159 | Each mutation within the schema must have a corresponding resolver function defined. Resolvers will be invoked by whatever engine executes the mutation queries (ex: Apollo Client). They are executed locally within the client application.
160 |
161 | Mutation resolvers of kind `javascript/es5` take the form of an ES5 javascript module. This module is expected to have a default export that contains the following properties:
162 | * `resolvers: MutationResolvers` - The mutation resolver functions. The shape of this object must match the shape of the `type Mutation` defined above. See the example below for demonstration of this. Resolvers have the following prototype, [as defined in graphql-js](https://github.com/graphql/graphql-js/blob/9dba58eeb6e28031bec7594b6df34c4fd74459b0/src/type/definition.js#L906):
163 | ```typescript
164 | import { GraphQLFieldResolver } from 'graphql'
165 |
166 | interface MutationContext<
167 | TConfig extends ConfigGenerators,
168 | TState,
169 | TEventMap extends EventTypeMap
170 | > {
171 | [prop: string]: any,
172 | graph: {
173 | config: ConfigProperties,
174 | dataSources: DataSources,
175 | state: StateUpdater
176 | }
177 | }
178 |
179 | interface MutationResolvers<
180 | TConfig extends ConfigGenerators,
181 | TState,
182 | TEventMap extends EventTypeMap
183 | > {
184 | Mutation: {
185 | [field: string]: GraphQLFieldResolver<
186 | any,
187 | MutationContext
188 | >
189 | }
190 | }
191 | ```
192 | * `config: ConfigGenerators` - A collection of config generators. The config object is made up of properties that can be nested, but all of them terminate in a function with the following prototype:
193 | ```typescript
194 | type ConfigGenerator<TArg, TRet> = (arg: TArg) => TRet
195 |
196 | interface ConfigGenerators {
197 | [prop: string]: ConfigGenerator | ConfigGenerators
198 | }
199 | ```
200 | See the example below for a demonstration of this.
201 |
202 | * `stateBuilder: StateBuilder` (optional) - A state builder interface responsible for (1) initializing extended state properties and (2) reducing extended state events. State builders implement the following interface:
203 | ```typescript
204 | type MutationState = CoreState & TState
205 | type MutationEvents = CoreEvents & TEventMap
206 |
207 | interface StateBuilder {
208 | getInitialState(uuid: string): TState,
209 | // Event Specific Reducers
210 | reducers?: {
211 | [TEvent in keyof MutationEvents]?: (
212 | state: MutationState,
213 | payload: InferEventPayload
214 |     ) => OptionalAsync<Partial<MutationState>>
215 | },
216 | // Catch-All Reducer
217 | reducer?: (
218 | state: MutationState,
219 | event: Event
220 |   ) => OptionalAsync<Partial<MutationState>>
221 | }
222 |
223 | interface EventPayload { }
224 |
225 | interface Event {
226 | name: string
227 | payload: EventPayload
228 | }
229 |
230 | interface EventTypeMap {
231 | [name: string]: EventPayload
232 | }
233 |
234 | // Optionally support async functions
235 | type OptionalAsync<T> = Promise<T> | T
236 |
237 | // Infer the payload type from the event name, given an EventTypeMap
238 | type InferEventPayload<
239 | TEvent extends keyof TEvents,
240 | TEvents extends EventTypeMap
241 | > = TEvent extends keyof TEvents ? TEvents[TEvent] : any
242 | ```
243 | See the example below for a demonstration of this.
244 |
245 | For example:
246 | `mutations/index.js`
247 | ```typescript
248 | import {
249 | Event,
250 | EventPayload,
251 | MutationContext,
252 | MutationResolvers,
253 | MutationState,
254 | StateBuilder,
255 | ProgressUpdateEvent
256 | } from "@graphprotocol/mutations"
257 |
258 | import gql from "graphql-tag"
259 | import { ethers } from "ethers"
260 | import {
261 | AsyncSendable,
262 | Web3Provider
263 | } from "ethers/providers"
264 | import IPFS from "ipfs"
265 |
266 | // Typesafe Context
267 | type Context = MutationContext
268 |
269 | /// Mutation Resolvers
270 | const resolvers: MutationResolvers = {
271 | Mutation: {
272 | async createEntity (source: any, args: any, context: Context) {
273 | // Extract mutation arguments
274 | const { name, value } = args.options
275 |
276 | // Use config properties created by the
277 | // config generator functions
278 | const { ethereum, ipfs } = context.graph.config
279 |
280 | // Create ethereum transactions...
281 | // Fetch & upload to ipfs...
282 |
283 | // Dispatch a state event through the state updater
284 | const { state } = context.graph
285 | await state.dispatch("PROGRESS_UPDATE", { progress: 0.5 })
286 |
287 | // Dispatch a custom extended event
288 | await state.dispatch("MY_EVENT", { myValue: "..." })
289 |
290 | // Get a copy of the current state
291 | const currentState = state.current
292 |
293 | // Send another query using the same client.
294 | // This query would result in the graph-node's
295 | // entity store being fetched from. You could also
296 | // execute another mutation here if desired.
297 | const { client } = context
298 | await client.query({
299 |         query: gql`{
300 | myEntity (id: "${id}") {
301 | id
302 | name
303 | value
304 | }
305 | }`
306 | })
307 |
308 | ...
309 | },
310 | async setEntityName (source: any, args: any, context: Context) {
311 | ...
312 | }
313 | }
314 | }
315 |
316 | /// Config Generators
317 | type Config = typeof config
318 |
319 | const config = {
320 | // These function arguments are passed in by the dApp
321 | ethereum: (arg: AsyncSendable): Web3Provider => {
322 | return new ethers.providers.Web3Provider(arg)
323 | },
324 | ipfs: (arg: string): IPFS => {
325 | return new IPFS(arg)
326 | },
327 | // Example of a custom config property
328 | property: {
329 | // Generators can be nested
330 | a: (arg: string) => { },
331 | b: (arg: string) => { }
332 | }
333 | }
334 |
335 | /// (optional) Extended State, Events, and State Builder
336 |
337 | // Extended State
338 | interface State {
339 | myValue: string
340 | }
341 |
342 | // Extended Events
343 | interface MyEvent extends EventPayload {
344 | myValue: string
345 | }
346 |
347 | type EventMap = {
348 | "MY_EVENT": MyEvent
349 | }
350 |
351 | // Extended State Builder
352 | const stateBuilder: StateBuilder = {
353 | getInitialState(): State {
354 | return {
355 | myValue: ""
356 | }
357 | },
358 | reducers: {
359 | "MY_EVENT": async (state: MutationState, payload: MyEvent) => {
360 | return {
361 | myValue: payload.myValue
362 | }
363 | },
364 | "PROGRESS_UPDATE": (state: MutationState, payload: ProgressUpdateEvent) => {
365 | // Do something custom...
366 | }
367 | },
368 | // Catch-all reducer...
369 | reducer: (state: MutationState, event: Event) => {
370 | switch (event.name) {
371 | case "TRANSACTION_CREATED":
372 | // Do something custom...
373 | break
374 | }
375 | }
376 | }
377 |
378 | export default {
379 | resolvers,
380 | config,
381 | stateBuilder
382 | }
383 |
384 | // Required Types
385 | export {
386 | Config,
387 | State,
388 | EventMap,
389 | MyEvent
390 | }
391 | ```
392 |
393 | NOTE: It's expected that the mutations manifest has a `resolvers.types` file defined. The following types must be defined in the .d.ts type definition file:
394 | - `Config`
395 | - `State`
396 | - `EventMap`
397 | - Any `EventPayload` interfaces defined within the `EventMap`
398 |
399 | ### dApp Integration
400 |
401 | In addition to the resolvers module defined above, the dApp has access to a run-time API to help with the instantiation and execution of mutations. This package is called `@graphprotocol/mutations` and is defined like so:
402 | - `createMutations` - Create a mutations interface which enables the user to `execute` a mutation query and `configure` the mutation module.
403 | ```typescript
404 | interface CreateMutationsOptions<
405 | TConfig extends ConfigGenerators,
406 | TState,
407 | TEventMap extends EventTypeMap
408 | > {
409 | mutations: MutationsModule,
410 | subgraph: string,
411 | node: string,
412 | config: ConfigArguments
413 | mutationExecutor?: MutationExecutor
414 | }
415 |
416 | interface Mutations<
417 | TConfig extends ConfigGenerators,
418 | TState,
419 | TEventMap extends EventTypeMap
420 | > {
421 | execute: (query: MutationQuery) => Promise
422 | configure: (config: ConfigArguments) => void
423 | }
424 |
425 | const createMutations = <
426 | TConfig extends ConfigGenerators,
427 | TState = CoreState,
428 | TEventMap extends EventTypeMap = { },
429 | >(
430 | options: CreateMutationsOptions
431 | ): Mutations => { ... }
432 | ```
433 |
434 | - `createMutationsLink` - wrap the mutations created above in an ApolloLink.
435 | ```typescript
436 | const createMutationsLink = <
437 | TConfig extends ConfigGenerators,
438 | TState,
439 | TEventMap extends EventTypeMap,
440 | > (
441 | { mutations }: { mutations: Mutations }
442 | ): ApolloLink => { ... }
443 | ```
444 |
445 | For applications using Apollo and React, a run-time API is available which mimics commonly used hooks and components for executing mutations, with the addition of having the mutation state available to the caller. This package is called `@graphprotocol/mutations-apollo-react` and is defined like so:
446 | - `useMutation` - see https://www.apollographql.com/docs/react/data/mutations/#executing-a-mutation
447 | ```typescript
448 | import { DocumentNode } from "graphql"
449 | import {
450 | ExecutionResult,
451 | MutationFunctionOptions,
452 | MutationResult,
453 | OperationVariables
454 | } from "@apollo/react-common"
455 | import { MutationHookOptions } from "@apollo/react-hooks"
456 | import { CoreState } from "@graphprotocol/mutations"
457 |
458 | type MutationStates = {
459 | [mutation: string]: MutationState
460 | }
461 |
462 | interface MutationResultWithState extends MutationResult {
463 | state: MutationStates
464 | }
465 |
466 | type MutationTupleWithState = [
467 | (
468 | options?: MutationFunctionOptions
469 |   ) => Promise<ExecutionResult>,
470 | MutationResultWithState
471 | ]
472 |
473 | const useMutation = <
474 | TState = CoreState,
475 | TData = any,
476 | TVariables = OperationVariables
477 | >(
478 | mutation: DocumentNode,
479 | mutationOptions: MutationHookOptions
480 | ): MutationTupleWithState => { ... }
481 | ```
482 |
483 | - `Mutation` - see https://www.howtographql.com/react-apollo/3-mutations-creating-links/
484 | ```typescript
485 | interface MutationComponentOptionsWithState<
486 | TState,
487 | TData,
488 | TVariables
489 | > extends BaseMutationOptions {
490 | mutation: DocumentNode
491 | children: (
492 | mutateFunction: MutationFunction,
493 | result: MutationResultWithState
494 | ) => JSX.Element | null
495 | }
496 |
497 | const Mutation = <
498 | TState = CoreState,
499 | TData = any,
500 | TVariables = OperationVariables
501 | >(
502 | props: MutationComponentOptionsWithState
503 | ): JSX.Element | null => { ... }
504 | ```
505 |
506 | For example:
507 | `dApp/src/App.tsx`
508 | ```typescript
509 | import {
510 | createMutations,
511 | createMutationsLink
512 | } from "@graphprotocol/mutations"
513 | import {
514 | Mutation,
515 | useMutation
516 | } from "@graphprotocol/mutations-apollo-react"
517 | import myMutations, { State } from "mutations-js-module"
518 | import { createHttpLink } from "apollo-link-http"
519 |
520 | const mutations = createMutations({
521 | mutations: myMutations,
522 | // Config args, which will be passed to the generators
523 | config: {
524 | // Config args can take the form of functions to allow
525 | // for dynamic fetching behavior
526 |     ethereum: async (): Promise<AsyncSendable> => {
527 | const { ethereum } = (window as any)
528 | await ethereum.enable()
529 | return ethereum
530 | },
531 | ipfs: "http://localhost:5001",
532 | property: {
533 | a: "...",
534 | b: "..."
535 | }
536 | },
537 | subgraph: "my-subgraph",
538 | node: "http://localhost:8080"
539 | })
540 |
541 | // Create Apollo links to handle queries and mutation queries
542 | const mutationLink = createMutationsLink({ mutations })
543 | const queryLink = createHttpLink({
544 | uri: "http://localhost:8080/subgraphs/name/my-subgraph"
545 | })
546 |
547 | // Create a root ApolloLink which splits queries between
548 | // the two different operation links (query & mutation)
549 | const link = split(
550 | ({ query }) => {
551 | const node = getMainDefinition(query)
552 | return node.kind === "OperationDefinition" &&
553 | node.operation === "mutation"
554 | },
555 | mutationLink,
556 | queryLink
557 | )
558 |
559 | // Create an Apollo Client
560 | const client = new ApolloClient({
561 | link,
562 | cache: new InMemoryCache()
563 | })
564 |
565 | const CREATE_ENTITY = gql`
566 | mutation createEntity($options: MyEntityOptions) {
567 | createEntity(options: $options) {
568 | id
569 | name
570 | value
571 | }
572 | }
573 | `
574 |
575 | // exec: execution function for the mutation query
576 | // loading: https://www.apollographql.com/docs/react/data/mutations/#tracking-mutation-status
577 | // state: mutation state instance
578 | const [exec, { loading, state }] = useMutation(
579 | CREATE_ENTITY,
580 | {
581 | client,
582 | variables: {
583 | options: { name: "...", value: 5 }
584 | }
585 | }
586 | )
587 |
588 | // Access the mutation's state like so:
589 | state.createEntity.myValue
590 |
591 | // Optimistic responses can be used to update
592 | // the UI before the execution has finished.
593 | // More information can be found here:
594 | // https://www.apollographql.com/docs/react/performance/optimistic-ui/
595 | const [exec, { loading, state }] = useMutation(
596 | CREATE_ENTITY,
597 | {
598 | optimisticResponse: {
599 | __typename: "Mutation",
600 | createEntity: {
601 | __typename: "MyEntity",
602 | name: "...",
603 | value: 5,
604 | // NOTE: ID must be known so the
605 | // final response can be correlated.
606 | // Please refer to Apollo's docs.
607 | id: "id"
608 | }
609 | },
610 | variables: {
611 | options: { name: "...", value: 5 }
612 | }
613 | }
614 | )
615 | ```
616 | ```html
617 | // Use the Mutation JSX Component
618 | <Mutation
619 |   mutation={CREATE_ENTITY}
620 |   variables={{ options: { name: "...", value: 5 } }}
621 | >
622 |   {(exec, { loading, state }) => (
623 |     <button onClick={() => exec()} disabled={loading}>Create Entity</button>
624 |   )}
625 | </Mutation>
626 | ```
627 |
628 |
629 | ## Compatibility
630 |
631 | No breaking changes will be introduced, as mutations are an optional add-on to a subgraph.
632 |
633 | ## Drawbacks and Risks
634 |
635 | Nothing apparent at the moment.
636 |
637 | ## Alternatives
638 |
639 | The existing alternative, the hand-written JavaScript wrapper libraries that protocol developers create for dApp developers, has been described above.
640 |
641 | ## Open Questions
642 |
643 | - **How can mutations pick up where they left off in the event of an abrupt application shutdown?**
644 | Since mutations can contain many different steps internally, it would be ideal to be able to support continuing resolver execution in the event the dApp abruptly shuts down.
645 |
646 | - **How can dApps understand what steps a given mutation will take during the course of its execution?**
647 | dApps may want to present user-friendly progress updates, letting users know that a given mutation is (for example) 3/4 of the way through its execution, along with a high-level description of each step. I view this as closely tied to the previous open question, since we could support continuing resolver execution if we know what step a mutation is currently undergoing. A potential implementation could include adding a `steps: Step[]` property to the core state, where `Step` looks similar to:
648 | ```typescript
649 | interface Step {
650 | id: string
651 | title: string
652 | description: string
653 | status: 'pending' | 'processing' | 'error' | 'finished'
654 | current: boolean
655 | error?: Error
656 | data: any
657 | }
658 | ```
659 |
660 | This, plus a few core events & reducers, would be all we need to render UIs like the ones seen here: https://ant.design/components/steps/
661 |
662 | - **Should dApps be able to define event handlers for mutation events?**
663 | dApps may want to implement their own handlers for specific events emitted from mutations. These handlers would be different from the reducers, as we wouldn't want them to be able to modify the state. Instead they could store their own state elsewhere within the dApp based on the events.
664 |
665 | - **Should the Graph Node's schema introspection endpoint respond with the "full" schema, including the mutations' schema?**
666 | Developers could fetch the "full" schema by looking up the subgraph's manifest, read the `mutations.schema.file` hash value, and fetching the full schema from IPFS. Should the graph-node support querying this full schema directly from the graph-node itself through the introspection endpoint?
667 |
668 | - **Will server side execution ever be a reality?**
669 | I have not thought of a trustless solution to this, am curious if anyone has any ideas of how we could make this possible.
670 |
671 | - **Will The Graph Explorer support mutations?**
672 | We could have the explorer client-side application dynamically fetch and include mutation resolver modules. Configuring the resolvers module dynamically is problematic though. Maybe there are a few known config properties that the explorer client supports, and for all others it allows the user to input config arguments (if they're base types).
673 |
--------------------------------------------------------------------------------
/rfcs/0004-fulltext-search.md:
--------------------------------------------------------------------------------
1 | # RFC-0004: Fulltext Search
2 |
3 |
22 |
23 | ## Contents
24 |
25 |
26 |
27 | ## Summary
28 |
29 | The fulltext search filter type is a feature of the GraphQL API that
30 | allows subgraph developers to specify language-specific, lexical,
31 | composite filters that end users can use in their queries. The fulltext
32 | search feature examines all words in a document, breaking it into
33 | individual words and phrases (lexical analysis), and collapsing
34 | variations of words into a single index term (stemming).
35 |
36 | ## Goals & Motivation
37 |
38 | The current set of string filters available in the GraphQL API is lacking
39 | fulltext search capabilities that enable efficient searches across entities
40 | and attributes. Wildcard string matching does provide string filtering, but
41 | users have come to expect the easy to use filtering that comes with fulltext
42 | search systems.
43 |
44 | To facilitate building effective user interfaces, human-friendly query
45 | filtering is essential. Lexical, composite fulltext search filters can provide
46 | the tools necessary for front-end developers to implement powerful search
47 | bars that filter data across multiple fields of an Entity.
48 |
49 | The proposed feature aims to provide tools for subgraph developers to define
50 | composite search APIs that can search across multiple fields and entities.
51 |
52 | ## Urgency
53 |
54 | A delay in adding the fulltext search feature will not create issues
55 | with current deployments. However, the feature will represent a
56 | realization of part of the long term vision for the query network. In
57 | addition, several high profile users have communicated that it may be a
58 | conversion blocker. Implementation should be prioritized.
59 |
60 | ## Terminology
61 |
62 | - _lexeme_: a basic lexical unit of a language, consisting of one word or
63 | several words, considered as an abstract unit, and applied to a family
64 | of words related by form or meaning.
65 | - _morphology (linguistics)_: the study of words, how they are formed,
66 | and their relationship to other words in the same language.
67 | - _fulltext search index_: the result of lexical and morphological
68 | analysis (stemming) of a set of text documents. It provides frequency
69 | and location for the language-specific stems found in the text documents
70 | being indexed.
71 | - _ranking algorithm_: "Ranking attempts to measure how relevant documents
72 | are to a particular query, so that when there are many matches the most
73 | relevant ones can be shown first." [- Postgres Documentation](https://www.postgresql.org/docs/11/textsearch-controls.html#TEXTSEARCH-RANKING-SEARCH-RESULTS)
74 |
75 | **Algorithms**:
76 | - _standard ranking_: ranking based on the number of matching lexemes.
77 | - _cover density ranking_: Cover density is similar to the standard
78 | fulltext search ranking except that the proximity of matching lexemes
79 | to each other is taken into consideration. This function requires
80 | lexeme positional information to perform its calculation, so it ignores
81 | any "stripped" lexemes in the index.
82 |
83 | ## Detailed Design
84 |
85 | ### Subgraph Schema
86 |
87 | Part of the power of the fulltext search API is its flexibility, so
88 | it is important to expose a simple interface that facilitates useful applications
89 | of the index and reduces the need to create new subgraphs for the
90 | express purpose of updating fulltext search fields.
91 |
92 | For each fulltext search API a subgraph developer must be able to specify:
93 | 1. a language (specified using an `ISO 639-1` code),
94 | 2. a set of text document fields to include,
95 | 3. relative weighting for each field,
96 | 4. a choice of ranking algorithm for sorting query result items.
97 |
98 | The proposed process of adding one or more fulltext search APIs involves
99 | adding one or more `@fulltext` directives to the `_Schema_` type in the
100 | subgraph's GraphQL schema. Each fulltext definition will have four
101 | required top level parameters: `name`, `language`, `algorithm`, and
102 | `include`. The fulltext search definitions will be used to generate
103 | query fields on the GraphQL schema that will be exposed to the end user.
104 |
105 | Enabling fulltext search across entities will be a powerful abstraction
106 | that allows users to search across all relevant entities in one query. Such
107 | a search will by definition have polymorphic results. To address this, a
108 | union type will be generated in the schema for the fulltext search results.
109 |
110 | Validation of the fulltext definition will ensure that all fields referenced
111 | in the directive are valid String type fields. With subgraph composition
112 | it will be possible to easily create new subgraphs that add specific fulltext
113 | search capabilities to an existing subgraph.
114 |
115 | Example fulltext search definition:
116 |
117 | ```graphql
118 | type _Schema_
119 | @fulltext(
120 | name: "media"
121 | ...
122 | )
123 | @fulltext(
124 | name: "search",
125 | language: EN, # variant of `_FullTextLanguage` enum
126 | algorithm: RANKED, # variant of `_FullTextAlgorithm` enum
127 | include: [
128 | {
129 | entity: "Band",
130 | fields: [
131 | { name: "name", weight: 5 },
132 | ]
133 | },
134 | {
135 | entity: "Album",
136 | fields: [
137 | { name: "title", weight: 5 },
138 | ]
139 | },
140 | {
141 | entity: "Musician",
142 | fields: [
143 | { name: "name", weight: 10 },
144 | { name: "bio", weight: 5 },
145 | ]
146 | }
147 | ]
148 | )
149 | ```
150 |
151 | The schema generated from the above definition:
152 | ```graphql
153 | union _FulltextMediaEntity = ...
154 | union _FulltextSearchEntity = Band | Album | Musician
155 | type Query {
156 | media...
157 | search(text: String!, first: Int, skip: Int, block: Block_height): [FulltextSearchResultItem!]!
158 | }
159 | ```
160 |
161 | ### GraphQL Query interface
162 |
163 | End users of the subgraph will have access to the fulltext search
164 | queries alongside the other queries available for each entity in the
165 | subgraph. In the case of a fulltext search defined across multiple
166 | entities,
167 | [inline fragments](https://graphql.org/learn/queries/#inline-fragments)
168 | may be used in the query to deal with the polymorphic result items. In
169 | the front-end the `__typename` field can be used to distinguish the
170 | concrete entity types of the returned results.
171 |
172 | In the `text` parameter supplied to the query there will be several operators
173 | available to the end user. Included are the and, or, and proximity operators
174 | (`&`, `|`, `<->`). The proximity operator additionally allows clients to specify
175 | the maximum distance between search terms: `foo<3>bar` is equivalent to
176 | requesting that `foo` and `bar` are at most three words apart.
177 |
178 |
179 |
180 | Example query using inline fragments and the proximity operator:
181 | ```graphql
182 | query {
183 | search(text: "Bob<3>run") {
184 | __typename
185 | ... on Band { name label { id } }
186 | ... on Album { title numberOfTracks }
187 | ... on Musician { name bio }
188 | }
189 | }
190 | ```
191 |
192 | ### Tools and Design
193 |
194 | Fulltext search query system implementations often involve specific systems
195 | for storing and querying the text documents; however, in an effort to reduce
196 | system complexity and feature implementation time, I propose starting with
197 | extending the current store interface and storage implementation with fulltext
198 | search features rather than using a fulltext-specific interface and storage
199 | system.
200 |
201 | A fulltext search field will get its own column in a table dedicated to fulltext
202 | data. The data stored will be the result of the lexical, morphological analysis
203 | of text documents performed on the fields included in the index. The fulltext
204 | search field will be created using the Postgres `to_tsvector` function and will
205 | be indexed using a GIN index. The subgraph developer will define a ranking
206 | algorithm to be used to sort query results, so the end-user facing API remains
207 | easy to use without any requirement to understand the ranking algorithms.
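
Purely to illustrate the storage approach described above, and not as a prescribed schema, the following sketch shows the kind of SQL a fulltext definition could translate into; the table, column, field weights, and the mapping of the ranking algorithms to `ts_rank`/`ts_rank_cd` are assumptions.

```typescript
// Hypothetical DDL derived from a @fulltext definition covering the Musician
// entity's `name` (weight A) and `bio` (weight B) fields.
const createColumnAndIndex = `
  ALTER TABLE musician ADD COLUMN search tsvector;
  UPDATE musician SET search =
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(bio,  '')), 'B');
  CREATE INDEX musician_search ON musician USING GIN (search);
`

// Hypothetical query generation: standard ranking maps to ts_rank,
// cover density ranking maps to ts_rank_cd.
const searchQuery = (coverDensity: boolean) => `
  SELECT id, name
  FROM musician
  WHERE search @@ to_tsquery('english', $1)
  ORDER BY ${coverDensity ? "ts_rank_cd" : "ts_rank"}(search, to_tsquery('english', $1)) DESC
  LIMIT $2;
`
```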
208 |
209 | ## Compatibility
210 |
211 | This proposal does not change any existing interfaces, so no migrations
212 | will be necessary for existing subgraph deployments.
213 |
214 | ## Drawbacks and Risks
215 |
216 | The proposed solution uses native Postgres fulltext features, and there is
217 | a nonzero probability that this choice results in slower-than-optimal write and
218 | read times; however, the tradeoff in implementation time/complexity and the
219 | existence of production use-case testimonials tempers my apprehension here.
220 |
221 | In future phases of the network the storage layer may get a redesign with
222 | indexes being overhauled to facilitate query result verification. Postgres
223 | based fulltext search implementation would not be translatable to another
224 | storage system, so at the least a reevaluation of the tools used for analysis,
225 | indexing, and querying would be required.
226 |
227 | ## Alternatives
228 |
229 | An alternative design for the feature would allow more flexibility for
230 | Graph Node operators in their index implementation and create a
231 | marketplace for indexes. In this alternative, the definition of fulltext
232 | search indexes could be moved out of the subgraph schema. The subgraph
233 | would be deployed without them and they could be added later using a new
234 | Graph Explorer interface (in Hosted-Service context) or a JSON-RPC
235 | request directly to a Graph Node. Moving the creation of fulltext search
236 | indexes/queries out of the schema would mean that the definition of
237 | uniqueness for a subgraph does not include the custom indexes, so a new
238 | subgraph deployment and subgraph re-syncing work does not have to be
239 | added in order to create or update an index. However, it also introduces
240 | significant added complexity. A separate query marketplace and discovery
241 | registry would be required for finding nodes with the needed
242 | subgraph-index combination.
243 |
244 | ## Open Questions
245 |
246 | Fulltext search queries introduce new issues with maintaining query
247 | result determinism, which will become a more potent issue with the
248 | decentralized network. A fulltext search query and a dataset are not enough
249 | to determine the output of the query; the index is vital to establish a
250 | deterministic causal relationship to the output data. Query verification
251 | will need to take into account the query, the index, the underlying dataset,
252 | and the query result. Can we find a healthy compromise between being
253 | prescriptive about the indexes and algorithms in order to allow formal
254 | verification and allowing indexer node operators to experiment with
255 | algorithms and indexes in order to continue to improve query speed and results?
256 |
257 | Since a fulltext search field is purely derivative of other entity data,
258 | the addition or update of an `@fulltext` directive does not require a full
259 | blockchain resync; rather, the index itself just needs to be rebuilt.
260 | There is room for optimization in the future by allowing fulltext search
261 | definition updates without requiring a full subgraph resync.
--------------------------------------------------------------------------------
/rfcs/0005-multi-blockchain-support.md:
--------------------------------------------------------------------------------
1 | # RFC-0005: Multi-Blockchain Support
2 |
3 |
16 |
17 | ## Summary
18 |
19 | Multi-blockchain support allows subgraphs to index data from blockchains other
20 | than Ethereum.
21 |
22 | ## Goals & Motivation
23 |
24 | The main objective of multi-blockchain support is to add support for subgraphs
25 | to index data from a growing number of blockchains. At the time of writing this
26 | RFC, only Ethereum and blockchains compatible with the Ethereum JSON-RPC API are
27 | supported.
28 |
29 | This feature unlocks the same use cases on other blockchains that The Graph has
30 | made possible on Ethereum, including indexing historical data, interacting with
31 | smart contracts (if supported by the respective blockchain), handling reorgs (if
32 | applicable) and triggering off of blockchain-specific features like events or
33 | transactions.
34 |
35 | ## Urgency
36 |
37 | Since this is a big feature and one that requires a substantial refactoring of
38 | the existing codebase, and _may_ introduce breaking changes, it would be good to
39 | start working on this as soon as possible.
40 |
41 | ## Terminology
42 |
43 | The feature proposed in this RFC is referred to as **multi-blockchain support**.
44 | Other terms that are introduced in this RFC or are relevant when talking about
45 | it are listed below.
46 |
47 | - **Network** - a decentralized technology or similar, such as Ethereum,
48 | Substrate, or IPFS.
49 | - **Blockchain** - a blockchain technology, such as Ethereum or Substrate.
50 | - **Chain** - a specific instance of a _blockchain_, such as Ethereum mainnet.
51 | - **Chain subgraph** - a built-in subgraph that indexes data intrinsic to a
52 | _chain_, such as its blocks, transactions, accounts and balances.
53 | - **Trigger** - a _chain_ event or similar that can be observed on a _chain_ and
54 | that can be used to trigger indexing work via a subgraph mapping. Examples:
55 | blocks, transactions, contract method calls, logs/events.
56 |
57 | ## Detailed Design
58 |
59 | Multi-blockchain support touches almost every aspect of The Graph, including:
60 |
61 | - Subgraph manifests and data sources
62 | - Code generation
63 | - Mapping APIs and types
64 | - Subgraph indexing
65 | - Storage
66 |
67 | This section specifies how all of these components are updated to support other
68 | chains and what the integration of additional chains looks like. With this
69 | feature implemented, subgraph developers will be able to define new types of
70 | data sources for supported chains. Indexers will be able to configure which of
71 | these chains they want to enable.
72 |
73 | ### Subgraph Manifest
74 |
75 | The structure of the subgraph manifest is already designed for being extended
76 | with new types of data sources. An Ethereum data source is currently defined with
77 |
78 | ```yaml
79 | kind: ethereum/contract
80 | name: Gravity
81 | source:
82 | chain: mainnet
83 | address: '2E645469f354BB4F5c8a05B3b30A929361cf77eC'
84 | startBlock: 5000000
85 | abi: Gravity
86 | mapping:
87 | kind: wasm/assemblyscript
88 | apiVersion: 0.0.3
89 | eventHandlers:
90 | - event: NewGravatar(address,address,uint256)
91 | handler: handleNewGravatar
92 | callHandlers:
93 | - function: createGravatar(string,string)
94 | handler: handleCreateGravatar
95 | blockHandlers:
96 | - handler: handleBlock
97 | ```
98 |
99 | Multi-blockchain opens this up to other blockchains by allowing data sources to
100 | have `kind` values other than `ethereum/contract`. Support for individual
101 | blockchains will require separate RFCs, each of which will introduce one or more
102 | data source types with their own `kind` values such as e.g.
103 | `substrate/contract`.
104 |
105 | One change that simplifies working with ABIs and makes the code generated for
106 | them available across data sources is to move ABIs to the top level of the
107 | manifest and mark them with a `kind` as well:
108 |
109 | ```yaml
110 | kind: subgraph:1.0.0
111 | schema:
112 | file: ./schema.graphql
113 | bindings:
114 | - kind: ethereum/contract
115 | name: Gravity
116 | file: ./abis/Gravity.json
117 | - kind: ethereum/contract
118 | name: Gravatar
119 | file: ./abis/Gravatar.json
120 | dataSources:
121 | ...
122 | templates:
123 | ...
124 | ```
125 |
126 | Manifests can be migrated from the old structure to this one automatically. It
127 | will, however, require code changes and a database migration to support top-level ABIs.
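
A rough sketch of what such an automatic migration could look like, written in TypeScript over the parsed YAML, is shown below; the shape of the old manifest and of the new `bindings` section follows the examples in this RFC, and everything else is an assumption.

```typescript
interface Abi { name: string; file: string }

interface OldManifest {
  dataSources: Array<{
    kind: string                        // e.g. "ethereum/contract"
    mapping: { abis?: Abi[]; [key: string]: any }
    [key: string]: any
  }>
  [key: string]: any
}

// Lift ABIs out of each data source's mapping into a top-level `bindings`
// section, deduplicating by name and tagging each one with its source kind.
function migrateManifest(old: OldManifest) {
  const bindings = new Map<string, { kind: string; name: string; file: string }>()
  for (const ds of old.dataSources) {
    for (const abi of ds.mapping.abis || []) {
      bindings.set(abi.name, { kind: ds.kind, name: abi.name, file: abi.file })
    }
  }
  return { ...old, kind: "subgraph:1.0.0", bindings: Array.from(bindings.values()) }
}
```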
128 |
129 | ### Code Generation
130 |
131 | Code generation for schemas remains untouched by multi-blockchain support. Code
132 | generation for ABIs is extended to support more ABIs than just Ethereum's. Based
133 | on the type of ABI in the top-level `bindings` section of the manifest, different
134 | code generation logic is applied.
135 |
136 | What code is generated exactly will vary from chain to chain and is therefore
137 | not specified in this RFC.
138 |
139 | ### Mapping APIs and Types
140 |
141 | Every chain comes with its own APIs and type system. Ethereum, for example, has
142 | contracts, events, calls, blocks, an API to make contract calls and a variety of
143 | low-level types for tuples/structs, byte arrays and signed/unsigned integers.
144 |
145 | The APIs that come with each chain are bundled in a host-exported module named
146 | after the blockchain. This requires the current Ethereum-specific types such as
147 | `EthereumEvent`, `EthereumCall`, `EthereumValue` etc. to be renamed. From a
148 | developer's perspective, each chain module can be imported from
149 | `@graphprotocol/graph-ts` as follows:
150 |
151 | ```js
152 | import { ethereum, substrate } from '@graphprotocol/graph-ts'
153 |
154 | // Types available now:
155 | //
156 | // - ethereum.Block
157 | // - ethereum.Event
158 | // - ethereum.Contract
159 | // - ethereum.Call
160 | // - ethereum.Value
161 | //
162 | // - substrate.Block
163 | // - ...
164 | ```
165 |
166 | Most, but not all, of these are internal to code generation. The same goes for
167 | low-level types such as Ethereum's `uint256` or `bytes32` type. These are
168 | already sufficiently abstracted behind `Bytes`, `BigInt`, `BigDecimal` and the
169 | code generated for Ethereum tuples.
170 |
171 | Additional types may be necessary for other chains, but it is expected that the
172 | types implemented today are sufficient to start with.
173 |
174 | > **Note:** An alternative to the above could be to split chain modules up into
175 | > their own files or even packages, allowing them to be imported with either
176 | > ```js
177 | > import { ... } from '@graphprotocol/graph-ts/ethereum'
178 | > import { ... } from '@graphprotocol/graph-ts/substrate'
179 | > ```
180 | > or
181 | > ```js
182 | > import { ... } from '@graphprotocol/ethereum'
183 | > import { ... } from '@graphprotocol/substrate'
184 | > ```
185 |
186 | Each blockchain integration must include such a module as well as a WASM host
187 | exports implementation to back this module and type conversions from/to
188 | AssemblyScript for low-level types.
189 |
190 | > **Note:** Integrating a blockchain initially requires modifications across
191 | > Graph Node, Graph CLI and Graph TS. A more convenient extension concept can
192 | > still be developed later.
193 | >
194 | > - Graph Node: An alternative could be optional dependencies behind feature flags, to pull in
195 | > support for specific chains at build time.
196 | >
197 | > - Graph CLI: Gluegun, used for Graph CLI, includes an extension framework that
198 | >   would allow injecting code generation and type conversion support from
199 | > separate NPM packages.
200 | >
201 | > - Graph TS: Separate NPM packages for chain-specific mapping APIs and types as
202 | > suggested in the previous note would make extending Graph TS easy.
203 |
204 | ### Subgraph Indexing
205 |
206 | This section discusses how the subgraph indexing components and data structures will change, with
207 | some being reused and others becoming chain-specific and therefore abstracted by Rust traits. [This
208 | diagram](https://whimsical.com/multi-blockchain-13ZinTmYAUn5YknGjcZc4e) gives an overview of the
209 | desired architecture for a running subgraph. The arrows describe the flow of data.
210 |
211 | Data enters the system by being requested from the chain
212 | itself, for example through JSON-RPC calls. The __adapters__ are the components responsible for
213 | interacting directly with the chain. They are broken up into specialized traits for each component
214 | that requires an adapter, such as the `IngestorAdapter` and `TriggersAdapter`. For performance, the
215 | adapter may cache and index into the local database the data pulled from the chain. Each chain will
216 | have its own DB schema (aka namespace) under which adapters may create any necessary custom DB
217 | tables.
218 |
219 | The __block ingestor__ and the __chain store__ will be reused across chains. There will be one
220 | instance of each per chain, and an `IngestorAdapter` will provide access to chain-specific data. Each
221 | `blocks` table will be under the DB schema reserved for the corresponding chain. The
222 | `ChainHeadListener`, which handles the PG channel that notifies of new blocks, will also be reused
223 | across chains.
224 |
225 | The __block stream__ is initially chain-specific, though as we implement the first new chains, we
226 | will seek to build a more reusable block stream implementation. A reusable block stream would depend
227 | on a chain-specific `TriggersAdapter`. For performance, the block stream will be made asynchronous
228 | from the instance manager, so it will no longer have direct access to the subgraph store.
229 |
230 | The __instance manager__, __mapping runtime__ and related core components will be reused, though it
231 | will take considerable refactoring to move out the Ethereum-specific code.
232 |
233 | The __subgraph store__ will be reused since GraphQL entities behave the same for any chain. Even for
234 | the manifest and dynamic data source metadata we should be able to reuse the existing tables.
235 |
236 | #### Data structures
237 |
238 | A chain will define associated types such as `Block` and `Trigger` which will be produced by the
239 | adapters and over which the reusable components will need to be made generic. Since the `Chain`
240 | trait aggregates all chain-specific associated types, most of the codebase will be made generic over
241 | a `C: Chain` type parameter. The manifest data structure will also need to be made generic over the
242 | chain data source type.
243 |
244 | The `BlockPointer` type is shared across all chains; the existing `EthereumBlockPointer` will be
245 | renamed, and its `hash` field will have its type changed from `H256` to `Bytes`.
246 |
247 | #### Chain Indexer
248 |
249 | Chains may include a built-in subgraph for block data, this is to be detailed in a future RFC. But
250 | this is an optional feature and does not replace the block ingestor.
251 |
252 | ### Adding Support for a New Blockchain
253 |
254 | This section covers what it takes to add support for a new blockchain. Please
255 | note that this RFC specifies the details only to an extent that allow us to
256 | assess whether the direction is feasible. Names and data structure details may
257 | vary in the implementation.
258 |
259 | #### Graph Node
260 |
261 | Each blockchain integration in Graph Node lives in the `graph-node` repository
262 | under `network//`, like the already existing `network/ethereum/` support.
263 |
264 | Graph Node includes a number of new component traits and data types to enable
265 | multi-blockchain support, sketched below:
266 |
267 | ```rust
268 | /// A compact representation of a block on a chain.
269 | struct BlockPointer {
270 | pub number: u64,
271 | pub hash: Bytes,
272 | }
273 |
274 | /// Represents a block in a particular chain. Each chain has its own
275 | /// implementation of the `Block` trait.
276 | trait Block: ToAsc {
277 | fn number(&self) -> u64 { self.ptr().number }
278 |     fn hash(&self) -> Bytes { self.ptr().hash }
279 | fn ptr(&self) -> BlockPointer;
280 |     fn parent_ptr(&self) -> Option<BlockPointer>;
281 | }
282 |
283 | trait DataSource { /* ... */ }
284 | trait IngestorAdapter { /* ... */ }
285 | trait TriggersAdapter { /* ... */ }
286 | trait HostFns { /* ... */ }
287 |
288 | trait BlockStreamBuilder {
289 |     fn new(&self, start_block: BlockPointer, data_sources: Vec<DataSource>, trigger_adapter: Arc<TriggersAdapter>) -> Result<BlockStream, Error>;
290 | }
291 |
292 | /// Represents a blockchain supported by the node.
293 | trait Blockchain {
294 | /// The DB schema for the `blocks` table and any custom tables.
295 | const DB_SCHEMA: &'static str;
296 |
297 | type Block: Block;
298 | type DataSource: DataSource;
299 | type IngestorAdapter: IngestorAdapter;
300 |     type TriggersAdapter: TriggersAdapter;
301 | type BlockStream: BlockStream;
302 | type HostFns: HostFns;
303 | type ChainTrigger: ChainTrigger;
304 | // ...other adapters
305 | }
306 |
307 | /// Triggers emitted by a block stream. Triggers can be passed to
308 | /// AssemblyScript and implement a canonical ordering to ensure
309 | /// they are not processed out of order.
310 | trait ChainTrigger: ToAsc + Ord {
311 | fn matches_data_source(&self, data_source: &C::DataSource) -> bool;
312 | fn matches_handler(&self, handler: &C::DataSourceHandler) -> bool;
313 | }
314 |
315 | /// Combination of a block and subgraph-specific triggers for it.
316 | struct BlockWithTriggers {
317 | block: Box,
318 | triggers: Vec,
319 | }
320 |
321 | /// Events emitted by a block stream.
322 | enum BlockStreamEvent {
323 | Revert(BlockPointer),
324 | Block(BlockWithTriggers)
325 | }
326 |
327 | /// Common trait for all block streams.
328 | trait BlockStream: Stream> {}
329 | ```
330 |
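To make the shape of these traits more tangible, the following sketch shows how a hypothetical integration might implement the `Blockchain` trait. It uses a trimmed-down, self-contained version of the trait, and every `Substrate*` name as well as the schema string is purely an illustrative assumption.

```rust
// Self-contained sketch; `Blockchain` here is a reduced stand-in for the
// trait above, and all `Substrate*` types are hypothetical.
pub struct Bytes(pub Vec<u8>);

pub struct BlockPointer {
    pub number: u64,
    pub hash: Bytes,
}

pub trait Blockchain {
    /// The DB schema for the `blocks` table and any custom tables.
    const DB_SCHEMA: &'static str;

    type Block;
    type DataSource;
    type TriggersAdapter;
}

// Chain-specific types the integration would provide.
pub struct SubstrateBlock {
    pub ptr: BlockPointer,
}

pub struct SubstrateDataSource {
    // e.g. pallet, call and event filters
}

pub struct SubstrateTriggersAdapter {
    // e.g. an RPC client for fetching blocks and events
}

/// Marker type representing the chain itself.
pub struct Substrate;

impl Blockchain for Substrate {
    // Assumed, heavily simplified schema for illustration only.
    const DB_SCHEMA: &'static str =
        "create table blocks (hash bytea primary key, number int8, data jsonb);";

    type Block = SubstrateBlock;
    type DataSource = SubstrateDataSource;
    type TriggersAdapter = SubstrateTriggersAdapter;
}
```
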
331 | In addition, each blockchain will have to be configured in the main `graph-node`
332 | executable. This RFC does not include details on how this takes place, because
333 | the configuration options may vary from blockchain to blockchain. Ultimately,
334 | what `graph-node` builds before starting up is a set of supported networks /
335 | blockchains, each with a set of supported chains. These are then passed on to,
336 | for instance, the subgraph instance manager.
337 |
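One way to picture the result of that start-up configuration (the `BlockchainMap` name and shape below are assumptions, not part of this RFC) is a registry keyed by network and chain name, which components such as the subgraph instance manager receive and use to look up the chain a subgraph indexes:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Object-safe subset of chain functionality that the node needs at runtime;
/// the single method is only here to make the trait non-empty.
pub trait BlockchainKind: Send + Sync {
    fn name(&self) -> &str;
}

/// Networks ("ethereum", "substrate", ...) each with their supported chains
/// ("mainnet", "ropsten", ...), built once at start-up and then shared.
pub struct BlockchainMap {
    chains: HashMap<(String, String), Arc<dyn BlockchainKind>>,
}

impl BlockchainMap {
    pub fn new() -> Self {
        BlockchainMap { chains: HashMap::new() }
    }

    pub fn insert(&mut self, network: &str, chain: &str, blockchain: Arc<dyn BlockchainKind>) {
        self.chains
            .insert((network.to_string(), chain.to_string()), blockchain);
    }

    pub fn get(&self, network: &str, chain: &str) -> Option<Arc<dyn BlockchainKind>> {
        self.chains
            .get(&(network.to_string(), chain.to_string()))
            .cloned()
    }
}
```
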
338 | #### Graph CLI
339 |
340 | Each blockchain integration comes in the form of a new JS module in the
341 | `graph-cli` repo under `src/networks/<name>/`. The existing Ethereum support is moved
342 | to `src/networks/ethereum/`.
343 |
344 | Each support module is required to export the following structure from its
345 | `index.js`:
346 |
347 | ```js
348 | module.exports = {
349 | // Extensions to the GraphQL schema used to validate the manifest.
350 | // Could come in the form of new binding and data source types:
351 | //
352 | // extend union Binding = SubstrateContractBinding
353 | // extend union DataSource = SubstrateDataSource
354 | // extend union DataSourceTemplate = SubstrateDataSourceTemplate
355 | // ...
356 | //
357 | manifestExtensions: graphql.Document,
358 |
359 | // Type conversions to and from AssemblyScript.
360 | //
361 | // Conversions are defined as an array of tuples, each of which
362 | // defines the source type (as a pattern or string), the target
363 | // type (as a pattern or string), and a code snippet that performs
364 | // the conversion.
365 | //
366 | // An example for Ethereum could look as follows:
367 | typeConversions: {
368 | fromAssemblyScript: [
369 | ['Address', 'address', code => `ethereum.Value.fromAddress(${code})`],
370 | ['boolean', 'bool', code => `ethereum.Value.fromBoolean(${code})`],
371 | ['Bytes', 'byte', code => `ethereum.Value.fromFixedBytes(${code})`],
372 | ['Bytes', 'bytes', code => `ethereum.Value.fromBytes(${code})`],
373 | [
374 | 'Bytes',
375 | /^bytes(1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32)$/,
376 | code => `ethereum.Value.fromFixedBytes(${code})`,
377 | ],
378 | ...
379 | ],
380 | toAssemblyScript: [
381 | ['address', 'Address', code => `${code}.toAddress()`],
382 | ['bool', 'boolean', code => `${code}.toBoolean()`],
383 | ['byte', 'Bytes', code => `${code}.toBytes()`],
384 | [
385 | /^bytes(1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32)?$/,
386 | 'Bytes',
387 | code => `${code}.toBytes()`,
388 | ],
389 | ...
390 | ]
391 | },
392 |
393 | // Code generation for bindings,
394 | codegen: (binding) => {
395 | // Return a string to be put into the output file for the binding
396 | }
397 | }
398 | ```
399 |
400 | Mapping files ending with `.ts` are automatically detected wherever they appear
401 | in the manifest and included in builds and deployments.
402 |
403 | #### Graph TS
404 |
405 | Each blockchain supported in mappings comes with its own `networks/<name>.ts`
406 | file in the `graph-ts` repository. Each of these has at least two sections: one
407 | for declaring host exports that are required for mappings to interact with the
408 | chain, and another to define blockchain-specific types and helpers written in
409 | AssemblyScript. The file `networks/ethereum.ts` could look as follows:
410 |
411 | ```js
412 | import { Address } from '../index'
413 |
414 | export declare namespace ethereum {
415 | function call(contract: Address, fn: string, args: Array<Value>): void
416 | }
417 |
418 | export namespace ethereum {
419 | class Value { ... }
420 | class Block { ... }
421 | class Transaction { ... }
422 | class Event { ... }
423 | class Contract { ... }
424 | }
425 | ```
426 |
427 | The main `index.ts` in the `graph-ts` repository re-exports each of these chain
428 | integrations via
429 |
430 | ```js
431 | export * from './networks/ethereum'
432 | export * from './networks/substrate'
433 | ```
434 |
435 | This makes it possible to import them into subgraph mappings with
436 |
437 | ```js
438 | import { ethereum, substrate } from '@graphprotocol/graph-ts'
439 | ```
440 |
441 | ### Cross-Chain Indexing
442 |
443 | For the time being, the plan is to limit subgraphs to a single chain each, so
444 | subgraphs cannot index across different blockchains, or even across different
445 | chains of the same blockchain (e.g. Ethereum mainnet _and_ Ropsten), at the same time.
446 |
447 | ## Compatibility
448 |
449 | The proposed design is mostly backwards-compatible, except for a few areas:
450 |
451 | - The manifest changes require a subgraph manifest migration in Graph CLI, as
452 |   well as a database migration for the subgraph metadata to match the new
453 |   structure.
454 |
455 | - Moving AssemblyScript types from e.g. `EthereumValue` to `ethereum.Value`
456 |   _may_ break subgraphs that directly access these types. This is acceptable,
457 |   because the majority of (if not all) subgraphs only use them indirectly through
458 |   generated code. Subgraphs that are already deployed are not affected.
459 |
460 | ## Drawbacks and Risks
461 |
462 | Generally, supporting more blockchains is an opportunity much more than it is a
463 | risk. It widens the reach of The Graph to communities beyond Ethereum.
464 |
465 | One risk in the design itself is that it may require a bit of experimentation to
466 | get the Graph Node traits for integrating arbitrary chains right.
467 |
468 | ## Alternatives
469 |
470 | There are nuances where the design could look different, specifically around how
471 | the manifest spec could be updated and whether new blockchain integrations are
472 | developed as plugins or not. For the moment, an approach _without_ plugins is
473 | considered the most efficient path forward.
474 |
475 | ## Open Questions
476 |
477 | - None.
478 |
--------------------------------------------------------------------------------
/rfcs/approved.md:
--------------------------------------------------------------------------------
1 | # Approved RFCs
2 |
3 | - [RFC-0001: Subgraph Composition](./0001-subgraph-composition.md)
4 | - [RFC-0002: Ethereum Tracing Cache](./0002-ethereum-tracing-cache.md)
5 | - [RFC-0003: Mutations](./0003-mutations.md)
6 | - [RFC-0004: Fulltext Search](./0004-fulltext-search.md)
7 | - [RFC-0005: Multi-Blockchain Support](./0005-multi-blockchain-support.md)
8 |
--------------------------------------------------------------------------------
/rfcs/index.md:
--------------------------------------------------------------------------------
1 | # RFCs
2 |
3 | ## What is an RFC?
4 |
5 | An RFC describes a change to Graph Protocol, for example a new feature. Any
6 | substantial change goes through the RFC process, where the change is described
7 | in an RFC, is proposed as a pull request to the `rfcs` repository, is reviewed
8 | (currently by the core team), and is ultimately either approved or
9 | rejected.
10 |
11 | ## RFC process
12 |
13 | ### 1. Create a new RFC
14 |
15 | RFCs are numbered, starting at `0001`. To create a new RFC, create a new branch
16 | of the `rfcs` repository. Check the existing RFCs to identify the next number to
17 | use. Then, copy the [RFC
18 | template](https://github.com/graphprotocol/rfcs/blob/master/rfcs/0000-template.md)
19 | to a new file in the `rfcs/` directory. For example:
20 |
21 | ```sh
22 | cp rfcs/0000-template.md rfcs/0015-fulltext-search.md
23 | ```
24 |
25 | Write the RFC, commit it to the branch and open a [pull
26 | request](https://github.com/graphprotocol/rfcs/pulls) in the `rfcs` repository.
27 |
28 | In addition to the RFC itself, the pull request must include the following
29 | changes:
30 |
31 | - a link to the RFC on the [Approved RFCs](./approved.md) page, and
32 | - a link to the RFC under `Approved RFCs` in `SUMMARY.md`.
33 |
34 | ### 2. RFC review
35 |
36 | After an RFC has been submitted through a pull request, it is reviewed. At
37 | the time of writing, every RFC needs to be approved by
38 |
39 | - at least one Graph Protocol founder, and
40 | - at least one member of the core development team.
41 |
42 | ### 3. RFC approval
43 |
44 | Once an RFC is approved, the RFC metadata (see the
45 | [template](https://github.com/graphprotocol/rfcs/blob/master/rfcs/0000-template.md))
46 | is updated and the pull request is merged by the original author or a Graph
47 | Protocol team member.
48 |
--------------------------------------------------------------------------------
/rfcs/obsolete.md:
--------------------------------------------------------------------------------
1 | # Obsolete RFCs
2 |
3 | Obsolete RFCs are moved to the `rfcs/obsolete` directory in the `rfcs`
4 | repository. They are listed below for reference.
5 |
6 | - No RFCs have been obsoleted yet.
7 |
--------------------------------------------------------------------------------
/rfcs/rejected.md:
--------------------------------------------------------------------------------
1 | # Rejected RFCs
2 |
3 | Rejected RFCs can be found by filtering open and closed pull requests by those
4 | that are labeled with `rejected`. This list can be [found
5 | here](https://github.com/graphprotocol/rfcs/issues?q=label:rfc+label:rejected).
6 |
--------------------------------------------------------------------------------