├── Readme.md
├── analysis
│   ├── Modeling_durability_using_replication_state_and_related_metrics_(Markov_Chain_Model).ipynb
│   ├── PoR_test_analysis_with_multiple_storage_nodes.ipynb
│   ├── TokenValuation.ipynb
│   ├── block-discovery-sim
│   │   ├── .Rbuildignore
│   │   ├── .Rprofile
│   │   ├── .gitignore
│   │   ├── DESCRIPTION
│   │   ├── R
│   │   │   ├── collate.R
│   │   │   ├── node.R
│   │   │   ├── partition.R
│   │   │   ├── sim.R
│   │   │   └── stats.R
│   │   ├── README.md
│   │   ├── block-discovery-sim.Rmd
│   │   ├── block-discovery-sim.Rproj
│   │   ├── renv.lock
│   │   ├── renv
│   │   │   ├── .gitignore
│   │   │   ├── activate.R
│   │   │   └── settings.json
│   │   └── tests
│   │       ├── testthat.R
│   │       └── testthat
│   │           ├── test-partition.R
│   │           └── test-stats.R
│   └── block-discovery.Rmd
├── design
│   ├── Merkle.md
│   ├── contract-deployment.md
│   ├── marketplace.md
│   ├── metadata-overhead.md
│   ├── proof-erasure-coding.md
│   ├── proof-erasure-coding.ods
│   ├── sales.md
│   ├── slot-reservations.md
│   └── storage-proof-timing.md
├── evaluations
│   ├── account abstraction.md
│   ├── arweave.md
│   ├── eigenlayer.md
│   ├── filecoin.md
│   ├── ipfs.md
│   ├── rollups.md
│   ├── rollups.ods
│   ├── sia.md
│   ├── sidechains.md
│   ├── sidechains.ods
│   ├── statechannels
│   │   ├── disputes.md
│   │   └── overview.md
│   ├── storj.md
│   ├── sui.md
│   ├── swarm.md
│   └── zeroknowledge.md
├── incentives-rationale.md
├── meetings
│   └── bogota2022.md
├── papers
│   ├── Compact_Proofs_of_Retrievability
│   │   └── README.md
│   ├── Economics_of_BitTorrent_communities
│   │   └── README.md
│   ├── Falcon_Codes_Fast_Authenticated_LT_Codes
│   │   └── README.md
│   ├── Filecoin_A_Decentralized_Storage_Network
│   │   └── README.md
│   ├── Peer-to-Peer_Storage_System_a_Practical_Guideline_to_be_lazy
│   │   └── README.md
│   ├── README.md
│   ├── Sui
│   │   └── sui.md
│   └── template.md
├── project-overview.md
└── robust-data-possesion-scheme.md
/Readme.md:
--------------------------------------------------------------------------------
1 | Codex Research
2 | ===============
3 |
4 | Contains research for the Codex peer-to-peer storage network.
5 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^renv$
2 | ^renv\.lock$
3 | ^.*\.Rproj$
4 | ^\.Rproj\.user$
5 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/.Rprofile:
--------------------------------------------------------------------------------
1 | source("renv/activate.R")
2 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .RData
3 | .Rhistory
4 | *nb.html
5 | rsconnect
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: blockdiscoverysim
2 | Title: Block Discovery Simulator
3 | Version: 0.0.0.9000
4 | Description: Simple Simulation for Block Discovery
5 | Encoding: UTF-8
6 | Roxygen: list(markdown = TRUE)
7 | RoxygenNote: 7.2.3
8 | Depends:
9 | shiny (>= 1.7.4.1),
10 | tidyverse (>= 2.0.0),
11 | purrr (>= 1.0.1),
12 | VGAM (>= 1.1-8),
13 | R6 (>= 2.2.2),
14 | plotly (>= 4.10.2)
15 | Suggests:
16 | devtools,
17 | testthat (>= 3.0.0)
18 | Config/testthat/edition: 3
19 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/R/collate.R:
--------------------------------------------------------------------------------
1 | # We do this hack because rsconnect doesn't seem to like us bundling the app
2 | # as a package.
3 |
4 | order <- c(
5 | 'R/partition.R',
6 | 'R/stats.R',
7 | 'R/node.R',
8 | 'R/sim.R'
9 | )
10 |
11 | library(R6)
12 | library(purrr)
13 | library(tidyverse)
14 |
15 | lapply(order, source)
16 |
17 | run <- function() {
18 | rmarkdown::run('./block-discovery-sim.Rmd')
19 | }
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/R/node.R:
--------------------------------------------------------------------------------
1 | Node <- R6Class(
2 | 'Node',
3 | public = list(
4 | node_id = NULL,
5 | storage = NULL,
6 |
7 | initialize = function(node_id, storage) {
8 | self$node_id = node_id
9 | self$storage = storage
10 | },
11 |
12 | name = function() paste0('node ', self$node_id)
13 | )
14 | )
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/R/partition.R:
--------------------------------------------------------------------------------
1 | #' Generates a random partition of a block array among a set of nodes. The
2 | #' partitioning follows the supplied distribution.
3 | #'
4 | #' @param block_array a vector containing blocks
5 | #' @param network_size the number of nodes in the network
6 | #' @param distribution a sample generator which generates a vector of n
7 | #' samples when called as distribution(n).
8 | #'
9 | partition <- function(block_array, network_size, distribution) {
10 | buckets <- distribution(length(block_array))
11 |
12 | # We won't attempt to shift the data, instead just checking that it is
13 | # positive.
14 | stopifnot(all(buckets >= 0))
15 |
16 | buckets <- trunc(buckets * (network_size - 1) / max(buckets)) + 1 # scale samples into bucket indices in 1..network_size
17 | sapply(1:network_size, function(i) which(buckets == i))
18 | }
19 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/R/sim.R:
--------------------------------------------------------------------------------
1 | run_download_simulation <- function(swarm, max_steps, coding_rate) {
2 | total_blocks <- sum(sapply(swarm, function(node) length(node$storage)))
3 | required_blocks <- round(total_blocks * coding_rate)
4 | completed_blocks <- 0
5 | storage <- c()
6 |
7 | step <- 1
8 | stats <- Stats$new()
9 | while ((step < max_steps) && (completed_blocks < required_blocks)){
10 | neighbor <- swarm |> select_neighbor()
11 | storage <- neighbor |> download_blocks(storage)
12 |
13 | completed_blocks <- length(storage)
14 | stats$add_stat(
15 | step = step,
16 | selected_neighbor = neighbor$node_id,
17 | total_blocks = total_blocks,
18 | required_blocks = required_blocks,
19 | completed_blocks = completed_blocks
20 | )
21 |
22 | step <- step + 1
23 | }
24 |
25 | stats$as_tibble()
26 | }
27 |
28 | select_neighbor <- function(neighborhood) neighborhood[[sample(1:length(neighborhood), size = 1)]]
29 |
30 | download_blocks <- function(neighbor, storage) unique(c(neighbor$storage, storage))
31 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/R/stats.R:
--------------------------------------------------------------------------------
1 | Stats <- R6Class(
2 | 'Stats',
3 | public = list(
4 | stats = NULL,
5 |
6 | initialize = function() {
7 | self$stats = list(list())
8 | },
9 |
10 | add_stat = function(...) {
11 | self$stats <- c(self$stats, list(rlang::dots_list(...)))
12 | self
13 | },
14 |
15 | as_tibble = function() purrr::map_df(self$stats, as_tibble)
16 | )
17 | )
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/README.md:
--------------------------------------------------------------------------------
1 | Simple Block Discovery Simulator
2 | ================================
3 |
4 | A simple simulator for understanding block discovery dynamics.
5 |
6 | ## Hosted Version
7 |
8 | You can access the block discovery simulator on [shinyapps](https://gmega.shinyapps.io/block-discovery-sim/).
9 |
10 | ## Running
11 |
12 | You will need R 4.1.2 with [renv](https://rstudio.github.io/renv/) installed. I also strongly recommend you run this
13 | from [RStudio](https://posit.co/products/open-source/rstudio/) as you will otherwise need to [install pandoc and set it up manually before running](https://stackoverflow.com/questions/28432607/pandoc-version-1-12-3-or-higher-is-required-and-was-not-found-r-shiny).
14 |
15 | Once that's taken care of and you are in the R terminal (Console in RStudio), you will need to first install the dependencies:
16 |
17 | ```R
18 | > renv::install()
19 | ```
20 |
21 | If you are outside RStudio, then you will need to restart your R session. After that, you should load the package:
22 |
23 | ```R
24 | devtools::load_all()
25 | ```
26 |
27 | run the tests:
28 |
29 | ```R
30 | testthat::test_package('blockdiscoverysim')
31 | ```
32 |
33 | and, if all goes well, launch the simulator:
34 |
35 | ```R
36 | run()
37 | ```
38 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/block-discovery-sim.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Block Discovery Sim"
3 | output: html_document
4 | runtime: shiny
5 |
6 | # rsconnect uses this
7 | resource_files:
8 | - R/node.R
9 | - R/partition.R
10 | - R/sim.R
11 | - R/stats.R
12 | ---
13 |
14 | ## Goal
15 |
16 | The goal of this experiment is to understand -- under different assumptions about how blocks are partitioned among nodes -- how long a hypothetical downloader would take to discover enough blocks for a successful download from storage nodes by randomly sampling the swarm. We therefore do not account for download times or network latency; we just measure how many times the node samples the swarm before figuring out where enough of the blocks are.
17 |
18 | ```{r echo = FALSE, message = FALSE}
19 | library(shiny)
20 | library(plotly)
21 |
22 | source('R/collate.R')
23 |
24 | knitr::opts_chunk$set(echo = FALSE, message = FALSE)
25 | ```
26 |
27 | ```{r}
28 | runs <- 10
29 | max_steps <- Inf
30 | ```
31 |
32 | ```{r}
33 | DISTRIBUTIONS <- list(
34 | 'uniform' = runif,
35 | 'exponential' = rexp,
36 | 'pareto' = VGAM::rparetoI
37 | )
38 | ```
39 |
40 |
41 | ## Network
42 |
43 | * Select the parameters of the network you would like to use in the experiments.
44 | * Preview the shape of the partitions by looking at the chart.
45 | * Generate more random partitions by clicking "Generate Another".
46 |
47 | ```{r}
48 | fluidPage(
49 | sidebarPanel(
50 | numericInput(
51 | 'swarm_size',
52 | label = 'size of the swarm',
53 | value = 20,
54 | min = 1,
55 | max = 10000
56 | ),
57 | numericInput(
58 | 'file_size',
59 | label = 'number of blocks in the file',
60 | value = 1000,
61 | min = 1,
62 | max = 1e6
63 | ),
64 | selectInput(
65 | 'partition_distribution',
66 | label = 'shape of the distribution for the partitions',
67 | choices = names(DISTRIBUTIONS)
68 | ),
69 | actionButton(
70 | 'generate_network',
71 | label = 'Generate Another'
72 | )
73 | ),
74 | mainPanel(
75 | plotOutput('network_sample')
76 | )
77 | )
78 | ```
79 |
80 | ```{r}
81 | observe({
82 | input$generate_network
83 | output$network_sample <- renderPlot({
84 | purrr::map_dfr(
85 | generate_network(
86 | number_of_blocks = input$file_size,
87 | network_size = input$swarm_size,
88 | partition_distribution = input$partition_distribution
89 | ),
90 | function(node) tibble(node_id = node$node_id, blocks = length(node$storage))
91 | ) %>%
92 | ggplot() +
93 | geom_bar(
94 | aes(x = node_id, y = blocks),
95 | stat = 'identity',
96 | col = 'black',
97 | fill = 'lightgray'
98 | ) +
99 | labs(x = 'node') +
100 | theme_minimal()
101 |   })
102 | })
103 | ```
104 |
105 | ## Experiment
106 |
107 | Select the number of experiment runs. Each experiment will generate a network and then simulate a download operation where a hypothetical node:
108 |
109 | 1. joins the swarm;
110 | 2. samples one neighbor per round in a round-based download protocol and asks for its block list.
111 |
112 | The experiment ends when the downloading node recovers "enough" blocks. If we let the total number of blocks in the file be $n$ and the coding rate $r$, then the simulation ends when the set of blocks $D$ discovered by the downloading node satisfies $\left|D\right| \geq n\times r$.
113 |
114 | We then show a "discovery curve": a curve that emerges as we look at the percentage of blocks the downloader has discovered so far as a function of the number of contacts it made.
115 |
116 | The curve is actually an average over all experiments, meaning that a point $(5, 10\%)$ should be interpreted as: "on average, after $5$ contacts, a downloader will have discovered $10\%$ of the blocks it needs for a successful download". We show the $5^{th}$ and $95^{th}$ percentiles of the experiments as error bands around the average.
117 |
118 | ```{r}
119 | fluidPage(
120 | fluidRow(
121 | class='well',
122 | column(
123 | width = 6,
124 | sliderInput('runs', 'How many experiments to run', min = 10, max = 10000, value = 10),
125 | actionButton('do_run', 'Run')
126 | ),
127 | column(
128 | width = 6,
129 | numericInput('coding_rate', 'Coding rate (percentage of blocks required for a successful download)',
130 | min = 0.1, max = 1.0, step = 0.05, value = 0.5)
131 | )
132 | )
133 | )
134 | ```
135 |
136 | ```{r}
137 | experiment_results <- reactive({
138 | lapply(1:input$runs, function(i) {
139 | generate_network(
140 | number_of_blocks = input$file_size,
141 | network_size = input$swarm_size,
142 | partition_distribution = input$partition_distribution
143 | ) |> run_experiment(run_id = i, coding_rate = input$coding_rate)
144 | })
145 | }) |> bindEvent(
146 | input$do_run,
147 | ignoreNULL = TRUE,
148 | ignoreInit = TRUE
149 | )
150 | ```
151 |
152 | ```{r}
153 | renderPlotly({
154 | plot_results(do.call(rbind, experiment_results()))
155 | })
156 | ```
157 |
158 | ```{r}
159 | generate_network <- function(number_of_blocks, network_size, partition_distribution) {
160 | block_array <- sample(1:number_of_blocks, replace = FALSE)
161 |
162 | partitions <- partition(block_array, network_size, DISTRIBUTIONS[[partition_distribution]])
163 | sapply(1:network_size, function(i) Node$new(
164 | node_id = i,
165 | storage = partitions[[i]])
166 | )
167 | }
168 | ```
169 |
170 | ```{r}
171 | run_experiment <- function(network, coding_rate, run_id = 0) {
172 | run_download_simulation(
173 | swarm = network,
174 | coding_rate = coding_rate,
175 | max_steps = max_steps
176 | ) |> mutate(
177 | run = run_id
178 | )
179 | }
180 | ```
181 |
182 | ```{r}
183 | plot_results <- function(results) {
184 | stats <- results |>
185 | mutate(completion = pmin(1.0, completed_blocks / required_blocks)) |>
186 | group_by(step) |>
187 | summarise(
188 | average = mean(completion),
189 | p_95 = quantile(completion, 0.95),
190 | p_05 = quantile(completion, 0.05),
191 | .groups = 'drop'
192 | )
193 |
194 | plotly::ggplotly(ggplot(stats, aes(x = step)) +
195 | geom_line(aes(y = average), col = 'black', lwd = 1) +
196 | geom_ribbon(aes(ymin = p_05, ymax = p_95), fill = 'grey80', alpha = 0.5) +
197 | labs(x = 'contacts', y = 'blocks discovered (%)') +
198 | scale_y_continuous(labels = scales::percent_format()) +
199 | theme_minimal())
200 | }
201 | ```
202 |
203 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/block-discovery-sim.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | BuildType: Package
16 | PackageUseDevtools: Yes
17 | PackageInstallArgs: --no-multiarch --with-keep.source
18 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/renv/.gitignore:
--------------------------------------------------------------------------------
1 | library/
2 | local/
3 | cellar/
4 | lock/
5 | python/
6 | sandbox/
7 | staging/
8 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/renv/settings.json:
--------------------------------------------------------------------------------
1 | {
2 | "bioconductor.version": null,
3 | "external.libraries": [],
4 | "ignored.packages": [],
5 | "package.dependency.fields": [
6 | "Imports",
7 | "Depends",
8 | "LinkingTo"
9 | ],
10 | "r.version": null,
11 | "snapshot.type": "implicit",
12 | "use.cache": true,
13 | "vcs.ignore.cellar": true,
14 | "vcs.ignore.library": true,
15 | "vcs.ignore.local": true,
16 | "vcs.manage.ignores": true
17 | }
18 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/tests/testthat.R:
--------------------------------------------------------------------------------
1 | # This file is part of the standard setup for testthat.
2 | # It is recommended that you do not modify it.
3 | #
4 | # Where should you do additional test configuration?
5 | # Learn more about the roles of various files in:
6 | # * https://r-pkgs.org/tests.html
7 | # * https://testthat.r-lib.org/reference/test_package.html#special-files
8 |
9 | library(testthat)
10 |
11 | test_check("blockdiscoverysim")
12 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/tests/testthat/test-partition.R:
--------------------------------------------------------------------------------
1 | test_that(
2 | "should partition into linearly scaled buckets", {
3 | samples <- c(1, 100, 500, 800, 850)
4 |
5 | partitions <- partition(
6 | block_array = 1:5,
7 | network_size = 4,
8 | distribution = function(n) samples[1:n]
9 | )
10 |
11 | expect_equal(partitions, list(
12 | c(1, 2),
13 | c(3),
14 | c(4),
15 | c(5))
16 | )
17 | }
18 | )
19 |
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/tests/testthat/test-stats.R:
--------------------------------------------------------------------------------
1 | test_that(
2 | "should collect stats as they are input", {
3 | stats <- Stats$new()
4 |
5 | stats$add_stat(a = 1, b = 2, name = 'hello')
6 | stats$add_stat(a = 1, b = 3, name = 'world')
7 |
8 | expect_equal(
9 | stats$as_tibble(),
10 | tribble(
11 | ~a, ~b, ~name,
12 | 1, 2, 'hello',
13 | 1, 3, 'world',
14 | )
15 | )
16 | }
17 | )
18 |
--------------------------------------------------------------------------------
/analysis/block-discovery.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Block Discovery Problem"
3 | output:
4 | bookdown::gitbook:
5 | number_sections: false
6 | ---
7 |
8 | $$
9 | \newcommand{\rv}[1]{\textbf{#1}}
10 | \newcommand{\imin}{\rv{I}_{\text{min}}}
11 | $$
12 |
13 | ## Problem Statement
14 |
15 | Let $F = \left\{b_1, \cdots, b_m\right\}$ be an erasure-coded file, and let $O = \left\{o_1, \cdots, o_n\right\}$ be a set of nodes storing that file. We define a _storage function_ $s : O \longrightarrow 2^F$ as a function mapping each node in $O$ to the subset of $F$ it stores.
16 |
17 | In the simplified block discovery problem, we have a _downloader node_ which is attempting to construct a subset $D \subseteq F$ of blocks by repeatedly sampling nodes from $O$. "Discovery", in this context, can be seen as the downloader node running a round-based protocol where, at round $i$, it samples a random contact $o_i$ and learns about $s(o_i)$.
18 |
19 | To make this slightly more formal, we denote by $D_i \subseteq F$ the set of blocks that the downloader has learned about after the $i^{th}$ contact. By the way the protocol works, we have that:
20 |
21 | $$
22 | \begin{equation}
23 | D_i = D_{i - 1} \cup s(o_i)
24 | (\#eq:discovery)
25 | \end{equation}
26 | $$
27 |
28 | Since the file is erasure coded, the goal of the downloader is to learn some $D_i$ such that, for a coding-rate parameter $c \in (0, 1]$:
29 |
30 | $$
31 | \begin{equation}
32 | \left|D_i\right| \geq c \times \left|F\right|
33 | (\#eq:complete)
34 | \end{equation}
35 | $$
36 |
37 | When $D_i$ satisfies Eq. \@ref(eq:complete), we say that $D_i$ is _complete_. We can then state the problem as follows.
38 |
39 | **Statement.** Let $\imin$ be a random variable representing the first round at which $D_i$ is complete. We want to estimate $F(i) = \mathbb{P}(\imin \leq i)$; namely, the probability that the downloader has discovered enough blocks by round $i$.
40 |
41 | ## Case (1) - Erasure Coding but no Replication
42 |
43 | If we assume there is no replication then, unless we contact the same node twice, every node we contact contributes new information. Indeed, the absence of replication implies, necessarily, that the partitions are pairwise disjoint:
44 |
45 | $$
46 | \begin{equation}
47 | s(o_i) \cap s(o_j) = \emptyset \qquad \text{for all } i \neq j
48 | (\#eq:disjoint)
49 | \end{equation}
50 | $$
51 |
52 | Thus, if we are contacting a new node at round $i$, we must necessarily have that:
53 |
54 | $$
55 | \begin{equation}
56 | \left|D_{i}\right| \stackrel{1}{=} \left|D_{i - 1} \cup s(o_i)\right| \stackrel{2}{=} \left|D_{i - 1}\right| + \left|s(o_i)\right|
57 | (\#eq:monotonic)
58 | \end{equation}
59 | $$
60 | where (1) follows from Eq. \@ref(eq:discovery), and (2) follows from the $s(o_i)$ being pairwise disjoint (Eq. \@ref(eq:disjoint)). This leads to the corollary:
61 |
62 | **Corollary 1.** If nodes are contacted without repetition and blocks are evenly partitioned (each node stores $m/n$ blocks), then after $\lceil c \times n\rceil$ rounds the downloader will necessarily have learned enough blocks to download $F$.
63 |
64 | which follows from Eq. \@ref(eq:monotonic): each new contact adds $m/n$ blocks, so that $\left|D_{\lceil c \times n\rceil}\right| \geq c \times m$, i.e., $D_{\lceil c \times n\rceil}$ must be complete. $\blacksquare$
65 |
66 | As for $F(i)$, note that we can estimate the probability of completion by estimating the probability that $|D_i|$ exceeds the completion threshold (Eq. \@ref(eq:complete)). What exactly that looks like, and how tractable it is, however, depends on the formulation we give it.
67 |
68 | ### Independent Partition Sizes
69 |
70 | Suppose we knew the distribution for partition sizes in $O$, i.e., we knew that the number of blocks assigned to a node in $O$ follows some distribution $\mathcal{B}$ (e.g., a truncated Gaussian).
71 |
72 | If we have a "large enough" network, this means we would be able to approximate the number of blocks assigned to each node as $n$ independent random variables $\rv{Y}_i$, where $\rv{Y}_i \sim \mathcal{B}$. In that case, we would be able to express the total number of blocks learned by the downloader by round $i$ as a random variable $\rv{L}_i$ which is the sum of the iid random variables $\rv{Y}_j \sim \mathcal{B}$:
73 |
74 | $$
75 | \begin{equation}
76 | \rv{L}_i \sim \sum_{j = 1}^{i} \rv{Y}_j
77 | (\#eq:learning-sum)
78 | \end{equation}
79 | $$
80 |
81 | The shape of the distribution would be the $i$-fold convolution of $\mathcal{B}$ with itself, which can be tractable for some distributions.
82 |
83 | More interestingly, though, Eq. \@ref(eq:learning-sum) allows us to express a $\mathcal{B}$-independent estimate of the average number of rounds a downloader will undergo before completing a download. We have that:
84 |
85 | $$
86 | \mathbb{E}(\rv{L}_i) = \sum_{j = 1}^i \mathbb{E}(\rv{Y}_j) = i\,\mathbb{E}(\rv{Y}) = i\times \mu_{\rv{Y}}
87 | $$
88 |
89 | We can then solve for $i$ and the completion condition to get:
90 |
91 | $$
92 | \begin{equation}
93 | i \times \mu_{\rv{Y}} \geq c \times |F| \iff i \geq \frac{c \times |F|}{\mu_{\rv{Y}}}
94 | (\#eq:average-completion)
95 | \end{equation}
96 | $$
97 |
98 | Note that this is intuitive to the point of being trivial. If we let $c = 1$, we get $i \geq |F|/\mu_{\rv{Y}}$, which just means that, on average, the node will have to sample a number of nodes equal to the number of blocks divided by the average partition size. In practice we can use the sample mean $\overline{\mu_\rv{Y}} = \frac{1}{n}\sum_{i = 1}^{n} \left|s(o_i)\right|$ instead of $\mu_{\rv{Y}}$ to estimate what $i$ can look like.
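
For a quick sanity check of Eq. \@ref(eq:average-completion), here is a minimal R sketch (the partition sizes are hypothetical, and disjoint partitions are assumed so that $|F|$ is just the sum of the sizes):

```r
# Estimate the average number of contacts needed for completion, using the
# sample mean of the partition sizes as an estimate of mu_Y.
expected_rounds <- function(sizes, coding_rate) {
  ceiling(coding_rate * sum(sizes) / mean(sizes))
}

expected_rounds(sizes = c(10, 200, 50, 120), coding_rate = 0.5)
```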
99 |
100 | ### Non-Independent Partition Sizes
101 |
102 | If we cannot approximate partition sizes as independent random variables, then the problem changes. Stripping it down, we can cast it as follows. We have a set of integers $P = \{p_1, \cdots, p_n\}$ representing the sizes of each partition. We then want to understand the distribution of the partial sums over random permutations of $P$.
103 |
104 | As I understand it, there is no good way of addressing this without running simulations. The difference is that if we assume disjoint partitions then the simulations are a lot simpler as we do not need to track the contents of $D_i$.
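
Such a simulation is straightforward, though. A minimal sketch (again with hypothetical partition sizes) that estimates $F(i)$ empirically:

```r
# First round at which the partial sum of a random permutation of p
# reaches the completion threshold c * |F|.
first_complete_round <- function(p, c) {
  which(cumsum(sample(p)) >= c * sum(p))[1]
}

rounds <- replicate(10000, first_complete_round(p = c(10, 200, 50, 120), c = 0.5))
F_hat <- ecdf(rounds)  # empirical estimate of F(i) = P(I_min <= i)
```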
--------------------------------------------------------------------------------
/design/Merkle.md:
--------------------------------------------------------------------------------
1 |
2 | Merkle tree API proposal (WIP draft)
3 | ------------------------------------
4 |
5 | Let's collect the possible problems and solutions with constructing Merkle trees.
6 |
7 | See [section "Final proposal"](#Final-proposal) at the bottom for the concrete
8 | version we decided to implement.
9 |
10 | ### Vocabulary
11 |
12 | A Merkle tree, built on a hash function `H`, produces a Merkle root of type `T`.
13 | This is usually the same type as the output of the hash function. Some examples:
14 |
15 | - SHA1: `T` is 160 bits
16 | - SHA256: `T` is 256 bits
17 | - Poseidon: `T` is one (or a few) finite field element(s)
18 |
19 | The hash function `H` can also have different types `S` of inputs. For example:
20 |
21 | - SHA1 / SHA256 / SHA3: `S` is an arbitrary sequence of bits
22 | - some less-conforming implementation of these could take a sequence of bytes instead
23 | - Poseidon: `S` is a sequence of finite field elements
24 | - Poseidon compression function: at most `t-1` field elements (in our case `t=3`, so
25 | that's two field elements)
26 | - A naive Merkle tree implementation could for example accept only a power-of-two
27 | sized sequence of `T`
28 |
29 | Notation: Let's denote a sequence of `T`-s by `[T]`.
30 |
31 | ### Merkle tree API
32 |
33 | We usually need at least two types of Merkle tree APIs:
34 |
35 | - one which takes a sequence `S = [T]` of length `n` as input, and produces an
36 | output (Merkle root) of type `T`
37 | - and one which takes a sequence of bytes (or even bits, but in practice we probably
38 | only need bytes): `S = [byte]`
39 |
40 | We can decompose the latter into the composition of a function
41 | `deserialize : [byte] -> [T]` and the former.
42 |
43 | ### Naive Merkle tree implementation
44 |
45 | A straightforward implementation of a binary Merkle tree `merkleRoot : [T] -> T`
46 | could be for example:
47 |
48 | - if the input has length 1, it's the root
49 | - if the input has even length `2*k`, group it into pairs, apply a
50 | `compress : (T,T) -> T` compression function, producing the next layer of size `k`
51 | - if the input has odd length `2*k+1`, pad it with an extra element `dummy` of
52 | type `T`, then apply the procedure for even length, producing the next layer of size `k+1`
53 |
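As a reference, a minimal R sketch of this naive construction (assuming a `compress` function and a `dummy` element as described above):

    # Naive recursive Merkle root over a list of hashes.
    merkle_root <- function(xs, compress, dummy) {
      if (length(xs) == 1) return(xs[[1]])
      if (length(xs) %% 2 == 1) xs <- c(xs, list(dummy))  # pad odd layers
      next_layer <- lapply(seq(1, length(xs), by = 2),
                           function(i) compress(xs[[i]], xs[[i + 1]]))
      merkle_root(next_layer, compress, dummy)
    }
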
54 | The compression function could be implemented in several ways:
55 |
56 | - when `S` and `T` are just sequences of bits or bytes (as in the case of classical hash
57 | functions like SHA256), we can just concatenate the two leaves of the node and apply the
58 | hash: `compress(x,y) := H(x|y)`
59 | - in case of hash functions based on the sponge construction (like Poseidon or Keccak/SHA3),
60 | we can just fill the "capacity part" of the state with a constant (say 0), the "absorbing
61 | part" of the state with the two inputs, apply the permutation, and extract a single `T`
62 |
63 | ### Attacks
64 |
65 | When implemented without enough care (like the above naive algorithm), there are several
66 | possible attacks producing hash collisions or second preimages:
67 |
68 | 1. The root of any particular layer is the same as the root of the input
69 | 2. The root of `[x_0,x_1,...,x_(2*k)]` (length `n=2*k+1`) is the same as the root of
70 |    `[x_0,x_1,...,x_(2*k),dummy]` (length `n=2*k+2`)
71 | 3. When using bytes as the input, `deserialize` itself can have similar collision attacks
72 | 4. The root of a singleton sequence is itself
73 |
74 | Traditional (linear) hash functions usually solve the analogous problems by clever padding.
75 |
76 | ### Domain separation
77 |
78 | It's a good practice in general to ensure that different constructions using the same
79 | underlying hash function will never (or at least with a very high probability not) produce the same output.
80 | This is called "domain separation", and it can very loosely remind one of _multihash_; however
81 | instead of adding extra bits of information to a hash (and thus increasing its size), we just
82 | compress the extra information into the hash itself. So the information itself is lost,
83 | however collisions between different domains are prevented.
84 |
85 | A simple example would be using `H(dom|H(...))` instead of `H(...)`. The below solutions
86 | can be interpreted as an application of this idea, where we want to separate the different
87 | lengths `n`.
88 |
89 | ### Possible solutions (for the tree attacks)
90 |
91 | While the third problem (`deserialize` may not be injective) is similar to the second problem,
92 | let's deal first with the tree problems, and come back to `deserialize` later (see below).
93 |
94 | **Solution 0.** Pre-hash each input element. This solves 1), 2) and also 4) (at least
95 | if we choose `dummy` to be something for which we don't expect anybody to find a preimage), but
96 | it doubles the computation time.
97 |
98 | **Solution 1.** Just prepend the data with the length `n` of the input sequence. Note that any
99 | cryptographic hash function needs an output size of at least 160 bits (and usually at least
100 | 256 bits), so we can always embed the length (surely less than `2^64`) into `T`. This solves
101 | both problems 1) and 2) (the height of the tree is a deterministic function of the length),
102 | and 4) too.
103 | However, a typical application of a Merkle tree is the case where the length of the input
104 | `n=2^d` is a power of two; in this case it looks a little bit "inelegant" to increase the size
105 | to `n=2^d+1`, though the overhead with the above even-odd construction is only `log2(n)`.
106 | An advantage is that you can _prove_ the size of the input with a standard Merkle inclusion proof.
107 |
108 | Alternative version: append the length, instead of prepending; then the indexing of the leaves does not change.
109 |
110 | **Solution 2.** Apply an extra compression step at the very end including the length `n`,
111 | calculating `newRoot = compress(n,origRoot)`. This again solves all 3 problems. However, it
112 | makes the code a bit less regular; and you have to submit the length as part of Merkle proofs
113 | (but it seems hard to avoid that anyway).
114 |
115 | **Solution 3a.** Use two different compression functions, one for the bottom layer (by bottom
116 | I mean the one next to the input, which is the same as the widest one) and another for all
117 | the other layers. For example you can use `compress(x,y) := H(isBottomLayer|x|y)`.
118 | This solves problem 1).
119 |
120 | **Solution 3b.** Use two different compression functions, one for the even nodes, and another
121 | for the odd nodes (that is, those with a single child instead of two). Similarly to the
122 | previous case, you can use for example `compress(x,y) := H(isOddNode|x|y)` (note that for
123 | the odd nodes, we will have `y=dummy`). This solves problem 2). Remark: The extra bits of
124 | information (odd/even) added to the last nodes (one in each layer) are exactly the binary
125 | expansion of the length `n`. A disadvantage is that for verifying a Merkle proof, we need to
126 | know for each node whether it's the last or not, so we need to include the length `n` into
127 | any Merkle proof here too.
128 |
129 | **Solution 3.** Combining **3a** and **3b**, we can solve both problems 1) and 2); so here we add
130 | two bits of information to each node (that is, we need 4 different compression functions).
131 | 4) can always be solved by adding a final compression call.
132 |
133 | **Solution 4a.** Replace each input element `x_i` with `compress(i,x_i)`. This solves
134 | both problems again (and 4) too), but doubles the amount of computation.
135 |
136 | **Solution 4b.** Only in the bottom layer, use `H(1|isOddNode|i|x_{2i}|x_{2i+1})` for
137 | compression (note that for the odd node we have `x_{2i+1}=dummy`). This is similar to
138 | the previous solution, but does not increase the amount of computation.
139 |
140 | **Solution 4c.** Only in the bottom layer, use `H(i|j|x_i|x_j)` for even nodes
141 | (with `i=2*k` and `j=2*k+1`), and `H(i|0|x_i|0)` for the odd node (or alternatively
142 | we could also use `H(i|i|x_i|x_i)` for the odd node). Note: when verifying
143 | a Merkle proof, you still need to know whether the element you prove is the last _and_
144 | odd element, or not. However instead of submitting the length, you can encode this
145 | into a single bit (not sure if that's much better though).
146 |
147 | **Solution 5.** Use a different tree shape, where the left subtree is always a complete
148 | (full) binary tree with `2^floor(log2(n-1))` leaves, and the right subtree is
149 | constructed recursively. Then the shape of tree encodes the number of inputs `n`.
150 | Blake3 hash uses such a strategy internally. This however complicates the Merkle proofs
151 | (they won't have uniform size anymore).
152 | TODO: think more about this!
153 |
154 | ### Keyed compression functions
155 |
156 | How can we have many different compression functions? Consider three case studies:
157 |
158 | **Poseidon.** The Poseidon family of hashes is built on a (fixed) permutation
159 | `perm : F^t -> F^t`, where `F` is a (large) finite field. For simplicity consider the case `t=3`.
160 | The standard compression function is then defined as:
161 |
162 | compress(x,y) := let (u,_,_) = perm(x,y,0) in u
163 |
164 | That is, we take the triple `(x,y,0)`, apply the permutation to get another triple `(u,v,w)`, and
165 | extract the field element `u` (we could use `v` or `w` too, it shouldn't matter).
166 | Now we can see that it is in fact very easy to generalize this to a _keyed_ (or _indexed_)
167 | compression function:
168 |
169 | compress_k(x,y) := let (u,_,_) = perm(x,y,k) in u
170 |
171 | where `k` is the key. Note that there is no overhead in doing this. And since `F` is pretty
172 | big (in our case, about 253 bits), there is plenty of information we can encode in the key `k`.
173 |
174 | Note: We probably lose a few bits of security here, if somebody looks for a preimage among
175 | _all_ keys; however in our constructions the keys have a fixed structure, so it's probably
176 | not that dangerous. If we want to be extra safe, we could use `t=4` and `perm(x,y,k,0)`
177 | instead (but that has some computation overhead).
178 |
179 | **SHA256.** When using SHA256 as our hash function, normally the compression function is
180 | defined as `compress(x,y) := SHA256(x|y)`, that is, concatenate the (bitstring representation of the)
181 | two elements, and apply SHA256 to the resulting (bit)string. Normally `x` and `y` are both
182 | 256 bits long, and so is the result. If we look into the details of how SHA256 is specified,
183 | this is actually wasteful. That's because while SHA256 processes the input in 512 bit chunks,
184 | it also prescribes a mandatory nonempty padding. So when calling SHA256 on an input of size
185 | 512 bit (64 bytes), it will actually process two chunks, the second chunk consisting purely
186 | of padding. When constructing a binary Merkle tree using a compression function like before,
187 | the input is always of the same size, so this padding is unnecessary; nevertheless, people
188 | usually prefer to follow the standardized SHA256 call. But, if we are processing 1024 bits
189 | anyway, we have a lot of free space to include our key `k`! In fact we can add up to
190 | `512-64-1=447` bits of additional information; so for example
191 |
192 | compress_k(x,y) := SHA256(k|x|y)
193 |
194 | works perfectly well with no overhead compared to `SHA256(x|y)`.
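
For illustration, a hypothetical R sketch of such a keyed compression (using the `digest` package; the exact byte layout of `k|x|y` is an assumption, not a specification):

    library(digest)

    # k, x and y are raw vectors: k is a fixed-width key encoding,
    # x and y are 32-byte digests.
    compress_k <- function(k, x, y) {
      digest(c(k, x, y), algo = "sha256", serialize = FALSE, raw = TRUE)
    }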
195 |
196 | **MiMC.** MiMC is another arithmetic construction, however in this
197 | case the starting point is a _block cipher_, that is, we start with
198 | a keyed permutation! Unfortunately MiMC-p/p is a (keyed) permutation
199 | of `F`, which is not very useful for us; however in Feistel mode we
200 | get a keyed permutation of `F^2`, and we can just take the first
201 | component of the output of that as the compressed output.
202 |
203 | ### Making `deserialize` injective
204 |
205 | Consider the following simple algorithm to deserialize a sequence of bytes into chunks of
206 | 31 bytes:
207 |
208 | - pad the input with at most 30 zero bytes such that the padded length becomes divisible
209 |   by 31
210 | - split the padded sequence into `ceil(n/31)` chunks, each 31 bytes.
211 |
212 | The problem with this is that, for example, `0x123456`, `0x12345600` and `0x1234560000`
213 | all result in the same output.
214 |
215 | #### About padding in general
216 |
217 | Let's take a step back, and meditate a little bit about the meaning of padding.
218 |
219 | What is padding? It's a mapping from a set of sequences into a subset. In our case
220 | we have an arbitrary sequence of bytes, and we want to map into the subset of sequences
221 | whose length is divisible by 31.
222 |
223 | Why do we want padding? Because we want to apply an algorithm (in this case a hash function)
224 | to arbitrary sequences, but the algorithm can only handle a subset of all sequences.
225 | In our case we first map the arbitrary sequence of bytes into a sequence of bytes
226 | whose length is divisible by 31, and then map that into a sequence of finite field
227 | elements.
228 |
229 | What properties do we want from padding? Well, that depends on what properties we
230 | want from the resulting algorithm. In this case we do hashing, so we definitely want
231 | to avoid collisions. This means that our padding should never map two different input
232 | sequences into the same padded sequence (because that would create a trivial collision).
233 | In mathematics, we call such functions "injective".
234 |
235 | How do you prove that a function is injective? You provide an inverse function,
236 | which takes a padded sequence and outputs the original one.
237 |
238 | In summary we need to come up with an injective padding strategy for arbitrary byte
239 | sequences, which always results in a byte sequence whose length is divisible by 31.
240 |
241 | #### Some possible solutions:
242 |
243 | - prepend the length (number of input bytes) to the input, say as a 64-bit little-endian integer (8 bytes),
244 | before padding as above
245 | - or append the length instead of prepending, then pad (note: appending is streaming-friendly; prepending is not)
246 | - or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append
247 |   the length, the result will be divisible by 31). This is _almost_ exactly what SHA2 does.
248 | - use the following padding strategy: _always_ add a single `0x01` byte, then enough `0x00` bytes (possibly none)
249 | so that the length is divisible by 31. This is usually called the `10*` padding strategy, abusing regexp notation.
250 | Why does this work? Well, consider an already padded sequence. It's very easy to recover the
251 | original byte sequence by 1) first removing all trailing zeros; and 2) after that, remove the single
252 | trailing `0x01` byte. This proves that the padding is an injective function.
253 | - one can easily come up with many similar padding strategies. For example SHA3/Keccak uses `10*1`
254 | (but on bits, not bytes), and SHA2 uses a combination of `10*` and appending the bit length of the
255 | original input.
256 |
257 | Remark: Any safe padding strategy will result in at least one extra field element
258 | if the input length was already divisible by 31. This is both unavoidable in general,
259 | and not an issue in practice (as the size of the input grows, the overhead becomes
260 | negligible). The same thing happens when you SHA256 hash an integer multiple of 64 bytes.
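
To make the `10*` strategy concrete, here is a minimal R sketch over byte (raw) vectors, together with the inverse function that proves injectivity:

    # `10*` padding to a multiple of 31 bytes: append a single 0x01 byte, then
    # zero bytes (possibly none) until the length is divisible by 31.
    pad10star <- function(bytes, chunk = 31) {
      padded <- c(bytes, as.raw(0x01))
      rem <- length(padded) %% chunk
      if (rem != 0) padded <- c(padded, rep(as.raw(0x00), chunk - rem))
      padded
    }

    # Inverse: strip trailing zeros, then the single 0x01 marker.
    unpad10star <- function(padded) {
      marker <- max(which(padded != as.raw(0x00)))
      padded[seq_len(marker - 1)]
    }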
261 |
262 |
263 | ### Final proposal
264 |
265 | We decided to implement the following version.
266 |
267 | - pad byte sequences (to have length divisible by 31) with the `10*` padding strategy; that is,
268 | always append a single `0x01` byte, and after that add a number of zero bytes (between 0 and 30),
269 |   so that the resulting sequence has length divisible by 31
270 | - when converting an (already padded) byte sequence to a sequence of field elements,
271 | split it up into 31 byte chunks, interpret those as little-endian 248-bit unsigned
272 | integers, and finally interpret those integers as field elements in the BN254 scalar
273 | prime field (using the standard mapping `Z -> Z/r`).
274 | - when using the Poseidon2 sponge construction to compute a linear hash out of
275 | a sequence of field elements, we use the BN254 scalar field, `t=3` and `(0,0,domsep)`
276 | as the initial state, where `domsep := 2^64 + 256*t + rate` is the domain separation
277 | IV. Note that because `t=3`, we can only have `rate=1` or `rate=2`. We need
278 | a padding strategy here too (since the input length must be divisible by `rate`):
279 | we use `10*` again, but here on field elements.
280 | Remark: For `rate=1` this makes things always a tiny bit slower, but we plan to use
281 | `rate=2` anyway (as it's twice as fast), and it's better not to have exceptional cases.
282 | - when using Poseidon2 to build a binary Merkle tree, we use "solution #3" from above.
283 | That is, we use a keyed compression function, with the key being one of `{0,1,2,3}`
284 | (two bits). The lowest bit is 1 in the bottom-most (that is, the widest) layer,
285 |   and 0 otherwise; the other bit is 1 if it's both the last element of the layer
286 |   _and_ the layer has an odd number of elements; 0 otherwise. In odd layers, we also add an extra 0 field
287 | element to make it even. This is also valid for the singleton input: in that case
288 | it's both odd and the bottommost, so the root of a singleton input `[x]` will
289 | be `H_{key=3}(x|0)`
290 | - we will use the same strategy when constructing binary Merkle trees with the
291 | SHA256 hash; in that case, the compression function will be `SHA256(x|y|key)`.
292 | Note: since SHA256 already uses padding internally, adding the key does not
293 | result in any overhead.
294 |
--------------------------------------------------------------------------------
/design/marketplace.md:
--------------------------------------------------------------------------------
1 | A marketplace for storage durability
2 | ====================================
3 |
4 | We present a new design for a storage marketplace that is both simpler and
5 | includes incentives for repair.
6 |
7 | Context
8 | -------
9 |
10 | Our current storage marketplace is designed around the notion of sending out
11 | requests for storage, waiting for hosts to offer storage, and then choosing a
12 | selection from these hosts to start a storage contract with. It requires
13 | separate contracts for each of these hosts, active participation of the client
14 | during the negotiation phase, and does not yet have any provisions for repairing
15 | storage when hosts fail to deliver on their contracts.
16 |
17 | In this document we describe a new design that is simpler, requires less
18 | interactions, and has repair incentives built in.
19 |
20 | A new design
21 | ------------
22 |
23 | We propose to create a new type of storage contract, containing a number of slots.
24 | Each of these slots represents an agreement with a storage host to store a part
25 | of the content. When a client wants to store data on the network with durability
26 | guarantees, it posts a storage Request on the blockchain. Hosts that want to
27 | offer storage can fill a slot in the Request.
28 |
29 |
30 | --------
31 | ---- fill slot --- | Host |
32 | | --------
33 | |
34 | v
35 | --------------
36 | ---------- | | --------
37 | | Client | --- request ---> | Blockchain | <--- fill slot --- | Host |
38 | ---------- | | --------
39 | --------------
40 | ^
41 | |
42 | | --------
43 | ---- fill slot --- | Host |
44 | --------
45 |
46 |
47 | The Request contains the content identifier, so that hosts can locate
48 | and download the content. It also contains the reward that hosts receive for
49 | storing the data and the collateral that hosts are expected to deposit. It
50 | contains parameters pertaining to storage proofs and erasure coding. And
51 | finally, it contains the number of hosts that are expected to store the content,
52 | including a small number of host losses that can be tolerated.
53 |
54 |
55 | Request
56 |
57 | cid # content identifier
58 |
59 | reward # tokens paid per second per filled slot
60 | collateral # amount of collateral required per host and slot
61 |
62 | proof probability # frequency at which proofs are required
63 | proof parameters # proof of retrievability parameters
64 | erasure coding # erasure coding parameters
65 | dispersal # dispersal parameter
66 | repair reward # amount of tokens paid for repairs
67 |
68 | hosts # amount of storage hosts (including loss)
69 | loss # number of allowed host losses
70 |
71 | slots # assigned host slots
72 |
73 | expire # slots need to be filled before timeout
74 |
75 | Slots
76 | -----
77 |
78 | Initially all host slots are empty. An empty slot can be filled by anyone by
79 | submitting a correct storage proof together with collateral.
80 |
81 |
82 | proof & proof &
83 | collateral proof missed collateral missed
84 | | | | | |
85 | v v v v v
86 | -------------------------------------------------------------------
87 | slot: |///////////////////////| |////////////////////|
88 | -------------------------------------------------------------------
89 | | |
90 | v v
91 | collateral collateral
92 | lost lost
93 |
94 |
95 |
96 | ---------------- time ---------------->
97 |
98 |
99 | The time interval during which a slot is filled by a host determines the host's
100 | payout; for every second of the interval a certain amount of tokens is awarded
101 | to the host. Hosts that fill a slot are required to submit frequent proofs of storage.
102 |
103 | When a certain number of proofs is missed, the slot is considered empty again.
104 | The collateral associated with the slot is mostly burned. Some of it is used to
105 | pay a fee to the node that indicated that proofs were missing, and some of it is
106 | reserved for repairs. An empty slot can be filled again once another host
107 | submits a correct proof together with collateral. Payouts for the time interval
108 | that a slot is empty are burned.
109 |
110 | Payouts for all hosts are accumulated in the smart contract and paid out at Request
111 | end. This is to ensure that the incentive posed by the collateral is not
112 | diminished over time.
113 |
114 | Contract lifecycle
115 | ------------------
116 |
117 | A Request starts when all slots are filled. Regular storage proofs will be
118 | required from the hosts that filled the slots.
119 |
120 | Some Requests may not attract the required number of hosts, for instance
121 | because the payment is insufficient or the storage demands on the network are
122 | too high. To ensure that such Requests end, we add a timeout to the Request.
123 | If the Request fails to attract sufficient hosts before the timeout is
124 | reached, it is considered cancelled, and the hosts that filled any of the slots
125 | are able to withdraw their collateral. They are also paid for the time interval
126 | before the timeout. The client is able to withdraw the rest of the tokens in the
127 | Request.
128 |
129 | A Request ends when the money that was paid upfront runs out. The end time can
130 | be calculated from the amount of tokens that are paid out per second. Note that
131 | in our scheme this amount does not change during the lifetime of the Request,
132 | even when proofs are missed and repair happens. This is a desirable property
133 | for hosts; they can be sure of a steady source of income, and a predetermined
134 | Request length. When a Request ends, the hosts may withdraw their collateral.
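
For illustration only (the function and parameter names are hypothetical, not part of the contract interface), the end time follows directly from the upfront funds and the fixed payout rate:

    # Request end time, given a fixed per-slot, per-second reward.
    request_end_time <- function(start_time, upfront_funds, reward_per_second, slots) {
      start_time + upfront_funds / (reward_per_second * slots)
    }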
135 |
136 | When too many hosts fail to submit storage proofs, and no other hosts take over
137 | the slots that they vacate, then the content can be considered lost. The
138 | Request is considered failed. The collateral of every host in the Request is
139 | burned as an additional incentive for the network hosts to avoid this scenario.
140 | The client is able to retrieve any funds that are left in the Request.
141 |
142 | |
143 | | create
144 | |
145 | v
146 | ----------- timeout -------------
147 | | new | ------------------> | cancelled |
148 | ----------- -------------
149 | |
150 | | all slots filled
151 | |
152 | v
153 | ----------- too many losses ----------
154 | | started | -------------------> | failed |
155 | ----------- ----------
156 | |
157 | | money runs out
158 | |
159 | v
160 | ------------
161 | | finished |
162 | ------------
163 |
164 |
165 | Repairs
166 | -------
167 |
168 | When a slot is freed because too many storage proofs were missed, some
169 | collateral from the host that previously filled the slot is used as an incentive
170 | to repair the lost content. Repair typically involves downloading other parts of
171 | the content and using erasure coding to restore the missing parts. To incentivize
172 | other nodes to perform this repair, there is a repair reward. The size of the
173 | reward is a fraction of the slot's collateral, where the fraction is a parameter
174 | of the smart contract.
175 |
176 | The size of the reward should be chosen carefully. It should be high enough to
177 | incentivize hosts in the network to prioritize repairs over filling new slots in
178 | the network. It should also not be too high, to prevent malicious nodes in the
179 | network from trying to disable hosts in an attempt to collect the reward.
180 |
181 | Renewal
182 | -------
183 |
184 | When a Request is about to end, and someone in the network wants the Request
185 | to continue for longer, then they can post a new Request with the same content
186 | identifier.
187 |
188 | We've chosen not to allow top-ups of existing Requests with new funds. Even
189 | though this has many advantages (it's a very simple way to extend the lifetime
190 | of the Request, it allows people to easily chip in to host content, etc.) it
191 | has one big disadvantage: hosts no longer know for how long they'll be bound to
192 | the Request. When a Request is continuously topped up, they cannot leave the
193 | Request without losing their collateral.
194 |
195 | Dispersal
196 | ---------
197 |
198 | Here we propose an alternative way to select hosts for slots that is a
199 | variant of the "first come, first served" approach that we described earlier. It
200 | intends to alleviate these problems:
201 |
202 | 1. a single host can fill all slots in a Request
203 | 2. a small group of powerful hosts is able to fill most slots in the network
204 | 3. resources are wasted when many hosts try to fill the same slot
205 |
206 | For a client it is beneficial when their content is stored on as many different
207 | hosts as possible, to guard against host failures. Should a single host fill all
208 | slots in the Request, then the failure of this single host could mean that the
209 | content is lost.
210 |
211 | On a network level, we also want to avoid a situation where a few large players
212 | are able to fill most Request slots, which would mean that the network becomes
213 | fairly centralized.
214 |
215 | When too many nodes compete for a slot in a Request, and only one is selected,
216 | then this leads to wasted resources in the network. Wasted resources ultimately
217 | lead to a higher cost of storage.
218 |
219 | To alleviate these problems, we introduce a dispersal parameter in the Request.
220 | The dispersal parameter allows a client to choose the amount of
221 | spreading within the network. When a slot becomes empty, only a small number of
222 | hosts in the network are allowed to fill the slot. Over time, more and more
223 | hosts will be allowed to fill a slot. Each slot starts with a different set of
224 | allowed hosts.
225 |
226 | The speed at which new hosts are included is chosen by the client. When the
227 | client chooses a high speed, then very quickly every host in the network will be
228 | able to fill slots. This increases the chances that a single host fills all
229 | slots in a Request. When the client chooses a low speed, then it is more likely
230 | that different hosts fill the slots.
231 |
232 | We use the Kademlia distance function to indicate which hosts are allowed to
233 | fill a slot.
234 |
235 | distance between a and b: xor(a, b)
236 | slot start point: hash(nonce || slot number)
237 | allowed distance: elapsed time * dispersal parameter
238 |
239 |
240 | Each slot has a different start point:
241 |
242 | slot 4 slot 0 slot 2 slot 3 slot 1
243 | | | | | |
244 | v v v v v
245 | ----·--------·------------------·-------------------·-------------·----
246 |
247 | A host is allowed to fill a slot when the distance between its id and the start
248 | point is less than the allowed distance.
249 |
250 | start point
251 | | Kademlia distance
252 | t=3 t=2 t=1 v
253 | <------(------(------(------·------)------)------)------>
254 | ^ ^
255 | | |
256 | this host is this host is
257 | allowed at t=2 allowed at t=3
258 |
259 | Note that even though we use the Kademlia distance function, this bears no
260 | relation to the DHT. We use the blockchain address of the host, not its peer id.
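
As a toy illustration (using small integer addresses instead of real 160-bit ones; the names are hypothetical), the check a host performs could look like:

    # A host may fill a slot once the XOR distance between its address and the
    # slot's start point falls within the allowed distance.
    allowed_to_fill <- function(host_address, start_point, elapsed_time, dispersal) {
      bitwXor(host_address, start_point) < elapsed_time * dispersal
    }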
261 |
262 | This dispersal mechanism still requires modeling to check that it meets its
263 | goals, and to find the optimal value for the dispersal parameter, given certain
264 | network conditions. It is also worth looking into simpler alternatives.
265 |
266 | Conclusion
267 | ----------
268 |
269 | The design that we presented here deviates significantly from the previous
270 | marketplace design.
271 |
272 | There is no explicit negotiation phase for Requests. Clients are no
273 | longer able to choose which hosts will be responsible for keeping the content on
274 | the network. This removes the selection step that was required in the old
275 | design. Instead, a Request presents the network with an opportunity to earn money by
276 | storing content. Hosts can decide whether they want to take part in the
277 | Request, and if they do they are expected to keep to their part of the deal
278 | lest they lose their collateral.
279 |
280 | The first hosts that download the content and provide initial storage proofs are
281 | awarded slots in the Request. This removes the explicit Request start (and its
282 | associated timeout behavior) that was required in the old design. It also adds
283 | an incentive to quickly start storing the content while slots are available in
284 | the Request.
285 |
286 | While the old design required separate negotiations per host, this design
287 | ensures that either the single Request starts with all hosts, or is cancelled.
288 | This is a significant reduction in the amount of interactions required.
289 |
290 | The old design required new negotiations when a host is not able to fulfill its
291 | obligations, and a separately designed repair protocol. In this design we
292 | managed to include repair incentives and a repair protocol that is nearly
293 | identical to Request start.
294 |
295 | In the old design we had a single collateral per host that could be used to
296 | cover many Requests. Here we decided to require collateral per Request. This
297 | is done to simplify collateral handling, but it is not a requirement of the new
298 | design. The new design can also be made to work with a single collateral per
299 | host.
300 |
--------------------------------------------------------------------------------
/design/metadata-overhead.md:
--------------------------------------------------------------------------------
1 | # Reducing Metadata Overhead
2 |
3 | Metadata plays a crucial role in any distributed or peer-to-peer (p2p) storage network. However, it often incurs significant overhead for the system. Therefore, it is important to understand the required metadata and how it should be stored, located, and transmitted.
4 |
5 | ## Metadata and Manifests
6 |
7 | Codex utilizes a metadata descriptor structure called the "manifest". A manifest is similar to a torrent file and stores various pieces of information necessary to describe a dataset.
8 |
9 | ```
10 | Manifest
11 | rootHash # Cid of root (tree) hash of the contained data set
12 | originalBytes # Exact size of the original (uploaded) file
13 | blockSize # Size of each contained block
14 | blocks # Array of dataset blocks Cids
15 | version # Cid version
16 | hcodec # Multihash codec
17 | codec # Data set codec
18 | ```
19 |
20 | Additional information that describes erasure coding parameters may also be included:
21 |
22 | ```
23 | Manifest
24 | ...
25 | ecK # Number of blocks to encode
26 | ecM # Number of resulting parity blocks
27 | originalCid # The original Cid of the dataset being erasure coded
28 | originalLen # The length of the original manifest
29 | ```
30 |
31 | Manifests are treated as regular blocks of data, requiring no special handling by the Codex network or nodes. This means that announcing and storing manifests follows the same flow and uses the same subsystems as regular blocks. This convenience simplifies the execution flow significantly.
32 |
33 | ## Manifest limitations
34 |
35 | Including block hashes in the manifest introduces significant limitations. Firstly, the size of the manifest grows linearly with the number of blocks and the size of the hash digest, resulting in increased overhead for storing and transmitting manifests.
36 |
37 | Overall, large manifests impose additional burden on the network in terms of storage and transmission, resulting in unnecessary overhead. For example, when retrieving a sizable file, it becomes necessary to obtain all the hashes listed in the manifest before downloading the initial block. This process can require hundreds of megabytes of data.
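
As a back-of-the-envelope illustration (the concrete numbers are assumptions,
not measurements): a 1 TiB dataset chunked into 64 KiB blocks with 32-byte
digests already puts half a gigabyte of hashes in the manifest:

```python
# Back-of-the-envelope manifest overhead; block and digest sizes are
# assumptions chosen for illustration.
dataset_bytes = 1 << 40                        # 1 TiB dataset
block_size    = 64 * 1024                      # 64 KiB blocks
digest_size   = 32                             # e.g. a SHA-256 digest

num_blocks = -(-dataset_bytes // block_size)   # ceiling division
print(num_blocks)                              # 16777216 blocks
print(num_blocks * digest_size / 2**20, "MiB") # 512.0 MiB of hashes
```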
38 |
39 | One way to reduce the number of hashes is to increase the block size, which only partially addresses the problem. A better solution, however, is to completely remove the blocks array from the manifest and instead rely on Merkle proofs to verify the blocks.
40 |
41 | ## Slots and verification subsystem support
42 |
43 | Besides the block hashes overhead, another reason for the change is the introduction of slots (verifiable dataset subsets) that nodes in a storage set/group store and verify. Slots require Merkle trees for verification, but otherwise are identical to the top-level dataset. Thus, storing and transmitting Merkle proofs is already a requirement for slot verification.
44 |
45 | Replacing the blocks array with a proper Merkle tree allows the mechanism proposed in this document to be used for the top-level dataset as well as for slot verification, storage, and transmission. This greatly simplifies integration of the verification subsystem.
46 |
47 | ## Removing blocks array
48 |
49 | As already mentioned, the mechanism proposed here removes the blocks array from the manifest in favor of a separate Merkle tree. This Merkle tree is persisted in the local store and transmitted alongside the dataset blocks on retrieval. This allows verifying the transmitted blocks without knowing their hashes a priori.
50 |
51 | ## Implementation overview
52 |
53 | This mechanism requires an efficient Merkle tree implementation that allows persisting the leaves and intermediate hashes to disk; changes to the block exchange engine to support querying blocks by root hash and block index; and integration with the block store abstraction.
54 |
55 | ### Merkle Tree
56 |
57 | The block hashes array is replaced by a Merkle tree. The Merkle tree should support persisting to disk, partial and non-blocking reads/writes, and loading and storing from (async) iterators. For reference, check out https://github.com/filecoin-project/merkletree.
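
For illustration only, a minimal in-memory sketch of such a tree in Python;
the real implementation would additionally need the persistence and async
properties listed above:

```python
# Minimal in-memory Merkle tree sketch (illustrative only).
from hashlib import sha256

def _h(data: bytes) -> bytes:
    return sha256(data).digest()

def build_levels(leaf_hashes):
    """All tree levels, bottom-up, e.g. build_levels([_h(b) for b in blocks])."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        if len(lvl) % 2:                 # duplicate the last node on odd levels
            lvl = lvl + [lvl[-1]]
        levels.append([_h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def root(levels) -> bytes:
    return levels[-1][0]

def proof(levels, index: int):
    """Sibling hashes from the leaf level up to (but excluding) the root."""
    path = []
    for lvl in levels[:-1]:
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        path.append(lvl[index ^ 1])      # sibling of the current node
        index //= 2
    return path
```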
58 |
59 | ### Block retrieval
60 |
61 | #### Block Exchange Engine
62 |
63 | The block exchange engine requires support for querying blocks by their index and respective dataset Merkle root. It also requires returning the Merkle proofs alongside the chunk so that it can be readily verified. Scheduling blocks for retrieval should largely remain the same, but additional request and response messages are required.
64 |
65 | #### Announcing over the DHT
66 |
67 | Datasets are now announced by their Merkle root instead of by each individual block, as was the case in the previous implementation. Manifests are still announced exactly as before, by their cid. Announcing individual data blocks remains supported (but not required) and can be useful in the case of bandwidth incentives.
68 |
69 | ### Block Stores and Local Repo
70 |
71 | All interactions with blocks/chunks sit behind the `BlockStore` abstraction, which currently only supports querying blocks by hash. It should be extended to allow querying by Merkle root and block index and/or range.
72 |
73 | The local repo should be aware of the persisted Merkle tree. When a request by index is made, the store first locates the persisted Merkle tree corresponding to the specified root and retrieves the requested leaf and corresponding Merkle proofs.
74 |
75 | Once the hash of the requested block is known, the repo/store can be queried for the block using the retrieved block hash, as sketched after the list below.
76 |
77 | Keeping support for hash based retrieval (content addressing) has two main advantages:
78 |
79 | 1. It preserves content addressing at the repo level, which enables content deduplication.
80 | 2. It allows keeping the number of required changes to a minimum, as once the block hash is known, the existing flow can be reused.
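
A sketch of how the lookup could fit together (Python; `repo`, `trees`, and
their methods are hypothetical names, not an existing API):

```python
# Sketch of index-based lookup on top of content addressing (all names
# hypothetical, not an existing API).
def get_block_by_index(repo, trees, dataset_root, index):
    tree = trees.load(dataset_root)          # locate the persisted Merkle tree
    leaf_hash, path = tree.leaf_with_proof(index)
    block = repo.get_block(leaf_hash)        # reuse the existing hash-based flow
    return block, path
```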
81 |
82 | ## Updated flow
83 |
84 | ### Upload
85 |
86 | ```mermaid
87 | sequenceDiagram
88 | User ->> +Client: Upload file
89 | loop Store Block
90 | Client ->> +Chunker: Data Stream
91 | loop Chunking
92 | Chunker -->> +Chunker: Chunk stream
93 | Chunker -->> -Client: Block
94 | end
95 | Client ->> +Repo: Put block
96 | Client ->> +MerkleTree: Add Block Hash
97 | end
98 | MerkleTree -->> -Client: Merkle Root
99 | Client ->> MerkleTree: Serialize Merkle Tree
100 | Client ->> Client: Put Merkle Root in Manifest
101 | Client ->> Repo: Persist Manifest
102 | Client -->> User: Manifest Cid
103 | Client ->> DHT: Announce Manifest Cid
104 | Client ->> -DHT: Announce Dataset Cid
105 | ```
106 |
107 | **Steps**:
108 |
109 | 1. User initiates a file upload
110 | 1. Client chunks the stream and stores blocks in the Repo
111 | 2. Block's hash is added to a MerkleTree instance
112 | 3. This is repeated until all data has been read from the stream
113 | 2. Once all blocks have been stored, the Merkle root is generated and persisted
114 | 3. The manifest is persisted and serialized in the repo
115 | 4. The cid of the persisted manifest is returned to the user
116 | 5. Both the manifest Cid and the Dataset Merkle Root Cid are announced on the DHT
117 | 1. This allows locating both the manifest and the dataset individually
118 |
119 | ### Retrieval
120 |
121 | #### Local Flow
122 |
123 | ```mermaid
124 | sequenceDiagram
125 | User ->> Client: Request Manifest Cid
126 | alt If manifest cid in Repo
127 | Client ->> Repo: getBlock(cid)
128 | else Manifest cid not in Repo, request from Network
129 | Client ->> NetworkStore: [See Network Flow]
130 | end
131 | Repo -->> Client: Manifest Block
132 | Client ->> Client: Deserialize Manifest and Read Merkle Root
133 | Client ->> MerkleTree: loadMerkleTree(manifest.cid)
134 | loop Read Dataset
135 | Client ->> MerkleTree: getLeaf(index)
136 | MerkleTree -->> Client: [leaf cid, proof hashes...]
137 | alt If cid in Repo
138 | Client ->> Repo: getBlock(cid)
139 | Repo -->> Client: Data Block
140 | Client -->> User: Stream of blocks
141 | else Cid not in Repo, request from Network
142 | Client ->> NetworkStore: [See Network Flow]
143 | end
144 | end
145 | ```
146 |
147 | **Steps**:
148 |
149 | 1. User initiates a download with a manifest Cid
150 | 2. Client checks the local store for the manifest Cid
151 | 1. If it exists, the manifest is deserialized and the Merkle root of the dataset is read
152 | 2. Otherwise, the Cid is requested from the network store
153 | 3. Client checks the local repo for the Merkle tree root
154 | 1. If it exists, the Merkle tree is deserialized and leaf hashes are read
155 | 2. For each leaf hash, which corresponds to the hash of a block:
156 | 1. The local repo is checked for the presence of the block
157 | 1. If it exists, it is read from the local store and returned to the client
158 | 2. Otherwise, the Cid is requested from the network store
159 |
160 | #### Network Flow
161 |
162 | ```mermaid
163 | sequenceDiagram
164 | alt If block cid in Repo
165 | Client ->> Repo: getBlock(cid)
166 | Repo -->> Client: Block
167 | else Not in repo or no cid for block
168 | Client ->> NetworkStore: getBlockByIndex(cid, index)
169 | NetworkStore ->> BlockExchange: requestBlock(cid, index)
170 | loop Retrieve Blocks
171 | alt If have peers for Cid
172 | BlockExchange ->> Peers: Request root cid and index (or range)
173 | break Found Block(s)
174 | Peers -->> BlockExchange: [[block, [leaf cid, proof hashes...]]...]
175 | end
176 | else No peers for Cid
177 | loop Find Peers
178 | BlockExchange ->> DHT: Find peers for cid
179 | break Peers Found
180 | DHT -->> BlockExchange: [peers...]
181 | end
182 | end
183 | end
184 | end
185 | BlockExchange -->> NetworkStore: [[block, [proof hashes...]]...]
186 | loop For all blocks
187 | alt If Block hash and Merkle proof is correct
188 | NetworkStore -->> MerkleTree: Store Merkle path
189 | NetworkStore -->> Repo: Store Block
190 | NetworkStore -->> Client: Block
191 | else Block hash and Merkle proof is incorrect
192 | break Incorrect Block or Merkle proof
193 | Client ->> NetworkStore: Disconnect bad peer
194 | end
195 | end
196 | end
197 | end
198 | ```
199 |
200 | **Steps**:
201 |
202 | 1. The client requests blocks from the network store, using the Merkle root and block index
203 | 1. Network store requests the block from the BlockExchange engine
204 | 1. BlockExchange checks whether connected peers have the requested hash
205 | 1. If they do, the block is requested using the root hash and index (or range) of the block
206 | 2. Otherwise, it queries the DHT for the requested root hash
207 | 1. Once new peers have been discovered and connected, go to step 1.1.1
208 | 2. Once blocks are received from the remote nodes
209 | 1. The hashes are verified against the requested Merkle root and if they pass
210 | 1. The block is persisted to the repo/local store
211 | 2. The block hash (cid) and the Merkle proof are stored in the persisted Merkle tree
212 | 2. Otherwise, the block is discarded and the node that sent the incorrect block is disconnected
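
The verification in step 2.1 could look as follows (Python sketch matching the
illustrative tree structure above; the exact hashing scheme is an assumption):

```python
# Companion check to the tree sketch above (illustrative): recompute the
# root from the block, its index, and the sibling hashes, then compare.
from hashlib import sha256

def _h(data: bytes) -> bytes:
    return sha256(data).digest()

def verify(block: bytes, index: int, path: list, root: bytes) -> bool:
    node = _h(block)
    for sibling in path:
        # even index: node is the left child; odd index: the right child
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root
```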
213 |
--------------------------------------------------------------------------------
/design/proof-erasure-coding.ods:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codex-storage/codex-research/4a5ddf0e5271bdb49fe6db18e2a2d4268d37c07a/design/proof-erasure-coding.ods
--------------------------------------------------------------------------------
/design/sales.md:
--------------------------------------------------------------------------------
1 | # Sales module
2 |
3 | The sales module is responsible for selling a node's available storage in the
4 | [marketplace](./marketplace.md). In order to do so it needs to know how much
5 | storage is available. It also needs to be able to reserve parts of the storage,
6 | to make sure that it is not used for other purposes.
7 |
8 | ---------------------------------------------------
9 | | |
10 | | Sales |
11 | | |
12 | | ^ | |
13 | | | | updates ------------------ |
14 | | | --------------> | | |
15 | | | | Reservations | |
16 | | ------------------- | | |
17 | | queries ------------------ |
18 | | ^ ^ |
19 | ----------------------------|---------|-----------
20 | | |
21 | reserved space | | state
22 | v v
23 | ---------------- -----------------
24 | | Repo | | Datastore |
25 | ---------------- -----------------
26 |
27 | The reservations module keeps track of storage that is available to be sold.
28 | Users are able to add availability to indicate how much storage they are willing
29 | to sell and under which conditions.
30 |
31 | Availability
32 | amount
33 | maximum duration
34 | minimum price
35 |
36 | Availabilities consist of an amount of storage, the maximum duration and minimum
37 | price to sell it for. They represent storage that is for sale, but not yet sold.
38 | This is information local to the node that can be altered without affecting
39 | global state.
40 |
41 | ## Adding availability
42 |
43 | When a user adds availability, then the reservations module will check whether
44 | there is enough space available in the Repo. If there is enough space, then it
45 | will increase the amount of reserved space in the Repo. It persists the state of
46 | all availabilities to the Datastore, to ensure that they can be restored when a
47 | node is restarted.
48 |
49 | User Reservations Repo Datastore
50 | | | | |
51 | | add availability | | |
52 | | ---------------->| check free space | |
53 | | |----------------->| |
54 | | | reserve amount | |
55 | | |----------------->| |
56 | | | |
57 | | | persist availability |
58 | | |------------------------------>|
59 |
60 | ## Selling storage
61 |
62 | When a request for storage is submitted on chain, the sales module decides
63 | whether or not it wants to act on it. First, it tries to find an availability
64 | that matches the requested amount, duration, and price. If an availability
65 | matches, but is larger than the requested storage, then the Sales module may
66 | decide to split the availability into a part that we can use for the request,
67 | and a remainder that can be sold separately. The matching availability will be
68 | set aside so that it can't be sold twice.
69 |
70 | It then selects a slot from the request to fill, and starts downloading its
71 | content chunk by chunk. For each chunk that is successfully downloaded, a bit of
72 | reserved space in the Repo is released. The content is stored in the Repo with a
73 | time-to-live value that ensures that the content remains in the Repo until the
74 | request expires.
75 |
76 | Once the entire content is downloaded, the sales module will calculate a storage
77 | proof, and submit the proof on chain. If these steps are all successful, then
78 | this node has filled the slot. Once the other slots are filled by other nodes
79 | the request will start. The time-to-live value of the content should then be
80 | updated to match the duration of the storage request.
81 |
82 | Marketplace Sales Reservations Repo
83 | | | | |
84 | | incoming request | | |
85 | |------------------->| find reservation | |
86 | | |-------------------->| |
87 | | | remove reservation | |
88 | | |-------------------->| |
89 | | | | |
90 | | | store content |
91 | | |----------------------------------->|
92 | | | set time-to-live |
93 | | |----------------------------------->|
94 | | | release reserved space |
95 | | |----------------------------------->|
96 | | submit proof | |
97 | |<-------------------| |
98 | | | |
99 | . . .
100 | . . .
101 | | request started | |
102 | |------------------->| update time-to-live |
103 | | |----------------------------------->|
104 |
105 | ## Ending a request
106 |
107 | When a storage request comes to an end, the content can be removed from the
108 | repo and the storage space can be made available for sale again. The same should
109 | happen when something goes wrong in the process of selling storage.
110 |
111 | The time-to-live value should be removed from the content in the Repo, reserved
112 | space in the Repo should be increased again, and the availability that was used
113 | for the request can be re-added to the reservations module.
114 |
115 | Sales Reservations Repo
116 | | | |
117 | | | |
118 | | |
119 | | remove time to live |
120 | |----------------------------------->|
121 | | increase reserved space |
122 | |----------------------------------->|
123 | | |
124 | | re-add availability | |
125 | |-------------------->| |
126 | | | |
127 |
128 | ## Persisting state
129 |
130 | The sales module keeps state in a number of places. Most state is kept on chain;
131 | this includes the slots that a host is filling and the state of each slot. This
132 | ensures that a node's local view of slot states does not deviate from the
133 | network view, even when the network changes while the node is down. The rest of
134 | the state is kept on local disk by the Repo and the Datastore. How much space is
135 | reserved to be sold is persisted on disk by the Repo. The availabilities are
136 | persisted on disk by the Datastore.
137 |
138 | ## Slot queue
139 |
140 | Once a new request for storage is created on chain, all hosts will receive a
141 | contract event announcing the storage request and decide if they want to act on
142 | the request by matching their availabilities with the incoming request. Because
143 | there will be many requests being announced over time, each host will create a
144 | queue of matching request slots, adding each new storage slot to the queue.
145 |
146 | ### Adding slots to the queue
147 |
148 | Slots will be added to the queue when request for storage events are received
149 | from the contracts. Additionally, when slots are freed, a contract event will
150 | also be received, and the slot will be added to the queue. Duplicates are
151 | ignored.
152 |
153 | When all slots of a request are added to the queue, the order should be randomly
154 | shuffled, as there will be many hosts in the network that could potentially pick
155 | up the request and will process the first slot in the queue at the same time.
156 | This should avoid some clashes in slot indices chosen by competing hosts.
157 |
158 | Before slots can be added to the queue, availabilities must be checked to ensure
159 | a matching availability exists. This filtering prevents all slots in the network
160 | from entering the queue.
161 |
162 | ### Removing slots from the queue
163 |
164 | Hosts will also receive contract events for when any contract is started,
165 | failed, or cancelled. In all of these cases, slots in the queue pertaining to
166 | these requests should be removed as they are no longer fillable by the host.
167 | Note: expired request slots will be checked when a request is processed and its
168 | state is validated.
169 |
170 | ### Sort order
171 |
172 | Slots in the queue should be sorted in the following order:
173 | 1. Seen flag (`true` flag should be lower than `false`)
174 | 2. Profit (descending)¹
175 | 3. Collateral required (ascending)
176 | 4. Time before expiry (descending)
177 | 5. Dataset size (ascending)
178 |
179 | ¹ While profit cannot yet be calculated correctly as this calculation will
180 | involve bandwidth incentives, profit can be estimated as `duration * reward`
181 | for now.
182 |
183 | Note: dataset size may eventually be included in the profit algorithm and may not
184 | need to be included on its own in the future. Additionally, data dispersal may
185 | also impact the dataset size to be downloaded by the host, and consequently the
186 | profitability of servicing a storage request, which will need to be considered
187 | in the future once profitability can be calculated.
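
As a concrete illustration, the ordering above could be expressed as a single
sort key (Python sketch; the field names are assumed, and profit uses the
`duration * reward` estimate from the footnote):

```python
# Possible sort key (Python sketch; field names are assumed). With heapq,
# which pops the smallest tuple first, descending criteria are negated.
def slot_sort_key(slot):
    return (
        slot.seen,                       # False (0) sorts before True (1)
        -(slot.duration * slot.reward),  # estimated profit, descending
        slot.collateral,                 # collateral required, ascending
        -slot.time_before_expiry,        # time before expiry, descending
        slot.dataset_size,               # dataset size, ascending
    )
```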
188 |
189 | ### Queue processing
190 |
191 | Queue processing is started only once, when the sales module starts, and
192 | will process slots continuously, in order, until the queue is empty. If the
193 | queue is empty, processing of the queue will resume once items have been added
194 | to the queue. If the queue is not empty, but there are no availabilities, queue
195 | processing will resume once availabilities have been added.
196 |
197 | As soon as items are available in the queue, and there are workers available for
198 | processing, an item is popped from the queue and processed.
199 |
200 | When a slot is processed, it is first checked to ensure there is a matching
201 | availability, as the availabilities may have changed over time. Then, the
202 | sales process will begin. The start of the sales process should ensure that the
203 | slot being processed is indeed available (slot state is "free") before
204 | continuing. If it is not available, the sales process will exit and the host
205 | will continue to process the top slot in the queue. The start of the sales
206 | process should also check to ensure the host is allowed to fill the slot, due to
207 | the [sliding window
208 | mechanism](https://github.com/codex-storage/codex-research/blob/master/design/marketplace.md#dispersal).
209 | If the host is not allowed to fill the slot, the sales process will exit and the
210 | host will process the top slot in the queue.
211 |
212 | #### Preventing continual processing when there are small availabilities
213 | If the processed slot cannot continue because there are no availabilities, the
214 | slot should be marked as `seen` and put back into the queue. This flag will
215 | cause the slot to be ordered lower in the heap queue. If, upon processing
216 | a slot, the slot item already has a `seen` flag set, the queue should be
217 | paused.
218 |
219 | This serves to prevent availabilities that are small (in available bytes) from
220 | emptying the queue.
221 |
222 | #### Pausing the queue
223 | When availabilities are modified or removed, and there are no availabilities
224 | left, the queue should be paused.
225 |
226 | A paused queue will wait until it is unpaused before continuing to process items
227 | in the queue. This prevents unnecessarily popping items off the queue.
228 |
229 | #### Unpausing the queue
230 | When availabilities are modified or added, the queue should be unpaused if it
231 | was paused and any slots in the queue should have their `seen` flag cleared.
232 | Additionally, when slots are pushed to the queue, the queue should be unpaused
233 | if it was paused, however the `seen` flags of existing queue items should not be
234 | cleared.
235 |
236 | #### Queue workers
237 | Each time an item in the queue is processed, it is assigned to a worker. The
238 | number of allowed workers can be specified during queue creation. Specifying a
239 | limited number of workers allows the number of concurrent items being processed
240 | to be capped to prevent too many slots from being processed at once.
241 |
242 | During queue processing, only when there is a free worker will an item be popped
243 | from the queue and processed. Each time an item is popped and processed, a
244 | worker is removed from the available workers. If there are no available workers,
245 | queue processing will resume once there are workers available.
246 |
247 | #### Adding availabilities
248 | When a host adds an availability, a signal is triggered in the slot queue with
249 | information about the availability. This triggers a lookup of past request for
250 | storage events, capped at a certain number of past events or blocks. The slots
251 | of the requests in each of these events are added to the queue, where slots
252 | without matching availabilities are filtered out (see [Adding slots
253 | to the queue](#adding-slots-to-the-queue) above). Additionally, when slots of
254 | these requests are processed in the queue, they will be checked to ensure that
255 | the slots are not filled (see [Queue processing](#queue-processing) above).
256 |
257 | ### Implementation tips
258 |
259 | Request queue implementations should keep in mind that requests will likely need
260 | to be accessed randomly (by key, e.g. request id) and by index (for sorting), so
261 | implemented structures should handle these types of operations in as little time
262 | as possible.
263 |
264 | ## Repo
265 |
266 | The Repo exposes the following functions that allow the reservations module to
267 | query the amount of available storage, to update the amount of reserved
268 | space, and to store data for a guaranteed amount of time.
269 |
270 | Repository API:
271 | function available(): amount
272 | function reserve(amount)
273 | function release(amount)
274 | function setTtl(cid, ttl)
275 |
276 | ## Datastore
277 |
278 | The Datastore is a generic key-value store that is used to persist the state of
279 | the Reservations module, so that it survives node restarts.
280 |
281 | Datastore API:
282 | function put(key, value)
283 | function get(key): value
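
For illustration, the "adding availability" flow from the diagrams above could
combine the two APIs as follows (Python sketch; the names are assumed):

```python
# Sketch of the "adding availability" flow on top of the two APIs (Python;
# names mirror the diagrams above but are otherwise assumed).
def add_availability(repo, datastore, availability):
    if repo.available() < availability.amount:
        raise RuntimeError("not enough free space in the Repo")
    repo.reserve(availability.amount)               # reserved space goes up
    datastore.put(availability.id, availability)    # survives node restarts
```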
284 |
--------------------------------------------------------------------------------
/design/slot-reservations.md:
--------------------------------------------------------------------------------
1 | # Slot reservations
2 |
3 | Competition between storage providers (SPs) to fill slots has some advantages,
4 | such as providing an incentive for SPs to become proficient in downloading
5 | content and generating proofs. It also has some drawbacks, for instance it can
6 | lead to network inefficiencies because multiple SPs do the work of downloading
7 | and proving, while only one SP is rewarded for it. These inefficiencies lead to
8 | higher costs for SPs, which leads to an overall increase in the price of storage
9 | on the network. It can also lead to clients inadvertently inviting too much
10 | network traffic to themselves. Should they for instance post a very lucrative
11 | storage request, then this invites a lot of SPs to start downloading the content
12 | from the client simultaneously, not unlike a DDOS attack.
13 |
14 | Slot reservations are a means to avoid these inefficiencies by only allowing SPs
15 | who have secured a slot reservation to fill the slot. Furthermore, slots can
16 | only be reserved by eligible SPs, governed by a window of eligible addresses
17 | that starts small and grows larger over time, eventually encompassing the entire
18 | address space on the network.
19 |
20 | ## Proposed solution: slot reservations
21 |
22 | Before downloading the content associated with a slot, a limited number of SPs
23 | can reserve the slot. Only SPs that have reserved the slot can fill the slot.
24 | After the SP downloads the content and calculates a proof, it can move the slot
25 | from its reserved state into the filled state by providing collateral and the
26 | storage proof. Then it begins to periodically provide storage proofs and accrue
27 | payments for the slot.
28 |
29 | ```
30 | reserve proof & collateral
31 | | |
32 | v v
33 | ---------------------------------------------
34 | slot: |/ / / / / / / / / |/////////////////////////
35 | ---------------------------------------------
36 | | |
37 | v v
38 | slot slot
39 | reserved filled
40 |
41 |
42 | ---------------- time ---------------->
43 | ```
44 |
45 | There is an initial race for eligible SPs who are first to secure a reservation,
46 | then a second race amongst the SPs with a reservation to fill the slot (with
47 | collateral and the generated proof). However, not all SPs in the network can
48 | reserve a slot initially: the [expanding window
49 | mechanism](https://github.com/status-im/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal)
50 | dictates which SPs are eligible to reserve the slot.
51 |
52 | ### Expanding window mechanism
53 |
54 | The expanding window mechanism prevents node and network overload once a slot
55 | becomes available to be filled (or repaired) by allowing only a very small
56 | number of SP addresses to fill/repair the slot at the start. Over time, the
57 | number of eligible SP addresses increases, until eventually all SP addresses in
58 | the network are eligible.
59 |
60 | The expanding window mechanism starts off with a random source address $A_0$,
61 | defined as $hash(block\ hash,\ request\ id,\ slot\ index,\ reservation\ index)$,
62 | with a unique source address for each reservation of each slot. The distance
63 | between an SP address $A$ and the source address can be defined as $XOR(A, A_0)$
64 | (Kademlia distance). Once the allowed distance is greater than the SP's distance,
65 | the SP is considered eligible to reserve a slot. The allowed distance for eligible
66 | addresses over time $t_i$ can be [defined
67 | as](https://hackmd.io/@bkomuves/BkDXRJ-fC) $2^{256} * F(t_i)$, where $2^{256}$
68 | represents the total number of 256-bit addresses in the address space, and
69 | $F(t_i)$ represents the expansion function over time. As this allowed distance
70 | value increases along a curve, more and more addresses will be eligible to
71 | participate in reserving that slot. In total, eligible addresses are those that
72 | satisfy:
73 |
74 | $XOR(A, A_0) < 2^{256} * F(t_i)$
75 |
76 | Furthermore, the client can change the curve of the rate of expansion, by
77 | setting a [dispersal
78 | parameter](https://github.com/codex-storage/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal)
79 | of the storage request, $h$, which represents the percentage of the network
80 | addresses that will be eligible halfway to the time of expiry. $h$ can be
81 | defined as:
82 |
83 | $h := F(0.5)$, where $0 \lt h \lt 1$ and $h \neq 0.5$
84 |
85 | Changing the value of $h$ will [affect the curve of the rate of
86 | expansion](https://www.desmos.com/calculator/pjas1m1472) (interactive graph).
87 |
88 | #### Expansion function, $F(t_i)$, in-depth
89 |
90 | $F(t_i)$ defines the expansion factor of eligible addresses in the network over
91 | time.
92 |
93 | ##### Assumptions
94 |
95 | It is assumed network addresses are randomly, and more-or-less uniformly,
96 | selected from a space of $2^{256}$.
97 |
98 | It is also assumed that the window can only change in discrete steps, based on
99 | some underlying blockchain's cadence (for example this would be approx every 12
100 | seconds in the case of Ethereum), and that we measure time based on timestamps
101 | encoded in blockchain blocks.
102 |
103 | However, with this assumption given, it is desired to be as granular and tunable
104 | as possible.
105 |
106 | There is a time duration in which it is desired to go from a single network
107 | address to the whole address-space.
108 |
109 | To be able to make this work nicely, first a linear time function $t_i$, which
110 | goes from 0 to 1, is defined.
111 |
112 | ##### Implementation
113 |
114 | At any desired block with timestamp $timestamp_i$, simply compute:
115 |
116 | $$t_i := \frac{timestamp_i - start}{expiry - start}$$
117 |
118 | Then to get a network range, any kind of expansion function $F(x)$ with $F(0)=0$
119 | and $F(1)=1$ can be plugged in; for example, a parametric exponential:
120 |
121 | $$ F_s(x) = \frac{\exp(sx) - 1}{\exp(s) - 1} $$
122 |
123 | Remark: with this particular function, it is likely desired to have $s<0$
124 | (resulting in fast expansion initially, slowing down later). Here is a
125 | Mathematica one-liner to play with this idea:
126 | ```
127 | Manipulate[
128 | Plot[ (Exp[s*x]-1)/(Exp[s]-1), {x,0,1}, PlotRange->Full ],
129 | {s,-10,-1} ]
130 | ```
131 | As an alternative, the same can easily be done with eg. the online
132 | [Desmos](https://www.desmos.com/calculator) tool.
133 |
134 | ##### Address window
135 |
136 | Finally, an address $A$ becomes eligible at block $i$ if the Kademlia distance
137 | from the "window center" $A_0$ is smaller than $2^{256}\times F(t_i)$:
138 |
139 | $$ XOR(A,A_0) < 2^{256}\cdot F(t_i) $$
140 |
141 | Note: since $t_i$ only becomes 1 exactly at expiry, to allow the whole network
142 | to participate near the end, there should be a small positive $\delta > 0$ such
143 | that $F(t)=1$ for $t>1-\delta$, leaving roughly the last $100\delta$ percent of
144 | the total slot fill window during which the whole network is eligible to participate.
145 |
146 | Alternatively, $t_i$ could be rescaled to achieve the same effect:
147 |
148 | $$ t_i' := \min(\; t_i/(1-\delta)\;,\;1\;) $$
149 |
150 | The latter is probably simpler because it allows complete freedom in selecting
151 | the expansion function $F(x)$.
152 |
153 | ##### Parametrizing the speed of expansion
154 |
155 | While, in theory, arbitrary expansion functions could be used, it is likely
156 | undesirable to have more than a one-parameter family, that is, a single
157 | parameter to set the curve. However, even with a single parameter, there
158 | could be any number of different ways to map a number to the same family of
159 | curves.
160 |
161 | In the above example $F_s(t)$, while $s$ is quite natural from a mathematical
162 | perspective, it doesn't really have any meaning for the user. A possibly better
163 | parametrization would be the value $h:=F_s(0.5)$, meaning "how big percentage of
164 | network is allowed to participate at half-time". $s$ can be computed from $h$:
165 |
166 | $$ s = 2\log\left(\frac{1-h}{h}\right) $$
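
Putting the formulas together, a small illustrative sketch (Python; it assumes
$0 < h < 1$ and $h \neq 0.5$, since $h = 0.5$ gives $s = 0$, where $F_s$
degenerates to $F(t) = t$):

```python
# Illustrative sketch of the full eligibility test (not audited).
from math import exp, log

def s_from_h(h: float) -> float:
    return 2 * log((1 - h) / h)

def F(s: float, t: float) -> float:
    return (exp(s * t) - 1) / (exp(s) - 1)

def eligible(addr: int, source: int, timestamp: int, start: int, expiry: int,
             h: float) -> bool:
    t = min(max((timestamp - start) / (expiry - start), 0.0), 1.0)
    return (addr ^ source) < int(2**256 * F(s_from_h(h), t))
```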
167 |
168 | ### Abandoned ideas
169 |
170 | #### No reservation collateral
171 |
172 | Reservation collateral was thought to be able to prevent a situation where an SP
173 | would reserve a slot then fail to fill it. However, collateral could not be
174 | burned as it created an attack vector for clients: clients could withhold the
175 | data and cause SPs to lose their reservation collateral. The reservation
176 | transaction itself creates a signal of intent from an SP to fill the slot. If
177 | the SP were to not fill the slot, then other SPs that have reserved the slot
178 | will fill it.
179 |
180 | #### No reservation/fill reward
181 |
182 | Fill rewards were originally proposed to incentivize filling slots as fast as
183 | possible. However, the SPs are already being paid out for the time that they
184 | have filled the slot, thus negating the need for additional incentivization. If
185 | additional incentivization is desired by the client, then an increase in the
186 | value of the storage request is possible.
187 |
188 | Adding a fill reward for SPs who ultimately fill the slot is not necessary
189 | because, like the SP rewards for providing proofs, fill rewards would be paid
190 | once the storage request successfully completes. This would mean that the fill
191 | reward is effectively the same as an increase in value of the storage request
192 | payout. Therefore, if a client is so inclined to provide a fill reward, they
193 | could instead increase the total reward of the storage request.
194 |
195 | In this simplified slot reservations proposal, there will not be reservation
196 | collateral nor reward requirements until the behavior in a live environment can
197 | be observed to determine these are necessary mechanisms.
198 |
199 | ### Slot reservation attacks
200 |
201 | Name | Attack description
202 | :------------|:--------------------------------------------------------------
203 | Clever SP | SP drops slot when a better opportunity presents itself
204 | Lazy SP | SP reserves a slot, but doesn't fill it
205 | Censoring SP | acts like a lazy SP for specific CIDs that it tries to censor
206 | Greedy SP | SP tries to fill multiple slots in a request
207 | Sticky SP | SP tries to fill the same slot in a contract renewal
208 | Lazy client | client doesn't release content on the network
209 |
210 | #### Clever SP attack
211 |
212 | In this attack, an SP could fill a slot, and while fulfilling its duties, see
213 | that a better opportunity has arisen, and abandon its duties in the first slot
214 | to fill the second slot.
215 |
216 | This attack is mitigated by the SP losing its request collateral for the first
217 | slot once it is abandoned. Additionally, once the SP fills the first slot, it
218 | will accrue rewards over time that will not be paid out until the request
219 | successfully completes. These rewards act as another disincentive for the SP to
220 | abandon the slot.
221 |
222 | The behavior of SPs filling better opportunities is not necessarily an attack.
223 | If an SP is fulfilling its duties on a slot and finds a better opportunity
224 | elsewhere, it should be allowed to do so. The repair mechanisms will allow the
225 | abandoned slot to be refilled by another SP that deems it profitable.
226 |
227 | #### Lazy SP attack
228 |
229 | In this attack, an SP reserves a slot, but waits to fill the slot hoping a better
230 | opportunity will arise, in which the reward earned in the new opportunity would
231 | be greater than the reward earned in the original slot.
232 |
233 | This attack is mitigated by allowing for multiple reservations per slot. All SPs
234 | that have secured a reservation (capped at three) will race to fill the slot.
235 | Thus, if one or more SPs that have reserved the slot decide to pursue other
236 | opportunities, the other SPs that have reserved the slot will still be able to
237 | fill the slot.
238 |
239 | In addition, the expanding window mechanism allows for more SPs to participate
240 | (reserve/fill) as time progresses, so there will be a larger pool of SPs that
241 | could potentially fill the slot. Because each reservation will have its own
242 | unique expanding window source, SPs reserving one slot in a request will likely
243 | not have the same opportunities to reserve/fill the same slot in another
244 | request.
245 |
246 | #### Censoring SP attack
247 |
248 | The "censoring SP attack" is when an SP attempts to withhold providing specific
249 | CIDs from the network in an attempt to censor certain content. An SP could also
250 | try this attack in the case of repair, hoping to prevent a freed slot from being
251 | repaired.
252 |
253 | Even if one SP withholds specific content, the dataset, along with the withheld
254 | CID, can be reconstructed from K chunks (provided by other SPs) allowing the
255 | censored CID to be accessed. In the case of repair, the SP would need to control
256 | M+1 chunks to prevent data reconstruction by other nodes in the network. The
257 | expanding window mechanism seeks to prevent SPs from filling multiple slots in
258 | the same request, which should prevent any one SP from controlling M+1 slots.
259 |
260 | #### Greedy SP attack
261 |
262 | The "greedy SP attack" is when one SP tries to fill multiple slots in a single
263 | request. Mitigation of this attack is achieved through the expanding windows for
264 | each request not allowing a single SP address to fill all the slots. This is
265 | only effective for the majority of time before expiry, however, meaning it is
266 | not impossible for this attack to occur. If a request is offered and the slots
267 | are not filled after some time, the expanding windows across the slots may open
268 | up to allow all SPs in the network to fill multiple slots in the request.
269 |
270 | A controlling entity may try to circumvent the expanding window by setting up a
271 | sybil attack with many highly distributed nodes. Even with many nodes covering a
272 | large distribution of the address space, the randomness of the expanding window
273 | will make this attack highly improbable, except for undervalued requests that do
274 | not have slots filled early, in which case there would be a lack of motivation
275 | to attack data that is undervalued.
276 |
277 | #### Sticky SP attack
278 |
279 | The "sticky SP attack" is where an SP tries to withhold data for a contract
280 | renewal so they are able to fill the slot again. The SP withholds data from all
281 | other SPs until the expanding window allows their address, then they quickly
282 | fill the slot (they are quick because they don't need to download the data). As
283 | in the censoring SP attack, the SP would need to control M+1 slots for this to
284 | be effective, because that is the only way to prevent the CID from being
285 | reconstructed from K slots available from other SPs.
286 |
287 | #### Lazy client attack
288 |
289 | In this attack, a client might want to disrupt the network by creating requests
290 | for storage but never releasing the data to SPs attempting to fill the slot. The
291 | transaction cost associated with this type of behavior should provide some
292 | mitigation. Additionally, if a client tries to spam the network with these types
293 | of untenable storage requests, the transaction cost will increase with the
294 | number of requests due to the increasing block fill rate and the associated
295 | rise in gas costs. However, this attack is not impossible.
296 |
297 | ### Open questions
298 |
299 | Perhaps the expanding window mechanism should be network-aware such
300 | that there are always a minimum of two SPs in a window at a given time, to
301 | encourage competition? The downside of this is that active SPs need to be
302 | persisted and tracked in the contract, with larger transaction costs resulting
303 | from this.
304 |
305 | ### Trade offs
306 |
307 | The main advantage to this design is that nodes and the network would not be
308 | overloaded at the outset of slots being available for SP participation.
309 |
310 | The downside of this proposal is that an SP would have to participate in two
311 | races: one for reserving the slot and another for filling the slot once
312 | reserved, which brings additional complexities in the smart contract.
313 |
314 | In addition, there are two attack vectors, the "greedy SP attack" and the "lazy
315 | client attack" that are not well covered in the slot reservation design. There
316 | could be even more complexities added to the design to accommodate these two
317 | attacks (see the other proposed solution for the mitigation of these attacks).
318 |
--------------------------------------------------------------------------------
/design/storage-proof-timing.md:
--------------------------------------------------------------------------------
1 | Timing of Storage Proofs
2 | ========================
3 |
4 | We present a design that allows a smart contract to determine when proofs of
5 | storage should be provided by hosts.
6 |
7 | Context
8 | -------
9 |
10 | Hosts that are compensated for providing storage of data are held accountable
11 | by providing proofs of storage periodically. It's important that a host is not
12 | able to pre-compute those proofs, otherwise it could simply delete the data and
13 | only store the proofs.
14 |
15 | A smart contract should be able to check whether those proofs were delivered in
16 | a correct and timely manner. Either the smart contract will be used to perform
17 | these checks directly, or as part of an arbitration mechanism to keep validators
18 | honest.
19 |
20 | Design 1: block by block
21 | ------------------------
22 |
23 | A first design used the property that blocks on Ethereum's main chain arrive in
24 | a predictable cadence; about once every 14 seconds a new block is produced. The
25 | idea is to use the block hashes as a source of non-predictable randomness.
26 |
27 | From the block hash you can derive a challenge, which the host uses together
28 | with the stored data to generate a proof of storage.
29 |
30 | Furthermore, we use the block hash to perform a die roll to determine whether a
31 | proof is required or not. For instance, if a storage contract stipulates that a
32 | proof is required once every 1000 blocks, then each new block hash leads to a 1
33 | in 1000 chance that a proof is required. This ensures that a host should always
34 | be able to deliver a storage proof at a moment's notice, while keeping the total
35 | costs of generating and validating proofs relatively low.
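
For illustration, the challenge derivation and die roll could look as follows
(Python for readability; on chain this logic would live in the smart contract,
and the exact derivations are assumptions):

```python
# Illustrative sketch of deriving a challenge and a die roll from a block hash.
from hashlib import sha256

def challenge(block_hash: bytes) -> bytes:
    return sha256(block_hash + b"challenge").digest()

def proof_required(block_hash: bytes, n: int = 1000) -> bool:
    roll = int.from_bytes(sha256(block_hash + b"die").digest(), "big")
    return roll % n == 0                      # a 1-in-n chance per block
```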
36 |
37 | Problems with block cadence
38 | ---------------------------
39 |
40 | We see a couple of problems emerging from this design. The main problem is that
41 | block production rate is not as reliable as it may seem. The steady cadence of
42 | the Ethereum main net is (by design) not shared by L2 solutions, such as
43 | rollups.
44 |
45 | Most L2 solutions therefore [warn][2] [against][3] the use of the block interval
46 | as a measure of time, and tell you to use the block timestamp instead. Even
47 | though these are susceptible to some miner influence, over longer time intervals
48 | they are deemed reliable.
49 |
50 | Another issue that we run into is that on some L2 designs, block production
51 | increases when there are more transactions. This could lead to a death spiral
52 | where an increasing number of blocks leads to an increase in required proofs,
53 | leading to more transactions, leading to even more blocks, etc.
54 |
55 | And finally, because storage contracts between a client and a host are best
56 | expressed in wall clock time, there are going to be two different ways of
57 | measuring time in the same smart contract, which could lead to some subtle bugs.
58 |
59 | These problems lead us to the second design.
60 |
61 | Design 2: block pointers
62 | ------------------------
63 |
64 | In our second design we separate cadence from random number selection. For
65 | cadence we use a time interval measured in seconds. This divides time into
66 | periods that have a unique number. Each period represents a chance that a proof
67 | is required.
68 |
69 | We want to associate a random number with each period that is used for the proof
70 | of storage challenge, and for the die roll to determine whether a proof is
71 | required. But since we no longer have a one-to-one relationship between a period
72 | and a block hash, we need to get creative.
73 |
74 | EVM and solidity
75 | ----------------
76 |
77 | For context, our smart contracts are written in Solidity and execute on the EVM.
78 | In this environment we have access to the [most recent 256 block hashes][1], and
79 | to the current time, but not to the timestamps of the previous blocks. We also
80 | have access to the current block number.
81 |
82 | Block pointers
83 | --------------
84 |
85 | We introduce the notion of a block pointer. This is a number between 0 and 255
86 | that points to one of the latest 256 block hashes. We count from 0 (latest
87 | block) to 255 (oldest available block).
88 |
89 | oldest latest
90 | - - - |-----------------------------------|
91 | 255 ^ 0
92 | |
93 | pointer
94 |
95 | We want to associate a block pointer with a period such that it keeps pointing
96 | to the same block hash when new blocks are produced. We need this because the
97 | block hash is used to check whether a proof is required, to check a proof when
98 | it's submitted, or to prove absence of a proof, all at different times.
99 |
100 | To ensure that the block pointer points to the same block hash for longer
101 | periods of time, we derive it from the current block number:
102 |
103 | pointer(period) = (blocknumber + period) % 256
104 |
105 | Each time a new block is produced the block pointer increases by one, which
106 | ensures that it keeps pointing to the same block. Over time, when more blocks
107 | are produced, we get this picture:
108 |
109 | |
110 | | - - - |-----------------------------------|
111 | | 255 ^ 0
112 | | |
113 | | pointer
114 | t
115 | i - - - |-----------------------------------|
116 | m 255 ^ 0
117 | e |
118 | | pointer
119 | |
120 | | - - - |-----------------------------------|
121 | | 255 ^ 0
122 | | |
123 | v pointer
124 |
125 | Avoiding surprises
126 | ------------------
127 |
128 | There is one problem left when we use the pointer as we've just described.
129 | Because of the modulus, there are periods in which the pointer wraps around. It
130 | moves from 255 to 0 from one block to the next. This is undesirable because it
131 | would mean that new proof requirements could all of a sudden appear, leaving too
132 | little time for the host to calculate and submit a proof.
133 |
134 | We identified two ways of dealing with this problem: pointer downtime and
135 | pointer duos. Pointer downtime appears to be the simplest solution, so we
136 | present it here. Pointer duos are described in the appendix.
137 |
138 | Pointer downtime
139 | ----------------
140 |
141 | We ignore any proof requirements when the pointer is pointing to one of the most
142 | recent blocks:
143 |
144 | - - - |-------------------------|/////////|
145 | 255 ^ 0
146 | |
147 | pointer
148 |
149 | When the pointer is in the grey zone, no proof is required. The number of blocks
150 | in the grey zone should be chosen such that it is not possible to produce them
151 | all inside a period; this ensures that a host cannot be surprised by new proof
152 | requirements popping up.
153 |
154 | If we want a host to provide a proof on average once every N periods, it now no
155 | longer suffices to have a 1 in N chance to provide a proof. Because there are no
156 | proof requirements in the grey zone, the odds have changed in favor of the host.
157 | To compensate, the odds outside of the grey zone should be increased. For
158 | instance, if the grey zone is 64 blocks (¼ of the available blocks), then the
159 | odds of requiring a proof should be 1 in ¾N.
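
A small sketch of the pointer and the compensated odds (Python, illustrative;
the 64-block grey zone is the example from the text):

```python
# Sketch of the pointer with a grey zone (illustrative).
GREY = 64   # must exceed the number of blocks producible within one period

def pointer(blocknumber: int, period: int) -> int:
    return (blocknumber + period) % 256

def in_grey_zone(p: int, grey: int = GREY) -> bool:
    return p < grey            # pointers near the latest blocks are ignored

def proof_probability(n: int, grey: int = GREY) -> float:
    # raise the odds outside the grey zone to keep the average at 1 in N
    return 1 / ((1 - grey / 256) * n)          # grey = 64 gives 1 in ¾N
```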
160 |
161 | --------------------------------------------------------------------------------
162 |
163 | Appendix: Pointer duos
164 | ----------------------
165 |
166 | An alternative solution to the pointer wrapping problem is pointer duos:
167 |
168 | pointer1(period) = (blocknumber + period) % 256
169 | pointer2(period) = (blocknumber + period + 128) % 256
170 |
171 | The pointers are 128 blocks apart, ensuring that when one pointer wraps, the
172 | other remains stable.
173 |
174 | - - - |-----------------------------------|
175 | 255 ^ ^ 0
176 | | |
177 | pointer pointer
178 |
179 | We allow hosts to choose which of the two pointers to use. This has implications
180 | for the die roll that we perform to determine whether a proof is required.
181 |
182 | If we want a host to provide a proof on average once every N periods, it no
183 | longer suffices to have a 1 in N chance to provide a proof. Should a host be
184 | completely free to choose between the two pointers (which is not entirely true,
185 | as we shall see shortly) then the odds of a single pointer should be 1 in √N to
186 | get to a probability of `1/√N * 1/√N = 1/N` of both pointers leading to a proof
187 | requirement.
188 |
189 | In reality, a host will not be able to always choose either of the two pointers.
190 | When one of the pointers is about to wrap before validation is due, it can no
191 | longer be relied upon. A really conservative host would follow the strategy of
192 | always choosing the pointer that points to the most recent block, requiring the
193 | odds to be 1 in N. A host that tries to optimize towards providing as few
194 | proofs as necessary will require the odds to be nearer to 1 in √N.
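
Numerically (illustrative sketch): for N = 1000, per-pointer odds of 1 in √N
are roughly 1 in 31.6, and both pointers together give the desired 1 in 1000:

```python
# The duo odds, worked out numerically (sketch).
from math import sqrt

N = 1000
per_pointer = 1 / sqrt(N)      # ≈ 0.0316, i.e. roughly 1 in 31.6
print(per_pointer ** 2)        # ≈ 0.001 == 1/N
```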
195 |
196 | [1]: https://docs.soliditylang.org/en/v0.8.12/units-and-global-variables.html#block-and-transaction-properties
197 | [2]: https://community.optimism.io/docs/developers/build/differences/#block-numbers-and-timestamps
198 | [3]: https://support.avax.network/en/articles/5106526-measuring-time-in-smart-contracts
199 |
--------------------------------------------------------------------------------
/evaluations/account abstraction.md:
--------------------------------------------------------------------------------
1 | Ethereum Account Abstraction
2 | ============================
3 |
4 | A high level overview of what the current state of account abstraction in
5 | Ethereum is and what role it might play in the Codex design.
6 |
7 | TL;DR: Account abstraction does not impact the design of Codex
8 |
9 | Current state
10 | -------------
11 |
12 | There have been several proposals to introduce [account abstraction][roadmap]
13 | for Ethereum. Most of them required changes to the consensus mechanism, and were
14 | therefore postponed and have not made it into mainnet. [ERC-4337][4337] is a
15 | newer proposal that uses smart contracts and does not require changes to the
16 | consensus mechanism. It uses a separate mempool for transaction-like objects
17 | called "user operations". They are picked up by bundlers who bundle them into an
18 | actual transaction that is executed on-chain. ERC-4337 is the closest to being
19 | usable on mainnet.
20 |
21 | An ERC-4337 entry point [contract][entrypoint] has been deployed on mainnet since
22 | March 2023. One bundler seems to be active ([Stackup][stackup]), although at the
23 | time of writing it seems to be running neither regularly nor without errors.
24 |
25 | Codex use cases
26 | ---------------
27 |
28 | Potential Codex use cases for account abstraction are:
29 |
30 | - Paying for storage without requiring ETH to pay for gas
31 | - Checking for missing storage proofs
32 |
33 | Clients pay for storage and hosts put down collateral in the Codex marketplace.
34 | They need both ERC-20 tokens for payment and collateral and ETH for gas. We
35 | expect wallet providers to make full use of ERC-4337 to implement transactions
36 | where gas is paid for by ERC-20 tokens instead of ETH. These wallets can then be
37 | used to interact with the Codex marketplace. This does not require a change to
38 | the design of Codex itself.
39 |
40 | In our current design for the Codex marketplace we require hosts to provide
41 | [storage proofs][proofs] at unpredictable times. If they fail to provide a
42 | proof, then a simple [validator][validator] can mark a proof as missing. Even
43 | though the marketplace smart contract has all the logic to determine whether a
44 | proof is actually missing, we need the validator to initiate a transaction to
45 | execute the logic.
46 |
47 | Some of the write-ups on account abstraction seem to indicate that account
48 | abstraction would allow for contracts to initiate transactions, or for
49 | subscriptions and repeat payments. However, I could not find any indications in
50 | the specifications that this would be the case. Certainly ERC-4337 does not
51 | allow for this. This means that account abstraction as it currently stands
52 | cannot be used to replace the validator when checking for missing storage
53 | proofs.
54 |
55 | [roadmap]: https://ethereum.org/en/roadmap/account-abstraction/
56 | [4337]: https://eips.ethereum.org/EIPS/eip-4337
57 | [entrypoint]: https://etherscan.io/address/0x5FF137D4b0FDCD49DcA30c7CF57E578a026d2789
58 | [stackup]: https://www.stackup.sh/
59 | [proofs]: https://github.com/codex-storage/codex-research/blob/33cd86af4d809c39c7c41ca50a6922e6b5963c67/design/storage-proof-timing.md
60 | [validator]: https://github.com/codex-storage/nim-codex/pull/387
61 |
--------------------------------------------------------------------------------
/evaluations/arweave.md:
--------------------------------------------------------------------------------
1 | ---
2 | published: false
3 | ---
4 | An evaluation of the Arweave paper
5 | ==================================
6 |
7 | 2021-05-18 Dagger Team
8 |
9 | https://www.arweave.org/yellow-paper.pdf
10 |
11 | Goal of this evaluation is to find things to adopt or avoid while designing
12 | Dagger. It is not meant to be a criticism of Arweave.
13 |
14 | #### Pros:
15 |
16 | + There is no distinction between full and light clients, merely clients that
17 | downloaded more or less of the blockweave. (§2.2)
18 | + Preferential treatment of peers is discouraged, because nodes are unaware when
19 | they're being monitored for responsiveness. (§3.4.2)
20 | + Interesting 'meta-game' on top of tit-for-tat, in which nodes monitor their
21 | peers on how they rank other peers. (§6.1)
22 | + Because behaviour of nodes is largely based on local rules and the local view
23 | that a node has of its peers, the network is able to shift behaviour gradually
24 | in response to a changing environment. (§6.2)
25 |
26 | #### Cons:
27 |
28 | - Proof of Work is used for the underlying blockweave (§3.1), which is
29 | rather wasteful.
30 | - Data is stored indefinitely, which is great for public information, but not so
31 | great for ephemeral private data. This makes storage unnecessarily expensive
32 | for data with a short lifespan. (§3.1)
33 | - Network is free at point of use for external users, raising questions about
34 | scalability of the network when faced with highly popular content. (§3.4.2)
35 | Incentives for data replication help (§7.1.2), but it is unlikely that it
36 | will hold up when the network grows in content (§8.2, §8.3). These incentives
37 | can also lead to unnecessary duplication of unpopular content.
38 | - Nodes with limited connectivity are discouraged from participating in the
39 | network, which precludes use on mobile devices. (§3.4.3)
40 | - There is an economic incentive for a miner not to share old blocks with
41 | other miners, because it increases its chance of "winning" the new block.
42 | (§4.1.1)
43 | - There is an economic incentive for miners to have the strictest censorship
44 | rules, because otherwise a block that it mined might be rejected by others.
45 | (§5.1)
46 | - The majority of the network determines the censorship rules. This could prove
47 | troublesome should Arweave's Proof of Work lead to similar geographic
48 | centralization of mining power as we see in Bitcoin. (§5.3)
49 | - Transaction ID is used for addressing, instead of a content hash. (§7.1.1)
50 | - Uses HTTP for inter-node traffic, instead of an established peer-to-peer
51 | protocol. (§7.1.3)
52 |
--------------------------------------------------------------------------------
/evaluations/eigenlayer.md:
--------------------------------------------------------------------------------
1 | Eigenlayer
2 | ==========
3 |
4 | 2024-05-29
5 |
6 | A review of the Eigenlayer and EIGEN token whitepapers, with some thoughts on
7 | how this could be applied to Codex.
8 |
9 | * [Eigenlayer whitepaper](https://docs.eigenlayer.xyz/assets/files/EigenLayer_WhitePaper-88c47923ca0319870c611decd6e562ad.pdf)
10 | * [EIGEN token whitepaper](https://docs.eigenlayer.xyz/assets/files/EIGEN_Token_Whitepaper-0df8e17b7efa052fd2a22e1ade9c6f69.pdf)
11 |
12 | Eigenlayer
13 | ----------
14 |
15 | The core idea of Eigenlayer is to reuse the collateral that is already staked on
16 | the Ethereum beacon chain for other protocols besides Ethereum. The collateral
17 | that Ethereum validators put up to ensure that they stick to the Ethereum
18 | consensus protocol is also used to ensure that they follow the rules of other
19 | protocols. In exchange, they are rewarded with additional fees from these
20 | protocols.
21 |
22 | Eigenlayer has an open marketplace in which protocols advertise themselves, and
23 | validators can opt in to help secure these protocols by restaking their
24 | collateral (§2).
25 |
26 | The main mechanism is to have the Ethereum validators set the withdrawal
27 | address of their collateral to an Eigenlayer smart contract (§2.1). When a
28 | validator has behaved nicely on the Ethereum network and wants to exit, their
29 | stake is passed to this Eigenlayer contract. The contract then performs
30 | additional checks to ensure that the validator wasn't slashed by any of the
31 | other protocols that the validator participated in, before releasing the stake
32 | (§3.1).
33 |
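   | To make the mechanism concrete, here is a minimal Python sketch of the
   | withdrawal flow described above. All names are hypothetical illustrations,
   | not Eigenlayer's actual contract interface.
   |
   | ```python
   | # Minimal sketch of the restaking withdrawal flow (§2.1, §3.1).
   | # Hypothetical names; Eigenlayer's real contracts differ.
   | from dataclasses import dataclass, field
   |
   | @dataclass
   | class RestakedValidator:
   |     stake: float                                  # ETH paid out on beacon chain exit
   |     slashed_by: set = field(default_factory=set)  # protocols that slashed this validator
   |
   | class RestakingContract:
   |     """Set as a validator's withdrawal address, so it receives the stake on exit."""
   |     def __init__(self):
   |         self.validators: dict[str, RestakedValidator] = {}
   |
   |     def on_beacon_chain_exit(self, validator: str, stake: float) -> None:
   |         # The beacon chain pays the (possibly already reduced) stake to this contract.
   |         self.validators[validator] = RestakedValidator(stake)
   |
   |     def slash(self, validator: str, protocol: str) -> None:
   |         # An opted-in protocol reports misbehaviour before withdrawal.
   |         self.validators[validator].slashed_by.add(protocol)
   |
   |     def withdraw(self, validator: str) -> float:
   |         # The stake is only released if no opted-in protocol slashed the validator.
   |         v = self.validators[validator]
   |         return 0.0 if v.slashed_by else v.stake
   | ```
   |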
34 | ### Incentives and centralization ###
35 |
36 | This raises the question: what happens to the incentive for the validator to
37 | behave nicely if their collateral has already been forfeited in Eigenlayer? And
38 | what would the consequences for the Ethereum beacon chain be if this were to
39 | happen to a large number of validators simultaneously? In the whitepaper two
40 | mitigations are mentioned: security audits (§3.4.2) and the ability to veto
41 | slashings (§3.5). Before a protocol is allowed onto the marketplace it needs to
42 | be verified through a security audit. And if the protocol were to inadvertently
43 | slash a large group of validators (e.g. through a bug in its smart contract),
44 | then there is a governing group that can veto these slashings. The downside to
45 | these mitigations is that they are both centralizing forces, because there is
46 | now a small group of people that decide whether a protocol is admitted to the
47 | marketplace, and a small group of people that can veto slashings.
48 |
49 | Eigenlayer claims to incentivize decentralization by allowing protocols to
50 | specify that they only want to make use of stake that is put up by home stakers
51 | (§4.4). However, given the permissionless nature of Ethereum, it is not possible
52 | to distinguish home stakers from a large centralized player with many
53 | validators, each having its own address.
54 |
55 | A further centralizing force in Eigenlayer is its license, which is not an open
56 | source license. This means that only the Eigenlayer developers can change the
57 | Eigenlayer code, and forking is not allowed.
58 |
59 | ### Potential use cases for Codex ###
60 |
61 | There are a couple of places in Codex that might benefit from restaking. We
62 | could allow Ethereum validators to use (a part of) their stake on the beacon
63 | chain for filling slots in storage contracts. There are a few downsides to this.
64 | It becomes rather difficult to reason about how high the stake for a storage
65 | contract should be when the stake behind a storage provider's promise can
66 | be shared with a number of other protocols (§3.4.1). Codex uses part of the
67 | slashed stake to incentivize repair, which would not be possible with restaking,
68 | because the stake only becomes available in Eigenlayer after the validator stops
69 | validating the beacon chain, and withdraws its collateral. That is, if the stake
70 | hasn't already been slashed by the beacon chain. Also, the hardware requirements
71 | for running an Ethereum validator are sufficiently different from the
72 | requirements of running a Codex provider that we do not expect there to be many
73 | people that run both.
74 |
75 | We might also use restaking to keep proof aggregators honest (§4.1, point 6).
76 | Preferably using a combination of staked Codex tokens and restaked ETH (§4.4),
77 | so that we increase the utility of the Codex token while also guarding against
78 | value loss of the token.
79 |
80 | And finally, we might use restaking to keep participants in a nano payments
81 | scheme honest (§4.1, points 2 and 8). We intend to add bandwidth payments to
82 | Codex, and for this we need nano payments, for which a blockchain is too slow.
83 | Ideally we'd have a lighter form of consensus for these payments. The validators
84 | of this lighter form of consensus could be kept honest by restaking.
85 |
86 | EIGEN Token
87 | -----------
88 |
89 | The EIGEN token is a separate project only marginally related to Eigenlayer. It
90 | allows staking to disincentivize subjective faults. In contrast to objective
91 | faults, subjective faults cannot be coded into a smart contract, but need to be
92 | adjudicated by people (§1.2).
93 |
94 | This is implemented through a forkable token (§2.3.1) called EIGEN. Every time a
95 | subjective decision needs to be made, someone can create a new EIGEN' token, and
96 | start using that instead of the old token. If everyone agrees, then the new
97 | token will gain in perceived value, while the perceived value of the old token
98 | approaches 0.
99 |
100 | In the whitepaper a protocol is described to ensure that forking the token
101 | doesn't impact long-term holders of the token (§2.7).
102 |
103 | A centralizing force in the design is the security council, a small group of
104 | people in charge of freezing and/or upgrading the smart contracts (§2.7.4).
105 |
106 | Conclusion
107 | ----------
108 |
109 | Given the centralizing aspects of Eigenlayer, it is probably not a good
110 | foundation on which to build parts of the Codex protocol. The idea of restaking
111 | is an interesting one, but it comes with risks of its own that are not easy to quantify.
112 |
113 | The EIGEN token is probably not interesting for Codex, because we've gone to
114 | great lengths to ensure that bad behaviour on the network is either objectively
115 | punishable or economically disincentivized, negating the need for human
116 | adjudication.
117 |
--------------------------------------------------------------------------------
/evaluations/filecoin.md:
--------------------------------------------------------------------------------
1 | An evaluation of the Filecoin whitepaper
2 | ========================================
3 |
4 | 2020-12-08 Mark Spanbroek
5 |
6 | https://filecoin.io/filecoin.pdf
7 |
8 | Goal of this evaluation is to find things to adopt or avoid while designing
9 | Dagger. It is not meant to be a criticism of Filecoin.
10 |
11 | #### Pros:
12 |
13 | + Clients do not need to actively monitor hosts. Once a deal has been agreed
14 | upon, the network checks proofs of storage.
15 | + The network actively tries to repair storage faults by introducing new
16 | orders in the storage market. (§4.3.4).
17 | + Integrity is achieved because files are addressed using their content
18 | hash (§4.4).
19 | + Marketplaces are explicitly designed and specified (§5).
20 | + Micropayments via payment channels (§5.3.1).
21 | + Integration with other blockchain systems such as Ethereum (§7.2) are being
22 | worked on.
23 |
24 | #### Cons:
25 |
26 | - Filecoin requires its own very specific blockchain, which influences a lot
27 | of its design. There is tight coupling between the blockchain, storage
28 | accounting, proofs and markets.
29 | - Proof of spacetime is much more complex than simple challenges, and only
30 | required to make the blockchain work (§3.3, §6.2).
31 | - A miner's influence is proportional to the amount of storage they're
32 | providing (§1.2), which is an incentive to become big. This could lead to
33 | the same centralization issues that plague Bitcoin.
34 | - Incentives are geared towards making the particulars of the Filecoin
35 | design work, instead of directly aligned with users' interest. For instance,
36 | there are incentives for storage and retrieval, but it seems that a miner
37 | would be able to make money by only storing data, and never offering it for
38 | retrieval. Also, the incentive for a miner to store multiple independent
39 | copies does not mean protection against loss if they're all located on the
40 | same failing disk.
41 | - The blockchain contains a complete allocation table of all things that are
42 | stored in the network (§4.2), which raises questions about scalability.
43 | - Zero cash entry (such as in Swarm) doesn't seem possible.
44 | - Consecutive micropayments are presented as a solution for the trust problems
45 | while retrieving (§5.3.1), which doesn't entirely mitigate withholding
46 | attacks.
47 | - The addition of smart contracts (§7.1) feels like an unnecessary
48 | complication.
49 |
--------------------------------------------------------------------------------
/evaluations/ipfs.md:
--------------------------------------------------------------------------------
1 | An evaluation of the IPFS paper
2 | ===============================
3 |
4 | 2021-01-07 Dagger Team
5 |
6 | https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf
7 |
8 | Goal of this evaluation is to find things to adopt or avoid while designing
9 | Dagger. It is not meant to be a criticism of IPFS.
10 |
11 | #### Pros:
12 |
13 | + IPFS is designed by simplifying, evolving, and connecting proven techniques
14 | (§3)
15 | + Consists of a stack of separately described sub-protocols (§3)
16 | + Uses Coral DSHT to favor data that is nearby, reducing latency of lookup
17 | (§2.1.2)
18 | + Uses proof-of-work in S/Kademlia to discourage Sybil attacks (§2.1.3)
19 | + Favors self-describing values such as multihash (§3.1) and multiaddr (§3.2.1)
20 | + BitSwap protocol for exchanging blocks supports multiple strategies (§3.4.2),
21 | so it should be relatively easy to add a micropayment strategy.
22 | + Uses content addressing (§3.5)
23 | + The Merkle DAG is simple, yet allows constructing filesystems,
24 | key-value stores, databases, messaging systems, etc. (§3.5)
25 |
26 | #### Cons:
27 |
28 | - Kademlia prefers long-lived nodes (§2.1.1), which is not ideal for mobile
29 | environments (although it's unclear whether there are any better alternatives)
30 | - The default BitSwap strategy falls just short of introducing a currency with
31 | micro payments, necessitating additional work for nodes to find blocks to
32 | barter with (§3.4)
33 | - Object pinning (§3.5.3) inevitably leads to centralized gateways to IPFS, such
34 | as Infura and Pinata
35 | - IPFS uses variable size blocks instead of fixed-size chunks (§3.6), which
36 | might make it a bit harder to add incentives and pricing
37 | - Supporting version control directly in IPFS feels like an unnecessary
38 | complication (§3.6.5)
39 |
--------------------------------------------------------------------------------
/evaluations/rollups.md:
--------------------------------------------------------------------------------
1 | Ethereum L2 Rollups
2 | ===================
3 |
4 | A quick and dirty overview of existing rollups and their suitability for hosting
5 | the Codex marketplace smart contracts. To interact with these contracts, the
6 | participants in the network create blockchain transactions for purchasing and
7 | selling storage, and for providing storage proofs that are then checked
8 | on-chain. It would be too costly for these transactions to happen on Ethereum
9 | main net, which is why this document explores L2 rollups as an alternative.
10 |
11 | Main sources used:
12 | - individual websites of the rollup projects
13 | - https://l2beat.com
14 | - https://blog.kroma.network/l2-scaling-landscape-fees-and-max-tps-fe6087d3f690
15 |
16 | Requirements
17 | ------------
18 |
19 | For the storage marketplace to work, we have the following requirements for a
20 | rollup:
21 | 1. Low gas costs; if gas is too costly then the business case of storage
22 | providers disappears
23 | 2. EVM compatibility; this shortens our time to market because we already have
24 | Solidity contracts
25 | 3. Support for BN254 elliptic curve precompiles (ecAdd, ecMul, ecPairing) for
26 | the proof system
27 | 4. High throughput; our current proof system that checks all proofs separately
28 | on chain requires a large number of transactions per second
29 | 5. Censorship resistance; an L2 operator should not have the power to exclude
30 | transactions from certain people or apps
31 |
32 | Note that low latency is not a requirement; it's ok to have latency equivalent
33 | to L1, which is in the order of tens of seconds.
34 |
35 | Main flavours
36 | -------------
37 |
38 | Although there are many L2 rollups, there is a limited number of technical
39 | stacks that underlie them.
40 |
41 | There is the family of purely optimistic rollups, which rely on fraud proofs
42 | to ensure that they are kept honest:
43 | - Arbitrum
44 | - Optimism / OP Stack
45 | - Fuel
46 |
47 | And there are the rollups that rely on zero-knowledge proofs to prove that they
48 | act honestly:
49 | - Polygon zkEVM / CDK
50 | - Linea
51 | - zkSync
52 | - Scroll
53 |
54 | And there's Taiko, which uses a combination of zero-knowledge proofs and fraud
55 | proofs to keep the network honest:
56 | - Taiko
57 |
58 | Gas prices
59 | ----------
60 |
61 | A rough approximation of average gas prices for submitting a Codex storage proof
62 | for each rollup:
63 |
64 | | Rollup | Average proof price | Potential profit |
65 | | ------------------- | ------------------ | ---------------- |
66 | | Mantle | $0.0000062723 | $2.58 |
67 | | Boba network | $0.0016726250 | -$2.54 |
68 | | Immutable zkEVM | $0.0073595500 | -$20.01 |
69 | | Arbitrum | $0.0083631250 | -$23.09 |
70 | | zkSync Era | $0.0209078125 | -$61.63 |
71 | | Base | $0.0418156250 | -$125.86 |
72 | | Optimism | $0.0836312500 | -$254.32 |
73 | | Polygon zkEVM | $0.1254468750 | -$382.77 |
74 | | Blast | $0.1672625000 | -$511.23 |
75 | | Scroll | $0.2090781250 | -$639.69 |
76 | | Taiko | $0.2508937500 | -$768.15 |
77 | | Metis | $0.4014300000 | -$1,230.59 |
78 | | Linea | $0.8363125000 | -$2,566.55 |
79 |
80 | This table was created by eyeballing the gas cost and token price graphs for
81 | each L2, and [calculating the USD costs](rollups.ods) for a proof from that. We
82 | did not include rollups that are not EVM compatible.
83 |
84 | Potential profit (per month per TB) is calculated by assuming operational costs
85 | of $1.40 and revenue of $4.00 per TB per month, an average slot size of 10 GB,
86 | and an average of 1 proof per slot per day.
87 |
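   | For reference, the arithmetic behind the profit column can be reproduced in a
   | few lines of Python (a sketch under the assumptions above; the 1 TB = 1024 GB
   | convention is inferred from the published figures):
   |
   | ```python
   | # Reproduces the 'Potential profit' column from the stated assumptions.
   | REVENUE_PER_TB_MONTH = 4.00   # $ per TB per month
   | OPEX_PER_TB_MONTH = 1.40      # $ per TB per month
   | SLOT_SIZE_GB = 10
   | DAYS_PER_MONTH = 30
   |
   | def monthly_profit(proof_price_usd: float) -> float:
   |     slots_per_tb = 1024 / SLOT_SIZE_GB                  # 102.4 slots per TB
   |     proofs_per_month = slots_per_tb * DAYS_PER_MONTH    # 1 proof/slot/day
   |     proof_costs = proofs_per_month * proof_price_usd
   |     return REVENUE_PER_TB_MONTH - OPEX_PER_TB_MONTH - proof_costs
   |
   | print(round(monthly_profit(0.0000062723), 2))  # Mantle:   2.58
   | print(round(monthly_profit(0.0836312500), 2))  # Optimism: -254.32
   | ```
   |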
88 | EVM compatibility
89 | -----------------
90 |
91 | This shows which rollups are EVM compatible, and whether they support the BN254
92 | elliptic curve precompiles that we require for verification of our storage
93 | proofs (ecAdd, ecMul, ecPairing).
94 |
95 | | Rollup | EVM compatible | Elliptic Curve operations |
96 | | --------------------- | -------------- | ------------------------- |
97 | | Arbitrum | Yes | Yes |
98 | | Base | Yes | Yes |
99 | | Blast | Yes | Yes |
100 | | Boba network | Yes | Yes |
101 | | Immutable zkEVM | Yes | Yes |
102 | | Linea | Yes | Yes |
103 | | Mantle | Yes | Yes |
104 | | Metis | Yes | Yes |
105 | | Optimism | Yes | Yes |
106 | | Polygon zkEVM | Yes | Yes |
107 | | Scroll | Yes | Yes |
108 | | Taiko | Yes | Yes |
109 | | zkSync Era | Yes | No |
110 | | Fuel L2 V1 | No | N/A |
111 | | Fuel Rollup OS | No | N/A |
112 | | Immutable X | No | N/A |
113 | | Polygon Miden | No | N/A |
114 | | Starknet | No | N/A |
115 | | zkSync lite | No | N/A |
116 |
117 |
118 | Throughput
119 | ----------
120 |
121 | A rough approximation of the maximum number of transactions that a rollup can
122 | handle, and the maximum size of the storage network that it might support:
123 |
124 | | Rollup | Maximum TPS | Maximum storage |
125 | | --------------------- | ----------- | --------------- |
126 | | zkSync Era | 750 | 1236 PB |
127 | | Starknet | 484 | 798 PB |
128 | | Optimism | 455 | 750 PB |
129 | | Base | 455 | 733 PB |
130 | | Mantle | 400 | 659 PB |
131 | | Metis | 357 | 588 PB |
132 | | Polygon zkEVM | 237 | 391 PB |
133 | | Arbitrum | 226 | 372 PB |
134 | | Boba network | 205 | 338 PB |
135 | | Scroll | 50 | 82 PB |
136 | | Taiko | 33 | 54 PB |
137 | | Blast | ? | ? |
138 | | Immutable zkEVM | ? | ? |
139 | | Linea | ? | ? |
140 | | Fuel L2 V1 | ? | ? |
141 | | Fuel Rollup OS | ? | ? |
142 | | Immutable X | ? | ? |
143 | | Polygon Miden | ? | ? |
144 | | zkSync lite | ? | ? |
145 |
146 | Maximum size of the storage network is [calculated](rollups.ods) assuming an
147 | average 1 proof per 24 hours per slot, average slot size 10 GB, and average
148 | erasure coding rate of 1/2. In practice the calculated maximum storage is going
149 | to be less, because we can't use up the entirety of the rollup capacity.
150 |
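   | The 'Maximum storage' column can be reconstructed as follows (a sketch; the
   | final factor of 2 is reverse-engineered from the published figures and
   | presumably stems from the stated erasure coding rate of 1/2):
   |
   | ```python
   | # Reproduces the 'Maximum storage' column from the stated assumptions.
   | SECONDS_PER_DAY = 86_400
   | SLOT_SIZE_GIB = 10
   | ERASURE_FACTOR = 2            # matches the published numbers
   |
   | def max_storage_pb(max_tps: float) -> float:
   |     proofs_per_day = max_tps * SECONDS_PER_DAY      # 1 proof per slot per day
   |     raw_gib = proofs_per_day * SLOT_SIZE_GIB
   |     return raw_gib / 1024**2 * ERASURE_FACTOR       # GiB -> PiB
   |
   | print(round(max_storage_pb(750)))   # zkSync Era: 1236
   | print(round(max_storage_pb(50)))    # Scroll:     82
   | ```
   |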
151 | Maximum TPS figures are taken from an [overview document by
152 | Kroma](https://blog.kroma.network/l2-scaling-landscape-fees-and-max-tps-fe6087d3f690)
153 |
154 | Censorship resistance
155 | ---------------------
156 |
157 | Censorship resistance can be achieved by having a decentralized architecture,
158 | where anyone is allowed to propose blocks and there are no admin rights that
159 | allow a rollup operator to change the rules in their favour.
160 |
161 | Only Fuel L2 V1 has all of these properties; the others don't. And because Fuel
162 | L2 V1 is a payment network without smart contracts, it is not suitable for the
163 | Codex marketplace. This means that at this moment there is no censorship
164 | resistant rollup that can host the Codex marketplace.
165 |
166 | Taiko is one of the few rollups that has a decentralized architecture, and it's
167 | committed to becoming permissionless. However, at the moment it is not.
168 |
169 | | Rollup | Decentralized | Permissionless | Adminless |
170 | | --------------------- | ------------- | --------------- | ------------ |
171 | | Fuel L2 V1 | Yes | Yes | Yes |
172 | | Metis | Yes | No | No |
173 | | Taiko | Yes | No | No |
174 | | Arbitrum | No | N/A | N/A |
175 | | Base | No | N/A | N/A |
176 | | Blast | No | N/A | N/A |
177 | | Boba network | No | N/A | N/A |
178 | | Fuel Rollup OS | No | N/A | N/A |
179 | | Immutable X | No | N/A | N/A |
180 | | Immutable zkEVM | No | N/A | N/A |
181 | | Linea | No | N/A | N/A |
182 | | Mantle | No | N/A | N/A |
183 | | Optimism | No | N/A | N/A |
184 | | Polygon zkEVM | No | N/A | N/A |
185 | | Polygon Miden | No | N/A | N/A |
186 | | Scroll | No | N/A | N/A |
187 | | Starknet | No | N/A | N/A |
188 | | zkSync lite | No | N/A | N/A |
189 | | zkSync Era | No | N/A | N/A |
190 |
191 | Conclusion
192 | ----------
193 |
194 | There seems to be no rollup that matches all the requirements that we listed in
195 | the beginning of the document. The most pressing problem is that only Mantle
196 | seems to be cheap enough to allow storage providers to turn a profit, given the
197 | assumptions of an average 10 GB slot size and 1 proof per 24 hours. It is
198 | unclear whether these low prices are sustainable in the long run. If we want to
199 | have more choice on where to deploy, then we either need to reduce the number of
200 | on-chain proofs in Codex drastically, or we need to find a way to reduce rollup
201 | transaction costs.
202 |
203 | Luckily we're already working on reducing the number of proofs by introducing
204 | proof aggregation, but the analysis in this document shows that we might not be
205 | able to launch a storage network without it. Reducing the number of proofs also
206 | ensures that the network can grow to a larger total size.
207 |
208 | When we look at reducing the transaction costs, the best thing to focus on
209 | is getting rid of the need to post transaction data to L1 in blobs. This is by far
210 | the most expensive part of running a rollup, and this is most likely also why
211 | Mantle is the only rollup in this overview that is cheap enough; it uses EigenDA
212 | instead of posting to L1. In this respect, it might also be interesting to look
213 | at Arbitrum AnyTrust, which has a similar design. We could also consider
214 | creating a fork of an existing rollup and using Codex as the DA layer. Some of the
215 | new modular architectures for creating rollups, such as
216 | [Espresso](https://www.espressosys.com/), [Astria](https://www.astria.org/),
217 | [Radius](https://www.theradius.xyz/), [Altlayer](https://www.altlayer.io/) and
218 | [NodeKit](https://www.nodekit.xyz/) could also make it easier to experiment with
219 | different rollup designs.
220 |
--------------------------------------------------------------------------------
/evaluations/rollups.ods:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codex-storage/codex-research/4a5ddf0e5271bdb49fe6db18e2a2d4268d37c07a/evaluations/rollups.ods
--------------------------------------------------------------------------------
/evaluations/sia.md:
--------------------------------------------------------------------------------
1 | An evaluation of the Sia whitepaper
2 | ===================================
3 |
4 | 2020-12-07 Mark Spanbroek
5 |
6 | https://sia.tech/sia.pdf
7 |
8 | Goal of this evaluation is to find things to adopt or avoid while designing
9 | Dagger. It is not meant to be a criticism of Sia.
10 |
11 | #### Pros:
12 |
13 | + Clients do not need to actively monitor hosts (§1). Once a contract has been
14 | agreed upon, the host earns/loses coins based on proofs of storage that the
15 | network can check.
16 | + Denial of service attacks can be mitigated by burning funds associated with
17 | missed proofs (§4).
18 | + Proof of storage is simple; provide a random piece of the file, and the
19 | corresponding Merkle proof (§5.1); see the sketch after this list.
20 | + Promotes erasure codes to safeguard against data loss (§7.2).
21 | + Suggests to use payment channels for micro-payments (§7.3).
22 | + The basic reputation system is protected against Sybil attacks (§7.4).
23 |
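   | As an illustration of how simple such a proof check is, here is a
   | self-contained Python sketch of a Merkle storage proof: reveal one file
   | segment plus the hashes along the path to the root. This is illustrative
   | only, not Sia's exact hashing scheme.
   |
   | ```python
   | # Verify possession of one file segment against a Merkle root (cf. §5.1).
   | import hashlib
   |
   | def h(data: bytes) -> bytes:
   |     return hashlib.sha256(data).digest()
   |
   | def merkle_root(leaves: list[bytes]) -> bytes:
   |     level = [h(leaf) for leaf in leaves]
   |     while len(level) > 1:
   |         if len(level) % 2:                 # duplicate last node on odd levels
   |             level.append(level[-1])
   |         level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
   |     return level[0]
   |
   | def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
   |     proof, level = [], [h(leaf) for leaf in leaves]
   |     while len(level) > 1:
   |         if len(level) % 2:
   |             level.append(level[-1])
   |         proof.append(level[index ^ 1])     # sibling of the current node
   |         level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
   |         index //= 2
   |     return proof
   |
   | def verify(root: bytes, segment: bytes, index: int, proof: list[bytes]) -> bool:
   |     node = h(segment)
   |     for sibling in proof:
   |         node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
   |         index //= 2
   |     return node == root
   |
   | segments = [b"seg0", b"seg1", b"seg2", b"seg3"]
   | assert verify(merkle_root(segments), segments[2], 2, merkle_proof(segments, 2))
   | ```
   |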
24 | #### Cons:
25 |
26 | - Sia has its own blockchain (§1), which makes some attacks more likely
27 | (§5.2, §5.3). This can be mitigated by adopting a widely used, general purpose
28 | blockchain such as Ethereum.
29 | - Requires a multi-signature scheme (§2).
30 | - The proof-of-storage algorithm requires that hosts store the entire file (§4),
31 | instead of a few chunks.
32 | - Contracts can be edited (§4). This feels like an unnecessary complication of
33 | the protocol.
34 | - Randomness for the storage proofs comes from the latest block hash (§5.1).
35 | This can be manipulated, especially when using a specialized blockchain for
36 | storage.
37 | - There is an arbitrary data field that might be used for advertisements in a
38 | storage marketplace (§6). This feels like a very restrictive environment for a
39 | marketplace, and an unnecessary complication for the underlying blockchain.
40 | - It is suggested that clients use erasure coding before encryption (§7.2). If
41 | this were reversed (first encryption, then erasure coding) then this would open
42 | up scenarios for caching and re-hosting by those who do not possess the
43 | decryption key.
44 | - Consecutive micropayments are presented as a solution for the trust problems
45 | while downloading (§7.3). This assumes that the whole file, or a large part of
46 | it, is stored on a single host. It also doesn't entirely mitigate withholding
47 | attacks.
48 | - The basic reputation system favors hosts that have already earned or bought
49 | coins (§7.4). It is also unclear how the reputation system discourages abuse.
50 | - Governance seems fairly centralized, with most funds and proceeds going to a
51 | single company (§8).
52 |
--------------------------------------------------------------------------------
/evaluations/sidechains.md:
--------------------------------------------------------------------------------
1 | Side chains
2 | ===========
3 |
4 | This document looks at the economics of running the Codex marketplace contracts
5 | on an Ethereum side chain. Both existing side chains and running a dedicated
6 | sidechain for Codex are considered. Existing Ethereum [rollups][1] seem to be
7 | too expensive by about a factor of 100 for our current storage proof scheme
8 | (without proof aggregation). We'd like to find out if using a side chain could
9 | sufficiently lower the transaction costs.
10 |
11 | [1]: ../evaluations/rollups.md
12 |
13 |
14 | Existing side chains
15 | --------------------
16 |
17 | First we'll take a look at Polygon PoS and Gnosis chain to determine the average
18 | gas costs for submitting a storage proof on these chains. Then we'll estimate
19 | what the operational costs are of running these chains and compare that against
20 | their revenue in gas fees. This is done to see whether there is any room for
21 | reducing prices should we want to run a dedicated side chain for Codex.
22 |
23 | ### Gas prices ###
24 |
25 | A rough approximation of average gas costs for submitting a Codex storage proof
26 | for these chains:
27 |
28 | | Side chain | Average proof costs | Potential profit |
29 | | ------------------- | ------------------ | ---------------- |
30 | | Polygon PoS | $0.0070250250 | -$18.98 |
31 | | Gnosis chain | $0.0050178750 | -$12.81 |
32 |
33 | This table was created by eyeballing the gas cost and token price graphs for
34 | each chain over the past 6 months, and [calculating the USD
35 | costs](sidechains.ods) for a proof from that.
36 |
37 | Potential profit (per month per TB) is calculated by assuming operational costs
38 | of $1.40 and revenue of $4.00 per TB per month, an average slot size of 10 GB,
39 | and an average of 1 proof per slot per day.
40 |
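   | These assumptions also yield the break-even proof price directly (a sketch;
   | 1 TB is taken as 1024 GB to match the table):
   |
   | ```python
   | # Break-even proof price: revenue - opex - proofs_per_month * price = 0
   | margin_per_tb_month = 4.00 - 1.40        # $2.60 revenue minus opex
   | proofs_per_tb_month = (1024 / 10) * 30   # 102.4 slots, 1 proof/slot/day
   | breakeven = margin_per_tb_month / proofs_per_tb_month
   | print(f"${breakeven:.7f}")               # ~$0.0008464 per proof
   | # Polygon PoS ($0.0070) and Gnosis ($0.0050) exceed this by roughly 6-8x,
   | # which is why the potential profit column above is negative.
   | ```
   |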
41 | ### Throughput ###
42 |
43 | A rough approximation of the maximum number of transactions that a chain can
44 | handle, and the maximum size of the storage network that it might support:
45 |
46 | | Side chain | Maximum TPS | Maximum storage |
47 | | --------------------- | ----------- | --------------- |
48 | | Polygon PoS | 255 | 420 PB |
49 | | Gnosis chain | 156 | 257 PB |
50 |
51 | Maximum size of the storage network is [calculated](sidechains.ods) assuming an
52 | average 1 proof per 24 hours per slot, average slot size 10 GB, and average
53 | erasure coding rate of 1/2.
54 |
55 | ### Decentralization ###
56 |
57 | Polygon PoS has substantially fewer validators than Gnosis chain:
58 |
59 | | Side chain | Number of validators |
60 | | --------------------- | -------------------- |
61 | | Polygon PoS | 100 |
62 | | Gnosis chain | 200 000 |
63 |
64 | ### Network costs ###
65 |
66 | To get an idea of the actual costs for running a chain, we estimate the hardware
67 | costs needed to keep the network running. We take the cost of running a single
68 | validator and multiply this by the number of validators. This should give us an
69 | idea of how much of the gas price is used to cover operational costs, and how
70 | much is profit. These are [back of the envelope calculations](sidechains.ods) using
71 | data from the past 6 months to get an idea of the order of magnitude, not meant
72 | to be very accurate:
73 |
74 | | Side chain | hardware costs | network fees | cost / revenue ratio |
75 | | ------------ | ------------------ | -----------------| --------------------- |
76 | | Polygon PoS | $28 000 / month | $840 000 / month | 3% |
77 | | Gnosis chain | $4 000 000 / month | $15 000 / month | 26667% |
78 |
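   | The back-of-the-envelope arithmetic behind this table looks as follows (a
   | sketch; the per-validator costs of $280 and $20 per month are implied by the
   | table rather than independently sourced):
   |
   | ```python
   | # Reproduces the cost / revenue ratio column above.
   | def cost_revenue_ratio(validators: int, cost_per_validator_usd: float,
   |                        fees_per_month_usd: float) -> float:
   |     hardware = validators * cost_per_validator_usd
   |     return hardware / fees_per_month_usd
   |
   | print(f"{cost_revenue_ratio(100, 280, 840_000):.0%}")    # Polygon PoS: 3%
   | print(f"{cost_revenue_ratio(200_000, 20, 15_000):.0%}")  # Gnosis: 26667%
   | ```
   |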
79 | While Polygon PoS seems to have a healthy profit margin, the validators of
80 | the Gnosis chain are spending about 250x more on hardware costs than is covered
81 | by the network fees. This is mostly due to the large number of validators, and
82 | seems to be compensated for by reserving tokens and using them for paying out
83 | [validator rewards][2]. Also, Polygon PoS has a utilization of about 90%,
84 | whereas Gnosis chain has a utilization of about 25%.
85 |
86 | [2]: https://forum.gnosis.io/t/gno-utility-and-value-proposition/2344#current-gno-distribution-and-gno-burn-5
87 |
88 |
89 | A custom side chain for Codex
90 | -----------------------------
91 |
92 | Next, we'll look at ways in which we could reduce gas costs by deploying a
93 | dedicated side chain for Codex.
94 |
95 | ### EVM opcode pricing ###
96 |
97 | Ethereum transactions consist of EVM operations. Each operation is priced in an
98 | amount of gas. Some operations [are more expensive than others][3], mainly
99 | because they require more resources (cpu, storage). Gas costs are also
100 | specifically engineered to withstand DoS attacks on validators.
101 |
102 | Tweaking the gas prices of EVM opcodes does not seem to be the most viable path
103 | to lowering transaction costs, because opcode prices only determine how
104 | expensive operations are relative to one another, not the actual price. It is
105 | also difficult to oversee the security risks of such changes.
106 |
107 | [3]: https://notes.ethereum.org/@poemm/evm384-update5#Background-on-EVM-Gas-Costs
108 |
109 | ### Gas pricing ###
110 |
111 | The biggest factor that determines the actual costs of transactions is the gas
112 | price. The transaction cost is [determined according to the following
113 | formula][4] as specified by [EIP-1559][5]:
114 |
115 | `fee = units of gas used * (base fee + priority fee)`
116 |
117 | The base fee is calculated based on how full the latest blocks are. If they are
118 | above the target block size of 15 million gas, then the base fee increases. If
119 | they are below the target block size, then the base fee decreases. The base fee
120 | is burned when the transaction is included. The priority fee is set by the
121 | transaction sender, and goes to the validators. It is a mechanism for validators
122 | to prioritize transactions. It also acts as an incentive for validators to not
123 | produce empty blocks.
124 |
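   | A worked example of this formula, together with the base fee update rule from
   | EIP-1559 (the base fee moves by at most 1/8 = 12.5% per block):
   |
   | ```python
   | # Worked example of the EIP-1559 fee formula quoted above.
   | GWEI = 1e-9  # ETH per gwei
   |
   | def transaction_fee(gas_used: int, base_fee_gwei: float,
   |                     priority_fee_gwei: float) -> float:
   |     """fee = units of gas used * (base fee + priority fee), in ETH."""
   |     return gas_used * (base_fee_gwei + priority_fee_gwei) * GWEI
   |
   | def next_base_fee(base_fee_gwei: float, parent_gas_used: int,
   |                   target: int = 15_000_000) -> float:
   |     # Up when blocks exceed the 15M gas target, down when below,
   |     # by at most 1/8 per block.
   |     return base_fee_gwei * (1 + (parent_gas_used - target) / target / 8)
   |
   | # A simple transfer (21000 gas) at a 20 gwei base fee and a 2 gwei tip:
   | print(transaction_fee(21_000, 20, 2))   # ~0.000462 ETH
   | print(next_base_fee(20, 30_000_000))    # full block  -> 22.5 gwei
   | print(next_base_fee(20, 0))             # empty block -> 17.5 gwei
   | ```
   |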
125 | Both the base fee and the priority fee go up when there is more demand
126 | (submitted transactions) than there is supply (maximum transactions per second).
127 | This is the main reason why transactions are as expensive as they are:
128 |
129 | "No transaction fee mechanism, EIP-1559 or otherwise, is likely to substantially
130 | decrease average transaction fees; persistently high transaction fees is a
131 | scalability problem, not a mechanism design problem" -- [Tim Roughgarden][6]
132 |
133 | [4]: https://ethereum.org/en/developers/docs/gas/#how-are-gas-fees-calculated
134 | [5]: https://eips.ethereum.org/EIPS/eip-1559
135 | [6]: http://timroughgarden.org/papers/eip1559.pdf
136 |
137 |
138 | ### The scalability problem ###
139 |
140 | Ultimately, high transaction costs are a scalability issue. And unfortunately
141 | there are [no easy solutions][7] for increasing the amount of transactions that
142 | the chain can handle. For instance, just increasing the block size introduces
143 | several issues. Increasing block size increases the amount of resources that
144 | validators need (cpu, memory, storage, bandwidth). This means that it becomes
145 | more expensive to run a validator, which leads to a decrease in the number of
146 | validators, and an increase in centralization. It also increases block time,
147 | because there is more time needed to disseminate the block. And this actually
148 | decreases the capacity of the network in terms of transactions per second, which
149 | counters the positive effect that you get from increasing the block size.
150 |
151 | [7]: https://cryptonews.com/news/contrary-to-musk-s-idea-you-can-t-just-increase-block-size-s-10426.htm
152 |
153 | Conclusion
154 | ----------
155 |
156 | The gas prices on the existing side chains that we looked at are not low enough for
157 | storage providers to turn a profit, as long as we haven't implemented proof
158 | aggregation yet.
159 |
160 | From the cost analysis of Polygon PoS it seems feasible to launch a
161 | not-for-profit dedicated side chain for Codex that reduces transaction costs by
162 | about a factor of 10. This should be enough for storage providers to start
163 | making a very modest profit if they charge $4/TB/month. Polygon PoS achieves
164 | this by keeping a relatively low amount of validators, which is something to
165 | keep in mind when deploying a side chain for Codex. Also, as soon as there are
166 | more transactions than fit in the blocks and the chain is running at capacity,
167 | the gas price will go up.
168 |
169 | For the short term it seems viable to start with a dedicated side chain for
170 | Codex, while demand is still low. This gives us time to work on reducing
171 | the number of transactions, for instance by aggregating storage proofs. In the
172 | beginning the number of transactions won't be sufficient to cover the costs of
173 | running validators, so some sponsoring of validators will be required to
174 | bootstrap the chain.
175 |
176 | For the medium term we can consider having multiple side chains depending on
177 | demand. If demand is reaching capacity of the existing side chain(s) then
178 | another side chain is started. This ensures that none of the side chains runs at
179 | full capacity, keeping the prices low. Because each side chain can be bridged to
180 | main net, funds can be moved from one side chain to the other. The obvious
181 | downside of this is fragmentation of the marketplace. However, the reason to add
182 | a side chain is because demand is high, so each fragment should have a healthy
183 | marketplace. Also, the Codex peer-to-peer network would not be fragmented, only
184 | the marketplace, meaning that there is still a single content-addressable data
185 | space.
186 |
187 | For the long term we should probably move to a blockchain that supports a higher
188 | number of transactions at a lower cost than is currently available.
189 |
--------------------------------------------------------------------------------
/evaluations/sidechains.ods:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codex-storage/codex-research/4a5ddf0e5271bdb49fe6db18e2a2d4268d37c07a/evaluations/sidechains.ods
--------------------------------------------------------------------------------
/evaluations/statechannels/disputes.md:
--------------------------------------------------------------------------------
1 | State Channel Disputes
2 | ======================
3 |
4 | A problem with state channels is that participants have to remain "online"; they
5 | need to keep an eye on the latest state of the underlying blockchain and be able to
6 | respond to any disputes about the final state of the state channel. Ironically
7 | this problem stems from the mechanism that allows a state channel to be closed
8 | when a participant goes offline. Closing a channel unilaterally is allowed, but
9 | there is a period in which the other participant can dispute the final state of
10 | the channel. Therefore participants should be monitoring the blockchain so that
11 | they can respond during a dispute period.
12 |
13 | ### Pisa
14 |
15 | https://www.cs.cornell.edu/~iddo/pisa.pdf
16 |
17 | The PISA protocol enables a participant to outsource monitoring of the
18 | underlying blockchain to an untrusted watchtower. The main idea is that a hash
19 | of the latest state channel update is sent to the watchtower. The watchtower
20 | responds with a signed promise to use this information to settle any disputes
21 | that may arise. Should the watchtower fail to do so, then the signed promise can
22 | be used as proof of negligence and it will lose its substantial collateral.
23 |
24 | A potential problem with this scheme is that the collateral is shared among all
25 | state channels that the watchtower is monitoring, which could lead to bribing
26 | attacks.
27 |
28 | ### Brick
29 |
30 | https://arxiv.org/pdf/1905.11360.pdf
31 |
32 | The BRICK protocol provides an alternative to the dispute period based on
33 | byzantine consistent broadcast. Participants in a state channel assign a
34 | committee that is allowed to sign off on channel closing in case they are not
35 | able to do so themselves. Instead of waiting for a period of time before
36 | unilaterally closing the channel, with BRICK you wait for a threshold number of
37 | committee members to confirm the latest state of the channel. This is much
38 | faster.
39 |
40 | Each state channel update contains a sequence number, which is signed by the
41 | channel participants and sent to the committee members. For a channel to be
42 | closed unilaterally, the BRICK smart contract requires a signed state update,
43 | and signed sequence numbers provided by a majority of the committee. The highest
44 | submitted sequence number should match the submitted state update. Committee
45 | members that submit a lower sequence number lose the collateral that they
46 | provided when the channel was opened.
47 |
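   | A minimal sketch of this closing check (illustrative names, not the paper's
   | actual implementation):
   |
   | ```python
   | # Sketch of the BRICK unilateral-close condition described above.
   | from dataclasses import dataclass
   |
   | @dataclass(frozen=True)
   | class StateUpdate:
   |     sequence: int
   |     balances: tuple   # channel state, signed by both participants
   |
   | def can_close(update: StateUpdate, committee_sequences: dict[str, int],
   |               committee_size: int) -> bool:
   |     # A majority of committee members must have confirmed a sequence number...
   |     if len(committee_sequences) <= committee_size // 2:
   |         return False
   |     # ...and the highest confirmed number must match the submitted update,
   |     # otherwise the submitted state is stale.
   |     return max(committee_sequences.values()) == update.sequence
   |
   | update = StateUpdate(sequence=7, balances=(3, 2))
   | print(can_close(update, {"m1": 7, "m2": 7, "m3": 6}, committee_size=5))  # True
   | print(can_close(update, {"m1": 8, "m2": 8, "m3": 7}, committee_size=5))  # False
   | ```
   |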
48 | A potential problem with the implementation of BRICK as outlined in the paper is
49 | that the collateral scheme is vulnerable to sybil attacks; committee members can
50 | attempt to steal their own collateral by providing proof of their own
51 | wrongdoing.
52 |
53 | Unilateral closing is also rather heavy on blockchain transactions; each
54 | committee member has to separately perform a transaction on chain to supply
55 | their sequence number.
56 |
--------------------------------------------------------------------------------
/evaluations/statechannels/overview.md:
--------------------------------------------------------------------------------
1 | State Channels
2 | ==============
3 |
4 | State channels are a level 2 solution to enable fast and cheap transactions
5 | between parties, with trust anchored on a blockchain.
6 |
7 | We'll go through the evolution of state channels in somewhat chronological
8 | order, starting with the simplest form: uni-directional payment channels.
9 |
10 | Uni-directional payment channels
11 | --------------------------------
12 |
13 | Payments are one-to-one, and flow in one direction only. They are easy to
14 | understand and are the base upon which further enhancements are built.
15 |
16 | Flow:
17 |
18 | 1. Alice locks up an amount of coins (e.g. 1 Eth) in a smart contract
19 | on-chain. This opens up the payment channel. She's not able to touch the
20 | coins for a fixed amount of time. Bob is the only one able to withdraw at
21 | any time.
22 | 2. She then sends Bob payments off-chain, which basically amount to signed
23 | "Alice owes Bob x Eth" statements. The amount owed is strictly increasing.
24 | For instance, if Alice first pays Bob 0.2 Eth, and then 0.3 Eth, then Bob
25 | first receives a statement "Alice owes Bob 0.2 Eth", and then "Alice owes
26 | Bob 0.5 Eth".
27 | 3. Bob sends the latest statement from Alice to the smart contract, which pays
28 | Bob the total amount due. This closes the payment channel.
29 |
30 |
31 | ------------
32 | | Contract | <------ 1 ---- Alice
33 | | |
34 | | | | | |
35 | | | | | |
36 | | | 2 2 2
37 | | | | | |
38 | | | | | |
39 | | | v v v
40 | | |
41 | | | <------ 3 ---- Bob
42 | ------------
43 |
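   | The flow above can be sketched in a few lines of Python (signatures are
   | modelled as plain values here; a real contract would verify them):
   |
   | ```python
   | # Sketch of a uni-directional payment channel.
   | class Contract:
   |     def __init__(self, deposit: float):
   |         self.deposit = deposit   # step 1: Alice locks up coins on-chain
   |         self.closed = False
   |
   |     def close(self, statement: float) -> float:
   |         # Step 3: Bob submits Alice's latest statement and is paid out;
   |         # the remainder returns to Alice after the lock-up period.
   |         assert not self.closed and statement <= self.deposit
   |         self.closed = True
   |         return statement
   |
   | class Alice:
   |     def __init__(self):
   |         self.owed = 0.0
   |
   |     def pay(self, amount: float) -> float:
   |         # Step 2, off-chain: each payment produces a new signed statement
   |         # with a strictly increasing total, so Bob keeps only the latest.
   |         self.owed += amount
   |         return self.owed
   |
   | contract = Contract(deposit=1.0)   # 1 Eth
   | alice = Alice()
   | alice.pay(0.2)                     # "Alice owes Bob 0.2 Eth"
   | latest = alice.pay(0.3)            # "Alice owes Bob 0.5 Eth"
   | print(contract.close(latest))      # Bob receives 0.5 Eth
   | ```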
44 |
45 | Bi-directional payment channels
46 | -------------------------------
47 |
48 | Payments are one-to-one, and are allowed to flow in both directions.
49 |
50 | Flow:
51 |
52 | 1. Both Alice and Bob lock up an amount of coins to open the payment channel.
53 | 2. Alice and Bob send each other payments off-chain, whereby they sign the
54 | total amount owed for both parties. For instance, when Bob sends 0.3 Eth
55 | after Alice sent 0.2 Eth, he will sign the statement:
56 | "A->B: 0.2, B->A: 0.3". These statements have a strictly increasing
57 | version number.
58 | 3. At any time, Alice or Bob can use the latest signed statement and ask
59 | the smart contract to pay out the amounts due. This closes the payment
60 | channel. To ensure that Alice and Bob do not submit an old statement,
61 | there is a period in which the other party can provide a newer statement.
62 |
63 | Because of the contention period these channels take longer to close in case of
64 | a dispute. Also, both parties need to remain online and keep up with the latest
65 | state of the blockchain.
66 |
67 | ------------
68 | | Contract | <------ 1 ---- Alice ----
69 | | | |
70 | | | | ^ ^ |
71 | | | | | | |
72 | | | 2 2 2 |
73 | | | | | | |
74 | | | | | | |
75 | | | v | | |
76 | | | |
77 | | | <------ 3 ---- Bob |
78 | | | |
79 | | | <------ 3 ----------------
80 | ------------
81 |
82 | Payment channel networks
83 | ------------------------
84 |
85 | Opening up a payment channel for every person that you interact with is
86 | impractical because each channel needs to be opened and closed on-chain.
87 |
88 | Payment channel networks solve this problem by routing payments through
89 | intermediaries. If Alice wishes to pay David, she might route the payment
90 | through Bob and Carol. Hash-locks are used to ensure that a routed payment
91 | either succeeds or is rejected entirely. Intermediaries typically charge a fee
92 | for their efforts.
93 |
94 | Routing algorithms for payment channel networks are an active area of research.
95 | Each routing algorithm has its own drawbacks.
96 |
97 |
98 | Alice --> Bob --> Carol --> David
99 |
100 |
101 | State channels
102 | --------------
103 |
104 | Payment channels can be generalized to not just handle payments, but also state
105 | changes, to enable off-chain smart contracts. Instead of signing off on amounts
106 | owed, parties sign off on transactions to a smart contract. Upon closing of a
107 | state channel, only a single transaction is executed on the on-chain contract.
108 | In case of a dispute, a contention period is used to determine which transaction
109 | is the latest. This means that just like bi-directional payment channels there
110 | is a need to remain online.
111 |
112 | Virtual channels
113 | ----------------
114 |
115 | When routing payments over a payment channel network, all participants in the
116 | route are required to remain online and confirm all payments. Virtual channels
117 | alleviate this by involving intermediary nodes only for opening and closing
118 | a channel. They are built around the idea that state channels can host a smart
119 | contract for opening and closing a virtual channel.
120 |
121 | Existing solutions
122 | ------------------
123 |
124 | | Name | Bi-directional | State | Routing | Virtual |
125 | |-------------------|----------------|-------|---------|---------|
126 | | raiden.network | ✓ | ✕ | ✓ | ✕ |
127 | | perun.network | ✓ | ✓ | ✓ | ✓ |
128 | | statechannels.org | ✓ | ✓ | ✓ | ✓ |
129 | | ethswarm.org | ✓ | ✕ | ✓ | ✕ |
130 |
131 | References
132 | ----------
133 |
134 | * [SoK: Off The Chain Transactions][1]: a comprehensive overview of level 2
135 | solutions
136 | * [Raiden 101][2]: explanation of payment channel networks
137 | * [Perun][3] and [Nitro][4]: explanation of virtual state channels
138 |
139 | [1]: https://nms.kcl.ac.uk/patrick.mccorry/SoKoffchain.pdf
140 | [2]: https://raiden.network/101.html
141 | [3]: https://perun.network/pdf/Perun2.0.pdf
142 | [4]: https://magmo.com/nitro-protocol.pdf
143 |
--------------------------------------------------------------------------------
/evaluations/storj.md:
--------------------------------------------------------------------------------
1 | An evaluation of the Storj whitepaper
2 | =====================================
3 |
4 | 2020-12-22 Mark Spanbroek
5 |
6 | https://storj.io/storjv3.pdf
7 |
8 | Goal of this evaluation is to find things to adopt or avoid while designing
9 | Dagger. It is not meant to be a criticism of Storj.
10 |
11 | #### Pros:
12 |
13 | + Performance is considered throughout the design
14 | + Provides an Amazon S3 compatible API (§2.4)
15 | + Bandwidth usage of storage nodes is aggressively minimized to enable people
16 | with bandwidth caps to participate, which is good for decentralization (§2.7)
17 | + Erasure codes are used for redundancy (§3.4), upload and download speed
18 | (§3.4.2), proof of retrievability (§4.13) and repair (§4.7)!
19 | + BIP32 hierarchical keys are used to grant access to file paths (§3.6, §4.11)
20 | + Ethereum based token for payments (§3.9)
21 | + Storage nodes are not paid for uploads to avoid nodes that delete immediately
22 | after upload (§4.3)
23 | + Proof of Work on the node id is used to counter some Sybil attacks (§4.4)
24 | + Handles key revocations in a decentralized manner (§4.4)
25 | + Uses a simplified Kademlia DHT for node lookup (§4.6)
26 | + Uses caching to speed up Kademlia lookups (§4.6)
27 | + Uses standard-sized chunks (segments) throughout the network (§4.8.2)
28 | + Erasure coding is applied after encryption, allowing the network to repair
29 | redundancy without the need to know the decryption key (§4.8.4)
30 | + Streaming and seeking within a file are supported (§4.8.4)
31 | + Micropayments via payment channels (§4.17)
32 | + Paper has a very nice overview of possible attacks and mitigations (§B)
33 |
34 |
35 | #### Cons:
36 |
37 | - Mostly designed for long-lived stable nodes (§2.5)
38 | - Satellites are the gateway nodes to the network (§4.1.1), whose requirements
39 | for uptime and reputation lead to centralization (§4.10). They are also a
40 | single point of failure for a user, because it stores file metadata (§4.9).
41 | - Centralization is further encouraged by having a separate network of approved
42 | satellites (§4.21)
43 | - Clients have to actively perform audits (§4.13) and execute repair (§4.14)
44 | (through their trusted satellite)
45 | - The network has a complex reputation system (§4.15)
46 | - Consecutive micropayments are presented as a solution for the trust problems
47 | while retrieving (§4.17), which doesn't entirely mitigate withholding attacks.
48 | - Scaling is hampered by the centralization that happens in the satellites
49 | (§6.1)
50 | - The choice to avoid Byzantine distributed consensus, such as a blockchain
51 | (§2.10, §A.3) results in the need for trusted centralized satellites
52 |
--------------------------------------------------------------------------------
/evaluations/sui.md:
--------------------------------------------------------------------------------
1 | ../papers/Sui/sui.md
--------------------------------------------------------------------------------
/evaluations/swarm.md:
--------------------------------------------------------------------------------
1 | An evaluation of the Swarm book
2 | ===============================
3 |
4 | 2020-12-22 Mark Spanbroek
5 |
6 | https://swarm-gateways.net/bzz:/latest.bookofswarm.eth/
7 |
8 | Goal of this evaluation is to find things to adopt or avoid while designing
9 | Dagger. It is not meant to be a criticism of Swarm.
10 |
11 | #### Pros:
12 |
13 | + Book contains a well-articulated vision and historical context (§1)
14 | + Uses libp2p as underlay network (§2.1.1)
15 | + Uses content-addressable fixed-size chunks (§2.2.1, §2.2.2)
16 | + Employs encryption by default, enabling plausible deniability for node owners
17 | (§2.2.4)
18 | + Opportunistic caching allows for automatic scaling for popular content
19 | (§2.3.1, §3.1.2)
20 | + Has an upload protocol that, once completed, allows the uploader to disappear
21 | (§2.3.2)
22 | + Network self-repairs content through pull syncing (§2.3.3).
23 | + Nodes can play different roles depending on their capabilities, e.g. light
24 | node, forwarding node, caching node (§2.3.4).
25 | + Has a pricing protocol (§3.1.2)
26 | + Uses micro payments (§3.2)
27 | + Allows for zero cash entry (§3.2.5), which benefits decentralization
28 | + Uses staking/collateral, spot-checks and litigation to insure long term
29 | storage. (§3.3.4, §5.3)
30 | + The Merkle tree for chunking a file enables random access, and resumption of
31 | uploads (§4.1.1)
32 | + Manifests allow for collections of files and their paths (§4.1.2)
33 | + Combines erasure coding with a Merkle tree in a smart way (§5.1.3)
34 | + Redundancy is used to improve latency (§5.1.3)
35 |
36 | #### Cons:
37 |
38 | - Use of two peer-to-peer networks (underlay and overlay) seems overly complex
39 | (§2.1)
40 | - Tries to solve many problems that can be addressed by other protocols, such as
41 | routing privacy, micro payments and messaging.
42 | - Storage nodes and peers are chosen based on their mathematical proximity,
43 | instead of taking performance and risk into account (§2.1.3)
44 | - Uses a forwarding Kademlia DHT (§2.1.3) for routing, which requires stable,
45 | long lived network connections
46 | - Depends heavily on forwarding of messages; each message passes through a list of
47 | peers that could be on opposite sides of the world. (§2.1.3)
48 | - Tries to solve routing privacy (§2.1.3), which could arguably be better
49 | addressed by a separate protocol such as onion routing.
50 | - Because of the use of an overlay DHT network, Swarm has to solve the
51 | bootstrapping problem, even though libp2p already solves this (§2.1.4).
52 | - A Swarm node needs to maintain three different DHTs; one for the underlay
53 | network (libp2p), another for routing (forwarding Kademlia), and a third for
54 | storage (DISC).
55 | - Solves the problem of changing content in a content-addressable system in two
56 | different ways: through single-owner chunks (§2.2.3), and through ENS (§4.1.3)
57 | - Garbage collection based on chunk value makes it hard to reason about the
58 | amount of money that is required to keep content on the network (§2.2.5)
59 | - Besides all the various incentives, Swarm also has a reputation system in the
60 | form of a deny list (§2.2.7).
61 | - Incentive system is complex, and therefore harder to verify. (§3)
62 | - Has its own implementation of micro payments, instead of using existing
63 | payment channels (§3.2.1)
64 | - Rewarding nodes for upload receipts leads to a store-and-forget attack, that
65 | requires tricky mitigation (§3.3.4)
66 | - Extra complexity (trojan chunks, feeds) is added because Swarm is also a fully
67 | fledged communication system (§4).
68 | - Offers pinning of content, even though it is inferior to using incentives
69 | (§5.2.2)
70 | - Recovery is built on top of pinning and messaging (trojan chunks) (§5.2.3)
71 |
--------------------------------------------------------------------------------
/evaluations/zeroknowledge.md:
--------------------------------------------------------------------------------
1 | Zero Knowledge Proofs
2 | =====================
3 |
4 | Zero knowledge proofs allow for a verifier to check that a prover knows a value,
5 | without revealing that value.
6 |
7 | Types
8 | -----
9 |
10 | Several types of non-interactive zero knowledge schemes exist. The most
11 | well-known are zkSNARK and zkSTARK, which [come in several flavours][8].
12 | Interestingly, the most performant is the somewhat older Groth16 scheme, with
13 | very small proof size and verification time. Its downside is the requirement for
14 | a trusted setup, and its [malleability][9]. Performing a trusted setup has
15 | become easier through the [Perpetual Powers of Tau Ceremony][10].
16 |
17 | A lesser-known type of zero knowledge scheme is [MPC-in-the-head][11]. This lets
18 | a prover simulate a secure multiparty computation on a single computer, and uses
19 | the communication between the simulated parties as proof. The [ZKBoo][13] scheme
20 | for instance allows for fast generation and verification of proofs, but does not
21 | lead to smaller proofs than zkSNARKs can provide.
22 |
23 | Tooling
24 | -------
25 |
26 | [Zokrates][1] is a complete toolbox for specifying, generating, and verifying
27 | zkSNARK proofs. It's written in Rust, has Javascript bindings, and can generate
28 | Solidity code for verification. C bindings appear to be absent.
29 |
30 | [libSNARK][2] and [libSTARK][3] are C++ libraries for zkSNARK and zkSTARK
31 | proofs. libSNARK can be used as a backend for Zokrates.
32 |
33 | [bellman][4] is a Rust library for zkSNARK proofs. It can also be used as a
34 | backend for Zokrates.
35 |
36 | Iden3 created a suite of tools ([circom][5], [snarkjs][6], [rapidsnark][7]) for
37 | zkSNARKs (Groth16 and PLONK). It is mostly Javascript, except for rapidsnark
38 | which is written in C++.
39 |
40 | Nim tooling seems to be mostly absent.
41 |
42 | Ethereum
43 | --------
44 |
45 | Ethereum has pre-compiled contracts [BN_ADD, BN_MUL and SNARKV][12] that reduce
46 | the gas costs of zkSNARK verification. These are used by the Solidity code that
47 | Zokrates produces.
48 |
49 | [1]: https://zokrates.github.io
50 | [2]: https://github.com/scipr-lab/libsnark
51 | [3]: https://github.com/elibensasson/libSTARK
52 | [4]: https://github.com/zkcrypto/bellman/
53 | [5]: https://github.com/iden3/circom
54 | [6]: https://github.com/iden3/snarkjs
55 | [7]: https://github.com/iden3/rapidsnark
56 | [8]: https://medium.com/coinmonks/comparing-general-purpose-zk-snarks-51ce124c60bd
57 | [9]: https://zokrates.github.io/toolbox/proving_schemes.html#g16-malleability
58 | [10]: https://medium.com/coinmonks/announcing-the-perpetual-powers-of-tau-ceremony-to-benefit-all-zk-snark-projects-c3da86af8377
59 | [11]: https://yewtu.be/V8acfV8LJog
60 | [12]: https://coders-errand.com/ethereum-support-for-zk-snarks/
61 | [13]: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/giacomelli
--------------------------------------------------------------------------------
/incentives-rationale.md:
--------------------------------------------------------------------------------
1 | # Incentives Rationale
2 |
3 | Why incentives? In order to have a sustainable p2p storage network such that it can be used to store arbitrary data and avoid specializing around a certain type of content, economic incentives or payments are required.
4 | ## Incentives in p2p networks
5 |
6 | Bittorrent & friends tend to specialize around **popular content** such as movies or music (citation). Empirical evidence suggests that this is a consequence of how incentives are aligned in these types of networks (citation). Without diving too deep, the bittorrent incentives model is composed of 3 major elements:
7 |
8 | - A "tit-for-tat" (or some variation) accounting system
9 | - A "seeding ratio", which signals the peer's reputation
10 | - The content being shared, which becomes the commodity traded in the network
11 |
12 | In other words, you trade "content" in a "tit-for-tat" fashion, which increases the "seeding ratio", which gives access to more "content". This model sounds attractive and fair at first; however, it has been shown to have major flaws.
13 |
14 | - It leads to network specialization where only certain types of content are available
15 | - Peers want to maximize their "seeding-ratio", which leads to sharing content which is in high demand (popular), which often tends to be the latest movie, tv show or music album.
16 | - Rare or "uninteresting" content is almost impossible to come by unless explicitly provided by a party such as specialized communities or private trackers (which usually implies payments, ie external incentives).
17 | - File availability becomes dependent on the content's popularity and age (in the network), and declines over time.
18 | - Anecdotally, the current season of a tv show is often easier to come by than the previous season. This is because once the content has been downloaded (and consumed), there is little reason to continue sharing it for a longer period of time.
19 |
20 | There are also operational costs associated with running a node. These costs grow (at the very least linearly) in proportion to the number of users being served. Running a highly available node that serves thousands of other nodes a day is probably unfeasible for the vast majority of casual users, and building a business around this type of protocol has unclear economics and, quite often, legal consequences due to the already mentioned network specialization issues: sharing illegal or "pirated" content.
21 |
22 | In short, a direct consequence of this incentives model is **network specialization** and poor **content/data availability**.
23 |
24 | In contrast to this, there is a different type of p2p network where this problem is not observed. Blockchains, which are in some sense data sharing networks, are an example of such networks. The reason is that there is a minimum amount of data required for a blockchain node to operate, and thus all nodes have an incentive to download that data - we can call it **essential** data. Since this data is **essential** to operate a node, it's guaranteed to always be available - **this is a premise of the network**; non-essential data, however, such as the chain's full history, is harder to come by and usually subsidized by third-party organizations.
25 |
26 | It follows from the above examples that there are __at least__ two types of p2p storage networks:
27 |
28 | - One where the data is intrinsic to the protocol, in other words the protocol and the data are inseparable, which is the case of blockchains
29 | - Another, where the data is extrinsic and the protocol does not rely on it in order to operate
30 |
31 | ## General purpose p2p storage networks
32 |
33 | In general, when compared to centralized alternatives, p2p networks have many desirable properties, such as:
34 |
35 | - Censorship resistance
36 | - Robustness, in the face of large scale failures
37 | - Excellent scaling potential
38 | - This is usually a matter of having more nodes joining the network
39 |
40 | In short, p2p networks have advantages over their centralized counterparts, and yet we haven't seen wider adoption outside of the few niche use cases already outlined above. In our opinion, this is due to a lack of sufficient guarantees in networks with extrinsic data.
41 |
42 | One important property of data is that once it is gone and no backups exist, the chances of recovery are very slim. Contrast this with computation: if the data is intact, recovering from a failed computation usually means simply re-running it with the same input. In other words, when it comes to data, integrity and availability are more important than any other aspect.
43 |
44 | It's this project's aim to provide a solution to the outlined issues of **data availability** in networks with extrinsic data.
45 |
46 | It's worth further breaking down **data availability** into two separate problems:
47 |
48 | 1. Retrievability - the ability to retrieve/download data from the network at any time
49 | 2. Persistence - the guarantee that data is persisted by the network over a predetermined time frame
50 |
51 | ## What should be incentivized then?
52 |
53 | In our opinion, anything that is a finite resource. In p2p storage networks this is largely bandwidth and storage space.
54 |
55 | ### Bandwidth incentives
56 |
57 | Bandwidth is a finite resource and has an associated cost. Eventually this cost compounds, and serving the network becomes unreasonable and unsustainable. This leads to low node participation and data retrievability issues, as peers will choose to serve only select nodes or none at all. In many cases, it leads nodes to temporarily leave the network even when the file is still sitting on their hard drives.
58 |
59 | There are several fundamental problems that bandwidth incentives solve.
60 |
61 | - Increase the chance of data being "retrievable" from the network
62 | - Properly compensate "seeders" for the resources consumed by "leechers"
63 | - This ensures higher levels of node participation
64 | - Serves as a sybil attack prevention mechanism in an open network
65 |
66 | With incentivized bandwidth, rational nodes driven by market dynamics should seek to maximize profits by sharing data that is in high demand, thus offsetting operational costs and scaling the network up or down. This would give the network properties similar to a CDN that caches content to prevent overwhelming the origin with requests.
67 |
68 | ### Storage incentives
69 |
70 | Storage, also being a finite resource, has associated costs. Storing data on your own hard drive for no reason is irrational, and it's safe to assume that an overwhelming majority of the network's participants won't do it. This leads the network to specialize around certain types of __popular__ content. In order to offset that trend, storing data needs to be incentivized.
71 |
72 | The fundamental issue that storage incentives solve is **data availability**, and more precisely data **persistence** over a predetermined time frame.
73 |
74 | Enabling persistence opens up many common use cases, such as data backups or becoming a storage provider for traditional web and web3 applications, among many others. In short, it can replace centralized cloud storage providers. Due to the wide range of use cases, the issue of specialization also goes away.
75 |
76 | It's worth noting that we make no claims that the network is not going to be used to store and distribute "pirated" content; we merely claim that by realigning incentives we'll enable other, more common use cases.
77 |
78 | Together, these incentives lead to a sustainable and censorship-resistant p2p network. You negotiate a price for certain content to be stored long-term. Should the content become unavailable (due to censorship or a generic failure) after the contract is negotiated, the peer that stores the content is punished. When content is popular, it will spread because more peers want to earn bandwidth fees, which allows the network to scale to high demand, acting as a censorship-resistant CDN.
79 |
80 | ## Zero-cash entry
81 |
82 | Zero-cash entry means that you can join the network without having any funds. When all interactions in the network have a price, it becomes a problem to start participating unless the node is funded. The way to work around this is to initially become a provider of services: for example, a node can start persisting chunks for some amount of time (minutes or hours) and thus earn some initial capital, after which it can start to freely exchange data.
83 |
84 | Another possibility is for businesses with storage requirements to subsidize new users with a seed amount. For example, a chat application can seed a small amount to a newly signed-up client, which helps it get started in the network. Once the client participates in the network, it will start earning bandwidth fees, which, if correctly balanced, means that a casual user can participate almost for free.
85 |
86 |
87 | ## Design philosophy
88 |
89 | When choosing and designing our incentive protocols we favor practical and provable protocols that maximally reduce the risks for participants. We favor those solutions that are easy to separate and upgrade over those that are tightly coupled with the rest of the network design.
90 |
--------------------------------------------------------------------------------
/meetings/bogota2022.md:
--------------------------------------------------------------------------------
1 | Agenda Bogota Meetup
2 | --------------------
3 |
4 | Draft agenda, feel free to add/modify as you see fit.
5 |
6 | ### Suggested Topics ###
7 |
8 | - Session on new ZK based proving scheme
9 | - Retrospective (evaluate how we're doing and how we can improve as a team)
10 | - Hacking sessions (hack on Codex together)
11 | - Discussion about slot collateral: global collateral or per slot?
12 | - Node interactions: how do we envision executing transactions in different states?
13 | - dapp / http server embedded in the binary?
14 | - Perhaps we can use Hester's designs as a guide?
15 | - Project management
16 | - Tasks/issues cleanup
17 | - Discuss what's working, what's not, and what we can improve
18 | - Present high-level architecture, make recording of it for future reference
19 | - Codex use cases
20 | - Codex credits
21 | - Codex differentiators
22 |
--------------------------------------------------------------------------------
/papers/Compact_Proofs_of_Retrievability/README.md:
--------------------------------------------------------------------------------
1 | # Compact Proofs of Retrievability
2 |
3 | ## Authors
4 |
5 | - Hovav Shacham - hovav@cs.ucsd.edu
6 | - Brent Waters - bwaters@cs.utexas.edu
7 |
8 | ### DOI
9 |
10 | - http://dx.doi.org/10.1007/978-3-540-89255-7_7
11 |
12 | ## Summary
13 |
14 | The paper introduces a remote storage auditing scheme known as Proofs of Retrievability based on work derived from `Pors: proofs of retrievability for large files` by Juels and Kaliski and `Provable data possession at untrusted stores` by Ateniese et al.
15 |
16 | It takes the idea of homomorphic authenticators from Ateniese and the idea of using erasure coding from Juels and Kaliski to strengthen the remote auditing scheme. To our knowledge, this is also the first work to provide rigorous mathematical proofs for this type of remote auditing.
17 |
18 | The paper introduces two types of schemes - public and private. In the private setting, the scheme requires possession of a private key to perform verification but lowers both the storage and network overhead. In the public setting, only the public key is required for verification but the storage and network overhead are greater than those of the private one.
19 |
20 | ### Main ideas
21 |
22 | - Given a file `F`, erasure code it into `F'`
23 | - Split the file into blocks and sectors
24 | - Generate cryptographic authenticators for each block
25 | - During verification
26 | - The verifier emits a challenge containing random indexes of the blocks to be verified, alongside random values used as multipliers to compute the proof
27 | - The prover takes the challenge and, using both the indexes and the random multipliers, produces an unforgeable proof; the proof consists of the aggregated data and tags for the indexes in the challenge
28 | - Upon receipt, the verifier is able to check that the proof was generated using the original data
29 |
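30 | To make the aggregation step concrete, here is a minimal Python sketch of the private scheme's core arithmetic, assuming a SHA-256-based PRF, a toy field size and a single sector per block; the helper names are ours, not the paper's:
31 |
32 | ```python
33 | # Hedged sketch: per-block tags sigma_i = f_key(i) + alpha * m_i (mod p);
34 | # the proof aggregates data and tags using the challenge multipliers nu_i.
35 | import hashlib, random
36 |
37 | p = 2**127 - 1  # toy prime field; the paper works in Z_p for a large prime p
38 |
39 | def prf(key: bytes, i: int) -> int:
40 |     """PRF f_key(i), modelled here with SHA-256."""
41 |     return int.from_bytes(hashlib.sha256(key + i.to_bytes(8, "big")).digest(), "big") % p
42 |
43 | def tag_blocks(key: bytes, alpha: int, blocks: list) -> list:
44 |     """Per-block authenticators: sigma_i = f_key(i) + alpha * m_i (mod p)."""
45 |     return [(prf(key, i) + alpha * m) % p for i, m in enumerate(blocks)]
46 |
47 | def prove(challenge, blocks, tags):
48 |     """Prover aggregates the challenged data and tags into (mu, sigma)."""
49 |     mu = sum(nu * blocks[i] for i, nu in challenge) % p
50 |     sigma = sum(nu * tags[i] for i, nu in challenge) % p
51 |     return mu, sigma
52 |
53 | def verify(key: bytes, alpha: int, challenge, mu: int, sigma: int) -> bool:
54 |     """Verifier recomputes the PRF part and checks the aggregate tag."""
55 |     return sigma == (sum(nu * prf(key, i) for i, nu in challenge) + alpha * mu) % p
56 |
57 | key, alpha = b"secret-prf-key", random.randrange(p)
58 | blocks = [random.randrange(p) for _ in range(16)]
59 | tags = tag_blocks(key, alpha, blocks)
60 | challenge = [(i, random.randrange(p)) for i in random.sample(range(16), 4)]
61 | assert verify(key, alpha, challenge, *prove(challenge, blocks, tags))
62 | ```
63 |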
30 | ### Observations
31 |
32 | - Both the original data and the tags are employed in the generation and verification of the proof, which prevents the pre-image attacks that other schemes are susceptible to.
33 | - It potentially achieves a level of compression where at most one block's worth of data and one cryptographic tag ever need to be sent across the network.
34 | - The erasure coding solves several concurrent issues. With an MDS erasure code and a coding ratio of 1 (K=M):
35 | - It is only necessary to prove that 50% of all the blocks are in place; this lowers the amount of data that needs to be sampled, making it constant for datasets of any size (see the sketch after this list)
36 | - Having to verify only that K blocks are still available also protects against adaptive adversaries. For example, if the data is stored on 3 drives and a drive keeps going offline between samples, the odds reset between sampling rounds. To protect against such an adversarial scenario without erasure coding, it would be necessary to sample 100% of the file in each round; with erasure coding, since **any** K blocks are sufficient to reconstruct, the odds do not reset across sampling rounds
37 |
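38 | As a back-of-the-envelope check of the constant-sample-size claim (our own arithmetic, not the paper's): with an MDS code at coding ratio 1, an adversary must drop more than half of all blocks before the data becomes unrecoverable, so each uniformly sampled block is then missing with probability at least 1/2:
39 |
40 | ```python
41 | for k in (10, 20, 30):          # number of blocks sampled per round
42 |     print(k, 1 - 0.5 ** k)      # detection probability >= 1 - 2**-k
43 | # ~0.999, ~0.999999, ~0.999999999 -- independent of the dataset size
44 | ```
45 |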
38 | ### Other ideas
39 |
40 | - Another important aspect presented in the paper is an `extractor function`. The idea is that, given an adversary that is producing proofs but not releasing the data upon request, it would still be possible to eventually extract enough data to reconstruct the entirety of the dataset; this would require extracting an amount of data equivalent to K blocks.
41 |
--------------------------------------------------------------------------------
/papers/Economics_of_BitTorrent_communities/README.md:
--------------------------------------------------------------------------------
1 | # Economics of BitTorrent communities
2 |
3 | ## Authors
4 |
5 | - Ian A. Kash - iankash@microsoft.com
6 | - John K. Lai - jklai@seas.harvard.edu
7 | - Haoqi Zhang - hq@eecs.harvard.edu
8 | - Aviv Zohar - avivz@microsoft.com
9 |
10 | ### DOI
11 |
12 | - https://doi.org/10.1145/2187836.2187867
13 |
14 | ## Summary
15 |
16 | The paper is a study of a BitTorrent community called DIME, where users share live concert recordings. The community has around 100K users and the study analyses data gathered over 6 months.
17 |
18 | ### Main ideas
19 |
20 | * The DIME system enforces a sharing ratio of at least 0.25, i.e. at most 4 downloads for every 1 upload
21 | * Many users have a ratio above 1 (which shows an altruistic behaviour)
22 | * New files are more attractive to users and have high demand at the beginning
23 | * Users with high-bandwidth Internet connections take advantage of new files to earn credits
24 | * Old files are not a good way to earn credit because they are not in high demand
25 | * There are periods where downloads are free
26 | * Users prefer to download old files during free periods
27 |
28 | ### Observations
29 |
30 | * The paper does not give any numbers about the amount of data available in total
31 | * The paper does not provide data about the file size distribution
32 | * Overall the paper provides interesting data about how sharing communities behave but no data about the decentralized storage itself.
33 |
34 | ### Other ideas
35 |
36 | * Some aspects of the demand for files with respect to their age could be applied to other decentralized storage systems
37 |
38 |
39 |
--------------------------------------------------------------------------------
/papers/Falcon_Codes_Fast_Authenticated_LT_Codes/README.md:
--------------------------------------------------------------------------------
1 | # Falcon Codes
2 | ## Authors
3 |
4 | - Ari Juels
5 | - James Kelley
6 | - Roberto Tamassia
7 | - Nikos Triandopoulos
8 |
9 | ## DOI
10 |
11 | https://doi.org/10.1145/2810103.2813728
12 |
13 | ## Bibliography entry
14 |
15 | Juels, Ari, James Kelley, Roberto Tamassia, and Nikos Triandopoulos. ‘Falcon Codes: Fast, Authenticated LT Codes (Or: Making Rapid Tornadoes Unstoppable)’. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1032–47. CCS ’15. New York, NY, USA: Association for Computing Machinery, 2015. https://doi.org/10.1145/2810103.2813728.
16 |
17 | ## Summary
18 |
19 | The paper addresses the problem of **adversarial erasures** in case of **non-MDS codes**, in a **private coding setting**.
20 | LT codes and their derivatives (RaptorQ, etc.) are known to provide fast (even linear-time) encoding and decoding, both asymptotically and in practice, and are useful both as large block codes and as rateless codes. However, their guarantees hold only w.h.p., and the minimum code distance can be small in practice. This means that adversarial erasure patterns exist that can eliminate the advantages of an otherwise strong redundancy. Falcon codes aim to solve this by hiding the coding pattern. Note that this hiding can only work in a private setting, where there is a shared secret between encoder and decoder.
21 |
22 | ### Main ideas
23 |
24 | The main idea is to:
25 | - Take an LT encoder, which already uses an RNG to sample from a degree distribution when generating the bipartite coding graph.
26 | - Employ a PRG parametrised by a secret to make the random coding graph secret.
27 | - Encoding now uses a secret graph, but since encoding is done with XOR, the graph could easily be inferred by observing segments. Protect against this by adding a layer of encryption over the output segments.
28 | - Optionally add a MAC to convert corruptions into erasures.
29 |
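30 | A minimal Python sketch of the idea, under our own simplifications: a toy degree distribution, and a SHA-256 keystream plus HMAC standing in for the paper's encryption and MAC primitives:
31 |
32 | ```python
33 | # Hedged sketch: a secret-seeded PRG drives the LT coding graph, output
34 | # symbols are encrypted, and an HMAC turns corruptions into erasures.
35 | import hashlib, hmac, random
36 |
37 | def keystream(key: bytes, nonce: int, n: int) -> bytes:
38 |     out, ctr = b"", 0
39 |     while len(out) < n:
40 |         out += hashlib.sha256(key + nonce.to_bytes(8, "big") + ctr.to_bytes(8, "big")).digest()
41 |         ctr += 1
42 |     return out[:n]
43 |
44 | def xor(a: bytes, b: bytes) -> bytes:
45 |     return bytes(x ^ y for x, y in zip(a, b))
46 |
47 | def encode_symbol(secret: bytes, segments: list, j: int):
48 |     rng = random.Random(secret + j.to_bytes(8, "big"))     # keyed PRG -> secret graph
49 |     degree = rng.choice([1, 2, 3, 4])                      # toy degree distribution
50 |     neighbours = rng.sample(range(len(segments)), degree)  # edges of the coding graph
51 |     symbol = b"\x00" * len(segments[0])
52 |     for i in neighbours:
53 |         symbol = xor(symbol, segments[i])                  # plain LT: XOR of neighbours
54 |     ct = xor(symbol, keystream(secret, j, len(symbol)))    # encrypt to hide the graph
55 |     tag = hmac.new(secret, j.to_bytes(8, "big") + ct, hashlib.sha256).digest()
56 |     return ct, tag                                         # bad MAC => treat as erasure
57 |
58 | segments = [bytes([i]) * 32 for i in range(16)]
59 | ct, tag = encode_symbol(b"shared-secret", segments, 0)
60 | ```
61 |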
30 | ### Other ideas
31 |
32 | Other ideas in the paper include:
33 | - Reduce MAC overhead: batching MACs amplifies errors but reduces overhead.
34 | - Scalability (FalconS): the original Falcon needs access to all segments. Change this by applying Falcon over `b` blocks. This improves encoder locality but reintroduces adversarial erasures. Thus, apply a random permutation over all parity symbols across all blocks to avoid the adversarial erasures.
35 | - Rateless (FalconR): split the original into `b` blocks and set up a different Falcon for each, but do not encode yet. Then generate the next parity symbol with one of the `b` Falcon encoders, randomly selecting which one to use.
36 |
37 | There is also a whole section dedicated to the use of Falcon in PoR … this needs further study.
38 |
39 |
--------------------------------------------------------------------------------
/papers/Filecoin_A_Decentralized_Storage_Network/README.md:
--------------------------------------------------------------------------------
1 | # Filecoin: A Decentralized Storage Network
2 |
3 | ## Authors
4 |
5 | - Juan Benet - juan.benet.ai
6 |
7 | ### DOI
8 |
9 | -
10 |
11 | ## Summary
12 |
13 | This paper describes the mechanisms behind the decentralized storage network called Filecoin.
14 |
15 | ### Main ideas
16 |
17 | **DSN: Decentralized Storage Network** (no trusted parties)
18 | DSN must guarantee:
19 | * Data integrity (Data retrieved is the same as data stored)
20 | * Data retrievability (Clients can eventually retrieve the data)
21 | * Management fault tolerance (Management nodes might fail)
22 | * Storage fault tolerance (Storage nodes might fail)
23 |
24 | Other Properties:
25 | * Publicly verifiable
26 | * Auditable
27 | * Incentive-compatible
28 |
29 | **Proof of Storage** : Provable Data Possession and Proof of Retrievability.
30 |
31 | * Sybil attacks: multiple fake identities storing only 1 copy of the data.
32 | * Outsourcing attacks: quickly requesting the block from another storage node.
33 | * Generation Attacks: regenerate the data on the fly when possible.
34 |
35 | **PoRep**: Proof of Replication, not to be confused with Proof of Retrievability.
36 | Note that Filecoin does not support Erasure Codes, only trivial replication.
37 |
38 | **Proof of spacetime**: Repeat PoRep over time.
39 |
40 | The seal operation permutes the replica so that proofs only work for that specific replica; therefore, storing n replicas implies allocating n times the size of the dataset.
41 |
42 | The PoRep setup process needs to be 10-100 times more time-consuming than the proof; otherwise the setup, request and proof can be generated on the fly.
43 |
44 | Clients pay to store data and also to retrieve it. Retrieval Miners can be the same as Storage Miners, or they can simply fetch the data from storage nodes and send it to the client, keeping some data in cache - a caching mechanism of sorts. The benefit of being only a Retrieval Miner is that you are not responsible for storage: you don't lose money if you lose data.
45 |
46 | Achieving retrievability: the Put operation is specified as (f,m)-tolerant, meaning the data is stored by m storage miners and a maximum of f faults must be tolerated.
47 |
48 | The marketplace is off-chain. Data is sent in mini-blocks, each accompanied by a micro-payment.
49 |
50 | Power fault tolerance: N is the total “power” of the network and f is the part of that power controlled by adversarial nodes.
51 |
52 | More storage in use = more power on the network, and a higher probability of being elected to create blocks.
53 |
54 | ### Observations
55 |
56 | (Copy-paste from Mark's evaluation)
57 |
58 | **Pros**:
59 |
60 | * Clients do not need to actively monitor hosts. Once a deal has been agreed upon, the network checks proofs of storage.
61 | * The network actively tries to repair storage faults by introducing new orders in the storage market. (§4.3.4).
62 | * Integrity is achieved because files are addressed using their content hash (§4.4).
63 | * Marketplaces are explicitly designed and specified (§5).
64 | * Micropayments via payment channels (§5.3.1).
65 | * Integration with other blockchain systems such as Ethereum (§7.2) are being worked on.
66 |
67 | **Cons**:
68 |
69 | * Filecoin requires its own very specific blockchain, which influences a lot of its design. There is tight coupling between the blockchain, storage accounting, proofs and markets.
70 | * Proof of spacetime is much more complex than simple challenges, and only required to make the blockchain work (§3.3, §6.2)
71 | * A miner's influence is proportional to the amount of storage they provide (§1.2), which is an incentive to become big. This could lead to the same centralization issues that plague Bitcoin.
72 | * Incentives are geared towards making the particulars of the Filecoin design work, instead of directly aligned with users' interest. For instance, there are incentives for storage and retrieval, but it seems that a miner would be able to make money by only storing data, and never offering it for retrieval. Also, the incentive for a miner to store multiple independent copies does not mean protection against loss if they're all located on the same failing disk.
73 | * The blockchain contains a complete allocation table of all things that are stored in the network (§4.2), which raises questions about scalability.
74 | * Zero cash entry (such as in Swarm) doesn't seem possible.
75 | * Consecutive micropayments are presented as a solution for the trust problems while retrieving (§5.3.1), which doesn't entirely mitigate withholding attacks.
76 | * The addition of smart contracts (§7.1) feels like an unnecessary complication.
77 |
78 | ### Other ideas and comments
79 |
80 | Nice Figure 1 showing the state machine for each component.
81 |
82 | Figure 2 shows a nice workflow. Good way to explain the system.
83 |
84 | It is not clear what the optimal relationship between m and f is.
85 |
86 | The parameters f and m are never shown in the PUT description in figure 7.
87 |
88 |
89 |
--------------------------------------------------------------------------------
/papers/Peer-to-Peer_Storage_System_a_Practical_Guideline_to_be_lazy/README.md:
--------------------------------------------------------------------------------
1 | # Peer-to-Peer Storage Systems: a Practical Guideline to be Lazy
2 |
3 | ## Authors
4 |
5 | - Frederic Giroire - frederic.giroire@sophia.inria.fr
6 | - Julian Monteiro - julian.monteiro@sophia.inria.fr
7 | - Stephane Perennes - stephane.perennes@sophia.inria.fr
8 |
9 | ### DOI
10 |
11 | - https://doi.org/10/c47cmb
12 |
13 | ## Summary
14 |
15 | The paper presents the different trade-offs involved in implementing erasure code reconstruction after failures. It focuses on Reed-Solomon encoding and analyses the number of encoding blocks (called fragments in the paper), the number of parity blocks, as well as the minimum number of redundant blocks remaining before triggering repairs. The authors propose a Markov chain model and look at the impact of these parameters on network bandwidth as well as block loss rate.
16 |
17 | ### Main ideas
18 |
19 | * Every reconstruction implies data traffic over the network
20 | * If we reconstruct after every single block loss (eager repair), we consume too much bandwidth
21 | * If we wait for several blocks to be lost (lazy repair), we can reconstruct all the missing blocks in one pass and save bandwidth
22 | * If we wait too long before reconstruction, data might be lost if multiple erasures occur simultaneously
23 | * A model can help us understand the impact of the parameters s, r and r0 on bandwidth and loss rate
24 | * The distribution of blocks redundancy is a bit counter-intuitive
25 |
26 |
27 | ### Observations
28 |
29 | * It is assumed that the block reconstruction process is much faster than the peer failure rate
30 | * In P2P networks, failures are considered independent and memory-less, with failure rate a = 1/MTTF
31 | * The probability for a peer to be alive after T time steps is P_a = (1 − a)^T
32 | * They work in the Galois field GF(2^8), which leads to the practical limitation s + r ≤ 256
33 | * The failure rate (1 year for 1 disk) is very conservative
34 | * Overall the model is elegant and the results are very clear and interesting
35 |
36 | ### Other ideas
37 |
38 | * The stretch factor is computed as follows: (k+m)/k (see the sketch after this list)
39 | * Block sizes can be chosen depending on the main purpose of the storage system
40 | * Archival mode: Good with large blocks because few reads
41 | * Filesystem mode: Good with small blocks because it allows easy reads and edits
42 |
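43 | A quick arithmetic sketch of the parameters discussed above, with invented values (s data fragments, r parity fragments and a lazy-repair threshold r0, in the paper's notation):
44 |
45 | ```python
46 | s, r, r0 = 8, 8, 3                       # must satisfy s + r <= 256 in GF(2^8)
47 | print("stretch factor:", (s + r) / s)    # (k + m) / k = 2.0x storage overhead
48 | # Lazy repair: wait until only r0 redundant fragments remain, then rebuild
49 | # the r - r0 lost fragments in one pass, amortising the cost of retrieving
50 | # s fragments over several repairs.
51 | print("fragments rebuilt per pass:", r - r0)
52 | ```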
43 |
--------------------------------------------------------------------------------
/papers/README.md:
--------------------------------------------------------------------------------
1 | # Paper Summaries
2 |
3 | > This directory contains academic paper summaries explored as part of the Codex project research. It is structured as a list of links to a document containing a quick summary and observations extracted from the paper. The summaries aren't meant to be exhaustive and cover all aspects of the paper but rather serve as a quick refresher and a record of the papers already evaluated.
4 |
5 | ## Index
6 |
7 | - [Compact Proofs of Retrievability](./Compact_Proofs_of_Retrievability/README.md)
8 | - [Filecoin A Decentralized Storage Network](./Filecoin_A_Decentralized_Storage_Network/README.md)
9 | - [Economics of BitTorrent communities](./Economics_of_BitTorrent_communities/README.md)
10 | - [Peer-to-Peer Storage Systems: a Practical Guideline to be Lazy](./Peer-to-Peer_Storage_System_a_Practical_Guideline_to_be_lazy/README.md)
11 | - [Falcon Codes: Fast, Authenticated LT Codes](./Falcon_Codes_Fast_Authenticated_LT_Codes/README.md)
12 | - [The Sui Smart Contracts Platform](./Sui/sui.md)
11 |
12 | ## Writing Summaries
13 |
14 | A summary should contain a brief overview of the core ideas presented in the paper along with observations and notes.
15 |
16 | ## Template
17 |
18 | A [template](template.md) is provided that outlines a few sections:
19 |
20 | - Title - the title of the paper
21 | - Authors - the authors of the paper
22 | - DOI - the digital object identifier of the paper
23 | - Links - an optional section with links to the paper and other relevant material, such as source code, simulations, etc... If the paper is uploaded to the repo, it should be linked here as well.
24 | - Summary - a quick summary capturing the main ideas proposed by the paper
25 | - Main ideas - an optional list of bullet points describing the main ideas of the paper in more detail
26 | - Observations - an optional list of bullet points with observations if any
27 | - Other ideas - an optional list of bullet points with additional observations
28 |
29 | ## Directory Structure
30 |
31 | Each evaluation should go into its own directory named after the paper being evaluated. It should contain a `README.md` with the actual evaluation and additional supporting material, such as the paper itself, if one is available, or relevant code samples if those are provided. For example, the `Shacham and Waters - Compact Proofs of Retrievability` directory structure would look something like this:
32 |
33 | ```
34 | ├── Compact\ Proofs\ of\ Retrievability
│   ├── README.md
│   └── Compact\ Proofs\ of\ Retrievability.pdf
37 | ```
38 |
--------------------------------------------------------------------------------
/papers/Sui/sui.md:
--------------------------------------------------------------------------------
1 | The Sui Smart Contracts Platform
2 | ================================
3 |
4 | * [Sui Whitepaper](https://github.com/MystenLabs/sui/blob/main/doc/paper/sui.pdf)
5 | * [Sui Tokenomics paper](https://github.com/MystenLabs/sui/blob/main/doc/paper/tokenomics.pdf), May 2022
6 |
7 |
8 | Sui is an alternative to blockchains that is geared towards high performance. It
9 | utilizes a UTXO and DAG based design that allows for parallelization. It uses a
10 | delegated proof-of-stake model to keep the number of validators low while
11 | keeping the design permissionless. It uses a storage fund to pay for persistent
12 | storage.
13 |
14 | Main ideas
15 | ----------
16 |
17 | ### Consensus ###
18 |
19 | Transactions require approval from 2/3 of the validators (as measured by
20 | delegated stake). Sui uses the minimum amount of consensus that is required for
21 | a given transaction:
22 |
23 | * For transactions on owned objects (controlled by a key) it uses byzantine
24 | consistent broadcast. (Whitepaper §4.3, "Sign once, and safety")
25 | * For transactions on shared objects (modifiable by anyone) it uses a byzantine
26 | agreement protocol only on the *order* of conflicting transactions. Execution
27 | of the transaction happens after the order has been determined. (Whitepaper
28 | §4.4, "Shared Objects" and §5, "Throughput")
29 |
30 | Transactions on owned objects require 2 roundtrips to a quorum for byzantine
31 | broadcast. Transactions on shared objects require 4-8 round trips to a quorum
32 | for byzantine agreement. (Whitepaper §5, "Latency")
33 |
34 | ### Parallelism ###
35 |
36 | Sui uses the
37 | [Move](https://github.com/MystenLabs/sui/blob/main/doc/paper/sui.pdf) language
38 | for programming smart contracts. Unlike the EVM languages, this language is
39 | geared towards the inherent parallelism that is afforded by the UTXO and DAG
design. The language is not unique to Sui; it is used in other projects as well.
41 | (Whitepaper §2)
42 |
43 | ### Storage ###
44 | Persistent storage is paid for using a storage fund, whereby the storage fees
45 | are collected, and the proceeds of investing (staking) this fund are used to pay
46 | for future storage costs (Tokenomics §3.3, §5). Fees are rebated when deleting
47 | data from storage (Tokenomics §5.1). This is designed in such a way that the
48 | opportunity cost of locking up tokens is equal to the fees one would otherwise
pay for storage (Tokenomics §6.2).
50 |
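51 | A rough sketch of the storage-fund arithmetic as we read it; all numbers are invented for illustration:
52 |
53 | ```python
54 | storage_fee = 1_000.0      # tokens locked into the fund when data is stored
55 | staking_yield = 0.05       # assumed annual yield earned by staking the fund
56 | print(storage_fee * staking_yield)   # 50.0 tokens/year to pay for storage
57 | # Deleting the data rebates the locked fee (Tokenomics 5.1); the user's real
58 | # cost is the opportunity cost of locking the fee, designed to equal the
59 | # storage fees they would otherwise pay (Tokenomics 6.2).
60 | ```
61 |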
51 | ### Proof of stake ###
52 |
53 | Avoids "rich-get-richer" forces of other proof-of-stake implementations,
54 | specifically to ensure that validators enjoy viable business models regardless
55 | of their delegated stake. Random selection is avoided, opting instead for a
56 | model where everyone is rewarded according to their stake (Tokenomics §3.2,
57 | §6.3)
58 |
Gas fees are paid out to both validators and the people that delegated their
60 | stake to the validators. This is an extra incentive for delegators to keep an
61 | eye on their chosen validator, and to move stake when the validator is not
62 | behaving well. (Whitepaper §4.7, "Rewards and cryptoeconomics")
63 |
64 | ### Epochs ###
65 | Keeps several parameters of the network constant during epochs, such as the
66 | stake of the validators and (nominal) gas prices. Uses checkpointing to compress
67 | state and allow for committee changes on epoch boundaries. (Whitepaper §4.7)
68 |
69 | Promotes predictable gas prices by having validators indicate a gas price
70 | upfront, and diminish their rewards if they do not honour this upfront gas
71 | price. (Tokenomics §4.1, §4.3)
72 |
73 | Observations
74 | ------------
75 |
76 | ### Contention ###
77 |
78 | The Sui design nicely sidesteps contention issues with shared mutable UTXO state
79 | (e.g. such as in Cardano) by performing the byzantine agreement protocol on the
80 | order of the transactions, not on its execution. Therefore there is no need to
81 | resubmit a transaction when someone else used the object/UTXO that you intended
82 | to use. Uses references to objects/UTXOs without their serial number for mutable
83 | state, to enable this. (Whitepaper §4.4, "Shared objects")
84 |
85 | ### Recovery ###
86 |
87 | It also nicely solves an issue with byzantine consistent broadcast (e.g. as in
88 | ABC) whereby funds are locked forever when conflicting transactions are posted.
89 | Conflicting transactions are cleared on every epoch boundary. (Whitepaper §4.3,
90 | "Disaster recovery" and §4.7, "Recovery")
91 |
92 | ### Inert stake ###
93 |
94 | Sui mitigates the problem whereby an increasing amount of delegated stake can no
95 | longer be re-assigned because the associated keys are lost (as could happen in
96 | e.g. ABC). Stake delegation is an explicit action, instead of an implicit
97 | side-effect of every transaction. The staking logic is implemented in a smart
98 | contract, which allows the network to update the logic to deal with such issues.
99 |
100 | ### Validator reputation ###
101 |
102 | Stake rewards are influenced by the (subjective) reputation that validators
103 | report about other validators. It is unclear how far this can be gamed by
validators wishing to increase their rewards. (Tokenomics §4.1.2, §4.1.3)
105 |
106 | ### Storage fund viability ###
107 |
108 | The storage fund seems based on the assumption that there is enough money to be
109 | made from computation fees to pay for continued storage. The fund ensures that
110 | validators earn a bigger cut of the computation fees based on the size of the
111 | storage fund. This could be problematic when the amount of storage heavily
112 | outweighs the amount of computation; for instance if Sui were used primarily as
113 | a storage network. This is good to keep in mind when comparing the storage fund
114 | model with rent-based models such as employed in Codex. (Tokenomics §3.2, §3.3)
115 |
116 | The storage gas price remains constant within an epoch (Tokenomics §3.1). It is
117 | unclear how the network should react when confronted with a sudden spike in
demand for storage. What would happen if the storage price were low and a
user decided to store massive amounts of data on the network?
120 |
--------------------------------------------------------------------------------
/papers/template.md:
--------------------------------------------------------------------------------
1 | # Title
2 |
3 | The title of the paper
4 |
5 | ## Authors
6 |
7 | The authors of the paper
8 |
9 | ### DOI
10 |
11 | The digital object identifier for the paper
12 |
13 | ### Links
14 |
15 | An optional section with links to the paper and other relevant material, such as source code, simulations, etc... If the paper is uploaded to the repo, it should be linked here as well.
16 |
17 | ## Summary
18 |
19 | A quick summary capturing the main ideas proposed by the paper
20 |
21 | ### Main ideas
22 |
23 | A list of bullet points describing the main ideas of the paper in more detail
24 |
25 | ### Observations
26 |
27 | An optional list of bullet points with observations if any
28 |
29 | ### Other ideas
30 |
31 | An optional list of bullet points with additional observations
32 |
--------------------------------------------------------------------------------
/project-overview.md:
--------------------------------------------------------------------------------
1 | # Codex project overview
2 |
3 | > This document outlines, at a high level, what Codex is, the problem it's attempting to solve, its value proposition, and how it compares to similar solutions.
4 |
5 | ## Introduction
6 |
7 | Peer-to-peer storage and file sharing networks have been around for a long time. They exhibit clear advantages over centralized storage providers, such as scalability and robustness in the face of large-scale network disruptions, and have desirable censorship-resistance properties. However, we've yet to see widespread adoption outside of a few niche applications.
8 |
9 | Our intuition is that the lack of incentives and of strong data availability and persistence guarantees makes these networks unsuitable for applications with moderate to high availability requirements. In other words, **without reliability at the storage layer it is impossible to build other reliable applications** on top of it. A more in-depth overview of these observations can be found in the [incentives rationale](https://github.com/status-im/codex-research/blob/main/incentives-rationale.md) document.
10 |
11 | ## Goals and Motivations
12 |
13 | Codex is our attempt at creating a decentralized storage engine that intends to improve on the state of the art by supplying:
14 |
15 | - An incentivized p2p storage network with strong availability and persistence guarantees
16 | - A resource restricted friendly protocol that can endure higher levels of churn and large amounts of ephemeral devices
17 |
18 | We intend to address the first issue by developing a robust data availability and retrievability scheme and the second by building a p2p network friendly to mobile and other ephemeral devices.
19 |
20 | We follow the "less is more" principle and attempt to remove as much complexity from the core protocol as possible. Anything that doesn't directly contribute to the core functionality is pushed out of the protocol. This decision has two important goals: reducing complexity at the protocol level simplifies implementation and allows for quick iterative development cycles; and by simplifying the protocol we also simplify the incentive mechanisms - a particularly hard problem that we believe is yet to be properly addressed by other solutions.
21 |
22 | ## High level network overview
23 |
24 | Codex consists of a p2p network of **storage, ephemeral, validator** and **regular** nodes.
25 |
26 | ### Storage nodes
27 |
28 | Storage nodes provide long-term reliable storage. In order for a storage node to operate, it needs to stake a collateral proportional to the amount of data it's willing to store. Once the collateral has been staked and the node begins to store data, it needs to periodically provide proofs of data possession. If a node fails to provide a proof in time, it is penalized with a portion of its stake; if the node fails to provide proofs several times in a row, it loses the entirety of the stake. A sketch of these rules follows.
29 |
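30 | A minimal sketch of these stake-and-slash rules; the penalty fraction and the miss limit are placeholders, not protocol values:
31 |
32 | ```python
33 | # Illustrative accounting only: a valid proof resets the miss streak, a
34 | # missed proof burns part of the stake, repeated misses forfeit all of it.
35 | class StorageNode:
36 |     def __init__(self, stake: float, max_misses: int = 3):
37 |         self.stake = stake
38 |         self.misses = 0
39 |         self.max_misses = max_misses
40 |
41 |     def on_proof(self, valid_and_on_time: bool, penalty: float = 0.1):
42 |         if valid_and_on_time:
43 |             self.misses = 0                 # a good proof resets the streak
44 |             return
45 |         self.misses += 1
46 |         if self.misses >= self.max_misses:
47 |             self.stake = 0.0                # several misses in a row: lose all
48 |         else:
49 |             self.stake *= 1 - penalty       # single miss: lose a portion
50 | ```
51 |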
30 | ### Validator nodes
31 |
32 | Validator nodes are in charge of collecting, validating and submitting proofs to an adjudicator contract which rewards and penalizes storage and other validator nodes. A validator node also needs to stake a collateral in order to be able to participate in the validation process.
33 |
34 | Note that we don't use the term "adjudicator contract" in the literal sense of an Ethereum contract. We use it to indicate anything that executes on a consensus engine.
35 |
36 | ### Ephemeral nodes
37 |
38 | Bandwidth incentives allow anyone to operate as an ephemeral node, profiting only from caching and serving popular content. We expect this to have the emergent property of an organic CDN, where nodes with spare bandwidth but limited or unreliable storage can collectively scale the network depending on current demands.
39 |
40 | ### Regular nodes
41 |
42 | Regular or client nodes engage with other nodes to store, find and retrieve data from the network. Regular nodes constitute the lion's share of the Codex network and consume services offered by other nodes in exchange for payments. A regular node can also act as an ephemeral node by caching previously consumed data that other nodes can retrieve from it. This allows nodes to offset some of the cost of participating in the network and is expected to allow the majority of nodes to participate on an almost free basis after an initial entry fee - this last point is covered in more detail in a later section.
43 |
44 | ## Incentives structure
45 |
46 | The goals behind our incentives structure are:
47 |
48 | 1. Allow demand and supply to direct the network to optimally utilize its resources
49 | 2. Allow nodes to utilize their competitive advantages to maximize profits, thus increasing participation
50 | 3. Serve as a security and spam prevention mechanism
51 |
52 | Interactions between nodes are 1:1. This decision is deliberate and allows us to simplify accounting and adjudication of payments, and avoids complex price discovery mechanisms. We explicitly want to avoid:
53 |
54 | - Complex multihop payment chains - all interactions are strictly between directly connected nodes
55 | - Arbitrary price setting - all prices are driven by demand and supply and are negotiated 1:1
56 | - Loose payment guarantees and doublespends - all interactions between parties are settled securely and unambiguously
57 |
58 | In other words, our incentive structure attempts to be simple, predictable and secure. Predictability and security allow nodes to properly plan and allocate resources.
59 |
60 | ### Incentives categories
61 |
62 | There are several incentives categories:
63 |
64 | - Staking
65 | - Bandwidth
66 | - Storage
67 | - Penalties and rewards
68 |
69 | #### Staking
70 |
71 | Staking is used as a mechanism to prevent spam and abuse in the system - all nodes stake some amount of collateral.
72 |
73 | Regular nodes stake funds indirectly by maintaining the operational capital needed to retrieve content from the network, i.e. to pay bandwidth fees.
74 |
75 | #### Bandwidth
76 |
77 | Bandwidth fees play several important roles in the system:
78 |
79 | - Prevent spam and DDoS attacks from requesting nodes
80 | - Enable nodes to operate as exit or caching nodes
81 | - Avoid hotpaths and enable (geographical) locality
82 | - Rational nodes looking to maximize profits, can quickly cache and serve popular content, thus scaling the network according to current needs
83 |
84 | #### Storage
85 |
86 | Storage incentives allow nodes to earn a profit in exchange for storing arbitrary data. This allows content to persist in the network regardless of its popularity and age.
87 |
88 | #### Penalties and Rewards
89 |
90 | Penalties and rewards allow verifying nodes to profit by monitoring and detecting malicious or malfunctioning storage and other validator nodes.
91 |
92 | ## Data availability and persistence
93 |
94 | A core goal of the Codex protocol is to enable data availability and persistence. In order to accomplish this, we rely on several complementary techniques:
95 |
96 | - We use active verification to ensure data is available and retrievable
97 | - We ensure that failures are detected and corrected early to prevent outages and keep previously agreed upon redundancy guarantees
98 | - We use erasure coding to increase network wide data redundancy and prevent catastrophic data loss
99 |
100 | When a node commits to a storage contract and a user uploads a file or other arbitrary data, the network proactively verifies that the storing node is online and the data is retrievable. Storage nodes broadcast proofs of data possession at random intervals. If the storage node sends invalid proofs or fails to provide them in time, the network re-posts the contract for any other storage node to pick up. When the contract is re-posted, an amount from the faulty node's stake is used to cover the new storing node's bandwidth fees. It is expected that data is stored on at least several nodes to prevent data loss in the case of a catastrophic network failure. Erasure coding complements active verification by allowing the full dataset to be reconstructed from a subset of the data.
101 |
102 | ### Proofs of data possession and retrievability
103 |
104 | We use proofs of data possession and retrievability to ensure storage nodes committed to a contract remain online and available. The storage and retrievability proofs are formally described in this [document](https://hackmd.io/2uRBltuIT7yX0CyczJevYg?view).
105 |
106 | The main objectives of the proofs are:
107 |
108 | - Ensure nodes are online and maintaining the entirety of the dataset from the storage contract
109 | - Ensure that data is readily retrievable to prevent blackmailing and withholding attacks
110 |
111 |
112 | ## Interacting with the Codex Network
113 |
114 | Any regular node that participates in the network needs to set aside an operational amount in order to cover bandwidth fees. This creates a barrier to entry; however, we think it's a worthy tradeoff in order to maintain the security and health of the network. It's worth noting that any decentralized platform will have similar requirements and limitations. Below, we list some potential ways to work around this in Codex.
115 |
116 | ### Subsidies or airdrops
117 |
118 | Any application migrating to or being built for a decentralized platform requires some operational capital to participate. Many projects work around this by initially subsidizing potential users with small portions of their token - these are usually known as airdrops. Codex can use a similar technique to allow early adopters to begin participating in the network.
119 |
120 | ### Tit-for-tat settlements
121 |
122 | Many interactions in the network will be long-lived, which allows two nodes to exchange in a tit-for-tat manner. It works like this: a long-lived payment channel is opened, and nodes freely exchange chunks regardless of which way the balance tilts. The channel can only be closed when both parties agree (this is always true, regardless of how long the channel has been open). The node currently in debt will need to add funds to the channel or keep providing the other peer with chunks until the debt is repaid. The channel is closed only when the debt is completely settled or the counterparty forgoes it. We expect nodes to resort to tit-for-tat often, alleviating the need to constantly "top up" the node's balance. A minimal sketch of this accounting follows.
123 |
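124 | Here is that sketch in Python; the unit of account and the settlement rule are our own simplification:
125 |
126 | ```python
127 | # Toy tit-for-tat channel: the balance tracks who owes whom (in chunk fees);
128 | # the channel can only close once the debt is settled or forgiven.
129 | class TitForTatChannel:
130 |     def __init__(self):
131 |         self.balance = 0            # > 0: B owes A; < 0: A owes B
132 |
133 |     def a_serves_chunk(self, fee: int = 1):
134 |         self.balance += fee
135 |
136 |     def b_serves_chunk(self, fee: int = 1):
137 |         self.balance -= fee
138 |
139 |     def can_close(self, creditor_forgoes: bool = False) -> bool:
140 |         return self.balance == 0 or creditor_forgoes
141 | ```
142 |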
124 | ### Ephemeral or Caching nodes
125 |
126 | Storing nodes need to be constantly online in order to earn fees on storage contracts and respond to probing requests. If a node is overwhelmed and unable to serve requests, it can miss a verification window or a verifier requesting random chunks, both of which lead to penalties. In these cases, rational storage nodes can lower the price per chunk to allow other nodes to share the load. In many cases they might forgo bandwidth fees for some period of time, which allows newly joining or underfunded nodes to become caching nodes and earn bandwidth fees.
127 |
128 | ### Adaptive nodes
129 |
130 | Many mobile devices follow well-established patterns of usage. Phones often switch from mobile data to WiFi and back, and are either on the go or plugged into the wall, charging. These devices can operate mostly as consumers when on the go, but switch to being caching nodes when bandwidth and power aren't a limitation. This can offset all or most of the node's consumed bandwidth during the day.
131 |
132 | ### Opportunistic providing
133 |
134 | A node might not have a chunk itself, but it might be connected to another node that does; in that case it might choose to advertise the chunk to the requesting node but charge a small premium in order to cover its expenses and make a small profit on top. This is called "opportunistic providing".
135 |
136 | Note that this emulates forwarding, but without incurring the complexity of tracking payments across many hops. Payments at each hop are still settled 1:1.
137 |
138 | ### Altruistic nodes
139 |
140 | Any node can choose to provide services for free. Nodes can store and share arbitrary data at will without charging any fees.
141 |
142 | ## Closing Notes
143 |
144 | To summarize, Codex attempts to "untie the knot" of incentivized storage and allow many existing and future applications to be built in a distributed manner. We're building Codex to be reliable and predictable p2p storage infrastructure that will enable many business and casual use cases. We accomplish data persistence and availability by introducing robust PoDP proofs, which we supplement with error correction techniques. We use robust PoR schemes to prevent blackmailing and data withholding attacks and to guarantee data is always retrievable. We provide reasonable workarounds to the "zero entry" problem without compromising the network's security.
145 |
146 | Hopefully, this overview has clarified what Codex is and what its main value proposition is.
147 |
--------------------------------------------------------------------------------
/robust-data-possesion-scheme.md:
--------------------------------------------------------------------------------
1 | # Robust Proofs of Data Possession and Retrievability
2 |
3 | > Proofs of data possession (PoDP) schemes establish whether an entity is currently, or has been, in possession of a data set. Proofs of retrievability (PoR) schemes attempt to detect, with all but negligible probability of error, whether an entity is maliciously or otherwise withholding data.
4 |
5 | ## Proofs of data possession
6 |
7 | In our definition, a robust PoDP scheme is one that can prove, with all but negligible probability of error, that a storage provider is currently or has been in possession of a data set.
8 |
9 | To state the obvious first: the most secure data possession scheme is to "show" the data every time it is requested. In other words, the most secure way of proving that someone has a file is for them to show the file on every request. Alas, due to bandwidth restrictions, this is not practical in the majority of cases; hence, the main objective of data possession schemes is overcoming the limitation of _having to show the entire dataset every time_ to prove its possession.
10 |
11 | A common technique to overcome this limitation is random sampling with fraud proofs. This consists of selecting a random subset of the data instead of the entire data set. The rationale is that if the prover doesn't know which pieces are going to be requested next, it is reasonable to expect that it will keep all the pieces around to avoid being caught; a well-behaved prover would certainly do that.
12 |
13 | The mechanism described above sounds reasonable; unfortunately, naive random sampling provides only weak guarantees of possession.
14 |
15 | Let's look at a naive but relatively common random sampling scheme (a sketch in code follows the list).
16 |
17 | - Given a file $F$, split it into a set of chunks $C$, where $C=\{c_1, c_2,...,c_n\}$
18 | - Next, generate a publicly available digest $D$ of the file from the chunks in $C$ - for example a Merkle tree
19 | - To prove existence of the file, provide random chunks from the set $C$, along with a part of the digest $D$
20 | - A verifier takes the random chunks provided and attempts to verify that they match the digest; if they do, they're assumed to be part of the set $C$
21 |
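22 | Here is that sketch in Python, using a toy Merkle tree; the helper names are ours:
23 |
24 | ```python
25 | # Naive scheme: build a Merkle digest D over the chunks, then prove
26 | # possession of randomly sampled chunks with inclusion paths.
27 | import hashlib, random
28 |
29 | H = lambda b: hashlib.sha256(b).digest()
30 |
31 | def merkle_root(leaves):
32 |     level = [H(l) for l in leaves]
33 |     while len(level) > 1:
34 |         if len(level) % 2: level.append(level[-1])   # pad odd levels
35 |         level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
36 |     return level[0]
37 |
38 | def merkle_path(leaves, idx):
39 |     """Sibling hashes (and sides) proving leaves[idx] is under the root."""
40 |     level, path = [H(l) for l in leaves], []
41 |     while len(level) > 1:
42 |         if len(level) % 2: level.append(level[-1])
43 |         sib = idx ^ 1
44 |         path.append((level[sib], sib < idx))
45 |         level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
46 |         idx //= 2
47 |     return path
48 |
49 | def verify_chunk(root, chunk, path):
50 |     node = H(chunk)
51 |     for sib, sib_is_left in path:
52 |         node = H(sib + node) if sib_is_left else H(node + sib)
53 |     return node == root
54 |
55 | chunks = [bytes([i]) * 32 for i in range(8)]   # C = {c_1, ..., c_n}
56 | D = merkle_root(chunks)                        # public digest D
57 | i = random.randrange(len(chunks))              # randomly sampled chunk
58 | assert verify_chunk(D, chunks[i], merkle_path(chunks, i))
59 | ```
60 |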
22 | The problem with this sampling technique, however, is that it can only prove existence of the pieces that have been sampled at that particular point in time, and nothing more.
23 |
24 | For example, let's say that the prover $P$ supplies randomly sampled chunks $c_\alpha$ from the set $C$ at discrete time intervals $T$ to a verifier $V$. At time $t_0$, $P$ successfully supplies the set $\{c_1, c_7, c_9\}$ requested by $V$; at time $t_1$ it supplies the set $\{c_2, c_8, c_5\}$; at time $t_2$ it supplies the set $\{c_4, c_3, c_6\}$, and so on.
25 |
26 | Each of these sampling steps is statistically independent from the others, i.e. the set provided at $t_3$ doesn't imply that the sets from $t_2$, $t_1$ and $t_0$ are still being held by the prover; it only implies possession of the currently provided set. At each verification step, the odds of detecting a malicious or faulty prover are proportional to the fraction of chunks it no longer holds: if the prover dropped 50% of the chunks, the chance of catching it with a sampled chunk is 50%; if it dropped only 5%, the chance is 5%. Moreover, this doesn't establish possession over time, which we defined as another property of PoDP.
27 |
28 | One common misconception is that increasing the sampling rate will somehow change the odds of detecting missing chunks, but that is not the case: at best, it allows detecting that they are missing sooner; the odds in each round stay the same.
29 |
30 | To understand why, let's do a quick refresher on basic statistics. There are two types of statistical events: independent and dependent. In an independent event, the outcome of the previous event does not influence the next outcome. For example, flipping a coin always has a 50% chance of landing on either heads or tails, and throwing it 10 times vs 100000 times does not change these odds. Dependent events, on the other hand, are tied together, and the odds of the next event depend on the outcome of the previous one. For example, if a bag holds 5 marbles, 2 red and 3 blue, the odds of pulling a red marble are 2 in 5 and the odds of pulling a blue one are 3 in 5. If we pull one red marble from the bag, the odds change to 1 in 4, and so on.
31 |
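32 | A short simulation makes the point concrete (parameters invented): a prover that silently dropped 5% of the chunks is caught by a single-chunk sample with probability of roughly 5% in every round, no matter how many rounds have already passed:
33 |
34 | ```python
35 | import random
36 |
37 | n = 1000
38 | missing = set(random.sample(range(n), 50))        # prover dropped 5% of chunks
39 |
40 | def round_detects() -> bool:
41 |     return random.randrange(n) in missing         # sample one chunk per round
42 |
43 | trials = 100_000
44 | print(sum(round_detects() for _ in range(trials)) / trials)   # ~0.05 every round
45 | ```
46 |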
32 | To increase the robustness of random sampling schemes and establish possession over time, each sampling event needs to depend on the previous one. How can we do this? One way is to establish a cryptographic link at each sampling step, such that the next event can only happen after the previous one has completed, thus establishing a chain of proofs.
33 |
34 | Let's extend our naive scheme with this new notion and see how much we can improve on it. For this we need to introduce a new primitive: a publicly known and verifiable random beacon. This can be anything from a verifiable random function (VRF) to most blockchains (piggybacking on the randomness properties of the blockchain). In this scheme, we generate the same publicly known and verifiable digest $D$, but instead of supplying just random chunks along with a part of the digest (merkle proofs of inclusion), we also supply an additional digest generated using a value supplied by the random beacon $B$ and the previous digest $d_{n-1}$. In other words, we establish a cryptographic chain that can only be generated using digests from previous rounds.
35 |
36 | It looks roughly like this. We chunk a file and generate a well-known digest $D$ - let's say it is the root of the merkle tree derived from the chunks. This is also the content address used to refer to the file in the system. Next, we use the digest and a random number from the random beacon to derive a _verification digest_ at each round. The first iteration generates a verification digest $d_0$ by combining the random number from the random beacon and the digest $D$; subsequent rounds use the previous digest (i.e. $d_{n-1}, d_{n-2}, d_{n-3}$, etc.) to generate new digests. As mentioned above, this creates a chain of cryptographic proofs, not unlike the ones in a blockchain, where the next block can only be generated using a valid previous block.
37 |
38 | More formally:
39 |
40 | ($||$ denotes concatenation)
41 |
42 | - Given a file $F$, split it into a set of chunks, where $C=\{c_1, c_2,...,c_n\}$
43 | - Using the chunks in $C$, generate a digest $D$ such that $D=H(C)$
44 | - To prove existence of the file
45 | - Select random chunks $c_\alpha = \{c_1, c_3, c_5\}$, $c_\alpha \subset C$
46 | - Get a random value $r$ from $B$ at time $t_n$, such that $r_n=B(t_n)$
47 | - Using $r_n$, plus $d_{n-1}$, generate a new digest, such that $C_n = \forall \sigma \in C: d_{n-1} || r_n || \sigma$ , and $d_n = H(C_n)$
48 | - At time $t_0$, the digest $d_0$ will be constructed as $C_n = \forall \sigma \in C: D || r_0 || \sigma$ , and $d_0 = H(C_n)$
49 | - We then send $d_n$ and $c_\alpha$ to the verifier
50 | - A verifier takes the supplied values and, using a verification function, first checks that $V(H(c_\alpha), D)$ holds and then checks $V(H(\forall \sigma \in c_\alpha: d_{n-1} || r_n || \sigma), d_n)$
51 |
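52 | A minimal sketch of the digest chain, with a hash standing in for the random beacon $B$; the encodings are illustrative:
53 |
54 | ```python
55 | import hashlib
56 |
57 | H = lambda b: hashlib.sha256(b).digest()
58 |
59 | def beacon(t: int) -> bytes:
60 |     return H(b"beacon" + t.to_bytes(8, "big"))    # stand-in for B(t_n)
61 |
62 | def verification_digest(prev: bytes, r: bytes, chunks) -> bytes:
63 |     """d_n = H(d_{n-1} || r_n || sigma for every chunk sigma in C)."""
64 |     h = hashlib.sha256()
65 |     for c in chunks:
66 |         h.update(prev + r + c)
67 |     return h.digest()
68 |
69 | chunks = [bytes([i]) * 64 for i in range(8)]
70 | D = H(b"".join(chunks))                           # well-known digest D = H(C)
71 | d = verification_digest(D, beacon(0), chunks)     # d_0 chains from D
72 | for t in range(1, 5):
73 |     d = verification_digest(d, beacon(t), chunks) # d_n chains from d_{n-1}
74 | ```
75 |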
52 | The first question to ask is how much this scheme has improved on our naive random sampling approach. Assuming that our cryptographic primitives have very low chances of collision and our randomness source is unbiased, the chances of forging a proof from a subset of the data are negligible; moreover, we can safely reduce the number of sampled chunks to just a few and still preserve a high level of certainty that the prover is in possession of the data, thus keeping the initial requirement of reduced bandwidth consumption.
53 |
54 | However, in its current non-interactive form, the digest can be forged by combining the already-known requested chunks and complementing the rest with random data. In order to prevent this, we need to split digest generation and verification into independent steps, i.e. make the scheme interactive.
55 |
56 | In an interactive scheme, where the prover first generates and sends a digest $d_n$ and the verifier then requests random chunks from the prover, we can prevent these types of attacks. Every interactive scheme comes with the additional overhead of multiple rounds, but as we'll see next, we can use this property to build a robust proof of retrievability scheme.
57 |
58 | ## Proofs of Retrievability
59 |
60 | A robust PoR scheme is one that can detect, with all but negligible probability of error, that a node is maliciously or otherwise withholding data.
61 |
62 | A particularly tricky problem in PoR schemes is the "fisherman" dilemma as described by [Vitalik Buterin](https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding).
63 |
64 | To illustrate the issue, let's look at a simple example:
65 |
66 | - Suppose that $P$ is storing a set $C=\{c_1, c_2..c_n\}$
67 | - A node $\rho$, attempts to retrieve $C$ from $P$
68 | - If $P$ is maliciously or otherwise unable to serve the request, $\rho$ needs to raise an alarm
69 |
70 | However, due to the "fisherman's" dilemma, proving that $P$ withheld the data is impossible. Here is the relevant quote:
71 |
72 | > because not publishing data is not a uniquely attributable fault - in any scheme where a node ("fisherman") has the ability to "raise the alarm" about some piece of data not being available, if the publisher then publishes the remaining data, all nodes who were not paying attention to that specific piece of data at that exact time cannot determine whether it was the publisher that was maliciously withholding data or whether it was the fisherman that was maliciously making a false alarm.
73 |
74 | From the above, we can deduce that unless the entire network is observing the interaction between the requesting node and the responding node, it's impossible to tell for sure who is at fault.
75 |
76 | The "fisherman's dilemma" outlines two problems:
77 |
78 | - "all nodes who were not paying attention to that specific piece of data at that exact time"
79 | - "was the publisher that was maliciously withholding data or whether it was the fisherman that was maliciously making a false alarm"
80 |
81 | This can be further summarized as:
82 |
83 | 1. All interactions should be observable and
84 | 2. All interactions should be reproducible and verifiable
85 |
86 | The first requirement of observability can be broken down into observing the requester and observing the responder.
87 |
88 | 1. In the case of the responder, if it knows that no one is observing, then there is no way anyone can prove that it withheld the data
89 | 2. In the case of the requester, it is impossible both to prove wrongdoing on the part of the responder and to prove that the requester is being honest in its claims
90 |
91 | We can invert the first proposition: instead of "if it knows that no one is observing", we can restate it as "if it doesn't know when it's being observed", introducing uncertainty into the proposition. If the responder never knows for sure whether it's being observed, then it's reasonable for a rational responder to assume that it is being observed at all times.
92 |
93 | There isn't a way of directly addressing the second issue, because there is still no way of verifying whether the requester is being honest without observing the entire network, which is intractable. However, if we instead delegate that function to a subset of dedicated nodes that observe both the network and each other, then the requester never needs to sound the alarm itself; it is up to the dedicated nodes to detect and take action against the offending responder.
94 |
95 | However, this scheme is still incomplete: it is reasonable to assume that the responder could deny access to regular requesters while responding correctly to the dedicated verifiers. The solution is to anonymize the validators, so that a storing node cannot tell whether it is being audited or simply queried for data. This pushes storing nodes to respond to every request as promptly as possible.
96 |
97 | Our second requirement states that all interactions should be reproducible and verifiable. It turns out that this is already partially solved by our PoDP scheme. In fact, the seemingly undesirable interactive property can be used to extend the PoDP scheme into a PoR scheme.
98 |
99 | ## Extending PoDP to PoR
100 |
101 | To extend the PoDP with PoR properties, we simply need to turn it into an interactive scheme.
102 |
103 | Suppose that we have a trustless network of storing (responder), validating, and regular (requester) nodes. Storing nodes generate and submit a verification digest $d_n$ at specific intervals $t_n$. Validator nodes collectively listen for (observe) these proofs, which consist of only the digest. Proofs are aggregated and persisted, such that it is possible to retrieve them at a later time and precisely establish when a node was last in possession of the dataset. Proofs are only valid for a certain window of time, so if a node goes offline and fails to provide a proof for several intervals, this is detected and the node is marked offline. By itself this is sufficient to prove neither possession nor availability, but it does establish a verifiable chain of events.
104 |
105 | Next, at random intervals, an odd-sized subset of the validators is selected, and each validator requests its own unique random set of chunks from the storing node. Each validator then verifies the chunks against $d_n$. If the chunks match for all validators, each generates an approval stamp, and the stamps are aggregated and persisted in a blockchain.
106 |
107 | If the chunks match for only some validators, the majority decides whether the proof is valid; because the number of validators is odd, a tie is impossible. Neither the validators nor the storing nodes know ahead of time which subset they will end up in, and each validator generates its own random set of chunks to probe.
108 |
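109 | Here is a minimal sketch of such a committee round; the committee size, probe count, and all names are illustrative assumptions:
110 |
111 | ```python
112 | import random
113 |
114 | def committee(validators: list[str], size: int = 5) -> list[str]:
115 |     # Odd-sized committee, so the majority verdict can never tie.
116 |     assert size % 2 == 1
117 |     return random.sample(validators, size)
118 |
119 | def probe_indices(n_chunks: int, k: int) -> list[int]:
120 |     # Each validator draws its own random chunks to challenge.
121 |     return random.sample(range(n_chunks), k)
122 |
123 | def verdict(votes: list[bool]) -> bool:
124 |     # The majority of the committee decides whether the proof stands.
125 |     return sum(votes) > len(votes) // 2
126 |
127 | # e.g. a 5-validator committee where one validator saw bad chunks:
128 | print(verdict([True, True, False, True, True]))  # True: proof accepted
129 | ```
130 |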
109 | To reduce bandwidth requirements and load on the network, validation happens periodically, for example every two hours within a 24-hour window. If a faulty node misses a window to submit its proof (digest), it is marked offline and penalized; if a malicious node submits several faulty proofs in succession, it is detected during the next validation window and penalized retroactively for every faulty proof. If enough proofs are missed, and assuming that all participants are bound by collateral, the faulty or malicious node is booted from the set of available storing nodes and loses its stake.
110 |
111 | In a similar manner, if a node from the validator subset fails to submit its stamp on time, it is penalized with a portion of its collateral and eventually booted off the network.
112 |
113 | Well-behaved nodes are rewarded for following the protocol correctly; faulty or malicious nodes are detected, penalized, and eventually booted out.
114 |
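115 | The slashing and eviction logic above can be sketched as simple bookkeeping; the amounts and thresholds here are illustrative assumptions, not protocol parameters.
116 |
117 | ```python
118 | from dataclasses import dataclass
119 |
120 | SLASH_PER_FAULT = 10  # illustrative amounts and thresholds
121 | EVICTION_LIMIT = 3
122 |
123 | @dataclass
124 | class StoringNode:
125 |     collateral: int = 100
126 |     faults: int = 0
127 |     evicted: bool = False
128 |
129 |     def settle_window(self, proofs_ok: list[bool]) -> None:
130 |         # One entry per proof interval since the last validation;
131 |         # every faulty or missing proof is penalized retroactively.
132 |         for ok in proofs_ok:
133 |             if not ok:
134 |                 self.faults += 1
135 |                 self.collateral -= SLASH_PER_FAULT
136 |         if self.faults >= EVICTION_LIMIT:
137 |             self.evicted = True
138 |             self.collateral = 0  # booted out; the stake is lost
139 | ```
140 |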
115 | ## Conclusion
116 |
117 | To understand whether the described PoDP and PoR schemes satisfy the requirements of being robust, let's first outline what those requirements are:
118 |
119 | 1. Establish possession of data over time
120 | 2. Establish possession of data at the current time
121 | 3. Detect faulty or malicious nodes that are withholding data
122 | 4. Circumvent the "fisherman's dilemma"
123 |
124 | Now, does our proposed scheme satisfy these requirements?
125 |
126 | - We can reliably say that a node has been in possession of data over time by issuing a cryptographically linked chain of proofs - this satisfies 1.
127 | - We can reliably tell whether a node is currently in possession of a data set by interactively probing for randomly selected chunks from the original data set and matching them against the current digest - this satisfies 2.
128 | - We introduced uncertainty through anonymity and randomness into the interactive verification process, which allows us to satisfy 3 and 4.
129 | - Only dedicated nodes need to monitor the network, which makes observability tractable
130 | - Since nodes don't know when they are observed, rational nodes can only assume that they are always observed, thus discouraging data withholding and encouraging availability
131 |
132 | Furthermore, assuming that the broadcast proofs are much smaller than the original data set, the bandwidth requirements stay low; we can improve on this further by reducing the probing frequency. Since faults can still be reliably traced back to their origin, nodes can be punished retroactively, which further reduces the possibility of gaming the protocol.
133 |
--------------------------------------------------------------------------------