├── Readme.md ├── analysis ├── Modeling_durability_using_replication_state_and_related_metrics_(Markov_Chain_Model).ipynb ├── PoR_test_analysis_with_multiple_storage_nodes.ipynb ├── TokenValuation.ipynb ├── block-discovery-sim │ ├── .Rbuildignore │ ├── .Rprofile │ ├── .gitignore │ ├── DESCRIPTION │ ├── R │ │ ├── collate.R │ │ ├── node.R │ │ ├── partition.R │ │ ├── sim.R │ │ └── stats.R │ ├── README.md │ ├── block-discovery-sim.Rmd │ ├── block-discovery-sim.Rproj │ ├── renv.lock │ ├── renv │ │ ├── .gitignore │ │ ├── activate.R │ │ └── settings.json │ └── tests │ │ ├── testthat.R │ │ └── testthat │ │ ├── test-partition.R │ │ └── test-stats.R └── block-discovery.Rmd ├── design ├── Merkle.md ├── contract-deployment.md ├── marketplace.md ├── metadata-overhead.md ├── proof-erasure-coding.md ├── proof-erasure-coding.ods ├── sales.md ├── slot-reservations.md └── storage-proof-timing.md ├── evaluations ├── account abstraction.md ├── arweave.md ├── eigenlayer.md ├── filecoin.md ├── ipfs.md ├── rollups.md ├── rollups.ods ├── sia.md ├── sidechains.md ├── sidechains.ods ├── statechannels │ ├── disputes.md │ └── overview.md ├── storj.md ├── sui.md ├── swarm.md └── zeroknowledge.md ├── incentives-rationale.md ├── meetings └── bogota2022.md ├── papers ├── Compact_Proofs_of_Retrievability │ └── README.md ├── Economics_of_BitTorrent_communities │ └── README.md ├── Falcon_Codes_Fast_Authenticated_LT_Codes │ └── README.md ├── Filecoin_A_Decentralized_Storage_Network │ └── README.md ├── Peer-to-Peer_Storage_System_a_Practical_Guideline_to_be_lazy │ └── README.md ├── README.md ├── Sui │ └── sui.md └── template.md ├── project-overview.md └── robust-data-possesion-scheme.md /Readme.md: -------------------------------------------------------------------------------- 1 | Codex Research 2 | =============== 3 | 4 | Contains research for the Codex peer-to-peer storage network. 
5 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^renv$ 2 | ^renv\.lock$ 3 | ^.*\.Rproj$ 4 | ^\.Rproj\.user$ 5 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/.Rprofile: -------------------------------------------------------------------------------- 1 | source("renv/activate.R") 2 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .RData 3 | .Rhistory 4 | *nb.html 5 | rsconnect -------------------------------------------------------------------------------- /analysis/block-discovery-sim/DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: blockdiscoverysim 2 | Title: Block Discovery Simulator 3 | Version: 0.0.0.9000 4 | Description: Simple Simulation for Block Discovery 5 | Encoding: UTF-8 6 | Roxygen: list(markdown = TRUE) 7 | RoxygenNote: 7.2.3 8 | Depends: 9 | shiny (>= 1.7.4.1), 10 | tidyverse (>= 2.0.0), 11 | purrr (>= 1.0.1), 12 | VGAM (>= 1.1-8), 13 | R6 (>= 2.2.2), 14 | plotly (>= 4.10.2) 15 | Suggests: 16 | devtools, 17 | testthat (>= 3.0.0) 18 | Config/testthat/edition: 3 19 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/R/collate.R: -------------------------------------------------------------------------------- 1 | # We do this hack because rsconnect doesn't seem to like us bundling the app 2 | # as a package. 3 | 4 | order <- c( 5 | 'R/partition.R', 6 | 'R/stats.R', 7 | 'R/node.R', 8 | 'R/sim.R' 9 | ) 10 | 11 | library(R6) 12 | library(purrr) 13 | library(tidyverse) 14 | 15 | lapply(order, source) 16 | 17 | run <- function() { 18 | rmarkdown::run('./block-discovery-sim.Rmd') 19 | } -------------------------------------------------------------------------------- /analysis/block-discovery-sim/R/node.R: -------------------------------------------------------------------------------- 1 | Node <- R6Class( 2 | 'Node', 3 | public = list( 4 | node_id = NULL, 5 | storage = NULL, 6 | 7 | initialize = function(node_id, storage) { 8 | self$node_id = node_id 9 | self$storage = storage 10 | }, 11 | 12 | name = function() paste0('node ', self$node_id) 13 | ) 14 | ) -------------------------------------------------------------------------------- /analysis/block-discovery-sim/R/partition.R: -------------------------------------------------------------------------------- 1 | #' Generates a random partition of a block array among a set of nodes. The 2 | #' partitioning follows the supplied distribution. 3 | #' 4 | #' @param block_array a vector containing blocks 5 | #' @param network_size the number of nodes in the network 6 | #' @param distribution a sample generator which generates a vector of n 7 | #' samples when called as distribution(n). 8 | #' 9 | partition <- function(block_array, network_size, distribution) { 10 | buckets <- distribution(length(block_array)) 11 | 12 | # We won't attempt to shift the data, instead just checking that it is 13 | # positive. 
14 |   stopifnot(all(buckets >= 0))
15 | 
16 |   buckets <- trunc(buckets * (network_size - 1) / max(buckets)) + 1 # rescale samples onto integer bucket indices 1..network_size
17 |   sapply(1:network_size, function(i) which(buckets == i))
18 | }
19 | 
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/R/sim.R:
--------------------------------------------------------------------------------
1 | run_download_simulation <- function(swarm, max_steps, coding_rate) {
2 |   total_blocks <- sum(sapply(swarm, function(node) length(node$storage)))
3 |   required_blocks <- round(total_blocks * coding_rate)
4 |   completed_blocks <- 0
5 |   storage <- c()
6 | 
7 |   step <- 1
8 |   stats <- Stats$new()
9 |   while ((step < max_steps) && (completed_blocks < required_blocks)) {
10 |     neighbor <- swarm |> select_neighbor()
11 |     storage <- neighbor |> download_blocks(storage)
12 | 
13 |     completed_blocks <- length(storage)
14 |     stats$add_stat(
15 |       step = step,
16 |       selected_neighbor = neighbor$node_id,
17 |       total_blocks = total_blocks,
18 |       required_blocks = required_blocks,
19 |       completed_blocks = completed_blocks
20 |     )
21 | 
22 |     step <- step + 1
23 |   }
24 | 
25 |   stats$as_tibble()
26 | }
27 | 
28 | select_neighbor <- function(neighborhood) neighborhood[[sample(1:length(neighborhood), size = 1)]]
29 | 
30 | download_blocks <- function(neighbor, storage) unique(c(neighbor$storage, storage))
31 | 
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/R/stats.R:
--------------------------------------------------------------------------------
1 | Stats <- R6Class(
2 |   'Stats',
3 |   public = list(
4 |     stats = NULL,
5 | 
6 |     initialize = function() {
7 |       self$stats = list(list())
8 |     },
9 | 
10 |     add_stat = function(...) {
11 |       self$stats <- c(self$stats, list(rlang::dots_list(...)))
12 |       self
13 |     },
14 | 
15 |     as_tibble = function() purrr::map_df(self$stats, as_tibble)
16 |   )
17 | )
--------------------------------------------------------------------------------
/analysis/block-discovery-sim/README.md:
--------------------------------------------------------------------------------
1 | Simple Block Discovery Simulator
2 | ================================
3 | 
4 | Simple simulator for understanding block discovery dynamics.
5 | 
6 | ## Hosted Version
7 | 
8 | You can access the block discovery simulator on [shinyapps](https://gmega.shinyapps.io/block-discovery-sim/)
9 | 
10 | ## Running
11 | 
12 | You will need R 4.1.2 with [renv](https://rstudio.github.io/renv/) installed. I also strongly recommend you run this
13 | from [RStudio](https://posit.co/products/open-source/rstudio/) as you will otherwise need to [install pandoc and set it up manually before running](https://stackoverflow.com/questions/28432607/pandoc-version-1-12-3-or-higher-is-required-and-was-not-found-r-shiny).
14 | 
15 | Once that's taken care of and you are in the R terminal (Console in RStudio), you will need to install the dependencies first:
16 | 
17 | ```R
18 | > renv::install()
19 | ```
20 | 
21 | If you are outside RStudio, then you will need to restart your R session.
After that, you should load the package: 22 | 23 | ```R 24 | devtools::load_all() 25 | ``` 26 | 27 | run the tests: 28 | 29 | ```R 30 | testthat::test_package('blockdiscoverysim') 31 | ``` 32 | 33 | and, if all goes well, launch the simulator: 34 | 35 | ```R 36 | run() 37 | ``` 38 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/block-discovery-sim.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Block Discovery Sim" 3 | output: html_document 4 | runtime: shiny 5 | 6 | # rsconnect uses this 7 | resource_files: 8 | - R/node.R 9 | - R/partition.R 10 | - R/sim.R 11 | - R/stats.R 12 | --- 13 | 14 | ## Goal 15 | 16 | The goal of this experiment is to understand -- under different assumptions about how blocks are partitioned among nodes -- how long a hypothetical downloader would take to discover enough blocks to make a successful download from storage nodes by randomly sampling the swarm. We therefore do not account for download times or network latency - we just measure how many times the node randomly samples the swarm before figuring out where enough of the blocks are. 17 | 18 | ```{r echo = FALSE, message = FALSE} 19 | library(shiny) 20 | library(plotly) 21 | 22 | source('R/collate.R') 23 | 24 | knitr::opts_chunk$set(echo = FALSE, message = FALSE) 25 | ``` 26 | 27 | ```{r} 28 | runs <- 10 29 | max_steps <- Inf 30 | ``` 31 | 32 | ```{r} 33 | DISTRIBUTIONS <- list( 34 | 'uniform' = runif, 35 | 'exponential' = rexp, 36 | 'pareto' = VGAM::rparetoI 37 | ) 38 | ``` 39 | 40 | 41 | ## Network 42 | 43 | * Select the parameters of the network you would like to use in the experiments. 44 | * Preview the shape of the partitions by looking at the chart. 45 | * Generate more random partitions by clicking "Generate Another". 46 | 47 | ```{r} 48 | fluidPage( 49 | sidebarPanel( 50 | numericInput( 51 | 'swarm_size', 52 | label = 'size of the swarm', 53 | value = 20, 54 | min = 1, 55 | max = 10000 56 | ), 57 | numericInput( 58 | 'file_size', 59 | label = 'number of blocks in the file', 60 | value = 1000, 61 | min = 1, 62 | max = 1e6 63 | ), 64 | selectInput( 65 | 'partition_distribution', 66 | label = 'shape of the distribution for the partitions', 67 | choices = names(DISTRIBUTIONS) 68 | ), 69 | actionButton( 70 | 'generate_network', 71 | label = 'Generate Another' 72 | ) 73 | ), 74 | mainPanel( 75 | plotOutput('network_sample') 76 | ) 77 | ) 78 | ``` 79 | 80 | ```{r} 81 | observe({ 82 | input$generate_network 83 | output$network_sample <- renderPlot({ 84 | purrr::map_dfr( 85 | generate_network( 86 | number_of_blocks = input$file_size, 87 | network_size = input$swarm_size, 88 | partition_distribution = input$partition_distribution 89 | ), 90 | function(node) tibble(node_id = node$node_id, blocks = length(node$storage)) 91 | ) %>% 92 | ggplot() + 93 | geom_bar( 94 | aes(x = node_id, y = blocks), 95 | stat = 'identity', 96 | col = 'black', 97 | fill = 'lightgray' 98 | ) + 99 | labs(x = 'node') + 100 | theme_minimal() 101 | })} 102 | ) 103 | ``` 104 | 105 | ## Experiment 106 | 107 | Select the number of experiment runs. Each experiment will generate a network and then simulate a download operation where a hypothetical node: 108 | 109 | 1. joins the swarm; 110 | 2. samples one neighbor per round in a round-based download protocol and asks for its block list. 111 | 112 | The experiment ends when the downloading node recovers "enough" blocks. 
If we let the total number of blocks in the file be $n$ and the coding rate $r$, then the simulation ends when the set of blocks $D$ discovered by the downloading node satisfies $\left|D\right| \geq n\times r$. 113 | 114 | We then show a "discovery curve": a curve that emerges as we look at the percentage of blocks the downloader has discovered so far as a function of the number of contacts it made. 115 | 116 | The curve is actually an average of all experiments, meaning that a point $(5, 10\%)$ should be interpreted as: "on average, after $5$ contacts, a downloader will have discovered $10\%$ of the blocks it needs to get a successful download". We show the $5^{th}$ percentile and the $95^{th}$ percentiles of the experiments as error bands around the average. 117 | 118 | ```{r} 119 | fluidPage( 120 | fluidRow( 121 | class='well', 122 | column( 123 | width = 6, 124 | sliderInput('runs', 'How many experiments to run', min = 10, max = 10000, value = 10), 125 | actionButton('do_run', 'Run') 126 | ), 127 | column( 128 | width = 6, 129 | numericInput('coding_rate', 'Coding rate (percentage of blocks required for a successful download)', 130 | min = 0.1, max = 1.0, step = 0.05, value = 0.5) 131 | ) 132 | ) 133 | ) 134 | ``` 135 | 136 | ```{r} 137 | experiment_results <- reactive({ 138 | lapply(1:input$runs, function(i) { 139 | generate_network( 140 | number_of_blocks = input$file_size, 141 | network_size = input$swarm_size, 142 | partition_distribution = input$partition_distribution 143 | ) |> run_experiment(run_id = i, coding_rate = input$coding_rate) 144 | }) 145 | }) |> bindEvent( 146 | input$do_run, 147 | ignoreNULL = TRUE, 148 | ignoreInit = TRUE 149 | ) 150 | ``` 151 | 152 | ```{r} 153 | renderPlotly({ 154 | plot_results(do.call(rbind, experiment_results())) 155 | }) 156 | ``` 157 | 158 | ```{r} 159 | generate_network <- function(number_of_blocks, network_size, partition_distribution) { 160 | block_array <- sample(1:number_of_blocks, replace = FALSE) 161 | 162 | partitions <- partition(block_array, network_size, DISTRIBUTIONS[[partition_distribution]]) 163 | sapply(1:network_size, function(i) Node$new( 164 | node_id = i, 165 | storage = partitions[[i]]) 166 | ) 167 | } 168 | ``` 169 | 170 | ```{r} 171 | run_experiment <- function(network, coding_rate, run_id = 0) { 172 | run_download_simulation( 173 | swarm = network, 174 | coding_rate = coding_rate, 175 | max_steps = max_steps 176 | ) |> mutate( 177 | run = run_id 178 | ) 179 | } 180 | ``` 181 | 182 | ```{r} 183 | plot_results <- function(results) { 184 | stats <- results |> 185 | mutate(completion = pmin(1.0, completed_blocks / required_blocks)) |> 186 | group_by(step) |> 187 | summarise( 188 | average = mean(completion), 189 | p_95 = quantile(completion, 0.95), 190 | p_05 = quantile(completion, 0.05), 191 | .groups = 'drop' 192 | ) 193 | 194 | plotly::ggplotly(ggplot(stats, aes(x = step)) + 195 | geom_line(aes(y = average), col = 'black', lwd = 1) + 196 | geom_ribbon(aes(ymin = p_05, ymax = p_95), fill = 'grey80', alpha = 0.5) + 197 | labs(x = 'contacts', y = 'blocks discovered (%)') + 198 | scale_y_continuous(labels = scales::percent_format()) + 199 | theme_minimal()) 200 | } 201 | ``` 202 | 203 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/block-discovery-sim.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | 
EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | BuildType: Package 16 | PackageUseDevtools: Yes 17 | PackageInstallArgs: --no-multiarch --with-keep.source 18 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/renv/.gitignore: -------------------------------------------------------------------------------- 1 | library/ 2 | local/ 3 | cellar/ 4 | lock/ 5 | python/ 6 | sandbox/ 7 | staging/ 8 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/renv/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "bioconductor.version": null, 3 | "external.libraries": [], 4 | "ignored.packages": [], 5 | "package.dependency.fields": [ 6 | "Imports", 7 | "Depends", 8 | "LinkingTo" 9 | ], 10 | "r.version": null, 11 | "snapshot.type": "implicit", 12 | "use.cache": true, 13 | "vcs.ignore.cellar": true, 14 | "vcs.ignore.library": true, 15 | "vcs.ignore.local": true, 16 | "vcs.manage.ignores": true 17 | } 18 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/tests/testthat.R: -------------------------------------------------------------------------------- 1 | # This file is part of the standard setup for testthat. 2 | # It is recommended that you do not modify it. 3 | # 4 | # Where should you do additional test configuration? 5 | # Learn more about the roles of various files in: 6 | # * https://r-pkgs.org/tests.html 7 | # * https://testthat.r-lib.org/reference/test_package.html#special-files 8 | 9 | library(testthat) 10 | 11 | test_check("blockdiscoverysim") 12 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/tests/testthat/test-partition.R: -------------------------------------------------------------------------------- 1 | test_that( 2 | "should partition into linearly scaled buckets", { 3 | samples <- c(1, 100, 500, 800, 850) 4 | 5 | partitions <- partition( 6 | block_array = 1:5, 7 | network_size = 4, 8 | distribution = function(n) samples[1:n] 9 | ) 10 | 11 | expect_equal(partitions, list( 12 | c(1, 2), 13 | c(3), 14 | c(4), 15 | c(5)) 16 | ) 17 | } 18 | ) 19 | -------------------------------------------------------------------------------- /analysis/block-discovery-sim/tests/testthat/test-stats.R: -------------------------------------------------------------------------------- 1 | test_that( 2 | "should collect stats as they are input", { 3 | stats <- Stats$new() 4 | 5 | stats$add_stat(a = 1, b = 2, name = 'hello') 6 | stats$add_stat(a = 1, b = 3, name = 'world') 7 | 8 | expect_equal( 9 | stats$as_tibble(), 10 | tribble( 11 | ~a, ~b, ~name, 12 | 1, 2, 'hello', 13 | 1, 3, 'world', 14 | ) 15 | ) 16 | } 17 | ) 18 | -------------------------------------------------------------------------------- /analysis/block-discovery.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Block Discovery Problem" 3 | output: 4 | bookdown::gitbook: 5 | number_sections: false 6 | --- 7 | 8 | $$ 9 | \newcommand{\rv}[1]{\textbf{#1}} 10 | \newcommand{\imin}{\rv{I}_{\text{min}}} 11 | $$ 12 | 13 | ## Problem Statement 14 | 15 | Let $F = \left\{b_1, \cdots, b_m\right\}$ be an erasure-coded file, and let $O = \left\{o_1, \cdots, o_n\right\}$ be a set of nodes storing that file. 
We define a _storage function_ $s : O \longrightarrow 2^F$ as a function mapping each node in $O$ to the subset of $F$ it stores.
16 | 
17 | In the simplified block discovery problem, we have a _downloader node_ which is attempting to construct a subset $D \subseteq F$ of blocks by repeatedly sampling nodes from $O$. "Discovery", in this context, can be seen as the downloader node running a round-based protocol where, at round $i$, it samples a random contact $o_i$ and learns about $s(o_i)$.
18 | 
19 | To make this slightly more formal, we denote by $D_i \subseteq F$ the set of blocks that the downloader has learned after the $i^{th}$ contact. Given the way the protocol works, we have that:
20 | 
21 | $$
22 | \begin{equation}
23 | D_i = D_{i - 1} \cup s(o_i)
24 | (\#eq:discovery)
25 | \end{equation}
26 | $$
27 | 
28 | Since the file is erasure coded, the goal of the downloader is to learn some $D_i$ containing at least a fraction $c$ of the blocks, where $0 < c \leq 1$ is determined by the coding rate; that is:
29 | 
30 | $$
31 | \begin{equation}
32 | \left|D_i\right| \geq c \times \left|F\right|
33 | (\#eq:complete)
34 | \end{equation}
35 | $$
36 | 
37 | When $D_i$ satisfies Eq. \@ref(eq:complete), we say that $D_i$ is _complete_. We can then state the problem as follows.
38 | 
39 | **Statement.** Let $\imin$ be a random variable representing the first round at which $D_i$ is complete. We want to estimate $F(i) = \mathbb{P}(\imin \leq i)$; namely, the probability that the downloader has discovered all relevant blocks by round $i$.
40 | 
41 | ## Case (1) - Erasure Coding but no Replication
42 | 
43 | If we assume there is no replication then, unless we contact the same node twice, every node we contact contributes new information. Indeed, the absence of replication implies, necessarily, that the partitions are pairwise disjoint:
44 | 
45 | $$
46 | \begin{equation}
47 | s(o) \cap s(o') = \emptyset, \quad \forall o \neq o' \in O
48 | (\#eq:disjoint)
49 | \end{equation}
50 | $$
51 | 
52 | So that if we are contacting a new node at round $i$, we must necessarily have that:
53 | 
54 | $$
55 | \begin{equation}
56 | \left|D_{i}\right| \stackrel{1}{=} \left|D_{i - 1} \cup s(o_i)\right| \stackrel{2}{=} \left|D_{i - 1}\right| + \left|s(o_i)\right|
57 | (\#eq:monotonic)
58 | \end{equation}
59 | $$
60 | Where (1) follows from Eq. \@ref(eq:discovery), and (2) follows from the $s(o_i)$ being disjoint (Eq. \@ref(eq:disjoint)). This leads to the corollary:
61 | 
62 | **Corollary 1.** After $\lceil c \times n\rceil$ rounds, the downloader will necessarily have learned enough blocks to download $F$.
63 | 
64 | which follows trivially from Eq. \@ref(eq:monotonic) and the implication that $D_{\lceil c \times n\rceil}$ must be complete. $\blacksquare$
65 | 
66 | As for $F(i)$, note that we can estimate the probability of completion by estimating the probability that $|D_i|$ is at least the completion number (Eq. \@ref(eq:complete)). What exactly that looks like, and how tractable it is, however, depends on the formulation we give it.
67 | 
68 | ### Independent Partition Sizes
69 | 
70 | Suppose we knew the distribution for partition sizes in $O$; i.e., we knew that the number of blocks assigned to a node in $O$ follows some distribution $\mathcal{B}$ (e.g., a truncated Gaussian).
71 | 
72 | If we have a "large enough" network, this means we would be able to approximate the number of blocks assigned to each node as $n$ independent random variables $\rv{Y}_i$, where $\rv{Y}_i \sim \mathcal{B}$.
In that case, we would be able to express the total number of blocks learned by the downloader by round $i$ as a random variable $\rv{L}_i$ which represents the sum of the iid random variables $\rv{Y}_j \sim \mathcal{B}$:
73 | 
74 | $$
75 | \begin{equation}
76 | \rv{L}_i \sim \sum_{j = 1}^{i} \rv{Y}_j
77 | (\#eq:learning-sum)
78 | \end{equation}
79 | $$
80 | 
81 | The shape of the distribution would be the $i$-fold convolution of $\mathcal{B}$ with itself, which can be tractable for some distributions.
82 | 
83 | More interestingly, though, Eq. \@ref(eq:learning-sum) allows us to express a $\mathcal{B}$-independent estimate of the average number of rounds a downloader will undergo before completing a download. We have that:
84 | 
85 | $$
86 | \mathbb{E}(\rv{L}_i) = \sum_{j = 1}^i \mathbb{E}(\rv{Y}_j) = i\mathbb{E}(\rv{Y}) = i\times \mu_{\rv{Y}}
87 | $$
88 | 
89 | We can then solve for $i$ and the completion condition to get:
90 | 
91 | $$
92 | \begin{equation}
93 | i \times \mu_{\rv{Y}} \geq c \times |F| \iff i \geq \frac{c \times |F|}{\mu_{\rv{Y}}}
94 | (\#eq:average-completion)
95 | \end{equation}
96 | $$
97 | 
98 | Note that this is intuitive to the point of being trivial. If we let $c = 1$, we get $i \geq |F|/\mu_{\rv{Y}}$, which just means that on average the node will have to sample a number of nodes equal to the number of blocks divided by the average partition size. In practice we can use the empirical mean $\overline{\mu_\rv{Y}} = \frac{1}{n}\sum_{i=1}^{n} \left|s(o_i)\right|$ instead of $\mu_{\rv{Y}}$ to estimate what $i$ can look like.
99 | 
100 | ### Non-Independent Partition Sizes
101 | 
102 | If we cannot approximate partition sizes as independent random variables, then the problem changes. Stripping it down, we can cast it as follows. We have a set of integers $P = \{p_1, \cdots, p_n\}$ representing the sizes of each partition. We then want to understand the distribution of the partial sums for random permutations of $P$.
103 | 
104 | As I understand it, there is no good way of addressing this without running simulations. The difference is that if we assume disjoint partitions then the simulations are a lot simpler, as we do not need to track the contents of $D_i$.
--------------------------------------------------------------------------------
/design/Merkle.md:
--------------------------------------------------------------------------------
1 | 
2 | Merkle tree API proposal (WIP draft)
3 | ------------------------------------
4 | 
5 | Let's collect the possible problems and solutions with constructing Merkle trees.
6 | 
7 | See [section "Final proposal"](#Final-proposal) at the bottom for the concrete
8 | version we decided to implement.
9 | 
10 | ### Vocabulary
11 | 
12 | A Merkle tree, built on a hash function `H`, produces a Merkle root of type `T`.
13 | This is usually the same type as the output of the hash function. Some examples:
14 | 
15 | - SHA1: `T` is 160 bits
16 | - SHA256: `T` is 256 bits
17 | - Poseidon: `T` is one (or a few) finite field element(s)
18 | 
19 | The hash function `H` can also have different types `S` of inputs.
For example:
20 | 
21 | - SHA1 / SHA256 / SHA3: `S` is an arbitrary sequence of bits
22 |   - some less-conforming implementation of these could take a sequence of bytes instead
23 | - Poseidon: `S` is a sequence of finite field elements
24 |   - Poseidon compression function: at most `t-1` field elements (in our case `t=3`, so
25 |     that's two field elements)
26 | - A naive Merkle tree implementation could for example accept only a power-of-two
27 |   sized sequence of `T`
28 | 
29 | Notation: Let's denote a sequence of `T`-s by `[T]`.
30 | 
31 | ### Merkle tree API
32 | 
33 | We usually need at least two types of Merkle tree APIs:
34 | 
35 | - one which takes a sequence `S = [T]` of length `n` as input, and produces an
36 |   output (Merkle root) of type `T`
37 | - and one which takes a sequence of bytes (or even bits, but in practice we probably
38 |   only need bytes): `S = [byte]`
39 | 
40 | We can decompose the latter into the composition of a function
41 | `deserialize : [byte] -> [T]` and the former.
42 | 
43 | ### Naive Merkle tree implementation
44 | 
45 | A straightforward implementation of a binary Merkle tree `merkleRoot : [T] -> T`
46 | could be for example:
47 | 
48 | - if the input has length 1, it's the root
49 | - if the input has even length `2*k`, group it into pairs, apply a
50 |   `compress : (T,T) -> T` compression function, producing the next layer of size `k`
51 | - if the input has odd length `2*k+1`, pad it with an extra element `dummy` of
52 |   type `T`, then apply the procedure for even length, producing the next layer of size `k+1`
53 | 
54 | The compression function could be implemented in several ways:
55 | 
56 | - when `S` and `T` are just sequences of bits or bytes (as in the case of classical hash
57 |   functions like SHA256), we can just concatenate the two leaves of the node and apply the
58 |   hash: `compress(x,y) := H(x|y)`
59 | - in case of hash functions based on the sponge construction (like Poseidon or Keccak/SHA3),
60 |   we can just fill the "capacity part" of the state with a constant (say 0), the "absorbing
61 |   part" of the state with the two inputs, apply the permutation, and extract a single `T`
62 | 
63 | ### Attacks
64 | 
65 | When implemented without enough care (like the above naive algorithm), there are several
66 | possible attacks producing hash collisions or second preimages:
67 | 
68 | 1. The root of any intermediate layer is the same as the root of the original input
69 | 2. The root of `[x_0,x_1,...,x_(2*k)]` (length is `n=2*k+1`) is the same as the root of
70 |    `[x_0,x_1,...,x_(2*k),dummy]` (length is `n=2*k+2`)
71 | 3. when using bytes as the input, already `deserialize` can have similar collision attacks
72 | 4. The root of a singleton sequence is itself
73 | 
74 | Traditional (linear) hash functions usually solve the analogous problems by clever padding.
75 | 
76 | ### Domain separation
77 | 
78 | It's a good practice in general to ensure that different constructions using the same
79 | underlying hash function will never (or at least with a very high probability not) produce the same output.
80 | This is called "domain separation", and it can very loosely remind one of _multihash_; however
81 | instead of adding extra bits of information to a hash (and thus increasing its size), we just
82 | compress the extra information into the hash itself. So the information itself is lost,
83 | however collisions between different domains are prevented.
84 | 
85 | A simple example would be using `H(dom|H(...))` instead of `H(...)`.
The below solutions
86 | can be interpreted as an application of this idea, where we want to separate the different
87 | lengths `n`.
88 | 
89 | ### Possible solutions (for the tree attacks)
90 | 
91 | While the third problem (`deserialize` may not be injective) is similar to the second problem,
92 | let's deal first with the tree problems, and come back to `deserialize` (see below) later.
93 | 
94 | **Solution 0.** Pre-hash each input element. This solves 1), 2) and also 4) (at least
95 | if we choose `dummy` to be something for which we don't expect anybody to find a preimage), but
96 | it doubles the computation time.
97 | 
98 | **Solution 1.** Just prepend the data with the length `n` of the input sequence. Note that any
99 | cryptographic hash function needs an output size of at least 160 bits (and usually at least
100 | 256 bits), so we can always embed the length (surely less than `2^64`) into `T`. This solves
101 | both problems 1) and 2) (the height of the tree is a deterministic function of the length),
102 | and 4) too.
103 | However, a typical application of a Merkle tree is the case where the length of the input
104 | `n=2^d` is a power of two; in this case it looks a little bit "inelegant" to increase the size
105 | to `n=2^d+1`, though the overhead with the above even-odd construction is only `log2(n)`.
106 | An advantage is that you can _prove_ the size of the input with a standard Merkle inclusion proof.
107 | 
108 | Alternative version: append the length, instead of prepending; then the indexing of the leaves does not change.
109 | 
110 | **Solution 2.** Apply an extra compression step at the very end, including the length `n`,
111 | calculating `newRoot = compress(n,origRoot)`. This again solves all 3 problems. However, it
112 | makes the code a bit less regular; and you have to submit the length as part of Merkle proofs
113 | (but it seems hard to avoid that anyway).
114 | 
115 | **Solution 3a.** Use two different compression functions, one for the bottom layer (by bottom
116 | I mean the one next to the input, which is the same as the widest one) and another for all
117 | the other layers. For example you can use `compress(x,y) := H(isBottomLayer|x|y)`.
118 | This solves problem 1).
119 | 
120 | **Solution 3b.** Use two different compression functions, one for the even nodes, and another
121 | for the odd nodes (that is, those with a single child instead of two). Similarly to the
122 | previous case, you can use for example `compress(x,y) := H(isOddNode|x|y)` (note that for
123 | the odd nodes, we will have `y=dummy`). This solves problem 2). Remark: The extra bits of
124 | information (odd/even) added to the last nodes (one in each layer) are exactly the binary
125 | expansion of the length `n`. A disadvantage is that for verifying a Merkle proof, we need to
126 | know for each node whether it's the last or not, so we need to include the length `n` in
127 | any Merkle proof here too.
128 | 
129 | **Solution 3.** Combining **3a** and **3b**, we can solve both problems 1) and 2); so here we add
130 | two bits of information to each node (that is, we need 4 different compression functions).
131 | 4) can always be solved by adding a final compression call.
132 | 
133 | **Solution 4a.** Replace each input element `x_i` with `compress(i,x_i)`. This solves
134 | both problems again (and 4) too), but doubles the amount of computation.
135 | 
136 | **Solution 4b.** Only in the bottom layer, use `H(1|isOddNode|i|x_{2i}|x_{2i+1})` for
137 | compression (note that for the odd node we have `x_{2i+1}=dummy`). This is similar to
138 | the previous solution, but does not increase the amount of computation.
139 | 
140 | **Solution 4c.** Only in the bottom layer, use `H(i|j|x_i|x_j)` for even nodes
141 | (with `i=2*k` and `j=2*k+1`), and `H(i|0|x_i|0)` for the odd node (or alternatively
142 | we could also use `H(i|i|x_i|x_i)` for the odd node). Note: when verifying
143 | a Merkle proof, you still need to know whether the element you prove is the last _and_
144 | odd element, or not. However, instead of submitting the length, you can encode this
145 | into a single bit (not sure if that's much better though).
146 | 
147 | **Solution 5.** Use a different tree shape, where the left subtree is always a complete
148 | (full) binary tree with `2^floor(log2(n-1))` leaves, and the right subtree is
149 | constructed recursively. Then the shape of the tree encodes the number of inputs `n`.
150 | The Blake3 hash uses such a strategy internally. This however complicates the Merkle proofs
151 | (they won't have uniform size anymore).
152 | TODO: think more about this!
153 | 
154 | ### Keyed compression functions
155 | 
156 | How can we have many different compression functions? Consider three case studies:
157 | 
158 | **Poseidon.** The Poseidon family of hashes is built on a (fixed) permutation
159 | `perm : F^t -> F^t`, where `F` is a (large) finite field. For simplicity consider the case `t=3`.
160 | The standard compression function is then defined as:
161 | 
162 |     compress(x,y) := let (u,_,_) = perm(x,y,0) in u
163 | 
164 | That is, we take the triple `(x,y,0)`, apply the permutation to get another triple `(u,v,w)`, and
165 | extract the field element `u` (we could use `v` or `w` too, it shouldn't matter).
166 | Now we can see that it is in fact very easy to generalize this to a _keyed_ (or _indexed_)
167 | compression function:
168 | 
169 |     compress_k(x,y) := let (u,_,_) = perm(x,y,k) in u
170 | 
171 | where `k` is the key. Note that there is no overhead in doing this. And since `F` is pretty
172 | big (in our case, about 253 bits), there is plenty of information we can encode in the key `k`.
173 | 
174 | Note: We probably lose a few bits of security here, if somebody looks for a preimage among
175 | _all_ keys; however in our constructions the keys have a fixed structure, so it's probably
176 | not that dangerous. If we want to be extra safe, we could use `t=4` and `perm(x,y,k,0)`
177 | instead (but that has some computation overhead).
178 | 
179 | **SHA256.** When using SHA256 as our hash function, normally the compression function is
180 | defined as `compress(x,y) := SHA256(x|y)`, that is, concatenate the (bitstring representation of the)
181 | two elements, and apply SHA256 to the resulting (bit)string. Normally `x` and `y` are both
182 | 256 bits long, and so is the result. If we look into the details of how SHA256 is specified,
183 | this is actually wasteful. That's because while SHA256 processes the input in 512 bit chunks,
184 | it also prescribes a mandatory nonempty padding. So when calling SHA256 on an input of size
185 | 512 bits (64 bytes), it will actually process two chunks, the second chunk consisting purely
186 | of padding.
When constructing a binary Merkle tree using a compression function like before,
187 | the input is always of the same size, so this padding is unnecessary; nevertheless, people
188 | usually prefer to follow the standardized SHA256 call. But, if we are processing 1024 bits
189 | anyway, we have a lot of free space to include our key `k`! In fact we can add up to
190 | `512-64-1=447` bits of additional information; so for example
191 | 
192 |     compress_k(x,y) := SHA256(k|x|y)
193 | 
194 | works perfectly well with no overhead compared to `SHA256(x|y)`.
195 | 
196 | **MiMC.** MiMC is another arithmetic construction, however in this
197 | case the starting point is a _block cipher_, that is, we start with
198 | a keyed permutation! Unfortunately MiMC-p/p is a (keyed) permutation
199 | of `F`, which is not very useful for us; however in Feistel mode we
200 | get a keyed permutation of `F^2`, and we can just take the first
201 | component of the output of that as the compressed output.
202 | 
203 | ### Making `deserialize` injective
204 | 
205 | Consider the following simple algorithm to deserialize a sequence of bytes into chunks of
206 | 31 bytes:
207 | 
208 | - pad the input with at most 30 zero bytes such that the padded length becomes divisible
209 |   by 31
210 | - split the padded sequence into `ceil(n/31)` chunks, each 31 bytes.
211 | 
212 | The problem with this is that, for example, `0x123456`, `0x12345600` and `0x1234560000`
213 | all result in the same output.
214 | 
215 | #### About padding in general
216 | 
217 | Let's take a step back, and meditate a little bit about the meaning of padding.
218 | 
219 | What is padding? It's a mapping from a set of sequences into a subset. In our case
220 | we have an arbitrary sequence of bytes, and we want to map it into the subset of sequences
221 | whose length is divisible by 31.
222 | 
223 | Why do we want padding? Because we want to apply an algorithm (in this case a hash function)
224 | to arbitrary sequences, but the algorithm can only handle a subset of all sequences.
225 | In our case we first map the arbitrary sequence of bytes into a sequence of bytes
226 | whose length is divisible by 31, and then map that into a sequence of finite field
227 | elements.
228 | 
229 | What properties do we want from padding? Well, that depends on what properties we
230 | want from the resulting algorithm. In this case we do hashing, so we definitely want
231 | to avoid collisions. This means that our padding should never map two different input
232 | sequences into the same padded sequence (because that would create a trivial collision).
233 | In mathematics, we call such functions "injective".
234 | 
235 | How do you prove that a function is injective? You provide an inverse function,
236 | which takes a padded sequence and outputs the original one.
237 | 
238 | In summary, we need to come up with an injective padding strategy for arbitrary byte
239 | sequences, which always results in a byte sequence whose length is divisible by 31.
240 | 
241 | #### Some possible solutions:
242 | 
243 | - prepend the length (number of input bytes) to the input, say as a 64-bit little-endian integer (8 bytes),
244 |   before padding as above
245 | - or append the length instead of prepending, then pad (note: appending is streaming-friendly; prepending is not)
246 | - or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append
247 |   the length, the result will be divisible by 31). This is _almost_ exactly what SHA2 does.
248 | - use the following padding strategy: _always_ add a single `0x01` byte, then enough `0x00` bytes (possibly none)
249 |   so that the length is divisible by 31. This is usually called the `10*` padding strategy, abusing regexp notation.
250 |   Why does this work? Well, consider an already padded sequence. It's very easy to recover the
251 |   original byte sequence by 1) first removing all trailing zeros; and 2) after that, removing the single
252 |   trailing `0x01` byte. This proves that the padding is an injective function.
253 | - one can easily come up with many similar padding strategies. For example SHA3/Keccak uses `10*1`
254 |   (but on bits, not bytes), and SHA2 uses a combination of `10*` and appending the bit length of the
255 |   original input.
256 | 
257 | Remark: Any safe padding strategy will result in at least one extra field element
258 | if the input length was already divisible by 31. This is both unavoidable in general,
259 | and not an issue in practice (as the size of the input grows, the overhead becomes
260 | negligible). The same thing happens when you SHA256 hash an integer multiple of 64 bytes.
261 | 
262 | 
263 | ### Final proposal
264 | 
265 | We decided to implement the following version.
266 | 
267 | - pad byte sequences (to have length divisible by 31) with the `10*` padding strategy; that is,
268 |   always append a single `0x01` byte, and after that add a number of zero bytes (between 0 and 30),
269 |   so that the resulting sequence has length divisible by 31
270 | - when converting an (already padded) byte sequence to a sequence of field elements,
271 |   split it up into 31 byte chunks, interpret those as little-endian 248-bit unsigned
272 |   integers, and finally interpret those integers as field elements in the BN254 scalar
273 |   prime field (using the standard mapping `Z -> Z/r`).
274 | - when using the Poseidon2 sponge construction to compute a linear hash out of
275 |   a sequence of field elements, we use the BN254 scalar field, `t=3` and `(0,0,domsep)`
276 |   as the initial state, where `domsep := 2^64 + 256*t + rate` is the domain separation
277 |   IV. Note that because `t=3`, we can only have `rate=1` or `rate=2`. We need
278 |   a padding strategy here too (since the input length must be divisible by `rate`):
279 |   we use `10*` again, but here on field elements.
280 |   Remark: For `rate=1` this makes things always a tiny bit slower, but we plan to use
281 |   `rate=2` anyway (as it's twice as fast), and it's better not to have exceptional cases.
282 | - when using Poseidon2 to build a binary Merkle tree, we use "solution #3" from above.
283 |   That is, we use a keyed compression function, with the key being one of `{0,1,2,3}`
284 |   (two bits). The lowest bit is 1 in the bottom-most (that is, the widest) layer,
285 |   and 0 otherwise; the other bit is 1 if it's both the last element of the layer
286 |   _and_ the layer has odd length; 0 otherwise. In odd-length layers, we also add an extra
287 |   0 field element to make them even. This is also valid for the singleton input: in that
288 |   case it's both the last element of an odd-length layer and in the bottommost layer, so
289 |   the root of a singleton input `[x]` will be `H_{key=3}(x|0)`.
290 | - we will use the same strategy when constructing binary Merkle trees with the
291 |   SHA256 hash; in that case, the compression function will be `SHA256(x|y|key)`.
292 |   Note: since SHA256 already uses padding internally, adding the key does not
293 |   result in any overhead.
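
To make the byte-level part of this proposal concrete, here is a minimal sketch in R of the `10*` padding and the 31-byte chunking (the function names are ours and this is only an illustration, not the reference implementation):

```r
# `10*` padding: always append a single 0x01 marker byte, then between 0 and
# 30 zero bytes, so that the padded length is divisible by 31.
pad31 <- function(bytes) {
  padded <- c(bytes, as.raw(0x01))
  zeros  <- (31 - length(padded) %% 31) %% 31
  c(padded, rep(as.raw(0x00), zeros))
}

# The inverse function, which proves the padding injective: drop the trailing
# zeros, then drop the mandatory 0x01 marker.
unpad31 <- function(padded) {
  last <- max(which(padded != as.raw(0x00)))
  stopifnot(padded[last] == as.raw(0x01))
  padded[seq_len(last - 1)]
}

# Split the padded sequence into 31-byte chunks; each chunk would then be
# read as a little-endian 248-bit integer and mapped into the BN254 field.
chunk31 <- function(padded) split(padded, ceiling(seq_along(padded) / 31))

# The collision from the `deserialize` example above disappears:
# pad31(as.raw(c(0x12, 0x34, 0x56))) and pad31(as.raw(c(0x12, 0x34, 0x56, 0x00)))
# are now different sequences.
```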
294 | 
--------------------------------------------------------------------------------
/design/marketplace.md:
--------------------------------------------------------------------------------
1 | A marketplace for storage durability
2 | ====================================
3 | 
4 | We present a new design for a storage marketplace that is both simpler and
5 | includes incentives for repair.
6 | 
7 | Context
8 | -------
9 | 
10 | Our current storage marketplace is designed around the notion of sending out
11 | requests for storage, waiting for hosts to offer storage, and then choosing a
12 | selection from these hosts to start a storage contract with. It requires
13 | separate contracts for each of these hosts, active participation of the client
14 | during the negotiation phase, and does not yet have any provisions for repairing
15 | storage when hosts fail to deliver on their contracts.
16 | 
17 | In this document we describe a new design that is simpler, requires fewer
18 | interactions, and has repair incentives built in.
19 | 
20 | A new design
21 | ------------
22 | 
23 | We propose to create a new type of storage contract, containing a number of slots.
24 | Each of these slots represents an agreement with a storage host to store a part
25 | of the content. When a client wants to store data on the network with durability
26 | guarantees, it posts a storage Request on the blockchain. Hosts that want to
27 | offer storage can fill a slot in the Request.
28 | 
29 | 
30 |                                                            --------
31 |                                         ---- fill slot --- | Host |
32 |                                         |                  --------
33 |                                         |
34 |                                         v
35 |                                    --------------
36 |     ----------                    |              |         --------
37 |     | Client | --- request --->   |  Blockchain  | <--- fill slot --- | Host |
38 |     ----------                    |              |         --------
39 |                                    --------------
40 |                                         ^
41 |                                         |
42 |                                         |                  --------
43 |                                         ---- fill slot --- | Host |
44 |                                                            --------
45 | 
46 | 
47 | The Request contains the content identifier, so that hosts can locate
48 | and download the content. It also contains the reward that hosts receive for
49 | storing the data and the collateral that hosts are expected to deposit. It
50 | contains parameters pertaining to storage proofs and erasure coding. And
51 | finally, it contains the number of hosts that are expected to store the content,
52 | including a small number of host losses that can be tolerated.
53 | 
54 | 
55 |     Request
56 | 
57 |       cid                  # content identifier
58 | 
59 |       reward               # tokens paid per second per filled slot
60 |       collateral           # amount of collateral required per host and slot
61 | 
62 |       proof probability    # frequency at which proofs are required
63 |       proof parameters     # proof of retrievability parameters
64 |       erasure coding       # erasure coding parameters
65 |       dispersal            # dispersal parameter
66 |       repair reward        # amount of tokens paid for repairs
67 | 
68 |       hosts                # number of storage hosts (including loss)
69 |       loss                 # number of allowed host losses
70 | 
71 |       slots                # assigned host slots
72 | 
73 |       expire               # slots need to be filled before timeout
74 | 
75 | Slots
76 | -----
77 | 
78 | Initially all host slots are empty. An empty slot can be filled by anyone by
79 | submitting a correct storage proof together with collateral.
80 | 
81 |      proof &                            proof &
82 |     collateral    proof     missed     collateral        missed
83 |         |           |         |            |                |
84 |         v           v         v            v                v
85 |     -------------------------------------------------------------------
86 |     slot: |///////////////////////|          |////////////////////|
87 |     -------------------------------------------------------------------
88 |                                   |                               |
89 |                                   v                               v
90 |                               collateral                      collateral
91 |                                  lost                            lost
92 | 
93 | 
94 | 
95 |                      ---------------- time ---------------->
96 | 
97 | 
98 | The time interval that a slot is filled by a host determines the host payout;
99 | for every second of the interval a certain amount of tokens is awarded to the
100 | host. Hosts that fill a slot are required to submit frequent proofs of storage.
101 | 
102 | When a certain number of proofs is missed, the slot is considered empty again.
103 | The collateral associated with the slot is mostly burned. Some of it is used to
104 | pay a fee to the node that indicated that proofs were missing, and some of it is
105 | reserved for repairs. An empty slot can be filled again once another host
106 | submits a correct proof together with collateral. Payouts for the time interval
107 | that a slot is empty are burned.
108 | 
109 | Payouts for all hosts are accumulated in the smart contract and paid out at Request
110 | end. This is to ensure that the incentive posed by the collateral is not
111 | diminished over time.
112 | 
113 | Contract lifecycle
114 | ------------------
115 | 
116 | A Request starts when all slots are filled. Regular storage proofs will be
117 | required from the hosts that filled the slots.
118 | 
119 | Some Requests may not attract the required number of hosts, for instance
120 | because the payment is insufficient or the storage demands on the network are
121 | too high. To ensure that such Requests end, we add a timeout to the Request.
122 | If the Request fails to attract sufficient hosts before the timeout is
123 | reached, it is considered cancelled, and the hosts that filled any of the slots
124 | are able to withdraw their collateral. They are also paid for the time interval
125 | before the timeout. The client is able to withdraw the rest of the tokens in the
126 | Request.
127 | 
128 | A Request ends when the money that was paid upfront runs out. The end time can
129 | be calculated from the amount of tokens that are paid out per second. Note that
130 | in our scheme this amount does not change during the lifetime of the Request,
131 | even when proofs are missed and repair happens. This is a desirable property
132 | for hosts; they can be sure of a steady source of income, and a predetermined
133 | Request length. When a Request ends, the hosts may withdraw their collateral.
134 | 
135 | When too many hosts fail to submit storage proofs, and no other hosts take over
136 | the slots that they vacate, then the content can be considered lost. The
137 | Request is considered failed. The collateral of every host in the Request is
138 | burned as an additional incentive for the network hosts to avoid this scenario.
139 | The client is able to retrieve any funds that are left in the Request.
140 | 
141 | 
142 |         |
143 |         | create
144 |         |
145 |         v
146 |    -----------       timeout        -------------
147 |    |   new   | ------------------> | cancelled |
148 |    -----------                      -------------
149 |         |
150 |         | all slots filled
151 |         |
152 |         v
153 |    -----------    too many losses    ----------
154 |    | started | -------------------> | failed |
155 |    -----------                       ----------
156 |         |
157 |         | money runs out
158 |         |
159 |         v
160 |    ------------
161 |    | finished |
162 |    ------------
163 | 
164 | 
165 | Repairs
166 | -------
167 | 
168 | When a slot is freed because of missing too many storage proofs, some
169 | collateral from the host that previously filled the slot is used as an incentive
170 | to repair the lost content. Repair typically involves downloading other parts of
171 | the content and using erasure coding to restore the missing parts. To incentivize
172 | other nodes to perform this repair, there is a repair fee, paid out of the original
173 | host's collateral. The size of the reward is a fraction of the slot's collateral,
174 | where the fraction is a parameter of the smart contract.
175 | 
176 | The size of the reward should be chosen carefully. It should not be too low, to
177 | incentivize hosts in the network to prioritize repairs over filling new slots in
178 | the network. It should also not be too high, to prevent malicious nodes in the
179 | network from trying to disable hosts in an attempt to collect the reward.
180 | 
181 | Renewal
182 | -------
183 | 
184 | When a Request is about to end, and someone in the network wants the Request
185 | to continue for longer, then they can post a new Request with the same content
186 | identifier.
187 | 
188 | We've chosen not to allow top-ups of existing Requests with new funds. Even
189 | though this has many advantages (it's a very simple way to extend the lifetime
190 | of the Request, it allows people to easily chip in to host content, etc.) it
191 | has one big disadvantage: hosts no longer know for how long they'll be bound to
192 | the Request. When a Request is continuously topped up, they cannot leave the
193 | Request without losing their collateral.
194 | 
195 | Dispersal
196 | ---------
197 | 
198 | Here we propose an alternative way to select hosts for slots that is a
199 | variant of the "first come, first served" approach that we described earlier. It
200 | intends to alleviate these problems:
201 | 
202 | 1. a single host can fill all slots in a Request
203 | 2. a small group of powerful hosts is able to fill most slots in the network
204 | 3. resources are wasted when many hosts try to fill the same slot
205 | 
206 | For a client it is beneficial when their content is stored on as many different
207 | hosts as possible, to guard against host failures. Should a single host fill all
208 | slots in the Request, then the failure of this single host could mean that the
209 | content is lost.
210 | 
211 | On a network level, we also want to avoid a situation where a few large players
212 | fill most Request slots, which would mean that the network becomes fairly
213 | centralized.
214 | 
215 | When too many nodes compete for a slot in a Request, and only one is selected,
216 | then this leads to wasted resources in the network. Wasted resources ultimately
217 | lead to a higher cost of storage.
218 | 
219 | To alleviate these problems, we introduce a dispersal parameter in the Request.
220 | The dispersal parameter allows a client to choose the amount of
221 | spreading within the network.
When a slot becomes empty then only a small number
222 | of hosts in the network are allowed to fill the slot. Over time, more and more
223 | hosts will be allowed to fill a slot. Each slot starts with a different set of
224 | allowed hosts.
225 | 
226 | The speed at which new hosts are included is chosen by the client. When the
227 | client chooses a high speed, then very quickly every host in the network will be
228 | able to fill slots. This increases the chances that a single host fills all
229 | slots in a Request. When the client chooses a low speed, then it is more likely
230 | that different hosts fill the slots.
231 | 
232 | We use the Kademlia distance function to indicate which hosts are allowed to
233 | fill a slot.
234 | 
235 |     distance between a and b:  xor(a, b)
236 |     slot start point:          hash(nonce || slot number)
237 |     allowed distance:          elapsed time * dispersal parameter
238 | 
239 | 
240 | Each slot has a different start point:
241 | 
242 |      slot 4   slot 0             slot 2              slot 3       slot 1
243 |        |        |                  |                   |            |
244 |        v        v                  v                   v            v
245 |     ----·--------·------------------·-------------------·-------------·----
246 | 
247 | A host is allowed to fill a slot when the distance between its id and the start
248 | point is less than the allowed distance.
249 | 
250 |                              start point
251 |                                  |                       Kademlia distance
252 |            t=3    t=2    t=1     v
253 |     <------(------(------(------·------)------)------)------>
254 |                ^                                  ^
255 |                |                                  |
256 |          this host is                       this host is
257 |         allowed at t=2                     allowed at t=3
258 | 
259 | Note that even though we use the Kademlia distance function, this bears no
260 | relation to the DHT. We use the blockchain address of the host, not its peer id.
261 | 
262 | This dispersal mechanism still requires modeling to check that it meets its
263 | goals, and to find the optimal value for the dispersal parameter, given certain
264 | network conditions. It is also worth looking into simpler alternatives.
265 | 
266 | Conclusion
267 | ----------
268 | 
269 | The design that we presented here deviates significantly from the previous
270 | marketplace design.
271 | 
272 | There is no explicit negotiation phase for Requests. Clients are no
273 | longer able to choose which hosts will be responsible for keeping the content on
274 | the network. This removes the selection step that was required in the old
275 | design. Instead, a client presents the network with an opportunity to earn money by
276 | storing content. Hosts can decide whether they want to take part in the
277 | Request, and if they do they are expected to keep to their part of the deal
278 | lest they lose their collateral.
279 | 
280 | The first hosts that download the content and provide initial storage proofs are
281 | awarded slots in the Request. This removes the explicit Request start (and its
282 | associated timeout behavior) that was required in the old design. It also adds
283 | an incentive to quickly start storing the content while slots are available in
284 | the Request.
285 | 
286 | While the old design required separate negotiations per host, this design
287 | ensures that either the single Request starts with all hosts, or is cancelled.
288 | This is a significant reduction in the number of interactions required.
289 | 
290 | The old design required new negotiations when a host is not able to fulfill its
291 | obligations, and a separately designed repair protocol. In this design we
292 | managed to include repair incentives and a repair protocol that is nearly
293 | identical to Request start.
294 | 
295 | In the old design we had a single collateral per host that could be used to
296 | cover many Requests. Here we decided to include collateral per Request. This
297 | is done to simplify collateral handling, but it is not a requirement of the new
298 | design. The new design can also be made to work with a single collateral per
299 | host.
300 | 
--------------------------------------------------------------------------------
/design/metadata-overhead.md:
--------------------------------------------------------------------------------
1 | # Reducing Metadata Overhead
2 | 
3 | Metadata plays a crucial role in any distributed or peer-to-peer (p2p) storage network. However, it often incurs significant overhead for the system. Therefore, it is important to understand the required metadata and how it should be stored, located, and transmitted.
4 | 
5 | ## Metadata and Manifests
6 | 
7 | Codex utilizes a metadata descriptor structure called the "manifest". A manifest is similar to a torrent file and stores various pieces of information necessary to describe a dataset.
8 | 
9 | ```
10 | Manifest
11 |   rootHash        # Cid of root (tree) hash of the contained data set
12 |   originalBytes   # Exact size of the original (uploaded) file
13 |   blockSize       # Size of each contained block
14 |   blocks          # Array of dataset blocks Cids
15 |   version         # Cid version
16 |   hcodec          # Multihash codec
17 |   codec           # Data set codec
18 | ```
19 | 
20 | Additional information that describes erasure coding parameters may also be included:
21 | 
22 | ```
23 | Manifest
24 |   ...
25 |   ecK             # Number of blocks to encode
26 |   ecM             # Number of resulting parity blocks
27 |   originalCid     # The original Cid of the dataset being erasure coded
28 |   originalLen     # The length of the original manifest
29 | ```
30 | 
31 | Manifests are treated as regular blocks of data, requiring no special handling by the Codex network or nodes. This means that announcing and storing manifests follows the same flow and uses the same subsystems as regular blocks. This convenience simplifies the execution flow significantly.
32 | 
33 | ## Manifest limitations
34 | 
35 | Including block hashes in the manifest introduces significant limitations. Firstly, the size of the manifest grows linearly with the number of hashes and the size of each hash digest, resulting in increased overhead for storing and transmitting manifests.
36 | 
37 | Overall, large manifests impose an additional burden on the network in terms of storage and transmission, resulting in unnecessary overhead. For example, when retrieving a sizable file, it becomes necessary to obtain all the hashes listed in the manifest before downloading the initial block. This process can require hundreds of megabytes of data.
38 | 
39 | One way to reduce the number of hashes is to increase the block size, which only partially addresses the problem. A better solution, however, is to completely remove the blocks array from the manifest and instead rely on Merkle proofs to verify the blocks.
40 | 
41 | ## Slots and verification subsystem support
42 | 
43 | Besides the block hashes overhead, another reason for the change is the introduction of slots (verifiable dataset subsets) that nodes in a storage set/group store and verify. Slots require Merkle trees for verification, but otherwise are identical to the top-level dataset. Thus, storing and transmitting Merkle proofs is already a requirement for slot verification.
44 | 45 | Replacing the blocks array with a proper Merkle tree would allow using the same mechanism proposed in this document for both the top-level dataset and for slot verification, storage, and transmission. This greatly simplifies integration of the verification subsystem. 46 | 47 | ## Removing blocks array 48 | 49 | As already mentioned, the new mechanism proposed here removes the blocks array from the manifest file in favor of a separate Merkle tree. This Merkle tree is persisted in the local store, and transmitted alongside the dataset blocks on retrieval. This allows verifying the transmitted blocks without knowing their hashes a priori. 50 | 51 | ## Implementation overview 52 | 53 | This mechanism requires an efficient Merkle tree implementation, which also allows persisting the leaves and intermediate hashes to disk; changes to the block exchange engine to support querying blocks by root hash and block index; and integration with the block store abstraction. 54 | 55 | ### Merkle Tree 56 | 57 | The block hashes array is replaced by a Merkle tree. The Merkle tree should support persisting to disk, partial and non-blocking reads/writes, loading and storing from (async) iterators. For reference, check out https://github.com/filecoin-project/merkletree. 58 | 59 | ### Block retrieval 60 | 61 | #### Block Exchange Engine 62 | 63 | The block exchange engine requires support for querying blocks by their index and respective dataset Merkle root. It also requires returning the Merkle proofs alongside the chunk so that it can be readily verified. Scheduling blocks for retrieval should largely remain the same, but additional request and response messages are required. 64 | 65 | #### Announcing over the DHT 66 | 67 | Datasets are now announced by their Merkle root instead of by each individual block, as was the case in the previous implementation. Manifests are announced exactly as before, by their cid. Announcing individual blocks is still supported (but not required) and can be useful in the case of bandwidth incentives. 68 | 69 | ### Block Stores and Local Repo 70 | 71 | All interactions with blocks/chunks sit behind the `BlockStore` abstraction, which currently only supports querying blocks by hash. It should be extended to allow querying by Merkle root and block index and/or range. 72 | 73 | The local repo should be aware of the persisted Merkle tree. When a request by index is made, the store first locates the persisted Merkle tree corresponding to the specified root and retrieves the requested leaf and corresponding Merkle proofs. 74 | 75 | Once the hash of the requested block is known, the repo/store can be queried for the block using the retrieved block hash. 76 | 77 | Keeping support for hash-based retrieval (content addressing) has two main advantages: 78 | 79 | 1. It preserves content addressing at the repo level, which enables content deduplication. 80 | 2. It allows keeping the number of required changes to a minimum, as once the block hash is known, the existing flow can be reused.
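For a feel of what the persisted tree must provide, namely leaf lookup by index together with its proof path, here is a minimal in-memory sketch (Python; SHA-256 and the duplication of odd nodes are illustrative choices, not the Codex implementation):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleTree:
    """levels[0] holds the leaf hashes; the last level holds the root."""
    def __init__(self, leaves):
        self.levels = [list(leaves)]
        while len(self.levels[-1]) > 1:
            lvl = self.levels[-1]
            if len(lvl) % 2:                 # duplicate odd node out
                lvl = lvl + [lvl[-1]]
            self.levels.append(
                [h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])

    def root(self) -> bytes:
        return self.levels[-1][0]

    def get_leaf(self, index):
        """Return (leaf hash, proof hashes), as the local repo would."""
        proof, i = [], index
        for lvl in self.levels[:-1]:
            if len(lvl) % 2:
                lvl = lvl + [lvl[-1]]
            proof.append(lvl[i ^ 1])         # sibling on this level
            i //= 2
        return self.levels[0][index], proof

def verify(root: bytes, index: int, leaf: bytes, proof) -> bool:
    node, i = leaf, index
    for sibling in proof:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root
```

A production implementation would additionally keep the levels on disk and read them lazily, rather than holding everything in memory.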
81 | 82 | ## Updated flow 83 | 84 | ### Upload 85 | 86 | ```mermaid 87 | sequenceDiagram 88 | User ->> +Client: Upload file 89 | loop Store Block 90 | Client ->> +Chunker: Data Stream 91 | loop Chunking 92 | Chunker -->> +Chunker: Chunk stream 93 | Chunker -->> -Client: Block 94 | end 95 | Client ->> +Repo: Put block 96 | Client ->> +MerkleTree: Add Block Hash 97 | end 98 | MerkleTree -->> -Client: Merkle Root 99 | Client ->> MerkleTree: Serialize Merkle Tree 100 | Client ->> Client: Put Merkle Root in Manifest 101 | Client ->> Repo: Persist Manifest 102 | Client -->> User: Manifest Cid 103 | Client ->> DHT: Announce Manifest Cid 104 | Client ->> -DHT: Announce Dataset Cid 105 | ``` 106 | 107 | **Steps**: 108 | 109 | 1. User initiates a file upload 110 | 1. Client chunks the stream and stores blocks in the Repo 111 | 2. Block's hash is added to a MerkleTree instance 112 | 3. This is repeated until all data has been read from the stream 113 | 2. Once all blocks have been stored, the Merkle root is generated and persisted 114 | 3. The manifest is serialized and persisted in the repo 115 | 4. The cid of the persisted manifest is returned to the user 116 | 5. Both the manifest Cid and the Dataset Merkle Root Cid are announced on the DHT 117 | 1. This allows locating both the manifest and the dataset individually 118 | 119 | ### Retrieval 120 | 121 | #### Local Flow 122 | 123 | ```mermaid 124 | sequenceDiagram 125 | User ->> Client: Request Manifest Cid 126 | alt If manifest cid in Repo 127 | Client ->> Repo: getBlock(cid) 128 | else Manifest cid not in Repo, request from Network 129 | Client ->> NetworkStore: [See Network Flow] 130 | end 131 | Repo -->> Client: Manifest Block 132 | Client ->> Client: Deserialize Manifest and Read Merkle Root 133 | Client ->> MerkleTree: loadMerkleTree(manifest.cid) 134 | loop Read Dataset 135 | Client ->> MerkleTree: getLeaf(index) 136 | MerkleTree -->> Client: [leaf cid, proof hashes...] 137 | alt If cid in Repo 138 | Client ->> Repo: getBlock(cid) 139 | Repo -->> Client: Data Block 140 | Client -->> User: Stream of blocks 141 | else Cid not in Repo, request from Network 142 | Client ->> NetworkStore: [See Network Flow] 143 | end 144 | end 145 | ``` 146 | 147 | **Steps**: 148 | 149 | 1. User initiates a download with a manifest Cid 150 | 2. Client checks the local store for the manifest Cid 151 | 1. If it exists, the manifest is deserialized and the Merkle root of the dataset is read 152 | 2. Otherwise, the Cid is requested from the network store 153 | 3. Client checks the local repo for the Merkle tree root 154 | 1. If it exists, the Merkle tree is deserialized and leaf hashes are read 155 | 2. For each leaf hash, which corresponds to the hash of a block 156 | 1. The local repo is checked for the presence of the block 157 | 1. If it exists, it is read from the local store and returned to the client 158 | 2. Otherwise, the Cid is requested from the network store 159 | 160 | #### Network Flow 161 | 162 | ```mermaid 163 | sequenceDiagram 164 | alt If block cid in Repo 165 | Client ->> Repo: getBlock(cid) 166 | Repo -->> Client: Block 167 | else Not in repo or no cid for block 168 | Client ->> NetworkStore: getBlockByIndex(cid, index) 169 | NetworkStore ->> BlockExchange: requestBlock(cid, index) 170 | loop Retrieve Blocks 171 | alt If have peers for Cid 172 | BlockExchange ->> Peers: Request root cid and index (or range) 173 | break Found Block(s) 174 | Peers -->> BlockExchange: [[block, [leaf cid, proof hashes...]]...]
175 | end 176 | else No peers for Cid 177 | loop Find Peers 178 | BlockExchange ->> DHT: Find peers for cid 179 | break Peers Found 180 | DHT -->> BlockExchange: [peers...] 181 | end 182 | end 183 | end 184 | end 185 | BlockExchange -->> NetworkStore: [[block, [proof hashes...]]...] 186 | loop For all blocks 187 | alt If Block hash and Merkle proof is correct 188 | NetworkStore -->> MerkleTree: Store Merkle path 189 | NetworkStore -->> Repo: Store Block 190 | NetworkStore -->> Client: Block 191 | else Block hash and Merkle proof is incorrect 192 | break Incorrect Block or Merkle proof 193 | Client ->> NetworkStore: Disconnect bad peer 194 | end 195 | end 196 | end 197 | end 198 | ``` 199 | 200 | **Steps**: 201 | 202 | 1. The client requests blocks from the network store, using the Merkle root and block index 203 | 1. Network store requests the block from the BlockExchange engine 204 | 1. BlockExchange checks whether connected peers have the requested hash 205 | 1. If they do, the block is requested using the root hash and index (or range) of the block 206 | 2. Otherwise, it queries the DHT for the requested root hash 207 | 1. Once new peers have been discovered and connected, go to step 1.1.1 208 | 2. Once blocks are received from the remote nodes 209 | 1. The hashes are verified against the requested Merkle root and if they pass 210 | 1. The block is persisted to the repo/local store 211 | 2. The block hash (cid) and the Merkle proof are stored in the persisted Merkle tree 212 | 2. Otherwise, the block is discarded and the node that sent the incorrect block is disconnected 213 | -------------------------------------------------------------------------------- /design/proof-erasure-coding.ods: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/codex-storage/codex-research/4a5ddf0e5271bdb49fe6db18e2a2d4268d37c07a/design/proof-erasure-coding.ods -------------------------------------------------------------------------------- /design/sales.md: -------------------------------------------------------------------------------- 1 | # Sales module 2 | 3 | The sales module is responsible for selling a node's available storage in the 4 | [marketplace](./marketplace.md). In order to do so it needs to know how much 5 | storage is available. It also needs to be able to reserve parts of the storage, 6 | to make sure that it is not used for other purposes. 7 | 8 | --------------------------------------------------- 9 | | | 10 | | Sales | 11 | | | 12 | | ^ | | 13 | | | | updates ------------------ | 14 | | | --------------> | | | 15 | | | | Reservations | | 16 | | ------------------- | | | 17 | | queries ------------------ | 18 | | ^ ^ | 19 | ----------------------------|---------|----------- 20 | | | 21 | reserved space | | state 22 | v v 23 | ---------------- ----------------- 24 | | Repo | | Datastore | 25 | ---------------- ----------------- 26 | 27 | The reservations module keeps track of storage that is available to be sold. 28 | Users are able to add availability to indicate how much storage they are willing 29 | to sell and under which conditions. 30 | 31 | Availability 32 | amount 33 | maximum duration 34 | minimum price 35 | 36 | Availabilities consist of an amount of storage, the maximum duration, and the minimum 37 | price to sell it for. They represent storage that is for sale, but not yet sold. 38 | This is information local to the node that can be altered without affecting 39 | global state.
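A minimal sketch of an availability record and the matching rule implied above (Python; the field names are illustrative, not the actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Availability:
    amount: int        # bytes offered for sale
    max_duration: int  # longest acceptable request duration, in seconds
    min_price: int     # smallest acceptable reward, in tokens

def matches(a: Availability, requested: int, duration: int, reward: int) -> bool:
    # An availability can serve a request when it offers at least the
    # requested amount, for at least the requested duration, at a reward
    # that meets its minimum price.
    return (a.amount >= requested
            and a.max_duration >= duration
            and a.min_price <= reward)
```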
40 | 41 | ## Adding availability 42 | 43 | When a user adds availability, then the reservations module will check whether 44 | there is enough space available in the Repo. If there is enough space, then it 45 | will increase the amount of reserved space in the Repo. It persists the state of 46 | all availabilities to the Datastore, to ensure that they can be restored when a 47 | node is restarted. 48 | 49 | User Reservations Repo Datastore 50 | | | | | 51 | | add availability | | | 52 | | ---------------->| check free space | | 53 | | |----------------->| | 54 | | | reserve amount | | 55 | | |----------------->| | 56 | | | | 57 | | | persist availability | 58 | | |------------------------------>| 59 | 60 | ## Selling storage 61 | 62 | When a request for storage is submitted on chain, the sales module decides 63 | whether or not it wants to act on it. First, it tries to find an availability 64 | that matches the requested amount, duration, and price. If an availability 65 | matches, but is larger than the requested storage, then the Sales module may 66 | decide to split the availability into a part that can be used for the request, 67 | and a remainder that can be sold separately. The matching availability will be 68 | set aside so that it can't be sold twice. 69 | 70 | It then selects a slot from the request to fill, and starts downloading its 71 | content chunk by chunk. For each chunk that is successfully downloaded, a bit of 72 | reserved space in the Repo is released. The content is stored in the Repo with a 73 | time-to-live value that ensures that the content remains in the Repo until the 74 | request expires. 75 | 76 | Once the entire content is downloaded, the sales module will calculate a storage 77 | proof, and submit the proof on chain. If these steps are all successful, then 78 | this node has filled the slot. Once the other slots are filled by other nodes, 79 | the request will start. The time-to-live value of the content should then be 80 | updated to match the duration of the storage request. 81 | 82 | Marketplace Sales Reservations Repo 83 | | | | | 84 | | incoming request | | | 85 | |------------------->| find reservation | | 86 | | |-------------------->| | 87 | | | remove reservation | | 88 | | |-------------------->| | 89 | | | | | 90 | | | store content | 91 | | |----------------------------------->| 92 | | | set time-to-live | 93 | | |----------------------------------->| 94 | | | release reserved space | 95 | | |----------------------------------->| 96 | | submit proof | | 97 | |<-------------------| | 98 | | | | 99 | . . . 100 | . . . 101 | | request started | | 102 | |------------------->| update time-to-live | 103 | | |----------------------------------->| 104 | 105 | ## Ending a request 106 | 107 | When a storage request comes to an end, then the content can be removed from the 108 | repo and the storage space can be made available for sale again. The same should 109 | happen when something goes wrong in the process of selling storage. 110 | 111 | The time-to-live value should be removed from the content in the Repo, reserved 112 | space in the Repo should be increased again, and the availability that was used 113 | for the request can be re-added to the reservations module.
114 | 115 | Sales Reservations Repo 116 | | | | 117 | | | | 118 | | | 119 | | remove time to live | 120 | |----------------------------------->| 121 | | increase reserved space | 122 | |----------------------------------->| 123 | | | 124 | | re-add availability | | 125 | |-------------------->| | 126 | | | | 127 | 128 | ## Persisting state 129 | 130 | The sales module keeps state in a number of places. Most state is kept on chain; 131 | this includes the slots that a host is filling and the state of each slot. This 132 | ensures that a node's local view of slot states does not deviate from the 133 | network view, even when the network changes while the node is down. The rest of 134 | the state is kept on local disk by the Repo and the Datastore. How much space is 135 | reserved to be sold is persisted on disk by the Repo. The availabilities are 136 | persisted on disk by the Datastore. 137 | 138 | ## Slot queue 139 | 140 | Once a new request for storage is created on chain, all hosts will receive a 141 | contract event announcing the storage request and decide if they want to act on 142 | the request by matching their availabilities with the incoming request. Because 143 | there will be many requests being announced over time, each host will create a 144 | queue of matching request slots, adding each new storage slot to the queue. 145 | 146 | ### Adding slots to the queue 147 | 148 | Slots will be added to the queue when request for storage events are received 149 | from the contracts. Additionally, when slots are freed, a contract event will 150 | also be received, and the slot will be added to the queue. Duplicates are 151 | ignored. 152 | 153 | When all slots of a request are added to the queue, the order should be randomly 154 | shuffled, as there will be many hosts in the network that could potentially pick 155 | up the request and will process the first slot in the queue at the same time. 156 | This should avoid some clashes in slot indices chosen by competing hosts. 157 | 158 | Before slots can be added to the queue, availabilities must be checked to ensure 159 | a matching availability exists. This filtering prevents all slots in the network 160 | from entering the queue. 161 | 162 | ### Removing slots from the queue 163 | 164 | Hosts will also receive contract events for when any contract is started, 165 | failed, or cancelled. In all of these cases, slots in the queue pertaining to 166 | these requests should be removed as they are no longer fillable by the host. 167 | Note: expired request slots will be checked when a request is processed and its 168 | state is validated. 169 | 170 | ### Sort order 171 | 172 | Slots in the queue should be sorted in the following order: 173 | 1. Seen flag (`true` flag should be lower than `false`) 174 | 2. Profit (descending)¹ 175 | 3. Collateral required (ascending) 176 | 4. Time before expiry (descending) 177 | 5. Dataset size (ascending) 178 | 179 | ¹ While profit cannot yet be calculated correctly as this calculation will 180 | involve bandwidth incentives, profit can be estimated as `duration * reward` 181 | for now. 182 | 183 | Note: dataset size may eventually be included in the profit algorithm and may not 184 | need to be included on its own in the future. Additionally, data dispersal may 185 | also impact the dataset size to be downloaded by the host, and consequently the 186 | profitability of servicing a storage request, which will need to be considered 187 | in the future once profitability can be calculated.
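Expressed as a heap key, the sort order above might look as follows (Python; the field names are illustrative, and profit uses the `duration * reward` estimate from the footnote):

```python
def queue_key(slot):
    profit = slot.duration * slot.reward   # estimate, per the footnote
    return (slot.seen,                     # unseen (False) slots first
            -profit,                       # highest estimated profit first
            slot.collateral,               # lowest required collateral first
            -slot.time_before_expiry,      # most time before expiry first
            slot.dataset_size)             # smallest dataset first
```

With `heapq`, pushing `(queue_key(slot), slot_id, slot)` keeps the most attractive slot at the front of the queue (the `slot_id` tie-breaker avoids comparing slot objects directly).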
188 | 189 | ### Queue processing 190 | 191 | Queue processing will be started only once, when the sales module starts, and 192 | will process slots continuously, in order, until the queue is empty. If the 193 | queue is empty, processing of the queue will resume once items have been added 194 | to the queue. If the queue is not empty, but there are no availabilities, queue 195 | processing will resume once availabilities have been added. 196 | 197 | As soon as items are available in the queue, and there are workers available for 198 | processing, an item is popped from the queue and processed. 199 | 200 | When a slot is processed, it is first checked to ensure there is a matching 201 | availability, as these availabilities will have changed over time. Then, the 202 | sales process will begin. The start of the sales process should ensure that the 203 | slot being processed is indeed available (slot state is "free") before 204 | continuing. If it is not available, the sales process will exit and the host 205 | will continue to process the top slot in the queue. The start of the sales 206 | process should also check to ensure the host is allowed to fill the slot, due to 207 | the [sliding window 208 | mechanism](https://github.com/codex-storage/codex-research/blob/master/design/marketplace.md#dispersal). 209 | If the host is not allowed to fill the slot, the sales process will exit and the 210 | host will process the top slot in the queue. 211 | 212 | #### Preventing continual processing when there are small availabilities 213 | If the processed slot cannot continue because there are no availabilities, the 214 | slot should be marked as `seen` and put back into the queue. This flag will 215 | cause the slot to be ordered lower in the heap queue. If, upon processing 216 | a slot, the slot item already has a `seen` flag set, the queue should be 217 | paused. 218 | 219 | This serves to prevent availabilities that are small (in available bytes) from 220 | emptying the queue. 221 | 222 | #### Pausing the queue 223 | When availabilities are modified or removed, and there are no availabilities 224 | left, the queue should be paused. 225 | 226 | A paused queue will wait until it is unpaused before continuing to process items 227 | in the queue. This prevents unnecessarily popping items off the queue. 228 | 229 | #### Unpausing the queue 230 | When availabilities are modified or added, the queue should be unpaused if it 231 | was paused, and any slots in the queue should have their `seen` flag cleared. 232 | Additionally, when slots are pushed to the queue, the queue should be unpaused 233 | if it was paused; however, the `seen` flags of existing queue items should not be 234 | cleared. 235 | 236 | #### Queue workers 237 | Each time an item in the queue is processed, it is assigned to a worker. The 238 | number of allowed workers can be specified during queue creation. Specifying a 239 | limited number of workers allows the number of concurrent items being processed 240 | to be capped to prevent too many slots from being processed at once. 241 | 242 | During queue processing, only when there is a free worker will an item be popped 243 | from the queue and processed. Each time an item is popped and processed, a 244 | worker is removed from the available workers. If there are no available workers, 245 | queue processing will resume once there are workers available.
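A compact way to express this worker-capped processing loop (Python `asyncio` sketch; `process_slot` is a hypothetical stand-in for the sales process):

```python
import asyncio

async def process_queue(queue: asyncio.PriorityQueue, n_workers: int = 3):
    free_workers = asyncio.Semaphore(n_workers)  # capped at queue creation

    async def run(item):
        try:
            await process_slot(item)  # hypothetical sales-process entry point
        finally:
            free_workers.release()    # return the worker to the pool

    while True:
        await free_workers.acquire()  # only pop when a worker is free
        item = await queue.get()      # resumes once items are available
        asyncio.create_task(run(item))
```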
246 | 247 | #### Adding availabilities 248 | When a host adds an availability, a signal is triggered in the slot queue with 249 | information about the availability. This triggers a lookup of past request for 250 | storage events, capped at a certain number of past events or blocks. The slots 251 | of the requests in each of these events are added to the queue, where slots 252 | without matching availabilities are filtered out (see [Adding slots 253 | to the queue](#adding-slots-to-the-queue) above). Additionally, when slots of 254 | these requests are processed in the queue, they will be checked to ensure that 255 | the slots are not filled (see [Queue processing](#queue-processing) above). 256 | 257 | ### Implementation tips 258 | 259 | Request queue implementations should keep in mind that requests will likely need 260 | to be accessed randomly (by key, e.g. request id) and by index (for sorting), so 261 | implemented structures should handle these types of operations in as little time 262 | as possible. 263 | 264 | ## Repo 265 | 266 | The Repo exposes the following functions that allow the reservations module to 267 | query the amount of available storage, to update the amount of reserved 268 | space, and to store data for a guaranteed amount of time. 269 | 270 | Repository API: 271 | function available(): amount 272 | function reserve(amount) 273 | function release(amount) 274 | function setTtl(cid, ttl) 275 | 276 | ## Datastore 277 | 278 | The Datastore is a generic key-value store that is used to persist the state of 279 | the Reservations module, so that it survives node restarts. 280 | 281 | Datastore API: 282 | function put(key, value) 283 | function get(key): value 284 | -------------------------------------------------------------------------------- /design/slot-reservations.md: -------------------------------------------------------------------------------- 1 | # Slot reservations 2 | 3 | Competition between storage providers (SPs) to fill slots has some advantages, 4 | such as providing an incentive for SPs to become proficient in downloading 5 | content and generating proofs. It also has some drawbacks; for instance, it can 6 | lead to network inefficiencies because multiple SPs do the work of downloading 7 | and proving, while only one SP is rewarded for it. These inefficiencies lead to 8 | higher costs for SPs, which leads to an overall increase in the price of storage 9 | on the network. It can also lead to clients inadvertently inviting too much 10 | network traffic to themselves. Should they, for instance, post a very lucrative 11 | storage request, then this invites many SPs to start downloading the content 12 | from the client simultaneously, not unlike a DDoS attack. 13 | 14 | Slot reservations are a means to avoid these inefficiencies by only allowing SPs 15 | who have secured a slot reservation to fill the slot. Furthermore, slots can 16 | only be reserved by eligible SPs, governed by a window of eligible addresses 17 | that starts small and grows larger over time, eventually encompassing the entire 18 | address space on the network. 19 | 20 | ## Proposed solution: slot reservations 21 | 22 | Before downloading the content associated with a slot, a limited number of SPs 23 | can reserve the slot. Only SPs that have reserved the slot can fill the slot. 24 | After the SP downloads the content and calculates a proof, it can move the slot 25 | from its reserved state into the filled state by providing collateral and the 26 | storage proof.
Then it begins to periodically provide storage proofs and accrue 27 | payments for the slot. 28 | 29 | ``` 30 | reserve proof & collateral 31 | | | 32 | v v 33 | --------------------------------------------- 34 | slot: |/ / / / / / / / / |///////////////////////// 35 | --------------------------------------------- 36 | | | 37 | v v 38 | slot slot 39 | reserved filled 40 | 41 | 42 | ---------------- time ----------------> 43 | ``` 44 | 45 | There is an initial race for eligible SPs who are first to secure a reservation, 46 | then a second race amongst the SPs with a reservation to fill the slot (with 47 | collateral and the generated proof). However, not all SPs in the network can 48 | reserve a slot initially: the [expanding window 49 | mechanism](https://github.com/status-im/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal) 50 | dictates which SPs are eligible to reserve the slot. 51 | 52 | ### Expanding window mechanism 53 | 54 | The expanding window mechanism prevents node and network overload once a slot 55 | becomes available to be filled (or repaired) by allowing only a very small 56 | number of SP addresses to fill/repair the slot at the start. Over time, the 57 | number of eligible SP addresses increases, until eventually all SP addresses in 58 | the network are eligible. 59 | 60 | The expanding window mechanism starts off with a random source address, defined 61 | as $hash(blockHash, requestId, slotIndex, reservationIndex)$, with a unique 62 | source address for each reservation of each slot. The distance between each SP 63 | address and the source address can be defined as $XOR(A, A_0)$ (Kademlia 64 | distance). Once the allowed distance is greater than an SP's distance, the SP is 65 | considered eligible to reserve a slot. The allowed distance for eligible 66 | addresses over time $t_i$ can be [defined 67 | as](https://hackmd.io/@bkomuves/BkDXRJ-fC) $2^{256} * F(t_i)$, where $2^{256}$ 68 | represents the total number of 256-bit addresses in the address space, and 69 | $F(t_i)$ represents the expansion function over time. As this allowed distance 70 | value increases along a curve, more and more addresses will be eligible to 71 | participate in reserving that slot. In total, eligible addresses are those that 72 | satisfy: 73 | 74 | $XOR(A, A_0) < 2^{256} * F(t_i)$ 75 | 76 | Furthermore, the client can change the curve of the rate of expansion by 77 | setting a [dispersal 78 | parameter](https://github.com/codex-storage/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal) 79 | of the storage request, $h$, which represents the fraction of the network 80 | addresses that will be eligible halfway to the time of expiry. $h$ can be 81 | defined as: 82 | 83 | $h := F(0.5)$, where $0 \lt h \lt 1$ and $h \neq 0.5$ 84 | 85 | Changing the value of $h$ will [affect the curve of the rate of 86 | expansion](https://www.desmos.com/calculator/pjas1m1472) (interactive graph). 87 | 88 | #### Expansion function, $F(t_i)$, in-depth 89 | 90 | $F(t_i)$ defines the expansion factor of eligible addresses in the network over 91 | time. 92 | 93 | ##### Assumptions 94 | 95 | It is assumed network addresses are randomly, and more-or-less uniformly, 96 | selected from a space of $2^{256}$.
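Under these assumptions, the eligibility test above fits in a few lines. The sketch below (Python, purely illustrative) uses SHA-256 for the source address and the parametric exponential expansion function defined under Implementation below, with $s$ derived from the dispersal parameter $h$:

```python
import hashlib
from math import exp, log

def source_address(block_hash: bytes, request_id: bytes,
                   slot_index: int, reservation_index: int) -> int:
    # A_0 := hash(blockHash, requestId, slotIndex, reservationIndex)
    data = (block_hash + request_id
            + slot_index.to_bytes(8, 'big')
            + reservation_index.to_bytes(8, 'big'))
    return int.from_bytes(hashlib.sha256(data).digest(), 'big')

def expansion(t: float, h: float) -> float:
    # F_s(t) = (exp(s*t) - 1) / (exp(s) - 1), with s chosen so that
    # F_s(0.5) == h; requires 0 < h < 1 and h != 0.5 (s would be 0).
    s = 2 * log((1 - h) / h)
    return (exp(s * t) - 1) / (exp(s) - 1)

def eligible(address: int, a0: int, t: float, h: float) -> bool:
    # XOR(A, A_0) < 2^256 * F(t_i); float precision is fine for a sketch.
    return address ^ a0 < int((1 << 256) * expansion(t, h))
```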
97 | 98 | It is also assumed that the window can only change in discrete steps, based on 99 | some underlying blockchain's cadence (for example, approximately every 12 100 | seconds in the case of Ethereum), and that we measure time based on timestamps 101 | encoded in blockchain blocks. 102 | 103 | However, given this constraint, the mechanism should be as granular and tunable 104 | as possible. 105 | 106 | There is a fixed time duration over which the window should expand from a single network 107 | address to the whole address space. 108 | 109 | To make this work nicely, first a linear time function $t_i$, which 110 | goes from 0 to 1, is defined. 111 | 112 | ##### Implementation 113 | 114 | At any desired block with timestamp $timestamp_i$, simply compute: 115 | 116 | $$t_i := \frac{timestamp_i - start}{expiry - start}$$ 117 | 118 | Then to get a network range, any kind of expansion function $F(x)$ with $F(0)=0$ 119 | and $F(1)=1$ can be plugged in; for example, a parametric exponential: 120 | 121 | $$ F_s(x) = \frac{\exp(sx) - 1}{\exp(s) - 1} $$ 122 | 123 | Remark: with this particular function, it is likely desired to have $s<0$ 124 | (resulting in fast expansion initially, slowing down later). Here is a 125 | Mathematica one-liner to play with this idea: 126 | ``` 127 | Manipulate[ 128 | Plot[ (Exp[s*x]-1)/(Exp[s]-1), {x,0,1}, PlotRange->Full ], 129 | {s,-10,-1} ] 130 | ``` 131 | As an alternative, the same can easily be done with e.g. the online 132 | [Desmos](https://www.desmos.com/calculator) tool. 133 | 134 | ##### Address window 135 | 136 | Finally, an address $A$ becomes eligible at block $i$ if the Kademlia distance 137 | from the "window center" $A_0$ is smaller than $2^{256}\times F(t_i)$: 138 | 139 | $$ XOR(A,A_0) < 2^{256}\cdot F(t_i) $$ 140 | 141 | Note: since $t_i$ only becomes 1 exactly at expiry, to allow the whole network 142 | to participate near the end, there should be a small positive $\delta > 0$ such 143 | that $F(t)=1$ for $t>1-\delta$, leaving about the last $100\delta$ percent of 144 | the total slot-fill window during which the whole network is eligible to participate. 145 | 146 | Alternatively, $t_i$ could be rescaled to achieve the same effect: 147 | 148 | $$ t_i' := \min(\; t_i/(1-\delta)\;,\;1\;) $$ 149 | 150 | The latter is probably simpler because it allows complete freedom in selecting 151 | the expansion function $F(x)$. 152 | 153 | ##### Parametrizing the speed of expansion 154 | 155 | While, in theory, arbitrary expansion functions could be used, it is likely 156 | undesirable to have more than a one-parameter family, that is, a single 157 | parameter to set the curve. However, even with a single parameter, there 158 | could be any number of different ways to map a number to the same family of 159 | curves. 160 | 161 | In the above example $F_s(t)$, while $s$ is quite natural from a mathematical 162 | perspective, it doesn't really have any meaning for the user. A possibly better 163 | parametrization would be the value $h:=F_s(0.5)$, meaning "how big a fraction of 164 | the network is allowed to participate at half-time". $s$ can be computed from $h$: 165 | 166 | $$ s = 2\log\left(\frac{1-h}{h}\right) $$ 167 | 168 | ### Abandoned ideas 169 | 170 | #### No reservation collateral 171 | 172 | Reservation collateral was thought to be able to prevent a situation where an SP 173 | would reserve a slot then fail to fill it.
However, collateral could not be 174 | burned as it created an attack vector for clients: clients could withhold the 175 | data and cause SPs to lose their reservation collateral. The reservation 176 | transaction itself creates a signal of intent from an SP to fill the slot. If 177 | the SP were not to fill the slot, then other SPs that have reserved the slot 178 | will fill it. 179 | 180 | #### No reservation/fill reward 181 | 182 | Fill rewards were originally proposed to incentivize filling slots as fast as 183 | possible. However, the SPs are already being paid out for the time that they 184 | have filled the slot, thus negating the need for additional incentivization. If 185 | additional incentivization is desired by the client, then an increase in the 186 | value of the storage request is possible. 187 | 188 | Adding a fill reward for SPs who ultimately fill the slot is not necessary 189 | because, like the SP rewards for providing proofs, fill rewards would be paid 190 | once the storage request successfully completes. This would mean that the fill 191 | reward is effectively the same as an increase in value of the storage request 192 | payout. Therefore, if a client is inclined to provide a fill reward, they 193 | could instead increase the total reward of the storage request. 194 | 195 | In this simplified slot reservations proposal, there will be neither reservation 196 | collateral nor reward requirements until the behavior in a live environment can 197 | be observed to determine whether these mechanisms are necessary. 198 | 199 | ### Slot reservation attacks 200 | 201 | Name | Attack description 202 | :------------|:-------------------------------------------------------------- 203 | Clever SP | SP drops slot when a better opportunity presents itself 204 | Lazy SP | SP reserves a slot, but doesn't fill it 205 | Censoring SP | acts like a lazy SP for specific CIDs that it tries to censor 206 | Greedy SP | SP tries to fill multiple slots in a request 207 | Sticky SP | SP tries to fill the same slot in a contract renewal 208 | Lazy client | client doesn't release content on the network 209 | 210 | #### Clever SP attack 211 | 212 | In this attack, an SP could fill a slot, and while fulfilling its duties, see 213 | that a better opportunity has arisen, and abandon its duties in the first slot 214 | to fill the second slot. 215 | 216 | This attack is mitigated by the SP losing its request collateral for the first 217 | slot once it is abandoned. Additionally, once the SP fills the first slot, it 218 | will accrue rewards over time that will not be paid out until the request 219 | successfully completes. These rewards act as another disincentive for the SP to 220 | abandon the slot. 221 | 222 | The behavior of SPs filling better opportunities is not necessarily an attack. 223 | If an SP is fulfilling its duties on a slot and finds a better opportunity 224 | elsewhere, it should be allowed to do so. The repair mechanisms will allow the 225 | abandoned slot to be refilled by another SP that deems it profitable. 226 | 227 | #### Lazy SP attack 228 | 229 | In this attack, an SP reserves a slot but waits to fill it, hoping a better 230 | opportunity will arise in which the reward earned would 231 | be greater than the reward earned in the original slot. 232 | 233 | This attack is mitigated by allowing for multiple reservations per slot. All SPs 234 | that have secured a reservation (capped at three) will race to fill the slot.
235 | Thus, if one or more SPs that have reserved the slot decide to pursue other 236 | opportunities, the remaining SPs with reservations will still be able to 237 | fill the slot. 238 | 239 | In addition, the expanding window mechanism allows for more SPs to participate 240 | (reserve/fill) as time progresses, so there will be a larger pool of SPs that 241 | could potentially fill the slot. Because each reservation will have its own 242 | unique expanding window source, SPs reserving one slot in a request will likely 243 | not have the same opportunities to reserve/fill the same slot in another 244 | request. 245 | 246 | #### Censoring SP attack 247 | 248 | The "censoring SP attack" is when an SP withholds specific 249 | CIDs from the network in an attempt to censor certain content. An SP could also 250 | try this attack in the case of repair, hoping to prevent a freed slot from being 251 | repaired. 252 | 253 | Even if one SP withholds specific content, the dataset, along with the withheld 254 | CID, can be reconstructed from K chunks (provided by other SPs), allowing the 255 | censored CID to be accessed. In the case of repair, the SP would need to control 256 | M+1 chunks to prevent data reconstruction by other nodes in the network. The 257 | expanding window mechanism seeks to prevent SPs from filling multiple slots in 258 | the same request, which should prevent any one SP from controlling M+1 slots. 259 | 260 | #### Greedy SP attack 261 | 262 | The "greedy SP attack" is when one SP tries to fill multiple slots in a single 263 | request. This attack is mitigated by the expanding windows, which prevent 264 | a single SP address from filling all the slots in a request. This is 265 | only effective for the majority of time before expiry, however, meaning it is 266 | not impossible for this attack to occur. If a request is offered and the slots 267 | are not filled after some time, the expanding windows across the slots may open 268 | up to allow all SPs in the network to fill multiple slots in the request. 269 | 270 | A controlling entity may try to circumvent the expanding window by setting up a 271 | sybil attack with many highly distributed nodes. Even with many nodes covering a 272 | large distribution of the address space, the randomness of the expanding window 273 | will make this attack highly improbable, except for undervalued requests that do 274 | not have their slots filled early, in which case there is little motivation 275 | to attack the data anyway. 276 | 277 | #### Sticky SP attack 278 | 279 | The "sticky SP attack" is where an SP tries to withhold data for a contract 280 | renewal so that it is able to fill the slot again. The SP withholds data from all 281 | other SPs until the expanding window allows its address, then quickly 282 | fills the slot (it is quick because it doesn't need to download the data). As 283 | in the censoring SP attack, the SP would need to control M+1 slots for this to 284 | be effective, because that is the only way to prevent the CID from being 285 | reconstructed from K slots available from other SPs. 286 | 287 | #### Lazy client attack 288 | 289 | In this attack, a client might want to disrupt the network by creating requests 290 | for storage but never releasing the data to SPs attempting to fill the slot. The 291 | transaction cost associated with this type of behavior should provide some 292 | mitigation.
Additionally, if a client tries to spam the network with these types 293 | of untenable storage requests, the transaction cost will increase with the 294 | number of requests due to the increasing block fill rate and the associated 295 | rise in gas costs. However, this attack is not impossible. 296 | 297 | ### Open questions 298 | 299 | Perhaps the expanding window mechanism should be network-aware such 300 | that there are always a minimum of two SPs in a window at a given time, to 301 | encourage competition? The downside of this is that active SPs need to be 302 | persisted and tracked in the contract, resulting in larger transaction costs. 303 | 304 | 305 | ### Trade offs 306 | 307 | The main advantage of this design is that nodes and the network would not be 308 | overloaded at the outset of slots being available for SP participation. 309 | 310 | The downside of this proposal is that an SP would have to participate in two 311 | races: one for reserving the slot and another for filling the slot once 312 | reserved, which adds complexity to the smart contract. 313 | 314 | In addition, there are two attack vectors, the "greedy SP attack" and the "lazy 315 | client attack", that are not well covered in the slot reservation design. There 316 | could be even more complexities added to the design to accommodate these two 317 | attacks (see the other proposed solution for the mitigation of these attacks). 318 | -------------------------------------------------------------------------------- /design/storage-proof-timing.md: -------------------------------------------------------------------------------- 1 | Timing of Storage Proofs 2 | ======================== 3 | 4 | We present a design that allows a smart contract to determine when proofs of 5 | storage should be provided by hosts. 6 | 7 | Context 8 | ------- 9 | 10 | Hosts that are compensated for providing storage of data are held accountable 11 | by periodically providing proofs of storage. It's important that a host is not 12 | able to pre-compute those proofs, otherwise it could simply delete the data and 13 | only store the proofs. 14 | 15 | A smart contract should be able to check whether those proofs were delivered in 16 | a correct and timely manner. Either the smart contract will be used to perform 17 | these checks directly, or as part of an arbitration mechanism to keep validators 18 | honest. 19 | 20 | Design 1: block by block 21 | ------------------------ 22 | 23 | A first design used the property that blocks on Ethereum's main chain arrive at 24 | a predictable cadence; about once every 14 seconds a new block is produced. The 25 | idea is to use the block hashes as a source of non-predictable randomness. 26 | 27 | From the block hash you can derive a challenge, which the host uses together 28 | with the stored data to generate a proof of storage. 29 | 30 | Furthermore, we use the block hash to perform a die roll to determine whether a 31 | proof is required or not. For instance, if a storage contract stipulates that a 32 | proof is required once every 1000 blocks, then each new block hash leads to a 1 33 | in 1000 chance that a proof is required. This ensures that a host should always 34 | be able to deliver a storage proof at a moment's notice, while keeping the total 35 | costs of generating and validating proofs relatively low. 36 | 37 | Problems with block cadence 38 | --------------------------- 39 | 40 | We see a couple of problems emerging from this design.
The main problem is that 41 | block production rate is not as reliable as it may seem. The steady cadence of 42 | the Ethereum main net is (by design) not shared by L2 solutions, such as 43 | rollups. 44 | 45 | Most L2 solutions therefore [warn][2] [against][3] the use of the block interval 46 | as a measure of time, and tell you to use the block timestamp instead. Even 47 | though these are susceptible to some miner influence, over longer time intervals 48 | they are deemed reliable. 49 | 50 | Another issue that we run into is that on some L2 designs, block production 51 | increases when there are more transactions. This could lead to a death spiral 52 | where an increasing number of blocks leads to an increase in required proofs, 53 | leading to more transactions, leading to even more blocks, etc. 54 | 55 | And finally, because storage contracts between a client and a host are best 56 | expressed in wall clock time, there are going to be two different ways of 57 | measuring time in the same smart contract, which could lead to some subtle bugs. 58 | 59 | These problems lead us to the second design. 60 | 61 | Design 2: block pointers 62 | ------------------------ 63 | 64 | In our second design we separate cadence from random number selection. For 65 | cadence we use a time interval measured in seconds. This divides time into 66 | periods that have a unique number. Each period represents a chance that a proof 67 | is required. 68 | 69 | We want to associate a random number with each period that is used for the proof 70 | of storage challenge, and for the die roll to determine whether a proof is 71 | required. But since we no longer have a one-to-one relationship between a period 72 | and a block hash, we need to get creative. 73 | 74 | EVM and Solidity 75 | ---------------- 76 | 77 | For context, our smart contracts are written in Solidity and execute on the EVM. 78 | In this environment we have access to the [most recent 256 block hashes][1], and 79 | to the current time, but not to the timestamps of the previous blocks. We also 80 | have access to the current block number. 81 | 82 | Block pointers 83 | -------------- 84 | 85 | We introduce the notion of a block pointer. This is a number between 0 and 255 86 | that points to one of the latest 256 block hashes. We count from 0 (latest 87 | block) to 255 (oldest available block). 88 | 89 | oldest latest 90 | - - - |-----------------------------------| 91 | 255 ^ 0 92 | | 93 | pointer 94 | 95 | We want to associate a block pointer with a period such that it keeps pointing 96 | to the same block hash when new blocks are produced. We need this because the 97 | block hash is used to check whether a proof is required, to check a proof when 98 | it's submitted, or to prove absence of a proof, all at different times. 99 | 100 | To ensure that the block pointer points to the same block hash for longer 101 | periods of time, we derive it from the current block number: 102 | 103 | pointer(period) = (blocknumber + period) % 256 104 | 105 | Each time a new block is produced the block pointer increases by one, which 106 | ensures that it keeps pointing to the same block.
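In other words (a Python sketch, purely for illustration; the production code would live in Solidity):

```python
def pointer(period: int, block_number: int) -> int:
    # Index into the 256 most recent block hashes: 0 is the latest
    # block, 255 the oldest one still available.
    return (block_number + period) % 256

assert pointer(period=7, block_number=1000) == 239
assert pointer(period=7, block_number=1001) == 240  # advanced with the chain
# The exception is the wrap from 255 back to 0, which is addressed
# under "Avoiding surprises" below.
```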
Over time, when more blocks 107 | are produced, we get this picture: 108 | 109 | | 110 | | - - - |-----------------------------------| 111 | | 255 ^ 0 112 | | | 113 | | pointer 114 | t 115 | i - - - |-----------------------------------| 116 | m 255 ^ 0 117 | e | 118 | | pointer 119 | | 120 | | - - - |-----------------------------------| 121 | | 255 ^ 0 122 | | | 123 | v pointer 124 | 125 | Avoiding surprises 126 | ------------------ 127 | 128 | There is one problem left when we use the pointer as we've just described. 129 | Because of the modulus, there are periods in which the pointer wraps around. It 130 | moves from 255 to 0 from one block to the next. This is undesirable because it 131 | would mean that new proof requirements could all of a sudden appear, leaving too 132 | little time for the host to calculate and submit a proof. 133 | 134 | We identified two ways of dealing with this problem: pointer downtime and 135 | pointer duos. Pointer downtime appears to be the simpler solution, so we 136 | present it here. Pointer duos are described in the appendix. 137 | 138 | Pointer downtime 139 | ---------------- 140 | 141 | We ignore any proof requirements when the pointer is pointing to one of the most 142 | recent blocks: 143 | 144 | - - - |-------------------------|/////////| 145 | 255 ^ 0 146 | | 147 | pointer 148 | 149 | When the pointer is in the grey zone, no proof is required. The number of blocks 150 | in the grey zone should be chosen such that they cannot all be produced 151 | within a single period; this ensures that a host cannot be surprised by new proof 152 | requirements popping up. 153 | 154 | If we want a host to provide a proof on average once every N periods, it now no 155 | longer suffices to have a 1 in N chance to provide a proof. Because there are no 156 | proof requirements in the grey zone, the odds have changed in favor of the host. 157 | To compensate, the odds outside of the grey zone should be increased. For 158 | instance, if the grey zone is 64 blocks (¼ of the available blocks), then the 159 | odds of requiring a proof should be 1 in ¾N. 160 | 161 | -------------------------------------------------------------------------------- 162 | 163 | Appendix: Pointer duos 164 | ---------------------- 165 | 166 | An alternative solution to the pointer wrapping problem is pointer duos: 167 | 168 | pointer1(period) = (blocknumber + period) % 256 169 | pointer2(period) = (blocknumber + period + 128) % 256 170 | 171 | The pointers are 128 blocks apart, ensuring that when one pointer wraps, the 172 | other remains stable. 173 | 174 | - - - |-----------------------------------| 175 | 255 ^ ^ 0 176 | | | 177 | pointer pointer 178 | 179 | We allow hosts to choose which of the two pointers to use. This has implications 180 | for the die roll that we perform to determine whether a proof is required. 181 | 182 | If we want a host to provide a proof on average once every N periods, it no 183 | longer suffices to have a 1 in N chance to provide a proof. Should a host be 184 | completely free to choose between the two pointers (which is not entirely true, 185 | as we shall see shortly), then the odds of a single pointer should be 1 in √N to 186 | get to a probability of `1/√N * 1/√N = 1/N` of both pointers leading to a proof 187 | requirement. 188 | 189 | In reality, a host will not always be able to choose freely between the two pointers. 190 | When one of the pointers is about to wrap before validation is due, it can no 191 | longer be relied upon.
A really conservative host would follow the strategy of 192 | always choosing the pointer that points to the most recent block, requiring the 193 | odds to be 1 in N. A host that tries to optimize towards providing as few 194 | proofs as necessary will require the odds to be nearer to 1 in √N. 195 | 196 | [1]: https://docs.soliditylang.org/en/v0.8.12/units-and-global-variables.html#block-and-transaction-properties 197 | [2]: https://community.optimism.io/docs/developers/build/differences/#block-numbers-and-timestamps 198 | [3]: https://support.avax.network/en/articles/5106526-measuring-time-in-smart-contracts 199 | -------------------------------------------------------------------------------- /evaluations/account abstraction.md: -------------------------------------------------------------------------------- 1 | Ethereum Account Abstraction 2 | ============================ 3 | 4 | A high-level overview of the current state of account abstraction in 5 | Ethereum and the role it might play in the Codex design. 6 | 7 | TL;DR: Account abstraction does not impact the design of Codex 8 | 9 | Current state 10 | ------------- 11 | 12 | There have been several proposals to introduce [account abstraction][roadmap] 13 | for Ethereum. Most of them required changes to the consensus mechanism, and were 14 | therefore postponed and have not made it into mainnet. [ERC-4337][4337] is a 15 | newer proposal that uses smart contracts and does not require changes to the 16 | consensus mechanism. It uses a separate mempool for transaction-like objects 17 | called "user operations". They are picked up by bundlers who bundle them into an 18 | actual transaction that is executed on-chain. ERC-4337 is the closest to being 19 | usable on mainnet. 20 | 21 | An ERC-4337 entry point [contract][entrypoint] has been deployed on mainnet since 22 | March 2023. One bundler seems to be active ([Stackup][stackup]), although at the 23 | time of writing it appears to run neither regularly nor without errors. 24 | 25 | Codex use cases 26 | --------------- 27 | 28 | Potential Codex use cases for account abstraction are: 29 | 30 | - Paying for storage without requiring ETH to pay for gas 31 | - Checking for missing storage proofs 32 | 33 | Clients pay for storage and hosts put down collateral in the Codex marketplace. 34 | They need both ERC-20 tokens (for payment and collateral) and ETH (for gas). We 35 | expect wallet providers to make full use of ERC-4337 to implement transactions 36 | where gas is paid for by ERC-20 tokens instead of ETH. These wallets can then be 37 | used to interact with the Codex marketplace. This does not require a change to 38 | the design of Codex itself. 39 | 40 | In our current design for the Codex marketplace we require hosts to provide 41 | [storage proofs][proofs] at unpredictable times. If they fail to provide a 42 | proof, then a simple [validator][validator] can mark a proof as missing. Even 43 | though the marketplace smart contract has all the logic to determine whether a 44 | proof is actually missing, we need the validator to initiate a transaction to 45 | execute the logic. 46 | 47 | Some of the write-ups on account abstraction seem to indicate that account 48 | abstraction would allow for contracts to initiate transactions, or for 49 | subscriptions and repeat payments. However, I could not find any indications in 50 | the specifications that this would be the case. Certainly ERC-4337 does not 51 | allow for this.
This means that account abstraction as it currently stands 52 | cannot be used to replace the validator when checking for missing storage 53 | proofs. 54 | 55 | [roadmap]: https://ethereum.org/en/roadmap/account-abstraction/ 56 | [4337]: https://eips.ethereum.org/EIPS/eip-4337 57 | [entrypoint]: https://etherscan.io/address/0x5FF137D4b0FDCD49DcA30c7CF57E578a026d2789 58 | [stackup]: https://www.stackup.sh/ 59 | [proofs]: https://github.com/codex-storage/codex-research/blob/33cd86af4d809c39c7c41ca50a6922e6b5963c67/design/storage-proof-timing.md 60 | [validator]: https://github.com/codex-storage/nim-codex/pull/387 61 | -------------------------------------------------------------------------------- /evaluations/arweave.md: -------------------------------------------------------------------------------- 1 | --- 2 | published: false 3 | --- 4 | An evaluation of the Arweave paper 5 | ================================== 6 | 7 | 2021-05-18 Dagger Team 8 | 9 | https://www.arweave.org/yellow-paper.pdf 10 | 11 | Goal of this evaluation is to find things to adopt or avoid while designing 12 | Dagger. It is not meant to be a criticism of Arweave. 13 | 14 | #### Pros: 15 | 16 | + There is no distinction between full and light clients, merely clients that 17 | downloaded more or less of the blockweave. (§2.2) 18 | + Preferential treatment of peers is discouraged, because nodes are unaware when 19 | they're being monitored for responsiveness. (§3.4.2) 20 | + Interesting 'meta-game' on top of tit-for-tat, in which nodes monitor their 21 | peers on how they rank other peers. (§6.1) 22 | + Because behaviour of nodes is largely based on local rules and the local view 23 | that a node has of its peers, the network is able to shift behaviour gradually 24 | in response to a changing environment. (§6.2) 25 | 26 | #### Cons: 27 | 28 | - Proof of Work is used for the underlying blockweave (§3.1), which is 29 | rather wasteful. 30 | - Data is stored indefinitely, which is great for public information, but not so 31 | great for ephemeral private data. This makes storage unnecessarily expensive 32 | for data with a short lifespan. (§3.1) 33 | - Network is free at point of use for external users, raising questions about 34 | scalability of the network when faced with highly popular content. (§3.4.2) 35 | Incentives for data replication help (§7.1.2), but it is unlikely that they 36 | will hold up when the network grows in content (§8.2, §8.3). These incentives 37 | can also lead to unnecessary duplication of unpopular content. 38 | - Nodes with limited connectivity are discouraged from participating in the 39 | network, which precludes use on mobile devices. (§3.4.3) 40 | - There is an economic incentive for a miner not to share old blocks with 41 | other miners, because it increases its chance of "winning" the new block. 42 | (§4.1.1) 43 | - There is an economic incentive for miners to have the strictest censorship 44 | rules, because otherwise a block that they mined might be rejected by others. 45 | (§5.1) 46 | - The majority of the network determines the censorship rules. This could prove 47 | troublesome should Arweave's Proof of Work lead to similar geographic 48 | centralization of mining power as we see in Bitcoin. (§5.3) 49 | - Transaction ID is used for addressing, instead of a content hash. (§7.1.1) 50 | - Uses HTTP for inter-node traffic, instead of an established peer-to-peer 51 | protocol.
(§7.1.3) 52 | -------------------------------------------------------------------------------- /evaluations/eigenlayer.md: -------------------------------------------------------------------------------- 1 | Eigenlayer 2 | ========== 3 | 4 | 2024-05-29 5 | 6 | A review of the Eigenlayer and EIGEN token whitepapers, with some thoughts on 7 | how this could be applied to Codex. 8 | 9 | * [Eigenlayer whitepaper](https://docs.eigenlayer.xyz/assets/files/EigenLayer_WhitePaper-88c47923ca0319870c611decd6e562ad.pdf) 10 | * [EIGEN token whitepaper](https://docs.eigenlayer.xyz/assets/files/EIGEN_Token_Whitepaper-0df8e17b7efa052fd2a22e1ade9c6f69.pdf) 11 | 12 | Eigenlayer 13 | ---------- 14 | 15 | The core idea of Eigenlayer is to reuse the collateral that is already staked on 16 | the Ethereum beacon chain for other protocols besides Ethereum. The collateral 17 | that Ethereum validators put up to ensure that they stick to the Ethereum 18 | consensus protocol is reused to ensure that they also follow the rules of 19 | other protocols. In exchange for this, they are rewarded with additional fees from 20 | these protocols. 21 | 22 | Eigenlayer has an open marketplace in which protocols advertise themselves, and 23 | validators can opt in to help secure these protocols by restaking their 24 | collateral (§2). 25 | 26 | The main mechanism is to have the Ethereum validators set the 27 | withdrawal address of their collateral to an Eigenlayer smart contract (§2.1). 28 | This means that when a validator has behaved nicely on the Ethereum network and 29 | wants to exit the network, its stake is passed to an Eigenlayer 30 | contract. This Eigenlayer contract will then perform additional checks to ensure 31 | that the validator wasn't slashed by any additional protocols that the validator 32 | participated in, before releasing the stake (§3.1). 33 | 34 | ### Incentives and centralization ### 35 | 36 | This raises the question: what happens to the incentive for the validator to 37 | behave nicely if their collateral has already been forfeited in Eigenlayer? And 38 | what would the consequences for the Ethereum beacon chain be if this were to 39 | happen to a large number of validators simultaneously? In the whitepaper two 40 | mitigations are mentioned: security audits (§3.4.2) and the ability to veto 41 | slashings (§3.5). Before a protocol is allowed onto the marketplace it needs to 42 | be verified through a security audit. And if the protocol were to inadvertently 43 | slash a large group of validators (e.g. through a bug in its smart contract), 44 | then there is a governing group that can veto these slashings. The downside to 45 | these mitigations is that they are both centralizing forces, because there is 46 | now a small group of people that decide whether a protocol is admitted to the 47 | marketplace, and a small group of people that can veto slashings. 48 | 49 | Eigenlayer claims to incentivize decentralization by allowing protocols to 50 | specify that they only want to make use of stake that is put up by home stakers 51 | (§4.4). However, given the permissionless nature of Ethereum, it is not possible 52 | to distinguish home stakers from a large centralized player with many 53 | validators, each having its own address. 54 | 55 | A further centralizing force in Eigenlayer is its license, which is not an open 56 | source license. This means that only the Eigenlayer developers can change the 57 | Eigenlayer code, and forking is not allowed.
58 | 59 | ### Potential use cases for Codex ### 60 | 61 | There are a couple of places in Codex that might benefit from restaking. We 62 | could allow Ethereum validators to use (a part of) their stake on the beacon 63 | chain for filling slots in storage contracts. There are a few downsides to this. 64 | It becomes rather difficult to reason about how high the stake for a storage 65 | contract should be when the stake behind a storage provider's promise can 66 | be shared with a number of other protocols (§3.4.1). Codex uses part of the 67 | slashed stake to incentivize repair, which would not be possible with restaking, 68 | because the stake only becomes available in Eigenlayer after the validator stops 69 | validating the beacon chain, and withdraws its collateral. That is, if the stake 70 | hasn't already been slashed by the beacon chain. Also, the hardware requirements 71 | for running an Ethereum validator are sufficiently different from the 72 | requirements of running a Codex provider, that we do not expect there to be many 73 | people that run both. 74 | 75 | We might also use restaking to keep proof aggregators honest (§4.1, point 6). 76 | Preferably, we would use a combination of staked Codex tokens and restaked ETH (§4.4), 77 | so that we increase the utility of the Codex token while also guarding against 78 | value loss of the token. 79 | 80 | And finally, we might use restaking to keep participants in a nano payments 81 | scheme honest (§4.1, points 2 and 8). We intend to add bandwidth payments to 82 | Codex, and for this we need nano payments, for which a blockchain is too slow. 83 | Ideally we'd have a lighter form of consensus for these payments. The validators 84 | of this lighter form of consensus could be kept honest by restaking. 85 | 86 | EIGEN Token 87 | ----------- 88 | 89 | The EIGEN token is a separate project only marginally related to Eigenlayer. It 90 | allows staking to disincentivize subjective faults. In contrast to objective 91 | faults, subjective faults cannot be coded into a smart contract, but need to be 92 | adjudicated by people (§1.2). 93 | 94 | This is implemented through a forkable token (§2.3.1) called EIGEN. Every time a 95 | subjective decision needs to be made, someone can create a new EIGEN' token, and 96 | start using that instead of the old token. If everyone agrees, then the new 97 | token will gain in perceived value, while the perceived value of the old token 98 | approaches 0. 99 | 100 | In the whitepaper a protocol is described to ensure that forking the token 101 | doesn't impact long-term holders of the token (§2.7). 102 | 103 | A centralizing force in the design is the security council, a small group of 104 | people in charge of freezing and/or upgrading the smart contracts (§2.7.4). 105 | 106 | Conclusion 107 | ---------- 108 | 109 | Given the centralizing aspects of Eigenlayer, it is probably not a good 110 | foundation on which to build parts of the Codex protocol. The idea of restaking is an 111 | interesting one, but not without its own risks that are not easy to quantify. 112 | 113 | The EIGEN token is probably not interesting for Codex, because we've made great 114 | efforts to ensure that bad behaviour on the network is either objectively 115 | punishable or economically disincentivized, negating the need for human 116 | adjudication.
117 | -------------------------------------------------------------------------------- /evaluations/filecoin.md: -------------------------------------------------------------------------------- 1 | An evaluation of the Filecoin whitepaper 2 | ======================================== 3 | 4 | 2020-12-08 Mark Spanbroek 5 | 6 | https://filecoin.io/filecoin.pdf 7 | 8 | Goal of this evaluation is to find things to adopt or avoid while designing 9 | Dagger. It is not meant to be a criticism of Filecoin. 10 | 11 | #### Pros: 12 | 13 | + Clients do not need to actively monitor hosts. Once a deal has been agreed 14 | upon, the network checks proofs of storage. 15 | + The network actively tries to repair storage faults by introducing new 16 | orders in the storage market (§4.3.4). 17 | + Integrity is achieved because files are addressed using their content 18 | hash (§4.4). 19 | + Marketplaces are explicitly designed and specified (§5). 20 | + Micropayments via payment channels (§5.3.1). 21 | + Integration with other blockchain systems such as Ethereum (§7.2) is being 22 | worked on. 23 | 24 | #### Cons: 25 | 26 | - Filecoin requires its own very specific blockchain, which influences a lot 27 | of its design. There is tight coupling between the blockchain, storage 28 | accounting, proofs and markets. 29 | - Proof of spacetime is much more complex than simple challenges, and only 30 | required to make the blockchain work (§3.3, §6.2) 31 | - A miner's influence is proportional to the amount of storage they're 32 | providing (§1.2), which is an incentive to become big. This could lead to 33 | the same centralization issues that plague Bitcoin. 34 | - Incentives are geared towards making the particulars of the Filecoin 35 | design work, instead of being directly aligned with users' interests. For instance, 36 | there are incentives for storage and retrieval, but it seems that a miner 37 | would be able to make money by only storing data, and never offering it for 38 | retrieval. Also, the incentive for a miner to store multiple independent 39 | copies does not mean protection against loss if they're all located on the 40 | same failing disk. 41 | - The blockchain contains a complete allocation table of all things that are 42 | stored in the network (§4.2), which raises questions about scalability. 43 | - Zero cash entry (such as in Swarm) doesn't seem possible. 44 | - Consecutive micropayments are presented as a solution for the trust problems 45 | while retrieving (§5.3.1), which doesn't entirely mitigate withholding 46 | attacks. 47 | - The addition of smart contracts (§7.1) feels like an unnecessary 48 | complication. 49 | -------------------------------------------------------------------------------- /evaluations/ipfs.md: -------------------------------------------------------------------------------- 1 | An evaluation of the IPFS paper 2 | =============================== 3 | 4 | 2021-01-07 Dagger Team 5 | 6 | https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf 7 | 8 | Goal of this evaluation is to find things to adopt or avoid while designing 9 | Dagger. It is not meant to be a criticism of IPFS.
10 | 11 | #### Pros: 12 | 13 | + IPFS is designed by simplifying, evolving, and connecting proven techniques 14 | (§3) 15 | + Consists of a stack of separately described sub-protocols (§3) 16 | + Uses Coral DSHT to favor data that is nearby, reducing latency of lookup 17 | (§2.1.2) 18 | + Uses proof-of-work in S/Kademlia to discourage Sybil attacks (§2.1.3) 19 | + Favors self-describing values such as multihash (§3.1) and multiaddr (§3.2.1) 20 | + BitSwap protocol for exchanging blocks supports multiple strategies (§3.4.2), 21 | so it should be relatively easy to add a micropayment strategy. 22 | + Uses content addressing (§3.5) 23 | + The Merkle DAG is simple, yet allows constructing filesystems, 24 | key-value stores, databases, messaging systems, etc. (§3.5) 25 | 26 | #### Cons: 27 | 28 | - Kademlia prefers long-lived nodes (§2.1.1), which is not ideal for mobile 29 | environments (although it's unclear whether there are any better alternatives) 30 | - The default BitSwap strategy falls just short of introducing a currency with 31 | micro payments, necessitating additional work for nodes to find blocks to 32 | barter with (§3.4) 33 | - Object pinning (§3.5.3) inevitably leads to centralized gateways to IPFS, such 34 | as Infura and Pinata 35 | - IPFS uses variable size blocks instead of fixed-size chunks (§3.6), which 36 | might make it a bit harder to add incentives and pricing 37 | - Supporting version control directly in IPFS feels like an unnecessary 38 | complication (§3.6.5) 39 | -------------------------------------------------------------------------------- /evaluations/rollups.md: -------------------------------------------------------------------------------- 1 | Ethereum L2 Rollups 2 | =================== 3 | 4 | A quick and dirty overview of existing rollups and their suitability for hosting 5 | the Codex marketplace smart contracts. To interact with these contracts, the 6 | participants in the network create blockchain transactions for purchasing and 7 | selling storage, and for providing storage proofs that are then checked 8 | on-chain. It would be too costly for these transactions to happen on Ethereum 9 | main net, which is why this document explores L2 rollups as an alternative. 10 | 11 | Main sources used: 12 | - individual websites of the rollup projects 13 | - https://l2beat.com 14 | - https://blog.kroma.network/l2-scaling-landscape-fees-and-max-tps-fe6087d3f690 15 | 16 | Requirements 17 | ------------ 18 | 19 | For the storage marketplace to work, we have the following requirements for a 20 | rollup: 21 | 1. Low gas costs; if gas is too costly then the business case of storage 22 | providers disappears 23 | 2. EVM compatibility; this shortens our time to market because we already have 24 | Solidity contracts 25 | 3. Support for BN254 elliptic curve precompiles (ecAdd, ecMul, ecPairing) for 26 | the proof system 27 | 4. High throughput; our current proof system that checks all proofs separately 28 | on chain requires a large number of transactions per second 29 | 5. Censorship resistance; an L2 operator should not have the power to exclude 30 | transactions from certain people or apps 31 | 32 | Note that low latency is not a requirement; it's ok to have latency equivalent 33 | to L1, which is in the order of tens of seconds. 34 | 35 | Main flavours 36 | ------------- 37 | 38 | Although there are many L2 rollups, there is a limited number of technical 39 | stacks that underlie them.
40 | 41 | There is the family of purely optimistic rollups, that rely on fraud proofs to 42 | ensure that they are kept honest: 43 | - Arbitrum 44 | - Optimism / OP Stack 45 | - Fuel 46 | 47 | And there are the rollups that rely on zero-knowledge proofs to prove that they 48 | act honestly: 49 | - Polygon zkEVM / CDK 50 | - Linea 51 | - zkSync 52 | - Scroll 53 | 54 | And there's Taiko, which uses a combination of zero-knowledge proofs and fraud 55 | proofs to keep the network honest: 56 | - Taiko 57 | 58 | Gas prices 59 | ---------- 60 | 61 | A rough approximation of average gas prices for submitting a Codex storage proof 62 | for each rollup: 63 | 64 | | Rollup | Average proof price | Potential profit | 65 | | ------------------- | ------------------ | ---------------- | 66 | | Mantle | $0.0000062723 | $2.58 | 67 | | Boba network | $0.0016726250 | -$2.54 | 68 | | Immutable zkEVM | $0.0073595500 | -$20.01 | 69 | | Arbitrum | $0.0083631250 | -$23.09 | 70 | | zkSync Era | $0.0209078125 | -$61.63 | 71 | | Base | $0.0418156250 | -$125.86 | 72 | | Optimism | $0.0836312500 | -$254.32 | 73 | | Polygon zkEVM | $0.1254468750 | -$382.77 | 74 | | Blast | $0.1672625000 | -$511.23 | 75 | | Scroll | $0.2090781250 | -$639.69 | 76 | | Taiko | $0.2508937500 | -$768.15 | 77 | | Metis | $0.4014300000 | -$1,230.59 | 78 | | Linea | $0.8363125000 | -$2,566.55 | 79 | 80 | This table was created by eyeballing the gas cost and token price graphs for 81 | each L2, and [calculating the USD costs](rollups.ods) for a proof from that. We 82 | did not include rollups that are not EVM compatible. 83 | 84 | Potential profit (per month per TB) is calculated by assuming operational costs 85 | of $1.40 and revenue of $4.00 per TB per month, an average slot size of 10 GB, 86 | and an average of 1 proof per slot per day. 87 | 88 | EVM compatibility 89 | ----------------- 90 | 91 | This shows which rollups are EVM compatible, and whether they support the BN254 92 | elliptic curve precompiles that we require for verification of our storage 93 | proofs (ecAdd, ecMul, ecPairing). 94 | 95 | | Rollup | EVM compatible | Elliptic Curve operations | 96 | | --------------------- | -------------- | ------------------------- | 97 | | Arbitrum | Yes | Yes | 98 | | Base | Yes | Yes | 99 | | Blast | Yes | Yes | 100 | | Boba network | Yes | Yes | 101 | | Immutable zkEVM | Yes | Yes | 102 | | Linea | Yes | Yes | 103 | | Mantle | Yes | Yes | 104 | | Metis | Yes | Yes | 105 | | Optimism | Yes | Yes | 106 | | Polygon zkEVM | Yes | Yes | 107 | | Scroll | Yes | Yes | 108 | | Taiko | Yes | Yes | 109 | | zkSync Era | Yes | No | 110 | | Fuel L2 V1 | No | N/A | 111 | | Fuel Rollup OS | No | N/A | 112 | | Immutable X | No | N/A | 113 | | Polygon Miden | No | N/A | 114 | | Starknet | No | N/A | 115 | | zkSync lite | No | N/A | 116 | 117 | 118 | Throughput 119 | ---------- 120 | 121 | A rough approximation of the maximum number of transactions that a rollup can 122 | handle, and the maximum size of the storage network that it might support: 123 | 124 | | Rollup | Maximum TPS | Maximum storage | 125 | | --------------------- | ----------- | --------------- | 126 | | zkSync Era | 750 | 1236 PB | 127 | | Starknet | 484 | 798 PB | 128 | | Optimism | 455 | 750 PB | 129 | | Base | 455 | 733 PB | 130 | | Mantle | 400 | 659 PB | 131 | | Metis | 357 | 588 PB | 132 | | Polygon zkEVM | 237 | 391 PB | 133 | | Arbitrum | 226 | 372 PB | 134 | | Boba network | 205 | 338 PB | 135 | | Scroll | 50 | 82 PB | 136 | | Taiko | 33 | 54 PB | 137 | | Blast | ? | ? 
| 138 | | Immutable zkEVM | ? | ? | 139 | | Linea | ? | ? | 140 | | Fuel L2 V1 | ? | ? | 141 | | Fuel Rollup OS | ? | ? | 142 | | Immutable X | ? | ? | 143 | | Polygon Miden | ? | ? | 144 | | zkSync lite | ? | ? | 145 | 146 | Maximum size of the storage network is [calculated](rollups.ods) assuming an 147 | average 1 proof per 24 hours per slot, average slot size 10 GB, and average 148 | erasure coding rate of 1/2. In practice the calculated maximum storage is going 149 | to be less, because we can't use up the entirety of the rollup capacity. 150 | 151 | Maximum TPS figures are taken from an [overview document by 152 | Kroma](https://blog.kroma.network/l2-scaling-landscape-fees-and-max-tps-fe6087d3f690) 153 | 154 | Censorship resistance 155 | --------------------- 156 | 157 | Censorship resistance can be achieved by having a decentralized architecture, 158 | where anyone is allowed to propose blocks and there are no admin rights that 159 | allow a rollup operator to change the rules in their favour. 160 | 161 | Only Fuel L2 V1 has all these properties, the others don't. And because Fuel L2 162 | V1 is a payment network without smart contracts it is not suitable for the Codex 163 | marketplace. This means that at this moment there is no censorship resistant 164 | rollup that can host the Codex marketplace. 165 | 166 | Taiko is one of the few rollups that has a decentralized architecture, and it's 167 | committed to becoming permissionless. However, at the moment it is not. 168 | 169 | | Rollup | Decentralized | Permissionless | Adminless | 170 | | --------------------- | ------------- | --------------- | ------------ | 171 | | Fuel L2 V1 | Yes | Yes | Yes | 172 | | Metis | Yes | No | No | 173 | | Taiko | Yes | No | No | 174 | | Arbitrum | No | N/A | N/A | 175 | | Base | No | N/A | N/A | 176 | | Blast | No | N/A | N/A | 177 | | Boba network | No | N/A | N/A | 178 | | Fuel Rollup OS | No | N/A | N/A | 179 | | Immutable X | No | N/A | N/A | 180 | | Immutable zkEVM | No | N/A | N/A | 181 | | Linea | No | N/A | N/A | 182 | | Mantle | No | N/A | N/A | 183 | | Optimism | No | N/A | N/A | 184 | | Polygon zkEVM | No | N/A | N/A | 185 | | Polygon Miden | No | N/A | N/A | 186 | | Scroll | No | N/A | N/A | 187 | | Starknet | No | N/A | N/A | 188 | | zkSync lite | No | N/A | N/A | 189 | | zkSync Era | No | N/A | N/A | 190 | 191 | Conclusion 192 | ---------- 193 | 194 | There seems to be no rollup that matches all the requirements that we listed in 195 | the beginning of the document. The most pressing problem is that only Mantle 196 | seems to be cheap enough to allow storage providers to turn a profit, given the 197 | assumptions of an average 10 GB slot size and 1 proof per 24 hours. It is 198 | unclear whether these low prices are sustainable in the long run. If we want to 199 | have more choice on where to deploy, then we either need to reduce the number of 200 | on-chain proofs in Codex drastically, or we need to find a way to reduce rollup 201 | transaction costs. 202 | 203 | Luckily we're already working on reducing the number of proofs by introducing 204 | proof aggregation, but the analysis in this document shows that we might not be 205 | able to launch a storage network without it. Reducing the number of proofs also 206 | ensures that the network can grow to a larger total size. 207 | 208 | When we look at reducing the transaction costs, then the best thing to focus on 209 | is getting rid of the need to post transactions on L1 in blobs. 
This is by far 210 | the most expensive part of running a rollup, and this is most likely also why 211 | Mantle is the only rollup in this overview that is cheap enough; it uses EigenDA 212 | instead of posting to L1. In this respect, it might also be interesting to look 213 | at Arbitrum AnyTrust, which has a similar design. We could also consider 214 | creating a fork of an existing rollup and using Codex as the DA layer. Some of the 215 | new modular architectures for creating rollups, such as 216 | [Espresso](https://www.espressosys.com/), [Astria](https://www.astria.org/), 217 | [Radius](https://www.theradius.xyz/), [Altlayer](https://www.altlayer.io/) and 218 | [NodeKit](https://www.nodekit.xyz/) could also make it easier to experiment with 219 | different rollup designs. 220 | -------------------------------------------------------------------------------- /evaluations/rollups.ods: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/codex-storage/codex-research/4a5ddf0e5271bdb49fe6db18e2a2d4268d37c07a/evaluations/rollups.ods -------------------------------------------------------------------------------- /evaluations/sia.md: -------------------------------------------------------------------------------- 1 | An evaluation of the Sia whitepaper 2 | =================================== 3 | 4 | 2020-12-07 Mark Spanbroek 5 | 6 | https://sia.tech/sia.pdf 7 | 8 | Goal of this evaluation is to find things to adopt or avoid while designing 9 | Dagger. It is not meant to be a criticism of Sia. 10 | 11 | #### Pros: 12 | 13 | + Clients do not need to actively monitor hosts (§1). Once a contract has been 14 | agreed upon, the host earns/loses coins based on proofs of storage that the 15 | network can check. 16 | + Denial of service attacks can be mitigated by burning funds associated with 17 | missed proofs (§4). 18 | + Proof of storage is simple; provide a random piece of the file, and the 19 | corresponding Merkle proof (§5.1). 20 | + Promotes erasure codes to safeguard against data loss (§7.2). 21 | + Suggests using payment channels for micro-payments (§7.3). 22 | + The basic reputation system is protected against Sybil attacks (§7.4). 23 | 24 | #### Cons: 25 | 26 | - Sia has its own blockchain (§1), which makes some attacks more likely 27 | (§5.2, §5.3). This can be mitigated by adopting a widely used, general purpose 28 | blockchain such as Ethereum. 29 | - Requires a multi-signature scheme (§2). 30 | - The proof-of-storage algorithm requires that hosts store the entire file (§4), 31 | instead of a few chunks. 32 | - Contracts can be edited (§4). This feels like an unnecessary complication of 33 | the protocol. 34 | - Randomness for the storage proofs comes from the latest block hash (§5.1). 35 | This can be manipulated, especially when using a specialized blockchain for 36 | storage. 37 | - There is an arbitrary data field that might be used for advertisements in a 38 | storage marketplace (§6). This feels like a very restrictive environment for a 39 | marketplace, and an unnecessary complication for the underlying blockchain. 40 | - It is suggested that clients use erasure coding before encryption (§7.2). If 41 | this were reversed (first encryption, then erasure coding) then this would open 42 | up scenarios for caching and re-hosting by those who do not possess the 43 | decryption key. 44 | - Consecutive micropayments are presented as a solution for the trust problems 45 | while downloading (§7.3).
This assumes that the whole file, or a large part of 46 | it, is stored on a single host. It also doesn't entirely mitigate withholding 47 | attacks. 48 | - The basic reputation system favors hosts that have already earned or bought 49 | coins (§7.4). It is also unclear how the reputation system discourages abuse. 50 | - Governance seems fairly centralized, with most funds and proceeds going to a 51 | single company (§8). 52 | -------------------------------------------------------------------------------- /evaluations/sidechains.md: -------------------------------------------------------------------------------- 1 | Side chains 2 | =========== 3 | 4 | This document looks at the economics of running the Codex marketplace contracts 5 | on an Ethereum side chain. Both existing side chains and running a dedicated 6 | side chain for Codex are considered. Existing Ethereum [rollups][1] seem to be 7 | too expensive by about a factor of 100 for our current storage proof scheme 8 | (without proof aggregation). We'd like to find out if using a side chain could 9 | sufficiently lower the transaction costs. 10 | 11 | [1]: ../evaluations/rollups.md 12 | 13 | 14 | Existing side chains 15 | -------------------- 16 | 17 | First we'll take a look at Polygon PoS and Gnosis chain to determine the average 18 | gas costs for submitting a storage proof on these chains. Then we'll estimate 19 | what the operational costs are of running these chains and compare that against 20 | their revenue in gas fees. This is done to see whether there is any room for 21 | reducing prices should we want to run a dedicated side chain for Codex. 22 | 23 | ### Gas prices ### 24 | 25 | A rough approximation of average gas costs for submitting a Codex storage proof 26 | for these chains: 27 | 28 | | Side chain | Average proof costs | Potential profit | 29 | | ------------------- | ------------------ | ---------------- | 30 | | Polygon PoS | $0.0070250250 | -$18.98 | 31 | | Gnosis chain | $0.0050178750 | -$12.81 | 32 | 33 | This table was created by eyeballing the gas cost and token price graphs for 34 | each chain over the past 6 months, and [calculating the USD 35 | costs](sidechains.ods) for a proof from that. 36 | 37 | Potential profit (per month per TB) is calculated by assuming operational costs 38 | of $1.40 and revenue of $4.00 per TB per month, an average slot size of 10 GB, 39 | and an average of 1 proof per slot per day. 40 | 41 | ### Throughput ### 42 | 43 | A rough approximation of the maximum number of transactions that a chain can 44 | handle, and the maximum size of the storage network that it might support: 45 | 46 | | Side chain | Maximum TPS | Maximum storage | 47 | | --------------------- | ----------- | --------------- | 48 | | Polygon PoS | 255 | 420 PB | 49 | | Gnosis chain | 156 | 257 PB | 50 | 51 | Maximum size of the storage network is [calculated](sidechains.ods) assuming an 52 | average 1 proof per 24 hours per slot, average slot size 10 GB, and average 53 | erasure coding rate of 1/2. 54 | 55 | ### Decentralization ### 56 | 57 | Polygon PoS has substantially fewer validators than Gnosis chain: 58 | 59 | | Side chain | Number of validators | 60 | | --------------------- | -------------------- | 61 | | Polygon PoS | 100 | 62 | | Gnosis chain | 200 000 | 63 | 64 | ### Network costs ### 65 | 66 | To get an idea of the actual costs for running a chain, we estimate the hardware 67 | costs needed to keep the network running.
We take the cost of running a single 68 | validator and multiply this by the number of validators. This should give us an 69 | idea how much of the gas price is used to cover operational costs, and how much 70 | is profit. These are [back of the envelope calculations](sidechains.ods) using 71 | data from the past 6 months to get an idea of the order of magnitude, not meant 72 | to be very accurate: 73 | 74 | | Side chain | Hardware costs | Network fees | Cost / revenue ratio | 75 | | ------------ | ------------------ | -----------------| --------------------- | 76 | | Polygon PoS | $28 000 / month | $840 000 / month | 3% | 77 | | Gnosis chain | $4 000 000 / month | $15 000 / month | 26667% | 78 | 79 | While Polygon PoS seems to have a healthy margin for profit, the validators of 80 | the Gnosis chain are spending about 250x more on hardware costs than is covered 81 | by the network fees. This is mostly due to the large number of validators, and 82 | seems to be compensated for by reserving tokens and using them for paying out 83 | [validator rewards][2]. Also, Polygon PoS has a utilization of about 90%, 84 | whereas Gnosis chain has a utilization of about 25%. 85 | 86 | [2]: https://forum.gnosis.io/t/gno-utility-and-value-proposition/2344#current-gno-distribution-and-gno-burn-5 87 | 88 | 89 | A custom side chain for Codex 90 | ----------------------------- 91 | 92 | Next, we'll look at ways in which we could reduce gas costs by deploying a 93 | dedicated side chain for Codex. 94 | 95 | ### EVM opcode pricing ### 96 | 97 | Ethereum transactions consist of EVM operations. Each operation is priced in 98 | an amount of gas. Some operations [are more expensive than others][3], mainly 99 | because they require more resources (cpu, storage) than others. Gas costs are 100 | also specifically engineered to withstand DoS attacks on validators. 101 | 102 | Tweaking the gas prices of EVM opcodes does not seem to be the most viable path 103 | to lowering transaction costs, because it only determines how expensive 104 | operations are relative to one another; it does not determine the actual price. 105 | It is also difficult to assess the security risks. 106 | 107 | [3]: https://notes.ethereum.org/@poemm/evm384-update5#Background-on-EVM-Gas-Costs 108 | 109 | ### Gas pricing ### 110 | 111 | The biggest factor that determines the actual costs of transactions is the gas 112 | price. The transaction cost is [determined according to the following 113 | formula][4] as specified by [EIP-1559][5]: 114 | 115 | `fee = units of gas used * (base fee + priority fee)` 116 | 117 | The base fee is calculated based on how full the latest blocks are. If they are 118 | above the target block size of 15 million gas, then the base fee increases. If 119 | they are below the target block size, then the base fee decreases. The base fee 120 | is burned when the transaction is included. The priority fee is set by the 121 | transaction sender, and goes to the validators. It is a mechanism for validators 122 | to prioritize transactions. It also acts as an incentive for validators to not 123 | produce empty blocks. 124 | 125 | Both the base fee and the priority fee go up when there is more demand 126 | (submitted transactions) than there is supply (maximum transactions per second).
127 | This is the main reason why transactions are as expensive as they are: 128 | 129 | "No transaction fee mechanism, EIP-1559 or otherwise, is likely to substantially 130 | decrease average transaction fees; persistently high transaction fees is a 131 | scalability problem, not a mechanism design problem" -- [Tim Roughgarden][6] 132 | 133 | [4]: https://ethereum.org/en/developers/docs/gas/#how-are-gas-fees-calculated 134 | [5]: https://eips.ethereum.org/EIPS/eip-1559 135 | [6]: http://timroughgarden.org/papers/eip1559.pdf 136 | 137 | 138 | ### The scalability problem ### 139 | 140 | Ultimately, high transaction costs are a scalability issue. And unfortunately 141 | there are [no easy solutions][7] for increasing the number of transactions that 142 | the chain can handle. For instance, just increasing the block size introduces 143 | several issues. Increasing block size increases the amount of resources that 144 | validators need (cpu, memory, storage, bandwidth). This means that it becomes 145 | more expensive to run a validator, which leads to a decrease in the number of 146 | validators, and an increase in centralization. It also increases block time, 147 | because there is more time needed to disseminate the block. And this actually 148 | decreases the capacity of the network in terms of transactions per second, which 149 | counters the positive effect that you get from increasing the block size. 150 | 151 | [7]: https://cryptonews.com/news/contrary-to-musk-s-idea-you-can-t-just-increase-block-size-s-10426.htm 152 | 153 | Conclusion 154 | ---------- 155 | 156 | The gas prices on the existing side chains that we looked at are not low enough for 157 | storage providers to turn a profit, as long as we haven't implemented proof 158 | aggregation yet. 159 | 160 | From the cost analysis of Polygon PoS it seems feasible to launch a 161 | not-for-profit dedicated side chain for Codex that reduces transaction costs by 162 | about a factor of 10. This should be enough for storage providers to start 163 | making a very modest profit if they charge $4/TB/month. Polygon PoS achieves 164 | this by keeping a relatively low number of validators, which is something to 165 | keep in mind when deploying a side chain for Codex. Also, as soon as there are more 166 | transactions than fit in the blocks, and the chain is running at capacity, 167 | the gas price will go up. 168 | 169 | For the short term it seems viable to start with a dedicated side chain for 170 | Codex, while there is no high demand yet. This gives us time to work on reducing 171 | the number of transactions, for instance by aggregating storage proofs. In the 172 | beginning the number of transactions won't be sufficient to cover the costs of 173 | running validators, so some sponsoring of validators will be required to 174 | bootstrap the chain. 175 | 176 | For the medium term we can consider having multiple side chains depending on 177 | demand. If demand approaches the capacity of the existing side chain(s), then 178 | another side chain is started. This ensures that none of the side chains runs at 179 | full capacity, keeping the prices low. Because each side chain can be bridged to 180 | main net, funds can be moved from one side chain to the other. The obvious 181 | downside of this is fragmentation of the marketplace. However, the reason to add 182 | a side chain is because demand is high, so each fragment should have a healthy 183 | marketplace. Also, the Codex peer-to-peer network would not be fragmented, only 184 | the marketplace.
This means that there is still a single content-addressable data 185 | space. 186 | 187 | For the long term we should probably move to a blockchain that supports a higher 188 | number of transactions at a lower cost than is currently available. 189 | -------------------------------------------------------------------------------- /evaluations/sidechains.ods: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/codex-storage/codex-research/4a5ddf0e5271bdb49fe6db18e2a2d4268d37c07a/evaluations/sidechains.ods -------------------------------------------------------------------------------- /evaluations/statechannels/disputes.md: -------------------------------------------------------------------------------- 1 | State Channel Disputes 2 | ====================== 3 | 4 | A problem with state channels is that participants have to remain "online"; they 5 | need to keep an eye on the latest state of the underlying blockchain and be able to 6 | respond to any disputes about the final state of the state channel. Ironically, 7 | this problem stems from the mechanism that allows a state channel to be closed 8 | when a participant goes offline. Closing a channel unilaterally is allowed, but 9 | there is a period in which the other participant can dispute the final state of 10 | the channel. Therefore, participants should monitor the blockchain so that 11 | they can respond during a dispute period. 12 | 13 | ### Pisa 14 | 15 | https://www.cs.cornell.edu/~iddo/pisa.pdf 16 | 17 | The PISA protocol enables a participant to outsource monitoring of the 18 | underlying blockchain to an untrusted watchtower. The main idea is that a hash 19 | of the latest state channel update is sent to the watchtower. The watchtower 20 | responds with a signed promise to use this information to settle any disputes 21 | that may arise. Should the watchtower fail to do so, then the signed promise can 22 | be used as proof of negligence and it will lose its substantial collateral. 23 | 24 | A potential problem with this scheme is that the collateral is shared among all 25 | state channels that the watchtower is monitoring, which could lead to bribing 26 | attacks. 27 | 28 | ### Brick 29 | 30 | https://arxiv.org/pdf/1905.11360.pdf 31 | 32 | The BRICK protocol provides an alternative to the dispute period based on 33 | Byzantine consistent broadcast. Participants in a state channel assign a 34 | committee that is allowed to sign off on channel closing in case they are not 35 | able to do so themselves. Instead of waiting for a period of time before 36 | unilaterally closing the channel, with BRICK you wait for a threshold number of 37 | committee members to confirm the latest state of the channel. This is much 38 | faster. 39 | 40 | Each state channel update contains a sequence number, which is signed by the 41 | channel participants and sent to the committee members. For a channel to be 42 | closed unilaterally, the BRICK smart contract requires a signed state update, 43 | and signed sequence numbers provided by a majority of the committee. The highest 44 | submitted sequence number should match the submitted state update. Committee 45 | members that submit a lower sequence number lose the collateral that they 46 | provided when the channel was opened.
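The closing rule described above can be illustrated with a minimal sketch. This is a toy model of the rule as summarized here, not the paper's full protocol; names are ours, and signatures are assumed rather than checked.

```python
# Toy model of BRICK's unilateral close: the contract needs sequence numbers
# from a majority of the committee, and the highest of those must match the
# sequence number of the submitted state update. Members that vouched for an
# older sequence number forfeit the collateral they provided at channel open.

def close_channel(state_update, committee_votes, committee_size):
    """state_update: dict with a 'seq' field, signed by both participants.
    committee_votes: committee member -> signed sequence number.
    Returns (accepted, slashed_members)."""
    majority = committee_size // 2 + 1
    if len(committee_votes) < majority:
        return False, []  # not enough committee confirmations
    highest = max(committee_votes.values())
    if highest != state_update["seq"]:
        return False, []  # the submitted state update is stale
    slashed = [m for m, seq in committee_votes.items() if seq < highest]
    return True, slashed

accepted, slashed = close_channel(
    state_update={"seq": 7},
    committee_votes={"m1": 7, "m2": 7, "m3": 5},
    committee_size=4,
)
assert accepted and slashed == ["m3"]
```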
47 | 48 | A potential problem with the implementation of BRICK as outlined in the paper is 49 | that the collateral scheme is vulnerable to Sybil attacks; committee members can 50 | attempt to steal their own collateral by providing proof of their own 51 | wrongdoing. 52 | 53 | Unilateral closing is also rather heavy on blockchain transactions; each 54 | committee member has to separately perform a transaction on chain to supply 55 | their sequence number. 56 | -------------------------------------------------------------------------------- /evaluations/statechannels/overview.md: -------------------------------------------------------------------------------- 1 | State Channels 2 | ============== 3 | 4 | State channels are a level 2 solution to enable fast and cheap transactions 5 | between parties whose trust is anchored on a blockchain. 6 | 7 | We'll go through the evolution of state channels in somewhat chronological 8 | order, starting with the simplest form: uni-directional payment channels. 9 | 10 | Uni-directional payment channels 11 | -------------------------------- 12 | 13 | Payments are one-to-one, and flow in one direction only. They are easy to 14 | understand and are the base upon which further enhancements are built. 15 | 16 | Flow: 17 | 18 | 1. Alice locks up an amount of coins (e.g. 1 Eth) in a smart contract 19 | on-chain. This opens up the payment channel. She's not able to touch the 20 | coins for a fixed amount of time. Bob is the only one able to withdraw at 21 | any time. 22 | 2. She then sends Bob payments off-chain, which basically amount to signed 23 | "Alice owes Bob x Eth" statements. The amount owed is strictly increasing. 24 | For instance, if Alice first pays Bob 0.2 Eth, and then 0.3 Eth, then Bob 25 | first receives a statement "Alice owes Bob 0.2 Eth", and then "Alice owes 26 | Bob 0.5 Eth". 27 | 3. Bob sends the latest statement from Alice to the smart contract, which pays 28 | Bob the total amount due. This closes the payment channel. 29 | 30 | 31 | ------------ 32 | | Contract | <------ 1 ---- Alice 33 | | | 34 | | | | | | 35 | | | | | | 36 | | | 2 2 2 37 | | | | | | 38 | | | | | | 39 | | | v v v 40 | | | 41 | | | <------ 3 ---- Bob 42 | ------------ 43 | 44 | 45 | Bi-directional payment channels 46 | ------------------------------- 47 | 48 | Payments are one-to-one, and are allowed to flow in both directions. 49 | 50 | Flow: 51 | 52 | 1. Both Alice and Bob lock up an amount of coins to open the payment channel. 53 | 2. Alice and Bob send each other payments off-chain, whereby they sign the 54 | total amount owed for both parties. For instance, when Bob sends 0.3 Eth 55 | after Alice sent 0.2 Eth, he will sign the statement: 56 | "A->B: 0.2, B->A: 0.3". These statements have a strictly increasing 57 | version number. 58 | 3. At any time, Alice or Bob can use the latest signed statement and ask 59 | the smart contract to pay out the amounts due. This closes the payment 60 | channel. To ensure that Alice and Bob do not submit an old statement, 61 | there is a period in which the other party can provide a newer statement. 62 | 63 | Because of the contention period, these channels take longer to close in case of 64 | a dispute. Also, both parties need to remain online and keep up with the latest 65 | state of the blockchain.
66 | 67 | ------------ 68 | | Contract | <------ 1 ---- Alice ---- 69 | | | | 70 | | | | ^ ^ | 71 | | | | | | | 72 | | | 2 2 2 | 73 | | | | | | | 74 | | | | | | | 75 | | | v | | | 76 | | | | 77 | | | <------ 3 ---- Bob | 78 | | | | 79 | | | <------ 3 ---------------- 80 | ------------ 81 | 82 | Payment channel networks 83 | ------------------------ 84 | 85 | Opening up a payment channel for every person that you interact with is 86 | impractical because they need to be opened and closed on-chain. 87 | 88 | Payment channel networks solve this problem by routing payments through 89 | intermediaries. If Alice wishes to pay David, she might route the payment 90 | through Bob and Carol. Hash-locks are used to ensure that a routed payment 91 | either succeeds or is rejected entirely. Intermediaries typically charge a fee 92 | for their efforts. 93 | 94 | Routing algorithms for payment channel networks are an active area of research. 95 | Each routing algorithm has its own drawbacks. 96 | 97 | 98 | Alice --> Bob --> Carol --> David 99 | 100 | 101 | State channels 102 | -------------- 103 | 104 | Payment channels can be generalized to not just handle payments, but also state 105 | changes, to enable off-chain smart contracts. Instead of signing off on amounts 106 | owed, parties sign off on transactions to a smart contract. Upon closing of a 107 | state channel, only a single transaction is executed on the on-chain contract. 108 | In case of a dispute, a contention period is used to determine which transaction 109 | is the latest. This means that just like bi-directional payment channels there 110 | is a need to remain online. 111 | 112 | Virtual channels 113 | ---------------- 114 | 115 | When routing payments over a payment channel network, all participants in the 116 | route are required to remain online and confirm all payments. Virtual channels 117 | alleviate this by involving intermediary nodes only for opening and closing 118 | a channel. They are built around the idea that state channels can host a smart 119 | contract for opening and closing a virtual channel. 120 | 121 | Existing solutions 122 | ------------------ 123 | 124 | | Name | Bi-directional | State | Routing | Virtual | 125 | |-------------------|----------------|-------|---------|---------| 126 | | raiden.network | ✓ | ✕ | ✓ | ✕ | 127 | | perun.network | ✓ | ✓ | ✓ | ✓ | 128 | | statechannels.org | ✓ | ✓ | ✓ | ✓ | 129 | | ethswarm.org | ✓ | ✕ | ✓ | ✕ | 130 | 131 | References 132 | ---------- 133 | 134 | * [SoK: Off The Chain Transactions][1]: a comprehensive overview of level 2 135 | solutions 136 | * [Raiden 101][2]: explanation of payment channel networks 137 | * [Perun][3] and [Nitro][4]: explanation of virtual state channels 138 | 139 | [1]: https://nms.kcl.ac.uk/patrick.mccorry/SoKoffchain.pdf 140 | [2]: https://raiden.network/101.html 141 | [3]: https://perun.network/pdf/Perun2.0.pdf 142 | [4]: https://magmo.com/nitro-protocol.pdf 143 | -------------------------------------------------------------------------------- /evaluations/storj.md: -------------------------------------------------------------------------------- 1 | An evaluation of the Storj whitepaper 2 | ===================================== 3 | 4 | 2020-12-22 Mark Spanbroek 5 | 6 | https://storj.io/storjv3.pdf 7 | 8 | Goal of this evaluation is to find things to adopt or avoid while designing 9 | Dagger. It is not meant to be a criticism of Storj. 
10 | 11 | #### Pros: 12 | 13 | + Performance is considered throughout the design 14 | + Provides an Amazon S3 compatible API (§2.4) 15 | + Bandwidth usage of storage nodes is aggressively minimized to enable people 16 | with bandwidth caps to participate, which is good for decentralization (§2.7) 17 | + Erasure codes are used for redundancy (§3.4), upload and download speed 18 | (§3.4.2), proof of retrievability (§4.13) and repair (§4.7)! 19 | + BIP32 hierarchical keys are used to grant access to file paths (§3.6, §4.11) 20 | + Ethereum based token for payments (§3.9) 21 | + Storage nodes are not paid for uploads to avoid nodes that delete immediately 22 | after upload (§4.3) 23 | + Proof of Work on the node id is used to counter some Sybil attacks (§4.4) 24 | + Handles key revocations in a decentralized manner (§4.4) 25 | + Uses a simplified Kademlia DHT for node lookup (§4.6) 26 | + Uses caching to speed up Kademlia lookups (§4.6) 27 | + Uses standard-sized chunks (segments) throughout the network (§4.8.2) 28 | + Erasure coding is applied after encryption, allowing the network to repair 29 | redundancy without the need to know the decryption key (§4.8.4) 30 | + Streaming and seeking within a file are supported (§4.8.4) 31 | + Micropayments via payment channels (§4.17) 32 | + Paper has a very nice overview of possible attacks and mitigations (§B) 33 | 34 | 35 | #### Cons: 36 | 37 | - Mostly designed for long-lived stable nodes (§2.5) 38 | - Satellites are the gateway nodes to the network (§4.1.1), whose requirements 39 | for uptime and reputation lead to centralization (§4.10). They are also a 40 | single point of failure for a user, because they store file metadata (§4.9). 41 | - Centralization is further encouraged by having a separate network of approved 42 | satellites (§4.21) 43 | - Clients have to actively perform audits (§4.13) and execute repair (§4.14) 44 | (through their trusted satellite) 45 | - The network has a complex reputation system (§4.15) 46 | - Consecutive micropayments are presented as a solution for the trust problems 47 | while retrieving (§4.17), which doesn't entirely mitigate withholding attacks. 48 | - Scaling is hampered by the centralization that happens in the satellites 49 | (§6.1) 50 | - The choice to avoid Byzantine distributed consensus, such as a blockchain 51 | (§2.10, §A.3) results in the need for trusted centralized satellites 52 | -------------------------------------------------------------------------------- /evaluations/sui.md: -------------------------------------------------------------------------------- 1 | ../papers/Sui/sui.md -------------------------------------------------------------------------------- /evaluations/swarm.md: -------------------------------------------------------------------------------- 1 | An evaluation of the Swarm book 2 | =============================== 3 | 4 | 2020-12-22 Mark Spanbroek 5 | 6 | https://swarm-gateways.net/bzz:/latest.bookofswarm.eth/ 7 | 8 | Goal of this evaluation is to find things to adopt or avoid while designing 9 | Dagger. It is not meant to be a criticism of Swarm.
10 | 11 | #### Pros: 12 | 13 | + Book contains a well-articulated vision and historical context (§1) 14 | + Uses libp2p as underlay network (§2.1.1) 15 | + Uses content-addressable fixed-size chunks (§2.2.1, §2.2.2) 16 | + Employs encryption by default, enabling plausible deniability for node owners 17 | (§2.2.4) 18 | + Opportunistic caching allows for automatic scaling for popular content 19 | (§2.3.1, §3.1.2) 20 | + Has an upload protocol that, once completed, allows the uploader to disappear 21 | (§2.3.2) 22 | + Network self-repairs content through pull syncing (§2.3.3). 23 | + Nodes can play different roles depending on their capabilities, e.g. light 24 | node, forwarding node, caching node (§2.3.4). 25 | + Has a pricing protocol (§3.1.2) 26 | + Uses micro payments (§3.2) 27 | + Allows for zero cash entry (§3.2.5), which benefits decentralization 28 | + Uses staking/collateral, spot-checks and litigation to insure long term 29 | storage. (§3.3.4, §5.3) 30 | + The Merkle tree for chunking a file enables random access, and resumption of 31 | uploads (§4.1.1) 32 | + Manifests allow for collections of files and their paths (§4.1.2) 33 | + Combines erasure coding with a Merkle tree in a smart way (§5.1.3) 34 | + Redundancy is used to improve latency (§5.1.3) 35 | 36 | #### Cons: 37 | 38 | - Use of two peer-to-peer networks (underlay and overlay) seems overly complex 39 | (§2.1) 40 | - Tries to solve many problems that can be addressed by other protocols, such as 41 | routing privacy, micro payments and messaging. 42 | - Storage nodes and peers are chosen based on their mathematical proximity, 43 | instead of taking performance and risk into account (§2.1.3) 44 | - Uses a forwarding Kademlia DHT (§2.1.3) for routing, which requires stable, 45 | long lived network connections 46 | - Depends heavily on forwarding of messages; each message passes through a list of 47 | peers that could be on opposite sides of the world. (§2.1.3) 48 | - Tries to solve routing privacy (§2.1.3), which could arguably be better 49 | addressed by a separate protocol such as onion routing. 50 | - Because of the use of an overlay DHT network, Swarm has to solve the 51 | bootstrapping problem, even though libp2p already solves this (§2.1.4). 52 | - A Swarm node needs to maintain three different DHTs; one for the underlay 53 | network (libp2p), another for routing (forwarding Kademlia), and a third for 54 | storage (DISC). 55 | - Solves the problem of changing content in a content-addressable system in two 56 | different ways: through single-owner chunks (§2.2.3), and through ENS (§4.1.3) 57 | - Garbage collection based on chunk value makes it hard to reason about the 58 | amount of money that is required to keep content on the network (§2.2.5) 59 | - Besides all the various incentives, Swarm also has a reputation system in the 60 | form of a deny list (§2.2.7). 61 | - Incentive system is complex, and therefore harder to verify. (§3) 62 | - Has its own implementation of micro payments, instead of using existing 63 | payment channels (§3.2.1) 64 | - Rewarding nodes for upload receipts leads to a store-and-forget attack that 65 | requires tricky mitigation (§3.3.4) 66 | - Extra complexity (trojan chunks, feeds) is added because Swarm is also a fully 67 | fledged communication system (§4).
68 | - Offers pinning of content, even though it is inferior to using incentives 69 | (§5.2.2) 70 | - Recovery is built on top of pinning and messaging (trojan chunks) (§5.2.3) 71 | -------------------------------------------------------------------------------- /evaluations/zeroknowledge.md: -------------------------------------------------------------------------------- 1 | Zero Knowledge Proofs 2 | ===================== 3 | 4 | Zero knowledge proofs allow a verifier to check that a prover knows a value, 5 | without revealing that value. 6 | 7 | Types 8 | ----- 9 | 10 | Several types of non-interactive zero knowledge schemes exist. The most 11 | well-known are zkSNARK and zkSTARK, which [come in several flavours][8]. 12 | Interestingly, the most performant is the somewhat older Groth16 scheme, with 13 | very small proof size and verification time. Its downside is the requirement for 14 | a trusted setup, and its [malleability][9]. Performing a trusted setup has 15 | become easier through the [Perpetual Powers of Tau Ceremony][10]. 16 | 17 | A lesser-known type of zero knowledge scheme is [MPC-in-the-head][11]. This lets 18 | a prover simulate a secure multiparty computation on a single computer, and uses 19 | the communication between the simulated parties as proof. The [ZKBoo][13] scheme 20 | for instance allows for fast generation and verification of proofs, but does not 21 | lead to smaller proofs than zkSNARKs can provide. 22 | 23 | Tooling 24 | ------- 25 | 26 | [Zokrates][1] is a complete toolbox for specifying, generating, and verifying 27 | zkSNARK proofs. It's written in Rust, has Javascript bindings, and can generate 28 | Solidity code for verification. C bindings appear to be absent. 29 | 30 | [libSNARK][2] and [libSTARK][3] are C++ libraries for zkSNARK and zkSTARK 31 | proofs. libSNARK can be used as a backend for Zokrates. 32 | 33 | [bellman][4] is a Rust library for zkSNARK proofs. It can also be used as a 34 | backend for Zokrates. 35 | 36 | Iden3 created a suite of tools ([circom][5], [snarkjs][6], [rapidsnark][7]) for 37 | zkSNARKs (Groth16 and PLONK). It is mostly Javascript, except for rapidsnark 38 | which is written in C++. 39 | 40 | Nim tooling seems to be mostly absent. 41 | 42 | Ethereum 43 | -------- 44 | 45 | Ethereum has pre-compiled contracts [BN_ADD, BN_MUL and SNARKV][12] that reduce 46 | the gas costs of zkSNARK verification. These are used by the Solidity code that 47 | Zokrates produces. 48 | 49 | [1]: https://zokrates.github.io 50 | [2]: https://github.com/scipr-lab/libsnark 51 | [3]: https://github.com/elibensasson/libSTARK 52 | [4]: https://github.com/zkcrypto/bellman/ 53 | [5]: https://github.com/iden3/circom 54 | [6]: https://github.com/iden3/snarkjs 55 | [7]: https://github.com/iden3/rapidsnark 56 | [8]: https://medium.com/coinmonks/comparing-general-purpose-zk-snarks-51ce124c60bd 57 | [9]: https://zokrates.github.io/toolbox/proving_schemes.html#g16-malleability 58 | [10]: https://medium.com/coinmonks/announcing-the-perpetual-powers-of-tau-ceremony-to-benefit-all-zk-snark-projects-c3da86af8377 59 | [11]: https://yewtu.be/V8acfV8LJog 60 | [12]: https://coders-errand.com/ethereum-support-for-zk-snarks/ 61 | [13]: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/giacomelli -------------------------------------------------------------------------------- /incentives-rationale.md: -------------------------------------------------------------------------------- 1 | # Incentives Rationale 2 | 3 | Why incentives?
In order to have a sustainable p2p storage network that can be used to store arbitrary data and avoids specializing around a certain type of content, economic incentives or payments are required. 4 | ## Incentives in p2p networks 5 | 6 | BitTorrent and friends tend to specialize around **popular content** such as movies or music (citation). Empirical evidence suggests that this is a consequence of how incentives are aligned in these types of networks (citation). Without diving too deep, the BitTorrent incentive model is composed of three major elements: 7 | 8 | - A "tit-for-tat" (or some variation) accounting system 9 | - A "seeding ratio", which signals the peer's reputation 10 | - The content being shared, which becomes the commodity traded in the network 11 | 12 | In other words, you trade "content" in a "tit-for-tat" fashion, which increases the "seeding ratio", which gives access to more "content". This model sounds attractive and fair at first; however, it has been shown to have major flaws. 13 | 14 | - It leads to network specialization where only certain types of content are available 15 | - Peers want to maximize their "seeding ratio", which leads to sharing content that is in high demand (popular), which often tends to be the latest movie, TV show or music album. 16 | - Rare or "uninteresting" content is almost impossible to come by unless explicitly provided by a party such as specialized communities or private trackers (which usually implies payments, i.e. external incentives). 17 | - File availability becomes dependent on the content's popularity and age (in the network). Its availability declines over time. 18 | - Anecdotally, the current season of a TV show is often easier to come by than the previous season; this is because once the content has been downloaded (and consumed), there is little reason to continue sharing it for a longer period of time. 19 | 20 | There are also operational costs associated with running a node. These costs grow (at the very least linearly) in proportion to the number of users being served. Running a highly available node that serves thousands of other nodes a day is probably unfeasible for the vast majority of casual users, and building a business around this type of protocol has unclear economics and quite often legal consequences, due to the already mentioned network specialization towards sharing illegal or "pirated" content. 21 | 22 | In short, direct consequences of this incentive model are **network specialization** and poor **content/data availability**. 23 | 24 | In contrast to this, there is a different type of p2p network where this problem is not observed. Blockchains, which are in some sense data sharing networks, are an example of such networks. The reason for this is that there is a minimum amount of data required for a blockchain node to operate and thus all nodes have the incentive to download that data - we can call it **essential** data. Since this data is **essential** to operate a node, it's guaranteed to always be available - **this is a premise of the network**; non-essential data however, such as the chain's full history, is harder to come by and is usually subsidized by third party organizations.
25 | 26 | It follows from the above examples that there are __at least__ two types of p2p storage networks: 27 | 28 | - One where the data is intrinsic to the protocol, in other words the protocol and the data are inseparable, which is the case for blockchains 29 | - Another, where the data is extrinsic and the protocol does not rely on it in order to operate 30 | 31 | ## General purpose p2p storage networks 32 | 33 | In general, when compared to centralized alternatives, p2p networks have many desirable properties such as: 34 | 35 | - Censorship resistance 36 | - Robustness, in the face of large scale failures 37 | - Excellent scaling potential 38 | - This is usually a matter of having more nodes joining the network 39 | 40 | In short, p2p networks have advantages over centralized counterparts, and yet we haven't seen wider adoption outside of a few niche use cases already outlined above. In our opinion, this is due to a lack of sufficient guarantees in networks with extrinsic data. 41 | 42 | One important property of data is that once data is gone and no backups exist, the chances of recovery are very slim. Contrast this with computation: if the data is intact, recovering from failed computation usually implies simply re-running it with the same input. In other words, when it comes to data, integrity and availability are more important than any other aspect. 43 | 44 | It's this project's aim to provide a solution to the outlined issues of **data availability** in networks with extrinsic data. 45 | 46 | It's worth further breaking down **data availability** into two separate problems: 47 | 48 | 1. Retrievability - the ability to retrieve/download data from the network at any time 49 | 2. Persistence - the guarantee that data is persisted by the network over a predetermined time frame 50 | 51 | ## What should be incentivized then? 52 | 53 | In our opinion, anything that is a finite resource. In p2p storage networks this is largely bandwidth and hard drive space. 54 | 55 | ### Bandwidth incentives 56 | 57 | Bandwidth is a finite resource and has an associated cost. Eventually this cost is going to compound, and serving the network will become unreasonable and unsustainable. This leads to low node participation and data retrievability issues, as peers will choose to only serve select nodes or none at all. In many cases, this leads nodes to temporarily leave the network even when the file is still sitting on their hard drives. 58 | 59 | There are several fundamental problems that bandwidth incentives solve. 60 | 61 | - Increase the chance of data being "retrievable" from the network 62 | - Properly compensate "seeders" for the resources consumed by "leechers" 63 | - This ensures higher levels of node participation 64 | - Serves as a Sybil attack prevention mechanism in an open network 65 | 66 | With incentivized bandwidth, rational nodes driven by market dynamics should seek to maximize profits by sharing data that is in high demand, thus offsetting operational costs and scaling the network up or down. This would give the network properties similar to a CDN that caches content to prevent overwhelming the origin with requests. 67 | 68 | ### Storage incentives 69 | 70 | Storage, also being a finite resource, has associated costs. Storing data on your own hard drive for no reason is irrational, and it's safe to assume that an overwhelming majority of the network's participants won't do that. This leads the network to specialize around certain types of __popular__ content.
In order to offset that trend, storing data needs to be incentivized.
71 | 
72 | The fundamental issue that storage incentives solve is **data availability**, and more precisely the issue of data **persistence** over a predetermined time frame.
73 | 
74 | Enabling persistence opens up many common use cases such as data backups, becoming a storage provider for traditional web and web3 applications, and many others. In short, it replaces centralized cloud storage providers. Due to the wide range of use cases, the issue of specialization also goes away.
75 | 
76 | It's worth noting that we make no claims that the network is not going to be used to store and distribute "pirated" content; we merely claim that by realigning incentives we'll enable other, more common use cases.
77 | 
78 | Together, these incentives lead to a sustainable and censorship-resistant p2p network. You negotiate a price for certain content to be stored long-term. Should the content become unavailable (due to censorship or a generic failure) after the contract is negotiated, then the peer that stores the content is punished. When content is popular, it will spread because more peers want to earn bandwidth fees, which allows the network to scale to high demand, acting as a censorship-resistant CDN.
79 | 
80 | ## Zero-cash entry
81 | 
82 | Zero-cash entry entails that you can enter the network without having any funds. When all interactions in the network have a price, it becomes a problem to start participating unless the node is funded. The way to work around this problem is to initially become a provider of services; for example, a node can start persisting chunks for some amount of time (minutes or hours), and thus earn some initial capital, after which it can start to freely exchange data.
83 | 
84 | Another possibility would be for businesses that have storage requirements to subsidize new users with a seed amount. For example, a chat application can seed a small amount to a newly signed-up client, which will help it get started in the network. Once the client participates in the network, it will start earning bandwidth fees, which, if correctly balanced, means that a casual user can participate almost for free.
85 | 
86 | 
87 | ## Design philosophy
88 | 
89 | When choosing and designing our incentive protocols, we favor practical and provable protocols that maximally reduce the risks for participants. We favor solutions that are easy to separate and upgrade over those that are tightly coupled with the rest of the network design.
90 | 
-------------------------------------------------------------------------------- /meetings/bogota2022.md: --------------------------------------------------------------------------------
1 | Agenda Bogota Meetup
2 | --------------------
3 | 
4 | Draft agenda, feel free to add/modify as you see fit.
5 | 
6 | ### Suggested Topics ###
7 | 
8 | - Session on new ZK based proving scheme
9 | - Retrospective (evaluate how we're doing and how we can improve as a team)
10 | - Hacking sessions (hack on Codex together)
11 | - Discussion about slot collateral: global collateral or per slot?
12 | - Node interactions: how do we envision executing transactions in different states?
13 |   - dapp / http server embedded in the binary?
14 |   - Perhaps we can use Hester's designs as a guide?
15 | - Project management
16 |   - Tasks/issues cleanup
17 |   - Discuss what's working, what's not, and what we can improve
18 | - Present high-level architecture, make a recording of it for future reference
19 | - Codex use cases
20 | - Codex credits
21 | - Codex differentiators
22 | 
-------------------------------------------------------------------------------- /papers/Compact_Proofs_of_Retrievability/README.md: --------------------------------------------------------------------------------
1 | # Compact Proofs of Retrievability
2 | 
3 | ## Authors
4 | 
5 | - Hovav Shacham - hovav@cs.ucsd.edu
6 | - Brent Waters - bwaters@cs.utexas.edu
7 | 
8 | ### DOI
9 | 
10 | - http://dx.doi.org/10.1007/978-3-540-89255-7_7
11 | 
12 | ## Summary
13 | 
14 | The paper introduces a remote storage auditing scheme known as Proofs of Retrievability, based on work derived from `Pors: proofs of retrievability for large files` by Juels and Kaliski and `Provable data possession at untrusted stores` by Ateniese et al.
15 | 
16 | It takes the idea of homomorphic authenticators from Ateniese and the idea of using erasure coding from Juels and Kaliski to strengthen the remote auditing scheme. To our knowledge, this is also the first work to provide rigorous mathematical proofs for this type of remote auditing.
17 | 
18 | The paper introduces two types of schemes - public and private. In the private setting, the scheme requires possession of a private key to perform verification, but lowers both the storage and network overhead. In the public setting, only the public key is required for verification, but the storage and network overhead are greater than those of the private one.
19 | 
20 | ### Main ideas
21 | 
22 | - Given a file `F`, erasure code it into `F'`
23 | - Split the file into blocks and sectors
24 | - Generate cryptographic authenticators for each block
25 | - During verification
26 |   - The verifier emits a challenge containing random indexes of blocks to be verified, alongside random values used as multipliers to compute the proof
27 |   - The prover takes the challenge and, using both the indexes and the random multipliers, produces an unforgeable proof. The proof consists of the aggregated data and tags for the indexes in the challenge
28 |   - Upon receipt, the verifier is able to check that the proof was generated using the original data
29 | 
30 | ### Observations
31 | 
32 | - Both the original data and the tags are employed in the generation and verification of the proof, which prevents pre-image attacks that other schemes are susceptible to.
33 | - It potentially achieves a level of compression where at most one block's worth of data and one cryptographic tag ever need to be sent across the network.
34 | - The erasure coding solves several concurrent issues. With an MDS erasure code and a coding ratio of 1 (K=M):
35 |   - It is only necessary to prove that 50% of all the blocks are in place, which lowers the amount of data that needs to be sampled, making it constant for datasets of any size
36 |   - Having to verify that only K blocks are still available also protects against adaptive adversaries. For example, if the data is stored on 3 drives and each drive keeps going offline between samples, the odds reset between each sampling round. To protect against such an adversarial scenario without erasure coding, it would be necessary to sample 100% of the entire file at each round; with erasure coding, since **any** K blocks are sufficient to reconstruct, the odds do not reset across sampling rounds
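
To make the challenge-response flow concrete, below is a toy sketch of the paper's private-verification idea, collapsing each block to a single field element (the paper splits blocks into sectors) and standing in an HMAC for the PRF; the field, key sizes and challenge size are illustrative, not the paper's actual parameters:

```python
import hashlib, hmac, secrets

P = 2**127 - 1  # prime field, standing in for the scheme's group

def prf(key: bytes, i: int) -> int:
    # pseudo-random value bound to block index i
    return int.from_bytes(hmac.new(key, i.to_bytes(8, 'big'),
                                   hashlib.sha256).digest(), 'big') % P

def tag_blocks(blocks, alpha, key):
    # homomorphic authenticator per block: sigma_i = PRF_key(i) + alpha * m_i
    return [(prf(key, i) + alpha * m) % P for i, m in enumerate(blocks)]

def prove(blocks, tags, challenge):
    # challenge: (index, random multiplier) pairs chosen by the verifier
    mu    = sum(nu * blocks[i] for i, nu in challenge) % P
    sigma = sum(nu * tags[i]   for i, nu in challenge) % P
    return mu, sigma  # constant size, regardless of file size

def verify(mu, sigma, challenge, alpha, key):
    # holds iff sigma aggregates genuine tags over the genuine data
    return sigma == (sum(nu * prf(key, i) for i, nu in challenge)
                     + alpha * mu) % P

blocks = [secrets.randbelow(P) for _ in range(100)]   # erasure-coded file F'
alpha, key = secrets.randbelow(P), secrets.token_bytes(32)
tags = tag_blocks(blocks, alpha, key)
challenge = [(secrets.randbelow(100), secrets.randbelow(P)) for _ in range(5)]
assert verify(*prove(blocks, tags, challenge), challenge, alpha, key)
```

The aggregated pair `(mu, sigma)` is what keeps the proof at roughly one block's worth of data plus one tag, as noted in the observations above.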
37 | 
38 | ### Other ideas
39 | 
40 | - Another important aspect presented in the paper is an `extractor function`. The idea is that, given an adversary that is producing proofs but not releasing the data upon request, it would still be possible to eventually extract enough data to reconstruct the entirety of the dataset; this would require extracting an amount of data equivalent to K blocks.
41 | 
-------------------------------------------------------------------------------- /papers/Economics_of_BitTorrent_communities/README.md: --------------------------------------------------------------------------------
1 | # Economics of BitTorrent communities
2 | 
3 | ## Authors
4 | 
5 | - Ian A. Kash - iankash@microsoft.com
6 | - John K. Lai - jklai@seas.harvard.edu
7 | - Haoqi Zhang - hq@eecs.harvard.edu
8 | - Aviv Zohar - avivz@microsoft.com
9 | 
10 | ### DOI
11 | 
12 | - https://doi.org/10.1145/2187836.2187867
13 | 
14 | ## Summary
15 | 
16 | The paper is a study of a BitTorrent community called DIME, where users share live concert recordings. The community has around 100K users and the study analyses data gathered over 6 months.
17 | 
18 | ### Main ideas
19 | 
20 | * The DIME system enforces a ratio of at least 0.25: 4 downloads for 1 upload
21 | * Many users have a ratio above 1 (which shows an altruistic behaviour)
22 | * New files are more attractive to users and are in high demand at the beginning
23 | * Users with high-bandwidth Internet connections take advantage of new files to earn credits
24 | * Old files are not useful for gaining credit because they are not in high demand
25 | * There are periods where downloads are free
26 | * Users prefer to download old files during free periods
27 | 
28 | ### Observations
29 | 
30 | * The paper does not give any numbers about the total amount of data available
31 | * The paper does not provide data about the file size distribution
32 | * Overall, the paper provides interesting data about how sharing communities behave, but no data about the decentralized storage itself.
33 | 
34 | ### Other ideas
35 | 
36 | * Some aspects of the demand for files over their lifetime could be applied to other decentralized storage systems
37 | 
38 | 
-------------------------------------------------------------------------------- /papers/Falcon_Codes_Fast_Authenticated_LT_Codes/README.md: --------------------------------------------------------------------------------
1 | # Falcon Codes
2 | ## Authors
3 | 
4 | - Ari Juels
5 | - James Kelley
6 | - Roberto Tamassia
7 | - Nikos Triandopoulos
8 | 
9 | ## DOI
10 | 
11 | https://doi.org/10.1145/2810103.2813728
12 | 
13 | ## Bibliography entry
14 | 
15 | Juels, Ari, James Kelley, Roberto Tamassia, and Nikos Triandopoulos. ‘Falcon Codes: Fast, Authenticated LT Codes (Or: Making Rapid Tornadoes Unstoppable)’. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1032–47. CCS ’15. New York, NY, USA: Association for Computing Machinery, 2015. https://doi.org/10.1145/2810103.2813728.
16 | 
17 | ## Summary
18 | 
19 | The paper addresses the problem of **adversarial erasures** in the case of **non-MDS codes**, in a **private coding setting**.
20 | LT-codes, and their derivatives (RaptorQ, etc.)
are known to provide fast (even linear-time) encoding and decoding, both asymptotically and in practice, and are useful both as large block codes and as rateless codes. However, their guarantees hold only with high probability (w.h.p.), while the minimum code distance can be small in practice. This means that adversarial erasure patterns exist that can eliminate the advantages of an otherwise strong redundancy. Falcon codes aim to solve this by hiding the coding pattern. Note that this hiding can only work in a private setting, where there is a shared secret between encoder and decoder.
21 | 
22 | ### Main ideas
23 | 
24 | The main idea is to:
25 | - Take an LT encoder, which already uses an RNG to pick from a random degree distribution when generating the bipartite coding graph.
26 | - Employ a PRG parametrised by a secret to make the random coding graph secret.
27 | - Encoding now uses a secret graph, but since encoding is done using XOR, it would be easy to infer the graph by observing segments. Protect against this by adding a layer of encryption over segments.
28 | - Optionally add a MAC to convert corruptions to erasures.
29 | 
30 | ### Other ideas
31 | 
32 | Other ideas in the paper include:
33 | - Reduce MAC overhead: batching MACs amplifies error but reduces overhead.
34 | - Scalability (FalconS): the original Falcon needs access to all segments. Change this by applying Falcon to `b` blocks separately. This improves encoder locality but reintroduces adversarial erasures. Thus, apply a random permutation over all parity symbols across all blocks to avoid the adversarial erasures.
35 | - Rateless (FalconR): split the original into `b` blocks, and set up a different Falcon for each, but do not encode yet. Then, generate the next parity symbol with one of the `b` Falcon encoders, randomly selecting which one to use.
36 | 
37 | There is also a whole section dedicated to the use of Falcon in PoR… this needs further study.
38 | 
39 | 
-------------------------------------------------------------------------------- /papers/Filecoin_A_Decentralized_Storage_Network/README.md: --------------------------------------------------------------------------------
1 | # Filecoin: A Decentralized Storage Network
2 | 
3 | ## Authors
4 | 
5 | - Juan Benet - juan.benet.ai
6 | 
7 | ### DOI
8 | 
9 | - 
10 | 
11 | ## Summary
12 | 
13 | This paper describes the mechanisms behind the decentralized storage network called Filecoin.
14 | 
15 | ### Main ideas
16 | 
17 | **DSN: Decentralized Storage Network** (no trusted parties)
18 | A DSN must guarantee:
19 | * Data integrity (data retrieved is the same as data stored)
20 | * Data retrievability (clients can eventually retrieve the data)
21 | * Management fault tolerance (management nodes might fail)
22 | * Storage fault tolerance (storage nodes might fail)
23 | 
24 | Other properties:
25 | * Publicly verifiable
26 | * Auditable
27 | * Incentive-compatible
28 | 
29 | **Proof of Storage**: Provable Data Possession and Proof of Retrievability.
30 | 
31 | * Sybil attacks: multiple fake identities storing only 1 copy of the data.
32 | * Outsourcing attacks: quickly requesting the block from another storage node.
33 | * Generation attacks: regenerating the data on the fly when possible.
34 | 
35 | **PoRep**: Proof of Replication, not to be confused with Proof of Retrievability.
36 | Note that Filecoin does not support erasure codes, only trivial replication.
37 | 
38 | **Proof of spacetime**: Repeat PoRep over time.
39 | 
40 | The Seal operation permutes the replica, so that proofs only work for that specific replica; therefore, storing n replicas implies allocating n times the size of the dataset.
41 | 
42 | The PoRep setup process needs to be 10-100 times more time-consuming than the proof; otherwise, the setup, request and proof can be generated on the fly.
43 | 
44 | Clients pay to store data and also to retrieve it. Retrieval Miners can be the same as Storage Miners, or they can simply request the data from storage nodes and send it to the client, keeping some data in cache. This is a kind of caching mechanism. The benefit of only being a Retrieval Miner is that you are not responsible for storage: you don't lose money if you lose data.
45 | 
46 | Achieving retrievability: the Put operation specifies an (f,m)-tolerance, meaning m storage miners store the data and a maximum of f faults can be tolerated.
47 | 
48 | The marketplace is off-chain. Data is sent in mini-blocks, accompanied by micro-payments for each mini-block.
49 | 
50 | Power fault tolerance: N is the total "power" of the network and f is the part of the power controlled by adversarial nodes.
51 | 
52 | More storage in use = more power on the network and a higher probability of being elected to create blocks.
53 | 
54 | ### Observations
55 | 
56 | (Copy-paste from Mark's evaluation)
57 | 
58 | **Pros**:
59 | 
60 | * Clients do not need to actively monitor hosts. Once a deal has been agreed upon, the network checks proofs of storage.
61 | * The network actively tries to repair storage faults by introducing new orders in the storage market (§4.3.4).
62 | * Integrity is achieved because files are addressed using their content hash (§4.4).
63 | * Marketplaces are explicitly designed and specified (§5).
64 | * Micropayments via payment channels (§5.3.1).
65 | * Integration with other blockchain systems such as Ethereum (§7.2) is being worked on.
66 | 
67 | **Cons**:
68 | 
69 | * Filecoin requires its own very specific blockchain, which influences a lot of its design. There is tight coupling between the blockchain, storage accounting, proofs and markets.
70 | * Proof of spacetime is much more complex than simple challenges, and is only required to make the blockchain work (§3.3, §6.2)
71 | * A miner's influence is proportional to the amount of storage they're providing (§1.2), which is an incentive to become big. This could lead to the same centralization issues that plague Bitcoin.
72 | * Incentives are geared towards making the particulars of the Filecoin design work, instead of being directly aligned with users' interests. For instance, there are incentives for storage and retrieval, but it seems that a miner would be able to make money by only storing data, and never offering it for retrieval. Also, the incentive for a miner to store multiple independent copies does not mean protection against loss if they're all located on the same failing disk.
73 | * The blockchain contains a complete allocation table of all things that are stored in the network (§4.2), which raises questions about scalability.
74 | * Zero-cash entry (such as in Swarm) doesn't seem possible.
75 | * Consecutive micropayments are presented as a solution for the trust problems while retrieving (§5.3.1), which doesn't entirely mitigate withholding attacks.
76 | * The addition of smart contracts (§7.1) feels like an unnecessary complication.
77 | 
78 | ### Other ideas and comments
79 | 
80 | Nice Figure 1 showing the state machine for each component.
81 | 
82 | Figure 2 shows a nice workflow. A good way to explain the system.
83 | 
84 | It is not clear what the optimal relationship between m and f is.
85 | 
86 | The parameters f and m are never shown in the PUT description in Figure 7.
87 | 
88 | 
89 | 
-------------------------------------------------------------------------------- /papers/Peer-to-Peer_Storage_System_a_Practical_Guideline_to_be_lazy/README.md: --------------------------------------------------------------------------------
1 | # Peer-to-Peer Storage Systems: a Practical Guideline to be Lazy
2 | 
3 | ## Authors
4 | 
5 | - Frederic Giroire - frederic.giroire@sophia.inria.fr
6 | - Julian Monteiro - julian.monteiro@sophia.inria.fr
7 | - Stephane Perennes - stephane.perennes@sophia.inria.fr
8 | 
9 | ### DOI
10 | 
11 | - https://doi.org/10/c47cmb
12 | 
13 | ## Summary
14 | 
15 | The paper presents the different trade-offs involved in implementing erasure code reconstruction after failures. The paper focuses on Reed-Solomon encoding and analyses the number of encoding blocks (called fragments in the paper), the number of parity blocks, as well as the minimum number of redundant blocks before triggering block repairs. The authors propose a Markov chain model and look at the impact of these parameters on network bandwidth as well as block loss rate.
16 | 
17 | ### Main ideas
18 | 
19 | * Every reconstruction implies data traffic over the network
20 | * If we reconstruct after every single block loss (eager repair), we consume too much bandwidth
21 | * If we wait for several blocks to be lost (lazy repair), we can reconstruct the missing blocks all at once and save bandwidth
22 | * If we wait too long before reconstruction, data might be lost if multiple erasures occur simultaneously
23 | * A model can help us understand the impact of the parameters s, r and r0 on bandwidth and loss rate
24 | * The distribution of block redundancy is a bit counter-intuitive
25 | 
26 | 
27 | ### Observations
28 | 
29 | * It is assumed that the block reconstruction process is much faster than the peer failure rate
30 | * In P2P networks, failures are considered independent and memory-less, with failure rate a = 1/MTTF
31 | * The probability for a peer to be alive after T time steps is P_a = (1 − a)^T
32 | * They work on the Galois field GF(2^8), which leads to the practical limitation s + r ≤ 256
33 | * The failure rate (1 year for 1 disk) is very conservative
34 | * Overall, the model is elegant and the results are very clear and interesting
35 | 
36 | ### Other ideas
37 | 
38 | * The stretch factor is computed as (k + m)/k
39 | * Block sizes can be chosen depending on the main purpose of the storage system
40 |   * Archival mode: good with large blocks, because there are few reads
41 |   * Filesystem mode: good with small blocks, because it allows easy reads and edits
42 | 
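The paper's key quantities are simple enough to sanity-check numerically; here is a quick back-of-the-envelope sketch using the paper's s (data fragments) and r (parity fragments) — the stretch factor (k + m)/k above is the same quantity in k/m notation — with all concrete numbers purely illustrative:

```python
from math import comb

def peer_alive(a: float, T: int) -> float:
    # P_a = (1 - a)^T: probability that a peer is still alive after T
    # time steps, with memoryless per-step failure rate a = 1/MTTF
    return (1 - a) ** T

def stretch(s: int, r: int) -> float:
    # storage overhead of an (s, r) Reed-Solomon code
    return (s + r) / s

def block_loss_prob(s: int, r: int, p: float) -> float:
    # a block is lost when more than r of its s + r fragments are gone;
    # fragments are assumed lost independently with probability p
    n = s + r
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(r + 1, n + 1))

# s = 16, r = 16 gives a stretch factor of 2.0, and with 10% of the
# fragments lost between repairs the block loss probability is tiny
print(stretch(16, 16), block_loss_prob(16, 16, 0.10))
```

Lazy repair lives in the gap between these two numbers: waiting longer saves repair bandwidth, while `block_loss_prob` bounds how long it's safe to wait.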
-------------------------------------------------------------------------------- /papers/README.md: --------------------------------------------------------------------------------
1 | # Paper Summaries
2 | 
3 | > This directory contains academic paper summaries explored as part of the Codex project research. It is structured as a list of links to a document containing a quick summary and observations extracted from the paper. The summaries aren't meant to be exhaustive and cover all aspects of the paper, but rather serve as a quick refresher and a record of the papers already evaluated.
4 | 
5 | ## Index
6 | 
7 | - [Compact Proofs of Retrievability](./Compact_Proofs_of_Retrievability/README.md)
8 | - [Filecoin A Decentralized Storage Network](./Filecoin_A_Decentralized_Storage_Network/README.md)
9 | - [Economics of BitTorrent communities](./Economics_of_BitTorrent_communities/README.md)
10 | - [Peer-to-Peer Storage Systems: a Practical Guideline to be Lazy](./Peer-to-Peer_Storage_System_a_Practical_Guideline_to_be_lazy/README.md)
11 | 
12 | ## Writing Summaries
13 | 
14 | A summary should contain a brief overview of the core ideas presented in the paper, along with observations and notes.
15 | 
16 | ## Template
17 | 
18 | A [template](template.md) is provided that outlines a few sections:
19 | 
20 | - Title - the title of the paper
21 | - Authors - the authors of the paper
22 | - DOI - the digital object identifier of the paper
23 | - Links - an optional section with links to the paper and other relevant material, such as source code, simulations, etc... If the paper is uploaded to the repo, it should be linked here as well.
24 | - Summary - a quick summary capturing the main ideas proposed by the paper
25 | - Main ideas - an optional list of bullet points describing the main ideas of the paper in more detail
26 | - Observations - an optional list of bullet points with observations, if any
27 | - Other ideas - an optional list of bullet points with additional observations
28 | 
29 | ## Directory Structure
30 | 
31 | Each evaluation should go into its own directory, named after the paper being evaluated. It should contain a `README.md` with the actual evaluation and additional supporting material, such as the paper itself (if available) or relevant code samples (if provided). For example, the `Shacham and Waters - Compact Proofs of Retrievability` directory structure would look something like this:
32 | 
33 | ```
34 | ├── Compact\ Proofs\ of\ Retrievability
35 | │   ├── README.md
36 | │   └── Compact\ Proofs\ of\ Retrievability.pdf
37 | ```
38 | 
-------------------------------------------------------------------------------- /papers/Sui/sui.md: --------------------------------------------------------------------------------
1 | The Sui Smart Contracts Platform
2 | ================================
3 | 
4 | * [Sui Whitepaper](https://github.com/MystenLabs/sui/blob/main/doc/paper/sui.pdf)
5 | * [Sui Tokenomics paper](https://github.com/MystenLabs/sui/blob/main/doc/paper/tokenomics.pdf), May 2022
6 | 
7 | 
8 | Sui is an alternative to blockchains that is geared towards high performance. It
9 | utilizes a UTXO and DAG based design that allows for parallelization. It uses a
10 | delegated proof-of-stake model to keep the number of validators low while
11 | keeping the design permissionless. It uses a storage fund to pay for persistent
12 | storage.
13 | 
14 | Main ideas
15 | ----------
16 | 
17 | ### Consensus ###
18 | 
19 | Transactions require approval from 2/3 of the validators (as measured by
20 | delegated stake). Sui uses the minimum amount of consensus that is required for
21 | a given transaction:
22 | 
23 | * For transactions on owned objects (controlled by a key) it uses byzantine
24 | consistent broadcast. (Whitepaper §4.3, "Sign once, and safety")
25 | * For transactions on shared objects (modifiable by anyone) it uses a byzantine
26 | agreement protocol only on the *order* of conflicting transactions. Execution
27 | of the transaction happens after the order has been determined.
(Whitepaper
28 | §4.4, "Shared Objects" and §5, "Throughput")
29 | 
30 | Transactions on owned objects require 2 round trips to a quorum for byzantine
31 | broadcast. Transactions on shared objects require 4-8 round trips to a quorum
32 | for byzantine agreement. (Whitepaper §5, "Latency")
33 | 
34 | ### Parallelism ###
35 | 
36 | Sui uses the
37 | [Move](https://github.com/MystenLabs/sui/blob/main/doc/paper/sui.pdf) language
38 | for programming smart contracts. Unlike the EVM languages, this language is
39 | geared towards the inherent parallelism that is afforded by the UTXO and DAG
40 | design. The language is not unique to Sui; it is used in other projects as well.
41 | (Whitepaper §2)
42 | 
43 | ### Storage ###
44 | Persistent storage is paid for using a storage fund, whereby the storage fees
45 | are collected, and the proceeds of investing (staking) this fund are used to pay
46 | for future storage costs (Tokenomics §3.3, §5). Fees are rebated when deleting
47 | data from storage (Tokenomics §5.1). This is designed in such a way that the
48 | opportunity cost of locking up tokens is equal to the fees one would otherwise
49 | pay for storage (Tokenomics §6.2).
50 | 
51 | ### Proof of stake ###
52 | 
53 | Avoids the "rich-get-richer" forces of other proof-of-stake implementations,
54 | specifically to ensure that validators enjoy viable business models regardless
55 | of their delegated stake. Random selection is avoided, opting instead for a
56 | model where everyone is rewarded according to their stake. (Tokenomics §3.2,
57 | §6.3)
58 | 
59 | Gas fees are paid out to both validators and the people that delegated their
60 | stake to the validators. This is an extra incentive for delegators to keep an
61 | eye on their chosen validator, and to move stake when the validator is not
62 | behaving well. (Whitepaper §4.7, "Rewards and cryptoeconomics")
63 | 
64 | ### Epochs ###
65 | Keeps several parameters of the network constant during epochs, such as the
66 | stake of the validators and (nominal) gas prices. Uses checkpointing to compress
67 | state and allow for committee changes on epoch boundaries. (Whitepaper §4.7)
68 | 
69 | Promotes predictable gas prices by having validators indicate a gas price
70 | upfront, and diminishing their rewards if they do not honour this upfront gas
71 | price. (Tokenomics §4.1, §4.3)
72 | 
73 | Observations
74 | ------------
75 | 
76 | ### Contention ###
77 | 
78 | The Sui design nicely sidesteps contention issues with shared mutable UTXO state
79 | (e.g. such as in Cardano) by performing the byzantine agreement protocol on the
80 | order of the transactions, not on their execution. Therefore, there is no need to
81 | resubmit a transaction when someone else used the object/UTXO that you intended
82 | to use. To enable this, it uses references to objects/UTXOs without their serial
83 | number for mutable state. (Whitepaper §4.4, "Shared objects")
84 | 
85 | ### Recovery ###
86 | 
87 | It also nicely solves an issue with byzantine consistent broadcast (e.g. as in
88 | ABC) whereby funds are locked forever when conflicting transactions are posted.
89 | Conflicting transactions are cleared on every epoch boundary. (Whitepaper §4.3,
90 | "Disaster recovery" and §4.7, "Recovery")
91 | 
92 | ### Inert stake ###
93 | 
94 | Sui mitigates the problem whereby an increasing amount of delegated stake can no
95 | longer be re-assigned because the associated keys are lost (as could happen in
96 | e.g. ABC).
Stake delegation is an explicit action, instead of an implicit
97 | side-effect of every transaction. The staking logic is implemented in a smart
98 | contract, which allows the network to update the logic to deal with such issues.
99 | 
100 | ### Validator reputation ###
101 | 
102 | Stake rewards are influenced by the (subjective) reputation that validators
103 | report about other validators. It is unclear how far this can be gamed by
104 | validators wishing to increase their rewards. (Tokenomics §4.1.2, §4.1.3)
105 | 
106 | ### Storage fund viability ###
107 | 
108 | The storage fund seems to be based on the assumption that there is enough money to be
109 | made from computation fees to pay for continued storage. The fund ensures that
110 | validators earn a bigger cut of the computation fees based on the size of the
111 | storage fund. This could be problematic when the amount of storage heavily
112 | outweighs the amount of computation; for instance, if Sui were used primarily as
113 | a storage network. This is good to keep in mind when comparing the storage fund
114 | model with rent-based models such as the one employed in Codex. (Tokenomics §3.2, §3.3)
115 | 
116 | The storage gas price remains constant within an epoch (Tokenomics §3.1). It is
117 | unclear how the network should react when confronted with a sudden spike in
118 | demand for storage. What would happen when the storage price is low, and a
119 | user decides to store massive amounts of data on the network?
120 | 
-------------------------------------------------------------------------------- /papers/template.md: --------------------------------------------------------------------------------
1 | # Title
2 | 
3 | The title of the paper
4 | 
5 | ## Authors
6 | 
7 | The authors of the paper
8 | 
9 | ### DOI
10 | 
11 | The digital object identifier for the paper
12 | 
13 | ### Links
14 | 
15 | An optional section with links to the paper and other relevant material, such as source code, simulations, etc... If the paper is uploaded to the repo, it should be linked here as well.
16 | 
17 | ## Summary
18 | 
19 | A quick summary capturing the main ideas proposed by the paper
20 | 
21 | ### Main ideas
22 | 
23 | A list of bullet points describing the main ideas of the paper in more detail
24 | 
25 | ### Observations
26 | 
27 | An optional list of bullet points with observations if any
28 | 
29 | ### Other ideas
30 | 
31 | An optional list of bullet points with additional observations
32 | 
-------------------------------------------------------------------------------- /project-overview.md: --------------------------------------------------------------------------------
1 | # Codex project overview
2 | 
3 | > This document outlines, at a high level, what Codex is: the problem it's attempting to solve, its value proposition, and how it compares to similar solutions.
4 | 
5 | ## Introduction
6 | 
7 | Peer-to-peer storage and file sharing networks have been around for quite a long time. They exhibit clear advantages in comparison to centralized storage providers, such as scalability and robustness in the face of large-scale network disruptions, and have desirable censorship-resistance properties. However, we've yet to see widespread adoption outside of a few niche applications.
8 | 
9 | Our intuition is that the lack of incentives and of strong data availability and persistence guarantees makes these networks unsuitable for applications with moderate to high availability requirements.
In other words, **without reliability at the storage layer, it is impossible to build other reliable applications** on top of it. A more in-depth overview of these observations can be found in the [incentives rationale](https://github.com/status-im/codex-research/blob/main/incentives-rationale.md) document.
10 | 
11 | ## Goals and Motivations
12 | 
13 | Codex is our attempt at creating a decentralized storage engine that intends to improve on the state of the art by supplying:
14 | 
15 | - An incentivized p2p storage network with strong availability and persistence guarantees
16 | - A resource-restriction-friendly protocol that can endure higher levels of churn and large numbers of ephemeral devices
17 | 
18 | We intend to address the first issue by developing a robust data availability and retrievability scheme, and the second by building a p2p network friendly to mobile and other ephemeral devices.
19 | 
20 | We follow the "less is more" principle and attempt to remove as much complexity from the core protocol as possible. Anything that doesn't directly contribute to the core functionality is pushed out of the protocol - this decision has two important goals. Reducing complexity at the protocol level simplifies implementation and allows for quick iterative development cycles; and by simplifying the protocol, we also simplify the incentive mechanisms - a particularly hard problem that we believe has yet to be properly addressed by other solutions.
21 | 
22 | ## High level network overview
23 | 
24 | Codex consists of a p2p network of **storage, ephemeral, validator** and **regular** nodes.
25 | 
26 | ### Storage nodes
27 | 
28 | Storage nodes provide long-term reliable storage. In order for a storage node to operate, it needs to stake a collateral proportional to the amount of data it's willing to store. Once the collateral has been staked and the node begins to store data, it needs to periodically provide proofs of data possession. If a node fails to provide a proof in time, it is penalized with a portion of its stake; if the node fails to provide proofs several times in a row, it loses the entirety of the stake.
29 | 
30 | ### Validator nodes
31 | 
32 | Validator nodes are in charge of collecting, validating and submitting proofs to an adjudicator contract, which rewards and penalizes storage and other validator nodes. A validator node also needs to stake a collateral in order to be able to participate in the validation process.
33 | 
34 | Note that we don't use the term "adjudicator contract" in the literal sense of an Ethereum contract. We use it to indicate anything that executes on a consensus engine.
35 | 
36 | ### Ephemeral nodes
37 | 
38 | Bandwidth incentives allow anyone to operate as an ephemeral node, profiting only from caching and serving popular content. We expect this to have the emergent property of an organic CDN, where nodes with spare bandwidth but limited or unreliable storage can collectively scale the network depending on current demands.
39 | 
40 | ### Regular nodes
41 | 
42 | Regular or client nodes engage with other nodes to store, find and retrieve data from the network. Regular nodes constitute the lion's share of the Codex network and consume services offered by other nodes in exchange for payments. A regular node can also be an ephemeral node by caching previously consumed data that other nodes can retrieve from it.
This allows nodes to offset some of the cost of participating in the network, and it's expected to allow the majority of nodes to participate on an almost free basis after an initial entry fee - this last point is covered in more detail in a later section.
43 | 
44 | ## Incentives structure
45 | 
46 | The goals behind our incentives structure are:
47 | 
48 | 1. Allow demand and supply to direct the network to optimally utilize its resources
49 | 2. Allow nodes to utilize their competitive advantages to maximize profits, thus increasing participation
50 | 3. Serve as a security and spam prevention mechanism
51 | 
52 | Interactions between nodes are 1:1. This decision is deliberate; it allows us to simplify the accounting and adjudication of payments and avoids complex price discovery mechanisms. We explicitly want to avoid:
53 | 
54 | - Complex multihop payment chains - all interactions are strictly between directly connected nodes
55 | - Arbitrary price setting - all prices are driven by demand and supply and are negotiated 1:1
56 | - Loose payment guarantees and double spends - all interactions between parties are settled securely and unambiguously
57 | 
58 | In other words, our incentive structure attempts to be simple, predictable and secure. Predictability and security allow nodes to properly plan and allocate resources.
59 | 
60 | ### Incentives categories
61 | 
62 | There are several incentives categories:
63 | 
64 | - Staking
65 | - Bandwidth
66 | - Storage
67 | - Penalties and rewards
68 | 
69 | #### Staking
70 | 
71 | Staking is used as a mechanism to prevent spam and abuse in the system - all nodes stake some amount of collateral.
72 | 
73 | Regular nodes stake funds indirectly by having operational capital to be able to retrieve content from the network, i.e. bandwidth fees.
74 | 
75 | #### Bandwidth
76 | 
77 | Bandwidth fees play several important roles in the system:
78 | 
79 | - Prevent spam and DDoS attacks from requesting nodes
80 | - Enable nodes to operate as exit or caching nodes
81 | - Avoid hot paths and enable (geographical) locality
82 | - Rational nodes looking to maximize profits can quickly cache and serve popular content, thus scaling the network according to current needs
83 | 
84 | #### Storage
85 | 
86 | Storage incentives allow nodes to earn a profit in exchange for storing arbitrary data. This allows content to persist in the network regardless of its popularity and age.
87 | 
88 | #### Penalties and Rewards
89 | 
90 | Penalties and rewards allow verifying nodes to profit by monitoring and detecting malicious or malfunctioning storage and other validator nodes.
91 | 
92 | ## Data availability and persistence
93 | 
94 | A core goal of the Codex protocol is to enable data availability and persistence. In order to accomplish this, we rely on several complementary techniques:
95 | 
96 | - We use active verification to ensure data is available and retrievable
97 | - We ensure that failures are detected and corrected early, to prevent outages and keep previously agreed upon redundancy guarantees
98 | - We use erasure coding to increase network-wide data redundancy and prevent catastrophic data loss (see the sketch below)
99 | 
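As a minimal illustration of the erasure coding point, here is a trivial K+1 code with a single XOR parity piece; a real deployment would use a K-of-N code such as Reed-Solomon, and all names here are illustrative:

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_pieces: list) -> list:
    # append one parity piece: the XOR of all K data pieces
    return data_pieces + [reduce(xor, data_pieces)]

def reconstruct(pieces: list) -> list:
    # any single missing piece is the XOR of all the surviving ones
    missing = pieces.index(None)
    pieces[missing] = reduce(xor, (p for p in pieces if p is not None))
    return pieces[:-1]  # drop the parity, return the K data pieces

data = [b'node-1', b'node-2', b'node-3']   # K = 3 pieces on 3 nodes
stored = encode(data)                      # 4 pieces spread over 4 nodes
stored[1] = None                           # one storage node fails
assert reconstruct(stored) == data         # the dataset still survives
```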
100 | When a node commits to a storage contract and a user uploads a file or other arbitrary data, the network will proactively verify that the storing node is online and that the data is retrievable. Storage nodes broadcast proofs of data possession at random intervals. If the storage node sends invalid proofs or fails to provide them in time, the network will re-post the contract for any other storage node to pick up. When the contract is re-posted, an amount from the faulty node's stake is used to cover the new storing node's bandwidth fees. It is expected that data is stored on at least several nodes to prevent data loss in the case of a catastrophic network failure. Erasure coding complements active verification by allowing the full dataset to be reconstructed from a subset of the data.
101 | 
102 | ### Proofs of data possession and retrievability
103 | 
104 | We use proofs of data possession and retrievability to ensure storage nodes committed to a contract remain online and available. The storage and retrievability proofs are formally described in this [document](https://hackmd.io/2uRBltuIT7yX0CyczJevYg?view).
105 | 
106 | The main objectives of the proofs are:
107 | 
108 | - Ensure nodes are online and maintaining the entirety of the dataset from the storage contract
109 | - Ensure that data is readily retrievable, to prevent blackmailing and withholding attacks
110 | 
111 | 
112 | ## Interacting with the Codex Network
113 | 
114 | Any regular node that participates in the network needs to have an operational amount set aside in order to cover bandwidth fees. This creates a barrier to entry; however, we think it's a worthy tradeoff in order to maintain the security and health of the network. It's worth noting that any decentralized platform will have similar requirements and limitations. Below, we'll list some potential ways to work around this in Codex.
115 | 
116 | ### Subsidies or airdrops
117 | 
118 | Any application migrating to or being built for a decentralized platform requires some operational capital to participate. Many projects work around this by initially subsidizing potential users with small portions of their token. These are usually known as airdrops. Codex can use a similar technique to allow first adopters to begin participating in the network.
119 | 
120 | ### Tit-for-tat settlements
121 | 
122 | Many interactions in the network will be long-lived, which allows two nodes to exchange in a tit-for-tat manner. It works like this: a long-lived payment channel is opened, and nodes freely exchange chunks regardless of which way the balance tilts. The channel can only be closed when both parties agree (this is always true, regardless of how long the channel has been open). The node that is currently in debt will need to add funds to the channel or keep providing the other peer with chunks until the debt is repaid. The channel is closed only when the debt is completely settled or the counterparty forgoes it. We expect that nodes will resort to tit-for-tat often, thus alleviating the need to constantly "top up" the node's balance.
123 | 
124 | ### Ephemeral or Caching nodes
125 | 
126 | Storing nodes need to be constantly online in order to earn fees on storage contracts and respond to probing requests. If a node is overwhelmed and unable to serve requests, it can miss a verification window or a verifier requesting random chunks, both of which lead to penalties. In these cases, rational storage nodes can lower the price per chunk to allow other nodes to share the load. In many cases they might forgo bandwidth fees for some period of time, which allows newly joining or underfunded nodes to become caching nodes and earn bandwidth fees.
127 | 
128 | ### Adaptive nodes
129 | 
130 | Many mobile devices follow well-established patterns of usage.
Phones often switch from mobile data to WiFi and back, and are either on the go or plugged into the wall, charging. These devices can operate mostly as consumers when on the go, but switch to being caching nodes when bandwidth and power aren't a limitation. This will offset all or most of the node's consumed bandwidth during the day.
131 | 
132 | ### Opportunistic providing
133 | 
134 | A node might not have a chunk itself, but it might be connected to another node that does; in that case, it might choose to advertise the chunk to the requesting node but charge a small premium in order to cover its expenses and make a small profit on top. This is called "opportunistic providing".
135 | 
136 | Note that this emulates forwarding, but without incurring the complexity of tracking payments across many hops. Payments at each hop are still settled 1:1.
137 | 
138 | ### Altruistic nodes
139 | 
140 | Any node can choose to provide services for free. Nodes can store and share arbitrary data at will without charging any fees.
141 | 
142 | ## Closing Notes
143 | 
144 | To summarize, Codex attempts to "untie the knot" of incentivized storage and allow many existing and future applications to be built in a distributed manner. We're building Codex to be reliable and predictable p2p storage infrastructure that will allow for many business and casual use cases. We accomplish data persistence and availability by introducing robust PoDP proofs, which we supplement with error correction techniques. We use robust PoR schemes to prevent blackmailing and data withholding attacks and to guarantee data is always retrievable. We provide reasonable workarounds to the "zero entry" problem without compromising the network's security.
145 | 
146 | Hopefully, this overview has clarified what Codex is and what its main value proposition is.
147 | 
-------------------------------------------------------------------------------- /robust-data-possesion-scheme.md: --------------------------------------------------------------------------------
1 | # Robust Proofs of Data Possession and Retrievability
2 | 
3 | > Proofs of data possession (PoDP) schemes establish whether an entity is currently, or has been, in possession of a data set. Proofs of Retrievability (PoR) schemes attempt to detect, with all but negligible probability of error, that an entity is maliciously or otherwise withholding data.
4 | 
5 | ## Proofs of data possession
6 | 
7 | In our definition, a robust PoDP scheme is one that can prove, with negligible probability of error, that a storage provider is currently or has been in possession of a data set.
8 | 
9 | To state the obvious first - the most secure data possession scheme is to "show" the data every time it's being requested. In other words, the most secure way of proving that someone has a file is for them to show the file every time it is requested. Alas, due to bandwidth restrictions, this is not practical in the majority of cases; hence, the main objective of different data possession schemes is overcoming the limitation of _having to show the entire dataset every time_ to prove its possession.
10 | 
11 | A common technique to overcome this limitation is random sampling with fraud proofs. This consists of selecting a random subset of the data instead of the entire data set. The rationale is that if the prover doesn't know which pieces are going to be requested next, it is reasonable to expect that it will keep all the pieces around to prevent being caught; and a well-behaved prover would certainly do that.
12 | 
13 | The mechanism described above sounds reasonable; unfortunately, naive random sampling only provides weak guarantees of possession.
14 | 
15 | Let's look at a naive, but relatively common, random sampling scheme:
16 | 
17 | - Given a file $F$, split it into a set of chunks $C$, where $C=\{c_1, c_2,...,c_n\}$
18 | - Next, generate a publicly available digest $D$ of the file from the chunks in $C$ - for example a Merkle tree
19 | - To prove existence of the file, provide random chunks from the set $C$, along with a part of the digest $D$
20 | - A verifier takes the random chunks provided and attempts to verify that they match the digest; if they do, they're assumed to be part of the same set $C$
21 | 
22 | The problem with this sampling technique, however, is that it can only prove existence of those pieces that have been sampled at that particular point in time, and nothing more.
23 | 
24 | For example, let's say that the prover $P$ supplies randomly sampled chunks $c_\alpha$ from the set $C$, at discrete time intervals $T$, to a verifier $V$. At time $t_0$, $P$ successfully supplies the set $\{c_1, c_7, c_9\}$ requested by $V$; at time $t_1$ it supplies the set $\{c_2, c_8, c_5\}$; at time $t_2$ it supplies the set $\{c_4, c_3, c_6\}$ and so on.
25 | 
26 | Each of these sampling steps is statistically independent of the others, i.e. the set provided at $t_3$ doesn't imply that the sets provided at $t_2$, $t_1$ and $t_0$ are still being held by the prover; it only implies that it's in possession of the currently provided set. At each verification step, the odds of detecting a malicious or faulty prover are proportional to the fraction of chunks it is missing. In other words, if the prover is missing 50% of the chunks, the chance of catching it with a single sampled chunk is 50%; if it's missing only 5% of the chunks, the chance of catching it is 5%. Moreover, this doesn't establish possession over time, which we defined as another property of PoDP.
27 | 
28 | One common misconception is that increasing the sampling rate will somehow change the odds of detecting missing chunks, but that is not the case; at best it will allow detecting that they are missing faster, but the per-round odds will still be the same.
29 | 
30 | To understand why, let's do a quick refresher on basic statistics. There are two types of statistical events - independent and dependent. In an independent event, the outcome of the previous event does not influence the next outcome. For example, flipping a coin always has a 50% chance of hitting either heads or tails, and throwing it 10 times vs 100,000 times would not change these odds. Dependent events, on the other hand, are tied together, and the odds of the next event depend on the outcome of the previous event. For example, if there is a bag with 5 marbles, 2 red and 3 blue, the odds of pulling a red marble are 2 in 5 and the odds of pulling a blue one are 3 in 5. Now, if we pull one red marble from the bag, the odds of pulling another red one change to 1 in 4, and so on.
31 | 
32 | To increase the robustness of random sampling schemes and establish possession over time, each sampling event needs to be dependent on the previous event. How can we do this? A potential way is to establish some sort of cryptographic link at each sampling step, such that the next event can only happen after the previous one has completed, thus establishing a chain of proofs.
33 | 
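The per-round odds discussed above are easy to verify empirically. The sketch below simulates a prover that silently dropped part of the file; `detect_prob` is a hypothetical helper and all parameters are illustrative:

```python
import random

def detect_prob(n_chunks: int, held_fraction: float, sample_size: int,
                rounds: int = 10_000) -> float:
    # empirical probability that a single sampling round catches a
    # prover which kept only held_fraction of the chunks
    held = set(range(int(n_chunks * held_fraction)))
    caught = sum(
        any(c not in held
            for c in random.sample(range(n_chunks), sample_size))
        for _ in range(rounds))
    return caught / rounds

# A prover missing 5% of 1000 chunks, probed with one chunk per round,
# is caught with probability ~5%; probing more often just runs more
# rounds, but each round's odds stay the same.
print(detect_prob(1000, 0.95, 1))   # ~0.05
print(detect_prob(1000, 0.95, 3))   # ~1 - 0.95**3, bigger samples help
```

Note that only a larger sample per round (or, as developed next, chaining the rounds together) changes the detection odds; repeating independent rounds does not.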
34 | Let's extend our naive scheme with this new notion and see how much we can improve on it. For this, we need to introduce a new primitive - a publicly known and verifiable random beacon. This can be anything from a verifiable random function (VRF) to most blockchains (piggybacking on the randomness properties of the blockchain). In this scheme, we generate the same publicly known and verifiable digest $D$, but instead of supplying just random chunks along with a part of the digest (Merkle proofs of inclusion), we also supply an additional digest generated using a value supplied by the random beacon $B$ and the previous digest - $d_{n-1}$. In other words, we establish a cryptographic chain that can only be generated using digests from previous rounds.
35 | 
36 | It looks roughly like this. We chunk a file and generate a well-known digest $D$ - let's say it is the root of the Merkle tree derived from the chunks. This is also the content address used to refer to the file in the system. Next, we use the digest and a random number from the random beacon to derive a _verification digest_ at each round. The first round uses the digest derived by concatenating the random number from the random beacon and the digest $D$ to generate a new verification digest $d_0$; subsequent rounds use the previous digest (i.e. $d_{n-1}, d_{n-2}, d_{n-3}$, etc.) to generate new digests at each round. As mentioned above, this creates a chain of cryptographic proofs, not unlike the ones in a blockchain, where the next block can only be generated using a valid previous block.
37 | 
38 | More formally:
39 | 
40 | ($||$ denotes concatenation)
41 | 
42 | - Given a file $F$, split it into a set of chunks, where $C=\{c_1, c_2,...,c_n\}$
43 | - Using the chunks in $C$, generate a digest $D$ such that $D=H(C)$
44 | - To prove existence of the file
45 |   - Select random chunks $c_\alpha = \{c_1, c_3, c_5\}$, $c_\alpha \subset C$
46 |   - Get a random value $r_n$ from $B$ at time $t_n$, such that $r_n=B(t_n)$
47 |   - Using $r_n$, plus $d_{n-1}$, generate a new digest, such that $C_n = \forall \sigma \in C: d_{n-1} || r_n || \sigma$, and $d_n = H(C_n)$
48 |   - At time $t_0$, the digest $d_0$ will be constructed as $C_0 = \forall \sigma \in C: D || r_0 || \sigma$, and $d_0 = H(C_0)$
49 |   - We then send $d_n$ and $c_\alpha$ to the verifier
50 | - A verifier takes the supplied values and, using a verification function, first verifies that $V(H(c_\alpha), D)$ holds, and then checks $V(H(\forall \sigma \in c_\alpha: d_{n-1} || r_n || \sigma), d_n)$
51 | 
52 | The first question to ask is how much this scheme has improved on our naive random sampling approach. Assuming that our cryptographic primitives have very low chances of collision and our randomness source is unbiased, the chances of forging a proof from a subset of the data are negligible; moreover, we can safely reduce the number of sampled chunks to just a few and still preserve a high level of certainty that the prover is in possession of the data, thus keeping the initial requirement of reducing bandwidth consumption.
53 | 
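A minimal sketch of the digest chain, assuming SHA-256 for $H$ and using a flat hash over the chunks in place of the Merkle-tree digest; the beacon values here are placeholders for whatever $B$ ends up being:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def round_digest(prev: bytes, r: bytes, chunks: list) -> bytes:
    # d_n = H(C_n), where C_n concatenates d_{n-1} || r_n || sigma for
    # every chunk sigma: the digest cannot be produced before round n,
    # nor without access to all the chunks
    acc = hashlib.sha256()
    for sigma in chunks:
        acc.update(prev + r + sigma)
    return acc.digest()

chunks = [b'chunk-%d' % i for i in range(8)]
D = H(b''.join(chunks))                           # well-known content digest
d0 = round_digest(D,  b'r0-from-beacon', chunks)
d1 = round_digest(d0, b'r1-from-beacon', chunks)  # and so on, chained
```

Each $d_n$ commits to the whole chain behind it, which is what makes the sampling rounds dependent events instead of independent ones.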
54 | However, in its current non-interactive form, the digest can be forged by combining the already known requested chunks and complementing the rest with random chunks. In order to prevent this, we need to split the digest generation and verification into independent steps, i.e. make the scheme interactive.
55 | 
56 | In an interactive scheme, where the prover first generates and sends a digest $d_n$ and the verifier then requests random chunks from the prover, we can prevent these types of attacks. However, every interactive scheme comes with the additional overhead of multiple rounds but, as we'll see next, we can use this property to build a robust proof of retrievability scheme from it.
57 | 
58 | ## Proofs of Retrievability
59 | 
60 | A robust PoR scheme is one that can detect, with all but negligible probability of error, that a node is maliciously or otherwise withholding data.
61 | 
62 | A particularly tricky problem in PoR schemes is the "fisherman's" dilemma, as described by [Vitalik Buterin](https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding).
63 | 
64 | To illustrate the issue, let's look at a simple example:
65 | 
66 | - Suppose that $P$ is storing a set $C=\{c_1, c_2,...,c_n\}$
67 | - A node $\rho$ attempts to retrieve $C$ from $P$
68 | - If $P$ is maliciously or otherwise unable to serve the request, $\rho$ needs to raise an alarm
69 | 
70 | However, due to the "fisherman's" dilemma, proving that $P$ withheld the data is impossible. Here is the relevant quote:
71 | 
72 | > because not publishing data is not a uniquely attributable fault - in any scheme where a node ("fisherman") has the ability to "raise the alarm" about some piece of data not being available, if the publisher then publishes the remaining data, all nodes who were not paying attention to that specific piece of data at that exact time cannot determine whether it was the publisher that was maliciously withholding data or whether it was the fisherman that was maliciously making a false alarm.
73 | 
74 | From the above, we can deduce that unless the entire network is observing the interaction between the requesting node and the responding node, it's impossible to tell for sure who is at fault.
75 | 
76 | There are two problems that the "fisherman's" dilemma outlines:
77 | 
78 | - "all nodes who were not paying attention to that specific piece of data at that exact time"
79 | - "was the publisher that was maliciously withholding data or whether it was the fisherman that was maliciously making a false alarm"
80 | 
81 | These can be further summarized as:
82 | 
83 | 1. All interactions should be observable, and
84 | 2. All interactions should be reproducible and verifiable
85 | 
86 | The first requirement of observability can be broken down into observing the requester and observing the responder:
87 | 
88 | 1. In the case of the responder, if it knows that no one is observing, then there is no way anyone can prove that it withheld the data
89 | 2. In the case of the requester, it is impossible both to prove wrongdoing on the responder's behalf and to prove that the requester is being honest in its claims
90 | 
91 | We can invert the first proposition: instead of "if it knows that no one is observing", we can restate it as "if it doesn't know when it's being observed", and thus introduce uncertainty into the proposition. If the responder never knows for sure whether it's being observed, then it's reasonable for a rational responder to assume that it is being observed at all times.
92 | 
93 | There isn't a way of directly addressing the second issue, because there is still no way of verifying whether the requester is being honest without observing the entire network, which is intractable. However, if we instead delegate that function to a subset of dedicated nodes that observe both the network and each other, then the requester never needs to sound the alarm itself; it is up to the dedicated nodes to detect and take action against the offending responder.
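The claim that uncertainty alone is enough to keep a rational responder honest can be stated as a small expected-payoff comparison; the helper below and all of its numbers are purely illustrative:

```python
def withholding_is_rational(p_audit: float, penalty: float,
                            fee: float, cost: float) -> bool:
    # Serving earns the bandwidth fee minus the bandwidth cost; refusing
    # saves the cost but risks the penalty if this request was an audit.
    serve    = fee - cost
    withhold = -p_audit * penalty
    return withhold > serve

# With slashing penalties orders of magnitude above a single bandwidth
# fee, even a 1% chance that any given request is a hidden audit makes
# withholding a losing strategy:
print(withholding_is_rational(p_audit=0.01, penalty=100.0,
                              fee=0.1, cost=0.05))   # False
```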
94 | 
95 | However, this scheme is still incomplete, as it's reasonable to assume that the responder could deny access to regular requesters but respond appropriately to the dedicated verifiers. The solution is to anonymize the validators, so that it is impossible to tell whether the requester is being audited or simply queried for data. This guarantees that storing nodes always respond to any request as soon as possible.
96 | 
97 | Our second requirement states that all interactions should be reproducible and verifiable. It turns out that this is already partially solved by our PoDP scheme. In fact, the seemingly undesirable interactive property can be used to extend the PoDP scheme into a PoR scheme.
98 | 
99 | ## Extending PoDP to PoR
100 | 
101 | To extend the PoDP with PoR properties, we simply need to turn it into an interactive scheme.
102 | 
103 | Suppose that we have a trustless network of storing (responders), validating, and regular (requesters) nodes. Storing nodes generate and submit a verification digest $d_n$ at specific intervals $t_n$. Validator nodes collectively listen for (observe) these proofs (which consist of only the digest). Proofs are aggregated and persisted, such that it is possible to retrieve them at a later time to precisely establish when the node was last in possession of the dataset. Proofs are only valid for a certain window of time, so if a node went offline and failed to provide a proof for several intervals, this would be detected and the node would be marked offline. This by itself is not sufficient to prove either possession or availability, but it does establish a verifiable chain of events.
104 | 
105 | Next, at random intervals, an odd-sized subset of the validators is selected, and each validator requests a unique random set of chunks from the storing node. Each validator then verifies the chunks against $d_n$. If the chunks match for all validators, each will generate an approval stamp, which will be aggregated and persisted in a blockchain.
106 | 
107 | If the chunks only match for some validators then, since there is an odd number of validators, the majority decides whether they are correct or invalid, thus avoiding a tie. Neither the validators nor the storing nodes know ahead of time which subset they will end up being part of, and each validator generates its own random set to probe.
108 | 
109 | In order to reduce bandwidth requirements and load on the network, validation happens periodically - for example, every two hours in a 24-hour window. If a faulty node misses a window to submit its proof (digest), it's marked offline and penalized; but if a malicious node submits several faulty proofs in succession, it should be detected during the next window of validation and penalized retroactively for every faulty proof. If enough proofs are missed, and assuming that all participants are bound by a collateral, the faulty or malicious node gets booted from the set of available storing nodes and loses its stake.
110 | 
111 | In a similar manner, if a node from the validator subset fails to submit its stamp on time, it gets penalized with a portion of its collateral and is eventually booted off the network.
112 | 
113 | Well-behaved nodes get rewarded for following the protocol correctly; faulty or malicious nodes are detected, penalized and eventually booted out.
114 | 
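A simulation sketch of one validation round as described above - an odd committee, independent random probes, and a majority verdict; the helper names and parameters are hypothetical, not an actual Codex API:

```python
import random

def validation_round(stored: dict, n_chunks: int, verify_chunk,
                     n_validators: int = 5, probes_each: int = 3) -> bool:
    # each validator draws its own random probe set and checks the
    # returned chunks against the current digest via verify_chunk;
    # an odd committee size guarantees a majority verdict
    assert n_validators % 2 == 1
    votes = 0
    for _ in range(n_validators):
        probe = random.sample(range(n_chunks), probes_each)
        ok = all(i in stored and verify_chunk(i, stored[i]) for i in probe)
        votes += 1 if ok else -1
    return votes > 0  # approval stamps get aggregated and persisted

chunks = {i: b'data-%d' % i for i in range(100)}
print(validation_round(chunks, 100, lambda i, c: c == b'data-%d' % i))
```

Because each validator draws its own probe set and the committee itself is randomly selected, a storing node cannot pre-compute which chunks to keep, and a single dishonest validator cannot swing the verdict.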
115 | ## Conclusion
116 | 
117 | To understand whether the described PoDP and PoR schemes satisfy the requirements of being robust, let's first outline what those requirements are:
118 | 
119 | 1. Establish possession of data over time
120 | 2. Establish possession of data at the current time
121 | 3. Detect faulty or malicious nodes that are withholding data
122 | 4. Circumvent the "fisherman's" dilemma
123 | 
124 | Now, does our proposed scheme satisfy these requirements?
125 | 
126 | - We can reliably say that a node has been in possession of data over time by issuing a cryptographically linked chain of proofs - this satisfies 1.
127 | - We can reliably tell whether a node is currently in possession of a data set by interactively probing for randomly selected chunks from the original data set and matching them against the current digest - this satisfies 2.
128 | - We introduced uncertainty, through anonymity and randomness, into the interactive verification process, which allows us to satisfy 3 and 4.
129 |   - Only dedicated nodes need to monitor the network, which makes observability tractable
130 |   - Since nodes don't know when they are observed, rational nodes can only assume that they are always observed, thus preventing data withholding and encouraging availability
131 | 
132 | Furthermore, assuming that the broadcast proofs are smaller than the original data set, we keep the bandwidth requirements low. We can further improve on this by reducing the probing frequency. Since faults can still be reliably traced back to their origin, nodes can be retroactively punished, which further reduces the possibility of gaming the protocol.
133 | 
--------------------------------------------------------------------------------