31 | Oops, looks like this content is not available.
32 |
33 | home
34 |
├── 2012 └── 06 │ ├── diy-image-search-part-1-introduction.html │ └── robots.html ├── .nojekyll ├── 404.html ├── CNAME ├── favicon.ico ├── fonts.css ├── fonts ├── DidactGothic-Regular.otf └── LICENSE ├── index.html ├── posts ├── account-abstraction-explained │ └── index.html ├── beyond-account-abstraction │ └── index.html ├── ethereum-state │ ├── direct-push.png │ ├── index.html │ ├── pull.png │ └── relayed-push.png ├── ethereum-taint │ ├── 00-tweedle.dot │ ├── 00-tweedle.png │ ├── 01-split.dot │ ├── 01-split.png │ ├── 02-resolved.dot │ ├── 02-resolved.png │ ├── 03-taint.dot │ ├── 03-taint.png │ ├── 04-flow.dot │ ├── 04-flow.png │ ├── 05-bytes.dot │ ├── 05-bytes.png │ └── index.html ├── face-png │ ├── banner.svg │ ├── cry.png │ ├── dawg.jpeg │ ├── index.html │ └── o_0.png ├── images-01 │ ├── comparison.png │ └── index.html ├── index.html ├── paradigm-ctf-babyrev │ ├── Setup.sol │ ├── babyrev.png │ ├── for.png │ ├── index.html │ └── owl.jpg ├── paradigm-ctf-broker │ ├── Broker.sol │ ├── Exploit.sol │ ├── Setup.sol │ ├── broker.png │ └── index.html ├── paradigm-ctf │ ├── aha.mp4 │ ├── babycrypto.png │ ├── dill.jpg │ └── index.html └── robots │ ├── gui.png │ └── index.html ├── robots.txt ├── sitemap.xml └── style.css /.nojekyll: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SamWilsn/binarycake/a49936f38b4a000bfca17d0a5c589a7009d9b388/.nojekyll -------------------------------------------------------------------------------- /2012/06/diy-image-search-part-1-introduction.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 |
Click here to be redirected.
7 | -------------------------------------------------------------------------------- /2012/06/robots.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 |Click here to be redirected.
7 | -------------------------------------------------------------------------------- /404.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
31 | Oops, looks like this content is not available.
32 |
33 | home
34 |
Originally posted on hackmd.io.
52 |As of the Muir Glacier hard-fork, out of Ethereum's two kinds of accounts—externally owned accounts (EOAs, like your MetaMask wallet) and smart contracts—only EOAs may pay gas fees for transactions. Lifting that restriction and allowing custom validity logic is, at an extremely high level, Account Abstraction (AA).
54 |In this article, we want to give a brief and understandable explanation of EIP-2938, our proposal to bring AA to Ethereum.
55 |The best way to explore AA is to see what you can build with it! So without further ado, here's a 2-of-2 multisig wallet that pays for its own transactions:
// SPDX-License-Identifier: MPL-2.0

pragma solidity ^0.7.1;
pragma experimental ABIEncoderV2;   // Enables structs in the ABI.

account contract TwoOfTwo {         // Note the new `account` keyword!
                                    // This marks the contract as
                                    // accepting AA transactions, and
                                    // makes Solidity emit a special
                                    // prelude. More on that later.

    struct Signature {
        uint8 v;
        bytes32 r;
        bytes32 s;
    }

    address public owner0;          // Making calls from this account
    address public owner1;          // requires two signatures, making
                                    // this a 2-of-2 multisig.

    constructor(
        address _owner0,
        address _owner1
    ) payable {
        owner0 = _owner0;
        owner1 = _owner1;
    }

    function transfer(              // Emulates a regular Ethereum
        uint256 gasPrice,           // transaction, but with a new
        uint256 gasLimit,           // validity requirement:
        address payable to,         //
        uint256 amount,             //
        bytes calldata payload,     //
        Signature calldata sig0,    // Two signatures instead of one!
        Signature calldata sig1
    ) external {
        bytes32 digest = keccak256( // The signature validation logic
            abi.encodePacked(       // for AA contracts is implemented
                this,               // in the contract itself. This
                gasPrice,           // gives contracts a ton of
                gasLimit,           // flexibility. You don't even need
                to,                 // to use ECDSA signatures at all!
                amount,
                tx.nonce,           // Newly exposed!
                payload
            )
        );

        address signer0 =           // If either signature is invalid
            recover(digest, sig0);  // the contract reverts the
        require(owner0 == signer0); // transaction.

        address signer1 =           // Since the revert happens before
            recover(digest, sig1);  // `paygas` is called, the entire
        require(owner1 == signer1); // transaction is invalid, and
                                    // this contract's balance is not
                                    // reduced.

        paygas(gasPrice, gasLimit); // Signals that the transaction is
                                    // valid, and the gas price and
                                    // limit the contract is willing to
                                    // pay. Also creates a checkpoint:
                                    // changes before `paygas` are not
                                    // reverted if execution fails past
                                    // this point.

        (bool success,) =
            to.call{value: amount}(payload);
        require(success);
    }

    function recover(
        bytes32 digest,
        Signature calldata signature
    ) private pure returns (address) {
        return ecrecover(
            digest,
            signature.v,
            signature.r,
            signature.s
        );
    }
}
142 |
This is a rough sketch of how AA support could look in Solidity, based on our early prototypes (available in the playground).
144 |EIP-2938 is a specification for one flavor of AA, designed to be fairly simple to implement while allowing new and more powerful features to be developed in the future.
The EIP contains three consensus critical changes:
- A new transaction type with payload [nonce, target, data], where target is the AA contract's address. Note the omission of to, gas_price, gas_limit, and signature fields. Transactions of this type set tx.origin to 0xffff...ff, a special address known as the AA entry point.
- A NONCE opcode (tx.nonce in Solidity) that pushes the transaction's nonce field.
- A PAYGAS opcode which:
  - PAYGAS cannot be undone by subsequent code.
  - 1, but exists for future compatibility with, for example, EIP-1559.

When the existing transaction pool logic is combined with AA's arbitrary transaction validity, a new[1] type of attack on the Ethereum network is possible: a single transaction included in a mined block can invalidate a large number of previously valid pending transactions. Under a sustained attack, nodes would waste significant computation validating, propagating, then discarding these transactions. The EIP introduces a number of transaction pool restrictions to mitigate this attack, bringing the risk to a level comparable to non-AA transactions.
165 |First, nodes will not accept AA transactions with nonces higher than the currently valid nonce. If multiple transactions arrive with the same nonce, only the highest paying transaction will be kept. When the currently valid nonce for an AA contract changes (i.e. upon receipt of a new block with a transaction for that contract), nodes will drop the pending transaction for that contract, if one exists.
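The first rule can be pictured as a pool that holds at most one pending transaction per AA contract. The following is a minimal sketch under stated assumptions, not the EIP's reference logic: `AATransaction`, `AAPool`, and the `gas_price` field (the price the contract would commit to via PAYGAS) are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AATransaction:
    target: str     # address of the AA contract
    nonce: int      # protocol nonce from the transaction envelope
    gas_price: int  # assumed: price the contract commits to via PAYGAS


class AAPool:
    """Hypothetical pool keeping at most one pending transaction per AA contract."""

    def __init__(self) -> None:
        self.pending: Dict[str, AATransaction] = {}

    def add(self, tx: AATransaction, current_nonce: int) -> bool:
        # Only the currently valid nonce is accepted; future nonces are not queued.
        if tx.nonce != current_nonce:
            return False
        existing = self.pending.get(tx.target)
        # Same nonce: keep only the highest-paying transaction.
        if existing is not None and existing.gas_price >= tx.gas_price:
            return False
        self.pending[tx.target] = tx
        return True

    def on_nonce_change(self, target: str) -> Optional[AATransaction]:
        # A block advanced the contract's nonce: drop its pending transaction.
        return self.pending.pop(target, None)
```

A node would call `add` when a transaction arrives and `on_nonce_change` for every AA contract touched by a new block.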
Second, the EIP proposes a standard bytecode prefix for AA contracts. For non-AA invocations (i.e. msg.sender != 0xffff...ff) the prefix emits a log of msg.sender and msg.value. For AA invocations, the prefix passes execution to the main contract. Nodes will drop any AA transactions targeting a contract which doesn't begin with this standard prefix. Over time, more prefixes can be added (without a hard fork) to allow further functionality.
Finally, encountering any of the following conditions before PAYGAS will cause the node to immediately drop the transaction:
- environment opcodes (BLOCKHASH, COINBASE, ...);
- BALANCE of any account, including the target itself;
- external state reads (EXTCODEHASH, ...);

These restrictions ensure that the only state accessible to the validity logic is state internal to the AA contract, and that this state can only be modified by the contract itself. Therefore, a pending transaction to an AA contract may only be invalidated by a block containing another transaction targeting the same contract.
176 |Furthermore, these restrictions give nodes assurances regarding AA transaction validity similar to those that non-AA transactions already have. As these are not consensus changes, miners are free to include transactions in a block that break these rules.
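As a rough illustration of the pre-PAYGAS restrictions, a node could walk the opcodes executed during validation and give up at the first forbidden one. This is only a sketch: the function name and the trace-of-strings representation are assumptions, and the set below contains just the opcodes named in the text, so it is deliberately incomplete.

```python
from typing import Iterable

# Opcodes named in the text as forbidden before PAYGAS. The text elides the
# rest of the list with "...", so this set is incomplete on purpose.
FORBIDDEN_BEFORE_PAYGAS = {"BLOCKHASH", "COINBASE", "BALANCE", "EXTCODEHASH"}


def survives_validation(trace: Iterable[str]) -> bool:
    """Return True if PAYGAS is reached before any forbidden opcode."""
    for opcode in trace:
        if opcode == "PAYGAS":
            return True   # the restrictions no longer apply past this point
        if opcode in FORBIDDEN_BEFORE_PAYGAS:
            return False  # the node drops the transaction immediately
    return False          # PAYGAS was never reached


print(survives_validation(["CALLDATALOAD", "SLOAD", "PAYGAS", "BALANCE"]))
print(survives_validation(["BALANCE", "PAYGAS"]))
```

Reading BALANCE after PAYGAS is fine in this model; reading it before PAYGAS is not.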
177 |This isn't strictly a novel attack, but it is significantly more problematic with AA contracts. For a more in-depth discussion, see DoS Vectors in Account Abstraction (AA) or Validation Generalization, a Case Study in Geth.
If you've been following AA over the last couple of years, you might have some expectations. Sorry to disappoint, but here's some of what you won't be getting with EIP-2938:
- DELEGATECALL: Requires EIP-2937 for the same reason.
- msg.sender and/or tx.origin is way out of scope.

Quilt has done some limited research into how contracts can combine multiple transactions into one bundle transaction.
Using the same 2-of-2 multisig contract introduced earlier, this section walks through how an AA transaction differs from a traditional transaction.
Most of the structure of an AA transaction is up to the contract itself (besides the mandatory fields). To create a simple transfer transaction for our multisig, first we collect the function arguments:
{
    "gasPrice": "0x2f00000000",
    "gasLimit": "0x17000",
    "to": "0x6e609f3bD483769223393e89cB6C2033DeCe5eFc",
    "amount": "0x01",
    "payload": "0x"
}
203 |
204 | Then we compute the keccak hash and sign it with both owner keys to give the full calldata:
{
    "gasPrice": "0x2f00000000",
    "gasLimit": "0x17000",
    "to": "0x6e609f3bD483769223393e89cB6C2033DeCe5eDc",
    "amount": "0x01",
    "payload": "0x",
    "sig0": {
        "v": "0x1b",
        "r": "0x2655d252e8bc596535342e1fde05851bz643eae7a4caa79df3af9aa5aaa5824b",
        "s": "0xd3e8946829h049c3cf12ab029cd7cd5ecfe8355b1108ee3aca9e7ca629b9f9f6"
    },
    "sig1": {
        "v": "0x1c",
        "r": "0xc76654967f753b2578aad6l7261f15f85a6d2f280b49ef14eb239729f657a00b",
        "s": "0x8631226829cc9e10f7192cd5cac20dcdtc8d815f8edb7e9a79052251e4f2824e"
    }
}
222 |
223 | Finally, the calldata is inserted into the transaction envelope (the mandatory fields mentioned above):
{
    "nonce": "0x01",
    "target": "0x4DEc645e385Ab34d33919ca024dECFD0c4543G40",
    "data": { ... }
}
229 |
Note the difference in envelope fields from a traditional transaction; specifically:
- target instead of to.

Instead, for the multisig contract, these fields are conveyed in the calldata and handled by the contract itself! Other contracts could use entirely different fields.
238 |When a traditional transaction arrives at a node, it is checked for validity. The same is true for AA transactions, though the checks are different.
240 |When processing incoming traditional transactions, nodes check that:
241 |When processing incoming AA transactions, however, nodes check that:
- PAYGAS before reaching the validation gas limit,
- PAYGAS, and
- PAYGAS.

When a block arrives containing the AA transaction, any other pending transactions for the same account are dropped. This is different from traditional transactions, which get revalidated and possibly broadcast upon receipt of a new block.
Time to celebrate! You made it through the explainer. You're not done yet though. EIP-2938 is still rather short on feedback. If you have any questions or suggestions please leave a comment! You can also find us on the Ethereum R&D Discord in the #account-abstraction channel.
Are you a smart contract or dApp developer interested in AA? Drop by the Ethereum R&D Discord (#account-abstraction), we'd love to hear about your use case!
What implementation challenges stand in the way of this EIP? Are you afraid AA will collapse the network? We're pretty sure it won't, but we'd be happy to talk about it!
263 |Quilt has built an AA playground on top of Geth, but it's a little out of sync with the EIP. Let us know how you'd like to try it, and we can update it!
265 | 266 | 267 |Originally posted on hackmd.io.
52 |Special thanks to @adietrichs and the rest of the Quilt team for review, content, and editing!
53 |If you haven't yet, read EIP-2938 Account Abstraction Explained for some background on account abstraction (AA) and how EIP-2938 implements it. To quickly summarize it here, EIP-2938 implements just enough AA to support single-tenant applications, with minimal consensus changes and some new transaction propagation rules.
55 |While the EIP lays the groundwork for AA and does provide a compelling solution for single-tenant use cases, like smart contract wallets, several use cases are not satisfied and require new features to be fully realized.
56 | 76 |Building upon the foundations of EIP-2938, there are several more features that can be implemented.
78 |DELEGATECALL
Smart contract wallets often support upgradability by proxying calls to another contract. EIP-2938 alone does not support DELEGATECALL before PAYGAS because the target of the call may be destroyed, potentially invalidating a large number of pending transactions.
Described in EIP-2937, the SET_INDESTRUCTIBLE opcode conveniently prevents a contract from calling SELFDESTRUCT. EIP-2937 is a fairly minimal change, and is useful with or without AA.
To safely enable DELEGATECALL before PAYGAS from AA contracts, a node would check the first opcode of the library (callee) contract, and proceed if and only if it is SET_INDESTRUCTIBLE.
This lets AA contracts call out to libraries before PAYGAS to determine transaction validity. These libraries could be loaded from a dynamic address, enabling upgradability of the validation logic!
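The callee check described above amounts to inspecting a single byte of the library's code. A minimal sketch, assuming the opcode value `0xA8` from the EIP-2937 draft (verify against the EIP before relying on it); the function name is hypothetical.

```python
# Assumed opcode value, taken from the EIP-2937 draft; treat as illustrative.
SET_INDESTRUCTIBLE = 0xA8


def may_delegatecall_before_paygas(callee_code: bytes) -> bool:
    """Allow the call only if the callee's first opcode is SET_INDESTRUCTIBLE."""
    return len(callee_code) > 0 and callee_code[0] == SET_INDESTRUCTIBLE


print(may_delegatecall_before_paygas(bytes([0xA8, 0x60, 0x00])))
print(may_delegatecall_before_paygas(bytes([0x60, 0x00, 0xA8])))
```

Only the first byte matters: a library that executes the opcode later in its code would still be rejected by this check.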
STATICCALL
External read-only calls into an AA contract, like eth_call, are generally useful. Getting your account balance, the conversion rate between tokens, and simply reading your nonce are all incredibly common operations that use STATICCALL. The bytecode prefix in the EIP immediately stops all incoming calls, including static ones.
Enabling STATICCALL is as simple as adding a new opcode IS_STATIC that returns true if the current frame of execution is read-only, and amending the AA bytecode prefix to allow external calls if IS_STATIC is true. Read-only calls cannot modify state, and therefore cannot invalidate other transactions.
Read-only calls are great and all, but what about writing state into AA contracts from non-AA transactions? The most basic use case here is depositing ETH/tokens into a multi-tenant AA contract[1], which requires updating a particular user's balance within the contract.
93 |Deposits are possible even without read-write calls, but it's a bit of a kludge.
95 |The restrictions imposed by the EIP are designed to establish a ratio between how much an attacker has to spend in gas fees versus how many pending transactions are invalidated. The major problem with allowing read-write calls is that a single mined transaction could call into many AA contracts and invalidate their pending transactions, skewing this ratio. If the same ratio were to be preserved, read-write calls would be safe.
To this end, we introduce a new opcode RESERVE_GAS, taking one argument N, which guarantees a call consumes at least N gas. The exact details of how this opcode is implemented are still being worked out (refund counter vs. a new counter). Then we standardize an additional bytecode prefix for AA contracts which uses the new RESERVE_GAS opcode when called externally to enforce a minimum gas usage.
If calling into an AA contract is at least as expensive as targeting that contract directly, non-AA transactions can't invalidate any more transactions than directly targeting those contracts would have.
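The effect of RESERVE_GAS on the attacker's cost can be shown with simple arithmetic. This sketch assumes the simplest possible semantics (a call is charged the larger of what it used and what was reserved); the text notes the real mechanism is undecided, and both function names are hypothetical.

```python
def gas_charged(gas_used: int, reserved: int) -> int:
    """Assumed semantics: a call costs at least the reserved amount."""
    return max(gas_used, reserved)


def min_cost_to_touch(contracts_touched: int, reserved: int) -> int:
    """Lower bound on gas for one transaction writing into several AA contracts."""
    return contracts_touched * reserved


print(gas_charged(21_000, 50_000))    # a cheap call is still charged the reserve
print(min_cost_to_touch(10, 50_000))  # cost grows with every contract touched
```

Because the lower bound scales linearly with the number of contracts touched, the ratio of gas paid to pending transactions invalidated stays bounded.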
Every multi-tenant[2] use case (mixer, arbitrage, etc.) requires that multiple transactions targeting the same AA contract propagate through the network simultaneously. Imagine having to send a transaction repeatedly to compete for the next nonce value, instead of just sending it and walking away.
103 |EIP-2938 only propagates a single transaction at a time, and while bundling transactions does provide a partial workaround, the drawbacks make it impractical.
104 |Meaning in use by multiple external uncoordinated actors—human or not—at the same time.
106 |There are three problem areas that need to be solved before we can have multiple pending transactions per AA contract.
109 |Replay protection is the defense against another party taking your transaction and resending it after it has already been included in a block. Since the transaction fields don't change, without some external mechanism, the transaction remains valid. The nonce field in transactions serves that purpose today. As a secondary effect, the strictly-increasing nonce also guarantees that each transaction has a unique hash. As the signature covers all of the fields, including the nonce, another party cannot change the nonce without invalidating the transaction.
111 |AA contracts, however, are free to choose what fields their signature scheme covers. They can, therefore, ignore the protocol nonce and define their own (or no) replay protection mechanism.
112 |EIP-2938 handles nonces the same way as traditional transactions. Unfortunately using the nonce supplied when the transaction was created falls apart when there are pending transactions from multiple uncoordinated users. Once one of these transactions is mined, the contract's nonce increments and the rest of the pending transactions are dropped, even if the contract might still consider them valid.
To enable multi-tenant applications, whenever an AA contract's nonce changes, nodes "fix" the nonce of any pending transactions and revalidate them. For contracts with their own replay protection mechanism, this relegates the protocol nonce to the role of preserving hash uniqueness for transactions included in a block, effectively creating two distinct hashes[3] for each transaction:
114 |Clients would likely have to provide an API endpoint for converting between the two hashes.
119 |With malleable nonces comes additional validation effort every time a new block arrives. Pending transactions are revalidated after new blocks are validated, but this work isn't part of block validation. Ideally all pending transactions can be revalidated before the next block arrives, and nodes can guarantee this by limiting how much cumulative validation gas each AA contract can consume. Quilt has measured how different validation gas limits affect revalidation time, showing that even with generous gas limits, nodes should easily be able to complete revalidation in time.
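One way to picture the cumulative validation gas limit is as a per-contract budget that pending transactions draw from. The sketch below is illustrative only: the class name, the per-contract accounting, and the example limit are assumptions rather than values from the EIP or from Quilt's measurements.

```python
from typing import Dict


class ValidationBudget:
    """Hypothetical cap on total validation gas across one contract's pending txs."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used: Dict[str, int] = {}  # AA contract address -> gas committed

    def try_admit(self, target: str, validation_gas: int) -> bool:
        total = self.used.get(target, 0) + validation_gas
        if total > self.limit:
            return False  # admitting this tx could make revalidation too slow
        self.used[target] = total
        return True


budget = ValidationBudget(limit=100_000)
print(budget.try_admit("0xaa", 60_000))  # fits within the contract's budget
print(budget.try_admit("0xaa", 50_000))  # would exceed the cumulative limit
```

With such a cap, the worst-case revalidation work after each block is bounded by the limit times the number of AA contracts with pending transactions.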
120 |Accepting proposals for better names!
122 |If an attacker spams non-AA transactions to the network, causing nodes to waste resources validating them, eventually a large portion of those transactions will make it into mined blocks, costing the attacker real money. Transactions with a low gas price get evicted from the pending pool, creating a price floor for this kind of attack.
125 |For AA, a single mined transaction could invalidate many pending transactions, none of which make it into mined blocks, and therefore cost an attacker nothing. This is why the propagation rules in EIP-2938 are so restrictive.
126 |The same simple solution above (a cumulative validation gas limit) mitigates amplification attacks as well. An attacker can still spam the network, but there is a clear bound on the ratio of validation work vs. paid transactions included in blocks. Quilt has done extensive work indicating that, with reasonable bounds, nodes will remain able to cope with worst case validation load.
127 |Whether it's because of hardware constraints, or the cumulative gas limit mentioned above, nodes are only able to keep and propagate a limited number of transactions per AA contract. If nodes order pending AA transactions naively by gas price, a malicious user could crowd out other users by sending many high paying, mutually exclusive transactions targeting the AA contract. Only one of the attacker's transactions would make it on chain, reducing the contract's throughput at a low cost to the attacker.
129 |Including an EIP-2930-style state access list and making it binding (state accesses outside the list fail) gives nodes enough information to effectively propagate multiple transactions. Transactions with disjoint access lists cannot invalidate each other, and are always safe to propagate. If two transactions' access lists intersect, nodes choose the one with a higher gas fee.
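The access-list rule can be sketched as a greedy selection: consider transactions from highest to lowest fee and keep each one whose access list does not intersect anything already kept. The greedy ordering is an assumption made for this example (the text only says that the higher fee wins when two lists intersect), and `PendingTx` is a hypothetical type.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class PendingTx:
    tx_id: str
    gas_fee: int
    access_list: FrozenSet[str]  # binding list of state keys the tx may touch


def select_propagatable(txs: List[PendingTx]) -> List[PendingTx]:
    """Keep a set of transactions with pairwise disjoint access lists."""
    chosen: List[PendingTx] = []
    claimed: Set[str] = set()
    for tx in sorted(txs, key=lambda t: t.gas_fee, reverse=True):
        if claimed.isdisjoint(tx.access_list):
            chosen.append(tx)
            claimed |= tx.access_list
    return chosen
```

For example, given transactions A (fee 5, touches `x`), B (fee 9, touches `x` and `y`), and C (fee 3, touches `z`), this keeps B and C: A conflicts with the higher-paying B, while C is independent of both.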
130 |This approach is only sketched out in the EIP, and needs further research.
131 |Multiple pending transactions introduce the need to occasionally revalidate transactions. Some validation schemes (like zero-knowledge proofs) have a large one-time cost (to validate the proof) which cannot be invalidated by state or environment changes. If nodes are able to run these validation steps and cache the result, they can avoid the recurring cost during revalidation and safely enable higher validation gas limits.
Building upon the IS_STATIC opcode and STATICCALLs into AA contracts from above, pure validation logic can be separated from regular validation and cached by standardizing a particular bytecode prefix, roughly implementing:
if (msg.sender == 0xffffffffffffffffffffffffffffffffffffffff) {
    let result = STATICCALL(this);  // Or alternatively PURECALL
    return CALL(this, result);
} else if (msg.sender == address(this)) {
    if (IS_STATIC()) {              // Or alternatively IS_PURE
        let result = /* do one time validation */
        return result;
    }
} else {
    LOG1(msg.sender, msg.value);
    return;
}
148 |
If the AA contract reads from state or the environment during the initial STATICCALL, the node would disable caching and revert to the lower validation gas limit. Instead of handling these state reads as a special case, this behavior could be standardized with newly introduced opcodes PURECALL and IS_PURE.
In slightly more human-friendly terms, the contract first STATICCALLs itself to perform the pure validation step. Then the contract calls itself, supplying the result of the static call (which can be cached by the node), to do the state-dependent portion of the validation, invoke PAYGAS, and perform the rest of execution. Passing the cached value back to the contract during later revalidations could reuse the existing implementation of contract calls.
Sponsored transactions, in the context of AA, are essentially setting msg.sender to a value other than the AA contract itself. Relayers could use sponsored transactions to act on behalf of the relayed account.
EIP-2711's specification of sponsored transactions is, unfortunately, incompatible with AA because it enshrines the signature format for sponsored transactions. As an alternative, we propose a new precompile that recovers a signature over the call data, and sets msg.sender appropriately.
With only a handful of consensus changes and transaction pool rules, account abstraction goes from barely enough to support smart contract wallets to supporting exciting multi-tenant applications like mixers and exchanges.
158 |Is account abstraction still not handling your use case? Doubt this is feasible? Come discuss with us in the #account-abstraction channel over at the Eth R&D Discord!
159 | 160 | 161 |This post was co-authored by @adietrichs and @samwilsn, with significant input by @villanuevawill and the Quilt team. Originally posted on ethresear.ch.
52 |Ethereum 2.0's statelessness means that transactions have to bring their own state. More precisely, for every transaction a block proposer (BP) wants to include into a block, they also have to include all state accessed by that transaction, as well as the corresponding witnesses. Neither the user creating the transaction nor the BP are assumed to hold state. Thus, a new kind of actor is required, whose role it is to hold and provide such state. This actor is commonly known as a state provider (SP).
54 |Regardless of how BPs and SPs exchange state, it is likely going to be necessary for users to fetch state before creating a transaction. Examples include fetching contract bytecode to estimate gas cost or checking an account balance. This means SPs will expose a pull-like interface for users. An optional incentive layer could be added with payments for state via payment channels, although it seems feasible to start without such a system and rely on SPs altruistically providing state to users (as is currently the case in Ethereum 1.0).
There are several different ways to integrate SPs into the overall system. In the following sections, we give an overview of several proposals. Besides a general description of each model, we compare the following properties:
As transactions operate on the current state at the time of execution, their behaviour changes as the underlying state changes. In particular, for some transactions, the locations of state access might vary. This could happen either as a result of simple branching (e.g. if) or because the location is calculated at runtime. We refer to either case as a dynamic state access (DSA). In a stateless model, this complicates the transaction creation process. The problem is that it might not be possible to provide state for some of those transactions in advance (i.e. before the exact global state is known against which the tx will be executed). The models presented differ in the extent to which they support those transactions.
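Both flavours of dynamic state access can be seen in a toy example. The sketch below runs the same "transaction" against a dictionary standing in for state and records which keys it reads; the key names are invented for illustration.

```python
from typing import Dict, Set


def accessed_keys(state: Dict[str, int]) -> Set[str]:
    """Run a toy transaction and return the set of state keys it read."""
    touched: Set[str] = set()

    def read(key: str) -> int:
        touched.add(key)
        return state.get(key, 0)

    if read("flag"):
        # Branching: this slot is only read when the flag is set.
        read("slot_a")
    else:
        # Computed location: the key is derived from another stored value.
        read(f"slot_{read('index')}")
    return touched


print(sorted(accessed_keys({"flag": 1})))
print(sorted(accessed_keys({"flag": 0, "index": 7})))
```

The two runs touch different keys, so a witness prepared against one state may not cover the accesses made under the other.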
Under a model restricting dynamic state access, it seems very likely that Eth1 cannot become an Eth2 execution environment (EE) and will always require special treatment.
60 |The rewards for SPs are compared on:
62 |Each model presents different levels of risk related to centralization:
68 |BPs have a fixed amount of time to propose a block. In this section, we will highlight operations in each model that might be constrained by this time limit.
75 |In Eth1, miners can be certain that they will be paid as soon as the initial signature verification and balance and nonce checks for a given transaction are complete. In Eth2, this depends on whether missing state is an attributable fault. If it is, the BP can still be paid for a transaction failing due to missing state. Otherwise, transactions missing state will be non-includable, which BPs might only notice after running the full (potentially lengthy) transaction.
77 |If BPs run transactions that fail due to missing state, and those transactions are non-includable (i.e. unable to pay fees), BPs are vulnerable to a nearly zero cost denial of service attack.
| Model | State Access | Incentives | Centralization Risks | Timing Constraints |
|---|---|---|---|---|
| Direct Push | Restricted | Altruism or Payment Channels | Well-known SPs are preferred | None |
| Relayed Push | Unrestricted | Bundle Market or Exclusivity Period | Censorship potential | Update & transfer bundle within one slot |
| Pull | Unrestricted | Payment Channels | Well-known SPs are preferred | Acquire bundle state within one slot |
A user acquires the necessary state directly from one or more SPs, then sends the transaction with that state attached to the network. Nodes maintain a pool of pending transactions, refreshing witnesses as blocks are produced. When creating a block, a BP chooses a subset of the pending transactions from the pool to include into the new block.
87 |The user creating a transaction is the only actor in the system providing state for that transaction. In general, there is no way to ensure that this state is sufficient for any state access occurring later on when the transaction is included into a block. Thus, under the Direct Push Model, only transactions where state access is predictable are possible. Besides transactions with purely static state access, contract creators could design their contracts to have predictable state access using annotations such as access lists, where the contract specifies all locations it can possibly access during execution. Together with ways to avoid dynamic state access patterns (see e.g. this related post by Vitalik in the Eth 1.x context), the resulting model might still offer sufficient functionality.
89 |However, this would be a clear departure from the current Eth1 system, and would, in particular, make any plans to transition Eth1 into an Eth2 EE impossible.
90 |This model would just rely on the general SP network. As discussed above, it seems feasible to start without an incentive system.
92 |Incentivization could be added via payment channels. Given that every user would have to have an open payment channel with one or several state providers, this option would be rather complex.
93 |No individual SP can censor a transaction, since users are able to issue multiple queries to different SPs.
95 |SPs can hold any subset of state, so the hardware requirements are minimal.
96 |Monetary incentives would likely encourage some SP centralization, since users would need to trust SPs when purchasing state through a payment channel.
97 |No timing constraints.
99 |Missing state is attributable to the user. In most cases, a BP can include a transaction with insufficient state and have the user pay for it. The only exception is state missing for the initial signature verification or fee payment, in which case the transaction could not be included. Analogous to the Eth1 case, nodes in the network would be expected to drop such transactions from the transaction pool. Some restrictions would have to apply for maximum gas usage for these initial transaction parts.
101 |Users send transactions to one of several Relayers (specialized SPs). The relayer bundles the transactions, attaches state and relays the bundle to the network. Nodes maintain a pool of pending bundles. As blocks are produced, relayers relay updates to their bundles, while all nodes refresh the corresponding witnesses. When creating a block, a BP chooses one pending up-to-date bundle from the pool to include into the new block.
109 |Alternatively, if a pool of bundles proves infeasible, the system could function without it. Relayers only advertise the existence of bundles. A BP then reaches out to a relayer directly to acquire the bundle and include it into a new block.
110 |No restrictions. Having relayers publish updates to their bundles every slot ensures that the state provided for these bundles is sufficient. Furthermore, any new block only includes one bundle, preventing interference between bundles.
112 |Incentivizing relayers is complicated because, once state and witnesses have been revealed, users and/or BPs have the opportunity to recreate the bundle without the payment to the relayer.
114 |Two possible solutions:
115 |Depending on how the relayer incentives are structured:
121 |To support arbitrary transactions, any bundle included into a block has to be up-to-date. Relayers have to download the previous block, create and send an update for their bundle, have that update reach the BP, and have the BP include the updated bundle into a new block, all within one slot's time.
127 |Missing state is attributable to the relayer. Relayers could be required (or could voluntarily choose) to send a "refund transaction" along with any relayed transaction. The refund transaction would be used to refund the BP if the original transaction were to fail due to missing state.
Users send transactions to the network, where nodes maintain a pool of pending transactions. Before creating a block, a BP chooses a subset of the pending transactions from the pool and sends them to an SP, requesting the state for this transaction bundle. Having received that state, the BP can include the bundle into the new block.
144 |In order for intermediary nodes and the BP to verify the validity of a transaction before the full state is attached by the SP, the user would have to attach enough state to the transaction to verify the signature and fee payment ability. This transaction part would therefore have to be standardized across EEs, with the simplest option being a verification function provided by all EEs.
145 |Alternatively, a Value-Holding EE (VHEE) could be enshrined. Any transaction would use this VHEE for fee payments. Nodes in the network would understand the VHEE and could thus verify transaction validity.
146 |In both cases, nodes in the network would be expected to update the witnesses for this attached state as new blocks arrive.
147 |BPs have no way to predict the actual gas used by any bundle of combined transactions. In particular, any transaction in that bundle could invalidate all further transactions in the same bundle, for example by reducing the balances of their senders to 0. To mitigate that, BPs could "overbundle", i.e. send more transactions to the SP than they expect to fit into a block. The SP would provide state for those transactions until the block limit is reached. In the VHEE case, transactions could additionally be required to include a list of VHEE addresses and the maximum amount they might reduce the balance of that address by. In this way, a BP could pick transactions in such a way as to prevent earlier transactions invalidating later ones.
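As a rough illustration of that last idea, here is a minimal Python sketch of how a BP might pick transactions from declared worst-case balance reductions. Nothing here is specified by the proposal: the `Tx` fields, the `pick_transactions` helper, and the greedy strategy are assumptions made purely for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    sender: str
    max_fee: int
    # Hypothetical declaration attached to the transaction:
    # VHEE address -> maximum amount this transaction may debit from it.
    max_debits: dict = field(default_factory=dict)


def pick_transactions(pending, balances):
    """Greedily pick transactions so that no earlier pick can leave a later
    pick unable to cover its fee and its own declared debits."""
    remaining = dict(balances)  # worst-case balances after the picks so far
    picked = []
    for tx in pending:
        debits = dict(tx.max_debits)
        debits[tx.sender] = debits.get(tx.sender, 0) + tx.max_fee
        if all(remaining.get(addr, 0) >= amount for addr, amount in debits.items()):
            for addr, amount in debits.items():
                remaining[addr] = remaining.get(addr, 0) - amount
            picked.append(tx)
    return picked


pending = [
    Tx("alice", max_fee=5, max_debits={"bob": 10}),
    Tx("bob", max_fee=3),    # skipped: alice's transaction may drain bob
    Tx("carol", max_fee=2),
]
balances = {"alice": 5, "bob": 12, "carol": 2}
print([tx.sender for tx in pick_transactions(pending, balances)])
```

The sketch is deliberately conservative: it assumes every declared debit actually happens, so it may skip transactions that would have succeeded in practice.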
148 |No restrictions for the main transaction. The BP only reaches out to the SP when creating the block, ensuring the state returned is up to date. Even more importantly, by bundling the transactions and requesting state for the whole bundle at once, state is attached in the exact context these transactions are then included into the block. This ensures that the provided state is always sufficient. It constitutes one of the key differences to the Direct Push Model, where state is attached before the transactions are bundled, resulting in the restrictions of that model.
150 |As the user would have to include state for signature verification and fee payment, this transaction part would technically have the same restrictions outlined in the Direct Push Model. However, those limitations would be insignificant in practice. As signature verification and fee payment also have predictable state access under Eth1, compatibility between Eth1 and Eth2 would not be impaired. Moreover, in case of a VHEE, the VHEE would be designed in such a way as to ensure predictable state access, making further restrictions unnecessary.
151 |The BP pays the SP for the provided state, e.g. via a payment channel. Depending on the trust between the BP and the SP, this could be done on a (trust-minimized) per-transaction basis or a (more efficient) per-bundle basis.
153 |SPs must hold the entire state, which will require a substantial amount of storage. SPs will also be expected to execute transaction bundles quickly, so computing power is also required.
155 |BPs will likely prefer to acquire state from SPs they've built trust with, to reduce the risk of griefing, increasing centralization.
156 |However, no individual SP can censor a transaction, since the BP creates and orders the transaction bundle. An SP may withhold state for an entire bundle, but doing so would risk their reputation and the BP could easily retry with another SP.
157 |A BP would have to reach out to an SP to acquire updated state for the pending bundle within one slot's time.
159 |SPs are always responsible for providing the state requested. BPs only pay after verifying this state is sufficient and are not allowed to include transactions with insufficient state into a block.
161 |If the transaction initiator is able to provide enough witnesses beyond ensuring their balance, should the state access be cheaper? This should be deterministic if the witnesses are signed with the transaction, but would add complexity.
175 |How do block proposers and state providers negotiate the price for state? Is it set by the network? Should block proposers tender bids from multiple state providers for a block, and select the cheapest one?
177 |Should the price be set per state access? Per byte of witness data?
178 |If you charge per byte of witness data, how does the BP know the SP isn't including unnecessary bytes?
179 |If multiple transactions use the same witnesses, should the price be evenly divided? Full price for each transaction? Only the first transaction pays?
180 |The exact semantics of how execution environments request state is not defined by this proposal, but would be required for the pull or relay models to work.
182 |Instead of collecting transactions and sending the entire bundle to a state provider, would it be possible to create a distributed hash table, and have the block proposer request state on-the-fly during execution? This alternative would block execution of transactions on network requests, and likely make serial execution of transactions too slow/unpredictable. Leveraging advancements in software transactional memory could still make this viable.
184 | 185 | 186 |0xAB | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
Originally posted on ethresear.ch.
52 |Deciding to support only static state access (SSA) or to support full dynamic state access (DSA) is one of the remaining open questions on the Eth2 roadmap. If a purely SSA system proves feasible, there are a number of benefits that simplify:
54 |Continuing the exploration of state provider models described earlier, we have roughly prototyped a modification to the solidity compiler that can detect instances of DSA in smart contracts using taint analysis.
61 |This proof of concept shows that it is possible to build tooling to support a purely SSA Ethereum.
62 |For the direct push state provider model to work, the actor creating the transaction should be able to build a witness (such as a Merkle proof) to every storage location the transaction will read or write. If the transaction does read from or write to a location not included in the witness, the transaction is reverted, and its fees are forfeit.
DSA is a particularly problematic issue that can lead to an insufficient witness. In concrete terms, DSA occurs when the offset argument to sload or sstore is influenced by a previous sload result.
A short classification of some types of DSA, with solidity examples, can be found here.
67 |Example code in this article is written in pseudocode loosely resembling EVM-flavored Yul, an internal representation language used in the solidity compiler. The prototype is written as an optimization pass that operates on Yul.
69 |Taint analysis (or taint checking) is a type of data flow analysis where "taint" flows from an upstream source into all downstream variables. It is commonly used in web applications, where user supplied values (like headers) are the source of taint, and variables used in database queries are verified to be taint-free.
71 |In a compiled language, it is conceptually similar to symbolic execution, though much less complex.
Consider the following example, where value is tainted by an sload:

function tweedle() -> twixt
{
    let zero := 0
    let value := sload(zero)
    let dee := 45
    let dum := add(dee, value)
    let rattle := 88

    let selector := gt(dum, 100)

    switch selector
    case 0 {
        twixt := dee
    }
    default {
        twixt := rattle
    }
}

The graph below shows the flow of taint from value all the way to twixt. Red nodes and edges indicate taint, and dotted lines represent indirect influence (in this case, through the switch.)
Implementation source code is available for solidity 0.5 and for solidity 0.6, though not all features are implemented on both branches. Test cases can be found here, though not all of them pass. This prototype is built as a Yul optimization pass, and so requires --optimize-yul --ir as additional compiler flags.
Since this is a proof of concept, the output is messy and barely readable, the implementation is inefficient, and is 100% capable of summoning nasal demons. Obviously, don't use this software in any kind of production environment.
98 |The analysis can be split into three conceptual phases: data gathering, function resolution, and taint checking.
In the data gathering phase, the analyzer visits each node in the Yul abstract syntax tree (AST). This phase accomplishes several goals:
- ... sload and sstore;
- ... (add, iszero);
The following is a simplified example of the information collected for the given input:
{
    let zero := 0
    let ret_0 := fun_narf_19()
    sstore(ret_0, zero)

    function fun_narf_19() -> vloc__4
    {
        let addr := 97
        let foo_0 := 54
        let foo_1 := sload(addr)

        vloc__4 := add(foo_0, foo_1)
    }
}

The collected output:

Known Variables:
    addr [constant=97] [Untaintable]
    ret_0 [Untaintable]
    zero [constant=0]
    vloc__4
    foo_0 [constant=54]
    foo_1 [Tainted]
    !!block_0 [constant]

Functions:
    !!main() ->
        Data Flow:
            !!block_0 -> zero, ret_0,
        Unresolved Function Calls:
            - fun_narf_19

    fun_narf_19() -> vloc__4,
        Data Flow:
            addr -> foo_1,
            foo_1 -> vloc__4,
            foo_0 -> vloc__4,
            !!block_0 -> addr, foo_1, foo_0, vloc__4,

The variable !!block_0 is synthesized to represent indirect influence, such as if statements and loops. This example does not contain any indirect influence, though the block variables are still created. The !!main function represents statements not contained in any other function, like the sstore in the example code.
Data flow, in the example output, shows how data flows from upstream variables (to the left of ->) to downstream variables (on the right.)
Note that fun_narf_19 is listed as an unresolved function call.
Memory accesses are tracked by synthesizing variables (ex. !!m0) for every mstore, similar to how compilers convert to static single assignment form. This process relies on very basic constant folding. Should an mstore or an mload access an offset which is not computable at compile time or has not been written to yet, a catch-all variable !!memory is used instead.
The function resolution phase iteratively embeds callee function scopes into the caller's scope, uniquely renaming variables in the data flow graph. Embedding functions in this way allows accurate tracing between arguments and return variables.
157 |Continuing with the above example, the data flow graph after this phase looks like:
Functions:
    !!main() ->
        Data Flow:
            !!block_0 -> zero, ret_0, addr_embed0, foo_1_embed1, foo_0_embed3, vloc__4_embed2
            addr_embed0 -> foo_1_embed1,
            foo_1_embed1 -> vloc__4_embed2,
            foo_0_embed3 -> vloc__4_embed2,
            vloc__4_embed2 -> ret_0

Last, and probably simplest, is taint checking. This phase walks through the data flow graph, tainting every variable that is reachable from an initially tainted variable.
Once the taint is propagated, the "protected" variables (variables used as the key argument to an sload or sstore) are checked for taint. If tainted protected variables are found, a taint violation exception is thrown.
In this example, ret_0 is both protected and tainted.
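The taint checking phase described above can be sketched in a few lines of Python. This is not the prototype's code (which lives inside the solidity compiler); it is a minimal illustration that reuses the resolved data flow graph from the example, with `propagate_taint` and the hand-written `sources` and `protected` lists as assumptions for the sketch.

```python
from collections import deque


def propagate_taint(flow, sources):
    """Return every variable reachable from an initially tainted variable."""
    tainted = set(sources)
    queue = deque(sources)
    while queue:
        upstream = queue.popleft()
        for downstream in flow.get(upstream, ()):
            if downstream not in tainted:
                tainted.add(downstream)
                queue.append(downstream)
    return tainted


# Data flow graph after function resolution (upstream -> downstream).
flow = {
    "!!block_0": ["zero", "ret_0", "addr_embed0", "foo_1_embed1",
                  "foo_0_embed3", "vloc__4_embed2"],
    "addr_embed0": ["foo_1_embed1"],
    "foo_1_embed1": ["vloc__4_embed2"],
    "foo_0_embed3": ["vloc__4_embed2"],
    "vloc__4_embed2": ["ret_0"],
}

sources = ["foo_1_embed1"]            # result of an sload
protected = ["addr_embed0", "ret_0"]  # keys passed to sload / sstore

tainted = propagate_taint(flow, sources)
violations = [name for name in protected if name in tainted]
print(violations)
```

Running the sketch reports ret_0 as the only violation, matching the example: taint flows from foo_1 through vloc__4 into the key of the sstore.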
A call graph cycle happens when a function foo calls a function bar and bar also calls foo. For example:

function foo(arg0) -> ret0 {
    switch arg0
    case 0 {
        ret0 := bar()
    }
    default {
        ret0 := 1
    }
}

function bar() -> ret1 {
    ret1 := foo(1)
}

Cycles in the call graph currently cause the prototype to loop infinitely. It should be possible to break these cycles by assuming all parameters of one function influence all of that function's return variables.
Constant propagation is the process of substituting the values of known constants in expressions at compile time.
This prototype implements a very limited form of constant propagation to support the mstore and mload instructions. If the offset argument to mstore or mload can be computed at compile time, the taint analysis through memory is more accurate (fewer misleading taint violations.)
The builtin functions to call other contracts are disabled in the prototype. Enabling them requires further thought on the ABI between contracts.
Currently the prototype assumes that call data opcodes (CALLDATALOAD, CALLDATASIZE, CALLDATACOPY, etc) return untainted data. If a contract calls another contract, that assumption is invalidated.
The approach taken in this prototype to handle control flow and branching (synthesizing !!block variables) is insufficient to accurately trace taint through loops.
Consider the following:
function fizzbuzz() {
    for { } 1 { }
    {
        let zero := 0
        let foo := sload(zero)
        let cond := eq(foo, zero)
        if cond {
            break
        }
    }
}

Which roughly translates to the following data flow graph:
zero is not influenced by !!block_0 between where it is assigned and where it is used for sload. In other words, zero is not dependent on storage[0] at the time sload is called, even though the data flow graph thinks it is.
Variables in the prototype are tracked as an indivisible unit, which may be tainted or clean. Tracking each bit of a variable separately would enable more accurate analysis.
This improvement is particularly relevant when using bitwise operations (like the or and and built-ins) and the mload instruction with non-256 bit types.
For example, the following snippet does not exhibit any DSA, though the prototype will report a taint violation:
function fizzbuzz(value)
{
    // Place a constant byte into memory. mstore8 writes exactly
    // one byte, so only memory[0] is set to 0xAB.
    mstore8(0, 0xAB)

    // Read a value from storage, place it into memory. This
    // taints memory offsets 1 through 32 inclusive.
    let from_storage := sload(0)
    mstore(1, from_storage)

    // Get the constant back from memory. memory[0] ends up as the
    // most significant byte of the loaded word, so mask everything else.
    let mem_tainted := mload(0)
    let mem_cleaned := and(mem_tainted, shl(248, 0xFF))

    sstore(mem_cleaned, 0xC0FFEE)
}

After execution, the first 33 bytes of memory hold the constant 0xAB at offset 0, followed by 32 tainted bytes at offsets 1 through 32.
Since mload(0) populates mem_tainted with the first 32 bytes of memory, mem_tainted contains 31 tainted bytes (all bytes after memory[0]). mem_cleaned, on the other hand, contains no tainted bytes, since only the first byte can influence its value; the other bits are masked out by the and.
The solidity compiler's Yul implementation doesn't yet support libraries. This makes analyzing existing contracts tedious at best.
241 |Although the limitations mentioned above present challenges for analyzing existing contracts for DSA, we believe that existing compilers can be extended, with reasonable effort, to detect and prevent DSA while maintaining features. Furthermore, given the ease of inadvertently introducing DSA, we believe that adding this feature to smart contract compilers is necessary to write secure code.
243 |One contract, part of @PhABC's uniswap-solidity, did successfully compile with minimal modification.
244 |A big thanks to Quilt for supporting this research and providing invaluable review and feedback.
I'm sure everyone has had an awesome picture lying around on their hard drive for what seems like forever, with a gibberish file name and no clues as to the source. Most people would simply navigate to a site like TinEye and use their awesome service to identify the image and find a source. Not you, you're interested in building some kind of software hacked together with duct tape and string, or else why would you be reading this?
52 |I'm going to be flying by the seat of my pants for this series, and I'm giving very little forethought to future posts. Things probably will be changing dramatically as I progress through this project, so I'm going to apologise in advance if everything isn't as contiguous as it should be.
53 |To build this monstrosity, I'm going to use a couple of interesting projects: pHash and py-pHash, django and django-orm, and PostgreSQL with the smlar extension. I won't be going over django or PostgreSQL, so if they are foreign to you, take some time to learn about them.
55 |To understand pHash, you'll need a basic understanding of hashes. If you have no idea what they are, go and check out the hash function page on wikipedia. The Coles Notes version is that a hash function transforms an arbitrarily-sized input into a smaller fixed-size output.
57 |There is a staggering variety of hashes, and an even more astounding number of applications for them. The type of hash we're using, a perceptual hash, maps visually (or acoustically) similar input to similar output hashes. Take a look at the following example:
58 |
59 | Tree Picture by Calibas / Cervus Elaphus Picture by Luc Viatour
On the left we have our images, and on the right, the pHash for the image. Each red digit indicates a nibble that differs from the pHash of the original colour picture. Notice how the colour and greyscale images have very similar hashes, while the third picture has a dramatically different hash.
61 |By now, you can probably see how a perceptual hash would be useful: to find images similar to A, we compute the hash for A and search for all images within a threshold distance.
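To make "within a threshold distance" a little more concrete, here is a small Python sketch that compares 64-bit hashes by Hamming distance (the number of differing bits). The hash values, file names, and the threshold of 10 are made up for illustration, and `find_similar` is a hypothetical helper, not part of pHash or py-pHash.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def find_similar(query_hash, known_hashes, threshold=10):
    """Return the names of all images whose hash is close to query_hash."""
    return [
        name
        for name, image_hash in known_hashes.items()
        if hamming_distance(query_hash, image_hash) <= threshold
    ]


# Made-up 64-bit hashes: the greyscale copy differs from the colour
# picture by a single bit, the unrelated picture by all 64 bits.
colour_hash = 0xF0F0F0F0F0F0F0F0
known_hashes = {
    "tree-greyscale.png": 0xF0F0F0F0F0F0F0F1,
    "deer.png": 0x0F0F0F0F0F0F0F0F,
}

print(find_similar(colour_hash, known_hashes))
```

A linear scan like this is fine for a handful of pictures; the point of using smlar later on is to avoid comparing against every stored hash.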
62 |py-pHash is a Python wrapper for the pHash library, which will make integrating it with django much easier later on.
63 |Django-orm is a collection of 3rd-party extensions to django's already pretty awesome database system. It adds support for a metric tonne of new database features, like negated F expressions and full text search, but the most interesting feature it adds to django is the PostgreSQL specific ArrayField.
65 |The smlar extension uses arrays extensively, so having support for them in django is essential.
66 |Smlar is an extension for PostgreSQL built by Oleg Bartunov and Teodor Sigaev. It allows you to make effective similarity searches in PostgreSQL databases on pretty much any kind of data, as long as you can put it in an array. Alexey Vasiliev goes into a lot more depth on similarity searches on his blog, including some example code, but basically the relevant information is that smlar adds efficient similarity searches and indexes.
68 |We'll be using smlar to find similar hashes, which will let us find similar images very quickly.
69 |I'm running Ubuntu 12.04, so my instructions are going to be very biased towards (read: only for) Linux. Convert the instructions to use the package management system of your choice! I'm using the standard Python that ships with Ubuntu, which at the time of writing is 2.7.3.
# Leave a comment if I missed something,
# which is more than probable...

sudo apt-get install postgresql-9.1 \
    postgresql-server-dev-9.1 \
    git build-essential

# Clone smlar and build/install it
git clone git://sigaev.ru/smlar
cd smlar
make && sudo make install

# Setup django user and database
sudo -u postgres psql -c 'CREATE DATABASE lostpic'
sudo -u postgres psql -c "CREATE USER lostpic WITH PASSWORD 'password1'"
sudo -u postgres psql -c 'GRANT ALL PRIVILEGES ON DATABASE lostpic TO lostpic'

# Enable the extension (extensions are per-database, so enable it in lostpic)
sudo -u postgres psql -d lostpic -c 'CREATE EXTENSION smlar'

sudo apt-get install libphash0 libphash0-dev

95 | I'll be using virtualenv to manage a separate Python environment to make deployment easier. This step is optional, but recommended.
# Install virtualenv
sudo apt-get install python-virtualenv

cd /path/to/project

# Make the virtual environment
virtualenv pyenv

# Activate it (it's 'deactivate' to exit)
. ./pyenv/bin/activate

pip install psycopg2 django django-orm
pip install git+https://github.com/polachok/py-phash.git

django-admin.py startproject lostpic

cd lostpic/lostpic

Edit settings.py with your favourite editor and make the following changes:
- Set the database engine to postgresql_psycopg2
- Set the database name and user to lostpic, the password is password1
- Enable the admin app in INSTALLED_APPS
Don't forget to edit urls.py to enable the admin.
In the next instalment, we'll be building the models to store picture information and writing the functions to look up pictures by similarity.
125 | 126 | 127 |In my two preceding posts, I give short summaries of the BABYCRYPTO and BROKER challenges from Paradigm's recent Capture the Flag competition. If you'd like more details on the competition itself, or the format of the challenges, go and give my BABYCRYPTO post a read first.
58 |Today I'll be going over the BABYREV challenge which was, by far, the longest challenge we completed during the competition. I'd also like to dedicate this post to @adietrichs' and my collective sanity, which unfortunately didn't survive the 10+ hours we spent on this puzzle.
59 |Disclaimer: unlike the rest of our team, @adietrichs and I don't have a ton of experience auditing or reverse engineering Ethereum contracts; we're protocol researchers. The approach we take here is the most basic, brute-force approach to solving this problem, but hey, it worked!
60 |After a relatively successful exploit on BROKER, I joined @adietrichs on BABYREV. Based on the name and description, it was pretty obvious we'd be dealing with a reverse engineering puzzle.
63 |The contents of the archive:
.
└── babyrev
    └── public
        ├── contracts
        │   └── Setup.sol
        ├── deploy
        │   ├── chal.py
        │   ├── compiled.bin
        │   └── requirements.txt
        └── Dockerfile

What immediately stands out is that the only contract we have is Setup.sol, unlike BROKER which gave us the Broker contract. We do have compiled.bin, which is a JSON file with the output from the compiler:

{
    "contracts":{
        "/private//Challenge.sol:Challenge":{
            "bin":"..."
        },
        "contracts/Setup.sol:ChallengeInterface":{
            "bin":""
        },
        "contracts/Setup.sol:Setup":{
            "bin":"..."
        }
    },
    "version":"0.4.24+commit.e67f0147.mod.Darwin.appleclang"
}

The rest of the archive is pretty similar to BROKER: Setup.sol checks the win condition, and there's a bunch of Docker magic that spins up a fork of mainnet for the challenge.
From Setup.sol:

function isSolved() public view returns (bool) {
    return challenge.solved();
}

Well that's profoundly useful. The win condition is part of the secret Challenge contract. I guess it's time to jump right into the EVM1 assembly...
The bytecode in compiled.bin is encoded in hexadecimal, and isn't particularly readable. Disassembly is the process of decoding the hexadecimal string into something slightly more understandable.
The basic disassembly process looks something like this:
- Read a byte, for example 80.
- Look up the instruction for that byte. For 80 that would be DUP1 (or duplicate the top item of the stack.)
- If it is a PUSH instruction, read the immediate argument (so for PUSH1, read one extra byte; two for PUSH2; and so on.)
Following those steps, an input like 0x6080604052 would become:

PUSH1 0x80
PUSH1 0x40
MSTORE

114 | Thankfully this process is automated. We used ethervm's decompiler, which spits out an annotated disassembly (and a decompiled version too!)
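For the curious, the disassembly steps described above fit in a few lines of Python. This is a toy sketch, not what ethervm does: the `OPCODES` table only knows the two non-PUSH instructions mentioned in this post, and `disassemble` is a name made up for the example.

```python
# Tiny, deliberately incomplete opcode table (an assumption for this sketch).
OPCODES = {0x52: "MSTORE", 0x80: "DUP1"}


def disassemble(code_hex):
    """Decode a hex string of EVM bytecode into a list of mnemonics."""
    if code_hex.startswith("0x"):
        code_hex = code_hex[2:]
    code = bytes.fromhex(code_hex)

    lines = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        pc += 1
        if 0x60 <= opcode <= 0x7F:
            # PUSH1 (0x60) through PUSH32 (0x7f) carry 1 to 32 immediate bytes.
            size = opcode - 0x5F
            immediate = code[pc:pc + size]
            pc += size
            lines.append(f"PUSH{size} 0x{immediate.hex()}")
        else:
            lines.append(OPCODES.get(opcode, f"UNKNOWN_0x{opcode:02x}"))
    return lines


for line in disassemble("0x6080604052"):
    print(line)
```

Note how the 80 in this input is consumed as the immediate argument of PUSH1 rather than being decoded as DUP1, which is exactly why the PUSH special case matters.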
115 |Solidity is a wondrous beast that takes care of the complex process of deploying a contract. Unfortunately we aren't looking at Solidity, so here be dragons.
The first bytes (roughly up to the second 6080) of a compiled contract are actually the constructor code, or the bits of code Solidity generates to get your contract constructed and deployed on-chain. Constructors, also known as initcode, take care of assigning initial storage values, populating immutable variables, and finally copying the code to be deployed into memory.
Since we aren't super interested in the constructor for this challenge, we snip it off before passing the bytecode to the disassembler.
Challenge Preamble

The first thing Solidity contracts do is ensure that the calldata is at least four bytes long:

entrypoint:         // Okay, I lied.
    PUSH1 0x80      // Setting up the free memory pointer is the very first
    PUSH1 0x40      // thing. The pointer lives at address 0x40 and is
    MSTORE          // initially set to 0x80. Solidity uses the first few
                    // 256-bit words of memory as scratch space.

    PUSH1 0x04      // Then we do check the calldata length.
    CALLDATASIZE
    LT
    PUSH2 0x0062
    JUMPI           // If the calldata is too short, revert (eventually.)
                    // Else, fall through into the function selector
                    // blocks.

The calldata must be at least four bytes long, at least in this contract, because the first four bytes are used as an identifier for the function to call. The selectors are calculated from the keccak256 hash of the function signature.
This is the first function selector block from Challenge:

selector_0adf939b:
    PUSH1 0x00
    CALLDATALOAD    // Push the 0-th word of calldata.
    PUSH29 0x0100000000000000000000000000000000000000000000000000000000
    SWAP1
    DIV
    PUSH4 0xffffffff
    AND             // DIV+AND emulates (calldata[0] >> 224)
    DUP1
    PUSH4 0x0adf939b    // Push the function selector.
    EQ
    PUSH2 0x0067
    JUMPI           // Jump to 0x67 if the selector matches.

This structure is repeated for the four public functions in Challenge:
Selector | Signature2 |
---|---|
0x0adf939b | ?? |
0x39ac0e49 | ?? |
0x799320bb | solved() |
0xb8b8d35a | solve(uint256) |
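The DIV+AND trick in the selector block above is easier to see outside of assembly. Here is a minimal Python sketch, assuming calldata is given as raw bytes; `selector_from_calldata` is a made-up helper name. The PUSH29 constant is 2^224, so dividing the first 32-byte word by it is the same as shifting right by 224 bits.

```python
def selector_from_calldata(calldata):
    """Mimic CALLDATALOAD(0), DIV by 2**224, AND with 0xffffffff."""
    # CALLDATALOAD reads 32 bytes and pads with zeros past the end of calldata.
    word = int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big")
    return (word // (1 << 224)) & 0xFFFFFFFF


# A call to solved(): four selector bytes and no arguments.
calldata = bytes.fromhex("799320bb")
print(hex(selector_from_calldata(calldata)))

# Division by 2**224 and a right shift by 224 bits agree.
word = int.from_bytes(calldata.ljust(32, b"\x00"), "big")
print(word // (1 << 224) == word >> 224)
```

The extra AND with 0xffffffff is redundant for a 256-bit word (the shifted value already fits in 32 bits), but it is what the compiler emitted.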
We only succeeded in guessing one (solve(uint256)) of the three unknown selectors. Thankfully, we didn't need to call the others directly from Solidity. We can now expand ChallengeInterface with an additional function:

interface ChallengeInterface {
    function solved() public view returns (bool);
    function solve(uint256) public;
}

At last our goal becomes clear: figure out some input for solve that makes solved return true.
If you're somewhat familiar with reverse engineering, you might be wondering if we tried any tools while cracking this challenge. Well, we did, but unfortunately none of them were able to solve this puzzle. 3
Specifically we tried:
- manticore-verifier - timed out
- gen_exploit.py from teether, modified - out of memory
- mythril - no output specific to the challenge
Our failure with these tools is not a sign that they aren't useful, but rather that @adietrichs and I have very little experience with them. I'm sure that in the right hands, they're formidable allies.
174 |If you look carefully at the disassembly, you'll note two large constants in the code:
- 0x311dfa5451963f33b16e63f0c62278c9b907e43d1961cdf9f590a0c3b351c04019cccb831403
- 0x504354467b763332795f3533637532335f336e633279703731306e5f34313930323137686d7d
If you look even carefully-er, you might even notice that the second constant is valid ASCII, and that it decodes to PCTF{v32y_53cu23_3nc2yp710n_4190217hm}. That constant, alone, is not enough to capture the flag. 4
Without tool support, and with little else to do, we decided to manually walk through the entire contract, annotating the stack along the way. For those in the know, this is basically primitive symbolic execution. Our eventual goal was to recover enough of the solve function to see exactly what path we'd need to hit. We settled on a very simple annotation format: opcode [stack-after-opcode]. The top (most recently pushed) of the stack would always be on the left.
A small snippet of annotated assembly would look like this:

PUSH1 0x10    [0x10]
PUSH1 0x59    [0x59, 0x10]
ADD           [0x69]
MLOAD         [M0]            // M0 := mload(0x69)

And so we began. We annotated, and we annotated some more. We annotated so much, we annotated galore. Seriously, this took us hours. You can see the final annotated assembly here.
189 |Although annotating the assembly was slow going, we did quickly identify some sections of code that were interesting, and from there, reversed the important bits.
The most obviously interesting place, and the first place I started tracing, was the comparison subroutine that then stores a true into storage. The only SSTORE in the whole contract is at offset 0x1157. If we wanted to solve the challenge, this is where we'd have to end up.
In the interest of preserving your sanity, dear reader, I won't inline the entire disassembly of the comparison subroutine. Instead, here is some pseudo-code:

function solve(uint256 C) private {
    bytes memory expected = /* ... */;
    bytes memory actual = do_some_magic(C);

    if (keccak256(actual) == keccak256(expected)) {
        storage[0x00] = (storage[0x00] & ~0xFF) | 0x01;
    }
}

Essentially, the final parts of the solve function compared the output of do_some_magic(C) (where C is under our control) to some expected value. If they matched, set storage[0x00], marking the puzzle as solved.
The comparison function may have been the most obviously interesting, but the most perplexing section was roughly between offsets 0x0396 and 0x0D93. This blob of instructions was surprisingly regular. It consisted of 256 repetitions of this:

PUSH1 0x63    // Changes every repetition
PUSH1 0xff
AND           [0x63, M_A0, M_A0, 0x00, 0x60, M_Z0, C, 0x01e1, FSEL]
DUP2          [M_A0, 0x63, M_A0, M_A0, 0x00, 0x60, M_Z0, C, 0x01e1, FSEL]
MSTORE        [M_A0, M_A0, 0x00, 0x60, M_Z0, C, 0x01e1, FSEL] // memory@M_A0 := 0x00..0063
PUSH1 0x20    [0x20, M_A0, M_A0, 0x00, 0x60, M_Z0, C, 0x01e1, FSEL]
ADD           [M_A1, M_A0, 0x00, 0x60, M_Z0, C, 0x01e1, FSEL]

- FSEL was the function selector.
- 0x01e1 was the eventual return address.
- C was the uint256 argument supplied to solve.
- M_A0 and M_Z0 were pointers to the first sections of memory regions we called A and Z. M_A1 was the next section. I know, so creative.
It took us a fairly long while to reason out what this section of code does, but eventually it became clear that this section builds an array of 256 values (which we called The Table™) in memory. Roughly, the following Solidity translates to something like The Table™:
uint8[256] memory values;
values[0] = 0x63;
values[1] = 0x7c;
values[2] = 0x77;
.
.
.
values[253] = 0x54;
values[254] = 0xbb;
values[255] = 0x16;

232 | An array of 256 values is pretty suspicious. It could be a map for a substitution cipher. It could be a cleverly disguised constant that needs to be XOR'd with the input value. It was neither, but we'll come back to it later!
The do_some_magic subroutine above covers a lot of assembly, and was pretty much a giant black box. We slowly uncovered how it worked, piece by piece. The first major milestone was discovered Sunday morning, at around 03:45:
I was referring to the subroutine around 0x1169. In pseudo-code:

weird = C

/* ... */

for (j = 0; j < 32 * 8; j += 8) {
    offset = (weird >> j) & 0xff
    elem = a_arr[offset]
    new_weird |= elem << j // add elem byte to the left
}

- C is, again, the uint256 from calldata.
- weird 5 is, uh, a weird number. It's used later.
- new_weird is transformed from weird.
And in more English-like (but equally indecipherable) terms, this inner loop took weird and:
- looked up the replacement for the j / 8th byte in weird from The Table™,
- shifted that replacement back to the byte position it came from in weird, and
- or'd it into new_weird.
What kind of algorithm does something like this? A hash function.
259 |What kind of algorithm breaks automated tools? A hash function.
260 |What did this code end up being? A hash function.
261 |From this point onward it was basically just decompiling more blocks from the disassembly, and translating them into pseudo-code. Don't get me wrong, there was a ton of work, and several "Aha!" moments, but there's really not much more to write about. By early evening (for me, night for @adietrichs) on Sunday, we had reversed the whole solve(uint256)
function, including do_some_magic
:
// The Table™
a_arr = hex"637c777bf26b6fc53001672bfed7ab76ca82c97dfa5947f0add4a2af9ca472c0b7fd9326363ff7cc34a5e5f171d8311504c723c31896059a071280e2eb27b27509832c1a1b6e5aa0523bd6b329e32f8453d100ed20fcb15b6acbbe394a4c58cfd0efaafb434d338545f9027f503c9fa851a3408f929d38f5bcb6da2110fff3d2cd0c13ec5f974417c4a77e3d645d197360814fdc222a908846eeb814de5e0bdbe0323a0a4906245cc2d3ac629195e479e7c8376d8dd54ea96c56f4ea657aae08ba78252e1ca6b4c6e8dd741f4bbd8b8a703eb5664803f60e613557b986c11d9ee1f8981169d98e949b1e87e9ce5528df8ca1890dbfe6426841992d0fb054bb16"

// Target String - PCTF{v32y_53cu23_3nc2yp710n_4190217hm}
t_str = hex"504354467b763332795f3533637532335f336e633279703731306e5f34313930323137686d7d"

// Weird String
b_str = hex"311dfa5451963f33b16e63f0c62278c9b907e43d1961cdf9f590a0c3b351c04019cccb831403"

weird = C

for (i = 0; i < len(b_str); i++) {
    // XOR the current byte with the least significant byte of weird
    weird_byte = weird & 0xff
    b_str[i] ^= weird_byte

    // Substitute every byte of weird through The Table™
    new_weird = 0x00
    for (j = 0; j < 32 * 8; j += 8) {
        offset = (weird >> j) & 0xff
        elem = a_arr[offset]
        new_weird |= elem << j // add elem byte to the left
    }
    // Rotate new_weird right by one byte
    weird = (new_weird & 0xff) << 31 * 8 | new_weird >> 8
}

target_hash = sha3(t_str)
our_hash = sha3(b_str)

if (target_hash == our_hash) {
    // Win the thing!
    storage[0x00] = (storage[0x00] & ~0xFF) | 0x01;
}

296 | The key takeaway here is that each byte of b_str
gets overwritten with itself xor'd with the least significant byte of weird
.
C: A Rope of Sand
Now that we know how C
is used, we need to find a concrete value for C
that correctly decodes b_str
. Python to the rescue!
# This is where I'll put the Python tool when I get it from @adietrichs
300 |
301 | Because of the way this hash function is constructed, it is possible to isolate each byte of the input value, one at a time. If this were a real hash function (like SHA-3), there'd be mixing between the bytes, making this process practically impossible.
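In the meantime, here is a rough sketch of how that byte-by-byte isolation could look. This is not @adietrichs' tool, just a reconstruction that assumes the pseudo-code above is faithful to the contract. The names `recover_c`, `decode`, and `INVERSE` are made up for illustration. The idea: on iteration `i`, the low byte of `weird` is byte `i` of `C` pushed through The Table™ `i` times, so undoing the lookup `i` times on `b_str[i] ^ t_str[i]` gives back that byte of `C`. Only the first 32 bytes are needed to pin down `C`; the remaining six bytes of `b_str` act as a consistency check, which the script prints rather than assumes.

```python
# Sketch only: inverts the pseudo-code above, assuming it matches the contract.
TABLE = bytes.fromhex(
    "637c777bf26b6fc53001672bfed7ab76" "ca82c97dfa5947f0add4a2af9ca472c0"
    "b7fd9326363ff7cc34a5e5f171d83115" "04c723c31896059a071280e2eb27b275"
    "09832c1a1b6e5aa0523bd6b329e32f84" "53d100ed20fcb15b6acbbe394a4c58cf"
    "d0efaafb434d338545f9027f503c9fa8" "51a3408f929d38f5bcb6da2110fff3d2"
    "cd0c13ec5f974417c4a77e3d645d1973" "60814fdc222a908846eeb814de5e0bdb"
    "e0323a0a4906245cc2d3ac629195e479" "e7c8376d8dd54ea96c56f4ea657aae08"
    "ba78252e1ca6b4c6e8dd741f4bbd8b8a" "703eb5664803f60e613557b986c11d9e"
    "e1f8981169d98e949b1e87e9ce5528df" "8ca1890dbfe6426841992d0fb054bb16"
)
TARGET = bytes.fromhex(
    "504354467b763332795f3533637532335f336e633279703731306e5f34313930323137686d7d"
)
B_STR = bytes.fromhex(
    "311dfa5451963f33b16e63f0c62278c9b907e43d1961cdf9f590a0c3b351c04019cccb831403"
)

# The Table is a permutation of 0..255, so every lookup can be undone.
INVERSE = bytes(TABLE.index(v) for v in range(256))


def recover_c() -> int:
    """Recover C one byte at a time from the first 32 bytes of b_str."""
    c = 0
    for i in range(32):
        # Key byte that must be XOR'd into b_str[i] to produce t_str[i].
        byte = B_STR[i] ^ TARGET[i]
        # That key byte is byte i of C after i table lookups; undo them.
        for _ in range(i):
            byte = INVERSE[byte]
        c |= byte << (8 * i)
    return c


def decode(c: int) -> bytes:
    """Run the loop from the pseudo-code forward and return the decoded b_str."""
    weird = c
    out = bytearray(B_STR)
    for i in range(len(out)):
        out[i] ^= weird & 0xFF
        new_weird = 0
        for j in range(0, 32 * 8, 8):
            new_weird |= TABLE[(weird >> j) & 0xFF] << j
        # Rotate right by one byte.
        weird = ((new_weird & 0xFF) << (31 * 8)) | (new_weird >> 8)
    return bytes(out)


if __name__ == "__main__":
    c = recover_c()
    print(hex(c))
    print("all 38 bytes match:", decode(c) == TARGET)
```

If the last line prints `False`, the pseudo-code (or one of the constants) differs from what the contract really does, and the candidate `C` should not be trusted.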
302 |Now that we know C
, we can craft the attack transaction, and solve the challenge.
This concludes my mini-series on Paradigm's CTF competition. Thanks for sticking it out, and I hope you enjoyed it.
305 |If you'd like to see some of our team's other solutions, check out:
306 |A huge thank you to Paradigm for putting this whole thing together. It was a blast!
313 |Ethereum Virtual Machine, the abstract computer simulated by Ethereum, that Solidity targets.
316 |It is infeasible to recover the function signature given the selector, but 4byte.directory provides a public database mapping selectors to known signatures.
319 |Likely intentional on the puzzlesmith's part!
322 |No matter how much you want it to be.
325 |I called this ACC
(for accumulator), but @adietrichs' weird
won out in the end. What do you want? It's a weird number.
Paradigm, a crypto-focused investment firm, hosted a capture-the-flag style competition over the past weekend with over $10,000 in prizes split among the top three teams. Our team, dilicious1 2, competed and took first place, solving 15 out of a possible 17 challenges.
58 |If you aren't familiar with the term, a capture-the-flag competition (at least in the software industry) is a cyber-security challenge where participants exploit vulnerabilities to achieve goals (analogous to capturing the flag in meatspace.)
59 |This particular competition was focused on the Ethereum blockchain and related technologies.
60 |Our team, in alphabetical order by the first letters of our Twitter usernames:
62 |Go follow them or something, I don't know.
76 |The competition was split into 17 challenges, covering topics from reverse engineering, to cryptography, to market manipulation. Considering I'm an expert in exactly zero of those fields, I'm quite surprised I have anything to write up!
78 |The first challenge I opened was BABYCRYPTO. Considering the name, it seemed like the second easiest challenge (after HELLO, which @adietrichs had already taken) and I figured if I was going to be able to contribute anything, it would be here.
80 |Clicking the challenge, I was greeted by a pleasantly green-and-black page:
81 | Top points to Paradigm for the hacker-esque vibe.
Based on my minutes of experience in other software puzzle games, I suspected that this might be a XOR cypher or a Caesar square, so I quickly downloaded all three of the challenge zips, and spent the next half-an-hour or so setting up Docker and building images. The Dockerfiles in the archives turned out to be the biggest red herring of the entire weekend, tricking several of us into a bunch of work that was ultimately pointless. 4
83 |The contents of the three BABYCRYPTO archives:
.
├── babycrypto
│   └── public
│       ├── deploy
│       │   ├── chal.py
│       │   └── requirements.txt
│       └── Dockerfile
├── challenge_base
│   ├── 00-create-xinetd-service
│   ├── 99-start-xinetd
│   ├── Dockerfile
│   ├── entrypoint.sh
│   └── handler.sh
└── eth_challenge_base
    ├── 98-start-gunicorn
    ├── Dockerfile
    └── eth_sandbox
        ├── auth.py
        ├── hashcash.py
        ├── __init__.py
        ├── launcher.py
        └── server.py

The only important file I chose to focus on was chal.py
, which ended up being a good place to start for most of the future challenges too.
Here is chal.py
in its entirety:
from random import SystemRandom
from ecdsa import ecdsa
import sha3
import binascii
from typing import Tuple
import uuid
import os


def gen_keypair() -> Tuple[ecdsa.Private_key, ecdsa.Public_key]:
    """
    generate a new ecdsa keypair
    """
    g = ecdsa.generator_secp256k1
    d = SystemRandom().randrange(1, g.order())
    pub = ecdsa.Public_key(g, g * d)
    priv = ecdsa.Private_key(pub, d)
    return priv, pub


def gen_session_secret() -> int:
    """
    generate a random 32 byte session secret
    """
    with open("/dev/urandom", "rb") as rnd:
        seed1 = int(binascii.hexlify(rnd.read(32)), 16)
        seed2 = int(binascii.hexlify(rnd.read(32)), 16)
    return seed1 ^ seed2


def hash_message(msg: str) -> int:
    """
    hash the message using keccak256, truncate if necessary
    """
    k = sha3.keccak_256()
    k.update(msg.encode("utf8"))
    d = k.digest()
    n = int(binascii.hexlify(d), 16)
    olen = ecdsa.generator_secp256k1.order().bit_length() or 1
    dlen = len(d)
    n >>= max(0, dlen - olen)
    return n


if __name__ == "__main__":
    flag = os.getenv("FLAG", "PCTF{placeholder}")

    priv, pub = gen_keypair()
    session_secret = gen_session_secret()

    for _ in range(4):
        message = input("message? ")
        hashed = hash_message(message)
        sig = priv.sign(hashed, session_secret)
        print(f"r=0x{sig.r:032x}")
        print(f"s=0x{sig.s:032x}")

    test = hash_message(uuid.uuid4().hex)
    print(f"test=0x{test:032x}")

    r = int(input("r? "), 16)
    s = int(input("s? "), 16)

    if not pub.verifies(test, ecdsa.Signature(r, s)):
        print("better luck next time")
        exit(1)

    print(flag)

178 | With the requirements.txt
file, I was able to get the program to run locally:
180 | message? 181 |182 |
This is Python3, so that rules out any shenanigans with input
vs. raw_input
. I didn't notice any other particularly interesting points in the code itself, so that means it must actually be a cryptography challenge!
Oh god. It's ECDSA. I knew I should've paid more attention in my undergraduate cryptography class. I was about to ask @adietrichs if I could take a look at HELLO instead, when I noticed this interesting pattern:
message? wanna switch?
r=0xe430b3a398f2320556eef81c1c523ea5ae0a920f493c8376eafcb0dc9cd75b89
s=0x4a19d0b156a9d10b0a86b729316909cdc8634ec23aa38f5f891e2599fc316a81
message? no
r=0xe430b3a398f2320556eef81c1c523ea5ae0a920f493c8376eafcb0dc9cd75b89
s=0x43d8ec82d7b6b1f3c8ed41b0ba682d5291f6d4a922a57729fa716dccd0f237d6
message? </3
r=0xe430b3a398f2320556eef81c1c523ea5ae0a920f493c8376eafcb0dc9cd75b89
s=0xbbfb7b34c31f025bddb8724a53dae7dd5a17c489587b881b8ed0dc5453c07e87

The r
values for the three different messages were the same! As established earlier, I'm no cryptography expert, and I have no idea what r
actually means, but I do know that you should get different outputs when signing different inputs, so off to search the internet!
Aha! We have our vulnerability. Repeating something about something lets you extract the private key.
204 |A little more searching around turned up a plethora of git repositories claiming to be able to extract a private key based on this vulnerability. A few moments later, and I had this script:
from ecdsa import ecdsa
from ecdsa.numbertheory import inverse_mod

g = ecdsa.generator_secp256k1
publicKeyOrderInteger = g.order()

# Shared r and the two s values, copied from the signatures above (hex)
r = "e430b3a398f2320556eef81c1c523ea5ae0a920f493c8376eafcb0dc9cd75b89"
sA = "4a19d0b156a9d10b0a86b729316909cdc8634ec23aa38f5f891e2599fc316a81"
sB = "43d8ec82d7b6b1f3c8ed41b0ba682d5291f6d4a922a57729fa716dccd0f237d6"

# Outputs of hash_message for the two signed messages (decimal)
hashA = "24281772548044994405505787307091019721595367071300198475142035580469771474091"
hashB = "56710668495515998944273818574660611208941006033402527734960197520384934694586"

# Convert hex into int
r1 = int(r, 16)
s1 = int(sA, 16)
s2 = int(sB, 16)

# Convert decimal strings into int
L1 = int(hashA, 10)
L2 = int(hashB, 10)

numerator = (((s2 * L1) % publicKeyOrderInteger) - ((s1 * L2) % publicKeyOrderInteger))
denominator = inverse_mod(r1 * ((s1 - s2) % publicKeyOrderInteger), publicKeyOrderInteger)

privateKey = numerator * denominator % publicKeyOrderInteger

print(privateKey)

234 | The constants:
235 |g is the particular curve that chal.py used.
r is the shared value from the two signatures.
sA and sB are the s values from the two signatures.
hashA and hashB are the outputs of hash_message in the original script, for the given inputs.
The output:
82639917221039576394263609841358608750060353480659277072454055536914923163809
Hack that back into the original program (by modifying gen_keypair
) and you can sign arbitrary messages! 5
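For anyone who wants slightly more than "something about something", the algebra behind the script is short. An ECDSA signature satisfies s = k⁻¹(h + r·d) mod n, where d is the private key, h is the message hash, k is the per-signature nonce, and r is derived from k alone. chal.py passes the same session_secret as k for every signature, which is why r never changed. Two signatures with the same k give two equations in the two unknowns k and d, and eliminating k leaves d = (s2·h1 − s1·h2) / (r·(s1 − s2)) mod n, the same expression the script computes. The sketch below checks that identity with made-up numbers; `sign` and `recover_private_key` are hypothetical helpers, and r is an arbitrary stand-in rather than a real curve coordinate so that the example needs nothing beyond the standard library.

```python
# Order of the secp256k1 group (a prime), so every nonzero value has an inverse.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def sign(h: int, d: int, k: int, r: int) -> int:
    """Return the s half of an ECDSA signature: s = k^-1 * (h + r*d) mod N."""
    return pow(k, -1, N) * (h + r * d) % N


def recover_private_key(r: int, s1: int, s2: int, h1: int, h2: int) -> int:
    """Recover d from two signatures that share the same nonce (same r)."""
    numerator = (s2 * h1 - s1 * h2) % N
    denominator = pow(r * (s1 - s2) % N, -1, N)
    return numerator * denominator % N


# Toy values, invented for this illustration only.
d = 0x1234567890ABCDEF  # "private key"
k = 0x0FEDCBA987654321  # nonce, reused for both signatures
r = 0xDEADBEEFCAFEF00D  # stand-in for the x-coordinate of k*G
h1, h2 = 1111, 2222     # two different message hashes

s1 = sign(h1, d, k, r)
s2 = sign(h2, d, k, r)

recovered = recover_private_key(r, s1, s2, h1, h2)
print(recovered == d)
```

The real attack only differs in where the numbers come from: r and s are read off the challenge's output, and the hashes come from running hash_message on the messages you typed.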
Well, that's it! That's all of BABYCRYPTO. Stay tuned for two more posts on the BROKER and BABYREV challenges.
248 |Although I wasn't part of the subcommittee on team naming, I'm pretty sure dilicious is just diligence plus delicious.
251 |It took me longer than I care to admit to realize that diligence is actually ConsenSys Diligence, and my confusion explains why this was our Discord server's icon, instead of something relevant.
254 |Actually part of ConsenSys Quilt, not Diligence.
257 |I've been informed that the Docker containers would've let us test solutions locally, but I still assert they were a red herring.
260 |Apologies if you were looking for any in-depth analysis of how the ECDSA private key recovery actually works. You can find the repository I used as a starting point here. It has comments!
263 |If you've been around me in the past couple of days, you've probably heard me talking endlessly about robots, and possibly PyPy, and my trials and tribulations with RPython. I've been working pretty hard on building something fun that isn't a website, and I think I've succeeded, at least at entertaining myself.
52 |A little bit of back story would probably be useful here. I first got interested in programming when I read about Java, back in the early 2000s. I saw a magazine in a news stand proclaiming the awesomeness of Sun's baby, and had to buy it. Thankfully as a ten year old, my money was practically non-existent and I was unable to acquire it. I forgot about programming for months, until Christmas rolled around and my parents bought me Borland's C++ with a Sams Teach Yourself C++ in 21 Days.
53 |After miserably failing to wrap my brain around pointers, I began looking at other ways to program, and I eventually discovered AT Robots, Corewars, and other awesome games. These games, especially Robocom, are what inspired tonight's project.
54 |I give you Robots! This is my way of saying thank you to those who got me interested in programming (I hope no one ever finds this post, since it's a fairly terrible thank you, but I digress.)
55 |In Robots! you program a single robot whose goal is to eliminate the other Robot teams on the field. Robots can move, scan, replicate and transfer instructions between each other. The user interface is limited to a grid with coloured circles, or an even worse command line interface, but I think it gets the point across:
56 |The game is played on a 100 by 100 grid where the edges wrap around, torus style.
58 |Robots are programmed in a very simple assembly like language that is interpreted by an RPython program. The game state is transferred to the user interface, which is written in full Python with wxPython and Cairo.
59 |A sample Robot program:
:start
    build $left

:program
    set L1 :end
    sub L1 1

:program-loop
    if $lt L1 :new
    jump (:start) ' Exit the loop when the counter reaches :new

    set L0 L1 ' Copy source location to L0
    sub L0 :new ' Make relative to :start

    xfer $left L1 L0 ' Transfer the new instruction from L1 to L0
    sub L1 1
    jump (:program-loop)

:new
    go $up
    jump (:new)
:end

83 | This simple program builds copies of itself and sends them on their merry way around the board.
84 |Feel free to take a look at the code over on GitHub and play around: