├── README.md
├── instructions
    └── README.md
└── patterns
    └── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | # Smart Contract Dataset
  2 | 
  3 | **This repository releases the smart contract datasets used in our work to facilitate community research. It also provides instructions on how to label each type of vulnerability and describes the pattern designs of the investigated vulnerabilities.**
  4 | 
  5 | 
  6 | ## Resource 1
  7 | 
  8 | - This dataset consists of over 40K real-world Ethereum smart contracts.
  9 | - Download this resource at [Ethereum_smart_contract](https://drive.google.com/file/d/1yFJSCiUuoiSx4uWYNcCESUvsEs5DOGM9/view?usp=sharing). 
 10 | 
 11 | - Please cite one of the papers if you want to use the dataset in your paper.
 12 | ```
 13 | @inproceedings{zhuangsmart,
 14 |   title={Smart Contract Vulnerability Detection using Graph Neural Network},
 15 |   author={Zhuang, Yuan and Liu, Zhenguang and Qian, Peng and Liu, Qi and Wang, Xiang and He, Qinming},
 16 |   booktitle={IJCAI},
 17 |   pages={3283--3290},
 18 |   year={2020}
 19 | }
 20 | 
 21 | @inproceedings{liu2021smart,
 22 |   title={Smart Contract Vulnerability Detection: From Pure Neural Network to Interpretable Graph Feature and Expert Pattern Fusion},
 23 |   author={Liu, Zhenguang and Qian, Peng and Wang, Xiang and Zhu, Lei and He, Qinming and Ji, Shouling},
 24 |   booktitle={IJCAI},
 25 |   pages={2751--2759},
 26 |   year={2021}
 27 | }
 28 | 
 29 | @article{liu2021combining,
 30 |   title={Combining Graph Neural Networks with Expert Knowledge for Smart Contract Vulnerability Detection},
 31 |   author={Liu, Zhenguang and Qian, Peng and Wang, Xiaoyang and Zhuang, Yuan and Qiu, Lin and Wang, Xun},
 32 |   journal={IEEE Transactions on Knowledge and Data Engineering},
 33 |   year={2021},
 34 |   publisher={IEEE}
 35 | }
 36 | ``` 
 37 | 
 38 | ## Resource 2
 39 | - This dataset covers four types of vulnerabilities (reentrancy, timestamp dependency, integer overflow, and dangerous delegatecall) and comes with the preprocessing method.
 40 | - Check [instructions](https://github.com/Messi-Q/Smart-Contract-Dataset/tree/master/instructions) for how to label these vulnerabilities.
 41 | - Download this resource at [Dataset_preprocessing](https://drive.google.com/file/d/1UhHHevE9iDmvSB_k_lhyI58KAj7hnB1o/view?usp=share_link). 
 42 | 
 43 | - Please cite our paper if you want to use the dataset in your paper.
 44 | ```
 45 | @inproceedings{10.1145/3543507.3583367,
 46 | author = {Qian, Peng and Liu, Zhenguang and Yin, Yifang and He, Qinming},
 47 | title = {Cross-Modality Mutual Learning for Enhancing Smart Contract Vulnerability Detection on Bytecode},
 48 | year = {2023},
 49 | isbn = {9781450394161},
 50 | publisher = {Association for Computing Machinery},
 51 | address = {New York, NY, USA},
 52 | booktitle = {Proceedings of the ACM Web Conference 2023},
 53 | pages = {2220–2229},
 54 | numpages = {10},
 55 | location = {Austin, TX, USA},
 56 | series = {WWW '23}
 57 | }
 58 | ```
 59 | 
 60 | 
 61 | ## Resource 3
 62 | - This dataset contains over 12K Ethereum smart contracts (including inherited contracts) and covers eight types of vulnerabilities.
 63 | - Check the [pattern](https://github.com/Messi-Q/Smart-Contract-Dataset/tree/master/patterns) design for more details.
 64 | - Download this resource at [Dataset](https://drive.google.com/file/d/1iU2J-BIstCa3ooVhXu-GljOBzWi9gVrG/view?usp=share_link). 
 65 | 
 66 | - Please cite our paper if you want to use the dataset in your paper.
 67 | ```
 68 | @article{liu2023rethinking,
 69 |   title={Rethinking Smart Contract Fuzzing: Fuzzing With Invocation Ordering and Important Branch Revisiting},
 70 |   author={Liu, Zhenguang and Qian, Peng and Yang, Jiaxu and Liu, Lingfeng and Xu, Xiaojun and He, Qinming and Zhang, Xiaosong},
 71 |   journal={arXiv preprint arXiv:2301.03943},
 72 |   year={2023}
 73 | }
 74 | ```
 75 | 
 76 | 
 77 | ## Resource 4
 78 | - Here, we present three datasets to evaluate the performance of smart contract analyzers.
 79 | 
 80 | - The first dataset D1 (released by [1]) is used to measure the branch coverage of fuzzers. 
 81 | The second dataset D2 (released by [2, 3, 4]) aims to evaluate the performance of vulnerability detection tools, 
 82 | while the purpose of the third dataset D3 (released by [5]) is to validate the effectiveness of our system in handling real-world contracts that involve large-scale transactions.
 83 | 
 84 | - Download this resource at [Dataset](https://drive.google.com/file/d/1XFp3tZSMkWSkeLSHe_vrQjGZYZ3LzB2s/view?usp=sharing).
 85 | 
 86 | - Please cite our paper if you want to use the dataset in your paper.
 87 | ```
 88 | Coming soon.
 89 | ```
 90 | 
 91 | 
 92 | 
 93 | ## Reference 
 94 | [1] Christof Ferreira Torres, et al. CONFUZZIUS: A Data Dependency-Aware Hybrid Fuzzer for Smart Contracts. EuroS&P 2021.
 95 | 
 96 | [2] SmartBugs: https://github.com/smartbugs/smartbugs-wild
 97 | 
 98 | [3] VeriSmart: https://github.com/kupl/VeriSmart-benchmarks
 99 | 
100 | [4] SWC registry: https://swcregistry.io
101 | 
102 | [5] Jaeseung Choi, et al. SMARTIAN: Enhancing Smart Contract Fuzzing with Static and Dynamic Data-Flow Analyses. ASE 2021.
103 | 


--------------------------------------------------------------------------------
/instructions/README.md:
--------------------------------------------------------------------------------
  1 | # Instruction
  2 | 
  3 | ## 1 Timestamp Dependence
  4 | The timestamp dependence vulnerability exists when a smart contract uses *block.timestamp* as part of the conditions for performing critical operations.
  5 | 
  6 | 
  7 | ### How to label the timestamp dependency vulnerability?
  8 | We refer to several _patterns_ to label the timestamp dependence vulnerability. 
  9 | 1) **TDInvocation** models whether there exists an invocation to *block.timestamp* in the function. 
 10 | 2) **TDAssign** checks whether the value of *block.timestamp* is assigned to other variables or passed to a condition statement as a parameter, namely whether *block.timestamp* is actually used. 
 11 | 3) **TDContaminate** checks if *block.timestamp* may contaminate the triggering condition of a critical operation (e.g., money transfer) or the return value.
 12 | We consider a function suspicious of having a timestamp dependence vulnerability if it fulfills the combined pattern: **TDInvocation ∧ (TDAssign ∨ TDContaminate)**.
 13 | 
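The combined pattern above can be sketched as a small labeling function. This is only an illustration: the `TimestampFeatures` class and its field names are hypothetical, and how each flag is extracted from the contract source is not covered here. Following the cases below, a timestamp that is only checked in a strict `require`/`assert` statement is assumed to leave `td_assign` unset.

```python
from dataclasses import dataclass


@dataclass
class TimestampFeatures:
    """Hypothetical per-function flags; the names mirror the three patterns."""
    td_invocation: bool   # block.timestamp appears in the function
    td_assign: bool       # its value is assigned and actually used
    td_contaminate: bool  # it can affect a critical operation or the return value


def label_timestamp(features: TimestampFeatures) -> int:
    """Return 1 (suspicious) or 0 for TDInvocation AND (TDAssign OR TDContaminate)."""
    hit = features.td_invocation and (features.td_assign or features.td_contaminate)
    return int(hit)


# A function that stores block.timestamp in a variable and returns it.
print(label_timestamp(TimestampFeatures(True, True, False)))
```

A function without any *block.timestamp* statement is filtered out by the first flag regardless of the other two.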
 14 | 
 15 | #### TDInvocation
 16 | 
 17 | Note that we treat functions containing a *block.timestamp* statement as the target functions. As such, we first use the pattern **TDInvocation** to filter out functions that do not contain *block.timestamp*.
 18 | 
 19 | 
 20 | #### TDAssign 
 21 | 
 22 | Case 1: When the *block.timestamp* is assigned to a variable and the variable is used by the following operations or passed to a condition statement as a parameter, we label the corresponding function to have the timestamp dependency vulnerability.
 23 |     
 24 |     ```    
 25 |         1.contract CrowdsaleWPTByRounds {
 26 |         2.    uint256 public closingTime;       
 27 |         3.    function closeRound() public returns(uint256) {
 28 |         4.        closingTime = block.timestamp + 1;
 29 |         5.        return closingTime;
 30 |         6.    }
 31 |         7.}
 32 |     ```
 33 |  
 34 | As can be seen, *block.timestamp* is assigned to the variable *closingTime* (line 4), and *closingTime* is used in the return statement (line 5). Thus, we label the function *closeRound* to have the timestamp dependency vulnerability, i.e., label = 1.
 35 | 
 36 | 
 37 | Case 2: When *block.timestamp* is only checked in strict condition statements (e.g., _require_ and _assert_), we label the corresponding function to have no timestamp dependency vulnerability.
 38 | 
 39 |     ```    
 40 |         1.contract Safe {
 41 |         2.    address public owner;
 42 |         3.    uint256 public lock;        
 43 |         4.    function withdrawal( address to, uint value) returns (bool) {
 44 |         5.        require(msg.sender == owner);
 45 |         6.        uint256 time = block.timestamp;
 46 |         7.        require(time >= lock);
 47 |         8.        require(to != address(0));
 48 |         9.        return true;
 49 |         10.    }
 50 |         11.}
 51 |     ```
 52 |     
 53 | As can be seen, *block.timestamp* is assigned to the variable *time* (line 6), and *time* is only checked in the *require* statement (line 7). Thus, we label the function *withdrawal* to have no timestamp dependency vulnerability, i.e., label = 0.
 54 | 
 55 | 
 56 | #### TDContaminate
 57 | 
 58 | Case 1: When a conditional statement (e.g., _if_ and _while_) that depends on *block.timestamp* determines the return value of the function, we label the corresponding function to have the timestamp dependency vulnerability.
 59 | 
 60 |     ```
 61 |         1.contract CrowdsaleExt {
 62 |         2.    uint public startsAt;
 63 |         3.    uint public endsAt;
 64 |         4.    bool public finalized;
 65 |         5.    enum State {PreFunding, Failure, Finalized}
 66 |         6.    function getState() public constant returns (State) {
 67 |         7.         if(finalized) return State.Finalized;
 68 |         8.         else if (block.timestamp < startsAt) return State.PreFunding;
 69 |         9.         else return State.Failure;
 70 |         10.   }
 71 |         11.}
 72 |     ```
 73 |        
 74 | As can be seen, when the _else if_ condition _block.timestamp < startsAt_ holds (line 8), the function _getState_ returns _State.PreFunding_. Thus, we label the function _getState_ to have the timestamp dependency vulnerability, i.e., label = 1.
 75 | 
 76 | Case 2: When the body of a conditional statement that depends on *block.timestamp* involves money operations (e.g., _transfers_), we label the corresponding function to have the timestamp dependency vulnerability.
 77 |     
 78 |     ```
 79 |         1.contract FreezableToken {
 80 |         2.    function releaseAll() public returns (uint tokens) {
 81 |         3.         uint release;
 82 |         4.         uint balance;
 83 |         5.         while (release != 0 && block.timestamp > release) {
 84 |         6.                tokens += balance;
 85 |         7.                msg.sender.call.value(tokens)();
 86 |         8.         }
 87 |         9.         return tokens;
 88 |         10.   }
 89 |         11.}
 90 |     ```
 91 |     
 92 | 
 93 | As can be seen, when the _while_ condition _release != 0 && block.timestamp > release_ holds (line 5), the function executes the _call.value_ transfer operation (line 7). Thus, we label the function _releaseAll_ to have the timestamp dependency vulnerability, i.e., label = 1.
 94 | 
 95 | Case 3: When the body of the conditional statement is not related to the return value of the function or money operations (e.g., transfer), we label the corresponding function to have no timestamp dependency vulnerability.
 96 |     
 97 |     ```
 98 |         1.contract BirthdayGift {
 99 |         2.    address public recipient;
100 |         3.    uint public birthday;
101 |         4.    function Take () {
102 |         5.        if (msg.sender != recipient) throw;
103 |         6.        if (block.timestamp < birthday) throw;
104 |         7.        if (!recipient.send (this.balance)) throw;
105 |         8.        return;
106 |         9.    }
107 |         10.}
108 |     ```
109 |     
110 | 
111 | As can be seen, when the conditional statement _if_ satisfies _block.timestamp < birthday_ (line 6), the function _Take_ throws an exception. Thus, we label the function _Take_ to have no timestamp dependency vulnerability, i.e., label = 0.
112 | 
113 | 
114 | 
115 | 
116 | 
117 | ## 2 Reentrancy
118 | Reentrancy vulnerability is considered as an invocation to _call.value_ that can call back to itself through a chain of calls.
119 | 
120 | 
121 | ### How to label the reentrancy vulnerability?
122 | We refer to several expert patterns to label the reentrancy vulnerability. 
123 | 1) **callValueInvocation** checks whether there exists an invocation to _call.value_ in the function.
124 | 2) **balanceDeduction** checks whether the user balance is deducted after the money transfer using _call.value_; this reflects the fact that money stealing 
125 | can be avoided if the user balance is deducted before each money transfer.
126 | 3) **zeroParameter** checks whether the parameter of _call.value_ is zero.
127 | 4) **ModifierConstrain** checks whether the function is constrained by the _onlyOwner_ modifier.
128 | We consider a function suspicious of having a reentrancy vulnerability if it fulfills the combined pattern: **callValueInvocation ∧ balanceDeduction ∧ (!zeroParameter) ∧ (!ModifierConstrain)**.
129 | 
130 | 
131 | 
132 | #### callValueInvocation
133 | 
134 | Note that we treat functions with an invocation to *call.value* as the target functions. As such, we first use the pattern **callValueInvocation** to filter out functions without an invocation to _call.value_.
135 | 
136 | 
137 | #### zeroParameter
138 | 
139 | Case 1: When the _call.value_ exists in the function and the parameter of the *call.value* is zero, we label the corresponding function to have no reentrancy vulnerability, i.e., label = 0.
140 |      
141 |     ```
142 |         1.contract HiroyukiCoinDark {
143 |         2.    mapping(address => uint256) public balanceOf;
144 |         3.    function transfer(address _to, uint _value, bytes _data) public returns (bool) {
145 |         4.        require(balanceOf[msg.sender] >= _value);
146 |         5.        balanceOf[msg.sender] = balanceOf[msg.sender] - _value;
147 |         6.        balanceOf[_to] = balanceOf[_to] + _value;
148 |         7.        assert(msg.sender.call.value(0)());
149 |         8.        return true;
150 |         9.    }
151 |         10.}
152 |     ```
153 |     
154 | As can be seen, the parameter of _call.value_ is zero (line 7). Thus, we label the function _transfer_ to have no reentrancy vulnerability, i.e., label = 0.
155 | 
156 | 
157 | #### balanceDeduction
158 | 
159 | Case 1: When the parameter of _call.value_ is not zero and the user balance is deducted before money transfer using _call.value_, we label the corresponding function to have no reentrancy vulnerability, i.e., label = 0.
160 |    
161 |     ```
162 |         1.contract NIZIGEN {
163 |         2.    mapping (address => uint) balances;  
164 |         3.    function transfer(uint _value, bytes _data) public returns (bool) { 
165 |         4.      if(true) {
166 |         5.          if (balances[msg.sender] < _value) revert();
167 |         6.          balances[msg.sender] = balances[msg.sender] - _value;
168 |         7.          assert(msg.sender.call.value(_value)(_data));
169 |         8.          return true;
170 |         9.      }
171 |         10.      else {
172 |         11.          return false;
173 |         12.      }
174 |         13.    }
175 |         14.}
176 |     ```
177 |     
178 | As can be seen, the user balance _balances[msg.sender]_ (line 6) is deducted before the money transfer using _call.value_ (line 7). Thus, we label the corresponding function to have no reentrancy vulnerability, i.e., label = 0.
179 | 
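To see why the ordering in Case 1 matters, the following toy simulation re-enters a withdrawal from the receiver's callback. It is a simplified sketch, not a model of the EVM: the `Bank` class, the `attack` helper, and the `deduct_first` flag are all hypothetical names, and the callback stands in for the callee's fallback function.

```python
class Bank:
    """Toy ledger; `deduct_first` selects the safe ordering shown in Case 1."""

    def __init__(self, deduct_first: bool):
        self.deduct_first = deduct_first
        self.balances = {"attacker": 10}
        self.paid_out = 0

    def withdraw(self, user: str, on_receive) -> None:
        amount = self.balances[user]
        if amount == 0:
            return
        if self.deduct_first:
            self.balances[user] = 0          # deduct, then transfer
            self._send(amount, on_receive)
        else:
            self._send(amount, on_receive)   # transfer, then deduct
            self.balances[user] = 0

    def _send(self, amount: int, on_receive) -> None:
        self.paid_out += amount
        on_receive()  # stands in for the receiver's fallback function


def attack(bank: Bank, max_reentries: int = 3) -> int:
    """Re-enter withdraw from the callback and return the total paid out."""
    depth = 0

    def fallback() -> None:
        nonlocal depth
        if depth < max_reentries:
            depth += 1
            bank.withdraw("attacker", fallback)

    bank.withdraw("attacker", fallback)
    return bank.paid_out


print(attack(Bank(deduct_first=False)), attack(Bank(deduct_first=True)))
```

With the balance deducted after the transfer, every re-entry still sees the old balance and is paid again; with the balance deducted first, the re-entered call finds a zero balance and returns immediately.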
180 | 
181 | #### ModifierConstrain
182 | 
183 | Case 1: When a function has the _onlyOwner_ modifier constraint, we label the corresponding function to have no reentrancy vulnerability.
184 |     
185 |     ```
186 |         1.contract CrowdsaleWPTByRounds {
187 |         2.  mapping (address => uint) balances;
188 |         3.  address wallet;
189 |         4.  address owner;
190 |         5.  modifier onlyOwner() {
191 |         6.    require(msg.sender == owner);
192 |         7.    _;
193 |         8.  }   
194 |         9.  function forwardFunds() internal onlyOwner {
195 |         10.     wallet.call.value(msg.value).gas(10000000)();
196 |         11.     balances[wallet] -= msg.value;
197 |         12.  }
198 |         13.}
199 |     ```
200 |     
201 | As can be seen, the function _forwardFunds_ is constrained by the _onlyOwner_ modifier (line 9). Thus, we label the function _forwardFunds_ to have no reentrancy vulnerability, i.e., label = 0.
202 | 
203 | 
204 | Case 2: When a function that fulfills the other patterns is not constrained by the _onlyOwner_ modifier, we label the corresponding function to have the reentrancy vulnerability.
205 |       
206 |     ```
207 |         1.contract CrowdsaleWPTByRounds {
208 |         2.  mapping (address => uint) balances;
209 |         3.  address wallet;
210 |         4.  function forwardFunds() internal {
211 |         5.    wallet.call.value(msg.value).gas(10000000)();
212 |         6.    balances[wallet] -= msg.value;
213 |         7.  }
214 |         8.}
215 |     ```
216 |     
217 | As can be seen, the function _forwardFunds_ is not constrained by the _onlyOwner_ modifier (line 4). Thus, we label the function _forwardFunds_ to have the reentrancy vulnerability, i.e., label = 1.
218 | 
219 | 
220 | 
221 | 
222 | ## 3 Integer Overflow/Underflow
223 | An integer overflow/underflow vulnerability occurs when an arithmetic operation attempts to create a numeric value that is outside the range of the integer type.
224 | 
225 | 
226 | ### How to label the integer overflow/underflow vulnerability?
227 | We refer to several expert patterns to label the integer overflow/underflow vulnerability. 
228 | 1) **arithmeticOperation** checks whether there is an arithmetic operation between variables.
229 | 2) **safeLibraryInvocation** checks whether the arithmetic operations between variables are constrained by a security library function.
230 | 3) **conditionDeclaration** checks whether the variables of the arithmetic operation are checked by a conditional statement.
231 | We consider a function suspicious of having an integer overflow or underflow vulnerability if it fulfills the combined pattern: **arithmeticOperation ∧ !(safeLibraryInvocation ∨ conditionDeclaration)**.
232 | 
233 | 
234 | #### arithmeticOperation
235 | Note that we treat functions containing arithmetic operations _(e.g., +, -, *)_ as the target functions. As such, we first use the pattern **arithmeticOperation** to filter out functions without arithmetic operations.
236 | 
237 | 
238 | 
239 | #### safeLibraryInvocation
240 | 
241 | Case 1: When there are arithmetic operations between the variables and the arithmetic operations are constrained by the security library function, we label the corresponding function to have no integer overflow/underflow vulnerability.
242 |     
243 |     ```
244 |         1.library SafeMath {
245 |         2.    function sub(uint256 a, uint256 b) internal pure returns (uint256) {
246 |         3.      assert(b <= a);
247 |         4.      return a - b;
248 |         5.    }
249 |         6.    function add(uint256 a, uint256 b) internal pure returns (uint256 c) {
250 |         7.      c = a + b;
251 |         8.      assert(c >= a);
252 |         9.      return c;
253 |         10.    }
254 |         11.}
255 |         12.contract StandardToken {
256 |         13.    using SafeMath for uint256;
257 |         14.    mapping(address => uint256) balances; 
258 |         15.    function transfer(address to, uint256 value) public returns (bool) {
259 |         16.        balances[msg.sender] = balances[msg.sender].sub(value);
260 |         17.        balances[to] = balances[to].add(value);
261 |         18.        return true;
262 |         19.    }
263 |         20.}
264 |     ```
265 |     
266 | As can be seen, the subtraction operation between the _balances[msg.sender]_ and the _value_ (line 16) is constrained by the security library function (line 2). Thus, we label the corresponding function _transfer_ to have no integer overflow/underflow vulnerability, i.e., label = 0.
267 | 
268 | 
269 | #### conditionDeclaration
270 | 
271 | Case 1: When the arithmetic operations and corresponding variables appear in the strict conditional statements (e.g., assert, require), we label the corresponding function to have no integer overflow/underflow vulnerability.
272 |    
273 |     ```
274 |         1.contract Overflow_fixed_assert {
275 |         2.    uint8 sellerBalance = 0;
276 |         3.    function add(uint8 value) returns (uint) {
277 |         4.        sellerBalance += value;
278 |         5.        assert(sellerBalance >= value);
279 |         6.        return sellerBalance;
280 |         7.    }
281 |         8.}
282 |     ```
283 |     
284 | As can be seen, there is an addition operation between _sellerBalance_ and _value_ (line 4), and the _assert_ statement contains the comparison between _sellerBalance_ and _value_ (line 5). Thus, we label the corresponding function _add_ to have no integer overflow/underflow vulnerability, i.e., label = 0.
285 | 
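The reason the post-addition `assert` in Case 1 is sufficient can be checked with a short wrap-around sketch. It assumes the unchecked arithmetic of Solidity versions before 0.8, where a `uint8` sum is reduced modulo 2^8; the helper names are hypothetical.

```python
UINT8_MAX = 2**8 - 1


def add_uint8(balance: int, value: int) -> int:
    """Add two uint8 values with wrap-around (unchecked arithmetic)."""
    return (balance + value) & UINT8_MAX


def checked_add_uint8(balance: int, value: int) -> int:
    """Mirror `sellerBalance += value; assert(sellerBalance >= value);`."""
    result = add_uint8(balance, value)
    if result < value:
        # A wrapped sum equals balance + value - 256, which is below value.
        raise OverflowError("uint8 addition wrapped around")
    return result


print(add_uint8(200, 100))          # wrapped result
print(checked_add_uint8(100, 100))  # fits into uint8, so the check passes
```

Because both operands are at most 255, a wrapped result is always smaller than either operand, so comparing the sum against one operand detects every overflow of this addition.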
286 | 
287 | Case 2: When the operands of a subtraction are compared in a strict conditional statement (e.g., assert, require) that appears before the subtraction, we label the corresponding function to have no integer overflow/underflow vulnerability.
288 | 
289 |     ```
290 |         1.contract HiroyukiCoinDark {
291 |         2.    mapping(address => uint256) public balanceOf;
292 |         3.    function transfer(address to, uint value, bytes data) public returns (bool) {
293 |         4.        require(balanceOf[msg.sender] >= value);
294 |         5.        balanceOf[msg.sender] = balanceOf[msg.sender] - value;
295 |         6.        balanceOf[to] = balanceOf[to] + value;
296 |         7.        assert(msg.sender.call.value(value)());
297 |         8.        return true;
298 |         9.    }
299 |         10.}
300 |     ```
301 |  
302 | As can be seen, there is a subtraction operation between the _balanceOf[msg.sender]_ and the _value_ (line 5), and the _require_ statement contains the comparison between the _balanceOf[msg.sender]_ and the _value_ (line 4). Thus, we label the corresponding function to have no integer overflow/underflow vulnerability, i.e., label = 0.
303 | 
304 | 
305 | Case 3: When the function satisfies neither Case 1 nor Case 2, we label the corresponding function to have the integer overflow/underflow vulnerability.
306 |     
307 |     ```
308 |         1. contract Overflow_add {
309 |         2.    uint8 sellerBalance = 0;
310 |         3.    function add(uint8 value) returns (uint) {
311 |         4.        sellerBalance += value;
312 |         5.        return sellerBalance;
313 |         6.    }
314 |         7. }
315 |     ```
316 |     
317 | As can be seen, there is an addition operation between _sellerBalance_ and _value_ (line 4), and no conditional statement constrains the two variables after the addition. Thus, we label the corresponding function _add_ to have the integer overflow/underflow vulnerability, i.e., label = 1.
318 | 
319 | 
320 | 
321 | 
322 | ## 4 Dangerous Delegatecall
323 | Delegatecall endows a caller with the ability to put the code of the callee contract into the current execution environment of the caller contract. However, the execution environments of the caller and the callee might be quite different from each other, so running a function of the callee in the environment of the caller may lead to unexpected results. As such, we need to evaluate if a delegatecall will indeed cause losses, e.g., frozen Ether.
324 | 
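The effect described above, callee code running in the caller's environment, can be illustrated with a toy model. This is a sketch under strong simplifications: storage is a plain dictionary, and `Contract`, `library_init`, and the account names are hypothetical.

```python
from typing import Callable, Dict

Storage = Dict[str, object]


def library_init(storage: Storage, sender: str) -> None:
    """Callee code: writes `owner` into whatever storage it is run against."""
    storage["owner"] = sender


class Contract:
    def __init__(self, owner: str):
        self.storage: Storage = {"owner": owner}

    def delegatecall(self, code: Callable[[Storage, str], None], sender: str) -> None:
        # The callee's code runs against the caller's storage and original sender.
        code(self.storage, sender)


proxy = Contract(owner="alice")
proxy.delegatecall(library_init, sender="mallory")
print(proxy.storage["owner"])
```

The callee never touches its own state here; the write lands in the proxy's storage, which is why an unrestricted delegatecall target is treated as dangerous.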
325 | 
326 | ### How to label the dangerous delegatecall vulnerability?
327 | We refer to several expert patterns to label the dangerous delegatecall vulnerability. 
328 | 1) **delegateInvocation** checks whether there exists an invocation to _delegatecall_ in the function.
329 | 2) **ownerInvocation** checks whether the caller of _delegatecall_ is the owner account.
330 | We consider a function suspicious of having a delegatecall vulnerability if it fulfills the combined pattern: **delegateInvocation ∧ (!ownerInvocation)**.
331 | 
332 | #### delegateInvocation
333 | Note that we treat functions containing a *delegatecall* statement as the target functions. As such, we first use the pattern **delegateInvocation** to filter out functions without a *delegatecall* statement.
334 | 
335 | #### ownerInvocation
336 | Case 1: When the _delegatecall_ exists in the function and the caller is the owner account, we label the corresponding function to have no dangerous delegatecall vulnerability.
337 |    
338 |    ```
339 |        1.contract Proxy {
340 |        2.     address callee;
341 |        3.     address owner;
342 |        4.     modifier onlyOwner {
343 |        5.           require(msg.sender == owner);
344 |        6.          _;
345 |        7.     }
346 |        8.     function setCallee(address newCallee) public onlyOwner {
347 |        9.         callee = newCallee;
348 |        10.   }
349 |        11.   function forward(bytes _data) public {
350 |        12.       require(callee.delegatecall(_data));
351 |        13.   }
352 |        14.}
353 |    ```
354 |    
355 | As can be seen, the address that _delegatecall_ is invoked on is _callee_ (line 12), and _callee_ can only be set by the owner account through _setCallee_ (line 8). Thus, we label the corresponding function _forward_ to have no dangerous delegatecall vulnerability, i.e., label = 0.
356 | 
357 | Case 2: When the _delegatecall_ exists in the function and the caller of _delegatecall_ is not the owner account, we label the corresponding function to have the dangerous delegatecall vulnerability. 
358 |     
359 |     ```
360 |         1.contract Proxy { 
361 |         2.    address owner;
362 |         3.    function forward(address callee, bytes _data) public {
363 |         4.       require(callee.delegatecall(_data));
364 |         5.    }
365 |         6.}
366 |     ```
367 |     
368 | As can be seen, the address that _delegatecall_ is invoked on is _callee_ (line 4), which any caller can supply as an argument rather than being set by the owner account. Thus, we label the function _forward_ to have the dangerous delegatecall vulnerability, i.e., label = 1.
369 | 
370 | 
371 | 
372 | 
373 | 
374 | 


--------------------------------------------------------------------------------
/patterns/README.md:
--------------------------------------------------------------------------------
 1 | # Pattern / Oracle
 2 | 
 3 | ### 1 Reentrancy
 4 | Reentrancy vulnerability is considered as an invocation to call.value that can call back to itself through a chain of calls. That is, the invocation of call.value is successfully re-entered to perform unexpected repeat money transfers. Specifically, we design two patterns to expose the reentrancy vulnerability. The first pattern *CALLValueInvocation* checks if there exists an invocation to call.value in the contract. The second pattern *RepeatedCallValue* concerns whether a specific function with *call.value* invocation is called repeatedly during fuzzing. IR-Fuzz reports that a function has a reentrancy vulnerability if it fulfills the combined pattern: *CALLValueInvocation* ∧ *RepeatedCallValue*.
 5 | 
 6 | 
 7 | ### 2 Timestamp Dependency
 8 | The timestamp dependency vulnerability exists when a function uses the block timestamp as a condition to conduct critical operations (e.g., transfer Ether). Specifically, we define three patterns that are related to timestamp dependency. First, the pattern *TSInvocation* checks whether there exists an invocation to the block.timestamp (i.e., instruction TIMESTAMP) in the contract code. Then, the pattern *TSContaminate* checks if block.timestamp may contaminate the triggering condition of a critical operation, e.g., money transfer (namely whether TIMESTAMP contaminates instruction JUMPI or is read by subsequent compare instructions, e.g., LT, GT, EQ). The third pattern *TSRandom* checks if block.timestamp (i.e., TIMESTAMP) is used as a random number seed. IR-Fuzz reports that a function has a timestamp dependency vulnerability if it fulfills the combined pattern: *TSInvocation* ∧ (*TSContaminate* ∨ *TSRandom*).
 9 | 
10 | 
11 | ### 3 Block Number Dependency
12 | The block number dependency vulnerability exists when a function uses the block number (i.e., block.number) as a condition in a branch statement. An attacker may exploit the block number to achieve malicious behaviors. Specifically, we design two patterns to expose this vulnerability. First, the pattern *BNInvocation* checks if there exists an invocation to the block.number (i.e., instruction NUMBER) in the contract code. The second pattern is *BNContaminate* which checks whether NUMBER contaminates instruction JUMPI or is read by subsequent compare instructions (e.g., LT, GT, EQ). IR-Fuzz reports that a function has a block number dependency vulnerability if it fulfills the combined pattern: *BNInvocation* ∧ *BNContaminate*.
13 | 
14 | 
15 | ### 4 Dangerous Delegatecall
16 | The delegatecall is designed for invoking library contracts. It endows a caller with the ability to put the code of the callee contract into the current execution environment of the caller contract. If an attacker manipulates the argument of delegatecall, he/she may control the contract and execute arbitrary code. In particular, we design three patterns to reveal this vulnerability. The first pattern *DGInvocation* checks if there is an invocation to delegatecall (i.e., instruction DELEGATECALL) in the contract code. The second pattern is *DGCallConstraint*, which checks whether there is a constraint (e.g., Modifier) to the caller of the function with a delegatecall. The third pattern *DGParameter* checks if the argument of delegatecall is included in the function. IR-Fuzz reports that a function has a delegatecall vulnerability if it fulfills the combined pattern: *DGInvocation* ∧ ¬*DGCallConstraint* ∧ *DGParameter*.
17 | 
18 | 
19 | ### 5 Ether Frozen
20 | Smart contracts can receive and send Ether to other addresses via delegatecall. However, they may contain no functions to transfer Ether, while purely relying on the code of other contracts to send Ether using delegatecall. When the callee contract uses suicide or selfdestruct to perform Ether transfer operations, the calling contract has no way to send Ether and all its Ether will be frozen. An attack that exploits the Ether frozen vulnerability in the Parity Multisig Wallet contract led to freezing $280 million worth of Ether. Particularly, we design two patterns to handle this vulnerability. The first pattern *DGInvocation* checks if there exists an invocation to delegatecall (i.e., instruction DELEGATECALL) to send Ether during execution. The second pattern *FETransfer* checks if there exist transfer instructions (e.g., CALL, CALLCODE, SUICIDE) within the contract itself. IR-Fuzz reports that a function has an Ether frozen vulnerability if it fulfills the combined pattern: *DGInvocation* ∧ ¬*FETransfer*.
21 | 
22 | 
23 | ### 6 Unchecked External Call
24 | A contract that calls an external contract or a set of external contracts will form an external contract call-chain. When the return value of a call in the call-chain is unchecked, which indicates an exception is thrown, the first external call in the call-chain will not find this exception and handle it. Particularly, we design three patterns to expose the unchecked external call vulnerability. First, the pattern *ExternalCall* checks whether there is an invocation (or a chain of calls) to external contracts. Second, the pattern *ExceptionConsistency* checks if the exception instruction (i.e., INVALID) occurs in every invocation of the call-chain. Third, the pattern *ReturnCondition* checks whether the return value of external calls exists in the branch conditional statement. To summarize, IR-Fuzz reports that a function has an unchecked call return value vulnerability if it fulfills the combined pattern: *ExternalCall* ∧ (*ExceptionConsistency* ∧ ¬*ReturnCondition*).
25 | 
26 | 
27 | ### 7 Integer Overflow/Underflow
28 | The integer overflow vulnerability exists when an arithmetic operation attempts to create a numeric value that is outside the range of the integer type. For example, if a number is stored in 'uint256' type, this means that the number is stored as a 256-bit unsigned number ranging from 0 to 2^256 - 1. When we try to create a value that is either larger than 2^256 - 1 or lower than 0, the integer overflow vulnerability will occur. In such a scenario, the Ethereum Virtual Machine (EVM) will truncate the overflow bits. Therefore, we design a pattern *OFStackTruncate*, which checks whether the variable arithmetic operations (e.g., instructions ADD, MUL or SUB) are truncated on the EVM stack. IR-Fuzz reports that a function has an integer overflow vulnerability if it fulfills the pattern: *OFStackTruncate*.
29 | 
30 | 
31 | ### 8 Dangerous Ether Strict Equality
32 | The dangerous Ether strict equality vulnerability exists when Ether balance is treated as the conditions of branch statements. The variable this.balance will record all received Ether. However, a contract can also obtain Ether by pre-storing Ether or using selfdestruct for Ether transfer, making the amount stored in the variable this.balance inconsistent with the total amount of Ether sent by users. There is a risk in using this.balance as a condition of the branch statement. Specifically, we design two patterns to expose the dangerous Ether strict equality vulnerability. The first pattern *EDInvocation* checks whether the invocation of instruction BALANCE is corresponding to the variable this.balance in the contract code. The second pattern *EDContaminate* checks if the invocation to instruction BALANCE is followed by the conditional jump instruction JUMPI or compare instructions (e.g., LT, GT, EQ). IR-Fuzz reports that a function has a dangerous Ether strict equality vulnerability if it fulfills the combined pattern: *EDInvocation* ∧ *EDContaminate*.
33 | 
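As a rough illustration of how such combined patterns could be evaluated, the sketch below encodes five of the oracles above as predicates over the set of fulfilled pattern names. The `ORACLES` table, the `report` helper, and the idea of passing patterns as a set of strings are assumptions made for this example; they are not IR-Fuzz's actual interface.

```python
from typing import Callable, Dict, List, Set

Oracle = Callable[[Set[str]], bool]

# Each oracle receives the names of the patterns fulfilled by one function.
ORACLES: Dict[str, Oracle] = {
    "reentrancy": lambda p: {"CALLValueInvocation", "RepeatedCallValue"} <= p,
    "block_number_dependency": lambda p: {"BNInvocation", "BNContaminate"} <= p,
    "dangerous_delegatecall": lambda p: (
        {"DGInvocation", "DGParameter"} <= p and "DGCallConstraint" not in p
    ),
    "ether_frozen": lambda p: "DGInvocation" in p and "FETransfer" not in p,
    "ether_strict_equality": lambda p: {"EDInvocation", "EDContaminate"} <= p,
}


def report(fulfilled: Set[str]) -> List[str]:
    """Return the sorted names of the oracles that fire for the given patterns."""
    return sorted(name for name, oracle in ORACLES.items() if oracle(fulfilled))


print(report({"DGInvocation", "DGParameter"}))
```

Negated patterns such as ¬*DGCallConstraint* are expressed as absence from the set, so a single fulfilled-pattern set is enough to evaluate every oracle in the table.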
34 | 
35 | 
36 | ## Reference 
37 | 1. Christof Ferreira Torres, et al. CONFUZZIUS: A Data Dependency-Aware Hybrid Fuzzer for Smart Contracts. EuroS&P 2021.
38 | 
39 | 2. Jaeseung Choi, et al. SMARTIAN: Enhancing Smart Contract Fuzzing with Static and Dynamic Data-Flow Analyses. ASE 2021.
40 | 
41 | 3. Zhenguang Liu, Peng Qian, et al. Rethinking Smart Contract Fuzzing: Fuzzing With Invocation Ordering and Important Branch Revisiting. TIFS 2023.
42 | 


--------------------------------------------------------------------------------