├── README.md ├── auditing.ipynb ├── fine_tune.ipynb ├── fine_tune.txt ├── solodit.json └── training_data.pkl /README.md: -------------------------------------------------------------------------------- 1 | # Audit GPT 2 | Fine-tuning GPT-3/4 for Smart Contract Auditing. 3 | 4 | 5 | ## Running 6 | Get an API key from OpenAI and set it as an environment variable. Fine-tuning costs around $16. 7 | ``` 8 | export OPENAI_API_KEY=sk-XXXXXXXX 9 | ``` 10 | 11 | ## Files 12 | * `fine_tune.ipynb` - Code for crawling & preparing data and fine-tuning OpenAI models. 13 | * `auditing.ipynb` - Code for using the fine-tuned model to audit a contract. 14 | 15 | ## Example 16 | 17 | Let the fine-tuned GPT audit the [Kairos Auction Facet](https://github.com/sherlock-audit/2023-02-kairos/blob/main/kairos-contracts/src/AuctionFacet.sol) from the Sherlock competition: 18 | 19 | | **False Positives** | **9** | 20 | |---------------------|-------| 21 | | **False Negatives** | **1** | 22 | 23 | #### Details 24 | ```solidity 25 | contract AuctionFacet is IAuctionFacet, SafeMint { 26 | using RayMath for Ray; 27 | using RayMath for uint256; 28 | using Erc20CheckedTransfer for IERC20; 29 | 30 | /// @notice buy one or multiple NFTs in liquidation 31 | ``` 32 | 🤖️ the method buy() doesn’t check that the sender of the transaction is the winner of the auction. This allows the attacker to steal all of the funds in the contract. 33 | ```solidity 34 | /// @param args arguments on what and how to buy 35 | function buy(BuyArg[] memory args) external { 36 | for (uint256 i = 0; i < args.length; i++) { 37 | useLoan(args[i]); 38 | } 39 | } 40 | ``` 41 | 🤖️ malicious user can buy tokens for the lender multiple times, draining their stake. 
42 | ```solidity 43 | 44 | /// @notice gets the price to buy the underlying collateral of the loan 45 | /// @param loanId identifier of the loan 46 | /// @return price computed price 47 | function price(uint256 loanId) public view returns (uint256) { 48 | Loan storage loan = protocolStorage().loan[loanId]; 49 | ``` 50 | 🤖️ the method price() returns the wrong price of the collateral ETH when the loan is in the Erased state. This allows a malicious lender to buy back the collateral ETH at a low price before it is erased from their wallet. 51 | ```solidity 52 | uint256 loanEndDate = loan.endDate; 53 | uint256 timeSinceLiquidable = block.timestamp - loanEndDate; 54 | 55 | checkLoanStatus(loanId); 56 | 57 | /* the decreasing factor controls the evolution of the price from its initial value to 0 (and staying at 0) 58 | ``` 59 | 🤖️ Liquidable loans can be stuck in the endDate state, so that timeSinceLiquidable will always be greater than block.timestamp, and the borrower will not be able to pay back the loan before its end date. This can cause the borrower’s collateral to be stuck in the EndDate state forever, and can lead to a loss of principal for the lender. 60 | ```solidity 61 | over the course of the auction duration */ 62 | Ray decreasingFactor = timeSinceLiquidable >= loan.auction.duration 63 | ? ZERO 64 | : ONE.sub(timeSinceLiquidable.div(loan.auction.duration)); 65 | 66 | /* the estimated value arises from the mean of the loan offer loanToValues used in the loan regarding their 67 | ``` 68 | 🤖️ the estimated liquidable stake is wrong, which can be used to frontrun the bidders. 69 | ```solidity 70 | share in the collateral usage. 
This must stay consitent even if less than the full value of the NFT has been 71 | used as collateral */ 72 | uint256 estimatedValue = loan.lent.mul(ONE.div(loan.shareLent)); 73 | 74 | /* by mutliplying the estimated price by some factor and slowly decreasing this price over time we aim to 75 | make sure a liquidator will buy the NFT at fair market price. */ 76 | ``` 77 | 🤖️ the liquidator may not buy the collateral token for its fair value, but instead buy it for much less than its fair value, allowing the borrower to profit from the liquidator’s mistake. 78 | ```solidity 79 | return estimatedValue.mul(loan.auction.priceFactor).mul(decreasingFactor); 80 | } 81 | 82 | /// @notice handles buying one NFT 83 | /// @param arg arguments on what and how to buy 84 | function useLoan(BuyArg memory arg) internal { 85 | ``` 86 | 🤖️ the user can lose their NFTs when they buy a loan. 87 | ```solidity 88 | Loan storage loan = protocolStorage().loan[arg.loanId]; 89 | 90 | checkLoanStatus(arg.loanId); 91 | uint256 toPay = price(arg.loanId); 92 | 93 | /* store as liquidated and paid before transfers to avoid malicious reentrency, following 94 | ``` 95 | 🤖️ Users can take the ETH in the collateral of a loan without paying back the loan. 96 | ```solidity 97 | checks-effects-interaction pattern */ 98 | loan.payment.liquidated = true; 99 | loan.payment.paid = toPay; 100 | loan.assetLent.checkedTransferFrom(msg.sender, address(this), toPay); 101 | loan.collateral.implem.safeTransferFrom(address(this), arg.to, loan.collateral.id); 102 | 103 | ``` 104 | 🤖️ the user’s NFT could be stuck in the contract when the user tries to liquidate the loan. 
105 | ```solidity 106 | emit Buy(arg.loanId, abi.encode(arg)); 107 | } 108 | 109 | /// @notice checks that loan is liquidable, revert if not 110 | /// @param loanId identifier of the loan 111 | function checkLoanStatus(uint256 loanId) internal view { 112 | ``` 113 | 🤖️ Liquidation can be bypassed by calling buy on a loan that has already been liquidated. This allows users to purchase a loan that they should not be able to purchase, allowing them to steal funds from the borrower. 114 | ```solidity 115 | Loan storage loan = protocolStorage().loan[loanId]; 116 | 117 | if (block.timestamp < loan.endDate) { 118 | revert CollateralIsNotLiquidableYet(loan.endDate, loanId); 119 | } 120 | if (loan.payment.paid != 0 || loan.payment.liquidated) { 121 | ``` 122 | 🤖️ Liquidation of collateral does not update endDate, so users can take out a new loan in the old endDate, allowing them to take out multiple loans with the same collateral. 123 | ```solidity 124 | revert LoanAlreadyRepaid(loanId); 125 | } 126 | } 127 | ``` 128 | 🤖️ User’s loan can be maliciously reported as already repaid so that the user can’t repay the loan again. This allows users to take NFTs out of their account multiple times. 
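The per-segment comments above come from cutting the contract into chunks of a few lines and asking the model about each chunk on its own. A minimal sketch of such a splitter follows, loosely modelled on `parse_solidity` in `auditing.ipynb`. The function name `split_into_segments` and the `max_lines` parameter are hypothetical, and the brace counting is deliberately naive: it does not account for braces inside strings or comments.

```python
def split_into_segments(source: str, max_lines: int = 6) -> list[str]:
    """Split Solidity source into chunks of at most `max_lines` lines.

    Lines outside any `{ ... }` block (pragma, single-line imports) are
    skipped, so only contract bodies reach the model.
    """
    depth = 0                   # current brace nesting depth
    current: list[str] = []     # lines collected for the chunk in progress
    segments: list[str] = []
    for line in source.split("\n"):
        opened = line.count("{")
        closed = line.count("}")
        # Keep the line if we are already inside a block, or if it opens one.
        # `import {X} from "..."` opens and closes on the same line, so it
        # is treated as being outside any block.
        inside = depth > 0 or opened > closed
        depth = max(depth + opened - closed, 0)
        if not inside:
            continue
        current.append(line)
        if len(current) >= max_lines:
            segments.append("\n".join(current))
            current = []
    if current:
        # Flush whatever is left after the last full chunk.
        segments.append("\n".join(current))
    return segments
```

Each returned segment would then be passed to the fine-tuned model with the `"\n\nThe vulnerability is:"` suffix that `find_vulnerabilities` appends in the notebook. Small chunks keep prompts short, but they also cut functions mid-body, which may explain part of the false-positive count reported above.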
129 | -------------------------------------------------------------------------------- /auditing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 19, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# change the model name to your own model name!\n", 10 | "MODEL_NAME = \"davinci:ft-personal-2023-04-23-04-39-34\"" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import os\n", 20 | "import openai\n", 21 | "openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n", 22 | "\n", 23 | "def find_vulnerabilities(prompt):\n", 24 | " prompt = prompt.replace(\" \", \"\").replace(\"\\t\", \"\") + \"\\n\\nThe vulnerability is:\"\n", 25 | " response = openai.Completion.create( \n", 26 | " model=MODEL_NAME, \n", 27 | " prompt=prompt, \n", 28 | " temperature=0, \n", 29 | " max_tokens=1024, \n", 30 | " top_p=1, \n", 31 | " frequency_penalty=0.5, \n", 32 | " presence_penalty=0, stop=[\"\\n\", \" User:\", \" AI:\"] \n", 33 | " )\n", 34 | " return response[\"choices\"][0][\"text\"]" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 20, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "### https://github.com/sherlock-audit/2023-02-kairos/blob/main/kairos-contracts/src/AuctionFacet.sol\n", 44 | "contract = \"\"\"\n", 45 | "// SPDX-License-Identifier: UNLICENSED\n", 46 | "pragma solidity 0.8.18;\n", 47 | "\n", 48 | "import {IERC20} from \"@openzeppelin/contracts/token/ERC20/IERC20.sol\";\n", 49 | "\n", 50 | "import {IAuctionFacet} from \"./interface/IAuctionFacet.sol\";\n", 51 | "\n", 52 | "import {BuyArg, NFToken, Ray} from \"./DataStructure/Objects.sol\";\n", 53 | "import {Loan, Protocol, Provision, SupplyPosition} from \"./DataStructure/Storage.sol\";\n", 54 | "import {RayMath} from \"./utils/RayMath.sol\";\n", 55 | "import 
{Erc20CheckedTransfer} from \"./utils/Erc20CheckedTransfer.sol\";\n", 56 | "import {SafeMint} from \"./SupplyPositionLogic/SafeMint.sol\";\n", 57 | "import {protocolStorage, supplyPositionStorage, ONE, ZERO} from \"./DataStructure/Global.sol\";\n", 58 | "// solhint-disable-next-line max-line-length\n", 59 | "import {LoanAlreadyRepaid, CollateralIsNotLiquidableYet} from \"./DataStructure/Errors.sol\";\n", 60 | "\n", 61 | "/// @notice handles sale of collaterals being liquidated, following a dutch auction starting at repayment date\n", 62 | "contract AuctionFacet is IAuctionFacet, SafeMint {\n", 63 | " using RayMath for Ray;\n", 64 | " using RayMath for uint256;\n", 65 | " using Erc20CheckedTransfer for IERC20;\n", 66 | "\n", 67 | " /// @notice buy one or multiple NFTs in liquidation\n", 68 | " /// @param args arguments on what and how to buy\n", 69 | " function buy(BuyArg[] memory args) external {\n", 70 | " for (uint256 i = 0; i < args.length; i++) {\n", 71 | " useLoan(args[i]);\n", 72 | " }\n", 73 | " }\n", 74 | "\n", 75 | " /// @notice gets the price to buy the underlying collateral of the loan\n", 76 | " /// @param loanId identifier of the loan\n", 77 | " /// @return price computed price\n", 78 | " function price(uint256 loanId) public view returns (uint256) {\n", 79 | " Loan storage loan = protocolStorage().loan[loanId];\n", 80 | " uint256 loanEndDate = loan.endDate;\n", 81 | " uint256 timeSinceLiquidable = block.timestamp - loanEndDate;\n", 82 | "\n", 83 | " checkLoanStatus(loanId);\n", 84 | "\n", 85 | " /* the decreasing factor controls the evolution of the price from its initial value to 0 (and staying at 0)\n", 86 | " over the course of the auction duration */\n", 87 | " Ray decreasingFactor = timeSinceLiquidable >= loan.auction.duration\n", 88 | " ? 
ZERO\n", 89 | " : ONE.sub(timeSinceLiquidable.div(loan.auction.duration));\n", 90 | "\n", 91 | " /* the estimated value arises from the mean of the loan offer loanToValues used in the loan regarding their\n", 92 | " share in the collateral usage. This must stay consitent even if less than the full value of the NFT has been\n", 93 | " used as collateral */\n", 94 | " uint256 estimatedValue = loan.lent.mul(ONE.div(loan.shareLent));\n", 95 | "\n", 96 | " /* by mutliplying the estimated price by some factor and slowly decreasing this price over time we aim to\n", 97 | " make sure a liquidator will buy the NFT at fair market price. */\n", 98 | " return estimatedValue.mul(loan.auction.priceFactor).mul(decreasingFactor);\n", 99 | " }\n", 100 | "\n", 101 | " /// @notice handles buying one NFT\n", 102 | " /// @param arg arguments on what and how to buy\n", 103 | " function useLoan(BuyArg memory arg) internal {\n", 104 | " Loan storage loan = protocolStorage().loan[arg.loanId];\n", 105 | "\n", 106 | " checkLoanStatus(arg.loanId);\n", 107 | " uint256 toPay = price(arg.loanId);\n", 108 | "\n", 109 | " /* store as liquidated and paid before transfers to avoid malicious reentrency, following\n", 110 | " checks-effects-interaction pattern */\n", 111 | " loan.payment.liquidated = true;\n", 112 | " loan.payment.paid = toPay;\n", 113 | " loan.assetLent.checkedTransferFrom(msg.sender, address(this), toPay);\n", 114 | " loan.collateral.implem.safeTransferFrom(address(this), arg.to, loan.collateral.id);\n", 115 | "\n", 116 | " emit Buy(arg.loanId, abi.encode(arg));\n", 117 | " }\n", 118 | "\n", 119 | " /// @notice checks that loan is liquidable, revert if not\n", 120 | " /// @param loanId identifier of the loan\n", 121 | " function checkLoanStatus(uint256 loanId) internal view {\n", 122 | " Loan storage loan = protocolStorage().loan[loanId];\n", 123 | "\n", 124 | " if (block.timestamp < loan.endDate) {\n", 125 | " revert CollateralIsNotLiquidableYet(loan.endDate, loanId);\n", 126 | " 
}\n", 127 | " if (loan.payment.paid != 0 || loan.payment.liquidated) {\n", 128 | " revert LoanAlreadyRepaid(loanId);\n", 129 | " }\n", 130 | " }\n", 131 | "}\n", 132 | "\"\"\"" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 21, 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [ 141 | "def parse_solidity(content):\n", 142 | " paranthesis = 0\n", 143 | " current_contract = []\n", 144 | " contract_segments = []\n", 145 | " for i in content.split(\"\\n\"):\n", 146 | " if \"{\" in i:\n", 147 | " paranthesis += 1\n", 148 | " if \"}\" in i:\n", 149 | " paranthesis -= 1\n", 150 | " if paranthesis != 0:\n", 151 | " current_contract.append(i)\n", 152 | " if len(current_contract) > 5:\n", 153 | " contract_segments.append(\"\\n\".join(current_contract))\n", 154 | " current_contract = []\n", 155 | " if len(current_contract)>1:\n", 156 | " contract_segments.append(\"\\n\".join(current_contract))\n", 157 | " return contract_segments\n", 158 | "\n", 159 | "\n" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 22, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "name": "stdout", 169 | "output_type": "stream", 170 | "text": [ 171 | "==================================================\n", 172 | "contract AuctionFacet is IAuctionFacet, SafeMint {\n", 173 | " using RayMath for Ray;\n", 174 | " using RayMath for uint256;\n", 175 | " using Erc20CheckedTransfer for IERC20;\n", 176 | "\n", 177 | " /// @notice buy one or multiple NFTs in liquidation\n", 178 | "🤖️ the method buy() doesn’t check that the sender of the transaction is the winner of the auction. 
This allows the attacker to steal all of the funds in the contract.\n", 179 | "==================================================\n", 180 | "==================================================\n", 181 | " /// @param args arguments on what and how to buy\n", 182 | " function buy(BuyArg[] memory args) external {\n", 183 | " for (uint256 i = 0; i < args.length; i++) {\n", 184 | " useLoan(args[i]);\n", 185 | " }\n", 186 | " }\n", 187 | "🤖️ malicious user can buy tokens for the lender multiple times, draining their stake.\n", 188 | "==================================================\n", 189 | "==================================================\n", 190 | "\n", 191 | " /// @notice gets the price to buy the underlying collateral of the loan\n", 192 | " /// @param loanId identifier of the loan\n", 193 | " /// @return price computed price\n", 194 | " function price(uint256 loanId) public view returns (uint256) {\n", 195 | " Loan storage loan = protocolStorage().loan[loanId];\n", 196 | "🤖️ the method price() returns the wrong price of the collateral ETH when the loan is in the Erased state. This allows a malicious lender to buy back the collateral ETH at a low price before it is erased from their wallet.\n", 197 | "==================================================\n", 198 | "==================================================\n", 199 | " uint256 loanEndDate = loan.endDate;\n", 200 | " uint256 timeSinceLiquidable = block.timestamp - loanEndDate;\n", 201 | "\n", 202 | " checkLoanStatus(loanId);\n", 203 | "\n", 204 | " /* the decreasing factor controls the evolution of the price from its initial value to 0 (and staying at 0)\n", 205 | "🤖️ Liquidable loans can be stuck in the endDate state, so that timeSinceLiquidable will always be greater than block.timestamp, and the borrower will not be able to pay back the loan before its end date. 
This can cause the borrower’s collateral to be stuck in the EndDate state forever, and can lead to a loss of principal for the lender.\n", 206 | "==================================================\n", 207 | "==================================================\n", 208 | " over the course of the auction duration */\n", 209 | " Ray decreasingFactor = timeSinceLiquidable >= loan.auction.duration\n", 210 | " ? ZERO\n", 211 | " : ONE.sub(timeSinceLiquidable.div(loan.auction.duration));\n", 212 | "\n", 213 | " /* the estimated value arises from the mean of the loan offer loanToValues used in the loan regarding their\n", 214 | "🤖️ the liquidable state is wrongfully calculated, causing the bidders to overpay for the loan.\n", 215 | "==================================================\n", 216 | "==================================================\n", 217 | " share in the collateral usage. This must stay consitent even if less than the full value of the NFT has been\n", 218 | " used as collateral */\n", 219 | " uint256 estimatedValue = loan.lent.mul(ONE.div(loan.shareLent));\n", 220 | "\n", 221 | " /* by mutliplying the estimated price by some factor and slowly decreasing this price over time we aim to\n", 222 | " make sure a liquidator will buy the NFT at fair market price. 
*/\n", 223 | "🤖️ the liquidator may not buy the collateral token for its fair value, but instead buy it for much less than its fair value, allowing the borrower to profit from the liquidator’s mistake.\n", 224 | "==================================================\n", 225 | "==================================================\n", 226 | " return estimatedValue.mul(loan.auction.priceFactor).mul(decreasingFactor);\n", 227 | " }\n", 228 | "\n", 229 | " /// @notice handles buying one NFT\n", 230 | " /// @param arg arguments on what and how to buy\n", 231 | " function useLoan(BuyArg memory arg) internal {\n", 232 | "🤖️ the user can lose their NFTs when they buy a loan.\n", 233 | "==================================================\n", 234 | "==================================================\n", 235 | " Loan storage loan = protocolStorage().loan[arg.loanId];\n", 236 | "\n", 237 | " checkLoanStatus(arg.loanId);\n", 238 | " uint256 toPay = price(arg.loanId);\n", 239 | "\n", 240 | " /* store as liquidated and paid before transfers to avoid malicious reentrency, following\n", 241 | "🤖️ Users can take the ETH in the collateral of a loan without paying back the loan.\n", 242 | "==================================================\n", 243 | "==================================================\n", 244 | " checks-effects-interaction pattern */\n", 245 | " loan.payment.liquidated = true;\n", 246 | " loan.payment.paid = toPay;\n", 247 | " loan.assetLent.checkedTransferFrom(msg.sender, address(this), toPay);\n", 248 | " loan.collateral.implem.safeTransferFrom(address(this), arg.to, loan.collateral.id);\n", 249 | "\n", 250 | "🤖️ the user’s NFT could be stuck in the contract when the user tries to liquidate the loan.\n", 251 | "==================================================\n", 252 | "==================================================\n", 253 | " emit Buy(arg.loanId, abi.encode(arg));\n", 254 | " }\n", 255 | "\n", 256 | " /// @notice checks that loan is liquidable, revert if not\n", 257 
| " /// @param loanId identifier of the loan\n", 258 | " function checkLoanStatus(uint256 loanId) internal view {\n", 259 | "🤖️ Liquidation can be bypassed by taking a new loan with the same collateral before the old one is liquidated. This allows users to take multiple loans in a single transaction, which is not possible when using the Smart Contract implementation.\n", 260 | "==================================================\n", 261 | "==================================================\n", 262 | " Loan storage loan = protocolStorage().loan[loanId];\n", 263 | "\n", 264 | " if (block.timestamp < loan.endDate) {\n", 265 | " revert CollateralIsNotLiquidableYet(loan.endDate, loanId);\n", 266 | " }\n", 267 | " if (loan.payment.paid != 0 || loan.payment.liquidated) {\n", 268 | "🤖️ Liquidation of collateral does not update endDate, so users can take out a loan with no endDate and no payment, allowing them to profit from the liquidation fee without paying it.\n", 269 | "==================================================\n", 270 | "==================================================\n", 271 | " revert LoanAlreadyRepaid(loanId);\n", 272 | " }\n", 273 | " }\n", 274 | "🤖️ User’s loan can be maliciously reported as already repaid so that the user can’t repay the loan again. 
This allows users to take NFTs out of their account multiple times.\n", 275 | "==================================================\n" 276 | ] 277 | } 278 | ], 279 | "source": [ 280 | "parsed = parse_solidity(contract)\n", 281 | "\n", 282 | "for i in parsed:\n", 283 | " print(\"=\" * 50)\n", 284 | " print(i)\n", 285 | " print(\"🤖️ \" + find_vulnerabilities(i))\n", 286 | " print(\"=\" * 50)" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [] 295 | } 296 | ], 297 | "metadata": { 298 | "kernelspec": { 299 | "display_name": "Python 3", 300 | "language": "python", 301 | "name": "python3" 302 | }, 303 | "language_info": { 304 | "codemirror_mode": { 305 | "name": "ipython", 306 | "version": 3 307 | }, 308 | "file_extension": ".py", 309 | "mimetype": "text/x-python", 310 | "name": "python", 311 | "nbconvert_exporter": "python", 312 | "pygments_lexer": "ipython3", 313 | "version": "3.10.8" 314 | }, 315 | "orig_nbformat": 4 316 | }, 317 | "nbformat": 4, 318 | "nbformat_minor": 2 319 | } 320 | -------------------------------------------------------------------------------- /fine_tune.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "attachments": {}, 5 | "cell_type": "markdown", 6 | "metadata": {}, 7 | "source": [ 8 | "## Crawl Code4rena reports from Solodit" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 160, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "import requests\n", 18 | "\n", 19 | "def get_solodit(page):\n", 20 | " headers = {\n", 21 | " 'authorization': 'Token 36dc738e703c50039f3e6f03ee696730c49c54cb', # <- replace with your own token! 
You can find it in the network tab of your browser\n", 22 | " 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36',\n", 23 | " }\n", 24 | "\n", 25 | " params = {\n", 26 | " 'source': 'Code4rena',\n", 27 | " 'impact': 'HIGH,MEDIUM',\n", 28 | " 'finder': '',\n", 29 | " 'protocol': '',\n", 30 | " 'report_date': '',\n", 31 | " 'min_quality_score': '0',\n", 32 | " 'min_general_score': '0',\n", 33 | " 'tags': '',\n", 34 | " 'bookmarked': 'False',\n", 35 | " 'keyword': '',\n", 36 | " 'page': page,\n", 37 | " }\n", 38 | "\n", 39 | " response = requests.get('https://solodit.xyz/api/issues/rest/', params=params, headers=headers)\n", 40 | " return response.json()\n" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 161, 46 | "metadata": {}, 47 | "outputs": [ 48 | { 49 | "data": { 50 | "text/plain": [ 51 | "124" 52 | ] 53 | }, 54 | "execution_count": 161, 55 | "metadata": {}, 56 | "output_type": "execute_result" 57 | } 58 | ], 59 | "source": [ 60 | "total_pages = get_solodit(1)[\"total_pages\"]\n", 61 | "total_pages" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 234, 67 | "metadata": {}, 68 | "outputs": [ 69 | { 70 | "name": "stderr", 71 | "output_type": "stream", 72 | "text": [ 73 | "/var/folders/wy/h6tpyrcn4szfs0598d0y5hhh0000gn/T/ipykernel_62742/2122698606.py:3: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0\n", 74 | "Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`\n", 75 | " for i in tqdm.tqdm_notebook(range(1, total_pages)):\n" 76 | ] 77 | }, 78 | { 79 | "data": { 80 | "application/vnd.jupyter.widget-view+json": { 81 | "model_id": "2acbbb3f231f43dc8d9dbd99f60db3d8", 82 | "version_major": 2, 83 | "version_minor": 0 84 | }, 85 | "text/plain": [ 86 | " 0%| | 0/123 [00:00name and symbol. It is possible it set them back to an empty string, uninitializing the contract and letting the initialize(..) 
function be called again. This way, the owner may,\n", 244 | "Error in 1727 ('code-423n4/2021-12-amun', 'main', 'contracts/basket/contracts/facets/ERC20/ERC20Facet.sol', 'L25-L28\">name and symbol. It is possible it set them back to an empty string, uninitializing the contract and letting the initialize(..) function be called again. This way, the owner may, for example, hide minting additional tokens. Or, after accidentally setting name and symbol to empty strings, anyone can take control over the contract and mint any number of tokens.

')\n" 245 | ] 246 | } 247 | ], 248 | "source": [ 249 | "training_data = []\n", 250 | "total_lens = 0\n", 251 | "\n", 252 | "for idx, i in tqdm.tqdm_notebook(enumerate(data)):\n", 253 | " all_code = \"\"\n", 254 | " try:\n", 255 | " locs, vuln = parse(i)\n", 256 | " except Exception as e:\n", 257 | " print(e)\n", 258 | " print(f\"Error in {idx}\")\n", 259 | " continue\n", 260 | " # if len(vuln) > 950:\n", 261 | " # print(f\"Error in {idx} {len(vuln)}\")\n", 262 | " # continue\n", 263 | " for loc in locs:\n", 264 | " repo, commit_hash, file_name, line_number = loc\n", 265 | " try:\n", 266 | " code = crawl(repo, commit_hash, file_name, line_number)\n", 267 | " all_code += code + \"\\n\"\n", 268 | " total_lens += len(vuln)\n", 269 | " except Exception as e:\n", 270 | " print(e)\n", 271 | " print(f\"Error in {idx} {loc}\")\n", 272 | " if len(all_code) > 0:\n", 273 | " training_data.append({\n", 274 | " \"text\": all_code,\n", 275 | " \"label\": vuln.strip(),\n", 276 | " })" 277 | ] 278 | }, 279 | { 280 | "attachments": {}, 281 | "cell_type": "markdown", 282 | "metadata": {}, 283 | "source": [ 284 | "## Remove Too Lengthy Code / Vulnerability" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 245, 290 | "metadata": {}, 291 | "outputs": [ 292 | { 293 | "data": { 294 | "text/plain": [ 295 | "1440" 296 | ] 297 | }, 298 | "execution_count": 245, 299 | "metadata": {}, 300 | "output_type": "execute_result" 301 | } 302 | ], 303 | "source": [ 304 | "removal = []\n", 305 | "for idx, i in enumerate(training_data):\n", 306 | " i[\"text\"] = i[\"text\"].replace(\"\\t\", \"\").replace(\" \", \"\")\n", 307 | " if len(i[\"text\"]) < 20:\n", 308 | " removal.append(idx)\n", 309 | " print(i)\n", 310 | "\n", 311 | "for i in removal[::-1]:\n", 312 | " del training_data[i]\n", 313 | "len(training_data)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 246, 319 | "metadata": {}, 320 | "outputs": [], 321 | "source": [ 322 | "import 
pickle\n", 323 | "with open(\"training_data.pkl\", \"wb\") as f:\n", 324 | " pickle.dump(training_data, f)" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": 248, 330 | "metadata": {}, 331 | "outputs": [ 332 | { 333 | "data": { 334 | "text/plain": [ 335 | "1260" 336 | ] 337 | }, 338 | "execution_count": 248, 339 | "metadata": {}, 340 | "output_type": "execute_result" 341 | } 342 | ], 343 | "source": [ 344 | "removal = []\n", 345 | "total_lens = []\n", 346 | "for i in training_data:\n", 347 | " if len(i[\"label\"] + i[\"text\"]) > 1000:\n", 348 | " removal.append(i)\n", 349 | " total_lens.append(len(i[\"label\"] + i[\"text\"]))\n", 350 | "\n", 351 | "for i in removal[::-1]:\n", 352 | " training_data.remove(i)\n", 353 | "len(training_data)" 354 | ] 355 | }, 356 | { 357 | "attachments": {}, 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "## Convert to OpenAI JSONL Format" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 251, 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [ 370 | "fp = open(\"fine_tune.txt\", \"w\")\n", 371 | "for i in training_data:\n", 372 | " fp.write(json.dumps(\n", 373 | " {\n", 374 | " \"prompt\": i[\"text\"],\n", 375 | " \"completion\": i[\"label\"],\n", 376 | " }\n", 377 | " ))\n", 378 | " fp.write(\"\\n\")\n", 379 | "fp.close()" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 252, 385 | "metadata": {}, 386 | "outputs": [ 387 | { 388 | "name": "stdout", 389 | "output_type": "stream", 390 | "text": [ 391 | "Upload progress: 100%|███████████████████████| 490k/490k [00:00<00:00, 522Mit/s]\n", 392 | "Uploaded file from ./fine_tune.txt: file-0JIZiNjjmOGfI9giyONlk42g\n", 393 | "Created fine-tune: ft-4HRl6BNwtdUTEMklsphePbcx\n", 394 | "Streaming events until fine-tuning is complete...\n", 395 | "\n", 396 | "(Ctrl-C will interrupt the stream, but not cancel the fine-tune)\n", 397 | "[2023-04-22 22:36:09] Created fine-tune: 
ft-4HRl6BNwtdUTEMklsphePbcx\n", 398 | "[2023-04-22 22:36:20] Fine-tune costs $15.97\n", 399 | "[2023-04-22 22:36:21] Fine-tune enqueued. Queue number: 0\n" 400 | ] 401 | } 402 | ], 403 | "source": [ 404 | "!openai api fine_tunes.create -t ./fine_tune.txt -m davinci" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 217, 410 | "metadata": {}, 411 | "outputs": [ 412 | { 413 | "name": "stdout", 414 | "output_type": "stream", 415 | "text": [ 416 | "[2023-04-22 21:32:52] Created fine-tune: ft-fgUWvKK6Jpb8ZscMxmMJ5Joi\n", 417 | "[2023-04-22 21:33:00] Fine-tune costs $1.20\n", 418 | "[2023-04-22 21:33:00] Fine-tune enqueued. Queue number: 2\n", 419 | "[2023-04-22 21:33:39] Fine-tune is in the queue. Queue number: 1\n", 420 | "[2023-04-22 21:34:32] Fine-tune is in the queue. Queue number: 0\n", 421 | "[2023-04-22 21:35:29] Fine-tune started\n", 422 | "[2023-04-22 21:37:28] Completed epoch 1/4\n", 423 | "[2023-04-22 21:37:59] Completed epoch 2/4\n", 424 | "[2023-04-22 21:38:29] Completed epoch 3/4\n", 425 | "[2023-04-22 21:39:00] Completed epoch 4/4\n", 426 | "[2023-04-22 21:39:34] Uploaded model: davinci:ft-personal-2023-04-23-04-39-34\n", 427 | "[2023-04-22 21:39:36] Uploaded result file: file-ufOV1uNei2tozTCcbTRztBds\n", 428 | "[2023-04-22 21:39:36] Fine-tune succeeded\n", 429 | "\n", 430 | "Job complete! 
Status: succeeded 🎉\n", 431 | "Try out your fine-tuned model:\n", 432 | "\n", 433 | "openai api completions.create -m davinci:ft-personal-2023-04-23-04-39-34 -p \n" 434 | ] 435 | } 436 | ], 437 | "source": [ 438 | "!openai api fine_tunes.follow -i ft-4HRl6BNwtdUTEMklsphePbcx # <- change this to your fine-tune id (starts with ft-)" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": {}, 445 | "outputs": [], 446 | "source": [ 447 | "MODEL_NAME = \"davinci:ft-personal-2023-04-23-04-39-34\" # <- change this to your model name from the previous step (starts with davinci:ft-)" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 229, 453 | "metadata": {}, 454 | "outputs": [], 455 | "source": [ 456 | "prompt = \"\"\"\n", 457 | "uint256 buyoutPrice = (msg.value * 100) /\n", 458 | " (100 - ((depositAmount * 100) / totalSupply));\n", 459 | "uint256 fractionPrice = buyoutPrice / totalSupply;\n", 460 | "\"\"\".replace(\"\\t\", \"\").replace(\" \", \"\")" 461 | ] 462 | }, 463 | { 464 | "attachments": {}, 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "```solidity\n", 469 | "uint256 buyoutPrice = (msg.value * 100) /\n", 470 | " (100 - ((depositAmount * 100) / totalSupply));\n", 471 | "uint256 fractionPrice = buyoutPrice / totalSupply;\n", 472 | "\n", 473 | "```" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 233, 479 | "metadata": {}, 480 | "outputs": [ 481 | { 482 | "name": "stdout", 483 | "output_type": "stream", 484 | "text": [ 485 | "Buyout logic is broken due to integer division. 
Attacker can buy tokens for much less than expected amount.\n" 486 | ] 487 | } 488 | ], 489 | "source": [ 490 | "import os\n", 491 | "import openai\n", 492 | "openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n", 493 | "response = openai.Completion.create( model=MODEL_NAME, prompt=prompt, temperature=0, max_tokens=1024, top_p=1, frequency_penalty=0.5, presence_penalty=0, stop=[\"\\n\", \" User:\", \" AI:\"] )\n", 494 | "print(response[\"choices\"][0][\"text\"])" 495 | ] 496 | } 497 | ], 498 | "metadata": { 499 | "kernelspec": { 500 | "display_name": "Python 3 (ipykernel)", 501 | "language": "python", 502 | "name": "python3" 503 | }, 504 | "language_info": { 505 | "codemirror_mode": { 506 | "name": "ipython", 507 | "version": 3 508 | }, 509 | "file_extension": ".py", 510 | "mimetype": "text/x-python", 511 | "name": "python", 512 | "nbconvert_exporter": "python", 513 | "pygments_lexer": "ipython3", 514 | "version": "3.10.8" 515 | }, 516 | "orig_nbformat": 4 517 | }, 518 | "nbformat": 4, 519 | "nbformat_minor": 2 520 | } 521 | -------------------------------------------------------------------------------- /training_data.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fuzzland/audit_gpt/348516321b61abf65c6cde95a4513b5bff5775c4/training_data.pkl --------------------------------------------------------------------------------
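The last two data-preparation steps in `fine_tune.ipynb` drop examples whose code plus label exceed 1000 characters, then write one `prompt`/`completion` JSON object per line. The sketch below combines both steps in a single function, assuming records shaped like the `{"text": ..., "label": ...}` dictionaries built in the notebook. The name `to_jsonl` and the sample records are hypothetical.

```python
import json

MAX_CHARS = 1000  # same cut-off the notebook applies to len(label + text)


def to_jsonl(records, max_chars=MAX_CHARS):
    """Return JSONL text with one prompt/completion pair per kept record."""
    lines = []
    for rec in records:
        # Skip examples that are too long to be useful training samples.
        if len(rec["label"] + rec["text"]) > max_chars:
            continue
        lines.append(json.dumps({"prompt": rec["text"], "completion": rec["label"]}))
    # End with a newline so more records can be appended later.
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    sample = [
        {"text": "uint256 x = a / b * c;", "label": "Division before multiplication loses precision."},
        {"text": "x" * 2000, "label": "too long, dropped"},
    ]
    print(to_jsonl(sample), end="")
```

Building the text in memory and writing it in one step, rather than calling `fp.write` per record on a manually opened file, keeps the filter and the serialisation easy to test without touching the filesystem.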