├── .DS_Store ├── .gitignore ├── LICENSE ├── README.md ├── demogifs ├── 1List.gif ├── 2SearchAndDownload.gif └── 3DLSingleContract.gif ├── lib ├── github_downloader.py └── helper.py ├── requirements.txt └── scrapyfi.py /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratraut/scrapyFi/c7f2515f1b3c2a3d60a22a3acafb5ab632252c7a/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | downloaded_contracts/ 2 | venv/ 3 | tmp/ 4 | *.DS_Store 5 | *.notes 6 | .gitpod.yml 7 | *.vscode 8 | lib/__pycache__ 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 savi0ur 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # scrapyFi 2 | ``` 3 | '||''''| || 4 | .... .... ... .. .... ... ... .... ... || . ... 5 | ||. ' .| '' ||' '' '' .|| ||' || '|. | ||''| || 6 | . '|.. || || .|' || || | '|.| || || 7 | |'..|' '|...' .||. '|..'|' ||...' '| .||. .||. 8 | || .. | 9 | '''' '' 10 | ``` 11 | Scraper for Immunefi. It helps you perform the following tasks: 12 | 1. List all the projects from Immunefi with basic details in tabular form. 13 | 2. Query a particular project by its name and list its basic details along with all smart contract links. It also lets you download all those contracts. 14 | 3. Download all contracts from the provided links. 15 | 4. If a given contract is a proxy, it also downloads its implementation contracts. 16 | 17 | ## Requirement 18 | Python 3.9+ 19 | 20 | ## Supported Platform 21 | [Immunefi](https://immunefi.com/explore/) 22 | 23 | ### Supported blockchain scanners for downloading contracts 24 | * [https://etherscan.io/](https://etherscan.io/) 25 | * [https://goerli.etherscan.io/](https://goerli.etherscan.io/) 26 | * [https://polygonscan.com/](https://polygonscan.com/) 27 | * [https://mumbai.polygonscan.com/](https://mumbai.polygonscan.com/) 28 | * [https://bscscan.com/](https://bscscan.com/) 29 | * [https://testnet.bscscan.com/](https://testnet.bscscan.com/) 30 | * [https://blockscout.com/](https://blockscout.com/) 31 | * [https://aurorascan.dev/](https://aurorascan.dev/) 32 | 33 | ## Usage 34 | ``` 35 | $ scrapyfi.py [-h] [-t TIMEOUT] {list,search,download} ... 
36 | 37 | positional arguments: 38 | {list,search,download} 39 | Commands 40 | list List programs 41 | search Search programs 42 | download Download code from link 43 | 44 | optional arguments: 45 | -h, --help show this help message and exit 46 | -t TIMEOUT, --timeout TIMEOUT 47 | timeout for each request in seconds (default: 10 sec) 48 | ``` 49 | **Default download folder is** `$(PWD)/downloaded_contracts/` 50 | 51 | ### Details of List option 52 | ``` 53 | $ scrapyfi.py list [-h] [-lcl] [-lgl] [-lol] [-ltl] [-ltc] [-t] 54 | 55 | -lcl, --least-contract-link 56 | list project by least contract link 57 | -lgl, --least-github-link 58 | list project by least github link 59 | -lol, --least-other-link 60 | list project by least other link 61 | -ltl, --least-total-link 62 | list project by least total link 63 | -ltc, --least-total-contracts 64 | list project by least total contracts 65 | ``` 66 | **Note:** `-ltc` is very slow. 67 | 68 | ### Details of Search option 69 | ``` 70 | $ scrapyfi.py search [-h] -q QUERY [-d] [-f FILTER] 71 | 72 | -q QUERY, --query QUERY 73 | Query particular program by its name. Ex. MakerDAO 74 | -d, --download Download all contracts code from queried program 75 | -f FILTER, --filter FILTER 76 | Filter results of a queried program 77 | ``` 78 | 79 | ### Details of Download option 80 | ``` 81 | $ scrapyfi.py download [-h] [-fn FOLDER_NAME] links [links ...] 82 | 83 | -fn FOLDER_NAME, --folder-name FOLDER_NAME 84 | Folder name to store contracts 85 | links Download all contracts code from provided links (space separated) 86 | ``` 87 | 88 | ## Demo 89 | ### List 90 | ``` 91 | $ python3 scrapyfi.py list 92 | 93 | '||''''| || 94 | .... .... ... .. .... ... ... .... ... || . ... 95 | ||. ' .| '' ||' '' '' .|| ||' || '|. | ||''| || 96 | . '|.. || || .|' || || | '|.| || || 97 | |'..|' '|...' .||. '|..'|' ||...' '| .||. .||. 98 | || .. 
| 99 | '''' '' 100 | Author: savi0ur 101 | Helps in Searching and Downloading contracts of a program from Immunefi 102 | v1.0 103 | 104 | 122.7532970905304 seconds to process 291 projects. 105 | NOTE: Ignoring programs with zero contract links 106 | ┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ 107 | ┃ SN ┃ Project ┃ Reward ┃ Technologies ┃ #Links ┃ 108 | ┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ 109 | │ 1 │ Wormhole │ $10,000,000 │ Smart Contract|Websites and Applications|Blockchain/DLT │ 8 │ 110 | │ 2 │ MakerDAO │ $10,000,000 │ Smart Contract|Websites and Applications │ 131 │ 111 | │ 3 │ Aurora │ $6,000,000 │ Smart Contract|Websites and Applications │ 31 │ 112 | |[...]| [...] | [...] | [...] | [...] | 113 | │ 287 │ Pillar │ $1,250 │ Smart Contract │ 2 │ 114 | │ 288 │ CRO Max │ $1,000 │ Smart Contract │ 1 │ 115 | │ 289 │ Nuggies │ $1,000 │ Smart Contract │ 1 │ 116 | └─────┴────────────────────────┴─────────────┴─────────────────────────────────────────────────────────┴────────┘ 117 | ``` 118 | 119 | **CLI Demo** 120 | 121 | ![](./demogifs/1List.gif) 122 | 123 | ### Search 124 | ``` 125 | $ python3 scrapyfi.py search -q "wormhole" 126 | 127 | '||''''| || 128 | .... .... ... .. .... ... ... .... ... || . ... 129 | ||. ' .| '' ||' '' '' .|| ||' || '|. | ||''| || 130 | . '|.. || || .|' || || | '|.| || || 131 | |'..|' '|...' .||. '|..'|' ||...' '| .||. .||. 132 | || .. | 133 | '''' '' 134 | Author: savi0ur 135 | Helps in Searching and Downloading contracts of a program from Immunefi 136 | v1.0 137 | 138 | Searching for wormhole... 139 | 3.4474568367004395 seconds to process 1 projects. 
140 | NOTE: Ignoring programs with zero contract links 141 | ┏━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ 142 | ┃ SN ┃ Project ┃ Reward ┃ Technologies ┃ #Links ┃ 143 | ┡━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ 144 | │ 1 │ Wormhole │ $10,000,000 │ Smart Contract|Websites and Applications|Blockchain/DLT │ 8 │ 145 | └────┴──────────┴─────────────┴─────────────────────────────────────────────────────────┴────────┘ 146 | Links for Wormhole: 147 | GITHUB: 148 | 1. https://github.com/certusone/wormhole/tree/dev.v2/ethereum 149 | 2. https://github.com/certusone/wormhole/tree/dev.v2/solana 150 | 3. https://github.com/certusone/wormhole/tree/dev.v2/terra 151 | 4. https://github.com/certusone/wormhole/tree/dev.v2/sdk/rust 152 | 5. https://github.com/certusone/wormhole/tree/dev.v2/node 153 | 154 | CONTRACT: 155 | NO DATA FOUND 156 | 157 | 158 | OTHER: 159 | 1. https://docs.wormholenetwork.com/wormhole/contracts#mainnet 160 | 2. https://portalbridge.com 161 | 3. https://wormholenetwork.com/explorer/ 162 | ``` 163 | **CLI Demo** 164 | 165 | ![](./demogifs/2SearchAndDownload.gif) 166 | 167 | ### Download 168 | ``` 169 | $ python3 scrapyfi.py download -fn "Custom Download" https://etherscan.io/address/0x98f3c9e6E3fAce36bAAd05FE09d375Ef1464288B https://github.com/makerdao/dss/blob/master/src/dai.sol 170 | 171 | '||''''| || 172 | .... .... ... .. .... ... ... .... ... || . ... 173 | ||. ' .| '' ||' '' '' .|| ||' || '|. | ||''| || 174 | . '|.. || || .|' || || | '|.| || || 175 | |'..|' '|...' .||. '|..'|' ||...' '| .||. .||. 176 | || .. 
| 177 | '''' '' 178 | Author: savi0ur 179 | Helps in Searching and Downloading contracts of a program from Immunefi 180 | v1.0 181 | 182 | Downloading contract(s) from https://etherscan.io/address/0x98f3c9e6E3fAce36bAAd05FE09d375Ef1464288B#code: 183 | [#] Directory /Users/pratraut/Public/scrapyFi/downloaded_contracts/Custom Download/Wormhole_1 has been created. 184 | Contract address is 0x98f3c9e6E3fAce36bAAd05FE09d375Ef1464288B 185 | [+] Download Wormhole.sol... 186 | [+] Download ERC1967Proxy.sol... 187 | [+] Download ERC1967Upgrade.sol... 188 | [+] Download Proxy.sol... 189 | [+] Download IBeacon.sol... 190 | [+] Download Address.sol... 191 | [+] Download StorageSlot.sol... 192 | [#] Wormhole smart contract code downloaded successfully in /Users/pratraut/Public/scrapyFi/downloaded_contracts/Custom Download/Wormhole_1. 193 | 194 | [#] Downloading repo/files from https://github.com/makerdao/dss/blob/master/src/dai.sol: 195 | [#] Directory /Users/pratraut/Public/scrapyFi/downloaded_contracts/Custom Download/dss has been created. 196 | [+] Downloading dai.sol... 
197 | [#] File "dai.sol" downloaded successfully in /Users/pratraut/Public/scrapyFi/downloaded_contracts/Custom Download/dss 198 | ``` 199 | 200 | **CLI Demo** 201 | 202 | ![](./demogifs/3DLSingleContract.gif) 203 | -------------------------------------------------------------------------------- /demogifs/1List.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratraut/scrapyFi/c7f2515f1b3c2a3d60a22a3acafb5ab632252c7a/demogifs/1List.gif -------------------------------------------------------------------------------- /demogifs/2SearchAndDownload.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratraut/scrapyFi/c7f2515f1b3c2a3d60a22a3acafb5ab632252c7a/demogifs/2SearchAndDownload.gif -------------------------------------------------------------------------------- /demogifs/3DLSingleContract.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratraut/scrapyFi/c7f2515f1b3c2a3d60a22a3acafb5ab632252c7a/demogifs/3DLSingleContract.gif -------------------------------------------------------------------------------- /lib/github_downloader.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import os 3 | import re 4 | import sys 5 | 6 | headers = { 7 | "User-Agent" : "Mozilla/5.0 (X11; Ubuntu; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", 8 | } 9 | 10 | def safe_list_get(lst, idx, default): 11 | try: 12 | return lst[idx] 13 | except IndexError: 14 | return default 15 | 16 | def parse(url): 17 | regex = r'https?://(?:www\.)?github\.com(.+)' 18 | result = re.match(regex, url) 19 | if not result: 20 | print(f"Invalid URL : {url}") 21 | return 22 | 23 | # print(f"repo path = {result.group(1)}") 24 | repo_path = result.group(1) 25 | # print("repo path =", repo_path) 26 | 
split_path = repo_path.split('/') 27 | author = safe_list_get(split_path, 1, "") 28 | repository = safe_list_get(split_path, 2, "") 29 | branch = safe_list_get(split_path, 4, "") 30 | if not branch: 31 | branch = "master" 32 | remaining_path = "" 33 | else: 34 | remaining_path = repo_path[repo_path.index(branch) + len(branch) + 1:] 35 | 36 | # print(f"author = {author}, repo = {repository}, branch = {branch}, remaining path = {remaining_path}") 37 | return author, repository, branch, remaining_path 38 | 39 | def isFile(link): 40 | SMART_CONTRACT_EXTENSIONS = [".sol", ".ts", ".vy", ".rs"] 41 | for ext in SMART_CONTRACT_EXTENSIONS: 42 | if link.endswith(ext): 43 | return True 44 | return False 45 | 46 | def raw_download(link, proj_name=""): 47 | author, repository, branch, remaining_path = parse(link) 48 | # print(f"Author = {author}, Repository = {repository}, Branch = {branch}, Remaining path = {remaining_path}") 49 | is_file = isFile(link) 50 | new_link = "" 51 | if is_file: 52 | # https://github.com/makerdao/dss-cdp-manager/raw/master/src/DssCdpManager.sol 53 | new_link = f"https://github.com/{author}/{repository}/raw/{branch}/{remaining_path}" 54 | else: 55 | # https://github.com/makerdao/dss-cdp-manager/archive/refs/heads/master.zip 56 | new_link = f"https://github.com/{author}/{repository}/archive/refs/heads/{branch}.zip" 57 | 58 | # print("New link =", new_link) 59 | # print("Is File =", is_file) 60 | 61 | content = requests.head(new_link, headers=headers, timeout=int(os.environ['TIMEOUT']), allow_redirects=True) 62 | # print("Status code =", content.status_code) 63 | # print("Redirect URL =", content.url) 64 | 65 | redirect_url = content.url 66 | content = requests.get(redirect_url, headers=headers, allow_redirects=True, timeout=int(os.environ['TIMEOUT'])) 67 | # print("Content:", content.content) 68 | 69 | DOWNLOAD_PATH = os.path.join(os.getcwd(), "downloaded_contracts", proj_name) 70 | print(f"[#] Downloading repo/files from {link}:") 71 | path = 
os.path.join(DOWNLOAD_PATH, repository) 72 | if is_file: 73 | if not os.path.exists(path): 74 | os.makedirs(path) 75 | print(f"[#] Directory {path} has been created.") 76 | 77 | filename = remaining_path.split("/")[-1] 78 | if not os.path.exists(os.path.join(path, filename)): 79 | with open(os.path.join(path, filename), "wb"): 80 | pass 81 | with open(os.path.join(path, filename), "wb") as f: 82 | print(f"[+] Downloading {filename}...") 83 | f.write(content.content) 84 | print(f"[#] File \"{filename}\" downloaded successfully in {path}") 85 | else: 86 | # unzip the content 87 | from io import BytesIO 88 | from zipfile import ZipFile, Path 89 | with ZipFile(BytesIO(content.content)) as my_zip_file: 90 | for contained_file in my_zip_file.namelist(): 91 | temp_path = os.path.join(path, contained_file) 92 | # print("Path =", path) 93 | if not os.path.exists(temp_path): 94 | path_obj = Path(my_zip_file, at=contained_file) 95 | if path_obj.is_file(): 96 | # print("File :", contained_file) 97 | print(f"[+] Downloading {contained_file}...") 98 | with open(temp_path, "wb") as file_handle: 99 | file_handle.write(path_obj.read_bytes()) 100 | else: 101 | # print("Dir :", contained_file) 102 | os.makedirs(temp_path) 103 | print(f"[#] Repo \"{repository}\" downloaded successfully in {path}") 104 | 105 | def download_github(links, project_name=""): 106 | if not links: 107 | print(f"Github link list for \"{project_name}\" is empty.") 108 | return 109 | 110 | for link in links: 111 | raw_download(link, project_name) 112 | print() 113 | 114 | # LINK = ["https://github.com/makerdao/dss", "https://github.com/makerdao/dss-cdp-manager/blob/master/src/DssCdpManager.sol", "https://github.com/makerdao/dss-gem-joins/tree/v1.2"] 115 | 116 | # https://github.com/makerdao/dss -> https://github.com/makerdao/dss/archive/refs/heads/master.zip 117 | # https://github.com/makerdao/dss-cdp-manager/blob/master/src/DssCdpManager.sol -> 
https://github.com/makerdao/dss-cdp-manager/raw/master/src/DssCdpManager.sol 118 | # https://github.com/makerdao/dss-cdp-manager -> https://github.com/makerdao/dss-cdp-manager/archive/refs/heads/master.zip 119 | # https://github.com/makerdao/dss-gem-joins/tree/v1.2 -> https://github.com/makerdao/dss-gem-joins/archive/refs/heads/v1.2.zip 120 | 121 | #download_github(link, "dummy2") -------------------------------------------------------------------------------- /lib/helper.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import requests 4 | import html.parser 5 | import concurrent.futures 6 | import asyncio 7 | import aiohttp 8 | 9 | class Project(object): 10 | def __init__(self, **kwargs): 11 | self.id = kwargs['id'] 12 | self.project = kwargs['project'] 13 | self.date = kwargs['date'] 14 | self.maximum_reward = kwargs['maximum_reward'] 15 | self.technologies = kwargs['technologies'] 16 | self.kyc = kwargs['kyc'] 17 | self.assets_in_scope = kwargs['assets_in_scope'] 18 | self.url = kwargs['url'] 19 | self.num_contracts = kwargs['num_contracts'] 20 | 21 | # ABI for the implementation contract at 0xd9db270c1b5e3bd161e8c8503c55ceabee709552, likely using a custom proxy implementation.
22 | # Learn more about proxy contracts in our Knowledge Base 23 | #
24 | class EtherscanPattern: 25 | CODE = r"(.*?)" 26 | FILENAME = r"File \d+ of \d+\s*:\s*(.*?)" 27 | CONTRACT_NAME = r"Contract Name.*?(.*?)" 28 | IMPLEMENTATION_ADDRESS = r"[.*?(.*?)|.*?Minimal\s+Proxy\s+Contract.*?.*?(.*?)]" 29 | 30 | class GoerliEtherscanPattern: 31 | CODE = r"(.*?)" 32 | FILENAME = r"File \d+ of \d+\s*:\s*(.*?)" 33 | CONTRACT_NAME = r"Contract Name.*?(.*?)" 34 | IMPLEMENTATION_ADDRESS = r"[.*?(.*?)|.*?Minimal\s+Proxy\s+Contract.*?.*?(.*?)]" 35 | 36 | class MumbaiPolygonscanPattern: 37 | CODE = r"(.*?)" 38 | FILENAME = r"File \d+ of \d+\s*:\s*(.*?)" 39 | CONTRACT_NAME = r"Contract Name.*?(.*?)" 40 | IMPLEMENTATION_ADDRESS = r"[.*?(.*?)|.*?Minimal\s+Proxy\s+Contract.*?.*?(.*?)]" 41 | 42 | class PolygonscanPattern: 43 | CODE = r"(.*?)" 44 | FILENAME = r"File \d+ of \d+\s*:\s*(.*?)" 45 | CONTRACT_NAME = r"Contract Name.*?(.*?)" 46 | IMPLEMENTATION_ADDRESS = r"[.*?(.*?)|.*?Minimal\s+Proxy\s+Contract.*?.*?(.*?)]" 47 | 48 | class TestnetBscscanPattern: 49 | CODE = r"(.*?)" 50 | FILENAME = r"File \d+ of \d+\s*:\s*(.*?)" 51 | CONTRACT_NAME = r"Contract Name.*?(.*?)" 52 | IMPLEMENTATION_ADDRESS = r"[.*?(.*?)|.*?Minimal\s+Proxy\s+Contract.*?.*?(.*?)]" 53 | 54 | class BscscanPattern: 55 | CODE = r"(.*?)" 56 | FILENAME = r"File \d+ of \d+\s*:\s*(.*?)" 57 | CONTRACT_NAME = r"Contract Name.*?(.*?)" 58 | IMPLEMENTATION_ADDRESS = r"[.*?(.*?)|.*?Minimal\s+Proxy\s+Contract.*?.*?(.*?)]" 59 | 60 | class BlockscoutPattern: 61 | CODE = r"