├── .gitattributes ├── .idea ├── .gitignore ├── address_labels_github.iml ├── inspectionProfiles │ ├── Project_Default.xml │ └── profiles_settings.xml ├── misc.xml └── modules.xml ├── README.md ├── code ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-36.pyc │ ├── chain.cpython-36.pyc │ ├── sqlite.cpython-36.pyc │ └── util.cpython-36.pyc ├── chain.py ├── sqlite.py └── util.py ├── data └── addresses.db ├── driver.py ├── frontend └── db_browse.py └── requirements.txt /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /.idea/.gitignore: -------------------------------------------------------------------------------- 1 | # Default ignored files 2 | /shelf/ 3 | /workspace.xml 4 | -------------------------------------------------------------------------------- /.idea/address_labels_github.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /.idea/inspectionProfiles/Project_Default.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 12 | -------------------------------------------------------------------------------- /.idea/inspectionProfiles/profiles_settings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 6 | -------------------------------------------------------------------------------- /.idea/misc.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | -------------------------------------------------------------------------------- /.idea/modules.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Blockchain Address Ownership Database 2 | Update 01/09/2023: 3 | I've rescraped the data and remade the database. It's now too large for github, and can be downloaded from here: https://drive.google.com/file/d/1rEFloDWxmFjdb3SenSvrSsQk3g0-3RNk/view?usp=sharing 4 | Currently the database has: 5 | * ~1.3M ETH addresses 6 | * ~600K BSC addresses 7 | * ~320K Polygon addresses 8 | * ~54K Fantom addresses 9 | * ~5K Arbitrum addresses 10 | * ~3K Optimism addresses 11 | 12 | And also some addresses for Cronos, Gnossis, Aurora, Boba, HECO, CELO, Moonbeam, Moonriver 13 | 14 | I have not updated the code. The scanners are too finicky and I don't want to spend time maintaining it. Let me know if you have any questions. 15 | 16 | [old info below] 17 | The database is in data/addresses.db 18 | 19 | This is a SQLite database of addresses from several blockchains. It's obtained by page scraping & API-querying etherscan's team scanners. For each address it has some subset of [name tag, labels, ownership entity]. I page scraped each labelcloud (i.e. etherscan.io/labelcloud) for all labels, and then got all the addresses for each labels with their name tags. Additionally, I downloaded all addresses created by each labeled contract deployer and factory contract (using the scanner's API) 20 | 21 | Exceptions: I didn't download all the million of shitcoin pools from uniswap and pancakeswap, only pools with at least 10 transactions. Also, etherscan-like scanners only allow for first 10000 transactions to be accessed via API, so children addresses will be missing from especially prolific deployers and factories. 22 | 23 | The database will not update automatically. I may occasionally update it. You can create your own copy with the code; it requires BeautifulSoup for scraping, and cookie and API key from each scanner for the blockchains you want. If you're doing ETH or BSC it will take several hours. 24 | 25 | Currently the database has: 26 | * ~160K ETH addresses 27 | * ~77K BSC addresses 28 | * ~9K Polygon addresses 29 | * ~13K HECO addresses 30 | * ~2K Fantom addresses 31 | * ~0.5K HSC addresses 32 | 33 | -------------------------------------------------------------------------------- /code/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__init__.py -------------------------------------------------------------------------------- /code/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/chain.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/chain.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/sqlite.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/sqlite.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/util.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/util.cpython-36.pyc -------------------------------------------------------------------------------- /code/chain.py: -------------------------------------------------------------------------------- 1 | import json 2 | import requests 3 | import time 4 | from .util import log 5 | import re 6 | from bs4 import BeautifulSoup 7 | import traceback 8 | 9 | class Chain: 10 | def __init__(self,db,name,explorer_domain,main_asset, api_key, cookie, primary_swap=None, ignore_labels=()): 11 | self.explorer_domain = explorer_domain 12 | self.api_url = 'https://api.'+explorer_domain+'/api' 13 | 14 | self.main_asset = main_asset 15 | self.api_key = api_key 16 | self.cookie = cookie 17 | self.name=name 18 | self.db = db 19 | self.primary_swap = primary_swap 20 | self.ignore_labels = ignore_labels 21 | 22 | # self.db.create_table(self.name+"_addresses", 'address PRIMARY KEY, deployer, entity', drop=False) 23 | # self.db.create_index(self.name + "_addresses_idx_1", self.name+"_addresses", 'entity') 24 | 25 | self.db.create_table(self.name + "_addresses", 'address PRIMARY KEY, tag, ancestor_address, entity', drop=False) 26 | self.db.create_table(self.name + "_labels", 'address, label', drop=False) 27 | self.db.create_index(self.name + "_addresses_idx_1", self.name+"_addresses", 'entity') 28 | self.db.create_index(self.name + "_labels_idx_1", self.name + "_labels", 'address, label', unique=True) 29 | 30 | self.load_all_from_db() 31 | 32 | def get_label_list(self): 33 | url = 'https://' + self.explorer_domain + '/labelcloud' 34 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36', 35 | 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 36 | 'cache-control': 'max-age=0', 37 | 'cookie': self.cookie} 38 | cont = requests.get(url, headers=headers).content 39 | html = cont.decode('utf-8') 40 | soup = BeautifulSoup(html, features="lxml") 41 | dropdowns = soup.find_all('div',class_='dropdown') 42 | 43 | accounts_labels = [] 44 | for entry in dropdowns: 45 | label_url = entry.find('button')['data-url'] 46 | label = entry.find('span').contents[0] 47 | sections = entry.find_all('a') 48 | # print(label, sections) 49 | for section in sections: 50 | tp = section.contents[-1] 51 | if 'Accounts' in tp: 52 | accounts_labels.append((label.strip(),label_url)) 53 | break 54 | return accounts_labels 55 | 56 | 57 | 58 | 59 | def extract_entity(self,tag): 60 | if ':' in tag: 61 | row_entity = tag[:tag.index(':')].upper() 62 | else: 63 | tag_parts = tag.split(' ') 64 | if tag_parts[-1].isdigit(): 65 | row_entity = ' '.join(tag_parts[:-1]).upper() 66 | else: 67 | row_entity = tag.upper() 68 | return row_entity 69 | 70 | # def download_labeled(self,label, subcatid='undefined',entity=None, page_size=10000): 71 | # db_table = self.name+"_addresses" 72 | # # self.db.create_table(db_table, 'address PRIMARY KEY, label, tag, entity', drop=False) 73 | # # self.db.create_index(self.name+"_"+table+"_idx_1", db_table, 'entity') 74 | # offset = 0 75 | # page_idx = 0 76 | # done = False 77 | # while not done: 78 | # url = 'https://'+self.explorer_domain+'/accounts/label/'+label+'?subcatid='+subcatid+'&size='+str(page_size)+'&start='+str(offset)+'&col=1&order=asc' 79 | # print(page_idx,offset,url) 80 | # headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36', 81 | # 'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 82 | # 'cache-control':'max-age=0', 83 | # 'cookie':self.cookie} 84 | # cont = requests.get(url,headers=headers).content 85 | # # print(cont) 86 | # html = cont.decode('utf-8') 87 | # soup = BeautifulSoup(html,features="lxml") 88 | # if subcatid == 'undefined': 89 | # table = soup.find("table", class_="table-hover") 90 | # else: 91 | # table = soup.find("table", id="table-subcatid-"+subcatid) 92 | # # print(table) 93 | # try: 94 | # rows = table.find_all("tr") 95 | # except: 96 | # print('EXCEPTION',traceback.format_exc()) 97 | # print(html) 98 | # exit(0) 99 | # print('rows',len(rows)) 100 | # for row in rows: 101 | # # print(row) 102 | # cells = row.find_all("td") 103 | # # print(len(cells)) 104 | # if len(cells) == 4: 105 | # address = cells[0].find("a").contents[0] 106 | # tag = cells[1].contents 107 | # if len(tag) == 1: 108 | # tag = tag[0] 109 | # if entity is None: 110 | # row_entity = self.extract_entity(tag) 111 | # else: 112 | # row_entity = entity 113 | # # print(address,tag,row_entity) 114 | # self.db.insert_kw(db_table,values = [address,tag,None,row_entity]) 115 | # self.db.insert_kw(self.name+"_labels", values=[address, label]) 116 | # self.db.commit() 117 | # offset += page_size 118 | # page_idx += 1 119 | # 120 | # # done=True 121 | # if len(rows) < page_size: 122 | # done = True 123 | # else: 124 | # print("sleeping") 125 | # time.sleep(30) 126 | 127 | 128 | def download_labeled_post(self,label, label_url): 129 | db_table = self.name + "_addresses" 130 | # label_rep = label.lower().replace(' ','-').replace('.','-') 131 | url = 'https://' + self.explorer_domain + '/accounts/label/' + label_url 132 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36', 133 | 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 134 | 'cache-control': 'max-age=0', 135 | 'cookie': self.cookie} 136 | cont = requests.get(url, headers=headers).content 137 | soup = BeautifulSoup(cont,features="lxml") 138 | header = soup.find('div',class_='card-header') 139 | if header is None: 140 | subcats = ['undefined'] 141 | else: 142 | subcats = [] 143 | subcat_els = header.find_all('a') 144 | for subcat_el in subcat_els: 145 | subcat = subcat_el['val'] 146 | subcats.append(subcat) 147 | log(label,'subcats',subcats) 148 | 149 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36', 150 | 'cache-control': 'max-age=0', 151 | # 'referer':'https://etherscan.io/accounts/label/balancer-vested-shareholders', 152 | 'accept': 'application/json, text/javascript, */*; q=0.01', 153 | 'content-type': 'application/json', 154 | 'cookie': self.cookie} 155 | page_size = 100 #it won't give more than 100 156 | for subcat in subcats: 157 | log('subcat',subcat) 158 | done = False 159 | start = 0 160 | inserted = 0 161 | while not done: 162 | 163 | payload = {"dataTableModel":{"draw":2,"columns":[{"data":"address","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"nameTag","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"balance","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}},{"data":"txnCount","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}}],"order":[{"column":3,"dir":"desc"}],"start":start,"length":page_size,"search":{"value":"","regex":False}}, 164 | "labelModel":{"label":label_url}} 165 | if subcat != 'undefined': 166 | payload['labelModel']['subCategoryId'] = subcat 167 | 168 | url = 'https://' + self.explorer_domain + '/accounts.aspx/GetTableEntriesBySubLabel' 169 | payload = json.dumps(payload) 170 | # print(payload) 171 | time.sleep(0.25) 172 | resp = requests.post(url, payload, headers=headers) 173 | js = json.loads(resp.content.decode('utf-8')) 174 | data = js['d']['data'] 175 | if len(data) < page_size: 176 | done = True 177 | # print(resp.content) 178 | # pprint.pprint(js) 179 | txncount = None 180 | for entry in data: 181 | address = entry['address'] 182 | match = re.search('0x[a-f0-9]{40}',address) 183 | address = match.group() 184 | if address: 185 | 186 | txncount = int(entry['txnCount'].replace(',','')) 187 | if label.lower() == self.primary_swap and txncount < 10: 188 | done = True 189 | 190 | nametag = None 191 | row_entity = None 192 | if 'nameTag' in entry and len(entry['nameTag']) > 0: 193 | nametag = entry['nameTag'] 194 | row_entity = self.extract_entity(nametag) 195 | 196 | self.db.insert_kw(db_table, values=[address, nametag, None, row_entity]) 197 | self.db.insert_kw(self.name + "_labels", values=[address, label]) 198 | inserted += 1 199 | 200 | log('label',label,'sub',subcat,'start',start,'inserted',inserted, 'last txncount',txncount) 201 | self.db.commit() 202 | 203 | 204 | start += page_size 205 | 206 | def store_all_labels_to_db(self, start = None): 207 | label_list = self.get_label_list() 208 | for label_name, label_url in sorted(label_list): 209 | if start is not None and label_name < start: 210 | continue 211 | if 1: 212 | if label_name in self.ignore_labels: 213 | log("Ignoring label",label_name) 214 | else: 215 | log("Processing label",label_name,label_url) 216 | self.download_labeled_post(label_name,label_url) 217 | 218 | 219 | def load_all_from_db(self): 220 | self.addresses = {} 221 | rows = self.db.select("SELECT * FROM "+self.name+"_addresses") 222 | log(self.name,str(len(rows))+" addresses currently in the database") 223 | for row in rows: 224 | address, tag, ancestor, entity = row 225 | self.addresses[address] = entity 226 | 227 | 228 | 229 | 230 | def get_spawns(self,address, ancestor, entity, level=0, deep=False): 231 | time.sleep(0.25) 232 | offset = ' ' * level 233 | log(offset, address, entity) 234 | if deep and level > 0: 235 | self.db.insert_kw(self.name + "_addresses", values=[address, None, ancestor, entity], ignore=True) 236 | url = self.api_url + "?module=account&action=txlist&address=" + address + "&page=1&sort=asc&apikey=" + self.api_key + "&offset=10000" 237 | resp = requests.get(url).json() 238 | cnt = 0 239 | cnt_int = 0 240 | data1 = resp['result'] 241 | for transaction in data1: 242 | if transaction['to'] == '' and len(transaction['input']) > 2 and transaction['value'] == '0' and transaction['from'].lower() == address and transaction['isError'] == '0': 243 | spawn = transaction['contractAddress'].lower() 244 | if spawn not in self.addresses: 245 | if deep: 246 | cnt_sub_1, cnt_sub_2, _, _ = self.get_spawns(spawn, ancestor, entity, level=level+1, deep=True) 247 | cnt += cnt_sub_1 248 | cnt_int += cnt_sub_2 249 | else: 250 | self.db.insert_kw(self.name + "_addresses", values=[spawn, None, address, entity], ignore=True) 251 | cnt += 1 252 | 253 | time.sleep(0.25) 254 | url = self.api_url + "?module=account&action=txlistinternal&address=" + address + "&page=1&sort=asc&apikey=" + self.api_key + "&offset=10000" 255 | resp = requests.get(url).json() 256 | data2 = resp['result'] 257 | ld2 = 0 258 | if data2 is not None: 259 | ld2 = len(data2) 260 | for transaction in data2: 261 | if 'create' in transaction['type'] and transaction['to'] == "": 262 | spawn = transaction['contractAddress'].lower() 263 | if spawn not in self.addresses: 264 | if deep: 265 | cnt_sub_1, cnt_sub_2, _, _ = self.get_spawns(spawn, ancestor, entity, level=level+1, deep=True) 266 | cnt += cnt_sub_1 267 | cnt_int += cnt_sub_2 268 | else: 269 | self.db.insert_kw(self.name + "_addresses", values=[spawn, None, address, entity]) 270 | cnt_int += 1 271 | 272 | return cnt, cnt_int, len(data1),ld2 273 | # if spawn not in self.deployers: 274 | # self.get_spawns(spawn, entity, level=level+1) 275 | 276 | 277 | 278 | def load_from_db_by_label(self,label): 279 | res = {} 280 | Q = "select a.* from "+self.name+"_addresses as a, "+self.name+"_labels as l WHERE a.address = l.address and l.label = '"+label+"'" 281 | # print(Q) 282 | rows = self.db.select(Q) 283 | for row in rows: 284 | address, tag, ancestor, entity = row 285 | res[address.lower()] = entity 286 | return res 287 | 288 | def find_unlabeled_deployers(self): 289 | query = "select a.*, group_concat(label) as labellist " \ 290 | "from "+self.name+"_addresses as a, "+self.name+"_labels as l " \ 291 | "where a.address = l.address and (tag like '%factory%' or tag like '%deployer%') " \ 292 | "group by a.address " \ 293 | "having labellist not like '%contract-deployer%' and labellist not like '%factory%'" 294 | rows = self.db.select(query) 295 | res = {} 296 | for row in rows: 297 | res[row[0].lower()] = row[3] 298 | return res 299 | 300 | 301 | def get_all_spawns_by_dict(self, addr_dict, start = None): 302 | for idx, (address, entity) in enumerate(sorted(addr_dict.items())): 303 | if start is not None and address < start: 304 | continue 305 | cnt, cnt_int, total, total_int = self.get_spawns(address,address,entity) 306 | log(idx, "Father", address, entity, cnt, cnt_int, total, total_int) 307 | self.db.commit() 308 | 309 | 310 | 311 | def test(self): 312 | # address = '0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f' #uniswap factory 313 | # address = '0xbaf9a5d4b0052359326a6cdab54babaa3a3a9643' #1inch factory 314 | # url = self.api_url + "?module=account&action=txlistinternal&address=" + address + "&page=1&sort=asc&apikey=" + self.api_key + "&offset=10000" 315 | # resp = requests.get(url) 316 | # pprint.pprint(resp.json()) 317 | # res = self.get_spawns('0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f', 'UNISWAP') 318 | # print(res) 319 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36', 320 | # 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 321 | 'cache-control': 'max-age=0', 322 | # 'referer':'https://etherscan.io/accounts/label/balancer-vested-shareholders', 323 | 'accept':'application/json, text/javascript, */*; q=0.01', 324 | 'content-type':'application/json', 325 | 'cookie': self.cookie} 326 | # payload = {"dataTableModel":{"draw":2,"columns":[{"data":"address","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"nameTag","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"balance","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}},{"data":"txnCount","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}}],"order":[{"column":3,"dir":"desc"}],"start":0,"length":25,"search":{"value":"","regex":False}},"labelModel":{"label":"balancer-vested-shareholders"}} 327 | # payload = {"dataTableModel":{"draw":2,"columns":[{"data":"address","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"nameTag","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"balance","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}},{"data":"txnCount","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}}],"order":[{"column":3,"dir":"desc"}],"start":9000,"length":25,"search":{"value":"","regex":False}},"labelModel":{"label":"pancakeswap"}} 328 | true = True 329 | false = False 330 | payload = {"dataTableModel": {"columns": [], "order": [{"column": 1, "dir": "desc"}], "start": 0, "length": 200}, 331 | # "labelModel": {"label": "factory-contract"} 332 | "addressModel": {"address":"0xfc00c80b0000007f73004edb00094cad80626d8d"} 333 | } 334 | 335 | 336 | payload = json.dumps(payload) 337 | # url = 'https://etherscan.io/accounts.aspx/GetTableEntriesBySubLabel' 338 | url = 'https://'+self.explorer_domain+'/accounts.aspx/GetTableEntriesBySubLabel' 339 | resp = requests.post(url,payload,headers=headers) 340 | # print(resp) 341 | print(resp.content.decode('utf-8')) 342 | # js = json.loads(resp.content.decode('utf-8')) 343 | # pprint.pprint(js) 344 | -------------------------------------------------------------------------------- /code/sqlite.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | warnings.filterwarnings("ignore",category=DeprecationWarning) 3 | import traceback 4 | 5 | import sqlite3 6 | from sqlite3 import Error 7 | import pprint 8 | import atexit 9 | # from util import dec 10 | from collections import defaultdict 11 | from random import shuffle 12 | import time 13 | from queue import Queue 14 | 15 | 16 | class SQLite: 17 | def __init__(self, db=None,check_same_thread=True, isolation_level='DEFERRED', read_only=False, do_logging=False): 18 | self.deferred_buffers = defaultdict(Queue) 19 | self.currently_processing = False 20 | self.conn = None 21 | self.read_only = read_only 22 | self.do_logging = do_logging 23 | self.log_file = 'data/sqlite_log.txt' 24 | 25 | 26 | if db is not None: 27 | self.connect(db,check_same_thread=check_same_thread,isolation_level=isolation_level) 28 | 29 | 30 | def execute_and_log(self,cursor,query,values=None): 31 | tstart = time.time() 32 | if values is None: 33 | rv = cursor.execute(query) 34 | else: 35 | rv = cursor.execute(query,values) 36 | tend = time.time() 37 | if self.do_logging: 38 | myfile = open(self.log_file, "a", encoding="utf-8") 39 | myfile.write('\nQUERY '+query+'\n') 40 | myfile.write('TIMING '+str(tend-tstart)+'\n') 41 | myfile.close() 42 | return rv 43 | 44 | def connect(self,db, check_same_thread=True, isolation_level='DEFERRED'): 45 | # print("CONNECT TO "+db) 46 | self.db = db 47 | if self.read_only: 48 | self.conn = sqlite3.connect('file:data/' + db + '.db?mode=ro', timeout=5, check_same_thread=check_same_thread, 49 | isolation_level=isolation_level, uri=True) 50 | else: 51 | self.conn = sqlite3.connect('data/' + db + '.db', timeout=5, check_same_thread=check_same_thread, 52 | isolation_level=isolation_level) 53 | 54 | self.conn.row_factory = sqlite3.Row 55 | 56 | def disconnect(self): 57 | if self.conn is not None: 58 | # print("DISCONNECT FROM " + self.db) 59 | self.conn.close() 60 | self.conn = None 61 | 62 | def commit(self): 63 | self.conn.commit() 64 | 65 | def create_table(self,table_name,fields,drop=True): 66 | conn = self.conn 67 | c = conn.cursor() 68 | if drop: 69 | query = "DROP TABLE IF EXISTS "+table_name 70 | self.execute_and_log(c,query) 71 | query = "CREATE TABLE IF NOT EXISTS "+table_name+" ("+fields+")" 72 | self.execute_and_log(c,query) 73 | conn.commit() 74 | 75 | def create_index(self,index_name,table_name,fields, unique=False): 76 | conn = self.conn 77 | c = conn.cursor() 78 | query = "CREATE " 79 | if unique: 80 | query += "UNIQUE " 81 | query += "INDEX IF NOT EXISTS " + index_name + " on " + table_name + " (" + fields + ")" 82 | self.execute_and_log(c,query) 83 | conn.commit() 84 | 85 | def query(self,q, commit=True,value_list=None): 86 | c = self.conn.cursor() 87 | if value_list is None: 88 | self.execute_and_log(c,q) 89 | else: 90 | self.execute_and_log(c,q,value_list) 91 | modified = c.rowcount 92 | if commit: 93 | self.commit() 94 | return modified 95 | 96 | # def get_column_type(self, table, column): 97 | # affinity_map = { 98 | # 'INTEGER':0, 99 | # 'INT':0, 100 | # 'NUMERIC':0, 101 | # 'REAL':0, 102 | # 'BLOB':2 103 | # } 104 | # if table not in self.type_mapping: 105 | # c = self.conn.execute('PRAGMA table_info('+table+')') 106 | # columns = c.fetchall() 107 | # self.type_mapping[table] = {} 108 | # for entry in columns: 109 | # type = entry[2].upper() 110 | # if type in affinity_map: 111 | # self.type_mapping[table][entry[1]] = affinity_map[type] 112 | # else: 113 | # self.type_mapping[table][entry[1]] = 1 #text 114 | # return self.type_mapping[table][column] 115 | 116 | def insert_kw(self, table, **kwargs): 117 | column_list = [] 118 | value_list = [] 119 | command_list = {'commit': False, 'connection': None, 'ignore':False, 'values':None} 120 | for key, value in kwargs.items(): 121 | if key in command_list: 122 | command_list[key] = value 123 | continue 124 | column_list.append(key) 125 | if isinstance(value,str): 126 | value_list.append("'" + value + "'") 127 | elif isinstance(value,bytes): 128 | value_list.append(sqlite3.Binary(value)) 129 | else: 130 | value_list.append(str(value)) 131 | 132 | # try: 133 | # check = float(value) 134 | # value_list.append(str(value)) 135 | # except Exception: 136 | # if type(value) == bytes: 137 | # value_list.append(sqlite3.Binary(value)) 138 | # else: 139 | # value_list.append("'" + value + "'") 140 | 141 | error_mode = 'REPLACE' 142 | if command_list['ignore']: 143 | error_mode = 'IGNORE' 144 | 145 | conn_to_use = self.conn 146 | if command_list['connection'] is not None: 147 | conn_to_use = command_list['connection'] 148 | 149 | if command_list['values'] is not None: 150 | value_list = [] 151 | placeholder_list = [] 152 | for value in command_list['values']: 153 | if type(value) == bytes: 154 | value_list.append(sqlite3.Binary(value)) 155 | else: 156 | value_list.append(str(value)) 157 | # try: 158 | # value_list.append(str(value)) 159 | # except Exception: 160 | # value_list.append(sqlite3.Binary(value)) 161 | placeholder_list.append("?") 162 | # query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (" + ",".join(value_list) + ")" 163 | query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (" + ",".join(placeholder_list) + ")" 164 | c = self.execute_and_log(conn_to_use,query,value_list) 165 | else: 166 | query = "INSERT OR "+error_mode+" INTO " + table + " (" + ",".join(column_list) + ") VALUES (" + ",".join(value_list) + ")" 167 | # print("QUERY", query) 168 | try: 169 | c = self.execute_and_log(conn_to_use,query) 170 | except: 171 | print("Could not insert") 172 | print(query) 173 | exit(1) 174 | 175 | 176 | 177 | try: 178 | 179 | if command_list['commit']: 180 | conn_to_use.commit() 181 | return c.rowcount 182 | except Error as e: 183 | print(self.db,"insert_kw error ", e, "table",table,"kwargs",kwargs) 184 | exit(0) 185 | 186 | 187 | 188 | def deferred_insert(self, table, values): 189 | buffer = self.deferred_buffers[table] 190 | if values is not None: 191 | converted_values = [] 192 | for value in values: 193 | if type(value) == bytes: 194 | converted_values.append(sqlite3.Binary(value)) 195 | else: 196 | converted_values.append(str(value)) 197 | # buffer.append(converted_values) 198 | buffer.put(converted_values) 199 | # print('deferred',table,len(self.deferred_buffers[table])) 200 | 201 | 202 | def process_deferred_inserts(self,min_count, max_count_total=100, error_mode='IGNORE', single_table=False): 203 | # print("[",end='') 204 | if self.currently_processing: 205 | print("CURRENTLY IN INSERTS!!!") 206 | self.currently_processing = True 207 | tables = list(self.deferred_buffers.keys()) 208 | len_list = [] 209 | for table in tables: 210 | # len_list.append((table,len(self.deferred_buffers[table]))) 211 | len_list.append((table,self.deferred_buffers[table].qsize())) 212 | len_list.sort(key=lambda tup: -tup[1]) 213 | 214 | total_cnt = 0 215 | table_cnt = 0 216 | exec_time = 0 217 | self.conn.execute("BEGIN TRANSACTION") 218 | for table, _ in len_list: 219 | buffer = self.deferred_buffers[table] 220 | # if len(self.deferred_buffers[table]) >= min_count: 221 | if self.deferred_buffers[table].qsize() >= min_count: 222 | # print(table, "ACTUALLY INSERTING", len(self.deferred_buffers[table])) 223 | # placeholder_list = ["?"] * len(buffer[0]) 224 | values = self.deferred_buffers[table].get() 225 | placeholder_list = ["?"] * len(values) 226 | # query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (" + ",".join(placeholder_list) + ")" 227 | query = "INSERT OR REPLACE INTO " + table + " VALUES (" + ",".join(placeholder_list) + ")" 228 | query_list = [] 229 | 230 | query_list.append(values) 231 | total_cnt += 1 232 | 233 | # current_length = len(buffer) #buffer may grow while processing it! In that case this may never finish unless short-circuited. 234 | while not self.deferred_buffers[table].empty() and total_cnt < max_count_total: 235 | values = self.deferred_buffers[table].get() 236 | query_list.append(values) 237 | total_cnt += 1 238 | # for values in buffer: 239 | # query_list.append(values) 240 | # blob_size_total += len(values[2]) 241 | # cnt += 1 242 | # total_cnt += 1 243 | # if total_cnt == max_count_total: 244 | # break 245 | 246 | 247 | table_cnt += 1 248 | t = time.time() 249 | try: 250 | self.conn.executemany(query, query_list) 251 | except: 252 | raise NotImplementedError("Couldn't handle query "+query+", values"+str(query_list), traceback.format_exc()) 253 | exec_time += (time.time()-t) 254 | # print(table, "ACTUALLY INSERTED",cnt) 255 | # self.deferred_buffers[table] = self.deferred_buffers[table][cnt:] 256 | if total_cnt >= max_count_total: 257 | break 258 | if single_table: 259 | break 260 | 261 | remaining_cnt = 0 262 | tables = list(self.deferred_buffers.keys()) 263 | for table in tables: 264 | # remaining_cnt += len(self.deferred_buffers[table]) 265 | remaining_cnt += self.deferred_buffers[table].qsize() 266 | self.conn.execute("COMMIT") 267 | self.currently_processing = False 268 | # print("]",end='') 269 | return table_cnt, total_cnt, remaining_cnt, exec_time 270 | 271 | 272 | 273 | # error_mode = 'REPLACE' 274 | # if not replace: 275 | # error_mode = 'IGNORE' 276 | # query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (?)" 277 | # self.conn.executemany(query,value_list) 278 | 279 | 280 | 281 | 282 | def update_kw(self, table, where, **kwargs): 283 | # column_list = [] 284 | # value_list = [] 285 | pair_list = [] 286 | command_list = {'commit': False, 'connection': None, 'ignore': False} 287 | for key, value in kwargs.items(): 288 | if key in command_list: 289 | command_list[key] = value 290 | continue 291 | 292 | try: 293 | check = float(value) 294 | pair_list.append(key + " = " +str(value)) 295 | except Exception: 296 | pair_list.append(key + " = " + "'" + value + "'") 297 | 298 | error_mode = 'REPLACE' 299 | if command_list['ignore']: 300 | error_mode = 'IGNORE' 301 | # if 'IGNORE' in command_list: 302 | # error_mode = 'IGNORE' 303 | query = "UPDATE OR " + error_mode + " "+ table + " SET " + (",").join(pair_list) + " WHERE "+ where 304 | 305 | conn_to_use = self.conn 306 | if command_list['connection'] is not None: 307 | conn_to_use = command_list['connection'] 308 | 309 | try: 310 | c = conn_to_use.cursor() 311 | self.execute_and_log(c,query) 312 | # if command_list['commit']: 313 | if command_list['commit']: 314 | conn_to_use.commit() 315 | return c.rowcount 316 | except Error as e: 317 | print(self.db,"update_kw error ", e, "table", table, "kwargs", kwargs) 318 | exit(0) 319 | 320 | 321 | 322 | 323 | def select(self,query, return_dictionaries=False, id_col=None): 324 | def printer(b): 325 | print('converting', b) 326 | if b[0] == 'b' and b[1] in["'","\'"]: 327 | return b 328 | return b.decode('UTF-8') 329 | 330 | 331 | conn = self.conn 332 | # def dict_factory(cursor, row): 333 | # d = {} 334 | # for idx, col in enumerate(cursor.description): 335 | # d[col[0]] = row[idx] 336 | # return d 337 | 338 | try: 339 | 340 | # conn.text_factory = printer 341 | 342 | # if dict: 343 | # old_f = conn.row_factory 344 | # conn.row_factory = sqlite3.Row 345 | # conn.row_factory = dict_factory 346 | c = conn.cursor() 347 | self.execute_and_log(c, query) 348 | res = c.fetchall() 349 | if id_col is None: 350 | conv_res = [] 351 | if return_dictionaries: 352 | for row in res: 353 | conv_res.append(dict(row)) 354 | else: 355 | for row in res: 356 | conv_res.append(list(row)) 357 | else: 358 | conv_res = {} 359 | if return_dictionaries: 360 | for row in res: 361 | conv_res[row[id_col]] = dict(row) 362 | else: 363 | for row in res: 364 | conv_res[row[id_col]] = list(row) 365 | 366 | # if dict: 367 | # conn.row_factory = old_f 368 | return conv_res 369 | except Error as e: 370 | print(self.db,"Error ", e,query) 371 | exit(0) 372 | 373 | 374 | def attach(self, other_db_file, other_db_name): 375 | c = self.conn.cursor() 376 | c.execute("ATTACH '" + other_db_file + "' AS " + other_db_name) 377 | self.conn.commit() 378 | 379 | -------------------------------------------------------------------------------- /code/util.py: -------------------------------------------------------------------------------- 1 | import decimal 2 | import time 3 | from collections import defaultdict 4 | import datetime 5 | import pprint 6 | import pickle 7 | 8 | Q = [decimal.Decimal(10) ** 0, decimal.Decimal(10) ** -1, decimal.Decimal(10) ** -2, decimal.Decimal(10) ** -3, 9 | decimal.Decimal(10) ** -4, decimal.Decimal(10) ** -5, decimal.Decimal(10) ** -6, decimal.Decimal(10) ** -7, 10 | decimal.Decimal(10) ** -8, 11 | decimal.Decimal(10) ** -9, decimal.Decimal(10) ** -10, decimal.Decimal(10) ** -11, decimal.Decimal(10) ** -12] 12 | 13 | 14 | 15 | def dec(num, places=None): 16 | if places is None: 17 | # print("dec",num) 18 | return decimal.Decimal(num) 19 | else: 20 | return decimal.Decimal(num).quantize(Q[places], rounding=decimal.ROUND_HALF_EVEN) 21 | 22 | 23 | logger = None 24 | class Logger: 25 | def __init__(self, write_frequency=1): 26 | self.files = defaultdict(dict) 27 | self.write_frequency = write_frequency 28 | 29 | 30 | def log(self,*args, **kwargs): 31 | t = time.time() 32 | if 'WRITE ALL' in args: 33 | for filename in self.files: 34 | self.buf_to_file(filename) 35 | # self.files[filename]['file_object'].close() 36 | # self.files = defaultdict(dict) 37 | return 38 | 39 | if 'buffer' in kwargs and kwargs['buffer'] != None: 40 | buffer = kwargs['buffer'] 41 | strings = [] 42 | if 'ignore_time' not in kwargs: 43 | tm = str(datetime.datetime.now()) 44 | strings.append(tm) 45 | 46 | for s in args: 47 | if 'prettify' in kwargs: 48 | s = pprint.pformat(s) 49 | strings.append(str(s)) 50 | buffer.append(" ".join(strings)) 51 | else: 52 | if 'file' in kwargs: 53 | filename = kwargs['file'] 54 | else: 55 | filename = "log.txt" 56 | if filename not in self.files: 57 | # myfile = open('logs/' + filename, "a", encoding="utf-8") 58 | # self.files[filename]['file_object'] = myfile 59 | self.files[filename]['last_write'] = t 60 | self.files[filename]['buffer'] = [] 61 | 62 | 63 | buffer = self.files[filename]['buffer'] 64 | # myfile = self.files[filename]['file_object'] 65 | if 'ignore_time' not in kwargs: 66 | tm = str(datetime.datetime.now()) 67 | if 'print_only' not in kwargs: 68 | buffer.append(tm + " ") 69 | # myfile.write(tm + " ") 70 | if 'log_only' not in kwargs: 71 | self.lprint(tm) 72 | 73 | for s in args: 74 | if 'prettify' in kwargs: 75 | s = pprint.pformat(s) 76 | if 'print_only' not in kwargs: 77 | # myfile.write(str(s) + " ") 78 | buffer.append(str(s) + " ") 79 | if 'log_only' not in kwargs: 80 | self.lprint(s) 81 | 82 | if 'print_only' not in kwargs: 83 | # myfile.write("\n") 84 | buffer.append("\n") 85 | if 'log_only' not in kwargs: 86 | self.lprint("", same_line=False) 87 | 88 | self.buf_to_file(filename) 89 | # if 'force_write' in kwargs: 90 | # # if 1: 91 | # self.buf_to_file(filename) 92 | # 93 | # elif self.files[filename]['last_write'] + self.write_frequency < t: 94 | # self.buf_to_file(filename) 95 | 96 | # myfile.close() 97 | 98 | def buf_to_file(self,filename): 99 | buffer = self.files[filename]['buffer'] 100 | if len(buffer) > 0: 101 | myfile = open('logs/' + filename, "a", encoding="utf-8") 102 | myfile.write(''.join(buffer)) 103 | myfile.close() 104 | self.files[filename]['buffer'] = [] 105 | self.files[filename]['last_write'] = time.time() 106 | 107 | def lprint(self,p, same_line=True): 108 | try: 109 | if same_line: 110 | print(p, end=' ') 111 | else: 112 | print(p) 113 | except Exception: 114 | pass 115 | 116 | 117 | def log(*args,**kwargs): 118 | global logger 119 | if logger is None: 120 | logger = Logger() 121 | 122 | logger.log(*args,**kwargs) 123 | -------------------------------------------------------------------------------- /data/addresses.db: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/data/addresses.db -------------------------------------------------------------------------------- /driver.py: -------------------------------------------------------------------------------- 1 | from code.chain import Chain 2 | from code.sqlite import SQLite 3 | 4 | 5 | 6 | address_db = SQLite('addresses') 7 | 8 | 9 | #comment out chains you don't need 10 | #to get the cookie, login into appropriate scanner (i.e. etherscan.io), press F12 in your browser, go to [scanner url]/labelcloud 11 | #assuming you are using Chrome, go to Network -> Doc -> labelcloud -> Headers -> Request Headers -> cookie, right click on it and "Copy value" 12 | 13 | eth_cookie = 'YOUR ETHERSCAN COOKIE HERE' 14 | eth_api_key = 'YOUR ETHERSCAN API KEY' 15 | ignore_labels = ['Eth2 Depositor','Blocked','User Proxy Contracts','Upbit Hack','Phish / Hack'] 16 | eth_chain = Chain(address_db,'ETH', 'etherscan.io', 'ETH', eth_api_key,eth_cookie, primary_swap='uniswap', ignore_labels=ignore_labels) 17 | 18 | bsc_cookie = 'YOUR BSCSCAN COOKIE HERE' 19 | bsc_api_key = 'YOUR BSCSCAN API KEY' 20 | bsc_chain = Chain(address_db,'BSC', 'bscscan.com', 'BNB', bsc_api_key, bsc_cookie, primary_swap='pancakeswap' ) 21 | 22 | heco_cookie = 'YOUR HECOINFO COOKIE HERE' 23 | heco_api_key = 'YOUR HECOINFO API KEY' 24 | heco_chain = Chain(address_db,'HECO', 'hecoinfo.com', 'HT', heco_api_key, heco_cookie) 25 | 26 | polygon_cookie = 'YOUR POLYGONSCAN COOKIE HERE' 27 | polygon_api_key = 'YOUR POLYGONSCAN API KEY' 28 | polygon_chain = Chain(address_db,'POLYGON', 'polygonscan.com', 'MATIC', polygon_api_key, polygon_cookie) 29 | 30 | fantom_cookie = 'YOUR FTMSCAN COOKIE HERE' 31 | fantom_api_key = 'YOUR FTMSCAN API KEY' 32 | fantom_chain = Chain(address_db,'FANTOM', 'ftmscan.com', 'FTM', fantom_api_key, fantom_cookie) 33 | 34 | hoo_cookie = 'YOUR HOOSCAN COOKIE HERE' 35 | hoo_api_key = 'YOUR HOOSCAN API KEY' 36 | hoo_chain = Chain(address_db,'HSC', 'hooscan.com', 'HOO', hoo_api_key, hoo_cookie) 37 | 38 | 39 | chain = hoo_chain 40 | 41 | #page scrapes labelcloud for list of labels, then downloads all addresses for each label 42 | #if crashes, can restart where left off using start parameter 43 | chain.store_all_labels_to_db() 44 | 45 | #finds all deployers in the downloaded labels 46 | deployer_dict = chain.load_from_db_by_label('Contract Deployer') 47 | factory_dict = chain.load_from_db_by_label('Factory Contract') 48 | unlabeled_dict = chain.find_unlabeled_deployers() 49 | deployer_dict.update(factory_dict) 50 | deployer_dict.update(unlabeled_dict) 51 | 52 | #downloads children of each deployer using API. If crashes, can restart where left off using start parameter 53 | chain.get_all_spawns_by_dict(deployer_dict) 54 | 55 | 56 | address_db.disconnect() -------------------------------------------------------------------------------- /frontend/db_browse.py: -------------------------------------------------------------------------------- 1 | import os 2 | import subprocess 3 | # import sqlite_web 4 | 5 | ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) 6 | 7 | subprocess.check_output(['sqlite_web', ROOT_DIR + '/data/addresses.db']) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | requests==2.26.0 2 | beautifulsoup4==4.9.3 3 | sqlite_web==0.3.8 --------------------------------------------------------------------------------