├── .gitattributes
├── .idea
│   ├── .gitignore
│   ├── address_labels_github.iml
│   ├── inspectionProfiles
│   │   ├── Project_Default.xml
│   │   └── profiles_settings.xml
│   ├── misc.xml
│   └── modules.xml
├── README.md
├── code
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-36.pyc
│   │   ├── chain.cpython-36.pyc
│   │   ├── sqlite.cpython-36.pyc
│   │   └── util.cpython-36.pyc
│   ├── chain.py
│   ├── sqlite.py
│   └── util.py
├── data
│   └── addresses.db
├── driver.py
├── frontend
│   └── db_browse.py
└── requirements.txt
/.gitattributes:
--------------------------------------------------------------------------------
1 | # Auto detect text files and perform LF normalization
2 | * text=auto
3 |
--------------------------------------------------------------------------------
/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 |
--------------------------------------------------------------------------------
/.idea/address_labels_github.iml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/.idea/inspectionProfiles/Project_Default.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
10 |
11 |
12 |
--------------------------------------------------------------------------------
/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
--------------------------------------------------------------------------------
/.idea/misc.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
--------------------------------------------------------------------------------
/.idea/modules.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Blockchain Address Ownership Database
2 | Update 01/09/2023:
3 | I've rescraped the data and remade the database. It's now too large for GitHub, and can be downloaded from here: https://drive.google.com/file/d/1rEFloDWxmFjdb3SenSvrSsQk3g0-3RNk/view?usp=sharing
4 | Currently the database has:
5 | * ~1.3M ETH addresses
6 | * ~600K BSC addresses
7 | * ~320K Polygon addresses
8 | * ~54K Fantom addresses
9 | * ~5K Arbitrum addresses
10 | * ~3K Optimism addresses
11 |
12 | There are also some addresses for Cronos, Gnosis, Aurora, Boba, HECO, CELO, Moonbeam, and Moonriver.
13 |
14 | I have not updated the code. The scanners are too finicky and I don't want to spend time maintaining them. Let me know if you have any questions.
15 |
16 | [old info below]
17 | The database is in data/addresses.db
18 |
19 | This is a SQLite database of addresses from several blockchains, built by page-scraping and API-querying the Etherscan team's block explorers. For each address it has some subset of [name tag, labels, ownership entity]. I page-scraped each label cloud (e.g. etherscan.io/labelcloud) for the full list of labels, then fetched all the addresses for each label along with their name tags. Additionally, I downloaded all addresses created by each labeled contract deployer and factory contract (using the scanner's API).
20 |
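For reference, the scraper stores two tables per chain: `<CHAIN>_addresses (address, tag, ancestor_address, entity)` and `<CHAIN>_labels (address, label)`, where `<CHAIN>` is ETH, BSC, POLYGON, etc. A minimal lookup sketch using Python's built-in sqlite3 (the address below is a placeholder; addresses appear to be stored lowercased):

```python
import sqlite3

conn = sqlite3.connect('data/addresses.db')
addr = '0x0000000000000000000000000000000000000000'  # placeholder: use a lowercased address

# Name tag, ancestor (deployer/factory) and owning entity, if known (ETH tables shown).
row = conn.execute(
    "SELECT address, tag, ancestor_address, entity FROM ETH_addresses WHERE address = ?",
    (addr,)).fetchone()

# All labels attached to the address.
labels = [r[0] for r in conn.execute(
    "SELECT label FROM ETH_labels WHERE address = ?", (addr,))]

print(row, labels)
conn.close()
```
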
21 | Exceptions: I didn't download all the millions of shitcoin pools from Uniswap and PancakeSwap, only pools with at least 10 transactions. Also, Etherscan-like scanners only allow the first 10,000 transactions to be accessed via the API, so child addresses will be missing for especially prolific deployers and factories.
22 |
23 | The database will not update automatically; I may occasionally update it. You can create your own copy with the code. It requires BeautifulSoup for scraping, plus a cookie and an API key from each scanner for the blockchains you want. If you're doing ETH or BSC, it will take several hours.
24 |
25 | Currently the database has:
26 | * ~160K ETH addresses
27 | * ~77K BSC addresses
28 | * ~9K Polygon addresses
29 | * ~13K HECO addresses
30 | * ~2K Fantom addresses
31 | * ~0.5K HSC addresses
32 |
33 |
--------------------------------------------------------------------------------
/code/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__init__.py
--------------------------------------------------------------------------------
/code/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/code/__pycache__/chain.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/chain.cpython-36.pyc
--------------------------------------------------------------------------------
/code/__pycache__/sqlite.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/sqlite.cpython-36.pyc
--------------------------------------------------------------------------------
/code/__pycache__/util.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/code/__pycache__/util.cpython-36.pyc
--------------------------------------------------------------------------------
/code/chain.py:
--------------------------------------------------------------------------------
1 | import json
2 | import requests
3 | import time
4 | from .util import log
5 | import re
6 | from bs4 import BeautifulSoup
7 | import traceback
8 |
9 | class Chain:
10 | def __init__(self,db,name,explorer_domain,main_asset, api_key, cookie, primary_swap=None, ignore_labels=()):
11 | self.explorer_domain = explorer_domain
12 | self.api_url = 'https://api.'+explorer_domain+'/api'
13 |
14 | self.main_asset = main_asset
15 | self.api_key = api_key
16 | self.cookie = cookie
17 | self.name=name
18 | self.db = db
19 | self.primary_swap = primary_swap
20 | self.ignore_labels = ignore_labels
21 |
22 | # self.db.create_table(self.name+"_addresses", 'address PRIMARY KEY, deployer, entity', drop=False)
23 | # self.db.create_index(self.name + "_addresses_idx_1", self.name+"_addresses", 'entity')
24 |
25 | self.db.create_table(self.name + "_addresses", 'address PRIMARY KEY, tag, ancestor_address, entity', drop=False)
26 | self.db.create_table(self.name + "_labels", 'address, label', drop=False)
27 | self.db.create_index(self.name + "_addresses_idx_1", self.name+"_addresses", 'entity')
28 | self.db.create_index(self.name + "_labels_idx_1", self.name + "_labels", 'address, label', unique=True)
29 |
30 | self.load_all_from_db()
31 |
32 | def get_label_list(self):
33 | url = 'https://' + self.explorer_domain + '/labelcloud'
34 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36',
35 | 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
36 | 'cache-control': 'max-age=0',
37 | 'cookie': self.cookie}
38 | cont = requests.get(url, headers=headers).content
39 | html = cont.decode('utf-8')
40 | soup = BeautifulSoup(html, features="lxml")
41 | dropdowns = soup.find_all('div',class_='dropdown')
42 |
43 | accounts_labels = []
44 | for entry in dropdowns:
45 | label_url = entry.find('button')['data-url']
46 | label = entry.find('span').contents[0]
47 | sections = entry.find_all('a')
48 | # print(label, sections)
49 | for section in sections:
50 | tp = section.contents[-1]
51 | if 'Accounts' in tp:
52 | accounts_labels.append((label.strip(),label_url))
53 | break
54 | return accounts_labels
55 |
56 |
57 |
58 |
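    # Heuristic mapping from a name tag to an owning entity: take the part before ':' if
    # present, otherwise drop a trailing number, then uppercase. Illustrative (hypothetical)
    # examples: "Binance: Hot Wallet 6" -> "BINANCE", "Compound 3" -> "COMPOUND".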
59 | def extract_entity(self,tag):
60 | if ':' in tag:
61 | row_entity = tag[:tag.index(':')].upper()
62 | else:
63 | tag_parts = tag.split(' ')
64 | if tag_parts[-1].isdigit():
65 | row_entity = ' '.join(tag_parts[:-1]).upper()
66 | else:
67 | row_entity = tag.upper()
68 | return row_entity
69 |
70 | # def download_labeled(self,label, subcatid='undefined',entity=None, page_size=10000):
71 | # db_table = self.name+"_addresses"
72 | # # self.db.create_table(db_table, 'address PRIMARY KEY, label, tag, entity', drop=False)
73 | # # self.db.create_index(self.name+"_"+table+"_idx_1", db_table, 'entity')
74 | # offset = 0
75 | # page_idx = 0
76 | # done = False
77 | # while not done:
78 | # url = 'https://'+self.explorer_domain+'/accounts/label/'+label+'?subcatid='+subcatid+'&size='+str(page_size)+'&start='+str(offset)+'&col=1&order=asc'
79 | # print(page_idx,offset,url)
80 | # headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36',
81 | # 'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
82 | # 'cache-control':'max-age=0',
83 | # 'cookie':self.cookie}
84 | # cont = requests.get(url,headers=headers).content
85 | # # print(cont)
86 | # html = cont.decode('utf-8')
87 | # soup = BeautifulSoup(html,features="lxml")
88 | # if subcatid == 'undefined':
89 | # table = soup.find("table", class_="table-hover")
90 | # else:
91 | # table = soup.find("table", id="table-subcatid-"+subcatid)
92 | # # print(table)
93 | # try:
94 | # rows = table.find_all("tr")
95 | # except:
96 | # print('EXCEPTION',traceback.format_exc())
97 | # print(html)
98 | # exit(0)
99 | # print('rows',len(rows))
100 | # for row in rows:
101 | # # print(row)
102 | # cells = row.find_all("td")
103 | # # print(len(cells))
104 | # if len(cells) == 4:
105 | # address = cells[0].find("a").contents[0]
106 | # tag = cells[1].contents
107 | # if len(tag) == 1:
108 | # tag = tag[0]
109 | # if entity is None:
110 | # row_entity = self.extract_entity(tag)
111 | # else:
112 | # row_entity = entity
113 | # # print(address,tag,row_entity)
114 | # self.db.insert_kw(db_table,values = [address,tag,None,row_entity])
115 | # self.db.insert_kw(self.name+"_labels", values=[address, label])
116 | # self.db.commit()
117 | # offset += page_size
118 | # page_idx += 1
119 | #
120 | # # done=True
121 | # if len(rows) < page_size:
122 | # done = True
123 | # else:
124 | # print("sleeping")
125 | # time.sleep(30)
126 |
127 |
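    # Fetch every address for one label by POSTing to the explorer's
    # accounts.aspx/GetTableEntriesBySubLabel JSON endpoint, 100 rows per page,
    # iterating over each sub-category listed on the label's page.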
128 | def download_labeled_post(self,label, label_url):
129 | db_table = self.name + "_addresses"
130 | # label_rep = label.lower().replace(' ','-').replace('.','-')
131 | url = 'https://' + self.explorer_domain + '/accounts/label/' + label_url
132 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36',
133 | 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
134 | 'cache-control': 'max-age=0',
135 | 'cookie': self.cookie}
136 | cont = requests.get(url, headers=headers).content
137 | soup = BeautifulSoup(cont,features="lxml")
138 | header = soup.find('div',class_='card-header')
139 | if header is None:
140 | subcats = ['undefined']
141 | else:
142 | subcats = []
143 | subcat_els = header.find_all('a')
144 | for subcat_el in subcat_els:
145 | subcat = subcat_el['val']
146 | subcats.append(subcat)
147 | log(label,'subcats',subcats)
148 |
149 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36',
150 | 'cache-control': 'max-age=0',
151 | # 'referer':'https://etherscan.io/accounts/label/balancer-vested-shareholders',
152 | 'accept': 'application/json, text/javascript, */*; q=0.01',
153 | 'content-type': 'application/json',
154 | 'cookie': self.cookie}
155 | page_size = 100 #it won't give more than 100
156 | for subcat in subcats:
157 | log('subcat',subcat)
158 | done = False
159 | start = 0
160 | inserted = 0
161 | while not done:
162 |
163 | payload = {"dataTableModel":{"draw":2,"columns":[{"data":"address","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"nameTag","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"balance","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}},{"data":"txnCount","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}}],"order":[{"column":3,"dir":"desc"}],"start":start,"length":page_size,"search":{"value":"","regex":False}},
164 | "labelModel":{"label":label_url}}
165 | if subcat != 'undefined':
166 | payload['labelModel']['subCategoryId'] = subcat
167 |
168 | url = 'https://' + self.explorer_domain + '/accounts.aspx/GetTableEntriesBySubLabel'
169 | payload = json.dumps(payload)
170 | # print(payload)
171 | time.sleep(0.25)
172 | resp = requests.post(url, payload, headers=headers)
173 | js = json.loads(resp.content.decode('utf-8'))
174 | data = js['d']['data']
175 | if len(data) < page_size:
176 | done = True
177 | # print(resp.content)
178 | # pprint.pprint(js)
179 | txncount = None
180 | for entry in data:
181 | address = entry['address']
182 |                 match = re.search('0x[a-f0-9]{40}', address)
183 |                 if match:  # the 'address' field may wrap the hex address in markup; skip entries without one
184 |                     address = match.group()
185 |
186 | txncount = int(entry['txnCount'].replace(',',''))
187 | if label.lower() == self.primary_swap and txncount < 10:
188 | done = True
189 |
190 | nametag = None
191 | row_entity = None
192 | if 'nameTag' in entry and len(entry['nameTag']) > 0:
193 | nametag = entry['nameTag']
194 | row_entity = self.extract_entity(nametag)
195 |
196 | self.db.insert_kw(db_table, values=[address, nametag, None, row_entity])
197 | self.db.insert_kw(self.name + "_labels", values=[address, label])
198 | inserted += 1
199 |
200 | log('label',label,'sub',subcat,'start',start,'inserted',inserted, 'last txncount',txncount)
201 | self.db.commit()
202 |
203 |
204 | start += page_size
205 |
206 | def store_all_labels_to_db(self, start = None):
207 | label_list = self.get_label_list()
208 | for label_name, label_url in sorted(label_list):
209 | if start is not None and label_name < start:
210 | continue
211 | if 1:
212 | if label_name in self.ignore_labels:
213 | log("Ignoring label",label_name)
214 | else:
215 | log("Processing label",label_name,label_url)
216 | self.download_labeled_post(label_name,label_url)
217 |
218 |
219 | def load_all_from_db(self):
220 | self.addresses = {}
221 | rows = self.db.select("SELECT * FROM "+self.name+"_addresses")
222 | log(self.name,str(len(rows))+" addresses currently in the database")
223 | for row in rows:
224 | address, tag, ancestor, entity = row
225 | self.addresses[address] = entity
226 |
227 |
228 |
229 |
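    # Collect contracts created by `address` via the txlist and txlistinternal API actions,
    # recursing into them when deep=True, and record them under the same entity. The API
    # returns at most 10000 transactions per call, so prolific deployers may be truncated.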
230 | def get_spawns(self,address, ancestor, entity, level=0, deep=False):
231 | time.sleep(0.25)
232 | offset = ' ' * level
233 | log(offset, address, entity)
234 | if deep and level > 0:
235 | self.db.insert_kw(self.name + "_addresses", values=[address, None, ancestor, entity], ignore=True)
236 | url = self.api_url + "?module=account&action=txlist&address=" + address + "&page=1&sort=asc&apikey=" + self.api_key + "&offset=10000"
237 | resp = requests.get(url).json()
238 | cnt = 0
239 | cnt_int = 0
240 | data1 = resp['result']
241 | for transaction in data1:
242 | if transaction['to'] == '' and len(transaction['input']) > 2 and transaction['value'] == '0' and transaction['from'].lower() == address and transaction['isError'] == '0':
243 | spawn = transaction['contractAddress'].lower()
244 | if spawn not in self.addresses:
245 | if deep:
246 | cnt_sub_1, cnt_sub_2, _, _ = self.get_spawns(spawn, ancestor, entity, level=level+1, deep=True)
247 | cnt += cnt_sub_1
248 | cnt_int += cnt_sub_2
249 | else:
250 | self.db.insert_kw(self.name + "_addresses", values=[spawn, None, address, entity], ignore=True)
251 | cnt += 1
252 |
253 | time.sleep(0.25)
254 | url = self.api_url + "?module=account&action=txlistinternal&address=" + address + "&page=1&sort=asc&apikey=" + self.api_key + "&offset=10000"
255 | resp = requests.get(url).json()
256 | data2 = resp['result']
257 | ld2 = 0
258 | if data2 is not None:
259 | ld2 = len(data2)
260 | for transaction in data2:
261 | if 'create' in transaction['type'] and transaction['to'] == "":
262 | spawn = transaction['contractAddress'].lower()
263 | if spawn not in self.addresses:
264 | if deep:
265 | cnt_sub_1, cnt_sub_2, _, _ = self.get_spawns(spawn, ancestor, entity, level=level+1, deep=True)
266 | cnt += cnt_sub_1
267 | cnt_int += cnt_sub_2
268 | else:
269 | self.db.insert_kw(self.name + "_addresses", values=[spawn, None, address, entity])
270 | cnt_int += 1
271 |
272 | return cnt, cnt_int, len(data1),ld2
273 | # if spawn not in self.deployers:
274 | # self.get_spawns(spawn, entity, level=level+1)
275 |
276 |
277 |
278 | def load_from_db_by_label(self,label):
279 | res = {}
280 | Q = "select a.* from "+self.name+"_addresses as a, "+self.name+"_labels as l WHERE a.address = l.address and l.label = '"+label+"'"
281 | # print(Q)
282 | rows = self.db.select(Q)
283 | for row in rows:
284 | address, tag, ancestor, entity = row
285 | res[address.lower()] = entity
286 | return res
287 |
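    # Find addresses whose name tag mentions 'factory' or 'deployer' but that never received
    # the contract-deployer or factory label, so their spawns can still be crawled.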
288 | def find_unlabeled_deployers(self):
289 | query = "select a.*, group_concat(label) as labellist " \
290 | "from "+self.name+"_addresses as a, "+self.name+"_labels as l " \
291 | "where a.address = l.address and (tag like '%factory%' or tag like '%deployer%') " \
292 | "group by a.address " \
293 | "having labellist not like '%contract-deployer%' and labellist not like '%factory%'"
294 | rows = self.db.select(query)
295 | res = {}
296 | for row in rows:
297 | res[row[0].lower()] = row[3]
298 | return res
299 |
300 |
301 | def get_all_spawns_by_dict(self, addr_dict, start = None):
302 | for idx, (address, entity) in enumerate(sorted(addr_dict.items())):
303 | if start is not None and address < start:
304 | continue
305 | cnt, cnt_int, total, total_int = self.get_spawns(address,address,entity)
306 | log(idx, "Father", address, entity, cnt, cnt_int, total, total_int)
307 | self.db.commit()
308 |
309 |
310 |
311 | def test(self):
312 | # address = '0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f' #uniswap factory
313 | # address = '0xbaf9a5d4b0052359326a6cdab54babaa3a3a9643' #1inch factory
314 | # url = self.api_url + "?module=account&action=txlistinternal&address=" + address + "&page=1&sort=asc&apikey=" + self.api_key + "&offset=10000"
315 | # resp = requests.get(url)
316 | # pprint.pprint(resp.json())
317 | # res = self.get_spawns('0x5c69bee701ef814a2b6a3edd4b1652cb9cc5aa6f', 'UNISWAP')
318 | # print(res)
319 | headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36',
320 | # 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
321 | 'cache-control': 'max-age=0',
322 | # 'referer':'https://etherscan.io/accounts/label/balancer-vested-shareholders',
323 | 'accept':'application/json, text/javascript, */*; q=0.01',
324 | 'content-type':'application/json',
325 | 'cookie': self.cookie}
326 | # payload = {"dataTableModel":{"draw":2,"columns":[{"data":"address","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"nameTag","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"balance","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}},{"data":"txnCount","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}}],"order":[{"column":3,"dir":"desc"}],"start":0,"length":25,"search":{"value":"","regex":False}},"labelModel":{"label":"balancer-vested-shareholders"}}
327 | # payload = {"dataTableModel":{"draw":2,"columns":[{"data":"address","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"nameTag","name":"","searchable":True,"orderable":False,"search":{"value":"","regex":False}},{"data":"balance","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}},{"data":"txnCount","name":"","searchable":True,"orderable":True,"search":{"value":"","regex":False}}],"order":[{"column":3,"dir":"desc"}],"start":9000,"length":25,"search":{"value":"","regex":False}},"labelModel":{"label":"pancakeswap"}}
328 | true = True
329 | false = False
330 | payload = {"dataTableModel": {"columns": [], "order": [{"column": 1, "dir": "desc"}], "start": 0, "length": 200},
331 | # "labelModel": {"label": "factory-contract"}
332 | "addressModel": {"address":"0xfc00c80b0000007f73004edb00094cad80626d8d"}
333 | }
334 |
335 |
336 | payload = json.dumps(payload)
337 | # url = 'https://etherscan.io/accounts.aspx/GetTableEntriesBySubLabel'
338 | url = 'https://'+self.explorer_domain+'/accounts.aspx/GetTableEntriesBySubLabel'
339 | resp = requests.post(url,payload,headers=headers)
340 | # print(resp)
341 | print(resp.content.decode('utf-8'))
342 | # js = json.loads(resp.content.decode('utf-8'))
343 | # pprint.pprint(js)
344 |
--------------------------------------------------------------------------------
/code/sqlite.py:
--------------------------------------------------------------------------------
1 | import warnings
2 | warnings.filterwarnings("ignore",category=DeprecationWarning)
3 | import traceback
4 |
5 | import sqlite3
6 | from sqlite3 import Error
7 | import pprint
8 | import atexit
9 | # from util import dec
10 | from collections import defaultdict
11 | from random import shuffle
12 | import time
13 | from queue import Queue
14 |
15 |
16 | class SQLite:
17 | def __init__(self, db=None,check_same_thread=True, isolation_level='DEFERRED', read_only=False, do_logging=False):
18 | self.deferred_buffers = defaultdict(Queue)
19 | self.currently_processing = False
20 | self.conn = None
21 | self.read_only = read_only
22 | self.do_logging = do_logging
23 | self.log_file = 'data/sqlite_log.txt'
24 |
25 |
26 | if db is not None:
27 | self.connect(db,check_same_thread=check_same_thread,isolation_level=isolation_level)
28 |
29 |
30 | def execute_and_log(self,cursor,query,values=None):
31 | tstart = time.time()
32 | if values is None:
33 | rv = cursor.execute(query)
34 | else:
35 | rv = cursor.execute(query,values)
36 | tend = time.time()
37 | if self.do_logging:
38 | myfile = open(self.log_file, "a", encoding="utf-8")
39 | myfile.write('\nQUERY '+query+'\n')
40 | myfile.write('TIMING '+str(tend-tstart)+'\n')
41 | myfile.close()
42 | return rv
43 |
44 | def connect(self,db, check_same_thread=True, isolation_level='DEFERRED'):
45 | # print("CONNECT TO "+db)
46 | self.db = db
47 | if self.read_only:
48 | self.conn = sqlite3.connect('file:data/' + db + '.db?mode=ro', timeout=5, check_same_thread=check_same_thread,
49 | isolation_level=isolation_level, uri=True)
50 | else:
51 | self.conn = sqlite3.connect('data/' + db + '.db', timeout=5, check_same_thread=check_same_thread,
52 | isolation_level=isolation_level)
53 |
54 | self.conn.row_factory = sqlite3.Row
55 |
56 | def disconnect(self):
57 | if self.conn is not None:
58 | # print("DISCONNECT FROM " + self.db)
59 | self.conn.close()
60 | self.conn = None
61 |
62 | def commit(self):
63 | self.conn.commit()
64 |
65 | def create_table(self,table_name,fields,drop=True):
66 | conn = self.conn
67 | c = conn.cursor()
68 | if drop:
69 | query = "DROP TABLE IF EXISTS "+table_name
70 | self.execute_and_log(c,query)
71 | query = "CREATE TABLE IF NOT EXISTS "+table_name+" ("+fields+")"
72 | self.execute_and_log(c,query)
73 | conn.commit()
74 |
75 | def create_index(self,index_name,table_name,fields, unique=False):
76 | conn = self.conn
77 | c = conn.cursor()
78 | query = "CREATE "
79 | if unique:
80 | query += "UNIQUE "
81 | query += "INDEX IF NOT EXISTS " + index_name + " on " + table_name + " (" + fields + ")"
82 | self.execute_and_log(c,query)
83 | conn.commit()
84 |
85 | def query(self,q, commit=True,value_list=None):
86 | c = self.conn.cursor()
87 | if value_list is None:
88 | self.execute_and_log(c,q)
89 | else:
90 | self.execute_and_log(c,q,value_list)
91 | modified = c.rowcount
92 | if commit:
93 | self.commit()
94 | return modified
95 |
96 | # def get_column_type(self, table, column):
97 | # affinity_map = {
98 | # 'INTEGER':0,
99 | # 'INT':0,
100 | # 'NUMERIC':0,
101 | # 'REAL':0,
102 | # 'BLOB':2
103 | # }
104 | # if table not in self.type_mapping:
105 | # c = self.conn.execute('PRAGMA table_info('+table+')')
106 | # columns = c.fetchall()
107 | # self.type_mapping[table] = {}
108 | # for entry in columns:
109 | # type = entry[2].upper()
110 | # if type in affinity_map:
111 | # self.type_mapping[table][entry[1]] = affinity_map[type]
112 | # else:
113 | # self.type_mapping[table][entry[1]] = 1 #text
114 | # return self.type_mapping[table][column]
115 |
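    # Insert one row using "INSERT OR REPLACE" (or "INSERT OR IGNORE" when ignore=True is passed).
    # Rows can be given either as column=value keyword arguments, or positionally via
    # values=[...], in which case "?" placeholders are used for binding.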
116 | def insert_kw(self, table, **kwargs):
117 | column_list = []
118 | value_list = []
119 | command_list = {'commit': False, 'connection': None, 'ignore':False, 'values':None}
120 | for key, value in kwargs.items():
121 | if key in command_list:
122 | command_list[key] = value
123 | continue
124 | column_list.append(key)
125 | if isinstance(value,str):
126 | value_list.append("'" + value + "'")
127 | elif isinstance(value,bytes):
128 | value_list.append(sqlite3.Binary(value))
129 | else:
130 | value_list.append(str(value))
131 |
132 | # try:
133 | # check = float(value)
134 | # value_list.append(str(value))
135 | # except Exception:
136 | # if type(value) == bytes:
137 | # value_list.append(sqlite3.Binary(value))
138 | # else:
139 | # value_list.append("'" + value + "'")
140 |
141 | error_mode = 'REPLACE'
142 | if command_list['ignore']:
143 | error_mode = 'IGNORE'
144 |
145 | conn_to_use = self.conn
146 | if command_list['connection'] is not None:
147 | conn_to_use = command_list['connection']
148 |
149 | if command_list['values'] is not None:
150 | value_list = []
151 | placeholder_list = []
152 | for value in command_list['values']:
153 | if type(value) == bytes:
154 | value_list.append(sqlite3.Binary(value))
155 | else:
156 | value_list.append(str(value))
157 | # try:
158 | # value_list.append(str(value))
159 | # except Exception:
160 | # value_list.append(sqlite3.Binary(value))
161 | placeholder_list.append("?")
162 | # query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (" + ",".join(value_list) + ")"
163 | query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (" + ",".join(placeholder_list) + ")"
164 | c = self.execute_and_log(conn_to_use,query,value_list)
165 | else:
166 | query = "INSERT OR "+error_mode+" INTO " + table + " (" + ",".join(column_list) + ") VALUES (" + ",".join(value_list) + ")"
167 | # print("QUERY", query)
168 | try:
169 | c = self.execute_and_log(conn_to_use,query)
170 | except:
171 | print("Could not insert")
172 | print(query)
173 | exit(1)
174 |
175 |
176 |
177 | try:
178 |
179 | if command_list['commit']:
180 | conn_to_use.commit()
181 | return c.rowcount
182 | except Error as e:
183 | print(self.db,"insert_kw error ", e, "table",table,"kwargs",kwargs)
184 | exit(0)
185 |
186 |
187 |
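    # Queue a row for later bulk insertion; process_deferred_inserts() drains the per-table
    # queues with executemany() inside a single transaction.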
188 | def deferred_insert(self, table, values):
189 | buffer = self.deferred_buffers[table]
190 | if values is not None:
191 | converted_values = []
192 | for value in values:
193 | if type(value) == bytes:
194 | converted_values.append(sqlite3.Binary(value))
195 | else:
196 | converted_values.append(str(value))
197 | # buffer.append(converted_values)
198 | buffer.put(converted_values)
199 | # print('deferred',table,len(self.deferred_buffers[table]))
200 |
201 |
202 | def process_deferred_inserts(self,min_count, max_count_total=100, error_mode='IGNORE', single_table=False):
203 | # print("[",end='')
204 | if self.currently_processing:
205 | print("CURRENTLY IN INSERTS!!!")
206 | self.currently_processing = True
207 | tables = list(self.deferred_buffers.keys())
208 | len_list = []
209 | for table in tables:
210 | # len_list.append((table,len(self.deferred_buffers[table])))
211 | len_list.append((table,self.deferred_buffers[table].qsize()))
212 | len_list.sort(key=lambda tup: -tup[1])
213 |
214 | total_cnt = 0
215 | table_cnt = 0
216 | exec_time = 0
217 | self.conn.execute("BEGIN TRANSACTION")
218 | for table, _ in len_list:
219 | buffer = self.deferred_buffers[table]
220 | # if len(self.deferred_buffers[table]) >= min_count:
221 | if self.deferred_buffers[table].qsize() >= min_count:
222 | # print(table, "ACTUALLY INSERTING", len(self.deferred_buffers[table]))
223 | # placeholder_list = ["?"] * len(buffer[0])
224 | values = self.deferred_buffers[table].get()
225 | placeholder_list = ["?"] * len(values)
226 | # query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (" + ",".join(placeholder_list) + ")"
227 | query = "INSERT OR REPLACE INTO " + table + " VALUES (" + ",".join(placeholder_list) + ")"
228 | query_list = []
229 |
230 | query_list.append(values)
231 | total_cnt += 1
232 |
233 | # current_length = len(buffer) #buffer may grow while processing it! In that case this may never finish unless short-circuited.
234 | while not self.deferred_buffers[table].empty() and total_cnt < max_count_total:
235 | values = self.deferred_buffers[table].get()
236 | query_list.append(values)
237 | total_cnt += 1
238 | # for values in buffer:
239 | # query_list.append(values)
240 | # blob_size_total += len(values[2])
241 | # cnt += 1
242 | # total_cnt += 1
243 | # if total_cnt == max_count_total:
244 | # break
245 |
246 |
247 | table_cnt += 1
248 | t = time.time()
249 | try:
250 | self.conn.executemany(query, query_list)
251 | except:
252 | raise NotImplementedError("Couldn't handle query "+query+", values"+str(query_list), traceback.format_exc())
253 | exec_time += (time.time()-t)
254 | # print(table, "ACTUALLY INSERTED",cnt)
255 | # self.deferred_buffers[table] = self.deferred_buffers[table][cnt:]
256 | if total_cnt >= max_count_total:
257 | break
258 | if single_table:
259 | break
260 |
261 | remaining_cnt = 0
262 | tables = list(self.deferred_buffers.keys())
263 | for table in tables:
264 | # remaining_cnt += len(self.deferred_buffers[table])
265 | remaining_cnt += self.deferred_buffers[table].qsize()
266 | self.conn.execute("COMMIT")
267 | self.currently_processing = False
268 | # print("]",end='')
269 | return table_cnt, total_cnt, remaining_cnt, exec_time
270 |
271 |
272 |
273 | # error_mode = 'REPLACE'
274 | # if not replace:
275 | # error_mode = 'IGNORE'
276 | # query = "INSERT OR " + error_mode + " INTO " + table + " VALUES (?)"
277 | # self.conn.executemany(query,value_list)
278 |
279 |
280 |
281 |
282 | def update_kw(self, table, where, **kwargs):
283 | # column_list = []
284 | # value_list = []
285 | pair_list = []
286 | command_list = {'commit': False, 'connection': None, 'ignore': False}
287 | for key, value in kwargs.items():
288 | if key in command_list:
289 | command_list[key] = value
290 | continue
291 |
292 | try:
293 | check = float(value)
294 | pair_list.append(key + " = " +str(value))
295 | except Exception:
296 | pair_list.append(key + " = " + "'" + value + "'")
297 |
298 | error_mode = 'REPLACE'
299 | if command_list['ignore']:
300 | error_mode = 'IGNORE'
301 | # if 'IGNORE' in command_list:
302 | # error_mode = 'IGNORE'
303 | query = "UPDATE OR " + error_mode + " "+ table + " SET " + (",").join(pair_list) + " WHERE "+ where
304 |
305 | conn_to_use = self.conn
306 | if command_list['connection'] is not None:
307 | conn_to_use = command_list['connection']
308 |
309 | try:
310 | c = conn_to_use.cursor()
311 | self.execute_and_log(c,query)
312 | # if command_list['commit']:
313 | if command_list['commit']:
314 | conn_to_use.commit()
315 | return c.rowcount
316 | except Error as e:
317 | print(self.db,"update_kw error ", e, "table", table, "kwargs", kwargs)
318 | exit(0)
319 |
320 |
321 |
322 |
323 | def select(self,query, return_dictionaries=False, id_col=None):
324 | def printer(b):
325 | print('converting', b)
326 | if b[0] == 'b' and b[1] in["'","\'"]:
327 | return b
328 | return b.decode('UTF-8')
329 |
330 |
331 | conn = self.conn
332 | # def dict_factory(cursor, row):
333 | # d = {}
334 | # for idx, col in enumerate(cursor.description):
335 | # d[col[0]] = row[idx]
336 | # return d
337 |
338 | try:
339 |
340 | # conn.text_factory = printer
341 |
342 | # if dict:
343 | # old_f = conn.row_factory
344 | # conn.row_factory = sqlite3.Row
345 | # conn.row_factory = dict_factory
346 | c = conn.cursor()
347 | self.execute_and_log(c, query)
348 | res = c.fetchall()
349 | if id_col is None:
350 | conv_res = []
351 | if return_dictionaries:
352 | for row in res:
353 | conv_res.append(dict(row))
354 | else:
355 | for row in res:
356 | conv_res.append(list(row))
357 | else:
358 | conv_res = {}
359 | if return_dictionaries:
360 | for row in res:
361 | conv_res[row[id_col]] = dict(row)
362 | else:
363 | for row in res:
364 | conv_res[row[id_col]] = list(row)
365 |
366 | # if dict:
367 | # conn.row_factory = old_f
368 | return conv_res
369 | except Error as e:
370 | print(self.db,"Error ", e,query)
371 | exit(0)
372 |
373 |
374 | def attach(self, other_db_file, other_db_name):
375 | c = self.conn.cursor()
376 | c.execute("ATTACH '" + other_db_file + "' AS " + other_db_name)
377 | self.conn.commit()
378 |
379 |
--------------------------------------------------------------------------------
/code/util.py:
--------------------------------------------------------------------------------
1 | import decimal
2 | import time
3 | from collections import defaultdict
4 | import datetime
5 | import pprint
6 | import pickle
7 |
8 | Q = [decimal.Decimal(10) ** 0, decimal.Decimal(10) ** -1, decimal.Decimal(10) ** -2, decimal.Decimal(10) ** -3,
9 | decimal.Decimal(10) ** -4, decimal.Decimal(10) ** -5, decimal.Decimal(10) ** -6, decimal.Decimal(10) ** -7,
10 | decimal.Decimal(10) ** -8,
11 | decimal.Decimal(10) ** -9, decimal.Decimal(10) ** -10, decimal.Decimal(10) ** -11, decimal.Decimal(10) ** -12]
12 |
13 |
14 |
15 | def dec(num, places=None):
16 | if places is None:
17 | # print("dec",num)
18 | return decimal.Decimal(num)
19 | else:
20 | return decimal.Decimal(num).quantize(Q[places], rounding=decimal.ROUND_HALF_EVEN)
21 |
22 |
23 | logger = None
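# Small buffered logger: log() prints to the console and appends timestamped lines to
# logs/<file> (default log.txt); kwargs such as file=, buffer=, print_only, log_only,
# prettify and ignore_time adjust the behavior.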
24 | class Logger:
25 | def __init__(self, write_frequency=1):
26 | self.files = defaultdict(dict)
27 | self.write_frequency = write_frequency
28 |
29 |
30 | def log(self,*args, **kwargs):
31 | t = time.time()
32 | if 'WRITE ALL' in args:
33 | for filename in self.files:
34 | self.buf_to_file(filename)
35 | # self.files[filename]['file_object'].close()
36 | # self.files = defaultdict(dict)
37 | return
38 |
39 | if 'buffer' in kwargs and kwargs['buffer'] != None:
40 | buffer = kwargs['buffer']
41 | strings = []
42 | if 'ignore_time' not in kwargs:
43 | tm = str(datetime.datetime.now())
44 | strings.append(tm)
45 |
46 | for s in args:
47 | if 'prettify' in kwargs:
48 | s = pprint.pformat(s)
49 | strings.append(str(s))
50 | buffer.append(" ".join(strings))
51 | else:
52 | if 'file' in kwargs:
53 | filename = kwargs['file']
54 | else:
55 | filename = "log.txt"
56 | if filename not in self.files:
57 | # myfile = open('logs/' + filename, "a", encoding="utf-8")
58 | # self.files[filename]['file_object'] = myfile
59 | self.files[filename]['last_write'] = t
60 | self.files[filename]['buffer'] = []
61 |
62 |
63 | buffer = self.files[filename]['buffer']
64 | # myfile = self.files[filename]['file_object']
65 | if 'ignore_time' not in kwargs:
66 | tm = str(datetime.datetime.now())
67 | if 'print_only' not in kwargs:
68 | buffer.append(tm + " ")
69 | # myfile.write(tm + " ")
70 | if 'log_only' not in kwargs:
71 | self.lprint(tm)
72 |
73 | for s in args:
74 | if 'prettify' in kwargs:
75 | s = pprint.pformat(s)
76 | if 'print_only' not in kwargs:
77 | # myfile.write(str(s) + " ")
78 | buffer.append(str(s) + " ")
79 | if 'log_only' not in kwargs:
80 | self.lprint(s)
81 |
82 | if 'print_only' not in kwargs:
83 | # myfile.write("\n")
84 | buffer.append("\n")
85 | if 'log_only' not in kwargs:
86 | self.lprint("", same_line=False)
87 |
88 | self.buf_to_file(filename)
89 | # if 'force_write' in kwargs:
90 | # # if 1:
91 | # self.buf_to_file(filename)
92 | #
93 | # elif self.files[filename]['last_write'] + self.write_frequency < t:
94 | # self.buf_to_file(filename)
95 |
96 | # myfile.close()
97 |
98 | def buf_to_file(self,filename):
99 | buffer = self.files[filename]['buffer']
100 | if len(buffer) > 0:
101 | myfile = open('logs/' + filename, "a", encoding="utf-8")
102 | myfile.write(''.join(buffer))
103 | myfile.close()
104 | self.files[filename]['buffer'] = []
105 | self.files[filename]['last_write'] = time.time()
106 |
107 | def lprint(self,p, same_line=True):
108 | try:
109 | if same_line:
110 | print(p, end=' ')
111 | else:
112 | print(p)
113 | except Exception:
114 | pass
115 |
116 |
117 | def log(*args,**kwargs):
118 | global logger
119 | if logger is None:
120 | logger = Logger()
121 |
122 | logger.log(*args,**kwargs)
123 |
--------------------------------------------------------------------------------
/data/addresses.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iraykhel/blockchain-address-database/bde6b23bdb368b9ce6d5443adda7c20763fe109b/data/addresses.db
--------------------------------------------------------------------------------
/driver.py:
--------------------------------------------------------------------------------
1 | from code.chain import Chain
2 | from code.sqlite import SQLite
3 |
4 |
5 |
6 | address_db = SQLite('addresses')
7 |
8 |
9 | #comment out chains you don't need
10 | #to get the cookie, log into the appropriate scanner (e.g. etherscan.io), press F12 in your browser, and go to [scanner url]/labelcloud
11 | #assuming you are using Chrome, go to Network -> Doc -> labelcloud -> Headers -> Request Headers -> cookie, right-click it and choose "Copy value"
12 |
13 | eth_cookie = 'YOUR ETHERSCAN COOKIE HERE'
14 | eth_api_key = 'YOUR ETHERSCAN API KEY'
15 | ignore_labels = ['Eth2 Depositor','Blocked','User Proxy Contracts','Upbit Hack','Phish / Hack']
16 | eth_chain = Chain(address_db,'ETH', 'etherscan.io', 'ETH', eth_api_key,eth_cookie, primary_swap='uniswap', ignore_labels=ignore_labels)
17 |
18 | bsc_cookie = 'YOUR BSCSCAN COOKIE HERE'
19 | bsc_api_key = 'YOUR BSCSCAN API KEY'
20 | bsc_chain = Chain(address_db,'BSC', 'bscscan.com', 'BNB', bsc_api_key, bsc_cookie, primary_swap='pancakeswap' )
21 |
22 | heco_cookie = 'YOUR HECOINFO COOKIE HERE'
23 | heco_api_key = 'YOUR HECOINFO API KEY'
24 | heco_chain = Chain(address_db,'HECO', 'hecoinfo.com', 'HT', heco_api_key, heco_cookie)
25 |
26 | polygon_cookie = 'YOUR POLYGONSCAN COOKIE HERE'
27 | polygon_api_key = 'YOUR POLYGONSCAN API KEY'
28 | polygon_chain = Chain(address_db,'POLYGON', 'polygonscan.com', 'MATIC', polygon_api_key, polygon_cookie)
29 |
30 | fantom_cookie = 'YOUR FTMSCAN COOKIE HERE'
31 | fantom_api_key = 'YOUR FTMSCAN API KEY'
32 | fantom_chain = Chain(address_db,'FANTOM', 'ftmscan.com', 'FTM', fantom_api_key, fantom_cookie)
33 |
34 | hoo_cookie = 'YOUR HOOSCAN COOKIE HERE'
35 | hoo_api_key = 'YOUR HOOSCAN API KEY'
36 | hoo_chain = Chain(address_db,'HSC', 'hooscan.com', 'HOO', hoo_api_key, hoo_cookie)
37 |
38 |
39 | chain = hoo_chain
40 |
41 | #page-scrapes the labelcloud for the list of labels, then downloads all addresses for each label
42 | #if it crashes, you can restart where it left off using the start parameter
43 | chain.store_all_labels_to_db()
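#e.g. to resume after a crash: chain.store_all_labels_to_db(start='<last processed label name>')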
44 |
45 | #finds all deployers in the downloaded labels
46 | deployer_dict = chain.load_from_db_by_label('Contract Deployer')
47 | factory_dict = chain.load_from_db_by_label('Factory Contract')
48 | unlabeled_dict = chain.find_unlabeled_deployers()
49 | deployer_dict.update(factory_dict)
50 | deployer_dict.update(unlabeled_dict)
51 |
52 | #downloads the children of each deployer using the API. If it crashes, you can restart where it left off using the start parameter
53 | chain.get_all_spawns_by_dict(deployer_dict)
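#e.g. to resume after a crash: chain.get_all_spawns_by_dict(deployer_dict, start='<last processed deployer address>')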
54 |
55 |
56 | address_db.disconnect()
--------------------------------------------------------------------------------
/frontend/db_browse.py:
--------------------------------------------------------------------------------
1 | import os
2 | import subprocess
3 | # import sqlite_web
4 |
5 | ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
6 |
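# Launch the sqlite_web browser UI against the bundled database (serves a local web page;
# this call blocks until the server is stopped).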
7 | subprocess.check_output(['sqlite_web', ROOT_DIR + '/data/addresses.db'])
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | requests==2.26.0
2 | beautifulsoup4==4.9.3
3 | lxml
4 | sqlite_web==0.3.8
--------------------------------------------------------------------------------