├── README.md
└── BSC_Contract_Scraper.py

/README.md:
--------------------------------------------------------------------------------
# Binance Smart Chain Contract Scraper + Contract Evaluator
Pulls BscScan's feed of newly-verified Binance Smart Chain contracts every 30 seconds, then checks their contract code for links to socials.
Returns only those that include socials information, and then submits each contract address to TokenSniffer to evaluate contract legitimacy.

Sample execution:
![2b423cea3307c40b307fdfdfe2528592](https://user-images.githubusercontent.com/62744506/149968695-6a91dc12-be82-408b-9082-ff5796896391.png)

It's common practice to include links to social media such as Telegram somewhere in the contract for added transparency. The idea of the scraper is to return new
contracts just as they come out that have a website, social media, etc., which demonstrates some level of ambition - just a website is often enough for decently high market caps.

Generally the contracts returned will never have high evaluation scores, as they have only just come out and TokenSniffer's evaluation criteria are based on
factors like buying fees, contract ownership renouncements, liquidity locks, etc., which in most cases are set some time after contract verification.

Upon startup you'll have to solve the captcha that appears in the ChromeDriver window to bypass TokenSniffer's bot protection.
Contracts returned with scores around 40-50 are generally the ones to keep an eye on, as their overall evaluation is bound to rise if the team goes on to do the things mentioned above
after launching. In general, I advise trading the tokens returned by the scraper with caution; many of them are scams and will just lose you your funds.
However, as this scraper returns contracts right when they come out, there is ample opportunity to be very early to good projects too. DYOR!

Future versions might use frameworks like Puppeteer to bypass the captcha so the driver can run headlessly without that annoying Chrome window.
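
The socials check itself boils down to a keyword search over the contract's code page on BscScan. Below is a minimal sketch of that idea; the function name and the trimmed-down headers are illustrative, and the full logic, including the TokenSniffer evaluation, lives in BSC_Contract_Scraper.py:

```python
import requests

# Keywords that suggest a project links to socials or a website
KEYWORDS = ["t.me/", "twitter.com", "medium.com", ".finance"]
HEADERS = {"User-Agent": "Mozilla/5.0"}


def has_socials(address: str) -> bool:
    # Fetch the verified-contract page for this address and flag it
    # if any social/website keyword appears anywhere in the page source
    page = requests.get("https://bscscan.com/address/" + address + "#code", headers=HEADERS)
    return any(keyword in page.text for keyword in KEYWORDS)
```

Addresses that pass this check are then opened on TokenSniffer in the Selenium-driven browser to pull their evaluation score.
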
To run:
1. Install Google Chrome.
2. Install the requests, selenium and undetected-chromedriver packages into a Python 3 environment of your choice.
3. Install the Chrome webdriver from: https://chromedriver.chromium.org/home
4. Pass the directory location of your chromedriver.exe as a string argument into the Service object on line 29:

![ce4e75df3c012907aee2c9559c250f30](https://user-images.githubusercontent.com/62744506/149964319-ce99bc2f-46ff-4cb7-8f89-c5791a8cb489.png)

5. Run BSC_Contract_Scraper.py

New contracts will be printed to the console; you can then open the contract code to find the links to any social media or website.

--------------------------------------------------------------------------------
/BSC_Contract_Scraper.py:
--------------------------------------------------------------------------------
#! python3
#
# Pulls the BSC Scan feed of newly-verified contracts every 30 seconds, then checks their contract code for links to socials.
# Returns only those that include socials information, and then submits each contract address to TokenSniffer to evaluate contract legitimacy.
# It's common practice to include links to social media such as Telegram somewhere in the contract for added transparency. The idea of the scraper is to return new
# contracts just as they come out that have a website, social media, etc., which demonstrates some level of ambition - just a website is often enough for decently high market caps.
# Generally the contracts returned will never have high evaluation scores, as they have only just come out and TokenSniffer's evaluation criteria are based on
# factors like buying fees, contract ownership renouncements, liquidity locks, etc., which in most cases are set some time after contract verification.

# Upon startup you'll have to solve the captcha that appears in the ChromeDriver window to bypass TokenSniffer's bot protection.
# Contracts returned with scores around 40-50 are generally the ones to keep an eye on, as their overall evaluation is bound to rise if the team goes on to do the things mentioned above
# after launching. In general, I advise trading the tokens returned by the scraper with caution; many of them are scams and will just lose you your funds.
# However, as this scraper returns contracts right when they come out, there is ample opportunity to be very early to good projects too. DYOR!
# author Greek, GG

import requests
import time
import undetected_chromedriver as uc
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service


URL = "https://bscscan.com/"
contractsFeedURL = URL + "contractsVerified"

options = uc.ChromeOptions()
options.add_argument('--headless')

browser = uc.Chrome(use_subprocess=True, options=options)

payload = {}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.116 Safari/537.36'}

addresses = []      # To store all the addresses pulled from the contracts feed
leadAddresses = []  # To store only those addresses whose contracts include socials information
keywords = ["t.me/", "twitter.com", "medium.com", ".finance"]  # Keywords being scraped for in smart contract headers; can include any keyword you want to query for


def FeedScan():  # Update the BSC Scan verified contracts feed and scrape the addresses of new additions

    res = requests.get(contractsFeedURL, headers=headers, data=payload)  # Load the BSC Scan newly verified contracts page
    contractsFeed = res.text.split("<tbody>")[1]  # Break out the table of contracts from the page
    for i in range(1, 26):  # Iterate over each entry in the table, i.e. the newest 25 records
        addrSplit = contractsFeed.split("href=\'/address/")[i].split("#code\'")[0]  # Break out the address of each individual token
        if addrSplit not in addresses:  # Check whether the address has already been parsed
            addresses.append(addrSplit)


def ContractCheck(address):  # Check the contract code of a given token address and see if it contains links to socials

    contractURL = URL + "address/" + address + "#code"

    tokenSnifferUrl = "https://tokensniffer.com/token/bsc/" + address
    bscScanRequest = requests.get(contractURL, headers=headers, data=payload)

    try:
        contractText = bscScanRequest.text.split("id=\'editor\' style=\'margin-top: 5px;\'>")[1].split("