├── README.md
├── channels.py
├── detailsshell.py
├── forwards.py
├── groups.py
├── launcher.py
└── setup.py


/README.md:
--------------------------------------------------------------------------------
# TelegramScraper

A toolkit for scraping Telegram to investigate shady goings on.

## Installation

1. Download all files and save them to a directory of your choice.

2. Ensure pandas and telethon are installed.

```bash
pip install pandas
pip install telethon
```

3. Obtain your Telegram API details from [my.telegram.org][1] (further instructions to be added here).

4. In a terminal, navigate to the installation directory (e.g. Desktop) and run setup.py

```bash
cd Desktop
python3 setup.py
```

5. Executing the setup.py file will ask for your Telegram API details and prepare the toolkit with them.

N.B.: There is currently no easy installation. I'm working on properly packaging everything to make this straightforward for non-technical users.

## Usage

Once installation is complete, you can launch the toolkit from launcher.py

```bash
cd Desktop
python3 launcher.py
```

The launcher will guide you through each of the tools. Here is an overview.

1. _Scrape group members_
Scrapes all group members from a Telegram group you are part of. Exports a .CSV containing the username (when available), user ID, name, group name and group ID. The file is named after the group.

2. _Scrape forwards from chats you are in_
Scrapes all forwards from a chat you are following. Saves from, from ID, to and to ID to _forwards_data.csv_. It can then scrape forwards from all the discovered channels for a larger network map. This second feature takes a long time to run, but is worthwhile for a broader analysis.

3. _Scrape forwards from a channel_
Scrapes all forwards from any channel you specify. It can then scrape forwards from all the discovered channels for a larger network map. This second feature takes a long time to run, but is worthwhile for a broader analysis.

Currently this tool only scrapes from user and to user, then saves to _edgelist.csv_ (and to _net.csv_ for the discovered channels).

## Upcoming updates

1. An option to export all data (from user, from user ID, to user, and to ID) OR simply export an edgelist for direct analysis.

2. Updating all save files to generate unique names for each group/chat scraped.

3. A tool to archive all messages and media from a chat.

## Known bugs

1. Sometimes, when using _scrape group members_, returning to the launcher, then selecting _scrape forwards from chats you are in_, the toolkit will crash. This is an API error and can be avoided by restarting the launcher.

2. _Scrape forwards from chats you are in_ displays an error message when you try to pull from _Groups_ rather than _Channels_. Working on a fix to omit groups from the generated list.

## Feedback

Please send all feedback either to (@[jordanwildon][2]) on Twitter, or to jordanwildon@protonmail.com

## License

This project is still being tested and is not currently licensed. Please contact (@[jordanwildon][2]) on Twitter, or email jordanwildon@protonmail.com for usage information and restrictions.

## Credits

All tools created by Jordan Wildon (@[jordanwildon][2]) and Alex Newhouse (@[AlexBNewhouse][3]).

[1]: "Telegram API"
[2]: "@jordanwildon"
[3]: "@AlexBNewhouse"

--------------------------------------------------------------------------------
/channels.py:
--------------------------------------------------------------------------------
# telethon.sync lets the connect/sign-in calls below run without "await"
from telethon.sync import TelegramClient
import pandas as pd
import details as ds

# Login details
api_id = ds.apiID
api_hash = ds.apiHash
phone = ds.number
client = TelegramClient(phone, api_id, api_hash)

# Connect and, on first use, sign in with the code Telegram sends
client.connect()
if not client.is_user_authorized():
    client.send_code_request(phone)
    client.sign_in(phone, input('Enter the code: '))

print('Welcome to channel forward scraper')
print('This tool will scrape a Telegram channel for all forwarded messages and their original source.')

# Keep asking until the user confirms the channel name
while True:
    channel_name = input("Please enter a Telegram channel name:\n")
    print(f'You entered "{channel_name}"')
    answer = input('Is this correct? (y/n)')
    if answer == 'y':
        print('Scraping forwards from', channel_name, '...')
        break

async def main():
    edges = []
    async for message in client.iter_messages(channel_name):
        if message.forward is not None:
            try:
                source_id = message.forward.original_fwd.from_id
                if source_id is not None:
                    ent = await client.get_entity(source_id)
                    print(ent.title)
                    edges.append([channel_name, ent.title])
            except Exception:
                # e.g. the source is a user (no title) or cannot be resolved
                print("An exception occurred")

    df = pd.DataFrame(edges, columns=['From', 'To'])
    df.to_csv('edgelist.csv')

with client:
    client.loop.run_until_complete(main())

print('Forwards scraped successfully.')

next1 = input('Do you also want to scrape forwards from the discovered channels? (y/n)')
if next1 == 'y':
    print('Scraping forwards from channels discovered in', channel_name, '...')

    async def new_main():
        # The "To" column of edgelist.csv holds the channels found above
        df = pd.read_csv('edgelist.csv')
        discovered = df.To.unique()
        edges = []
        for i in discovered:
            async for message in client.iter_messages(i):
                if message.forward is not None:
                    try:
                        source_id = message.forward.original_fwd.from_id
                        if source_id is not None:
                            ent = await client.get_entity(source_id)
                            print(ent.title)
                            edges.append([i, ent.title])
                    except Exception:
                        print("An exception occurred")
            print("# # # # # # # # # # Next channel: ", i, "# # # # # # # # # #")
        df = pd.DataFrame(edges, columns=['From', 'To'])
        df.to_csv("net.csv")

    with client:
        client.loop.run_until_complete(new_main())
    print('Forwards scraped successfully.')

again = input('Do you want to scrape more channels? (y/n)')
if again == 'y':
    print('Restarting...')
    exec(open("channels.py").read())

launcher = input('Do you want to return to the launcher? (y/n)')
if launcher == 'y':
    print('Restarting...')
    exec(open("launcher.py").read())
else:
    print('Thank you for using the Telegram scraper.')

--------------------------------------------------------------------------------
/detailsshell.py:
--------------------------------------------------------------------------------
apiID = old_text1
apiHash = old_text2
number = old_text3

--------------------------------------------------------------------------------
/forwards.py:
--------------------------------------------------------------------------------
# Imports
from telethon.sync import TelegramClient
from telethon.tl.functions.messages import GetDialogsRequest
from telethon.tl.types import InputPeerEmpty
import pandas as pd
import details as ds

# Login details
api_id = ds.apiID
api_hash = ds.apiHash
phone = ds.number
client = TelegramClient(phone, api_id, api_hash)

# Check authorisation
client.connect()
if not client.is_user_authorized():
    client.send_code_request(phone)
    client.sign_in(phone, input('Enter the code: '))

last_date = None
chunk_size = 200

# Fetch up to chunk_size of the account's dialogs
result = client(GetDialogsRequest(
    offset_date=last_date,
    offset_id=0,
    offset_peer=InputPeerEmpty(),
    limit=chunk_size,
    hash=0
))
groups = list(result.chats)

print('List of chats:')
for i, g in enumerate(groups):
    print(str(i) + '- ' + g.title)

g_index = input("Enter a Number: ")
target_entity = groups[int(g_index)]

print('Fetching forwards...')

async def main():
    rows = []
    async for message in client.iter_messages(target_entity):
        if message.forward is not None:
            try:
                source_id = message.forward.original_fwd.from_id
                if source_id is not None:
                    ent = await client.get_entity(source_id)
                    print(ent.title)
                    rows.append([ent.title, source_id, target_entity.title, target_entity.id])
            except Exception:
                # e.g. the source is a user (no title) or cannot be resolved
                print("An exception occurred")

    df = pd.DataFrame(rows, columns=['From', 'From ID', 'To', 'To ID'])
    df.to_csv('forwards_data.csv')

with client:
    client.loop.run_until_complete(main())

print('Forwards scraped successfully.')

next1 = input('Do you also want to scrape forwards from the discovered channels? (y/n)')
if next1 == 'y':
    print('Scraping forwards from channels discovered in', target_entity.title, '...')

    async def new_main():
        # Read the file written by main() above. The original read
        # 'edgelist.csv', which this script never creates; here the
        # discovered sources are in the "From" column.
        df = pd.read_csv('forwards_data.csv')
        discovered = df['From'].unique()
        edges = []
        for i in discovered:
            async for message in client.iter_messages(i):
                if message.forward is not None:
                    try:
                        source_id = message.forward.original_fwd.from_id
                        if source_id is not None:
                            ent = await client.get_entity(source_id)
                            print(ent.title)
                            edges.append([i, ent.title])
                    except Exception:
                        print("An exception occurred")
            print("# # # # # # # # # # Next channel: ", i, "# # # # # # # # # #")
        df = pd.DataFrame(edges, columns=['From', 'To'])
        df.to_csv("net.csv")

    with client:
        client.loop.run_until_complete(new_main())
    print('Forwards scraped successfully.')

again = input('Do you want to scrape more groups? (y/n)')
if again == 'y':
    print('Restarting...')
    exec(open("forwards.py").read())

launcher = input('Do you want to return to the launcher? (y/n)')
if launcher == 'y':
    print('Restarting...')
    exec(open("launcher.py").read())
else:
    print('Thank you for using the Telegram scraper.')

--------------------------------------------------------------------------------
/groups.py:
--------------------------------------------------------------------------------
# Imported here so the script also runs on its own, not only via launcher.py
from telethon.sync import TelegramClient
from telethon.tl.functions.messages import GetDialogsRequest
from telethon.tl.types import InputPeerEmpty
import csv
import details as ds

# Login details
api_id = ds.apiID
api_hash = ds.apiHash
phone = ds.number
client = TelegramClient(phone, api_id, api_hash)

client.connect()
if not client.is_user_authorized():
    client.send_code_request(phone)
    client.sign_in(phone, input('Enter the code: '))

last_date = None
chunk_size = 200

result = client(GetDialogsRequest(
    offset_date=last_date,
    offset_id=0,
    offset_peer=InputPeerEmpty(),
    limit=chunk_size,
    hash=0
))

# Keep only megagroups; chats without a megagroup attribute are skipped
groups = [chat for chat in result.chats if getattr(chat, 'megagroup', False)]

print('Choose a group to scrape members from:')
for i, g in enumerate(groups):
    print(str(i) + '- ' + g.title)

g_index = input("Enter a Number: ")
target_group = groups[int(g_index)]

print('Fetching members...')
all_participants = client.get_participants(target_group, aggressive=True)

print('Creating file...')
file_name = f"{target_group.title}_members.csv"

print('Saving in file...')
with open(file_name, "w", encoding='UTF-8') as f:
    writer = csv.writer(f, delimiter=",", lineterminator="\n")
    writer.writerow(['username', 'user id', 'name', 'group', 'group id'])
    for user in all_participants:
        # Missing fields come back as None, so fall back to empty strings
        username = user.username or ""
        first_name = user.first_name or ""
        last_name = user.last_name or ""
        name = (first_name + ' ' + last_name).strip()
        writer.writerow([username, user.id, name, target_group.title, target_group.id])
print('Members scraped successfully.')

again = input('Do you want to scrape more groups? (y/n)')
if again == 'y':
    print('Restarting...')
    exec(open("groups.py").read())

launcher = input('Do you want to return to the launcher? (y/n)')
if launcher == 'y':
    print('Restarting...')
    exec(open("launcher.py").read())
else:
    print('Thank you for using the Telegram scraper.')

--------------------------------------------------------------------------------
/launcher.py:
--------------------------------------------------------------------------------
from telethon.sync import TelegramClient

# Launcher code
print('Welcome to Telegram Scraper')
print('Please select a function:')

li = ['Scrape group members', 'Scrape forwards from chats you are in', 'Scrape forwards from a channel']

def display(li):
    for idx, tables in enumerate(li):
        print("%s. %s" % (idx+1, tables))

def get_list(li):
    choose = int(input("\nPick a number:"))-1
    if choose < 0 or choose > (len(li)-1):
        print('Invalid Choice')
        return ''
    return li[choose]

display(li)
choice = get_list(li)

print('Loading', choice, '...')

# Each tool is run in this process by executing its script file
if choice == 'Scrape group members':
    print('Launching group member scraper')
    exec(open("groups.py").read())
elif choice == 'Scrape forwards from chats you are in':
    print('Launching chat forward scraper')
    exec(open("forwards.py").read())
elif choice == 'Scrape forwards from a channel':
    print('Launching channel forward scraper')
    exec(open("channels.py").read())

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
# Ask the user for their Telegram details and guide them through it

print('Welcome to the Telegram Scraper setup wizard.')
print('This file will insert your login information to the Telegram Scraper scripts.')
print('Follow the README instructions to get your credentials.')

# The API ID is written unquoted so that it becomes an integer in details.py
while True:
    a = input("Please enter your API ID:\n")
    print(f'You entered "{a}"')
    a1 = input('Is this correct? (y/n)')
    if a1 == 'y':
        print('Updating...')
        new_text1 = a
        break

while True:
    h = input("Please enter your API Hash:\n")
    print(f'You entered "{h}"')
    a2 = input('Is this correct? (y/n)')
    if a2 == 'y':
        print('Updating...')
        new_text2 = "'" + h + "'"
        break

while True:
    n = input("Please enter your phone number:\n")
    print(f'You entered "{n}"')
    a3 = input('Is this correct? (y/n)')
    if a3 == 'y':
        print('Updating...')
        new_text3 = "'" + n + "'"
        break

checkWords = ("old_text1", "old_text2", "old_text3")
repWords = (new_text1, new_text2, new_text3)

# Copy the template to details.py, swapping each placeholder for its value
with open("detailsshell.py", "rt") as fin1, open("details.py", "wt") as fout1:
    for line in fin1:
        for check, rep in zip(checkWords, repWords):
            line = line.replace(check, rep)
        fout1.write(line)

print('Setup is complete.')

launcher = input('Do you want to open the launcher? (y/n)')
if launcher == 'y':
    print('Starting...')
    exec(open("launcher.py").read())
else:
    print('The launcher is now ready and can be started with the launcher.py file. You may now close the terminal.')

--------------------------------------------------------------------------------
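
The README's second known bug says the chat list in forwards.py should omit groups. Below is a minimal sketch of one possible filter, not the authors' fix. It assumes the chat objects returned by Telethon carry a `broadcast` flag that is true for channels and absent or false for groups. The helper name `only_broadcast_channels` and the sample objects are hypothetical and stand in for `result.chats`.

```python
from types import SimpleNamespace


def only_broadcast_channels(chats):
    """Keep chats whose `broadcast` flag is truthy; drop everything else.

    Hypothetical helper: assumes channels expose broadcast=True, while
    groups either expose broadcast=False or lack the attribute entirely.
    """
    return [chat for chat in chats if getattr(chat, "broadcast", False)]


# Stand-in objects for illustration only; in forwards.py the list would
# come from result.chats instead.
sample_chats = [
    SimpleNamespace(title="News channel", broadcast=True, megagroup=False),
    SimpleNamespace(title="Discussion supergroup", broadcast=False, megagroup=True),
    SimpleNamespace(title="Small group"),  # no broadcast attribute at all
]

channels = only_broadcast_channels(sample_chats)
for index, chat in enumerate(channels):
    print(f"{index}- {chat.title}")
```

Using `getattr` with a default mirrors the defensive check groups.py applies to `megagroup`, so objects that lack the attribute are skipped instead of raising an error.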