├── .gitignore ├── clbExtract.log ├── offlineTranslate ├── images │ └── offlineTranslate.jpg ├── __pycache__ │ └── bulk_translate_v3.cpython-310.pyc ├── requirements.txt ├── readme.md ├── translateGUI.py ├── old │ └── bulk_translate.py └── bulk_translate_v3.py ├── clbExtract ├── requirements.txt ├── readme.md ├── clbExtractGUI.py ├── old │ └── clbExtract.py └── clbExtract.py ├── README.md ├── locationExtract └── locationExtract.py └── applenotes2hash └── applenotes2hash.py /.gitignore: -------------------------------------------------------------------------------- 1 | .venv 2 | .gitignore 3 | .vscode/settings.json -------------------------------------------------------------------------------- /clbExtract.log: -------------------------------------------------------------------------------- 1 | 2023-07-25 21:05:49,626,- INFO - Bulk processing 0 files 2 | -------------------------------------------------------------------------------- /offlineTranslate/images/offlineTranslate.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facelessg00n/pythonForensics/HEAD/offlineTranslate/images/offlineTranslate.jpg -------------------------------------------------------------------------------- /offlineTranslate/__pycache__/bulk_translate_v3.cpython-310.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facelessg00n/pythonForensics/HEAD/offlineTranslate/__pycache__/bulk_translate_v3.cpython-310.pyc -------------------------------------------------------------------------------- /offlineTranslate/requirements.txt: -------------------------------------------------------------------------------- 1 | certifi==2023.11.17 2 | charset-normalizer==3.3.2 3 | et-xmlfile==1.1.0 4 | idna==3.4 5 | numpy==1.26.2 6 | openpyxl==3.1.2 7 | pandas==2.1.3 8 | python-dateutil==2.8.2 9 | pytz==2023.3.post1 10 | requests==2.31.0 11 | six==1.16.0 12 | tk==0.1.0 13 | 
tqdm==4.66.4 14 | tzdata==2023.3 15 | urllib3==2.1.0 16 | XlsxWriter==3.1.9 17 | -------------------------------------------------------------------------------- /clbExtract/requirements.txt: -------------------------------------------------------------------------------- 1 | altgraph==0.17.3 2 | black==23.1.0 3 | click==8.1.3 4 | future==0.18.2 5 | macholib==1.16.2 6 | mypy-extensions==0.4.3 7 | numpy==1.23.3 8 | packaging==23.0 9 | pandas==1.5.0 10 | pathspec==0.11.0 11 | pefile==2022.5.30 12 | platformdirs==2.6.2 13 | pyinstaller==5.6.2 14 | pyinstaller-hooks-contrib==2022.13 15 | python-dateutil==2.8.2 16 | pytz==2022.4 17 | pywin32-ctypes==0.2.0 18 | scapy==2.4.5 19 | simplekml==1.3.6 20 | six==1.16.0 21 | tk==0.1.0 22 | tomli==2.0.1 23 | typing_extensions==4.4.0 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # pythonForensics 2 | 3 | Collection of handy Python scripts 4 | 5 | ### applenotes2hash.py 6 | 7 | Extracts hashes from Apple Notes NoteStore.sqlite, or from a GK extract, for cracking. Exports in Hashcat or John format. 8 | 9 | ### clbExtract.py 10 | 11 | Extracts contact details from Cellebrite formatted Excel files 12 | 13 | ### locationExtract.py 14 | 15 | Extracts location data from Cellebrite Excel files and converts it to an ESRI-friendly format. Can also look for gaps of more than a specified time. 16 | 17 | ### offlineTranslate 18 | 19 | Utilises LibreTranslate for the bulk offline translation of messages 20 | -------------------------------------------------------------------------------- /clbExtract/readme.md: -------------------------------------------------------------------------------- 1 | # Cellebrite Contact Extractor 2 | 3 | --- 4 | Extracts contacts from Cellebrite formatted Excel files. 
The data in these files is nested within the columns of an Excel file, which can cause issues when analysing them with third-party tools. 5 | 6 | This tool exports contacts on a per-app basis into flat .CSV files for use with third-party analysis tools. It was built to handle Excel files as this is typically what analysts will receive unless they have received a 'reader' file. 7 | 8 | ## Usage 9 | 10 | This folder contains 2 Python scripts. One is an optional GUI if you wish to build this into a portable .exe file. 11 | 12 | These instructions assume you are utilising *VS Code* () and have a Python environment set up. 13 | 14 | Download the contents of this folder and open the folder in VS Code. 15 | 16 | Create and activate a virtual environment. 17 | 18 | 19 | 20 | Install the required packages; this includes the tools to turn this into a portable exe. 21 | 22 | `pip install -r .\requirements.txt` 23 | 24 | The standalone script may then be run from the command line. 25 | 26 | options: 27 | 28 | - -h show this help message and exit 29 | - -f path to the input file 30 | - -b process all files in the working directory 31 | - -p add data provenance from one of the pre-approved items 32 | 33 | Place the Excel files in the folder where the script is located to process the files in bulk. 34 | 35 | ## Building the exe 36 | 37 | A portable exe can be built utilising PyInstaller. 38 | 39 | The exe must be built on the same OS it is intended to be run on or it will not work. For example, if you intend to build this for use on Windows machines it must be built on a Windows machine. 40 | 41 | The resulting exe will be located in the /dist folder of the working directory after it has been built. 
42 | 43 | ### **With GUI** 44 | 45 | `pyinstaller --onefile .\clbExtractGUI.py` 46 | 47 | ### **Without GUI** 48 | 49 | `pyinstaller --onefile .\clbExtract.py` 50 | 51 | --- 52 | 53 | ## Current known issues 54 | 55 | - Native contacts export does not currently include email addresses 56 | - Depending on which version of Cellebrite was used, or what type of extraction was performed, some social media user IDs may not be available in the Excel files. 57 | 58 | ## Network Analysis Tools 59 | 60 | ---- 61 | **Constellation** 62 | 63 | 64 | 65 | **Maltego** 66 | 67 | 68 | -------------------------------------------------------------------------------- /offlineTranslate/readme.md: -------------------------------------------------------------------------------- 1 | # Offline Translation 2 | 3 | --- 4 | 5 | Many forensic tools have in-built translation offerings; however, in my experience they can be slow or unreliable. As an offline translation option is often required, I began to seek other means of translation. Enter LibreTranslate, a self-hosted machine translation API. 6 | 7 | 8 | 9 | ## Installation 10 | 11 | Installation options will depend on your environment; however, to test the proof of concept, LibreTranslate can be installed with the following command from an internet-connected machine. 12 | 13 | `pip install libretranslate` 14 | 15 | The server can then be started on localhost with the following command. On first run it will pull down the language packages. The machine can then be taken offline. 16 | 17 | `libretranslate` 18 | --- 19 | 20 | ## Modification 21 | 22 | You may need to change the `serverURL = "http://localhost:5000"` value to match where your LibreTranslate instance is hosted. 23 | 24 | ## Script usage 25 | 26 | The Python script loads the specified Excel file and looks for a column named 'Messages', as per the Magnet AXIOM formatted Excel sheets. At this time it will only handle Excel documents with a single sheet. 
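The per-message translation call the script performs can be sketched with `requests` as below. This is a minimal illustration, not the script itself: the payload field names (`q`, `source`, `target`, `format`) follow the LibreTranslate `/translate` API, and the server address is assumed to be the default localhost instance.

```python
import requests

SERVER_URL = "http://localhost:5000"  # change to match your LibreTranslate instance


def build_payload(text, source="auto", target="en"):
    # Field names as expected by the LibreTranslate /translate endpoint
    return {"q": text, "source": source, "target": target, "format": "text"}


def translate(text, source="auto", target="en", server=SERVER_URL):
    # POST one message to the server and return the translated string
    resp = requests.post(
        server + "/translate", json=build_payload(text, source, target), timeout=30
    )
    resp.raise_for_status()
    return resp.json()["translatedText"]
```

Passing a concrete source language code instead of `"auto"` skips the server's language-detection step, which is why manual selection is both faster to reason about and more accurate.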
27 | 28 | In reality it will take any Excel spreadsheet with a column named 'Messages'. 29 | 30 | - Auto-detection of the language is much faster, but not as accurate. If you know the language it is best to select one of the language codes manually. To retrieve the language codes run `python3 bulk_translate_v3.py -g` and the available languages from the server will be listed. 31 | - The generated CSV files may not open in Microsoft Excel, but will open in LibreOffice Calc. The script will also attempt to output Excel files. 32 | - Defaults to English translation, but other languages are possible. 33 | 34 | Example usage 35 | 36 | Auto Detect 37 | python3 bulk_translate_v3.py -f excel.xlsx 38 | 39 | Manually Select language 40 | python3 bulk_translate_v3.py -f excel.xlsx -l zh 41 | 42 | ![screenshot](https://github.com/facelessg00n/pythonForensics/blob/main/offlineTranslate/images/offlineTranslate.jpg) 43 | 44 | ## Other usage 45 | 46 | options: 47 | 48 | -h, --help show this help message and exit 49 | 50 | -f INPUTFILEPATH, --file INPUTFILEPATH 51 | Path to Excel File 52 | 53 | -s TRANSLATIONSERVER, --server TRANSLATIONSERVER 54 | Address of translation server if not localhost or hardcoded 55 | 56 | -l {}, --language {} Language code for input text - optional but can greatly improve accuracy 57 | 58 | -e {Chats,Instant Messages}, --excelSheet {Chats,Instant Messages} 59 | Sheet name within Excel file to be translated 60 | 61 | -c, --isCellebrite If file originated from Cellebrite, header starts at 1, and message column is called 'Body' 62 | 63 | -g, --getlangs Get supported language codes and names from server 64 | 65 | ## Building the exe 66 | 67 | A portable exe can be built utilising PyInstaller. 68 | 69 | The exe must be built on the same OS it is intended to be run on or it will not work. For example, if you intend to build this for use on Windows machines it must be built on a Windows machine. 
70 | 71 | The resulting exe will be located in the /dist folder of the working directory after it has been built. 72 | 73 | ### **With GUI** 74 | 75 | `pyinstaller --onefile .\translateGUI.py` 76 | 77 | ### **Without GUI** 78 | 79 | `pyinstaller --onefile .\bulk_translate_v3.py` 80 | -------------------------------------------------------------------------------- /locationExtract/locationExtract.py: -------------------------------------------------------------------------------- 1 | """ 2 | Extracts location data from a Cellebrite PA report and converts it to an ESRI friendly time format. 3 | 4 | Data is Extracted from the Timeline tab of the Excel report. 5 | 6 | Also has a feature to look for gaps in recording. 7 | 8 | """ 9 | import argparse 10 | import logging 11 | import pandas as pd 12 | import os 13 | import sys 14 | from datetime import datetime, timedelta 15 | 16 | # file = "input.xlsx" 17 | 18 | # Details 19 | __description__ = "Converts Cellebrite PA Garmin extracts to an ESRI compatible CSV.\n Loads data from the Timeline tab of the Excel Export" 20 | __author__ = "facelessg00n" 21 | __version__ = "0.1" 22 | 23 | # Options 24 | debug = False 25 | findGaps = False 26 | dateAfter = None 27 | localConvert = True 28 | 29 | localHour = 0.0 30 | localMinute = 0.0 31 | 32 | # ----------------------Config the logger------------------------------------ 33 | logging.basicConfig( 34 | filename="log.txt", 35 | format="%(levelname)s:%(asctime)s:%(message)s", 36 | level=logging.DEBUG, 37 | ) 38 | # ---------------Functions -------------------------------------------------- 39 | 40 | # Setup function to load spreadsheet and columns of interest 41 | def convertFile(inputFilename, dateAfter=None, gapFinder=None): 42 | if dateAfter is not None: 43 | print("Looking for dates after = " + str(dateAfter)) 44 | dateCut = True 45 | else: 46 | dateCut = False 47 | 48 | print("Loading Excel file {}".format(inputFilename)) 49 | logging.info("Loading Excel data from %s" % 
(str(inputFilename))) 50 | 51 | try: 52 | df = pd.read_excel(inputFilename, sheet_name="Timeline", header=1) 53 | df = df[["#", "Time", "Latitude", "Longitude"]] 54 | except Exception as e: 55 | print(e) 56 | sys.exit(1) 57 | 58 | # Convert time format 59 | print("Converting Time Format") 60 | new = df["Time"].str.split("(", n=1, expand=True) 61 | df["DateTime"] = new[0] 62 | df["DateTime"] = pd.to_datetime( 63 | df["DateTime"], errors="raise", utc=True, format="%d/%m/%Y %I:%M:%S %p" 64 | ) 65 | if debug: 66 | print(df.info()) 67 | 68 | # Filter only data after this date 69 | if dateCut: 70 | try: 71 | df = df[(df["DateTime"] > dateAfter)] 72 | # "2020-02-01" 73 | except TypeError: 74 | print( 75 | "A TypeError has been raised; it is likely the input date format is incorrect. This process will be skipped" 76 | ) 77 | dateCut = False 78 | pass 79 | 80 | if localConvert: 81 | df["Local"] = df["DateTime"] + pd.Timedelta( 82 | hours=localHour, minutes=localMinute 83 | ) 84 | 85 | # Find and report gaps in data recording. 86 | if gapFinder is not None: 87 | print( 88 | "Gap finder is looking for gaps of more than %s seconds." 
% (str(gapFinder)) 89 | ) 90 | gapData = True 91 | else: 92 | print("Gap finder is not looking for gaps in time") 93 | gapData = False 94 | 95 | if gapData: 96 | print("\nFinding gaps in time") 97 | df["GapFinder"] = df["DateTime"].diff().dt.total_seconds() > gapFinder 98 | time_diff = df[df["GapFinder"] == True] 99 | print(str(time_diff.shape[0]) + " gaps in recording located.") 100 | gapData = True 101 | if debug: 102 | print(time_diff) 103 | 104 | # Export dataframes to CSV 105 | print("\nExporting CSVs") 106 | df.to_csv("locationData.csv", index=False, date_format="%Y/%m/%d %H:%M:%S") 107 | if gapData: 108 | time_diff.to_csv("gapData.csv", index=False, date_format="%Y/%m/%d %H:%M:%S") 109 | 110 | 111 | # Command line input args 112 | if __name__ == "__main__": 113 | parser = argparse.ArgumentParser( 114 | description=__description__, 115 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)), 116 | ) 117 | 118 | parser.add_argument( 119 | "-f", 120 | "--file", 121 | dest="inputFilename", 122 | help="Path to input Excel Spreadsheet", 123 | # required=True, 124 | ) 125 | 126 | parser.add_argument( 127 | "-g", 128 | "--gap", 129 | dest="gapSeconds", 130 | type=int, 131 | help="To detect gaps in time enter a time gap in seconds. 300 seconds is 5 minutes", 132 | default=None, 133 | required=False, 134 | ) 135 | 136 | parser.add_argument( 137 | "-d", 138 | "--dateafter", 139 | dest="dateAfter", 140 | type=str, 141 | help="Filter only data after a certain date. Required format is YYYY-MM-DD. Useful for shrinking your dataset", 142 | required=False, 143 | ) 144 | 145 | args = parser.parse_args() 146 | 147 | # display help message when no args are passed. 148 | if len(sys.argv) == 1: 149 | 150 | parser.print_help() 151 | sys.exit(1) 152 | 153 | # If no input show the help text. 154 | if not args.inputFilename: 155 | parser.print_help() 156 | parser.exit(1) 157 | 158 | # Check if the input file exists. 
159 | if not os.path.exists(args.inputFilename): 160 | print("ERROR: '{}' does not exist or is not a file".format(args.inputFilename)) 161 | sys.exit(1) 162 | 163 | if args.dateAfter is not None: 164 | dateAfter = args.dateAfter 165 | print("Date After Not none") 166 | else: 167 | dateAfter = None 168 | 169 | if args.gapSeconds is not None: 170 | gapSeconds = args.gapSeconds 171 | if debug: 172 | print("GapSeconds Not none") 173 | else: 174 | gapSeconds = None 175 | 176 | convertFile(args.inputFilename, gapFinder=gapSeconds, dateAfter=dateAfter) 177 | -------------------------------------------------------------------------------- /applenotes2hash/applenotes2hash.py: -------------------------------------------------------------------------------- 1 | # Extracts password protected hashes from Apple Notes 2 | # 3 | # Elements from script from Dhiru Kholia 4 | # https://github.com/openwall/john/blob/bleeding-jumbo/run/applenotes2john.py 5 | # 6 | # Formatted with Black 7 | # 8 | # ------Changes---------- 9 | # 10 | # V0.1 - Initial release 11 | # 12 | 13 | import argparse 14 | import binascii 15 | import glob 16 | import os 17 | import sys 18 | import sqlite3 19 | import shutil 20 | import zipfile 21 | 22 | PY3 = sys.version_info[0] == 3 23 | 24 | if not PY3: 25 | reload(sys) 26 | sys.setdefaultencoding("utf8") 27 | 28 | __description__ = "Extracts and converts Apple Note hashes to Hashcat and JTR format" 29 | __author__ = "facelessg00n" 30 | __version__ = "0.1" 31 | 32 | formatType = [] 33 | notesFile = "NoteStore.sqlite" 34 | targetPath = os.getcwd() + "/temp" 35 | debug = False 36 | 37 | # ------------- Functions live here ----------------------------------- 38 | 39 | 40 | def makeTempFolder(): 41 | try: 42 | # print("Creating temporary folder") 43 | os.makedirs(targetPath) 44 | except OSError as e: 45 | # print(e) 46 | # print("Temporary folder exists") 47 | # print("Purging directory") 48 | shutil.rmtree(targetPath) 49 | try: 50 | # print("Creating temporary 
folder") 51 | os.makedirs(targetPath) 52 | except: 53 | # print("Something has gone horribly wrong") 54 | exit() 55 | 56 | 57 | # Check it is a zip file and extract relevant file 58 | def checkZip(z): 59 | if zipfile.is_zipfile(z): 60 | # print("This is a Zip File") 61 | with zipfile.ZipFile(z) as file: 62 | zippedFiles = file.namelist() 63 | filePath = [x for x in zippedFiles if x.endswith(notesFile)] 64 | if debug: 65 | print("Located file at path : {}".format(filePath)) 66 | print("Extracting to temp file") 67 | file.extract(filePath[0], targetPath) 68 | 69 | else: 70 | print("this does not appear to be a zip file") 71 | 72 | 73 | def processGrayShift(x, formatType): 74 | formatType = formatType 75 | try: 76 | makeTempFolder() 77 | except Exception as e: 78 | print(e) 79 | checkZip(x) 80 | inputFile = glob.glob("./**/NoteStore.sqlite", recursive=True) 81 | if debug: 82 | print("Using" + str(inputFile[0]) + " as the input file for Cache.") 83 | extractHash(inputFile[0], formatType) 84 | 85 | 86 | # Functionality below lifted from 87 | # https://github.com/openwall/john/blob/bleeding-jumbo/run/applenotes2john.py 88 | 89 | 90 | def extractHash(inputFile, formatType): 91 | db = sqlite3.connect(inputFile) 92 | cursor = db.cursor() 93 | rows = cursor.execute( 94 | "SELECT Z_PK, ZCRYPTOITERATIONCOUNT, ZCRYPTOSALT, ZCRYPTOWRAPPEDKEY, ZPASSWORDHINT, ZCRYPTOVERIFIER, ZISPASSWORDPROTECTED FROM ZICCLOUDSYNCINGOBJECT" 95 | ) 96 | for row in rows: 97 | iden, iterations, salt, fhash, hint, shash, is_protected = row 98 | if fhash is None: 99 | phash = shash 100 | else: 101 | phash = fhash 102 | if hint is None: 103 | hint = "None" 104 | # NOTE: is_protected can be zero even if iterations value is non-zero! 105 | # This was tested on macOS 10.13.2 with cloud syncing turned off. 106 | if iterations == 0: # is this a safer check than checking is_protected? 
107 | continue 108 | if phash is None: 109 | continue 110 | phash = binascii.hexlify(phash) 111 | salt = binascii.hexlify(salt) 112 | if PY3: 113 | phash = str(phash, "ascii") 114 | salt = str(salt, "ascii") 115 | fname = os.path.basename(inputFile) 116 | # For John 117 | if formatType == "JOHN": 118 | sys.stdout.write( 119 | "%s:$ASN$*%d*%d*%s*%s:::::%s\n" 120 | % (fname, iden, iterations, salt, phash, hint) 121 | ) 122 | # For Hashcat 123 | elif formatType == "HASHCAT": 124 | sys.stdout.write("$ASN$*%d*%d*%s*%s\n" % (iden, iterations, salt, phash)) 125 | 126 | else: 127 | print("Invalid or no format type set") 128 | db.close() 129 | 130 | 131 | # ----------- Argument Parser --------------------------------------------- 132 | 133 | if __name__ == "__main__": 134 | parser = argparse.ArgumentParser( 135 | description=__description__, 136 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)), 137 | ) 138 | 139 | parser.add_argument( 140 | "-f", "--file", dest="notesFile", help="Path to NoteStore.sqlite" 141 | ) 142 | parser.add_argument( 143 | "-g", "--grayshift", dest="grayshiftINPUT", help="Path to Grayshift Extract" 144 | ) 145 | parser.add_argument( 146 | "-t", 147 | "--type", 148 | dest="formatType", 149 | help="Output format type, JOHN or HASHCAT, defaults to JOHN. Hashcat Mode is 16200", 150 | choices=["HASHCAT", "JOHN"], 151 | default="JOHN", 152 | required=False, 153 | ) 154 | 155 | args = parser.parse_args() 156 | if len(sys.argv) == 1: 157 | parser.print_help() 158 | sys.exit(1) 159 | 160 | if args.notesFile: 161 | if not os.path.exists(args.notesFile): 162 | print("ERROR: {} does not exist or is not a file".format(args.notesFile)) 163 | sys.exit(1) 164 | extractHash(args.notesFile, args.formatType) 165 | 166 | if args.grayshiftINPUT: 167 | if not os.path.exists(args.grayshiftINPUT): 168 | print("ERROR: {} does not exist or is not a file".format(args.grayshiftINPUT)) 169 | sys.exit(1) 170 | processGrayShift(args.grayshiftINPUT, args.formatType) 171 | -------------------------------------------------------------------------------- /clbExtract/clbExtractGUI.py: -------------------------------------------------------------------------------- 1 | ### GUI for Cellebrite File Flattener 2 | # Only tested with Windows 3 | # Known display issues with OSX 4 | 5 | 6 | # Changelog 7 | # v0.2 - Minor changes, added provenance selector 8 | # v0.1 - Initial concept 9 | 10 | 11 | ### GUI for Cellebrite File Flattener 12 | 13 | import clbExtract 14 | 15 | import os 16 | from tkinter import * 17 | from tkinter import ttk 18 | from tkinter import messagebox 19 | from tkinter import filedialog as fd 20 | from tkinter.messagebox import showinfo 21 | 22 | LIGHT_GREY = "#BEBFC7" 23 | LIGHT_BLUE = "#307FE2" 24 | DARK_BLUE = "#024DA1" 25 | FONT_1 = "Roboto Condensed" 26 | 27 | # Auto locate list of files 28 | candidateFiles = os.listdir(os.getcwd()) 29 | file_list = [] 30 | for candidateFile in candidateFiles: 31 | if candidateFile.endswith(".xlsx"): 32 | file_list.append(candidateFile) 33 | fileListingDisplay = "\n".join(file_list) 34 | 35 | # list of handled apps 36 | supportedAppsDisp = "\n".join(clbExtract.parsedApps) 37 | 38 | ## _____Functions live here_____ 39 | 40 | 41 | # TODO - Will need to pass in Provenance data here 42 | def
process_all(): 43 | print("Process all selected") 44 | clbExtract.bulkProcessor(provMenu.get()) 45 | 46 | 47 | def select_file(): 48 | filetypes = [("Excel Files", "*.xlsx")] 49 | 50 | filename = fd.askopenfile( 51 | title="Open a file", 52 | initialdir=os.getcwd(), 53 | filetypes=filetypes, 54 | multiple=False, 55 | ) 56 | if filename: 57 | print(filename.name) 58 | showinfo( 59 | title="Selected File", 60 | message=filename.name, 61 | ) 62 | print(provMenu.get()) 63 | clbExtract.processMetadata(filename.name, provMenu.get()) 64 | 65 | 66 | # Process selected file 67 | def get_selection(): 68 | selected_file = lbox.curselection() 69 | print(lbox.get(selected_file)) 70 | print(provMenu.get()) 71 | clbExtract.processMetadata(lbox.get(selected_file), provMenu.get()) 72 | 73 | 74 | def comboSelection(event): 75 | selectedProvenance = provMenu.get() 76 | # messagebox.showinfo(message=f"The Selected value is {selectedProvenance}",title='Selection') 77 | 78 | 79 | ### _____Create interface_____ 80 | root = Tk() 81 | root.geometry("580x650") 82 | root.minsize(458, 580) 83 | root.maxsize(780, 780) 84 | root.configure(bg=LIGHT_GREY) 85 | 86 | prog_name = Label( 87 | text="Cellebrite Contact Extractor", 88 | anchor=W, 89 | padx=10, 90 | pady=10, 91 | background=DARK_BLUE, 92 | width=480, 93 | font=(FONT_1, 20), 94 | ) 95 | prog_name.pack() 96 | 97 | sideFrame = Frame(master=root, width=100, height=100, bg=LIGHT_BLUE) 98 | sideFrame.pack(fill=Y, side=LEFT) 99 | sideFrame.pack() 100 | 101 | prog_data = Label( 102 | text="For bulk processing of files place this program in the folder\n containing your Cellebrite formatted Excel files. 
", 103 | font=(FONT_1, 10), 104 | anchor=W, 105 | padx=10, 106 | pady=10, 107 | bg=LIGHT_GREY, 108 | ) 109 | prog_data.pack() 110 | 111 | app_data_heading = Label( 112 | sideFrame, text="Handled apps:", bg=LIGHT_BLUE, font=(FONT_1, 10) 113 | ) 114 | app_data_heading.pack() 115 | app_data = Label(sideFrame, text=supportedAppsDisp, bg=LIGHT_BLUE, font=(FONT_1, 10)) 116 | app_data.pack() 117 | 118 | ## Show Auto Located files 119 | auto_locate_data = Label( 120 | text="{} candidate files located at path: \n{}".format( 121 | str(len(file_list)), str(os.getcwd()) 122 | ), 123 | anchor=W, 124 | padx=10, 125 | pady=10, 126 | bg=LIGHT_GREY, 127 | ) 128 | auto_locate_data.pack(pady=10, padx=10) 129 | ### Show options for data provenance. 130 | provLabel = Label( 131 | text="Select provenance, i.e WARRANT", 132 | padx=10, 133 | pady=10, 134 | bg=LIGHT_GREY, 135 | ) 136 | provLabel.pack() 137 | 138 | provVar = StringVar() 139 | provMenu = ttk.Combobox( 140 | values=clbExtract.provenanceCols, textvariable=provVar, state="readonly" 141 | ) 142 | provMenu.bind("<>", comboSelection) 143 | provMenu.pack(side="top") 144 | 145 | filesLabel = Label( 146 | text="Select File", 147 | padx=10, 148 | pady=10, 149 | bg=LIGHT_GREY, 150 | ) 151 | filesLabel.pack() 152 | 153 | # Select file names 154 | fNames = StringVar(value=fileListingDisplay) 155 | lbox = Listbox(root, listvariable=fNames, height=5, width=200) 156 | scroll_bar = Scrollbar(root) 157 | scroll_bar.pack(side=RIGHT, fill=Y) 158 | lbox.pack() 159 | scroll_bar.config(command=lbox.yview) 160 | 161 | 162 | ### Buttons for processing selected files 163 | 164 | btn2 = Button( 165 | root, 166 | text="Process Selected", 167 | command=get_selection, 168 | bg=LIGHT_GREY, 169 | padx=10, 170 | ) 171 | btn2.pack(side="top") 172 | 173 | btn3 = Button(root, text="Process all files", command=process_all, bg=LIGHT_GREY) 174 | btn3.pack(side="top") 175 | 176 | 177 | prog_data = Label( 178 | text="Manually select file a file to extract \n Output 
files will be located at: \n {}".format( 179 | str(os.getcwd()) 180 | ), 181 | anchor=W, 182 | padx=10, 183 | pady=10, 184 | font=(FONT_1, 10), 185 | bg=LIGHT_GREY, 186 | ) 187 | prog_data.pack() 188 | 189 | btn = Button(root, text="Locate file", command=select_file, bg=LIGHT_GREY) 190 | btn.pack(side=TOP, pady=10, padx=10) 191 | 192 | # Exit Program 193 | exitBtn = Button(root, text="Exit", command=root.destroy, bg=LIGHT_GREY) 194 | exitBtn.pack(side=TOP, pady=20, padx=10) 195 | 196 | # Display version info 197 | verLabel = Label( 198 | text="Version {}\nDeveloped by facelessg00n".format(str(clbExtract.__version__)), 199 | padx=10, 200 | pady=10, 201 | bg=LIGHT_GREY, 202 | ) 203 | verLabel.pack() 204 | 205 | 206 | root.mainloop() 207 | -------------------------------------------------------------------------------- /offlineTranslate/translateGUI.py: -------------------------------------------------------------------------------- 1 | ### GUI for Offline Translation 2 | # Only tested with Windows 3 | # Known display issues with OSX 4 | 5 | # Changelog 6 | # v0.2 - Update function names and handle Cellebrite formatted files. 
7 | # - Language selection menu 8 | # v0.1 - Initial concept 9 | 10 | import bulk_translate_v3 11 | 12 | import os 13 | from tkinter import * 14 | from tkinter import ttk 15 | from tkinter import messagebox 16 | from tkinter import filedialog as fd 17 | from tkinter.messagebox import showinfo 18 | 19 | LIGHT_GREY = "#BEBFC7" 20 | LIGHT_BLUE = "#307FE2" 21 | DARK_BLUE = "#024DA1" 22 | DARK_RED = "#FF5342" 23 | FONT_1 = "Roboto Condensed" 24 | 25 | isCellebrite = False 26 | SERVER_CONNECTED = False 27 | 28 | 29 | inputLanguages = [ 30 | "auto", 31 | "en", 32 | "sq", 33 | "ar", 34 | "az", 35 | "bn", 36 | "bg", 37 | "ca", 38 | "zh", 39 | "zt", 40 | "cs", 41 | "da", 42 | "nl", 43 | "eo", 44 | "et", 45 | "fi", 46 | "fr", 47 | "de", 48 | "el", 49 | "he", 50 | "hi", 51 | "hu", 52 | "id", 53 | "ga", 54 | "it", 55 | "ja", 56 | "ko", 57 | "lv", 58 | "lt", 59 | "ms", 60 | "nb", 61 | "fa", 62 | "pl", 63 | "pt", 64 | "ro", 65 | "ru", 66 | "sr", 67 | "sk", 68 | "sl", 69 | "es", 70 | "sv", 71 | "tl", 72 | "th", 73 | "tr", 74 | "uk", 75 | ] 76 | 77 | ## _______________Functions live here___________________________________________________________ 78 | 79 | 80 | # Process a selected file 81 | def get_selection(): 82 | selected_file = lbox.curselection() 83 | print(lbox.get(selected_file)) 84 | print(inputSheetMenu.get()) 85 | bulk_translate_v3.loadAndTranslate( 86 | lbox.get(selected_file), 87 | inputLangMenu.get(), 88 | inputSheetMenu.get(), 89 | isCellebrite.get(), 90 | ) 91 | 92 | 93 | def inputComboSelection(event): 94 | selectedProvenance = inputSheetMenu.get() 95 | 96 | 97 | def langComboSelection(event): 98 | selectedProvenance = inputLangMenu.get() 99 | # messagebox.showinfo(message=f"The Selected value is {selectedProvenance}",title='Selection') 100 | 101 | 102 | ### _____Create interface______________________________________________________________________ 103 | 104 | # Show list of Excel files in the current working directory 105 | candidateFiles = os.listdir(os.getcwd()) 
106 | file_list = [] 107 | for candidateFile in candidateFiles: 108 | if candidateFile.endswith(".xlsx"): 109 | file_list.append(candidateFile) 110 | fileListingDisplay = "\n".join(file_list) 111 | 112 | # Test Connectivity 113 | # bulk_translate_v3.serverCheck()) 114 | if bulk_translate_v3.serverCheck(bulk_translate_v3.serverURL) == "SERVER_OK": 115 | print("Connected to server") 116 | SERVER_CONNECTED = True 117 | serverButtonColour = LIGHT_BLUE 118 | serverStatus = "Online" 119 | else: 120 | print("Server connection failed") 121 | SERVER_CONNECTED = False 122 | serverButtonColour = DARK_RED 123 | serverStatus = "Offline" 124 | 125 | # Create box 126 | root = Tk() 127 | root.geometry("580x650") 128 | root.minsize(458, 580) 129 | root.maxsize(780, 780) 130 | root.configure(bg=LIGHT_GREY) 131 | 132 | prog_name = Label( 133 | text="Offline Translation", 134 | anchor=W, 135 | padx=10, 136 | pady=10, 137 | background=DARK_BLUE, 138 | width=480, 139 | font=(FONT_1, 20), 140 | ) 141 | prog_name.pack() 142 | 143 | sideFrame = Frame(master=root, width=100, height=100, bg=LIGHT_BLUE) 144 | sideFrame.pack(fill=Y, side=LEFT) 145 | sideFrame.pack() 146 | 147 | servAdd = Label( 148 | text="Server Address: {} Server Status: {}".format( 149 | str(bulk_translate_v3.serverURL), serverStatus 150 | ), 151 | padx=10, 152 | pady=00, 153 | bg=serverButtonColour, 154 | ) 155 | servAdd.pack() 156 | # User instructions 157 | prog_data = Label( 158 | text="For processing of files place this program in the folder\n containing your Excel files.", 159 | font=(FONT_1, 10), 160 | anchor=W, 161 | padx=5, 162 | pady=5, 163 | bg=LIGHT_GREY, 164 | ) 165 | prog_data.pack() 166 | 167 | app_data_heading = Label(sideFrame, text=" ", bg=LIGHT_BLUE, font=(FONT_1, 10)) 168 | app_data_heading.pack() 169 | 170 | # app_data.pack() 171 | 172 | ## Show Auto located files 173 | auto_locate_data = Label( 174 | text="{} candidate files located at path: \n{}".format( 175 | str(len(file_list)), str(os.getcwd())
176 | ), 177 | anchor=W, 178 | padx=10, 179 | pady=10, 180 | bg=LIGHT_GREY, 181 | ) 182 | auto_locate_data.pack(pady=10, padx=10) 183 | 184 | # Tick box if file is a Cellebrite file, the header in these files starts at 1 185 | isCellebrite = IntVar() 186 | c1 = Checkbutton(text="Cellebrite file?", variable=isCellebrite, onvalue=1, offvalue=0) 187 | c1.pack() 188 | 189 | # Select an input Datasheet 190 | inputSheetName = Label( 191 | text="Input Sheet name if multiple sheets exist", 192 | padx=10, 193 | pady=10, 194 | bg=LIGHT_GREY, 195 | ) 196 | inputSheetName.pack() 197 | 198 | # Input sheet selection menu 199 | inputSheetVar = StringVar() 200 | inputSheetMenu = ttk.Combobox( 201 | values=bulk_translate_v3.inputSheets, textvariable=inputSheetVar, state="readonly" 202 | ) 203 | 204 | inputSheetMenu.bind("<<ComboboxSelected>>", inputComboSelection) 205 | 206 | # inputSheetMenu.set("Chats") 207 | inputSheetMenu.pack(side="top") 208 | 209 | # File selection label 210 | filesLabel = Label( 211 | text="Select File", 212 | padx=10, 213 | pady=10, 214 | bg=LIGHT_GREY, 215 | ) 216 | 217 | # ____________Language selection menu_______________________________________ 218 | inputLangName = Label( 219 | text="Input Language", 220 | padx=10, 221 | pady=10, 222 | bg=LIGHT_GREY, 223 | ) 224 | inputLangName.pack() 225 | langVar = StringVar() 226 | inputLangMenu = ttk.Combobox( 227 | values=inputLanguages, textvariable=langVar, state="readonly" 228 | ) 229 | 230 | inputLangMenu.bind("<<ComboboxSelected>>", langComboSelection) 231 | inputLangMenu.set("auto") 232 | inputLangMenu.pack(side="top") 233 | 234 | # ____________File selection menu_______________________________________ 235 | filesLabel = Label( 236 | text="Select File", 237 | padx=10, 238 | pady=10, 239 | bg=LIGHT_GREY, 240 | ) 241 | filesLabel.pack() 242 | 243 | # Select file names 244 | fNames = StringVar(value=fileListingDisplay) 245 | lbox = Listbox(root, listvariable=fNames, height=5, width=200) 246 | scroll_bar = Scrollbar(root) 247 | 
scroll_bar.pack(side=RIGHT, fill=Y) 248 | lbox.pack() 249 | scroll_bar.config(command=lbox.yview) 250 | 251 | ### Buttons for processing selected files 252 | processSelectedBtn = Button( 253 | root, 254 | text="Process Selected", 255 | command=get_selection, 256 | bg=LIGHT_GREY, 257 | padx=10, 258 | ) 259 | processSelectedBtn.pack(side="top") 260 | 261 | # Exit Program 262 | exitBtn = Button(root, text="Exit", command=root.destroy, bg=LIGHT_GREY) 263 | exitBtn.pack(side=TOP, pady=20, padx=10) 264 | 265 | # Display version info 266 | verLabel = Label( 267 | text="Version {}".format(str(bulk_translate_v3.__version__)), 268 | padx=10, 269 | pady=10, 270 | bg=LIGHT_GREY, 271 | ) 272 | verLabel.pack() 273 | 274 | root.mainloop() 275 | -------------------------------------------------------------------------------- /offlineTranslate/old/bulk_translate.py: -------------------------------------------------------------------------------- 1 | # Bulk Translation of Axiom formatted Excels containing messages 2 | # Made in South Australia 3 | # Unapologetically formatted with Black 4 | # 5 | # 6 | # Changelog 7 | # 8 | # v0.1 Initial Concept 9 | 10 | import argparse 11 | import json 12 | import pandas as pd 13 | import requests 14 | import os 15 | import sys 16 | 17 | # ----------------- Settings live here ------------------------ 18 | 19 | __description__ = "Utilises a Libretranslate server to translate messages from Axiom formatted Excel spreadsheets. 
Messages are loaded from a column titled 'Messages.'" 20 | __author__ = "facelessg00n" 21 | __version__ = "0.1" 22 | 23 | banner = """ 24 | ██████  ███████ ███████ ██  ██ ███  ██ ███████  ████████ ██████  █████  ███  ██ ███████ ██  █████  ████████ ███████  25 | ██    ██ ██      ██      ██  ██ ████  ██ ██          ██    ██   ██ ██   ██ ████  ██ ██      ██  ██   ██    ██    ██       26 | ██  ██ █████  █████  ██  ██ ██ ██  ██ █████  ██  ██████  ███████ ██ ██  ██ ███████ ██  ███████  ██  █████  27 | ██  ██ ██     ██     ██  ██ ██  ██ ██ ██     ██  ██   ██ ██   ██ ██  ██ ██      ██ ██  ██   ██  ██  ██     28 |  ██████  ██  ██  ███████ ██ ██   ████ ███████  ██  ██  ██ ██  ██ ██   ████ ███████ ███████ ██  ██  ██  ███████  29 |                                                                                                              30 | """ 31 | 32 | # Debug mode, will print errors etc 33 | debug = False 34 | 35 | serverURL = "http://localhost:5000" 36 | # Endpoints 37 | # /translate - translation 38 | # /languages - supported languages 39 | 40 | # Name of the column where the messages to be translated are found. 41 | # This can be modified to suit other Excel column names if desired 42 | inputColumn = "Message" 43 | 44 | 45 | # Check is server is reachable and able to process a request. 
46 | def serverCheck(): 47 | print(f"Testing we can reach server {serverURL}") 48 | headers = {"Content-Type": "application/json"} 49 | payload = json.dumps( 50 | { 51 | "q": "Buenos días señor", 52 | "source": "auto", 53 | "target": "en", 54 | "format": "text", 55 | "api_key": None, 56 | } 57 | ) 58 | try: 59 | response = requests.post( 60 | f"{serverURL}/translate", data=payload, headers=headers 61 | ) 62 | if response.status_code == 404: 63 | print("ERROR: 404, server not found, check server address.") 64 | sys.exit(1) 65 | elif response.status_code == 400: 66 | print("ERROR: Invalid request sent - exiting") 67 | sys.exit(1) 68 | elif response.status_code == 200: 69 | print("Server located, testing translation") 70 | print(response.json()) 71 | 72 | # FIXME - Handle connection errors, can probably be done better. 73 | except ConnectionRefusedError: 74 | print( 75 | f"Server connection refused - {serverURL}, is the address correct? \n\nExiting" 76 | ) 77 | sys.exit() 78 | except Exception as e: 79 | print(f"Unable to connect, ERROR: {e}") 80 | sys.exit() 81 | 82 | 83 | # Loads Excel into dataframe and translates messages 84 | def loadAndTranslate(inputFile, inputLanguage): 85 | # Check we can hit the server before we start 86 | serverCheck() 87 | head, tail = os.path.split(inputFile) 88 | fileName = tail.split(".")[0] 89 | # Load Excel into Dataframe "df" and check for messages column. 
90 | df = pd.read_excel(inputFile) 91 | 92 | if inputColumn not in df.columns: 93 | print("Required message column not found") 94 | sys.exit(1) 95 | 96 | # Load Messages Column to list and print some stats 97 | messages_nan_count = df["Message"].isna().sum() 98 | messages = df["Message"].tolist() 99 | print(f"{len(messages)} messages") 100 | print(f"{messages_nan_count} blank rows") 101 | 102 | results = [] 103 | loopCount = 0 104 | for message in messages: 105 | # If no language code is specified use Auto Translate 106 | if inputLanguage is None: 107 | translated_text = translate_text(message, None) 108 | # Else manual translation 109 | else: 110 | translated_text = translate_text(message, inputLanguage) 111 | 112 | if debug: 113 | print(translated_text) 114 | results.append(translated_text) 115 | print(f"Processing message {loopCount} of {len(messages)}") 116 | loopCount = loopCount + 1 117 | 118 | # ------------- Write backup file every 100 messages ---------------------------------------- 119 | if len(results) % 100 == 0: 120 | print("Writing backup") 121 | backup_frame = pd.DataFrame(results) 122 | 123 | try: 124 | backup_frame.to_excel( 125 | f"{fileName}_backup.xlsx", 126 | index=False, 127 | columns=[ 128 | "detected_language", 129 | "detected_confidence", 130 | "success", 131 | "input", 132 | "translatedText", 133 | ], 134 | ) 135 | except: 136 | print("Writing Excel Backup failed") 137 | pass 138 | 139 | try: 140 | backup_frame.to_csv( 141 | f"{fileName}_backup.csv", 142 | encoding="utf-16", 143 | columns=[ 144 | "detected_language", 145 | "detected_confidence", 146 | "success", 147 | "input", 148 | "translatedText", 149 | ], 150 | ) 151 | except: 152 | print("Writing CSV backup failed") 153 | pass 154 | 155 | # ------------------ Write output file ----------------------------------------------------------------- 156 | print("Translation complete - Writing file") 157 | outputFrame = pd.DataFrame(results) 158 | 159 | try: 160 | outputFrame.to_excel( 161 |
f"{fileName}_translated.xlsx", 162 | index=False, 163 | columns=[ 164 | "detected_language", 165 | "detected_confidence", 166 | "success", 167 | "input", 168 | "translatedText", 169 | ], 170 | ) 171 | except: 172 | print("Writing Excel failed") 173 | pass 174 | 175 | try: 176 | outputFrame.to_csv( 177 | f"{fileName}_translated.csv", 178 | encoding="utf-16", 179 | columns=[ 180 | "detected_language", 181 | "detected_confidence", 182 | "success", 183 | "input", 184 | "translatedText", 185 | ], 186 | ) 187 | except: 188 | print("Writing CSV failed") 189 | pass 190 | 191 | print("Process complete - Exiting.") 192 | 193 | 194 | # ------------------ Translates text with selected language ----------------------------------------------- 195 | def translate_text(inputText, inputLang, api_key=None): 196 | # For future implementation 197 | if api_key is not None: 198 | API_KEY = api_key 199 | else: 200 | API_KEY = None 201 | 202 | if inputLang is not None: 203 | if debug: 204 | print("Manual Language Detection {}".format(inputLang)) 205 | payload = json.dumps( 206 | { 207 | "q": inputText, 208 | "source": inputLang, 209 | "target": "en", 210 | "format": "text", 211 | "api_key": API_KEY, 212 | } 213 | ) 214 | else: 215 | if debug: 216 | print("Auto language detection enabled") 217 | payload = json.dumps( 218 | { 219 | "q": inputText, 220 | "source": "auto", 221 | "target": "en", 222 | "format": "text", 223 | "api_key": API_KEY, 224 | } 225 | ) 226 | 227 | # Detect blank rows and skip to prevent error being thrown by server / speeds up process 228 | if inputText is None or pd.isna(inputText): 229 | print("Blank row found, skipping") 230 | output = { 231 | "detected_language": None, 232 | "detected_confidence": None, 233 | "translatedText": None, 234 | "success": False, 235 | } 236 | output["input"] = inputText 237 | return output 238 | 239 | else: 240 | headers = {"Content-Type": "application/json"} 241 | response = requests.post( 242 | f"{serverURL}/translate",
data=payload, headers=headers 243 | ) 244 | if response.status_code == 200: 245 | results = response.json() 246 | if debug: 247 | print(f"{inputText} and {response.json()}") 248 | try: 249 | answer = results 250 | # Server response style is different for Auto or Manual language selection 251 | if inputLang is not None: 252 | output = { 253 | "detected_language": f"Manual - {inputLang}", 254 | "detected_confidence": None, 255 | "translatedText": answer.get("translatedText"), 256 | "success": True, 257 | } 258 | else: 259 | output = { 260 | "detected_language": results.get("detectedLanguage")[ 261 | "language" 262 | ], 263 | "detected_confidence": results.get("detectedLanguage")[ 264 | "confidence" 265 | ], 266 | "translatedText": answer.get("translatedText"), 267 | "success": True, 268 | } 269 | 270 | output["input"] = inputText 271 | return output 272 | except Exception as e: 273 | print(e) 274 | 275 | elif response.status_code == 400: 276 | print("Invalid request") 277 | output = { 278 | "detected_language": None, 279 | "detected_confidence": None, 280 | "translatedText": None, 281 | "success": f"Error: {response.status_code}", 282 | } 283 | output["input"] = inputText 284 | return output 285 | 286 | 287 | # Retrieve list of allowed languages from the server 288 | def getLanguages(printVals): 289 | AllowedLangs = [] 290 | supportedLanguages = requests.get(f"{serverURL}/languages").json() 291 | for langItem in supportedLanguages: 292 | if printVals: 293 | print( 294 | f"Language Code: {langItem['code']} Language Name: {langItem['name']}" 295 | ) 296 | AllowedLangs.append(langItem["code"]) 297 | return AllowedLangs 298 | 299 | 300 | # ---------------------------- Argument Parser ------------------------ 301 | 302 | if __name__ == "__main__": 303 | print(banner) 304 | serverCheck() 305 | print(f"Checking server {serverURL} for supported languages") 306 | try: 307 | supportedLanguages = getLanguages(False) 308 | if len(supportedLanguages) == 0: 309 |
print("Supported Languages not found") 310 | supportedLanguages = ["0"] 311 | else: 312 | print(f"Languages found - {supportedLanguages} \n\n") 313 | 314 | except Exception as e: 315 | print(e) 316 | 317 | parser = argparse.ArgumentParser( 318 | description=__description__, 319 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)), 320 | ) 321 | 322 | parser.add_argument( 323 | "-f", "--file", dest="inputFilePath", help="Path to Axiom formatted excel file" 324 | ) 325 | parser.add_argument( 326 | "-s", 327 | "--server", 328 | dest="translationServer", 329 | help="Address of translation server if not localhost or hardcoded", 330 | required=False, 331 | ) 332 | parser.add_argument( 333 | "-l", 334 | "--language", 335 | dest="inputLanguage", 336 | help="Language code for input text - optional but can greatly improve accuracy", 337 | required=False, 338 | choices=supportedLanguages, 339 | ) 340 | parser.add_argument( 341 | "-g", 342 | "--getlangs", 343 | dest="getLangs", 344 | action="store_true", 345 | help="Get supported language codes and names from server", 346 | required=False, 347 | default=True, 348 | ) 349 | args = parser.parse_args() 350 | if len(sys.argv) == 1: 351 | parser.print_help() 352 | sys.exit(1) 353 | 354 | if args.inputFilePath and not args.inputLanguage: 355 | if not os.path.exists(args.inputFilePath): 356 | print( 357 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath) 358 | ) 359 | sys.exit(1) 360 | loadAndTranslate(args.inputFilePath, None) 361 | 362 | if args.inputFilePath and args.inputLanguage: 363 | if not os.path.exists(args.inputFilePath): 364 | print( 365 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath) 366 | ) 367 | sys.exit(1) 368 | print(f"Input language set to {args.inputLanguage}") 369 | loadAndTranslate(args.inputFilePath, args.inputLanguage) 370 | 371 | if args.getLangs: 372 | getLanguages(True) 373 | 
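For reference, the request and response shapes that both versions of the script exchange with the LibreTranslate `/translate` endpoint can be sketched without a live server. The helper names `build_payload` and `flatten_reply` below are illustrative only (they do not exist in the repository); the field names mirror the payloads and flat result rows built by `translate_text` above.

```python
import json


def build_payload(text, source_lang=None, api_key=None):
    # "auto" asks the server to detect the source language;
    # a language code forces manual selection, as with the -l flag.
    return json.dumps(
        {
            "q": text,
            "source": source_lang if source_lang else "auto",
            "target": "en",
            "format": "text",
            "api_key": api_key,
        }
    )


def flatten_reply(reply, source_lang=None):
    # Auto-detect replies carry a "detectedLanguage" dict; manual
    # translations do not, so the script records the chosen code instead.
    detected = reply.get("detectedLanguage") or {}
    return {
        "detected_language": f"Manual - {source_lang}" if source_lang else detected.get("language"),
        "detected_confidence": None if source_lang else detected.get("confidence"),
        "translatedText": reply.get("translatedText"),
        "success": True,
    }


# Example auto-detect reply shape from a LibreTranslate server
sample = {
    "translatedText": "Good morning sir",
    "detectedLanguage": {"language": "es", "confidence": 92.0},
}
print(flatten_reply(sample)["detected_language"])  # prints "es"
```

This is only a sketch of the wire format under the assumption of a default LibreTranslate configuration; the real scripts POST `build_payload`-style bodies with `requests.post(f"{serverURL}/translate", ...)` and append the flattened rows to a DataFrame.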
-------------------------------------------------------------------------------- /offlineTranslate/bulk_translate_v3.py: -------------------------------------------------------------------------------- 1 | # Bulk Translation of Axiom formatted Excels containing messages 2 | # Made in South Australia 3 | # Unapologetically formatted with Black 4 | # 5 | # Changelog 6 | # v0.3 Handle network errors... oops 7 | # v0.2 Change to output full content of the input sheet 8 | # Handle Cellebrite and Axiom files 9 | # v0.1 Initial Concept 10 | 11 | import argparse 12 | import json 13 | import pandas as pd 14 | import requests 15 | import os 16 | import sys 17 | from tqdm import tqdm 18 | from time import sleep 19 | 20 | # ----------------- Settings live here ------------------------ 21 | 22 | __description__ = "Utilises a Libretranslate server to translate messages from Excel spreadsheets. By default messages are loaded from a column titled 'Message'." 23 | __author__ = "facelessg00n" 24 | __version__ = "0.3" 25 | 26 | banner = """ 27 | ██████  ███████ ███████ ██  ██ ███  ██ ███████  ████████ ██████  █████  ███  ██ ███████ ██  █████  ████████ ███████  28 | ██    ██ ██      ██      ██  ██ ████  ██ ██          ██    ██   ██ ██   ██ ████  ██ ██      ██  ██   ██    ██    ██       29 | ██  ██ █████  █████  ██  ██ ██ ██  ██ █████  ██  ██████  ███████ ██ ██  ██ ███████ ██  ███████  ██  █████  30 | ██  ██ ██     ██     ██  ██ ██  ██ ██ ██     ██  ██   ██ ██   ██ ██  ██ ██      ██ ██  ██   ██  ██  ██     31 |  ██████  ██  ██  ███████ ██ ██   ████ ███████  ██  ██  ██ ██  ██ ██   ████ ███████ ███████ ██  ██  ██  ███████  32 |                                                                                                              33 | """ 34 | 35 | # Debug mode, will print errors etc 36 | debug = False 37 | 38 | # if being compiled with a GUI 39 | # Keeps window alive if connection fails 40 | hasGUI = True 41 | 42 | serverURL = "http://localhost:5000" 43 | CONNECTION_TIMEOUT = 3 44 
| RESPONSE_TIMEOUT = 60 45 | # 46 | # Endpoints 47 | # /translate - translation 48 | # /languages - supported languages 49 | 50 | # Name of the column where the messages to be translated are found. 51 | # This can be modified to suit other Excel column names if desired 52 | inputColumn = "Message" 53 | inputSheets = ["Chats", "Instant Messages"] 54 | sheetName = "Chats" 55 | headerRow = 1 56 | 57 | translationColumns = [ 58 | "detectedLanguage", 59 | "detectedConfidence", 60 | "success", 61 | "input", 62 | "translatedText", 63 | ] 64 | 65 | 66 | # Check if the server is reachable and able to process a request. 67 | def serverCheck(serverURL): 68 | print(f"Testing we can reach server {serverURL}") 69 | headers = {"Content-Type": "application/json"} 70 | payload = json.dumps( 71 | { 72 | "q": "Buenos días señor", 73 | "source": "auto", 74 | "target": "en", 75 | "format": "text", 76 | "api_key": None, 77 | } 78 | ) 79 | try: 80 | response = requests.post( 81 | f"{serverURL}/translate", data=payload, headers=headers 82 | ) 83 | if response.status_code == 404: 84 | print("ERROR: 404, server not found, check server address.") 85 | sys.exit(1) 86 | elif response.status_code == 400: 87 | print("ERROR: Invalid request sent - exiting") 88 | sys.exit(1) 89 | elif response.status_code == 200: 90 | print("Server located, testing translation") 91 | print(response.json()) 92 | return "SERVER_OK" 93 | 94 | # FIXME - Handle connection errors, can probably be done better. 95 | except ConnectionRefusedError: 96 | print( 97 | f"Server connection refused - {serverURL}, is the address correct?
\n\nExiting" 98 | ) 99 | if not hasGUI: 100 | sys.exit() 101 | except Exception as e: 102 | print(f"Unable to connect, ERROR: {e}") 103 | if not hasGUI: 104 | sys.exit() 105 | 106 | 107 | # Loads Excel into dataframe and translates messages 108 | def loadAndTranslate(inputFile, inputLanguage, inputSheet, isCellebrite): 109 | # Check we can hit the server before we start 110 | serverCheck(serverURL) 111 | head, tail = os.path.split(inputFile) 112 | fileName = tail.split(".")[0] 113 | 114 | if isCellebrite: 115 | inputHeader = 1 116 | inputColumn = "Body" 117 | else: 118 | inputHeader = 0 119 | inputColumn = "Message" 120 | 121 | # Load Excel into Dataframe "df" and check for messages column. 122 | if inputSheet: 123 | print("There is an input sheet") 124 | df = pd.read_excel(inputFile, sheet_name=inputSheet, header=inputHeader) 125 | else: 126 | print("There is no input sheet specified") 127 | df = pd.read_excel(inputFile, header=inputHeader) 128 | 129 | if debug: 130 | df = df.head(25) 131 | 132 | if inputColumn not in df.columns: 133 | print("Required message column not found, is this a Cellebrite formatted Excel?") 134 | sys.exit(1) 135 | 136 | # Load Messages Column to list and print some stats 137 | messages_nan_count = df[inputColumn].isna().sum() 138 | messages = df[inputColumn].tolist() 139 | print(f"{len(messages)} messages") 140 | print(f"{messages_nan_count} blank rows") 141 | 142 | results = [] 143 | loopCount = 1 144 | for message in tqdm(messages, desc="Translating messages", ascii="░▒█"): 145 | # If no language code is specified use Auto Translate 146 | if inputLanguage is None: 147 | translated_text = translate_text(message, None) 148 | # Else manual translation 149 | else: 150 | translated_text = translate_text(message, inputLanguage) 151 | 152 | if debug: 153 | print(translated_text) 154 | results.append(translated_text) 155 | tqdm.write(f"Processing message {loopCount} of {len(messages)}") 156 | # print(f"Processing message {loopCount} of
{len(messages)}") 157 | loopCount = loopCount + 1 158 | 159 | # ------------- Write backup file every 100 messages ---------------------------------------- 160 | if len(results) % 100 == 0: 161 | tqdm.write("Writing backup") 162 | backup_frame = pd.DataFrame(results) 163 | 164 | try: 165 | backup_frame.to_csv( 166 | f"{fileName}_backup.csv", 167 | encoding="utf-16", 168 | columns=translationColumns, 169 | ) 170 | except: 171 | print("Writing CSV backup failed") 172 | pass 173 | 174 | # ------------------ Write output file ----------------------------------------------------------------- 175 | print("Translation complete - Writing file") 176 | # Get column position to insert new data 177 | bodyPosition = df.columns.get_loc(inputColumn) + 1 178 | # Split the original frame in two, then concat with the new data 179 | df1_part1 = df.iloc[:, :bodyPosition] 180 | df1_part2 = df.iloc[:, bodyPosition:] 181 | outputFrame = pd.concat([df1_part1, pd.DataFrame(results), df1_part2], axis=1) 182 | 183 | try: 184 | outputFrame.to_excel(f"{fileName}_translated.xlsx", index=False) 185 | except: 186 | print("Writing Excel failed") 187 | pass 188 | 189 | try: 190 | outputFrame.to_csv(f"{fileName}_translated.csv", encoding="utf-16") 191 | except: 192 | print("Writing CSV failed") 193 | pass 194 | 195 | print("Process complete - Exiting.") 196 | 197 | 198 | # ------------------ Translates text with selected language ----------------------------------------------- 199 | def translate_text(inputText, inputLang, api_key=None): 200 | # For future implementation 201 | if api_key is not None: 202 | API_KEY = api_key 203 | else: 204 | API_KEY = None 205 | 206 | if inputLang is not None: 207 | if debug: 208 | print("Manual Language Selection {}".format(inputLang)) 209 | payload = json.dumps( 210 | { 211 | "q": inputText, 212 | "source": inputLang, 213 | "target": "en", 214 | "format": "text", 215 | "api_key": API_KEY, 216 | } 217 | ) 218 | else: 219 | if debug: 220 | print("Auto language detection
enabled") 221 | payload = json.dumps( 222 | { 223 | "q": inputText, 224 | "source": "auto", 225 | "target": "en", 226 | "format": "text", 227 | "api_key": API_KEY, 228 | } 229 | ) 230 | 231 | # Detect blank rows and skip to prevent error being thrown by server / speeds up process 232 | if inputText is None or pd.isna(inputText): 233 | tqdm.write("Blank row found, skipping") 234 | output = { 235 | "detectedLanguage": None, 236 | "detectedConfidence": None, 237 | "translatedText": None, 238 | "success": False, 239 | } 240 | output["input"] = inputText 241 | return output 242 | 243 | # If row is not blank, attempt to translate it 244 | else: 245 | headers = {"Content-Type": "application/json"} 246 | try: 247 | # Max Attempt for retries 248 | MAX_ATTEMPTS = 5 249 | 250 | response = requests.post( 251 | f"{serverURL}/translate", 252 | data=payload, 253 | headers=headers, 254 | timeout=(CONNECTION_TIMEOUT, RESPONSE_TIMEOUT), 255 | ) 256 | 257 | # Handle a read timeout error, sleep 2 seconds then try again 258 | except requests.ReadTimeout: 259 | 260 | while MAX_ATTEMPTS > 0: 261 | try: 262 | tqdm.write("Read Timeout error, retrying") 263 | sleep(2) 264 | response = requests.post( 265 | f"{serverURL}/translate", 266 | data=payload, 267 | headers=headers, 268 | ) 269 | # Success - break out so the response is handled below 270 | break 271 | 272 | except Exception: 273 | MAX_ATTEMPTS -= 1 274 | continue 275 | else: 276 | output = { 277 | "detectedLanguage": None, 278 | "detectedConfidence": None, 279 | "translatedText": None, 280 | "success": "False: Error: Read Timeout ", 281 | } 282 | output["input"] = inputText 283 | return output 284 | 285 | # Handle a connection dropout, sleep 2 seconds and try again 286 | except requests.ConnectionError: 287 | while MAX_ATTEMPTS > 0: 288 | try: 289 | tqdm.write("Connection Error - Retrying") 290 | sleep(2) 291 | response = requests.post( 292 | f"{serverURL}/translate", data=payload, headers=headers 293 | ) 294 | # Success - break out so the response is handled below 295 | break 296 | 297 | except Exception: 298 | MAX_ATTEMPTS -= 1 299 | continue 300 | else: 301 | print("Failed") 302 | output = { 303 | "detectedLanguage": None, 304 | "detectedConfidence": None, 305 | "translatedText": None, 306 | "success": "False: Error: Connection Error", 307 | } 308 | output["input"] = inputText 309 | return output 310 | 311 | except Exception as e: 312 | tqdm.write(f"Unhandled exception {e}") 313 | output = { 314 | "detectedLanguage": None, 315 | "detectedConfidence": None, 316 | "translatedText": None, 317 | "success": f"False: Error: {e}", 318 | } 319 | output["input"] = inputText 320 | return output 321 | 322 | if response.status_code == 200: 323 | results = response.json() 324 | if debug: 325 | print(f"{inputText} and {response.json()}") 326 | try: 327 | answer = results 328 | # Server response style is different for Auto or Manual language selection 329 | if inputLang is not None: 330 | output = { 331 | "detectedLanguage": f"Manual - {inputLang}", 332 | "detectedConfidence": None, 333 | "translatedText": answer.get("translatedText"), 334 | "success": True, 335 | } 336 | else: 337 | output = { 338 | "detectedLanguage": results.get("detectedLanguage")["language"], 339 | "detectedConfidence": results.get("detectedLanguage")[ 340 | "confidence" 341 | ], 342 | "translatedText": answer.get("translatedText"), 343 | "success": True, 344 | } 345 | 346 | output["input"] = inputText 347 | return output 348 | except Exception as e: 349 | print(e) 350 | 351 | elif response.status_code == 400: 352 | print("Invalid request") 353 | output = { 354 | "detectedLanguage": None, 355 | "detectedConfidence": None, 356 | "translatedText": None, 357 | "success": f"Error: {response.status_code}", 358 | } 359 | output["input"] = inputText 360 | return output 361 | 362 | 363 | # Retrieve list of allowed languages from the server 364 | def getLanguages(printVals): 365 | AllowedLangs = [] 366 | try: 367 | supportedLanguages =
requests.get(f"{serverURL}/languages").json() 368 | except: 369 | print("Supported Languages not found") 370 | supportedLanguages = [] 371 | pass 372 | 373 | for langItem in supportedLanguages: 374 | if printVals: 375 | print( 376 | f"Language Code: {langItem['code']} Language Name: {langItem['name']}" 377 | ) 378 | AllowedLangs.append(langItem["code"]) 379 | return AllowedLangs 380 | 381 | 382 | # ---------------------------- Argument Parser ------------------------ 383 | 384 | if __name__ == "__main__": 385 | print(banner) 386 | if debug: 387 | print("WARNING DEBUG MODE IS ACTIVE") 388 | serverCheck(serverURL) 389 | print(f"Checking server {serverURL} for supported languages") 390 | try: 391 | supportedLanguages = getLanguages(False) 392 | if len(supportedLanguages) == 0: 393 | print("Supported Languages not found") 394 | supportedLanguages = [] 395 | else: 396 | print(f"Languages found - {supportedLanguages} \n\n") 397 | 398 | except Exception as e: 399 | print(e) 400 | 401 | parser = argparse.ArgumentParser( 402 | description=__description__, 403 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)), 404 | ) 405 | 406 | parser.add_argument("-f", "--file", dest="inputFilePath", help="Path to Excel File") 407 | parser.add_argument( 408 | "-s", 409 | "--server", 410 | dest="translationServer", 411 | help="Address of translation server if not localhost or hardcoded", 412 | required=False, 413 | ) 414 | 415 | parser.add_argument( 416 | "-l", 417 | "--language", 418 | dest="inputLanguage", 419 | help="Language code for input text - optional but can greatly improve accuracy", 420 | required=False, 421 | choices=supportedLanguages, 422 | ) 423 | 424 | parser.add_argument( 425 | "-e", 426 | "--excelSheet", 427 | dest="inputSheet", 428 | help="Sheet name within Excel file to be translated", 429 | required=False, 430 | choices=inputSheets, 431 | ) 432 | 433 | parser.add_argument( 434 | "-c", 435 | "--isCellebrite", 436 | dest="isCellebrite", 437 
| help="If file originated from Cellebrite, header starts at 1, and message column is called 'Body'", 438 | required=False, 439 | action="store_true", 440 | default=False, 441 | ) 442 | 443 | parser.add_argument( 444 | "-g", 445 | "--getlangs", 446 | dest="getLangs", 447 | action="store_true", 448 | help="Get supported language codes and names from server", 449 | required=False, 450 | default=False, 451 | ) 452 | 453 | args = parser.parse_args() 454 | if len(sys.argv) == 1: 455 | parser.print_help() 456 | sys.exit(1) 457 | 458 | if args.inputFilePath and not args.inputLanguage: 459 | if not os.path.exists(args.inputFilePath): 460 | print( 461 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath) 462 | ) 463 | sys.exit(1) 464 | loadAndTranslate(args.inputFilePath, None, args.inputSheet, args.isCellebrite) 465 | 466 | if args.inputFilePath and args.inputLanguage: 467 | if not os.path.exists(args.inputFilePath): 468 | print( 469 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath) 470 | ) 471 | sys.exit(1) 472 | print(f"Input language set to {args.inputLanguage}") 473 | loadAndTranslate( 474 | args.inputFilePath, args.inputLanguage, args.inputSheet, args.isCellebrite 475 | ) 476 | 477 | if args.getLangs: 478 | getLanguages(True) 479 | -------------------------------------------------------------------------------- /clbExtract/old/clbExtract.py: -------------------------------------------------------------------------------- 1 | """ 2 | Extracts nested contacts data from Cellebrite formatted Excel documents. 3 | - Cellebrite Stores contact details in multiline Excel cells. 
4 | Formatted with Black 5 | 6 | Changelog 7 | 0.3 Complete rewrite 8 | 9 | 0.2 - Implement command line argument parser 10 | Allow bulk processing of all items in directory 11 | 12 | 0.1 - Initial concept 13 | 14 | """ 15 | import argparse 16 | import glob 17 | import logging 18 | import os 19 | import openpyxl 20 | import pandas as pd 21 | from pathlib import Path 22 | import sys 23 | 24 | 25 | 26 | ## Details 27 | __description__ = 'Flattens Cellebrite formatted Excel files. "Contacts" and "Device Info" tabs are required.' 28 | __author__ = "facelessg00n" 29 | __version__ = "0.3" 30 | 31 | parser = argparse.ArgumentParser( 32 | description=__description__, 33 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)), 34 | ) 35 | 36 | # ----------- Options ----------- 37 | debug = False 38 | 39 | os.chdir(os.getcwd()) 40 | 41 | logging.basicConfig( 42 | filename="log.txt", 43 | format="%(asctime)s,- %(levelname)s - %(message)s", 44 | level=logging.INFO, 45 | ) 46 | 47 | 48 | # Set names for sheets of interest 49 | clbPhoneInfo = "Device Info" 50 | clbContactSheet = "Contacts" 51 | 52 | # FIXME 53 | #### ---- Column names and other options --------------------------------------------- 54 | contactOutput = "ContactDetail" 55 | contactTypeOutput = "ContactType" 56 | originIMEI = "originIMEI" 57 | parsedApps = [ 58 | "Instagram", 59 | "Native", 60 | "Telegram", 61 | "Snapchat", 62 | "WhatsApp", 63 | "Facebook Messenger", 64 | "Signal", 65 | ] 66 | 67 | # Class object to hold phone and input file info 68 | class phoneData: 69 | IMEI = None 70 | IMEI2 = None 71 | inFile = None 72 | inPath = None 73 | 74 | def __init__(self, IMEI=None, IMEI2=None, inFile=None, inPath=None) -> None: 75 | self.IMEI = IMEI 76 | self.IMEI2 = IMEI2 77 | self.inFile = inFile 78 | self.inPath = inPath 79 | 80 | 81 | # -------------Functions live here ------------------------------------------ 82 | 83 | # ----- Bulk Excel Processor-------------------------------------------------- 84
| 85 | # Finds and processes all Excel files in the working directory. 86 | def bulkProcessor(): 87 | FILE_PATH = os.getcwd() 88 | inputFiles = glob.glob("*.xlsx") 89 | print((str(len(inputFiles)) + " Excel files located. \n")) 90 | # If there are no files found exit the process. 91 | if len(inputFiles) == 0: 92 | print("No Excel files located.") 93 | print("Exiting.") 94 | quit() 95 | else: 96 | for x in inputFiles: 97 | if os.path.exists(x): 98 | try: 99 | processMetadata(x) 100 | # Need to deal with $ files. 101 | except FileNotFoundError: 102 | print("File does not exist or temp file detected") 103 | pass 104 | if debug: 105 | for x in inputFiles: 106 | inputFilename = x.split(".")[0] 107 | print(inputFilename) 108 | 109 | 110 | # FIXME - Deal with error when this info is missing 111 | ### -------- Process phone metadata ------------------------------------------------------ 112 | def processMetadata(inputFile): 113 | 114 | try: 115 | infoPD = pd.read_excel( 116 | inputFile, sheet_name=clbPhoneInfo, header=1, usecols="B,C,D" 117 | ) 118 | 119 | phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][0] 120 | try: 121 | phoneData.IMEI2 = infoPD.loc[infoPD["Name"] == "IMEI2", ["Value"]].values[ 122 | 0 123 | ][0] 124 | except: 125 | phoneData.IMEI2 = None 126 | # phoneData.inFile = inputFile.split(".")[0] 127 | phoneData.inFile = Path(inputFile).stem 128 | phoneData.inPath = os.path.dirname(inputFile) 129 | 130 | if debug: 131 | print(infoPD) 132 | print(phoneData.IMEI) 133 | except ValueError: 134 | print( 135 | "\033[1;31m Info tab not found in {}, attempting with no IMEI".format( 136 | inputFile 137 | ) 138 | ) 139 | phoneData.IMEI = None 140 | phoneData.IMEI2 = None 141 | # phoneData.inFile = inputFile.split(".")[0] 142 | phoneData.inFile = Path(inputFile).stem 143 | phoneData.inPath = os.path.dirname(inputFile) 144 | 145 | try: 146 | processContacts(inputFile) 147 | except ValueError: 148 | print("\033[1;31m No Contacts tab found,
is this a correctly formatted Excel?") 149 | logging.error( 150 | "No Contacts tab found in {}, is this a correctly formatted Excel?".format( 151 | inputFile 152 | ) 153 | ) 154 | 155 | 156 | ### Extract contacts tab of Excel file ------------------------------------------------------------------- 157 | def processContacts(inputFile): 158 | inputFile = inputFile 159 | logging.info("Processing contacts in {} has begun.".format(inputFile)) 160 | 161 | # Record input filename for use in export processes. 162 | 163 | if debug: 164 | print("\033[0;37m Input file is : {}".format(phoneData.inFile)) 165 | 166 | contactsPD = pd.read_excel( 167 | inputFile, 168 | sheet_name=clbContactSheet, 169 | header=1, 170 | index_col="#", 171 | usecols=["#", "Name", "Interaction Statuses", "Entries", "Source", "Account"], 172 | ) 173 | 174 | print( 175 | "\033[0m Processing the following app types for : {}".format(phoneData.inFile) 176 | ) 177 | applist = contactsPD["Source"].unique() 178 | for x in applist: 179 | if x in parsedApps: 180 | print("{} : \u2713 ".format(x)) 181 | else: 182 | print("{} : \u2716".format(x)) 183 | # Process native contacts 184 | try: 185 | processAppleNative(contactsPD) 186 | except: 187 | print("Processing native contacts failed.") 188 | pass 189 | # Process Apps 190 | for x in applist: 191 | if x == "Instagram": 192 | processInstagram(contactsPD) 193 | if x == "Snapchat": 194 | processSnapChat(contactsPD) 195 | if x == "WhatsApp": 196 | processWhatsapp(contactsPD) 197 | if x == "Telegram": 198 | processTelegram(contactsPD) 199 | if x == "Facebook Messenger": 200 | processFacebookMessenger(contactsPD) 201 | if x == "Signal": 202 | processSignal(contactsPD) 203 | 204 | 205 | # ------ Parse Facebook Messenger -------------------------------------------------------------- 206 | def processFacebookMessenger(contactsPD): 207 | print("\nProcessing Facebook Messenger") 208 | facebookMessengerPD = contactsPD[contactsPD["Source"] == "Facebook Messenger"] 209 | 
facebookMessengerPD = facebookMessengerPD.drop("Entries", axis=1).join( 210 | facebookMessengerPD["Entries"].str.split("\n", expand=True) 211 | ) 212 | facebookMessengerPD = facebookMessengerPD.reset_index(drop=True) 213 | 214 | selected_cols = [] 215 | for x in facebookMessengerPD.columns: 216 | if isinstance(x, int): 217 | selected_cols.append(x) 218 | 219 | def phoneCheck(facebookMessengerPD): 220 | for x in selected_cols: 221 | facebookMessengerPD.loc[ 222 | (facebookMessengerPD[x].str.contains("User ID-Facebook Id", na=False)), 223 | "Account ID", 224 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1] 225 | facebookMessengerPD.loc[ 226 | (facebookMessengerPD[x].str.contains("User ID-Username", na=False)), 227 | "User Name", 228 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1] 229 | 230 | phoneCheck(facebookMessengerPD) 231 | facebookMessengerPD[originIMEI] = phoneData.IMEI 232 | exportCols = [] 233 | for x in facebookMessengerPD.columns: 234 | if isinstance(x, str): 235 | exportCols.append(x) 236 | print("\n") 237 | print( 238 | "{} user accounts located".format(len(facebookMessengerPD["Account"].unique())) 239 | ) 240 | print("{} contacts located".format(len(facebookMessengerPD["Account ID"].unique()))) 241 | print("Exporting {}-FB-MESSENGER.csv".format(phoneData.inFile)) 242 | logging.info("Exporting FB messenger from {}".format(phoneData.inFile)) 243 | facebookMessengerPD[exportCols].to_csv( 244 | "{}-FB-MESSENGER.csv".format(phoneData.inFile), 245 | index=False, 246 | columns=[ 247 | originIMEI, 248 | "Account", 249 | "Interaction Statuses", 250 | "Name", 251 | "User Name", 252 | "Account ID", 253 | "Source", 254 | ], 255 | ) 256 | 257 | 258 | # ----- Parse Instagram data ------------------------------------------------------------------ 259 | def processInstagram(contactsPD): 260 | print("\nProcessing Instagram") 261 | instagramPD = contactsPD[contactsPD["Source"] == "Instagram"].copy() 262 | instagramPD = 
instagramPD.drop("Entries", axis=1).join( 263 | instagramPD["Entries"].str.split("\n", expand=True) 264 | ) 265 | 266 | selected_cols = [] 267 | for x in instagramPD.columns: 268 | if isinstance(x, int): 269 | selected_cols.append(x) 270 | 271 | def instaContacts(instagramPD): 272 | for x in selected_cols: 273 | instagramPD.loc[ 274 | (instagramPD[x].str.contains("User ID-Username", na=False)), "User Name" 275 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1] 276 | instagramPD.loc[ 277 | (instagramPD[x].str.contains("User ID-Instagram Id", na=False)), 278 | "Instagram ID", 279 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1] 280 | 281 | instaContacts(instagramPD) 282 | 283 | instagramPD[originIMEI] = phoneData.IMEI 284 | exportCols = [] 285 | for x in instagramPD.columns: 286 | if isinstance(x, str): 287 | exportCols.append(x) 288 | 289 | print("Exporting {}-INSTAGRAM.csv".format(phoneData.inFile)) 290 | logging.info("Exporting Instagram from {}".format(phoneData.inFile)) 291 | instagramPD[exportCols].to_csv( 292 | "{}-INSTAGRAM.csv".format(phoneData.inFile), 293 | index=False, 294 | columns=[ 295 | originIMEI, 296 | "Account", 297 | "Name", 298 | "User Name", 299 | "Instagram ID", 300 | "Interaction Statuses", 301 | ], 302 | ) 303 | 304 | 305 | # ------------Process native contact list ------------------------------------------------ 306 | def processAppleNative(contactsPD): 307 | 308 | print("\nProcessing Native Contacts") 309 | nativeContactsPD = contactsPD[contactsPD["Source"].isna()] 310 | nativeContactsPD = nativeContactsPD.drop("Entries", axis=1).join( 311 | nativeContactsPD["Entries"] 312 | .str.split("\n", expand=True) 313 | .stack() 314 | .reset_index(level=1, drop=True) 315 | .rename("Entries") 316 | ) 317 | 318 | nativeContactsPD = nativeContactsPD[["Name", "Interaction Statuses", "Entries"]] 319 | 320 | nativeContactsPD = nativeContactsPD[ 321 | nativeContactsPD["Entries"].str.contains(r"Phone-") 322 | ] 323 | 
nativeContactsPD[originIMEI] = phoneData.IMEI 324 | nativeContactsPD["Entries"] = ( 325 | nativeContactsPD["Entries"] 326 | .str.split(":", n=1, expand=True)[1] 327 | .str.strip() 328 | .str.replace(" ", "") 329 | .str.replace("-", "") 330 | ) 331 | if debug: 332 | print(nativeContactsPD) 333 | nativeContactsPD = nativeContactsPD[ 334 | [originIMEI, "Name", "Entries", "Interaction Statuses"] 335 | ] 336 | print("{} contacts located.".format(len(nativeContactsPD))) 337 | print("Exporting {}-NATIVE.csv".format(phoneData.inFile)) 338 | logging.info("Exporting Native contacts from {}".format(phoneData.inFile)) 339 | nativeContactsPD.to_csv("{}-NATIVE.csv".format(phoneData.inFile), index=False) 340 | 341 | 342 | # ------------Parse Signal contacts --------------------------------------------------------------- 343 | def processSignal(contactsPD): 344 | print("Processing Signal Contacts") 345 | signalPD = contactsPD[contactsPD["Source"] == "Signal"].copy() 346 | signalPD = signalPD[["Name", "Entries", "Source"]] 347 | signalPD = signalPD.drop("Entries", axis=1).join( 348 | signalPD["Entries"].str.split("\n", expand=True) 349 | ) 350 | 351 | # Data is expanded into columns with integer names; add these columns to selected_cols so we can search them later 352 | selected_cols = [] 353 | for x in signalPD.columns: 354 | if isinstance(x, int): 355 | selected_cols.append(x) 356 | 357 | # Signal can store multiple values under entries such as Mobile Number: 358 | # So we break them all out into columns. 
359 | def signalContact(signalPD): 360 | for x in selected_cols: 361 | # Locate Signal Username and move to Username Column 362 | signalPD.loc[ 363 | (signalPD[x].str.contains("User ID-Username:", na=False)), 364 | "User Name", 365 | ] = signalPD[x].str.split(":", n=1, expand=True)[1] 366 | # Delete Username entry from original location 367 | signalPD.loc[ 368 | signalPD[x].str.contains("User ID-Username:", na=False), [x] 369 | ] = "" 370 | # Delete everything before the colon 371 | signalPD[x] = signalPD[x].str.split(":", n=1, expand=True)[1].str.strip() 372 | 373 | signalContact(signalPD) 374 | 375 | signalPD[originIMEI] = phoneData.IMEI 376 | 377 | export_cols = [originIMEI, "Name", "User Name"] 378 | export_cols.extend(selected_cols) 379 | print("Located {} Signal contacts".format(len(signalPD["Name"]))) 380 | print("Exporting {}-SIGNAL.csv".format(phoneData.inFile)) 381 | logging.info("Exporting Signal messenger from {}".format(phoneData.inFile)) 382 | signalPD.to_csv( 383 | "{}-SIGNAL.csv".format(phoneData.inFile), index=False, columns=export_cols 384 | ) 385 | 386 | 387 | # ----------- Parse Snapchat data ------------------------------------------------------------------ 388 | def processSnapChat(contactsPD): 389 | print("\nProcessing Snapchat") 390 | snapPD = contactsPD[contactsPD["Source"] == "Snapchat"] 391 | snapPD = snapPD[["Name", "Entries", "Source"]] 392 | 393 | # Extract nested entities 394 | snapPD = snapPD.drop("Entries", axis=1).join( 395 | snapPD["Entries"].str.split("\n", expand=True) 396 | ) 397 | selected_cols = [] 398 | for x in snapPD.columns: 399 | if isinstance(x, int): 400 | selected_cols.append(x) 401 | 402 | def snapContacts(snapPD): 403 | for x in selected_cols: 404 | snapPD.loc[ 405 | (snapPD[x].str.contains("User ID-Username", na=False)), "User Name" 406 | ] = snapPD[x].str.split(":", n=1, expand=True)[1] 407 | snapPD.loc[ 408 | (snapPD[x].str.contains("User ID-User ID", na=False)), "User ID" 409 | ] = snapPD[x].str.split(":", n=1, 
expand=True)[1] 410 | 411 | snapContacts(snapPD) 412 | snapPD[originIMEI] = phoneData.IMEI 413 | 414 | exportCols = [] 415 | for x in snapPD.columns: 416 | if isinstance(x, str): 417 | exportCols.append(x) 418 | if debug: 419 | print(snapPD[exportCols]) 420 | print("Exporting {}-SNAPCHAT.csv".format(phoneData.inFile)) 421 | logging.info("Exporting Snapchat from {}".format(phoneData.inFile)) 422 | snapPD[exportCols].to_csv( 423 | "{}-SNAPCHAT.csv".format(phoneData.inFile), 424 | index=False, 425 | columns=[originIMEI, "Name", "User Name", "User ID"], 426 | ) 427 | 428 | 429 | # ---- Parse Telegram Contacts-------------------------------------------------------------- 430 | def processTelegram(contactsPD): 431 | print("\nProcessing Telegram") 432 | telegramPD = contactsPD[contactsPD["Source"] == "Telegram"].copy() 433 | telegramPD = telegramPD.drop("Entries", axis=1).join( 434 | telegramPD["Entries"].str.split("\n", expand=True) 435 | ) 436 | telegramPD = telegramPD.reset_index(drop=True) 437 | 438 | selected_cols = [] 439 | for x in telegramPD.columns: 440 | if isinstance(x, int): 441 | selected_cols.append(x) 442 | 443 | def phoneCheck(telegramPD): 444 | for x in selected_cols: 445 | telegramPD.loc[ 446 | (telegramPD[x].str.contains("Phone-", na=False)), "Phone-Number" 447 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1] 448 | 449 | telegramPD.loc[ 450 | (telegramPD[x].str.contains("User ID-Peer", na=False)), "Peer-ID" 451 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1] 452 | 453 | telegramPD.loc[ 454 | (telegramPD[x].str.contains("User ID-Username", na=False)), "User-Name" 455 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1] 456 | 457 | phoneCheck(telegramPD) 458 | telegramPD[originIMEI] = phoneData.IMEI 459 | exportCols = [] 460 | for x in telegramPD.columns: 461 | if isinstance(x, str): 462 | exportCols.append(x) 463 | # Export CSV 464 | print("Exporting {}-TELEGRAM.csv".format(phoneData.inFile)) 465 | logging.info("Exporting Telegram from 
{}".format(phoneData.inFile)) 466 | telegramPD[exportCols].to_csv( 467 | "{}-TELEGRAM.csv".format(phoneData.inFile), index=False 468 | ) 469 | 470 | 471 | # ---Parse WhatsApp Contacts---------------------------------------------------------------------- 472 | # Load WhatsApp 473 | def processWhatsapp(contactsPD): 474 | print("\nProcessing WhatsApp") 475 | whatsAppPD = contactsPD[contactsPD["Source"] == "WhatsApp"].copy() 476 | whatsAppPD = whatsAppPD[["Name", "Entries", "Source", "Interaction Statuses"]] 477 | # Shared contacts are not associated with a WhatsApp ID and cause problems. 478 | whatsAppPD = whatsAppPD[ 479 | whatsAppPD["Interaction Statuses"].str.contains("Shared", na=False) == False 480 | ] 481 | # Unpack nested data 482 | whatsAppPD = whatsAppPD.drop("Entries", axis=1).join( 483 | whatsAppPD["Entries"].str.split("\n", expand=True) 484 | ) 485 | 486 | # Data is expanded into columns with integer names; check for these columns and add them to a 487 | # list to allow for different width sheets. 488 | colList = list(whatsAppPD) 489 | selected_cols = [] 490 | for x in colList: 491 | if isinstance(x, int): 492 | selected_cols.append(x) 493 | 494 | # Look for data across expanded columns and shift it to output columns. 
495 | def whatsappContactProcess(whatsAppPD): 496 | print("\nProcessing WhatsApp") 497 | for x in selected_cols: 498 | whatsAppPD.loc[ 499 | (whatsAppPD[x].str.contains("Phone-Mobile", na=False)), "Phone-Mobile" 500 | ] = ( 501 | whatsAppPD[x] 502 | .str.split(":", n=1, expand=True)[1] 503 | .str.replace(" ", "") 504 | .str.replace("-", "") 505 | ) 506 | 507 | whatsAppPD.loc[ 508 | (whatsAppPD[x].str.contains("Phone-:", na=False)), "Phone" 509 | ] = ( 510 | whatsAppPD[x] 511 | .str.split(":", n=1, expand=True)[1] 512 | .str.replace(" ", "") 513 | .str.replace("-", "") 514 | ) 515 | 516 | whatsAppPD.loc[ 517 | (whatsAppPD[x].str.contains("Phone-Home:", na=False)), "Phone-Home" 518 | ] = ( 519 | whatsAppPD[x] 520 | .str.split(":", n=1, expand=True)[1] 521 | .str.replace(" ", "") 522 | .str.replace("-", "") 523 | ) 524 | 525 | whatsAppPD.loc[ 526 | (whatsAppPD[x].str.contains("User ID-Push Name", na=False)), "Push-ID" 527 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 528 | 529 | whatsAppPD.loc[ 530 | (whatsAppPD[x].str.contains("User ID-Id", na=False)), "Id-ID" 531 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 532 | 533 | whatsAppPD.loc[ 534 | (whatsAppPD[x].str.contains("User ID-WhatsApp User Id", na=False)), 535 | "WhatsApp-ID", 536 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 537 | 538 | whatsAppPD.loc[ 539 | (whatsAppPD[x].str.contains("Web address-Professional", na=False)), 540 | "BusinessWebsite", 541 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 542 | 543 | whatsAppPD.loc[ 544 | (whatsAppPD[x].str.contains("Email-Professional", na=False)), 545 | "Business-Email", 546 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 547 | 548 | whatsappContactProcess(whatsAppPD) 549 | 550 | # Add IMEI Column 551 | whatsAppPD[originIMEI] = phoneData.IMEI 552 | 553 | # Remove working columns. 
554 | exportCols = [] 555 | for x in whatsAppPD.columns: 556 | if isinstance(x, str): 557 | exportCols.append(x) 558 | if debug: 559 | print(exportCols) 560 | 561 | # Export CSV 562 | print("Exporting {}-WHATSAPP.csv".format(phoneData.inFile)) 563 | logging.info("Exporting WhatsApp from {}".format(phoneData.inFile)) 564 | whatsAppPD[exportCols].to_csv( 565 | "{}-WHATSAPP.csv".format(phoneData.inFile), index=False 566 | ) 567 | 568 | 569 | # ------- Argument parser for command line arguments ----------------------------------------- 570 | 571 | if __name__ == "__main__": 572 | parser = argparse.ArgumentParser( 573 | description=__description__, 574 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)), 575 | ) 576 | 577 | parser.add_argument( 578 | "-f", 579 | "--file", 580 | dest="inputFilename", 581 | help="Path to Excel Spreadsheet", 582 | required=False, 583 | ) 584 | 585 | parser.add_argument( 586 | "-b", 587 | "--bulk", 588 | dest="bulk", 589 | required=False, 590 | action="store_true", 591 | help="Bulk process Excel spreadsheets in working directory.", 592 | ) 593 | 594 | args = parser.parse_args() 595 | 596 | if len(sys.argv) == 1: 597 | parser.print_help() 598 | parser.exit() 599 | 600 | if args.bulk: 601 | print("Bulk Process") 602 | bulkProcessor() 603 | 604 | if args.inputFilename: 605 | if not os.path.exists(args.inputFilename): 606 | print( 607 | "Error: '{}' does not exist or is not a file.".format( 608 | args.inputFilename 609 | ) 610 | ) 611 | sys.exit(1) 612 | processMetadata(args.inputFilename) 613 | -------------------------------------------------------------------------------- /clbExtract/clbExtract.py: -------------------------------------------------------------------------------- 1 | """ 2 | Extracts nested contacts data from Cellebrite formatted Excel documents. 3 | - Cellebrite stores contact details in multiline Excel cells. 
4 | 5 | Formatted unapologetically with Black 6 | 7 | # Current Known Issues 8 | # FIXME - Fix Instagram parser, account ID's have changed 9 | # FIXME - Fix OG Signal parser, Column order 10 | 11 | Changelog 12 | 0.9 - Fix - Handles name change to Signal Private Messenger and extra data columns 13 | - prints version to command line 14 | - Fix bug where files ending in .XLSX (caps) wouldn't be automatically found 15 | - Support for Outlook contacts 16 | 17 | 0.8 - Fix issue where Input file was entered twice for Instagram export sheet 18 | 19 | 0.7 - Add Provenance data column 20 | - Fix issue where WhatsApp or Facebook may not export due to lack of 'Interaction Statuses' Column 21 | 22 | 0.6 - Fix issue with Threema user ID attribution 23 | - Fix issue with parsers crashing out, now raises an exception and continues. 24 | 25 | 0.5 - Added support for recents - at this time this is kept separate from native contacts 26 | - Warning re large files, pandas is unable to provide load time estimates 27 | - Add option to normalise Au mobile phone numbers by converting +614** to 04** 28 | - Minor tidyups and fixes to logging. 29 | - Fix WeChat exception for older style excels 30 | - Fix WhatsApp exception when interaction status is not populated 31 | - Fix Exception when there is no IMEI entry at all, eg. 
older iPads 32 | - Populate and export source columns 33 | 34 | 0.4a - Added support for Cellebrite files with device info stored in "device" rather than name columns 35 | 36 | 0.4 - Add support for alternate Cellebrite info page format 37 | - Add support For Line, WeChat, Threema contacts 38 | 39 | 0.3 Complete rewrite 40 | 41 | 0.2 - Implement command line argument parser 42 | Allow bulk processing of all items in directory 43 | 44 | 0.1 - Initial concept 45 | 46 | """ 47 | 48 | import argparse 49 | import glob 50 | import logging 51 | import os 52 | import pandas as pd 53 | from pathlib import Path 54 | import sys 55 | 56 | ## Details 57 | __description__ = 'Flattens Cellebrite formatted Excel files. "Contacts" and "Device Info" tabs are required.' 58 | __author__ = "facelessg00n" 59 | __version__ = "0.9" 60 | 61 | parser = argparse.ArgumentParser( 62 | description=__description__, 63 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)), 64 | ) 65 | 66 | # ----------- Options ----------- 67 | os.chdir(os.getcwd()) 68 | 69 | # Show extra debug output 70 | debug = False 71 | 72 | # Normalise Australian mobile numbers by replacing +614 with 04 73 | ausNormal = True 74 | 75 | # File size warning (MB) 76 | warnSize = 50 77 | 78 | 79 | # ----------- Logging options ------------------------------------- 80 | 81 | logging.basicConfig( 82 | filename="clbExtract.log", 83 | format="%(asctime)s,- %(levelname)s - %(message)s", 84 | level=logging.INFO, 85 | ) 86 | 87 | 88 | # Set names for sheets of interest 89 | clbPhoneInfo = "Device Info" 90 | clbContactSheet = "Contacts" 91 | clbPhoneInfov2 = "Device Information" 92 | 93 | # FIXME 94 | #### ---- Column names and other options --------------------------------------------- 95 | provenanceCols = ["WARRANT", "COLLECT", "EXAM", "NOTICE"] 96 | 97 | contactOutput = "ContactDetail" 98 | contactTypeOutput = "ContactType" 99 | originIMEI = "originIMEI" 100 | parsedApps = [ 101 | "Facebook Messenger", 102 
| "Instagram", 103 | "Line", 104 | "Native", 105 | "Outlook", 106 | "Recents", 107 | "Signal", 108 | "Signal Private Messenger", 109 | "Snapchat", 110 | "WhatsApp", 111 | "Telegram", 112 | "Threema", 113 | "WeChat", 114 | "Zalo", 115 | ] 116 | 117 | 118 | # Class object to hold phone and input file info 119 | class phoneData: 120 | IMEI = None 121 | IMEI2 = None 122 | inFile = None 123 | inPath = None 124 | inProvenance = None 125 | 126 | def __init__( 127 | self, IMEI=None, IMEI2=None, inFile=None, inPath=None, inProvenance=None 128 | ) -> None: 129 | self.IMEI = IMEI 130 | self.IMEI2 = IMEI2 131 | self.inFile = inFile 132 | self.inPath = inPath 133 | self.inProvenance = inProvenance 134 | 135 | 136 | # -------------Functions live here ------------------------------------------ 137 | 138 | # ----- Bulk Excel Processor-------------------------------------------------- 139 | 140 | 141 | # Finds and processes all excel files in the working directory. 142 | def bulkProcessor(inputProvenance): 143 | FILE_PATH = os.getcwd() 144 | inputFiles = glob.glob("*.xlsx") + glob.glob("*.XLSX") 145 | print((str(len(inputFiles)) + " Excel files located. \n")) 146 | logging.info("Bulk processing {} files".format(str(len(inputFiles)))) 147 | # If there are no files found exit the process. 148 | if len(inputFiles) == 0: 149 | print("No excel files located.") 150 | print("Exiting.") 151 | quit() 152 | else: 153 | for inputFile in inputFiles: 154 | if os.path.exists(inputFile): 155 | try: 156 | processMetadata(inputFile, inputProvenance) 157 | # Need to deal with $ files. 
158 | except FileNotFoundError: 159 | print("File does not exist or temp file detected") 160 | pass 161 | if debug: 162 | for inputFile in inputFiles: 163 | inputFilename = inputFile.split(".")[0] 164 | print(inputFilename) 165 | 166 | 167 | # FIXME - Deal with error when this info is missing 168 | ### -------- Process phone metadata ------------------------------------------------------ 169 | def processMetadata(inputFile, inputProvenance): 170 | inputFile = inputFile 171 | print("Input Provenance is {}".format(inputProvenance)) 172 | print("Extracting metadata from {}".format(inputFile)) 173 | logging.info("Extracting metadata from {}".format(inputFile)) 174 | 175 | phoneData.inProvenance = inputProvenance 176 | 177 | fileSize = os.path.getsize(inputFile) / 1048576 178 | if fileSize > warnSize: 179 | print( 180 | "Large input file detected ({} MB); this may take some time to process, and progress cannot be reported while the file is loading".format( 181 | f"{fileSize:.2f}" 182 | ) 183 | ) 184 | else: 185 | print("Input file is {} MB".format(f"{fileSize:.2f}")) 186 | 187 | try: 188 | infoPD = pd.read_excel( 189 | inputFile, sheet_name=clbPhoneInfo, header=1, usecols="B,C,D" 190 | ) 191 | 192 | try: 193 | phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][ 194 | 0 195 | ] 196 | phoneData.inFile = Path(inputFile).stem 197 | phoneData.inPath = os.path.dirname(inputFile) 198 | phoneData.inProvenance = inputProvenance 199 | except: 200 | print("Attempting Device Column") 201 | try: 202 | phoneData.IMEI = infoPD.loc[ 203 | infoPD["Device"] == "IMEI", ["Value"] 204 | ].values[0][0] 205 | phoneData.inFile = Path(inputFile).stem 206 | phoneData.inPath = os.path.dirname(inputFile) 207 | except: 208 | print("IMEI not located, setting to NULL") 209 | phoneData.IMEI = None 210 | phoneData.inFile = Path(inputFile).stem 211 | phoneData.inPath = os.path.dirname(inputFile) 212 | 213 | try: 214 | phoneData.IMEI2 = infoPD.loc[infoPD["Name"] == "IMEI2", 
["Value"]].values[ 215 | 0 216 | ][0] 217 | except: 218 | phoneData.IMEI2 = None 219 | phoneData.inFile = Path(inputFile).stem 220 | phoneData.inPath = os.path.dirname(inputFile) 221 | # phoneData.inFile = inputFile.split(".")[0] 222 | phoneData.inFile = Path(inputFile).stem 223 | phoneData.inPath = os.path.dirname(inputFile) 224 | 225 | if debug: 226 | print(infoPD) 227 | print(phoneData.IMEI) 228 | 229 | except ValueError: 230 | print( 231 | "Info tab not found in {}, attempting with second format.".format(inputFile) 232 | ) 233 | logging.exception( 234 | "No info tab found in {}, attempting with second format".format(inputFile) 235 | ) 236 | try: 237 | infoPD = pd.read_excel( 238 | inputFile, sheet_name=clbPhoneInfov2, header=1, usecols="B,C,D" 239 | ) 240 | # Remove leading whitespace from columns 241 | infoPD["Name"] = infoPD["Name"].str.strip() 242 | phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][ 243 | 0 244 | ] 245 | print("Second format succeeded") 246 | logging.info("Second format succeeded on {}".format(inputFile)) 247 | 248 | phoneData.inFile = Path(inputFile).stem 249 | phoneData.inPath = os.path.dirname(inputFile) 250 | 251 | except IndexError: 252 | print("IMEI not located, is this a tablet or iPad?") 253 | logging.warning( 254 | "IMEI not found in {}, attempting with no IMEI".format(inputFile) 255 | ) 256 | phoneData.IMEI = None 257 | phoneData.IMEI2 = None 258 | phoneData.inFile = Path(inputFile).stem 259 | phoneData.inPath = os.path.dirname(inputFile) 260 | print("Loaded {}, with no IMEI".format(inputFile)) 261 | logging.info("Loaded {}, with no IMEI".format(inputFile)) 262 | pass 263 | 264 | except ValueError: 265 | print( 266 | "\033[1;31m Info tab not found in {}, attempting with no IMEI".format( 267 | inputFile 268 | ) 269 | ) 270 | logging.warning( 271 | "Info tab not found in {}, attempting with no IMEI".format( 272 | inputFile 273 | ) 274 | ) 275 | phoneData.IMEI = None 276 | phoneData.IMEI2 = 
None 277 | # phoneData.inFile = inputFile.split(".")[0] 278 | phoneData.inFile = Path(inputFile).stem 279 | phoneData.inPath = os.path.dirname(inputFile) 280 | print("\033[1;31m Loaded {}, with no IMEI".format(inputFile)) 281 | logging.info("Loaded {}, with no IMEI".format(inputFile)) 282 | pass 283 | 284 | try: 285 | processContacts(inputFile) 286 | except ValueError: 287 | print("\033[1;31m No Contacts tab found, is this a correctly formatted Excel?") 288 | logging.error( 289 | "No Contacts tab found in {}, is this a correctly formatted Excel?".format( 290 | inputFile 291 | ) 292 | ) 293 | except Exception as e: 294 | print(e) 295 | 296 | 297 | ### Extract contacts tab of Excel file ------------------------------------------------------------------- 298 | # This creates the initial dataframe, future processing is from copies of this dataframe. 299 | def processContacts(inputFile): 300 | inputFile = inputFile 301 | fileSize = os.path.getsize(inputFile) / 1048576 302 | print("Processing contacts in {} has begun.".format(phoneData.inFile)) 303 | logging.info("Processing contacts in {} has begun.".format(phoneData.inFile)) 304 | 305 | if fileSize > warnSize: 306 | print( 307 | "Large input file detected ({} MB); this may take some time to process, and progress cannot be reported while the file is loading".format( 308 | f"{fileSize:.2f}" 309 | ) 310 | ) 311 | else: 312 | print("Input file is {} MB".format(f"{fileSize:.2f}")) 313 | 314 | # Record input filename for use in export processes. 
315 | if debug: 316 | print("\033[0;37m Input file is : {}".format(phoneData.inFile)) 317 | 318 | contactsPD = pd.read_excel( 319 | inputFile, 320 | sheet_name=clbContactSheet, 321 | header=1, 322 | index_col="#", 323 | usecols=["#", "Name", "Entries", "Source", "Account"], 324 | ) 325 | 326 | print("\033[0mProcessing the following app types for : {}".format(phoneData.inFile)) 327 | applist = contactsPD["Source"].unique() 328 | for x in applist: 329 | if x in parsedApps: 330 | print("{} : \u2713 ".format(x)) 331 | 332 | else: 333 | print("{} : \u2716".format(x)) 334 | 335 | # Process native contacts 336 | try: 337 | processAppleNative(contactsPD) 338 | except Exception as e: 339 | print("Processing native contacts failed") 340 | print(e) 341 | pass 342 | 343 | # Process Apps 344 | for x in applist: 345 | if x == "Facebook Messenger": 346 | try: 347 | processFacebookMessenger(contactsPD) 348 | except Exception as e: 349 | logging.warning("Failed to parse Facebook messenger - {}".format(e)) 350 | pass 351 | if x == "Instagram": 352 | try: 353 | processInstagram(contactsPD) 354 | except: 355 | logging.warning("Failed to parse Instagram") 356 | pass 357 | if x == "Line": 358 | try: 359 | processLine(contactsPD) 360 | except: 361 | logging.warning("Failed to parse Line") 362 | pass 363 | if x == "Outlook": 364 | try: 365 | processOutlookContacts(contactsPD) 366 | except: 367 | logging.warning("Failed to parse Outlook") 368 | pass 369 | if x == "Recents": 370 | try: 371 | processRecents(contactsPD) 372 | except: 373 | logging.warning("Failed to parse Recents") 374 | pass 375 | if x == "Snapchat": 376 | try: 377 | processSnapChat(contactsPD) 378 | except: 379 | logging.warning("Failed to parse Snapchat") 380 | pass 381 | if x == "Telegram": 382 | try: 383 | processTelegram(contactsPD) 384 | except: 385 | logging.warning("Failed to parse Telegram") 386 | pass 387 | if x == "Threema": 388 | try: 389 | processThreema(contactsPD) 390 | except: 391 | logging.warning("Failed to 
parse Threema") 392 | pass 393 | if x == "Signal": 394 | try: 395 | processSignal(contactsPD) 396 | except: 397 | logging.warning("Failed to parse Signal") 398 | pass 399 | if x == "Signal Private Messenger": 400 | try: 401 | processSignalPrivateMessenger(contactsPD) 402 | except: 403 | logging.warning("Failed to parse Signal Private Messenger") 404 | 405 | if x == "WeChat": 406 | try: 407 | processWeChat(contactsPD) 408 | except: 409 | logging.warning("Failed to parse WeChat") 410 | pass 411 | if x == "WhatsApp": 412 | try: 413 | processWhatsapp(contactsPD) 414 | except Exception as e: 415 | logging.warning("Failed to parse WhatsApp - {}".format(e)) 416 | pass 417 | if x == "Zalo": 418 | try: 419 | processZalo(contactsPD) 420 | except: 421 | logging.warning("Failed to parse Zalo") 422 | pass 423 | 424 | print("\nProcessing of {} complete".format(inputFile)) 425 | 426 | 427 | # ------ Parse Facebook Messenger -------------------------------------------------------------- 428 | def processFacebookMessenger(contactsPD): 429 | print("\nProcessing Facebook Messenger") 430 | facebookMessengerPD = contactsPD[contactsPD["Source"] == "Facebook Messenger"] 431 | facebookMessengerPD = facebookMessengerPD.drop("Entries", axis=1).join( 432 | facebookMessengerPD["Entries"].str.split("\n", expand=True) 433 | ) 434 | facebookMessengerPD = facebookMessengerPD.reset_index(drop=True) 435 | 436 | selected_cols = [] 437 | for x in facebookMessengerPD.columns: 438 | if isinstance(x, int): 439 | selected_cols.append(x) 440 | 441 | def phoneCheck(facebookMessengerPD): 442 | for x in selected_cols: 443 | facebookMessengerPD.loc[ 444 | (facebookMessengerPD[x].str.contains("User ID-Facebook Id", na=False)), 445 | "Account ID", 446 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1] 447 | facebookMessengerPD.loc[ 448 | (facebookMessengerPD[x].str.contains("User ID-Username", na=False)), 449 | "User Name", 450 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1] 451 | 
452 | phoneCheck(facebookMessengerPD) 453 | 454 | facebookMessengerPD["Source"] = "Messenger" 455 | facebookMessengerPD[originIMEI] = phoneData.IMEI 456 | facebookMessengerPD["inputFile"] = phoneData.inFile 457 | facebookMessengerPD["Provenance"] = phoneData.inProvenance 458 | 459 | exportCols = [] 460 | for x in facebookMessengerPD.columns: 461 | if isinstance(x, str): 462 | exportCols.append(x) 463 | print( 464 | "{} user accounts located".format(len(facebookMessengerPD["Account"].unique())) 465 | ) 466 | print("{} contacts located".format(len(facebookMessengerPD["Account ID"].unique()))) 467 | print("Exporting {}-FB-MESSENGER.csv".format(phoneData.inFile)) 468 | logging.info("Exporting FB messenger from {}".format(phoneData.inFile)) 469 | try: 470 | facebookMessengerPD[exportCols].to_csv( 471 | "{}-FB-MESSENGER.csv".format(phoneData.inFile), 472 | index=False, 473 | ) 474 | except Exception as e: 475 | print(e) 476 | 477 | 478 | # ----- Parse Instagram data ------------------------------------------------------------------ 479 | def processInstagram(contactsPD): 480 | print("\nProcessing Instagram") 481 | instagramPD = contactsPD[contactsPD["Source"] == "Instagram"].copy() 482 | instagramPD = instagramPD.drop("Entries", axis=1).join( 483 | instagramPD["Entries"].str.split("\n", expand=True) 484 | ) 485 | 486 | selected_cols = [] 487 | for x in instagramPD.columns: 488 | if isinstance(x, int): 489 | selected_cols.append(x) 490 | 491 | def instaContacts(instagramPD): 492 | for x in selected_cols: 493 | instagramPD.loc[ 494 | (instagramPD[x].str.contains("User ID-Username", na=False)), "User Name" 495 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1] 496 | instagramPD.loc[ 497 | (instagramPD[x].str.contains("User ID-Instagram Id", na=False)), 498 | "Instagram ID", 499 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1] 500 | 501 | instaContacts(instagramPD) 502 | 503 | instagramPD[originIMEI] = phoneData.IMEI 504 | instagramPD["inputFile"] = 
phoneData.inFile 505 | 506 | exportCols = [] 507 | for x in instagramPD.columns: 508 | if isinstance(x, str): 509 | exportCols.append(x) 510 | print("{} Instagram contacts located".format(len(instagramPD["Name"]))) 511 | print("Exporting {}-INSTAGRAM.csv".format(phoneData.inFile)) 512 | logging.info("Exporting Instagram from {}".format(phoneData.inFile)) 513 | # TODO - Fix column handling 514 | instagramPD[exportCols].to_csv( 515 | "{}-INSTAGRAM.csv".format(phoneData.inFile), 516 | index=False, 517 | ) 518 | 519 | 520 | # ---- Process Line ----------------------------------------------------------------------- 521 | def processLine(contactsPD): 522 | print("Processing Line") 523 | linePD = contactsPD[contactsPD["Source"] == "Line"].copy() 524 | linePD = linePD.drop("Entries", axis=1).join( 525 | linePD["Entries"].str.split("\n", expand=True) 526 | ) 527 | linePD = linePD.reset_index(drop=True) 528 | 529 | selected_cols = [] 530 | for x in linePD.columns: 531 | if isinstance(x, int): 532 | selected_cols.append(x) 533 | 534 | def processLine(LinePD): 535 | for x in selected_cols: 536 | LinePD.loc[ 537 | (LinePD[x].str.contains("User ID-Address Book Name:", na=False)), 538 | "LineAddressBook", 539 | ] = LinePD[x].str.split(":", n=1, expand=True)[1] 540 | 541 | LinePD.loc[ 542 | (LinePD[x].str.contains("User ID-User ID:", na=False)), 543 | "LineUserID", 544 | ] = LinePD[x].str.split(":", n=1, expand=True)[1] 545 | LinePD.loc[ 546 | (LinePD[x].str.contains("User ID-Server:", na=False)), 547 | "LineServerID", 548 | ] = LinePD[x].str.split(":", n=1, expand=True)[1] 549 | 550 | processLine(linePD) 551 | 552 | linePD[originIMEI] = phoneData.IMEI 553 | linePD["inputFile"] = phoneData.inFile 554 | exportCols = [] 555 | 556 | for x in linePD.columns: 557 | if isinstance(x, str): 558 | exportCols.append(x) 559 | 560 | print("{} Line contacts located".format(len(linePD["Name"]))) 561 | print("Exporting {}-LINE.csv".format(phoneData.inFile)) 562 | logging.info("Exporting Line 
contacts from {}".format(phoneData.inFile)) 563 | linePD[exportCols].to_csv("{}-LINE.csv".format(phoneData.inFile), index=False) 564 | 565 | 566 | # ------------Process native contact list ------------------------------------------------ 567 | def processAppleNative(contactsPD): 568 | 569 | print("\nProcessing Native Contacts") 570 | # nativeContactsPD = contactsPD[contactsPD["Source"].isna()] 571 | 572 | # Contacts are stored with either null (iPhone) or "Phone" for Android 573 | nativeContactsPD = contactsPD[ 574 | (contactsPD.Source.isna()) | (contactsPD.Source == "Phone") 575 | ].copy() 576 | 577 | # Fill NaN values with : to prevent error with blank entries. 578 | nativeContactsPD.Entries = nativeContactsPD.Entries.fillna(":") 579 | 580 | nativeContactsPD = nativeContactsPD.drop("Entries", axis=1).join( 581 | nativeContactsPD["Entries"] 582 | .str.split("\n", expand=True) 583 | .stack() 584 | .reset_index(level=1, drop=True) 585 | .rename("Entries") 586 | ) 587 | 588 | # nativeContactsPD = nativeContactsPD[["Name", "Interaction Statuses", "Entries"]] 589 | 590 | nativeContactsPD = nativeContactsPD[ 591 | nativeContactsPD["Entries"].str.contains(r"Phone-") 592 | ] 593 | nativeContactsPD[originIMEI] = phoneData.IMEI 594 | nativeContactsPD["inputFile"] = phoneData.inFile 595 | nativeContactsPD["Provenance"] = phoneData.inProvenance 596 | 597 | # Remove erroneous characters, need to make this a regex 598 | # TODO Use a regex to tidy this up. 
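The TODO above suggests replacing the chained str.replace calls used for number cleanup with a regex. A minimal sketch of that consolidation on a hypothetical raw entry (the sample value is illustrative, not from a real report):

```python
import pandas as pd

# Hypothetical raw entry after the "label:" prefix has been split off.
entries = pd.Series([" +61 400-000-000 (Message)"])

# "Message" is a word, so it is removed literally; the remaining punctuation
# collapses into one character class instead of five chained replaces.
cleaned = entries.str.replace("Message", "", regex=False).str.replace(
    r"[\s\-+()]", "", regex=True
)
```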
599 | nativeContactsPD["Entries"] = ( 600 | nativeContactsPD["Entries"] 601 | .str.split(":", n=1, expand=True)[1] 602 | .str.strip() 603 | .str.replace(" ", "", regex=False) 604 | .str.replace("-", "", regex=False) 605 | .str.replace("+", "", regex=False) 606 | # Fix issue with inseyets reports 607 | .str.replace("Message", "", regex=False) 608 | .str.replace("(", "", regex=False) 609 | .str.replace(")", "", regex=False) 610 | ) 611 | 612 | if ausNormal: 613 | nativeContactsPD["Entries"] = nativeContactsPD["Entries"].str.replace( 614 | r"\+614", "04", regex=True 615 | ) 616 | 617 | if debug: 618 | print(nativeContactsPD) 619 | 620 | # nativeContactsPD = nativeContactsPD[[originIMEI, "Name", "Entries", "Interaction Statuses"]] 621 | print("{} contacts located.".format(len(nativeContactsPD))) 622 | print("Exporting {}-NATIVE.csv".format(phoneData.inFile)) 623 | logging.info("Exporting Native contacts from {}".format(phoneData.inFile)) 624 | nativeContactsPD.to_csv("{}-NATIVE.csv".format(phoneData.inFile), index=False) 625 | 626 | 627 | # Process Outlook Contacts 628 | def processOutlookContacts(contactsPD): 629 | print("\nProcessing Outlook Contacts") 630 | 631 | outlookContactsPD = contactsPD[(contactsPD.Source == "Outlook")].copy() 632 | # Fill NaN values with : to prevent error with blank entries. 
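The fillna(":") guard mentioned in the comment above matters because a row whose Entries cell is NaN would otherwise vanish in the later split/stack step. A small sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical contacts: one row holds two nested entries, one holds none.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Entries": ["Phone-: 123\nPhone-: 456", None]}
)

# Filling NaN with ":" keeps the blank row alive; otherwise split/stack
# silently drops it from the frame.
df.Entries = df.Entries.fillna(":")

# split/stack turns each newline-separated entry into its own row.
df = df.drop("Entries", axis=1).join(
    df["Entries"]
    .str.split("\n", expand=True)
    .stack()
    .reset_index(level=1, drop=True)
    .rename("Entries")
)
```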
outlookContactsPD.Entries = outlookContactsPD.Entries.fillna(":") 634 | 635 | outlookContactsPD = outlookContactsPD.drop("Entries", axis=1).join( 636 | outlookContactsPD["Entries"] 637 | .str.split("\n", expand=True) 638 | .stack() 639 | .reset_index(level=1, drop=True) 640 | .rename("Entries") 641 | ) 642 | 643 | outlookContactsPD = outlookContactsPD[["Account", "Name", "Entries", "Source"]] 644 | outlookContactsPD[originIMEI] = phoneData.IMEI 645 | outlookContactsPD["inputFile"] = phoneData.inFile 646 | outlookContactsPD["Provenance"] = phoneData.inProvenance 647 | 648 | outlookContactsPD["Entries"] = ( 649 | outlookContactsPD["Entries"].str.split(":", n=1, expand=True)[1].str.strip() 650 | ) 651 | 652 | print("{} contacts located.".format(len(outlookContactsPD))) 653 | print("Exporting {}-OUTLOOK.csv".format(phoneData.inFile)) 654 | logging.info("Exporting Outlook contacts from {}".format(phoneData.inFile)) 655 | outlookContactsPD.to_csv("{}-OUTLOOK.csv".format(phoneData.inFile), index=False) 656 | 657 | 658 | # ----------- Parse Recents ----------------------------------------------------------------------- 659 | def processRecents(contactsPD): 660 | print("\nProcessing Recents") 661 | recentsPD = contactsPD[contactsPD["Source"] == "Recents"].copy() 662 | recentsPD.Entries = recentsPD.Entries.fillna(":") 663 | recentsPD = recentsPD[recentsPD["Entries"].str.contains(r"Phone-")] 664 | 665 | recentsPD[originIMEI] = phoneData.IMEI 666 | recentsPD["inputFile"] = phoneData.inFile 667 | recentsPD["Provenance"] = phoneData.inProvenance 668 | 669 | recentsPD["Entries"] = ( 670 | recentsPD["Entries"] 671 | .str.split(":", n=1, expand=True)[1] 672 | .str.strip() 673 | .str.replace(" ", "") 674 | .str.replace("-", "") 675 | # .str.replace("+","",regex=False) 676 | ) 677 | if ausNormal: 678 | recentsPD["Entries"] = recentsPD["Entries"].str.replace( 679 | r"\+614", "04", regex=True 680 | ) 681 | 682 | print("{} recent contacts located.".format(len(recentsPD))) 683 | 
print("Exporting {}-RECENTS.csv".format(phoneData.inFile)) 684 | logging.info("Exporting recent contacts from {}".format(phoneData.inFile)) 685 | recentsPD.to_csv("{}-RECENTS.csv".format(phoneData.inFile), index=False) 686 | 687 | 688 | # ------------Parse Signal contacts --------------------------------------------------------------- 689 | def processSignal(contactsPD): 690 | print("\nProcessing Signal Contacts") 691 | signalPD = contactsPD[contactsPD["Source"] == "Signal"].copy() 692 | signalPD = signalPD[["Name", "Entries", "Source"]] 693 | signalPD = signalPD.drop("Entries", axis=1).join( 694 | signalPD["Entries"].str.split("\n", expand=True) 695 | ) 696 | 697 | # Data is expanded into columns with integer names; add these columns to selected_cols so we can search them later 698 | selected_cols = [] 699 | for x in signalPD.columns: 700 | if isinstance(x, int): 701 | selected_cols.append(x) 702 | 703 | # FIXME improve with method used for other apps 704 | # Signal can store multiple values under entries such as Mobile Number: 705 | # So we break them all out into columns. 
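The break-out of nested entries described above can be sketched in isolation. The frame and values below are hypothetical, but the split/join and contains/split pattern mirrors the one used throughout this script:

```python
import pandas as pd

# Hypothetical nested "Entries" cell: newline-separated "label: value" pairs.
df = pd.DataFrame(
    {
        "Name": ["Alice"],
        "Entries": ["User ID-Username: alice01\nPhone-Mobile: 0400 000 000"],
    }
)

# Splitting on newlines yields working columns with integer names (0, 1, ...).
df = df.drop("Entries", axis=1).join(df["Entries"].str.split("\n", expand=True))

# Scan the integer columns and move matched values into named output columns.
for col in [c for c in df.columns if isinstance(c, int)]:
    mask = df[col].str.contains("User ID-Username", na=False)
    df.loc[mask, "User Name"] = df[col].str.split(":", n=1, expand=True)[1].str.strip()
```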
706 | def signalContact(signalPD): 707 | for x in selected_cols: 708 | # Locate Signal Username and move to Username Column 709 | signalPD.loc[ 710 | (signalPD[x].str.contains("User ID-Username:", na=False)), 711 | "User Name", 712 | ] = signalPD[x].str.split(":", n=1, expand=True)[1] 713 | # Delete Username entry from original location 714 | signalPD.loc[ 715 | signalPD[x].str.contains("User ID-Username:", na=False), [x] 716 | ] = "" 717 | # Keep only the text after the first colon 718 | signalPD[x] = signalPD[x].str.split(":", n=1, expand=True)[1].str.strip() 719 | 720 | signalContact(signalPD) 721 | 722 | signalPD[originIMEI] = phoneData.IMEI 723 | signalPD["inputFile"] = phoneData.inFile 724 | signalPD["Provenance"] = phoneData.inProvenance 725 | 726 | export_cols = [originIMEI, "Name", "User Name"] 727 | export_cols.extend(selected_cols) 728 | print("Located {} Signal contacts".format(len(signalPD["Name"]))) 729 | print("Exporting {}-SIGNAL.csv".format(phoneData.inFile)) 730 | logging.info("Exporting Signal messenger from {}".format(phoneData.inFile)) 731 | signalPD.to_csv( 732 | "{}-SIGNAL.csv".format(phoneData.inFile), index=False, columns=export_cols 733 | ) 734 | 735 | 736 | # ----------- Parse Signal Private Messenger-------------------------------------------------------- 737 | def processSignalPrivateMessenger(contactsPD): 738 | print("\nProcessing Signal Private Messenger") 739 | spmPD = contactsPD[contactsPD["Source"] == "Signal Private Messenger"].copy() 740 | spmPD = spmPD.drop("Entries", axis=1).join( 741 | spmPD["Entries"].str.split("\n", expand=True) 742 | ) 743 | # spmPD['Entries'].tolist() 744 | # spmPD.explode('Entries') 745 | # spmPD = spmPD.reset_index(drop=True) 746 | spmPD[originIMEI] = phoneData.IMEI 747 | spmPD["inputFile"] = phoneData.inFile 748 | spmPD["Provenance"] = phoneData.inProvenance 749 | 750 | selected_cols = [] 751 | for x in spmPD.columns: 752 | if isinstance(x, int): 753 | selected_cols.append(x) 754 | 755 | def spmContacts(spmPD): 756 
| for x in selected_cols: 757 | try: 758 | spmPD.loc[(spmPD[x].str.contains("Phone-:", na=False)), "Phone"] = ( 759 | spmPD[x].str.split(":", n=1, expand=True)[1] 760 | ) 761 | except: 762 | pass 763 | try: 764 | spmPD.loc[(spmPD[x].str.contains("User ID-:", na=False)), "User-ID"] = ( 765 | spmPD[x].str.split(":", n=1, expand=True)[1] 766 | ) 767 | except: 768 | pass 769 | try: 770 | spmPD.loc[ 771 | (spmPD[x].str.contains("User ID-Nickname:", na=False)), 772 | "User-ID-Nickname", 773 | ] = spmPD[x].str.split(":", n=1, expand=True)[1] 774 | except: 775 | pass 776 | try: 777 | spmPD.loc[ 778 | (spmPD[x].str.contains("User ID-Username:", na=False)), 779 | "User-ID-Username", 780 | ] = spmPD[x].str.split(":", n=1, expand=True)[1] 781 | except: 782 | pass 783 | try: 784 | spmPD.loc[ 785 | (spmPD[x].str.contains("User ID-ProfileKey:", na=False)), 786 | "User-ID-ProfileKey", 787 | ] = spmPD[x].str.split(":", n=1, expand=True)[1] 788 | except: 789 | pass 790 | 791 | spmContacts(spmPD) 792 | # spmPD.info() 793 | 794 | exportCols = [] 795 | # Remove column from previous step 796 | for x in spmPD.columns: 797 | if isinstance(x, str): 798 | # if x !="Provenence" or x != 'originIMEI' or x != 'inputFile': 799 | if x not in ["Provenance", "originIMEI", "inputFile"]: 800 | exportCols.append(x) 801 | 802 | exportCols.extend(["originIMEI", "inputFile", "Provenance"]) 803 | 804 | print("Located {} Signal Private Messenger contacts.".format(len(spmPD["Name"]))) 805 | print("Exporting {}-Signal-PM.csv".format(phoneData.inFile)) 806 | logging.info("Exporting Signal Private Messenger from {}".format(phoneData.inFile)) 807 | spmPD[exportCols].to_csv("{}-Signal-PM.csv".format(phoneData.inFile), index=False) 808 | 809 | 810 | # ----------- Parse Snapchat data ------------------------------------------------------------------ 811 | def processSnapChat(contactsPD): 812 | print("\nProcessing Snapchat") 813 | snapPD = contactsPD[contactsPD["Source"] == "Snapchat"] 814 | snapPD = 
snapPD[["Name", "Entries", "Source"]] 815 | 816 | # Extract nested entities 817 | snapPD = snapPD.drop("Entries", axis=1).join( 818 | snapPD["Entries"].str.split("\n", expand=True) 819 | ) 820 | selected_cols = [] 821 | for x in snapPD.columns: 822 | if isinstance(x, int): 823 | selected_cols.append(x) 824 | 825 | def snapContacts(snapPD): 826 | for x in selected_cols: 827 | snapPD.loc[ 828 | (snapPD[x].str.contains("User ID-Username", na=False)), "User Name" 829 | ] = snapPD[x].str.split(":", n=1, expand=True)[1] 830 | snapPD.loc[ 831 | (snapPD[x].str.contains("User ID-User ID", na=False)), "User ID" 832 | ] = snapPD[x].str.split(":", n=1, expand=True)[1] 833 | 834 | snapContacts(snapPD) 835 | 836 | snapPD[originIMEI] = phoneData.IMEI 837 | snapPD["inputFile"] = phoneData.inFile 838 | snapPD["Provenance"] = phoneData.inProvenance 839 | 840 | exportCols = [] 841 | for x in snapPD.columns: 842 | if isinstance(x, str): 843 | exportCols.append(x) 844 | if debug: 845 | print(snapPD[exportCols]) 846 | 847 | print("{} Snapchat contacts located.".format(len(snapPD))) 848 | print("Exporting {}-SNAPCHAT.csv".format(phoneData.inFile)) 849 | logging.info("Exporting Snapchat from {}".format(phoneData.inFile)) 850 | snapPD[exportCols].to_csv( 851 | "{}-SNAPCHAT.csv".format(phoneData.inFile), 852 | index=False, 853 | columns=[ 854 | originIMEI, 855 | "Name", 856 | "User Name", 857 | "User ID", 858 | "Source", 859 | "inputFile", 860 | "Provenance", 861 | ], 862 | ) 863 | 864 | 865 | # ---- Parse Telegram Contacts-------------------------------------------------------------- 866 | def processTelegram(contactsPD): 867 | print("\nProcessing Telegram") 868 | telegramPD = contactsPD[contactsPD["Source"] == "Telegram"].copy() 869 | telegramPD = telegramPD.drop("Entries", axis=1).join( 870 | telegramPD["Entries"].str.split("\n", expand=True) 871 | ) 872 | telegramPD = telegramPD.reset_index(drop=True) 873 | 874 | selected_cols = [] 875 | for x in telegramPD.columns: 876 | if 
isinstance(x, int): 877 | selected_cols.append(x) 878 | 879 | def phoneCheck(telegramPD): 880 | for x in selected_cols: 881 | telegramPD.loc[ 882 | (telegramPD[x].str.contains("Phone-", na=False)), "Phone-Number" 883 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1] 884 | 885 | telegramPD.loc[ 886 | (telegramPD[x].str.contains("User ID-Peer", na=False)), "Peer-ID" 887 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1] 888 | 889 | telegramPD.loc[ 890 | (telegramPD[x].str.contains("User ID-Username", na=False)), "User-Name" 891 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1] 892 | 893 | phoneCheck(telegramPD) 894 | 895 | telegramPD[originIMEI] = phoneData.IMEI 896 | telegramPD["inputFile"] = phoneData.inFile 897 | telegramPD["Provenance"] = phoneData.inProvenance 898 | telegramPD["Source"] = "Telegram" 899 | exportCols = [] 900 | for x in telegramPD.columns: 901 | if isinstance(x, str): 902 | exportCols.append(x) 903 | # Export CSV 904 | print("{} Telegram contacts located.".format(len(telegramPD))) 905 | print("Exporting {}-TELEGRAM.csv".format(phoneData.inFile)) 906 | logging.info("Exporting Telegram from {}".format(phoneData.inFile)) 907 | telegramPD[exportCols].to_csv( 908 | "{}-TELEGRAM.csv".format(phoneData.inFile), index=False 909 | ) 910 | 911 | 912 | # ------ Parse Threema Contacts ----------------------------------------------------------------- 913 | def processThreema(contactsPD): 914 | print("\nProcessing Threema") 915 | threemaPD = contactsPD[contactsPD["Source"] == "Threema"].copy() 916 | threemaPD = threemaPD.drop("Entries", axis=1).join( 917 | threemaPD["Entries"].str.split("\n", expand=True) 918 | ) 919 | threemaPD = threemaPD.reset_index(drop=True) 920 | 921 | selected_cols = [] 922 | for x in threemaPD.columns: 923 | if isinstance(x, int): 924 | selected_cols.append(x) 925 | 926 | def ThreemaParse(ThreemaPD): 927 | for x in selected_cols: 928 | try: 929 | ThreemaPD.loc[ 930 | (ThreemaPD[x].str.contains("User ID-identity:", 
na=False)), 931 | "Threema ID", 932 | ] = ThreemaPD[x].str.split(":", n=1, expand=True)[1] 933 | except: 934 | pass 935 | try: 936 | ThreemaPD.loc[ 937 | (ThreemaPD[x].str.contains("User ID-Username:", na=False)), 938 | "ThreemaUsername", 939 | ] = ThreemaPD[x].str.split(":", n=1, expand=True)[1] 940 | except: 941 | pass 942 | 943 | ThreemaParse(threemaPD) 944 | 945 | threemaPD[originIMEI] = phoneData.IMEI 946 | threemaPD["inputFile"] = phoneData.inFile 947 | threemaPD["Provenance"] = phoneData.inProvenance 948 | 949 | exportCols = [] 950 | for x in threemaPD.columns: 951 | if isinstance(x, str): 952 | exportCols.append(x) 953 | 954 | print("Exporting {}-THREEMA.csv".format(phoneData.inFile)) 955 | logging.info("Exporting Threema from {}".format(phoneData.inFile)) 956 | threemaPD[exportCols].to_csv("{}-THREEMA.csv".format(phoneData.inFile), index=False) 957 | 958 | 959 | ## Parse WeChat Contacts ------------------------------------------------------------------------ 960 | def processWeChat(contactsPD): 961 | print("\nProcessing WeChat") 962 | WeChatPD = contactsPD[contactsPD["Source"] == "WeChat"].copy() 963 | WeChatPD = WeChatPD.drop("Entries", axis=1).join( 964 | WeChatPD["Entries"].str.split("\n", expand=True) 965 | ) 966 | 967 | WeChatPD = WeChatPD.reset_index(drop=True) 968 | 969 | selected_cols = [] 970 | for x in WeChatPD.columns: 971 | if isinstance(x, int): 972 | selected_cols.append(x) 973 | 974 | def WeChatContacts(WeChatPD): 975 | 976 | for x in selected_cols: 977 | # FIXME Usernames that contain @stranger??? 
978 | # FIXME Try / Except / Pass 979 | 980 | try: 981 | WeChatPD.loc[ 982 | (WeChatPD[x].str.contains("User ID-WeChat ID:", na=False)), 983 | "WeChatID", 984 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1] 985 | except: 986 | pass 987 | 988 | try: 989 | WeChatPD.loc[ 990 | (WeChatPD[x].str.contains("User ID-QQ:", na=False)), "QQ User ID" 991 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1] 992 | except: 993 | pass 994 | 995 | try: 996 | WeChatPD.loc[ 997 | (WeChatPD[x].str.contains("User ID-Username:", na=False)), 998 | "Username", 999 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1] 1000 | except: 1001 | pass 1002 | 1003 | try: 1004 | WeChatPD.loc[ 1005 | (WeChatPD[x].str.contains("User ID-LinkedIn ID:", na=False)), 1006 | "LinkedIn ID", 1007 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1] 1008 | except: 1009 | pass 1010 | 1011 | try: 1012 | WeChatPD.loc[ 1013 | (WeChatPD[x].str.contains("User ID-Facebook ID:", na=False)), 1014 | "Facebook ID", 1015 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1] 1016 | except: 1017 | pass 1018 | 1019 | WeChatContacts(WeChatPD) 1020 | 1021 | # Replace WeChat IDs containing @stranger with blank values as these are not WeChat user IDs 1022 | try: 1023 | WeChatPD.WeChatID = WeChatPD.WeChatID.apply( 1024 | lambda x: "" if (r"@stranger") in str(x) else x 1025 | ) 1026 | except: 1027 | print("WeChat float exception") 1028 | print(WeChatPD.WeChatID) 1029 | pass 1030 | 1031 | WeChatPD[originIMEI] = phoneData.IMEI 1032 | WeChatPD["inputFile"] = phoneData.inFile 1033 | WeChatPD["Provenance"] = phoneData.inProvenance 1034 | WeChatPD["Source"] = "Weixin" 1035 | 1036 | # Export Columns where the title is a string to drop working columns 1037 | exportCols = [] 1038 | for x in WeChatPD.columns: 1039 | if isinstance(x, str): 1040 | exportCols.append(x) 1041 | print("Located {} WeChat contacts.".format(len(WeChatPD["WeChatID"]))) 1042 | print("Exporting {}-WECHAT.csv".format(phoneData.inFile)) 1043 | logging.info("Exporting 
WeChat from {}".format(phoneData.inFile)) 1044 | WeChatPD[exportCols].to_csv("{}-WECHAT.csv".format(phoneData.inFile), index=False) 1045 | 1046 | 1047 | # ---Parse Whatsapp Contacts---------------------------------------------------------------------- 1048 | # Load WhatsApp 1049 | def processWhatsapp(contactsPD): 1050 | print("\nProcessing WhatsApp") 1051 | whatsAppPD = contactsPD[contactsPD["Source"] == "WhatsApp"].copy() 1052 | try: 1053 | whatsAppPD = whatsAppPD[["Name", "Entries", "Source", "Interaction Statuses"]] 1054 | # Datatype needs to be object not float to allow filtering by string without throwing an error 1055 | whatsAppPD["Interaction Statuses"] = whatsAppPD["Interaction Statuses"].astype( 1056 | object 1057 | ) 1058 | # Shared contacts are not associated with a WhatsApp ID and cause problems. 1059 | if debug: print(whatsAppPD.dtypes) 1060 | whatsAppPD = whatsAppPD[ 1061 | whatsAppPD["Interaction Statuses"].str.contains("Shared", na=False) == False 1062 | ] 1063 | except Exception as e: 1064 | print(e) 1065 | print("Interaction statuses column not found, ignoring") 1066 | # print(whatsAppPD) 1067 | whatsAppPD = whatsAppPD[ 1068 | [ 1069 | "Name", 1070 | "Entries", 1071 | "Source", 1072 | ] 1073 | ] 1074 | 1075 | # Unpack nested data 1076 | whatsAppPD = whatsAppPD.drop("Entries", axis=1).join( 1077 | whatsAppPD["Entries"].str.split("\n", expand=True) 1078 | ) 1079 | 1080 | # Data is expanded into columns with integer names, check for these columns and add them to a 1081 | # list to allow for different width sheets. 1082 | colList = list(whatsAppPD) 1083 | selected_cols = [] 1084 | for x in colList: 1085 | if isinstance(x, int): 1086 | selected_cols.append(x) 1087 | 1088 | # Look for data across expanded columns and shift it to output columns. 
1089 | def whatsappContactProcess(whatsAppPD): 1090 | for x in selected_cols: 1091 | whatsAppPD.loc[ 1092 | (whatsAppPD[x].str.contains("Phone-Mobile", na=False)), "Phone-Mobile" 1093 | ] = ( 1094 | whatsAppPD[x] 1095 | .str.split(":", n=1, expand=True)[1] 1096 | .str.replace(" ", "") 1097 | .str.replace("-", "") 1098 | ) 1099 | 1100 | whatsAppPD.loc[ 1101 | (whatsAppPD[x].str.contains("Phone-:", na=False)), "Phone" 1102 | ] = ( 1103 | whatsAppPD[x] 1104 | .str.split(":", n=1, expand=True)[1] 1105 | .str.replace(" ", "") 1106 | .str.replace("-", "") 1107 | ) 1108 | 1109 | whatsAppPD.loc[ 1110 | (whatsAppPD[x].str.contains("Phone-Home:", na=False)), "Phone-Home" 1111 | ] = ( 1112 | whatsAppPD[x] 1113 | .str.split(":", n=1, expand=True)[1] 1114 | .str.replace(" ", "") 1115 | .str.replace("-", "") 1116 | ) 1117 | 1118 | whatsAppPD.loc[ 1119 | (whatsAppPD[x].str.contains("User ID-Push Name", na=False)), "Push-ID" 1120 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 1121 | 1122 | whatsAppPD.loc[ 1123 | (whatsAppPD[x].str.contains("User ID-Id", na=False)), "Id-ID" 1124 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 1125 | 1126 | whatsAppPD.loc[ 1127 | (whatsAppPD[x].str.contains("User ID-WhatsApp User Id", na=False)), 1128 | "WhatsApp-ID", 1129 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 1130 | 1131 | whatsAppPD.loc[ 1132 | (whatsAppPD[x].str.contains("Web address-Professional", na=False)), 1133 | "BusinessWebsite", 1134 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 1135 | 1136 | whatsAppPD.loc[ 1137 | (whatsAppPD[x].str.contains("Email-Professional", na=False)), 1138 | "Business-Email", 1139 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1] 1140 | 1141 | whatsappContactProcess(whatsAppPD) 1142 | 1143 | # Add IMEI Column 1144 | whatsAppPD[originIMEI] = phoneData.IMEI 1145 | whatsAppPD["inputFile"] = phoneData.inFile 1146 | whatsAppPD["Provenance"] = phoneData.inProvenance 1147 | whatsAppPD["Source"] = "Whatsapp" 1148 | 1149 | 
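The working-column removal that follows relies on the split step naming its leftover columns with integers, so keeping only string-named columns drops them. A minimal sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame mixing named output columns with the integer working
# columns left over from str.split(expand=True).
df = pd.DataFrame({"Name": ["A"], 0: ["x"], 1: ["y"], "Phone": ["123"]})

# Keeping only string-named columns drops the integer working columns.
exportCols = [c for c in df.columns if isinstance(c, str)]
```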
# Remove working columns. 1150 | exportCols = [] 1151 | for x in whatsAppPD.columns: 1152 | if isinstance(x, str): 1153 | exportCols.append(x) 1154 | if debug: 1155 | print(exportCols) 1156 | 1157 | # Export CSV 1158 | print("{} WhatsApp contacts located".format(len(whatsAppPD["Name"]))) 1159 | print("Exporting {}-WHATSAPP.csv".format(phoneData.inFile)) 1160 | logging.info("Exporting Whatsapp from {}".format(phoneData.inFile)) 1161 | whatsAppPD[exportCols].to_csv( 1162 | "{}-WHATSAPP.csv".format(phoneData.inFile), index=False 1163 | ) 1164 | 1165 | 1166 | # --- Parse Zalo Contacts -------------------------------------------------------------------- 1167 | def processZalo(contactsPD): 1168 | print("\nProcessing Zalo") 1169 | ZaloPD = contactsPD[contactsPD["Source"] == "Zalo"].copy() 1170 | ZaloPD = ZaloPD.drop("Entries", axis=1).join( 1171 | ZaloPD["Entries"].str.split("\n", expand=True) 1172 | ) 1173 | selected_cols = [] 1174 | for x in ZaloPD.columns: 1175 | if isinstance(x, int): 1176 | selected_cols.append(x) 1177 | 1178 | def processZaloContacts(ZaloPD): 1179 | for x in selected_cols: 1180 | ZaloPD.loc[ 1181 | (ZaloPD[x].str.contains("User ID-User Name:", na=False)), 1182 | "ZaloUserName", 1183 | ] = ZaloPD[x].str.split(":", n=1, expand=True)[1] 1184 | 1185 | ZaloPD.loc[ 1186 | (ZaloPD[x].str.contains("User ID-Id:", na=False)), 1187 | "ZaloUserID", 1188 | ] = ZaloPD[x].str.split(":", n=1, expand=True)[1] 1189 | 1190 | processZaloContacts(ZaloPD) 1191 | 1192 | ZaloPD[originIMEI] = phoneData.IMEI 1193 | ZaloPD["inputFile"] = phoneData.inFile 1194 | ZaloPD["Provenance"] = phoneData.inProvenance 1195 | 1196 | exportCols = [] 1197 | for x in ZaloPD.columns: 1198 | if isinstance(x, str): 1199 | exportCols.append(x) 1200 | 1201 | print("Exporting {}-ZALO.csv".format(phoneData.inFile)) 1202 | logging.info("Exporting Zalo from {}".format(phoneData.inFile)) 1203 | ZaloPD[exportCols].to_csv("{}-ZALO.csv".format(phoneData.inFile), index=False) 1204 | 1205 | 1206 | # ------- 
Argument parser for command line arguments ----------------------------------------- 1207 | 1208 | if __name__ == "__main__": 1209 | parser = argparse.ArgumentParser( 1210 | description=__description__, 1211 | epilog="Developed by {} {}".format(str(__author__), str(__version__)), 1212 | ) 1213 | 1214 | parser.add_argument( 1215 | "-f", 1216 | "--f", 1217 | dest="inputFilename", 1218 | help="Path to Excel Spreadsheet", 1219 | required=False, 1220 | ) 1221 | 1222 | parser.add_argument( 1223 | "-p", 1224 | "--p", 1225 | dest="inputProvenance", 1226 | choices=provenanceCols, 1227 | required=False, 1228 | ) 1229 | 1230 | parser.add_argument( 1231 | "-b", 1232 | "--bulk", 1233 | dest="bulk", 1234 | required=False, 1235 | action="store_true", 1236 | help="Bulk process Excel spreadsheets in working directory.", 1237 | ) 1238 | 1239 | args = parser.parse_args() 1240 | 1241 | if len(sys.argv) == 1: 1242 | parser.print_help() 1243 | parser.exit() 1244 | 1245 | if args.bulk: 1246 | print("Bulk Process") 1247 | bulkProcessor(args.inputProvenance) 1248 | 1249 | if args.inputFilename: 1250 | if not os.path.exists(args.inputFilename): 1251 | print( 1252 | "Error: '{}' does not exist or is not a file.".format( 1253 | args.inputFilename 1254 | ) 1255 | ) 1256 | sys.exit(1) 1257 | processMetadata(args.inputFilename, args.inputProvenance) 1258 | --------------------------------------------------------------------------------
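As a worked example of the ausNormal rewrite applied by several parsers above, here is the same regex on hypothetical numbers (an international-format AU mobile is localised, a landline passes through unchanged):

```python
import pandas as pd

# Hypothetical numbers: one international-format AU mobile, one landline.
numbers = pd.Series(["+61412345678", "0298765432"])

# The same pattern the script applies when ausNormal is enabled.
normalised = numbers.str.replace(r"\+614", "04", regex=True)
```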