├── .gitignore
├── clbExtract.log
├── offlineTranslate
│   ├── images
│   │   └── offlineTranslate.jpg
│   ├── __pycache__
│   │   └── bulk_translate_v3.cpython-310.pyc
│   ├── requirements.txt
│   ├── readme.md
│   ├── translateGUI.py
│   ├── old
│   │   └── bulk_translate.py
│   └── bulk_translate_v3.py
├── clbExtract
│   ├── requirements.txt
│   ├── readme.md
│   ├── clbExtractGUI.py
│   ├── old
│   │   └── clbExtract.py
│   └── clbExtract.py
├── README.md
├── locationExtract
│   └── locationExtract.py
└── applenotes2hash
    └── applenotes2hash.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .venv
2 | .gitignore
3 | .vscode/settings.json
--------------------------------------------------------------------------------
/clbExtract.log:
--------------------------------------------------------------------------------
1 | 2023-07-25 21:05:49,626,- INFO - Bulk processing 0 files
2 |
--------------------------------------------------------------------------------
/offlineTranslate/images/offlineTranslate.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facelessg00n/pythonForensics/HEAD/offlineTranslate/images/offlineTranslate.jpg
--------------------------------------------------------------------------------
/offlineTranslate/__pycache__/bulk_translate_v3.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facelessg00n/pythonForensics/HEAD/offlineTranslate/__pycache__/bulk_translate_v3.cpython-310.pyc
--------------------------------------------------------------------------------
/offlineTranslate/requirements.txt:
--------------------------------------------------------------------------------
1 | certifi==2023.11.17
2 | charset-normalizer==3.3.2
3 | et-xmlfile==1.1.0
4 | idna==3.4
5 | numpy==1.26.2
6 | openpyxl==3.1.2
7 | pandas==2.1.3
8 | python-dateutil==2.8.2
9 | pytz==2023.3.post1
10 | requests==2.31.0
11 | six==1.16.0
12 | tk==0.1.0
13 | tqdm==4.66.4
14 | tzdata==2023.3
15 | urllib3==2.1.0
16 | XlsxWriter==3.1.9
17 |
--------------------------------------------------------------------------------
/clbExtract/requirements.txt:
--------------------------------------------------------------------------------
1 | altgraph==0.17.3
2 | black==23.1.0
3 | click==8.1.3
4 | future==0.18.2
5 | macholib==1.16.2
6 | mypy-extensions==0.4.3
7 | numpy==1.23.3
8 | packaging==23.0
9 | pandas==1.5.0
10 | pathspec==0.11.0
11 | pefile==2022.5.30
12 | platformdirs==2.6.2
13 | pyinstaller==5.6.2
14 | pyinstaller-hooks-contrib==2022.13
15 | python-dateutil==2.8.2
16 | pytz==2022.4
17 | pywin32-ctypes==0.2.0
18 | scapy==2.4.5
19 | simplekml==1.3.6
20 | six==1.16.0
21 | tk==0.1.0
22 | tomli==2.0.1
23 | typing_extensions==4.4.0
24 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pythonForensics
2 |
3 | Collection of handy Python scripts
4 |
5 | ### applenotes2hash.py
6 |
7 | Extracts hashes from an Apple Notes NoteStore.sqlite database, or from a Grayshift (GrayKey) extract, for cracking. Exports in Hashcat or John the Ripper format.
8 |
9 | ### clbExtract.py
10 |
11 | Extracts contact details from Cellebrite-formatted Excel files.
12 |
13 | ### locationExtract.py
14 |
15 | Extracts location data from Cellebrite Excel files and converts it to an ESRI-friendly format. Can also flag gaps in recording longer than a specified time.
16 |
17 | ### offlineTranslate
18 |
19 | Utilises LibreTranslate for the bulk offline translation of messages
20 |
--------------------------------------------------------------------------------
/clbExtract/readme.md:
--------------------------------------------------------------------------------
1 | # Cellebrite Contact Extractor
2 |
3 | ---
4 | Extracts contacts from Cellebrite-formatted Excel files. The data in these files is nested within the columns of the Excel file, which can cause issues when analysing them with third-party tools.
5 | 
6 | This tool exports contacts on a per-app basis into flat .CSV files for use with third-party analysis tools. It was built to handle Excel files, as this is typically what analysts will receive unless they have been given a 'reader' file.
7 |
8 | ## Usage
9 |
10 | This folder contains two Python scripts. One is an optional GUI, useful if you wish to build the tool into a portable .exe file.
11 |
12 | These instructions assume you are using *VS Code* and have a Python environment set up.
13 |
14 | Download the contents of this folder and open the folder in VS code.
15 |
16 | Create and activate a virtual environment.
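A minimal setup, assuming a recent Python with the built-in `venv` module (the `.venv` folder name matches this repo's `.gitignore`):

```shell
# Create a virtual environment (use `python` instead of `python3` on Windows)
python3 -m venv .venv

# Activate it:
#   Windows PowerShell:   .\.venv\Scripts\Activate.ps1
#   macOS / Linux:        source .venv/bin/activate
```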
17 |
18 |
19 |
20 | Install the required packages; these include the tools needed to turn the script into a portable exe.
21 |
22 | `pip install -r .\requirements.txt`
23 |
24 | The standalone script may then be run from the command line.
25 |
26 | options:
27 |
28 | - `-h` show this help message and exit
29 | - `-f` path to the input file
30 | - `-b` process all files in the working directory
31 | - `-p` add data provenance from one of the pre-approved items
32 |
33 | Place the Excel files in the folder where the script is located to process the files in bulk.
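As a sketch of what bulk mode does, the file selection amounts to scanning the working directory for `.xlsx` files (the function name is my own; the real script then parses each located file):

```python
import os

def find_excel_files(folder="."):
    # Bulk mode processes every Excel file sitting next to the script
    return sorted(f for f in os.listdir(folder) if f.endswith(".xlsx"))
```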
34 |
35 | ## Building the exe
36 |
37 | A portable exe can be built using PyInstaller.
38 |
39 | The exe must be built on the same OS it is intended to run on, or it will not work. For example, if you intend to run it on Windows machines, it must be built on a Windows machine.
40 |
41 | The resulting exe will be located in the /dist folder of the working directory after it has been built.
42 |
43 | ### **With GUI**
44 |
45 | `pyinstaller --onefile .\clbExtractGUI.py`
46 |
47 | ### **Without GUI**
48 |
49 | `pyinstaller --onefile .\clbExtract.py`
50 |
51 | ---
52 |
53 | ## Current known issues
54 |
55 | - The native contacts parser does not currently export email addresses.
56 | - Depending on which version of Cellebrite was used, or which type of extraction was performed, some social media user IDs may not be available in the Excel files.
57 |
58 | ## Network Analysis tools
59 |
60 | ----
61 | **Constellation**
62 |
63 |
64 |
65 | **Maltego**
66 |
67 |
68 |
--------------------------------------------------------------------------------
/offlineTranslate/readme.md:
--------------------------------------------------------------------------------
1 | # Offline Translation
2 |
3 | ---
4 |
5 | Many forensic tools have inbuilt translation offerings; however, in my experience they can be slow or unreliable. As an offline translation option is often required, I began to seek other means of translation. Enter LibreTranslate, a self-hosted machine translation API.
6 |
7 |
8 |
9 | ## Installation
10 |
11 | Installation options will depend on your environment; however, to test the proof of concept, LibreTranslate can be installed with the following command on an internet-connected machine.
12 |
13 | `pip install libretranslate`
14 |
15 | The server can then be started on localhost with the following command. On first run it will pull down the language packages. The machine can then be taken offline.
16 |
17 | `libretranslate`
18 | ---
19 |
20 | ## Modification
21 |
22 | You may need to change the `serverURL = "http://localhost:5000"` value to match where your LibreTranslate instance is hosted.
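Under the hood the script posts each message to the server's `/translate` endpoint. A minimal sketch of such a call (endpoint and request fields per LibreTranslate's public API; the function name and error handling are my own):

```python
import requests

SERVER_URL = "http://localhost:5000"  # change to match your instance

def translate_text(text, source="auto", target="en", server=SERVER_URL):
    # LibreTranslate expects a JSON body with q/source/target/format fields
    resp = requests.post(
        f"{server}/translate",
        json={"q": text, "source": source, "target": target, "format": "text"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["translatedText"]
```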
23 |
24 | ## Script usage
25 |
26 | The Python script loads the specified Excel file and looks for a column named 'Messages', as per Magnet AXIOM-formatted Excel sheets. At this time it will only handle Excel documents with a single sheet.
27 | 
28 | In practice it will accept any Excel spreadsheet with a column named 'Messages'.
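The loading step boils down to something like the following (illustrative in-memory data standing in for `pd.read_excel`; the real script also handles sheet selection and Cellebrite's 'Body' column):

```python
import pandas as pd

# Stand-in for pd.read_excel("export.xlsx") on an AXIOM-style sheet
df = pd.DataFrame({"Messages": ["hola", "bonjour"], "Sender": ["a", "b"]})

if "Messages" not in df.columns:
    raise ValueError("No 'Messages' column found in the spreadsheet")

messages = df["Messages"].dropna().astype(str).tolist()
```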
29 |
30 | - Auto-detection of language is much faster but not as accurate. If you know the language, it is best to select one of the language codes manually. To retrieve the language codes run `python3 bulk_translate_v3.py -g` and the available languages from the server will be listed.
31 | - The generated CSV files may not open in Microsoft Excel but will open in LibreOffice Calc. The script will, however, also attempt to output Excel files.
32 | - Defaults to English translation, but other languages are possible.
33 |
34 | Example usage
35 | 
36 | Auto detect:
37 | `python3 bulk_translate_v3.py -f excel.xlsx`
38 | 
39 | Manually select language:
40 | `python3 bulk_translate_v3.py -f excel.xlsx -l zh`
41 |
42 | 
43 |
44 | ## Other usage
45 |
46 | options:
47 |
48 | -h, --help show this help message and exit
49 |
50 | -f INPUTFILEPATH, --file INPUTFILEPATH
51 | Path to Excel File
52 |
53 | -s TRANSLATIONSERVER, --server TRANSLATIONSERVER
54 | Address of translation server if not localhost or hardcoded
55 |
56 | -l {}, --language {} Language code for input text - optional but can greatly improve accuracy
57 |
58 | -e {Chats,Instant Messages}, --excelSheet {Chats,Instant Messages}
59 | Sheet name within Excel file to be translated
60 |
61 | -c, --isCellebrite If file originated from Cellebrite, header starts at 1, and message column is called 'Body'
62 |
63 | -g, --getlangs Get supported language codes and names from server
64 |
65 | ## Building the exe
66 |
67 | A portable exe can be built using PyInstaller.
68 |
69 | The exe must be built on the same OS it is intended to run on, or it will not work. For example, if you intend to run it on Windows machines, it must be built on a Windows machine.
70 |
71 | The resulting exe will be located in the /dist folder of the working directory after it has been built.
72 |
73 | ### **With GUI**
74 |
75 | `pyinstaller --onefile .\translateGUI.py`
76 |
77 | ### **Without GUI**
78 |
79 | `pyinstaller --onefile .\bulk_translate_v3.py`
80 |
--------------------------------------------------------------------------------
/locationExtract/locationExtract.py:
--------------------------------------------------------------------------------
1 | """
2 | Extracts location data from a Cellebrite PA report and converts it to an ESRI friendly time format.
3 |
4 | Data is Extracted from the Timeline tab of the Excel report.
5 |
6 | Also has a feature to look for gaps in recording.
7 |
8 | """
9 | import argparse
10 | import logging
11 | import pandas as pd
12 | import os
13 | import sys
14 | from datetime import datetime, timedelta
15 |
16 | # file = "input.xlsx"
17 |
18 | # Details
19 | __description__ = "Converts Cellebrite PA Garmin extracts to an ESRI compatible CSV.\n Loads data from the Timeline tab of the Excel Export"
20 | __author__ = "facelessg00n"
21 | __version__ = "0.1"
22 |
23 | # Options
24 | debug = False
25 | findGaps = False
26 | dateAfter = None
27 | localConvert = True
28 |
29 | localHour = 0.0
30 | localMinute = 0.0
31 |
32 | # ----------------------Config the logger------------------------------------
33 | logging.basicConfig(
34 | filename="log.txt",
35 | format="%(levelname)s:%(asctime)s:%(message)s",
36 | level=logging.DEBUG,
37 | )
38 | # ---------------Functions --------------------------------------------------
39 |
40 | # Setup function to load spreadsheet and columns of interest
41 | def convertFile(inputFilename, dateAfter=None, gapFinder=None):
42 | if dateAfter is not None:
43 | print("Looking for dates after = " + str(dateAfter))
44 | dateCut = True
45 | else:
46 | dateCut = False
47 |
48 | print("Loading Excel file {}".format(inputFilename))
49 | logging.info("Loading Excel data from %s" % (str(inputFilename)))
50 |
51 | try:
52 | df = pd.read_excel(inputFilename, sheet_name="Timeline", header=1)
53 | df = df[["#", "Time", "Latitude", "Longitude"]]
54 | except Exception as e:
55 | print(e)
56 | exit()
57 |
58 | # Convert time format
59 | print("Converting Time Format")
60 | new = df["Time"].str.split("(", n=1, expand=True)
61 | df["DateTime"] = new[0]
62 | df["DateTime"] = pd.to_datetime(
63 | df["DateTime"], errors="raise", utc=True, format="%d/%m/%Y %I:%M:%S %p"
64 | )
65 | if debug == True:
66 | print(df.info())
67 |
68 | # Filter only data after this date
69 | if dateCut:
70 | try:
71 | df = df[(df["DateTime"] > dateAfter)]
72 | # "2020-02-01"
73 | except TypeError:
74 | print(
75 |                 "A TypeError was raised; the input date format is likely incorrect. This step will be skipped."
76 | )
77 | dateCut = False
78 | pass
79 |
80 | if localConvert:
81 | df["Local"] = df["DateTime"] + pd.Timedelta(
82 | hours=localHour, minutes=localMinute
83 | )
84 |
85 | # Find and report gaps in data recording.
86 | if gapFinder is not None:
87 | print(
88 | "Gap finder is looking for gaps of more than %s seconds." % (str(gapFinder))
89 | )
90 | gapData = True
91 | else:
92 | print("Gap finder is not looking for gaps in time")
93 | gapData = False
94 |
95 | if gapData:
96 | print("\nFinding gaps in time")
97 |         df["GapFinder"] = df["DateTime"].diff().dt.total_seconds() > gapFinder  # .dt.seconds would ignore whole days within a gap
98 | time_diff = df[df["GapFinder"] == True]
99 | print(str(time_diff.shape[0]) + " gaps in recording located.")
100 | gapData = True
101 | if debug:
102 | print(time_diff)
103 |
104 | # Export dataframes to CSV
105 | print("\nExporting CSV's")
106 | df.to_csv("locationData.csv", index=False, date_format="%Y/%m/%d %H:%M:%S")
107 | if gapData:
108 | time_diff.to_csv("gapData.csv", index=False, date_format="%Y/%m/%d %H:%M:%S")
109 |
110 |
111 | # Command line input args
112 | if __name__ == "__main__":
113 | parser = argparse.ArgumentParser(
114 | description=__description__,
115 |         epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
116 | )
117 |
118 | parser.add_argument(
119 | "-f",
120 | "--file",
121 | dest="inputFilename",
122 | help="Path to input Excel Spreadsheet",
123 | # required=True,
124 | )
125 |
126 | parser.add_argument(
127 | "-g",
128 | "--gap",
129 | dest="gapSeconds",
130 | type=int,
131 | help="To detect gaps in time enter a time gap in seconds. 300 seconds is 5 minutes",
132 | default=None,
133 | required=False,
134 | )
135 |
136 | parser.add_argument(
137 | "-d",
138 | "--dateafter",
139 | dest="dateAfter",
140 |         type=str,  # a YYYY-MM-DD date string; int would reject the documented format
141 | help="Filter only data after a certain date. Required format is YYYY-MM-DD. Useful for shrinking your dataset",
142 | required=False,
143 | )
144 |
145 | args = parser.parse_args()
146 |
147 | # display help message when no args are passed.
148 | if len(sys.argv) == 1:
149 |
150 | parser.print_help()
151 | sys.exit(1)
152 |
153 | # If no input show the help text.
154 | if not args.inputFilename:
155 | parser.print_help()
156 | parser.exit(1)
157 |
158 | # Check if the input file exists.
159 | if not os.path.exists(args.inputFilename):
160 | print("ERROR: '{}' does not exist or is not a file".format(args.inputFilename))
161 | sys.exit(1)
162 |
163 |     if args.dateAfter is not None:
164 |         dateAfter = args.dateAfter
165 |         print("Filtering to dates after {}".format(args.dateAfter))
166 |     else:
167 |         dateAfter = None
168 |
169 | if args.gapSeconds is not None:
170 | gapSeconds = args.gapSeconds
171 | if debug:
172 | print("GapSeconds Not none")
173 | else:
174 | gapSeconds = None
175 |
176 | convertFile(args.inputFilename, gapFinder=gapSeconds, dateAfter=dateAfter)
177 |
--------------------------------------------------------------------------------
/applenotes2hash/applenotes2hash.py:
--------------------------------------------------------------------------------
1 | # Extracts password protected hashes from Apple Notes
2 | #
3 | # Elements from script from Dhiru Kholia
4 | # https://github.com/openwall/john/blob/bleeding-jumbo/run/applenotes2john.py
5 | #
6 | # Formatted with Black
7 | #
8 | # ------Changes----------
9 | #
10 | # V0.1 - Initial release
11 | #
12 |
13 | import argparse
14 | import binascii
15 | import glob
16 | import os
17 | import sys
18 | import sqlite3
19 | import shutil
20 | import zipfile
21 |
22 | PY3 = sys.version_info[0] == 3
23 |
24 | if not PY3:
25 | reload(sys)
26 | sys.setdefaultencoding("utf8")
27 |
28 | __description__ = "Extracts and converts Apple Note hashes to Hashcat and JTR format"
29 | __author__ = "facelessg00n"
30 | __version__ = "0.1"
31 |
32 | formatType = []
33 | notesFile = "NoteStore.sqlite"
34 | targetPath = os.getcwd() + "/temp"
35 | debug = False
36 |
37 | # ------------- Functions live here -----------------------------------
38 |
39 |
40 | def makeTempFolder():
41 | try:
42 | # print("Creating temporary folder")
43 | os.makedirs(targetPath)
44 | except OSError as e:
45 | # print(e)
46 | # print("Temporary folder exists")
47 | # print("Purging directory")
48 | shutil.rmtree(targetPath)
49 | try:
50 | # print("Creating temporary folder")
51 | os.makedirs(targetPath)
52 | except:
53 | # print("Something has gone horribly wrong")
54 | exit()
55 |
56 |
57 | # Check it is a zip file and extract relevant file
58 | def checkZip(z):
59 | if zipfile.is_zipfile(z):
60 | # print("This is a Zip File")
61 | with zipfile.ZipFile(z) as file:
62 | zippedFiles = file.namelist()
63 | filePath = [x for x in zippedFiles if x.endswith(notesFile)]
64 | if debug:
65 | print("Located file at path : {}".format(filePath))
66 | print("Extracting to temp file")
67 | file.extract(filePath[0], targetPath)
68 |
69 | else:
70 |         print("This does not appear to be a zip file")
71 |
72 |
73 | def processGrayShift(x, formatType):
74 | formatType = formatType
75 | try:
76 | makeTempFolder()
77 | except Exception as e:
78 | print(e)
79 | checkZip(x)
80 | inputFile = glob.glob("./**/NoteStore.sqlite", recursive=True)
81 | if debug:
82 | print("Using" + str(inputFile[0]) + " as the input file for Cache.")
83 | extractHash(inputFile[0], formatType)
84 |
85 |
86 | # Functionality below lifted from
87 | # https://github.com/openwall/john/blob/bleeding-jumbo/run/applenotes2john.py
88 |
89 |
90 | def extractHash(inputFile, formatType):
91 | db = sqlite3.connect(inputFile)
92 | cursor = db.cursor()
93 | rows = cursor.execute(
94 | "SELECT Z_PK, ZCRYPTOITERATIONCOUNT, ZCRYPTOSALT, ZCRYPTOWRAPPEDKEY, ZPASSWORDHINT, ZCRYPTOVERIFIER, ZISPASSWORDPROTECTED FROM ZICCLOUDSYNCINGOBJECT"
95 | )
96 | for row in rows:
97 | iden, iterations, salt, fhash, hint, shash, is_protected = row
98 | if fhash is None:
99 | phash = shash
100 | else:
101 | phash = fhash
102 | if hint is None:
103 | hint = "None"
104 | # NOTE: is_protected can be zero even if iterations value is non-zero!
105 | # This was tested on macOS 10.13.2 with cloud syncing turned off.
106 | if iterations == 0: # is this a safer check than checking is_protected?
107 | continue
108 | if phash is None:
109 | continue
110 | phash = binascii.hexlify(phash)
111 | salt = binascii.hexlify(salt)
112 | if PY3:
113 | phash = str(phash, "ascii")
114 | salt = str(salt, "ascii")
115 | fname = os.path.basename(inputFile)
116 | # For John
117 | if formatType == "JOHN":
118 | sys.stdout.write(
119 | "%s:$ASN$*%d*%d*%s*%s:::::%s\n"
120 | % (fname, iden, iterations, salt, phash, hint)
121 | )
122 | # For Hashcat
123 | elif formatType == "HASHCAT":
124 | sys.stdout.write("$ASN$*%d*%d*%s*%s\n" % (iden, iterations, salt, phash))
125 |
126 | else:
127 | print("Invalid or no format type set")
128 |     db.close()
129 |
130 |
131 | # ----------- Argument Parser ---------------------------------------------
132 |
133 | if __name__ == "__main__":
134 | parser = argparse.ArgumentParser(
135 | description=__description__,
136 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
137 | )
138 |
139 | parser.add_argument(
140 | "-f", "--file", dest="notesFile", help="Path to NoteStore.sqlite"
141 | )
142 | parser.add_argument(
143 | "-g", "--grayshift", dest="grayshiftINPUT", help="Path to Grayshift Extract"
144 | )
145 | parser.add_argument(
146 | "-t",
147 | "--type",
148 | dest="formatType",
149 | help="Output format type, JOHN or HASHCAT, defaults to JOHN. Hashcat Mode is 16200",
150 | choices=["HASHCAT", "JOHN"],
151 | default="JOHN",
152 | required=False,
153 | )
154 |
155 | args = parser.parse_args()
156 | if len(sys.argv) == 1:
157 | parser.print_help()
158 | sys.exit(1)
159 |
160 | if args.notesFile:
161 | if not os.path.exists(args.notesFile):
162 | print("ERROR: {} does not exist or is not a file".format(args.notesFile))
163 | sys.exit(1)
164 |         extractHash(args.notesFile, args.formatType)  # use the supplied path, not the module-level default
165 |
166 | if args.grayshiftINPUT:
167 | if not os.path.exists(args.grayshiftINPUT):
168 | print("ERROR: {} does not exist or is not a file".format(args.grayshiftINPUT))
169 | sys.exit(1)
170 | processGrayShift(args.grayshiftINPUT, args.formatType)
171 |
--------------------------------------------------------------------------------
/clbExtract/clbExtractGUI.py:
--------------------------------------------------------------------------------
1 | ### GUI for Cellebrite File Flattener
2 | # Only tested with Windows
3 | # Known display issues with OSX
4 |
5 |
6 | # Changelog
7 | # v0.2 - Minor changes, added provenance selector
8 | # v0.1 - Initial concept
9 |
10 |
11 | ### GUI for Cellebrite File Flattener
12 |
13 | import clbExtract
14 |
15 | import os
16 | from tkinter import *
17 | from tkinter import ttk
18 | from tkinter import messagebox
19 | from tkinter import filedialog as fd
20 | from tkinter.messagebox import showinfo
21 |
22 | LIGHT_GREY = "#BEBFC7"
23 | LIGHT_BLUE = "#307FE2"
24 | DARK_BLUE = "#024DA1"
25 | FONT_1 = "Roboto Condensed"
26 |
27 | # Auto locate list of files
28 | candidateFiles = os.listdir(os.getcwd())
29 | file_list = []
30 | for candidateFile in candidateFiles:
31 |     if candidateFile.endswith(".xlsx"):
32 |         file_list.append(candidateFile)
33 | fileListingDisplay = "\n".join(file_list)
34 |
35 | # list of handled apps
36 | supportedAppsDisp = "\n".join(clbExtract.parsedApps)
37 |
38 | ## _____Functions live here_____
39 |
40 |
41 | # TODO - Will need to pass in Provenance data here
42 | def process_all():
43 | print("Process all selected")
44 | clbExtract.bulkProcessor(provMenu.get())
45 |
46 |
47 | def select_file():
48 | filetypes = [("Excel Files", "*.xlsx")]
49 |
50 | filename = fd.askopenfile(
51 | title="Open a file",
52 | initialdir=os.listdir(os.getcwd()),
53 | filetypes=filetypes,
54 | multiple=False,
55 | )
56 | if filename:
57 | print(filename.name)
58 | showinfo(
59 | title="Selected File",
60 | message=filename.name,
61 | )
62 | print(provMenu.get())
63 | clbExtract.processMetadata(filename.name, provMenu.get())
64 |
65 |
66 | # Process selected file
67 | def get_selection():
68 | selected_file = lbox.curselection()
69 | print(lbox.get(selected_file))
70 | print(provMenu.get())
71 | clbExtract.processMetadata(lbox.get(selected_file), provMenu.get())
72 |
73 |
74 | def comboSelection(event):
75 | selectedProvenance = provMenu.get()
76 | # messagebox.showinfo(message=f"The Selected value is {selectedProvenance}",title='Selection')
77 |
78 |
79 | ### _____Create interface_____
80 | root = Tk()
81 | root.geometry("580x650")
82 | root.minsize(458, 580)
83 | root.maxsize(780, 780)
84 | root.configure(bg=LIGHT_GREY)
85 |
86 | prog_name = Label(
87 | text="Cellebrite Contact Extractor",
88 | anchor=W,
89 | padx=10,
90 | pady=10,
91 | background=DARK_BLUE,
92 | width=480,
93 | font=(FONT_1, 20),
94 | )
95 | prog_name.pack()
96 |
97 | sideFrame = Frame(master=root, width=100, height=100, bg=LIGHT_BLUE)
98 | sideFrame.pack(fill=Y, side=LEFT)
99 | sideFrame.pack()
100 |
101 | prog_data = Label(
102 |     text="For bulk processing of files place this program in the folder\n containing your Cellebrite formatted Excel files.",
103 | font=(FONT_1, 10),
104 | anchor=W,
105 | padx=10,
106 | pady=10,
107 | bg=LIGHT_GREY,
108 | )
109 | prog_data.pack()
110 |
111 | app_data_heading = Label(
112 | sideFrame, text="Handled apps:", bg=LIGHT_BLUE, font=(FONT_1, 10)
113 | )
114 | app_data_heading.pack()
115 | app_data = Label(sideFrame, text=supportedAppsDisp, bg=LIGHT_BLUE, font=(FONT_1, 10))
116 | app_data.pack()
117 |
118 | ## Show Auto Located files
119 | auto_locate_data = Label(
120 | text="{} candidate files located at path: \n{}".format(
121 | str(len(file_list)), str(os.getcwd())
122 | ),
123 | anchor=W,
124 | padx=10,
125 | pady=10,
126 | bg=LIGHT_GREY,
127 | )
128 | auto_locate_data.pack(pady=10, padx=10)
129 | ### Show options for data provenance.
130 | provLabel = Label(
131 |     text="Select provenance, e.g. WARRANT",
132 | padx=10,
133 | pady=10,
134 | bg=LIGHT_GREY,
135 | )
136 | provLabel.pack()
137 |
138 | provVar = StringVar()
139 | provMenu = ttk.Combobox(
140 | values=clbExtract.provenanceCols, textvariable=provVar, state="readonly"
141 | )
142 | provMenu.bind("<<ComboboxSelected>>", comboSelection)
143 | provMenu.pack(side="top")
144 |
145 | filesLabel = Label(
146 | text="Select File",
147 | padx=10,
148 | pady=10,
149 | bg=LIGHT_GREY,
150 | )
151 | filesLabel.pack()
152 |
153 | # Select file names
154 | fNames = StringVar(value=fileListingDisplay)
155 | lbox = Listbox(root, listvariable=fNames, height=5, width=200)
156 | scroll_bar = Scrollbar(root)
157 | scroll_bar.pack(side=RIGHT, fill=Y)
158 | lbox.pack()
159 | scroll_bar.config(command=lbox.yview)
160 |
161 |
162 | ### Buttons for processing selected files
163 |
164 | btn2 = Button(
165 | root,
166 | text="Process Selected",
167 | command=get_selection,
168 | bg=LIGHT_GREY,
169 | padx=10,
170 | )
171 | btn2.pack(side="top")
172 |
173 | btn3 = Button(root, text="Process all files", command=process_all, bg=LIGHT_GREY)
174 | btn3.pack(side="top")
175 |
176 |
177 | prog_data = Label(
178 |     text="Manually select a file to extract \n Output files will be located at: \n {}".format(
179 | str(os.getcwd())
180 | ),
181 | anchor=W,
182 | padx=10,
183 | pady=10,
184 | font=(FONT_1, 10),
185 | bg=LIGHT_GREY,
186 | )
187 | prog_data.pack()
188 |
189 | btn = Button(root, text="Locate file", command=select_file, bg=LIGHT_GREY)
190 | btn.pack(side=TOP, pady=10, padx=10)
191 |
192 | # Exit Program
193 | exitBtn = Button(root, text="Exit", command=root.destroy, bg=LIGHT_GREY)
194 | exitBtn.pack(side=TOP, pady=20, padx=10)
195 |
196 | # Display version info
197 | verLabel = Label(
198 |     text="Version {}\nDeveloped by facelessg00n".format(str(clbExtract.__version__)),
199 | padx=10,
200 | pady=10,
201 | bg=LIGHT_GREY,
202 | )
203 | verLabel.pack()
204 |
205 |
206 | root.mainloop()
207 |
--------------------------------------------------------------------------------
/offlineTranslate/translateGUI.py:
--------------------------------------------------------------------------------
1 | ### GUI for Offline Translation
2 | # Only tested with Windows
3 | # Known display issues with OSX
4 |
5 | # Changelog
6 | # v0.2 - Update function names and handle Cellebrite formatted files.
7 | # - Language selection menu
8 | # v0.1 - Initial concept
9 |
10 | import bulk_translate_v3
11 |
12 | import os
13 | from tkinter import *
14 | from tkinter import ttk
15 | from tkinter import messagebox
16 | from tkinter import filedialog as fd
17 | from tkinter.messagebox import showinfo
18 |
19 | LIGHT_GREY = "#BEBFC7"
20 | LIGHT_BLUE = "#307FE2"
21 | DARK_BLUE = "#024DA1"
22 | DARK_RED = "#FF5342"
23 | FONT_1 = "Roboto Condensed"
24 |
25 | isCellebrite = False
26 | SERVER_CONNECTED = False
27 |
28 |
29 | inputLanguages = [
30 | "auto",
31 | "en",
32 | "sq",
33 | "ar",
34 | "az",
35 | "bn",
36 | "bg",
37 | "ca",
38 | "zh",
39 | "zt",
40 | "cs",
41 | "da",
42 | "nl",
43 | "eo",
44 | "et",
45 | "fi",
46 | "fr",
47 | "de",
48 | "el",
49 | "he",
50 | "hi",
51 | "hu",
52 | "id",
53 | "ga",
54 | "it",
55 | "ja",
56 | "ko",
57 | "lv",
58 | "lt",
59 | "ms",
60 | "nb",
61 | "fa",
62 | "pl",
63 | "pt",
64 | "ro",
65 | "ru",
66 | "sr",
67 | "sk",
68 | "sl",
69 | "es",
70 | "sv",
71 | "tl",
72 | "th",
73 | "tr",
74 | "uk",
75 | ]
76 |
77 | ## _______________Functions live here___________________________________________________________
78 |
79 |
80 | # Process a selected file
81 | def get_selection():
82 | selected_file = lbox.curselection()
83 | print(lbox.get(selected_file))
84 | print(inputSheetMenu.get())
85 | bulk_translate_v3.loadAndTranslate(
86 | lbox.get(selected_file),
87 | inputLangMenu.get(),
88 | inputSheetMenu.get(),
89 | isCellebrite.get(),
90 | )
91 |
92 |
93 | def inputComboSelection(event):
94 | selectedProvenance = inputSheetMenu.get()
95 |
96 |
97 | def langComboSelection(event):
98 | selectedProvenance = inputLangMenu.get()
99 | # messagebox.showinfo(message=f"The Selected value is {selectedProvenance}",title='Selection')
100 |
101 |
102 | ### _____Create interface______________________________________________________________________
103 |
104 | # Show list of Excel files in the current working directory
105 | candidateFiles = os.listdir(os.getcwd())
106 | file_list = []
107 | for candidateFile in candidateFiles:
108 |     if candidateFile.endswith(".xlsx"):
109 |         file_list.append(candidateFile)
110 | fileListingDisplay = "\n".join(file_list)
111 |
112 | # Test Connectivity
113 | # bulk_translate_v3.serverCheck())
114 | if bulk_translate_v3.serverCheck(bulk_translate_v3.serverURL) == "SERVER_OK":
115 | print("Connected to server")
116 | SERVER_CONNECTED = True
117 | serverButtonColour = LIGHT_BLUE
118 | serverStatus = "Online"
119 | else:
120 | print("Server connection failed")
121 | SERVER_CONNECTED = False
122 | serverButtonColour = DARK_RED
123 | serverStatus = "Offline"
124 |
125 | # Create box
126 | root = Tk()
127 | root.geometry("580x650")
128 | root.minsize(458, 580)
129 | root.maxsize(780, 780)
130 | root.configure(bg=LIGHT_GREY)
131 |
132 | prog_name = Label(
133 | text="Offline Translation",
134 | anchor=W,
135 | padx=10,
136 | pady=10,
137 | background=DARK_BLUE,
138 | width=480,
139 | font=(FONT_1, 20),
140 | )
141 | prog_name.pack()
142 |
143 | sideFrame = Frame(master=root, width=100, height=100, bg=LIGHT_BLUE)
144 | sideFrame.pack(fill=Y, side=LEFT)
145 | sideFrame.pack()
146 |
147 | servAdd = Label(
148 | text="Server Address: {} Server Status: {}".format(
149 | str(bulk_translate_v3.serverURL), serverStatus
150 | ),
151 | padx=10,
152 | pady=00,
153 | bg=serverButtonColour,
154 | )
155 | servAdd.pack()
156 | # User instructions
157 | prog_data = Label(
158 |     text="For processing of files place this program in the folder\n containing your Excel files.",
159 | font=(FONT_1, 10),
160 | anchor=W,
161 | padx=5,
162 | pady=5,
163 | bg=LIGHT_GREY,
164 | )
165 | prog_data.pack()
166 |
167 | app_data_heading = Label(sideFrame, text=" ", bg=LIGHT_BLUE, font=(FONT_1, 10))
168 | app_data_heading.pack()
169 |
170 | # app_data.pack()
171 |
172 | ## Show Auto located files
173 | auto_locate_data = Label(
174 | text="{} candidate files located at path: \n{}".format(
175 | str(len(file_list)), str(os.getcwd())
176 | ),
177 | anchor=W,
178 | padx=10,
179 | pady=10,
180 | bg=LIGHT_GREY,
181 | )
182 | auto_locate_data.pack(pady=10, padx=10)
183 |
184 | # Tick box if file is a Cellebrite file, the header in these files starts at 1
185 | isCellebrite = IntVar()
186 | c1 = Checkbutton(text="Cellebrite file?", variable=isCellebrite, onvalue=1, offvalue=0)
187 | c1.pack()
188 |
189 | # Select an input Datasheet
190 | inputSheetName = Label(
191 | text="Input Sheet name if multiple sheets exist",
192 | padx=10,
193 | pady=10,
194 | bg=LIGHT_GREY,
195 | )
196 | inputSheetName.pack()
197 |
198 | # Input sheet selection menu
199 | inputSheetVar = StringVar()
200 | inputSheetMenu = ttk.Combobox(
201 | values=bulk_translate_v3.inputSheets, textvariable=inputSheetVar, state="readonly"
202 | )
203 |
204 | inputSheetMenu.bind("<<ComboboxSelected>>", inputComboSelection)
205 |
206 | # inputSheetMenu.set("Chats")
207 | inputSheetMenu.pack(side="top")
208 |
209 | # File selection label
210 | filesLabel = Label(
211 | text="Select File",
212 | padx=10,
213 | pady=10,
214 | bg=LIGHT_GREY,
215 | )
216 |
217 | # ____________Language selection menu_______________________________________
218 | inputLangName = Label(
219 | text="Input Language",
220 | padx=10,
221 | pady=10,
222 | bg=LIGHT_GREY,
223 | )
224 | inputLangName.pack()
225 | langVar = StringVar()
226 | inputLangMenu = ttk.Combobox(
227 | values=inputLanguages, textvariable=langVar, state="readonly"
228 | )
229 |
230 | inputLangMenu.bind("<<ComboboxSelected>>", langComboSelection)
231 | inputLangMenu.set("auto")
232 | inputLangMenu.pack(side="top")
233 |
234 | # ____________File selection menu_______________________________________
235 | filesLabel = Label(
236 | text="Select File",
237 | padx=10,
238 | pady=10,
239 | bg=LIGHT_GREY,
240 | )
241 | filesLabel.pack()
242 |
243 | # Select file names
244 | fNames = StringVar(value=fileListingDisplay)
245 | lbox = Listbox(root, listvariable=fNames, height=5, width=200)
246 | scroll_bar = Scrollbar(root)
247 | scroll_bar.pack(side=RIGHT, fill=Y)
248 | lbox.pack()
249 | scroll_bar.config(command=lbox.yview)
250 |
251 | ### Buttons for processing selected files
252 | processSelectedBtn = Button(
253 | root,
254 | text="Process Selected",
255 | command=get_selection,
256 | bg=LIGHT_GREY,
257 | padx=10,
258 | )
259 | processSelectedBtn.pack(side="top")
260 |
261 | # Exit Program
262 | exitBtn = Button(root, text="Exit", command=root.destroy, bg=LIGHT_GREY)
263 | exitBtn.pack(side=TOP, pady=20, padx=10)
264 |
265 | # Display version info
266 | verLabel = Label(
267 | text="Version {}".format(str(bulk_translate_v3.__version__)),
268 | padx=10,
269 | pady=10,
270 | bg=LIGHT_GREY,
271 | )
272 | verLabel.pack()
273 |
274 | root.mainloop()
275 |
--------------------------------------------------------------------------------
/offlineTranslate/old/bulk_translate.py:
--------------------------------------------------------------------------------
1 | # Bulk Translation of Axiom formatted Excels containing messages
2 | # Made in South Australia
3 | # Unapologetically formatted with Black
4 | #
5 | #
6 | # Changelog
7 | #
8 | # v0.1 Initial Concept
9 |
10 | import argparse
11 | import json
12 | import pandas as pd
13 | import requests
14 | import os
15 | import sys
16 |
17 | # ----------------- Settings live here ------------------------
18 |
19 | __description__ = "Utilises a Libretranslate server to translate messages from Axiom formatted Excel spreadsheets. Messages are loaded from a column titled 'Message'."
20 | __author__ = "facelessg00n"
21 | __version__ = "0.1"
22 |
23 | banner = """
24 | ██████ ███████ ███████ ██ ██ ███ ██ ███████ ████████ ██████ █████ ███ ██ ███████ ██ █████ ████████ ███████
25 | ██ ██ ██ ██ ██ ██ ████ ██ ██ ██ ██ ██ ██ ██ ████ ██ ██ ██ ██ ██ ██ ██
26 | ██ ██ █████ █████ ██ ██ ██ ██ ██ █████ ██ ██████ ███████ ██ ██ ██ ███████ ██ ███████ ██ █████
27 | ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
28 | ██████ ██ ██ ███████ ██ ██ ████ ███████ ██ ██ ██ ██ ██ ██ ████ ███████ ███████ ██ ██ ██ ███████
29 |
30 | """
31 |
32 | # Debug mode, will print errors etc
33 | debug = False
34 |
35 | serverURL = "http://localhost:5000"
36 | # Endpoints
37 | # /translate - translation
38 | # /languages - supported languages
39 |
40 | # Name of the column where the messages to be translated are found.
41 | # This can be modified to suit other Excel column names if desired
42 | inputColumn = "Message"
43 |
44 |
45 | # Check if the server is reachable and able to process a request.
46 | def serverCheck():
47 | print(f"Testing we can reach server {serverURL}")
48 | headers = {"Content-Type": "application/json"}
49 | payload = json.dumps(
50 | {
51 | "q": "Buenos días señor",
52 | "source": "auto",
53 | "target": "en",
54 | "format": "text",
55 | "api_key": None,
56 | }
57 | )
58 | try:
59 | response = requests.post(
60 | f"{serverURL}/translate", data=payload, headers=headers
61 | )
62 | if response.status_code == 404:
63 | print("ERROR: 404, server not found, check server address.")
64 | sys.exit(1)
65 | elif response.status_code == 400:
66 | print("ERROR: Invalid request sent - exiting")
67 | sys.exit(1)
68 | elif response.status_code == 200:
69 | print("Server located, testing translation")
70 | print(response.json())
71 |
72 | # FIXME - Handle connection errors, can probably be done better.
73 | except ConnectionRefusedError:
74 | print(
75 | f"Server connection refused - {serverURL}, is the address correct? \n\nExiting"
76 | )
77 | sys.exit()
78 | except Exception as e:
79 | print(f"Unable to connect, ERROR: {e}")
80 | sys.exit()
81 |
82 |
83 | # Loads Excel into dataframe and translates messages
84 | def loadAndTranslate(inputFile, inputLanguage):
85 | # Check we can hit the server before we start
86 | serverCheck()
87 | head, tail = os.path.split(inputFile)
88 | fileName = tail.split(".")[0]
89 | # Load Excel into Dataframe "df" and check for messages column.
90 | df = pd.read_excel(inputFile)
91 |
92 | if inputColumn not in df.columns:
93 | print("Required message column not found")
94 | sys.exit(1)
95 |
96 | # Load Messages Column to list and print some stats
97 | messages_nan_count = df["Message"].isna().sum()
98 | messages = df["Message"].tolist()
99 | print(f"{len(messages)} messages")
100 | print(f"{messages_nan_count} blank rows")
101 |
102 | results = []
103 | loopCount = 0
104 | for message in messages:
105 | # If no language code is specified use Auto Translate
106 |         if inputLanguage is None:
107 | translated_text = translate_text(message, None)
108 | # Else manual translation
109 | else:
110 | translated_text = translate_text(message, inputLanguage)
111 |
112 | if debug:
113 | print(translated_text)
114 | results.append(translated_text)
115 | print(f"Processing message {loopCount} of {len(messages)}")
116 | loopCount = loopCount + 1
117 |
118 | # ------------- Write backup file every 100 messages ----------------------------------------
119 | if len(results) % 100 == 0:
120 | print("Writing backup")
121 | backup_frame = pd.DataFrame(results)
122 |
123 | try:
124 | backup_frame.to_excel(
125 | f"{fileName}_backup.xlsx",
126 | index=False,
127 | columns=[
128 | "detected_language",
129 | "detected_confidence",
130 | "success",
131 | "input",
132 | "translatedText",
133 | ],
134 | )
135 | except:
136 |             print("Writing Excel backup failed")
137 | pass
138 |
139 | try:
140 | backup_frame.to_csv(
141 | f"{fileName}_backup.csv",
142 | encoding="utf-16",
143 | columns=[
144 | "detected_language",
145 | "detected_confidence",
146 | "success",
147 | "input",
148 | "translatedText",
149 | ],
150 | )
151 | except:
152 | print("Writing CSV backup failed")
153 | pass
154 |
155 | # ------------------ Write output file -----------------------------------------------------------------
156 | print("Translation complete - Writing file")
157 | outputFrame = pd.DataFrame(results)
158 |
159 | try:
160 | outputFrame.to_excel(
161 | f"{fileName}_translated.xlsx",
162 | index=False,
163 | columns=[
164 | "detected_language",
165 | "detected_confidence",
166 | "success",
167 | "input",
168 | "translatedText",
169 | ],
170 | )
171 | except:
172 |         print("Writing Excel failed")
173 | pass
174 |
175 | try:
176 | outputFrame.to_csv(
177 | f"{fileName}_translated.csv",
178 | encoding="utf-16",
179 | columns=[
180 | "detected_language",
181 | "detected_confidence",
182 | "success",
183 | "input",
184 | "translatedText",
185 | ],
186 | )
187 | except:
188 | print("Writing CSV failed")
189 | pass
190 |
191 | print("Process complete - Exiting.")
192 |
193 |
194 | # ------------------ Translates text with selected language -----------------------------------------------
195 | def translate_text(inputText, inputLang, api_key=None):
196 | # For future implementation
197 | if api_key is not None:
198 | API_KEY = api_key
199 | else:
200 | API_KEY = None
201 |
202 | if inputLang is not None:
203 | if debug:
204 |             print("Manual Language Selection {}".format(inputLang))
205 | payload = json.dumps(
206 | {
207 | "q": inputText,
208 | "source": inputLang,
209 | "target": "en",
210 | "format": "text",
211 | "api_key": API_KEY,
212 | }
213 | )
214 | else:
215 | if debug:
216 |             print("Auto language detection enabled")
217 | payload = json.dumps(
218 | {
219 | "q": inputText,
220 | "source": "auto",
221 | "target": "en",
222 | "format": "text",
223 | "api_key": API_KEY,
224 | }
225 | )
226 |
227 | # Detect blank rows and skip to prevent error being thrown by server / speeds up process
228 |     if inputText is None or pd.isna(inputText):
229 | print("Blank row found, skipping")
230 | output = {
231 | "detected_language": None,
232 | "detected_confidence": None,
233 | "translatedText": None,
234 | "success": False,
235 | }
236 | output["input"] = inputText
237 | return output
238 |
239 | else:
240 | headers = {"Content-Type": "application/json"}
241 | response = requests.post(
242 | f"{serverURL}/translate", data=payload, headers=headers
243 | )
244 | if response.status_code == 200:
245 | results = response.json()
246 | if debug:
247 | print(f"{inputText} and {response.json()}")
248 | try:
249 | answer = results
250 | # Server response style is different for Auto or Manual language selection
251 | if inputLang is not None:
252 | output = {
253 | "detected_language": f"Manual - {inputLang}",
254 | "detected_confidence": None,
255 | "translatedText": answer.get("translatedText"),
256 | "success": True,
257 | }
258 | else:
259 | output = {
260 | "detected_language": results.get("detectedLanguage")[
261 | "language"
262 | ],
263 | "detected_confidence": results.get("detectedLanguage")[
264 | "confidence"
265 | ],
266 | "translatedText": answer.get("translatedText"),
267 | "success": True,
268 | }
269 |
270 | output["input"] = inputText
271 | return output
272 | except Exception as e:
273 | print(e)
274 |
275 | elif response.status_code == 400:
276 | print("Invalid request")
277 | output = {
278 | "detected_language": None,
279 | "detected_confidence": None,
280 | "translatedText": None,
281 |             "success": f"Error: {response.status_code}",
282 | }
283 | output["input"] = inputText
284 | return output
285 |
286 |
287 | # Retrieve list of allowed languages from the server
288 | def getLanguages(printVals):
289 | AllowedLangs = []
290 | supportedLanguages = requests.get(f"{serverURL}/languages").json()
291 | for langItem in supportedLanguages:
292 | if printVals:
293 | print(
294 | f"Language Code: {langItem['code']} Language Name: {langItem['name']}"
295 | )
296 | AllowedLangs.append(langItem["code"])
297 | return AllowedLangs
298 |
299 |
300 | # ---------------------------- Argument Parser ------------------------
301 |
302 | if __name__ == "__main__":
303 | print(banner)
304 | serverCheck()
305 | print(f"Checking server {serverURL} for supported languages")
306 | try:
307 | supportedLanguages = getLanguages(False)
308 | if len(supportedLanguages) == 0:
309 | print("Supported Languages not found")
310 | supportedLanguages = ["0"]
311 | else:
312 | print(f"Languages found - {supportedLanguages} \n\n")
313 |
314 | except Exception as e:
315 | print(e)
316 |
317 | parser = argparse.ArgumentParser(
318 | description=__description__,
319 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
320 | )
321 |
322 | parser.add_argument(
323 | "-f", "--file", dest="inputFilePath", help="Path to Axiom formatted excel file"
324 | )
325 | parser.add_argument(
326 | "-s",
327 | "--server",
328 | dest="translationServer",
329 | help="Address of translation server if not localhost or hardcoded",
330 | required=False,
331 | )
332 | parser.add_argument(
333 | "-l",
334 | "--language",
335 | dest="inputLanguage",
336 | help="Language code for input text - optional but can greatly improve accuracy",
337 | required=False,
338 | choices=supportedLanguages,
339 | )
340 | parser.add_argument(
341 | "-g",
342 | "--getlangs",
343 | dest="getLangs",
344 | action="store_true",
345 | help="Get supported language codes and names from server",
346 | required=False,
347 | default=True,
348 | )
349 | args = parser.parse_args()
350 | if len(sys.argv) == 1:
351 | parser.print_help()
352 | sys.exit(1)
353 |
354 | if args.inputFilePath and not args.inputLanguage:
355 | if not os.path.exists(args.inputFilePath):
356 | print(
357 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
358 | )
359 | sys.exit(1)
360 | loadAndTranslate(args.inputFilePath, None)
361 |
362 | if args.inputFilePath and args.inputLanguage:
363 | if not os.path.exists(args.inputFilePath):
364 | print(
365 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
366 | )
367 | sys.exit(1)
368 | print(f"Input language set to {args.inputLanguage}")
369 | loadAndTranslate(args.inputFilePath, args.inputLanguage)
370 |
371 | if args.getLangs:
372 | getLanguages(True)
373 |
--------------------------------------------------------------------------------
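Both translator scripts flatten each LibreTranslate `/translate` response into one spreadsheet row inside `translate_text`. A minimal standalone sketch of that flattening step, kept separate from the network call so it can be shown on its own (the helper name `flatten_response` is illustrative, not code from the repo; field names follow bulk_translate_v3.py):

```python
# Hypothetical helper mirroring how translate_text turns a LibreTranslate
# /translate JSON response into a flat row for the output spreadsheet.
def flatten_response(input_text, results, input_lang=None):
    if input_lang is not None:
        # Manual language selection: the server omits detection metadata,
        # so record the user-chosen language instead.
        row = {
            "detectedLanguage": f"Manual - {input_lang}",
            "detectedConfidence": None,
        }
    else:
        # Auto-detection: the server nests language/confidence under
        # a "detectedLanguage" object.
        detected = results.get("detectedLanguage") or {}
        row = {
            "detectedLanguage": detected.get("language"),
            "detectedConfidence": detected.get("confidence"),
        }
    row["translatedText"] = results.get("translatedText")
    row["success"] = True
    row["input"] = input_text
    return row
```

A list of such rows is exactly what the scripts hand to `pd.DataFrame(results)` before writing the Excel/CSV output.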
/offlineTranslate/bulk_translate_v3.py:
--------------------------------------------------------------------------------
1 | # Bulk Translation of Axiom formatted Excels containing messages
2 | # Made in South Australia
3 | # Unapologetically formatted with Black
4 | #
5 | # Changelog
6 | # v0.3 Handle network errors... oops
7 | # v0.2 Change to output full content of the input sheet
8 | # Handle Cellebrite and Axiom files
9 | # v0.1 Initial Concept
10 |
11 | import argparse
12 | import json
13 | import pandas as pd
14 | import requests
15 | import os
16 | import sys
17 | from tqdm import tqdm
18 | from time import sleep
19 |
20 | # ----------------- Settings live here ------------------------
21 |
22 | __description__ = "Utilises a Libretranslate server to translate messages from Excel spreadsheets. By default messages are loaded from a column titled 'Message'."
23 | __author__ = "facelessg00n"
24 | __version__ = "0.3"
25 |
26 | banner = """
27 | ██████ ███████ ███████ ██ ██ ███ ██ ███████ ████████ ██████ █████ ███ ██ ███████ ██ █████ ████████ ███████
28 | ██ ██ ██ ██ ██ ██ ████ ██ ██ ██ ██ ██ ██ ██ ████ ██ ██ ██ ██ ██ ██ ██
29 | ██ ██ █████ █████ ██ ██ ██ ██ ██ █████ ██ ██████ ███████ ██ ██ ██ ███████ ██ ███████ ██ █████
30 | ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
31 | ██████ ██ ██ ███████ ██ ██ ████ ███████ ██ ██ ██ ██ ██ ██ ████ ███████ ███████ ██ ██ ██ ███████
32 |
33 | """
34 |
35 | # Debug mode, will print errors etc
36 | debug = False
37 |
38 | # if being compiled with a GUI
39 | # Keeps window alive if connection fails
40 | hasGUI = True
41 |
42 | serverURL = "http://localhost:5000"
43 | CONNECTION_TIMEOUT = 3
44 | RESPONSE_TIMEOUT = 60
45 | #
46 | # Endpoints
47 | # /translate - translation
48 | # /languages - supported languages
49 |
50 | # Name of the column where the messages to be translated are found.
51 | # This can be modified to suit other Excel column names if desired
52 | inputColumn = "Message"
53 | inputSheets = ["Chats", "Instant Messages"]
54 | sheetName = "Chats"
55 | headerRow = 1
56 |
57 | translationColumns = [
58 | "detectedLanguage",
59 | "detectedConfidence",
60 | "success",
61 | "input",
62 | "translatedText",
63 | ]
64 |
65 |
66 | # Check if the server is reachable and able to process a request.
67 | def serverCheck(serverURL):
68 | print(f"Testing we can reach server {serverURL}")
69 | headers = {"Content-Type": "application/json"}
70 | payload = json.dumps(
71 | {
72 | "q": "Buenos días señor",
73 | "source": "auto",
74 | "target": "en",
75 | "format": "text",
76 | "api_key": None,
77 | }
78 | )
79 | try:
80 | response = requests.post(
81 | f"{serverURL}/translate", data=payload, headers=headers
82 | )
83 | if response.status_code == 404:
84 | print("ERROR: 404, server not found, check server address.")
85 | sys.exit(1)
86 | elif response.status_code == 400:
87 | print("ERROR: Invalid request sent - exiting")
88 | sys.exit(1)
89 | elif response.status_code == 200:
90 | print("Server located, testing translation")
91 | print(response.json())
92 | return "SERVER_OK"
93 |
94 | # FIXME - Handle connection errors, can probably be done better.
95 | except ConnectionRefusedError:
96 | print(
97 | f"Server connection refused - {serverURL}, is the address correct? \n\nExiting"
98 | )
99 | if not hasGUI:
100 | sys.exit()
101 | except Exception as e:
102 | print(f"Unable to connect, ERROR: {e}")
103 | if not hasGUI:
104 | sys.exit()
105 |
106 |
107 | # Loads Excel into dataframe and translates messages
108 | def loadAndTranslate(inputFile, inputLanguage, inputSheet, isCellebrite):
109 | # Check we can hit the server before we start
110 | serverCheck(serverURL)
111 | head, tail = os.path.split(inputFile)
112 | fileName = tail.split(".")[0]
113 |
114 | if isCellebrite:
115 | inputHeader = 1
116 | inputColumn = "Body"
117 | else:
118 | inputHeader = 0
119 | inputColumn = "Message"
120 |
121 | # Load Excel into Dataframe "df" and check for messages column.
122 | if inputSheet:
123 | print("There is an input sheet")
124 | df = pd.read_excel(inputFile, sheet_name=inputSheet, header=inputHeader)
125 | else:
126 | print("There is no input sheet specified")
127 | df = pd.read_excel(inputFile, header=inputHeader)
128 |
129 | if debug:
130 | df = df.head(25)
131 |
132 | if inputColumn not in df.columns:
133 |         print("Required message column not found, is this a Cellebrite formatted Excel?")
134 | sys.exit(1)
135 |
136 | # Load Messages Column to list and print some stats
137 | messages_nan_count = df[inputColumn].isna().sum()
138 | messages = df[inputColumn].tolist()
139 | print(f"{len(messages)} messages")
140 | print(f"{messages_nan_count} blank rows")
141 |
142 | results = []
143 | loopCount = 1
144 | for message in tqdm(messages, desc="Translating messages", ascii="░▒█"):
145 | # If no language code is specified use Auto Translate
146 |         if inputLanguage is None:
147 | translated_text = translate_text(message, None)
148 | # Else manual translation
149 | else:
150 | translated_text = translate_text(message, inputLanguage)
151 |
152 | if debug:
153 | print(translated_text)
154 | results.append(translated_text)
155 | tqdm.write(f"Processing message {loopCount} of {len(messages)}")
156 | # print(f"Processing message {loopCount} of {len(messages)}")
157 | loopCount = loopCount + 1
158 |
159 | # ------------- Write backup file every 100 messages ----------------------------------------
160 | if len(results) % 100 == 0:
161 | tqdm.write("Writing backup")
162 | backup_frame = pd.DataFrame(results)
163 |
164 | try:
165 | backup_frame.to_csv(
166 | f"{fileName}_backup.csv",
167 | encoding="utf-16",
168 | columns=translationColumns,
169 | )
170 | except:
171 | print("Writing CSV backup failed")
172 | pass
173 |
174 | # ------------------ Write output file -----------------------------------------------------------------
175 | print("Translation complete - Writing file")
176 |     # Get column position to insert new data
177 |     bodyPosition = df.columns.get_loc(inputColumn) + 1
178 |     # Split the original frame in two, then concat the new data between the halves
179 | df1_part1 = df.iloc[:, :bodyPosition]
180 | df1_part2 = df.iloc[:, bodyPosition:]
181 | outputFrame = pd.concat([df1_part1, pd.DataFrame(results), df1_part2], axis=1)
182 |
183 | try:
184 | outputFrame.to_excel(f"{fileName}_translated.xlsx", index=False)
185 | except:
186 | print("Writing Excel failed")
187 | pass
188 |
189 | try:
190 | outputFrame.to_csv(f"{fileName}_translated.csv", encoding="utf-16")
191 | except:
192 | print("Writing CSV failed")
193 | pass
194 |
195 | print("Process complete - Exiting.")
196 |
197 |
198 | # ------------------ Translates text with selected language -----------------------------------------------
199 | def translate_text(inputText, inputLang, api_key=None):
200 | # For future implementation
201 | if api_key is not None:
202 | API_KEY = api_key
203 | else:
204 | API_KEY = None
205 |
206 | if inputLang is not None:
207 | if debug:
208 |             print("Manual Language Selection {}".format(inputLang))
209 | payload = json.dumps(
210 | {
211 | "q": inputText,
212 | "source": inputLang,
213 | "target": "en",
214 | "format": "text",
215 | "api_key": API_KEY,
216 | }
217 | )
218 | else:
219 | if debug:
220 |             print("Auto language detection enabled")
221 | payload = json.dumps(
222 | {
223 | "q": inputText,
224 | "source": "auto",
225 | "target": "en",
226 | "format": "text",
227 | "api_key": API_KEY,
228 | }
229 | )
230 |
231 | # Detect blank rows and skip to prevent error being thrown by server / speeds up process
232 |     if inputText is None or pd.isna(inputText):
233 | tqdm.write("Blank row found, skipping")
234 | output = {
235 | "detectedLanguage": None,
236 | "detectedConfidence": None,
237 | "translatedText": None,
238 | "success": False,
239 | }
240 | output["input"] = inputText
241 | return output
242 |
243 | # If row is not blank, attempt to translate it
244 | else:
245 | headers = {"Content-Type": "application/json"}
246 | try:
247 | # Max Attempt for retries
248 | MAX_ATTEMPTS = 5
249 |
250 | response = requests.post(
251 | f"{serverURL}/translate",
252 | data=payload,
253 | headers=headers,
254 | timeout=(CONNECTION_TIMEOUT, RESPONSE_TIMEOUT),
255 | )
256 |
257 | # Handle a read timeout error, sleep 2 seconds then try again
258 | except requests.ReadTimeout:
259 |
260 | while MAX_ATTEMPTS > 0:
261 | try:
262 | tqdm.write("Read Timeout error, retrying")
263 | sleep(2)
264 | response = requests.post(
265 | f"{serverURL}/translate",
266 | data=payload,
267 | headers=headers,
268 | )
269 |                     # Retry succeeded - fall through to the response handling below
270 |                     break
271 |
272 | except Exception:
273 | MAX_ATTEMPTS -= 1
274 | continue
275 | else:
276 | output = {
277 | "detectedLanguage": None,
278 | "detectedConfidence": None,
279 | "translatedText": None,
280 | "success": "False: Error: Read Timeout ",
281 | }
282 | output["input"] = inputText
283 | return output
284 |
285 | # Handle a connection dropout, sleep 2 seconds and try again
286 | except requests.ConnectionError:
287 | while MAX_ATTEMPTS > 0:
288 | try:
289 | tqdm.write("Connection Error - Retrying")
290 | sleep(2)
291 | response = requests.post(
292 | f"{serverURL}/translate", data=payload, headers=headers
293 | )
294 |                     # Retry succeeded - fall through to the response handling below
295 |                     break
296 |
297 | except Exception:
298 | MAX_ATTEMPTS -= 1
299 | continue
300 | else:
301 | print("Failed")
302 | output = {
303 | "detectedLanguage": None,
304 | "detectedConfidence": None,
305 | "translatedText": None,
306 | "success": "False: Error: Connection Error",
307 | }
308 | output["input"] = inputText
309 | return output
310 |
311 | except Exception as e:
312 | tqdm.write(f"Unhandled exception {e}")
313 | output = {
314 | "detectedLanguage": None,
315 | "detectedConfidence": None,
316 | "translatedText": None,
317 | "success": f"False: Error: {e}",
318 | }
319 | output["input"] = inputText
320 | return output
321 |
322 | if response.status_code == 200:
323 | results = response.json()
324 | if debug:
325 | print(f"{inputText} and {response.json()}")
326 | try:
327 | answer = results
328 | # Server response style is different for Auto or Manual language selection
329 | if inputLang is not None:
330 | output = {
331 | "detectedLanguage": f"Manual - {inputLang}",
332 | "detectedConfidence": None,
333 | "translatedText": answer.get("translatedText"),
334 | "success": True,
335 | }
336 | else:
337 | output = {
338 | "detectedLanguage": results.get("detectedLanguage")["language"],
339 | "detectedConfidence": results.get("detectedLanguage")[
340 | "confidence"
341 | ],
342 | "translatedText": answer.get("translatedText"),
343 | "success": True,
344 | }
345 |
346 | output["input"] = inputText
347 | return output
348 | except Exception as e:
349 | print(e)
350 |
351 | elif response.status_code == 400:
352 | print("Invalid request")
353 | output = {
354 | "detectedLanguage": None,
355 | "detectedConfidence": None,
356 | "translatedText": None,
357 |             "success": f"Error: {response.status_code}",
358 | }
359 | output["input"] = inputText
360 | return output
361 |
362 |
363 | # Retrieve list of allowed languages from the server
364 | def getLanguages(printVals):
365 | AllowedLangs = []
366 | try:
367 | supportedLanguages = requests.get(f"{serverURL}/languages").json()
368 | except:
369 | print("Supported Languages not found")
370 | supportedLanguages = []
371 | pass
372 |
373 | for langItem in supportedLanguages:
374 | if printVals:
375 | print(
376 | f"Language Code: {langItem['code']} Language Name: {langItem['name']}"
377 | )
378 | AllowedLangs.append(langItem["code"])
379 | return AllowedLangs
380 |
381 |
382 | # ---------------------------- Argument Parser ------------------------
383 |
384 | if __name__ == "__main__":
385 | print(banner)
386 | if debug:
387 | print("WARNING DEBUG MODE IS ACTIVE")
388 | serverCheck(serverURL)
389 | print(f"Checking server {serverURL} for supported languages")
390 | try:
391 | supportedLanguages = getLanguages(False)
392 | if len(supportedLanguages) == 0:
393 | print("Supported Languages not found")
394 | supportedLanguages = []
395 | else:
396 | print(f"Languages found - {supportedLanguages} \n\n")
397 |
398 | except Exception as e:
399 | print(e)
400 |
401 | parser = argparse.ArgumentParser(
402 | description=__description__,
403 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
404 | )
405 |
406 | parser.add_argument("-f", "--file", dest="inputFilePath", help="Path to Excel File")
407 | parser.add_argument(
408 | "-s",
409 | "--server",
410 | dest="translationServer",
411 | help="Address of translation server if not localhost or hardcoded",
412 | required=False,
413 | )
414 |
415 | parser.add_argument(
416 | "-l",
417 | "--language",
418 | dest="inputLanguage",
419 | help="Language code for input text - optional but can greatly improve accuracy",
420 | required=False,
421 | choices=supportedLanguages,
422 | )
423 |
424 | parser.add_argument(
425 | "-e",
426 | "--excelSheet",
427 | dest="inputSheet",
428 | help="Sheet name within Excel file to be translated",
429 | required=False,
430 | choices=inputSheets,
431 | )
432 |
433 | parser.add_argument(
434 | "-c",
435 | "--isCellebrite",
436 | dest="isCellebrite",
437 | help="If file originated from Cellebrite, header starts at 1, and message column is called 'Body'",
438 | required=False,
439 | action="store_true",
440 | default=False,
441 | )
442 |
443 | parser.add_argument(
444 | "-g",
445 | "--getlangs",
446 | dest="getLangs",
447 | action="store_true",
448 | help="Get supported language codes and names from server",
449 | required=False,
450 | default=False,
451 | )
452 |
453 | args = parser.parse_args()
454 | if len(sys.argv) == 1:
455 | parser.print_help()
456 | sys.exit(1)
457 |
458 | if args.inputFilePath and not args.inputLanguage:
459 | if not os.path.exists(args.inputFilePath):
460 | print(
461 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
462 | )
463 | sys.exit(1)
464 | loadAndTranslate(args.inputFilePath, None, args.inputSheet, args.isCellebrite)
465 |
466 | if args.inputFilePath and args.inputLanguage:
467 | if not os.path.exists(args.inputFilePath):
468 | print(
469 | "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
470 | )
471 | sys.exit(1)
472 | print(f"Input language set to {args.inputLanguage}")
473 | loadAndTranslate(
474 | args.inputFilePath, args.inputLanguage, args.inputSheet, args.isCellebrite
475 | )
476 |
477 | if args.getLangs:
478 | getLanguages(True)
479 |
--------------------------------------------------------------------------------
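v0.3's changelog entry "Handle network errors" corresponds to the retry loops in `translate_text`: on `ReadTimeout` or `ConnectionError` the POST is re-attempted up to five times with a short sleep between tries. The same idea can be sketched as one small generic helper (the name `post_with_retries` and the injected `send` callable are illustrative, not code from the repo):

```python
import time

def post_with_retries(send, max_attempts=5, delay=0.0):
    # Generic sketch of v3's retry handling: call send() until it
    # returns, sleeping between failures; once attempts are exhausted,
    # re-raise the last error so the caller can record the failure row.
    # `delay` corresponds to the sleep(2) used in bulk_translate_v3.py.
    last_error = None
    for _ in range(max_attempts):
        try:
            return send()
        except Exception as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error
```

In the script itself, `send` would be a closure over `requests.post(f"{serverURL}/translate", ...)`; factoring the loop out this way avoids duplicating the retry logic per exception type.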
/clbExtract/old/clbExtract.py:
--------------------------------------------------------------------------------
1 | """
2 | Extracts nested contacts data from Cellebrite formatted Excel documents.
3 | - Cellebrite Stores contact details in multiline Excel cells.
4 | Formatted with Black
5 |
6 | Changelog
7 | 0.3 Complete rewrite
8 |
9 | 0.2 - Implement command line argument parser
10 | Allow bulk processing of all items in directory
11 |
12 | 0.1 - Initial concept
13 |
14 | """
15 | import argparse
16 | import glob
17 | import logging
18 | import os
19 | import openpyxl
20 | import pandas as pd
21 | from pathlib import Path
22 | import sys
23 |
24 |
25 |
26 | ## Details
27 | __description__ = 'Flattens Cellebrite formatted Excel files. "Contacts" and "Device Info" tabs are required.'
28 | __author__ = "facelessg00n"
29 | __version__ = "0.3"
30 |
31 | parser = argparse.ArgumentParser(
32 | description=__description__,
33 |     epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
34 | )
35 |
36 | # ----------- Options -----------
37 | debug = False
38 |
39 | os.chdir(os.getcwd())
40 |
41 | logging.basicConfig(
42 | filename="log.txt",
43 | format="%(asctime)s,- %(levelname)s - %(message)s",
44 | level=logging.INFO,
45 | )
46 |
47 |
48 | # Set names for sheets of interest
49 | clbPhoneInfo = "Device Info"
50 | clbContactSheet = "Contacts"
51 |
52 | # FIXME
53 | #### ---- Column names and other options ---------------------------------------------
54 | contactOutput = "ContactDetail"
55 | contactTypeOutput = "ContactType"
56 | originIMEI = "originIMEI"
57 | parsedApps = [
58 | "Instagram",
59 | "Native",
60 | "Telegram",
61 | "Snapchat",
62 | "WhatsApp",
63 | "Facebook Messenger",
64 | "Signal",
65 | ]
66 |
67 | # Class object to hold phone and input file info
68 | class phoneData:
69 | IMEI = None
70 | IMEI2 = None
71 | inFile = None
72 | inPath = None
73 |
74 | def __init__(self, IMEI=None, IMEI2=None, inFile=None, inPath=None) -> None:
75 | self.IMEI = IMEI
76 | self.IMEI2 = IMEI2
77 | self.inFile = inFile
78 | self.inPath = inPath
79 |
80 |
81 | # -------------Functions live here ------------------------------------------
82 |
83 | # ----- Bulk Excel Processor--------------------------------------------------
84 |
85 | # Finds and processes all Excel files in the working directory.
86 | def bulkProcessor():
87 | FILE_PATH = os.getcwd()
88 | inputFiles = glob.glob("*.xlsx")
89 | print((str(len(inputFiles)) + " Excel files located. \n"))
90 | # If there are no files found exit the process.
91 | if len(inputFiles) == 0:
92 | print("No excel files located.")
93 | print("Exiting.")
94 | quit()
95 | else:
96 | for x in inputFiles:
97 | if os.path.exists(x):
98 | try:
99 | processMetadata(x)
100 | # Need to deal with $ files.
101 | except FileNotFoundError:
102 | print("File does not exist or temp file detected")
103 | pass
104 | if debug:
105 | for x in inputFiles:
106 | inputFilename = x.split(".")[0]
107 | print(inputFilename)
108 |
109 |
110 | # FIXME - Deal with error when this info is missing
111 | ### -------- Process phone metadata ------------------------------------------------------
112 | def processMetadata(inputFile):
113 |
114 | try:
115 | infoPD = pd.read_excel(
116 | inputFile, sheet_name=clbPhoneInfo, header=1, usecols="B,C,D"
117 | )
118 |
119 | phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][0]
120 | try:
121 | phoneData.IMEI2 = infoPD.loc[infoPD["Name"] == "IMEI2", ["Value"]].values[
122 | 0
123 | ][0]
124 | except:
125 | phoneData.IMEI2 = None
126 | # phoneData.inFile = inputFile.split(".")[0]
127 | phoneData.inFile = Path(inputFile).stem
128 | phoneData.inPath = os.path.dirname(inputFile)
129 |
130 | if debug:
131 | print(infoPD)
132 | print(phoneData.IMEI)
133 | except ValueError:
134 | print(
135 |             "\033[1;31m Info tab not found in {}, attempting with no IMEI".format(
136 | inputFile
137 | )
138 | )
139 | phoneData.IMEI = None
140 | phoneData.IMEI2 = None
141 | # phoneData.inFile = inputFile.split(".")[0]
142 | phoneData.inFile = Path(inputFile).stem
143 | phoneData.inPath = os.path.dirname(inputFile)
144 |
145 | try:
146 | processContacts(inputFile)
147 | except ValueError:
148 | print("\033[1;31m No Contacts tab found, is this a correctly formatted Excel?")
149 | logging.error(
150 | "No Contacts tab found in {}, is this a correctly formatted Excel?".format(
151 | inputFile
152 | )
153 | )
154 |
155 |
156 | ### Extract contacts tab of Excel file -------------------------------------------------------------------
157 | def processContacts(inputFile):
158 | inputFile = inputFile
159 | logging.info("Processing contacts in {} has begun.".format(inputFile))
160 |
161 | # Record input filename for use in export processes.
162 |
163 | if debug:
164 | print("\033[0;37m Input file is : {}".format(phoneData.inFile))
165 |
166 | contactsPD = pd.read_excel(
167 | inputFile,
168 | sheet_name=clbContactSheet,
169 | header=1,
170 | index_col="#",
171 | usecols=["#", "Name", "Interaction Statuses", "Entries", "Source", "Account"],
172 | )
173 |
174 | print(
175 | "\033[0m Processing the following app types for : {}".format(phoneData.inFile)
176 | )
177 | applist = contactsPD["Source"].unique()
178 | for x in applist:
179 | if x in parsedApps:
180 | print("{} : \u2713 ".format(x))
181 | else:
182 | print("{} : \u2716".format(x))
183 | # Process native contacts
184 |     try:
185 |         processAppleNative(contactsPD)
186 |     except Exception as e:
187 |         print("Processing native contacts failed: {}".format(e))
188 |         pass
189 | # Process Apps
190 | for x in applist:
191 | if x == "Instagram":
192 | processInstagram(contactsPD)
193 | if x == "Snapchat":
194 | processSnapChat(contactsPD)
195 | if x == "WhatsApp":
196 | processWhatsapp(contactsPD)
197 | if x == "Telegram":
198 | processTelegram(contactsPD)
199 | if x == "Facebook Messenger":
200 | processFacebookMessenger(contactsPD)
201 | if x == "Signal":
202 | processSignal(contactsPD)
203 |
204 |
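Each of the parsers below leans on the same pandas idiom: split the multiline `Entries` cell on newlines, expand it into integer-named columns, scan those columns for labelled values, and export only the string-named columns. A minimal self-contained sketch of the idiom, using toy data (the names and values are illustrative only, not from a real extraction):

```python
import pandas as pd

# Toy frame standing in for the "Contacts" sheet (illustrative data only).
contacts = pd.DataFrame(
    {
        "Name": ["Alice"],
        "Entries": ["User ID-Username: alice01\nUser ID-Facebook Id: 12345"],
    }
)

# Split the multiline cell into integer-named columns (0, 1, ...).
contacts = contacts.drop("Entries", axis=1).join(
    contacts["Entries"].str.split("\n", expand=True)
)

# Collect the integer-named working columns so sheets of any width work.
selected_cols = [c for c in contacts.columns if isinstance(c, int)]

# Move labelled values out of the working columns into named output columns.
for c in selected_cols:
    mask = contacts[c].str.contains("User ID-Username", na=False)
    contacts.loc[mask, "User Name"] = (
        contacts[c].str.split(":", n=1, expand=True)[1].str.strip()
    )

# Only string-named columns are kept for export.
export_cols = [c for c in contacts.columns if isinstance(c, str)]
```

The integer/string column-name split is what lets the same loop handle contacts with any number of nested entries.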
205 | # ------ Parse Facebook Messenger --------------------------------------------------------------
206 | def processFacebookMessenger(contactsPD):
207 | print("\nProcessing Facebook Messenger")
208 |     facebookMessengerPD = contactsPD[contactsPD["Source"] == "Facebook Messenger"].copy()
209 | facebookMessengerPD = facebookMessengerPD.drop("Entries", axis=1).join(
210 | facebookMessengerPD["Entries"].str.split("\n", expand=True)
211 | )
212 | facebookMessengerPD = facebookMessengerPD.reset_index(drop=True)
213 |
214 | selected_cols = []
215 | for x in facebookMessengerPD.columns:
216 | if isinstance(x, int):
217 | selected_cols.append(x)
218 |
219 | def phoneCheck(facebookMessengerPD):
220 | for x in selected_cols:
221 | facebookMessengerPD.loc[
222 | (facebookMessengerPD[x].str.contains("User ID-Facebook Id", na=False)),
223 | "Account ID",
224 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
225 | facebookMessengerPD.loc[
226 | (facebookMessengerPD[x].str.contains("User ID-Username", na=False)),
227 | "User Name",
228 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
229 |
230 | phoneCheck(facebookMessengerPD)
231 | facebookMessengerPD[originIMEI] = phoneData.IMEI
232 | exportCols = []
233 | for x in facebookMessengerPD.columns:
234 | if isinstance(x, str):
235 | exportCols.append(x)
236 | print("\n")
237 | print(
238 | "{} user accounts located".format(len(facebookMessengerPD["Account"].unique()))
239 | )
240 | print("{} contacts located".format(len(facebookMessengerPD["Account ID"].unique())))
241 | print("Exporting {}-FB-MESSENGER.csv".format(phoneData.inFile))
242 | logging.info("Exporting FB messenger from {}".format(phoneData.inFile))
243 | facebookMessengerPD[exportCols].to_csv(
244 | "{}-FB-MESSENGER.csv".format(phoneData.inFile),
245 | index=False,
246 | columns=[
247 | originIMEI,
248 | "Account",
249 | "Interaction Statuses",
250 | "Name",
251 | "User Name",
252 | "Account ID",
253 | "Source",
254 | ],
255 | )
256 |
257 |
258 | # ----- Parse Instagram data ------------------------------------------------------------------
259 | def processInstagram(contactsPD):
260 | print("\nProcessing Instagram")
261 | instagramPD = contactsPD[contactsPD["Source"] == "Instagram"].copy()
262 | instagramPD = instagramPD.drop("Entries", axis=1).join(
263 | instagramPD["Entries"].str.split("\n", expand=True)
264 | )
265 |
266 | selected_cols = []
267 | for x in instagramPD.columns:
268 | if isinstance(x, int):
269 | selected_cols.append(x)
270 |
271 | def instaContacts(instagramPD):
272 | for x in selected_cols:
273 | instagramPD.loc[
274 | (instagramPD[x].str.contains("User ID-Username", na=False)), "User Name"
275 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
276 | instagramPD.loc[
277 | (instagramPD[x].str.contains("User ID-Instagram Id", na=False)),
278 | "Instagram ID",
279 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
280 |
281 | instaContacts(instagramPD)
282 |
283 | instagramPD[originIMEI] = phoneData.IMEI
284 | exportCols = []
285 | for x in instagramPD.columns:
286 | if isinstance(x, str):
287 | exportCols.append(x)
288 |
289 | print("Exporting {}-INSTAGRAM.csv".format(phoneData.inFile))
290 | logging.info("Exporting Instagram from {}".format(phoneData.inFile))
291 | instagramPD[exportCols].to_csv(
292 | "{}-INSTAGRAM.csv".format(phoneData.inFile),
293 | index=False,
294 | columns=[
295 | originIMEI,
296 | "Account",
297 | "Name",
298 | "User Name",
299 | "Instagram ID",
300 | "Interaction Statuses",
301 | ],
302 | )
303 |
304 |
305 | # ------------Process native contact list ------------------------------------------------
306 | def processAppleNative(contactsPD):
307 |
308 | print("\nProcessing Native Contacts")
309 |     nativeContactsPD = contactsPD[contactsPD["Source"].isna()].copy()
310 | nativeContactsPD = nativeContactsPD.drop("Entries", axis=1).join(
311 | nativeContactsPD["Entries"]
312 | .str.split("\n", expand=True)
313 | .stack()
314 | .reset_index(level=1, drop=True)
315 | .rename("Entries")
316 | )
317 |
318 | nativeContactsPD = nativeContactsPD[["Name", "Interaction Statuses", "Entries"]]
319 |
320 | nativeContactsPD = nativeContactsPD[
321 | nativeContactsPD["Entries"].str.contains(r"Phone-")
322 | ]
323 | nativeContactsPD[originIMEI] = phoneData.IMEI
324 | nativeContactsPD["Entries"] = (
325 | nativeContactsPD["Entries"]
326 | .str.split(":", n=1, expand=True)[1]
327 | .str.strip()
328 | .str.replace(" ", "")
329 | .str.replace("-", "")
330 | )
331 | if debug:
332 | print(nativeContactsPD)
333 | nativeContactsPD = nativeContactsPD[
334 | [originIMEI, "Name", "Entries", "Interaction Statuses"]
335 | ]
336 | print("{} contacts located.".format(len(nativeContactsPD)))
337 | print("Exporting {}-NATIVE.csv".format(phoneData.inFile))
338 | logging.info("Exporting Native contacts from {}".format(phoneData.inFile))
339 | nativeContactsPD.to_csv("{}-NATIVE.csv".format(phoneData.inFile), index=False)
340 |
341 |
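`processAppleNative` above reshapes the other way: `stack()` turns the expanded entry columns into one row per entry, repeating the contact's other columns for each. A toy sketch of that long-format explode and the phone-number cleanup (sample data is illustrative only):

```python
import pandas as pd

# Toy native-contacts frame; each contact holds several entries in one cell.
native = pd.DataFrame(
    {
        "Name": ["Bob"],
        "Entries": ["Phone-Mobile: 0400 111 222\nPhone-Home: 03 9123 4567"],
    }
)

# stack() converts the expanded columns into extra rows, one entry per row,
# repeating the contact's Name for each entry.
native = native.drop("Entries", axis=1).join(
    native["Entries"]
    .str.split("\n", expand=True)
    .stack()
    .reset_index(level=1, drop=True)
    .rename("Entries")
)

# Keep phone entries only, then strip the label, spaces, and dashes.
native = native[native["Entries"].str.contains(r"Phone-")].reset_index(drop=True)
native["Entries"] = (
    native["Entries"]
    .str.split(":", n=1, expand=True)[1]
    .str.strip()
    .str.replace(" ", "")
    .str.replace("-", "")
)
```

The long format suits native contacts, where every entry is a phone number, while the wide format above suits apps whose entries carry different labels.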
342 | # ------------Parse Signal contacts ---------------------------------------------------------------
343 | def processSignal(contactsPD):
344 | print("Processing Signal Contacts")
345 | signalPD = contactsPD[contactsPD["Source"] == "Signal"].copy()
346 | signalPD = signalPD[["Name", "Entries", "Source"]]
347 | signalPD = signalPD.drop("Entries", axis=1).join(
348 | signalPD["Entries"].str.split("\n", expand=True)
349 | )
350 |
351 |     # Data is expanded into columns with integer names; add these columns to selected_cols so we can search them later
352 | selected_cols = []
353 | for x in signalPD.columns:
354 | if isinstance(x, int):
355 | selected_cols.append(x)
356 |
357 |     # Signal can store multiple values under Entries, such as Mobile Number:
358 | # So we break them all out into columns.
359 | def signalContact(signalPD):
360 | for x in selected_cols:
361 | # Locate Signal Username and move to Username Column
362 | signalPD.loc[
363 | (signalPD[x].str.contains("User ID-Username:", na=False)),
364 | "User Name",
365 | ] = signalPD[x].str.split(":", n=1, expand=True)[1]
366 |             # Delete Username entry from original location
367 | signalPD.loc[
368 | signalPD[x].str.contains("User ID-Username:", na=False), [x]
369 | ] = ""
370 |             # delete everything before the colon
371 | signalPD[x] = signalPD[x].str.split(":", n=1, expand=True)[1].str.strip()
372 |
373 | signalContact(signalPD)
374 |
375 | signalPD[originIMEI] = phoneData.IMEI
376 |
377 | export_cols = [originIMEI, "Name", "User Name"]
378 | export_cols.extend(selected_cols)
379 | print("Located {} Signal contacts".format(len(signalPD["Name"])))
380 | print("Exporting {}-SIGNAL.csv".format(phoneData.inFile))
381 | logging.info("Exporting Signal messenger from {}".format(phoneData.inFile))
382 | signalPD.to_csv(
383 | "{}-SIGNAL.csv".format(phoneData.inFile), index=False, columns=export_cols
384 | )
385 |
386 |
387 | # ----------- Parse Snapchat data ------------------------------------------------------------------
388 | def processSnapChat(contactsPD):
389 | print("\nProcessing Snapchat")
390 |     snapPD = contactsPD[contactsPD["Source"] == "Snapchat"].copy()
391 | snapPD = snapPD[["Name", "Entries", "Source"]]
392 |
393 | # Extract nested entities
394 | snapPD = snapPD.drop("Entries", axis=1).join(
395 | snapPD["Entries"].str.split("\n", expand=True)
396 | )
397 | selected_cols = []
398 | for x in snapPD.columns:
399 | if isinstance(x, int):
400 | selected_cols.append(x)
401 |
402 | def snapContacts(snapPD):
403 | for x in selected_cols:
404 | snapPD.loc[
405 | (snapPD[x].str.contains("User ID-Username", na=False)), "User Name"
406 | ] = snapPD[x].str.split(":", n=1, expand=True)[1]
407 | snapPD.loc[
408 | (snapPD[x].str.contains("User ID-User ID", na=False)), "User ID"
409 | ] = snapPD[x].str.split(":", n=1, expand=True)[1]
410 |
411 | snapContacts(snapPD)
412 | snapPD[originIMEI] = phoneData.IMEI
413 |
414 | exportCols = []
415 | for x in snapPD.columns:
416 | if isinstance(x, str):
417 | exportCols.append(x)
418 | if debug:
419 | print(snapPD[exportCols])
420 | print("Exporting {}-SNAPCHAT.csv".format(phoneData.inFile))
421 | logging.info("Exporting Snapchat from {}".format(phoneData.inFile))
422 | snapPD[exportCols].to_csv(
423 | "{}-SNAPCHAT.csv".format(phoneData.inFile),
424 | index=False,
425 | columns=[originIMEI, "Name", "User Name", "User ID"],
426 | )
427 |
428 |
429 | # ---- Parse Telegram Contacts--------------------------------------------------------------
430 | def processTelegram(contactsPD):
431 | print("\nProcessing Telegram")
432 | telegramPD = contactsPD[contactsPD["Source"] == "Telegram"].copy()
433 | telegramPD = telegramPD.drop("Entries", axis=1).join(
434 | telegramPD["Entries"].str.split("\n", expand=True)
435 | )
436 | telegramPD = telegramPD.reset_index(drop=True)
437 |
438 | selected_cols = []
439 | for x in telegramPD.columns:
440 | if isinstance(x, int):
441 | selected_cols.append(x)
442 |
443 | def phoneCheck(telegramPD):
444 | for x in selected_cols:
445 | telegramPD.loc[
446 | (telegramPD[x].str.contains("Phone-", na=False)), "Phone-Number"
447 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
448 |
449 | telegramPD.loc[
450 | (telegramPD[x].str.contains("User ID-Peer", na=False)), "Peer-ID"
451 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
452 |
453 | telegramPD.loc[
454 | (telegramPD[x].str.contains("User ID-Username", na=False)), "User-Name"
455 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
456 |
457 | phoneCheck(telegramPD)
458 | telegramPD[originIMEI] = phoneData.IMEI
459 | exportCols = []
460 | for x in telegramPD.columns:
461 | if isinstance(x, str):
462 | exportCols.append(x)
463 | # Export CSV
464 | print("Exporting {}-TELEGRAM.csv".format(phoneData.inFile))
465 | logging.info("Exporting Telegram from {}".format(phoneData.inFile))
466 | telegramPD[exportCols].to_csv(
467 | "{}-TELEGRAM.csv".format(phoneData.inFile), index=False
468 | )
469 |
470 |
471 | # ---Parse Whatsapp Contacts----------------------------------------------------------------------
472 | # Load WhatsApp
473 | def processWhatsapp(contactsPD):
474 | print("\nProcessing WhatsApp")
475 | whatsAppPD = contactsPD[contactsPD["Source"] == "WhatsApp"].copy()
476 | whatsAppPD = whatsAppPD[["Name", "Entries", "Source", "Interaction Statuses"]]
477 |     # Shared contacts are not associated with a WhatsApp ID and cause problems.
478 |     whatsAppPD = whatsAppPD[
479 |         ~whatsAppPD["Interaction Statuses"].str.contains("Shared", na=False)
480 |     ]
481 | # Unpack nested data
482 | whatsAppPD = whatsAppPD.drop("Entries", axis=1).join(
483 | whatsAppPD["Entries"].str.split("\n", expand=True)
484 | )
485 |
486 |     # Data is expanded into columns with integer names; check for these columns and add them to a
487 |     # list to allow for different width sheets.
488 | colList = list(whatsAppPD)
489 | selected_cols = []
490 | for x in colList:
491 | if isinstance(x, int):
492 | selected_cols.append(x)
493 |
494 | # Look for data across expanded columns and shift it to output columns.
495 | def whatsappContactProcess(whatsAppPD):
496 | print("\nProcessing WhatsApp")
497 | for x in selected_cols:
498 | whatsAppPD.loc[
499 | (whatsAppPD[x].str.contains("Phone-Mobile", na=False)), "Phone-Mobile"
500 | ] = (
501 | whatsAppPD[x]
502 | .str.split(":", n=1, expand=True)[1]
503 | .str.replace(" ", "")
504 | .str.replace("-", "")
505 | )
506 |
507 | whatsAppPD.loc[
508 | (whatsAppPD[x].str.contains("Phone-:", na=False)), "Phone"
509 | ] = (
510 | whatsAppPD[x]
511 | .str.split(":", n=1, expand=True)[1]
512 | .str.replace(" ", "")
513 | .str.replace("-", "")
514 | )
515 |
516 | whatsAppPD.loc[
517 | (whatsAppPD[x].str.contains("Phone-Home:", na=False)), "Phone-Home"
518 | ] = (
519 | whatsAppPD[x]
520 | .str.split(":", n=1, expand=True)[1]
521 | .str.replace(" ", "")
522 | .str.replace("-", "")
523 | )
524 |
525 | whatsAppPD.loc[
526 | (whatsAppPD[x].str.contains("User ID-Push Name", na=False)), "Push-ID"
527 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
528 |
529 | whatsAppPD.loc[
530 | (whatsAppPD[x].str.contains("User ID-Id", na=False)), "Id-ID"
531 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
532 |
533 | whatsAppPD.loc[
534 | (whatsAppPD[x].str.contains("User ID-WhatsApp User Id", na=False)),
535 | "WhatsApp-ID",
536 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
537 |
538 | whatsAppPD.loc[
539 | (whatsAppPD[x].str.contains("Web address-Professional", na=False)),
540 | "BusinessWebsite",
541 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
542 |
543 | whatsAppPD.loc[
544 | (whatsAppPD[x].str.contains("Email-Professional", na=False)),
545 | "Business-Email",
546 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
547 |
548 | whatsappContactProcess(whatsAppPD)
549 |
550 | # Add IMEI Column
551 | whatsAppPD[originIMEI] = phoneData.IMEI
552 |
553 | # Remove working columns.
554 | exportCols = []
555 | for x in whatsAppPD.columns:
556 | if isinstance(x, str):
557 | exportCols.append(x)
558 | if debug:
559 | print(exportCols)
560 |
561 | # Export CSV
562 | print("Exporting {}-WHATSAPP.csv".format(phoneData.inFile))
563 | logging.info("Exporting Whatsapp from {}".format(phoneData.inFile))
564 | whatsAppPD[exportCols].to_csv(
565 | "{}-WHATSAPP.csv".format(phoneData.inFile), index=False
566 | )
567 |
568 |
569 | # ------- Argument parser for command line arguments -----------------------------------------
570 |
571 | if __name__ == "__main__":
572 | parser = argparse.ArgumentParser(
573 | description=__description__,
574 |         epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
575 | )
576 |
577 | parser.add_argument(
578 | "-f",
579 |         "--file",
580 | dest="inputFilename",
581 | help="Path to Excel Spreadsheet",
582 | required=False,
583 | )
584 |
585 | parser.add_argument(
586 | "-b",
587 | "--bulk",
588 | dest="bulk",
589 | required=False,
590 | action="store_true",
591 | help="Bulk process Excel spreadsheets in working directory.",
592 | )
593 |
594 | args = parser.parse_args()
595 |
596 | if len(sys.argv) == 1:
597 | parser.print_help()
598 | parser.exit()
599 |
600 | if args.bulk:
601 | print("Bulk Process")
602 | bulkProcessor()
603 |
604 | if args.inputFilename:
605 | if not os.path.exists(args.inputFilename):
606 | print(
607 | "Error: '{}' does not exist or is not a file.".format(
608 | args.inputFilename
609 | )
610 | )
611 | sys.exit(1)
612 | processMetadata(args.inputFilename)
613 |
--------------------------------------------------------------------------------
/clbExtract/clbExtract.py:
--------------------------------------------------------------------------------
1 | """
2 | Extracts nested contacts data from Cellebrite formatted Excel documents.
3 | - Cellebrite Stores contact details in multiline Excel cells.
4 |
5 | Formatted unapologetically with Black
6 |
7 | # Current Known Issues
8 | # FIXME - Fix Instagram parser, account IDs have changed
9 | # FIXME - Fix Og Signal parser, Column order
10 |
11 | Changelog
12 | 0.9 - Fix - Handles name change to Signal Private Messenger and extra data columns
13 | - prints version to command line
14 |     - Fix bug where files ending in .XLSX (caps) wouldn't be automatically found
15 | - Support for Outlook contacts
16 |
17 | 0.8 - Fix issue where Input file was entered twice for Instagram export sheet
18 |
19 | 0.7 - Add Provenance data column
20 |     - Fix issue where WhatsApp or Facebook may not export due to lack of 'Interaction Statuses' column
21 |
22 | 0.6 - Fix issue with Threema user ID attribution
23 | - Fix issue with parsers crashing out, now raises an exception and continues.
24 |
25 | 0.5 - Added support for recents - at this time this is kept separate from native contacts
26 | - Warning re large files, pandas is unable to provide load time estimates
27 |     - Add option to normalise AU mobile numbers by converting +614** to 04**
28 | - Minor tidyups and fixes to logging.
29 | - Fix WeChat exception for older style excels
30 | - Fix Whatsapp exception when interaction status is not populated
31 | - Fix Exception when there is no IMEI entry at all, eg. older iPads
32 | - Populate and export source columns
33 |
34 | 0.4a - Added support for Cellebrite files with device info stored in "device" rather than name columns
35 |
36 | 0.4 - Add support for alternate Cellebrite info page format
37 | - Add support For Line, WeChat, Threema contacts
38 |
39 | 0.3 Complete rewrite
40 |
41 | 0.2 - Implement command line argument parser
42 | Allow bulk processing of all items in directory
43 |
44 | 0.1 - Initial concept
45 |
46 | """
47 |
48 | import argparse
49 | import glob
50 | import logging
51 | import os
52 | import pandas as pd
53 | from pathlib import Path
54 | import sys
55 |
56 | ## Details
57 | __description__ = 'Flattens Cellebrite formatted Excel files. "Contacts" and "Device Info" tabs are required.'
58 | __author__ = "facelessg00n"
59 | __version__ = "0.9"
60 |
61 | parser = argparse.ArgumentParser(
62 | description=__description__,
63 | epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
64 | )
65 |
66 | # ----------- Options -----------
67 | # Input and output files are resolved against the current working directory.
68 |
69 | # Show extra debug output
70 | debug = False
71 |
72 | # Normalise Australian mobile numbers by replacing +614 with 04
73 | ausNormal = True
74 |
75 | # File size warning (MB)
76 | warnSize = 50
77 |
78 |
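The `ausNormal` option above (see the 0.5 changelog entry) converts `+614`-prefixed mobiles to the local `04` form. A minimal sketch of that normalisation, assuming spaces and dashes have already been stripped as the parsers do:

```python
import pandas as pd

numbers = pd.Series(["+61412345678", "0412345678", "+6138765432"])

# Replace a leading +614 with 04; anything else passes through unchanged,
# including +613 landlines and numbers already in local form.
normalised = numbers.str.replace(r"^\+614", "04", regex=True)
```

Anchoring the pattern with `^` keeps a `+614` appearing mid-string from being rewritten.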
79 | # ----------- Logging options -------------------------------------
80 |
81 | logging.basicConfig(
82 | filename="clbExtract.log",
83 | format="%(asctime)s,- %(levelname)s - %(message)s",
84 | level=logging.INFO,
85 | )
86 |
87 |
88 | # Set names for sheets of interest
89 | clbPhoneInfo = "Device Info"
90 | clbContactSheet = "Contacts"
91 | clbPhoneInfov2 = "Device Information"
92 |
93 | # FIXME
94 | #### ---- Column names and other options ---------------------------------------------
95 | provenanceCols = ["WARRANT", "COLLECT", "EXAM", "NOTICE"]
96 |
97 | contactOutput = "ContactDetail"
98 | contactTypeOutput = "ContactType"
99 | originIMEI = "originIMEI"
100 | parsedApps = [
101 | "Facebook Messenger",
102 | "Instagram",
103 | "Line",
104 | "Native",
105 | "Outlook",
106 | "Recents",
107 | "Signal",
108 | "Signal Private Messenger",
109 | "Snapchat",
110 | "WhatsApp",
111 | "Telegram",
112 | "Threema",
113 | "WeChat",
114 | "Zalo",
115 | ]
116 |
117 |
118 | # Class object to hold phone and input file info
119 | class phoneData:
120 | IMEI = None
121 | IMEI2 = None
122 | inFile = None
123 | inPath = None
124 | inProvenance = None
125 |
126 | def __init__(
127 | self, IMEI=None, IMEI2=None, inFile=None, inPath=None, inProvenance=None
128 | ) -> None:
129 | self.IMEI = IMEI
130 | self.IMEI2 = IMEI2
131 | self.inFile = inFile
132 | self.inPath = inPath
133 | self.inProvenance = inProvenance
134 |
135 |
136 | # -------------Functions live here ------------------------------------------
137 |
138 | # ----- Bulk Excel Processor--------------------------------------------------
139 |
140 |
141 | # Finds and processes all Excel files in the working directory.
142 | def bulkProcessor(inputProvenance):
143 | FILE_PATH = os.getcwd()
144 |     inputFiles = sorted(set(glob.glob("*.xlsx") + glob.glob("*.XLSX")))
145 | print((str(len(inputFiles)) + " Excel files located. \n"))
146 | logging.info("Bulk processing {} files".format(str(len(inputFiles))))
147 | # If there are no files found exit the process.
148 |     if len(inputFiles) == 0:
149 |         print("No Excel files located.")
150 |         print("Exiting.")
151 |         sys.exit()
152 | else:
153 | for inputFile in inputFiles:
154 | if os.path.exists(inputFile):
155 | try:
156 | processMetadata(inputFile, inputProvenance)
157 |                 # Need to deal with '~$' Excel temp files.
158 | except FileNotFoundError:
159 | print("File does not exist or temp file detected")
160 | pass
161 | if debug:
162 | for inputFile in inputFiles:
163 | inputFilename = inputFile.split(".")[0]
164 | print(inputFilename)
165 |
166 |
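`bulkProcessor` above globs both case variants and can trip over Excel's `~$` lock files (hence the `FileNotFoundError` handler). A sketch of a discovery helper that filters them up front — `find_workbooks` is a hypothetical name, not part of this script:

```python
import glob
import os

def find_workbooks(path="."):
    """Locate Excel exports in `path`, skipping Excel's ~$ lock/temp files."""
    found = glob.glob(os.path.join(path, "*.xlsx")) + glob.glob(
        os.path.join(path, "*.XLSX")
    )
    # set() also guards against double counting on case-insensitive
    # filesystems, where both patterns can match the same file.
    return sorted(
        f for f in set(found) if not os.path.basename(f).startswith("~$")
    )
```

Filtering before the processing loop avoids raising and catching an exception per temp file.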
167 | # FIXME - Deal with error when this info is missing
168 | ### -------- Process phone metadata ------------------------------------------------------
169 | def processMetadata(inputFile, inputProvenance):
170 | inputFile = inputFile
171 | print("Input Provenance is {}".format(inputProvenance))
172 | print("Extracting metadata from {}".format(inputFile))
173 | logging.info("Extracting metadata from {}".format(inputFile))
174 |
175 | phoneData.inProvenance = inputProvenance
176 |
177 | fileSize = os.path.getsize(inputFile) / 1048576
178 | if fileSize > warnSize:
179 |         print(
180 |             "Large input file detected ({} MB); this may take some time to process. Progress cannot be reported while the file loads.".format(
181 |                 f"{fileSize:.2f}"
182 |             )
183 |         )
184 | else:
185 | print("Input file is {} MB".format(f"{fileSize:.2f}"))
186 |
187 | try:
188 | infoPD = pd.read_excel(
189 | inputFile, sheet_name=clbPhoneInfo, header=1, usecols="B,C,D"
190 | )
191 |
192 | try:
193 | phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][
194 | 0
195 | ]
196 | phoneData.inFile = Path(inputFile).stem
197 | phoneData.inPath = os.path.dirname(inputFile)
198 | phoneData.inProvenance = inputProvenance
199 |         except (KeyError, IndexError):
200 | print("Attempting Device Column")
201 | try:
202 | phoneData.IMEI = infoPD.loc[
203 | infoPD["Device"] == "IMEI", ["Value"]
204 | ].values[0][0]
205 | phoneData.inFile = Path(inputFile).stem
206 | phoneData.inPath = os.path.dirname(inputFile)
207 |             except (KeyError, IndexError):
208 | print("IMEI not located, setting to NULL")
209 | phoneData.IMEI = None
210 | phoneData.inFile = Path(inputFile).stem
211 | phoneData.inPath = os.path.dirname(inputFile)
212 |
213 | try:
214 | phoneData.IMEI2 = infoPD.loc[infoPD["Name"] == "IMEI2", ["Value"]].values[
215 | 0
216 | ][0]
217 |         except (KeyError, IndexError):
218 | phoneData.IMEI2 = None
219 | phoneData.inFile = Path(inputFile).stem
220 | phoneData.inPath = os.path.dirname(inputFile)
221 | # phoneData.inFile = inputFile.split(".")[0]
222 | phoneData.inFile = Path(inputFile).stem
223 | phoneData.inPath = os.path.dirname(inputFile)
224 |
225 | if debug:
226 | print(infoPD)
227 | print(phoneData.IMEI)
228 |
229 | except ValueError:
230 | print(
231 | "Info tab not found in {}, attempting with second format.".format(inputFile)
232 | )
233 | logging.exception(
234 | "No info tab found in {}, attempting with second format".format(inputFile)
235 | )
236 | try:
237 | infoPD = pd.read_excel(
238 | inputFile, sheet_name=clbPhoneInfov2, header=1, usecols="B,C,D"
239 | )
240 | # Remove leading whitespace from columns
241 | infoPD["Name"] = infoPD["Name"].str.strip()
242 | phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][
243 | 0
244 | ]
245 | print("Second format succeeded")
246 | logging.info("Second format succeeded on {}".format(inputFile))
247 |
248 | phoneData.inFile = Path(inputFile).stem
249 | phoneData.inPath = os.path.dirname(inputFile)
250 |
251 | except IndexError:
252 |             print("IMEI not located, is this a tablet or iPad?")
253 | logging.warning(
254 |                 "IMEI not found in {}, attempting with no IMEI".format(inputFile)
255 | )
256 | phoneData.IMEI = None
257 | phoneData.IMEI2 = None
258 | phoneData.inFile = Path(inputFile).stem
259 | phoneData.inPath = os.path.dirname(inputFile)
260 | print("Loaded {}, with no IMEI".format(inputFile))
261 | logging.info("Loaded {}, with no IMEI".format(inputFile))
262 | pass
263 |
264 | except ValueError:
265 | print(
266 |                 "\033[1;31m Info tab not found in {}, attempting with no IMEI".format(
267 | inputFile
268 | )
269 | )
270 | logging.warning(
271 |                 "Info tab not found in {}, attempting with no IMEI".format(
272 | inputFile
273 | )
274 | )
275 | phoneData.IMEI = None
276 | phoneData.IMEI2 = None
277 | # phoneData.inFile = inputFile.split(".")[0]
278 | phoneData.inFile = Path(inputFile).stem
279 | phoneData.inPath = os.path.dirname(inputFile)
280 | print("\033[1;31m Loaded {}, with no IMEI".format(inputFile))
281 | logging.info("Loaded {}, with no IMEI".format(inputFile))
282 | pass
283 |
284 |     try:
285 |         processContacts(inputFile)
286 |     except ValueError:
287 |         print("\033[1;31m No Contacts tab found, is this a correctly formatted Excel?")
288 |         logging.error(
289 |             "No Contacts tab found in {}, is this a correctly formatted Excel?".format(
290 |                 inputFile
291 |             )
292 |         )
293 |     except Exception as e:
294 |         print(e)
295 |
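The nested try/except chain above reduces to one pattern: match a label in the info sheet's `Name` column and take the first `Value`, falling back to `None` when the row or column is absent. A toy sketch with an in-memory frame (`lookup` and the sample values are illustrative, not part of the script):

```python
import pandas as pd

# Stand-in for the "Device Info" sheet; real files come from pd.read_excel.
info = pd.DataFrame(
    {
        "Name": ["Device Model", "IMEI", "IMEI2"],
        "Value": ["iPhone 12", "356789100000001", "356789100000002"],
    }
)

def lookup(frame, label):
    """Return the Value for a Name label, or None when the row is absent."""
    try:
        # KeyError: the Name column itself is missing from this sheet format.
        # IndexError: the column exists but no row carries this label.
        return frame.loc[frame["Name"] == label, "Value"].values[0]
    except (KeyError, IndexError):
        return None

imei = lookup(info, "IMEI")
imei2 = lookup(info, "IMEI2")
serial = lookup(info, "Serial")  # missing label falls back to None
```

Centralising the fallback keeps tablet and iPad extractions (which have no IMEI at all) from crashing the metadata pass.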
296 |
297 | ### Extract contacts tab of Excel file -------------------------------------------------------------------
298 | # This creates the initial dataframe, future processing is from copies of this dataframe.
299 | def processContacts(inputFile):
300 | inputFile = inputFile
301 | fileSize = os.path.getsize(inputFile) / 1048576
302 | print("Processing contacts in {} has begun.".format(phoneData.inFile))
303 | logging.info("Processing contacts in {} has begun.".format(phoneData.inFile))
304 |
305 | if fileSize > warnSize:
306 |         print(
307 |             "Large input file detected ({} MB); this may take some time to process. Progress cannot be reported while the file loads.".format(
308 |                 f"{fileSize:.2f}"
309 |             )
310 |         )
311 | else:
312 | print("Input file is {} MB".format(f"{fileSize:.2f}"))
313 |
314 | # Record input filename for use in export processes.
315 | if debug:
316 | print("\033[0;37m Input file is : {}".format(phoneData.inFile))
317 |
318 | contactsPD = pd.read_excel(
319 | inputFile,
320 | sheet_name=clbContactSheet,
321 | header=1,
322 | index_col="#",
323 | usecols=["#", "Name", "Entries", "Source", "Account"],
324 | )
325 |
326 | print("\033[0mProcessing the following app types for : {}".format(phoneData.inFile))
327 | applist = contactsPD["Source"].unique()
328 | for x in applist:
329 | if x in parsedApps:
330 | print("{} : \u2713 ".format(x))
331 |
332 | else:
333 | print("{} : \u2716".format(x))
334 |
335 | # Process native contacts
336 | try:
337 | processAppleNative(contactsPD)
338 | except Exception as e:
339 | print("Processing native contacts failed")
340 | print(e)
341 | pass
342 |
343 | # Process Apps
344 | for x in applist:
345 | if x == "Facebook Messenger":
346 | try:
347 | processFacebookMessenger(contactsPD)
348 | except Exception as e:
349 | logging.warning("Failed to parse Facebook messenger - {}".format(e))
350 | pass
351 | if x == "Instagram":
352 | try:
353 | processInstagram(contactsPD)
354 |             except Exception as e:
355 |                 logging.warning("Failed to parse Instagram - {}".format(e))
356 | pass
357 | if x == "Line":
358 | try:
359 | processLine(contactsPD)
360 |             except Exception as e:
361 |                 logging.warning("Failed to parse Line - {}".format(e))
362 | pass
363 | if x == "Outlook":
364 | try:
365 | processOutlookContacts(contactsPD)
366 |             except Exception as e:
367 |                 logging.warning("Failed to parse Outlook - {}".format(e))
368 | pass
369 | if x == "Recents":
370 | try:
371 | processRecents(contactsPD)
372 |             except Exception as e:
373 |                 logging.warning("Failed to parse Recents - {}".format(e))
374 | pass
375 | if x == "Snapchat":
376 | try:
377 | processSnapChat(contactsPD)
378 |             except Exception as e:
379 |                 logging.warning("Failed to parse Snapchat - {}".format(e))
380 | pass
381 | if x == "Telegram":
382 | try:
383 | processTelegram(contactsPD)
384 |             except Exception as e:
385 |                 logging.warning("Failed to parse Telegram - {}".format(e))
386 | pass
387 | if x == "Threema":
388 | try:
389 | processThreema(contactsPD)
390 |             except Exception as e:
391 |                 logging.warning("Failed to parse Threema - {}".format(e))
392 | pass
393 | if x == "Signal":
394 | try:
395 | processSignal(contactsPD)
396 |             except Exception as e:
397 |                 logging.warning("Failed to parse Signal - {}".format(e))
398 | pass
399 | if x == "Signal Private Messenger":
400 | try:
401 | processSignalPrivateMessenger(contactsPD)
402 |             except Exception as e:
403 |                 logging.warning("Failed to parse Signal Private Messenger - {}".format(e))
404 |
405 | if x == "WeChat":
406 | try:
407 | processWeChat(contactsPD)
408 |             except Exception as e:
409 |                 logging.warning("Failed to parse WeChat - {}".format(e))
410 | pass
411 | if x == "WhatsApp":
412 | try:
413 | processWhatsapp(contactsPD)
414 | except Exception as e:
415 | logging.warning("Failed to parse WhatsApp - {}".format(e))
416 | pass
417 | if x == "Zalo":
418 | try:
419 | processZalo(contactsPD)
420 |             except Exception as e:
421 |                 logging.warning("Failed to parse Zalo - {}".format(e))
422 | pass
423 |
424 | print("\nProcessing of {} complete".format(inputFile))
425 |
426 |
427 | # ------ Parse Facebook Messenger --------------------------------------------------------------
428 | def processFacebookMessenger(contactsPD):
429 | print("\nProcessing Facebook Messenger")
430 |     facebookMessengerPD = contactsPD[contactsPD["Source"] == "Facebook Messenger"].copy()
431 | facebookMessengerPD = facebookMessengerPD.drop("Entries", axis=1).join(
432 | facebookMessengerPD["Entries"].str.split("\n", expand=True)
433 | )
434 | facebookMessengerPD = facebookMessengerPD.reset_index(drop=True)
435 |
436 | selected_cols = []
437 | for x in facebookMessengerPD.columns:
438 | if isinstance(x, int):
439 | selected_cols.append(x)
440 |
441 | def phoneCheck(facebookMessengerPD):
442 | for x in selected_cols:
443 | facebookMessengerPD.loc[
444 | (facebookMessengerPD[x].str.contains("User ID-Facebook Id", na=False)),
445 | "Account ID",
446 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
447 | facebookMessengerPD.loc[
448 | (facebookMessengerPD[x].str.contains("User ID-Username", na=False)),
449 | "User Name",
450 | ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
451 |
452 | phoneCheck(facebookMessengerPD)
453 |
454 | facebookMessengerPD["Source"] = "Messenger"
455 | facebookMessengerPD[originIMEI] = phoneData.IMEI
456 | facebookMessengerPD["inputFile"] = phoneData.inFile
457 | facebookMessengerPD["Provenance"] = phoneData.inProvenance
458 |
459 | exportCols = []
460 | for x in facebookMessengerPD.columns:
461 | if isinstance(x, str):
462 | exportCols.append(x)
463 | print(
464 | "{} user accounts located".format(len(facebookMessengerPD["Account"].unique()))
465 | )
466 | print("{} contacts located".format(len(facebookMessengerPD["Account ID"].unique())))
467 | print("Exporting {}-FB-MESSENGER.csv".format(phoneData.inFile))
468 | logging.info("Exporting FB messenger from {}".format(phoneData.inFile))
469 | try:
470 | facebookMessengerPD[exportCols].to_csv(
471 | "{}-FB-MESSENGER.csv".format(phoneData.inFile),
472 | index=False,
473 | )
474 | except Exception as e:
475 | print(e)
476 |
477 |
478 | # ----- Parse Instagram data ------------------------------------------------------------------
479 | def processInstagram(contactsPD):
480 | print("\nProcessing Instagram")
481 | instagramPD = contactsPD[contactsPD["Source"] == "Instagram"].copy()
482 | instagramPD = instagramPD.drop("Entries", axis=1).join(
483 | instagramPD["Entries"].str.split("\n", expand=True)
484 | )
485 |
486 | selected_cols = []
487 | for x in instagramPD.columns:
488 | if isinstance(x, int):
489 | selected_cols.append(x)
490 |
491 | def instaContacts(instagramPD):
492 | for x in selected_cols:
493 | instagramPD.loc[
494 | (instagramPD[x].str.contains("User ID-Username", na=False)), "User Name"
495 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
496 | instagramPD.loc[
497 | (instagramPD[x].str.contains("User ID-Instagram Id", na=False)),
498 | "Instagram ID",
499 | ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
500 |
501 | instaContacts(instagramPD)
502 |
503 | instagramPD[originIMEI] = phoneData.IMEI
504 | instagramPD["inputFile"] = phoneData.inFile
505 |
506 | exportCols = []
507 | for x in instagramPD.columns:
508 | if isinstance(x, str):
509 | exportCols.append(x)
510 | print("{} Instagram contacts located".format(len(instagramPD["Name"])))
511 | print("Exporting {}-INSTAGRAM.csv".format(phoneData.inFile))
512 | logging.info("Exporting Instagram from {}".format(phoneData.inFile))
513 | # TODO - Fix column handling
514 | instagramPD[exportCols].to_csv(
515 | "{}-INSTAGRAM.csv".format(phoneData.inFile),
516 | index=False,
517 | )
518 |
519 |
520 | # ---- Process Line -----------------------------------------------------------------------
521 | def processLine(contactsPD):
522 |     print("\nProcessing Line")
523 | linePD = contactsPD[contactsPD["Source"] == "Line"].copy()
524 | linePD = linePD.drop("Entries", axis=1).join(
525 | linePD["Entries"].str.split("\n", expand=True)
526 | )
527 | linePD = linePD.reset_index(drop=True)
528 |
529 | selected_cols = []
530 | for x in linePD.columns:
531 | if isinstance(x, int):
532 | selected_cols.append(x)
533 |
534 |     def lineContacts(LinePD):
535 |         for x in selected_cols:
536 |             LinePD.loc[
537 |                 (LinePD[x].str.contains("User ID-Address Book Name:", na=False)),
538 |                 "LineAddressBook",
539 |             ] = LinePD[x].str.split(":", n=1, expand=True)[1]
540 |
541 |             LinePD.loc[
542 |                 (LinePD[x].str.contains("User ID-User ID:", na=False)),
543 |                 "LineUserID",
544 |             ] = LinePD[x].str.split(":", n=1, expand=True)[1]
545 |             LinePD.loc[
546 |                 (LinePD[x].str.contains("User ID-Server:", na=False)),
547 |                 "LineServerID",
548 |             ] = LinePD[x].str.split(":", n=1, expand=True)[1]
549 |
550 |     lineContacts(linePD)
551 |
552 | linePD[originIMEI] = phoneData.IMEI
553 | linePD["inputFile"] = phoneData.inFile
554 | exportCols = []
555 |
556 | for x in linePD.columns:
557 | if isinstance(x, str):
558 | exportCols.append(x)
559 |
560 | print("{} Line contacts located".format(len(linePD["Name"])))
561 | print("Exporting {}-LINE.csv".format(phoneData.inFile))
562 | logging.info("Exporting Line contacts from {}".format(phoneData.inFile))
563 | linePD[exportCols].to_csv("{}-LINE.csv".format(phoneData.inFile), index=False)
564 |
565 |
566 | # ------------Process native contact list ------------------------------------------------
567 | def processAppleNative(contactsPD):
568 |
569 | print("\nProcessing Native Contacts")
571 |
572 |     # Native contacts have a null Source on iPhone extractions, or "Phone" on Android.
573 | nativeContactsPD = contactsPD[
574 | (contactsPD.Source.isna()) | (contactsPD.Source == "Phone")
575 | ].copy()
576 |
577 | # Fill NaN values with : to prevent error with blank entries.
578 | nativeContactsPD.Entries = nativeContactsPD.Entries.fillna(":")
579 |
580 | nativeContactsPD = nativeContactsPD.drop("Entries", axis=1).join(
581 | nativeContactsPD["Entries"]
582 | .str.split("\n", expand=True)
583 | .stack()
584 | .reset_index(level=1, drop=True)
585 | .rename("Entries")
586 | )
587 |
588 | # nativeContactsPD = nativeContactsPD[["Name", "Interaction Statuses", "Entries"]]
589 |
590 | nativeContactsPD = nativeContactsPD[
591 | nativeContactsPD["Entries"].str.contains(r"Phone-")
592 | ]
593 | nativeContactsPD[originIMEI] = phoneData.IMEI
594 | nativeContactsPD["inputFile"] = phoneData.inFile
595 | nativeContactsPD["Provenance"] = phoneData.inProvenance
596 |
597 |     # Strip formatting characters from phone numbers.
598 |     # TODO Use a single regex to tidy this up.
599 | nativeContactsPD["Entries"] = (
600 | nativeContactsPD["Entries"]
601 | .str.split(":", n=1, expand=True)[1]
602 | .str.strip()
603 | .str.replace(" ", "", regex=False)
604 | .str.replace("-", "", regex=False)
605 | .str.replace("+", "", regex=False)
606 |         # Fix issue with Cellebrite Inseyets reports
607 | .str.replace("Message", "", regex=False)
608 | .str.replace("(", "", regex=False)
609 | .str.replace(")", "", regex=False)
610 | )
611 |
612 | if ausNormal:
613 | nativeContactsPD["Entries"] = nativeContactsPD["Entries"].str.replace(
614 | r"\+614", "04", regex=True
615 | )
616 |
617 | if debug:
618 | print(nativeContactsPD)
619 |
620 | # nativeContactsPD = nativeContactsPD[[originIMEI, "Name", "Entries", "Interaction Statuses"]]
621 | print("{} contacts located.".format(len(nativeContactsPD)))
622 | print("Exporting {}-NATIVE.csv".format(phoneData.inFile))
623 | logging.info("Exporting Native contacts from {}".format(phoneData.inFile))
624 | nativeContactsPD.to_csv("{}-NATIVE.csv".format(phoneData.inFile), index=False)
625 |
626 |
# ------ Process Outlook Contacts ---------------------------------------------------------------
628 | def processOutlookContacts(contactsPD):
629 | print("\nProcessing Outlook Contacts")
630 |
631 | outlookContactsPD = contactsPD[(contactsPD.Source == "Outlook")].copy()
632 | # Fill NaN values with : to prevent error with blank entries.
633 | outlookContactsPD.Entries = outlookContactsPD.Entries.fillna(":")
634 |
635 | outlookContactsPD = outlookContactsPD.drop("Entries", axis=1).join(
636 | outlookContactsPD["Entries"]
637 | .str.split("\n", expand=True)
638 | .stack()
639 | .reset_index(level=1, drop=True)
640 | .rename("Entries")
641 | )
642 |
643 | outlookContactsPD = outlookContactsPD[["Account", "Name", "Entries", "Source"]]
644 | outlookContactsPD[originIMEI] = phoneData.IMEI
645 | outlookContactsPD["inputFile"] = phoneData.inFile
646 | outlookContactsPD["Provenance"] = phoneData.inProvenance
647 |
648 | outlookContactsPD["Entries"] = (
649 | outlookContactsPD["Entries"].str.split(":", n=1, expand=True)[1].str.strip()
650 | )
651 |
652 | print("{} contacts located.".format(len(outlookContactsPD)))
653 | print("Exporting {}-OUTLOOK.csv".format(phoneData.inFile))
654 |     logging.info("Exporting Outlook contacts from {}".format(phoneData.inFile))
655 | outlookContactsPD.to_csv("{}-OUTLOOK.csv".format(phoneData.inFile), index=False)
656 |
657 |
658 | # ----------- Parse Recents -----------------------------------------------------------------------
659 | def processRecents(contactsPD):
660 | print("\nProcessing Recents")
661 | recentsPD = contactsPD[contactsPD["Source"] == "Recents"].copy()
662 | recentsPD.Entries = recentsPD.Entries.fillna(":")
663 | recentsPD = recentsPD[recentsPD["Entries"].str.contains(r"Phone-")]
664 |
665 | recentsPD[originIMEI] = phoneData.IMEI
666 | recentsPD["inputFile"] = phoneData.inFile
667 | recentsPD["Provenance"] = phoneData.inProvenance
668 |
669 | recentsPD["Entries"] = (
670 | recentsPD["Entries"]
671 | .str.split(":", n=1, expand=True)[1]
672 | .str.strip()
673 | .str.replace(" ", "")
674 | .str.replace("-", "")
675 | # .str.replace("+","",regex=False)
676 | )
677 | if ausNormal:
678 | recentsPD["Entries"] = recentsPD["Entries"].str.replace(
679 | r"\+614", "04", regex=True
680 | )
681 |
682 | print("{} recent contacts located.".format(len(recentsPD)))
683 |     print("Exporting {}-RECENTS.csv".format(phoneData.inFile))
684 | logging.info("Exporting recent contacts from {}".format(phoneData.inFile))
685 | recentsPD.to_csv("{}-RECENTS.csv".format(phoneData.inFile), index=False)
686 |
687 |
688 | # ------------Parse Signal contacts ---------------------------------------------------------------
689 | def processSignal(contactsPD):
690 | print("\nProcessing Signal Contacts")
691 | signalPD = contactsPD[contactsPD["Source"] == "Signal"].copy()
692 | signalPD = signalPD[["Name", "Entries", "Source"]]
693 | signalPD = signalPD.drop("Entries", axis=1).join(
694 | signalPD["Entries"].str.split("\n", expand=True)
695 | )
696 |
697 |     # Data is expanded into columns with integer names; add these columns to selected_cols so we can search them later.
698 | selected_cols = []
699 | for x in signalPD.columns:
700 | if isinstance(x, int):
701 | selected_cols.append(x)
702 |
703 |     # FIXME improve with method used for other apps
704 |     # Signal can store multiple values under Entries, such as Mobile Number:
705 |     # so we break them all out into columns.
706 | def signalContact(signalPD):
707 | for x in selected_cols:
708 | # Locate Signal Username and move to Username Column
709 | signalPD.loc[
710 | (signalPD[x].str.contains("User ID-Username:", na=False)),
711 | "User Name",
712 | ] = signalPD[x].str.split(":", n=1, expand=True)[1]
713 |             # Delete Username entry from original location
714 | signalPD.loc[
715 | signalPD[x].str.contains("User ID-Username:", na=False), [x]
716 | ] = ""
717 |             # Delete everything before the colon
718 | signalPD[x] = signalPD[x].str.split(":", n=1, expand=True)[1].str.strip()
719 |
720 | signalContact(signalPD)
721 |
722 | signalPD[originIMEI] = phoneData.IMEI
723 | signalPD["inputFile"] = phoneData.inFile
724 | signalPD["Provenance"] = phoneData.inProvenance
725 |
726 | export_cols = [originIMEI, "Name", "User Name"]
727 | export_cols.extend(selected_cols)
728 | print("Located {} Signal contacts".format(len(signalPD["Name"])))
729 | print("Exporting {}-SIGNAL.csv".format(phoneData.inFile))
730 |     logging.info("Exporting Signal contacts from {}".format(phoneData.inFile))
731 | signalPD.to_csv(
732 | "{}-SIGNAL.csv".format(phoneData.inFile), index=False, columns=export_cols
733 | )
734 |
735 |
736 | # ----------- Parse Signal Private Messenger--------------------------------------------------------
737 | def processSignalPrivateMessenger(contactsPD):
738 | print("\nProcessing Signal Private Messenger")
739 | spmPD = contactsPD[contactsPD["Source"] == "Signal Private Messenger"].copy()
740 | spmPD = spmPD.drop("Entries", axis=1).join(
741 | spmPD["Entries"].str.split("\n", expand=True)
742 | )
743 | # spmPD['Entries'].tolist()
744 | # spmPD.explode('Entries')
745 | # spmPD = spmPD.reset_index(drop=True)
746 | spmPD[originIMEI] = phoneData.IMEI
747 | spmPD["inputFile"] = phoneData.inFile
748 | spmPD["Provenance"] = phoneData.inProvenance
749 |
750 | selected_cols = []
751 | for x in spmPD.columns:
752 | if isinstance(x, int):
753 | selected_cols.append(x)
754 |
755 | def spmContacts(spmPD):
756 | for x in selected_cols:
757 | try:
758 | spmPD.loc[(spmPD[x].str.contains("Phone-:", na=False)), "Phone"] = (
759 | spmPD[x].str.split(":", n=1, expand=True)[1]
760 | )
761 |             except Exception:
762 | pass
763 | try:
764 | spmPD.loc[(spmPD[x].str.contains("User ID-:", na=False)), "User-ID"] = (
765 | spmPD[x].str.split(":", n=1, expand=True)[1]
766 | )
767 |             except Exception:
768 | pass
769 | try:
770 | spmPD.loc[
771 | (spmPD[x].str.contains("User ID-Nickname:", na=False)),
772 | "User-ID-Nickname",
773 | ] = spmPD[x].str.split(":", n=1, expand=True)[1]
774 |             except Exception:
775 | pass
776 | try:
777 | spmPD.loc[
778 | (spmPD[x].str.contains("User ID-Username:", na=False)),
779 | "User-ID-Username",
780 | ] = spmPD[x].str.split(":", n=1, expand=True)[1]
781 |             except Exception:
782 | pass
783 | try:
784 | spmPD.loc[
785 | (spmPD[x].str.contains("User ID-ProfileKey:", na=False)),
786 | "User-ID-ProfileKey",
787 | ] = spmPD[x].str.split(":", n=1, expand=True)[1]
788 |             except Exception:
789 | pass
790 |
791 | spmContacts(spmPD)
792 | # spmPD.info()
793 |
794 | exportCols = []
795 |     # Collect string-named columns, moving the provenance fields to the end of the export.
796 | for x in spmPD.columns:
797 | if isinstance(x, str):
799 | if x not in ["Provenance", "originIMEI", "inputFile"]:
800 | exportCols.append(x)
801 |
802 | exportCols.extend(["originIMEI", "inputFile", "Provenance"])
803 |
804 | print("Located {} Signal Private Messenger contacts.".format(len(spmPD["Name"])))
805 | print("Exporting {}-Signal-PM.csv".format(phoneData.inFile))
806 | logging.info("Exporting Signal Private Messenger from {}".format(phoneData.inFile))
807 | spmPD[exportCols].to_csv("{}-Signal-PM.csv".format(phoneData.inFile), index=False)
808 |
809 |
810 | # ----------- Parse Snapchat data ------------------------------------------------------------------
811 | def processSnapChat(contactsPD):
812 | print("\nProcessing Snapchat")
813 |     snapPD = contactsPD[contactsPD["Source"] == "Snapchat"].copy()
814 | snapPD = snapPD[["Name", "Entries", "Source"]]
815 |
816 | # Extract nested entities
817 | snapPD = snapPD.drop("Entries", axis=1).join(
818 | snapPD["Entries"].str.split("\n", expand=True)
819 | )
820 | selected_cols = []
821 | for x in snapPD.columns:
822 | if isinstance(x, int):
823 | selected_cols.append(x)
824 |
825 | def snapContacts(snapPD):
826 | for x in selected_cols:
827 | snapPD.loc[
828 | (snapPD[x].str.contains("User ID-Username", na=False)), "User Name"
829 | ] = snapPD[x].str.split(":", n=1, expand=True)[1]
830 | snapPD.loc[
831 | (snapPD[x].str.contains("User ID-User ID", na=False)), "User ID"
832 | ] = snapPD[x].str.split(":", n=1, expand=True)[1]
833 |
834 | snapContacts(snapPD)
835 |
836 | snapPD[originIMEI] = phoneData.IMEI
837 | snapPD["inputFile"] = phoneData.inFile
838 | snapPD["Provenance"] = phoneData.inProvenance
839 |
840 | exportCols = []
841 | for x in snapPD.columns:
842 | if isinstance(x, str):
843 | exportCols.append(x)
844 | if debug:
845 | print(snapPD[exportCols])
846 |
847 | print("{} Snapchat contacts located.".format(len(snapPD)))
848 | print("Exporting {}-SNAPCHAT.csv".format(phoneData.inFile))
849 | logging.info("Exporting Snapchat from {}".format(phoneData.inFile))
850 | snapPD[exportCols].to_csv(
851 | "{}-SNAPCHAT.csv".format(phoneData.inFile),
852 | index=False,
853 | columns=[
854 | originIMEI,
855 | "Name",
856 | "User Name",
857 | "User ID",
858 | "Source",
859 | "inputFile",
860 | "Provenance",
861 | ],
862 | )
863 |
864 |
865 | # ---- Parse Telegram Contacts--------------------------------------------------------------
866 | def processTelegram(contactsPD):
867 | print("\nProcessing Telegram")
868 | telegramPD = contactsPD[contactsPD["Source"] == "Telegram"].copy()
869 | telegramPD = telegramPD.drop("Entries", axis=1).join(
870 | telegramPD["Entries"].str.split("\n", expand=True)
871 | )
872 | telegramPD = telegramPD.reset_index(drop=True)
873 |
874 | selected_cols = []
875 | for x in telegramPD.columns:
876 | if isinstance(x, int):
877 | selected_cols.append(x)
878 |
879 | def phoneCheck(telegramPD):
880 | for x in selected_cols:
881 | telegramPD.loc[
882 | (telegramPD[x].str.contains("Phone-", na=False)), "Phone-Number"
883 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
884 |
885 | telegramPD.loc[
886 | (telegramPD[x].str.contains("User ID-Peer", na=False)), "Peer-ID"
887 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
888 |
889 | telegramPD.loc[
890 | (telegramPD[x].str.contains("User ID-Username", na=False)), "User-Name"
891 | ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
892 |
893 | phoneCheck(telegramPD)
894 |
895 | telegramPD[originIMEI] = phoneData.IMEI
896 | telegramPD["inputFile"] = phoneData.inFile
897 | telegramPD["Provenance"] = phoneData.inProvenance
898 |     telegramPD["Source"] = "Telegram"
899 | exportCols = []
900 | for x in telegramPD.columns:
901 | if isinstance(x, str):
902 | exportCols.append(x)
903 | # Export CSV
904 | print("{} Telegram contacts located.".format(len(telegramPD)))
905 | print("Exporting {}-TELEGRAM.csv".format(phoneData.inFile))
906 | logging.info("Exporting Telegram from {}".format(phoneData.inFile))
907 | telegramPD[exportCols].to_csv(
908 | "{}-TELEGRAM.csv".format(phoneData.inFile), index=False
909 | )
910 |
911 |
912 | # ------ Parse Threema Contacts -----------------------------------------------------------------
913 | def processThreema(contactsPD):
914 | print("\nProcessing Threema")
915 | threemaPD = contactsPD[contactsPD["Source"] == "Threema"].copy()
916 | threemaPD = threemaPD.drop("Entries", axis=1).join(
917 | threemaPD["Entries"].str.split("\n", expand=True)
918 | )
919 | threemaPD = threemaPD.reset_index(drop=True)
920 |
921 | selected_cols = []
922 | for x in threemaPD.columns:
923 | if isinstance(x, int):
924 | selected_cols.append(x)
925 |
926 | def ThreemaParse(ThreemaPD):
927 | for x in selected_cols:
928 | try:
929 | ThreemaPD.loc[
930 | (ThreemaPD[x].str.contains("User ID-identity:", na=False)),
931 | "Threema ID",
932 | ] = ThreemaPD[x].str.split(":", n=1, expand=True)[1]
933 |             except Exception:
934 | pass
935 | try:
936 | ThreemaPD.loc[
937 | (ThreemaPD[x].str.contains("User ID-Username:", na=False)),
938 | "ThreemaUsername",
939 | ] = ThreemaPD[x].str.split(":", n=1, expand=True)[1]
940 |             except Exception:
941 | pass
942 |
943 | ThreemaParse(threemaPD)
944 |
945 | threemaPD[originIMEI] = phoneData.IMEI
946 | threemaPD["inputFile"] = phoneData.inFile
947 | threemaPD["Provenance"] = phoneData.inProvenance
948 |
949 | exportCols = []
950 | for x in threemaPD.columns:
951 | if isinstance(x, str):
952 | exportCols.append(x)
953 |
954 | print("Exporting {}-THREEMA.csv".format(phoneData.inFile))
955 | logging.info("Exporting Threema from {}".format(phoneData.inFile))
956 | threemaPD[exportCols].to_csv("{}-THREEMA.csv".format(phoneData.inFile), index=False)
957 |
958 |
959 | # ------ Parse WeChat Contacts -------------------------------------------------------------------
960 | def processWeChat(contactsPD):
961 | print("\nProcessing WeChat")
962 | WeChatPD = contactsPD[contactsPD["Source"] == "WeChat"].copy()
963 | WeChatPD = WeChatPD.drop("Entries", axis=1).join(
964 | WeChatPD["Entries"].str.split("\n", expand=True)
965 | )
966 |
967 | WeChatPD = WeChatPD.reset_index(drop=True)
968 |
969 | selected_cols = []
970 | for x in WeChatPD.columns:
971 | if isinstance(x, int):
972 | selected_cols.append(x)
973 |
974 | def WeChatContacts(WeChatPD):
975 |
976 | for x in selected_cols:
977 | # FIXME Usernames that contain @stranger???
978 | # FIXME Try / Except / Pass
979 |
980 | try:
981 | WeChatPD.loc[
982 | (WeChatPD[x].str.contains("User ID-WeChat ID:", na=False)),
983 | "WeChatID",
984 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
985 |             except Exception:
986 | pass
987 |
988 | try:
989 | WeChatPD.loc[
990 | (WeChatPD[x].str.contains("User ID-QQ:", na=False)), "QQ User ID"
991 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
992 |             except Exception:
993 | pass
994 |
995 | try:
996 | WeChatPD.loc[
997 | (WeChatPD[x].str.contains("User ID-Username:", na=False)),
998 | "Username",
999 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
1000 |             except Exception:
1001 | pass
1002 |
1003 | try:
1004 | WeChatPD.loc[
1005 | (WeChatPD[x].str.contains("User ID-LinkedIn ID:", na=False)),
1006 | "LinkedIn ID",
1007 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
1008 |             except Exception:
1009 | pass
1010 |
1011 | try:
1012 | WeChatPD.loc[
1013 | (WeChatPD[x].str.contains("User ID-Facebook ID:", na=False)),
1014 | "Facebook ID",
1015 | ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
1016 |             except Exception:
1017 | pass
1018 |
1019 | WeChatContacts(WeChatPD)
1020 |
1021 |     # Replace WeChat IDs containing "@stranger" with blanks, as these are not valid WeChat user IDs.
1022 |     try:
1023 |         WeChatPD.WeChatID = WeChatPD.WeChatID.apply(
1024 |             lambda x: "" if "@stranger" in str(x) else x
1025 |         )
1026 |     except Exception:
1027 |         print("WeChat float exception")
1028 |         print(WeChatPD.WeChatID)
1029 |         pass
1030 |
1031 | WeChatPD[originIMEI] = phoneData.IMEI
1032 | WeChatPD["inputFile"] = phoneData.inFile
1033 | WeChatPD["Provenance"] = phoneData.inProvenance
1034 | WeChatPD["Source"] = "Weixin"
1035 |
1036 | # Export Columns where the title is a string to drop working columns
1037 | exportCols = []
1038 | for x in WeChatPD.columns:
1039 | if isinstance(x, str):
1040 | exportCols.append(x)
1041 | print("Located {} WeChat contacts.".format(len(WeChatPD["WeChatID"])))
1042 | print("Exporting {}-WECHAT.csv".format(phoneData.inFile))
1043 | logging.info("Exporting WeChat from {}".format(phoneData.inFile))
1044 | WeChatPD[exportCols].to_csv("{}-WECHAT.csv".format(phoneData.inFile), index=False)
1045 |
1046 |
1047 | # ---Parse Whatsapp Contacts----------------------------------------------------------------------
1048 | # Load WhatsApp
1049 | def processWhatsapp(contactsPD):
1050 | print("\nProcessing WhatsApp")
1051 | whatsAppPD = contactsPD[contactsPD["Source"] == "WhatsApp"].copy()
1052 | try:
1053 | whatsAppPD = whatsAppPD[["Name", "Entries", "Source", "Interaction Statuses"]]
1054 | # Datatype needs to be object not float to allow filtering by string without throwing an error
1055 | whatsAppPD["Interaction Statuses"] = whatsAppPD["Interaction Statuses"].astype(
1056 | object
1057 | )
1058 | # Shared contacts are not associated with a Whats app ID and cause problems.
1059 |         # print(whatsAppPD.dtypes)
1060 |         whatsAppPD = whatsAppPD[
1061 |             ~whatsAppPD["Interaction Statuses"].str.contains("Shared", na=False)
1062 |         ]
1063 | except Exception as e:
1064 | print(e)
1065 | print("Interaction statuses column not found, ignoring")
1066 | # print(whatsAppPD)
1067 | whatsAppPD = whatsAppPD[
1068 | [
1069 | "Name",
1070 | "Entries",
1071 | "Source",
1072 | ]
1073 | ]
1074 |
1075 | # Unpack nested data
1076 | whatsAppPD = whatsAppPD.drop("Entries", axis=1).join(
1077 | whatsAppPD["Entries"].str.split("\n", expand=True)
1078 | )
1079 |
1080 |     # Data is expanded into columns with integer names; check for these columns and add them
1081 |     # to a list to allow for different width sheets.
1082 | colList = list(whatsAppPD)
1083 | selected_cols = []
1084 | for x in colList:
1085 | if isinstance(x, int):
1086 | selected_cols.append(x)
1087 |
1088 | # Look for data across expanded columns and shift it to output columns.
1089 | def whatsappContactProcess(whatsAppPD):
1090 | for x in selected_cols:
1091 | whatsAppPD.loc[
1092 | (whatsAppPD[x].str.contains("Phone-Mobile", na=False)), "Phone-Mobile"
1093 | ] = (
1094 | whatsAppPD[x]
1095 | .str.split(":", n=1, expand=True)[1]
1096 | .str.replace(" ", "")
1097 | .str.replace("-", "")
1098 | )
1099 |
1100 | whatsAppPD.loc[
1101 | (whatsAppPD[x].str.contains("Phone-:", na=False)), "Phone"
1102 | ] = (
1103 | whatsAppPD[x]
1104 | .str.split(":", n=1, expand=True)[1]
1105 | .str.replace(" ", "")
1106 | .str.replace("-", "")
1107 | )
1108 |
1109 | whatsAppPD.loc[
1110 | (whatsAppPD[x].str.contains("Phone-Home:", na=False)), "Phone-Home"
1111 | ] = (
1112 | whatsAppPD[x]
1113 | .str.split(":", n=1, expand=True)[1]
1114 | .str.replace(" ", "")
1115 | .str.replace("-", "")
1116 | )
1117 |
1118 | whatsAppPD.loc[
1119 | (whatsAppPD[x].str.contains("User ID-Push Name", na=False)), "Push-ID"
1120 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1121 |
1122 | whatsAppPD.loc[
1123 | (whatsAppPD[x].str.contains("User ID-Id", na=False)), "Id-ID"
1124 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1125 |
1126 | whatsAppPD.loc[
1127 | (whatsAppPD[x].str.contains("User ID-WhatsApp User Id", na=False)),
1128 | "WhatsApp-ID",
1129 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1130 |
1131 | whatsAppPD.loc[
1132 | (whatsAppPD[x].str.contains("Web address-Professional", na=False)),
1133 | "BusinessWebsite",
1134 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1135 |
1136 | whatsAppPD.loc[
1137 | (whatsAppPD[x].str.contains("Email-Professional", na=False)),
1138 | "Business-Email",
1139 | ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1140 |
1141 | whatsappContactProcess(whatsAppPD)
1142 |
1143 | # Add IMEI Column
1144 | whatsAppPD[originIMEI] = phoneData.IMEI
1145 | whatsAppPD["inputFile"] = phoneData.inFile
1146 | whatsAppPD["Provenance"] = phoneData.inProvenance
1147 |     whatsAppPD["Source"] = "WhatsApp"
1148 |
1149 | # Remove working columns.
1150 | exportCols = []
1151 | for x in whatsAppPD.columns:
1152 | if isinstance(x, str):
1153 | exportCols.append(x)
1154 | if debug:
1155 | print(exportCols)
1156 |
1157 | # Export CSV
1158 | print("{} WhatsApp contacts located".format(len(whatsAppPD["Name"])))
1159 | print("Exporting {}-WHATSAPP.csv".format(phoneData.inFile))
1160 | logging.info("Exporting Whatsapp from {}".format(phoneData.inFile))
1161 | whatsAppPD[exportCols].to_csv(
1162 | "{}-WHATSAPP.csv".format(phoneData.inFile), index=False
1163 | )
1164 |
1165 |
1166 | # --- Parse Zalo Contacts --------------------------------------------------------------------
1167 | def processZalo(contactsPD):
1168 |     print("\nProcessing Zalo")
1169 |     ZaloPD = contactsPD[contactsPD["Source"] == "Zalo"].copy()
1170 | ZaloPD = ZaloPD.drop("Entries", axis=1).join(
1171 | ZaloPD["Entries"].str.split("\n", expand=True)
1172 | )
1173 | selected_cols = []
1174 | for x in ZaloPD.columns:
1175 | if isinstance(x, int):
1176 | selected_cols.append(x)
1177 |
1178 | def processZaloContacts(ZaloPD):
1179 | for x in selected_cols:
1180 | ZaloPD.loc[
1181 | (ZaloPD[x].str.contains("User ID-User Name:", na=False)),
1182 | "ZaloUserName",
1183 | ] = ZaloPD[x].str.split(":", n=1, expand=True)[1]
1184 |
1185 | ZaloPD.loc[
1186 | (ZaloPD[x].str.contains("User ID-Id:", na=False)),
1187 | "ZaloUserID",
1188 | ] = ZaloPD[x].str.split(":", n=1, expand=True)[1]
1189 |
1190 | processZaloContacts(ZaloPD)
1191 |
1192 | ZaloPD[originIMEI] = phoneData.IMEI
1193 | ZaloPD["inputFile"] = phoneData.inFile
1194 | ZaloPD["Provenance"] = phoneData.inProvenance
1195 |
1196 | exportCols = []
1197 | for x in ZaloPD.columns:
1198 | if isinstance(x, str):
1199 | exportCols.append(x)
1200 |
1201 | print("Exporting {}-ZALO.csv".format(phoneData.inFile))
1202 | logging.info("Exporting Zalo from {}".format(phoneData.inFile))
1203 | ZaloPD[exportCols].to_csv("{}-ZALO.csv".format(phoneData.inFile), index=False)
1204 |
1205 |
1206 | # ------- Argument parser for command line arguments -----------------------------------------
1207 |
1208 | if __name__ == "__main__":
1209 | parser = argparse.ArgumentParser(
1210 | description=__description__,
1211 |         epilog="Developed by {} - Version {}".format(__author__, __version__),
1212 | )
1213 |
1214 | parser.add_argument(
1215 | "-f",
1216 | "--f",
1217 | dest="inputFilename",
1218 | help="Path to Excel Spreadsheet",
1219 | required=False,
1220 | )
1221 |
1222 | parser.add_argument(
1223 | "-p",
1224 | "--p",
1225 | dest="inputProvenance",
1226 | choices=provenanceCols,
1227 | required=False,
1228 | )
1229 |
1230 | parser.add_argument(
1231 | "-b",
1232 | "--bulk",
1233 | dest="bulk",
1234 | required=False,
1235 | action="store_true",
1236 | help="Bulk process Excel spreadsheets in working directory.",
1237 | )
1238 |
1239 | args = parser.parse_args()
1240 |
1241 | if len(sys.argv) == 1:
1242 | parser.print_help()
1243 | parser.exit()
1244 |
1245 | if args.bulk:
1246 | print("Bulk Process")
1247 | bulkProcessor(args.inputProvenance)
1248 |
1249 | if args.inputFilename:
1250 | if not os.path.exists(args.inputFilename):
1251 | print(
1252 | "Error: '{}' does not exist or is not a file.".format(
1253 | args.inputFilename
1254 | )
1255 | )
1256 | sys.exit(1)
1257 | processMetadata(args.inputFilename, args.inputProvenance)
1258 |
--------------------------------------------------------------------------------