├── .gitignore
├── clbExtract.log
├── offlineTranslate
    ├── images
    │   └── offlineTranslate.jpg
    ├── __pycache__
    │   └── bulk_translate_v3.cpython-310.pyc
    ├── requirements.txt
    ├── readme.md
    ├── translateGUI.py
    ├── old
    │   └── bulk_translate.py
    └── bulk_translate_v3.py
├── clbExtract
    ├── requirements.txt
    ├── readme.md
    ├── clbExtractGUI.py
    ├── old
    │   └── clbExtract.py
    └── clbExtract.py
├── README.md
├── locationExtract
    └── locationExtract.py
└── applenotes2hash
    └── applenotes2hash.py


/.gitignore:
--------------------------------------------------------------------------------
1 | .venv
2 | .gitignore
3 | .vscode/settings.json


--------------------------------------------------------------------------------
/clbExtract.log:
--------------------------------------------------------------------------------
1 | 2023-07-25 21:05:49,626,- INFO - Bulk processing 0 files
2 | 


--------------------------------------------------------------------------------
/offlineTranslate/images/offlineTranslate.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facelessg00n/pythonForensics/HEAD/offlineTranslate/images/offlineTranslate.jpg


--------------------------------------------------------------------------------
/offlineTranslate/__pycache__/bulk_translate_v3.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facelessg00n/pythonForensics/HEAD/offlineTranslate/__pycache__/bulk_translate_v3.cpython-310.pyc


--------------------------------------------------------------------------------
/offlineTranslate/requirements.txt:
--------------------------------------------------------------------------------
 1 | certifi==2023.11.17
 2 | charset-normalizer==3.3.2
 3 | et-xmlfile==1.1.0
 4 | idna==3.4
 5 | numpy==1.26.2
 6 | openpyxl==3.1.2
 7 | pandas==2.1.3
 8 | python-dateutil==2.8.2
 9 | pytz==2023.3.post1
10 | requests==2.31.0
11 | six==1.16.0
12 | tk==0.1.0
13 | tqdm==4.66.4
14 | tzdata==2023.3
15 | urllib3==2.1.0
16 | XlsxWriter==3.1.9
17 | 


--------------------------------------------------------------------------------
/clbExtract/requirements.txt:
--------------------------------------------------------------------------------
 1 | altgraph==0.17.3
 2 | black==23.1.0
 3 | click==8.1.3
 4 | future==0.18.2
 5 | macholib==1.16.2
 6 | mypy-extensions==0.4.3
 7 | numpy==1.23.3
 8 | packaging==23.0
 9 | pandas==1.5.0
10 | pathspec==0.11.0
11 | pefile==2022.5.30
12 | platformdirs==2.6.2
13 | pyinstaller==5.6.2
14 | pyinstaller-hooks-contrib==2022.13
15 | python-dateutil==2.8.2
16 | pytz==2022.4
17 | pywin32-ctypes==0.2.0
18 | scapy==2.4.5
19 | simplekml==1.3.6
20 | six==1.16.0
21 | tk==0.1.0
22 | tomli==2.0.1
23 | typing_extensions==4.4.0
24 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # pythonForesnsics
 2 | 
 3 | Collection of handy Python scripts
 4 | 
 5 | ### applenotes2hash.py
 6 | 
 7 | Extracts hashes from Apple Notes NoteStore.sqlite, or from a GK extract for cracking. Exports in Hashcat or John Format.
 8 | 
 9 | ### clbExtract.py
10 | 
11 | Extracts contact details from Cellebrite formatted Excel files
12 | 
13 | ### locationExtract.py
14 | 
15 | Extracts location data from Cellebrite Excel files and converts them to an ESRI friendly format. Can also look for gaps of more than a specified time.
16 | 
17 | ### offlineTranslate
18 | 
19 | Utilises LibreTranslate for the bulk offline translation of messages
20 | 


--------------------------------------------------------------------------------
/clbExtract/readme.md:
--------------------------------------------------------------------------------
 1 | # Cellebrite Contact Extractor
 2 | 
 3 | ---
 4 | Extracts contacts from Cellebrite formatted Excel files. The data in these files is nested within the column of an excel file which can cause issues when analysing them with third party tools.
 5 | 
 6 | This tool exports contacts on a per app basis into flat .CSV files for use with third party analysis tools. This was built to handle Excel files as this is typically what analysts will receive unless they have received a 'reader' file.
 7 | 
 8 | ## Usage
 9 | 
10 | This folder contains 2 python scripts. One is an optional GUI if you wish to build this into a portable .exe fi.
11 | 
12 | These instruction assume you are utilising *VS Code* (<https://code.visualstudio.com/>) and have a Python environment setup.
13 | 
14 | Download the contents of this folder and open the folder in VS code.
15 | 
16 | Create and activate a virtual environment.
17 | 
18 | <https://code.visualstudio.com/docs/python/environments>
19 | 
20 | install the requirements packages, this will include the tools to turn this into a portable exe.
21 | 
22 | `pip install -r .\requirements.txt`
23 | 
24 | The standalone script may then be run from the command line.
25 | 
26 | options:
27 | 
28 | - -h show this help message and exit
29 | - -f path to the input file
30 | - -b process all files in the working directory
31 | - -p add data provenance from one of the pre approved items
32 | 
33 | Place the Excel files in the folder where the script is located to process the files in bulk.
34 | 
35 | ## Building the exe
36 | 
37 | A portable exe can be build utilising PyInstaller.
38 | 
39 | The exe must be built on the same OS it is intended to be run on or it will not work. For example if you intend to build this for use on Windows machines it must be built on a windows machine.
40 | 
41 | The resulting exe will be locayed in the /dist folder of the working directory after it has been built.
42 | 
43 | ### **With GUI**
44 | 
45 | `pyinstaller --onefile .\clbExtractGUI.py`
46 | 
47 | ### **Without GUI**
48 | 
49 | `pyinstaller --onefile .\clbExtract.py`
50 | 
51 | ---
52 | 
53 | ## Current known issues
54 | 
55 | - Native contacts does not currently export email addresses
56 | - Depending on which version of Cellebrite was used, or what type of extraction was perfomed some social media SUer ID's may not be available in the Excel files.
57 | 
58 | ## Newtork Analysis tools
59 | 
60 | ----
61 | **Constellation**
62 | 
63 | <https://www.constellation-app.com/>
64 | 
65 | **Maltego**
66 | 
67 | <https://www.maltego.com/>
68 | 


--------------------------------------------------------------------------------
/offlineTranslate/readme.md:
--------------------------------------------------------------------------------
 1 | # Offline Translation
 2 | 
 3 | ---
 4 | 
 5 | Many forensic tools have inbuilt translation offerings however my experience shows they can be slow or unreliable. As an offline translation option is often required I began to seek other means of translation. Enter LibreTranslate, an self hosted Machine translation API.
 6 | 
 7 | <https://github.com/LibreTranslate/LibreTranslate>
 8 | 
 9 | ## Installation
10 | 
11 | Installation options will depend on your environment however to test the proof of concept LibreTranslate can be installed with the following command from an internet connected machine.
12 | 
13 |     `pip install libretranslate`
14 | 
15 | The server can then be started on localhost with the following command. On first run it will pull down the language packages. The machine can then be taken offline.
16 | 
17 |     `libretranslate`
18 | ---
19 | 
20 | ## Modification
21 | 
22 | You may need to change the `serverURL = "http://localhost:5000"` value to match where you libretranslate instance is hosted
23 | 
24 | ## Script usage
25 | 
26 | The Python script loads the specified Excel file and looks for a column named 'Messages' as per the Magnet AXIOM formatted excel sheets. At this time it will only handle Excel documents with a single sheet.
27 | 
28 | In reality it will take any Excel spreadsheet with a Column named messages.
29 | 
30 | - Auto detection of language is much faster however not as accurate. If you know the language it is best to select one of the language codes manually. To retrieve the language codes run `python3 bulk_translate_v3 -g` and the available languages from the server will be listed.
31 | - The generated CSV files may not open in Microsoft Excel however will open in LibreOffice Calc. It will however also attempt to output Excel files.
32 | - Defaults to English translation but other languages are possible
33 | 
34 | Example usage
35 | 
36 |     Auto Detect
37 |     python3 bulk_translate_v3.py -f excel.xlsx
38 |     
39 |     Manually Select language
40 |     python3 python3 bulk_translate_v3 -f excel.xlsx -l zh
41 | 
42 | ![screenshot](https://github.com/facelessg00n/pythonForensics/blob/main/offlineTranslate/images/offlineTranslate.jpg)
43 | 
44 | ## Other usage
45 | 
46 | options:
47 | 
48 |   -h, --help            show this help message and exit
49 | 
50 |   -f INPUTFILEPATH, --file INPUTFILEPATH
51 |                         Path to Excel File
52 | 
53 |   -s TRANSLATIONSERVER, --server TRANSLATIONSERVER
54 |                         Address of translation server if not localhost or hardcoded
55 | 
56 |   -l {}, --language {}  Language code for input text - optional but can greatly improve accuracy
57 | 
58 |   -e {Chats,Instant Messages}, --excelSheet {Chats,Instant Messages}
59 |                         Sheet name within Excel file to be translated
60 | 
61 |   -c, --isCellebrite    If file originated from Cellebrite, header starts at 1, and message column is called 'Body'
62 |   
63 |   -g, --getlangs        Get supported language codes and names from server
64 | 
65 | ## Building the exe
66 | 
67 | A portable exe can be build utilising PyInstaller.
68 | 
69 | The exe must be built on the same OS it is intended to be run on or it will not work. For example if you intend to build this for use on Windows machines it must be built on a windows machine.
70 | 
71 | The resulting exe will be located in the /dist folder of the working directory after it has been built.
72 | 
73 | ### **With GUI**
74 | 
75 | `pyinstaller --onefile .\translateGUI.py`
76 | 
77 | ### **Without GUI**
78 | 
79 | `pyinstaller --onefile .\bulk_translate_v3.py`
80 | 


--------------------------------------------------------------------------------
/locationExtract/locationExtract.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Extracts location data from a Cellebrite PA report and converts it to an ESRI friendly time format.
  3 | 
  4 | Data is Extracted from the Timeline tab of the Excel report.
  5 | 
  6 | Also has a feature to look for gaps in recording.
  7 | 
  8 | """
  9 | import argparse
 10 | import logging
 11 | import pandas as pd
 12 | import os
 13 | import sys
 14 | from datetime import datetime, timedelta
 15 | 
 16 | # file = "input.xlsx"
 17 | 
 18 | # Details
 19 | __description__ = "Converts Cellebrite PA Garmin extracts to an ESRI compatible CSV.\n Loads data from the Timeline tab of the Excel Export"
 20 | __author__ = "facelessg00n"
 21 | __version__ = "0.1"
 22 | 
 23 | # Options
 24 | debug = False
 25 | findGaps = False
 26 | dateAfter = None
 27 | localConvert = True
 28 | 
 29 | localHour = 0.0
 30 | localMinute = 0.0
 31 | 
 32 | # ----------------------Config the logger------------------------------------
 33 | logging.basicConfig(
 34 |     filename="log.txt",
 35 |     format="%(levelname)s:%(asctime)s:%(message)s",
 36 |     level=logging.DEBUG,
 37 | )
 38 | # ---------------Functions --------------------------------------------------
 39 | 
 40 | # Setup function to load spreadsheet and columns of interest
 41 | def convertFile(inputFilename, dateAfter=None, gapFinder=None):
 42 |     if dateAfter is not None:
 43 |         print("Looking for dates after = " + str(dateAfter))
 44 |         dateCut = True
 45 |     else:
 46 |         dateCut = False
 47 | 
 48 |     print("Loading Excel file {}".format(inputFilename))
 49 |     logging.info("Loading Excel data from %s" % (str(inputFilename)))
 50 | 
 51 |     try:
 52 |         df = pd.read_excel(inputFilename, sheet_name="Timeline", header=1)
 53 |         df = df[["#", "Time", "Latitude", "Longitude"]]
 54 |     except Exception as e:
 55 |         print(e)
 56 |         exit()
 57 | 
 58 |     # Convert time format
 59 |     print("Converting Time Format")
 60 |     new = df["Time"].str.split("(", n=1, expand=True)
 61 |     df["DateTime"] = new[0]
 62 |     df["DateTime"] = pd.to_datetime(
 63 |         df["DateTime"], errors="raise", utc=True, format="%d/%m/%Y %I:%M:%S %p"
 64 |     )
 65 |     if debug == True:
 66 |         print(df.info())
 67 | 
 68 |     # Filter only data after this date
 69 |     if dateCut:
 70 |         try:
 71 |             df = df[(df["DateTime"] > dateAfter)]
 72 |         # "2020-02-01"
 73 |         except TypeError:
 74 |             print(
 75 |                 "Type error has been raised, it is likely the input date format is incorect. This process will be skipped"
 76 |             )
 77 |             dateCut = False
 78 |             pass
 79 | 
 80 |     if localConvert:
 81 |         df["Local"] = df["DateTime"] + pd.Timedelta(
 82 |             hours=localHour, minutes=localMinute
 83 |         )
 84 | 
 85 |     # Find and report gaps in data recording.
 86 |     if gapFinder is not None:
 87 |         print(
 88 |             "Gap finder is looking for gaps of more than %s seconds." % (str(gapFinder))
 89 |         )
 90 |         gapData = True
 91 |     else:
 92 |         print("Gap finder is not looking for gaps in time")
 93 |         gapData = False
 94 | 
 95 |     if gapData:
 96 |         print("\nFinding gaps in time")
 97 |         df["GapFinder"] = df["DateTime"].diff().dt.seconds > gapFinder
 98 |         time_diff = df[df["GapFinder"] == True]
 99 |         print(str(time_diff.shape[0]) + " gaps in recording located.")
100 |         gapData = True
101 |         if debug:
102 |             print(time_diff)
103 | 
104 |     # Export dataframes to CSV
105 |     print("\nExporting CSV's")
106 |     df.to_csv("locationData.csv", index=False, date_format="%Y/%m/%d %H:%M:%S")
107 |     if gapData:
108 |         time_diff.to_csv("gapData.csv", index=False, date_format="%Y/%m/%d %H:%M:%S")
109 | 
110 | 
111 | # Command line input args
112 | if __name__ == "__main__":
113 |     parser = argparse.ArgumentParser(
114 |         description=__description__,
115 |         epilog="Developed by {}".format(str(__author__), str(__version__)),
116 |     )
117 | 
118 |     parser.add_argument(
119 |         "-f",
120 |         "--file",
121 |         dest="inputFilename",
122 |         help="Path to input Excel Spreadsheet",
123 |         # required=True,
124 |     )
125 | 
126 |     parser.add_argument(
127 |         "-g",
128 |         "--gap",
129 |         dest="gapSeconds",
130 |         type=int,
131 |         help="To detect gaps in time enter a time gap in seconds. 300 seconds is 5 minutes",
132 |         default=None,
133 |         required=False,
134 |     )
135 | 
136 |     parser.add_argument(
137 |         "-d",
138 |         "--dateafter",
139 |         dest="dateAfter",
140 |         type=int,
141 |         help="Filter only data after a certain date. Required format is YYYY-MM-DD. Useful for shrinking your dataset",
142 |         required=False,
143 |     )
144 | 
145 |     args = parser.parse_args()
146 | 
147 |     # display help message when no args are passed.
148 |     if len(sys.argv) == 1:
149 | 
150 |         parser.print_help()
151 |         sys.exit(1)
152 | 
153 |     # If no input show the help text.
154 |     if not args.inputFilename:
155 |         parser.print_help()
156 |         parser.exit(1)
157 | 
158 |     # Check if the input file exists.
159 |     if not os.path.exists(args.inputFilename):
160 |         print("ERROR: '{}' does not exist or is not a file".format(args.inputFilename))
161 |         sys.exit(1)
162 | 
163 |     if args.dateAfter is not None:
164 |         dateAfter = args.dateAfter
165 |         print("Date After Not none")
166 |     else:
167 |         dateAfter = None
168 | 
169 |     if args.gapSeconds is not None:
170 |         gapSeconds = args.gapSeconds
171 |         if debug:
172 |             print("GapSeconds Not none")
173 |     else:
174 |         gapSeconds = None
175 | 
176 |     convertFile(args.inputFilename, gapFinder=gapSeconds, dateAfter=dateAfter)
177 | 


--------------------------------------------------------------------------------
/applenotes2hash/applenotes2hash.py:
--------------------------------------------------------------------------------
  1 | # Extracts password protected hashes from Apple Notes
  2 | #
  3 | # Elements from script from Dhiru Kholia <kholia at kth.se>
  4 | # https://github.com/openwall/john/blob/bleeding-jumbo/run/applenotes2john.py
  5 | #
  6 | # Formatted with Black
  7 | #
  8 | # ------Changes----------
  9 | #
 10 | # V0.1 - Initial release
 11 | #
 12 | 
 13 | import argparse
 14 | import binascii
 15 | import glob
 16 | import os
 17 | import sys
 18 | import sqlite3
 19 | import shutil
 20 | import zipfile
 21 | 
 22 | PY3 = sys.version_info[0] == 3
 23 | 
 24 | if not PY3:
 25 |     reload(sys)
 26 |     sys.setdefaultencoding("utf8")
 27 | 
 28 | __description__ = "Extracts and converts Apple Note hashes to Hashcat and JTR format"
 29 | __author__ = "facelessg00n"
 30 | __version__ = "0.1"
 31 | 
 32 | formatType = []
 33 | notesFile = "NoteStore.sqlite"
 34 | targetPath = os.getcwd() + "/temp"
 35 | debug = False
 36 | 
 37 | # ------------- Functions live here -----------------------------------
 38 | 
 39 | 
 40 | def makeTempFolder():
 41 |     try:
 42 |         # print("Creating temporary folder")
 43 |         os.makedirs(targetPath)
 44 |     except OSError as e:
 45 |         # print(e)
 46 |         # print("Temporary folder exists")
 47 |         # print("Purging directory")
 48 |         shutil.rmtree(targetPath)
 49 |         try:
 50 |             # print("Creating temporary folder")
 51 |             os.makedirs(targetPath)
 52 |         except:
 53 |             # print("Something has gone horribly wrong")
 54 |             exit()
 55 | 
 56 | 
 57 | # Check it is a zip file and extract relevant file
 58 | def checkZip(z):
 59 |     if zipfile.is_zipfile(z):
 60 |         # print("This is a Zip File")
 61 |         with zipfile.ZipFile(z) as file:
 62 |             zippedFiles = file.namelist()
 63 |             filePath = [x for x in zippedFiles if x.endswith(notesFile)]
 64 |             if debug:
 65 |                 print("Located file at path : {}".format(filePath))
 66 |                 print("Extracting to temp file")
 67 |             file.extract(filePath[0], targetPath)
 68 | 
 69 |     else:
 70 |         print("this does not appear to be a zip file")
 71 | 
 72 | 
 73 | def processGrayShift(x, formatType):
 74 |     formatType = formatType
 75 |     try:
 76 |         makeTempFolder()
 77 |     except Exception as e:
 78 |         print(e)
 79 |     checkZip(x)
 80 |     inputFile = glob.glob("./**/NoteStore.sqlite", recursive=True)
 81 |     if debug:
 82 |         print("Using" + str(inputFile[0]) + " as the input file for Cache.")
 83 |     extractHash(inputFile[0], formatType)
 84 | 
 85 | 
 86 | # Functionality below lifted from
 87 | # https://github.com/openwall/john/blob/bleeding-jumbo/run/applenotes2john.py
 88 | 
 89 | 
 90 | def extractHash(inputFile, formatType):
 91 |     db = sqlite3.connect(inputFile)
 92 |     cursor = db.cursor()
 93 |     rows = cursor.execute(
 94 |         "SELECT Z_PK, ZCRYPTOITERATIONCOUNT, ZCRYPTOSALT, ZCRYPTOWRAPPEDKEY, ZPASSWORDHINT, ZCRYPTOVERIFIER, ZISPASSWORDPROTECTED FROM ZICCLOUDSYNCINGOBJECT"
 95 |     )
 96 |     for row in rows:
 97 |         iden, iterations, salt, fhash, hint, shash, is_protected = row
 98 |         if fhash is None:
 99 |             phash = shash
100 |         else:
101 |             phash = fhash
102 |         if hint is None:
103 |             hint = "None"
104 |             # NOTE: is_protected can be zero even if iterations value is non-zero!
105 |             # This was tested on macOS 10.13.2 with cloud syncing turned off.
106 |         if iterations == 0:  # is this a safer check than checking is_protected?
107 |             continue
108 |         if phash is None:
109 |             continue
110 |         phash = binascii.hexlify(phash)
111 |         salt = binascii.hexlify(salt)
112 |         if PY3:
113 |             phash = str(phash, "ascii")
114 |             salt = str(salt, "ascii")
115 |         fname = os.path.basename(inputFile)
116 |         # For John
117 |         if formatType == "JOHN":
118 |             sys.stdout.write(
119 |                 "%s:$ASN$*%d*%d*%s*%s:::::%s\n"
120 |                 % (fname, iden, iterations, salt, phash, hint)
121 |             )
122 |         # For Hashcat
123 |         elif formatType == "HASHCAT":
124 |             sys.stdout.write("$ASN$*%d*%d*%s*%s\n" % (iden, iterations, salt, phash))
125 | 
126 |         else:
127 |             print("Invalid or no format type set")
128 |         db.close
129 | 
130 | 
131 | # ----------- Argument Parser ---------------------------------------------
132 | 
133 | if __name__ == "__main__":
134 |     parser = argparse.ArgumentParser(
135 |         description=__description__,
136 |         epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
137 |     )
138 | 
139 |     parser.add_argument(
140 |         "-f", "--file", dest="notesFile", help="Path to NoteStore.sqlite"
141 |     )
142 |     parser.add_argument(
143 |         "-g", "--grayshift", dest="grayshiftINPUT", help="Path to Grayshift Extract"
144 |     )
145 |     parser.add_argument(
146 |         "-t",
147 |         "--type",
148 |         dest="formatType",
149 |         help="Output format type, JOHN or HASHCAT, defaults to JOHN. Hashcat Mode is 16200",
150 |         choices=["HASHCAT", "JOHN"],
151 |         default="JOHN",
152 |         required=False,
153 |     )
154 | 
155 |     args = parser.parse_args()
156 |     if len(sys.argv) == 1:
157 |         parser.print_help()
158 |         sys.exit(1)
159 | 
160 | if args.notesFile:
161 |     if not os.path.exists(args.notesFile):
162 |         print("ERROR: {} does not exist or is not a file".format(args.notesFile))
163 |         sys.exit(1)
164 |     extractHash(notesFile, args.formatType)
165 | 
166 | if args.grayshiftINPUT:
167 |     if not os.path.exists(args.grayshiftINPUT):
168 |         print("ERROR: {} does not exist or is not a file".format(args.grayshiftINPUT))
169 |         sys.exit(1)
170 |     processGrayShift(args.grayshiftINPUT, args.formatType)
171 | 


--------------------------------------------------------------------------------
/clbExtract/clbExtractGUI.py:
--------------------------------------------------------------------------------
  1 | ### GUI for Cellebrite File Flattener
  2 | # Only tested with Windows
  3 | # Known display issues with OSX
  4 | 
  5 | 
  6 | # Changelog
  7 | # v0.2 - Minor changes, added provenance selector
  8 | # v0.1 - Initial concept
  9 | 
 10 | 
 11 | ### GUI for Cellebrite File Flattener
 12 | 
 13 | import clbExtract
 14 | 
 15 | import os
 16 | from tkinter import *
 17 | from tkinter import ttk
 18 | from tkinter import messagebox
 19 | from tkinter import filedialog as fd
 20 | from tkinter.messagebox import showinfo
 21 | 
 22 | LIGHT_GREY = "#BEBFC7"
 23 | LIGHT_BLUE = "#307FE2"
 24 | DARK_BLUE = "#024DA1"
 25 | FONT_1 = "Roboto Condensed"
 26 | 
 27 | # Auto locate list of files
 28 | candidateFiles = os.listdir(os.getcwd())
 29 | file_list = []
 30 | for candidateFiles in candidateFiles:
 31 |     if candidateFiles.endswith(".xlsx"):
 32 |         file_list.append(candidateFiles)
 33 | fileListingDisplay = "\n".join(file_list)
 34 | 
 35 | # list of handled apps
 36 | supportedAppsDisp = "\n".join(clbExtract.parsedApps)
 37 | 
 38 | ## _____Functions live here_____
 39 | 
 40 | 
 41 | # TODO - Will need to pass in Provenance data here
 42 | def process_all():
 43 |     print("Process all selected")
 44 |     clbExtract.bulkProcessor(provMenu.get())
 45 | 
 46 | 
 47 | def select_file():
 48 |     filetypes = [("Excel Files", "*.xlsx")]
 49 | 
 50 |     filename = fd.askopenfile(
 51 |         title="Open a file",
 52 |         initialdir=os.listdir(os.getcwd()),
 53 |         filetypes=filetypes,
 54 |         multiple=False,
 55 |     )
 56 |     if filename:
 57 |         print(filename.name)
 58 |         showinfo(
 59 |             title="Selected File",
 60 |             message=filename.name,
 61 |         )
 62 |         print(provMenu.get())
 63 |         clbExtract.processMetadata(filename.name, provMenu.get())
 64 | 
 65 | 
 66 | # Process selected file
 67 | def get_selection():
 68 |     selected_file = lbox.curselection()
 69 |     print(lbox.get(selected_file))
 70 |     print(provMenu.get())
 71 |     clbExtract.processMetadata(lbox.get(selected_file), provMenu.get())
 72 | 
 73 | 
 74 | def comboSelection(event):
 75 |     selectedProvenance = provMenu.get()
 76 |     # messagebox.showinfo(message=f"The Selected value is {selectedProvenance}",title='Selection')
 77 | 
 78 | 
 79 | ### _____Create interface_____
 80 | root = Tk()
 81 | root.geometry("580x650")
 82 | root.minsize(458, 580)
 83 | root.maxsize(780, 780)
 84 | root.configure(bg=LIGHT_GREY)
 85 | 
 86 | prog_name = Label(
 87 |     text="Cellebrite Contact Extractor",
 88 |     anchor=W,
 89 |     padx=10,
 90 |     pady=10,
 91 |     background=DARK_BLUE,
 92 |     width=480,
 93 |     font=(FONT_1, 20),
 94 | )
 95 | prog_name.pack()
 96 | 
 97 | sideFrame = Frame(master=root, width=100, height=100, bg=LIGHT_BLUE)
 98 | sideFrame.pack(fill=Y, side=LEFT)
 99 | sideFrame.pack()
100 | 
101 | prog_data = Label(
102 |     text="For bulk procesisng of files place this program in the folder\n containing your Cellebrite formatted Excel files. ",
103 |     font=(FONT_1, 10),
104 |     anchor=W,
105 |     padx=10,
106 |     pady=10,
107 |     bg=LIGHT_GREY,
108 | )
109 | prog_data.pack()
110 | 
111 | app_data_heading = Label(
112 |     sideFrame, text="Handled apps:", bg=LIGHT_BLUE, font=(FONT_1, 10)
113 | )
114 | app_data_heading.pack()
115 | app_data = Label(sideFrame, text=supportedAppsDisp, bg=LIGHT_BLUE, font=(FONT_1, 10))
116 | app_data.pack()
117 | 
118 | ## Show Auto Located files
119 | auto_locate_data = Label(
120 |     text="{} candidate files located at path: \n{}".format(
121 |         str(len(file_list)), str(os.getcwd())
122 |     ),
123 |     anchor=W,
124 |     padx=10,
125 |     pady=10,
126 |     bg=LIGHT_GREY,
127 | )
128 | auto_locate_data.pack(pady=10, padx=10)
129 | ### Show options for data provenance.
130 | provLabel = Label(
131 |     text="Select provenance, i.e WARRANT",
132 |     padx=10,
133 |     pady=10,
134 |     bg=LIGHT_GREY,
135 | )
136 | provLabel.pack()
137 | 
138 | provVar = StringVar()
139 | provMenu = ttk.Combobox(
140 |     values=clbExtract.provenanceCols, textvariable=provVar, state="readonly"
141 | )
142 | provMenu.bind("<<ComboboxSelected>>", comboSelection)
143 | provMenu.pack(side="top")
144 | 
145 | filesLabel = Label(
146 |     text="Select File",
147 |     padx=10,
148 |     pady=10,
149 |     bg=LIGHT_GREY,
150 | )
151 | filesLabel.pack()
152 | 
153 | # Select file names
154 | fNames = StringVar(value=fileListingDisplay)
155 | lbox = Listbox(root, listvariable=fNames, height=5, width=200)
156 | scroll_bar = Scrollbar(root)
157 | scroll_bar.pack(side=RIGHT, fill=Y)
158 | lbox.pack()
159 | scroll_bar.config(command=lbox.yview)
160 | 
161 | 
162 | ### Buttons for processing selected files
163 | 
164 | btn2 = Button(
165 |     root,
166 |     text="Process Selected",
167 |     command=get_selection,
168 |     bg=LIGHT_GREY,
169 |     padx=10,
170 | )
171 | btn2.pack(side="top")
172 | 
173 | btn3 = Button(root, text="Process all files", command=process_all, bg=LIGHT_GREY)
174 | btn3.pack(side="top")
175 | 
176 | 
177 | prog_data = Label(
178 |     text="Manually select file a file to extract \n Output files will be located at: \n {}".format(
179 |         str(os.getcwd())
180 |     ),
181 |     anchor=W,
182 |     padx=10,
183 |     pady=10,
184 |     font=(FONT_1, 10),
185 |     bg=LIGHT_GREY,
186 | )
187 | prog_data.pack()
188 | 
189 | btn = Button(root, text="Locate file", command=select_file, bg=LIGHT_GREY)
190 | btn.pack(side=TOP, pady=10, padx=10)
191 | 
192 | # Exit Program
193 | exitBtn = Button(root, text="Exit", command=root.destroy, bg=LIGHT_GREY)
194 | exitBtn.pack(side=TOP, pady=20, padx=10)
195 | 
196 | # Display version info
197 | verLabel = Label(
198 |     text="Version {}\nDeveloped facelesg00n".format(str(clbExtract.__version__)),
199 |     padx=10,
200 |     pady=10,
201 |     bg=LIGHT_GREY,
202 | )
203 | verLabel.pack()
204 | 
205 | 
206 | root.mainloop()
207 | 


--------------------------------------------------------------------------------
/offlineTranslate/translateGUI.py:
--------------------------------------------------------------------------------
  1 | ### GUI for Offline Translation
  2 | # Only tested with Windows
  3 | # Known display issues with OSX
  4 | 
  5 | # Changelog
  6 | # v0.2 - Update function names and handle Cellebrite formatted files.
  7 | #      - Language selection menu
  8 | # v0.1 - Initial concept
  9 | 
 10 | import bulk_translate_v3
 11 | 
 12 | import os
 13 | from tkinter import *
 14 | from tkinter import ttk
 15 | from tkinter import messagebox
 16 | from tkinter import filedialog as fd
 17 | from tkinter.messagebox import showinfo
 18 | 
 19 | LIGHT_GREY = "#BEBFC7"
 20 | LIGHT_BLUE = "#307FE2"
 21 | DARK_BLUE = "#024DA1"
 22 | DARK_RED = "#FF5342"
 23 | FONT_1 = "Roboto Condensed"
 24 | 
 25 | isCellebrite = False
 26 | SERVER_CONNECTED = False
 27 | 
 28 | 
 29 | inputLanguages = [
 30 |     "auto",
 31 |     "en",
 32 |     "sq",
 33 |     "ar",
 34 |     "az",
 35 |     "bn",
 36 |     "bg",
 37 |     "ca",
 38 |     "zh",
 39 |     "zt",
 40 |     "cs",
 41 |     "da",
 42 |     "nl",
 43 |     "eo",
 44 |     "et",
 45 |     "fi",
 46 |     "fr",
 47 |     "de",
 48 |     "el",
 49 |     "he",
 50 |     "hi",
 51 |     "hu",
 52 |     "id",
 53 |     "ga",
 54 |     "it",
 55 |     "ja",
 56 |     "ko",
 57 |     "lv",
 58 |     "lt",
 59 |     "ms",
 60 |     "nb",
 61 |     "fa",
 62 |     "pl",
 63 |     "pt",
 64 |     "ro",
 65 |     "ru",
 66 |     "sr",
 67 |     "sk",
 68 |     "sl",
 69 |     "es",
 70 |     "sv",
 71 |     "tl",
 72 |     "th",
 73 |     "tr",
 74 |     "uk",
 75 | ]
 76 | 
 77 | ## _______________Functions live here___________________________________________________________
 78 | 
 79 | 
 80 | # Process a selected file
 81 | def get_selection():
 82 |     selected_file = lbox.curselection()
 83 |     print(lbox.get(selected_file))
 84 |     print(inputSheetMenu.get())
 85 |     bulk_translate_v3.loadAndTranslate(
 86 |         lbox.get(selected_file),
 87 |         inputLangMenu.get(),
 88 |         inputSheetMenu.get(),
 89 |         isCellebrite.get(),
 90 |     )
 91 | 
 92 | 
 93 | def inputComboSelection(event):
 94 |     selectedProvenance = inputSheetMenu.get()
 95 | 
 96 | 
 97 | def langComboSelection(event):
 98 |     selectedProvenance = inputLangMenu.get()
 99 |     # messagebox.showinfo(message=f"The Selected value is {selectedProvenance}",title='Selection')
100 | 
101 | 
102 | ### _____Create interface______________________________________________________________________
103 | 
104 | # Show list of Excel files in the current working directory
105 | candidateFiles = os.listdir(os.getcwd())
106 | file_list = []
107 | for candidateFiles in candidateFiles:
108 |     if candidateFiles.endswith(".xlsx"):
109 |         file_list.append(candidateFiles)
110 | fileListingDisplay = "\n".join(file_list)
111 | 
112 | # Test Connectivity
113 | # bulk_translate_v3.serverCheck())
114 | if bulk_translate_v3.serverCheck(bulk_translate_v3.serverURL) == "SERVER_OK":
115 |     print("Connected to server")
116 |     SERVER_CONNECTED = True
117 |     serverButtonColour = LIGHT_BLUE
118 |     serverStatus = "Online"
119 | else:
120 |     print("Server connection failed")
121 |     SERVER_CONNECTED = False
122 |     serverButtonColour = DARK_RED
123 |     serverStatus = "Offline"
124 | 
125 | # Create box
126 | root = Tk()
127 | root.geometry("580x650")
128 | root.minsize(458, 580)
129 | root.maxsize(780, 780)
130 | root.configure(bg=LIGHT_GREY)
131 | 
132 | prog_name = Label(
133 |     text="Offline Translation",
134 |     anchor=W,
135 |     padx=10,
136 |     pady=10,
137 |     background=DARK_BLUE,
138 |     width=480,
139 |     font=(FONT_1, 20),
140 | )
141 | prog_name.pack()
142 | 
143 | sideFrame = Frame(master=root, width=100, height=100, bg=LIGHT_BLUE)
144 | sideFrame.pack(fill=Y, side=LEFT)
145 | sideFrame.pack()
146 | 
147 | servAdd = Label(
148 |     text="Server Address: {} Server Status: {}".format(
149 |         str(bulk_translate_v3.serverURL), serverStatus
150 |     ),
151 |     padx=10,
152 |     pady=00,
153 |     bg=serverButtonColour,
154 | )
155 | servAdd.pack()
156 | # User instructions
157 | prog_data = Label(
158 |     text="For procesisng of files place this program in the folder\n containing your Excel files.",
159 |     font=(FONT_1, 10),
160 |     anchor=W,
161 |     padx=5,
162 |     pady=5,
163 |     bg=LIGHT_GREY,
164 | )
165 | prog_data.pack()
166 | 
167 | app_data_heading = Label(sideFrame, text="   ", bg=LIGHT_BLUE, font=(FONT_1, 10))
168 | app_data_heading.pack()
169 | 
170 | # app_data.pack()
171 | 
172 | ## Show Auto located files
173 | auto_locate_data = Label(
174 |     text="{} candidate files located at path: \n{}".format(
175 |         str(len(file_list)), str(os.getcwd())
176 |     ),
177 |     anchor=W,
178 |     padx=10,
179 |     pady=10,
180 |     bg=LIGHT_GREY,
181 | )
182 | auto_locate_data.pack(pady=10, padx=10)
183 | 
184 | # Tick box if file is a Cellebrite file, the header in these files starts at 1
185 | isCellebrite = IntVar()
186 | c1 = Checkbutton(text="Cellebrite file?", variable=isCellebrite, onvalue=1, offvalue=0)
187 | c1.pack()
188 | 
189 | # Select an input Datasheet
190 | inputSheetName = Label(
191 |     text="Input Sheet name if multiple sheets exist",
192 |     padx=10,
193 |     pady=10,
194 |     bg=LIGHT_GREY,
195 | )
196 | inputSheetName.pack()
197 | 
198 | # Input sheet selection menu
199 | inputSheetVar = StringVar()
200 | inputSheetMenu = ttk.Combobox(
201 |     values=bulk_translate_v3.inputSheets, textvariable=inputSheetVar, state="readonly"
202 | )
203 | 
204 | inputSheetMenu.bind("<<ComboboxSelected>>", inputComboSelection)
205 | 
206 | # inputSheetMenu.set("Chats")
207 | inputSheetMenu.pack(side="top")
208 | 
209 | # File selection label
210 | filesLabel = Label(
211 |     text="Select File",
212 |     padx=10,
213 |     pady=10,
214 |     bg=LIGHT_GREY,
215 | )
216 | 
217 | # ____________Language selection menu_______________________________________
218 | inputLangName = Label(
219 |     text="Input Language",
220 |     padx=10,
221 |     pady=10,
222 |     bg=LIGHT_GREY,
223 | )
224 | inputLangName.pack()
225 | langVar = StringVar()
226 | inputLangMenu = ttk.Combobox(
227 |     values=inputLanguages, textvariable=langVar, state="readonly"
228 | )
229 | 
230 | inputLangMenu.bind("<<ComboboxSelected>>", langComboSelection)
231 | inputLangMenu.set("auto")
232 | inputLangMenu.pack(side="top")
233 | 
234 | # ____________File selction menu_______________________________________
235 | filesLabel = Label(
236 |     text="Select File",
237 |     padx=10,
238 |     pady=10,
239 |     bg=LIGHT_GREY,
240 | )
241 | filesLabel.pack()
242 | 
243 | # Select file names
244 | fNames = StringVar(value=fileListingDisplay)
245 | lbox = Listbox(root, listvariable=fNames, height=5, width=200)
246 | scroll_bar = Scrollbar(root)
247 | scroll_bar.pack(side=RIGHT, fill=Y)
248 | lbox.pack()
249 | scroll_bar.config(command=lbox.yview)
250 | 
251 | ### Buttons for processing selected files
252 | processSelectedBtn = Button(
253 |     root,
254 |     text="Process Selected",
255 |     command=get_selection,
256 |     bg=LIGHT_GREY,
257 |     padx=10,
258 | )
259 | processSelectedBtn.pack(side="top")
260 | 
261 | # Exit Program
262 | exitBtn = Button(root, text="Exit", command=root.destroy, bg=LIGHT_GREY)
263 | exitBtn.pack(side=TOP, pady=20, padx=10)
264 | 
265 | # Display version info
266 | verLabel = Label(
267 |     text="Version {}".format(str(bulk_translate_v3.__version__)),
268 |     padx=10,
269 |     pady=10,
270 |     bg=LIGHT_GREY,
271 | )
272 | verLabel.pack()
273 | 
274 | root.mainloop()
275 | 


--------------------------------------------------------------------------------
/offlineTranslate/old/bulk_translate.py:
--------------------------------------------------------------------------------
  1 | # Bulk Translation of Axiom formatted Excels containing messages
  2 | # Made in South Australia
  3 | # Unapologetically formatted with Black
  4 | #
  5 | #
  6 | # Changelog
  7 | #
  8 | # v0.1 Initial Concept
  9 | 
 10 | import argparse
 11 | import json
 12 | import pandas as pd
 13 | import requests
 14 | import os
 15 | import sys
 16 | 
 17 | # ----------------- Settings live here ------------------------
 18 | 
 19 | __description__ = "Utilises a Libretranslate server to translate messages from Axiom formatted Excel spreadsheets. Messages are loaded from a column titled 'Messages.'"
 20 | __author__ = "facelessg00n"
 21 | __version__ = "0.1"
 22 | 
 23 | banner = """
 24 |  ██████  ███████ ███████ ██      ██ ███    ██ ███████     ████████ ██████   █████  ███    ██ ███████ ██       █████  ████████ ███████ 
 25 | ██    ██ ██      ██      ██      ██ ████   ██ ██             ██    ██   ██ ██   ██ ████   ██ ██      ██      ██   ██    ██    ██      
 26 | ██    ██ █████   █████   ██      ██ ██ ██  ██ █████          ██    ██████  ███████ ██ ██  ██ ███████ ██      ███████    ██    █████   
 27 | ██    ██ ██      ██      ██      ██ ██  ██ ██ ██             ██    ██   ██ ██   ██ ██  ██ ██      ██ ██      ██   ██    ██    ██      
 28 |  ██████  ██      ██      ███████ ██ ██   ████ ███████        ██    ██   ██ ██   ██ ██   ████ ███████ ███████ ██   ██    ██    ███████ 
 29 |                                                                                                                                       
 30 |                                                                                                                                       """
 31 | 
 32 | # Debug mode, will print errors etc
 33 | debug = False
 34 | 
 35 | serverURL = "http://localhost:5000"
 36 | # Endpoints
 37 | #           /translate - translation
 38 | #           /languages - supported languages
 39 | 
 40 | # Name of the column where the messages to be translated are found.
 41 | # This can be modified to suit other Excel column names if desired
 42 | inputColumn = "Message"
 43 | 
 44 | 
 45 | # Check is server is reachable and able to process a request.
 46 | def serverCheck():
 47 |     print(f"Testing we can reach server {serverURL}")
 48 |     headers = {"Content-Type": "application/json"}
 49 |     payload = json.dumps(
 50 |         {
 51 |             "q": "Buenos días señor",
 52 |             "source": "auto",
 53 |             "target": "en",
 54 |             "format": "text",
 55 |             "api_key": None,
 56 |         }
 57 |     )
 58 |     try:
 59 |         response = requests.post(
 60 |             f"{serverURL}/translate", data=payload, headers=headers
 61 |         )
 62 |         if response.status_code == 404:
 63 |             print("ERROR: 404, server not found, check server address.")
 64 |             sys.exit(1)
 65 |         elif response.status_code == 400:
 66 |             print("ERROR: Invalid request sent - exiting")
 67 |             sys.exit(1)
 68 |         elif response.status_code == 200:
 69 |             print("Server located, testing translation")
 70 |             print(response.json())
 71 | 
 72 |     # FIXME - Handle connection errors, can probably be done better.
 73 |     except ConnectionRefusedError:
 74 |         print(
 75 |             f"Server connection refused - {serverURL}, is the address correct? \n\nExiting"
 76 |         )
 77 |         sys.exit()
 78 |     except Exception as e:
 79 |         print(f"Unable to connect, ERROR: {e}")
 80 |         sys.exit()
 81 | 
 82 | 
 83 | # Loads Excel into dataframe and translates messages
 84 | def loadAndTranslate(inputFile, inputLanguage):
 85 |     # Check we can hit the server before we start
 86 |     serverCheck()
 87 |     head, tail = os.path.split(inputFile)
 88 |     fileName = tail.split(".")[0]
 89 |     # Load Excel into Dataframe "df" and check for messages column.
 90 |     df = pd.read_excel(inputFile)
 91 | 
 92 |     if inputColumn not in df.columns:
 93 |         print("Required message column not found")
 94 |         sys.exit(1)
 95 | 
 96 |     # Load Messages Column to list and print some stats
 97 |     messages_nan_count = df["Message"].isna().sum()
 98 |     messages = df["Message"].tolist()
 99 |     print(f"{len(messages)} messages")
100 |     print(f"{messages_nan_count} blank rows")
101 | 
102 |     results = []
103 |     loopCount = 0
104 |     for message in messages:
105 |         # If no language code is specified use Auto Translate
106 |         if inputLanguage == None:
107 |             translated_text = translate_text(message, None)
108 |         # Else manual translation
109 |         else:
110 |             translated_text = translate_text(message, inputLanguage)
111 | 
112 |         if debug:
113 |             print(translated_text)
114 |         results.append(translated_text)
115 |         print(f"Processing message {loopCount} of {len(messages)}")
116 |         loopCount = loopCount + 1
117 | 
118 |         # ------------- Write backup file every 100 messages ----------------------------------------
119 |         if len(results) % 100 == 0:
120 |             print("Writing backup")
121 |             backup_frame = pd.DataFrame(results)
122 | 
123 |             try:
124 |                 backup_frame.to_excel(
125 |                     f"{fileName}_backup.xlsx",
126 |                     index=False,
127 |                     columns=[
128 |                         "detected_language",
129 |                         "detected_confidence",
130 |                         "success",
131 |                         "input",
132 |                         "translatedText",
133 |                     ],
134 |                 )
135 |             except:
136 |                 print("Writing Excel Bakcup failed")
137 |                 pass
138 | 
139 |             try:
140 |                 backup_frame.to_csv(
141 |                     f"{fileName}_backup.csv",
142 |                     encoding="utf-16",
143 |                     columns=[
144 |                         "detected_language",
145 |                         "detected_confidence",
146 |                         "success",
147 |                         "input",
148 |                         "translatedText",
149 |                     ],
150 |                 )
151 |             except:
152 |                 print("Writing CSV backup failed")
153 |                 pass
154 | 
155 |     # ------------------ Write output file -----------------------------------------------------------------
156 |     print("Translation complete - Writing file")
157 |     outputFrame = pd.DataFrame(results)
158 | 
159 |     try:
160 |         outputFrame.to_excel(
161 |             f"{fileName}_translated.xlsx",
162 |             index=False,
163 |             columns=[
164 |                 "detected_language",
165 |                 "detected_confidence",
166 |                 "success",
167 |                 "input",
168 |                 "translatedText",
169 |             ],
170 |         )
171 |     except:
172 |         print("Writing Excel file")
173 |         pass
174 | 
175 |     try:
176 |         outputFrame.to_csv(
177 |             f"{fileName}_translated.csv",
178 |             encoding="utf-16",
179 |             columns=[
180 |                 "detected_language",
181 |                 "detected_confidence",
182 |                 "success",
183 |                 "input",
184 |                 "translatedText",
185 |             ],
186 |         )
187 |     except:
188 |         print("Writing CSV failed")
189 |         pass
190 | 
191 |     print("Process complete - Exiting.")
192 | 
193 | 
194 | # ------------------ Translates text with selected language -----------------------------------------------
195 | def translate_text(inputText, inputLang, api_key=None):
196 |     # For future implementation
197 |     if api_key is not None:
198 |         API_KEY = api_key
199 |     else:
200 |         API_KEY = None
201 | 
202 |     if inputLang is not None:
203 |         if debug:
204 |             print("Manual Lanugage Detection {}".format(inputLang))
205 |         payload = json.dumps(
206 |             {
207 |                 "q": inputText,
208 |                 "source": inputLang,
209 |                 "target": "en",
210 |                 "format": "text",
211 |                 "api_key": API_KEY,
212 |             }
213 |         )
214 |     else:
215 |         if debug:
216 |             print("Auto language detection enabled".format(inputLang))
217 |         payload = json.dumps(
218 |             {
219 |                 "q": inputText,
220 |                 "source": "auto",
221 |                 "target": "en",
222 |                 "format": "text",
223 |                 "api_key": API_KEY,
224 |             }
225 |         )
226 | 
227 |     # Detect blank rows and skip to prevent error being thrown by server / speeds up process
228 |     if inputText == None or pd.isna(inputText):
229 |         print("Blank row found, skipping")
230 |         output = {
231 |             "detected_language": None,
232 |             "detected_confidence": None,
233 |             "translatedText": None,
234 |             "success": False,
235 |         }
236 |         output["input"] = inputText
237 |         return output
238 | 
239 |     else:
240 |         headers = {"Content-Type": "application/json"}
241 |         response = requests.post(
242 |             f"{serverURL}/translate", data=payload, headers=headers
243 |         )
244 |         if response.status_code == 200:
245 |             results = response.json()
246 |             if debug:
247 |                 print(f"{inputText} and {response.json()}")
248 |             try:
249 |                 answer = results
250 |                 # Server response style is different for Auto or Manual language selection
251 |                 if inputLang is not None:
252 |                     output = {
253 |                         "detected_language": f"Manual - {inputLang}",
254 |                         "detected_confidence": None,
255 |                         "translatedText": answer.get("translatedText"),
256 |                         "success": True,
257 |                     }
258 |                 else:
259 |                     output = {
260 |                         "detected_language": results.get("detectedLanguage")[
261 |                             "language"
262 |                         ],
263 |                         "detected_confidence": results.get("detectedLanguage")[
264 |                             "confidence"
265 |                         ],
266 |                         "translatedText": answer.get("translatedText"),
267 |                         "success": True,
268 |                     }
269 | 
270 |                 output["input"] = inputText
271 |                 return output
272 |             except Exception as e:
273 |                 print(e)
274 | 
275 |         elif response.status_code == 400:
276 |             print("Invalid request")
277 |             output = {
278 |                 "detected_language": None,
279 |                 "detected_confidence": None,
280 |                 "translatedText": None,
281 |                 "success": f"Error: {response.status_code, results.get}",
282 |             }
283 |             output["input"] = inputText
284 |             return output
285 | 
286 | 
287 | # Retrieve list of allowed languages from the server
288 | def getLanguages(printVals):
289 |     AllowedLangs = []
290 |     supportedLanguages = requests.get(f"{serverURL}/languages").json()
291 |     for langItem in supportedLanguages:
292 |         if printVals:
293 |             print(
294 |                 f"Language Code: {langItem['code']} Language Name: {langItem['name']}"
295 |             )
296 |         AllowedLangs.append(langItem["code"])
297 |     return AllowedLangs
298 | 
299 | 
300 | # ---------------------------- Argument Parser ------------------------
301 | 
302 | if __name__ == "__main__":
303 |     print(banner)
304 |     serverCheck()
305 |     print(f"Checking server {serverURL} for supported languages")
306 |     try:
307 |         supportedLanguages = getLanguages(False)
308 |         if len(supportedLanguages) == 0:
309 |             print("Supported Languages not found")
310 |             supportedLanguages = ["0"]
311 |         else:
312 |             print(f"Languages found - {supportedLanguages} \n\n")
313 | 
314 |     except Exception as e:
315 |         print(e)
316 | 
317 |     parser = argparse.ArgumentParser(
318 |         description=__description__,
319 |         epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
320 |     )
321 | 
322 |     parser.add_argument(
323 |         "-f", "--file", dest="inputFilePath", help="Path to Axiom formatted excel file"
324 |     )
325 |     parser.add_argument(
326 |         "-s",
327 |         "--server",
328 |         dest="translationServer",
329 |         help="Address of translation server if not localhost or hardcoded",
330 |         required=False,
331 |     )
332 |     parser.add_argument(
333 |         "-l",
334 |         "--language",
335 |         dest="inputLanguage",
336 |         help="Language code for input text - optional but can greatly improve accuracy",
337 |         required=False,
338 |         choices=supportedLanguages,
339 |     )
340 |     parser.add_argument(
341 |         "-g",
342 |         "--getlangs",
343 |         dest="getLangs",
344 |         action="store_true",
345 |         help="Get supported language codes and names from server",
346 |         required=False,
347 |         default=True,
348 |     )
349 |     args = parser.parse_args()
350 |     if len(sys.argv) == 1:
351 |         parser.print_help()
352 |         sys.exit(1)
353 | 
354 |     if args.inputFilePath and not args.inputLanguage:
355 |         if not os.path.exists(args.inputFilePath):
356 |             print(
357 |                 "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
358 |             )
359 |             sys.exit(1)
360 |         loadAndTranslate(args.inputFilePath, None)
361 | 
362 |     if args.inputFilePath and args.inputLanguage:
363 |         if not os.path.exists(args.inputFilePath):
364 |             print(
365 |                 "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
366 |             )
367 |             sys.exit(1)
368 |         print(f"Input language set to {args.inputLanguage}")
369 |         loadAndTranslate(args.inputFilePath, args.inputLanguage)
370 | 
371 |     if args.getLangs:
372 |         getLanguages(True)
373 | 


--------------------------------------------------------------------------------
/offlineTranslate/bulk_translate_v3.py:
--------------------------------------------------------------------------------
  1 | # Bulk Translation of Axiom formatted Excels containing messages
  2 | # Made in South Australia
  3 | # Unapologetically formatted with Black
  4 | #
  5 | # Changelog
  6 | # v0.3 Handle network errors... oops
  7 | # v0.2 Change to output full content of the input sheet
  8 | #      Handle Cellebrite and Axiom files
  9 | # v0.1 Initial Concept
 10 | 
 11 | import argparse
 12 | import json
 13 | import pandas as pd
 14 | import requests
 15 | import os
 16 | import sys
 17 | from tqdm import tqdm
 18 | from time import sleep
 19 | 
 20 | # ----------------- Settings live here ------------------------
 21 | 
 22 | __description__ = "Utilises a Libretranslate server to translate messages from Excel spreadsheets. By default messages are loaded from a column titled 'Message'."
 23 | __author__ = "facelessg00n"
 24 | __version__ = "0.3"
 25 | 
 26 | banner = """
 27 |  ██████  ███████ ███████ ██      ██ ███    ██ ███████     ████████ ██████   █████  ███    ██ ███████ ██       █████  ████████ ███████ 
 28 | ██    ██ ██      ██      ██      ██ ████   ██ ██             ██    ██   ██ ██   ██ ████   ██ ██      ██      ██   ██    ██    ██      
 29 | ██    ██ █████   █████   ██      ██ ██ ██  ██ █████          ██    ██████  ███████ ██ ██  ██ ███████ ██      ███████    ██    █████   
 30 | ██    ██ ██      ██      ██      ██ ██  ██ ██ ██             ██    ██   ██ ██   ██ ██  ██ ██      ██ ██      ██   ██    ██    ██      
 31 |  ██████  ██      ██      ███████ ██ ██   ████ ███████        ██    ██   ██ ██   ██ ██   ████ ███████ ███████ ██   ██    ██    ███████ 
 32 |                                                                                                                                       
 33 |                                                                                                                                       """
 34 | 
 35 | # Debug mode, will print errors etc
 36 | debug = False
 37 | 
 38 | # if being compiled with a GUI
 39 | # Keeps window alive if connection fails
 40 | hasGUI = True
 41 | 
 42 | serverURL = "http://localhost:5000"
 43 | CONNECTION_TIMEOUT = 3
 44 | RESPONSE_TIMEOUT = 60
 45 | #
 46 | # Endpoints
 47 | #           /translate - translation
 48 | #           /languages - supported languages
 49 | 
 50 | # Name of the column where the messages to be translated are found.
 51 | # This can be modified to suit other Excel column names if desired
 52 | inputColumn = "Message"
 53 | inputSheets = ["Chats", "Instant Messages"]
 54 | sheetName = "Chats"
 55 | headerRow = 1
 56 | 
 57 | translationColumns = [
 58 |     "detectedLanguage",
 59 |     "detectedConfidence",
 60 |     "success",
 61 |     "input",
 62 |     "translatedText",
 63 | ]
 64 | 
 65 | 
 66 | # Check is server is reachable and able to process a request.
 67 | def serverCheck(serverURL):
 68 |     print(f"Testing we can reach server {serverURL}")
 69 |     headers = {"Content-Type": "application/json"}
 70 |     payload = json.dumps(
 71 |         {
 72 |             "q": "Buenos días señor",
 73 |             "source": "auto",
 74 |             "target": "en",
 75 |             "format": "text",
 76 |             "api_key": None,
 77 |         }
 78 |     )
 79 |     try:
 80 |         response = requests.post(
 81 |             f"{serverURL}/translate", data=payload, headers=headers
 82 |         )
 83 |         if response.status_code == 404:
 84 |             print("ERROR: 404, server not found, check server address.")
 85 |             sys.exit(1)
 86 |         elif response.status_code == 400:
 87 |             print("ERROR: Invalid request sent - exiting")
 88 |             sys.exit(1)
 89 |         elif response.status_code == 200:
 90 |             print("Server located, testing translation")
 91 |             print(response.json())
 92 |             return "SERVER_OK"
 93 | 
 94 |     # FIXME - Handle connection errors, can probably be done better.
 95 |     except ConnectionRefusedError:
 96 |         print(
 97 |             f"Server connection refused - {serverURL}, is the address correct? \n\nExiting"
 98 |         )
 99 |         if not hasGUI:
100 |             sys.exit()
101 |     except Exception as e:
102 |         print(f"Unable to connect, ERROR: {e}")
103 |         if not hasGUI:
104 |             sys.exit()
105 | 
106 | 
107 | # Loads Excel into dataframe and translates messages
108 | def loadAndTranslate(inputFile, inputLanguage, inputSheet, isCellebrite):
109 |     # Check we can hit the server before we start
110 |     serverCheck(serverURL)
111 |     head, tail = os.path.split(inputFile)
112 |     fileName = tail.split(".")[0]
113 | 
114 |     if isCellebrite:
115 |         inputHeader = 1
116 |         inputColumn = "Body"
117 |     else:
118 |         inputHeader = 0
119 |         inputColumn = "Message"
120 | 
121 |     # Load Excel into Dataframe "df" and check for messages column.
122 |     if inputSheet:
123 |         print("There is an input sheet")
124 |         df = pd.read_excel(inputFile, sheet_name=inputSheet, header=inputHeader)
125 |     else:
126 |         print("There is no input sheet specified")
127 |         df = pd.read_excel(inputFile, header=inputHeader)
128 | 
129 |     if debug:
130 |         df = df.head(25)
131 | 
132 |     if inputColumn not in df.columns:
133 |         print("Required message column not found, is this a Cellbrite Formatted Excel?")
134 |         sys.exit(1)
135 | 
136 |     # Load Messages Column to list and print some stats
137 |     messages_nan_count = df[inputColumn].isna().sum()
138 |     messages = df[inputColumn].tolist()
139 |     print(f"{len(messages)} messages")
140 |     print(f"{messages_nan_count} blank rows")
141 | 
142 |     results = []
143 |     loopCount = 1
144 |     for message in tqdm(messages, desc="Translating messages", ascii="░▒█"):
145 |         # If no language code is specified use Auto Translate
146 |         if inputLanguage == None:
147 |             translated_text = translate_text(message, None)
148 |         # Else manual translation
149 |         else:
150 |             translated_text = translate_text(message, inputLanguage)
151 | 
152 |         if debug:
153 |             print(translated_text)
154 |         results.append(translated_text)
155 |         tqdm.write(f"Processing message {loopCount} of {len(messages)}")
156 |         # print(f"Processing message {loopCount} of {len(messages)}")
157 |         loopCount = loopCount + 1
158 | 
159 |         # ------------- Write backup file every 100 messages ----------------------------------------
160 |         if len(results) % 100 == 0:
161 |             tqdm.write("Writing backup")
162 |             backup_frame = pd.DataFrame(results)
163 | 
164 |             try:
165 |                 backup_frame.to_csv(
166 |                     f"{fileName}_backup.csv",
167 |                     encoding="utf-16",
168 |                     columns=translationColumns,
169 |                 )
170 |             except:
171 |                 print("Writing CSV backup failed")
172 |                 pass
173 | 
174 |     # ------------------ Write output file -----------------------------------------------------------------
175 |     print("Translation complete - Writing file")
176 |     # Get colum positon to insert new data
177 |     bodyPosition = df.columns.get_loc(inputColumn) + 1
178 |     # Splitting orig frame into to then concat with new data
179 |     df1_part1 = df.iloc[:, :bodyPosition]
180 |     df1_part2 = df.iloc[:, bodyPosition:]
181 |     outputFrame = pd.concat([df1_part1, pd.DataFrame(results), df1_part2], axis=1)
182 | 
183 |     try:
184 |         outputFrame.to_excel(f"{fileName}_translated.xlsx", index=False)
185 |     except:
186 |         print("Writing Excel failed")
187 |         pass
188 | 
189 |     try:
190 |         outputFrame.to_csv(f"{fileName}_translated.csv", encoding="utf-16")
191 |     except:
192 |         print("Writing CSV failed")
193 |         pass
194 | 
195 |     print("Process complete - Exiting.")
196 | 
197 | 
198 | # ------------------ Translates text with selected language -----------------------------------------------
199 | def translate_text(inputText, inputLang, api_key=None):
200 |     # For future implementation
201 |     if api_key is not None:
202 |         API_KEY = api_key
203 |     else:
204 |         API_KEY = None
205 | 
206 |     if inputLang is not None:
207 |         if debug:
208 |             print("Manual Lanugage Selection {}".format(inputLang))
209 |         payload = json.dumps(
210 |             {
211 |                 "q": inputText,
212 |                 "source": inputLang,
213 |                 "target": "en",
214 |                 "format": "text",
215 |                 "api_key": API_KEY,
216 |             }
217 |         )
218 |     else:
219 |         if debug:
220 |             print("Auto language detection enabled".format(inputLang))
221 |         payload = json.dumps(
222 |             {
223 |                 "q": inputText,
224 |                 "source": "auto",
225 |                 "target": "en",
226 |                 "format": "text",
227 |                 "api_key": API_KEY,
228 |             }
229 |         )
230 | 
231 |     # Detect blank rows and skip to prevent error being thrown by server / speeds up process
232 |     if inputText == None or pd.isna(inputText):
233 |         tqdm.write("Blank row found, skipping")
234 |         output = {
235 |             "detectedLanguage": None,
236 |             "detectedConfidence": None,
237 |             "translatedText": None,
238 |             "success": False,
239 |         }
240 |         output["input"] = inputText
241 |         return output
242 | 
243 |     # If row is not blank, attempt to translate it
244 |     else:
245 |         headers = {"Content-Type": "application/json"}
246 |         try:
247 |             # Max Attempt for retries
248 |             MAX_ATTEMPTS = 5
249 | 
250 |             response = requests.post(
251 |                 f"{serverURL}/translate",
252 |                 data=payload,
253 |                 headers=headers,
254 |                 timeout=(CONNECTION_TIMEOUT, RESPONSE_TIMEOUT),
255 |             )
256 | 
257 |         # Handle a read timeout error, sleep 2 seconds then try again
258 |         except requests.ReadTimeout:
259 | 
260 |             while MAX_ATTEMPTS > 0:
261 |                 try:
262 |                     tqdm.write("Read Timeout error, retrying")
263 |                     sleep(2)
264 |                     response = requests.post(
265 |                         f"{serverURL}/translate",
266 |                         data=payload,
267 |                         headers=headers,
268 |                     )
269 |                     output["input"] = inputText
270 |                     return output
271 | 
272 |                 except Exception:
273 |                     MAX_ATTEMPTS -= 1
274 |                     continue
275 |             else:
276 |                 output = {
277 |                     "detectedLanguage": None,
278 |                     "detectedConfidence": None,
279 |                     "translatedText": None,
280 |                     "success": "False: Error: Read Timeout ",
281 |                 }
282 |                 output["input"] = inputText
283 |                 return output
284 | 
285 |         # Handle a connection dropout, sleep 2 seconds and try again
286 |         except requests.ConnectionError:
287 |             while MAX_ATTEMPTS > 0:
288 |                 try:
289 |                     tqdm.write("Connection Error - Retrying")
290 |                     sleep(2)
291 |                     response = requests.post(
292 |                         f"{serverURL}/translate", data=payload, headers=headers
293 |                     )
294 |                     output["input"] = inputText
295 |                     return output
296 | 
297 |                 except Exception:
298 |                     MAX_ATTEMPTS -= 1
299 |                     continue
300 |             else:
301 |                 print("Failed")
302 |                 output = {
303 |                     "detectedLanguage": None,
304 |                     "detectedConfidence": None,
305 |                     "translatedText": None,
306 |                     "success": "False: Error: Connection Error",
307 |                 }
308 |                 output["input"] = inputText
309 |                 return output
310 | 
311 |         except Exception as e:
312 |             tqdm.write(f"Unhandled exception {e}")
313 |             output = {
314 |                 "detectedLanguage": None,
315 |                 "detectedConfidence": None,
316 |                 "translatedText": None,
317 |                 "success": f"False: Error: {e}",
318 |             }
319 |             output["input"] = inputText
320 |             return output
321 | 
322 |         if response.status_code == 200:
323 |             results = response.json()
324 |             if debug:
325 |                 print(f"{inputText} and {response.json()}")
326 |             try:
327 |                 answer = results
328 |                 # Server response style is different for Auto or Manual language selection
329 |                 if inputLang is not None:
330 |                     output = {
331 |                         "detectedLanguage": f"Manual - {inputLang}",
332 |                         "detectedConfidence": None,
333 |                         "translatedText": answer.get("translatedText"),
334 |                         "success": True,
335 |                     }
336 |                 else:
337 |                     output = {
338 |                         "detectedLanguage": results.get("detectedLanguage")["language"],
339 |                         "detectedConfidence": results.get("detectedLanguage")[
340 |                             "confidence"
341 |                         ],
342 |                         "translatedText": answer.get("translatedText"),
343 |                         "success": True,
344 |                     }
345 | 
346 |                 output["input"] = inputText
347 |                 return output
348 |             except Exception as e:
349 |                 print(e)
350 | 
351 |         elif response.status_code == 400:
352 |             print("Invalid request")
353 |             output = {
354 |                 "detectedLanguage": None,
355 |                 "detectedConfidence": None,
356 |                 "translatedText": None,
357 |                 "success": f"Error: {response.status_code, results.get}",
358 |             }
359 |             output["input"] = inputText
360 |             return output
361 | 
362 | 
363 | # Retrieve list of alowed languages from the server
364 | def getLanguages(printVals):
365 |     AllowedLangs = []
366 |     try:
367 |         supportedLanguages = requests.get(f"{serverURL}/languages").json()
368 |     except:
369 |         print("Supported Languages not found")
370 |         supportedLanguages = []
371 |         pass
372 | 
373 |     for langItem in supportedLanguages:
374 |         if printVals:
375 |             print(
376 |                 f"Language Code: {langItem['code']} Language Name: {langItem['name']}"
377 |             )
378 |         AllowedLangs.append(langItem["code"])
379 |     return AllowedLangs
380 | 
381 | 
382 | # ---------------------------- Argument Parser ------------------------
383 | 
384 | if __name__ == "__main__":
385 |     print(banner)
386 |     if debug:
387 |         print("WARNING DEBUG MODE IS ACTIVE")
388 |     serverCheck(serverURL)
389 |     print(f"Checking server {serverURL} for supported languages")
390 |     try:
391 |         supportedLanguages = getLanguages(False)
392 |         if len(supportedLanguages) == 0:
393 |             print("Supported Languages not found")
394 |             supportedLanguages = []
395 |         else:
396 |             print(f"Languages found - {supportedLanguages} \n\n")
397 | 
398 |     except Exception as e:
399 |         print(e)
400 | 
401 |     parser = argparse.ArgumentParser(
402 |         description=__description__,
403 |         epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
404 |     )
405 | 
406 |     parser.add_argument("-f", "--file", dest="inputFilePath", help="Path to Excel File")
407 |     parser.add_argument(
408 |         "-s",
409 |         "--server",
410 |         dest="translationServer",
411 |         help="Address of translation server if not localhost or hardcoded",
412 |         required=False,
413 |     )
414 | 
415 |     parser.add_argument(
416 |         "-l",
417 |         "--language",
418 |         dest="inputLanguage",
419 |         help="Language code for input text - optional but can greatly improve accuracy",
420 |         required=False,
421 |         choices=supportedLanguages,
422 |     )
423 | 
424 |     parser.add_argument(
425 |         "-e",
426 |         "--excelSheet",
427 |         dest="inputSheet",
428 |         help="Sheet name within Excel file to be translated",
429 |         required=False,
430 |         choices=inputSheets,
431 |     )
432 | 
433 |     parser.add_argument(
434 |         "-c",
435 |         "--isCellebrite",
436 |         dest="isCellebrite",
437 |         help="If file originated from Cellebrite, header starts at 1, and message column is called 'Body'",
438 |         required=False,
439 |         action="store_true",
440 |         default=False,
441 |     )
442 | 
443 |     parser.add_argument(
444 |         "-g",
445 |         "--getlangs",
446 |         dest="getLangs",
447 |         action="store_true",
448 |         help="Get supported language codes and names from server",
449 |         required=False,
450 |         default=False,
451 |     )
452 | 
453 |     args = parser.parse_args()
454 |     if len(sys.argv) == 1:
455 |         parser.print_help()
456 |         sys.exit(1)
457 | 
458 |     if args.inputFilePath and not args.inputLanguage:
459 |         if not os.path.exists(args.inputFilePath):
460 |             print(
461 |                 "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
462 |             )
463 |             sys.exit(1)
464 |         loadAndTranslate(args.inputFilePath, None, args.inputSheet, args.isCellebrite)
465 | 
466 |     if args.inputFilePath and args.inputLanguage:
467 |         if not os.path.exists(args.inputFilePath):
468 |             print(
469 |                 "ERROR: {} does not exist or is not a file".format(args.inputFilePath)
470 |             )
471 |             sys.exit(1)
472 |         print(f"Input language set to {args.inputLanguage}")
473 |         loadAndTranslate(
474 |             args.inputFilePath, args.inputLanguage, args.inputSheet, args.isCellebrite
475 |         )
476 | 
477 |     if args.getLangs:
478 |         getLanguages(True)
479 | 


--------------------------------------------------------------------------------
/clbExtract/old/clbExtract.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Extracts nested contacts data from Cellebrite formatted Excel documents.
  3 |     - Cellebrite Stores contact details in multiline Excel cells.
  4 | Formatted with Black
  5 | 
  6 | Changelog
  7 | 0.3 Complete rewrite
  8 | 
  9 | 0.2 - Implement command line argument parser
 10 |         Allow bulk processing of all items in directory
 11 | 
 12 | 0.1 - Initial concept
 13 | 
 14 | """
 15 | import argparse
 16 | import glob
 17 | import logging
 18 | import os
 19 | import openpyxl
 20 | import pandas as pd
 21 | from pathlib import Path
 22 | import sys
 23 | 
 24 | 
 25 | 
 26 | ## Details
 27 | __description__ = 'Flattens Cellebrite formatted Excel files. "Contacts" and "Device Info" tabs are required.'
 28 | __author__ = "facelessg00n"
 29 | __version__ = "0.3"
 30 | 
 31 | parser = argparse.ArgumentParser(
 32 |     description=__description__,
 33 |     epilog="Developed by {}".format(str(__author__), str(__version__)),
 34 | )
 35 | 
 36 | # ----------- Options -----------
 37 | debug = False
 38 | 
 39 | os.chdir(os.getcwd())
 40 | 
 41 | logging.basicConfig(
 42 |     filename="log.txt",
 43 |         format="%(asctime)s,- %(levelname)s - %(message)s",
 44 |     level=logging.INFO,
 45 | )
 46 | 
 47 | 
 48 | # Set names for sheets of interest
 49 | clbPhoneInfo = "Device Info"
 50 | clbContactSheet = "Contacts"
 51 | 
 52 | # FIXME
 53 | #### ---- Column names and other options ---------------------------------------------
 54 | contactOutput = "ContactDetail"
 55 | contactTypeOutput = "ContactType"
 56 | originIMEI = "originIMEI"
 57 | parsedApps = [
 58 |     "Instagram",
 59 |     "Native",
 60 |     "Telegram",
 61 |     "Snapchat",
 62 |     "WhatsApp",
 63 |     "Facebook Messenger",
 64 |     "Signal",
 65 | ]
 66 | 
 67 | # Class object to hold phone and input file info
 68 | class phoneData:
 69 |     IMEI = None
 70 |     IMEI2 = None
 71 |     inFile = None
 72 |     inPath = None
 73 | 
 74 |     def __init__(self, IMEI=None, IMEI2=None, inFile=None, inPath=None) -> None:
 75 |         self.IMEI = IMEI
 76 |         self.IMEI2 = IMEI2
 77 |         self.inFile = inFile
 78 |         self.inPath = inPath
 79 | 
 80 | 
 81 | # -------------Functions live here ------------------------------------------
 82 | 
 83 | # ----- Bulk Excel Processor--------------------------------------------------
 84 | 
 85 | # Finds and processes all exxcel files in the working directory.
 86 | def bulkProcessor():
 87 |     FILE_PATH = os.getcwd()
 88 |     inputFiles = glob.glob("*.xlsx")
 89 |     print((str(len(inputFiles)) + " Excel files located. \n"))
 90 |     # If there are no files found exit the process.
 91 |     if len(inputFiles) == 0:
 92 |         print("No excel files located.")
 93 |         print("Exiting.")
 94 |         quit()
 95 |     else:
 96 |         for x in inputFiles:
 97 |             if os.path.exists(x):
 98 |                 try:
 99 |                     processMetadata(x)
100 |                 # Need to deal with $ files.
101 |                 except FileNotFoundError:
102 |                     print("File does not exist or temp file detected")
103 |                     pass
104 |     if debug:
105 |         for x in inputFiles:
106 |             inputFilename = x.split(".")[0]
107 |             print(inputFilename)
108 | 
109 | 
110 | # FIXME - Deal with error when this info is missing
111 | ### -------- Process phone metadata ------------------------------------------------------
112 | def processMetadata(inputFile):
113 | 
114 |     try:
115 |         infoPD = pd.read_excel(
116 |             inputFile, sheet_name=clbPhoneInfo, header=1, usecols="B,C,D"
117 |         )
118 | 
119 |         phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][0]
120 |         try:
121 |             phoneData.IMEI2 = infoPD.loc[infoPD["Name"] == "IMEI2", ["Value"]].values[
122 |                 0
123 |             ][0]
124 |         except:
125 |             phoneData.IMEI2 = None
126 |         # phoneData.inFile = inputFile.split(".")[0]
127 |         phoneData.inFile = Path(inputFile).stem
128 |         phoneData.inPath = os.path.dirname(inputFile)
129 | 
130 |         if debug:
131 |             print(infoPD)
132 |             print(phoneData.IMEI)
133 |     except ValueError:
134 |         print(
135 |             "\033[1;31m Info tab not found in {}, apptempting with with no IMEI".format(
136 |                 inputFile
137 |             )
138 |         )
139 |         phoneData.IMEI = None
140 |         phoneData.IMEI2 = None
141 |         # phoneData.inFile = inputFile.split(".")[0]
142 |         phoneData.inFile = Path(inputFile).stem
143 |         phoneData.inPath = os.path.dirname(inputFile)
144 | 
145 |     try:
146 |         processContacts(inputFile)
147 |     except ValueError:
148 |         print("\033[1;31m No Contacts tab  found, is this a correctly formatted Excel?")
149 |         logging.error(
150 |             "No Contacts tab  found in {}, is this a correctly formatted Excel?".format(
151 |                 inputFile
152 |             )
153 |         )
154 | 
155 | 
156 | ### Extract contacts tab of Excel file -------------------------------------------------------------------
157 | def processContacts(inputFile):
158 |     inputFile = inputFile
159 |     logging.info("Processing contacts in {} has begun.".format(inputFile))
160 | 
161 |     # Record input filename for use in export processes.
162 | 
163 |     if debug:
164 |         print("\033[0;37m Input file is : {}".format(phoneData.inFile))
165 | 
166 |     contactsPD = pd.read_excel(
167 |         inputFile,
168 |         sheet_name=clbContactSheet,
169 |         header=1,
170 |         index_col="#",
171 |         usecols=["#", "Name", "Interaction Statuses", "Entries", "Source", "Account"],
172 |     )
173 | 
174 |     print(
175 |         "\033[0m Processing the following app types for : {}".format(phoneData.inFile)
176 |     )
177 |     applist = contactsPD["Source"].unique()
178 |     for x in applist:
179 |         if x in parsedApps:
180 |             print("{} : \u2713 ".format(x))
181 |         else:
182 |             print("{} : \u2716".format(x))
183 |     # Process native contacts
184 |     try:
185 |         processAppleNative(contactsPD)
186 |     except:
187 |         print("Processing native contacts failed.")
188 |         pass
189 |     # Process Apps
190 |     for x in applist:
191 |         if x == "Instagram":
192 |             processInstagram(contactsPD)
193 |         if x == "Snapchat":
194 |             processSnapChat(contactsPD)
195 |         if x == "WhatsApp":
196 |             processWhatsapp(contactsPD)
197 |         if x == "Telegram":
198 |             processTelegram(contactsPD)
199 |         if x == "Facebook Messenger":
200 |             processFacebookMessenger(contactsPD)
201 |         if x == "Signal":
202 |             processSignal(contactsPD)
203 | 
204 | 
205 | # ------ Parse Facebook Messenger --------------------------------------------------------------
206 | def processFacebookMessenger(contactsPD):
207 |     print("\nProcessing Facebook Messenger")
208 |     facebookMessengerPD = contactsPD[contactsPD["Source"] == "Facebook Messenger"]
209 |     facebookMessengerPD = facebookMessengerPD.drop("Entries", axis=1).join(
210 |         facebookMessengerPD["Entries"].str.split("\n", expand=True)
211 |     )
212 |     facebookMessengerPD = facebookMessengerPD.reset_index(drop=True)
213 | 
214 |     selected_cols = []
215 |     for x in facebookMessengerPD.columns:
216 |         if isinstance(x, int):
217 |             selected_cols.append(x)
218 | 
219 |     def phoneCheck(facebookMessengerPD):
220 |         for x in selected_cols:
221 |             facebookMessengerPD.loc[
222 |                 (facebookMessengerPD[x].str.contains("User ID-Facebook Id", na=False)),
223 |                 "Account ID",
224 |             ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
225 |             facebookMessengerPD.loc[
226 |                 (facebookMessengerPD[x].str.contains("User ID-Username", na=False)),
227 |                 "User Name",
228 |             ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
229 | 
230 |     phoneCheck(facebookMessengerPD)
231 |     facebookMessengerPD[originIMEI] = phoneData.IMEI
232 |     exportCols = []
233 |     for x in facebookMessengerPD.columns:
234 |         if isinstance(x, str):
235 |             exportCols.append(x)
236 |     print("\n")
237 |     print(
238 |         "{} user accounts located".format(len(facebookMessengerPD["Account"].unique()))
239 |     )
240 |     print("{} contacts located".format(len(facebookMessengerPD["Account ID"].unique())))
241 |     print("Exporting {}-FB-MESSENGER.csv".format(phoneData.inFile))
242 |     logging.info("Exporting FB messenger from {}".format(phoneData.inFile))
243 |     facebookMessengerPD[exportCols].to_csv(
244 |         "{}-FB-MESSENGER.csv".format(phoneData.inFile),
245 |         index=False,
246 |         columns=[
247 |             originIMEI,
248 |             "Account",
249 |             "Interaction Statuses",
250 |             "Name",
251 |             "User Name",
252 |             "Account ID",
253 |             "Source",
254 |         ],
255 |     )
256 | 
257 | 
258 | # ----- Parse Instagram data ------------------------------------------------------------------
259 | def processInstagram(contactsPD):
260 |     print("\nProcessing Instagram")
261 |     instagramPD = contactsPD[contactsPD["Source"] == "Instagram"].copy()
262 |     instagramPD = instagramPD.drop("Entries", axis=1).join(
263 |         instagramPD["Entries"].str.split("\n", expand=True)
264 |     )
265 | 
266 |     selected_cols = []
267 |     for x in instagramPD.columns:
268 |         if isinstance(x, int):
269 |             selected_cols.append(x)
270 | 
271 |     def instaContacts(instagramPD):
272 |         for x in selected_cols:
273 |             instagramPD.loc[
274 |                 (instagramPD[x].str.contains("User ID-Username", na=False)), "User Name"
275 |             ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
276 |             instagramPD.loc[
277 |                 (instagramPD[x].str.contains("User ID-Instagram Id", na=False)),
278 |                 "Instagram ID",
279 |             ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
280 | 
281 |     instaContacts(instagramPD)
282 | 
283 |     instagramPD[originIMEI] = phoneData.IMEI
284 |     exportCols = []
285 |     for x in instagramPD.columns:
286 |         if isinstance(x, str):
287 |             exportCols.append(x)
288 | 
289 |     print("Exporting {}-INSTAGRAM.csv".format(phoneData.inFile))
290 |     logging.info("Exporting Instagram from {}".format(phoneData.inFile))
291 |     instagramPD[exportCols].to_csv(
292 |         "{}-INSTAGRAM.csv".format(phoneData.inFile),
293 |         index=False,
294 |         columns=[
295 |             originIMEI,
296 |             "Account",
297 |             "Name",
298 |             "User Name",
299 |             "Instagram ID",
300 |             "Interaction Statuses",
301 |         ],
302 |     )
303 | 
304 | 
305 | # ------------Process native contact list ------------------------------------------------
306 | def processAppleNative(contactsPD):
307 | 
308 |     print("\nProcessing Native Contacts")
309 |     nativeContactsPD = contactsPD[contactsPD["Source"].isna()]
310 |     nativeContactsPD = nativeContactsPD.drop("Entries", axis=1).join(
311 |         nativeContactsPD["Entries"]
312 |         .str.split("\n", expand=True)
313 |         .stack()
314 |         .reset_index(level=1, drop=True)
315 |         .rename("Entries")
316 |     )
317 | 
318 |     nativeContactsPD = nativeContactsPD[["Name", "Interaction Statuses", "Entries"]]
319 | 
320 |     nativeContactsPD = nativeContactsPD[
321 |         nativeContactsPD["Entries"].str.contains(r"Phone-")
322 |     ]
323 |     nativeContactsPD[originIMEI] = phoneData.IMEI
324 |     nativeContactsPD["Entries"] = (
325 |         nativeContactsPD["Entries"]
326 |         .str.split(":", n=1, expand=True)[1]
327 |         .str.strip()
328 |         .str.replace(" ", "")
329 |         .str.replace("-", "")
330 |     )
331 |     if debug:
332 |         print(nativeContactsPD)
333 |     nativeContactsPD = nativeContactsPD[
334 |         [originIMEI, "Name", "Entries", "Interaction Statuses"]
335 |     ]
336 |     print("{} contacts located.".format(len(nativeContactsPD)))
337 |     print("Exporting {}-NATIVE.csv".format(phoneData.inFile))
338 |     logging.info("Exporting Native contacts from {}".format(phoneData.inFile))
339 |     nativeContactsPD.to_csv("{}-NATIVE.csv".format(phoneData.inFile), index=False)
340 | 
341 | 
342 | # ------------Parse Signal contacts ---------------------------------------------------------------
343 | def processSignal(contactsPD):
344 |     print("Processing Signal Contacts")
345 |     signalPD = contactsPD[contactsPD["Source"] == "Signal"].copy()
346 |     signalPD = signalPD[["Name", "Entries", "Source"]]
347 |     signalPD = signalPD.drop("Entries", axis=1).join(
348 |         signalPD["Entries"].str.split("\n", expand=True)
349 |     )
350 | 
351 |     # Data is expended into columns with integern names, add these columsn to selected_cols so we can search them later
352 |     selected_cols = []
353 |     for x in signalPD.columns:
354 |         if isinstance(x, int):
355 |             selected_cols.append(x)
356 | 
357 |     # Signal can store mutiple values under entries such as Mobile Number:
358 |     # So we break them all out into columns.
359 |     def signalContact(signalPD):
360 |         for x in selected_cols:
361 |             # Locate Signal Username and move to Username Column
362 |             signalPD.loc[
363 |                 (signalPD[x].str.contains("User ID-Username:", na=False)),
364 |                 "User Name",
365 |             ] = signalPD[x].str.split(":", n=1, expand=True)[1]
366 |             # Delete Username entry from origional location
367 |             signalPD.loc[
368 |                 signalPD[x].str.contains("User ID-Username:", na=False), [x]
369 |             ] = ""
370 |             # delete all befote semicolon
371 |             signalPD[x] = signalPD[x].str.split(":", n=1, expand=True)[1].str.strip()
372 | 
373 |     signalContact(signalPD)
374 | 
375 |     signalPD[originIMEI] = phoneData.IMEI
376 | 
377 |     export_cols = [originIMEI, "Name", "User Name"]
378 |     export_cols.extend(selected_cols)
379 |     print("Located {} Signal contacts".format(len(signalPD["Name"])))
380 |     print("Exporting {}-SIGNAL.csv".format(phoneData.inFile))
381 |     logging.info("Exporting Signal messenger from {}".format(phoneData.inFile))
382 |     signalPD.to_csv(
383 |         "{}-SIGNAL.csv".format(phoneData.inFile), index=False, columns=export_cols
384 |     )
385 | 
386 | 
387 | # ----------- Parse Snapchat data ------------------------------------------------------------------
388 | def processSnapChat(contactsPD):
389 |     print("\nProcessing Snapchat")
390 |     snapPD = contactsPD[contactsPD["Source"] == "Snapchat"]
391 |     snapPD = snapPD[["Name", "Entries", "Source"]]
392 | 
393 |     # Extract nested entities
394 |     snapPD = snapPD.drop("Entries", axis=1).join(
395 |         snapPD["Entries"].str.split("\n", expand=True)
396 |     )
397 |     selected_cols = []
398 |     for x in snapPD.columns:
399 |         if isinstance(x, int):
400 |             selected_cols.append(x)
401 | 
402 |     def snapContacts(snapPD):
403 |         for x in selected_cols:
404 |             snapPD.loc[
405 |                 (snapPD[x].str.contains("User ID-Username", na=False)), "User Name"
406 |             ] = snapPD[x].str.split(":", n=1, expand=True)[1]
407 |             snapPD.loc[
408 |                 (snapPD[x].str.contains("User ID-User ID", na=False)), "User ID"
409 |             ] = snapPD[x].str.split(":", n=1, expand=True)[1]
410 | 
411 |     snapContacts(snapPD)
412 |     snapPD[originIMEI] = phoneData.IMEI
413 | 
414 |     exportCols = []
415 |     for x in snapPD.columns:
416 |         if isinstance(x, str):
417 |             exportCols.append(x)
418 |     if debug:
419 |         print(snapPD[exportCols])
420 |     print("Exporting {}-SNAPCHAT.csv".format(phoneData.inFile))
421 |     logging.info("Exporting Snapchat from {}".format(phoneData.inFile))
422 |     snapPD[exportCols].to_csv(
423 |         "{}-SNAPCHAT.csv".format(phoneData.inFile),
424 |         index=False,
425 |         columns=[originIMEI, "Name", "User Name", "User ID"],
426 |     )
427 | 
428 | 
429 | # ---- Parse Telegram Contacts--------------------------------------------------------------
430 | def processTelegram(contactsPD):
431 |     print("\nProcessing Telegram")
432 |     telegramPD = contactsPD[contactsPD["Source"] == "Telegram"].copy()
433 |     telegramPD = telegramPD.drop("Entries", axis=1).join(
434 |         telegramPD["Entries"].str.split("\n", expand=True)
435 |     )
436 |     telegramPD = telegramPD.reset_index(drop=True)
437 | 
438 |     selected_cols = []
439 |     for x in telegramPD.columns:
440 |         if isinstance(x, int):
441 |             selected_cols.append(x)
442 | 
443 |     def phoneCheck(telegramPD):
444 |         for x in selected_cols:
445 |             telegramPD.loc[
446 |                 (telegramPD[x].str.contains("Phone-", na=False)), "Phone-Number"
447 |             ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
448 | 
449 |             telegramPD.loc[
450 |                 (telegramPD[x].str.contains("User ID-Peer", na=False)), "Peer-ID"
451 |             ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
452 | 
453 |             telegramPD.loc[
454 |                 (telegramPD[x].str.contains("User ID-Username", na=False)), "User-Name"
455 |             ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
456 | 
457 |     phoneCheck(telegramPD)
458 |     telegramPD[originIMEI] = phoneData.IMEI
459 |     exportCols = []
460 |     for x in telegramPD.columns:
461 |         if isinstance(x, str):
462 |             exportCols.append(x)
463 |     # Export CSV
464 |     print("Exporting {}-TELEGRAM.csv".format(phoneData.inFile))
465 |     logging.info("Exporting Telegram from {}".format(phoneData.inFile))
466 |     telegramPD[exportCols].to_csv(
467 |         "{}-TELEGRAM.csv".format(phoneData.inFile), index=False
468 |     )
469 | 
470 | 
471 | # ---Parse Whatsapp Contacts----------------------------------------------------------------------
472 | # Load WhatsApp
473 | def processWhatsapp(contactsPD):
474 |     print("\nProcessing WhatsApp")
475 |     whatsAppPD = contactsPD[contactsPD["Source"] == "WhatsApp"].copy()
476 |     whatsAppPD = whatsAppPD[["Name", "Entries", "Source", "Interaction Statuses"]]
477 |     # Shared contacts are not associated with a Whats app ID and cause problems.
478 |     whatsAppPD = whatsAppPD[
479 |         whatsAppPD["Interaction Statuses"].str.contains("Shared", na=False) == False
480 |     ]
481 |     # Unpack nested data
482 |     whatsAppPD = whatsAppPD.drop("Entries", axis=1).join(
483 |         whatsAppPD["Entries"].str.split("\n", expand=True)
484 |     )
485 | 
486 |     # Data is expanded into colums with Integer names, check for these columns and add them to a
487 |     # list to allow for different width sheets.
488 |     colList = list(whatsAppPD)
489 |     selected_cols = []
490 |     for x in colList:
491 |         if isinstance(x, int):
492 |             selected_cols.append(x)
493 | 
494 |     # Look for data across expanded columns and shift it to output columns.
495 |     def whatsappContactProcess(whatsAppPD):
496 |         print("\nProcessing WhatsApp")
497 |         for x in selected_cols:
498 |             whatsAppPD.loc[
499 |                 (whatsAppPD[x].str.contains("Phone-Mobile", na=False)), "Phone-Mobile"
500 |             ] = (
501 |                 whatsAppPD[x]
502 |                 .str.split(":", n=1, expand=True)[1]
503 |                 .str.replace(" ", "")
504 |                 .str.replace("-", "")
505 |             )
506 | 
507 |             whatsAppPD.loc[
508 |                 (whatsAppPD[x].str.contains("Phone-:", na=False)), "Phone"
509 |             ] = (
510 |                 whatsAppPD[x]
511 |                 .str.split(":", n=1, expand=True)[1]
512 |                 .str.replace(" ", "")
513 |                 .str.replace("-", "")
514 |             )
515 | 
516 |             whatsAppPD.loc[
517 |                 (whatsAppPD[x].str.contains("Phone-Home:", na=False)), "Phone-Home"
518 |             ] = (
519 |                 whatsAppPD[x]
520 |                 .str.split(":", n=1, expand=True)[1]
521 |                 .str.replace(" ", "")
522 |                 .str.replace("-", "")
523 |             )
524 | 
525 |             whatsAppPD.loc[
526 |                 (whatsAppPD[x].str.contains("User ID-Push Name", na=False)), "Push-ID"
527 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
528 | 
529 |             whatsAppPD.loc[
530 |                 (whatsAppPD[x].str.contains("User ID-Id", na=False)), "Id-ID"
531 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
532 | 
533 |             whatsAppPD.loc[
534 |                 (whatsAppPD[x].str.contains("User ID-WhatsApp User Id", na=False)),
535 |                 "WhatsApp-ID",
536 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
537 | 
538 |             whatsAppPD.loc[
539 |                 (whatsAppPD[x].str.contains("Web address-Professional", na=False)),
540 |                 "BusinessWebsite",
541 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
542 | 
543 |             whatsAppPD.loc[
544 |                 (whatsAppPD[x].str.contains("Email-Professional", na=False)),
545 |                 "Business-Email",
546 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
547 | 
548 |     whatsappContactProcess(whatsAppPD)
549 | 
550 |     # Add IMEI Column
551 |     whatsAppPD[originIMEI] = phoneData.IMEI
552 | 
553 |     # Remove working columns.
554 |     exportCols = []
555 |     for x in whatsAppPD.columns:
556 |         if isinstance(x, str):
557 |             exportCols.append(x)
558 |     if debug:
559 |         print(exportCols)
560 | 
561 |     # Export CSV
562 |     print("Exporting {}-WHATSAPP.csv".format(phoneData.inFile))
563 |     logging.info("Exporting Whatsapp from {}".format(phoneData.inFile))    
564 |     whatsAppPD[exportCols].to_csv(
565 |         "{}-WHATSAPP.csv".format(phoneData.inFile), index=False
566 |     )
567 | 
568 | 
569 | # ------- Argument parser for command line arguments -----------------------------------------
570 | 
571 | if __name__ == "__main__":
572 |     parser = argparse.ArgumentParser(
573 |         description=__description__,
574 |         epilog="Developed by {}".format(str(__author__), str(__version__)),
575 |     )
576 | 
577 |     parser.add_argument(
578 |         "-f",
579 |         "--f",
580 |         dest="inputFilename",
581 |         help="Path to Excel Spreadsheet",
582 |         required=False,
583 |     )
584 | 
585 |     parser.add_argument(
586 |         "-b",
587 |         "--bulk",
588 |         dest="bulk",
589 |         required=False,
590 |         action="store_true",
591 |         help="Bulk process Excel spreadsheets in working directory.",
592 |     )
593 | 
594 |     args = parser.parse_args()
595 | 
596 |     if len(sys.argv) == 1:
597 |         parser.print_help()
598 |         parser.exit()
599 | 
600 |     if args.bulk:
601 |         print("Bulk Process")
602 |         bulkProcessor()
603 | 
604 |     if args.inputFilename:
605 |         if not os.path.exists(args.inputFilename):
606 |             print(
607 |                 "Error: '{}' does not exist or is not a file.".format(
608 |                     args.inputFilename
609 |                 )
610 |             )
611 |             sys.exit(1)
612 |         processMetadata(args.inputFilename)
613 | 


--------------------------------------------------------------------------------
/clbExtract/clbExtract.py:
--------------------------------------------------------------------------------
   1 | """
   2 | Extracts nested contacts data from Cellebrite formatted Excel documents.
   3 |     - Cellebrite Stores contact details in multiline Excel cells.
   4 | 
   5 | Formatted unapologetically with Black
   6 | 
   7 | # Current Known Issues
   8 | # FIXME - Fix Instagram parser, account ID's have changed
   9 | # FIXME - Fix Og Signal parser, Column order
  10 | 
  11 | Changelog
  12 | 0.9 - Fix - Handles name change to Signal Private Messenger and extra data columns
  13 |     - prints version to command line
  14 |     - Fix bug where files ending in .XLXS (Caps) wouldn't be automatically found
  15 |     - Support for Outlook contacts
  16 | 
  17 | 0.8 - Fix issue where Input file was entered twice for Instagram export sheet
  18 | 
  19 | 0.7 - Add Provenance data column
  20 |     - Fix issue where WhatsApp or Facebook may not export due to lack if 'Interaction Statuses' Column
  21 | 
  22 | 0.6 - Fix issue with Threema user ID attribution
  23 |     - Fix issue with parsers crashing out, now raises an exception and continues.
  24 | 
  25 | 0.5 - Added support for recents - at this time this is kept separate from native contacts
  26 |     - Warning re large files, pandas is unable to provide load time estimates
  27 |     - Add option to normalise Au mobile phone by converting +614** to 04**
  28 |     - Minor tidyups and fixes to logging.
  29 |     - Fix WeChat exception for older style excels
  30 |     - Fix Whatsapp exception when interaction status is not populated
  31 |     - Fix Exception when there is no IMEI entry at all, eg. older iPads
  32 |     - Populate and export source columns
  33 | 
  34 | 0.4a - Added support for Cellebrite files with device info stored in "device" rather than name columns
  35 | 
  36 | 0.4 - Add support for alternate Cellebrite info page format
  37 |     - Add support For Line, WeChat, Threema contacts
  38 | 
  39 | 0.3 Complete rewrite
  40 | 
  41 | 0.2 - Implement command line argument parser
  42 |         Allow bulk processing of all items in directory
  43 | 
  44 | 0.1 - Initial concept
  45 | 
  46 | """
  47 | 
  48 | import argparse
  49 | import glob
  50 | import logging
  51 | import os
  52 | import pandas as pd
  53 | from pathlib import Path
  54 | import sys
  55 | 
  56 | ## Details
  57 | __description__ = 'Flattens Cellebrite formatted Excel files. "Contacts" and "Device Info" tabs are required.'
  58 | __author__ = "facelessg00n"
  59 | __version__ = "0.9"
  60 | 
  61 | parser = argparse.ArgumentParser(
  62 |     description=__description__,
  63 |     epilog="Developed by {}, version {}".format(str(__author__), str(__version__)),
  64 | )
  65 | 
  66 | # ----------- Options -----------
  67 | os.chdir(os.getcwd())
  68 | 
  69 | # Show extra debug output
  70 | debug = False
  71 | 
  72 | # Normalise Australian mobile numbers by replacing +614 with 04
  73 | ausNormal = True
  74 | 
  75 | # File size warning (MB)
  76 | warnSize = 50
  77 | 
  78 | 
  79 | # ----------- Logging options -------------------------------------
  80 | 
  81 | logging.basicConfig(
  82 |     filename="clbExtract.log",
  83 |     format="%(asctime)s,- %(levelname)s - %(message)s",
  84 |     level=logging.INFO,
  85 | )
  86 | 
  87 | 
  88 | # Set names for sheets of interest
  89 | clbPhoneInfo = "Device Info"
  90 | clbContactSheet = "Contacts"
  91 | clbPhoneInfov2 = "Device Information"
  92 | 
  93 | # FIXME
  94 | #### ---- Column names and other options ---------------------------------------------
  95 | provenanceCols = ["WARRANT", "COLLECT", "EXAM", "NOTICE"]
  96 | 
  97 | contactOutput = "ContactDetail"
  98 | contactTypeOutput = "ContactType"
  99 | originIMEI = "originIMEI"
 100 | parsedApps = [
 101 |     "Facebook Messenger",
 102 |     "Instagram",
 103 |     "Line",
 104 |     "Native",
 105 |     "Outlook",
 106 |     "Recents",
 107 |     "Signal",
 108 |     "Signal Private Messenger",
 109 |     "Snapchat",
 110 |     "WhatsApp",
 111 |     "Telegram",
 112 |     "Threema",
 113 |     "WeChat",
 114 |     "Zalo",
 115 | ]
 116 | 
 117 | 
 118 | # Class object to hold phone and input file info
 119 | class phoneData:
 120 |     IMEI = None
 121 |     IMEI2 = None
 122 |     inFile = None
 123 |     inPath = None
 124 |     inProvenance = None
 125 | 
 126 |     def __init__(
 127 |         self, IMEI=None, IMEI2=None, inFile=None, inPath=None, inProvenance=None
 128 |     ) -> None:
 129 |         self.IMEI = IMEI
 130 |         self.IMEI2 = IMEI2
 131 |         self.inFile = inFile
 132 |         self.inPath = inPath
 133 |         self.inProvenance = inProvenance
 134 | 
 135 | 
 136 | # -------------Functions live here ------------------------------------------
 137 | 
 138 | # ----- Bulk Excel Processor--------------------------------------------------
 139 | 
 140 | 
 141 | # Finds and processes all excel files in the working directory.
 142 | def bulkProcessor(inputProvenance):
 143 |     FILE_PATH = os.getcwd()
 144 |     inputFiles = glob.glob("*.xlsx") + glob.glob("*.XLSX")
 145 |     print((str(len(inputFiles)) + " Excel files located. \n"))
 146 |     logging.info("Bulk processing {} files".format(str(len(inputFiles))))
 147 |     # If there are no files found exit the process.
 148 |     if len(inputFiles) == 0:
 149 |         print("No excel files located.")
 150 |         print("Exiting.")
 151 |         quit()
 152 |     else:
 153 |         for inputFile in inputFiles:
 154 |             if os.path.exists(inputFile):
 155 |                 try:
 156 |                     processMetadata(inputFile, inputProvenance)
 157 |                 # Need to deal with $ files.
 158 |                 except FileNotFoundError:
 159 |                     print("File does not exist or temp file detected")
 160 |                     pass
 161 |     if debug:
 162 |         for inputFile in inputFiles:
 163 |             inputFilename = inputFile.split(".")[0]
 164 |             print(inputFilename)
 165 | 
 166 | 
 167 | # FIXME - Deal with error when this info is missing
 168 | ### -------- Process phone metadata ------------------------------------------------------
 169 | def processMetadata(inputFile, inputProvenance):
 170 |     inputFile = inputFile
 171 |     print("Input Provenance is {}".format(inputProvenance))
 172 |     print("Extracting metadata from {}".format(inputFile))
 173 |     logging.info("Extracting metadata from {}".format(inputFile))
 174 | 
 175 |     phoneData.inProvenance = inputProvenance
 176 | 
 177 |     fileSize = os.path.getsize(inputFile) / 1048576
 178 |     if fileSize > warnSize:
 179 |         print(
 180 |             "Large input file detected, {} MB and may take some time to process, sadly progress is not able to be updated while the file is loading".format(
 181 |                 f"{fileSize:.2f}"
 182 |             )
 183 |         )
 184 |     else:
 185 |         print("Input file is {} MB".format(f"{fileSize:.2f}"))
 186 | 
 187 |     try:
 188 |         infoPD = pd.read_excel(
 189 |             inputFile, sheet_name=clbPhoneInfo, header=1, usecols="B,C,D"
 190 |         )
 191 | 
 192 |         try:
 193 |             phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][
 194 |                 0
 195 |             ]
 196 |             phoneData.inFile = Path(inputFile).stem
 197 |             phoneData.inPath = os.path.dirname(inputFile)
 198 |             phoneData.inProvenance = inputProvenance
 199 |         except:
 200 |             print("Attempting Device Column")
 201 |             try:
 202 |                 phoneData.IMEI = infoPD.loc[
 203 |                     infoPD["Device"] == "IMEI", ["Value"]
 204 |                 ].values[0][0]
 205 |                 phoneData.inFile = Path(inputFile).stem
 206 |                 phoneData.inPath = os.path.dirname(inputFile)
 207 |             except:
 208 |                 print("IMEI not located, setting to NULL")
 209 |                 phoneData.IMEI = None
 210 |                 phoneData.inFile = Path(inputFile).stem
 211 |                 phoneData.inPath = os.path.dirname(inputFile)
 212 | 
 213 |         try:
 214 |             phoneData.IMEI2 = infoPD.loc[infoPD["Name"] == "IMEI2", ["Value"]].values[
 215 |                 0
 216 |             ][0]
 217 |         except:
 218 |             phoneData.IMEI2 = None
 219 |             phoneData.inFile = Path(inputFile).stem
 220 |             phoneData.inPath = os.path.dirname(inputFile)
 221 |         # phoneData.inFile = inputFile.split(".")[0]
 222 |         phoneData.inFile = Path(inputFile).stem
 223 |         phoneData.inPath = os.path.dirname(inputFile)
 224 | 
 225 |         if debug:
 226 |             print(infoPD)
 227 |             print(phoneData.IMEI)
 228 | 
 229 |     except ValueError:
 230 |         print(
 231 |             "Info tab not found in {}, attempting with second format.".format(inputFile)
 232 |         )
 233 |         logging.exception(
 234 |             "No info tab found in {}, attempting with second format".format(inputFile)
 235 |         )
 236 |         try:
 237 |             infoPD = pd.read_excel(
 238 |                 inputFile, sheet_name=clbPhoneInfov2, header=1, usecols="B,C,D"
 239 |             )
 240 |             # Remove leading whitespace from columns
 241 |             infoPD["Name"] = infoPD["Name"].str.strip()
 242 |             phoneData.IMEI = infoPD.loc[infoPD["Name"] == "IMEI", ["Value"]].values[0][
 243 |                 0
 244 |             ]
 245 |             print("Second format succeeded")
 246 |             logging.info("Second format succeeded on {}".format(inputFile))
 247 | 
 248 |             phoneData.inFile = Path(inputFile).stem
 249 |             phoneData.inPath = os.path.dirname(inputFile)
 250 | 
 251 |         except IndexError:
 252 |             print("IMEI not located, is this a tablet or iPAD?")
 253 |             logging.warning(
 254 |                 "IMEI not found in {}, apptempting with with no IMEI".format(inputFile)
 255 |             )
 256 |             phoneData.IMEI = None
 257 |             phoneData.IMEI2 = None
 258 |             phoneData.inFile = Path(inputFile).stem
 259 |             phoneData.inPath = os.path.dirname(inputFile)
 260 |             print("Loaded {}, with no IMEI".format(inputFile))
 261 |             logging.info("Loaded {}, with no IMEI".format(inputFile))
 262 |             pass
 263 | 
 264 |         except ValueError:
 265 |             print(
 266 |                 "\033[1;31m Info tab not found in {}, apptempting with with no IMEI".format(
 267 |                     inputFile
 268 |                 )
 269 |             )
 270 |             logging.warning(
 271 |                 "Info tab not found in {}, apptempting with with no IMEI".format(
 272 |                     inputFile
 273 |                 )
 274 |             )
 275 |             phoneData.IMEI = None
 276 |             phoneData.IMEI2 = None
 277 |             # phoneData.inFile = inputFile.split(".")[0]
 278 |             phoneData.inFile = Path(inputFile).stem
 279 |             phoneData.inPath = os.path.dirname(inputFile)
 280 |             print("\033[1;31m Loaded {}, with no IMEI".format(inputFile))
 281 |             logging.info("Loaded {}, with no IMEI".format(inputFile))
 282 |             pass
 283 | 
 284 |     try:
 285 |         processContacts(inputFile)
 286 |     except Exception as e:
 287 |         print(e)
 288 |     except ValueError:
 289 |         print("\033[1;31m No Contacts tab  found, is this a correctly formatted Excel?")
 290 |         logging.error(
 291 |             "No Contacts tab  found in {}, is this a correctly formatted Excel?".format(
 292 |                 inputFile
 293 |             )
 294 |         )
 295 | 
 296 | 
 297 | ### Extract contacts tab of Excel file -------------------------------------------------------------------
 298 | # This creates the initial dataframe, future processing is from copies of this dataframe.
 299 | def processContacts(inputFile):
 300 |     inputFile = inputFile
 301 |     fileSize = os.path.getsize(inputFile) / 1048576
 302 |     print("Processing contacts in {} has begun.".format(phoneData.inFile))
 303 |     logging.info("Processing contacts in {} has begun.".format(phoneData.inFile))
 304 | 
 305 |     if fileSize > warnSize:
 306 |         print(
 307 |             "Large input file detected, {} MB and may take some time to process, sadly progress is not able to be updated while the file is loading".format(
 308 |                 f"{fileSize:.2f}"
 309 |             )
 310 |         )
 311 |     else:
 312 |         print("Input file is {} MB".format(f"{fileSize:.2f}"))
 313 | 
 314 |     # Record input filename for use in export processes.
 315 |     if debug:
 316 |         print("\033[0;37m Input file is : {}".format(phoneData.inFile))
 317 | 
 318 |     contactsPD = pd.read_excel(
 319 |         inputFile,
 320 |         sheet_name=clbContactSheet,
 321 |         header=1,
 322 |         index_col="#",
 323 |         usecols=["#", "Name", "Entries", "Source", "Account"],
 324 |     )
 325 | 
 326 |     print("\033[0mProcessing the following app types for : {}".format(phoneData.inFile))
 327 |     applist = contactsPD["Source"].unique()
 328 |     for x in applist:
 329 |         if x in parsedApps:
 330 |             print("{} : \u2713 ".format(x))
 331 | 
 332 |         else:
 333 |             print("{} : \u2716".format(x))
 334 | 
 335 |     # Process native contacts
 336 |     try:
 337 |         processAppleNative(contactsPD)
 338 |     except Exception as e:
 339 |         print("Processing native contacts failed")
 340 |         print(e)
 341 |         pass
 342 | 
 343 |     # Process Apps
 344 |     for x in applist:
 345 |         if x == "Facebook Messenger":
 346 |             try:
 347 |                 processFacebookMessenger(contactsPD)
 348 |             except Exception as e:
 349 |                 logging.warning("Failed to parse Facebook messenger - {}".format(e))
 350 |                 pass
 351 |         if x == "Instagram":
 352 |             try:
 353 |                 processInstagram(contactsPD)
 354 |             except:
 355 |                 logging.warning("Failed to parse Instagram")
 356 |                 pass
 357 |         if x == "Line":
 358 |             try:
 359 |                 processLine(contactsPD)
 360 |             except:
 361 |                 logging.warning("Failed to parse Line")
 362 |                 pass
 363 |         if x == "Outlook":
 364 |             try:
 365 |                 processOutlookContacts(contactsPD)
 366 |             except:
 367 |                 logging.warning("Failed to parse Outlook")
 368 |                 pass
 369 |         if x == "Recents":
 370 |             try:
 371 |                 processRecents(contactsPD)
 372 |             except:
 373 |                 logging.warning("Failed to parse Recents")
 374 |                 pass
 375 |         if x == "Snapchat":
 376 |             try:
 377 |                 processSnapChat(contactsPD)
 378 |             except:
 379 |                 logging.warning("Failed to parse Snapchat")
 380 |                 pass
 381 |         if x == "Telegram":
 382 |             try:
 383 |                 processTelegram(contactsPD)
 384 |             except:
 385 |                 logging.warning("Failed to parse Telegram")
 386 |                 pass
 387 |         if x == "Threema":
 388 |             try:
 389 |                 processThreema(contactsPD)
 390 |             except:
 391 |                 logging.warning("Failed to parse Threema")
 392 |                 pass
 393 |         if x == "Signal":
 394 |             try:
 395 |                 processSignal(contactsPD)
 396 |             except:
 397 |                 logging.warning("Failed to parse Signal")
 398 |                 pass
 399 |         if x == "Signal Private Messenger":
 400 |             try:
 401 |                 processSignalPrivateMessenger(contactsPD)
 402 |             except:
 403 |                 logging.warning("Failed to parse Signal Private Messenger")
 404 | 
 405 |         if x == "WeChat":
 406 |             try:
 407 |                 processWeChat(contactsPD)
 408 |             except:
 409 |                 logging.warning("Failed to parse WeChat")
 410 |                 pass
 411 |         if x == "WhatsApp":
 412 |             try:
 413 |                 processWhatsapp(contactsPD)
 414 |             except Exception as e:
 415 |                 logging.warning("Failed to parse WhatsApp - {}".format(e))
 416 |                 pass
 417 |         if x == "Zalo":
 418 |             try:
 419 |                 processZalo(contactsPD)
 420 |             except:
 421 |                 logging.warning("Failed to parse Zalo")
 422 |                 pass
 423 | 
 424 |     print("\nProcessing of {} complete".format(inputFile))
 425 | 
 426 | 
 427 | # ------ Parse Facebook Messenger --------------------------------------------------------------
 428 | def processFacebookMessenger(contactsPD):
 429 |     print("\nProcessing Facebook Messenger")
 430 |     facebookMessengerPD = contactsPD[contactsPD["Source"] == "Facebook Messenger"]
 431 |     facebookMessengerPD = facebookMessengerPD.drop("Entries", axis=1).join(
 432 |         facebookMessengerPD["Entries"].str.split("\n", expand=True)
 433 |     )
 434 |     facebookMessengerPD = facebookMessengerPD.reset_index(drop=True)
 435 | 
 436 |     selected_cols = []
 437 |     for x in facebookMessengerPD.columns:
 438 |         if isinstance(x, int):
 439 |             selected_cols.append(x)
 440 | 
 441 |     def phoneCheck(facebookMessengerPD):
 442 |         for x in selected_cols:
 443 |             facebookMessengerPD.loc[
 444 |                 (facebookMessengerPD[x].str.contains("User ID-Facebook Id", na=False)),
 445 |                 "Account ID",
 446 |             ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
 447 |             facebookMessengerPD.loc[
 448 |                 (facebookMessengerPD[x].str.contains("User ID-Username", na=False)),
 449 |                 "User Name",
 450 |             ] = facebookMessengerPD[x].str.split(":", n=1, expand=True)[1]
 451 | 
 452 |     phoneCheck(facebookMessengerPD)
 453 | 
 454 |     facebookMessengerPD["Source"] = "Messenger"
 455 |     facebookMessengerPD[originIMEI] = phoneData.IMEI
 456 |     facebookMessengerPD["inputFile"] = phoneData.inFile
 457 |     facebookMessengerPD["Provenance"] = phoneData.inProvenance
 458 | 
 459 |     exportCols = []
 460 |     for x in facebookMessengerPD.columns:
 461 |         if isinstance(x, str):
 462 |             exportCols.append(x)
 463 |     print(
 464 |         "{} user accounts located".format(len(facebookMessengerPD["Account"].unique()))
 465 |     )
 466 |     print("{} contacts located".format(len(facebookMessengerPD["Account ID"].unique())))
 467 |     print("Exporting {}-FB-MESSENGER.csv".format(phoneData.inFile))
 468 |     logging.info("Exporting FB messenger from {}".format(phoneData.inFile))
 469 |     try:
 470 |         facebookMessengerPD[exportCols].to_csv(
 471 |             "{}-FB-MESSENGER.csv".format(phoneData.inFile),
 472 |             index=False,
 473 |         )
 474 |     except Exception as e:
 475 |         print(e)
 476 | 
 477 | 
 478 | # ----- Parse Instagram data ------------------------------------------------------------------
 479 | def processInstagram(contactsPD):
 480 |     print("\nProcessing Instagram")
 481 |     instagramPD = contactsPD[contactsPD["Source"] == "Instagram"].copy()
 482 |     instagramPD = instagramPD.drop("Entries", axis=1).join(
 483 |         instagramPD["Entries"].str.split("\n", expand=True)
 484 |     )
 485 | 
 486 |     selected_cols = []
 487 |     for x in instagramPD.columns:
 488 |         if isinstance(x, int):
 489 |             selected_cols.append(x)
 490 | 
 491 |     def instaContacts(instagramPD):
 492 |         for x in selected_cols:
 493 |             instagramPD.loc[
 494 |                 (instagramPD[x].str.contains("User ID-Username", na=False)), "User Name"
 495 |             ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
 496 |             instagramPD.loc[
 497 |                 (instagramPD[x].str.contains("User ID-Instagram Id", na=False)),
 498 |                 "Instagram ID",
 499 |             ] = instagramPD[x].str.split(":", n=1, expand=True)[1]
 500 | 
 501 |     instaContacts(instagramPD)
 502 | 
 503 |     instagramPD[originIMEI] = phoneData.IMEI
 504 |     instagramPD["inputFile"] = phoneData.inFile
 505 | 
 506 |     exportCols = []
 507 |     for x in instagramPD.columns:
 508 |         if isinstance(x, str):
 509 |             exportCols.append(x)
 510 |     print("{} Instagram contacts located".format(len(instagramPD["Name"])))
 511 |     print("Exporting {}-INSTAGRAM.csv".format(phoneData.inFile))
 512 |     logging.info("Exporting Instagram from {}".format(phoneData.inFile))
 513 |     # TODO - Fix column handling
 514 |     instagramPD[exportCols].to_csv(
 515 |         "{}-INSTAGRAM.csv".format(phoneData.inFile),
 516 |         index=False,
 517 |     )
 518 | 
 519 | 
 520 | # ---- Process Line -----------------------------------------------------------------------
 521 | def processLine(contactsPD):
 522 |     print("Processing Line")
 523 |     linePD = contactsPD[contactsPD["Source"] == "Line"].copy()
 524 |     linePD = linePD.drop("Entries", axis=1).join(
 525 |         linePD["Entries"].str.split("\n", expand=True)
 526 |     )
 527 |     linePD = linePD.reset_index(drop=True)
 528 | 
 529 |     selected_cols = []
 530 |     for x in linePD.columns:
 531 |         if isinstance(x, int):
 532 |             selected_cols.append(x)
 533 | 
 534 |     def processLine(LinePD):
 535 |         for x in selected_cols:
 536 |             LinePD.loc[
 537 |                 (LinePD[x].str.contains("User ID-Address Book Name:", na=False)),
 538 |                 "LineAddressBook",
 539 |             ] = LinePD[x].str.split(":", n=1, expand=True)[1]
 540 | 
 541 |             LinePD.loc[
 542 |                 (LinePD[x].str.contains("User ID-User ID:", na=False)),
 543 |                 "LineUserID",
 544 |             ] = LinePD[x].str.split(":", n=1, expand=True)[1]
 545 |             LinePD.loc[
 546 |                 (LinePD[x].str.contains("User ID-Server:", na=False)),
 547 |                 "LineServerID",
 548 |             ] = LinePD[x].str.split(":", n=1, expand=True)[1]
 549 | 
 550 |     processLine(linePD)
 551 | 
 552 |     linePD[originIMEI] = phoneData.IMEI
 553 |     linePD["inputFile"] = phoneData.inFile
 554 |     exportCols = []
 555 | 
 556 |     for x in linePD.columns:
 557 |         if isinstance(x, str):
 558 |             exportCols.append(x)
 559 | 
 560 |     print("{} Line contacts located".format(len(linePD["Name"])))
 561 |     print("Exporting {}-LINE.csv".format(phoneData.inFile))
 562 |     logging.info("Exporting Line contacts from {}".format(phoneData.inFile))
 563 |     linePD[exportCols].to_csv("{}-LINE.csv".format(phoneData.inFile), index=False)
 564 | 
 565 | 
 566 | # ------------Process native contact list ------------------------------------------------
 567 | def processAppleNative(contactsPD):
 568 | 
 569 |     print("\nProcessing Native Contacts")
 570 |     # nativeContactsPD = contactsPD[contactsPD["Source"].isna()]
 571 | 
 572 |     # Contacts are stored with either null (iPhone) or "Phone" for Android
 573 |     nativeContactsPD = contactsPD[
 574 |         (contactsPD.Source.isna()) | (contactsPD.Source == "Phone")
 575 |     ].copy()
 576 | 
 577 |     # Fill NaN values with : to prevent error with blank entries.
 578 |     nativeContactsPD.Entries = nativeContactsPD.Entries.fillna(":")
 579 | 
 580 |     nativeContactsPD = nativeContactsPD.drop("Entries", axis=1).join(
 581 |         nativeContactsPD["Entries"]
 582 |         .str.split("\n", expand=True)
 583 |         .stack()
 584 |         .reset_index(level=1, drop=True)
 585 |         .rename("Entries")
 586 |     )
 587 | 
 588 |     # nativeContactsPD = nativeContactsPD[["Name", "Interaction Statuses", "Entries"]]
 589 | 
 590 |     nativeContactsPD = nativeContactsPD[
 591 |         nativeContactsPD["Entries"].str.contains(r"Phone-")
 592 |     ]
 593 |     nativeContactsPD[originIMEI] = phoneData.IMEI
 594 |     nativeContactsPD["inputFile"] = phoneData.inFile
 595 |     nativeContactsPD["Provenance"] = phoneData.inProvenance
 596 | 
 597 |     # Remove erroneous characters, need to make this a regex
 598 |     # TODO Use a regex to tidy this up.
 599 |     nativeContactsPD["Entries"] = (
 600 |         nativeContactsPD["Entries"]
 601 |         .str.split(":", n=1, expand=True)[1]
 602 |         .str.strip()
 603 |         .str.replace(" ", "", regex=False)
 604 |         .str.replace("-", "", regex=False)
 605 |         .str.replace("+", "", regex=False)
 606 |         # Fix issue with inseyets reports
 607 |         .str.replace("Message", "", regex=False)
 608 |         .str.replace("(", "", regex=False)
 609 |         .str.replace(")", "", regex=False)
 610 |     )
 611 | 
 612 |     if ausNormal:
 613 |         nativeContactsPD["Entries"] = nativeContactsPD["Entries"].str.replace(
 614 |             r"\+614", "04", regex=True
 615 |         )
 616 | 
 617 |     if debug:
 618 |         print(nativeContactsPD)
 619 | 
 620 |     # nativeContactsPD = nativeContactsPD[[originIMEI, "Name", "Entries", "Interaction Statuses"]]
 621 |     print("{} contacts located.".format(len(nativeContactsPD)))
 622 |     print("Exporting {}-NATIVE.csv".format(phoneData.inFile))
 623 |     logging.info("Exporting Native contacts from {}".format(phoneData.inFile))
 624 |     nativeContactsPD.to_csv("{}-NATIVE.csv".format(phoneData.inFile), index=False)
 625 | 
 626 | 
 627 | # Process Outlook Contacts
 628 | def processOutlookContacts(contactsPD):
 629 |     print("\nProcessing Outlook Contacts")
 630 | 
 631 |     outlookContactsPD = contactsPD[(contactsPD.Source == "Outlook")].copy()
 632 |     # Fill NaN values with : to prevent error with blank entries.
 633 |     outlookContactsPD.Entries = outlookContactsPD.Entries.fillna(":")
 634 | 
 635 |     outlookContactsPD = outlookContactsPD.drop("Entries", axis=1).join(
 636 |         outlookContactsPD["Entries"]
 637 |         .str.split("\n", expand=True)
 638 |         .stack()
 639 |         .reset_index(level=1, drop=True)
 640 |         .rename("Entries")
 641 |     )
 642 | 
 643 |     outlookContactsPD = outlookContactsPD[["Account", "Name", "Entries", "Source"]]
 644 |     outlookContactsPD[originIMEI] = phoneData.IMEI
 645 |     outlookContactsPD["inputFile"] = phoneData.inFile
 646 |     outlookContactsPD["Provenance"] = phoneData.inProvenance
 647 | 
 648 |     outlookContactsPD["Entries"] = (
 649 |         outlookContactsPD["Entries"].str.split(":", n=1, expand=True)[1].str.strip()
 650 |     )
 651 | 
 652 |     print("{} contacts located.".format(len(outlookContactsPD)))
 653 |     print("Exporting {}-OUTLOOK.csv".format(phoneData.inFile))
 654 |     logging.info("Exporting Native contacts from {}".format(phoneData.inFile))
 655 |     outlookContactsPD.to_csv("{}-OUTLOOK.csv".format(phoneData.inFile), index=False)
 656 | 
 657 | 
 658 | # ----------- Parse Recents -----------------------------------------------------------------------
 659 | def processRecents(contactsPD):
 660 |     print("\nProcessing Recents")
 661 |     recentsPD = contactsPD[contactsPD["Source"] == "Recents"].copy()
 662 |     recentsPD.Entries = recentsPD.Entries.fillna(":")
 663 |     recentsPD = recentsPD[recentsPD["Entries"].str.contains(r"Phone-")]
 664 | 
 665 |     recentsPD[originIMEI] = phoneData.IMEI
 666 |     recentsPD["inputFile"] = phoneData.inFile
 667 |     recentsPD["Provenance"] = phoneData.inProvenance
 668 | 
 669 |     recentsPD["Entries"] = (
 670 |         recentsPD["Entries"]
 671 |         .str.split(":", n=1, expand=True)[1]
 672 |         .str.strip()
 673 |         .str.replace(" ", "")
 674 |         .str.replace("-", "")
 675 |         # .str.replace("+","",regex=False)
 676 |     )
 677 |     if ausNormal:
 678 |         recentsPD["Entries"] = recentsPD["Entries"].str.replace(
 679 |             r"\+614", "04", regex=True
 680 |         )
 681 | 
 682 |     print("{} recent contacts located.".format(len(recentsPD)))
 683 |     print("Exporting {}-RECENT.csv".format(phoneData.inFile))
 684 |     logging.info("Exporting recent contacts from {}".format(phoneData.inFile))
 685 |     recentsPD.to_csv("{}-RECENTS.csv".format(phoneData.inFile), index=False)
 686 | 
 687 | 
 688 | # ------------Parse Signal contacts ---------------------------------------------------------------
 689 | def processSignal(contactsPD):
 690 |     print("\nProcessing Signal Contacts")
 691 |     signalPD = contactsPD[contactsPD["Source"] == "Signal"].copy()
 692 |     signalPD = signalPD[["Name", "Entries", "Source"]]
 693 |     signalPD = signalPD.drop("Entries", axis=1).join(
 694 |         signalPD["Entries"].str.split("\n", expand=True)
 695 |     )
 696 | 
 697 |     # Data is expended into columns with integer names, add these columsn to selected_cols so we can search them later
 698 |     selected_cols = []
 699 |     for x in signalPD.columns:
 700 |         if isinstance(x, int):
 701 |             selected_cols.append(x)
 702 | 
 703 |     # FIXME improve with method used for other apps
 704 |     # Signal can store mutiple values under entries such as Mobile Number:
 705 |     # So we break them all out into columns.
 706 |     def signalContact(signalPD):
 707 |         for x in selected_cols:
 708 |             # Locate Signal Username and move to Username Column
 709 |             signalPD.loc[
 710 |                 (signalPD[x].str.contains("User ID-Username:", na=False)),
 711 |                 "User Name",
 712 |             ] = signalPD[x].str.split(":", n=1, expand=True)[1]
 713 |             # Delete Username entry from origional location
 714 |             signalPD.loc[
 715 |                 signalPD[x].str.contains("User ID-Username:", na=False), [x]
 716 |             ] = ""
 717 |             # delete all befote semicolon
 718 |             signalPD[x] = signalPD[x].str.split(":", n=1, expand=True)[1].str.strip()
 719 | 
 720 |     signalContact(signalPD)
 721 | 
 722 |     signalPD[originIMEI] = phoneData.IMEI
 723 |     signalPD["inputFile"] = phoneData.inFile
 724 |     signalPD["Provenance"] = phoneData.inProvenance
 725 | 
 726 |     export_cols = [originIMEI, "Name", "User Name"]
 727 |     export_cols.extend(selected_cols)
 728 |     print("Located {} Signal contacts".format(len(signalPD["Name"])))
 729 |     print("Exporting {}-SIGNAL.csv".format(phoneData.inFile))
 730 |     logging.info("Exporting Signal messenger from {}".format(phoneData.inFile))
 731 |     signalPD.to_csv(
 732 |         "{}-SIGNAL.csv".format(phoneData.inFile), index=False, columns=export_cols
 733 |     )
 734 | 
 735 | 
 736 | # ----------- Parse Signal Private Messenger--------------------------------------------------------
 737 | def processSignalPrivateMessenger(contactsPD):
 738 |     print("\nProcessing Signal Private Messenger")
 739 |     spmPD = contactsPD[contactsPD["Source"] == "Signal Private Messenger"].copy()
 740 |     spmPD = spmPD.drop("Entries", axis=1).join(
 741 |         spmPD["Entries"].str.split("\n", expand=True)
 742 |     )
 743 |     # spmPD['Entries'].tolist()
 744 |     # spmPD.explode('Entries')
 745 |     # spmPD = spmPD.reset_index(drop=True)
 746 |     spmPD[originIMEI] = phoneData.IMEI
 747 |     spmPD["inputFile"] = phoneData.inFile
 748 |     spmPD["Provenance"] = phoneData.inProvenance
 749 | 
 750 |     selected_cols = []
 751 |     for x in spmPD.columns:
 752 |         if isinstance(x, int):
 753 |             selected_cols.append(x)
 754 | 
 755 |     def spmContacts(spmPD):
 756 |         for x in selected_cols:
 757 |             try:
 758 |                 spmPD.loc[(spmPD[x].str.contains("Phone-:", na=False)), "Phone"] = (
 759 |                     spmPD[x].str.split(":", n=1, expand=True)[1]
 760 |                 )
 761 |             except:
 762 |                 pass
 763 |             try:
 764 |                 spmPD.loc[(spmPD[x].str.contains("User ID-:", na=False)), "User-ID"] = (
 765 |                     spmPD[x].str.split(":", n=1, expand=True)[1]
 766 |                 )
 767 |             except:
 768 |                 pass
 769 |             try:
 770 |                 spmPD.loc[
 771 |                     (spmPD[x].str.contains("User ID-Nickname:", na=False)),
 772 |                     "User-ID-Nickname",
 773 |                 ] = spmPD[x].str.split(":", n=1, expand=True)[1]
 774 |             except:
 775 |                 pass
 776 |             try:
 777 |                 spmPD.loc[
 778 |                     (spmPD[x].str.contains("User ID-Username:", na=False)),
 779 |                     "User-ID-Username",
 780 |                 ] = spmPD[x].str.split(":", n=1, expand=True)[1]
 781 |             except:
 782 |                 pass
 783 |             try:
 784 |                 spmPD.loc[
 785 |                     (spmPD[x].str.contains("User ID-ProfileKey:", na=False)),
 786 |                     "User-ID-ProfileKey",
 787 |                 ] = spmPD[x].str.split(":", n=1, expand=True)[1]
 788 |             except:
 789 |                 pass
 790 | 
 791 |     spmContacts(spmPD)
 792 |     # spmPD.info()
 793 | 
 794 |     exportCols = []
 795 |     # Remove column from previous step
 796 |     for x in spmPD.columns:
 797 |         if isinstance(x, str):
 798 |             # if x !="Provenence" or x != 'originIMEI' or x != 'inputFile':
 799 |             if x not in ["Provenance", "originIMEI", "inputFile"]:
 800 |                 exportCols.append(x)
 801 | 
 802 |     exportCols.extend(["originIMEI", "inputFile", "Provenance"])
 803 | 
 804 |     print("Located {} Signal Private Messenger contacts.".format(len(spmPD["Name"])))
 805 |     print("Exporting {}-Signal-PM.csv".format(phoneData.inFile))
 806 |     logging.info("Exporting Signal Private Messenger from {}".format(phoneData.inFile))
 807 |     spmPD[exportCols].to_csv("{}-Signal-PM.csv".format(phoneData.inFile), index=False)
 808 | 
 809 | 
 810 | # ----------- Parse Snapchat data ------------------------------------------------------------------
 811 | def processSnapChat(contactsPD):
 812 |     print("\nProcessing Snapchat")
 813 |     snapPD = contactsPD[contactsPD["Source"] == "Snapchat"]
 814 |     snapPD = snapPD[["Name", "Entries", "Source"]]
 815 | 
 816 |     # Extract nested entities
 817 |     snapPD = snapPD.drop("Entries", axis=1).join(
 818 |         snapPD["Entries"].str.split("\n", expand=True)
 819 |     )
 820 |     selected_cols = []
 821 |     for x in snapPD.columns:
 822 |         if isinstance(x, int):
 823 |             selected_cols.append(x)
 824 | 
 825 |     def snapContacts(snapPD):
 826 |         for x in selected_cols:
 827 |             snapPD.loc[
 828 |                 (snapPD[x].str.contains("User ID-Username", na=False)), "User Name"
 829 |             ] = snapPD[x].str.split(":", n=1, expand=True)[1]
 830 |             snapPD.loc[
 831 |                 (snapPD[x].str.contains("User ID-User ID", na=False)), "User ID"
 832 |             ] = snapPD[x].str.split(":", n=1, expand=True)[1]
 833 | 
 834 |     snapContacts(snapPD)
 835 | 
 836 |     snapPD[originIMEI] = phoneData.IMEI
 837 |     snapPD["inputFile"] = phoneData.inFile
 838 |     snapPD["Provenance"] = phoneData.inProvenance
 839 | 
 840 |     exportCols = []
 841 |     for x in snapPD.columns:
 842 |         if isinstance(x, str):
 843 |             exportCols.append(x)
 844 |     if debug:
 845 |         print(snapPD[exportCols])
 846 | 
 847 |     print("{} Snapchat contacts located.".format(len(snapPD)))
 848 |     print("Exporting {}-SNAPCHAT.csv".format(phoneData.inFile))
 849 |     logging.info("Exporting Snapchat from {}".format(phoneData.inFile))
 850 |     snapPD[exportCols].to_csv(
 851 |         "{}-SNAPCHAT.csv".format(phoneData.inFile),
 852 |         index=False,
 853 |         columns=[
 854 |             originIMEI,
 855 |             "Name",
 856 |             "User Name",
 857 |             "User ID",
 858 |             "Source",
 859 |             "inputFile",
 860 |             "Provenance",
 861 |         ],
 862 |     )
 863 | 
 864 | 
 865 | # ---- Parse Telegram Contacts--------------------------------------------------------------
 866 | def processTelegram(contactsPD):
 867 |     print("\nProcessing Telegram")
 868 |     telegramPD = contactsPD[contactsPD["Source"] == "Telegram"].copy()
 869 |     telegramPD = telegramPD.drop("Entries", axis=1).join(
 870 |         telegramPD["Entries"].str.split("\n", expand=True)
 871 |     )
 872 |     telegramPD = telegramPD.reset_index(drop=True)
 873 | 
 874 |     selected_cols = []
 875 |     for x in telegramPD.columns:
 876 |         if isinstance(x, int):
 877 |             selected_cols.append(x)
 878 | 
 879 |     def phoneCheck(telegramPD):
 880 |         for x in selected_cols:
 881 |             telegramPD.loc[
 882 |                 (telegramPD[x].str.contains("Phone-", na=False)), "Phone-Number"
 883 |             ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
 884 | 
 885 |             telegramPD.loc[
 886 |                 (telegramPD[x].str.contains("User ID-Peer", na=False)), "Peer-ID"
 887 |             ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
 888 | 
 889 |             telegramPD.loc[
 890 |                 (telegramPD[x].str.contains("User ID-Username", na=False)), "User-Name"
 891 |             ] = telegramPD[x].str.split(":", n=1, expand=True)[1]
 892 | 
 893 |     phoneCheck(telegramPD)
 894 | 
 895 |     telegramPD[originIMEI] = phoneData.IMEI
 896 |     telegramPD["inputFile"] = phoneData.inFile
 897 |     telegramPD["Provenance"] = phoneData.inProvenance
 898 |     telegramPD["source"] = "Telegram"
 899 |     exportCols = []
 900 |     for x in telegramPD.columns:
 901 |         if isinstance(x, str):
 902 |             exportCols.append(x)
 903 |     # Export CSV
 904 |     print("{} Telegram contacts located.".format(len(telegramPD)))
 905 |     print("Exporting {}-TELEGRAM.csv".format(phoneData.inFile))
 906 |     logging.info("Exporting Telegram from {}".format(phoneData.inFile))
 907 |     telegramPD[exportCols].to_csv(
 908 |         "{}-TELEGRAM.csv".format(phoneData.inFile), index=False
 909 |     )
 910 | 
 911 | 
 912 | # ------ Parse Threema Contacts -----------------------------------------------------------------
 913 | def processThreema(contactsPD):
 914 |     print("\nProcessing Threema")
 915 |     threemaPD = contactsPD[contactsPD["Source"] == "Threema"].copy()
 916 |     threemaPD = threemaPD.drop("Entries", axis=1).join(
 917 |         threemaPD["Entries"].str.split("\n", expand=True)
 918 |     )
 919 |     threemaPD = threemaPD.reset_index(drop=True)
 920 | 
 921 |     selected_cols = []
 922 |     for x in threemaPD.columns:
 923 |         if isinstance(x, int):
 924 |             selected_cols.append(x)
 925 | 
 926 |     def ThreemaParse(ThreemaPD):
 927 |         for x in selected_cols:
 928 |             try:
 929 |                 ThreemaPD.loc[
 930 |                     (ThreemaPD[x].str.contains("User ID-identity:", na=False)),
 931 |                     "Threema ID",
 932 |                 ] = ThreemaPD[x].str.split(":", n=1, expand=True)[1]
 933 |             except:
 934 |                 pass
 935 |             try:
 936 |                 ThreemaPD.loc[
 937 |                     (ThreemaPD[x].str.contains("User ID-Username:", na=False)),
 938 |                     "ThreemaUsername",
 939 |                 ] = ThreemaPD[x].str.split(":", n=1, expand=True)[1]
 940 |             except:
 941 |                 pass
 942 | 
 943 |     ThreemaParse(threemaPD)
 944 | 
 945 |     threemaPD[originIMEI] = phoneData.IMEI
 946 |     threemaPD["inputFile"] = phoneData.inFile
 947 |     threemaPD["Provenance"] = phoneData.inProvenance
 948 | 
 949 |     exportCols = []
 950 |     for x in threemaPD.columns:
 951 |         if isinstance(x, str):
 952 |             exportCols.append(x)
 953 | 
 954 |     print("Exporting {}-THREEMA.csv".format(phoneData.inFile))
 955 |     logging.info("Exporting Threema from {}".format(phoneData.inFile))
 956 |     threemaPD[exportCols].to_csv("{}-THREEMA.csv".format(phoneData.inFile), index=False)
 957 | 
 958 | 
 959 | ## Parse WeChat Contacts ------------------------------------------------------------------------
 960 | def processWeChat(contactsPD):
 961 |     print("\nProcessing WeChat")
 962 |     WeChatPD = contactsPD[contactsPD["Source"] == "WeChat"].copy()
 963 |     WeChatPD = WeChatPD.drop("Entries", axis=1).join(
 964 |         WeChatPD["Entries"].str.split("\n", expand=True)
 965 |     )
 966 | 
 967 |     WeChatPD = WeChatPD.reset_index(drop=True)
 968 | 
 969 |     selected_cols = []
 970 |     for x in WeChatPD.columns:
 971 |         if isinstance(x, int):
 972 |             selected_cols.append(x)
 973 | 
 974 |     def WeChatContacts(WeChatPD):
 975 | 
 976 |         for x in selected_cols:
 977 |             # FIXME Usernames that contain @stranger???
 978 |             # FIXME Try / Except / Pass
 979 | 
 980 |             try:
 981 |                 WeChatPD.loc[
 982 |                     (WeChatPD[x].str.contains("User ID-WeChat ID:", na=False)),
 983 |                     "WeChatID",
 984 |                 ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
 985 |             except:
 986 |                 pass
 987 | 
 988 |             try:
 989 |                 WeChatPD.loc[
 990 |                     (WeChatPD[x].str.contains("User ID-QQ:", na=False)), "QQ User ID"
 991 |                 ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
 992 |             except:
 993 |                 pass
 994 | 
 995 |             try:
 996 |                 WeChatPD.loc[
 997 |                     (WeChatPD[x].str.contains("User ID-Username:", na=False)),
 998 |                     "Username",
 999 |                 ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
1000 |             except:
1001 |                 pass
1002 | 
1003 |             try:
1004 |                 WeChatPD.loc[
1005 |                     (WeChatPD[x].str.contains("User ID-LinkedIn ID:", na=False)),
1006 |                     "LinkedIn ID",
1007 |                 ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
1008 |             except:
1009 |                 pass
1010 | 
1011 |             try:
1012 |                 WeChatPD.loc[
1013 |                     (WeChatPD[x].str.contains("User ID-Facebook ID:", na=False)),
1014 |                     "Facebook ID",
1015 |                 ] = WeChatPD[x].str.split(":", n=1, expand=True)[1]
1016 |             except:
1017 |                 pass
1018 | 
1019 |     WeChatContacts(WeChatPD)
1020 | 
1021 |     # Repalace we chat ID's with @ stranhger with blank values as are not we chat user IDs
1022 |     try:
1023 |         WeChatPD.WeChatID = WeChatPD.WeChatID.apply(
1024 |             lambda x: "" if (r"@stranger") in str(x) else x
1025 |         )
1026 |     except:
1027 |         print("WeChat float exception")
1028 |         print(WeChatPD.WeChatID)
1029 |         pass
1030 | 
1031 |     WeChatPD[originIMEI] = phoneData.IMEI
1032 |     WeChatPD["inputFile"] = phoneData.inFile
1033 |     WeChatPD["Provenance"] = phoneData.inProvenance
1034 |     WeChatPD["Source"] = "Weixin"
1035 | 
1036 |     # Export Columns where the title is a string to drop working columns
1037 |     exportCols = []
1038 |     for x in WeChatPD.columns:
1039 |         if isinstance(x, str):
1040 |             exportCols.append(x)
1041 |     print("Located {} WeChat contacts.".format(len(WeChatPD["WeChatID"])))
1042 |     print("Exporting {}-WECHAT.csv".format(phoneData.inFile))
1043 |     logging.info("Exporting WeChat from {}".format(phoneData.inFile))
1044 |     WeChatPD[exportCols].to_csv("{}-WECHAT.csv".format(phoneData.inFile), index=False)
1045 | 
1046 | 
1047 | # ---Parse Whatsapp Contacts----------------------------------------------------------------------
1048 | # Load WhatsApp
1049 | def processWhatsapp(contactsPD):
1050 |     print("\nProcessing WhatsApp")
1051 |     whatsAppPD = contactsPD[contactsPD["Source"] == "WhatsApp"].copy()
1052 |     try:
1053 |         whatsAppPD = whatsAppPD[["Name", "Entries", "Source", "Interaction Statuses"]]
1054 |         # Datatype needs to be object not float to allow filtering by string without throwing an error
1055 |         whatsAppPD["Interaction Statuses"] = whatsAppPD["Interaction Statuses"].astype(
1056 |             object
1057 |         )
1058 |         # Shared contacts are not associated with a Whats app ID and cause problems.
1059 |         print(whatsAppPD.dtypes)
1060 |         whatsAppPD = whatsAppPD[
1061 |             whatsAppPD["Interaction Statuses"].str.contains("Shared", na=False) == False
1062 |         ]
1063 |     except Exception as e:
1064 |         print(e)
1065 |         print("Interaction statuses column not found, ignoring")
1066 |         # print(whatsAppPD)
1067 |         whatsAppPD = whatsAppPD[
1068 |             [
1069 |                 "Name",
1070 |                 "Entries",
1071 |                 "Source",
1072 |             ]
1073 |         ]
1074 | 
1075 |     # Unpack nested data
1076 |     whatsAppPD = whatsAppPD.drop("Entries", axis=1).join(
1077 |         whatsAppPD["Entries"].str.split("\n", expand=True)
1078 |     )
1079 | 
1080 |     # Data is expanded into colums with Integer names, check for these columns and add them to a
1081 |     # list to allow for different width sheets.
1082 |     colList = list(whatsAppPD)
1083 |     selected_cols = []
1084 |     for x in colList:
1085 |         if isinstance(x, int):
1086 |             selected_cols.append(x)
1087 | 
1088 |     # Look for data across expanded columns and shift it to output columns.
1089 |     def whatsappContactProcess(whatsAppPD):
1090 |         for x in selected_cols:
1091 |             whatsAppPD.loc[
1092 |                 (whatsAppPD[x].str.contains("Phone-Mobile", na=False)), "Phone-Mobile"
1093 |             ] = (
1094 |                 whatsAppPD[x]
1095 |                 .str.split(":", n=1, expand=True)[1]
1096 |                 .str.replace(" ", "")
1097 |                 .str.replace("-", "")
1098 |             )
1099 | 
1100 |             whatsAppPD.loc[
1101 |                 (whatsAppPD[x].str.contains("Phone-:", na=False)), "Phone"
1102 |             ] = (
1103 |                 whatsAppPD[x]
1104 |                 .str.split(":", n=1, expand=True)[1]
1105 |                 .str.replace(" ", "")
1106 |                 .str.replace("-", "")
1107 |             )
1108 | 
1109 |             whatsAppPD.loc[
1110 |                 (whatsAppPD[x].str.contains("Phone-Home:", na=False)), "Phone-Home"
1111 |             ] = (
1112 |                 whatsAppPD[x]
1113 |                 .str.split(":", n=1, expand=True)[1]
1114 |                 .str.replace(" ", "")
1115 |                 .str.replace("-", "")
1116 |             )
1117 | 
1118 |             whatsAppPD.loc[
1119 |                 (whatsAppPD[x].str.contains("User ID-Push Name", na=False)), "Push-ID"
1120 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1121 | 
1122 |             whatsAppPD.loc[
1123 |                 (whatsAppPD[x].str.contains("User ID-Id", na=False)), "Id-ID"
1124 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1125 | 
1126 |             whatsAppPD.loc[
1127 |                 (whatsAppPD[x].str.contains("User ID-WhatsApp User Id", na=False)),
1128 |                 "WhatsApp-ID",
1129 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1130 | 
1131 |             whatsAppPD.loc[
1132 |                 (whatsAppPD[x].str.contains("Web address-Professional", na=False)),
1133 |                 "BusinessWebsite",
1134 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1135 | 
1136 |             whatsAppPD.loc[
1137 |                 (whatsAppPD[x].str.contains("Email-Professional", na=False)),
1138 |                 "Business-Email",
1139 |             ] = whatsAppPD[x].str.split(":", n=1, expand=True)[1]
1140 | 
1141 |     whatsappContactProcess(whatsAppPD)
1142 | 
1143 |     # Add IMEI Column
1144 |     whatsAppPD[originIMEI] = phoneData.IMEI
1145 |     whatsAppPD["inputFile"] = phoneData.inFile
1146 |     whatsAppPD["Provenance"] = phoneData.inProvenance
1147 |     whatsAppPD["Source"] = "Whatsapp"
1148 | 
1149 |     # Remove working columns.
1150 |     exportCols = []
1151 |     for x in whatsAppPD.columns:
1152 |         if isinstance(x, str):
1153 |             exportCols.append(x)
1154 |     if debug:
1155 |         print(exportCols)
1156 | 
1157 |     # Export CSV
1158 |     print("{} WhatsApp contacts located".format(len(whatsAppPD["Name"])))
1159 |     print("Exporting {}-WHATSAPP.csv".format(phoneData.inFile))
1160 |     logging.info("Exporting Whatsapp from {}".format(phoneData.inFile))
1161 |     whatsAppPD[exportCols].to_csv(
1162 |         "{}-WHATSAPP.csv".format(phoneData.inFile), index=False
1163 |     )
1164 | 
1165 | 
1166 | # --- Parse Zalo Contacts --------------------------------------------------------------------
1167 | def processZalo(contactsPD):
1168 |     print("\nProcessinf Zalo")
1169 |     ZaloPD = contactsPD[contactsPD["Source"] == "Zalo"]
1170 |     ZaloPD = ZaloPD.drop("Entries", axis=1).join(
1171 |         ZaloPD["Entries"].str.split("\n", expand=True)
1172 |     )
1173 |     selected_cols = []
1174 |     for x in ZaloPD.columns:
1175 |         if isinstance(x, int):
1176 |             selected_cols.append(x)
1177 | 
1178 |     def processZaloContacts(ZaloPD):
1179 |         for x in selected_cols:
1180 |             ZaloPD.loc[
1181 |                 (ZaloPD[x].str.contains("User ID-User Name:", na=False)),
1182 |                 "ZaloUserName",
1183 |             ] = ZaloPD[x].str.split(":", n=1, expand=True)[1]
1184 | 
1185 |             ZaloPD.loc[
1186 |                 (ZaloPD[x].str.contains("User ID-Id:", na=False)),
1187 |                 "ZaloUserID",
1188 |             ] = ZaloPD[x].str.split(":", n=1, expand=True)[1]
1189 | 
1190 |     processZaloContacts(ZaloPD)
1191 | 
1192 |     ZaloPD[originIMEI] = phoneData.IMEI
1193 |     ZaloPD["inputFile"] = phoneData.inFile
1194 |     ZaloPD["Provenance"] = phoneData.inProvenance
1195 | 
1196 |     exportCols = []
1197 |     for x in ZaloPD.columns:
1198 |         if isinstance(x, str):
1199 |             exportCols.append(x)
1200 | 
1201 |     print("Exporting {}-ZALO.csv".format(phoneData.inFile))
1202 |     logging.info("Exporting Zalo from {}".format(phoneData.inFile))
1203 |     ZaloPD[exportCols].to_csv("{}-ZALO.csv".format(phoneData.inFile), index=False)
1204 | 
1205 | 
1206 | # ------- Argument parser for command line arguments -----------------------------------------
1207 | 
1208 | if __name__ == "__main__":
1209 |     parser = argparse.ArgumentParser(
1210 |         description=__description__,
1211 |         epilog="Developed by {}".format(str(__author__), str(__version__)),
1212 |     )
1213 | 
1214 |     parser.add_argument(
1215 |         "-f",
1216 |         "--f",
1217 |         dest="inputFilename",
1218 |         help="Path to Excel Spreadsheet",
1219 |         required=False,
1220 |     )
1221 | 
1222 |     parser.add_argument(
1223 |         "-p",
1224 |         "--p",
1225 |         dest="inputProvenance",
1226 |         choices=provenanceCols,
1227 |         required=False,
1228 |     )
1229 | 
1230 |     parser.add_argument(
1231 |         "-b",
1232 |         "--bulk",
1233 |         dest="bulk",
1234 |         required=False,
1235 |         action="store_true",
1236 |         help="Bulk process Excel spreadsheets in working directory.",
1237 |     )
1238 | 
1239 |     args = parser.parse_args()
1240 | 
1241 |     if len(sys.argv) == 1:
1242 |         parser.print_help()
1243 |         parser.exit()
1244 | 
1245 |     if args.bulk:
1246 |         print("Bulk Process")
1247 |         bulkProcessor(args.inputProvenance)
1248 | 
1249 |     if args.inputFilename:
1250 |         if not os.path.exists(args.inputFilename):
1251 |             print(
1252 |                 "Error: '{}' does not exist or is not a file.".format(
1253 |                     args.inputFilename
1254 |                 )
1255 |             )
1256 |             sys.exit(1)
1257 |         processMetadata(args.inputFilename, args.inputProvenance)
1258 | 


--------------------------------------------------------------------------------