├── LICENSE.txt
├── README.md
└── pdiff2.py

/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 netspooky
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pdiff2
2 |
3 | pDiff2 is a standalone tool and library for analyzing pcaps, as well as text files containing lines of hex data. It's a combination of several smaller scripts I had worked on previously, along with the core logic of the [original pDiff](https://github.com/netspooky/pdiff). I wanted to rename it because pdiff is super generic and it's confusing.
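The hex text input mentioned above is simply one packet per line, hex encoded. As a minimal sketch of producing such a file (the file name and sample bytes here are made up for illustration):

```python
import binascii

# Three made-up example "packets" as raw bytes (illustrative data only)
packets = [b"\x01\x02HELLO\x00", b"\x01\x03WORLD\x00", b"\x01\x02HELLO\x01"]

# Write one hex-encoded packet per line, the shape pDiff2's text mode consumes
with open("mydata.txt", "w") as f:
    for pkt in packets:
        f.write(binascii.hexlify(pkt).decode("ascii") + "\n")
```

Each line round-trips back to bytes with `bytes.fromhex()`, which is what the text mode does internally.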
4 |
5 | I switched to using the pyshark library as it's a wrapper for Wireshark/Tshark and gives access to much nicer dissections (and even custom dissectors if they're installed).
6 |
7 | > This tool is under active development, many features may change!
8 |
9 | Requirements
10 | - [pyshark](https://github.com/KimiNewt/pyshark)
11 |
12 | ## Usage
13 |
14 | You can use pDiff2 in two ways:
15 |
16 | ### As a standalone tool
17 |
18 | - To analyze a pcap, use the option `-p myPcap.pcap`
19 | - To analyze a text file, use the option `-t myTextFile.txt`
20 |
21 | Command Line Options
22 | ```
23 | python3 pdiff2.py -h
24 | usage: pdiff2.py [-h] [-p INPCAP] [-t INTEXT] [-f PFILTER] [--packet-offset POFFSET] [-v] [-c] [-s]
25 |
26 | pDiff2
27 |
28 | options:
29 |   -h, --help            show this help message and exit
30 |   -p INPCAP             Pcap File to Analyze
31 |   -t INTEXT             Text File to Analyze
32 |   -f PFILTER            Display filter to use
33 |   --packet-offset POFFSET
34 |                         Offset in packet to diff
35 |   -v                    verbose output
36 |   -c                    List common bytes per offset
37 |   -s                    Show string stats
38 | ```
39 |
40 | ### As a library
41 |
42 | pDiff2 has class methods for performing packet analysis.
43 |
44 | An example of creating a pDiff object is as follows:
45 | ```python
46 | myPkts = pDiff("./myPcap.pcap", "pcap", pFilter="some wireshark filter", verbose=True, pOffset=0x2a)
47 | myPkts.packetHeatmap() # print a heatmap of all available packets
48 | ```
49 |
50 | Text files containing lines of hex data are also supported. Each line is treated as a packet, and can be analyzed in the same way as pcaps. The main difference is that dissections and filters aren't possible (currently) with this format, but all of the other analysis remains the same.
51 | ```python
52 | myPkts = pDiff("./mydata.txt", "text", verbose=True)
53 | myPkts.packetHeatmap() # print a heatmap of all available packets
54 | ```
55 |
56 | ## What's Next?
57 |
58 | There are a few things I want to finish:
59 |
60 | - [ ] Make the color scheme nicer - currently it's gross, but it works
61 | - [ ] Add nicer output in general - I was thinking of using the `rich` library instead of writing my own
62 | - [ ] Add Yara support - I had another script that did frame-aware Yara signature scans on a pcap and alerted when a signature matched inside a frame. I want to rework this so it works on both pcaps and text input.
63 | - [ ] Fix up the data structure that holds all the info. It's kind of messy.
64 | - [ ] Add more analysis functions
65 | - [ ] LiveDiff - Raw socket listener for diffing packets
66 |
67 | Others:
68 |
69 | - [ ] It would be nice to add a repl mode
70 | - [ ] Explore more pyshark features
71 |
--------------------------------------------------------------------------------
/pdiff2.py:
--------------------------------------------------------------------------------
1 | import pyshark
2 | import argparse
3 | from collections import Counter
4 | import string
5 |
6 | parser = argparse.ArgumentParser(description='pDiff2')
7 | parser.add_argument('-p', dest='inPcap', help='Pcap File to Analyze')
8 | parser.add_argument('-t', dest='inText', help='Text File to Analyze')
9 | parser.add_argument('-f', dest='pFilter', default="", help='Display Filter to use')
10 | #parser.add_argument('-l', dest='dLayer', default="", help='Dissection Layer') # Unsupported as of now
11 | parser.add_argument('--packet-offset', dest='pOffset', type=lambda x: int(x,0), default=0, help='Offset in packet to diff')
12 | parser.add_argument('-v', dest='verbose', help='verbose output',action="store_true")
13 | parser.add_argument('-c', dest='listCommonBytes', help='List common bytes per offset',action="store_true")
14 | #parser.add_argument('-u', dest='unique', help='Highlight unique values in packet output',action="store_true")
15 | parser.add_argument('-s', dest='stringStats', help='Show string stats',action="store_true")
16 | args = parser.parse_args()
17 |
18 | CGREY7 = "\x1b[48;5;253;38;5;16m"
19 | CGREY6 = "\x1b[48;5;251;38;5;16m"
20 | CGREY5 = "\x1b[48;5;249;38;5;16m"
21 | CGREY4 = "\x1b[48;5;247;38;5;16m"
22 | CGREY3 = "\x1b[48;5;245;38;5;16m"
23 | CGREY2 = "\x1b[48;5;243;38;5;231m"
24 | CGREY1 = "\x1b[48;5;241;38;5;231m"
25 | CGREY0 = "\x1b[48;5;239;38;5;231m"
26 |
27 | COLOR0 = "\x1b[48;5;230;38;5;0m"   # White
28 | COLOR1 = "\x1b[48;5;227;38;5;0m"   # Light Yellow
29 | COLOR2 = "\x1b[48;5;220;38;5;0m"   # Yellow Orange
30 | COLOR3 = "\x1b[48;5;214;38;5;0m"   # Light Orange
31 | COLOR4 = "\x1b[48;5;208;38;5;231m" # Orange
32 | COLOR5 = "\x1b[48;5;202;38;5;231m" # Dark Orange
33 | COLOR6 = "\x1b[48;5;196;38;5;231m" # Red
34 | COLOR7 = "\x1b[48;5;124;38;5;231m" # Dark Red
35 | COLORN = "\x1b[0m"
36 | COLORX = "\x1b[48;5;244;38;5;0m"
37 | COLORAV = "\x1b[38;5;51m" # For average packet data color
38 |
39 | class pDiff:
40 |     def __init__(self, inFile, dataMode, pFilter="", pOffset=0, verbose=False):
41 |         self.captureFile = inFile
42 |         self.pFilter = pFilter
43 |         self.verbose = verbose
44 |         self.pOffset = pOffset
45 |         self.dataMode = dataMode # There are two modes: pcap and text. Text mode reads a file line by line, each line containing ASCII hex
46 |         if self.dataMode == "pcap":
47 |             self.packets = pyshark.FileCapture(self.captureFile,use_json=True,include_raw=True,display_filter=self.pFilter)
48 |         elif self.dataMode == "text":
49 |             self.packets = self.getTextPackets()
50 |         else:
51 |             print("Unsupported input type!")
52 |             return
53 |         self.pBytes = {} # Dict mapping each offset to the value of each packet that has a byte at that offset
54 |         self.pStrings = {} # This is the structure that contains all of the packet string data
55 |         self.pLens = {} # Hacky for now, holds the lengths of all the packets for length analysis
56 |         self.strUniques = [] # This contains unique strings
57 |         self.initPackets() # Get it going
58 |     def dHex(self,inBytes,baseAddr=0):
59 |         offs = 0
60 |         while offs < len(inBytes):
61 |             bHex = ""
62 |             bAsc = ""
63 |             bChunk = inBytes[offs:offs+16]
64 |             for b in bChunk:
65 |                 bAsc += chr(b) if chr(b).isprintable() and b < 0x7f else '.'
66 |                 bHex += "{:02x} ".format(b)
67 |             sp = " "*(48-len(bHex))
68 |             print("{:08x}: {}{} {}".format(baseAddr + offs, bHex, sp, bAsc))
69 |             offs = offs + 16
70 |     def getTextPackets(self):
71 |         with open(self.captureFile, "r") as f:
72 |             return f.readlines()
73 |     def initPackets(self, showLayers=False, showDissection=False, showFrameInfo=False, printPacketHex=True, printPacketStrings=True):
74 |         if self.dataMode == "pcap":
75 |             print("Analyzing PCAP")
76 |             for pkt in self.packets:
77 |                 rawPkt = pkt.get_raw_packet()
78 |                 rawPkt = rawPkt[self.pOffset:]
79 |                 pktStrings = self.getPacketStrings(rawPkt) # string actions
80 |                 self.pStrings[f"f{str(pkt.number)}"] = pktStrings # Put all the strings in the packet buffer
81 |                 self.pLens[f"f{str(pkt.number)}"] = len(rawPkt)
82 |                 if self.verbose:
83 |                     print(f"Frame {pkt.number}")
84 |                     if showFrameInfo:
85 |                         print(pkt.frame_info) # This will show each packet's wireshark frame info
86 |                     if showDissection: # This will show each packet's full dissection
87 |                         print(pkt.show()) # pretty_print seems to be the same as show
88 |                     if showLayers:
89 |                         print(pkt.layers) # This will show each packet's layers
90 |                     if printPacketHex:
91 |                         self.dHex(rawPkt,self.pOffset)
92 |                     if printPacketStrings:
93 |                         print(f"Strings in frame {pkt.number}")
94 |                         for pktStr in pktStrings:
95 |                             print(f"- {pktStr[0]:04x}: {repr(pktStr[1])}")
96 |                 currentByte = 0 # Byte number in given packet
97 |                 for pktByte in rawPkt:
98 |                     if self.pBytes.get(str(currentByte)) is None:
99 |                         self.pBytes[str(currentByte)] = {}
100 |                         fNum = "f"+str(pkt.number) # The packet number
101 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
102 |                         currentByte = currentByte + 1
103 |                     else:
104 |                         fNum = "f"+str(pkt.number) # The packet number
105 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
106 |                         currentByte = currentByte + 1
107 |         elif self.dataMode == "text":
108 |             # Text has no packet metadata or filter support; the parsing logic is very similar, but we have to track the "packet" number manually
109 |             pktNum = 0
110 |             print("Analyzing Text File")
111 |             for pkt in self.packets:
112 |                 rawPkt = bytes.fromhex(pkt.strip()) # strip the trailing newline before decoding the hex
113 |                 rawPkt = rawPkt[self.pOffset:]
114 |                 pktStrings = self.getPacketStrings(rawPkt)
115 |                 self.pStrings[f"f{str(pktNum)}"] = pktStrings # Put all the strings in the packet buffer
116 |                 self.pLens[f"f{str(pktNum)}"] = len(rawPkt) # Record the current length
117 |                 if self.verbose:
118 |                     print(f"Frame {pktNum}")
119 |                     if printPacketHex:
120 |                         self.dHex(rawPkt,self.pOffset)
121 |                     if printPacketStrings:
122 |                         print(f"Strings in frame {pktNum}")
123 |                         for pktStr in pktStrings:
124 |                             print(f"{pktStr[0]:04x}: {repr(pktStr[1])}")
125 |                 currentByte = 0 # Byte number in given packet
126 |                 for pktByte in rawPkt:
127 |                     if self.pBytes.get(str(currentByte)) is None:
128 |                         self.pBytes[str(currentByte)] = {}
129 |                         fNum = "f"+str(pktNum) # The packet number
130 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
131 |                         currentByte = currentByte + 1
132 |                     else:
133 |                         fNum = "f"+str(pktNum) # The packet number
134 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
135 |                         currentByte = currentByte + 1
136 |                 pktNum = pktNum + 1
137 |     def getUniquePacketLens(self):
138 |         uniqueLens = []
139 |         for pkt in self.pLens.items():
140 |             if pkt[1] not in uniqueLens:
141 |                 uniqueLens.append(pkt[1])
142 |         return uniqueLens
143 |     def listCommonBytesPerOffset(self, asciiPrint=True, maxComp=10):
144 |         # Call with -c argument
145 |         for currentPkt in self.pBytes.keys():
146 |             mostCommon = Counter(self.pBytes[currentPkt].values()).most_common(maxComp) # Get most common values
147 |             if len(mostCommon) > 0:
148 |                 tBytes = len(self.pBytes[currentPkt])
149 |                 realOffset = int(currentPkt)+self.pOffset
150 |                 print(f"\033[1;33m[ Offset 0x{realOffset:02x} ] Total: {tBytes}\033[0m") # reset the color at end of line so it doesn't bleed
151 |                 for commonValue in mostCommon:
152 |                     if asciiPrint: # This handles the printing of ascii characters next to the offset
153 |                         if commonValue[0] < 127 and chr(commonValue[0]).isprintable():
154 |                             charPrint = chr(commonValue[0])
155 |                             print(f" \033[38;5;219m0x{commonValue[0]:02x}\033[0m - {commonValue[1]}/{tBytes} ({round((commonValue[1]/tBytes)*100,2)}%)\t'{charPrint}'")
156 |                         else:
157 |                             print(f" \033[38;5;219m0x{commonValue[0]:02x}\033[0m - {commonValue[1]}/{tBytes} ({round((commonValue[1]/tBytes)*100,2)}%)")
158 |                     else:
159 |                         print(f" \033[38;5;219m0x{commonValue[0]:02x}\033[0m - {commonValue[1]}/{tBytes} ({round((commonValue[1]/tBytes)*100,2)}%)")
160 |     def packetHeatmap(self):
161 |         # This generates an average packet with a heat map. The more unique values a given byte has, the more intense the color becomes
162 |         print(f"\nPacket Average (With Unique Value Heatmap)")
163 |         print(f"-[{CGREY0} 1+ {CGREY1} 4+ {CGREY2} 8+ {CGREY3} 12+ {CGREY4} 16+ {CGREY5} 20+ {CGREY6} 24+ {CGREY7} 28+ {COLORN}]-")
164 |         print(f"-[{COLOR0} 32+ {COLOR1} 64+ {COLOR2} 96+ {COLOR3} 128+ {COLOR4} 160+ {COLOR5} 192+ {COLOR6} 224+ {COLOR7} 256 {COLORN}]-")
165 |         print()
166 |         pktAverage = ""
167 |         pktAscii = ""
168 |         currentByteInRow = 0
169 |         numRows = 0
170 |         bSep = " "
171 |         packetOffset = 0
172 |         uniqueLens = self.getUniquePacketLens()
173 |         for pktData in self.pBytes.keys(): # iterate over packet keys
174 |             pSet = set(self.pBytes[pktData].values()) # The set of unique values for this offset
175 |             pSetLen = len(pSet) # The number of unique values in this set
176 |             if packetOffset in uniqueLens:
177 |                 bSep = "\x1b[38;5;213m]" # This puts a bracket to show where a previous packet ended
178 |             if pSetLen == 1:
179 |                 r = int(list(self.pBytes[pktData].values())[0])
180 |                 pktAverage += f"{COLORAV}{r:02x} {COLORN}"
181 |                 pktAscii += f"{COLORAV}{chr(r)}{COLORN}" if chr(r).isprintable() and r < 0x7f else f'{COLORAV}.{COLORN}'
182 |             else:
183 |                 COLORZ = ""
184 |                 COLORZ = CGREY0 if pSetLen > 0 else COLORZ
185 |                 COLORZ = CGREY1 if pSetLen >= 4 else COLORZ
186 |                 COLORZ = CGREY2 if pSetLen >= 8 else COLORZ
187 |                 COLORZ = CGREY3 if pSetLen >= 12 else COLORZ
188 |                 COLORZ = CGREY4 if pSetLen >= 16 else COLORZ
189 |                 COLORZ = CGREY5 if pSetLen >= 20 else COLORZ
190 |                 COLORZ = CGREY6 if pSetLen >= 24 else COLORZ
191 |                 COLORZ = CGREY7 if pSetLen >= 28 else COLORZ
192 |                 COLORZ = COLOR0 if pSetLen >= (32*1) else COLORZ
193 |                 COLORZ = COLOR1 if pSetLen >= (32*2) else COLORZ
194 |                 COLORZ = COLOR2 if pSetLen >= (32*3) else COLORZ
195 |                 COLORZ = COLOR3 if pSetLen >= (32*4) else COLORZ
196 |                 COLORZ = COLOR4 if pSetLen >= (32*5) else COLORZ
197 |                 COLORZ = COLOR5 if pSetLen >= (32*6) else COLORZ
198 |                 COLORZ = COLOR6 if pSetLen >= (32*7) else COLORZ
199 |                 COLORZ = COLOR7 if pSetLen >= (32*8) else COLORZ
200 |                 pktAverage += f"{COLORZ}  {COLORN}{bSep}"
201 |                 pktAscii += f"{COLORZ} {COLORN}"
202 |             currentByteInRow = currentByteInRow + 1
203 |             bSep = " "
204 |             if currentByteInRow == 16:
205 |                 print(f"{self.pOffset+(numRows*16):04x} {pktAverage}{' '*(3*(16-currentByteInRow))} {pktAscii}")
206 |                 pktAverage = ""
207 |                 pktAscii = ""
208 |                 currentByteInRow = 0
209 |                 numRows = numRows + 1
210 |             packetOffset = packetOffset + 1
211 |         if currentByteInRow > 0: # print the final partial row, padded so the ascii column lines up
212 |             print(f"{self.pOffset+(numRows*16):04x} {pktAverage}{' '*(3*(16-currentByteInRow))} {pktAscii}")
213 |     def getPacketStrings(self,packetPayload, strLenMin=4, strModeStrict=False):
214 |         outString = ""
215 |         strList = []
216 |         offs = 0
217 |         strlen = 0
218 |         for pktChar in packetPayload.decode("latin-1"):
219 |             offs = offs + 1 # counting the offset of the packet in total
220 |             if pktChar in string.printable:
221 |                 if strModeStrict:
222 |                     if pktChar in string.punctuation:
223 |                         continue
224 |                     if pktChar in string.whitespace:
225 |                         continue
226 |                 outString += pktChar
227 |                 strlen = strlen + 1
228 |                 continue
229 |             if len(outString) >= strLenMin:
230 |                 stroffs = offs - strlen - 1 # The -1 is to keep it consistent with the 0 index offset
231 |                 strList.append((stroffs,outString))
232 |             outString = ""
233 |             strlen = 0
234 |         if len(outString) >= strLenMin: # catch at end of packet buffer
235 |             stroffs = offs - strlen # Double check that this one isn't off by one. Might need a special case if offs == 0
236 |             strList.append((stroffs,outString))
237 |         return strList
238 |     def enumUniqueStrings(self):
239 |         # Helper for stringStats
240 |         strList = []
241 |         for dFrame in self.pStrings:
242 |             for packetString in self.pStrings[dFrame]:
243 |                 strList.append(packetString[1])
244 |         for i, c in Counter(strList).most_common():
245 |             if c == 1:
246 |                 self.strUniques.append(i)
247 |     def stringStats(self, strShowFilter=True):
248 |         # Call with -s argument
249 |         stringList = [] # List of all the strings
250 |         offsetList = [] # List of all the offsets
251 |         lengthsList = [] # Most common lengths
252 |         for pi in self.pStrings:
253 |             lengthsList.append(self.pLens[pi])
254 |             for ps in self.pStrings[pi]:
255 |                 offsetList.append(ps[0])
256 |                 stringList.append(ps[1])
257 |         mcs = Counter(stringList)
258 |         mco = Counter(offsetList)
259 |         mcl = Counter(lengthsList)
260 |         print("\nMost Common Strings:\nCount\tString")
261 |         for s in mcs.most_common(10):
262 |             print(f' {s[1]}\t{repr(s[0])}',end="")
263 |             if strShowFilter: # If you want the filter to be automatically shown
264 |                 print(f' \t frame contains {":".join("{:02x}".format(ord(c)) for c in s[0])}')
265 |             else:
266 |                 print()
267 |         print("\nMost Common String Offsets:\nCount\tOffset")
268 |         for pktMostCommonOffset in mco.most_common(10):
269 |             print(f' {pktMostCommonOffset[1]}\t0x{pktMostCommonOffset[0]:04x}')
270 |         print("\nMost Common Payload Lengths:\nCount\tLength")
271 |         for pktMostCommonLen in mcl.most_common(10):
272 |             print(f' {pktMostCommonLen[1]}\t{pktMostCommonLen[0]}')
273 |         self.enumUniqueStrings()
274 |         if len(self.strUniques) > 0:
275 |             print("\nUnique Strings:")
276 |             for u in self.strUniques:
277 |                 print(f' {repr(u)}')
278 |         else:
279 |             print("\nNo Unique Strings Found!")
280 |
281 | if __name__ == '__main__':
282 |     if args.inPcap and args.inText:
283 |         parser.error("Only use one input: a pcap (-p) or a text file (-t)")
284 |     if args.inPcap:
285 |         myPkts = pDiff(args.inPcap,"pcap", pFilter=args.pFilter, verbose=args.verbose, pOffset=args.pOffset)
286 |     elif args.inText:
287 |         myPkts = pDiff(args.inText,"text", verbose=args.verbose, pOffset=args.pOffset)
288 |     else:
289 |         parser.error("No input file given, use -p or -t")
290 |     myPkts.packetHeatmap()
291 |     if args.listCommonBytes:
292 |         myPkts.listCommonBytesPerOffset()
293 |     if args.stringStats:
294 |         myPkts.stringStats()
295 |
--------------------------------------------------------------------------------
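The per-offset byte tracking that `initPackets` builds in `pBytes` can be illustrated standalone. This is a sketch of the same idea using `collections.Counter`, with no pyshark dependency; the packet data here is made up for illustration:

```python
from collections import Counter

# Three illustrative packets: byte at offset 2 varies, the rest are constant
pkts = [bytes.fromhex(h) for h in ["010200aa", "010201aa", "010202aa"]]

# Same idea as pDiff.pBytes: map each offset to the list of values seen there
per_offset = {}
for pkt in pkts:
    for off, b in enumerate(pkt):
        per_offset.setdefault(off, []).append(b)

# Report unique-value counts and the most common byte per offset,
# analogous to packetHeatmap() and listCommonBytesPerOffset()
for off, vals in per_offset.items():
    common = Counter(vals).most_common(1)[0]
    print(f"offset 0x{off:02x}: {len(set(vals))} unique, "
          f"most common 0x{common[0]:02x} ({common[1]}/{len(vals)})")
```

Offsets where the unique-value count is high are the interesting ones to diff, which is exactly what the heatmap colors encode.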