├── LICENSE.txt
├── README.md
└── pdiff2.py

/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 netspooky
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pdiff2
2 |
3 | pDiff2 is a standalone tool and library for analyzing pcaps, as well as text files containing lines of hex data. It's a combination of several smaller scripts I had worked on previously, along with the core logic of the [original pDiff](https://github.com/netspooky/pdiff). I wanted to rename it because pdiff is super generic and it's confusing.
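The hex text input mentioned above is simply one packet per line, hex encoded. As a minimal sketch of producing such a file (the file name and sample bytes here are made up for illustration):

```python
import binascii

# Three made-up example "packets" as raw bytes (illustrative data only)
packets = [b"\x01\x02HELLO\x00", b"\x01\x03WORLD\x00", b"\x01\x02HELLO\x01"]

# Write one hex-encoded packet per line, the shape pDiff2's text mode consumes
with open("mydata.txt", "w") as f:
    for pkt in packets:
        f.write(binascii.hexlify(pkt).decode("ascii") + "\n")
```

Each line round-trips back to bytes with `bytes.fromhex()`, which is what the text mode does internally.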
4 |
5 | I switched to using the pyshark library as it's a wrapper for Wireshark/Tshark and gives access to much nicer dissections (and even custom dissectors if they're installed).
6 |
7 | > This tool is under active development, many features may change!
8 |
9 | Requirements
10 | - [pyshark](https://github.com/KimiNewt/pyshark)
11 |
12 | ## Usage
13 |
14 | You can use pDiff2 in two ways:
15 |
16 | ### As a standalone tool
17 |
18 | - To analyze a pcap, use the option `-p myPcap.pcap`
19 | - To analyze a text file, use the option `-t myTextFile.txt`
20 |
21 | Command Line Options
22 | ```
23 | python3 pdiff2.py -h
24 | usage: pdiff2.py [-h] [-p INPCAP] [-t INTEXT] [-f PFILTER] [--packet-offset POFFSET] [-v] [-c] [-s]
25 |
26 | pDiff2
27 |
28 | options:
29 |   -h, --help            show this help message and exit
30 |   -p INPCAP             Pcap File to Analyze
31 |   -t INTEXT             Text File to Analyze
32 |   -f PFILTER            Display filter to use
33 |   --packet-offset POFFSET
34 |                         Offset in packet to diff
35 |   -v                    verbose output
36 |   -c                    List common bytes per offset
37 |   -s                    Show string stats
38 | ```
39 |
40 | ### As a library
41 |
42 | pDiff2 has class methods for performing packet analysis.
43 |
44 | An example of creating a pDiff object is as follows:
45 | ```python
46 | myPkts = pDiff("./myPcap.pcap", "pcap", pFilter="some wireshark filter", verbose=True, pOffset=0x2a)
47 | myPkts.packetHeatmap() # print a heatmap of all available packets
48 | ```
49 |
50 | Text files containing lines of hex data are also supported. Each line is treated as a packet, and can be analyzed in the same way as pcaps. The main difference is that dissections and filters aren't possible (currently) with this format, but all of the other analysis remains the same.
51 | ```python
52 | myPkts = pDiff("./mydata.txt", "text", verbose=True)
53 | myPkts.packetHeatmap() # print a heatmap of all available packets
54 | ```
55 |
56 | ## What's Next?
57 |
58 | There are a few things I want to finish:
59 |
60 | - [ ] Make the color scheme nicer - currently it's gross, but it works
61 | - [ ] Add nicer output in general - I was thinking of using the `rich` library instead of writing my own
62 | - [ ] Add Yara support - I had another script that did frame-aware Yara signature scans on a pcap and alerted when a signature matched inside a frame. I want to rework this so it works on both pcaps and text input.
63 | - [ ] Fix up the data structure that holds all the info. It's kind of messy.
64 | - [ ] Add more analysis functions
65 | - [ ] LiveDiff - Raw socket listener for diffing packets
66 |
67 | Others:
68 |
69 | - [ ] It would be nice to add a repl mode
70 | - [ ] Explore more pyshark features
71 |
--------------------------------------------------------------------------------
/pdiff2.py:
--------------------------------------------------------------------------------
1 | import pyshark
2 | import argparse
3 | from collections import Counter
4 | import string
5 |
6 | parser = argparse.ArgumentParser(description='pDiff2')
7 | parser.add_argument('-p', dest='inPcap', help='Pcap File to Analyze')
8 | parser.add_argument('-t', dest='inText', help='Text File to Analyze')
9 | parser.add_argument('-f', dest='pFilter', default="", help='Display Filter to use')
10 | #parser.add_argument('-l', dest='dLayer', default="", help='Dissection Layer') # Unsupported as of now
11 | parser.add_argument('--packet-offset', dest='pOffset', type=lambda x: int(x,0), default=0, help='Offset in packet to diff')
12 | parser.add_argument('-v', dest='verbose', help='verbose output',action="store_true")
13 | parser.add_argument('-c', dest='listCommonBytes', help='List common bytes per offset',action="store_true")
14 | #parser.add_argument('-u', dest='unique', help='Highlight unique values in packet output',action="store_true")
15 | parser.add_argument('-s', dest='stringStats', help='Show string stats',action="store_true")
16 | args = parser.parse_args()
17 |
18 | CGREY7 = "\x1b[48;5;253;38;5;16m"
19 | CGREY6 = "\x1b[48;5;251;38;5;16m"
20 | CGREY5 = "\x1b[48;5;249;38;5;16m"
21 | CGREY4 = "\x1b[48;5;247;38;5;16m"
22 | CGREY3 = "\x1b[48;5;245;38;5;16m"
23 | CGREY2 = "\x1b[48;5;243;38;5;231m"
24 | CGREY1 = "\x1b[48;5;241;38;5;231m"
25 | CGREY0 = "\x1b[48;5;239;38;5;231m"
26 |
27 | COLOR0 = "\x1b[48;5;230;38;5;0m"   # White
28 | COLOR1 = "\x1b[48;5;227;38;5;0m"   # Light Yellow
29 | COLOR2 = "\x1b[48;5;220;38;5;0m"   # Yellow Orange
30 | COLOR3 = "\x1b[48;5;214;38;5;0m"   # Light Orange
31 | COLOR4 = "\x1b[48;5;208;38;5;231m" # Orange
32 | COLOR5 = "\x1b[48;5;202;38;5;231m" # Dark Orange
33 | COLOR6 = "\x1b[48;5;196;38;5;231m" # Red
34 | COLOR7 = "\x1b[48;5;124;38;5;231m" # Dark Red
35 | COLORN = "\x1b[0m"
36 | COLORX = "\x1b[48;5;244;38;5;0m"
37 | COLORAV = "\x1b[38;5;51m" # For average packet data color
38 |
39 | class pDiff:
40 |     def __init__(self, inFile, dataMode, pFilter="", pOffset=0, verbose=False):
41 |         self.captureFile = inFile
42 |         self.pFilter = pFilter
43 |         self.verbose = verbose
44 |         self.pOffset = pOffset
45 |         self.dataMode = dataMode # There are two modes: pcap and text. Text mode reads a file line by line, each line containing ASCII hex
46 |         if self.dataMode == "pcap":
47 |             self.packets = pyshark.FileCapture(self.captureFile,use_json=True,include_raw=True,display_filter=self.pFilter)
48 |         elif self.dataMode == "text":
49 |             self.packets = self.getTextPackets()
50 |         else:
51 |             print("Unsupported input type!")
52 |             return
53 |         self.pBytes = {} # Dict mapping each offset to the value of each packet that has a byte at that offset
54 |         self.pStrings = {} # This is the structure that contains all of the packet string data
55 |         self.pLens = {} # Hacky for now, holds the lengths of all the packets for length analysis
56 |         self.strUniques = [] # This contains unique strings
57 |         self.initPackets() # Get it going
58 |     def dHex(self,inBytes,baseAddr=0):
59 |         offs = 0
60 |         while offs < len(inBytes):
61 |             bHex = ""
62 |             bAsc = ""
63 |             bChunk = inBytes[offs:offs+16]
64 |             for b in bChunk:
65 |                 bAsc += chr(b) if chr(b).isprintable() and b < 0x7f else '.'
66 |                 bHex += "{:02x} ".format(b)
67 |             sp = " "*(48-len(bHex))
68 |             print("{:08x}: {}{} {}".format(baseAddr + offs, bHex, sp, bAsc))
69 |             offs = offs + 16
70 |     def getTextPackets(self):
71 |         with open(self.captureFile, "r") as f:
72 |             return f.readlines()
73 |     def initPackets(self, showLayers=False, showDissection=False, showFrameInfo=False, printPacketHex=True, printPacketStrings=True):
74 |         if self.dataMode == "pcap":
75 |             print("Analyzing PCAP")
76 |             for pkt in self.packets:
77 |                 rawPkt = pkt.get_raw_packet()
78 |                 rawPkt = rawPkt[self.pOffset:]
79 |                 pktStrings = self.getPacketStrings(rawPkt) # string actions
80 |                 self.pStrings[f"f{str(pkt.number)}"] = pktStrings # Put all the strings in the packet buffer
81 |                 self.pLens[f"f{str(pkt.number)}"] = len(rawPkt)
82 |                 if self.verbose:
83 |                     print(f"Frame {pkt.number}")
84 |                     if showFrameInfo:
85 |                         print(pkt.frame_info) # This will show each packet's wireshark frame info
86 |                     if showDissection: # This will show each packet's full dissection
87 |                         print(pkt.show()) # pretty_print seems to be the same as show
88 |                     if showLayers:
89 |                         print(pkt.layers) # This will show each packet's layers
90 |                     if printPacketHex:
91 |                         self.dHex(rawPkt,self.pOffset)
92 |                     if printPacketStrings:
93 |                         print(f"Strings in frame {pkt.number}")
94 |                         for pktStr in pktStrings:
95 |                             print(f"- {pktStr[0]:04x}: {repr(pktStr[1])}")
96 |                 currentByte = 0 # Byte number in given packet
97 |                 for pktByte in rawPkt:
98 |                     if self.pBytes.get(str(currentByte)) is None:
99 |                         self.pBytes[str(currentByte)] = {}
100 |                         fNum = "f"+str(pkt.number) # The packet number
101 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
102 |                         currentByte = currentByte + 1
103 |                     else:
104 |                         fNum = "f"+str(pkt.number) # The packet number
105 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
106 |                         currentByte = currentByte + 1
107 |         elif self.dataMode == "text":
108 |             # Text has no packet metadata or filter support; the parsing logic is very similar, but we have to track the "packet" number manually
109 |             pktNum = 0
110 |             print("Analyzing Text File")
111 |             for pkt in self.packets:
112 |                 rawPkt = bytes.fromhex(pkt.strip()) # strip the trailing newline before decoding the hex
113 |                 rawPkt = rawPkt[self.pOffset:]
114 |                 pktStrings = self.getPacketStrings(rawPkt)
115 |                 self.pStrings[f"f{str(pktNum)}"] = pktStrings # Put all the strings in the packet buffer
116 |                 self.pLens[f"f{str(pktNum)}"] = len(rawPkt) # Record the current length
117 |                 if self.verbose:
118 |                     print(f"Frame {pktNum}")
119 |                     if printPacketHex:
120 |                         self.dHex(rawPkt,self.pOffset)
121 |                     if printPacketStrings:
122 |                         print(f"Strings in frame {pktNum}")
123 |                         for pktStr in pktStrings:
124 |                             print(f"{pktStr[0]:04x}: {repr(pktStr[1])}")
125 |                 currentByte = 0 # Byte number in given packet
126 |                 for pktByte in rawPkt:
127 |                     if self.pBytes.get(str(currentByte)) is None:
128 |                         self.pBytes[str(currentByte)] = {}
129 |                         fNum = "f"+str(pktNum) # The packet number
130 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
131 |                         currentByte = currentByte + 1
132 |                     else:
133 |                         fNum = "f"+str(pktNum) # The packet number
134 |                         self.pBytes[str(currentByte)][fNum] = pktByte # The actual payload data
135 |                         currentByte = currentByte + 1
136 |                 pktNum = pktNum + 1
137 |     def getUniquePacketLens(self):
138 |         uniqueLens = []
139 |         for pkt in self.pLens.items():
140 |             if pkt[1] not in uniqueLens:
141 |                 uniqueLens.append(pkt[1])
142 |         return uniqueLens
143 |     def listCommonBytesPerOffset(self, asciiPrint=True, maxComp=10):
144 |         # Call with -c argument
145 |         for currentPkt in self.pBytes.keys():
146 |             mostCommon = Counter(self.pBytes[currentPkt].values()).most_common(maxComp) # Get most common values
147 |             if len(mostCommon) > 0:
148 |                 tBytes = len(self.pBytes[currentPkt])
149 |                 realOffset = int(currentPkt)+self.pOffset
150 |                 print(f"\033[1;33m[ Offset 0x{realOffset:02x} ] Total: {tBytes}\033[0m") # reset the color at end of line so it doesn't bleed
151 |                 for commonValue in mostCommon:
152 |                     if asciiPrint: # This handles the printing of ascii characters next to the offset
153 |                         if commonValue[0] < 127 and chr(commonValue[0]).isprintable():
154 |                             charPrint = chr(commonValue[0])
155 |                             print(f" \033[38;5;219m0x{commonValue[0]:02x}\033[0m - {commonValue[1]}/{tBytes} ({round((commonValue[1]/tBytes)*100,2)}%)\t'{charPrint}'")
156 |                         else:
157 |                             print(f" \033[38;5;219m0x{commonValue[0]:02x}\033[0m - {commonValue[1]}/{tBytes} ({round((commonValue[1]/tBytes)*100,2)}%)")
158 |                     else:
159 |                         print(f" \033[38;5;219m0x{commonValue[0]:02x}\033[0m - {commonValue[1]}/{tBytes} ({round((commonValue[1]/tBytes)*100,2)}%)")
160 |     def packetHeatmap(self):
161 |         # This generates an average packet with a heat map. The more unique values a given byte has, the more intense the color becomes
162 |         print(f"\nPacket Average (With Unique Value Heatmap)")
163 |         print(f"-[{CGREY0} 1+ {CGREY1} 4+ {CGREY2} 8+ {CGREY3} 12+ {CGREY4} 16+ {CGREY5} 20+ {CGREY6} 24+ {CGREY7} 28+ {COLORN}]-")
164 |         print(f"-[{COLOR0} 32+ {COLOR1} 64+ {COLOR2} 96+ {COLOR3} 128+ {COLOR4} 160+ {COLOR5} 192+ {COLOR6} 224+ {COLOR7} 256 {COLORN}]-")
165 |         print()
166 |         pktAverage = ""
167 |         pktAscii = ""
168 |         currentByteInRow = 0
169 |         numRows = 0
170 |         bSep = " "
171 |         packetOffset = 0
172 |         uniqueLens = self.getUniquePacketLens()
173 |         for pktData in self.pBytes.keys(): # iterate over packet keys
174 |             pSet = set(self.pBytes[pktData].values()) # The set of unique values for this offset
175 |             pSetLen = len(pSet) # The number of unique values in this set
176 |             if packetOffset in uniqueLens:
177 |                 bSep = "\x1b[38;5;213m]" # This puts a bracket to show where a previous packet ended
178 |             if pSetLen == 1:
179 |                 r = int(list(self.pBytes[pktData].values())[0])
180 |                 pktAverage += f"{COLORAV}{r:02x} {COLORN}"
181 |                 pktAscii += f"{COLORAV}{chr(r)}{COLORN}" if chr(r).isprintable() and r < 0x7f else f'{COLORAV}.{COLORN}'
182 |             else:
183 |                 COLORZ = ""
184 |                 COLORZ = CGREY0 if pSetLen > 0 else COLORZ
185 |                 COLORZ = CGREY1 if pSetLen >= 4 else COLORZ
186 |                 COLORZ = CGREY2 if pSetLen >= 8 else COLORZ
187 |                 COLORZ = CGREY3 if pSetLen >= 12 else COLORZ
188 |                 COLORZ = CGREY4 if pSetLen >= 16 else COLORZ
189 |                 COLORZ = CGREY5 if pSetLen >= 20 else COLORZ
190 |                 COLORZ = CGREY6 if pSetLen >= 24 else COLORZ
191 |                 COLORZ = CGREY7 if pSetLen >= 28 else COLORZ
192 |                 COLORZ = COLOR0 if pSetLen >= (32*1) else COLORZ
193 |                 COLORZ = COLOR1 if pSetLen >= (32*2) else COLORZ
194 |                 COLORZ = COLOR2 if pSetLen >= (32*3) else COLORZ
195 |                 COLORZ = COLOR3 if pSetLen >= (32*4) else COLORZ
196 |                 COLORZ = COLOR4 if pSetLen >= (32*5) else COLORZ
197 |                 COLORZ = COLOR5 if pSetLen >= (32*6) else COLORZ
198 |                 COLORZ = COLOR6 if pSetLen >= (32*7) else COLORZ
199 |                 COLORZ = COLOR7 if pSetLen >= (32*8) else COLORZ
200 |                 pktAverage += f"{COLORZ}  {COLORN}{bSep}"
201 |                 pktAscii += f"{COLORZ} {COLORN}"
202 |             currentByteInRow = currentByteInRow + 1
203 |             bSep = " "
204 |             if currentByteInRow == 16:
205 |                 print(f"{self.pOffset+(numRows*16):04x} {pktAverage}{' '*(3*(16-currentByteInRow))} {pktAscii}")
206 |                 pktAverage = ""
207 |                 pktAscii = ""
208 |                 currentByteInRow = 0
209 |                 numRows = numRows + 1
210 |             packetOffset = packetOffset + 1
211 |         if currentByteInRow > 0: # print the final partial row, padded so the ascii column lines up
212 |             print(f"{self.pOffset+(numRows*16):04x} {pktAverage}{' '*(3*(16-currentByteInRow))} {pktAscii}")
213 |     def getPacketStrings(self,packetPayload, strLenMin=4, strModeStrict=False):
214 |         outString = ""
215 |         strList = []
216 |         offs = 0
217 |         strlen = 0
218 |         for pktChar in packetPayload.decode("latin-1"):
219 |             offs = offs + 1 # counting the offset of the packet in total
220 |             if pktChar in string.printable:
221 |                 if strModeStrict:
222 |                     if pktChar in string.punctuation:
223 |                         continue
224 |                     if pktChar in string.whitespace:
225 |                         continue
226 |                 outString += pktChar
227 |                 strlen = strlen + 1
228 |                 continue
229 |             if len(outString) >= strLenMin:
230 |                 stroffs = offs - strlen - 1 # The -1 is to keep it consistent with the 0 index offset
231 |                 strList.append((stroffs,outString))
232 |             outString = ""
233 |             strlen = 0
234 |         if len(outString) >= strLenMin: # catch at end of packet buffer
235 |             stroffs = offs - strlen # Double check that this one isn't off by one. Might need a special case if offs == 0
236 |             strList.append((stroffs,outString))
237 |         return strList
238 |     def enumUniqueStrings(self):
239 |         # Helper for stringStats
240 |         strList = []
241 |         for dFrame in self.pStrings:
242 |             for packetString in self.pStrings[dFrame]:
243 |                 strList.append(packetString[1])
244 |         for i, c in Counter(strList).most_common():
245 |             if c == 1:
246 |                 self.strUniques.append(i)
247 |     def stringStats(self, strShowFilter=True):
248 |         # Call with -s argument
249 |         stringList = [] # List of all the strings
250 |         offsetList = [] # List of all the offsets
251 |         lengthsList = [] # Most common lengths
252 |         for pi in self.pStrings:
253 |             lengthsList.append(self.pLens[pi])
254 |             for ps in self.pStrings[pi]:
255 |                 offsetList.append(ps[0])
256 |                 stringList.append(ps[1])
257 |         mcs = Counter(stringList)
258 |         mco = Counter(offsetList)
259 |         mcl = Counter(lengthsList)
260 |         print("\nMost Common Strings:\nCount\tString")
261 |         for s in mcs.most_common(10):
262 |             print(f' {s[1]}\t{repr(s[0])}',end="")
263 |             if strShowFilter: # If you want the filter to be automatically shown
264 |                 print(f' \t frame contains {":".join("{:02x}".format(ord(c)) for c in s[0])}')
265 |             else:
266 |                 print()
267 |         print("\nMost Common String Offsets:\nCount\tOffset")
268 |         for pktMostCommonOffset in mco.most_common(10):
269 |             print(f' {pktMostCommonOffset[1]}\t0x{pktMostCommonOffset[0]:04x}')
270 |         print("\nMost Common Payload Lengths:\nCount\tLength")
271 |         for pktMostCommonLen in mcl.most_common(10):
272 |             print(f' {pktMostCommonLen[1]}\t{pktMostCommonLen[0]}')
273 |         self.enumUniqueStrings()
274 |         if len(self.strUniques) > 0:
275 |             print("\nUnique Strings:")
276 |             for u in self.strUniques:
277 |                 print(f' {repr(u)}')
278 |         else:
279 |             print("\nNo Unique Strings Found!")
280 |
281 | if __name__ == '__main__':
282 |     if args.inPcap and args.inText:
283 |         parser.error("Only use one input: a pcap (-p) or a text file (-t)")
284 |     if args.inPcap:
285 |         myPkts = pDiff(args.inPcap,"pcap", pFilter=args.pFilter, verbose=args.verbose, pOffset=args.pOffset)
286 |     elif args.inText:
287 |         myPkts = pDiff(args.inText,"text", verbose=args.verbose, pOffset=args.pOffset)
288 |     else:
289 |         parser.error("No input file given, use -p or -t")
290 |     myPkts.packetHeatmap()
291 |     if args.listCommonBytes:
292 |         myPkts.listCommonBytesPerOffset()
293 |     if args.stringStats:
294 |         myPkts.stringStats()
295 |
--------------------------------------------------------------------------------
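The per-offset byte tracking that `initPackets` builds in `pBytes` can be illustrated standalone. This is a sketch of the same idea using `collections.Counter`, with no pyshark dependency; the packet data here is made up for illustration:

```python
from collections import Counter

# Three illustrative packets: byte at offset 2 varies, the rest are constant
pkts = [bytes.fromhex(h) for h in ["010200aa", "010201aa", "010202aa"]]

# Same idea as pDiff.pBytes: map each offset to the list of values seen there
per_offset = {}
for pkt in pkts:
    for off, b in enumerate(pkt):
        per_offset.setdefault(off, []).append(b)

# Report unique-value counts and the most common byte per offset,
# analogous to packetHeatmap() and listCommonBytesPerOffset()
for off, vals in per_offset.items():
    common = Counter(vals).most_common(1)[0]
    print(f"offset 0x{off:02x}: {len(set(vals))} unique, "
          f"most common 0x{common[0]:02x} ({common[1]}/{len(vals)})")
```

Offsets where the unique-value count is high are the interesting ones to diff, which is exactly what the heatmap colors encode.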