├── .gitignore
├── README.md
├── img
    ├── extract_packets_report.png
    ├── get_traffic_stats_report_rcv.png
    └── get_traffic_stats_report_snd.png
├── scripts
    ├── __init__.py
    ├── dump_pkt_timestamps.py
    ├── get_traffic_stats.py
    └── plot_snd_timing.py
├── setup.py
└── tcpdump_processing
    ├── __init__.py
    ├── convert.py
    └── extract_packets.py


/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | *.egg-info/
 24 | .installed.cfg
 25 | *.egg
 26 | MANIFEST
 27 | 
 28 | # PyInstaller
 29 | #  Usually these files are written by a python script from a template
 30 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 31 | *.manifest
 32 | *.spec
 33 | 
 34 | # Installer logs
 35 | pip-log.txt
 36 | pip-delete-this-directory.txt
 37 | 
 38 | # Unit test / coverage reports
 39 | htmlcov/
 40 | .tox/
 41 | .coverage
 42 | .coverage.*
 43 | .cache
 44 | nosetests.xml
 45 | coverage.xml
 46 | *.cover
 47 | .hypothesis/
 48 | .pytest_cache/
 49 | 
 50 | # Translations
 51 | *.mo
 52 | *.pot
 53 | 
 54 | # Django stuff:
 55 | *.log
 56 | local_settings.py
 57 | db.sqlite3
 58 | 
 59 | # Flask stuff:
 60 | instance/
 61 | .webassets-cache
 62 | 
 63 | # Scrapy stuff:
 64 | .scrapy
 65 | 
 66 | # Sphinx documentation
 67 | docs/_build/
 68 | 
 69 | # PyBuilder
 70 | target/
 71 | 
 72 | # Jupyter Notebook
 73 | .ipynb_checkpoints
 74 | 
 75 | # pyenv
 76 | .python-version
 77 | 
 78 | # celery beat schedule file
 79 | celerybeat-schedule
 80 | 
 81 | # SageMath parsed files
 82 | *.sage.py
 83 | 
 84 | # Environments
 85 | .env
 86 | .venv
 87 | env/
 88 | venv/
 89 | ENV/
 90 | env.bak/
 91 | venv.bak/
 92 | 
 93 | # Spyder project settings
 94 | .spyderproject
 95 | .spyproject
 96 | 
 97 | # Rope project settings
 98 | .ropeproject
 99 | 
100 | # mkdocs documentation
101 | /site
102 | 
103 | # mypy
104 | .mypy_cache/
105 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # lib-tcpdump-processing
  2 | 
  3 | **`lib-tcpdump-processing`** is a library designed to process `.pcap(ng)` [tcpdump](https://www.tcpdump.org/) or [Wireshark](https://www.wireshark.org/) trace files and extract [SRT](https://github.com/Haivision/srt) packets of interest for further analysis.
  4 | 
  5 | It can also process a network trace file and generate a report with SRT-related statistics. In particular, the following statistics are calculated at the receiver side:
  6 | - the number of SRT DATA and CONTROL packets present in a dump,
  7 | - original DATA packets received, lost, recovered, and unrecovered,
  8 | - retransmitted DATA packets received and the number of packets retransmitted once, twice, 3 times, or more,
  9 | - information about CONTROL (ACK, ACKACK, and NAK) packets,
 10 | - and other relevant information.
 11 | 
 12 | See the [`get-traffic-stats`](#get-traffic-stats) script for details and report examples.
 13 | 
 14 | **Important:** Currently, only trace files containing a single data flow are supported. Supporting several data flows will require adjustments.
 15 | 
 16 | ## 1. Getting Started
 17 | 
 18 | ### Requirements
 19 | 
 20 | * python 3.6+
 21 | * tshark 3.2.2+ (setting up tshark is described [here](https://github.com/mbakholdina/srt-test-runner#setting-up-tshark) and in the SRT CookBook [here](https://srtlab.github.io/srt-cookbook/how-to-articles/how-to-setup-wireshark-for-srt-traffic-analysis/))
 22 | 
 23 | ### Install the library with pip
 24 | 
 25 | For development, it is recommended to:
 26 | * use `venv` for virtual environments and `pip` for installing the library and any dependencies. This ensures the code and dependencies are isolated from the system Python installation,
 27 | * install the library in “editable” mode by running `pip install -e .` from the project root directory (the one containing `setup.py`). This allows changing the source code (both tests and library) and rerunning tests against library code at will. For regular installation, use `pip install .`.
 28 | 
 29 | As soon as the library is installed, you can run modules directly:
 30 | ```
 31 | venv/bin/python -m tcpdump_processing.extract_packets --help
 32 | ```
 33 | 
 34 | or use preinstalled executable scripts:
 35 | ```
 36 | venv/bin/extract-packets --help
 37 | ```
 38 | 
 39 | ### Install the library to import in another project
 40 | 
 41 | Install with `pip` (a `venv` is recommended), using the `pip` VCS requirement specifier:
 42 | ```
 43 | pip install 'git+https://github.com/mbakholdina/lib-tcpdump-processing.git@v0.1#egg=tcpdump_processing'
 44 | ```
 45 | 
 46 | or simply put the following line in `requirements.txt`:
 47 | ```
 48 | git+https://github.com/mbakholdina/lib-tcpdump-processing.git@v0.1#egg=tcpdump_processing
 49 | ```
 50 | 
 51 | Remember to quote the full URL to avoid shell expansion in the case of direct installation.
 52 | 
 53 | This installs the version corresponding to the git tag 'v0.1'. You can replace that with a branch name, a commit hash, or a git ref as necessary. See the [pip documentation](https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support) for details.
 54 | 
 55 | To install the latest master, use:
 56 | ```
 57 | git+https://github.com/mbakholdina/lib-tcpdump-processing.git@master#egg=tcpdump_processing
 58 | ```
 59 | 
 60 | As soon as the library is installed, you can import the whole library:
 61 | ```
 62 | import tcpdump_processing
 63 | ```
 64 | 
 65 | or a particular module:
 66 | ```
 67 | import tcpdump_processing.extract_packets as extract_packets
 68 | ```
 69 | 
 70 | ## 2. Executable Scripts
 71 | 
 72 | To use the following scripts, please install the library first (see the [Install the library with pip](#install-the-library-with-pip) section).
 73 | 
 74 | ### `extract-packets`
 75 | 
 76 | This script parses a `.pcap(ng)` trace file, saves the output in `.csv` format in the same directory as the original file, extracts the packets of interest, and, if `--save` is set, saves the resulting dataframe in `.csv` format in the same directory.
 77 | 
 78 | Usage:
 79 | ```
 80 | venv/bin/extract-packets [OPTIONS] PATH
 81 | ```
 82 | where `PATH` refers to the `.pcap(ng)` file.
 83 | 
 84 | Options:
 85 | ```
 86 | Options:
 87 |   --type [srt|data|control|probing|umsg_handshake|umsg_ack]
 88 |                                   Packet type to extract: SRT (both DATA and
 89 |                                   CONTROL), SRT DATA, SRT CONTROL, SRT DATA
 90 |                                   probing, SRT CONTROL UMSG_HANDSHAKE, or SRT
 91 |                                   CONTROL UMSG_ACK packets.  [default:
 92 |                                   probing]
 93 |   --overwrite / --no-overwrite    If exists, overwrite the .csv file produced
 94 |                                   out of the .pcap(ng) one at the previous
 95 |                                   iterations of running the script.  [default:
 96 |                                   no-overwrite]
 97 |   --save / --no-save              Save dataframe with extracted packets into
 98 |                                   .csv file.  [default: no-save]
 99 |   --port TEXT                     Decode packets as SRT on a specified port.
100 |                                   This option is helpful when there is no SRT
101 |                                   handshake in .pcap(ng) file. Should be used
102 |                                   together with --overwrite option.
103 |   --help                          Show this message and exit.
104 | ```
105 | 
106 | Here is an example of the report generated when extracting `--type srt` packets:
107 | 
108 | ![extract_packets_report](img/extract_packets_report.png)
109 | 
110 | ### `get-traffic-stats`
111 | 
112 | This script processes a network `.pcap(ng)` trace file and generates a report with SRT-related statistics. Intermediate data is stored in `.csv` format in the same directory as the original file. Both sender and receiver side dumps are supported.
113 | 
114 | Usage:
115 | ```
116 | venv/bin/get-traffic-stats [OPTIONS] PATH
117 | ```
118 | where `PATH` refers to the `.pcap(ng)` file.
119 | 
120 | Options:
121 | ```
122 | Options:
123 |   --side [snd|rcv]              The side .pcap(ng) file was collected at.
124 |                                 [required]
125 |   --overwrite / --no-overwrite  If exists, overwrite the .csv file produced
126 |                                 out of the .pcap(ng) one at the previous
127 |                                 iterations of running the script.  [default:
128 |                                 no-overwrite]
129 |   --show-unrec-pkts / --no-show-unrec-pkts
130 |                                 Show sequence numbers of unrecovered at the
131 |                                 receiver side packets. Save the list of
132 |                                 sequence numbers into respective .csv file.
133 |                                 [default: no-show-unrec-pkts]
134 |   --port TEXT                   Decode packets as SRT on a specified port.
135 |                                 This option is helpful when there is no SRT
136 |                                 handshake in .pcap(ng) file. Should be used
137 |                                 together with --overwrite option.
138 |   --help                        Show this message and exit.
139 | ```
140 | 
141 | Here is an example of the report generated at the sender side:
142 | 
143 | ![get_traffic_stats_report_snd](img/get_traffic_stats_report_snd.png)
144 | 
145 | Here is an example of the report generated at the receiver side:
146 | 
147 | ![get_traffic_stats_report_rcv](img/get_traffic_stats_report_rcv.png)
148 | 
149 | ### `plot-snd-timing`
150 | 
151 | This script parses a `.pcap(ng)` tcpdump trace file captured at the sender side
152 | and plots the time delta between the SRT packet timestamp (`srt.timestamp`) and
153 | the packet time captured by Wireshark at the sender side (`ws.time`).
154 | This can be done either for original SRT DATA packets only, or for both
155 | original and retransmitted DATA packets.
156 | 
157 | Usage:
158 | ```
159 | venv/bin/plot-snd-timing [OPTIONS] PATH
160 | ```
161 | where `PATH` refers to the `.pcap(ng)` file.
162 | 
163 | Options:
164 | ```
165 | Options:
166 |   --overwrite / --no-overwrite    If exists, overwrite the .csv file produced
167 |                                   out of the .pcap(ng) one at the previous
168 |                                   iterations of running the script.  [default:
169 |                                   no-overwrite]
170 |   --with-rexmits / --without-rexmits
171 |                                   Also show retransmitted data packets.
172 |                                   [default: without-rexmits]
173 |   --port TEXT                     Decode packets as SRT on a specified port.
174 |                                   This option is helpful when there is no SRT
175 |                                   handshake in .pcap(ng) file. Should be used
176 |                                   together with --overwrite option.
177 |   --latency TEXT                  SRT latency, in milliseconds, to plot on a
178 |                                   graph.
179 |   --help                          Show this message and exit.
180 | ```
181 | 
182 | ### `dump-pkt-timestamps`
183 | 
184 | The script dumps SRT timestamps (not Wireshark `ws.time`) of SRT data packets to a `.csv` file
185 | to be used by [srt-xtransmit](https://github.com/maxsharabayko/srt-xtransmit) application with the `--playback-csv` argument.
186 | 
187 | Usage:
188 | ```
189 | dump-pkt-timestamps [OPTIONS] INPUT OUTPUT
190 | ```
191 | where `INPUT` is the `.pcap(ng)` file to use as an input, `OUTPUT` is the output `.csv` file to be produced.
192 | 
193 | Options:
194 | ```
195 | Options:
196 |   --overwrite / --no-overwrite  If exists, overwrite the .csv file produced
197 |                                 out of the .pcap(ng) one at the previous
198 |                                 iterations of running the script.  [default:
199 |                                 no-overwrite]
200 |   --port TEXT                   Decode packets as SRT on a specified port.
201 |                                 This option is helpful when there is no SRT
202 |                                 handshake in .pcap(ng) file. Should be used
203 |                                 together with --overwrite option.
204 |   --help                        Show this message and exit.
205 | ```
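
As a rough illustration of the output, the sketch below mirrors what the script in this repository does: it divides each `srt.timestamp` value (microseconds) by 1000000 and writes one value per line, with no header and no index column. The helper name `write_playback_csv` and the in-memory buffer are assumptions made for this example; the two timestamps are taken from the frames shown later in this README. Whether `srt-xtransmit` accepts additional columns is not covered here.

```python
import csv
import io


def write_playback_csv(timestamps_us, stream):
    """Write SRT timestamps, converted from microseconds to seconds, one per line."""
    writer = csv.writer(stream, lineterminator="\n")
    for ts_us in timestamps_us:
        # srt.timestamp is in microseconds; the output is in seconds.
        writer.writerow([ts_us / 1_000_000])


# Hypothetical usage with an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_playback_csv([449263, 449292], buffer)
print(buffer.getvalue())
```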
206 | 
207 | ## 3. Data Preparation
208 | 
209 | A `.pcap(ng)` trace file with measurements from a specific network interface and port collected at the sender/receiver side is used as a proxy for packet data collected by SRT. This trace file is converted to `.csv` format, with columns for the timestamp, source IP address, destination IP address, protocol, and other fields, and rows representing observations (sent/received packets).
210 | 
211 | This data is further cleaned and transformed using [pandas](https://pandas.pydata.org/) in the following way:
212 | 1. The data is filtered to keep SRT packets only (`ws.protocol == SRT`), since only these are relevant for further analysis.
213 | 2. The dataset is then split into DATA (`srt.iscontrol == 0`) and CONTROL (`srt.iscontrol == 1`) packets.
214 | 3. For DATA packets, timestamps are converted from seconds to microseconds using the same procedure as in the protocol. To be precise, the new variable `ws.time.us` is obtained as `(ws.time * 1000000).astype('int64')`.
215 | 4. For DATA packets, the inter-arrival time is calculated as the difference between the current and previous packet timestamps and stored as a separate variable `ws.iat.us`. Note that the time delta for the first SRT DATA packet is 0 by default, so this packet should probably be excluded from the analysis.
216 | 5. Type conversion is performed to structure the data in appropriate formats.
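
Steps 1 to 4 above can be sketched in plain Python as follows. This is a minimal illustration that uses lists of dicts instead of pandas dataframes, so it is not the library's implementation; the function names `split_srt` and `add_time_columns` and the sample rows are assumptions made for this example.

```python
def split_srt(rows):
    """Steps 1-2: keep SRT packets only, then split them into DATA and CONTROL."""
    srt = [r for r in rows if r["ws.protocol"] == "SRT"]
    data = [r for r in srt if r["srt.iscontrol"] == 0]
    control = [r for r in srt if r["srt.iscontrol"] == 1]
    return data, control


def add_time_columns(data_rows):
    """Steps 3-4: add 'ws.time.us' and 'ws.iat.us' to DATA packet rows."""
    result = []
    prev_us = None
    for row in data_rows:
        # Seconds -> microseconds, truncated to an integer.
        time_us = int(row["ws.time"] * 1_000_000)
        # Inter-arrival time; the first DATA packet gets 0.
        iat_us = 0 if prev_us is None else time_us - prev_us
        result.append({**row, "ws.time.us": time_us, "ws.iat.us": iat_us})
        prev_us = time_us
    return result


# Hypothetical sample rows.
rows = [
    {"ws.protocol": "UDP", "srt.iscontrol": None, "ws.time": 0.25},
    {"ws.protocol": "SRT", "srt.iscontrol": 1, "ws.time": 0.4},
    {"ws.protocol": "SRT", "srt.iscontrol": 0, "ws.time": 0.5},
    {"ws.protocol": "SRT", "srt.iscontrol": 0, "ws.time": 0.75},
    {"ws.protocol": "SRT", "srt.iscontrol": 0, "ws.time": 1.25},
]
data, control = split_srt(rows)
prepared = add_time_columns(data)
print([(r["ws.time.us"], r["ws.iat.us"]) for r in prepared])
```

Because the first DATA packet always gets an inter-arrival time of 0, an analysis of `ws.iat.us` would typically start from the second row, e.g. `prepared[1:]`.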
217 | 
218 | The detailed description of dataset variables, tcpdump/Wireshark dissectors and other data is provided in the table below. See columns `DATA` and `CONTROL` to determine whether a variable is present (`✓`) or absent (`-`) in a corresponding dataset.
219 | 
220 | | Dataset Variable  | Wireshark Dissector    | Description                                                              | DATA       | CONTROL   | Data Type  |
221 | |:------------------|:-----------------------|:-------------------------------------------------------------------------|:-----------|:----------|:-----------|
222 | | ws.no             | _ws.col.No.            | Number as registered by Wireshark                                        | ✓          | ✓         | int64      |
223 | | frame.time        | frame.time             | Absolute time when the frame was captured as registered by Wireshark     | ✓          | ✓         | datetime64 |
224 | | ws.time           | _ws.col.Time           | Relative timestamp as registered by Wireshark (seconds)                  | ✓          | ✓         | float64    |
225 | | ws.source         | _ws.col.Source         | Source IP address                                                        | ✓          | ✓         | category   |
226 | | ws.destination    | _ws.col.Destination    | Destination IP address                                                   | ✓          | ✓         | category   |
227 | | ws.protocol       | _ws.col.Protocol       | Protocol                                                                 | ✓          | ✓         | category   |
228 | | ws.length         | _ws.col.Length         | Length (bytes)                                                           | ✓          | ✓         | int16      |
229 | | ws.info           | _ws.col.Info           | Information                                                              | ✓          | ✓         | object     |
230 | | udp.length        | udp.length             | UDP packet size (bytes)                                                  | ✓          | ✓         | int16      |
231 | | srt.iscontrol     | srt.iscontrol          | Content type (CONTROL if 1, DATA if 0)                                   | ✓          | ✓         | int8       |
232 | | srt.type          | srt.type               | Message type (e.g. UMSG_ACK, UMSG_ACKACK)                                | -          | ✓         | category   |
233 | | srt.seqno         | srt.seqno              | Sequence number                                                          | ✓          | -         | int64      |
234 | | srt.msg.rexmit    | srt.msg.rexmit         | Sent as original if 0, retransmitted if 1                                | ✓          | -         | int8       |
235 | | srt.timestamp     | srt.timestamp          | Timestamp since the socket was opened (microseconds)                     | ✓          | ✓         | int64      |
236 | | srt.id            | srt.id                 | Destination socket id                                                    | ✓          | ✓         | category   |
237 | | srt.ack_seqno     | srt.ack_seqno          | First unacknowledged sequence number                                     | -          | ✓         | int64      |
238 | | srt.rtt           | srt.rtt                | Round Trip Time (RTT) estimate (microseconds)                            | -          | ✓         | int64      |
239 | | srt.rttvar        | srt.rttvar             | The variance of Round Trip Time (RTT) estimate (microseconds)            | -          | ✓         | int64      |
240 | | srt.rate          | srt.rate               | Receiving speed estimate (packets/s)                                     | -          | ✓         | int64      |
241 | | srt.bw            | srt.bw                 | Bandwidth estimate (packets/s)                                           | -          | ✓         | int64      |
242 | | srt.rcvrate       | srt.rcvrate            | Receiving speed estimate (bytes/s)                                       | -          | ✓         | int64      |
243 | | data.len          | data.len               | Payload size or 0 in case of control packets (bytes)                     | ✓          | -         | int16      |
244 | | ws.time.us        | -                      | Relative timestamp as registered by Wireshark (microseconds)             | ✓          | -         | int64      |
245 | | ws.iat.us         | -                      | Packet inter-arrival time (microseconds)                                 | ✓          | -         | int64      |
246 | 
247 | ### Probing DATA Packets
248 | 
249 | Probing DATA packets are extracted from the DATA packets dataset as follows:
250 | 1. Find all the packet pairs where the last (least significant) 4 bits of their sequence numbers (`srt.seqno`) are `0000` and `0001`. The order matters: the packet ending in `0000` must come first.
251 | 2. For each packet pair, check whether both packets are sent as `Original` (`srt.msg.rexmit` == 0), not `Retransmitted` (`srt.msg.rexmit` == 1).
252 | 3. For the remaining packet pairs, take the packet with sequence number ending with `0001` bits as a probing packet.
253 | 
254 | Here is an example of a packet pair where `Frame 25` corresponds to the probing packet:
255 | ```
256 | Frame 24: 1514 bytes on wire (12112 bits), 1500 bytes captured (12000 bits) on interface 0
257 | Ethernet II, Src: 12:34:56:78:9a:bc (12:34:56:78:9a:bc), Dst: Microsof_59:95:17 (00:0d:3a:59:95:17)
258 | Internet Protocol Version 4, Src: 51.144.160.127, Dst: 10.1.4.4
259 | User Datagram Protocol, Src Port: 60900, Dst Port: 4200
260 | SRT Protocol
261 |     0... .... .... .... .... .... .... .... = Content: DATA
262 |     .111 1111 1110 0111 0111 1000 0011 0000 = Sequence Number: 2145876016
263 |     11.. .... .... .... .... .... .... .... = Packet Boundary: PB_SOLO (3)
264 |     ..0. .... .... .... .... .... .... .... = In-Order Indicator: 0
265 |     ...0 0... .... .... .... .... .... .... = Encryption Status: Not encrypted (0)
266 |     .... .0.. .... .... .... .... .... .... = Sent as: Original
267 |     .... ..00 0000 0000 0000 0000 0001 0001 = Message Number: 17
268 |     Time Stamp: 449263 (0x0006daef)
269 |     Destination Socket ID: 0x1c9ff5e1
270 |     Data (1442 bytes)
271 | 
272 | ```
273 | 
274 | ```
275 | Frame 25: 1514 bytes on wire (12112 bits), 1500 bytes captured (12000 bits) on interface 0
276 | Ethernet II, Src: 12:34:56:78:9a:bc (12:34:56:78:9a:bc), Dst: Microsof_59:95:17 (00:0d:3a:59:95:17)
277 | Internet Protocol Version 4, Src: 51.144.160.127, Dst: 10.1.4.4
278 | User Datagram Protocol, Src Port: 60900, Dst Port: 4200
279 | SRT Protocol
280 |     0... .... .... .... .... .... .... .... = Content: DATA
281 |     .111 1111 1110 0111 0111 1000 0011 0001 = Sequence Number: 2145876017
282 |     11.. .... .... .... .... .... .... .... = Packet Boundary: PB_SOLO (3)
283 |     ..0. .... .... .... .... .... .... .... = In-Order Indicator: 0
284 |     ...0 0... .... .... .... .... .... .... = Encryption Status: Not encrypted (0)
285 |     .... .0.. .... .... .... .... .... .... = Sent as: Original
286 |     .... ..00 0000 0000 0000 0000 0001 0010 = Message Number: 18
287 |     Time Stamp: 449292 (0x0006db0c)
288 |     Destination Socket ID: 0x1c9ff5e1
289 |     Data (1442 bytes)
290 | ```
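
The selection rule can be sketched in plain Python as shown below. This is a minimal illustration, not the library's code: it assumes that a "pair" means two rows adjacent in the DATA packets dataset, and the function name `find_probing_packets` and the last two sample rows are made up for the example. The first two rows use the sequence numbers of `Frame 24` and `Frame 25` above.

```python
def find_probing_packets(data_packets):
    """Return the second packet of each original (0000, 0001) sequence pair."""
    probing = []
    # Walk over adjacent rows (assumption: pairs are neighbours in the dataset).
    for first, second in zip(data_packets, data_packets[1:]):
        # Step 1: the last 4 bits must be 0000 followed by 0001.
        is_pair = (
            (first["srt.seqno"] & 0xF) == 0x0
            and (second["srt.seqno"] & 0xF) == 0x1
        )
        # Step 2: both packets must be sent as Original.
        both_original = (
            first["srt.msg.rexmit"] == 0 and second["srt.msg.rexmit"] == 0
        )
        # Step 3: keep the packet whose sequence number ends with 0001.
        if is_pair and both_original:
            probing.append(second)
    return probing


data_packets = [
    {"srt.seqno": 2145876016, "srt.msg.rexmit": 0},  # Frame 24
    {"srt.seqno": 2145876017, "srt.msg.rexmit": 0},  # Frame 25
    {"srt.seqno": 2145876032, "srt.msg.rexmit": 1},  # hypothetical retransmission
    {"srt.seqno": 2145876033, "srt.msg.rexmit": 0},  # hypothetical
]
probing = find_probing_packets(data_packets)
print([p["srt.seqno"] for p in probing])
```

The second pair is rejected because its first packet is a retransmission, so only the packet from `Frame 25` is kept.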
291 | 
292 | ### UMSG_HANDSHAKE CONTROL Packets
293 | 
294 | UMSG_HANDSHAKE CONTROL packets are extracted from the CONTROL packets dataset using the following criterion: `srt.type == 0x00000000`.
295 | 
296 | ### UMSG_ACK CONTROL Packets
297 | 
298 | UMSG_ACK CONTROL packets are extracted from the CONTROL packets dataset as follows:
299 | 1. Find all the packets with `srt.type == 0x00000002`.
300 | 2. Drop rows with `NaN` values of `srt.rate`, `srt.bw`, and `srt.rcvrate` variables (so-called "light acknowledgements" or "light ACKs").
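
The two steps can be sketched in plain Python as follows. This is a minimal illustration with lists of dicts rather than the pandas-based code of the library; it assumes that `srt.type` is stored as a string such as `'0x00000002'` (as in the comparisons in `scripts/get_traffic_stats.py`), and the helper names and sample rows are made up for the example.

```python
import math

UMSG_ACK = "0x00000002"
ACK_FIELDS = ("srt.rate", "srt.bw", "srt.rcvrate")


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def extract_full_acks(control_packets):
    """Keep UMSG_ACK packets and drop light ACKs (missing rate/bw/rcvrate)."""
    # Step 1: keep UMSG_ACK packets only.
    acks = [p for p in control_packets if p["srt.type"] == UMSG_ACK]
    # Step 2: drop rows where any of the ACK fields is missing.
    return [
        p for p in acks
        if not any(is_missing(p.get(field)) for field in ACK_FIELDS)
    ]


nan = float("nan")
# Hypothetical CONTROL packets: a handshake, a full ACK, a light ACK, an ACKACK.
control_packets = [
    {"srt.type": "0x00000000"},
    {"srt.type": "0x00000002", "srt.rate": 1000, "srt.bw": 5000, "srt.rcvrate": 1400000},
    {"srt.type": "0x00000002", "srt.rate": nan, "srt.bw": nan, "srt.rcvrate": nan},
    {"srt.type": "0x00000006"},
]
full_acks = extract_full_acks(control_packets)
print(len(full_acks))
```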
301 | 
302 | 
303 | 
304 | [RETURN TO TOP](#lib-tcpdump-processing)
305 | 


--------------------------------------------------------------------------------
/img/extract_packets_report.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/img/extract_packets_report.png


--------------------------------------------------------------------------------
/img/get_traffic_stats_report_rcv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/img/get_traffic_stats_report_rcv.png


--------------------------------------------------------------------------------
/img/get_traffic_stats_report_snd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/img/get_traffic_stats_report_snd.png


--------------------------------------------------------------------------------
/scripts/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/scripts/__init__.py


--------------------------------------------------------------------------------
/scripts/dump_pkt_timestamps.py:
--------------------------------------------------------------------------------
 1 | """
 2 | The script dumps SRT timestamps (not Wireshark ws.time) of SRT data packets to a .csv file
 3 | to be used by srt-xtransmit application with the --playback-csv argument.
 4 | """
 5 | import pathlib
 6 | 
 7 | import click
 8 | 
 9 | from tcpdump_processing.convert import convert_to_csv
10 | from tcpdump_processing.extract_packets import extract_srt_packets, UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound
11 | 
12 | 
13 | class SRTDataIndex:
14 | 	def __init__(self, srt_packets):
15 | 		self.ctrl_pkts     = (srt_packets['srt.iscontrol'] == 1)
16 | 		self.data_pkts     = (srt_packets['srt.iscontrol'] == 0)
17 | 		self.data_pkts_org = self.data_pkts & (srt_packets['srt.msg.rexmit'] == 0)
18 | 
19 | 
20 | @click.command()
21 | @click.argument(
22 | 	'input', 
23 | 	type=click.Path(exists=True)
24 | )
25 | @click.argument(
26 | 	'output',
27 | 	type=click.Path(exists=False)
28 | )
29 | @click.option(
30 | 	'--overwrite/--no-overwrite',
31 | 	default=False,
32 | 	help=	'If exists, overwrite the .csv file produced out of the .pcap(ng) '
33 | 			'one at the previous iterations of running the script.',
34 | 	show_default=True
35 | )
36 | @click.option(
37 | 	'--port',
38 | 	help=	'Decode packets as SRT on a specified port. '
39 | 			'This option is helpful when there is no SRT handshake in .pcap(ng) file. '
40 | 			'Should be used together with --overwrite option.'
41 | )
42 | def main(input, output, overwrite, port):
43 | 	"""
44 | 	This script parses .pcap(ng) tcpdump trace file and outputs all original
45 | 	data packets' SRT timestamps (not Wireshark ws.time) into a .csv file.
46 | 
47 | 	INPUT is the .pcap(ng) file to use as an input.
48 | 
49 | 	OUTPUT is the output .csv file to be produced.
50 | 	"""
51 | 	pcap_filepath = pathlib.Path(input)
52 | 	if port is not None:
53 | 		csv_filepath = convert_to_csv(pcap_filepath, overwrite, True, port)
54 | 	else:
55 | 		csv_filepath = convert_to_csv(pcap_filepath, overwrite)
56 | 	
57 | 	try:
58 | 		srt_packets = extract_srt_packets(csv_filepath)
59 | 	except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error:
60 | 		print(f'{error}')
61 | 		return
62 | 		
63 | 	index = SRTDataIndex(srt_packets)
64 | 	df = srt_packets[index.data_pkts_org]
65 | 	(df['srt.timestamp'] / 1000000.0).to_csv(output, index=False, header=False)
66 | 	
67 | 	# TODO: Plotting the histogram of packets by 10 ms bins.
68 | 	# The code below is missing the end time in the arange() function.
69 | 	#x = np.arange(0, 27, 0.01, dtype = float)
70 | 	#fig, axis = plt.subplots(figsize =(10, 5))
71 | 	#axis.hist((df['srt.timestamp'] / 1000000.0), bins = x)
72 | 	#plt.show()
73 | 
74 | 	return
75 | 
76 | 
77 | if __name__ == '__main__':
78 | 	main()
79 | 


--------------------------------------------------------------------------------
/scripts/get_traffic_stats.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Script designed to process .pcap(ng) files and generate a report
  3 | with network traffic statistics.
  4 | """
  5 | import pathlib
  6 | 
  7 | import click
  8 | import pandas as pd
  9 | 
 10 | from tcpdump_processing.convert import convert_to_csv
 11 | from tcpdump_processing.extract_packets import extract_srt_packets, UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound
 12 | 
 13 | 
 14 | def to_percent(value, base):
 15 | 	return round(value / base * 100, 4)
 16 | 
 17 | 
 18 | def to_str(first, second):
 19 | 	return str(first) + '(' + str(second) + ')'
 20 | 
 21 | 
 22 | def to_rate(value, duration):
 23 | 	return round(value * 8 / duration / 1000000, 2)
 24 | 
 25 | 
 26 | class TrafficStatsIndex:
 27 | 	def __init__(self, srt_packets):
 28 | 		self.ctrl_pkts        = (srt_packets['srt.iscontrol'] == 1)
 29 | 		self.ctrl_pkts_ack    = (self.ctrl_pkts) & (srt_packets['srt.type'] == '0x00000002')
 30 | 		self.ctrl_pkts_ackack = (self.ctrl_pkts) & (srt_packets['srt.type'] == '0x00000006')
 31 | 		self.ctrl_pkts_nak    = (self.ctrl_pkts) & (srt_packets['srt.type'] == '0x00000003')
 32 | 
 33 | 		self.data_pkts     = (srt_packets['srt.iscontrol'] == 0)
 34 | 		self.data_pkts_org = (self.data_pkts) & (srt_packets['srt.msg.rexmit'] == 0)
 35 | 		self.data_pkts_rex = (self.data_pkts) & (srt_packets['srt.msg.rexmit'] == 1)
 36 | 
 37 | 
 38 | class TrafficStats:
 39 | 	def __init__(self, srt_packets):
 40 | 		self.srt_packets = srt_packets
 41 | 		self.index = TrafficStatsIndex(srt_packets)
 42 | 
 43 | 
 44 | 	def bytes_to_Mbps(self, bytes):
 45 | 		return bytes * 8 / self.duration / 1000000
 46 | 
 47 | 
 48 | 	@property
 49 | 	def data_pkts(self):
 50 | 		return self.srt_packets[self.index.data_pkts]
 51 | 
 52 | 
 53 | 	@property
 54 | 	def data_pkts_org(self):
 55 | 		return self.srt_packets[self.index.data_pkts_org]
 56 | 
 57 | 
 58 | 	@property
 59 | 	def data_pkts_rex(self):
 60 | 		return self.srt_packets[self.index.data_pkts_rex]
 61 | 
 62 | 
 63 | 	@property
 64 | 	def ctrl_pkts(self):
 65 | 		return self.srt_packets[self.index.ctrl_pkts]
 66 | 
 67 | 
 68 | 	@property
 69 | 	def duration(self):
 70 | 		# Calculate duration in seconds.
 71 | 		start = self.data_pkts.iloc[0]['ws.time']
 72 | 		stop  = self.data_pkts.iloc[-1]['ws.time']
 73 | 		return (stop - start)
 74 | 
 75 | 
 76 | 	def count_packets(self):
 77 | 		# Count the number of packets.
 78 | 		pkts          = len(self.srt_packets.index)
 79 | 		data_pkts     = self.index.data_pkts.sum()
 80 | 		data_pkts_org = self.index.data_pkts_org.sum()
 81 | 		data_pkts_rex = self.index.data_pkts_rex.sum()
 82 | 
 83 | 		ctrl_pkts        = self.index.ctrl_pkts.sum()
 84 | 		ctrl_pkts_ack    = self.index.ctrl_pkts_ack.sum()
 85 | 		ctrl_pkts_ackack = self.index.ctrl_pkts_ackack.sum()
 86 | 		ctrl_pkts_nak    = self.index.ctrl_pkts_nak.sum()
 87 | 
 88 | 		return {
 89 | 			'pkts': pkts,
 90 | 			'data_pkts': data_pkts,
 91 | 			'data_pkts_org': data_pkts_org,
 92 | 			'data_pkts_rex': data_pkts_rex,
 93 | 			'ctrl_pkts': ctrl_pkts,
 94 | 			'ctrl_pkts_ack': ctrl_pkts_ack,
 95 | 			'ctrl_pkts_ackack': ctrl_pkts_ackack,
 96 | 			'ctrl_pkts_nak': ctrl_pkts_nak,
 97 | 		}
 98 | 
 99 | 	
100 | 	def count_retransmissions(self):
101 | 		# Calculate how many packets were retransmitted once, twice, three times, etc.
102 | 		rexmit_pkts              = self.data_pkts_rex.copy()
103 | 		rexmit_pkts['srt.seqno'] = rexmit_pkts['srt.seqno'].astype('int32')
104 | 		rexmit_pkts['seqno']     = rexmit_pkts['srt.seqno']
105 | 		rexmits                  = rexmit_pkts.groupby(['srt.seqno'])['seqno'].count()
106 | 
107 | 		once    = rexmits[rexmits == 1].count()
108 | 		twice   = rexmits[rexmits == 2].count()
109 | 		x3      = rexmits[rexmits == 3].count()
110 | 		x4      = rexmits[rexmits == 4].count()
111 | 		x5_more = rexmits[rexmits > 4].count()
112 | 
113 | 		return {
114 | 			'once': once,
115 | 			'twice': twice,
116 | 			'x3': x3,
117 | 			'x4': x4,
118 | 			'x5_more': x5_more,
119 | 			'once_total': once,
120 | 			'twice_total': twice * 2,
121 | 			'x3_total': x3 * 3,
122 | 			'x4_total': x4 * 4,
123 | 			'x5_more_total': len(rexmit_pkts) - once - twice * 2 - x3 * 3 - x4 * 4,
124 | 		}
125 | 
126 | 
127 | 	def print_traffic(self):
128 | 		print(" Traffic ".center(70, "~"))
129 | 
130 | 		print("- SRT DATA pkts")
131 | 		print(f"  - SRT payload + SRT hdr + UDP hdr (orig+retrans)  {to_rate(self.data_pkts['udp.length'].sum(), self.duration):>13} Mbps")
132 | 		print(f"  - SRT payload + SRT hdr (orig+retrans)            {to_rate(self.data_pkts['data.len'].sum() + 16 * len(self.data_pkts), self.duration):>13} Mbps")
133 | 		print(f"  - SRT payload (orig+retrans)                      {to_rate(self.data_pkts['data.len'].sum(), self.duration):>13} Mbps")
134 | 		print(f"  - SRT payload + SRT hdr + UDP hdr (orig)          {to_rate(self.data_pkts_org['udp.length'].sum(), self.duration):>13} Mbps")
135 | 		print(f"  - SRT payload + SRT hdr (orig)                    {to_rate(self.data_pkts_org['data.len'].sum() + 16 * len(self.data_pkts_org), self.duration):>13} Mbps")
136 | 		print(f"  - SRT payload (orig)                              {to_rate(self.data_pkts_org['data.len'].sum(), self.duration):>13} Mbps")
137 | 
138 | 
139 | 	def print_notations(self):
140 | 		print(" Notations ".center(70, "~"))
141 | 		print("pkts - packets")
142 | 		print("hdr - header")
143 | 		print("orig - original")
144 | 		print("retrans - retransmitted")
145 | 		print("".center(70, "~"))
146 | 
147 | 
148 | 	def generate_snd_report(self):
149 | 		cnt = self.count_packets()
150 | 		rexmits_cnt = self.count_retransmissions()
151 | 
152 | 		# Calculate the number of original data packets missing from the dump,
153 | 		# i.e. those dropped either by the SRT sender or by the UDP socket.
154 | 		# Reordered packets are not taken into account: a packet that is reordered
155 | 		# and comes later is not counted as missing.
156 | 		seqnos_org = self.data_pkts_org['srt.seqno'].astype('int32')
157 | 		# Removing duplicates in sent original packets.
158 | 		seqnos_org = seqnos_org.drop_duplicates()
159 | 		data_pkts_org_missing_cnt = int((seqnos_org.diff().dropna() - 1).sum())
160 | 
161 | 		print(" SRT Packets ".center(70, "~"))
162 | 
163 | 		print(f"- SRT DATA+CONTROL pkts  {cnt['pkts']:>35}")
164 | 
165 | 		print(f"- SRT DATA pkts          {cnt['data_pkts']:>35}")
166 | 
167 | 		print(
168 | 			f"  - Original DATA pkts sent       {cnt['data_pkts_org']:>26}"
169 | 			f" {to_percent(cnt['data_pkts_org'], cnt['data_pkts']):>8}%"
170 | 			"  out of orig+retrans sent DATA pkts"
171 | 		)
172 | 
173 | 		print(
174 | 			f"  - Retransmitted DATA pkts sent  {cnt['data_pkts_rex']:>26}"
175 | 			f" {to_percent(cnt['data_pkts_rex'], cnt['data_pkts']):>8}%"
176 | 			"  out of orig+retrans sent DATA pkts"
177 | 		)
178 | 		print(f"      Once   {to_str(rexmits_cnt['once'], rexmits_cnt['once']):>47} {to_percent(rexmits_cnt['once'], cnt['data_pkts']):>8}%")
179 | 		print(f"      Twice  {to_str(rexmits_cnt['twice'], rexmits_cnt['twice_total']):>47} {to_percent(rexmits_cnt['twice_total'], cnt['data_pkts']):>8}%")
180 | 		print(f"      3×     {to_str(rexmits_cnt['x3'], rexmits_cnt['x3_total']):>47} {to_percent(rexmits_cnt['x3_total'], cnt['data_pkts']):>8}%")
181 | 		print(f"      4×     {to_str(rexmits_cnt['x4'], rexmits_cnt['x4_total']):>47} {to_percent(rexmits_cnt['x4_total'], cnt['data_pkts']):>8}%")
182 | 		print(f"      5+     {to_str(rexmits_cnt['x5_more'], rexmits_cnt['x5_more_total']):>47} {to_percent(rexmits_cnt['x5_more_total'], cnt['data_pkts']):>8}%")
183 | 
184 | 		print(
185 | 			f"- Original DATA pkts missing       {data_pkts_org_missing_cnt:>25}"
186 | 			f" {to_percent(data_pkts_org_missing_cnt, (cnt['data_pkts_org']+data_pkts_org_missing_cnt)):>8}%"
187 | 			"  out of orig sent+missing DATA pkts"
188 | 		)
189 | 
190 | 		print(f"- SRT CONTROL pkts     {cnt['ctrl_pkts']:>37}")
191 | 		print(f"  - ACK pkts received  {cnt['ctrl_pkts_ack']:>37}")
192 | 		print(f"  - ACKACK pkts sent   {cnt['ctrl_pkts_ackack']:>37}")
193 | 		print(f"  - NAK pkts received  {cnt['ctrl_pkts_nak']:>37}")
194 | 
195 | 		self.print_traffic()
196 | 
197 | 		print(" Overhead ".center(70, "~"))
198 | 
199 | 		print(f"- SRT DATA pkts")
200 | 		print(
201 | 			"  - UDP+SRT headers over SRT payload (orig)"
202 | 			f"{round(to_rate(self.data_pkts_org['udp.length'].sum(), self.duration) * 100 / to_rate(self.data_pkts_org['data.len'].sum(), self.duration) - 100, 2):>25} %"
203 | 		)
204 | 		print(
205 | 			"  - Retransmitted over original sent pkts"
206 | 			f"{to_percent(cnt['data_pkts_rex'], cnt['data_pkts_org']):>27} %"
207 | 		)
208 | 
209 | 		self.print_notations()
210 | 
211 | 
212 | 	def generate_rcv_report(self):
213 | 		cnt = self.count_packets()
214 | 		rexmits_cnt = self.count_retransmissions()
215 | 
216 | 		# Calculate the number of lost original data packets as the number
217 | 		# of original data packets that haven't reached the receiver.
218 | 		# Reordered packets are not taken into account: a packet that is reordered
219 | 		# and comes later is not counted as lost.
220 | 		seqnos_org = self.data_pkts_org['srt.seqno'].astype('int32')
221 | 		# Removing duplicates in received original packets.
222 | 		seqnos_org = seqnos_org.drop_duplicates()
223 | 		data_pkts_org_lost_cnt = int((seqnos_org.diff().dropna() - 1).sum())
224 | 		
225 | 		# The number of packets considered unrecovered at the receiver.
226 | 		# It means neither original, nor re-transmitted packet with
227 | 		# a particular sequence number has reached the destination.
228 | 		seqnos = self.data_pkts['srt.seqno'].astype('int32').copy()
229 | 		seqnos = seqnos.drop_duplicates().sort_values()
230 | 		data_pkts_unrecovered_cnt = int((seqnos.diff().dropna() - 1).sum())
231 | 		
232 | 		# The number of packets recovered at the receiver side.
233 | 		data_pkts_recovered_cnt = data_pkts_org_lost_cnt - data_pkts_unrecovered_cnt
234 | 
235 | 		# The number of original DATA packets (received + lost).
236 | 		data_pkts_org_rcvd_lost_cnt = cnt['data_pkts_org'] + data_pkts_org_lost_cnt
237 | 
238 | 		print(" SRT Packets ".center(70, "~"))
239 | 
240 | 		print(f"- SRT DATA+CONTROL pkts  {cnt['pkts']:>35}")
241 | 
242 | 		print(f"- SRT DATA pkts          {cnt['data_pkts']:>35}")
243 | 
244 | 		print(
245 | 			f"  - Original DATA pkts received       {cnt['data_pkts_org']:>22}"
246 | 			f" {to_percent(cnt['data_pkts_org'], cnt['data_pkts']):>8}%"
247 | 			"  out of orig+retrans received DATA pkts"
248 | 		)
249 | 
250 | 		print(
251 | 			f"  - Retransmitted DATA pkts received  {cnt['data_pkts_rex']:>22}"
252 | 			f" {to_percent(cnt['data_pkts_rex'], cnt['data_pkts']):>8}%"
253 | 			"  out of orig+retrans received DATA pkts"
254 | 		)
255 | 		print(f"      Once   {to_str(rexmits_cnt['once'], rexmits_cnt['once']):>47} {to_percent(rexmits_cnt['once'], cnt['data_pkts']):>8}%")
256 | 		print(f"      Twice  {to_str(rexmits_cnt['twice'], rexmits_cnt['twice_total']):>47} {to_percent(rexmits_cnt['twice_total'], cnt['data_pkts']):>8}%")
257 | 		print(f"      3×     {to_str(rexmits_cnt['x3'], rexmits_cnt['x3_total']):>47} {to_percent(rexmits_cnt['x3_total'], cnt['data_pkts']):>8}%")
258 | 		print(f"      4×     {to_str(rexmits_cnt['x4'], rexmits_cnt['x4_total']):>47} {to_percent(rexmits_cnt['x4_total'], cnt['data_pkts']):>8}%")
259 | 		print(f"      5+     {to_str(rexmits_cnt['x5_more'], rexmits_cnt['x5_more_total']):>47} {to_percent(rexmits_cnt['x5_more_total'], cnt['data_pkts']):>8}%")
260 | 
261 | 		# The percentage of original DATA packets lost is calculated out of
262 | 		# original DATA packets (received + lost) which equals sent unique
263 | 		# packets approximately.
264 | 		print(
265 | 			f"- Original DATA pkts lost       {data_pkts_org_lost_cnt:>28}"
266 | 			f" {to_percent(data_pkts_org_lost_cnt, data_pkts_org_rcvd_lost_cnt):>8}%"
267 | 			"  out of orig received+lost DATA pkts"
268 | 		)
269 | 		print(
270 | 			f"  - Recovered pkts  {data_pkts_recovered_cnt:>40}"
271 | 			f" {to_percent(data_pkts_recovered_cnt, data_pkts_org_rcvd_lost_cnt):>8}%"
272 | 		)
273 | 		print(
274 | 			f"  - Unrecovered pkts  {data_pkts_unrecovered_cnt:>38}"
275 | 			f" {to_percent(data_pkts_unrecovered_cnt, data_pkts_org_rcvd_lost_cnt):>8}%"
276 | 		)
277 | 
278 | 		print(f"- SRT CONTROL pkts                {cnt['ctrl_pkts']:>26}")
279 | 		print(f"  - ACK pkts sent                 {cnt['ctrl_pkts_ack']:>26}")
280 | 		print(f"  - ACKACK pkts received          {cnt['ctrl_pkts_ackack']:>26}")
281 | 		print(f"  - NAK pkts sent                 {cnt['ctrl_pkts_nak']:>26}")
282 | 
283 | 		self.print_traffic()
284 | 		
285 | 		print(" Overhead ".center(70, "~"))
286 | 
287 | 		print(f"- SRT DATA pkts")
288 | 		print(
289 | 			"  - UDP+SRT headers over SRT payload (orig)"
290 | 			f"{round(to_rate(self.data_pkts_org['udp.length'].sum(), self.duration) * 100 / to_rate(self.data_pkts_org['data.len'].sum(), self.duration) - 100, 2):>25} %"
291 | 		)
292 | 		print(
293 | 			"  - Retransmitted over original (received+lost) pkts"
294 | 			f"{to_percent(cnt['data_pkts_rex'], data_pkts_org_rcvd_lost_cnt):>16} %"
295 | 		)
296 | 
297 | 		self.print_notations()
298 | 
299 | 
300 | 	def show_unrecovered_packets(self, parent, stem):
301 | 		# Show the sequence numbers of packets unrecovered at the
302 | 		# receiver side and save them to a file.
303 | 
304 | 		# The number of packets considered unrecovered at the receiver.
305 | 		# It means neither original, nor re-transmitted packet with
306 | 		# a particular sequence number has reached the destination.
307 | 		seqnos = self.data_pkts['srt.seqno'].astype('int32').copy()
308 | 		seqnos = seqnos.drop_duplicates().sort_values()
309 | 
310 | 		# Get sequence numbers of unrecovered packets.
311 | 		df = pd.DataFrame(seqnos)
312 | 		df['diff'] = df['srt.seqno'].diff()
313 | 		df.dropna(inplace=True)
314 | 		df['diff'] = df['diff'].astype('int32') - 1
315 | 		df = df[df['diff'] != 0]
316 | 		df['start'] = df['srt.seqno'] - df['diff']
317 | 
318 | 		list_unrec = df[['diff', 'start']].values.tolist()
319 | 
320 | 		unrec_pkts_seqnos = []
321 | 		for sublist in list_unrec:
322 | 			diff, start = sublist
323 | 			for i in range(0, diff):
324 | 				unrec_pkts_seqnos.append(start + i)
325 | 
326 | 		unrec_pkts_seqnos = pd.Series(unrec_pkts_seqnos)
327 | 		path_unrec = parent / (stem + '-unrec-pkts-seqnos.csv')
328 | 		unrec_pkts_seqnos.to_csv(path_unrec)
329 | 		print(f'\nPackets unrecovered at the receiver side have the following sequence numbers. They are stored in the {path_unrec} file.')
330 | 		print(unrec_pkts_seqnos)
331 | 
332 | 
333 | @click.command()
334 | @click.argument(
335 | 	'path', 
336 | 	type=click.Path(exists=True)
337 | )
338 | @click.option(
339 | 	'--side',
340 | 	type=click.Choice(['snd', 'rcv'], case_sensitive=False),
341 | 	required=True,
342 | 	help='The side at which the .pcap(ng) file was collected.'
343 | )
344 | @click.option(
345 | 	'--overwrite/--no-overwrite',
346 | 	default=False,
347 | 	help=	'If the .csv file produced from the .pcap(ng) one by a previous '
348 | 			'run of the script exists, overwrite it.',
349 | 	show_default=True
350 | )
351 | @click.option(
352 | 	'--show-unrec-pkts/--no-show-unrec-pkts',
353 | 	default=False,
354 | 	help=	'Show sequence numbers of packets unrecovered at the receiver '
355 | 			'side. Save the list of sequence numbers into the respective .csv file.',
356 | 	show_default=True
357 | )
358 | @click.option(
359 | 	'--port',
360 | 	help=	'Decode packets as SRT on a specified port. '
361 | 			'This option is helpful when there is no SRT handshake in .pcap(ng) file. '
362 | 			'Should be used together with --overwrite option.'
363 | )
364 | def main(path, side, overwrite, show_unrec_pkts, port):
365 | 	"""
366 | 	Script designed to process .pcap(ng) files and generate a report
367 | 	with network traffic statistics.
368 | 	"""
369 | 	# Convert .pcap(ng) to .csv tcpdump trace file
370 | 	pcap_filepath = pathlib.Path(path)
371 | 	if port is not None:
372 | 		csv_filepath = convert_to_csv(pcap_filepath, overwrite, True, port)
373 | 	else:
374 | 		csv_filepath = convert_to_csv(pcap_filepath, overwrite)
375 | 
376 | 	# Extract SRT packets
377 | 	try:
378 | 		srt_packets = extract_srt_packets(csv_filepath)
379 | 	except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error:
380 | 		print(f'{error}')
381 | 		return
382 | 
383 | 	stats = TrafficStats(srt_packets)
384 | 
385 | 	if (side == 'snd'):
386 | 		stats.generate_snd_report()
387 | 		return
388 | 	
389 | 	if (side == 'rcv'):
390 | 		stats.generate_rcv_report()
391 | 		if (show_unrec_pkts):
392 | 			stats.show_unrecovered_packets(pathlib.Path(path).parent, pathlib.Path(path).stem)
393 | 
394 | 
395 | if __name__ == '__main__':
396 | 	main()
397 | 
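The missing, lost and unrecovered counters in the reports above all rely on the same idea: sort the unique sequence numbers and sum the gaps between neighbours. The following is a minimal standalone sketch of that technique, not part of the script itself. The helper names `count_seqno_gaps` and `list_missing_seqnos` and the sample data are hypothetical, and sequence number wrap-around is assumed not to occur within the capture.

```python
import pandas as pd


def count_seqno_gaps(seqnos: pd.Series) -> int:
    """Count sequence numbers absent between the first and last one seen."""
    unique_sorted = seqnos.astype('int64').drop_duplicates().sort_values()
    # A step of N between neighbours means N - 1 sequence numbers are absent.
    return int((unique_sorted.diff().dropna() - 1).sum())


def list_missing_seqnos(seqnos: pd.Series) -> list:
    """List the absent sequence numbers themselves, in ascending order."""
    unique_sorted = seqnos.astype('int64').drop_duplicates().sort_values()
    if unique_sorted.empty:
        return []
    present = set(unique_sorted.tolist())
    first, last = int(unique_sorted.iloc[0]), int(unique_sorted.iloc[-1])
    return [s for s in range(first, last + 1) if s not in present]


# Hypothetical capture: 12, 15 and 16 never show up; 14 is seen twice.
sample = pd.Series([10, 11, 14, 13, 14, 17])
print(count_seqno_gaps(sample))      # 3
print(list_missing_seqnos(sample))   # [12, 15, 16]
```

Packets lost before the first or after the last captured sequence number cannot be detected this way, because the gaps are measured only between packets that are present in the dump.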


--------------------------------------------------------------------------------
/scripts/plot_snd_timing.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Script designed to plot time delta between packet capture time (Wireshark) and
 3 | SRT packet timestamp.
 4 | """
 5 | import pathlib
 6 | 
 7 | import click
 8 | import pandas as pd
 9 | import matplotlib.pyplot as plt
10 | 
11 | from tcpdump_processing.convert import convert_to_csv
12 | from tcpdump_processing.extract_packets import extract_srt_packets, UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound
13 | 
14 | 
15 | pd.options.mode.chained_assignment = None  # default='warn'
16 | 
17 | 
18 | class SRTDataIndex:
19 | 	def __init__(self, srt_packets):
20 | 		self.ctrl_pkts		= (srt_packets['srt.iscontrol'] == 1)
21 | 		self.data_pkts		= (srt_packets['srt.iscontrol'] == 0)
22 | 		self.data_pkts_org	= self.data_pkts & (srt_packets['srt.msg.rexmit'] == 0)
23 | 		self.data_pkts_rxt	= self.data_pkts & (srt_packets['srt.msg.rexmit'] == 1)
24 | 
25 | 
26 | @click.command()
27 | @click.argument(
28 | 	'path', 
29 | 	type=click.Path(exists=True)
30 | )
31 | @click.option(
32 | 	'--overwrite/--no-overwrite',
33 | 	default=False,
34 | 	help=	'If the .csv file produced from the .pcap(ng) one by a previous '
35 | 			'run of the script exists, overwrite it.',
36 | 	show_default=True
37 | )
38 | @click.option(
39 | 	'--with-rexmits/--without-rexmits',
40 | 	default=False,
41 | 	help=	'Also show retransmitted data packets.',
42 | 	show_default=True
43 | )
44 | @click.option(
45 | 	'--port',
46 | 	help=	'Decode packets as SRT on a specified port. '
47 | 			'This option is helpful when there is no SRT handshake in .pcap(ng) file. '
48 | 			'Should be used together with --overwrite option.'
49 | )
50 | @click.option(
51 | 	'--latency',
52 | 	help=	'SRT latency, in milliseconds, to plot on a graph.'
53 | )
54 | def main(path, overwrite, with_rexmits, port, latency):
55 | 	"""
56 | 	This script parses .pcap(ng) tcpdump trace file captured at the sender side
57 | 	and plots the time delta between SRT packet timestamp (srt.timestamp) and
58 | 	packet time captured by Wireshark at the sender side (ws.time).
59 | 	This could be done for either SRT original DATA packets only, or both
60 | 	original and retransmitted DATA packets.
61 | 	"""
62 | 	pcap_filepath = pathlib.Path(path)
63 | 	if port is not None:
64 | 		csv_filepath = convert_to_csv(pcap_filepath, overwrite, True, port)
65 | 	else:
66 | 		csv_filepath = convert_to_csv(pcap_filepath, overwrite)
67 | 	
68 | 	try:
69 | 		srt_packets = extract_srt_packets(csv_filepath)
70 | 	except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error:
71 | 		print(f'{error}')
72 | 		return
73 | 	
74 | 	index = SRTDataIndex(srt_packets)
75 | 	df = srt_packets[index.data_pkts]
76 | 	df['delta'] = df['ws.time'] * 1000 - df['srt.timestamp'] / 1000
77 | 	# NOTE: The correction for the very first DATA packet is made by subtracting
78 | 	# its time delta from the whole column.
79 | 	df['delta'] = df['delta'] - df['delta'].iloc[0]
80 | 	org = df[df['srt.msg.rexmit'] == 0]
81 | 	rxt = df[df['srt.msg.rexmit'] == 1]
82 | 
83 | 	fig, ax = plt.subplots()
84 | 	org.plot(x = 'ws.time', xlabel = 'Time, s', y = 'delta', ylabel = 'Time Delta, ms', kind='scatter', label='Original Packets', ax=ax)
85 | 	if with_rexmits:
86 | 		rxt.plot(x = 'ws.time', xlabel = 'Time, s', y = 'delta', ylabel = 'Time Delta, ms', kind='scatter', color='r', label='Retransmitted Packets', ax=ax)
87 | 	if latency:
88 | 		plt.axhline(float(latency), color='g', label='SRT Latency')
89 | 	ax.legend()
90 | 	plt.show()
91 | 
92 | 
93 | if __name__ == '__main__':
94 | 	main()
95 | 
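The time delta plotted by this script combines two clocks with different units: `ws.time` is in seconds, while `srt.timestamp` is in microseconds. The sketch below isolates that calculation on a tiny hand-made dataframe. The function name `sender_time_delta_ms` and the sample values are hypothetical and only illustrate the unit conversion and the normalisation to the first packet.

```python
import pandas as pd


def sender_time_delta_ms(df: pd.DataFrame) -> pd.Series:
    """Capture time minus SRT timestamp, in ms, relative to the first packet."""
    # ws.time: seconds -> milliseconds; srt.timestamp: microseconds -> milliseconds.
    delta = df['ws.time'] * 1000 - df['srt.timestamp'] / 1000
    # Subtract the first packet's delta so that the series starts at zero.
    return delta - delta.iloc[0]


# Hypothetical packets: the third one leaves the sender 5 ms later than
# its SRT timestamp suggests, relative to the first packet.
packets = pd.DataFrame({
    'ws.time': [0.000, 0.010, 0.025],
    'srt.timestamp': [500_000, 510_000, 520_000],
})
deltas = sender_time_delta_ms(packets)
print(deltas.round(3).tolist())
```

A delta that grows over time suggests that packets are leaving the sender later and later relative to their SRT timestamps, which is why the script lets the SRT latency be drawn as a horizontal reference line.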


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup, find_packages
 2 | 
 3 | 
 4 | # Dependencies for using the library
 5 | install_requires = [
 6 |     'click >=7.0',
 7 |     'pandas >=2.0.1',
 8 |     'pathlib >=1.0.1',
 9 | ]
10 | 
11 | 
12 | setup(
13 |     name='lib-tcpdump-processing',
14 |     version='0.2',
15 |     author='Maria Sharabayko',
16 |     author_email='maria.bakholdina@gmail.com',
17 |     packages=find_packages(),
18 |     install_requires=install_requires,
19 |     entry_points={
20 |         'console_scripts': [
21 |             'extract-packets = tcpdump_processing.extract_packets:main',
22 |             'get-traffic-stats = scripts.get_traffic_stats:main',
23 |             'plot-snd-timing = scripts.plot_snd_timing:main',
24 |             'dump-pkt-timestamps = scripts.dump_pkt_timestamps:main'
25 |         ],
26 |     },
27 | )


--------------------------------------------------------------------------------
/tcpdump_processing/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/tcpdump_processing/__init__.py


--------------------------------------------------------------------------------
/tcpdump_processing/convert.py:
--------------------------------------------------------------------------------
  1 | """ Module designed to convert .pcap(ng) tcpdump trace file into .csv one. """
  2 | 
  3 | import pathlib
  4 | import subprocess
  5 | 
  6 | 
  7 | class IsNotPcapFile(Exception):
  8 | 	pass
  9 | 
 10 | 
 11 | class PcapProcessingFailed(Exception):
 12 | 	pass
 13 | 
 14 | 
 15 | class FileDoesNotExist(Exception):
 16 | 	pass
 17 | 
 18 | 
 19 | class DirectoryDoesNotExist(Exception):
 20 | 	pass
 21 | 
 22 | 
 23 | def convert_to_csv(
 24 | 	filepath: pathlib.Path,
 25 | 	overwrite: bool=False,
 26 | 	decode_as_srt: bool=False,
 27 | 	port: str=None
 28 | ) -> pathlib.Path:
 29 | 	""" 
 30 | 	Convert .pcap(ng) tcpdump trace file into .csv one. During conversion,
 31 | 	by default UDP packets are extracted. If `decode_as_srt` equals True,
 32 | 	packets are decoded as SRT on a particular port.
 33 | 
 34 | 	Attributes:
 35 | 		filepath: 
 36 | 			:class:`pathlib.Path` path to tcpdump trace file.
 37 | 		overwrite:
 38 | 			True if already existing .csv file should be overwritten.
 39 | 		decode_as_srt:
 40 | 			True if packets should be decoded as SRT packets
 41 | 			on a particular port.
 42 | 		port:
 43 | 			Port on which packets should be decoded as SRT if `decode_as_srt`
 44 | 			equals True.
 45 | 
 46 | 	Returns:
 47 | 		:class:`pathlib.Path` path to a result csv file.
 48 | 
 49 | 	Raises:
 50 | 		:exc:`FileDoesNotExist` 
 51 | 			if `filepath` file does not exist,
 52 | 		:exc:`IsNotPcapFile` 
 53 | 			if `filepath` does not correspond to .pcap(ng) file,
 54 | 		:exc:`PcapProcessingFailed` 
 55 | 			if processing the tcpdump trace file into a .csv file was not successful.
 56 | 	"""
 57 | 	if not filepath.exists():
 58 | 		raise FileDoesNotExist(filepath)
 59 | 
 60 | 	suffix = filepath.suffix
 61 | 	if not suffix.endswith('.pcapng'):
 62 | 		if not suffix.endswith('.pcap'):
 63 | 			raise IsNotPcapFile(
 64 | 				f'{filepath} does not correspond to .pcap(ng) file'
 65 | 			)
 66 | 
 67 | 	csv_filepath = filepath.parent / (filepath.stem + '.csv')
 68 | 	if csv_filepath.exists() and not overwrite:
 69 | 		print(
 70 | 			'Skipping .pcap(ng) tcpdump trace file processing to '
 71 | 			f'.csv, .csv file already exists: {csv_filepath}.'
 72 | 		)
 73 | 		return csv_filepath	
 74 | 
 75 | 	print(f'Processing .pcap(ng) tcpdump trace file to .csv: {filepath}')
 76 | 	args = [
 77 | 		'tshark',
 78 | 		'-r', str(filepath),
 79 | 		'--disable-protocol', 'udt',				# Disable UDT protocol, otherwise SRT packets will be treated as UDT ones
 80 | 	]
 81 | 
 82 | 	if decode_as_srt:
 83 | 		args += ['-d', f'udp.port=={port},srt']		# Decode UDP packets as SRT on a particular port
 84 | 	else:
 85 | 		args += ['-Y', 'udp',]						# Keep only UDP packets (display filter)
 86 | 
 87 | 	args += [
 88 | 		'-T', 'fields',
 89 | 		'-e', '_ws.col.No.',
 90 | 		'-e', 'frame.time',
 91 | 		'-e', '_ws.col.Time',
 92 | 		'-e', '_ws.col.Source',
 93 | 		'-e', '_ws.col.Destination',
 94 | 		'-e', '_ws.col.Protocol',
 95 | 		'-e', '_ws.col.Length',
 96 | 		'-e', '_ws.col.Info',
 97 | 		'-e', 'udp.length',
 98 | 		'-e', 'udp.srcport',
 99 | 		'-e', 'udp.dstport',
100 | 		'-e', 'srt.iscontrol',
101 | 		'-e', 'srt.type',
102 | 		'-e', 'srt.seqno',
103 | 		'-e', 'srt.msg.rexmit',
104 | 		'-e', 'srt.timestamp',
105 | 		'-e', 'srt.id',
106 | 		'-e', 'srt.ack_seqno',
107 | 		'-e', 'srt.rtt',
108 | 		'-e', 'srt.rttvar',
109 | 		'-e', 'srt.rate',
110 | 		'-e', 'srt.bw',
111 | 		'-e', 'srt.rcvrate',
112 | 		'-e', 'data.len',
113 | 		'-E', 'header=y',
114 | 		'-E', 'separator=;',
115 | 	]
116 | 
117 | 	with csv_filepath.open(mode='w') as f:
118 | 		process = subprocess.run(args, stdout=f)	
119 | 		if process.returncode != 0:
120 | 			raise PcapProcessingFailed(
121 | 				'Processing .pcap(ng) tcpdump trace file to .csv '
122 | 				f'has failed with the code: {process.returncode}'
123 | 			)
124 | 	print(f'Processing finished: {csv_filepath}')
125 | 	
126 | 	return csv_filepath
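The suffix check and the output path rule used by `convert_to_csv` can be looked at in isolation, without invoking tshark. The sketch below is a hypothetical helper, `csv_path_for`, which is not part of the module; it raises a plain `ValueError` instead of the module's own `IsNotPcapFile` so that it stays self-contained.

```python
import pathlib


def csv_path_for(pcap_path: pathlib.Path) -> pathlib.Path:
    """Return the .csv path that sits next to a .pcap(ng) trace file."""
    if pcap_path.suffix not in ('.pcap', '.pcapng'):
        raise ValueError(f'{pcap_path} does not correspond to .pcap(ng) file')
    # Same directory and stem as the input; only the suffix changes.
    return pcap_path.parent / (pcap_path.stem + '.csv')


print(csv_path_for(pathlib.Path('dumps/rcv-side.pcapng')))
```

Because the .csv file is derived from the stem alone, `trace.pcap` and `trace.pcapng` in the same directory map to the same `trace.csv`, which is one reason to pass `--overwrite` when switching between input files.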


--------------------------------------------------------------------------------
/tcpdump_processing/extract_packets.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Module designed to extract packets of interest out of the .pcap(ng)
  3 | tcpdump trace file. Currently only trace files with one data flow
  4 | are supported.
  5 | """
  6 | 
  7 | import enum
  8 | import pathlib
  9 | 
 10 | import click
 11 | import dateutil
 12 | import pandas as pd
 13 | 
 14 | import tcpdump_processing.convert as convert
 15 | 
 16 | 
 17 | class AutoName(enum.Enum):
 18 | 	def _generate_next_value_(name, start, count, last_values):
 19 | 		return name
 20 | 
 21 | 
 22 | @enum.unique
 23 | class PacketTypes(AutoName):
 24 | 	srt = enum.auto()
 25 | 	data = enum.auto()
 26 | 	control = enum.auto()
 27 | 	probing = enum.auto()
 28 | 	umsg_handshake = enum.auto()
 29 | 	umsg_ack = enum.auto()
 30 | 
 31 | 
 32 | PACKET_TYPES = [name for name, member in PacketTypes.__members__.items()]
 33 | 
 34 | 
 35 | class UnexpectedColumnsNumber(Exception):
 36 | 	pass
 37 | 
 38 | class EmptyCSV(Exception):
 39 | 	pass
 40 | 
 41 | class NoUDPPacketsFound(Exception):
 42 | 	pass
 43 | 
 44 | class NoSRTPacketsFound(Exception):
 45 | 	pass
 46 | 
 47 | 
 48 | def extract_srt_packets(filepath: pathlib.Path) -> pd.DataFrame:
 49 | 	""" 
 50 | 	Extract SRT packets (both DATA and CONTROL) from the .csv
 51 | 	tcpdump trace file.
 52 | 
 53 | 	Attributes:
 54 | 		filepath: 
 55 | 			:class:`pathlib.Path` path to the .csv tcpdump trace file.
 56 | 
 57 | 	Returns:
 58 | 		:class:`pd.DataFrame` dataframe with SRT packets. If no SRT packets
 59 | 		are found, one of the exceptions listed below is raised.
 60 | 
 61 | 	Raises:
 62 | 		:exc:`convert.FileDoesNotExist` 
 63 | 			if `filepath` file does not exist.
 64 | 		:exc:`UnexpectedColumnsNumber`
 65 | 			if the .csv file contains an unexpected number of columns.
 66 | 		:exc:`EmptyCSV`
 67 | 			if neither SRT nor UDP packets are present in the .csv file.
 68 | 		:exc:`NoUDPPacketsFound`
 69 | 			if there is no SRT handshake and there are no UDP packets found in the .csv file.
 70 | 		:exc:`NoSRTPacketsFound`
 71 | 			if there is no SRT handshake, but there are UDP packets found in the .csv file.
 72 | 			Those UDP packets could be further parsed as SRT ones.
 73 | 	"""
 74 | 	if not filepath.exists():
 75 | 		raise convert.FileDoesNotExist(filepath)
 76 | 
 77 | 	columns = [
 78 | 		'_ws.col.No.',
 79 | 		'frame.time',
 80 | 		'_ws.col.Time',
 81 | 		'_ws.col.Source',
 82 | 		'_ws.col.Destination',
 83 | 		'_ws.col.Protocol',
 84 | 		'_ws.col.Length',
 85 | 		'_ws.col.Info',
 86 | 		'udp.length',
 87 | 		'udp.srcport',
 88 | 		'udp.dstport',
 89 | 		'srt.iscontrol',
 90 | 		'srt.type',
 91 | 		'srt.seqno',
 92 | 		'srt.msg.rexmit',
 93 | 		'srt.timestamp',
 94 | 		'srt.id',
 95 | 		'srt.ack_seqno',
 96 | 		'srt.rtt',
 97 | 		'srt.rttvar',
 98 | 		'srt.rate',
 99 | 		'srt.bw',
100 | 		'srt.rcvrate',
101 | 		'data.len',
102 | 	]
103 | 
104 | 	types = [
105 | 		'int64',		# _ws.col.No. (ws.no)
106 | 		'object',		# frame.time
107 | 		'float64',		# _ws.col.Time (ws.time)
108 | 		'category',		# _ws.col.Source (ws.source)
109 | 		'category',		# _ws.col.Destination (ws.destination)
110 | 		'category',		# _ws.col.Protocol (ws.protocol)
111 | 		'int16',		# _ws.col.Length (ws.length)
112 | 		'object',		# _ws.col.Info (ws.info)
113 | 		'float32',		# udp.length
114 | 		'object',		# ws.srcport
115 | 		'object',		# ws.dstport
116 | 		'float32',		# srt.iscontrol
117 | 		'category',		# srt.type
118 | 		'float64',		# srt.seqno
119 | 		'float32',		# srt.msg.rexmit
120 | 		'float64',		# srt.timestamp
121 | 		'category',		# srt.id
122 | 		'float64',		# srt.ack_seqno
123 | 		'float64',		# srt.rtt
124 | 		'float64',		# srt.rttvar
125 | 		'float64',		# srt.rate
126 | 		'float64',		# srt.bw
127 | 		'float64',		# srt.rcvrate
128 | 		'float32'		# data.len
129 | 	]
130 | 
131 | 	columns_types = dict(zip(columns, types))
132 | 	packets = pd.read_csv(filepath, sep=';', dtype=columns_types)
133 | 
134 | 	if len(packets.columns) != len(columns):
135 | 		raise UnexpectedColumnsNumber(
136 | 			f'Unexpected columns number in .csv file: {filepath}. '
137 | 			'Try running the script with --overwrite option.'
138 | 		)
139 | 
140 | 	packets.columns = [
141 | 		'ws.no',
142 | 		'frame.time',
143 | 		'ws.time',
144 | 		'ws.source',
145 | 		'ws.destination',
146 | 		'ws.protocol',
147 | 		'ws.length',
148 | 		'ws.info',
149 | 		'udp.length',
150 | 		'udp.srcport',
151 | 		'udp.dstport',
152 | 		'srt.iscontrol',
153 | 		'srt.type',
154 | 		'srt.seqno',
155 | 		'srt.msg.rexmit',
156 | 		'srt.timestamp',
157 | 		'srt.id',
158 | 		'srt.ack_seqno',
159 | 		'srt.rtt',
160 | 		'srt.rttvar',
161 | 		'srt.rate',
162 | 		'srt.bw',
163 | 		'srt.rcvrate',
164 | 		'data.len'
165 | 	]
166 | 
167 | 	# The packets dataframe may contain both SRT and UDP packets, or may be empty
168 | 	if packets.empty:
169 | 		raise EmptyCSV(
170 | 			'Neither SRT, nor UDP packets are present in .csv file. '
171 | 			'Sounds like original .pcap(ng) file is empty or consists of non-UDP packets.'
172 | 		)
173 | 
174 | 	srt_packets = packets[packets['ws.protocol'] == 'SRT'].copy()
175 | 
176 | 	if srt_packets.empty:
177 | 		# With a high probability there is no SRT handshake present in the original .pcap(ng) file
178 | 		print(
179 | 			'No SRT packets found in .csv file. '
180 | 			'Sounds like there is no SRT handshake in the original .pcap(ng) file. '
181 | 			'Extracting UDP packets.'
182 | 		)
183 | 
184 | 		udp_packets = packets[packets['ws.protocol'] == 'UDP'].copy()
185 | 
186 | 		if udp_packets.empty:
187 | 			raise NoUDPPacketsFound(
188 | 				'No UDP packets found in .csv file. '
189 | 				'Sounds like there are no UDP packets present in the original .pcap(ng) file.'
190 | 			)
191 | 
192 | 		ports = udp_packets.groupby(['udp.srcport', 'udp.dstport'])['ws.no'].count()
193 | 
194 | 		raise NoSRTPacketsFound(
195 | 			f'There are UDP packets in .csv file on ports: \n{ports}\n'
196 | 			'Try to decode UDP packets as SRT ones by running the script with --overwrite and --port options.'
197 | 		)
198 | 
199 | 	# SRT packets found in .csv file.
200 | 	# TODO: Using ports 'udp.srcport', 'udp.dstport', check that there is only one stream inside
201 | 	
202 | 	# NOTE: When adding a combination "offset abbreviation <-> timezone", it's recommended
203 | 	# to add both standard and daylight savings time offsets for each timezone
204 | 	# (like CET and CEST for 'Europe/Berlin')
205 | 	# https://stackoverflow.com/questions/67061724/panda-to-datetime-raises-warning-tzname-cet-identified-but-not-understood
206 | 	tzmapping = {
207 | 		'CET':	dateutil.tz.gettz('Europe/Berlin'),
208 | 		'CEST':	dateutil.tz.gettz('Europe/Berlin')
209 | 	}
210 | 
211 | 	# NOTE: This is done to convert Windows time offsets into appropriate pandas format
212 | 	# https://learn.microsoft.com/en-us/windows-hardware/manufacture/desktop/default-time-zones?view=windows-11
213 | 	srt_packets['frame.time'] = srt_packets['frame.time'].str.replace('W. Europe Standard Time', 'CET')
214 | 	srt_packets['frame.time'] = srt_packets['frame.time'].str.replace('W. Europe Daylight Time', 'CEST')
215 | 
216 | 	srt_packets['frame.time'] = srt_packets['frame.time'].apply(dateutil.parser.parse, tzinfos=tzmapping)
217 | 	srt_packets['frame.time'] = srt_packets['frame.time'].dt.tz_convert('UTC')
218 | 
219 | 	srt_packets['srt.iscontrol'] = srt_packets['srt.iscontrol'].astype('int8')
220 | 	srt_packets['srt.timestamp'] = srt_packets['srt.timestamp'].astype('int64')
221 | 	srt_packets['udp.length'] = srt_packets['udp.length'].fillna(0).astype('int16')
222 | 	srt_packets['data.len'] = srt_packets['data.len'].fillna(0).astype('int16')
223 | 	
224 | 	return srt_packets
225 | 
226 | 
227 | def extract_data_packets(srt_packets: pd.DataFrame) -> pd.DataFrame:
228 | 	""" 
229 | 	Extract SRT DATA packets from SRT packets (both DATA and CONTROL)
230 | 	`srt_packets` dataframe. 
231 | 	
232 | 	Attributes:
233 | 		srt_packets: 
234 | 			:class:`pd.DataFrame` dataframe with SRT packets (both DATA and 
235 | 			CONTROL) obtained from the .csv tcpdump trace file using
236 | 			`extract_srt_packets` function.
237 | 
238 | 	Returns:
239 | 		:class:`pd.DataFrame` dataframe with SRT DATA packets or
240 | 		an empty dataframe if there are no DATA packets found.
241 | 	"""
242 | 	columns = [
243 | 		'ws.no',
244 | 		'frame.time',
245 | 		'ws.time',
246 | 		'ws.source',
247 | 		'ws.destination',
248 | 		'ws.protocol',
249 | 		'ws.length',
250 | 		'ws.info',
251 | 		'srt.iscontrol',
252 | 		'srt.seqno',
253 | 		'srt.msg.rexmit',
254 | 		'srt.timestamp',
255 | 		'srt.id',
256 | 		'data.len',
257 | 	]
258 | 	data = srt_packets.loc[srt_packets['srt.iscontrol'] == 0, columns]
259 | 	data['srt.seqno'] = data['srt.seqno'].astype('int64')
260 | 	data['srt.msg.rexmit'] = data['srt.msg.rexmit'].astype('int8')
261 | 	data['data.len'] = data['data.len'].astype('int16')
262 | 
263 | 	# Group data by source, destination and socket id
264 | 	# NOTE: There should be only one group under the assumption that tcpdump
265 | 	# trace file has been taken at the receiver side and there is only one 
266 | 	# data flow. For more complicated use cases, a proper data splitting 
267 | 	# should be implemented.
268 | 	data_grouped = data.groupby(['ws.source', 'ws.destination', 'srt.id'])
269 | 	
270 | 	# Return an empty dataframe if there are no DATA packets found
271 | 	if len(data_grouped) == 0:
272 | 		columns += [
273 | 			'ws.time.us',
274 | 			'ws.iat.us'
275 | 		]
276 | 		return pd.DataFrame(columns=columns)
277 | 
278 | 	# TODO: Implement
279 | 	# Return an empty dataframe if there is more than 1 data flow detected
280 | 	if len(data_grouped) > 1:
281 | 		print(
282 | 			'More than one data flow has been detected. '
283 | 			'This case is not supported. The groups found are listed below:'
284 | 		)
285 | 
286 | 		for name, group in data_grouped:
287 | 			print(name)
288 | 			print(group)
289 | 
290 | 		columns += [
291 | 			'ws.time.us',
292 | 			'ws.iat.us'
293 | 		]
294 | 		return pd.DataFrame(columns=columns)
295 | 
296 | 	assert(len(data_grouped) == 1)
297 | 
298 | 	# Calculate packet inter-arrival times
299 | 	# NOTE: Packet timestamp `ws.time` in tcpdump trace file is measured 
300 | 	# in seconds, time in SRT is measured in microseconds (us). 
301 | 	# That is why, first we multiply the timestamp by 1000000, then make 
302 | 	# a conversion from float to int as it is done in SRT, and only then 
303 | 	# calculate the inter-arrival times. The very first value will be NaN,
304 | 	# fillna() changes it to 0, otherwise astype() will fail. Finally,
305 | 	# we convert the type from float to int, because diff() returns float.
306 | 	# NOTE: In SRT protocol, the time delta for the first SRT data packet is
307 | 	# taken as the difference between time of this data packet and the
308 | 	# previous handshake one. Here we assume this value to be equal to 0
309 | 	# for simplicity.
310 | 	data['ws.time.us'] = (data['ws.time'] * 1000000).astype('int64')
311 | 	data['ws.iat.us'] = data['ws.time.us'].diff().fillna(0).astype('int64')
312 | 
313 | 	return data
314 | 
315 | 
316 | def extract_control_packets(srt_packets: pd.DataFrame) -> pd.DataFrame:
317 | 	"""
318 | 	Extract SRT CONTROL packets from SRT packets (both DATA and CONTROL)
319 | 	`srt_packets` dataframe. 
320 | 	
321 | 	Attributes:
322 | 		srt_packets: 
323 | 			:class:`pd.DataFrame` dataframe with SRT packets (both DATA and 
324 | 			CONTROL) obtained from the .csv tcpdump trace file using
325 | 			`extract_srt_packets` function.
326 | 
327 | 	Returns:
328 | 		:class:`pd.DataFrame` dataframe with SRT CONTROL packets or
329 | 		an empty dataframe if no CONTROL packets are found.
330 | 	"""
331 | 	columns = [
332 | 		'ws.no',
333 | 		'frame.time',
334 | 		'ws.time',
335 | 		'ws.source',
336 | 		'ws.destination',
337 | 		'ws.protocol',
338 | 		'ws.length',
339 | 		'ws.info',
340 | 		'srt.iscontrol',
341 | 		'srt.type',
342 | 		'srt.timestamp',
343 | 		'srt.id',
344 | 		'srt.ack_seqno',
345 | 		'srt.rtt',
346 | 		'srt.rttvar',
347 | 		'srt.rate',
348 | 		'srt.bw',
349 | 		'srt.rcvrate',
350 | 	]
351 | 	control = srt_packets.loc[srt_packets['srt.iscontrol'] == 1, columns]
352 | 
353 | 	return control
354 | 
355 | 
356 | def extract_probing_packets(srt_packets: pd.DataFrame) -> pd.DataFrame:
357 | 	""" 
358 | 	Extract SRT probing DATA packets from SRT packets (both DATA and CONTROL)
359 | 	`srt_packets` dataframe. 
360 | 	
361 | 	Attributes:
362 | 		srt_packets: 
363 | 			:class:`pd.DataFrame` dataframe with SRT packets (both DATA and 
364 | 			CONTROL) obtained from the .csv tcpdump trace file using
365 | 			`extract_srt_packets` function.
366 | 
367 | 	Returns:
368 | 		:class:`pd.DataFrame` dataframe with SRT probing DATA packets or
369 | 		an empty dataframe if no probing packets are found.
370 | 	"""
371 | 	data = extract_data_packets(srt_packets)
372 | 
373 | 	# Apply bitwise AND to the SRT data packet sequence number and 15 (0b1111) to check
374 | 	# the last (least significant) 4 bits of the sequence number (whether they are 0000=0 or 0001=1).
375 | 	# 0001=1 corresponds to the probing packet.
376 | 	data['seqno'] = data['srt.seqno'] & 15
377 | 	# Shift seqno column by 1 in order to get the current and previous values nearby.
378 | 	# We are looking for pairs: probing packet (0001=1) and previous packet (0000=0).
379 | 	# The order is important. Fill the first NA value with 1 in order to exclude this
380 | 	# row for sure.
381 | 	data['seqno_shifted'] = data['seqno'].shift().fillna(1).astype('int64')
382 | 	# We are only interested in probing packets whose packet pair consists
383 | 	# of original packets only. There should be no retransmitted packets.
384 | 	# Shift srt.msg.rexmit column by 1 to get the current and previous values of rexmit
385 | 	# flag (0 - original packet, 1 - retransmitted packet) nearby. Fill the first
386 | 	# NA value with 1 in order to exclude this row for sure.
387 | 	data['rexmit_shifted'] = data['srt.msg.rexmit'].shift().fillna(1).astype('int8')
388 | 
389 | 	probing_packets = data[
390 | 		(data['seqno'] == 1) & 
391 | 		(data['seqno_shifted'] == 0) & 
392 | 		(data['srt.msg.rexmit'] == 0) & 
393 | 		(data['rexmit_shifted'] == 0)
394 | 	]
395 | 	
396 | 	columns = [
397 | 		'ws.no',
398 | 		'frame.time',
399 | 		'ws.time',
400 | 		'ws.source',
401 | 		'ws.destination',
402 | 		'ws.protocol',
403 | 		'ws.length',
404 | 		'ws.info',
405 | 		'srt.iscontrol',
406 | 		'srt.seqno',
407 | 		'srt.msg.rexmit',
408 | 		'srt.timestamp',
409 | 		'srt.id',
410 | 		'data.len',
411 | 		'ws.time.us',
412 | 		'ws.iat.us',
413 | 	]
414 | 	probing_packets = probing_packets[columns]
415 | 
416 | 	return probing_packets
417 | 
418 | 
419 | def extract_umsg_handshake_packets(srt_packets: pd.DataFrame) -> pd.DataFrame:
420 | 	"""
421 | 	Extract SRT UMSG_HANDSHAKE CONTROL packets from SRT packets
422 | 	(both DATA and CONTROL) `srt_packets` dataframe. 
423 | 	
424 | 	Attributes:
425 | 		srt_packets: 
426 | 			:class:`pd.DataFrame` dataframe with SRT packets (both DATA and 
427 | 			CONTROL) obtained from the .csv tcpdump trace file using
428 | 			`extract_srt_packets` function.
429 | 
430 | 	Returns:
431 | 		:class:`pd.DataFrame` dataframe with SRT UMSG_HANDSHAKE CONTROL
432 | 		packets or an empty dataframe if no UMSG_HANDSHAKE
433 | 		packets are found.
434 | 	"""
435 | 	columns = [
436 | 		'ws.no',
437 | 		'frame.time',
438 | 		'ws.time',
439 | 		'ws.source',
440 | 		'ws.destination',
441 | 		'ws.protocol',
442 | 		'ws.length',
443 | 		'ws.info',
444 | 		'srt.iscontrol',
445 | 		'srt.type',
446 | 		'srt.timestamp',
447 | 		'srt.id',
448 | 	]
449 | 	control = extract_control_packets(srt_packets)
450 | 	umsg_handshake = control.loc[control['srt.type'] == '0x00000000', columns]
451 | 
452 | 	return umsg_handshake
453 | 
454 | 
455 | def extract_umsg_ack_packets(srt_packets: pd.DataFrame) -> pd.DataFrame:
456 | 	""" 
457 | 	Extract SRT UMSG_ACK CONTROL packets from SRT packets (both DATA and CONTROL)
458 | 	`srt_packets` dataframe. 
459 | 	
460 | 	Attributes:
461 | 		srt_packets: 
462 | 			:class:`pd.DataFrame` dataframe with SRT packets (both DATA and 
463 | 			CONTROL) obtained from the .csv tcpdump trace file using
464 | 			`extract_srt_packets` function.
465 | 
466 | 	Returns:
467 | 		:class:`pd.DataFrame` dataframe with SRT UMSG_ACK CONTROL packets or
468 | 		an empty dataframe if no UMSG_ACK packets are found.
469 | 	"""
470 | 	control = extract_control_packets(srt_packets)
471 | 
472 | 	# Group data by source, destination, socket id and packet type
473 | 	grouped = control.groupby(['ws.source', 'ws.destination', 'srt.id', 'srt.type'])
474 | 	# Find the group with packet type = UMSG_ACK ('0x00000002')
475 | 	# NOTE: There should be only one group under the assumption that tcpdump
476 | 	# trace file has been taken at the receiver side and there is only one 
477 | 	# data flow. For more complicated use cases, a proper data splitting 
478 | 	# should be implemented.
479 | 	names = [name for name, _ in grouped if name[-1] == '0x00000002']
480 | 
481 | 	# Return an empty dataframe if no UMSG_ACK packets are found
482 | 	if len(names) == 0:
483 | 		return pd.DataFrame(columns=control.columns)
484 | 
485 | 	# TODO: Implement support for multiple data flows
486 | 	# Return an empty dataframe if more than one data flow is detected
487 | 	if len(names) > 1:
488 | 		print(
489 | 			'More than one data flow has been detected. '
490 | 			'This case is not supported. The groups found are listed below:'
491 | 		)
492 | 
493 | 		for name in names:
494 | 			print(name)
495 | 
496 | 		return pd.DataFrame(columns=control.columns)
497 |  
498 | 	assert(len(names) == 1)
499 | 
500 | 	umsg_ack = grouped.get_group(names[0])
501 | 
502 | 	# Drop rows with NaN values in the srt.rate, srt.bw, or srt.rcvrate columns
503 | 	# (so-called light acknowledgements)
504 | 	umsg_ack = umsg_ack.dropna(subset=['srt.rate', 'srt.bw', 'srt.rcvrate'], how='any')
505 | 
506 | 	# Convert types
507 | 	umsg_ack['srt.ack_seqno'] = umsg_ack['srt.ack_seqno'].astype('int64')
508 | 	umsg_ack['srt.rtt'] = umsg_ack['srt.rtt'].astype('int64')
509 | 	umsg_ack['srt.rttvar'] = umsg_ack['srt.rttvar'].astype('int64')
510 | 	umsg_ack['srt.rate'] = umsg_ack['srt.rate'].astype('int64')
511 | 	umsg_ack['srt.bw'] = umsg_ack['srt.bw'].astype('int64')
512 | 	umsg_ack['srt.rcvrate'] = umsg_ack['srt.rcvrate'].astype('int64')
513 | 
514 | 	return umsg_ack
515 | 
516 | 
517 | @click.command()
518 | @click.argument(
519 | 	'path', 
520 | 	type=click.Path(exists=True)
521 | )
522 | @click.option(
523 | 	'--type',
524 | 	type=click.Choice(PACKET_TYPES),
525 | 	default=PacketTypes.probing.value,
526 | 	help=	'Packet type to extract: '
527 | 			'SRT (both DATA and CONTROL), SRT DATA, SRT CONTROL, '
528 | 			'SRT DATA probing, SRT CONTROL UMSG_HANDSHAKE, '
529 | 			'or SRT CONTROL UMSG_ACK packets.',
530 | 	show_default=True
531 | )
532 | @click.option(
533 | 	'--overwrite/--no-overwrite',
534 | 	default=False,
535 | 	help=	'Overwrite the .csv file, if it exists, produced from the .pcap(ng) '
536 | 			'file by a previous run of the script.',
537 | 	show_default=True
538 | )
539 | @click.option(
540 | 	'--save/--no-save',
541 | 	default=False,
542 | 	help='Save dataframe with extracted packets into .csv file.',
543 | 	show_default=True
544 | )
545 | @click.option(
546 | 	'--port',
547 | 	help=	'Decode packets as SRT on a specified port. '
548 | 			'This option is helpful when there is no SRT handshake in .pcap(ng) file. '
549 | 			'Should be used together with --overwrite option.'
550 | )
551 | def main(path, type, overwrite, save, port):
552 | 	"""
553 | 	This script parses a .pcap(ng) tcpdump trace file,
554 | 	saves the output in .csv format next to the original file, extracts the
555 | 	packets of interest, and optionally saves the resulting dataframe in
556 | 	.csv format next to the original file.
557 | 	"""
558 | 	# Convert .pcap(ng) to .csv tcpdump trace file
559 | 	pcap_filepath = pathlib.Path(path)
560 | 	if port is not None:
561 | 		csv_filepath = convert.convert_to_csv(pcap_filepath, overwrite, True, port)
562 | 	else:
563 | 		csv_filepath = convert.convert_to_csv(pcap_filepath, overwrite)
564 | 
565 | 	# Extract packets of interest
566 | 	try:
567 | 		srt_packets = extract_srt_packets(csv_filepath)
568 | 	except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error:
569 | 		print(error)
570 | 		return
571 | 
572 | 	if type == PacketTypes.srt.value:
573 | 		packets = srt_packets
574 | 	elif type == PacketTypes.data.value:
575 | 		packets = extract_data_packets(srt_packets)
576 | 	elif type == PacketTypes.control.value:
577 | 		packets = extract_control_packets(srt_packets)
578 | 	elif type == PacketTypes.probing.value:
579 | 		packets = extract_probing_packets(srt_packets)
580 | 	elif type == PacketTypes.umsg_handshake.value:
581 | 		packets = extract_umsg_handshake_packets(srt_packets)
582 | 	elif type == PacketTypes.umsg_ack.value:
583 | 		packets = extract_umsg_ack_packets(srt_packets)
584 | 
585 | 	# Print the first 20 rows of the dataframe with extracted packets
586 | 	print('The result dataframe is the following:')
587 | 	print(packets.head(20))
588 | 
589 | 	# Save extracted packets to .csv
590 | 	if save:
591 | 		print('Writing to .csv file ...')
592 | 		name = csv_filepath.stem  # safe for file names that contain extra dots
593 | 		packets.to_csv(csv_filepath.parent / f'{name}-{type}.csv', sep=';')
594 | 
595 | 
596 | if __name__ == '__main__':
597 | 	main()
598 | 


--------------------------------------------------------------------------------