├── .gitignore ├── README.md ├── img ├── extract_packets_report.png ├── get_traffic_stats_report_rcv.png └── get_traffic_stats_report_snd.png ├── scripts ├── __init__.py ├── dump_pkt_timestamps.py ├── get_traffic_stats.py └── plot_snd_timing.py ├── setup.py └── tcpdump_processing ├── __init__.py ├── convert.py └── extract_packets.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | 
# Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # lib-tcpdump-processing 2 | 3 | **`lib-tcpdump-processing`** is a library designed to process `.pcap(ng)` [tcpdump](https://www.tcpdump.org/) or [Wireshark](https://www.wireshark.org/) trace files and extract [SRT](https://github.com/Haivision/srt) packets of interest for further analysis. 4 | 5 | It also helps to process a network trace file and generate a report with SRT related statistics. In particular, the following statistics are calculated at the receiver side: 6 | - the number of SRT DATA and CONTROL packets present in a dump, 7 | - original DATA packets received, lost, recovered, and unrecovered, 8 | - retransmitted DATA packets received and the amount of packets retransmitted once, twice, 3 times, or more, 9 | - The information about CONTROL (ACK, ACKACK, and NAK) packets, 10 | - and other relevant information. 11 | 12 | See the [`get-traffic-stats`](#get-traffic-stats) script for details and report examples. 13 | 14 | **Important:** Currently, trace files containing only one flow of data are supported. To support several data flows adjustments will be required. 15 | 16 | ## 1. Getting Started 17 | 18 | ### Requirements 19 | 20 | * python 3.6+ 21 | * tshark 3.2.2+ (setting up tshark is described [here](https://github.com/mbakholdina/srt-test-runner#setting-up-tshark) and in the SRT CookBook [here](https://srtlab.github.io/srt-cookbook/how-to-articles/how-to-setup-wireshark-for-srt-traffic-analysis/)) 22 | 23 | ### Install the library with pip 24 | 25 | For development, it is recommended to: 26 | * use `venv` for virtual environments and `pip` for installing the library and any dependencies. 
This ensures the code and dependencies are isolated from the system Python installation, 27 | * install the library in “editable” mode by running `pip install -e .` from the same directory. This allows changing the source code (both tests and library) and rerunning tests against library code at will. For regular installation, use `pip install .`. 28 | 29 | As soon as the library is installed, you can run modules directly: 30 | ``` 31 | venv/bin/python -m tcpdump_processing.extract_packets --help 32 | ``` 33 | 34 | or use preinstalled executable scripts: 35 | ``` 36 | venv/bin/extract-packets --help 37 | ``` 38 | 39 | ### Install the library to import in another project 40 | 41 | Install with `pip` (a `venv` is recommended), using the `pip` VCS requirement specifier: 42 | ``` 43 | pip install 'git+https://github.com/mbakholdina/lib-tcpdump-processing.git@v0.1#egg=tcpdump_processing' 44 | ``` 45 | 46 | or simply put the following row in `requirements.txt`: 47 | ``` 48 | git+https://github.com/mbakholdina/lib-tcpdump-processing.git@v0.1#egg=tcpdump_processing 49 | ``` 50 | 51 | Remember to quote the full URL to avoid shell expansion in the case of direct installation. 52 | 53 | This installs the version corresponding to the git tag 'v0.1'. You can replace that with a branch name, a commit hash, or a git ref as necessary. See the [pip documentation](https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support) for details. 54 | 55 | To install the latest master, use: 56 | ``` 57 | git+https://github.com/mbakholdina/lib-tcpdump-processing.git@master#egg=tcpdump_processing 58 | ``` 59 | 60 | As soon as the library is installed, you can import the whole library: 61 | ``` 62 | import tcpdump_processing 63 | ``` 64 | 65 | or a particular module: 66 | ``` 67 | import tcpdump_processing.extract_packets as extract_packets 68 | ``` 69 | 70 | ## 2. 
Executable Scripts 71 | 72 | To use the following scripts, please install the library first (see the [Install the library with pip](#install-the-library-with-pip) section). 73 | 74 | ### `extract-packets` 75 | 76 | This script parses a `.pcap(ng)` trace file, saves the output in `.csv` format in the same directory as the original file, extracts packets of interest, and optionally (with `--save`) saves the resulting dataframe in `.csv` format in the same directory. 77 | 78 | Usage: 79 | ``` 80 | venv/bin/extract-packets [OPTIONS] PATH 81 | ``` 82 | where `PATH` refers to the `.pcap(ng)` file. 83 | 84 | Options: 85 | ``` 86 | Options: 87 | --type [srt|data|control|probing|umsg_handshake|umsg_ack] 88 | Packet type to extract: SRT (both DATA and 89 | CONTROL), SRT DATA, SRT CONTROL, SRT DATA 90 | probing, SRT CONTROL UMSG_HANDSHAKE, or SRT 91 | CONTROL UMSG_ACK packets. [default: 92 | probing] 93 | --overwrite / --no-overwrite If exists, overwrite the .csv file produced 94 | out of the .pcap(ng) one at the previous 95 | iterations of running the script. [default: 96 | no-overwrite] 97 | --save / --no-save Save dataframe with extracted packets into 98 | .csv file. [default: no-save] 99 | --port TEXT Decode packets as SRT on a specified port. 100 | This option is helpful when there is no SRT 101 | handshake in .pcap(ng) file. Should be used 102 | together with --overwrite option. 103 | --help Show this message and exit. 104 | ``` 105 | 106 | Here is an example of the report generated when extracting `--type srt` packets: 107 | 108 | ![extract_packets_report](img/extract_packets_report.png) 109 | 110 | ### `get-traffic-stats` 111 | 112 | This script processes a network `.pcap(ng)` trace file and generates a report with SRT-related statistics. Intermediate data is stored in `.csv` format in the same directory as the original file. Both sender and receiver side dumps are supported. 
113 | 114 | Usage: 115 | ``` 116 | venv/bin/get-traffic-stats [OPTIONS] PATH 117 | ``` 118 | where `PATH` refers to `.pcap(ng)` file. 119 | 120 | Options: 121 | ``` 122 | Options: 123 | --side [snd|rcv] The side .pcap(ng) file was collected at. 124 | [required] 125 | --overwrite / --no-overwrite If exists, overwrite the .csv file produced 126 | out of the .pcap(ng) one at the previous 127 | iterations of running the script. [default: 128 | no-overwrite] 129 | --show-unrec-pkts / --no-show-unrec-pkts 130 | Show sequence numbers of unrecovered at the 131 | receiver side packets. Save the list of 132 | sequence numbers into respective .csv file. 133 | [default: no-show-unrec-pkts] 134 | --port TEXT Decode packets as SRT on a specified port. 135 | This option is helpful when there is no SRT 136 | handshake in .pcap(ng) file. Should be used 137 | together with --overwrite option. 138 | --help Show this message and exit. 139 | ``` 140 | 141 | Here is an example of the report generated at the sender side: 142 | 143 | ![get_traffic_stats_report_snd](img/get_traffic_stats_report_snd.png) 144 | 145 | Here is an example of the report generated at the receiver side: 146 | 147 | ![get_traffic_stats_report_rcv](img/get_traffic_stats_report_rcv.png) 148 | 149 | ### `plot-snd-timing` 150 | 151 | This script parses `.pcap(ng)` tcpdump trace file captured at the sender side 152 | and plots the time delta between SRT packet timestamp (`srt.timestamp`) and 153 | packet time captured by Wireshark at the sender side (`ws.time`). 154 | This could be done for either SRT original DATA packets only, or both 155 | original and retransmitted DATA packets. 156 | 157 | Usage: 158 | ``` 159 | venv/bin/plot-snd-timing [OPTIONS] PATH 160 | ``` 161 | where `PATH` refers to `.pcap(ng)` file. 162 | 163 | Options: 164 | ``` 165 | Options: 166 | --overwrite / --no-overwrite If exists, overwrite the .csv file produced 167 | out of the .pcap(ng) one at the previous 168 | iterations of running the script. 
[default: 169 | no-overwrite] 170 | --with-rexmits / --without-rexmits 171 | Also show retransmitted data packets. 172 | [default: without-rexmits] 173 | --port TEXT Decode packets as SRT on a specified port. 174 | This option is helpful when there is no SRT 175 | handshake in .pcap(ng) file. Should be used 176 | together with --overwrite option. 177 | --latency TEXT SRT latency, in milliseconds, to plot on a 178 | graph. 179 | --help Show this message and exit. 180 | ``` 181 | 182 | ### `dump-pkt-timestamps` 183 | 184 | The script dumps SRT timestamps (not Wireshark `ws.time`) of SRT data packets to a `.csv` file 185 | to be used by [srt-xtransmit](https://github.com/maxsharabayko/srt-xtransmit) application with the `--playback-csv` argument. 186 | 187 | Usage: 188 | ``` 189 | dump-pkt-timestamps [OPTIONS] INPUT OUTPUT 190 | ``` 191 | where `INPUT` is the `.pcap(ng)` file to use as an input, `OUTPUT` is the output `.csv` file to be produced. 192 | 193 | Options: 194 | ``` 195 | Options: 196 | --overwrite / --no-overwrite If exists, overwrite the .csv file produced 197 | out of the .pcap(ng) one at the previous 198 | iterations of running the script. [default: 199 | no-overwrite] 200 | --port TEXT Decode packets as SRT on a specified port. 201 | This option is helpful when there is no SRT 202 | handshake in .pcap(ng) file. Should be used 203 | together with --overwrite option. 204 | --help Show this message and exit. 205 | ``` 206 | 207 | ## 3. Data Preparation 208 | 209 | A `.pcap(ng)` trace file with measurements from a specific network interface and port collected at the sender/receiver side is used as a proxy for packet data collected by SRT. This trace file is preprocessed in `.csv` format with timestamp, source IP address, destination IP address, protocol, and other columns and rows representing observations (sent/received packets). 
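The cleaning steps listed in this section can be sketched with pandas roughly as follows. This is a minimal illustration, not the library's actual implementation: the helper name `prepare_data_packets` and the sample values are hypothetical, while the column names and the `(ws.time * 1000000).astype('int64')` conversion are taken from the description in this section.

```python
import pandas as pd


def prepare_data_packets(packets: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return SRT DATA packets with microsecond
    timestamps (ws.time.us) and inter-arrival times (ws.iat.us)."""
    # Keep SRT packets only.
    srt = packets[packets['ws.protocol'] == 'SRT']
    # Keep DATA packets (srt.iscontrol == 0); CONTROL packets have 1.
    data = srt[srt['srt.iscontrol'] == 0].copy()
    # Convert the relative Wireshark timestamp from seconds to microseconds.
    data['ws.time.us'] = (data['ws.time'] * 1000000).astype('int64')
    # Inter-arrival time: difference between the current and the previous
    # packet timestamps. The first packet has no predecessor and gets 0.
    data['ws.iat.us'] = data['ws.time.us'].diff().fillna(0).astype('int64')
    return data


# Made-up sample rows, only to show the shape of the input.
packets = pd.DataFrame({
    'ws.time': [0.0, 0.5, 1.0, 1.25],
    'ws.protocol': ['SRT', 'UDP', 'SRT', 'SRT'],
    'srt.iscontrol': [0, 0, 1, 0],
})
data = prepare_data_packets(packets)
print(data[['ws.time', 'ws.time.us', 'ws.iat.us']])
```

Because the first DATA packet always gets an inter-arrival time of 0 in this sketch, it is a natural candidate to drop before analysing `ws.iat.us`.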
 210 | 211 | This data is further cleaned and transformed using [pandas](https://pandas.pydata.org/) in the following way: 212 | 1. The data is filtered to keep SRT packets only (`ws.protocol == SRT`), since only these are relevant for further analysis. 213 | 2. The dataset is then split into DATA (`srt.iscontrol == 0`) and CONTROL (`srt.iscontrol == 1`) packets. 214 | 3. For DATA packets, timestamps are converted from seconds to microseconds using the same procedure as in the protocol. To be precise, the new variable `ws.time.us` is obtained as `(ws.time * 1000000).astype('int64')`. 215 | 4. For DATA packets, the inter-arrival time is calculated as the difference between the current and previous packet timestamps and stored as a separate variable `ws.iat.us`. Note that the time delta for the first SRT DATA packet is set to 0 by default, so this packet should probably be excluded from the analysis. 216 | 5. Type conversion is performed so that each variable has the data type listed in the table below. 217 | 218 | A detailed description of dataset variables, tcpdump/Wireshark dissectors, and other data is provided in the table below. See columns `DATA` and `CONTROL` to determine whether a variable is present (`✓`) or absent (`-`) in the corresponding dataset. 219 | 220 | | Dataset Variable | Wireshark Dissector | Description | DATA | CONTROL | Data Type | 221 | |:------------------|:-----------------------|:-------------------------------------------------------------------------|:-----------|:----------|:-----------| 222 | | ws.no | _ws.col.No. 
| Number as registered by Wireshark | ✓ | ✓ | int64 | 223 | | frame.time | frame.time | Absolute time when the frame was captured as registered by Wireshark | ✓ | ✓ | datetime64 | 224 | | ws.time | _ws.col.Time | Relative timestamp as registered by Wireshark (seconds) | ✓ | ✓ | float64 | 225 | | ws.source | _ws.col.Source | Source IP address | ✓ | ✓ | category | 226 | | ws.destination | _ws.col.Destination | Destination IP address | ✓ | ✓ | category | 227 | | ws.protocol | _ws.col.Protocol | Protocol | ✓ | ✓ | category | 228 | | ws.length | _ws.col.Length | Length (bytes) | ✓ | ✓ | int16 | 229 | | ws.info | _ws.col.Info | Information | ✓ | ✓ | object | 230 | | udp.length | udp.length | UDP packet size (bytes) | ✓ | ✓ | int16 | 231 | | srt.iscontrol | srt.iscontrol | Content type (CONTROL if 1, DATA if 0) | ✓ | ✓ | int8 | 232 | | srt.type | srt.type | Message type (e.g. UMSG_ACK, UMSG_ACKACK) | - | ✓ | category | 233 | | srt.seqno | srt.seqno | Sequence number | ✓ | - | int64 | 234 | | srt.msg.rexmit | srt.msg.rexmit | Sent as original if 0, retransmitted if 1 | ✓ | - | int8 | 235 | | srt.timestamp | srt.timestamp | Timestamp since the socket was opened (microseconds) | ✓ | ✓ | int64 | 236 | | srt.id | srt.id | Destination socket id | ✓ | ✓ | category | 237 | | srt.ack_seqno | srt.ack_seqno | First unacknowledged sequence number | - | ✓ | int64 | 238 | | srt.rtt | srt.rtt | Round Trip Time (RTT) estimate (microseconds) | - | ✓ | int64 | 239 | | srt.rttvar | srt.rttvar | The variance of Round Trip Time (RTT) estimate (microseconds) | - | ✓ | int64 | 240 | | srt.rate | srt.rate | Receiving speed estimate (packets/s) | - | ✓ | int64 | 241 | | srt.bw | srt.bw | Bandwidth estimate (packets/s) | - | ✓ | int64 | 242 | | srt.rcvrate | srt.rcvrate | Receiving speed estimate (bytes/s) | - | ✓ | int64 | 243 | | data.len | data.len | Payload size or 0 in case of control packets (bytes) | ✓ | - | int16 | 244 | | ws.time.us | - | Relative timestamp as registered by Wireshark 
(microseconds) | ✓ | - | int64 | 245 | | ws.iat.us | - | Packet inter-arrival time (microseconds) | ✓ | - | int64 | 246 | 247 | ### Probing DATA Packets 248 | 249 | Probing DATA packets are extracted from the DATA packets dataset as follows: 250 | 1. Find all the packet pairs where the last 4 bits of their sequence numbers (`srt.seqno`) are `0000` and `0001`. The order is important. 251 | 2. For each packet pair, check that both packets are sent as `Original` (`srt.msg.rexmit == 0`), not `Retransmitted` (`srt.msg.rexmit == 1`), and discard the pair otherwise. 252 | 3. For the remaining packet pairs, take the packet whose sequence number ends with the `0001` bits as a probing packet. 253 | 254 | Here is an example of a packet pair where `Frame 25` corresponds to the probing packet: 255 | ``` 256 | Frame 24: 1514 bytes on wire (12112 bits), 1500 bytes captured (12000 bits) on interface 0 257 | Ethernet II, Src: 12:34:56:78:9a:bc (12:34:56:78:9a:bc), Dst: Microsof_59:95:17 (00:0d:3a:59:95:17) 258 | Internet Protocol Version 4, Src: 51.144.160.127, Dst: 10.1.4.4 259 | User Datagram Protocol, Src Port: 60900, Dst Port: 4200 260 | SRT Protocol 261 | 0... .... .... .... .... .... .... .... = Content: DATA 262 | .111 1111 1110 0111 0111 1000 0011 0000 = Sequence Number: 2145876016 263 | 11.. .... .... .... .... .... .... .... = Packet Boundary: PB_SOLO (3) 264 | ..0. .... .... .... .... .... .... .... = In-Order Indicator: 0 265 | ...0 0... .... .... .... .... .... .... = Encryption Status: Not encrypted (0) 266 | .... .0.. .... .... .... .... .... .... = Sent as: Original 267 | .... 
..00 0000 0000 0000 0000 0001 0001 = Message Number: 17 268 | Time Stamp: 449263 (0x0006daef) 269 | Destination Socket ID: 0x1c9ff5e1 270 | Data (1442 bytes) 271 | 272 | ``` 273 | 274 | ``` 275 | Frame 25: 1514 bytes on wire (12112 bits), 1500 bytes captured (12000 bits) on interface 0 276 | Ethernet II, Src: 12:34:56:78:9a:bc (12:34:56:78:9a:bc), Dst: Microsof_59:95:17 (00:0d:3a:59:95:17) 277 | Internet Protocol Version 4, Src: 51.144.160.127, Dst: 10.1.4.4 278 | User Datagram Protocol, Src Port: 60900, Dst Port: 4200 279 | SRT Protocol 280 | 0... .... .... .... .... .... .... .... = Content: DATA 281 | .111 1111 1110 0111 0111 1000 0011 0001 = Sequence Number: 2145876017 282 | 11.. .... .... .... .... .... .... .... = Packet Boundary: PB_SOLO (3) 283 | ..0. .... .... .... .... .... .... .... = In-Order Indicator: 0 284 | ...0 0... .... .... .... .... .... .... = Encryption Status: Not encrypted (0) 285 | .... .0.. .... .... .... .... .... .... = Sent as: Original 286 | .... ..00 0000 0000 0000 0000 0001 0010 = Message Number: 18 287 | Time Stamp: 449292 (0x0006db0c) 288 | Destination Socket ID: 0x1c9ff5e1 289 | Data (1442 bytes) 290 | ``` 291 | 292 | ### UMSG_HANDSHAKE CONTROL Packets 293 | 294 | UMSG_HANDSHAKE CONTROL packets are extracted from the CONTROL packets dataset using the following criteria: `srt.type == 0x00000000`. 295 | 296 | ### UMSG_ACK CONTROL Packets 297 | 298 | UMSG_ACK CONTROL packets are extracted from the CONTROL packets dataset as follows: 299 | 1. Find all the packets with `srt.type == 0x00000002`. 300 | 2. Drop rows with `NaN` values of `srt.rate`, `srt.bw`, and `srt.rcvrate` variables (so called "light acknowledgements" or "light ACKs"). 
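The two CONTROL packet filters described above can be sketched with pandas as follows. This is an illustrative sketch rather than the library's actual code: the function names and sample values are hypothetical, `srt.type` is assumed to be stored as a hexadecimal string (as it is compared in `scripts/get_traffic_stats.py`), and a light ACK is assumed to be a row where any of `srt.rate`, `srt.bw`, or `srt.rcvrate` is `NaN`.

```python
import numpy as np
import pandas as pd


def extract_umsg_handshake(control: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep UMSG_HANDSHAKE packets only."""
    return control[control['srt.type'] == '0x00000000']


def extract_umsg_ack(control: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep full UMSG_ACK packets, dropping light ACKs."""
    acks = control[control['srt.type'] == '0x00000002']
    # Assumption: a row is dropped if ANY of the three fields is NaN
    # (the default how='any' behaviour of DataFrame.dropna).
    return acks.dropna(subset=['srt.rate', 'srt.bw', 'srt.rcvrate'])


# Made-up CONTROL packets: one handshake, one full ACK, one light ACK,
# and one ACKACK.
control = pd.DataFrame({
    'srt.type': ['0x00000000', '0x00000002', '0x00000002', '0x00000006'],
    'srt.rate': [np.nan, 1000.0, np.nan, np.nan],
    'srt.bw': [np.nan, 5000.0, np.nan, np.nan],
    'srt.rcvrate': [np.nan, 1200000.0, np.nan, np.nan],
})
handshakes = extract_umsg_handshake(control)
full_acks = extract_umsg_ack(control)
print(handshakes)
print(full_acks)
```

If light ACKs should instead be defined as rows where all three fields are missing, `dropna(..., how='all')` would be the corresponding variant.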
301 | 302 | 303 | 304 | [RETURN TO TOP](#lib-tcpdump-processing) 305 | -------------------------------------------------------------------------------- /img/extract_packets_report.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/img/extract_packets_report.png -------------------------------------------------------------------------------- /img/get_traffic_stats_report_rcv.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/img/get_traffic_stats_report_rcv.png -------------------------------------------------------------------------------- /img/get_traffic_stats_report_snd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/img/get_traffic_stats_report_snd.png -------------------------------------------------------------------------------- /scripts/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/scripts/__init__.py -------------------------------------------------------------------------------- /scripts/dump_pkt_timestamps.py: -------------------------------------------------------------------------------- 1 | """ 2 | The script dumps SRT timestamps (not Wireshark ws.time) of SRT data packets to a .csv file 3 | to be used by srt-xtransmit application with the --playback-csv argument. 
4 | """ 5 | import pathlib 6 | 7 | import click 8 | 9 | from tcpdump_processing.convert import convert_to_csv 10 | from tcpdump_processing.extract_packets import extract_srt_packets, UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound 11 | 12 | 13 | class SRTDataIndex: 14 | def __init__(self, srt_packets): 15 | self.ctrl_pkts = (srt_packets['srt.iscontrol'] == 1) 16 | self.data_pkts = (srt_packets['srt.iscontrol'] == 0) 17 | self.data_pkts_org = self.data_pkts & (srt_packets['srt.msg.rexmit'] == 0) 18 | 19 | 20 | @click.command() 21 | @click.argument( 22 | 'input', 23 | type=click.Path(exists=True) 24 | ) 25 | @click.argument( 26 | 'output', 27 | type=click.Path(exists=False) 28 | ) 29 | @click.option( 30 | '--overwrite/--no-overwrite', 31 | default=False, 32 | help= 'If exists, overwrite the .csv file produced out of the .pcap(ng) ' 33 | 'one at the previous iterations of running the script.', 34 | show_default=True 35 | ) 36 | @click.option( 37 | '--port', 38 | help= 'Decode packets as SRT on a specified port. ' 39 | 'This option is helpful when there is no SRT handshake in .pcap(ng) file. ' 40 | 'Should be used together with --overwrite option.' 41 | ) 42 | def main(input, output, overwrite, port): 43 | """ 44 | This script parses .pcap(ng) tcpdump trace file and outputs all original 45 | data packets' SRT timestamps (not Wireshark ws.time) into a .csv file. 46 | 47 | INPUT is the .pcap(ng) file to use as an input. 48 | 49 | OUTPUT is the output .csv file to be produced. 
 50 | """ 51 | pcap_filepath = pathlib.Path(input) 52 | if port is not None: 53 | csv_filepath = convert_to_csv(pcap_filepath, overwrite, True, port) 54 | else: 55 | csv_filepath = convert_to_csv(pcap_filepath, overwrite) 56 | 57 | try: 58 | srt_packets = extract_srt_packets(csv_filepath) 59 | except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error: 60 | print(f'{error}') 61 | return 62 | 63 | index = SRTDataIndex(srt_packets) 64 | df = srt_packets[index.data_pkts_org] 65 | (df['srt.timestamp'] / 1000000.0).to_csv(output, index=False, header=False) 66 | 67 | # TODO: Plotting the histogram of packets by 10 ms bins. 68 | # The code below is missing the end time in the np.arange() call. 69 | #x = np.arange(0, 27, 0.01, dtype = float) 70 | #fig, axis = plt.subplots(figsize =(10, 5)) 71 | #axis.hist((df['srt.timestamp'] / 1000000.0), bins = x) 72 | #plt.show() 73 | 74 | return 75 | 76 | 77 | if __name__ == '__main__': 78 | main() 79 | -------------------------------------------------------------------------------- /scripts/get_traffic_stats.py: -------------------------------------------------------------------------------- 1 | """ 2 | Script designed to process .pcap(ng) files and generate a report 3 | with network traffic statistics. 
4 | """ 5 | import pathlib 6 | 7 | import click 8 | import pandas as pd 9 | 10 | from tcpdump_processing.convert import convert_to_csv 11 | from tcpdump_processing.extract_packets import extract_srt_packets, UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound 12 | 13 | 14 | def to_percent(value, base): 15 | return round(value / base * 100, 4) 16 | 17 | 18 | def to_str(first, second): 19 | return str(first) + '(' + str(second) + ')' 20 | 21 | 22 | def to_rate(value, duration): 23 | return round(value * 8 / duration / 1000000, 2) 24 | 25 | 26 | class TrafficStatsIndex: 27 | def __init__(self, srt_packets): 28 | self.ctrl_pkts = (srt_packets['srt.iscontrol'] == 1) 29 | self.ctrl_pkts_ack = (self.ctrl_pkts) & (srt_packets['srt.type'] == '0x00000002') 30 | self.ctrl_pkts_ackack = (self.ctrl_pkts) & (srt_packets['srt.type'] == '0x00000006') 31 | self.ctrl_pkts_nak = (self.ctrl_pkts) & (srt_packets['srt.type'] == '0x00000003') 32 | 33 | self.data_pkts = (srt_packets['srt.iscontrol'] == 0) 34 | self.data_pkts_org = (self.data_pkts) & (srt_packets['srt.msg.rexmit'] == 0) 35 | self.data_pkts_rex = (self.data_pkts) & (srt_packets['srt.msg.rexmit'] == 1) 36 | 37 | 38 | class TrafficStats: 39 | def __init__(self, srt_packets): 40 | self.srt_packets = srt_packets 41 | self.index = TrafficStatsIndex(srt_packets) 42 | 43 | 44 | def bytes_to_Mbps(self, bytes): 45 | return bytes * 8 / self.duration / 1000000 46 | 47 | 48 | @property 49 | def data_pkts(self): 50 | return self.srt_packets[self.index.data_pkts] 51 | 52 | 53 | @property 54 | def data_pkts_org(self): 55 | return self.srt_packets[self.index.data_pkts_org] 56 | 57 | 58 | @property 59 | def data_pkts_rex(self): 60 | return self.srt_packets[self.index.data_pkts_rex] 61 | 62 | 63 | @property 64 | def ctrl_pkts(self): 65 | return self.srt_packets[self.index.ctrl_pkts] 66 | 67 | 68 | @property 69 | def duration(self): 70 | # Calculate duration in seconds. 
71 | start = self.data_pkts.iloc[0]['ws.time'] 72 | stop = self.data_pkts.iloc[-1]['ws.time'] 73 | return (stop - start) 74 | 75 | 76 | def count_packets(self): 77 | # Count the number of packets. 78 | pkts = len(self.srt_packets.index) 79 | data_pkts = self.index.data_pkts.sum() 80 | data_pkts_org = self.index.data_pkts_org.sum() 81 | data_pkts_rex = self.index.data_pkts_rex.sum() 82 | 83 | ctrl_pkts = self.index.ctrl_pkts.sum() 84 | ctrl_pkts_ack = self.index.ctrl_pkts_ack.sum() 85 | ctrl_pkts_ackack = self.index.ctrl_pkts_ackack.sum() 86 | ctrl_pkts_nak = self.index.ctrl_pkts_nak.sum() 87 | 88 | return { 89 | 'pkts': pkts, 90 | 'data_pkts': data_pkts, 91 | 'data_pkts_org': data_pkts_org, 92 | 'data_pkts_rex': data_pkts_rex, 93 | 'ctrl_pkts': ctrl_pkts, 94 | 'ctrl_pkts_ack': ctrl_pkts_ack, 95 | 'ctrl_pkts_ackack': ctrl_pkts_ackack, 96 | 'ctrl_pkts_nak': ctrl_pkts_nak, 97 | } 98 | 99 | 100 | def count_retransmissions(self): 101 | # Calculate how many packets were retransmitted once, twice, 3 times, etc. 
102 | rexmit_pkts = self.data_pkts_rex.copy() 103 | rexmit_pkts['srt.seqno'] = rexmit_pkts['srt.seqno'].astype('int32') 104 | rexmit_pkts['seqno'] = rexmit_pkts['srt.seqno'] 105 | rexmits = rexmit_pkts.groupby(['srt.seqno'])['seqno'].count() 106 | 107 | once = rexmits[rexmits == 1].count() 108 | twice = rexmits[rexmits == 2].count() 109 | x3 = rexmits[rexmits == 3].count() 110 | x4 = rexmits[rexmits == 4].count() 111 | x5_more = rexmits[rexmits > 4].count() 112 | 113 | return { 114 | 'once': once, 115 | 'twice': twice, 116 | 'x3': x3, 117 | 'x4': x4, 118 | 'x5_more': x5_more, 119 | 'once_total': once, 120 | 'twice_total': twice * 2, 121 | 'x3_total': x3 * 3, 122 | 'x4_total': x4 * 4, 123 | 'x5_more_total': len(rexmit_pkts) - once - twice * 2 - x3 * 3 - x4 * 4, 124 | } 125 | 126 | 127 | def print_traffic(self): 128 | print(" Traffic ".center(70, "~")) 129 | 130 | print(f"- SRT DATA pkts") 131 | print(f" - SRT payload + SRT hdr + UDP hdr (orig+retrans) {to_rate(self.data_pkts['udp.length'].sum(), self.duration):>13} Mbps") 132 | print(f" - SRT payload + SRT hdr (orig+retrans) {to_rate(self.data_pkts['data.len'].sum() + 16 * len(self.data_pkts), self.duration):>13} Mbps") 133 | print(f" - SRT payload (orig+retrans) {to_rate(self.data_pkts['data.len'].sum(), self.duration):>13} Mbps") 134 | print(f" - SRT payload + SRT hdr + UDP hdr (orig) {to_rate(self.data_pkts_org['udp.length'].sum(), self.duration):>13} Mbps") 135 | print(f" - SRT payload + SRT hdr (orig) {to_rate(self.data_pkts_org['data.len'].sum() + 16 * len(self.data_pkts_org), self.duration):>13} Mbps") 136 | print(f" - SRT payload (orig) {to_rate(self.data_pkts_org['data.len'].sum(), self.duration):>13} Mbps") 137 | 138 | 139 | def print_notations(self): 140 | print(" Notations ".center(70, "~")) 141 | print("pkts - packets") 142 | print("hdr - header") 143 | print("orig - original") 144 | print("retrans - retransmitted") 145 | print("".center(70, "~")) 146 | 147 | 148 | def generate_snd_report(self): 149 | 
cnt = self.count_packets() 150 | rexmits_cnt = self.count_retransmissions() 151 | 152 | # Calculate the number of missing in the dump original data packets 153 | # that were either dropped by the SRT sender, or UDP socket. 154 | # Reordered packets are not taken into account, so if a packet is reordered and 155 | # comes later, it will not be included into statistic. 156 | seqnos_org = self.data_pkts_org['srt.seqno'].astype('int32') 157 | # Removing duplicates in sent original packets. 158 | seqnos_org = seqnos_org.drop_duplicates() 159 | data_pkts_org_missing_cnt = int((seqnos_org.diff().dropna() - 1).sum()) 160 | 161 | print(" SRT Packets ".center(70, "~")) 162 | 163 | print(f"- SRT DATA+CONTROL pkts {cnt['pkts']:>35}") 164 | 165 | print(f"- SRT DATA pkts {cnt['data_pkts']:>35}") 166 | 167 | print( 168 | f" - Original DATA pkts sent {cnt['data_pkts_org']:>26}" 169 | f" {to_percent(cnt['data_pkts_org'], cnt['data_pkts']):>8}%" 170 | " out of orig+retrans sent DATA pkts" 171 | ) 172 | 173 | print( 174 | f" - Retransmitted DATA pkts sent {cnt['data_pkts_rex']:>26}" 175 | f" {to_percent(cnt['data_pkts_rex'], cnt['data_pkts']):>8}%" 176 | " out of orig+retrans sent DATA pkts" 177 | ) 178 | print(f" Once {to_str(rexmits_cnt['once'], rexmits_cnt['once']):>47} {to_percent(rexmits_cnt['once'], cnt['data_pkts']):>8}%") 179 | print(f" Twice {to_str(rexmits_cnt['twice'], rexmits_cnt['twice_total']):>47} {to_percent(rexmits_cnt['twice_total'], cnt['data_pkts']):>8}%") 180 | print(f" 3× {to_str(rexmits_cnt['x3'], rexmits_cnt['x3_total']):>47} {to_percent(rexmits_cnt['x3_total'], cnt['data_pkts']):>8}%") 181 | print(f" 4× {to_str(rexmits_cnt['x4'], rexmits_cnt['x4_total']):>47} {to_percent(rexmits_cnt['x4_total'], cnt['data_pkts']):>8}%") 182 | print(f" 5+ {to_str(rexmits_cnt['x5_more'], rexmits_cnt['x5_more_total']):>47} {to_percent(rexmits_cnt['x5_more_total'], cnt['data_pkts']):>8}%") 183 | 184 | print( 185 | f"- Original DATA pkts missing {data_pkts_org_missing_cnt:>25}" 
186 | f" {to_percent(data_pkts_org_missing_cnt, (cnt['data_pkts_org']+data_pkts_org_missing_cnt)):>8}%" 187 | " out of orig sent+missing DATA pkts" 188 | ) 189 | 190 | print(f"- SRT CONTROL pkts {cnt['ctrl_pkts']:>37}") 191 | print(f" - ACK pkts received {cnt['ctrl_pkts_ack']:>37}") 192 | print(f" - ACKACK pkts sent {cnt['ctrl_pkts_ackack']:>37}") 193 | print(f" - NAK pkts received {cnt['ctrl_pkts_nak']:>37}") 194 | 195 | self.print_traffic() 196 | 197 | print(" Overhead ".center(70, "~")) 198 | 199 | print(f"- SRT DATA pkts") 200 | print( 201 | " - UDP+SRT headers over SRT payload (orig)" 202 | f"{round(to_rate(self.data_pkts_org['udp.length'].sum(), self.duration) * 100 / to_rate(self.data_pkts_org['data.len'].sum(), self.duration) - 100, 2):>25} %" 203 | ) 204 | print( 205 | " - Retransmitted over original sent pkts" 206 | f"{to_percent(cnt['data_pkts_rex'], cnt['data_pkts_org']):>27} %" 207 | ) 208 | 209 | self.print_notations() 210 | 211 | 212 | def generate_rcv_report(self): 213 | cnt = self.count_packets() 214 | rexmits_cnt = self.count_retransmissions() 215 | 216 | # Calculate the number of lost original data packets as the number 217 | # of original data packets that haven't reached the receiver. 218 | # Reordered packets are not taken into account, so if a packet is reordered and 219 | # comes later, it will not be included into statistic. 220 | seqnos_org = self.data_pkts_org['srt.seqno'].astype('int32') 221 | # Removing duplicates in received original packets. 222 | seqnos_org = seqnos_org.drop_duplicates() 223 | data_pkts_org_lost_cnt = int((seqnos_org.diff().dropna() - 1).sum()) 224 | 225 | # The number of packets considered unrecovered at the receiver. 226 | # It means neither original, nor re-transmitted packet with 227 | # a particular sequence number has reached the destination. 
228 | seqnos = self.data_pkts['srt.seqno'].astype('int32').copy() 229 | seqnos = seqnos.drop_duplicates().sort_values() 230 | data_pkts_unrecovered_cnt = int((seqnos.diff().dropna() - 1).sum()) 231 | 232 | # The number of recovered at the receiver side packets. 233 | data_pkts_recovered_cnt = data_pkts_org_lost_cnt - data_pkts_unrecovered_cnt 234 | 235 | # The number of original DATA packets (received + lost). 236 | data_pkts_org_rcvd_lost_cnt = cnt['data_pkts_org'] + data_pkts_org_lost_cnt 237 | 238 | print(" SRT Packets ".center(70, "~")) 239 | 240 | print(f"- SRT DATA+CONTROL pkts {cnt['pkts']:>35}") 241 | 242 | print(f"- SRT DATA pkts {cnt['data_pkts']:>35}") 243 | 244 | print( 245 | f" - Original DATA pkts received {cnt['data_pkts_org']:>22}" 246 | f" {to_percent(cnt['data_pkts_org'], cnt['data_pkts']):>8}%" 247 | " out of orig+retrans received DATA pkts" 248 | ) 249 | 250 | print( 251 | f" - Retransmitted DATA pkts received {cnt['data_pkts_rex']:>22}" 252 | f" {to_percent(cnt['data_pkts_rex'], cnt['data_pkts']):>8}%" 253 | " out of orig+retrans received DATA pkts" 254 | ) 255 | print(f" Once {to_str(rexmits_cnt['once'], rexmits_cnt['once']):>47} {to_percent(rexmits_cnt['once'], cnt['data_pkts']):>8}%") 256 | print(f" Twice {to_str(rexmits_cnt['twice'], rexmits_cnt['twice_total']):>47} {to_percent(rexmits_cnt['twice_total'], cnt['data_pkts']):>8}%") 257 | print(f" 3× {to_str(rexmits_cnt['x3'], rexmits_cnt['x3_total']):>47} {to_percent(rexmits_cnt['x3_total'], cnt['data_pkts']):>8}%") 258 | print(f" 4× {to_str(rexmits_cnt['x4'], rexmits_cnt['x4_total']):>47} {to_percent(rexmits_cnt['x4_total'], cnt['data_pkts']):>8}%") 259 | print(f" 5+ {to_str(rexmits_cnt['x5_more'], rexmits_cnt['x5_more_total']):>47} {to_percent(rexmits_cnt['x5_more_total'], cnt['data_pkts']):>8}%") 260 | 261 | # The percentage of original DATA packets lost is calculated out of 262 | # original DATA packets (received + lost) which equals sent unique 263 | # packets approximately. 
264 | print( 265 | f"- Original DATA pkts lost {data_pkts_org_lost_cnt:>28}" 266 | f" {to_percent(data_pkts_org_lost_cnt, data_pkts_org_rcvd_lost_cnt):>8}%" 267 | " out of orig received+lost DATA pkts" 268 | ) 269 | print( 270 | f" - Recovered pkts {data_pkts_recovered_cnt:>40}" 271 | f" {to_percent(data_pkts_recovered_cnt, data_pkts_org_rcvd_lost_cnt):>8}%" 272 | ) 273 | print( 274 | f" - Unrecovered pkts {data_pkts_unrecovered_cnt:>38}" 275 | f" {to_percent(data_pkts_unrecovered_cnt, data_pkts_org_rcvd_lost_cnt):>8}%" 276 | ) 277 | 278 | print(f"- SRT CONTROL pkts {cnt['ctrl_pkts']:>26}") 279 | print(f" - ACK pkts sent {cnt['ctrl_pkts_ack']:>26}") 280 | print(f" - ACKACK pkts received {cnt['ctrl_pkts_ackack']:>26}") 281 | print(f" - NAK pkts sent {cnt['ctrl_pkts_nak']:>26}") 282 | 283 | self.print_traffic() 284 | 285 | print(" Overhead ".center(70, "~")) 286 | 287 | print(f"- SRT DATA pkts") 288 | print( 289 | " - UDP+SRT headers over SRT payload (orig)" 290 | f"{round(to_rate(self.data_pkts_org['udp.length'].sum(), self.duration) * 100 / to_rate(self.data_pkts_org['data.len'].sum(), self.duration) - 100, 2):>25} %" 291 | ) 292 | print( 293 | " - Retransmitted over original (received+lost) pkts" 294 | f"{to_percent(cnt['data_pkts_rex'], data_pkts_org_rcvd_lost_cnt):>16} %" 295 | ) 296 | 297 | self.print_notations() 298 | 299 | 300 | def show_unrecovered_packets(self, parent, stem): 301 | # Show and save to file sequence numbers of unrecovered at the 302 | # receiver side packets. 303 | 304 | # The number of packets considered unrecovered at the receiver. 305 | # It means neither original, nor re-transmitted packet with 306 | # a particular sequence number has reached the destination. 307 | seqnos = self.data_pkts['srt.seqno'].astype('int32').copy() 308 | seqnos = seqnos.drop_duplicates().sort_values() 309 | 310 | # Get sequence numbers of unrecovered packets. 
311 | df = pd.DataFrame(seqnos) 312 | df['diff'] = df['srt.seqno'].diff() 313 | df.dropna(inplace=True) 314 | df['diff'] = df['diff'].astype('int32') - 1 315 | df = df[df['diff'] != 0] 316 | df['start'] = df['srt.seqno'] - df['diff'] 317 | 318 | list_unrec = df[['diff', 'start']].values.tolist() 319 | 320 | unrec_pkts_seqnos = [] 321 | for sublist in list_unrec: 322 | diff, start = sublist 323 | for i in range(0, diff): 324 | unrec_pkts_seqnos.append(start + i) 325 | 326 | unrec_pkts_seqnos = pd.Series(unrec_pkts_seqnos) 327 | path_unrec = parent / (stem + '-unrec-pkts-seqnos.csv') 328 | unrec_pkts_seqnos.to_csv(path_unrec) 329 | print(f'\nSequence numbers of packets unrecovered at the receiver side are listed below. They are stored in the file {path_unrec}.') 330 | print(unrec_pkts_seqnos) 331 | 332 | 333 | @click.command() 334 | @click.argument( 335 | 'path', 336 | type=click.Path(exists=True) 337 | ) 338 | @click.option( 339 | '--side', 340 | type=click.Choice(['snd', 'rcv'], case_sensitive=False), 341 | required=True, 342 | help='The side at which the .pcap(ng) file was captured.' 343 | ) 344 | @click.option( 345 | '--overwrite/--no-overwrite', 346 | default=False, 347 | help= 'If it exists, overwrite the .csv file produced from the .pcap(ng) ' 348 | 'file by a previous run of the script.', 349 | show_default=True 350 | ) 351 | @click.option( 352 | '--show-unrec-pkts/--no-show-unrec-pkts', 353 | default=False, 354 | help= 'Show sequence numbers of packets unrecovered at the receiver ' 355 | 'side. Save the list of sequence numbers into a separate .csv file.', 356 | show_default=True 357 | ) 358 | @click.option( 359 | '--port', 360 | help= 'Decode packets as SRT on a specified port. ' 361 | 'This option is helpful when there is no SRT handshake in .pcap(ng) file. ' 362 | 'Should be used together with --overwrite option.' 
363 | ) 364 | def main(path, side, overwrite, show_unrec_pkts, port): 365 | """ 366 | Script designed to process .pcap(ng) files and generate a report 367 | with network traffic statistics. 368 | """ 369 | # Convert .pcap(ng) to .csv tcpdump trace file 370 | pcap_filepath = pathlib.Path(path) 371 | if port is not None: 372 | csv_filepath = convert_to_csv(pcap_filepath, overwrite, True, port) 373 | else: 374 | csv_filepath = convert_to_csv(pcap_filepath, overwrite) 375 | 376 | # Extract SRT packets 377 | try: 378 | srt_packets = extract_srt_packets(csv_filepath) 379 | except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error: 380 | print(f'{error}') 381 | return 382 | 383 | stats = TrafficStats(srt_packets) 384 | 385 | if (side == 'snd'): 386 | stats.generate_snd_report() 387 | return 388 | 389 | if (side == 'rcv'): 390 | stats.generate_rcv_report() 391 | if (show_unrec_pkts): 392 | stats.show_unrecovered_packets(pathlib.Path(path).parent, pathlib.Path(path).stem) 393 | 394 | 395 | if __name__ == '__main__': 396 | main() 397 | -------------------------------------------------------------------------------- /scripts/plot_snd_timing.py: -------------------------------------------------------------------------------- 1 | """ 2 | Script designed to plot time delta between packet capture time (Wireshark) and 3 | SRT packet timestamp. 
4 | """ 5 | import pathlib 6 | 7 | import click 8 | import pandas as pd 9 | import matplotlib.pyplot as plt 10 | 11 | from tcpdump_processing.convert import convert_to_csv 12 | from tcpdump_processing.extract_packets import extract_srt_packets, UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound 13 | 14 | 15 | pd.options.mode.chained_assignment = None # default='warn' 16 | 17 | 18 | class SRTDataIndex: 19 | def __init__(self, srt_packets): 20 | self.ctrl_pkts = (srt_packets['srt.iscontrol'] == 1) 21 | self.data_pkts = (srt_packets['srt.iscontrol'] == 0) 22 | self.data_pkts_org = self.data_pkts & (srt_packets['srt.msg.rexmit'] == 0) 23 | self.data_pkts_rxt = self.data_pkts & (srt_packets['srt.msg.rexmit'] == 1) 24 | 25 | 26 | @click.command() 27 | @click.argument( 28 | 'path', 29 | type=click.Path(exists=True) 30 | ) 31 | @click.option( 32 | '--overwrite/--no-overwrite', 33 | default=False, 34 | help= 'If exists, overwrite the .csv file produced out of the .pcap(ng) one ' 35 | 'at the previous iterations of running the script.', 36 | show_default=True 37 | ) 38 | @click.option( 39 | '--with-rexmits/--without-rexmits', 40 | default=False, 41 | help= 'Also show retransmitted data packets.', 42 | show_default=True 43 | ) 44 | @click.option( 45 | '--port', 46 | help= 'Decode packets as SRT on a specified port. ' 47 | 'This option is helpful when there is no SRT handshake in .pcap(ng) file. ' 48 | 'Should be used together with --overwrite option.' 49 | ) 50 | @click.option( 51 | '--latency', 52 | help= 'SRT latency, in milliseconds, to plot on a graph.' 53 | ) 54 | def main(path, overwrite, with_rexmits, port, latency): 55 | """ 56 | This script parses .pcap(ng) tcpdump trace file captured at the sender side 57 | and plots the time delta between SRT packet timestamp (srt.timestamp) and 58 | packet time captured by Wireshark at the sender side (ws.time). 
59 | This could be done for either SRT original DATA packets only, or both 60 | original and retransmitted DATA packets. 61 | """ 62 | pcap_filepath = pathlib.Path(path) 63 | if port is not None: 64 | csv_filepath = convert_to_csv(pcap_filepath, overwrite, True, port) 65 | else: 66 | csv_filepath = convert_to_csv(pcap_filepath, overwrite) 67 | 68 | try: 69 | srt_packets = extract_srt_packets(csv_filepath) 70 | except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error: 71 | print(f'{error}') 72 | return 73 | 74 | index = SRTDataIndex(srt_packets) 75 | df = srt_packets[index.data_pkts] 76 | df['delta'] = df['ws.time'] * 1000 - df['srt.timestamp'] / 1000 77 | # NOTE: The delta of the very first DATA packet is used as a correction: 78 | # it is subtracted from the whole column, so the first delta equals 0. 79 | df['delta'] = df['delta'] - df['delta'].iloc[0] 80 | org = df[df['srt.msg.rexmit'] == 0] 81 | rxt = df[df['srt.msg.rexmit'] == 1] 82 | 83 | fig, ax = plt.subplots() 84 | org.plot(x = 'ws.time', xlabel = 'Time, s', y = 'delta', ylabel = 'Time Delta, ms', kind='scatter', label='Original Packets', ax=ax) 85 | if with_rexmits: 86 | rxt.plot(x = 'ws.time', xlabel = 'Time, s', y = 'delta', ylabel = 'Time Delta, ms', kind='scatter', color='r', label='Retransmitted Packets', ax=ax) 87 | if latency: 88 | plt.axhline(float(latency), color='g', label='SRT Latency') 89 | ax.legend() 90 | plt.show() 91 | 92 | 93 | if __name__ == '__main__': 94 | main() 95 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | 4 | # Dependencies for using the library 5 | install_requires = [ 6 | 'click >=7.0', 7 | 'pandas >=2.0.1', 8 | 'matplotlib', # imported by scripts/plot_snd_timing.py; pathlib is part of the standard library 9 | ] 10 | 11 | 12 | setup( 13 | name='lib-tcpdump-processing', 14 | version='0.2', 15 | author='Maria Sharabayko', 16 | 
author_email='maria.bakholdina@gmail.com', 17 | packages=find_packages(), 18 | install_requires=install_requires, 19 | entry_points={ 20 | 'console_scripts': [ 21 | 'extract-packets = tcpdump_processing.extract_packets:main', 22 | 'get-traffic-stats = scripts.get_traffic_stats:main', 23 | 'plot-snd-timing = scripts.plot_snd_timing:main', 24 | 'dump-pkt-timestamps = scripts.dump_pkt_timestamps:main' 25 | ], 26 | }, 27 | ) -------------------------------------------------------------------------------- /tcpdump_processing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mbakholdina/lib-tcpdump-processing/ac3d567199cbea09f1a5616fe8849fc1b923c69e/tcpdump_processing/__init__.py -------------------------------------------------------------------------------- /tcpdump_processing/convert.py: -------------------------------------------------------------------------------- 1 | """ Module designed to convert .pcap(ng) tcpdump trace file into .csv one. """ 2 | 3 | import pathlib 4 | import subprocess 5 | 6 | 7 | class IsNotPcapFile(Exception): 8 | pass 9 | 10 | 11 | class PcapProcessingFailed(Exception): 12 | pass 13 | 14 | 15 | class FileDoesNotExist(Exception): 16 | pass 17 | 18 | 19 | class DirectoryDoesNotExist(Exception): 20 | pass 21 | 22 | 23 | def convert_to_csv( 24 | filepath: pathlib.Path, 25 | overwrite: bool=False, 26 | decode_as_srt: bool=False, 27 | port: str=None 28 | ) -> pathlib.Path: 29 | """ 30 | Convert .pcap(ng) tcpdump trace file into .csv one. During conversion, 31 | by default UDP packets are extracted. If `decode_as_srt` equals True, 32 | packets are decoded as SRT on a particular port. 33 | 34 | Attributes: 35 | filepath: 36 | :class:`pathlib.Path` path to tcpdump trace file. 37 | overwrite: 38 | True if already existing .csv file should be overwritten. 39 | decode_as_srt: 40 | True if packets should be decoded as SRT packets 41 | on a particular port. 
42 | port: 43 | Port on which packets should be decoded as SRT if `decode_as_srt` 44 | equals True. 45 | 46 | Returns: 47 | :class:`pathlib.Path` path to a result csv file. 48 | 49 | Raises: 50 | :exc:`FileDoesNotExist` 51 | if `filepath` file does not exist, 52 | :exc:`IsNotPcapFile` 53 | if `filepath` does not correspond to .pcap(ng) file, 54 | :exc:`PcapProcessingFailed` 55 | if tcpdump trace file .csv file processing was not successful. 56 | """ 57 | if not filepath.exists(): 58 | raise FileDoesNotExist(filepath) 59 | 60 | suffix = filepath.suffix 61 | if not suffix.endswith('.pcapng'): 62 | if not suffix.endswith('.pcap'): 63 | raise IsNotPcapFile( 64 | f'{filepath} does not correspond to .pcap(ng) file' 65 | ) 66 | 67 | csv_filepath = filepath.parent / (filepath.stem + '.csv') 68 | if csv_filepath.exists() and not overwrite: 69 | print( 70 | 'Skipping .pcap(ng) tcpdump trace file processing to ' 71 | f'.csv, .csv file already exists: {csv_filepath}.' 72 | ) 73 | return csv_filepath 74 | 75 | print(f'Processing .pcap(ng) tcpdump trace file to .csv: {filepath}') 76 | args = [ 77 | 'tshark', 78 | '-r', str(filepath), 79 | '--disable-protocol', 'udt', # Disable UDT protocol, otherwise SRT packets will be treated as UDT ones 80 | ] 81 | 82 | if decode_as_srt: 83 | args += ['-d', f'udp.port=={port},srt'] # Decode UDP packets as SRT on a particular port 84 | else: 85 | args += ['-Y', 'udp',] # Decode packets as UDP 86 | 87 | args += [ 88 | '-T', 'fields', 89 | '-e', '_ws.col.No.', 90 | '-e', 'frame.time', 91 | '-e', '_ws.col.Time', 92 | '-e', '_ws.col.Source', 93 | '-e', '_ws.col.Destination', 94 | '-e', '_ws.col.Protocol', 95 | '-e', '_ws.col.Length', 96 | '-e', '_ws.col.Info', 97 | '-e', 'udp.length', 98 | '-e', 'udp.srcport', 99 | '-e', 'udp.dstport', 100 | '-e', 'srt.iscontrol', 101 | '-e', 'srt.type', 102 | '-e', 'srt.seqno', 103 | '-e', 'srt.msg.rexmit', 104 | '-e', 'srt.timestamp', 105 | '-e', 'srt.id', 106 | '-e', 'srt.ack_seqno', 107 | '-e', 'srt.rtt', 108 
| '-e', 'srt.rttvar', 109 | '-e', 'srt.rate', 110 | '-e', 'srt.bw', 111 | '-e', 'srt.rcvrate', 112 | '-e', 'data.len', 113 | '-E', 'header=y', 114 | '-E', 'separator=;', 115 | ] 116 | 117 | with csv_filepath.open(mode='w') as f: 118 | process = subprocess.run(args, stdout=f) 119 | if process.returncode != 0: 120 | raise PcapProcessingFailed( 121 | 'Processing .pcap(ng) tcpdump trace file to .csv ' 122 | f'has failed with the code: {process.returncode}' 123 | ) 124 | print(f'Processing finished: {csv_filepath}') 125 | 126 | return csv_filepath -------------------------------------------------------------------------------- /tcpdump_processing/extract_packets.py: -------------------------------------------------------------------------------- 1 | """ 2 | Module designed to extract packets of interest out of the .pcap(ng) 3 | tcpdump trace file. Currently only trace files with one data flow 4 | are supported. 5 | """ 6 | 7 | import enum 8 | import pathlib 9 | 10 | import click 11 | import dateutil 12 | import pandas as pd 13 | 14 | import tcpdump_processing.convert as convert 15 | 16 | 17 | class AutoName(enum.Enum): 18 | def _generate_next_value_(name, start, count, last_values): 19 | return name 20 | 21 | 22 | @enum.unique 23 | class PacketTypes(AutoName): 24 | srt = enum.auto() 25 | data = enum.auto() 26 | control = enum.auto() 27 | probing = enum.auto() 28 | umsg_handshake = enum.auto() 29 | umsg_ack = enum.auto() 30 | 31 | 32 | PACKET_TYPES = [name for name, member in PacketTypes.__members__.items()] 33 | 34 | 35 | class UnexpectedColumnsNumber(Exception): 36 | pass 37 | 38 | class EmptyCSV(Exception): 39 | pass 40 | 41 | class NoUDPPacketsFound(Exception): 42 | pass 43 | 44 | class NoSRTPacketsFound(Exception): 45 | pass 46 | 47 | 48 | def extract_srt_packets(filepath: pathlib.Path) -> pd.DataFrame: 49 | """ 50 | Extract SRT packets (both DATA and CONTROL) from the .csv 51 | tcpdump trace file. 
52 | 53 | Attributes: 54 | filepath: 55 | :class:`pathlib.Path` path to the .csv tcpdump trace file. 56 | 57 | Returns: 58 | :class:`pd.DataFrame` dataframe with SRT packets or 59 | an empty dataframe if there is no SRT packets. 60 | 61 | Raises: 62 | :exc:`convert.FileDoesNotExist` 63 | if `filepath` file does not exist. 64 | :exc: `UnexpectedColumnsNumber` 65 | if .csv file contains unexpected number of columns. 66 | :exc: `EmptyCSV` 67 | if neither SRT, nor UDP packets are present in .csv file. 68 | :exc: `NoUDPPacketsFound` 69 | if there is no SRT handshake and there are no UDP packets found in .csv file. 70 | :exc: `NoSRTPacketsFound` 71 | if there is no SRT handshake, but there are UDP packets found in .csv file. 72 | Those UDP packets could be further parsed as SRT ones. 73 | """ 74 | if not filepath.exists(): 75 | raise convert.FileDoesNotExist(filepath) 76 | 77 | columns = [ 78 | '_ws.col.No.', 79 | 'frame.time', 80 | '_ws.col.Time', 81 | '_ws.col.Source', 82 | '_ws.col.Destination', 83 | '_ws.col.Protocol', 84 | '_ws.col.Length', 85 | '_ws.col.Info', 86 | 'udp.length', 87 | 'udp.srcport', 88 | 'udp.dstport', 89 | 'srt.iscontrol', 90 | 'srt.type', 91 | 'srt.seqno', 92 | 'srt.msg.rexmit', 93 | 'srt.timestamp', 94 | 'srt.id', 95 | 'srt.ack_seqno', 96 | 'srt.rtt', 97 | 'srt.rttvar', 98 | 'srt.rate', 99 | 'srt.bw', 100 | 'srt.rcvrate', 101 | 'data.len', 102 | ] 103 | 104 | types = [ 105 | 'int64', # _ws.col.No. 
(ws.no) 106 | 'object', # frame.time 107 | 'float64', # _ws.col.Time (ws.time) 108 | 'category', # _ws.col.Source (ws.source) 109 | 'category', # _ws.col.Destination (ws.destination) 110 | 'category', # _ws.col.Protocol (ws.protocol) 111 | 'int16', # _ws.col.Length (ws.length) 112 | 'object', # _ws.col.Info (ws.info) 113 | 'float32', # udp.length 114 | 'object', # ws.srcport 115 | 'object', # ws.dstport 116 | 'float32', # srt.iscontrol 117 | 'category', # srt.type 118 | 'float64', # srt.seqno 119 | 'float32', # srt.msg.rexmit 120 | 'float64', # srt.timestamp 121 | 'category', # srt.id 122 | 'float64', # srt.ack_seqno 123 | 'float64', # srt.rtt 124 | 'float64', # srt.rttvar 125 | 'float64', # srt.rate 126 | 'float64', # srt.bw 127 | 'float64', # srt.rcvrate 128 | 'float32' # data.len 129 | ] 130 | 131 | columns_types = dict(zip(columns, types)) 132 | packets = pd.read_csv(filepath, sep=';', dtype=columns_types) 133 | 134 | if len(packets.columns) != len(columns): 135 | raise UnexpectedColumnsNumber( 136 | f'Unexpected columns number in .csv file: {filepath}. ' 137 | 'Try running the script with --overwrite option.' 138 | ) 139 | 140 | packets.columns = [ 141 | 'ws.no', 142 | 'frame.time', 143 | 'ws.time', 144 | 'ws.source', 145 | 'ws.destination', 146 | 'ws.protocol', 147 | 'ws.length', 148 | 'ws.info', 149 | 'udp.length', 150 | 'udp.srcport', 151 | 'udp.dstport', 152 | 'srt.iscontrol', 153 | 'srt.type', 154 | 'srt.seqno', 155 | 'srt.msg.rexmit', 156 | 'srt.timestamp', 157 | 'srt.id', 158 | 'srt.ack_seqno', 159 | 'srt.rtt', 160 | 'srt.rttvar', 161 | 'srt.rate', 162 | 'srt.bw', 163 | 'srt.rcvrate', 164 | 'data.len' 165 | ] 166 | 167 | # Packets dataframe may consist of both SRT and UDP packets, maybe empty as well 168 | if packets.empty: 169 | raise EmptyCSV( 170 | 'Neither SRT, nor UDP packets are present in .csv file. ' 171 | 'Sounds like original .pcap(ng) file is empty or consists of non-UDP packets.' 
172 | ) 173 | 174 | srt_packets = packets[packets['ws.protocol'] == 'SRT'].copy() 175 | 176 | if srt_packets.empty: 177 | # Most likely there is no SRT handshake present in the original .pcap(ng) file 178 | print( 179 | 'No SRT packets found in .csv file. ' 180 | 'It looks like there is no SRT handshake in the original .pcap(ng) file. ' 181 | 'Extracting UDP packets.' 182 | ) 183 | 184 | udp_packets = packets[packets['ws.protocol'] == 'UDP'].copy() 185 | 186 | if udp_packets.empty: 187 | raise NoUDPPacketsFound( 188 | 'No UDP packets found in .csv file. ' 189 | 'It looks like there are no UDP packets present in the original .pcap(ng) file.' 190 | ) 191 | 192 | ports = udp_packets.groupby(['udp.srcport', 'udp.dstport'])['ws.no'].count() 193 | 194 | raise NoSRTPacketsFound( 195 | f'There are UDP packets in .csv file on ports: \n{ports}\n' 196 | 'Try to decode UDP packets as SRT ones by running the script with --overwrite and --port options.' 197 | ) 198 | 199 | # SRT packets found in .csv file. 200 | # TODO: Using ports 'udp.srcport', 'udp.dstport', check that there is only one stream inside 201 | 202 | # NOTE: When adding a combination "offset abbreviation <-> timezone", it's recommended 203 | # to add both standard and daylight savings time offsets for each timezone 204 | # (like CET and CEST for 'Europe/Berlin') 205 | # https://stackoverflow.com/questions/67061724/panda-to-datetime-raises-warning-tzname-cet-identified-but-not-understood 206 | tzmapping = { 207 | 'CET': dateutil.tz.gettz('Europe/Berlin'), 208 | 'CEST': dateutil.tz.gettz('Europe/Berlin') 209 | } 210 | 211 | # NOTE: This is done to convert Windows time offsets into appropriate pandas format 212 | # https://learn.microsoft.com/en-us/windows-hardware/manufacture/desktop/default-time-zones?view=windows-11 213 | srt_packets['frame.time'] = srt_packets['frame.time'].str.replace('W. Europe Standard Time', 'CET') 214 | srt_packets['frame.time'] = srt_packets['frame.time'].str.replace('W. 
Europe Daylight Time', 'CEST') 215 | 216 | srt_packets['frame.time'] = srt_packets['frame.time'].apply(dateutil.parser.parse, tzinfos=tzmapping) 217 | srt_packets['frame.time'] = srt_packets['frame.time'].dt.tz_convert('UTC') 218 | 219 | srt_packets['srt.iscontrol'] = srt_packets['srt.iscontrol'].astype('int8') 220 | srt_packets['srt.timestamp'] = srt_packets['srt.timestamp'].astype('int64') 221 | srt_packets['udp.length'] = srt_packets['udp.length'].fillna(0).astype('int16') 222 | srt_packets['data.len'] = srt_packets['data.len'].fillna(0).astype('int16') 223 | 224 | return srt_packets 225 | 226 | 227 | def extract_data_packets(srt_packets: pd.DataFrame) -> pd.DataFrame: 228 | """ 229 | Extract SRT DATA packets from SRT packets (both DATA and CONTROL) 230 | `srt_packets` dataframe. 231 | 232 | Attributes: 233 | srt_packets: 234 | :class:`pd.DataFrame` dataframe with SRT packets (both DATA and 235 | CONTROL) obtained from the .csv tcpdump trace file using 236 | `extract_srt_packets` function. 237 | 238 | Returns: 239 | :class:`pd.DataFrame` dataframe with SRT DATA packets or 240 | an empty dataframe if there is no DATA packets found. 241 | """ 242 | columns = [ 243 | 'ws.no', 244 | 'frame.time', 245 | 'ws.time', 246 | 'ws.source', 247 | 'ws.destination', 248 | 'ws.protocol', 249 | 'ws.length', 250 | 'ws.info', 251 | 'srt.iscontrol', 252 | 'srt.seqno', 253 | 'srt.msg.rexmit', 254 | 'srt.timestamp', 255 | 'srt.id', 256 | 'data.len', 257 | ] 258 | data = srt_packets.loc[srt_packets['srt.iscontrol'] == 0, columns] 259 | data['srt.seqno'] = data['srt.seqno'].astype('int64') 260 | data['srt.msg.rexmit'] = data['srt.msg.rexmit'].astype('int8') 261 | data['data.len'] = data['data.len'].astype('int16') 262 | 263 | # Group data by source, destination and socket id 264 | # NOTE: There should be only one group under the assumption that tcpdump 265 | # trace file has been taken at the receiver side and there is only one 266 | # data flow. 
For more complicated use cases, a proper data splitting 267 | # should be implemented. 268 | data_grouped = data.groupby(['ws.source', 'ws.destination', 'srt.id']) 269 | 270 | # Return an empty dataframe if there are no DATA packets found 271 | if len(data_grouped) == 0: 272 | columns += [ 273 | 'ws.time.us', 274 | 'ws.iat.us' 275 | ] 276 | return pd.DataFrame(columns=columns) 277 | 278 | # TODO: Implement 279 | # Return an empty dataframe if there is more than 1 data flow detected 280 | if len(data_grouped) > 1: 281 | print( 282 | 'More than one data flow detected. ' 283 | 'This case is not supported. The groups found are listed below:' 284 | ) 285 | 286 | for name, group in data_grouped: 287 | print(name) 288 | print(group) 289 | 290 | columns += [ 291 | 'ws.time.us', 292 | 'ws.iat.us' 293 | ] 294 | return pd.DataFrame(columns=columns) 295 | 296 | assert len(data_grouped) == 1 297 | 298 | # Calculate packet inter-arrival times 299 | # NOTE: Packet timestamp `ws.time` in tcpdump trace file is measured 300 | # in seconds, time in SRT is measured in microseconds (us). 301 | # That is why we first multiply the timestamp by 1000000, then make 302 | # a conversion from float to int as it is done in SRT, and only then 303 | # calculate the inter-arrival times. The very first value will be NaN, 304 | # fillna() changes it to 0, otherwise astype() will fail. Finally, 305 | # we convert the type from float to int, because diff() returns float. 306 | # NOTE: In SRT protocol, the time delta for the first SRT data packet is 307 | # taken as the difference between time of this data packet and the 308 | # previous handshake one. Here we assume this value to be equal to 0 309 | # for simplicity. 
310 | data['ws.time.us'] = (data['ws.time'] * 1000000).astype('int64') 311 | data['ws.iat.us'] = data['ws.time.us'].diff().fillna(0).astype('int64') 312 | 313 | return data 314 | 315 | 316 | def extract_control_packets(srt_packets: pd.DataFrame) -> pd.DataFrame: 317 | """ 318 | Extract SRT CONTROL packets from SRT packets (both DATA and CONTROL) 319 | `srt_packets` dataframe. 320 | 321 | Attributes: 322 | srt_packets: 323 | :class:`pd.DataFrame` dataframe with SRT packets (both DATA and 324 | CONTROL) obtained from the .csv tcpdump trace file using 325 | `extract_srt_packets` function. 326 | 327 | Returns: 328 | :class:`pd.DataFrame` dataframe with SRT CONTROL packets or 329 | an empty dataframe if there is no CONTROL packets found. 330 | """ 331 | columns = [ 332 | 'ws.no', 333 | 'frame.time', 334 | 'ws.time', 335 | 'ws.source', 336 | 'ws.destination', 337 | 'ws.protocol', 338 | 'ws.length', 339 | 'ws.info', 340 | 'srt.iscontrol', 341 | 'srt.type', 342 | 'srt.timestamp', 343 | 'srt.id', 344 | 'srt.ack_seqno', 345 | 'srt.rtt', 346 | 'srt.rttvar', 347 | 'srt.rate', 348 | 'srt.bw', 349 | 'srt.rcvrate', 350 | ] 351 | control = srt_packets.loc[srt_packets['srt.iscontrol'] == 1, columns] 352 | 353 | return control 354 | 355 | 356 | def extract_probing_packets(srt_packets: pd.DataFrame) -> pd.DataFrame: 357 | """ 358 | Extract SRT probing DATA packets from SRT packets (both DATA and CONTROL) 359 | `srt_packets` dataframe. 360 | 361 | Attributes: 362 | srt_packets: 363 | :class:`pd.DataFrame` dataframe with SRT packets (both DATA and 364 | CONTROL) obtained from the .csv tcpdump trace file using 365 | `extract_srt_packets` function. 366 | 367 | Returns: 368 | :class:`pd.DataFrame` dataframe with SRT probing DATA packets or 369 | an empty dataframe if there is no probing packets found. 
370 | """ 371 | data = extract_data_packets(srt_packets) 372 | 373 | # Apply logic AND to SRT data packet sequence number and 15 (1111) in order to check 374 | # the latest 4 bits of the sequence number (whether it is 0000=0 or 0001=1). 375 | # 0001=1 corresponds to the probing packet. 376 | data['seqno'] = data['srt.seqno'] & 15 377 | # Shift seqno column by 1 in order to get the current and previous values nearby. 378 | # We are looking for pairs: probing packet (0001=1) and previous packet (0000=0). 379 | # The order is important. Fill the first NA value with 1 in order to exclude this 380 | # row for sure. 381 | data['seqno_shifted'] = data['seqno'].shift().fillna(1).astype('int64') 382 | # Then we are interested in those probing packets for which packet pairs consist 383 | # of original only packets. There should be no retransmitted packets. 384 | # Shift srt.msg.rexmit column by 1 to get the current and previous values of rexmit 385 | # flag (0 - original packet, 1 - retransmitted packet) nearby. Fill the first 386 | # NA value with 1 in order to exclude this row for sure. 
387 | data['rexmit_shifted'] = data['srt.msg.rexmit'].shift().fillna(1).astype('int8') 388 | 389 | probing_packets = data[ 390 | (data['seqno'] == 1) & 391 | (data['seqno_shifted'] == 0) & 392 | (data['srt.msg.rexmit'] == 0) & 393 | (data['rexmit_shifted'] == 0) 394 | ] 395 | 396 | columns = [ 397 | 'ws.no', 398 | 'frame.time', 399 | 'ws.time', 400 | 'ws.source', 401 | 'ws.destination', 402 | 'ws.protocol', 403 | 'ws.length', 404 | 'ws.info', 405 | 'srt.iscontrol', 406 | 'srt.seqno', 407 | 'srt.msg.rexmit', 408 | 'srt.timestamp', 409 | 'srt.id', 410 | 'data.len', 411 | 'ws.time.us', 412 | 'ws.iat.us', 413 | ] 414 | probing_packets = probing_packets[columns] 415 | 416 | return probing_packets 417 | 418 | 419 | def extract_umsg_handshake_packets(srt_packets: pd.DataFrame) -> pd.DataFrame: 420 | """ 421 | Extract SRT UMSG_HANDSHAKE CONTROL packets from SRT packets 422 | (both DATA and CONTROL) `srt_packets` dataframe. 423 | 424 | Attributes: 425 | srt_packets: 426 | :class:`pd.DataFrame` dataframe with SRT packets (both DATA and 427 | CONTROL) obtained from the .csv tcpdump trace file using 428 | `extract_srt_packets` function. 429 | 430 | Returns: 431 | :class:`pd.DataFrame` dataframe with SRT UMSG_HANDSHAKE CONTROL 432 | packets or an empty dataframe if there is no UMSG_HANDSHAKE 433 | packets found. 434 | """ 435 | columns = [ 436 | 'ws.no', 437 | 'frame.time', 438 | 'ws.time', 439 | 'ws.source', 440 | 'ws.destination', 441 | 'ws.protocol', 442 | 'ws.length', 443 | 'ws.info', 444 | 'srt.iscontrol', 445 | 'srt.type', 446 | 'srt.timestamp', 447 | 'srt.id', 448 | ] 449 | control = extract_control_packets(srt_packets) 450 | umsg_handshake = control.loc[control['srt.type'] == '0x00000000', columns] 451 | 452 | return umsg_handshake 453 | 454 | 455 | def extract_umsg_ack_packets(srt_packets: pd.DataFrame) -> pd.DataFrame: 456 | """ 457 | Extract SRT UMSG_ACK CONTROL packets from SRT packets (both DATA and CONTROL) 458 | `srt_packets` dataframe. 
459 | 460 | Attributes: 461 | srt_packets: 462 | :class:`pd.DataFrame` dataframe with SRT packets (both DATA and 463 | CONTROL) obtained from the .csv tcpdump trace file using 464 | `extract_srt_packets` function. 465 | 466 | Returns: 467 | :class:`pd.DataFrame` dataframe with SRT UMSG_ACK CONTROL packets or 468 | an empty dataframe if there are no UMSG_ACK packets found. 469 | """ 470 | control = extract_control_packets(srt_packets) 471 | 472 | # Group data by source, destination, socket id and packet type 473 | grouped = control.groupby(['ws.source', 'ws.destination', 'srt.id', 'srt.type']) 474 | # Find the group with packet type = UMSG_ACK ('0x00000002') 475 | # NOTE: There should be only one group under the assumption that tcpdump 476 | # trace file has been taken at the receiver side and there is only one 477 | # data flow. For more complicated use cases, a proper data splitting 478 | # should be implemented. 479 | names = [name for name, _ in grouped if name[-1] == '0x00000002'] 480 | 481 | # Return an empty dataframe if there are no UMSG_ACK packets found 482 | if len(names) == 0: 483 | return pd.DataFrame(columns=control.columns) 484 | 485 | # TODO: Implement 486 | # Return an empty dataframe if there is more than 1 data flow detected 487 | if len(names) > 1: 488 | print( 489 | 'More than one data flow detected. ' 490 | f'This case is not supported. 
The groups found are listed below:' 491 | ) 492 | 493 | for name in names: 494 | print(name) 495 | 496 | return pd.DataFrame(columns=control.columns) 497 | 498 | assert len(names) == 1 499 | 500 | umsg_ack = grouped.get_group(names[0]) 501 | 502 | # Drop rows with NaN values in srt.rate, srt.bw, srt.rcvrate columns 503 | # (so called light acknowledgements) 504 | umsg_ack = umsg_ack.dropna(subset=['srt.rate', 'srt.bw', 'srt.rcvrate'], how='any') 505 | 506 | # Convert types 507 | umsg_ack['srt.ack_seqno'] = umsg_ack['srt.ack_seqno'].astype('int64') 508 | umsg_ack['srt.rtt'] = umsg_ack['srt.rtt'].astype('int64') 509 | umsg_ack['srt.rttvar'] = umsg_ack['srt.rttvar'].astype('int64') 510 | umsg_ack['srt.rate'] = umsg_ack['srt.rate'].astype('int64') 511 | umsg_ack['srt.bw'] = umsg_ack['srt.bw'].astype('int64') 512 | umsg_ack['srt.rcvrate'] = umsg_ack['srt.rcvrate'].astype('int64') 513 | 514 | return umsg_ack 515 | 516 | 517 | @click.command() 518 | @click.argument( 519 | 'path', 520 | type=click.Path(exists=True) 521 | ) 522 | @click.option( 523 | '--type', 524 | type=click.Choice(PACKET_TYPES), 525 | default=PacketTypes.probing.value, 526 | help= 'Packet type to extract: ' 527 | 'SRT (both DATA and CONTROL), SRT DATA, SRT CONTROL, ' 528 | 'SRT DATA probing, SRT CONTROL UMSG_HANDSHAKE, ' 529 | 'or SRT CONTROL UMSG_ACK packets.', 530 | show_default=True 531 | ) 532 | @click.option( 533 | '--overwrite/--no-overwrite', 534 | default=False, 535 | help= 'If exists, overwrite the .csv file produced out of the .pcap(ng) ' 536 | 'one at the previous iterations of running the script.', 537 | show_default=True 538 | ) 539 | @click.option( 540 | '--save/--no-save', 541 | default=False, 542 | help='Save dataframe with extracted packets into .csv file.', 543 | show_default=True 544 | ) 545 | @click.option( 546 | '--port', 547 | help= 'Decode packets as SRT on a specified port. ' 548 | 'This option is helpful when there is no SRT handshake in .pcap(ng) file. 
' 549 | 'Should be used together with --overwrite option.' 550 | ) 551 | def main(path, type, overwrite, save, port): 552 | """ 553 | This script parses a .pcap(ng) tcpdump trace file, 554 | saves the output in .csv format next to the original file, extracts packets 555 | of interest and, if --save is set, saves the resulting dataframe in .csv 556 | format next to the original file. 557 | """ 558 | # Convert .pcap(ng) to .csv tcpdump trace file 559 | pcap_filepath = pathlib.Path(path) 560 | if port is not None: 561 | csv_filepath = convert.convert_to_csv(pcap_filepath, overwrite, True, port) 562 | else: 563 | csv_filepath = convert.convert_to_csv(pcap_filepath, overwrite) 564 | 565 | # Extract packets of interest 566 | try: 567 | srt_packets = extract_srt_packets(csv_filepath) 568 | except (UnexpectedColumnsNumber, EmptyCSV, NoUDPPacketsFound, NoSRTPacketsFound) as error: 569 | print(f'{error}') 570 | return 571 | 572 | if type == PacketTypes.srt.value: 573 | packets = srt_packets 574 | if type == PacketTypes.data.value: 575 | packets = extract_data_packets(srt_packets) 576 | if type == PacketTypes.control.value: 577 | packets = extract_control_packets(srt_packets) 578 | if type == PacketTypes.probing.value: 579 | packets = extract_probing_packets(srt_packets) 580 | if type == PacketTypes.umsg_handshake.value: 581 | packets = extract_umsg_handshake_packets(srt_packets) 582 | if type == PacketTypes.umsg_ack.value: 583 | packets = extract_umsg_ack_packets(srt_packets) 584 | 585 | # Print the first 20 rows of the dataframe with extracted packets 586 | print('The result dataframe is the following:') 587 | print(packets.head(20)) 588 | 589 | # Save extracted packets to .csv 590 | if save: 591 | print('Writing to .csv file ...') 592 | name = csv_filepath.stem # works for file names that contain extra dots 593 | packets.to_csv(csv_filepath.parent / f'{name}-{type}.csv', sep=';') 594 | 595 | 596 | if __name__ == '__main__': 597 | main() 598 | --------------------------------------------------------------------------------
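The probing-packet rule described in the comments of `extract_probing_packets` above can be illustrated without pandas. The following is a minimal sketch, not the library's implementation: `find_probing_packets` is a hypothetical helper, and the input is assumed to be a plain list of `(seqno, rexmit)` tuples in capture order. A packet is reported as probing when its sequence number ends in binary 0001, the previous packet's sequence number ends in 0000, and neither packet of the pair is a retransmission.

```python
def find_probing_packets(packets):
    """Return sequence numbers of probing packets.

    `packets` is a list of (seqno, rexmit) tuples in capture order,
    where rexmit is 0 for an original packet and 1 for a retransmission.
    Hypothetical helper for illustration only.
    """
    probing = []
    # Walk over consecutive (previous, current) pairs; the very first
    # packet has no predecessor and therefore can never qualify.
    for (prev_seq, prev_rex), (seq, rex) in zip(packets, packets[1:]):
        is_pair = (prev_seq & 15) == 0 and (seq & 15) == 1
        both_original = prev_rex == 0 and rex == 0
        if is_pair and both_original:
            probing.append(seq)
    return probing


if __name__ == '__main__':
    capture = [
        (15, 0), (16, 0), (17, 0), (18, 0),  # 16 -> 17 is a valid pair
        (32, 0), (33, 1),                    # 33 is retransmitted
        (48, 1), (49, 0),                    # 48 is retransmitted
        (64, 0), (65, 0),                    # 64 -> 65 is a valid pair
    ]
    print(find_probing_packets(capture))
```

The pairwise walk plays the role of the `shift()` calls in the dataframe version: it puts the current and the previous values side by side so that all four conditions can be checked at once.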