├── .github
│   └── workflows
│       └── run_tests.yml
├── .gitignore
├── LICENSE
├── README.md
├── netflow
│   ├── __init__.py
│   ├── analyzer.py
│   ├── collector.py
│   ├── ipfix.py
│   ├── utils.py
│   ├── v1.py
│   ├── v5.py
│   └── v9.py
├── nf-workflow.png
├── setup.py
└── tests
    ├── __init__.py
    ├── lib.py
    ├── test_analyzer.py
    ├── test_ipfix.py
    ├── test_netflow.py
    └── test_performance.py

/.github/workflows/run_tests.yml:
--------------------------------------------------------------------------------
 1 | name: Run Python unit tests
 2 | 
 3 | on:
 4 |   push:
 5 |     branches: [ master, release ]
 6 |   pull_request:
 7 |   workflow_dispatch:
 8 | 
 9 | jobs:
10 |   test-netflow:
11 |     runs-on: ubuntu-20.04
12 |     strategy:
13 |       matrix:
14 |         python:
15 |           - "3.5.3"  # Debian Stretch
16 |           - "3.7.3"  # Debian Buster
17 |           - "3.9.2"  # Debian Bullseye
18 |           - "3.11"   # Debian Bookworm uses 3.11.1, but it's in a newer pyenv release
19 |     steps:
20 |       - uses: actions/checkout@v3
21 | 
22 |       - name: Set up Python with pyenv
23 |         uses: gabrielfalcao/pyenv-action@v11
24 |         with:
25 |           default: "${{ matrix.python }}"
26 | 
27 |       - name: Run Python unittests
28 |         run: python3 -m unittest
29 | 
--------------------------------------------------------------------------------

/.gitignore:
--------------------------------------------------------------------------------
1 | .*egg-info.*
2 | build*
3 | dist*
4 | .*python_netflow_v9_softflowd.egg-info/
5 | *.swp
6 | *.swo
7 | __pycache__
8 | *.json
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2016-2020 Dominik Pataky
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
 1 | # Python NetFlow/IPFIX library
 2 | This package contains libraries and tools for **NetFlow versions 1, 5 and 9, and IPFIX**. It is available [on PyPI as "netflow"](https://pypi.org/project/netflow/).
 3 | 
 4 | Version 9 is the first NetFlow version using templates. Templates make dynamically sized and configured NetFlow data flowsets possible, which makes the collector's job harder. The library provides the `netflow.parse_packet()` function as the main API point (see below). By importing `netflow.v1`, `netflow.v5` or `netflow.v9` you have direct access to the respective parsing objects, but at the beginning you will probably have more success by running the reference collector (example below) and looking into its code. IPFIX (IP Flow Information Export) is based on NetFlow v9 and standardized by the IETF. All related classes are contained in `netflow.ipfix`.
 5 | 
 6 | ![Data flow diagram](nf-workflow.png)
 7 | 
 8 | Copyright 2016-2023 Dominik Pataky
 9 | 
10 | Licensed under MIT License. See LICENSE.
11 | 
12 | 
13 | ## Using the library
14 | If you choose to use the classes provided by this library directly, here's an example for a NetFlow v5 export packet:
15 | 
16 | 1. Create a collector which listens for exported packets on some UDP port. It should then receive UDP packets from exporters.
17 | 2. Inside the UDP packets, the NetFlow payload is contained. For NetFlow v5 it should begin with the bytes `0005`, for example.
18 | 3. Call the `netflow.parse_packet()` function with the payload as first argument (it accepts the raw bytes as well as a hex string).
19 | 
20 | Example UDP collector server (receiving exports on port 2055):
21 | ```python
22 | import netflow
23 | import socket
24 | sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
25 | sock.bind(("0.0.0.0", 2055))
26 | payload, client = sock.recvfrom(4096)  # experimental, tested with 1464 bytes
27 | p = netflow.parse_packet(payload)  # Test result: <ExportPacket v5 with 30 flows>
28 | print(p.header.version)  # Test result: 5
29 | ```
30 | 
31 | Or from hex dump:
32 | ```python
33 | import netflow
34 | p = netflow.parse_packet("00050003000379a35e80c58622a...")  # see test_netflow.py
35 | assert p.header.version == 5  # NetFlow v5 packet
36 | assert p.flows[0].PROTO == 1  # ICMP flow
37 | ```
38 | 
39 | In NetFlow v9 and IPFIX, templates are used instead of a fixed set of fields (like `PROTO`). See `collector.py` on how to handle these. You **must** store received templates in between exports and pass them to the parser when new packets arrive. Not storing the templates will always result in parsing failures.
40 | 
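The template store is a plain dict which `parse_packet()` updates in place whenever a template flowset arrives (the reference collector uses the structure `{"netflow": {}, "ipfix": {}}`). The following minimal sketch shows one way to thread it through multiple packets; the helper function and its `payloads` argument are illustrative, not part of the library:

```python
import netflow


def parse_with_templates(payloads):
    """Parse raw v9/IPFIX payloads in order, keeping templates between packets."""
    # Same structure the reference collector uses for its template store
    templates = {"netflow": {}, "ipfix": {}}
    exports = []
    for payload in payloads:
        # parse_packet() fills the templates dict as soon as a template
        # flowset arrives; later data flowsets are decoded with it
        exports.append(netflow.parse_packet(payload, templates))
    return exports
```

Packets that arrive before their template was seen raise an exception and have to be retried later; see the `to_retry` handling in `netflow/collector.py`.
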
41 | ## Using the collector and analyzer
42 | Since v0.9.0 the `netflow` library also includes reference implementations of a collector and an analyzer as CLI tools.
43 | These can be used on the CLI with `python3 -m netflow.collector` and `python3 -m netflow.analyzer`. Use the `-h` flag to receive the respective help output with all provided CLI flags.
44 | 
45 | Example: to start the collector run `python3 -m netflow.collector -p 9000 -D`. This will start a collector instance at port 9000 in debug mode. Point your flow exporter to this port on your host and after some time the first ExportPackets should appear (the flows need to expire first). After you have collected some data, the collector exports it into GZIP files, simply named `<timestamp>.gz` (or the filename you specified with `--file`/`-o`).
46 | 
47 | To analyze the saved traffic, run `python3 -m netflow.analyzer -f <gzip file>`. The output will look similar to the following snippet, with resolved hostnames and services, transferred bytes and connection duration:
48 | 
49 |     2017-10-28 23:17.01: SSH | 4.25M | 15:27 min | local-2 () to local-1 ()
50 |     2017-10-28 23:17.01: SSH | 4.29M | 16:22 min | remote-1 () to local-2 ()
51 |     2017-10-28 23:19.01: HTTP | 22.79M | 47:32 min | uwstream3.somafm.com (173...) to local-1 ()
52 |     2017-10-28 23:22.01: HTTPS | 1.21M | 3 sec | fra16s12-in-x0e.1e100.net (2a00:..) to local-1 ()
53 |     2017-10-28 23:23.01: SSH | 93.79M | 21 sec | remote-1 () to local-2 ()
54 |     2017-10-28 23:51.01: SSH | 14.08M | 1:23.09 hours | remote-1 () to local-2 ()
55 | 
56 | **Please note that the collector and analyzer are experimental reference implementations. Do not rely on them in production monitoring use cases!** In any case I recommend looking into the `netflow/collector.py` and `netflow/analyzer.py` scripts for customization. Feel free to use the code and extend it in your own tool set - that's what the MIT license is for!
57 | 
58 | 
59 | ## Resources
60 | * [Cisco NetFlow v9 paper](http://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html)
61 | * [RFC 3954 "Cisco Systems NetFlow Services Export Version 9"](https://tools.ietf.org/html/rfc3954)
62 | * [RFC 7011 "IPFIX Protocol Specification"](https://tools.ietf.org/html/rfc7011)
63 | 
64 | ## Development environment
65 | The library was specifically written in combination with NetFlow exports from Hitoshi Irino's fork of [softflowd](https://github.com/irino/softflowd) (v1.0.0) - it should work with every correct NetFlow/IPFIX implementation though. If you stumble upon new custom template fields please let me know, they will make a fine addition to the `netflow.v9.V9_FIELD_TYPES` collection.
66 | 
67 | ### Running and creating tests
68 | The test files contain tests for all use cases in the library, based on real softflowd export packets. Whenever `softflowd` is referenced, a compiled version of softflowd 1.0.0 is meant, which is probably NOT the one in your distribution's package. During the development of this library, two ways of gathering these hex dumps were used. First, the tcpdump/Wireshark export way:
69 | 
70 | 1. Run tcpdump/Wireshark on your public-facing interface (with tcpdump, save the pcap to disk).
71 | 2. Produce some sample flows, e.g. surf the web and refresh your mail client. With Wireshark, save the captured packets to disk.
72 | 3. Run tcpdump/Wireshark again on a local interface.
73 | 4. Run `softflowd` with the `-r <pcap file>` flag. softflowd reads the captured traffic, produces the flows and exports them. Use the interface you are capturing packets on to send the exports to. E.g. capture on the localhost interface (with `-i lo` or on loopback) and then let softflowd export to `127.0.0.1:1337`.
74 | 5. Examine the captured traffic. Use Wireshark and set the `CFLOW` "decode as" dissector on the export packets (e.g. based on the port). The `data` fields should then be shown correctly as NetFlow payload.
75 | 6. Extract this payload as a hex stream. Anonymize the IP addresses with a hex editor if necessary. A recommended hex editor is [bless](https://github.com/afrantzis/bless).
76 | 
77 | Second, a Docker way:
78 | 
79 | 1. Run a softflowd daemon in the background inside a Docker container, listening on `eth0` and exporting to e.g. `172.17.0.1:1337`.
80 | 2. On your host start Wireshark to listen on the Docker bridge.
81 | 3. Create some traffic from inside the container.
82 | 4. Check the softflow daemon with `softflowctl dump-flows`.
83 | 5. If you have some flows shown to you, export them with `softflowctl expire-all`.
84 | 6. Your Wireshark should have picked up the export packets (it does not matter if there's a port unreachable error).
85 | 7. Set the decoder for the packets to `CFLOW` and copy the hex value from the NetFlow packet.
86 | 
87 | Your exported hex string should begin with `0001`, `0005`, `0009` or `000a`, depending on the version.
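Such a hex stream can then be dropped into a unit test. A minimal sketch, assuming a hypothetical module-level constant (the shortened string below is a placeholder and must be replaced with a full capture):

```python
import unittest

import netflow

# Placeholder: paste the full hex stream extracted via Wireshark here
CAPTURED_V5_PACKET = "0005..."


class CaptureTestCase(unittest.TestCase):
    def test_capture_parses_as_v5(self):
        # parse_packet() accepts the hex string directly
        packet = netflow.parse_packet(CAPTURED_V5_PACKET)
        self.assertEqual(packet.header.version, 5)


if __name__ == "__main__":
    unittest.main()
```
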
88 | 89 | The collector is run in a background thread. The difference in transmission speed from the exporting client can lead to different results, possibly caused by race conditions during the usage of the GZIP output file. 90 | -------------------------------------------------------------------------------- /netflow/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """ 4 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd. 5 | 6 | Copyright 2016-2020 Dominik Pataky 7 | Licensed under MIT License. See LICENSE. 8 | """ 9 | 10 | from .utils import parse_packet 11 | -------------------------------------------------------------------------------- /netflow/analyzer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """ 4 | Reference analyzer script for NetFlow Python package. 5 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd. 6 | 7 | Copyright 2016-2020 Dominik Pataky 8 | Licensed under MIT License. See LICENSE. 9 | """ 10 | 11 | import argparse 12 | import contextlib 13 | import functools 14 | import gzip 15 | import ipaddress 16 | import json 17 | import logging 18 | import os.path 19 | import socket 20 | import sys 21 | from collections import namedtuple 22 | from datetime import datetime 23 | 24 | IP_PROTOCOLS = { 25 | 1: "ICMP", 26 | 6: "TCP", 27 | 17: "UDP", 28 | 58: "ICMPv6" 29 | } 30 | 31 | Pair = namedtuple('Pair', ['src', 'dest']) 32 | 33 | logger = logging.getLogger(__name__) 34 | ch = logging.StreamHandler() 35 | formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s') 36 | ch.setFormatter(formatter) 37 | logger.addHandler(ch) 38 | 39 | 40 | def printv(message, *args_, **kwargs): 41 | if args.verbose: 42 | print(message.format(*args_, **kwargs)) 43 | 44 | 45 | @functools.lru_cache(maxsize=None) 46 | def resolve_hostname(ip: str) -> str: 47 | if args.no_dns: 48 | # If no DNS resolution is requested, simply return the IP string 49 | return ip 50 | # else resolve the IP address to a hostname and return the hostname 51 | return socket.getfqdn(ip) 52 | 53 | 54 | def fallback(d, keys): 55 | for k in keys: 56 | if k in d: 57 | return d[k] 58 | raise KeyError(", ".join(keys)) 59 | 60 | 61 | def human_size(size_bytes): 62 | # Calculate a human readable size of the flow 63 | if size_bytes < 1024: 64 | return "%dB" % size_bytes 65 | elif size_bytes / 1024. < 1024: 66 | return "%.2fK" % (size_bytes / 1024.) 67 | elif size_bytes / 1024. ** 2 < 1024: 68 | return "%.2fM" % (size_bytes / 1024. ** 2) 69 | else: 70 | return "%.2fG" % (size_bytes / 1024. ** 3) 71 | 72 | 73 | def human_duration(seconds): 74 | # Calculate human readable duration times 75 | if seconds < 60: 76 | # seconds 77 | return "%d sec" % seconds 78 | if seconds / 60 > 60: 79 | # hours 80 | return "%d:%02d.%02d hours" % (seconds / 60 ** 2, seconds % 60 ** 2 / 60, seconds % 60) 81 | # minutes 82 | return "%02d:%02d min" % (seconds / 60, seconds % 60) 83 | 84 | 85 | class Connection: 86 | """Connection model for two flows. 87 | The direction of the data flow can be seen by looking at the size. 88 | 89 | 'src' describes the peer which sends more data towards the other. This 90 | does NOT have to mean that 'src' was the initiator of the connection. 
91 | """ 92 | 93 | def __init__(self, flow1, flow2): 94 | if not flow1 or not flow2: 95 | raise Exception("A connection requires two flows") 96 | 97 | # Assume the size that sent the most data is the source 98 | # TODO: this might not always be right, maybe use earlier timestamp? 99 | size1 = fallback(flow1, ['IN_BYTES', 'IN_OCTETS']) 100 | size2 = fallback(flow2, ['IN_BYTES', 'IN_OCTETS']) 101 | if size1 >= size2: 102 | src = flow1 103 | dest = flow2 104 | else: 105 | src = flow2 106 | dest = flow1 107 | 108 | # TODO: this next approach uses the lower port as the service identifier 109 | # port1 = fallback(flow1, ['L4_SRC_PORT', 'SRC_PORT']) 110 | # port2 = fallback(flow2, ['L4_SRC_PORT', 'SRC_PORT']) 111 | # 112 | # src = flow1 113 | # dest = flow2 114 | # if port1 > port2: 115 | # src = flow2 116 | # dest = flow1 117 | 118 | self.src_flow = src 119 | self.dest_flow = dest 120 | ips = self.get_ips(src) 121 | self.src = ips.src 122 | self.dest = ips.dest 123 | self.src_port = fallback(src, ['L4_SRC_PORT', 'SRC_PORT']) 124 | self.dest_port = fallback(dest, ['L4_DST_PORT', 'DST_PORT']) 125 | self.size = fallback(src, ['IN_BYTES', 'IN_OCTETS']) 126 | 127 | # Duration is given in milliseconds 128 | self.duration = src['LAST_SWITCHED'] - src['FIRST_SWITCHED'] 129 | if self.duration < 0: 130 | # 32 bit int has its limits. Handling overflow here 131 | # TODO: Should be handled in the collection phase 132 | self.duration = (2 ** 32 - src['FIRST_SWITCHED']) + src['LAST_SWITCHED'] 133 | 134 | def __repr__(self): 135 | return "".format( 136 | self.src, self.dest, self.human_size) 137 | 138 | @staticmethod 139 | def get_ips(flow): 140 | # IPv4 141 | if flow.get('IP_PROTOCOL_VERSION') == 4 or \ 142 | 'IPV4_SRC_ADDR' in flow or 'IPV4_DST_ADDR' in flow: 143 | return Pair( 144 | ipaddress.ip_address(flow['IPV4_SRC_ADDR']), 145 | ipaddress.ip_address(flow['IPV4_DST_ADDR']) 146 | ) 147 | 148 | # IPv6 149 | return Pair( 150 | ipaddress.ip_address(flow['IPV6_SRC_ADDR']), 151 | ipaddress.ip_address(flow['IPV6_DST_ADDR']) 152 | ) 153 | 154 | @property 155 | def human_size(self): 156 | return human_size(self.size) 157 | 158 | @property 159 | def human_duration(self): 160 | duration = self.duration // 1000 # uptime in milliseconds, floor it 161 | return human_duration(duration) 162 | 163 | @property 164 | def hostnames(self): 165 | # Resolve the IPs of this flows to their hostname 166 | src_hostname = resolve_hostname(self.src.compressed) 167 | dest_hostname = resolve_hostname(self.dest.compressed) 168 | return Pair(src_hostname, dest_hostname) 169 | 170 | @property 171 | def service(self): 172 | # Resolve ports to their services, if known 173 | default = "({} {})".format(self.src_port, self.dest_port) 174 | with contextlib.suppress(OSError): 175 | return socket.getservbyport(self.src_port) 176 | with contextlib.suppress(OSError): 177 | return socket.getservbyport(self.dest_port) 178 | return default 179 | 180 | @property 181 | def total_packets(self): 182 | return self.src_flow["IN_PKTS"] + self.dest_flow["IN_PKTS"] 183 | 184 | 185 | if __name__ == "netflow.analyzer": 186 | logger.error("The analyzer is currently meant to be used as a CLI tool only.") 187 | logger.error("Use 'python3 -m netflow.analyzer -h' in your console for additional help.") 188 | 189 | if __name__ == "__main__": 190 | parser = argparse.ArgumentParser(description="Output a basic analysis of NetFlow data") 191 | parser.add_argument("-f", "--file", dest="file", type=str, default=sys.stdin, 192 | help="The file to analyze (defaults to stdin if 
not provided)") 193 | parser.add_argument("-p", "--packets", dest="packets_threshold", type=int, default=10, 194 | help="Number of packets representing the lower bound in connections to be processed") 195 | parser.add_argument("-v", "--verbose", dest="verbose", action="store_true", 196 | help="Enable verbose output.") 197 | parser.add_argument("--match-host", dest="match_host", type=str, default=None, 198 | help="Filter output by matching on the given host (matches source or destination)") 199 | parser.add_argument("-n", "--no-dns", dest="no_dns", action="store_true", 200 | help="Disable DNS resolving of IP addresses") 201 | args = parser.parse_args() 202 | 203 | # Sanity check for IP address 204 | if args.match_host: 205 | try: 206 | match_host = ipaddress.ip_address(args.match_host) 207 | except ValueError: 208 | exit("IP address '{}' is neither IPv4 nor IPv6".format(args.match_host)) 209 | 210 | # Using a file and using stdin differ in their further usage for gzip.open 211 | file = args.file 212 | mode = "rb" # reading files 213 | if file != sys.stdin and not os.path.exists(file): 214 | exit("File {} does not exist!".format(file)) 215 | 216 | if file == sys.stdin: 217 | file = sys.stdin.buffer 218 | mode = "rt" # reading from stdin 219 | 220 | data = {} 221 | 222 | with gzip.open(file, mode) as gzipped: 223 | # "for line in" lazy-loads all lines in the file 224 | for line in gzipped: 225 | entry = json.loads(line) 226 | if len(entry.keys()) != 1: 227 | logger.warning("The line does not have exactly one timestamp key: \"{}\"".format(line.keys())) 228 | 229 | try: 230 | ts = list(entry)[0] # timestamp from key 231 | except KeyError: 232 | logger.error("Saved line \"{}\" has no timestamp key!".format(line)) 233 | continue 234 | 235 | if "header" not in entry[ts]: 236 | logger.error("No header dict in entry {}".format(ts)) 237 | raise ValueError 238 | 239 | if entry[ts]["header"]["version"] == 10: 240 | logger.warning("Skipped IPFIX entry, because analysis of IPFIX is not yet implemented") 241 | continue 242 | 243 | data[ts] = entry[ts] 244 | 245 | # Go through data and dissect every flow saved inside the dump 246 | 247 | # The following dict holds flows which are looking for a peer, to analyze a duplex 'Connection'. 248 | # For each flow, the destination address is looked up. If the peer is not in the list of pending peers, 249 | # insert this flow, waiting for its peer. If found, take the waiting peer and create a Connection object. 
250 | pending = {} 251 | skipped = 0 252 | skipped_threshold = args.packets_threshold 253 | 254 | first_line = True # print header line before first line 255 | 256 | for key in sorted(data): 257 | timestamp = datetime.fromtimestamp(float(key)).strftime("%Y-%m-%d %H:%M.%S") 258 | client = data[key]["client"] 259 | flows = data[key]["flows"] 260 | 261 | for flow in sorted(flows, key=lambda x: x["FIRST_SWITCHED"]): 262 | first_switched = flow["FIRST_SWITCHED"] 263 | 264 | if first_switched - 1 in pending: 265 | # TODO: handle fitting, yet mismatching (here: 1 second) pairs 266 | pass 267 | 268 | # Find the peer for this connection 269 | if "IPV4_SRC_ADDR" in flow or flow.get("IP_PROTOCOL_VERSION") == 4: 270 | local_peer = flow["IPV4_SRC_ADDR"] 271 | remote_peer = flow["IPV4_DST_ADDR"] 272 | else: 273 | local_peer = flow["IPV6_SRC_ADDR"] 274 | remote_peer = flow["IPV6_DST_ADDR"] 275 | 276 | # Match on host filter passed in as argument 277 | if args.match_host and not any([local_peer == args.match_host, remote_peer == args.match_host]): 278 | # If a match_host is given but neither local_peer nor remote_peer match 279 | continue 280 | 281 | if first_switched not in pending: 282 | pending[first_switched] = {} 283 | 284 | # Match peers 285 | if remote_peer in pending[first_switched]: 286 | # The destination peer put itself into the pending dict, getting and removing entry 287 | peer_flow = pending[first_switched].pop(remote_peer) 288 | if len(pending[first_switched]) == 0: 289 | del pending[first_switched] 290 | else: 291 | # Flow did not find a matching, pending peer - inserting itself 292 | pending[first_switched][local_peer] = flow 293 | continue 294 | 295 | con = Connection(flow, peer_flow) 296 | if con.total_packets < skipped_threshold: 297 | skipped += 1 298 | continue 299 | 300 | if first_line: 301 | print("{:19} | {:14} | {:8} | {:9} | {:7} | Involved hosts".format("Timestamp", "Service", "Size", 302 | "Duration", "Packets")) 303 | print("-" * 100) 304 | first_line = False 305 | 306 | print("{timestamp} | {service:<14} | {size:8} | {duration:9} | {packets:7} | " 307 | "Between {src_host} ({src}) and {dest_host} ({dest})" 308 | .format(timestamp=timestamp, service=con.service.upper(), src_host=con.hostnames.src, src=con.src, 309 | dest_host=con.hostnames.dest, dest=con.dest, size=con.human_size, duration=con.human_duration, 310 | packets=con.total_packets)) 311 | 312 | if skipped > 0: 313 | print("{skipped} connections skipped, because they had less than {skipped_threshold} packets " 314 | "(this value can be set with the -p flag).".format(skipped=skipped, skipped_threshold=skipped_threshold)) 315 | 316 | if not args.verbose: 317 | # Exit here if no debugging session was wanted 318 | exit(0) 319 | 320 | if len(pending) > 0: 321 | print("\nThere are {pending} first_switched entries left in the pending dict!".format(pending=len(pending))) 322 | all_noise = True 323 | for first_switched, flows in sorted(pending.items(), key=lambda x: x[0]): 324 | for peer, flow in flows.items(): 325 | # Ignore all pings, SYN scans and other noise to find only those peers left over which need a fix 326 | if flow["IN_PKTS"] < skipped_threshold: 327 | continue 328 | all_noise = False 329 | 330 | src = flow.get("IPV4_SRC_ADDR") or flow.get("IPV6_SRC_ADDR") 331 | src_host = resolve_hostname(src) 332 | src_text = "{}".format(src) if src == src_host else "{} ({})".format(src_host, src) 333 | dst = flow.get("IPV4_DST_ADDR") or flow.get("IPV6_DST_ADDR") 334 | dst_host = resolve_hostname(dst) 335 | dst_text = 
"{}".format(dst) if dst == dst_host else "{} ({})".format(dst_host, dst) 336 | proto = flow["PROTOCOL"] 337 | size = flow["IN_BYTES"] 338 | packets = flow["IN_PKTS"] 339 | src_port = flow.get("L4_SRC_PORT", 0) 340 | dst_port = flow.get("L4_DST_PORT", 0) 341 | 342 | print("From {src_text}:{src_port} to {dst_text}:{dst_port} with " 343 | "proto {proto} and size {size}" 344 | " ({packets} packets)".format(src_text=src_text, src_port=src_port, dst_text=dst_text, 345 | dst_port=dst_port, proto=IP_PROTOCOLS.get(proto, 'UNKNOWN'), 346 | size=human_size(size), packets=packets)) 347 | 348 | if all_noise: 349 | print("They were all noise!") 350 | -------------------------------------------------------------------------------- /netflow/collector.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """ 4 | Reference collector script for NetFlow v1, v5, and v9 Python package. 5 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd. 6 | 7 | Copyright 2016-2020 Dominik Pataky 8 | Licensed under MIT License. See LICENSE. 9 | """ 10 | import argparse 11 | import gzip 12 | import json 13 | import logging 14 | import queue 15 | import signal 16 | import socket 17 | import socketserver 18 | import threading 19 | import time 20 | from collections import namedtuple 21 | 22 | from netflow.ipfix import IPFIXTemplateNotRecognized 23 | from netflow.utils import UnknownExportVersion, parse_packet 24 | from netflow.v9 import V9TemplateNotRecognized 25 | 26 | RawPacket = namedtuple('RawPacket', ['ts', 'client', 'data']) 27 | ParsedPacket = namedtuple('ParsedPacket', ['ts', 'client', 'export']) 28 | 29 | # Amount of time to wait before dropping an undecodable ExportPacket 30 | PACKET_TIMEOUT = 60 * 60 31 | 32 | logger = logging.getLogger("netflow-collector") 33 | ch = logging.StreamHandler() 34 | formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s') 35 | ch.setFormatter(formatter) 36 | logger.addHandler(ch) 37 | 38 | 39 | class QueuingRequestHandler(socketserver.BaseRequestHandler): 40 | def handle(self): 41 | data = self.request[0] # get content, [1] would be the socket 42 | self.server.queue.put(RawPacket(time.time(), self.client_address, data)) 43 | logger.debug( 44 | "Received %d bytes of data from %s", len(data), self.client_address 45 | ) 46 | 47 | 48 | class QueuingUDPListener(socketserver.ThreadingUDPServer): 49 | """A threaded UDP server that adds a (time, data) tuple to a queue for 50 | every request it sees 51 | """ 52 | 53 | def __init__(self, interface, queue): 54 | self.queue = queue 55 | 56 | # If IPv6 interface addresses are used, override the default AF_INET family 57 | if ":" in interface[0]: 58 | self.address_family = socket.AF_INET6 59 | 60 | super().__init__(interface, QueuingRequestHandler) 61 | 62 | 63 | class ThreadedNetFlowListener(threading.Thread): 64 | """A thread that listens for incoming NetFlow packets, processes them, and 65 | makes them available to consumers. 66 | 67 | - When initialized, will start listening for NetFlow packets on the provided 68 | host and port and queuing them for processing. 69 | - When started, will start processing and parsing queued packets. 70 | - When stopped, will shut down the listener and stop processing. 
 71 |     - When joined, will wait for the listener to exit
 72 | 
 73 |     For example, a simple script that outputs data until killed with CTRL+C:
 74 |     >>> listener = ThreadedNetFlowListener('0.0.0.0', 2055)
 75 |     >>> print("Listening for NetFlow packets")
 76 |     >>> listener.start() # start processing packets
 77 |     >>> try:
 78 |     ...     while True:
 79 |     ...         ts, client, export = listener.get()
 80 |     ...         print("Time: {}".format(ts))
 81 |     ...         for f in export.flows:
 82 |     ...             print(" - {IPV4_SRC_ADDR} sent data to {IPV4_DST_ADDR}"
 83 |     ...                   "".format(**f))
 84 |     ... finally:
 85 |     ...     print("Stopping...")
 86 |     ...     listener.stop()
 87 |     ...     listener.join()
 88 |     ...     print("Stopped!")
 89 |     """
 90 | 
 91 |     def __init__(self, host: str, port: int):
 92 |         logger.info("Starting the NetFlow listener on {}:{}".format(host, port))
 93 |         self.output = queue.Queue()
 94 |         self.input = queue.Queue()
 95 |         self.server = QueuingUDPListener((host, port), self.input)
 96 |         self.thread = threading.Thread(target=self.server.serve_forever)
 97 |         self.thread.start()
 98 |         self._shutdown = threading.Event()
 99 |         super().__init__()
100 | 
101 |     def get(self, block=True, timeout=None) -> ParsedPacket:
102 |         """Get a processed flow.
103 | 
104 |         If optional args 'block' is true and 'timeout' is None (the default),
105 |         block if necessary until a flow is available. If 'timeout' is
106 |         a non-negative number, it blocks at most 'timeout' seconds and raises
107 |         the queue.Empty exception if no flow was available within that time.
108 |         Otherwise ('block' is false), return a flow if one is immediately
109 |         available, else raise the queue.Empty exception ('timeout' is ignored
110 |         in that case).
111 |         """
112 |         return self.output.get(block, timeout)
113 | 
114 |     def run(self):
115 |         # Process packets from the queue
116 |         try:
117 |             # TODO: use per-client templates
118 |             templates = {"netflow": {}, "ipfix": {}}
119 |             to_retry = []
120 |             while not self._shutdown.is_set():
121 |                 try:
122 |                     # 0.5s delay to limit CPU usage while waiting for new packets
123 |                     pkt = self.input.get(block=True, timeout=0.5)  # type: RawPacket
124 |                 except queue.Empty:
125 |                     continue
126 | 
127 |                 try:
128 |                     # templates is passed as reference, updated in V9ExportPacket
129 |                     export = parse_packet(pkt.data, templates)
130 |                 except UnknownExportVersion as e:
131 |                     logger.error("%s, ignoring the packet", e)
132 |                     continue
133 |                 except (V9TemplateNotRecognized, IPFIXTemplateNotRecognized):
134 |                     # TODO: differentiate between v9 and IPFIX, use separate to_retry lists
135 |                     if time.time() - pkt.ts > PACKET_TIMEOUT:
136 |                         logger.warning("Dropping an old and undecodable v9/IPFIX ExportPacket")
137 |                     else:
138 |                         to_retry.append(pkt)
139 |                         logger.debug("Failed to decode a v9/IPFIX ExportPacket - will "
140 |                                      "re-attempt when a new template is discovered")
141 |                     continue
142 | 
143 |                 if export.header.version == 10:
144 |                     logger.debug("Processed an IPFIX ExportPacket with length %d.", export.header.length)
145 |                 else:
146 |                     logger.debug("Processed a v%d ExportPacket with %d flows.",
147 |                                  export.header.version, export.header.count)
148 | 
149 |                 # If any new templates were discovered, dump the unprocessable
150 |                 # data back into the queue and try to decode them again
151 |                 if export.header.version in [9, 10] and export.contains_new_templates and to_retry:
152 |                     logger.debug("Received new template(s)")
153 |                     logger.debug("Will re-attempt to decode %d old v9/IPFIX ExportPackets", len(to_retry))
154 |                     for p in to_retry:
155 |                         self.input.put(p)
156 |                     to_retry.clear()
157 | 
158 |                 self.output.put(ParsedPacket(pkt.ts, pkt.client, export))
159 |         finally:
160 |             # Only reached when while loop ends
161 |             self.server.shutdown()
162 |             self.server.server_close()
163 | 
164 |     def stop(self):
165 |         logger.info("Shutting down the NetFlow listener")
166 |         self._shutdown.set()
167 | 
168 |     def join(self, timeout=None):
169 |         self.thread.join(timeout=timeout)
170 |         super().join(timeout=timeout)
171 | 
172 | 
173 | def get_export_packets(host: str, port: int) -> ParsedPacket:
174 |     """A threaded generator that will yield ExportPacket objects until it is killed
175 |     """
176 |     def handle_signal(s, f):
177 |         logger.debug("Received signal {}, raising StopIteration".format(s))
178 |         raise StopIteration
179 |     signal.signal(signal.SIGTERM, handle_signal)
180 |     signal.signal(signal.SIGINT, handle_signal)
181 | 
182 |     listener = ThreadedNetFlowListener(host, port)
183 |     listener.start()
184 | 
185 |     try:
186 |         while True:
187 |             yield listener.get()
188 |     except StopIteration:
189 |         pass
190 |     finally:
191 |         listener.stop()
192 |         listener.join()
193 | 
194 | 
195 | if __name__ == "netflow.collector":
196 |     logger.error("The collector is currently meant to be used as a CLI tool only.")
197 |     logger.error("Use 'python3 -m netflow.collector -h' in your console for additional help.")
198 | 
199 | if __name__ == "__main__":
200 |     parser = argparse.ArgumentParser(description="A sample netflow collector.")
201 |     parser.add_argument("--host", type=str, default="0.0.0.0",
202 |                         help="collector listening address")
203 |     parser.add_argument("--port", "-p", type=int, default=2055,
204 |                         help="collector listener port")
205 |     parser.add_argument("--file", "-o", type=str, dest="output_file",
206 |                         default="{}.gz".format(int(time.time())),
207 |                         help="collector export multiline JSON file")
208 |     parser.add_argument("--debug", "-D", action="store_true",
209 |                         help="Enable debug output")
210 |     args = parser.parse_args()
211 | 
212 |     if args.debug:
213 |         logger.setLevel(logging.DEBUG)
214 |         ch.setLevel(logging.DEBUG)
215 | 
216 |     try:
217 |         # With every parsed flow a new line is appended to the output file. In previous versions, this was implemented
218 |         # by storing the whole data dict in memory and dumping it regularly onto disk. This was extremely fragile, as
219 |         # it a) consumed a lot of memory and CPU (dropping packets since storing one flow took longer than the arrival
220 |         # of the next flow) and b) broke the exported JSON file, if the collector crashed during the write process,
221 |         # rendering all collected flows during the runtime of the collector useless (the file contained one large JSON
222 |         # dict which represented the 'data' dict).
223 | 
224 |         # In this new approach, each received flow is parsed as usual, but it gets appended to a gzipped file each time.
225 |         # All in all, this improves in three aspects:
226 |         # 1. collected flow data is not stored in memory any more
227 |         # 2. received and parsed flows are persisted reliably
228 |         # 3. the disk usage of files with JSON and its full strings as keys is reduced by using gzipped files
229 |         # This also means that the files have to be handled differently, because they are gzipped and not formatted as
230 |         # one single big JSON dump, but rather many little JSON dumps, separated by line breaks.
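        # As a sketch, reading such a dump back looks like this (it mirrors
        # what the analyzer does; "example.gz" is a placeholder filename):
        #
        #     with gzip.open("example.gz", "rb") as fh:
        #         for line in fh:
        #             entry = json.loads(line)  # one {timestamp: {...}} dict per line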
231 | for ts, client, export in get_export_packets(args.host, args.port): 232 | entry = {ts: { 233 | "client": client, 234 | "header": export.header.to_dict(), 235 | "flows": [flow.data for flow in export.flows]} 236 | } 237 | line = json.dumps(entry).encode() + b"\n" # byte encoded line 238 | with gzip.open(args.output_file, "ab") as fh: # open as append, not reading the whole file 239 | fh.write(line) 240 | except KeyboardInterrupt: 241 | logger.info("Received KeyboardInterrupt, passing through") 242 | pass 243 | -------------------------------------------------------------------------------- /netflow/ipfix.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """ 4 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd. 5 | Reference: https://tools.ietf.org/html/rfc7011 6 | 7 | Copyright 2016-2020 Dominik Pataky 8 | Licensed under MIT License. See LICENSE. 9 | """ 10 | import functools 11 | import struct 12 | from collections import namedtuple 13 | from typing import Optional, Union, List, Dict 14 | 15 | FieldType = namedtuple("FieldType", ["id", "name", "type"]) 16 | DataType = namedtuple("DataType", ["type", "unpack_format"]) 17 | TemplateField = namedtuple("TemplateField", ["id", "length"]) 18 | TemplateFieldEnterprise = namedtuple("TemplateFieldEnterprise", ["id", "length", "enterprise_number"]) 19 | 20 | 21 | class IPFIXFieldTypes: 22 | # Source: https://www.iana.org/assignments/ipfix/ipfix-information-elements.csv 23 | iana_field_types = [ 24 | (0, "Reserved", ""), 25 | (1, "octetDeltaCount", "unsigned64"), 26 | (2, "packetDeltaCount", "unsigned64"), 27 | (3, "deltaFlowCount", "unsigned64"), 28 | (4, "protocolIdentifier", "unsigned8"), 29 | (5, "ipClassOfService", "unsigned8"), 30 | (6, "tcpControlBits", "unsigned16"), 31 | (7, "sourceTransportPort", "unsigned16"), 32 | (8, "sourceIPv4Address", "ipv4Address"), 33 | (9, "sourceIPv4PrefixLength", "unsigned8"), 34 | (10, "ingressInterface", "unsigned32"), 35 | (11, "destinationTransportPort", "unsigned16"), 36 | (12, "destinationIPv4Address", "ipv4Address"), 37 | (13, "destinationIPv4PrefixLength", "unsigned8"), 38 | (14, "egressInterface", "unsigned32"), 39 | (15, "ipNextHopIPv4Address", "ipv4Address"), 40 | (16, "bgpSourceAsNumber", "unsigned32"), 41 | (17, "bgpDestinationAsNumber", "unsigned32"), 42 | (18, "bgpNextHopIPv4Address", "ipv4Address"), 43 | (19, "postMCastPacketDeltaCount", "unsigned64"), 44 | (20, "postMCastOctetDeltaCount", "unsigned64"), 45 | (21, "flowEndSysUpTime", "unsigned32"), 46 | (22, "flowStartSysUpTime", "unsigned32"), 47 | (23, "postOctetDeltaCount", "unsigned64"), 48 | (24, "postPacketDeltaCount", "unsigned64"), 49 | (25, "minimumIpTotalLength", "unsigned64"), 50 | (26, "maximumIpTotalLength", "unsigned64"), 51 | (27, "sourceIPv6Address", "ipv6Address"), 52 | (28, "destinationIPv6Address", "ipv6Address"), 53 | (29, "sourceIPv6PrefixLength", "unsigned8"), 54 | (30, "destinationIPv6PrefixLength", "unsigned8"), 55 | (31, "flowLabelIPv6", "unsigned32"), 56 | (32, "icmpTypeCodeIPv4", "unsigned16"), 57 | (33, "igmpType", "unsigned8"), 58 | (34, "samplingInterval", "unsigned32"), 59 | (35, "samplingAlgorithm", "unsigned8"), 60 | (36, "flowActiveTimeout", "unsigned16"), 61 | (37, "flowIdleTimeout", "unsigned16"), 62 | (38, "engineType", "unsigned8"), 63 | (39, "engineId", "unsigned8"), 64 | (40, "exportedOctetTotalCount", "unsigned64"), 65 | (41, "exportedMessageTotalCount", "unsigned64"), 66 | (42, 
"exportedFlowRecordTotalCount", "unsigned64"), 67 | (43, "ipv4RouterSc", "ipv4Address"), 68 | (44, "sourceIPv4Prefix", "ipv4Address"), 69 | (45, "destinationIPv4Prefix", "ipv4Address"), 70 | (46, "mplsTopLabelType", "unsigned8"), 71 | (47, "mplsTopLabelIPv4Address", "ipv4Address"), 72 | (48, "samplerId", "unsigned8"), 73 | (49, "samplerMode", "unsigned8"), 74 | (50, "samplerRandomInterval", "unsigned32"), 75 | (51, "classId", "unsigned8"), 76 | (52, "minimumTTL", "unsigned8"), 77 | (53, "maximumTTL", "unsigned8"), 78 | (54, "fragmentIdentification", "unsigned32"), 79 | (55, "postIpClassOfService", "unsigned8"), 80 | (56, "sourceMacAddress", "macAddress"), 81 | (57, "postDestinationMacAddress", "macAddress"), 82 | (58, "vlanId", "unsigned16"), 83 | (59, "postVlanId", "unsigned16"), 84 | (60, "ipVersion", "unsigned8"), 85 | (61, "flowDirection", "unsigned8"), 86 | (62, "ipNextHopIPv6Address", "ipv6Address"), 87 | (63, "bgpNextHopIPv6Address", "ipv6Address"), 88 | (64, "ipv6ExtensionHeaders", "unsigned32"), 89 | (70, "mplsTopLabelStackSection", "octetArray"), 90 | (71, "mplsLabelStackSection2", "octetArray"), 91 | (72, "mplsLabelStackSection3", "octetArray"), 92 | (73, "mplsLabelStackSection4", "octetArray"), 93 | (74, "mplsLabelStackSection5", "octetArray"), 94 | (75, "mplsLabelStackSection6", "octetArray"), 95 | (76, "mplsLabelStackSection7", "octetArray"), 96 | (77, "mplsLabelStackSection8", "octetArray"), 97 | (78, "mplsLabelStackSection9", "octetArray"), 98 | (79, "mplsLabelStackSection10", "octetArray"), 99 | (80, "destinationMacAddress", "macAddress"), 100 | (81, "postSourceMacAddress", "macAddress"), 101 | (82, "interfaceName", "string"), 102 | (83, "interfaceDescription", "string"), 103 | (84, "samplerName", "string"), 104 | (85, "octetTotalCount", "unsigned64"), 105 | (86, "packetTotalCount", "unsigned64"), 106 | (87, "flagsAndSamplerId", "unsigned32"), 107 | (88, "fragmentOffset", "unsigned16"), 108 | (89, "forwardingStatus", "unsigned8"), 109 | (90, "mplsVpnRouteDistinguisher", "octetArray"), 110 | (91, "mplsTopLabelPrefixLength", "unsigned8"), 111 | (92, "srcTrafficIndex", "unsigned32"), 112 | (93, "dstTrafficIndex", "unsigned32"), 113 | (94, "applicationDescription", "string"), 114 | (95, "applicationId", "octetArray"), 115 | (96, "applicationName", "string"), 116 | (97, "Assigned for NetFlow v9 compatibility", ""), 117 | (98, "postIpDiffServCodePoint", "unsigned8"), 118 | (99, "multicastReplicationFactor", "unsigned32"), 119 | (100, "className", "string"), 120 | (101, "classificationEngineId", "unsigned8"), 121 | (102, "layer2packetSectionOffset", "unsigned16"), 122 | (103, "layer2packetSectionSize", "unsigned16"), 123 | (104, "layer2packetSectionData", "octetArray"), 124 | (128, "bgpNextAdjacentAsNumber", "unsigned32"), 125 | (129, "bgpPrevAdjacentAsNumber", "unsigned32"), 126 | (130, "exporterIPv4Address", "ipv4Address"), 127 | (131, "exporterIPv6Address", "ipv6Address"), 128 | (132, "droppedOctetDeltaCount", "unsigned64"), 129 | (133, "droppedPacketDeltaCount", "unsigned64"), 130 | (134, "droppedOctetTotalCount", "unsigned64"), 131 | (135, "droppedPacketTotalCount", "unsigned64"), 132 | (136, "flowEndReason", "unsigned8"), 133 | (137, "commonPropertiesId", "unsigned64"), 134 | (138, "observationPointId", "unsigned64"), 135 | (139, "icmpTypeCodeIPv6", "unsigned16"), 136 | (140, "mplsTopLabelIPv6Address", "ipv6Address"), 137 | (141, "lineCardId", "unsigned32"), 138 | (142, "portId", "unsigned32"), 139 | (143, "meteringProcessId", "unsigned32"), 140 | (144, 
"exportingProcessId", "unsigned32"), 141 | (145, "templateId", "unsigned16"), 142 | (146, "wlanChannelId", "unsigned8"), 143 | (147, "wlanSSID", "string"), 144 | (148, "flowId", "unsigned64"), 145 | (149, "observationDomainId", "unsigned32"), 146 | (150, "flowStartSeconds", "dateTimeSeconds"), 147 | (151, "flowEndSeconds", "dateTimeSeconds"), 148 | (152, "flowStartMilliseconds", "dateTimeMilliseconds"), 149 | (153, "flowEndMilliseconds", "dateTimeMilliseconds"), 150 | (154, "flowStartMicroseconds", "dateTimeMicroseconds"), 151 | (155, "flowEndMicroseconds", "dateTimeMicroseconds"), 152 | (156, "flowStartNanoseconds", "dateTimeNanoseconds"), 153 | (157, "flowEndNanoseconds", "dateTimeNanoseconds"), 154 | (158, "flowStartDeltaMicroseconds", "unsigned32"), 155 | (159, "flowEndDeltaMicroseconds", "unsigned32"), 156 | (160, "systemInitTimeMilliseconds", "dateTimeMilliseconds"), 157 | (161, "flowDurationMilliseconds", "unsigned32"), 158 | (162, "flowDurationMicroseconds", "unsigned32"), 159 | (163, "observedFlowTotalCount", "unsigned64"), 160 | (164, "ignoredPacketTotalCount", "unsigned64"), 161 | (165, "ignoredOctetTotalCount", "unsigned64"), 162 | (166, "notSentFlowTotalCount", "unsigned64"), 163 | (167, "notSentPacketTotalCount", "unsigned64"), 164 | (168, "notSentOctetTotalCount", "unsigned64"), 165 | (169, "destinationIPv6Prefix", "ipv6Address"), 166 | (170, "sourceIPv6Prefix", "ipv6Address"), 167 | (171, "postOctetTotalCount", "unsigned64"), 168 | (172, "postPacketTotalCount", "unsigned64"), 169 | (173, "flowKeyIndicator", "unsigned64"), 170 | (174, "postMCastPacketTotalCount", "unsigned64"), 171 | (175, "postMCastOctetTotalCount", "unsigned64"), 172 | (176, "icmpTypeIPv4", "unsigned8"), 173 | (177, "icmpCodeIPv4", "unsigned8"), 174 | (178, "icmpTypeIPv6", "unsigned8"), 175 | (179, "icmpCodeIPv6", "unsigned8"), 176 | (180, "udpSourcePort", "unsigned16"), 177 | (181, "udpDestinationPort", "unsigned16"), 178 | (182, "tcpSourcePort", "unsigned16"), 179 | (183, "tcpDestinationPort", "unsigned16"), 180 | (184, "tcpSequenceNumber", "unsigned32"), 181 | (185, "tcpAcknowledgementNumber", "unsigned32"), 182 | (186, "tcpWindowSize", "unsigned16"), 183 | (187, "tcpUrgentPointer", "unsigned16"), 184 | (188, "tcpHeaderLength", "unsigned8"), 185 | (189, "ipHeaderLength", "unsigned8"), 186 | (190, "totalLengthIPv4", "unsigned16"), 187 | (191, "payloadLengthIPv6", "unsigned16"), 188 | (192, "ipTTL", "unsigned8"), 189 | (193, "nextHeaderIPv6", "unsigned8"), 190 | (194, "mplsPayloadLength", "unsigned32"), 191 | (195, "ipDiffServCodePoint", "unsigned8"), 192 | (196, "ipPrecedence", "unsigned8"), 193 | (197, "fragmentFlags", "unsigned8"), 194 | (198, "octetDeltaSumOfSquares", "unsigned64"), 195 | (199, "octetTotalSumOfSquares", "unsigned64"), 196 | (200, "mplsTopLabelTTL", "unsigned8"), 197 | (201, "mplsLabelStackLength", "unsigned32"), 198 | (202, "mplsLabelStackDepth", "unsigned32"), 199 | (203, "mplsTopLabelExp", "unsigned8"), 200 | (204, "ipPayloadLength", "unsigned32"), 201 | (205, "udpMessageLength", "unsigned16"), 202 | (206, "isMulticast", "unsigned8"), 203 | (207, "ipv4IHL", "unsigned8"), 204 | (208, "ipv4Options", "unsigned32"), 205 | (209, "tcpOptions", "unsigned64"), 206 | (210, "paddingOctets", "octetArray"), 207 | (211, "collectorIPv4Address", "ipv4Address"), 208 | (212, "collectorIPv6Address", "ipv6Address"), 209 | (213, "exportInterface", "unsigned32"), 210 | (214, "exportProtocolVersion", "unsigned8"), 211 | (215, "exportTransportProtocol", "unsigned8"), 212 | (216, "collectorTransportPort", 
"unsigned16"), 213 | (217, "exporterTransportPort", "unsigned16"), 214 | (218, "tcpSynTotalCount", "unsigned64"), 215 | (219, "tcpFinTotalCount", "unsigned64"), 216 | (220, "tcpRstTotalCount", "unsigned64"), 217 | (221, "tcpPshTotalCount", "unsigned64"), 218 | (222, "tcpAckTotalCount", "unsigned64"), 219 | (223, "tcpUrgTotalCount", "unsigned64"), 220 | (224, "ipTotalLength", "unsigned64"), 221 | (225, "postNATSourceIPv4Address", "ipv4Address"), 222 | (226, "postNATDestinationIPv4Address", "ipv4Address"), 223 | (227, "postNAPTSourceTransportPort", "unsigned16"), 224 | (228, "postNAPTDestinationTransportPort", "unsigned16"), 225 | (229, "natOriginatingAddressRealm", "unsigned8"), 226 | (230, "natEvent", "unsigned8"), 227 | (231, "initiatorOctets", "unsigned64"), 228 | (232, "responderOctets", "unsigned64"), 229 | (233, "firewallEvent", "unsigned8"), 230 | (234, "ingressVRFID", "unsigned32"), 231 | (235, "egressVRFID", "unsigned32"), 232 | (236, "VRFname", "string"), 233 | (237, "postMplsTopLabelExp", "unsigned8"), 234 | (238, "tcpWindowScale", "unsigned16"), 235 | (239, "biflowDirection", "unsigned8"), 236 | (240, "ethernetHeaderLength", "unsigned8"), 237 | (241, "ethernetPayloadLength", "unsigned16"), 238 | (242, "ethernetTotalLength", "unsigned16"), 239 | (243, "dot1qVlanId", "unsigned16"), 240 | (244, "dot1qPriority", "unsigned8"), 241 | (245, "dot1qCustomerVlanId", "unsigned16"), 242 | (246, "dot1qCustomerPriority", "unsigned8"), 243 | (247, "metroEvcId", "string"), 244 | (248, "metroEvcType", "unsigned8"), 245 | (249, "pseudoWireId", "unsigned32"), 246 | (250, "pseudoWireType", "unsigned16"), 247 | (251, "pseudoWireControlWord", "unsigned32"), 248 | (252, "ingressPhysicalInterface", "unsigned32"), 249 | (253, "egressPhysicalInterface", "unsigned32"), 250 | (254, "postDot1qVlanId", "unsigned16"), 251 | (255, "postDot1qCustomerVlanId", "unsigned16"), 252 | (256, "ethernetType", "unsigned16"), 253 | (257, "postIpPrecedence", "unsigned8"), 254 | (258, "collectionTimeMilliseconds", "dateTimeMilliseconds"), 255 | (259, "exportSctpStreamId", "unsigned16"), 256 | (260, "maxExportSeconds", "dateTimeSeconds"), 257 | (261, "maxFlowEndSeconds", "dateTimeSeconds"), 258 | (262, "messageMD5Checksum", "octetArray"), 259 | (263, "messageScope", "unsigned8"), 260 | (264, "minExportSeconds", "dateTimeSeconds"), 261 | (265, "minFlowStartSeconds", "dateTimeSeconds"), 262 | (266, "opaqueOctets", "octetArray"), 263 | (267, "sessionScope", "unsigned8"), 264 | (268, "maxFlowEndMicroseconds", "dateTimeMicroseconds"), 265 | (269, "maxFlowEndMilliseconds", "dateTimeMilliseconds"), 266 | (270, "maxFlowEndNanoseconds", "dateTimeNanoseconds"), 267 | (271, "minFlowStartMicroseconds", "dateTimeMicroseconds"), 268 | (272, "minFlowStartMilliseconds", "dateTimeMilliseconds"), 269 | (273, "minFlowStartNanoseconds", "dateTimeNanoseconds"), 270 | (274, "collectorCertificate", "octetArray"), 271 | (275, "exporterCertificate", "octetArray"), 272 | (276, "dataRecordsReliability", "boolean"), 273 | (277, "observationPointType", "unsigned8"), 274 | (278, "newConnectionDeltaCount", "unsigned32"), 275 | (279, "connectionSumDurationSeconds", "unsigned64"), 276 | (280, "connectionTransactionId", "unsigned64"), 277 | (281, "postNATSourceIPv6Address", "ipv6Address"), 278 | (282, "postNATDestinationIPv6Address", "ipv6Address"), 279 | (283, "natPoolId", "unsigned32"), 280 | (284, "natPoolName", "string"), 281 | (285, "anonymizationFlags", "unsigned16"), 282 | (286, "anonymizationTechnique", "unsigned16"), 283 | (287, 
"informationElementIndex", "unsigned16"), 284 | (288, "p2pTechnology", "string"), 285 | (289, "tunnelTechnology", "string"), 286 | (290, "encryptedTechnology", "string"), 287 | (291, "basicList", "basicList"), 288 | (292, "subTemplateList", "subTemplateList"), 289 | (293, "subTemplateMultiList", "subTemplateMultiList"), 290 | (294, "bgpValidityState", "unsigned8"), 291 | (295, "IPSecSPI", "unsigned32"), 292 | (296, "greKey", "unsigned32"), 293 | (297, "natType", "unsigned8"), 294 | (298, "initiatorPackets", "unsigned64"), 295 | (299, "responderPackets", "unsigned64"), 296 | (300, "observationDomainName", "string"), 297 | (301, "selectionSequenceId", "unsigned64"), 298 | (302, "selectorId", "unsigned64"), 299 | (303, "informationElementId", "unsigned16"), 300 | (304, "selectorAlgorithm", "unsigned16"), 301 | (305, "samplingPacketInterval", "unsigned32"), 302 | (306, "samplingPacketSpace", "unsigned32"), 303 | (307, "samplingTimeInterval", "unsigned32"), 304 | (308, "samplingTimeSpace", "unsigned32"), 305 | (309, "samplingSize", "unsigned32"), 306 | (310, "samplingPopulation", "unsigned32"), 307 | (311, "samplingProbability", "float64"), 308 | (312, "dataLinkFrameSize", "unsigned16"), 309 | (313, "ipHeaderPacketSection", "octetArray"), 310 | (314, "ipPayloadPacketSection", "octetArray"), 311 | (315, "dataLinkFrameSection", "octetArray"), 312 | (316, "mplsLabelStackSection", "octetArray"), 313 | (317, "mplsPayloadPacketSection", "octetArray"), 314 | (318, "selectorIdTotalPktsObserved", "unsigned64"), 315 | (319, "selectorIdTotalPktsSelected", "unsigned64"), 316 | (320, "absoluteError", "float64"), 317 | (321, "relativeError", "float64"), 318 | (322, "observationTimeSeconds", "dateTimeSeconds"), 319 | (323, "observationTimeMilliseconds", "dateTimeMilliseconds"), 320 | (324, "observationTimeMicroseconds", "dateTimeMicroseconds"), 321 | (325, "observationTimeNanoseconds", "dateTimeNanoseconds"), 322 | (326, "digestHashValue", "unsigned64"), 323 | (327, "hashIPPayloadOffset", "unsigned64"), 324 | (328, "hashIPPayloadSize", "unsigned64"), 325 | (329, "hashOutputRangeMin", "unsigned64"), 326 | (330, "hashOutputRangeMax", "unsigned64"), 327 | (331, "hashSelectedRangeMin", "unsigned64"), 328 | (332, "hashSelectedRangeMax", "unsigned64"), 329 | (333, "hashDigestOutput", "boolean"), 330 | (334, "hashInitialiserValue", "unsigned64"), 331 | (335, "selectorName", "string"), 332 | (336, "upperCILimit", "float64"), 333 | (337, "lowerCILimit", "float64"), 334 | (338, "confidenceLevel", "float64"), 335 | (339, "informationElementDataType", "unsigned8"), 336 | (340, "informationElementDescription", "string"), 337 | (341, "informationElementName", "string"), 338 | (342, "informationElementRangeBegin", "unsigned64"), 339 | (343, "informationElementRangeEnd", "unsigned64"), 340 | (344, "informationElementSemantics", "unsigned8"), 341 | (345, "informationElementUnits", "unsigned16"), 342 | (346, "privateEnterpriseNumber", "unsigned32"), 343 | (347, "virtualStationInterfaceId", "octetArray"), 344 | (348, "virtualStationInterfaceName", "string"), 345 | (349, "virtualStationUUID", "octetArray"), 346 | (350, "virtualStationName", "string"), 347 | (351, "layer2SegmentId", "unsigned64"), 348 | (352, "layer2OctetDeltaCount", "unsigned64"), 349 | (353, "layer2OctetTotalCount", "unsigned64"), 350 | (354, "ingressUnicastPacketTotalCount", "unsigned64"), 351 | (355, "ingressMulticastPacketTotalCount", "unsigned64"), 352 | (356, "ingressBroadcastPacketTotalCount", "unsigned64"), 353 | (357, "egressUnicastPacketTotalCount", 
"unsigned64"), 354 | (358, "egressBroadcastPacketTotalCount", "unsigned64"), 355 | (359, "monitoringIntervalStartMilliSeconds", "dateTimeMilliseconds"), 356 | (360, "monitoringIntervalEndMilliSeconds", "dateTimeMilliseconds"), 357 | (361, "portRangeStart", "unsigned16"), 358 | (362, "portRangeEnd", "unsigned16"), 359 | (363, "portRangeStepSize", "unsigned16"), 360 | (364, "portRangeNumPorts", "unsigned16"), 361 | (365, "staMacAddress", "macAddress"), 362 | (366, "staIPv4Address", "ipv4Address"), 363 | (367, "wtpMacAddress", "macAddress"), 364 | (368, "ingressInterfaceType", "unsigned32"), 365 | (369, "egressInterfaceType", "unsigned32"), 366 | (370, "rtpSequenceNumber", "unsigned16"), 367 | (371, "userName", "string"), 368 | (372, "applicationCategoryName", "string"), 369 | (373, "applicationSubCategoryName", "string"), 370 | (374, "applicationGroupName", "string"), 371 | (375, "originalFlowsPresent", "unsigned64"), 372 | (376, "originalFlowsInitiated", "unsigned64"), 373 | (377, "originalFlowsCompleted", "unsigned64"), 374 | (378, "distinctCountOfSourceIPAddress", "unsigned64"), 375 | (379, "distinctCountOfDestinationIPAddress", "unsigned64"), 376 | (380, "distinctCountOfSourceIPv4Address", "unsigned32"), 377 | (381, "distinctCountOfDestinationIPv4Address", "unsigned32"), 378 | (382, "distinctCountOfSourceIPv6Address", "unsigned64"), 379 | (383, "distinctCountOfDestinationIPv6Address", "unsigned64"), 380 | (384, "valueDistributionMethod", "unsigned8"), 381 | (385, "rfc3550JitterMilliseconds", "unsigned32"), 382 | (386, "rfc3550JitterMicroseconds", "unsigned32"), 383 | (387, "rfc3550JitterNanoseconds", "unsigned32"), 384 | (388, "dot1qDEI", "boolean"), 385 | (389, "dot1qCustomerDEI", "boolean"), 386 | (390, "flowSelectorAlgorithm", "unsigned16"), 387 | (391, "flowSelectedOctetDeltaCount", "unsigned64"), 388 | (392, "flowSelectedPacketDeltaCount", "unsigned64"), 389 | (393, "flowSelectedFlowDeltaCount", "unsigned64"), 390 | (394, "selectorIDTotalFlowsObserved", "unsigned64"), 391 | (395, "selectorIDTotalFlowsSelected", "unsigned64"), 392 | (396, "samplingFlowInterval", "unsigned64"), 393 | (397, "samplingFlowSpacing", "unsigned64"), 394 | (398, "flowSamplingTimeInterval", "unsigned64"), 395 | (399, "flowSamplingTimeSpacing", "unsigned64"), 396 | (400, "hashFlowDomain", "unsigned16"), 397 | (401, "transportOctetDeltaCount", "unsigned64"), 398 | (402, "transportPacketDeltaCount", "unsigned64"), 399 | (403, "originalExporterIPv4Address", "ipv4Address"), 400 | (404, "originalExporterIPv6Address", "ipv6Address"), 401 | (405, "originalObservationDomainId", "unsigned32"), 402 | (406, "intermediateProcessId", "unsigned32"), 403 | (407, "ignoredDataRecordTotalCount", "unsigned64"), 404 | (408, "dataLinkFrameType", "unsigned16"), 405 | (409, "sectionOffset", "unsigned16"), 406 | (410, "sectionExportedOctets", "unsigned16"), 407 | (411, "dot1qServiceInstanceTag", "octetArray"), 408 | (412, "dot1qServiceInstanceId", "unsigned32"), 409 | (413, "dot1qServiceInstancePriority", "unsigned8"), 410 | (414, "dot1qCustomerSourceMacAddress", "macAddress"), 411 | (415, "dot1qCustomerDestinationMacAddress", "macAddress"), 412 | (416, "", ""), 413 | (417, "postLayer2OctetDeltaCount", "unsigned64"), 414 | (418, "postMCastLayer2OctetDeltaCount", "unsigned64"), 415 | (419, "", ""), 416 | (420, "postLayer2OctetTotalCount", "unsigned64"), 417 | (421, "postMCastLayer2OctetTotalCount", "unsigned64"), 418 | (422, "minimumLayer2TotalLength", "unsigned64"), 419 | (423, "maximumLayer2TotalLength", "unsigned64"), 420 | (424, 
"droppedLayer2OctetDeltaCount", "unsigned64"), 421 | (425, "droppedLayer2OctetTotalCount", "unsigned64"), 422 | (426, "ignoredLayer2OctetTotalCount", "unsigned64"), 423 | (427, "notSentLayer2OctetTotalCount", "unsigned64"), 424 | (428, "layer2OctetDeltaSumOfSquares", "unsigned64"), 425 | (429, "layer2OctetTotalSumOfSquares", "unsigned64"), 426 | (430, "layer2FrameDeltaCount", "unsigned64"), 427 | (431, "layer2FrameTotalCount", "unsigned64"), 428 | (432, "pseudoWireDestinationIPv4Address", "ipv4Address"), 429 | (433, "ignoredLayer2FrameTotalCount", "unsigned64"), 430 | (434, "mibObjectValueInteger", "signed32"), 431 | (435, "mibObjectValueOctetString", "octetArray"), 432 | (436, "mibObjectValueOID", "octetArray"), 433 | (437, "mibObjectValueBits", "octetArray"), 434 | (438, "mibObjectValueIPAddress", "ipv4Address"), 435 | (439, "mibObjectValueCounter", "unsigned64"), 436 | (440, "mibObjectValueGauge", "unsigned32"), 437 | (441, "mibObjectValueTimeTicks", "unsigned32"), 438 | (442, "mibObjectValueUnsigned", "unsigned32"), 439 | (443, "mibObjectValueTable", "subTemplateList"), 440 | (444, "mibObjectValueRow", "subTemplateList"), 441 | (445, "mibObjectIdentifier", "octetArray"), 442 | (446, "mibSubIdentifier", "unsigned32"), 443 | (447, "mibIndexIndicator", "unsigned64"), 444 | (448, "mibCaptureTimeSemantics", "unsigned8"), 445 | (449, "mibContextEngineID", "octetArray"), 446 | (450, "mibContextName", "string"), 447 | (451, "mibObjectName", "string"), 448 | (452, "mibObjectDescription", "string"), 449 | (453, "mibObjectSyntax", "string"), 450 | (454, "mibModuleName", "string"), 451 | (455, "mobileIMSI", "string"), 452 | (456, "mobileMSISDN", "string"), 453 | (457, "httpStatusCode", "unsigned16"), 454 | (458, "sourceTransportPortsLimit", "unsigned16"), 455 | (459, "httpRequestMethod", "string"), 456 | (460, "httpRequestHost", "string"), 457 | (461, "httpRequestTarget", "string"), 458 | (462, "httpMessageVersion", "string"), 459 | (463, "natInstanceID", "unsigned32"), 460 | (464, "internalAddressRealm", "octetArray"), 461 | (465, "externalAddressRealm", "octetArray"), 462 | (466, "natQuotaExceededEvent", "unsigned32"), 463 | (467, "natThresholdEvent", "unsigned32"), 464 | (468, "httpUserAgent", "string"), 465 | (469, "httpContentType", "string"), 466 | (470, "httpReasonPhrase", "string"), 467 | (471, "maxSessionEntries", "unsigned32"), 468 | (472, "maxBIBEntries", "unsigned32"), 469 | (473, "maxEntriesPerUser", "unsigned32"), 470 | (474, "maxSubscribers", "unsigned32"), 471 | (475, "maxFragmentsPendingReassembly", "unsigned32"), 472 | (476, "addressPoolHighThreshold", "unsigned32"), 473 | (477, "addressPoolLowThreshold", "unsigned32"), 474 | (478, "addressPortMappingHighThreshold", "unsigned32"), 475 | (479, "addressPortMappingLowThreshold", "unsigned32"), 476 | (480, "addressPortMappingPerUserHighThreshold", "unsigned32"), 477 | (481, "globalAddressMappingHighThreshold", "unsigned32"), 478 | (482, "vpnIdentifier", "octetArray"), 479 | (483, "bgpCommunity", "unsigned32"), 480 | (484, "bgpSourceCommunityList", "basicList"), 481 | (485, "bgpDestinationCommunityList", "basicList"), 482 | (486, "bgpExtendedCommunity", "octetArray"), 483 | (487, "bgpSourceExtendedCommunityList", "basicList"), 484 | (488, "bgpDestinationExtendedCommunityList", "basicList"), 485 | (489, "bgpLargeCommunity", "octetArray"), 486 | (490, "bgpSourceLargeCommunityList", "basicList"), 487 | (491, "bgpDestinationLargeCommunityList", "basicList"), 488 | ] 489 | 490 | @classmethod 491 | @functools.lru_cache(maxsize=128) 492 | 
def by_id(cls, id_: int) -> Optional[FieldType]: 493 | for item in cls.iana_field_types: 494 | if item[0] == id_: 495 | return FieldType(*item) 496 | return None 497 | 498 | @classmethod 499 | @functools.lru_cache(maxsize=128) 500 | def by_name(cls, key: str) -> Optional[FieldType]: 501 | for item in cls.iana_field_types: 502 | if item[1] == key: 503 | return FieldType(*item) 504 | return None 505 | 506 | @classmethod 507 | @functools.lru_cache(maxsize=128) 508 | def get_type_unpack(cls, key: Union[int, str]) -> Optional[DataType]: 509 | """ 510 | This method covers the mapping from a field type to a struct.unpack format string. 511 | BLOCKED: due to Reduced-Size Encoding, fields may be exported with a smaller length than defined in 512 | the standard. Because of this mismatch, the parser in `IPFIXDataRecord.__init__` cannot use this method. 513 | :param key: 514 | :return: 515 | """ 516 | item = None 517 | if type(key) is int: 518 | item = cls.by_id(key) 519 | elif type(key) is str: 520 | item = cls.by_name(key) 521 | if not item: 522 | return None 523 | return IPFIXDataTypes.by_name(item.type) 524 | 525 | 526 | class IPFIXDataTypes: 527 | # Source: https://www.iana.org/assignments/ipfix/ipfix-information-element-data-types.csv 528 | # Reference: https://tools.ietf.org/html/rfc7011 529 | iana_data_types = [ 530 | ("octetArray", None), # has no encoding rules; it represents a raw array of zero or more octets 531 | ("unsigned8", "B"), 532 | ("unsigned16", "H"), 533 | ("unsigned32", "I"), 534 | ("unsigned64", "Q"), 535 | ("signed8", "b"), 536 | ("signed16", "h"), 537 | ("signed32", "i"), 538 | ("signed64", "q"), 539 | ("float32", "f"), 540 | ("float64", "d"), 541 | ("boolean", "?"), # encoded as a single-octet integer [..], with the value 1 for true and value 2 for false. 542 | ("macAddress", "6s"), 543 | ("string", None), # represents a finite-length string of valid characters of the Unicode encoding set 544 | ("dateTimeSeconds", "I"), 545 | ("dateTimeMilliseconds", "Q"), 546 | ("dateTimeMicroseconds", "8s"), # This field is made up of two unsigned 32-bit integers 547 | ("dateTimeNanoseconds", "8s"), # same as above 548 | ("ipv4Address", "4s"), 549 | ("ipv6Address", "16s"), 550 | 551 | # To be implemented 552 | # ("basicList", "x"), 553 | # ("subTemplateList", "x"), 554 | # ("subTemplateMultiList", "x"), 555 | ] 556 | 557 | @classmethod 558 | @functools.lru_cache(maxsize=128) 559 | def by_name(cls, key: str) -> Optional[DataType]: 560 | """ 561 | Get DataType by name if found, else None. 562 | :param key: 563 | :return: 564 | """ 565 | for t in cls.iana_data_types: 566 | if t[0] == key: 567 | return DataType(*t) 568 | return None 569 | 570 | @classmethod 571 | def is_signed(cls, dt: Union[DataType, str]) -> bool: 572 | """ 573 | Check if a data type is meant to be a signed integer. 574 | :param dt: 575 | :return: 576 | """ 577 | fields = ["signed8", "signed16", "signed32", "signed64"] 578 | if type(dt) is DataType: 579 | return dt.type in fields 580 | return dt in fields 581 | 582 | @classmethod 583 | def is_float(cls, dt: Union[DataType, str]) -> bool: 584 | """ 585 | Check if data type is meant to be a float. 586 | :param dt: 587 | :return: 588 | """ 589 | fields = ["float32", "float64"] 590 | if type(dt) is DataType: 591 | return dt.type in fields 592 | return dt in fields 593 | 594 | @classmethod 595 | def is_bytes(cls, dt: Union[DataType, str]) -> bool: 596 | """ 597 | Check if a data type is meant to be parsed as bytes. 
598 |         :param dt:
599 |         :return:
600 |         """
601 |         fields = ["octetArray", "string",
602 |                   "macAddress", "ipv4Address", "ipv6Address",
603 |                   "dateTimeMicroseconds", "dateTimeNanoseconds"]
604 |         if type(dt) is DataType:
605 |             return dt.type in fields
606 |         return dt in fields
607 |
608 |     @classmethod
609 |     def to_fitting_object(cls, field):
610 |         """
611 |         Could implement conversion to IPv4Address etc.
612 |         :param field:
613 |         :return:
614 |         """
615 |         pass
616 |
617 |
618 | class IPFIXMalformedRecord(Exception):
619 |     pass
620 |
621 |
622 | class IPFIXRFCError(Exception):
623 |     pass
624 |
625 |
626 | class IPFIXMalformedPacket(Exception):
627 |     pass
628 |
629 |
630 | class IPFIXTemplateError(Exception):
631 |     pass
632 |
633 |
634 | class IPFIXTemplateNotRecognized(KeyError):
635 |     pass
636 |
637 |
638 | class PaddingCalculationError(Exception):
639 |     pass
640 |
641 |
642 | class IPFIXHeader:
643 |     """The header of the IPFIX export packet
644 |     """
645 |     size = 16
646 |
647 |     def __init__(self, data):
648 |         pack = struct.unpack('!HHIII', data)
649 |         self.version = pack[0]
650 |         self.length = pack[1]
651 |         self.export_uptime = pack[2]
652 |         self.sequence_number = pack[3]
653 |         self.observation_domain_id = pack[4]
654 |
655 |     def to_dict(self):
656 |         return self.__dict__
657 |
658 |
659 | class IPFIXTemplateRecord:
660 |     def __init__(self, data):
661 |         pack = struct.unpack("!HH", data[:4])
662 |         self.template_id = pack[0]  # range 256 to 65535
663 |         self.field_count = pack[1]  # Number of fields in this Template Record
664 |
665 |         offset = 4
666 |         self.fields, offset_add = parse_fields(data[offset:], self.field_count)
667 |         offset += offset_add
668 |         if len(self.fields) != self.field_count:
669 |             raise IPFIXMalformedRecord
670 |         self._length = offset
671 |
672 |     def get_length(self):
673 |         return self._length
674 |
675 |     def __repr__(self):
676 |         return "<IPFIXTemplateRecord with {} fields>".format(len(self.fields))
677 |
678 |
679 | class IPFIXOptionsTemplateRecord:
680 |     def __init__(self, data):
681 |         pack = struct.unpack("!HHH", data[:6])
682 |         self.template_id = pack[0]  # range 256 to 65535
683 |         self.field_count = pack[1]  # includes count of scope fields
684 |
685 |         # A scope field count of N specifies that the first N Field Specifiers in
686 |         # the Template Record are Scope Fields. The Scope Field Count MUST NOT be zero.
687 |         self.scope_field_count = pack[2]
688 |
689 |         offset = 6
690 |
691 |         self.scope_fields, offset_add = parse_fields(data[offset:], self.scope_field_count)
692 |         if len(self.scope_fields) != self.scope_field_count:
693 |             raise IPFIXMalformedRecord
694 |         offset += offset_add
695 |
696 |         self.fields, offset_add = parse_fields(data[offset:], self.field_count - self.scope_field_count)
697 |         if len(self.fields) + len(self.scope_fields) != self.field_count:
698 |             raise IPFIXMalformedRecord
699 |         offset += offset_add
700 |
701 |         self._length = offset
702 |
703 |     def get_length(self):
704 |         return self._length
705 |
706 |     def __repr__(self):
707 |         return "<IPFIXOptionsTemplateRecord with {} scope fields and {} fields>".format(
708 |             len(self.scope_fields), len(self.fields)
709 |         )
710 |
711 |
712 | class IPFIXDataRecord:
713 |     """The IPFIX data record with fields and their value.
714 |     The field types are identified by the corresponding template.
715 |     In contrast to the NetFlow v9 implementation, this one does not use an extra class for the fields.
716 |     """
717 |
718 |     def __init__(self, data, template: List[Union[TemplateField, TemplateFieldEnterprise]]):
719 |         self.fields = set()
720 |         offset = 0
721 |         unpacker = "!"
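        # Editor's illustration (hypothetical template): two plain fields,
        #   (id=8 "sourceIPv4Address", length 4) -> ipv4Address, a bytes type -> "4s"
        #   (id=2 "packetDeltaCount",  length 8) -> unsigned64                -> "Q"
        # would make the loop below build unpacker == "!4sQ" for struct.unpack.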
722 | discovered_fields = [] 723 | 724 | # Iterate through all fields of this template and build the unpack format string 725 | # See https://www.iana.org/assignments/ipfix/ipfix.xhtml 726 | for index, field in enumerate(template): 727 | field_type_id = field.id 728 | field_length = field.length 729 | offset += field_length 730 | 731 | # Here, reduced-size encoding of fields blocks the usage of IPFIXFieldTypes.get_type_unpack. 732 | # See comment in IPFIXFieldTypes.get_type_unpack for more information. 733 | 734 | field_type = IPFIXFieldTypes.by_id(field_type_id) # type: Optional[FieldType] 735 | if not field_type and type(field) is not TemplateFieldEnterprise: 736 | # This should break, since the exporter seems to use a field identifier 737 | # which is not standardized by IANA. 738 | raise NotImplementedError("Field type with ID {} is not implemented".format(field_type_id)) 739 | 740 | datatype = field_type.type # type: str 741 | discovered_fields.append((datatype, field_type_id)) 742 | 743 | # Catch fields which are meant to be raw bytes and skip the rest 744 | if IPFIXDataTypes.is_bytes(datatype): 745 | unpacker += "{}s".format(field_length) 746 | continue 747 | 748 | # Go into int, uint, float types 749 | issigned = IPFIXDataTypes.is_signed(datatype) 750 | isfloat = IPFIXDataTypes.is_float(datatype) 751 | assert not (all([issigned, isfloat])) # signed int and float are exclusive 752 | 753 | if field_length == 1: 754 | unpacker += "b" if issigned else "B" 755 | elif field_length == 2: 756 | unpacker += "h" if issigned else "H" 757 | elif field_length == 4: 758 | unpacker += "i" if issigned else "f" if isfloat else "I" 759 | elif field_length == 8: 760 | unpacker += "q" if issigned else "d" if isfloat else "Q" 761 | else: 762 | raise IPFIXTemplateError("Template field_length {} not handled in unpacker".format(field_length)) 763 | 764 | # Finally, unpack the data byte stream according to format defined in iteration above 765 | pack = struct.unpack(unpacker, data[0:offset]) 766 | 767 | # Iterate through template again, but taking the unpacked values this time 768 | for index, ((field_datatype, field_type_id), value) in enumerate(zip(discovered_fields, pack)): 769 | if type(value) is bytes: 770 | # Check if value is raw bytes, so no conversion happened in struct.unpack 771 | if field_datatype in ["string"]: 772 | try: 773 | value = value.decode() 774 | except UnicodeDecodeError: 775 | value = str(value) 776 | # TODO: handle octetArray (= does not have to be unicode encoded) 777 | elif field_datatype in ["boolean"]: 778 | value = True if value == 1 else False # 2 = false per RFC 779 | elif field_datatype in ["dateTimeMicroseconds", "dateTimeNanoseconds"]: 780 | seconds = value[:4] 781 | fraction = value[4:] 782 | value = (int.from_bytes(seconds, "big"), int.from_bytes(fraction, "big")) 783 | else: 784 | value = int.from_bytes(value, "big") 785 | # If not bytes, struct.unpack already did necessary conversions (int, float...), 786 | # value can be used as-is. 
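            # Editor's illustration (hypothetical value): an ipv4Address field that
            # arrived as b"\x7f\x00\x00\x01" is handled by the int.from_bytes branch
            # above and stored as 2130706433, the integer form of 127.0.0.1.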
787 |             self.fields.add((field_type_id, value))
788 |
789 |         self._length = offset
790 |         self.__dict__.update(self.data)
791 |
792 |     def get_length(self):
793 |         return self._length
794 |
795 |     @property
796 |     def data(self):
797 |         return {
798 |             IPFIXFieldTypes.by_id(key)[1]: value for (key, value) in self.fields
799 |         }
800 |
801 |     def __repr__(self):
802 |         return "<IPFIXDataRecord with {} entries>".format(len(self.fields))
803 |
804 |
805 | class IPFIXSet:
806 |     """A set containing the set header and a collection of records (one of templates, options, data)
807 |     """
808 |
809 |     def __init__(self, data: bytes, templates):
810 |         self.header = IPFIXSetHeader(data[0:IPFIXSetHeader.size])
811 |         self.records = []
812 |         self._templates = {}
813 |
814 |         offset = IPFIXSetHeader.size  # fixed size
815 |
816 |         if self.header.set_id == 2:  # template set
817 |             while offset < self.header.length:  # length of whole set
818 |                 template_record = IPFIXTemplateRecord(data[offset:])
819 |                 self.records.append(template_record)
820 |                 if template_record.field_count == 0:
821 |                     # Should not happen, since RFC says "one or more"
822 |                     self._templates[template_record.template_id] = None
823 |                 else:
824 |                     self._templates[template_record.template_id] = template_record.fields
825 |                 offset += template_record.get_length()
826 |
827 |                 # If the rest of the data is deemed to be too small for another
828 |                 # template record, check existence of padding
829 |                 if (
830 |                         offset != self.header.length
831 |                         and self.header.length - offset <= 16  # 16 is chosen as a guess
832 |                         and rest_is_padding_zeroes(data[:self.header.length], offset)
833 |                 ):
834 |                     # Rest should be padding zeroes
835 |                     break
836 |
837 |         elif self.header.set_id == 3:  # options template
838 |             while offset < self.header.length:
839 |                 optionstemplate_record = IPFIXOptionsTemplateRecord(data[offset:])
840 |                 self.records.append(optionstemplate_record)
841 |                 if optionstemplate_record.field_count == 0:
842 |                     self._templates[optionstemplate_record.template_id] = None
843 |                 else:
844 |                     self._templates[optionstemplate_record.template_id] = \
845 |                         optionstemplate_record.scope_fields + optionstemplate_record.fields
846 |                 offset += optionstemplate_record.get_length()
847 |
848 |                 # If the rest of the data is deemed to be too small for another
849 |                 # options template record, check existence of padding
850 |                 if (
851 |                         offset != self.header.length
852 |                         and self.header.length - offset <= 16  # 16 is chosen as a guess
853 |                         and rest_is_padding_zeroes(data[:self.header.length], offset)
854 |                 ):
855 |                     # Rest should be padding zeroes
856 |                     break
857 |
858 |         elif self.header.set_id >= 256:  # data set, set_id is template id
859 |             # First, get the template behind the ID. Returns a list of fields or raises an exception
860 |             template_fields = templates.get(
861 |                 self.header.set_id)  # type: List[Union[TemplateField, TemplateFieldEnterprise]]
862 |             if not template_fields:
863 |                 raise IPFIXTemplateNotRecognized
864 |
865 |             # All template fields have a known length. Add them all together to get the length of the data set.
866 |             dataset_length = functools.reduce(lambda a, x: a + x.length, template_fields, 0)
867 |
868 |             # This is the last possible offset value if there's no padding.
869 |             # If there is padding, this value marks the beginning of the padding.
870 |             # Two cases possible:
871 |             # 1. No padding: then (4 + x * dataset_length) == self.header.length
872 |             # 2. Padding: then (4 + x * dataset_length + p) == self.header.length,
873 |             #    where p is the remaining length of padding zeroes.
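            # Editor's worked example (hypothetical numbers): with header.length == 100
            # and dataset_length == 24, (100 - 4) % 24 == 0, so the records fill the
            # set exactly; with header.length == 104 instead, (104 - 4) % 24 == 4, so
            # p == 4 and the last four bytes must be padding zeroes.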
            #    The modulo below calculates p.
874 |             no_padding_last_offset = self.header.length - ((self.header.length - IPFIXSetHeader.size) % dataset_length)
875 |
876 |             while offset < no_padding_last_offset:
877 |                 data_record = IPFIXDataRecord(data[offset:], template_fields)
878 |                 self.records.append(data_record)
879 |                 offset += data_record.get_length()
880 |
881 |             # Safety check
882 |             if (
883 |                     offset != self.header.length
884 |                     and not rest_is_padding_zeroes(data[:self.header.length], offset)
885 |             ):
886 |                 raise PaddingCalculationError
887 |
888 |         self._length = self.header.length
889 |
890 |     def get_length(self):
891 |         return self._length
892 |
893 |     @property
894 |     def is_template(self):
895 |         return self.header.set_id in [2, 3]
896 |
897 |     @property
898 |     def is_data(self):
899 |         return self.header.set_id >= 256
900 |
901 |     @property
902 |     def templates(self):
903 |         return self._templates
904 |
905 |     def __repr__(self):
906 |         return "<IPFIXSet with set_id {} and {} records>".format(self.header.set_id, len(self.records))
907 |
908 |
909 | class IPFIXSetHeader:
910 |     """Header of a set (collection of records)
911 |     """
912 |     size = 4
913 |
914 |     def __init__(self, data):
915 |         pack = struct.unpack("!HH", data)
916 |
917 |         # A value of 2 is reserved for Template Sets.
918 |         # A value of 3 is reserved for Options Template Sets. Values from 4
919 |         # to 255 are reserved for future use. Values 256 and above are used
920 |         # for Data Sets. The Set ID values of 0 and 1 are not used, for
921 |         # historical reasons [RFC3954].
922 |         self.set_id = pack[0]
923 |         if self.set_id in [0, 1] + [i for i in range(4, 256)]:
924 |             raise IPFIXRFCError("IPFIX set has forbidden ID {}".format(self.set_id))
925 |
926 |         self.length = pack[1]  # Total length of the Set, in octets, including the Set Header
927 |
928 |     def to_dict(self):
929 |         return self.__dict__
930 |
931 |     def __repr__(self):
932 |         return "<IPFIXSetHeader with set_id {} and length {}>".format(self.set_id, self.length)
933 |
934 |
935 | class IPFIXExportPacket:
936 |     """IPFIX export packet with header, templates, options and data flowsets
937 |     """
938 |
939 |     def __init__(self, data: bytes, templates: Dict[int, list]):
940 |         self.header = IPFIXHeader(data[:IPFIXHeader.size])
941 |         self.sets = []
942 |         self._contains_new_templates = False
943 |         self._flows = []
944 |         self._templates = templates
945 |
946 |         offset = IPFIXHeader.size
947 |         while offset < self.header.length:
948 |             try:
949 |                 new_set = IPFIXSet(data[offset:], templates)
950 |             except IPFIXTemplateNotRecognized:
951 |                 raise
952 |             if new_set.is_template:
953 |                 self._contains_new_templates = True
954 |                 self._templates.update(new_set.templates)
955 |                 # Iterate over a copy, since withdrawals delete entries from the dict
956 |                 for template_id, template_fields in list(self._templates.items()):
957 |                     if template_fields is None:
958 |                         # Template withdrawal
959 |                         del self._templates[template_id]
960 |             elif new_set.is_data:
961 |                 self._flows += new_set.records
962 |
963 |             self.sets.append(new_set)
964 |             offset += new_set.get_length()
965 |
966 |         # Here all data should be processed and offset set to the length
967 |         if offset != self.header.length:
968 |             raise IPFIXMalformedPacket
969 |
970 |     @property
971 |     def contains_new_templates(self) -> bool:
972 |         return self._contains_new_templates
973 |
974 |     @property
975 |     def flows(self):
976 |         return self._flows
977 |
978 |     @property
979 |     def templates(self):
980 |         return self._templates
981 |
982 |     def __repr__(self):
983 |         return "<IPFIXExportPacket with {} sets, exported at {}>".format(
984 |             len(self.sets), self.header.export_uptime
985 |         )
986 |
987 |
988 | def parse_fields(data: bytes, count: int) -> (list, int):
989 |     """
990 |     Parse fields from a bytes stream, based on
    the count of fields.
    Whether a field is an enterprise field or not is determined in this function.
    :param data:
    :param count:
    :return: List of fields and the new offset.
    """
    offset = 0
    fields = []  # type: List[Union[TemplateField, TemplateFieldEnterprise]]
    for ctr in range(count):
        if (data[offset] & (1 << 7)) != 0:  # enterprise flag set. Bitwise AND checks bit only in the first byte/octet
            pack = struct.unpack("!HHI", data[offset:offset + 8])
            fields.append(
                TemplateFieldEnterprise(
                    id=(pack[0] & ~(1 << 15)),  # clear enterprise flag bit. Bitwise AND and INVERT work on two bytes
                    length=pack[1],  # field length
                    enterprise_number=pack[2]  # enterprise number
                )
            )
            offset += 8
        else:
            pack = struct.unpack("!HH", data[offset:offset + 4])
            fields.append(
                TemplateField(
                    id=pack[0],
                    length=pack[1]
                )
            )
            offset += 4
    return fields, offset


def rest_is_padding_zeroes(data: bytes, offset: int) -> bool:
    if offset <= len(data):
        # padding zeroes, so the rest of the bytes must sum to 0
        if sum(data[offset:]) != 0:
            return False
        return True

    # If offset > len(data) there is an error
    raise ValueError("netflow.ipfix.rest_is_padding_zeroes received a greater offset value than there is data")
--------------------------------------------------------------------------------
/netflow/utils.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
5 |
6 | Copyright 2016-2020 Dominik Pataky
7 | Licensed under MIT License. See LICENSE.
8 | """
9 |
10 | import struct
11 | from typing import Union, Dict
12 |
13 | from .ipfix import IPFIXExportPacket
14 | from .v1 import V1ExportPacket
15 | from .v5 import V5ExportPacket
16 | from .v9 import V9ExportPacket
17 |
18 |
19 | class UnknownExportVersion(Exception):
20 |     def __init__(self, data, version):
21 |         self.data = data
22 |         self.version = version
23 |         r = repr(data)
24 |         data_str = ("{:.25}..." if len(r) >= 28 else "{}").format(r)
25 |         super().__init__(
26 |             "Unknown NetFlow version {} for data {}".format(version, data_str)
27 |         )
28 |
29 |
30 | def get_export_version(data):
31 |     return struct.unpack('!H', data[:2])[0]
32 |
33 |
34 | def parse_packet(data: Union[str, bytes], templates: Dict = None) \
35 |         -> Union[V1ExportPacket, V5ExportPacket, V9ExportPacket, IPFIXExportPacket]:
36 |     """
37 |     Parse an exported packet, either from string (hex) or from bytes.
38 |
39 |     NetFlow version 9 and IPFIX use dynamic templates, which are sent by the exporter in regular intervals.
40 |     These templates must be cached in between exports and are re-used for incoming new export packets.
41 |
42 |     The following pseudo-code might help to understand the use case better. First, the collector is started, a new
43 |     templates dict is created with default keys and an empty list for buffered packets is added. Then the receiver
44 |     loop is started. Each arriving packet is parsed if possible. If parsing fails due to unknown templates,
45 |     the packet is queued for later re-parsing (this functionality is not handled in this code snippet).
46 |
47 |     ```
48 |     collector = netflow.collector
49 |     coll = collector.start('0.0.0.0', 2055)
50 |     templates = {"netflow": {}, "ipfix": {}}
51 |     packets_with_unrecognized_templates = []
52 |     while coll.receive_export():
53 |         packet = coll.get_received_export_packet()
54 |         try:
55 |             parsed_packet = parse_packet(packet, templates)
56 |         except (V9TemplateNotRecognized, IPFIXTemplateNotRecognized):
57 |             packets_with_unrecognized_templates.append(packet)
58 |     ```
59 |
60 |     See the reference implementation of the collector for more information on how to use this function with templates.
61 |
62 |     :raises ValueError: When the templates parameter was not passed, but templates must be used (v9, IPFIX).
63 |     :raises UnknownExportVersion: When the exported version is not recognized.
64 |
65 |     :param data: The export packet as string or bytes.
66 |     :param templates: The templates dictionary with keys 'netflow' and 'ipfix' (created if not existing).
67 |     :return: The parsed packet.
68 |     """
69 |     if type(data) is str:
70 |         # hex dump passed as string
71 |         data = bytes.fromhex(data)
72 |     elif type(data) is bytes:
73 |         # check representation based on utf-8 decoding result
74 |         try:
75 |             # maybe a hex dump passed as bytes; try to decode and convert it
76 |             dec = data.decode()
77 |             data = bytes.fromhex(dec)
78 |         except UnicodeDecodeError:
79 |             # not valid UTF-8, so treat it as raw packet bytes and use it as given
80 |             pass
81 |
82 |     version = get_export_version(data)
83 |
84 |     if version in [9, 10] and templates is None:
85 |         raise ValueError("{} packet detected, but no templates dict was passed! For correct parsing of packets with "
86 |                          "templates, create a 'templates' dict and pass it into the 'parse_packet' function."
87 |                          .format("NetFlow v9" if version == 9 else "IPFIX"))
88 |
89 |     if version == 1:
90 |         return V1ExportPacket(data)
91 |     elif version == 5:
92 |         return V5ExportPacket(data)
93 |     elif version == 9:
94 |         if "netflow" not in templates:
95 |             templates["netflow"] = {}
96 |         return V9ExportPacket(data, templates["netflow"])
97 |     elif version == 10:
98 |         if "ipfix" not in templates:
99 |             templates["ipfix"] = {}
100 |         return IPFIXExportPacket(data, templates["ipfix"])
101 |     raise UnknownExportVersion(data, version)
102 |
--------------------------------------------------------------------------------
/netflow/v1.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | Netflow V1 collector and parser implementation in Python 3.
5 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
6 | Created purely for fun. Not battle tested nor will it be.
7 |
8 | Reference: https://www.cisco.com/c/en/us/td/docs/net_mgmt/netflow_collection_engine/3-6/user/guide/format.html
9 | This script is specifically implemented in combination with softflowd.
See https://github.com/djmdjm/softflowd
10 | """
11 |
12 | import struct
13 |
14 | __all__ = ["V1DataFlow", "V1ExportPacket", "V1Header"]
15 |
16 |
17 | class V1DataFlow:
18 |     """Holds one v1 DataRecord
19 |     """
20 |     length = 48
21 |
22 |     def __init__(self, data):
23 |         pack = struct.unpack('!IIIHHIIIIHHxxBBBxxxxxxx', data)
24 |         fields = [
25 |             'IPV4_SRC_ADDR',
26 |             'IPV4_DST_ADDR',
27 |             'NEXT_HOP',
28 |             'INPUT',
29 |             'OUTPUT',
30 |             'IN_PACKETS',
31 |             'IN_OCTETS',
32 |             'FIRST_SWITCHED',
33 |             'LAST_SWITCHED',
34 |             'SRC_PORT',
35 |             'DST_PORT',
36 |             # Word at 36-37 is used for padding
37 |             'PROTO',
38 |             'TOS',
39 |             'TCP_FLAGS',
40 |             # Data at 41-47 is padding/reserved
41 |         ]
42 |
43 |         self.data = {}
44 |         for idx, field in enumerate(fields):
45 |             self.data[field] = pack[idx]
46 |         self.__dict__.update(self.data)  # Make data dict entries accessible as object attributes
47 |
48 |     def __repr__(self):
49 |         return "<DataRecord v1 with data {}>".format(self.data)
50 |
51 |
52 | class V1Header:
53 |     """The header of the V1ExportPacket
54 |     """
55 |     length = 16
56 |
57 |     def __init__(self, data):
58 |         pack = struct.unpack('!HHIII', data[:self.length])
59 |         self.version = pack[0]
60 |         self.count = pack[1]
61 |         self.uptime = pack[2]
62 |         self.timestamp = pack[3]
63 |         self.timestamp_nano = pack[4]
64 |
65 |     def to_dict(self):
66 |         return self.__dict__
67 |
68 |
69 | class V1ExportPacket:
70 |     """The flow record holds the header and data flowsets.
71 |     """
72 |
73 |     def __init__(self, data):
74 |         self.flows = []
75 |         self.header = V1Header(data)
76 |
77 |         offset = self.header.length
78 |         for flow_count in range(0, self.header.count):
79 |             end = offset + V1DataFlow.length
80 |             flow = V1DataFlow(data[offset:end])
81 |             self.flows.append(flow)
82 |             offset += flow.length
83 |
84 |     def __repr__(self):
85 |         return "<ExportPacket v{} with {} records>".format(
86 |             self.header.version, self.header.count)
87 |
--------------------------------------------------------------------------------
/netflow/v5.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | Netflow V5 collector and parser implementation in Python 3.
5 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
6 | Created purely for fun. Not battle tested nor will it be.
7 |
8 | Reference: https://www.cisco.com/c/en/us/td/docs/net_mgmt/netflow_collection_engine/3-6/user/guide/format.html
9 | This script is specifically implemented in combination with softflowd.
See https://github.com/djmdjm/softflowd
10 | """
11 |
12 | import struct
13 |
14 | __all__ = ["V5DataFlow", "V5ExportPacket", "V5Header"]
15 |
16 |
17 | class V5DataFlow:
18 |     """Holds one v5 DataRecord
19 |     """
20 |     length = 48
21 |
22 |     def __init__(self, data):
23 |         pack = struct.unpack("!IIIHHIIIIHHxBBBHHBBxx", data)
24 |         fields = [
25 |             'IPV4_SRC_ADDR',
26 |             'IPV4_DST_ADDR',
27 |             'NEXT_HOP',
28 |             'INPUT',
29 |             'OUTPUT',
30 |             'IN_PACKETS',
31 |             'IN_OCTETS',
32 |             'FIRST_SWITCHED',
33 |             'LAST_SWITCHED',
34 |             'SRC_PORT',
35 |             'DST_PORT',
36 |             # Byte 36 is used for padding
37 |             'TCP_FLAGS',
38 |             'PROTO',
39 |             'TOS',
40 |             'SRC_AS',
41 |             'DST_AS',
42 |             'SRC_MASK',
43 |             'DST_MASK',
44 |             # Word 46 is used for padding
45 |         ]
46 |
47 |         self.data = {}
48 |         for idx, field in enumerate(fields):
49 |             self.data[field] = pack[idx]
50 |         self.__dict__.update(self.data)  # Make data dict entries accessible as object attributes
51 |
52 |     def __repr__(self):
53 |         return "<DataRecord v5 with data {}>".format(self.data)
54 |
55 |
56 | class V5Header:
57 |     """The header of the V5ExportPacket
58 |     """
59 |     length = 24
60 |
61 |     def __init__(self, data):
62 |         pack = struct.unpack('!HHIIIIBBH', data[:self.length])
63 |         self.version = pack[0]
64 |         self.count = pack[1]
65 |         self.uptime = pack[2]
66 |         self.timestamp = pack[3]
67 |         self.timestamp_nano = pack[4]
68 |         self.sequence = pack[5]
69 |         self.engine_type = pack[6]
70 |         self.engine_id = pack[7]
71 |         self.sampling_interval = pack[8]
72 |
73 |     def to_dict(self):
74 |         return self.__dict__
75 |
76 |
77 | class V5ExportPacket:
78 |     """The flow record holds the header and data flowsets.
79 |     """
80 |
81 |     def __init__(self, data):
82 |         self.flows = []
83 |         self.header = V5Header(data)
84 |
85 |         offset = self.header.length
86 |         for flow_count in range(0, self.header.count):
87 |             end = offset + V5DataFlow.length
88 |             flow = V5DataFlow(data[offset:end])
89 |             self.flows.append(flow)
90 |             offset += flow.length
91 |
92 |     def __repr__(self):
93 |         return "<ExportPacket v{} with {} records>".format(
94 |             self.header.version, self.header.count)
95 |
--------------------------------------------------------------------------------
/netflow/v9.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | Netflow V9 collector and parser implementation in Python 3.
5 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
6 | Created for learning purposes and unsatisfying alternatives.
7 |
8 | Reference: https://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html
9 | This script is specifically implemented in combination with softflowd. See https://github.com/djmdjm/softflowd
10 |
11 | Copyright 2016-2020 Dominik Pataky
12 | Licensed under MIT License. See LICENSE.
13 | """
14 |
15 | import ipaddress
16 | import struct
17 | import sys
18 |
19 | from .ipfix import IPFIXFieldTypes, IPFIXDataTypes
20 |
21 | __all__ = ["V9DataFlowSet", "V9DataRecord", "V9ExportPacket", "V9Header", "V9TemplateField",
22 |            "V9TemplateFlowSet", "V9TemplateNotRecognized", "V9TemplateRecord",
23 |            "V9OptionsTemplateFlowSet", "V9OptionsTemplateRecord", "V9OptionsDataRecord"]
24 |
25 | V9_FIELD_TYPES_CONTAINING_IP = [8, 12, 15, 18, 27, 28, 62, 63]
26 |
27 | V9_FIELD_TYPES = {
28 |     0: 'UNKNOWN_FIELD_TYPE',  # fallback for unknown field types
29 |
30 |     # Cisco specs for NetFlow v9
31 |     # https://tools.ietf.org/html/rfc3954
32 |     # https://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html
33 |     1: 'IN_BYTES',
34 |     2: 'IN_PKTS',
35 |     3: 'FLOWS',
36 |     4: 'PROTOCOL',
37 |     5: 'SRC_TOS',
38 |     6: 'TCP_FLAGS',
39 |     7: 'L4_SRC_PORT',
40 |     8: 'IPV4_SRC_ADDR',
41 |     9: 'SRC_MASK',
42 |     10: 'INPUT_SNMP',
43 |     11: 'L4_DST_PORT',
44 |     12: 'IPV4_DST_ADDR',
45 |     13: 'DST_MASK',
46 |     14: 'OUTPUT_SNMP',
47 |     15: 'IPV4_NEXT_HOP',
48 |     16: 'SRC_AS',
49 |     17: 'DST_AS',
50 |     18: 'BGP_IPV4_NEXT_HOP',
51 |     19: 'MUL_DST_PKTS',
52 |     20: 'MUL_DST_BYTES',
53 |     21: 'LAST_SWITCHED',
54 |     22: 'FIRST_SWITCHED',
55 |     23: 'OUT_BYTES',
56 |     24: 'OUT_PKTS',
57 |     25: 'MIN_PKT_LNGTH',
58 |     26: 'MAX_PKT_LNGTH',
59 |     27: 'IPV6_SRC_ADDR',
60 |     28: 'IPV6_DST_ADDR',
61 |     29: 'IPV6_SRC_MASK',
62 |     30: 'IPV6_DST_MASK',
63 |     31: 'IPV6_FLOW_LABEL',
64 |     32: 'ICMP_TYPE',
65 |     33: 'MUL_IGMP_TYPE',
66 |     34: 'SAMPLING_INTERVAL',
67 |     35: 'SAMPLING_ALGORITHM',
68 |     36: 'FLOW_ACTIVE_TIMEOUT',
69 |     37: 'FLOW_INACTIVE_TIMEOUT',
70 |     38: 'ENGINE_TYPE',
71 |     39: 'ENGINE_ID',
72 |     40: 'TOTAL_BYTES_EXP',
73 |     41: 'TOTAL_PKTS_EXP',
74 |     42: 'TOTAL_FLOWS_EXP',
75 |     # 43 vendor proprietary
76 |     44: 'IPV4_SRC_PREFIX',
77 |     45: 'IPV4_DST_PREFIX',
78 |     46: 'MPLS_TOP_LABEL_TYPE',
79 |     47: 'MPLS_TOP_LABEL_IP_ADDR',
80 |     48: 'FLOW_SAMPLER_ID',
81 |     49: 'FLOW_SAMPLER_MODE',
82 |     50: 'FLOW_SAMPLER_RANDOM_INTERVAL',
83 |     # 51 vendor proprietary
84 |     52: 'MIN_TTL',
85 |     53: 'MAX_TTL',
86 |     54: 'IPV4_IDENT',
87 |     55: 'DST_TOS',
88 |     56: 'IN_SRC_MAC',
89 |     57: 'OUT_DST_MAC',
90 |     58: 'SRC_VLAN',
91 |     59: 'DST_VLAN',
92 |     60: 'IP_PROTOCOL_VERSION',
93 |     61: 'DIRECTION',
94 |     62: 'IPV6_NEXT_HOP',
95 |     63: 'BGP_IPV6_NEXT_HOP',
96 |     64: 'IPV6_OPTION_HEADERS',
97 |     # 65-69 vendor proprietary
98 |     70: 'MPLS_LABEL_1',
99 |     71: 'MPLS_LABEL_2',
100 |     72: 'MPLS_LABEL_3',
101 |     73: 'MPLS_LABEL_4',
102 |     74: 'MPLS_LABEL_5',
103 |     75: 'MPLS_LABEL_6',
104 |     76: 'MPLS_LABEL_7',
105 |     77: 'MPLS_LABEL_8',
106 |     78: 'MPLS_LABEL_9',
107 |     79: 'MPLS_LABEL_10',
108 |     80: 'IN_DST_MAC',
109 |     81: 'OUT_SRC_MAC',
110 |     82: 'IF_NAME',
111 |     83: 'IF_DESC',
112 |     84: 'SAMPLER_NAME',
113 |     85: 'IN_PERMANENT_BYTES',
114 |     86: 'IN_PERMANENT_PKTS',
115 |     # 87 vendor proprietary
116 |     88: 'FRAGMENT_OFFSET',
117 |     89: 'FORWARDING_STATUS',
118 |     90: 'MPLS_PAL_RD',
119 |     91: 'MPLS_PREFIX_LEN',  # Number of consecutive bits in the MPLS prefix length.
120 | 92: 'SRC_TRAFFIC_INDEX', # BGP Policy Accounting Source Traffic Index 121 | 93: 'DST_TRAFFIC_INDEX', # BGP Policy Accounting Destination Traffic Index 122 | 94: 'APPLICATION_DESCRIPTION', # Application description 123 | 95: 'APPLICATION_TAG', # 8 bits of engine ID, followed by n bits of classification 124 | 96: 'APPLICATION_NAME', # Name associated with a classification 125 | 98: 'postipDiffServCodePoint', # The value of a Differentiated Services Code Point (DSCP) 126 | # encoded in the Differentiated Services Field, after modification 127 | 99: 'replication_factor', # Multicast replication factor 128 | 100: 'DEPRECATED', # DEPRECATED 129 | 102: 'layer2packetSectionOffset', # Layer 2 packet section offset. Potentially a generic offset 130 | 103: 'layer2packetSectionSize', # Layer 2 packet section size. Potentially a generic size 131 | 104: 'layer2packetSectionData', # Layer 2 packet section data 132 | # 105-127 reserved for future use by Cisco 133 | 134 | # ASA extensions 135 | # https://www.cisco.com/c/en/us/td/docs/security/asa/special/netflow/guide/asa_netflow.html 136 | 148: 'NF_F_CONN_ID', # An identifier of a unique flow for the device 137 | 176: 'NF_F_ICMP_TYPE', # ICMP type value 138 | 177: 'NF_F_ICMP_CODE', # ICMP code value 139 | 178: 'NF_F_ICMP_TYPE_IPV6', # ICMP IPv6 type value 140 | 179: 'NF_F_ICMP_CODE_IPV6', # ICMP IPv6 code value 141 | 225: 'NF_F_XLATE_SRC_ADDR_IPV4', # Post NAT Source IPv4 Address 142 | 226: 'NF_F_XLATE_DST_ADDR_IPV4', # Post NAT Destination IPv4 Address 143 | 227: 'NF_F_XLATE_SRC_PORT', # Post NATT Source Transport Port 144 | 228: 'NF_F_XLATE_DST_PORT', # Post NATT Destination Transport Port 145 | 281: 'NF_F_XLATE_SRC_ADDR_IPV6', # Post NAT Source IPv6 Address 146 | 282: 'NF_F_XLATE_DST_ADDR_IPV6', # Post NAT Destination IPv6 Address 147 | 233: 'NF_F_FW_EVENT', # High-level event code 148 | 33002: 'NF_F_FW_EXT_EVENT', # Extended event code 149 | 323: 'NF_F_EVENT_TIME_MSEC', # The time that the event occurred, which comes from IPFIX 150 | 152: 'NF_F_FLOW_CREATE_TIME_MSEC', 151 | 231: 'NF_F_FWD_FLOW_DELTA_BYTES', # The delta number of bytes from source to destination 152 | 232: 'NF_F_REV_FLOW_DELTA_BYTES', # The delta number of bytes from destination to source 153 | 33000: 'NF_F_INGRESS_ACL_ID', # The input ACL that permitted or denied the flow 154 | 33001: 'NF_F_EGRESS_ACL_ID', # The output ACL that permitted or denied a flow 155 | 40000: 'NF_F_USERNAME', # AAA username 156 | 157 | # PaloAlto PAN-OS 8.0 158 | # https://www.paloaltonetworks.com/documentation/80/pan-os/pan-os/monitoring/netflow-monitoring/netflow-templates 159 | 346: 'PANOS_privateEnterpriseNumber', 160 | 56701: 'PANOS_APPID', 161 | 56702: 'PANOS_USERID' 162 | } 163 | 164 | V9_SCOPE_TYPES = { 165 | 1: "System", 166 | 2: "Interface", 167 | 3: "Line Card", 168 | 4: "Cache", 169 | 5: "Template" 170 | } 171 | 172 | 173 | class V9TemplateNotRecognized(KeyError): 174 | pass 175 | 176 | 177 | class V9DataRecord: 178 | """This is a 'flow' as we want it from our source. What it contains is 179 | variable in NetFlow V9, so to work with the data you have to analyze the 180 | data dict keys (which are integers and can be mapped with the FIELD_TYPES 181 | dict). 182 | Should hold a 'data' dict with keys=field_type (integer) and value (in bytes). 
183 |     """
184 |
185 |     def __init__(self):
186 |         self.data = {}
187 |
188 |     def __repr__(self):
189 |         return "<DataRecord v9 with data {}>".format(self.data)
190 |
191 |
192 | class V9DataFlowSet:
193 |     """Holds one or multiple DataRecord which are all defined after the same
194 |     template. This template is referenced in the field 'flowset_id' of this
195 |     DataFlowSet and must not be zero.
196 |     """
197 |
198 |     def __init__(self, data, template):
199 |         pack = struct.unpack('!HH', data[:4])
200 |
201 |         self.template_id = pack[0]  # flowset_id is reference to a template_id
202 |         self.length = pack[1]
203 |         self.flows = []
204 |
205 |         offset = 4
206 |
207 |         # Because the field lengths are variable, v9 pads each flowset to the next 32-bit boundary
208 |         padding_size = 4 - (self.length % 4)  # 4 Byte
209 |
210 |         # For performance reasons, we use struct.unpack to get individual values. Here
211 |         # we prepare the format string for parsing it. The format string is based on the template fields and their
212 |         # lengths. The string can then be re-used for every data record in the data stream
213 |         struct_format = '!'
214 |         struct_len = 0
215 |         for field in template.fields:
216 |             # The length of the value byte slice is defined in the template
217 |             flen = field.field_length
218 |             if flen == 4:
219 |                 struct_format += 'L'
220 |             elif flen == 2:
221 |                 struct_format += 'H'
222 |             elif flen == 1:
223 |                 struct_format += 'B'
224 |             else:
225 |                 struct_format += '%ds' % flen
226 |             struct_len += flen
227 |
228 |         while offset <= (self.length - padding_size):
229 |             # Here we actually unpack the values, the struct format string is used in every data record
230 |             # iteration, until the final offset reaches the end of the whole data stream
231 |             unpacked_values = struct.unpack(struct_format, data[offset:offset + struct_len])
232 |
233 |             new_record = V9DataRecord()
234 |             for field, value in zip(template.fields, unpacked_values):
235 |                 flen = field.field_length
236 |                 fkey = V9_FIELD_TYPES[field.field_type]
237 |
238 |                 # Special handling of IP addresses to convert integers to strings to not lose precision in dump
239 |                 # TODO: might only be needed for IPv6
240 |                 if field.field_type in V9_FIELD_TYPES_CONTAINING_IP:
241 |                     try:
242 |                         ip = ipaddress.ip_address(value)
243 |                     except ValueError:
244 |                         print("IP address could not be parsed: {}".format(repr(value)))
245 |                         continue
246 |                     new_record.data[fkey] = ip.compressed
247 |                 elif flen in (1, 2, 4):
248 |                     # These values are already converted to numbers by struct.unpack:
249 |                     new_record.data[fkey] = value
250 |                 else:
251 |                     # Caveat: this code assumes little-endian system (like x86)
252 |                     if sys.byteorder != "little":
253 |                         print("v9.py uses bit shifting for little endianness. Your processor is not little endian")
254 |
255 |                     fdata = 0
256 |                     for idx, byte in enumerate(reversed(bytearray(value))):
257 |                         fdata += byte << (idx * 8)
258 |                     new_record.data[fkey] = fdata
259 |
260 |                 offset += flen
261 |
262 |             new_record.__dict__.update(new_record.data)
263 |             self.flows.append(new_record)
264 |
265 |     def __repr__(self):
266 |         return "<DataFlowSet with template {} of length {} holding {} flows>" \
267 |             .format(self.template_id, self.length, len(self.flows))
268 |
269 |
270 | class V9TemplateField:
271 |     """A field with type identifier and length.
272 |     """
273 |
274 |     def __init__(self, field_type, field_length):
275 |         self.field_type = field_type  # integer
276 |         self.field_length = field_length  # bytes
277 |
278 |     def __repr__(self):
279 |         return "<TemplateField type {}:{}, length {}>".format(
280 |             self.field_type, V9_FIELD_TYPES[self.field_type], self.field_length)
281 |
282 |
283 | class V9TemplateRecord:
284 |     """A template record contained in a TemplateFlowSet.
285 |     """
286 |
287 |     def __init__(self, template_id, field_count, fields: list):
288 |         self.template_id = template_id
289 |         self.field_count = field_count
290 |         self.fields = fields
291 |
292 |     def __repr__(self):
293 |         return "<TemplateRecord {} with {} fields: {}>".format(
294 |             self.template_id, self.field_count,
295 |             ' '.join([V9_FIELD_TYPES[field.field_type] for field in self.fields]))
296 |
297 |
298 | class V9OptionsDataRecord:
299 |     def __init__(self):
300 |         self.scopes = {}
301 |         self.data = {}
302 |
303 |     def __repr__(self):
304 |         return "<V9OptionsDataRecord with scopes {} and data {}>".format(self.scopes.keys(), self.data.keys())
305 |
306 |
307 | class V9OptionsTemplateRecord:
308 |     """An options template record contained in an options template flowset.
309 |     """
310 |
311 |     def __init__(self, template_id, scope_fields: dict, option_fields: dict):
312 |         self.template_id = template_id
313 |         self.scope_fields = scope_fields
314 |         self.option_fields = option_fields
315 |
316 |     def __repr__(self):
317 |         return "<V9OptionsTemplateRecord with scope fields {} and option fields {}>".format(
318 |             self.scope_fields.keys(), self.option_fields.keys())
319 |
320 |
321 | class V9OptionsTemplateFlowSet:
322 |     """An options template flowset.
323 |
324 |     > Each Options Template FlowSet MAY contain multiple Options Template Records.
325 |
326 |     Scope field types range from 1 to 5:
327 |     1 System
328 |     2 Interface
329 |     3 Line Card
330 |     4 Cache
331 |     5 Template
332 |     """
333 |
334 |     def __init__(self, data: bytes):
335 |         pack = struct.unpack('!HH', data[:4])
336 |         self.flowset_id = pack[0]  # always 1
337 |         self.flowset_length = pack[1]  # length of this flowset
338 |         self.templates = {}
339 |
340 |         offset = 4
341 |
342 |         while offset < self.flowset_length:
343 |             pack = struct.unpack("!HHH", data[offset:offset + 6])  # options template header
344 |             template_id = pack[0]  # value above 255
345 |             option_scope_length = pack[1]
346 |             options_length = pack[2]
347 |
348 |             offset += 6
349 |
350 |             # Fetch all scope fields (most probably only one field)
351 |             scopes = {}  # Holds "type: length" key-value pairs
352 |
353 |             if option_scope_length % 4 != 0 or options_length % 4 != 0:
354 |                 raise ValueError(option_scope_length, options_length)
355 |
356 |             for scope_counter in range(option_scope_length // 4):  # example: option_scope_length = 4 means one scope
357 |                 pack = struct.unpack("!HH", data[offset:offset + 4])
358 |                 scope_field_type = pack[0]  # values range from 1 to 5
359 |                 scope_field_length = pack[1]
360 |                 scopes[scope_field_type] = scope_field_length
361 |                 offset += 4
362 |
363 |             # Fetch all option fields
364 |             options = {}  # same
365 |             for option_counter in range(options_length // 4):  # now counting the options
366 |                 pack = struct.unpack("!HH", data[offset:offset + 4])
367 |                 option_field_type = pack[0]
368 |                 option_field_length = pack[1]
369 |                 options[option_field_type] = option_field_length
370 |                 offset += 4
371 |
372 |             optionstemplate = V9OptionsTemplateRecord(template_id, scopes, options)
373 |
374 |             self.templates[template_id] = optionstemplate
375 |
376 |             # handle padding and add offset if needed
377 |             if offset % 4 == 2:
378 |                 offset += 2
379 |
380 |     def __repr__(self):
381 |         return "<V9OptionsTemplateFlowSet with {} templates: {}>".format(len(self.templates), self.templates.keys())
382 |
383 |
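# Editor's sketch (hypothetical byte layout) of what the parser above consumes,
# written as the hex stream of one options template flowset:
#   0001 0018      -> flowset_id=1, flowset_length=24
#   0104 0004 0008 -> template_id=260, option_scope_length=4, options_length=8
#   0001 0004      -> one scope field: type 1 (System), length 4
#   0022 0004      -> option field SAMPLING_INTERVAL (34), length 4
#   0023 0001      -> option field SAMPLING_ALGORITHM (35), length 1
#   0000           -> two padding bytes up to the 32-bit boundary
# Parsing it yields one V9OptionsTemplateRecord(260, scopes={1: 4}, options={34: 4, 35: 1}).
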
384 | class V9OptionsDataFlowset:
385 |     """An options data flowset with option data records
386 |     """
387 |
388 |     def __init__(self, data: bytes, template: V9OptionsTemplateRecord):
389 |         pack = struct.unpack('!HH', data[:4])
390 |
391 |         self.template_id = pack[0]
392 |         self.length = pack[1]
393 |         self.option_data_records = []
394 |
395 |         offset = 4
396 |
397 |         while offset < self.length:
398 |             new_options_record = V9OptionsDataRecord()
399 |
400 |             for scope_type, length in template.scope_fields.items():
401 |                 type_name = V9_SCOPE_TYPES.get(scope_type, scope_type)  # Either name, or unknown int
402 |                 value = int.from_bytes(data[offset:offset + length], 'big')  # TODO: is this always integer?
403 |                 new_options_record.scopes[type_name] = value
404 |                 offset += length
405 |
406 |             for field_type, length in template.option_fields.items():
407 |                 type_name = V9_FIELD_TYPES.get(field_type, None)
408 |                 is_bytes = False
409 |
410 |                 if not type_name:  # Cisco refers to the IANA IPFIX table for types >256...
411 |                     iana_type = IPFIXFieldTypes.by_id(field_type)  # try to get from IPFIX types
412 |                     if iana_type:
413 |                         type_name = iana_type.name
414 |                         is_bytes = IPFIXDataTypes.is_bytes(iana_type)
415 |
416 |                 if not type_name:
417 |                     raise ValueError("Unknown field type {} in options data record".format(field_type))
418 |
419 |                 value = None
420 |                 if is_bytes:
421 |                     value = data[offset:offset + length]
422 |                 else:
423 |                     value = int.from_bytes(data[offset:offset + length], 'big')
424 |
425 |                 new_options_record.data[type_name] = value
426 |
427 |                 offset += length
428 |
429 |             self.option_data_records.append(new_options_record)
430 |
431 |             if offset % 4 == 2:
432 |                 offset += 2
433 |
434 |
435 | class V9TemplateFlowSet:
436 |     """A template flowset, which holds an id that is used by data flowsets to
437 |     reference back to the template. The template then has fields which hold
438 |     identifiers of data types (eg "IP_SRC_ADDR", "PKTS"..). This way the flow
439 |     sender can dynamically put together data flowsets.
440 |     """
441 |
442 |     def __init__(self, data):
443 |         pack = struct.unpack('!HH', data[:4])
444 |         self.flowset_id = pack[0]  # always 0
445 |         self.length = pack[1]  # total length including this header in bytes
446 |         self.templates = {}
447 |
448 |         offset = 4  # Skip header
449 |
450 |         # Iterate through all template records in this template flowset
451 |         while offset < self.length:
452 |             pack = struct.unpack('!HH', data[offset:offset + 4])
453 |             template_id = pack[0]
454 |             field_count = pack[1]
455 |
456 |             fields = []
457 |             for field in range(field_count):
458 |                 # Get all fields of this template
459 |                 offset += 4
460 |                 field_type, field_length = struct.unpack('!HH', data[offset:offset + 4])
461 |                 if field_type not in V9_FIELD_TYPES:
462 |                     field_type = 0  # Set field_type to UNKNOWN_FIELD_TYPE as fallback
463 |                 field = V9TemplateField(field_type, field_length)
464 |                 fields.append(field)
465 |
466 |             # Create a template object with all collected data
467 |             template = V9TemplateRecord(template_id, field_count, fields)
468 |
469 |             # Append the new template to the global templates list
470 |             self.templates[template.template_id] = template
471 |
472 |             # Set offset to next template_id field
473 |             offset += 4
474 |
475 |     def __repr__(self):
476 |         return "<TemplateFlowSet with id {} of length {} containing templates: {}>" \
477 |             .format(self.flowset_id, self.length, self.templates.keys())
478 |
479 |
480 | class V9Header:
481 |     """The header of the V9ExportPacket
482 |     """
483 |     length = 20
484 |
485 |     def __init__(self, data):
486 |         pack = struct.unpack('!HHIIII', data[:self.length])
487 |         self.version = pack[0]
488 |         self.count = pack[1]  # not sure if correct.
softflowd: no of flows 489 | self.uptime = pack[2] 490 | self.timestamp = pack[3] 491 | self.sequence = pack[4] 492 | self.source_id = pack[5] 493 | 494 | def to_dict(self): 495 | return self.__dict__ 496 | 497 | 498 | class V9ExportPacket: 499 | """The flow record holds the header and all template and data flowsets. 500 | 501 | TODO: refactor into two loops: first get all contained flowsets and examine template 502 | flowsets first. Then data flowsets. 503 | """ 504 | 505 | def __init__(self, data: bytes, templates: dict): 506 | self.header = V9Header(data) 507 | self._templates = templates 508 | self._new_templates = False 509 | self._flows = [] 510 | self._options = [] 511 | 512 | offset = self.header.length 513 | skipped_flowsets_offsets = [] 514 | 515 | while offset != len(data): 516 | pack = struct.unpack('!HH', data[offset:offset + 4]) 517 | flowset_id = pack[0] # = template id 518 | flowset_length = pack[1] 519 | 520 | # Data template flowsets 521 | if flowset_id == 0: # TemplateFlowSet always have id 0 522 | tfs = V9TemplateFlowSet(data[offset:]) 523 | # Update the templates with the provided templates, even if they are the same 524 | for id_, template in tfs.templates.items(): 525 | if id_ not in self._templates: 526 | self._new_templates = True 527 | self._templates[id_] = template 528 | if tfs.length == 0: 529 | break 530 | offset += tfs.length 531 | continue 532 | 533 | # Option template flowsets 534 | elif flowset_id == 1: # Option templates always use ID 1 535 | otfs = V9OptionsTemplateFlowSet(data[offset:]) 536 | for id_, template in otfs.templates.items(): 537 | if id_ not in self._templates: 538 | self._new_templates = True 539 | self._templates[id_] = template 540 | offset += otfs.flowset_length 541 | if otfs.flowset_length == 0: 542 | break 543 | continue 544 | 545 | # Data / option flowsets 546 | # First, check if template is known 547 | if flowset_id not in self._templates: 548 | # Could not be parsed, continue to check for templates 549 | skipped_flowsets_offsets.append(offset) 550 | offset += flowset_length 551 | if flowset_length == 0: 552 | break 553 | continue 554 | 555 | matched_template = self._templates[flowset_id] 556 | 557 | if isinstance(matched_template, V9TemplateRecord): 558 | dfs = V9DataFlowSet(data[offset:], matched_template) 559 | self._flows += dfs.flows 560 | if dfs.length == 0: 561 | break 562 | offset += dfs.length 563 | 564 | elif isinstance(matched_template, V9OptionsTemplateRecord): 565 | odfs = V9OptionsDataFlowset(data[offset:], matched_template) 566 | self._options += odfs.option_data_records 567 | if odfs.length == 0: 568 | break 569 | offset += odfs.length 570 | 571 | else: 572 | raise NotImplementedError 573 | 574 | # In the same export packet, re-try flowsets with previously unknown templates. 
575 |         # This can happen if an export packet contains data flowsets before the template flowsets that describe them
576 |         if skipped_flowsets_offsets and self._new_templates:
577 |             # Process flowsets in the data slice which occurred before the template sets
578 |             # Handling of offset increases is not needed here
579 |             for offset in skipped_flowsets_offsets:
580 |                 pack = struct.unpack('!H', data[offset:offset + 2])
581 |                 flowset_id = pack[0]
582 |
583 |                 if flowset_id not in self._templates:
584 |                     raise V9TemplateNotRecognized
585 |
586 |                 matched_template = self._templates[flowset_id]
587 |                 if isinstance(matched_template, V9TemplateRecord):
588 |                     dfs = V9DataFlowSet(data[offset:], matched_template)
589 |                     self._flows += dfs.flows
590 |                 elif isinstance(matched_template, V9OptionsTemplateRecord):
591 |                     odfs = V9OptionsDataFlowset(data[offset:], matched_template)
592 |                     self._options += odfs.option_data_records
593 |
594 |         elif skipped_flowsets_offsets:
595 |             raise V9TemplateNotRecognized
596 |
597 |     @property
598 |     def contains_new_templates(self):
599 |         return self._new_templates
600 |
601 |     @property
602 |     def flows(self):
603 |         return self._flows
604 |
605 |     @property
606 |     def templates(self):
607 |         return self._templates
608 |
609 |     @property
610 |     def options(self):
611 |         return self._options
612 |
613 |     def __repr__(self):
614 |         s = " and new template(s)" if self.contains_new_templates else ""
615 |         return "<ExportPacket v9 with {} records{}>".format(self.header.count, s)
616 |
--------------------------------------------------------------------------------
/nf-workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bitkeks/python-netflow-v9-softflowd/71fb316a24357c2466fb996dffa82348a1e17b00/nf-workflow.png
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | from setuptools import setup
4 |
5 | with open("README.md", "r") as fh:
6 |     long_description = fh.read()
7 |
8 | setup(
9 |     name='netflow',
10 |     version='0.12.2',
11 |     description='NetFlow v1, v5, v9 and IPFIX tool suite implemented in Python 3',
12 |     long_description=long_description,
13 |     long_description_content_type='text/markdown',
14 |     author='Dominik Pataky',
15 |     author_email='software+pynetflow@dpataky.eu',
16 |     url='https://github.com/bitkeks/python-netflow-v9-softflowd',
17 |     packages=["netflow"],
18 |     license='MIT',
19 |     python_requires='>=3.5.3',
20 |     keywords='netflow ipfix collector parser',
21 |     classifiers=[
22 |         "Programming Language :: Python :: 3",
23 |         "License :: OSI Approved :: MIT License",
24 |         "Intended Audience :: Developers",
25 |         "Intended Audience :: System Administrators"
26 |     ],
27 | )
28 |
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bitkeks/python-netflow-v9-softflowd/71fb316a24357c2466fb996dffa82348a1e17b00/tests/__init__.py
--------------------------------------------------------------------------------
/tests/lib.py:
--------------------------------------------------------------------------------
1 | """
2 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
3 |
4 | The test packets (defined below as hex streams) were extracted from "real"
5 | softflowd exports based on a sample PCAP capture file.
6 |
7 | Copyright 2016-2020 Dominik Pataky
8 | Licensed under MIT License. See LICENSE.
9 | """
10 |
11 | # The flowset with 2 templates (IPv4 and IPv6) and 8 flows with data
12 | import queue
13 | import random
14 | import socket
15 | import time
16 |
17 | from netflow.collector import ThreadedNetFlowListener
18 |
19 | # Invalid export hex stream
20 | PACKET_INVALID = "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"
21 |
22 | CONNECTION = ('127.0.0.1', 1337)
23 | NUM_PACKETS = 1000
24 |
25 |
26 | def emit_packets(packets, delay=0.0001):
27 |     """Send the provided packets to the listener"""
28 |     sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
29 |     for p in packets:
30 |         sock.sendto(bytes.fromhex(p), CONNECTION)
31 |         time.sleep(delay)
32 |     sock.close()
33 |
34 |
35 | def send_recv_packets(packets, delay=0.0001, store_packets=-1) -> (list, float, float):
36 |     """Starts a listener, sends packets and receives them
37 |
38 |     returns a tuple: ([(ts, export), ...], time_started_sending, time_stopped_sending)
39 |     """
40 |     listener = ThreadedNetFlowListener(*CONNECTION)
41 |     tstart = time.time()
42 |     emit_packets(packets, delay=delay)
43 |     time.sleep(0.5)  # Allow packets to be sent and received
44 |     tend = time.time()
45 |     listener.start()
46 |
47 |     pkts = []
48 |     to_pad = 0
49 |     while True:
50 |         try:
51 |             packet = listener.get(timeout=0.5)
52 |             if -1 == store_packets or store_packets > 0:
53 |                 # Case where a program yields from the queue and stores all packets.
54 |                 pkts.append(packet)
55 |                 if store_packets != -1 and len(pkts) > store_packets:
56 |                     to_pad += len(pkts)  # Hack for testing
57 |                     pkts.clear()
58 |             else:
59 |                 # Performance measurements for cases where yielded objects are processed
60 |                 # immediately instead of stored. Add empty tuple to retain counting possibility.
61 |                 pkts.append(())
62 |         except queue.Empty:
63 |             break
64 |     listener.stop()
65 |     listener.join()
66 |     if to_pad > 0:
67 |         pkts = [()] * to_pad + pkts
68 |     return pkts, tstart, tend
69 |
70 |
71 | def generate_packets(amount, version, template_every_x=100):
72 |     packets = [PACKET_IPFIX]
73 |     template = PACKET_IPFIX_TEMPLATE
74 |
75 |     if version == 1:
76 |         packets = [PACKET_V1]
77 |     elif version == 5:
78 |         packets = [PACKET_V5]
79 |     elif version == 9:
80 |         packets = [*PACKETS_V9]
81 |         template = PACKET_V9_TEMPLATE
82 |
83 |     if amount < template_every_x:
84 |         template_every_x = 10
85 |
86 |     # If the list of test packets contains only one item (the same packet is used over and over),
87 |     # do not use random.choice - it costs performance and results in the same packet every time.
88 | def single_packet(pkts): 89 | return pkts[0] 90 | 91 | packet_func = single_packet 92 | if len(packets) > 1: 93 | packet_func = random.choice 94 | 95 | for x in range(amount): 96 | if x % template_every_x == 0 and version in [9, 10]: 97 | # First packet always a template, then periodically 98 | # Note: this was once based on random.random, but it costs performance 99 | yield template 100 | else: 101 | yield packet_func(packets) 102 | 103 | 104 | # Example export for v1 which contains two flows from one ICMP ping request/reply session 105 | PACKET_V1 = "000100020001189b5e80c32c2fd41848ac110002ac11000100000000000000000000000a00000348" \ 106 | "000027c700004af100000800000001000000000000000000ac110001ac1100020000000000000000" \ 107 | "0000000a00000348000027c700004af100000000000001000000000000000000" 108 | 109 | # Example export for v5 which contains three flows, two for ICMP ping and one multicast on interface (224.0.0.251) 110 | PACKET_V5 = "00050003000379a35e80c58622a55ab00000000000000000ac110002ac1100010000000000000000" \ 111 | "0000000a0000034800002f4c0000527600000800000001000000000000000000ac110001ac110002" \ 112 | "00000000000000000000000a0000034800002f4c0000527600000000000001000000000000000000" \ 113 | "ac110001e00000fb000000000000000000000001000000a90000e01c0000e01c14e914e900001100" \ 114 | "0000000000000000" 115 | 116 | PACKET_V9_TEMPLATE = "0009000a000000035c9f55980000000100000000000000400400000e00080004000c000400150004" \ 117 | "001600040001000400020004000a0004000e000400070002000b00020004000100060001003c0001" \ 118 | "00050001000000400800000e001b0010001c001000150004001600040001000400020004000a0004" \ 119 | "000e000400070002000b00020004000100060001003c000100050001040001447f0000017f000001" \ 120 | "fb3c1aaafb3c18fd000190100000004b00000000000000000050942c061b04007f0000017f000001" \ 121 | "fb3c1aaafb3c18fd00000f94000000360000000000000000942c0050061f04007f0000017f000001" \ 122 | "fb3c1cfcfb3c1a9b0000d3fc0000002a000000000000000000509434061b04007f0000017f000001" \ 123 | "fb3c1cfcfb3c1a9b00000a490000001e000000000000000094340050061f04007f0000017f000001" \ 124 | "fb3bb82cfb3ba48b000002960000000300000000000000000050942a061904007f0000017f000001" \ 125 | "fb3bb82cfb3ba48b00000068000000020000000000000000942a0050061104007f0000017f000001" \ 126 | "fb3c1900fb3c18fe0000004c0000000100000000000000000035b3c9110004007f0000017f000001" \ 127 | "fb3c1900fb3c18fe0000003c000000010000000000000000b3c9003511000400" 128 | 129 | # This packet is special. We take PACKET_V9_TEMPLATE and re-order the templates and flows. 
130 | # The first line is the header, the smaller lines the templates and the long lines the flows (limited to 80 chars) 131 | PACKET_V9_TEMPLATE_MIXED = ("0009000a000000035c9f55980000000100000000" # header 132 | "040001447f0000017f000001fb3c1aaafb3c18fd000190100000004b00000000000000000050942c" 133 | "061b04007f0000017f000001fb3c1aaafb3c18fd00000f94000000360000000000000000942c0050" 134 | "061f04007f0000017f000001fb3c1cfcfb3c1a9b0000d3fc0000002a000000000000000000509434" 135 | "061b04007f0000017f000001fb3c1cfcfb3c1a9b00000a490000001e000000000000000094340050" 136 | "061f04007f0000017f000001fb3bb82cfb3ba48b000002960000000300000000000000000050942a" 137 | "061904007f0000017f000001fb3bb82cfb3ba48b00000068000000020000000000000000942a0050" 138 | "061104007f0000017f000001fb3c1900fb3c18fe0000004c0000000100000000000000000035b3c9" 139 | "110004007f0000017f000001fb3c1900fb3c18fe0000003c000000010000000000000000b3c90035" 140 | "11000400" # end of flow segments 141 | "000000400400000e00080004000c000400150004001600040001000400020004" # template 1024 142 | "000a0004000e000400070002000b00020004000100060001003c000100050001" 143 | "000000400800000e001b0010001c001000150004001600040001000400020004" # template 2048 144 | "000a0004000e000400070002000b00020004000100060001003c000100050001") 145 | 146 | # Three packets without templates, each with 12 flows 147 | PACKETS_V9 = [ 148 | "0009000c000000035c9f55980000000200000000040001e47f0000017f000001fb3c1a17fb3c19fd" 149 | "000001480000000200000000000000000035ea82110004007f0000017f000001fb3c1a17fb3c19fd" 150 | "0000007a000000020000000000000000ea820035110004007f0000017f000001fb3c1a17fb3c19fd" 151 | "000000f80000000200000000000000000035c6e2110004007f0000017f000001fb3c1a17fb3c19fd" 152 | "0000007a000000020000000000000000c6e20035110004007f0000017f000001fb3c1a9efb3c1a9c" 153 | "0000004c0000000100000000000000000035adc1110004007f0000017f000001fb3c1a9efb3c1a9c" 154 | "0000003c000000010000000000000000adc10035110004007f0000017f000001fb3c1b74fb3c1b72" 155 | "0000004c0000000100000000000000000035d0b3110004007f0000017f000001fb3c1b74fb3c1b72" 156 | "0000003c000000010000000000000000d0b30035110004007f0000017f000001fb3c2f59fb3c1b71" 157 | "00001a350000000a000000000000000000509436061b04007f0000017f000001fb3c2f59fb3c1b71" 158 | "0000038a0000000a000000000000000094360050061b04007f0000017f000001fb3c913bfb3c9138" 159 | "0000004c0000000100000000000000000035e262110004007f0000017f000001fb3c913bfb3c9138" 160 | "0000003c000000010000000000000000e262003511000400", 161 | 162 | "0009000c000000035c9f55980000000300000000040001e47f0000017f000001fb3ca523fb3c913b" 163 | "0000030700000005000000000000000000509438061b04007f0000017f000001fb3ca523fb3c913b" 164 | "000002a200000005000000000000000094380050061b04007f0000017f000001fb3f7fe1fb3dbc97" 165 | "0002d52800000097000000000000000001bb8730061b04007f0000017f000001fb3f7fe1fb3dbc97" 166 | "0000146c000000520000000000000000873001bb061f04007f0000017f000001fb3d066ffb3d066c" 167 | "0000004c0000000100000000000000000035e5bd110004007f0000017f000001fb3d066ffb3d066c" 168 | "0000003c000000010000000000000000e5bd0035110004007f0000017f000001fb3d1a61fb3d066b" 169 | "000003060000000500000000000000000050943a061b04007f0000017f000001fb3d1a61fb3d066b" 170 | "000002a2000000050000000000000000943a0050061b04007f0000017f000001fb3fed00fb3f002c" 171 | "0000344000000016000000000000000001bbae50061f04007f0000017f000001fb3fed00fb3f002c" 172 | "00000a47000000120000000000000000ae5001bb061b04007f0000017f000001fb402f17fb402a75" 173 | 
"0003524c000000a5000000000000000001bbc48c061b04007f0000017f000001fb402f17fb402a75" 174 | "000020a60000007e0000000000000000c48c01bb061f0400", 175 | 176 | "0009000c000000035c9f55980000000400000000040001e47f0000017f000001fb3d7ba2fb3d7ba0" 177 | "0000004c0000000100000000000000000035a399110004007f0000017f000001fb3d7ba2fb3d7ba0" 178 | "0000003c000000010000000000000000a3990035110004007f0000017f000001fb3d8f85fb3d7b9f" 179 | "000003070000000500000000000000000050943c061b04007f0000017f000001fb3d8f85fb3d7b9f" 180 | "000002a2000000050000000000000000943c0050061b04007f0000017f000001fb3d9165fb3d7f6d" 181 | "0000c97b0000002a000000000000000001bbae48061b04007f0000017f000001fb3d9165fb3d7f6d" 182 | "000007f40000001a0000000000000000ae4801bb061b04007f0000017f000001fb3dbc96fb3dbc7e" 183 | "0000011e0000000200000000000000000035bd4f110004007f0000017f000001fb3dbc96fb3dbc7e" 184 | "0000008e000000020000000000000000bd4f0035110004007f0000017f000001fb3ddbb3fb3c1a18" 185 | "0000bfee0000002f00000000000000000050ae56061b04007f0000017f000001fb3ddbb3fb3c1a18" 186 | "00000982000000270000000000000000ae560050061b04007f0000017f000001fb3ddbb3fb3c1a18" 187 | "0000130e0000001200000000000000000050e820061b04007f0000017f000001fb3ddbb3fb3c1a18" 188 | "0000059c000000140000000000000000e8200050061b0400", 189 | ] 190 | 191 | PACKET_V9_WITH_ZEROS = ( 192 | "000900057b72e830620b717d78cf34e30102000001040048000000000000006e0000000101000000" 193 | "000a20076a06065c0800000d6b15c80000000b7b72e4487b72e448080000000000000438bf6401c7" 194 | "65ad1e0d6b15c8000000000001040048000000000000006700000001110000c951ac180b0306065c" 195 | "080035010000010000000b7b72e4487b72e448000000000000000443177b01c765ada501000001c3" 196 | "9c00350001040048000000000000004a000000010100000000ac19bc3206065c080000287048cd00" 197 | "00000b7b72e8307b72e83008000000000000048f071a01c765ae42287048cd000000000001040048" 198 | "000000000000004600000001060002cbef0a30681f06065c0801bb142a49180000000b7b72e8307b" 199 | "72e8300000000000000004801c7801c765ae1d142a49185a2b01bb00010400480000000000000046" 200 | "00000001060002fe800a2f601206065c0801bb142a49180000000b7b72e8307b72e8300000000000" 201 | "0000040806b001c765ae28142a4918d4b501bb000000000000000000000000000000000000000000" 202 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 203 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 204 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 205 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 206 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 207 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 208 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 209 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 210 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 211 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 212 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 213 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 214 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 215 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 216 | "00000000000000000000000000000000000000000000000000000000000000000000000000000000" 217 | ) 218 
| 219 | # Example export for IPFIX (v10) with 4 templates, 1 option template and 8 data flow sets 220 | PACKET_IPFIX_TEMPLATE = "000a05202d45a4700000001300000000000200400400000e00080004000c00040016000400150004" \ 221 | "0001000400020004000a0004000e000400070002000b00020004000100060001003c000100050001" \ 222 | "000200340401000b00080004000c000400160004001500040001000400020004000a0004000e0004" \ 223 | "00200002003c000100050001000200400800000e001b0010001c0010001600040015000400010004" \ 224 | "00020004000a0004000e000400070002000b00020004000100060001003c00010005000100020034" \ 225 | "0801000b001b0010001c001000160004001500040001000400020004000a0004000e0004008b0002" \ 226 | "003c0001000500010003001e010000050001008f000400a000080131000401320004013000020100" \ 227 | "001a00000a5900000171352e672100000001000000000001040000547f000001ac110002ff7ed688" \ 228 | "ff7ed73a000015c70000000d000000000000000001bbe1a6061b0400ac1100027f000001ff7ed688" \ 229 | "ff7ed73a0000074f000000130000000000000000e1a601bb061f04000401004cac110002ac110001" \ 230 | "ff7db9e0ff7dc1d0000000fc00000003000000000000000008000400ac110001ac110002ff7db9e0" \ 231 | "ff7dc1d0000000fc0000000300000000000000000000040008010220fde66f14e0f1960900000242" \ 232 | "ac110002ff0200000000000000000001ff110001ff7dfad6ff7e0e95000001b00000000600000000" \ 233 | "0000000087000600fde66f14e0f196090000affeaffeaffefdabcdef123456789000000000000001" \ 234 | "ff7e567fff7e664a0000020800000005000000000000000080000600fde66f14e0f1960900000000" \ 235 | "00000001fde66f14e0f196090000affeaffeaffeff7e567fff7e664a000002080000000500000000" \ 236 | "0000000081000600fe800000000000000042aafffe73bbfafde66f14e0f196090000affeaffeaffe" \ 237 | "ff7e6aaaff7e6aaa0000004800000001000000000000000087000600fde66f14e0f1960900000242" \ 238 | "ac110002fe800000000000000042aafffe73bbfaff7e6aaaff7e6aaa000000400000000100000000" \ 239 | "0000000088000600fe800000000000000042acfffe110002fe800000000000000042aafffe73bbfa" \ 240 | "ff7e7eaaff7e7eaa0000004800000001000000000000000087000600fe800000000000000042aaff" \ 241 | "fe73bbfafe800000000000000042acfffe110002ff7e7eaaff7e7eaa000000400000000100000000" \ 242 | "0000000088000600fe800000000000000042aafffe73bbfafe800000000000000042acfffe110002" \ 243 | "ff7e92aaff7e92aa0000004800000001000000000000000087000600fe800000000000000042acff" \ 244 | "fe110002fe800000000000000042aafffe73bbfaff7e92aaff7e92aa000000400000000100000000" \ 245 | "000000008800060008000044fde66f14e0f196090000affeaffeaffefd41b7143f86000000000000" \ 246 | "00000001ff7ec2a0ff7ec2a00000004a000000010000000000000000d20100351100060004000054" \ 247 | "ac1100027f000001ff7ed62eff7ed68700000036000000010000000000000000c496003511000400" \ 248 | "7f000001ac110002ff7ed62eff7ed687000000760000000100000000000000000035c49611000400" \ 249 | "08000044fde66f14e0f196090000affeaffeaffefd41b7143f8600000000000000000001ff7ef359" \ 250 | "ff7ef3590000004a000000010000000000000000b1e700351100060004000054ac1100027f000001" \ 251 | "ff7f06e4ff7f06e800000036000000010000000000000000a8f90035110004007f000001ac110002" \ 252 | "ff7f06e4ff7f06e8000000a60000000100000000000000000035a8f911000400" 253 | 254 | # Example export for IPFIX with two data sets 255 | PACKET_IPFIX = "000a00d02d45a47000000016000000000801007cfe800000000000000042acfffe110002fde66f14" \ 256 | "e0f196090000000000000001ff7f0755ff7f07550000004800000001000000000000000087000600" \ 257 | "fdabcdef123456789000000000000001fe800000000000000042acfffe110002ff7f0755ff7f0755" \ 258 | "000000400000000100000000000000008800060008000044fde66f14e0f196090000affeaffeaffe" \ 259 | 
"2a044e42020000000000000000000223ff7f06e9ff7f22d500000140000000040000000000000000" \ 260 | "e54c01bb06020600" 261 | 262 | PACKET_IPFIX_TEMPLATE_ETHER = "000a05002d45a4700000000d00000000" \ 263 | "000200500400001200080004000c000400160004001500040001000400020004000a0004000e0004" \ 264 | "00070002000b00020004000100060001003c000100050001003a0002003b00020038000600390006" \ 265 | "000200440401000f00080004000c000400160004001500040001000400020004000a0004000e0004" \ 266 | "00200002003c000100050001003a0002003b000200380006003900060002005008000012001b0010" \ 267 | "001c001000160004001500040001000400020004000a0004000e000400070002000b000200040001" \ 268 | "00060001003c000100050001003a0002003b00020038000600390006000200440801000f001b0010" \ 269 | "001c001000160004001500040001000400020004000a0004000e0004008b0002003c000100050001" \ 270 | "003a0002003b000200380006003900060003001e010000050001008f000400a00008013100040132" \ 271 | "0004013000020100001a00000009000000b0d80a558000000001000000000001040000747f000001" \ 272 | "ac110002e58b988be58b993e000015c70000000d000000000000000001bbe1a6061b040000000000" \ 273 | "123456affefeaffeaffeaffeac1100027f000001e58b988be58b993e0000074f0000001300000000" \ 274 | "00000000e1a601bb061f040000000000affeaffeaffe123456affefe0401006cac110002ac110001" \ 275 | "e58a7be3e58a83d3000000fc0000000300000000000000000800040000000000affeaffeaffe0242" \ 276 | "aa73bbfaac110001ac110002e58a7be3e58a83d3000000fc00000003000000000000000000000400" \ 277 | "00000000123456affefeaffeaffeaffe080102b0fde66f14e0f196090000affeaffeaffeff020000" \ 278 | "0000000000000001ff110001e58abcd9e58ad098000001b000000006000000000000000087000600" \ 279 | "00000000affeaffeaffe3333ff110001fde66f14e0f196090000affeaffeaffefde66f14e0f19609" \ 280 | "0000000000000001e58b1883e58b284e000002080000000500000000000000008000060000000000" \ 281 | "affeaffeaffe123456affefefdabcdef123456789000000000000001fde66f14e0f1960900000242" \ 282 | "ac110002e58b1883e58b284e0000020800000005000000000000000081000600000000000242aa73" \ 283 | "bbfaaffeaffeaffefe800000000000000042aafffe73bbfafde66f14e0f196090000affeaffeaffe" \ 284 | "e58b2caee58b2cae000000480000000100000000000000008700060000000000123456affefe0242" \ 285 | "ac110002fde66f14e0f196090000affeaffeaffefe800000000000000042aafffe73bbfae58b2cae" \ 286 | "e58b2cae000000400000000100000000000000008800060000000000affeaffeaffe123456affefe" \ 287 | "fe800000000000000042acfffe110002fe800000000000000042aafffe73bbfae58b40aee58b40ae" \ 288 | "000000480000000100000000000000008700060000000000affeaffeaffe123456affefefe800000" \ 289 | "000000000042aafffe73bbfafe800000000000000042acfffe110002e58b40aee58b40ae00000040" \ 290 | "0000000100000000000000008800060000000000123456affefeaffeaffeaffefe80000000000000" \ 291 | "0042aafffe73bbfafe800000000000000042acfffe110002e58b54aee58b54ae0000004800000001" \ 292 | "00000000000000008700060000000000123456affefeaffeaffeaffefe800000000000000042acff" \ 293 | "fe110002fe800000000000000042aafffe73bbfae58b54aee58b54ae000000400000000100000000" \ 294 | "000000008800060000000000affeaffeaffe123456affefe" 295 | 296 | PACKET_IPFIX_ETHER = "000a02905e8b0aa90000001600000000" \ 297 | "08000054fde66f14e0f196090000affeaffeaffefd40abcdabcd00000000000000011111e58b84a4" \ 298 | "e58b84a40000004a000000010000000000000000d20100351100060000000000affeaffeaffe0242" \ 299 | "aa73bbfa04000074ac1100027f000001e58b9831e58b988a00000036000000010000000000000000" \ 300 | "c49600351100040000000000affeaffeaffe123456affefe7f000001ac110002e58b9831e58b988a" \ 301 | 
"000000760000000100000000000000000035c4961100040000000000123456affefeaffeaffeaffe" \ 302 | "08000054fde66f14e0f196090000affeaffeaffefd40abcdabcd00000000000000011111e58bb55c" \ 303 | "e58bb55c0000004a000000010000000000000000b1e700351100060000000000affeaffeaffe0242" \ 304 | "aa73bbfa04000074ac1100027f000001e58bc8e8e58bc8ec00000036000000010000000000000000" \ 305 | "a8f900351100040000000000affeaffeaffe123456affefe7f000001ac110002e58bc8e8e58bc8ec" \ 306 | "000000a60000000100000000000000000035a8f91100040000000000123456affefeaffeaffeaffe" \ 307 | "0801009cfe800000000000000042acfffe110002fdabcdef123456789000000000000001e58bc958" \ 308 | "e58bc958000000480000000100000000000000008700060000000000affeaffeaffe123456affefe" \ 309 | "fdabcdef123456789000000000000001fe800000000000000042acfffe110002e58bc958e58bc958" \ 310 | "000000400000000100000000000000008800060000000000123456affefeaffeaffeaffe08000054" \ 311 | "fde66f14e0f196090000affeaffeaffe2a044e42020000000000000000000223e58bc8ede58be4d8" \ 312 | "00000140000000040000000000000000e54c01bb0602060000000000affeaffeaffe123456affefe" 313 | 314 | PACKET_IPFIX_PADDING = "000a01c064e0b1900000000200000000000200480400001000080004000c00040016000400150004" \ 315 | "0001000400020004000a0004000e0004003d00010088000100070002000b00020004000100060001" \ 316 | "003c000100050001000200400401000e00080004000c000400160004001500040001000400020004" \ 317 | "000a0004000e0004003d0001008800010020000200040001003c0001000500010002004808000010" \ 318 | "001b0010001c001000160004001500040001000400020004000a0004000e0004003d000100880001" \ 319 | "00070002000b00020004000100060001003c000100050001000200400801000e001b0010001c0010" \ 320 | "00160004001500040001000400020004000a0004000e0004003d000100880001008b000200040001" \ 321 | "003c00010005000100030022010000060001008f000400a000080131000401320004013000020052" \ 322 | "0010040100547f0000017f000001ffff07d0ffff0ff7000000fc0000000300000000000000000001" \ 323 | "08000104007f0000017f000001ffff07d0ffff0ff7000000fc000000030000000000000000000100" \ 324 | "0001040000000100002a0000b2da0000018a0db59d2e000000010000000000017465737463617074" \ 325 | "7572655f73696e67" 326 | -------------------------------------------------------------------------------- /tests/test_analyzer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """ 4 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd. 5 | 6 | Copyright 2016-2020 Dominik Pataky 7 | Licensed under MIT License. See LICENSE. 8 | """ 9 | import gzip 10 | import json 11 | import subprocess 12 | import sys 13 | import unittest 14 | 15 | from tests.lib import * 16 | from tests.lib import PACKET_V9_TEMPLATE, PACKETS_V9 17 | 18 | 19 | class TestFlowExportAnalyzer(unittest.TestCase): 20 | def test_analyzer(self): 21 | """Test the analyzer by producing some packets, parsing them and then calling the analyzer 22 | in a subprocess, piping in a created gzip JSON collection (as if it is coming from a file). 23 | """ 24 | # First create and parse some packets, which should get exported 25 | pkts, _, _ = send_recv_packets([PACKET_V9_TEMPLATE, *PACKETS_V9]) 26 | 27 | # Now the pkts must be transformed from their data structure to the "gzipped JSON representation", 28 | # which the collector uses for persistant storage. 
29 |         data_dicts = []  # list holding all entries
30 |         for p in pkts:  # each pkt has its own entry with timestamp as key
31 |             data_dicts.append({p.ts: {
32 |                 "client": p.client,
33 |                 "header": p.export.header.to_dict(),
34 |                 "flows": [f.data for f in p.export.flows]
35 |             }})
36 |         data = "\n".join([json.dumps(dd) for dd in data_dicts])  # join all entries together by newlines
37 |
38 |         # Different stdout/stderr arguments for backwards compatibility
39 |         pipe_output_param = {"capture_output": True}
40 |         if sys.version_info < (3, 7):  # capture_output was added in Python 3.7
41 |             pipe_output_param = {
42 |                 "stdout": subprocess.PIPE,
43 |                 "stderr": subprocess.PIPE
44 |             }
45 |
46 |         # Analyzer takes gzipped input either via stdin or from a file (here: stdin)
47 |         gzipped_input = gzip.compress(data.encode())  # the str is encoded to bytes, then gzip-compressed
48 |
49 |         # Run analyzer as CLI script with no packets ignored ("-p 0")
50 |         analyzer = subprocess.run(
51 |             [sys.executable, '-m', 'netflow.analyzer', '-p', '0'],
52 |             input=gzipped_input,
53 |             **pipe_output_param
54 |         )
55 |
56 |         # Make sure there are no errors; if stderr has content,
57 |         # it is printed as the assertion failure message
58 |         self.assertEqual(analyzer.stderr, b"", analyzer.stderr.decode())
59 |
60 |         # Every 2 flows are written as a single line (any extras are dropped)
61 |         num_flows = sum(len(list(item.values())[0]["flows"]) for item in data_dicts)
62 |         self.assertEqual(len(analyzer.stdout.splitlines()) - 2, num_flows // 2)  # ignore two header lines
63 |
64 |
65 | if __name__ == '__main__':
66 |     unittest.main()
67 |
--------------------------------------------------------------------------------
/tests/test_ipfix.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
5 |
6 | Copyright 2016-2020 Dominik Pataky
7 | Licensed under MIT License. See LICENSE.
8 | """
9 | # TODO: tests with 500 packets fail with delay=0. Probably a problem with UDP sockets buffer
10 | # TODO: add test for template withdrawal
11 |
12 | import ipaddress
13 | import unittest
14 |
15 | from tests.lib import send_recv_packets, PACKET_IPFIX_TEMPLATE, PACKET_IPFIX, PACKET_IPFIX_ETHER, \
16 |     PACKET_IPFIX_TEMPLATE_ETHER, PACKET_IPFIX_PADDING
17 |
18 |
19 | class TestFlowExportIPFIX(unittest.TestCase):
20 |     def test_recv_ipfix_packet(self):
21 |         """
22 |         Test the general sending of raw packets and the receiving and parsing of these packets.
23 |         If this test runs successfully, the sender thread has sent a raw bytes packet towards a locally
24 |         listening collector thread, and the collector has successfully received and parsed the packets.
25 |         :return:
26 |         """
27 |         # send packet without any template, must fail to parse (packets are queued)
28 |         pkts, _, _ = send_recv_packets([PACKET_IPFIX])
29 |         self.assertEqual(len(pkts), 0)  # no export is parsed due to missing template
30 |
31 |         # send packet with 5 templates and 20 flows, should parse correctly since the templates are known
32 |         pkts, _, _ = send_recv_packets([PACKET_IPFIX_TEMPLATE])
33 |         self.assertEqual(len(pkts), 1)
34 |
35 |         p = pkts[0]
36 |         self.assertEqual(p.client[0], "127.0.0.1")
37 |         self.assertEqual(len(p.export.flows), 1 + 2 + 2 + 9 + 1 + 2 + 1 + 2)  # count flows
38 |         self.assertEqual(len(p.export.templates), 4 + 1)  # count new templates
39 |
40 |         # send template and multiple export packets
41 |         pkts, _, _ = send_recv_packets([PACKET_IPFIX, PACKET_IPFIX_TEMPLATE, PACKET_IPFIX])
42 |         self.assertEqual(len(pkts), 3)
43 |         self.assertEqual(pkts[0].export.header.version, 10)
44 |
45 |         # check amount of flows across all packets
46 |         total_flows = 0
47 |         for packet in pkts:
48 |             total_flows += len(packet.export.flows)
49 |         self.assertEqual(total_flows, 2 + 1 + (1 + 2 + 2 + 9 + 1 + 2 + 1 + 2) + 2 + 1)
50 |
51 |     def test_ipfix_contents(self):
52 |         """
53 |         Inspect the contents of exported flows, e.g. test the value of an option flow and the correct
54 |         parsing of IPv4 and IPv6 addresses.
55 |         :return:
56 |         """
57 |         p = send_recv_packets([PACKET_IPFIX_TEMPLATE])[0][0]
58 |
59 |         flow = p.export.flows[0]
60 |         self.assertEqual(flow.meteringProcessId, 2649)
61 |         self.assertEqual(flow.selectorAlgorithm, 1)
62 |         self.assertEqual(flow.systemInitTimeMilliseconds, 1585735165729)
63 |
64 |         flow = p.export.flows[1]  # HTTPS flow from web server to client
65 |         self.assertEqual(flow.destinationIPv4Address, 2886795266)
66 |         self.assertEqual(ipaddress.ip_address(flow.destinationIPv4Address),
67 |                          ipaddress.ip_address("172.17.0.2"))
68 |         self.assertEqual(flow.protocolIdentifier, 6)  # TCP
69 |         self.assertEqual(flow.sourceTransportPort, 443)
70 |         self.assertEqual(flow.destinationTransportPort, 57766)
71 |         self.assertEqual(flow.tcpControlBits, 0x1b)
72 |
73 |         flow = p.export.flows[17]  # IPv6 flow
74 |         self.assertEqual(flow.protocolIdentifier, 17)  # UDP
75 |         self.assertEqual(flow.sourceIPv6Address, 0xfde66f14e0f196090000affeaffeaffe)
76 |         self.assertEqual(ipaddress.ip_address(flow.sourceIPv6Address),  # Docker ULA
77 |                          ipaddress.ip_address("fde6:6f14:e0f1:9609:0:affe:affe:affe"))
78 |
79 |     def test_ipfix_contents_ether(self):
80 |         """
81 |         IPFIX content tests based on exports with the softflowd "-T ether" flag, meaning that layer 2
82 |         is included in the export, like MAC addresses.
83 |         :return:
84 |         """
85 |         pkts, _, _ = send_recv_packets([PACKET_IPFIX_TEMPLATE_ETHER, PACKET_IPFIX_ETHER])
86 |         self.assertEqual(len(pkts), 2)
87 |         p = pkts[0]
88 |
89 |         # Inspect contents of specific flows
90 |         flow = p.export.flows[0]
91 |         self.assertEqual(flow.meteringProcessId, 9)
92 |         self.assertEqual(flow.selectorAlgorithm, 1)
93 |         self.assertEqual(flow.systemInitTimeMilliseconds, 759538800000)
94 |
95 |         flow = p.export.flows[1]
96 |         self.assertEqual(flow.destinationIPv4Address, 2886795266)
97 |         self.assertTrue(hasattr(flow, "sourceMacAddress"))
98 |         self.assertTrue(hasattr(flow, "postDestinationMacAddress"))
99 |         self.assertEqual(flow.sourceMacAddress, 0x123456affefe)
100 |         self.assertEqual(flow.postDestinationMacAddress, 0xaffeaffeaffe)
101 |
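    # A small sketch, not part of the library or this test suite: the IPFIX parser
    # exposes MAC addresses as plain integers here (see the assertions above).
    # Rendering such a value in the usual colon notation could look like this:
    #
    #     def format_mac(value: int) -> str:
    #         return ":".join("{:02x}".format((value >> shift) & 0xff) for shift in range(40, -8, -8))
    #
    #     format_mac(0x123456affefe)  # -> "12:34:56:af:fe:fe"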
102 |     def test_ipfix_padding(self):
103 |         """
104 |         Checks successful parsing of export packets that contain padding zeroes in an IPFIX set.
105 |         The padding in the example data is between the last two data sets, so the successful parsing of the last
106 |         data set indicates correct handling of padding zero bytes.
107 |         """
108 |         pkts, _, _ = send_recv_packets([PACKET_IPFIX_PADDING])
109 |         self.assertEqual(len(pkts), 1)
110 |         p = pkts[0]
111 |
112 |         # Check for length of whole export
113 |         self.assertEqual(p.export.header.length, 448)
114 |
115 |         # Check a specific value of the last flow in the export. Success means correct handling of padding in the set
116 |         self.assertEqual(p.export.flows[-1].meteringProcessId, 45786)
117 |
--------------------------------------------------------------------------------
/tests/test_netflow.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
5 |
6 | Copyright 2016-2020 Dominik Pataky
7 | Licensed under MIT License. See LICENSE.
8 | """
9 | # TODO: tests with 500 packets fail with delay=0. Probably a problem with UDP sockets buffer
10 |
11 | import ipaddress
12 | import random
13 | import unittest
14 |
15 | from tests.lib import send_recv_packets, NUM_PACKETS, \
16 |     PACKET_INVALID, PACKET_V1, PACKET_V5, PACKET_V9_WITH_ZEROS, \
17 |     PACKET_V9_TEMPLATE, PACKET_V9_TEMPLATE_MIXED, PACKETS_V9
18 |
19 |
20 | class TestFlowExportNetflow(unittest.TestCase):
21 |     def _test_recv_all_packets(self, num, template_idx, delay=0.0001):
22 |         """Fling packets at the server and test that it receives them all"""
23 |
24 |         def gen_pkts(n, idx):
25 |             for x in range(n):
26 |                 if x == idx:
27 |                     yield PACKET_V9_TEMPLATE
28 |                 else:
29 |                     yield random.choice(PACKETS_V9)
30 |
31 |         pkts, tstart, tend = send_recv_packets(gen_pkts(num, template_idx), delay=delay)
32 |
33 |         # check number of packets
34 |         self.assertEqual(len(pkts), num)
35 |
36 |         # check timestamps are when packets were sent, not processed
37 |         self.assertTrue(all(tstart < p.ts < tend for p in pkts))
38 |
39 |         # check number of "things" in the packets (flows + templates)
40 |         # template packet = 10 things
41 |         # other packets = 12 things
42 |         self.assertEqual(sum(p.export.header.count for p in pkts), (num - 1) * 12 + 10)
43 |
44 |         # check number of flows in the packets
45 |         # template packet = 8 flows (2 templates)
46 |         # other packets = 12 flows
47 |         self.assertEqual(sum(len(p.export.flows) for p in pkts), (num - 1) * 12 + 8)
48 |
49 |     def test_recv_all_packets_template_first(self):
50 |         """Test all packets are received when the template is sent first"""
51 |         self._test_recv_all_packets(NUM_PACKETS, 0)
52 |
53 |     def test_recv_all_packets_template_middle(self):
54 |         """Test all packets are received when the template is sent in the middle"""
55 |         self._test_recv_all_packets(NUM_PACKETS, NUM_PACKETS // 2)
56 |
57 |     def test_recv_all_packets_template_last(self):
58 |         """Test all packets are received when the template is sent last"""
59 |         self._test_recv_all_packets(NUM_PACKETS, NUM_PACKETS - 1)
60 |
61 |     def test_recv_all_packets_slowly(self):
62 |         """Test all packets are received when things are sent slooooowwwwwwwwlllllllyyyyyy"""
63 |         self._test_recv_all_packets(3, 0, delay=1)
64 |
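    # Background for the arithmetic in _test_recv_all_packets above: a v9 header's
    # "count" field counts all records, templates included, while export.flows holds
    # only data records. Assuming the template packet carries 2 templates plus 8 data
    # flows, as the comments above state, this means:
    #     header.count == 2 + 8 == 10, but len(export.flows) == 8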
65 |     def test_ignore_invalid_packets(self):
66 |         """Test that invalid packets log a warning but are otherwise ignored"""
67 |         with self.assertLogs(level='WARNING'):
68 |             pkts, _, _ = send_recv_packets([
69 |                 PACKET_INVALID, PACKET_V9_TEMPLATE, random.choice(PACKETS_V9), PACKET_INVALID,
70 |                 random.choice(PACKETS_V9), PACKET_INVALID
71 |             ])
72 |         self.assertEqual(len(pkts), 3)
73 |
74 |     def test_recv_v1_packet(self):
75 |         """Test NetFlow v1 packet parsing"""
76 |         pkts, _, _ = send_recv_packets([PACKET_V1])
77 |         self.assertEqual(len(pkts), 1)
78 |
79 |         # Take the parsed packet and check its metadata
80 |         p = pkts[0]
81 |         self.assertEqual(p.client[0], "127.0.0.1")  # collector listens locally
82 |         self.assertEqual(len(p.export.flows), 2)  # ping request and reply
83 |         self.assertEqual(p.export.header.count, 2)  # same value, in header
84 |         self.assertEqual(p.export.header.version, 1)
85 |
86 |         # Check specific IP address contained in a flow.
87 |         # Since it might vary which flow of the pair is exported first, check both
88 |         flow = p.export.flows[0]
89 |         self.assertIn(
90 |             ipaddress.ip_address(flow.IPV4_SRC_ADDR),  # convert to ipaddress obj because value is int
91 |             [ipaddress.ip_address("172.17.0.1"), ipaddress.ip_address("172.17.0.2")]
92 |         )
93 |         self.assertEqual(flow.PROTO, 1)  # ICMP
94 |
95 |     def test_recv_v5_packet(self):
96 |         """Test NetFlow v5 packet parsing"""
97 |         pkts, _, _ = send_recv_packets([PACKET_V5])
98 |         self.assertEqual(len(pkts), 1)
99 |
100 |         p = pkts[0]
101 |         self.assertEqual(p.client[0], "127.0.0.1")
102 |         self.assertEqual(len(p.export.flows), 3)  # ping request and reply, one multicast
103 |         self.assertEqual(p.export.header.count, 3)
104 |         self.assertEqual(p.export.header.version, 5)
105 |
106 |         # Check specific IP address contained in a flow.
107 |         # Since it might vary which flow of the pair is exported first, check both
108 |         flow = p.export.flows[0]
109 |         self.assertIn(
110 |             ipaddress.ip_address(flow.IPV4_SRC_ADDR),  # convert to ipaddress obj because value is int
111 |             [ipaddress.ip_address("172.17.0.1"), ipaddress.ip_address("172.17.0.2")]  # matches multicast packet too
112 |         )
113 |         self.assertEqual(flow.PROTO, 1)  # ICMP
114 |
115 |     def test_recv_v9_packet(self):
116 |         """Test NetFlow v9 packet parsing"""
117 |
118 |         # send packet without any template, must fail to parse (packets are queued)
119 |         pkts, _, _ = send_recv_packets([PACKETS_V9[0]])
120 |         self.assertEqual(len(pkts), 0)  # no export is parsed due to missing template
121 |
122 |         # send an invalid packet with zero bytes, must fail to parse
123 |         pkts, _, _ = send_recv_packets([PACKET_V9_WITH_ZEROS])
124 |         self.assertEqual(len(pkts), 0)  # no export is parsed due to missing template
125 |
126 |         # send packet with two templates and eight flows, should parse correctly since the templates are known
127 |         pkts, _, _ = send_recv_packets([PACKET_V9_TEMPLATE])
128 |         self.assertEqual(len(pkts), 1)
129 |
130 |         # and again, but with the templates at the end of the packet
131 |         pkts, _, _ = send_recv_packets([PACKET_V9_TEMPLATE_MIXED])
132 |         self.assertEqual(len(pkts), 1)
133 |         p = pkts[0]
134 |         self.assertEqual(p.client[0], "127.0.0.1")
135 |         self.assertEqual(len(p.export.flows), 8)  # count flows
136 |         self.assertEqual(len(p.export.templates), 2)  # count new templates
137 |
138 |         # Inspect contents of specific flows
139 |         flow = p.export.flows[0]
140 |         self.assertEqual(flow.PROTOCOL, 6)  # TCP
141 |         self.assertEqual(flow.L4_SRC_PORT, 80)
142 |         self.assertEqual(flow.IPV4_SRC_ADDR, "127.0.0.1")
143 |
144 |         flow = p.export.flows[-1]  # last flow
145 |         self.assertEqual(flow.PROTOCOL, 17)  # UDP
146 |         self.assertEqual(flow.L4_DST_PORT, 53)
147 |
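        # Note the difference to the v1/v5 tests above: here the v9 parser yields
        # IPV4_SRC_ADDR as a dotted string ("127.0.0.1"), while the v1/v5 flows carry
        # raw integers. ipaddress.ip_address() accepts both forms, so version-agnostic
        # checks can normalize with it, e.g. (a sketch, not part of this test):
        #     ipaddress.ip_address(p.export.flows[0].IPV4_SRC_ADDR) == ipaddress.ip_address("127.0.0.1")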
148 |         # send template and multiple export packets
149 |         pkts, _, _ = send_recv_packets([PACKET_V9_TEMPLATE, *PACKETS_V9])
150 |         self.assertEqual(len(pkts), 4)
151 |         self.assertEqual(pkts[0].export.header.version, 9)
152 |
153 |         # check amount of flows across all packets
154 |         total_flows = 0
155 |         for packet in pkts:
156 |             total_flows += len(packet.export.flows)
157 |         self.assertEqual(total_flows, 8 + 12 + 12 + 12)
158 |
--------------------------------------------------------------------------------
/tests/test_performance.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.
5 |
6 | Copyright 2016-2020 Dominik Pataky
7 | Licensed under MIT License. See LICENSE.
8 | """
9 | import cProfile
10 | import io
11 | import linecache
12 | import pstats
13 | import tracemalloc
14 | import unittest
15 |
16 | from tests.lib import send_recv_packets, generate_packets
17 |
18 | NUM_PACKETS_PERFORMANCE = 2000
19 |
20 |
21 | @unittest.skip("Not necessary in functional tests, used as analysis tool")
22 | class TestNetflowIPFIXPerformance(unittest.TestCase):
23 |     def setUp(self) -> None:
24 |         """
25 |         Before each test run, start tracemalloc profiling.
26 |         :return:
27 |         """
28 |         tracemalloc.start()
29 |         print("\n\n")
30 |
31 |     def tearDown(self) -> None:
32 |         """
33 |         After each test run, stop tracemalloc.
34 |         :return:
35 |         """
36 |         tracemalloc.stop()
37 |
38 |     def _memory_of_version(self, version, store_packets=500) -> tracemalloc.Snapshot:
39 |         """
40 |         Create memory snapshot of collector run with packets of version :version:
41 |         :param version:
42 |         :return:
43 |         """
44 |         if not tracemalloc.is_tracing():
45 |             raise RuntimeError
46 |         pkts, t1, t2 = send_recv_packets(generate_packets(NUM_PACKETS_PERFORMANCE, version),
47 |                                          store_packets=store_packets)
48 |         self.assertEqual(len(pkts), NUM_PACKETS_PERFORMANCE)
49 |         snapshot = tracemalloc.take_snapshot()
50 |         del pkts
51 |         return snapshot
52 |
53 |     @staticmethod
54 |     def _print_memory_statistics(snapshot: tracemalloc.Snapshot, key: str, topx: int = 10):
55 |         """
56 |         Print memory statistics from a tracemalloc.Snapshot in certain formats.
57 |         :param snapshot:
58 |         :param key:
59 |         :param topx:
60 |         :return:
61 |         """
62 |         if key not in ["filename", "lineno", "traceback"]:
63 |             raise KeyError
64 |
65 |         stats = snapshot.statistics(key)
66 |         if key == "lineno":
67 |             for idx, stat in enumerate(stats[:topx]):
68 |                 frame = stat.traceback[0]
69 |                 print("\n{idx:02d}: {filename}:{lineno} {size:.1f} KiB, count {count}".format(
70 |                     idx=idx + 1, filename=frame.filename, lineno=frame.lineno, size=stat.size / 1024, count=stat.count
71 |                 ))
72 |
73 |                 lines = []
74 |                 lines_whitespaces = []
75 |                 for lineshift in range(-3, 2):
76 |                     stat = linecache.getline(frame.filename, frame.lineno + lineshift)
77 |                     lines_whitespaces.append(len(stat) - len(stat.lstrip(" ")))  # count leading whitespace
78 |                     lines.append(stat.strip())
79 |                 lines_whitespaces = [x - min([y for y in lines_whitespaces if y > 0]) for x in lines_whitespaces]
80 |                 for lidx, stat in enumerate(lines):
81 |                     print(" {}{}".format("> " if lidx == 3 else "| ", " " * lines_whitespaces.pop(0) + stat))
82 |         elif key == "filename":
83 |             for idx, stat in enumerate(stats[:topx]):
84 |                 frame = stat.traceback[0]
85 |                 print("{idx:02d}: {filename:80s} {size:6.1f} KiB, count {count:5