├── .ci └── check-format.sh ├── .clang-format ├── .github └── workflows │ └── main.yml ├── .gitignore ├── LICENSE ├── Makefile ├── README.md ├── config.h ├── include ├── linker_set.h ├── nstack_in.h ├── nstack_link.h ├── nstack_socket.h ├── nstack_util.h └── queue_r.h ├── src ├── arp.c ├── collection.h ├── ether.c ├── ether_fcs.c ├── icmp.c ├── ip.c ├── ip_defer.c ├── ip_defer.h ├── ip_fragment.c ├── ip_route.c ├── linux │ └── ether.c ├── logger.h ├── nstack.c ├── nstack_arp.h ├── nstack_ether.h ├── nstack_icmp.h ├── nstack_internal.h ├── nstack_ip.h ├── socket.c ├── tcp.c ├── tcp.h ├── tree.h ├── udp.c └── udp.h ├── tests ├── tcptest.c ├── tnetcat.c ├── udp.c └── unetcat.c └── tools ├── assert.sh ├── gdb ├── .gitignore └── nstack.py ├── gdbinit ├── ping_test.sh ├── run.sh └── testenv.sh /.ci/check-format.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | SOURCES=$(find $(git rev-parse --show-toplevel) | egrep "\.(c|cxx|cpp|h|hpp)\$") 4 | 5 | set -x 6 | 7 | for file in ${SOURCES}; 8 | do 9 | clang-format-12 ${file} > expected-format 10 | diff -u -p --label="${file}" --label="expected coding style" ${file} expected-format 11 | done 12 | exit $(clang-format-12 --output-replacements-xml ${SOURCES} | egrep -c "") 13 | -------------------------------------------------------------------------------- /.clang-format: -------------------------------------------------------------------------------- 1 | BasedOnStyle: Chromium 2 | Language: Cpp 3 | MaxEmptyLinesToKeep: 3 4 | IndentCaseLabels: false 5 | AllowShortIfStatementsOnASingleLine: false 6 | AllowShortCaseLabelsOnASingleLine: false 7 | AllowShortLoopsOnASingleLine: false 8 | DerivePointerAlignment: false 9 | PointerAlignment: Right 10 | SpaceAfterCStyleCast: true 11 | TabWidth: 4 12 | UseTab: Never 13 | IndentWidth: 4 14 | BreakBeforeBraces: Linux 15 | AccessModifierOffset: -4 16 | ForEachMacros: 17 | - SET_FOREACH 18 | - RB_FOREACH 19 | - SLIST_FOREACH 20 | - STAILQ_FOREACH 21 | - LIST_FOREACH 22 | - TAILQ_FOREACH 23 | -------------------------------------------------------------------------------- /.github/workflows/main.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | 3 | on: [push, pull_request] 4 | 5 | jobs: 6 | nstack: 7 | runs-on: ubuntu-22.04 8 | steps: 9 | - uses: actions/checkout@v3.1.0 10 | - name: default build 11 | run: make 12 | coding_style: 13 | runs-on: ubuntu-22.04 14 | steps: 15 | - uses: actions/checkout@v3.1.0 16 | - name: coding convention 17 | run: | 18 | sudo apt-get install -q -y clang-format-12 19 | sh .ci/check-format.sh 20 | shell: bash 21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | *.o.d 3 | *.swp 4 | build/ 5 | __pycache__/ 6 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | nstack is freely redistributable under the two-clause BSD License: 2 | 3 | Copyright (c) 2018-2019, 2022 National Cheng Kung University, Taiwan. 4 | Copyright (c) 2015, 2017 Olli Vanhoja. 5 | All rights reserved. 
6 | 7 | Redistribution and use in source and binary forms, with or without 8 | modification, are permitted provided that the following conditions are met: 9 | 10 | * Redistributions of source code must retain the above copyright notice, this 11 | list of conditions and the following disclaimer. 12 | 13 | * Redistributions in binary form must reproduce the above copyright notice, 14 | this list of conditions and the following disclaimer in the documentation 15 | and/or other materials provided with the distribution. 16 | 17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 18 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 19 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 20 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 21 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 22 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 23 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 24 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 25 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 26 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 27 | 28 | ---------------------------------------------------------------------------- 29 | 30 | File include/linker_set.h 31 | 32 | nstack utilizes parts of FreeBSD libraries, which are freely 33 | redistributable under the two-clause BSD License: 34 | 35 | Copyright (c) 1999 John D. Polstra 36 | Copyright (c) 1999, 2001 Peter Wemm 37 | All rights reserved. 38 | 39 | Redistribution and use in source and binary forms, with or without 40 | modification, are permitted provided that the following conditions 41 | are met: 42 | 1. Redistributions of source code must retain the above copyright 43 | notice, this list of conditions and the following disclaimer. 44 | 2. Redistributions in binary form must reproduce the above copyright 45 | notice, this list of conditions and the following disclaimer in the 46 | documentation and/or other materials provided with the distribution. 47 | 48 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND 49 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 50 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 51 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE 52 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 53 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 54 | OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 55 | HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 56 | LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 57 | OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 58 | SUCH DAMAGE. 59 | 60 | ---------------------------------------------------------------------------- 61 | 62 | File src/tree.h 63 | 64 | nstack utilizes parts of FreeBSD libraries, which are freely 65 | redistributable under the two-clause BSD License: 66 | 67 | Copyright 2002 Niels Provos 68 | All rights reserved. 69 | 70 | Redistribution and use in source and binary forms, with or without 71 | modification, are permitted provided that the following conditions 72 | are met: 73 | 1. 
Redistributions of source code must retain the above copyright 74 | notice, this list of conditions and the following disclaimer. 75 | 2. Redistributions in binary form must reproduce the above copyright 76 | notice, this list of conditions and the following disclaimer in the 77 | documentation and/or other materials provided with the distribution. 78 | 79 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR 80 | IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 81 | OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 82 | IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 83 | INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT 84 | NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 85 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 86 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 87 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF 88 | THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 89 | 90 | ---------------------------------------------------------------------------- 91 | 92 | File src/collection.h 93 | 94 | nstack utilizes parts of FreeBSD libraries, which are freely 95 | redistributable under the two-clause BSD License: 96 | 97 | Copyright (c) 1991, 1993 The Regents of the University of California. 98 | All rights reserved. 99 | 100 | Redistribution and use in source and binary forms, with or without 101 | modification, are permitted provided that the following conditions 102 | are met: 103 | 1. Redistributions of source code must retain the above copyright 104 | notice, this list of conditions and the following disclaimer. 105 | 2. Redistributions in binary form must reproduce the above copyright 106 | notice, this list of conditions and the following disclaimer in the 107 | documentation and/or other materials provided with the distribution. 108 | 109 | THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 110 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 111 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 112 | ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 113 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 114 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 115 | OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 116 | HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 117 | LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 118 | OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 119 | SUCH DAMAGE. 
120 | 121 | ---------------------------------------------------------------------------- 122 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | CC ?= gcc 2 | OUT ?= build 3 | 4 | CFLAGS := -Wall -Wextra -Wno-unused-parameter -g 5 | CFLAGS += --std=gnu99 -pthread 6 | CFLAGS += -include config.h -I include 7 | 8 | SRC = src 9 | 10 | OBJS_core := \ 11 | arp.o \ 12 | ether.o \ 13 | ether_fcs.o \ 14 | icmp.o \ 15 | ip.o \ 16 | ip_defer.o \ 17 | ip_fragment.o \ 18 | ip_route.o \ 19 | tcp.o \ 20 | udp.o \ 21 | nstack.o \ 22 | linux/ether.o 23 | OBJS_core := $(addprefix $(OUT)/, $(OBJS_core)) 24 | 25 | OBJS_socket := \ 26 | socket.o 27 | OBJS_socket := $(addprefix $(OUT)/, $(OBJS_socket)) 28 | 29 | OBJS := $(OBJS_core) $(OBJS_socket) 30 | deps := $(OBJS:%.o=%.o.d) 31 | 32 | SHELL_HACK := $(shell mkdir -p $(OUT)) 33 | SHELL_HACK := $(shell mkdir -p $(OUT)/linux) 34 | 35 | EXEC = $(OUT)/inetd $(OUT)/tnetcat $(OUT)/unetcat $(OUT)/tcptest 36 | 37 | all: $(EXEC) 38 | 39 | $(OUT)/%.o: $(SRC)/%.c 40 | $(CC) -o $@ $(CFLAGS) -c -MMD -MF $@.d $< 41 | 42 | $(OUT)/inetd: $(OBJS_core) 43 | $(CC) $(CFLAGS) -o $@ $^ 44 | 45 | $(OUT)/tnetcat: $(OBJS_socket) 46 | $(CC) $(CFLAGS) -o $@ tests/tnetcat.c $^ 47 | 48 | $(OUT)/unetcat: $(OBJS_socket) 49 | $(CC) $(CFLAGS) -o $@ tests/unetcat.c $^ 50 | 51 | $(OUT)/tcptest: $(OBJS_socket) 52 | $(CC) $(CFLAGS) -o $@ tests/tcptest.c $^ 53 | 54 | clean: 55 | $(RM) $(EXEC) $(OBJS) $(deps) 56 | distclean: clean 57 | $(RM) -r $(OUT) 58 | 59 | -include $(deps) 60 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # nstack 2 | 3 | ## Overview 4 | 5 | nstack is a Linux userspace TCP/IP stack. It was constructed to meet the following goals: 6 | * Learn TCP/IP 7 | * Learn Linux systems/network programming 8 | * Learn Linux Socket API 9 | 10 | Current features: 11 | * One network interface and socket 12 | * Ethernet frame handling 13 | * ARP request/reply, simple caching 14 | * ICMP pings and replies 15 | * IPv4 packet handling, checksum 16 | * TCPv4 Handshake 17 | * TCP data transmission 18 | 19 | 20 | ## Build and Test 21 | 22 | ```shell 23 | make 24 | ``` 25 | 26 | Set up test environment: 27 | ```shell 28 | sudo tools/testenv.sh start 29 | tools/run.sh veth1 30 | ``` 31 | 32 | Execute `ping` inside test environment: 33 | ```shell 34 | tools/ping_test.sh 35 | ``` 36 | 37 | Expected nstack messages: 38 | ``` 39 | arp_gratuitous: Announce 10.0.0.2 40 | nstack_ingress_thread: Waiting for rx 41 | nstack_ingress_thread: Frame received! 42 | ether_input: proto id: 0x800 43 | ip_input: proto id: 0x1 44 | icmp_input: ICMP type: 8 45 | nstack_ingress_thread: tick 46 | nstack_ingress_thread: Waiting for rx 47 | nstack_ingress_thread: Frame received! 48 | ether_input: proto id: 0x800 49 | ip_input: proto id: 0x1 50 | icmp_input: ICMP type: 8 51 | ``` 52 | 53 | Ending the test environment: 54 | ```shell 55 | sudo tools/testenv.sh stop 56 | ``` 57 | 58 | # Licensing 59 | 60 | nstack is freely redistributable under the two-clause BSD License. 61 | Use of this source code is governed by a BSD-style license that can be found 62 | in the `LICENSE` file. 
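# Socket API example

`include/nstack_socket.h` declares the userspace socket interface: `nstack_listen()`, `nstack_recvfrom()` and `nstack_sendto()`. The echo loop below is only a sketch of how a client of that interface might look; it is not one of the bundled test programs, and the socket path, the buffer size and the assumption that `nstack_listen()` returns `NULL` on failure are hypothetical. See `tests/unetcat.c` and `tests/tnetcat.c` for the real clients.

```c
#include <stdio.h>
#include <sys/types.h>

#include "nstack_socket.h"

int main(void)
{
    /* Hypothetical socket path; tools/run.sh sets up the real environment. */
    void *sock = nstack_listen("/tmp/nstack_echo");
    if (!sock) { /* assumed failure convention */
        perror("nstack_listen");
        return 1;
    }

    for (;;) {
        char buf[4096]; /* cf. NSTACK_DATAGRAM_SIZE_MAX in config.h */
        struct nstack_sockaddr peer;

        /* Receive one datagram and echo it back to the sender. */
        ssize_t len = nstack_recvfrom(sock, buf, sizeof(buf), 0, &peer);
        if (len <= 0)
            continue;
        nstack_sendto(sock, buf, (size_t) len, 0, &peer);
    }
}
```

Like the programs in `tests/`, such a client is linked only against the socket objects (`$(OBJS_socket)` in the Makefile), while the stack itself is built as `build/inetd` from the core objects.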
63 | 64 | # Reference 65 | 66 | * [Level-IP](https://github.com/saminiir/level-ip) and [informative blog](http://www.saminiir.com/) 67 | * [Linux kernel TCP/IP stack](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/ipv4) 68 | * [picoTCP](https://github.com/tass-belgium/picotcp) 69 | * [tapip](https://github.com/chobits/tapip) 70 | -------------------------------------------------------------------------------- /config.h: -------------------------------------------------------------------------------- 1 | /** 2 | * @addtogroup NSTACK_CONFIG 3 | * @{ 4 | */ 5 | 6 | #pragma once 7 | 8 | #define NSTACK_DATAGRAM_SIZE_MAX 4096 9 | 10 | #define NSTACK_DATAGRAM_BUF_SIZE 16384 11 | 12 | /** 13 | * Periodic IP event tick. 14 | * How often should periodic tasks run. 15 | * This is handled by IP but it's meant to be more generic. 16 | */ 17 | #define NSTACK_PERIODIC_EVENT_SEC 10 18 | 19 | /** 20 | * Periodic TCP timer tick. (500 ms) 21 | */ 22 | #define NSTACK_TCP_TIMER_USEC 500000 23 | 24 | /** 25 | * ARP Configuration. 26 | * @{ 27 | */ 28 | 29 | /** 30 | * ARP Cache size. 31 | * The size of ARP cache in entries. 32 | * If ARP runs out of slots it will free the oldest validdynamic entry in 33 | * the cache; if all entries all static and thus there is no more empty 34 | * slots left the ARP insert will fail. 35 | */ 36 | #define NSTACK_ARP_CACHE_SIZE 50 37 | 38 | /** 39 | * @} 40 | */ 41 | 42 | /* 43 | * @{ 44 | * IP Configuration. 45 | */ 46 | 47 | /** 48 | * RIB (Routing Information Base) size in the number of entries. 49 | */ 50 | #define NSTACK_IP_RIB_SIZE 5 51 | 52 | /** 53 | * Max number of deferred IP packets. 54 | * Maximum number of IP packets waiting for transmission, ie. waiting for ARP 55 | * to provide a destination MAC address. 56 | */ 57 | #define NSTACK_IP_DEFER_MAX 20 58 | 59 | /** 60 | * Unreachable destination IP. 61 | * + 0 = Drop silently 62 | * + 1 = Send ICMP Destination host unreachable 63 | */ 64 | #define NSTACK_IP_SEND_HOSTUNREAC 1 65 | 66 | /** 67 | * The number of buffers reserved for IP fragment reassembly. 68 | */ 69 | #define NSTACK_IP_FRAGMENT_BUF 4 70 | 71 | /** 72 | * IP fragment reassembly timer lower bound [sec]. 73 | * The RFC recommends a default value of 15 seconds. 74 | */ 75 | #define NSTACK_IP_FRAGMENT_TLB 15 76 | 77 | /** 78 | * @} 79 | */ 80 | 81 | /** 82 | * @} 83 | */ 84 | -------------------------------------------------------------------------------- /include/linker_set.h: -------------------------------------------------------------------------------- 1 | /** 2 | * Copyright (c) 1999 John D. Polstra 3 | * Copyright (c) 1999, 2001 Peter Wemm 4 | * All rights reserved. 5 | * 6 | * Redistribution and use in source and binary forms, with or without 7 | * modification, are permitted provided that the following conditions 8 | * are met: 9 | * 1. Redistributions of source code must retain the above copyright 10 | * notice, this list of conditions and the following disclaimer. 11 | * 2. Redistributions in binary form must reproduce the above copyright 12 | * notice, this list of conditions and the following disclaimer in the 13 | * documentation and/or other materials provided with the distribution. 14 | * 15 | * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND 16 | * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 17 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 18 | * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE 19 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 20 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 21 | * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 22 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 23 | * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 24 | * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 25 | * SUCH DAMAGE. 26 | */ 27 | 28 | #pragma once 29 | 30 | #include "nstack_util.h" 31 | 32 | /* 33 | * The following macros are used to declare global sets of objects, which 34 | * are collected by the linker into a `linker_set' as defined below. 35 | * For ELF, this is done by constructing a separate segment for each set. 36 | */ 37 | 38 | /* 39 | * Private macros, not to be used outside this header file. 40 | */ 41 | #define __MAKE_SET(set, sym) \ 42 | __GLOBL(__CONCAT(__start_set_, set)); \ 43 | __GLOBL(__CONCAT(__stop_set_, set)); \ 44 | static void const *const __set_##set##_sym_##sym __section("set_" #set) \ 45 | __used = &sym 46 | 47 | /* 48 | * Public macros. 49 | */ 50 | #define TEXT_SET(set, sym) __MAKE_SET(set, sym) 51 | #define DATA_SET(set, sym) __MAKE_SET(set, sym) 52 | #define BSS_SET(set, sym) __MAKE_SET(set, sym) 53 | #define ABS_SET(set, sym) __MAKE_SET(set, sym) 54 | #define SET_ENTRY(set, sym) __MAKE_SET(set, sym) 55 | 56 | /* 57 | * Initialize before referring to a given linker set. 58 | */ 59 | #define SET_DECLARE(set, ptype) \ 60 | extern ptype *__CONCAT(__start_set_, set); \ 61 | extern ptype *__CONCAT(__stop_set_, set) 62 | 63 | #define SET_BEGIN(set) (&__CONCAT(__start_set_, set)) 64 | #define SET_LIMIT(set) (&__CONCAT(__stop_set_, set)) 65 | 66 | /* 67 | * Iterate over all the elements of a set. 68 | * 69 | * Sets always contain addresses of things, and "pvar" points to words 70 | * containing those addresses. Thus is must be declared as "type **pvar", 71 | * and the address of each set item is obtained inside the loop by "*pvar". 72 | */ 73 | #define SET_FOREACH(pvar, set) \ 74 | for (pvar = SET_BEGIN(set); pvar < SET_LIMIT(set); pvar++) 75 | 76 | #define SET_ITEM(set, i) ((SET_BEGIN(set))[i]) 77 | 78 | /* 79 | * Provide a count of the items in a set. 80 | */ 81 | #define SET_COUNT(set) (SET_LIMIT(set) - SET_BEGIN(set)) 82 | -------------------------------------------------------------------------------- /include/nstack_in.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include /* TODO Maybe we want to define our own version */ 4 | #include 5 | #include 6 | 7 | typedef uint32_t in_addr_t; 8 | typedef uint16_t in_port_t; 9 | 10 | /** 11 | * Convert an IP address from integer representation to a C string. 12 | * @note The minimum size of buf is IP_STR_LEN. 13 | * @param[in] ip is the IP address to be converted. 14 | * @param[out] buf is the destination buffer. 
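 *
 * Example (illustrative): ip2str(0x0A000002, buf) writes "10.0.0.2" into buf;
 * the most significant byte of ip becomes the first octet, i.e. the address
 * is expected in host byte order (see the byte extraction below).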
15 | */ 16 | static inline void ip2str(in_addr_t ip, char *buf) 17 | { 18 | unsigned char bytes[4]; 19 | bytes[0] = ip & 0xFF; 20 | bytes[1] = (ip >> 8) & 0xFF; 21 | bytes[2] = (ip >> 16) & 0xFF; 22 | bytes[3] = (ip >> 24) & 0xFF; 23 | sprintf(buf, "%d.%d.%d.%d", bytes[3], bytes[2], bytes[1], bytes[0]); 24 | } 25 | -------------------------------------------------------------------------------- /include/nstack_link.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | 5 | #define LINK_MAC_ALEN 6 6 | 7 | typedef uint8_t mac_addr_t[LINK_MAC_ALEN]; 8 | -------------------------------------------------------------------------------- /include/nstack_socket.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | /** 4 | * nstack sockets. 5 | * @addtogroup Socket 6 | * @{ 7 | */ 8 | 9 | #include 10 | #include 11 | 12 | #include "linker_set.h" 13 | #include "nstack_in.h" 14 | #include "queue_r.h" 15 | 16 | #define NSTACK_SHMEM_SIZE \ 17 | (sizeof(struct nstack_sock_ctrl) + 2 * sizeof(struct queue_cb) + \ 18 | 2 * NSTACK_DATAGRAM_BUF_SIZE) 19 | 20 | #define NSTACK_SOCK_CTRL(x) ((struct nstack_sock_ctrl *) (x)) 21 | 22 | #define NSTACK_INGRESS_QADDR(x) \ 23 | ((struct queue_cb *) ((uintptr_t) NSTACK_SOCK_CTRL(x) + \ 24 | sizeof(struct nstack_sock_ctrl))) 25 | 26 | #define NSTACK_INGRESS_DADDR(x) \ 27 | ((uint8_t *) ((uintptr_t) NSTACK_INGRESS_QADDR(x) + \ 28 | sizeof(struct queue_cb))) 29 | 30 | #define NSTACK_EGRESS_QADDR(x) \ 31 | ((struct queue_cb *) ((uintptr_t) NSTACK_INGRESS_DADDR(x) + \ 32 | NSTACK_DATAGRAM_BUF_SIZE)) 33 | 34 | #define NSTACK_EGRESS_DADDR(x) \ 35 | ((uint8_t *) ((uintptr_t) NSTACK_EGRESS_QADDR(x) + sizeof(struct queue_cb))) 36 | 37 | /** 38 | * Socket domain. 39 | */ 40 | enum nstack_sock_dom { 41 | XF_INET4, /*!< IPv4 address. */ 42 | XF_INET6, /*!< IPv6 address. */ 43 | }; 44 | 45 | /** 46 | * Socket type. 47 | */ 48 | enum nstack_sock_type { 49 | XSOCK_DGRAM, /*!< Unreliable datagram oriented service. */ 50 | XSOCK_STREAM, /*!< Reliable stream oriented service. */ 51 | }; 52 | 53 | /** 54 | * Socket protocol. 55 | */ 56 | enum nstack_sock_proto { 57 | XIP_PROTO_NONE = 0, 58 | XIP_PROTO_TCP, /*!< TCP/IP. */ 59 | XIP_PROTO_UDP, /*!< UDP/IP. */ 60 | XIP_PROTO_LAST 61 | }; 62 | 63 | /** 64 | * Max port number. 65 | */ 66 | #define NSTACK_SOCK_PORT_MAX 49151 67 | 68 | /** 69 | * Socket addresss descriptor. 70 | */ 71 | struct nstack_sockaddr { 72 | union { 73 | in_addr_t inet4_addr; /*!< IPv4 address. */ 74 | }; 75 | union { 76 | int port; /*!< Protocol port. 
*/ 77 | }; 78 | }; 79 | 80 | struct nstack_sock_ctrl { 81 | pid_t pid_inetd; 82 | pid_t pid_end; 83 | }; 84 | 85 | struct nstack_sock_info { 86 | enum nstack_sock_dom sock_dom; 87 | enum nstack_sock_type sock_type; 88 | enum nstack_sock_proto sock_proto; 89 | struct nstack_sockaddr sock_addr; 90 | }; 91 | 92 | struct nstack_dgram { 93 | struct nstack_sockaddr srcaddr; 94 | struct nstack_sockaddr dstaddr; 95 | size_t buf_size; 96 | uint8_t buf[0]; 97 | }; 98 | 99 | #define NSTACK_MSG_PEEK 0x1 100 | 101 | void *nstack_listen(const char *socket_path); 102 | ssize_t nstack_recvfrom(void *socket, 103 | void *restrict buffer, 104 | size_t length, 105 | int flags, 106 | struct nstack_sockaddr *restrict address); 107 | ssize_t nstack_sendto(void *socket, 108 | const void *buffer, 109 | size_t length, 110 | int flags, 111 | const struct nstack_sockaddr *dest_addr); 112 | 113 | /** 114 | * @} 115 | */ 116 | -------------------------------------------------------------------------------- /include/nstack_util.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #ifndef __GLOBL1 4 | #define __GLOBL1(sym) __asm__(".globl " #sym) 5 | #define __GLOBL(sym) __GLOBL1(sym) 6 | #endif 7 | 8 | #ifndef __used 9 | #define __used __attribute__((__used__)) 10 | #endif 11 | 12 | #ifndef __unused 13 | #define __unused __attribute__((__unused__)) 14 | #endif 15 | 16 | #ifndef __section 17 | #define __section(x) __attribute__((__section__(x))) 18 | #endif 19 | 20 | #define __constructor __attribute__((constructor)) 21 | #define __destructor __attribute__((destructor)) 22 | 23 | /** 24 | * Returns a container of ptr, which is a element in some struct. 25 | * @param ptr is a pointer to a element in struct. 26 | * @param type is the type of the container struct. 27 | * @param member is the name of the ptr in container struct. 28 | * @return Pointer to the container of ptr. 29 | */ 30 | #define container_of(ptr, type, member) \ 31 | ((type *) ((uint8_t *) (ptr) -offsetof(type, member))) 32 | 33 | #define num_elem(x) (sizeof(x) / sizeof(*(x))) 34 | 35 | static inline int imax(int a, int b) 36 | { 37 | return (a > b ? a : b); 38 | } 39 | 40 | static inline int imin(int a, int b) 41 | { 42 | return (a < b ? a : b); 43 | } 44 | 45 | static inline long lmax(long a, long b) 46 | { 47 | return (a > b ? a : b); 48 | } 49 | 50 | static inline long lmin(long a, long b) 51 | { 52 | return (a < b ? a : b); 53 | } 54 | 55 | static inline unsigned int max(unsigned int a, unsigned int b) 56 | { 57 | return (a > b ? a : b); 58 | } 59 | 60 | static inline unsigned int min(unsigned int a, unsigned int b) 61 | { 62 | return (a < b ? a : b); 63 | } 64 | 65 | static inline unsigned long ulmax(unsigned long a, unsigned long b) 66 | { 67 | return (a > b ? a : b); 68 | } 69 | 70 | static inline unsigned long ulmin(unsigned long a, unsigned long b) 71 | { 72 | return (a < b ? a : b); 73 | } 74 | 75 | static inline unsigned int smin(size_t a, size_t b) 76 | { 77 | return (a < b ? 
a : b); 78 | } 79 | 80 | static inline unsigned int uround_up(unsigned n, unsigned s) 81 | { 82 | return ((n + s - 1) / s) * s; 83 | } 84 | -------------------------------------------------------------------------------- /include/queue_r.h: -------------------------------------------------------------------------------- 1 | /** 2 | * @brief Thread-safe queue 3 | */ 4 | 5 | /** 6 | * @addtogroup queue_r 7 | * @{ 8 | */ 9 | 10 | #pragma once 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | 17 | /** 18 | * Queue control block. 19 | */ 20 | typedef struct queue_cb { 21 | size_t b_size; /*!< Block size in bytes. */ 22 | size_t a_len; /*!< Array length. */ 23 | size_t m_write; /*!< Write index. */ 24 | size_t m_read; /*!< Read index. */ 25 | } queue_cb_t; 26 | 27 | #define QUEUE_INITIALIZER(block_size, array_size) \ 28 | (struct queue_cb) \ 29 | { \ 30 | .b_size = (block_size), .a_len = (array_size / block_size), \ 31 | .m_write = 0, .m_read = 0, \ 32 | } 33 | 34 | /** 35 | * Create a new queue control block. 36 | * Initializes a new queue control block and returns it as a value. 37 | * @param block_size the size of single data block/struct/data type in 38 | * data_array in bytes. 39 | * @param arra_size the size of the data_array in bytes. 40 | * @return a new queue_cb_t queue control block structure. 41 | */ 42 | static inline queue_cb_t queue_create(size_t block_size, size_t array_size) 43 | { 44 | queue_cb_t cb = {.b_size = block_size, 45 | .a_len = array_size / block_size, 46 | .m_read = 0, 47 | .m_write = 0}; 48 | return cb; 49 | } 50 | 51 | /** 52 | * Allocate an element from the queue. 53 | * @param cb is a pointer to the queue control block. 54 | */ 55 | static inline int queue_alloc(queue_cb_t *cb) 56 | { 57 | const size_t write = cb->m_write; 58 | const size_t next_element = (write + 1) % cb->a_len; 59 | const size_t b_size = cb->b_size; 60 | 61 | /* Check that the queue is not full */ 62 | if (next_element == cb->m_read) 63 | return -1; 64 | 65 | return write * b_size; 66 | } 67 | 68 | /** 69 | * Commit previous allocation from the queue. 70 | */ 71 | static inline void queue_commit(queue_cb_t *cb) 72 | { 73 | const size_t next_element = (cb->m_write + 1) % cb->a_len; 74 | cb->m_write = next_element; 75 | } 76 | 77 | /** 78 | * Peek an element from the queue. 79 | * @param cb is a pointer to the queue control block. 80 | * @param index is the location where element is located in the buffer. 81 | * @return false if queue is empty; otherwise operation was succeed. 82 | */ 83 | static inline bool queue_peek(queue_cb_t *cb, int *index) 84 | { 85 | const size_t read = cb->m_read; 86 | const size_t b_size = cb->b_size; 87 | 88 | /* Check that the queue is not empty */ 89 | if (read == cb->m_write) 90 | return false; 91 | 92 | *index = read * b_size; 93 | return true; 94 | } 95 | 96 | /** 97 | * Discard n number of elements in the queue from the read end. 98 | * @param cb is a pointer to the queue control block. 99 | * @return Returns the number of elements skipped. 100 | */ 101 | static inline int queue_discard(queue_cb_t *cb, size_t n) 102 | { 103 | size_t count; 104 | for (count = 0; count < n; count++) { 105 | const size_t read = cb->m_read; 106 | 107 | /* Check that the queue is not empty */ 108 | if (read == cb->m_write) 109 | break; 110 | 111 | cb->m_read = (read + 1) % cb->a_len; 112 | } 113 | return count; 114 | } 115 | 116 | /** 117 | * Clear the queue. 118 | * This operation is considered safe when committed from the push end thread. 
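 * The push end is the producer side, i.e. the thread that appends elements
 * with queue_alloc()/queue_commit(). A sketch of that pattern (data_array is
 * the caller-owned buffer backing this control block and element is some
 * block-sized object; both are illustrative only):
 *
 *	int offset = queue_alloc(cb);
 *	if (offset >= 0) {
 *	    memcpy(data_array + offset, &element, cb->b_size);
 *	    queue_commit(cb);
 *	}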
119 | * @param cb is a pointer to the queue control block. 120 | */ 121 | static inline void queue_clear_from_push_end(queue_cb_t *cb) 122 | { 123 | cb->m_write = cb->m_read; 124 | } 125 | 126 | /** 127 | * Clear the queue. 128 | * This operation is considered safe when committed from the pop end thread. 129 | * @param cb is a pointer to the queue control block. 130 | */ 131 | static inline void queue_clear_from_pop_end(queue_cb_t *cb) 132 | { 133 | cb->m_read = cb->m_write; 134 | } 135 | 136 | /** 137 | * Check if the queue is empty. 138 | * @param cb is a pointer to the queue control block. 139 | * @return false if the queue is not empty. 140 | */ 141 | static inline bool queue_is_empty(queue_cb_t *cb) 142 | { 143 | return cb->m_write == cb->m_read; 144 | } 145 | 146 | /** 147 | * Check if the queue is full. 148 | * @param cb is a pointer to the queue control block. 149 | * @return false if the queue is not full. 150 | */ 151 | static inline bool queue_is_full(queue_cb_t *cb) 152 | { 153 | return ((cb->m_write + 1) % cb->a_len) == cb->m_read; 154 | } 155 | 156 | /** 157 | * @} 158 | */ 159 | -------------------------------------------------------------------------------- /src/arp.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "nstack_util.h" 5 | 6 | #include "ip_defer.h" 7 | #include "logger.h" 8 | #include "nstack_arp.h" 9 | #include "nstack_ether.h" 10 | #include "nstack_internal.h" 11 | #include "nstack_ip.h" 12 | #include "tree.h" 13 | 14 | #define ARP_CACHE_AGE_MAX (20 * 60 * 60) /* Expiration time */ 15 | 16 | struct arp_cache_entry { 17 | in_addr_t ip_addr; 18 | mac_addr_t haddr; 19 | int age; 20 | RB_ENTRY(arp_cache_entry) _entry; 21 | }; 22 | 23 | RB_HEAD(arp_cache_tree, arp_cache_entry); 24 | 25 | static struct arp_cache_entry arp_cache[NSTACK_ARP_CACHE_SIZE]; 26 | static struct arp_cache_tree arp_cache_head = RB_INITIALIZER(); 27 | 28 | static int arp_cache_cmp(struct arp_cache_entry *a, struct arp_cache_entry *b) 29 | { 30 | return a->ip_addr - b->ip_addr; 31 | } 32 | 33 | RB_GENERATE_STATIC(arp_cache_tree, arp_cache_entry, _entry, arp_cache_cmp); 34 | 35 | static int arp_request(int ether_handle, in_addr_t spa, in_addr_t tpa); 36 | static struct arp_cache_entry *arp_cache_get_entry(in_addr_t ip_addr); 37 | 38 | static void arp_hton(const struct arp_ip *host, struct arp_ip *net) 39 | { 40 | net->arp_htype = htons(host->arp_htype); 41 | net->arp_ptype = htons(host->arp_ptype); 42 | net->arp_hlen = host->arp_hlen; 43 | net->arp_plen = host->arp_plen; 44 | net->arp_oper = htons(host->arp_oper); 45 | memmove(net->arp_sha, host->arp_sha, sizeof(mac_addr_t)); 46 | net->arp_spa = ntohl(host->arp_spa); 47 | memmove(net->arp_tha, host->arp_tha, sizeof(mac_addr_t)); 48 | net->arp_tpa = ntohl(host->arp_tpa); 49 | } 50 | 51 | static void arp_ntoh(const struct arp_ip *net, struct arp_ip *host) 52 | { 53 | host->arp_htype = htons(net->arp_htype); 54 | host->arp_ptype = htons(net->arp_ptype); 55 | host->arp_hlen = net->arp_hlen; 56 | host->arp_plen = net->arp_plen; 57 | host->arp_oper = htons(net->arp_oper); 58 | memmove(host->arp_sha, net->arp_sha, sizeof(mac_addr_t)); 59 | host->arp_spa = ntohl(net->arp_spa); 60 | memmove(host->arp_tha, net->arp_tha, sizeof(mac_addr_t)); 61 | host->arp_tpa = ntohl(net->arp_tpa); 62 | } 63 | 64 | int arp_cache_insert(in_addr_t ip_addr, 65 | const mac_addr_t haddr, 66 | enum arp_cache_entry_type type) 67 | { 68 | struct arp_cache_entry *it; 69 | struct arp_cache_entry *entry = 
NULL; 70 | 71 | if (ip_addr == 0) 72 | return 0; 73 | 74 | if ((entry = arp_cache_get_entry(ip_addr)) > 0) { 75 | entry->age = (int) type; 76 | return 0; 77 | } 78 | 79 | it = arp_cache; 80 | for (size_t i = 0; i < num_elem(arp_cache); i++) { 81 | if (it->age == ARP_CACHE_FREE) { 82 | entry = it; 83 | } else if ((entry && entry->age > it->age) || 84 | (!entry && it->age >= 0)) { 85 | entry = it; 86 | } 87 | it++; 88 | } 89 | if (!entry) { 90 | errno = ENOMEM; 91 | return -1; 92 | } 93 | if (entry->age >= 0) 94 | RB_REMOVE(arp_cache_tree, &arp_cache_head, entry); 95 | 96 | entry->ip_addr = ip_addr; 97 | memcpy(entry->haddr, haddr, sizeof(mac_addr_t)); 98 | entry->age = (int) type; 99 | RB_INSERT(arp_cache_tree, &arp_cache_head, entry); 100 | 101 | return 0; 102 | } 103 | 104 | static struct arp_cache_entry *arp_cache_get_entry(in_addr_t ip_addr) 105 | { 106 | struct arp_cache_entry find = { 107 | .ip_addr = ip_addr, 108 | }; 109 | 110 | return RB_FIND(arp_cache_tree, &arp_cache_head, &find); 111 | } 112 | 113 | void arp_cache_remove(in_addr_t ip_addr) 114 | { 115 | struct arp_cache_entry *entry = arp_cache_get_entry(ip_addr); 116 | 117 | RB_REMOVE(arp_cache_tree, &arp_cache_head, entry); 118 | if (entry) 119 | entry->age = ARP_CACHE_FREE; 120 | } 121 | 122 | int arp_cache_get_haddr(in_addr_t iface, in_addr_t ip_addr, mac_addr_t haddr) 123 | { 124 | struct arp_cache_entry *entry = arp_cache_get_entry(ip_addr); 125 | struct ip_route route; 126 | 127 | if (entry && entry->age >= 0) { 128 | memcpy(haddr, entry->haddr, sizeof(mac_addr_t)); 129 | return 0; 130 | } 131 | 132 | if (!ip_route_find_by_iface(iface, &route) && 133 | !arp_request(route.r_iface_handle, route.r_iface, ip_addr)) { 134 | errno = EHOSTUNREACH; 135 | } 136 | 137 | return -1; 138 | } 139 | 140 | static void arp_cache_update(int delta_time) 141 | { 142 | for (size_t i = 0; i < num_elem(arp_cache); i++) { 143 | struct arp_cache_entry *entry = &arp_cache[i]; 144 | 145 | if (entry->age > ARP_CACHE_AGE_MAX) { 146 | entry->age = ARP_CACHE_FREE; 147 | } else if (entry->age >= 0) { 148 | entry->age += delta_time; 149 | } 150 | } 151 | } 152 | NSTACK_PERIODIC_TASK(arp_cache_update); 153 | 154 | static int arp_input(const struct ether_hdr *hdr __unused, 155 | uint8_t *payload, 156 | size_t bsize) 157 | { 158 | struct arp_ip *arp_net = (struct arp_ip *) payload; 159 | struct arp_ip arp; 160 | 161 | arp_ntoh(arp_net, &arp); 162 | 163 | if (arp.arp_htype != ARP_HTYPE_ETHER) 164 | return -EPROTOTYPE; 165 | 166 | if (arp.arp_ptype == ETHER_PROTO_IPV4) { 167 | struct ip_route route; 168 | char str_ip[IP_STR_LEN]; 169 | 170 | /* Add sender to the ARP cache */ 171 | arp_cache_insert(arp.arp_spa, arp.arp_sha, ARP_CACHE_DYN); 172 | 173 | /* Check for deferred IP packet transmissions */ 174 | ip_defer_handler(0); 175 | 176 | /* Process the opcode */ 177 | switch (arp.arp_oper) { 178 | case ARP_OPER_REQUEST: 179 | ip2str(arp.arp_tpa, str_ip); 180 | LOG(LOG_DEBUG, "ARP request: %s", str_ip); 181 | 182 | if (!ip_route_find_by_iface(arp.arp_tpa, &route)) { 183 | arp_net->arp_oper = htons(ARP_OPER_REPLY); 184 | ether_handle2addr(route.r_iface_handle, arp_net->arp_sha); 185 | memcpy(arp_net->arp_tha, arp.arp_sha, sizeof(mac_addr_t)); 186 | arp_net->arp_tpa = arp_net->arp_spa; 187 | arp_net->arp_spa = htonl(route.r_iface); 188 | 189 | return bsize; 190 | } 191 | break; 192 | case ARP_OPER_REPLY: 193 | /* Nothing more to do. 
*/ 194 | break; 195 | default: 196 | LOG(LOG_WARN, "Invalid ARP op: %d", arp.arp_oper); 197 | break; 198 | } 199 | } else { 200 | LOG(LOG_DEBUG, "Unknown ptype"); 201 | 202 | return -EPROTOTYPE; 203 | } 204 | return 0; 205 | } 206 | ETHER_PROTO_INPUT_HANDLER(ETHER_PROTO_ARP, arp_input); 207 | 208 | /** 209 | * @param[in] sha 210 | * @param[in] spa 211 | * @param[in] tpa 212 | * @param[out] tha 213 | */ 214 | static int arp_request(int ether_handle, in_addr_t spa, in_addr_t tpa) 215 | { 216 | struct arp_ip msg = { 217 | .arp_htype = ARP_HTYPE_ETHER, 218 | .arp_ptype = ETHER_PROTO_IPV4, 219 | .arp_hlen = ETHER_ALEN, 220 | .arp_plen = sizeof(in_addr_t), 221 | .arp_oper = ARP_OPER_REQUEST, 222 | .arp_spa = spa, 223 | .arp_tpa = tpa, 224 | }; 225 | int retval; 226 | 227 | ether_handle2addr(ether_handle, msg.arp_sha); 228 | memset(msg.arp_tha, 0, sizeof(mac_addr_t)); 229 | 230 | arp_hton(&msg, &msg); 231 | retval = ether_send(ether_handle, mac_broadcast_addr, ETHER_PROTO_ARP, 232 | (uint8_t *) (&msg), sizeof(msg)); 233 | 234 | return (retval < 0) ? retval : 0; 235 | } 236 | 237 | int arp_gratuitous(int ether_handle, in_addr_t spa) 238 | { 239 | struct arp_ip msg = { 240 | .arp_htype = ARP_HTYPE_ETHER, 241 | .arp_ptype = ETHER_PROTO_IPV4, 242 | .arp_hlen = ETHER_ALEN, 243 | .arp_plen = sizeof(in_addr_t), 244 | .arp_oper = ARP_OPER_REQUEST, 245 | .arp_spa = spa, 246 | .arp_tpa = spa, 247 | }; 248 | char str_ip[IP_STR_LEN]; 249 | int retval; 250 | 251 | ether_handle2addr(ether_handle, msg.arp_sha); 252 | memset(msg.arp_tha, 0, sizeof(mac_addr_t)); 253 | 254 | ip2str(spa, str_ip); 255 | LOG(LOG_DEBUG, "Announce %s", str_ip); 256 | 257 | arp_hton(&msg, &msg); 258 | retval = ether_send(ether_handle, mac_broadcast_addr, ETHER_PROTO_ARP, 259 | (uint8_t *) (&msg), sizeof(msg)); 260 | if (retval < 0) { 261 | char errmsg[40]; 262 | 263 | strerror_r(errno, errmsg, sizeof(errmsg)); 264 | LOG(LOG_WARN, "Failed to announce %s: %s", str_ip, errmsg); 265 | } 266 | 267 | return 0; 268 | } 269 | -------------------------------------------------------------------------------- /src/collection.h: -------------------------------------------------------------------------------- 1 | /*- 2 | * Copyright (c) 1991, 1993 The Regents of the University of California. 3 | * All rights reserved. 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions 7 | * are met: 8 | * 1. Redistributions of source code must retain the above copyright 9 | * notice, this list of conditions and the following disclaimer. 10 | * 2. Redistributions in binary form must reproduce the above copyright 11 | * notice, this list of conditions and the following disclaimer in the 12 | * documentation and/or other materials provided with the distribution. 13 | * 14 | * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 15 | * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 17 | * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 18 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 20 | * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 21 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 22 | * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 23 | * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 24 | * SUCH DAMAGE. 25 | */ 26 | 27 | #pragma once 28 | 29 | #include 30 | 31 | #define QUEUEDEBUG_ABORT(...) err(1, __VA_ARGS__) 32 | 33 | /* 34 | * This file defines four types of data structures: singly-linked lists, 35 | * singly-linked tail queues, lists and tail queues. 36 | * 37 | * A singly-linked list is headed by a single forward pointer. The elements 38 | * are singly linked for minimum space and pointer manipulation overhead at 39 | * the expense of O(n) removal for arbitrary elements. New elements can be 40 | * added to the list after an existing element or at the head of the list. 41 | * Elements being removed from the head of the list should use the explicit 42 | * macro for this purpose for optimum efficiency. A singly-linked list may 43 | * only be traversed in the forward direction. Singly-linked lists are ideal 44 | * for applications with large datasets and few or no removals or for 45 | * implementing a LIFO queue. 46 | * 47 | * A singly-linked tail queue is headed by a pair of pointers, one to the 48 | * head of the list and the other to the tail of the list. The elements are 49 | * singly linked for minimum space and pointer manipulation overhead at the 50 | * expense of O(n) removal for arbitrary elements. New elements can be added 51 | * to the list after an existing element, at the head of the list, or at the 52 | * end of the list. Elements being removed from the head of the tail queue 53 | * should use the explicit macro for this purpose for optimum efficiency. 54 | * A singly-linked tail queue may only be traversed in the forward direction. 55 | * Singly-linked tail queues are ideal for applications with large datasets 56 | * and few or no removals or for implementing a FIFO queue. 57 | * 58 | * A list is headed by a single forward pointer (or an array of forward 59 | * pointers for a hash table header). The elements are doubly linked 60 | * so that an arbitrary element can be removed without a need to 61 | * traverse the list. New elements can be added to the list before 62 | * or after an existing element or at the head of the list. A list 63 | * may be traversed in either direction. 64 | * 65 | * A tail queue is headed by a pair of pointers, one to the head of the 66 | * list and the other to the tail of the list. The elements are doubly 67 | * linked so that an arbitrary element can be removed without a need to 68 | * traverse the list. New elements can be added to the list before or 69 | * after an existing element, at the head of the list, or at the end of 70 | * the list. A tail queue may be traversed in either direction. 71 | * 72 | * For details on the use of these macros, see the queue(3) manual page. 
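 *
 * A minimal TAILQ usage sketch in the style of queue(3) (the "entry" type,
 * its fields and the values are illustrative only):
 *
 *	struct entry {
 *	    int value;
 *	    TAILQ_ENTRY(entry) link;
 *	};
 *	TAILQ_HEAD(entry_head, entry) head = TAILQ_HEAD_INITIALIZER(head);
 *
 *	struct entry *e = malloc(sizeof(*e));
 *	e->value = 42;
 *	TAILQ_INSERT_TAIL(&head, e, link);
 *
 *	struct entry *it;
 *	TAILQ_FOREACH (it, &head, link)
 *	    printf("%d\n", it->value);
 *
 *	TAILQ_REMOVE(&head, e, link);
 *	free(e);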
73 | * 74 | * 75 | * SLIST LIST STAILQ TAILQ 76 | * _HEAD + + + + 77 | * _HEAD_INITIALIZER + + + + 78 | * _ENTRY + + + + 79 | * _INIT + + + + 80 | * _EMPTY + + + + 81 | * _FIRST + + + + 82 | * _NEXT + + + + 83 | * _PREV - + - + 84 | * _LAST - - + + 85 | * _FOREACH + + + + 86 | * _FOREACH_FROM + + + + 87 | * _FOREACH_SAFE + + + + 88 | * _FOREACH_FROM_SAFE + + + + 89 | * _FOREACH_REVERSE - - - + 90 | * _FOREACH_REVERSE_FROM - - - + 91 | * _FOREACH_REVERSE_SAFE - - - + 92 | * _FOREACH_REVERSE_FROM_SAFE - - - + 93 | * _INSERT_HEAD + + + + 94 | * _INSERT_BEFORE - + - + 95 | * _INSERT_AFTER + + + + 96 | * _INSERT_TAIL - - + + 97 | * _CONCAT - - + + 98 | * _REMOVE_AFTER + - + - 99 | * _REMOVE_HEAD + - + - 100 | * _REMOVE + + + + 101 | * _SWAP + + + + 102 | * 103 | */ 104 | #ifdef QUEUE_MACRO_DEBUG 105 | /* Store the last 2 places the queue element or head was altered */ 106 | struct qm_trace { 107 | unsigned long lastline; 108 | unsigned long prevline; 109 | const char *lastfile; 110 | const char *prevfile; 111 | }; 112 | 113 | #define TRACEBUF struct qm_trace trace; 114 | #define TRACEBUF_INITIALIZER {__FILE__, __LINE__, NULL, 0}, 115 | #define TRASHIT(x) \ 116 | do { \ 117 | (x) = (void *) -1; \ 118 | } while (0) 119 | #define QMD_SAVELINK(name, link) void **name = (void *) &(link) 120 | 121 | #define QMD_TRACE_HEAD(head) \ 122 | do { \ 123 | (head)->trace.prevline = (head)->trace.lastline; \ 124 | (head)->trace.prevfile = (head)->trace.lastfile; \ 125 | (head)->trace.lastline = __LINE__; \ 126 | (head)->trace.lastfile = __FILE__; \ 127 | } while (0) 128 | 129 | #define QMD_TRACE_ELEM(elem) \ 130 | do { \ 131 | (elem)->trace.prevline = (elem)->trace.lastline; \ 132 | (elem)->trace.prevfile = (elem)->trace.lastfile; \ 133 | (elem)->trace.lastline = __LINE__; \ 134 | (elem)->trace.lastfile = __FILE__; \ 135 | } while (0) 136 | 137 | #else 138 | #define QMD_TRACE_ELEM(elem) 139 | #define QMD_TRACE_HEAD(head) 140 | #define QMD_SAVELINK(name, link) 141 | #define TRACEBUF 142 | #define TRACEBUF_INITIALIZER 143 | #define TRASHIT(x) 144 | #endif /* QUEUE_MACRO_DEBUG */ 145 | 146 | /* 147 | * Singly-linked List declarations. 148 | */ 149 | #define SLIST_HEAD(name, type) \ 150 | struct name { \ 151 | struct type *slh_first; /* first element */ \ 152 | } 153 | 154 | #define SLIST_HEAD_INITIALIZER(head) \ 155 | { \ 156 | NULL \ 157 | } 158 | 159 | #define SLIST_ENTRY(type) \ 160 | struct { \ 161 | struct type *sle_next; /* next element */ \ 162 | } 163 | 164 | /* 165 | * Singly-linked List functions. 166 | */ 167 | #define SLIST_EMPTY(head) ((head)->slh_first == NULL) 168 | 169 | #define SLIST_FIRST(head) ((head)->slh_first) 170 | 171 | #define SLIST_FOREACH(var, head, field) \ 172 | for ((var) = SLIST_FIRST((head)); (var); (var) = SLIST_NEXT((var), field)) 173 | 174 | #define SLIST_FOREACH_FROM(var, head, field) \ 175 | for ((var) = ((var) ? (var) : SLIST_FIRST((head))); (var); \ 176 | (var) = SLIST_NEXT((var), field)) 177 | 178 | #define SLIST_FOREACH_SAFE(var, head, field, tvar) \ 179 | for ((var) = SLIST_FIRST((head)); \ 180 | (var) && ((tvar) = SLIST_NEXT((var), field), 1); (var) = (tvar)) 181 | 182 | #define SLIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ 183 | for ((var) = ((var) ? 
(var) : SLIST_FIRST((head))); \ 184 | (var) && ((tvar) = SLIST_NEXT((var), field), 1); (var) = (tvar)) 185 | 186 | #define SLIST_FOREACH_PREVPTR(var, varp, head, field) \ 187 | for ((varp) = &SLIST_FIRST((head)); ((var) = *(varp)) != NULL; \ 188 | (varp) = &SLIST_NEXT((var), field)) 189 | 190 | #define SLIST_INIT(head) \ 191 | do { \ 192 | SLIST_FIRST((head)) = NULL; \ 193 | } while (0) 194 | 195 | #define SLIST_INSERT_AFTER(slistelm, elm, field) \ 196 | do { \ 197 | SLIST_NEXT((elm), field) = SLIST_NEXT((slistelm), field); \ 198 | SLIST_NEXT((slistelm), field) = (elm); \ 199 | } while (0) 200 | 201 | #define SLIST_INSERT_HEAD(head, elm, field) \ 202 | do { \ 203 | SLIST_NEXT((elm), field) = SLIST_FIRST((head)); \ 204 | SLIST_FIRST((head)) = (elm); \ 205 | } while (0) 206 | 207 | #define SLIST_NEXT(elm, field) ((elm)->field.sle_next) 208 | 209 | #define SLIST_REMOVE(head, elm, type, field) \ 210 | do { \ 211 | QMD_SAVELINK(oldnext, (elm)->field.sle_next); \ 212 | if (SLIST_FIRST((head)) == (elm)) { \ 213 | SLIST_REMOVE_HEAD((head), field); \ 214 | } else { \ 215 | struct type *curelm = SLIST_FIRST((head)); \ 216 | while (SLIST_NEXT(curelm, field) != (elm)) \ 217 | curelm = SLIST_NEXT(curelm, field); \ 218 | SLIST_REMOVE_AFTER(curelm, field); \ 219 | } \ 220 | TRASHIT(*oldnext); \ 221 | } while (0) 222 | 223 | #define SLIST_REMOVE_AFTER(elm, field) \ 224 | do { \ 225 | SLIST_NEXT(elm, field) = SLIST_NEXT(SLIST_NEXT(elm, field), field); \ 226 | } while (0) 227 | 228 | #define SLIST_REMOVE_HEAD(head, field) \ 229 | do { \ 230 | SLIST_FIRST((head)) = SLIST_NEXT(SLIST_FIRST((head)), field); \ 231 | } while (0) 232 | 233 | #define SLIST_SWAP(head1, head2, type) \ 234 | do { \ 235 | struct type *swap_first = SLIST_FIRST(head1); \ 236 | SLIST_FIRST(head1) = SLIST_FIRST(head2); \ 237 | SLIST_FIRST(head2) = swap_first; \ 238 | } while (0) 239 | 240 | /* 241 | * Singly-linked Tail queue declarations. 242 | */ 243 | #define STAILQ_HEAD(name, type) \ 244 | struct name { \ 245 | struct type *stqh_first; /* first element */ \ 246 | struct type **stqh_last; /* addr of last next element */ \ 247 | } 248 | 249 | #define STAILQ_HEAD_INITIALIZER(head) \ 250 | { \ 251 | NULL, &(head).stqh_first \ 252 | } 253 | 254 | #define STAILQ_ENTRY(type) \ 255 | struct { \ 256 | struct type *stqe_next; /* next element */ \ 257 | } 258 | 259 | /* 260 | * Singly-linked Tail queue functions. 261 | */ 262 | #define STAILQ_CONCAT(head1, head2) \ 263 | do { \ 264 | if (!STAILQ_EMPTY((head2))) { \ 265 | *(head1)->stqh_last = (head2)->stqh_first; \ 266 | (head1)->stqh_last = (head2)->stqh_last; \ 267 | STAILQ_INIT((head2)); \ 268 | } \ 269 | } while (0) 270 | 271 | #define STAILQ_EMPTY(head) ((head)->stqh_first == NULL) 272 | 273 | #define STAILQ_FIRST(head) ((head)->stqh_first) 274 | 275 | #define STAILQ_FOREACH(var, head, field) \ 276 | for ((var) = STAILQ_FIRST((head)); (var); (var) = STAILQ_NEXT((var), field)) 277 | 278 | #define STAILQ_FOREACH_FROM(var, head, field) \ 279 | for ((var) = ((var) ? (var) : STAILQ_FIRST((head))); (var); \ 280 | (var) = STAILQ_NEXT((var), field)) 281 | 282 | #define STAILQ_FOREACH_SAFE(var, head, field, tvar) \ 283 | for ((var) = STAILQ_FIRST((head)); \ 284 | (var) && ((tvar) = STAILQ_NEXT((var), field), 1); (var) = (tvar)) 285 | 286 | #define STAILQ_FOREACH_FROM_SAFE(var, head, field, tvar) \ 287 | for ((var) = ((var) ? 
(var) : STAILQ_FIRST((head))); \ 288 | (var) && ((tvar) = STAILQ_NEXT((var), field), 1); (var) = (tvar)) 289 | 290 | #define STAILQ_INIT(head) \ 291 | do { \ 292 | STAILQ_FIRST((head)) = NULL; \ 293 | (head)->stqh_last = &STAILQ_FIRST((head)); \ 294 | } while (0) 295 | 296 | #define STAILQ_INSERT_AFTER(head, tqelm, elm, field) \ 297 | do { \ 298 | if ((STAILQ_NEXT((elm), field) = STAILQ_NEXT((tqelm), field)) == NULL) \ 299 | (head)->stqh_last = &STAILQ_NEXT((elm), field); \ 300 | STAILQ_NEXT((tqelm), field) = (elm); \ 301 | } while (0) 302 | 303 | #define STAILQ_INSERT_HEAD(head, elm, field) \ 304 | do { \ 305 | if ((STAILQ_NEXT((elm), field) = STAILQ_FIRST((head))) == NULL) \ 306 | (head)->stqh_last = &STAILQ_NEXT((elm), field); \ 307 | STAILQ_FIRST((head)) = (elm); \ 308 | } while (0) 309 | 310 | #define STAILQ_INSERT_TAIL(head, elm, field) \ 311 | do { \ 312 | STAILQ_NEXT((elm), field) = NULL; \ 313 | *(head)->stqh_last = (elm); \ 314 | (head)->stqh_last = &STAILQ_NEXT((elm), field); \ 315 | } while (0) 316 | 317 | #define STAILQ_LAST(head, type, field) \ 318 | (STAILQ_EMPTY((head)) \ 319 | ? NULL \ 320 | : __containerof((head)->stqh_last, struct type, field.stqe_next)) 321 | 322 | #define STAILQ_NEXT(elm, field) ((elm)->field.stqe_next) 323 | 324 | #define STAILQ_REMOVE(head, elm, type, field) \ 325 | do { \ 326 | QMD_SAVELINK(oldnext, (elm)->field.stqe_next); \ 327 | if (STAILQ_FIRST((head)) == (elm)) { \ 328 | STAILQ_REMOVE_HEAD((head), field); \ 329 | } else { \ 330 | struct type *curelm = STAILQ_FIRST((head)); \ 331 | while (STAILQ_NEXT(curelm, field) != (elm)) \ 332 | curelm = STAILQ_NEXT(curelm, field); \ 333 | STAILQ_REMOVE_AFTER(head, curelm, field); \ 334 | } \ 335 | TRASHIT(*oldnext); \ 336 | } while (0) 337 | 338 | #define STAILQ_REMOVE_AFTER(head, elm, field) \ 339 | do { \ 340 | if ((STAILQ_NEXT(elm, field) = \ 341 | STAILQ_NEXT(STAILQ_NEXT(elm, field), field)) == NULL) \ 342 | (head)->stqh_last = &STAILQ_NEXT((elm), field); \ 343 | } while (0) 344 | 345 | #define STAILQ_REMOVE_HEAD(head, field) \ 346 | do { \ 347 | if ((STAILQ_FIRST((head)) = \ 348 | STAILQ_NEXT(STAILQ_FIRST((head)), field)) == NULL) \ 349 | (head)->stqh_last = &STAILQ_FIRST((head)); \ 350 | } while (0) 351 | 352 | #define STAILQ_SWAP(head1, head2, type) \ 353 | do { \ 354 | struct type *swap_first = STAILQ_FIRST(head1); \ 355 | struct type **swap_last = (head1)->stqh_last; \ 356 | STAILQ_FIRST(head1) = STAILQ_FIRST(head2); \ 357 | (head1)->stqh_last = (head2)->stqh_last; \ 358 | STAILQ_FIRST(head2) = swap_first; \ 359 | (head2)->stqh_last = swap_last; \ 360 | if (STAILQ_EMPTY(head1)) \ 361 | (head1)->stqh_last = &STAILQ_FIRST(head1); \ 362 | if (STAILQ_EMPTY(head2)) \ 363 | (head2)->stqh_last = &STAILQ_FIRST(head2); \ 364 | } while (0) 365 | 366 | 367 | /* 368 | * List declarations. 369 | */ 370 | #define LIST_HEAD(name, type) \ 371 | struct name { \ 372 | struct type *lh_first; /* first element */ \ 373 | } 374 | 375 | #define LIST_HEAD_INITIALIZER(head) \ 376 | { \ 377 | NULL \ 378 | } 379 | 380 | #define LIST_ENTRY(type) \ 381 | struct { \ 382 | struct type *le_next; /* next element */ \ 383 | struct type **le_prev; /* address of previous next element */ \ 384 | } 385 | 386 | /* 387 | * List functions. 
388 | */ 389 | 390 | #define QMD_LIST_CHECK_HEAD(head, field) \ 391 | do { \ 392 | if (LIST_FIRST((head)) != NULL && \ 393 | LIST_FIRST((head))->field.le_prev != &LIST_FIRST((head))) \ 394 | QUEUEDEBUG_ABORT("Bad list head %p first->prev != head", (head)); \ 395 | } while (0) 396 | 397 | #define QMD_LIST_CHECK_NEXT(elm, field) \ 398 | do { \ 399 | if (LIST_NEXT((elm), field) != NULL && \ 400 | LIST_NEXT((elm), field)->field.le_prev != &((elm)->field.le_next)) \ 401 | QUEUEDEBUG_ABORT("Bad link elm %p next->prev != elm", (elm)); \ 402 | } while (0) 403 | 404 | #define QMD_LIST_CHECK_PREV(elm, field) \ 405 | do { \ 406 | if (*(elm)->field.le_prev != (elm)) \ 407 | QUEUEDEBUG_ABORT("Bad link elm %p prev->next != elm", (elm)); \ 408 | } while (0) 409 | 410 | #define LIST_EMPTY(head) ((head)->lh_first == NULL) 411 | 412 | #define LIST_FIRST(head) ((head)->lh_first) 413 | 414 | #define LIST_FOREACH(var, head, field) \ 415 | for ((var) = LIST_FIRST((head)); (var); (var) = LIST_NEXT((var), field)) 416 | 417 | #define LIST_FOREACH_FROM(var, head, field) \ 418 | for ((var) = ((var) ? (var) : LIST_FIRST((head))); (var); \ 419 | (var) = LIST_NEXT((var), field)) 420 | 421 | #define LIST_FOREACH_SAFE(var, head, field, tvar) \ 422 | for ((var) = LIST_FIRST((head)); \ 423 | (var) && ((tvar) = LIST_NEXT((var), field), 1); (var) = (tvar)) 424 | 425 | #define LIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ 426 | for ((var) = ((var) ? (var) : LIST_FIRST((head))); \ 427 | (var) && ((tvar) = LIST_NEXT((var), field), 1); (var) = (tvar)) 428 | 429 | #define LIST_INIT(head) \ 430 | do { \ 431 | LIST_FIRST((head)) = NULL; \ 432 | } while (0) 433 | 434 | #define LIST_INSERT_AFTER(listelm, elm, field) \ 435 | do { \ 436 | QMD_LIST_CHECK_NEXT(listelm, field); \ 437 | if ((LIST_NEXT((elm), field) = LIST_NEXT((listelm), field)) != NULL) \ 438 | LIST_NEXT((listelm), field)->field.le_prev = \ 439 | &LIST_NEXT((elm), field); \ 440 | LIST_NEXT((listelm), field) = (elm); \ 441 | (elm)->field.le_prev = &LIST_NEXT((listelm), field); \ 442 | } while (0) 443 | 444 | #define LIST_INSERT_BEFORE(listelm, elm, field) \ 445 | do { \ 446 | QMD_LIST_CHECK_PREV(listelm, field); \ 447 | (elm)->field.le_prev = (listelm)->field.le_prev; \ 448 | LIST_NEXT((elm), field) = (listelm); \ 449 | *(listelm)->field.le_prev = (elm); \ 450 | (listelm)->field.le_prev = &LIST_NEXT((elm), field); \ 451 | } while (0) 452 | 453 | #define LIST_INSERT_HEAD(head, elm, field) \ 454 | do { \ 455 | QMD_LIST_CHECK_HEAD((head), field); \ 456 | if ((LIST_NEXT((elm), field) = LIST_FIRST((head))) != NULL) \ 457 | LIST_FIRST((head))->field.le_prev = &LIST_NEXT((elm), field); \ 458 | LIST_FIRST((head)) = (elm); \ 459 | (elm)->field.le_prev = &LIST_FIRST((head)); \ 460 | } while (0) 461 | 462 | #define LIST_NEXT(elm, field) ((elm)->field.le_next) 463 | 464 | #define LIST_PREV(elm, head, type, field) \ 465 | ((elm)->field.le_prev == &LIST_FIRST((head)) \ 466 | ? 
NULL \ 467 | : __containerof((elm)->field.le_prev, struct type, field.le_next)) 468 | 469 | #define LIST_REMOVE(elm, field) \ 470 | do { \ 471 | QMD_SAVELINK(oldnext, (elm)->field.le_next); \ 472 | QMD_SAVELINK(oldprev, (elm)->field.le_prev); \ 473 | QMD_LIST_CHECK_NEXT(elm, field); \ 474 | QMD_LIST_CHECK_PREV(elm, field); \ 475 | if (LIST_NEXT((elm), field) != NULL) \ 476 | LIST_NEXT((elm), field)->field.le_prev = (elm)->field.le_prev; \ 477 | *(elm)->field.le_prev = LIST_NEXT((elm), field); \ 478 | TRASHIT(*oldnext); \ 479 | TRASHIT(*oldprev); \ 480 | } while (0) 481 | 482 | #define LIST_SWAP(head1, head2, type, field) \ 483 | do { \ 484 | struct type *swap_tmp = LIST_FIRST((head1)); \ 485 | LIST_FIRST((head1)) = LIST_FIRST((head2)); \ 486 | LIST_FIRST((head2)) = swap_tmp; \ 487 | if ((swap_tmp = LIST_FIRST((head1))) != NULL) \ 488 | swap_tmp->field.le_prev = &LIST_FIRST((head1)); \ 489 | if ((swap_tmp = LIST_FIRST((head2))) != NULL) \ 490 | swap_tmp->field.le_prev = &LIST_FIRST((head2)); \ 491 | } while (0) 492 | 493 | /* 494 | * Tail queue declarations. 495 | */ 496 | #define TAILQ_HEAD(name, type) \ 497 | struct name { \ 498 | struct type *tqh_first; /* first element */ \ 499 | struct type **tqh_last; /* addr of last next element */ \ 500 | TRACEBUF \ 501 | } 502 | 503 | #define TAILQ_HEAD_INITIALIZER(head) \ 504 | { \ 505 | NULL, &(head).tqh_first, TRACEBUF_INITIALIZER \ 506 | } 507 | 508 | #define TAILQ_ENTRY(type) \ 509 | struct { \ 510 | struct type *tqe_next; /* next element */ \ 511 | struct type **tqe_prev; /* address of previous next element */ \ 512 | TRACEBUF \ 513 | } 514 | 515 | /* 516 | * Tail queue functions. 517 | */ 518 | 519 | #define TAILQ_CONCAT(head1, head2, field) \ 520 | do { \ 521 | if (!TAILQ_EMPTY(head2)) { \ 522 | *(head1)->tqh_last = (head2)->tqh_first; \ 523 | (head2)->tqh_first->field.tqe_prev = (head1)->tqh_last; \ 524 | (head1)->tqh_last = (head2)->tqh_last; \ 525 | TAILQ_INIT((head2)); \ 526 | QMD_TRACE_HEAD(head1); \ 527 | QMD_TRACE_HEAD(head2); \ 528 | } \ 529 | } while (0) 530 | 531 | #define TAILQ_EMPTY(head) ((head)->tqh_first == NULL) 532 | 533 | #define TAILQ_FIRST(head) ((head)->tqh_first) 534 | 535 | #define TAILQ_FOREACH(var, head, field) \ 536 | for ((var) = TAILQ_FIRST((head)); (var); (var) = TAILQ_NEXT((var), field)) 537 | 538 | #define TAILQ_FOREACH_FROM(var, head, field) \ 539 | for ((var) = ((var) ? (var) : TAILQ_FIRST((head))); (var); \ 540 | (var) = TAILQ_NEXT((var), field)) 541 | 542 | #define TAILQ_FOREACH_SAFE(var, head, field, tvar) \ 543 | for ((var) = TAILQ_FIRST((head)); \ 544 | (var) && ((tvar) = TAILQ_NEXT((var), field), 1); (var) = (tvar)) 545 | 546 | #define TAILQ_FOREACH_FROM_SAFE(var, head, field, tvar) \ 547 | for ((var) = ((var) ? (var) : TAILQ_FIRST((head))); \ 548 | (var) && ((tvar) = TAILQ_NEXT((var), field), 1); (var) = (tvar)) 549 | 550 | #define TAILQ_FOREACH_REVERSE(var, head, headname, field) \ 551 | for ((var) = TAILQ_LAST((head), headname); (var); \ 552 | (var) = TAILQ_PREV((var), headname, field)) 553 | 554 | #define TAILQ_FOREACH_REVERSE_FROM(var, head, headname, field) \ 555 | for ((var) = ((var) ? 
(var) : TAILQ_LAST((head), headname)); (var); \ 556 | (var) = TAILQ_PREV((var), headname, field)) 557 | 558 | #define TAILQ_FOREACH_REVERSE_SAFE(var, head, headname, field, tvar) \ 559 | for ((var) = TAILQ_LAST((head), headname); \ 560 | (var) && ((tvar) = TAILQ_PREV((var), headname, field), 1); \ 561 | (var) = (tvar)) 562 | 563 | #define TAILQ_FOREACH_REVERSE_FROM_SAFE(var, head, headname, field, tvar) \ 564 | for ((var) = ((var) ? (var) : TAILQ_LAST((head), headname)); \ 565 | (var) && ((tvar) = TAILQ_PREV((var), headname, field), 1); \ 566 | (var) = (tvar)) 567 | 568 | #define TAILQ_INIT(head) \ 569 | do { \ 570 | TAILQ_FIRST((head)) = NULL; \ 571 | (head)->tqh_last = &TAILQ_FIRST((head)); \ 572 | QMD_TRACE_HEAD(head); \ 573 | } while (0) 574 | 575 | #define QMD_TAILQ_CHECK_HEAD(head, field) \ 576 | do { \ 577 | if (!TAILQ_EMPTY(head) && \ 578 | TAILQ_FIRST((head))->field.tqe_prev != &TAILQ_FIRST((head))) \ 579 | QUEUEDEBUG_ABORT("Bad tailq head %p first->prev != head", (head)); \ 580 | } while (0) 581 | 582 | #define QMD_TAILQ_CHECK_TAIL(head, field) \ 583 | do { \ 584 | if (*(head)->tqh_last != NULL) \ 585 | QUEUEDEBUG_ABORT("Bad tailq NEXT(%p->tqh_last) != NULL", (head)); \ 586 | } while (0) 587 | 588 | #define QMD_TAILQ_CHECK_NEXT(elm, field) \ 589 | do { \ 590 | if (TAILQ_NEXT((elm), field) != NULL && \ 591 | TAILQ_NEXT((elm), field)->field.tqe_prev != \ 592 | &((elm)->field.tqe_next)) \ 593 | QUEUEDEBUG_ABORT("Bad link elm %p next->prev != elm", (elm)); \ 594 | } while (0) 595 | 596 | #define QMD_TAILQ_CHECK_PREV(elm, field) \ 597 | do { \ 598 | if (*(elm)->field.tqe_prev != (elm)) \ 599 | QUEUEDEBUG_ABORT("Bad link elm %p prev->next != elm", (elm)); \ 600 | } while (0) 601 | 602 | #define TAILQ_INSERT_AFTER(head, listelm, elm, field) \ 603 | do { \ 604 | QMD_TAILQ_CHECK_NEXT(listelm, field); \ 605 | if ((TAILQ_NEXT((elm), field) = TAILQ_NEXT((listelm), field)) != NULL) \ 606 | TAILQ_NEXT((elm), field)->field.tqe_prev = \ 607 | &TAILQ_NEXT((elm), field); \ 608 | else { \ 609 | (head)->tqh_last = &TAILQ_NEXT((elm), field); \ 610 | QMD_TRACE_HEAD(head); \ 611 | } \ 612 | TAILQ_NEXT((listelm), field) = (elm); \ 613 | (elm)->field.tqe_prev = &TAILQ_NEXT((listelm), field); \ 614 | QMD_TRACE_ELEM(&(elm)->field); \ 615 | QMD_TRACE_ELEM(&listelm->field); \ 616 | } while (0) 617 | 618 | #define TAILQ_INSERT_BEFORE(listelm, elm, field) \ 619 | do { \ 620 | QMD_TAILQ_CHECK_PREV(listelm, field); \ 621 | (elm)->field.tqe_prev = (listelm)->field.tqe_prev; \ 622 | TAILQ_NEXT((elm), field) = (listelm); \ 623 | *(listelm)->field.tqe_prev = (elm); \ 624 | (listelm)->field.tqe_prev = &TAILQ_NEXT((elm), field); \ 625 | QMD_TRACE_ELEM(&(elm)->field); \ 626 | QMD_TRACE_ELEM(&listelm->field); \ 627 | } while (0) 628 | 629 | #define TAILQ_INSERT_HEAD(head, elm, field) \ 630 | do { \ 631 | QMD_TAILQ_CHECK_HEAD(head, field); \ 632 | if ((TAILQ_NEXT((elm), field) = TAILQ_FIRST((head))) != NULL) \ 633 | TAILQ_FIRST((head))->field.tqe_prev = &TAILQ_NEXT((elm), field); \ 634 | else \ 635 | (head)->tqh_last = &TAILQ_NEXT((elm), field); \ 636 | TAILQ_FIRST((head)) = (elm); \ 637 | (elm)->field.tqe_prev = &TAILQ_FIRST((head)); \ 638 | QMD_TRACE_HEAD(head); \ 639 | QMD_TRACE_ELEM(&(elm)->field); \ 640 | } while (0) 641 | 642 | #define TAILQ_INSERT_TAIL(head, elm, field) \ 643 | do { \ 644 | QMD_TAILQ_CHECK_TAIL(head, field); \ 645 | TAILQ_NEXT((elm), field) = NULL; \ 646 | (elm)->field.tqe_prev = (head)->tqh_last; \ 647 | *(head)->tqh_last = (elm); \ 648 | (head)->tqh_last = &TAILQ_NEXT((elm), field); \ 649 | 
QMD_TRACE_HEAD(head); \ 650 | QMD_TRACE_ELEM(&(elm)->field); \ 651 | } while (0) 652 | 653 | #define TAILQ_LAST(head, headname) \ 654 | (*(((struct headname *) ((head)->tqh_last))->tqh_last)) 655 | 656 | #define TAILQ_NEXT(elm, field) ((elm)->field.tqe_next) 657 | 658 | #define TAILQ_PREV(elm, headname, field) \ 659 | (*(((struct headname *) ((elm)->field.tqe_prev))->tqh_last)) 660 | 661 | #define TAILQ_REMOVE(head, elm, field) \ 662 | do { \ 663 | QMD_SAVELINK(oldnext, (elm)->field.tqe_next); \ 664 | QMD_SAVELINK(oldprev, (elm)->field.tqe_prev); \ 665 | QMD_TAILQ_CHECK_NEXT(elm, field); \ 666 | QMD_TAILQ_CHECK_PREV(elm, field); \ 667 | if ((TAILQ_NEXT((elm), field)) != NULL) \ 668 | TAILQ_NEXT((elm), field)->field.tqe_prev = (elm)->field.tqe_prev; \ 669 | else { \ 670 | (head)->tqh_last = (elm)->field.tqe_prev; \ 671 | QMD_TRACE_HEAD(head); \ 672 | } \ 673 | *(elm)->field.tqe_prev = TAILQ_NEXT((elm), field); \ 674 | TRASHIT(*oldnext); \ 675 | TRASHIT(*oldprev); \ 676 | QMD_TRACE_ELEM(&(elm)->field); \ 677 | } while (0) 678 | 679 | #define TAILQ_SWAP(head1, head2, type, field) \ 680 | do { \ 681 | struct type *swap_first = (head1)->tqh_first; \ 682 | struct type **swap_last = (head1)->tqh_last; \ 683 | (head1)->tqh_first = (head2)->tqh_first; \ 684 | (head1)->tqh_last = (head2)->tqh_last; \ 685 | (head2)->tqh_first = swap_first; \ 686 | (head2)->tqh_last = swap_last; \ 687 | if ((swap_first = (head1)->tqh_first) != NULL) \ 688 | swap_first->field.tqe_prev = &(head1)->tqh_first; \ 689 | else \ 690 | (head1)->tqh_last = &(head1)->tqh_first; \ 691 | if ((swap_first = (head2)->tqh_first) != NULL) \ 692 | swap_first->field.tqe_prev = &(head2)->tqh_first; \ 693 | else \ 694 | (head2)->tqh_last = &(head2)->tqh_first; \ 695 | } while (0) 696 | -------------------------------------------------------------------------------- /src/ether.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include "logger.h" 4 | #include "nstack_ether.h" 5 | 6 | SET_DECLARE(_ether_proto_handlers, struct _ether_proto_handler); 7 | 8 | const mac_addr_t mac_broadcast_addr = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; 9 | 10 | int ether_input(const struct ether_hdr *hdr, uint8_t *payload, size_t bsize) 11 | { 12 | struct _ether_proto_handler **tmpp; 13 | struct _ether_proto_handler *proto; 14 | int retval; 15 | 16 | SET_FOREACH (tmpp, _ether_proto_handlers) { 17 | proto = *tmpp; 18 | if (proto->proto_id == hdr->h_proto) 19 | break; 20 | proto = NULL; 21 | } 22 | 23 | LOG(LOG_DEBUG, "proto id: 0x%x", (unsigned) hdr->h_proto); 24 | 25 | if (proto) { 26 | retval = proto->fn(hdr, payload, bsize); 27 | if (retval < 0) { 28 | errno = -retval; 29 | retval = -1; 30 | } 31 | } else { 32 | errno = EPROTONOSUPPORT; 33 | retval = -1; 34 | } 35 | 36 | return retval; 37 | } 38 | 39 | int ether_output_reply(int ether_handle, 40 | const struct ether_hdr *hdr, 41 | uint8_t *payload, 42 | size_t bsize) 43 | { 44 | int retval; 45 | 46 | retval = ether_send(ether_handle, hdr->h_src, hdr->h_proto, payload, bsize); 47 | if (retval < 0) { 48 | errno = -retval; 49 | retval = -1; 50 | } 51 | 52 | return retval; 53 | } 54 | -------------------------------------------------------------------------------- /src/ether_fcs.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | uint32_t ether_fcs(const void *data, size_t bsize) 5 | { 6 | const uint8_t *dp = (uint8_t *) data; 7 | const uint32_t crc_table[] = { 8 | 0x4DBDF21C, 0x500AE278, 
0x76D3D2D4, 0x6B64C2B0, 0x3B61B38C, 0x26D6A3E8, 9 | 0x000F9344, 0x1DB88320, 0xA005713C, 0xBDB26158, 0x9B6B51F4, 0x86DC4190, 10 | 0xD6D930AC, 0xCB6E20C8, 0xEDB71064, 0xF0000000}; 11 | uint32_t crc = 0; 12 | 13 | for (size_t i = 0; i < bsize; i++) { 14 | crc = (crc >> 4) ^ crc_table[(crc ^ (dp[i] >> 0)) & 0x0F]; 15 | crc = (crc >> 4) ^ crc_table[(crc ^ (dp[i] >> 4)) & 0x0F]; 16 | } 17 | 18 | return crc; 19 | } 20 | -------------------------------------------------------------------------------- /src/icmp.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "logger.h" 5 | #include "nstack_icmp.h" 6 | #include "nstack_ip.h" 7 | 8 | static void icmp_hton(const struct icmp *host, struct icmp *net) 9 | { 10 | net->icmp_type = host->icmp_type; 11 | net->icmp_code = host->icmp_code; 12 | net->icmp_csum = host->icmp_csum; 13 | } 14 | 15 | static void icmp_ntoh(const struct icmp *net, struct icmp *host) 16 | { 17 | host->icmp_type = net->icmp_type; 18 | host->icmp_code = net->icmp_code; 19 | host->icmp_csum = net->icmp_csum; 20 | } 21 | 22 | static int icmp_input(const struct ip_hdr *ip_hdr __unused, 23 | uint8_t *payload, 24 | size_t bsize) 25 | { 26 | struct icmp *net_msg = (struct icmp *) payload; 27 | struct icmp hdr; 28 | 29 | if (bsize < sizeof(struct icmp)) { 30 | LOG(LOG_ERR, "Invalid ICMP message size"); 31 | 32 | return -EBADMSG; 33 | } 34 | 35 | icmp_ntoh(net_msg, &hdr); 36 | 37 | LOG(LOG_DEBUG, "ICMP type: %d", hdr.icmp_type); 38 | switch (hdr.icmp_type) { 39 | case ICMP_TYPE_ECHO_REQUEST: 40 | net_msg->icmp_type = ICMP_TYPE_ECHO_REPLY; 41 | net_msg->icmp_csum = 0; 42 | net_msg->icmp_csum = ip_checksum(net_msg, bsize); 43 | 44 | return bsize; 45 | default: 46 | LOG(LOG_INFO, "Unkown ICMP message type"); 47 | 48 | return -ENOMSG; 49 | } 50 | } 51 | IP_PROTO_INPUT_HANDLER(IP_PROTO_ICMP, icmp_input); 52 | 53 | int icmp_generate_dest_unreachable(struct ip_hdr *hdr, 54 | int code, 55 | uint8_t *buf, 56 | size_t bsize) 57 | { 58 | struct icmp_destunreac *msg = (struct icmp_destunreac *) buf; 59 | size_t msg_size; 60 | 61 | /* 62 | * We assume there is always some space in the frame to move things around. 63 | */ 64 | bsize = min(sizeof(msg->data), bsize); 65 | msg_size = sizeof(struct icmp_destunreac) + bsize; 66 | 67 | memmove(msg->data, buf, bsize); 68 | msg->icmp = (struct icmp){ 69 | .icmp_type = ICMP_TYPE_DESTUNREAC, 70 | .icmp_code = code, 71 | }; 72 | /* TODO Next-hop MTU if code is 4 */ 73 | icmp_hton(&msg->icmp, &msg->icmp); 74 | msg->icmp.icmp_csum = ip_checksum(msg, msg_size); 75 | ip_hton(hdr, &msg->old_ip_hdr); 76 | 77 | hdr->ip_vhl = IP_VHL_DEFAULT; 78 | hdr->ip_tos = IP_TOS_DEFAULT; 79 | hdr->ip_proto = IP_PROTO_ICMP; 80 | msg_size += ip_reply_header(hdr, msg_size); 81 | 82 | return msg_size; 83 | } 84 | -------------------------------------------------------------------------------- /src/ip.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "nstack_in.h" 5 | 6 | #include "ip_defer.h" 7 | #include "logger.h" 8 | #include "nstack_arp.h" 9 | #include "nstack_icmp.h" 10 | #include "nstack_ip.h" 11 | 12 | SET_DECLARE(_ip_proto_handlers, struct _ip_proto_handler); 13 | 14 | static unsigned ip_global_id; /* Global ID for IP packets. 
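 * Incremented once per transmitted packet in ip_send(); together with the
 * source, destination and protocol fields it forms the reassembly key
 * ("bufid" in RFC 791 terms) used by the fragment code.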
*/ 15 | 16 | int ip_config(int ether_handle, in_addr_t ip_addr, in_addr_t netmask) 17 | { 18 | mac_addr_t mac; 19 | struct ip_route route = { 20 | .r_network = ip_addr & netmask, 21 | .r_netmask = netmask, 22 | .r_gw = 0, /* TODO GW support */ 23 | .r_iface = ip_addr, 24 | .r_iface_handle = ether_handle, 25 | }; 26 | 27 | ether_handle2addr(ether_handle, mac); 28 | arp_cache_insert(ip_addr, mac, ARP_CACHE_STATIC); 29 | 30 | ip_route_update(&route); 31 | 32 | /* Announce that we are online. */ 33 | for (size_t i = 0; i < 3; i++) 34 | arp_gratuitous(ether_handle, ip_addr); 35 | 36 | return 0; 37 | } 38 | 39 | uint16_t ip_checksum(void *dp, size_t bsize) 40 | { 41 | uint8_t *data = (uint8_t *) dp; 42 | uint32_t acc = 0xffff; 43 | size_t i; 44 | uint16_t word; 45 | 46 | for (i = 0; i + 1 < bsize; i += 2) { 47 | memcpy(&word, data + i, 2); 48 | acc += word; 49 | if (acc > 0xffff) 50 | acc -= ntohs(0xffff); 51 | } 52 | 53 | if (bsize & 1) { 54 | word = 0; 55 | memcpy(&word, data + bsize - 1, 1); 56 | acc += word; 57 | if (acc > 0xffff) 58 | acc -= ntohs(0xffff); 59 | } 60 | 61 | return ~acc; 62 | } 63 | 64 | void ip_hton(const struct ip_hdr *host, struct ip_hdr *net) 65 | { 66 | size_t hlen = ip_hdr_hlen(host); 67 | 68 | net->ip_vhl = host->ip_vhl; 69 | net->ip_tos = host->ip_tos; 70 | net->ip_len = htons(host->ip_len); 71 | net->ip_id = htons(host->ip_id); 72 | net->ip_foff = htons(host->ip_foff); 73 | net->ip_ttl = host->ip_ttl; 74 | net->ip_proto = host->ip_proto; 75 | net->ip_csum = host->ip_csum; 76 | net->ip_src = htonl(host->ip_src); 77 | net->ip_dst = htonl(host->ip_dst); 78 | 79 | net->ip_csum = 0; 80 | net->ip_csum = ip_checksum(net, hlen); 81 | } 82 | 83 | size_t ip_ntoh(const struct ip_hdr *net, struct ip_hdr *host) 84 | { 85 | host->ip_vhl = net->ip_vhl; 86 | host->ip_tos = net->ip_tos; 87 | host->ip_len = ntohs(net->ip_len); 88 | host->ip_id = ntohs(net->ip_id); 89 | host->ip_foff = ntohs(net->ip_foff); 90 | host->ip_ttl = net->ip_ttl; 91 | host->ip_proto = net->ip_proto; 92 | host->ip_csum = net->ip_csum; 93 | host->ip_src = ntohl(net->ip_src); 94 | host->ip_dst = ntohl(net->ip_dst); 95 | 96 | return ip_hdr_hlen(host); 97 | } 98 | 99 | size_t ip_reply_header(struct ip_hdr *host_ip_hdr, size_t bsize) 100 | { 101 | struct ip_hdr *const ip = host_ip_hdr; 102 | in_addr_t tmp; 103 | 104 | /* Swap source and destination. */ 105 | tmp = ip->ip_src; 106 | ip->ip_src = ip->ip_dst; 107 | ip->ip_dst = tmp; 108 | ip->ip_ttl = IP_TTL_DEFAULT; 109 | 110 | bsize += ip_hdr_hlen(ip); 111 | ip->ip_len = bsize; 112 | 113 | /* Back to network order */ 114 | ip_hton(ip, ip); 115 | 116 | return bsize; 117 | } 118 | 119 | int ip_input(const struct ether_hdr *e_hdr, uint8_t *payload, size_t bsize) 120 | { 121 | struct ip_hdr *ip = (struct ip_hdr *) payload; 122 | struct _ip_proto_handler **tmpp; 123 | struct _ip_proto_handler *proto; 124 | size_t hlen; 125 | 126 | if (e_hdr) { 127 | ip_ntoh(ip, ip); 128 | } 129 | 130 | if ((ip->ip_vhl & 0x40) != 0x40) { 131 | LOG(LOG_ERR, "Unsupported IP packet version: 0x%x", ip->ip_vhl); 132 | return 0; 133 | } 134 | 135 | hlen = ip_hdr_hlen(ip); 136 | if (hlen < 20) { 137 | LOG(LOG_ERR, "Incorrect packet header length: %d", (int) hlen); 138 | return 0; 139 | } 140 | 141 | if (ip->ip_len != bsize) { 142 | LOG(LOG_ERR, "Packet size mismatch. iplen = %d, bsize = %d", 143 | (int) ip->ip_len, (int) bsize); 144 | return 0; 145 | } 146 | 147 | /* 148 | * RFE The packet header is already modified with ntoh so this wont work. 
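     * A working check would have to run on the untouched network-order header,
     * i.e. before the in-place ip_ntoh() above, for ip_checksum() to evaluate
     * to zero on a valid header.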
149 | */ 150 | #if 0 151 | if (ip_checksum(ip, hlen) != 0) { 152 | LOG(LOG_ERR, "Drop due to an invalid checksum"); 153 | return; 154 | } 155 | #endif 156 | 157 | if (ip->ip_tos != IP_TOS_DEFAULT) { 158 | LOG(LOG_INFO, "Unsupported IP type of service or ECN: 0x%x", 159 | ip->ip_tos); 160 | } 161 | 162 | if (e_hdr) { 163 | /* Insert to ARP table so it's possible/faster to send a reply. */ 164 | arp_cache_insert(ip->ip_src, e_hdr->h_src, ARP_CACHE_DYN); 165 | } 166 | 167 | if (ip_route_find_by_iface(ip->ip_dst, NULL)) { 168 | char dst_str[IP_STR_LEN]; 169 | 170 | ip2str(ip->ip_dst, dst_str); 171 | LOG(LOG_WARN, "Invalid destination address %s", dst_str); 172 | 173 | if (NSTACK_IP_SEND_HOSTUNREAC) { 174 | return icmp_generate_dest_unreachable(ip, ICMP_CODE_HOSTUNREAC, 175 | payload + hlen, bsize - hlen); 176 | } 177 | return 0; 178 | } 179 | 180 | if (ip_fragment_is_frag(ip)) { 181 | /* 182 | * Fragmented packet must be first reassembled. 183 | */ 184 | ip_fragment_input(ip, payload + hlen); 185 | 186 | return 0; 187 | } 188 | 189 | SET_FOREACH (tmpp, _ip_proto_handlers) { 190 | proto = *tmpp; 191 | if (proto->proto_id == ip->ip_proto) 192 | break; 193 | proto = NULL; 194 | } 195 | 196 | LOG(LOG_DEBUG, "proto id: 0x%x", ip->ip_proto); 197 | 198 | if (proto) { 199 | int retval; 200 | 201 | retval = proto->fn(ip, payload + hlen, bsize - hlen); 202 | if (retval > 0) 203 | retval = ip_reply_header(ip, retval); 204 | if (retval == -ENOTSOCK) { 205 | LOG(LOG_INFO, "Unreachable port"); 206 | 207 | return icmp_generate_dest_unreachable(ip, ICMP_CODE_PORTUNREAC, 208 | payload + hlen, bsize - hlen); 209 | } 210 | return retval; 211 | } else { 212 | LOG(LOG_INFO, "Unsupported protocol"); 213 | 214 | return icmp_generate_dest_unreachable(ip, ICMP_CODE_PROTOUNREAC, 215 | payload + hlen, bsize - hlen); 216 | } 217 | } 218 | ETHER_PROTO_INPUT_HANDLER(ETHER_PROTO_IPV4, ip_input); 219 | 220 | static inline size_t ip_off_round(size_t plen) 221 | { 222 | return (plen + 7) & ~7; 223 | } 224 | 225 | static size_t next_fragment_size(size_t bytes, size_t hlen, size_t mtu) 226 | { 227 | size_t max, retval; 228 | 229 | max = ip_off_round(mtu - hlen - 8); /* RFE A kernel bug? */ 230 | retval = (bytes < max) ? bytes : max; 231 | 232 | return retval; 233 | } 234 | 235 | static int ip_send_fragments(int ether_handle, 236 | const mac_addr_t dst_mac, 237 | uint8_t *payload, 238 | size_t bsize) 239 | { 240 | struct ip_hdr *ip_hdr_net = (struct ip_hdr *) payload; 241 | struct ip_hdr ip_hdr; 242 | uint8_t *data; 243 | size_t hlen, bytes, offset = 0; 244 | int retval = 0; 245 | 246 | hlen = ip_ntoh(ip_hdr_net, &ip_hdr); 247 | data = payload + hlen; 248 | bytes = bsize - hlen; 249 | do { 250 | size_t plen; 251 | int eret; 252 | 253 | plen = next_fragment_size(bytes, hlen, ETHER_DATA_LEN); 254 | bytes -= plen; 255 | ip_hdr.ip_len = hlen + plen; 256 | ip_hdr.ip_foff = ((bytes != 0) ? 
IP_FLAGS_MF : 0) | (offset >> 3); 257 | 258 | ip_hton(&ip_hdr, ip_hdr_net); 259 | memmove(data, data + offset, plen); 260 | eret = ether_send(ether_handle, dst_mac, ETHER_PROTO_IPV4, payload, 261 | ip_hdr.ip_len); 262 | if (eret < 0) 263 | return eret; 264 | retval += eret; 265 | offset += plen; 266 | } while (bytes > 0); 267 | 268 | return retval; 269 | } 270 | 271 | static const struct ip_hdr ip_hdr_template = { 272 | .ip_vhl = IP_VHL_DEFAULT, 273 | .ip_tos = IP_TOS_DEFAULT, 274 | .ip_foff = IP_TOFF_DEFAULT, 275 | .ip_ttl = IP_TTL_DEFAULT, 276 | }; 277 | 278 | int ip_send(in_addr_t dst, uint8_t proto, const uint8_t *buf, size_t bsize) 279 | { 280 | mac_addr_t dst_mac; 281 | size_t packet_size = sizeof(struct ip_hdr) + bsize; 282 | struct ip_route route; 283 | 284 | if (ip_route_find_by_network(dst, &route)) { 285 | char ip_str[IP_STR_LEN]; 286 | 287 | ip2str(dst, ip_str); 288 | LOG(LOG_ERR, "No route to host %s", ip_str); 289 | errno = EHOSTUNREACH; 290 | return -1; 291 | } 292 | 293 | if (arp_cache_get_haddr(route.r_iface, dst, dst_mac)) { 294 | int retval = 0; 295 | 296 | if (errno == EHOSTUNREACH) { 297 | /* 298 | * We must defer the operation for now because we are waiting for 299 | * the receiver's MAC addr to be resolved. 300 | */ 301 | retval = ip_defer_push(dst, proto, buf, bsize); 302 | if (retval == 0 || (retval == -EALREADY)) { 303 | retval = 0; /* Return 0 to indicate a deferred operation. */ 304 | } else { /* else an error occurred. */ 305 | errno = -retval; 306 | retval = -1; 307 | } 308 | } 309 | return retval; 310 | } 311 | 312 | { 313 | uint8_t packet[packet_size]; 314 | struct ip_hdr *hdr = (struct ip_hdr *) packet; 315 | int retval; 316 | 317 | memcpy(hdr, &ip_hdr_template, sizeof(ip_hdr_template)); 318 | hdr->ip_len = packet_size; 319 | hdr->ip_id = ip_global_id++; 320 | hdr->ip_src = route.r_iface; 321 | hdr->ip_dst = dst; 322 | hdr->ip_proto = proto; 323 | memcpy(packet + sizeof(ip_hdr_template), buf, bsize); 324 | ip_hton(hdr, hdr); 325 | 326 | if (bsize <= ETHER_DATA_LEN) { 327 | retval = ether_send(route.r_iface_handle, dst_mac, ETHER_PROTO_IPV4, 328 | packet, packet_size); 329 | } else if (1) { /* Check DF flag */ 330 | retval = ip_send_fragments(route.r_iface_handle, dst_mac, packet, 331 | packet_size); 332 | if (retval < 0) { 333 | errno = -retval; 334 | retval = -1; 335 | } 336 | } else { 337 | /* TODO Fail properly */ 338 | errno = EMSGSIZE; 339 | retval = -1; 340 | } 341 | 342 | return retval; 343 | } 344 | } 345 | -------------------------------------------------------------------------------- /src/ip_defer.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "nstack_in.h" 5 | 6 | #include "logger.h" 7 | #include "nstack_ether.h" 8 | #include "nstack_internal.h" 9 | #include "nstack_ip.h" 10 | 11 | struct ip_defer { 12 | int tries; 13 | in_addr_t dst; 14 | uint8_t proto; 15 | size_t buf_size; 16 | uint8_t buf[IP_DATA_MAX_BYTES]; 17 | }; 18 | 19 | /* 20 | * This is used to inhibit defers if it's the ip deferring code itself causing 21 | * defer push. 
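 * ip_defer_handler() sets the flag while it drains the queue; during that
 * window ip_defer_push() returns -EALREADY so the packet currently being
 * retried is not pushed back onto the queue from inside the drain loop.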
22 | */ 23 | static bool defer_inhibit = false; 24 | 25 | static struct ip_defer ip_defer_queue[NSTACK_IP_DEFER_MAX]; 26 | static size_t q_rd, q_wr; 27 | 28 | int ip_defer_push(in_addr_t dst, 29 | uint8_t proto, 30 | const uint8_t *buf, 31 | size_t bsize) 32 | { 33 | const size_t next = (q_wr + 1) % num_elem(ip_defer_queue); 34 | struct ip_defer *slot; 35 | 36 | if (defer_inhibit) 37 | return -EALREADY; 38 | 39 | if (next == q_rd) 40 | return -ENOBUFS; 41 | slot = ip_defer_queue + q_wr; 42 | 43 | if (bsize > IP_DATA_MAX_BYTES) 44 | return -EMSGSIZE; 45 | 46 | slot->tries = 0; 47 | slot->dst = dst; 48 | slot->proto = proto; 49 | slot->buf_size = bsize; 50 | memcpy(slot->buf, buf, bsize); 51 | 52 | q_wr = next; 53 | return 0; 54 | } 55 | 56 | static struct ip_defer *ip_defer_peek(void) 57 | { 58 | if (q_rd == q_wr) 59 | return NULL; 60 | 61 | return ip_defer_queue + q_rd; 62 | } 63 | 64 | static void ip_defer_drop(void) 65 | { 66 | if (q_rd == q_wr) 67 | return; 68 | 69 | q_rd = (q_rd + 1) % num_elem(ip_defer_queue); 70 | } 71 | 72 | void ip_defer_handler(int delta_time __unused) 73 | { 74 | defer_inhibit = true; 75 | while (1) { 76 | struct ip_defer *ipd = ip_defer_peek(); 77 | if (!ipd) { 78 | defer_inhibit = false; 79 | return; 80 | } 81 | 82 | if (ipd->tries++ > 3) { /* Drop the packet after couple of tries. */ 83 | char str_ip[IP_STR_LEN]; 84 | 85 | ip2str(ipd->dst, str_ip); 86 | LOG(LOG_INFO, "Dropping IP deferred transmission for %s", str_ip); 87 | ip_defer_drop(); 88 | continue; 89 | } 90 | 91 | if (ip_send(ipd->dst, ipd->proto, ipd->buf, ipd->buf_size) == -1) { 92 | if (errno == EHOSTUNREACH) { 93 | ipd->tries++; /* Try again later. */ 94 | defer_inhibit = false; 95 | return; 96 | } 97 | } 98 | ip_defer_drop(); 99 | } 100 | } 101 | NSTACK_PERIODIC_TASK(ip_defer_handler); 102 | -------------------------------------------------------------------------------- /src/ip_defer.h: -------------------------------------------------------------------------------- 1 | /** 2 | * @addtogroup ip_defer 3 | * IP defer can be used to defer IP packet transmission processing. 4 | * This is useful for example while waiting for an ARP reply. 
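 *
 * ip_send() queues a packet for deferral roughly like this while the
 * destination MAC address is still being resolved (names as in ip_send()):
 * @code
 * retval = ip_defer_push(dst, proto, buf, bsize);
 * if (retval == 0 || retval == -EALREADY)
 *     retval = 0; // deferred; ip_defer_handler() retries it periodically
 * @endcode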
5 | * @{ 6 | */ 7 | 8 | #pragma once 9 | 10 | #include "nstack_in.h" 11 | 12 | int ip_defer_push(in_addr_t dst, 13 | uint8_t proto, 14 | const uint8_t *buf, 15 | size_t bsize); 16 | 17 | void ip_defer_handler(int delta_time); 18 | 19 | /** 20 | * @} 21 | */ 22 | -------------------------------------------------------------------------------- /src/ip_fragment.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "nstack_in.h" 7 | 8 | #include "logger.h" 9 | #include "nstack_ip.h" 10 | #include "tree.h" 11 | 12 | #define FRAG_MAX 8192 13 | #define FRAG_MAP_SIZE (FRAG_MAX / 32) 14 | #define FRAG_MAP_AI(_i_) ((_i_) >> 5) 15 | #define FRAG_MAP_BI(_i_) ((_i_) &0x1f) 16 | 17 | struct fragment_map { 18 | uint32_t fragmap[FRAG_MAP_SIZE]; 19 | }; 20 | 21 | struct packet_buf { 22 | int reserved; 23 | int timer; 24 | struct fragment_map fragmap; 25 | struct ip_hdr ip_hdr; 26 | uint8_t payload[IP_MAX_BYTES]; 27 | RB_ENTRY(packet_buf) _entry; 28 | }; 29 | 30 | RB_HEAD(packet_buf_tree, packet_buf); 31 | 32 | static struct packet_buf packet_buffer[4]; 33 | static struct packet_buf_tree packet_buffer_head = RB_INITIALIZER(); 34 | 35 | /* 36 | * TODO Timer for giving up 37 | */ 38 | 39 | static int packet_buf_cmp(struct packet_buf *a, struct packet_buf *b) 40 | { 41 | /* Bufid according to RFC 791 */ 42 | struct bufid { 43 | typeof(a->ip_hdr.ip_src) src; 44 | typeof(a->ip_hdr.ip_dst) dst; 45 | typeof(a->ip_hdr.ip_proto) proto; 46 | typeof(a->ip_hdr.ip_id) id; 47 | } bufid[2]; 48 | 49 | memset(bufid, 0, sizeof(bufid)); 50 | 51 | bufid[0] = (struct bufid){ 52 | .src = a->ip_hdr.ip_src, 53 | .dst = a->ip_hdr.ip_dst, 54 | .proto = a->ip_hdr.ip_proto, 55 | .id = a->ip_hdr.ip_id, 56 | }; 57 | bufid[1] = (struct bufid){ 58 | .src = b->ip_hdr.ip_src, 59 | .dst = b->ip_hdr.ip_dst, 60 | .proto = b->ip_hdr.ip_proto, 61 | .id = b->ip_hdr.ip_id, 62 | }; 63 | 64 | return memcmp(&bufid[0], &bufid[1], sizeof(struct bufid)); 65 | } 66 | 67 | RB_GENERATE_STATIC(packet_buf_tree, packet_buf, _entry, packet_buf_cmp); 68 | 69 | static inline void fragmap_init(struct fragment_map *map) 70 | { 71 | memset(map->fragmap, 0, FRAG_MAP_SIZE); 72 | } 73 | 74 | static inline void fragmap_set(struct fragment_map *map, unsigned i) 75 | { 76 | map->fragmap[FRAG_MAP_AI(i)] |= 1 << FRAG_MAP_BI(i); 77 | } 78 | 79 | static inline void fragmap_clear(struct fragment_map *map, unsigned i) 80 | { 81 | map->fragmap[FRAG_MAP_AI(i)] &= ~(1 << FRAG_MAP_BI(i)); 82 | } 83 | 84 | static inline int fragmap_tst(struct fragment_map *map, unsigned i) 85 | { 86 | return (map->fragmap[FRAG_MAP_AI(i)] & (1 << FRAG_MAP_BI(i))); 87 | } 88 | 89 | static inline void release_packet_buffer(struct packet_buf *p) 90 | { 91 | RB_REMOVE(packet_buf_tree, &packet_buffer_head, p); 92 | __sync_lock_release(&p->reserved); 93 | } 94 | 95 | struct packet_buf *get_packet_buffer(struct ip_hdr *hdr) 96 | { 97 | struct packet_buf find = { 98 | .ip_hdr.ip_id = hdr->ip_id, 99 | .ip_hdr.ip_proto = hdr->ip_proto, 100 | .ip_hdr.ip_src = hdr->ip_src, 101 | .ip_hdr.ip_dst = hdr->ip_dst, 102 | }; 103 | struct packet_buf *p; 104 | size_t i; 105 | 106 | /* 107 | * Try to find it. 108 | */ 109 | p = RB_FIND(packet_buf_tree, &packet_buffer_head, &find); 110 | if (p) 111 | return p; 112 | 113 | /* 114 | * Not yet allocated, so allocate a new buffer. 
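     * A slot is claimed with an atomic test-and-set on p->reserved, so two
     * concurrent receivers cannot end up sharing the same packet_buf.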
115 | */ 116 | for (i = 0; i < num_elem(packet_buffer); i++) { 117 | struct packet_buf *p = packet_buffer + i; 118 | const int old = __sync_lock_test_and_set(&p->reserved, 1); 119 | 120 | if (old == 0) { 121 | fragmap_init(&p->fragmap); 122 | p->timer = NSTACK_IP_FRAGMENT_TLB; 123 | p->ip_hdr = *hdr; /* RFE Clear things that aren't needed. */ 124 | p->ip_hdr.ip_foff = 0; 125 | p->ip_hdr.ip_len = 0; 126 | 127 | if (RB_INSERT(packet_buf_tree, &packet_buffer_head, p)) { 128 | release_packet_buffer(p); 129 | 130 | return NULL; 131 | } 132 | 133 | return p; 134 | } 135 | } 136 | 137 | return NULL; 138 | } 139 | 140 | int ip_fragment_input(struct ip_hdr *ip_hdr, uint8_t *rx_packet) 141 | { 142 | const size_t off = (ip_hdr->ip_foff & 0x1fff) << 3; 143 | struct packet_buf *p; 144 | size_t i; 145 | 146 | if (off > IP_MAX_BYTES) 147 | return -EMSGSIZE; 148 | 149 | p = get_packet_buffer(ip_hdr); 150 | if (!p) { 151 | LOG(LOG_WARN, "Out of fragment buffers"); 152 | return -ENOBUFS; 153 | } 154 | 155 | memcpy(p->payload + off, rx_packet, ip_hdr->ip_len - ip_hdr_hlen(ip_hdr)); 156 | for (i = off >> 3; 157 | i < (off >> 3) + ((ip_hdr->ip_len - ip_hdr_hlen(ip_hdr) + 7) >> 3); 158 | i++) { 159 | fragmap_set(&p->fragmap, i); 160 | } 161 | 162 | if (off == 0) { 163 | p->ip_hdr = *ip_hdr; 164 | p->ip_hdr.ip_len = 0; 165 | p->ip_hdr.ip_foff = 0; 166 | } else if (!(ip_hdr->ip_foff & IP_FLAGS_MF)) { 167 | p->ip_hdr.ip_len = ip_hdr->ip_len - ip_hdr_hlen(ip_hdr) + off; 168 | } 169 | 170 | if (p->ip_hdr.ip_len != 0) { 171 | int t = 0; 172 | 173 | for (i = 0; i < (size_t) ((p->ip_hdr.ip_len + 7) >> 3); i++) { 174 | t |= !fragmap_tst(&p->fragmap, i); 175 | } 176 | if (!t) { 177 | int retval; 178 | 179 | LOG(LOG_DEBUG, "Fragmented packet was fully reassembled (len: %u)", 180 | (unsigned) p->ip_hdr.ip_len); 181 | retval = ip_input(NULL, (uint8_t *) (&p->ip_hdr), p->ip_hdr.ip_len); 182 | 183 | ip_ntoh(&p->ip_hdr, &p->ip_hdr); 184 | retval = ip_send(p->ip_hdr.ip_dst, p->ip_hdr.ip_proto, p->payload, 185 | retval); 186 | if (retval < 0) { 187 | LOG(LOG_ERR, "Failed to send fragments"); 188 | } 189 | 190 | release_packet_buffer(p); 191 | } 192 | } 193 | 194 | /* 195 | * Commenting out this line breaks the RFC but greatly reduces DOS 196 | * possibilities against the fragment reassembly implementation. 197 | */ 198 | #if 0 199 | p->timer = imax(ip->ip_hdr.ip_ttl, p->timer); 200 | #endif 201 | 202 | return 0; 203 | } 204 | 205 | void ip_fragment_timer(int delta_time) 206 | { 207 | for (size_t i = 0; i < num_elem(packet_buffer); i++) { 208 | struct packet_buf *p = &packet_buffer[i]; 209 | 210 | /* TODO not actually as thread safe as the allocation. */ 211 | if (!p->reserved) 212 | continue; 213 | 214 | p->timer -= delta_time; 215 | if (p->timer <= 0) 216 | release_packet_buffer(p); 217 | } 218 | } 219 | -------------------------------------------------------------------------------- /src/ip_route.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include "nstack_util.h" 6 | 7 | #include "collection.h" 8 | #include "nstack_ip.h" 9 | #include "tree.h" 10 | 11 | struct ip_route_entry { 12 | struct ip_route route; 13 | RB_ENTRY(ip_route_entry) _rib_rtree_entry; /*!< Network tree. */ 14 | RB_ENTRY(ip_route_entry) _rib_stree_entry; /*!< Source addr tree. 
*/ 15 | SLIST_ENTRY(ip_route_entry) _rib_freelist_entry; 16 | }; 17 | 18 | RB_HEAD(rib_routetree, ip_route_entry); 19 | RB_HEAD(rib_sourcetree, ip_route_entry); 20 | SLIST_HEAD(rib_freelist, ip_route_entry); 21 | 22 | static struct ip_route_entry rib[NSTACK_IP_RIB_SIZE]; 23 | static struct rib_routetree rib_routetree; 24 | static struct rib_sourcetree rib_sourcetree; 25 | static struct rib_freelist rib_freelist; 26 | 27 | /** 28 | * Compare network addresses of two routes. 29 | */ 30 | static int route_network_cmp(struct ip_route_entry *a, struct ip_route_entry *b) 31 | { 32 | return a->route.r_network - b->route.r_network; 33 | } 34 | 35 | static int route_iface_cmp(struct ip_route_entry *a, struct ip_route_entry *b) 36 | { 37 | return a->route.r_iface - b->route.r_iface; 38 | } 39 | 40 | RB_GENERATE_STATIC(rib_routetree, 41 | ip_route_entry, 42 | _rib_rtree_entry, 43 | route_network_cmp); 44 | RB_GENERATE_STATIC(rib_sourcetree, 45 | ip_route_entry, 46 | _rib_stree_entry, 47 | route_iface_cmp); 48 | 49 | /** 50 | * Get a new route entry from the free list. 51 | */ 52 | static struct ip_route_entry *ip_route_entry_alloc(void) 53 | { 54 | struct ip_route_entry *entry; 55 | 56 | entry = SLIST_FIRST(&rib_freelist); 57 | SLIST_REMOVE_HEAD(&rib_freelist, _rib_freelist_entry); 58 | 59 | return entry; 60 | } 61 | 62 | /** 63 | * Put a route entry back to the free list. 64 | */ 65 | static void ip_route_entry_free(struct ip_route_entry *entry) 66 | { 67 | SLIST_INSERT_HEAD(&rib_freelist, entry, _rib_freelist_entry); 68 | } 69 | 70 | static void ip_route_tree_insert(struct ip_route_entry *entry) 71 | { 72 | RB_INSERT(rib_routetree, &rib_routetree, entry); 73 | RB_INSERT(rib_sourcetree, &rib_sourcetree, entry); 74 | } 75 | 76 | static void ip_route_tree_remove(struct ip_route_entry *entry) 77 | { 78 | RB_REMOVE(rib_routetree, &rib_routetree, entry); 79 | RB_REMOVE(rib_sourcetree, &rib_sourcetree, entry); 80 | } 81 | 82 | static int ip_route_add(struct ip_route *route) 83 | { 84 | struct ip_route_entry *entry; 85 | 86 | entry = ip_route_entry_alloc(); 87 | if (!entry) { 88 | errno = ENOMEM; 89 | return -1; 90 | } 91 | 92 | entry->route = *route; 93 | ip_route_tree_insert(entry); 94 | 95 | return 0; 96 | } 97 | 98 | int ip_route_update(struct ip_route *route) 99 | { 100 | struct ip_route_entry *entry; 101 | 102 | entry = 103 | RB_FIND(rib_routetree, &rib_routetree, (struct ip_route_entry *) route); 104 | if (entry) { /* Update an existing entry. */ 105 | ip_route_tree_remove(entry); 106 | entry->route = *route; 107 | ip_route_tree_insert(entry); 108 | } else { /* Route not found so we insert it. 
*/ 109 | if (ip_route_add(route)) 110 | return -1; 111 | } 112 | 113 | return 0; 114 | } 115 | 116 | int ip_route_remove(struct ip_route *route) 117 | { 118 | struct ip_route_entry *entry = 119 | RB_FIND(rib_routetree, &rib_routetree, (struct ip_route_entry *) route); 120 | if (!entry) { 121 | errno = ENOENT; 122 | return -1; 123 | } 124 | 125 | ip_route_tree_remove(entry); 126 | memset(entry, 0, sizeof(struct ip_route_entry)); 127 | ip_route_entry_free(entry); 128 | 129 | return 0; 130 | } 131 | 132 | int ip_route_find_by_network(in_addr_t addr, struct ip_route *route) 133 | { 134 | struct ip_route find[] = {{.r_network = addr}, {.r_network = 0}}; 135 | struct ip_route_entry *entry = NULL; 136 | 137 | switch (0) { 138 | case 0: /* First we try exact match */ 139 | entry = RB_FIND(rib_routetree, &rib_routetree, 140 | (struct ip_route_entry *) (&find)); 141 | if (entry) 142 | goto match; 143 | case 1: /* Then with network masks */ 144 | RB_FOREACH (entry, rib_routetree, &rib_routetree) { 145 | if (entry->route.r_network == (addr & entry->route.r_netmask)) { 146 | goto match; 147 | } 148 | } 149 | default: /* And finally we check if there is a default gw */ 150 | entry = RB_FIND(rib_routetree, &rib_routetree, 151 | (struct ip_route_entry *) (find + 1)); 152 | if (entry) 153 | goto match; 154 | } 155 | 156 | match: 157 | if (!entry) { 158 | errno = ENOENT; 159 | return -1; 160 | } 161 | 162 | if (route) 163 | memcpy(route, &entry->route, sizeof(struct ip_route)); 164 | 165 | return 0; 166 | } 167 | 168 | int ip_route_find_by_iface(in_addr_t addr, struct ip_route *route) 169 | { 170 | struct ip_route find = {.r_iface = addr}; 171 | struct ip_route_entry *entry; 172 | 173 | entry = RB_FIND(rib_sourcetree, &rib_sourcetree, 174 | (struct ip_route_entry *) (&find)); 175 | if (!entry) { 176 | errno = ENOENT; 177 | return -1; 178 | } 179 | 180 | if (route) 181 | memcpy(route, &entry->route, sizeof(struct ip_route)); 182 | 183 | return 0; 184 | } 185 | 186 | __constructor void ip_route_init(void) 187 | { 188 | RB_INIT(&rib_routetree); 189 | RB_INIT(&rib_sourcetree); 190 | SLIST_INIT(&rib_freelist); 191 | 192 | for (size_t i = 0; i < num_elem(rib); i++) { 193 | SLIST_INSERT_HEAD(&rib_freelist, &rib[i], _rib_freelist_entry); 194 | } 195 | } 196 | -------------------------------------------------------------------------------- /src/linux/ether.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | 15 | #include "nstack_util.h" 16 | 17 | #include "../logger.h" 18 | #include "../nstack_ether.h" 19 | 20 | #define DEFAULT_IF "eth0" 21 | #define ETHER_MAX_IF 1 22 | 23 | struct ether_linux { 24 | int el_fd; 25 | mac_addr_t el_mac; 26 | struct ifreq el_if_idx; 27 | }; 28 | 29 | static struct ether_linux ether_if[ETHER_MAX_IF]; 30 | static int ether_next_handle; 31 | 32 | static struct ether_linux *ether_handle2eth(int handle) 33 | { 34 | if (handle >= ETHER_MAX_IF) { 35 | errno = ENODEV; 36 | return NULL; 37 | } 38 | return ðer_if[handle]; 39 | } 40 | 41 | int ether_handle2addr(int handle, mac_addr_t addr) 42 | { 43 | struct ether_linux *eth; 44 | 45 | if (!(eth = ether_handle2eth(handle))) { 46 | errno = ENODEV; 47 | return -1; 48 | } 49 | 50 | memcpy(addr, eth->el_mac, sizeof(mac_addr_t)); 51 | return 0; 52 | } 53 | 54 | int ether_addr2handle(const mac_addr_t addr __unused) 55 | { 56 | return 0; /* TODO 
Implementation of ether_get_handle() */ 57 | } 58 | 59 | static int linux_ether_bind(struct ether_linux *eth) 60 | { 61 | struct ifreq ifopts = {0}; 62 | struct sockaddr_ll socket_address = {0}; 63 | int sockopt, retval; 64 | 65 | /* Set the interface to promiscuous mode. */ 66 | strncpy(ifopts.ifr_name, eth->el_if_idx.ifr_name, IFNAMSIZ - 1); 67 | ioctl(eth->el_fd, SIOCGIFFLAGS, &ifopts); 68 | ifopts.ifr_flags |= IFF_PROMISC; 69 | ioctl(eth->el_fd, SIOCSIFFLAGS, &ifopts); 70 | /* Allow the socket to be reused. */ 71 | retval = setsockopt(eth->el_fd, SOL_SOCKET, SO_REUSEADDR, &sockopt, 72 | sizeof(sockopt)); 73 | if (retval == -1) 74 | return -1; 75 | 76 | socket_address.sll_family = AF_PACKET; 77 | socket_address.sll_protocol = htons(ETH_P_ALL); 78 | socket_address.sll_ifindex = eth->el_if_idx.ifr_ifindex; 79 | socket_address.sll_pkttype = 80 | PACKET_OTHERHOST | PACKET_BROADCAST | PACKET_MULTICAST | PACKET_HOST; 81 | /*socket_address.sll_pkttype = PACKET_HOST;*/ 82 | socket_address.sll_halen = ETHER_ALEN, 83 | socket_address.sll_addr[0] = eth->el_mac[0], 84 | socket_address.sll_addr[1] = eth->el_mac[1], 85 | socket_address.sll_addr[2] = eth->el_mac[2], 86 | socket_address.sll_addr[3] = eth->el_mac[3], 87 | socket_address.sll_addr[4] = eth->el_mac[4], 88 | socket_address.sll_addr[5] = eth->el_mac[5], 89 | socket_address.sll_hatype = 0x0000; 90 | bind(eth->el_fd, (struct sockaddr *) &socket_address, 91 | sizeof(socket_address)); 92 | 93 | return 0; 94 | } 95 | 96 | static int linux_ether_set_rxtimeout(struct ether_linux *eth) 97 | { 98 | struct timeval tv = { 99 | .tv_sec = NSTACK_PERIODIC_EVENT_SEC, 100 | }; 101 | 102 | return setsockopt(eth->el_fd, SOL_SOCKET, SO_RCVTIMEO, (char *) &tv, 103 | sizeof(struct timeval)); 104 | } 105 | 106 | int ether_init(char *const args[]) 107 | { 108 | const int handle = ether_next_handle; 109 | struct ether_linux *eth; 110 | char if_name[IFNAMSIZ]; 111 | struct ifreq if_mac; 112 | 113 | if (handle >= ETHER_MAX_IF) { 114 | errno = EAGAIN; 115 | return -1; 116 | } 117 | eth = ðer_if[handle]; 118 | ether_next_handle++; 119 | 120 | /* TODO Parse args */ 121 | if (args[0]) { /* Non-default IF */ 122 | /* prevent buffer overflow */ 123 | if (strnlen(args[0], IFNAMSIZ) >= IFNAMSIZ) 124 | return -2; 125 | strcpy(if_name, args[0]); 126 | } else { /* Default IF */ 127 | strcpy(if_name, DEFAULT_IF); 128 | } 129 | 130 | if ((eth->el_fd = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW)) == -1) 131 | return -1; 132 | 133 | /* Get the index of the interface */ 134 | memset(ð->el_if_idx, 0, sizeof(struct ifreq)); 135 | strncpy(eth->el_if_idx.ifr_name, if_name, IFNAMSIZ - 1); 136 | if (ioctl(eth->el_fd, SIOCGIFINDEX, ð->el_if_idx) < 0) 137 | goto fail; 138 | 139 | /* Get the MAC address of the interface */ 140 | if (args[0] && args[1]) { /* MAC addr given by the user */ 141 | /* TODO Parse MAC addr */ 142 | errno = ENOTSUP; 143 | return -1; 144 | } 145 | 146 | /* Use the default MAC addr */ 147 | memset(&if_mac, 0, sizeof(struct ifreq)); 148 | strncpy(if_mac.ifr_name, if_name, IFNAMSIZ - 1); 149 | if (ioctl(eth->el_fd, SIOCGIFHWADDR, &if_mac) < 0) 150 | goto fail; 151 | eth->el_mac[0] = ((uint8_t *) &if_mac.ifr_hwaddr.sa_data)[0]; 152 | eth->el_mac[1] = ((uint8_t *) &if_mac.ifr_hwaddr.sa_data)[1]; 153 | eth->el_mac[2] = ((uint8_t *) &if_mac.ifr_hwaddr.sa_data)[2]; 154 | eth->el_mac[3] = ((uint8_t *) &if_mac.ifr_hwaddr.sa_data)[3]; 155 | eth->el_mac[4] = ((uint8_t *) &if_mac.ifr_hwaddr.sa_data)[4]; 156 | eth->el_mac[5] = ((uint8_t *) &if_mac.ifr_hwaddr.sa_data)[5]; 157 | 158 | 
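    /*
     * At this point el_fd is a raw AF_PACKET socket and el_mac holds the
     * interface's hardware address; bind the socket to the interface and arm
     * the receive timeout so the ingress loop still wakes up on an idle link.
     */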
if (linux_ether_bind(eth)) 159 | goto fail; 160 | 161 | if (linux_ether_set_rxtimeout(eth)) 162 | goto fail; 163 | 164 | return handle; 165 | fail: 166 | close(eth->el_fd); 167 | return -1; 168 | } 169 | 170 | void ether_deinit(int handle) 171 | { 172 | struct ether_linux *eth; 173 | 174 | if (!(eth = ether_handle2eth(handle))) 175 | return; 176 | 177 | close(eth->el_fd); 178 | } 179 | 180 | int ether_receive(int handle, struct ether_hdr *hdr, uint8_t *buf, size_t bsize) 181 | { 182 | struct ether_linux *eth; 183 | uint8_t frame[ETHER_MAXLEN] __attribute__((aligned)); 184 | struct ether_hdr *frame_hdr = (struct ether_hdr *) frame; 185 | int retval; 186 | 187 | assert(hdr != NULL); 188 | assert(buf != NULL); 189 | 190 | if (!(eth = ether_handle2eth(handle))) 191 | return -1; 192 | 193 | do { 194 | retval = 195 | (int) recvfrom(eth->el_fd, frame, sizeof(frame), 0, NULL, NULL); 196 | if (retval == -1 && 197 | (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINPROGRESS)) { 198 | return 0; 199 | } else if (retval == -1) { 200 | return -1; 201 | } 202 | } while (!memcmp(frame_hdr->h_src, eth->el_mac, sizeof(mac_addr_t))); 203 | 204 | memcpy(hdr->h_dst, frame_hdr->h_dst, sizeof(mac_addr_t)); 205 | memcpy(hdr->h_src, frame_hdr->h_src, sizeof(mac_addr_t)); 206 | hdr->h_proto = ntohs(frame_hdr->h_proto); 207 | 208 | retval -= ETHER_HEADER_LEN; 209 | memcpy(buf, frame + ETHER_HEADER_LEN, min(retval, bsize)); 210 | 211 | return retval; 212 | } 213 | 214 | int ether_send(int handle, 215 | const mac_addr_t dst, 216 | uint16_t proto, 217 | uint8_t *buf, 218 | size_t bsize) 219 | { 220 | struct ether_linux *eth; 221 | struct sockaddr_ll socket_address; 222 | const size_t frame_size = ETHER_HEADER_LEN + 223 | max(bsize, ETHER_MINLEN - ETHER_FCS_LEN) + 224 | ETHER_FCS_LEN; 225 | uint8_t frame[frame_size] __attribute__((aligned)); 226 | uint32_t fcs; 227 | uint8_t *data = frame + ETHER_HEADER_LEN; 228 | struct ether_hdr *frame_hdr = (struct ether_hdr *) frame; 229 | uint32_t *fcs_p = (uint32_t *) &frame[frame_size - ETHER_FCS_LEN]; 230 | int retval; 231 | 232 | assert(buf != NULL); 233 | 234 | if (frame_size > ETHER_MAXLEN + ETHER_FCS_LEN) { 235 | retval = -EMSGSIZE; 236 | goto out; 237 | } 238 | 239 | if (!(eth = ether_handle2eth(handle))) { 240 | retval = -errno; 241 | goto out; 242 | } 243 | 244 | socket_address = (struct sockaddr_ll){ 245 | .sll_family = AF_PACKET, 246 | .sll_protocol = htons(proto), 247 | .sll_ifindex = eth->el_if_idx.ifr_ifindex, 248 | .sll_halen = ETHER_ALEN, 249 | .sll_addr[0] = dst[0], 250 | .sll_addr[1] = dst[1], 251 | .sll_addr[2] = dst[2], 252 | .sll_addr[3] = dst[3], 253 | .sll_addr[4] = dst[4], 254 | .sll_addr[5] = dst[5], 255 | }; 256 | 257 | memcpy(frame_hdr->h_dst, dst, ETHER_ALEN); 258 | memcpy(frame_hdr->h_src, eth->el_mac, ETHER_ALEN); 259 | frame_hdr->h_proto = htons(proto); 260 | memcpy(data, buf, bsize); 261 | memset(data + bsize, 0, frame_size - ETHER_HEADER_LEN - bsize); 262 | fcs = ether_fcs(frame, frame_size - ETHER_FCS_LEN); 263 | memcpy(fcs_p, &fcs, sizeof(uint32_t)); 264 | 265 | retval = (int) sendto(eth->el_fd, frame, frame_size, 0, 266 | (struct sockaddr *) (&socket_address), 267 | sizeof(socket_address)); 268 | if (retval < 0) 269 | retval = -errno; 270 | out: 271 | return retval; 272 | } 273 | -------------------------------------------------------------------------------- /src/logger.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | 5 | enum log_level { 6 | LOG_ERR = '1', 7 | 
LOG_WARN = '2', 8 | LOG_INFO = '3', 9 | LOG_DEBUG = '4', 10 | }; 11 | 12 | #define LOG(_level_, _fmt_, ...) \ 13 | do { \ 14 | fprintf(stderr, \ 15 | "%c:%s: "_fmt_ \ 16 | "\n", \ 17 | _level_, __func__, ##__VA_ARGS__); \ 18 | } while (0) 19 | -------------------------------------------------------------------------------- /src/nstack.c: -------------------------------------------------------------------------------- 1 | #define _GNU_SOURCE 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | 11 | #include "linker_set.h" 12 | #include "nstack_in.h" 13 | #include "nstack_socket.h" 14 | 15 | #include "collection.h" 16 | #include "logger.h" 17 | #include "nstack_ether.h" 18 | #include "nstack_internal.h" 19 | #include "nstack_ip.h" 20 | #include "tcp.h" 21 | #include "udp.h" 22 | 23 | extern void tcp_slowtimo(); 24 | /** 25 | * nstack ingress and egress thread state. 26 | */ 27 | enum nstack_state { 28 | NSTACK_STOPPED = 0, /*!< Ingress and egress threads are not running. */ 29 | NSTACK_RUNNING, /*!< Ingress and egress threads are running. */ 30 | NSTACK_DYING, /*!< Waiting for ingress and egress threads to stop. */ 31 | }; 32 | 33 | SET_DECLARE(_nstack_periodic_tasks, void); 34 | 35 | /* 36 | * nstack state variables. 37 | */ 38 | static enum nstack_state nstack_state = NSTACK_STOPPED; 39 | static pthread_t ingress_tid, egress_tid, tcp_timer_tid; 40 | static int ether_handle; 41 | 42 | static nstack_send_fn *proto_send[] = { 43 | [XIP_PROTO_TCP] = nstack_tcp_send, 44 | [XIP_PROTO_UDP] = nstack_udp_send, 45 | }; 46 | 47 | static struct nstack_sock sockets[] = { 48 | { 49 | .info.sock_dom = XF_INET4, 50 | .info.sock_type = XSOCK_DGRAM, 51 | .info.sock_proto = XIP_PROTO_UDP, 52 | .info.sock_addr = 53 | (struct nstack_sockaddr){ 54 | .inet4_addr = 167772162, 55 | .port = 10, 56 | }, 57 | .shmem_path = "/tmp/unetcat.sock", 58 | }, 59 | {.info.sock_dom = XF_INET4, 60 | .info.sock_type = XSOCK_STREAM, 61 | .info.sock_proto = XIP_PROTO_TCP, 62 | .info.sock_addr = 63 | (struct nstack_sockaddr){ 64 | .inet4_addr = 167772162, 65 | .port = 10, 66 | }, 67 | .shmem_path = "/tmp/tnetcat.sock"}, 68 | }; 69 | 70 | static enum nstack_state get_state(void) 71 | { 72 | enum nstack_state *state = &nstack_state; 73 | 74 | return *state; 75 | } 76 | 77 | static void set_state(enum nstack_state state) 78 | { 79 | nstack_state = state; 80 | } 81 | 82 | static int delta_time; 83 | static int eval_timer(void) 84 | { 85 | static struct timespec start; 86 | struct timespec now; 87 | 88 | clock_gettime(CLOCK_MONOTONIC, &now); 89 | delta_time = now.tv_sec - start.tv_sec; 90 | if (delta_time >= NSTACK_PERIODIC_EVENT_SEC) { 91 | start = now; 92 | return !0; 93 | } 94 | return 0; 95 | } 96 | 97 | static void *nstack_tcp_timer_thread(void *arg) 98 | { 99 | while (1) { 100 | usleep(NSTACK_TCP_TIMER_USEC); 101 | tcp_slowtimo(); 102 | } 103 | pthread_exit(NULL); 104 | } 105 | 106 | /** 107 | * Bind an address to a socket. 108 | * @param[in] sock is a pointer to the socket returned by nstack_socket(). 109 | * @returns Upon successful completion returns 0; 110 | * Otherwise -1 is returned and errno is set. 
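 * Only XIP_PROTO_UDP and XIP_PROTO_TCP are handled; any other protocol fails
 * with errno set to EPROTOTYPE.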
111 | */ 112 | static int nstack_bind(struct nstack_sock *sock) 113 | { 114 | switch (sock->info.sock_proto) { 115 | case XIP_PROTO_UDP: 116 | return nstack_udp_bind(sock); 117 | case XIP_PROTO_TCP: 118 | return nstack_tcp_bind(sock); 119 | default: 120 | errno = EPROTOTYPE; 121 | return -1; 122 | } 123 | } 124 | 125 | int nstack_sock_dgram_input(struct nstack_sock *sock, 126 | struct nstack_sockaddr *srcaddr, 127 | uint8_t *buf, 128 | size_t bsize) 129 | { 130 | int dgram_index; 131 | struct nstack_dgram *dgram; 132 | 133 | while ((dgram_index = queue_alloc(sock->ingress_q)) == -1) 134 | ; 135 | dgram = (struct nstack_dgram *) (sock->ingress_data + dgram_index); 136 | 137 | dgram->srcaddr = *srcaddr; 138 | dgram->dstaddr = sock->info.sock_addr; 139 | dgram->buf_size = bsize; 140 | memcpy(dgram->buf, buf, bsize); 141 | 142 | queue_commit(sock->ingress_q); 143 | kill(sock->ctrl->pid_end, SIGUSR2); 144 | 145 | return 0; 146 | } 147 | 148 | static void run_periodic_tasks(int delta_time) 149 | { 150 | void **taskp; 151 | 152 | SET_FOREACH (taskp, _nstack_periodic_tasks) { 153 | nstack_periodic_task_t *task = *(nstack_periodic_task_t **) taskp; 154 | 155 | if (task) 156 | task(delta_time); 157 | } 158 | } 159 | 160 | /** 161 | * Handle the ingress traffic. 162 | * All ingress data is handled in a single pipeline until this point where 163 | * the data is demultiplexed to sockets. 164 | * transport -> socket fd 165 | */ 166 | static void *nstack_ingress_thread(void *arg) 167 | { 168 | static uint8_t rx_buffer[ETHER_MAXLEN]; 169 | 170 | while (1) { 171 | struct ether_hdr hdr; 172 | int retval; 173 | 174 | LOG(LOG_DEBUG, "Waiting for rx"); 175 | 176 | retval = 177 | ether_receive(ether_handle, &hdr, rx_buffer, sizeof(rx_buffer)); 178 | if (retval == -1) { 179 | LOG(LOG_ERR, "Rx failed: %d", errno); 180 | } else if (retval > 0) { 181 | LOG(LOG_DEBUG, "Frame received!"); 182 | 183 | retval = ether_input(&hdr, rx_buffer, retval); 184 | if (retval == -1) { 185 | LOG(LOG_ERR, "Protocol handling failed: %d", errno); 186 | } else if (retval > 0) { 187 | retval = 188 | ether_output_reply(ether_handle, &hdr, rx_buffer, retval); 189 | if (retval < 0) { 190 | LOG(LOG_ERR, "Reply failed: %d", errno); 191 | } 192 | } 193 | } 194 | 195 | if (eval_timer()) { 196 | LOG(LOG_DEBUG, "tick"); 197 | run_periodic_tasks(delta_time); 198 | } 199 | 200 | if (get_state() == NSTACK_DYING) { 201 | break; 202 | } 203 | } 204 | 205 | pthread_exit(NULL); 206 | } 207 | 208 | /** 209 | * Handle the egress traffic. 210 | * All egress traffic is mux'd and serialized through one egress pipe. 
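 * The thread wakes up either when SIGUSR2 is delivered or when the periodic
 * timeout expires, then walks every socket and sends whatever is queued on
 * its egress queue.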
211 | * socket fd -> transport 212 | */ 213 | static void *nstack_egress_thread(void *arg) 214 | { 215 | sigset_t sigset; 216 | 217 | sigemptyset(&sigset); 218 | sigaddset(&sigset, SIGUSR2); 219 | 220 | if (pthread_sigmask(SIG_BLOCK, &sigset, NULL) == -1) { 221 | LOG(LOG_ERR, "Unable to ignore SIGUSR2"); 222 | abort(); 223 | } 224 | 225 | while (1) { 226 | struct timespec timeout = { 227 | .tv_sec = NSTACK_PERIODIC_EVENT_SEC, 228 | .tv_nsec = 0, 229 | }; 230 | 231 | sigtimedwait(&sigset, NULL, &timeout); 232 | 233 | for (size_t i = 0; i < num_elem(sockets); i++) { 234 | struct nstack_sock *sock = sockets + i; 235 | 236 | if (!queue_is_empty(sock->egress_q)) { 237 | int dgram_index; 238 | struct nstack_dgram *dgram; 239 | enum nstack_sock_proto proto; 240 | 241 | while (!queue_peek(sock->egress_q, &dgram_index)) 242 | ; 243 | dgram = 244 | (struct nstack_dgram *) (sock->egress_data + dgram_index); 245 | 246 | LOG(LOG_DEBUG, "Sending a datagram"); 247 | proto = sock->info.sock_proto; 248 | if (proto > XIP_PROTO_NONE && proto < XIP_PROTO_LAST) { 249 | if (proto_send[proto](sock, dgram) < 0) { 250 | LOG(LOG_ERR, "Failed to send a datagram"); 251 | } 252 | } else { 253 | LOG(LOG_ERR, "Invalid protocol"); 254 | } 255 | 256 | queue_discard(sock->egress_q, 1); 257 | } 258 | } 259 | 260 | if (get_state() == NSTACK_DYING) 261 | break; 262 | } 263 | 264 | pthread_exit(NULL); 265 | } 266 | 267 | static void nstack_init(void) 268 | { 269 | pid_t mypid = getpid(); 270 | 271 | for (size_t i = 0; i < num_elem(sockets); i++) { 272 | struct nstack_sock *sock = sockets + i; 273 | int fd; 274 | void *pa; 275 | 276 | fd = open(sock->shmem_path, O_RDWR); 277 | if (fd == -1) { 278 | perror("Failed to open shmem file"); 279 | exit(1); 280 | } 281 | 282 | pa = mmap(0, NSTACK_SHMEM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 283 | 0); 284 | if (pa == MAP_FAILED) { 285 | perror("Failed to mmap() shared mem"); 286 | exit(1); 287 | } 288 | memset(pa, 0, NSTACK_SHMEM_SIZE); 289 | 290 | sock->ctrl = NSTACK_SOCK_CTRL(pa); 291 | *sock->ctrl = (struct nstack_sock_ctrl){ 292 | .pid_inetd = mypid, 293 | .pid_end = 0, 294 | }; 295 | 296 | sock->ingress_data = NSTACK_INGRESS_DADDR(pa); 297 | sock->ingress_q = NSTACK_INGRESS_QADDR(pa); 298 | *sock->ingress_q = 299 | queue_create(NSTACK_DATAGRAM_SIZE_MAX, NSTACK_DATAGRAM_BUF_SIZE); 300 | 301 | sock->egress_data = NSTACK_EGRESS_DADDR(pa); 302 | sock->egress_q = NSTACK_EGRESS_QADDR(pa); 303 | *sock->egress_q = 304 | queue_create(NSTACK_DATAGRAM_SIZE_MAX, NSTACK_DATAGRAM_BUF_SIZE); 305 | 306 | if (nstack_bind(sock) < 0) { 307 | perror("Failed to bind a socket"); 308 | exit(1); 309 | } 310 | } 311 | } 312 | 313 | int nstack_start(int handle) 314 | { 315 | ether_handle = handle; 316 | 317 | if (get_state() != NSTACK_STOPPED) { 318 | errno = EALREADY; 319 | return -1; 320 | } 321 | 322 | nstack_init(); 323 | 324 | if (pthread_create(&ingress_tid, NULL, nstack_ingress_thread, NULL)) { 325 | return -1; 326 | } 327 | 328 | if (pthread_create(&egress_tid, NULL, nstack_egress_thread, NULL)) { 329 | pthread_cancel(ingress_tid); 330 | return -1; 331 | } 332 | 333 | if (pthread_create(&tcp_timer_tid, NULL, nstack_tcp_timer_thread, NULL)) { 334 | pthread_cancel(ingress_tid); 335 | pthread_cancel(egress_tid); 336 | return -1; 337 | } 338 | 339 | set_state(NSTACK_RUNNING); 340 | return 0; 341 | } 342 | 343 | void nstack_stop(void) 344 | { 345 | set_state(NSTACK_DYING); 346 | 347 | pthread_join(ingress_tid, NULL); 348 | pthread_join(egress_tid, NULL); 349 | pthread_join(tcp_timer_tid, NULL); 
350 | 351 | set_state(NSTACK_STOPPED); 352 | } 353 | 354 | int main(int argc, char *argv[]) 355 | { 356 | char *const ether_args[] = { 357 | argv[1], 358 | NULL, 359 | }; 360 | int handle; 361 | sigset_t sigset; 362 | 363 | if (argc == 1) { 364 | fprintf(stderr, "Usage: %s INTERFACE\n", argv[0]); 365 | exit(1); 366 | } 367 | 368 | sigemptyset(&sigset); 369 | sigaddset(&sigset, SIGUSR1); 370 | 371 | /* Block sigset for all future threads */ 372 | sigprocmask(SIG_SETMASK, &sigset, NULL); 373 | 374 | handle = ether_init(ether_args); 375 | if (handle == -1) { 376 | perror("Failed to init"); 377 | exit(1); 378 | } else if (handle == -2) { 379 | perror("Interface identifier is too long"); 380 | exit(1); 381 | } 382 | 383 | if (ip_config(handle, STACK_IP, SUBNET_MASK)) { 384 | perror("Failed to config IP"); 385 | exit(1); 386 | } 387 | 388 | nstack_start(handle); 389 | 390 | sigwaitinfo(&sigset, NULL); 391 | 392 | fprintf(stderr, "Stopping the IP stack...\n"); 393 | 394 | nstack_stop(); 395 | 396 | ether_deinit(handle); 397 | 398 | return 0; 399 | } 400 | -------------------------------------------------------------------------------- /src/nstack_arp.h: -------------------------------------------------------------------------------- 1 | /** 2 | * @addtogroup ARP 3 | * @{ 4 | */ 5 | 6 | #pragma once 7 | 8 | #include "nstack_in.h" 9 | #include "nstack_link.h" 10 | 11 | /** 12 | * ARP IP protocol message. 13 | */ 14 | struct arp_ip { 15 | uint16_t arp_htype; /*!< HW type */ 16 | uint16_t arp_ptype; /*!< Protocol type */ 17 | uint8_t arp_hlen; /*!< HW addr len */ 18 | uint8_t arp_plen; /*!< Proto addr len */ 19 | uint16_t arp_oper; /*!< Opcode */ 20 | mac_addr_t arp_sha; /*!< Sender HW addr */ 21 | in_addr_t arp_spa; /*!< Sender IP addr */ 22 | mac_addr_t arp_tha; /*!< Target HW addr */ 23 | in_addr_t arp_tpa; /*!< Target IP addr */ 24 | } __attribute__((packed, aligned(2))); 25 | 26 | /** 27 | * arp_htype 28 | * @{ 29 | */ 30 | #define ARP_HTYPE_ETHER 1 31 | /** 32 | * @} 33 | */ 34 | 35 | /** 36 | * arp_oper 37 | * @{ 38 | */ 39 | #define ARP_OPER_REQUEST 1 /*!< Request */ 40 | #define ARP_OPER_REPLY 2 /*!< Repply */ 41 | /** 42 | * @} 43 | */ 44 | 45 | /** 46 | * ARP Cache Operations. 47 | */ 48 | 49 | /** 50 | * ARP Cache entry type. 51 | */ 52 | enum arp_cache_entry_type { 53 | ARP_CACHE_FREE = -2, /*!< Unused entry. */ 54 | ARP_CACHE_STATIC = -1, /*!< Static entry. */ 55 | ARP_CACHE_DYN = 0, /*!< Dynamic entry. */ 56 | }; 57 | 58 | int arp_cache_insert(in_addr_t ip_addr, 59 | const mac_addr_t ether_addr, 60 | enum arp_cache_entry_type type); 61 | void arp_cache_remove(in_addr_t ip_addr); 62 | int arp_cache_get_haddr(in_addr_t iface, in_addr_t ip_addr, mac_addr_t haddr); 63 | 64 | /** 65 | * @} 66 | */ 67 | 68 | /** 69 | * Announce an IP address with ARP. 
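 * ip_config() uses it to announce the stack's address when an interface is
 * brought up:
 * @code
 * for (size_t i = 0; i < 3; i++)
 *     arp_gratuitous(ether_handle, ip_addr);
 * @endcode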
70 | */ 71 | int arp_gratuitous(int ether_handle, in_addr_t spa); 72 | 73 | /** 74 | * @} 75 | */ 76 | -------------------------------------------------------------------------------- /src/nstack_ether.h: -------------------------------------------------------------------------------- 1 | /** 2 | * @addtogroup Ether 3 | * @{ 4 | */ 5 | 6 | #pragma once 7 | 8 | #include 9 | #include 10 | 11 | #include "linker_set.h" 12 | #include "nstack_link.h" 13 | 14 | /** 15 | * Ethernet frame 16 | * @verbatim 17 | * +---------+---------+---------+-----------+-----+ 18 | * | dst MAC | src MAC | Type ID | Data | FCS | 19 | * +---------+---------+---------+-----------+-----+ 20 | * 6 6 2 45 - 1500 21 | * |-----------------------------|-----------------| 22 | * Datalink header Data and CRC 23 | * @endverbatim 24 | * @{ 25 | */ 26 | #define ETHER_ALEN LINK_MAC_ALEN 27 | #define ETHER_HEADER_LEN 14 28 | #define ETHER_DATA_LEN 1500 /*!< Max length of data */ 29 | #define ETHER_FCS_LEN 4 30 | #define ETHER_MINLEN 60 31 | #define ETHER_MAXLEN 1514 32 | /** 33 | * @} 34 | */ 35 | 36 | /** 37 | * Protocol type IDs. 38 | * @{ 39 | */ 40 | #define ETHER_PROTO_LOOP 0x0060 /*!< Loopback */ 41 | #define ETHER_PROTO_IPV4 0x0800 /*!< IPv4 */ 42 | #define ETHER_PROTO_ARP 0x0806 /*!< Address Resolution Protocol */ 43 | #define ETHER_PROTO_RARP 0x8035 /*!< Reverse Address Resolution Protocol */ 44 | #define ETHER_PROTO_WOL 0x0842 /*!< Wake-on-LAN */ 45 | #define ETHER_PROTO_8021Q 0x8100 /*!< VLAN-tagged frame */ 46 | #define ETHER_PROTO_IPV6 0x86DD /*!< IPv6 */ 47 | /** 48 | * @} 49 | */ 50 | 51 | /** 52 | * Ethernet frame header. 53 | */ 54 | struct ether_hdr { 55 | mac_addr_t h_dst; /*!< Destination ethernet address */ 56 | mac_addr_t h_src; /*!< Source ethernet address */ 57 | uint16_t h_proto; /*!< Packet type ID */ 58 | } __attribute__((packed)); 59 | 60 | struct _ether_proto_handler { 61 | uint16_t proto_id; 62 | int (*fn)(const struct ether_hdr *hdr, uint8_t *payload, size_t bsize); 63 | }; 64 | 65 | /** 66 | * Declare an ethernet input chain handler. 67 | */ 68 | #define ETHER_PROTO_INPUT_HANDLER(_proto_id_, _handler_fn_) \ 69 | static struct _ether_proto_handler _ether_proto_handler_##_handler_fn_ = { \ 70 | .proto_id = _proto_id_, \ 71 | .fn = _handler_fn_, \ 72 | }; \ 73 | DATA_SET(_ether_proto_handlers, _ether_proto_handler_##_handler_fn_) 74 | 75 | 76 | extern const mac_addr_t mac_broadcast_addr; 77 | 78 | int ether_init(char *const args[]); 79 | void ether_deinit(int ether_handle); 80 | uint32_t ether_fcs(const void *data, size_t bsize); 81 | 82 | /* Platform dependent functions */ 83 | 84 | /** 85 | * Get the MAC address of an interface. 86 | * @param[in] handle is the ether handle. 87 | * @param[out] addr is the destination buffer. 88 | */ 89 | int ether_handle2addr(int handle, mac_addr_t addr); 90 | 91 | /** 92 | * Get the corresponding handle of an MAC address. 93 | */ 94 | int ether_addr2handle(const mac_addr_t addr); 95 | 96 | /** 97 | * Raw Ethernet RX and TX functions. 98 | * @{ 99 | */ 100 | 101 | /** 102 | * Receive a frame from ether. 103 | * @retval >0 the size of the received frame; 104 | * @retval 0 read timed out; 105 | * @retval -1 a read error occurred, errno is set. 106 | */ 107 | int ether_receive(int handle, 108 | struct ether_hdr *hdr, 109 | uint8_t *buf, 110 | size_t bsize); 111 | /** 112 | * Send a frame to a destination over ether. 
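 * @param[in] handle is the ether handle returned by ether_init().
 * @param[in] dst is the destination MAC address.
 * @param[in] proto is the EtherType, e.g. ETHER_PROTO_IPV4.
 * @param[in] buf points to the frame payload.
 * @param[in] bsize is the payload size in bytes.
 * @retval >=0 on success;
 * @retval <0 a negative errno value if sending failed.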
113 | */ 114 | int ether_send(int handle, 115 | const mac_addr_t dst, 116 | uint16_t proto, 117 | uint8_t *buf, 118 | size_t bsize); 119 | /** 120 | * @} 121 | */ 122 | 123 | /** 124 | * Ethernet input and output chains. 125 | * @{ 126 | */ 127 | 128 | /** 129 | * Handle the received ethernet frame. 130 | * @retval >0 the size of the reply written back to payload; 131 | * @retval 0 if no reply should be sent; 132 | * @retval -1 an error occurred, errno is set. 133 | */ 134 | int ether_input(const struct ether_hdr *hdr, uint8_t *payload, size_t bsize); 135 | 136 | /** 137 | * Send back a reply message. 138 | * @param hdr must be untouched header received by ether_receive(). 139 | */ 140 | int ether_output_reply(int ether_handle, 141 | const struct ether_hdr *hdr, 142 | uint8_t *payload, 143 | size_t bsize); 144 | 145 | /** 146 | * @} 147 | */ 148 | 149 | /** 150 | * @} 151 | */ 152 | -------------------------------------------------------------------------------- /src/nstack_icmp.h: -------------------------------------------------------------------------------- 1 | /** 2 | * @addtogroup ICMP 3 | * @{ 4 | */ 5 | 6 | #pragma once 7 | 8 | #include 9 | 10 | #include "nstack_ip.h" 11 | 12 | /** 13 | * ICMP message. 14 | */ 15 | struct icmp { 16 | uint8_t icmp_type; 17 | uint8_t icmp_code; 18 | uint16_t icmp_csum; 19 | uint32_t icmp_rest; 20 | uint8_t icmp_data[0]; 21 | } __attribute__((packed)); 22 | 23 | /** 24 | * ICMP destination unreachable message. 25 | */ 26 | struct icmp_destunreac { 27 | struct icmp icmp; 28 | struct ip_hdr old_ip_hdr; 29 | uint8_t data[8]; 30 | } __attribute__((packed)); 31 | 32 | /** 33 | * ICMP Types. 34 | * @{ 35 | */ 36 | #define ICMP_TYPE_ECHO_REPLY 0 37 | #define ICMP_TYPE_DESTUNREAC 3 38 | #define ICMP_TYPE_ECHO_REQUEST 8 39 | /** 40 | * @} 41 | */ 42 | 43 | /** 44 | * ICMP Codes for ICMP_TYPE_DESTUNREAC. 45 | * @{ 46 | */ 47 | #define ICMP_CODE_DESTUNREAC 0 /*!< Network unreachable error. */ 48 | #define ICMP_CODE_HOSTUNREAC 1 /*!< Host unreachable error. */ 49 | #define ICMP_CODE_PROTOUNREAC 2 /*!< Protocol unreachable error. */ 50 | #define ICMP_CODE_PORTUNREAC 3 /*!< Port unreachable error. */ 51 | #define ICMP_CODE_DESTNETUNK 6 /*!< Destination network unknown error. */ 52 | #define ICMP_CODE_HOSTUNK 7 /*!< Destination host unknown error. */ 53 | /** 54 | * @} 55 | */ 56 | 57 | /** 58 | * Generate an ICMP destination unreachable message to buf. 59 | * This function will generate a directly returnable IP packet if hdr 60 | * is a pointer to a header stored in an ether buffer. 61 | * @param[in,out] hdr is the received header. It will be updated and 62 | * converted to the network order. 63 | * @param[in] code is one of the ICMP_TYPE_DESTUNREAC error codes. 64 | * @param[in,out] is the packet buffer given by ether layer. 65 | * @param[in] bsize is the size of the frame given by ether layer. 66 | * @return Returns the number of bytes written. 67 | */ 68 | int icmp_generate_dest_unreachable(struct ip_hdr *hdr, 69 | int code, 70 | uint8_t *buf, 71 | size_t bsize); 72 | 73 | /** 74 | * @} 75 | */ 76 | -------------------------------------------------------------------------------- /src/nstack_internal.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | #include 5 | 6 | #include "nstack_socket.h" 7 | #include "tree.h" 8 | 9 | #define NSTACK_CTRL_FLAG_DYING 0x8000 10 | 11 | typedef void nstack_periodic_task_t(int delta_time); 12 | 13 | /** 14 | * Declare a periodic task. 
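 *
 * A minimal sketch with a hypothetical task; the registered function is
 * expected to match nstack_periodic_task_t, i.e. to take the time elapsed
 * since the previous tick as its only argument.
 * @code
 * static void example_task(int delta_time)
 * {
 *     // hypothetical body: age caches, step timers, ...
 * }
 * NSTACK_PERIODIC_TASK(example_task);
 * @endcode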
15 | */ 16 | #define NSTACK_PERIODIC_TASK(_task_fn_) \ 17 | DATA_SET(_nstack_periodic_tasks, _task_fn_) 18 | 19 | struct queue_cb; 20 | 21 | /** 22 | * A generic socket descriptor. 23 | */ 24 | struct nstack_sock { 25 | struct nstack_sock_info info; /* Must be first */ 26 | 27 | struct nstack_sock_ctrl *ctrl; 28 | uint8_t *ingress_data; 29 | struct queue_cb *ingress_q; 30 | uint8_t *egress_data; 31 | struct queue_cb *egress_q; 32 | 33 | union { 34 | struct { 35 | RB_ENTRY(nstack_sock) _entry; 36 | } udp; 37 | struct { 38 | RB_ENTRY(nstack_sock) _entry; 39 | } tcp; 40 | } data; 41 | char shmem_path[80]; 42 | }; 43 | 44 | /** 45 | * Socket datagram wrapping. 46 | * @{ 47 | */ 48 | 49 | /** 50 | * Handle socket input data. 51 | * Transport -> Socket 52 | */ 53 | int nstack_sock_dgram_input(struct nstack_sock *sock, 54 | struct nstack_sockaddr *srcaddr, 55 | uint8_t *buf, 56 | size_t bsize); 57 | 58 | typedef int nstack_send_fn(struct nstack_sock *sock, 59 | const struct nstack_dgram *dgram); 60 | 61 | /** 62 | * @} 63 | */ 64 | -------------------------------------------------------------------------------- /src/nstack_ip.h: -------------------------------------------------------------------------------- 1 | /** 2 | * nstack IP service. 3 | * @addtogroup IP 4 | * @{ 5 | */ 6 | 7 | #pragma once 8 | 9 | #include "linker_set.h" 10 | #include "nstack_ether.h" 11 | #include "nstack_in.h" 12 | 13 | #define IP_STR_LEN 17 14 | 15 | /** 16 | * Stack ip "10.0.0.2" and subnet mask "255.255.255.0" in decimal form 17 | * @{ 18 | */ 19 | #define STACK_IP 167772162 20 | #define SUBNET_MASK 4294967040 21 | /** 22 | * @} 23 | */ 24 | 25 | /** 26 | * IP Route descriptor. 27 | */ 28 | struct ip_route { 29 | in_addr_t r_network; /*!< Network address. */ 30 | in_addr_t r_netmask; /*!< Network mask. */ 31 | in_addr_t r_gw; /*!< Gateway IP. */ 32 | in_addr_t r_iface; /*!< Interface address. */ 33 | int r_iface_handle; /*!< Interface ether_handle. */ 34 | }; 35 | 36 | /** 37 | * IP Packet Header. 38 | */ 39 | struct ip_hdr { 40 | uint8_t ip_vhl; 41 | uint8_t ip_tos; 42 | uint16_t ip_len; 43 | uint16_t ip_id; 44 | uint16_t ip_foff; 45 | uint8_t ip_ttl; 46 | uint8_t ip_proto; 47 | uint16_t ip_csum; 48 | uint32_t ip_src; 49 | uint32_t ip_dst; 50 | uint8_t ip_opt[0]; 51 | 52 | } __attribute__((packed, aligned(4))); 53 | 54 | /** 55 | * IP Packet Header Defaults 56 | * @{ 57 | */ 58 | /* v4 and 5 * 4 octets */ 59 | #define IP_VHL_DEFAULT 0x45 /*!< Default value for version and ihl. */ 60 | #define IP_TOS_DEFAULT 0x0 /*!< Default type of service and no ECN */ 61 | #define IP_TOFF_DEFAULT 0x4000 62 | #define IP_TTL_DEFAULT 64 63 | /** 64 | * @} 65 | */ 66 | 67 | /** 68 | * IP Packet header values. 69 | */ 70 | 71 | /** 72 | * Get IP version. 73 | */ 74 | #define IP_VERSION(_ip_hdr_) (((ip_hdr)->ip_vhl & 0x40) >> 4) 75 | 76 | #define IP_FALGS_DF 0x4000 77 | #define IP_FLAGS_MF 0x2000 78 | 79 | /** 80 | * Max IP packet size in bytes. 81 | */ 82 | #define IP_MAX_BYTES 65535 83 | 84 | /** 85 | * Max IP data size in bytes. 86 | */ 87 | #define IP_DATA_MAX_BYTES 65515 88 | 89 | 90 | /** 91 | * IP protocol numbers. 
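 * The values below follow the IANA-assigned Internet protocol numbers and are
 * the keys used when registering a transport-layer input handler, e.g. as
 * udp.c does:
 * @code
 * IP_PROTO_INPUT_HANDLER(IP_PROTO_UDP, udp_input);
 * @endcode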
92 | * @{ 93 | */ 94 | #define IP_PROTO_ICMP 1 95 | #define IP_PROTO_IGMP 2 96 | #define IP_PROTO_TCP 6 97 | #define IP_PROTO_UDP 17 98 | #define IP_PROTO_SCTP 132 99 | /** 100 | * @} 101 | */ 102 | 103 | /** 104 | * @} 105 | */ 106 | 107 | struct _ip_proto_handler { 108 | uint16_t proto_id; 109 | int (*fn)(const struct ip_hdr *hdr, uint8_t *payload, size_t bsize); 110 | }; 111 | 112 | /** 113 | * Declare an IP input chain handler. 114 | */ 115 | #define IP_PROTO_INPUT_HANDLER(_proto_id_, _handler_fn_) \ 116 | static struct _ip_proto_handler _ip_proto_handler_##_handler_fn_ = { \ 117 | .proto_id = _proto_id_, \ 118 | .fn = _handler_fn_, \ 119 | }; \ 120 | DATA_SET(_ip_proto_handlers, _ip_proto_handler_##_handler_fn_) 121 | 122 | int ip_config(int ether_handle, in_addr_t ip_addr, in_addr_t netmask); 123 | 124 | /** 125 | * IP Packet manipulation. 126 | * @{ 127 | */ 128 | 129 | /** 130 | * Calculate the Internet checksum. 131 | */ 132 | uint16_t ip_checksum(void *dp, size_t bsize); 133 | 134 | /** 135 | * Get the header length of an IP packet. 136 | */ 137 | static inline size_t ip_hdr_hlen(const struct ip_hdr *ip) 138 | { 139 | return (ip->ip_vhl & 0x0f) * 4; 140 | } 141 | 142 | /** 143 | * @} 144 | */ 145 | 146 | /** 147 | * RIB 148 | * @{ 149 | */ 150 | 151 | /** 152 | * Update a route. 153 | * @param[in] route is a pointer to a route struct; the information will be 154 | * copied from the struct. 155 | */ 156 | int ip_route_update(struct ip_route *route); 157 | 158 | /** 159 | * Remove a route from routing table. 160 | */ 161 | int ip_route_remove(struct ip_route *route); 162 | 163 | /** 164 | * Get routing information for a network. 165 | * @param[out] route is a pointer to a ip_route struct that will be updated 166 | * if a route is found. 167 | */ 168 | int ip_route_find_by_network(in_addr_t ip, struct ip_route *route); 169 | 170 | /** 171 | * Get routing information for a source IP addess. 172 | * The function can be also used for source IP address validation by setting 173 | * route pointer argument to NULL. 174 | */ 175 | int ip_route_find_by_iface(in_addr_t addr, struct ip_route *route); 176 | 177 | /** 178 | * @} 179 | */ 180 | 181 | /** 182 | * IP packet handling and manipulation. 183 | * @{ 184 | */ 185 | void ip_hton(const struct ip_hdr *host, struct ip_hdr *net); 186 | size_t ip_ntoh(const struct ip_hdr *net, struct ip_hdr *host); 187 | 188 | int ip_input(const struct ether_hdr *e_hdr, uint8_t *payload, size_t bsize); 189 | 190 | /** 191 | * Construct a reply header from a received IP packet header. 192 | * Swaps src and dst etc. 193 | * @param host_ip_hd is a pointer to a IP packet header that should be reversed. 194 | * @param bsize is the size of the packet data. 195 | * @returns Returns the size of the header. 196 | */ 197 | size_t ip_reply_header(struct ip_hdr *host_ip_hdr, size_t bsize); 198 | /** 199 | * @} 200 | */ 201 | 202 | /** 203 | * Send an IP packet to a destination. 
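 *
 * A minimal usage sketch, mirroring the call made by the UDP layer; buf is
 * assumed to already hold a fully serialized transport-layer datagram in
 * network byte order.
 * @code
 * if (ip_send(dst_addr, IP_PROTO_UDP, buf, bsize) < 0) {
 *     // transmission failed
 * }
 * @endcode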
204 | */ 205 | int ip_send(in_addr_t dst, uint8_t proto, const uint8_t *buf, size_t bsize); 206 | 207 | /** 208 | * IP Fragmentation 209 | * @{ 210 | */ 211 | 212 | static inline int ip_fragment_is_frag(struct ip_hdr *hdr) 213 | { 214 | return (!!(hdr->ip_foff & IP_FLAGS_MF) || !!(hdr->ip_foff & 0x1fff)); 215 | } 216 | 217 | int ip_fragment_input(struct ip_hdr *ip_hdr, uint8_t *rx_packet); 218 | 219 | /** 220 | * @} 221 | */ 222 | 223 | /** 224 | * @} 225 | */ 226 | -------------------------------------------------------------------------------- /src/socket.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | #include "nstack_socket.h" 10 | #include "nstack_util.h" 11 | 12 | /* TODO bind fn for outbound connections */ 13 | 14 | static void block_sigusr2(void) 15 | { 16 | sigset_t sigset; 17 | 18 | sigemptyset(&sigset); 19 | sigaddset(&sigset, SIGUSR2); 20 | 21 | if (pthread_sigmask(SIG_BLOCK, &sigset, NULL) == -1) 22 | abort(); 23 | } 24 | 25 | void *nstack_listen(const char *socket_path) 26 | { 27 | int fd; 28 | void *pa; 29 | 30 | fd = open(socket_path, O_RDWR); 31 | if (fd == -1) 32 | return NULL; 33 | 34 | pa = mmap(0, NSTACK_SHMEM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); 35 | if (pa == MAP_FAILED) 36 | return NULL; 37 | 38 | block_sigusr2(); 39 | NSTACK_SOCK_CTRL(pa)->pid_end = getpid(); 40 | 41 | return pa; 42 | } 43 | 44 | ssize_t nstack_recvfrom(void *socket, 45 | void *restrict buffer, 46 | size_t length, 47 | int flags, 48 | struct nstack_sockaddr *restrict address) 49 | { 50 | struct queue_cb *ingress_q = NSTACK_INGRESS_QADDR(socket); 51 | struct nstack_dgram *dgram; 52 | sigset_t sigset; 53 | int dgram_index; 54 | ssize_t rd; 55 | 56 | sigemptyset(&sigset); 57 | sigaddset(&sigset, SIGUSR2); 58 | 59 | do { 60 | struct timespec timeout = { 61 | .tv_sec = NSTACK_PERIODIC_EVENT_SEC, 62 | .tv_nsec = 0, 63 | }; 64 | 65 | sigtimedwait(&sigset, NULL, &timeout); 66 | } while (!queue_peek(ingress_q, &dgram_index)); 67 | dgram = 68 | (struct nstack_dgram *) (NSTACK_INGRESS_DADDR(socket) + dgram_index); 69 | 70 | if (address) 71 | *address = dgram->srcaddr; 72 | rd = smin(length, dgram->buf_size); 73 | memcpy(buffer, dgram->buf, rd); 74 | dgram = NULL; 75 | 76 | if (!(flags & NSTACK_MSG_PEEK)) 77 | queue_discard(ingress_q, 1); 78 | 79 | return rd; 80 | } 81 | 82 | ssize_t nstack_sendto(void *socket, 83 | const void *buffer, 84 | size_t length, 85 | int flags, 86 | const struct nstack_sockaddr *dest_addr) 87 | { 88 | const struct nstack_sock_ctrl *ctrl = NSTACK_SOCK_CTRL(socket); 89 | struct queue_cb *egress_q = NSTACK_EGRESS_QADDR(socket); 90 | struct nstack_dgram *dgram; 91 | int dgram_index; 92 | 93 | if (length > NSTACK_DATAGRAM_SIZE_MAX) { 94 | errno = ENOBUFS; 95 | return -1; 96 | } 97 | 98 | while ((dgram_index = queue_alloc(egress_q)) == -1) 99 | ; 100 | dgram = (struct nstack_dgram *) (NSTACK_EGRESS_DADDR(socket) + dgram_index); 101 | 102 | /* Ignored by the implementation */ 103 | memset(&dgram->srcaddr, 0, sizeof(struct nstack_sockaddr)); 104 | 105 | dgram->dstaddr = *dest_addr; 106 | memcpy(dgram->buf, buffer, length); 107 | dgram->buf_size = length; 108 | 109 | queue_commit(egress_q); 110 | 111 | return length; 112 | } 113 | -------------------------------------------------------------------------------- /src/tcp.c: -------------------------------------------------------------------------------- 1 | #if defined(__linux__) 2 | #include 3 | 
#include 4 | #else 5 | #include 6 | #endif 7 | 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | #include "nstack_ip.h" 15 | #include "nstack_socket.h" 16 | 17 | #include "collection.h" 18 | #include "logger.h" 19 | #include "nstack_internal.h" 20 | #include "tcp.h" 21 | #include "tree.h" 22 | 23 | #define TCP_MSS 1460 /*!< TCP maximum segment size. */ 24 | 25 | #define TCP_TIMER_MS 250 26 | #define TCP_FIN_WAIT_TIMEOUT_MS 20000 27 | #define TCP_SYN_RCVD_TIMEOUT_MS 20000 28 | 29 | /* 30 | * TCP Connection Flags. 31 | */ 32 | #define TCP_FLAG_ACK_DELAY 0x01 33 | #define TCP_FLAG_ACK_NOW 0x02 34 | #define TCP_FLAG_RESET 0x04 35 | #define TCP_FLAG_CLOSED 0x08 36 | #define TCP_FLAG_GOT_FIN 0x10 37 | #define TCP_FLAG_NODELAY 0x20 /*!< Disable nagle algorithm. */ 38 | 39 | /** 40 | * Current time. Used if RTT is measured using timestamp method. 41 | */ 42 | unsigned tcp_now = 0; 43 | 44 | /** 45 | * TCP Segment. 46 | */ 47 | struct tcp_segment { 48 | TAILQ_ENTRY(tcp_segment) _link; 49 | size_t size; 50 | char *data; 51 | struct tcp_hdr header; 52 | }; 53 | 54 | TAILQ_HEAD(tcp_segment_list, tcp_segment); 55 | 56 | /** 57 | * TCP Connection Control Block. 58 | */ 59 | struct tcp_conn_tcb { 60 | struct nstack_sockaddr local; /*!< Local address and port. */ 61 | struct nstack_sockaddr remote; /*!< Remote address and port. */ 62 | 63 | enum tcp_state state; /*!< Connection state. */ 64 | unsigned flags; /*!< Connection status flags. */ 65 | size_t mss; /*!< Maximum Segment Size. */ 66 | unsigned keepalive; /*!< Keepalive time. */ 67 | unsigned keepalive_cnt; /*!< Keepalive counter. */ 68 | 69 | /* RTT Estimation. */ 70 | int rtt_est; /*!< RTT estimator. */ 71 | int rtt_var; /*!< mean deviation RTT estimator*/ 72 | int rtt; /*!< RTT sample*/ 73 | int rtt_cur_seq; /*!< Seq number being timed for RTT estimation. */ 74 | 75 | unsigned retran_timeout; /*!< Retransmission timeout. */ 76 | unsigned retran_count; /*!< Number of retransmissions. */ 77 | 78 | /* Fast Retransmit. */ 79 | uint32_t fastre_last_ack; 80 | unsigned fastre_dup_acks; 81 | 82 | /* Receiver. */ 83 | uint32_t recv_next; /*!< Next seqno expected. */ 84 | uint32_t recv_wnd; /*!< Receiver window. */ 85 | 86 | /* Sender. */ 87 | uint32_t send_next; /*!< Next seqno to be used. */ 88 | uint32_t send_una; /*!< Oldest unacknowledged seqno.*/ 89 | uint32_t send_max; /*!< Maximum send seqno. An acceptible ACK is the one 90 | which the following inequality holds: snd_una < 91 | acknowledgment field <= snd_max */ 92 | uint32_t send_wnd; 93 | uint32_t acked; 94 | 95 | RB_ENTRY(tcp_conn_tcb) _rb_entry; 96 | 97 | /* Segment Lists. */ 98 | struct tcp_segment_list unsent_list; /*!< Unsent segments. */ 99 | struct tcp_segment_list unacked_list; /*!< Unacked segments. */ 100 | struct tcp_segment_list oos_segments_list; /*!< Out of seq segments. 
*/ 101 | 102 | int timer[TCP_T_NTIMERS]; 103 | pthread_mutex_t mutex; 104 | }; 105 | 106 | struct tcp_conn_attr { 107 | struct nstack_sockaddr local; 108 | struct nstack_sockaddr remote; 109 | }; 110 | 111 | RB_HEAD(tcp_conn_map, tcp_conn_tcb); 112 | 113 | static struct tcp_conn_map tcp_conn_map = RB_INITIALIZER(); 114 | 115 | static int tcp_conn_cmp(struct tcp_conn_tcb *a, struct tcp_conn_tcb *b) 116 | { 117 | const int local = 118 | memcmp(&a->local, &b->local, sizeof(struct nstack_sockaddr)); 119 | const int remote = 120 | memcmp(&a->remote, &b->remote, sizeof(struct nstack_sockaddr)); 121 | return local + remote; 122 | } 123 | 124 | RB_GENERATE_STATIC(tcp_conn_map, tcp_conn_tcb, _rb_entry, tcp_conn_cmp); 125 | 126 | static struct tcp_conn_tcb *tcp_find_connection(struct tcp_conn_attr *find) 127 | { 128 | struct tcp_conn_tcb *find_p = (struct tcp_conn_tcb *) find; 129 | 130 | return RB_FIND(tcp_conn_map, &tcp_conn_map, find_p); 131 | } 132 | 133 | static struct tcp_conn_tcb *tcp_new_connection(const struct tcp_conn_attr *attr) 134 | { 135 | struct tcp_conn_tcb *conn = calloc(1, sizeof(struct tcp_conn_tcb)); 136 | assert(conn); 137 | memcpy(&conn->local, &attr->local, sizeof(struct nstack_sockaddr)); 138 | memcpy(&conn->remote, &attr->remote, sizeof(struct nstack_sockaddr)); 139 | pthread_mutex_init(&conn->mutex, NULL); 140 | RB_INSERT(tcp_conn_map, &tcp_conn_map, conn); 141 | 142 | return conn; 143 | } 144 | 145 | RB_HEAD(tcp_sock_tree, nstack_sock); 146 | 147 | static struct tcp_sock_tree tcp_sock_tree_head = RB_INITIALIZER(); 148 | 149 | static int tcp_socket_cmp(struct nstack_sock *a, struct nstack_sock *b) 150 | { 151 | return memcmp(&a->info.sock_addr, &b->info.sock_addr, 152 | sizeof(struct nstack_sockaddr)); 153 | } 154 | 155 | RB_GENERATE_STATIC(tcp_sock_tree, nstack_sock, data.tcp._entry, tcp_socket_cmp); 156 | 157 | static struct nstack_sock *find_tcp_socket(const struct nstack_sockaddr *addr) 158 | { 159 | struct nstack_sock_info find = { 160 | .sock_addr = *addr, 161 | }; 162 | 163 | return RB_FIND(tcp_sock_tree, &tcp_sock_tree_head, 164 | (struct nstack_sock *) (&find)); 165 | } 166 | 167 | static uint16_t tcp_checksum(const struct nstack_sockaddr *restrict src, 168 | const struct nstack_sockaddr *restrict dst, 169 | struct tcp_hdr *restrict dp, 170 | size_t bsize) 171 | { 172 | const uint8_t *restrict data = (uint8_t *) dp; 173 | uint32_t acc = 0xffff; 174 | size_t i; 175 | uint16_t word; 176 | struct { 177 | uint32_t src_addr; 178 | uint32_t dst_addr; 179 | uint8_t zeros; 180 | uint8_t proto; 181 | uint16_t tcp_len; 182 | } __attribute__((packed, aligned(4))) pseudo_header = { 183 | .src_addr = htonl(src->inet4_addr), 184 | .dst_addr = htonl(dst->inet4_addr), 185 | .zeros = 0, 186 | .proto = IP_PROTO_TCP, 187 | .tcp_len = htons(bsize), 188 | }; 189 | 190 | for (i = 0; i + 1 < 12; i += 2) { 191 | memcpy(&word, (uint8_t *) (&pseudo_header) + i, 2); 192 | acc += word; 193 | if (acc > 0xffff) 194 | acc -= ntohs(0xffff); 195 | } 196 | 197 | for (i = 0; i + 1 < bsize; i += 2) { 198 | memcpy(&word, data + i, 2); 199 | acc += word; 200 | if (acc > 0xffff) 201 | acc -= ntohs(0xffff); 202 | } 203 | 204 | if (bsize & 1) { 205 | word = 0; 206 | memcpy(&word, data + bsize - 1, 1); 207 | acc += word; 208 | if (acc > 0xffff) 209 | acc -= ntohs(0xffff); 210 | } 211 | 212 | return ~acc; 213 | } 214 | 215 | static void tcp_hton_opt(struct tcp_hdr *hdr, int len) 216 | { 217 | for (int i = 0; i < len;) { 218 | if (hdr->opt[i] == 0) { 219 | break; 220 | } 221 | if (hdr->opt[i] == 1) { 222 | i 
+= 1; 223 | continue; 224 | } 225 | struct tcp_option *opt = (struct tcp_option *) (&(hdr->opt[i])); 226 | switch (opt->option_kind) { 227 | case 2: /* maximum segment size (2bytes)*/ 228 | opt->mss = htons(opt->mss); 229 | i += opt->length; 230 | continue; 231 | case 3: /*window size (1 byte)*/ 232 | i += opt->length; 233 | continue; 234 | case 4: 235 | i += opt->length; 236 | continue; 237 | case 5: 238 | i += opt->length; 239 | continue; 240 | case 8: /*timestamp and echo of previous timestamp(8 bytes)*/ 241 | opt->tsval = htonl(opt->tsval); 242 | opt->tsecr = htonl(opt->tsecr); 243 | i += opt->length; 244 | continue; 245 | } 246 | } 247 | } 248 | 249 | static void tcp_ntoh_opt(struct tcp_hdr *hdr, int len) 250 | { 251 | for (int i = 0; i < len;) { 252 | if (hdr->opt[i] == 0) { 253 | break; 254 | } 255 | if (hdr->opt[i] == 1) { 256 | i += 1; 257 | continue; 258 | } 259 | struct tcp_option *opt = (struct tcp_option *) (&(hdr->opt[i])); 260 | switch (opt->option_kind) { 261 | case 2: /* maximum segment size (2bytes)*/ 262 | opt->mss = ntohs(opt->mss); 263 | i += opt->length; 264 | continue; 265 | case 3: /*window size (1 byte)*/ 266 | i += opt->length; 267 | continue; 268 | case 4: 269 | i += opt->length; 270 | continue; 271 | case 5: 272 | i += opt->length; 273 | continue; 274 | case 8: /*timestamp and echo of previous timestamp(8 bytes)*/ 275 | opt->tsval = ntohl(opt->tsval); 276 | opt->tsecr = ntohl(opt->tsecr); 277 | i += opt->length; 278 | continue; 279 | } 280 | } 281 | } 282 | 283 | static void tcp_hton(const struct nstack_sockaddr *restrict src, 284 | const struct nstack_sockaddr *restrict dst, 285 | const struct tcp_hdr *host, 286 | struct tcp_hdr *net, 287 | size_t bsize) 288 | { 289 | int opt_len = tcp_opt_size(host); 290 | tcp_hton_opt(host, opt_len); 291 | net->tcp_sport = htons(host->tcp_sport); 292 | net->tcp_dport = htons(host->tcp_dport); 293 | net->tcp_seqno = htonl(host->tcp_seqno); 294 | net->tcp_ack_num = htonl(host->tcp_ack_num); 295 | net->tcp_flags = htons(host->tcp_flags); 296 | net->tcp_win_size = htons(host->tcp_win_size); 297 | net->tcp_urg_ptr = htons(host->tcp_urg_ptr); 298 | net->tcp_checksum = 0; 299 | net->tcp_checksum = tcp_checksum(src, dst, net, bsize); 300 | } 301 | 302 | static void tcp_ntoh(const struct tcp_hdr *net, struct tcp_hdr *host) 303 | { 304 | host->tcp_sport = ntohs(net->tcp_sport); 305 | host->tcp_dport = ntohs(net->tcp_dport); 306 | host->tcp_seqno = ntohl(net->tcp_seqno); 307 | host->tcp_ack_num = ntohl(net->tcp_ack_num); 308 | host->tcp_flags = ntohs(net->tcp_flags); 309 | host->tcp_win_size = ntohs(net->tcp_win_size); 310 | host->tcp_urg_ptr = ntohs(net->tcp_urg_ptr); 311 | int opt_len = tcp_opt_size(host); 312 | tcp_ntoh_opt(host, opt_len); 313 | } 314 | 315 | static void tcp_rto_update(struct tcp_conn_tcb *conn, int rtt); 316 | static void tcp_ack_segments(struct tcp_conn_tcb *conn, struct tcp_hdr *tcp); 317 | 318 | static int tcp_fsm(struct tcp_conn_tcb *conn, 319 | struct tcp_hdr *rs, 320 | struct ip_hdr *ip_hdr, 321 | size_t bsize) 322 | { 323 | if (conn->rtt && (rs->tcp_ack_num > conn->rtt_cur_seq)) { 324 | tcp_rto_update(conn, conn->rtt); 325 | } 326 | switch (conn->state) { 327 | case TCP_CLOSED: 328 | LOG(LOG_INFO, "TCP state: TCP_CLOSED"); 329 | return 0; 330 | case TCP_SYN_SENT: 331 | LOG(LOG_INFO, "TCP state: TCP_SYN_SENT"); 332 | if (rs->tcp_flags & (TCP_SYN | TCP_ACK)) { 333 | LOG(LOG_INFO, "SYN & ACK received"); 334 | rs->tcp_flags = TCP_ACK | 5 << 12; 335 | rs->tcp_ack_num = rs->tcp_seqno + 1; 336 | rs->tcp_seqno = 
conn->send_next; 337 | conn->recv_next = rs->tcp_ack_num; 338 | conn->recv_wnd = rs->tcp_win_size; 339 | LOG(LOG_INFO, "%d", ((uint32_t *) &rs)[3]); 340 | conn->timer[TCP_T_KEEP] = 0; 341 | conn->state = TCP_ESTABLISHED; 342 | conn->timer[TCP_T_REXMT] = 343 | 1; /* TODO: Instead of utilizing retransmission, use another way 344 | to send any unsent segments after receiving SYN & ACK.*/ 345 | return tcp_hdr_size(rs); 346 | } 347 | if (rs->tcp_flags & (TCP_SYN)) { 348 | /*Client and server open connection simultaneously*/ 349 | LOG(LOG_INFO, "SYN received, connection opened simultaneously "); 350 | rs->tcp_flags = (TCP_SYN | TCP_ACK) | 5 << 12; 351 | rs->tcp_ack_num = rs->tcp_seqno + 1; 352 | rs->tcp_seqno = conn->send_next; 353 | conn->recv_next = rs->tcp_ack_num; 354 | conn->recv_wnd = rs->tcp_win_size; 355 | LOG(LOG_INFO, "%d", ((uint32_t *) &rs)[3]); 356 | conn->state = TCP_SYN_RCVD; 357 | return tcp_hdr_size(rs); 358 | } 359 | case TCP_LISTEN: 360 | LOG(LOG_INFO, "TCP state: TCP_LISTEN"); 361 | 362 | if (rs->tcp_flags & TCP_SYN) { 363 | LOG(LOG_INFO, "SYN received"); 364 | 365 | struct nstack_sockaddr sockaddr = { 366 | .inet4_addr = ip_hdr->ip_dst, 367 | .port = rs->tcp_dport, 368 | }; 369 | struct nstack_sock *sock = find_tcp_socket(&sockaddr); 370 | if (!sock) { 371 | LOG(LOG_INFO, "Port %d unreachable", sockaddr.port); 372 | rs->tcp_flags &= ~TCP_SYN; 373 | rs->tcp_flags |= TCP_RST; 374 | } 375 | rs->tcp_flags |= TCP_ACK; 376 | rs->tcp_ack_num = rs->tcp_seqno + 1; 377 | 378 | #if defined(__linux__) 379 | int fd = open("/dev/urandom", O_RDONLY); 380 | read(fd, &(rs->tcp_seqno), sizeof(rs->tcp_seqno)); 381 | close(fd); 382 | #else 383 | srand(time(NULL)); 384 | rs->tcp_seqno = rand() % 100; 385 | #endif 386 | 387 | if (sock) { 388 | conn->state = TCP_SYN_RCVD; 389 | } else { 390 | conn->state = TCP_CLOSED; 391 | } 392 | conn->recv_next = rs->tcp_ack_num; 393 | conn->send_next = rs->tcp_seqno + 1; 394 | LOG(LOG_INFO, "%d", ((uint32_t *) &rs)[3]); 395 | return tcp_hdr_size(rs); 396 | } 397 | return 0; 398 | case TCP_SYN_RCVD: 399 | LOG(LOG_INFO, "TCP state: TCP_SYN_RCVD"); 400 | if ((rs->tcp_flags & TCP_RST) && rs->tcp_seqno == conn->recv_next && 401 | rs->tcp_ack_num == conn->send_next) { 402 | conn->state = TCP_LISTEN; 403 | return 0; 404 | } 405 | if ((rs->tcp_flags & TCP_ACK) && rs->tcp_seqno == conn->recv_next && 406 | rs->tcp_ack_num == conn->send_next) { 407 | conn->timer[TCP_T_KEEP] = 0; 408 | conn->state = TCP_ESTABLISHED; 409 | return 0; 410 | } 411 | rs->tcp_flags &= ~TCP_ACK; 412 | rs->tcp_ack_num = rs->tcp_seqno; 413 | rs->tcp_seqno = conn->send_next; 414 | 415 | conn->recv_next = rs->tcp_ack_num; 416 | conn->send_next = rs->tcp_seqno + 1; 417 | return tcp_hdr_size(rs); 418 | case TCP_ESTABLISHED: 419 | LOG(LOG_INFO, "TCP state: TCP_ESTABLISHED"); 420 | tcp_ack_segments(conn, rs); 421 | if ((rs->tcp_flags & TCP_ACK) && (rs->tcp_flags & TCP_PSH) && 422 | rs->tcp_seqno == conn->recv_next && 423 | rs->tcp_ack_num == conn->send_next) { 424 | /* data handling */ 425 | rs->tcp_flags &= ~TCP_PSH; 426 | rs->tcp_ack_num = rs->tcp_seqno + (bsize - tcp_hdr_size(rs)); 427 | rs->tcp_seqno = conn->send_next; 428 | 429 | conn->recv_next = rs->tcp_ack_num; 430 | conn->send_next = rs->tcp_seqno; 431 | 432 | /* forward the payload to application layer */ 433 | struct nstack_sockaddr sockaddr = { 434 | .inet4_addr = ip_hdr->ip_dst, 435 | .port = rs->tcp_dport, 436 | }; 437 | struct nstack_sock *sock = find_tcp_socket(&sockaddr); 438 | struct nstack_sockaddr srcaddr = { 439 | .inet4_addr = 
ip_hdr->ip_src, 440 | .port = rs->tcp_sport, 441 | }; 442 | size_t header_size = tcp_hdr_size(rs); 443 | nstack_sock_dgram_input(sock, &srcaddr, 444 | ((uint8_t *) rs) + header_size, 445 | bsize - header_size); 446 | 447 | return tcp_hdr_size(rs); 448 | } 449 | if (rs->tcp_flags & TCP_FIN) { /* Close connection. */ 450 | rs->tcp_flags |= TCP_ACK; 451 | rs->tcp_ack_num = rs->tcp_seqno + 1; 452 | rs->tcp_seqno = conn->send_next; 453 | 454 | conn->state = TCP_LAST_ACK; /* Skip TCP_CLOSE_WAIT state */ 455 | conn->recv_next = rs->tcp_ack_num; 456 | conn->send_next = rs->tcp_seqno + 1; 457 | return tcp_hdr_size(rs); 458 | } 459 | return 0; 460 | case TCP_FIN_WAIT_1: 461 | case TCP_FIN_WAIT_2: 462 | case TCP_CLOSE_WAIT: 463 | case TCP_CLOSING: 464 | LOG(LOG_INFO, "TCP state: TCP_CLOSING"); 465 | 466 | return 0; 467 | case TCP_LAST_ACK: 468 | LOG(LOG_INFO, "TCP state: TCP_LAST_ACK"); 469 | if (rs->tcp_flags & TCP_ACK) { 470 | RB_REMOVE(tcp_conn_map, &tcp_conn_map, conn); 471 | conn->state = TCP_CLOSED; 472 | free(conn); 473 | } 474 | return 0; 475 | /* TODO handle error? */ 476 | case TCP_TIME_WAIT: 477 | LOG(LOG_INFO, "TCP state: TCP_TIME_WAIT"); 478 | default: 479 | LOG(LOG_INFO, "TCP state: INVALID (%d)", conn->state); 480 | 481 | return -EINVAL; 482 | } 483 | } 484 | 485 | /** 486 | * TCP input chain. 487 | * IP -> TCP 488 | */ 489 | static int tcp_input(const struct ip_hdr *ip_hdr, 490 | uint8_t *payload, 491 | size_t bsize) 492 | { 493 | struct tcp_conn_attr attr; 494 | struct tcp_hdr *tcp = (struct tcp_hdr *) payload; 495 | 496 | if (bsize < sizeof(struct tcp_hdr)) { 497 | LOG(LOG_INFO, "Datagram size too small"); 498 | 499 | return -EBADMSG; 500 | } 501 | 502 | memset(&attr, 0, sizeof(attr)); 503 | attr.local.inet4_addr = ip_hdr->ip_dst; 504 | attr.local.port = ntohs(tcp->tcp_dport); 505 | attr.remote.inet4_addr = ip_hdr->ip_src; 506 | attr.remote.port = ntohs(tcp->tcp_sport); 507 | 508 | /* TODO Can't verify on LXC env */ 509 | #if 0 510 | if (tcp_checksum(&attr.remote, &attr.local, tcp, bsize) != 0) { 511 | LOG(LOG_INFO, "TCP checksum fail"); 512 | /* TODO Fail properly */ 513 | return -EBADMSG; 514 | } 515 | #endif 516 | 517 | tcp_ntoh(tcp, tcp); 518 | 519 | struct tcp_conn_tcb *conn = tcp_find_connection(&attr); 520 | if ((conn && 521 | ((tcp->tcp_flags & TCP_SYN) && (conn->state >= TCP_ESTABLISHED))) || 522 | tcp_hdr_size(tcp) < 0) { 523 | /*Invalid flag, or invalid header size. */ 524 | return -EINVAL; /* TODO any other error handling needed here? 
*/ 525 | } 526 | if (!conn && (tcp->tcp_flags & TCP_SYN)) { /* New connection */ 527 | char rem_str[IP_STR_LEN]; 528 | char loc_str[IP_STR_LEN]; 529 | 530 | ip2str(attr.remote.inet4_addr, rem_str); 531 | ip2str(attr.local.inet4_addr, loc_str); 532 | LOG(LOG_INFO, "New connection %s:%i -> %s:%i", rem_str, 533 | attr.remote.port, loc_str, attr.local.port); 534 | 535 | conn = tcp_new_connection(&attr); 536 | conn->state = TCP_LISTEN; 537 | } 538 | 539 | int retval = tcp_fsm(conn, tcp, ip_hdr, bsize); 540 | if (retval > 0) { /* Fast reply */ 541 | tcp->tcp_sport = attr.local.port; 542 | tcp->tcp_dport = attr.remote.port; 543 | tcp_hton(&attr.local, &attr.remote, tcp, tcp, retval); 544 | } 545 | 546 | return retval; 547 | } 548 | IP_PROTO_INPUT_HANDLER(IP_PROTO_TCP, tcp_input); 549 | 550 | int nstack_tcp_bind(struct nstack_sock *sock) 551 | { 552 | if (sock->info.sock_addr.port > NSTACK_SOCK_PORT_MAX) { 553 | errno = EINVAL; 554 | return -1; 555 | } 556 | 557 | if (find_tcp_socket(&sock->info.sock_addr)) { 558 | errno = EADDRINUSE; 559 | return -1; 560 | } 561 | 562 | RB_INSERT(tcp_sock_tree, &tcp_sock_tree_head, sock); 563 | 564 | return 0; 565 | } 566 | 567 | static int tcp_connection_init(struct tcp_conn_tcb *conn) 568 | { 569 | conn->state = TCP_SYN_SENT; 570 | conn->mss = TCP_MSS; 571 | #if defined(__linux__) 572 | int fd = open("/dev/urandom", O_RDONLY); 573 | read(fd, &(conn->send_next), sizeof(conn->send_next)); 574 | close(fd); 575 | #else 576 | srand(time(NULL)); 577 | conn->send_next = rand() % 100; 578 | #endif 579 | conn->rtt_est = TCP_TV_SRTTBASE; 580 | conn->rtt_var = (TCP_RTTDFT * TCP_TIMER_PR_SLOWHZ) << 2; 581 | conn->retran_timeout = 582 | ((TCP_TV_SRTTBASE >> 2) + (TCP_TV_SRTTDFLT << 2)) >> 1; 583 | conn->send_wnd = 502; 584 | conn->send_una = conn->send_next; 585 | conn->send_max = conn->send_next; 586 | TAILQ_INIT(&conn->unsent_list); 587 | TAILQ_INIT(&conn->unacked_list); 588 | TAILQ_INIT(&conn->oos_segments_list); 589 | return 0; 590 | } 591 | 592 | static int tcp_send_syn(struct tcp_conn_tcb *conn) 593 | { 594 | struct tcp_option opt; 595 | opt = (struct tcp_option){.option_kind = 2, .length = 4, .mss = TCP_MSS}; 596 | 597 | uint8_t buf[sizeof(struct tcp_hdr) + opt.length]; 598 | struct tcp_hdr *tcp = (struct tcp_hdr *) buf; 599 | memcpy((struct tcp_option *) tcp->opt, &opt, opt.length); 600 | tcp->tcp_seqno = conn->send_next; 601 | conn->send_next++; 602 | tcp->tcp_flags = TCP_SYN | 6 << 12; 603 | tcp->tcp_win_size = 502; 604 | tcp->tcp_sport = conn->local.port; 605 | tcp->tcp_dport = conn->remote.port; 606 | tcp_hton(&(conn->local), &(conn->remote), tcp, tcp, tcp_hdr_size(tcp)); 607 | conn->state = TCP_SYN_SENT; 608 | conn->timer[TCP_T_KEEP] = TCP_TV_KEEP_INIT; 609 | int retval = ip_send(conn->remote.inet4_addr, IP_PROTO_TCP, buf, 610 | sizeof(struct tcp_hdr) + opt.length); 611 | return retval; 612 | } 613 | 614 | static int tcp_send_segments(struct tcp_conn_tcb *conn) 615 | { 616 | struct tcp_segment *seg, *seg_tmp; 617 | int retval; 618 | pthread_mutex_lock(&conn->mutex); 619 | TAILQ_FOREACH_SAFE(seg, &conn->unsent_list, _link, seg_tmp) 620 | { 621 | TAILQ_REMOVE(&conn->unsent_list, seg, _link); 622 | size_t hdr_size = tcp_hdr_size(&seg->header); 623 | uint8_t payload[hdr_size + seg->size]; 624 | struct tcp_hdr *tcp = (struct tcp_hdr *) (payload); 625 | 626 | memcpy(payload, &seg->header, hdr_size); 627 | ((struct tcp_hdr *) payload)->tcp_seqno = conn->send_next; 628 | ((struct tcp_hdr *) payload)->tcp_ack_num = conn->recv_next; 629 | memcpy(payload + hdr_size, 
seg->data, seg->size); 630 | 631 | tcp_hton(&(conn->local), &(conn->remote), tcp, tcp, 632 | hdr_size + seg->size); 633 | conn->send_next += seg->size; 634 | conn->send_max = conn->send_next; 635 | retval = ip_send(conn->remote.inet4_addr, IP_PROTO_TCP, payload, 636 | (hdr_size + seg->size)); 637 | if (retval < 0) { 638 | return -1; 639 | } 640 | TAILQ_INSERT_TAIL(&conn->unacked_list, seg, _link); 641 | } 642 | pthread_mutex_unlock(&conn->mutex); 643 | return 0; 644 | } 645 | 646 | static void tcp_ack_segments(struct tcp_conn_tcb *conn, struct tcp_hdr *tcp) 647 | { 648 | struct tcp_segment *seg, *seg_tmp; 649 | if (tcp->tcp_ack_num > conn->send_una) { 650 | conn->send_una = tcp->tcp_ack_num; 651 | pthread_mutex_lock(&conn->mutex); 652 | TAILQ_FOREACH_SAFE(seg, &conn->unacked_list, _link, seg_tmp) 653 | { 654 | if (seg->header.tcp_seqno < conn->send_una) { 655 | TAILQ_REMOVE(&conn->unacked_list, seg, _link); 656 | free(seg); 657 | } 658 | } 659 | pthread_mutex_unlock(&conn->mutex); 660 | if (conn->send_una == conn->send_max) { 661 | conn->timer[TCP_T_REXMT] = 0; 662 | } else { 663 | conn->send_next = conn->send_una; 664 | conn->timer[TCP_T_REXMT] = conn->rtt_est; 665 | } 666 | return; 667 | } else { 668 | return; 669 | } 670 | } 671 | 672 | int nstack_tcp_send(struct nstack_sock *sock, const struct nstack_dgram *dgram) 673 | { 674 | struct tcp_conn_attr attr; 675 | struct tcp_hdr tcp; 676 | struct tcp_segment *seg; 677 | int retval; 678 | memset(&attr, 0, sizeof(attr)); 679 | attr.local.inet4_addr = sock->info.sock_addr.inet4_addr; 680 | attr.local.port = sock->info.sock_addr.port; 681 | attr.remote.inet4_addr = dgram->dstaddr.inet4_addr; 682 | attr.remote.port = dgram->dstaddr.port; 683 | struct tcp_conn_tcb *conn = tcp_find_connection(&attr); 684 | if (!conn) { 685 | /*Client, send syn*/ 686 | char rem_str[IP_STR_LEN]; 687 | char loc_str[IP_STR_LEN]; 688 | ip2str(attr.remote.inet4_addr, rem_str); 689 | ip2str(attr.local.inet4_addr, loc_str); 690 | LOG(LOG_INFO, "Client request new connection %s:%i -> %s:%i", rem_str, 691 | attr.remote.port, loc_str, attr.local.port); 692 | conn = tcp_new_connection(&attr); 693 | tcp_connection_init(conn); 694 | tcp = (struct tcp_hdr){ 695 | .tcp_flags = TCP_PSH | TCP_ACK | (5 << TCP_DOFF_OFF), 696 | .tcp_win_size = 502, 697 | .tcp_sport = sock->info.sock_addr.port, 698 | .tcp_dport = dgram->dstaddr.port 699 | 700 | }; 701 | seg = calloc(1, sizeof(struct tcp_segment) + tcp_opt_size(&tcp) + 702 | dgram->buf_size); 703 | seg->data = (seg->header.opt) + tcp_opt_size(&tcp); 704 | seg->size = dgram->buf_size; 705 | memcpy(seg->data, dgram->buf, dgram->buf_size); 706 | memcpy(&seg->header, &tcp, tcp_hdr_size(&tcp)); 707 | pthread_mutex_lock(&conn->mutex); 708 | TAILQ_INSERT_TAIL(&conn->unsent_list, seg, _link); 709 | pthread_mutex_unlock(&conn->mutex); 710 | int retval = tcp_send_syn(conn); 711 | return retval; 712 | } else { 713 | switch (conn->state) { 714 | case TCP_ESTABLISHED: 715 | tcp = (struct tcp_hdr){ 716 | .tcp_flags = TCP_PSH | TCP_ACK | (5 << TCP_DOFF_OFF), 717 | .tcp_win_size = 502, 718 | .tcp_sport = conn->local.port, 719 | .tcp_dport = conn->remote.port 720 | 721 | }; 722 | if (conn->rtt == 0) { 723 | conn->rtt = 1; 724 | conn->rtt_cur_seq = conn->send_next; 725 | } 726 | struct tcp_segment *seg = 727 | calloc(1, sizeof(struct tcp_segment) + tcp_opt_size(&tcp) + 728 | dgram->buf_size); 729 | seg->data = (seg->header.opt) + tcp_opt_size(&tcp); 730 | seg->size = dgram->buf_size; 731 | memcpy(seg->data, dgram->buf, dgram->buf_size); 732 | 
memcpy(&seg->header, &tcp, tcp_hdr_size(&tcp)); 733 | pthread_mutex_lock(&conn->mutex); 734 | TAILQ_INSERT_TAIL(&conn->unsent_list, seg, _link); 735 | pthread_mutex_unlock(&conn->mutex); 736 | retval = tcp_send_segments(conn); 737 | return retval; 738 | default: 739 | LOG(LOG_INFO, "TCP state: INVALID (%d)", conn->state); 740 | 741 | return -EINVAL; 742 | } 743 | } 744 | } 745 | 746 | static void tcp_rto_update(struct tcp_conn_tcb *conn, int rtt) 747 | { 748 | int delta; 749 | if (conn->rtt_est != 0) { 750 | delta = (rtt - 1) - (conn->rtt_est >> TCP_RTT_SHIFT); 751 | if ((conn->rtt_est += delta) <= 0) { 752 | conn->rtt_est = 1; 753 | } 754 | delta = abs(delta); 755 | delta -= ((conn->rtt_var) >> (TCP_RTTVAR_SHIFT)); 756 | if ((conn->rtt_var += delta) <= 0) { 757 | conn->rtt_var = 1; 758 | } 759 | } else { 760 | conn->rtt_est = rtt << TCP_RTT_SHIFT; 761 | conn->rtt_var = rtt << (TCP_RTTVAR_SHIFT - 1); 762 | } 763 | conn->retran_timeout = TCP_REXMTVAL(conn); 764 | LOG(LOG_INFO, "Update RTO: value = %d", conn->retran_timeout); 765 | conn->rtt = 0; /*Reset to 0 for timing and transmission of next segment. */ 766 | } 767 | 768 | static void tcp_rexmt_prepare(struct tcp_conn_tcb *conn) 769 | { 770 | struct tcp_segment *seg, *seg_tmp; 771 | pthread_mutex_lock(&conn->mutex); 772 | TAILQ_FOREACH_SAFE(seg, &conn->unsent_list, _link, seg_tmp) 773 | { 774 | TAILQ_REMOVE(&conn->unsent_list, seg, _link); 775 | TAILQ_INSERT_TAIL(&conn->unacked_list, seg, _link); 776 | } 777 | TAILQ_SWAP(&conn->unsent_list, &conn->unacked_list, tcp_segment, _link); 778 | pthread_mutex_unlock(&conn->mutex); 779 | } 780 | 781 | static void tcp_rexmt_commit(struct tcp_conn_tcb *conn) 782 | { 783 | conn->retran_count++; 784 | tcp_send_segments(conn); 785 | } 786 | 787 | 788 | static void tcp_timer(struct tcp_conn_tcb *conn, int counter_index) 789 | { 790 | switch (counter_index) { 791 | case TCP_T_REXMT: 792 | conn->timer[counter_index] = conn->retran_timeout; 793 | /* Karn's Algorithm: the only segments that are timed by conn->rtt are 794 | * those that are not retransmitted. 795 | * TODO: Use timestamps to estimate 796 | * RTT instead of Karn's Algorithm */ 797 | conn->rtt = 0; 798 | tcp_rexmt_prepare(conn); 799 | tcp_rexmt_commit(conn); 800 | return; 801 | case TCP_T_PERSIST: 802 | case TCP_T_KEEP: 803 | if (conn->state < TCP_ESTABLISHED) { 804 | RB_REMOVE(tcp_conn_map, &tcp_conn_map, conn); 805 | free(conn); 806 | return; 807 | } 808 | 809 | case TCP_T_2MSL: 810 | RB_REMOVE(tcp_conn_map, &tcp_conn_map, conn); 811 | free(conn); 812 | return; 813 | } 814 | } 815 | void tcp_slowtimo() 816 | { 817 | struct tcp_conn_tcb *conn; 818 | RB_FOREACH (conn, tcp_conn_map, &tcp_conn_map) { 819 | for (int i = 0; i < TCP_T_NTIMERS; i++) { 820 | if (conn->timer[i] && (--conn->timer[i] == 0)) { 821 | tcp_timer(conn, i); 822 | } 823 | } 824 | if (conn->rtt) { 825 | conn->rtt++; 826 | } 827 | } 828 | tcp_now++; 829 | } -------------------------------------------------------------------------------- /src/tcp.h: -------------------------------------------------------------------------------- 1 | /** 2 | * nstack TCP service. 3 | * @addtogroup TCP 4 | * @{ 5 | */ 6 | 7 | #pragma once 8 | 9 | #include 10 | 11 | #include "linker_set.h" 12 | #include "nstack_in.h" 13 | 14 | /** 15 | * Type for an TCP port number. 16 | */ 17 | typedef uint16_t tcp_port_t; 18 | 19 | /** 20 | * TCP packet header. 21 | */ 22 | struct tcp_hdr { 23 | struct { 24 | uint16_t tcp_sport; /*!< Source port. */ 25 | uint16_t tcp_dport; /*!< Destination port. 
*/ 26 | uint32_t tcp_seqno; /*!< Sequence number.*/ 27 | uint32_t tcp_ack_num; /*!< Acknowledgment number (if ACK). */ 28 | uint16_t tcp_flags; 29 | uint16_t tcp_win_size; /*!< Window size. */ 30 | uint16_t tcp_checksum; /*!< TCP Checksum. */ 31 | uint16_t tcp_urg_ptr; /*!< Urgent pointer (if URG). */ 32 | }; 33 | uint8_t opt[0]; /*!< Options. */ 34 | } __attribute__((packed, aligned(4))); 35 | 36 | /** 37 | * TCP packet header option. 38 | */ 39 | struct tcp_option { 40 | uint8_t option_kind; 41 | uint8_t length; 42 | union { 43 | uint16_t mss; 44 | uint8_t window_scale; 45 | struct { 46 | uint32_t tsval; 47 | uint32_t tsecr; 48 | }; 49 | }; 50 | } __attribute__((packed, aligned(4))); 51 | /*TODO: support Selective ACKnowledgement (SACK)*/ 52 | 53 | 54 | #define TCP_DOFF_MASK 0xF000 /*tcp_flags & TCP_DOFF_MASK) >> TCP_DOFF_OFF; 69 | if (doff < 5 || doff > 15) { 70 | return -EINVAL; 71 | } 72 | 73 | return doff * 4; 74 | } 75 | 76 | inline static int tcp_opt_size(struct tcp_hdr *hdr) 77 | { 78 | const size_t doff = (hdr->tcp_flags & TCP_DOFF_MASK) >> TCP_DOFF_OFF; 79 | if (doff < 5 || doff > 15) { 80 | return -EINVAL; 81 | } 82 | 83 | return doff * 4 - sizeof(struct tcp_hdr); 84 | } 85 | 86 | /** 87 | * TCP Connection State. 88 | * Passive Open = responder/server 89 | * Active Open = initiator/client 90 | */ 91 | enum tcp_state { 92 | TCP_CLOSED = 0, 93 | TCP_LISTEN, /* Passive Open */ 94 | TCP_SYN_RCVD, /* Passive Open */ 95 | TCP_SYN_SENT, /* Active Open */ 96 | TCP_ESTABLISHED, 97 | TCP_FIN_WAIT_1, /* Initiator */ 98 | TCP_FIN_WAIT_2, /* Initiator */ 99 | TCP_CLOSE_WAIT, /* Responder */ 100 | TCP_CLOSING, /* Simultaneous Close */ 101 | TCP_LAST_ACK, /* Responder */ 102 | TCP_TIME_WAIT, /* Initiator/Simultaneous */ 103 | }; 104 | 105 | /** 106 | * TCP timer counter index 107 | */ 108 | 109 | #define TCP_T_REXMT 0 /*rtt_est) >> TCP_RTT_SHIFT) + (conn)->rtt_var) 154 | 155 | struct nstack_sockaddr; 156 | 157 | /** 158 | * Allocate a UDP socket descriptor. 
159 | */ 160 | struct nstack_sock *nstack_udp_alloc_sock(void); 161 | 162 | int nstack_tcp_bind(struct nstack_sock *sock); 163 | int nstack_tcp_send(struct nstack_sock *sock, const struct nstack_dgram *dgram); 164 | 165 | /** 166 | * @} 167 | */ 168 | -------------------------------------------------------------------------------- /src/udp.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | #include "nstack_in.h" 8 | #include "nstack_socket.h" 9 | 10 | #include "ip_defer.h" 11 | #include "logger.h" 12 | #include "nstack_arp.h" 13 | #include "nstack_icmp.h" 14 | #include "nstack_internal.h" 15 | #include "nstack_ip.h" 16 | #include "udp.h" 17 | 18 | RB_HEAD(udp_sock_tree, nstack_sock); 19 | 20 | static struct udp_sock_tree udp_sock_tree_head = RB_INITIALIZER(); 21 | 22 | static int udp_socket_cmp(struct nstack_sock *a, struct nstack_sock *b) 23 | { 24 | return memcmp(&a->info.sock_addr, &b->info.sock_addr, 25 | sizeof(struct nstack_sockaddr)); 26 | } 27 | 28 | RB_GENERATE_STATIC(udp_sock_tree, nstack_sock, data.udp._entry, udp_socket_cmp); 29 | 30 | static struct nstack_sock *find_udp_socket(const struct nstack_sockaddr *addr) 31 | { 32 | struct nstack_sock_info find = { 33 | .sock_addr = *addr, 34 | }; 35 | 36 | return RB_FIND(udp_sock_tree, &udp_sock_tree_head, 37 | (struct nstack_sock *) (&find)); 38 | } 39 | 40 | static void udp_hton(const struct udp_hdr *host, struct udp_hdr *net) 41 | { 42 | net->udp_sport = htons(host->udp_sport); 43 | net->udp_dport = htons(host->udp_dport); 44 | net->udp_len = htons(host->udp_len); 45 | } 46 | 47 | static void udp_ntoh(const struct udp_hdr *net, struct udp_hdr *host) 48 | { 49 | host->udp_sport = ntohs(net->udp_sport); 50 | host->udp_dport = ntohs(net->udp_dport); 51 | host->udp_len = ntohs(net->udp_len); 52 | } 53 | 54 | int nstack_udp_bind(struct nstack_sock *sock) 55 | { 56 | if (sock->info.sock_addr.port > NSTACK_SOCK_PORT_MAX) { 57 | errno = EINVAL; 58 | return -1; 59 | } 60 | 61 | if (find_udp_socket(&sock->info.sock_addr)) { 62 | errno = EADDRINUSE; 63 | return -1; 64 | } 65 | 66 | RB_INSERT(udp_sock_tree, &udp_sock_tree_head, sock); 67 | 68 | return 0; 69 | } 70 | 71 | /** 72 | * UDP input chain. 73 | * IP -> UDP 74 | */ 75 | static int udp_input(const struct ip_hdr *ip_hdr, 76 | uint8_t *payload, 77 | size_t bsize) 78 | { 79 | struct udp_hdr *udp = (struct udp_hdr *) payload; 80 | struct nstack_sock *sock; 81 | struct nstack_sockaddr sockaddr; 82 | 83 | if (bsize < sizeof(struct udp_hdr)) { 84 | LOG(LOG_INFO, "Datagram size too small"); 85 | 86 | return -EBADMSG; 87 | } 88 | 89 | udp_ntoh(udp, udp); 90 | 91 | sockaddr.inet4_addr = ip_hdr->ip_dst; 92 | sockaddr.port = udp->udp_dport; 93 | sock = find_udp_socket(&sockaddr); 94 | if (sock) { 95 | int retval; 96 | struct nstack_sockaddr srcaddr = { 97 | .inet4_addr = ip_hdr->ip_src, 98 | .port = udp->udp_sport, 99 | }; 100 | 101 | retval = nstack_sock_dgram_input(sock, &srcaddr, 102 | payload + sizeof(struct udp_hdr), 103 | bsize - sizeof(struct udp_hdr)); 104 | if (retval > 0) { 105 | /* 106 | * RFE The following code is probably not needed as 107 | * nstack_sock_dgram_input() never returns anything above 0. 
108 | */ 109 | udp_port_t tmp; 110 | 111 | /* Swap ports */ 112 | tmp = udp->udp_sport; 113 | udp->udp_sport = udp->udp_dport; 114 | udp->udp_dport = tmp; 115 | 116 | udp->udp_len = sizeof(struct udp_hdr) + retval; 117 | 118 | if (IP_VERSION(ip_hdr) == 4) { 119 | udp->udp_csum = 0; /* Can be zeroed on IPv4 */ 120 | } else { 121 | /* TODO update csum */ 122 | udp->udp_csum = 0; 123 | } 124 | 125 | udp_hton(udp, udp); 126 | } 127 | return retval; 128 | } else { 129 | LOG(LOG_INFO, "Port %d unreachable", sockaddr.port); 130 | 131 | return -ENOTSOCK; 132 | } 133 | } 134 | IP_PROTO_INPUT_HANDLER(IP_PROTO_UDP, udp_input); 135 | 136 | static uint16_t udp_checksum(const void *buff, 137 | size_t len, 138 | in_addr_t src_addr, 139 | in_addr_t dest_addr) 140 | { 141 | const uint16_t *buf = buff; 142 | uint16_t *ip_src = (void *) &src_addr, *ip_dst = (void *) &dest_addr; 143 | uint32_t sum; 144 | size_t length = len; 145 | 146 | /* Calculate the sum */ 147 | sum = 0; 148 | while (len > 1) { 149 | sum += *buf++; 150 | if (sum & 0x80000000) 151 | sum = (sum & 0xFFFF) + (sum >> 16); 152 | len -= 2; 153 | } 154 | /* Add the padding if the packet length is odd */ 155 | if (len & 1) 156 | sum += *((uint8_t *) buf); 157 | 158 | /* Add the pseudo-header */ 159 | sum += *(ip_src++); 160 | sum += *ip_src; 161 | 162 | sum += *(ip_dst++); 163 | sum += *ip_dst; 164 | 165 | sum += htons(IPPROTO_UDP); 166 | sum += htons(length); 167 | 168 | /* Add the carries */ 169 | while (sum >> 16) 170 | sum = (sum & 0xFFFF) + (sum >> 16); 171 | 172 | /* Return the one's complement of sum */ 173 | return ((uint16_t) (~sum)); 174 | } 175 | 176 | int nstack_udp_send(struct nstack_sock *sock, const struct nstack_dgram *dgram) 177 | { 178 | uint8_t buf[sizeof(struct udp_hdr) + dgram->buf_size]; 179 | struct udp_hdr *udp = (struct udp_hdr *) buf; 180 | uint8_t *payload = udp->data; 181 | 182 | if (!(dgram->buf_size > 0 && dgram->buf_size < UDP_MAXLEN)) { 183 | return -EINVAL; 184 | } 185 | 186 | /* 187 | * UDP Header. 188 | */ 189 | udp->udp_sport = sock->info.sock_addr.port; 190 | udp->udp_dport = dgram->dstaddr.port; 191 | udp->udp_len = sizeof(struct udp_hdr) + dgram->buf_size; 192 | udp->udp_csum = 0; 193 | /* Calculate UDP csum */ 194 | int chk = udp_checksum(udp, udp->udp_len, dgram->srcaddr.inet4_addr, 195 | dgram->dstaddr.inet4_addr); 196 | 197 | if (chk) { 198 | LOG(LOG_INFO, "Checksum error: %d", chk); 199 | return -1; 200 | } 201 | 202 | memcpy(payload, dgram->buf, dgram->buf_size); 203 | 204 | udp_hton(udp, udp); 205 | return ip_send(dgram->dstaddr.inet4_addr, IP_PROTO_UDP, buf, sizeof(buf)); 206 | } 207 | -------------------------------------------------------------------------------- /src/udp.h: -------------------------------------------------------------------------------- 1 | /** 2 | * nstack UDP service. 3 | * @addtogroup UDP 4 | * @{ 5 | */ 6 | 7 | #pragma once 8 | 9 | #include 10 | 11 | #include "linker_set.h" 12 | #include "nstack_in.h" 13 | 14 | #define UDP_MAXLEN 65507 15 | 16 | /** 17 | * Type for a UDP port number. 18 | */ 19 | typedef uint16_t udp_port_t; 20 | 21 | /** 22 | * UDP packet header. 23 | */ 24 | struct udp_hdr { 25 | udp_port_t udp_sport; /*!< UDP Source port. */ 26 | udp_port_t udp_dport; /*!< UDP Destination port. */ 27 | uint16_t udp_len; /*!< UDP datagram length. */ 28 | uint16_t udp_csum; /*!< UDP Checksum. */ 29 | uint8_t data[0]; /*!< Datagram contents. 
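                              (Zero-length array: payload bytes start right after
                              the 8-byte fixed header, so sizeof(struct udp_hdr)
                              covers only the header itself.)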
*/ 30 | }; 31 | 32 | struct nstack_sock; 33 | struct nstack_dgram; 34 | 35 | int nstack_udp_bind(struct nstack_sock *sock); 36 | int nstack_udp_send(struct nstack_sock *sock, const struct nstack_dgram *dgram); 37 | 38 | /** 39 | * @} 40 | */ 41 | -------------------------------------------------------------------------------- /tests/tcptest.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "nstack_socket.h" 7 | 8 | /* How to test: 9 | * Open a new terminal, enter in following command: 10 | 11 | $ nc -lv 10.0.0.1 10000 12 | 13 | then start this program. Normally you will see message 14 | print out in the terminal mentioned above. 15 | */ 16 | static char buf[2048]; 17 | 18 | int main(void) 19 | { 20 | void *sock = nstack_listen("/tmp/tnetcat.sock"); 21 | if (!sock) { 22 | perror("Failed to open sock"); 23 | exit(1); 24 | } 25 | struct nstack_sockaddr addr; 26 | addr.inet4_addr = 167772161; 27 | addr.port = 10000; 28 | size_t r; 29 | while (1) { 30 | memset(buf, 0, sizeof(buf)); 31 | buf[0] = 'f'; 32 | buf[1] = 'o'; 33 | buf[2] = 'o'; 34 | r = nstack_sendto(sock, buf, 3, 0, &addr); 35 | // r = nstack_recvfrom(sock, buf, sizeof(buf) - 1, 0, &addr); 36 | // if (r > 0) 37 | // write(STDOUT_FILENO, buf, r); 38 | sleep(10); 39 | } 40 | } -------------------------------------------------------------------------------- /tests/tnetcat.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "nstack_socket.h" 7 | 8 | static char buf[2048]; 9 | 10 | int main(void) 11 | { 12 | void *sock = nstack_listen("/tmp/tnetcat.sock"); 13 | if (!sock) { 14 | perror("Failed to open sock"); 15 | exit(1); 16 | } 17 | 18 | while (1) { 19 | struct nstack_sockaddr addr; 20 | size_t r; 21 | 22 | memset(buf, 0, sizeof(buf)); 23 | r = nstack_recvfrom(sock, buf, sizeof(buf) - 1, 0, &addr); 24 | if (r > 0) 25 | write(STDOUT_FILENO, buf, r); 26 | } 27 | } 28 | -------------------------------------------------------------------------------- /tests/udp.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | 8 | #define SERVER "10.0.0.2" 9 | #define PORT 10 10 | 11 | static char buf[1400]; 12 | 13 | int main(void) 14 | { 15 | struct sockaddr_in si_other = { 16 | .sin_family = AF_INET, 17 | .sin_port = htons(PORT), 18 | }; 19 | 20 | if (inet_aton(SERVER, &si_other.sin_addr) == 0) { 21 | fprintf(stderr, "inet_aton() failed\n"); 22 | exit(1); 23 | } 24 | 25 | int s; 26 | if ((s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1) { 27 | perror("Failed to open socket"); 28 | exit(1); 29 | } 30 | 31 | while (1) { 32 | if (sendto(s, buf, sizeof(buf), 0, (struct sockaddr *) &si_other, 33 | sizeof(struct sockaddr_in)) == -1) { 34 | perror("sendto()"); 35 | exit(1); 36 | } 37 | } 38 | 39 | close(s); 40 | return 0; 41 | } 42 | -------------------------------------------------------------------------------- /tests/unetcat.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "nstack_socket.h" 7 | 8 | static char buf[2048]; 9 | 10 | int main(void) 11 | { 12 | void *sock = nstack_listen("/tmp/unetcat.sock"); 13 | if (!sock) { 14 | perror("Failed to open sock"); 15 | exit(1); 16 | } 17 | 18 | while (1) { 19 | struct nstack_sockaddr addr; 20 | 
size_t r; 21 | 22 | memset(buf, 0, sizeof(buf)); 23 | r = nstack_recvfrom(sock, buf, sizeof(buf) - 1, 0, &addr); 24 | if (r > 0) 25 | write(STDOUT_FILENO, buf, r); 26 | } 27 | } 28 | -------------------------------------------------------------------------------- /tools/assert.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | function ASSERT { 4 | $* 5 | RES=$? 6 | if [ $RES -ne 0 ]; then 7 | echo 'Assert failed: "' $* '"' 8 | exit $RES 9 | fi 10 | } 11 | -------------------------------------------------------------------------------- /tools/gdb/.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | -------------------------------------------------------------------------------- /tools/gdb/nstack.py: -------------------------------------------------------------------------------- 1 | import gdb 2 | import gdb.printing 3 | import string 4 | import socket, struct 5 | 6 | class arpEntryPrinter: 7 | "Print ARP table entry" 8 | 9 | def __init__(self, val): 10 | self.val = val 11 | 12 | def to_string(self): 13 | if self.val.address == 0: 14 | return 'NULL' 15 | 16 | ip_addr = socket.inet_ntoa(struct.pack('!L', int(self.val['ip_addr']))) 17 | haddr = self.val['haddr'] 18 | age = self.val['age'] 19 | if int(age) < 0: 20 | age = self.val['age'].cast(gdb.lookup_type('enum arp_cache_entry_type')) 21 | 22 | s = ip_addr + ' at ' + str(haddr) + ', age: ' + str(age) 23 | return s 24 | 25 | def build_pretty_printer(): 26 | pp = gdb.printing.RegexpCollectionPrettyPrinter("nstatck_arp") 27 | pp.add_printer('arp', '^arp_cache_entry$', arpEntryPrinter) 28 | 29 | return pp 30 | 31 | gdb.printing.register_pretty_printer( 32 | gdb.current_objfile(), 33 | build_pretty_printer()) 34 | -------------------------------------------------------------------------------- /tools/gdbinit: -------------------------------------------------------------------------------- 1 | python 2 | import sys 3 | sys.path.insert(0, 'tools/gdb') 4 | import nstack 5 | end 6 | -------------------------------------------------------------------------------- /tools/ping_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | source tools/assert.sh 4 | source tools/testenv.sh config 5 | 6 | ASSERT ping -c 3 -w 5 -s 500 $STACK_IP 7 | ASSERT ping -c 3 -w 5 -s 1500 $STACK_IP 8 | ASSERT ping -c 3 -w 5 -s 4500 $STACK_IP 9 | -------------------------------------------------------------------------------- /tools/run.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env -S bash -e 2 | 3 | # shmem for sockets 4 | dd if=/dev/zero of=/tmp/unetcat.sock bs=1024 count=1024 5 | dd if=/dev/zero of=/tmp/tnetcat.sock bs=1024 count=1024 6 | 7 | sudo setcap cap_net_raw,cap_net_admin,cap_net_bind_service+eip build/inetd 8 | sudo ip netns exec TEST su $USER -c "build/inetd $1" 9 | -------------------------------------------------------------------------------- /tools/testenv.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env -S bash -e 2 | 3 | LOCAL_IP=10.0.0.1 4 | STACK_IP=10.0.0.2 5 | 6 | function start { 7 | ip netns add TEST 8 | ip link add veth0 type veth peer name veth1 9 | ip link set dev veth0 up 10 | ip link set dev veth1 up 11 | ip addr add dev veth0 local $LOCAL_IP 12 | ip route add $STACK_IP dev veth0 13 | ip link set dev veth1 netns TEST 14 | ip netns exec TEST ip link 
set dev veth1 up 15 | } 16 | 17 | function stop { 18 | ip link set dev veth0 down 19 | ip link delete veth0 20 | ip netns delete TEST 21 | } 22 | 23 | function config { 24 | echo "$LOCAL_IP" 25 | } 26 | 27 | if [ $# -eq 0 ]; then 28 | echo "Usage: $(basename $0) {start|config|stop}" 29 | exit 30 | fi 31 | $1 32 | --------------------------------------------------------------------------------