├── LICENSE
├── README.md
├── benchmark.md
├── client
│   ├── Makefile
│   ├── app
│   ├── main.cc
│   ├── main.o
│   ├── p4ml_manager.cc
│   ├── p4ml_manager.h
│   └── p4ml_manager.o
├── common
│   ├── CC_manager.h
│   ├── HashTable.cc
│   ├── HashTable.h
│   ├── HashTable.o
│   ├── ThreadPool.h
│   ├── dma_common.cc
│   ├── dma_common.h
│   ├── dma_common.o
│   ├── mlx5_defs.h
│   ├── p4ml_struct.h
│   ├── packet.h
│   ├── quantize.h
│   ├── utils.h
│   └── window_manager.h
├── datasample
│   ├── job_A.txt
│   ├── job_B.txt
│   └── switch.txt
├── p4ml2
│   ├── includes
│   │   ├── actions.p4
│   │   ├── common.p4
│   │   ├── headers.p4
│   │   ├── parser.p4
│   │   ├── registers.p4
│   │   └── tables.p4
│   └── p4ml2.p4
├── ptf_p4ml2
│   ├── ptfTest.py
│   └── ptfTest.pyc
├── run_pd_rpc
│   └── setupp4ml2.py
├── server
│   ├── Makefile
│   ├── ParameterServer.cc
│   ├── ParameterServer.h
│   ├── ParameterServer.o
│   └── app
└── shell
    ├── dealdata.sh
    ├── gencsv.py
    ├── occ_A.txt
    ├── occ_B.txt
    ├── result.csv
    ├── thr_A.txt
    └── thr_B.txt

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 CSU-NetLab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A2TP-Eurosys2023

## Abstract
Compared with state-of-the-art in-network aggregation protocols, A2TP redesigns the congestion control algorithm: it decouples in-network aggregator congestion from link bandwidth congestion and accounts for the impact of stragglers, thereby improving the efficiency of in-network aggregation. We implement A2TP on a P4 programmable switch, with a kernel-bypass protocol stack at the end hosts. We provide the source code of the whole system, along with a benchmark experiment and detailed experimental steps for verifying A2TP's performance.


## Description \& Requirements
Our source code is built on ATP [1] (https://github.com/in-ATP/ATP.git), one of the state-of-the-art in-network aggregation schemes. Our core design lies in the congestion control logic, e.g., the decoupled window adjustment in CC_manager.h and the straggling-degree detection in p4ml_manager.cc.
```
[1] C. Lao, Y. Le, K. Mahajan, Y. Chen, W. Wu, A. Akella and M. M. Swift. ATP: In-network Aggregation for Multi-tenant Learning. In Proc. NSDI, 2021.
```

### Hardware dependencies
Each host requires a Mellanox NIC; we recommend the Mellanox ConnectX-5 100Gbps NIC. In addition, the system requires a programmable switch; the recommended model is the Barefoot Wedge 100BF-32X.

### Software dependencies
For each host, the NIC driver version is MLNX\_OFED\_LINUX-4.9-4.1.7.0.
For the aggregation switch, the SDE version is bf-sde-8.9.1.

## Directory Structure
```
./client, ./common, and ./server are deployed on the workers and PSs
./p4ml2, ./ptf_p4ml2, and ./run_pd_rpc are used on the programmable switch
./datasample contains sample data from the benchmark experiment
./shell processes the sample data and generates a .csv file
```

## Basic Performance
The detailed steps of the benchmark experiment are described in benchmark.md.
--------------------------------------------------------------------------------
/benchmark.md:
--------------------------------------------------------------------------------
# Benchmark
## Set Up
The experiment needs a Barefoot Wedge 100BF-32X programmable switch and 11 servers. All servers are connected to the switch with 100Gbps links. There are 2 jobs, each with 4 workers and 1 PS (Parameter Server); they share 1 aggregation switch.
To install the NIC driver successfully, you need an OS kernel the driver supports; Ubuntu 20.04 with Linux 5.4.0-26-generic works. Note that configuring 4 workers per job is not mandatory: if you do not have enough machines, you can reduce the number of workers of either job.

## Experiment Steps
### Compile the P4 program
```
$ cd ~/bf-sde-8.9.1/pkgsrc/p4-build
$ ./configure --prefix=$SDE_INSTALL --with-tofino P4_NAME=p4ml2 P4_PATH=/root/bf-sde-8.9.1/pkgsrc/p4-examples/programs/p4ml2/p4ml2.p4 --enable-thrift
$ make
$ make install
```
```
$ cd ~/bf-sde-8.9.1/pkgsrc/p4-examples
$ ./configure --prefix=$SDE_INSTALL
$ make
$ make install
```


### Compile the worker and parameter server
```
$ cd $A2TP/client/
$ make
```
```
$ cd $A2TP/server/
$ make
```


### Run the switch program, configure ports, and install table entries
```
$ cd $SDE
$ ./run_switchd.sh -p p4ml2    (Terminal 1)
```
```
$ $SDE/run_p4_tests.sh -t $A2TP/ptf_p4ml2/ -p p4ml2    (Terminal 2)
```
```
$ $TOOLS/run_pd_rpc.py -p p4ml2 $A2TP/run_pd_rpc/setupp4ml2.py    (Terminal 3)
```

### Run the servers of job A and job B
```
$ # Usage: ./app [AppID]
$ sudo ./app 1
$ sudo ./app 2
```
After starting successfully, each server waits for the senders to start.

### Generate background traffic
Choose one of the workers as the sender and any idle server as the receiver, then use the following command to generate UDP background traffic.
This step simulates the bandwidth contention found in large distributed machine learning clusters.
```
$ iperf -c [destination ip] -B [local ip] -u -l 50000 -t 90 -i 1 -b 10G -P 8
```


### Run the nth worker of job A and the mth worker of job B
```
$ # Usage: ./app [workerID] [Num of Worker] [AppID] [Num of PS]
$ sudo ./app n 4 1 1
$ sudo ./app m 4 2 1
```
If this step succeeds, each worker's terminal prints its throughput.

### Expected result

The aggregation throughput and aggregator occupancy are printed in the terminal.
Moreover, the output can be redirected to a file and then processed with the shell scripts. We collected sample data files from this basic experiment; running "./dealdata.sh" generates a result.csv file containing the real-time aggregation throughput and aggregator occupancy of job A and job B.
The expected result is that the straggling job occupies only a small number of aggregators, relinquishing aggregators to the non-straggling job,
so that the non-straggling job maintains a high aggregation throughput (about 50Gbps in our environment).
Meanwhile, the job with severe straggling can still use the PS for aggregation, so the overall aggregation throughput is improved.
--------------------------------------------------------------------------------
/client/Makefile:
--------------------------------------------------------------------------------
# CFLAGS := -O1 -g
# LD := g++
# LDFLAGS := ${LDFLAGS} -lrdmacm -libverbs -lrt -lpthread -lm

# ROCE_COMMON_PATH = ../common/
# INCLUDES = -I${ROCE_COMMON_PATH}
# CFLAGS := ${CFLAGS} ${INCLUDES}
# SOURCES := $(wildcard *.c *.h ${ROCE_COMMON_PATH}*.c ${ROCE_COMMON_PATH}*.h)


# all: app
# app: main.o p4ml_manager.o ${ROCE_COMMON_PATH}packet.o ${ROCE_COMMON_PATH}dma_common.o ${ROCE_COMMON_PATH}window_manager.o
# 	${LD} $(CFLAGS) -o $@ $^ ${LDFLAGS}


# # Clean Target
# clean:
# 	rm *.o ../common/*.o
# 	rm app

all:
	g++ -std=c++11 -g -O1 -c -o main.o main.cc
	g++ -std=c++11 -g -O1 -c -o p4ml_manager.o p4ml_manager.cc -mavx
	g++ -std=c++11 -g -O1 -c -o ../common/HashTable.o ../common/HashTable.cc
	g++ -std=c++11 -g -O1 -c -o ../common/dma_common.o ../common/dma_common.cc
	g++ -std=c++11 -g -O1 -I../common/ -o app main.o p4ml_manager.o ../common/HashTable.o ../common/dma_common.o -lrdmacm -libverbs -lrt -lpthread -lm

clean:
	rm *.o
	rm
app 31 | -------------------------------------------------------------------------------- /client/app: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/client/app -------------------------------------------------------------------------------- /client/main.cc: -------------------------------------------------------------------------------- 1 | #include "p4ml_manager.h" 2 | 3 | #define ENABLE_LOG true 4 | 5 | uint32_t* init_model(int size) { 6 | uint32_t* tmp = new uint32_t[size]; 7 | for (int i = 0; i < size; i++) 8 | tmp[i] = i+1; 9 | return tmp; 10 | } 11 | 12 | float* init_model_float(int size) { 13 | float* tmp = new float[size]; 14 | for (int i = 0; i < size; i++) { 15 | tmp[i] = (i+1.0) / 10000000.0; 16 | // tmp[i] = (i + 1.0) / 10000.0; 17 | // printf("%f ", tmp[i]); 18 | } 19 | // tmp[63] = 200; 20 | return tmp; 21 | } 22 | 23 | float* init_model_float_with_overflow(int size) { 24 | float* tmp = new float[size]; 25 | for (int i = 0; i < size; i++) { 26 | tmp[i] = (i+1.0) / 10000000.0; 27 | } 28 | for (int i = 0; i < 100; i++) { 29 | int rand_num = rand() % size; 30 | if (rand_num > size / 2) 31 | tmp[rand_num] = 200; 32 | else 33 | tmp[rand_num] = 100; 34 | // printf("rand!!! 
%d\n", rand_num); 35 | } 36 | return tmp; 37 | } 38 | 39 | 40 | std::shared_ptr _p4ml_manager; 41 | 42 | 43 | int main(int argc, char *argv[]) 44 | { 45 | 46 | bindingCPU(19); 47 | if (argc < 5) { 48 | printf("\nUsage %s [MyID] [Num of Worker] [AppID] [Num of PS]\n\n", argv[0]); 49 | exit(1); 50 | } 51 | 52 | int host = atoi(argv[1]); 53 | int num_worker = atoi(argv[2]); 54 | int appID = atoi(argv[3]); 55 | int num_PS = atoi(argv[4]); 56 | 57 | //int host = 0; 58 | // int num_worker = 2; 59 | // int appID = 1; 60 | 61 | _p4ml_manager = std::shared_ptr(new P4mlManager(host, num_worker, appID, num_PS)); 62 | 63 | /* Here for int size to send per thread */ 64 | /* ex. 25600 = 32*800 = 1 Round */ 65 | int size = 1024000; 66 | int thread_to_use = 9; 67 | int loop_time = 500; 68 | _p4ml_manager->SetMaxWindow(50); 69 | _p4ml_manager->SetUsedSwitchAGTRcount(450); 70 | _p4ml_manager->SetMaxAgtrSizePerThread(50); 71 | //printf("argc %d\n",argc); 72 | int tempargc = 5; 73 | 74 | if (argc > 5 ) { 75 | while (tempargc < argc){ 76 | std::string option = argv[tempargc++]; 77 | if(option == "-t"){ 78 | int trange = atoi(argv[tempargc++]); 79 | _p4ml_manager->SetTailTimeRange(trange); 80 | printf("tail_time_range:%d\n",trange); 81 | } 82 | if(option == "-th"){ 83 | int num_thread_argv = atoi(argv[tempargc++]); 84 | thread_to_use = num_thread_argv; 85 | //printf("tail_time_range:%d\n",trange); 86 | } 87 | if (option == "-a") { 88 | int num_agtr = atoi(argv[tempargc++]); 89 | _p4ml_manager->SetMaxAgtrSizePerThread(num_agtr); 90 | } 91 | if (option == "-f") { 92 | float forward_rate = atof(argv[tempargc++]); 93 | _p4ml_manager->SetForceForward(forward_rate); 94 | } 95 | if (option == "-l") { 96 | loop_time = atof(argv[tempargc++]); 97 | } 98 | if (option == "-aa") { 99 | int num_used_agtr = atoi(argv[tempargc++]); 100 | _p4ml_manager->SetUsedSwitchAGTRcount(num_used_agtr); 101 | } 102 | 103 | if (option == "-w") { 104 | int wsize = atoi(argv[tempargc++]); 105 | 
_p4ml_manager->SetMaxWindow(wsize); 106 | } 107 | 108 | if (option == "-n"){ 109 | int num_nic = atoi(argv[tempargc++]); 110 | _p4ml_manager->SetNic(num_nic); 111 | } 112 | } 113 | } 114 | 115 | /* (40) Threads in thread pool */ 116 | /* MAX_AGTR (32000) / 40 = 800 Agtr per thread */ 117 | printf("thread_to_use %d\n", thread_to_use); 118 | _p4ml_manager->init_threadPool(thread_to_use); 119 | 120 | // _p4ml_manager->SetForceForward(0.25); 121 | // _p4ml_manager->SetMaxAgtrSizePerThread(50); 122 | 123 | int finish_counter = loop_time * thread_to_use; 124 | uint32_t** tensor = new uint32_t*[thread_to_use * loop_time]; 125 | 126 | printf("\nModel initializing...\n"); 127 | // for (int i = 0; i < thread_to_use * loop_time; i++) 128 | for (int i = 0; i < 1; i++) 129 | if (FLOATING_POINT_INPUT) 130 | tensor[i] = (uint32_t*) init_model_float_with_overflow(size); 131 | else 132 | tensor[i] = init_model(size); 133 | 134 | printf("\nModel initialized completed. Start sending...\n\n"); 135 | 136 | std::chrono::time_point timer = std::chrono::high_resolution_clock::now(); 137 | 138 | std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now(); 139 | 140 | for (int j = 0; j < loop_time; j++) { 141 | /* thread to use */ 142 | for (int i = 0; i < thread_to_use; i++) { 143 | uint64_t key = _p4ml_manager->GetNewKey(); 144 | _p4ml_manager->PushPull(key, (char*) tensor[0], size, 1); 145 | } 146 | } 147 | 148 | 149 | int total_sent = 0; 150 | 151 | while (finish_counter > 0) { 152 | int64_t tmp_key = _p4ml_manager->GetFinishKey(); 153 | if (tmp_key >= 0) { 154 | finish_counter--; 155 | total_sent++; 156 | } 157 | 158 | if (ENABLE_LOG) { 159 | std::chrono::time_point current_time = 160 | std::chrono::high_resolution_clock::now(); 161 | std::chrono::duration time_span = 162 | std::chrono::duration_cast>(current_time - timer); 163 | std::chrono::duration total_time = 164 | std::chrono::duration_cast>(current_time - t1); 165 | if (time_span.count() >= 0.5) 
{ 166 | //printf("time_span %lf\n",time_span.count()); 167 | // printf("Tensor left: %d, ", finish_counter); 168 | // printf("total send %" PRIu64 " bytes, time %lf, throughput: %lf\n", total_sent * 32000 * 194, total_time, total_sent * 6062.5 / 1024.0 / 1024.0 * 8.0 / 1.0); 169 | // printf("%lf\n", total_sent * 6062.5 / 1024.0 / 1024.0 * 8.0 / 1.0); 170 | // int tmp = _p4ml_manager->GetCollisionTimeAndClear(); 171 | // if (tmp) 172 | // printf("%d\n", tmp); 173 | // printf("%d\n", _p4ml_manager->GetCollisionTimeAndClear()); 174 | 175 | double t_thr = (float)total_sent * (16517 * P4ML_PACKET_SIZE) / 1024 / 1024 / 1024 * 8 / time_span.count(); 176 | double thr = (float)total_sent * (16517 * P4ML_DATA_SIZE) / 1024 / 1024 / 1024 * 8 / time_span.count(); 177 | printf("Thr %lf \n", thr); 178 | total_sent = 0; 179 | timer = current_time; 180 | } 181 | } 182 | } 183 | 184 | std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now(); 185 | std::chrono::duration time_span = std::chrono::duration_cast>(t2 - t1); 186 | double transmit_size_in_m = (double)((double)size * loop_time * thread_to_use / (float)MAX_ENTRIES_PER_PACKET) * P4ML_PACKET_SIZE / 1024 / 1024; 187 | double total_time = time_span.count(); 188 | double throughput = (transmit_size_in_m / 1024 * 8 ) / total_time; 189 | printf("Finish all %d Tensors,\n Time = %lf s,\n Total Size = %lf MB,\n Throughput: %lf Gbps\n\n", thread_to_use * loop_time, total_time, transmit_size_in_m, throughput); 190 | 191 | } 192 | -------------------------------------------------------------------------------- /client/main.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/client/main.o -------------------------------------------------------------------------------- /client/p4ml_manager.h: -------------------------------------------------------------------------------- 
1 | 2 | #ifndef P4ML_MANAGER_H 3 | #define P4ML_MANAGER_H 4 | 5 | #include "../common/dma_common.h" 6 | #include "../common/packet.h" 7 | #include "../common/utils.h" 8 | #include "../common/window_manager.h" 9 | #include "../common/HashTable.h" 10 | #include "../common/quantize.h" 11 | #include "../common/p4ml_struct.h" 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | #include 26 | #include 27 | #include 28 | #include 29 | 30 | #define FLOATING_POINT_INPUT false 31 | 32 | #define ONLY_DO_QUAN false 33 | 34 | #define OVERFLOW_THRESHOLD 213 35 | #define UNDERFLOW_THRESHOLD -213 36 | 37 | #define P4ML_KEY_TOTAL 500000 38 | #define MAX_TENSOR_SIZE 1024000 39 | 40 | #define MAX_THREAD_PER_APP 10 41 | 42 | class P4mlManager { 43 | public: 44 | P4mlManager(uint32_t host, int num_worker, int appID, int num_PS); 45 | // ~P4mlManager(); 46 | 47 | void init_threadPool(int num_thread); 48 | void PushPull(uint64_t key, char* data, int len, int cmd); 49 | static void PushPullLoop(int thread_id); 50 | static void QuantizationLoop(int thread_id); 51 | 52 | void PushTaskToThread(uint64_t key, char *data, int len, int cmd, int thread_id); 53 | 54 | uint64_t GetNewKey(); 55 | int64_t GetFinishKey(); 56 | 57 | 58 | 59 | 60 | double GetLossRate(); 61 | 62 | int GetCollisionTimeAndClear(); 63 | void SetForceForward(float forward_rate); 64 | void SetMaxAgtrSizePerThread(int max_agtr_size_per_thread); 65 | void SetUsedSwitchAGTRcount(int used_agtr); 66 | void SetTailTimeRange(int trange); 67 | void SetMaxWindow(int size); 68 | void SetNic(int num); 69 | 70 | private: 71 | static uint32_t host; 72 | static uint8_t num_worker; 73 | static uint8_t num_PS; 74 | static uint16_t appID; 75 | static uint64_t p4mlKey; 76 | static AppInfo* app_info; 77 | 78 | 79 | static int pause_count; 80 | static int max_window_size; 81 | static int nic; 82 | static int 
tail_time_range; 83 | static int max_agtr_size_per_thread; 84 | static int UsedSwitchAGTRcount; 85 | static int _num_thread; 86 | static std::chrono::time_point start; 87 | static ThreadInfo** threadInfoQueue; 88 | static DMAcontext** dmaContextQueue; 89 | static std::thread** threadQueue; 90 | static std::thread** pushPullthreadQueue; 91 | static std::queue* pushPulljobQueue; 92 | static std::thread** quantizationthreadQueue; 93 | static std::queue* quantizejobQueue; 94 | static std::queue* dequantizejobQueue; 95 | 96 | static WindowManager* window_manager; 97 | static std::queue finishQueue; 98 | static std::queue* pendingQueue; 99 | static uint64_t* weightQueue; 100 | 101 | // static uint16_t* hash_map; 102 | static HashTable* hash_table; 103 | static int32_t** quantizeBuffer; 104 | static bool** isOverflow; 105 | 106 | static bool isForceForward; 107 | static int forwardFrequency; 108 | static float forwardRate; 109 | 110 | static std::mutex Resource_mutex; 111 | static std::mutex _P4MLKey_mutex; 112 | static std::mutex _print_mutex; 113 | static std::mutex _queuePush_mutex; 114 | 115 | static void main_receive_packet_loop(DMAcontext* dma_context, int32_t* data, int my_id, CC_manager& cc_manager); 116 | static void updateModel(agghdr* p4ml_header, int32_t* data, int my_id); 117 | }; 118 | 119 | inline void P4mlManager::updateModel(agghdr* p4ml_header, int32_t* data, int my_id) 120 | { 121 | uint16_t* p_seq = &p4ml_header->seq_num; 122 | uint32_t* tensor_len = &pushPulljobQueue[my_id].front()->len; 123 | 124 | int32_t* p_model = p4ml_header->vector; 125 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 126 | if (offset < *tensor_len) { 127 | if (offset + MAX_ENTRIES_PER_PACKET > *tensor_len) 128 | memcpy(data + offset, p_model, sizeof(int32_t) * (*tensor_len % MAX_ENTRIES_PER_PACKET)); 129 | else 130 | memcpy(data + offset, p_model, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 131 | } 132 | } 133 | 134 | #endif //P4ML_MANAGER_H 
--------------------------------------------------------------------------------
/client/p4ml_manager.o:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/client/p4ml_manager.o
--------------------------------------------------------------------------------
/common/CC_manager.h:
--------------------------------------------------------------------------------
#ifndef CC_MANAGER_H
#define CC_MANAGER_H

#define MAX_BYTES 200 * P4ML_PACKET_SIZE

#define A2TPENABLE 1

#include "packet.h"
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>
using namespace std;
#define do_div(n, base) ({ \
    uint32_t __base = (base); \
    uint32_t __rem; \
    __rem = ((uint64_t)(n)) % __base; \
    (n) = ((uint64_t)(n)) / __base; \
    __rem; \
})
#define GET_MIN(a, b) (a < b ? a : b)
#define GET_MAX(a, b) (a > b ? a : b)

class CC_manager {

public:
    CC_manager(int init_window, int max_window_size)
    {
        cwnd_bytes = init_window * P4ML_PACKET_SIZE;
        aggr_bytes = cwnd_bytes;
        max_window = max_window_size;
        last_window = max_window;
        ecn_count = 0;
        col_count = 0;
        aggr_count = init_window;
        last_aggr_count = init_window;
        alpha = 0;
        beta = 0;
        urgent = 1;
        last_ts = std::chrono::high_resolution_clock::now();
        receive_ack = 0;
        update_urgent = 1;
        gam = 1;
        print_gamflag = 0;
    }

    // int adjustWindow(bool isECN, bool isCollision, int appID)
    void adjustWindow(bool isECN, bool isCollision, int appID, int looptime)
    {
        if (A2TPENABLE) {
            gam = (1 - 1.0 / 64) * gam + 1.0 / 64 * urgent; // smoothed straggling degree

            alpha = (1 - 1.0 / 8) * alpha + 1.0 / 8 * (1.0 * ecn_count / (cwnd_bytes / P4ML_PACKET_SIZE)); // smoothed link congestion factor
            beta = (1 - 1.0 / 8) * beta + 1.0 / 8 * (1.0 * col_count / (aggr_bytes / P4ML_PACKET_SIZE)); // smoothed aggregator congestion factor

            double thre = 0.1; // threshold for aggregator congestion control
            if (isCollision && beta > thre) { // shrink the aggregator congestion window
                aggr_bytes = aggr_bytes * (1 - pow(beta - thre, gam) / (2 - thre));
            } else {
                aggr_bytes = aggr_bytes + P4ML_PACKET_SIZE; // grow by one packet (P4ML_PACKET_SIZE)
            }

            if (ecn_count > 0) { // shrink the link congestion window
                cwnd_bytes = cwnd_bytes * (1 - alpha / 2);
                // printf("receive ecn\n");
            } else {
                cwnd_bytes += P4ML_PACKET_SIZE; // grow by one packet (P4ML_PACKET_SIZE)
                // aggr_bytes += 1500;
            }

            if (cwnd_bytes < P4ML_PACKET_SIZE)
                cwnd_bytes = P4ML_PACKET_SIZE;
            if (cwnd_bytes > max_window * P4ML_PACKET_SIZE)
                cwnd_bytes = max_window * P4ML_PACKET_SIZE;
            if (cwnd_bytes > P4ML_PACKET_SIZE)
                cwnd_bytes = (cwnd_bytes / P4ML_PACKET_SIZE) * P4ML_PACKET_SIZE;

            if (aggr_bytes < P4ML_PACKET_SIZE)
                aggr_bytes = P4ML_PACKET_SIZE;
            if (aggr_bytes > max_window * P4ML_PACKET_SIZE)
                aggr_bytes = max_window * P4ML_PACKET_SIZE;
            if (aggr_bytes > P4ML_PACKET_SIZE)
                aggr_bytes = (aggr_bytes / P4ML_PACKET_SIZE) * P4ML_PACKET_SIZE;

            if (aggr_bytes > cwnd_bytes) {
                aggr_bytes = cwnd_bytes;
            }
        } else {
            if (isECN) {
                cwnd_bytes = cwnd_bytes / 2;
                // aggr_bytes = aggr_bytes * (1 - alpha / 2);
                // printf("receive ecn\n");
            } else {
                cwnd_bytes += 1500;
                // aggr_bytes += 1500;
            }
            if (cwnd_bytes < P4ML_PACKET_SIZE)
                cwnd_bytes = P4ML_PACKET_SIZE;
            if (cwnd_bytes > max_window * P4ML_PACKET_SIZE)
                cwnd_bytes = max_window * P4ML_PACKET_SIZE;
            if (cwnd_bytes > P4ML_PACKET_SIZE)
                cwnd_bytes = (cwnd_bytes / P4ML_PACKET_SIZE) * P4ML_PACKET_SIZE;

            aggr_bytes = cwnd_bytes;
        }

        ecn_count = 0;
        col_count = 0;
        // return cwnd_bytes / P4ML_PACKET_SIZE;
    }

    int GetCwndPackets()
    {
        return cwnd_bytes / P4ML_PACKET_SIZE;
    }

    int GetAggrPackets()
    {
        return aggr_bytes / P4ML_PACKET_SIZE;
    }

    int ecn_count;
    int col_count;
    int aggr_count;
    int last_aggr_count;
    double alpha;  // link congestion factor
    double beta;   // aggregator congestion factor
    double urgent; // straggling degree
    double gam;
    int last_window;
    int receive_ack;
    bool update_urgent;
    bool print_gamflag;
    std::chrono::high_resolution_clock::time_point last_ts;

private:
    uint64_t cwnd_bytes; // link congestion window size
    uint64_t aggr_bytes; // aggregator congestion window size
    int max_window;
};

#endif
--------------------------------------------------------------------------------
/common/HashTable.cc:
--------------------------------------------------------------------------------
#include "HashTable.h"
#define MAX_BYTES 100 * P4ML_PACKET_SIZE 3 | 4 | HashTable::HashTable(int size) 5 | { 6 | used_size = size; 7 | int max_size = MAX_AGTR_COUNT; 8 | hash_map = new uint16_t[max_size]; 9 | memset(isAlreadyDeclare, 0, sizeof(bool) * size); 10 | memset(predefine_agtr_list, 0, sizeof(bool) * size); 11 | for (int i = 0; i < size; i++) { 12 | predefine_agtr_list[i] = i; 13 | // printf("[%d] %d ", i, predefine_agtr_list[i]); 14 | } 15 | int random_seed = rand(); 16 | std::shuffle(predefine_agtr_list, predefine_agtr_list + size, std::default_random_engine(random_seed)); 17 | 18 | // for (int i = 0; i < size; i++) { 19 | 20 | // printf("[%d] %d ", i, predefine_agtr_list[i]); 21 | // } 22 | hash_pos = 0; 23 | } 24 | 25 | void HashTable::HashNew_linear(int index) 26 | { 27 | // Guarantee non-repeat element generated 28 | uint16_t new_value; 29 | do { 30 | new_value = hash_function(); 31 | } while (isAlreadyDeclare[new_value]); 32 | 33 | hash_map[index] = new_value; 34 | isAlreadyDeclare[new_value] = true; 35 | } 36 | 37 | int HashTable::HashNew_predefine() 38 | { 39 | if (hash_pos >= used_size) { 40 | return -1; 41 | } 42 | 43 | // Get AGTR from predefined hash 44 | while (hash_pos < used_size) { 45 | int new_agtr = predefine_agtr_list[hash_pos]; 46 | if (isAlreadyDeclare[new_agtr]) { 47 | hash_pos++; 48 | } else { 49 | hash_pos++; 50 | isAlreadyDeclare[new_agtr] = true; 51 | return new_agtr; 52 | } 53 | } 54 | 55 | return -1; 56 | } 57 | 58 | int HashTable::HashNew_crc(uint16_t appID, uint16_t index) 59 | { 60 | // Guarantee non-repeat element generated 61 | uint8_t crc_input[] = {(uint8_t)(appID & 0xff), (uint8_t)(appID >> 8), (uint8_t)(index & 0xff), (uint8_t)(index >> 8), 0, 0}; 62 | 63 | uint16_t new_value; 64 | uint8_t salt = 0; 65 | do { 66 | new_value = crc32_le(0xffffffff, crc_input, 6); 67 | new_value %= used_size; 68 | crc_input[4]++; 69 | if (crc_input[4] == 255) { 70 | crc_input[4] = 0; 71 | crc_input[5]++; 72 | } 73 | } while (isAlreadyDeclare[new_value]); 74 | 
hash_map[index] = new_value; 75 | isAlreadyDeclare[new_value] = true; 76 | return new_value; 77 | } 78 | 79 | void HashTable::HashNew_separate(uint16_t appID, uint16_t index) 80 | { 81 | int real_index = ((appID - 1) * 2000) + index; 82 | hash_map[index] = real_index; 83 | isAlreadyDeclare[real_index] = true; 84 | } 85 | 86 | uint16_t HashTable::hash_function() 87 | { 88 | return hash_pos++; 89 | } 90 | 91 | uint32_t HashTable::crc32_le(uint32_t crc, unsigned char const* p, size_t len) 92 | { 93 | while (len--) { 94 | crc ^= *p++; 95 | for (int i = 0; i < 8; i++) 96 | crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0); 97 | } 98 | return ~crc; 99 | } 100 | -------------------------------------------------------------------------------- /common/HashTable.h: -------------------------------------------------------------------------------- 1 | #ifndef HASHTABLE_H 2 | #define HASHTABLE_H 3 | #include 4 | #include "packet.h" 5 | #include "utils.h" 6 | #define CRCPOLY_LE 0xedb88320 7 | 8 | class HashTable { 9 | 10 | public: 11 | HashTable(int size); 12 | void HashNew_linear(int index); 13 | int HashNew_crc(uint16_t appID, uint16_t index); 14 | int HashNew_predefine(); 15 | void HashNew_separate(uint16_t appID, uint16_t index); 16 | uint16_t* hash_map; 17 | bool isAlreadyDeclare[MAX_AGTR_COUNT]; 18 | 19 | private: 20 | int used_size; 21 | uint32_t crc32_le(uint32_t crc, unsigned char const* p, size_t len); 22 | int predefine_agtr_list[MAX_AGTR_COUNT]; 23 | 24 | // These for predefine Hash 25 | 26 | // These two for Linear Hash 27 | uint16_t hash_function(); 28 | uint16_t hash_pos; 29 | 30 | }; 31 | 32 | #endif -------------------------------------------------------------------------------- /common/HashTable.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/common/HashTable.o 
-------------------------------------------------------------------------------- /common/ThreadPool.h: -------------------------------------------------------------------------------- 1 | #ifndef THREAD_POOL_H 2 | #define THREAD_POOL_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | class ThreadPool { 15 | public: 16 | template ThreadPool(size_t, F callback); 17 | template 18 | auto enqueue(F&& f, Args&&... args) 19 | -> std::future::type>; 20 | ~ThreadPool(); 21 | private: 22 | // need to keep track of threads so we can join them 23 | std::vector< std::thread > workers; 24 | // the task queue 25 | std::queue< std::function > tasks; 26 | 27 | // synchronization 28 | std::mutex queue_mutex; 29 | std::condition_variable condition; 30 | bool stop; 31 | }; 32 | 33 | // the constructor just launches some amount of workers 34 | template 35 | inline ThreadPool::ThreadPool(size_t threads, F callback) 36 | : stop(false) 37 | { 38 | for(size_t i = 0;i task; 45 | 46 | { 47 | std::unique_lock lock(this->queue_mutex); 48 | this->condition.wait(lock, 49 | [this]{ return this->stop || !this->tasks.empty(); }); 50 | if(this->stop && this->tasks.empty()) 51 | return; 52 | task = std::move(this->tasks.front()); 53 | this->tasks.pop(); 54 | } 55 | 56 | task(); 57 | callback(); 58 | } 59 | } 60 | ); 61 | } 62 | 63 | // add new work item to the pool 64 | template 65 | auto ThreadPool::enqueue(F&& f, Args&&... args) 66 | -> std::future::type> 67 | { 68 | using return_type = typename std::result_of::type; 69 | 70 | auto task = std::make_shared< std::packaged_task >( 71 | std::bind(std::forward(f), std::forward(args)...) 
72 | ); 73 | 74 | std::future res = task->get_future(); 75 | { 76 | std::unique_lock lock(queue_mutex); 77 | 78 | // don't allow enqueueing after stopping the pool 79 | if(stop) 80 | throw std::runtime_error("enqueue on stopped ThreadPool"); 81 | 82 | tasks.emplace([task](){ (*task)(); }); 83 | } 84 | condition.notify_one(); 85 | return res; 86 | } 87 | 88 | // the destructor joins all threads 89 | inline ThreadPool::~ThreadPool() 90 | { 91 | { 92 | std::unique_lock lock(queue_mutex); 93 | stop = true; 94 | } 95 | condition.notify_all(); 96 | for(std::thread &worker: workers) 97 | worker.join(); 98 | } 99 | 100 | #endif 101 | -------------------------------------------------------------------------------- /common/dma_common.cc: -------------------------------------------------------------------------------- 1 | #define __USE_GNU 2 | 3 | #include "dma_common.h" 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | 17 | 18 | std::mutex ___print_mutex; 19 | int my_send_queue_length = 2048; 20 | int my_recv_queue_length = my_send_queue_length * 8; 21 | 22 | unsigned char PS_FILTER_TEMPLATE_R[] = { 0x05, 0x04, 0x03, 0x02, 0x01, 0xFF }; 23 | unsigned char WORKER_FILTER_TEMPLATE_R[] = { 0x77, 0x77, 0x77, 0x77, 0x77, 0xFF }; 24 | 25 | DMAcontext* DMA_create(ibv_device* ib_dev, int thread_id, bool isPS) 26 | { 27 | 28 | ibv_context* context = ibv_open_device(ib_dev); 29 | if (!context) { 30 | fprintf(stderr, "Couldn't get context for %s\n", 31 | ibv_get_device_name(ib_dev)); 32 | exit(1); 33 | } 34 | ibv_pd* pd = ibv_alloc_pd(context); 35 | if (!pd) { 36 | fprintf(stderr, "Couldn't allocate PD\n"); 37 | exit(1); 38 | } 39 | 40 | struct ibv_cq* rec_cq = ibv_create_cq(context, my_recv_queue_length + 1, NULL, NULL, 0); 41 | if (!rec_cq) { 42 | fprintf(stderr, "Couldn't create CQ %d\n", errno); 43 | exit(1); 44 | } 45 | 46 | struct ibv_cq* snd_cq = 
ibv_create_cq(context, my_send_queue_length + 1, NULL, NULL, 0);
47 |     if (!snd_cq) {
48 |         fprintf(stderr, "Couldn't create CQ %d\n", errno);
49 |         exit(1);
50 |     }
51 | 
52 |     struct ibv_qp* qp;
53 |     struct ibv_exp_qp_init_attr* qp_init_attr = (struct ibv_exp_qp_init_attr*)malloc(sizeof(struct ibv_exp_qp_init_attr));
54 | 
55 |     memset(qp_init_attr, 0, sizeof(*qp_init_attr));
56 |     qp_init_attr->comp_mask = IBV_EXP_QP_INIT_ATTR_PD | IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER | IBV_EXP_QP_INIT_ATTR_INL_RECV;
57 |     qp_init_attr->send_cq = snd_cq;
58 |     qp_init_attr->recv_cq = rec_cq;
59 |     qp_init_attr->qp_type = IBV_QPT_RAW_PACKET;
60 | 
61 |     qp_init_attr->pd = pd;
62 |     qp_init_attr->cap.max_send_wr = my_send_queue_length + 1;
63 |     qp_init_attr->cap.max_recv_wr = my_recv_queue_length + 1;
64 |     qp_init_attr->cap.max_inline_data = 512;
65 |     qp_init_attr->cap.max_send_sge = 1;
66 |     qp_init_attr->cap.max_recv_sge = 1;
67 |     qp_init_attr->max_tso_header = IP_ETH_UDP_HEADER_SIZE;
68 |     qp_init_attr->max_inl_recv = 512;
69 | 
70 |     qp = ibv_exp_create_qp(context, qp_init_attr);
71 |     //qp = ibv_create_qp(pd, qp_init_attr);
72 |     if (!qp) {
73 |         fprintf(stderr, "Couldn't create RSS QP\n");
74 |         exit(1);
75 |     }
76 | 
77 |     struct ibv_qp_attr qp_attr;
78 |     int qp_flags;
79 |     int ret;
80 |     memset(&qp_attr, 0, sizeof(qp_attr));
81 |     qp_flags = IBV_QP_STATE | IBV_QP_PORT;
82 |     qp_attr.qp_state = IBV_QPS_INIT;
83 |     qp_attr.port_num = 1;
84 |     ret = ibv_modify_qp(qp, &qp_attr, qp_flags);
85 |     if (ret != 0) { /* ibv_modify_qp returns 0 on success or a positive errno, never < 0 */
86 |         fprintf(stderr, "failed modify qp to init\n");
87 |         exit(1);
88 |     }
89 |     memset(&qp_attr, 0, sizeof(qp_attr));
90 | 
91 |     /* a. Move ring state to ready to receive, this is needed to be able to move ring to ready to send even if receive queue is not enabled */
92 | 
93 |     qp_flags = IBV_QP_STATE;
94 |     qp_attr.qp_state = IBV_QPS_RTR;
95 |     ret = ibv_modify_qp(qp, &qp_attr, qp_flags);
96 |     if (ret != 0) {
97 |         fprintf(stderr, "failed modify qp to receive\n");
98 |         exit(1);
99 |     }
100 | 
101 |     /* b.
Move the ring to ready to send */
102 | 
103 |     qp_flags = IBV_QP_STATE;
104 |     qp_attr.qp_state = IBV_QPS_RTS;
105 |     ret = ibv_modify_qp(qp, &qp_attr, qp_flags);
106 |     if (ret != 0) {
107 |         fprintf(stderr, "failed modify qp to send\n");
108 |         exit(1);
109 |     }
110 | 
111 |     int send_buf_size = P4ML_PACKET_SIZE * my_send_queue_length;
112 | 
113 |     void* send_buf;
114 | 
115 |     //send_buf = malloc(send_buf_size);
116 |     // send_buf = alloc_raw_pages(send_buf_size / EACH_HUGEPAGE_SIZE + 1, EACH_HUGEPAGE_SIZE);
117 |     ib_malloc(&send_buf, send_buf_size);
118 |     if (!send_buf) {
119 |         fprintf(stderr, "Couldn't allocate send memory\n");
120 |         exit(1);
121 |     }
122 | 
123 |     struct ibv_mr* send_mr;
124 |     send_mr = ibv_reg_mr(pd, send_buf, send_buf_size, IBV_ACCESS_LOCAL_WRITE);
125 |     if (!send_mr) {
126 |         fprintf(stderr, "Couldn't register send mr\n");
127 |         exit(1);
128 |     }
129 | 
130 |     // Init CQ. Its size MUST be one so that we get two CQEs in mlx5.
131 |     struct ibv_exp_cq_init_attr cq_init_attr;
132 |     memset(&cq_init_attr, 0, sizeof(cq_init_attr));
133 |     struct ibv_cq* mp_recv_cq = ibv_exp_create_cq(context, kAppRecvCQDepth / 2, nullptr, nullptr, 0, &cq_init_attr);
134 |     assert(mp_recv_cq != nullptr);
135 | 
136 |     // Modify the RECV CQ to ignore overrun
137 |     struct ibv_exp_cq_attr cq_attr;
138 |     memset(&cq_attr, 0, sizeof(cq_attr));
139 |     cq_attr.comp_mask = IBV_EXP_CQ_ATTR_CQ_CAP_FLAGS;
140 |     cq_attr.cq_cap_flags = IBV_EXP_CQ_IGNORE_OVERRUN;
141 |     rt_assert(ibv_exp_modify_cq(mp_recv_cq, &cq_attr, IBV_EXP_CQ_CAP_FLAGS) == 0);
142 | 
143 |     struct ibv_exp_wq_init_attr wq_init_attr;
144 |     memset(&wq_init_attr, 0, sizeof(wq_init_attr));
145 | 
146 |     wq_init_attr.wq_type = IBV_EXP_WQT_RQ;
147 |     wq_init_attr.max_recv_wr = kAppRQDepth;
148 |     wq_init_attr.max_recv_sge = 1;
149 |     wq_init_attr.pd = pd;
150 |     wq_init_attr.cq = mp_recv_cq;
151 | 
152 |     wq_init_attr.comp_mask |= IBV_EXP_CREATE_WQ_MP_RQ;
153 |     wq_init_attr.mp_rq.use_shift = IBV_EXP_MP_RQ_NO_SHIFT;
154 | 
wq_init_attr.mp_rq.single_wqe_log_num_of_strides = kAppLogNumStrides;
155 |     wq_init_attr.mp_rq.single_stride_log_num_of_bytes = kAppLogStrideBytes;
156 |     struct ibv_exp_wq* mp_wq = ibv_exp_create_wq(context, &wq_init_attr);
157 |     assert(mp_wq != nullptr);
158 | 
159 |     // Change WQ to ready state
160 |     struct ibv_exp_wq_attr wq_attr;
161 |     memset(&wq_attr, 0, sizeof(wq_attr));
162 |     wq_attr.attr_mask = IBV_EXP_WQ_ATTR_STATE;
163 |     wq_attr.wq_state = IBV_EXP_WQS_RDY;
164 |     rt_assert(ibv_exp_modify_wq(mp_wq, &wq_attr) == 0);
165 | 
166 |     // Get the RQ burst function
167 |     enum ibv_exp_query_intf_status intf_status = IBV_EXP_INTF_STAT_OK;
168 |     struct ibv_exp_query_intf_params query_intf_params;
169 |     memset(&query_intf_params, 0, sizeof(query_intf_params));
170 |     query_intf_params.intf_scope = IBV_EXP_INTF_GLOBAL;
171 |     query_intf_params.intf = IBV_EXP_INTF_WQ;
172 |     query_intf_params.obj = mp_wq;
173 |     struct ibv_exp_wq_family* mp_wq_family = reinterpret_cast<struct ibv_exp_wq_family*>(
174 |         ibv_exp_query_intf(context, &query_intf_params, &intf_status));
175 |     assert(mp_wq_family != nullptr);
176 | 
177 |     // Create indirect table
178 |     struct ibv_exp_rwq_ind_table_init_attr rwq_ind_table_init_attr;
179 |     memset(&rwq_ind_table_init_attr, 0, sizeof(rwq_ind_table_init_attr));
180 |     rwq_ind_table_init_attr.pd = pd;
181 |     rwq_ind_table_init_attr.log_ind_tbl_size = 0; // Ignore hash
182 |     rwq_ind_table_init_attr.ind_tbl = &mp_wq; // Pointer to RECV work queue
183 |     rwq_ind_table_init_attr.comp_mask = 0;
184 |     struct ibv_exp_rwq_ind_table* mp_ind_tbl = ibv_exp_create_rwq_ind_table(context, &rwq_ind_table_init_attr);
185 |     assert(mp_ind_tbl != nullptr);
186 | 
187 |     // Create rx_hash_conf and indirection table for the QP
188 |     uint8_t toeplitz_key[] = { 0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
189 |         0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
190 |         0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
191 |         0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
192 |         0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac,
0x01, 0xfa }; 193 | const int TOEPLITZ_RX_HASH_KEY_LEN = sizeof(toeplitz_key) / sizeof(toeplitz_key[0]); 194 | 195 | struct ibv_exp_rx_hash_conf rx_hash_conf; 196 | memset(&rx_hash_conf, 0, sizeof(rx_hash_conf)); 197 | rx_hash_conf.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ; 198 | rx_hash_conf.rx_hash_key_len = TOEPLITZ_RX_HASH_KEY_LEN; 199 | rx_hash_conf.rx_hash_key = toeplitz_key; 200 | rx_hash_conf.rx_hash_fields_mask = IBV_EXP_RX_HASH_DST_PORT_UDP; 201 | rx_hash_conf.rwq_ind_tbl = mp_ind_tbl; 202 | 203 | struct ibv_exp_qp_init_attr mp_qp_init_attr; 204 | memset(&mp_qp_init_attr, 0, sizeof(mp_qp_init_attr)); 205 | mp_qp_init_attr.comp_mask = IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS | IBV_EXP_QP_INIT_ATTR_PD | IBV_EXP_QP_INIT_ATTR_RX_HASH; 206 | mp_qp_init_attr.rx_hash_conf = &rx_hash_conf; 207 | mp_qp_init_attr.pd = pd; 208 | mp_qp_init_attr.qp_type = IBV_QPT_RAW_PACKET; 209 | 210 | // Create the QP 211 | struct ibv_qp* mp_recv_qp = ibv_exp_create_qp(context, &mp_qp_init_attr); 212 | assert(mp_recv_qp != nullptr); 213 | 214 | size_t tx_ring_size = P4ML_LAYER_SIZE * kAppMaxPostlist; 215 | uint8_t* mp_send_ring; 216 | ib_malloc((void **)&mp_send_ring, tx_ring_size); 217 | rt_assert(mp_send_ring != nullptr); 218 | memset(mp_send_ring, 0, tx_ring_size); 219 | 220 | struct ibv_mr* mp_send_mr = ibv_reg_mr(pd, mp_send_ring, tx_ring_size, IBV_ACCESS_LOCAL_WRITE); 221 | rt_assert(mp_send_mr != nullptr); 222 | 223 | // Register RX ring memory 224 | uint8_t* mp_recv_ring; 225 | ib_malloc((void **)&mp_recv_ring, kAppRingSize); 226 | rt_assert(mp_recv_ring != nullptr); 227 | memset(mp_recv_ring, 0, kAppRingSize); 228 | 229 | struct ibv_mr* mp_mr = ibv_reg_mr(pd, mp_recv_ring, kAppRingSize, IBV_ACCESS_LOCAL_WRITE); 230 | rt_assert(mp_mr != nullptr); 231 | ///////////////////////////////////////////////////////////////////////////////////// 232 | // install_flow_rule(mp_recv_qp, 30720 + thread_id); 233 | install_flow_rule(mp_recv_qp, thread_id, isPS); 234 | // This cast works 
for mlx5 where ibv_cq is the first member of mlx5_cq.
235 |     auto* _mlx5_cq = reinterpret_cast<mlx5_cq*>(mp_recv_cq);
236 |     rt_assert(kAppRecvCQDepth == std::pow(2, _mlx5_cq->cq_log_size));
237 |     rt_assert(_mlx5_cq->buf_a.buf != nullptr);
238 | 
239 |     auto* mp_cqe_arr = reinterpret_cast<volatile mlx5_cqe64*>(_mlx5_cq->buf_a.buf);
240 | 
241 |     // Initialize the CQEs as if we received the last (kAppRecvCQDepth) packets
242 |     // in the CQE cycle.
243 |     static_assert(kAppStridesPerWQE >= kAppRecvCQDepth, "");
244 |     for (size_t i = 0; i < kAppRecvCQDepth; i++) {
245 |         mp_cqe_arr[i].wqe_id = htons(std::numeric_limits<uint16_t>::max());
246 |         // Last CQE gets
247 |         // * wqe_counter = (kAppStridesPerWQE - 1)
248 |         // * snapshot_cycle_idx = (kAppCQESnapshotCycle - 1)
249 |         mp_cqe_arr[i].wqe_counter = htons(kAppStridesPerWQE - (kAppRecvCQDepth - i));
250 | 
251 |         cqe_snapshot_t snapshot;
252 |         snapshot_cqe(&mp_cqe_arr[i], snapshot);
253 |         rt_assert(snapshot.get_cqe_snapshot_cycle_idx() == kAppCQESnapshotCycle - (kAppRecvCQDepth - i));
254 |     }
255 | 
256 |     // The multi-packet RECVs. This must be done after we've initialized the CQE.
257 |     struct ibv_sge* mp_sge = reinterpret_cast<struct ibv_sge*>(malloc(sizeof(struct ibv_sge) * kAppRQDepth));
258 |     for (size_t i = 0; i < kAppRQDepth; i++) {
259 |         size_t mpwqe_offset = i * (kAppRingMbufSize * kAppStridesPerWQE);
260 |         mp_sge[i].addr = reinterpret_cast<uint64_t>(&mp_recv_ring[mpwqe_offset]);
261 |         mp_sge[i].lkey = mp_mr->lkey;
262 |         mp_sge[i].length = kAppRingMbufSize * kAppStridesPerWQE; //kAppRingSize;
263 |         mp_wq_family->recv_burst(mp_wq, &mp_sge[i], 1);
264 |     }
265 | 
266 |     printf("[Thread %d] Finished creating QP - ", thread_id);
267 |     printf("kAppRingMbufSize=%lu, kAppStridesPerWQE=%lu, kAppRingSize=%lu, kAppRQDepth=%lu\n", kAppRingMbufSize, kAppStridesPerWQE, kAppRingSize, kAppRQDepth);
268 |     auto* cqe_arr = mp_cqe_arr;
269 |     cqe_snapshot_t prev_snapshot;
270 |     snapshot_cqe(&cqe_arr[kAppRecvCQDepth - 1], prev_snapshot);
271 | 
272 |     return new DMAcontext{
273 |         .pd = pd,
274 |         .ctx = context,
275 |         .receive_cq = rec_cq,
276 |         .send_cq = snd_cq,
277 |         .send_mr = send_mr,
278 |         .send_region = send_buf,
279 |         .data_qp = qp,
280 | 
281 |         .mp_recv_qp = mp_recv_qp,
282 |         .mp_recv_cq = mp_recv_cq,
283 |         .mp_wq = mp_wq,
284 |         .mp_wq_family = mp_wq_family,
285 |         .mp_ind_tbl = mp_ind_tbl,
286 |         .mp_cqe_arr = mp_cqe_arr,
287 |         .mp_sge = mp_sge,
288 |         .mp_recv_ring = mp_recv_ring,
289 |         .mp_send_ring = mp_send_ring,
290 |         .mp_send_mr = mp_send_mr,
291 | 
292 |         .id = thread_id,
293 |         .total_received = 0,
294 |         .total_sent = 0,
295 |         .my_send_queue_length = my_send_queue_length,
296 |         .my_recv_queue_length = my_recv_queue_length,
297 | 
298 |         .ring_head = 0,
299 |         .nb_rx_rolling = 0,
300 |         .sge_idx = 0,
301 |         .cqe_idx = 0,
302 |         .prev_snapshot = prev_snapshot,
303 |         .isPS = isPS,
304 |         .isMarkTimeStamp = false,
305 |     };
306 | }
307 | 
308 | void send_packet(DMAcontext* dma_context, int chunk_size, uint64_t offset)
309 | {
310 |     int ret;
311 | 
312 |     struct ibv_sge sg;
313 |     struct ibv_exp_send_wr wr, *bad_wr;
314 |     // struct ibv_send_wr wr;
315 |     // struct ibv_send_wr *bad_wr;
316 | 
317 | 
memset(&sg, 0, sizeof(sg));
318 |     sg.addr = (uintptr_t)((char*)dma_context->send_region + offset * P4ML_LAYER_SIZE);
319 |     // printf("%d\n", sg.addr);
320 |     sg.length = chunk_size;
321 |     sg.lkey = dma_context->send_mr->lkey;
322 | 
323 |     memset(&wr, 0, sizeof(wr));
324 |     wr.wr_id = 0;
325 |     wr.sg_list = &sg;
326 |     wr.num_sge = 1;
327 |     // wr.opcode = IBV_WR_SEND;
328 |     wr.exp_opcode = IBV_EXP_WR_TSO;
329 |     wr.tso.mss = P4ML_LAYER_SIZE; // Maximum Segment Size
330 |     wr.tso.hdr_sz = IP_ETH_UDP_HEADER_SIZE; // ETH/IPv4/UDP header
331 |     char hdr[IP_ETH_UDP_HEADER_SIZE]; // ETH/IPv4/UDP header
332 |     if (dma_context->isPS)
333 |         memcpy(hdr, PS_IP_ETH_UDP_HEADER, IP_ETH_UDP_HEADER_SIZE); // Assuming that the header buffer was defined before.
334 |     else
335 |         memcpy(hdr, WORKER_IP_ETH_UDP_HEADER, IP_ETH_UDP_HEADER_SIZE); // Assuming that the header buffer was defined before.
336 | 
337 |     hdr[5] = dma_context->id;
338 |     // hdr[37] = dma_context->id;
339 |     wr.tso.hdr = hdr; // There is no need to use malloc in this case; a local definition of hdr is OK.
340 |     //wr.exp_send_flags = IBV_SEND_INLINE;
341 |     wr.exp_send_flags |= IBV_SEND_SIGNALED;
342 | 
343 |     if (DEBUG_PRINT_ALL_SENDING_PACKET)
344 |         for (int i = 0; i < chunk_size / P4ML_LAYER_SIZE; i++)
345 |             p4ml_header_print_h((agghdr*)((char *)sg.addr + i * P4ML_LAYER_SIZE), "SEND");
346 | 
347 |     // mark first time sending timestamp
348 |     if (dma_context->isMarkTimeStamp) {
349 |         std::chrono::high_resolution_clock::time_point current_time = std::chrono::high_resolution_clock::now();
350 |         for (int i = 0; i < chunk_size / P4ML_LAYER_SIZE; i++) {
351 |             agghdr* p4ml_header = (agghdr*)((char*)sg.addr + i * P4ML_LAYER_SIZE);
352 |             if (!dma_context->isSent[ntohs(p4ml_header->seq_num)]) {
353 |                 dma_context->isSent[ntohs(p4ml_header->seq_num)] = true;
354 |                 dma_context->first_send_time[ntohs(p4ml_header->seq_num)] = current_time;
355 |             } else {
356 |                 /* Resend may trigger */
357 |             }
358 |         }
359 |     }
360 | 
361 |     // No need to wait on the send CQ: a received response implies the send completed.
362 |     ret = ibv_exp_post_send(dma_context->data_qp, &wr, &bad_wr);
363 |     if (ret != 0) {
364 |         fprintf(stderr, "failed in post send\n");
365 |         exit(1);
366 |     }
367 | 
368 |     struct ibv_wc wc_send_cq[POLLING_SIZE];
369 |     ibv_poll_cq(dma_context->send_cq, POLLING_SIZE, wc_send_cq);
370 |     if (DEBUG_CHECK_SEND_RECEIVE_TOTAL)
371 |         dma_context->total_sent += chunk_size / P4ML_LAYER_SIZE;
372 | }
373 | 
374 | size_t receive_packet(DMAcontext *dma_context, cqe_snapshot_t* new_snapshot)
375 | {
376 |     // cqe_snapshot_t new_snapshot;
377 |     // cur_snapshot = new_snapshot;
378 |     snapshot_cqe(&dma_context->mp_cqe_arr[dma_context->cqe_idx], *new_snapshot);
379 |     const size_t delta = get_cycle_delta(dma_context->prev_snapshot, *new_snapshot);
380 | 
381 |     if (!(delta == 0 || delta >= kAppNumRingEntries)) {
382 |         if (DEBUG_CHECK_SEND_RECEIVE_TOTAL)
383 |             dma_context->total_received += delta;
384 |         return delta;
385 |     }
386 |     else
387 |         return 0;
388 |     // return delta;
389 | }
390 | 
391 | void dma_postback(DMAcontext *dma_context)
392 | {
393 |     dma_context->ring_head = (dma_context->ring_head + 1) % kAppNumRingEntries;
394 |     dma_context->nb_rx_rolling++;
395 |     if (dma_context->nb_rx_rolling == kAppStridesPerWQE)
396 |     {
397 |         dma_context->nb_rx_rolling = 0;
398 |         int ret = dma_context->mp_wq_family->recv_burst(dma_context->mp_wq, &dma_context->mp_sge[dma_context->sge_idx], 1);
399 |         rt_assert(ret == 0);
400 |         dma_context->sge_idx = (dma_context->sge_idx + 1) % kAppRQDepth;
401 |     }
402 | }
403 | 
404 | void dma_update_snapshot(DMAcontext *dma_context, cqe_snapshot_t new_snapshot)
405 | {
406 |     dma_context->prev_snapshot = new_snapshot;
407 |     dma_context->cqe_idx = (dma_context->cqe_idx + 1) % kAppRecvCQDepth;
408 | }
409 | 
410 | const char* ibv_wc_opcode_str(enum ibv_wc_opcode opcode)
411 | {
412 |     switch (opcode) {
413 |     case IBV_EXP_WC_SEND:
414 |         return "IBV_WC_SEND";
415 |     case IBV_EXP_WC_RDMA_WRITE:
416 |         return "IBV_WC_RDMA_WRITE";
417 |     case IBV_EXP_WC_RDMA_READ:
418 |         return "IBV_WC_RDMA_READ";
419 |     case IBV_WC_COMP_SWAP:
420 |         return "IBV_WC_COMP_SWAP";
421 |     case IBV_WC_FETCH_ADD:
422 |         return "IBV_WC_FETCH_ADD";
423 |     case IBV_WC_BIND_MW:
424 |         return "IBV_WC_BIND_MW";
425 |     /* receive-side: inbound completion */
426 |     case IBV_EXP_WC_RECV:
427 |         return "IBV_WC_RECV";
428 |     case IBV_EXP_WC_RECV_RDMA_WITH_IMM:
429 |         return "IBV_WC_RECV_RDMA_WITH_IMM";
430 |     default:
431 |         return "IBV_WC_UNKNOWN";
432 |     }
433 | }
434 | 
435 | // Install a flow rule
436 | void install_flow_rule(struct ibv_qp* qp, uint16_t thread_id, bool isPS)
437 | {
438 |     static constexpr size_t rule_sz = sizeof(ibv_exp_flow_attr) + sizeof(ibv_exp_flow_spec_eth) + sizeof(ibv_exp_flow_spec_ipv4_ext);
439 | 
440 |     uint8_t* flow_rule = new uint8_t[rule_sz];
441 |     memset(flow_rule, 0, rule_sz);
442 |     uint8_t* buf = flow_rule;
443 | 
444 |     auto* flow_attr = reinterpret_cast<struct ibv_exp_flow_attr*>(flow_rule);
445 |     flow_attr->type = IBV_EXP_FLOW_ATTR_NORMAL;
446 |     flow_attr->size = rule_sz;
447 |     flow_attr->priority = 0;
448 | 
flow_attr->num_of_specs = 1;
449 |     flow_attr->port = 1;
450 |     flow_attr->flags = 0;
451 |     flow_attr->reserved = 0;
452 |     buf += sizeof(struct ibv_exp_flow_attr);
453 | 
454 |     // Ethernet - all wildcard
455 |     auto* eth_spec = reinterpret_cast<struct ibv_exp_flow_spec_eth*>(buf);
456 |     eth_spec->type = IBV_EXP_FLOW_SPEC_ETH;
457 |     eth_spec->size = sizeof(struct ibv_exp_flow_spec_eth);
458 |     buf += sizeof(struct ibv_exp_flow_spec_eth);
459 | 
460 |     const unsigned char R_SRC_MAC[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
461 |     unsigned char R_DST_MAC[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
462 |     if (isPS)
463 |         memcpy(R_DST_MAC, PS_FILTER_TEMPLATE_R, sizeof(R_DST_MAC));
464 |     else
465 |         memcpy(R_DST_MAC, WORKER_FILTER_TEMPLATE_R, sizeof(R_DST_MAC));
466 | 
467 |     R_DST_MAC[5] = thread_id;
468 | 
469 |     const unsigned char R_SRC_MAC_MASK[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
470 |     const unsigned char R_DST_MAC_MASK[] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
471 |     memcpy(eth_spec->val.dst_mac, R_DST_MAC, sizeof(R_DST_MAC));
472 |     memcpy(eth_spec->val.src_mac, R_SRC_MAC, sizeof(R_SRC_MAC));
473 |     memcpy(eth_spec->mask.dst_mac, R_DST_MAC_MASK, sizeof(R_DST_MAC_MASK));
474 |     memcpy(eth_spec->mask.src_mac, R_SRC_MAC_MASK, sizeof(R_SRC_MAC_MASK));
475 |     eth_spec->val.vlan_tag = 0;
476 |     eth_spec->mask.ether_type = 0;
477 | 
478 |     rt_assert(ibv_exp_create_flow(qp, flow_attr) != nullptr);
479 | }
480 | 
481 | // Install a UDP destination port-based flow rule
482 | void install_udp_flow_rule(struct ibv_qp* qp, uint16_t dst_port)
483 | {
484 |     static constexpr size_t rule_sz = sizeof(ibv_exp_flow_attr) + sizeof(ibv_exp_flow_spec_eth) + sizeof(ibv_exp_flow_spec_ipv4_ext) + sizeof(ibv_exp_flow_spec_tcp_udp);
485 | 
486 |     uint8_t* flow_rule = new uint8_t[rule_sz];
487 |     memset(flow_rule, 0, rule_sz);
488 |     uint8_t* buf = flow_rule;
489 | 
490 |     auto* flow_attr = reinterpret_cast<struct ibv_exp_flow_attr*>(flow_rule);
491 |     flow_attr->type = IBV_EXP_FLOW_ATTR_NORMAL;
492 |     flow_attr->size = rule_sz;
493 |     flow_attr->priority = 0;
494 | 
flow_attr->num_of_specs = 3; // eth + ipv4 + udp specs below
495 |     flow_attr->port = 1;
496 |     flow_attr->flags = 0;
497 |     flow_attr->reserved = 0;
498 |     buf += sizeof(struct ibv_exp_flow_attr);
499 | 
500 |     // Ethernet - all wildcard
501 |     auto* eth_spec = reinterpret_cast<struct ibv_exp_flow_spec_eth*>(buf);
502 |     eth_spec->type = IBV_EXP_FLOW_SPEC_ETH;
503 |     eth_spec->size = sizeof(struct ibv_exp_flow_spec_eth);
504 |     buf += sizeof(struct ibv_exp_flow_spec_eth);
505 | 
506 |     // IPv4 - all wildcard
507 |     auto* spec_ipv4 = reinterpret_cast<struct ibv_exp_flow_spec_ipv4_ext*>(buf);
508 |     spec_ipv4->type = IBV_EXP_FLOW_SPEC_IPV4_EXT;
509 |     spec_ipv4->size = sizeof(struct ibv_exp_flow_spec_ipv4_ext);
510 |     buf += sizeof(struct ibv_exp_flow_spec_ipv4_ext);
511 | 
512 |     // UDP - match dst port
513 |     auto* udp_spec = reinterpret_cast<struct ibv_exp_flow_spec_tcp_udp*>(buf);
514 |     udp_spec->type = IBV_EXP_FLOW_SPEC_UDP;
515 |     udp_spec->size = sizeof(struct ibv_exp_flow_spec_tcp_udp);
516 |     udp_spec->val.dst_port = htons(dst_port);
517 |     udp_spec->mask.dst_port = 0xffffu;
518 | 
519 | 
520 |     rt_assert(ibv_exp_create_flow(qp, flow_attr) != nullptr);
521 | }
522 | 
523 | void snapshot_cqe(volatile mlx5_cqe64* cqe, cqe_snapshot_t& cqe_snapshot)
524 | {
525 |     while (true) {
526 |         uint16_t wqe_id_0 = cqe->wqe_id;
527 |         uint16_t wqe_counter_0 = cqe->wqe_counter;
528 |         memory_barrier();
529 |         uint16_t wqe_id_1 = cqe->wqe_id;
530 | 
531 |         if (likely(wqe_id_0 == wqe_id_1)) {
532 |             cqe_snapshot.wqe_id = ntohs(wqe_id_0);
533 |             cqe_snapshot.wqe_counter = ntohs(wqe_counter_0);
534 |             return;
535 |         }
536 |     }
537 | }
538 | 
539 | size_t get_cycle_delta(const cqe_snapshot_t& prev, const cqe_snapshot_t& cur)
540 | {
541 |     size_t prev_idx = prev.get_cqe_snapshot_cycle_idx();
542 |     size_t cur_idx = cur.get_cqe_snapshot_cycle_idx();
543 |     assert(prev_idx < kAppCQESnapshotCycle && cur_idx < kAppCQESnapshotCycle);
544 | 
545 |     return ((cur_idx + kAppCQESnapshotCycle) - prev_idx) % kAppCQESnapshotCycle;
546 | }
547 | --------------------------------------------------------------------------------
/common/dma_common.h: -------------------------------------------------------------------------------- 1 | #ifndef DMA_COMMON_H 2 | #define DMA_COMMON_H 3 | 4 | #include "mlx5_defs.h" 5 | #include "packet.h" 6 | #include "utils.h" 7 | #include 8 | #include 9 | #include 10 | #include //ifreq 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | 24 | #define POLLING_SIZE 400 25 | #define ENTRY_SIZE 256 /* maximum size of each buffer */ 26 | #define PORT_NUM 1 27 | 28 | #define DEBUG_PRINT_ALL_SENDING_PACKET false 29 | #define DEBUG_PRINT_ALL_RECEIVING_PACKET false 30 | 31 | #define DEBUG_CHECK_SEND_RECEIVE_TOTAL false 32 | 33 | static constexpr size_t kAppRecvCQDepth = 8; 34 | static constexpr size_t kAppRQDepth = 4; // Multi-packet RQ depth 35 | 36 | static constexpr size_t kAppLogNumStrides = 9; 37 | static constexpr size_t kAppLogStrideBytes = 9; 38 | static constexpr size_t kAppMaxPostlist = 512; 39 | 40 | static constexpr bool kAppVerbose = false; 41 | static constexpr bool kAppCheckContents = true; // Check buffer contents 42 | 43 | /// Size of one ring message buffer 44 | static constexpr size_t kAppRingMbufSize = (1ull << kAppLogStrideBytes); 45 | 46 | /// Number of strides in one multi-packet RECV WQE 47 | static constexpr size_t kAppStridesPerWQE = (1ull << kAppLogNumStrides); 48 | 49 | /// Packets after which the CQE snapshot cycles 50 | static constexpr size_t kAppCQESnapshotCycle = 65536 * kAppStridesPerWQE; 51 | 52 | /// Total number of entries in the RX ring 53 | static constexpr size_t kAppNumRingEntries = (kAppStridesPerWQE * kAppRQDepth); 54 | 55 | static constexpr size_t kAppRingSize = (kAppNumRingEntries * kAppRingMbufSize); 56 | 57 | /// A consistent snapshot of CQE fields in host endian format 58 | struct cqe_snapshot_t { 59 | uint16_t wqe_id; 60 | uint16_t wqe_counter; 61 | 62 | /// Return this packet's index in the CQE 
snapshot cycle 63 | size_t get_cqe_snapshot_cycle_idx() const 64 | { 65 | return wqe_id * kAppStridesPerWQE + wqe_counter; 66 | } 67 | 68 | std::string to_string() 69 | { 70 | std::ostringstream ret; 71 | ret << "[ID " << std::to_string(wqe_id) << ", counter " 72 | << std::to_string(wqe_counter) << "]"; 73 | return ret.str(); 74 | } 75 | }; 76 | 77 | struct DMAcontext { 78 | struct ibv_pd* pd; 79 | struct ibv_context* ctx; 80 | struct ibv_cq* receive_cq; 81 | struct ibv_cq* send_cq; 82 | struct ibv_mr* send_mr; 83 | void* send_region; 84 | struct ibv_qp* data_qp; 85 | 86 | struct ibv_qp* mp_recv_qp; 87 | struct ibv_cq* mp_recv_cq; 88 | struct ibv_exp_wq* mp_wq; 89 | struct ibv_exp_wq_family* mp_wq_family; 90 | struct ibv_exp_rwq_ind_table* mp_ind_tbl; 91 | volatile mlx5_cqe64* mp_cqe_arr; 92 | struct ibv_sge* mp_sge; 93 | uint8_t* mp_recv_ring; 94 | uint8_t* mp_send_ring; 95 | struct ibv_mr* mp_send_mr; 96 | 97 | // for connection 98 | int id; 99 | int total_received; 100 | int total_sent; 101 | int my_send_queue_length; 102 | int my_recv_queue_length; 103 | 104 | size_t ring_head; 105 | size_t nb_rx_rolling; 106 | size_t sge_idx; 107 | size_t cqe_idx; 108 | 109 | cqe_snapshot_t prev_snapshot; 110 | 111 | bool isPS; 112 | 113 | // // For window adjustment 114 | bool isMarkTimeStamp; 115 | bool* isSent; 116 | std::chrono::high_resolution_clock::time_point* first_send_time; 117 | std::chrono::high_resolution_clock::time_point* first_receive_time; 118 | }; 119 | 120 | DMAcontext* DMA_create(ibv_device* ib_dev, int thread_id, bool isPS); 121 | const char* ibv_wc_opcode_str(enum ibv_wc_opcode opcode); 122 | void send_packet(DMAcontext* dma_context, int packet_size, uint64_t offset); 123 | size_t receive_packet(DMAcontext *dma_context, cqe_snapshot_t* new_snapshot); 124 | void dma_postback(DMAcontext *dma_context); 125 | void dma_update_snapshot(DMAcontext *dma_context, cqe_snapshot_t new_snapshot); 126 | void dma_context_print(DMAcontext* dma_context, const char* 
caption); 127 | 128 | // Install a UDP destination port--based flow rule 129 | void install_flow_rule(struct ibv_qp* qp, uint16_t thread_id, bool isPS); 130 | void install_udp_flow_rule(struct ibv_qp* qp, uint16_t dst_port); 131 | void snapshot_cqe(volatile mlx5_cqe64* cqe, cqe_snapshot_t& cqe_snapshot); 132 | size_t get_cycle_delta(const cqe_snapshot_t& prev, const cqe_snapshot_t& cur); 133 | #endif 134 | -------------------------------------------------------------------------------- /common/dma_common.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/common/dma_common.o -------------------------------------------------------------------------------- /common/mlx5_defs.h: -------------------------------------------------------------------------------- 1 | #ifndef MLX5_DEFS_H 2 | #define MLX5_DEFS_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | enum mlx5_alloc_type { 10 | MLX5_ALLOC_TYPE_ANON, 11 | MLX5_ALLOC_TYPE_HUGE, 12 | MLX5_ALLOC_TYPE_CONTIG, 13 | MLX5_ALLOC_TYPE_PEER_DIRECT, 14 | MLX5_ALLOC_TYPE_PREFER_HUGE, 15 | MLX5_ALLOC_TYPE_PREFER_CONTIG, 16 | MLX5_ALLOC_TYPE_ALL 17 | }; 18 | 19 | enum mlx5_lock_type { 20 | MLX5_SPIN_LOCK = 0, 21 | MLX5_MUTEX = 1, 22 | }; 23 | 24 | enum mlx5_lock_state { MLX5_USE_LOCK, 25 | MLX5_LOCKED, 26 | MLX5_UNLOCKED }; 27 | 28 | struct mlx5_lock { 29 | pthread_mutex_t mutex; 30 | pthread_spinlock_t slock; 31 | enum mlx5_lock_state state; 32 | enum mlx5_lock_type type; 33 | }; 34 | 35 | struct mlx5_numa_req { 36 | int valid; 37 | int numa_id; 38 | }; 39 | 40 | struct mlx5_peer_direct_mem { 41 | uint32_t dir; 42 | uint64_t va_id; 43 | struct ibv_exp_peer_buf* pb; 44 | struct ibv_exp_peer_direct_attr* ctx; 45 | }; 46 | 47 | struct mlx5_buf { 48 | void* buf; 49 | size_t length; 50 | int base; 51 | struct mlx5_hugetlb_mem* hmem; 52 | struct mlx5_peer_direct_mem peer; 53 | enum 
mlx5_alloc_type type; 54 | struct mlx5_numa_req numa_req; 55 | int numa_alloc; 56 | }; 57 | 58 | struct mlx5_mini_cqe8 { 59 | union { 60 | uint32_t rx_hash_result; 61 | uint32_t checksum; 62 | struct { 63 | uint16_t wqe_counter; 64 | uint8_t s_wqe_opcode; 65 | uint8_t reserved; 66 | } s_wqe_info; 67 | }; 68 | uint32_t byte_cnt; 69 | }; 70 | 71 | enum { MLX5_MINI_ARR_SIZE = 8 }; 72 | 73 | struct mlx5_tm_cqe { 74 | uint32_t success; 75 | uint32_t hw_phase_cnt; 76 | uint8_t rsvd0[10]; 77 | }; 78 | 79 | struct mlx5_cqe64 { 80 | uint8_t rsvd0[2]; 81 | /* 82 | * wqe_id is valid only for 83 | * Striding RQ (Multi-Packet RQ). 84 | * It provides the WQE index inside the RQ. 85 | */ 86 | uint16_t wqe_id; 87 | uint8_t rsvd4[8]; 88 | uint32_t rx_hash_res; 89 | uint8_t rx_hash_type; 90 | uint8_t ml_path; 91 | uint8_t rsvd20[2]; 92 | uint16_t checksum; 93 | uint16_t slid; 94 | uint32_t flags_rqpn; 95 | uint8_t hds_ip_ext; 96 | uint8_t l4_hdr_type_etc; 97 | __be16 vlan_info; 98 | uint32_t srqn_uidx; 99 | uint32_t imm_inval_pkey; 100 | uint8_t app; 101 | uint8_t app_op; 102 | uint16_t app_info; 103 | uint32_t byte_cnt; 104 | __be64 timestamp; 105 | union { 106 | uint32_t sop_drop_qpn; 107 | struct { 108 | uint8_t sop; 109 | uint8_t qpn[3]; 110 | } sop_qpn; 111 | }; 112 | /* 113 | * In Striding RQ (Multi-Packet RQ) wqe_counter provides 114 | * the WQE stride index (to calc pointer to start of the message) 115 | */ 116 | uint16_t wqe_counter; 117 | uint8_t signature; 118 | uint8_t op_own; 119 | }; 120 | 121 | struct mlx5_cq { 122 | struct ibv_cq ibv_cq; 123 | uint32_t creation_flags; 124 | uint32_t pattern; 125 | struct mlx5_buf buf_a; 126 | struct mlx5_buf buf_b; 127 | struct mlx5_buf* active_buf; 128 | struct mlx5_buf* resize_buf; 129 | int resize_cqes; 130 | int active_cqes; 131 | struct mlx5_lock lock; 132 | uint32_t cqn; 133 | uint32_t cons_index; 134 | uint32_t wait_index; 135 | uint32_t wait_count; 136 | volatile uint32_t* dbrec; 137 | int arm_sn; 138 | int cqe_sz; 139 | int 
resize_cqe_sz; 140 | int stall_next_poll; 141 | int stall_enable; 142 | uint64_t stall_last_count; 143 | int stall_adaptive_enable; 144 | int stall_cycles; 145 | uint8_t model_flags; /* use mlx5_cq_model_flags */ 146 | uint16_t cqe_comp_max_num; 147 | uint8_t cq_log_size; 148 | /* Compressed CQE data */ 149 | struct mlx5_cqe64 next_decomp_cqe64; 150 | struct mlx5_resource* compressed_rsc; 151 | uint16_t compressed_left; 152 | uint16_t compressed_wqe_cnt; 153 | uint8_t compressed_req; 154 | uint8_t compressed_mp_rq; 155 | uint8_t mini_arr_idx; 156 | struct mlx5_mini_cqe8 mini_array[MLX5_MINI_ARR_SIZE]; 157 | /* peer-direct data */ 158 | int peer_enabled; 159 | struct ibv_exp_peer_direct_attr* peer_ctx; 160 | struct mlx5_buf peer_buf; 161 | struct mlx5_peek_entry** peer_peek_table; 162 | struct mlx5_peek_entry* peer_peek_free; 163 | }; 164 | 165 | #endif // MLX5_DEFS_H 166 | -------------------------------------------------------------------------------- /common/p4ml_struct.h: -------------------------------------------------------------------------------- 1 | #ifndef P4ML_STRUCT_H 2 | #define P4ML_STRUCT_H 3 | #include 4 | 5 | #include "packet.h" 6 | 7 | struct ThreadInfo 8 | { 9 | int thread_id; 10 | int agtr_start_pos; 11 | }; 12 | 13 | struct Job 14 | { 15 | uint64_t key; 16 | float *float_data; 17 | int32_t *int_data; 18 | uint32_t len; 19 | int cmd; 20 | }; 21 | 22 | struct AppInfo 23 | { 24 | uint32_t host; 25 | uint16_t appID; 26 | uint8_t num_worker; 27 | uint8_t num_PS; 28 | }; 29 | 30 | #endif -------------------------------------------------------------------------------- /common/packet.h: -------------------------------------------------------------------------------- 1 | #ifndef PACKET_P4ML_H 2 | #define PACKET_P4ML_H 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include "utils.h" 15 | #include "p4ml_struct.h" 16 | 17 | #define PS_FILTER_TEMPLATE 
0x05, 0x04, 0x03, 0x02, 0x01, 0xFF 18 | #define WORKER_FILTER_TEMPLATE 0x77, 0x77, 0x77, 0x77, 0x77, 0xFF 19 | 20 | // #define SRC_MAC 0xb8, 0x59, 0x9f, 0x1d, 0x04, 0xf2 21 | #define SRC_MAC 0xe4, 0x1d, 0x2d, 0xf3, 0xdd, 0xcc 22 | // #define DST_MAC 0xb8, 0x59, 0x9f, 0x0b, 0x30, 0x72 23 | 24 | #define ETH_TYPE 0x07, 0x00 25 | 26 | #define IP_HDRS 0x45, 0x00, 0x00, 0x54, 0x00, 0x00, 0x40, 0x00, 0x40, 0x01, 0xaf, 0xb6 27 | 28 | #define SRC_IP 0x0d, 0x07, 0x38, 0x66 29 | 30 | #define DST_IP 0x0d, 0x07, 0x38, 0x7f 31 | 32 | #define SRC_PORT 0x67, 0x67 33 | 34 | #define DST_PORT 0x78, 0x78 35 | 36 | #define UDP_HDRS 0x00, 0x00, 0x00, 0x00 37 | 38 | // Only a template, DST_IP will be modified soon 39 | // This one is for sending 40 | const unsigned char PS_IP_ETH_UDP_HEADER[] = { WORKER_FILTER_TEMPLATE, SRC_MAC, ETH_TYPE, IP_HDRS, SRC_IP, DST_IP }; 41 | const unsigned char WORKER_IP_ETH_UDP_HEADER[] = { PS_FILTER_TEMPLATE, SRC_MAC, ETH_TYPE, IP_HDRS, SRC_IP, DST_IP }; 42 | 43 | // P4ML_PACKET_SIZE = IP_ETH_HEADER_SIZE + P4ML_HEADER_SIZE + P4ML_DATA_SIZE 44 | #define P4ML_PACKET_SIZE 309 //308+1 45 | #define P4ML_DATA_SIZE 248 46 | #define P4ML_HEADER_SIZE 27 //26+1 47 | #define P4ML_LAYER_SIZE 275 //274+1 48 | #define IP_ETH_UDP_HEADER_SIZE 34 49 | 50 | #define MAX_ENTRIES_PER_PACKET 62 51 | 52 | #define BYTE_TO_BINARY_PATTERN "%c%c%c%c%c%c%c%c" 53 | #define BYTE_TO_BINARY(byte) \ 54 | (byte & 0x80 ? '1' : '0'), \ 55 | (byte & 0x40 ? '1' : '0'), \ 56 | (byte & 0x20 ? '1' : '0'), \ 57 | (byte & 0x10 ? '1' : '0'), \ 58 | (byte & 0x08 ? '1' : '0'), \ 59 | (byte & 0x04 ? '1' : '0'), \ 60 | (byte & 0x02 ? '1' : '0'), \ 61 | (byte & 0x01 ? 
'1' : '0') 62 | 63 | #pragma pack(push, 1) 64 | struct agghdr { 65 | uint32_t bitmap; 66 | uint8_t num_worker; 67 | uint8_t flag; 68 | // reserved : 2; 69 | // isForceFoward : 1; 70 | 71 | /* Current version 72 | overflow : 1; 73 | PSIndex : 2; 74 | dataIndex : 1; 75 | ECN : 1; 76 | isResend : 1; 77 | isSWCollision : 1; 78 | isACK : 1; 79 | */ 80 | 81 | uint16_t appID; 82 | uint16_t seq_num; 83 | uint8_t is_lzy_Col; //is_Col in A2TP 84 | uint16_t agtr; 85 | uint16_t agtr2; 86 | int32_t vector[MAX_ENTRIES_PER_PACKET]; 87 | uint64_t key; 88 | uint32_t len_tensor; 89 | }; 90 | #pragma pack(pop) 91 | 92 | static std::mutex _packet_print_mutex; 93 | 94 | void inline make_p4ml_layer_and_copy_to(void* payload, Job* job_info, AppInfo* app_info, uint16_t* agtr, uint16_t* seq_num, int* offset, bool isResend, bool isForceForward, bool isOverflow, bool isForceCollision = false) 95 | { 96 | agghdr* agg_header = (agghdr*)payload; 97 | agghdr* p4ml_header = agg_header; 98 | agg_header->key = job_info->key; 99 | agg_header->len_tensor = htonl(job_info->len); 100 | agg_header->bitmap = htonl(1 << (app_info->host)); 101 | agg_header->num_worker = app_info->num_worker; 102 | agg_header->appID = htons(app_info->appID); 103 | agg_header->flag = 0; 104 | agg_header->agtr = htons(*agtr); 105 | //TODO: clarify this and UsedSwitchAGTRcount 106 | agg_header->agtr2 = htons(*agtr + MAX_AGTR_COUNT); 107 | agg_header->seq_num = htons(*seq_num); 108 | 109 | agg_header->flag = ((job_info->key % app_info->num_PS)) << 5; 110 | 111 | agg_header->is_lzy_Col = 0; 112 | //agg_header->flag = 0; //lzy 113 | if (isResend) 114 | agg_header->flag |= 4; 115 | 116 | if (isForceForward) 117 | agg_header->flag |= 32; 118 | 119 | if (isOverflow) 120 | agg_header->flag |= 128; 121 | 122 | if (isForceCollision){ //lzy 123 | agg_header->flag |= 0x02; 124 | } 125 | // PS Index 126 | // agg_header->flag |= (*num_PS << 5); 127 | // printf("to PS: %d\n", ((*key % *num_PS)+1)); 128 | 129 | int32_t* used_data; 130 | if 
(isOverflow) { 131 | used_data = (int32_t*) job_info->float_data; 132 | } 133 | else 134 | used_data = (int32_t*) job_info->int_data; 135 | 136 | int32_t* send_data; 137 | if (*offset + MAX_ENTRIES_PER_PACKET > job_info->len) { 138 | int32_t* tmp = new int32_t[MAX_ENTRIES_PER_PACKET](); 139 | memcpy(tmp, used_data + *offset, sizeof(int32_t) * (job_info->len % MAX_ENTRIES_PER_PACKET)); 140 | send_data = tmp; 141 | delete[] tmp; 142 | } else { 143 | send_data = used_data + *offset; 144 | } 145 | 146 | // p4ml_header_print_h(agg_header, "Make"); 147 | } 148 | 149 | // void inline make_packet_and_copy_to(void* payload, uint64_t* key, uint32_t* len_tensor, uint32_t* workerID, uint8_t* num_worker, uint16_t* appID, uint16_t* agtr, uint16_t* seq_num, int32_t* data, bool isResend, bool isForceForward, uint8_t* num_PS, int thread_id) 150 | // { 151 | // char* eth_ip_header = (char*)payload; 152 | // memcpy(payload, IP_ETH_UDP_HEADER, sizeof(IP_ETH_UDP_HEADER)); 153 | // eth_ip_header[5] = thread_id; 154 | // make_p4ml_layer_and_copy_to((char*)payload + sizeof(IP_ETH_UDP_HEADER), key, len_tensor, workerID, num_worker, appID, agtr, seq_num, data, isResend, isForceForward, num_PS); 155 | // } 156 | 157 | void inline p4ml_header_ntoh(agghdr* p_p4ml) 158 | { 159 | p_p4ml->len_tensor = ntohl(p_p4ml->len_tensor); 160 | p_p4ml->bitmap = ntohl(p_p4ml->bitmap); 161 | p_p4ml->seq_num = ntohs(p_p4ml->seq_num); 162 | p_p4ml->agtr = ntohs(p_p4ml->agtr); 163 | p_p4ml->agtr2 = ntohs(p_p4ml->agtr2); 164 | p_p4ml->appID = ntohs(p_p4ml->appID); 165 | 166 | 167 | int32_t* p_model = p_p4ml->vector; 168 | 169 | /* if not float */ 170 | if (!(p_p4ml->flag & 0x80)) { 171 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 172 | p_model[i] = ntohl(p_model[i]); 173 | } 174 | } 175 | 176 | void inline p4ml_header_ntoh_without_data(agghdr* p_p4ml) 177 | { 178 | p_p4ml->len_tensor = ntohl(p_p4ml->len_tensor); 179 | p_p4ml->bitmap = ntohl(p_p4ml->bitmap); 180 | p_p4ml->seq_num = ntohs(p_p4ml->seq_num); 181 
| p_p4ml->agtr = ntohs(p_p4ml->agtr); 182 | p_p4ml->agtr2 = ntohs(p_p4ml->agtr2); 183 | p_p4ml->appID = ntohs(p_p4ml->appID); 184 | 185 | int32_t* p_model = p_p4ml->vector; 186 | } 187 | 188 | void inline p4ml_header_hton_without_data(agghdr* p_p4ml) 189 | { 190 | p_p4ml->len_tensor = htonl(p_p4ml->len_tensor); 191 | p_p4ml->bitmap = htonl(p_p4ml->bitmap); 192 | p_p4ml->seq_num = htons(p_p4ml->seq_num); 193 | p_p4ml->agtr = htons(p_p4ml->agtr); 194 | p_p4ml->agtr2 = htons(p_p4ml->agtr2); 195 | p_p4ml->appID = htons(p_p4ml->appID); 196 | 197 | } 198 | 199 | void inline p4ml_header_setACK(agghdr* p4ml_header) 200 | { 201 | p4ml_header->flag |= 1; 202 | } 203 | 204 | void inline p4ml_header_setOverflow(agghdr* p4ml_header) 205 | { 206 | p4ml_header->flag |= 128; 207 | } 208 | 209 | void inline p4ml_header_setOverflowRequest(agghdr* p4ml_header) 210 | { 211 | p4ml_header->flag |= 128; 212 | p4ml_header->flag &= ~(4); 213 | } 214 | 215 | void inline p4ml_header_setCollisionBit(agghdr* p4ml_header) 216 | { 217 | p4ml_header->flag |= 2; 218 | } 219 | 220 | void inline p4ml_header_setLengthFieldToAgtr(agghdr* p4ml_header, uint16_t new_agtr) 221 | { 222 | p4ml_header->len_tensor = new_agtr; 223 | } 224 | 225 | void inline p4ml_header_resetIndex(agghdr* p4ml_header) 226 | { 227 | p4ml_header->flag &= ~(16); 228 | } 229 | 230 | void inline p4ml_header_resetCollisionBit(agghdr* p4ml_header) 231 | { 232 | p4ml_header->flag &= ~(2); 233 | } 234 | 235 | void inline p4ml_header_resetLZYColBit(agghdr* p4ml_header){ 236 | p4ml_header->is_lzy_Col = 0x00; 237 | 238 | } 239 | void inline p4ml_header_setLZYColBit(agghdr* p4ml_header){ 240 | p4ml_header->is_lzy_Col = 0x01; 241 | } 242 | 243 | void inline p4ml_header_print(agghdr *p4ml_header, const char *caption) 244 | { 245 | std::lock_guard lock(_packet_print_mutex); 246 | printf("[%s] \n key: %" PRIu64 ", len_tensor: %u, " 247 | "bitmap: " BYTE_TO_BINARY_PATTERN ", num_worker: %u, appID: %u, " 248 | "agtr: %u, agtr2: %u, seq_num: %u, 
isACK: %d, dataIndex: %d," 249 | "isResend: %d, isOverflow: %d, data: ", 250 | caption, p4ml_header->key, p4ml_header->len_tensor, 251 | BYTE_TO_BINARY(p4ml_header->bitmap), p4ml_header->num_worker, p4ml_header->appID, 252 | p4ml_header->agtr, p4ml_header->agtr2, p4ml_header->seq_num, 253 | p4ml_header->flag & 1 ? 1 : 0, p4ml_header->flag & 16 ? 1 : 0, p4ml_header->flag & 4 ? 1 : 0, 254 | p4ml_header->flag & 128 ? 1 : 0); 255 | 256 | // is Overflow? 257 | if (p4ml_header->flag & 128) 258 | // is ACK? isn't Resend? 259 | if (p4ml_header->flag & 1 && !(p4ml_header->flag & 4)) 260 | printf("REQUEST - CARELESS."); 261 | else 262 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 263 | printf("%.7f ", ntohf((p4ml_header->vector)[i])); 264 | else 265 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 266 | printf("%d ", p4ml_header->vector[i]); 267 | printf("\n"); 268 | } 269 | 270 | void inline p4ml_header_print_h(agghdr *p4ml_header, const char *caption) 271 | { 272 | std::lock_guard lock(_packet_print_mutex); 273 | printf("[%s] \n key: %" PRIu64 ", len_tensor: %u, " 274 | "bitmap: " BYTE_TO_BINARY_PATTERN ", num_worker: %u, appID: %u, " 275 | "agtr: %u, agtr2: %u, seq_num: %u, isACK: %d, dataIndex: %d," 276 | "isResend: %d, isOverflow: %d, data: ", 277 | caption, p4ml_header->key, ntohl(p4ml_header->len_tensor), 278 | BYTE_TO_BINARY(ntohl(p4ml_header->bitmap)), p4ml_header->num_worker, ntohs(p4ml_header->appID), 279 | ntohs(p4ml_header->agtr), ntohs(p4ml_header->agtr2), ntohs(p4ml_header->seq_num), 280 | p4ml_header->flag & 1 ? 1 : 0, p4ml_header->flag & 16 ? 1 : 0, p4ml_header->flag & 4 ? 1 : 0, 281 | p4ml_header->flag & 128 ? 1 : 0); 282 | 283 | // is Overflow? 284 | if (p4ml_header->flag & 128) 285 | // is ACK? isn't Resend? 
286 | if (p4ml_header->flag & 1 && !(p4ml_header->flag & 4)) 287 | printf("REQUEST - CARELESS."); 288 | else 289 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 290 | printf("%.7f ", ((float *)(p4ml_header->vector))[i]); 291 | else 292 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 293 | printf("%d ", ntohl(p4ml_header->vector[i])); 294 | printf("\n"); 295 | } 296 | 297 | #endif 298 | -------------------------------------------------------------------------------- /common/quantize.h: -------------------------------------------------------------------------------- 1 | #ifndef QUAN_P4ML_H 2 | #define QUAN_P4ML_H 3 | #include 4 | #include 5 | 6 | // scale up float then translate it to int 7 | // without any further optimization 8 | inline static void quantizeNaive(char *data_ptr, uint32_t size) 9 | { 10 | int factor = 1000000; 11 | int *int_data_ptr = (int *)data_ptr; 12 | float *float_data_ptr = (float *)data_ptr; 13 | for (uint32_t i = 0; i < size; i++) 14 | { 15 | int_data_ptr[i] = (int)(float_data_ptr[i] * factor); 16 | } 17 | } 18 | 19 | // translate back to float and scale down 20 | // without any further optimization 21 | inline static void dequantizeNaive(char *data_ptr, uint32_t size) 22 | { 23 | float factor = 1000000.0; 24 | int *int_data_ptr = (int *)data_ptr; 25 | float *float_data_ptr = (float *)data_ptr; 26 | for (uint32_t i = 0; i < size; i++) 27 | { 28 | float_data_ptr[i] = (float)(int_data_ptr[i] / factor); 29 | } 30 | } 31 | 32 | // functioned the same as quantizeNaive 33 | // boost with avx 256 instructions 34 | inline static void quantizeAVX2(char *data_ptr, uint32_t size) 35 | { 36 | // check alignment 37 | 38 | __m256 input; 39 | __m256i output; 40 | 41 | int unaligned_size = size % 8; 42 | int aligned_size = size / 8; 43 | 44 | const float factor = 1000000.0; 45 | float *float_data_ptr = (float *)data_ptr; 46 | int *int_data_ptr = (int *)data_ptr; 47 | 48 | // 0xF4240 is 1000000 in hex 49 | __m256 factor_in_avx = 
_mm256_broadcast_ss(&factor); 50 | 51 | for (uint32_t i = 0; i < aligned_size; i++) 52 | { 53 | float *current_pos = float_data_ptr + i * 8; 54 | input = _mm256_loadu_ps(current_pos); 55 | input = _mm256_mul_ps(input, factor_in_avx); 56 | output = _mm256_cvtps_epi32(input); 57 | _mm256_storeu_si256((__m256i *)current_pos, output); 58 | } 59 | 60 | for (uint32_t i = 0; i < unaligned_size; i++) 61 | { 62 | int_data_ptr[aligned_size * 8 + i] = 63 | (int)(float_data_ptr[aligned_size * 8 + i] * factor); 64 | } 65 | } 66 | 67 | // functioned the same as dequantizeNaive 68 | // boost with avx 256 instructions 69 | inline static void dequantizeAVX2(char *data_ptr, uint32_t size) 70 | { 71 | __m256i input; 72 | __m256 output; 73 | 74 | int unaligned_size = size % 8; 75 | int aligned_size = size / 8; 76 | 77 | const float factor = 1000000.0; 78 | int *int_data_ptr = (int *)data_ptr; 79 | float *float_data_ptr = (float *)data_ptr; 80 | 81 | // __m256i* input_avx = (__m256i*) data_ptr; 82 | __m256 factor_in_avx = _mm256_broadcast_ss(&factor); 83 | 84 | for (uint32_t i = 0; i < aligned_size; i++) 85 | { 86 | float *current_pos = float_data_ptr + i * 8; 87 | input = _mm256_loadu_si256((__m256i *)current_pos); 88 | output = _mm256_cvtepi32_ps(input); 89 | output = _mm256_div_ps(output, factor_in_avx); 90 | _mm256_storeu_ps(current_pos, output); 91 | } 92 | 93 | for (uint32_t i = 0; i < unaligned_size; i++) 94 | { 95 | float_data_ptr[aligned_size * 8 + i] = 96 | (float)(int_data_ptr[aligned_size * 8 + i] / factor); 97 | } 98 | } 99 | 100 | // functioned the same as quantizeNaive 101 | // boost with avx 256 instructions 102 | inline static void quantizeAVX2to(char *dst_ptr, char *src_ptr, uint32_t size) 103 | { 104 | // check alignment 105 | 106 | __m256 input; 107 | __m256i output; 108 | 109 | int unaligned_size = size % 8; 110 | int aligned_size = size / 8; 111 | 112 | const float factor = 1000000.0; 113 | float *float_data_ptr = (float *)src_ptr; 114 | int *int_data_ptr = (int 
*)src_ptr; 115 | 116 | float *dst_float_data_ptr = (float *)dst_ptr; 117 | int *dst_int_data_ptr = (int *)dst_ptr; 118 | 119 | // 0xF4240 is 1000000 in hex 120 | __m256 factor_in_avx = _mm256_broadcast_ss(&factor); 121 | 122 | for (uint32_t i = 0; i < aligned_size; i++) 123 | { 124 | float *current_pos = float_data_ptr + i * 8; 125 | float *current_dst_pos = dst_float_data_ptr + i * 8; 126 | 127 | input = _mm256_loadu_ps(current_pos); 128 | input = _mm256_mul_ps(input, factor_in_avx); 129 | output = _mm256_cvtps_epi32(input); 130 | _mm256_storeu_si256((__m256i *)current_dst_pos, output); 131 | } 132 | 133 | for (uint32_t i = 0; i < unaligned_size; i++) 134 | { 135 | dst_int_data_ptr[aligned_size * 8 + i] = 136 | (int)(float_data_ptr[aligned_size * 8 + i] * factor); 137 | } 138 | } 139 | 140 | // functioned the same as dequantizeNaive 141 | // boost with avx 256 instructions 142 | inline static void dequantizeAVX2to(char *dst_ptr, char *src_ptr, 143 | uint32_t size) 144 | { 145 | __m256i input; 146 | __m256 output; 147 | 148 | int unaligned_size = size % 8; 149 | int aligned_size = size / 8; 150 | 151 | const float factor = 1000000.0; 152 | int *int_data_ptr = (int *)src_ptr; 153 | float *float_data_ptr = (float *)src_ptr; 154 | 155 | int *dst_int_data_ptr = (int *)dst_ptr; 156 | float *dst_float_data_ptr = (float *)dst_ptr; 157 | 158 | // __m256i* input_avx = (__m256i*) src_ptr; 159 | __m256 factor_in_avx = _mm256_broadcast_ss(&factor); 160 | 161 | for (uint32_t i = 0; i < aligned_size; i++) 162 | { 163 | float *current_pos = float_data_ptr + i * 8; 164 | float *current_dst_pos = dst_float_data_ptr + i * 8; 165 | 166 | input = _mm256_loadu_si256((__m256i *)current_pos); 167 | output = _mm256_cvtepi32_ps(input); 168 | output = _mm256_div_ps(output, factor_in_avx); 169 | _mm256_storeu_ps(current_dst_pos, output); 170 | } 171 | 172 | for (uint32_t i = 0; i < unaligned_size; i++) 173 | { 174 | dst_float_data_ptr[aligned_size * 8 + i] = 175 | 
(float)(int_data_ptr[aligned_size * 8 + i] / factor); 176 | } 177 | } 178 | 179 | #endif -------------------------------------------------------------------------------- /common/utils.h: -------------------------------------------------------------------------------- 1 | #ifndef UTILS_H 2 | #define UTILS_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | 15 | // Because here we use 2 agtr for one packet, so /2 16 | #define MAX_AGTR_COUNT 20000 17 | #define AGTR_TO_USE_PER_APPLICATION 2800 18 | 19 | #define EACH_HUGEPAGE_SIZE (2048*1024) 20 | 21 | #define likely(x) __builtin_expect(!!(x), 1) 22 | #define unlikely(x) __builtin_expect(!!(x), 0) 23 | 24 | 25 | #define DIVUP(x, y) (((x)+(y)-1)/(y)) 26 | #define ROUNDUP(x, y) (DIVUP((x), (y))*(y)) 27 | 28 | template 29 | static inline T align_floor(T v, T align) { 30 | return v - (v % align); 31 | } 32 | 33 | template 34 | static inline T align_ceil(T v, T align) { 35 | return align_floor(v + align - 1, align); 36 | } 37 | 38 | static inline void ib_malloc(void** ptr, size_t size) { 39 | size_t page_size = sysconf(_SC_PAGESIZE); 40 | void* p; 41 | int size_aligned = ROUNDUP(size, page_size); 42 | int ret = posix_memalign(&p, page_size, size_aligned); 43 | if (ret != 0) { 44 | printf("posix_memalign error.\n"); 45 | exit(1); 46 | } 47 | memset(p, 0, size); 48 | *ptr = p; 49 | } 50 | 51 | #define KB(x) (static_cast(x) << 10) 52 | #define KB_(x) (KB(x) - 1) 53 | #define MB(x) (static_cast(x) << 20) 54 | #define MB_(x) (MB(x) - 1) 55 | 56 | static void memory_barrier() { asm volatile("" ::: "memory"); } 57 | static void lfence() { asm volatile("lfence" ::: "memory"); } 58 | static void sfence() { asm volatile("sfence" ::: "memory"); } 59 | static void mfence() { asm volatile("mfence" ::: "memory"); } 60 | static void clflush(volatile void* p) { asm volatile("clflush (%0)" ::"r"(p)); } 61 | static void cpuid(unsigned int* eax, 
unsigned int* ebx, unsigned int* ecx, 62 | unsigned int* edx) { 63 | asm volatile("cpuid" 64 | : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx) 65 | : "0"(*eax), "2"(*ecx)); 66 | } 67 | 68 | inline void bindingCPU(int num) { 69 | int result; 70 | cpu_set_t mask; 71 | CPU_ZERO(&mask); 72 | CPU_SET(num, &mask); 73 | result = sched_setaffinity(0, sizeof(mask), &mask); 74 | if (result < 0) { 75 | printf("binding CPU fails\n"); 76 | exit(1); 77 | } 78 | } 79 | 80 | /// Check a condition at runtime. If the condition is false, throw exception. 81 | static inline void rt_assert(bool condition) { 82 | if (unlikely(!condition)) throw std::runtime_error(""); 83 | } 84 | 85 | 86 | /* allocate the huge pages. */ 87 | inline char *alloc_raw_pages(int cnt, int size) { 88 | /* 89 | * Don't touch the page since then allocator would not allocate the page 90 | * right now. 91 | */ 92 | int flag = MAP_SHARED | MAP_ANONYMOUS; 93 | if (size == EACH_HUGEPAGE_SIZE) flag |= MAP_HUGETLB; 94 | char *ptr = 95 | (char *)mmap(NULL, (int64_t)cnt * size, PROT_READ | PROT_WRITE, flag, -1, 0); 96 | if (ptr == (char *)-1) { 97 | perror("alloc_raw_pages"); 98 | return NULL; 99 | } 100 | return ptr; 101 | } 102 | 103 | union { 104 | float f; 105 | uint32_t u; 106 | } if_value; 107 | 108 | inline float ntohf(uint32_t net32) 109 | { 110 | if_value.u = ntohl(net32); 111 | return if_value.f; 112 | } 113 | 114 | // /* Returns the MAC Address Params: int iNetType - 0: ethernet, 1: Wifi char chMAC[6] - MAC Address in binary format Returns: 0: success -1: Failure */ 115 | // int getMACAddress(char chMAC[6]) 116 | // { 117 | // struct ifreq ifr; 118 | // int sock; 119 | // char* ifname = "enp178s0f0"; 120 | // sock = socket(AF_INET, SOCK_DGRAM, 0); 121 | // strcpy(ifr.ifr_name, ifname); 122 | // ifr.ifr_addr.sa_family = AF_INET; 123 | // if (ioctl(sock, SIOCGIFHWADDR, &ifr) < 0) { 124 | // return -1; 125 | // } 126 | // memcpy(chMAC, ifr.ifr_hwaddr.sa_data, 6); 127 | // close(sock); 128 | // return 0; 129 | // 
} 130 | 131 | // /* Returns the interface IP Address Params: int iNetType - 0: ethernet, 1: Wifi char *chIP - IP Address string Return: 0: success / -1: Failure */ 132 | // int getIpAddress(char chIP[16]) 133 | // { 134 | // struct ifreq ifr; 135 | // int sock = 0; 136 | // sock = socket(AF_INET, SOCK_DGRAM, 0); 137 | // strcpy(ifr.ifr_name, "enp178s0f0"); 138 | // if (ioctl(sock, SIOCGIFADDR, &ifr) < 0) { 139 | // strcpy(chIP, "0.0.0.0"); 140 | // return -1; 141 | // } 142 | // sprintf(chIP, "%s", inet_ntoa(((struct sockaddr_in*)&(ifr.ifr_addr))->sin_addr)); 143 | // close(sock); 144 | // return 0; 145 | // } 146 | 147 | #endif 148 | -------------------------------------------------------------------------------- /common/window_manager.h: -------------------------------------------------------------------------------- 1 | #ifndef SLIDING_W_H 2 | #define SLIDING_W_H 3 | 4 | #include "packet.h" 5 | #include "CC_manager.h" 6 | #define RESEND_TRIGGER 1 7 | 8 | class WindowManager { 9 | public: 10 | bool* isACKed; 11 | /* These three variables are completely useless, but 12 | when deleting them, the performance drops from 46Gbps to 40Gbps. 
*/ 13 | bool* isSent; 14 | std::chrono::high_resolution_clock::time_point* send_time; 15 | std::chrono::high_resolution_clock::time_point* receive_time; 16 | /* */ 17 | int total_ACK; 18 | int last_ACK; 19 | 20 | WindowManager() { 21 | last_ACK = 0; 22 | } 23 | 24 | int inline UpdateWindow(uint16_t* seq_num) 25 | { 26 | int isLastAckUpdated = 0; 27 | isACKed[*seq_num] = true; 28 | while (isACKed[last_ACK + 1]) { 29 | last_ACK++; 30 | isLastAckUpdated++; 31 | } 32 | return isLastAckUpdated; 33 | } 34 | 35 | int inline Reset(int packet_total) 36 | { 37 | last_ACK = 0; 38 | total_ACK = packet_total; 39 | memset(isACKed, 0, sizeof(bool) * (packet_total + 1)); 40 | return 0; 41 | } 42 | }; 43 | 44 | #endif -------------------------------------------------------------------------------- /datasample/job_A.txt: -------------------------------------------------------------------------------- 1 | SetMaxWindow to 50 2 | 3 | Set used agtr to 450... (all agtr: 20000) 4 | 5 | 6 | Set max_agtr_size_per_thread to 50... 
7 | 8 | thread_to_use 9 9 | max_agtr_size_per_thread: 50 10 | used agtr: 450 (all:20000) 11 | using: mlx5_1 nic: 0 12 | [Thread 0] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 13 | init: 0 14 | [Thread 1] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 15 | init: 1 16 | [Thread 2] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 17 | init: 2 18 | [Thread 3] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 19 | init: 3 20 | [Thread 4] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 21 | init: 4 22 | [Thread 5] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 23 | init: 5 24 | [Thread 6] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 25 | init: 6 26 | [Thread 7] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 27 | init: 7 28 | [Thread 8] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 29 | init: 8 30 | 31 | Model initializing... 32 | 33 | Model initialized completed. Start sending... 34 | 35 | Thr 48.673178 36 | Thr 50.194216 37 | Thr 50.118166 38 | Thr 50.042113 39 | Thr 49.966063 40 | Thr 50.270271 41 | Finish all 4500 Tensors, 42 | Time = 3.463762 s, 43 | Total Size = 21901.776714 MB, 44 | Throughput: 49.399356 Gbps 45 | -------------------------------------------------------------------------------- /datasample/job_B.txt: -------------------------------------------------------------------------------- 1 | SetMaxWindow to 50 2 | 3 | Set used agtr to 450... (all agtr: 20000) 4 | 5 | 6 | Set max_agtr_size_per_thread to 50... 
7 | 8 | thread_to_use 9 9 | max_agtr_size_per_thread: 50 10 | used agtr: 450 (all:20000) 11 | using: mlx5_0 nic: 1 12 | [Thread 10] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 13 | init: 0 14 | [Thread 11] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 15 | init: 1 16 | [Thread 12] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 17 | init: 2 18 | [Thread 13] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 19 | init: 3 20 | [Thread 14] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 21 | init: 4 22 | [Thread 15] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 23 | init: 5 24 | [Thread 16] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 25 | init: 6 26 | [Thread 17] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 27 | init: 7 28 | [Thread 18] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 29 | init: 8 30 | 31 | Model initializing... 32 | 33 | Model initialized completed. Start sending... 
34 | 35 | Thr 14.525902 36 | Thr 15.514576 37 | Thr 15.590628 38 | Thr 15.742732 39 | Thr 15.970888 40 | Thr 15.590628 41 | Thr 16.122991 42 | Thr 15.742732 43 | Thr 15.210369 44 | Thr 15.894835 45 | Thr 15.894835 46 | Thr 15.970888 47 | Thr 15.742732 48 | Thr 16.046939 49 | Thr 16.199043 50 | Thr 16.275095 51 | Thr 16.046940 52 | Thr 16.503250 53 | Thr 16.275095 54 | Thr 16.655353 55 | Thr 16.199043 56 | Finish all 4500 Tensors, 57 | Time = 10.827258 s, 58 | Total Size = 21901.776714 MB, 59 | Throughput: 15.803413 Gbps 60 | -------------------------------------------------------------------------------- /datasample/switch.txt: -------------------------------------------------------------------------------- 1 | INFO: SDE was built with python 2.7 2 | 3 | (1, 5) 4 | (2, 6) 5 | appID[1] 210/450 46.74 % 6 | appID[2] 19/450 4.37 % 7 | time 12.4 total_used 230/450 51.1 % 8 | appID[1] 202/450 45.04 % 9 | appID[2] 18/450 4.15 % 10 | time 12.9 total_used 221/450 49.2 % 11 | appID[1] 247/450 55.04 % 12 | appID[2] 7/450 1.70 % 13 | time 13.5 total_used 255/450 56.7 % 14 | appID[1] 186/450 41.33 % 15 | appID[2] 12/450 2.81 % 16 | time 14.1 total_used 198/450 44.1 % 17 | appID[1] 243/450 54.00 % 18 | appID[2] 5/450 1.19 % 19 | time 14.7 total_used 248/450 55.2 % 20 | appID[1] 231/450 51.33 % 21 | appID[2] 9/450 2.00 % 22 | time 15.3 total_used 240/450 53.3 % 23 | appID[1] 74/450 16.52 % 24 | appID[2] 251/450 55.93 % 25 | time 15.9 total_used 326/450 72.4 % 26 | appID[2] 382/450 84.89 % 27 | time 16.5 total_used 382/450 84.9 % 28 | appID[2] 411/450 91.33 % 29 | time 17.1 total_used 411/450 91.3 % 30 | appID[2] 429/450 95.33 % 31 | time 17.6 total_used 429/450 95.3 % 32 | appID[2] 408/450 90.81 % 33 | time 18.2 total_used 408/450 90.8 % 34 | appID[2] 436/450 96.96 % 35 | time 18.8 total_used 436/450 97.0 % 36 | appID[2] 427/450 94.96 % 37 | time 19.4 total_used 427/450 95.0 % 38 | appID[2] 392/450 87.19 % 39 | time 20.0 total_used 392/450 87.2 % 40 | appID[2] 399/450 88.74 % 41 
| time 20.6 total_used 399/450 88.7 % 42 | appID[2] 427/450 94.96 % 43 | time 21.2 total_used 427/450 95.0 % 44 | appID[2] 428/450 95.19 % 45 | time 21.7 total_used 428/450 95.2 % 46 | appID[2] 429/450 95.48 % 47 | time 22.3 total_used 429/450 95.5 % 48 | appID[2] 262/450 58.22 % 49 | time 22.9 total_used 262/450 58.2 % -------------------------------------------------------------------------------- /p4ml2/includes/actions.p4: -------------------------------------------------------------------------------- 1 | action processentry1() { 2 | write_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 3 | } 4 | 5 | action noequ0_processentry1() { 6 | noequ0_write_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 7 | } 8 | 9 | action processentry1andWriteToPacket() { 10 | write_read_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 11 | } 12 | 13 | action noequ0_processentry1andWriteToPacket() { 14 | noequ0_write_read_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 15 | } 16 | 17 | action do_cleanEntry1() { 18 | clean_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 19 | } 20 | 21 | action entry1WriteToPacket() { 22 | read_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 23 | } 24 | 25 | action processentry2() { 26 | write_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 27 | } 28 | 29 | action noequ0_processentry2() { 30 | noequ0_write_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 31 | } 32 | 33 | action processentry2andWriteToPacket() { 34 | write_read_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 35 | } 36 | 37 | action noequ0_processentry2andWriteToPacket() { 38 | noequ0_write_read_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 39 | } 40 | 41 | action do_cleanEntry2() { 42 | clean_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 43 | } 44 | 45 | action entry2WriteToPacket() { 46 | read_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 47 | } 48 | 49 | action processentry3() { 50 | 
write_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 51 | } 52 | 53 | action noequ0_processentry3() { 54 | noequ0_write_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 55 | } 56 | 57 | action processentry3andWriteToPacket() { 58 | write_read_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 59 | } 60 | 61 | action noequ0_processentry3andWriteToPacket() { 62 | noequ0_write_read_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 63 | } 64 | 65 | action do_cleanEntry3() { 66 | clean_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 67 | } 68 | 69 | action entry3WriteToPacket() { 70 | read_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 71 | } 72 | 73 | action processentry4() { 74 | write_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 75 | } 76 | 77 | action noequ0_processentry4() { 78 | noequ0_write_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 79 | } 80 | 81 | action processentry4andWriteToPacket() { 82 | write_read_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 83 | } 84 | 85 | action noequ0_processentry4andWriteToPacket() { 86 | noequ0_write_read_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 87 | } 88 | 89 | action do_cleanEntry4() { 90 | clean_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 91 | } 92 | 93 | action entry4WriteToPacket() { 94 | read_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 95 | } 96 | 97 | action processentry5() { 98 | write_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 99 | } 100 | 101 | action noequ0_processentry5() { 102 | noequ0_write_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 103 | } 104 | 105 | action processentry5andWriteToPacket() { 106 | write_read_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 107 | } 108 | 109 | action noequ0_processentry5andWriteToPacket() { 110 | noequ0_write_read_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 111 | } 112 | 113 | action do_cleanEntry5() { 114 | 
clean_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 115 | } 116 | 117 | action entry5WriteToPacket() { 118 | read_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 119 | } 120 | 121 | action processentry6() { 122 | write_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 123 | } 124 | 125 | action noequ0_processentry6() { 126 | noequ0_write_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 127 | } 128 | 129 | action processentry6andWriteToPacket() { 130 | write_read_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 131 | } 132 | 133 | action noequ0_processentry6andWriteToPacket() { 134 | noequ0_write_read_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 135 | } 136 | 137 | action do_cleanEntry6() { 138 | clean_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 139 | } 140 | 141 | action entry6WriteToPacket() { 142 | read_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 143 | } 144 | 145 | action processentry7() { 146 | write_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 147 | } 148 | 149 | action noequ0_processentry7() { 150 | noequ0_write_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 151 | } 152 | 153 | action processentry7andWriteToPacket() { 154 | write_read_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 155 | } 156 | 157 | action noequ0_processentry7andWriteToPacket() { 158 | noequ0_write_read_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 159 | } 160 | 161 | action do_cleanEntry7() { 162 | clean_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 163 | } 164 | 165 | action entry7WriteToPacket() { 166 | read_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 167 | } 168 | 169 | action processentry8() { 170 | write_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 171 | } 172 | 173 | action noequ0_processentry8() { 174 | noequ0_write_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 175 | } 176 | 177 | action processentry8andWriteToPacket() { 178 | 
write_read_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 179 | } 180 | 181 | action noequ0_processentry8andWriteToPacket() { 182 | noequ0_write_read_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 183 | } 184 | 185 | action do_cleanEntry8() { 186 | clean_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 187 | } 188 | 189 | action entry8WriteToPacket() { 190 | read_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 191 | } 192 | 193 | action processentry9() { 194 | write_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 195 | } 196 | 197 | action noequ0_processentry9() { 198 | noequ0_write_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 199 | } 200 | 201 | action processentry9andWriteToPacket() { 202 | write_read_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 203 | } 204 | 205 | action noequ0_processentry9andWriteToPacket() { 206 | noequ0_write_read_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 207 | } 208 | 209 | action do_cleanEntry9() { 210 | clean_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 211 | } 212 | 213 | action entry9WriteToPacket() { 214 | read_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 215 | } 216 | 217 | action processentry10() { 218 | write_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 219 | } 220 | 221 | action noequ0_processentry10() { 222 | noequ0_write_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 223 | } 224 | 225 | action processentry10andWriteToPacket() { 226 | write_read_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 227 | } 228 | 229 | action noequ0_processentry10andWriteToPacket() { 230 | noequ0_write_read_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 231 | } 232 | 233 | action do_cleanEntry10() { 234 | clean_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 235 | } 236 | 237 | action entry10WriteToPacket() { 238 | read_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 239 | } 240 | 241 | action processentry11() { 
242 | write_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 243 | } 244 | 245 | action noequ0_processentry11() { 246 | noequ0_write_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 247 | } 248 | 249 | action processentry11andWriteToPacket() { 250 | write_read_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 251 | } 252 | 253 | action noequ0_processentry11andWriteToPacket() { 254 | noequ0_write_read_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 255 | } 256 | 257 | action do_cleanEntry11() { 258 | clean_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 259 | } 260 | 261 | action entry11WriteToPacket() { 262 | read_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 263 | } 264 | 265 | action processentry12() { 266 | write_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 267 | } 268 | 269 | action noequ0_processentry12() { 270 | noequ0_write_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 271 | } 272 | 273 | action processentry12andWriteToPacket() { 274 | write_read_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 275 | } 276 | 277 | action noequ0_processentry12andWriteToPacket() { 278 | noequ0_write_read_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 279 | } 280 | 281 | action do_cleanEntry12() { 282 | clean_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 283 | } 284 | 285 | action entry12WriteToPacket() { 286 | read_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 287 | } 288 | 289 | action processentry13() { 290 | write_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 291 | } 292 | 293 | action noequ0_processentry13() { 294 | noequ0_write_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 295 | } 296 | 297 | action processentry13andWriteToPacket() { 298 | write_read_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 299 | } 300 | 301 | action noequ0_processentry13andWriteToPacket() { 302 | 
noequ0_write_read_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 303 | } 304 | 305 | action do_cleanEntry13() { 306 | clean_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 307 | } 308 | 309 | action entry13WriteToPacket() { 310 | read_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 311 | } 312 | 313 | action processentry14() { 314 | write_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 315 | } 316 | 317 | action noequ0_processentry14() { 318 | noequ0_write_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 319 | } 320 | 321 | action processentry14andWriteToPacket() { 322 | write_read_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 323 | } 324 | 325 | action noequ0_processentry14andWriteToPacket() { 326 | noequ0_write_read_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 327 | } 328 | 329 | action do_cleanEntry14() { 330 | clean_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 331 | } 332 | 333 | action entry14WriteToPacket() { 334 | read_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 335 | } 336 | 337 | action processentry15() { 338 | write_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 339 | } 340 | 341 | action noequ0_processentry15() { 342 | noequ0_write_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 343 | } 344 | 345 | action processentry15andWriteToPacket() { 346 | write_read_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 347 | } 348 | 349 | action noequ0_processentry15andWriteToPacket() { 350 | noequ0_write_read_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 351 | } 352 | 353 | action do_cleanEntry15() { 354 | clean_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 355 | } 356 | 357 | action entry15WriteToPacket() { 358 | read_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 359 | } 360 | 361 | action processentry16() { 362 | write_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 363 | } 364 | 365 | action noequ0_processentry16() { 366 
| noequ0_write_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 367 | } 368 | 369 | action processentry16andWriteToPacket() { 370 | write_read_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 371 | } 372 | 373 | action noequ0_processentry16andWriteToPacket() { 374 | noequ0_write_read_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 375 | } 376 | 377 | action do_cleanEntry16() { 378 | clean_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 379 | } 380 | 381 | action entry16WriteToPacket() { 382 | read_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 383 | } 384 | 385 | action processentry17() { 386 | write_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 387 | } 388 | 389 | action noequ0_processentry17() { 390 | noequ0_write_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 391 | } 392 | 393 | action processentry17andWriteToPacket() { 394 | write_read_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 395 | } 396 | 397 | action noequ0_processentry17andWriteToPacket() { 398 | noequ0_write_read_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 399 | } 400 | 401 | action do_cleanEntry17() { 402 | clean_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 403 | } 404 | 405 | action entry17WriteToPacket() { 406 | read_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 407 | } 408 | 409 | action processentry18() { 410 | write_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 411 | } 412 | 413 | action noequ0_processentry18() { 414 | noequ0_write_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 415 | } 416 | 417 | action processentry18andWriteToPacket() { 418 | write_read_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 419 | } 420 | 421 | action noequ0_processentry18andWriteToPacket() { 422 | noequ0_write_read_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 423 | } 424 | 425 | action do_cleanEntry18() { 426 | clean_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 427 | } 
428 | 429 | action entry18WriteToPacket() { 430 | read_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 431 | } 432 | 433 | action processentry19() { 434 | write_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 435 | } 436 | 437 | action noequ0_processentry19() { 438 | noequ0_write_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 439 | } 440 | 441 | action processentry19andWriteToPacket() { 442 | write_read_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 443 | } 444 | 445 | action noequ0_processentry19andWriteToPacket() { 446 | noequ0_write_read_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 447 | } 448 | 449 | action do_cleanEntry19() { 450 | clean_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 451 | } 452 | 453 | action entry19WriteToPacket() { 454 | read_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 455 | } 456 | 457 | action processentry20() { 458 | write_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 459 | } 460 | 461 | action noequ0_processentry20() { 462 | noequ0_write_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 463 | } 464 | 465 | action processentry20andWriteToPacket() { 466 | write_read_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 467 | } 468 | 469 | action noequ0_processentry20andWriteToPacket() { 470 | noequ0_write_read_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 471 | } 472 | 473 | action do_cleanEntry20() { 474 | clean_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 475 | } 476 | 477 | action entry20WriteToPacket() { 478 | read_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 479 | } 480 | 481 | action processentry21() { 482 | write_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 483 | } 484 | 485 | action noequ0_processentry21() { 486 | noequ0_write_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 487 | } 488 | 489 | action processentry21andWriteToPacket() { 490 | 
write_read_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 491 | } 492 | 493 | action noequ0_processentry21andWriteToPacket() { 494 | noequ0_write_read_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 495 | } 496 | 497 | action do_cleanEntry21() { 498 | clean_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 499 | } 500 | 501 | action entry21WriteToPacket() { 502 | read_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 503 | } 504 | 505 | action processentry22() { 506 | write_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 507 | } 508 | 509 | action noequ0_processentry22() { 510 | noequ0_write_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 511 | } 512 | 513 | action processentry22andWriteToPacket() { 514 | write_read_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 515 | } 516 | 517 | action noequ0_processentry22andWriteToPacket() { 518 | noequ0_write_read_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 519 | } 520 | 521 | action do_cleanEntry22() { 522 | clean_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 523 | } 524 | 525 | action entry22WriteToPacket() { 526 | read_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 527 | } 528 | 529 | action processentry23() { 530 | write_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 531 | } 532 | 533 | action noequ0_processentry23() { 534 | noequ0_write_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 535 | } 536 | 537 | action processentry23andWriteToPacket() { 538 | write_read_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 539 | } 540 | 541 | action noequ0_processentry23andWriteToPacket() { 542 | noequ0_write_read_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 543 | } 544 | 545 | action do_cleanEntry23() { 546 | clean_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 547 | } 548 | 549 | action entry23WriteToPacket() { 550 | read_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 551 | } 552 | 553 | action 
processentry24() { 554 | write_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 555 | } 556 | 557 | action noequ0_processentry24() { 558 | noequ0_write_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 559 | } 560 | 561 | action processentry24andWriteToPacket() { 562 | write_read_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 563 | } 564 | 565 | action noequ0_processentry24andWriteToPacket() { 566 | noequ0_write_read_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 567 | } 568 | 569 | action do_cleanEntry24() { 570 | clean_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 571 | } 572 | 573 | action entry24WriteToPacket() { 574 | read_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 575 | } 576 | 577 | action processentry25() { 578 | write_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 579 | } 580 | 581 | action noequ0_processentry25() { 582 | noequ0_write_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 583 | } 584 | 585 | action processentry25andWriteToPacket() { 586 | write_read_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 587 | } 588 | 589 | action noequ0_processentry25andWriteToPacket() { 590 | noequ0_write_read_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 591 | } 592 | 593 | action do_cleanEntry25() { 594 | clean_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 595 | } 596 | 597 | action entry25WriteToPacket() { 598 | read_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 599 | } 600 | 601 | action processentry26() { 602 | write_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 603 | } 604 | 605 | action noequ0_processentry26() { 606 | noequ0_write_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 607 | } 608 | 609 | action processentry26andWriteToPacket() { 610 | write_read_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 611 | } 612 | 613 | action noequ0_processentry26andWriteToPacket() { 614 | 
noequ0_write_read_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 615 | } 616 | 617 | action do_cleanEntry26() { 618 | clean_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 619 | } 620 | 621 | action entry26WriteToPacket() { 622 | read_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 623 | } 624 | 625 | action processentry27() { 626 | write_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 627 | } 628 | 629 | action noequ0_processentry27() { 630 | noequ0_write_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 631 | } 632 | 633 | action processentry27andWriteToPacket() { 634 | write_read_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 635 | } 636 | 637 | action noequ0_processentry27andWriteToPacket() { 638 | noequ0_write_read_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 639 | } 640 | 641 | action do_cleanEntry27() { 642 | clean_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 643 | } 644 | 645 | action entry27WriteToPacket() { 646 | read_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 647 | } 648 | 649 | action processentry28() { 650 | write_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 651 | } 652 | 653 | action noequ0_processentry28() { 654 | noequ0_write_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 655 | } 656 | 657 | action processentry28andWriteToPacket() { 658 | write_read_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 659 | } 660 | 661 | action noequ0_processentry28andWriteToPacket() { 662 | noequ0_write_read_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 663 | } 664 | 665 | action do_cleanEntry28() { 666 | clean_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 667 | } 668 | 669 | action entry28WriteToPacket() { 670 | read_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 671 | } 672 | 673 | action processentry29() { 674 | write_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 675 | } 676 | 677 | action noequ0_processentry29() { 678 
| noequ0_write_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 679 | } 680 | 681 | action processentry29andWriteToPacket() { 682 | write_read_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 683 | } 684 | 685 | action noequ0_processentry29andWriteToPacket() { 686 | noequ0_write_read_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 687 | } 688 | 689 | action do_cleanEntry29() { 690 | clean_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 691 | } 692 | 693 | action entry29WriteToPacket() { 694 | read_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 695 | } 696 | 697 | action processentry30() { 698 | write_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 699 | } 700 | 701 | action noequ0_processentry30() { 702 | noequ0_write_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 703 | } 704 | 705 | action processentry30andWriteToPacket() { 706 | write_read_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 707 | } 708 | 709 | action noequ0_processentry30andWriteToPacket() { 710 | noequ0_write_read_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 711 | } 712 | 713 | action do_cleanEntry30() { 714 | clean_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 715 | } 716 | 717 | action entry30WriteToPacket() { 718 | read_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 719 | } 720 | 721 | action processentry31() { 722 | write_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 723 | } 724 | 725 | action noequ0_processentry31() { 726 | noequ0_write_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 727 | } 728 | 729 | action processentry31andWriteToPacket() { 730 | write_read_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 731 | } 732 | 733 | action noequ0_processentry31andWriteToPacket() { 734 | noequ0_write_read_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 735 | } 736 | 737 | action do_cleanEntry31() { 738 | clean_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 739 | } 
740 | 741 | action entry31WriteToPacket() { 742 | read_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 743 | } 744 | 745 | //action processentry32() { 746 | // write_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 747 | //} 748 | 749 | //action noequ0_processentry32() { 750 | // noequ0_write_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 751 | //} 752 | // 753 | //action processentry32andWriteToPacket() { 754 | // write_read_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 755 | //} 756 | 757 | //action noequ0_processentry32andWriteToPacket() { 758 | // noequ0_write_read_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 759 | //} 760 | 761 | //action do_cleanEntry32() { 762 | // clean_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 763 | //} 764 | 765 | //action entry32WriteToPacket() { 766 | // read_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 767 | //} 768 | // 769 | -------------------------------------------------------------------------------- /p4ml2/includes/common.p4: -------------------------------------------------------------------------------- 1 | /* 2 | * P4PS 3 | */ 4 | 5 | /************************************************************************* 6 | *********************** R E G I S T E R ******************************* 7 | *************************************************************************/ 8 | 9 | blackbox stateful_alu cleaning_agtr_time { 10 | reg: agtr_time; 11 | 12 | update_lo_1_value : 0; 13 | } 14 | 15 | blackbox stateful_alu cleaning_ecn { 16 | reg: ecn_register; 17 | 18 | update_lo_1_value : 0; 19 | } 20 | 21 | //for is_Col in A2TP 22 | 23 | blackbox stateful_alu cleaning_col { 24 | reg: col_register; 25 | 26 | update_lo_1_value : 0; 27 | } 28 | 29 | 30 | blackbox stateful_alu cleaning_bitmap { 31 | reg: bitmap; 32 | 33 | update_lo_1_value : 0; 34 | } 35 | 36 | blackbox stateful_alu read_write_bitmap { 37 | reg: bitmap; 38 | 39 | output_dst : mdata.bitmap; 40 | 41 | 
output_value : register_lo; 42 | 43 | update_lo_1_value : register_lo | p4ml.bitmap; 44 | } 45 | 46 | blackbox stateful_alu read_write_bitmap_resend { 47 | reg: bitmap; 48 | 49 | output_dst : mdata.bitmap; 50 | 51 | output_value : register_lo; 52 | 53 | update_lo_1_value : 0; 54 | } 55 | 56 | // if same application, output appID, if not, not output (zero) 57 | blackbox stateful_alu check_app_id_and_seq { 58 | reg: appID_and_Seq; 59 | 60 | condition_lo : p4ml.appIDandSeqNum == register_lo; 61 | // The agtr is empty 62 | condition_hi : register_lo == 0; 63 | 64 | update_lo_1_predicate : condition_lo or condition_hi; 65 | update_lo_1_value : p4ml.appIDandSeqNum; 66 | 67 | output_predicate : condition_lo or condition_hi; 68 | output_dst : mdata.isMyAppIDandMyCurrentSeq; 69 | output_value : p4ml.appIDandSeqNum; 70 | } 71 | 72 | blackbox stateful_alu check_app_id_and_seq_resend { 73 | reg: appID_and_Seq; 74 | 75 | condition_lo : p4ml.appIDandSeqNum == register_lo; 76 | 77 | update_lo_1_predicate : condition_lo; 78 | update_lo_1_value : 0; 79 | 80 | output_predicate : condition_lo; 81 | output_dst : mdata.isMyAppIDandMyCurrentSeq; 82 | output_value : register_lo; 83 | } 84 | 85 | blackbox stateful_alu clean_app_id_and_seq { 86 | reg: appID_and_Seq; 87 | 88 | condition_lo : p4ml.appIDandSeqNum == register_lo; 89 | 90 | update_lo_1_predicate : condition_lo; 91 | update_lo_1_value : 0; 92 | 93 | output_predicate : condition_lo; 94 | output_dst : mdata.isMyAppIDandMyCurrentSeq; 95 | output_value : p4ml.appIDandSeqNum; 96 | } 97 | 98 | blackbox stateful_alu check_agtrTime { 99 | reg: agtr_time; 100 | 101 | condition_lo : mdata.isAggregate != 0; 102 | output_dst : mdata.current_agtr_time; 103 | 104 | update_lo_1_predicate : condition_lo; 105 | update_lo_1_value : register_lo + 1; 106 | 107 | update_lo_2_predicate : not condition_lo; 108 | update_lo_2_value : register_lo; 109 | 110 | output_value : alu_lo; 111 | } 112 | 113 | blackbox stateful_alu check_resend_agtrTime { 114 | 
reg: agtr_time; 115 | 116 | condition_lo : mdata.isAggregate != 0; 117 | // fake, force forward 118 | output_dst : mdata.current_agtr_time; 119 | 120 | update_lo_1_predicate : condition_lo; 121 | update_lo_1_value : 0; 122 | 123 | update_lo_2_predicate : not condition_lo; 124 | update_lo_2_value : 0; 125 | 126 | output_value : p4ml.agtr_time; 127 | } 128 | 129 | blackbox stateful_alu do_comp_qdepth { 130 | reg: dqueue_alert_threshold; 131 | 132 | condition_lo : eg_intr_md.deq_qdepth >= register_lo; 133 | // fake, force forward 134 | output_predicate : condition_lo; 135 | output_dst : mdata.qdepth; 136 | output_value : eg_intr_md.deq_qdepth; 137 | initial_register_lo_value : 1000; 138 | } 139 | 140 | blackbox stateful_alu do_check_ecn { 141 | reg: ecn_register; 142 | 143 | condition_lo : register_lo == 1; 144 | 145 | update_lo_1_value : register_lo | mdata.is_ecn; 146 | 147 | output_predicate : condition_lo; 148 | output_value : mdata.value_one; 149 | output_dst : p4ml.ECN; 150 | } 151 | 152 | //for is_Col in A2TP 153 | 154 | blackbox stateful_alu do_check_col { 155 | reg: col_register; 156 | 157 | condition_lo : register_lo == 1; 158 | 159 | update_lo_1_value : register_lo | mdata.is_col; 160 | 161 | output_predicate : condition_lo; 162 | output_value : mdata.value_two; 163 | output_dst : p4ml.is_lzy_Col; //for is_Col in A2TP 164 | } 165 | 166 | blackbox stateful_alu do_tag_col { 167 | reg: col_register; 168 | 169 | update_lo_1_value : register_lo | mdata.is_col; 170 | 171 | } 172 | 173 | /************************************************************************* 174 | ************** I N G R E S S P R O C E S S I N G ******************* 175 | *************************************************************************/ 176 | 177 | /* 178 | * Actions 179 | */ 180 | 181 | action process_bitmap() { 182 | read_write_bitmap.execute_stateful_alu(p4ml_agtr_index.agtr); 183 | } 184 | 185 | action process_bitmap_resend() { 186 | 
read_write_bitmap_resend.execute_stateful_alu(p4ml_agtr_index.agtr); 187 | } 188 | 189 | 190 | action check_aggregate_and_forward() { 191 | // check whether aggregation is needed 192 | bit_andcb(mdata.isAggregate, p4ml.bitmap, mdata.bitmap); 193 | bit_or(mdata.integrated_bitmap, p4ml.bitmap, mdata.bitmap); 194 | } 195 | 196 | action clean_agtr_time() { 197 | cleaning_agtr_time.execute_stateful_alu(p4ml_agtr_index.agtr); 198 | } 199 | 200 | action clean_ecn() { 201 | cleaning_ecn.execute_stateful_alu(p4ml_agtr_index.agtr); 202 | } 203 | 204 | //for is_Col in A2TP 205 | 206 | action clean_col() { 207 | cleaning_col.execute_stateful_alu(p4ml_agtr_index.agtr); 208 | } 209 | 210 | 211 | action clean_bitmap() { 212 | cleaning_bitmap.execute_stateful_alu(p4ml_agtr_index.agtr); 213 | } 214 | 215 | action multicast(group) { 216 | modify_field(ig_intr_md_for_tm.mcast_grp_a, group); 217 | } 218 | 219 | action check_appID_and_seq() { 220 | check_app_id_and_seq.execute_stateful_alu(p4ml_agtr_index.agtr); 221 | //modify_field(mdata.qdepth, 0); 222 | } 223 | 224 | action check_appID_and_seq_resend() { 225 | check_app_id_and_seq_resend.execute_stateful_alu(p4ml_agtr_index.agtr); 226 | // modify_field(mdata.qdepth, 0); 227 | } 228 | 229 | action clean_appID_and_seq() { 230 | clean_app_id_and_seq.execute_stateful_alu(p4ml_agtr_index.agtr); 231 | } 232 | 233 | action check_agtr_time() { 234 | check_agtrTime.execute_stateful_alu(p4ml_agtr_index.agtr); 235 | } 236 | 237 | action check_resend_agtr_time() { 238 | check_resend_agtrTime.execute_stateful_alu(p4ml_agtr_index.agtr); 239 | } 240 | 241 | action modify_packet_bitmap() { 242 | modify_field(p4ml.bitmap, mdata.integrated_bitmap); 243 | } 244 | 245 | action do_qdepth() { 246 | do_comp_qdepth.execute_stateful_alu(0); 247 | } 248 | 249 | action modify_ecn() { 250 | modify_field(p4ml.ECN, 1); 251 | } 252 | 253 | action mark_ecn() { 254 | bit_or(mdata.is_ecn, mdata.qdepth, mdata.is_ecn); 255 | } 256 | 257 | action 
modify_ipv4_ecn() { 258 | modify_field(ipv4.ecn, 3); 259 | } 260 | 261 | action check_ecn() { 262 | do_check_ecn.execute_stateful_alu(p4ml_agtr_index.agtr); 263 | } 264 | 265 | action setup_ecn() { 266 | modify_field(mdata.is_ecn, 1); 267 | } 268 | 269 | //for is_Col in A2TP 270 | 271 | action check_col() { 272 | do_check_col.execute_stateful_alu(p4ml_agtr_index.agtr); 273 | } 274 | 275 | action tag_col() { 276 | do_tag_col.execute_stateful_alu(p4ml_agtr_index.agtr); 277 | } 278 | /* 279 | action setup_col() { 280 | modify_field(mdata.is_col, 1); 281 | } 282 | */ 283 | 284 | 285 | action tag_collision_incoming() { 286 | modify_field(p4ml.isSWCollision, 1); 287 | modify_field(mdata.is_col, 1); 288 | // modify_field(p4ml.bitmap, mdata.isMyAppIDandMyCurrentSeq); 289 | } 290 | 291 | action set_egr(egress_spec) { 292 | modify_field(ig_intr_md_for_tm.ucast_egress_port, egress_spec); 293 | // increase_p4ml_counter.execute_stateful_alu(ig_intr_md.ingress_port); 294 | } 295 | 296 | action set_egr_and_set_index(egress_spec) { 297 | modify_field(ig_intr_md_for_tm.ucast_egress_port, egress_spec); 298 | modify_field(p4ml.dataIndex, 1); 299 | // increase_p4ml_counter.execute_stateful_alu(ig_intr_md.ingress_port); 300 | } 301 | 302 | action nop() 303 | { 304 | } 305 | 306 | action drop_pkt() { 307 | drop(); 308 | } 309 | 310 | action increase_counter() { 311 | increase_p4ml_counter.execute_stateful_alu(0); 312 | } 313 | 314 | table bitmap_table { 315 | actions { 316 | process_bitmap; 317 | } 318 | default_action: process_bitmap(); 319 | size : 1; 320 | } 321 | 322 | table bitmap_resend_table { 323 | actions { 324 | process_bitmap_resend; 325 | } 326 | default_action: process_bitmap_resend(); 327 | size : 1; 328 | } 329 | 330 | //@pragma stage 2 331 | table bitmap_aggregate_table { 332 | actions { 333 | check_aggregate_and_forward; 334 | } 335 | default_action: check_aggregate_and_forward(); 336 | size : 1; 337 | } 338 | 339 | //@pragma stage 3 340 | table agtr_time_table { 341 | 
actions { 342 | check_agtr_time; 343 | } 344 | default_action: check_agtr_time(); 345 | size : 1; 346 | } 347 | 348 | //@pragma stage 3 349 | table agtr_time_resend_table { 350 | actions { 351 | check_resend_agtr_time; 352 | } 353 | default_action: check_resend_agtr_time(); 354 | size : 1; 355 | } 356 | 357 | table immd_outPort_table { 358 | reads { 359 | p4ml.appIDandSeqNum mask 0xFFFF0000: exact; 360 | } 361 | actions { 362 | set_egr; 363 | } 364 | } 365 | 366 | //@pragma stage 11 367 | table outPort_table { 368 | reads { 369 | p4ml.appIDandSeqNum mask 0xFFFF0000: exact; 370 | ig_intr_md.ingress_port: exact; 371 | p4ml.dataIndex: exact; 372 | p4ml.PSIndex: exact; 373 | } 374 | actions { 375 | nop; 376 | set_egr; 377 | set_egr_and_set_index; 378 | drop_pkt; 379 | } 380 | default_action: drop_pkt(); 381 | } 382 | 383 | table bg_outPort_table { 384 | reads { 385 | // useless here, just can't use default action for variable 386 | p4ml_bg.isACK : exact; 387 | } 388 | actions { 389 | set_egr; 390 | nop; 391 | } 392 | } 393 | 394 | table multicast_table { 395 | reads { 396 | p4ml.isACK: exact; 397 | p4ml.appIDandSeqNum mask 0xFFFF0000: exact; 398 | ig_intr_md.ingress_port: exact; 399 | p4ml.dataIndex: exact; 400 | } 401 | actions { 402 | multicast; drop_pkt; set_egr_and_set_index; 403 | } 404 | default_action: drop_pkt(); 405 | } 406 | 407 | @pragma stage 3 408 | table clean_agtr_time_table { 409 | actions { 410 | clean_agtr_time; 411 | } 412 | default_action: clean_agtr_time(); 413 | size : 1; 414 | } 415 | 416 | @pragma stage 1 417 | table clean_ecn_table { 418 | actions { 419 | clean_ecn; 420 | } 421 | default_action: clean_ecn(); 422 | size : 1; 423 | } 424 | 425 | //for is_Col in A2TP 426 | 427 | @pragma stage 2 428 | table clean_col_table { 429 | actions { 430 | clean_col; 431 | } 432 | default_action: clean_col(); 433 | size : 1; 434 | } 435 | 436 | 437 | table clean_bitmap_table { 438 | actions { 439 | clean_bitmap; 440 | } 441 | default_action: clean_bitmap(); 
442 | size : 1; 443 | } 444 | 445 | /* Counter */ 446 | register p4ml_counter { 447 | width : 32; 448 | instance_count :1; 449 | } 450 | 451 | blackbox stateful_alu increase_p4ml_counter { 452 | reg: p4ml_counter; 453 | 454 | update_lo_1_value : register_lo + 1 ; 455 | } 456 | 457 | table forward_counter_table { 458 | actions { 459 | increase_counter; 460 | } 461 | default_action: increase_counter(); 462 | size : 1; 463 | } 464 | 465 | //@pragma stage 0 466 | table appID_and_seq_table { 467 | actions { 468 | check_appID_and_seq; 469 | } 470 | default_action: check_appID_and_seq(); 471 | size : 1; 472 | } 473 | 474 | table appID_and_seq_resend_table { 475 | actions { 476 | check_appID_and_seq_resend; 477 | } 478 | default_action: check_appID_and_seq_resend(); 479 | size : 1; 480 | } 481 | 482 | table clean_appID_and_seq_table { 483 | actions { 484 | clean_appID_and_seq; 485 | } 486 | default_action: clean_appID_and_seq(); 487 | size : 1; 488 | } 489 | 490 | table modify_packet_bitmap_table { 491 | reads { 492 | p4ml.dataIndex: exact; 493 | } 494 | actions { 495 | modify_packet_bitmap; nop; 496 | } 497 | default_action: nop(); 498 | } 499 | 500 | table qdepth_table { 501 | actions { 502 | do_qdepth; 503 | } 504 | default_action: do_qdepth(); 505 | size : 1; 506 | } 507 | 508 | table modify_ecn_table { 509 | actions { 510 | modify_ecn; 511 | } 512 | default_action: modify_ecn(); 513 | size : 1; 514 | } 515 | 516 | table mark_ecn_ipv4_table { 517 | actions { 518 | modify_ipv4_ecn; 519 | } 520 | default_action: modify_ipv4_ecn(); 521 | size : 1; 522 | } 523 | 524 | table ecn_mark_table { 525 | actions { 526 | mark_ecn; 527 | } 528 | default_action: mark_ecn(); 529 | size : 1; 530 | } 531 | 532 | @pragma stage 1 533 | table ecn_register_table { 534 | actions { 535 | check_ecn; 536 | } 537 | default_action: check_ecn(); 538 | size : 1; 539 | } 540 | 541 | 542 | table setup_ecn_table { 543 | actions { 544 | setup_ecn; 545 | } 546 | default_action: setup_ecn(); 547 | size : 
1; 548 | } 549 | 550 | //for is_Col in A2TP 551 | @pragma stage 2 552 | table col_register_table { 553 | actions { 554 | check_col; 555 | } 556 | default_action: check_col(); 557 | size : 1; 558 | } 559 | 560 | 561 | 562 | 563 | table forward { 564 | reads { 565 | ethernet.dstAddr : exact; 566 | } 567 | actions { 568 | set_egr; nop; drop_pkt; 569 | } 570 | default_action: drop_pkt(); 571 | } 572 | 573 | table drop_table { 574 | reads { 575 | ig_intr_md.ingress_port: exact; 576 | p4ml.dataIndex : exact; 577 | } 578 | actions { 579 | drop_pkt; set_egr; set_egr_and_set_index; 580 | } 581 | default_action: drop_pkt(); 582 | } 583 | 584 | @pragma stage 2 585 | table tag_col_register_table { 586 | actions { 587 | tag_col; 588 | } 589 | default_action: tag_col(); 590 | } 591 | 592 | table tag_collision_incoming_table { 593 | actions { 594 | tag_collision_incoming; 595 | } 596 | default_action: tag_collision_incoming(); 597 | } 598 | -------------------------------------------------------------------------------- /p4ml2/includes/headers.p4: -------------------------------------------------------------------------------- 1 | #define MAX_ENTRIES_PER_PACKET 32 2 | /************************************************************************* 3 | *********************** H E A D E R S ********************************* 4 | *************************************************************************/ 5 | 6 | // 14Byte 7 | header_type ethernet_t { 8 | fields { 9 | dstAddr : 48; 10 | srcAddr : 48; 11 | etherType : 16; 12 | } 13 | } 14 | 15 | // 20Byte 16 | header_type ipv4_t { 17 | fields { 18 | version : 4; 19 | ihl : 4; 20 | dscp : 6; 21 | ecn : 2; 22 | totalLen : 16; 23 | identification : 16; 24 | flags : 3; 25 | fragOffset : 13; 26 | ttl : 8; 27 | protocol : 8; 28 | hdrChecksum : 16; 29 | srcAddr : 32; 30 | dstAddr : 32; 31 | } 32 | } 33 | 34 | header_type udp_t { 35 | fields { 36 | srcPort : 16; 37 | dstPort : 16; 38 | length_ : 16; 39 | checksum : 16; 40 | } 41 | } 42 | 43 | // 
12Byte * 2 44 | header_type p4ml_t { 45 | fields { 46 | bitmap : 32; 47 | agtr_time : 8; 48 | overflow : 1; 49 | /* For multiple PS */ 50 | PSIndex : 2; 51 | /* For single PS */ 52 | // reserved : 2; 53 | // isForceFoward : 1; 54 | dataIndex : 1; 55 | ECN : 1; 56 | isResend : 1; 57 | isSWCollision : 1; 58 | isACK : 1; 59 | appIDandSeqNum : 32; //in switchml.p4: this is used to find the bit location 60 | is_lzy_Col : 8; //used in A2TP 61 | } 62 | } 63 | 64 | header_type p4ml_agtr_index_t { 65 | fields{ 66 | agtr :16; 67 | } 68 | } 69 | 70 | header_type bg_p4ml_t { 71 | fields { 72 | key : 64; 73 | len_tensor : 32; 74 | bitmap : 32; 75 | agtr_time : 8; 76 | reserved : 4; 77 | ECN : 1; 78 | isResend : 1; 79 | isSWCollision : 1; 80 | isACK : 1; 81 | agtr : 16; 82 | appIDandSeqNum : 32; //in switchml.p4: this is used to find the bit location 83 | } 84 | } 85 | 86 | // 108Byte * 2 87 | header_type entry_t { 88 | fields { 89 | data0 : 32 (signed); 90 | data1 : 32 (signed); 91 | data2 : 32 (signed); 92 | data3 : 32 (signed); 93 | data4 : 32 (signed); 94 | data5 : 32 (signed); 95 | data6 : 32 (signed); 96 | data7 : 32 (signed); 97 | data8 : 32 (signed); 98 | data9 : 32 (signed); 99 | data10 : 32 (signed); 100 | data11 : 32 (signed); 101 | data12 : 32 (signed); 102 | data13 : 32 (signed); 103 | data14 : 32 (signed); 104 | data15 : 32 (signed); 105 | data16 : 32 (signed); 106 | data17 : 32 (signed); 107 | data18 : 32 (signed); 108 | data19 : 32 (signed); 109 | data20 : 32 (signed); 110 | data21 : 32 (signed); 111 | data22 : 32 (signed); 112 | data23 : 32 (signed); 113 | data24 : 32 (signed); 114 | data25 : 32 (signed); 115 | data26 : 32 (signed); 116 | data27 : 32 (signed); 117 | data28 : 32 (signed); 118 | data29 : 32 (signed); 119 | data30 : 32 (signed); 120 | // data31 : 32 (signed); 121 | } 122 | } 123 | 124 | //12Byte * 2 125 | // header_type entry2_t { 126 | // fields { 127 | // data27 : 32 (signed); 128 | // data28 : 32 (signed); 129 | // data29 : 32 (signed); 130 | // 
data30 : 32 (signed); 131 | // data31 : 32 (signed); 132 | // } 133 | // } 134 | 135 | /************************************************************************* 136 | *********************** M E T A D A T A ******************************* 137 | *************************************************************************/ 138 | 139 | header_type p4ml_meta_t { 140 | fields { 141 | // P4ML 142 | isMyAppIDandMyCurrentSeq : 16; 143 | bitmap : 32; 144 | isAggregate : 32; 145 | agtr_time : 8; 146 | integrated_bitmap : 32; 147 | current_agtr_time : 8; 148 | agtr_index : 32; 149 | isDrop : 32; 150 | inside_appID_and_Seq : 1; 151 | value_one : 1; 152 | value_two : 8; 153 | qdepth : 16; 154 | seen_bitmap0 : 8; 155 | seen_isAggregate : 8; 156 | is_ecn : 32; 157 | is_col : 32; //used in A2TP 158 | } 159 | } 160 | 161 | header_type p4ml_constant_t { 162 | fields{ 163 | bitmap :32; 164 | agtr_time :8; 165 | } 166 | } 167 | -------------------------------------------------------------------------------- /p4ml2/includes/parser.p4: -------------------------------------------------------------------------------- 1 | 2 | 3 | metadata p4ml_meta_t mdata; 4 | metadata p4ml_constant_t p4ml_constant; 5 | 6 | header ethernet_t ethernet; 7 | header ipv4_t ipv4; 8 | header udp_t udp; 9 | header p4ml_agtr_index_t p4ml_agtr_index; 10 | header p4ml_agtr_index_t p4ml_agtr_index_useless; 11 | header p4ml_agtr_index_t p4ml_agtr_index_useless2; 12 | 13 | header p4ml_t p4ml; 14 | header entry_t p4ml_entries; 15 | header entry_t p4ml_entries_useless; 16 | 17 | header bg_p4ml_t p4ml_bg; 18 | // header blank3_t blank3; 19 | /************************************************************************* 20 | *********************** P A R S E R *********************************** 21 | *************************************************************************/ 22 | 23 | parser start { 24 | extract(ethernet); 25 | set_metadata(mdata.value_one, 1); 26 | set_metadata(mdata.value_two, 1); 27 | return 
select(ethernet.etherType) { 28 | 0x0700 : parse_ipv4; 29 | 0x0800 : parse_rdma; 30 | 0x0900 : parse_bg; 31 | default : ingress; 32 | } 33 | // return parse_ipv4; 34 | 35 | } 36 | 37 | parser parse_ipv4 { 38 | extract(ipv4); 39 | return parse_p4ml; 40 | } 41 | 42 | parser parse_p4ml { 43 | extract(p4ml); 44 | return select(p4ml.dataIndex) { 45 | 0x0 : check_if_resubmit; 46 | 0x1 : use_second_p4ml_agtr_index_recirculate; 47 | default : ingress; 48 | } 49 | } 50 | 51 | parser check_if_resubmit { 52 | return select(ig_intr_md.resubmit_flag) { 53 | // 0x0 : parse_p4ml_agtr_index; 54 | 0x0 : use_first_p4ml_agtr_index_recirculate; 55 | // 0x1 : skip_first_p4ml_agtr_index; 56 | 0x1 : use_second_p4ml_agtr_index_recirculate; 57 | default : ingress; 58 | } 59 | } 60 | 61 | /// resubmit 0x0 62 | 63 | parser parse_p4ml_agtr_index { 64 | extract(p4ml_agtr_index); 65 | return skip_second_p4ml_agtr_index; 66 | } 67 | 68 | @pragma force_shift ingress 16 /* 2 bytes */ 69 | parser skip_second_p4ml_agtr_index { 70 | return parse_entry; 71 | } 72 | 73 | parser parse_entry { 74 | extract(p4ml_entries); 75 | return ingress; 76 | } 77 | 78 | /// resubmit 0x1 79 | 80 | parser parse_p4ml_agtr_index2 { 81 | extract(p4ml_agtr_index); 82 | return skip_header_c_0_31; 83 | } 84 | 85 | @pragma force_shift ingress 16 /* 2 bytes */ 86 | parser skip_first_p4ml_agtr_index { 87 | return parse_p4ml_agtr_index2; 88 | } 89 | 90 | /// recirculate 2 91 | 92 | parser use_second_p4ml_agtr_index_recirculate { 93 | extract(p4ml_agtr_index_useless2); 94 | return parse_p4ml_agtr_index_recirculate; 95 | } 96 | 97 | parser parse_p4ml_agtr_index_recirculate { 98 | extract(p4ml_agtr_index); 99 | return parse_entry2; 100 | } 101 | 102 | parser parse_entry2 { 103 | extract(p4ml_entries_useless); 104 | return parse_entry; 105 | } 106 | 107 | /// recirculate 1 108 | 109 | parser use_first_p4ml_agtr_index_recirculate { 110 | extract(p4ml_agtr_index); 111 | return useless_second_p4ml_agtr_index_recirculate; 112 | } 113 | 
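The parser states above fix the wire layout a worker must emit for an aggregation packet (etherType 0x0700 → parse_ipv4 → parse_p4ml): the p4ml_t header from headers.p4, then a 2-byte aggregator index, then the 31-entry payload. As a rough host-side sketch (a hypothetical helper, not part of this repo), the 11-byte p4ml_t header could be packed as follows; the flag bit order follows the field declaration order in headers.p4, and the 16/16 appID/sequence split of appIDandSeqNum is inferred from the `j << 16` match specs and the `appID_and_Seq[0]>>16` read in setupp4ml2.py:

```python
# Hypothetical host-side sketch (not part of this repo): pack the p4ml_t
# header in the field order declared in headers.p4. P4-14 serializes
# fields in declaration order, MSB first, so the seven 1-2 bit flags
# share one byte: overflow | PSIndex(2) | dataIndex | ECN | isResend |
# isSWCollision | isACK.
import struct

def build_p4ml_header(bitmap, agtr_time, app_id, seq_num,
                      overflow=0, ps_index=0, data_index=0, ecn=0,
                      is_resend=0, is_sw_collision=0, is_ack=0,
                      is_lzy_col=0):
    flags = ((overflow & 0x1) << 7) | ((ps_index & 0x3) << 5) | \
            ((data_index & 0x1) << 4) | ((ecn & 0x1) << 3) | \
            ((is_resend & 0x1) << 2) | ((is_sw_collision & 0x1) << 1) | \
            (is_ack & 0x1)
    # appIDandSeqNum: high 16 bits = appID, low 16 bits = sequence number
    # (inferred from the `j << 16` match specs in setupp4ml2.py).
    app_id_and_seq = ((app_id & 0xFFFF) << 16) | (seq_num & 0xFFFF)
    # bitmap(4B) agtr_time(1B) flags(1B) appIDandSeqNum(4B) is_lzy_Col(1B)
    return struct.pack("!IBBIB", bitmap, agtr_time, flags,
                       app_id_and_seq, is_lzy_col)

hdr = build_p4ml_header(bitmap=0x1, agtr_time=2, app_id=1, seq_num=7)
assert len(hdr) == 11  # 88 bits of p4ml_t fields = 11 bytes
```

On the wire, the 2-byte p4ml_agtr_index_t and the 31 signed 32-bit entry_t values would follow this header.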
114 | parser useless_second_p4ml_agtr_index_recirculate { 115 | extract(p4ml_agtr_index_useless); 116 | return parse_entry; 117 | } 118 | /// 119 | 120 | @pragma force_shift ingress 256 /* 32 bytes */ 121 | parser skip_header_c_0_31 { 122 | return skip_header_c_32_63; 123 | } 124 | 125 | @pragma force_shift ingress 256 /* 32 bytes */ 126 | parser skip_header_c_32_63 { 127 | return skip_header_c_64_95; 128 | } 129 | 130 | @pragma force_shift ingress 256 /* 32 bytes */ 131 | parser skip_header_c_64_95 { 132 | return skip_header_c_96_127; 133 | } 134 | 135 | @pragma force_shift ingress 256 /* 32 bytes */ 136 | parser skip_header_c_96_127 { 137 | return parse_entry; 138 | } 139 | 140 | 141 | // /* RDMA */ 142 | parser parse_rdma { 143 | extract(ipv4); 144 | return ingress; 145 | } 146 | 147 | // /* BG */ 148 | parser parse_bg { 149 | extract(ipv4); 150 | return parse_udp_bg; 151 | } 152 | 153 | parser parse_udp_bg { 154 | extract(udp); 155 | return parse_p4ml_bg; 156 | } 157 | 158 | parser parse_p4ml_bg { 159 | extract(p4ml_bg); 160 | //set_metadata(mdata.qdepth, 0); 161 | // return ingress; 162 | return ingress; 163 | } 164 | -------------------------------------------------------------------------------- /p4ml2/p4ml2.p4: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "includes/headers.p4" 5 | #include "includes/parser.p4" 6 | 7 | #include "includes/registers.p4" 8 | #include "includes/tables.p4" 9 | #include "includes/actions.p4" 10 | #include "includes/common.p4" 11 | 12 | field_list p4ml_resubmit_list{ 13 | mdata.agtr_time; 14 | } 15 | 16 | action do_resubmit(){ 17 | resubmit(p4ml_resubmit_list); 18 | } 19 | 20 | table p4ml_resubmit{ 21 | actions{ 22 | do_resubmit; 23 | } 24 | default_action: do_resubmit(); 25 | size: 1; 26 | 27 | } 28 | control ingress 29 | { 30 | 31 | if (valid(p4ml_entries)) { //aggregated packet 32 | 33 | if (ipv4.ecn == 3 or p4ml.ECN == 1) { //mdata.is_ecn=1 
34 | apply(setup_ecn_table); //in common.p4 35 | } 36 | // ack packet 37 | if (p4ml.isACK == 1) { 38 | 39 | if (p4ml.overflow == 1 and p4ml.isResend == 0) { 40 | 41 | } else { 42 | apply(clean_appID_and_seq_table); //in common.p4 and registers.p4 get mdata.isMyAppIDandMyCurrentSeq 43 | 44 | if (mdata.isMyAppIDandMyCurrentSeq != 0) { 45 | /* Clean */ 46 | apply(clean_bitmap_table); 47 | apply(clean_ecn_table); 48 | 49 | apply(clean_col_table); //used in A2TP 50 | 51 | apply(clean_agtr_time_table); 52 | 53 | 54 | // apply(cleanEntry1); 55 | } 56 | } 57 | 58 | /* Multicast Back */ 59 | if(ig_intr_md.resubmit_flag == 1) { 60 | apply(multicast_table); 61 | } else { 62 | apply(p4ml_resubmit); //resubmit mdata.agtr_time 63 | } 64 | 65 | } else { 66 | 67 | if (p4ml.overflow == 1) { 68 | apply(outPort_table); //set eg according to appseq, ingress, dataindex, psindex in common.p4 69 | } else { 70 | if (p4ml.isResend == 1) { 71 | apply(appID_and_seq_resend_table); //clean appid and seq 72 | } else { 73 | apply(appID_and_seq_table); //allocate a new/existing block for current appid 74 | } 75 | // Correct ID and Seq 76 | if (mdata.isMyAppIDandMyCurrentSeq != 0) { 77 | 78 | if (p4ml.isResend == 1) { 79 | // Clean the bitmap also 80 | apply(bitmap_resend_table); 81 | } else { 82 | apply(bitmap_table); // OR bitmap 83 | } 84 | 85 | 86 | 87 | apply(ecn_register_table); 88 | 89 | apply(col_register_table); //used in A2TP 90 | 91 | apply(bitmap_aggregate_table); //check aggregation bitmap get 92 | //isaggregate? and bit_or 93 | 94 | 95 | if (p4ml.isResend == 1) { 96 | // Force forward and clean 97 | apply(agtr_time_resend_table); 98 | } else { 99 | apply(agtr_time_table); //update agtr_time 100 | } 101 | 102 | // bitmap correct 103 | if (mdata.isAggregate != 0) { //need to aggregate 104 | if (mdata.current_agtr_time == p4ml.agtr_time) { // aggregation finish? 
105 | apply(noequ0_processEntry1andWriteToPacket); //sum to register 106 | apply(noequ0_processEntry2andWriteToPacket); 107 | apply(noequ0_processEntry3andWriteToPacket); 108 | apply(noequ0_processEntry4andWriteToPacket); 109 | apply(noequ0_processEntry5andWriteToPacket); 110 | apply(noequ0_processEntry6andWriteToPacket); 111 | apply(noequ0_processEntry7andWriteToPacket); 112 | apply(noequ0_processEntry8andWriteToPacket); 113 | apply(noequ0_processEntry9andWriteToPacket); 114 | apply(noequ0_processEntry10andWriteToPacket); 115 | apply(noequ0_processEntry11andWriteToPacket); 116 | apply(noequ0_processEntry12andWriteToPacket); 117 | apply(noequ0_processEntry13andWriteToPacket); 118 | apply(noequ0_processEntry14andWriteToPacket); 119 | apply(noequ0_processEntry15andWriteToPacket); 120 | apply(noequ0_processEntry16andWriteToPacket); 121 | apply(noequ0_processEntry17andWriteToPacket); 122 | apply(noequ0_processEntry18andWriteToPacket); 123 | apply(noequ0_processEntry19andWriteToPacket); 124 | apply(noequ0_processEntry20andWriteToPacket); 125 | apply(noequ0_processEntry21andWriteToPacket); 126 | apply(noequ0_processEntry22andWriteToPacket); 127 | apply(noequ0_processEntry23andWriteToPacket); 128 | apply(noequ0_processEntry24andWriteToPacket); 129 | apply(noequ0_processEntry25andWriteToPacket); 130 | apply(noequ0_processEntry26andWriteToPacket); 131 | apply(noequ0_processEntry27andWriteToPacket); 132 | apply(noequ0_processEntry28andWriteToPacket); 133 | apply(noequ0_processEntry29andWriteToPacket); 134 | apply(noequ0_processEntry30andWriteToPacket); 135 | apply(noequ0_processEntry31andWriteToPacket); 136 | //apply(noequ0_processEntry32andWriteToPacket); 137 | // set output port 138 | // if(ig_intr_md.resubmit_flag == 1) { 139 | apply(modify_packet_bitmap_table); 140 | apply(outPort_table); 141 | // } else { 142 | // apply(p4ml_resubmit); 143 | // } 144 | } else { 145 | apply(processEntry1); // cover or sum to register 146 | apply(processEntry2); 147 | 
apply(processEntry3); 148 | apply(processEntry4); 149 | apply(processEntry5); 150 | apply(processEntry6); 151 | apply(processEntry7); 152 | apply(processEntry8); 153 | apply(processEntry9); 154 | apply(processEntry10); 155 | apply(processEntry11); 156 | apply(processEntry12); 157 | apply(processEntry13); 158 | apply(processEntry14); 159 | apply(processEntry15); 160 | apply(processEntry16); 161 | apply(processEntry17); 162 | apply(processEntry18); 163 | apply(processEntry19); 164 | apply(processEntry20); 165 | apply(processEntry21); 166 | apply(processEntry22); 167 | apply(processEntry23); 168 | apply(processEntry24); 169 | apply(processEntry25); 170 | apply(processEntry26); 171 | apply(processEntry27); 172 | apply(processEntry28); 173 | apply(processEntry29); 174 | apply(processEntry30); 175 | apply(processEntry31); 176 | //apply(processEntry32); 177 | 178 | if (ig_intr_md.resubmit_flag == 1) { 179 | apply(drop_table); 180 | } else { 181 | apply(p4ml_resubmit); 182 | } 183 | 184 | } 185 | } else { //arrive yet 186 | if (mdata.current_agtr_time == p4ml.agtr_time) { 187 | apply(Entry1WriteToPacket); //write to p4ml_entries.data1; to packet 188 | apply(Entry2WriteToPacket); 189 | apply(Entry3WriteToPacket); 190 | apply(Entry4WriteToPacket); 191 | apply(Entry5WriteToPacket); 192 | apply(Entry6WriteToPacket); 193 | apply(Entry7WriteToPacket); 194 | apply(Entry8WriteToPacket); 195 | apply(Entry9WriteToPacket); 196 | apply(Entry10WriteToPacket); 197 | apply(Entry11WriteToPacket); 198 | apply(Entry12WriteToPacket); 199 | apply(Entry13WriteToPacket); 200 | apply(Entry14WriteToPacket); 201 | apply(Entry15WriteToPacket); 202 | apply(Entry16WriteToPacket); 203 | apply(Entry17WriteToPacket); 204 | apply(Entry18WriteToPacket); 205 | apply(Entry19WriteToPacket); 206 | apply(Entry20WriteToPacket); 207 | apply(Entry21WriteToPacket); 208 | apply(Entry22WriteToPacket); 209 | apply(Entry23WriteToPacket); 210 | apply(Entry24WriteToPacket); 211 | apply(Entry25WriteToPacket); 212 | 
apply(Entry26WriteToPacket); 213 | apply(Entry27WriteToPacket); 214 | apply(Entry28WriteToPacket); 215 | apply(Entry29WriteToPacket); 216 | apply(Entry30WriteToPacket); 217 | apply(Entry31WriteToPacket); 218 | //apply(Entry32WriteToPacket); 219 | // set output port 220 | // if(ig_intr_md.resubmit_flag == 1) { 221 | apply(modify_packet_bitmap_table); 222 | apply(outPort_table); 223 | // } else { 224 | // apply(p4ml_resubmit); 225 | // } 226 | } 227 | } 228 | } else { 229 | /* tag collision bit in incoming one */ 230 | // if not empty 231 | if (p4ml.isResend == 0) { 232 | 233 | apply(tag_collision_incoming_table); 234 | 235 | if(mdata.is_col == 1){ 236 | apply(tag_col_register_table); //used in A2TP 237 | } 238 | 239 | } 240 | 241 | apply(outPort_table); 242 | } 243 | } 244 | } 245 | } else { 246 | // // BG traffic doesn't have data layer 247 | // if (valid(p4ml_bg)){ 248 | // apply(bg_outPort_table); 249 | // } else { 250 | apply(forward); 251 | // } 252 | } 253 | } 254 | 255 | control egress 256 | { 257 | apply(qdepth_table); 258 | if (valid(ipv4)) { 259 | if (mdata.qdepth != 0) { 260 | apply(mark_ecn_ipv4_table); 261 | } 262 | } 263 | if (valid(p4ml_entries)) { 264 | if (mdata.qdepth != 0) { 265 | apply(modify_ecn_table); 266 | } 267 | } 268 | } 269 | 270 | -------------------------------------------------------------------------------- /ptf_p4ml2/ptfTest.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import pd_base_tests 4 | import pltfm_pm_rpc 5 | import pal_rpc 6 | import random 7 | import sys 8 | import time 9 | import unittest 10 | 11 | from pltfm_pm_rpc.ttypes import * 12 | from pal_rpc.ttypes import * 13 | from ptf import config 14 | from ptf.testutils import * 15 | from ptf.thriftutils import * 16 | from res_pd_rpc.ttypes import * 17 | from ptf import config 18 | from ptf.thriftutils import * 19 | 20 | from res_pd_rpc.ttypes import * 21 | from port_mapping import * 22 | 23 | from 
tm_api_rpc.ttypes import * 23 | 24 | 25 | this_dir = os.path.dirname(os.path.abspath(__file__)) 26 | 27 | fp_ports = ["24/0", "23/0", "22/0", "21/0", "20/0", "16/0","15/0","14/0","13/0", "12/0", "11/0"] 28 | # fp_ports = ["13/0","14/0", "11/0"] 29 | loopback_ports = ["19/0", "18/0"] 30 | # loopback_ports = ["1/0", "2/0", "3/0", "4/0", "5/0", "6/0", "7/0", "8/0", "25/0"] 31 | def toInt8(n): 32 | n = n & 0xff 33 | return (n ^ 0x80) - 0x80 34 | 35 | class L2Test(pd_base_tests.ThriftInterfaceDataPlane): 36 | def __init__(self): 37 | pd_base_tests.ThriftInterfaceDataPlane.__init__(self, 38 | ["basic_switching"]) 39 | 40 | # The setUp() method is used to prepare the test fixture. Typically 41 | # you would use it to establish a connection to the Thrift server. 42 | # 43 | # You can also put the initial device configuration there. However, 44 | # if during this process an error is encountered, it will be considered 45 | # a test error (meaning the test is incorrect), 46 | # rather than a test failure 47 | def setUp(self): 48 | # initialize the connection 49 | pd_base_tests.ThriftInterfaceDataPlane.setUp(self) 50 | self.sess_hdl = self.conn_mgr.client_init() 51 | self.dev_tgt = DevTarget_t(0, hex_to_i16(0xFFFF)) 52 | self.devPorts = [] 53 | self.LPPorts = [] 54 | self.dev = 0 55 | self.platform_type = "mavericks" 56 | board_type = self.pltfm_pm.pltfm_pm_board_type_get() 57 | if re.search("0x0234|0x1234|0x4234|0x5234", hex(board_type)): 58 | self.platform_type = "mavericks" 59 | elif re.search("0x2234|0x3234", hex(board_type)): 60 | self.platform_type = "montara" 61 | 62 | # get the device ports from front panel ports 63 | try: 64 | for fpPort in fp_ports: 65 | port, chnl = fpPort.split("/") 66 | devPort = \ 67 | self.pal.pal_port_front_panel_port_to_dev_port_get(0, 68 | int(port), 69 | int(chnl)) 70 | self.devPorts.append(devPort) 71 | 72 | if test_param_get('setup') == True or (test_param_get('setup') != True 73 | and test_param_get('cleanup') != True): 74 | 75 | # add and 
enable the platform ports 76 | for i in self.devPorts: 77 | if int(i) in [999]: #pal_port_speed_t.BF_SPEED_40G, pal_fec_type_t.BF_FEC_TYP_NONE 78 | self.pal.pal_port_add(0, i, 79 | pal_port_speed_t.BF_SPEED_40G, pal_fec_type_t.BF_FEC_TYP_NONE) #pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON 80 | self.pal.pal_port_an_set(0, i, 2); 81 | self.pal.pal_port_enable(0, i) 82 | else: 83 | self.pal.pal_port_add(0, i, 84 | pal_port_speed_t.BF_SPEED_100G, pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON) #pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON 85 | self.pal.pal_port_an_set(0, i, 2); 86 | self.pal.pal_port_enable(0, i) 87 | 88 | ####################### LOOPBACK ########################### 89 | for lbPort in loopback_ports: 90 | port, chnl = lbPort.split("/") 91 | devPort = \ 92 | self.pal.pal_port_front_panel_port_to_dev_port_get(0, 93 | int(port), 94 | int(chnl)) 95 | self.LPPorts.append(devPort) 96 | 97 | # add and enable the platform ports 98 | for i in self.LPPorts: 99 | self.pal.pal_port_add(0, i, 100 | pal_port_speed_t.BF_SPEED_100G, pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON) #pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON 101 | 102 | self.pal.pal_port_loopback_mode_set(0, i, 103 | pal_loopback_mod_t.BF_LPBK_MAC_NEAR) 104 | self.pal.pal_port_an_set(0, i, 2); 105 | self.pal.pal_port_enable(0, i) 106 | 107 | self.conn_mgr.complete_operations(self.sess_hdl) 108 | 109 | except Exception as e: 110 | print "Some Error in port init" 111 | 112 | # # flow control setting, follow "Barefoot Network Tofino Fixed Function API Guide" 113 | # for i in range(len(self.devPorts)): 114 | # # step 1: Map lossless traffic to a PPG handle with a buffer limit 115 | # ppg_cells = 2000 116 | # self.ppg_handler = self.tm.tm_allocate_ppg(self.dev, self.devPorts[i]) 117 | # self.tm.tm_set_ppg_guaranteed_min_limit(self.dev, self.ppg_handler, ppg_cells) 118 | 119 | # # step 2: Map traffic to an iCos 120 | # icos_bmap = toInt8(0x01) 121 | # self.tm.tm_set_ppg_icos_mapping(self.dev, self.ppg_handler, icos_bmap) 122 | 123 | # # 
step 3: Provision skid buffer, set up pause/PFC generation 124 | # skid_cells = 4000 125 | # self.tm.tm_set_ppg_skid_limit(self.dev, self.ppg_handler, skid_cells) 126 | # self.tm.tm_enable_lossless_treatment(self.dev, self.ppg_handler) 127 | # # link-level flow control 128 | # fctype = 1 # BF_TM_PAUSE_PORT 129 | # self.tm.tm_set_port_flowcontrol_mode(self.dev, self.devPorts[i], fctype) 130 | # # iCos to Cos 131 | # icos_cos_map = tm_pfc_cos_map_t(CoS0_to_iCos=0) 132 | # self.tm.tm_set_port_pfc_cos_mapping(self.dev, self.devPorts[i], icos_cos_map) 133 | 134 | # ########################################## 135 | # for i in range(len(self.devPorts)): 136 | # #step 4: Apply buffering 137 | # queue_id = 0 138 | # queue_cells = 25000 139 | # self.tm.tm_set_q_guaranteed_min_limit(self.dev, self.devPorts[i], queue_id, queue_cells) 140 | 141 | # # step 5: Allocate queues 142 | # q_count = 8 143 | # q_map = tm_q_map_t(0,1,2,3,4,5,6,7) 144 | # self.tm.tm_set_port_q_mapping(self.dev, self.devPorts[i], q_count, q_map) 145 | # # step 6: Apply weighting if needed (skip, no use) 146 | 147 | # # step 7: Honor pause/PFC event 148 | # cos = 0 149 | # self.tm.tm_set_q_pfc_cos_mapping(self.dev, self.devPorts[i], queue_id, cos) 150 | 151 | # # Cannot find the API below 152 | # # self.tm.tm_set_port_flowcontrol_rx(self.dev, self.devPorts, fctype) 153 | # self.tm.tm_complete_operations(self.dev) 154 | 155 | # for i in range(len(self.devPorts)): 156 | # # For MAC 157 | # self.pal.pal_port_flow_control_pfc_set(self.dev, self.devPorts[i], 1, 1) 158 | # print("Done with PFC") 159 | 160 | return 161 | 162 | def runTest(self): 163 | print "runTest" 164 | # self.conn_mgr.complete_operations(self.sess_hdl) 165 | 166 | def tearDown(self): 167 | return 168 | # try: 169 | # print("Clearing table entries") 170 | # for table in self.entries.keys(): 171 | # delete_func = "self.client." 
+ table + "_table_delete" 172 | # for entry in self.entries[table]: 173 | # exec delete_func + "(self.sess_hdl, self.dev, entry)" 174 | # except: 175 | # print("Error while cleaning up. ") 176 | # print("You might need to restart the driver") 177 | # finally: 178 | # self.conn_mgr.complete_operations(self.sess_hdl) 179 | # self.conn_mgr.client_cleanup(self.sess_hdl) 180 | # print("Closed Session %d" % self.sess_hdl) 181 | # self.tm.tm_free_ppg(self.dev, self.ppg_handler) 182 | # print("Free ppg handler %d" % self.ppg_handler) 183 | # pd_base_tests.ThriftInterfaceDataPlane.tearDown(self) 184 | -------------------------------------------------------------------------------- /ptf_p4ml2/ptfTest.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/ptf_p4ml2/ptfTest.pyc -------------------------------------------------------------------------------- /run_pd_rpc/setupp4ml2.py: -------------------------------------------------------------------------------- 1 | clear_all() 2 | 3 | p4_pd.register_reset_all_agtr_time() 4 | p4_pd.register_reset_all_appID_and_Seq() 5 | p4_pd.register_reset_all_bitmap() 6 | p4_pd.register_reset_all_register1() 7 | p4_pd.register_reset_all_register2() 8 | p4_pd.register_reset_all_register3() 9 | p4_pd.register_reset_all_register4() 10 | p4_pd.register_reset_all_register5() 11 | p4_pd.register_reset_all_register6() 12 | p4_pd.register_reset_all_register7() 13 | p4_pd.register_reset_all_register8() 14 | p4_pd.register_reset_all_register9() 15 | p4_pd.register_reset_all_register10() 16 | p4_pd.register_reset_all_register11() 17 | p4_pd.register_reset_all_register12() 18 | p4_pd.register_reset_all_register13() 19 | p4_pd.register_reset_all_register14() 20 | p4_pd.register_reset_all_register15() 21 | p4_pd.register_reset_all_register16() 22 | p4_pd.register_reset_all_register17() 23 | 
p4_pd.register_reset_all_register18() 24 | p4_pd.register_reset_all_register19() 25 | p4_pd.register_reset_all_register20() 26 | p4_pd.register_reset_all_register21() 27 | p4_pd.register_reset_all_register22() 28 | p4_pd.register_reset_all_register23() 29 | p4_pd.register_reset_all_register24() 30 | p4_pd.register_reset_all_register25() 31 | p4_pd.register_reset_all_register26() 32 | p4_pd.register_reset_all_register27() 33 | p4_pd.register_reset_all_register28() 34 | p4_pd.register_reset_all_register29() 35 | p4_pd.register_reset_all_register30() 36 | p4_pd.register_reset_all_register31() 37 | # p4_pd.register_reset_all_register32() 38 | 39 | 40 | # These are background traffic 41 | # p4_pd.bg_outPort_table_table_add_with_set_egr( 42 | # p4_pd.bg_outPort_table_match_spec_t(0), 43 | # p4_pd.set_egr_action_spec_t(4) 44 | # ) 45 | 46 | # p4_pd.bg_outPort_table_table_add_with_set_egr( 47 | # p4_pd.bg_outPort_table_match_spec_t(1), 48 | # p4_pd.set_egr_action_spec_t(0) 49 | # ) 50 | 51 | # first Zero for pending 52 | #port_of_worker = [0, 56, 48, 40, 32, 24, 16, 8, 0, 4] 53 | #port_of_worker = [0, 8, 0, 4] 54 | port_of_worker = [0, 60, 52, 44, 36, 28, 0, 8, 16, 24, 32, 40] 55 | single_loopback_port = [0, 20, 12] #20 12 56 | 57 | MAC_address_of_worker = [ "0" 58 | , "0c:42:a1:5a:5b:c1" 59 | , "0c:42:a1:5a:5b:d9" 60 | , "0c:42:a1:5a:5b:b9" 61 | , "0c:42:a1:5a:53:01" 62 | , "b8:59:9f:e2:0c:17" 63 | , "b8:59:9f:e2:0c:16" 64 | , "0c:42:a1:5a:5b:e1" 65 | , "b8:59:9f:e2:25:f7" 66 | , "b8:59:9f:e2:09:47" 67 | , "0c:42:a1:5a:53:81" 68 | , "b8:59:9f:e2:26:0e"] 69 | 70 | # host0 24 60 0c:42:a1:5a:5b:c1 71 | # host1 23 52 0c:42:a1:5a:5b:d9 72 | # host2 22 44 0c:42:a1:5a:5b:b9 73 | # host3 21 36 0c:42:a1:5a:53:01 74 | ##host4 20 28 b8:59:9f:e2:0c:17 75 | # loop1 19 20 76 | # loop2 18 12 77 | # 17 78 | ##host5 16 0 b8:59:9f:e2:0c:16 79 | # host6 15 8 0c:42:a1:5a:5b:e1 80 | # host7 14 16 b8:59:9f:e2:25:f7 81 | # host8 13 24 b8:59:9f:e2:09:47 82 | # host9 12 32 0c:42:a1:5a:53:81 83 | 
# host10 11 40 b8:59:9f:e2:26:0e 84 | 85 | 86 | 87 | 88 | 89 | # first Zero for pending 90 | # PSs = [0, 9, 8] 91 | PSs = [0, 5, 6] 92 | 93 | len_workers = len(port_of_worker) 94 | len_PS = len(PSs) 95 | 96 | # Normal Switch traffic 97 | for i in range(1, len_workers): 98 | p4_pd.forward_table_add_with_set_egr( 99 | p4_pd.forward_match_spec_t(macAddr_to_string(MAC_address_of_worker[i])), 100 | p4_pd.set_egr_action_spec_t(port_of_worker[i]) 101 | ) 102 | 103 | 104 | # P4ML Traffic 105 | 106 | # No Pending packet, First time enter switch 107 | for i in range(1, len_workers - 1): 108 | for j in range(1, len_PS): #appIDandseqnum,loopport,dataIndex,PSIndex PSIndex=key%num_PS 109 | p4_pd.outPort_table_table_add_with_set_egr_and_set_index( 110 | p4_pd.outPort_table_match_spec_t( 111 | j << 16, 112 | port_of_worker[i], 113 | 0, 114 | 0), 115 | 116 | p4_pd.set_egr_and_set_index_action_spec_t(single_loopback_port[j])) 117 | 118 | # Not Pending packet, Second time enter switch 119 | for j in range(1, len_PS): 120 | print(j, PSs[j]) #appIDandseqnum,loopport,dataIndex,PSIndex 121 | p4_pd.outPort_table_table_add_with_set_egr( 122 | p4_pd.outPort_table_match_spec_t( 123 | j << 16, 124 | single_loopback_port[j], 125 | 1, 126 | 0), 127 | # app1 -> worker3 128 | p4_pd.set_egr_action_spec_t(port_of_worker[PSs[j]])) 129 | 130 | # INGRESSPORT, Index 131 | 132 | for i in range(1, len_workers - 1):#ingressport,dataindex 133 | p4_pd.drop_table_table_add_with_drop_pkt( 134 | p4_pd.drop_table_match_spec_t( 135 | port_of_worker[i], 136 | 1) 137 | ) 138 | 139 | ####### Server ######## 140 | ''' 141 | for j in range(1, len_PS): 142 | p4_pd.multicast_table_table_add_with_multicast( 143 | p4_pd.multicast_table_match_spec_t( 144 | 1, 145 | 1 << 16, 146 | port_of_worker[PSs[j]], 147 | 0), 148 | # multicast app1 -> worker1, 2 149 | p4_pd.multicast_action_spec_t(999) 150 | ) 151 | ''' 152 | for j in range(1, len_PS): #isAck, appIDandseqnum, IngressPort, dataIndex; aciton(multicast_group) 153 | 
p4_pd.multicast_table_table_add_with_multicast( 154 | p4_pd.multicast_table_match_spec_t( 155 | 1, 156 | j << 16, 157 | port_of_worker[PSs[j]], 158 | 0), 159 | # multicast app1 -> worker1, 2 160 | p4_pd.multicast_action_spec_t(1000-j) 161 | ) 162 | 163 | p4_pd.modify_packet_bitmap_table_table_add_with_modify_packet_bitmap( 164 | p4_pd.modify_packet_bitmap_table_match_spec_t(1) 165 | ) 166 | 167 | p4_pd.modify_packet_bitmap_table_table_add_with_nop( 168 | p4_pd.modify_packet_bitmap_table_match_spec_t(0) 169 | ) 170 | 171 | p4_pd.processEntry1_table_add_with_processentry1( 172 | p4_pd.processEntry1_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 173 | ) 174 | p4_pd.processEntry1_table_add_with_noequ0_processentry1( 175 | p4_pd.processEntry1_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1, 176 | ) 177 | p4_pd.processEntry2_table_add_with_processentry2( 178 | p4_pd.processEntry2_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 179 | ) 180 | p4_pd.processEntry2_table_add_with_noequ0_processentry2( 181 | p4_pd.processEntry2_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 182 | ) 183 | p4_pd.processEntry3_table_add_with_processentry3( 184 | p4_pd.processEntry3_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 185 | ) 186 | p4_pd.processEntry3_table_add_with_noequ0_processentry3( 187 | p4_pd.processEntry3_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 188 | ) 189 | p4_pd.processEntry4_table_add_with_processentry4( 190 | p4_pd.processEntry4_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 191 | ) 192 | p4_pd.processEntry4_table_add_with_noequ0_processentry4( 193 | p4_pd.processEntry4_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 194 | ) 195 | p4_pd.processEntry5_table_add_with_processentry5( 196 | p4_pd.processEntry5_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 197 | ) 198 | p4_pd.processEntry5_table_add_with_noequ0_processentry5( 199 | p4_pd.processEntry5_match_spec_t(hex_to_i32(0), 
hex_to_i32(0x00000000)), 1 200 | ) 201 | p4_pd.processEntry6_table_add_with_processentry6( 202 | p4_pd.processEntry6_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 203 | ) 204 | p4_pd.processEntry6_table_add_with_noequ0_processentry6( 205 | p4_pd.processEntry6_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 206 | ) 207 | p4_pd.processEntry7_table_add_with_processentry7( 208 | p4_pd.processEntry7_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 209 | ) 210 | p4_pd.processEntry7_table_add_with_noequ0_processentry7( 211 | p4_pd.processEntry7_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 212 | ) 213 | p4_pd.processEntry8_table_add_with_processentry8( 214 | p4_pd.processEntry8_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 215 | ) 216 | p4_pd.processEntry8_table_add_with_noequ0_processentry8( 217 | p4_pd.processEntry8_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 218 | ) 219 | p4_pd.processEntry9_table_add_with_processentry9( 220 | p4_pd.processEntry9_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 221 | ) 222 | p4_pd.processEntry9_table_add_with_noequ0_processentry9( 223 | p4_pd.processEntry9_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 224 | ) 225 | p4_pd.processEntry10_table_add_with_processentry10( 226 | p4_pd.processEntry10_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 227 | ) 228 | p4_pd.processEntry10_table_add_with_noequ0_processentry10( 229 | p4_pd.processEntry10_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 230 | ) 231 | p4_pd.processEntry11_table_add_with_processentry11( 232 | p4_pd.processEntry11_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 233 | ) 234 | p4_pd.processEntry11_table_add_with_noequ0_processentry11( 235 | p4_pd.processEntry11_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 236 | ) 237 | p4_pd.processEntry12_table_add_with_processentry12( 238 | p4_pd.processEntry12_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 239 | ) 240 | 
p4_pd.processEntry12_table_add_with_noequ0_processentry12( 241 | p4_pd.processEntry12_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 242 | ) 243 | p4_pd.processEntry13_table_add_with_processentry13( 244 | p4_pd.processEntry13_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 245 | ) 246 | p4_pd.processEntry13_table_add_with_noequ0_processentry13( 247 | p4_pd.processEntry13_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 248 | ) 249 | p4_pd.processEntry14_table_add_with_processentry14( 250 | p4_pd.processEntry14_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 251 | ) 252 | p4_pd.processEntry14_table_add_with_noequ0_processentry14( 253 | p4_pd.processEntry14_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 254 | ) 255 | p4_pd.processEntry15_table_add_with_processentry15( 256 | p4_pd.processEntry15_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 257 | ) 258 | p4_pd.processEntry15_table_add_with_noequ0_processentry15( 259 | p4_pd.processEntry15_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 260 | ) 261 | p4_pd.processEntry16_table_add_with_processentry16( 262 | p4_pd.processEntry16_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 263 | ) 264 | p4_pd.processEntry16_table_add_with_noequ0_processentry16( 265 | p4_pd.processEntry16_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 266 | ) 267 | p4_pd.processEntry17_table_add_with_processentry17( 268 | p4_pd.processEntry17_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 269 | ) 270 | p4_pd.processEntry17_table_add_with_noequ0_processentry17( 271 | p4_pd.processEntry17_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 272 | ) 273 | p4_pd.processEntry18_table_add_with_processentry18( 274 | p4_pd.processEntry18_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 275 | ) 276 | p4_pd.processEntry18_table_add_with_noequ0_processentry18( 277 | p4_pd.processEntry18_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 278 | ) 279 | 
p4_pd.processEntry19_table_add_with_processentry19( 280 | p4_pd.processEntry19_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 281 | ) 282 | p4_pd.processEntry19_table_add_with_noequ0_processentry19( 283 | p4_pd.processEntry19_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 284 | ) 285 | p4_pd.processEntry20_table_add_with_processentry20( 286 | p4_pd.processEntry20_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 287 | ) 288 | p4_pd.processEntry20_table_add_with_noequ0_processentry20( 289 | p4_pd.processEntry20_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 290 | ) 291 | p4_pd.processEntry21_table_add_with_processentry21( 292 | p4_pd.processEntry21_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 293 | ) 294 | p4_pd.processEntry21_table_add_with_noequ0_processentry21( 295 | p4_pd.processEntry21_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 296 | ) 297 | p4_pd.processEntry22_table_add_with_processentry22( 298 | p4_pd.processEntry22_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 299 | ) 300 | p4_pd.processEntry22_table_add_with_noequ0_processentry22( 301 | p4_pd.processEntry22_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 302 | ) 303 | p4_pd.processEntry23_table_add_with_processentry23( 304 | p4_pd.processEntry23_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 305 | ) 306 | p4_pd.processEntry23_table_add_with_noequ0_processentry23( 307 | p4_pd.processEntry23_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 308 | ) 309 | p4_pd.processEntry24_table_add_with_processentry24( 310 | p4_pd.processEntry24_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 311 | ) 312 | p4_pd.processEntry24_table_add_with_noequ0_processentry24( 313 | p4_pd.processEntry24_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 314 | ) 315 | p4_pd.processEntry25_table_add_with_processentry25( 316 | p4_pd.processEntry25_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 317 | ) 318 | 
p4_pd.processEntry25_table_add_with_noequ0_processentry25( 319 | p4_pd.processEntry25_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 320 | ) 321 | p4_pd.processEntry26_table_add_with_processentry26( 322 | p4_pd.processEntry26_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 323 | ) 324 | p4_pd.processEntry26_table_add_with_noequ0_processentry26( 325 | p4_pd.processEntry26_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 326 | ) 327 | p4_pd.processEntry27_table_add_with_processentry27( 328 | p4_pd.processEntry27_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 329 | ) 330 | p4_pd.processEntry27_table_add_with_noequ0_processentry27( 331 | p4_pd.processEntry27_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 332 | ) 333 | p4_pd.processEntry28_table_add_with_processentry28( 334 | p4_pd.processEntry28_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 335 | ) 336 | p4_pd.processEntry28_table_add_with_noequ0_processentry28( 337 | p4_pd.processEntry28_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 338 | ) 339 | p4_pd.processEntry29_table_add_with_processentry29( 340 | p4_pd.processEntry29_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 341 | ) 342 | p4_pd.processEntry29_table_add_with_noequ0_processentry29( 343 | p4_pd.processEntry29_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 344 | ) 345 | p4_pd.processEntry30_table_add_with_processentry30( 346 | p4_pd.processEntry30_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 347 | ) 348 | p4_pd.processEntry30_table_add_with_noequ0_processentry30( 349 | p4_pd.processEntry30_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 350 | ) 351 | p4_pd.processEntry31_table_add_with_processentry31( 352 | p4_pd.processEntry31_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 353 | ) 354 | p4_pd.processEntry31_table_add_with_noequ0_processentry31( 355 | p4_pd.processEntry31_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 356 | ) 357 | try: 358 | # TODO: 
understand it 359 | # dont know why, but if group = input port, 360 | # then the packet followed by that packet will execute multicast 361 | # therefore make it 20, no 20th port is used. 362 | mcg_all = mc.mgrp_create(999) 363 | mcg1 = mc.mgrp_create(998) 364 | # mcg2 = mc.mgrp_create(997) 365 | # mcg3 = mc.mgrp_create(996) 366 | except: 367 | print """ 368 | clean_all() does not yet support cleaning the PRE programming. 369 | You need to restart the driver before running this script for the second time 370 | """ 371 | quit() 372 | 373 | node_all = mc.node_create( 374 | rid=999, 375 | #port_map=devports_to_mcbitmap([56,48,40,32,24,16,8,0]), 376 | # port_map=devports_to_mcbitmap([port_of_worker[2], port_of_worker[3], port_of_worker[4],]), 377 | #port_map=devports_to_mcbitmap([36,44]), 378 | port_map=devports_to_mcbitmap([60,52,44,36]), 379 | lag_map=lags_to_mcbitmap(([])) 380 | ) 381 | mc.associate_node(mcg_all, node_all, xid=0, xid_valid=False) 382 | 383 | #[52, 60, 36, 44, 12, 4, 20] 384 | 385 | node1 = mc.node_create( 386 | rid=998, 387 | # Not multicast to "0" ( 0 as bg traffic ) 388 | #port_map=devports_to_mcbitmap([56,48,40,32,24,16,8]), 389 | port_map=devports_to_mcbitmap([8,16,24,32]), 390 | #port_map=devports_to_mcbitmap([52,60,0]), 391 | lag_map=lags_to_mcbitmap(([])) 392 | ) 393 | mc.associate_node(mcg1, node1, xid=0, xid_valid=False) 394 | 395 | 396 | # node2 = mc.node_create( 397 | # rid=997, 398 | # # Not multicast to "0" ( 0 as bg traffic ) 399 | # #port_map=devports_to_mcbitmap([56,48,40,32,24,16,8]), 400 | # #port_map=devports_to_mcbitmap([56,48,40]), 401 | # port_map=devports_to_mcbitmap([60,0]), 402 | # lag_map=lags_to_mcbitmap(([])) 403 | # ) 404 | # mc.associate_node(mcg2, node2, xid=0, xid_valid=False) 405 | 406 | ''' 407 | node2 = mc.node_create( 408 | rid=997, 409 | # Not multicast to "0" ( 0 as bg traffic ) 410 | # port_map=devports_to_mcbitmap([56,48,40,32,24,16,8]), 411 | port_map=devports_to_mcbitmap([24,16,8]), 412 | 
lag_map=lags_to_mcbitmap(([])) 413 | ) 414 | mc.associate_node(mcg2, node2, xid=0, xid_valid=False) 415 | ''' 416 | 417 | conn_mgr.complete_operations() 418 | 419 | def hex_to_i32(h): 420 | x = int(h, 0) 421 | if (x > 0xFFFFFFFF): 422 | raise UIn_Error("Integer cannot fit within 32 bits") 423 | if (x > 0x7FFFFFFF): x-= 0x100000000 424 | return x 425 | 426 | import time 427 | 428 | 429 | used_aggr = 450 430 | all_start = time.time() 431 | last_time1 = all_start 432 | last_time2 = all_start 433 | 434 | num_com = 0 435 | average_total = 0 436 | average_appID_map = {} 437 | 438 | tmp = [] 439 | appID_map = {} 440 | 441 | while (1): 442 | 443 | now = time.time() 444 | time_count1 = now - last_time1 445 | time_count2 = now - last_time2 446 | if (time_count1 > 0.01): 447 | last_time1 = now 448 | num_com += 1 449 | 450 | del tmp[:] 451 | appID_map.clear() 452 | total_used = 0 453 | #timetmp1 = time.time() 454 | p4_pd.register_hw_sync_appID_and_Seq() 455 | 456 | for i in range(used_aggr): 457 | tmp.append(p4_pd.register_read_appID_and_Seq(i , p4_pd.register_flags_t(False))) 458 | #timetmp2 = time.time() 459 | #print timetmp2-timetmp1 460 | #print "hello" 461 | for appID_and_Seq in tmp: 462 | appID = (appID_and_Seq[0]>>16) 463 | #print "appID", appID 464 | if appID != 0: 465 | if appID in appID_map: 466 | appID_map[appID] += 1 467 | else: 468 | appID_map[appID] = 1 469 | 470 | if appID in average_appID_map: 471 | average_appID_map[appID] += 1 472 | else: 473 | average_appID_map[appID] = 1 474 | 475 | for key, value in appID_map.items(): 476 | total_used += value 477 | #print "appID{0} {1}/{2} {3}".format(key, value, used_aggr, 1.0*value/used_aggr) 478 | average_total += total_used 479 | 480 | #if total_used != 0: 481 | #print "total_used {0}/{1} {2}".format(total_used, used_aggr, 1.0*total_used/used_aggr) 482 | time.sleep(0.09) 483 | 484 | if (time_count2 > 0.5): 485 | last_time2 = now 486 | for key, value in average_appID_map.items(): 487 | print "appID[{0}] {1}/{2} {3:.2f} 
%".format(key, value/num_com, used_aggr, 100.0*value/num_com/used_aggr) 488 | 489 | if average_total > 0: 490 | print "time {3:.1f} total_used {0}/{1} {2:.1f} %".format(average_total/num_com, used_aggr, 100.0*average_total/num_com/used_aggr, now - all_start) 491 | 492 | num_com = 0 493 | average_total = 0 494 | average_appID_map.clear() 495 | 496 | ''' 497 | while (1): 498 | 499 | now = time.time() 500 | time_count1 = now - last_time1 501 | time_count2 = now - last_time2 502 | if (time_count1 > 0.01): 503 | lasttime1=now 504 | num_com += 1 505 | 506 | del tmp[:] 507 | appID_map.clear() 508 | total_used = 0 509 | timetmp1 = time.time() 510 | p4_pd.register_hw_sync_appID_and_Seq() 511 | 512 | for i in range(used_aggr): 513 | tmp.append(p4_pd.register_read_appID_and_Seq(i , p4_pd.register_flags_t(False))) 514 | timetmp2 = time.time() 515 | time.sleep(0.5 - (timetmp2-timetmp1)) 516 | #print timetmp2-timetmp1 517 | #print "hello" 518 | for appID_and_Seq in tmp: 519 | appID = (appID_and_Seq[0]>>16) 520 | #print "appID", appID 521 | if appID != 0: 522 | if appID in appID_map: 523 | appID_map[appID] += 1 524 | else: 525 | appID_map[appID] = 1 526 | 527 | if appID in average_appID_map: 528 | average_appID_map[appID] += 1 529 | else: 530 | average_appID_map[appID] = 1 531 | 532 | for key, value in appID_map.items(): 533 | total_used += value 534 | #print "appID{0} {1}/{2} {3}".format(key, value, used_aggr, 1.0*value/used_aggr) 535 | average_total += total_used 536 | 537 | #if total_used != 0: 538 | #print "total_used {0}/{1} {2}".format(total_used, used_aggr, 1.0*total_used/used_aggr) 539 | #time.sleep(0.1) 540 | 541 | if (time_count2 > 1.0): 542 | last_time2 = now 543 | for key, value in average_appID_map.items(): 544 | print "appID[{0}] {1}/{2} {3:.2f} %".format(key, value/num_com, used_aggr, 100.0*value/num_com/used_aggr) 545 | 546 | if average_total > 0: 547 | print "time {3:.0f} total_used {0}/{1} {2:.1f} %".format(average_total/num_com, used_aggr, 
100.0*average_total/num_com/used_aggr, now - all_start) 548 | 549 | num_com = 0 550 | average_total = 0 551 | average_appID_map.clear() 552 | 553 | ''' 554 | 555 | 556 | 557 | -------------------------------------------------------------------------------- /server/Makefile: -------------------------------------------------------------------------------- 1 | 2 | # All Target 3 | all: 4 | g++ -std=c++11 -O3 -g -c -o ParameterServer.o ParameterServer.cc 5 | g++ -std=c++11 -O3 -g -c -o ../common/dma_common.o ../common/dma_common.cc 6 | g++ -std=c++11 -O3 -g -c -o ../common/HashTable.o ../common/HashTable.cc 7 | g++ -std=c++11 -O3 -g -o app ParameterServer.o ../common/HashTable.o ../common/dma_common.o -lpthread -libverbs 8 | 9 | 10 | # Clean Target 11 | clean: 12 | rm *.o 13 | rm app 14 | -------------------------------------------------------------------------------- /server/ParameterServer.cc: -------------------------------------------------------------------------------- 1 | #include "ParameterServer.h" 2 | 3 | tensor_context *tensors; 4 | 5 | int max_agtr_size_per_thread; 6 | int UsedSwitchAGTRcount = MAX_AGTR_COUNT; 7 | std::mutex _dma_mutex; 8 | struct ibv_device **dev_list; 9 | struct ibv_device *ib_dev; 10 | ThreadPool* workQueue; 11 | std::mutex __print_mutex; 12 | std::mutex _init_mutex; 13 | int num_thread; 14 | int print_count = 0; 15 | int appID; 16 | 17 | 18 | 19 | 20 | long long int receive_in_sec[20] = {0}; 21 | bool receive_byte_reset_flag[20] = {0}; 22 | 23 | bool is_completed_p4ml_key[1024000] = {0}; 24 | 25 | int next_agtr[MAX_AGTR_COUNT] = {-1}; 26 | HashTable* hash_table; 27 | 28 | int packet_full_count = 0; 29 | int packet_partial_count = 0; 30 | int packet_all_forward_count = 0; 31 | int packet_partial_total_count = 0; 32 | 33 | #define MAX_MEASUREMENT_KEY 12000 34 | int full_packet_count[MAX_MEASUREMENT_KEY][16518] = { 0 }; 35 | int resend_packet_count[MAX_MEASUREMENT_KEY][16518] = { 0 }; 36 | 37 | 38 | DMAcontext** global_dma_contexts; 39 | 
40 | void main_receive_packet_loop(DMAcontext* dma_context, int thread_id) { 41 | int msgs_completed = 0; 42 | int this_pos_to_send = 0; 43 | int total_last_tensor_packet = 0; 44 | int imm_pos_to_send = dma_context->my_send_queue_length / 2; 45 | bool app_init[MAX_APP_PER_THREAD + 1] = {0}; // appID starts from 1, so size must be MAX_APP_PER_THREAD + 1 46 | 47 | /* Loss */ 48 | int loss = 0; 49 | 50 | int rand_index = 0; 51 | int total_loss = 0; 52 | 53 | // app start from 1 54 | int* tensors_pos_of_app = new int[MAX_APP_PER_THREAD + 1]; 55 | for (int i = 1; i <= MAX_APP_PER_THREAD; i++) { 56 | tensors_pos_of_app[i] = thread_id * MAX_STORAGE_PER_APP_PER_THREAD * MAX_APP_PER_THREAD + (i - 1) * MAX_STORAGE_PER_APP_PER_THREAD; 57 | } 58 | 59 | while (1) { 60 | 61 | cqe_snapshot_t cur_snapshot; 62 | msgs_completed = 0; 63 | 64 | std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now(); 65 | while(1) { 66 | 67 | // if (receive_byte_reset_flag[thread_id]) { 68 | // receive_in_sec[thread_id] = 0; 69 | // receive_byte_reset_flag[thread_id] = false; 70 | // } 71 | 72 | std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now(); 73 | std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1); 74 | 75 | msgs_completed = receive_packet(dma_context, &cur_snapshot); 76 | if (msgs_completed) { 77 | break; 78 | } 79 | if (time_span.count() > 10.0 && msgs_completed == 0 && dma_context->total_received > 0) { 80 | std::lock_guard<std::mutex> lock(_dma_mutex); 81 | fprintf(stderr, "Timeout happened at this thread_id=%d, total_received=%d, total_sent=%d, last_ACK=%d, total_last_tensor_packet_recv=%d\n", 82 | thread_id, global_dma_contexts[thread_id]->total_received, global_dma_contexts[thread_id]->total_sent, tensors[tensors_pos_of_app[1]].window_manager[0].last_ACK, total_last_tensor_packet); 83 | for (int i = 0; i < num_thread; i++) 84 | fprintf(stderr, "Timeout happened at thread_id=%d, total_received=%d, total_sent=%d\n", i, global_dma_contexts[i]->total_received, 
global_dma_contexts[i]->total_sent); 85 | 86 | for (uint64_t i = 0; i < MAX_MEASUREMENT_KEY; i++) { 87 | for (uint16_t j = 1; j <= ceil((float)MAX_TENSOR_SIZE/MAX_ENTRIES_PER_PACKET); j++) { 88 | if (full_packet_count[i][j]) { 89 | packet_full_count++; 90 | } else if (resend_packet_count[i][j]) { 91 | packet_partial_count++; 92 | packet_partial_total_count += resend_packet_count[i][j]; 93 | } else { 94 | packet_all_forward_count++; 95 | // printf("i:%d, j:%d\n", i, j); 96 | } 97 | } 98 | } 99 | printf("%d, %d, %d, %d\n", packet_full_count, packet_partial_count, packet_all_forward_count, packet_partial_total_count); 100 | 101 | int seen_agtrs = 0; 102 | for (int i = 0; i < MAX_AGTR_COUNT; i++) 103 | if (hash_table->isAlreadyDeclare[i]) 104 | seen_agtrs++; 105 | printf("Seen agtrs: %d\n", seen_agtrs); 106 | 107 | exit(-1); 108 | } 109 | } 110 | 111 | int to_be_sent = 0; 112 | if (this_pos_to_send + max_agtr_size_per_thread + max_agtr_size_per_thread > dma_context->my_send_queue_length / 2) 113 | this_pos_to_send = 0; 114 | 115 | // printf("%d packets received.\n", msgs_completed); 116 | for(int msg=0; msg < msgs_completed; msg++) { 117 | // std::chrono::high_resolution_clock::time_point packet_start = std::chrono::high_resolution_clock::now(); 118 | uint8_t* buf = &dma_context->mp_recv_ring[dma_context->ring_head * kAppRingMbufSize]; 119 | 120 | agghdr* p4ml_header = reinterpret_cast<agghdr*>(buf + IP_ETH_UDP_HEADER_SIZE); 121 | 122 | //check ecn mark 123 | // bool is_ecn_mark_packet = p4ml_header->flag & 0x08; 124 | // if (is_ecn_mark_packet) 125 | // printf("ECN mark found.\n"); 126 | if (DEBUG_PRINT_ALL_RECEIVING_PACKET) 127 | p4ml_header_print_h(p4ml_header, "Receive"); 128 | 129 | bool isTerminated_packet = p4ml_header->flag & 0x02; 130 | bool isResend_packet = p4ml_header->flag & 0x04; 131 | bool isOverflow_packet = p4ml_header->flag & 0x80; 132 | 133 | // exit(1); 134 | p4ml_header_ntoh(p4ml_header); 135 | /* Move AppID index */ 136 | int appID = p4ml_header->appID; 
137 | if (!app_init[appID]) { 138 | app_init[appID] = true; 139 | } else { 140 | if (p4ml_header->key != tensors[tensors_pos_of_app[appID]].key && tensors[tensors_pos_of_app[appID]].isCompleted) { 141 | // p4ml_header_print(p4ml_header, "ERROR PACKET"); 142 | // printf("tensors_pos_of_app[appID] from %d to %d\n", tensors_pos_of_app[appID], tensors_pos_of_app[appID]+1); 143 | tensors_pos_of_app[appID]++; 144 | if (tensors_pos_of_app[appID] == thread_id * MAX_APP_PER_THREAD * MAX_STORAGE_PER_APP_PER_THREAD + MAX_STORAGE_PER_APP_PER_THREAD * (appID)) 145 | tensors_pos_of_app[appID] = tensors_pos_of_app[appID] - MAX_STORAGE_PER_APP_PER_THREAD; 146 | } 147 | } 148 | 149 | if (!hash_table->isAlreadyDeclare[p4ml_header->agtr] && !isResend_packet) 150 | hash_table->isAlreadyDeclare[p4ml_header->agtr] = true; 151 | 152 | 153 | /* Check if Collision packet */ 154 | bool is_collision_packet = p4ml_header->flag & 0x02; 155 | bool is_lzy_collision = p4ml_header->is_lzy_Col & 0x01; 156 | 157 | 158 | 159 | if(is_collision_packet || isResend_packet){ 160 | tensors[tensors_pos_of_app[appID]].isCollision[p4ml_header->seq_num] = true; 161 | } 162 | 163 | int my_tensors_pos = tensors_pos_of_app[appID]; 164 | 165 | check_tensor_available(&tensors[my_tensors_pos], p4ml_header, thread_id); 166 | 167 | // char * eth_ip_header = (char*) dma_context->send_region + wc_recv_id * ENTRY_SIZE; 168 | // uint8_t swap[6]; 169 | // for (int i = 0; i < 6; i++) { 170 | // swap[i] = eth_ip_header[i]; 171 | // eth_ip_header[i] = eth_ip_header[i+6]; 172 | // eth_ip_header[i+6] = swap[i]; 173 | // } 174 | 175 | if (OVERFLOW_HANDLE) { 176 | // Check Switch Overflow but not Host Overflow 177 | if (!isOverflow_packet) 178 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 179 | if (p4ml_header->vector[i] == INT32_MAX || p4ml_header->vector[i] == INT32_MIN) 180 | { 181 | if (p4ml_header->vector[i] == INT32_MIN) 182 | p4ml_header_print(p4ml_header, "Switch Overflow"); 183 | isOverflow_packet = true; 184 | } 
185 | 186 | // p4ml_header_print(p4ml_header, "Receive"); 187 | if (isOverflow_packet) { 188 | /* Clean Integer Data */ 189 | if (!tensors[my_tensors_pos].isFloat[p4ml_header->seq_num]) { 190 | // printf("ReadyForFloat\n"); 191 | makeTensorReadyforFloat(p4ml_header, &tensors[my_tensors_pos]); 192 | tensors[my_tensors_pos].isFloat[p4ml_header->seq_num] = true; 193 | } 194 | } 195 | 196 | /* Floating point request packet */ 197 | bool sendFloatRequest = false; 198 | if (isOverflow_packet && !isResend_packet) 199 | sendFloatRequest = true; 200 | if (!isOverflow_packet && isResend_packet && tensors[my_tensors_pos].isFloat[p4ml_header->seq_num]) 201 | sendFloatRequest = true; 202 | 203 | if (sendFloatRequest) { 204 | /* Do floating point request */ 205 | /* Send back request to everyone immediately */ 206 | p4ml_header_hton_without_data(p4ml_header); 207 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE), (char*) buf + IP_ETH_UDP_HEADER_SIZE, P4ML_LAYER_SIZE); 208 | /* then send ACK */ 209 | p4ml_header_setACK((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 210 | p4ml_header_setOverflowRequest((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 211 | p4ml_header_resetIndex((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 212 | 213 | // p4ml_header_print_h((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE)), "Overflow Sendback PACKET"); 214 | send_packet(dma_context, P4ML_LAYER_SIZE, imm_pos_to_send); 215 | imm_pos_to_send++; 216 | if (imm_pos_to_send == dma_context->my_send_queue_length - 1) 217 | imm_pos_to_send = dma_context->my_send_queue_length / 2 + 1; 218 | 219 | /* Push Back */ 220 | dma_postback(dma_context); 221 | continue; 222 | } 223 | } 224 | 225 | /* Check Full Packet */ 226 | bool isFullPacket = (1 << p4ml_header->num_worker) - 1 == p4ml_header->bitmap? 
1:0; 227 | 228 | 229 | if (receive_byte_reset_flag[thread_id]) { 230 | //receive_in_sec[thread_id] = 0; 231 | //receive_PS[thread_id] = 0; 232 | // receive_Aggr[thread_id] = 0; 233 | // receive_Total_Per_Task[thread_id] = 0; 234 | 235 | receive_byte_reset_flag[thread_id] = false; 236 | } 237 | 238 | /* if full packet, update directly. */ 239 | if (isFullPacket) { 240 | // printf("%d: full packet - seq %d update model.\n", p4ml_header->key, p4ml_header->seq_num); 241 | updateModel_force(p4ml_header, &tensors[my_tensors_pos]); 242 | for (int i = 0; i < p4ml_header->num_worker; i++) 243 | tensors[my_tensors_pos].window_manager[i].UpdateWindow(&p4ml_header->seq_num); 244 | 245 | if (p4ml_header->key < MAX_MEASUREMENT_KEY) { 246 | if (isResend_packet) { 247 | resend_packet_count[p4ml_header->key][p4ml_header->seq_num]++; 248 | } else { 249 | full_packet_count[p4ml_header->key][p4ml_header->seq_num]++; 250 | } 251 | } 252 | } else { 253 | 254 | 255 | bool type_consistent = false; 256 | if (tensors[my_tensors_pos].isFloat[p4ml_header->seq_num] && isOverflow_packet) 257 | type_consistent = true; 258 | if (!tensors[my_tensors_pos].isFloat[p4ml_header->seq_num] && !isOverflow_packet) 259 | type_consistent = true; 260 | 261 | if (type_consistent) { 262 | 263 | if (p4ml_header->key < MAX_MEASUREMENT_KEY) { 264 | if (isResend_packet) 265 | resend_packet_count[p4ml_header->key][p4ml_header->seq_num]++; 266 | } 267 | // printf("seq %d Partial packet receive.\n", p4ml_header->seq_num); 268 | // p4ml_header_print(p4ml_header, "Partial PACKET"); 269 | int valid_bit = 1; 270 | bool need_to_update = true; 271 | // check if update is needed 272 | for (int i = 0; i < p4ml_header->num_worker; i++) { 273 | if (valid_bit & p4ml_header->bitmap) { 274 | if (tensors[my_tensors_pos].window_manager[i].isACKed[p4ml_header->seq_num]) { 275 | // p4ml_header_print(p4ml_header, "ERROR PACKET"); 276 | // printf("[thread %d][worker %d]'s gredient is already integrated in PS, %d.\n", thread_id, i, 
p4ml_header->seq_num); 277 | need_to_update = false; 278 | break; 279 | } 280 | } 281 | valid_bit <<= 1; 282 | } 283 | 284 | if (need_to_update) { 285 | // printf("need to update\n"); 286 | int valid_bit = 1; 287 | for (int i = 0; i < p4ml_header->num_worker; i++) { 288 | if (valid_bit & p4ml_header->bitmap) { 289 | // TODO: Update Window will cause BUG, to be fix (floating point need reset ACK) 290 | tensors[my_tensors_pos].window_manager[i].UpdateWindow(&p4ml_header->seq_num); 291 | } 292 | valid_bit <<= 1; 293 | } 294 | updateModel(p4ml_header, &tensors[my_tensors_pos], isOverflow_packet); 295 | } 296 | 297 | } 298 | } 299 | // if any of the worker doesn't complete slot 300 | bool is_slot_completed = true; 301 | for (int i = 0; i < p4ml_header->num_worker; i++) 302 | if (!tensors[my_tensors_pos].window_manager[i].isACKed[p4ml_header->seq_num]) 303 | is_slot_completed = false; 304 | // printf("packet receive %d\n", p4ml_header->seq_num); 305 | if (is_slot_completed) { 306 | p4ml_header->bitmap = 1; 307 | 308 | uint16_t new_agtr; 309 | 310 | 311 | 312 | 313 | if (tensors[my_tensors_pos].isCollision[p4ml_header->seq_num] == true) { 314 | // Check if new agtr is already hashed 315 | if (next_agtr[p4ml_header->agtr] == -1) { 316 | int new_hash_agtr = hash_table->HashNew_predefine(); 317 | // if get any of AGTR from hash 318 | if (new_hash_agtr != -1) { 319 | new_agtr = new_hash_agtr; 320 | next_agtr[p4ml_header->agtr] = new_agtr; 321 | hash_table->hash_map[p4ml_header->agtr] = new_agtr; 322 | // printf("old: %d -> new: %d\n", p4ml_header->agtr, new_agtr); 323 | } else { 324 | // if all of the AGTR is used, full 325 | // keep original AGTR 326 | // printf("Change Agtr fail, full.\n"); 327 | new_agtr = p4ml_header->agtr; 328 | } 329 | } else { 330 | //TODO: Separate APP 331 | new_agtr = next_agtr[p4ml_header->agtr]; 332 | // printf("New hash - already: %d\n", new_agtr); 333 | // printf("[hashed] old: %d -> new: %d\n", p4ml_header->agtr, new_agtr); 334 | } 335 | 336 | 
p4ml_header_setLengthFieldToAgtr(p4ml_header, new_agtr); 337 | p4ml_header_setCollisionBit(p4ml_header); 338 | } else { 339 | p4ml_header_resetCollisionBit(p4ml_header); 340 | } 341 | 342 | int offset = (p4ml_header->seq_num - 1) * MAX_ENTRIES_PER_PACKET; 343 | 344 | p4ml_header_hton_without_data(p4ml_header); 345 | 346 | if (!isOverflow_packet) 347 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 348 | tensors[my_tensors_pos].data.data_int[offset + i] = htonl(tensors[my_tensors_pos].data.data_int[offset + i]); 349 | 350 | // /* Give higher priority to Resend packet */ 351 | if (isResend_packet) { 352 | 353 | // TODO: PACKET LOSS HANDLING FOR DOUBLE PACKET 354 | // printf("Immediately send back Resend packet %d\n", ntohl(p4ml_header->seq_num)); 355 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE), (char*) buf + IP_ETH_UDP_HEADER_SIZE, P4ML_HEADER_SIZE - 12); 356 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE) + P4ML_HEADER_SIZE - 12, tensors[my_tensors_pos].data.data_int + offset, P4ML_DATA_SIZE); 357 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE) + 14 + P4ML_DATA_SIZE, (char*) buf + IP_ETH_UDP_HEADER_SIZE + P4ML_DATA_SIZE + 14, 12); 358 | /* then send ACK */ 359 | p4ml_header_setACK((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 360 | p4ml_header_resetIndex((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 361 | 362 | send_packet(dma_context, P4ML_LAYER_SIZE, imm_pos_to_send); 363 | imm_pos_to_send++; 364 | if (imm_pos_to_send == dma_context->my_send_queue_length - 1) 365 | imm_pos_to_send = dma_context->my_send_queue_length / 2 + 1; 366 | 367 | } else { 368 | memcpy((char*) dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE, (char*) buf + IP_ETH_UDP_HEADER_SIZE, P4ML_HEADER_SIZE - 12); 369 | memcpy((char*) dma_context->send_region + (this_pos_to_send + to_be_sent) * 
P4ML_LAYER_SIZE + P4ML_HEADER_SIZE - 12, tensors[my_tensors_pos].data.data_int + offset, P4ML_DATA_SIZE); 370 | memcpy((char*) dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE + 14 + P4ML_DATA_SIZE, (char*) buf + IP_ETH_UDP_HEADER_SIZE + P4ML_DATA_SIZE + 14, 12); 371 | /* then send ACK */ 372 | p4ml_header_setACK((agghdr*)((char*)dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE)); 373 | p4ml_header_resetIndex((agghdr*)((char*)dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE)); 374 | 375 | to_be_sent++; 376 | } 377 | // printf("to_be_sent: %d\n", to_be_sent); 378 | 379 | if (tensors[tensors_pos_of_app[appID]].num_worker > 0) { 380 | bool this_tensor_finished = true; 381 | for (int i = 0; i < tensors[tensors_pos_of_app[appID]].num_worker; i++) 382 | if (tensors[tensors_pos_of_app[appID]].window_manager[i].last_ACK < tensors[tensors_pos_of_app[appID]].window_manager[i].total_ACK) 383 | this_tensor_finished = false; 384 | 385 | if (this_tensor_finished && !tensors[tensors_pos_of_app[appID]].isCompleted) { 386 | // printf("[Thread %d] Tensor %d at %d Completed.\n", thread_id, tensors[tensors_pos_of_app[appID]].key, tensors_pos_of_app[appID]); 387 | tensors[tensors_pos_of_app[appID]].isCompleted = true; 388 | rand_index = 0; 389 | // dma_context->total_received = 0; 390 | // dma_context->total_sent = 0; 391 | } 392 | } 393 | } 394 | 395 | /* Push Back */ 396 | dma_postback(dma_context); 397 | } 398 | 399 | dma_update_snapshot(dma_context, cur_snapshot); 400 | 401 | if (msgs_completed < 0) { 402 | printf("Polling error\n"); 403 | exit(1); 404 | } 405 | 406 | if (msgs_completed > 0) { 407 | dma_context->total_received += msgs_completed; 408 | if (receive_byte_reset_flag[thread_id]) { 409 | 410 | receive_byte_reset_flag[thread_id] = false; 411 | } 412 | else{ 413 | ; 414 | 415 | } 416 | if (to_be_sent > 0) { 417 | send_packet(dma_context, P4ML_LAYER_SIZE * to_be_sent, this_pos_to_send); 418 
| } 419 | this_pos_to_send += to_be_sent; 420 | // Let's assume the last packet will not be lost 421 | } 422 | 423 | } 424 | } 425 | 426 | 427 | void Start(int thread_id) { 428 | bindingCPU((appID-1)*MAX_THREAD_PER_APP+thread_id); 429 | DMAcontext* dma_context; 430 | { 431 | std::lock_guard<std::mutex> lock(_dma_mutex); 432 | 433 | dma_context = DMA_create(ib_dev, thread_id + ((appID - 1) * MAX_THREAD_PER_APP), true); 434 | // dma_context->isSent = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1]; 435 | // dma_context->send_time = new std::chrono::high_resolution_clock::time_point[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1]; 436 | // dma_context->receive_time = new std::chrono::high_resolution_clock::time_point[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1]; 437 | global_dma_contexts[thread_id] = dma_context; 438 | } 439 | 440 | main_receive_packet_loop(dma_context, thread_id); 441 | 442 | sleep(1000); 443 | } 444 | int nic = 0; 445 | int main(int argc, char *argv[]) { 446 | appID = atoi(argv[1]); 447 | 448 | bindingCPU(19); 449 | 450 | srand(time(NULL)); 451 | // num_thread = atoi(argv[1]); 452 | 453 | 454 | // Lam: this one is for experiment, disabled temporarily 455 | // if (argv[1]) 456 | // UsedSwitchAGTRcount = atoi(argv[1]); 457 | // else 458 | // UsedSwitchAGTRcount = MAX_AGTR_COUNT; 459 | num_thread = 9; 460 | 461 | 462 | 463 | max_agtr_size_per_thread = 50; 464 | UsedSwitchAGTRcount = 450; 465 | int tempargc = 2; 466 | 467 | if (argc > 2 ) { 468 | while (tempargc < argc){ 469 | std::string option = argv[tempargc++]; 470 | 471 | if (option == "-a") { 472 | int num_agtr = atoi(argv[tempargc++]); 473 | max_agtr_size_per_thread = num_agtr; 474 | } 475 | 476 | if (option == "-aa") { 477 | int num_used_agtr = atoi(argv[tempargc++]); 478 | UsedSwitchAGTRcount = num_used_agtr; 479 | } 480 | if (option == "-th") { 481 | int num_thread_argv = atoi(argv[tempargc++]); 482 | num_thread = num_thread_argv; 483 | } 484 | 485 | if (option == "-n"){ 486 | int num_nic = 
atoi(argv[tempargc++]); 487 | nic = num_nic; 488 | } 489 | } 490 | 491 | } 492 | 493 | 494 | dev_list = ibv_get_device_list(NULL); 495 | if (!dev_list) { 496 | perror("Failed to get devices list"); 497 | exit(1); 498 | } 499 | 500 | ib_dev = dev_list[nic]; 501 | /* 502 | for (int i = 0; i < 2; i++){ 503 | ib_dev = dev_list[i]; 504 | if (strcmp(ibv_get_device_name(ib_dev),nic)==0){ 505 | break; 506 | } 507 | } 508 | */ 509 | printf("using %s\n",ibv_get_device_name(ib_dev)); 510 | //ib_dev = dev_list[0]; 511 | if (!ib_dev) { 512 | fprintf(stderr, "IB device not found\n"); 513 | exit(1); 514 | } 515 | 516 | /* Init Thread */ 517 | workQueue = new ThreadPool(num_thread, [](){}); 518 | 519 | global_dma_contexts = new DMAcontext*[num_thread]; 520 | printf("\nUsedSwitchAGTRcount: %d\n\n", UsedSwitchAGTRcount); 521 | printf("max_agtr_size_per_thread: %d\n\n", max_agtr_size_per_thread); 522 | 523 | printf("Overflow Handled: %s\n\n", OVERFLOW_HANDLE? "TRUE":"FALSE"); 524 | /* Init tensors capacity */ 525 | tensors = new tensor_context[MAX_APP_PER_THREAD * MAX_STORAGE_PER_APP_PER_THREAD * num_thread]; 526 | printf("\nTensors memory pre-allocate...\n"); 527 | for (int i = 0; i < MAX_APP_PER_THREAD * MAX_STORAGE_PER_APP_PER_THREAD * num_thread; i++) 528 | init_tensor(&tensors[i], MAX_TENSOR_SIZE); 529 | 530 | hash_table = new HashTable(UsedSwitchAGTRcount); 531 | printf("\nHash table creating...\n\n"); 532 | memset(next_agtr, -1, sizeof(int) * MAX_AGTR_COUNT); 533 | 534 | for (int i = 0; i < num_thread; i++) 535 | workQueue->enqueue(Start, i); 536 | 537 | std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now(); 538 | std::chrono::time_point<std::chrono::high_resolution_clock> timer = std::chrono::high_resolution_clock::now(); 539 | 540 | while (1) { 541 | std::chrono::time_point<std::chrono::high_resolution_clock> current_time = std::chrono::high_resolution_clock::now(); 542 | std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(current_time - timer); 543 | std::chrono::duration<double> total_time = 
std::chrono::duration_cast<std::chrono::duration<double>>(current_time - t1); 544 | if (time_span.count() >= 0.5) { 545 | // printf("############################################\n"); 546 | double total_bandwidth = 0.0; 547 | 548 | for (int i = 0; i < num_thread; i++) { 549 | // printf("[thread %d] %lf Gbps.\n", i, receive_in_sec[i] * 194.0 / 1024.0 / 1024.0 / 1024.0 * 8.0); 550 | total_bandwidth += receive_in_sec[i] * 194.0 / 1024.0 / 1024.0 / 1024.0 * 8.0 / time_span.count(); 551 | 552 | receive_byte_reset_flag[i] = true; 553 | 554 | 555 | receive_in_sec[i] = 0; 556 | } 557 | 558 | 559 | 560 | // total_sent = 0; 561 | timer = current_time; 562 | } 563 | } 564 | 565 | 566 | sleep(10000000); 567 | 568 | } 569 | -------------------------------------------------------------------------------- /server/ParameterServer.h: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | #include <string.h> 4 | #include <math.h> 5 | #include <unistd.h> 6 | #include <time.h> 7 | #include <stdint.h> 8 | #include <arpa/inet.h> 9 | #include <infiniband/verbs.h> 10 | #include <chrono> 11 | #include <mutex> 12 | #include <string> 13 | #include <thread> 14 | #include <vector> 15 | #include "../common/packet.h" 16 | #include "../common/dma_common.h" 17 | #include "../common/ThreadPool.h" 18 | #include "../common/utils.h" 19 | #include "../common/window_manager.h" 20 | #include "../common/HashTable.h" 21 | 22 | #define MAX_TENSOR_SIZE 1024000 23 | // Lam: this one is useless since a PS can only handle 1app, to be mod. 
24 | #define MAX_APP_PER_THREAD 5 25 | #define MAX_STORAGE_PER_APP_PER_THREAD 10 26 | #define MAX_WORKER 16 27 | 28 | #define MAX_THREAD_PER_APP 10 29 | 30 | #define OVERFLOW_HANDLE false 31 | 32 | 33 | union data_t { 34 | int32_t *data_int; 35 | float *data_float; 36 | }; 37 | 38 | struct tensor_context { 39 | bool* isOccupy; 40 | bool* isCollision; 41 | bool* isFloat; 42 | bool* isLZYCol; 43 | bool isCompleted; 44 | 45 | data_t data; 46 | uint32_t len; 47 | uint64_t key; 48 | uint8_t num_worker; 49 | WindowManager* window_manager; 50 | std::chrono::time_point<std::chrono::high_resolution_clock> start_time; 51 | }; 52 | 53 | void inline init_tensor(tensor_context* tensor, uint32_t len) { 54 | tensor->data.data_int = new int32_t[len](); 55 | tensor->isCompleted = true; 56 | tensor->isOccupy = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 57 | tensor->isCollision = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 58 | tensor->isLZYCol = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); //useless 59 | tensor->isFloat = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 60 | tensor->len = 0; 61 | tensor->num_worker = 0; 62 | tensor->key = 0xffffffffffffffff; 63 | tensor->window_manager = new WindowManager[MAX_WORKER]; 64 | for (int i = 0; i < MAX_WORKER; i++) { 65 | tensor->window_manager[i].isACKed = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 66 | tensor->window_manager[i].total_ACK = MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1; 67 | } 68 | } 69 | 70 | int inline check_tensor_available(tensor_context* tensor, agghdr* p4ml_header, int thread_id) { 71 | // printf("*skey: %d, seq: %d\n", *skey, p4ml_header->seq_num); 72 | 73 | // Already have completed model and not retrieve 74 | if (tensor->isCompleted && p4ml_header->key != tensor->key) { 75 | int total_ACK = ceil((float)p4ml_header->len_tensor / MAX_ENTRIES_PER_PACKET); 76 | for (int i = 0; i < p4ml_header->num_worker; i++) 77 | tensor->window_manager[i].Reset(total_ACK); 78 | // if (thread_id == 0) 
79 | // printf("Reset tensors[%d] LAST_ACK: %d\n", *skey, tensor->window_manager[0].last_ACK); 80 | memset(tensor->data.data_int, 0, sizeof(int32_t) * p4ml_header->len_tensor); 81 | memset(tensor->isOccupy, 0, sizeof(bool) * (total_ACK + 1)); 82 | memset(tensor->isCollision, 0, sizeof(bool) * (total_ACK + 1)); 83 | memset(tensor->isFloat, 0, sizeof(bool) * (total_ACK + 1)); 84 | tensor->num_worker = p4ml_header->num_worker; 85 | tensor->len = p4ml_header->len_tensor; 86 | tensor->isCompleted = false; 87 | tensor->key = p4ml_header->key; 88 | // printf("Place %d available, real key = %d\n", *skey, tensors[*skey].key); 89 | return 1; 90 | } 91 | return 0; 92 | } 93 | 94 | void inline makeTensorReadyforFloat(agghdr *p4ml_header, tensor_context *tensor_cnt) { 95 | int32_t* data = tensor_cnt->data.data_int; 96 | uint16_t *p_seq = &p4ml_header->seq_num; 97 | int32_t *p_model = p4ml_header->vector; 98 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 99 | 100 | /* Reset Data */ 101 | memset(data + offset, 0, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 102 | tensor_cnt->isOccupy[*p_seq] = false; 103 | 104 | /* Reset Bitmap */ 105 | for (int i = 0; i < p4ml_header->num_worker; i++) { 106 | tensor_cnt->window_manager[i].isACKed[p4ml_header->seq_num] = 0; 107 | } 108 | } 109 | 110 | void inline updateModel(agghdr *p4ml_header, tensor_context *dst_place, bool isFloat) { 111 | int32_t* data = dst_place->data.data_int; 112 | uint16_t *p_seq = &p4ml_header->seq_num; 113 | uint32_t *tensor_len = &p4ml_header->len_tensor; 114 | int32_t *p_model = p4ml_header->vector; 115 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 116 | // printf("dst_place->isOccupy[%d]: %d\n", *p_seq - 1, dst_place->isOccupy[*p_seq - 1]); 117 | if (!dst_place->isOccupy[*p_seq]) { 118 | // printf("replace\n"); 119 | if (offset < *tensor_len) { 120 | if (offset + MAX_ENTRIES_PER_PACKET > *tensor_len) 121 | memcpy(data + offset, p_model, sizeof(int32_t) * (*tensor_len % 
MAX_ENTRIES_PER_PACKET)); 122 | else 123 | memcpy(data + offset, p_model, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 124 | } else { 125 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 126 | } 127 | dst_place->isOccupy[*p_seq] = true; 128 | } else { 129 | // printf("addition\n"); 130 | if (isFloat) { 131 | float* data = dst_place->data.data_float; 132 | float* p_model = (float*) p4ml_header->vector; 133 | 134 | if (offset < *tensor_len) { 135 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 136 | data[offset + i] += p_model[i]; 137 | } else { 138 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 139 | } 140 | } else { 141 | if (offset < *tensor_len) { 142 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 143 | data[offset + i] += p_model[i]; 144 | } else { 145 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 146 | } 147 | } 148 | } 149 | } 150 | 151 | void inline updateModel_force(agghdr *p4ml_header, tensor_context *dst_place) { 152 | int32_t* data = dst_place->data.data_int; 153 | uint16_t *p_seq = &p4ml_header->seq_num; 154 | uint32_t *tensor_len = &p4ml_header->len_tensor; 155 | int32_t *p_model = p4ml_header->vector; 156 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 157 | 158 | if (offset < *tensor_len) { 159 | if (offset + MAX_ENTRIES_PER_PACKET > *tensor_len) 160 | memcpy(data + offset, p_model, sizeof(int32_t) * (*tensor_len % MAX_ENTRIES_PER_PACKET)); 161 | else 162 | memcpy(data + offset, p_model, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 163 | } else { 164 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 165 | } 166 | dst_place->isOccupy[*p_seq] = true; 167 | } -------------------------------------------------------------------------------- /server/ParameterServer.o: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/server/ParameterServer.o
--------------------------------------------------------------------------------
/server/app:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/server/app
--------------------------------------------------------------------------------
/shell/dealdata.sh:
--------------------------------------------------------------------------------
1 | awk '{if($1=="Thr") print $2}' ../datasample/job_A.txt > thr_A.txt
2 | awk '{if($1=="Thr") print $2}' ../datasample/job_B.txt > thr_B.txt
3 | awk '{if($1=="appID[1]") print $3}' ../datasample/switch.txt > occ_A.txt
4 | awk '{if($1=="appID[2]") print $3}' ../datasample/switch.txt > occ_B.txt
5 | 
6 | python gencsv.py
--------------------------------------------------------------------------------
/shell/gencsv.py:
--------------------------------------------------------------------------------
1 | import csv
2 | import itertools
3 | 
4 | 
5 | filePath = "result.csv"
6 | jobn = ['A', 'B']
7 | matrix = ['thr', 'occ']
8 | 
9 | # Read each column file, prefixed with its header label.
10 | lis = []
11 | k = 0
12 | for i in matrix:
13 |     for j in jobn:
14 |         with open(i + "_" + j + ".txt") as f:
15 |             temp = f.readlines()
16 |         lis.append([float(x.strip()) for x in temp])
17 |         lis[k].insert(0, i + "_" + j)
18 |         k = k + 1
19 | 
20 | # Columns have different lengths; pad the short ones so rows line up.
21 | rows = list(itertools.zip_longest(lis[0], lis[1], lis[2], lis[3]))
22 | 
23 | with open(filePath, "w", newline="") as f:
24 |     writer = csv.writer(f)
25 |     for row in rows:
26 |         writer.writerow(row)
--------------------------------------------------------------------------------
/shell/occ_A.txt:
--------------------------------------------------------------------------------
1 | 46.74
2 | 45.04
3 | 55.04
4 | 41.33
5 | 54.00
6 | 51.33
7 | 16.52
8 | 
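The awk filters in dealdata.sh select field 2 of the job-log lines whose first field is `Thr`, and field 3 of the switch-log lines whose first field is `appID[n]`. For reference, the same extraction can be sketched in Python; the sample log lines below are hypothetical and only mirror the field layout the awk programs imply:

```python
# Minimal Python equivalent of a dealdata.sh filter such as
#   awk '{if($1=="Thr") print $2}'
# (keep one field from every line whose first field matches).
def extract_field(lines, first_field, field_index):
    """Return the field at 0-based field_index from each line whose
    first whitespace-separated field equals first_field."""
    out = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == first_field:
            out.append(parts[field_index])
    return out

# Hypothetical job-log lines in the "Thr <value>" shape dealdata.sh assumes.
job_log = ["Thr 48.673178", "Loss 0.1", "Thr 50.194216"]
print(extract_field(job_log, "Thr", 1))  # ['48.673178', '50.194216']
```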
--------------------------------------------------------------------------------
/shell/occ_B.txt:
--------------------------------------------------------------------------------
1 | 4.37
2 | 4.15
3 | 1.70
4 | 2.81
5 | 1.19
6 | 2.00
7 | 55.93
8 | 84.89
9 | 91.33
10 | 95.33
11 | 90.81
12 | 96.96
13 | 94.96
14 | 87.19
15 | 88.74
16 | 94.96
17 | 95.19
18 | 95.48
19 | 58.22
20 | 
--------------------------------------------------------------------------------
/shell/result.csv:
--------------------------------------------------------------------------------
1 | thr_A,thr_B,occ_A,occ_B
2 | 48.673178,14.525902,46.74,4.37
3 | 50.194216,15.514576,45.04,4.15
4 | 50.118166,15.590628,55.04,1.7
5 | 50.042113,15.742732,41.33,2.81
6 | 49.966063,15.970888,54.0,1.19
7 | 50.270271,15.590628,51.33,2.0
8 | ,16.122991,16.52,55.93
9 | ,15.742732,,84.89
10 | ,15.210369,,91.33
11 | ,15.894835,,95.33
12 | ,15.894835,,90.81
13 | ,15.970888,,96.96
14 | ,15.742732,,94.96
15 | ,16.046939,,87.19
16 | ,16.199043,,88.74
17 | ,16.275095,,94.96
18 | ,16.04694,,95.19
19 | ,16.50325,,95.48
20 | ,16.275095,,58.22
21 | ,16.655353,,
22 | ,16.199043,,
23 | 
--------------------------------------------------------------------------------
/shell/thr_A.txt:
--------------------------------------------------------------------------------
1 | 48.673178
2 | 50.194216
3 | 50.118166
4 | 50.042113
5 | 49.966063
6 | 50.270271
7 | 
--------------------------------------------------------------------------------
/shell/thr_B.txt:
--------------------------------------------------------------------------------
1 | 14.525902
2 | 15.514576
3 | 15.590628
4 | 15.742732
5 | 15.970888
6 | 15.590628
7 | 16.122991
8 | 15.742732
9 | 15.210369
10 | 15.894835
11 | 15.894835
12 | 15.970888
13 | 15.742732
14 | 16.046939
15 | 16.199043
16 | 16.275095
17 | 16.046940
18 | 16.503250
19 | 16.275095
20 | 16.655353
21 | 16.199043
22 | 
--------------------------------------------------------------------------------
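The thr/occ columns above have unequal lengths (six thr_A values against twenty-one thr_B values), which is where the empty cells in result.csv come from: gencsv.py zips the columns with itertools' zip-longest, which pads the short columns, and `csv.writer` renders the padding as empty fields. A minimal sketch with abbreviated, hypothetical columns:

```python
import csv
import io
import itertools

# Abbreviated columns in the gencsv.py layout: a header label
# followed by the values read from thr_A.txt / thr_B.txt.
thr_A = ["thr_A", 48.673178, 50.194216]
thr_B = ["thr_B", 14.525902, 15.514576, 15.590628]

# zip_longest pads the shorter column with None; csv.writer then
# writes None as an empty cell, producing the ragged result.csv rows.
rows = list(itertools.zip_longest(thr_A, thr_B))

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
# thr_A,thr_B
# 48.673178,14.525902
# 50.194216,15.514576
# ,15.590628
```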