├── LICENSE
├── README.md
├── benchmark.md
├── client
│   ├── Makefile
│   ├── app
│   ├── main.cc
│   ├── main.o
│   ├── p4ml_manager.cc
│   ├── p4ml_manager.h
│   └── p4ml_manager.o
├── common
│   ├── CC_manager.h
│   ├── HashTable.cc
│   ├── HashTable.h
│   ├── HashTable.o
│   ├── ThreadPool.h
│   ├── dma_common.cc
│   ├── dma_common.h
│   ├── dma_common.o
│   ├── mlx5_defs.h
│   ├── p4ml_struct.h
│   ├── packet.h
│   ├── quantize.h
│   ├── utils.h
│   └── window_manager.h
├── datasample
│   ├── job_A.txt
│   ├── job_B.txt
│   └── switch.txt
├── p4ml2
│   ├── includes
│   │   ├── actions.p4
│   │   ├── common.p4
│   │   ├── headers.p4
│   │   ├── parser.p4
│   │   ├── registers.p4
│   │   └── tables.p4
│   └── p4ml2.p4
├── ptf_p4ml2
│   ├── ptfTest.py
│   └── ptfTest.pyc
├── run_pd_rpc
│   └── setupp4ml2.py
├── server
│   ├── Makefile
│   ├── ParameterServer.cc
│   ├── ParameterServer.h
│   ├── ParameterServer.o
│   └── app
└── shell
    ├── dealdata.sh
    ├── gencsv.py
    ├── occ_A.txt
    ├── occ_B.txt
    ├── result.csv
    ├── thr_A.txt
    └── thr_B.txt

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 CSU-NetLab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A2TP-Eurosys2023

## Abstract
Compared with state-of-the-art in-network aggregation protocols, A2TP redesigns the congestion control algorithm: it decouples in-network aggregator congestion from link bandwidth congestion and accounts for the impact of stragglers, thereby improving the efficiency of in-network aggregation. We implement A2TP on a P4 programmable switch, with a kernel-bypass protocol stack at the end hosts. We provide the source code of the whole system, along with a benchmark experiment and detailed experimental steps for verifying A2TP's performance.


## Description \& Requirements
Our source code is built on ATP [1] (https://github.com/in-ATP/ATP.git), one of the state-of-the-art in-network aggregation schemes. Our core design lies in the congestion control logic, e.g., the decoupled window adjustment in CC_manager.h and the straggling-degree detection in p4ml_manager.cc.
```
[1] C. Lao, Y. Le, K. Mahajan, Y. Chen, W. Wu, A. Akella and M. M. Swift. ATP: In-network Aggregation for Multi-tenant Learning. In Proc. NSDI, 2021.
```

### Hardware dependencies
Each host requires a Mellanox NIC; we recommend the Mellanox ConnectX-5 100Gbps NIC. In addition, the system requires a programmable switch; the recommended model is the Barefoot Wedge 100BF-32X.

### Software dependencies
For each host, the NIC driver version is MLNX\_OFED\_LINUX-4.9-4.1.7.0.
For the aggregation switch, the SDE version is bf-sde-8.9.1.

## Directory Structure
```
./client, ./common, and ./server are deployed on the workers and PSs
./p4ml2, ./ptf_p4ml2, and ./run_pd_rpc are used on the programmable switch
./datasample contains sample data from the benchmark experiment
./shell processes the sample data and generates a .csv file
```

## Basic Performance
The detailed steps of the benchmark experiment are described in benchmark.md.
--------------------------------------------------------------------------------
/benchmark.md:
--------------------------------------------------------------------------------
# Benchmark
## Set Up
The experiment needs a Barefoot Wedge 100BF-32X programmable switch and 11 servers. All servers are connected to the switch with 100Gbps links. There are 2 jobs, each with 4 workers and 1 PS (Parameter Server); they share 1 aggregation switch.
To install the NIC driver successfully, you need an OS kernel the driver supports; Ubuntu 20.04 with Linux 5.4.0-26-generic works. Note that configuring 4 workers per job is not mandatory: if you do not have enough machines, you can reduce the number of workers of either job.

## Experiment Steps
### Compile the P4 program
```
$ cd ~/bf-sde-8.9.1/pkgsrc/p4-build
$ ./configure --prefix=$SDE_INSTALL --with-tofino P4_NAME=p4ml2 P4_PATH=/root/bf-sde-8.9.1/pkgsrc/p4-examples/programs/p4ml2/p4ml2.p4 --enable-thrift
$ make
$ make install
```
```
$ cd ~/bf-sde-8.9.1/pkgsrc/p4-examples
$ ./configure --prefix=$SDE_INSTALL
$ make
$ make install
```


### Compile the worker and parameter server
```
$ cd $A2TP/client/
$ make
```
```
$ cd $A2TP/server/
$ make
```


### Run the switch program, configure ports, and install table entries
```
$ cd $SDE
$ ./run_switchd.sh -p p4ml2    (Terminal 1)
```
```
$ $SDE/run_p4_tests.sh -t $A2TP/ptf_p4ml2/ -p p4ml2    (Terminal 2)
```
```
$ $TOOLS/run_pd_rpc.py -p p4ml2 $A2TP/run_pd_rpc/setupp4ml2.py    (Terminal 3)
```

### Run the servers of job A and job B
```
$ # Usage: ./app [AppID]
$ sudo ./app 1
$ sudo ./app 2
```
After starting successfully, each server waits for the senders to start.

### Generate background traffic
Choose one of the workers as the sender and any idle server as the receiver, then use the following command to generate UDP background traffic.
This step simulates the bandwidth contention found in large distributed machine learning clusters.
```
$ iperf -c [destination ip] -B [local ip] -u -l 50000 -t 90 -i 1 -b 10G -P 8
```


### Run the nth worker of job A and the mth worker of job B
```
$ # Usage: ./app [workerID] [Num of Worker] [AppID] [Num of PS]
$ sudo ./app n 4 1 1
$ sudo ./app m 4 2 1
```
If this step succeeds, each worker's terminal prints its throughput.

### Expected result

The aggregation throughput and aggregator occupancy are printed in the terminal.
Moreover, the output can be redirected to a file and then processed with the shell scripts. We collected sample data files from this basic experiment; running "./dealdata.sh" generates a result.csv file containing the real-time aggregation throughput and aggregator occupancy of job A and job B.
The expected result is that the straggling job occupies only a small number of aggregators, relinquishing aggregators to the non-straggling job,
so that the non-straggling job maintains a high aggregation throughput (about 50Gbps in our environment).
Meanwhile, the job with severe straggling can still use the PS for aggregation, so the overall aggregation throughput is improved.
--------------------------------------------------------------------------------
/client/Makefile:
--------------------------------------------------------------------------------
# CFLAGS := -O1 -g
# LD := g++
# LDFLAGS := ${LDFLAGS} -lrdmacm -libverbs -lrt -lpthread -lm

# ROCE_COMMON_PATH = ../common/
# INCLUDES = -I${ROCE_COMMON_PATH}
# CFLAGS := ${CFLAGS} ${INCLUDES}
# SOURCES := $(wildcard *.c *.h ${ROCE_COMMON_PATH}*.c ${ROCE_COMMON_PATH}*.h)


# all: app
# app: main.o p4ml_manager.o ${ROCE_COMMON_PATH}packet.o ${ROCE_COMMON_PATH}dma_common.o ${ROCE_COMMON_PATH}window_manager.o
# 	${LD} $(CFLAGS) -o $@ $^ ${LDFLAGS}


# # Clean Target
# clean:
# 	rm *.o ../common/*.o
# 	rm app

all:
	g++ -std=c++11 -g -O1 -c -o main.o main.cc
	g++ -std=c++11 -g -O1 -c -o p4ml_manager.o p4ml_manager.cc -mavx
	g++ -std=c++11 -g -O1 -c -o ../common/HashTable.o ../common/HashTable.cc
	g++ -std=c++11 -g -O1 -c -o ../common/dma_common.o ../common/dma_common.cc
	g++ -std=c++11 -g -O1 -I../common/ -o app main.o p4ml_manager.o ../common/HashTable.o ../common/dma_common.o -lrdmacm -libverbs -lrt -lpthread -lm

clean:
	rm *.o
	rm
app 31 | -------------------------------------------------------------------------------- /client/app: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/client/app -------------------------------------------------------------------------------- /client/main.cc: -------------------------------------------------------------------------------- 1 | #include "p4ml_manager.h" 2 | 3 | #define ENABLE_LOG true 4 | 5 | uint32_t* init_model(int size) { 6 | uint32_t* tmp = new uint32_t[size]; 7 | for (int i = 0; i < size; i++) 8 | tmp[i] = i+1; 9 | return tmp; 10 | } 11 | 12 | float* init_model_float(int size) { 13 | float* tmp = new float[size]; 14 | for (int i = 0; i < size; i++) { 15 | tmp[i] = (i+1.0) / 10000000.0; 16 | // tmp[i] = (i + 1.0) / 10000.0; 17 | // printf("%f ", tmp[i]); 18 | } 19 | // tmp[63] = 200; 20 | return tmp; 21 | } 22 | 23 | float* init_model_float_with_overflow(int size) { 24 | float* tmp = new float[size]; 25 | for (int i = 0; i < size; i++) { 26 | tmp[i] = (i+1.0) / 10000000.0; 27 | } 28 | for (int i = 0; i < 100; i++) { 29 | int rand_num = rand() % size; 30 | if (rand_num > size / 2) 31 | tmp[rand_num] = 200; 32 | else 33 | tmp[rand_num] = 100; 34 | // printf("rand!!! 
%d\n", rand_num); 35 | } 36 | return tmp; 37 | } 38 | 39 | 40 | std::shared_ptr _p4ml_manager; 41 | 42 | 43 | int main(int argc, char *argv[]) 44 | { 45 | 46 | bindingCPU(19); 47 | if (argc < 5) { 48 | printf("\nUsage %s [MyID] [Num of Worker] [AppID] [Num of PS]\n\n", argv[0]); 49 | exit(1); 50 | } 51 | 52 | int host = atoi(argv[1]); 53 | int num_worker = atoi(argv[2]); 54 | int appID = atoi(argv[3]); 55 | int num_PS = atoi(argv[4]); 56 | 57 | //int host = 0; 58 | // int num_worker = 2; 59 | // int appID = 1; 60 | 61 | _p4ml_manager = std::shared_ptr(new P4mlManager(host, num_worker, appID, num_PS)); 62 | 63 | /* Here for int size to send per thread */ 64 | /* ex. 25600 = 32*800 = 1 Round */ 65 | int size = 1024000; 66 | int thread_to_use = 9; 67 | int loop_time = 500; 68 | _p4ml_manager->SetMaxWindow(50); 69 | _p4ml_manager->SetUsedSwitchAGTRcount(450); 70 | _p4ml_manager->SetMaxAgtrSizePerThread(50); 71 | //printf("argc %d\n",argc); 72 | int tempargc = 5; 73 | 74 | if (argc > 5 ) { 75 | while (tempargc < argc){ 76 | std::string option = argv[tempargc++]; 77 | if(option == "-t"){ 78 | int trange = atoi(argv[tempargc++]); 79 | _p4ml_manager->SetTailTimeRange(trange); 80 | printf("tail_time_range:%d\n",trange); 81 | } 82 | if(option == "-th"){ 83 | int num_thread_argv = atoi(argv[tempargc++]); 84 | thread_to_use = num_thread_argv; 85 | //printf("tail_time_range:%d\n",trange); 86 | } 87 | if (option == "-a") { 88 | int num_agtr = atoi(argv[tempargc++]); 89 | _p4ml_manager->SetMaxAgtrSizePerThread(num_agtr); 90 | } 91 | if (option == "-f") { 92 | float forward_rate = atof(argv[tempargc++]); 93 | _p4ml_manager->SetForceForward(forward_rate); 94 | } 95 | if (option == "-l") { 96 | loop_time = atof(argv[tempargc++]); 97 | } 98 | if (option == "-aa") { 99 | int num_used_agtr = atoi(argv[tempargc++]); 100 | _p4ml_manager->SetUsedSwitchAGTRcount(num_used_agtr); 101 | } 102 | 103 | if (option == "-w") { 104 | int wsize = atoi(argv[tempargc++]); 105 | 
_p4ml_manager->SetMaxWindow(wsize); 106 | } 107 | 108 | if (option == "-n"){ 109 | int num_nic = atoi(argv[tempargc++]); 110 | _p4ml_manager->SetNic(num_nic); 111 | } 112 | } 113 | } 114 | 115 | /* (40) Threads in thread pool */ 116 | /* MAX_AGTR (32000) / 40 = 800 Agtr per thread */ 117 | printf("thread_to_use %d\n", thread_to_use); 118 | _p4ml_manager->init_threadPool(thread_to_use); 119 | 120 | // _p4ml_manager->SetForceForward(0.25); 121 | // _p4ml_manager->SetMaxAgtrSizePerThread(50); 122 | 123 | int finish_counter = loop_time * thread_to_use; 124 | uint32_t** tensor = new uint32_t*[thread_to_use * loop_time]; 125 | 126 | printf("\nModel initializing...\n"); 127 | // for (int i = 0; i < thread_to_use * loop_time; i++) 128 | for (int i = 0; i < 1; i++) 129 | if (FLOATING_POINT_INPUT) 130 | tensor[i] = (uint32_t*) init_model_float_with_overflow(size); 131 | else 132 | tensor[i] = init_model(size); 133 | 134 | printf("\nModel initialized completed. Start sending...\n\n"); 135 | 136 | std::chrono::time_point timer = std::chrono::high_resolution_clock::now(); 137 | 138 | std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now(); 139 | 140 | for (int j = 0; j < loop_time; j++) { 141 | /* thread to use */ 142 | for (int i = 0; i < thread_to_use; i++) { 143 | uint64_t key = _p4ml_manager->GetNewKey(); 144 | _p4ml_manager->PushPull(key, (char*) tensor[0], size, 1); 145 | } 146 | } 147 | 148 | 149 | int total_sent = 0; 150 | 151 | while (finish_counter > 0) { 152 | int64_t tmp_key = _p4ml_manager->GetFinishKey(); 153 | if (tmp_key >= 0) { 154 | finish_counter--; 155 | total_sent++; 156 | } 157 | 158 | if (ENABLE_LOG) { 159 | std::chrono::time_point current_time = 160 | std::chrono::high_resolution_clock::now(); 161 | std::chrono::duration time_span = 162 | std::chrono::duration_cast>(current_time - timer); 163 | std::chrono::duration total_time = 164 | std::chrono::duration_cast>(current_time - t1); 165 | if (time_span.count() >= 0.5) 
{ 166 | //printf("time_span %lf\n",time_span.count()); 167 | // printf("Tensor left: %d, ", finish_counter); 168 | // printf("total send %" PRIu64 " bytes, time %lf, throughput: %lf\n", total_sent * 32000 * 194, total_time, total_sent * 6062.5 / 1024.0 / 1024.0 * 8.0 / 1.0); 169 | // printf("%lf\n", total_sent * 6062.5 / 1024.0 / 1024.0 * 8.0 / 1.0); 170 | // int tmp = _p4ml_manager->GetCollisionTimeAndClear(); 171 | // if (tmp) 172 | // printf("%d\n", tmp); 173 | // printf("%d\n", _p4ml_manager->GetCollisionTimeAndClear()); 174 | 175 | double t_thr = (float)total_sent * (16517 * P4ML_PACKET_SIZE) / 1024 / 1024 / 1024 * 8 / time_span.count(); 176 | double thr = (float)total_sent * (16517 * P4ML_DATA_SIZE) / 1024 / 1024 / 1024 * 8 / time_span.count(); 177 | printf("Thr %lf \n", thr); 178 | total_sent = 0; 179 | timer = current_time; 180 | } 181 | } 182 | } 183 | 184 | std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now(); 185 | std::chrono::duration time_span = std::chrono::duration_cast>(t2 - t1); 186 | double transmit_size_in_m = (double)((double)size * loop_time * thread_to_use / (float)MAX_ENTRIES_PER_PACKET) * P4ML_PACKET_SIZE / 1024 / 1024; 187 | double total_time = time_span.count(); 188 | double throughput = (transmit_size_in_m / 1024 * 8 ) / total_time; 189 | printf("Finish all %d Tensors,\n Time = %lf s,\n Total Size = %lf MB,\n Throughput: %lf Gbps\n\n", thread_to_use * loop_time, total_time, transmit_size_in_m, throughput); 190 | 191 | } 192 | -------------------------------------------------------------------------------- /client/main.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/client/main.o -------------------------------------------------------------------------------- /client/p4ml_manager.h: -------------------------------------------------------------------------------- 
1 | 2 | #ifndef P4ML_MANAGER_H 3 | #define P4ML_MANAGER_H 4 | 5 | #include "../common/dma_common.h" 6 | #include "../common/packet.h" 7 | #include "../common/utils.h" 8 | #include "../common/window_manager.h" 9 | #include "../common/HashTable.h" 10 | #include "../common/quantize.h" 11 | #include "../common/p4ml_struct.h" 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | #include 26 | #include 27 | #include 28 | #include 29 | 30 | #define FLOATING_POINT_INPUT false 31 | 32 | #define ONLY_DO_QUAN false 33 | 34 | #define OVERFLOW_THRESHOLD 213 35 | #define UNDERFLOW_THRESHOLD -213 36 | 37 | #define P4ML_KEY_TOTAL 500000 38 | #define MAX_TENSOR_SIZE 1024000 39 | 40 | #define MAX_THREAD_PER_APP 10 41 | 42 | class P4mlManager { 43 | public: 44 | P4mlManager(uint32_t host, int num_worker, int appID, int num_PS); 45 | // ~P4mlManager(); 46 | 47 | void init_threadPool(int num_thread); 48 | void PushPull(uint64_t key, char* data, int len, int cmd); 49 | static void PushPullLoop(int thread_id); 50 | static void QuantizationLoop(int thread_id); 51 | 52 | void PushTaskToThread(uint64_t key, char *data, int len, int cmd, int thread_id); 53 | 54 | uint64_t GetNewKey(); 55 | int64_t GetFinishKey(); 56 | 57 | 58 | 59 | 60 | double GetLossRate(); 61 | 62 | int GetCollisionTimeAndClear(); 63 | void SetForceForward(float forward_rate); 64 | void SetMaxAgtrSizePerThread(int max_agtr_size_per_thread); 65 | void SetUsedSwitchAGTRcount(int used_agtr); 66 | void SetTailTimeRange(int trange); 67 | void SetMaxWindow(int size); 68 | void SetNic(int num); 69 | 70 | private: 71 | static uint32_t host; 72 | static uint8_t num_worker; 73 | static uint8_t num_PS; 74 | static uint16_t appID; 75 | static uint64_t p4mlKey; 76 | static AppInfo* app_info; 77 | 78 | 79 | static int pause_count; 80 | static int max_window_size; 81 | static int nic; 82 | static int 
tail_time_range; 83 | static int max_agtr_size_per_thread; 84 | static int UsedSwitchAGTRcount; 85 | static int _num_thread; 86 | static std::chrono::time_point start; 87 | static ThreadInfo** threadInfoQueue; 88 | static DMAcontext** dmaContextQueue; 89 | static std::thread** threadQueue; 90 | static std::thread** pushPullthreadQueue; 91 | static std::queue* pushPulljobQueue; 92 | static std::thread** quantizationthreadQueue; 93 | static std::queue* quantizejobQueue; 94 | static std::queue* dequantizejobQueue; 95 | 96 | static WindowManager* window_manager; 97 | static std::queue finishQueue; 98 | static std::queue* pendingQueue; 99 | static uint64_t* weightQueue; 100 | 101 | // static uint16_t* hash_map; 102 | static HashTable* hash_table; 103 | static int32_t** quantizeBuffer; 104 | static bool** isOverflow; 105 | 106 | static bool isForceForward; 107 | static int forwardFrequency; 108 | static float forwardRate; 109 | 110 | static std::mutex Resource_mutex; 111 | static std::mutex _P4MLKey_mutex; 112 | static std::mutex _print_mutex; 113 | static std::mutex _queuePush_mutex; 114 | 115 | static void main_receive_packet_loop(DMAcontext* dma_context, int32_t* data, int my_id, CC_manager& cc_manager); 116 | static void updateModel(agghdr* p4ml_header, int32_t* data, int my_id); 117 | }; 118 | 119 | inline void P4mlManager::updateModel(agghdr* p4ml_header, int32_t* data, int my_id) 120 | { 121 | uint16_t* p_seq = &p4ml_header->seq_num; 122 | uint32_t* tensor_len = &pushPulljobQueue[my_id].front()->len; 123 | 124 | int32_t* p_model = p4ml_header->vector; 125 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 126 | if (offset < *tensor_len) { 127 | if (offset + MAX_ENTRIES_PER_PACKET > *tensor_len) 128 | memcpy(data + offset, p_model, sizeof(int32_t) * (*tensor_len % MAX_ENTRIES_PER_PACKET)); 129 | else 130 | memcpy(data + offset, p_model, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 131 | } 132 | } 133 | 134 | #endif //P4ML_MANAGER_H 
--------------------------------------------------------------------------------
/client/p4ml_manager.o:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/client/p4ml_manager.o
--------------------------------------------------------------------------------
/common/CC_manager.h:
--------------------------------------------------------------------------------
#ifndef CC_MANAGER_H
#define CC_MANAGER_H

#define MAX_BYTES 200 * P4ML_PACKET_SIZE

#define A2TPENABLE 1

#include "packet.h"
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>
using namespace std;
#define do_div(n, base) ({ \
    uint32_t __base = (base); \
    uint32_t __rem; \
    __rem = ((uint64_t)(n)) % __base; \
    (n) = ((uint64_t)(n)) / __base; \
    __rem; \
})
#define GET_MIN(a, b) (a < b ? a : b)
#define GET_MAX(a, b) (a > b ? a : b)

class CC_manager {

public:
    CC_manager(int init_window, int max_window_size)
    {
        cwnd_bytes = init_window * P4ML_PACKET_SIZE;
        aggr_bytes = cwnd_bytes;
        max_window = max_window_size;
        last_window = max_window;
        ecn_count = 0;
        col_count = 0;
        aggr_count = init_window;
        last_aggr_count = init_window;
        alpha = 0;
        beta = 0;
        urgent = 1;
        last_ts = std::chrono::high_resolution_clock::now();
        receive_ack = 0;
        update_urgent = 1;
        gam = 1;
        print_gamflag = 0;
    }

    // int adjustWindow(bool isECN, bool isCollision, int appID)
    void adjustWindow(bool isECN, bool isCollision, int appID, int looptime)
    {
        if (A2TPENABLE) {
            gam = (1 - 1.0 / 64) * gam + 1.0 / 64 * urgent; // smoothed straggling degree

            alpha = (1 - 1.0 / 8) * alpha + 1.0 / 8 * (1.0 * ecn_count / (cwnd_bytes / P4ML_PACKET_SIZE)); // smoothed link congestion factor
            beta = (1 - 1.0 / 8) * beta + 1.0 / 8 * (1.0 * col_count / (aggr_bytes / P4ML_PACKET_SIZE)); // smoothed aggregator congestion factor

            double thre = 0.1; // threshold for aggregator congestion control
            if (isCollision && beta > thre) { // shrink the aggregator congestion window
                aggr_bytes = aggr_bytes * (1 - pow(beta - thre, gam) / (2 - thre));
            } else {
                aggr_bytes = aggr_bytes + P4ML_PACKET_SIZE; // grow by one packet (P4ML_PACKET_SIZE)
            }

            if (ecn_count > 0) { // shrink the link congestion window
                cwnd_bytes = cwnd_bytes * (1 - alpha / 2);
                // printf("receive ecn\n");
            } else {
                cwnd_bytes += P4ML_PACKET_SIZE; // grow by one packet (P4ML_PACKET_SIZE)
                // aggr_bytes += 1500;
            }

            if (cwnd_bytes < P4ML_PACKET_SIZE)
                cwnd_bytes = P4ML_PACKET_SIZE;
            if (cwnd_bytes > max_window * P4ML_PACKET_SIZE)
                cwnd_bytes = max_window * P4ML_PACKET_SIZE;
            if (cwnd_bytes > P4ML_PACKET_SIZE)
                cwnd_bytes = (cwnd_bytes / P4ML_PACKET_SIZE) * P4ML_PACKET_SIZE;

            if (aggr_bytes < P4ML_PACKET_SIZE)
                aggr_bytes = P4ML_PACKET_SIZE;
            if (aggr_bytes > max_window * P4ML_PACKET_SIZE)
                aggr_bytes = max_window * P4ML_PACKET_SIZE;
            if (aggr_bytes > P4ML_PACKET_SIZE)
                aggr_bytes = (aggr_bytes / P4ML_PACKET_SIZE) * P4ML_PACKET_SIZE;

            if (aggr_bytes > cwnd_bytes) {
                aggr_bytes = cwnd_bytes;
            }
        } else {
            if (isECN) {
                cwnd_bytes = cwnd_bytes / 2;
                // aggr_bytes = aggr_bytes * (1 - alpha / 2);
                // printf("receive ecn\n");
            } else {
                cwnd_bytes += 1500;
                // aggr_bytes += 1500;
            }
            if (cwnd_bytes < P4ML_PACKET_SIZE)
                cwnd_bytes = P4ML_PACKET_SIZE;
            if (cwnd_bytes > max_window * P4ML_PACKET_SIZE)
                cwnd_bytes = max_window * P4ML_PACKET_SIZE;
            if (cwnd_bytes > P4ML_PACKET_SIZE)
                cwnd_bytes = (cwnd_bytes / P4ML_PACKET_SIZE) * P4ML_PACKET_SIZE;

            aggr_bytes = cwnd_bytes;
        }

        ecn_count = 0;
        col_count = 0;
        // return cwnd_bytes / P4ML_PACKET_SIZE;
    }

    int GetCwndPackets()
    {
        return cwnd_bytes / P4ML_PACKET_SIZE;
    }

    int GetAggrPackets()
    {
        return aggr_bytes / P4ML_PACKET_SIZE;
    }

    int ecn_count;
    int col_count;
    int aggr_count;
    int last_aggr_count;
    double alpha;  // link congestion factor
    double beta;   // aggregator congestion factor
    double urgent; // straggling degree
    double gam;
    int last_window;
    int receive_ack;
    bool update_urgent;
    bool print_gamflag;
    std::chrono::high_resolution_clock::time_point last_ts;

private:
    uint64_t cwnd_bytes; // link congestion window size
    uint64_t aggr_bytes; // aggregator congestion window size
    int max_window;
};

#endif
--------------------------------------------------------------------------------
/common/HashTable.cc:
--------------------------------------------------------------------------------
#include "HashTable.h"
#define MAX_BYTES 100 * P4ML_PACKET_SIZE 3 | 4 | HashTable::HashTable(int size) 5 | { 6 | used_size = size; 7 | int max_size = MAX_AGTR_COUNT; 8 | hash_map = new uint16_t[max_size]; 9 | memset(isAlreadyDeclare, 0, sizeof(bool) * size); 10 | memset(predefine_agtr_list, 0, sizeof(bool) * size); 11 | for (int i = 0; i < size; i++) { 12 | predefine_agtr_list[i] = i; 13 | // printf("[%d] %d ", i, predefine_agtr_list[i]); 14 | } 15 | int random_seed = rand(); 16 | std::shuffle(predefine_agtr_list, predefine_agtr_list + size, std::default_random_engine(random_seed)); 17 | 18 | // for (int i = 0; i < size; i++) { 19 | 20 | // printf("[%d] %d ", i, predefine_agtr_list[i]); 21 | // } 22 | hash_pos = 0; 23 | } 24 | 25 | void HashTable::HashNew_linear(int index) 26 | { 27 | // Guarantee non-repeat element generated 28 | uint16_t new_value; 29 | do { 30 | new_value = hash_function(); 31 | } while (isAlreadyDeclare[new_value]); 32 | 33 | hash_map[index] = new_value; 34 | isAlreadyDeclare[new_value] = true; 35 | } 36 | 37 | int HashTable::HashNew_predefine() 38 | { 39 | if (hash_pos >= used_size) { 40 | return -1; 41 | } 42 | 43 | // Get AGTR from predefined hash 44 | while (hash_pos < used_size) { 45 | int new_agtr = predefine_agtr_list[hash_pos]; 46 | if (isAlreadyDeclare[new_agtr]) { 47 | hash_pos++; 48 | } else { 49 | hash_pos++; 50 | isAlreadyDeclare[new_agtr] = true; 51 | return new_agtr; 52 | } 53 | } 54 | 55 | return -1; 56 | } 57 | 58 | int HashTable::HashNew_crc(uint16_t appID, uint16_t index) 59 | { 60 | // Guarantee non-repeat element generated 61 | uint8_t crc_input[] = {(uint8_t)(appID & 0xff), (uint8_t)(appID >> 8), (uint8_t)(index & 0xff), (uint8_t)(index >> 8), 0, 0}; 62 | 63 | uint16_t new_value; 64 | uint8_t salt = 0; 65 | do { 66 | new_value = crc32_le(0xffffffff, crc_input, 6); 67 | new_value %= used_size; 68 | crc_input[4]++; 69 | if (crc_input[4] == 255) { 70 | crc_input[4] = 0; 71 | crc_input[5]++; 72 | } 73 | } while (isAlreadyDeclare[new_value]); 74 | 
hash_map[index] = new_value; 75 | isAlreadyDeclare[new_value] = true; 76 | return new_value; 77 | } 78 | 79 | void HashTable::HashNew_separate(uint16_t appID, uint16_t index) 80 | { 81 | int real_index = ((appID - 1) * 2000) + index; 82 | hash_map[index] = real_index; 83 | isAlreadyDeclare[real_index] = true; 84 | } 85 | 86 | uint16_t HashTable::hash_function() 87 | { 88 | return hash_pos++; 89 | } 90 | 91 | uint32_t HashTable::crc32_le(uint32_t crc, unsigned char const* p, size_t len) 92 | { 93 | while (len--) { 94 | crc ^= *p++; 95 | for (int i = 0; i < 8; i++) 96 | crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0); 97 | } 98 | return ~crc; 99 | } 100 | -------------------------------------------------------------------------------- /common/HashTable.h: -------------------------------------------------------------------------------- 1 | #ifndef HASHTABLE_H 2 | #define HASHTABLE_H 3 | #include 4 | #include "packet.h" 5 | #include "utils.h" 6 | #define CRCPOLY_LE 0xedb88320 7 | 8 | class HashTable { 9 | 10 | public: 11 | HashTable(int size); 12 | void HashNew_linear(int index); 13 | int HashNew_crc(uint16_t appID, uint16_t index); 14 | int HashNew_predefine(); 15 | void HashNew_separate(uint16_t appID, uint16_t index); 16 | uint16_t* hash_map; 17 | bool isAlreadyDeclare[MAX_AGTR_COUNT]; 18 | 19 | private: 20 | int used_size; 21 | uint32_t crc32_le(uint32_t crc, unsigned char const* p, size_t len); 22 | int predefine_agtr_list[MAX_AGTR_COUNT]; 23 | 24 | // These for predefine Hash 25 | 26 | // These two for Linear Hash 27 | uint16_t hash_function(); 28 | uint16_t hash_pos; 29 | 30 | }; 31 | 32 | #endif -------------------------------------------------------------------------------- /common/HashTable.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/common/HashTable.o 
-------------------------------------------------------------------------------- /common/ThreadPool.h: -------------------------------------------------------------------------------- 1 | #ifndef THREAD_POOL_H 2 | #define THREAD_POOL_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | class ThreadPool { 15 | public: 16 | template ThreadPool(size_t, F callback); 17 | template 18 | auto enqueue(F&& f, Args&&... args) 19 | -> std::future::type>; 20 | ~ThreadPool(); 21 | private: 22 | // need to keep track of threads so we can join them 23 | std::vector< std::thread > workers; 24 | // the task queue 25 | std::queue< std::function > tasks; 26 | 27 | // synchronization 28 | std::mutex queue_mutex; 29 | std::condition_variable condition; 30 | bool stop; 31 | }; 32 | 33 | // the constructor just launches some amount of workers 34 | template 35 | inline ThreadPool::ThreadPool(size_t threads, F callback) 36 | : stop(false) 37 | { 38 | for(size_t i = 0;i task; 45 | 46 | { 47 | std::unique_lock lock(this->queue_mutex); 48 | this->condition.wait(lock, 49 | [this]{ return this->stop || !this->tasks.empty(); }); 50 | if(this->stop && this->tasks.empty()) 51 | return; 52 | task = std::move(this->tasks.front()); 53 | this->tasks.pop(); 54 | } 55 | 56 | task(); 57 | callback(); 58 | } 59 | } 60 | ); 61 | } 62 | 63 | // add new work item to the pool 64 | template 65 | auto ThreadPool::enqueue(F&& f, Args&&... args) 66 | -> std::future::type> 67 | { 68 | using return_type = typename std::result_of::type; 69 | 70 | auto task = std::make_shared< std::packaged_task >( 71 | std::bind(std::forward(f), std::forward(args)...) 
72 | ); 73 | 74 | std::future res = task->get_future(); 75 | { 76 | std::unique_lock lock(queue_mutex); 77 | 78 | // don't allow enqueueing after stopping the pool 79 | if(stop) 80 | throw std::runtime_error("enqueue on stopped ThreadPool"); 81 | 82 | tasks.emplace([task](){ (*task)(); }); 83 | } 84 | condition.notify_one(); 85 | return res; 86 | } 87 | 88 | // the destructor joins all threads 89 | inline ThreadPool::~ThreadPool() 90 | { 91 | { 92 | std::unique_lock lock(queue_mutex); 93 | stop = true; 94 | } 95 | condition.notify_all(); 96 | for(std::thread &worker: workers) 97 | worker.join(); 98 | } 99 | 100 | #endif 101 | -------------------------------------------------------------------------------- /common/dma_common.cc: -------------------------------------------------------------------------------- 1 | #define __USE_GNU 2 | 3 | #include "dma_common.h" 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | 17 | 18 | std::mutex ___print_mutex; 19 | int my_send_queue_length = 2048; 20 | int my_recv_queue_length = my_send_queue_length * 8; 21 | 22 | unsigned char PS_FILTER_TEMPLATE_R[] = { 0x05, 0x04, 0x03, 0x02, 0x01, 0xFF }; 23 | unsigned char WORKER_FILTER_TEMPLATE_R[] = { 0x77, 0x77, 0x77, 0x77, 0x77, 0xFF }; 24 | 25 | DMAcontext* DMA_create(ibv_device* ib_dev, int thread_id, bool isPS) 26 | { 27 | 28 | ibv_context* context = ibv_open_device(ib_dev); 29 | if (!context) { 30 | fprintf(stderr, "Couldn't get context for %s\n", 31 | ibv_get_device_name(ib_dev)); 32 | exit(1); 33 | } 34 | ibv_pd* pd = ibv_alloc_pd(context); 35 | if (!pd) { 36 | fprintf(stderr, "Couldn't allocate PD\n"); 37 | exit(1); 38 | } 39 | 40 | struct ibv_cq* rec_cq = ibv_create_cq(context, my_recv_queue_length + 1, NULL, NULL, 0); 41 | if (!rec_cq) { 42 | fprintf(stderr, "Couldn't create CQ %d\n", errno); 43 | exit(1); 44 | } 45 | 46 | struct ibv_cq* snd_cq = 
ibv_create_cq(context, my_send_queue_length + 1, NULL, NULL, 0);
47 |     if (!snd_cq) {
48 |         fprintf(stderr, "Couldn't create CQ %d\n", errno);
49 |         exit(1);
50 |     }
51 | 
52 |     struct ibv_qp* qp;
53 |     struct ibv_exp_qp_init_attr* qp_init_attr = (struct ibv_exp_qp_init_attr*)malloc(sizeof(struct ibv_exp_qp_init_attr));
54 | 
55 |     memset(qp_init_attr, 0, sizeof(*qp_init_attr));
56 |     qp_init_attr->comp_mask = IBV_EXP_QP_INIT_ATTR_PD | IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER | IBV_EXP_QP_INIT_ATTR_INL_RECV;
57 |     qp_init_attr->send_cq = snd_cq;
58 |     qp_init_attr->recv_cq = rec_cq;
59 |     qp_init_attr->qp_type = IBV_QPT_RAW_PACKET;
60 | 
61 |     qp_init_attr->pd = pd;
62 |     qp_init_attr->cap.max_send_wr = my_send_queue_length + 1;
63 |     qp_init_attr->cap.max_recv_wr = my_recv_queue_length + 1;
64 |     qp_init_attr->cap.max_inline_data = 512;
65 |     qp_init_attr->cap.max_send_sge = 1;
66 |     qp_init_attr->cap.max_recv_sge = 1;
67 |     qp_init_attr->max_tso_header = IP_ETH_UDP_HEADER_SIZE;
68 |     qp_init_attr->max_inl_recv = 512;
69 | 
70 |     qp = ibv_exp_create_qp(context, qp_init_attr);
71 |     //qp = ibv_create_qp(pd, qp_init_attr);
72 |     if (!qp) {
73 |         fprintf(stderr, "Couldn't create RSS QP\n");
74 |         exit(1);
75 |     }
76 | 
77 |     struct ibv_qp_attr qp_attr;
78 |     int qp_flags;
79 |     int ret;
80 |     memset(&qp_attr, 0, sizeof(qp_attr));
81 |     qp_flags = IBV_QP_STATE | IBV_QP_PORT;
82 |     qp_attr.qp_state = IBV_QPS_INIT;
83 |     qp_attr.port_num = 1;
84 |     ret = ibv_modify_qp(qp, &qp_attr, qp_flags);
85 |     if (ret != 0) { /* ibv_modify_qp returns 0 on success or a positive errno, never < 0 */
86 |         fprintf(stderr, "failed modify qp to init\n");
87 |         exit(1);
88 |     }
89 |     memset(&qp_attr, 0, sizeof(qp_attr));
90 | 
91 |     /* a. Move ring state to ready to receive, this is needed to be able to move ring to ready to send even if receive queue is not enabled */
92 | 
93 |     qp_flags = IBV_QP_STATE;
94 |     qp_attr.qp_state = IBV_QPS_RTR;
95 |     ret = ibv_modify_qp(qp, &qp_attr, qp_flags);
96 |     if (ret != 0) {
97 |         fprintf(stderr, "failed modify qp to receive\n");
98 |         exit(1);
99 |     }
100 | 
101 |     /* b.
Move the ring to ready to send */
102 | 
103 |     qp_flags = IBV_QP_STATE;
104 |     qp_attr.qp_state = IBV_QPS_RTS;
105 |     ret = ibv_modify_qp(qp, &qp_attr, qp_flags);
106 |     if (ret != 0) {
107 |         fprintf(stderr, "failed modify qp to send\n");
108 |         exit(1);
109 |     }
110 | 
111 |     int send_buf_size = P4ML_PACKET_SIZE * my_send_queue_length;
112 | 
113 |     void* send_buf;
114 | 
115 |     //send_buf = malloc(send_buf_size);
116 |     // send_buf = alloc_raw_pages(send_buf_size / EACH_HUGEPAGE_SIZE + 1, EACH_HUGEPAGE_SIZE);
117 |     ib_malloc(&send_buf, send_buf_size);
118 |     if (!send_buf) {
119 |         fprintf(stderr, "Couldn't allocate send memory\n");
120 |         exit(1);
121 |     }
122 | 
123 |     struct ibv_mr* send_mr;
124 |     send_mr = ibv_reg_mr(pd, send_buf, send_buf_size, IBV_ACCESS_LOCAL_WRITE);
125 |     if (!send_mr) {
126 |         fprintf(stderr, "Couldn't register send mr\n");
127 |         exit(1);
128 |     }
129 | 
130 |     // Init CQ. Its size MUST be one so that we get two CQEs in mlx5.
131 |     struct ibv_exp_cq_init_attr cq_init_attr;
132 |     memset(&cq_init_attr, 0, sizeof(cq_init_attr));
133 |     struct ibv_cq* mp_recv_cq = ibv_exp_create_cq(context, kAppRecvCQDepth / 2, nullptr, nullptr, 0, &cq_init_attr);
134 |     assert(mp_recv_cq != nullptr);
135 | 
136 |     // Modify the RECV CQ to ignore overrun
137 |     struct ibv_exp_cq_attr cq_attr;
138 |     memset(&cq_attr, 0, sizeof(cq_attr));
139 |     cq_attr.comp_mask = IBV_EXP_CQ_ATTR_CQ_CAP_FLAGS;
140 |     cq_attr.cq_cap_flags = IBV_EXP_CQ_IGNORE_OVERRUN;
141 |     rt_assert(ibv_exp_modify_cq(mp_recv_cq, &cq_attr, IBV_EXP_CQ_CAP_FLAGS) == 0);
142 | 
143 |     struct ibv_exp_wq_init_attr wq_init_attr;
144 |     memset(&wq_init_attr, 0, sizeof(wq_init_attr));
145 | 
146 |     wq_init_attr.wq_type = IBV_EXP_WQT_RQ;
147 |     wq_init_attr.max_recv_wr = kAppRQDepth;
148 |     wq_init_attr.max_recv_sge = 1;
149 |     wq_init_attr.pd = pd;
150 |     wq_init_attr.cq = mp_recv_cq;
151 | 
152 |     wq_init_attr.comp_mask |= IBV_EXP_CREATE_WQ_MP_RQ;
153 |     wq_init_attr.mp_rq.use_shift = IBV_EXP_MP_RQ_NO_SHIFT;
154 | 
wq_init_attr.mp_rq.single_wqe_log_num_of_strides = kAppLogNumStrides;
155 |     wq_init_attr.mp_rq.single_stride_log_num_of_bytes = kAppLogStrideBytes;
156 |     struct ibv_exp_wq* mp_wq = ibv_exp_create_wq(context, &wq_init_attr);
157 |     assert(mp_wq != nullptr);
158 | 
159 |     // Change WQ to ready state
160 |     struct ibv_exp_wq_attr wq_attr;
161 |     memset(&wq_attr, 0, sizeof(wq_attr));
162 |     wq_attr.attr_mask = IBV_EXP_WQ_ATTR_STATE;
163 |     wq_attr.wq_state = IBV_EXP_WQS_RDY;
164 |     rt_assert(ibv_exp_modify_wq(mp_wq, &wq_attr) == 0);
165 | 
166 |     // Get the RQ burst function
167 |     enum ibv_exp_query_intf_status intf_status = IBV_EXP_INTF_STAT_OK;
168 |     struct ibv_exp_query_intf_params query_intf_params;
169 |     memset(&query_intf_params, 0, sizeof(query_intf_params));
170 |     query_intf_params.intf_scope = IBV_EXP_INTF_GLOBAL;
171 |     query_intf_params.intf = IBV_EXP_INTF_WQ;
172 |     query_intf_params.obj = mp_wq;
173 |     struct ibv_exp_wq_family* mp_wq_family = reinterpret_cast<struct ibv_exp_wq_family*>(
174 |         ibv_exp_query_intf(context, &query_intf_params, &intf_status));
175 |     assert(mp_wq_family != nullptr);
176 | 
177 |     // Create indirect table
178 |     struct ibv_exp_rwq_ind_table_init_attr rwq_ind_table_init_attr;
179 |     memset(&rwq_ind_table_init_attr, 0, sizeof(rwq_ind_table_init_attr));
180 |     rwq_ind_table_init_attr.pd = pd;
181 |     rwq_ind_table_init_attr.log_ind_tbl_size = 0; // Ignore hash
182 |     rwq_ind_table_init_attr.ind_tbl = &mp_wq; // Pointer to RECV work queue
183 |     rwq_ind_table_init_attr.comp_mask = 0;
184 |     struct ibv_exp_rwq_ind_table* mp_ind_tbl = ibv_exp_create_rwq_ind_table(context, &rwq_ind_table_init_attr);
185 |     assert(mp_ind_tbl != nullptr);
186 | 
187 |     // Create rx_hash_conf and indirection table for the QP
188 |     uint8_t toeplitz_key[] = { 0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
189 |         0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
190 |         0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
191 |         0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
192 |         0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac,
0x01, 0xfa }; 193 | const int TOEPLITZ_RX_HASH_KEY_LEN = sizeof(toeplitz_key) / sizeof(toeplitz_key[0]); 194 | 195 | struct ibv_exp_rx_hash_conf rx_hash_conf; 196 | memset(&rx_hash_conf, 0, sizeof(rx_hash_conf)); 197 | rx_hash_conf.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ; 198 | rx_hash_conf.rx_hash_key_len = TOEPLITZ_RX_HASH_KEY_LEN; 199 | rx_hash_conf.rx_hash_key = toeplitz_key; 200 | rx_hash_conf.rx_hash_fields_mask = IBV_EXP_RX_HASH_DST_PORT_UDP; 201 | rx_hash_conf.rwq_ind_tbl = mp_ind_tbl; 202 | 203 | struct ibv_exp_qp_init_attr mp_qp_init_attr; 204 | memset(&mp_qp_init_attr, 0, sizeof(mp_qp_init_attr)); 205 | mp_qp_init_attr.comp_mask = IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS | IBV_EXP_QP_INIT_ATTR_PD | IBV_EXP_QP_INIT_ATTR_RX_HASH; 206 | mp_qp_init_attr.rx_hash_conf = &rx_hash_conf; 207 | mp_qp_init_attr.pd = pd; 208 | mp_qp_init_attr.qp_type = IBV_QPT_RAW_PACKET; 209 | 210 | // Create the QP 211 | struct ibv_qp* mp_recv_qp = ibv_exp_create_qp(context, &mp_qp_init_attr); 212 | assert(mp_recv_qp != nullptr); 213 | 214 | size_t tx_ring_size = P4ML_LAYER_SIZE * kAppMaxPostlist; 215 | uint8_t* mp_send_ring; 216 | ib_malloc((void **)&mp_send_ring, tx_ring_size); 217 | rt_assert(mp_send_ring != nullptr); 218 | memset(mp_send_ring, 0, tx_ring_size); 219 | 220 | struct ibv_mr* mp_send_mr = ibv_reg_mr(pd, mp_send_ring, tx_ring_size, IBV_ACCESS_LOCAL_WRITE); 221 | rt_assert(mp_send_mr != nullptr); 222 | 223 | // Register RX ring memory 224 | uint8_t* mp_recv_ring; 225 | ib_malloc((void **)&mp_recv_ring, kAppRingSize); 226 | rt_assert(mp_recv_ring != nullptr); 227 | memset(mp_recv_ring, 0, kAppRingSize); 228 | 229 | struct ibv_mr* mp_mr = ibv_reg_mr(pd, mp_recv_ring, kAppRingSize, IBV_ACCESS_LOCAL_WRITE); 230 | rt_assert(mp_mr != nullptr); 231 | ///////////////////////////////////////////////////////////////////////////////////// 232 | // install_flow_rule(mp_recv_qp, 30720 + thread_id); 233 | install_flow_rule(mp_recv_qp, thread_id, isPS); 234 | // This cast works 
for mlx5 where ibv_cq is the first member of mlx5_cq.
235 |     auto* _mlx5_cq = reinterpret_cast<mlx5_cq*>(mp_recv_cq);
236 |     rt_assert(kAppRecvCQDepth == std::pow(2, _mlx5_cq->cq_log_size));
237 |     rt_assert(_mlx5_cq->buf_a.buf != nullptr);
238 | 
239 |     auto* mp_cqe_arr = reinterpret_cast<volatile mlx5_cqe64*>(_mlx5_cq->buf_a.buf);
240 | 
241 |     // Initialize the CQEs as if we received the last (kAppRecvCQDepth) packets
242 |     // in the CQE cycle.
243 |     static_assert(kAppStridesPerWQE >= kAppRecvCQDepth, "");
244 |     for (size_t i = 0; i < kAppRecvCQDepth; i++) {
245 |         mp_cqe_arr[i].wqe_id = htons(std::numeric_limits<uint16_t>::max());
246 |         // Last CQE gets
247 |         // * wqe_counter = (kAppStridesPerWQE - 1)
248 |         // * snapshot_cycle_idx = (kAppCQESnapshotCycle - 1)
249 |         mp_cqe_arr[i].wqe_counter = htons(kAppStridesPerWQE - (kAppRecvCQDepth - i));
250 | 
251 |         cqe_snapshot_t snapshot;
252 |         snapshot_cqe(&mp_cqe_arr[i], snapshot);
253 |         rt_assert(snapshot.get_cqe_snapshot_cycle_idx() == kAppCQESnapshotCycle - (kAppRecvCQDepth - i));
254 |     }
255 | 
256 |     // The multi-packet RECVs. This must be done after we've initialized the CQE.
257 |     struct ibv_sge* mp_sge = reinterpret_cast<struct ibv_sge*>(malloc(sizeof(struct ibv_sge) * kAppRQDepth));
258 |     for (size_t i = 0; i < kAppRQDepth; i++) {
259 |         size_t mpwqe_offset = i * (kAppRingMbufSize * kAppStridesPerWQE);
260 |         mp_sge[i].addr = reinterpret_cast<uint64_t>(&mp_recv_ring[mpwqe_offset]);
261 |         mp_sge[i].lkey = mp_mr->lkey;
262 |         mp_sge[i].length = kAppRingMbufSize * kAppStridesPerWQE; //kAppRingSize;
263 |         mp_wq_family->recv_burst(mp_wq, &mp_sge[i], 1);
264 |     }
265 | 
266 |     printf("[Thread %d] Finished creating QP - ", thread_id);
267 |     printf("kAppRingMbufSize=%lu, kAppStridesPerWQE=%lu, kAppRingSize=%lu, kAppRQDepth=%lu\n", kAppRingMbufSize, kAppStridesPerWQE, kAppRingSize, kAppRQDepth);
268 |     auto* cqe_arr = mp_cqe_arr;
269 |     cqe_snapshot_t prev_snapshot;
270 |     snapshot_cqe(&cqe_arr[kAppRecvCQDepth - 1], prev_snapshot);
271 | 
272 |     return new DMAcontext{
273 |         .pd = pd,
274 |         .ctx = context,
275 |         .receive_cq = rec_cq,
276 |         .send_cq = snd_cq,
277 |         .send_mr = send_mr,
278 |         .send_region = send_buf,
279 |         .data_qp = qp,
280 | 
281 |         .mp_recv_qp = mp_recv_qp,
282 |         .mp_recv_cq = mp_recv_cq,
283 |         .mp_wq = mp_wq,
284 |         .mp_wq_family = mp_wq_family,
285 |         .mp_ind_tbl = mp_ind_tbl,
286 |         .mp_cqe_arr = mp_cqe_arr,
287 |         .mp_sge = mp_sge,
288 |         .mp_recv_ring = mp_recv_ring,
289 |         .mp_send_ring = mp_send_ring,
290 |         .mp_send_mr = mp_send_mr,
291 | 
292 |         .id = thread_id,
293 |         .total_received = 0,
294 |         .total_sent = 0,
295 |         .my_send_queue_length = my_send_queue_length,
296 |         .my_recv_queue_length = my_recv_queue_length,
297 | 
298 |         .ring_head = 0,
299 |         .nb_rx_rolling = 0,
300 |         .sge_idx = 0,
301 |         .cqe_idx = 0,
302 |         .prev_snapshot = prev_snapshot,
303 |         .isPS = isPS,
304 |         .isMarkTimeStamp = false,
305 |     };
306 | }
307 | 
308 | void send_packet(DMAcontext* dma_context, int chunk_size, uint64_t offset)
309 | {
310 |     int ret;
311 | 
312 |     struct ibv_sge sg;
313 |     struct ibv_exp_send_wr wr, *bad_wr;
314 |     // struct ibv_send_wr wr;
315 |     // struct ibv_send_wr *bad_wr;
316 | 
317 | 
memset(&sg, 0, sizeof(sg));
318 |     sg.addr = (uintptr_t)((char*)dma_context->send_region + offset * P4ML_LAYER_SIZE);
319 |     // printf("%d\n", sg.addr);
320 |     sg.length = chunk_size;
321 |     sg.lkey = dma_context->send_mr->lkey;
322 | 
323 |     memset(&wr, 0, sizeof(wr));
324 |     wr.wr_id = 0;
325 |     wr.sg_list = &sg;
326 |     wr.num_sge = 1;
327 |     // wr.opcode = IBV_WR_SEND;
328 |     wr.exp_opcode = IBV_EXP_WR_TSO;
329 |     wr.tso.mss = P4ML_LAYER_SIZE; // Maximum Segment Size
330 |     wr.tso.hdr_sz = IP_ETH_UDP_HEADER_SIZE; // ETH/IPv4/UDP header
331 |     char hdr[IP_ETH_UDP_HEADER_SIZE]; // ETH/IPv4/UDP header
332 |     if (dma_context->isPS)
333 |         memcpy(hdr, PS_IP_ETH_UDP_HEADER, IP_ETH_UDP_HEADER_SIZE); // Assuming that the header buffer was defined before.
334 |     else
335 |         memcpy(hdr, WORKER_IP_ETH_UDP_HEADER, IP_ETH_UDP_HEADER_SIZE); // Assuming that the header buffer was defined before.
336 | 
337 |     hdr[5] = dma_context->id;
338 |     // hdr[37] = dma_context->id;
339 |     wr.tso.hdr = hdr; // There is no need to use malloc in this case; a local definition of hdr is OK.
340 |     //wr.exp_send_flags = IBV_SEND_INLINE;
341 |     wr.exp_send_flags |= IBV_SEND_SIGNALED;
342 | 
343 |     if (DEBUG_PRINT_ALL_SENDING_PACKET)
344 |         for (int i = 0; i < chunk_size / P4ML_LAYER_SIZE; i++)
345 |             p4ml_header_print_h((agghdr*)((char *)sg.addr + i * P4ML_LAYER_SIZE), "SEND");
346 | 
347 |     // mark first time sending timestamp
348 |     if (dma_context->isMarkTimeStamp) {
349 |         std::chrono::high_resolution_clock::time_point current_time = std::chrono::high_resolution_clock::now();
350 |         for (int i = 0; i < chunk_size / P4ML_LAYER_SIZE; i++) {
351 |             agghdr* p4ml_header = (agghdr*)((char*)sg.addr + i * P4ML_LAYER_SIZE);
352 |             if (!dma_context->isSent[ntohs(p4ml_header->seq_num)]) {
353 |                 dma_context->isSent[ntohs(p4ml_header->seq_num)] = true;
354 |                 dma_context->first_send_time[ntohs(p4ml_header->seq_num)] = current_time;
355 |             } else {
356 |                 /* Resend may trigger */
357 |             }
358 |         }
359 |     }
360 | 
361 |     // No need to wait on the send CQ: a received response implies the send completed.
362 |     ret = ibv_exp_post_send(dma_context->data_qp, &wr, &bad_wr);
363 |     if (ret != 0) {
364 |         fprintf(stderr, "failed in post send\n");
365 |         exit(1);
366 |     }
367 | 
368 |     struct ibv_wc wc_send_cq[POLLING_SIZE];
369 |     ibv_poll_cq(dma_context->send_cq, POLLING_SIZE, wc_send_cq);
370 |     if (DEBUG_CHECK_SEND_RECEIVE_TOTAL)
371 |         dma_context->total_sent += chunk_size / P4ML_LAYER_SIZE;
372 | }
373 | 
374 | size_t receive_packet(DMAcontext *dma_context, cqe_snapshot_t* new_snapshot)
375 | {
376 |     // cqe_snapshot_t new_snapshot;
377 |     // cur_snapshot = new_snapshot;
378 |     snapshot_cqe(&dma_context->mp_cqe_arr[dma_context->cqe_idx], *new_snapshot);
379 |     const size_t delta = get_cycle_delta(dma_context->prev_snapshot, *new_snapshot);
380 | 
381 |     if (!(delta == 0 || delta >= kAppNumRingEntries)) {
382 |         if (DEBUG_CHECK_SEND_RECEIVE_TOTAL)
383 |             dma_context->total_received += delta;
384 |         return delta;
385 |     }
386 |     else
387 |         return 0;
388 |     // return delta;
389 | }
390 | 
391 | void dma_postback(DMAcontext *dma_context)
392 | {
393 |     dma_context->ring_head = (dma_context->ring_head + 1) % kAppNumRingEntries;
394 |     dma_context->nb_rx_rolling++;
395 |     if (dma_context->nb_rx_rolling == kAppStridesPerWQE)
396 |     {
397 |         dma_context->nb_rx_rolling = 0;
398 |         int ret = dma_context->mp_wq_family->recv_burst(dma_context->mp_wq, &dma_context->mp_sge[dma_context->sge_idx], 1);
399 |         rt_assert(ret == 0);
400 |         dma_context->sge_idx = (dma_context->sge_idx + 1) % kAppRQDepth;
401 |     }
402 | }
403 | 
404 | void dma_update_snapshot(DMAcontext *dma_context, cqe_snapshot_t new_snapshot)
405 | {
406 |     dma_context->prev_snapshot = new_snapshot;
407 |     dma_context->cqe_idx = (dma_context->cqe_idx + 1) % kAppRecvCQDepth;
408 | }
409 | 
410 | const char* ibv_wc_opcode_str(enum ibv_wc_opcode opcode)
411 | {
412 |     switch (opcode) {
413 |     case IBV_EXP_WC_SEND:
414 |         return "IBV_WC_SEND";
415 |     case IBV_EXP_WC_RDMA_WRITE:
416 |         return "IBV_WC_RDMA_WRITE";
417 |     case IBV_EXP_WC_RDMA_READ:
418 |         return "IBV_WC_RDMA_READ";
419 |     case IBV_WC_COMP_SWAP:
420 |         return "IBV_WC_COMP_SWAP";
421 |     case IBV_WC_FETCH_ADD:
422 |         return "IBV_WC_FETCH_ADD";
423 |     case IBV_WC_BIND_MW:
424 |         return "IBV_WC_BIND_MW";
425 |     /* receive-side: inbound completion */
426 |     case IBV_EXP_WC_RECV:
427 |         return "IBV_WC_RECV";
428 |     case IBV_EXP_WC_RECV_RDMA_WITH_IMM:
429 |         return "IBV_WC_RECV_RDMA_WITH_IMM";
430 |     default:
431 |         return "IBV_WC_UNKNOWN";
432 |     }
433 | }
434 | 
435 | // Install a flow rule
436 | void install_flow_rule(struct ibv_qp* qp, uint16_t thread_id, bool isPS)
437 | {
438 |     static constexpr size_t rule_sz = sizeof(ibv_exp_flow_attr) + sizeof(ibv_exp_flow_spec_eth) + sizeof(ibv_exp_flow_spec_ipv4_ext);
439 | 
440 |     uint8_t* flow_rule = new uint8_t[rule_sz];
441 |     memset(flow_rule, 0, rule_sz);
442 |     uint8_t* buf = flow_rule;
443 | 
444 |     auto* flow_attr = reinterpret_cast<struct ibv_exp_flow_attr*>(flow_rule);
445 |     flow_attr->type = IBV_EXP_FLOW_ATTR_NORMAL;
446 |     flow_attr->size = rule_sz;
447 |     flow_attr->priority = 0;
448 | 
flow_attr->num_of_specs = 1;
449 |     flow_attr->port = 1;
450 |     flow_attr->flags = 0;
451 |     flow_attr->reserved = 0;
452 |     buf += sizeof(struct ibv_exp_flow_attr);
453 | 
454 |     // Ethernet - all wildcard
455 |     auto* eth_spec = reinterpret_cast<struct ibv_exp_flow_spec_eth*>(buf);
456 |     eth_spec->type = IBV_EXP_FLOW_SPEC_ETH;
457 |     eth_spec->size = sizeof(struct ibv_exp_flow_spec_eth);
458 |     buf += sizeof(struct ibv_exp_flow_spec_eth);
459 | 
460 |     const unsigned char R_SRC_MAC[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
461 |     unsigned char R_DST_MAC[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
462 |     if (isPS)
463 |         memcpy(R_DST_MAC, PS_FILTER_TEMPLATE_R, sizeof(R_DST_MAC));
464 |     else
465 |         memcpy(R_DST_MAC, WORKER_FILTER_TEMPLATE_R, sizeof(R_DST_MAC));
466 | 
467 |     R_DST_MAC[5] = thread_id;
468 | 
469 |     const unsigned char R_SRC_MAC_MASK[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
470 |     const unsigned char R_DST_MAC_MASK[] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
471 |     memcpy(eth_spec->val.dst_mac, R_DST_MAC, sizeof(R_DST_MAC));
472 |     memcpy(eth_spec->val.src_mac, R_SRC_MAC, sizeof(R_SRC_MAC));
473 |     memcpy(eth_spec->mask.dst_mac, R_DST_MAC_MASK, sizeof(R_DST_MAC_MASK));
474 |     memcpy(eth_spec->mask.src_mac, R_SRC_MAC_MASK, sizeof(R_SRC_MAC_MASK));
475 |     eth_spec->val.vlan_tag = 0;
476 |     eth_spec->mask.ether_type = 0;
477 | 
478 |     rt_assert(ibv_exp_create_flow(qp, flow_attr) != nullptr);
479 | }
480 | 
481 | // Install a UDP destination port-based flow rule
482 | void install_udp_flow_rule(struct ibv_qp* qp, uint16_t dst_port)
483 | {
484 |     static constexpr size_t rule_sz = sizeof(ibv_exp_flow_attr) + sizeof(ibv_exp_flow_spec_eth) + sizeof(ibv_exp_flow_spec_ipv4_ext) + sizeof(ibv_exp_flow_spec_tcp_udp);
485 | 
486 |     uint8_t* flow_rule = new uint8_t[rule_sz];
487 |     memset(flow_rule, 0, rule_sz);
488 |     uint8_t* buf = flow_rule;
489 | 
490 |     auto* flow_attr = reinterpret_cast<struct ibv_exp_flow_attr*>(flow_rule);
491 |     flow_attr->type = IBV_EXP_FLOW_ATTR_NORMAL;
492 |     flow_attr->size = rule_sz;
493 |     flow_attr->priority = 0;
494 | 
flow_attr->num_of_specs = 3; // eth + ipv4 + udp specs below
495 |     flow_attr->port = 1;
496 |     flow_attr->flags = 0;
497 |     flow_attr->reserved = 0;
498 |     buf += sizeof(struct ibv_exp_flow_attr);
499 | 
500 |     // Ethernet - all wildcard
501 |     auto* eth_spec = reinterpret_cast<struct ibv_exp_flow_spec_eth*>(buf);
502 |     eth_spec->type = IBV_EXP_FLOW_SPEC_ETH;
503 |     eth_spec->size = sizeof(struct ibv_exp_flow_spec_eth);
504 |     buf += sizeof(struct ibv_exp_flow_spec_eth);
505 | 
506 |     // IPv4 - all wildcard
507 |     auto* spec_ipv4 = reinterpret_cast<struct ibv_exp_flow_spec_ipv4_ext*>(buf);
508 |     spec_ipv4->type = IBV_EXP_FLOW_SPEC_IPV4_EXT;
509 |     spec_ipv4->size = sizeof(struct ibv_exp_flow_spec_ipv4_ext);
510 |     buf += sizeof(struct ibv_exp_flow_spec_ipv4_ext);
511 | 
512 |     // UDP - match dst port
513 |     auto* udp_spec = reinterpret_cast<struct ibv_exp_flow_spec_tcp_udp*>(buf);
514 |     udp_spec->type = IBV_EXP_FLOW_SPEC_UDP;
515 |     udp_spec->size = sizeof(struct ibv_exp_flow_spec_tcp_udp);
516 |     udp_spec->val.dst_port = htons(dst_port);
517 |     udp_spec->mask.dst_port = 0xffffu;
518 | 
519 | 
520 |     rt_assert(ibv_exp_create_flow(qp, flow_attr) != nullptr);
521 | }
522 | 
523 | void snapshot_cqe(volatile mlx5_cqe64* cqe, cqe_snapshot_t& cqe_snapshot)
524 | {
525 |     while (true) {
526 |         uint16_t wqe_id_0 = cqe->wqe_id;
527 |         uint16_t wqe_counter_0 = cqe->wqe_counter;
528 |         memory_barrier();
529 |         uint16_t wqe_id_1 = cqe->wqe_id;
530 | 
531 |         if (likely(wqe_id_0 == wqe_id_1)) {
532 |             cqe_snapshot.wqe_id = ntohs(wqe_id_0);
533 |             cqe_snapshot.wqe_counter = ntohs(wqe_counter_0);
534 |             return;
535 |         }
536 |     }
537 | }
538 | 
539 | size_t get_cycle_delta(const cqe_snapshot_t& prev, const cqe_snapshot_t& cur)
540 | {
541 |     size_t prev_idx = prev.get_cqe_snapshot_cycle_idx();
542 |     size_t cur_idx = cur.get_cqe_snapshot_cycle_idx();
543 |     assert(prev_idx < kAppCQESnapshotCycle && cur_idx < kAppCQESnapshotCycle);
544 | 
545 |     return ((cur_idx + kAppCQESnapshotCycle) - prev_idx) % kAppCQESnapshotCycle;
546 | }
547 | --------------------------------------------------------------------------------
/common/dma_common.h: -------------------------------------------------------------------------------- 1 | #ifndef DMA_COMMON_H 2 | #define DMA_COMMON_H 3 | 4 | #include "mlx5_defs.h" 5 | #include "packet.h" 6 | #include "utils.h" 7 | #include 8 | #include 9 | #include 10 | #include //ifreq 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | 24 | #define POLLING_SIZE 400 25 | #define ENTRY_SIZE 256 /* maximum size of each buffer */ 26 | #define PORT_NUM 1 27 | 28 | #define DEBUG_PRINT_ALL_SENDING_PACKET false 29 | #define DEBUG_PRINT_ALL_RECEIVING_PACKET false 30 | 31 | #define DEBUG_CHECK_SEND_RECEIVE_TOTAL false 32 | 33 | static constexpr size_t kAppRecvCQDepth = 8; 34 | static constexpr size_t kAppRQDepth = 4; // Multi-packet RQ depth 35 | 36 | static constexpr size_t kAppLogNumStrides = 9; 37 | static constexpr size_t kAppLogStrideBytes = 9; 38 | static constexpr size_t kAppMaxPostlist = 512; 39 | 40 | static constexpr bool kAppVerbose = false; 41 | static constexpr bool kAppCheckContents = true; // Check buffer contents 42 | 43 | /// Size of one ring message buffer 44 | static constexpr size_t kAppRingMbufSize = (1ull << kAppLogStrideBytes); 45 | 46 | /// Number of strides in one multi-packet RECV WQE 47 | static constexpr size_t kAppStridesPerWQE = (1ull << kAppLogNumStrides); 48 | 49 | /// Packets after which the CQE snapshot cycles 50 | static constexpr size_t kAppCQESnapshotCycle = 65536 * kAppStridesPerWQE; 51 | 52 | /// Total number of entries in the RX ring 53 | static constexpr size_t kAppNumRingEntries = (kAppStridesPerWQE * kAppRQDepth); 54 | 55 | static constexpr size_t kAppRingSize = (kAppNumRingEntries * kAppRingMbufSize); 56 | 57 | /// A consistent snapshot of CQE fields in host endian format 58 | struct cqe_snapshot_t { 59 | uint16_t wqe_id; 60 | uint16_t wqe_counter; 61 | 62 | /// Return this packet's index in the CQE 
snapshot cycle 63 | size_t get_cqe_snapshot_cycle_idx() const 64 | { 65 | return wqe_id * kAppStridesPerWQE + wqe_counter; 66 | } 67 | 68 | std::string to_string() 69 | { 70 | std::ostringstream ret; 71 | ret << "[ID " << std::to_string(wqe_id) << ", counter " 72 | << std::to_string(wqe_counter) << "]"; 73 | return ret.str(); 74 | } 75 | }; 76 | 77 | struct DMAcontext { 78 | struct ibv_pd* pd; 79 | struct ibv_context* ctx; 80 | struct ibv_cq* receive_cq; 81 | struct ibv_cq* send_cq; 82 | struct ibv_mr* send_mr; 83 | void* send_region; 84 | struct ibv_qp* data_qp; 85 | 86 | struct ibv_qp* mp_recv_qp; 87 | struct ibv_cq* mp_recv_cq; 88 | struct ibv_exp_wq* mp_wq; 89 | struct ibv_exp_wq_family* mp_wq_family; 90 | struct ibv_exp_rwq_ind_table* mp_ind_tbl; 91 | volatile mlx5_cqe64* mp_cqe_arr; 92 | struct ibv_sge* mp_sge; 93 | uint8_t* mp_recv_ring; 94 | uint8_t* mp_send_ring; 95 | struct ibv_mr* mp_send_mr; 96 | 97 | // for connection 98 | int id; 99 | int total_received; 100 | int total_sent; 101 | int my_send_queue_length; 102 | int my_recv_queue_length; 103 | 104 | size_t ring_head; 105 | size_t nb_rx_rolling; 106 | size_t sge_idx; 107 | size_t cqe_idx; 108 | 109 | cqe_snapshot_t prev_snapshot; 110 | 111 | bool isPS; 112 | 113 | // // For window adjustment 114 | bool isMarkTimeStamp; 115 | bool* isSent; 116 | std::chrono::high_resolution_clock::time_point* first_send_time; 117 | std::chrono::high_resolution_clock::time_point* first_receive_time; 118 | }; 119 | 120 | DMAcontext* DMA_create(ibv_device* ib_dev, int thread_id, bool isPS); 121 | const char* ibv_wc_opcode_str(enum ibv_wc_opcode opcode); 122 | void send_packet(DMAcontext* dma_context, int packet_size, uint64_t offset); 123 | size_t receive_packet(DMAcontext *dma_context, cqe_snapshot_t* new_snapshot); 124 | void dma_postback(DMAcontext *dma_context); 125 | void dma_update_snapshot(DMAcontext *dma_context, cqe_snapshot_t new_snapshot); 126 | void dma_context_print(DMAcontext* dma_context, const char* 
caption); 127 | 128 | // Install a UDP destination port--based flow rule 129 | void install_flow_rule(struct ibv_qp* qp, uint16_t thread_id, bool isPS); 130 | void install_udp_flow_rule(struct ibv_qp* qp, uint16_t dst_port); 131 | void snapshot_cqe(volatile mlx5_cqe64* cqe, cqe_snapshot_t& cqe_snapshot); 132 | size_t get_cycle_delta(const cqe_snapshot_t& prev, const cqe_snapshot_t& cur); 133 | #endif 134 | -------------------------------------------------------------------------------- /common/dma_common.o: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/common/dma_common.o -------------------------------------------------------------------------------- /common/mlx5_defs.h: -------------------------------------------------------------------------------- 1 | #ifndef MLX5_DEFS_H 2 | #define MLX5_DEFS_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | enum mlx5_alloc_type { 10 | MLX5_ALLOC_TYPE_ANON, 11 | MLX5_ALLOC_TYPE_HUGE, 12 | MLX5_ALLOC_TYPE_CONTIG, 13 | MLX5_ALLOC_TYPE_PEER_DIRECT, 14 | MLX5_ALLOC_TYPE_PREFER_HUGE, 15 | MLX5_ALLOC_TYPE_PREFER_CONTIG, 16 | MLX5_ALLOC_TYPE_ALL 17 | }; 18 | 19 | enum mlx5_lock_type { 20 | MLX5_SPIN_LOCK = 0, 21 | MLX5_MUTEX = 1, 22 | }; 23 | 24 | enum mlx5_lock_state { MLX5_USE_LOCK, 25 | MLX5_LOCKED, 26 | MLX5_UNLOCKED }; 27 | 28 | struct mlx5_lock { 29 | pthread_mutex_t mutex; 30 | pthread_spinlock_t slock; 31 | enum mlx5_lock_state state; 32 | enum mlx5_lock_type type; 33 | }; 34 | 35 | struct mlx5_numa_req { 36 | int valid; 37 | int numa_id; 38 | }; 39 | 40 | struct mlx5_peer_direct_mem { 41 | uint32_t dir; 42 | uint64_t va_id; 43 | struct ibv_exp_peer_buf* pb; 44 | struct ibv_exp_peer_direct_attr* ctx; 45 | }; 46 | 47 | struct mlx5_buf { 48 | void* buf; 49 | size_t length; 50 | int base; 51 | struct mlx5_hugetlb_mem* hmem; 52 | struct mlx5_peer_direct_mem peer; 53 | enum 
mlx5_alloc_type type; 54 | struct mlx5_numa_req numa_req; 55 | int numa_alloc; 56 | }; 57 | 58 | struct mlx5_mini_cqe8 { 59 | union { 60 | uint32_t rx_hash_result; 61 | uint32_t checksum; 62 | struct { 63 | uint16_t wqe_counter; 64 | uint8_t s_wqe_opcode; 65 | uint8_t reserved; 66 | } s_wqe_info; 67 | }; 68 | uint32_t byte_cnt; 69 | }; 70 | 71 | enum { MLX5_MINI_ARR_SIZE = 8 }; 72 | 73 | struct mlx5_tm_cqe { 74 | uint32_t success; 75 | uint32_t hw_phase_cnt; 76 | uint8_t rsvd0[10]; 77 | }; 78 | 79 | struct mlx5_cqe64 { 80 | uint8_t rsvd0[2]; 81 | /* 82 | * wqe_id is valid only for 83 | * Striding RQ (Multi-Packet RQ). 84 | * It provides the WQE index inside the RQ. 85 | */ 86 | uint16_t wqe_id; 87 | uint8_t rsvd4[8]; 88 | uint32_t rx_hash_res; 89 | uint8_t rx_hash_type; 90 | uint8_t ml_path; 91 | uint8_t rsvd20[2]; 92 | uint16_t checksum; 93 | uint16_t slid; 94 | uint32_t flags_rqpn; 95 | uint8_t hds_ip_ext; 96 | uint8_t l4_hdr_type_etc; 97 | __be16 vlan_info; 98 | uint32_t srqn_uidx; 99 | uint32_t imm_inval_pkey; 100 | uint8_t app; 101 | uint8_t app_op; 102 | uint16_t app_info; 103 | uint32_t byte_cnt; 104 | __be64 timestamp; 105 | union { 106 | uint32_t sop_drop_qpn; 107 | struct { 108 | uint8_t sop; 109 | uint8_t qpn[3]; 110 | } sop_qpn; 111 | }; 112 | /* 113 | * In Striding RQ (Multi-Packet RQ) wqe_counter provides 114 | * the WQE stride index (to calc pointer to start of the message) 115 | */ 116 | uint16_t wqe_counter; 117 | uint8_t signature; 118 | uint8_t op_own; 119 | }; 120 | 121 | struct mlx5_cq { 122 | struct ibv_cq ibv_cq; 123 | uint32_t creation_flags; 124 | uint32_t pattern; 125 | struct mlx5_buf buf_a; 126 | struct mlx5_buf buf_b; 127 | struct mlx5_buf* active_buf; 128 | struct mlx5_buf* resize_buf; 129 | int resize_cqes; 130 | int active_cqes; 131 | struct mlx5_lock lock; 132 | uint32_t cqn; 133 | uint32_t cons_index; 134 | uint32_t wait_index; 135 | uint32_t wait_count; 136 | volatile uint32_t* dbrec; 137 | int arm_sn; 138 | int cqe_sz; 139 | int 
resize_cqe_sz; 140 | int stall_next_poll; 141 | int stall_enable; 142 | uint64_t stall_last_count; 143 | int stall_adaptive_enable; 144 | int stall_cycles; 145 | uint8_t model_flags; /* use mlx5_cq_model_flags */ 146 | uint16_t cqe_comp_max_num; 147 | uint8_t cq_log_size; 148 | /* Compressed CQE data */ 149 | struct mlx5_cqe64 next_decomp_cqe64; 150 | struct mlx5_resource* compressed_rsc; 151 | uint16_t compressed_left; 152 | uint16_t compressed_wqe_cnt; 153 | uint8_t compressed_req; 154 | uint8_t compressed_mp_rq; 155 | uint8_t mini_arr_idx; 156 | struct mlx5_mini_cqe8 mini_array[MLX5_MINI_ARR_SIZE]; 157 | /* peer-direct data */ 158 | int peer_enabled; 159 | struct ibv_exp_peer_direct_attr* peer_ctx; 160 | struct mlx5_buf peer_buf; 161 | struct mlx5_peek_entry** peer_peek_table; 162 | struct mlx5_peek_entry* peer_peek_free; 163 | }; 164 | 165 | #endif // MLX5_DEFS_H 166 | -------------------------------------------------------------------------------- /common/p4ml_struct.h: -------------------------------------------------------------------------------- 1 | #ifndef P4ML_STRUCT_H 2 | #define P4ML_STRUCT_H 3 | #include 4 | 5 | #include "packet.h" 6 | 7 | struct ThreadInfo 8 | { 9 | int thread_id; 10 | int agtr_start_pos; 11 | }; 12 | 13 | struct Job 14 | { 15 | uint64_t key; 16 | float *float_data; 17 | int32_t *int_data; 18 | uint32_t len; 19 | int cmd; 20 | }; 21 | 22 | struct AppInfo 23 | { 24 | uint32_t host; 25 | uint16_t appID; 26 | uint8_t num_worker; 27 | uint8_t num_PS; 28 | }; 29 | 30 | #endif -------------------------------------------------------------------------------- /common/packet.h: -------------------------------------------------------------------------------- 1 | #ifndef PACKET_P4ML_H 2 | #define PACKET_P4ML_H 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include "utils.h" 15 | #include "p4ml_struct.h" 16 | 17 | #define PS_FILTER_TEMPLATE 
0x05, 0x04, 0x03, 0x02, 0x01, 0xFF 18 | #define WORKER_FILTER_TEMPLATE 0x77, 0x77, 0x77, 0x77, 0x77, 0xFF 19 | 20 | // #define SRC_MAC 0xb8, 0x59, 0x9f, 0x1d, 0x04, 0xf2 21 | #define SRC_MAC 0xe4, 0x1d, 0x2d, 0xf3, 0xdd, 0xcc 22 | // #define DST_MAC 0xb8, 0x59, 0x9f, 0x0b, 0x30, 0x72 23 | 24 | #define ETH_TYPE 0x07, 0x00 25 | 26 | #define IP_HDRS 0x45, 0x00, 0x00, 0x54, 0x00, 0x00, 0x40, 0x00, 0x40, 0x01, 0xaf, 0xb6 27 | 28 | #define SRC_IP 0x0d, 0x07, 0x38, 0x66 29 | 30 | #define DST_IP 0x0d, 0x07, 0x38, 0x7f 31 | 32 | #define SRC_PORT 0x67, 0x67 33 | 34 | #define DST_PORT 0x78, 0x78 35 | 36 | #define UDP_HDRS 0x00, 0x00, 0x00, 0x00 37 | 38 | // Only a template, DST_IP will be modified soon 39 | // This one is for sending 40 | const unsigned char PS_IP_ETH_UDP_HEADER[] = { WORKER_FILTER_TEMPLATE, SRC_MAC, ETH_TYPE, IP_HDRS, SRC_IP, DST_IP }; 41 | const unsigned char WORKER_IP_ETH_UDP_HEADER[] = { PS_FILTER_TEMPLATE, SRC_MAC, ETH_TYPE, IP_HDRS, SRC_IP, DST_IP }; 42 | 43 | // P4ML_PACKET_SIZE = IP_ETH_HEADER_SIZE + P4ML_HEADER_SIZE + P4ML_DATA_SIZE 44 | #define P4ML_PACKET_SIZE 309 //308+1 45 | #define P4ML_DATA_SIZE 248 46 | #define P4ML_HEADER_SIZE 27 //26+1 47 | #define P4ML_LAYER_SIZE 275 //274+1 48 | #define IP_ETH_UDP_HEADER_SIZE 34 49 | 50 | #define MAX_ENTRIES_PER_PACKET 62 51 | 52 | #define BYTE_TO_BINARY_PATTERN "%c%c%c%c%c%c%c%c" 53 | #define BYTE_TO_BINARY(byte) \ 54 | (byte & 0x80 ? '1' : '0'), \ 55 | (byte & 0x40 ? '1' : '0'), \ 56 | (byte & 0x20 ? '1' : '0'), \ 57 | (byte & 0x10 ? '1' : '0'), \ 58 | (byte & 0x08 ? '1' : '0'), \ 59 | (byte & 0x04 ? '1' : '0'), \ 60 | (byte & 0x02 ? '1' : '0'), \ 61 | (byte & 0x01 ? 
'1' : '0') 62 | 63 | #pragma pack(push, 1) 64 | struct agghdr { 65 | uint32_t bitmap; 66 | uint8_t num_worker; 67 | uint8_t flag; 68 | // reserved : 2; 69 | // isForceFoward : 1; 70 | 71 | /* Current version 72 | overflow : 1; 73 | PSIndex : 2; 74 | dataIndex : 1; 75 | ECN : 1; 76 | isResend : 1; 77 | isSWCollision : 1; 78 | isACK : 1; 79 | */ 80 | 81 | uint16_t appID; 82 | uint16_t seq_num; 83 | uint8_t is_lzy_Col; //is_Col in A2TP 84 | uint16_t agtr; 85 | uint16_t agtr2; 86 | int32_t vector[MAX_ENTRIES_PER_PACKET]; 87 | uint64_t key; 88 | uint32_t len_tensor; 89 | }; 90 | #pragma pack(pop) 91 | 92 | static std::mutex _packet_print_mutex; 93 | 94 | void inline make_p4ml_layer_and_copy_to(void* payload, Job* job_info, AppInfo* app_info, uint16_t* agtr, uint16_t* seq_num, int* offset, bool isResend, bool isForceForward, bool isOverflow, bool isForceCollision = false) 95 | { 96 | agghdr* agg_header = (agghdr*)payload; 97 | agghdr* p4ml_header = agg_header; 98 | agg_header->key = job_info->key; 99 | agg_header->len_tensor = htonl(job_info->len); 100 | agg_header->bitmap = htonl(1 << (app_info->host)); 101 | agg_header->num_worker = app_info->num_worker; 102 | agg_header->appID = htons(app_info->appID); 103 | agg_header->flag = 0; 104 | agg_header->agtr = htons(*agtr); 105 | //TODO: clarify this and UsedSwitchAGTRcount 106 | agg_header->agtr2 = htons(*agtr + MAX_AGTR_COUNT); 107 | agg_header->seq_num = htons(*seq_num); 108 | 109 | agg_header->flag = ((job_info->key % app_info->num_PS)) << 5; 110 | 111 | agg_header->is_lzy_Col = 0; 112 | //agg_header->flag = 0; //lzy 113 | if (isResend) 114 | agg_header->flag |= 4; 115 | 116 | if (isForceForward) 117 | agg_header->flag |= 32; 118 | 119 | if (isOverflow) 120 | agg_header->flag |= 128; 121 | 122 | if (isForceCollision){ //lzy 123 | agg_header->flag |= 0x02; 124 | } 125 | // PS Index 126 | // agg_header->flag |= (*num_PS << 5); 127 | // printf("to PS: %d\n", ((*key % *num_PS)+1)); 128 | 129 | int32_t* used_data; 130 | if 
(isOverflow) { 131 | used_data = (int32_t*) job_info->float_data; 132 | } 133 | else 134 | used_data = (int32_t*) job_info->int_data; 135 | 136 | int32_t* send_data; 137 | if (*offset + MAX_ENTRIES_PER_PACKET > job_info->len) { 138 | int32_t* tmp = new int32_t[MAX_ENTRIES_PER_PACKET](); 139 | memcpy(tmp, used_data + *offset, sizeof(int32_t) * (job_info->len % MAX_ENTRIES_PER_PACKET)); 140 | send_data = tmp; 141 | delete[] tmp; 142 | } else { 143 | send_data = used_data + *offset; 144 | } 145 | 146 | // p4ml_header_print_h(agg_header, "Make"); 147 | } 148 | 149 | // void inline make_packet_and_copy_to(void* payload, uint64_t* key, uint32_t* len_tensor, uint32_t* workerID, uint8_t* num_worker, uint16_t* appID, uint16_t* agtr, uint16_t* seq_num, int32_t* data, bool isResend, bool isForceForward, uint8_t* num_PS, int thread_id) 150 | // { 151 | // char* eth_ip_header = (char*)payload; 152 | // memcpy(payload, IP_ETH_UDP_HEADER, sizeof(IP_ETH_UDP_HEADER)); 153 | // eth_ip_header[5] = thread_id; 154 | // make_p4ml_layer_and_copy_to((char*)payload + sizeof(IP_ETH_UDP_HEADER), key, len_tensor, workerID, num_worker, appID, agtr, seq_num, data, isResend, isForceForward, num_PS); 155 | // } 156 | 157 | void inline p4ml_header_ntoh(agghdr* p_p4ml) 158 | { 159 | p_p4ml->len_tensor = ntohl(p_p4ml->len_tensor); 160 | p_p4ml->bitmap = ntohl(p_p4ml->bitmap); 161 | p_p4ml->seq_num = ntohs(p_p4ml->seq_num); 162 | p_p4ml->agtr = ntohs(p_p4ml->agtr); 163 | p_p4ml->agtr2 = ntohs(p_p4ml->agtr2); 164 | p_p4ml->appID = ntohs(p_p4ml->appID); 165 | 166 | 167 | int32_t* p_model = p_p4ml->vector; 168 | 169 | /* if not float */ 170 | if (!(p_p4ml->flag & 0x80)) { 171 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 172 | p_model[i] = ntohl(p_model[i]); 173 | } 174 | } 175 | 176 | void inline p4ml_header_ntoh_without_data(agghdr* p_p4ml) 177 | { 178 | p_p4ml->len_tensor = ntohl(p_p4ml->len_tensor); 179 | p_p4ml->bitmap = ntohl(p_p4ml->bitmap); 180 | p_p4ml->seq_num = ntohs(p_p4ml->seq_num); 181 
| p_p4ml->agtr = ntohs(p_p4ml->agtr); 182 | p_p4ml->agtr2 = ntohs(p_p4ml->agtr2); 183 | p_p4ml->appID = ntohs(p_p4ml->appID); 184 | 185 | int32_t* p_model = p_p4ml->vector; 186 | } 187 | 188 | void inline p4ml_header_hton_without_data(agghdr* p_p4ml) 189 | { 190 | p_p4ml->len_tensor = htonl(p_p4ml->len_tensor); 191 | p_p4ml->bitmap = htonl(p_p4ml->bitmap); 192 | p_p4ml->seq_num = htons(p_p4ml->seq_num); 193 | p_p4ml->agtr = htons(p_p4ml->agtr); 194 | p_p4ml->agtr2 = htons(p_p4ml->agtr2); 195 | p_p4ml->appID = htons(p_p4ml->appID); 196 | 197 | } 198 | 199 | void inline p4ml_header_setACK(agghdr* p4ml_header) 200 | { 201 | p4ml_header->flag |= 1; 202 | } 203 | 204 | void inline p4ml_header_setOverflow(agghdr* p4ml_header) 205 | { 206 | p4ml_header->flag |= 128; 207 | } 208 | 209 | void inline p4ml_header_setOverflowRequest(agghdr* p4ml_header) 210 | { 211 | p4ml_header->flag |= 128; 212 | p4ml_header->flag &= ~(4); 213 | } 214 | 215 | void inline p4ml_header_setCollisionBit(agghdr* p4ml_header) 216 | { 217 | p4ml_header->flag |= 2; 218 | } 219 | 220 | void inline p4ml_header_setLengthFieldToAgtr(agghdr* p4ml_header, uint16_t new_agtr) 221 | { 222 | p4ml_header->len_tensor = new_agtr; 223 | } 224 | 225 | void inline p4ml_header_resetIndex(agghdr* p4ml_header) 226 | { 227 | p4ml_header->flag &= ~(16); 228 | } 229 | 230 | void inline p4ml_header_resetCollisionBit(agghdr* p4ml_header) 231 | { 232 | p4ml_header->flag &= ~(2); 233 | } 234 | 235 | void inline p4ml_header_resetLZYColBit(agghdr* p4ml_header){ 236 | p4ml_header->is_lzy_Col = 0x00; 237 | 238 | } 239 | void inline p4ml_header_setLZYColBit(agghdr* p4ml_header){ 240 | p4ml_header->is_lzy_Col = 0x01; 241 | } 242 | 243 | void inline p4ml_header_print(agghdr *p4ml_header, const char *caption) 244 | { 245 | std::lock_guard lock(_packet_print_mutex); 246 | printf("[%s] \n key: %" PRIu64 ", len_tensor: %u, " 247 | "bitmap: " BYTE_TO_BINARY_PATTERN ", num_worker: %u, appID: %u, " 248 | "agtr: %u, agtr2: %u, seq_num: %u, 
isACK: %d, dataIndex: %d," 249 | "isResend: %d, isOverflow: %d, data: ", 250 | caption, p4ml_header->key, p4ml_header->len_tensor, 251 | BYTE_TO_BINARY(p4ml_header->bitmap), p4ml_header->num_worker, p4ml_header->appID, 252 | p4ml_header->agtr, p4ml_header->agtr2, p4ml_header->seq_num, 253 | p4ml_header->flag & 1 ? 1 : 0, p4ml_header->flag & 16 ? 1 : 0, p4ml_header->flag & 4 ? 1 : 0, 254 | p4ml_header->flag & 128 ? 1 : 0); 255 | 256 | // is Overflow? 257 | if (p4ml_header->flag & 128) 258 | // is ACK? isn't Resend? 259 | if (p4ml_header->flag & 1 && !(p4ml_header->flag & 4)) 260 | printf("REQUEST - CARELESS."); 261 | else 262 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 263 | printf("%.7f ", ntohf((p4ml_header->vector)[i])); 264 | else 265 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 266 | printf("%d ", p4ml_header->vector[i]); 267 | printf("\n"); 268 | } 269 | 270 | void inline p4ml_header_print_h(agghdr *p4ml_header, const char *caption) 271 | { 272 | std::lock_guard lock(_packet_print_mutex); 273 | printf("[%s] \n key: %" PRIu64 ", len_tensor: %u, " 274 | "bitmap: " BYTE_TO_BINARY_PATTERN ", num_worker: %u, appID: %u, " 275 | "agtr: %u, agtr2: %u, seq_num: %u, isACK: %d, dataIndex: %d," 276 | "isResend: %d, isOverflow: %d, data: ", 277 | caption, p4ml_header->key, ntohl(p4ml_header->len_tensor), 278 | BYTE_TO_BINARY(ntohl(p4ml_header->bitmap)), p4ml_header->num_worker, ntohs(p4ml_header->appID), 279 | ntohs(p4ml_header->agtr), ntohs(p4ml_header->agtr2), ntohs(p4ml_header->seq_num), 280 | p4ml_header->flag & 1 ? 1 : 0, p4ml_header->flag & 16 ? 1 : 0, p4ml_header->flag & 4 ? 1 : 0, 281 | p4ml_header->flag & 128 ? 1 : 0); 282 | 283 | // is Overflow? 284 | if (p4ml_header->flag & 128) 285 | // is ACK? isn't Resend? 
286 | if (p4ml_header->flag & 1 && !(p4ml_header->flag & 4)) 287 | printf("REQUEST - CARELESS."); 288 | else 289 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 290 | printf("%.7f ", ((float *)(p4ml_header->vector))[i]); 291 | else 292 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 293 | printf("%d ", ntohl(p4ml_header->vector[i])); 294 | printf("\n"); 295 | } 296 | 297 | #endif 298 | -------------------------------------------------------------------------------- /common/quantize.h: -------------------------------------------------------------------------------- 1 | #ifndef QUAN_P4ML_H 2 | #define QUAN_P4ML_H 3 | #include 4 | #include 5 | 6 | // scale up float then translate it to int 7 | // without any further optimization 8 | inline static void quantizeNaive(char *data_ptr, uint32_t size) 9 | { 10 | int factor = 1000000; 11 | int *int_data_ptr = (int *)data_ptr; 12 | float *float_data_ptr = (float *)data_ptr; 13 | for (uint32_t i = 0; i < size; i++) 14 | { 15 | int_data_ptr[i] = (int)(float_data_ptr[i] * factor); 16 | } 17 | } 18 | 19 | // translate back to float and scale down 20 | // without any further optimization 21 | inline static void dequantizeNaive(char *data_ptr, uint32_t size) 22 | { 23 | float factor = 1000000.0; 24 | int *int_data_ptr = (int *)data_ptr; 25 | float *float_data_ptr = (float *)data_ptr; 26 | for (uint32_t i = 0; i < size; i++) 27 | { 28 | float_data_ptr[i] = (float)(int_data_ptr[i] / factor); 29 | } 30 | } 31 | 32 | // functioned the same as quantizeNaive 33 | // boost with avx 256 instructions 34 | inline static void quantizeAVX2(char *data_ptr, uint32_t size) 35 | { 36 | // check alignment 37 | 38 | __m256 input; 39 | __m256i output; 40 | 41 | int unaligned_size = size % 8; 42 | int aligned_size = size / 8; 43 | 44 | const float factor = 1000000.0; 45 | float *float_data_ptr = (float *)data_ptr; 46 | int *int_data_ptr = (int *)data_ptr; 47 | 48 | // 0xF4240 is 1000000 in hex 49 | __m256 factor_in_avx = 
_mm256_broadcast_ss(&factor); 50 | 51 | for (uint32_t i = 0; i < aligned_size; i++) 52 | { 53 | float *current_pos = float_data_ptr + i * 8; 54 | input = _mm256_loadu_ps(current_pos); 55 | input = _mm256_mul_ps(input, factor_in_avx); 56 | output = _mm256_cvtps_epi32(input); 57 | _mm256_storeu_si256((__m256i *)current_pos, output); 58 | } 59 | 60 | for (uint32_t i = 0; i < unaligned_size; i++) 61 | { 62 | int_data_ptr[aligned_size * 8 + i] = 63 | (int)(float_data_ptr[aligned_size * 8 + i] * factor); 64 | } 65 | } 66 | 67 | // functioned the same as dequantizeNaive 68 | // boost with avx 256 instructions 69 | inline static void dequantizeAVX2(char *data_ptr, uint32_t size) 70 | { 71 | __m256i input; 72 | __m256 output; 73 | 74 | int unaligned_size = size % 8; 75 | int aligned_size = size / 8; 76 | 77 | const float factor = 1000000.0; 78 | int *int_data_ptr = (int *)data_ptr; 79 | float *float_data_ptr = (float *)data_ptr; 80 | 81 | // __m256i* input_avx = (__m256i*) data_ptr; 82 | __m256 factor_in_avx = _mm256_broadcast_ss(&factor); 83 | 84 | for (uint32_t i = 0; i < aligned_size; i++) 85 | { 86 | float *current_pos = float_data_ptr + i * 8; 87 | input = _mm256_loadu_si256((__m256i *)current_pos); 88 | output = _mm256_cvtepi32_ps(input); 89 | output = _mm256_div_ps(output, factor_in_avx); 90 | _mm256_storeu_ps(current_pos, output); 91 | } 92 | 93 | for (uint32_t i = 0; i < unaligned_size; i++) 94 | { 95 | float_data_ptr[aligned_size * 8 + i] = 96 | (float)(int_data_ptr[aligned_size * 8 + i] / factor); 97 | } 98 | } 99 | 100 | // functioned the same as quantizeNaive 101 | // boost with avx 256 instructions 102 | inline static void quantizeAVX2to(char *dst_ptr, char *src_ptr, uint32_t size) 103 | { 104 | // check alignment 105 | 106 | __m256 input; 107 | __m256i output; 108 | 109 | int unaligned_size = size % 8; 110 | int aligned_size = size / 8; 111 | 112 | const float factor = 1000000.0; 113 | float *float_data_ptr = (float *)src_ptr; 114 | int *int_data_ptr = (int 
*)src_ptr; 115 | 116 | float *dst_float_data_ptr = (float *)dst_ptr; 117 | int *dst_int_data_ptr = (int *)dst_ptr; 118 | 119 | // 0xF4240 is 1000000 in hex 120 | __m256 factor_in_avx = _mm256_broadcast_ss(&factor); 121 | 122 | for (uint32_t i = 0; i < aligned_size; i++) 123 | { 124 | float *current_pos = float_data_ptr + i * 8; 125 | float *current_dst_pos = dst_float_data_ptr + i * 8; 126 | 127 | input = _mm256_loadu_ps(current_pos); 128 | input = _mm256_mul_ps(input, factor_in_avx); 129 | output = _mm256_cvtps_epi32(input); 130 | _mm256_storeu_si256((__m256i *)current_dst_pos, output); 131 | } 132 | 133 | for (uint32_t i = 0; i < unaligned_size; i++) 134 | { 135 | dst_int_data_ptr[aligned_size * 8 + i] = 136 | (int)(float_data_ptr[aligned_size * 8 + i] * factor); 137 | } 138 | } 139 | 140 | // functioned the same as dequantizeNaive 141 | // boost with avx 256 instructions 142 | inline static void dequantizeAVX2to(char *dst_ptr, char *src_ptr, 143 | uint32_t size) 144 | { 145 | __m256i input; 146 | __m256 output; 147 | 148 | int unaligned_size = size % 8; 149 | int aligned_size = size / 8; 150 | 151 | const float factor = 1000000.0; 152 | int *int_data_ptr = (int *)src_ptr; 153 | float *float_data_ptr = (float *)src_ptr; 154 | 155 | int *dst_int_data_ptr = (int *)dst_ptr; 156 | float *dst_float_data_ptr = (float *)dst_ptr; 157 | 158 | // __m256i* input_avx = (__m256i*) src_ptr; 159 | __m256 factor_in_avx = _mm256_broadcast_ss(&factor); 160 | 161 | for (uint32_t i = 0; i < aligned_size; i++) 162 | { 163 | float *current_pos = float_data_ptr + i * 8; 164 | float *current_dst_pos = dst_float_data_ptr + i * 8; 165 | 166 | input = _mm256_loadu_si256((__m256i *)current_pos); 167 | output = _mm256_cvtepi32_ps(input); 168 | output = _mm256_div_ps(output, factor_in_avx); 169 | _mm256_storeu_ps(current_dst_pos, output); 170 | } 171 | 172 | for (uint32_t i = 0; i < unaligned_size; i++) 173 | { 174 | dst_float_data_ptr[aligned_size * 8 + i] = 175 | 
(float)(int_data_ptr[aligned_size * 8 + i] / factor); 176 | } 177 | } 178 | 179 | #endif -------------------------------------------------------------------------------- /common/utils.h: -------------------------------------------------------------------------------- 1 | #ifndef UTILS_H 2 | #define UTILS_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | 15 | // Because here we use 2 agtr for one packet, so /2 16 | #define MAX_AGTR_COUNT 20000 17 | #define AGTR_TO_USE_PER_APPLICATION 2800 18 | 19 | #define EACH_HUGEPAGE_SIZE (2048*1024) 20 | 21 | #define likely(x) __builtin_expect(!!(x), 1) 22 | #define unlikely(x) __builtin_expect(!!(x), 0) 23 | 24 | 25 | #define DIVUP(x, y) (((x)+(y)-1)/(y)) 26 | #define ROUNDUP(x, y) (DIVUP((x), (y))*(y)) 27 | 28 | template 29 | static inline T align_floor(T v, T align) { 30 | return v - (v % align); 31 | } 32 | 33 | template 34 | static inline T align_ceil(T v, T align) { 35 | return align_floor(v + align - 1, align); 36 | } 37 | 38 | static inline void ib_malloc(void** ptr, size_t size) { 39 | size_t page_size = sysconf(_SC_PAGESIZE); 40 | void* p; 41 | int size_aligned = ROUNDUP(size, page_size); 42 | int ret = posix_memalign(&p, page_size, size_aligned); 43 | if (ret != 0) { 44 | printf("posix_memalign error.\n"); 45 | exit(1); 46 | } 47 | memset(p, 0, size); 48 | *ptr = p; 49 | } 50 | 51 | #define KB(x) (static_cast(x) << 10) 52 | #define KB_(x) (KB(x) - 1) 53 | #define MB(x) (static_cast(x) << 20) 54 | #define MB_(x) (MB(x) - 1) 55 | 56 | static void memory_barrier() { asm volatile("" ::: "memory"); } 57 | static void lfence() { asm volatile("lfence" ::: "memory"); } 58 | static void sfence() { asm volatile("sfence" ::: "memory"); } 59 | static void mfence() { asm volatile("mfence" ::: "memory"); } 60 | static void clflush(volatile void* p) { asm volatile("clflush (%0)" ::"r"(p)); } 61 | static void cpuid(unsigned int* eax, 
unsigned int* ebx, unsigned int* ecx, 62 | unsigned int* edx) { 63 | asm volatile("cpuid" 64 | : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx) 65 | : "0"(*eax), "2"(*ecx)); 66 | } 67 | 68 | inline void bindingCPU(int num) { 69 | int result; 70 | cpu_set_t mask; 71 | CPU_ZERO(&mask); 72 | CPU_SET(num, &mask); 73 | result = sched_setaffinity(0, sizeof(mask), &mask); 74 | if (result < 0) { 75 | printf("binding CPU fails\n"); 76 | exit(1); 77 | } 78 | } 79 | 80 | /// Check a condition at runtime. If the condition is false, throw exception. 81 | static inline void rt_assert(bool condition) { 82 | if (unlikely(!condition)) throw std::runtime_error(""); 83 | } 84 | 85 | 86 | /* allocate the huge pages. */ 87 | inline char *alloc_raw_pages(int cnt, int size) { 88 | /* 89 | * Don't touch the page since then allocator would not allocate the page 90 | * right now. 91 | */ 92 | int flag = MAP_SHARED | MAP_ANONYMOUS; 93 | if (size == EACH_HUGEPAGE_SIZE) flag |= MAP_HUGETLB; 94 | char *ptr = 95 | (char *)mmap(NULL, (int64_t)cnt * size, PROT_READ | PROT_WRITE, flag, -1, 0); 96 | if (ptr == (char *)-1) { 97 | perror("alloc_raw_pages"); 98 | return NULL; 99 | } 100 | return ptr; 101 | } 102 | 103 | union { 104 | float f; 105 | uint32_t u; 106 | } if_value; 107 | 108 | inline float ntohf(uint32_t net32) 109 | { 110 | if_value.u = ntohl(net32); 111 | return if_value.f; 112 | } 113 | 114 | // /* Returns the MAC Address Params: int iNetType - 0: ethernet, 1: Wifi char chMAC[6] - MAC Address in binary format Returns: 0: success -1: Failure */ 115 | // int getMACAddress(char chMAC[6]) 116 | // { 117 | // struct ifreq ifr; 118 | // int sock; 119 | // char* ifname = "enp178s0f0"; 120 | // sock = socket(AF_INET, SOCK_DGRAM, 0); 121 | // strcpy(ifr.ifr_name, ifname); 122 | // ifr.ifr_addr.sa_family = AF_INET; 123 | // if (ioctl(sock, SIOCGIFHWADDR, &ifr) < 0) { 124 | // return -1; 125 | // } 126 | // memcpy(chMAC, ifr.ifr_hwaddr.sa_data, 6); 127 | // close(sock); 128 | // return 0; 129 | // 
} 130 | 131 | // /* Returns the interface IP Address Params: int iNetType - 0: ethernet, 1: Wifi char *chIP - IP Address string Return: 0: success / -1: Failure */ 132 | // int getIpAddress(char chIP[16]) 133 | // { 134 | // struct ifreq ifr; 135 | // int sock = 0; 136 | // sock = socket(AF_INET, SOCK_DGRAM, 0); 137 | // strcpy(ifr.ifr_name, "enp178s0f0"); 138 | // if (ioctl(sock, SIOCGIFADDR, &ifr) < 0) { 139 | // strcpy(chIP, "0.0.0.0"); 140 | // return -1; 141 | // } 142 | // sprintf(chIP, "%s", inet_ntoa(((struct sockaddr_in*)&(ifr.ifr_addr))->sin_addr)); 143 | // close(sock); 144 | // return 0; 145 | // } 146 | 147 | #endif 148 | -------------------------------------------------------------------------------- /common/window_manager.h: -------------------------------------------------------------------------------- 1 | #ifndef SLIDING_W_H 2 | #define SLIDING_W_H 3 | 4 | #include "packet.h" 5 | #include "CC_manager.h" 6 | #define RESEND_TRIGGER 1 7 | 8 | class WindowManager { 9 | public: 10 | bool* isACKed; 11 | /* These three variables are completely useless, but 12 | when deleting them, the performance drops from 46Gbps to 40Gbps. 
*/ 13 | bool* isSent; 14 | std::chrono::high_resolution_clock::time_point* send_time; 15 | std::chrono::high_resolution_clock::time_point* receive_time; 16 | /* */ 17 | int total_ACK; 18 | int last_ACK; 19 | 20 | WindowManager() { 21 | last_ACK = 0; 22 | } 23 | 24 | int inline UpdateWindow(uint16_t* seq_num) 25 | { 26 | int isLastAckUpdated = 0; 27 | isACKed[*seq_num] = true; 28 | while (isACKed[last_ACK + 1]) { 29 | last_ACK++; 30 | isLastAckUpdated++; 31 | } 32 | return isLastAckUpdated; 33 | } 34 | 35 | int inline Reset(int packet_total) 36 | { 37 | last_ACK = 0; 38 | total_ACK = packet_total; 39 | memset(isACKed, 0, sizeof(bool) * (packet_total + 1)); 40 | return 0; 41 | } 42 | }; 43 | 44 | #endif -------------------------------------------------------------------------------- /datasample/job_A.txt: -------------------------------------------------------------------------------- 1 | SetMaxWindow to 50 2 | 3 | Set used agtr to 450... (all agtr: 20000) 4 | 5 | 6 | Set max_agtr_size_per_thread to 50... 
7 | 8 | thread_to_use 9 9 | max_agtr_size_per_thread: 50 10 | used agtr: 450 (all:20000) 11 | using: mlx5_1 nic: 0 12 | [Thread 0] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 13 | init: 0 14 | [Thread 1] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 15 | init: 1 16 | [Thread 2] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 17 | init: 2 18 | [Thread 3] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 19 | init: 3 20 | [Thread 4] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 21 | init: 4 22 | [Thread 5] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 23 | init: 5 24 | [Thread 6] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 25 | init: 6 26 | [Thread 7] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 27 | init: 7 28 | [Thread 8] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 29 | init: 8 30 | 31 | Model initializing... 32 | 33 | Model initialized completed. Start sending... 34 | 35 | Thr 48.673178 36 | Thr 50.194216 37 | Thr 50.118166 38 | Thr 50.042113 39 | Thr 49.966063 40 | Thr 50.270271 41 | Finish all 4500 Tensors, 42 | Time = 3.463762 s, 43 | Total Size = 21901.776714 MB, 44 | Throughput: 49.399356 Gbps 45 | -------------------------------------------------------------------------------- /datasample/job_B.txt: -------------------------------------------------------------------------------- 1 | SetMaxWindow to 50 2 | 3 | Set used agtr to 450... (all agtr: 20000) 4 | 5 | 6 | Set max_agtr_size_per_thread to 50... 
7 | 8 | thread_to_use 9 9 | max_agtr_size_per_thread: 50 10 | used agtr: 450 (all:20000) 11 | using: mlx5_0 nic: 1 12 | [Thread 10] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 13 | init: 0 14 | [Thread 11] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 15 | init: 1 16 | [Thread 12] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 17 | init: 2 18 | [Thread 13] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 19 | init: 3 20 | [Thread 14] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 21 | init: 4 22 | [Thread 15] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 23 | init: 5 24 | [Thread 16] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 25 | init: 6 26 | [Thread 17] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 27 | init: 7 28 | [Thread 18] Finish created QP - kAppRingMbufSize=512, kAppStridesPerWQE=512, kAppRingSize=1048576, kAppRQDepth=4 29 | init: 8 30 | 31 | Model initializing... 32 | 33 | Model initialized completed. Start sending... 
34 | 35 | Thr 14.525902 36 | Thr 15.514576 37 | Thr 15.590628 38 | Thr 15.742732 39 | Thr 15.970888 40 | Thr 15.590628 41 | Thr 16.122991 42 | Thr 15.742732 43 | Thr 15.210369 44 | Thr 15.894835 45 | Thr 15.894835 46 | Thr 15.970888 47 | Thr 15.742732 48 | Thr 16.046939 49 | Thr 16.199043 50 | Thr 16.275095 51 | Thr 16.046940 52 | Thr 16.503250 53 | Thr 16.275095 54 | Thr 16.655353 55 | Thr 16.199043 56 | Finish all 4500 Tensors, 57 | Time = 10.827258 s, 58 | Total Size = 21901.776714 MB, 59 | Throughput: 15.803413 Gbps 60 | -------------------------------------------------------------------------------- /datasample/switch.txt: -------------------------------------------------------------------------------- 1 | INFO: SDE was built with python 2.7 2 | 3 | (1, 5) 4 | (2, 6) 5 | appID[1] 210/450 46.74 % 6 | appID[2] 19/450 4.37 % 7 | time 12.4 total_used 230/450 51.1 % 8 | appID[1] 202/450 45.04 % 9 | appID[2] 18/450 4.15 % 10 | time 12.9 total_used 221/450 49.2 % 11 | appID[1] 247/450 55.04 % 12 | appID[2] 7/450 1.70 % 13 | time 13.5 total_used 255/450 56.7 % 14 | appID[1] 186/450 41.33 % 15 | appID[2] 12/450 2.81 % 16 | time 14.1 total_used 198/450 44.1 % 17 | appID[1] 243/450 54.00 % 18 | appID[2] 5/450 1.19 % 19 | time 14.7 total_used 248/450 55.2 % 20 | appID[1] 231/450 51.33 % 21 | appID[2] 9/450 2.00 % 22 | time 15.3 total_used 240/450 53.3 % 23 | appID[1] 74/450 16.52 % 24 | appID[2] 251/450 55.93 % 25 | time 15.9 total_used 326/450 72.4 % 26 | appID[2] 382/450 84.89 % 27 | time 16.5 total_used 382/450 84.9 % 28 | appID[2] 411/450 91.33 % 29 | time 17.1 total_used 411/450 91.3 % 30 | appID[2] 429/450 95.33 % 31 | time 17.6 total_used 429/450 95.3 % 32 | appID[2] 408/450 90.81 % 33 | time 18.2 total_used 408/450 90.8 % 34 | appID[2] 436/450 96.96 % 35 | time 18.8 total_used 436/450 97.0 % 36 | appID[2] 427/450 94.96 % 37 | time 19.4 total_used 427/450 95.0 % 38 | appID[2] 392/450 87.19 % 39 | time 20.0 total_used 392/450 87.2 % 40 | appID[2] 399/450 88.74 % 41 
| time 20.6 total_used 399/450 88.7 % 42 | appID[2] 427/450 94.96 % 43 | time 21.2 total_used 427/450 95.0 % 44 | appID[2] 428/450 95.19 % 45 | time 21.7 total_used 428/450 95.2 % 46 | appID[2] 429/450 95.48 % 47 | time 22.3 total_used 429/450 95.5 % 48 | appID[2] 262/450 58.22 % 49 | time 22.9 total_used 262/450 58.2 % -------------------------------------------------------------------------------- /p4ml2/includes/actions.p4: -------------------------------------------------------------------------------- 1 | action processentry1() { 2 | write_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 3 | } 4 | 5 | action noequ0_processentry1() { 6 | noequ0_write_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 7 | } 8 | 9 | action processentry1andWriteToPacket() { 10 | write_read_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 11 | } 12 | 13 | action noequ0_processentry1andWriteToPacket() { 14 | noequ0_write_read_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 15 | } 16 | 17 | action do_cleanEntry1() { 18 | clean_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 19 | } 20 | 21 | action entry1WriteToPacket() { 22 | read_data_entry1.execute_stateful_alu(p4ml_agtr_index.agtr); 23 | } 24 | 25 | action processentry2() { 26 | write_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 27 | } 28 | 29 | action noequ0_processentry2() { 30 | noequ0_write_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 31 | } 32 | 33 | action processentry2andWriteToPacket() { 34 | write_read_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 35 | } 36 | 37 | action noequ0_processentry2andWriteToPacket() { 38 | noequ0_write_read_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 39 | } 40 | 41 | action do_cleanEntry2() { 42 | clean_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 43 | } 44 | 45 | action entry2WriteToPacket() { 46 | read_data_entry2.execute_stateful_alu(p4ml_agtr_index.agtr); 47 | } 48 | 49 | action processentry3() { 50 | 
write_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 51 | } 52 | 53 | action noequ0_processentry3() { 54 | noequ0_write_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 55 | } 56 | 57 | action processentry3andWriteToPacket() { 58 | write_read_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 59 | } 60 | 61 | action noequ0_processentry3andWriteToPacket() { 62 | noequ0_write_read_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 63 | } 64 | 65 | action do_cleanEntry3() { 66 | clean_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 67 | } 68 | 69 | action entry3WriteToPacket() { 70 | read_data_entry3.execute_stateful_alu(p4ml_agtr_index.agtr); 71 | } 72 | 73 | action processentry4() { 74 | write_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 75 | } 76 | 77 | action noequ0_processentry4() { 78 | noequ0_write_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 79 | } 80 | 81 | action processentry4andWriteToPacket() { 82 | write_read_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 83 | } 84 | 85 | action noequ0_processentry4andWriteToPacket() { 86 | noequ0_write_read_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 87 | } 88 | 89 | action do_cleanEntry4() { 90 | clean_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 91 | } 92 | 93 | action entry4WriteToPacket() { 94 | read_data_entry4.execute_stateful_alu(p4ml_agtr_index.agtr); 95 | } 96 | 97 | action processentry5() { 98 | write_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 99 | } 100 | 101 | action noequ0_processentry5() { 102 | noequ0_write_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 103 | } 104 | 105 | action processentry5andWriteToPacket() { 106 | write_read_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 107 | } 108 | 109 | action noequ0_processentry5andWriteToPacket() { 110 | noequ0_write_read_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 111 | } 112 | 113 | action do_cleanEntry5() { 114 | 
clean_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 115 | } 116 | 117 | action entry5WriteToPacket() { 118 | read_data_entry5.execute_stateful_alu(p4ml_agtr_index.agtr); 119 | } 120 | 121 | action processentry6() { 122 | write_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 123 | } 124 | 125 | action noequ0_processentry6() { 126 | noequ0_write_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 127 | } 128 | 129 | action processentry6andWriteToPacket() { 130 | write_read_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 131 | } 132 | 133 | action noequ0_processentry6andWriteToPacket() { 134 | noequ0_write_read_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 135 | } 136 | 137 | action do_cleanEntry6() { 138 | clean_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 139 | } 140 | 141 | action entry6WriteToPacket() { 142 | read_data_entry6.execute_stateful_alu(p4ml_agtr_index.agtr); 143 | } 144 | 145 | action processentry7() { 146 | write_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 147 | } 148 | 149 | action noequ0_processentry7() { 150 | noequ0_write_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 151 | } 152 | 153 | action processentry7andWriteToPacket() { 154 | write_read_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 155 | } 156 | 157 | action noequ0_processentry7andWriteToPacket() { 158 | noequ0_write_read_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 159 | } 160 | 161 | action do_cleanEntry7() { 162 | clean_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 163 | } 164 | 165 | action entry7WriteToPacket() { 166 | read_data_entry7.execute_stateful_alu(p4ml_agtr_index.agtr); 167 | } 168 | 169 | action processentry8() { 170 | write_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 171 | } 172 | 173 | action noequ0_processentry8() { 174 | noequ0_write_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 175 | } 176 | 177 | action processentry8andWriteToPacket() { 178 | 
write_read_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 179 | } 180 | 181 | action noequ0_processentry8andWriteToPacket() { 182 | noequ0_write_read_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 183 | } 184 | 185 | action do_cleanEntry8() { 186 | clean_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 187 | } 188 | 189 | action entry8WriteToPacket() { 190 | read_data_entry8.execute_stateful_alu(p4ml_agtr_index.agtr); 191 | } 192 | 193 | action processentry9() { 194 | write_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 195 | } 196 | 197 | action noequ0_processentry9() { 198 | noequ0_write_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 199 | } 200 | 201 | action processentry9andWriteToPacket() { 202 | write_read_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 203 | } 204 | 205 | action noequ0_processentry9andWriteToPacket() { 206 | noequ0_write_read_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 207 | } 208 | 209 | action do_cleanEntry9() { 210 | clean_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 211 | } 212 | 213 | action entry9WriteToPacket() { 214 | read_data_entry9.execute_stateful_alu(p4ml_agtr_index.agtr); 215 | } 216 | 217 | action processentry10() { 218 | write_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 219 | } 220 | 221 | action noequ0_processentry10() { 222 | noequ0_write_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 223 | } 224 | 225 | action processentry10andWriteToPacket() { 226 | write_read_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 227 | } 228 | 229 | action noequ0_processentry10andWriteToPacket() { 230 | noequ0_write_read_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 231 | } 232 | 233 | action do_cleanEntry10() { 234 | clean_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 235 | } 236 | 237 | action entry10WriteToPacket() { 238 | read_data_entry10.execute_stateful_alu(p4ml_agtr_index.agtr); 239 | } 240 | 241 | action processentry11() { 
242 | write_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 243 | } 244 | 245 | action noequ0_processentry11() { 246 | noequ0_write_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 247 | } 248 | 249 | action processentry11andWriteToPacket() { 250 | write_read_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 251 | } 252 | 253 | action noequ0_processentry11andWriteToPacket() { 254 | noequ0_write_read_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 255 | } 256 | 257 | action do_cleanEntry11() { 258 | clean_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 259 | } 260 | 261 | action entry11WriteToPacket() { 262 | read_data_entry11.execute_stateful_alu(p4ml_agtr_index.agtr); 263 | } 264 | 265 | action processentry12() { 266 | write_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 267 | } 268 | 269 | action noequ0_processentry12() { 270 | noequ0_write_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 271 | } 272 | 273 | action processentry12andWriteToPacket() { 274 | write_read_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 275 | } 276 | 277 | action noequ0_processentry12andWriteToPacket() { 278 | noequ0_write_read_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 279 | } 280 | 281 | action do_cleanEntry12() { 282 | clean_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 283 | } 284 | 285 | action entry12WriteToPacket() { 286 | read_data_entry12.execute_stateful_alu(p4ml_agtr_index.agtr); 287 | } 288 | 289 | action processentry13() { 290 | write_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 291 | } 292 | 293 | action noequ0_processentry13() { 294 | noequ0_write_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 295 | } 296 | 297 | action processentry13andWriteToPacket() { 298 | write_read_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 299 | } 300 | 301 | action noequ0_processentry13andWriteToPacket() { 302 | 
noequ0_write_read_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 303 | } 304 | 305 | action do_cleanEntry13() { 306 | clean_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 307 | } 308 | 309 | action entry13WriteToPacket() { 310 | read_data_entry13.execute_stateful_alu(p4ml_agtr_index.agtr); 311 | } 312 | 313 | action processentry14() { 314 | write_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 315 | } 316 | 317 | action noequ0_processentry14() { 318 | noequ0_write_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 319 | } 320 | 321 | action processentry14andWriteToPacket() { 322 | write_read_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 323 | } 324 | 325 | action noequ0_processentry14andWriteToPacket() { 326 | noequ0_write_read_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 327 | } 328 | 329 | action do_cleanEntry14() { 330 | clean_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 331 | } 332 | 333 | action entry14WriteToPacket() { 334 | read_data_entry14.execute_stateful_alu(p4ml_agtr_index.agtr); 335 | } 336 | 337 | action processentry15() { 338 | write_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 339 | } 340 | 341 | action noequ0_processentry15() { 342 | noequ0_write_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 343 | } 344 | 345 | action processentry15andWriteToPacket() { 346 | write_read_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 347 | } 348 | 349 | action noequ0_processentry15andWriteToPacket() { 350 | noequ0_write_read_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 351 | } 352 | 353 | action do_cleanEntry15() { 354 | clean_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 355 | } 356 | 357 | action entry15WriteToPacket() { 358 | read_data_entry15.execute_stateful_alu(p4ml_agtr_index.agtr); 359 | } 360 | 361 | action processentry16() { 362 | write_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 363 | } 364 | 365 | action noequ0_processentry16() { 366 
| noequ0_write_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 367 | } 368 | 369 | action processentry16andWriteToPacket() { 370 | write_read_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 371 | } 372 | 373 | action noequ0_processentry16andWriteToPacket() { 374 | noequ0_write_read_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 375 | } 376 | 377 | action do_cleanEntry16() { 378 | clean_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 379 | } 380 | 381 | action entry16WriteToPacket() { 382 | read_data_entry16.execute_stateful_alu(p4ml_agtr_index.agtr); 383 | } 384 | 385 | action processentry17() { 386 | write_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 387 | } 388 | 389 | action noequ0_processentry17() { 390 | noequ0_write_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 391 | } 392 | 393 | action processentry17andWriteToPacket() { 394 | write_read_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 395 | } 396 | 397 | action noequ0_processentry17andWriteToPacket() { 398 | noequ0_write_read_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 399 | } 400 | 401 | action do_cleanEntry17() { 402 | clean_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 403 | } 404 | 405 | action entry17WriteToPacket() { 406 | read_data_entry17.execute_stateful_alu(p4ml_agtr_index.agtr); 407 | } 408 | 409 | action processentry18() { 410 | write_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 411 | } 412 | 413 | action noequ0_processentry18() { 414 | noequ0_write_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 415 | } 416 | 417 | action processentry18andWriteToPacket() { 418 | write_read_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 419 | } 420 | 421 | action noequ0_processentry18andWriteToPacket() { 422 | noequ0_write_read_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 423 | } 424 | 425 | action do_cleanEntry18() { 426 | clean_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 427 | } 
428 | 429 | action entry18WriteToPacket() { 430 | read_data_entry18.execute_stateful_alu(p4ml_agtr_index.agtr); 431 | } 432 | 433 | action processentry19() { 434 | write_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 435 | } 436 | 437 | action noequ0_processentry19() { 438 | noequ0_write_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 439 | } 440 | 441 | action processentry19andWriteToPacket() { 442 | write_read_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 443 | } 444 | 445 | action noequ0_processentry19andWriteToPacket() { 446 | noequ0_write_read_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 447 | } 448 | 449 | action do_cleanEntry19() { 450 | clean_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 451 | } 452 | 453 | action entry19WriteToPacket() { 454 | read_data_entry19.execute_stateful_alu(p4ml_agtr_index.agtr); 455 | } 456 | 457 | action processentry20() { 458 | write_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 459 | } 460 | 461 | action noequ0_processentry20() { 462 | noequ0_write_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 463 | } 464 | 465 | action processentry20andWriteToPacket() { 466 | write_read_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 467 | } 468 | 469 | action noequ0_processentry20andWriteToPacket() { 470 | noequ0_write_read_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 471 | } 472 | 473 | action do_cleanEntry20() { 474 | clean_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 475 | } 476 | 477 | action entry20WriteToPacket() { 478 | read_data_entry20.execute_stateful_alu(p4ml_agtr_index.agtr); 479 | } 480 | 481 | action processentry21() { 482 | write_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 483 | } 484 | 485 | action noequ0_processentry21() { 486 | noequ0_write_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 487 | } 488 | 489 | action processentry21andWriteToPacket() { 490 | 
write_read_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 491 | } 492 | 493 | action noequ0_processentry21andWriteToPacket() { 494 | noequ0_write_read_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 495 | } 496 | 497 | action do_cleanEntry21() { 498 | clean_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 499 | } 500 | 501 | action entry21WriteToPacket() { 502 | read_data_entry21.execute_stateful_alu(p4ml_agtr_index.agtr); 503 | } 504 | 505 | action processentry22() { 506 | write_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 507 | } 508 | 509 | action noequ0_processentry22() { 510 | noequ0_write_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 511 | } 512 | 513 | action processentry22andWriteToPacket() { 514 | write_read_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 515 | } 516 | 517 | action noequ0_processentry22andWriteToPacket() { 518 | noequ0_write_read_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 519 | } 520 | 521 | action do_cleanEntry22() { 522 | clean_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 523 | } 524 | 525 | action entry22WriteToPacket() { 526 | read_data_entry22.execute_stateful_alu(p4ml_agtr_index.agtr); 527 | } 528 | 529 | action processentry23() { 530 | write_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 531 | } 532 | 533 | action noequ0_processentry23() { 534 | noequ0_write_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 535 | } 536 | 537 | action processentry23andWriteToPacket() { 538 | write_read_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 539 | } 540 | 541 | action noequ0_processentry23andWriteToPacket() { 542 | noequ0_write_read_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 543 | } 544 | 545 | action do_cleanEntry23() { 546 | clean_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 547 | } 548 | 549 | action entry23WriteToPacket() { 550 | read_data_entry23.execute_stateful_alu(p4ml_agtr_index.agtr); 551 | } 552 | 553 | action 
processentry24() { 554 | write_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 555 | } 556 | 557 | action noequ0_processentry24() { 558 | noequ0_write_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 559 | } 560 | 561 | action processentry24andWriteToPacket() { 562 | write_read_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 563 | } 564 | 565 | action noequ0_processentry24andWriteToPacket() { 566 | noequ0_write_read_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 567 | } 568 | 569 | action do_cleanEntry24() { 570 | clean_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 571 | } 572 | 573 | action entry24WriteToPacket() { 574 | read_data_entry24.execute_stateful_alu(p4ml_agtr_index.agtr); 575 | } 576 | 577 | action processentry25() { 578 | write_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 579 | } 580 | 581 | action noequ0_processentry25() { 582 | noequ0_write_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 583 | } 584 | 585 | action processentry25andWriteToPacket() { 586 | write_read_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 587 | } 588 | 589 | action noequ0_processentry25andWriteToPacket() { 590 | noequ0_write_read_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 591 | } 592 | 593 | action do_cleanEntry25() { 594 | clean_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 595 | } 596 | 597 | action entry25WriteToPacket() { 598 | read_data_entry25.execute_stateful_alu(p4ml_agtr_index.agtr); 599 | } 600 | 601 | action processentry26() { 602 | write_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 603 | } 604 | 605 | action noequ0_processentry26() { 606 | noequ0_write_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 607 | } 608 | 609 | action processentry26andWriteToPacket() { 610 | write_read_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 611 | } 612 | 613 | action noequ0_processentry26andWriteToPacket() { 614 | 
noequ0_write_read_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 615 | } 616 | 617 | action do_cleanEntry26() { 618 | clean_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 619 | } 620 | 621 | action entry26WriteToPacket() { 622 | read_data_entry26.execute_stateful_alu(p4ml_agtr_index.agtr); 623 | } 624 | 625 | action processentry27() { 626 | write_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 627 | } 628 | 629 | action noequ0_processentry27() { 630 | noequ0_write_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 631 | } 632 | 633 | action processentry27andWriteToPacket() { 634 | write_read_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 635 | } 636 | 637 | action noequ0_processentry27andWriteToPacket() { 638 | noequ0_write_read_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 639 | } 640 | 641 | action do_cleanEntry27() { 642 | clean_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 643 | } 644 | 645 | action entry27WriteToPacket() { 646 | read_data_entry27.execute_stateful_alu(p4ml_agtr_index.agtr); 647 | } 648 | 649 | action processentry28() { 650 | write_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 651 | } 652 | 653 | action noequ0_processentry28() { 654 | noequ0_write_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 655 | } 656 | 657 | action processentry28andWriteToPacket() { 658 | write_read_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 659 | } 660 | 661 | action noequ0_processentry28andWriteToPacket() { 662 | noequ0_write_read_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 663 | } 664 | 665 | action do_cleanEntry28() { 666 | clean_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 667 | } 668 | 669 | action entry28WriteToPacket() { 670 | read_data_entry28.execute_stateful_alu(p4ml_agtr_index.agtr); 671 | } 672 | 673 | action processentry29() { 674 | write_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 675 | } 676 | 677 | action noequ0_processentry29() { 678 
| noequ0_write_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 679 | } 680 | 681 | action processentry29andWriteToPacket() { 682 | write_read_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 683 | } 684 | 685 | action noequ0_processentry29andWriteToPacket() { 686 | noequ0_write_read_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 687 | } 688 | 689 | action do_cleanEntry29() { 690 | clean_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 691 | } 692 | 693 | action entry29WriteToPacket() { 694 | read_data_entry29.execute_stateful_alu(p4ml_agtr_index.agtr); 695 | } 696 | 697 | action processentry30() { 698 | write_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 699 | } 700 | 701 | action noequ0_processentry30() { 702 | noequ0_write_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 703 | } 704 | 705 | action processentry30andWriteToPacket() { 706 | write_read_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 707 | } 708 | 709 | action noequ0_processentry30andWriteToPacket() { 710 | noequ0_write_read_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 711 | } 712 | 713 | action do_cleanEntry30() { 714 | clean_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 715 | } 716 | 717 | action entry30WriteToPacket() { 718 | read_data_entry30.execute_stateful_alu(p4ml_agtr_index.agtr); 719 | } 720 | 721 | action processentry31() { 722 | write_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 723 | } 724 | 725 | action noequ0_processentry31() { 726 | noequ0_write_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 727 | } 728 | 729 | action processentry31andWriteToPacket() { 730 | write_read_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 731 | } 732 | 733 | action noequ0_processentry31andWriteToPacket() { 734 | noequ0_write_read_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 735 | } 736 | 737 | action do_cleanEntry31() { 738 | clean_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 739 | } 
740 | 741 | action entry31WriteToPacket() { 742 | read_data_entry31.execute_stateful_alu(p4ml_agtr_index.agtr); 743 | } 744 | 745 | //action processentry32() { 746 | // write_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 747 | //} 748 | 749 | //action noequ0_processentry32() { 750 | // noequ0_write_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 751 | //} 752 | // 753 | //action processentry32andWriteToPacket() { 754 | // write_read_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 755 | //} 756 | 757 | //action noequ0_processentry32andWriteToPacket() { 758 | // noequ0_write_read_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 759 | //} 760 | 761 | //action do_cleanEntry32() { 762 | // clean_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 763 | //} 764 | 765 | //action entry32WriteToPacket() { 766 | // read_data_entry32.execute_stateful_alu(p4ml_agtr_index.agtr); 767 | //} 768 | // 769 | -------------------------------------------------------------------------------- /p4ml2/includes/common.p4: -------------------------------------------------------------------------------- 1 | /* 2 | * P4PS 3 | */ 4 | 5 | /************************************************************************* 6 | *********************** R E G I S T E R ******************************* 7 | *************************************************************************/ 8 | 9 | blackbox stateful_alu cleaning_agtr_time { 10 | reg: agtr_time; 11 | 12 | update_lo_1_value : 0; 13 | } 14 | 15 | blackbox stateful_alu cleaning_ecn { 16 | reg: ecn_register; 17 | 18 | update_lo_1_value : 0; 19 | } 20 | 21 | //for is_Col in A2TP 22 | 23 | blackbox stateful_alu cleaning_col { 24 | reg: col_register; 25 | 26 | update_lo_1_value : 0; 27 | } 28 | 29 | 30 | blackbox stateful_alu cleaning_bitmap { 31 | reg: bitmap; 32 | 33 | update_lo_1_value : 0; 34 | } 35 | 36 | blackbox stateful_alu read_write_bitmap { 37 | reg: bitmap; 38 | 39 | output_dst : mdata.bitmap; 40 | 41 | 
output_value : register_lo; 42 | 43 | update_lo_1_value : register_lo | p4ml.bitmap; 44 | } 45 | 46 | blackbox stateful_alu read_write_bitmap_resend { 47 | reg: bitmap; 48 | 49 | output_dst : mdata.bitmap; 50 | 51 | output_value : register_lo; 52 | 53 | update_lo_1_value : 0; 54 | } 55 | 56 | // if same application, output appID, if not, not output (zero) 57 | blackbox stateful_alu check_app_id_and_seq { 58 | reg: appID_and_Seq; 59 | 60 | condition_lo : p4ml.appIDandSeqNum == register_lo; 61 | // The agtr is empty 62 | condition_hi : register_lo == 0; 63 | 64 | update_lo_1_predicate : condition_lo or condition_hi; 65 | update_lo_1_value : p4ml.appIDandSeqNum; 66 | 67 | output_predicate : condition_lo or condition_hi; 68 | output_dst : mdata.isMyAppIDandMyCurrentSeq; 69 | output_value : p4ml.appIDandSeqNum; 70 | } 71 | 72 | blackbox stateful_alu check_app_id_and_seq_resend { 73 | reg: appID_and_Seq; 74 | 75 | condition_lo : p4ml.appIDandSeqNum == register_lo; 76 | 77 | update_lo_1_predicate : condition_lo; 78 | update_lo_1_value : 0; 79 | 80 | output_predicate : condition_lo; 81 | output_dst : mdata.isMyAppIDandMyCurrentSeq; 82 | output_value : register_lo; 83 | } 84 | 85 | blackbox stateful_alu clean_app_id_and_seq { 86 | reg: appID_and_Seq; 87 | 88 | condition_lo : p4ml.appIDandSeqNum == register_lo; 89 | 90 | update_lo_1_predicate : condition_lo; 91 | update_lo_1_value : 0; 92 | 93 | output_predicate : condition_lo; 94 | output_dst : mdata.isMyAppIDandMyCurrentSeq; 95 | output_value : p4ml.appIDandSeqNum; 96 | } 97 | 98 | blackbox stateful_alu check_agtrTime { 99 | reg: agtr_time; 100 | 101 | condition_lo : mdata.isAggregate != 0; 102 | output_dst : mdata.current_agtr_time; 103 | 104 | update_lo_1_predicate : condition_lo; 105 | update_lo_1_value : register_lo + 1; 106 | 107 | update_lo_2_predicate : not condition_lo; 108 | update_lo_2_value : register_lo; 109 | 110 | output_value : alu_lo; 111 | } 112 | 113 | blackbox stateful_alu check_resend_agtrTime { 114 | 
reg: agtr_time; 115 | 116 | condition_lo : mdata.isAggregate != 0; 117 | // fake, force forward 118 | output_dst : mdata.current_agtr_time; 119 | 120 | update_lo_1_predicate : condition_lo; 121 | update_lo_1_value : 0; 122 | 123 | update_lo_2_predicate : not condition_lo; 124 | update_lo_2_value : 0; 125 | 126 | output_value : p4ml.agtr_time; 127 | } 128 | 129 | blackbox stateful_alu do_comp_qdepth { 130 | reg: dqueue_alert_threshold; 131 | 132 | condition_lo : eg_intr_md.deq_qdepth >= register_lo; 133 | // fake, force forward 134 | output_predicate : condition_lo; 135 | output_dst : mdata.qdepth; 136 | output_value : eg_intr_md.deq_qdepth; 137 | initial_register_lo_value : 1000; 138 | } 139 | 140 | blackbox stateful_alu do_check_ecn { 141 | reg: ecn_register; 142 | 143 | condition_lo : register_lo == 1; 144 | 145 | update_lo_1_value : register_lo | mdata.is_ecn; 146 | 147 | output_predicate : condition_lo; 148 | output_value : mdata.value_one; 149 | output_dst : p4ml.ECN; 150 | } 151 | 152 | //for is_Col in A2TP 153 | 154 | blackbox stateful_alu do_check_col { 155 | reg: col_register; 156 | 157 | condition_lo : register_lo == 1; 158 | 159 | update_lo_1_value : register_lo | mdata.is_col; 160 | 161 | output_predicate : condition_lo; 162 | output_value : mdata.value_two; 163 | output_dst : p4ml.is_lzy_Col; //for is_Col in A2TP 164 | } 165 | 166 | blackbox stateful_alu do_tag_col { 167 | reg: col_register; 168 | 169 | update_lo_1_value : register_lo | mdata.is_col; 170 | 171 | } 172 | 173 | /************************************************************************* 174 | ************** I N G R E S S P R O C E S S I N G ******************* 175 | *************************************************************************/ 176 | 177 | /* 178 | * Actions 179 | */ 180 | 181 | action process_bitmap() { 182 | read_write_bitmap.execute_stateful_alu(p4ml_agtr_index.agtr); 183 | } 184 | 185 | action process_bitmap_resend() { 186 | 
read_write_bitmap_resend.execute_stateful_alu(p4ml_agtr_index.agtr); 187 | } 188 | 189 | 190 | action check_aggregate_and_forward() { 191 | // check whether aggregation is needed 192 | bit_andcb(mdata.isAggregate, p4ml.bitmap, mdata.bitmap); 193 | bit_or(mdata.integrated_bitmap, p4ml.bitmap, mdata.bitmap); 194 | } 195 | 196 | action clean_agtr_time() { 197 | cleaning_agtr_time.execute_stateful_alu(p4ml_agtr_index.agtr); 198 | } 199 | 200 | action clean_ecn() { 201 | cleaning_ecn.execute_stateful_alu(p4ml_agtr_index.agtr); 202 | } 203 | 204 | //for is_Col in A2TP 205 | 206 | action clean_col() { 207 | cleaning_col.execute_stateful_alu(p4ml_agtr_index.agtr); 208 | } 209 | 210 | 211 | action clean_bitmap() { 212 | cleaning_bitmap.execute_stateful_alu(p4ml_agtr_index.agtr); 213 | } 214 | 215 | action multicast(group) { 216 | modify_field(ig_intr_md_for_tm.mcast_grp_a, group); 217 | } 218 | 219 | action check_appID_and_seq() { 220 | check_app_id_and_seq.execute_stateful_alu(p4ml_agtr_index.agtr); 221 | //modify_field(mdata.qdepth, 0); 222 | } 223 | 224 | action check_appID_and_seq_resend() { 225 | check_app_id_and_seq_resend.execute_stateful_alu(p4ml_agtr_index.agtr); 226 | // modify_field(mdata.qdepth, 0); 227 | } 228 | 229 | action clean_appID_and_seq() { 230 | clean_app_id_and_seq.execute_stateful_alu(p4ml_agtr_index.agtr); 231 | } 232 | 233 | action check_agtr_time() { 234 | check_agtrTime.execute_stateful_alu(p4ml_agtr_index.agtr); 235 | } 236 | 237 | action check_resend_agtr_time() { 238 | check_resend_agtrTime.execute_stateful_alu(p4ml_agtr_index.agtr); 239 | } 240 | 241 | action modify_packet_bitmap() { 242 | modify_field(p4ml.bitmap, mdata.integrated_bitmap); 243 | } 244 | 245 | action do_qdepth() { 246 | do_comp_qdepth.execute_stateful_alu(0); 247 | } 248 | 249 | action modify_ecn() { 250 | modify_field(p4ml.ECN, 1); 251 | } 252 | 253 | action mark_ecn() { 254 | bit_or(mdata.is_ecn, mdata.qdepth, mdata.is_ecn); 255 | } 256 | 257 | action 
modify_ipv4_ecn() { 258 | modify_field(ipv4.ecn, 3); 259 | } 260 | 261 | action check_ecn() { 262 | do_check_ecn.execute_stateful_alu(p4ml_agtr_index.agtr); 263 | } 264 | 265 | action setup_ecn() { 266 | modify_field(mdata.is_ecn, 1); 267 | } 268 | 269 | //for is_Col in A2TP 270 | 271 | action check_col() { 272 | do_check_col.execute_stateful_alu(p4ml_agtr_index.agtr); 273 | } 274 | 275 | action tag_col() { 276 | do_tag_col.execute_stateful_alu(p4ml_agtr_index.agtr); 277 | } 278 | /* 279 | action setup_col() { 280 | modify_field(mdata.is_col, 1); 281 | } 282 | */ 283 | 284 | 285 | action tag_collision_incoming() { 286 | modify_field(p4ml.isSWCollision, 1); 287 | modify_field(mdata.is_col, 1); 288 | // modify_field(p4ml.bitmap, mdata.isMyAppIDandMyCurrentSeq); 289 | } 290 | 291 | action set_egr(egress_spec) { 292 | modify_field(ig_intr_md_for_tm.ucast_egress_port, egress_spec); 293 | // increase_p4ml_counter.execute_stateful_alu(ig_intr_md.ingress_port); 294 | } 295 | 296 | action set_egr_and_set_index(egress_spec) { 297 | modify_field(ig_intr_md_for_tm.ucast_egress_port, egress_spec); 298 | modify_field(p4ml.dataIndex, 1); 299 | // increase_p4ml_counter.execute_stateful_alu(ig_intr_md.ingress_port); 300 | } 301 | 302 | action nop() 303 | { 304 | } 305 | 306 | action drop_pkt() { 307 | drop(); 308 | } 309 | 310 | action increase_counter() { 311 | increase_p4ml_counter.execute_stateful_alu(0); 312 | } 313 | 314 | table bitmap_table { 315 | actions { 316 | process_bitmap; 317 | } 318 | default_action: process_bitmap(); 319 | size : 1; 320 | } 321 | 322 | table bitmap_resend_table { 323 | actions { 324 | process_bitmap_resend; 325 | } 326 | default_action: process_bitmap_resend(); 327 | size : 1; 328 | } 329 | 330 | //@pragma stage 2 331 | table bitmap_aggregate_table { 332 | actions { 333 | check_aggregate_and_forward; 334 | } 335 | default_action: check_aggregate_and_forward(); 336 | size : 1; 337 | } 338 | 339 | //@pragma stage 3 340 | table agtr_time_table { 341 | 
actions { 342 | check_agtr_time; 343 | } 344 | default_action: check_agtr_time(); 345 | size : 1; 346 | } 347 | 348 | //@pragma stage 3 349 | table agtr_time_resend_table { 350 | actions { 351 | check_resend_agtr_time; 352 | } 353 | default_action: check_resend_agtr_time(); 354 | size : 1; 355 | } 356 | 357 | table immd_outPort_table { 358 | reads { 359 | p4ml.appIDandSeqNum mask 0xFFFF0000: exact; 360 | } 361 | actions { 362 | set_egr; 363 | } 364 | } 365 | 366 | //@pragma stage 11 367 | table outPort_table { 368 | reads { 369 | p4ml.appIDandSeqNum mask 0xFFFF0000: exact; 370 | ig_intr_md.ingress_port: exact; 371 | p4ml.dataIndex: exact; 372 | p4ml.PSIndex: exact; 373 | } 374 | actions { 375 | nop; 376 | set_egr; 377 | set_egr_and_set_index; 378 | drop_pkt; 379 | } 380 | default_action: drop_pkt(); 381 | } 382 | 383 | table bg_outPort_table { 384 | reads { 385 | // useless here, just can't use default action for variable 386 | p4ml_bg.isACK : exact; 387 | } 388 | actions { 389 | set_egr; 390 | nop; 391 | } 392 | } 393 | 394 | table multicast_table { 395 | reads { 396 | p4ml.isACK: exact; 397 | p4ml.appIDandSeqNum mask 0xFFFF0000: exact; 398 | ig_intr_md.ingress_port: exact; 399 | p4ml.dataIndex: exact; 400 | } 401 | actions { 402 | multicast; drop_pkt; set_egr_and_set_index; 403 | } 404 | default_action: drop_pkt(); 405 | } 406 | 407 | @pragma stage 3 408 | table clean_agtr_time_table { 409 | actions { 410 | clean_agtr_time; 411 | } 412 | default_action: clean_agtr_time(); 413 | size : 1; 414 | } 415 | 416 | @pragma stage 1 417 | table clean_ecn_table { 418 | actions { 419 | clean_ecn; 420 | } 421 | default_action: clean_ecn(); 422 | size : 1; 423 | } 424 | 425 | //for is_Col in A2TP 426 | 427 | @pragma stage 2 428 | table clean_col_table { 429 | actions { 430 | clean_col; 431 | } 432 | default_action: clean_col(); 433 | size : 1; 434 | } 435 | 436 | 437 | table clean_bitmap_table { 438 | actions { 439 | clean_bitmap; 440 | } 441 | default_action: clean_bitmap(); 
442 | size : 1; 443 | } 444 | 445 | /* Counter */ 446 | register p4ml_counter { 447 | width : 32; 448 | instance_count :1; 449 | } 450 | 451 | blackbox stateful_alu increase_p4ml_counter { 452 | reg: p4ml_counter; 453 | 454 | update_lo_1_value : register_lo + 1 ; 455 | } 456 | 457 | table forward_counter_table { 458 | actions { 459 | increase_counter; 460 | } 461 | default_action: increase_counter(); 462 | size : 1; 463 | } 464 | 465 | //@pragma stage 0 466 | table appID_and_seq_table { 467 | actions { 468 | check_appID_and_seq; 469 | } 470 | default_action: check_appID_and_seq(); 471 | size : 1; 472 | } 473 | 474 | table appID_and_seq_resend_table { 475 | actions { 476 | check_appID_and_seq_resend; 477 | } 478 | default_action: check_appID_and_seq_resend(); 479 | size : 1; 480 | } 481 | 482 | table clean_appID_and_seq_table { 483 | actions { 484 | clean_appID_and_seq; 485 | } 486 | default_action: clean_appID_and_seq(); 487 | size : 1; 488 | } 489 | 490 | table modify_packet_bitmap_table { 491 | reads { 492 | p4ml.dataIndex: exact; 493 | } 494 | actions { 495 | modify_packet_bitmap; nop; 496 | } 497 | default_action: nop(); 498 | } 499 | 500 | table qdepth_table { 501 | actions { 502 | do_qdepth; 503 | } 504 | default_action: do_qdepth(); 505 | size : 1; 506 | } 507 | 508 | table modify_ecn_table { 509 | actions { 510 | modify_ecn; 511 | } 512 | default_action: modify_ecn(); 513 | size : 1; 514 | } 515 | 516 | table mark_ecn_ipv4_table { 517 | actions { 518 | modify_ipv4_ecn; 519 | } 520 | default_action: modify_ipv4_ecn(); 521 | size : 1; 522 | } 523 | 524 | table ecn_mark_table { 525 | actions { 526 | mark_ecn; 527 | } 528 | default_action: mark_ecn(); 529 | size : 1; 530 | } 531 | 532 | @pragma stage 1 533 | table ecn_register_table { 534 | actions { 535 | check_ecn; 536 | } 537 | default_action: check_ecn(); 538 | size : 1; 539 | } 540 | 541 | 542 | table setup_ecn_table { 543 | actions { 544 | setup_ecn; 545 | } 546 | default_action: setup_ecn(); 547 | size : 
1; 548 | } 549 | 550 | //for is_Col in A2TP 551 | @pragma stage 2 552 | table col_register_table { 553 | actions { 554 | check_col; 555 | } 556 | default_action: check_col(); 557 | size : 1; 558 | } 559 | 560 | 561 | 562 | 563 | table forward { 564 | reads { 565 | ethernet.dstAddr : exact; 566 | } 567 | actions { 568 | set_egr; nop; drop_pkt; 569 | } 570 | default_action: drop_pkt(); 571 | } 572 | 573 | table drop_table { 574 | reads { 575 | ig_intr_md.ingress_port: exact; 576 | p4ml.dataIndex : exact; 577 | } 578 | actions { 579 | drop_pkt; set_egr; set_egr_and_set_index; 580 | } 581 | default_action: drop_pkt(); 582 | } 583 | 584 | @pragma stage 2 585 | table tag_col_register_table { 586 | actions { 587 | tag_col; 588 | } 589 | default_action: tag_col(); 590 | } 591 | 592 | table tag_collision_incoming_table { 593 | actions { 594 | tag_collision_incoming; 595 | } 596 | default_action: tag_collision_incoming(); 597 | } 598 | -------------------------------------------------------------------------------- /p4ml2/includes/headers.p4: -------------------------------------------------------------------------------- 1 | #define MAX_ENTRIES_PER_PACKET 32 2 | /************************************************************************* 3 | *********************** H E A D E R S ********************************* 4 | *************************************************************************/ 5 | 6 | // 14Byte 7 | header_type ethernet_t { 8 | fields { 9 | dstAddr : 48; 10 | srcAddr : 48; 11 | etherType : 16; 12 | } 13 | } 14 | 15 | // 20Byte 16 | header_type ipv4_t { 17 | fields { 18 | version : 4; 19 | ihl : 4; 20 | dscp : 6; 21 | ecn : 2; 22 | totalLen : 16; 23 | identification : 16; 24 | flags : 3; 25 | fragOffset : 13; 26 | ttl : 8; 27 | protocol : 8; 28 | hdrChecksum : 16; 29 | srcAddr : 32; 30 | dstAddr : 32; 31 | } 32 | } 33 | 34 | header_type udp_t { 35 | fields { 36 | srcPort : 16; 37 | dstPort : 16; 38 | length_ : 16; 39 | checksum : 16; 40 | } 41 | } 42 | 43 | // 
12Byte * 2 44 | header_type p4ml_t { 45 | fields { 46 | bitmap : 32; 47 | agtr_time : 8; 48 | overflow : 1; 49 | /* For multiple PS */ 50 | PSIndex : 2; 51 | /* For single PS */ 52 | // reserved : 2; 53 | // isForceFoward : 1; 54 | dataIndex : 1; 55 | ECN : 1; 56 | isResend : 1; 57 | isSWCollision : 1; 58 | isACK : 1; 59 | appIDandSeqNum : 32; //in switchml.p4: this is used to find the bit location 60 | is_lzy_Col : 8; //used in A2TP 61 | } 62 | } 63 | 64 | header_type p4ml_agtr_index_t { 65 | fields{ 66 | agtr :16; 67 | } 68 | } 69 | 70 | header_type bg_p4ml_t { 71 | fields { 72 | key : 64; 73 | len_tensor : 32; 74 | bitmap : 32; 75 | agtr_time : 8; 76 | reserved : 4; 77 | ECN : 1; 78 | isResend : 1; 79 | isSWCollision : 1; 80 | isACK : 1; 81 | agtr : 16; 82 | appIDandSeqNum : 32; //in switchml.p4: this is used to find the bit location 83 | } 84 | } 85 | 86 | // 108Byte * 2 87 | header_type entry_t { 88 | fields { 89 | data0 : 32 (signed); 90 | data1 : 32 (signed); 91 | data2 : 32 (signed); 92 | data3 : 32 (signed); 93 | data4 : 32 (signed); 94 | data5 : 32 (signed); 95 | data6 : 32 (signed); 96 | data7 : 32 (signed); 97 | data8 : 32 (signed); 98 | data9 : 32 (signed); 99 | data10 : 32 (signed); 100 | data11 : 32 (signed); 101 | data12 : 32 (signed); 102 | data13 : 32 (signed); 103 | data14 : 32 (signed); 104 | data15 : 32 (signed); 105 | data16 : 32 (signed); 106 | data17 : 32 (signed); 107 | data18 : 32 (signed); 108 | data19 : 32 (signed); 109 | data20 : 32 (signed); 110 | data21 : 32 (signed); 111 | data22 : 32 (signed); 112 | data23 : 32 (signed); 113 | data24 : 32 (signed); 114 | data25 : 32 (signed); 115 | data26 : 32 (signed); 116 | data27 : 32 (signed); 117 | data28 : 32 (signed); 118 | data29 : 32 (signed); 119 | data30 : 32 (signed); 120 | // data31 : 32 (signed); 121 | } 122 | } 123 | 124 | //12Byte * 2 125 | // header_type entry2_t { 126 | // fields { 127 | // data27 : 32 (signed); 128 | // data28 : 32 (signed); 129 | // data29 : 32 (signed); 130 | // 
data30 : 32 (signed); 131 | // data31 : 32 (signed); 132 | // } 133 | // } 134 | 135 | /************************************************************************* 136 | *********************** M E T A D A T A ******************************* 137 | *************************************************************************/ 138 | 139 | header_type p4ml_meta_t { 140 | fields { 141 | // P4ML 142 | isMyAppIDandMyCurrentSeq : 16; 143 | bitmap : 32; 144 | isAggregate : 32; 145 | agtr_time : 8; 146 | integrated_bitmap : 32; 147 | current_agtr_time : 8; 148 | agtr_index : 32; 149 | isDrop : 32; 150 | inside_appID_and_Seq : 1; 151 | value_one : 1; 152 | value_two : 8; 153 | qdepth : 16; 154 | seen_bitmap0 : 8; 155 | seen_isAggregate : 8; 156 | is_ecn : 32; 157 | is_col : 32; //used in A2TP 158 | } 159 | } 160 | 161 | header_type p4ml_constant_t { 162 | fields{ 163 | bitmap :32; 164 | agtr_time :8; 165 | } 166 | } 167 | -------------------------------------------------------------------------------- /p4ml2/includes/parser.p4: -------------------------------------------------------------------------------- 1 | 2 | 3 | metadata p4ml_meta_t mdata; 4 | metadata p4ml_constant_t p4ml_constant; 5 | 6 | header ethernet_t ethernet; 7 | header ipv4_t ipv4; 8 | header udp_t udp; 9 | header p4ml_agtr_index_t p4ml_agtr_index; 10 | header p4ml_agtr_index_t p4ml_agtr_index_useless; 11 | header p4ml_agtr_index_t p4ml_agtr_index_useless2; 12 | 13 | header p4ml_t p4ml; 14 | header entry_t p4ml_entries; 15 | header entry_t p4ml_entries_useless; 16 | 17 | header bg_p4ml_t p4ml_bg; 18 | // header blank3_t blank3; 19 | /************************************************************************* 20 | *********************** P A R S E R *********************************** 21 | *************************************************************************/ 22 | 23 | parser start { 24 | extract(ethernet); 25 | set_metadata(mdata.value_one, 1); 26 | set_metadata(mdata.value_two, 1); 27 | return 
select(ethernet.etherType) { 28 | 0x0700 : parse_ipv4; 29 | 0x0800 : parse_rdma; 30 | 0x0900 : parse_bg; 31 | default : ingress; 32 | } 33 | // return parse_ipv4; 34 | 35 | } 36 | 37 | parser parse_ipv4 { 38 | extract(ipv4); 39 | return parse_p4ml; 40 | } 41 | 42 | parser parse_p4ml { 43 | extract(p4ml); 44 | return select(p4ml.dataIndex) { 45 | 0x0 : check_if_resubmit; 46 | 0x1 : use_second_p4ml_agtr_index_recirculate; 47 | default : ingress; 48 | } 49 | } 50 | 51 | parser check_if_resubmit { 52 | return select(ig_intr_md.resubmit_flag) { 53 | // 0x0 : parse_p4ml_agtr_index; 54 | 0x0 : use_first_p4ml_agtr_index_recirculate; 55 | // 0x1 : skip_first_p4ml_agtr_index; 56 | 0x1 : use_second_p4ml_agtr_index_recirculate; 57 | default : ingress; 58 | } 59 | } 60 | 61 | /// resubmit 0x0 62 | 63 | parser parse_p4ml_agtr_index { 64 | extract(p4ml_agtr_index); 65 | return skip_second_p4ml_agtr_index; 66 | } 67 | 68 | @pragma force_shift ingress 16 /* 2 bytes */ 69 | parser skip_second_p4ml_agtr_index { 70 | return parse_entry; 71 | } 72 | 73 | parser parse_entry { 74 | extract(p4ml_entries); 75 | return ingress; 76 | } 77 | 78 | /// resubmit 0x1 79 | 80 | parser parse_p4ml_agtr_index2 { 81 | extract(p4ml_agtr_index); 82 | return skip_header_c_0_31; 83 | } 84 | 85 | @pragma force_shift ingress 16 /* 2 bytes */ 86 | parser skip_first_p4ml_agtr_index { 87 | return parse_p4ml_agtr_index2; 88 | } 89 | 90 | /// recirculate 2 91 | 92 | parser use_second_p4ml_agtr_index_recirculate { 93 | extract(p4ml_agtr_index_useless2); 94 | return parse_p4ml_agtr_index_recirculate; 95 | } 96 | 97 | parser parse_p4ml_agtr_index_recirculate { 98 | extract(p4ml_agtr_index); 99 | return parse_entry2; 100 | } 101 | 102 | parser parse_entry2 { 103 | extract(p4ml_entries_useless); 104 | return parse_entry; 105 | } 106 | 107 | /// recirculate 1 108 | 109 | parser use_first_p4ml_agtr_index_recirculate { 110 | extract(p4ml_agtr_index); 111 | return useless_second_p4ml_agtr_index_recirculate; 112 | } 113 | 
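The parser states above fix the wire layout a worker must emit for an aggregation packet (etherType 0x0700 → parse_ipv4 → parse_p4ml): the p4ml_t header from headers.p4, then a 2-byte aggregator index, then the 31-entry payload. As a rough host-side sketch (a hypothetical helper, not part of this repo), the 11-byte p4ml_t header could be packed as follows; the flag bit order follows the field declaration order in headers.p4, and the 16/16 appID/sequence split of appIDandSeqNum is inferred from the `j << 16` match specs and the `appID_and_Seq[0]>>16` read in setupp4ml2.py:

```python
# Hypothetical host-side sketch (not part of this repo): pack the p4ml_t
# header in the field order declared in headers.p4. P4-14 serializes
# fields in declaration order, MSB first, so the seven 1-2 bit flags
# share one byte: overflow | PSIndex(2) | dataIndex | ECN | isResend |
# isSWCollision | isACK.
import struct

def build_p4ml_header(bitmap, agtr_time, app_id, seq_num,
                      overflow=0, ps_index=0, data_index=0, ecn=0,
                      is_resend=0, is_sw_collision=0, is_ack=0,
                      is_lzy_col=0):
    flags = ((overflow & 0x1) << 7) | ((ps_index & 0x3) << 5) | \
            ((data_index & 0x1) << 4) | ((ecn & 0x1) << 3) | \
            ((is_resend & 0x1) << 2) | ((is_sw_collision & 0x1) << 1) | \
            (is_ack & 0x1)
    # appIDandSeqNum: high 16 bits = appID, low 16 bits = sequence number
    # (inferred from the `j << 16` match specs in setupp4ml2.py).
    app_id_and_seq = ((app_id & 0xFFFF) << 16) | (seq_num & 0xFFFF)
    # bitmap(4B) agtr_time(1B) flags(1B) appIDandSeqNum(4B) is_lzy_Col(1B)
    return struct.pack("!IBBIB", bitmap, agtr_time, flags,
                       app_id_and_seq, is_lzy_col)

hdr = build_p4ml_header(bitmap=0x1, agtr_time=2, app_id=1, seq_num=7)
assert len(hdr) == 11  # 88 bits of p4ml_t fields = 11 bytes
```

On the wire, the 2-byte p4ml_agtr_index_t and the 31 signed 32-bit entry_t values would follow this header.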
114 | parser useless_second_p4ml_agtr_index_recirculate { 115 | extract(p4ml_agtr_index_useless); 116 | return parse_entry; 117 | } 118 | /// 119 | 120 | @pragma force_shift ingress 256 /* 32 bytes */ 121 | parser skip_header_c_0_31 { 122 | return skip_header_c_32_63; 123 | } 124 | 125 | @pragma force_shift ingress 256 /* 32 bytes */ 126 | parser skip_header_c_32_63 { 127 | return skip_header_c_64_95; 128 | } 129 | 130 | @pragma force_shift ingress 256 /* 32 bytes */ 131 | parser skip_header_c_64_95 { 132 | return skip_header_c_96_127; 133 | } 134 | 135 | @pragma force_shift ingress 256 /* 32 bytes */ 136 | parser skip_header_c_96_127 { 137 | return parse_entry; 138 | } 139 | 140 | 141 | // /* RDMA */ 142 | parser parse_rdma { 143 | extract(ipv4); 144 | return ingress; 145 | } 146 | 147 | // /* BG */ 148 | parser parse_bg { 149 | extract(ipv4); 150 | return parse_udp_bg; 151 | } 152 | 153 | parser parse_udp_bg { 154 | extract(udp); 155 | return parse_p4ml_bg; 156 | } 157 | 158 | parser parse_p4ml_bg { 159 | extract(p4ml_bg); 160 | //set_metadata(mdata.qdepth, 0); 161 | // return ingress; 162 | return ingress; 163 | } 164 | -------------------------------------------------------------------------------- /p4ml2/p4ml2.p4: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "includes/headers.p4" 5 | #include "includes/parser.p4" 6 | 7 | #include "includes/registers.p4" 8 | #include "includes/tables.p4" 9 | #include "includes/actions.p4" 10 | #include "includes/common.p4" 11 | 12 | field_list p4ml_resubmit_list{ 13 | mdata.agtr_time; 14 | } 15 | 16 | action do_resubmit(){ 17 | resubmit(p4ml_resubmit_list); 18 | } 19 | 20 | table p4ml_resubmit{ 21 | actions{ 22 | do_resubmit; 23 | } 24 | default_action: do_resubmit(); 25 | size: 1; 26 | 27 | } 28 | control ingress 29 | { 30 | 31 | if (valid(p4ml_entries)) { //aggregated packet 32 | 33 | if (ipv4.ecn == 3 or p4ml.ECN == 1) { //mdata.is_ecn=1 
34 | apply(setup_ecn_table); //in common.p4 35 | } 36 | // ack packet 37 | if (p4ml.isACK == 1) { 38 | 39 | if (p4ml.overflow == 1 and p4ml.isResend == 0) { 40 | 41 | } else { 42 | apply(clean_appID_and_seq_table); //in common.p4 and registers.p4 get mdata.isMyAppIDandMyCurrentSeq 43 | 44 | if (mdata.isMyAppIDandMyCurrentSeq != 0) { 45 | /* Clean */ 46 | apply(clean_bitmap_table); 47 | apply(clean_ecn_table); 48 | 49 | apply(clean_col_table); //used in A2TP 50 | 51 | apply(clean_agtr_time_table); 52 | 53 | 54 | // apply(cleanEntry1); 55 | } 56 | } 57 | 58 | /* Multicast Back */ 59 | if(ig_intr_md.resubmit_flag == 1) { 60 | apply(multicast_table); 61 | } else { 62 | apply(p4ml_resubmit); //resubmit mdata.agtr_time 63 | } 64 | 65 | } else { 66 | 67 | if (p4ml.overflow == 1) { 68 | apply(outPort_table); //set eg according to appseq, ingress, dataindex, psindex in common.p4 69 | } else { 70 | if (p4ml.isResend == 1) { 71 | apply(appID_and_seq_resend_table); //clean appid and seq 72 | } else { 73 | apply(appID_and_seq_table); //allocate a new/existing block for current appid 74 | } 75 | // Correct ID and Seq 76 | if (mdata.isMyAppIDandMyCurrentSeq != 0) { 77 | 78 | if (p4ml.isResend == 1) { 79 | // Clean the bitmap also 80 | apply(bitmap_resend_table); 81 | } else { 82 | apply(bitmap_table); // OR bitmap 83 | } 84 | 85 | 86 | 87 | apply(ecn_register_table); 88 | 89 | apply(col_register_table); //used in A2TP 90 | 91 | apply(bitmap_aggregate_table); //check aggregation bitmap get 92 | //isaggregate? and bit_or 93 | 94 | 95 | if (p4ml.isResend == 1) { 96 | // Force forward and clean 97 | apply(agtr_time_resend_table); 98 | } else { 99 | apply(agtr_time_table); //update agtr_time 100 | } 101 | 102 | // bitmap correct 103 | if (mdata.isAggregate != 0) { //need to aggregate 104 | if (mdata.current_agtr_time == p4ml.agtr_time) { // aggregation finish? 
105 | apply(noequ0_processEntry1andWriteToPacket); //sum to register 106 | apply(noequ0_processEntry2andWriteToPacket); 107 | apply(noequ0_processEntry3andWriteToPacket); 108 | apply(noequ0_processEntry4andWriteToPacket); 109 | apply(noequ0_processEntry5andWriteToPacket); 110 | apply(noequ0_processEntry6andWriteToPacket); 111 | apply(noequ0_processEntry7andWriteToPacket); 112 | apply(noequ0_processEntry8andWriteToPacket); 113 | apply(noequ0_processEntry9andWriteToPacket); 114 | apply(noequ0_processEntry10andWriteToPacket); 115 | apply(noequ0_processEntry11andWriteToPacket); 116 | apply(noequ0_processEntry12andWriteToPacket); 117 | apply(noequ0_processEntry13andWriteToPacket); 118 | apply(noequ0_processEntry14andWriteToPacket); 119 | apply(noequ0_processEntry15andWriteToPacket); 120 | apply(noequ0_processEntry16andWriteToPacket); 121 | apply(noequ0_processEntry17andWriteToPacket); 122 | apply(noequ0_processEntry18andWriteToPacket); 123 | apply(noequ0_processEntry19andWriteToPacket); 124 | apply(noequ0_processEntry20andWriteToPacket); 125 | apply(noequ0_processEntry21andWriteToPacket); 126 | apply(noequ0_processEntry22andWriteToPacket); 127 | apply(noequ0_processEntry23andWriteToPacket); 128 | apply(noequ0_processEntry24andWriteToPacket); 129 | apply(noequ0_processEntry25andWriteToPacket); 130 | apply(noequ0_processEntry26andWriteToPacket); 131 | apply(noequ0_processEntry27andWriteToPacket); 132 | apply(noequ0_processEntry28andWriteToPacket); 133 | apply(noequ0_processEntry29andWriteToPacket); 134 | apply(noequ0_processEntry30andWriteToPacket); 135 | apply(noequ0_processEntry31andWriteToPacket); 136 | //apply(noequ0_processEntry32andWriteToPacket); 137 | // set output port 138 | // if(ig_intr_md.resubmit_flag == 1) { 139 | apply(modify_packet_bitmap_table); 140 | apply(outPort_table); 141 | // } else { 142 | // apply(p4ml_resubmit); 143 | // } 144 | } else { 145 | apply(processEntry1); // cover or sum to register 146 | apply(processEntry2); 147 | 
apply(processEntry3); 148 | apply(processEntry4); 149 | apply(processEntry5); 150 | apply(processEntry6); 151 | apply(processEntry7); 152 | apply(processEntry8); 153 | apply(processEntry9); 154 | apply(processEntry10); 155 | apply(processEntry11); 156 | apply(processEntry12); 157 | apply(processEntry13); 158 | apply(processEntry14); 159 | apply(processEntry15); 160 | apply(processEntry16); 161 | apply(processEntry17); 162 | apply(processEntry18); 163 | apply(processEntry19); 164 | apply(processEntry20); 165 | apply(processEntry21); 166 | apply(processEntry22); 167 | apply(processEntry23); 168 | apply(processEntry24); 169 | apply(processEntry25); 170 | apply(processEntry26); 171 | apply(processEntry27); 172 | apply(processEntry28); 173 | apply(processEntry29); 174 | apply(processEntry30); 175 | apply(processEntry31); 176 | //apply(processEntry32); 177 | 178 | if (ig_intr_md.resubmit_flag == 1) { 179 | apply(drop_table); 180 | } else { 181 | apply(p4ml_resubmit); 182 | } 183 | 184 | } 185 | } else { //arrive yet 186 | if (mdata.current_agtr_time == p4ml.agtr_time) { 187 | apply(Entry1WriteToPacket); //write to p4ml_entries.data1; to packet 188 | apply(Entry2WriteToPacket); 189 | apply(Entry3WriteToPacket); 190 | apply(Entry4WriteToPacket); 191 | apply(Entry5WriteToPacket); 192 | apply(Entry6WriteToPacket); 193 | apply(Entry7WriteToPacket); 194 | apply(Entry8WriteToPacket); 195 | apply(Entry9WriteToPacket); 196 | apply(Entry10WriteToPacket); 197 | apply(Entry11WriteToPacket); 198 | apply(Entry12WriteToPacket); 199 | apply(Entry13WriteToPacket); 200 | apply(Entry14WriteToPacket); 201 | apply(Entry15WriteToPacket); 202 | apply(Entry16WriteToPacket); 203 | apply(Entry17WriteToPacket); 204 | apply(Entry18WriteToPacket); 205 | apply(Entry19WriteToPacket); 206 | apply(Entry20WriteToPacket); 207 | apply(Entry21WriteToPacket); 208 | apply(Entry22WriteToPacket); 209 | apply(Entry23WriteToPacket); 210 | apply(Entry24WriteToPacket); 211 | apply(Entry25WriteToPacket); 212 | 
apply(Entry26WriteToPacket); 213 | apply(Entry27WriteToPacket); 214 | apply(Entry28WriteToPacket); 215 | apply(Entry29WriteToPacket); 216 | apply(Entry30WriteToPacket); 217 | apply(Entry31WriteToPacket); 218 | //apply(Entry32WriteToPacket); 219 | // set output port 220 | // if(ig_intr_md.resubmit_flag == 1) { 221 | apply(modify_packet_bitmap_table); 222 | apply(outPort_table); 223 | // } else { 224 | // apply(p4ml_resubmit); 225 | // } 226 | } 227 | } 228 | } else { 229 | /* tag collision bit in incoming one */ 230 | // if not empty 231 | if (p4ml.isResend == 0) { 232 | 233 | apply(tag_collision_incoming_table); 234 | 235 | if(mdata.is_col == 1){ 236 | apply(tag_col_register_table); //used in A2TP 237 | } 238 | 239 | } 240 | 241 | apply(outPort_table); 242 | } 243 | } 244 | } 245 | } else { 246 | // // BG traffic doesn't have data layer 247 | // if (valid(p4ml_bg)){ 248 | // apply(bg_outPort_table); 249 | // } else { 250 | apply(forward); 251 | // } 252 | } 253 | } 254 | 255 | control egress 256 | { 257 | apply(qdepth_table); 258 | if (valid(ipv4)) { 259 | if (mdata.qdepth != 0) { 260 | apply(mark_ecn_ipv4_table); 261 | } 262 | } 263 | if (valid(p4ml_entries)) { 264 | if (mdata.qdepth != 0) { 265 | apply(modify_ecn_table); 266 | } 267 | } 268 | } 269 | 270 | -------------------------------------------------------------------------------- /ptf_p4ml2/ptfTest.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import pd_base_tests 4 | import pltfm_pm_rpc 5 | import pal_rpc 6 | import random 7 | import sys 8 | import time 9 | import unittest 10 | 11 | from pltfm_pm_rpc.ttypes import * 12 | from pal_rpc.ttypes import * 13 | from ptf import config 14 | from ptf.testutils import * 15 | from ptf.thriftutils import * 16 | from res_pd_rpc.ttypes import * 17 | from ptf import config 18 | from ptf.thriftutils import * 19 | 20 | from res_pd_rpc.ttypes import * 21 | from port_mapping import * 22 | 23 | from 
tm_api_rpc.ttypes import * 23 | 24 | 25 | this_dir = os.path.dirname(os.path.abspath(__file__)) 26 | 27 | fp_ports = ["24/0", "23/0", "22/0", "21/0", "20/0", "16/0","15/0","14/0","13/0", "12/0", "11/0"] 28 | # fp_ports = ["13/0","14/0", "11/0"] 29 | loopback_ports = ["19/0", "18/0"] 30 | # loopback_ports = ["1/0", "2/0", "3/0", "4/0", "5/0", "6/0", "7/0", "8/0", "25/0"] 31 | def toInt8(n): 32 | n = n & 0xff 33 | return (n ^ 0x80) - 0x80 34 | 35 | class L2Test(pd_base_tests.ThriftInterfaceDataPlane): 36 | def __init__(self): 37 | pd_base_tests.ThriftInterfaceDataPlane.__init__(self, 38 | ["basic_switching"]) 39 | 40 | # The setUp() method is used to prepare the test fixture. Typically 41 | # you would use it to establish a connection to the Thrift server. 42 | # 43 | # You can also put the initial device configuration there. However, 44 | # if during this process an error is encountered, it will be considered 45 | # a test error (meaning the test is incorrect), 46 | # rather than a test failure 47 | def setUp(self): 48 | # initialize the connection 49 | pd_base_tests.ThriftInterfaceDataPlane.setUp(self) 50 | self.sess_hdl = self.conn_mgr.client_init() 51 | self.dev_tgt = DevTarget_t(0, hex_to_i16(0xFFFF)) 52 | self.devPorts = [] 53 | self.LPPorts = [] 54 | self.dev = 0 55 | self.platform_type = "mavericks" 56 | board_type = self.pltfm_pm.pltfm_pm_board_type_get() 57 | if re.search("0x0234|0x1234|0x4234|0x5234", hex(board_type)): 58 | self.platform_type = "mavericks" 59 | elif re.search("0x2234|0x3234", hex(board_type)): 60 | self.platform_type = "montara" 61 | 62 | # get the device ports from front panel ports 63 | try: 64 | for fpPort in fp_ports: 65 | port, chnl = fpPort.split("/") 66 | devPort = \ 67 | self.pal.pal_port_front_panel_port_to_dev_port_get(0, 68 | int(port), 69 | int(chnl)) 70 | self.devPorts.append(devPort) 71 | 72 | if test_param_get('setup') == True or (test_param_get('setup') != True 73 | and test_param_get('cleanup') != True): 74 | 75 | # add and 
enable the platform ports 76 | for i in self.devPorts: 77 | if int(i) in [999]: #pal_port_speed_t.BF_SPEED_40G, pal_fec_type_t.BF_FEC_TYP_NONE 78 | self.pal.pal_port_add(0, i, 79 | pal_port_speed_t.BF_SPEED_40G, pal_fec_type_t.BF_FEC_TYP_NONE) #pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON 80 | self.pal.pal_port_an_set(0, i, 2); 81 | self.pal.pal_port_enable(0, i) 82 | else: 83 | self.pal.pal_port_add(0, i, 84 | pal_port_speed_t.BF_SPEED_100G, pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON) #pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON 85 | self.pal.pal_port_an_set(0, i, 2); 86 | self.pal.pal_port_enable(0, i) 87 | 88 | ####################### LOOPBACK ########################### 89 | for lbPort in loopback_ports: 90 | port, chnl = lbPort.split("/") 91 | devPort = \ 92 | self.pal.pal_port_front_panel_port_to_dev_port_get(0, 93 | int(port), 94 | int(chnl)) 95 | self.LPPorts.append(devPort) 96 | 97 | # add and enable the platform ports 98 | for i in self.LPPorts: 99 | self.pal.pal_port_add(0, i, 100 | pal_port_speed_t.BF_SPEED_100G, pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON) #pal_fec_type_t.BF_FEC_TYP_REED_SOLOMON 101 | 102 | self.pal.pal_port_loopback_mode_set(0, i, 103 | pal_loopback_mod_t.BF_LPBK_MAC_NEAR) 104 | self.pal.pal_port_an_set(0, i, 2); 105 | self.pal.pal_port_enable(0, i) 106 | 107 | self.conn_mgr.complete_operations(self.sess_hdl) 108 | 109 | except Exception as e: 110 | print "Some Error in port init" 111 | 112 | # # flow control setting, follow "Barefoot Network Tofino Fixed Function API Guide" 113 | # for i in range(len(self.devPorts)): 114 | # # step 1: Map lossless traffic to a PPG handle with a buffer limit 115 | # ppg_cells = 2000 116 | # self.ppg_handler = self.tm.tm_allocate_ppg(self.dev, self.devPorts[i]) 117 | # self.tm.tm_set_ppg_guaranteed_min_limit(self.dev, self.ppg_handler, ppg_cells) 118 | 119 | # # step 2: Map traffic to an iCos 120 | # icos_bmap = toInt8(0x01) 121 | # self.tm.tm_set_ppg_icos_mapping(self.dev, self.ppg_handler, icos_bmap) 122 | 123 | # # 
step 3: Provision skid buffer, set up pause/PFC generation 124 | # skid_cells = 4000 125 | # self.tm.tm_set_ppg_skid_limit(self.dev, self.ppg_handler, skid_cells) 126 | # self.tm.tm_enable_lossless_treatment(self.dev, self.ppg_handler) 127 | # # link-level flow control 128 | # fctype = 1 # BF_TM_PAUSE_PORT 129 | # self.tm.tm_set_port_flowcontrol_mode(self.dev, self.devPorts[i], fctype) 130 | # # iCos to Cos 131 | # icos_cos_map = tm_pfc_cos_map_t(CoS0_to_iCos=0) 132 | # self.tm.tm_set_port_pfc_cos_mapping(self.dev, self.devPorts[i], icos_cos_map) 133 | 134 | # ########################################## 135 | # for i in range(len(self.devPorts)): 136 | # #step 4: Apply buffering 137 | # queue_id = 0 138 | # queue_cells = 25000 139 | # self.tm.tm_set_q_guaranteed_min_limit(self.dev, self.devPorts[i], queue_id, queue_cells) 140 | 141 | # # step 5: Allocate queues 142 | # q_count = 8 143 | # q_map = tm_q_map_t(0,1,2,3,4,5,6,7) 144 | # self.tm.tm_set_port_q_mapping(self.dev, self.devPorts[i], q_count, q_map) 145 | # # step 6: Apply weighting if needed (skip, no use) 146 | 147 | # # step 7: Honor pause/PFC event 148 | # cos = 0 149 | # self.tm.tm_set_q_pfc_cos_mapping(self.dev, self.devPorts[i], queue_id, cos) 150 | 151 | # # Cannot find the API below 152 | # # self.tm.tm_set_port_flowcontrol_rx(self.dev, self.devPorts, fctype) 153 | # self.tm.tm_complete_operations(self.dev) 154 | 155 | # for i in range(len(self.devPorts)): 156 | # # For MAC 157 | # self.pal.pal_port_flow_control_pfc_set(self.dev, self.devPorts[i], 1, 1) 158 | # print("Done with PFC") 159 | 160 | return 161 | 162 | def runTest(self): 163 | print "runTest" 164 | # self.conn_mgr.complete_operations(self.sess_hdl) 165 | 166 | def tearDown(self): 167 | return 168 | # try: 169 | # print("Clearing table entries") 170 | # for table in self.entries.keys(): 171 | # delete_func = "self.client." 
+ table + "_table_delete" 172 | # for entry in self.entries[table]: 173 | # exec delete_func + "(self.sess_hdl, self.dev, entry)" 174 | # except: 175 | # print("Error while cleaning up. ") 176 | # print("You might need to restart the driver") 177 | # finally: 178 | # self.conn_mgr.complete_operations(self.sess_hdl) 179 | # self.conn_mgr.client_cleanup(self.sess_hdl) 180 | # print("Closed Session %d" % self.sess_hdl) 181 | # self.tm.tm_free_ppg(self.dev, self.ppg_handler) 182 | # print("Free ppg handler %d" % self.ppg_handler) 183 | # pd_base_tests.ThriftInterfaceDataPlane.tearDown(self) 184 | -------------------------------------------------------------------------------- /ptf_p4ml2/ptfTest.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/ptf_p4ml2/ptfTest.pyc -------------------------------------------------------------------------------- /run_pd_rpc/setupp4ml2.py: -------------------------------------------------------------------------------- 1 | clear_all() 2 | 3 | p4_pd.register_reset_all_agtr_time() 4 | p4_pd.register_reset_all_appID_and_Seq() 5 | p4_pd.register_reset_all_bitmap() 6 | p4_pd.register_reset_all_register1() 7 | p4_pd.register_reset_all_register2() 8 | p4_pd.register_reset_all_register3() 9 | p4_pd.register_reset_all_register4() 10 | p4_pd.register_reset_all_register5() 11 | p4_pd.register_reset_all_register6() 12 | p4_pd.register_reset_all_register7() 13 | p4_pd.register_reset_all_register8() 14 | p4_pd.register_reset_all_register9() 15 | p4_pd.register_reset_all_register10() 16 | p4_pd.register_reset_all_register11() 17 | p4_pd.register_reset_all_register12() 18 | p4_pd.register_reset_all_register13() 19 | p4_pd.register_reset_all_register14() 20 | p4_pd.register_reset_all_register15() 21 | p4_pd.register_reset_all_register16() 22 | p4_pd.register_reset_all_register17() 23 | 
p4_pd.register_reset_all_register18() 24 | p4_pd.register_reset_all_register19() 25 | p4_pd.register_reset_all_register20() 26 | p4_pd.register_reset_all_register21() 27 | p4_pd.register_reset_all_register22() 28 | p4_pd.register_reset_all_register23() 29 | p4_pd.register_reset_all_register24() 30 | p4_pd.register_reset_all_register25() 31 | p4_pd.register_reset_all_register26() 32 | p4_pd.register_reset_all_register27() 33 | p4_pd.register_reset_all_register28() 34 | p4_pd.register_reset_all_register29() 35 | p4_pd.register_reset_all_register30() 36 | p4_pd.register_reset_all_register31() 37 | # p4_pd.register_reset_all_register32() 38 | 39 | 40 | # These are background traffic 41 | # p4_pd.bg_outPort_table_table_add_with_set_egr( 42 | # p4_pd.bg_outPort_table_match_spec_t(0), 43 | # p4_pd.set_egr_action_spec_t(4) 44 | # ) 45 | 46 | # p4_pd.bg_outPort_table_table_add_with_set_egr( 47 | # p4_pd.bg_outPort_table_match_spec_t(1), 48 | # p4_pd.set_egr_action_spec_t(0) 49 | # ) 50 | 51 | # first Zero for pending 52 | #port_of_worker = [0, 56, 48, 40, 32, 24, 16, 8, 0, 4] 53 | #port_of_worker = [0, 8, 0, 4] 54 | port_of_worker = [0, 60, 52, 44, 36, 28, 0, 8, 16, 24, 32, 40] 55 | single_loopback_port = [0, 20, 12] #20 12 56 | 57 | MAC_address_of_worker = [ "0" 58 | , "0c:42:a1:5a:5b:c1" 59 | , "0c:42:a1:5a:5b:d9" 60 | , "0c:42:a1:5a:5b:b9" 61 | , "0c:42:a1:5a:53:01" 62 | , "b8:59:9f:e2:0c:17" 63 | , "b8:59:9f:e2:0c:16" 64 | , "0c:42:a1:5a:5b:e1" 65 | , "b8:59:9f:e2:25:f7" 66 | , "b8:59:9f:e2:09:47" 67 | , "0c:42:a1:5a:53:81" 68 | , "b8:59:9f:e2:26:0e"] 69 | 70 | # host0 24 60 0c:42:a1:5a:5b:c1 71 | # host1 23 52 0c:42:a1:5a:5b:d9 72 | # host2 22 44 0c:42:a1:5a:5b:b9 73 | # host3 21 36 0c:42:a1:5a:53:01 74 | ##host4 20 28 b8:59:9f:e2:0c:17 75 | # loop1 19 20 76 | # loop2 18 12 77 | # 17 78 | ##host5 16 0 b8:59:9f:e2:0c:16 79 | # host6 15 8 0c:42:a1:5a:5b:e1 80 | # host7 14 16 b8:59:9f:e2:25:f7 81 | # host8 13 24 b8:59:9f:e2:09:47 82 | # host9 12 32 0c:42:a1:5a:53:81 83 | 
# host10 11 40 b8:59:9f:e2:26:0e 84 | 85 | 86 | 87 | 88 | 89 | # first Zero for pending 90 | # PSs = [0, 9, 8] 91 | PSs = [0, 5, 6] 92 | 93 | len_workers = len(port_of_worker) 94 | len_PS = len(PSs) 95 | 96 | # Normal Switch traffic 97 | for i in range(1, len_workers): 98 | p4_pd.forward_table_add_with_set_egr( 99 | p4_pd.forward_match_spec_t(macAddr_to_string(MAC_address_of_worker[i])), 100 | p4_pd.set_egr_action_spec_t(port_of_worker[i]) 101 | ) 102 | 103 | 104 | # P4ML Traffic 105 | 106 | # No Pending packet, First time enter switch 107 | for i in range(1, len_workers - 1): 108 | for j in range(1, len_PS): #appIDandseqnum,loopport,dataIndex,PSIndex PSIndex=key%num_PS 109 | p4_pd.outPort_table_table_add_with_set_egr_and_set_index( 110 | p4_pd.outPort_table_match_spec_t( 111 | j << 16, 112 | port_of_worker[i], 113 | 0, 114 | 0), 115 | 116 | p4_pd.set_egr_and_set_index_action_spec_t(single_loopback_port[j])) 117 | 118 | # Not Pending packet, Second time enter switch 119 | for j in range(1, len_PS): 120 | print(j, PSs[j]) #appIDandseqnum,loopport,dataIndex,PSIndex 121 | p4_pd.outPort_table_table_add_with_set_egr( 122 | p4_pd.outPort_table_match_spec_t( 123 | j << 16, 124 | single_loopback_port[j], 125 | 1, 126 | 0), 127 | # app1 -> worker3 128 | p4_pd.set_egr_action_spec_t(port_of_worker[PSs[j]])) 129 | 130 | # INGRESSPORT, Index 131 | 132 | for i in range(1, len_workers - 1):#ingressport,dataindex 133 | p4_pd.drop_table_table_add_with_drop_pkt( 134 | p4_pd.drop_table_match_spec_t( 135 | port_of_worker[i], 136 | 1) 137 | ) 138 | 139 | ####### Server ######## 140 | ''' 141 | for j in range(1, len_PS): 142 | p4_pd.multicast_table_table_add_with_multicast( 143 | p4_pd.multicast_table_match_spec_t( 144 | 1, 145 | 1 << 16, 146 | port_of_worker[PSs[j]], 147 | 0), 148 | # multicast app1 -> worker1, 2 149 | p4_pd.multicast_action_spec_t(999) 150 | ) 151 | ''' 152 | for j in range(1, len_PS): #isAck, appIDandseqnum, IngressPort, dataIndex; aciton(multicast_group) 153 | 
p4_pd.multicast_table_table_add_with_multicast( 154 | p4_pd.multicast_table_match_spec_t( 155 | 1, 156 | j << 16, 157 | port_of_worker[PSs[j]], 158 | 0), 159 | # multicast app1 -> worker1, 2 160 | p4_pd.multicast_action_spec_t(1000-j) 161 | ) 162 | 163 | p4_pd.modify_packet_bitmap_table_table_add_with_modify_packet_bitmap( 164 | p4_pd.modify_packet_bitmap_table_match_spec_t(1) 165 | ) 166 | 167 | p4_pd.modify_packet_bitmap_table_table_add_with_nop( 168 | p4_pd.modify_packet_bitmap_table_match_spec_t(0) 169 | ) 170 | 171 | p4_pd.processEntry1_table_add_with_processentry1( 172 | p4_pd.processEntry1_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 173 | ) 174 | p4_pd.processEntry1_table_add_with_noequ0_processentry1( 175 | p4_pd.processEntry1_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1, 176 | ) 177 | p4_pd.processEntry2_table_add_with_processentry2( 178 | p4_pd.processEntry2_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 179 | ) 180 | p4_pd.processEntry2_table_add_with_noequ0_processentry2( 181 | p4_pd.processEntry2_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 182 | ) 183 | p4_pd.processEntry3_table_add_with_processentry3( 184 | p4_pd.processEntry3_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 185 | ) 186 | p4_pd.processEntry3_table_add_with_noequ0_processentry3( 187 | p4_pd.processEntry3_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 188 | ) 189 | p4_pd.processEntry4_table_add_with_processentry4( 190 | p4_pd.processEntry4_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 191 | ) 192 | p4_pd.processEntry4_table_add_with_noequ0_processentry4( 193 | p4_pd.processEntry4_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 194 | ) 195 | p4_pd.processEntry5_table_add_with_processentry5( 196 | p4_pd.processEntry5_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 197 | ) 198 | p4_pd.processEntry5_table_add_with_noequ0_processentry5( 199 | p4_pd.processEntry5_match_spec_t(hex_to_i32(0), 
hex_to_i32(0x00000000)), 1 200 | ) 201 | p4_pd.processEntry6_table_add_with_processentry6( 202 | p4_pd.processEntry6_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 203 | ) 204 | p4_pd.processEntry6_table_add_with_noequ0_processentry6( 205 | p4_pd.processEntry6_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 206 | ) 207 | p4_pd.processEntry7_table_add_with_processentry7( 208 | p4_pd.processEntry7_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 209 | ) 210 | p4_pd.processEntry7_table_add_with_noequ0_processentry7( 211 | p4_pd.processEntry7_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 212 | ) 213 | p4_pd.processEntry8_table_add_with_processentry8( 214 | p4_pd.processEntry8_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 215 | ) 216 | p4_pd.processEntry8_table_add_with_noequ0_processentry8( 217 | p4_pd.processEntry8_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 218 | ) 219 | p4_pd.processEntry9_table_add_with_processentry9( 220 | p4_pd.processEntry9_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 221 | ) 222 | p4_pd.processEntry9_table_add_with_noequ0_processentry9( 223 | p4_pd.processEntry9_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 224 | ) 225 | p4_pd.processEntry10_table_add_with_processentry10( 226 | p4_pd.processEntry10_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 227 | ) 228 | p4_pd.processEntry10_table_add_with_noequ0_processentry10( 229 | p4_pd.processEntry10_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 230 | ) 231 | p4_pd.processEntry11_table_add_with_processentry11( 232 | p4_pd.processEntry11_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 233 | ) 234 | p4_pd.processEntry11_table_add_with_noequ0_processentry11( 235 | p4_pd.processEntry11_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 236 | ) 237 | p4_pd.processEntry12_table_add_with_processentry12( 238 | p4_pd.processEntry12_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 239 | ) 240 | 
p4_pd.processEntry12_table_add_with_noequ0_processentry12( 241 | p4_pd.processEntry12_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 242 | ) 243 | p4_pd.processEntry13_table_add_with_processentry13( 244 | p4_pd.processEntry13_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 245 | ) 246 | p4_pd.processEntry13_table_add_with_noequ0_processentry13( 247 | p4_pd.processEntry13_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 248 | ) 249 | p4_pd.processEntry14_table_add_with_processentry14( 250 | p4_pd.processEntry14_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 251 | ) 252 | p4_pd.processEntry14_table_add_with_noequ0_processentry14( 253 | p4_pd.processEntry14_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 254 | ) 255 | p4_pd.processEntry15_table_add_with_processentry15( 256 | p4_pd.processEntry15_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 257 | ) 258 | p4_pd.processEntry15_table_add_with_noequ0_processentry15( 259 | p4_pd.processEntry15_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 260 | ) 261 | p4_pd.processEntry16_table_add_with_processentry16( 262 | p4_pd.processEntry16_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 263 | ) 264 | p4_pd.processEntry16_table_add_with_noequ0_processentry16( 265 | p4_pd.processEntry16_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 266 | ) 267 | p4_pd.processEntry17_table_add_with_processentry17( 268 | p4_pd.processEntry17_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 269 | ) 270 | p4_pd.processEntry17_table_add_with_noequ0_processentry17( 271 | p4_pd.processEntry17_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 272 | ) 273 | p4_pd.processEntry18_table_add_with_processentry18( 274 | p4_pd.processEntry18_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 275 | ) 276 | p4_pd.processEntry18_table_add_with_noequ0_processentry18( 277 | p4_pd.processEntry18_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 278 | ) 279 | 
p4_pd.processEntry19_table_add_with_processentry19( 280 | p4_pd.processEntry19_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 281 | ) 282 | p4_pd.processEntry19_table_add_with_noequ0_processentry19( 283 | p4_pd.processEntry19_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 284 | ) 285 | p4_pd.processEntry20_table_add_with_processentry20( 286 | p4_pd.processEntry20_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 287 | ) 288 | p4_pd.processEntry20_table_add_with_noequ0_processentry20( 289 | p4_pd.processEntry20_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 290 | ) 291 | p4_pd.processEntry21_table_add_with_processentry21( 292 | p4_pd.processEntry21_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 293 | ) 294 | p4_pd.processEntry21_table_add_with_noequ0_processentry21( 295 | p4_pd.processEntry21_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 296 | ) 297 | p4_pd.processEntry22_table_add_with_processentry22( 298 | p4_pd.processEntry22_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 299 | ) 300 | p4_pd.processEntry22_table_add_with_noequ0_processentry22( 301 | p4_pd.processEntry22_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 302 | ) 303 | p4_pd.processEntry23_table_add_with_processentry23( 304 | p4_pd.processEntry23_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 305 | ) 306 | p4_pd.processEntry23_table_add_with_noequ0_processentry23( 307 | p4_pd.processEntry23_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 308 | ) 309 | p4_pd.processEntry24_table_add_with_processentry24( 310 | p4_pd.processEntry24_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 311 | ) 312 | p4_pd.processEntry24_table_add_with_noequ0_processentry24( 313 | p4_pd.processEntry24_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 314 | ) 315 | p4_pd.processEntry25_table_add_with_processentry25( 316 | p4_pd.processEntry25_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 317 | ) 318 | 
p4_pd.processEntry25_table_add_with_noequ0_processentry25( 319 | p4_pd.processEntry25_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 320 | ) 321 | p4_pd.processEntry26_table_add_with_processentry26( 322 | p4_pd.processEntry26_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 323 | ) 324 | p4_pd.processEntry26_table_add_with_noequ0_processentry26( 325 | p4_pd.processEntry26_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 326 | ) 327 | p4_pd.processEntry27_table_add_with_processentry27( 328 | p4_pd.processEntry27_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 329 | ) 330 | p4_pd.processEntry27_table_add_with_noequ0_processentry27( 331 | p4_pd.processEntry27_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 332 | ) 333 | p4_pd.processEntry28_table_add_with_processentry28( 334 | p4_pd.processEntry28_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 335 | ) 336 | p4_pd.processEntry28_table_add_with_noequ0_processentry28( 337 | p4_pd.processEntry28_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 338 | ) 339 | p4_pd.processEntry29_table_add_with_processentry29( 340 | p4_pd.processEntry29_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 341 | ) 342 | p4_pd.processEntry29_table_add_with_noequ0_processentry29( 343 | p4_pd.processEntry29_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 344 | ) 345 | p4_pd.processEntry30_table_add_with_processentry30( 346 | p4_pd.processEntry30_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 347 | ) 348 | p4_pd.processEntry30_table_add_with_noequ0_processentry30( 349 | p4_pd.processEntry30_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 350 | ) 351 | p4_pd.processEntry31_table_add_with_processentry31( 352 | p4_pd.processEntry31_match_spec_t(hex_to_i32(0), hex_to_i32(0xFFFFFFFF)), 1, 353 | ) 354 | p4_pd.processEntry31_table_add_with_noequ0_processentry31( 355 | p4_pd.processEntry31_match_spec_t(hex_to_i32(0), hex_to_i32(0x00000000)), 1 356 | ) 357 | try: 358 | # TODO: 
understand it 359 | # dont know why, but if group = input port, 360 | # then the packet followed by that packet will execute multicast 361 | # therefore make it 20, no 20th port is used. 362 | mcg_all = mc.mgrp_create(999) 363 | mcg1 = mc.mgrp_create(998) 364 | # mcg2 = mc.mgrp_create(997) 365 | # mcg3 = mc.mgrp_create(996) 366 | except: 367 | print """ 368 | clean_all() does not yet support cleaning the PRE programming. 369 | You need to restart the driver before running this script for the second time 370 | """ 371 | quit() 372 | 373 | node_all = mc.node_create( 374 | rid=999, 375 | #port_map=devports_to_mcbitmap([56,48,40,32,24,16,8,0]), 376 | # port_map=devports_to_mcbitmap([port_of_worker[2], port_of_worker[3], port_of_worker[4],]), 377 | #port_map=devports_to_mcbitmap([36,44]), 378 | port_map=devports_to_mcbitmap([60,52,44,36]), 379 | lag_map=lags_to_mcbitmap(([])) 380 | ) 381 | mc.associate_node(mcg_all, node_all, xid=0, xid_valid=False) 382 | 383 | #[52, 60, 36, 44, 12, 4, 20] 384 | 385 | node1 = mc.node_create( 386 | rid=998, 387 | # Not multicast to "0" ( 0 as bg traffic ) 388 | #port_map=devports_to_mcbitmap([56,48,40,32,24,16,8]), 389 | port_map=devports_to_mcbitmap([8,16,24,32]), 390 | #port_map=devports_to_mcbitmap([52,60,0]), 391 | lag_map=lags_to_mcbitmap(([])) 392 | ) 393 | mc.associate_node(mcg1, node1, xid=0, xid_valid=False) 394 | 395 | 396 | # node2 = mc.node_create( 397 | # rid=997, 398 | # # Not multicast to "0" ( 0 as bg traffic ) 399 | # #port_map=devports_to_mcbitmap([56,48,40,32,24,16,8]), 400 | # #port_map=devports_to_mcbitmap([56,48,40]), 401 | # port_map=devports_to_mcbitmap([60,0]), 402 | # lag_map=lags_to_mcbitmap(([])) 403 | # ) 404 | # mc.associate_node(mcg2, node2, xid=0, xid_valid=False) 405 | 406 | ''' 407 | node2 = mc.node_create( 408 | rid=997, 409 | # Not multicast to "0" ( 0 as bg traffic ) 410 | # port_map=devports_to_mcbitmap([56,48,40,32,24,16,8]), 411 | port_map=devports_to_mcbitmap([24,16,8]), 412 | 
lag_map=lags_to_mcbitmap(([])) 413 | ) 414 | mc.associate_node(mcg2, node2, xid=0, xid_valid=False) 415 | ''' 416 | 417 | conn_mgr.complete_operations() 418 | 419 | def hex_to_i32(h): 420 | x = int(h, 0) 421 | if (x > 0xFFFFFFFF): 422 | raise UIn_Error("Integer cannot fit within 32 bits") 423 | if (x > 0x7FFFFFFF): x-= 0x100000000 424 | return x 425 | 426 | import time 427 | 428 | 429 | used_aggr = 450 430 | all_start = time.time() 431 | last_time1 = all_start 432 | last_time2 = all_start 433 | 434 | num_com = 0 435 | average_total = 0 436 | average_appID_map = {} 437 | 438 | tmp = [] 439 | appID_map = {} 440 | 441 | while (1): 442 | 443 | now = time.time() 444 | time_count1 = now - last_time1 445 | time_count2 = now - last_time2 446 | if (time_count1 > 0.01): 447 | last_time1 = now 448 | num_com += 1 449 | 450 | del tmp[:] 451 | appID_map.clear() 452 | total_used = 0 453 | #timetmp1 = time.time() 454 | p4_pd.register_hw_sync_appID_and_Seq() 455 | 456 | for i in range(used_aggr): 457 | tmp.append(p4_pd.register_read_appID_and_Seq(i , p4_pd.register_flags_t(False))) 458 | #timetmp2 = time.time() 459 | #print timetmp2-timetmp1 460 | #print "hello" 461 | for appID_and_Seq in tmp: 462 | appID = (appID_and_Seq[0]>>16) 463 | #print "appID", appID 464 | if appID != 0: 465 | if appID in appID_map: 466 | appID_map[appID] += 1 467 | else: 468 | appID_map[appID] = 1 469 | 470 | if appID in average_appID_map: 471 | average_appID_map[appID] += 1 472 | else: 473 | average_appID_map[appID] = 1 474 | 475 | for key, value in appID_map.items(): 476 | total_used += value 477 | #print "appID{0} {1}/{2} {3}".format(key, value, used_aggr, 1.0*value/used_aggr) 478 | average_total += total_used 479 | 480 | #if total_used != 0: 481 | #print "total_used {0}/{1} {2}".format(total_used, used_aggr, 1.0*total_used/used_aggr) 482 | time.sleep(0.09) 483 | 484 | if (time_count2 > 0.5): 485 | last_time2 = now 486 | for key, value in average_appID_map.items(): 487 | print "appID[{0}] {1}/{2} {3:.2f} 
%".format(key, value/num_com, used_aggr, 100.0*value/num_com/used_aggr) 488 | 489 | if average_total > 0: 490 | print "time {3:.1f} total_used {0}/{1} {2:.1f} %".format(average_total/num_com, used_aggr, 100.0*average_total/num_com/used_aggr, now - all_start) 491 | 492 | num_com = 0 493 | average_total = 0 494 | average_appID_map.clear() 495 | 496 | ''' 497 | while (1): 498 | 499 | now = time.time() 500 | time_count1 = now - last_time1 501 | time_count2 = now - last_time2 502 | if (time_count1 > 0.01): 503 | lasttime1=now 504 | num_com += 1 505 | 506 | del tmp[:] 507 | appID_map.clear() 508 | total_used = 0 509 | timetmp1 = time.time() 510 | p4_pd.register_hw_sync_appID_and_Seq() 511 | 512 | for i in range(used_aggr): 513 | tmp.append(p4_pd.register_read_appID_and_Seq(i , p4_pd.register_flags_t(False))) 514 | timetmp2 = time.time() 515 | time.sleep(0.5 - (timetmp2-timetmp1)) 516 | #print timetmp2-timetmp1 517 | #print "hello" 518 | for appID_and_Seq in tmp: 519 | appID = (appID_and_Seq[0]>>16) 520 | #print "appID", appID 521 | if appID != 0: 522 | if appID in appID_map: 523 | appID_map[appID] += 1 524 | else: 525 | appID_map[appID] = 1 526 | 527 | if appID in average_appID_map: 528 | average_appID_map[appID] += 1 529 | else: 530 | average_appID_map[appID] = 1 531 | 532 | for key, value in appID_map.items(): 533 | total_used += value 534 | #print "appID{0} {1}/{2} {3}".format(key, value, used_aggr, 1.0*value/used_aggr) 535 | average_total += total_used 536 | 537 | #if total_used != 0: 538 | #print "total_used {0}/{1} {2}".format(total_used, used_aggr, 1.0*total_used/used_aggr) 539 | #time.sleep(0.1) 540 | 541 | if (time_count2 > 1.0): 542 | last_time2 = now 543 | for key, value in average_appID_map.items(): 544 | print "appID[{0}] {1}/{2} {3:.2f} %".format(key, value/num_com, used_aggr, 100.0*value/num_com/used_aggr) 545 | 546 | if average_total > 0: 547 | print "time {3:.0f} total_used {0}/{1} {2:.1f} %".format(average_total/num_com, used_aggr, 
100.0*average_total/num_com/used_aggr, now - all_start) 548 | 549 | num_com = 0 550 | average_total = 0 551 | average_appID_map.clear() 552 | 553 | ''' 554 | 555 | 556 | 557 | -------------------------------------------------------------------------------- /server/Makefile: -------------------------------------------------------------------------------- 1 | 2 | # All Target 3 | all: 4 | g++ -std=c++11 -O3 -g -c -o ParameterServer.o ParameterServer.cc 5 | g++ -std=c++11 -O3 -g -c -o ../common/dma_common.o ../common/dma_common.cc 6 | g++ -std=c++11 -O3 -g -c -o ../common/HashTable.o ../common/HashTable.cc 7 | g++ -std=c++11 -O3 -g -o app ParameterServer.o ../common/HashTable.o ../common/dma_common.o -lpthread -libverbs 8 | 9 | 10 | # Clean Target 11 | clean: 12 | rm *.o 13 | rm app 14 | -------------------------------------------------------------------------------- /server/ParameterServer.cc: -------------------------------------------------------------------------------- 1 | #include "ParameterServer.h" 2 | 3 | tensor_context *tensors; 4 | 5 | int max_agtr_size_per_thread; 6 | int UsedSwitchAGTRcount = MAX_AGTR_COUNT; 7 | std::mutex _dma_mutex; 8 | struct ibv_device **dev_list; 9 | struct ibv_device *ib_dev; 10 | ThreadPool* workQueue; 11 | std::mutex __print_mutex; 12 | std::mutex _init_mutex; 13 | int num_thread; 14 | int print_count = 0; 15 | int appID; 16 | 17 | 18 | 19 | 20 | long long int receive_in_sec[20] = {0}; 21 | bool receive_byte_reset_flag[20] = {0}; 22 | 23 | bool is_completed_p4ml_key[1024000] = {0}; 24 | 25 | int next_agtr[MAX_AGTR_COUNT] = {-1}; 26 | HashTable* hash_table; 27 | 28 | int packet_full_count = 0; 29 | int packet_partial_count = 0; 30 | int packet_all_forward_count = 0; 31 | int packet_partial_total_count = 0; 32 | 33 | #define MAX_MEASUREMENT_KEY 12000 34 | int full_packet_count[MAX_MEASUREMENT_KEY][16518] = { 0 }; 35 | int resend_packet_count[MAX_MEASUREMENT_KEY][16518] = { 0 }; 36 | 37 | 38 | DMAcontext** global_dma_contexts; 39 | 
40 | void main_receive_packet_loop(DMAcontext* dma_context, int thread_id) { 41 | int msgs_completed = 0; 42 | int this_pos_to_send = 0; 43 | int total_last_tensor_packet = 0; 44 | int imm_pos_to_send = dma_context->my_send_queue_length / 2; 45 | bool app_init[MAX_APP_PER_THREAD + 1] = {0}; // appID starts from 1, so size must be MAX_APP_PER_THREAD + 1 46 | 47 | /* Loss */ 48 | int loss = 0; 49 | 50 | int rand_index = 0; 51 | int total_loss = 0; 52 | 53 | // app start from 1 54 | int* tensors_pos_of_app = new int[MAX_APP_PER_THREAD + 1]; 55 | for (int i = 1; i <= MAX_APP_PER_THREAD; i++) { 56 | tensors_pos_of_app[i] = thread_id * MAX_STORAGE_PER_APP_PER_THREAD * MAX_APP_PER_THREAD + (i - 1) * MAX_STORAGE_PER_APP_PER_THREAD; 57 | } 58 | 59 | while (1) { 60 | 61 | cqe_snapshot_t cur_snapshot; 62 | msgs_completed = 0; 63 | 64 | std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now(); 65 | while(1) { 66 | 67 | // if (receive_byte_reset_flag[thread_id]) { 68 | // receive_in_sec[thread_id] = 0; 69 | // receive_byte_reset_flag[thread_id] = false; 70 | // } 71 | 72 | std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now(); 73 | std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1); 74 | 75 | msgs_completed = receive_packet(dma_context, &cur_snapshot); 76 | if (msgs_completed) { 77 | break; 78 | } 79 | if (time_span.count() > 10.0 && msgs_completed == 0 && dma_context->total_received > 0) { 80 | std::lock_guard<std::mutex> lock(_dma_mutex); 81 | fprintf(stderr, "Timeout happened at this thread_id=%d, total_received=%d, total_sent=%d, last_ACK=%d, total_last_tensor_packet_recv=%d\n", 82 | thread_id, global_dma_contexts[thread_id]->total_received, global_dma_contexts[thread_id]->total_sent, tensors[tensors_pos_of_app[1]].window_manager[0].last_ACK, total_last_tensor_packet); 83 | for (int i = 0; i < num_thread; i++) 84 | fprintf(stderr, "Timeout happened at thread_id=%d, total_received=%d, total_sent=%d\n", i, global_dma_contexts[i]->total_received, 
global_dma_contexts[i]->total_sent); 85 | 86 | for (uint64_t i = 0; i < MAX_MEASUREMENT_KEY; i++) { 87 | for (uint16_t j = 1; j <= ceil((float)MAX_TENSOR_SIZE/MAX_ENTRIES_PER_PACKET); j++) { 88 | if (full_packet_count[i][j]) { 89 | packet_full_count++; 90 | } else if (resend_packet_count[i][j]) { 91 | packet_partial_count++; 92 | packet_partial_total_count += resend_packet_count[i][j]; 93 | } else { 94 | packet_all_forward_count++; 95 | // printf("i:%d, j:%d\n", i, j); 96 | } 97 | } 98 | } 99 | printf("%d, %d, %d, %d\n", packet_full_count, packet_partial_count, packet_all_forward_count, packet_partial_total_count); 100 | 101 | int seen_agtrs = 0; 102 | for (int i = 0; i < MAX_AGTR_COUNT; i++) 103 | if (hash_table->isAlreadyDeclare[i]) 104 | seen_agtrs++; 105 | printf("Seen agtrs: %d\n", seen_agtrs); 106 | 107 | exit(-1); 108 | } 109 | } 110 | 111 | int to_be_sent = 0; 112 | if (this_pos_to_send + max_agtr_size_per_thread + max_agtr_size_per_thread > dma_context->my_send_queue_length / 2) 113 | this_pos_to_send = 0; 114 | 115 | // printf("%d packets received.\n", msgs_completed); 116 | for(int msg=0; msg < msgs_completed; msg++) { 117 | // std::chrono::high_resolution_clock::time_point packet_start = std::chrono::high_resolution_clock::now(); 118 | uint8_t* buf = &dma_context->mp_recv_ring[dma_context->ring_head * kAppRingMbufSize]; 119 | 120 | agghdr* p4ml_header = reinterpret_cast<agghdr*>(buf + IP_ETH_UDP_HEADER_SIZE); 121 | 122 | //check ecn mark 123 | // bool is_ecn_mark_packet = p4ml_header->flag & 0x08; 124 | // if (is_ecn_mark_packet) 125 | // printf("ECN mark found.\n"); 126 | if (DEBUG_PRINT_ALL_RECEIVING_PACKET) 127 | p4ml_header_print_h(p4ml_header, "Receive"); 128 | 129 | bool isTerminated_packet = p4ml_header->flag & 0x02; 130 | bool isResend_packet = p4ml_header->flag & 0x04; 131 | bool isOverflow_packet = p4ml_header->flag & 0x80; 132 | 133 | // exit(1); 134 | p4ml_header_ntoh(p4ml_header); 135 | /* Move AppID index */ 136 | int appID = p4ml_header->appID; 
137 | if (!app_init[appID]) { 138 | app_init[appID] = true; 139 | } else { 140 | if (p4ml_header->key != tensors[tensors_pos_of_app[appID]].key && tensors[tensors_pos_of_app[appID]].isCompleted) { 141 | // p4ml_header_print(p4ml_header, "ERROR PACKET"); 142 | // printf("tensors_pos_of_app[appID] from %d to %d\n", tensors_pos_of_app[appID], tensors_pos_of_app[appID]+1); 143 | tensors_pos_of_app[appID]++; 144 | if (tensors_pos_of_app[appID] == thread_id * MAX_APP_PER_THREAD * MAX_STORAGE_PER_APP_PER_THREAD + MAX_STORAGE_PER_APP_PER_THREAD * (appID)) 145 | tensors_pos_of_app[appID] = tensors_pos_of_app[appID] - MAX_STORAGE_PER_APP_PER_THREAD; 146 | } 147 | } 148 | 149 | if (!hash_table->isAlreadyDeclare[p4ml_header->agtr] && !isResend_packet) 150 | hash_table->isAlreadyDeclare[p4ml_header->agtr] = true; 151 | 152 | 153 | /* Check if Collision packet */ 154 | bool is_collision_packet = p4ml_header->flag & 0x02; 155 | bool is_lzy_collision = p4ml_header->is_lzy_Col & 0x01; 156 | 157 | 158 | 159 | if(is_collision_packet || isResend_packet){ 160 | tensors[tensors_pos_of_app[appID]].isCollision[p4ml_header->seq_num] = true; 161 | } 162 | 163 | int my_tensors_pos = tensors_pos_of_app[appID]; 164 | 165 | check_tensor_available(&tensors[my_tensors_pos], p4ml_header, thread_id); 166 | 167 | // char * eth_ip_header = (char*) dma_context->send_region + wc_recv_id * ENTRY_SIZE; 168 | // uint8_t swap[6]; 169 | // for (int i = 0; i < 6; i++) { 170 | // swap[i] = eth_ip_header[i]; 171 | // eth_ip_header[i] = eth_ip_header[i+6]; 172 | // eth_ip_header[i+6] = swap[i]; 173 | // } 174 | 175 | if (OVERFLOW_HANDLE) { 176 | // Check Switch Overflow but not Host Overflow 177 | if (!isOverflow_packet) 178 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 179 | if (p4ml_header->vector[i] == INT32_MAX || p4ml_header->vector[i] == INT32_MIN) 180 | { 181 | if (p4ml_header->vector[i] == INT32_MIN) 182 | p4ml_header_print(p4ml_header, "Switch Overflow"); 183 | isOverflow_packet = true; 184 | } 
185 | 186 | // p4ml_header_print(p4ml_header, "Receive"); 187 | if (isOverflow_packet) { 188 | /* Clean Integer Data */ 189 | if (!tensors[my_tensors_pos].isFloat[p4ml_header->seq_num]) { 190 | // printf("ReadyForFloat\n"); 191 | makeTensorReadyforFloat(p4ml_header, &tensors[my_tensors_pos]); 192 | tensors[my_tensors_pos].isFloat[p4ml_header->seq_num] = true; 193 | } 194 | } 195 | 196 | /* Floating point request packet */ 197 | bool sendFloatRequest = false; 198 | if (isOverflow_packet && !isResend_packet) 199 | sendFloatRequest = true; 200 | if (!isOverflow_packet && isResend_packet && tensors[my_tensors_pos].isFloat[p4ml_header->seq_num]) 201 | sendFloatRequest = true; 202 | 203 | if (sendFloatRequest) { 204 | /* Do floating point request */ 205 | /* Send back request to everyone immediately */ 206 | p4ml_header_hton_without_data(p4ml_header); 207 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE), (char*) buf + IP_ETH_UDP_HEADER_SIZE, P4ML_LAYER_SIZE); 208 | /* then send ACK */ 209 | p4ml_header_setACK((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 210 | p4ml_header_setOverflowRequest((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 211 | p4ml_header_resetIndex((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 212 | 213 | // p4ml_header_print_h((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE)), "Overflow Sendback PACKET"); 214 | send_packet(dma_context, P4ML_LAYER_SIZE, imm_pos_to_send); 215 | imm_pos_to_send++; 216 | if (imm_pos_to_send == dma_context->my_send_queue_length - 1) 217 | imm_pos_to_send = dma_context->my_send_queue_length / 2 + 1; 218 | 219 | /* Push Back */ 220 | dma_postback(dma_context); 221 | continue; 222 | } 223 | } 224 | 225 | /* Check Full Packet */ 226 | bool isFullPacket = (1 << p4ml_header->num_worker) - 1 == p4ml_header->bitmap? 
1:0; 227 | 228 | 229 | if (receive_byte_reset_flag[thread_id]) { 230 | //receive_in_sec[thread_id] = 0; 231 | //receive_PS[thread_id] = 0; 232 | // receive_Aggr[thread_id] = 0; 233 | // receive_Total_Per_Task[thread_id] = 0; 234 | 235 | receive_byte_reset_flag[thread_id] = false; 236 | } 237 | 238 | /* if full packet, update directly. */ 239 | if (isFullPacket) { 240 | // printf("%d: full packet - seq %d update model.\n", p4ml_header->key, p4ml_header->seq_num); 241 | updateModel_force(p4ml_header, &tensors[my_tensors_pos]); 242 | for (int i = 0; i < p4ml_header->num_worker; i++) 243 | tensors[my_tensors_pos].window_manager[i].UpdateWindow(&p4ml_header->seq_num); 244 | 245 | if (p4ml_header->key < MAX_MEASUREMENT_KEY) { 246 | if (isResend_packet) { 247 | resend_packet_count[p4ml_header->key][p4ml_header->seq_num]++; 248 | } else { 249 | full_packet_count[p4ml_header->key][p4ml_header->seq_num]++; 250 | } 251 | } 252 | } else { 253 | 254 | 255 | bool type_consistent = false; 256 | if (tensors[my_tensors_pos].isFloat[p4ml_header->seq_num] && isOverflow_packet) 257 | type_consistent = true; 258 | if (!tensors[my_tensors_pos].isFloat[p4ml_header->seq_num] && !isOverflow_packet) 259 | type_consistent = true; 260 | 261 | if (type_consistent) { 262 | 263 | if (p4ml_header->key < MAX_MEASUREMENT_KEY) { 264 | if (isResend_packet) 265 | resend_packet_count[p4ml_header->key][p4ml_header->seq_num]++; 266 | } 267 | // printf("seq %d Partial packet receive.\n", p4ml_header->seq_num); 268 | // p4ml_header_print(p4ml_header, "Partial PACKET"); 269 | int valid_bit = 1; 270 | bool need_to_update = true; 271 | // check if update is needed 272 | for (int i = 0; i < p4ml_header->num_worker; i++) { 273 | if (valid_bit & p4ml_header->bitmap) { 274 | if (tensors[my_tensors_pos].window_manager[i].isACKed[p4ml_header->seq_num]) { 275 | // p4ml_header_print(p4ml_header, "ERROR PACKET"); 276 | // printf("[thread %d][worker %d]'s gredient is already integrated in PS, %d.\n", thread_id, i, 
p4ml_header->seq_num); 277 | need_to_update = false; 278 | break; 279 | } 280 | } 281 | valid_bit <<= 1; 282 | } 283 | 284 | if (need_to_update) { 285 | // printf("need to update\n"); 286 | int valid_bit = 1; 287 | for (int i = 0; i < p4ml_header->num_worker; i++) { 288 | if (valid_bit & p4ml_header->bitmap) { 289 | // TODO: Update Window will cause BUG, to be fix (floating point need reset ACK) 290 | tensors[my_tensors_pos].window_manager[i].UpdateWindow(&p4ml_header->seq_num); 291 | } 292 | valid_bit <<= 1; 293 | } 294 | updateModel(p4ml_header, &tensors[my_tensors_pos], isOverflow_packet); 295 | } 296 | 297 | } 298 | } 299 | // if any of the worker doesn't complete slot 300 | bool is_slot_completed = true; 301 | for (int i = 0; i < p4ml_header->num_worker; i++) 302 | if (!tensors[my_tensors_pos].window_manager[i].isACKed[p4ml_header->seq_num]) 303 | is_slot_completed = false; 304 | // printf("packet receive %d\n", p4ml_header->seq_num); 305 | if (is_slot_completed) { 306 | p4ml_header->bitmap = 1; 307 | 308 | uint16_t new_agtr; 309 | 310 | 311 | 312 | 313 | if (tensors[my_tensors_pos].isCollision[p4ml_header->seq_num] == true) { 314 | // Check if new agtr is already hashed 315 | if (next_agtr[p4ml_header->agtr] == -1) { 316 | int new_hash_agtr = hash_table->HashNew_predefine(); 317 | // if get any of AGTR from hash 318 | if (new_hash_agtr != -1) { 319 | new_agtr = new_hash_agtr; 320 | next_agtr[p4ml_header->agtr] = new_agtr; 321 | hash_table->hash_map[p4ml_header->agtr] = new_agtr; 322 | // printf("old: %d -> new: %d\n", p4ml_header->agtr, new_agtr); 323 | } else { 324 | // if all of the AGTR is used, full 325 | // keep original AGTR 326 | // printf("Change Agtr fail, full.\n"); 327 | new_agtr = p4ml_header->agtr; 328 | } 329 | } else { 330 | //TODO: Separate APP 331 | new_agtr = next_agtr[p4ml_header->agtr]; 332 | // printf("New hash - already: %d\n", new_agtr); 333 | // printf("[hashed] old: %d -> new: %d\n", p4ml_header->agtr, new_agtr); 334 | } 335 | 336 | 
p4ml_header_setLengthFieldToAgtr(p4ml_header, new_agtr); 337 | p4ml_header_setCollisionBit(p4ml_header); 338 | } else { 339 | p4ml_header_resetCollisionBit(p4ml_header); 340 | } 341 | 342 | int offset = (p4ml_header->seq_num - 1) * MAX_ENTRIES_PER_PACKET; 343 | 344 | p4ml_header_hton_without_data(p4ml_header); 345 | 346 | if (!isOverflow_packet) 347 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 348 | tensors[my_tensors_pos].data.data_int[offset + i] = htonl(tensors[my_tensors_pos].data.data_int[offset + i]); 349 | 350 | // /* Give higher priority to Resend packet */ 351 | if (isResend_packet) { 352 | 353 | // TODO: PACKET LOSS HANDLING FOR DOUBLE PACKET 354 | // printf("Immediately send back Resend packet %d\n", ntohl(p4ml_header->seq_num)); 355 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE), (char*) buf + IP_ETH_UDP_HEADER_SIZE, P4ML_HEADER_SIZE - 12); 356 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE) + P4ML_HEADER_SIZE - 12, tensors[my_tensors_pos].data.data_int + offset, P4ML_DATA_SIZE); 357 | memcpy((char*) dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE) + 14 + P4ML_DATA_SIZE, (char*) buf + IP_ETH_UDP_HEADER_SIZE + P4ML_DATA_SIZE + 14, 12); 358 | /* then send ACK */ 359 | p4ml_header_setACK((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 360 | p4ml_header_resetIndex((agghdr*)((char*)dma_context->send_region + (imm_pos_to_send * P4ML_LAYER_SIZE))); 361 | 362 | send_packet(dma_context, P4ML_LAYER_SIZE, imm_pos_to_send); 363 | imm_pos_to_send++; 364 | if (imm_pos_to_send == dma_context->my_send_queue_length - 1) 365 | imm_pos_to_send = dma_context->my_send_queue_length / 2 + 1; 366 | 367 | } else { 368 | memcpy((char*) dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE, (char*) buf + IP_ETH_UDP_HEADER_SIZE, P4ML_HEADER_SIZE - 12); 369 | memcpy((char*) dma_context->send_region + (this_pos_to_send + to_be_sent) * 
P4ML_LAYER_SIZE + P4ML_HEADER_SIZE - 12, tensors[my_tensors_pos].data.data_int + offset, P4ML_DATA_SIZE); 370 | memcpy((char*) dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE + 14 + P4ML_DATA_SIZE, (char*) buf + IP_ETH_UDP_HEADER_SIZE + P4ML_DATA_SIZE + 14, 12); 371 | /* then send ACK */ 372 | p4ml_header_setACK((agghdr*)((char*)dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE)); 373 | p4ml_header_resetIndex((agghdr*)((char*)dma_context->send_region + (this_pos_to_send + to_be_sent) * P4ML_LAYER_SIZE)); 374 | 375 | to_be_sent++; 376 | } 377 | // printf("to_be_sent: %d\n", to_be_sent); 378 | 379 | if (tensors[tensors_pos_of_app[appID]].num_worker > 0) { 380 | bool this_tensor_finished = true; 381 | for (int i = 0; i < tensors[tensors_pos_of_app[appID]].num_worker; i++) 382 | if (tensors[tensors_pos_of_app[appID]].window_manager[i].last_ACK < tensors[tensors_pos_of_app[appID]].window_manager[i].total_ACK) 383 | this_tensor_finished = false; 384 | 385 | if (this_tensor_finished && !tensors[tensors_pos_of_app[appID]].isCompleted) { 386 | // printf("[Thread %d] Tensor %d at %d Completed.\n", thread_id, tensors[tensors_pos_of_app[appID]].key, tensors_pos_of_app[appID]); 387 | tensors[tensors_pos_of_app[appID]].isCompleted = true; 388 | rand_index = 0; 389 | // dma_context->total_received = 0; 390 | // dma_context->total_sent = 0; 391 | } 392 | } 393 | } 394 | 395 | /* Push Back */ 396 | dma_postback(dma_context); 397 | } 398 | 399 | dma_update_snapshot(dma_context, cur_snapshot); 400 | 401 | if (msgs_completed < 0) { 402 | printf("Polling error\n"); 403 | exit(1); 404 | } 405 | 406 | if (msgs_completed > 0) { 407 | dma_context->total_received += msgs_completed; 408 | if (receive_byte_reset_flag[thread_id]) { 409 | 410 | receive_byte_reset_flag[thread_id] = false; 411 | } 412 | else{ 413 | ; 414 | 415 | } 416 | if (to_be_sent > 0) { 417 | send_packet(dma_context, P4ML_LAYER_SIZE * to_be_sent, this_pos_to_send); 418 
| } 419 | this_pos_to_send += to_be_sent; 420 | // Let's assume the last packet will not be lost 421 | } 422 | 423 | } 424 | } 425 | 426 | 427 | void Start(int thread_id) { 428 | bindingCPU((appID-1)*MAX_THREAD_PER_APP+thread_id); 429 | DMAcontext* dma_context; 430 | { 431 | std::lock_guard<std::mutex> lock(_dma_mutex); 432 | 433 | dma_context = DMA_create(ib_dev, thread_id + ((appID - 1) * MAX_THREAD_PER_APP), true); 434 | // dma_context->isSent = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1]; 435 | // dma_context->send_time = new std::chrono::high_resolution_clock::time_point[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1]; 436 | // dma_context->receive_time = new std::chrono::high_resolution_clock::time_point[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1]; 437 | global_dma_contexts[thread_id] = dma_context; 438 | } 439 | 440 | main_receive_packet_loop(dma_context, thread_id); 441 | 442 | sleep(1000); 443 | } 444 | int nic = 0; 445 | int main(int argc, char *argv[]) { 446 | appID = atoi(argv[1]); 447 | 448 | bindingCPU(19); 449 | 450 | srand(time(NULL)); 451 | // num_thread = atoi(argv[1]); 452 | 453 | 454 | // Lam: this one is for experiment, disabled temporarily 455 | // if (argv[1]) 456 | // UsedSwitchAGTRcount = atoi(argv[1]); 457 | // else 458 | // UsedSwitchAGTRcount = MAX_AGTR_COUNT; 459 | num_thread = 9; 460 | 461 | 462 | 463 | max_agtr_size_per_thread = 50; 464 | UsedSwitchAGTRcount = 450; 465 | int tempargc = 2; 466 | 467 | if (argc > 2 ) { 468 | while (tempargc < argc){ 469 | std::string option = argv[tempargc++]; 470 | 471 | if (option == "-a") { 472 | int num_agtr = atoi(argv[tempargc++]); 473 | max_agtr_size_per_thread = num_agtr; 474 | } 475 | 476 | if (option == "-aa") { 477 | int num_used_agtr = atoi(argv[tempargc++]); 478 | UsedSwitchAGTRcount = num_used_agtr; 479 | } 480 | if (option == "-th") { 481 | int num_thread_argv = atoi(argv[tempargc++]); 482 | num_thread = num_thread_argv; 483 | } 484 | 485 | if (option == "-n"){ 486 | int num_nic = 
atoi(argv[tempargc++]); 487 | nic = num_nic; 488 | } 489 | } 490 | 491 | } 492 | 493 | 494 | dev_list = ibv_get_device_list(NULL); 495 | if (!dev_list) { 496 | perror("Failed to get devices list"); 497 | exit(1); 498 | } 499 | 500 | ib_dev = dev_list[nic]; 501 | /* 502 | for (int i = 0; i < 2; i++){ 503 | ib_dev = dev_list[i]; 504 | if (strcmp(ibv_get_device_name(ib_dev),nic)==0){ 505 | break; 506 | } 507 | } 508 | */ 509 | printf("using %s\n",ibv_get_device_name(ib_dev)); 510 | //ib_dev = dev_list[0]; 511 | if (!ib_dev) { 512 | fprintf(stderr, "IB device not found\n"); 513 | exit(1); 514 | } 515 | 516 | /* Init Thread */ 517 | workQueue = new ThreadPool(num_thread, [](){}); 518 | 519 | global_dma_contexts = new DMAcontext*[num_thread]; 520 | printf("\nUsedSwitchAGTRcount: %d\n\n", UsedSwitchAGTRcount); 521 | printf("max_agtr_size_per_thread: %d\n\n", max_agtr_size_per_thread); 522 | 523 | printf("Overflow Handled: %s\n\n", OVERFLOW_HANDLE? "TRUE":"FALSE"); 524 | /* Init tensors capacity */ 525 | tensors = new tensor_context[MAX_APP_PER_THREAD * MAX_STORAGE_PER_APP_PER_THREAD * num_thread]; 526 | printf("\nTensors memory pre-allocate...\n"); 527 | for (int i = 0; i < MAX_APP_PER_THREAD * MAX_STORAGE_PER_APP_PER_THREAD * num_thread; i++) 528 | init_tensor(&tensors[i], MAX_TENSOR_SIZE); 529 | 530 | hash_table = new HashTable(UsedSwitchAGTRcount); 531 | printf("\nHash table creating...\n\n"); 532 | memset(next_agtr, -1, sizeof(int) * MAX_AGTR_COUNT); 533 | 534 | for (int i = 0; i < num_thread; i++) 535 | workQueue->enqueue(Start, i); 536 | 537 | std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now(); 538 | std::chrono::time_point<std::chrono::high_resolution_clock> timer = std::chrono::high_resolution_clock::now(); 539 | 540 | while (1) { 541 | std::chrono::time_point<std::chrono::high_resolution_clock> current_time = std::chrono::high_resolution_clock::now(); 542 | std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(current_time - timer); 543 | std::chrono::duration<double> total_time = 
std::chrono::duration_cast<std::chrono::duration<double>>(current_time - t1); 544 | if (time_span.count() >= 0.5) { 545 | // printf("############################################\n"); 546 | double total_bandwidth = 0.0; 547 | 548 | for (int i = 0; i < num_thread; i++) { 549 | // printf("[thread %d] %lf Gbps.\n", i, receive_in_sec[i] * 194.0 / 1024.0 / 1024.0 / 1024.0 * 8.0); 550 | total_bandwidth += receive_in_sec[i] * 194.0 / 1024.0 / 1024.0 / 1024.0 * 8.0 / time_span.count(); 551 | 552 | receive_byte_reset_flag[i] = true; 553 | 554 | 555 | receive_in_sec[i] = 0; 556 | } 557 | 558 | 559 | 560 | // total_sent = 0; 561 | timer = current_time; 562 | } 563 | } 564 | 565 | 566 | sleep(10000000); 567 | 568 | } 569 | -------------------------------------------------------------------------------- /server/ParameterServer.h: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | #include <string.h> 4 | #include <math.h> 5 | #include <unistd.h> 6 | #include <time.h> 7 | #include <stdint.h> 8 | #include <arpa/inet.h> 9 | #include <infiniband/verbs.h> 10 | #include <chrono> 11 | #include <mutex> 12 | #include <string> 13 | #include <thread> 14 | #include <vector> 15 | #include "../common/packet.h" 16 | #include "../common/dma_common.h" 17 | #include "../common/ThreadPool.h" 18 | #include "../common/utils.h" 19 | #include "../common/window_manager.h" 20 | #include "../common/HashTable.h" 21 | 22 | #define MAX_TENSOR_SIZE 1024000 23 | // Lam: this one is useless since a PS can only handle 1app, to be mod. 
24 | #define MAX_APP_PER_THREAD 5 25 | #define MAX_STORAGE_PER_APP_PER_THREAD 10 26 | #define MAX_WORKER 16 27 | 28 | #define MAX_THREAD_PER_APP 10 29 | 30 | #define OVERFLOW_HANDLE false 31 | 32 | 33 | union data_t { 34 | int32_t *data_int; 35 | float *data_float; 36 | }; 37 | 38 | struct tensor_context { 39 | bool* isOccupy; 40 | bool* isCollision; 41 | bool* isFloat; 42 | bool* isLZYCol; 43 | bool isCompleted; 44 | 45 | data_t data; 46 | uint32_t len; 47 | uint64_t key; 48 | uint8_t num_worker; 49 | WindowManager* window_manager; 50 | std::chrono::time_point<std::chrono::high_resolution_clock> start_time; 51 | }; 52 | 53 | void inline init_tensor(tensor_context* tensor, uint32_t len) { 54 | tensor->data.data_int = new int32_t[len](); 55 | tensor->isCompleted = true; 56 | tensor->isOccupy = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 57 | tensor->isCollision = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 58 | tensor->isLZYCol = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); //useless 59 | tensor->isFloat = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 60 | tensor->len = 0; 61 | tensor->num_worker = 0; 62 | tensor->key = 0xffffffffffffffff; 63 | tensor->window_manager = new WindowManager[MAX_WORKER]; 64 | for (int i = 0; i < MAX_WORKER; i++) { 65 | tensor->window_manager[i].isACKed = new bool[MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1](); 66 | tensor->window_manager[i].total_ACK = MAX_TENSOR_SIZE / MAX_ENTRIES_PER_PACKET + 1; 67 | } 68 | } 69 | 70 | int inline check_tensor_available(tensor_context* tensor, agghdr* p4ml_header, int thread_id) { 71 | // printf("*skey: %d, seq: %d\n", *skey, p4ml_header->seq_num); 72 | 73 | // Already have completed model and not retrieve 74 | if (tensor->isCompleted && p4ml_header->key != tensor->key) { 75 | int total_ACK = ceil((float)p4ml_header->len_tensor / MAX_ENTRIES_PER_PACKET); 76 | for (int i = 0; i < p4ml_header->num_worker; i++) 77 | tensor->window_manager[i].Reset(total_ACK); 78 | // if (thread_id == 0) 
79 | // printf("Reset tensors[%d] LAST_ACK: %d\n", *skey, tensor->window_manager[0].last_ACK); 80 | memset(tensor->data.data_int, 0, sizeof(int32_t) * p4ml_header->len_tensor); 81 | memset(tensor->isOccupy, 0, sizeof(bool) * (total_ACK + 1)); 82 | memset(tensor->isCollision, 0, sizeof(bool) * (total_ACK + 1)); 83 | memset(tensor->isFloat, 0, sizeof(bool) * (total_ACK + 1)); 84 | tensor->num_worker = p4ml_header->num_worker; 85 | tensor->len = p4ml_header->len_tensor; 86 | tensor->isCompleted = false; 87 | tensor->key = p4ml_header->key; 88 | // printf("Place %d available, real key = %d\n", *skey, tensors[*skey].key); 89 | return 1; 90 | } 91 | return 0; 92 | } 93 | 94 | void inline makeTensorReadyforFloat(agghdr *p4ml_header, tensor_context *tensor_cnt) { 95 | int32_t* data = tensor_cnt->data.data_int; 96 | uint16_t *p_seq = &p4ml_header->seq_num; 97 | int32_t *p_model = p4ml_header->vector; 98 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 99 | 100 | /* Reset Data */ 101 | memset(data + offset, 0, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 102 | tensor_cnt->isOccupy[*p_seq] = false; 103 | 104 | /* Reset Bitmap */ 105 | for (int i = 0; i < p4ml_header->num_worker; i++) { 106 | tensor_cnt->window_manager[i].isACKed[p4ml_header->seq_num] = 0; 107 | } 108 | } 109 | 110 | void inline updateModel(agghdr *p4ml_header, tensor_context *dst_place, bool isFloat) { 111 | int32_t* data = dst_place->data.data_int; 112 | uint16_t *p_seq = &p4ml_header->seq_num; 113 | uint32_t *tensor_len = &p4ml_header->len_tensor; 114 | int32_t *p_model = p4ml_header->vector; 115 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 116 | // printf("dst_place->isOccupy[%d]: %d\n", *p_seq - 1, dst_place->isOccupy[*p_seq - 1]); 117 | if (!dst_place->isOccupy[*p_seq]) { 118 | // printf("replace\n"); 119 | if (offset < *tensor_len) { 120 | if (offset + MAX_ENTRIES_PER_PACKET > *tensor_len) 121 | memcpy(data + offset, p_model, sizeof(int32_t) * (*tensor_len % 
MAX_ENTRIES_PER_PACKET)); 122 | else 123 | memcpy(data + offset, p_model, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 124 | } else { 125 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 126 | } 127 | dst_place->isOccupy[*p_seq] = true; 128 | } else { 129 | // printf("addition\n"); 130 | if (isFloat) { 131 | float* data = dst_place->data.data_float; 132 | float* p_model = (float*) p4ml_header->vector; 133 | 134 | if (offset < *tensor_len) { 135 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 136 | data[offset + i] += p_model[i]; 137 | } else { 138 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 139 | } 140 | } else { 141 | if (offset < *tensor_len) { 142 | for (int i = 0; i < MAX_ENTRIES_PER_PACKET; i++) 143 | data[offset + i] += p_model[i]; 144 | } else { 145 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 146 | } 147 | } 148 | } 149 | } 150 | 151 | void inline updateModel_force(agghdr *p4ml_header, tensor_context *dst_place) { 152 | int32_t* data = dst_place->data.data_int; 153 | uint16_t *p_seq = &p4ml_header->seq_num; 154 | uint32_t *tensor_len = &p4ml_header->len_tensor; 155 | int32_t *p_model = p4ml_header->vector; 156 | uint32_t offset = (*p_seq - 1) * MAX_ENTRIES_PER_PACKET; 157 | 158 | if (offset < *tensor_len) { 159 | if (offset + MAX_ENTRIES_PER_PACKET > *tensor_len) 160 | memcpy(data + offset, p_model, sizeof(int32_t) * (*tensor_len % MAX_ENTRIES_PER_PACKET)); 161 | else 162 | memcpy(data + offset, p_model, sizeof(int32_t) * MAX_ENTRIES_PER_PACKET); 163 | } else { 164 | printf("Update with offset %d > tensor length %d, something wrong.\n", offset, *tensor_len); 165 | } 166 | dst_place->isOccupy[*p_seq] = true; 167 | } -------------------------------------------------------------------------------- /server/ParameterServer.o: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/server/ParameterServer.o
--------------------------------------------------------------------------------
/server/app:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CSU-NetLab/A2TP-Eurosys2023/c69b416ea8cdfded4f85bcdc3f96b99f864dc0c1/server/app
--------------------------------------------------------------------------------
/shell/dealdata.sh:
--------------------------------------------------------------------------------
1 | awk '{if($1=="Thr") print $2}' ../datasample/job_A.txt > thr_A.txt
2 | awk '{if($1=="Thr") print $2}' ../datasample/job_B.txt > thr_B.txt
3 | awk '{if($1=="appID[1]") print $3}' ../datasample/switch.txt > occ_A.txt
4 | awk '{if($1=="appID[2]") print $3}' ../datasample/switch.txt > occ_B.txt
5 | 
6 | python gencsv.py
--------------------------------------------------------------------------------
/shell/gencsv.py:
--------------------------------------------------------------------------------
1 | import csv
2 | import itertools
3 | 
4 | 
5 | filePath = "result.csv"
6 | jobn = ['A', 'B']
7 | matrix = ['thr', 'occ']
8 | 
9 | # Read each column file, prefixed with its header label.
10 | lis = []
11 | k = 0
12 | for i in matrix:
13 |     for j in jobn:
14 |         with open(i + "_" + j + ".txt") as f:
15 |             temp = f.readlines()
16 |         lis.append([float(x.strip()) for x in temp])
17 |         lis[k].insert(0, i + "_" + j)
18 |         k = k + 1
19 | 
20 | # Columns have different lengths; pad the short ones so rows line up.
21 | rows = list(itertools.zip_longest(lis[0], lis[1], lis[2], lis[3]))
22 | 
23 | with open(filePath, "w", newline="") as f:
24 |     writer = csv.writer(f)
25 |     for row in rows:
26 |         writer.writerow(row)
--------------------------------------------------------------------------------
/shell/occ_A.txt:
--------------------------------------------------------------------------------
1 | 46.74
2 | 45.04
3 | 55.04
4 | 41.33
5 | 54.00
6 | 51.33
7 | 16.52
8 | 
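The awk filters in dealdata.sh select field 2 of the job-log lines whose first field is `Thr`, and field 3 of the switch-log lines whose first field is `appID[n]`. For reference, the same extraction can be sketched in Python; the sample log lines below are hypothetical and only mirror the field layout the awk programs imply:

```python
# Minimal Python equivalent of a dealdata.sh filter such as
#   awk '{if($1=="Thr") print $2}'
# (keep one field from every line whose first field matches).
def extract_field(lines, first_field, field_index):
    """Return the field at 0-based field_index from each line whose
    first whitespace-separated field equals first_field."""
    out = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == first_field:
            out.append(parts[field_index])
    return out

# Hypothetical job-log lines in the "Thr <value>" shape dealdata.sh assumes.
job_log = ["Thr 48.673178", "Loss 0.1", "Thr 50.194216"]
print(extract_field(job_log, "Thr", 1))  # ['48.673178', '50.194216']
```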
--------------------------------------------------------------------------------
/shell/occ_B.txt:
--------------------------------------------------------------------------------
1 | 4.37
2 | 4.15
3 | 1.70
4 | 2.81
5 | 1.19
6 | 2.00
7 | 55.93
8 | 84.89
9 | 91.33
10 | 95.33
11 | 90.81
12 | 96.96
13 | 94.96
14 | 87.19
15 | 88.74
16 | 94.96
17 | 95.19
18 | 95.48
19 | 58.22
20 | 
--------------------------------------------------------------------------------
/shell/result.csv:
--------------------------------------------------------------------------------
1 | thr_A,thr_B,occ_A,occ_B
2 | 48.673178,14.525902,46.74,4.37
3 | 50.194216,15.514576,45.04,4.15
4 | 50.118166,15.590628,55.04,1.7
5 | 50.042113,15.742732,41.33,2.81
6 | 49.966063,15.970888,54.0,1.19
7 | 50.270271,15.590628,51.33,2.0
8 | ,16.122991,16.52,55.93
9 | ,15.742732,,84.89
10 | ,15.210369,,91.33
11 | ,15.894835,,95.33
12 | ,15.894835,,90.81
13 | ,15.970888,,96.96
14 | ,15.742732,,94.96
15 | ,16.046939,,87.19
16 | ,16.199043,,88.74
17 | ,16.275095,,94.96
18 | ,16.04694,,95.19
19 | ,16.50325,,95.48
20 | ,16.275095,,58.22
21 | ,16.655353,,
22 | ,16.199043,,
23 | 
--------------------------------------------------------------------------------
/shell/thr_A.txt:
--------------------------------------------------------------------------------
1 | 48.673178
2 | 50.194216
3 | 50.118166
4 | 50.042113
5 | 49.966063
6 | 50.270271
7 | 
--------------------------------------------------------------------------------
/shell/thr_B.txt:
--------------------------------------------------------------------------------
1 | 14.525902
2 | 15.514576
3 | 15.590628
4 | 15.742732
5 | 15.970888
6 | 15.590628
7 | 16.122991
8 | 15.742732
9 | 15.210369
10 | 15.894835
11 | 15.894835
12 | 15.970888
13 | 15.742732
14 | 16.046939
15 | 16.199043
16 | 16.275095
17 | 16.046940
18 | 16.503250
19 | 16.275095
20 | 16.655353
21 | 16.199043
22 | 
--------------------------------------------------------------------------------
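The thr/occ columns above have unequal lengths (six thr_A values against twenty-one thr_B values), which is where the empty cells in result.csv come from: gencsv.py zips the columns with itertools' zip-longest, which pads the short columns, and `csv.writer` renders the padding as empty fields. A minimal sketch with abbreviated, hypothetical columns:

```python
import csv
import io
import itertools

# Abbreviated columns in the gencsv.py layout: a header label
# followed by the values read from thr_A.txt / thr_B.txt.
thr_A = ["thr_A", 48.673178, 50.194216]
thr_B = ["thr_B", 14.525902, 15.514576, 15.590628]

# zip_longest pads the shorter column with None; csv.writer then
# writes None as an empty cell, producing the ragged result.csv rows.
rows = list(itertools.zip_longest(thr_A, thr_B))

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
# thr_A,thr_B
# 48.673178,14.525902
# 50.194216,15.514576
# ,15.590628
```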