├── report
│ ├── thesis.pdf
│ ├── figures
│ │ ├── mask.pdf
│ │ ├── delay.pdf
│ │ ├── delay2.pdf
│ │ ├── delay4.pdf
│ │ ├── vision.pdf
│ │ ├── MCT_loss.pdf
│ │ ├── overview.pdf
│ │ ├── MCT_valloss.pdf
│ │ ├── complex_topo.pdf
│ │ ├── declaration.pdf
│ │ ├── prev_vision.pdf
│ │ ├── simple_topo.pdf
│ │ ├── simulation.pdf
│ │ ├── MCT_trainloss.pdf
│ │ ├── eth-nsg-header.pdf
│ │ ├── simple_topo_ft.pdf
│ │ ├── slidingwindow.pdf
│ │ ├── architecture_bert.pdf
│ │ ├── architecture_ntt.pdf
│ │ ├── architecture_vit.pdf
│ │ ├── delay_Receiver0.pdf
│ │ ├── delay_Receiver1.pdf
│ │ ├── delay_Receiver2.pdf
│ │ ├── delay_Receivers.pdf
│ │ ├── queue_profile_A.pdf
│ │ ├── simulation_small.pdf
│ │ ├── SE_trend_arima_10000.pdf
│ │ ├── SE_trend_arima_30000.pdf
│ │ ├── architecture_transformer.pdf
│ │ ├── finetune_mct_loss_comparison.pdf
│ │ ├── finetune_mct_loss_comparison_agg.pdf
│ │ ├── simple_topo.drawio
│ │ ├── mask.drawio
│ │ ├── slidingwindow.drawio
│ │ └── complex_topo.drawio
│ ├── README.md
│ ├── Makefile
│ ├── abstract.tex
│ ├── summary.tex
│ ├── thesis.tex
│ ├── .gitignore
│ ├── introduction.tex
│ ├── appendix.tex
│ ├── outlook.tex
│ └── background.tex
├── presentation
│ ├── slides.pdf
│ ├── figures
│ │ ├── delay.pdf
│ │ ├── vision.pdf
│ │ ├── eth_logo.pdf
│ │ ├── nsg_logo.pdf
│ │ ├── questions.pdf
│ │ ├── simple_topo.pdf
│ │ ├── complex_topo.pdf
│ │ ├── architecture_ntt.pdf
│ │ ├── eth-nsg-header.pdf
│ │ ├── queue_profile_A.pdf
│ │ ├── simple_topo_ft.pdf
│ │ ├── finetune_mct_loss_comparison.pdf
│ │ └── finetune_mct_loss_comparison_agg.pdf
│ ├── README.md
│ ├── Makefile
│ └── .gitignore
├── .gitmodules
├── literature
│ ├── README.md
│ └── Literature.html
├── workspace
│ ├── NetworkSimulators
│ │ ├── ns3
│ │ │ ├── cptodocker.sh
│ │ │ ├── cpfromdocker.sh
│ │ │ ├── dockerns3.sh
│ │ │ ├── tracing.cc
│ │ │ ├── newnet.cc
│ │ │ └── tcpapplication.cc
│ │ └── memento
│ │   ├── run_topo_small.sh
│ │   ├── run_topo.sh
│ │   ├── experiment-tags.h
│ │   ├── eval.py
│ │   ├── cdf-application.h
│ │   └── cdf-application.cc
│ ├── Dockerfile
│ ├── docker-run.sh
│ ├── TransformerModels
│ │ ├── configs
│ │ │ ├── config-linear.yaml
│ │ │ ├── config-lstm.yaml
│ │ │ ├── config-encoder.yaml
│ │ │ ├── config-transformer.yaml
│ │ │ └── config-encoder-test.yaml
│ │ ├── run.sh
│ │ ├── plot_losses.py
│ │ ├── arima.py
│ │ ├── mct_test_plots.py
│ │ └── transformer_delay.py
│ ├── requirements.txt
│ ├── PandasScripts
│ │ ├── csv_gendelays.py
│ │ └── csvhelper_memento.py
│ └── README.md
├── CITATION.cff
├── LICENSE
├── .gitlab-ci.yml
├── README.md
├── .github
│ └── workflows
│   └── codeql-analysis.yml
└── .gitignore
/report/thesis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/thesis.pdf
--------------------------------------------------------------------------------
/report/thesis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/thesis.pdf
--------------------------------------------------------------------------------
/presentation/slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/slides.pdf
--------------------------------------------------------------------------------
/report/figures/mask.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/mask.pdf
--------------------------------------------------------------------------------
/report/figures/delay.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay.pdf
--------------------------------------------------------------------------------
/report/figures/delay2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay2.pdf
--------------------------------------------------------------------------------
/report/figures/delay4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay4.pdf
--------------------------------------------------------------------------------
/report/figures/vision.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/vision.pdf
--------------------------------------------------------------------------------
/report/figures/MCT_loss.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/MCT_loss.pdf
--------------------------------------------------------------------------------
/report/figures/overview.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/overview.pdf
--------------------------------------------------------------------------------
/presentation/figures/delay.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/delay.pdf
--------------------------------------------------------------------------------
/presentation/figures/vision.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/vision.pdf
--------------------------------------------------------------------------------
/report/figures/MCT_valloss.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/MCT_valloss.pdf
--------------------------------------------------------------------------------
/report/figures/complex_topo.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/complex_topo.pdf
--------------------------------------------------------------------------------
/report/figures/declaration.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/declaration.pdf
--------------------------------------------------------------------------------
/report/figures/prev_vision.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/prev_vision.pdf
--------------------------------------------------------------------------------
/report/figures/simple_topo.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simple_topo.pdf
--------------------------------------------------------------------------------
/report/figures/simulation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simulation.pdf
--------------------------------------------------------------------------------
/presentation/figures/eth_logo.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/eth_logo.pdf
--------------------------------------------------------------------------------
/presentation/figures/nsg_logo.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/nsg_logo.pdf
--------------------------------------------------------------------------------
/report/figures/MCT_trainloss.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/MCT_trainloss.pdf
--------------------------------------------------------------------------------
/report/figures/eth-nsg-header.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/eth-nsg-header.pdf
--------------------------------------------------------------------------------
/report/figures/simple_topo_ft.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simple_topo_ft.pdf
--------------------------------------------------------------------------------
/report/figures/slidingwindow.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/slidingwindow.pdf
--------------------------------------------------------------------------------
/presentation/figures/questions.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/questions.pdf
--------------------------------------------------------------------------------
/presentation/figures/simple_topo.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/simple_topo.pdf
--------------------------------------------------------------------------------
/report/figures/architecture_bert.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_bert.pdf
--------------------------------------------------------------------------------
/report/figures/architecture_ntt.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_ntt.pdf
--------------------------------------------------------------------------------
/report/figures/architecture_vit.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_vit.pdf
--------------------------------------------------------------------------------
/report/figures/delay_Receiver0.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receiver0.pdf
--------------------------------------------------------------------------------
/report/figures/delay_Receiver1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receiver1.pdf
--------------------------------------------------------------------------------
/report/figures/delay_Receiver2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receiver2.pdf
--------------------------------------------------------------------------------
/report/figures/delay_Receivers.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receivers.pdf
--------------------------------------------------------------------------------
/report/figures/queue_profile_A.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/queue_profile_A.pdf
--------------------------------------------------------------------------------
/report/figures/simulation_small.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simulation_small.pdf
--------------------------------------------------------------------------------
/presentation/figures/complex_topo.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/complex_topo.pdf
--------------------------------------------------------------------------------
/presentation/figures/architecture_ntt.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/architecture_ntt.pdf
--------------------------------------------------------------------------------
/presentation/figures/eth-nsg-header.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/eth-nsg-header.pdf
--------------------------------------------------------------------------------
/presentation/figures/queue_profile_A.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/queue_profile_A.pdf
--------------------------------------------------------------------------------
/presentation/figures/simple_topo_ft.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/simple_topo_ft.pdf
--------------------------------------------------------------------------------
/report/figures/SE_trend_arima_10000.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/SE_trend_arima_10000.pdf
--------------------------------------------------------------------------------
/report/figures/SE_trend_arima_30000.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/SE_trend_arima_30000.pdf
--------------------------------------------------------------------------------
/report/figures/architecture_transformer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_transformer.pdf
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "workspace/memento-ns3-for-NTT"]
2 | path = workspace/memento-ns3-for-NTT
3 | url = git@github.com:Siddhant-Ray/memento-ns3-for-NTT.git
4 |
--------------------------------------------------------------------------------
/report/figures/finetune_mct_loss_comparison.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/finetune_mct_loss_comparison.pdf
--------------------------------------------------------------------------------
/literature/README.md:
--------------------------------------------------------------------------------
1 | # Literature
2 |
3 | Most of the relevant literature used for this thesis can be found in the single [```Literature.html```](Literature.html) file.
4 |
--------------------------------------------------------------------------------
/report/figures/finetune_mct_loss_comparison_agg.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/finetune_mct_loss_comparison_agg.pdf
--------------------------------------------------------------------------------
/presentation/figures/finetune_mct_loss_comparison.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/finetune_mct_loss_comparison.pdf
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/ns3/cptodocker.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | sudo docker cp ../workspace/NetworkSimulators/ns3/. 467f79c77706:/ns3/scratch # change CONTAINER ID as required
4 |
--------------------------------------------------------------------------------
/presentation/figures/finetune_mct_loss_comparison_agg.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/finetune_mct_loss_comparison_agg.pdf
--------------------------------------------------------------------------------
/report/README.md:
--------------------------------------------------------------------------------
1 | # Report
2 |
3 | ## Final thesis PDF :
4 | [Download final compiled thesis](https://gitlab.ethz.ch/nsg/students/projects/2022/ma-2022_packet_transformer/-/jobs/artifacts/main/raw/report/thesis.pdf?job=compile_pdf)
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/ns3/cpfromdocker.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | sudo docker cp 467f79c77706:/ns3/outputs . # change CONTAINER ID as required
4 | sudo chmod -R 757 outputs
5 | sudo rm -r ../outputs
6 | sudo mv outputs ../
7 |
--------------------------------------------------------------------------------
/presentation/README.md:
--------------------------------------------------------------------------------
1 | # Presentation
2 |
3 | ## Slides for presentation
4 | [`Final slides`](https://gitlab.ethz.ch/nsg/students/projects/2022/ma-2022_packet_transformer/-/jobs/artifacts/main/raw/presentation/slides.pdf?job=compile_slides)
5 |
--------------------------------------------------------------------------------
/presentation/Makefile:
--------------------------------------------------------------------------------
1 |
2 | NOTE = !! change the next line to fit your filename; no spaces at file name end !!
3 | FILE = slides
4 |
5 | all:
6 | pdflatex $(FILE)
7 | pdflatex $(FILE)
8 |
9 | clean:
10 | rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls *.nav *.snm
11 |
--------------------------------------------------------------------------------
/workspace/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM ubuntu:latest
2 | LABEL maintainer="Siddhant Ray"
3 | WORKDIR /ntt
4 | RUN set -xe \
5 | && apt-get update \
6 | && apt-get install python3-pip -y
7 |
8 | COPY requirements.txt requirements.txt
9 | RUN pip install --upgrade pip
10 | RUN pip install -r requirements.txt
11 |
12 | COPY . .
--------------------------------------------------------------------------------
/report/Makefile:
--------------------------------------------------------------------------------
1 |
2 | NOTE = !! change the next line to fit your filename; no spaces at file name end !!
3 | FILE = thesis
4 |
5 | all:
6 | pdflatex $(FILE)
7 | bibtex $(FILE)
8 | pdflatex $(FILE)
9 | pdflatex $(FILE)
10 |
11 | clean:
12 | rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls
13 |
--------------------------------------------------------------------------------
/workspace/docker-run.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Run the NTT image, pulling it from Docker Hub first if it is not yet available locally.
3 | # https://stackoverflow.com/questions/30543409/how-to-check-if-a-docker-image-with-a-specific-tag-exist-locally
4 | if [[ "$(docker images -q siddhantray/ntt-docker:latest 2> /dev/null)" == "" ]]; then
5 |     # Image not present locally: pull it before running.
6 |     docker pull siddhantray/ntt-docker
7 |     docker run -it siddhantray/ntt-docker
8 | else
9 |     # Image already present locally: run it directly.
10 |     docker run -it siddhantray/ntt-docker:latest
11 | fi
--------------------------------------------------------------------------------
/workspace/TransformerModels/configs/config-linear.yaml:
--------------------------------------------------------------------------------
1 | config_base = ''' name: base
2 | max_learning_rate: 0.1
3 | weight_decay: 1e-5
4 | learning_rate: 1e-4
5 | dropout: 0.2
6 | num_layers: 6
7 | epochs: 15
8 | batch_size: 64
9 | linear_size: 256
10 | loss_function: huber
11 | eof: eof_str'''
--------------------------------------------------------------------------------
/workspace/TransformerModels/configs/config-lstm.yaml:
--------------------------------------------------------------------------------
1 | config_base = ''' name: base
2 | max_learning_rate: 0.1
3 | weight_decay: 1e-5
4 | learning_rate: 1e-4
5 | dropout: 0.2
6 | num_layers: 6
7 | epochs: 15
8 | batch_size: 64
9 | linear_size: 256
10 | loss_function: huber
11 | eof: eof_str'''
--------------------------------------------------------------------------------
/workspace/TransformerModels/configs/config-encoder.yaml:
--------------------------------------------------------------------------------
1 | config_base = ''' name: encoder
2 | max_learning_rate: 0.1
3 | weight_decay: 1e-5
4 | learning_rate: 1e-4
5 | dropout: 0.2
6 | num_heads: 8
7 | num_layers: 6
8 | epochs: 15
9 | batch_size: 64
10 | linear_size: 640
11 | loss_function: huber
12 | eof: eof_str'''
--------------------------------------------------------------------------------
/workspace/TransformerModels/configs/config-transformer.yaml:
--------------------------------------------------------------------------------
1 | config_base = ''' name: base
2 | max_learning_rate: 0.1
3 | weight_decay: 1e-5
4 | learning_rate: 1e-4
5 | dropout: 0.2
6 | num_heads: 8
7 | num_layers: 6
8 | epochs: 20
9 | batch_size: 64
10 | linear_size: 384
11 | loss_function: huber
12 | eof: eof_str'''
--------------------------------------------------------------------------------
/workspace/TransformerModels/configs/config-encoder-test.yaml:
--------------------------------------------------------------------------------
1 | config_base = ''' name: encoder
2 | max_learning_rate: 0.1
3 | weight_decay: 1e-5
4 | learning_rate: 1e-4
5 | dropout: 0.2
6 | num_heads: 8
7 | num_layers: 6
8 | epochs: 15
9 | batch_size: 64
10 | linear_size: 120
11 | loss_function: huber
12 | eof: eof_str'''
--------------------------------------------------------------------------------
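Note that the `config-*.yaml` files above do not hold plain YAML: each wraps its key/value body in a Python-style triple-quoted assignment (`config_base = ''' ... '''`), so a loader has to strip that wrapper before parsing. The following is a minimal, dependency-free sketch of such a loader; `parse_config` is an illustrative name, not taken from this repository, and the actual training scripts (not shown in this excerpt) may load these files differently, e.g. with PyYAML after stripping the wrapper. Values are kept as strings in this sketch.

```python
import re

def parse_config(text: str) -> dict:
    """Extract the triple-quoted body and parse its flat `key: value` lines."""
    match = re.search(r"'''(.*)'''", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no triple-quoted config body found")
    config = {}
    # .strip() drops the stray leading space before the first key.
    for line in match.group(1).strip().splitlines():
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config

# Body copied verbatim from config-linear.yaml above.
sample = """config_base = ''' name: base
max_learning_rate: 0.1
weight_decay: 1e-5
learning_rate: 1e-4
dropout: 0.2
num_layers: 6
epochs: 15
batch_size: 64
linear_size: 256
loss_function: huber
eof: eof_str'''"""

cfg = parse_config(sample)
print(cfg["name"], cfg["epochs"], cfg["loss_function"])  # base 15 huber
```

The `eof: eof_str` sentinel in each file marks the end of the config body; a loader can simply ignore that key.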
/workspace/NetworkSimulators/ns3/dockerns3.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | if [ "$1" == "fetch" ]
4 | then
5 | sudo docker pull notspecial/ns-3-dev
6 | sudo docker run -i -t notspecial/ns-3-dev
7 | elif [ "$1" == "newcontainer" ]
8 | then
9 | sudo docker run -i -t notspecial/ns-3-dev
10 | elif [ "$1" == "shell" ]
11 | then
12 | sudo docker exec -it 467f79c77706 bash # change CONTAINER ID as required
13 | else
14 | sudo docker start -ai 467f79c77706 # change CONTAINER ID as required
15 | fi
16 |
--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
1 | # This CITATION.cff file was generated with cffinit.
2 | # Visit https://bit.ly/cffinit to generate yours today!
3 |
4 | cff-version: 1.2.0
5 | title: Network Traffic Transformer
6 | message: >-
7 | If you use this software, please cite it using the
8 | metadata from this file.
9 | type: software
10 | authors:
11 | - given-names: Siddhant
12 | family-names: Ray
13 | affiliation: ETH Zurich
14 | orcid: 'https://orcid.org/0000-0003-0265-2144'
15 | email: siddhant.r98@gmail.com
16 | - given-names: Alexander
17 | family-names: Dietmüller
18 | affiliation: ETH Zurich
19 | orcid: 'https://orcid.org/0000-0003-3769-3958'
20 | email: adietmue@ethz.ch
21 | keywords:
22 | - Transformer
23 | - Packet-level modelling
24 | license: MIT
25 | version: '1.0'
26 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Siddhant Ray
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/workspace/TransformerModels/run.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #SBATCH --output=./%j.out
3 | #SBATCH --error=./%j.err
4 | #SBATCH --cpus-per-task=4
5 | #SBATCH --gres=gpu:1
6 | #SBATCH --mem=80G
7 | set -o errexit # Exit on errors
8 |
9 | # Activate the correct venv.
10 | eval "$(pyenv init -)"
11 | eval "$(pyenv virtualenv-init -)"
12 | pyenv activate venv
13 |
14 | echo "Running on node: $(hostname)"
15 | echo "In directory: $(pwd)"
16 | echo "Starting on: $(date)"
17 | echo "SLURM_JOB_ID: ${SLURM_JOB_ID}"
18 |
19 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay.py
20 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay_varmask_chooseagglevel_multi.py
21 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay_varmask_chooseencodelem.py
22 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay_varmask_chooseencodelem_multi.py
23 |
24 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_encoder.py
25 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_encoder_multi.py
26 |
27 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_mct.py
28 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_mct_multi.py
29 |
30 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python lstm.py
31 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python arima.py --run true
32 |
33 |
--------------------------------------------------------------------------------
/report/abstract.tex:
--------------------------------------------------------------------------------
1 | \clearpage
2 | \null
3 | \vfil
4 | \thispagestyle{plain}
5 | \begin{center}\textbf{Abstract}\end{center}
6 |
7 | Learning underlying network dynamics from packet-level data has been deemed an extremely difficult task, to the point that it is rarely attempted in practice. While research has shown that machine learning (ML) models can learn behaviour and improve on specific tasks in the networking domain, these models do not generalize to other tasks. However, a new ML model, the \emph{Transformer}, has shown remarkable generalization abilities in several fields: the model is pre-trained on large datasets in a task-agnostic fashion, fine-tuned on smaller datasets for task-specific applications, and has become the state-of-the-art architecture for generalization in machine learning. We present a new Transformer architecture adapted to the networking domain, the Network Traffic Transformer (NTT), which is designed to learn network dynamics from packet traces. We pre-train our NTT to learn fundamental network dynamics and then leverage this learnt behaviour to fine-tune it to specific network applications quickly and efficiently. By learning such dynamics, the NTT can be used to make more network-aware decisions across applications, improve those applications, and make the networks of tomorrow more efficient and reliable.
8 |
9 |
10 | \vfil
11 | \clearpage
12 |
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/memento/run_topo_small.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Original author: Siddhant Ray
4 |
5 | ## The first argument is the topology
6 | ## The second specifies different congestion for different receivers (default is 0)
7 | ## The third specifies the seed for the random number generator (change for multiple runs)
8 |
9 | ## Current setup generates fine-tuning data with 2 bottlenecks, $2 is the second bottleneck rate (!=0)
10 | ## To generate pre-training data with only one bottleneck, replace --prefix with
11 | ## --prefix=results/small_test_no_disturbance_with_message_ids$3 and pass $2 as 0
12 |
13 | # // Network topology 1
14 | # //
15 | # // disturbance1
16 | # // |
17 | # // 3x n_apps(senders) --- switchA --- switchB --- receiver1
18 | # //
19 |
20 | # If running inside the VSCode's environment to run Docker containers: Replace ./docker-run.sh waf with just waf
21 |
22 | mkdir -p results
23 | ./docker-run.sh waf --run "trafficgen_small
24 | --topo=$1
25 | --apps=20
26 | --apprate=1Mbps
27 | --startwindow=50
28 | --queuesize=1000p
29 | --linkrate=30Mbps
30 | --congestion1=$2Mbps
31 | --prefix=results/small_test_one_disturbance_with_message_ids$3
32 | --seed=$3"
33 |
34 |
--------------------------------------------------------------------------------
/report/figures/simple_topo.drawio:
--------------------------------------------------------------------------------
1 | 7Vldb9owFP01SNsDU74Lj3x0qzRRTWNSn03iBKsmzhxToL9+14mTkDistE0pqC1SiY+da+ece+17Q8+erLY/OEqWMxZg2rOMYNuzpz3LGjoe/JfALgc8x82BiJMgh8wKmJNHrEBDoWsS4LQ2UDBGBUnqoM/iGPuihiHO2aY+LGS0PmuCIqwBcx9RHb0jgVjm6MA1KvwGk2hZzGwaqmeFisEKSJcoYJs9yL7u2RPOmMivVtsJppK7gpf8vu8HesuFcRyLY26I+OB21p+ECD+Ob+ZhdMd/zvrKygOia/XAarFiVzAAVoBsaIw3SyLwPEG+7NmA3IAtxYpCy4RLlCa5AiHZYph0rGxjLvD24KLNkgpwIcxWWPAdDFE3WI5ib9dobyoxCmi5p0OBISV/VFquGIILRdIzCLPOnDDHODPC7CMIi4ORDFVo+RSlKfHrPHG2jgNJ0NQoScKBFrdPUrRHgdtCQYFxTJEgD3XzbbyoGX4xAhNXLjtoKOAOv7l1Iylbcx+r+/ZDtmFKE9NrqCQQj7DQDGVClQ/+cu2cFu08CnSNQ5at02eU8azH+7uWGxmoYzuB/OxDXiS/bVi8MVvAxm0ZlMT3hS1YWm4uH6a5BwSDaAQOJVEsvQUUxzD/WIYMgU17pDpWJAjk7WOOU/KIFpkp6TyJpCojzx333Km0tRYszY8daToVnN3jiXqumMXSSkgobUBd7G3DA+ru+emwxU/ttwpV96OFqtM8XV4eqq75vqHqfTTtzKvOtNOSjBNrd/W5zb7hNqsF+Xtvs4NO5fY+5a7JbbpnJrfZVjLU9K5UKcSVHf2cvxEMMJ1km0veJv4WWmgla4t4kSZ5b2Z+wSv1Xz1hu1ca2Z++sDkcNpgf9MULdlG54klJhe1M5acj123mEJbuulf/OVG7d129ePuNfQyHNr8csdQ7JdPRxVPe280x0yz7bFcTb3DSfaetentVSvg8Ns83dXSsulTmsKHB0fW5WeaczdctJ0odzbaa7TN37Cx3bHiKZehBfdpkQq/zJpylaf8PR2EI4XsRypxwW7YbZWJZNu4pWL6zP42ET5Z7T6dnlkrPjAXy76Nsl+6rQJf9PFp8sVxwVVihYcmnkxem4X5tzSJvWU/+MsIlsRz3BUckJnF00cF/ypPfO8LFrG5cDJrVL0b5IVL97GZf/wM=
--------------------------------------------------------------------------------
/report/figures/mask.drawio:
--------------------------------------------------------------------------------
1 | 5ZpNc5swEIZ/jY/pAAKCj7GTpj204xkn7fSogAyayogKObbz6ytAfIqUNMHEE11ssZIW6X00oNUyA8vt4ZbBJPpGA0RmlhEcZuB6Zllz2xW/meFYGBzPKwwhw0FhMmvDGj8haTSkdYcDlLYackoJx0nb6NM4Rj5v2SBjdN9utqGkfdcEhkgxrH1IVOtPHPCosHqOUdu/IBxG5Z1NQ9ZsYdlYGtIIBnTfMIGbGVgySnlR2h6WiGTalboU/T4/U1sNjKGYv6TDdzBP6CO/i2+/hvv7+6sf+5+/Lmw5uEdIdnLGcrT8WErA6C4OUObFmIHFPsIcrRPoZ7V7wVzYIr4l4soUxQ0mZEkJZXlfsNlsLN8X9pQz+hs1agL3wXXcqqYUV8iykENCjKPDs5M1KwnF0kN0izg7iiZlB1dOTC47UFLY1xBB2SZqACwFgXLdhJXrWlpRkOr+j9LmsNIoDq6yNSuuYhqjtrJtDEVfFCird1ChhgJOjwCljSECOX5su+9TRd5hRbG4cQXAMjoA3I6wKd0xH8lezWU75MjqOOKQhYgrjnJI1bTfwM3Si5s9Freuo6m5Ab24eWNx6zqamputFbdK3rdyUxxNzc3Ri1t3g/Fqbs/tVKbi5mrFzR5rX6I4mprbpV7cxtqXKI6m5ubpxW2sfYniaGpuc4XbylDIiYiVt3FBgsNYlH3BA4nYd5HFtdiH5EpWbHEQZN0XDKX4CT7krjK0STaVfHLOYuZcZ752nKbFEYmpxNRypTQDcGmSAx83npav+8YqsntWEThVOO2o4fTK1AaHGhS/Nw412vqkDw3PGaQBJqWhxlAa0bDPjYYaGelDA7jnRkONdzSiYZ0bDTWK0YeGbZwbDTU2WZlGT4DyYYmA4X1VX9bodER6og5BBOhD5PLMiJQ3+1f8nkYwyYpCbEJ3fDhrmiCGxfAyTGWnVW1aBDCN8ojfHEnUzlmx1ZMdNftUnTunUvUF2dE35aE9H/XnoR88x3aMkXTtnC1Vq7Cpq9Wj68myzq41rOsHPm2yLjvCvva0SXmenPi0ydUre+mY83G4KY6m5taXvXRJ9grY0HygNUD3z46WFRfF2/VKNDCN5JAjK+tFKcz+79alKzG0wltRoc9mQHm+vvOxl9uX9BwJd/4tnO7AlcfwewPvy5aOBPxaPEOPuhN3TGeQeN+L6BXExWX99WXxAqg/YQU3fwE=
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/ns3/tracing.cc:
--------------------------------------------------------------------------------
1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
2 |
3 | #include "ns3/object.h"
4 | #include "ns3/uinteger.h"
5 | #include "ns3/traced-value.h"
6 | #include "ns3/trace-source-accessor.h"
7 |
8 | #include "ns3/core-module.h"
9 |
10 | #include <iostream>
11 |
12 | using namespace ns3;
13 |
14 | // Probably should avoid using namespace std;
15 | using namespace std;
16 |
17 | class TraceTest : public Object
18 | {
19 | public:
20 |
21 |   static TypeId GetTypeId (void)
22 |   {
23 |     static TypeId tid = TypeId ("TraceTest")
24 |       .SetParent (Object::GetTypeId ())
25 |       .SetGroupName ("Tracing")
26 |       .AddConstructor<TraceTest> ()
27 |       .AddTraceSource ("Integer value",
28 |                        "An integer value to trace.",
29 |                        MakeTraceSourceAccessor (&TraceTest::val_Int),
30 |                        "ns3::TracedValueCallback::Int32")
31 |     ;
32 |     return tid;
33 |   }
34 |
35 |   TraceTest () {}
36 |   TracedValue<int32_t> val_Int;
37 | };
38 |
39 | // Trace sink: called whenever val_Int changes.
40 | void IntTrace (int32_t oldValue, int32_t newValue)
41 | {
42 |   std::cout << "Traced " << oldValue << " to " << newValue << std::endl;
43 | }
44 |
45 | int
46 | main (int argc, char *argv[])
47 | {
48 |   // Create the traced object and connect the trace sink (no context).
49 |   Ptr<TraceTest> traceTest = CreateObject<TraceTest> ();
50 |   traceTest->TraceConnectWithoutContext ("Integer value", MakeCallback (&IntTrace));
51 |
52 |   // Assigning a new value fires the "Integer value" trace source.
53 |   traceTest->val_Int = 1000;
54 |   return 0;
55 | }
--------------------------------------------------------------------------------
/report/figures/slidingwindow.drawio:
--------------------------------------------------------------------------------
1 | 5ZvPk5owFMf/Go/dASKox8pu28N2yswe9hwhClMgDsTV7V/fRAIKL1bXgeg2e9iBB3nC95OXHy9khPxs973A6/gnjUg6cqxoN0KPI8eZjT3+XxjeK4M7nVaGVZFElck+GF6SP0QaLWndJBEpWzcySlOWrNvGkOY5CVnLhouCbtu3LWna/tU1XhFgeAlxCq2vScTiyjp1rYP9B0lWcf3LtiWvZLi+WRrKGEd0e2RCTyPkF5Sy6ijb+SQV2tW6VOW+nbjaPFhBcnZJgV+pmzw+fyfZrzlZPI3p9uU1+yK9vOF0I19YPix7rxUo6CaPiHBijdB8GyeMvKxxKK5uOXJui1mW8jObHy6TNPVpSot9WbRcLp0w5PaSFfQ3OboSeQvP9ZortbZclbl8JFIwsjv5rnajIK95hGaEFe/8lrqAJ0WXtc4Zy/PtgaFb3xMf8avvw7LarBrXB2X5gRT3A0I754UmefRV1Fh+ltOctIVtU6jKkgjU3bMCHQugeP/aVpAUs+St7V4livyFgCb8hxv9Hauj/7Sja0k3RUhkqeNKe85RFxDDxYow4GjPqHnt67Ehs7BN+8LWdaQZ29gobMjpCRtwpBmbaxa2bid1NbZTvZ0mbJ5R2MZ99W3AkWZsE7OwodmD2xM46EozuqlZ6PoalgBHmrHNjMLm9jUsAY40Y6tzEaZw62tcAhzp5nZBduM/4ub1NTABjnRzc8ziNu6LW9eRbm5mZUu8voYlwJFubmbNu52+4g040s3tgmnAAY39sey9pBzhMm6K68nV18Oto9rQJKaOq0Oz1NJ/sv6CAd/AwoIFE9+3+F8/koOsunOh5N1I6U9xmLENLKA5f2PWFhanySrnxyF/c8KVmgtdkhCnX+WFLIkiUXxekDL5gxd7V0LGtYjK/Wu485H7KHxtGC2rBUgbEJDUFCCHCAFXwUOBwxkMB+wRAtsYHCA8bo4DZliDmTE4wPT/5jhg5jSYGoMDJNFujgMOg4KJOTi6g9Cb44BZzsAzB8e99R0IjmUD1xgcYGn05jhgTjIYm4Pj3rpy5EAcyBgc4DOdm+OAGcQATr3/Wxz31pUjxazcNic8vIl7lsdYKw/VtNyc+PDQvfFQzMttc/Ik7uzeeCgm5nU22QQe7r3xuOo7pVZm/OQq06m1iPtdd3I7U8NZ8xXZhz+smJ3zNPRn8XCO/5rkYnsLfwS2B4wLZk7gdXnAuGsWq7QEXh3legLv3FrV5wlJ2+ktJhWuht70AFMLyqDkz8nBWCaNU7poFPGpWu4cLj5h2gHQ0L3ALELW9wfKgiKouHKB2RtM8au+TeqtReyI+4laxC6Rq9vDrqOhW0OYu/hXa2jOCgUAo5glaG4NL/j+7Ez7V8Z4La5ku5XYhv6wwGUSPhQkZD5OU7oRJPd70pHAsd+Mbrv8aFEF8vMiVYKQkWvBKL+k0RUnAWa8ruR7i+hz+2leJ22EE0hQ2UbY6OMI+elhj3oVnIeN/ujpLw==
--------------------------------------------------------------------------------
/workspace/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==1.0.0
2 | aiohttp==3.8.1
3 | aiosignal==1.2.0
4 | astunparse==1.6.3
5 | async-timeout==4.0.2
6 | attrs==21.4.0
7 | black==22.8.0
8 | cachetools==5.0.0
9 | certifi==2022.12.7
10 | charset-normalizer==2.0.12
11 | click==8.1.3
12 | colorama==0.4.5
13 | cycler==0.11.0
14 | docformatter==1.5.0
15 | einops==0.4.1
16 | flatbuffers==1.12
17 | fonttools==4.33.3
18 | frozenlist==1.3.0
19 | fsspec==2022.3.0
20 | future==0.18.3
21 | gast==0.4.0
22 | google-auth==2.6.6
23 | google-auth-oauthlib==0.4.6
24 | google-pasta==0.2.0
25 | grpcio==1.44.0
26 | h5py==3.7.0
27 | idna==3.3
28 | importlib-metadata==4.11.3
29 | isort==5.10.1
30 | joblib==1.2.0
31 | keras==2.9.0
32 | Keras-Preprocessing==1.1.2
33 | kiwisolver==1.4.3
34 | libclang==14.0.1
35 | mando==0.6.4
36 | Markdown==3.3.6
37 | matplotlib==3.5.2
38 | multidict==6.0.2
39 | mypy-extensions==0.4.3
40 | numpy==1.22.3
41 | oauthlib==3.2.1
42 | opt-einsum==3.3.0
43 | packaging==21.3
44 | pandas==1.4.2
45 | pathspec==0.10.1
46 | Pillow==9.3.0
47 | platformdirs==2.5.2
48 | protobuf==3.19.5
49 | pyasn1==0.4.8
50 | pyasn1-modules==0.2.8
51 | pyDeprecate==0.3.2
52 | pyparsing==3.0.8
53 | python-dateutil==2.8.2
54 | pytorch-lightning==1.6.1
55 | pytz==2022.1
56 | PyYAML==6.0
57 | radon==5.1.0
58 | requests==2.27.1
59 | requests-oauthlib==1.3.1
60 | rsa==4.8
61 | scikit-learn==1.0.2
62 | scipy==1.8.1
63 | seaborn==0.11.2
64 | six==1.16.0
65 | tbparse==0.0.6
66 | tensorboard==2.9.1
67 | tensorboard-data-server==0.6.1
68 | tensorboard-plugin-wit==1.8.1
69 | tensorflow==2.11.1
70 | tensorflow-estimator==2.9.0
71 | tensorflow-io-gcs-filesystem==0.26.0
72 | termcolor==1.1.0
73 | threadpoolctl==3.1.0
74 | tomli==2.0.1
75 | torch==1.13.1
76 | torchmetrics==0.8.0
77 | tqdm==4.64.0
78 | typing_extensions==4.1.1
79 | untokenize==0.1.1
80 | urllib3==1.26.9
81 | Werkzeug==2.2.3
82 | wrapt==1.14.1
83 | yarl==1.7.2
84 | zipp==3.8.0
85 |
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/memento/run_topo.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Original author: Siddhant Ray
4 |
5 | ## First in arguments is the topology
6 | ## Second to fourth specify the congestion rates for the different receivers (default is 0)
7 | ## Last argument is for setting the seed for the random number generator (change for multiple runs).
8 |
9 | ## Current setup generates fine-tuning data with 4 bottlenecks, $2 is the second bottleneck rate
10 | ## $3 is the third bottleneck rate, $4 is the fourth bottleneck rate (all !=0)
11 |
12 | ## Topology is
13 | # // Network topology 2
14 | # //
15 | # // disturbance1
16 | # // |
17 | # // 3x n_apps(senders) --- switchA --- switchB --- receiver1
18 | # // |
19 | # // |
20 | # // | disturbance2
21 | # // | |
22 | # // switchC --- switchD --- receiver2
23 | # // |
24 | # // |
25 | # // | disturbance3
26 | # // | |
27 | # // switchE --- switchF --- switchG --- receiver3
28 | # //
29 |
30 | # If running inside VSCode's Docker container environment, replace ./docker-run.sh waf with just waf
31 |
32 | mkdir -p results
33 | ./docker-run.sh waf --run "trafficgen
34 | --topo=$1
35 | --apps=20
36 | --apprate=1Mbps
37 | --startwindow=50
38 | --queuesize=1000p
39 | --linkrate=30Mbps
40 | --congestion1=$2Mbps
41 | --congestion2=$3Mbps
42 | --congestion3=$4Mbps
43 | --prefix=results/large_test_disturbance_with_message_ids$5
44 | --seed=$5"
45 |
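For reference, a hypothetical invocation of this script with example argument values (the Docker setup, `docker-run.sh`, and the ns-3 build are assumed to be in place; the rates and seed below are purely illustrative):

```shell
# Example: topology 2, bottleneck rates 5/10/15 Mbps, seed 7.
# ./run_topo.sh 2 5 10 15 7
# With these arguments, the script writes results under the prefix:
SEED=7
PREFIX="results/large_test_disturbance_with_message_ids${SEED}"
echo "$PREFIX"
```

Changing the seed for repeated runs keeps the output files of each run separate, since the seed is also part of the `--prefix`.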
46 |
--------------------------------------------------------------------------------
/report/summary.tex:
--------------------------------------------------------------------------------
1 | \chapter{Summary}
2 | \label{cha:summary}
3 |
4 | We have seen that learning fundamental dynamics in networks is not a trivial problem. Even on the simplest network topologies, interactions between traffic from multiple sources can create complex patterns in the overall traffic. Amidst this complexity, however, there are underlying patterns which can be captured and learnt effectively, given the right model for the task. Measurements of this traffic over time show that not all of it is truly random: there is structure which can be learnt, and by doing so, performance in networks as a whole can be improved.
5 |
6 | In this thesis, we present the NTT, a new model which serves as a first step towards learning the fundamental dynamics of network traffic. Building on state-of-the-art sequence-learning techniques from other fields, we present a Transformer-based architecture trained to learn similar sequential structure in network traffic data. Doing so, even in our setup, is quite challenging, and there is clear scope for improvement. We believe this approach opens up a plethora of new research questions and directions in the field of learning for networks. Over the course of this project, we explore several methods of pre-training on traffic traces, which we then evaluate and compare in order to better understand the possibilities and limits of learning network dynamics. Based on our findings, we conclude that learning these network dynamics is indeed possible. Much remains unknown about deep learning on such data, but with our current NTT architecture we are in a far better position to decide on the directions in which to proceed.
7 |
8 | We hope that our initial work in this direction motivates the networking community to explore the vast possibilities of this domain, working together to build smarter and better-trained models that improve the performance and efficiency of the networks of tomorrow.
--------------------------------------------------------------------------------
/report/figures/complex_topo.drawio:
--------------------------------------------------------------------------------
1 | 7Vxtk6I4EP41Vu19cAtIQP046t7O1tXcbc1c3d5+RIhIDRIv4Kjz6y+RoJDE8Y03XcYqx3Sgic/TnXSnU3bAaL7+SuzF7Am7KOgYmrvugHHHMAbQou9MsEkEFjQTgUd8NxHpe8GL/464UOPSpe+iKHdhjHEQ+4u80MFhiJw4J7MJwav8ZVMc5J+6sD0kCV4cO5ClP3w3niXSvqnt5Y/I92bpk3WN98zt9GIuiGa2i1cZEfjSASOCcZx8mq9HKGDYpbgk9/1+oHc3MILC+JQbPNL/86k7mtroffj4MvV+kD+eulzLmx0s+Rfmg403KQJUCwWbNoarmR+jl4XtsJ4VpZvKZvE8oC2dfrSjRcLA1F8j+tChPML0cYjEaJ0R8RF/RXiOYrKhl/BeA3L0NkJ7tScjFc0yPKQym9Pv7TTvEaIfOEhnAGY0HDCoNQwwcAJgofvAXJW2nMCOIt/J40TwMnQZQGPtI5CQm3NlGaIMBKYCglRGUGDH/lt+AlDhwp/wHft0JHuT7QsMmIPPZl5JhJfEQfy+rMsKqiQyLYGl2CYeiiVFW6J2X/xy7qCCOyugcA2neDtOBweYbHus/5ZsIqPsAOiyV1Zkeew/oIPXniZ04ja0wA9fU110aIm65DLJPKjlx4LjBL4XMmuhJoDo84fMP3w6aT/wjrnvuuz2IUGR/25PtqqY8SwYVFvwzGHHHDNdyxhHybLDVEcxwa9oxL9XiEOmZeoHgSAqYm4bHGA3Y6cDhZ2CslzV/NVcFYqry+Wuaur1uqr1q3Gn9wrjTgoyKuau106zJU6zkpPXPc32C6XbaunO0a2bDaNbV6UMOb73rKTkso5ugt8DvUCHi3VCuYr8NW3Zc5ZbhJNokfRu1U/Inv2rH6i2Sm37Jw/shS42iBy0xRs2UTbi0Q4KAMfsVZDpijGEIZtu74MVtXjTlZO3Z+QgumgT/XbY4ptKOpTZ4+ZbCHtSBJjudNU28ajSt6tiwqvRbEjsCI08VfpA4ODkBF3fBZ3ifktFsaOuStra4LGw4FF0aq3uaEJO9EYER1H3b2JPp9R9b4KZCqdlIOSJu7wxw+Bu074aCo/me8fjM4PHZ9rEdl697Szd5Y7O+ok3+WSYdGqgI9S2Nss+6Jr5mzKM/ImYv0/p3fTdD1E3XoZ+6N2081doYtA6wcSMkkzsr+d/rMfHb70YvD9rkQPfQqJXUdaR4FGAeHJZByiStLKqFErAVDlakwATKwG1A3ZKWadJgMF+zYCdEozXCZhpNQywwqsRFAiy+Zc1aMzOmz+zfeN1rrXpZHfChdRFuWQksf6xebm2wqTA8K5QeW7WA8SgWFzHist5lCiWVOs4znBTSsrgUuakZaSviUWTkrlrax1H0tXr4qrBQX4rSFiVhKuqHXftrGK9ydQudFZxRaaKKnbWQeusJTorNA/yW5ezpgv77QRdtXm5PsiRBy/1ckmRuLaX7OOKGk/r5OU5OVCkVdW6eOFVoYyLW1bvHCfftr4j4tOvxiguPN0CTQoELo/axaiu4nxLUXO4TZOpyxIscTNqcPFZNaunfdbAMW1lG4ScxaXnAmTLaOTMfl514LqNNWG3BFh1B3lySpayB1r2BPYsoXwI+3Wz1yZlpe6giPGaXnO8ZqhSspbvovgWl+YGJOG7AnTLeCkZmXD4CypOWVbLt9HyXSbfYvxl1O/h8p7L3R/iuopD8RAXUBzDK+0Ql5rCMrdR6ipPG2oS6jmUm/HTszdMRFXiCfmS82OjPeF3X/OPeMJPOf9UesLPKLyueq+VGrMnFFjgxfOKpKriWo0h5/5t3PDh3hs4HutXGzekitul4U5MzDohnax2aQDt/kGZ+aQp7gjDEvNJ2tz/QlSyjOx/Zgt8+R8=
--------------------------------------------------------------------------------
/.gitlab-ci.yml:
--------------------------------------------------------------------------------
1 | # This file is a template, and might need editing before it works on your project.
2 | # To contribute improvements to CI/CD templates, please follow the Development guide at:
3 | # https://docs.gitlab.com/ee/development/cicd/templates.html
4 | # This specific template is located at:
5 | # https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Getting-Started.gitlab-ci.yml
6 |
7 | # This is a sample GitLab CI/CD configuration file that should run without any modifications.
8 | # It demonstrates a basic 3 stage CI/CD pipeline. Instead of real tests or scripts,
9 | # it uses echo commands to simulate the pipeline execution.
10 | #
11 | # A pipeline is composed of independent jobs that run scripts, grouped into stages.
12 | # Stages run in sequential order, but jobs within stages run in parallel.
13 | #
14 | # For more information, see: https://docs.gitlab.com/ee/ci/yaml/index.html#stages
15 |
16 | compile_pdf:
17 | stage: build
18 | image: texlive/texlive # use a Docker image for LaTeX from https://hub.docker.com/
19 | script:
20 | - cd report/
21 | - rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls
22 | - pdflatex thesis
23 | - bibtex thesis
24 | - pdflatex thesis
25 | - pdflatex thesis # build the pdf just as you would on your computer
26 | artifacts:
27 | paths:
28 | - ./report/thesis.pdf # instruct GitLab to keep the thesis.pdf file
29 |
30 | compile_slides:
31 | stage: build
32 | image: texlive/texlive
33 | script:
34 | - cd presentation/
35 | - rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls *.nav *.snm
36 | - pdflatex slides
37 | - pdflatex slides # build the pdf just as you would on your computer
38 | artifacts:
39 | paths:
40 | - ./presentation/slides.pdf # instruct GitLab to keep the slides.pdf file
41 |
42 |
43 | pages:
44 | stage: deploy
45 | script:
- mkdir test_report # create a folder for the report pdf
- cp report/thesis.pdf test_report # copy the thesis pdf into it
- mkdir test_slides # same for the slides
- cp presentation/slides.pdf test_slides
50 | artifacts:
51 | paths:
52 | - test_report # instruct GitLab to keep the public folder
53 | - test_slides
54 | only:
55 | - main # deploy the pdf only for commits made to the main branch
--------------------------------------------------------------------------------
/report/thesis.tex:
--------------------------------------------------------------------------------
1 | \documentclass[11pt,oneside]{book}
2 | \usepackage{graphicx}
3 | \usepackage{booktabs}
4 | \usepackage{caption}
5 | \usepackage{subcaption}
6 | \usepackage{amsmath}
7 | \usepackage{amsfonts}
8 | \usepackage{amssymb}
9 | \usepackage{lscape}
10 | \usepackage{psfrag}
11 | \usepackage[usenames]{color}
12 | \usepackage{bbm}
13 | \usepackage[update]{epstopdf}
14 | \usepackage[bookmarks,pdfstartview=FitH,a4paper,pdfborder={0 0 0}]{hyperref}
15 | \usepackage{verbatim}
16 | \usepackage{listings}
17 | \usepackage{textcomp}
18 | \usepackage{fancyhdr}
19 | \usepackage{multirow}
20 | \usepackage{tikz}
21 | \usepackage{lipsum}
22 | \usepackage{xcolor}
23 | \usepackage[margin=1in]{geometry}
24 | \newcommand{\hint}[1]{{\color{blue} \em #1}}
25 |
26 | \usepackage{xspace}
27 | \newcommand*{\eg}{e.g.\@\xspace}
28 | \newcommand*{\ie}{i.e.\@\xspace}
29 |
30 | \usepackage{xcolor, soul}
31 | \usepackage[shortlabels]{enumitem}
32 |
33 | \definecolor{GoodBlue}{rgb}{0.6640625,0.8203125,1}
34 | \sethlcolor{GoodBlue}
35 |
36 | \newcommand{\smallindent}{\hphantom{N}}
37 | %\usepackage{appendix}
38 | %\usepackage[title]{appendix}
39 | \usepackage[titletoc]{appendix}
40 |
41 | \begin{document}
42 |
43 | % Update the Thesis information below!
44 | \begin{titlepage}
45 | \centering
46 | \includegraphics[width=\textwidth]{figures/eth-nsg-header}\\[60mm]
47 | %
48 | {\Huge\bf\sf{
49 | Advancing Packet-Level Traffic Predictions \\
50 | with Transformers %\\[5mm]
51 | % Second Line of Thesis Title
52 | }}\\[10mm]
53 | {\Large\bf\sf Master Thesis}\\[3mm]
54 | %
55 | {\Large\bf\sf Author: Siddhant Ray } \\[5mm]
56 | {\sf Tutors: Alexander Dietmüller, Dr. Romain Jacob}\\[5mm]
57 | {\sf Supervisor: Prof. Dr. Laurent Vanbever}\\[30mm]
58 | %
59 | {\sf February 2022 to August 2022}
60 | \end{titlepage}
61 |
62 | \thispagestyle{empty}
63 | \newpage
64 | \pagenumbering{roman}
65 | \include{abstract}
66 |
67 | \clearpage
68 | \setcounter{tocdepth}{2}
69 | \tableofcontents
70 | \clearpage
71 |
72 | \pagenumbering{arabic}
73 |
74 | \include{introduction}
75 | \include{background}
76 | \include{design}
77 | \include{evaluation}
78 | \include{outlook}
79 | \include{summary}
80 |
81 | \clearpage
82 |
83 | \addcontentsline{toc}{chapter}{References}
84 |
85 | \bibliographystyle{acm}
86 | \bibliography{refs}
87 |
88 | \clearpage
89 | \begin{appendices}
90 |
91 | \pagenumbering{Roman}
92 |
93 | \include{appendix}
94 | \end{appendices}
95 |
96 | \end{document}
97 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Network Traffic Transformer (NTT)
2 |
3 | This work was undertaken as part of my master thesis at ETH Zurich, from Feb 2022 to Aug 2022, titled `Advancing packet-level traffic predictions with Transformers`. We present a new Transformer-based architecture to learn network dynamics from packet traces.
4 |
5 | We design a `pre-training` phase, where we learn fundamental network dynamics. This is followed by a `fine-tuning` phase on different network tasks; we demonstrate that pre-training well leads to generalization across multiple fine-tuning tasks.
6 |
7 | ### Original proposal:
8 | * [`Project proposal`](https://nsg.ee.ethz.ch/fileadmin/user_upload/thesis_proposal_packet_transformer.pdf)
9 |
10 | ### Supervisors:
11 | * [`Alexander Dietmüller`](https://nsg.ee.ethz.ch/people/alexander-dietmueller/)
12 | * [`Dr. Romain Jacob`](https://nsg.ee.ethz.ch/people/romain-jacob/)
13 | * [`Prof. Dr. Laurent Vanbever`](https://nsg.ee.ethz.ch/people/laurent-vanbever/)
14 |
15 | ### Research Lab:
16 | * [`Networked Systems Group, ETH Zurich`](https://nsg.ee.ethz.ch/home/)
17 |
18 | ### We refer you to the following sections for further details.
19 |
20 | * [`Code and reproducing instructions:`](workspace/README.md)
21 |
22 | * [`Thesis TeX and PDF files`](report/)
23 |
24 | * [`Literature files`](literature/)
25 |
26 | * [`Slides TeX and PDF files`](presentation/)
27 |
28 | ### NOTE 1:
29 | The experiments conducted in this project are very involved. Understanding and reproducing them from the code and comments alone will be quite hard, despite the instructions in the given [`README`](workspace/README.md). For a more detailed understanding, we invite you to read the thesis ([`direct link`](report/thesis.pdf)). You can also check out an overview in the presentation slides ([`direct link`](presentation/slides.pdf)).
30 |
31 | For any further questions or to discuss related research ideas, please feel free to contact me by [`email.`](mailto:siddhant.r98@gmail.com)
32 |
33 | ### NOTE 2:
34 | Some results from the thesis have been written up in a paper titled ```A new hope for network model generalization```, which has been accepted for presentation at [ACM HotNets 2022](https://conferences.sigcomm.org/hotnets/2022/). The paper is online and open access; it can be accessed via the ACM
35 | Digital Library at this [link](https://dl.acm.org/doi/abs/10.1145/3563766.3564104), DOI: 10.1145/3563766.3564104.
36 |
37 | ### NOTE 3:
38 | The thesis has now been published in the ETH Research Collection, which is open access ([`direct link`](https://www.research-collection.ethz.ch/handle/20.500.11850/569234)).
39 |
40 |
--------------------------------------------------------------------------------
/.github/workflows/codeql-analysis.yml:
--------------------------------------------------------------------------------
1 | # For most projects, this workflow file will not need changing; you simply need
2 | # to commit it to your repository.
3 | #
4 | # You may wish to alter this file to override the set of languages analyzed,
5 | # or to provide custom queries or build logic.
6 | #
7 | # ******** NOTE ********
8 | # We have attempted to detect the languages in your repository. Please check
9 | # the `language` matrix defined below to confirm you have the correct set of
10 | # supported CodeQL languages.
11 | #
12 | name: "CodeQL"
13 |
14 | on:
15 | push:
16 | branches: [ "main" ]
17 | pull_request:
18 | # The branches below must be a subset of the branches above
19 | branches: [ "main" ]
20 | schedule:
21 | - cron: '18 11 * * 5'
22 |
23 | jobs:
24 | analyze:
25 | name: Analyze
26 | runs-on: ubuntu-latest
27 | permissions:
28 | actions: read
29 | contents: read
30 | security-events: write
31 |
32 | strategy:
33 | fail-fast: false
34 | matrix:
35 | language: [ 'python' ]
36 | # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
37 | # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
38 |
39 | steps:
40 | - name: Checkout repository
41 | uses: actions/checkout@v3
42 |
43 | # Initializes the CodeQL tools for scanning.
44 | - name: Initialize CodeQL
45 | uses: github/codeql-action/init@v2
46 | with:
47 | languages: ${{ matrix.language }}
48 | # If you wish to specify custom queries, you can do so here or in a config file.
49 | # By default, queries listed here will override any specified in a config file.
50 | # Prefix the list here with "+" to use these queries and those in the config file.
51 |
52 | # Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
53 | # queries: security-extended,security-and-quality
54 |
55 |
56 | # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
57 | # If this step fails, then you should remove it and run the build manually (see below)
58 | - name: Autobuild
59 | uses: github/codeql-action/autobuild@v2
60 |
61 | # ℹ️ Command-line programs to run using the OS shell.
62 | # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
63 |
64 | # If the Autobuild fails above, remove it and uncomment the following three lines.
65 | # modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
66 |
67 | # - run: |
68 | # echo "Run, Build Application using script"
69 | # ./location_of_script_within_repo/buildscript.sh
70 |
71 | - name: Perform CodeQL Analysis
72 | uses: github/codeql-action/analyze@v2
73 |
--------------------------------------------------------------------------------
/workspace/PandasScripts/csv_gendelays.py:
--------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
3 | import argparse
4 | import collections
5 | import os
6 |
7 | import numpy as np
8 | import pandas as pd
9 |
10 | print("Current directory is:", os.getcwd())
11 | print("Generate end to end packet delays")
12 |
13 | ## Some packets are sent at sender but not received at receiver
14 | ## they are dropped somewhere in the middle, so those are not counted
15 | ## for the delay calculation
16 | def gen_packet_delay(input_dataframe1, input_dataframe2, path):
17 | # print(input_dataframe1.head(), input_dataframe1.shape)
18 | # print(input_dataframe2.head(), input_dataframe2.shape)
19 |
20 | ## these packets are not received
21 | ids_not_received = list(
22 | set(input_dataframe1["IP ID"].to_list())
23 | - set(input_dataframe2["IP ID"].to_list())
24 | )
25 |
26 | # print(ids_not_received)
27 |
28 | for value in ids_not_received:
29 | input_dataframe1 = input_dataframe1[input_dataframe1["IP ID"] != value]
30 | input_dataframe1 = input_dataframe1.reset_index(drop=True)
31 |
32 | # print(input_dataframe1.tail(), input_dataframe1.shape)
33 | # print(input_dataframe2.tail(), input_dataframe2.shape)
34 |
35 | input_dataframe1["Delay"] = (
36 | input_dataframe2["Timestamp"] - input_dataframe1["Timestamp"]
37 | )
38 | # print(input_dataframe1.tail(), input_dataframe1.shape)
39 |
40 | input_dataframe1 = input_dataframe1[input_dataframe1["Delay"].notna()]
41 | # print(input_dataframe1.tail(), input_dataframe1.shape)
42 |
43 | return input_dataframe1
44 |
45 |
46 | def main():
47 | parser = argparse.ArgumentParser()
48 | parser.add_argument(
49 | "-mod", "--model", help="choose CC model for creating congestion", required=True
50 | )
51 | parser.add_argument(
52 | "-nsend",
53 | "--numsenders",
54 | help="number of senders (equal to the number of receivers)",
55 | required=True,
56 | )
57 | args = parser.parse_args()
58 | print(args)
59 |
60 | if args.model == "tcponly":
61 | path = "congestion_1/"
62 | elif args.model == "tcpandudp":
63 | path = "congestion_2/"
64 | else:
65 | print("ERROR: CONGESTION MODEL NOT CORRECT....")
66 | exit()
67 |
68 | num_senders = int(args.numsenders)
69 | num_receivers = num_senders
70 | sender = 0
71 | receiver = 0
72 |
73 | temp_cols = [
74 | "Timestamp",
75 | "Flow ID",
76 | "Packet ID",
77 | "Packet Size",
78 | "Interface ID",
79 | "IP ID",
80 | "DSCP",
81 | "ECN",
82 | "Payload Size",
83 | "TTL",
84 | "Proto",
85 | "Source IP",
86 | "Destination IP",
87 | "TCP Source Port",
88 | "TCP Destination Port",
89 | "TCP Sequence Number",
90 | "TCP Window Size",
91 | "Delay",
92 | ]
93 |
94 | temp = pd.DataFrame(columns=temp_cols)
95 |
96 | while sender < num_senders and receiver < num_receivers:
97 |
98 | input_dataframe1 = pd.read_csv(path + "sender_{}_final.csv".format(sender))
99 | input_dataframe2 = pd.read_csv(path + "receiver_{}_final.csv".format(receiver))
100 |
101 | delay_df = gen_packet_delay(input_dataframe1, input_dataframe2, path)
102 | temp = pd.concat([temp, delay_df], ignore_index=True, copy=False)
103 |
104 | sender += 1
105 | receiver += 1
106 |
107 | temp = temp.sort_values(by=["Timestamp"], ascending=True)
108 | print(temp.head())
109 | print(temp.shape)
110 | temp.to_csv(path + "endtoenddelay.csv", index=False)
111 |
112 |
113 | if __name__ == "__main__":
114 | main()
115 |
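As a side note, the row alignment in `gen_packet_delay` relies on the sender and receiver frames lining up positionally after the unreceived IP IDs are dropped. The same per-packet delay computation can be expressed more explicitly with an inner merge on `IP ID`; a minimal sketch (assuming `pandas` and the column names used by the script above):

```python
import pandas as pd


def packet_delays(sent: pd.DataFrame, received: pd.DataFrame) -> pd.DataFrame:
    """Per-packet end-to-end delay via an inner merge on "IP ID".

    Packets that were sent but never received simply drop out of the
    inner join, mirroring the filtering done in gen_packet_delay.
    """
    merged = sent.merge(
        received[["IP ID", "Timestamp"]],
        on="IP ID",
        suffixes=("", " rx"),  # receiver timestamp becomes "Timestamp rx"
    )
    merged["Delay"] = merged["Timestamp rx"] - merged["Timestamp"]
    return merged.drop(columns=["Timestamp rx"])


# Tiny synthetic example: packet with IP ID 2 is dropped in transit.
sent = pd.DataFrame({"IP ID": [1, 2, 3], "Timestamp": [0.0, 0.1, 0.2]})
recv = pd.DataFrame({"IP ID": [1, 3], "Timestamp": [0.5, 0.9]})
delays = packet_delays(sent, recv)
```

This variant does not depend on the two frames having matching row positions, which makes it robust if either CSV is not sorted identically.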
--------------------------------------------------------------------------------
/presentation/.gitignore:
--------------------------------------------------------------------------------
1 | ## Core latex/pdflatex auxiliary files:
2 | *.aux
3 | *.lof
4 | *.log
5 | *.lot
6 | *.fls
7 | *.out
8 | *.toc
9 | *.fmt
10 | *.fot
11 | *.cb
12 | *.cb2
13 | .*.lb
14 |
15 | ## Intermediate documents:
16 | *.dvi
17 | *.xdv
18 | *-converted-to.*
19 | # these rules might exclude image files for figures etc.
20 | # *.ps
21 | # *.eps
22 | # *.pdf
23 |
24 | ## Generated if empty string is given at "Please type another file name for output:"
25 | .pdf
26 |
27 | ## Bibliography auxiliary files (bibtex/biblatex/biber):
28 | *.bbl
29 | *.bcf
30 | *.blg
31 | *-blx.aux
32 | *-blx.bib
33 | *.run.xml
34 |
35 | ## Build tool auxiliary files:
36 | *.fdb_latexmk
37 | *.synctex
38 | *.synctex(busy)
39 | *.synctex.gz
40 | *.synctex.gz(busy)
41 | *.pdfsync
42 |
43 | ## Build tool directories for auxiliary files
44 | # latexrun
45 | latex.out/
46 |
47 | ## Auxiliary and intermediate files from other packages:
48 | # algorithms
49 | *.alg
50 | *.loa
51 |
52 | # achemso
53 | acs-*.bib
54 |
55 | # amsthm
56 | *.thm
57 |
58 | # beamer
59 | *.nav
60 | *.pre
61 | *.snm
62 | *.vrb
63 |
64 | # changes
65 | *.soc
66 |
67 | # comment
68 | *.cut
69 |
70 | # cprotect
71 | *.cpt
72 |
73 | # elsarticle (documentclass of Elsevier journals)
74 | *.spl
75 |
76 | # endnotes
77 | *.ent
78 |
79 | # fixme
80 | *.lox
81 |
82 | # feynmf/feynmp
83 | *.mf
84 | *.mp
85 | *.t[1-9]
86 | *.t[1-9][0-9]
87 | *.tfm
88 |
89 | #(r)(e)ledmac/(r)(e)ledpar
90 | *.end
91 | *.?end
92 | *.[1-9]
93 | *.[1-9][0-9]
94 | *.[1-9][0-9][0-9]
95 | *.[1-9]R
96 | *.[1-9][0-9]R
97 | *.[1-9][0-9][0-9]R
98 | *.eledsec[1-9]
99 | *.eledsec[1-9]R
100 | *.eledsec[1-9][0-9]
101 | *.eledsec[1-9][0-9]R
102 | *.eledsec[1-9][0-9][0-9]
103 | *.eledsec[1-9][0-9][0-9]R
104 |
105 | # glossaries
106 | *.acn
107 | *.acr
108 | *.glg
109 | *.glo
110 | *.gls
111 | *.glsdefs
112 |
113 | # gnuplottex
114 | *-gnuplottex-*
115 |
116 | # gregoriotex
117 | *.gaux
118 | *.gtex
119 |
120 | # htlatex
121 | *.4ct
122 | *.4tc
123 | *.idv
124 | *.lg
125 | *.trc
126 | *.xref
127 |
128 | # hyperref
129 | *.brf
130 |
131 | # knitr
132 | *-concordance.tex
133 | # TODO Comment the next line if you want to keep your tikz graphics files
134 | *.tikz
135 | *-tikzDictionary
136 |
137 | # listings
138 | *.lol
139 |
140 | # luatexja-ruby
141 | *.ltjruby
142 |
143 | # makeidx
144 | *.idx
145 | *.ilg
146 | *.ind
147 | *.ist
148 |
149 | # minitoc
150 | *.maf
151 | *.mlf
152 | *.mlt
153 | *.mtc[0-9]*
154 | *.slf[0-9]*
155 | *.slt[0-9]*
156 | *.stc[0-9]*
157 |
158 | # minted
159 | _minted*
160 | *.pyg
161 |
162 | # morewrites
163 | *.mw
164 |
165 | # nomencl
166 | *.nlg
167 | *.nlo
168 | *.nls
169 |
170 | # pax
171 | *.pax
172 |
173 | # pdfpcnotes
174 | *.pdfpc
175 |
176 | # sagetex
177 | *.sagetex.sage
178 | *.sagetex.py
179 | *.sagetex.scmd
180 |
181 | # scrwfile
182 | *.wrt
183 |
184 | # sympy
185 | *.sout
186 | *.sympy
187 | sympy-plots-for-*.tex/
188 |
189 | # pdfcomment
190 | *.upa
191 | *.upb
192 |
193 | # pythontex
194 | *.pytxcode
195 | pythontex-files-*/
196 |
197 | # tcolorbox
198 | *.listing
199 |
200 | # thmtools
201 | *.loe
202 |
203 | # TikZ & PGF
204 | *.dpth
205 | *.md5
206 | *.auxlock
207 |
208 | # todonotes
209 | *.tdo
210 |
211 | # vhistory
212 | *.hst
213 | *.ver
214 |
215 | # easy-todo
216 | *.lod
217 |
218 | # xcolor
219 | *.xcp
220 |
221 | # xmpincl
222 | *.xmpi
223 |
224 | # xindy
225 | *.xdy
226 |
227 | # xypic precompiled matrices
228 | *.xyc
229 |
230 | # endfloat
231 | *.ttt
232 | *.fff
233 |
234 | # Latexian
235 | TSWLatexianTemp*
236 |
237 | ## Editors:
238 | # WinEdt
239 | *.bak
240 | *.sav
241 |
242 | # Texpad
243 | .texpadtmp
244 |
245 | # LyX
246 | *.lyx~
247 |
248 | # Kile
249 | *.backup
250 |
251 | # KBibTeX
252 | *~[0-9]*
253 |
254 | # auto folder when using emacs and auctex
255 | ./auto/*
256 | *.el
257 |
258 | # expex forward references with \gathertags
259 | *-tags.tex
260 |
261 | # standalone packages
262 | *.sta
263 |
264 |
--------------------------------------------------------------------------------
/report/.gitignore:
--------------------------------------------------------------------------------
1 | ## Core latex/pdflatex auxiliary files:
2 | *.aux
3 | *.lof
4 | *.log
5 | *.lot
6 | *.fls
7 | *.out
8 | *.toc
9 | *.fmt
10 | *.fot
11 | *.cb
12 | *.cb2
13 | .*.lb
14 |
15 | ## Intermediate documents:
16 | *.dvi
17 | *.xdv
18 | *-converted-to.*
19 | # these rules might exclude image files for figures etc.
20 | # *.ps
21 | # *.eps
22 | # *.pdf
23 |
24 | ## Generated if empty string is given at "Please type another file name for output:"
25 | .pdf
26 |
27 | ## Bibliography auxiliary files (bibtex/biblatex/biber):
28 | *.bbl
29 | *.bcf
30 | *.blg
31 | *-blx.aux
32 | *-blx.bib
33 | *.run.xml
34 |
35 | ## Build tool auxiliary files:
36 | *.fdb_latexmk
37 | *.synctex
38 | *.synctex(busy)
39 | *.synctex.gz
40 | *.synctex.gz(busy)
41 | *.pdfsync
42 |
43 | ## Build tool directories for auxiliary files
44 | # latexrun
45 | latex.out/
46 |
47 | ## Auxiliary and intermediate files from other packages:
48 | # algorithms
49 | *.alg
50 | *.loa
51 |
52 | # achemso
53 | acs-*.bib
54 |
55 | # amsthm
56 | *.thm
57 |
58 | # beamer
59 | *.nav
60 | *.pre
61 | *.snm
62 | *.vrb
63 |
64 | # changes
65 | *.soc
66 |
67 | # comment
68 | *.cut
69 |
70 | # cprotect
71 | *.cpt
72 |
73 | # elsarticle (documentclass of Elsevier journals)
74 | *.spl
75 |
76 | # endnotes
77 | *.ent
78 |
79 | # fixme
80 | *.lox
81 |
82 | # feynmf/feynmp
83 | *.mf
84 | *.mp
85 | *.t[1-9]
86 | *.t[1-9][0-9]
87 | *.tfm
88 |
89 | #(r)(e)ledmac/(r)(e)ledpar
90 | *.end
91 | *.?end
92 | *.[1-9]
93 | *.[1-9][0-9]
94 | *.[1-9][0-9][0-9]
95 | *.[1-9]R
96 | *.[1-9][0-9]R
97 | *.[1-9][0-9][0-9]R
98 | *.eledsec[1-9]
99 | *.eledsec[1-9]R
100 | *.eledsec[1-9][0-9]
101 | *.eledsec[1-9][0-9]R
102 | *.eledsec[1-9][0-9][0-9]
103 | *.eledsec[1-9][0-9][0-9]R
104 |
105 | # glossaries
106 | *.acn
107 | *.acr
108 | *.glg
109 | *.glo
110 | *.gls
111 | *.glsdefs
112 |
113 | # gnuplottex
114 | *-gnuplottex-*
115 |
116 | # gregoriotex
117 | *.gaux
118 | *.gtex
119 |
120 | # htlatex
121 | *.4ct
122 | *.4tc
123 | *.idv
124 | *.lg
125 | *.trc
126 | *.xref
127 |
128 | # hyperref
129 | *.brf
130 |
131 | # knitr
132 | *-concordance.tex
133 | # TODO Comment the next line if you want to keep your tikz graphics files
134 | *.tikz
135 | *-tikzDictionary
136 |
137 | # listings
138 | *.lol
139 |
140 | # luatexja-ruby
141 | *.ltjruby
142 |
143 | # makeidx
144 | *.idx
145 | *.ilg
146 | *.ind
147 | *.ist
148 |
149 | # minitoc
150 | *.maf
151 | *.mlf
152 | *.mlt
153 | *.mtc[0-9]*
154 | *.slf[0-9]*
155 | *.slt[0-9]*
156 | *.stc[0-9]*
157 |
158 | # minted
159 | _minted*
160 | *.pyg
161 |
162 | # morewrites
163 | *.mw
164 |
165 | # nomencl
166 | *.nlg
167 | *.nlo
168 | *.nls
169 |
170 | # pax
171 | *.pax
172 |
173 | # pdfpcnotes
174 | *.pdfpc
175 |
176 | # sagetex
177 | *.sagetex.sage
178 | *.sagetex.py
179 | *.sagetex.scmd
180 |
181 | # scrwfile
182 | *.wrt
183 |
184 | # sympy
185 | *.sout
186 | *.sympy
187 | sympy-plots-for-*.tex/
188 |
189 | # pdfcomment
190 | *.upa
191 | *.upb
192 |
193 | # pythontex
194 | *.pytxcode
195 | pythontex-files-*/
196 |
197 | # tcolorbox
198 | *.listing
199 |
200 | # thmtools
201 | *.loe
202 |
203 | # TikZ & PGF
204 | *.dpth
205 | *.md5
206 | *.auxlock
207 |
208 | # todonotes
209 | *.tdo
210 |
211 | # vhistory
212 | *.hst
213 | *.ver
214 |
215 | # easy-todo
216 | *.lod
217 |
218 | # xcolor
219 | *.xcp
220 |
221 | # xmpincl
222 | *.xmpi
223 |
224 | # xindy
225 | *.xdy
226 |
227 | # xypic precompiled matrices
228 | *.xyc
229 |
230 | # endfloat
231 | *.ttt
232 | *.fff
233 |
234 | # Latexian
235 | TSWLatexianTemp*
236 |
237 | ## Editors:
238 | # WinEdt
239 | *.bak
240 | *.sav
241 |
242 | # Texpad
243 | .texpadtmp
244 |
245 | # LyX
246 | *.lyx~
247 |
248 | # Kile
249 | *.backup
250 |
251 | # KBibTeX
252 | *~[0-9]*
253 |
254 | # auto folder when using emacs and auctex
255 | ./auto/*
256 | *.el
257 |
258 | # expex forward references with \gathertags
259 | *-tags.tex
260 |
261 | # standalone packages
262 | *.sta
263 |
264 | # output pdf (kept in the repository, so not ignored)
265 | # thesis.pdf
266 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | share/python-wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | MANIFEST
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | cover/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | .pybuilder/
76 | target/
77 |
78 | # Jupyter Notebook
79 | .ipynb_checkpoints
80 |
81 | # IPython
82 | profile_default/
83 | ipython_config.py
84 |
85 | # pyenv
86 | # For a library or package, you might want to ignore these files since the code is
87 | # intended to run in multiple environments; otherwise, check them in:
88 | # .python-version
89 |
90 | # pipenv
91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
94 | # install all needed dependencies.
95 | #Pipfile.lock
96 |
97 | # poetry
98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
99 | # This is especially recommended for binary packages to ensure reproducibility, and is more
100 | # commonly ignored for libraries.
101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102 | #poetry.lock
103 |
104 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
105 | __pypackages__/
106 |
107 | # Celery stuff
108 | celerybeat-schedule
109 | celerybeat.pid
110 |
111 | # SageMath parsed files
112 | *.sage.py
113 |
114 | # Environments
115 | .env
116 | .venv
117 | env/
118 | venv/
119 | venv_tensorboard/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 |
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 |
128 | # Rope project settings
129 | .ropeproject
130 |
131 | # mkdocs documentation
132 | /site
133 |
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 |
139 | # Pyre type checker
140 | .pyre/
141 |
142 | # pytype static type analyzer
143 | .pytype/
144 |
145 | # Cython debug symbols
146 | cython_debug/
147 |
148 | .DS_Store
149 |
150 | # Prerequisites
151 | *.d
152 |
153 | # Compiled Object files
154 | *.slo
155 | *.lo
156 | *.o
157 | *.obj
158 |
159 | # Precompiled Headers
160 | *.gch
161 | *.pch
162 |
163 | # Compiled Dynamic libraries
164 | *.so
165 | *.dylib
166 | *.dll
167 |
168 | # Fortran module files
169 | *.mod
170 | *.smod
171 |
172 | # Compiled Static libraries
173 | *.lai
174 | *.la
175 | *.a
176 | *.lib
177 |
178 | # Executables
179 | *.exe
180 | *.out
181 | *.app
182 |
183 | ## VM management scripts
184 | remote_*
185 |
186 | ## VSCODE stuff
187 | .vscode
188 |
189 | ## Outputs from NS3
190 | outputs/
191 |
192 | ## Evaluations from NS3
193 | evaluations
194 |
195 | ## Logs
196 | lightning_logs
197 | logs
198 |
199 | ## Plots
200 | workspace/PandasScripts/plots
201 | workspace/NetworkSimulators/memento/plots
202 | workspace/TransformerModels/plots
203 | figures_test/
204 | # report/thesis.pdf
205 |
206 |
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/ns3/newnet.cc:
--------------------------------------------------------------------------------
1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
2 |
3 | #include "ns3/core-module.h"
4 | #include "ns3/network-module.h"
5 | #include "ns3/csma-module.h"
6 | #include "ns3/internet-module.h"
7 | #include "ns3/point-to-point-module.h"
8 | #include "ns3/applications-module.h"
9 | #include "ns3/ipv4-global-routing-helper.h"
10 |
11 | using namespace ns3;
12 |
13 | // Newnet network topology
14 | //
15 | // 10.1.1.0
16 | // n0 -------------- n1 n2 n3 n4
17 | // point-to-point | | | |
18 | // ================
19 | // LAN 10.1.2.0
20 |
21 | NS_LOG_COMPONENT_DEFINE("NewnetTest");
22 |
23 | int main(int argc, char *argv[]){
24 |
25 | bool verbose = true;
26 | uint32_t nCsma = 3;
27 | uint32_t nPackets = 1;
28 |
29 | CommandLine cmd;
30 | cmd.AddValue("nCsma", "Number of \"extra\" CSMA nodes/devices", nCsma);
31 | cmd.AddValue("verbose", "Tell echo applications to log if true", verbose);
32 | cmd.AddValue("nPackets", "Number of packets to echo", nPackets);
33 |
34 |
35 | cmd.Parse(argc, argv);
36 |
37 | if (verbose)
38 | {
39 | LogComponentEnable("UdpEchoClientApplication", LOG_LEVEL_INFO);
40 | LogComponentEnable("UdpEchoServerApplication", LOG_LEVEL_INFO);
41 | }
42 | // Sanity check, nCsma >=1 always, nPackets >=1 always
43 | nCsma = nCsma == 0 ? 1 : nCsma;
44 | nPackets = nPackets == 0 ? 1 : nPackets;
45 |
46 | // Create the P2P nodes first
47 | NodeContainer p2pNodes;
48 | p2pNodes.Create(2);
49 |
50 | // Create CSMA node containers
51 | NodeContainer csmaNodes;
52 | csmaNodes.Add(p2pNodes.Get (1));
53 | csmaNodes.Create(nCsma);
54 |
55 | // Bind the P2P devices inside the P2P containers
56 | PointToPointHelper pointToPoint;
57 | pointToPoint.SetDeviceAttribute("DataRate", StringValue ("5Mbps"));
58 | pointToPoint.SetChannelAttribute("Delay", StringValue ("2ms"));
59 | pointToPoint.SetQueue("ns3::DropTailQueue", "MaxSize", StringValue ("50p"));
60 |
61 | NetDeviceContainer p2pDevices;
62 | p2pDevices = pointToPoint.Install(p2pNodes);
63 |
64 | // Bind the CSMA devices inside the CSMA containers
65 | // For CSMA, data rate is a channel attribute, not a device attribute!
66 | // CSMA doesn't allow one to mix devices on a channel
67 | CsmaHelper csma;
68 | csma.SetChannelAttribute("DataRate", StringValue("100Mbps"));
69 | csma.SetChannelAttribute("Delay", TimeValue(NanoSeconds(6560)));
70 |
71 | NetDeviceContainer csmaDevices;
72 | csmaDevices = csma.Install(csmaNodes);
73 |
74 | // Install the protocol stack on the containers
75 | InternetStackHelper stack;
76 | stack.Install(p2pNodes.Get(0));
77 | stack.Install(csmaNodes);
78 |
79 | // IP address for point to point nodes
80 | Ipv4AddressHelper address;
81 | address.SetBase("10.1.1.0", "255.255.255.0");
82 | Ipv4InterfaceContainer p2pInterfaces;
83 | p2pInterfaces = address.Assign(p2pDevices);
84 |
85 | // IP address for CSMA devices (variable chain of devices)
86 | address.SetBase("10.1.2.0", "255.255.255.0");
87 | Ipv4InterfaceContainer csmaInterfaces;
88 | csmaInterfaces = address.Assign(csmaDevices);
89 |
90 | // PORT number is 9 here
91 | UdpEchoServerHelper echoServer(9);
92 |
93 | ApplicationContainer serverApps = echoServer.Install(csmaNodes.Get(nCsma));
94 | serverApps.Start(Seconds(1.0));
95 | serverApps.Stop(Seconds(10.0));
96 |
97 | UdpEchoClientHelper echoClient(csmaInterfaces.GetAddress(nCsma), 9);
98 | echoClient.SetAttribute("MaxPackets", UintegerValue(nPackets));
99 | echoClient.SetAttribute("Interval", TimeValue(Seconds(1.0)));
100 | echoClient.SetAttribute("PacketSize", UintegerValue(1024));
101 |
102 | ApplicationContainer clientApps = echoClient.Install(p2pNodes.Get(0));
103 | clientApps.Start(Seconds(2.0));
104 | clientApps.Stop(Seconds(10.0));
105 |
106 | Ipv4GlobalRoutingHelper::PopulateRoutingTables();
107 |
108 | pointToPoint.EnablePcap("newnet", p2pNodes.Get(0)->GetId(), 0);
109 | csma.EnablePcap("newnet", csmaNodes.Get(nCsma)->GetId(), 0, false);
110 | csma.EnablePcap("newnet", csmaNodes.Get(nCsma-1)->GetId(), 0, false);
111 |
112 | Simulator::Run();
113 | Simulator::Destroy();
114 | return 0;
115 | }
116 |
117 |
118 |
119 |
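As a sanity check on the link parameters configured above, the minimum echo round-trip time can be estimated by hand: one store-and-forward hop over the 5 Mbps / 2 ms point-to-point link plus one over the 100 Mbps / 6560 ns CSMA channel, in each direction. A back-of-envelope sketch (the helper name is illustrative; headers, queueing and processing delay are ignored):

```python
# Minimum echo RTT for newnet.cc, from the configured link attributes.
# Ignores IP/UDP headers, queueing and processing time.

def one_way_delay(pkt_bytes, links):
    """links: list of (rate_bps, prop_delay_s); store-and-forward per hop."""
    return sum(pkt_bytes * 8 / rate + prop for rate, prop in links)

links = [
    (5e6, 2e-3),       # point-to-point: 5 Mbps, 2 ms
    (100e6, 6560e-9),  # CSMA: 100 Mbps, 6560 ns
]

rtt = 2 * one_way_delay(1024, links)  # echo request + echo reply
print(f"Minimum echo RTT: {rtt * 1e3:.3f} ms")
```

The 5 Mbps link dominates: serializing a 1024-byte packet onto it alone takes about 1.6 ms per direction.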
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/memento/experiment-tags.h:
--------------------------------------------------------------------------------
1 | /* Tags for tracking simulation info.
2 | */
3 | #ifndef EXPERIMENT_TAGS_H
4 | #define EXPERIMENT_TAGS_H
5 |
6 | #include "ns3/core-module.h"
7 | #include "ns3/network-module.h"
8 |
9 | using namespace ns3;
10 |
11 | // A timestamp tag that can be added to a packet.
12 | class TimestampTag : public Tag
13 | {
14 | public:
15 | static TypeId GetTypeId(void)
16 | {
17 | static TypeId tid = TypeId("ns3::TimestampTag")
18 | .SetParent<Tag>()
19 | .AddConstructor<TimestampTag>()
20 | .AddAttribute("Timestamp",
21 | "Timestamp to save in tag.",
22 | EmptyAttributeValue(),
23 | MakeTimeAccessor(&TimestampTag::timestamp),
24 | MakeTimeChecker());
25 | return tid;
26 | };
27 | TypeId GetInstanceTypeId(void) const { return GetTypeId(); };
28 | uint32_t GetSerializedSize(void) const { return sizeof(timestamp); };
29 | void Serialize(TagBuffer i) const
30 | {
31 | i.Write(reinterpret_cast<const uint8_t *>(&timestamp),
32 | sizeof(timestamp));
33 | };
34 | void Deserialize(TagBuffer i)
35 | {
36 | i.Read(reinterpret_cast<uint8_t *>(&timestamp), sizeof(timestamp));
37 | };
38 | void Print(std::ostream &os) const
39 | {
40 | os << "t=" << timestamp;
41 | };
42 |
43 | // these are our accessors to our tag structure
44 | void SetTime(Time time) { timestamp = time; };
45 | Time GetTime() { return timestamp; };
46 |
47 | private:
48 | Time timestamp;
49 | };
50 |
51 | // A tag with two integer values for workload and application ids.
52 | class IdTag : public Tag
53 | {
54 | public:
55 | static TypeId GetTypeId(void)
56 | {
57 | static TypeId tid =
58 | TypeId("ns3::IdTag")
59 | .SetParent<Tag>()
60 | .AddConstructor<IdTag>()
61 | .AddAttribute("workload",
62 | "Workload id to save in tag.",
63 | EmptyAttributeValue(),
64 | MakeUintegerAccessor(&IdTag::workload),
65 | MakeUintegerChecker<uint32_t>())
66 | .AddAttribute("application",
67 | "Application id to save in tag.",
68 | EmptyAttributeValue(),
69 | MakeUintegerAccessor(&IdTag::application),
70 | MakeUintegerChecker<uint32_t>());
71 | return tid;
72 | };
73 | TypeId GetInstanceTypeId(void) const { return GetTypeId(); };
74 | uint32_t GetSerializedSize(void) const
75 | {
76 | return sizeof(workload) + sizeof(application);
77 | };
78 | void Serialize(TagBuffer i) const
79 | {
80 | i.Write(reinterpret_cast<const uint8_t *>(&workload),
81 | sizeof(workload));
82 | i.Write(reinterpret_cast<const uint8_t *>(&application),
83 | sizeof(application));
84 | };
85 | void Deserialize(TagBuffer i)
86 | {
87 | i.Read(reinterpret_cast<uint8_t *>(&workload), sizeof(workload));
88 | i.Read(reinterpret_cast<uint8_t *>(&application), sizeof(application));
89 | };
90 | void Print(std::ostream &os) const
91 | {
92 | os << "w=" << workload << ", "
93 | << "a=" << application;
94 | };
95 |
96 | // these are our accessors to our tag structure
97 | void SetWorkload(uint32_t newval) { workload = newval; };
98 | uint32_t GetWorkload() { return workload; };
99 | void SetApplication(uint32_t newval) { application = newval; };
100 | uint32_t GetApplication() { return application; };
101 |
102 | private:
103 | uint32_t workload;
104 | uint32_t application;
105 | };
106 |
107 | // A tag to check message ids and see if they are preserved across fragments
108 | class MessageTag : public Tag
109 | {
110 | public:
111 |
112 | static TypeId GetTypeId (void)
113 | {
114 | static TypeId tid = TypeId ("ns3::MessageTag")
115 | .SetParent<Tag> ()
116 | .AddConstructor<MessageTag> ()
117 | .AddAttribute ("SimpleValue",
118 | "A simple value",
119 | EmptyAttributeValue (),
120 | MakeUintegerAccessor (&MessageTag::m_simpleValue),
121 | MakeUintegerChecker<uint32_t> ());
122 | return tid;
123 | }
124 | TypeId GetInstanceTypeId(void) const
125 | {
126 | return GetTypeId();
127 | }
128 | uint32_t GetSerializedSize(void) const
129 | {
130 | return sizeof(m_simpleValue);
131 | }
132 | void Serialize(TagBuffer i) const
133 | {
134 | i.Write(reinterpret_cast<const uint8_t *>(&m_simpleValue),
135 | sizeof(m_simpleValue));
136 | }
137 | void Deserialize(TagBuffer i)
138 | {
139 | i.Read(reinterpret_cast<uint8_t *>(&m_simpleValue),
140 | sizeof(m_simpleValue));
141 | }
142 | void Print(std::ostream &os) const
143 | {
144 | os << "v=" << (uint32_t)m_simpleValue;
145 | }
146 | void SetSimpleValue(uint32_t value)
147 | {
148 | m_simpleValue = value;
149 | }
150 | uint32_t GetSimpleValue(void) const
151 | {
152 | return m_simpleValue;
153 | }
154 |
155 | private:
156 | uint32_t m_simpleValue;
157 | };
158 |
159 | #endif // EXPERIMENT_TAGS_H
160 |
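All three tags above follow the same pattern: `GetSerializedSize` reports a fixed byte count, and `Serialize`/`Deserialize` copy the raw fields into and out of the `TagBuffer`. A minimal Python sketch of the same round-trip, using `struct` as a stand-in for ns-3's `TagBuffer` (function names here are illustrative, not part of the codebase):

```python
# Fixed-width tag serialization round-trip, mirroring IdTag's two
# uint32_t fields packed into an 8-byte payload.
import struct

def serialize_id_tag(workload, application):
    # two unsigned 32-bit integers, little-endian
    return struct.pack("<II", workload, application)

def deserialize_id_tag(buf):
    workload, application = struct.unpack("<II", buf)
    return workload, application

buf = serialize_id_tag(7, 42)
assert len(buf) == 8              # GetSerializedSize(): 2 * sizeof(uint32_t)
print(deserialize_id_tag(buf))    # -> (7, 42)
```

Because the size is fixed and known up front, the receiver can deserialize without any framing or length prefix, which is exactly why `GetSerializedSize` must agree with what `Serialize` writes.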
--------------------------------------------------------------------------------
/literature/Literature.html:
--------------------------------------------------------------------------------
1 |
2 |
5 |
6 | Bookmarks
7 | Bookmarks Menu
8 |
9 | - Attention is all you need
10 |
- Should you mask 15% in masked language modeling?
11 |
- The illustrated transformer
12 |
- The ns-3 network simulator
13 |
- Homa: A receiver-driven low-latency transport protocol using network priorities
14 |
- Cloze procedure: A new tool for measuring readability
15 |
- Hypothesis testing in time series analysis
16 |
- Smoothing, forecasting and prediction of discrete time series
17 |
- Time series and forecasting: Brief history and future research
18 |
- Long short-term memory
19 |
- Modelling radiological language with bidirectional long short-term memory networks
20 |
- Learning in situ: a randomized experiment in video streaming
21 |
- The CAIDA anonymized internet traces data access
22 |
- Measurement lab
23 |
- Crawdad
24 |
- Rocketfuel: An ISP topology mapping engine
25 |
- Header space analysis: Static checking for networks
26 |
- Distilling the knowledge in a neural network
27 |
- Advances and open problems in federated learning
28 |
- PyTorch: An imperative style, high-performance deep learning library
29 |
- Layer normalization
30 |
- Adam: A method for stochastic optimization
31 |
- Robust estimation of a location parameter
32 |
- A new hope for network model generalization
33 |
- Classic meets modern: A pragmatic learning-based congestion control for the internet
34 |
- TCP ex machina: Computer-generated congestion control
36 |
- Oboe: Auto-tuning video ABR algorithms to network conditions
37 |
- AuTO: Scaling deep reinforcement learning for datacenter-scale automatic traffic optimization
38 |
- Learning to route
39 |
- Is advance knowledge of flow sizes a plausible assumption?
40 |
- One protocol to rule them all: Wireless Network-on-Chip using deep reinforcement learning
41 |
- Biases in data-driven networking, and what to do about them
42 |
- On the use of ML for blackbox system performance prediction
43 |
- Factorization tricks for LSTM networks
44 |
- Recent advances in natural language inference: A survey of benchmarks, resources, and approaches
45 |
- A survey on vision transformer
46 |
- Nuts and bolts of building applications using deep learning
47 |
- Network planning with deep reinforcement learning
48 |
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/memento/eval.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | import pandas as pd
4 | import numpy as np
5 | import seaborn as sns
6 |
7 | import matplotlib as mpl
8 | import matplotlib.pyplot as plt
9 | from matplotlib.ticker import FormatStrFormatter
10 |
11 | BIG = True
12 | TEST = True # Marked true for fine-tuning data with multiple bottlenecks
13 | val = sys.argv[1]
14 |
15 | sns.set_theme("paper", "whitegrid", font_scale=1.5)
16 | mpl.rcParams.update({
17 | 'text.usetex': True,
18 | 'font.family': 'serif',
19 | 'text.latex.preamble': r'\usepackage{amsmath,amssymb}',
20 |
21 | 'lines.linewidth': 2,
22 | 'lines.markeredgewidth': 0,
23 |
24 | 'scatter.marker': '.',
25 | 'scatter.edgecolors': 'none',
26 |
27 | # Set image quality and reduce whitespace around saved figure.
28 | 'savefig.dpi': 300,
29 | 'savefig.bbox': 'tight',
30 | 'savefig.pad_inches': 0.01,
31 | })
32 |
33 | if not TEST:
34 | frame = pd.read_csv("small_test_no_disturbance_with_message_ids{}.csv".format(val))
35 | else:
36 | if not BIG:
37 | frame = pd.read_csv("small_test_one_disturbance_with_message_ids{}.csv".format(val))
38 | else:
39 | frame = pd.read_csv("large_test_disturbance_with_message_ids{}.csv".format(val))
40 |
41 | # Get the time stamp, packet size and delay (from my format, Alex uses a different format)
42 | frame = frame[frame.columns[[1,7,-8]]]
43 | frame.columns = ["t", "size", "delay"]
44 | print(frame.head())
45 |
46 | frame = (
47 | frame
48 | .assign(delay=lambda df: df['delay'])  # delay is already in seconds; no conversion applied
49 | )
50 |
51 | plt.figure(figsize=(5,5))
52 | sbs = sns.displot(
53 | data=frame,
54 | kind='ecdf',
55 | x='delay'
56 | )
57 |
58 | #sbs.fig.suptitle('Delay plot with multiple senders')
59 | sbs.set(xlabel='Delay (seconds)', ylabel='Fraction of packets')
60 | plt.xlim([0,0.5])
61 | plt.ylim(bottom=0)
62 | # Tight layout
63 | sbs.fig.tight_layout()
64 | plt.savefig("delay"+".pdf")
65 |
66 |
67 | print("Delay quantiles (s):\n", frame['delay'].quantile([0.5, 0.99]))
68 |
69 | throughput = frame.loc[frame['t'] > 20, 'size'].sum() / 40 / (1024*1024)  # MBps over the 40 s steady-state window
70 | print("Throughput: {:.2f} MBps".format(throughput))
71 | queueframe = pd.read_csv("queue.csv", names=["source", "time", "size"])
72 |
73 | bottleneck_source = "/NodeList/0/DeviceList/0/$ns3::CsmaNetDevice/TxQueue/PacketsInQueue"
74 | bottleneck_queue = queueframe[queueframe["source"] == bottleneck_source]
75 | print(bottleneck_source)
76 |
77 | plt.figure(figsize=(5,5))
78 | scs = sns.relplot(
79 | data=bottleneck_queue,
80 | kind='line',
81 | x='time',
82 | y='size',
83 | legend=False,
84 | ci=None,
85 | )
86 |
87 | scs.fig.suptitle('Bottleneck queue plot with multiple senders')
88 | plt.savefig("Queuesize"+".pdf")
89 |
90 | ## Bottleneck plots for switches A, B, D, G
91 |
92 | if BIG:
93 | values = [6, 7, 9, 12]
94 | dict_switches = {
95 | 6: "A",
96 | 7: "B",
97 | 9: "D",
98 | 12: "G"
99 | }
100 | else:
101 | values = [2, 3]
102 | dict_switches = {
103 | 2: "A",
104 | 3: "B"
105 | }
106 |
107 | for value in values:
108 | bottleneck_source = "/NodeList/{}/DeviceList/0/$ns3::CsmaNetDevice/TxQueue/PacketsInQueue".format(value)
109 | bottleneck_queue = queueframe[queueframe["source"] == bottleneck_source]
110 | print(bottleneck_source)
111 |
112 | plt.figure(figsize=(5,5))
113 | scs = sns.relplot(
114 | data=bottleneck_queue,
115 | kind='line',
116 | x='time',
117 | y='size',
118 | legend=False,
119 | ci=None,
120 | )
121 |
122 | #scs.fig.suptitle('Bottleneck queue on switch {} '.format(dict_switches[value]))
123 | #scs.fig.suptitle('Queue on bottleneck switch')
124 | scs.set(xlabel='Simulation Time (seconds)', ylabel='Queue Size (packets)')
125 | plt.xlim([0,60])
126 | plt.ylim([0,1000])
127 |
128 | save_name = "Queue profile on switch {}".format(dict_switches[value]) + ".pdf"
129 | scs.fig.tight_layout()
130 | plt.savefig(save_name)
131 |
132 | dropframe = pd.read_csv("drops.csv", names=["source", "time", "packetsize"])
133 |
134 | print("Drop fraction:", len(dropframe) / (len(dropframe) + len(frame)))
135 |
136 | if BIG:
137 | ## Plot delay distribution for each receiver
138 | new_frame = pd.read_csv("large_test_disturbance_with_message_ids{}.csv".format(val))
139 | new_frame = new_frame[new_frame.columns[[1,7, 23, -8]]]
140 | new_frame.columns = ["t", "size", "dest ip", "delay"]
141 | print(new_frame.head())
142 |
143 | gb = new_frame.groupby('dest ip')
144 | groups = [gb.get_group(x) for x in gb.groups]
145 | print(groups)
146 |
147 | for idx, group in enumerate(groups):
148 | print(idx, group.shape)
149 | plt.figure(figsize=(5,5))
150 | scs = sns.displot(
151 | data=group,
152 | kind='ecdf',
153 | x='delay',
154 | legend=False
155 | )
156 |
157 | scs.fig.suptitle('Delay plot on receiver {} '.format(idx+1))
158 | scs.set(xlabel='Delay (seconds)', ylabel='Fraction of packets')
159 | plt.xlim([0,0.5])
160 | # Tight layout
161 | scs.fig.tight_layout()
162 | plt.savefig("delay_Receiver{}".format(idx)+".pdf")
163 |
164 | fig, ax = plt.subplots(figsize=(5,5))
165 | df0 = groups[0]
166 | df1 = groups[1]
167 | df2 = groups[2]
168 |
169 | scs0 = sns.ecdfplot(
170 | data=df0,
171 | x='delay',
172 | label="Receiver 1",
173 | color="blue",
174 | ax = ax
175 | )
176 | scs1 = sns.ecdfplot(
177 | data=df1,
178 | x='delay',
179 | label="Receiver 2",
180 | color="red",
181 | ax = ax
182 | )
183 | scs2 = sns.ecdfplot(
184 | data=df2,
185 | x='delay',
186 | label="Receiver 3",
187 | color="green",
188 | ax = ax
189 | )
190 |
191 | ax.set_xlabel("Delay (seconds)", fontsize=12)
192 | ax.set_ylabel("Fraction of packets",fontsize=12)
193 | ax.axis(xmin=0,xmax=0.5)
194 | ax.lines[0].set_linestyle("dotted")
195 | ax.lines[1].set_linestyle("--")
196 | ax.lines[2].set_linestyle("-.")
197 | fig.legend(["Receiver 1","Receiver 2","Receiver 3"],loc = "lower right", bbox_to_anchor=(0.948, 0.125), ncol=1, fontsize=10)
198 | #ax.get_legend().remove()
199 | # Tight layout
200 | fig.tight_layout()
201 | fig.savefig("delay_Receivers"+".pdf")
202 |
203 |
204 |
205 |
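The ECDF plots above (`sns.displot(kind='ecdf')`, `sns.ecdfplot`) all reduce to the same computation: sort the sample and step the cumulative fraction from 0 to 1. A self-contained sketch on synthetic delay values (the numbers below are made up for illustration):

```python
# What an ECDF plot computes: for each sorted value x, the fraction
# of packets with delay <= x.
import numpy as np
import pandas as pd

delays = pd.Series([0.05, 0.10, 0.10, 0.20, 0.40])  # seconds, synthetic

x = np.sort(delays.to_numpy())
y = np.arange(1, len(x) + 1) / len(x)  # fraction of packets <= x

print(list(zip(x, y)))
print(delays.quantile([0.5, 0.99]))   # same call as in eval.py
```

Reading tail quantiles off this curve is why the script reports `quantile([0.5, 0.99])`: the 99th percentile is where the ECDF crosses 0.99.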
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/ns3/tcpapplication.cc:
--------------------------------------------------------------------------------
1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
2 |
3 | #include <fstream>
4 | #include "ns3/core-module.h"
5 | #include "ns3/network-module.h"
6 | #include "ns3/internet-module.h"
7 | #include "ns3/point-to-point-module.h"
8 | #include "ns3/applications-module.h"
9 |
10 | using namespace ns3;
11 |
12 | NS_LOG_COMPONENT_DEFINE("TCPApplication");
13 |
14 | // ===========================================================================
15 | //
16 | // node 0 node 1
17 | // +----------------+ +----------------+
18 | // | ns-3 TCP | | ns-3 TCP |
19 | // +----------------+ +----------------+
20 | // | 10.1.1.1 | | 10.1.1.2 |
21 | // +----------------+ +----------------+
22 | // | point-to-point | | point-to-point |
23 | // +----------------+ +----------------+
24 | // | |
25 | // +---------------------+
26 | // 5 Mbps, 2 ms
27 | //
28 | // ===========================================================================
29 |
30 | // MAIN TAKEAWAY: Cannot hook onto trace sources and sinks during configuration as
31 | // they may be created during run time, and do not exist during configuration time.
32 |
33 | // Application Code
34 |
35 | class App : public Application
36 | {
37 | public:
38 | App();
39 | virtual ~App();
40 | void setup(Ptr<Socket> socket, Address address, uint32_t packetSize,
41 | uint32_t nPackets, DataRate dataRate);
42 |
43 | private:
44 | virtual void StartApplication(void);
45 | virtual void StopApplication(void);
46 |
47 | void ScheduleTx(void);
48 | void SendPacket(void);
49 |
50 | Ptr<Socket> m_socket;
51 | Address m_peer;
52 | uint32_t m_packetSize;
53 | uint32_t m_nPackets;
54 | DataRate m_dataRate;
55 | EventId m_sendEvent;
56 | bool m_running;
57 | uint32_t m_packetsSent;
58 | };
59 |
60 | // Constructor for the application
61 | App::App()
62 | : m_socket(0),
63 | m_peer(),
64 | m_packetSize(0),
65 | m_nPackets(0),
66 | m_dataRate(0),
67 | m_sendEvent(),
68 | m_running(false),
69 | m_packetsSent(0)
70 | {}
71 |
72 | // Destructor for the application
73 | App::~App()
74 | {
75 | m_socket = 0;
76 | }
77 |
78 | void App::setup(Ptr<Socket> socket, Address address, uint32_t packetSize,
79 | uint32_t nPackets, DataRate dataRate)
80 | {
81 | m_socket = socket;
82 | m_peer = address;
83 | m_packetSize = packetSize;
84 | m_nPackets = nPackets;
85 | m_dataRate = dataRate;
86 | }
87 |
88 | // Application start
89 | void App::StartApplication(void)
90 | {
91 | m_running = true;
92 | m_packetsSent = 0;
93 | m_socket->Bind();
94 | m_socket->Connect(m_peer);
95 | SendPacket();
96 | }
97 |
98 | // Application stop
99 | void App::StopApplication(void)
100 | {
101 | m_running = false;
102 | if(m_sendEvent.IsRunning())
103 | {
104 | Simulator::Cancel(m_sendEvent);
105 | }
106 | if(m_socket)
107 | {
108 | m_socket->Close();
109 | }
110 | }
111 |
112 | // Send the packet
113 | void App::SendPacket(void)
114 | {
115 | Ptr<Packet> packet = Create<Packet>(m_packetSize);
116 | m_socket->Send(packet);
117 |
118 | if(++m_packetsSent < m_nPackets)
119 | {
120 | ScheduleTx();
121 | }
122 | }
123 |
124 | void App::ScheduleTx(void)
125 | {
126 | if(m_running)
127 | {
128 | Time tNext(Seconds(m_packetSize * 8 / static_cast<double>(m_dataRate.GetBitRate())));
129 | m_sendEvent = Simulator::Schedule(tNext, &App::SendPacket, this);
130 | }
131 | }
132 |
133 | static void CwndChange(uint32_t oldCwnd, uint32_t newCwnd)
134 | {
135 | NS_LOG_INFO(Simulator::Now().GetSeconds() << "\t" << newCwnd);
136 | }
137 |
138 | static void RxDrop(Ptr<OutputStreamWrapper> stream, Ptr<const Packet> p)
139 | {
140 | NS_LOG_INFO("RxDrop at " << Simulator::Now().GetSeconds());
141 | *stream->GetStream() << "Rx drop at: " << Simulator::Now().GetSeconds() << std::endl;
142 | }
143 |
144 | // Main program
145 | int main(int argc, char *argv[])
146 | {
147 | NS_LOG_INFO("Create P2P nodes.....");
148 | NodeContainer nodes;
149 | nodes.Create(2);
150 |
151 | PointToPointHelper pointToPoint;
152 | pointToPoint.SetDeviceAttribute("DataRate", StringValue("5Mbps"));
153 | pointToPoint.SetChannelAttribute("Delay", StringValue("2ms"));
154 | pointToPoint.SetQueue("ns3::DropTailQueue", "MaxSize", StringValue ("50p"));
155 |
156 | NetDeviceContainer devices;
157 | devices = pointToPoint.Install(nodes);
158 |
159 | // Add errors in the channel at a given rate
160 |
161 | Ptr<RateErrorModel> em = CreateObject<RateErrorModel>();
162 | em->SetAttribute("ErrorRate", DoubleValue(0.00001));
163 | devices.Get(1)->SetAttribute("ReceiveErrorModel", PointerValue(em));
164 |
165 | InternetStackHelper stack;
166 | stack.Install(nodes);
167 |
168 | Ipv4AddressHelper address;
169 | address.SetBase("10.1.1.0", "255.255.255.252");
170 | Ipv4InterfaceContainer interfaces = address.Assign(devices);
171 |
172 | uint16_t sinkPort = 8080;
173 | Address sinkAddress(InetSocketAddress(interfaces.GetAddress(1), sinkPort));
174 | PacketSinkHelper packetSinkHelper("ns3::TcpSocketFactory",
175 | InetSocketAddress(Ipv4Address::GetAny(), sinkPort));
176 |
177 | ApplicationContainer sinkApps = packetSinkHelper.Install(nodes.Get(1));
178 | sinkApps.Start(Seconds(0.));
179 | sinkApps.Stop(Seconds(20.));
180 |
181 | Ptr<Socket> ns3TcpSocket = Socket::CreateSocket(nodes.Get(0),
182 | TcpSocketFactory::GetTypeId());
183 | ns3TcpSocket->TraceConnectWithoutContext("CongestionWindow",
184 | MakeCallback(&CwndChange));
185 |
186 | Ptr<App> app = CreateObject<App>();
187 | app->setup(ns3TcpSocket, sinkAddress, 104000, 1, DataRate("1Mbps"));
188 | nodes.Get(0)->AddApplication(app);
189 | app->SetStartTime(Seconds(1.));
190 | app->SetStopTime(Seconds(20.));
191 |
192 | AsciiTraceHelper ascii;
193 | Ptr<OutputStreamWrapper> streamRxDrops = ascii.CreateFileStream("outputs/RxDrops_tcpbasic.txt");
194 | devices.Get(1)->TraceConnectWithoutContext("PhyRxDrop", MakeBoundCallback(&RxDrop, streamRxDrops));
195 |
196 | Simulator::Stop(Seconds(20));
197 | Simulator::Run();
198 | Simulator::Destroy();
199 |
200 | return 0;
201 | }
202 |
203 |
204 |
205 |
206 |
207 |
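The pacing logic in `App::ScheduleTx` spaces sends by the time it takes to emit one packet at the application's configured rate, `packetSize * 8 / bitRate`. A quick sketch with the values passed to `setup` in `main` above (the helper name is illustrative):

```python
# Inter-send interval used by ScheduleTx(): one packet's worth of bits
# divided by the application data rate.

def tx_interval(packet_bytes, rate_bps):
    return packet_bytes * 8 / rate_bps

# app->setup(ns3TcpSocket, sinkAddress, 104000, 1, DataRate("1Mbps"))
print(tx_interval(104000, 1_000_000))  # seconds between sends -> 0.832
```

With `nPackets = 1` the interval is never actually used here, but for longer runs it is what keeps the application's average rate at `m_dataRate` rather than the 5 Mbps link rate.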
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/memento/cdf-application.h:
--------------------------------------------------------------------------------
1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
2 | //
3 | // Copyright (c) 2006 Georgia Tech Research Corporation
4 | //
5 | // This program is free software; you can redistribute it and/or modify
6 | // it under the terms of the GNU General Public License version 2 as
7 | // published by the Free Software Foundation;
8 | //
9 | // This program is distributed in the hope that it will be useful,
10 | // but WITHOUT ANY WARRANTY; without even the implied warranty of
11 | // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 | // GNU General Public License for more details.
13 | //
14 | // You should have received a copy of the GNU General Public License
15 | // along with this program; if not, write to the Free Software
16 | // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17 | //
18 | // Author: George F. Riley
19 | //
20 |
21 | // TODO: Update description
22 | // ns3 - On/Off Data Source Application class
23 | // George F. Riley, Georgia Tech, Spring 2007
24 | // Adapted from ApplicationOnOff in GTNetS.
25 |
26 | #ifndef CDF_APPLICATION_H
27 | #define CDF_APPLICATION_H
28 |
29 | #include "ns3/address.h"
30 | #include "ns3/application.h"
31 | #include "ns3/event-id.h"
32 | #include "ns3/ptr.h"
33 | #include "ns3/data-rate.h"
34 | #include "ns3/traced-callback.h"
35 | #include "ns3/random-variable-stream.h"
36 |
37 | namespace ns3
38 | {
39 |
40 | class Address;
41 | class RandomVariableStream;
42 | class Socket;
43 |
44 | /**
45 | * \ingroup applications
46 | * \defgroup onoff CdfApplication
47 | *
48 | * This traffic generator follows an On/Off pattern: after
49 | * Application::StartApplication
50 | * is called, "On" and "Off" states alternate. The duration of each of
51 | * these states is determined with the onTime and the offTime random
52 | * variables. During the "Off" state, no traffic is generated.
53 | * During the "On" state, cbr traffic is generated. This cbr traffic is
54 | * characterized by the specified "data rate" and "packet size".
55 | */
56 | /**
57 | * \ingroup onoff
58 | *
59 |  * \brief Generate traffic to a single destination according to a
60 |  * CDF pattern.
61 | *
62 | * This traffic generator follows an On/Off pattern: after
63 | * Application::StartApplication
64 | * is called, "On" and "Off" states alternate. The duration of each of
65 | * these states is determined with the onTime and the offTime random
66 | * variables. During the "Off" state, no traffic is generated.
67 | * During the "On" state, cbr traffic is generated. This cbr traffic is
68 | * characterized by the specified "data rate" and "packet size".
69 | *
70 | * Note: When an application is started, the first packet transmission
71 | * occurs _after_ a delay equal to (packet size/bit rate). Note also,
72 | * when an application transitions into an off state in between packet
73 | * transmissions, the remaining time until when the next transmission
74 | * would have occurred is cached and is used when the application starts
75 | * up again. Example: packet size = 1000 bits, bit rate = 500 bits/sec.
76 | * If the application is started at time 3 seconds, the first packet
77 | * transmission will be scheduled for time 5 seconds (3 + 1000/500)
78 | * and subsequent transmissions at 2 second intervals. If the above
79 | * application were instead stopped at time 4 seconds, and restarted at
80 | * time 5.5 seconds, then the first packet would be sent at time 6.5 seconds,
81 | * because when it was stopped at 4 seconds, there was only 1 second remaining
82 | * until the originally scheduled transmission, and this time remaining
83 | * information is cached and used to schedule the next transmission
84 | * upon restarting.
85 | *
86 | * If the underlying socket type supports broadcast, this application
87 | * will automatically enable the SetAllowBroadcast(true) socket option.
88 | */
89 | class CdfApplication : public Application
90 | {
91 | public:
92 | /**
93 | * \brief Get the type ID.
94 | * \return the object TypeId
95 | */
96 | static TypeId GetTypeId(void);
97 |
98 | CdfApplication();
99 |
100 | virtual ~CdfApplication();
101 |
102 | /**
103 | * \brief Return a pointer to associated socket.
104 | * \return pointer to associated socket
105 | */
106 |   Ptr<Socket> GetSocket(void) const;
107 |
108 | /**
109 | * \brief Assign a fixed random variable stream number to the random variables
110 | * used by this model.
111 | *
112 | * \param stream first stream index to use
113 | * \return the number of stream indices assigned by this model
114 | */
115 | int64_t AssignStreams(int64_t stream);
116 |
117 | protected:
118 | virtual void DoDispose(void);
119 |
120 | private:
121 | // inherited from Application base class.
122 | virtual void StartApplication(void); // Called at time specified by Start
123 | virtual void StopApplication(void); // Called at time specified by Stop
124 |
125 | //helpers
126 | /**
127 | * \brief Cancel all pending events.
128 | */
129 | void CancelEvents();
130 |
131 | // Event handlers
132 | /**
133 | * \brief Send a packet
134 | */
135 | void SendPacket();
136 |
137 |   Ptr<Socket> m_socket; //!< Associated socket
138 | Address m_peer; //!< Peer address
139 | bool m_connected; //!< True if connected
140 | DataRate m_rate; //!< Rate that data is generated
141 | Time m_lastStartTime; //!< Time last packet sent
142 | EventId m_sendEvent; //!< Event id of pending "send packet" event
143 | TypeId m_tid; //!< Type of the socket used
144 |
145 | // cdf files!
146 | std::string m_filename;
147 | double m_average_size; // in bytes!
148 |   Ptr<RandomVariableStream> m_sizeDist;
149 |   Ptr<RandomVariableStream> m_timeDist;
150 | uint32_t m_counter; // track number of fragments sent
151 |
152 | /// Traced Callback: transmitted packets.
153 |   TracedCallback<Ptr<const Packet>> m_txTrace;
154 |
155 | /// Callbacks for tracing the packet Tx events, includes source and destination addresses
156 |   TracedCallback<Ptr<const Packet>, const Address &, const Address &> m_txTraceWithAddresses;
157 |
158 | private:
159 | /**
160 | * \brief Schedule the next packet transmission
161 | */
162 | void ScheduleNextTx();
163 | /**
164 | * \brief Handle a Connection Succeed event
165 | * \param socket the connected socket
166 | */
167 |   void ConnectionSucceeded(Ptr<Socket> socket);
168 | /**
169 | * \brief Handle a Connection Failed event
170 | * \param socket the not connected socket
171 | */
172 |   void ConnectionFailed(Ptr<Socket> socket);
173 |
174 | // Accessors for Distribution Attributes
175 | bool SetDistribution(std::string filename);
176 | std::string GetDistribution() const;
177 |
178 | void SetRate(DataRate rate);
179 | DataRate GetRate() const;
180 |
181 | // Helper to set the rate dist, needs to be called by both setters above.
182 | void UpdateRateDistribution();
183 | };
184 |
185 | } // namespace ns3
186 |
187 | #endif /* CDF_APPLICATION_H */
188 |
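The cached residual-time scheduling described in the header's comment (first transmission after packet size divided by bit rate, with the remaining time preserved across a stop/restart) can be checked against the comment's worked example with a small sketch:

```python
def first_tx_time(start, packet_bits, rate_bps, stop=None, restart=None):
    """Time of the first packet transmission, per the On/Off scheduling rule."""
    interval = packet_bits / rate_bps  # e.g. 1000 bits / 500 bit/s = 2 s
    scheduled = start + interval
    if stop is not None and stop < scheduled:
        # Application stopped before sending: cache the remaining time
        # and reuse it when the application restarts.
        remaining = scheduled - stop
        return restart + remaining
    return scheduled

# Example from the comment: start at 3 s -> first packet at 5 s.
uninterrupted = first_tx_time(3.0, 1000, 500)
# Stopped at 4 s (1 s remaining), restarted at 5.5 s -> first packet at 6.5 s.
interrupted = first_tx_time(3.0, 1000, 500, stop=4.0, restart=5.5)
```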
--------------------------------------------------------------------------------
/workspace/TransformerModels/plot_losses.py:
--------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
3 | import matplotlib as mpl
4 | import matplotlib.pyplot as plt
5 | import seaborn as sns
6 | from matplotlib.ticker import FormatStrFormatter
7 | from tbparse import SummaryReader
8 |
9 | log_dir = "../../logs/encoder_delay_logs/"
10 | reader = SummaryReader(log_dir)
11 | df = reader.scalars
12 |
13 | epoch_train_loss_df = df[df["tag"] == "Avg loss per epoch"]
14 | epoch_train_loss_df.reset_index(inplace=True, drop=True)
15 | print(epoch_train_loss_df)
16 |
17 | train_loss_step_df = df[df["tag"] == "Train loss"]
18 | train_loss_step_df.reset_index(inplace=True, drop=True)
19 | print(train_loss_step_df)
20 |
21 | val_loss_step_df = df[df["tag"] == "Val loss"]
22 | val_loss_step_df.reset_index(inplace=True, drop=True)
23 | print(val_loss_step_df)
24 |
25 | ## Train loss plot (pre-training)
26 | plt.figure(figsize=(5, 5))
27 | sns.lineplot(
28 | x=epoch_train_loss_df.index,
29 | y="value",
30 | data=epoch_train_loss_df,
31 | label="Avg train loss per epoch",
32 | )
33 | # plt.show()
34 |
35 | ## Val loss plot (pre-training)
36 | plt.figure(figsize=(5, 5))
37 | sns.lineplot(
38 | x=val_loss_step_df.index,
39 | y="value",
40 | data=val_loss_step_df,
41 | label="Avg val loss per epoch",
42 | )
43 | # plt.show()
44 |
45 | mct_log_dir_pretrained = "../../logs/finetune_mct_logs/"
46 | reader_pretrained = SummaryReader(mct_log_dir_pretrained)
47 | df_pretrained = reader_pretrained.scalars
48 |
49 | train_loss_epoch_df_pretrained = df_pretrained[
50 | df_pretrained["tag"] == "Avg loss per epoch"
51 | ]
52 | train_loss_epoch_df_pretrained.reset_index(inplace=True, drop=True)
53 |
54 | val_loss_epoch_df_pretrained = df_pretrained[df_pretrained["tag"] == "Val loss"]
55 | val_loss_epoch_df_pretrained.reset_index(inplace=True, drop=True)
56 |
57 |
58 | mct_log_dir_nonpretrained = "../../logs/finetune_mct_logs2/"
59 | reader_nonpretrained = SummaryReader(mct_log_dir_nonpretrained)
60 | df_nonpretrained = reader_nonpretrained.scalars
61 |
62 | train_loss_epoch_df_nonpretrained = df_nonpretrained[
63 | df_nonpretrained["tag"] == "Avg loss per epoch"
64 | ]
65 | train_loss_epoch_df_nonpretrained.reset_index(inplace=True, drop=True)
66 |
67 | val_loss_epoch_df_nonpretrained = df_nonpretrained[
68 | df_nonpretrained["tag"] == "Val loss"
69 | ]
70 | val_loss_epoch_df_nonpretrained.reset_index(inplace=True, drop=True)
71 |
72 | print(train_loss_epoch_df_nonpretrained.head(25))
73 | print(val_loss_epoch_df_nonpretrained.head(25))
74 |
75 | print(train_loss_epoch_df_pretrained.head(17))
76 | print(val_loss_epoch_df_pretrained.head(17))
77 |
78 |
79 | sns.set_theme("paper", "whitegrid", font_scale=1.2)
80 | mpl.rcParams.update(
81 | {
82 | "text.usetex": True,
83 | "font.family": "serif",
84 | "text.latex.preamble": r"\usepackage{amsmath,amssymb}",
85 | "lines.linewidth": 2,
86 | "lines.markeredgewidth": 0,
87 | "scatter.marker": ".",
88 | "scatter.edgecolors": "none",
89 | # Set image quality and reduce whitespace around saved figure.
90 | "savefig.dpi": 300,
91 | "savefig.bbox": "tight",
92 | "savefig.pad_inches": 0.01,
93 | }
94 | )
95 |
96 | fig, ax = plt.subplots(2, figsize=(5, 5), sharex=True)
97 | plt.subplots_adjust(hspace=0.03)
98 | # plt.xticks(fontsize=8)
99 | # plt.yticks(fontsize=8)
100 |
101 |
102 | ## Train loss (pre-trained vs non-pretrained)
103 | # plt.figure(figsize=(3, 1.67))
104 | g1 = sns.lineplot(
105 | x=train_loss_epoch_df_pretrained.index,
106 | y="value",
107 | data=train_loss_epoch_df_pretrained,
108 | color="green",
109 | label="Pre-trained",
110 | ax=ax[0],
111 | )
112 | g2 = sns.lineplot(
113 | x=train_loss_epoch_df_nonpretrained.index,
114 | y="value",
115 | data=train_loss_epoch_df_nonpretrained,
116 | color="red",
117 | label="From scratch",
118 | ax=ax[0],
119 | )
120 | # plt.title("Train loss on MCT prediction pre-trained vs non-pretrained")
121 | ax[0].set_xlabel("Training Epoch", fontsize=12)
122 | ax[0].set_ylabel("Training MSE", fontsize=12)
123 | ax[0].lines[1].set_linestyle("--")
124 | ticks = [0, 0.25, 0.5, 0.75, 1]
125 | ax[0].yaxis.set_ticks(ticks)
126 | tickLabels = map(str, ticks)
127 | ax[0].yaxis.set_ticklabels(tickLabels)
128 | ax[0].axis(ymin=0, ymax=1)
129 | ax[0].axis(xmin=0, xmax=25)
130 |
131 | # ax[0].legend(fontsize=8)
132 | # plt.savefig("../../figures/MCT_train_loss.pdf")
133 |
134 | ## Val loss (pre-trained vs non-pretrained)
135 | # plt.figure(figsize=(3, 1.67))
136 | g3 = sns.lineplot(
137 | x=val_loss_epoch_df_pretrained.index,
138 | y="value",
139 | data=val_loss_epoch_df_pretrained,
140 | color="green",
141 | label="Pre-trained",
142 | ax=ax[1],
143 | )
144 | g4 = sns.lineplot(
145 | x=val_loss_epoch_df_nonpretrained.index,
146 | y="value",
147 | data=val_loss_epoch_df_nonpretrained,
148 | color="red",
149 | label="From scratch",
150 | ax=ax[1],
151 | )
152 | # plt.title("Val loss on MCT prediction pre-trained vs non-pretrained")
153 | ax[1].set_xlabel("Training Epoch", fontsize=12)
154 | ax[1].set_ylabel("Validation MSE", fontsize=12)
155 | ticks = [0, 0.25, 0.5, 0.75, 1]
156 | ax[1].yaxis.set_ticks(ticks)
157 | tickLabels = map(str, ticks)
158 | ax[1].yaxis.set_ticklabels(tickLabels)
159 | ax[1].lines[1].set_linestyle("--")
160 | ax[1].axis(ymin=0, ymax=1)
161 | ax[1].axis(xmin=0, xmax=25)
162 | # plt.xticks(fontsize=8)
163 | # plt.yticks(fontsize=8)
164 | # ax[1].legend(fontsize=8)
165 | fig.legend(
166 | ["Pre-trained", "From scratch"],
167 | loc="upper right",
168 | bbox_to_anchor=(0.968, 0.973),
169 | ncol=1,
170 | fontsize=12,
171 | )
172 | ax[1].get_legend().remove()
173 | ax[0].get_legend().remove()
174 | fig.tight_layout()
175 | plt.savefig("../../figures_test/MCT_loss.pdf")
176 |
177 |
178 | fig1, ax1 = plt.subplots(figsize=(5, 5), sharex=True)
179 | g1 = sns.lineplot(
180 | x=train_loss_epoch_df_pretrained.index,
181 | y="value",
182 | data=train_loss_epoch_df_pretrained,
183 | color="green",
184 | label="Pre-trained",
185 | ax=ax1,
186 | )
187 | g2 = sns.lineplot(
188 | x=train_loss_epoch_df_nonpretrained.index,
189 | y="value",
190 | data=train_loss_epoch_df_nonpretrained,
191 | color="red",
192 | label="From scratch",
193 | ax=ax1,
194 | )
195 | ax1.set_xlabel("Training Epoch", fontsize=8)
196 | ax1.set_ylabel("Training MSE", fontsize=8)
197 | ax1.lines[1].set_linestyle("--")
198 | ticks = [0, 0.25, 0.5, 0.75, 1]
199 | ax1.yaxis.set_ticks(ticks)
200 | tickLabels = map(str, ticks)
201 | ax1.yaxis.set_ticklabels(tickLabels)
202 | ax1.axis(ymin=0, ymax=1)
203 | ax1.axis(xmin=0, xmax=25)
204 | fig1.legend(
205 | ["Pre-trained", "From scratch"],
206 | loc="upper right",
207 | bbox_to_anchor=(0.968, 0.973),
208 | ncol=1,
209 | fontsize=8,
210 | )
211 | ax1.get_legend().remove()
212 | fig1.tight_layout()
213 | plt.savefig("../../figures_test/MCT_trainloss.pdf")
214 |
215 | fig2, ax2 = plt.subplots(figsize=(5, 5), sharex=True)
216 | g3 = sns.lineplot(
217 | x=val_loss_epoch_df_pretrained.index,
218 | y="value",
219 | data=val_loss_epoch_df_pretrained,
220 | color="green",
221 | label="Pre-trained",
222 | ax=ax2,
223 | )
224 | g4 = sns.lineplot(
225 | x=val_loss_epoch_df_nonpretrained.index,
226 | y="value",
227 | data=val_loss_epoch_df_nonpretrained,
228 | color="red",
229 | label="From scratch",
230 | ax=ax2,
231 | )
232 | ax2.set_xlabel("Training Epoch", fontsize=8)
233 | ax2.set_ylabel("Validation MSE", fontsize=8)
234 | ticks = [0, 0.25, 0.5, 0.75, 1]
235 | ax2.yaxis.set_ticks(ticks)
236 | tickLabels = map(str, ticks)
237 | ax2.yaxis.set_ticklabels(tickLabels)
238 | ax2.lines[1].set_linestyle("--")
239 | ax2.axis(ymin=0, ymax=1)
240 | ax2.axis(xmin=0, xmax=25)
241 | fig2.legend(
242 | ["Pre-trained", "From scratch"],
243 | loc="upper right",
244 | bbox_to_anchor=(0.968, 0.973),
245 | ncol=1,
246 | fontsize=8,
247 | )
248 | ax2.get_legend().remove()
249 | fig2.tight_layout()
250 | plt.savefig("../../figures_test/MCT_valloss.pdf")
251 |
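The tag-filtering pattern used throughout the script can be illustrated on a toy scalars frame (a stand-in for `SummaryReader(log_dir).scalars`, which has the same `tag`/`value` layout):

```python
import pandas as pd

# Toy stand-in for the TensorBoard scalars DataFrame read by tbparse.
df = pd.DataFrame({
    "tag": ["Train loss", "Val loss", "Train loss", "Avg loss per epoch"],
    "value": [0.9, 0.8, 0.5, 0.7],
})

# Select one scalar series and give it a clean 0..n-1 index for plotting.
train = df[df["tag"] == "Train loss"].reset_index(drop=True)
```

Assigning the result of `reset_index(drop=True)` (instead of calling it with `inplace=True` on a filtered slice, as the script does) avoids pandas' chained-assignment pitfalls while producing the same index.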
--------------------------------------------------------------------------------
/workspace/PandasScripts/csvhelper_memento.py:
--------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
3 | import argparse
4 | import os
5 |
6 | import numpy as np
7 | import pandas as pd
8 |
9 | print("Current directory is:", os.getcwd())
10 | print("Generate combined csv for TCP congestion data")
11 |
12 |
13 | def extract_TTL(text):
14 | list_of_features = text.split()
15 | idx_of_ttl = list_of_features.index("ttl")
16 | ttl = list_of_features[idx_of_ttl + 1]
17 | return ttl
18 |
19 |
20 | def extract_protocol(text):
21 | list_of_features = text.split()
22 | idx_of_protocol = list_of_features.index("protocol")
23 | protocol = list_of_features[idx_of_protocol + 1]
24 | return protocol
25 |
26 |
27 | def rename_flowid(input_text):
28 | return input_text
29 |
30 |
31 | def generate_senders_csv(path, n_senders):
32 | path = path
33 | num_senders = n_senders
34 | sender_num = 0
35 |
36 | df_sent_cols = [
37 | "Timestamp",
38 | "Flow ID",
39 | "Packet ID",
40 | "Packet Size",
41 | "IP ID",
42 | "DSCP",
43 | "ECN",
44 | "TTL",
45 | "Payload Size",
46 | "Proto",
47 | "Source IP",
48 | "Destination IP",
49 | "TCP Source Port",
50 | "TCP Destination Port",
51 | "TCP Sequence Number",
52 | "TCP Window Size",
53 | "Delay",
54 | "Workload ID",
55 | "Application ID",
56 | "Message ID",
57 | ]
58 |
59 | df_sent_cols_to_drop = [
60 | 0,
61 | 2,
62 | 4,
63 | 6,
64 | 8,
65 | 10,
66 | 12,
67 | 14,
68 | 16,
69 | 18,
70 | 20,
71 | 22,
72 | 24,
73 | 26,
74 | 28,
75 | 30,
76 | 32,
77 | 34,
78 | 36,
79 | 38,
80 | 40,
81 | ]
82 |
83 | temp_cols = [
84 | "Timestamp",
85 | "Flow ID",
86 | "Packet ID",
87 | "Packet Size",
88 | "IP ID",
89 | "DSCP",
90 | "ECN",
91 | "TTL",
92 | "Payload Size",
93 | "Proto",
94 | "Source IP",
95 | "Destination IP",
96 | "TCP Source Port",
97 | "TCP Destination Port",
98 | "TCP Sequence Number",
99 | "TCP Window Size",
100 | "Delay",
101 | "Workload ID",
102 | "Application ID",
103 | "Message ID",
104 | ]
105 |
106 | temp = pd.DataFrame(columns=temp_cols)
107 | print(temp.head())
108 |
109 | # files = ["topo_1.csv", "topo_2.csv", "topo_test_1.csv", "topo_test_2.csv"]
110 | # files = ["topo_more_data_1.csv", "topo_more_data_2.csv", "topo_more_data_3.csv",
111 | # "topo_more_data_4.csv", "topo_more_data_5.csv", "topo_more_data_6.csv"]
112 |
113 | """files = ["small_test_no_disturbance_with_message_ids1.csv",
114 | "small_test_no_disturbance_with_message_ids2.csv",
115 | "small_test_no_disturbance_with_message_ids3.csv",
116 | "small_test_no_disturbance_with_message_ids4.csv",
117 | "small_test_no_disturbance_with_message_ids5.csv",
118 | "small_test_no_disturbance_with_message_ids6.csv",
119 | "small_test_no_disturbance_with_message_ids7.csv",
120 | "small_test_no_disturbance_with_message_ids8.csv",
121 | "small_test_no_disturbance_with_message_ids9.csv",
122 | "small_test_no_disturbance_with_message_ids10.csv"]"""
123 |
124 | """files = ["small_test_one_disturbance_with_message_ids1.csv",
125 | "small_test_one_disturbance_with_message_ids2.csv",
126 | "small_test_one_disturbance_with_message_ids3.csv",
127 | "small_test_one_disturbance_with_message_ids4.csv",
128 | "small_test_one_disturbance_with_message_ids5.csv",
129 | "small_test_one_disturbance_with_message_ids6.csv",
130 | "small_test_one_disturbance_with_message_ids7.csv",
131 | "small_test_one_disturbance_with_message_ids8.csv",
132 | "small_test_one_disturbance_with_message_ids9.csv",
133 | "small_test_one_disturbance_with_message_ids10.csv",
134 | "small_test_one_disturbance_with_message_ids11.csv"]"""
135 |
136 | files = [
137 | "large_test_disturbance_with_message_ids1.csv",
138 | "large_test_disturbance_with_message_ids2.csv",
139 | "large_test_disturbance_with_message_ids3.csv",
140 | "large_test_disturbance_with_message_ids4.csv",
141 | "large_test_disturbance_with_message_ids5.csv",
142 | "large_test_disturbance_with_message_ids6.csv",
143 | "large_test_disturbance_with_message_ids7.csv",
144 | "large_test_disturbance_with_message_ids8.csv",
145 | "large_test_disturbance_with_message_ids9.csv",
146 | "large_test_disturbance_with_message_ids10.csv",
147 | ]
148 | # files = ["memento_test10.csv", "memento_test20.csv", "memento_test25.csv"]
149 |
150 | for file in files:
151 |
152 | sender_tx_df = pd.read_csv(path + file)
153 | sender_tx_df = pd.DataFrame(np.vstack([sender_tx_df.columns, sender_tx_df]))
154 | sender_tx_df.drop(
155 | sender_tx_df.columns[df_sent_cols_to_drop], axis=1, inplace=True
156 | )
157 |
158 | sender_tx_df.columns = df_sent_cols
159 | sender_tx_df.loc[0, "Packet ID"] = 0
160 | sender_tx_df.loc[0, "Flow ID"] = sender_tx_df["Flow ID"].iloc[1]
161 | sender_tx_df.loc[0, "IP ID"] = 0
162 | sender_tx_df.loc[0, "DSCP"] = 0
163 | sender_tx_df.loc[0, "ECN"] = 0
164 | sender_tx_df.loc[0, "TCP Sequence Number"] = 0
165 | # sender_tx_df["TTL"] = sender_tx_df.apply(lambda row: extract_TTL(row['Extra']), axis = 1)
166 | # sender_tx_df["Proto"] = sender_tx_df.apply(lambda row: extract_protocol(row['Extra']), axis = 1)
167 | sender_tx_df["Flow ID"] = [sender_num for i in range(sender_tx_df.shape[0])]
168 | sender_tx_df.loc[0, "Message ID"] = sender_tx_df["Message ID"].iloc[1]
169 |
170 | df_sent_cols_new = [
171 | "Timestamp",
172 | "Flow ID",
173 | "Packet ID",
174 | "Packet Size",
175 | "IP ID",
176 | "DSCP",
177 | "ECN",
178 | "Payload Size",
179 | "TTL",
180 | "Proto",
181 | "Source IP",
182 | "Destination IP",
183 | "TCP Source Port",
184 | "TCP Destination Port",
185 | "TCP Sequence Number",
186 | "TCP Window Size",
187 | "Delay",
188 | "Workload ID",
189 | "Application ID",
190 | "Message ID",
191 | ]
192 | sender_tx_df = sender_tx_df[df_sent_cols_new]
193 |
194 | # sender_tx_df.drop(['Extra'],axis = 1, inplace=True)
195 | temp = pd.concat([temp, sender_tx_df], ignore_index=True, copy=False)
196 | # sender_tx_df.drop(['Extra'],axis = 1, inplace=True)
197 | save_name = file.split(".")[0] + "_final.csv"
198 | sender_tx_df.to_csv(path + save_name, index=False)
199 |
200 | # temp.drop(['Extra'],axis = 1, inplace=True)
201 | print(temp.head())
202 | print(temp.columns)
203 | print(temp.shape)
204 |
205 | return temp
206 |
207 |
208 | def main():
209 | parser = argparse.ArgumentParser()
210 | parser.add_argument(
211 | "-mod",
212 | "--model",
213 | help="choose CC model for creating congestion",
214 | required=False,
215 | )
216 | parser.add_argument(
217 | "-nsend",
218 | "--numsenders",
219 | help="choose path for different topologies",
220 | required=False,
221 | )
222 | args = parser.parse_args()
223 | print(args)
224 |
225 | if args.model == "memento":
226 | path = "results/"
227 |
228 | else:
229 | path = "results/"  # fall back to the default results directory so path is always defined
230 |
231 | n_senders = 1
232 | sender_csv = generate_senders_csv(path, n_senders)
233 |
234 |
235 | if __name__ == "__main__":
236 | main()
237 |
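The even-indexed `df_sent_cols_to_drop` list above assumes the raw trace CSV interleaves field labels and values (label, value, label, value, ...); a minimal sketch of that reshaping on hypothetical data:

```python
import pandas as pd

# Hypothetical raw rows in the interleaved label/value layout.
raw = pd.DataFrame([
    ["ts", 0.5, "size", 1500],
    ["ts", 0.6, "size", 1400],
])

# Dropping the even-indexed (label) columns keeps only the value columns.
values = raw.drop(raw.columns[[0, 2]], axis=1)
values.columns = ["Timestamp", "Packet Size"]
```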
--------------------------------------------------------------------------------
/workspace/TransformerModels/arima.py:
--------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
3 | import argparse
4 | import time as t
5 | import warnings
6 | from datetime import datetime
7 | from math import sqrt
8 |
9 | import matplotlib as mpl
10 | import matplotlib.pyplot as plt
11 | import numpy as np
12 | import pandas as pd
13 | import seaborn as sns
14 | from generate_sequences import generate_ARIMA_delay_data
15 | from sklearn.metrics import mean_squared_error
16 | from statsmodels.tools.sm_exceptions import ConvergenceWarning
17 | from statsmodels.tsa.arima.model import ARIMA
18 |
19 | NUM_BOTTLENECKS = 1
20 |
21 |
22 | def run_arima():
23 | delay_data = generate_ARIMA_delay_data(NUM_BOTTLENECKS)
24 | targets, predictions = [], []
25 | warnings.simplefilter("ignore", ConvergenceWarning)
26 |
27 | # count = 0
28 | # We want minimum 1023 for the first ARIMA prediction (size of the window)
29 | # Make this 29990 -> 9990 for the 10000 history window ARIMA
30 | for value in range(1023, int(delay_data.shape[0] / 116) + 29990):
31 |
32 | # We want to predict the next value
33 | # Fit the model
34 | model = ARIMA(delay_data[:value], order=(1, 1, 2))
35 | model_fit = model.fit()
36 | yhat = model_fit.forecast(steps=1)
37 | targets.append(delay_data[value])
38 | predictions.append(yhat)
39 | # count+=1
40 |
41 | # print(count)
42 | return targets, predictions
43 |
44 |
45 | def evaluate_arima(targets, predictions):
46 | mse = mean_squared_error(targets, predictions)
47 | squared_error = np.square(targets - predictions)
48 | return squared_error, mse
49 |
50 |
51 | if __name__ == "__main__":
52 |
53 | args = argparse.ArgumentParser()
54 | args.add_argument("--run", action="store_true")
55 | args = args.parse_args()
56 |
57 | if args.run:
58 |
59 | print("Started ARIMA at:")
60 | time = datetime.now()
61 | print(time)
62 |
63 | targets, predictions = run_arima()
64 |
65 | ## MSE calculation
66 | mse = mean_squared_error(targets, predictions)
67 | print(mse)
68 |
69 | print("Finished ARIMA at:")
70 | time = datetime.now()
71 | print(time)
72 |
73 | # Save the results
74 | df = pd.DataFrame({"Targets": targets, "Predictions": predictions})
75 | df.to_csv("memento_data/ARIMA_30000.csv", index=False)
76 |
77 | else:
78 | print("ARIMA load results from file")
79 | df = pd.read_csv("memento_data/ARIMA_30000.csv")
80 |
81 | targets = df["Targets"]
82 | predictions = (
83 | df["Predictions"].str.split(" ").str[4].str.split("\n").str[0].astype(float)
84 | )
85 |
86 | squared_error, mse = evaluate_arima(targets, predictions)
87 |
88 | df = pd.DataFrame(
89 | {
90 | "Squared Error": squared_error,
91 | "targets": targets,
92 | "predictions": predictions,
93 | }
94 | )
95 | df.to_csv("memento_data/ARIMA_evaluation_30000.csv", index=False)
96 |
97 | print(df.head())
98 |
99 | print(squared_error.values)
100 |
101 | ## Stats on the squared error
102 | # Mean squared error
103 | print(
104 | np.mean(squared_error.values),
105 | " Mean squared error",
106 | )
107 | # Median squared error
108 | print(np.median(squared_error.values), " Median squared error")
109 | # 90th percentile squared error
110 | print(
111 | np.quantile(squared_error.values, 0.90, method="closest_observation"),
112 | " 90th percentile squared error",
113 | )
114 | # 99th percentile squared error
115 | print(
116 | np.quantile(squared_error.values, 0.99, method="closest_observation"),
117 | " 99th percentile squared error",
118 | )
119 | # 99.9th percentile squared error
120 | print(
121 | np.quantile(squared_error.values, 0.999, method="closest_observation"),
122 | " 99.9th percentile squared error",
123 | )
124 | # Standard deviation squared error
125 | print(np.std(squared_error.values), " Standard deviation squared error")
126 |
127 | ## Df row where the squared error is the a certain value
128 | print(
129 | df[
130 | df["Squared Error"]
131 | == np.quantile(squared_error.values, 0.5, method="closest_observation")
132 | ],
133 | "Values at median SE",
134 | )
135 | print(
136 | df[
137 | df["Squared Error"]
138 | == np.quantile(squared_error.values, 0.90, method="closest_observation")
139 | ],
140 | "Values at 90th percentile SE",
141 | )
142 | print(
143 | df[
144 | df["Squared Error"]
145 | == np.quantile(squared_error.values, 0.99, method="closest_observation")
146 | ],
147 | "Values at 99th percentile SE",
148 | )
149 | print(
150 | df[
151 | df["Squared Error"]
152 | == np.quantile(
153 | squared_error.values, 0.999, method="closest_observation"
154 | )
155 | ],
156 | "Values at 99.9th percentile SE",
157 | )
158 | print(
159 | df[
160 | df["Squared Error"]
161 | == np.quantile(
162 | squared_error.values, 0.9999, method="closest_observation"
163 | )
164 | ],
165 | "Values at 99.99th percentile SE",
166 | )
167 |
168 | # Plot the index vs squared error
169 | # Set figure size
170 |
171 | sns.set_theme("paper", "whitegrid", font_scale=1.5)
172 | mpl.rcParams.update(
173 | {
174 | "text.usetex": True,
175 | "font.family": "serif",
176 | "text.latex.preamble": r"\usepackage{amsmath,amssymb}",
177 | "lines.linewidth": 2,
178 | "lines.markeredgewidth": 0,
179 | "scatter.marker": ".",
180 | "scatter.edgecolors": "none",
181 | # Set image quality and reduce whitespace around saved figure.
182 | "savefig.dpi": 300,
183 | "savefig.bbox": "tight",
184 | "savefig.pad_inches": 0.01,
185 | }
186 | )
187 |
188 | # Plot the index vs squared error
189 | plt.figure(figsize=(5, 5))
190 | sns.lineplot(x=df.index, y=df["Squared Error"], color="red")
191 | plt.xlabel("History Length")
192 | plt.ylabel("Squared Error")
193 | # plt.title("Squared Error on predictions vs History Length upto 10000")
194 | plt.xlim(1023, 10000)
195 | # place legend to the right, bbox is the box around the legend
196 | # plt.legend(loc='upper right', bbox_to_anchor=(0.67, 1.02), ncol=1)
197 | plt.savefig("SE_trend_arima_10000.pdf")
198 |
199 | ## Do the plots over a loop of xlims
200 | xlims = [0, 6000, 12000, 18000, 24000, 30000]
201 | for idx_xlim in range(len(xlims) - 1):
202 | plt.figure(figsize=(10, 6))
203 | sns.lineplot(
204 | x=df.index, y=df["Squared Error"], color="red", label="Squared Error"
205 | )
206 | # label axes
207 | plt.xlabel("History Length")
208 | plt.ylabel("Squared Error")
209 | # set xlim
210 | plt.xlim(xlims[idx_xlim], xlims[idx_xlim + 1])
211 | plt.title(
212 | "Squared Error trend for xlims "
213 | + str(xlims[idx_xlim])
214 | + " to "
215 | + str(xlims[idx_xlim + 1])
216 | )
217 | plt.savefig(
218 | "SE_trend_arima_xlim_"
219 | + str(xlims[idx_xlim])
220 | + "_"
221 | + str(xlims[idx_xlim + 1])
222 | + ".pdf"
223 | )
224 |
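The script looks up the rows where the squared error equals a given quantile; this only works because `method="closest_observation"` makes `np.quantile` return an actual sample from the data rather than an interpolated value. A numpy-only sketch on synthetic errors:

```python
import numpy as np

# Synthetic stand-in for the squared-error column.
rng = np.random.default_rng(42)
squared_error = rng.exponential(scale=1.0, size=1000)

summary = {
    "mean": float(np.mean(squared_error)),
    "median": float(np.median(squared_error)),
    # closest_observation returns an element of the array, so the
    # quantile value can be matched back to its row, as the script does.
    "p99": float(np.quantile(squared_error, 0.99, method="closest_observation")),
}
```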
--------------------------------------------------------------------------------
/report/introduction.tex:
--------------------------------------------------------------------------------
1 | \chapter{Introduction}
2 | \label{cha:introduction}
3 |
4 | Learning the fundamental behaviour of network data from packet traces is an extremely hard problem. While machine learning (ML) algorithms have proven to be an efficient way of learning from raw data, adapting them to the general network domain has been so difficult that the community rarely attempts it. In this project, we argue that all is not lost: using specific machine learning architectures such as the Transformer, it is indeed possible to develop methods that learn from such data in a general manner.
5 |
6 | \section{Motivation}
7 | \label{sec:motivation}
8 |
9 | Modelling network dynamics is a \emph{sequence modelling} problem. From a sequence of past packets, the goal is to estimate the current state of the network (\eg Is there congestion? Will the packet be dropped?), and then to predict the state's evolution and the fate of future traffic. Concretely, we can also decide which action to take next, \eg should the next packet be put on a different path? Owing to the successes of ML in learning from data, it is becoming increasingly popular to apply such algorithms to this modelling problem, but the task is notoriously complex. There has been some success in using ML for specific applications in networks, including congestion control\cite{classic,jayDeepReinforcementLearning2019,dynamic,exmachina},
10 | video streaming\cite{oboe,maoNeuralAdaptiveVideo2017,puffer},
11 | traffic optimization\cite{auto},
12 | routing\cite{learnroute},
13 | flow size prediction\cite{flow,onlineflow},
14 | MAC protocol optimization\cite{oneproto,heterowire},
15 | and network simulation\cite{zhangMimicNetFastPerformance2021}; however, a good framework for general-purpose learning on network data still does not exist.
16 |
17 | Today's ML models are trained for specific tasks and do not generalize well; \ie they often fail to deliver outside of their original training environments\cite{puffer, datadriven, blackbox}. As a result, generalizing to different tasks is not even considered. Recent work argues that, rather than hoping for generalization, one obtains better results by training in-situ, \ie using data collected in the deployment environment\cite{puffer}.
18 | Today we tend to design and train models from scratch using model-specific datasets~(Figure \ref{fig:vision}, top). This process is arduous, expensive, repetitive and time-consuming. We redo everything from scratch for each new model and never exploit common training objectives. Moreover, the growing resource requirements to even attempt training these models are increasing inequalities in networking research and, ultimately, hindering collective progress.
19 |
20 | In other fields, ML algorithms (especially certain deep learning architectures) have shown generalization capabilities\cite{generalizingdnn}: an initial \emph{pre-training} phase trains models on a large dataset in a task-agnostic manner, and a subsequent \emph{fine-tuning} phase refines them on smaller, task-specific datasets. This allows the general pre-trained model to be reused across multiple tasks, making it resource- and time-efficient. This kind of transfer learning or generalization\cite{transferng} works because the pre-training phase learns the overall structure in the data, while the fine-tuning phase focuses on learning more task-specific features. As long as the data's structure is sufficiently similar across pre-training and fine-tuning, this method can be extremely effective.
21 |
22 | \section{Tasks, Goals and Challenges}
23 | \label{sec:task}
24 |
25 | Inspired by ML models that generalize on data in several other fields, it should be possible to design a similar model to achieve learning and generalization in networking. Even though networking contexts (topology, network configuration, traffic, etc.) can be very diverse, the underlying dynamics of networks remain essentially the same; \eg when buffers fill up, queuing disciplines delay or drop packets. These dynamics can be learned with ML and should generalize, so it should not be necessary to re-learn this fundamental behaviour every time a new model is trained. Building such a generic model for network data is challenging, but the effort would benefit the entire community. Starting from such a model, one would only need to collect a small task-specific dataset to fine-tune it (Figure \ref{fig:vision}, bottom), assuming that the pre-trained model generalizes well. This could even allow modelling rare events (\eg drops after link failures) for which only little data is available in real network traces today, due to their infrequent occurrence.
26 |
27 | \begin{figure}
28 | \centering
29 | \includegraphics[scale=1.3]{figures/vision}
30 | \caption{Can we collectively learn general network traffic dynamics \emph{once} and focus on task-specific data collecting and learning for \emph{many future models?} Credits: Alexander Dietmüller}
31 | \label{fig:vision}
32 | \end{figure}
33 |
34 | While research shows some generalization in specific networking contexts\cite{jayDeepReinforcementLearning2019}, truly ``generic'' models that perform well on a wide range of tasks and networks remain unavailable. This is because we usually do not train on datasets large enough to allow generalization; we only train on smaller, task-specific datasets. For a long time, sequence modelling was infeasible even with dedicated architectures such as recurrent neural networks (RNNs), as they can only handle short sequences and are inefficient to train\cite{factor}. However, a few years ago, a new architecture for sequence modelling was proposed: the \emph{Transformer}\cite{vaswaniAttentionAllYou2017}, which proved to be ground-breaking. This architecture is designed to train efficiently, enabling learning from massive datasets and unprecedented generalization. In a \emph{pre-training phase}, the Transformer learns sequential ``structures'', \eg the structure of a language from a large corpus of texts. Then, in a much quicker \emph{fine-tuning phase}, the final stages of the model are adapted to a specific prediction task (\eg text sentiment analysis). Today, Transformers are among the state of the art in natural language processing (NLP)\cite{recentnlp} and computer vision (CV)\cite{cvsurvey}.
35 |
36 | The generalization power of the Transformer stems from its ability to learn ``contextual information'', using context from the neighbouring elements in a sequence for a given element in the same sequence\cite{devlinBERTPretrainingDeep2019}.\footnote{Consider the word \emph{left} in two different contexts: I \emph{left} my book on the table. Turn \emph{left} at the next crossing. The Transformer outputs for the word \emph{left} are different in each sequence, as they encode the word's context.}
37 | We can draw parallels between networking and NLP. In isolation, packet metadata (headers, etc.) provides limited insights into the network state; we also need the \emph{context}, which we can get from the recent packet history.\footnote{Increasing latency over the packet history indicates congestion.} Based on these parallels, we propose that a Transformer-based architecture can also be designed to generalize on network packet data.
38 |
39 | Naively transposing NLP or CV transformers to networking fails, as the fundamental structure and biases\cite{biases} in the data are different. Generalizing on complex interactions in networks is not a trivial problem. We expect the following challenges for our Transformer design.
40 |
41 | \begin{itemize}
42 | \item
43 | How do we adapt Transformers for learning on networking data?
44 | \item
45 | How do we assemble a dataset large and diverse enough to allow useful generalization?
46 | \item
47 | Which pre-training task would allow the model to generalize, and how far can we push generalization?
48 | \item
49 | How do we scale such a Transformer to arbitrarily large amounts of network data from extremely diverse environments?
50 | \end{itemize}
51 |
52 | \section{Overview}
53 | \label{sec:overview}
54 |
55 | In this thesis, we present a Network Traffic Transformer (NTT), which serves as a first step towards designing a Transformer model for learning on network packet data. We outline the following main technical contributions:
56 |
57 | \begin{itemize}
58 | \item We present the required background on Transformers, which guides our design, in Chapter \ref{cha:background}.
59 | \item We present the detailed architectural design ideas behind our proof-of-concept NTT in Chapter \ref{cha:design}.
60 | \item We present a detailed evaluation on pre-training and fine-tuning our first NTT models in Chapter \ref{cha:evaluation}.
61 | \item We present several future research directions which can be used to improve our NTT in Chapter \ref{cha:outlook}.
62 | \item We summarise our work and provide some concluding remarks in Chapter \ref{cha:summary}.
63 | \item Supplementary technical details and supporting results are presented in Appendix \ref{app:a}, \ref{app:b} and \ref{app:c}.
64 | \end{itemize}
65 |
66 | Part of the work conducted during this thesis has been submitted as a paper\cite{newhope} to HotNets '22; hence, some parts of this thesis build upon work done in writing that paper.
67 |
--------------------------------------------------------------------------------
/workspace/TransformerModels/mct_test_plots.py:
--------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
5 | import matplotlib as mpl
6 | import matplotlib.pyplot as plt
7 | import seaborn as sns
9 | from tbparse import SummaryReader
10 |
11 | mct_log_dir_pretrained0 = "../../logs/finetune_mct_logs/"
12 | reader_pretrained0 = SummaryReader(mct_log_dir_pretrained0)
13 | df_pretrained0 = reader_pretrained0.scalars
14 |
15 |
16 | train_loss_epoch_df_pretrained0 = df_pretrained0[
17 | df_pretrained0["tag"] == "Avg loss per epoch"
18 | ]
19 | train_loss_epoch_df_pretrained0.reset_index(inplace=True, drop=True)
20 |
21 | val_loss_epoch_df_pretrained0 = df_pretrained0[df_pretrained0["tag"] == "Val loss"]
22 | val_loss_epoch_df_pretrained0.reset_index(inplace=True, drop=True)
23 |
24 |
25 | mct_log_dir_pretrained1 = "../../logs/finetune_mct_logs3/"
26 | reader_pretrained1 = SummaryReader(mct_log_dir_pretrained1)
27 | df_pretrained1 = reader_pretrained1.scalars
28 |
29 | train_loss_epoch_df_pretrained1 = df_pretrained1[
30 | df_pretrained1["tag"] == "Avg loss per epoch"
31 | ]
32 | train_loss_epoch_df_pretrained1.reset_index(inplace=True, drop=True)
33 |
34 | val_loss_epoch_df_pretrained1 = df_pretrained1[df_pretrained1["tag"] == "Val loss"]
35 | val_loss_epoch_df_pretrained1.reset_index(inplace=True, drop=True)
36 |
37 | mct_log_dir_pretrained2 = "../../logs/finetune_mct_logs4/"
38 | reader_pretrained2 = SummaryReader(mct_log_dir_pretrained2)
39 | df_pretrained2 = reader_pretrained2.scalars
40 |
41 | train_loss_epoch_df_pretrained2 = df_pretrained2[
42 | df_pretrained2["tag"] == "Avg loss per epoch"
43 | ]
44 | train_loss_epoch_df_pretrained2.reset_index(inplace=True, drop=True)
45 |
46 | val_loss_epoch_df_pretrained2 = df_pretrained2[df_pretrained2["tag"] == "Val loss"]
47 | val_loss_epoch_df_pretrained2.reset_index(inplace=True, drop=True)
48 |
49 | # Print df shape
50 | print(train_loss_epoch_df_pretrained2.shape)
51 | print(val_loss_epoch_df_pretrained2.shape)
52 |
53 | sns.set_theme("paper", "whitegrid", font_scale=1.5)
54 | mpl.rcParams.update(
55 | {
56 | "text.usetex": True,
57 | "font.family": "serif",
58 | "text.latex.preamble": r"\usepackage{amsmath,amssymb}",
59 | "lines.linewidth": 2,
60 | "lines.markeredgewidth": 0,
61 | "scatter.marker": ".",
62 | "scatter.edgecolors": "none",
63 | # Set image quality and reduce whitespace around saved figure.
64 | "savefig.dpi": 300,
65 | "savefig.bbox": "tight",
66 | "savefig.pad_inches": 0.01,
67 | }
68 | )
69 |
70 | # First figure: make subplots for train and val loss
71 | fig, ax = plt.subplots(2, figsize=(5, 5), sharex=True)
72 | t0 = sns.lineplot(
73 | x=train_loss_epoch_df_pretrained0.index,
74 | y="value",
75 | data=train_loss_epoch_df_pretrained0,
76 | color="blue",
77 | label="Fixed Mask Last",
78 | ax=ax[0],
79 | )
80 | t1 = sns.lineplot(
81 | x=train_loss_epoch_df_pretrained1.index,
82 | y="value",
83 | data=train_loss_epoch_df_pretrained1,
84 | color="green",
85 | label="Last 16",
86 | ax=ax[0],
87 | )
88 | t2 = sns.lineplot(
89 | x=train_loss_epoch_df_pretrained2.index,
90 | y="value",
91 | data=train_loss_epoch_df_pretrained2,
92 | color="red",
93 | label="Last 32",
94 | ax=ax[0],
95 | )
96 | # Label plot
97 | ax[0].set_xlabel("Training Epoch", fontsize=10)
98 | ax[0].set_ylabel("Training MSE", fontsize=10)
99 | ax[0].lines[0].set_linestyle("dotted")
100 | ax[0].lines[1].set_linestyle("--")
101 | ax[0].lines[2].set_linestyle("-.")
102 | ticks = [0, 0.25, 0.5, 0.75, 1]
103 | ax[0].yaxis.set_ticks(ticks)
104 | tickLabels = map(str, ticks)
105 | ax[0].yaxis.set_ticklabels(tickLabels)
106 | ax[0].axis(ymin=0, ymax=1)
107 | ax[0].axis(xmin=0, xmax=18)
108 |
109 | v0 = sns.lineplot(
110 | x=val_loss_epoch_df_pretrained0.index,
111 | y="value",
112 | data=val_loss_epoch_df_pretrained0,
113 | color="blue",
114 | label="Fixed Mask Last",
115 | ax=ax[1],
116 | )
117 | v1 = sns.lineplot(
118 | x=val_loss_epoch_df_pretrained1.index,
119 | y="value",
120 | data=val_loss_epoch_df_pretrained1,
121 | color="green",
122 | label="Last 16",
123 | ax=ax[1],
124 | )
125 | v2 = sns.lineplot(
126 | x=val_loss_epoch_df_pretrained2.index,
127 | y="value",
128 | data=val_loss_epoch_df_pretrained2,
129 | color="red",
130 | label="Last 32",
131 | ax=ax[1],
132 | )
133 |
134 | # Label plot
135 | ax[1].set_xlabel("Training Epoch", fontsize=10)
136 | ax[1].set_ylabel("Validation MSE", fontsize=10)
137 | ax[1].lines[0].set_linestyle("dotted")
138 | ax[1].lines[1].set_linestyle("--")
139 | ax[1].lines[2].set_linestyle("-.")
140 | ticks = [0, 0.25, 0.5, 0.75, 1]
141 | ax[1].yaxis.set_ticks(ticks)
142 | tickLabels = map(str, ticks)
143 | ax[1].yaxis.set_ticklabels(tickLabels)
144 | ax[1].axis(ymin=0, ymax=1)
145 | ax[1].axis(xmin=0, xmax=18)
146 |
147 | fig.legend(
148 | ["Fixed Mask Last", "Last 16", "Last 32"],
149 | loc="upper right",
150 | bbox_to_anchor=(0.968, 0.958),
151 | ncol=1,
152 | fontsize=10,
153 | )
154 | ax[1].get_legend().remove()
155 | ax[0].get_legend().remove()
156 | fig.tight_layout()
157 | fig.savefig("../../figures_test/finetune_mct_loss_comparison.pdf")
158 |
159 | ## Create the same for aggregated masking also
160 | # Create a new dataframe for aggregated masking
161 |
162 | mct_log_dir_pretrained3 = "../../logs/finetune_mct_logs5/"
163 | reader_pretrained3 = SummaryReader(mct_log_dir_pretrained3)
164 | df_pretrained3 = reader_pretrained3.scalars
165 |
166 | train_loss_epoch_df_pretrained3 = df_pretrained3[
167 | df_pretrained3["tag"] == "Avg loss per epoch"
168 | ]
169 | train_loss_epoch_df_pretrained3.reset_index(inplace=True, drop=True)
170 |
171 | val_loss_epoch_df_pretrained3 = df_pretrained3[df_pretrained3["tag"] == "Val loss"]
172 | val_loss_epoch_df_pretrained3.reset_index(inplace=True, drop=True)
173 |
174 | mct_log_dir_pretrained4 = "../../logs/finetune_mct_logs6/"
175 | reader_pretrained4 = SummaryReader(mct_log_dir_pretrained4)
176 | df_pretrained4 = reader_pretrained4.scalars
177 |
178 | train_loss_epoch_df_pretrained4 = df_pretrained4[
179 | df_pretrained4["tag"] == "Avg loss per epoch"
180 | ]
181 | train_loss_epoch_df_pretrained4.reset_index(inplace=True, drop=True)
182 |
183 | val_loss_epoch_df_pretrained4 = df_pretrained4[df_pretrained4["tag"] == "Val loss"]
184 | val_loss_epoch_df_pretrained4.reset_index(inplace=True, drop=True)
185 |
186 | mct_log_dir_pretrained5 = "../../logs/finetune_mct_logs7/"
187 | reader_pretrained5 = SummaryReader(mct_log_dir_pretrained5)
188 | df_pretrained5 = reader_pretrained5.scalars
189 |
190 | train_loss_epoch_df_pretrained5 = df_pretrained5[
191 | df_pretrained5["tag"] == "Avg loss per epoch"
192 | ]
193 | train_loss_epoch_df_pretrained5.reset_index(inplace=True, drop=True)
194 |
195 | val_loss_epoch_df_pretrained5 = df_pretrained5[df_pretrained5["tag"] == "Val loss"]
196 | val_loss_epoch_df_pretrained5.reset_index(inplace=True, drop=True)
197 |
198 |
199 | ## Plot the loss for aggregated masking
200 | fig, ax = plt.subplots(2, figsize=(5, 5), sharex=True)
201 | t0 = sns.lineplot(
202 | x=train_loss_epoch_df_pretrained0.index,
203 | y="value",
204 | data=train_loss_epoch_df_pretrained0,
205 | color="blue",
206 | label="Fixed Mask Last",
207 | ax=ax[0],
208 | )
209 | t1 = sns.lineplot(
210 | x=train_loss_epoch_df_pretrained4.index,
211 | y="value",
212 | data=train_loss_epoch_df_pretrained4,
213 | color="green",
214 | label="Choose Enc. State",
215 | ax=ax[0],
216 | )
217 | t2 = sns.lineplot(
218 | x=train_loss_epoch_df_pretrained3.index,
219 | y="value",
220 | data=train_loss_epoch_df_pretrained3,
221 | color="red",
222 | label="Choose Agg. Level",
223 | ax=ax[0],
224 | )
225 |
226 | # Label plot
227 | ax[0].set_xlabel("Training Epoch", fontsize=10)
228 | ax[0].set_ylabel("Training MSE", fontsize=10)
229 | ax[0].lines[0].set_linestyle("dotted")
230 | ax[0].lines[1].set_linestyle("--")
231 | ax[0].lines[2].set_linestyle("-.")
232 | ticks = [0, 0.25, 0.5, 0.75, 1]
233 | ax[0].yaxis.set_ticks(ticks)
234 | tickLabels = map(str, ticks)
235 | ax[0].yaxis.set_ticklabels(tickLabels)
236 | ax[0].axis(ymin=0, ymax=1)
237 | ax[0].axis(xmin=0, xmax=13)
238 |
239 | v0 = sns.lineplot(
240 | x=val_loss_epoch_df_pretrained0.index,
241 | y="value",
242 | data=val_loss_epoch_df_pretrained0,
243 | color="blue",
244 | label="Fixed Mask Last",
245 | ax=ax[1],
246 | )
247 | v1 = sns.lineplot(
248 | x=val_loss_epoch_df_pretrained4.index,
249 | y="value",
250 | data=val_loss_epoch_df_pretrained4,
251 | color="green",
252 | label="Choose Enc. State",
253 | ax=ax[1],
254 | )
255 | v2 = sns.lineplot(
256 | x=val_loss_epoch_df_pretrained3.index,
257 | y="value",
258 | data=val_loss_epoch_df_pretrained3,
259 | color="red",
260 | label="Choose Agg. Level",
261 | ax=ax[1],
262 | )
263 |
264 | # Label plot
265 | ax[1].set_xlabel("Training Epoch", fontsize=10)
266 | ax[1].set_ylabel("Validation MSE", fontsize=10)
267 | ax[1].lines[0].set_linestyle("dotted")
268 | ax[1].lines[1].set_linestyle("--")
269 | ax[1].lines[2].set_linestyle("-.")
270 | ticks = [0, 0.25, 0.5, 0.75, 1]
271 | ax[1].yaxis.set_ticks(ticks)
272 | tickLabels = map(str, ticks)
273 | ax[1].yaxis.set_ticklabels(tickLabels)
274 | ax[1].axis(ymin=0, ymax=1)
275 | ax[1].axis(xmin=0, xmax=18)
276 |
277 | # Make legend
278 | fig.legend(
279 | ["Fixed Mask Last", "Choose Enc. State", "Choose Agg. Level"],
280 | loc="upper right",
281 | bbox_to_anchor=(0.968, 0.958),
282 | ncol=1,
283 | fontsize=10,
284 | )
285 | ax[1].get_legend().remove()
286 | ax[0].get_legend().remove()
287 | fig.tight_layout()
288 | fig.savefig("../../figures_test/finetune_mct_loss_comparison_agg.pdf")
289 |
--------------------------------------------------------------------------------
/report/appendix.tex:
--------------------------------------------------------------------------------
1 | \chapter{NTT training details}
2 | \label{app:a}
3 |
4 | We present here further specifics about the choices made during pre-training and fine-tuning of our NTT architecture, as well as the common hyper-parameters used for both. Within the scope of our project, we do not perform a search for the best-performing hyper-parameters: our objective is to explore what can be learnt, not to achieve state-of-the-art results. Our hyper-parameters are chosen based on Transformers trained in other domains and, with some tweaking, work reasonably well for our use case.
5 |
6 | We implement our NTT in Python, using the PyTorch\cite{pytorch} and PyTorch Lightning\cite{pytorchlit} libraries, in a Debian $10$ environment. For training, we use NVIDIA\textsuperscript{\textregistered} Titan Xp GPUs with $12$ GB of GPU memory; for pre-training and fine-tuning on the full datasets, we use $2$ GPUs with PyTorch's DataParallel implementation. For pre-processing our data, generating our input sliding-window sequences and converting our data into training, validation and test batches, we use $4$ Intel\textsuperscript{\textregistered} $2.40$ GHz Dual Tetrakaideca-Core Xeon E5-2680 v4 CPUs and between $60$ and $80$ GB of RAM.
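To illustrate the sliding-window pre-processing mentioned above, a per-packet feature sequence can be turned into overlapping fixed-size windows as sketched below. This is only an illustrative sketch, not our actual pre-processing pipeline; the function name and window size are placeholders.

```python
def sliding_windows(packets, window_size):
    """Return all overlapping, consecutive windows of `window_size`
    elements from a per-packet feature sequence."""
    if window_size > len(packets):
        return []
    # A sequence of n packets yields n - window_size + 1 windows.
    return [packets[i:i + window_size]
            for i in range(len(packets) - window_size + 1)]
```

Each window then becomes one input sequence for the model, so consecutive training samples share most of their packet history.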
7 |
8 | \begin{table}[htbp]
9 | \centering
10 | \begin{tabular}{ l c }
11 | \toprule
12 | \emph{Hyper-parameter} & Value \\
13 |
14 | \midrule
15 | Learning rate & $1\times10^{-4}$ \\
16 | Weight decay & $1\times10^{-5}$ \\
17 | \# of attention heads & $8$ \\
18 | \# of Transformer layers & $6$ \\
19 | Batch size & $64$ \\
20 | Dropout prob. & $0.2$ \\
21 | Pkt. embedding dim. & $120$ \\
22 |
23 | \bottomrule
24 |
25 | \end{tabular}
26 | \caption{Hyper-parameters for NTT training}
27 | \label{app:table1}
28 | \end{table}
29 |
30 | We refer to Table \ref{app:table1} to discuss our training hyper-parameters. The number of attention heads refers to the number of attention matrices inside the Transformer encoder layers, which are processed in parallel. In our NTT architecture (Figure \ref{fig:ntt}), we have $691K$ trainable parameters from the embedding and aggregation layers, $3.3M$ from the Transformer encoder layers and $163K$ from the linear layers in the decoder. We use $4$ linear layers in the decoder, with activation and layer normalisation\cite{layernorm} between each linear layer. We also apply layer normalisation to the output of the embedding and aggregation, as a pre-normalising step. During training, we use a dropout probability\cite{dropout} of $0.2$ and weight decay\cite{weightdecay} over the weights (not the biases)\cite{goodfellowDeepLearning2016}, in order to prevent overfitting. We use a batch size of $64$ to reduce noise during training.
31 |
32 | We use the ADAM\cite{adam} optimiser with $\beta_1=0.9$, $\beta_2=0.98$ and $\epsilon=1\times10^{-9}$ for training, and the Huber loss\cite{huber} function (\ref{eq:huber}) as our training loss, as it is less sensitive to outliers than the squared error but does not ignore their effects entirely. The loss is computed on the residual, \ie the difference between the observed and predicted values, $y$ and $f(x)$ respectively.
33 | \begin{equation}
34 | L_\delta(y, f(x))=
35 | \begin{cases}
36 | \frac{1}{2}(y - f(x))^2 & \text{for } \lvert y - f(x) \rvert \leq \delta, \\
37 | \delta \cdot (\lvert y - f(x) \rvert - \frac{1}{2}\delta) & \text{otherwise}
38 | \end{cases}
39 | \label{eq:huber}
40 | \end{equation}
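The piecewise definition in (\ref{eq:huber}) can be computed directly on a single residual; the helper below is only an illustrative sketch (in training, we rely on the library implementation of the loss).

```python
def huber_loss(y, f_x, delta=1.0):
    """Huber loss on a single residual: quadratic within delta, linear beyond."""
    residual = abs(y - f_x)
    if residual <= delta:
        return 0.5 * residual ** 2
    return delta * (residual - 0.5 * delta)
```

The two branches meet at $\lvert y - f(x) \rvert = \delta$, which is what keeps the loss robust to outliers without ignoring them entirely.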
41 |
42 | We use a warm-up scheduler over our base learning rate (lr) of $1\times10^{-4}$, as introduced in the original Transformer paper\cite{vaswaniAttentionAllYou2017}. We present its governing equation as (\ref{eq:lr}).
43 |
44 | \begin{equation}
45 | lr = d_{model}^{-0.5} \cdot \min{(step\_num^{-0.5},\; step\_num \cdot warmup\_steps^{-1.5})}
46 | \label{eq:lr}
47 | \end{equation}
48 |
49 | This corresponds to increasing the learning rate linearly for the first \emph{warmup\_steps} training steps, and decreasing it thereafter proportionally to the inverse square root of the step number. We use $warmup\_steps = 4000$; our pre-training runs for ${\sim}17K$ steps, and our $d_{model}$ is $120$.
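The schedule in (\ref{eq:lr}) is straightforward to implement; the sketch below mirrors the original Transformer's warm-up rule with our values ($d_{model}=120$, $warmup\_steps=4000$). The function name is our own, for illustration only.

```python
def warmup_lr(step_num, d_model=120, warmup_steps=4000):
    """Linear warm-up for the first `warmup_steps` training steps,
    then inverse square-root decay, scaled by d_model ** -0.5."""
    step_num = max(step_num, 1)  # avoid 0 ** -0.5 at the very first step
    return d_model ** -0.5 * min(step_num ** -0.5,
                                 step_num * warmup_steps ** -1.5)
```

The two arguments of the minimum cross exactly at $step\_num = warmup\_steps$, where the schedule peaks.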
50 |
51 |
52 | \chapter{Learning with multiple decoders}
53 | \label{app:b}
54 |
55 | In Section \ref{ssec:impptt}, we evaluated the idea of masking different positions in the input sequence to improve the pre-training phase of the NTT. During this, we realised that with a variable masking approach, it is not always feasible for a single set of linear layers to act effectively as a combined MLP decoder across all levels of aggregation. We present some further results on using different instances of identical MLP decoders for different levels of aggregation, corresponding to the different ways of selecting which packet delays to mask during pre-training. We summarise our findings in Table \ref{app:table2}.
56 |
57 | \begin{table}[htbp]
58 | \centering
59 | \begin{tabular}{ l c }
60 | \toprule
61 | \emph{all values $\times10^{-3}$} & Pre-training \\
62 | (Masking + MLP instance) & (Delay) \\
63 |
64 | \midrule
65 | \em{NTT: Chosen mask} & \\
66 | \smallindent From encoded states + 1 MLP decoder & 0.063 \\
67 | \smallindent From encoded states + 3 MLP decoders & 0.070 \\
68 | \smallindent From aggregation levels + 1 MLP decoder & 1.31 \\
69 | \smallindent From aggregation levels + 3 MLP decoders & 0.087 \\
70 |
71 |
72 | \bottomrule
73 |
74 | \end{tabular}
75 | \caption{MSE on delay prediction across NTT with multiple instances of linear MLP decoders}
76 | \label{app:table2}
77 | \end{table}
78 |
79 | Based on our experiments, it is evident that we need different instances of MLP decoders when we mask across different levels of aggregation with varying masking frequencies. When we choose from \emph{the encoded states}, we choose the packets which are aggregated twice only $1/48$ of the time, the packets which are aggregated once $15/48$ of the time, and hence mainly choose the non-aggregated packets, \ie $32/48$ of the time. In this scenario, it does not hurt performance to use a single set of linear layers to extract the learnt behaviour across levels of aggregation, as the non-aggregated packets are chosen most of the time. Here, using a single MLP decoder and using $3$ MLP decoders yield very similar performance.
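The per-level selection probabilities quoted above follow directly from the layout of the $48$ encoded states (assumed here: $1$ doubly-aggregated, $15$ singly-aggregated and $32$ non-aggregated states); a small sketch of this arithmetic:

```python
from fractions import Fraction

# Assumed encoded-state layout when masking uniformly over the encoded
# states: 1 double-aggregate, 15 single-aggregates, 32 raw packet states.
layout = {"double": 1, "single": 15, "raw": 32}
total = sum(layout.values())  # 48 encoded states in the sequence

# Each aggregation level is masked with probability proportional
# to its number of encoded states.
mask_prob = {level: Fraction(count, total) for level, count in layout.items()}
```

Since raw packets dominate the draw ($32/48$), this scheme rarely exercises the aggregated levels, which is consistent with a single MLP decoder sufficing here.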
80 |
81 | When we choose from \emph{aggregation levels}, the situation is very different, as every aggregation level is chosen $1/3$ of the time. Effectively, this means that $1/3$ of the time we mask half of the packet delays that are aggregated twice. In this scenario, a single MLP decoder cannot effectively learn across the levels of aggregation, all of which are now chosen frequently; using multiple MLP decoders helps the learning process. A priori, we do not know what kind of architecture works best here, so we start with the simplest model and use three identical sets of linear layers, matching the independent levels of aggregation.
82 |
83 | As different aggregation schemes may be used in future versions of the NTT, different numbers of MLP decoders will be needed. One can be fairly certain that, with increasing complexity and aggregation, more complexity will also be required in the decoder architecture to learn new kinds of information.
84 |
85 | \chapter{Delay distributions on the multi-path topology}
86 | \label{app:c}
87 |
88 | In this section, we present further insights into the individual delay distributions seen on each end-to-end path (from the sending sources to each individual receiver), as shown in Figure \ref{fig:topo_ft_big}. A priori, we should not assume that increasing the complexity of traffic flows in the network changes the traffic distributions on each individual path. Our NTT learns dynamics only on a single path during pre-training; we hypothesise that this can generalize, with fine-tuning, to topologies with different paths and different dynamics. To test this, we should check the case where the delay dynamics on the different paths (affected by queueing delay and link delay) are different.
89 |
90 | \begin{figure*}[!h]
91 | \begin{center}
92 | \includegraphics[scale=0.8]{figures/delay_Receivers.pdf}
93 | \caption{Comparing delay CDFs across multiple paths}
94 | \label{fig:multipatht}
95 | \end{center}
96 | \end{figure*}
97 |
98 | Comparing the CDFs of packet delays at all receivers in Figure \ref{fig:multipatht}, we can see that the network dynamics change considerably as we add paths to the network. We observe that the dynamics change a lot from the path to Receiver 1 to the path to Receiver 2, but in our setup, not so much between the paths to Receiver 2 and Receiver 3. From the experimental results in Section \ref{ssec:comptop}, we can clearly see that the pre-trained NTT generalizes to new topologies with varying dynamics across different paths. However, to test the NTT more robustly, it is evident that we need to fine-tune on multiple topologies with different path dynamics, in order to check the true extent of generalization. Such an evaluation is not in the scope of this thesis, and we leave it to future experiments.
99 |
100 | \chapter{Declaration of Originality}
101 | \label{app:d}
102 |
103 | \begin{figure*}[!h]
104 | \begin{center}
105 | \includegraphics[scale=0.8]{figures/declaration.pdf}
106 | \label{fig:dec}
107 | \end{center}
108 | \end{figure*}
109 |
110 |
--------------------------------------------------------------------------------
/report/outlook.tex:
--------------------------------------------------------------------------------
1 | \chapter{Outlook}
2 | \label{cha:outlook}
3 |
4 | The evaluation of our current NTT design is extremely promising and indicates that such an architecture can be built for learning and generalizing on network dynamics. However, the process does not end here: there is large scope for future research, and multiple directions in which the NTT can be improved. We present some ideas that we believe are the next steps for these improvements.
5 |
6 | \section{Learning on bigger topologies}
7 | \label{sec:biggertopos}
8 |
9 | We evaluate our NTT on simple topologies in this project, as we are still in the initial phase of building such an architecture. Real networks are undeniably much more complex than our current training environment. These networks (\eg a large datacenter, an ISP, etc.) are larger and have complex connections with traffic flowing along multiple paths; there are many different applications, transport protocols, queueing disciplines, etc., and the interactions across these lead to extremely complex network dynamics. Additionally, there are many more fine-tuning tasks to consider, \eg flow classification for security or anomaly detection. In Section \ref{ssec:comptop}, we evaluate pre-training on a small topology and, to an extent, its generalization to a larger topology based on learnt dynamics of bottleneck queues. However, this merely scratches the surface and does not match the scale and complexity of dynamics in real networks.
10 |
11 | Apart from this, our current network traces for training and evaluation are only drawn from a small subset of Internet traffic distributions\cite{homa}. Testing our NTT prototype in real, diverse environments and with multiple fine-tuning tasks will provide many more invaluable insights: not only into the strengths and weaknesses of our NTT architecture, but also into the ``fundamental learnability'' of network dynamics. A first step can be conducting more extensive and more complex simulations, and analysing real-world datasets such as CAIDA\cite{caida}, M-Lab\cite{mlab}, CRAWDAD\cite{crawdad}, or Rocketfuel\cite{rocketfuel}. This kind of evaluation will provide much deeper insights into the generalization of learnt network dynamics.
12 |
13 | \emph{How does the NTT hold up with more diverse environments and fine-tuning tasks? Which aspects of network dynamics are easy to generalize to, and which are difficult?}
14 |
15 | \section{Learning more complex features}
16 | \label{sec:compftt}
17 |
18 | More diverse environments, \ie with more diverse network dynamics, also present the opportunity to improve our NTT architecture. The better the general network dynamics are learnt during pre-training, the more other models can benefit from the NTT during fine-tuning. The directions for improvement we see here are:
19 |
20 | \begin{itemize}
21 | \item \emph{Better sequence aggregation:} We base our current NTT's aggregation levels on the number of in-flight packets, \ie whether packets in the sequence may share their fate, determined by buffer sizes in our experiments. Evaluations show that the hypothesis holds: the further apart packets are, the less likely they are to share fate. Such packets are aggregated much more, as their individual contribution to the state of the current packet is much lower. Currently, we believe matching individual aggregation levels to common buffer sizes (\eg flow and switch buffers) may be beneficial. Much more research is still needed to put this hypothesis to the test and to determine the best sequence sizes and aggregation levels for future NTT versions.
22 |
23 | \item \emph{Multiple protocol packet data:} So far, we have not used network traces that combine different transport protocols or contain network prioritisation of different traffic classes, and thus have not used any specific packet header information in our features for learning. Considering such header information might be essential to learn the behavioural differences between such categorisations. Raw packet headers can be challenging to provide as inputs to an ML model, as they may appear in many combinations and contain values that are difficult to learn, like IP addresses\cite{zhangMimicNetFastPerformance2021}. Research ideas from the network verification community on header space analysis\cite{kazemianHeaderSpaceAnalysis} may provide valuable insights on header representations and potential first steps in this direction.
24 |
25 | \item \emph{Learning path identification:} Currently, we do not have a concrete method for the NTT to learn the differences between multiple possible paths in the network, a feature which will become significant for learning on larger topologies. Evaluation on topologies with multiple paths (in Section \ref{ssec:comptop}) demonstrated that such a distinction is indeed required. In the initial experiments, this was solved by providing a unique identifier (the Receiver ID) as an additional feature in the input feature set. While this is a quick and simple fix, it might not be an optimal method to scale to larger topologies. Additionally, such a simple identifier does not provide insights into hierarchical overlap (\eg subnets), which might be required for more efficient learning. Networks today derive the required path information from the routing function (\eg shortest path, prefix matching, subnetting, etc.). While it might be possible for the NTT to ``learn'' the path information and the routing function by giving it all features like prefix, subnet mask, etc., this might be suboptimal, hard and unnecessary. Coming up with better ways to learn path information is a possible next step we see to improve the NTT.
26 |
27 | \item \emph{Dealing better with rare network events:} While in several fields of machine learning it is enough to learn behaviour ``on average'', this does not translate to the domain of network data. Less frequently occurring events in networks (\eg packet drops from link failures) can lead to significant information loss. This kind of behaviour is hard for machine learning algorithms to learn, as these events have very little representation in the training data, yet it is essential that our NTT learns the outcomes of these events to an extent. One possible step is to collect telemetry data, like packet drops or buffer occupancy, as features. This may allow the model to learn the behaviour of networks better, though it will be hard due to the sparse nature of such data, and future research is needed to solve this efficiently.
28 | \end{itemize}
29 | \vspace{-0.3cm}
30 | \emph{How can we improve the NTT design to learn efficiently from diverse environments? How can we deal with an information mismatch between environments?}
31 |
32 |
33 | \section{Collaborative pre-training}
34 | \label{sec:collab}
35 |
36 | Transformers in NLP and CV have only been shown to truly outshine their competition when pre-trained with massive amounts of data. We envision that achieving this for networking would require a previously unseen level of collaboration across the industry. We see two main challenges:
37 | \begin{itemize}
38 | \item \emph{Training data volume:} Training an NTT to learn complex dynamics at the scale of large topologies will require an extremely large amount of data. Given the differences possible across networks, the pre-training data will need to be representative of them, requiring a huge amount of network traces that no single organisation might have access to; this will require collaboration between several of them.
39 | \item \emph{Data privacy:} Due to privacy concerns, it might not be possible to share much of this data publicly. Moreover, several organisations might be unwilling to share their data anyway, as it would cost them their competitive advantage in the industry.
40 | \end{itemize}
41 |
42 |
43 | We see some possible solutions to these problems. ML models are known to compress data effectively. As an example, GPT-3\cite{brownLanguageModelsAre2020} is one of the largest Transformer models for text data to date and consists of 175 billion parameters, or roughly 350 gigabytes, yet it contains information from over 45 terabytes of text data. Another huge model is Data2Vec\cite{baevskiData2vecGeneralFramework2022}, a general-purpose Transformer which learns representations for text, images and audio using a task-agnostic training approach with knowledge distillation\cite{kd}, and is trained on trillions of datapoints. Sharing a pre-trained model is much more feasible than sharing all the underlying data; it also reduces training time and the redundancy of re-training for already established results.
44 | Furthermore, sharing models instead of data could overcome privacy barriers via federated learning\cite{kairouzAdvancesOpenProblems2021}. Organizations can keep their data private and only share pre-trained models, which can then be combined into a final, collectively pre-trained model. This raises the question of how these models can be trusted, but it can be addressed by making the details of the pre-training process public and sharing the model architecture, while keeping the training data private.
45 |
46 | \emph{Can we leverage pre-training and federated learning to learn from previously unavailable data?}
47 |
48 | \section{Continual learning}
49 | \label{sec:cont}
50 |
51 | The underlying structure in languages and images does not evolve much over time. A cat's image remains a cat's image, whether viewed today or $10$ years later. A sentence in English might change slightly over time due to changes in grammatical rules, but the overall structure is similar. Models pre-trained on such data thus do not need to be re-trained frequently. However, the Internet is an ever-evolving environment. Protocols, applications, etc.\ may change over time. Interactions in networks, due to the addition of new network nodes sending traffic, may change the underlying network dynamics significantly.
52 |
53 | Though we are certain that the underlying network dynamics will change over time, we expect them to change less frequently than individual environments, and still argue that the same pre-trained NTT may be used for a significant time, with just small amounts of fine-tuning from time to time. Nevertheless, at some point, even the NTT model learnt on the underlying dynamics may become outdated and will have to be re-trained. It is already difficult to determine when it is helpful to re-train a specific model\cite{puffer}, and for a model that is supposed to capture a large range of environments, this is very likely to be an even harder task.
54 |
55 | \emph{At which point should we consider an NTT outdated? When and with what data should it be re-trained?}
56 |
57 |
58 |
59 |
60 |
--------------------------------------------------------------------------------
/workspace/NetworkSimulators/memento/cdf-application.cc:
--------------------------------------------------------------------------------
1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
2 | //
3 | // Copyright (c) 2006 Georgia Tech Research Corporation
4 | //
5 | // This program is free software; you can redistribute it and/or modify
6 | // it under the terms of the GNU General Public License version 2 as
7 | // published by the Free Software Foundation;
8 | //
9 | // This program is distributed in the hope that it will be useful,
10 | // but WITHOUT ANY WARRANTY; without even the implied warranty of
11 | // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 | // GNU General Public License for more details.
13 | //
14 | // You should have received a copy of the GNU General Public License
15 | // along with this program; if not, write to the Free Software
16 | // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17 | //
18 | // Author: George F. Riley
19 | //
20 |
21 | // ns3 - On/Off Data Source Application class
22 | // George F. Riley, Georgia Tech, Spring 2007
23 | // Adapted from ApplicationOnOff in GTNetS.
24 |
25 | #include <fstream>
26 | #include "ns3/log.h"
27 | #include "ns3/address.h"
28 | #include "ns3/inet-socket-address.h"
29 | #include "ns3/inet6-socket-address.h"
30 | #include "ns3/packet-socket-address.h"
31 | #include "ns3/node.h"
32 | #include "ns3/nstime.h"
33 | #include "ns3/data-rate.h"
34 | #include "ns3/random-variable-stream.h"
35 | #include "ns3/socket.h"
36 | #include "ns3/simulator.h"
37 | #include "ns3/socket-factory.h"
38 | #include "ns3/packet.h"
39 | #include "ns3/uinteger.h"
40 | #include "ns3/trace-source-accessor.h"
41 | #include "ns3/udp-socket-factory.h"
42 | #include "ns3/string.h"
43 | #include "ns3/pointer.h"
44 | #include "ns3/double.h"
45 | #include "ns3/tag.h"
46 |
47 | #include "cdf-application.h"
48 | #include "ns3/experiment-tags.h"
49 |
50 | namespace ns3
51 |
52 | {
53 |
54 | NS_LOG_COMPONENT_DEFINE("CdfApplication");
55 |
56 | NS_OBJECT_ENSURE_REGISTERED(CdfApplication);
57 |
58 | TypeId
59 | CdfApplication::GetTypeId(void)
60 | {
61 | static TypeId tid =
62 | TypeId("ns3::CdfApplication")
63 | .SetParent<Application>()
64 | .SetGroupName("Applications")
65 | .AddConstructor<CdfApplication>()
66 | .AddAttribute("DataRate", "The data rate in on state.",
67 | DataRateValue(DataRate("500kb/s")),
68 | MakeDataRateAccessor(&CdfApplication::SetRate,
69 | &CdfApplication::GetRate),
70 | MakeDataRateChecker())
71 | .AddAttribute("CdfFile", "Message size distribution file.",
72 | EmptyAttributeValue(),
73 | MakeStringAccessor(&CdfApplication::SetDistribution,
74 | &CdfApplication::GetDistribution),
75 | MakeStringChecker())
76 | .AddAttribute("Remote", "The address of the destination",
77 | AddressValue(),
78 | MakeAddressAccessor(&CdfApplication::m_peer),
79 | MakeAddressChecker())
80 | .AddAttribute("Protocol", "The type of protocol to use. This should be "
81 | "a subclass of ns3::SocketFactory",
82 | TypeIdValue(UdpSocketFactory::GetTypeId()),
83 | MakeTypeIdAccessor(&CdfApplication::m_tid),
84 | // This should check for SocketFactory as a parent
85 | MakeTypeIdChecker())
86 | .AddTraceSource("Tx", "A new packet is created and is sent",
87 | MakeTraceSourceAccessor(&CdfApplication::m_txTrace),
88 | "ns3::Packet::TracedCallback")
89 | .AddTraceSource("TxWithAddresses", "A new packet is created and is sent",
90 | MakeTraceSourceAccessor(&CdfApplication::m_txTraceWithAddresses),
91 | "ns3::Packet::TwoAddressTracedCallback");
92 | return tid;
93 | }
94 |
95 | CdfApplication::CdfApplication()
96 | : m_socket(0),
97 | m_connected(false),
98 | m_lastStartTime(Seconds(0)),
99 | m_average_size(0),
100 | m_sizeDist(CreateObject<EmpiricalRandomVariable>()),
101 | m_timeDist(CreateObject<ExponentialRandomVariable>()),
102 | m_counter(0)
103 | {
104 | NS_LOG_FUNCTION(this);
105 | }
106 |
107 | CdfApplication::~CdfApplication()
108 | {
109 | NS_LOG_FUNCTION(this);
110 | }
111 |
112 | Ptr<Socket>
113 | CdfApplication::GetSocket(void) const
114 | {
115 | NS_LOG_FUNCTION(this);
116 | return m_socket;
117 | }
118 |
119 | int64_t
120 | CdfApplication::AssignStreams(int64_t stream)
121 | {
122 | NS_LOG_FUNCTION(this << stream);
123 | m_sizeDist->SetStream(stream);
124 | m_timeDist->SetStream(stream + 1);
125 | return 2;
126 | }
127 |
128 | void CdfApplication::DoDispose(void)
129 | {
130 | NS_LOG_FUNCTION(this);
131 |
132 | CancelEvents();
133 | m_socket = 0;
134 | // chain up
135 | Application::DoDispose();
136 | }
137 |
138 | // Application Methods
139 | void CdfApplication::StartApplication() // Called at time specified by Start
140 | {
141 | NS_LOG_FUNCTION(this);
142 |
143 | // Create the socket if not already
144 | if (!m_socket)
145 | {
146 | m_socket = Socket::CreateSocket(GetNode(), m_tid);
147 | if (Inet6SocketAddress::IsMatchingType(m_peer))
148 | {
149 | if (m_socket->Bind6() == -1)
150 | {
151 | NS_FATAL_ERROR("Failed to bind socket");
152 | }
153 | }
154 | else if (InetSocketAddress::IsMatchingType(m_peer) ||
155 | PacketSocketAddress::IsMatchingType(m_peer))
156 | {
157 | if (m_socket->Bind() == -1)
158 | {
159 | NS_FATAL_ERROR("Failed to bind socket");
160 | }
161 | }
162 | m_socket->Connect(m_peer);
163 | m_socket->SetAllowBroadcast(true);
164 | m_socket->ShutdownRecv();
165 |
166 | m_socket->SetConnectCallback(
167 | MakeCallback(&CdfApplication::ConnectionSucceeded, this),
168 | MakeCallback(&CdfApplication::ConnectionFailed, this));
169 | }
170 |
171 | // Ensure no pending event
172 | CancelEvents();
173 | // If we are not yet connected, there is nothing to do here
174 | // The ConnectionComplete upcall will start timers at that time
175 | //if (!m_connected) return;
176 | ScheduleNextTx();
177 | //Simulator::Schedule(m_stopTime, &CdfApplication::CancelEvents, this);
178 | }
179 |
180 | void CdfApplication::StopApplication() // Called at time specified by Stop
181 | {
182 | NS_LOG_FUNCTION(this);
183 |
184 | CancelEvents();
185 | if (m_socket != 0)
186 | {
187 | m_socket->Close();
188 | }
189 | else
190 | {
191 | NS_LOG_WARN("CdfApplication found null socket to close in StopApplication");
192 | }
193 | }
194 |
195 | void CdfApplication::CancelEvents()
196 | {
197 | NS_LOG_FUNCTION(this);
198 | Simulator::Cancel(m_sendEvent);
199 | }
200 |
201 | // Private helpers
202 | void CdfApplication::ScheduleNextTx()
203 | {
204 | NS_LOG_FUNCTION(this);
205 |
206 | // Draw waiting time.
207 | auto nextTime = Seconds(m_timeDist->GetValue());
208 | NS_LOG_DEBUG("Wait Time: " << nextTime.GetMilliSeconds() << "ms.");
209 | m_sendEvent = Simulator::Schedule(nextTime,
210 | &CdfApplication::SendPacket, this);
211 | }
212 |
213 | void CdfApplication::SendPacket()
214 | {
215 | NS_LOG_FUNCTION(this);
216 |
217 | // Draw packet size.
218 | auto size = m_sizeDist->GetInteger();
219 | NS_LOG_DEBUG("Chosen Size: " << size << " Bytes.");
220 |
221 | NS_ASSERT(m_sendEvent.IsExpired());
222 | Ptr<Packet> packet = Create<Packet>(size);
223 | m_txTrace(packet);
224 |
225 | MessageTag m_tag;
226 | m_tag.SetSimpleValue(m_counter++);
227 | packet->AddPacketTag(m_tag);
228 | m_socket->Send(packet);
229 | Address localAddress;
230 | m_socket->GetSockName(localAddress);
231 | if (InetSocketAddress::IsMatchingType(m_peer))
232 | {
233 | NS_LOG_INFO("At time " << Simulator::Now().GetSeconds()
234 | << "s on-off application sent "
235 | << packet->GetSize() << " bytes to "
236 | << InetSocketAddress::ConvertFrom(m_peer).GetIpv4()
237 | << " port " << InetSocketAddress::ConvertFrom(m_peer).GetPort());
238 | m_txTraceWithAddresses(packet, localAddress, InetSocketAddress::ConvertFrom(m_peer));
239 | }
240 | else if (Inet6SocketAddress::IsMatchingType(m_peer))
241 | {
242 | NS_LOG_INFO("At time " << Simulator::Now().GetSeconds()
243 | << "s on-off application sent "
244 | << packet->GetSize() << " bytes to "
245 | << Inet6SocketAddress::ConvertFrom(m_peer).GetIpv6()
246 | << " port " << Inet6SocketAddress::ConvertFrom(m_peer).GetPort());
247 | m_txTraceWithAddresses(packet, localAddress, Inet6SocketAddress::ConvertFrom(m_peer));
248 | }
249 | m_lastStartTime = Simulator::Now();
250 | ScheduleNextTx();
251 | }
252 |
253 | void CdfApplication::ConnectionSucceeded(Ptr<Socket> socket)
254 | {
255 | NS_LOG_FUNCTION(this << socket);
256 | m_connected = true;
257 | }
258 |
259 | void CdfApplication::ConnectionFailed(Ptr<Socket> socket)
260 | {
261 | NS_LOG_FUNCTION(this << socket);
262 | }
263 |
264 | /*
265 | void CdfApplication::LoadDistribution()
266 | {
267 | NS_LOG_FUNCTION(this);
268 | // in any case, make sure the data rate is up to date, in case m_Rate was
269 | // changed.
270 | // If loaded already, do nothing else.
271 | if (m_filename == m_loaded_filename)
272 | {
273 | UpdateRateDistribution();
274 | return;
275 | }
276 |
277 | // Reset dist
278 | m_sizeDist = CreateObject<EmpiricalRandomVariable>();
279 |
280 | NS_LOG_DEBUG("FILENAME " << m_filename);
281 | std::ifstream distFile(m_filename);
282 |
283 | if (!(distFile >> m_average_size))
284 | {
285 | NS_FATAL_ERROR("Could not parse file: " << m_filename);
286 | }
287 | UpdateRateDistribution();
288 | NS_LOG_DEBUG("Average size: " << m_average_size << " Bytes.");
289 | NS_LOG_DEBUG("Average interarrival time: " << m_timeDist->GetMean() << "s.");
290 |
291 | double value, probability;
292 | while (distFile >> value >> probability)
293 | {
294 | NS_LOG_DEBUG(value << ", " << probability);
295 | m_sizeDist->CDF(value, probability);
296 | }
297 |
298 | m_loaded_filename = m_filename;
299 | }
300 | */
301 |
302 | void CdfApplication::UpdateRateDistribution()
303 | {
304 | NS_LOG_FUNCTION(this);
305 | auto timeBetween = m_rate.CalculateBytesTxTime(m_average_size);
306 | m_timeDist->SetAttribute("Mean", DoubleValue(timeBetween.GetSeconds()));
307 | }
308 |
309 | bool CdfApplication::SetDistribution(std::string filename)
310 | {
311 | NS_LOG_FUNCTION(this << filename);
312 | m_filename = filename;
313 |
314 | // Reset existing dist, if any.
315 | m_sizeDist = CreateObject<EmpiricalRandomVariable>();
316 |
317 | std::ifstream distFile(m_filename);
318 |
319 | if (!(distFile >> m_average_size))
320 | {
321 | NS_LOG_ERROR("Could not parse file: " << m_filename);
322 | return false;
323 | }
324 | // Using the average rate, update the time dist.
325 | UpdateRateDistribution();
326 |
327 | NS_LOG_DEBUG("Average size: " << m_average_size << " Bytes.");
328 | NS_LOG_DEBUG("Average interarrival time: " << m_timeDist->GetMean() << "s.");
329 |
330 | NS_LOG_DEBUG("Loading CDF from file...");
331 | double value, probability;
332 | while (distFile >> value >> probability)
333 | {
334 | NS_LOG_DEBUG(value << ", " << probability);
335 | m_sizeDist->CDF(value, probability);
336 | }
337 | return true;
338 | }
339 | std::string CdfApplication::GetDistribution() const { return m_filename; }
340 |
341 | void CdfApplication::SetRate(DataRate rate)
342 | {
343 | m_rate = rate;
344 | UpdateRateDistribution();
345 | }
346 | DataRate CdfApplication::GetRate() const { return m_rate; }
347 |
348 | } // Namespace ns3
349 |
--------------------------------------------------------------------------------
/workspace/README.md:
--------------------------------------------------------------------------------
1 | # Workspace
2 |
3 | We provide an exhaustive guide here to reproduce all experiments to train and evaluate the NTT model. Most of the steps are automated, but some have to be done manually, as multiple platforms are used separately, e.g. NS3 for simulations and PyTorch-Lightning for the NTT implementation.
4 |
5 | ## File descriptions:
6 | The files inside the [TransformerModels](TransformerModels) directory are as follows:
7 |
8 | Core files -
9 | * [`encoder_delay.py`](TransformerModels/encoder_delay.py) : Pre-train the NTT by masking the last delay only.
10 | * [`encoder_delay_varmask_chooseencodelem.py`](TransformerModels/encoder_delay_varmask_chooseencodelem.py) : Pre-train the NTT by masking delays after choosing equally from the NTT's output encoded elements.
11 | * [`encoder_delay_varmask_chooseencodelem_multi.py`](TransformerModels/encoder_delay_varmask_chooseencodelem_multi.py) : Pre-train the NTT by masking delays after choosing equally from the NTT's output encoded elements and using multiple decoder instances.
12 | * [`encoder_delay_varmask_chooseagglevel_multi.py`](TransformerModels/encoder_delay_varmask_chooseagglevel_multi.py) : Pre-train the NTT by masking delays after choosing equally from the 3 levels of aggregation for the NTT and using multiple decoder instances.
13 | * [`finetune_encoder.py`](TransformerModels/finetune_encoder.py) : Fine-tune the NTT by masking the last delay only.
14 | * [`finetune_encoder_multi.py`](TransformerModels/finetune_encoder_multi.py) : Fine-tune the NTT by masking the last delay but initialize with multiple decoders to match the architecture when pre-trained with multiple decoders.
15 | * [`finetune_mct.py`](TransformerModels/finetune_mct.py) : Fine-tune the NTT to predict the MCT on the given data.
16 | * [`finetune_mct_multi.py`](TransformerModels/finetune_mct_multi.py) : Fine-tune the NTT to predict the MCT on the given data but initialize with multiple decoders to match the architecture when pre-trained with multiple decoders.
17 | * [`generate_sequences.py`](TransformerModels/generate_sequences.py) : Generate the sliding windows for the NTT from the processed NS3 simulations' packet data.
18 | * [`utils.py`](TransformerModels/utils.py) : All utility functions for data pre-processing.
19 | * [`arima.py`](TransformerModels/arima.py) : Train the ARIMA baselines.
20 | * [`lstm.py`](TransformerModels/lstm.py) : Train the Bi-LSTM baselines.
21 | * [`configs`](TransformerModels/configs) : Hyper-parameters for training the NTT model.
22 |
23 |
24 | Plot files -
25 | * [`plot_losses.py`](TransformerModels/plot_losses.py) : Plot MCT loss curves after fine-tuning the NTT pre-trained on masking the last delay only.
26 | * [`mct_test_plots.py`](TransformerModels/mct_test_plots.py) : Plot MCT loss curves after fine-tuning the NTT pre-trained on masking on variable positions.
27 | * [`plot_predictions.py`](TransformerModels/plot_predictions.py) : Plot histograms of predictions after pre-training and fine-tuning the NTT.
28 |
29 | Others -
30 | * [`transformer_delay.py`](TransformerModels/transformer_delay.py) : A vanilla Transformer encoder-decoder architecture, naively trained on some packet data to predict delays. (this was only for initial insights)
31 |
32 | The files inside the [PandasScripts](PandasScripts) directory are as follows:
33 | * [`csvhelper_memento.py`](PandasScripts/csvhelper_memento.py) : Utility script to pre-process raw memento NS3 outputs to a format, which makes it easier to create the sliding windows and train the NTT. This is the actual script used.
34 | * [`csv_gendelays.py`](PandasScripts/csv_gendelays.py) : Utility script to pre-process raw NS3 outputs to a format, which makes it easier to create the sliding windows and train the vanilla transformer. This is NOT used, except for initial insights.
35 |
36 | The structure inside the [NetworkSimulators](NetworkSimulators) is as follows:
37 | * [memento](NetworkSimulators/memento): Contains a working copy of ONLY the relevant code files for generating the pre-training and fine-tuning data for the NTT models. This cannot be run without the full setup, which is self-contained in [memento-ns3-for-NTT](https://github.com/Siddhant-Ray/memento-ns3-for-NTT). Files inside this [memento](NetworkSimulators/memento) directory should not be used anymore, except for quick reference.
38 | * [ns3](NetworkSimulators/ns3): This was used for initial insights only, and NO RESULTS from it have been included in the thesis.
39 | - Contains a working copy of the relevant code files for generating the pre-training data for the vanilla NTT model, which is authored in [`transformer_delay.py`](TransformerModels/transformer_delay.py). To generate this data, you must install ns3 from scratch as mentioned [here](https://www.nsnam.org/docs/release/3.35/tutorial/singlehtml/index.html#prerequisites). Following this, all the `.cc` files in [ns3](NetworkSimulators/ns3) must be put in the `scratch/` directory. This can be tricky, so we also provide a quicker alternative setup.
40 | - Alternatively, you can run the script [`dockerns3.sh`](NetworkSimulators/ns3/dockerns3.sh), and use the files [`cptodocker.sh`](NetworkSimulators/ns3/cptodocker.sh) and [`cpfromdocker.sh`](NetworkSimulators/ns3/cpfromdocker.sh) to move the code files and results, in and out of the ns3 container.
41 | - You can run the files inside the container with the commands:
42 | * `export NS_LOG=congestion_1=info`
43 | * and then `./waf --run scratch/congestion_1`.
44 | - This generates a folder called `congestion_1` with the required data files.
45 | - For pre-processing, copy the `congestion_1` folder into [PandasScripts](PandasScripts) and run:
46 | * ```python csv_gendelays.py --model tcponly --numsenders 6```
47 | - The files can now be added to [`transformer_delay.py`](TransformerModels/transformer_delay.py) and the job can be run.
48 |
49 |
50 | ## To reproduce results in the thesis:
51 |
52 | ### Setup
53 |
54 | To run on the ```TIK SLURM cluster```, you need to install ```pyenv```, details for which can be found here: [D-ITET Computing](https://computing.ee.ethz.ch/Programming/Languages/Python).
55 |
56 | On other clusters or standalone systems, you can use the system Python to create a new virtual environment.
57 |
58 | $ python -m venv venv
59 |
60 | After the environment has been created (created name is `venv` for simplicity):
61 |
62 |
63 | If it is a pyenv environment, run
64 |
65 | $ eval "$(pyenv init -)"
66 | $ eval "$(pyenv virtualenv-init -)"
67 | $ pyenv activate venv
68 |
69 | Else for normal Python virtual environments, run
70 |
71 | $ source venv/bin/activate
72 |
73 | Now, install the Python dependencies:
74 |
75 | $ pip install -r requirements.txt
76 |
77 | Alternatively, if installing environments and dependencies does not work on some systems (e.g. Windows), you can use our pre-built Docker image for the same setup. The Docker image has a Python environment with the dependencies installed, along with the entire code packaged in it, for ease of use.
78 |
79 | To use the docker image, run
80 |
81 | $ ./docker-run.sh
82 |
83 | Or you can build your own Docker image locally. For this, run
84 |
85 | $ docker build --tag ntt-docker .
86 |
87 | The folder (submodule) [`memento-ns3-for-NTT`](https://github.com/Siddhant-Ray/memento-ns3-for-NTT) contains instructions to generate the training data using NS3 simulations. The module is self-contained and will generate a folder called ```results/```, which will contain the required data. To preprocess, copy the ```results/``` folder into the directory [`PandasScripts`](PandasScripts) and run the script (modify the filenames inside [`csvhelper_memento.py`](PandasScripts/csvhelper_memento.py) if needed):
88 |
89 | $ python csvhelper_memento.py --model memento
90 |
91 | This will generate the pre-processed files. The files may be different, depending on the kind of data generated, but all of them will end with ```_final.csv```. Copy all files with this ending into a folder named ```memento_data/``` and move this folder to the [`TransformerModels`](TransformerModels) directory.
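This copy step can also be scripted. A minimal sketch (the `memento_data/` folder name matches the instructions above; the source directory and the `collect_final_csvs` helper name are our own choices here, not part of the repo):

```python
import glob
import os
import shutil

def collect_final_csvs(src_dir: str, dst_dir: str = "memento_data") -> list:
    """Copy all pre-processed *_final.csv files from src_dir into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, "*_final.csv"))):
        # Only the *_final.csv outputs of csvhelper_memento.py are needed.
        shutil.copy(path, dst_dir)
        copied.append(os.path.basename(path))
    return copied
```

Afterwards, move the resulting `memento_data/` folder into `TransformerModels/` as described above.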
92 |
93 | Copying ```results/``` and ```memento_data/``` to these destinations is needed; otherwise the execution will fail. After copying the files, the training and fine-tuning phase is ready to be initiated.
94 |
95 | NOTE: If you are using the Docker image for NTT training, you will need to generate the pre-training data inside the Docker containers provided by `memento-ns3-for-NTT`. After that, the ```results/``` folder must be copied inside the ```ntt-docker``` container, into the same directories as mentioned above. This can be done with ```docker cp```. Our Docker image doesn't support GPUs (yet), so feel free to modify the Dockerfile to include CUDA support, or run with CPUs for now.
96 |
97 | ### Training and fine-tuning:
98 |
99 | We need GPUs to run the training and fine-tuning, and this documentation only covers the steps to run on the ```TIK SLURM cluster```. If running on other clusters, the setup might have to be modified slightly. We provide a self-contained run script ```run.sh```, in which you can uncomment the job you want to run. Then execute:
100 |
101 | $ sbatch run.sh
102 |
103 | ### Reproduce the plots:
104 |
105 | The specific log folders generated after a pre-training or fine-tuning job must be copied, with the EXACT same names, into the ```logs``` directories relative to the [`plot_losses.py`](TransformerModels/plot_losses.py) or [`mct_test_plots.py`](TransformerModels/mct_test_plots.py) files, as displayed in the ```.py``` files. Following that, the plots can be generated simply with:
106 |
107 | $ python mct_test_plots.py
108 | $ python plot_losses.py
109 |
110 |
111 | ## Comments:
112 |
113 | * SBATCH commands in ```run.sh``` might need to be changed as per memory or GPU requirements.
114 | * For running the ARIMA baselines, GPUs are not needed.
115 | * On the TIK SLURM cluster, sometimes there is the following error ```OSError: [Errno 12] Cannot allocate memory```.
116 | To fix this:
117 | - Increase the amount of memory for the job to be run or
118 | - Reduce the ```num_workers``` argument in the DataLoader inside the given model's ```.py``` file from 4 to 1.
119 | * To switch to data from different topologies, you only need to change the ```NUM_BOTTLENECKS``` global variable in the respective model's ```.py``` you are running. Note that not all experiments are meant to be run on all topologies. For details on which topology is used for which experiment, refer to [`thesis.pdf`](../report/thesis.pdf)
120 | * Checkpoints will automatically be saved in the respective log folders for every job (refer to the particular model's ```.py``` to see specific names). It is advisable to copy the ```.ckpt``` files into a new folder named ```checkpoints/```, in order to initialise from the trained weights and not lose any work. This relative path ```checkpoints/*.ckpt``` can be replaced in the appropriate ```.py``` file. Every fine-tuning ```.py``` file has a global variable ```PRETRAINED``` which can be set to ```True``` if you want to initialize from the saved weights, or ```False``` if fine-tuning must be done from scratch.
121 | * Different models have different instances of linear layers, and if you want to initialize fine-tuning from a checkpoint, you must ensure that the pre-training process used the same model architecture, else PyTorch model loading will fail. After initialization with the same architecture, the layers can be changed as required. As a quick map,
122 | - If pre-training is done with multiple linear layer instances in the model i.e. [`encoder_delay_varmask_chooseagglevel_multi.py`](TransformerModels/encoder_delay_varmask_chooseagglevel_multi.py) or [`encoder_delay_varmask_chooseencodelem_multi.py`](TransformerModels/encoder_delay_varmask_chooseencodelem_multi.py), then [`finetune_encoder_multi.py`](TransformerModels/finetune_encoder_multi.py) and [`finetune_mct_multi.py`](TransformerModels/finetune_mct_multi.py) should be used for fine-tuning.
123 | - In other cases of single linear layer instances in the model, [`finetune_encoder.py`](TransformerModels/finetune_encoder.py) and [`finetune_mct.py`](TransformerModels/finetune_mct.py) should be used for fine-tuning.
124 | * The ```TRAIN``` global variable in the ```.py``` files is used to decide whether to train on the training data, or just test on the testing data.
125 | * The ```trainer API``` from PyTorch lightning (present in the all of the Transformer ```.py``` files) is used to select multiple GPUs using the ```strategy``` argument. Possible options are
126 | - `dp` : Data Parallel, this works always on the TIK SLURM cluster.
127 | - `ddp` : Distributed Data Parallel, this only works sometimes and we haven't used this. To run ddp jobs, modify the ```run.sh``` file, to include an `srun` command prior to the `python` command.
128 | * To save files, you might sometimes have to modify the directory and file names in the code, as needed on your machine. As this is not end-to-end software, it is sometimes not possible to create a generic file-saving system across multiple experiments.
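The global switches mentioned in the comments above follow a simple pattern. A hypothetical sketch (the variable names mirror the convention in the model ```.py``` files, but the `dataloader_workers` helper and the surrounding training code are our own illustration):

```python
# Hypothetical sketch: module-level switches as used across the model .py files.
NUM_BOTTLENECKS = 1   # selects which topology's data to use
PRETRAINED = True     # fine-tune from checkpoints/*.ckpt instead of from scratch
TRAIN = True          # False: skip training and only test on the test data

def dataloader_workers(low_memory: bool = False) -> int:
    """Pick the DataLoader num_workers value; drop from 4 to 1 when the
    SLURM job hits 'OSError: [Errno 12] Cannot allocate memory'."""
    return 1 if low_memory else 4
```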
129 |
130 |
131 |
132 |
133 |
--------------------------------------------------------------------------------
/workspace/TransformerModels/transformer_delay.py:
--------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
3 | import argparse
4 | import copy
5 | import json
6 | import math
7 | import os
8 | import pathlib
9 | import random
10 | import time as t
11 | from datetime import datetime
12 | from ipaddress import ip_address
13 |
14 | import matplotlib.pyplot as plt
15 | import numpy as np
16 | import pandas as pd
17 | import pytorch_lightning as pl
18 | import torch
19 | import yaml
20 | from pytorch_lightning import loggers as pl_loggers
21 | from pytorch_lightning.callbacks import ProgressBar
22 | from pytorch_lightning.callbacks.early_stopping import EarlyStopping
23 | from sklearn.model_selection import train_test_split
24 | from tensorboard.backend.event_processing.event_accumulator import \
25 | EventAccumulator
26 | from torch import einsum, nn, optim
27 | from torch.nn import functional as F
28 | from torch.utils.data import DataLoader
29 | from tqdm import tqdm
30 | from utils import (PacketDataset, convert_to_relative_timestamp,
31 | get_data_from_csv, ipaddress_to_number,
32 | sliding_window_delay, sliding_window_features,
33 | vectorize_features_to_numpy)
34 |
35 | random.seed(0)
36 | np.random.seed(0)
37 | torch.manual_seed(0)
38 |
39 | torch.set_default_dtype(torch.float64)
40 |
41 | # Hyper parameters from config file
42 |
43 | with open("configs/config-transformer.yaml") as f:
44 | config = yaml.load(f, Loader=yaml.FullLoader)
45 |
46 | WEIGHTDECAY = float(config["weight_decay"])
47 | LEARNINGRATE = float(config["learning_rate"])
48 | DROPOUT = float(config["dropout"])
49 | NHEAD = int(config["num_heads"])
50 | LAYERS = int(config["num_layers"])
51 | EPOCHS = int(config["epochs"])
52 | BATCHSIZE = int(config["batch_size"])
53 | LINEARSIZE = int(config["linear_size"])
54 | LOSSFUNCTION = nn.MSELoss()
55 |
56 | if "loss_function" in config.keys():
57 | if config["loss_function"] == "huber":
58 | LOSSFUNCTION = nn.HuberLoss()
59 | if config["loss_function"] == "smoothl1":
60 | LOSSFUNCTION = nn.SmoothL1Loss()
61 | if config["loss_function"] == "kldiv":
62 | LOSSFUNCTION = nn.KLDivLoss()
63 |
64 | # Params for the sliding window on the packet data
65 | SLIDING_WINDOW_START = 0
66 | SLIDING_WINDOW_STEP = 1
67 | SLIDING_WINDOW_SIZE = 10
68 |
69 | SAVE_MODEL = False
70 | MAKE_EPOCH_PLOT = True
71 | TEST = True
72 |
73 | if torch.cuda.is_available():
74 | NUM_GPUS = torch.cuda.device_count()
75 | print("Number of GPUS: {}".format(NUM_GPUS))
76 | else:
77 | print("ERROR: NO CUDA DEVICE FOUND")
78 | NUM_GPUS = 0
79 |
80 | # DO NOT USE (AS OF NOW)
81 | class AbsPosEmb1DAISummer(nn.Module):
82 | """
83 | Given query q of shape [batch heads tokens dim] we multiply
84 | q by all the flattened absolute differences between tokens.
85 | Learned embedding representations are shared across heads
86 | """
87 |
88 | def __init__(self, tokens, dim_head):
89 | """
90 | Output: [batch head tokens tokens]
91 | Args:
92 | tokens: elements of the sequence
93 | dim_head: the size of the last dimension of q
94 | """
95 | super().__init__()
96 | scale = dim_head**-0.5
97 | self.abs_pos_emb = nn.Parameter(torch.randn(tokens, dim_head) * scale)
98 |
99 | def forward(self, q):
100 | return einsum("b h i d, j d -> b h i j", q, self.abs_pos_emb)
101 |
102 |
103 | # DO NOT USE (AS OF NOW)
104 | class PositionalEncoding(nn.Module):
105 | def __init__(self, d_model, dropout=DROPOUT, max_len=5000):
106 | super(PositionalEncoding, self).__init__()
107 | self.dropout = nn.Dropout(p=dropout)
108 | pe = torch.zeros(max_len, d_model)
109 | position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
110 | div_term = torch.exp(
111 | torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
112 | )
113 | pe[:, 0::2] = torch.sin(position * div_term)
114 | pe[:, 1::2] = torch.cos(position * div_term)
115 | pe = pe.unsqueeze(0).transpose(0, 1)
116 | self.register_buffer("pe", pe)
117 |
118 | def forward(self, x):
119 | x = x + self.pe[: x.size(0), :]
120 | return self.dropout(x)
121 |
122 |
123 | # TRANSFORMER CLASS TO PREDICT DELAYS
124 | class BaseTransformer(pl.LightningModule):
125 | def __init__(self, input_size, target_size, loss_function):
126 | super(BaseTransformer, self).__init__()
127 |
128 | self.step = [0]
129 | self.warmup_steps = 1000
130 |
131 | # create the model with its layers
132 |
133 | self.encoder_layer = nn.TransformerEncoderLayer(
134 | d_model=LINEARSIZE, nhead=NHEAD, batch_first=True, dropout=DROPOUT
135 | )
136 | self.decoder_layer = nn.TransformerDecoderLayer(
137 | d_model=LINEARSIZE, nhead=NHEAD, batch_first=True, dropout=DROPOUT
138 | )
139 | self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=LAYERS)
140 | self.decoder = nn.TransformerDecoder(self.decoder_layer, num_layers=LAYERS)
141 | self.encoderin = nn.Linear(input_size, LINEARSIZE)
142 | self.decoderin = nn.Linear(target_size, LINEARSIZE)
143 | self.decoderpred = nn.Linear(LINEARSIZE, target_size)
144 | self.model = nn.ModuleList(
145 | [
146 | self.encoder,
147 | self.decoder,
148 | self.encoderin,
149 | self.decoderin,
150 | self.decoderpred,
151 | ]
152 | )
153 |
154 | self.loss_func = loss_function
155 | parameters = {
156 | "WEIGHTDECAY": WEIGHTDECAY,
157 | "LEARNINGRATE": LEARNINGRATE,
158 | "EPOCHS": EPOCHS,
159 | "BATCHSIZE": BATCHSIZE,
160 | "LINEARSIZE": LINEARSIZE,
161 | "NHEAD": NHEAD,
162 | "LAYERS": LAYERS,
163 | }
164 | self.df = pd.DataFrame()
165 | self.df["parameters"] = [json.dumps(parameters)]
166 |
167 | def configure_optimizers(self):
168 | self.optimizer = optim.Adam(
169 | self.model.parameters(),
170 | betas=(0.9, 0.98),
171 | eps=1e-9,
172 | lr=LEARNINGRATE,
173 | weight_decay=WEIGHTDECAY,
174 | )
175 | return {"optimizer": self.optimizer}
176 |
177 | def lr_update(self):
178 | self.step[0] += 1
179 | learning_rate = LINEARSIZE ** (-0.5) * min(
180 | self.step[0] ** (-0.5), self.step[0] * self.warmup_steps ** (-1.5)
181 | )
182 | for param_group in self.optimizer.param_groups:
183 | param_group["lr"] = learning_rate
184 |
185 | def forward(self, input, target):
186 | # used for the forward pass of the model
187 | scaled_input = self.encoderin(input.double())
188 | target = self.decoderin(target.double())
189 | enc = self.encoder(scaled_input)
190 | out = self.decoderpred(self.decoder(target, enc))
191 | return out
192 |
193 | def training_step(self, train_batch, train_idx):
194 | X, y = train_batch
195 | self.lr_update()
196 | prediction = self.forward(X, y)
197 | loss = self.loss_func(prediction, y)
198 | self.log("Train loss", loss)
199 | return loss
200 |
201 | def validation_step(self, val_batch, val_idx):
202 | X, y = val_batch
203 | prediction = self.forward(X, y)
204 | loss = self.loss_func(prediction, y)
205 | self.log("Val loss", loss, sync_dist=True)
206 | return loss
207 |
208 | def test_step(self, test_batch, test_idx):
209 | X, y = test_batch
210 | prediction = self.forward(X, y)
211 | loss = self.loss_func(prediction, y)
212 | self.log("Test loss", loss, sync_dist=True)
213 | return loss
214 |
215 | def predict_step(self, test_batch, test_idx, dataloader_idx=0):
216 | X, y = test_batch
217 | prediction = self.forward(X, y)
218 | return prediction
219 |
220 | def training_epoch_end(self, outputs):
221 | loss_tensor_list = [item["loss"].to("cpu").numpy() for item in outputs]
222 | # print(loss_tensor_list, len(loss_tensor_list))
223 | self.log(
224 | "Avg loss per epoch",
225 | np.mean(np.array(loss_tensor_list)),
226 | on_step=False,
227 | on_epoch=True,
228 | )
229 |
230 |
231 | def main():
232 | path = "congestion_1/"
233 | files = [
234 | "endtoenddelay500s_1.csv",
235 | "endtoenddelay500s_2.csv",
236 | "endtoenddelay500s_3.csv",
237 | "endtoenddelay500s_4.csv",
238 | "endtoenddelay500s_5.csv",
239 | ]
240 |
241 | sl_win_start = SLIDING_WINDOW_START
242 | sl_win_size = SLIDING_WINDOW_SIZE
243 | sl_win_shift = SLIDING_WINDOW_STEP
244 |
245 | num_features = 15
246 | input_size = sl_win_size * num_features
247 | output_size = sl_win_size
248 |
249 | model = BaseTransformer(input_size, output_size, LOSSFUNCTION)
250 | full_feature_arr = []
251 | full_target_arr = []
252 | test_loaders = []
253 |
254 | for file in files:
255 | print(os.getcwd())
256 |
257 | df = get_data_from_csv(path + file)
258 | df = convert_to_relative_timestamp(df)
259 |
260 | df = ipaddress_to_number(df)
261 | feature_df, label_df = vectorize_features_to_numpy(df)
262 |
263 | print(feature_df.head(), feature_df.shape)
264 | print(label_df.head())
265 |
266 | feature_arr = sliding_window_features(
267 | feature_df.Combined, sl_win_start, sl_win_size, sl_win_shift
268 | )
269 | target_arr = sliding_window_delay(
270 | label_df, sl_win_start, sl_win_size, sl_win_shift
271 | )
272 | print(len(feature_arr), len(target_arr))
273 | full_feature_arr = full_feature_arr + feature_arr
274 | full_target_arr = full_target_arr + target_arr
275 |
276 | print(len(full_feature_arr), len(full_target_arr))
277 |
278 | full_train_vectors, test_vectors, full_train_labels, test_labels = train_test_split(
279 | full_feature_arr, full_target_arr, test_size=0.05, shuffle=True, random_state=42
280 | )
281 | # print(len(full_train_vectors), len(full_train_labels))
282 | # print(len(test_vectors), len(test_labels))
283 |
284 | train_vectors, val_vectors, train_labels, val_labels = train_test_split(
285 | full_train_vectors, full_train_labels, test_size=0.1, shuffle=False
286 | )
287 | # print(len(train_vectors), len(train_labels))
288 | # print(len(val_vectors), len(val_labels))
289 |
290 | # print(train_vectors[0].shape[0])
291 | # print(train_labels[0].shape[0])
292 |
293 | train_dataset = PacketDataset(train_vectors, train_labels)
294 | val_dataset = PacketDataset(val_vectors, val_labels)
295 | test_dataset = PacketDataset(test_vectors, test_labels)
296 | # print(train_dataset.__getitem__(0))
297 |
298 | train_loader = DataLoader(
299 | train_dataset, batch_size=BATCHSIZE, shuffle=True, num_workers=4
300 | )
301 | val_loader = DataLoader(
302 | val_dataset, batch_size=BATCHSIZE, shuffle=False, num_workers=4
303 | )
304 | test_loader = DataLoader(
305 | test_dataset, batch_size=BATCHSIZE, shuffle=False, num_workers=4
306 | )
307 |
308 | # print one dataloader item!!!!
309 | train_features, train_lbls = next(iter(train_loader))
310 | print(f"Feature batch shape: {train_features.size()}")
311 | print(f"Labels batch shape: {train_lbls.size()}")
312 | feature = train_features[0]
313 | label = train_lbls[0]
314 | print(f"Feature: {feature}")
315 | print(f"Label: {label}")
316 |
317 | val_features, val_lbls = next(iter(val_loader))
318 | print(f"Feature batch shape: {val_features.size()}")
319 | print(f"Labels batch shape: {val_lbls.size()}")
320 | feature = val_features[0]
321 | label = val_lbls[0]
322 | print(f"Feature: {feature}")
323 | print(f"Label: {label}")
324 |
325 | test_features, test_lbls = next(iter(test_loader))
326 | print(f"Feature batch shape: {test_features.size()}")
327 | print(f"Labels batch shape: {test_lbls.size()}")
328 | feature = test_features[0]
329 | label = test_lbls[0]
330 | print(f"Feature: {feature}")
331 | print(f"Label: {label}")
332 |
333 | print("Started training at:")
334 | time = datetime.now()
335 | print(time)
336 |
337 | print("Removing old logs:")
338 | os.system("rm -rf transformer_delay_logs/lightning_logs/")
339 |
340 | tb_logger = pl_loggers.TensorBoardLogger(save_dir="transformer_delay_logs/")
341 |
342 | if NUM_GPUS >= 1:
343 | trainer = pl.Trainer(
344 | precision=16,
345 | gpus=-1,
346 | strategy="dp",
347 | max_epochs=EPOCHS,
348 | check_val_every_n_epoch=1,
349 | logger=tb_logger,
350 | callbacks=[EarlyStopping(monitor="Val loss", patience=5)],
351 | )
352 | else:
353 | trainer = pl.Trainer(
354 | gpus=None,
355 | max_epochs=EPOCHS,
356 | check_val_every_n_epoch=1,
357 | logger=tb_logger,
358 | callbacks=[EarlyStopping(monitor="Val loss", patience=5)],
359 | )
360 |
361 | trainer.fit(model, train_loader, val_loader)
362 | print("Finished training at:")
363 | time = datetime.now()
364 | print(time)
365 |
366 | if SAVE_MODEL:
367 | name = config["name"]
368 | torch.save(model.model, f"./trained_transformer_{name}")
369 |
370 | if MAKE_EPOCH_PLOT:
371 | t.sleep(5)
372 | log_dir = "transformer_delay_logs/lightning_logs/version_0"
373 | y_key = "Avg loss per epoch"
374 |
375 | event_accumulator = EventAccumulator(log_dir)
376 | event_accumulator.Reload()
377 |
378 | steps = sorted({x.step for x in event_accumulator.Scalars("epoch")})
379 | epoch_vals = sorted({x.value for x in event_accumulator.Scalars("epoch")})
380 | epoch_vals.pop()
381 |
382 | x = list(range(len(steps)))
383 | y = [x.value for x in event_accumulator.Scalars(y_key) if x.step in steps]
384 |
385 | fig, ax = plt.subplots()
386 | ax.plot(epoch_vals, y)
387 | ax.set_xlabel("epoch")
388 | ax.set_ylabel(y_key)
389 | fig.savefig("lossplot_perepoch.png")
390 |
391 | if TEST:
392 | trainer.test(model, dataloaders=test_loader)
393 |
394 |
395 | if __name__ == "__main__":
396 | main()
397 |
--------------------------------------------------------------------------------
/report/background.tex:
--------------------------------------------------------------------------------
1 | \chapter{Background and Related Work}
2 | \label{cha:background}
3 |
4 | A large part of the work undertaken during this project requires a deep understanding of how a particular deep learning architecture, the Transformer, works. In this section, we cover the required background and the insights drawn from the Transformer architecture that were needed to model and solve our problem of prediction on packet data. We also present adaptations of the Transformer architecture to problems in fields such as NLP and CV, along with relevant ideas which could be adapted to our tasks.
5 |
6 | \section{Background on Transformers}
7 | \label{sec:background}
8 |
9 | \subsection{Sequence modelling with attention}
10 | \label{ssec:bgsequence}
11 |
12 | Transformers are built around the \emph{attention mechanism}, which maps an input sequence to an output sequence of the same length.
13 | Every output encodes its own information and its context, \ie information from related elements in the sequence, regardless of how far apart they are.
14 | Concretely, attention computes dot products between the input feature matrix and a learned attention weight matrix, which lets the network focus on selected parts of the sequence at a time, based on the values of the attention weights. Moreover, as the attention weights are learnable parameters, the Transformer learns over time to choose the weights that best capture the structure within the sequence of data.
15 | Computing attention is efficient, as all elements in the sequence can be processed in parallel with matrix operations that are highly optimised on most hardware. These properties have made Transformer-based neural networks the state-of-the-art solution for many sequence modelling tasks. We refer the reader to an excellent illustrated guide to the Transformer\cite{trans}.
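As an illustration, the attention operation can be written in a few lines. The following sketch is our own simplified example (a single head, no learned projections), not the implementation used in this project: the weights are a softmax over query-key dot products, and each output is the weighted sum of the values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # pairwise similarity, (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                                       # context-weighted values

# Self-attention: every output row mixes information from the whole
# sequence at once, which is why all positions compute in parallel.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))              # sequence of 5 tokens, dim 8
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                             # (5, 8)
```

Note how nothing in the computation depends on the distance between two positions, which is why far-apart elements are as easy to relate as neighbours.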
16 |
17 |
18 | While attention originated as an improvement to Recurrent Neural Networks (RNNs), it was soon realised that the mechanism could replace them entirely\cite{vaswaniAttentionAllYou2017}. RNN models were the initial state-of-the-art deep learning architectures for sequence modelling problems; however, they suffer from several issues. Training RNNs is usually limited by one or more of the following problems:
19 | \begin{enumerate}
20 | \item RNNs are not computationally efficient for training long sequences, as they require $n$ sequential operations to learn a sequence of length $n$. This makes the training process extremely slow.
21 | \item RNNs suffer from the problem of \emph{vanishing gradients}, \ie as elements in the input are processed sequentially over time, the gradients used by the optimiser\cite{Robbins2007ASA} for elements at the end of very long sequences become extremely small and numerically unstable, hindering convergence.
22 | \item RNNs struggle to learn \emph{long-term dependencies}, \ie learning relations between elements far apart in the sequence is challenging.
23 | \end{enumerate}
24 |
25 | Table \ref{bg:table1} summarises the complexities of, and differences between, several deep learning architectures which have attempted to solve the sequence modelling problem.
26 |
27 | \begin{table}[htbp]
28 | \centering
29 | \begin{tabular}{ c c c c }
30 | \toprule
31 | Layer Type & Complexity & Sequential & Maximum \\
32 | & per Layer & Operations & Path Length \\
33 | \midrule
34 | Transformer (Self-Attention) & $O(n^2 \cdot d)$ & $O(1)$ & $O(1)$ \\
35 | Recurrent NN & $O(n \cdot d^2)$ & $O(n)$ & $O(n)$ \\
36 | Convolutional NN & $O(k \cdot n \cdot d^2)$ & $O(1)$ & $O(\log_{k}{n})$\\
37 | Restricted (Self-Attention) & $O(r \cdot n \cdot d)$ & $O(1)$ & $O(n/r)$\\
38 | \bottomrule
39 | \end{tabular}
40 | \caption{Maximum path lengths, per-layer complexity and minimum number of sequential operations
41 | for different layer types. $n$ is the sequence length, $d$ is the representation dimension, $k$ is the kernel
42 | size of convolutions and $r$ the size of the neighbourhood in restricted self-attention.
43 | SOURCE: Original paper\cite{vaswaniAttentionAllYou2017}}
44 | \label{bg:table1}
45 | \end{table}
46 |
47 |
48 | Augmenting RNNs with attention solves some of these issues\cite{rnnattention}, and replacing them entirely with \emph{Transformers} has been shown to solve all of them.
49 | The authors of the original paper propose an architecture for translation tasks that contains:
50 | \begin{itemize}
51 | \item a learnable \emph{embedding} layer mapping words to vectors
52 | \item a \emph{transformer encoder} encoding the input sequence
53 | \item a \emph{transformer decoder} generating an output sequence based on the encoded input
54 | \end{itemize}
55 |
56 | Each transformer block alternates between attention and linear layers, \ie between encoding context and refining features. The attention mechanism helps learn ``context-rich'' representations of every element, mapping in relevant information from the elements which surround it, and the linear layers help map this learned information into a form useful for downstream prediction. Figure \ref{fig:transformer} shows details of the original Transformer architecture and the functions of its layers.
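The alternation described above can be sketched as follows. This is a deliberately simplified, illustrative encoder block (single attention head, no layer normalisation or dropout, made-up weight shapes), not the architecture used later in this thesis:

```python
import numpy as np

def encoder_block(x, W_qkv, W_o, W_1, W_2):
    """One simplified transformer encoder block: self-attention mixes
    context across positions, then a position-wise feed-forward layer
    refines each element's features. Residual connections on both paths."""
    n, d = x.shape
    q, k, v = np.split(x @ W_qkv, 3, axis=-1)   # project input to Q, K, V
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)       # attention weights (softmax)
    x = x + (w @ v) @ W_o                       # attention sub-layer + residual
    x = x + np.maximum(x @ W_1, 0) @ W_2        # ReLU feed-forward + residual
    return x

rng = np.random.default_rng(1)
d = 8
x = rng.standard_normal((5, d))                 # 5-element sequence
out = encoder_block(
    x,
    rng.standard_normal((d, 3 * d)) * 0.1,      # W_qkv
    rng.standard_normal((d, d)) * 0.1,          # W_o
    rng.standard_normal((d, 4 * d)) * 0.1,      # W_1 (expand)
    rng.standard_normal((4 * d, d)) * 0.1,      # W_2 (contract)
)
print(out.shape)                                # (5, 8)
```

Stacking several such blocks yields the encoder; the decoder adds a second attention step over the encoder output.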
57 |
58 | \begin{figure}[!hbt]
59 | \begin{center}
60 | \includegraphics[scale=1.5]{figures/architecture_transformer.pdf}
61 | \caption{Original Transformer Architecture, Credits: Alexander Dietmüller}
62 | \label{fig:transformer}
63 | \end{center}
64 | \end{figure}
65 |
66 | \subsection{Pre-training and fine-tuning}
67 | \label{ssec:bgtraining}
68 |
69 | Due to their highly efficient and parallelisable nature, Transformers can be used for a wide variety of tasks based on the principle of \emph{transfer learning}. The most common strategy is to train the architecture in two phases, \emph{pre-training} and \emph{fine-tuning}. Inspired by the original Transformer's success on the task of language translation, the use of Transformers has become ubiquitous in solving NLP problems. We present one such state-of-the-art NLP Transformer model,
70 | called BERT\cite{devlinBERTPretrainingDeep2019}, one of the most widely used transformer models today.
71 | While the original Transformer had both an encoder and a decoder with attention, BERT uses only the transformer encoder, followed by a small and replaceable decoder. This decoder is usually a set of linear layers acting as a multilayer perceptron (MLP) and is commonly called the ``MLP head'' in the deep learning community. The principle of transfer learning works for BERT as follows:
72 |
73 | \begin{itemize}
74 | \item In the first step, BERT is \emph{pre-trained} with a task that requires learning the underlying language structure. Linguistic research has shown that a Masked Language Model is an optimal task for deep learning models to learn structure in natural languages\cite{wettigShouldYouMask}.
75 | Concretely, a fraction of the words in the input sequence is masked out (15\% in the original model), and the decoder is tasked to predict the original words from the encoded input sequence.
76 | BERT is thus used to generate contextual encodings of words, which is only possible due to the bi-directionality of the attention mechanism in BERT. This allows the model to infer the context of a word from both sides in a given sentence, which was not possible earlier, when elements in a sequence were only processed in a fixed order.\footnote{
77 | BERT is pre-trained from text corpora with several billion words and fine-tuned with $\sim$100 thousand examples per task.
78 | }
79 |
80 | \item In the second step, the single pre-trained model can be fine-tuned to many different tasks by replacing the small decoder with task-specific ones, e.g.\ for language understanding, question answering, or text generation.
81 | The fine-tuning process resumes learning from the weights saved during the pre-training phase, but for a new task or in a new environment. The model has already learned to encode a general language context and only needs to learn to extract the task-relevant information from this context. This requires far less data than training from scratch and makes fine-tuning considerably faster.
82 | 
83 | Furthermore, BERT's pre-training step is unsupervised/self-supervised, \ie it requires only ``cheap'' unlabelled data and no labelled signal from the data, such as a target value. As procuring labelled data is harder, this problem is mitigated by pre-training on ``generic'' data and then using
84 | ``expensive'' labelled data, e.g.\ for text classification, during the fine-tuning phase. Figure \ref{fig:bert} shows the details of BERT's pre-training and fine-tuning phases.
85 | \end{itemize}
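The masking step of pre-training can be sketched in a few lines. The example below is purely illustrative (the token ids, `MASK_ID`, and the helper are made up; it is not BERT's actual tokeniser): a random 15\% of positions are replaced by a mask token, and the originals are kept as prediction targets.

```python
import random

MASK_ID = 103        # hypothetical id of the [MASK] token
MASK_FRACTION = 0.15  # fraction masked in the original BERT

def mask_tokens(token_ids, seed=0):
    """Mask ~15% of positions; return masked sequence and targets."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(token_ids) * MASK_FRACTION))
    positions = rng.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]   # the decoder must recover this id
        masked[pos] = MASK_ID
    return masked, targets

tokens = [7, 42, 9, 15, 88, 23, 4, 61, 30, 12]   # made-up token ids
masked, targets = mask_tokens(tokens)
# `masked` now holds MASK_ID at the sampled positions;
# `targets` maps each masked position to its original id.
```

Only the unlabelled text itself is needed here, which is what makes this pre-training signal so cheap to obtain.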
86 |
87 | \begin{figure}[!hbt]
88 | \begin{center}
89 | \includegraphics[scale=1.5]{figures/architecture_bert.pdf}
90 | \caption{Original BERT Architecture, Credits: Alexander Dietmüller}
91 | \label{fig:bert}
92 | \end{center}
93 | \end{figure}
94 |
95 |
96 |
97 | Another extremely useful property of the Transformer is its ability to generalise, supported by its highly efficient parallelisation during training, which in turn helps transfer knowledge to a variety of tasks during fine-tuning. The OpenAI GPT-3 model\cite{brownLanguageModelsAre2020} investigates few-shot learning with only a pre-trained model.
98 | As it is just a pre-trained model, it does not outperform fine-tuned models on specific tasks. However, GPT-3 delivers impressive performance on various tasks, including translation, determining the sentiment of tweets, and generating completely novel text. If required, it can also be fine-tuned on specific tasks, which further showcases the generalisation power of Transformers.
99 |
100 |
101 | \subsection{Vision transformers}
102 | \label{ssec:bgvit}
103 |
104 | Following their success in NLP, Transformers were explored as a way to learn structural information in other kinds of data. One field in which they gained a lot of traction is computer vision (CV).
105 | However, adapting Transformers to vision tasks came with a few major challenges:
106 | \begin{itemize}
107 | \item Image data does not have a natural sequence structure, as images are a spatial collection of pixels.
108 | \item Learning encodings at the pixel level for a large number of images proved too fine-grained to scale to big datasets.
109 | \item The Masked Language Model approach could not be efficiently transferred to learning structure in images, as the relationship between the units (pixels) does not follow the same logic as in natural languages.
110 | \end{itemize}
111 |
112 | To solve these problems, the authors of the Vision Transformer (ViT)\cite{dosovitskiyImageWorth16x162021} came up with an idea for each of them, in order to bring the data into a form on which a Transformer could be efficiently trained.
113 | \begin{itemize}
114 | \item \emph{Serialise and aggregate the data: } As image data does not have a natural sequence structure, such a structure was artificially introduced. Every image was split at the pixel level and aggregated into patches of dimension 16$\times$16, and each of these patches became a member of the new input ``sequence''. As Transformers scale quadratically with increasing sequence size (Table \ref{bg:table1}), this also solved the problem of efficiency and the problem that encodings at the pixel level are too fine-grained to be useful. The embedding and transformer layers were then applied to the resulting sequence of patches, using an architecture similar to BERT.
115 | \item \emph{Domain-specific pre-training: } At heart, most CV problems have very similar training and inference objectives, one of the most common being image classification; most vision tasks thus share an objective and differ only in environment. This made classification a much better suited pre-training task for image data than masking and reconstruction. The ViT exploits this: both pre-training and fine-tuning are done with a classification objective, with only a change of environment between the stages. Moreover, understanding structure in image data requires information not only from the neighbouring patches but from the whole image, which was made possible by using all the encoded patches for classification. We present the details of ViT's pre-training and fine-tuning in Figure \ref{fig:vit}.
116 | \end{itemize}
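The patch serialisation idea can be sketched as follows. This is our own illustrative example (a 224$\times$224 RGB image and 16$\times$16 patches, the configuration commonly associated with ViT-Base; the helper name is ours):

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an image into non-overlapping patches, flattening each
    patch into one element of the input "sequence"."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    # (h/p, p, w/p, p, c) -> (h/p, w/p, p, p, c) -> (num_patches, p*p*c)
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)

img = np.zeros((224, 224, 3))   # a 224x224 RGB image
seq = image_to_patches(img)
print(seq.shape)                # (196, 768): 14x14 patches, each 16*16*3 values
```

A 224$\times$224 image thus becomes a sequence of only 196 elements, short enough for the quadratic cost of attention to stay manageable.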
117 |
118 | \begin{figure}[!hbt]
119 | \begin{center}
120 | \includegraphics[scale=1.5]{figures/architecture_vit.pdf}
121 | \caption{Original ViT Architecture, Credits: Alexander Dietmüller}
122 | \label{fig:vit}
123 | \end{center}
124 | \end{figure}
125 |
126 |
127 | Finally, ViT's authors also observe that domain-specific architectures that implicitly encode structure, like convolutional neural networks (CNNs), work extremely well for image datasets and actually perform better on small datasets. However, given enough pre-training data, learning sequences with attention beats architectures that encode structure implicitly, making the choice a tradeoff between utility and the amount of resources required for pre-training. Later research has also shown advances in vision Transformers that use a masked-reconstruction approach\cite{heMaskedAutoencodersAre2021}; such details, however, are beyond the scope of this section.
128 |
129 |
130 | \section{Related Work}
131 | \label{sec:related_work}
132 |
133 | The problem of learning network dynamics from packet data has long been deemed a complex ``lost cause'', and the research community has made little direct effort in this direction. The idea of using large pre-trained machine learning models to learn from abstract network traces has thus not been explored much. However, application-specific adaptations of ML architectures have been used with varying degrees of success on network data, and we present some of these efforts and successes here, as they helped provide direction to our own ideas and thoughts for this project.
134 |
135 | In MimicNet\cite{zhangMimicNetFastPerformance2021}, the authors show that they can provide users with the familiar abstraction of a packet-level simulation for a portion of the network, while leveraging redundancy and recent advances in machine learning to quickly and accurately approximate portions of the network that are not directly visible. In Puffer\cite{puffer}, supervised learning is used in situ, with data from the real deployment environment, to train a probabilistic predictor of upcoming chunk transmission times for video streaming. The authors of Aurora\cite{jayDeepReinforcementLearning2019} show that casting congestion control as a reinforcement learning (RL) problem enables training deep network policies that capture intricate patterns in data traffic and network conditions, but they also note that fairness, safety, and generalisation are not trivial to address within the conventional RL formalism. Pensieve\cite{maoNeuralAdaptiveVideo2017} trains a neural network model that selects bitrates for future video chunks based on observations collected by client video players, learning to make ABR decisions solely from the observed performance of past decisions. Finally, in NeuroPlan\cite{planning}, the authors propose a graph neural network (GNN) combined with a novel domain-specific node-link transformation for state encoding; they then leverage a two-stage hybrid approach that first uses deep RL to prune the search space and then uses an ILP solver to find the optimal solution to the network planning problem.
136 |
137 |
138 |
139 |
140 |
141 |
--------------------------------------------------------------------------------