├── report ├── thesis.pdf ├── figures │ ├── mask.pdf │ ├── delay.pdf │ ├── delay2.pdf │ ├── delay4.pdf │ ├── vision.pdf │ ├── MCT_loss.pdf │ ├── overview.pdf │ ├── MCT_valloss.pdf │ ├── complex_topo.pdf │ ├── declaration.pdf │ ├── prev_vision.pdf │ ├── simple_topo.pdf │ ├── simulation.pdf │ ├── MCT_trainloss.pdf │ ├── eth-nsg-header.pdf │ ├── simple_topo_ft.pdf │ ├── slidingwindow.pdf │ ├── architecture_bert.pdf │ ├── architecture_ntt.pdf │ ├── architecture_vit.pdf │ ├── delay_Receiver0.pdf │ ├── delay_Receiver1.pdf │ ├── delay_Receiver2.pdf │ ├── delay_Receivers.pdf │ ├── queue_profile_A.pdf │ ├── simulation_small.pdf │ ├── SE_trend_arima_10000.pdf │ ├── SE_trend_arima_30000.pdf │ ├── architecture_transformer.pdf │ ├── finetune_mct_loss_comparison.pdf │ ├── finetune_mct_loss_comparison_agg.pdf │ ├── simple_topo.drawio │ ├── mask.drawio │ ├── slidingwindow.drawio │ └── complex_topo.drawio ├── README.md ├── Makefile ├── abstract.tex ├── summary.tex ├── thesis.tex ├── .gitignore ├── introduction.tex ├── appendix.tex ├── outlook.tex └── background.tex ├── presentation ├── slides.pdf ├── figures │ ├── delay.pdf │ ├── vision.pdf │ ├── eth_logo.pdf │ ├── nsg_logo.pdf │ ├── questions.pdf │ ├── simple_topo.pdf │ ├── complex_topo.pdf │ ├── architecture_ntt.pdf │ ├── eth-nsg-header.pdf │ ├── queue_profile_A.pdf │ ├── simple_topo_ft.pdf │ ├── finetune_mct_loss_comparison.pdf │ └── finetune_mct_loss_comparison_agg.pdf ├── README.md ├── Makefile └── .gitignore ├── .gitmodules ├── literature ├── README.md └── Literature.html ├── workspace ├── NetworkSimulators │ ├── ns3 │ │ ├── cptodocker.sh │ │ ├── cpfromdocker.sh │ │ ├── dockerns3.sh │ │ ├── tracing.cc │ │ ├── newnet.cc │ │ └── tcpapplication.cc │ └── memento │ │ ├── run_topo_small.sh │ │ ├── run_topo.sh │ │ ├── experiment-tags.h │ │ ├── eval.py │ │ ├── cdf-application.h │ │ └── cdf-application.cc ├── Dockerfile ├── docker-run.sh ├── TransformerModels │ ├── configs │ │ ├── config-linear.yaml │ │ ├── config-lstm.yaml │ │ ├── config-encoder.yaml │ │ ├── config-transformer.yaml │ │ └── config-encoder-test.yaml │ ├── run.sh │ ├── plot_losses.py │ ├── arima.py │ ├── mct_test_plots.py │ └── transformer_delay.py ├── requirements.txt ├── PandasScripts │ ├── csv_gendelays.py │ └── csvhelper_memento.py └── README.md ├── CITATION.cff ├── LICENSE ├── .gitlab-ci.yml ├── README.md ├── .github └── workflows │ └── codeql-analysis.yml └── .gitignore /report/thesis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/thesis.pdf -------------------------------------------------------------------------------- /presentation/slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/slides.pdf -------------------------------------------------------------------------------- /report/figures/mask.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/mask.pdf -------------------------------------------------------------------------------- /report/figures/delay.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay.pdf 
-------------------------------------------------------------------------------- /report/figures/delay2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay2.pdf -------------------------------------------------------------------------------- /report/figures/delay4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay4.pdf -------------------------------------------------------------------------------- /report/figures/vision.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/vision.pdf -------------------------------------------------------------------------------- /report/figures/MCT_loss.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/MCT_loss.pdf -------------------------------------------------------------------------------- /report/figures/overview.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/overview.pdf -------------------------------------------------------------------------------- /presentation/figures/delay.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/delay.pdf -------------------------------------------------------------------------------- /presentation/figures/vision.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/vision.pdf -------------------------------------------------------------------------------- /report/figures/MCT_valloss.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/MCT_valloss.pdf -------------------------------------------------------------------------------- /report/figures/complex_topo.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/complex_topo.pdf -------------------------------------------------------------------------------- /report/figures/declaration.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/declaration.pdf -------------------------------------------------------------------------------- /report/figures/prev_vision.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/prev_vision.pdf -------------------------------------------------------------------------------- /report/figures/simple_topo.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simple_topo.pdf -------------------------------------------------------------------------------- /report/figures/simulation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simulation.pdf -------------------------------------------------------------------------------- /presentation/figures/eth_logo.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/eth_logo.pdf -------------------------------------------------------------------------------- /presentation/figures/nsg_logo.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/nsg_logo.pdf -------------------------------------------------------------------------------- /report/figures/MCT_trainloss.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/MCT_trainloss.pdf -------------------------------------------------------------------------------- /report/figures/eth-nsg-header.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/eth-nsg-header.pdf -------------------------------------------------------------------------------- /report/figures/simple_topo_ft.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simple_topo_ft.pdf -------------------------------------------------------------------------------- /report/figures/slidingwindow.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/slidingwindow.pdf -------------------------------------------------------------------------------- /presentation/figures/questions.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/questions.pdf -------------------------------------------------------------------------------- /presentation/figures/simple_topo.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/simple_topo.pdf -------------------------------------------------------------------------------- /report/figures/architecture_bert.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_bert.pdf -------------------------------------------------------------------------------- /report/figures/architecture_ntt.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_ntt.pdf 
-------------------------------------------------------------------------------- /report/figures/architecture_vit.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_vit.pdf -------------------------------------------------------------------------------- /report/figures/delay_Receiver0.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receiver0.pdf -------------------------------------------------------------------------------- /report/figures/delay_Receiver1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receiver1.pdf -------------------------------------------------------------------------------- /report/figures/delay_Receiver2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receiver2.pdf -------------------------------------------------------------------------------- /report/figures/delay_Receivers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/delay_Receivers.pdf -------------------------------------------------------------------------------- /report/figures/queue_profile_A.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/queue_profile_A.pdf -------------------------------------------------------------------------------- /report/figures/simulation_small.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/simulation_small.pdf -------------------------------------------------------------------------------- /presentation/figures/complex_topo.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/complex_topo.pdf -------------------------------------------------------------------------------- /presentation/figures/architecture_ntt.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/architecture_ntt.pdf -------------------------------------------------------------------------------- /presentation/figures/eth-nsg-header.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/eth-nsg-header.pdf -------------------------------------------------------------------------------- /presentation/figures/queue_profile_A.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/queue_profile_A.pdf 
-------------------------------------------------------------------------------- /presentation/figures/simple_topo_ft.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/simple_topo_ft.pdf -------------------------------------------------------------------------------- /report/figures/SE_trend_arima_10000.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/SE_trend_arima_10000.pdf -------------------------------------------------------------------------------- /report/figures/SE_trend_arima_30000.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/SE_trend_arima_30000.pdf -------------------------------------------------------------------------------- /report/figures/architecture_transformer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/architecture_transformer.pdf -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "workspace/memento-ns3-for-NTT"] 2 | path = workspace/memento-ns3-for-NTT 3 | url = git@github.com:Siddhant-Ray/memento-ns3-for-NTT.git 4 | -------------------------------------------------------------------------------- /report/figures/finetune_mct_loss_comparison.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/finetune_mct_loss_comparison.pdf -------------------------------------------------------------------------------- /literature/README.md: -------------------------------------------------------------------------------- 1 | # Literature 2 | 3 | Most of the relevant literature used for this thesis can be found in the single [```Literature.html```](Literature.html) file. 4 | -------------------------------------------------------------------------------- /report/figures/finetune_mct_loss_comparison_agg.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/report/figures/finetune_mct_loss_comparison_agg.pdf -------------------------------------------------------------------------------- /presentation/figures/finetune_mct_loss_comparison.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/finetune_mct_loss_comparison.pdf -------------------------------------------------------------------------------- /workspace/NetworkSimulators/ns3/cptodocker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | sudo docker cp ../workspace/NetworkSimulators/ns3/. 
467f79c77706:/ns3/scratch # change CONTAINER ID as required 4 | -------------------------------------------------------------------------------- /presentation/figures/finetune_mct_loss_comparison_agg.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Siddhant-Ray/Network-Traffic-Transformer/HEAD/presentation/figures/finetune_mct_loss_comparison_agg.pdf -------------------------------------------------------------------------------- /report/README.md: -------------------------------------------------------------------------------- 1 | # Report 2 | 3 | ## Final thesis PDF: 4 | [Download final compiled thesis](https://gitlab.ethz.ch/nsg/students/projects/2022/ma-2022_packet_transformer/-/jobs/artifacts/main/raw/report/thesis.pdf?job=compile_pdf) -------------------------------------------------------------------------------- /workspace/NetworkSimulators/ns3/cpfromdocker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | sudo docker cp 467f79c77706:/ns3/outputs . # change CONTAINER ID as required 4 | sudo chmod -R 757 outputs 5 | sudo rm -r ../outputs 6 | sudo mv outputs ../ 7 | -------------------------------------------------------------------------------- /presentation/README.md: -------------------------------------------------------------------------------- 1 | # Presentation 2 | 3 | ## Slides for presentation 4 | [`Final slides`](https://gitlab.ethz.ch/nsg/students/projects/2022/ma-2022_packet_transformer/-/jobs/artifacts/main/raw/presentation/slides.pdf?job=compile_slides) 5 | -------------------------------------------------------------------------------- /presentation/Makefile: -------------------------------------------------------------------------------- 1 | 2 | NOTE = !! change the next line to fit your filename; no spaces at file name end !! 3 | FILE = slides 4 | 5 | all: 6 | pdflatex $(FILE) 7 | pdflatex $(FILE) 8 | 9 | clean: 10 | rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls *.nav *.snm 11 | -------------------------------------------------------------------------------- /workspace/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:latest 2 | LABEL maintainer="Siddhant Ray" 3 | WORKDIR /ntt 4 | RUN set -xe \ 5 | && apt-get update \ 6 | && apt-get install python3-pip -y 7 | 8 | COPY requirements.txt requirements.txt 9 | RUN pip install --upgrade pip 10 | RUN pip install -r requirements.txt 11 | 12 | COPY . . -------------------------------------------------------------------------------- /report/Makefile: -------------------------------------------------------------------------------- 1 | 2 | NOTE = !! change the next line to fit your filename; no spaces at file name end !! 
3 | FILE = thesis 4 | 5 | all: 6 | pdflatex $(FILE) 7 | bibtex $(FILE) 8 | pdflatex $(FILE) 9 | pdflatex $(FILE) 10 | 11 | clean: 12 | rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls 13 | -------------------------------------------------------------------------------- /workspace/docker-run.sh: -------------------------------------------------------------------------------- 1 | # https://stackoverflow.com/questions/30543409/how-to-check-if-a-docker-image-with-a-specific-tag-exist-locally 2 | if [[ "$(docker images -q siddhantray/ntt-docker:latest 2> /dev/null)" == "" ]]; then 3 | docker run -it ntt-docker:latest 4 | else 5 | docker pull siddhantray/ntt-docker 6 | docker run -it siddhantray/ntt-docker 7 | fi -------------------------------------------------------------------------------- /workspace/TransformerModels/configs/config-linear.yaml: -------------------------------------------------------------------------------- 1 | config_base = ''' name: base 2 | max_learning_rate: 0.1 3 | weight_decay: 1e-5 4 | learning_rate: 1e-4 5 | dropout: 0.2 6 | num_layers: 6 7 | epochs: 15 8 | batch_size: 64 9 | linear_size: 256 10 | loss_function: huber 11 | eof: eof_str''' -------------------------------------------------------------------------------- /workspace/TransformerModels/configs/config-lstm.yaml: -------------------------------------------------------------------------------- 1 | config_base = ''' name: base 2 | max_learning_rate: 0.1 3 | weight_decay: 1e-5 4 | learning_rate: 1e-4 5 | dropout: 0.2 6 | num_layers: 6 7 | epochs: 15 8 | batch_size: 64 9 | linear_size: 256 10 | loss_function: huber 11 | eof: eof_str''' -------------------------------------------------------------------------------- /workspace/TransformerModels/configs/config-encoder.yaml: -------------------------------------------------------------------------------- 1 | config_base = ''' name: encoder 2 | max_learning_rate: 0.1 3 | weight_decay: 1e-5 4 | learning_rate: 1e-4 5 | dropout: 0.2 6 | num_heads: 8 7 | num_layers: 6 8 | epochs: 15 9 | batch_size: 64 10 | linear_size: 640 11 | loss_function: huber 12 | eof: eof_str''' -------------------------------------------------------------------------------- /workspace/TransformerModels/configs/config-transformer.yaml: -------------------------------------------------------------------------------- 1 | config_base = ''' name: base 2 | max_learning_rate: 0.1 3 | weight_decay: 1e-5 4 | learning_rate: 1e-4 5 | dropout: 0.2 6 | num_heads: 8 7 | num_layers: 6 8 | epochs: 20 9 | batch_size: 64 10 | linear_size: 384 11 | loss_function: huber 12 | eof: eof_str''' -------------------------------------------------------------------------------- /workspace/TransformerModels/configs/config-encoder-test.yaml: -------------------------------------------------------------------------------- 1 | config_base = ''' name: encoder 2 | max_learning_rate: 0.1 3 | weight_decay: 1e-5 4 | learning_rate: 1e-4 5 | dropout: 0.2 6 | num_heads: 8 7 | num_layers: 6 8 | epochs: 15 9 | batch_size: 64 10 | linear_size: 120 11 | loss_function: huber 12 | eof: eof_str''' -------------------------------------------------------------------------------- /workspace/NetworkSimulators/ns3/dockerns3.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [ "$1" == "fetch" ] 4 | then 5 | sudo docker pull notspecial/ns-3-dev 6 | sudo docker run -i -t notspecial/ns-3-dev 7 | elif [ "$1" == "newcontainer" ] 8 | then 9 | sudo docker 
run -i -t notspecial/ns-3-dev 10 | elif [ "$1" == "shell" ] 11 | then 12 | sudo docker exec -it 467f79c77706 bash # change CONTAINER ID as required 13 | else 14 | sudo docker start -ai 467f79c77706 # change CONTAINER ID as required 15 | fi 16 | -------------------------------------------------------------------------------- /CITATION.cff: -------------------------------------------------------------------------------- 1 | # This CITATION.cff file was generated with cffinit. 2 | # Visit https://bit.ly/cffinit to generate yours today! 3 | 4 | cff-version: 1.2.0 5 | title: Network Traffic Transformer 6 | message: >- 7 | If you use this software, please cite it using the 8 | metadata from this file. 9 | type: software 10 | authors: 11 | - given-names: Siddhant 12 | family-names: Ray 13 | affiliation: ETH Zurich 14 | orcid: 'https://orcid.org/0000-0003-0265-2144' 15 | email: siddhant.r98@gmail.com 16 | - given-names: Alexander 17 | family-names: Dietmüller 18 | affiliation: ETH Zurich 19 | orcid: 'https://orcid.org/0000-0003-3769-3958' 20 | email: adietmue@ethz.ch 21 | keywords: 22 | - Transformer 23 | - Packet-level modelling 24 | license: MIT 25 | version: '1.0' 26 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Siddhant Ray 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /workspace/TransformerModels/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #SBATCH --output=./%j.out 3 | #SBATCH --error=./%j.err 4 | #SBATCH --cpus-per-task=4 5 | #SBATCH --gres=gpu:1 6 | #SBATCH --mem=80G 7 | set -o errexit # Exit on errors 8 | 9 | # Activate the correct venv. 
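# Assumes a pyenv-virtualenv environment named "venv" already exists, with the
# packages from workspace/requirements.txt installed; adjust the name to match your setup.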
10 | eval "$(pyenv init -)" 11 | eval "$(pyenv virtualenv-init -)" 12 | pyenv activate venv 13 | 14 | echo "Running on node: $(hostname)" 15 | echo "In directory: $(pwd)" 16 | echo "Starting on: $(date)" 17 | echo "SLURM_JOB_ID: ${SLURM_JOB_ID}" 18 | 19 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay.py 20 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay_varmask_chooseagglevel_multi.py 21 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay_varmask_chooseencodelem.py 22 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python encoder_delay_varmask_chooseencodelem_multi.py 23 | 24 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_encoder.py 25 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_encoder_multi.py 26 | 27 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_mct.py 28 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python finetune_mct_multi.py 29 | 30 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python lstm.py 31 | # NCCL_DEBUG=INFO; NCCL_DEBUG_SUBSYS=ALL python arima.py --run true 32 | 33 | -------------------------------------------------------------------------------- /report/abstract.tex: -------------------------------------------------------------------------------- 1 | \clearpage 2 | \null 3 | \vfil 4 | \thispagestyle{plain} 5 | \begin{center}\textbf{Abstract}\end{center} 6 | 7 | Learning underlying network dynamics from packet-level data has been deemed an extremely difficult task, to the point that it is practically not attempted. While research has shown that machine learning (ML) models can learn behaviour and improve on specific tasks in the networking domain, these models do not generalize to other tasks. However, a new ML model, the \emph{Transformer}, has shown massive generalization abilities in several fields: pre-trained on large datasets in a task-agnostic fashion and fine-tuned on smaller datasets for task-specific applications, it has become the state-of-the-art architecture for generalization in machine learning. We present a new Transformer architecture adapted to the networking domain, the Network Traffic Transformer (NTT), which is designed to learn network dynamics from packet traces. We pre-train our NTT to learn fundamental network dynamics and then leverage this learnt behaviour to fine-tune to specific network applications quickly and efficiently. By learning such dynamics, the NTT can be used to make more network-aware decisions across applications, improve upon them, and make the networks of tomorrow more efficient and reliable. 
8 | 9 | 10 | \vfil 11 | \clearpage 12 | -------------------------------------------------------------------------------- /workspace/NetworkSimulators/memento/run_topo_small.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Original author: Siddhant Ray 4 | 5 | ## First in arguments is the topology 6 | ## Second specifies different congestion for different receivers (default is 0) 7 | ## Third specifies the seed for the random number generator (change for multiple runs) 8 | 9 | ## Current setup generates fine-tuning data with 2 bottlenecks, $2 is the second bottleneck rate (!=0) 10 | ## To generate pre-training data with only one bottleneck, replace --prefix with 11 | ## --prefix=results/small_test_no_disturbance_with_message_ids$3 and pass $2 as 0 12 | 13 | # // Network topology 1 14 | # // 15 | # // disturbance1 16 | # // | 17 | # // 3x n_apps(senders) --- switchA --- switchB --- receiver1 18 | # // 19 | 20 | # If running inside the VSCode's environment to run Docker containers: Replace ./docker-run.sh waf with just waf 21 | 22 | mkdir -p results 23 | ./docker-run.sh waf --run "trafficgen_small 24 | --topo=$1 25 | --apps=20 26 | --apprate=1Mbps 27 | --startwindow=50 28 | --queuesize=1000p 29 | --linkrate=30Mbps 30 | --congestion1=$2Mbps 31 | --prefix=results/small_test_one_disturbance_with_message_ids$3 32 | --seed=$3" 33 | 34 | -------------------------------------------------------------------------------- /report/figures/simple_topo.drawio: -------------------------------------------------------------------------------- 1 | 7Vldb9owFP01SNsDU74Lj3x0qzRRTWNSn03iBKsmzhxToL9+14mTkDistE0pqC1SiY+da+ece+17Q8+erLY/OEqWMxZg2rOMYNuzpz3LGjoe/JfALgc8x82BiJMgh8wKmJNHrEBDoWsS4LQ2UDBGBUnqoM/iGPuihiHO2aY+LGS0PmuCIqwBcx9RHb0jgVjm6MA1KvwGk2hZzGwaqmeFisEKSJcoYJs9yL7u2RPOmMivVtsJppK7gpf8vu8HesuFcRyLY26I+OB21p+ECD+Ob+ZhdMd/zvrKygOia/XAarFiVzAAVoBsaIw3SyLwPEG+7NmA3IAtxYpCy4RLlCa5AiHZYph0rGxjLvD24KLNkgpwIcxWWPAdDFE3WI5ib9dobyoxCmi5p0OBISV/VFquGIILRdIzCLPOnDDHODPC7CMIi4ORDFVo+RSlKfHrPHG2jgNJ0NQoScKBFrdPUrRHgdtCQYFxTJEgD3XzbbyoGX4xAhNXLjtoKOAOv7l1Iylbcx+r+/ZDtmFKE9NrqCQQj7DQDGVClQ/+cu2cFu08CnSNQ5at02eU8azH+7uWGxmoYzuB/OxDXiS/bVi8MVvAxm0ZlMT3hS1YWm4uH6a5BwSDaAQOJVEsvQUUxzD/WIYMgU17pDpWJAjk7WOOU/KIFpkp6TyJpCojzx333Km0tRYszY8daToVnN3jiXqumMXSSkgobUBd7G3DA+ru+emwxU/ttwpV96OFqtM8XV4eqq75vqHqfTTtzKvOtNOSjBNrd/W5zb7hNqsF+Xtvs4NO5fY+5a7JbbpnJrfZVjLU9K5UKcSVHf2cvxEMMJ1km0veJv4WWmgla4t4kSZ5b2Z+wSv1Xz1hu1ca2Z++sDkcNpgf9MULdlG54klJhe1M5acj123mEJbuulf/OVG7d129ePuNfQyHNr8csdQ7JdPRxVPe280x0yz7bFcTb3DSfaetentVSvg8Ns83dXSsulTmsKHB0fW5WeaczdctJ0odzbaa7TN37Cx3bHiKZehBfdpkQq/zJpylaf8PR2EI4XsRypxwW7YbZWJZNu4pWL6zP42ET5Z7T6dnlkrPjAXy76Nsl+6rQJf9PFp8sVxwVVihYcmnkxem4X5tzSJvWU/+MsIlsRz3BUckJnF00cF/ypPfO8LFrG5cDJrVL0b5IVL97GZf/wM= -------------------------------------------------------------------------------- /report/figures/mask.drawio: -------------------------------------------------------------------------------- 1 | 
5ZpNc5swEIb/jY/tAAKCj7GTpj11xjNO2ulRARk0BYsKEdv59V0h8Y1jJ8Z2EusCelmxkt5HC1rhCZotNncCJ+FP5hM6sQx/M0E3E8tyLQ+uUtjmwrWNciEQxM8lsxIW5JloUNdoSnySNgwlY1SSpCl6bLkknmxomAu2bpotGW16TXBAOsLCw7Sr/qa+DHPVtY1K/0FIEBaeTUPfWeDCWAtpiH22rknodoJmgjGZny02M0LV3BXzkj/3fcfdsmOCLOUhD1h2NP9x92NJvKfZ3Xz5S/z6aX3RrTxhmukB68bKbTEDgoMgSxaNsQxDSclige9V6Rp4gxbKBYWSCadLRumMUSaye5FvLJeeB3oqBftLajfu/4Bul70sOsONLXxaRZIJxpY5TvNG5fnE2IJIsS3USjnZUTspGrgeVEiqH/tUImToGyNOlRMWZe/U2An9YKr3i+Sn/EvzJsXFLW3kY8ovpCz5k4QBBjHWpUJGMmIBW2J6X7PeNMVC1tnBmJLNHzLG1ueOpZJVxUMTKv6ofpc3qvCb7mFq5Q7OC1tVW7JUeGRPp+GopFgGRO5JGnZS6pkqQMZ5wkLyTB50Ig0oQyKfmWd3cYIkMYaR/9gcySFYbRZXV1kFKCJFRyFtooSim6Dg+LFSLWEJTB7ni/3QiuRRDWrczgfaiqLxEthOwRq/ITtINU6RGs/qjrvXwbZbkGkb6RQue+5OO7+4YxdOXTRQtSgmutP5P+OFxnbRq9XSsO2xmg+kEw7xh2XSaaujMqUJxssTrWvPpK7rDF8J/3xzPR74bn7AHB6ynfDafBue5NucFvl72Fy6Bsl2tRNd6u2udRqgTlOpI5ObJ8noyZcGZmB2VsBC5g5egBM8PTTzDdLOH3Zay0dxBU1vFvuX1k94o/lXTLi5qfKrz34hOu8XNH3eF68QQb2S7H/WprA+tPl6M2FT5ncaT4/F13AfPD7ovPNe3cHc51cypjtEs7+Pea8lXMiqmniNAdTxNhcOjWEmqFUjc4eGE+fmG4jfmuotTpofoZNuoc87BXWCWTUys286cbqmESgMIaIvQjkRb8N2wx7YomEviWLnIMFdZL3V7rNB8ciyu4KET08BPSioydPJC81wf7ZmobdsJ38Z4ZJYjvOCIxKTOLho4T/mye8d4WKUA2eD5uo3o/wRqX12sy//Aw== -------------------------------------------------------------------------------- /workspace/NetworkSimulators/ns3/tracing.cc: -------------------------------------------------------------------------------- 1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */ 2 | 3 | #include "ns3/object.h" 4 | #include "ns3/uinteger.h" 5 | #include "ns3/traced-value.h" 6 | #include "ns3/trace-source-accessor.h" 7 | 8 | #include "ns3/core-module.h" 9 | 10 | #include <iostream> 11 | 12 | using namespace ns3; 13 | 14 | // Probably should avoid using namespace std; 15 | using namespace std; 16 | 17 | class TraceTest : public Object 18 | { 19 | public: 20 | 21 | static TypeId GetTypeId (void) 22 | { 23 | static TypeId tid = TypeId("TraceTest") 24 | .SetParent(Object::GetTypeId()) 25 | .SetGroupName("Tracing") 26 | .AddConstructor<TraceTest> () 27 | .AddTraceSource("Integer value", 28 | "An integer value to trace.", 29 | MakeTraceSourceAccessor(&TraceTest::val_Int), 30 | "ns3::TracedValueCallback::Int32") 31 | ; 32 | return tid; 33 | } 34 | 35 | TraceTest() {} 36 | TracedValue<int32_t> val_Int; 37 | }; 38 | 39 | void IntTrace (int32_t oldValue, int32_t newValue) 40 | { 41 | std::cout<<"Traced "<<oldValue<<" to "<<newValue<<std::endl; 42 | } 43 | 44 | int 45 | main (int argc, char *argv[]) 46 | { 47 | Ptr<TraceTest> traceTest = CreateObject<TraceTest> (); 48 | traceTest->TraceConnectWithoutContext("Integer value", MakeCallback(&IntTrace)); 49 | 50 | traceTest->val_Int = 1000; 51 | return 0; 52 | } 53 | 54 | -------------------------------------------------------------------------------- /report/figures/slidingwindow.drawio: -------------------------------------------------------------------------------- 1 | 
5ZvPk5owFMf/Go/dASKox8pu28N2yswe9hwhClMgDsTV7V/fRAIKL1bXgeg2e9iBB3nC95OXHy9khPxs973A6/gnjUg6cqxoN0KPI8eZjT3+XxjeK4M7nVaGVZFElck+GF6SP0QaLWndJBEpWzcySlOWrNvGkOY5CVnLhouCbtu3LWna/tU1XhFgeAlxCq2vScTiyjp1rYP9B0lWcf3LtiWvZLi+WRrKGEd0e2RCTyPkF5Sy6ijb+SQV2tW6VOW+nbjaPFhBcnZJgV+pmzw+fyfZrzlZPI3p9uU1+yK9vOF0I19YPix7rxUo6CaPiHBijdB8GyeMvKxxKK5uOXJui1mW8jObHy6TNPVpSot9WbRcLp0w5PaSFfQ3OboSeQvP9ZortbZclbl8JFIwsjv5rnajIK95hGaEFe/8lrqAJ0WXtc4Zy/PtgaFb3xMf8avvw7LarBrXB2X5gRT3A0I754UmefRV1Fh+ltOctIVtU6jKkgjU3bMCHQugeP/aVpAUs+St7V4livyFgCb8hxv9Hauj/7Sja0k3RUhkqeNKe85RFxDDxYow4GjPqHnt67Ehs7BN+8LWdaQZ29gobMjpCRtwpBmbaxa2bid1NbZTvZ0mbJ5R2MZ99W3AkWZsE7OwodmD2xM46EozuqlZ6PoalgBHmrHNjMLm9jUsAY40Y6tzEaZw62tcAhzp5nZBduM/4ub1NTABjnRzc8ziNu6LW9eRbm5mZUu8voYlwJFubmbNu52+4g040s3tgmnAAY39sey9pBzhMm6K68nV18Oto9rQJKaOq0Oz1NJ/sv6CAd/AwoIFE9+3+F8/koOsunOh5N1I6U9xmLENLKA5f2PWFhanySrnxyF/c8KVmgtdkhCnX+WFLIkiUXxekDL5gxd7V0LGtYjK/Wu485H7KHxtGC2rBUgbEJDUFCCHCAFXwUOBwxkMB+wRAtsYHCA8bo4DZliDmTE4wPT/5jhg5jSYGoMDJNFujgMOg4KJOTi6g9Cb44BZzsAzB8e99R0IjmUD1xgcYGn05jhgTjIYm4Pj3rpy5EAcyBgc4DOdm+OAGcQATr3/Wxz31pUjxazcNic8vIl7lsdYKw/VtNyc+PDQvfFQzMttc/Ik7uzeeCgm5nU22QQe7r3xuOo7pVZm/OQq06m1iPtdd3I7U8NZ8xXZhz+smJ3zNPRn8XCO/5rkYnsLfwS2B4wLZk7gdXnAuGsWq7QEXh3legLv3FrV5wlJ2+ktJhWuht70AFMLyqDkz8nBWCaNU7poFPGpWu4cLj5h2gHQ0L3ALELW9wfKgiKouHKB2RtM8au+TeqtReyI+4laxC6Rq9vDrqOhW0OYu/hXa2jOCgUAo5glaG4NL/j+7Ez7V8Z4La5ku5XYhv6wwGUSPhQkZD5OU7oRJPd70pHAsd+Mbrv8aFEF8vMiVYKQkWvBKL+k0RUnAWa8ruR7i+hz+2leJ22EE0hQ2UbY6OMI+elhj3oVnIeN/ujpLw== -------------------------------------------------------------------------------- /workspace/requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==1.0.0 2 | aiohttp==3.8.1 3 | aiosignal==1.2.0 4 | astunparse==1.6.3 5 | async-timeout==4.0.2 6 | attrs==21.4.0 7 | black==22.8.0 8 | cachetools==5.0.0 9 | certifi==2022.12.7 10 | charset-normalizer==2.0.12 11 | click==8.1.3 12 | colorama==0.4.5 13 | cycler==0.11.0 14 | docformatter==1.5.0 15 | einops==0.4.1 16 | flatbuffers==1.12 17 | fonttools==4.33.3 18 | frozenlist==1.3.0 19 | fsspec==2022.3.0 20 | future==0.18.3 21 | gast==0.4.0 22 | google-auth==2.6.6 23 | google-auth-oauthlib==0.4.6 24 | google-pasta==0.2.0 25 | grpcio==1.44.0 26 | h5py==3.7.0 27 | idna==3.3 28 | importlib-metadata==4.11.3 29 | isort==5.10.1 30 | joblib==1.2.0 31 | keras==2.9.0 32 | Keras-Preprocessing==1.1.2 33 | kiwisolver==1.4.3 34 | libclang==14.0.1 35 | mando==0.6.4 36 | Markdown==3.3.6 37 | matplotlib==3.5.2 38 | multidict==6.0.2 39 | mypy-extensions==0.4.3 40 | numpy==1.22.3 41 | oauthlib==3.2.1 42 | opt-einsum==3.3.0 43 | packaging==21.3 44 | pandas==1.4.2 45 | pathspec==0.10.1 46 | Pillow==9.3.0 47 | platformdirs==2.5.2 48 | protobuf==3.19.5 49 | pyasn1==0.4.8 50 | pyasn1-modules==0.2.8 51 | pyDeprecate==0.3.2 52 | pyparsing==3.0.8 53 | python-dateutil==2.8.2 54 | pytorch-lightning==1.6.1 55 | pytz==2022.1 56 | PyYAML==6.0 57 | radon==5.1.0 58 | requests==2.27.1 59 | requests-oauthlib==1.3.1 60 | rsa==4.8 61 | scikit-learn==1.0.2 62 | scipy==1.8.1 63 | seaborn==0.11.2 64 | six==1.16.0 65 | tbparse==0.0.6 66 | tensorboard==2.9.1 67 | tensorboard-data-server==0.6.1 68 | tensorboard-plugin-wit==1.8.1 69 | tensorflow==2.11.1 70 | tensorflow-estimator==2.9.0 71 | tensorflow-io-gcs-filesystem==0.26.0 72 | termcolor==1.1.0 73 | threadpoolctl==3.1.0 74 | tomli==2.0.1 75 | torch==1.13.1 76 | torchmetrics==0.8.0 77 | tqdm==4.64.0 78 | typing_extensions==4.1.1 79 | untokenize==0.1.1 80 | urllib3==1.26.9 81 | Werkzeug==2.2.3 82 | wrapt==1.14.1 83 | 
yarl==1.7.2 84 | zipp==3.8.0 85 | -------------------------------------------------------------------------------- /workspace/NetworkSimulators/memento/run_topo.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Original author: Siddhant Ray 4 | 5 | ## The first argument is the topology 6 | ## The second to fourth specify the congestion for the different receivers (default is 0) 7 | ## The last argument sets the seed for the random number generator (change for multiple runs). 8 | 9 | ## Current setup generates fine-tuning data with 4 bottlenecks, $2 is the second bottleneck rate 10 | ## $3 is the third bottleneck rate, $4 is the fourth bottleneck rate (all !=0) 11 | 12 | ## Topology is 13 | # // Network topology 2 14 | # // 15 | # // disturbance1 16 | # // | 17 | # // 3x n_apps(senders) --- switchA --- switchB --- receiver1 18 | # // | 19 | # // | 20 | # // | disturbance2 21 | # // | | 22 | # // switchC --- switchD --- receiver2 23 | # // | 24 | # // | 25 | # // | disturbance3 26 | # // | | 27 | # // switchE --- switchF --- switchG--receiver3 28 | # // 29 | 30 | # If running inside VSCode's environment to run Docker containers: Replace ./docker-run.sh waf with just waf 31 | 32 | mkdir -p results 33 | ./docker-run.sh waf --run "trafficgen 34 | --topo=$1 35 | --apps=20 36 | --apprate=1Mbps 37 | --startwindow=50 38 | --queuesize=1000p 39 | --linkrate=30Mbps 40 | --congestion1=$2Mbps 41 | --congestion2=$3Mbps 42 | --congestion3=$4Mbps 43 | --prefix=results/large_test_disturbance_with_message_ids$5 44 | --seed=$5" 45 | 46 | -------------------------------------------------------------------------------- /report/summary.tex: -------------------------------------------------------------------------------- 1 | \chapter{Summary} 2 | \label{cha:summary} 3 | 4 | We have seen that learning fundamental dynamics in networks is not a trivial problem. Even on the simplest network topologies, the interactions of traffic arising from multiple sources can create complex patterns in the overall traffic. However, amidst these complexities, there exist some underlying patterns which can be captured and learnt effectively, given the right model for the task. Measurements taken on this traffic over a span of time show that not all of it is truly random; there is some structure which can be learnt, and by doing so, performance in networks as a whole can be improved. 5 | 6 | In this thesis, we present the NTT, a new model which takes the first steps towards learning fundamental dynamics from network traffic. Based on state-of-the-art techniques for learning on sequential data in other fields, we present our Transformer-based architecture, which is trained to learn similar sequential information in network traffic data. Of course, doing so even in our setup is quite challenging, and there is clear scope for a huge amount of improvement. We feel that this approach opens up a plethora of new research questions and directions in the field of learning for networks. Over the course of this project, we explore several possible methods of pre-training on traffic traces, which we then evaluate and compare in order to better understand the possibilities and limits of learning the network dynamics. 
Based on our findings, we conclude that learning these network dynamics is definitely possible. We acknowledge that a lot is still unknown about deep learning on such data, but at the same time, we realise that given our current NTT architecture, we are in a much better position to decide on new directions to proceed in. 7 | 8 | We hope that our initial work in this direction motivates the networking community to explore the vast possibilities of taking this domain a step further, and to work together on building smarter and better-learnt models, to improve performance and efficiency in the networks of tomorrow. -------------------------------------------------------------------------------- /report/figures/complex_topo.drawio: -------------------------------------------------------------------------------- 1 | 7Vxtk6I4EP41Vu19cAtIQP046t7O1tXcbc1c3d5+RIhIDRIv4Kjz6y+RoJDE8Y343vM+XRC2/LoUzHJQnRfVGZ+P/DXtJRhMcxdnEa+8UA75sdM5BCcvbUnMB/XfR+5fJ+O9G4XJlgqL7khx0/Tt/4o8tj78GUavYqvX+x+YeUbTZbFCxeLlZutB4CNIFsWTbCKY8nYdEF95bsC3gGL5TwBVhJ4pGmyZJiSxCuaJEOecJGNtcLQ53YQAJ5Kwd/YQU/g2G7odXtKfoUZE1myA0JuC/t4iGZFoVh8YHyOpdhUTcoLrqyYXLSgJDP3e4ig5FScAyxSAEXcRbvZ9tTCQ8HuFUzbDabtBtNg9SHiaUE2j1JW5bUbw9AXOUr07dSCkKsnCaDXA2RQg4DSNcAo+ySBnD5V3fcmokq2Z0HGuPIFwNQ8ANymZcCCpfzJ9mou2w2O7I4jCjNEvLGGQ8qm/QZull7c7ANx6zqamhvUi5vTFreuo6m5WVpxK+V9KzfF0dTcbL24NTcYF3N7a6UyFTdHK27WrfYlWK4s3DQ3PFZfnQqnurFzdKLW+7o93JTHE3N7V4vbq19ieJoam6uXtxa+xLF0dTcvArr4dH+J6Ipz2EQo3kKNgvTjIcRxD8XQRwYXBHMeRDbvEOG5JwgppGHKjRudim2rT/U/UOSDzutGc67Uu1n7ulN9pisyw7pQ2o2gzzPbrNkNl5/fY4+h3flaDoW9IrlzjtOqZV/r1cOhRfbDx3eGSNpy1+QqkXfuKzHeq8rnkiUqUvDQTdwxttMOO8GmWBWg8zsG06cxjcShSHE9EUoT8TbtN1yBrZo2Eui2NlJcBfZYLX9DAofNfduZZcXaS+RWPz7cDql+8HfpZnnsRd6o2zp0IHfYhl60J/XmtVhIX4voVcaLvarWMYJcHtAC27+Ag== -------------------------------------------------------------------------------- /.gitlab-ci.yml: -------------------------------------------------------------------------------- 1 | # This file is a template, and might need editing before it works on your project. 2 | # To contribute improvements to CI/CD templates, please follow the Development guide at: 3 | # https://docs.gitlab.com/ee/development/cicd/templates.html 4 | # This specific template is located at: 5 | # https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Getting-Started.gitlab-ci.yml 6 | 7 | # This is a sample GitLab CI/CD configuration file that should run without any modifications. 8 | # It demonstrates a basic 3 stage CI/CD pipeline. Instead of real tests or scripts, 9 | # it uses echo commands to simulate the pipeline execution. 
10 | # 11 | # A pipeline is composed of independent jobs that run scripts, grouped into stages. 12 | # Stages run in sequential order, but jobs within stages run in parallel. 13 | # 14 | # For more information, see: https://docs.gitlab.com/ee/ci/yaml/index.html#stages 15 | 16 | compile_pdf: 17 | stage: build 18 | image: texlive/texlive # use a Docker image for LaTeX from https://hub.docker.com/ 19 | script: 20 | - cd report/ 21 | - rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls 22 | - pdflatex thesis 23 | - bibtex thesis 24 | - pdflatex thesis 25 | - pdflatex thesis # build the pdf just as you would on your computer 26 | artifacts: 27 | paths: 28 | - ./report/thesis.pdf # instruct GitLab to keep the thesis.pdf file 29 | 30 | compile_slides: 31 | stage: build 32 | image: texlive/texlive 33 | script: 34 | - cd presentation/ 35 | - rm -f *.dvi *.log *.aux *.bbl *.blg *.toc *.lof *.lot *.cb *.~ *.out *.fdb_latexmk *.fls *.nav *.snm 36 | - pdflatex slides 37 | - pdflatex slides # build the pdf just as you would on your computer 38 | artifacts: 39 | paths: 40 | - ./presentation/slides.pdf # instruct GitLab to keep the slides.pdf file 41 | 42 | 43 | pages: 44 | stage: deploy 45 | script: 46 | - mkdir test_report # create a folder called public 47 | - cp report/thesis.pdf test_report # copy the pdf file into the public folder 48 | - mkdir test_slides 49 | - cp presentation/slides.pdf test_slides 50 | artifacts: 51 | paths: 52 | - test_report # instruct GitLab to keep the public folder 53 | - test_slides 54 | only: 55 | - main # deploy the pdf only for commits made to the main branch -------------------------------------------------------------------------------- /report/thesis.tex: -------------------------------------------------------------------------------- 1 | \documentclass[11pt,oneside]{book} 2 | \usepackage{graphicx} 3 | \usepackage{booktabs} 4 | \usepackage{caption} 5 | \usepackage{subcaption} 6 | \usepackage{amsmath} 7 | \usepackage{amsfonts} 8 | \usepackage{amssymb} 9 | \usepackage{lscape} 10 | \usepackage{psfrag} 11 | \usepackage[usenames]{color} 12 | \usepackage{bbm} 13 | \usepackage[update]{epstopdf} 14 | \usepackage[bookmarks,pdfstartview=FitH,a4paper,pdfborder={0 0 0}]{hyperref} 15 | \usepackage{verbatim} 16 | \usepackage{listings} 17 | \usepackage{textcomp} 18 | \usepackage{fancyhdr} 19 | \usepackage{multirow} 20 | \usepackage{tikz} 21 | \usepackage{lipsum} 22 | \usepackage{xcolor} 23 | \usepackage[margin=1in]{geometry} 24 | \newcommand{\hint}[1]{{\color{blue} \em #1}} 25 | 26 | \usepackage{xspace} 27 | \newcommand*{\eg}{e.g.\@\xspace} 28 | \newcommand*{\ie}{i.e.\@\xspace} 29 | 30 | \usepackage{xcolor, soul} 31 | \usepackage[shortlabels]{enumitem} 32 | 33 | \definecolor{GoodBlue}{rgb}{0.6640625,0.8203125,1} 34 | \sethlcolor{GoodBlue} 35 | 36 | \newcommand{\smallindent}{\hphantom{N}} 37 | %\usepackage{appendix} 38 | %\usepackage[title]{appendix} 39 | \usepackage[titletoc]{appendix} 40 | 41 | \begin{document} 42 | 43 | % Update the Thesis information below! 44 | \begin{titlepage} 45 | \centering 46 | \includegraphics[width=\textwidth]{figures/eth-nsg-header}\\[60mm] 47 | % 48 | {\Huge\bf\sf{ 49 | Advancing Packet-Level Traffic Predictions \\ 50 | with Transformers %\\[5mm] 51 | % Second Line of Thesis Title 52 | }}\\[10mm] 53 | {\Large\bf\sf Master Thesis}\\[3mm] 54 | % 55 | {\Large\bf\sf Author: Siddhant Ray } \\[5mm] 56 | {\sf Tutors: Alexander Dietmüller, Dr. Romain Jacob}\\[5mm] 57 | {\sf Supervisor: Prof. Dr. 
Laurent Vanbever}\\[30mm] 58 | % 59 | {\sf February 2022 to August 2022} 60 | \end{titlepage} 61 | 62 | \thispagestyle{empty} 63 | \newpage 64 | \pagenumbering{roman} 65 | \include{abstract} 66 | 67 | \clearpage 68 | \setcounter{tocdepth}{2} 69 | \tableofcontents 70 | \clearpage 71 | 72 | \pagenumbering{arabic} 73 | 74 | \include{introduction} 75 | \include{background} 76 | \include{design} 77 | \include{evaluation} 78 | \include{outlook} 79 | \include{summary} 80 | 81 | \clearpage 82 | 83 | \addcontentsline{toc}{chapter}{References} 84 | 85 | \bibliographystyle{acm} 86 | \bibliography{refs} 87 | 88 | \clearpage 89 | \begin{appendices} 90 | 91 | \pagenumbering{Roman} 92 | 93 | \include{appendix} 94 | \end{appendices} 95 | 96 | \end{document} 97 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Network Traffic Transformer (NTT) 2 | 3 | This work was undertaken as part of my master thesis at ETH Zurich, from Feb 2022 to Aug 2022, titled `Advancing packet-level traffic predictions with Transformers`. We present a new transformer-based architecture to learn network dynamics from packet traces. 4 | 5 | We design a `pre-training` phase, in which we learn fundamental network dynamics. Following this, we have a `fine-tuning` phase on different network tasks, and demonstrate that pre-training well leads to generalization across multiple fine-tuning tasks. 6 | 7 | ### Original proposal: 8 | * [`Project proposal`](https://nsg.ee.ethz.ch/fileadmin/user_upload/thesis_proposal_packet_transformer.pdf) 9 | 10 | ### Supervisors: 11 | * [`Alexander Dietmüller`](https://nsg.ee.ethz.ch/people/alexander-dietmueller/) 12 | * [`Dr. Romain Jacob`](https://nsg.ee.ethz.ch/people/romain-jacob/) 13 | * [`Prof. Dr. Laurent Vanbever`](https://nsg.ee.ethz.ch/people/laurent-vanbever/) 14 | 15 | ### Research Lab: 16 | * [`Networked Systems Group, ETH Zurich`](https://nsg.ee.ethz.ch/home/) 17 | 18 | ### We redirect you to the following sections for further details. 19 | 20 | * [`Code and reproducing instructions:`](workspace/README.md) 21 | 22 | * [`Thesis TeX and PDF files`](report/) 23 | 24 | * [`Literature files`](literature/) 25 | 26 | * [`Slides TeX and PDF files`](presentation/) 27 | 28 | ### NOTE 1: 29 | The experiments conducted in this project are very involved. Understanding and reproducing them from the code and comments alone will be quite hard, in spite of the instructions mentioned in the given [`README`](workspace/README.md). For a more detailed understanding, we invite you to read the thesis ([`direct link`](report/thesis.pdf)). You can also check out an overview in the presentation slides ([`direct link`](presentation/slides.pdf)). 30 | 31 | For any further questions or to discuss related research ideas, please feel free to contact me by [`email`](mailto:siddhant.r98@gmail.com). 32 | 33 | ### NOTE 2: 34 | Some results from the thesis have been written up as a paper titled ```A new hope for network model generalization```, which has been accepted for presentation at [ACM HotNets 2022](https://conferences.sigcomm.org/hotnets/2022/). The paper is now online and open access; it can be accessed in the ACM 35 | Digital Library via this [link](https://dl.acm.org/doi/abs/10.1145/3563766.3564104); the DOI is 10.1145/3563766.3564104. 36 | 37 | ### NOTE 3: 38 | The thesis has now been published in the ETH Research Collection, which is open access. 
It can be accessed from here ([`direct link`](https://www.research-collection.ethz.ch/handle/20.500.11850/569234)). 39 | 40 | -------------------------------------------------------------------------------- /.github/workflows/codeql-analysis.yml: -------------------------------------------------------------------------------- 1 | # For most projects, this workflow file will not need changing; you simply need 2 | # to commit it to your repository. 3 | # 4 | # You may wish to alter this file to override the set of languages analyzed, 5 | # or to provide custom queries or build logic. 6 | # 7 | # ******** NOTE ******** 8 | # We have attempted to detect the languages in your repository. Please check 9 | # the `language` matrix defined below to confirm you have the correct set of 10 | # supported CodeQL languages. 11 | # 12 | name: "CodeQL" 13 | 14 | on: 15 | push: 16 | branches: [ "main" ] 17 | pull_request: 18 | # The branches below must be a subset of the branches above 19 | branches: [ "main" ] 20 | schedule: 21 | - cron: '18 11 * * 5' 22 | 23 | jobs: 24 | analyze: 25 | name: Analyze 26 | runs-on: ubuntu-latest 27 | permissions: 28 | actions: read 29 | contents: read 30 | security-events: write 31 | 32 | strategy: 33 | fail-fast: false 34 | matrix: 35 | language: [ 'python' ] 36 | # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ] 37 | # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support 38 | 39 | steps: 40 | - name: Checkout repository 41 | uses: actions/checkout@v3 42 | 43 | # Initializes the CodeQL tools for scanning. 44 | - name: Initialize CodeQL 45 | uses: github/codeql-action/init@v2 46 | with: 47 | languages: ${{ matrix.language }} 48 | # If you wish to specify custom queries, you can do so here or in a config file. 49 | # By default, queries listed here will override any specified in a config file. 50 | # Prefix the list here with "+" to use these queries and those in the config file. 51 | 52 | # Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs 53 | # queries: security-extended,security-and-quality 54 | 55 | 56 | # Autobuild attempts to build any compiled languages (C/C++, C#, or Java). 57 | # If this step fails, then you should remove it and run the build manually (see below) 58 | - name: Autobuild 59 | uses: github/codeql-action/autobuild@v2 60 | 61 | # ℹ️ Command-line programs to run using the OS shell. 62 | # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun 63 | 64 | # If the Autobuild fails above, remove it and uncomment the following three lines. 65 | # modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance. 
66 | 67 | # - run: | 68 | #   echo "Run, Build Application using script" 69 | #   ./location_of_script_within_repo/buildscript.sh 70 | 71 | - name: Perform CodeQL Analysis 72 | uses: github/codeql-action/analyze@v2 73 | -------------------------------------------------------------------------------- /workspace/PandasScripts/csv_gendelays.py: -------------------------------------------------------------------------------- 1 | # Original author: Siddhant Ray 2 | 3 | import argparse 4 | import collections 5 | import os 6 | 7 | import numpy as np 8 | import pandas as pd 9 | 10 | print("Current directory is:", os.getcwd()) 11 | print("Generate end to end packet delays") 12 | 13 | ## Some packets are sent by the sender but never received at the receiver; 14 | ## they are dropped somewhere in the middle, so they are not counted 15 | ## in the delay calculation 16 | def gen_packet_delay(input_dataframe1, input_dataframe2, path): 17 | # print(input_dataframe1.head(), input_dataframe1.shape) 18 | # print(input_dataframe2.head(), input_dataframe2.shape) 19 | 20 | ## these packets are not received 21 | ids_not_received = list( 22 | set(input_dataframe1["IP ID"].to_list()) 23 | - set(input_dataframe2["IP ID"].to_list()) 24 | ) 25 | 26 | # print(ids_not_received) 27 | 28 | for value in ids_not_received: 29 | input_dataframe1 = input_dataframe1[input_dataframe1["IP ID"] != value] 30 | input_dataframe1 = input_dataframe1.reset_index(drop=True) 31 | 32 | # print(input_dataframe1.tail(), input_dataframe1.shape) 33 | # print(input_dataframe2.tail(), input_dataframe2.shape) 34 | 35 | input_dataframe1["Delay"] = ( 36 | input_dataframe2["Timestamp"] - input_dataframe1["Timestamp"] 37 | ) 38 | # print(input_dataframe1.tail(), input_dataframe1.shape) 39 | 40 | input_dataframe1 = input_dataframe1[input_dataframe1["Delay"].notna()] 41 | # print(input_dataframe1.tail(), input_dataframe1.shape) 42 | 43 | return input_dataframe1 44 | 45 | 46 | def main(): 47 | parser = argparse.ArgumentParser() 48 | parser.add_argument( 49 | "-mod", "--model", help="choose CC model for creating congestion", required=True 50 | ) 51 | parser.add_argument( 52 | "-nsend", 53 | "--numsenders", 54 | help="number of sender/receiver pairs to process", 55 | required=True, 56 | ) 57 | args = parser.parse_args() 58 | print(args) 59 | 60 | if args.model == "tcponly": 61 | path = "congestion_1/" 62 | elif args.model == "tcpandudp": 63 | path = "congestion_2/" 64 | else: 65 | print("ERROR: CONGESTION MODEL NOT CORRECT....") 66 | exit() 67 | 68 | num_senders = int(args.numsenders) 69 | num_receivers = num_senders 70 | sender = 0 71 | receiver = 0 72 | 73 | temp_cols = [ 74 | "Timestamp", 75 | "Flow ID", 76 | "Packet ID", 77 | "Packet Size", 78 | "Interface ID", 79 | "IP ID", 80 | "DSCP", 81 | "ECN", 82 | "Payload Size", 83 | "TTL", 84 | "Proto", 85 | "Source IP", 86 | "Destination IP", 87 | "TCP Source Port", 88 | "TCP Destination Port", 89 | "TCP Sequence Number", 90 | "TCP Window Size", 91 | "Delay", 92 | ] 93 | 94 | temp = pd.DataFrame(columns=temp_cols) 95 | 96 | while sender < num_senders and receiver < num_receivers: 97 | 98 | input_dataframe1 = pd.read_csv(path + "sender_{}_final.csv".format(sender)) 99 | input_dataframe2 = pd.read_csv(path + "receiver_{}_final.csv".format(receiver)) 100 | 101 | delay_df = gen_packet_delay(input_dataframe1, input_dataframe2, path) 102 | temp = pd.concat([temp, delay_df], ignore_index=True, copy=False) 103 | 104 | sender += 1 105 | receiver += 1 106 | 107 | temp = temp.sort_values(by=["Timestamp"], ascending=True)
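# Preview the merged result, then export all sender-receiver delay records as a single chronological end-to-end delay trace.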
print(temp.head()) 109 | print(temp.shape) 110 | temp.to_csv(path + "endtoenddelay.csv", index=False) 111 | 112 | 113 | if __name__ == "__main__": 114 | main() 115 | -------------------------------------------------------------------------------- /presentation/.gitignore: -------------------------------------------------------------------------------- 1 | ## Core latex/pdflatex auxiliary files: 2 | *.aux 3 | *.lof 4 | *.log 5 | *.lot 6 | *.fls 7 | *.out 8 | *.toc 9 | *.fmt 10 | *.fot 11 | *.cb 12 | *.cb2 13 | .*.lb 14 | 15 | ## Intermediate documents: 16 | *.dvi 17 | *.xdv 18 | *-converted-to.* 19 | # these rules might exclude image files for figures etc. 20 | # *.ps 21 | # *.eps 22 | # *.pdf 23 | 24 | ## Generated if empty string is given at "Please type another file name for output:" 25 | .pdf 26 | 27 | ## Bibliography auxiliary files (bibtex/biblatex/biber): 28 | *.bbl 29 | *.bcf 30 | *.blg 31 | *-blx.aux 32 | *-blx.bib 33 | *.run.xml 34 | 35 | ## Build tool auxiliary files: 36 | *.fdb_latexmk 37 | *.synctex 38 | *.synctex(busy) 39 | *.synctex.gz 40 | *.synctex.gz(busy) 41 | *.pdfsync 42 | 43 | ## Build tool directories for auxiliary files 44 | # latexrun 45 | latex.out/ 46 | 47 | ## Auxiliary and intermediate files from other packages: 48 | # algorithms 49 | *.alg 50 | *.loa 51 | 52 | # achemso 53 | acs-*.bib 54 | 55 | # amsthm 56 | *.thm 57 | 58 | # beamer 59 | *.nav 60 | *.pre 61 | *.snm 62 | *.vrb 63 | 64 | # changes 65 | *.soc 66 | 67 | # comment 68 | *.cut 69 | 70 | # cprotect 71 | *.cpt 72 | 73 | # elsarticle (documentclass of Elsevier journals) 74 | *.spl 75 | 76 | # endnotes 77 | *.ent 78 | 79 | # fixme 80 | *.lox 81 | 82 | # feynmf/feynmp 83 | *.mf 84 | *.mp 85 | *.t[1-9] 86 | *.t[1-9][0-9] 87 | *.tfm 88 | 89 | #(r)(e)ledmac/(r)(e)ledpar 90 | *.end 91 | *.?end 92 | *.[1-9] 93 | *.[1-9][0-9] 94 | *.[1-9][0-9][0-9] 95 | *.[1-9]R 96 | *.[1-9][0-9]R 97 | *.[1-9][0-9][0-9]R 98 | *.eledsec[1-9] 99 | *.eledsec[1-9]R 100 | *.eledsec[1-9][0-9] 101 | *.eledsec[1-9][0-9]R 102 | *.eledsec[1-9][0-9][0-9] 103 | *.eledsec[1-9][0-9][0-9]R 104 | 105 | # glossaries 106 | *.acn 107 | *.acr 108 | *.glg 109 | *.glo 110 | *.gls 111 | *.glsdefs 112 | 113 | # gnuplottex 114 | *-gnuplottex-* 115 | 116 | # gregoriotex 117 | *.gaux 118 | *.gtex 119 | 120 | # htlatex 121 | *.4ct 122 | *.4tc 123 | *.idv 124 | *.lg 125 | *.trc 126 | *.xref 127 | 128 | # hyperref 129 | *.brf 130 | 131 | # knitr 132 | *-concordance.tex 133 | # TODO Comment the next line if you want to keep your tikz graphics files 134 | *.tikz 135 | *-tikzDictionary 136 | 137 | # listings 138 | *.lol 139 | 140 | # luatexja-ruby 141 | *.ltjruby 142 | 143 | # makeidx 144 | *.idx 145 | *.ilg 146 | *.ind 147 | *.ist 148 | 149 | # minitoc 150 | *.maf 151 | *.mlf 152 | *.mlt 153 | *.mtc[0-9]* 154 | *.slf[0-9]* 155 | *.slt[0-9]* 156 | *.stc[0-9]* 157 | 158 | # minted 159 | _minted* 160 | *.pyg 161 | 162 | # morewrites 163 | *.mw 164 | 165 | # nomencl 166 | *.nlg 167 | *.nlo 168 | *.nls 169 | 170 | # pax 171 | *.pax 172 | 173 | # pdfpcnotes 174 | *.pdfpc 175 | 176 | # sagetex 177 | *.sagetex.sage 178 | *.sagetex.py 179 | *.sagetex.scmd 180 | 181 | # scrwfile 182 | *.wrt 183 | 184 | # sympy 185 | *.sout 186 | *.sympy 187 | sympy-plots-for-*.tex/ 188 | 189 | # pdfcomment 190 | *.upa 191 | *.upb 192 | 193 | # pythontex 194 | *.pytxcode 195 | pythontex-files-*/ 196 | 197 | # tcolorbox 198 | *.listing 199 | 200 | # thmtools 201 | *.loe 202 | 203 | # TikZ & PGF 204 | *.dpth 205 | *.md5 206 | *.auxlock 207 | 208 | # todonotes 209 | *.tdo 210 | 211 | # 
vhistory 212 | *.hst 213 | *.ver 214 | 215 | # easy-todo 216 | *.lod 217 | 218 | # xcolor 219 | *.xcp 220 | 221 | # xmpincl 222 | *.xmpi 223 | 224 | # xindy 225 | *.xdy 226 | 227 | # xypic precompiled matrices 228 | *.xyc 229 | 230 | # endfloat 231 | *.ttt 232 | *.fff 233 | 234 | # Latexian 235 | TSWLatexianTemp* 236 | 237 | ## Editors: 238 | # WinEdt 239 | *.bak 240 | *.sav 241 | 242 | # Texpad 243 | .texpadtmp 244 | 245 | # LyX 246 | *.lyx~ 247 | 248 | # Kile 249 | *.backup 250 | 251 | # KBibTeX 252 | *~[0-9]* 253 | 254 | # auto folder when using emacs and auctex 255 | ./auto/* 256 | *.el 257 | 258 | # expex forward references with \gathertags 259 | *-tags.tex 260 | 261 | # standalone packages 262 | *.sta 263 | 264 | -------------------------------------------------------------------------------- /report/.gitignore: -------------------------------------------------------------------------------- 1 | ## Core latex/pdflatex auxiliary files: 2 | *.aux 3 | *.lof 4 | *.log 5 | *.lot 6 | *.fls 7 | *.out 8 | *.toc 9 | *.fmt 10 | *.fot 11 | *.cb 12 | *.cb2 13 | .*.lb 14 | 15 | ## Intermediate documents: 16 | *.dvi 17 | *.xdv 18 | *-converted-to.* 19 | # these rules might exclude image files for figures etc. 20 | # *.ps 21 | # *.eps 22 | # *.pdf 23 | 24 | ## Generated if empty string is given at "Please type another file name for output:" 25 | .pdf 26 | 27 | ## Bibliography auxiliary files (bibtex/biblatex/biber): 28 | *.bbl 29 | *.bcf 30 | *.blg 31 | *-blx.aux 32 | *-blx.bib 33 | *.run.xml 34 | 35 | ## Build tool auxiliary files: 36 | *.fdb_latexmk 37 | *.synctex 38 | *.synctex(busy) 39 | *.synctex.gz 40 | *.synctex.gz(busy) 41 | *.pdfsync 42 | 43 | ## Build tool directories for auxiliary files 44 | # latexrun 45 | latex.out/ 46 | 47 | ## Auxiliary and intermediate files from other packages: 48 | # algorithms 49 | *.alg 50 | *.loa 51 | 52 | # achemso 53 | acs-*.bib 54 | 55 | # amsthm 56 | *.thm 57 | 58 | # beamer 59 | *.nav 60 | *.pre 61 | *.snm 62 | *.vrb 63 | 64 | # changes 65 | *.soc 66 | 67 | # comment 68 | *.cut 69 | 70 | # cprotect 71 | *.cpt 72 | 73 | # elsarticle (documentclass of Elsevier journals) 74 | *.spl 75 | 76 | # endnotes 77 | *.ent 78 | 79 | # fixme 80 | *.lox 81 | 82 | # feynmf/feynmp 83 | *.mf 84 | *.mp 85 | *.t[1-9] 86 | *.t[1-9][0-9] 87 | *.tfm 88 | 89 | #(r)(e)ledmac/(r)(e)ledpar 90 | *.end 91 | *.?end 92 | *.[1-9] 93 | *.[1-9][0-9] 94 | *.[1-9][0-9][0-9] 95 | *.[1-9]R 96 | *.[1-9][0-9]R 97 | *.[1-9][0-9][0-9]R 98 | *.eledsec[1-9] 99 | *.eledsec[1-9]R 100 | *.eledsec[1-9][0-9] 101 | *.eledsec[1-9][0-9]R 102 | *.eledsec[1-9][0-9][0-9] 103 | *.eledsec[1-9][0-9][0-9]R 104 | 105 | # glossaries 106 | *.acn 107 | *.acr 108 | *.glg 109 | *.glo 110 | *.gls 111 | *.glsdefs 112 | 113 | # gnuplottex 114 | *-gnuplottex-* 115 | 116 | # gregoriotex 117 | *.gaux 118 | *.gtex 119 | 120 | # htlatex 121 | *.4ct 122 | *.4tc 123 | *.idv 124 | *.lg 125 | *.trc 126 | *.xref 127 | 128 | # hyperref 129 | *.brf 130 | 131 | # knitr 132 | *-concordance.tex 133 | # TODO Comment the next line if you want to keep your tikz graphics files 134 | *.tikz 135 | *-tikzDictionary 136 | 137 | # listings 138 | *.lol 139 | 140 | # luatexja-ruby 141 | *.ltjruby 142 | 143 | # makeidx 144 | *.idx 145 | *.ilg 146 | *.ind 147 | *.ist 148 | 149 | # minitoc 150 | *.maf 151 | *.mlf 152 | *.mlt 153 | *.mtc[0-9]* 154 | *.slf[0-9]* 155 | *.slt[0-9]* 156 | *.stc[0-9]* 157 | 158 | # minted 159 | _minted* 160 | *.pyg 161 | 162 | # morewrites 163 | *.mw 164 | 165 | # nomencl 166 | *.nlg 167 | *.nlo 168 | *.nls 169 | 170 | # 
pax 171 | *.pax 172 | 173 | # pdfpcnotes 174 | *.pdfpc 175 | 176 | # sagetex 177 | *.sagetex.sage 178 | *.sagetex.py 179 | *.sagetex.scmd 180 | 181 | # scrwfile 182 | *.wrt 183 | 184 | # sympy 185 | *.sout 186 | *.sympy 187 | sympy-plots-for-*.tex/ 188 | 189 | # pdfcomment 190 | *.upa 191 | *.upb 192 | 193 | # pythontex 194 | *.pytxcode 195 | pythontex-files-*/ 196 | 197 | # tcolorbox 198 | *.listing 199 | 200 | # thmtools 201 | *.loe 202 | 203 | # TikZ & PGF 204 | *.dpth 205 | *.md5 206 | *.auxlock 207 | 208 | # todonotes 209 | *.tdo 210 | 211 | # vhistory 212 | *.hst 213 | *.ver 214 | 215 | # easy-todo 216 | *.lod 217 | 218 | # xcolor 219 | *.xcp 220 | 221 | # xmpincl 222 | *.xmpi 223 | 224 | # xindy 225 | *.xdy 226 | 227 | # xypic precompiled matrices 228 | *.xyc 229 | 230 | # endfloat 231 | *.ttt 232 | *.fff 233 | 234 | # Latexian 235 | TSWLatexianTemp* 236 | 237 | ## Editors: 238 | # WinEdt 239 | *.bak 240 | *.sav 241 | 242 | # Texpad 243 | .texpadtmp 244 | 245 | # LyX 246 | *.lyx~ 247 | 248 | # Kile 249 | *.backup 250 | 251 | # KBibTeX 252 | *~[0-9]* 253 | 254 | # auto folder when using emacs and auctex 255 | ./auto/* 256 | *.el 257 | 258 | # expex forward references with \gathertags 259 | *-tags.tex 260 | 261 | # standalone packages 262 | *.sta 263 | 264 | # output pdf (now we put it in the directory) 265 | # thesis.pdf 266 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 
95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 105 | __pypackages__/ 106 | 107 | # Celery stuff 108 | celerybeat-schedule 109 | celerybeat.pid 110 | 111 | # SageMath parsed files 112 | *.sage.py 113 | 114 | # Environments 115 | .env 116 | .venv 117 | env/ 118 | venv/ 119 | venv_tensorboard/ 120 | ENV/ 121 | env.bak/ 122 | venv.bak/ 123 | 124 | # Spyder project settings 125 | .spyderproject 126 | .spyproject 127 | 128 | # Rope project settings 129 | .ropeproject 130 | 131 | # mkdocs documentation 132 | /site 133 | 134 | # mypy 135 | .mypy_cache/ 136 | .dmypy.json 137 | dmypy.json 138 | 139 | # Pyre type checker 140 | .pyre/ 141 | 142 | # pytype static type analyzer 143 | .pytype/ 144 | 145 | # Cython debug symbols 146 | cython_debug/ 147 | 148 | .DS_Store 149 | 150 | # Prerequisites 151 | *.d 152 | 153 | # Compiled Object files 154 | *.slo 155 | *.lo 156 | *.o 157 | *.obj 158 | 159 | # Precompiled Headers 160 | *.gch 161 | *.pch 162 | 163 | # Compiled Dynamic libraries 164 | *.so 165 | *.dylib 166 | *.dll 167 | 168 | # Fortran module files 169 | *.mod 170 | *.smod 171 | 172 | # Compiled Static libraries 173 | *.lai 174 | *.la 175 | *.a 176 | *.lib 177 | 178 | # Executables 179 | *.exe 180 | *.out 181 | *.app 182 | 183 | ## VM management scripts 184 | remote_* 185 | 186 | ## VSCODE stuff 187 | .vscode 188 | 189 | ## Outputs from NS3 190 | outputs/ 191 | 192 | ## Evaluations from NS3 193 | evaluations 194 | 195 | ## Logs 196 | lightning_logs 197 | logs 198 | 199 | ## Plots 200 | workspace/PandasScripts/plots 201 | workspace/NetworkSimulators/memento/plots 202 | workspace/TransformerModels/plots 203 | figures_test/ 204 | # report/thesis.pdf 205 | 206 | -------------------------------------------------------------------------------- /workspace/NetworkSimulators/ns3/newnet.cc: -------------------------------------------------------------------------------- 1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */ 2 | 3 | #include "ns3/core-module.h" 4 | #include "ns3/network-module.h" 5 | #include "ns3/csma-module.h" 6 | #include "ns3/internet-module.h" 7 | #include "ns3/point-to-point-module.h" 8 | #include "ns3/applications-module.h" 9 | #include "ns3/ipv4-global-routing-helper.h" 10 | 11 | using namespace ns3; 12 | 13 | // Net network topology 14 | // 15 | // 10.1.1.0 16 | // n0 -------------- n1 n2 n3 n4 17 | // point-to-point | | | | 18 | // ================ 19 | // LAN 10.1.2.0 20 | 21 | NS_LOG_COMPONENT_DEFINE("NewnetTest"); 22 | 23 | int main(int argc, char *argv[]){ 24 | 25 | bool verbose = true; 26 | uint32_t nCsma = 3; 27 | uint32_t nPackets = 1; 28 | 29 | CommandLine cmd; 30 | cmd.AddValue("nCsma", "Number of \"extra\" CSMA nodes/devices", nCsma); 31 | cmd.AddValue("verbose", "Tell echo applications to log if true", verbose); 32 | cmd.AddValue("nPackets", "Number of packets to echo", nPackets); 33 | 34 | 35 | cmd.Parse(argc, argv); 36 | 37 | if (verbose) 38 | { 39 | LogComponentEnable("UdpEchoClientApplication", LOG_LEVEL_INFO); 40 | LogComponentEnable("UdpEchoServerApplication", LOG_LEVEL_INFO); 41 | } 42 | // Sanity check, nCsma >=1 always, nPackets >=1 always 
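// Illustrative example (an assumption about the usual ns-3 runner, not part
// of the original file): the CommandLine hooks registered above let the
// defaults be overridden at run time, e.g.:
//
//   ./waf --run "newnet --nCsma=5 --nPackets=3 --verbose=false"
//
// The two statements below then clamp nCsma and nPackets to at least 1.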
43 | nCsma = nCsma == 0 ? 1 : nCsma;
44 | nPackets = nPackets == 0 ? 1 : nPackets;
45 |
46 | // Create the P2P nodes first
47 | NodeContainer p2pNodes;
48 | p2pNodes.Create(2);
49 |
50 | // Create CSMA node containers
51 | NodeContainer csmaNodes;
52 | csmaNodes.Add(p2pNodes.Get (1));
53 | csmaNodes.Create(nCsma);
54 |
55 | // Bind the P2P devices inside the P2P containers
56 | PointToPointHelper pointToPoint;
57 | pointToPoint.SetDeviceAttribute("DataRate", StringValue ("5Mbps"));
58 | pointToPoint.SetChannelAttribute("Delay", StringValue ("2ms"));
59 | pointToPoint.SetQueue("ns3::DropTailQueue", "MaxSize", StringValue ("50p"));
60 |
61 | NetDeviceContainer p2pDevices;
62 | p2pDevices = pointToPoint.Install(p2pNodes);
63 |
64 | // Bind the CSMA devices inside the CSMA containers
65 | // For CSMA, data rate is a channel attribute, not a device attribute!
66 | // CSMA doesn't allow one to mix devices on a channel
67 | CsmaHelper csma;
68 | csma.SetChannelAttribute("DataRate", StringValue("100Mbps"));
69 | csma.SetChannelAttribute("Delay", TimeValue(NanoSeconds(6560)));
70 |
71 | NetDeviceContainer csmaDevices;
72 | csmaDevices = csma.Install(csmaNodes);
73 |
74 | // Install the protocol stack on the containers
75 | InternetStackHelper stack;
76 | stack.Install(p2pNodes.Get(0));
77 | stack.Install(csmaNodes);
78 |
79 | // IP address for point to point nodes
80 | Ipv4AddressHelper address;
81 | address.SetBase("10.1.1.0", "255.255.255.0");
82 | Ipv4InterfaceContainer p2pInterfaces;
83 | p2pInterfaces = address.Assign(p2pDevices);
84 |
85 | // IP address for CSMA devices (variable chain of devices)
86 | address.SetBase("10.1.2.0", "255.255.255.0");
87 | Ipv4InterfaceContainer csmaInterfaces;
88 | csmaInterfaces = address.Assign(csmaDevices);
89 |
90 | // PORT number is 9 here
91 | UdpEchoServerHelper echoServer(9);
92 |
93 | ApplicationContainer serverApps = echoServer.Install(csmaNodes.Get(nCsma));
94 | serverApps.Start(Seconds(1.0));
95 | serverApps.Stop(Seconds(10.0));
96 |
97 | UdpEchoClientHelper echoClient(csmaInterfaces.GetAddress(nCsma), 9);
98 | echoClient.SetAttribute("MaxPackets", UintegerValue(nPackets));
99 | echoClient.SetAttribute("Interval", TimeValue(Seconds(1.0)));
100 | echoClient.SetAttribute("PacketSize", UintegerValue(1024));
101 |
102 | ApplicationContainer clientApps = echoClient.Install(p2pNodes.Get(0));
103 | clientApps.Start(Seconds(2.0));
104 | clientApps.Stop(Seconds(10.0));
105 |
106 | Ipv4GlobalRoutingHelper::PopulateRoutingTables();
107 |
108 | pointToPoint.EnablePcap("newnet", p2pNodes.Get(0)->GetId(), 0);
109 | csma.EnablePcap("newnet", csmaNodes.Get(nCsma)->GetId(), 0, false);
110 | csma.EnablePcap("newnet", csmaNodes.Get(nCsma-1)->GetId(), 0, false);
111 |
112 | Simulator::Run();
113 | Simulator::Destroy();
114 | return 0;
115 | }
116 |
117 |
118 |
119 |
-------------------------------------------------------------------------------- /workspace/NetworkSimulators/memento/experiment-tags.h: --------------------------------------------------------------------------------
1 | /* Tags for tracking simulation info.
2 | */
3 | #ifndef EXPERIMENT_TAGS_H
4 | #define EXPERIMENT_TAGS_H
5 |
6 | #include "ns3/core-module.h"
7 | #include "ns3/network-module.h"
8 |
9 | using namespace ns3;
10 |
11 | // A timestamp tag that can be added to a packet.
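// Usage sketch (illustrative, not part of the original header): with the
// TimestampTag defined directly below, a sender can stamp a packet and a
// receiver can recover the send time to compute a one-way delay.
// Assumes a Ptr<Packet> named packet.
//
//   TimestampTag txTag;
//   txTag.SetTime(Simulator::Now());
//   packet->AddByteTag(txTag);
//   ...
//   TimestampTag rxTag;
//   if (packet->FindFirstMatchingByteTag(rxTag))
//   {
//     Time delay = Simulator::Now() - rxTag.GetTime();
//   }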
12 | class TimestampTag : public Tag
13 | {
14 | public:
15 | static TypeId GetTypeId(void)
16 | {
17 | static TypeId tid = TypeId("ns3::TimestampTag")
18 | .SetParent<Tag>()
19 | .AddConstructor<TimestampTag>()
20 | .AddAttribute("Timestamp",
21 | "Timestamp to save in tag.",
22 | EmptyAttributeValue(),
23 | MakeTimeAccessor(&TimestampTag::timestamp),
24 | MakeTimeChecker());
25 | return tid;
26 | };
27 | TypeId GetInstanceTypeId(void) const { return GetTypeId(); };
28 | uint32_t GetSerializedSize(void) const { return sizeof(timestamp); };
29 | void Serialize(TagBuffer i) const
30 | {
31 | i.Write(reinterpret_cast<const uint8_t *>(&timestamp),
32 | sizeof(timestamp));
33 | };
34 | void Deserialize(TagBuffer i)
35 | {
36 | i.Read(reinterpret_cast<uint8_t *>(&timestamp), sizeof(timestamp));
37 | };
38 | void Print(std::ostream &os) const
39 | {
40 | os << "t=" << timestamp;
41 | };
42 |
43 | // these are our accessors to our tag structure
44 | void SetTime(Time time) { timestamp = time; };
45 | Time GetTime() { return timestamp; };
46 |
47 | private:
48 | Time timestamp;
49 | };
50 |
51 | // A tag with two integer values for workload and application ids.
52 | class IdTag : public Tag
53 | {
54 | public:
55 | static TypeId GetTypeId(void)
56 | {
57 | static TypeId tid =
58 | TypeId("ns3::IntTag")
59 | .SetParent<Tag>()
60 | .AddConstructor<IdTag>()
61 | .AddAttribute("workload",
62 | "Workload id to save in tag.",
63 | EmptyAttributeValue(),
64 | MakeUintegerAccessor(&IdTag::workload),
65 | MakeUintegerChecker<uint32_t>())
66 | .AddAttribute("application",
67 | "Application id to save in tag.",
68 | EmptyAttributeValue(),
69 | MakeUintegerAccessor(&IdTag::application),
70 | MakeUintegerChecker<uint32_t>());
71 | return tid;
72 | };
73 | TypeId GetInstanceTypeId(void) const { return GetTypeId(); };
74 | uint32_t GetSerializedSize(void) const
75 | {
76 | return sizeof(workload) + sizeof(application);
77 | };
78 | void Serialize(TagBuffer i) const
79 | {
80 | i.Write(reinterpret_cast<const uint8_t *>(&workload),
81 | sizeof(workload));
82 | i.Write(reinterpret_cast<const uint8_t *>(&application),
83 | sizeof(application));
84 | };
85 | void Deserialize(TagBuffer i)
86 | {
87 | i.Read(reinterpret_cast<uint8_t *>(&workload), sizeof(workload));
88 | i.Read(reinterpret_cast<uint8_t *>(&application), sizeof(application));
89 | };
90 | void Print(std::ostream &os) const
91 | {
92 | os << "w=" << workload << ", "
93 | << "a=" << application;
94 | };
95 |
96 | // these are our accessors to our tag structure
97 | void SetWorkload(u_int32_t newval) { workload = newval; };
98 | u_int32_t GetWorkload() { return workload; };
99 | void SetApplication(u_int32_t newval) { application = newval; };
100 | u_int32_t GetApplication() { return application; };
101 |
102 | private:
103 | u_int32_t workload;
104 | u_int32_t application;
105 | };
106 |
107 | // A tag to check message ids and see if they are preserved across fragments
108 | class MessageTag : public Tag
109 | {
110 | public:
111 |
112 | static TypeId GetTypeId (void)
113 | {
114 | static TypeId tid = TypeId ("ns3::MessageTag")
115 | .SetParent<Tag> ()
116 | .AddConstructor<MessageTag> ()
117 | .AddAttribute ("SimpleValue",
118 | "A simple value",
119 | EmptyAttributeValue (),
120 | MakeUintegerAccessor (&MessageTag::m_simpleValue),
121 | MakeUintegerChecker<uint32_t> ());
122 | return tid;
123 | }
124 | TypeId GetInstanceTypeId(void) const
125 | {
126 | return GetTypeId();
127 | }
128 | uint32_t GetSerializedSize(void) const
129 | {
130 | return sizeof(m_simpleValue);
131 | }
132 | void Serialize(TagBuffer i) const
133 | {
134 | i.Write(reinterpret_cast<const uint8_t *>(&m_simpleValue),
135 | sizeof(m_simpleValue));
136 | }
137 | void Deserialize(TagBuffer i)
138 | {
139 | i.Read(reinterpret_cast<uint8_t *>(&m_simpleValue),
140 | sizeof(m_simpleValue));
141 | }
142 | void Print(std::ostream &os) const
143 | {
144 | os << "v=" << (uint32_t)m_simpleValue;
145 | }
146 | void SetSimpleValue(uint32_t value)
147 | {
148 | m_simpleValue = value;
149 | }
150 | uint32_t GetSimpleValue(void) const
151 | {
152 | return m_simpleValue;
153 | }
154 |
155 | private:
156 | uint32_t m_simpleValue;
157 | };
158 |
159 | #endif // EXPERIMENT_TAGS_H
160 |
-------------------------------------------------------------------------------- /literature/Literature.html: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | Bookmarks 7 |

Bookmarks Menu

8 |
9 |
Attention is all you need 10 |
Should you mask 15% in masked language modeling? 11 |
The illustrated transformer 12 |
The ns-3 network simulator 13 |
Homa: A receiver-driven low-latency transport protocol using network priorities 14 |
Cloze procedure: A new tool for measuring readability 15 |
Hypothesis testing in time series analysis 16 |
Smoothing, forecasting and prediction of discrete time series 17 |
Time series and forecasting: Brief history and future research 18 |
Long short-term memory 19 |
Modelling radiological language with bidirectional long short-term memory networks 20 |
Learning in situ: a randomized experiment in video streaming 21 |
The CAIDA anonymized internet traces data access 22 |
Measurement lab 23 |
Crawdad 24 |
Rocketfuel: An ISP topology mapping engine 25 |
Header space analysis: Static checking for networks 26 |
Distilling the knowledge in a neural network 27 |
Advances and open problems in federated learning 28 |
PyTorch: An imperative style, high-performance deep learning library 29 |
Layer normalization 30 |
Adam: A method for stochastic optimization 31 |
Robust estimation of a location parameter 32 |
A new hope for network model generalization 33 |
Classic meets modern: A pragmatic learning-based congestion control for the internet 34 |
TCP ex machina: Computer-generated congestion control 35 |
TCP ex machina: Computer-generated congestion control 36 |
Oboe: Auto-tuning video ABR algorithms to network conditions 37 |
AuTO: Scaling deep reinforcement learning for datacenter-scale automatic traffic optimization 38 |
Learning to route 39 |
Is advance knowledge of flow sizes a plausible assumption? 40 |
One protocol to rule them all: Wireless Network-on-Chip using deep reinforcement learning 41 |
Biases in data-driven networking, and what to do about them 42 |
On the use of ML for blackbox system performance prediction 43 |
Factorization tricks for LSTM networks 44 |
Recent advances in natural language inference: A survey of benchmarks, resources, and approaches 45 |
A survey on vision transformer 46 |
Nuts and bolts of building applications using deep learning 47 |
Network planning with deep reinforcement learning 48 |
-------------------------------------------------------------------------------- /workspace/NetworkSimulators/memento/eval.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from matplotlib import pyplot as plt 3 | import numpy as np 4 | import seaborn as sns 5 | import sys 6 | 7 | import matplotlib.pyplot as plt 8 | import matplotlib as mpl 9 | from matplotlib.ticker import FormatStrFormatter 10 | 11 | BIG = True 12 | TEST = True # Marked true for fine-tuning data with multiple bottlenecks 13 | val = sys.argv[1] 14 | 15 | sns.set_theme("paper", "whitegrid", font_scale=1.5) 16 | mpl.rcParams.update({ 17 | 'text.usetex': True, 18 | 'font.family': 'serif', 19 | 'text.latex.preamble': r'\usepackage{amsmath,amssymb}', 20 | 21 | 'lines.linewidth': 2, 22 | 'lines.markeredgewidth': 0, 23 | 24 | 'scatter.marker': '.', 25 | 'scatter.edgecolors': 'none', 26 | 27 | # Set image quality and reduce whitespace around saved figure. 28 | 'savefig.dpi': 300, 29 | 'savefig.bbox': 'tight', 30 | 'savefig.pad_inches': 0.01, 31 | }) 32 | 33 | if not TEST: 34 | frame = pd.read_csv("small_test_no_disturbance_with_message_ids{}.csv".format(val)) 35 | else: 36 | if not BIG: 37 | frame = pd.read_csv("small_test_one_disturbance_with_message_ids{}.csv".format(val)) 38 | else: 39 | frame = pd.read_csv("large_test_disturbance_with_message_ids{}.csv".format(val)) 40 | 41 | # Get the time stamp, packet size and delay (from my format, Alex uses a different format) 42 | frame = frame[frame.columns[[1,7,-8]]] 43 | frame.columns = ["t", "size", "delay"] 44 | print(frame.head()) 45 | 46 | frame = ( 47 | frame 48 | .assign(delay=lambda df: df['delay']) # to ms. 49 | ) 50 | 51 | plt.figure(figsize=(5,5)) 52 | sbs = sns.displot( 53 | data=frame, 54 | kind='ecdf', 55 | x='delay' 56 | ) 57 | 58 | #sbs.fig.suptitle('Delay plot with multiple senders') 59 | sbs.set(xlabel='Delay (seconds)', ylabel='Fraction of packets') 60 | plt.xlim([0,0.5]) 61 | plt.ylim(bottom=0) 62 | # Tight layout 63 | sbs.fig.tight_layout() 64 | plt.savefig("delay"+".pdf") 65 | 66 | 67 | frame['delay'].quantile([0.5, 0.99]) 68 | 69 | throughput = frame.loc[frame['t'] > 20, 'size'].sum() / 40 / (1024*1024) # in MBps 70 | 71 | queueframe = pd.read_csv("queue.csv", names=["source", "time", "size"]) 72 | 73 | bottleneck_source = "/NodeList/0/DeviceList/0/$ns3::CsmaNetDevice/TxQueue/PacketsInQueue" 74 | bottleneck_queue = queueframe[queueframe["source"] == bottleneck_source] 75 | print(bottleneck_source) 76 | 77 | plt.figure(figsize=(5,5)) 78 | scs = sns.relplot( 79 | data=bottleneck_queue, 80 | kind='line', 81 | x='time', 82 | y='size', 83 | legend=False, 84 | ci=None, 85 | ) 86 | 87 | scs.fig.suptitle('Bottleneck queue plot with multiple senders') 88 | plt.savefig("Queuesize"+".pdf") 89 | 90 | ## Bottleneck plots for switches A, B, D, G 91 | 92 | if BIG: 93 | values = [6, 7, 9, 12] 94 | dict_switches = { 95 | 6: "A", 96 | 7: "B", 97 | 9: "D", 98 | 12: "G" 99 | } 100 | else: 101 | values = [2, 3] 102 | dict_switches = { 103 | 2: "A", 104 | 3: "B" 105 | } 106 | 107 | for value in values: 108 | bottleneck_source = "/NodeList/{}/DeviceList/0/$ns3::CsmaNetDevice/TxQueue/PacketsInQueue".format(value) 109 | bottleneck_queue = queueframe[queueframe["source"] == bottleneck_source] 110 | print(bottleneck_source) 111 | 112 | plt.figure(figsize=(5,5)) 113 | scs = sns.relplot( 114 | data=bottleneck_queue, 115 | kind='line', 116 | x='time', 117 | y='size', 118 | legend=False, 119 | ci=None, 120 | ) 121 | 122 | 
#scs.fig.suptitle('Bottleneck queue on switch {} '.format(dict_switches[value])) 123 | #scs.fig.suptitle('Queue on bottleneck switch') 124 | scs.set(xlabel='Simulation Time (seconds)', ylabel='Queue Size (packets)') 125 | plt.xlim([0,60]) 126 | plt.ylim([0,1000]) 127 | 128 | save_name = "Queue profile on switch {}".format(dict_switches[value]) + ".pdf" 129 | scs.fig.tight_layout() 130 | plt.savefig(save_name) 131 | 132 | dropframe = pd.read_csv("drops.csv", names=["source", "time", "packetsize"]) 133 | 134 | print("Drop fraction:", len(dropframe) / (len(dropframe) + len(frame))) 135 | 136 | if BIG: 137 | ## Plot delay distribution for each receiver 138 | new_frame = pd.read_csv("large_test_disturbance_with_message_ids{}.csv".format(val)) 139 | new_frame = new_frame[new_frame.columns[[1,7, 23, -8]]] 140 | new_frame.columns = ["t", "size", "dest ip", "delay"] 141 | print(new_frame.head()) 142 | 143 | gb = new_frame.groupby('dest ip') 144 | groups = [gb.get_group(x) for x in gb.groups] 145 | print(groups) 146 | 147 | for idx, group in enumerate(groups): 148 | print(idx, group.shape) 149 | plt.figure(figsize=(5,5)) 150 | scs = sns.displot( 151 | data=group, 152 | kind='ecdf', 153 | x='delay', 154 | legend=False 155 | ) 156 | 157 | scs.fig.suptitle('Delay plot on receiver {} '.format(idx+1)) 158 | scs.set(xlabel='Delay', ylabel='Fraction of packets') 159 | plt.xlim([0,0.5]) 160 | # Tight layout 161 | scs.fig.tight_layout() 162 | plt.savefig("delay_Receiver{}".format(idx)+".pdf") 163 | 164 | fig, ax = plt.subplots(figsize=(5,5)) 165 | df0 = groups[0] 166 | df1 = groups[1] 167 | df2 = groups[2] 168 | 169 | scs0 = sns.ecdfplot( 170 | data=df0, 171 | x='delay', 172 | label="Receiver 1", 173 | color="blue", 174 | ax = ax 175 | ) 176 | scs1 = sns.ecdfplot( 177 | data=df1, 178 | x='delay', 179 | label="Receiver 2", 180 | color="red", 181 | ax = ax 182 | ) 183 | scs2 = sns.ecdfplot( 184 | data=df2, 185 | x='delay', 186 | label="Receiver 3", 187 | color="green", 188 | ax = ax 189 | ) 190 | 191 | ax.set_xlabel("Delay", fontsize=12) 192 | ax.set_ylabel("Fraction of packets",fontsize=12) 193 | ax.axis(xmin=0,xmax=0.5) 194 | ax.lines[0].set_linestyle("dotted") 195 | ax.lines[1].set_linestyle("--") 196 | ax.lines[2].set_linestyle("-.") 197 | fig.legend(["Receiver 1","Receiver 2","Receiver 3"],loc = "lower right", bbox_to_anchor=(0.948, 0.125), ncol=1, fontsize=10) 198 | #ax.get_legend().remove() 199 | # Tight layout 200 | fig.tight_layout() 201 | fig.savefig("delay_Receivers"+".pdf") 202 | 203 | 204 | 205 | -------------------------------------------------------------------------------- /workspace/NetworkSimulators/ns3/tcpapplication.cc: -------------------------------------------------------------------------------- 1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */ 2 | 3 | #include 4 | #include "ns3/core-module.h" 5 | #include "ns3/network-module.h" 6 | #include "ns3/internet-module.h" 7 | #include "ns3/point-to-point-module.h" 8 | #include "ns3/applications-module.h" 9 | 10 | using namespace ns3; 11 | 12 | NS_LOG_COMPONENT_DEFINE("TCPApplication"); 13 | 14 | // =========================================================================== 15 | // 16 | // node 0 node 1 17 | // +----------------+ +----------------+ 18 | // | ns-3 TCP | | ns-3 TCP | 19 | // +----------------+ +----------------+ 20 | // | 10.1.1.1 | | 10.1.1.2 | 21 | // +----------------+ +----------------+ 22 | // | point-to-point | | point-to-point | 23 | // +----------------+ +----------------+ 24 | // | | 25 | // 
+---------------------+
26 | // 5 Mbps, 2 ms
27 | //
28 | // ===========================================================================
29 |
30 | // MAIN TAKEAWAY: Cannot hook onto trace sources and sinks during configuration, as
31 | // they may be created during run time, and do not exist during configuration time.
32 |
33 | // Application Code
34 |
35 | class App : public Application
36 | {
37 | public:
38 | App();
39 | virtual ~App();
40 | void setup(Ptr<Socket> socket, Address address, uint32_t packetSize,
41 | uint32_t nPackets, DataRate dataRate);
42 |
43 | private:
44 | virtual void StartApplication(void);
45 | virtual void StopApplication(void);
46 |
47 | void ScheduleTx(void);
48 | void SendPacket(void);
49 |
50 | Ptr<Socket> m_socket;
51 | Address m_peer;
52 | uint32_t m_packetSize;
53 | uint32_t m_nPackets;
54 | DataRate m_dataRate;
55 | EventId m_sendEvent;
56 | bool m_running;
57 | uint32_t m_packetsSent;
58 | };
59 |
60 | // Constructor for the application
61 | App::App()
62 | : m_socket(0),
63 | m_peer(),
64 | m_packetSize(0),
65 | m_nPackets(0),
66 | m_dataRate(0),
67 | m_sendEvent(),
68 | m_running(false),
69 | m_packetsSent(0)
70 | {}
71 |
72 | // Destructor for the application
73 | App::~App()
74 | {
75 | m_socket = 0;
76 | }
77 |
78 | void App::setup(Ptr<Socket> socket, Address address, uint32_t packetSize,
79 | uint32_t nPackets, DataRate dataRate)
80 | {
81 | m_socket = socket;
82 | m_peer = address;
83 | m_packetSize = packetSize;
84 | m_nPackets = nPackets;
85 | m_dataRate = dataRate;
86 | }
87 |
88 | // Application start
89 | void App::StartApplication(void)
90 | {
91 | m_running = true;
92 | m_packetsSent = 0;
93 | m_socket->Bind();
94 | m_socket->Connect(m_peer);
95 | SendPacket();
96 | }
97 |
98 | // Application stop
99 | void App::StopApplication(void)
100 | {
101 | m_running = false;
102 | if(m_sendEvent.IsRunning())
103 | {
104 | Simulator::Cancel(m_sendEvent);
105 | }
106 | if(m_socket)
107 | {
108 | m_socket->Close();
109 | }
110 | }
111 |
112 | // Send the packet
113 | void App::SendPacket(void)
114 | {
115 | Ptr<Packet> packet = Create<Packet>(m_packetSize);
116 | m_socket->Send(packet);
117 |
118 | if(++m_packetsSent < m_nPackets)
119 | {
120 | ScheduleTx();
121 | }
122 | }
123 |
124 | void App::ScheduleTx(void)
125 | {
126 | if(m_running)
127 | {
128 | Time tNext(Seconds(m_packetSize * 8 / static_cast<double>(m_dataRate.GetBitRate())));
129 | m_sendEvent = Simulator::Schedule(tNext, &App::SendPacket, this);
130 | }
131 | }
132 |
133 | static void CwndChange(uint32_t oldCwnd, uint32_t newCwnd)
134 | {
135 | NS_LOG_INFO(Simulator::Now().GetSeconds() << "\t" << newCwnd);
136 | }
137 |
138 | static void RxDrop(Ptr<OutputStreamWrapper> stream, Ptr<const Packet> p)
139 | {
140 | NS_LOG_INFO("RxDrop at " << Simulator::Now().GetSeconds());
141 | *stream->GetStream() << "Rx drop at: " << Simulator::Now().GetSeconds();
142 | }
143 |
144 | // Main program
145 | int main(int argc, char *argv[])
146 | {
147 | NS_LOG_INFO("Create P2P nodes.....");
148 | NodeContainer nodes;
149 | nodes.Create(2);
150 |
151 | PointToPointHelper pointToPoint;
152 | pointToPoint.SetDeviceAttribute("DataRate", StringValue("5Mbps"));
153 | pointToPoint.SetChannelAttribute("Delay", StringValue("2ms"));
154 | pointToPoint.SetQueue("ns3::DropTailQueue", "MaxSize", StringValue ("50p"));
155 |
156 | NetDeviceContainer devices;
157 | devices = pointToPoint.Install(nodes);
158 |
159 | // Add errors in the channel at a given rate
160 |
161 | Ptr<RateErrorModel> em = CreateObject<RateErrorModel>();
162 | em->SetAttribute("ErrorRate", DoubleValue(0.00001));
163 | devices.Get(1)->SetAttribute("ReceiveErrorModel", PointerValue(em));
164 |
165 | InternetStackHelper stack;
166 | stack.Install(nodes);
167 |
168 | Ipv4AddressHelper address;
169 | address.SetBase("10.1.1.0", "255.255.255.252");
170 | Ipv4InterfaceContainer interfaces = address.Assign(devices);
171 |
172 | uint16_t sinkPort = 8080;
173 | Address sinkAddress(InetSocketAddress(interfaces.GetAddress(1), sinkPort));
174 | PacketSinkHelper packetSinkHelper("ns3::TcpSocketFactory",
175 | InetSocketAddress(Ipv4Address::GetAny(), sinkPort));
176 |
177 | ApplicationContainer sinkApps = packetSinkHelper.Install(nodes.Get(1));
178 | sinkApps.Start(Seconds(0.));
179 | sinkApps.Stop(Seconds(20.));
180 |
181 | Ptr<Socket> ns3TcpSocket = Socket::CreateSocket(nodes.Get(0),
182 | TcpSocketFactory::GetTypeId());
183 | ns3TcpSocket->TraceConnectWithoutContext("CongestionWindow",
184 | MakeCallback(&CwndChange));
185 |
186 | Ptr<App> app = CreateObject<App>();
187 | app->setup(ns3TcpSocket, sinkAddress, 104000, 1, DataRate("1Mbps"));
188 | nodes.Get(0)->AddApplication(app);
189 | app->SetStartTime(Seconds(1.));
190 | app->SetStopTime(Seconds(20.));
191 |
192 | AsciiTraceHelper ascii;
193 | Ptr<OutputStreamWrapper> streamRxDrops = ascii.CreateFileStream("outputs/RxDrops_tcpbasic.txt");
194 | devices.Get(1)->TraceConnectWithoutContext("PhyRxDrop", MakeBoundCallback(&RxDrop, streamRxDrops));
195 |
196 | Simulator::Stop(Seconds(20));
197 | Simulator::Run();
198 | Simulator::Destroy();
199 |
200 | return 0;
201 | }
202 |
203 |
204 |
205 |
206 |
207 |
-------------------------------------------------------------------------------- /workspace/NetworkSimulators/memento/cdf-application.h: --------------------------------------------------------------------------------
1 | /* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
2 | //
3 | // Copyright (c) 2006 Georgia Tech Research Corporation
4 | //
5 | // This program is free software; you can redistribute it and/or modify
6 | // it under the terms of the GNU General Public License version 2 as
7 | // published by the Free Software Foundation;
8 | //
9 | // This program is distributed in the hope that it will be useful,
10 | // but WITHOUT ANY WARRANTY; without even the implied warranty of
11 | // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 | // GNU General Public License for more details.
13 | //
14 | // You should have received a copy of the GNU General Public License
15 | // along with this program; if not, write to the Free Software
16 | // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17 | //
18 | // Author: George F. Riley
19 | //
20 |
21 | // TODO: Update description
22 | // ns3 - On/Off Data Source Application class
23 | // George F. Riley, Georgia Tech, Spring 2007
24 | // Adapted from ApplicationOnOff in GTNetS.
25 |
26 | #ifndef CDF_APPLICATION_H
27 | #define CDF_APPLICATION_H
28 |
29 | #include "ns3/address.h"
30 | #include "ns3/application.h"
31 | #include "ns3/event-id.h"
32 | #include "ns3/ptr.h"
33 | #include "ns3/data-rate.h"
34 | #include "ns3/traced-callback.h"
35 | #include "ns3/random-variable-stream.h"
36 |
37 | namespace ns3
38 | {
39 |
40 | class Address;
41 | class RandomVariableStream;
42 | class Socket;
43 |
44 | /**
45 | * \ingroup applications
46 | * \defgroup onoff CdfApplication
47 | *
48 | * This traffic generator follows an On/Off pattern: after
49 | * Application::StartApplication
50 | * is called, "On" and "Off" states alternate. The duration of each of
51 | * these states is determined with the onTime and the offTime random
52 | * variables. During the "Off" state, no traffic is generated.
53 | * During the "On" state, cbr traffic is generated.
This cbr traffic is
54 | * characterized by the specified "data rate" and "packet size".
55 | */
56 | /**
57 | * \ingroup onoff
58 | *
59 | * \brief Generate traffic to a single destination according to a
60 | * Cdf pattern.
61 | *
62 | * This traffic generator follows an On/Off pattern: after
63 | * Application::StartApplication
64 | * is called, "On" and "Off" states alternate. The duration of each of
65 | * these states is determined with the onTime and the offTime random
66 | * variables. During the "Off" state, no traffic is generated.
67 | * During the "On" state, cbr traffic is generated. This cbr traffic is
68 | * characterized by the specified "data rate" and "packet size".
69 | *
70 | * Note: When an application is started, the first packet transmission
71 | * occurs _after_ a delay equal to (packet size/bit rate). Note also,
72 | * when an application transitions into an off state in between packet
73 | * transmissions, the remaining time until when the next transmission
74 | * would have occurred is cached and is used when the application starts
75 | * up again. Example: packet size = 1000 bits, bit rate = 500 bits/sec.
76 | * If the application is started at time 3 seconds, the first packet
77 | * transmission will be scheduled for time 5 seconds (3 + 1000/500)
78 | * and subsequent transmissions at 2 second intervals. If the above
79 | * application were instead stopped at time 4 seconds, and restarted at
80 | * time 5.5 seconds, then the first packet would be sent at time 6.5 seconds,
81 | * because when it was stopped at 4 seconds, there was only 1 second remaining
82 | * until the originally scheduled transmission, and this time remaining
83 | * information is cached and used to schedule the next transmission
84 | * upon restarting.
85 | *
86 | * If the underlying socket type supports broadcast, this application
87 | * will automatically enable the SetAllowBroadcast(true) socket option.
88 | */
89 | class CdfApplication : public Application
90 | {
91 | public:
92 | /**
93 | * \brief Get the type ID.
94 | * \return the object TypeId
95 | */
96 | static TypeId GetTypeId(void);
97 |
98 | CdfApplication();
99 |
100 | virtual ~CdfApplication();
101 |
102 | /**
103 | * \brief Return a pointer to associated socket.
104 | * \return pointer to associated socket
105 | */
106 | Ptr<Socket> GetSocket(void) const;
107 |
108 | /**
109 | * \brief Assign a fixed random variable stream number to the random variables
110 | * used by this model.
111 | *
112 | * \param stream first stream index to use
113 | * \return the number of stream indices assigned by this model
114 | */
115 | int64_t AssignStreams(int64_t stream);
116 |
117 | protected:
118 | virtual void DoDispose(void);
119 |
120 | private:
121 | // inherited from Application base class.
122 | virtual void StartApplication(void); // Called at time specified by Start
123 | virtual void StopApplication(void); // Called at time specified by Stop
124 |
125 | //helpers
126 | /**
127 | * \brief Cancel all pending events.
128 | */
129 | void CancelEvents();
130 |
131 | // Event handlers
132 | /**
133 | * \brief Send a packet
134 | */
135 | void SendPacket();
136 |
137 | Ptr<Socket> m_socket; //!< Associated socket
138 | Address m_peer; //!< Peer address
139 | bool m_connected; //!< True if connected
140 | DataRate m_rate; //!< Rate that data is generated
141 | Time m_lastStartTime; //!< Time last packet sent
142 | EventId m_sendEvent; //!< Event id of pending "send packet" event
143 | TypeId m_tid; //!< Type of the socket used
144 |
145 | // cdf files!
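// Usage sketch (illustrative only, not part of the original header): the
// attribute names "Distribution" and "DataRate" below are assumptions; the
// authoritative names are the ones registered in GetTypeId() in
// cdf-application.cc.
//
//   Ptr<CdfApplication> app = CreateObject<CdfApplication>();
//   app->SetAttribute("Distribution", StringValue("cdf/example-sizes.cdf"));
//   app->SetAttribute("DataRate", DataRateValue(DataRate("10Mbps")));
//   srcNode->AddApplication(app);
//   app->SetStartTime(Seconds(1.0));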
146 | std::string m_filename;
147 | double m_average_size; // in bytes!
148 | Ptr<RandomVariableStream> m_sizeDist;
149 | Ptr<RandomVariableStream> m_timeDist;
150 | uint32_t m_counter; // track number of fragments sent
151 |
152 | /// Traced Callback: transmitted packets.
153 | TracedCallback<Ptr<const Packet>> m_txTrace;
154 |
155 | /// Callbacks for tracing the packet Tx events, includes source and destination addresses
156 | TracedCallback<Ptr<const Packet>, const Address &, const Address &> m_txTraceWithAddresses;
157 |
158 | private:
159 | /**
160 | * \brief Schedule the next packet transmission
161 | */
162 | void ScheduleNextTx();
163 | /**
164 | * \brief Handle a Connection Succeed event
165 | * \param socket the connected socket
166 | */
167 | void ConnectionSucceeded(Ptr<Socket> socket);
168 | /**
169 | * \brief Handle a Connection Failed event
170 | * \param socket the not connected socket
171 | */
172 | void ConnectionFailed(Ptr<Socket> socket);
173 |
174 | // Accessors for Distribution Attributes
175 | bool SetDistribution(std::string filename);
176 | std::string GetDistribution() const;
177 |
178 | void SetRate(DataRate rate);
179 | DataRate GetRate() const;
180 |
181 | // Helper to set the rate dist, needs to be called by both setters above.
182 | void UpdateRateDistribution();
183 | };
184 |
185 | } // namespace ns3
186 |
187 | #endif /* CDF_APPLICATION_H */
188 |
-------------------------------------------------------------------------------- /workspace/TransformerModels/plot_losses.py: --------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
3 | import matplotlib as mpl
4 | import matplotlib.pyplot as plt
5 | import seaborn as sns
6 | from matplotlib.ticker import FormatStrFormatter
7 | from tbparse import SummaryReader
8 |
9 | log_dir = "../../logs/encoder_delay_logs/"
10 | reader = SummaryReader(log_dir)
11 | df = reader.scalars
12 |
13 | epoch_train_loss_df = df[df["tag"] == "Avg loss per epoch"]
14 | epoch_train_loss_df.reset_index(inplace=True, drop=True)
15 | print(epoch_train_loss_df)
16 |
17 | train_loss_step_df = df[df["tag"] == "Train loss"]
18 | train_loss_step_df.reset_index(inplace=True, drop=True)
19 | print(train_loss_step_df)
20 |
21 | val_loss_step_df = df[df["tag"] == "Val loss"]
22 | val_loss_step_df.reset_index(inplace=True, drop=True)
23 | print(val_loss_step_df)
24 |
25 | ## Train loss plot (pre-training)
26 | plt.figure(figsize=(5, 5))
27 | sns.lineplot(
28 | x=epoch_train_loss_df.index,
29 | y="value",
30 | data=epoch_train_loss_df,
31 | label="Avg train loss per epoch",
32 | )
33 | # plt.show()
34 |
35 | ## Val loss plot (pre-training)
36 | plt.figure(figsize=(5, 5))
37 | sns.lineplot(
38 | x=val_loss_step_df.index,
39 | y="value",
40 | data=val_loss_step_df,
41 | label="Avg val loss per epoch",
42 | )
43 | # plt.show()
44 |
45 | mct_log_dir_pretrained = "../../logs/finetune_mct_logs/"
46 | reader_pretrained = SummaryReader(mct_log_dir_pretrained)
47 | df_pretrained = reader_pretrained.scalars
48 |
49 | train_loss_epoch_df_pretrained = df_pretrained[
50 | df_pretrained["tag"] == "Avg loss per epoch"
51 | ]
52 | train_loss_epoch_df_pretrained.reset_index(inplace=True, drop=True)
53 |
54 | val_loss_epoch_df_pretrained = df_pretrained[df_pretrained["tag"] == "Val loss"]
55 | val_loss_epoch_df_pretrained.reset_index(inplace=True, drop=True)
56 |
57 |
58 | mct_log_dir_nonpretrained = "../../logs/finetune_mct_logs2/"
59 | reader_nonpretrained = SummaryReader(mct_log_dir_nonpretrained)
60 | df_nonpretrained = reader_nonpretrained.scalars
61 |
62 | train_loss_epoch_df_nonpretrained =
df_nonpretrained[ 63 | df_nonpretrained["tag"] == "Avg loss per epoch" 64 | ] 65 | train_loss_epoch_df_nonpretrained.reset_index(inplace=True, drop=True) 66 | 67 | val_loss_epoch_df_nonpretrained = df_nonpretrained[ 68 | df_nonpretrained["tag"] == "Val loss" 69 | ] 70 | val_loss_epoch_df_nonpretrained.reset_index(inplace=True, drop=True) 71 | 72 | print(train_loss_epoch_df_nonpretrained.head(25)) 73 | print(val_loss_epoch_df_nonpretrained.head(25)) 74 | 75 | print(train_loss_epoch_df_pretrained.head(17)) 76 | print(val_loss_epoch_df_pretrained.head(17)) 77 | 78 | 79 | sns.set_theme("paper", "whitegrid", font_scale=1.2) 80 | mpl.rcParams.update( 81 | { 82 | "text.usetex": True, 83 | "font.family": "serif", 84 | "text.latex.preamble": r"\usepackage{amsmath,amssymb}", 85 | "lines.linewidth": 2, 86 | "lines.markeredgewidth": 0, 87 | "scatter.marker": ".", 88 | "scatter.edgecolors": "none", 89 | # Set image quality and reduce whitespace around saved figure. 90 | "savefig.dpi": 300, 91 | "savefig.bbox": "tight", 92 | "savefig.pad_inches": 0.01, 93 | } 94 | ) 95 | 96 | fig, ax = plt.subplots(2, figsize=(5, 5), sharex=True) 97 | plt.subplots_adjust(hspace=0.03) 98 | # plt.xticks(fontsize=8) 99 | # plt.yticks(fontsize=8) 100 | 101 | 102 | ## Train loss (pre-trained vs non-pretrained) 103 | # plt.figure(figsize=(3, 1.67)) 104 | g1 = sns.lineplot( 105 | x=train_loss_epoch_df_pretrained.index, 106 | y="value", 107 | data=train_loss_epoch_df_pretrained, 108 | color="green", 109 | label="Pre-trained", 110 | ax=ax[0], 111 | ) 112 | g2 = sns.lineplot( 113 | x=train_loss_epoch_df_nonpretrained.index, 114 | y="value", 115 | data=train_loss_epoch_df_nonpretrained, 116 | color="red", 117 | label="From scratch", 118 | ax=ax[0], 119 | ) 120 | # plt.title("Train loss on MCT prediction pre-trained vs non-pretrained") 121 | ax[0].set_xlabel("Training Epoch", fontsize=12) 122 | ax[0].set_ylabel("Training MSE", fontsize=12) 123 | ax[0].lines[1].set_linestyle("--") 124 | ticks = [0, 0.25, 0.5, 0.75, 1] 125 | ax[0].yaxis.set_ticks(ticks) 126 | tickLabels = map(str, ticks) 127 | ax[0].yaxis.set_ticklabels(tickLabels) 128 | ax[0].axis(ymin=0, ymax=1) 129 | ax[0].axis(xmin=0, xmax=25) 130 | 131 | # ax[0].legend(fontsize=8) 132 | # plt.savefig("../../figures/MCT_train_loss.pdf") 133 | 134 | ## Val loss (pre-trained vs non-pretrained) 135 | # plt.figure(figsize=(3, 1.67)) 136 | g3 = sns.lineplot( 137 | x=val_loss_epoch_df_pretrained.index, 138 | y="value", 139 | data=val_loss_epoch_df_pretrained, 140 | color="green", 141 | label="Pre-trained", 142 | ax=ax[1], 143 | ) 144 | g4 = sns.lineplot( 145 | x=val_loss_epoch_df_nonpretrained.index, 146 | y="value", 147 | data=val_loss_epoch_df_nonpretrained, 148 | color="red", 149 | label="From scratch", 150 | ax=ax[1], 151 | ) 152 | # plt.title("Val loss on MCT prediction pre-trained vs non-pretrained") 153 | ax[1].set_xlabel("Training Epoch", fontsize=12) 154 | ax[1].set_ylabel("Validation MSE", fontsize=12) 155 | ticks = [0, 0.25, 0.5, 0.75, 1] 156 | ax[1].yaxis.set_ticks(ticks) 157 | tickLabels = map(str, ticks) 158 | ax[1].yaxis.set_ticklabels(tickLabels) 159 | ax[1].lines[1].set_linestyle("--") 160 | ax[1].axis(ymin=0, ymax=1) 161 | ax[1].axis(xmin=0, xmax=25) 162 | # plt.xticks(fontsize=8) 163 | # plt.yticks(fontsize=8) 164 | # ax[1].legend(fontsize=8) 165 | fig.legend( 166 | ["Pre-trained", "From scratch"], 167 | loc="upper right", 168 | bbox_to_anchor=(0.968, 0.973), 169 | ncol=1, 170 | fontsize=12, 171 | ) 172 | ax[1].get_legend().remove() 173 | ax[0].get_legend().remove() 
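# Note (illustrative, not part of the original script): the per-axes legends
# created by seaborn are removed above so that the single figure-level legend
# added via fig.legend() is the only one drawn. The two standalone figures
# below repeat the same plotting pattern; a small helper along these lines
# could avoid the duplication:
#
#   def lineplot_pair(ax, pretrained_df, scratch_df):
#       sns.lineplot(x=pretrained_df.index, y="value", data=pretrained_df,
#                    color="green", label="Pre-trained", ax=ax)
#       sns.lineplot(x=scratch_df.index, y="value", data=scratch_df,
#                    color="red", label="From scratch", ax=ax)
#       ax.lines[1].set_linestyle("--")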
174 | fig.tight_layout()
175 | plt.savefig("../../figures_test/MCT_loss.pdf")
176 |
177 |
178 | fig1, ax1 = plt.subplots(figsize=(5, 5), sharex=True)
179 | g1 = sns.lineplot(
180 | x=train_loss_epoch_df_pretrained.index,
181 | y="value",
182 | data=train_loss_epoch_df_pretrained,
183 | color="green",
184 | label="Pre-trained",
185 | ax=ax1,
186 | )
187 | g2 = sns.lineplot(
188 | x=train_loss_epoch_df_nonpretrained.index,
189 | y="value",
190 | data=train_loss_epoch_df_nonpretrained,
191 | color="red",
192 | label="From scratch",
193 | ax=ax1,
194 | )
195 | ax1.set_xlabel("Training Epoch", fontsize=8)
196 | ax1.set_ylabel("Training MSE", fontsize=8)
197 | ax1.lines[1].set_linestyle("--")
198 | ticks = [0, 0.25, 0.5, 0.75, 1]
199 | ax1.yaxis.set_ticks(ticks)
200 | tickLabels = map(str, ticks)
201 | ax1.yaxis.set_ticklabels(tickLabels)
202 | ax1.axis(ymin=0, ymax=1)
203 | ax1.axis(xmin=0, xmax=25)
204 | fig1.legend(
205 | ["Pre-trained", "From scratch"],
206 | loc="upper right",
207 | bbox_to_anchor=(0.968, 0.973),
208 | ncol=1,
209 | fontsize=8,
210 | )
211 | ax1.get_legend().remove()
212 | fig1.tight_layout()
213 | plt.savefig("../../figures_test/MCT_trainloss.pdf")
214 |
215 | fig2, ax2 = plt.subplots(figsize=(5, 5), sharex=True)
216 | g3 = sns.lineplot(
217 | x=val_loss_epoch_df_pretrained.index,
218 | y="value",
219 | data=val_loss_epoch_df_pretrained,
220 | color="green",
221 | label="Pre-trained",
222 | ax=ax2,
223 | )
224 | g4 = sns.lineplot(
225 | x=val_loss_epoch_df_nonpretrained.index,
226 | y="value",
227 | data=val_loss_epoch_df_nonpretrained,
228 | color="red",
229 | label="From scratch",
230 | ax=ax2,
231 | )
232 | ax2.set_xlabel("Training Epoch", fontsize=8)
233 | ax2.set_ylabel("Validation MSE", fontsize=8)
234 | ticks = [0, 0.25, 0.5, 0.75, 1]
235 | ax2.yaxis.set_ticks(ticks)
236 | tickLabels = map(str, ticks)
237 | ax2.yaxis.set_ticklabels(tickLabels)
238 | ax2.lines[1].set_linestyle("--")
239 | ax2.axis(ymin=0, ymax=1)
240 | ax2.axis(xmin=0, xmax=25)
241 | fig2.legend(
242 | ["Pre-trained", "From scratch"],
243 | loc="upper right",
244 | bbox_to_anchor=(0.968, 0.973),
245 | ncol=1,
246 | fontsize=8,
247 | )
248 | ax2.get_legend().remove()
249 | fig2.tight_layout()
250 | plt.savefig("../../figures_test/MCT_valloss.pdf")
251 |
-------------------------------------------------------------------------------- /workspace/PandasScripts/csvhelper_memento.py: --------------------------------------------------------------------------------
1 | # Original author: Siddhant Ray
2 |
3 | import argparse
4 | import os
5 |
6 | import numpy as np
7 | import pandas as pd
8 |
9 | print("Current directory is:", os.getcwd())
10 | print("Generate combined csv for TCP congestion data")
11 |
12 |
13 | def extract_TTL(text):
14 | list_of_features = text.split()
15 | idx_of_ttl = list_of_features.index("ttl")
16 | ttl = list_of_features[idx_of_ttl + 1]
17 | return ttl
18 |
19 |
20 | def extract_protocol(text):
21 | list_of_features = text.split()
22 | idx_of_protocol = list_of_features.index("protocol")
23 | protocol = list_of_features[idx_of_protocol + 1]
24 | return protocol
25 |
26 |
27 | def rename_flowid(input_text):
28 | return input_text
29 |
30 |
31 | def generate_senders_csv(path, n_senders):
32 | path = path
33 | num_senders = n_senders
34 | sender_num = 0
35 |
36 | df_sent_cols = [
37 | "Timestamp",
38 | "Flow ID",
39 | "Packet ID",
40 | "Packet Size",
41 | "IP ID",
42 | "DSCP",
43 | "ECN",
44 | "TTL",
45 | "Payload Size",
46 | "Proto",
47 | "Source IP",
48 | "Destination IP",
49 | "TCP
Source Port", 50 | "TCP Destination Port", 51 | "TCP Sequence Number", 52 | "TCP Window Size", 53 | "Delay", 54 | "Workload ID", 55 | "Application ID", 56 | "Message ID", 57 | ] 58 | 59 | df_sent_cols_to_drop = [ 60 | 0, 61 | 2, 62 | 4, 63 | 6, 64 | 8, 65 | 10, 66 | 12, 67 | 14, 68 | 16, 69 | 18, 70 | 20, 71 | 22, 72 | 24, 73 | 26, 74 | 28, 75 | 30, 76 | 32, 77 | 34, 78 | 36, 79 | 38, 80 | 40, 81 | ] 82 | 83 | temp_cols = [ 84 | "Timestamp", 85 | "Flow ID", 86 | "Packet ID", 87 | "Packet Size", 88 | "IP ID", 89 | "DSCP", 90 | "ECN", 91 | "TTL", 92 | "Payload Size", 93 | "Proto", 94 | "Source IP", 95 | "Destination IP", 96 | "TCP Source Port", 97 | "TCP Destination Port", 98 | "TCP Sequence Number", 99 | "TCP Window Size", 100 | "Delay", 101 | "Workload ID", 102 | "Application ID", 103 | "Message ID", 104 | ] 105 | 106 | temp = pd.DataFrame(columns=temp_cols) 107 | print(temp.head()) 108 | 109 | # files = ["topo_1.csv", "topo_2.csv", "topo_test_1.csv", "topo_test_2.csv"] 110 | # files = ["topo_more_data_1.csv", "topo_more_data_2.csv", "topo_more_data_3.csv", 111 | # "topo_more_data_4.csv", "topo_more_data_5.csv", "topo_more_data_6.csv"] 112 | 113 | """files = ["small_test_no_disturbance_with_message_ids1.csv", 114 | "small_test_no_disturbance_with_message_ids2.csv", 115 | "small_test_no_disturbance_with_message_ids3.csv", 116 | "small_test_no_disturbance_with_message_ids4.csv", 117 | "small_test_no_disturbance_with_message_ids5.csv", 118 | "small_test_no_disturbance_with_message_ids6.csv", 119 | "small_test_no_disturbance_with_message_ids7.csv", 120 | "small_test_no_disturbance_with_message_ids8.csv", 121 | "small_test_no_disturbance_with_message_ids9.csv", 122 | "small_test_no_disturbance_with_message_ids10.csv"]""" 123 | 124 | """files = ["small_test_one_disturbance_with_message_ids1.csv", 125 | "small_test_one_disturbance_with_message_ids2.csv", 126 | "small_test_one_disturbance_with_message_ids3.csv", 127 | "small_test_one_disturbance_with_message_ids4.csv", 128 | "small_test_one_disturbance_with_message_ids5.csv", 129 | "small_test_one_disturbance_with_message_ids6.csv", 130 | "small_test_one_disturbance_with_message_ids7.csv", 131 | "small_test_one_disturbance_with_message_ids8.csv", 132 | "small_test_one_disturbance_with_message_ids9.csv", 133 | "small_test_one_disturbance_with_message_ids10.csv", 134 | "small_test_one_disturbance_with_message_ids11.csv"]""" 135 | 136 | files = [ 137 | "large_test_disturbance_with_message_ids1.csv", 138 | "large_test_disturbance_with_message_ids2.csv", 139 | "large_test_disturbance_with_message_ids3.csv", 140 | "large_test_disturbance_with_message_ids4.csv", 141 | "large_test_disturbance_with_message_ids5.csv", 142 | "large_test_disturbance_with_message_ids6.csv", 143 | "large_test_disturbance_with_message_ids7.csv", 144 | "large_test_disturbance_with_message_ids8.csv", 145 | "large_test_disturbance_with_message_ids9.csv", 146 | "large_test_disturbance_with_message_ids10.csv", 147 | ] 148 | # files = ["memento_test10.csv", "memento_test20.csv", "memento_test25.csv"] 149 | 150 | for file in files: 151 | 152 | sender_tx_df = pd.read_csv(path + file) 153 | sender_tx_df = pd.DataFrame(np.vstack([sender_tx_df.columns, sender_tx_df])) 154 | sender_tx_df.drop( 155 | sender_tx_df.columns[df_sent_cols_to_drop], axis=1, inplace=True 156 | ) 157 | 158 | sender_tx_df.columns = df_sent_cols 159 | sender_tx_df["Packet ID"].iloc[0] = 0 160 | sender_tx_df["Flow ID"].iloc[0] = sender_tx_df["Flow ID"].iloc[1] 161 | sender_tx_df["IP ID"].iloc[0] = 0 162 | 
sender_tx_df["DSCP"].iloc[0] = 0 163 | sender_tx_df["ECN"].iloc[0] = 0 164 | sender_tx_df["TCP Sequence Number"].iloc[0] = 0 165 | # sender_tx_df["TTL"] = sender_tx_df.apply(lambda row: extract_TTL(row['Extra']), axis = 1) 166 | # sender_tx_df["Proto"] = sender_tx_df.apply(lambda row: extract_protocol(row['Extra']), axis = 1) 167 | sender_tx_df["Flow ID"] = [sender_num for i in range(sender_tx_df.shape[0])] 168 | sender_tx_df["Message ID"].iloc[0] = sender_tx_df["Message ID"].iloc[1] 169 | 170 | df_sent_cols_new = [ 171 | "Timestamp", 172 | "Flow ID", 173 | "Packet ID", 174 | "Packet Size", 175 | "IP ID", 176 | "DSCP", 177 | "ECN", 178 | "Payload Size", 179 | "TTL", 180 | "Proto", 181 | "Source IP", 182 | "Destination IP", 183 | "TCP Source Port", 184 | "TCP Destination Port", 185 | "TCP Sequence Number", 186 | "TCP Window Size", 187 | "Delay", 188 | "Workload ID", 189 | "Application ID", 190 | "Message ID", 191 | ] 192 | sender_tx_df = sender_tx_df[df_sent_cols_new] 193 | 194 | # sender_tx_df.drop(['Extra'],axis = 1, inplace=True) 195 | temp = pd.concat([temp, sender_tx_df], ignore_index=True, copy=False) 196 | # sender_tx_df.drop(['Extra'],axis = 1, inplace=True) 197 | save_name = file.split(".")[0] + "_final.csv" 198 | sender_tx_df.to_csv(path + save_name, index=False) 199 | 200 | # temp.drop(['Extra'],axis = 1, inplace=True) 201 | print(temp.head()) 202 | print(temp.columns) 203 | print(temp.shape) 204 | 205 | return temp 206 | 207 | 208 | def main(): 209 | parser = argparse.ArgumentParser() 210 | parser.add_argument( 211 | "-mod", 212 | "--model", 213 | help="choose CC model for creating congestion", 214 | required=False, 215 | ) 216 | parser.add_argument( 217 | "-nsend", 218 | "--numsenders", 219 | help="choose path for different topologies", 220 | required=False, 221 | ) 222 | args = parser.parse_args() 223 | print(args) 224 | 225 | if args.model == "memento": 226 | path = "results/" 227 | 228 | else: 229 | pass 230 | 231 | n_senders = 1 232 | sender_csv = generate_senders_csv(path, n_senders) 233 | 234 | 235 | if __name__ == "__main__": 236 | main() 237 | -------------------------------------------------------------------------------- /workspace/TransformerModels/arima.py: -------------------------------------------------------------------------------- 1 | # Orignal author: Siddhant Ray 2 | 3 | import argparse 4 | import time as t 5 | import warnings 6 | from datetime import datetime 7 | from math import sqrt 8 | 9 | import matplotlib as mpl 10 | import matplotlib.pyplot as plt 11 | import numpy as np 12 | import pandas as pd 13 | import seaborn as sns 14 | from generate_sequences import generate_ARIMA_delay_data 15 | from sklearn.metrics import mean_squared_error 16 | from statsmodels.tools.sm_exceptions import ConvergenceWarning 17 | from statsmodels.tsa.arima.model import ARIMA 18 | 19 | NUMBOTTLECKS = 1 20 | 21 | 22 | def run_arima(): 23 | delay_data = generate_ARIMA_delay_data(NUMBOTTLECKS) 24 | targets, predictions = [], [] 25 | warnings.simplefilter("ignore", ConvergenceWarning) 26 | 27 | # count = 0 28 | # We want minimum 1023 for the first ARIMA prediction (size of the window) 29 | # Make this 29990 -> 9990 for the 10000 history window ARIMA 30 | for value in range(1023, int(delay_data.shape[0] / 116) + 29990): 31 | 32 | # We want to predict the next value 33 | # Fit the model 34 | model = ARIMA(delay_data[:value], order=(1, 1, 2)) 35 | model_fit = model.fit() 36 | yhat = model_fit.forecast(steps=1) 37 | targets.append(delay_data[value]) 38 | predictions.append(yhat) 39 | 
        # count+=1

    # print(count)
    return targets, predictions


def evaluate_arima(targets, predictions):
    mse = mean_squared_error(targets, predictions)
    squared_error = np.square(targets - predictions)
    return squared_error, mse


if __name__ == "__main__":

    args = argparse.ArgumentParser()
    # store_true avoids the argparse type=bool pitfall (bool("False") is True).
    args.add_argument("--run", action="store_true")
    args = args.parse_args()

    if args.run:

        print("Started ARIMA at:")
        time = datetime.now()
        print(time)

        targets, predictions = run_arima()

        ## MSE calculation
        mse = mean_squared_error(targets, predictions)
        print(mse)

        print("Finished ARIMA at:")
        time = datetime.now()
        print(time)

        # Save the results
        df = pd.DataFrame({"Targets": targets, "Predictions": predictions})
        df.to_csv("memento_data/ARIMA_30000.csv", index=False)

    else:
        print("ARIMA load results from file")
        df = pd.read_csv("memento_data/ARIMA_30000.csv")

        targets = df["Targets"]
        # The saved predictions are stringified pandas Series:
        # recover the float forecast from the string representation.
        predictions = (
            df["Predictions"].str.split(" ").str[4].str.split("\n").str[0].astype(float)
        )

        squared_error, mse = evaluate_arima(targets, predictions)

        df = pd.DataFrame(
            {
                "Squared Error": squared_error,
                "targets": targets,
                "predictions": predictions,
            }
        )
        df.to_csv("memento_data/ARIMA_evaluation_30000.csv", index=False)

        print(df.head())

        print(squared_error.values)

        ## Stats on the squared error
        # Mean squared error
        print(np.mean(squared_error.values), " Mean squared error")
        # Median squared error
        print(np.median(squared_error.values), " Median squared error")
        # 90th percentile squared error
        print(
            np.quantile(squared_error.values, 0.90, method="closest_observation"),
            " 90th percentile squared error",
        )
        # 99th percentile squared error
        print(
            np.quantile(squared_error.values, 0.99, method="closest_observation"),
            " 99th percentile squared error",
        )
        # 99.9th percentile squared error
        print(
            np.quantile(squared_error.values, 0.999, method="closest_observation"),
            " 99.9th percentile squared error",
        )
        # Standard deviation squared error
        print(np.std(squared_error.values), " Standard deviation squared error")

        ## Df rows where the squared error equals a given quantile value
        print(
            df[
                df["Squared Error"]
                == np.quantile(squared_error.values, 0.5, method="closest_observation")
            ],
            "Values at median SE",
        )
        print(
            df[
                df["Squared Error"]
                == np.quantile(squared_error.values, 0.90, method="closest_observation")
            ],
            "Values at 90th percentile SE",
        )
        print(
            df[
                df["Squared Error"]
                == np.quantile(squared_error.values, 0.99, method="closest_observation")
            ],
            "Values at 99th percentile SE",
        )
        print(
            df[
                df["Squared Error"]
                == np.quantile(squared_error.values, 0.999, method="closest_observation")
            ],
            "Values at 99.9th percentile SE",
        )
        print(
            df[
                df["Squared Error"]
                == np.quantile(squared_error.values, 0.9999, method="closest_observation")
            ],
            "Values at 99.99th percentile SE",
        )

        # Plot the index vs squared error
        # Set figure size
        sns.set_theme("paper", "whitegrid", font_scale=1.5)
        mpl.rcParams.update(
            {
                "text.usetex": True,
                "font.family": "serif",
                "text.latex.preamble": r"\usepackage{amsmath,amssymb}",
                "lines.linewidth": 2,
                "lines.markeredgewidth": 0,
                "scatter.marker": ".",
                "scatter.edgecolors": "none",
                # Set image quality and reduce whitespace around saved figure.
                "savefig.dpi": 300,
                "savefig.bbox": "tight",
                "savefig.pad_inches": 0.01,
            }
        )

        # Plot the index vs squared error
        plt.figure(figsize=(5, 5))
        sns.lineplot(x=df.index, y=df["Squared Error"], color="red")
        plt.xlabel("History Length")
        plt.ylabel("Squared Error")
        # plt.title("Squared Error on predictions vs History Length upto 10000")
        plt.xlim(1023, 10000)
        # place legend to the right, bbox is the box around the legend
        # plt.legend(loc='upper right', bbox_to_anchor=(0.67, 1.02), ncol=1)
        plt.savefig("SE_trend_arima_10000.pdf")

        ## Do the plots over a loop of xlims
        xlims = [0, 6000, 12000, 18000, 24000, 30000]
        for idx_xlim in range(len(xlims) - 1):
            plt.figure(figsize=(10, 6))
            sns.lineplot(
                x=df.index, y=df["Squared Error"], color="red", label="Squared Error"
            )
            # label axes
            plt.xlabel("History Length")
            plt.ylabel("Squared Error")
            # set xlim
            plt.xlim(xlims[idx_xlim], xlims[idx_xlim + 1])
            plt.title(
                "Squared Error trend for xlims "
                + str(xlims[idx_xlim])
                + " to "
                + str(xlims[idx_xlim + 1])
            )
            plt.savefig(
                "SE_trend_arima_xlim_"
                + str(xlims[idx_xlim])
                + "_"
                + str(xlims[idx_xlim + 1])
                + ".pdf"
            )
--------------------------------------------------------------------------------
/report/introduction.tex:
--------------------------------------------------------------------------------
\chapter{Introduction}
\label{cha:introduction}

Learning the fundamental behaviour of network data from packet traces is an extremely hard problem. While machine learning (ML) algorithms have proven to be an efficient way of learning from raw data, adapting such algorithms to the general network domain has been so hard that the community rarely attempts it. In this project, we argue that all is not lost: using specific machine learning architectures like the Transformer, it is indeed possible to develop methods that learn from such data in a general manner.

\section{Motivation}
\label{sec:motivation}

Modelling network dynamics is a \emph{sequence modelling} problem. From a sequence of past packets, the goal is to estimate the current state of the network (\eg Is there congestion? Will the packet be dropped?) and then to predict the state's evolution and the fate of future traffic. Concretely, this also lets us decide which action to take next, \eg should the next packet be put on a different path? Owing to the successes of ML in learning from data, it is becoming increasingly popular to use such algorithms for this modelling problem, but the task is notoriously complex.
There has been some success in using ML for specific applications in networks, including congestion control\cite{classic,jayDeepReinforcementLearning2019,dynamic,exmachina},
video streaming\cite{oboe,maoNeuralAdaptiveVideo2017,puffer},
traffic optimization\cite{auto},
routing\cite{learnroute},
flow size prediction\cite{flow,onlineflow},
MAC protocol optimization\cite{oneproto,heterowire},
and network simulation\cite{zhangMimicNetFastPerformance2021}; however, a good framework for general-purpose learning on network data still does not exist.

Today's ML models are trained for specific tasks and do not generalize well; \ie they often fail to deliver outside of their original training environments\cite{puffer, datadriven, blackbox}. Because of this, generalizing to different tasks is not even considered. Recent work argues that, rather than hoping for generalization, one obtains better results by training in-situ, \ie using data collected in the deployment environment\cite{puffer}.
Today, we tend to design and train models from scratch using model-specific datasets~(Figure \ref{fig:vision}, top). This process is arduous, expensive, repetitive and time-consuming. We redo everything from scratch in every training process and never reuse common training objectives. Moreover, the growing resource requirements to even attempt training these models are increasing inequalities in networking research and, ultimately, hindering collective progress.

ML algorithms (especially certain deep learning architectures) have shown generalization capabilities\cite{generalizingdnn} in other fields: an initial \emph{pre-training} phase trains models on a large dataset in a task-agnostic manner, and a subsequent \emph{fine-tuning} phase refines the models on smaller, task-specific datasets. This allows the general pre-trained model to be reused across multiple tasks, making it resource- and time-efficient. This kind of transfer learning or generalization\cite{transferng} is enabled by using the pre-training phase to learn the overall structure in the data, followed by the fine-tuning phase to focus on learning more task-specific features. As long as there is a certain amount of similarity in the data's structure across pre-training and fine-tuning, this method can be extremely effective.

\section{Tasks, Goals and Challenges}
\label{sec:task}

Inspired by ML models that generalize on data in several other fields, it should be possible to design a similar model for learning and generalization in networking. Even if networking contexts (topology, network configuration, traffic, etc.) can be very diverse, the underlying dynamics of networks remain essentially the same; \eg when buffers fill up, queuing disciplines delay or drop packets. These dynamics can be learned with ML and should generalize; it should not be necessary to re-learn this fundamental behaviour every time a new model is trained. Building such a generic model for network data is challenging, but the effort would benefit the entire community. Starting from such a model, one would only need to collect a small task-specific dataset to fine-tune it (Figure \ref{fig:vision}, bottom), assuming that the pre-trained model generalizes well.
This could even allow modelling rare events (\eg drops after link failures) for which only little data is available in real network traces today, owing to their infrequent occurrence.

\begin{figure}
    \centering
    \includegraphics[scale=1.3]{figures/vision}
    \caption{Can we collectively learn general network traffic dynamics \emph{once} and focus on task-specific data collecting and learning for \emph{many future models?} Credits: Alexander Dietmüller}
    \label{fig:vision}
\end{figure}

While research shows some generalization in specific networking contexts\cite{jayDeepReinforcementLearning2019}, truly ``generic'' models, which are able to perform well on a wide range of tasks and networks, remain unavailable. This is because we usually do not train on datasets large enough to allow generalization; we only train on smaller, task-specific datasets. For a long time, sequence modelling at scale was infeasible even with dedicated architectures such as recurrent neural networks (RNNs), as they can only handle short sequences and are inefficient to train\cite{factor}. However, a few years ago, a new architecture for sequence modelling was proposed: the \emph{Transformer}\cite{vaswaniAttentionAllYou2017}, which proved to be ground-breaking. This architecture is designed to train efficiently, enabling learning from massive datasets and unprecedented generalization. In a \emph{pre-training phase}, the Transformer learns sequential ``structures'', \eg the structure of a language from a large corpus of texts. Then, in a much quicker \emph{fine-tuning phase}, the final stages of the model are adapted to a specific prediction task (\eg text sentiment analysis). Today, Transformers are among the state-of-the-art in natural language processing (NLP\cite{recentnlp}) and computer vision (CV\cite{cvsurvey}).

The generalization power of the Transformer stems from its ability to learn ``contextual information'', using context from the neighbouring elements in a sequence for a given element in the same sequence\cite{devlinBERTPretrainingDeep2019}.\footnote{Consider the word \emph{left} in two different contexts: I \emph{left} my book on the table. Turn \emph{left} at the next crossing. The transformer outputs for the word \emph{left} are different for each sequence as they encode the word's context.}
We can draw parallels between networking and NLP. In isolation, packet metadata (headers, etc.) provides limited insight into the network state; we also need the \emph{context}, which we can get from the recent packet history.\footnote{Increasing latency over history indicates congestion.} Based on these parallels, we propose that a Transformer-based architecture can also be designed to generalise on network packet data.

Naively transposing NLP or CV transformers to networking fails, as the fundamental structure and biases\cite{biases} in the data are different. Generalizing on complex interactions in networks is not a trivial problem. We expect the following challenges for our Transformer design.

\begin{itemize}
    \item
          How do we adapt Transformers for learning on networking data?
    \item
          How do we assemble a dataset large and diverse enough to allow useful generalization?
    \item
          Which pre-training task would allow the model to generalize, and how far can we push generalization?
    \item
          How do we scale such a Transformer to arbitrarily large amounts of network data from extremely diverse environments?
\end{itemize}

\section{Overview}
\label{sec:overview}

In this thesis, we present a Network Traffic Transformer (NTT), which serves as a first step towards a Transformer model for learning on network packet data. We outline the following main technical contributions:

\begin{itemize}
    \item We present the required background on Transformers, which guides our design, in Chapter \ref{cha:background}.
    \item We present the detailed architectural design ideas behind our proof-of-concept NTT in Chapter \ref{cha:design}.
    \item We present a detailed evaluation of pre-training and fine-tuning our first NTT models in Chapter \ref{cha:evaluation}.
    \item We present several future research directions which can be used to improve our NTT in Chapter \ref{cha:outlook}.
    \item We summarise our work and provide some concluding remarks in Chapter \ref{cha:summary}.
    \item Supplementary technical details and supporting results are presented in Appendices \ref{app:a}, \ref{app:b} and \ref{app:c}.
\end{itemize}

Part of the work conducted during this thesis has been submitted as the following paper\cite{newhope} to HotNets '22; hence, some parts of the thesis build upon work done in writing the paper.
--------------------------------------------------------------------------------
/workspace/TransformerModels/mct_test_plots.py:
--------------------------------------------------------------------------------
# Original author: Siddhant Ray

from csv import reader

import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FormatStrFormatter
from tbparse import SummaryReader

mct_log_dir_pretrained0 = "../../logs/finetune_mct_logs/"
reader_pretrained0 = SummaryReader(mct_log_dir_pretrained0)
df_pretrained0 = reader_pretrained0.scalars


train_loss_epoch_df_pretrained0 = df_pretrained0[
    df_pretrained0["tag"] == "Avg loss per epoch"
]
train_loss_epoch_df_pretrained0.reset_index(inplace=True, drop=True)

val_loss_epoch_df_pretrained0 = df_pretrained0[df_pretrained0["tag"] == "Val loss"]
val_loss_epoch_df_pretrained0.reset_index(inplace=True, drop=True)


mct_log_dir_pretrained1 = "../../logs/finetune_mct_logs3/"
reader_pretrained1 = SummaryReader(mct_log_dir_pretrained1)
df_pretrained1 = reader_pretrained1.scalars

train_loss_epoch_df_pretrained1 = df_pretrained1[
    df_pretrained1["tag"] == "Avg loss per epoch"
]
train_loss_epoch_df_pretrained1.reset_index(inplace=True, drop=True)

val_loss_epoch_df_pretrained1 = df_pretrained1[df_pretrained1["tag"] == "Val loss"]
val_loss_epoch_df_pretrained1.reset_index(inplace=True, drop=True)

mct_log_dir_pretrained2 = "../../logs/finetune_mct_logs4/"
reader_pretrained2 = SummaryReader(mct_log_dir_pretrained2)
df_pretrained2 = reader_pretrained2.scalars

train_loss_epoch_df_pretrained2 = df_pretrained2[
    df_pretrained2["tag"] == "Avg loss per epoch"
]
train_loss_epoch_df_pretrained2.reset_index(inplace=True, drop=True)

val_loss_epoch_df_pretrained2 = df_pretrained2[df_pretrained2["tag"] == "Val loss"]
val_loss_epoch_df_pretrained2.reset_index(inplace=True, drop=True)
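# NOTE (added for clarity): tbparse's SummaryReader exposes TensorBoard event
# files as a DataFrame (reader.scalars) with "tag" and "value" columns. Each
# log directory above corresponds to one fine-tuning run (one masking
# strategy), and the per-epoch training and validation losses are filtered
# out by tag for plotting below.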
# Print df shape
print(train_loss_epoch_df_pretrained2.shape)
print(val_loss_epoch_df_pretrained2.shape)

sns.set_theme("paper", "whitegrid", font_scale=1.5)
mpl.rcParams.update(
    {
        "text.usetex": True,
        "font.family": "serif",
        "text.latex.preamble": r"\usepackage{amsmath,amssymb}",
        "lines.linewidth": 2,
        "lines.markeredgewidth": 0,
        "scatter.marker": ".",
        "scatter.edgecolors": "none",
        # Set image quality and reduce whitespace around saved figure.
        "savefig.dpi": 300,
        "savefig.bbox": "tight",
        "savefig.pad_inches": 0.01,
    }
)

# Make subplots for train and val losses
fig, ax = plt.subplots(2, figsize=(5, 5), sharex=True)
t0 = sns.lineplot(
    x=train_loss_epoch_df_pretrained0.index,
    y="value",
    data=train_loss_epoch_df_pretrained0,
    color="blue",
    label="Fixed Mask Last",
    ax=ax[0],
)
t1 = sns.lineplot(
    x=train_loss_epoch_df_pretrained1.index,
    y="value",
    data=train_loss_epoch_df_pretrained1,
    color="green",
    label="Last 16",
    ax=ax[0],
)
t2 = sns.lineplot(
    x=train_loss_epoch_df_pretrained2.index,
    y="value",
    data=train_loss_epoch_df_pretrained2,
    color="red",
    label="Last 32",
    ax=ax[0],
)
# Label plot
ax[0].set_xlabel("Training Epoch", fontsize=10)
ax[0].set_ylabel("Training MSE", fontsize=10)
ax[0].lines[0].set_linestyle("dotted")
ax[0].lines[1].set_linestyle("--")
ax[0].lines[2].set_linestyle("-.")
ticks = [0, 0.25, 0.5, 0.75, 1]
ax[0].yaxis.set_ticks(ticks)
tickLabels = map(str, ticks)
ax[0].yaxis.set_ticklabels(tickLabels)
ax[0].axis(ymin=0, ymax=1)
ax[0].axis(xmin=0, xmax=18)

v0 = sns.lineplot(
    x=val_loss_epoch_df_pretrained0.index,
    y="value",
    data=val_loss_epoch_df_pretrained0,
    color="blue",
    label="Fixed Mask Last",
    ax=ax[1],
)
v1 = sns.lineplot(
    x=val_loss_epoch_df_pretrained1.index,
    y="value",
    data=val_loss_epoch_df_pretrained1,
    color="green",
    label="Last 16",
    ax=ax[1],
)
v2 = sns.lineplot(
    x=val_loss_epoch_df_pretrained2.index,
    y="value",
    data=val_loss_epoch_df_pretrained2,
    color="red",
    label="Last 32",
    ax=ax[1],
)

# Label plot
ax[1].set_xlabel("Training Epoch", fontsize=10)
ax[1].set_ylabel("Validation MSE", fontsize=10)
ax[1].lines[0].set_linestyle("dotted")
ax[1].lines[1].set_linestyle("--")
ax[1].lines[2].set_linestyle("-.")
ticks = [0, 0.25, 0.5, 0.75, 1]
ax[1].yaxis.set_ticks(ticks)
tickLabels = map(str, ticks)
ax[1].yaxis.set_ticklabels(tickLabels)
ax[1].axis(ymin=0, ymax=1)
ax[1].axis(xmin=0, xmax=18)

fig.legend(
    ["Fixed Mask Last", "Last 16", "Last 32"],
    loc="upper right",
    bbox_to_anchor=(0.968, 0.958),
    ncol=1,
    fontsize=10,
)
ax[1].get_legend().remove()
ax[0].get_legend().remove()
fig.tight_layout()
fig.savefig("../../figures_test/finetune_mct_loss_comparison.pdf")

## Create the same for aggregated masking also
# Create a new dataframe for aggregated masking

mct_log_dir_pretrained3 = "../../logs/finetune_mct_logs5/"
reader_pretrained3 = SummaryReader(mct_log_dir_pretrained3)
df_pretrained3 = reader_pretrained3.scalars
train_loss_epoch_df_pretrained3 = df_pretrained3[
    df_pretrained3["tag"] == "Avg loss per epoch"
]
train_loss_epoch_df_pretrained3.reset_index(inplace=True, drop=True)

val_loss_epoch_df_pretrained3 = df_pretrained3[df_pretrained3["tag"] == "Val loss"]
val_loss_epoch_df_pretrained3.reset_index(inplace=True, drop=True)

mct_log_dir_pretrained4 = "../../logs/finetune_mct_logs6/"
reader_pretrained4 = SummaryReader(mct_log_dir_pretrained4)
df_pretrained4 = reader_pretrained4.scalars

train_loss_epoch_df_pretrained4 = df_pretrained4[
    df_pretrained4["tag"] == "Avg loss per epoch"
]
train_loss_epoch_df_pretrained4.reset_index(inplace=True, drop=True)

val_loss_epoch_df_pretrained4 = df_pretrained4[df_pretrained4["tag"] == "Val loss"]
val_loss_epoch_df_pretrained4.reset_index(inplace=True, drop=True)

mct_log_dir_pretrained5 = "../../logs/finetune_mct_logs7/"
reader_pretrained5 = SummaryReader(mct_log_dir_pretrained5)
df_pretrained5 = reader_pretrained5.scalars

train_loss_epoch_df_pretrained5 = df_pretrained5[
    df_pretrained5["tag"] == "Avg loss per epoch"
]
train_loss_epoch_df_pretrained5.reset_index(inplace=True, drop=True)

val_loss_epoch_df_pretrained5 = df_pretrained5[df_pretrained5["tag"] == "Val loss"]
val_loss_epoch_df_pretrained5.reset_index(inplace=True, drop=True)


## Plot the loss for aggregated masking
fig, ax = plt.subplots(2, figsize=(5, 5), sharex=True)
t0 = sns.lineplot(
    x=train_loss_epoch_df_pretrained0.index,
    y="value",
    data=train_loss_epoch_df_pretrained0,
    color="blue",
    label="Fixed Mask Last",
    ax=ax[0],
)
t1 = sns.lineplot(
    x=train_loss_epoch_df_pretrained4.index,
    y="value",
    data=train_loss_epoch_df_pretrained4,
    color="green",
    label="Choose Enc. State",
    ax=ax[0],
)
t2 = sns.lineplot(
    x=train_loss_epoch_df_pretrained3.index,
    y="value",
    data=train_loss_epoch_df_pretrained3,
    color="red",
    label="Choose Agg. Level",
    ax=ax[0],
)

# Label plot
ax[0].set_xlabel("Training Epoch", fontsize=10)
ax[0].set_ylabel("Training MSE", fontsize=10)
ax[0].lines[0].set_linestyle("dotted")
ax[0].lines[1].set_linestyle("--")
ax[0].lines[2].set_linestyle("-.")
ticks = [0, 0.25, 0.5, 0.75, 1]
ax[0].yaxis.set_ticks(ticks)
tickLabels = map(str, ticks)
ax[0].yaxis.set_ticklabels(tickLabels)
ax[0].axis(ymin=0, ymax=1)
ax[0].axis(xmin=0, xmax=13)

v0 = sns.lineplot(
    x=val_loss_epoch_df_pretrained0.index,
    y="value",
    data=val_loss_epoch_df_pretrained0,
    color="blue",
    label="Fixed Mask Last",
    ax=ax[1],
)
v1 = sns.lineplot(
    x=val_loss_epoch_df_pretrained4.index,
    y="value",
    data=val_loss_epoch_df_pretrained4,
    color="green",
    label="Choose Enc. State",
    ax=ax[1],
)
v2 = sns.lineplot(
    x=val_loss_epoch_df_pretrained3.index,
    y="value",
    data=val_loss_epoch_df_pretrained3,
    color="red",
    label="Choose Agg. Level",
    ax=ax[1],
)

# Label plot
ax[1].set_xlabel("Training Epoch", fontsize=10)
ax[1].set_ylabel("Validation MSE", fontsize=10)
ax[1].lines[0].set_linestyle("dotted")
ax[1].lines[1].set_linestyle("--")
ax[1].lines[2].set_linestyle("-.")
ticks = [0, 0.25, 0.5, 0.75, 1]
ax[1].yaxis.set_ticks(ticks)
tickLabels = map(str, ticks)
ax[1].yaxis.set_ticklabels(tickLabels)
ax[1].axis(ymin=0, ymax=1)
ax[1].axis(xmin=0, xmax=18)

# Make legend
fig.legend(
    ["Fixed Mask Last", "Choose Enc. State", "Choose Agg. Level"],
    loc="upper right",
    bbox_to_anchor=(0.968, 0.958),
    ncol=1,
    fontsize=10,
)
ax[1].get_legend().remove()
ax[0].get_legend().remove()
fig.tight_layout()
fig.savefig("../../figures_test/finetune_mct_loss_comparison_agg.pdf")
--------------------------------------------------------------------------------
/report/appendix.tex:
--------------------------------------------------------------------------------
\chapter{NTT training details}
\label{app:a}

We present here further specifics about the choices made while pre-training and fine-tuning our NTT architecture, together with the common hyper-parameters used for both. Within the scope of our project, we do not perform a search for the best-performing hyper-parameters. Our objective is to explore what can be learnt, not to achieve state-of-the-art results. Our hyper-parameters are chosen based on Transformers trained in other domains and, with some tweaking, work reasonably well for our use case.

We implement our NTT in Python, using the PyTorch\cite{pytorch} and PyTorch Lightning\cite{pytorchlit} libraries, in a Debian $10$ environment. For our training process, we use NVIDIA\textsuperscript{\textregistered} Titan Xp GPUs, with $12$ GB of GPU memory. For pre-training and fine-tuning on the full datasets, we use 2 GPUs with PyTorch's DataParallel implementation. For pre-processing our data, generating our input sliding window sequences and converting our data into training, validation and test batches, we use $4$ Intel\textsuperscript{\textregistered} $2.40$ GHz Dual Tetrakaideca-Core Xeon E5-2680 v4 CPUs and between $60-80$ GB of RAM.

\begin{table}[htbp]
    \centering
    \begin{tabular}{ l c }
        \toprule
        \emph{Hyper-parameter} & Value \\

        \midrule
        Learning rate & $1\times10^{-4}$ \\
        Weight decay & $1\times10^{-5}$ \\
        \# of attention heads & $8$ \\
        \# of Transformer layers & $6$ \\
        Batch size & $64$ \\
        Dropout prob. & $0.2$ \\
        Pkt. embedding dim. & $120$ \\

        \bottomrule

    \end{tabular}
    \caption{Hyper-parameters for NTT training}
    \label{app:table1}
\end{table}

We refer to Table \ref{app:table1} to discuss our training hyper-parameters. The number of attention heads refers to the number of attention matrices used inside the Transformer encoder layers, which are processed in parallel. In our NTT architecture (Figure \ref{fig:ntt}), we have $691K$ trainable parameters from the embedding and aggregation layers, $3.3M$ trainable parameters from the Transformer encoder layers and $163K$ trainable parameters from linear layers in the decoder. We use $4$ linear layers in the decoder, with activation and layer-weight normalisation\cite{layernorm} between each linear layer.
We also use a layer-weight normalisation layer as a pre-normalising layer on the output of the embedding and aggregation. During training, we use a dropout probability\cite{dropout} of $0.2$ and a weight decay\cite{weightdecay} over the weights (not the biases)\cite{goodfellowDeepLearning2016} in order to prevent overfitting. We use a batch size of $64$ to reduce the noise during our training process.

We use the ADAM\cite{adam} optimiser with $\beta_1=0.9$, $\beta_2=0.98$ and $\epsilon=1\times10^{-9}$ for our training, and the Huber loss\cite{huber} (\ref{eq:huber}) as the training loss, as it is neither overly sensitive to outliers nor ignores their effects entirely. The loss function is computed on the residual, \ie the difference between the observed and predicted values, $y$ and $f(x)$ respectively.
\begin{equation}
    L_\delta(y, f(x))=
    \begin{cases}
        \frac{1}{2}(y - f(x))^2                                     & \text{for } \lvert y - f(x) \rvert \leq \delta, \\
        \delta \cdot (\lvert y - f(x) \rvert - \frac{1}{2}\delta) & \text{otherwise}
    \end{cases}
    \label{eq:huber}
\end{equation}

We use a warm-up scheduler over our base learning rate (lr) of $1\times10^{-4}$, as proposed in the original Transformer paper\cite{vaswaniAttentionAllYou2017}. The governing equation is (\ref{eq:lr}):

\begin{equation}
    lr = d_{model}^{-0.5} \cdot \min{(step\_num^{-0.5},\ step\_num \cdot warmup\_steps^{-1.5})}
    \label{eq:lr}
\end{equation}

This corresponds to increasing the learning rate linearly for the first \emph{warmup\_steps} training steps, and decreasing it thereafter proportionally to the inverse square root of the step number. We used $warmup\_steps = 4000$, our pre-training data has ${\sim}17K$ steps, and our $d_{model}$ is $120$. As a worked example, evaluating (\ref{eq:lr}) at the peak, \ie at $step\_num = warmup\_steps = 4000$, gives $120^{-0.5} \cdot 4000^{-0.5} \approx 1.4\times10^{-3}$, after which the rate decays as $step\_num^{-0.5}$.


\chapter{Learning with multiple decoders}
\label{app:b}

In Section \ref{ssec:impptt}, we evaluated the idea of masking different positions in the input sequence in order to improve the pre-training phase of the NTT. During this, we realised that with a variable masking approach, it is not always feasible for a single set of linear layers to effectively act as a combined MLP decoder across all levels of aggregation. We present some further results on using different instances of identical MLP decoders for the different levels of aggregation, which arise from selecting the packet delays to be masked in different ways during pre-training. We summarise our findings in Table \ref{app:table2}.

\begin{table}[htbp]
    \centering
    \begin{tabular}{ l c }
        \toprule
        \emph{all values $\times10^{-3}$} & Pre-training \\
        (Masking + MLP instance) & (Delay) \\

        \midrule
        \em{NTT: Chosen mask} & \\
        \smallindent From encoded states + 1 MLP decoder & 0.063 \\
        \smallindent From encoded states + 3 MLP decoders & 0.070 \\
        \smallindent From aggregation levels + 1 MLP decoder & 1.31 \\
        \smallindent From aggregation levels + 3 MLP decoders & 0.087 \\


        \bottomrule

    \end{tabular}
    \caption{MSE on delay prediction across NTT with multiple instances of linear MLP decoders}
    \label{app:table2}
\end{table}

Based on our experiments, it is evident that we need different instances of MLP decoders when we mask across different levels of aggregation with varying masking frequencies.
When we choose from \emph{the encoded states}, we pick the packets that are aggregated twice only $1/48$ of the time and the packets that are aggregated once $15/48$ of the time; hence, we mainly pick the non-aggregated packets, \ie $32/48$ of the time. In this scenario, it does not hurt performance to use a single set of linear layers to extract the learnt behaviour across levels of aggregation, as the non-aggregated packets are chosen most of the time. Here, using a single MLP decoder versus $3$ MLP decoders yields very similar performance.

When we choose from \emph{aggregation levels}, the situation is very different, as every type of aggregation is chosen $1/3$ of the time. Effectively, this means that $1/3$ of the time we mask $1/2$ of the packet delays that are aggregated twice. In this scenario, a single MLP decoder cannot effectively learn across levels of aggregation, all of which are chosen frequently; here, using multiple MLP decoders helps the learning process. A priori, we do not know what kind of architecture works best for this, so we start with the simplest model, using three identical sets of linear layers to match the independent levels of aggregation.

Since different aggregation schemes may be used in future versions of the NTT, different numbers of MLP decoders will be needed. One can be certain that, with increasing complexity and aggregation, more complexity will also be required in the decoder architecture to learn new kinds of information.

\chapter{Delay distributions on the multi-path topology}
\label{app:c}

In this section, we present further insights into the individual delay distributions seen on each end-to-end path (from the sending sources to each individual receiver), as shown in Figure \ref{fig:topo_ft_big}. A priori, we should not assume that increasing the complexity of traffic flows on the network changes the traffic distributions on each individual path. Our NTT learns dynamics only on a single path during pre-training. We hypothesise that, with fine-tuning, this can generalize to topologies with different paths and different dynamics. We should therefore check the case in which the delay dynamics on different paths (affected by queueing delay and link delay) are different.

\begin{figure*}[!h]
    \begin{center}
        \includegraphics[scale=0.8]{figures/delay_Receivers.pdf}
        \caption{Comparing delay CDFs across multiple paths}
        \label{fig:multipatht}
    \end{center}
\end{figure*}

Comparing the CDFs of the packet delays at all receivers in Figure \ref{fig:multipatht}, we see that the network dynamics change considerably as we increase the number of paths in the network. We observe that the dynamics change a lot between the path to Receiver 1 and the path to Receiver 2, but, in our setup, not so much between the paths to Receiver 2 and Receiver 3. We can clearly see from the experimental results in Section \ref{ssec:comptop} that the pre-trained NTT generalizes to new topologies with varying dynamics across different paths. However, to test the NTT more robustly, we evidently need to fine-tune on multiple topologies with different path dynamics, in order to check the true extent of generalization. Such an evaluation is not in the current scope of this thesis, and we leave it to future experiments.

\chapter{Declaration of Originality}
\label{app:d}

\begin{figure*}[!h]
    \begin{center}
        \includegraphics[scale=0.8]{figures/declaration.pdf}
        \label{fig:dec}
    \end{center}
\end{figure*}

--------------------------------------------------------------------------------
/report/outlook.tex:
--------------------------------------------------------------------------------
\chapter{Outlook}
\label{cha:outlook}

The evaluation of our current NTT design is extremely promising and indicates that such an architecture can be built for learning and generalizing on network dynamics. However, the process does not end here: there is huge scope for future research, and there are multiple directions in which the NTT can be improved. We present some ideas which we feel can be the next steps for these improvements.

\section{Learning on bigger topologies}
\label{sec:biggertopos}

We evaluate our NTT on simple topologies in this project, as we are still in the initial phase of building such an architecture. Real networks are undeniably much more complex than our current training environment. These networks (\eg a large datacenter or an ISP) are larger and have complex connections, with traffic flowing along multiple paths; there are many different applications, transport protocols, queueing disciplines, etc., and the interactions across these lead to extremely complex network dynamics. Additionally, there are many more fine-tuning tasks to consider, \eg flow classification for security or anomaly detection. In Section \ref{ssec:comptop}, we evaluate pre-training on a small topology and, to an extent, its generalising behaviour on a larger topology, based on the learnt network dynamics of bottleneck queues. However, this merely scratches the surface and does not match the scale and complexity of dynamics on real networks.

Apart from this, our current network traces for training and evaluation are only drawn from a small subset of Internet traffic distributions\cite{homa}. Testing our NTT prototype in real, diverse environments and with multiple fine-tuning tasks will provide many more invaluable insights. We can not only better understand the strengths and weaknesses of our NTT architecture, but also gain insight into the ``fundamental learnability'' of network dynamics. A first step can be conducting more extensive, more complex simulations and analysing real-world datasets such as Caida\cite{caida}, M-LAB\cite{mlab}, Crawdad\cite{crawdad} or Rocketfuel\cite{rocketfuel}. This kind of evaluation will provide much deeper insights into the generalization of learnt network dynamics.

\emph{How does the NTT hold up with more diverse environments and fine-tuning tasks? Which aspects of network dynamics are easy to generalize to, and which are difficult?}

\section{Learning more complex features}
\label{sec:compftt}

More diverse environments, \ie with more diverse network dynamics, also present the opportunity to improve our NTT architecture. The better the learning of general network dynamics during pre-training, the more other models can benefit from the NTT during fine-tuning.
The directions for improvement we see here are:

\begin{itemize}
    \item \emph{Better sequence aggregation:} We base our current NTT's aggregation levels on the number of in-flight packets, \ie whether packets in the sequence may share their fate, determined by buffer sizes in our experiments. Evaluations show that the hypothesis holds: the further apart packets are, the less likely they are to share fate. Such packets are aggregated much more, given that their individual contribution to the state of the current packet is much lower. Currently, we believe matching individual aggregation levels to common buffer sizes (\eg flow and switch buffers) may be beneficial. Much more research is still needed to put this hypothesis to the test and determine the best sequence sizes and aggregation levels for future NTT versions.

    \item \emph{Multiple protocol packet data:} So far, we have not used network traces which combine different transport protocols or contain network prioritisation of different traffic classes, and thus have not used any specific packet header information in our features for learning. Considering such header information might be essential to learn the behavioural differences between such categories. Raw packet headers can be challenging to provide as inputs to an ML model, as they may appear in many combinations and contain values that are difficult to learn, like IP addresses\cite{zhangMimicNetFastPerformance2021}. Some research ideas from the network verification community on header space analysis\cite{kazemianHeaderSpaceAnalysis} may provide valuable insights on header representations and potential first steps in this direction.

    \item \emph{Learning path identification:} Currently, we do not have a concrete method for the NTT to learn the differences between multiple possible paths in the network, a feature which will become significant for learning on larger topologies. Evaluation on topologies with multiple paths (in Section \ref{ssec:comptop}) demonstrated that such a distinction is indeed required. In the initial experiments, this was solved by providing a unique identifier (Receiver ID) as an additional feature in the input feature set. While this is a quick and simple fix, it might not be an optimal method to scale to larger topologies. Additionally, such a simple identifier does not provide insights about hierarchical overlap (\eg subnets), which might be required for more efficient learning. Networks today derive the required path information from the routing function (\eg shortest path, prefix matching, subnetting, etc.). While it might be possible for the NTT to ``learn'' the path information and the routing function by giving it features like the prefix, subnet mask, etc., this might be suboptimal, hard and unnecessary. Coming up with better ways to learn path information is a possible next step we see to improve the NTT.

    \item \emph{Dealing better with rare network events:} While in several fields of machine learning it is enough to learn behaviour ``on average'', this does not translate to the domain of network data. Less frequently occurring events in networks (\eg packet drops from link failures) can lead to significant information loss. This kind of behaviour is hard for machine learning algorithms to learn, as these events have relatively little representation in the training data, but it is essential that our NTT learns the outcomes of these events to an extent.
          One possible step is to collect telemetry data like packet drops or buffer occupancy as features. This may allow the model to learn the behaviour of networks better, though it will be hard due to the sparse nature of such data, and future research is needed to solve this in an efficient way.
\end{itemize}
\vspace{-0.3cm}
\emph{How can we improve the NTT design to learn efficiently from diverse environments? How can we deal with an information mismatch between environments?}


\section{Collaborative pre-training}
\label{sec:collab}

Transformers in NLP and CV have only been shown to truly outshine their competition when pre-trained with massive amounts of data. We envision that this would require previously unseen collaboration across the industry. We see two main challenges:
\begin{itemize}
    \item \emph{Training data volume:} Training an NTT to learn complex dynamics at the scale of large topologies will require an extremely large amount of data. Given the possible differences across networks, the pre-training data will need to be representative of them, which will require a huge number of network traces; no single organisation might have access to these, so collaboration between several of them will be required.
    \item \emph{Data privacy:} Due to privacy concerns, it might not be possible to share a lot of this data publicly. Moreover, several organisations might be unwilling to share their data anyway, as it would cost them their competitive advantage in the industry.
\end{itemize}


We see some possible solutions to these problems. ML models are known to effectively compress data. As an example, GPT-3\cite{brownLanguageModelsAre2020} is one of the largest Transformer models for text data to date and consists of 175 billion parameters, or roughly 350 Gigabytes; however, it contains information from over 45 Terabytes of text data. Another huge model is Data2Vec\cite{baevskiData2vecGeneralFramework2022}, a general-purpose Transformer which learns representations for text, images and audio using a task-agnostic training approach with knowledge distillation\cite{kd}, but which is trained on trillions of datapoints. Sharing a pre-trained model is much more feasible than sharing all the underlying data; it also reduces training time and the redundancy of re-training for already established results.
Furthermore, sharing models instead of data could overcome privacy barriers via federated learning\cite{kairouzAdvancesOpenProblems2021}. Organizations can keep their data private and only share pre-trained models, which can then be combined into a final, collectively pre-trained model. This raises the question of how these models can be trusted, but it can be addressed by making the details of the pre-training process public and sharing the model architecture, while keeping the training data private.

\emph{Can we leverage pre-training and federated learning to learn from previously unavailable data?}

\section{Continual learning}
\label{sec:cont}

The underlying structure of languages and images does not evolve much over time. A cat's image remains a cat's image, whether viewed today or $10$ years later. A sentence in English might change slightly over time due to changes in grammatical rules, but the overall structure stays similar. Models pre-trained on such data thus do not need to be re-trained frequently. However, the Internet is an ever-evolving environment. Protocols, applications, etc.
may change over time. Interactions in networks, due to the addition of new nodes sending traffic, may change the underlying network dynamics significantly.

Though we are certain that the underlying network dynamics will change over time, we expect them to change less frequently than individual environments do, and still argue that the same pre-trained NTT may be used for a significant time, with just small amounts of fine-tuning from time to time. Nevertheless, at some point, even the NTT model learnt on the underlying dynamics may become outdated and will have to be re-trained. It is already difficult to determine when it is helpful to re-train a specific model\cite{puffer}, and for a model that is supposed to capture a large range of environments, this is very likely an even harder task.

\emph{At which point should we consider an NTT outdated? When and with what data should it be re-trained?}




--------------------------------------------------------------------------------
/workspace/NetworkSimulators/memento/cdf-application.cc:
--------------------------------------------------------------------------------
/* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
//
// Copyright (c) 2006 Georgia Tech Research Corporation
//
// This program is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License version 2 as
// published by the Free Software Foundation;
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//
// Author: George F. Riley
//

// ns3 - On/Off Data Source Application class
// George F. Riley, Georgia Tech, Spring 2007
// Adapted from ApplicationOnOff in GTNetS.
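//
// NOTE (added for clarity): this application sends packets whose sizes are
// drawn from an empirical CDF file and whose inter-arrival times are
// exponential, with a mean derived from the configured DataRate; every
// packet is tagged with a running message counter (MessageTag) for tracing.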
#include "fstream"
#include "ns3/log.h"
#include "ns3/address.h"
#include "ns3/inet-socket-address.h"
#include "ns3/inet6-socket-address.h"
#include "ns3/packet-socket-address.h"
#include "ns3/node.h"
#include "ns3/nstime.h"
#include "ns3/data-rate.h"
#include "ns3/random-variable-stream.h"
#include "ns3/socket.h"
#include "ns3/simulator.h"
#include "ns3/socket-factory.h"
#include "ns3/packet.h"
#include "ns3/uinteger.h"
#include "ns3/trace-source-accessor.h"
#include "ns3/udp-socket-factory.h"
#include "ns3/string.h"
#include "ns3/pointer.h"
#include "ns3/double.h"
#include "ns3/tag.h"

#include "cdf-application.h"
#include "ns3/experiment-tags.h"

namespace ns3

{

NS_LOG_COMPONENT_DEFINE("CdfApplication");

NS_OBJECT_ENSURE_REGISTERED(CdfApplication);

TypeId
CdfApplication::GetTypeId(void)
{
  static TypeId tid =
      TypeId("ns3::CdfApplication")
          .SetParent<Application>()
          .SetGroupName("Applications")
          .AddConstructor<CdfApplication>()
          .AddAttribute("DataRate", "The data rate in on state.",
                        DataRateValue(DataRate("500kb/s")),
                        MakeDataRateAccessor(&CdfApplication::SetRate,
                                             &CdfApplication::GetRate),
                        MakeDataRateChecker())
          .AddAttribute("CdfFile", "Message size distribution file.",
                        EmptyAttributeValue(),
                        MakeStringAccessor(&CdfApplication::SetDistribution,
                                           &CdfApplication::GetDistribution),
                        MakeStringChecker())
          .AddAttribute("Remote", "The address of the destination",
                        AddressValue(),
                        MakeAddressAccessor(&CdfApplication::m_peer),
                        MakeAddressChecker())
          .AddAttribute("Protocol", "The type of protocol to use. This should be "
                        "a subclass of ns3::SocketFactory",
                        TypeIdValue(UdpSocketFactory::GetTypeId()),
                        MakeTypeIdAccessor(&CdfApplication::m_tid),
                        // This should check for SocketFactory as a parent
                        MakeTypeIdChecker())
          .AddTraceSource("Tx", "A new packet is created and is sent",
                          MakeTraceSourceAccessor(&CdfApplication::m_txTrace),
                          "ns3::Packet::TracedCallback")
          .AddTraceSource("TxWithAddresses", "A new packet is created and is sent",
                          MakeTraceSourceAccessor(&CdfApplication::m_txTraceWithAddresses),
                          "ns3::Packet::TwoAddressTracedCallback");
  return tid;
}

CdfApplication::CdfApplication()
    : m_socket(0),
      m_connected(false),
      m_lastStartTime(Seconds(0)),
      m_average_size(0),
      m_sizeDist(CreateObject<EmpiricalRandomVariable>()),
      m_timeDist(CreateObject<ExponentialRandomVariable>()),
      m_counter(0)
{
  NS_LOG_FUNCTION(this);
}

CdfApplication::~CdfApplication()
{
  NS_LOG_FUNCTION(this);
}

Ptr<Socket>
CdfApplication::GetSocket(void) const
{
  NS_LOG_FUNCTION(this);
  return m_socket;
}

int64_t
CdfApplication::AssignStreams(int64_t stream)
{
  NS_LOG_FUNCTION(this << stream);
  m_sizeDist->SetStream(stream);
  m_timeDist->SetStream(stream + 1);
  return 2;
}

void CdfApplication::DoDispose(void)
{
  NS_LOG_FUNCTION(this);

  CancelEvents();
  m_socket = 0;
  // chain up
  Application::DoDispose();
}

// Application Methods
void CdfApplication::StartApplication() // Called at time specified by Start
{
  NS_LOG_FUNCTION(this);

  // Create the socket if not already
  if (!m_socket)
  {
    m_socket =
        Socket::CreateSocket(GetNode(), m_tid);
    if (Inet6SocketAddress::IsMatchingType(m_peer))
    {
      if (m_socket->Bind6() == -1)
      {
        NS_FATAL_ERROR("Failed to bind socket");
      }
    }
    else if (InetSocketAddress::IsMatchingType(m_peer) ||
             PacketSocketAddress::IsMatchingType(m_peer))
    {
      if (m_socket->Bind() == -1)
      {
        NS_FATAL_ERROR("Failed to bind socket");
      }
    }
    m_socket->Connect(m_peer);
    m_socket->SetAllowBroadcast(true);
    m_socket->ShutdownRecv();

    m_socket->SetConnectCallback(
        MakeCallback(&CdfApplication::ConnectionSucceeded, this),
        MakeCallback(&CdfApplication::ConnectionFailed, this));
  }

  // Ensure no pending event
  CancelEvents();
  // If we are not yet connected, there is nothing to do here
  // The ConnectionComplete upcall will start timers at that time
  //if (!m_connected) return;
  ScheduleNextTx();
  //Simulator::Schedule(m_stopTime, &CdfApplication::CancelEvents, this);
}

void CdfApplication::StopApplication() // Called at time specified by Stop
{
  NS_LOG_FUNCTION(this);

  CancelEvents();
  if (m_socket != 0)
  {
    m_socket->Close();
  }
  else
  {
    NS_LOG_WARN("CdfApplication found null socket to close in StopApplication");
  }
}

void CdfApplication::CancelEvents()
{
  NS_LOG_FUNCTION(this);
  Simulator::Cancel(m_sendEvent);
}

// Private helpers
void CdfApplication::ScheduleNextTx()
{
  NS_LOG_FUNCTION(this);

  // Draw waiting time.
  auto nextTime = Seconds(m_timeDist->GetValue());
  NS_LOG_DEBUG("Wait Time: " << nextTime.GetMilliSeconds() << "ms.");
  m_sendEvent = Simulator::Schedule(nextTime,
                                    &CdfApplication::SendPacket, this);
}

void CdfApplication::SendPacket()
{
  NS_LOG_FUNCTION(this);

  // Draw packet size.
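  // NOTE (added for clarity): GetInteger() samples the empirical message-size
  // CDF loaded in SetDistribution(). Together with ScheduleNextTx() above,
  // SendPacket() forms a self-sustaining loop: draw a size, send a tagged
  // packet, then schedule the next transmission.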
  auto size = m_sizeDist->GetInteger();
  NS_LOG_DEBUG("Chosen Size: " << size << " Bytes.");

  NS_ASSERT(m_sendEvent.IsExpired());
  Ptr<Packet> packet = Create<Packet>(size);
  m_txTrace(packet);

  MessageTag m_tag;
  m_tag.SetSimpleValue(m_counter++);
  packet->AddPacketTag(m_tag);
  m_socket->Send(packet);
  Address localAddress;
  m_socket->GetSockName(localAddress);
  if (InetSocketAddress::IsMatchingType(m_peer))
  {
    NS_LOG_INFO("At time " << Simulator::Now().GetSeconds()
                           << "s on-off application sent "
                           << packet->GetSize() << " bytes to "
                           << InetSocketAddress::ConvertFrom(m_peer).GetIpv4()
                           << " port " << InetSocketAddress::ConvertFrom(m_peer).GetPort());
    m_txTraceWithAddresses(packet, localAddress, InetSocketAddress::ConvertFrom(m_peer));
  }
  else if (Inet6SocketAddress::IsMatchingType(m_peer))
  {
    NS_LOG_INFO("At time " << Simulator::Now().GetSeconds()
                           << "s on-off application sent "
                           << packet->GetSize() << " bytes to "
                           << Inet6SocketAddress::ConvertFrom(m_peer).GetIpv6()
                           << " port " << Inet6SocketAddress::ConvertFrom(m_peer).GetPort());
    m_txTraceWithAddresses(packet, localAddress, Inet6SocketAddress::ConvertFrom(m_peer));
  }
  m_lastStartTime = Simulator::Now();
  ScheduleNextTx();
}

void CdfApplication::ConnectionSucceeded(Ptr<Socket> socket)
{
  NS_LOG_FUNCTION(this << socket);
  m_connected = true;
}

void CdfApplication::ConnectionFailed(Ptr<Socket> socket)
{
  NS_LOG_FUNCTION(this << socket);
}

/*
void CdfApplication::LoadDistribution()
{
  NS_LOG_FUNCTION(this);
  // in any case, make sure the data rate is up to date, in case m_Rate was
  // changed.
  // If loaded already, do nothing else.
  if (m_filename == m_loaded_filename)
  {
    UpdateRateDistribution();
    return;
  }

  // Reset dist
  m_sizeDist = CreateObject<EmpiricalRandomVariable>();

  NS_LOG_DEBUG("FILENAME " << m_filename);
  std::ifstream distFile(m_filename);

  if (!(distFile >> m_average_size))
  {
    NS_FATAL_ERROR("Could not parse file: " << m_filename);
  }
  UpdateRateDistribution();
  NS_LOG_DEBUG("Average size: " << m_average_size << " Bytes.");
  NS_LOG_DEBUG("Average interarrival time: " << m_timeDist->GetMean() << "s.");

  double value, probability;
  while (distFile >> value >> probability)
  {
    NS_LOG_DEBUG(value << ", " << probability);
    m_sizeDist->CDF(value, probability);
  }

  m_loaded_filename = m_filename;
}
*/

void CdfApplication::UpdateRateDistribution()
{
  NS_LOG_FUNCTION(this);
  auto timeBetween = m_rate.CalculateBytesTxTime(m_average_size);
  m_timeDist->SetAttribute("Mean", DoubleValue(timeBetween.GetSeconds()));
}

bool CdfApplication::SetDistribution(std::string filename)
{
  NS_LOG_FUNCTION(this << filename);
  m_filename = filename;

  // Reset existing dist, if any.
  m_sizeDist = CreateObject<EmpiricalRandomVariable>();

  std::ifstream distFile(m_filename);

  if (!(distFile >> m_average_size))
  {
    NS_LOG_ERROR("Could not parse file: " << m_filename);
    return false;
  }
  // Using the average rate, update the time dist.
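  // NOTE (added for clarity): the first value in the CDF file is the mean
  // message size. UpdateRateDistribution() converts it, via the configured
  // DataRate, into the mean of the exponential inter-arrival distribution,
  // so the application matches the target rate on average.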
  UpdateRateDistribution();

  NS_LOG_DEBUG("Average size: " << m_average_size << " Bytes.");
  NS_LOG_DEBUG("Average interarrival time: " << m_timeDist->GetMean() << "s.");

  NS_LOG_DEBUG("Loading CDF from file...");
  double value, probability;
  while (distFile >> value >> probability)
  {
    NS_LOG_DEBUG(value << ", " << probability);
    m_sizeDist->CDF(value, probability);
  }
  return true;
}
std::string CdfApplication::GetDistribution() const { return m_filename; }

void CdfApplication::SetRate(DataRate rate)
{
  m_rate = rate;
  UpdateRateDistribution();
}
DataRate CdfApplication::GetRate() const { return m_rate; }

} // Namespace ns3
--------------------------------------------------------------------------------
/workspace/README.md:
--------------------------------------------------------------------------------
# Workspace

We provide an exhaustive guide here to reproduce all experiments to train and evaluate the NTT model. Most of the steps are automated, but some have to be done manually, as multiple platforms are used separately, e.g. NS3 for the simulations and PyTorch Lightning for the NTT implementation.

## File descriptions:
The files inside the [TransformerModels](TransformerModels) directory are as follows:

Core files -
* [`encoder_delay.py`](TransformerModels/encoder_delay.py) : Pre-train the NTT by masking the last delay only.
* [`encoder_delay_varmask_chooseencodelem.py`](TransformerModels/encoder_delay_varmask_chooseencodelem.py) : Pre-train the NTT by masking delays after choosing equally from the NTT's output encoded elements.
* [`encoder_delay_varmask_chooseencodelem_multi.py`](TransformerModels/encoder_delay_varmask_chooseencodelem_multi.py) : Pre-train the NTT by masking delays after choosing equally from the NTT's output encoded elements and using multiple decoder instances.
* [`encoder_delay_varmask_chooseagglevel_multi.py`](TransformerModels/encoder_delay_varmask_chooseagglevel_multi.py) : Pre-train the NTT by masking delays after choosing equally from the 3 levels of aggregation for the NTT and using multiple decoder instances.
* [`finetune_encoder.py`](TransformerModels/finetune_encoder.py) : Fine-tune the NTT by masking the last delay only.
* [`finetune_encoder_multi.py`](TransformerModels/finetune_encoder_multi.py) : Fine-tune the NTT by masking the last delay, but initialize with multiple decoders to match the architecture when pre-trained with multiple decoders.
* [`finetune_mct.py`](TransformerModels/finetune_mct.py) : Fine-tune the NTT to predict the MCT on the given data.
* [`finetune_mct_multi.py`](TransformerModels/finetune_mct_multi.py) : Fine-tune the NTT to predict the MCT on the given data, but initialize with multiple decoders to match the architecture when pre-trained with multiple decoders.
* [`generate_sequences.py`](TransformerModels/generate_sequences.py) : Generate the sliding windows for the NTT from the processed NS3 simulations' packet data (see the illustrative sketch after this list).
* [`utils.py`](TransformerModels/utils.py) : All utility functions for data pre-processing.
* [`arima.py`](TransformerModels/arima.py) : Train the ARIMA baselines.
* [`lstm.py`](TransformerModels/lstm.py) : Train the Bi-LSTM baselines.
* [`configs`](TransformerModels/configs) : Hyper-parameters for training the NTT model.
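To give an intuition for the windowing step, below is a minimal sketch of what the sliding-window generation conceptually produces. This is an illustration only, NOT the repository's implementation; the function name, the window size and the feature layout are assumptions:

```python
import numpy as np

def sliding_windows(packet_features: np.ndarray, window: int = 1024) -> np.ndarray:
    """Stack overlapping windows of consecutive packet-feature rows (stride 1)."""
    n_windows = len(packet_features) - window + 1
    return np.stack([packet_features[i : i + window] for i in range(n_windows)])

# Example: 10 packets with 3 features each -> 7 windows of shape (4, 3).
windows = sliding_windows(np.random.rand(10, 3), window=4)
assert windows.shape == (7, 4, 3)
```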
22 | 
23 | 
24 | Plot files -
25 | * [`plot_losses.py`](TransformerModels/plot_losses.py) : Plot MCT loss curves after fine-tuning the NTT pre-trained on masking the last delay only.
26 | * [`mct_test_plots.py`](TransformerModels/mct_test_plots.py) : Plot MCT loss curves after fine-tuning the NTT pre-trained with masking at variable positions.
27 | * [`plot_predictions.py`](TransformerModels/plot_predictions.py) : Plot histograms of predictions after pre-training and fine-tuning the NTT.
28 | 
29 | Others -
30 | * [`transformer_delay.py`](TransformerModels/transformer_delay.py) : A vanilla Transformer encoder-decoder architecture, naively trained on some packet data to predict delays. (This was only for initial insights.)
31 | 
32 | The files inside the [PandasScripts](PandasScripts) directory are as follows:
33 | * [`csvhelper_memento.py`](PandasScripts/csvhelper_memento.py) : Utility script to pre-process raw memento NS3 outputs into a format which makes it easier to create the sliding windows and train the NTT. This is the actual script used.
34 | * [`csv_gendelays.py`](PandasScripts/csv_gendelays.py) : Utility script to pre-process raw NS3 outputs into a format which makes it easier to create the sliding windows and train the vanilla transformer. This is NOT used, except for initial insights.
35 | 
36 | The structure inside the [NetworkSimulators](NetworkSimulators) directory is as follows:
37 | * [memento](NetworkSimulators/memento): Contains a working copy of ONLY the relevant code files for generating the pre-training and fine-tuning data for the NTT models. This cannot be run without the full setup, which is self-contained in [memento-ns3-for-NTT](https://github.com/Siddhant-Ray/memento-ns3-for-NTT). Files inside this [memento](NetworkSimulators/memento) directory should not be used anymore, except for quick reference.
38 | * [ns3](NetworkSimulators/ns3): This was used for initial insights only, and NO RESULTS from it have been included in the thesis.
39 | - Contains a working copy of the relevant code files for generating the pre-training data for the vanilla NTT model, which is authored in [`transformer_delay.py`](TransformerModels/transformer_delay.py). To generate this data, you must install ns3 from scratch as described [here](https://www.nsnam.org/docs/release/3.35/tutorial/singlehtml/index.html#prerequisites). Following that, all the `.cc` files in [ns3](NetworkSimulators/ns3) must be put in the `scratch/` directory. This can be tricky, so we also provide a quicker alternative setup.
40 | - Alternatively, you can run the script [`dockerns3.sh`](NetworkSimulators/ns3/dockerns3.sh), and use the scripts [`cptodocker.sh`](NetworkSimulators/ns3/cptodocker.sh) and [`cpfromdocker.sh`](NetworkSimulators/ns3/cpfromdocker.sh) to move the code files and results in and out of the ns3 container (summarised in the sketch after this list).
41 | - You can run the files inside the container with the commands:
42 | * `export NS_LOG=congestion_1=info`
43 | * and then `./waf --run scratch/congestion_1`.
44 | - This generates a folder called `congestion_1` with the required data files.
45 | - For pre-processing, copy the `congestion_1` folder into [PandasScripts](PandasScripts) and run:
46 | * ```python csv_gendelays.py --model tcponly --numsenders 6```
47 | - The files can now be added to [`transformer_delay.py`](TransformerModels/transformer_delay.py) and the job can be run.
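Putting the Docker-based ns3 workflow above into one sequence, the steps look roughly as follows. This is a sketch only: we assume the helper scripts need no arguments, so check the scripts themselves for the exact paths they copy before running.

    $ ./dockerns3.sh                 # build and start the ns3 container
    $ ./cptodocker.sh                # copy the .cc files into the container's scratch/
    # inside the container:
    $ export NS_LOG=congestion_1=info
    $ ./waf --run scratch/congestion_1
    # back on the host:
    $ ./cpfromdocker.sh              # copy the congestion_1/ results back out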
48 | 
49 | 
50 | ## To reproduce results in the thesis:
51 | 
52 | ### Setup
53 | 
54 | To run on the ```TIK SLURM cluster```, you need to install ```pyenv```; details can be found here: [D-ITET Computing](https://computing.ee.ethz.ch/Programming/Languages/Python).
55 | 
56 | On other clusters or standalone systems, you can use the system Python to create a new virtual environment.
57 | 
58 | $ python -m venv venv
59 | 
60 | After the environment has been created (it is named `venv` here for simplicity):
61 | 
62 | 
63 | If it is a pyenv environment, run
64 | 
65 | $ eval "$(pyenv init -)"
66 | $ eval "$(pyenv virtualenv-init -)"
67 | $ pyenv activate venv
68 | 
69 | Else, for normal Python virtual environments, run
70 | 
71 | $ source venv/bin/activate
72 | 
73 | Now, install the Python dependencies:
74 | 
75 | $ pip install -r requirements.txt
76 | 
77 | Alternatively, if installing environments and dependencies does not work on your system (e.g. Windows), you can use our pre-built Docker image for the same setup. The Docker image has a Python environment set up with the dependencies, along with the entire code packaged in it, for ease of use.
78 | 
79 | To use the Docker image, run
80 | 
81 | $ ./docker-run.sh
82 | 
83 | Or you can build your own Docker image locally. For this, run
84 | 
85 | $ docker build --tag ntt-docker .
86 | 
87 | The folder (submodule) [`memento-ns3-for-NTT`](https://github.com/Siddhant-Ray/memento-ns3-for-NTT) contains instructions to generate the training data using NS3 simulations. The module is self-contained and will generate a folder called ```results/```, which will contain the required data. To preprocess, copy the ```results/``` folder into the directory [`PandasScripts`](PandasScripts) and run the script (modify the filenames inside [`csvhelper_memento.py`](PandasScripts/csvhelper_memento.py) if needed):
88 | 
89 | $ python csvhelper_memento.py --model memento
90 | 
91 | This will generate the pre-processed files. The files may differ, depending on the kind of data generated, but all of them will end with ```_final.csv```. Copy all files with this ending into a folder named ```memento_data/``` and move this folder to the [`TransformerModels`](TransformerModels) directory.
92 | 
93 | Copying ```results/``` and ```memento_data/``` to these destinations is required, else the execution will fail. After copying the files, the training and fine-tuning phase is ready to be initiated.
94 | 
95 | NOTE: If you are using the Docker image for NTT training, you will need to generate the pre-training data inside the Docker containers provided by `memento-ns3-for-NTT`. After that, the ```results/``` folder must be copied inside the ```ntt-docker``` container, into the same directories as mentioned above. This can be done with ```docker cp```. Our Docker image doesn't support GPUs (yet), so feel free to modify the Dockerfile to include CUDA support, or run with CPUs for now.
96 | 
97 | ### Training and fine-tuning:
98 | 
99 | We need GPUs to run the training and fine-tuning, and this documentation only covers the steps to run on the ```TIK SLURM cluster```. If running on other clusters, the setup might have to be modified a little. We provide a self-contained run script ```run.sh```, in which you can uncomment the job you want to run.
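For orientation, a minimal `run.sh` might look like the sketch below. The SBATCH values and the job selection are illustrative, not the exact contents of the repository's script, which remains authoritative:

    #!/bin/bash
    #SBATCH --mem=32G
    #SBATCH --gres=gpu:1
    #SBATCH --output=ntt_job_%j.log

    # Uncomment exactly one job, e.g.:
    python encoder_delay.py
    # python finetune_encoder.py
    # python finetune_mct.py
    # python arima.py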
Now you can just execute:
100 | 
101 | $ sbatch run.sh
102 | 
103 | ### Reproduce the plots:
104 | 
105 | The specific log folders generated after a pre-training or fine-tuning job must be copied, with the EXACT same names, into a ```logs``` directory relative to the [`plot_losses.py`](TransformerModels/plot_losses.py) or [`mct_test_plots.py`](TransformerModels/mct_test_plots.py) files, as displayed in the ```.py``` files. Following that, the plots can be generated as simply as:
106 | 
107 | $ python mct_test_plots.py
108 | $ python plot_losses.py
109 | 
110 | 
111 | ## Comments:
112 | 
113 | * SBATCH commands in ```run.sh``` might need to be changed as per memory or GPU requirements.
114 | * For running the ARIMA baselines, GPUs are not needed.
115 | * On the TIK SLURM cluster, sometimes the following error occurs: ```OSError: [Errno 12] Cannot allocate memory```.
116 | To fix this:
117 | - Increase the amount of memory for the job to be run, or
118 | - Reduce the ```num_workers``` argument in the DataLoader inside the given model's ```.py``` file from 4 to 1.
119 | * To switch to data from different topologies, you only need to change the ```NUM_BOTTLENECKS``` global variable in the respective model's ```.py``` file you are running. Note that not all experiments are meant to be run on all topologies. For details on which topology is used for which experiment, refer to [`thesis.pdf`](../report/thesis.pdf).
120 | * Checkpoints will automatically be saved in the respective log folders for every job (refer to the particular model's ```.py``` to see the specific names). It is advisable to copy the ```.ckpt``` files into a new folder named ```checkpoints/```, in order to initialise from the trained weights and not lose any work. This relative path ```checkpoints/*.ckpt``` can be replaced in the appropriate ```.py``` file. Every fine-tuning ```.py``` file has a global variable ```PRETRAINED```, which can be set to ```True``` if you want to initialize from the saved weights, or ```False``` if fine-tuning must be done from scratch.
121 | * Different models have different instances of linear layers, and if you want to initialize fine-tuning from a checkpoint, you must ensure that the pre-training process had the same model architecture, else PyTorch model loading will fail. After initialization with the same architecture, the layers can be changed as required. As a quick map,
122 | - If pre-training is done with multiple linear layer instances in the model, i.e. [`encoder_delay_varmask_chooseagglevel_multi.py`](TransformerModels/encoder_delay_varmask_chooseagglevel_multi.py) or [`encoder_delay_varmask_chooseencodelem_multi.py`](TransformerModels/encoder_delay_varmask_chooseencodelem_multi.py), then [`finetune_encoder_multi.py`](TransformerModels/finetune_encoder_multi.py) and [`finetune_mct_multi.py`](TransformerModels/finetune_mct_multi.py) should be used for fine-tuning.
123 | - In other cases of single linear layer instances in the model, [`finetune_encoder.py`](TransformerModels/finetune_encoder.py) and [`finetune_mct.py`](TransformerModels/finetune_mct.py) should be used for fine-tuning.
124 | * The ```TRAIN``` global variable in the ```.py``` files is used to decide whether to train on the training data, or just test on the testing data.
125 | * The ```trainer API``` from PyTorch Lightning (present in all of the Transformer ```.py``` files) is used to select multiple GPUs using the ```strategy``` argument. Possible options are
126 | - `dp` : Data Parallel; this always works on the TIK SLURM cluster.
127 | - `ddp` : Distributed Data Parallel; this only works sometimes and we haven't used it. To run ddp jobs, modify the ```run.sh``` file to include an `srun` command before the `python` command.
128 | * To save files, you might sometimes have to modify the directory and file names in the code, as needed on your machine. As this is not end-to-end software, it is sometimes not possible to create a generic file-saving system across multiple experiments.
129 | 
130 | 
131 | 
132 | 
133 | 
-------------------------------------------------------------------------------- /workspace/TransformerModels/transformer_delay.py: -------------------------------------------------------------------------------- 
1 | # Original author: Siddhant Ray
2 | 
3 | import argparse
4 | import copy
5 | import json
6 | import math
7 | import os
8 | import pathlib
9 | import random
10 | import time as t
11 | from datetime import datetime
12 | from ipaddress import ip_address
13 | 
14 | import matplotlib.pyplot as plt
15 | import numpy as np
16 | import pandas as pd
17 | import pytorch_lightning as pl
18 | import torch
19 | import yaml
20 | from pytorch_lightning import loggers as pl_loggers
21 | from pytorch_lightning.callbacks import ProgressBar
22 | from pytorch_lightning.callbacks.early_stopping import EarlyStopping
23 | from sklearn.model_selection import train_test_split
24 | from tensorboard.backend.event_processing.event_accumulator import \
25 | EventAccumulator
26 | from torch import einsum, nn, optim
27 | from torch.nn import functional as F
28 | from torch.utils.data import DataLoader
29 | from tqdm import tqdm
30 | from utils import (PacketDataset, convert_to_relative_timestamp,
31 | get_data_from_csv, ipaddress_to_number,
32 | sliding_window_delay, sliding_window_features,
33 | vectorize_features_to_numpy)
34 | 
35 | random.seed(0)
36 | np.random.seed(0)
37 | torch.manual_seed(0)
38 | 
39 | torch.set_default_dtype(torch.float64)
40 | 
41 | # Hyper-parameters from the config file
42 | 
43 | with open("configs/config-transformer.yaml") as f:
44 | config = yaml.load(f, Loader=yaml.FullLoader)
45 | 
46 | WEIGHTDECAY = float(config["weight_decay"])
47 | LEARNINGRATE = float(config["learning_rate"])
48 | DROPOUT = float(config["dropout"])
49 | NHEAD = int(config["num_heads"])
50 | LAYERS = int(config["num_layers"])
51 | EPOCHS = int(config["epochs"])
52 | BATCHSIZE = int(config["batch_size"])
53 | LINEARSIZE = int(config["linear_size"])
54 | LOSSFUNCTION = nn.MSELoss()
55 | 
56 | if "loss_function" in config.keys():
57 | if config["loss_function"] == "huber":
58 | LOSSFUNCTION = nn.HuberLoss()
59 | if config["loss_function"] == "smoothl1":
60 | LOSSFUNCTION = nn.SmoothL1Loss()
61 | if config["loss_function"] == "kldiv":
62 | LOSSFUNCTION = nn.KLDivLoss()
63 | 
64 | # Params for the sliding window on the packet data
65 | SLIDING_WINDOW_START = 0
66 | SLIDING_WINDOW_STEP = 1
67 | SLIDING_WINDOW_SIZE = 10
68 | 
69 | SAVE_MODEL = False
70 | MAKE_EPOCH_PLOT = True
71 | TEST = True
72 | 
73 | if torch.cuda.is_available():
74 | NUM_GPUS = torch.cuda.device_count()
75 | print("Number of GPUS: {}".format(NUM_GPUS))
76 | else:
77 | print("ERROR: NO CUDA DEVICE FOUND")
78 | NUM_GPUS = 0
79 | 
80 | # DO NOT USE (AS OF NOW)
81 | class AbsPosEmb1DAISummer(nn.Module):
82 | """
83 | Given query q of shape [batch heads tokens dim] we multiply
84 | q by all the flattened absolute differences between tokens.
85 | Learned embedding representations are shared across heads
86 | """
87 | 
88 | def __init__(self, tokens, dim_head):
89 | """
90 | Output: [batch head tokens tokens]
91 | Args:
92 | tokens: elements of the sequence
93 | dim_head: the size of the last dimension of q
94 | """
95 | super().__init__()
96 | scale = dim_head**-0.5
97 | self.abs_pos_emb = nn.Parameter(torch.randn(tokens, dim_head) * scale)
98 | 
99 | def forward(self, q):
100 | return einsum("b h i d, j d -> b h i j", q, self.abs_pos_emb)
101 | 
102 | 
103 | # DO NOT USE (AS OF NOW)
104 | class PositionalEncoding(nn.Module):
105 | def __init__(self, d_model, dropout=DROPOUT, max_len=5000):
106 | super(PositionalEncoding, self).__init__()
107 | self.dropout = nn.Dropout(p=dropout)
108 | pe = torch.zeros(max_len, d_model)
109 | position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
110 | div_term = torch.exp(
111 | torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
112 | )
113 | pe[:, 0::2] = torch.sin(position * div_term)
114 | pe[:, 1::2] = torch.cos(position * div_term)
115 | pe = pe.unsqueeze(0).transpose(0, 1)
116 | self.register_buffer("pe", pe)
117 | 
118 | def forward(self, x):
119 | x = x + self.pe[: x.size(0), :]
120 | return self.dropout(x)
121 | 
122 | 
123 | # TRANSFORMER CLASS TO PREDICT DELAYS
124 | class BaseTransformer(pl.LightningModule):
125 | def __init__(self, input_size, target_size, loss_function):
126 | super(BaseTransformer, self).__init__()
127 | 
128 | self.step = [0]
129 | self.warmup_steps = 1000
130 | 
131 | # create the model with its layers
132 | 
133 | self.encoder_layer = nn.TransformerEncoderLayer(
134 | d_model=LINEARSIZE, nhead=NHEAD, batch_first=True, dropout=DROPOUT
135 | )
136 | self.decoder_layer = nn.TransformerDecoderLayer(
137 | d_model=LINEARSIZE, nhead=NHEAD, batch_first=True, dropout=DROPOUT
138 | )
139 | self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=LAYERS)
140 | self.decoder = nn.TransformerDecoder(self.decoder_layer, num_layers=LAYERS)
141 | self.encoderin = nn.Linear(input_size, LINEARSIZE)
142 | self.decoderin = nn.Linear(target_size, LINEARSIZE)
143 | self.decoderpred = nn.Linear(LINEARSIZE, target_size)
144 | self.model = nn.ModuleList(
145 | [
146 | self.encoder,
147 | self.decoder,
148 | self.encoderin,
149 | self.decoderin,
150 | self.decoderpred,
151 | ]
152 | )
153 | 
154 | self.loss_func = loss_function
155 | parameters = {
156 | "WEIGHTDECAY": WEIGHTDECAY,
157 | "LEARNINGRATE": LEARNINGRATE,
158 | "EPOCHS": EPOCHS,
159 | "BATCHSIZE": BATCHSIZE,
160 | "LINEARSIZE": LINEARSIZE,
161 | "NHEAD": NHEAD,
162 | "LAYERS": LAYERS,
163 | }
164 | self.df = pd.DataFrame()
165 | self.df["parameters"] = [json.dumps(parameters)]
166 | 
167 | def configure_optimizers(self):
168 | self.optimizer = optim.Adam(
169 | self.model.parameters(),
170 | betas=(0.9, 0.98),
171 | eps=1e-9,
172 | lr=LEARNINGRATE,
173 | weight_decay=WEIGHTDECAY,
174 | )
175 | return {"optimizer": self.optimizer}
176 | 
177 | def lr_update(self):
178 | self.step[0] += 1
179 | learning_rate = LINEARSIZE ** (-0.5) * min(
180 | self.step[0] ** (-0.5), self.step[0] * self.warmup_steps ** (-1.5)
181 | )
182 | for param_group in self.optimizer.param_groups:
183 | param_group["lr"] = learning_rate
184 | 
185 | def forward(self, input, target):
186 | # used for the forward pass of the model
187 | scaled_input = self.encoderin(input.double())
188 | target = self.decoderin(target.double())
189 | enc = self.encoder(scaled_input)
190 | out = self.decoderpred(self.decoder(target, enc))
191
| return out 192 | 193 | def training_step(self, train_batch, train_idx): 194 | X, y = train_batch 195 | self.lr_update() 196 | prediction = self.forward(X, y) 197 | loss = self.loss_func(prediction, y) 198 | self.log("Train loss", loss) 199 | return loss 200 | 201 | def validation_step(self, val_batch, val_idx): 202 | X, y = val_batch 203 | prediction = self.forward(X, y) 204 | loss = self.loss_func(prediction, y) 205 | self.log("Val loss", loss, sync_dist=True) 206 | return loss 207 | 208 | def test_step(self, test_batch, test_idx): 209 | X, y = test_batch 210 | prediction = self.forward(X, y) 211 | loss = self.loss_func(prediction, y) 212 | self.log("Test loss", loss, sync_dist=True) 213 | return loss 214 | 215 | def predict_step(self, test_batch, test_idx, dataloader_idx=0): 216 | X, y = test_batch 217 | prediction = self.forward(X, y) 218 | return prediction 219 | 220 | def training_epoch_end(self, outputs): 221 | loss_tensor_list = [item["loss"].to("cpu").numpy() for item in outputs] 222 | # print(loss_tensor_list, len(loss_tensor_list)) 223 | self.log( 224 | "Avg loss per epoch", 225 | np.mean(np.array(loss_tensor_list)), 226 | on_step=False, 227 | on_epoch=True, 228 | ) 229 | 230 | 231 | def main(): 232 | path = "congestion_1/" 233 | files = [ 234 | "endtoenddelay500s_1.csv", 235 | "endtoenddelay500s_2.csv", 236 | "endtoenddelay500s_3.csv", 237 | "endtoenddelay500s_4.csv", 238 | "endtoenddelay500s_5.csv", 239 | ] 240 | 241 | sl_win_start = SLIDING_WINDOW_START 242 | sl_win_size = SLIDING_WINDOW_SIZE 243 | sl_win_shift = SLIDING_WINDOW_STEP 244 | 245 | num_features = 15 246 | input_size = sl_win_size * num_features 247 | output_size = sl_win_size 248 | 249 | model = BaseTransformer(input_size, output_size, LOSSFUNCTION) 250 | full_feature_arr = [] 251 | full_target_arr = [] 252 | test_loaders = [] 253 | 254 | for file in files: 255 | print(os.getcwd()) 256 | 257 | df = get_data_from_csv(path + file) 258 | df = convert_to_relative_timestamp(df) 259 | 260 | df = ipaddress_to_number(df) 261 | feature_df, label_df = vectorize_features_to_numpy(df) 262 | 263 | print(feature_df.head(), feature_df.shape) 264 | print(label_df.head()) 265 | 266 | feature_arr = sliding_window_features( 267 | feature_df.Combined, sl_win_start, sl_win_size, sl_win_shift 268 | ) 269 | target_arr = sliding_window_delay( 270 | label_df, sl_win_start, sl_win_size, sl_win_shift 271 | ) 272 | print(len(feature_arr), len(target_arr)) 273 | full_feature_arr = full_feature_arr + feature_arr 274 | full_target_arr = full_target_arr + target_arr 275 | 276 | print(len(full_feature_arr), len(full_target_arr)) 277 | 278 | full_train_vectors, test_vectors, full_train_labels, test_labels = train_test_split( 279 | full_feature_arr, full_target_arr, test_size=0.05, shuffle=True, random_state=42 280 | ) 281 | # print(len(full_train_vectors), len(full_train_labels)) 282 | # print(len(test_vectors), len(test_labels)) 283 | 284 | train_vectors, val_vectors, train_labels, val_labels = train_test_split( 285 | full_train_vectors, full_train_labels, test_size=0.1, shuffle=False 286 | ) 287 | # print(len(train_vectors), len(train_labels)) 288 | # print(len(val_vectors), len(val_labels)) 289 | 290 | # print(train_vectors[0].shape[0]) 291 | # print(train_labels[0].shape[0]) 292 | 293 | train_dataset = PacketDataset(train_vectors, train_labels) 294 | val_dataset = PacketDataset(val_vectors, val_labels) 295 | test_dataset = PacketDataset(test_vectors, test_labels) 296 | # print(train_dataset.__getitem__(0)) 297 | 298 | train_loader = 
DataLoader(
299 | train_dataset, batch_size=BATCHSIZE, shuffle=True, num_workers=4
300 | )
301 | val_loader = DataLoader(
302 | val_dataset, batch_size=BATCHSIZE, shuffle=False, num_workers=4
303 | )
304 | test_loader = DataLoader(
305 | test_dataset, batch_size=BATCHSIZE, shuffle=False, num_workers=4
306 | )
307 | 
308 | # print one dataloader item from each split as a sanity check
309 | train_features, train_lbls = next(iter(train_loader))
310 | print(f"Feature batch shape: {train_features.size()}")
311 | print(f"Labels batch shape: {train_lbls.size()}")
312 | feature = train_features[0]
313 | label = train_lbls[0]
314 | print(f"Feature: {feature}")
315 | print(f"Label: {label}")
316 | 
317 | val_features, val_lbls = next(iter(val_loader))
318 | print(f"Feature batch shape: {val_features.size()}")
319 | print(f"Labels batch shape: {val_lbls.size()}")
320 | feature = val_features[0]
321 | label = val_lbls[0]
322 | print(f"Feature: {feature}")
323 | print(f"Label: {label}")
324 | 
325 | test_features, test_lbls = next(iter(test_loader))
326 | print(f"Feature batch shape: {test_features.size()}")
327 | print(f"Labels batch shape: {test_lbls.size()}")
328 | feature = test_features[0]
329 | label = test_lbls[0]
330 | print(f"Feature: {feature}")
331 | print(f"Label: {label}")
332 | 
333 | print("Started training at:")
334 | time = datetime.now()
335 | print(time)
336 | 
337 | print("Removing old logs:")
338 | os.system("rm -rf transformer_delay_logs/lightning_logs/")
339 | 
340 | tb_logger = pl_loggers.TensorBoardLogger(save_dir="transformer_delay_logs/")
341 | 
342 | if NUM_GPUS >= 1:
343 | trainer = pl.Trainer(
344 | precision=16,
345 | gpus=-1,
346 | strategy="dp",
347 | max_epochs=EPOCHS,
348 | check_val_every_n_epoch=1,
349 | logger=tb_logger,
350 | callbacks=[EarlyStopping(monitor="Val loss", patience=5)],
351 | )
352 | else:
353 | trainer = pl.Trainer(
354 | gpus=None,
355 | max_epochs=EPOCHS,
356 | check_val_every_n_epoch=1,
357 | logger=tb_logger,
358 | callbacks=[EarlyStopping(monitor="Val loss", patience=5)],
359 | )
360 | 
361 | trainer.fit(model, train_loader, val_loader)
362 | print("Finished training at:")
363 | time = datetime.now()
364 | print(time)
365 | 
366 | if SAVE_MODEL:
367 | name = config["name"]
368 | torch.save(model.model, f"./trained_transformer_{name}")
369 | 
370 | if MAKE_EPOCH_PLOT:  # plot the average loss per epoch from the TensorBoard logs
371 | t.sleep(5)
372 | log_dir = "transformer_delay_logs/lightning_logs/version_0"
373 | y_key = "Avg loss per epoch"
374 | 
375 | event_accumulator = EventAccumulator(log_dir)
376 | event_accumulator.Reload()
377 | 
378 | steps = {x.step for x in event_accumulator.Scalars("epoch")}
379 | epoch_vals = list({x.value for x in event_accumulator.Scalars("epoch")})
380 | epoch_vals.pop()
381 | 
382 | x = list(range(len(steps)))
383 | y = [x.value for x in event_accumulator.Scalars(y_key) if x.step in steps]
384 | 
385 | fig, ax = plt.subplots()
386 | ax.plot(epoch_vals, y)
387 | ax.set_xlabel("epoch")
388 | ax.set_ylabel(y_key)
389 | fig.savefig("lossplot_perepoch.png")
390 | 
391 | if TEST:
392 | trainer.test(model, dataloaders=test_loader)
393 | 
394 | 
395 | if __name__ == "__main__":
396 | main()
397 | 
-------------------------------------------------------------------------------- /report/background.tex: -------------------------------------------------------------------------------- 
1 | \chapter{Background and Related Work}
2 | \label{cha:background}
3 | 
4 | A large part of the work undertaken during this project requires a deep understanding of how a particular deep learning architecture, the Transformer, works.
In this chapter, we cover the required background on, and the insights drawn from, the Transformer architecture that were needed to model and solve our problem of prediction on packet data. We also present adaptations of the Transformer architecture to problems in fields such as NLP and CV, along with relevant ideas that could be adapted to our tasks.
5 | 
6 | \section{Background on Transformers}
7 | \label{sec:background}
8 | 
9 | \subsection{Sequence modelling with attention}
10 | \label{ssec:bgsequence}
11 | 
12 | Transformers are built around the \emph{attention mechanism}, which maps an input sequence to an output sequence of the same length.
13 | Every output encodes its own information and its context, \ie information from related elements in the sequence, regardless of how far apart they are.
14 | The mechanism multiplies the input feature matrix with an attention weight matrix via dot products, which allows the deep neural network to focus on the most relevant parts of the sequence, based on the values of the attention weights. Moreover, as the attention weights are learnable parameters, the Transformer learns over time to choose the best weights, which allows it to capture the structure within the sequence of data.
15 | Computing attention is efficient, as all elements in the sequence can be processed in parallel with matrix operations that are highly optimised on most hardware. These properties have made Transformer-based neural networks the state-of-the-art solution for many sequence modelling tasks. We refer the reader to an excellent illustrated guide to the Transformer\cite{trans}.
16 | 
17 | 
18 | While attention originated as an improvement to Recurrent Neural Networks (RNNs), it was soon realised that the mechanism could replace them entirely\cite{vaswaniAttentionAllYou2017}. RNN models were the initial state-of-the-art deep learning architectures for sequence modelling problems, but they suffer from several issues. Training RNNs is usually limited by one or all of the following problems:
19 | \begin{enumerate}
20 | \item RNNs are not computationally efficient to train on long sequences, as they require $n$ sequential operations to learn a sequence, $n$ being the length of the given sequence. This makes the training process extremely slow.
21 | \item RNNs suffer from the problem of \emph{vanishing gradients}, \ie as elements in the input need to be processed in sequence over time, the gradients used by the optimiser\cite{Robbins2007ASA} for the elements at the end of very long sequences become extremely small and numerically unstable, making convergence to the desired values difficult.
22 | \item RNNs struggle to learn \emph{long-term dependencies}, \ie learning relations between elements far apart in the sequence is challenging.
23 | \end{enumerate}
24 | 
25 | Table \ref{bg:table1} summarises the complexities of, and differences between, several deep learning architectures that have attempted to solve the sequence modelling problem.
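The quadratic per-layer complexity of self-attention in Table \ref{bg:table1} follows directly from the form of scaled dot-product attention\cite{vaswaniAttentionAllYou2017}: for query, key and value matrices $Q$, $K$ and $V$ derived from the $n$ input elements,
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,
\]
where $d_k$ is the dimension of the keys. The $n \times n$ matrix $QK^{T}$ compares every element with every other one, the softmax turns these comparisons into the attention weights discussed above, and the whole expression reduces to dense matrix products that parallelise well on modern hardware.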
26 | 
27 | \begin{table}[htbp]
28 | \centering
29 | \begin{tabular}{ c c c c }
30 | \toprule
31 | Layer Type & Complexity & Sequential & Maximum \\
32 | & per Layer & Operations & Path Length \\
33 | \midrule
34 | Transformer (Self-Attention) & $O(n^2 \cdot d)$ & $O(1)$ & $O(1)$ \\
35 | Recurrent NN & $O(n \cdot d^2)$ & $O(n)$ & $O(n)$ \\
36 | Convolutional NN & $O(k \cdot n \cdot d^2)$ & $O(1)$ & $O(\log_{k}{n})$\\
37 | Restricted Self-Attention & $O(r \cdot n \cdot d)$ & $O(1)$ & $O(n/r)$\\
38 | \bottomrule
39 | \end{tabular}
40 | \caption{Maximum path lengths, per-layer complexity and minimum number of sequential operations
41 | for different layer types. $n$ is the sequence length, $d$ is the representation dimension, $k$ is the kernel
42 | size of convolutions and $r$ the size of the neighbourhood in restricted self-attention.
43 | SOURCE: Original paper\cite{vaswaniAttentionAllYou2017}}
44 | \label{bg:table1}
45 | \end{table}
46 | 
47 | 
48 | Augmenting RNNs with attention solves some of these issues\cite{rnnattention}, and replacing them with \emph{Transformers} has been shown to solve all of them.
49 | The authors of the original Transformer paper\cite{vaswaniAttentionAllYou2017} propose an architecture for translation tasks that contains:
50 | \begin{itemize}
51 | \item a learnable \emph{embedding} layer mapping words to vectors
52 | \item a \emph{transformer encoder} encoding the input sequence
53 | \item a \emph{transformer decoder} generating an output sequence based on the encoded input
54 | \end{itemize}
55 | 
56 | Each transformer block alternates between attention and linear layers, \ie between encoding context and refining features. The attention layers learn ``context-rich'' representations of every element by mapping in relevant information from the elements that surround it, and the linear layers transform this learned information into a form useful for downstream prediction. Figure \ref{fig:transformer} shows details of the original Transformer architecture and the functions of its layers.
57 | 
58 | \begin{figure}[!hbt]
59 | \begin{center}
60 | \includegraphics[scale=1.5]{figures/architecture_transformer.pdf}
61 | \caption{Original Transformer Architecture, Credits: Alexander Dietmüller}
62 | \label{fig:transformer}
63 | \end{center}
64 | \end{figure}
65 | 
66 | \subsection{Pre-training and fine-tuning}
67 | \label{ssec:bgtraining}
68 | 
69 | Due to the highly efficient and parallelizable nature of Transformers, they can be widely used for a variety of tasks, based on the principle of \emph{transfer learning}. The most common strategy is to use the architecture in two phases, \emph{pre-training} and \emph{fine-tuning}. Inspired by the original Transformer's success on the task of language translation, the use of Transformers has become ubiquitous in solving NLP problems. We present one such state-of-the-art NLP Transformer model,
70 | called BERT\cite{devlinBERTPretrainingDeep2019}, one of the most widely used transformer models today.
71 | While the original Transformer had both an encoder and decoder with attention, BERT uses only the transformer encoder, followed by a small and replaceable decoder. This decoder is usually a set of linear layers acting as a multilayer perceptron (MLP), and is usually called the ``MLP head'' in the deep learning community. The principle of transfer learning works for BERT as follows:
72 | 
73 | \begin{itemize}
74 | \item In the first step, BERT is \emph{pre-trained} with a task that requires learning the underlying language structure.
Research has shown that masked language modelling is an effective way for deep learning models to learn structure in natural languages\cite{wettigShouldYouMask}.
75 | Concretely, a fraction of the words in the input sequence is masked out (15\% in the original model), and the decoder is tasked to predict the original words from the encoded input sequence.
76 | BERT thus generates contextual encodings of every word, which is only possible due to the bi-directionality of the attention mechanism in BERT. This allows the model to infer the context of a word from both sides in a given sentence, which was not possible earlier when elements in a sequence were only processed in a given order.\footnote{
77 | BERT is pre-trained from text corpora with several billion words and fine-tuned with $\sim$100 thousand examples per task.
78 | }
79 | 
80 | \item In the second step, the unique pre-trained model can be fine-tuned to many different tasks by replacing the small decoder with task-specific ones, e.g. for language understanding, question answering, or text generation.
81 | The fine-tuning process resumes learning from the saved weights of the pre-training phase, but for a new task or in a new environment. The new model has already learned to encode a general language context and only needs to learn to extract the task-relevant information from this context. This requires far less data compared to starting from scratch and makes the fine-tuning process faster.
82 | 
83 | Furthermore, BERT's pre-training step is unsupervised/self-supervised, \ie it requires only ``cheap'' unlabelled data and no labelled target values. As procuring labelled data is harder, this problem is mitigated by pre-training on ``generic'' data and then using
84 | ``expensive'' labelled data, e.g. for text classification, during the fine-tuning phase. Figure \ref{fig:bert} shows the details of BERT's pre-training and fine-tuning phases.
85 | \end{itemize}
86 | 
87 | \begin{figure}[!hbt]
88 | \begin{center}
89 | \includegraphics[scale=1.5]{figures/architecture_bert.pdf}
90 | \caption{Original BERT Architecture, Credits: Alexander Dietmüller}
91 | \label{fig:bert}
92 | \end{center}
93 | \end{figure}
94 | 
95 | 
96 | 
97 | Another extremely useful feature of Transformers is their generalization ability, which is made possible by their highly efficient parallelization during training and which, in turn, helps transfer knowledge to a variety of tasks during fine-tuning. An example is the OpenAI GPT-3\cite{brownLanguageModelsAre2020} model, which investigates few-shot learning with only a pre-trained model.
98 | As it is just a pre-trained model, it does not outperform fine-tuned models on specific tasks. However, GPT-3 delivers impressive performance on various tasks, including translation, determining the sentiment of tweets, or generating completely novel text. If required, it can also be fine-tuned on specific tasks, which further showcases the generalization power of Transformers.
99 | 
100 | 
101 | \subsection{Vision transformers}
102 | \label{ssec:bgvit}
103 | 
104 | Following their success in NLP, Transformers were explored as a way to learn structural information in other kinds of data. One field where they gained a lot of traction is CV.
105 | However, adapting Transformers to vision tasks came with a few major challenges:
106 | \begin{itemize}
107 | \item Image data does not have a natural sequence, as an image is a spatial collection of pixels.
108 | \item Learning encodings at the pixel level for a large number of images proved to be too fine-grained to scale to big datasets.
109 | \item The masked-language-model idea could not be directly transferred to learning structure in images, as the relationship between the units (pixels) does not follow the same logic as in natural languages.
110 | \end{itemize}
111 | 
112 | To address these challenges, the authors of the Vision Transformer (ViT)\cite{dosovitskiyImageWorth16x162021} came up with ideas to solve each of these problems, in order to get the data into a form on which a Transformer could be efficiently trained.
113 | \begin{itemize}
114 | \item \emph{Serialize and aggregate the data: } As image data does not have a natural sequence structure, such a structure was artificially introduced. Every image was split at the pixel level and aggregated into patches of dimension 16$\times$16, and each of these patches became a member of the new input ``sequence'' (a worked example at the end of this subsection quantifies the savings). As Transformers scale quadratically with increasing sequence size (Table \ref{bg:table1}), this also solved both the efficiency problem and the problem that encodings at the pixel level are too fine-grained to be useful. The embedding and transformer layers were then applied to the resulting sequence of patches, using an architecture similar to BERT.
115 | \item \emph{Domain-specific pre-training: } At heart, most CV problems have very similar training and inference objectives, one of the most common being image classification; most vision tasks are thus similar in objective and differ only in environment. This made classification a much better suited pre-training task for image data than masking and reconstruction. The ViT exploited this, and both pre-training and fine-tuning were done with the objective of classification, with only a change in environment between the stages. This also meant that understanding structure in image data required information not only from the neighbouring patches but from the whole image, which was made possible by using all the encoded patches for classification. We present the details of ViT's pre-training and fine-tuning in Figure \ref{fig:vit}.
116 | \end{itemize}
117 | 
118 | \begin{figure}[!hbt]
119 | \begin{center}
120 | \includegraphics[scale=1.5]{figures/architecture_vit.pdf}
121 | \caption{Original ViT Architecture, Credits: Alexander Dietmüller}
122 | \label{fig:vit}
123 | \end{center}
124 | \end{figure}
125 | 
126 | 
127 | Finally, ViT's authors also observe that domain-specific architectures that implicitly encode structure, like convolutional neural networks (CNNs), work extremely well for image datasets and actually result in better performance on small datasets. However, given enough pre-training data, learning sequences with attention beats architectures that encode structure implicitly, making the choice a tradeoff between utility and the amount of resources required for pre-training. Later research in the field has also shown advancements in vision-based transformers that use a masked-reconstruction approach\cite{heMaskedAutoencodersAre2021}; such details, however, are beyond the scope of this section.
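To make the savings from serialisation concrete: for a standard $224\times224$ input image, $16\times16$ patches yield a sequence of $(224/16)^2 = 196$ elements instead of $224^2 = 50176$ individual pixels. As self-attention scales quadratically in the sequence length (Table \ref{bg:table1}), this shortens the sequence by a factor of $256$ and hence reduces the attention cost by a factor of roughly $256^2$.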
128 | 
129 | 
130 | \section{Related Work}
131 | \label{sec:related_work}
132 | 
133 | The problem of learning network dynamics from packet data has been deemed a complex ``lost cause'', and the research community has made little direct effort in this direction. The idea of using large pre-trained machine learning models to learn from abstract network traces has thus not been explored much. However, application-specific adaptations of ML architectures have been used with varying amounts of success on network data, and we present some of these efforts and successes here, which helped provide direction to our own ideas and thoughts for this project.
134 | 
135 | In MimicNet\cite{zhangMimicNetFastPerformance2021}, the authors show that they can provide users with the familiar abstraction of a packet-level simulation for a portion of the network, while leveraging redundancy and recent advances in machine learning to quickly and accurately approximate portions of the network that are not directly visible. In Puffer\cite{puffer}, the authors use supervised learning in situ, with data from the real deployment environment, to train a probabilistic predictor of upcoming chunk transmission times for video streaming. The authors of Aurora\cite{jayDeepReinforcementLearning2019} show that casting congestion control as a reinforcement learning (RL) problem enables training deep network policies that capture intricate patterns in data traffic and network conditions, but also note that fairness, safety, and generalization are not trivial to address within the conventional RL formalism. Pensieve\cite{maoNeuralAdaptiveVideo2017} trains a neural network model that selects bitrates for future video chunks based on observations collected by client video players, and it learns to make ABR decisions solely through observations of the resulting performance of past decisions. Finally, in NeuroPlan\cite{planning}, the authors propose to use a graph neural network (GNN) combined with a novel domain-specific node-link transformation for state encoding, and then leverage a two-stage hybrid approach that first uses deep RL to prune the search space and then uses an ILP solver to find the optimal solution for the network planning problem.
136 | 
137 | 
138 | 
139 | 
140 | 
141 | 
--------------------------------------------------------------------------------