├── Makefile
├── README.md
├── configuration
├── deploy
│   ├── README.md
│   ├── deploy_nifty.sh
│   ├── deploy_partitioner.sh
│   ├── heal_partition.sh
│   ├── nodes.conf
│   ├── parts.conf
│   ├── print_macs.sh
│   └── stop_nifty.sh
├── examples
│   ├── HDFS
│   │   ├── README.md
│   │   ├── config.py
│   │   ├── deploy_hdfs.py
│   │   ├── run_benchmark.py
│   │   ├── run_exp.py
│   │   └── stop_hdfs.py
│   ├── Kafka
│   │   ├── README.md
│   │   ├── cmd_helper.py
│   │   ├── config.py
│   │   ├── deploy_kafka.py
│   │   ├── helpers.py
│   │   ├── run_benchmark.py
│   │   ├── run_exp.py
│   │   └── stop_kafka.py
│   └── simple example
│       └── example.md
├── pnp.png
└── src
    ├── daemon.cpp
    ├── nifty.cpp
    ├── nifty.h
    └── partitioner.cpp
/Makefile:
--------------------------------------------------------------------------------
1 | CC=g++
2 | CFLAGS=-c -std=c++17 -lpthread -pthread
3 | DEPS = dv.h
4 |
5 | SRC_DIR := src
6 | OBJ_DIR := obj
7 |
8 | .PHONY: all clean nifty partitioner
9 |
10 | all: nifty partitioner
11 |
12 | debug: CFLAGS += -Wall -DDEBUG -g
13 | debug: all
14 |
15 | nifty: $(OBJ_DIR)/daemon.o $(OBJ_DIR)/nifty.o
16 | $(CC) -o $@ $^ -pthread
17 |
18 | partitioner: $(OBJ_DIR)/partitioner.o
19 | $(CC) -o $@ $^
20 |
21 | $(OBJ_DIR)/%.o: $(SRC_DIR)/%.cpp $(SRC_DIR)/nifty.h | $(OBJ_DIR)
22 | $(CC) -o $@ $< $(CFLAGS)
23 |
24 | $(OBJ_DIR):
25 | mkdir -p $@
26 |
27 | clean:
28 | rm -r $(OBJ_DIR) nifty partitioner
29 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Nifty
2 | =======
3 |
4 | Nifty is a transparent communication layer that masks partial network partitions. Partial partitions are a special kind of network partition that divides the cluster into three groups of nodes (groups 1, 2, and 3) such that groups 1 and 2 are disconnected from each other while nodes in group 3 can communicate with all cluster nodes (see figure below). Nifty follows a peer-to-peer design in which every node in the cluster runs a Nifty process. These processes collaborate in monitoring cluster connectivity. When Nifty detects a partial partition, it detours the traffic around the partition through intermediate nodes (e.g., nodes in group 3 in the figure).
5 |
6 | ![Partial network partition](pnp.png)
7 |
8 | Setup
9 | -------
10 |
11 | In order to use Nifty, you need to have OVS installed already, with a bridge called br0. To do that, you can use the following commands (make sure to replace **$INTERFACE_NAME** (e.g., if0) and **$IP_ADDRESS** with their actual values):
12 |
13 | ```bash
14 | $ sudo apt-get update
15 | $ sudo apt-get install openvswitch-switch -y
16 | $ sudo ovs-vsctl init
17 | $ sudo ovs-vsctl add-br br0
18 | $ sudo ovs-vsctl add-port br0 $INTERFACE_NAME
19 | $ sudo ifconfig br0 $IP_ADDRESS netmask 255.255.255.0 up
20 | $ sudo ifconfig $INTERFACE_NAME 0
21 | ```
22 |
23 | After that, you can just use
24 | ```bash
25 | $ make
26 | ```
27 | to compile the code and generate the executables.
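The Makefile also provides a ```debug``` target that adds ```-Wall -DDEBUG -g``` to the compiler flags:
```bash
$ make debug
```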
28 |
29 |
30 | Usage
31 | -------
32 | There are two main executables: Nifty and Partitioner. Nifty is the fault tolerance layer that protects against partial network partitions, whereas Partitioner is a simple tool that can be used to inject partial partitions (for testing purposes). Both of these require OVS and assume the bridge is called br0 (see Setup above).
33 |
34 | The ```deploy``` directory contains scripts for deploying Nifty on a cluster.
35 |
36 |
37 | Examples
38 | -------
39 |
40 | The ```examples``` directory contains examples of using Nifty with different systems.
41 | The ```examples/simple example``` directory contains a simple example that demonstrates the functionality of Nifty.
42 |
--------------------------------------------------------------------------------
/configuration:
--------------------------------------------------------------------------------
1 | NIFTY_HOME='/users/maaalfat/nifty/'
2 |
--------------------------------------------------------------------------------
/deploy/README.md:
--------------------------------------------------------------------------------
1 | The scripts in this folder are meant to facilitate the deployment of NIFTY and Partitioner on multiple machines.
2 |
3 | There are four scripts:
4 |
5 | 1. deploy_nifty.sh: This script configures Nifty and runs it on all the nodes specified in the nodes.conf configuration file.
6 | 2. deploy_partitioner.sh: This script configures the Partitioner and runs it on all the nodes specified in nodes.conf. The way the partition is set up (i.e., between which nodes) is configured in the parts.conf config file.
7 | 3. heal_partition.sh: This script heals the partition that is currently installed on the nodes listed in the nodes.conf configuration file.
8 | 4. stop_nifty.sh: This script stops the Nifty instances running on all the nodes specified in the nodes.conf configuration file.
9 |
10 |
11 | Assumptions
12 | -------
13 |
14 | To run the scripts without any modifications, we make three assumptions. Below we list these assumptions and describe how to
15 | modify the scripts in case any of them does not hold (where possible).
16 |
17 | 1. We assume that Nifty is already present on all nodes in nodes.conf and that its location is the same on all of these nodes (e.g., in the user's NFS home directory). The directory of Nifty can be configured through the configuration file (in the main Nifty directory, outside of the deployment folder). The configuration file currently holds only one variable, called NIFTY_HOME, and it gets parsed by all the scripts.
18 |
19 | 2. We assume that the controller node (where you call the deployment scripts) can ssh into all the nodes in nodes.conf without any extra configuration or restrictions. Before running the experiment, please distribute your ssh keys to the nodes and make sure ssh does not ask for credentials. Furthermore, if you need extra configuration to ssh into other nodes, you need to modify the scripts and change the variable called sshOptions that is present at the top of all script files (see the sketch after this list).
20 |
21 | 3. We assume that you do not need sudo privileges to install OpenFlow rules in OVS on the nodes you ssh into. If this doesn't hold, you can change the lines in the scripts that call Nifty or Partitioner on the other nodes (the scripts have a comment that makes this change easy).
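For example, to use a specific identity file, the sshOptions variable at the top of each script could be changed to something like the following (the key path is a placeholder):

```
sshOptions=" -T -i ~/.ssh/id_rsa "
```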
22 |
23 |
24 | NIFTY Deployment
25 | =======
26 |
27 | First, please set the path to the Nifty directory in the ```NIFTY_HOME``` variable in the ```NIFTY/configuration``` file.
28 |
29 | In order for Nifty to run properly on a cluster, you will need to fill in the config file nodes.conf.
30 | nodes.conf should contain the hostname or IP address of every node in the cluster. Each hostname (or IP address) must be on its own line.
31 |
32 | Example file
33 | ```
34 | 192.168.1.101
35 | 192.168.1.102
36 | 192.168.1.103
37 | ```
38 |
39 | To deploy NIFTY on the nodes in nodes.conf, simply call the deployment script:
40 |
41 | ```
42 | ./deploy_nifty.sh
43 | ```
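Under the hood, the script collects the br0 IP and MAC address of every node into a temporary nifty_nodes.conf file (the node count, then the IPs, then the MACs), copies that file to every node, and starts Nifty on each node roughly as follows (a sketch of the ssh command used in deploy_nifty.sh):

```
ssh $nodeIP "cd $NIFTY_HOME && ./nifty -t 200 -i $ip -m $mac -c nifty_nodes.conf" &
```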
44 |
45 |
46 | Partitioner Deployment
47 | =======
48 |
49 | To use Partitioner, you need to configure the parts.conf file, which specifies the partition.
50 |
51 | The structure of the config file is as follows:
52 | The first line is an integer that represents the number of nodes in the first group (n). The next n lines list the IP addresses of these nodes. The next line is an integer that represents the number of nodes in the second group (m). The next m lines list the IP addresses of these nodes. For example:
53 |
54 | ```
55 | 1
56 | 192.168.1.101
57 | 2
58 | 192.168.1.102
59 | 192.168.1.103
60 | ```
61 |
62 | The example parts.conf above specifies a partition in which IP1 (192.168.1.101) is on one side and the two nodes IP2 (192.168.1.102) and IP3 (192.168.1.103) are on the other. Other nodes that are not listed in parts.conf are not affected by the partition, i.e., they can reach all other nodes.
63 |
64 | Once you configure parts.conf, you can run the partitioner to create the partition:
65 |
66 | ```
67 | ./deploy_partitioner.sh
68 | ```
69 |
70 | To heal a partition, simply call the heal script:
71 |
72 | ```
73 | ./heal_partition.sh
74 | ```
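Both scripts simply ssh into every node listed in nodes.conf and run the partitioner there; roughly (a sketch of the commands used in deploy_partitioner.sh and heal_partition.sh):

```
ssh $nodeIP "cd $NIFTY_HOME && ./partitioner $mac"   # install the partition rules
ssh $nodeIP "$NIFTY_HOME/partitioner"                # heal the partition (no argument)
```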
75 |
76 | Helper Scripts
77 | =======
78 |
79 | If you intend to run Nifty or the Partitioner manually, you will need the MAC addresses of the nodes in the cluster. You can use the print_macs.sh helper script for this.
80 | Run the helper script as follows:
81 |
82 |
83 | ```
84 | ./print_macs.sh nodes.conf
85 | ```
86 | Where nodes.conf includes a list of hostnames or IP addresses.
87 |
88 | This script will print the MAC address of every node found in nodes.conf.
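The output is one line per node, for example (the addresses below are placeholders):

```
192.168.1.101 08:00:27:39:87:e3
192.168.1.102 08:00:27:39:87:e4
```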
89 |
--------------------------------------------------------------------------------
/deploy/deploy_nifty.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # get NIFTY_HOME
4 | . ../configuration
5 |
6 | # If your ssh needs more options, you can set them in the $sshOptions variable (e.g., identity file, port, ...)
7 | sshOptions=" -T "
8 |
9 | # Iterate through all the nodes, get their bridge IPs and MACs and save them (used to build nifty_nodes.conf)
10 | ips="";
11 | macs="";
12 | nodesCount=0;
13 | while IFS= read -r nodeIP
14 | do
15 | if [ -z $nodeIP ]; then
16 | continue;
17 | fi
18 | #ssh into the node and get its IP and MAC.
19 | ip=$(ssh -n $sshOptions $nodeIP ip addr show br0 | grep 'inet ' | cut -f2 | awk '{print $2}' | rev | cut -c4- | rev)
20 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address)
21 |
22 | ips="${ips}${ip}\n"
23 | macs="${macs}${mac}\n"
24 | let nodesCount=nodesCount+1
25 | done < ./nodes.conf
26 |
27 | printf "%d\n%b%b" $nodesCount $ips $macs > nifty_nodes.conf
28 | # For each of the nodes in deployment, update nodes.conf & run nifty with the nodes IP.
29 | while IFS= read -r nodeIP
30 | do
31 | if [ -z $nodeIP ]; then
32 | continue;
33 | fi
34 | #ssh into the node and get its IP and MAC.
35 | ip=$(ssh -n $sshOptions $nodeIP ip addr show br0 | grep 'inet ' | cut -f2 | awk '{print $2}' | rev | cut -c4- | rev)
36 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address)
37 | scp $sshOptions ./nifty_nodes.conf $nodeIP:"${NIFTY_HOME}/nifty_nodes.conf"
38 |
39 | echo "Starting NIFTY on node $nodeIP (which has IP address: $ip, and MAC address: $mac)"
40 | # You may need to run this script as sudo, or add sudo here, to be able to install rules (or configure OVS to not require sudo)
41 | ssh -n $sshOptions $nodeIP "cd $NIFTY_HOME && ./nifty -t 200 -i $ip -m $mac -c nifty_nodes.conf" &
42 |
43 | done < ./nodes.conf
44 |
45 | rm ./nifty_nodes.conf
46 |
--------------------------------------------------------------------------------
/deploy/deploy_partitioner.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # get NIFTY_HOME
4 | . ../configuration
5 |
6 | # If your ssh needs more options, you can set them in the $sshOptions variable (e.g., identity file, port, ...)
7 | sshOptions=" -T "
8 |
9 | macs="";
10 |
11 | while IFS= read -r nodeIP
12 | do
13 |
14 | # read IP addresses, get macs and print them to nifty_parts.conf
15 | re='^[0-9]+$'
16 | if ! [[ $nodeIP =~ $re ]]; then
17 | #ssh into each node and get its MAC.
18 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address)
19 | echo $mac >> nifty_parts.conf
20 | echo "${nodeIP} ${mac}"
21 |
22 | # if it is a number (a group size) and not an IP, just print it to the file
23 | else
24 | echo $nodeIP >> nifty_parts.conf
25 | fi
26 | done < ./parts.conf
27 |
28 |
29 | # For each of the nodes in the deployment, run the partitioner.
30 | while IFS= read -r nodeIP
31 | do
32 | if [ -z $nodeIP ]; then
33 | continue;
34 | fi
35 |
36 | #ssh into the node and get its MAC.
37 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address)
38 | scp $sshOptions ./nifty_parts.conf $nodeIP:"${NIFTY_HOME}/parts.conf"
39 |
40 | echo "Starting Partitioner on node $nodeIP (which has MAC address: $mac)"
41 | # You may need to run this script as sudo, or add sudo here, to be able to install rules (or configure OVS to not require sudo)
42 | ssh -n $sshOptions $nodeIP "cd $NIFTY_HOME && ./partitioner $mac"
43 |
44 | done < ./nodes.conf
45 |
46 | rm nifty_parts.conf
--------------------------------------------------------------------------------
/deploy/heal_partition.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # get NIFTY_HOME
4 | . ../configuration
5 |
6 | # If your ssh needs more options, you can set them in the $sshOptions variable (e.g., identity file, port, ...)
7 | sshOptions=" -T "
8 |
9 | # For each of the nodes in the deployment, heal the partition.
10 | while IFS= read -r nodeIP
11 | do
12 | if [ -z $nodeIP ]; then
13 | continue;
14 | fi
15 | echo "Healing the partition on node $nodeIP"
16 | # You may need to run this script as sudo, or add sudo here, to be able to remove rules (or configure OVS to not require sudo)
17 | ssh -n $sshOptions $nodeIP "$NIFTY_HOME/partitioner"
18 |
19 | done < ./nodes.conf
20 |
--------------------------------------------------------------------------------
/deploy/nodes.conf:
--------------------------------------------------------------------------------
1 | 192.168.1.101
2 | 192.168.1.102
3 | 192.168.1.103
4 |
--------------------------------------------------------------------------------
/deploy/parts.conf:
--------------------------------------------------------------------------------
1 | 1
2 | 08:00:27:39:87:e3
3 | 1
4 | 08:00:27:39:87:e4
5 |
--------------------------------------------------------------------------------
/deploy/print_macs.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # get NIFTY_HOME
4 | . ../configuration
5 |
6 | # If your ssh needs more options, you can set them in the $sshOptions variable (e.g., identity file, port, ...)
7 | sshOptions=" -T "
8 |
9 | echo "The following are the mac addresses for your nodes. Please use these mac addresses to specify your partition in parts.conf file\n"
10 |
11 | # Iterate through all the nodes to find their MAC addresses
12 | while IFS= read -r nodeIP
13 | do
14 | #ssh into each node and get its MAC.
15 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address)
16 |
17 | echo "${nodeIP} ${mac}"
18 | done < ./nodes.conf
19 |
--------------------------------------------------------------------------------
/deploy/stop_nifty.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # get NIFTY_HOME
4 | . ../configuration
5 |
6 | # If your ssh needs more options, you can set them in the $sshOptions variable (e.g., identity file, port, ...)
7 | sshOptions=" -T "
8 |
9 | # For each of the nodes in the deployment, stop Nifty and remove its OpenFlow rules.
10 | while IFS= read -r nodeIP
11 | do
12 | if [ -z $nodeIP ]; then
13 | continue;
14 | fi
15 | echo "Stopping Nifty on node $nodeIP"
16 | # You may need to run this script as sudo, or add sudo here, to be able to remove rules (or configure OVS to not require sudo)
17 | ssh -n $sshOptions $nodeIP killall nifty
18 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=1/-1
19 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=2/-1
20 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=3/-1
21 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=4/-1
22 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=5/-1
23 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=6/-1
24 | done < ./nodes.conf
--------------------------------------------------------------------------------
/examples/HDFS/README.md:
--------------------------------------------------------------------------------
1 | Example of using Nifty with HDFS
2 | =======
3 | The scripts in this directory show an example of using Nifty with HDFS. The following steps guide you through deploying an HDFS cluster with Nifty and evaluating its performance.
4 |
5 | Prerequisites
6 | -------
7 | 1- Install Java.
8 | ```bash
9 | $ apt-get install default-jdk -y
10 | ```
11 |
12 | 2- Install Hadoop 3.3.0 from the website: https://downloads.apache.org/hadoop/common/hadoop-3.3.0/.
13 |
14 | 3- Install the Paramiko library for Python SSH; the easiest way is using pip.
15 | ```bash
16 | $ pip install paramiko
17 | ```
18 | While most environments already have pip, you may need to install it manually as described in https://github.com/pypa/get-pip.
19 |
20 |
21 | 4- Make sure the different machines can SSH into each other, or at least one controller node can SSH into all machines. This could call for setting up some keys for SSH.
22 |
23 | Running the Example
24 | -------
25 | 1- Set variables in the config.py file:
26 | * HADOOP_HOME: make sure that this directory is where Hadoop is installed and that it is the same on all nodes.
27 | * HADOOP_STORE: This is the directory that HDFS will use for storage. The scripts also use this directory for some temp files. Make sure that this directory exists on all the nodes in the cluster. Also, make sure you have at least 1GB/client of available storage.
28 | * The IP addresses of all nodes in the cluster; this includes the HDFS cluster nodes (NameNode and DataNodes) and the nodes that will run the benchmark clients. Please note that the scripts assume that the first IP in the list will host the NameNode instance.
29 | * The size of the HDFS cluster, which will be split into one NameNode while the rest of the nodes will be DataNodes.
30 |
31 |
32 | 2- Start by setting the HDFS parameters. You'll need (at least) to edit the following files on all cluster nodes:
33 |
34 | * In $HADOOP_HOME/etc/hadoop/hdfs-site.xml: Add the following two properties, with the [HADOOP_STORE] defined in step 1:
35 |
36 | ```xml
37 | <property>
38 |   <name>dfs.namenode.name.dir</name>
39 |   <value>file:[HADOOP_STORE]/hdfs/namenode</value>
40 | </property>
41 | <property>
42 |   <name>dfs.datanode.data.dir</name>
43 |   <value>file:[HADOOP_STORE]/hdfs/datanode</value>
44 | </property>
45 | ```
46 |
47 | * In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, add your JAVA_HOME directory. It should look something like this:
48 | ```bash
49 | # The java implementation to use. By default, this environment
50 | # variable is REQUIRED on ALL platforms except OS X!
51 | export JAVA_HOME='/usr/lib/jvm/java-8-openjdk-amd64'
52 | ```
53 |
54 | * In $HADOOP_HOME/etc/hadoop/core-site.xml, you must specify the address of your NameNode to allow the DataNodes to reach it. It should have a property like the following, with node1_ip being the first ip in the list specified in the config.py file in step 1:
55 | ```xml
56 | <property>
57 |   <name>fs.defaultFS</name>
58 |   <value>hdfs://node1_ip:9000</value>
59 | </property>
60 | ```
61 |
62 | 3- From the controller node, which could be a separate node or part of the cluster, start HDFS. You can use:
63 | ```bash
64 | $ python deploy_hdfs.py
65 | ```
66 | The script will start a NameNode on the machine with the first IP address in the list (in config.py), and enough DataNodes to satisfy the configured cluster size.
67 |
68 | 4- If you're testing with Nifty, now would be the time to start it using deploy/deploy_nifty.sh. You can learn more about how to start Nifty in the README of the deploy directory.
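For example, from the controller node (a sketch, assuming you are in examples/HDFS within this repository and have already filled in deploy/nodes.conf):
```bash
$ cd ../../deploy
$ ./deploy_nifty.sh
```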
69 |
70 | 5- To evaluate the performance of the cluster, run the HDFS TestDFSIO benchmark. This can be done by:
71 | ```bash
72 | $ python run_benchmark.py num_of_clients
73 | ```
74 | The script takes the number of clients to run as its only argument. Clients will be distributed onto the machines that are in the cluster list but were not used in the HDFS cluster. The clients will run in parallel, starting with a cleanup, then writing to the HDFS cluster, then reading the same files they wrote. The run_benchmark script then reports the total throughput of the cluster during the write period and during the read period.
75 |
76 | To test different numbers of clients in a single run, e.g., for plotting figures, you might want to use the following command:
77 | ```bash
78 | $ python run_exp.py min_num_clients max_num_clients step
79 | ```
79 | This command will create a CSV file called results.csv with the read and write throughputs across the different client counts.
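For example, sweeping from 2 to 8 clients in steps of 2 (the CSV header shown below is the one written by run_exp.py):
```bash
$ python run_exp.py 2 8 2
$ head -1 results.csv
Clients,Write Throughput,Read Throughput
```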
--------------------------------------------------------------------------------
/examples/HDFS/config.py:
--------------------------------------------------------------------------------
1 | HADOOP_VERSION='3.3.0'
2 | HADOOP_HOME='/proj/sds-PG0/basil/HDFS/hadoop-' + HADOOP_VERSION
3 | HADOOP_STORE='/dev/shm/hadoop_store'
4 | ips = ['192.168.1.101', '192.168.1.102', '192.168.1.103', '192.168.1.104', '192.168.1.105']
5 | num_of_cluster_nodes = 3
6 | SSH_USERNAME='b2alkhat'
--------------------------------------------------------------------------------
/examples/HDFS/deploy_hdfs.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | import config
4 |
5 | import sys
6 | import paramiko
7 | import time
8 | import threading
9 | import os
10 | import subprocess
11 | from multiprocessing import Pool
12 | from datetime import datetime
13 |
14 | nodes = []
15 |
16 | # parameters for SSH paramiko
17 | port = 22
18 |
19 |
20 |
21 | # SSH to all the nodes
22 | try:
23 | for ip in config.ips:
24 | print(ip)
25 | node = paramiko.SSHClient()
26 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy())
27 | node.connect(ip, port=port, username=config.SSH_USERNAME)
28 | print("Trying to connect to node with address: " + ip)
29 | nodes.append(node)
30 | except:
31 | for n in nodes:
32 | n.close()
33 | print("Error: Could not ssh to all nodes")
34 |
35 |
36 |
37 | # Start NameNode on first IP node
38 | stdin, stdout, stderr = nodes[0].exec_command(config.HADOOP_HOME + "/bin/hdfs namenode -format c1")
39 | #print(stdout.read())
40 | print(stderr.read())
41 | stdin, stdout, stderr = nodes[0].exec_command(config.HADOOP_HOME + "/bin/hdfs --daemon start namenode")
42 | #print(stdout.read())
43 | print(stderr.read())
44 |
45 | # Start DataNodes
46 | for i in range(config.num_of_cluster_nodes-1):
47 | command = config.HADOOP_HOME + "/bin/hdfs --daemon start datanode"
48 | stdin, stdout, stderr = nodes[i+1].exec_command(command)
49 | print(stdout.read())
50 | print(stderr.read())
51 | print("Done DataNode start on node...")
--------------------------------------------------------------------------------
/examples/HDFS/run_benchmark.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | import config
3 |
4 | import sys
5 | import paramiko
6 | import time
7 | import threading
8 | import os
9 | import subprocess
10 | from multiprocessing import Pool
11 | from datetime import datetime
12 |
13 |
14 | nodes = []
15 | num_of_nodes = len(config.ips)
16 | num_of_client_nodes = num_of_nodes - config.num_of_cluster_nodes
17 |
18 |
19 | # parameters for SSH paramiko
20 | port = 22
21 |
22 |
23 | # read inputs from arguments
24 | num_of_clients = int(sys.argv[1])
25 |
26 | # SSH to all the nodes
27 | try:
28 | for ip in config.ips:
29 | print(ip)
30 | node = paramiko.SSHClient()
31 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy())
32 | node.connect(ip, port=port, username=config.SSH_USERNAME)
33 | print("Trying to connect to node with address: " + ip)
34 | nodes.append(node)
35 | except:
36 | for n in nodes:
37 | n.close()
38 | print("Error: Could not ssh to all nodes")
39 |
40 |
41 |
42 | def runBenchmarkReadClient(node, cid):
43 | command = "cd " + config.HADOOP_HOME +"; ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-" + config.HADOOP_VERSION + "-tests.jar TestDFSIO -Dtest.build.data=/benchmark/c" + str(cid) + " -Dmapred.output.compress=false -read -nrFiles 1 -fileSize 1000 &> " + config.HADOOP_HOME + "/temp_output_read_" + str(cid) + ".txt"
44 | node.exec_command(command)
45 |
46 | def runBenchmarkWriteClient(node, cid):
47 | command = "cd " + config.HADOOP_HOME +"; ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-" + config.HADOOP_VERSION + "-tests.jar TestDFSIO -Dtest.build.data=/benchmark/c" + str(cid) + " -Dmapred.output.compress=false -write -nrFiles 1 -fileSize 1000 &> " + config.HADOOP_HOME + "/temp_output_write_" + str(cid) + ".txt"
48 | node.exec_command(command)
49 |
50 | def runBenchmarkCleanup(node):
51 | node.exec_command("cd " + config.HADOOP_HOME + "; ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-" + config.HADOOP_VERSION + "-tests.jar TestDFSIO -Dtest.build.data=/benchmark -Dmapred.output.compress=false -clean")
52 |
53 |
54 |
55 |
56 | runBenchmarkCleanup(nodes[0])
57 |
58 |
59 |
60 |
61 | # Polling to make sure that files inside /benchmark are deleted
62 | # since HDFS has pretty unpredictable delete time
63 | x = 5
64 | while x > 1:
65 | try:
66 | results = subprocess.check_output((config.HADOOP_HOME + "/bin/hdfs dfs -ls /benchmark").split(" "))
67 | except:
68 | results = ''
69 | lines = results.split(" ")
70 | x = len(lines)
71 | time.sleep(10)
72 |
73 |
74 |
75 |
76 | print("Client's Write Operations Starting...")
77 | threads = []
78 | for i in range(num_of_clients):
79 | threads.append(threading.Thread(target=runBenchmarkWriteClient, args=[nodes[config.num_of_cluster_nodes + (i % num_of_client_nodes)], i]))
80 |
81 | for t in threads:
82 | t.start()
83 |
84 |
85 | # Polling for the results to be ready before joining threads
86 | for c in range(num_of_clients):
87 | done = False
88 | while done == False:
89 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_write_" + str(c) + ".txt | grep Throughput")
90 | output_c = stdout.read()
91 | done = ('Throughput' in output_c)
92 | time.sleep(5)
93 |
94 | for t in threads:
95 | t.join()
96 | print("Client's Write Done")
97 |
98 |
99 |
100 |
101 | print("Client's Read Operation Starting...")
102 | threads = []
103 | for i in range(num_of_clients):
104 | threads.append(threading.Thread(target=runBenchmarkReadClient, args=[nodes[config.num_of_cluster_nodes + (i % num_of_client_nodes)], i]))
105 |
106 | for t in threads:
107 | t.start()
108 |
109 | # Polling for the results to be ready before joining threads
110 | for c in range(num_of_clients):
111 | done = False
112 | while done == False:
113 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_read_" + str(c) + ".txt | grep Throughput")
114 | output_c = stdout.read()
115 | done = ('Throughput' in output_c)
116 | time.sleep(5)
117 |
118 | for t in threads:
119 | t.join()
120 | print("Client's Read Done")
121 |
122 |
123 |
124 | #-------------------------------------
125 | # Results Collection
126 | #-------------------------------------
127 |
128 | throughput = 0
129 | for c in range(num_of_clients):
130 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_write_" + str(c) + ".txt")
131 | lines = stdout.read().splitlines()
132 | for line in lines:
133 | if "Throughput" in line:
134 | throughput = throughput + float(line.split(" ")[-1])
135 | nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("rm " + config.HADOOP_HOME + "/temp_output_write_" + str(c) + ".txt")
136 | print("Total Write Throughput: " + str(throughput) + "MB/sec")
137 |
138 |
139 | throughput = 0
140 | for c in range(num_of_clients):
141 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_read_" + str(c) + ".txt")
142 | lines = stdout.read().splitlines()
143 | for line in lines:
144 | if "Throughput" in line:
145 | throughput = throughput + float(line.split(" ")[-1])
146 | nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("rm " + config.HADOOP_HOME + "/temp_output_read_" + str(c) + ".txt")
147 | print("Total Read Throughput: " + str(throughput) + "MB/sec")
--------------------------------------------------------------------------------
/examples/HDFS/run_exp.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | import config
3 |
4 | import sys
5 | import paramiko
6 | import time
7 | import threading
8 | import os
9 | import subprocess
10 | from multiprocessing import Pool
11 | from datetime import datetime
12 |
13 |
14 | nodes = []
15 |
16 | # parameters for SSH paramiko
17 | port = 22
18 |
19 |
20 | # SSH to all the nodes
21 | try:
22 | for ip in config.ips:
23 | print(ip)
24 | node = paramiko.SSHClient()
25 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy())
26 | node.connect(ip, port=port, username=config.SSH_USERNAME)
27 | print("Trying to connect to node with address: " + ip)
28 | nodes.append(node)
29 | except:
30 | for n in nodes:
31 | n.close()
32 | print("Error: Could not ssh to all nodes")
33 |
34 |
35 |
36 |
37 | if (len(sys.argv) < 4):
38 | print ("::USAGE::")
39 | print ("python ./run_exp.py min_num_clients max_num_clients step")
40 | print ("this script runs multiple experiments by increasing the ")
41 | print ("number of clients from min_num_clients to max_num_clients ")
42 | print ("by the value of step in each iteration")
43 | sys.exit()
44 |
45 | min_clients = int(sys.argv[1])
46 | max_clients = int(sys.argv[2])
47 | step = int(sys.argv[3])
48 |
49 | finalOutput = 'Clients,Write Throughput,Read Throughput\n'
50 |
51 | for clients in range(min_clients, max_clients+1, step):
52 | print('Starting experiment with ' + str(clients) + ' clients...')
53 | # get the working directory
54 | res = subprocess.check_output('pwd')
55 |
56 | # just run run_benchmark.py repeatedly
57 | stdin, stdout, stderr = nodes[0].exec_command('python ' + res.strip() + '/run_benchmark.py ' + str(clients))
58 | lines = stdout.read().splitlines()
59 |
60 | # Collect and parse results
61 | finalOutput = finalOutput + str(clients) + ','
62 | for line in lines:
63 | if 'Total Write Throughput' in line:
64 | finalOutput = finalOutput + str(line.split(' ')[3]) + ','
65 | if 'Total Read Throughput' in line:
66 | finalOutput = finalOutput + str(line.split(' ')[3]) + '\n'
67 |
68 | print ('Experiment Complete, results are in file results.csv')
69 | try:
70 | f = open("result.csv", "w")
71 | f.write(finalOutput)
72 | f.close()
73 | except Exception as e:
74 | print(e)
--------------------------------------------------------------------------------
/examples/HDFS/stop_hdfs.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | import config
4 |
5 | import sys
6 | import paramiko
7 | import time
8 | import threading
9 | import os
10 | import subprocess
11 | from multiprocessing import Pool
12 | from datetime import datetime
13 |
14 | nodes = []
15 |
16 | # parameters for SSH paramiko
17 | port = 22
18 |
19 |
20 | # SSH to all the nodes
21 | try:
22 | for ip in config.ips:
23 | print(ip)
24 | node = paramiko.SSHClient()
25 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy())
26 | node.connect(ip, port=port, username=config.SSH_USERNAME)
27 | print("Trying to connect to node with address: " + ip)
28 | nodes.append(node)
29 | except:
30 | for n in nodes:
31 | n.close()
32 | print("Error: Could not ssh to all nodes")
33 |
34 |
35 |
36 |
37 | nodes[0].exec_command(config.HADOOP_HOME + "/bin/hdfs --daemon stop namenode")
38 |
39 |
40 | for i in range(config.num_of_cluster_nodes-1):
41 | command = config.HADOOP_HOME + "/bin/hdfs --daemon stop datanode"
42 | stdin, stdout, stderr = nodes[i+1].exec_command(command)
43 | print(stdout.read())
44 | nodes[i+1].exec_command("rm -r " + config.HADOOP_STORE)
45 |
46 | print("HDFS Stopped")
--------------------------------------------------------------------------------
/examples/Kafka/README.md:
--------------------------------------------------------------------------------
1 | Example of using Nifty with Kafka
2 | =======
3 | The scripts in this directory can be used to try Nifty with Kafka v2.13-2.6.0. The scripts were tested on Ubuntu 16.04.1 LTS with OpenJDK version 1.8.0_265 and Python 3.5.2. These scripts automate the process of deploying Kafka on multiple machines, running Kafka producers to generate some workload, and shutting down the Kafka cluster.
4 |
5 | Description of the scripts
6 | -------
7 | The following is a brief description of each script in this directory:
8 |
9 | * config.py: This file contains the configuration parameters that are needed to run the benchmark. These parameters are described below in the *Running the Experiment* section.
10 | * deploy_kafka.py: this script deploys Kafka brokers on the cluster specified in `config.py` file.
11 | * stop_kafka.py: this script stops all Kafka instances that are running on the cluster specified in `config.py` file.
12 | * run_benchmark.py: this script runs the Kafka producers to measure the throughput of Kafka. The script takes one argument which specifies the number of producers to run. Producers will be distributed over the client nodes (i.e., nodes that are not part of the Kafka cluster).
13 | * run_exp.py: This script runs multiple experiments to get the performance of Kafka with different numbers of clients. This script takes three arguments: `min_num_clients, max_num_clients, step`. The first experiment starts with `min_num_clients` clients, and the number of clients is then increased by `step` in subsequent experiments until it reaches `max_num_clients`. Clients will be distributed over the machines that are in the cluster but are not used in the Kafka cluster. The clients will run in parallel, and each client will produce to its own topic. Finally, the script writes the throughput results to a file stored in `./tmp/result.csv`. As an example, `python run_exp.py 2 4 2` will generate the throughput of Kafka with 2 and 4 clients. The output `result.csv` file will look something like this:
14 |
15 | number of producers | throughput
16 | ------------------- | ----------
17 | 2 | 200000
18 | 4 | 400000
19 |
20 |
21 | Prerequisites
22 | -------
23 | 1- Install Java on all machines.
24 | ```bash
25 | $ apt-get install default-jdk -y
26 | ```
27 | 2- Install Kafka v2.13-2.6.0 on all machines. Follow this link to download Kafka: https://www.apache.org/dyn/closer.cgi?path=/kafka/2.6.0/kafka_2.13-2.6.0.tgz.
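One way to do this (a sketch, using the Apache archive, which keeps this exact version):
```bash
$ wget https://archive.apache.org/dist/kafka/2.6.0/kafka_2.13-2.6.0.tgz
$ tar -xzf kafka_2.13-2.6.0.tgz
```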
28 |
29 |
30 | Running the Experiment
31 | -------
32 | 1- Set the following variables in the config.py file:
33 | * KAFKA_HOME: the directory where Kafka is installed; it must be the same on all nodes in the cluster.
34 | * NODES_IPS: the IP addresses of all nodes in the cluster.
35 | * KAFKA_DATA_DIR: the directory where Kafka logs and config files will be stored.
36 | * KAFKA_PORT: Kafka brokers will listen on this port.
37 | * ZOOKEEPER_PORT: ZooKeeper will listen on this port.
38 | * REPLICATION_FACTOR: the replication factor for Kafka.
39 | * USER_NAME: ssh username which is needed to ssh into other nodes and run commands. Set this to `None` if not needed.
40 | * SSH_KEY_PATH: the path to ssh private key. Set this to `None` if not needed.
41 |
42 | 2- Start the Kafka cluster by running
43 | ```bash
44 | python3 deploy_kafka.py
45 | ```
46 | This script starts a ZooKeeper instance on the first node in `NODES_IPS` and N Kafka brokers on the next N nodes, where N = `REPLICATION_FACTOR` configuration parameter.
47 |
48 | 3- Start the throughput experiment by running the `run_exp.py` script.
49 | ```bash
50 | python3 run_exp.py min_num_clients max_num_clients step
51 | ```
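Once the experiments are done, the brokers and ZooKeeper can be shut down with the stop script described above:
```bash
python3 stop_kafka.py
```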
52 |
53 | Experimenting with Nifty
54 | -------
55 | In order to compare the performance of Kafka with and without Nifty, you must run the same experiments with and without Nifty and compare the results. Please refer to the `deploy` folder to see how to deploy Nifty on multiple nodes.
56 |
--------------------------------------------------------------------------------
/examples/Kafka/cmd_helper.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | import os
3 | import config
4 |
5 | '''
6 | This is a helper class to execute system
7 | commands locally and remotely
8 | '''
9 | class CmdHelper:
10 | def __init__(self):
11 | self.sshKeyPath = config.SSH_KEY_PATH
12 | self.username = config.USER_NAME
13 |
14 | def executeCmdBlocking(self, cmd, redirectOutputPath=None):
15 | try:
16 | if redirectOutputPath:
17 | f = open(redirectOutputPath, "w")
18 | p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=f)
19 | f.close()
20 | else:
21 | p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
22 | output, err = p.communicate()
23 | return [output, err]
24 | except OSError as e:
25 | return [None, e]
26 | except ValueError as e:
27 | return [None, e]
28 | except Exception as e:
29 | return [None, e]
30 |
31 | def executeCmdNonBlocking(self, cmd, redirectOutputPath=None):
32 | try:
33 | cmd = "sudo " + cmd
34 | if redirectOutputPath:
35 | f = open(redirectOutputPath, "w")
36 | p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=f)
37 | else:
38 | p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
39 | return [p, None]
40 | except OSError as e:
41 | return [None, e]
42 | except ValueError as e:
43 | return [None, e]
44 | except Exception as e:
45 | return [None, e]
46 |
47 | def uploadToServer(self, localPath, remoteAddress, remotePath, blocking=True):
48 | cmd = "sudo scp -o StrictHostKeyChecking=no -r "
49 | usernameStr = ""
50 | if self.sshKeyPath:
51 | cmd = cmd + " -i " + self.sshKeyPath
52 | if self.username:
53 | usernameStr = " " + self.username + "@"
54 |
55 | cmd = cmd + " " + localPath + " "
56 | cmd = cmd + usernameStr
57 | cmd = cmd + remoteAddress
58 | cmd = cmd + ":" + remotePath
59 | if blocking:
60 | return self.executeCmdBlocking(cmd)
61 | else:
62 | return self.executeCmdNonBlocking(cmd)
63 |
64 | def downloadFromServer(self, localPath, remoteAddress, remotePath):
65 | cmd = "sudo scp -o StrictHostKeyChecking=no -r "
66 | if self.sshKeyPath:
67 | cmd = cmd + " -i " + self.sshKeyPath
68 | if self.username:
69 | cmd = cmd + " " + self.username +"@"
70 |
71 | cmd = cmd + remoteAddress
72 | cmd = cmd + ":" + remotePath
73 | cmd = cmd + " " + localPath
74 | return self.executeCmdBlocking(cmd)
75 |
76 | def executeCmdRemotely(self, cmd, remoteAddress, blocking=True, redirectOutputPath=None):
77 | cmd = "\"sudo " + cmd + "\""
78 | fullCommand = "ssh -o StrictHostKeyChecking=no "
79 | if self.sshKeyPath:
80 | fullCommand = fullCommand + " -i " + self.sshKeyPath
81 | if self.username:
82 | fullCommand = fullCommand + " " + self.username +"@"
83 |
84 | fullCommand = fullCommand + remoteAddress
85 | fullCommand = fullCommand + " " + cmd
86 | if blocking:
87 | return self.executeCmdBlocking(fullCommand, redirectOutputPath)
88 | else:
89 | return self.executeCmdNonBlocking(fullCommand, redirectOutputPath)
90 |
91 |
92 |
--------------------------------------------------------------------------------
/examples/Kafka/config.py:
--------------------------------------------------------------------------------
1 | '''
2 | This file contains the configuration parameters that are needed to
3 | run Kafka and reproduce the results presented in the paper titled Toward a Generic
4 | Fault Tolerance Technique for Partial Network Partitioning (OSDI'20).
5 | '''
6 |
7 | # the ssh username
8 | USER_NAME = None
9 |
10 | # the path for the ssh private key
11 | SSH_KEY_PATH = None
12 |
13 | # the ip addresses of all nodes in the cluster
14 | NODES_IPS=["192.168.1.101","192.168.1.102","192.168.1.103","192.168.1.104","192.168.1.112","192.168.1.114","192.168.1.118","192.168.1.105","192.168.1.113","192.168.1.119","192.168.1.111","192.168.1.115","192.168.1.107","192.168.1.116","192.168.1.117","192.168.1.120","192.168.1.106","192.168.1.108","192.168.1.109","192.168.1.110"]
15 |
16 | # The root directory of Kafka
17 | KAFKA_HOME ="/proj/sds-PG0/ahmed/nifty/kafka/kafka_2.13-2.6.0/"
18 |
19 | # the path that ZooKeeper, brokers, and producers
20 | # use to store logs and config files
21 | KAFKA_DATA_DIR ="/media/ssd/kafka/"
22 |
23 | # replication factor for Kafka
24 | REPLICATION_FACTOR = 3
25 |
26 |
27 | assert len(NODES_IPS) > REPLICATION_FACTOR + 1, "At least {} nodes are needed (replication factor:{} & 1 ZooKeeper & 1 Client)".format(REPLICATION_FACTOR+2, REPLICATION_FACTOR)
28 |
29 | # produced message size
30 | MESSAGE_SIZE = 128
31 |
32 | # number of messages produced per producer
33 | MESSAGES_COUNT = 2000000
34 |
35 | # first node will be used to run ZooKeeper
36 | ZOOKEEPER_ADDRESS = NODES_IPS[0]
37 |
38 | # the next (N = REPLICATION_FACTOR) nodes are used to run Kafka brokers
39 | BROKER_NODES = NODES_IPS[1 : 1 + REPLICATION_FACTOR]
40 |
41 | # the remaining nodes are used to run clients
42 | CLIENT_NODES = NODES_IPS[1 + REPLICATION_FACTOR : len(NODES_IPS)]
43 |
44 |
45 |
46 | # the directory contains the binaries of Kafka
47 | KAFKA_BIN =KAFKA_HOME + "bin/"
48 | # Binaries to run various processes
49 | KAFKA_CREATE_TOPIC_BIN = KAFKA_BIN + "kafka-topics.sh "
50 | KAFKA_PRODUCER_TEST_BIN = KAFKA_BIN + "kafka-producer-perf-test.sh "
51 | KAFKA_BROKER_BIN = KAFKA_BIN + "kafka-server-start.sh "
52 | KAFKA_ZK_BIN = KAFKA_BIN + "zookeeper-server-start.sh "
53 |
54 |
55 | # Kafka logs directory
56 | KAFKA_LOGS_DIR = KAFKA_DATA_DIR + "logs"
57 | # ZooKeeper logs directory
58 | ZOOKEEPER_LOGS_DIR = KAFKA_DATA_DIR + "zk-logs"
59 |
60 | # Kafka brokers listen on this port
61 | KAFKA_PORT = "55555"
62 |
63 | # ZooKeeper listens on this port
64 | ZOOKEEPER_PORT = "2181"
65 |
66 |
--------------------------------------------------------------------------------
/examples/Kafka/deploy_kafka.py:
--------------------------------------------------------------------------------
1 | '''
2 | This script deploys Kafka and ZooKeeper on the cluster specified
3 | in the config.py file
4 | '''
5 | import os
6 | import time
7 | import config
8 | import cmd_helper
9 | import helpers
10 |
11 | cmdHelper = cmd_helper.CmdHelper()
12 |
13 | def deployKafka():
14 | # initialize needed directories on all nodes
15 | for node in config.NODES_IPS:
16 | cmd = "rm -r {}".format(config.KAFKA_DATA_DIR)
17 | cmdHelper.executeCmdRemotely(cmd, node, True)
18 | cmd = "mkdir -p {}".format(config.KAFKA_DATA_DIR)
19 | cmdHelper.executeCmdRemotely(cmd, node, True)
20 | cmd = "mkdir -p {}".format(config.KAFKA_LOGS_DIR)
21 | cmdHelper.executeCmdRemotely(cmd, node, True)
22 | cmd = "mkdir -p {}".format(config.ZOOKEEPER_LOGS_DIR)
23 | cmdHelper.executeCmdRemotely(cmd, node, True)
24 | cmd = "chmod 777 -R {}".format(config.KAFKA_DATA_DIR)
25 | cmdHelper.executeCmdRemotely(cmd, node, True)
26 |
27 |
28 | # write ZooKeeper config file
29 | path = TMP_DIR + "zk-config.properties"
30 | helpers.writeZkConfigFile(path, config.ZOOKEEPER_PORT, config.ZOOKEEPER_LOGS_DIR)
31 | cmdHelper.uploadToServer(path, config.ZOOKEEPER_ADDRESS, config.KAFKA_DATA_DIR, True)
32 |
33 | # write brokers config files
34 | for index, node in enumerate(config.BROKER_NODES):
35 | path = TMP_DIR + "server{}.properties".format(index)
36 | helpers.writeBrokerConfigFile(path, index, node, config.KAFKA_PORT, config.KAFKA_LOGS_DIR, config.ZOOKEEPER_ADDRESS, config.ZOOKEEPER_PORT)
37 | cmdHelper.uploadToServer(path, node, config.KAFKA_DATA_DIR, True)
38 |
39 | # start ZooKeeper
40 | path = config.KAFKA_DATA_DIR + "zk-config.properties"
41 | cmd = config.KAFKA_ZK_BIN + " {}".format(path)
42 | cmdHelper.executeCmdRemotely(cmd, config.ZOOKEEPER_ADDRESS, False, "{}/zookeeper.log".format(TMP_DIR))
43 |
44 | # start kafka brokers
45 | for index, s in enumerate(config.BROKER_NODES):
46 | path = config.KAFKA_DATA_DIR + "server{}.properties".format(index)
47 | cmd = config.KAFKA_BROKER_BIN + " {}".format(path)
48 | cmdHelper.executeCmdRemotely(cmd, s, False, "{}/broker{}.log".format(TMP_DIR, index))
49 |
50 |
51 |
52 | # temp folder that is used to store temp config and log files
53 | TMP_DIR = os.getcwd() + "/tmp/"
54 | cmd = "mkdir -p {}".format(TMP_DIR)
55 | cmdHelper.executeCmdBlocking(cmd)
56 |
57 | deployKafka()
--------------------------------------------------------------------------------
/examples/Kafka/helpers.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | '''
4 | This helper function writes the configuration file
5 | for a Kafka broker
6 | '''
7 | def writeBrokerConfigFile(path, brokerId,kafkaAddress, kafkaPort, logFilesPath, zkAddress, zkPort):
8 | path = os.path.expanduser(path)
9 | str = ""
10 | str = str + "broker.id={}\n".format(brokerId)
11 | str = str + "zookeeper.connect={}:{}\n".format(zkAddress, zkPort)
12 | str = str + "listeners=PLAINTEXT://{}:{}\n".format(kafkaAddress, kafkaPort)
13 | str = str + "log.dirs={}\n".format(logFilesPath)
14 | str = str + "num.partitions=3\n"
15 | str = str + "num.network.threads=4\n"
16 | str = str + "num.io.threads=10\n"
17 |
18 | with open(path, 'w') as file:
19 | file.write(str)
20 |
21 | '''
22 | This helper function writes the configuration file
23 | for ZooKeeper
24 | '''
25 | def writeZkConfigFile(path, zkPort, logFilesPath):
26 | path = os.path.expanduser(path)
27 | str = ""
28 | str = str + "clientPort={}\n".format(zkPort)
29 | str = str + "dataDir={}\n".format(logFilesPath)
30 |
31 | with open(path, 'w') as file:
32 | file.write(str)
33 |
34 | '''
35 | This helper function writes the configuration file
36 | for Kafka producer
37 | '''
38 | def writeProducerConfigFile(path, brokersAddresses, logFilesPath):
39 | path = os.path.expanduser(path)
40 | str = ""
41 | str = str + "bootstrap.servers={}\n".format(brokersAddresses)
42 | str = str + "acks=all\n"
43 |
44 | with open(path, 'w') as file:
45 | file.write(str)
46 |
47 |
--------------------------------------------------------------------------------
/examples/Kafka/run_benchmark.py:
--------------------------------------------------------------------------------
1 | '''
2 | This script benchmarks the performance of Kafka by
3 | running multiple producers and measuring the throughput.
4 | This script receives one command-line argument: the number of producers
5 | '''
6 |
7 | import sys
8 | import threading
9 | import os
10 | import time
11 | import config
12 | import cmd_helper
13 | import helpers
14 |
15 | cmdHelper = cmd_helper.CmdHelper()
16 |
17 | # number of producers to run
18 | PRODUCERS_COUNT = int(sys.argv[1])
19 |
20 | def printResults(producersProcesses):
21 | # read the results
22 | total = 0
23 | print ("###################")
24 | for index, p in enumerate(producersProcesses):
25 | try:
26 | output, err = p.communicate()
27 | if (err):
28 | print("producer{}: {}".format(index,err.decode("utf-8")))
29 | else:
30 | output = output.decode("utf-8")
31 | output = output.split('\n')
32 | output.remove('')
33 | output = output[-1]
34 | output = output.split(',')
35 | if (len(output) == 8):
36 | throughput = float(output[1].split(' ')[1])
37 | total = total + throughput
38 | print("producer{}: {:.2f} messages/sec".format(index, throughput))
39 | else:
40 | print("producer{}: {}".format(index, "error"))
41 | except Exception as e:
42 | print("{}: {}".format(index, e))
43 | print ("-------------------")
44 | print("total: {:.2f} messages/sec".format(total))
45 |
46 |
47 |
48 | # create a single topic
49 | def createSingleTopic(topicName):
50 | cmd = config.KAFKA_CREATE_TOPIC_BIN + " --create --topic {} --zookeeper {}:{} --replication-factor {} --partitions 3".format(topicName,config.ZOOKEEPER_ADDRESS, config.ZOOKEEPER_PORT, config.REPLICATION_FACTOR)
51 | cmdHelper.executeCmdRemotely(cmd, config.BROKER_NODES[0], True, TMP_DIR + "null")
52 |
53 |
54 | # create a topic for each producer.
55 | def createTopics():
56 | threads = []
57 | for i in range(PRODUCERS_COUNT):
58 | threads.append(threading.Thread(target=createSingleTopic, args=["topic-{}".format(i)]))
59 |
60 | for t in threads:
61 | t.start()
62 |
63 | for t in threads:
64 | t.join()
65 |
66 |
67 | def startSingleProducer(cmd, nodeIp, producersProcesses):
68 | [p, err] = cmdHelper.executeCmdRemotely(cmd, nodeIp, False)
69 | producersProcesses.append(p)
70 |
71 | # start the producers and collect the results
72 | def startProducers():
73 | # thread array to store the threads used to run clients in parallel
74 | threads = []
75 |
76 | # convert the brokers ip addresses to producers format
77 | brokersAddressesStr = []
78 | for index, node in enumerate(config.BROKER_NODES):
79 | brokersAddressesStr.append("{}:{}".format(node, config.KAFKA_PORT))
80 | brokersAddressesStr = ','.join(brokersAddressesStr)
81 |
82 | # write producers config files
83 | for i in range(PRODUCERS_COUNT):
84 | path = TMP_DIR + "producer{}.properties".format(i)
85 | helpers.writeProducerConfigFile(path, brokersAddressesStr, config.KAFKA_DATA_DIR)
86 | cmdHelper.uploadToServer(path, config.CLIENT_NODES[i%len(config.CLIENT_NODES)], config.KAFKA_DATA_DIR, True)
87 |
88 | # launch the producers on client machines
89 | producersProcesses = []
90 | for i in range(PRODUCERS_COUNT):
91 | path = config.KAFKA_DATA_DIR + "producer{}.properties".format(i)
92 | cmd = config.KAFKA_PRODUCER_TEST_BIN + "--topic topic-{} --record-size {} --throughput -1 --num-records {} --producer.config {}".format(i, config.MESSAGE_SIZE, config.MESSAGES_COUNT, path)
93 | threads.append(threading.Thread(target=startSingleProducer, args=[cmd, config.CLIENT_NODES[i%len(config.CLIENT_NODES)], producersProcesses]))
94 |
95 | for t in threads:
96 | t.start()
97 |
98 | for t in threads:
99 | t.join()
100 |
101 | # wait until producers finish
102 | for p in producersProcesses:
103 | try:
104 | p.wait()
105 | except Exception as e:
106 | print(e)
107 |
108 | # print the throughput of each producer
109 | printResults(producersProcesses)
110 |
111 |
112 | # temp folder that is used to store temp config and log files
113 | TMP_DIR = os.getcwd() + "/tmp/"
114 | cmd = "mkdir -p {}".format(TMP_DIR)
115 | cmdHelper.executeCmdBlocking(cmd)
116 |
117 | createTopics()
118 | startProducers()
--------------------------------------------------------------------------------
/examples/Kafka/run_exp.py:
--------------------------------------------------------------------------------
1 | import optparse
2 | import time
3 | import sys
4 | import os
5 | import cmd_helper
6 | import config
7 |
8 | def parseResult(outputStr):
9 | try:
10 | outputStr = outputStr.split('\n')
11 | outputStr.remove('')
12 | outputStr = outputStr[-1]
13 | outputStr = outputStr.split(' ')
14 | return float(outputStr[1])
15 | except Exception as e:
16 | print(e)
17 |
18 | if (len(sys.argv) < 4):
19 | print ("::USAGE::")
20 | print ("python ./run_exp.py min_num_clients max_num_clients step")
21 | print ("--------------------------------------------------------")
22 | print ("this script runs multiple experiments by increasing the ")
23 | print ("number of clients from min_num_clients to max_num_clients ")
24 | print ("by the value of step in each iteration")
25 | print ("--------------------------------------------------------")
26 | sys.exit()
27 |
28 | cmdHelper = cmd_helper.CmdHelper()
29 | TMP_DIR = os.getcwd() + "/tmp/"
30 | cmd = "mkdir -p {}".format(TMP_DIR)
31 | cmdHelper.executeCmdBlocking(cmd)
32 |
33 | min_clients = int(sys.argv[1])
34 | max_clients = int(sys.argv[2])
35 | step = int(sys.argv[3])
36 |
37 |
38 | resultStr = "num of producers, throughput (ops/sec)\n"
39 | for clients in range(min_clients, max_clients+1, step):
40 | cmd = "python3 ./run_benchmark.py {}".format(clients)
41 | out,err = cmdHelper.executeCmdBlocking(cmd)
42 | out = out.decode("utf-8")
43 | totalThroughput = parseResult(out)
44 | resultStr = resultStr + "{},{}\n".format(clients, totalThroughput)
45 |
46 | print (resultStr)
47 | try:
48 | f = open(TMP_DIR + "/result.csv", "w")
49 | f.write(resultStr)
50 | f.close()
51 | except Exception as e:
52 | print(e)
53 |
54 |
55 |
--------------------------------------------------------------------------------
/examples/Kafka/stop_kafka.py:
--------------------------------------------------------------------------------
1 | '''
2 | This script stops Kafka and ZooKeeper on the cluster specified
3 | in the config.py file
4 | '''
5 | import config
6 | import cmd_helper
7 |
8 | cmdHelper = cmd_helper.CmdHelper()
9 |
10 | def stopKafka():
11 | # kill all Java processes (Kafka brokers and ZooKeeper) on all nodes
12 | for node in config.NODES_IPS:
13 | cmd = "sudo pkill -9 java"
14 | cmdHelper.executeCmdRemotely(cmd, node, True)
15 |
16 |
17 | stopKafka()
--------------------------------------------------------------------------------
/examples/simple example/example.md:
--------------------------------------------------------------------------------
1 | Nifty Example
2 | -------
3 |
4 | Below we describe an example that demonstrates how Nifty works and shows a simple use case of Nifty's functionality (i.e., Artifact Functionality).
5 |
6 | ### Overview
7 | The main idea of this example is to show that, using Nifty, we can mask partial partitions in a local network. To demonstrate this, we will conduct a simple experiment where we use ping to show that Nifty in fact masks the partition without requiring any modifications.
8 |
9 | ### Setup
10 | To run this example, you will need 4 machines that are in the same network, say node1, node2, node3, and node4. node4 will act as the controller of the experiment, while the other three nodes are the cluster in which we will deploy Nifty. Further, you need to have Nifty present on all of the nodes at the same location (the location specified in the configuration file of the controller node (node4)), and Nifty should be compiled on all of them. You will also need to follow the steps in Nifty's setup and make sure that OVS is installed on nodes 1, 2, and 3.
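If Nifty sits at the same path on every node (as assumed above), compiling it on the three cluster nodes can be done from node4 with something like the following (a sketch; replace /path/to/nifty with your NIFTY_HOME):

```bash
node4$ for n in node1 node2 node3; do ssh $n "cd /path/to/nifty && make"; done
```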
11 |
12 | Example 1: A partition without Nifty
13 | -------
14 |
15 | In this example we will create a partition on a cluster. The cluster does not use Nifty.
16 |
17 | 1. Log into all the nodes, and make sure that all the nodes can ping each other.
18 |
19 | ```bash
20 | node2$ ping -c3 node3
21 | PING node3-link-0 (192.168.1.103) 56(84) bytes of data.
22 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=1 ttl=64 time=1.45 ms
23 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=2 ttl=64 time=0.311 ms
24 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=3 ttl=64 time=0.327 ms
25 |
26 | --- node3-link-0 ping statistics ---
27 | 3 packets transmitted, 3 received, 0% packet loss, time 2001ms
28 | ```
29 |
30 | 2. From the controller node (node4), modify nodes.conf in the deploy folder to include the hostnames (or IPs) of nodes 1, 2, and 3.
31 |
32 | The file will look something like this
33 | ```
34 | node1
35 | node2
36 | node3
37 | ```
38 |
39 | 3. From the controller node, modify parts.conf as follows:
40 |
41 | ```
42 | 1
43 | node2_MAC
44 | 1
45 | node3_MAC
46 | ```
47 |
48 | This effectively defines a partition between node2 and node3, while node1 can communicate with all the nodes.
49 |
50 | 4. From the controller node (node4) run the script ./deploy_partitioner.sh
51 |
52 | ```bash
53 | node4$ sudo ./deploy_partitioner.sh
54 | ```
55 |
56 | This will create a partition specified in parts.conf
57 |
58 | 5. Test this by logging into nodes 1, 2, and 3. You should now not be able to ping node 2 from node 3 (and vice versa), but you should be able to ping nodes 2 and 3 from node 1.
59 |
60 | Here is the output for testing from node2
61 | ```bash
62 | node2$ ping -c3 node3
63 | PING node3-link-0 (192.168.1.103) 56(84) bytes of data.
64 |
65 | --- node3-link-0 ping statistics ---
66 | 3 packets transmitted, 0 received, 100% packet loss, time 2015ms
67 | ```
68 |
69 | 6. Heal the partition using the heal script on node4
70 |
71 | ```bash
72 | node4$ sudo ./heal_partition.sh
73 | ```
74 |
75 | Now let's see how this example differs when we have Nifty running in the system.
76 |
77 | Example 2: A partition while using Nifty
78 | -------
79 |
80 | We will repeat the previous example on the cluster after deploying Nifty.
81 |
82 | Redo steps 1 and 2 from example 1 above.
83 |
84 | 1. Deploy Nifty using the deployment script found in the ```deploy``` directory. Run the deploy script on the controller node.
85 |
86 | ```bash
87 | node4$ sudo ./deploy_nifty.sh
88 | ```
89 |
90 | 2. Now, on node4 run the script to create a partition using ./deploy_partitioner.sh
91 |
92 | ```bash
93 | node4$ sudo ./deploy_partitioner.sh
94 | ```
95 |
96 | This will create a partition specified in parts.conf
97 |
98 | 3. Test this by logging into nodes 1, 2, and 3. You should still be able to ping all the nodes from all the nodes, despite the partition.
99 |
100 | The following is the test from node2
101 | ```bash
102 | node2$ ping -c3 node3
103 | PING node3-link-0 (192.168.1.103) 56(84) bytes of data.
104 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=1 ttl=64 time=0.301 ms
105 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=2 ttl=64 time=0.334 ms
106 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=3 ttl=64 time=0.360 ms
107 |
108 | --- node3-link-0 ping statistics ---
109 | 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
110 | rtt min/avg/max/mdev = 0.301/0.331/0.360/0.032 ms
111 | ```
112 |
113 | The ping works: node2 and node3 can still communicate in spite of the partition. That is because Nifty creates alternative routes in the network (through node1 in this case) to mask the partition.
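
To see how the detour is implemented, you can inspect the OpenFlow rules that Nifty installed on node2. Nifty tags its rules with cookies (for instance, cookie 2 for traffic leaving this node and cookie 3 for traffic passing through it on behalf of other nodes), so you would expect a rule that rewrites the destination MAC of packets headed to node3 so that they are forwarded through node1:

```bash
node2$ sudo ovs-ofctl dump-flows br0 | grep cookie=0x2
```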
114 |
115 |
116 |
117 |
118 |
119 |
120 |
121 |
122 |
123 |
124 |
125 |
--------------------------------------------------------------------------------
/pnp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/UWASL/NIFTY/f1c3fcfb921b00c690208ae8c2f1b6f217a399ac/pnp.png
--------------------------------------------------------------------------------
/src/daemon.cpp:
--------------------------------------------------------------------------------
1 | /**
2 | * The main file of Nifty (contains driver code)
3 | */
4 |
5 | #include <bits/stdc++.h>
6 | #include "nifty.h"
7 | using namespace std;
8 |
9 |
10 | /**
11 | * Prints usage information.
12 | */
13 | void printUsage()
14 | {
15 | printf("::USAGE::\n");
16 | printf("./nifty -i [ip] -m [mac] [OPTIONS]\n\n");
17 |
18 | printf("-i ip , This machine's IP address, e.g., 192.168.1.5\n");
19 | printf("-m mac , This machine's MAC address, e.g., f8:c9:7a:92:bb:a3\n");
20 | printf("OPTIONS: \n");
21 | printf("-c # , Configuration file path (default is nodes.conf) it contains IPS and MACs of nodes\n");
22 | printf("-t # , How often should this machine ping others. (every t seconds)\n");
23 | printf("-v , Verbose mode\n");
24 | exit(1);
25 | }
26 |
27 | /**
28 | * Prints the args that are passed to the driver.
29 | */
30 | void printArgs(string myIp, string myMac, int pingingPeriod, int destinationsCount,
31 | const string* destinationIps, const string* destinationMacs, bool verbose)
32 | {
33 | printf("myIp: %s\n", myIp.c_str());
34 | printf("myMac: %s\n", myMac.c_str());
35 | printf("pingingPeriod: %d\n", pingingPeriod);
36 | printf("Verbose: %s\n", verbose?"True":"False");
37 | printf("destinationsCount: %d\n", destinationsCount);
38 |
39 | string temp = "";
40 | for (int i = 0; i < destinationsCount; ++i)
41 | {
42 | if(i)
43 | temp += ", ";
44 | temp += destinationIps[i];
45 | }
46 | printf("destinationIps: %s\n", temp.c_str());
47 |
48 | temp = "";
49 | for (int i = 0; i < destinationsCount; ++i)
50 | {
51 | if(i)
52 | temp += ", ";
53 | temp += destinationMacs[i];
54 | }
55 | printf("destinationMacs: %s\n", temp.c_str());
56 | }
57 |
58 | /**
59 | * Driver code.
60 | *
61 | * Parses arguments passed to the program and creates DV with them.
62 | */
63 | int main(int argc, char** argv)
64 | {
65 | int destinationsCount = 1;
66 | string* destinationIps;
67 | string* destinationMacs;
68 | string myIp = "IP";
69 | string myMac = "MAC";
70 | string confPath = "nodes.conf";
71 | int pingingPeriod = 5;
72 | bool verbose = false;
73 |
74 | if(argc <= 3)
75 | printUsage();
76 |
77 | // Parse the command line arguments
78 | try
79 | {
80 | for (int i = 1; i < argc; ++i)
81 | {
82 | if(argv[i][0]=='-' && argv[i][1] == 'i')
83 | myIp = argv[++i];
84 |
85 | else if(argv[i][0]=='-' && argv[i][1] == 'm')
86 | myMac = argv[++i];
87 |
88 | else if(argv[i][0]=='-' && argv[i][1] == 't')
89 | pingingPeriod = stoi(argv[++i]);
90 |
91 | else if(argv[i][0]=='-' && argv[i][1] == 'v')
92 | verbose = true;
93 |
94 | else if(argv[i][0]=='-' && argv[i][1] == 'c')
95 | confPath = argv[++i];
96 | }
97 | ifstream fin(confPath);
98 | fin>>destinationsCount;
99 | destinationIps = new string[destinationsCount];
100 | destinationMacs = new string[destinationsCount];
101 |
102 | for (int j = 0; j < destinationsCount; ++j)
103 | {
104 | string dstIp;
105 | fin>>dstIp;
106 | destinationIps[j] = dstIp;
107 | }
108 |
109 | for (int j = 0; j < destinationsCount; ++j)
110 | {
111 | string dstMac;
112 | fin>>dstMac;
113 | destinationMacs[j] = dstMac;
114 | }
115 |
116 | fin.close();
117 | }
118 | catch(...)
119 | {
120 | printUsage();
121 | exit(1);
122 | }
123 | printArgs(myIp, myMac, pingingPeriod, destinationsCount, destinationIps, destinationMacs, verbose);
124 |
125 | Nifty nifty(myIp, myMac, pingingPeriod, destinationsCount, destinationIps, destinationMacs, verbose);
126 | nifty.start();
127 |
128 | return 0;
129 | }
130 |
--------------------------------------------------------------------------------
/src/nifty.cpp:
--------------------------------------------------------------------------------
1 | /**
2 | * Implementation file for the primary class Nifty
3 | */
4 |
5 | #include <cstdio>
6 | #include <cstring>
7 | #include <cstdlib>
8 | #include <ctime>
9 | #include <unistd.h>
10 | #include <sys/types.h>
11 | #include <sys/socket.h>
12 | #include <netinet/in.h>
13 | #include <arpa/inet.h>
14 | #include <sstream>
15 | #include <chrono>
16 | #include "nifty.h"
17 |
18 | using namespace std;
19 |
20 |
21 | string Nifty::toString(string targetIP)
22 | {
23 | int throughID = ipToId.find(targetIP) == ipToId.end()? -1 : ipToId[targetIP];
24 |
25 | std::string ret = "";
26 | for (int i = 0; i < destinationsCount; ++i)
27 | {
28 | if(i)
29 | ret += ";";
30 | ret += distanceVector[i].toString(throughID);
31 | }
32 | return ret;
33 | }
34 |
35 |
36 | void Nifty::print(string msg, bool forcePrint)
37 | {
38 | if(verbose || forcePrint)
39 | printf("%s\n", msg.c_str());
40 | }
41 |
42 |
43 | DistanceVectorEntry* distancVectorFromString(const char* message, int len)
44 | {
45 | DistanceVectorEntry* ret = new DistanceVectorEntry[len];
46 | std::stringstream ss(message);
47 | std::string token;
48 | int id = 0;
49 | while(std::getline(ss, token, ';'))
50 | {
51 | std::stringstream ss2(token);
52 | double cost;
53 | std::string ip;
54 | ss2>>cost>>ip;
55 |
56 | ret[id] = DistanceVectorEntry(cost, id, ip);
57 | id++;
58 | }
59 | return ret;
60 | }
61 |
62 |
63 | Nifty::Nifty(std::string _myIp, std::string _myMac, unsigned int _pingingPeriod, unsigned int _destinationsCount,
64 | std::string* _destinationIps, std::string* _destinationMacs, bool _verbose)
65 | {
66 | myIp = _myIp;
67 | myMac = _myMac;
68 | pingingPeriod = _pingingPeriod;
69 | destinationsCount = _destinationsCount;
70 | destinationIps = _destinationIps;
71 | destinationMacs = _destinationMacs;
72 | verbose = _verbose;
73 |
74 | distanceVector = new DistanceVectorEntry[destinationsCount];
75 | init();
76 | }
77 |
78 |
79 | void Nifty::start()
80 | {
81 | //A separate thread to ping others.
82 | pingingThread = std::thread (&Nifty::pingOthers, this, false);
83 | receiveMessages();
84 | }
85 |
86 |
87 | Nifty::~Nifty()
88 | {
89 | delete[] distanceVector;
90 | }
91 |
92 |
93 | void Nifty::init()
94 | {
95 | for (int i = 0; i < destinationsCount; ++i)
96 | {
97 | ipToId[destinationIps[i]] = i;
98 |
99 | distanceVector[i] = DistanceVectorEntry(MAX_COST, -1, destinationIps[i]);
100 | if(destinationIps[i] == myIp)
101 | distanceVector[i] = DistanceVectorEntry(0.0, i, myIp); //Cost to reach myself
102 | }
103 | updateOF();
104 | }
105 |
106 |
107 | void Nifty::nodeTimedOut(string ip)
108 | {
109 | // I already know that I cannot reach this one, do nothing.
110 | if(timedOutNodes.find(ip) != timedOutNodes.end() && timedOutNodes[ip])
111 | return;
112 |
113 | print("Node " + ip + " had timedout!!");
114 | for (int i = 0; i < destinationsCount; ++i)
115 | {
116 | if (distanceVector[i].throughID >= 0
117 | && destinationIps[distanceVector[i].throughID] == ip) //Used to go through this node, it's inf now!
118 | {
119 | distanceVector[i].cost = MAX_COST;
120 | distanceVector[i].throughID = -1;
121 | }
122 | }
123 | }
124 |
125 |
126 | void Nifty::checkTimeOuts()
127 | {
128 | for (int i = 0; i < destinationsCount; ++i)
129 | {
130 | if (myIp == destinationIps[i])
131 | continue;
132 |
133 | //Check if this node had timed out!
134 | time_t curr_time = time (NULL);
135 |
136 | if(curr_time - nodesTimes[destinationIps[i]] > timeoutPeriod)
137 | nodeTimedOut(destinationIps[i]);
138 | }
139 | }
140 |
141 |
142 | void Nifty::pingOthers(bool onlyOnce)
143 | {
144 | struct sockaddr_in dest_addr;
145 | dest_addr.sin_family = AF_INET;
146 | dest_addr.sin_port = htons(PORT);
147 | int dest_sockfd;
148 |
149 | //Ping everyone every @pingingPeriod seconds
150 | do
151 | {
152 | //Need to check for timeouts first (important!)
153 | checkTimeOuts();
154 |
155 | for (int i = 0; i < destinationsCount; ++i)
156 | {
157 | if (myIp == destinationIps[i])
158 | continue;
159 |
160 | string message = toString(destinationIps[i]);
161 | print("Sending {"+destinationIps[i]+"} message : " + message);
162 |
163 | const char* destination = destinationIps[i].c_str();
164 | inet_pton(AF_INET, destination, &dest_addr.sin_addr);
165 | dest_sockfd = socket(AF_INET, SOCK_DGRAM, 0);
166 | if(dest_sockfd < 0 )
167 | perror("socket creation failed in pingOthers");
168 |
169 | int sendingResult = sendto(dest_sockfd, (const char *)message.c_str(),
170 | strlen(message.c_str()),
171 | MSG_CONFIRM, (const struct sockaddr *) &dest_addr,
172 | sizeof(dest_addr));
173 | close(dest_sockfd);
174 | }
175 | std::this_thread::sleep_for(std::chrono::milliseconds(pingingPeriod));
176 | }while(!onlyOnce);
177 | }
178 |
179 |
180 | bool Nifty::updateDV(const char* message, const char* sourceIP)
181 | {
182 | bool updated = false;
183 |
184 | if(ipToId.find(sourceIP) == ipToId.end()) //IDK about the source!! do nothing.
185 | return false;
186 |
187 | timedOutNodes[sourceIP] = false; // I now know that this node didn't time out
188 |
189 | // Update when I last heard from this node.
190 | time_t curr_time = time (NULL);
191 | nodesTimes[sourceIP] = curr_time;
192 |
193 | int sourceID = ipToId[sourceIP];
194 |
195 | DistanceVectorEntry* otherDV = distancVectorFromString(message, destinationsCount);
196 |
197 | int reach_count = 0;
198 | for (int i = 0; i < destinationsCount; ++i)
199 | {
200 | string ip = otherDV[i].targetIP;
201 | double cost = otherDV[i].cost + 1; //One extra hop to get to the node I received the message from
202 | if (cost <=2) //Can directly reach
203 | reach_count++;
204 | int id = ipToId[ip];
205 |
206 | if (cost < distanceVector[id].cost) //The new cost is better, need to update my DV.
207 | {
208 | updated = true;
209 | distanceVector[id] = DistanceVectorEntry(cost, sourceID, ip);
210 | }
211 | }
212 |
213 | if(reach_count>=destinationsCount - 1) // Can reach everyone directly
214 | isBridgeNode[sourceIP] = true;
215 | else
216 | isBridgeNode[sourceIP] = false;
217 |
218 | delete[] otherDV;
219 | if(updated)
220 | {
221 | updateOF();
222 | print("DV got updated.", true);
223 | print("current DV is: " + toString());
224 | }
225 | return updated;
226 | }
227 |
228 |
229 | void Nifty::receiveMessages()
230 | {
231 | int sockfd;
232 | char buffer[BUFFSIZE];
233 | struct sockaddr_in servaddr, cliaddr;
234 | char adder_buffer[ADDRSIZE];
235 |
236 | // Creating socket file descriptor
237 | sockfd = socket(AF_INET, SOCK_DGRAM, 0);
238 | if ( sockfd< 0 ) {
239 | perror("socket creation failed");
240 | exit(EXIT_FAILURE);
241 | }
242 |
243 | memset(&servaddr, 0, sizeof(servaddr));
244 | memset(&cliaddr, 0, sizeof(cliaddr));
245 |
246 | // Filling server information
247 | servaddr.sin_family = AF_INET; // IPv4
248 | servaddr.sin_addr.s_addr = INADDR_ANY;
249 | servaddr.sin_port = htons(PORT);
250 |
251 | // Bind the socket with the server address
252 | if ( bind(sockfd, (const struct sockaddr *)&servaddr,
253 | sizeof(servaddr)) < 0 )
254 | {
255 | perror("bind failed");
256 | exit(EXIT_FAILURE);
257 | }
258 |
259 | //Keep listening for others messages.
260 | while(true)
261 | {
262 | unsigned int len;
263 | len = sizeof(struct sockaddr_in);
264 | int n = recvfrom(sockfd, (char *)buffer, BUFFSIZE,
265 | MSG_WAITALL, ( struct sockaddr *) &cliaddr, &len);
266 | buffer[n] = '\0';
267 |
268 | inet_ntop(AF_INET, &(cliaddr.sin_addr), adder_buffer, ADDRSIZE);
269 | string address = adder_buffer;
270 | string msg = buffer;
271 | print("Received a message from: " + address);
272 | print("The message: " + msg);
273 |
274 | //After receiving a message, update DV
275 | updateDV(buffer, adder_buffer);
276 | }
277 | close(sockfd);
278 | }
279 |
280 |
281 | const void Nifty::installRule(string rule)
282 | {
283 | print("Installing rule: " + rule);
284 | system(rule.c_str());
285 | }
286 |
287 |
288 | /**
289 | * Update the rules in the OVS's OpenFlow table using the data in
290 | * the distance vector table.
291 | * Uses different cookie numbers for different rules (used to
292 | * have a more targeted analysis of the traffic in the system)
293 | *
294 | * ****COOKIES TABLE*****
295 | * 1 => IN_TRAFFIC: DATA SENT TO ME
296 | * 2 => OUT_TRAFFIC: DATA GOING OUT OF ME TO OTHER DESTINATIONS
297 | * 3 => PASSING_TRAFFIC: DATA PASSING THROUGH ME TO OTHER DESTINATIONS
298 | * 4 => CONTROLLER TRAFFIC
299 | * 5 => OTHER? (not used.)
300 | */
301 | const void Nifty::updateOF()
302 | {
303 | if(updating)
304 | return;
305 | updating = true;
306 | //All tags from 1 - 9 belong to this controller, delete them all.
307 | installRule("ovs-ofctl del-flows br0 cookie=1/-1" );
308 | installRule("ovs-ofctl del-flows br0 cookie=2/-1" );
309 | installRule("ovs-ofctl del-flows br0 cookie=3/-1" );
310 | installRule("ovs-ofctl del-flows br0 cookie=4/-1" );
311 | installRule("ovs-ofctl del-flows br0 cookie=5/-1" );
312 | installRule("ovs-ofctl del-flows br0 cookie=6/-1" );
313 | string rule;
314 |
315 | installRule("ovs-ofctl add-flow br0 cookie=1,priority=100,action=normal");
316 |
317 | //Controller flow doesn't need to be forwarded
318 | installRule("ovs-ofctl add-flow br0 cookie=4,priority=5000,ip,nw_proto=17,tp_dst=8080,action=normal");
319 |
320 | int reach_count = 0;
321 | for (int i = 0; i < destinationsCount; ++i)
322 | {
323 | if (distanceVector[i].cost >= MAX_COST || destinationIps[i] == myIp) //Cannot really reach it, install nothing.
324 | continue;
325 |
326 | string dest_ip = destinationIps[i];
327 | string dest_mac = destinationMacs[i];
328 | string through_ip = destinationIps[distanceVector[i].throughID];
329 | string through_mac = destinationMacs[distanceVector[i].throughID];
330 |
331 | double cost = distanceVector[i].cost;
332 | if (cost <=1) //Can directly reach
333 | reach_count++;
334 |
335 | //Modify packets passing through me
336 | rule = "ovs-ofctl add-flow br0 cookie=3,priority=500,ip,in_port=1,nw_dst="+dest_ip+",action=mod_dl_dst:"+through_mac+",mod_dl_src:"+myMac+",in_port";
337 | installRule(rule);
338 |
339 | //Modify packets going out of me.
340 | rule = "ovs-ofctl add-flow br0 cookie=2,priority=500,ip,nw_dst="+dest_ip+",action=mod_dl_dst:"+through_mac+",mod_dl_src:"+myMac+",1";
341 | installRule(rule);
342 | }
343 |
344 | if(reach_count>=destinationsCount) // Can reach everyone directly
345 | isBridgeNode[myIp] = true;
346 | else
347 | isBridgeNode[myIp] = false;
348 |
349 | updating = false;
350 | }
351 |
352 |
353 | void Nifty::printDV()
354 | {
355 | print("CurrentDV: " + toString());
356 | }
357 |
358 |
359 | vector<string> Nifty::getBridgeNodes()
360 | {
361 | vector<string> ret;
362 |
363 | for (int i = 0; i < destinationsCount; ++i)
364 | {
365 | if(isBridgeNode[destinationIps[i]])
366 | ret.push_back(destinationIps[i]);
367 | }
368 |
369 | return ret;
370 | }
371 |
--------------------------------------------------------------------------------
/src/nifty.h:
--------------------------------------------------------------------------------
1 | /**
2 | * This is a header file that contains the skeleton for the main class of Nifty
3 | * and the implementation of the Distance Vector Entry class.
4 | */
5 |
6 | #include <string>
7 | #include <sstream>
8 | #include <thread>
9 | #include <unordered_map>
10 | #include <vector>
11 |
12 | using namespace std;
13 | const double MAX_COST = 1001; // Anything with a cost more than 1000 is unreachable
14 | #define PORT 8080 // Nifty instances use this port to communicate with each other
15 | #define BUFFSIZE 1024 // Maximum size of a single message sent between Nifty instances (in bytes)
16 | #define ADDRSIZE 20 // Size of address in bytes.
17 |
18 | /**
19 | * DistanceVectorEntry is the struct that contains information about each entry
20 | * in the distance-vector table (DV).
21 | * The struct contains three pieces of information:
22 | * @targetIP: the targetIP this entry concerns.
23 | * @throughID: the ID of the node traffic should be forwarded to next in
24 | * order to reach the final destination (@targetIP).
25 | * @cost: the cost (number of hops) of reaching the target.
26 | */
27 | struct DistanceVectorEntry
28 | {
29 | double cost; // The cost to reach the node (number of hops)
30 | int throughID; // traffic destined to targetIP is forwarded to node with throughID.
31 | std::string targetIP; // The IP of the node
32 |
33 | DistanceVectorEntry(): DistanceVectorEntry(MAX_COST, -1,"") {}
34 | DistanceVectorEntry(double _cost, int _throughID): DistanceVectorEntry(_cost,_throughID,"") {}
35 | DistanceVectorEntry(double _cost, int _throughID, string _targetIP)
36 | {
37 | cost = _cost;
38 | throughID = _throughID;
39 | targetIP = _targetIP;
40 | }
41 |
42 | /**
43 | * return a string representing the entry.
44 | */
45 | std::string toString() const
46 | {
47 | stringstream ss;
48 | ss<<cost<<" "<<targetIP;
49 | return ss.str();
50 | }
51 | };
52 |
53 |
54 | /**
55 | * Nifty is the main class. It maintains the distance-vector table and
56 | * updates the OVS OpenFlow rules to route traffic around partial partitions.
57 | */
58 | class Nifty
59 | {
60 | private:
61 | std::string myIp; //IP address of this node
62 | std::string myMac; //MAC address of this node
63 | unsigned int pingingPeriod; //Ping other nodes every pingingPeriod seconds
64 | unsigned int destinationsCount; //Number of nodes in the system
65 | std::string* destinationIps; //IP addresses of the other nodes
66 | std::string* destinationMacs; //MAC addresses of the other nodes
67 | bool verbose; //Verbose mode
68 | int timeoutPeriod = 10; //Seconds of silence before a node is considered timed out (assumed default value)
69 | DistanceVectorEntry* distanceVector; //The distance-vector table
87 | unordered_map<string, int> ipToId; //get the id for a specific ip.
88 | unordered_map<string, time_t> nodesTimes; //When was the last message received by this node
89 | unordered_map<string, bool> timedOutNodes; //States whether a node had timed out or not.
90 | unordered_map<string, bool> isBridgeNode; //Tells whether a node is a bridge node or not.
91 | bool updating = false;
92 |
93 |
94 | /**
95 | * Construct a message to send to other nodes. The message
96 | * contains details about my current distance vector table.
97 | *
98 | * @targetIP: The IP address of the node I'm sending the message to.
99 | */
100 | string toString(string targetIP = "");
101 |
102 |
103 | /**
104 | * Initializes the underlying data structures. Sets the cost of
105 | * reaching others to inf and reaching myself is 0.
106 | */
107 | void init();
108 |
109 |
110 | /**
111 | * A helper function to print messages to the console.
112 | * Needed as verbose might be off, and in that case we shouldn't print
113 | *
114 | * @msg: The message to be printed.
115 | * @forcePrint: defaults to false. Whether or not this print should be forced regardless of verbose settings.
116 | */
117 | void print(string msg, bool forcePrint=false);
118 |
119 |
120 | /**
121 | * Pings other nodes and piggybacks my distance vector on the ping message.
122 | *
123 | * @onlyOnce: If set to true send only one heartbeat. Otherwise, keep sending heartbeats every pingingPeriod.
124 | */
125 | void pingOthers(bool onlyOnce = false);
126 |
127 |
128 | /**
129 | * Potentially update the underlying distance vector table
130 | * using a @message received from @sourceIP.
131 | */
132 | bool updateDV(const char* message, const char* sourceIP);
133 |
134 |
135 | /**
136 | * A method to keep receiving messages from other hosts.
137 | */
138 | void receiveMessages();
139 |
140 |
141 | /**
142 | * Update the rules in the OVS's OpenFlow table using the data in the distance vector table.
143 | */
144 | void const updateOF();
145 |
146 |
147 | /**
148 | * Sets a node as timed out and potentially modifies
149 | * other entries in the DV table if they were using
150 | * the node with IP address @ip as an intermediary node.
151 | *
152 | * @ip: The IP address of the node that got timed out.
153 | */
154 | void nodeTimedOut(string ip);
155 |
156 |
157 | /**
158 | * A helper function that calls the underlying system function to install a new OVS OpenFlow rule.
159 | */
160 | void const installRule(string rule);
161 |
162 |
163 | /**
164 | * Checks if any of the nodes I'm connected to had recently timed out.
165 | */
166 | void checkTimeOuts();
167 |
168 |
169 | /**
170 | * Returns a list of the bridge nodes in the system.
171 | */
172 | vector<string> getBridgeNodes();
173 |
174 |
175 | std::thread pingingThread;
176 | std::thread receivingThread;
177 | public:
178 | /**
179 | * Constructor
180 | *
181 | * @_myIp: IP address of this node.
182 | * @_myMac: MAC address of this node.
183 | * @_pingingPeriod: Ping other nodes every @_pingingPeriod number of seconds.
184 | * @_destinationsCount: Number of other nodes in the system.
185 | * @_destinationIps: IP addresses of other nodes in the system.
186 | * @_destinationMacs: MAC addresses of other nodes in the system.
187 | */
188 | Nifty(std::string _myIp, std::string _myMac, unsigned int _pingingPeriod, unsigned int _destinationsCount,
189 | std::string* _destinationIps, std::string* _destinationMacs, bool _verbose = false);
190 |
191 | /**
192 | * Starts the Nifty process. Start receiving messages and pinging others.
193 | */
194 | void start();
195 |
196 | /**
197 | * Function to output the entire distance-vector table
198 | */
199 | void printDV();
200 |
201 | /**
202 | * Destructor. Deletes the underlying data structures (distance vector)
203 | */
204 | ~Nifty();
205 | };
206 |
--------------------------------------------------------------------------------
/src/partitioner.cpp:
--------------------------------------------------------------------------------
1 | /**
2 | * The partitioner.cpp code is used to help test Nifty
3 | * by introducing artificial partitions between the nodes
4 | */
5 |
6 | #include <bits/stdc++.h>
7 |
8 | using namespace std;
9 |
10 | string myMac = "";
11 |
12 | /**
13 | * A helper function that calls the underlying system function to
14 | * install a new OVS OpenFlow rule.
15 | *
16 | * @rule: a string with a command to install an OVS rule
17 | */
18 | void installRule(string rule)
19 | {
20 | printf("Installing rule: %s\n", rule.c_str());
21 | system(rule.c_str());
22 | }
23 |
24 |
25 | /**
26 | * A function to output a help message to the user of the code
27 | * showing how to run the code and what parameters to give it
28 | */
29 | void printUsage()
30 | {
31 | printf("::USAGE::\n");
32 | printf("./partitioner myMac [path]\n\n");
33 | printf("If the tool is called with no arguments, it heals the partial partition\n\n");
34 |
35 | printf("NOTE: The default path is ./parts.conf\n\n");
36 | printf("parts.conf structure\n");
37 | printf("Line1 (count) g1MACs\n");
38 | printf("Line2 (count) g2MACs\n");
39 |
40 | exit(1);
41 | }
42 |
43 |
44 | /**
45 | * Creates a MAC-based partial partition where g1 cannot reach g2.
46 | * (everyone else can reach both)
47 | * Uses the "cookie" value of 10 (0xa) for the installed OpenFlow rules.
48 | * This helps with healing the partition as we only remove rules with 0xa.
49 | * Make sure no other rules in the OpenFlow table use the cookie 0xa as well.
50 | *
51 | * @g1: Vector of g1 members MAC addresses
52 | * @g2: Vector of g2 members MAC addresses
53 | */
54 | void createMACPNP(const vector<string> g1, const vector<string> g2)
55 | {
56 | if(find(g1.begin(), g1.end(), myMac)!=g1.end()) {
57 | // I'm in g1, shouldn't reach any g2 members
58 | for (int i = 0; i < g2.size(); ++i)
59 | installRule("ovs-ofctl add-flow br0 cookie=10,priority=10000,dl_src="+g2[i]+",action=drop");
60 | }
61 | else if(find(g2.begin(), g2.end(), myMac)!=g2.end()) {
62 | // I'm in g2, shouldn't reach any g1 members
63 | for (int i = 0; i < g1.size(); ++i)
64 | installRule("ovs-ofctl add-flow br0 cookie=10,priority=10000,dl_src="+g1[i]+",action=drop");
65 | }
66 | //Do nothing otherwise (this applies to nodes that are not affected by the partition)
67 | }
68 |
69 |
70 | /**
71 | * Driver code for the partitioner.
72 | */
73 | int main(int argc, char** argv)
74 | {
75 | ifstream fin;
76 | // parse the command line arguments
77 | try
78 | {
79 | fin.exceptions(std::ifstream::failbit | std::ifstream::badbit);
80 | if(argc == 3)
81 | fin.open(argv[2]);
82 | else if(argc == 2)
83 | fin.open("./parts.conf");
84 | else if (argc == 1)
85 | {
86 | // No command line arguments are provided
87 | // heal all current partial network partitions by deleting
88 | // the flow rules with a cookie=10
89 | installRule("ovs-ofctl del-flows br0 cookie=10/-1");
90 | exit(0);
91 | }
92 | else printUsage();
93 |
94 | myMac = argv[1];
95 | }
96 | catch(...)
97 | {
98 | printf("~~~~~~Couldn't open the partitioning file!!\n\n");
99 | printUsage();
100 | }
101 |
102 | int n;
103 | // parse the first group addresses from the file
104 | fin>>n;
105 | vector<string> g1;
106 | vector<string> g2;
107 | while(n--)
108 | {
109 | string addr;
110 | fin>>addr;
111 | g1.push_back(addr);
112 | }
113 |
114 | // parse the second group addresses from the file
115 | fin>>n;
116 | while(n--)
117 | {
118 | string addr;
119 | fin>>addr;
120 | g2.push_back(addr);
121 | }
122 | fin.close();
123 |
124 |
125 | printf("Creating a partial partition between the following group 1 and group 2.\n");
126 | printf("Members of group 1: \n");
127 | for (int i = 0; i < g1.size(); ++i)
128 | printf("%s\n", g1[i].c_str());
129 |
130 | printf("\n~~~~~~~~~~~~~~~\n\n");
131 | printf("Members of group 2: \n");
132 | for (int i = 0; i < g2.size(); ++i)
133 | printf("%s\n", g2[i].c_str());
134 |
135 | createMACPNP(g1,g2);
136 |
137 | return 0;
138 | }
139 |
--------------------------------------------------------------------------------