├── Makefile
├── README.md
├── configuration
├── deploy
│   ├── README.md
│   ├── deploy_nifty.sh
│   ├── deploy_partitioner.sh
│   ├── heal_partition.sh
│   ├── nodes.conf
│   ├── parts.conf
│   ├── print_macs.sh
│   └── stop_nifty.sh
├── examples
│   ├── HDFS
│   │   ├── README.md
│   │   ├── config.py
│   │   ├── deploy_hdfs.py
│   │   ├── run_benchmark.py
│   │   ├── run_exp.py
│   │   └── stop_hdfs.py
│   ├── Kafka
│   │   ├── README.md
│   │   ├── cmd_helper.py
│   │   ├── config.py
│   │   ├── deploy_kafka.py
│   │   ├── helpers.py
│   │   ├── run_benchmark.py
│   │   ├── run_exp.py
│   │   └── stop_kafka.py
│   └── simple example
│       └── example.md
├── pnp.png
└── src
    ├── daemon.cpp
    ├── nifty.cpp
    ├── nifty.h
    └── partitioner.cpp
/Makefile:
--------------------------------------------------------------------------------
1 | CC=g++
2 | CFLAGS=-c -std=c++17 -lpthread -pthread
3 | DEPS = dv.h
4 | 
5 | SRC_DIR := src
6 | OBJ_DIR := obj
7 | 
8 | .PHONY: all clean nifty partitioner
9 | 
10 | all: nifty partitioner
11 | 
12 | debug: CFLAGS += -Wall -DDEBUG -g
13 | debug: all
14 | 
15 | nifty: $(OBJ_DIR)/daemon.o $(OBJ_DIR)/nifty.o
16 | 	$(CC) -o $@ $^ -pthread
17 | 
18 | partitioner: $(OBJ_DIR)/partitioner.o
19 | 	$(CC) -o $@ $^
20 | 
21 | $(OBJ_DIR)/%.o: $(SRC_DIR)/%.cpp $(SRC_DIR)/nifty.h | $(OBJ_DIR)
22 | 	$(CC) -o $@ $< $(CFLAGS)
23 | 
24 | $(OBJ_DIR):
25 | 	mkdir -p $@
26 | 
27 | clean:
28 | 	rm -r $(OBJ_DIR) nifty partitioner
29 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Nifty
2 | =======
3 | 
4 | Nifty is a transparent communication layer that masks partial network partitions. Partial partitions are a special kind of network partition that divides the cluster into three groups of nodes (groups 1, 2, and 3) such that groups 1 and 2 are disconnected from each other, while nodes in group 3 can communicate with all cluster nodes (see the figure below). Nifty follows a peer-to-peer design in which every node in the cluster runs a Nifty process. These processes collaborate in monitoring cluster connectivity. When Nifty detects a partial partition, it detours the traffic around the partition through intermediate nodes (e.g., nodes in group 3 in the figure).
5 | 
6 | ![pnp](pnp.png?raw=true)
7 | 
8 | Setup
9 | -------
10 | 
11 | In order to use Nifty, you need to have OVS installed, with a bridge called br0. To do that, you can use the following commands (make sure to replace **$INTERFACE_NAME** (e.g., if0) and **$IP_ADDRESS** with their actual values):
12 | 
13 | ```bash
14 | $ sudo apt-get update
15 | $ sudo apt-get install openvswitch-switch -y
16 | $ sudo ovs-vsctl init
17 | $ sudo ovs-vsctl add-br br0
18 | $ sudo ovs-vsctl add-port br0 $INTERFACE_NAME
19 | $ sudo ifconfig br0 $IP_ADDRESS netmask 255.255.255.0 up
20 | $ sudo ifconfig $INTERFACE_NAME 0
21 | ```
22 | 
23 | After that, you can just use
24 | ```bash
25 | $ make
26 | ```
27 | to compile the code and generate the executables.
28 | 
29 | 
30 | Usage
31 | -------
32 | There are two main executables: Nifty and Partitioner. Nifty is the fault-tolerance layer that protects against partial network partitions, whereas Partitioner is a simple tool that can be used to inject partial partitions (for testing purposes). Both of them require OVS and assume the bridge is called br0 (see Setup above).
33 | 
34 | The ```deploy``` directory contains scripts for deploying Nifty on a cluster.
35 | 
36 | 
37 | Examples
38 | -------
39 | 
40 | The ```examples``` directory contains examples of using Nifty with different systems.
41 | The ```examples/simple example``` directory contains a simple example that demonstrates the functionality of Nifty. 42 | -------------------------------------------------------------------------------- /configuration: -------------------------------------------------------------------------------- 1 | NIFTY_HOME='/users/maaalfat/nifty/' 2 | -------------------------------------------------------------------------------- /deploy/README.md: -------------------------------------------------------------------------------- 1 | The scripts in this folder are meant to facilitate the deployment of NIFTY and Partitioner on multiple machines. 2 | 3 | There are four scripts: 4 | 5 | 1. deploy_nifty: This script configures Nifty, and runs it on all the nodes specified in nodes.conf configuration file. 6 | 2. deploy_partitioner: This script configures partitioner, and runs it on all the nodes specified in nodes.conf. The way the partition is set up (between which nodes) is configured in parts.conf config file. 7 | 3. heal_partition: This script heals the partition that is currently present in the nodes in nodes.conf configuration file. 8 | 4. stop_nifty: This script stops Nifty instances running on all the nodes specified in nodes.conf configuration file. 9 | 10 | 11 | Assumptions 12 | ------- 13 | 14 | To run the scripts without any modifications, we have three assumptions. Below we list these assumptions and describe how to 15 | modify the scripts in case any of these assumptions don't hold (if possible). 16 | 17 | 1. We assume that Nifty is already present in all nodes in nodes.conf and its location is the same in all these nodes (e.g., in the user NFS home directory). The directory of Nifty can be configured through the configuration file (in the main Nifty directory, outside of the deployment folder). The file configuration currently only holds one variable called NIFTY_HOME and it gets parsed as part of all the scripts. 18 | 19 | 2. We assume that the controller node (where you call the deployment scripts) can ssh into all the nodes in nodes.conf without any extra configuration or restrictions. Before running the experiment, please distribute your ssh keys on the nodes and make sure ssh does not ask for credentials. Furthermore, if you need extra configuration to ssh into other nodes, you need to modify the scripts and change the variable called sshOptions that is present at the top of all scripts files. 20 | 21 | 3. We assume that you do not need sudo privileges to install openflow rules in OVS of the nodes you ssh into. If this doesn't hold, you can change the lines in the scripts that call Nifty or Partitioner in other nodes (the scripts have a comment that makes this change easy) 22 | 23 | 24 | NIFTY Deployment 25 | ======= 26 | 27 | First, please set properly the path to the Nifty directory in the ```NIFTY_HOME``` variable in the ```NIFTY/configuration``` file. 28 | 29 | In order for Nifty to run properly on a cluster, you will need to fill the config file nodes.conf. 30 | nodes.conf should contain the hostname or IP address of all the nodes in the cluster. Each hostname (or IP address) needs to be on a single line. 
31 | 
32 | Example file
33 | ```
34 | 192.168.1.101
35 | 192.168.1.102
36 | 192.168.1.103
37 | ```
38 | 
39 | To deploy NIFTY on the nodes listed in nodes.conf, simply call the deployment script
40 | 
41 | ```
42 | ./deploy_nifty.sh
43 | ```
44 | 
45 | 
46 | Partitioner Deployment
47 | =======
48 | 
49 | To use Partitioner, you need to configure the parts.conf file, which specifies the partition.
50 | 
51 | The structure of the config file is as follows:
52 | The first line is an integer that represents the number of nodes in the first group (n). The next n lines list the IP addresses of these nodes. The next line is an integer that represents the number of nodes in the second group (m). The next m lines list the IP addresses of these nodes. e.g.:
53 | 
54 | ```
55 | 1
56 | 192.168.1.101
57 | 2
58 | 192.168.1.102
59 | 192.168.1.103
60 | ```
61 | 
62 | The example parts.conf above specifies a partition in which one node (192.168.1.101) is on one side and the two nodes 192.168.1.102 and 192.168.1.103 are on the other. Nodes that are not listed in parts.conf are not affected by the partition, i.e., they can still reach all other nodes.
63 | 
64 | Once you configure parts.conf, you can run the partitioner to create the partition.
65 | 
66 | ```
67 | ./deploy_partitioner.sh
68 | ```
69 | 
70 | To heal a partition, simply call the heal script
71 | 
72 | ```
73 | ./heal_partition.sh
74 | ```
75 | 
76 | Helper Scripts
77 | =======
78 | 
79 | If you intend to run Nifty or Partitioner manually, you will need the MAC addresses of the nodes in the cluster. You can use the print_macs helper script to find them.
80 | Run the helper script as follows:
81 | 
82 | 
83 | ```
84 | ./print_macs.sh nodes.conf
85 | ```
86 | Where nodes.conf contains a list of hostnames or IP addresses.
87 | 
88 | This script will print the MAC address of every node listed in nodes.conf.
89 | 
--------------------------------------------------------------------------------
/deploy/deploy_nifty.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # get NIFTY_HOME
4 | . ../configuration
5 | 
6 | # If your ssh needs more options, you can set them in the $sshOptions variable (identity file, port, ...)
7 | sshOptions=" -T "
8 | 
9 | # Iterate through all the nodes, get their bridge IPs and MACs and save them (used to update nifty's nodes.conf)
10 | ips="";
11 | macs="";
12 | nodesCount=0;
13 | while IFS= read -r nodeIP
14 | do
15 |     if [ -z $nodeIP ]; then
16 |         continue;
17 |     fi
18 |     #ssh into the node and get its IP and MAC.
19 |     ip=$(ssh -n $sshOptions $nodeIP ip addr show br0 | grep 'inet ' | cut -f2 | awk '{print $2}' | rev | cut -c4- | rev)
20 |     mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address)
21 | 
22 |     ips="${ips}${ip}\n"
23 |     macs="${macs}${mac}\n"
24 |     let nodesCount=nodesCount+1
25 | done < ./nodes.conf
26 | 
27 | printf "%d\n%b%b" $nodesCount $ips $macs > nifty_nodes.conf
28 | # For each of the nodes in the deployment, update nodes.conf & run nifty with the node's IP.
29 | while IFS= read -r nodeIP
30 | do
31 |     if [ -z $nodeIP ]; then
32 |         continue;
33 |     fi
34 |     #ssh into the node and get its IP and MAC.
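    # Note on the pipeline below: grep/cut/awk isolate the "inet <address>/<prefix>" field of
    # `ip addr show br0`, and the trailing `rev | cut -c4- | rev` strips the last three characters,
    # i.e., a CIDR suffix such as "/24" (matching the 255.255.255.0 netmask used in the setup
    # instructions), leaving the bare IPv4 address. The MAC address is read directly from sysfs.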
35 | ip=$(ssh -n $sshOptions $nodeIP ip addr show br0 | grep 'inet ' | cut -f2 | awk '{print $2}' | rev | cut -c4- | rev) 36 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address) 37 | scp $sshOptions ./nifty_nodes.conf $nodeIP:"${NIFTY_HOME}/nifty_nodes.conf" 38 | 39 | echo "Starting NIFTY on node $nodeIP (which has IP address: $ip, and MAC address: $mac)" 40 | # Could need to either run the script as sudo or add sudo here to be able to deploy rules. (or have OVS not require sudo) 41 | ssh -n $sshOptions $nodeIP "cd $NIFTY_HOME && ./nifty -t 200 -i $ip -m $mac -c nifty_nodes.conf" & 42 | 43 | done < ./nodes.conf 44 | 45 | rm ./nifty_nodes.conf 46 | -------------------------------------------------------------------------------- /deploy/deploy_partitioner.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # get NIFTY_HOME 4 | . ../configuration 5 | 6 | # If your ssh need more options, you can set them in the $sshOptions variable (you can set identify file, port, ...) 7 | sshOptions=" -T " 8 | 9 | macs=""; 10 | 11 | while IFS= read -r nodeIP 12 | do 13 | 14 | # read IP addresses, get macs and print them to nifty_parts.conf 15 | re='^[0-9]+$' 16 | if ! [[ $nodeIP =~ $re ]]; then 17 | #ssh into each node and get its MAC. 18 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address) 19 | echo $mac >> nifty_parts.conf 20 | echo "${nodeIP} ${mac}" 21 | 22 | # if it was a number and not an ip jsut print it to the file 23 | else 24 | echo $nodeIP >> nifty_parts.conf 25 | fi 26 | done < ./parts.conf 27 | 28 | 29 | # For each of the nodes in deployment, run partitiner. 30 | while IFS= read -r nodeIP 31 | do 32 | if [ -z $nodeIP ]; then 33 | continue; 34 | fi 35 | 36 | #ssh into the node and get its MAC. 37 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address) 38 | scp $sshOptions ./nifty_parts.conf $nodeIP:"${NIFTY_HOME}/parts.conf" 39 | 40 | echo "Starting Partitioner on node $nodeIP (which has MAC address: $mac)" 41 | # Could need to either run the script as sudo or add sudo here to be able to deploy rules. (or have OVS not require sudo) 42 | ssh -n $sshOptions $nodeIP "cd $NIFTY_HOME && ./partitioner $mac" 43 | 44 | done < ./nodes.conf 45 | 46 | rm nifty_parts.conf -------------------------------------------------------------------------------- /deploy/heal_partition.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # get NIFTY_HOME 4 | . ../configuration 5 | 6 | # If your ssh need more options, you can set them in the $sshOptions variable (you can set identify file, port, ...) 7 | sshOptions=" -T " 8 | 9 | # For each of the nodes in deployment, run partitiner. 10 | while IFS= read -r nodeIP 11 | do 12 | if [ -z $nodeIP ]; then 13 | continue; 14 | fi 15 | echo "Healing the partition on node $nodeIP" 16 | # Could need to either run the script as sudo or add sudo here to be able to deploy rules. 
(or have OVS not require sudo) 17 | ssh -n $sshOptions $nodeIP "$NIFTY_HOME/partitioner" 18 | 19 | done < ./nodes.conf 20 | -------------------------------------------------------------------------------- /deploy/nodes.conf: -------------------------------------------------------------------------------- 1 | 192.168.1.101 2 | 192.168.1.102 3 | 192.168.1.103 4 | -------------------------------------------------------------------------------- /deploy/parts.conf: -------------------------------------------------------------------------------- 1 | 1 2 | 08:00:27:39:87:e3 3 | 1 4 | 08:00:27:39:87:e4 5 | -------------------------------------------------------------------------------- /deploy/print_macs.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # get NIFTY_HOME 4 | . ../configuration 5 | 6 | # If your ssh need more options, you can set them in the $sshOptions variable (you can set identify file, port, ...) 7 | sshOptions=" -T " 8 | 9 | echo "The following are the mac addresses for your nodes. Please use these mac addresses to specify your partition in parts.conf file\n" 10 | 11 | # Iterate through all the nodes to find their and mac addresses 12 | while IFS= read -r nodeIP 13 | do 14 | #ssh into each node and get its MAC. 15 | mac=$(ssh -n $sshOptions $nodeIP cat /sys/class/net/br0/address) 16 | 17 | echo "${nodeIP} ${mac}" 18 | done < ./nodes.conf 19 | -------------------------------------------------------------------------------- /deploy/stop_nifty.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # get NIFTY_HOME 4 | . ../configuration 5 | 6 | # If your ssh need more options, you can set them in the $sshOptions variable (you can set identify file, port, ...) 7 | sshOptions=" -T " 8 | 9 | # For each of the nodes in deployment, run partitiner. 10 | while IFS= read -r nodeIP 11 | do 12 | if [ -z $nodeIP ]; then 13 | continue; 14 | fi 15 | echo "Stopping Nifty on node $nodeIP" 16 | # Could need to either run the script as sudo or add sudo here to be able to deploy rules. (or have OVS not require sudo) 17 | ssh -n $sshOptions $nodeIP killall nifty 18 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=1/-1 19 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=2/-1 20 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=3/-1 21 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=4/-1 22 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=5/-1 23 | ssh -n $sshOptions $nodeIP ovs-ofctl del-flows br0 cookie=6/-1 24 | done < ./nodes.conf -------------------------------------------------------------------------------- /examples/HDFS/README.md: -------------------------------------------------------------------------------- 1 | Example of using Nifty with HDFS 2 | ======= 3 | The scripts in this directory show an example of using Nifty with HDFS. The following steps guide you through deploying an HDFS cluster with Nifty and evaluating its performance. 4 | 5 | Prerequesets 6 | ------- 7 | 1- Install Java. 8 | ```bash 9 | $ apt-get install default-jdk -y 10 | ``` 11 | 12 | 2- Install Hadoop 3.3.0 from the website: https://downloads.apache.org/hadoop/common/hadoop-3.3.0/. 13 | 14 | 3- Install the Paramiko library for Python SSH, the easiest way is using pip. 15 | ```bash 16 | $ pip install paramiko 17 | ``` 18 | While most environments already have pip, you may need to install it manually as described in https://github.com/pypa/get-pip. 
19 | 
20 | 
21 | 4- Make sure the machines can SSH into each other, or at least that one controller node can SSH into all machines. This may require setting up SSH keys.
22 | 
23 | Running the Example
24 | -------
25 | 1- Set the variables in the config.py file:
26 | * HADOOP_HOME: the directory where Hadoop is installed; it must be the same on all nodes.
27 | * HADOOP_STORE: the directory that HDFS will use for storage. The scripts also use this directory for some temp files. Make sure that this directory exists on all the nodes in the cluster and that you have at least 1GB/client of available storage.
28 | * The IP addresses of all nodes in the cluster. This includes the HDFS cluster nodes (NameNode and DataNodes) and the nodes that will run the benchmark clients. Please note that the scripts assume that the first IP in the list will host the NameNode instance.
29 | * The size of the HDFS cluster, which will be split into 1 NameNode, with the rest acting as DataNodes.
30 | 
31 | 
32 | 2- Start by setting the HDFS parameters. You'll need to edit (at least) the following files on all cluster nodes:
33 | 
34 | * In $HADOOP_HOME/etc/hadoop/hdfs-site.xml: add the following two properties, with [HADOOP_STORE] as defined in step 1:
35 | 
36 | ```xml
37 | <property>
38 |   <name>dfs.namenode.name.dir</name>
39 |   <value>file:[HADOOP_STORE]/hdfs/namenode</value>
40 | </property>
41 | <property>
42 |   <name>dfs.datanode.data.dir</name>
43 |   <value>file:[HADOOP_STORE]/hdfs/datanode</value>
44 | </property>
45 | ```
46 | 
47 | * In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, add your JAVA_HOME directory. It should look something like this:
48 | ```bash
49 | # The java implementation to use. By default, this environment
50 | # variable is REQUIRED on ALL platforms except OS X!
51 | export JAVA_HOME='/usr/lib/jvm/java-8-openjdk-amd64'
52 | ```
53 | 
54 | * In $HADOOP_HOME/etc/hadoop/core-site.xml, you must specify the address of your NameNode to allow the DataNodes to reach it. It should have a property like the following, with node1_ip being the first IP in the list specified in the config.py file in step 1:
55 | ```xml
56 | <property>
57 |   <name>fs.defaultFS</name>
58 |   <value>hdfs://node1_ip:9000</value>
59 | </property>
60 | ```
61 | 
62 | 3- From the controller node, which could be a separate node or part of the cluster, start HDFS. You can use:
63 | ```bash
64 | $ python deploy_hdfs.py
65 | ```
66 | The script will start a NameNode on the machine with the first IP address in the list (in config.py), and enough DataNodes to satisfy the configured cluster size.
67 | 
68 | 4- If you're testing with Nifty, now would be the time to start it using deploy/deploy_nifty.sh. You can learn more about starting Nifty in the README of the deploy directory.
69 | 
70 | 5- To evaluate the performance of the cluster, run the HDFS TestDFSIO benchmark. This can be done with:
71 | ```bash
72 | $ python run_benchmark.py [number_of_clients]
73 | ```
74 | Clients will be distributed over the machines that are listed in config.py but are not part of the HDFS cluster. The clients run in parallel, starting with a cleanup, then writing to the HDFS cluster, then reading back the same files they wrote. The run_benchmark script then reports the total write throughput and the total read throughput of the cluster.
75 | 
76 | To test different numbers of clients in a single run, e.g., for plotting figures, you can use the following command:
77 | ```bash
78 | $ python run_exp.py [min_num_clients] [max_num_clients] [step]
79 | ```
80 | This command will create a CSV file called results.csv with the read and write throughputs across the different client counts.
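
If you want to post-process these results, the following is a minimal sketch (not part of the repository's scripts) that prints one summary line per client count. It assumes the `Clients,Write Throughput,Read Throughput` header written by run_exp.py and that the CSV is in the current directory; note that, depending on the script version, the file may be named result.csv rather than results.csv.
```python
import csv

# Illustrative only: summarize the CSV produced by run_exp.py.
def summarize(path="results.csv"):
    with open(path) as f:
        for row in csv.DictReader(f):
            print("{} clients: write {} MB/s, read {} MB/s".format(
                row["Clients"], row["Write Throughput"], row["Read Throughput"]))

if __name__ == "__main__":
    summarize()
```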
-------------------------------------------------------------------------------- /examples/HDFS/config.py: -------------------------------------------------------------------------------- 1 | HADOOP_VERSION='3.3.0' 2 | HADOOP_HOME='/proj/sds-PG0/basil/HDFS/hadoop-' + HADOOP_VERSION 3 | HADOOP_STORE='/dev/shm/hadoop_store' 4 | ips = ['192.168.1.101', '192.168.1.102', '192.168.1.103', '192.168.1.104', '192.168.1.105'] 5 | num_of_cluster_nodes = 3 6 | SSH_USERNAME='b2alkhat' -------------------------------------------------------------------------------- /examples/HDFS/deploy_hdfs.py: -------------------------------------------------------------------------------- 1 | #/usr/bin/env python 2 | 3 | import config 4 | 5 | import sys 6 | import paramiko 7 | import time 8 | import threading 9 | import os 10 | import subprocess 11 | from multiprocessing import Pool 12 | from datetime import datetime 13 | 14 | nodes = [] 15 | 16 | # parameters for SSH paramiko 17 | port = 22 18 | 19 | 20 | 21 | # SSH to all the nodes 22 | try: 23 | for ip in config.ips: 24 | print(ip) 25 | node = paramiko.SSHClient() 26 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy()) 27 | node.connect(ip, port=port, username=config.SSH_USERNAME) 28 | print("Trying to connect to node with address: " + ip) 29 | nodes.append(node) 30 | except: 31 | for n in nodes: 32 | n.close() 33 | print("Error: Could not ssh to all nodes") 34 | 35 | 36 | 37 | # Start NameNode on first IP node 38 | stdin, stdout, stderr = nodes[0].exec_command(config.HADOOP_HOME + "/bin/hdfs namenode -format c1") 39 | #print(stdout.read()) 40 | print(stderr.read()) 41 | stdin, stdout, stderr = nodes[0].exec_command(config.HADOOP_HOME + "/bin/hdfs --daemon start namenode") 42 | #print(stdout.read()) 43 | print(stderr.read()) 44 | 45 | # Start DataNodes 46 | for i in range(config.num_of_cluster_nodes-1): 47 | command = config.HADOOP_HOME + "/bin/hdfs --daemon start datanode" 48 | stdin, stdout, stderr = nodes[i+1].exec_command(command) 49 | print(stdout.read()) 50 | print(stderr.read()) 51 | print("Done DataNode start on node...") -------------------------------------------------------------------------------- /examples/HDFS/run_benchmark.py: -------------------------------------------------------------------------------- 1 | #/usr/bin/env python 2 | import config 3 | 4 | import sys 5 | import paramiko 6 | import time 7 | import threading 8 | import os 9 | import subprocess 10 | from multiprocessing import Pool 11 | from datetime import datetime 12 | 13 | 14 | nodes = [] 15 | num_of_nodes = len(config.ips) 16 | num_of_client_nodes = num_of_nodes - config.num_of_cluster_nodes 17 | 18 | 19 | # parameters for SSH paramiko 20 | port = 22 21 | 22 | 23 | # read inputs from arguments 24 | num_of_clients = int(sys.argv[1]) 25 | 26 | # SSH to all the nodes 27 | try: 28 | for ip in config.ips: 29 | print(ip) 30 | node = paramiko.SSHClient() 31 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy()) 32 | node.connect(ip, port=port, username=config.SSH_USERNAME) 33 | print("Trying to connect to node with address: " + ip) 34 | nodes.append(node) 35 | except: 36 | for n in nodes: 37 | n.close() 38 | print("Error: Could not ssh to all nodes") 39 | 40 | 41 | 42 | def runBenchmarkReadClient(node, cid): 43 | command = "cd " + config.HADOOP_HOME +"; ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-" + config.HADOOP_VERSION + "-tests.jar TestDFSIO -Dtest.build.data=/benchmark/c" + str(cid) + " -Dmapred.output.compress=false -read -nrFiles 1 
-fileSize 1000 &> " + config.HADOOP_HOME + "/temp_output_read_" + str(cid) + ".txt" 44 | node.exec_command(command) 45 | 46 | def runBenchmarkWriteClient(node, cid): 47 | command = "cd " + config.HADOOP_HOME +"; ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-" + config.HADOOP_VERSION + "-tests.jar TestDFSIO -Dtest.build.data=/benchmark/c" + str(cid) + " -Dmapred.output.compress=false -write -nrFiles 1 -fileSize 1000 &> " + config.HADOOP_HOME + "/temp_output_write_" + str(cid) + ".txt" 48 | node.exec_command(command) 49 | 50 | def runBenchmarkCleanup(node): 51 | node.exec_command("cd " + config.HADOOP_HOME + "; ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-" + config.HADOOP_VERSION + "-tests.jar TestDFSIO -Dtest.build.data=/benchmark -Dmapred.output.compress=false -clean") 52 | 53 | 54 | 55 | 56 | runBenchmarkCleanup(nodes[0]) 57 | 58 | 59 | 60 | 61 | # Polling to make sure that files inside /benchmark are deleted 62 | # since HDFS has pretty unpredictable delete time 63 | x = 5 64 | while x > 1: 65 | try: 66 | results = subprocess.check_output((config.HADOOP_HOME + "/bin/hdfs dfs -ls /benchmark").split(" ")) 67 | except: 68 | results = '' 69 | lines = results.split(" ") 70 | x = len(lines) 71 | time.sleep(10) 72 | 73 | 74 | 75 | 76 | print("Client's Write Operations Starting...") 77 | threads = [] 78 | for i in range(num_of_clients): 79 | threads.append(threading.Thread(target=runBenchmarkWriteClient, args=[nodes[config.num_of_cluster_nodes + (i % num_of_client_nodes)], i])) 80 | 81 | for t in threads: 82 | t.start() 83 | 84 | 85 | # Polling for the results to be ready before joining threads 86 | for c in range(num_of_clients): 87 | done = False 88 | while done == False: 89 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_write_" + str(c) + ".txt | grep Throughput") 90 | output_c = stdout.read() 91 | done = ('Throughput' in output_c) 92 | time.sleep(5) 93 | 94 | for t in threads: 95 | t.join() 96 | print("Client's Write Done") 97 | 98 | 99 | 100 | 101 | print("Client's Read Operation Starting...") 102 | threads = [] 103 | for i in range(num_of_clients): 104 | threads.append(threading.Thread(target=runBenchmarkReadClient, args=[nodes[config.num_of_cluster_nodes + (i % num_of_client_nodes)], i])) 105 | 106 | for t in threads: 107 | t.start() 108 | 109 | # Polling for the results to be ready before joining threads 110 | for c in range(num_of_clients): 111 | done = False 112 | while done == False: 113 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_read_" + str(c) + ".txt | grep Throughput") 114 | output_c = stdout.read() 115 | done = ('Throughput' in output_c) 116 | time.sleep(5) 117 | 118 | for t in threads: 119 | t.join() 120 | print("Client's Read Done") 121 | 122 | 123 | 124 | #------------------------------------- 125 | # Results Collection 126 | #------------------------------------- 127 | 128 | throughput = 0 129 | for c in range(num_of_clients): 130 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_write_" + str(c) + ".txt") 131 | lines = stdout.read().splitlines() 132 | for line in lines: 133 | if "Throughput" in line: 134 | throughput = throughput + float(line.split(" ")[-1]) 135 | nodes[config.num_of_cluster_nodes + c % 
num_of_client_nodes].exec_command("rm " + config.HADOOP_HOME + "/temp_output_write_" + str(c) + ".txt") 136 | print("Total Write Throughput: " + str(throughput) + "MB/sec") 137 | 138 | 139 | throughput = 0 140 | for c in range(num_of_clients): 141 | stdin, stdout, stderr = nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("cat " + config.HADOOP_HOME + "/temp_output_read_" + str(c) + ".txt") 142 | lines = stdout.read().splitlines() 143 | for line in lines: 144 | if "Throughput" in line: 145 | throughput = throughput + float(line.split(" ")[-1]) 146 | nodes[config.num_of_cluster_nodes + c % num_of_client_nodes].exec_command("rm " + config.HADOOP_HOME + "/temp_output_read_" + str(c) + ".txt") 147 | print("Total Read Throughput: " + str(throughput) + "MB/sec") -------------------------------------------------------------------------------- /examples/HDFS/run_exp.py: -------------------------------------------------------------------------------- 1 | #/usr/bin/env python 2 | import config 3 | 4 | import sys 5 | import paramiko 6 | import time 7 | import threading 8 | import os 9 | import subprocess 10 | from multiprocessing import Pool 11 | from datetime import datetime 12 | 13 | 14 | nodes = [] 15 | 16 | # parameters for SSH paramiko 17 | port = 22 18 | 19 | 20 | # SSH to all the nodes 21 | try: 22 | for ip in config.ips: 23 | print(ip) 24 | node = paramiko.SSHClient() 25 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy()) 26 | node.connect(ip, port=port, username=config.SSH_USERNAME) 27 | print("Trying to connect to node with address: " + ip) 28 | nodes.append(node) 29 | except: 30 | for n in nodes: 31 | n.close() 32 | print("Error: Could not ssh to all nodes") 33 | 34 | 35 | 36 | 37 | if (len(sys.argv) < 3): 38 | print ("::USAGE::") 39 | print ("python ./run_exp.py min_num_clients max_num_clients step") 40 | print ("this script runs multiple experiments by increasing the ") 41 | print ("number of clients from min_num_clients to max_num_clients ") 42 | print ("by the value of step in each iteration") 43 | sys.exit() 44 | 45 | min_clients = int(sys.argv[1]) 46 | max_clients = int(sys.argv[2]) 47 | step = int(sys.argv[3]) 48 | 49 | finalOutput = 'Clients,Write Throughput,Read Throughput\n' 50 | 51 | for clients in range(min_clients, max_clients+1, step): 52 | print('Starting experiment with ' + str(clients) + ' clients...') 53 | # get the working directory 54 | res = subprocess.check_output('pwd') 55 | 56 | # just run run_benchmark.py repeatedly 57 | stdin, stdout, stderr = nodes[0].exec_command('python ' + res.strip() + '/run_benchmark.py ' + str(clients)) 58 | lines = stdout.read().splitlines() 59 | 60 | # Collect and parse results 61 | finalOutput = finalOutput + str(clients) + ',' 62 | for line in lines: 63 | if 'Total Write Throughput' in line: 64 | finalOutput = finalOutput + str(line.split(' ')[3]) + ',' 65 | if 'Total Read Throughput' in line: 66 | finalOutput = finalOutput + str(line.split(' ')[3]) + '\n' 67 | 68 | print ('Experiment Complete, results are in file results.csv') 69 | try: 70 | f = open("result.csv", "w") 71 | f.write(finalOutput) 72 | f.close() 73 | except Exception as e: 74 | print(e) -------------------------------------------------------------------------------- /examples/HDFS/stop_hdfs.py: -------------------------------------------------------------------------------- 1 | #/usr/bin/env python 2 | 3 | import config 4 | 5 | import sys 6 | import paramiko 7 | import time 8 | import threading 9 | import os 10 | import subprocess 11 | 
from multiprocessing import Pool 12 | from datetime import datetime 13 | 14 | nodes = [] 15 | 16 | # parameters for SSH paramiko 17 | port = 22 18 | 19 | 20 | # SSH to all the nodes 21 | try: 22 | for ip in config.ips: 23 | print(ip) 24 | node = paramiko.SSHClient() 25 | node.set_missing_host_key_policy(paramiko.AutoAddPolicy()) 26 | node.connect(ip, port=port, username=config.SSH_USERNAME) 27 | print("Trying to connect to node with address: " + ip) 28 | nodes.append(node) 29 | except: 30 | for n in nodes: 31 | n.close() 32 | print("Error: Could not ssh to all nodes") 33 | 34 | 35 | 36 | 37 | nodes[0].exec_command(config.HADOOP_HOME + "/bin/hdfs --daemon stop namenode") 38 | 39 | 40 | for i in range(config.num_of_cluster_nodes-1): 41 | command = config.HADOOP_HOME + "/bin/hdfs --daemon stop datanode" 42 | stdin, stdout, stderr = nodes[i+1].exec_command(command) 43 | print(stdout.read()) 44 | nodes[i+1].exec_command("rm -r " + config.HADOOP_STORE) 45 | 46 | print("HDFS Stopped") -------------------------------------------------------------------------------- /examples/Kafka/README.md: -------------------------------------------------------------------------------- 1 | Example of using Nifty with Kafka 2 | ======= 3 | The scripts in this directory can be used to try Nifty with Kafka v2.13-2.6.0. The scripts were tested on Ubuntu 16.04.1 LTS with OpenJDK version 1.8.0_265 and Python 3.5.2. These scripts automate the process of deploying Kafka on multiple machines, running Kafka producers to generate some workload, and shutting down Kafka cluster. 4 | 5 | Description of the scripts 6 | ------- 7 | Teh following is a brief description for each script in this directory: 8 | 9 | * config: This script contains configuration parameters that is needed to run the benchmark. These parameters are described below in the *Running the Experiment* section. 10 | * deploy_kafka.py: this script deploys Kafka brokers on the cluster specified in `config.py` file. 11 | * stop_kafka.py: this script stops all Kafka instances that are running on the cluster specified in `config.py` file. 12 | * run_benchmark.py: this scripts runs the Kafka producers to measure the throughout of Kafka. The script takes one argument which specifies the number of producers to run. Producers will be distributed over the client nodes (i.e., nodes that are not part of the Kafka cluster). 13 | * run_exp.py: This script runs multiple experiments to get the performance of Kafka with different number of clients. This script takes three arguments: `min_num_clients, max_num_clients, step`. First experiment starts with `min_num_clients` clients and then it increases the number of clients by `step` in subsequent experiments until it reaches `max_num_clients`. Clients will be distributed over the machines that are in the cluster but are not used in the Kafka cluster. The clients will run in parallel, and each client will produce to a topic. Finally, the script writes the throughput results to a file stored in `./tmp/results.csv`. As an example, `python run_exp.py 2 4 2` will generate the throughput of Kafka with 2 and 4 clients. The output `results.csv` file will look something like this: 14 | 15 | number of producers | throughput 16 | ------------------- | ---------- 17 | 2 | 200000 18 | 4 | 400000 19 | 20 | 21 | Prerequisites 22 | ------- 23 | 1- Install Java on all machines. 24 | ```bash 25 | $ apt-get install default-jdk -y 26 | ``` 27 | 2- install kafka v2.13-2.6.0 on all machines. 
Follow this link to install Kafka: https://www.apache.org/dyn/closer.cgi?path=/kafka/2.6.0/kafka_2.13-2.6.0.tgz. 28 | 29 | 30 | Running the Experiment 31 | ------- 32 | 1- Set the following variables in the config.py file: 33 | * KAFKA_HOME: the directory at which Kafka is installed and it should be the same for all nodes in the cluster. 34 | * NODES_IPS: the ip addresses of all nodes in the cluster. 35 | * KAFKA_DATA_DIR: the directory at which Kafka logs and config files will be stored. 36 | * KAFKA_PORT: Kafka brokers will listen at this port 37 | * ZOOKEEPER_PORT: ZooKeeper will listen at this port 38 | * REPLICATION_FACTOR: the replication factor for Kafka. 39 | * USER_NAME: ssh username which is needed to ssh into other nodes and run commands. Set this to `None` if not needed. 40 | * SSH_KEY_PATH: the path to ssh private key. Set this to `None` if not needed. 41 | 42 | 2- start Kafka cluster by running 43 | ```bash 44 | python3 deploy_kafka.py 45 | ``` 46 | This script starts a ZooKeeper instance on the first node in `NODES_IPS` and N Kafka brokers on the next N nodes, where N = `REPLICATION_FACTOR` configuration parameter. 47 | 48 | 3- Start the throughput experiment bu running `run_exp.py` script. 49 | ```bash 50 | python3 run_exp.py min_num_clients max_num_clients step 51 | ``` 52 | 53 | Experimenting with Nifty 54 | ------- 55 | In order to compare the performance of Kafka with and without Nifty, you must run the same experiments with and without Nifty and compare the results. Please refer to the `deploy` folder to see how to deploy Nifty on multiple nodes. 56 | -------------------------------------------------------------------------------- /examples/Kafka/cmd_helper.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import os 3 | import config 4 | 5 | ''' 6 | This is a helper class to execute system 7 | commands locally and remotely 8 | ''' 9 | class CmdHelper: 10 | def __init__(self): 11 | self.sshKeyPath = config.SSH_KEY_PATH 12 | self.username = config.USER_NAME 13 | 14 | def executeCmdBlocking(self, cmd, redirectOutputPath=None): 15 | try: 16 | if redirectOutputPath: 17 | f = open(redirectOutputPath, "w") 18 | p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=f) 19 | f.close() 20 | else: 21 | p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE) 22 | output, err = p.communicate() 23 | return [output, err] 24 | except OSError as e: 25 | return [None, e] 26 | except ValueError as e: 27 | return [None, e] 28 | except Exception as e: 29 | return [None, e] 30 | 31 | def executeCmdNonBlocking(self, cmd, redirectOutputPath=None): 32 | try: 33 | cmd = "sudo " + cmd 34 | if redirectOutputPath: 35 | f = open(redirectOutputPath, "w") 36 | p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=f) 37 | else: 38 | p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE) 39 | return [p, None] 40 | except OSError as e: 41 | return [None, e] 42 | except ValueError as e: 43 | return [None, e] 44 | except Exception as e: 45 | return [None, e] 46 | 47 | def uploadToServer(self, localPath, remoteAddress, remotePath, blocking=True): 48 | cmd = "sudo scp -o StrictHostKeyChecking=no -r " 49 | usernameStr = "" 50 | if self.sshKeyPath: 51 | cmd = cmd + " -i " + self.sshKeyPath 52 | if self.username: 53 | usernameStr = " " + self.username + "@" 54 | 55 | cmd = cmd + " " + localPath + " " 56 | cmd = cmd + usernameStr 57 | cmd = cmd + remoteAddress 58 | cmd = cmd + ":" + remotePath 59 | if blocking: 60 | return 
self.executeCmdBlocking(cmd) 61 | else: 62 | return self.executeCmdNonBlocking(cmd) 63 | 64 | def downloadFromServer(self, localPath, remoteAddress, remotePath): 65 | cmd = "sudo scp -o StrictHostKeyChecking=no -r " 66 | if self.sshKeyPath: 67 | cmd = cmd + " -i " + self.sshKeyPath 68 | if self.username: 69 | cmd = cmd + " " + self.username +"@" 70 | 71 | cmd = cmd + remoteAddress 72 | cmd = cmd + ":" + remotePath 73 | cmd = cmd + " " + localPath 74 | return self.executeCmdBlocking(cmd) 75 | 76 | def executeCmdRemotely(self, cmd, remoteAddress, blocking=True, redirectOutputPath=None): 77 | cmd = "\"sudo " + cmd + "\"" 78 | fullCommand = "ssh -o StrictHostKeyChecking=no " 79 | if self.sshKeyPath: 80 | fullCommand = fullCommand + " -i " + self.sshKeyPath 81 | if self.username: 82 | fullCommand = fullCommand + " " + self.username +"@" 83 | 84 | fullCommand = fullCommand + remoteAddress 85 | fullCommand = fullCommand + " " + cmd 86 | if blocking: 87 | return self.executeCmdBlocking(fullCommand, redirectOutputPath) 88 | else: 89 | return self.executeCmdNonBlocking(fullCommand, redirectOutputPath) 90 | 91 | 92 | -------------------------------------------------------------------------------- /examples/Kafka/config.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This file contains the configuration parametes that is needed to 3 | run Kafka and reproduce the results presented in a paper titled Toward a Generic 4 | Fault Tolerance Technique for Partial Network Partitioning (OSDI'20). 5 | ''' 6 | 7 | # the ssh username 8 | USER_NAME = None 9 | 10 | # the path for the ssh private key 11 | SSH_KEY_PATH = None 12 | 13 | # the ip addresses of all nodes in the cluster 14 | NODES_IPS=["192.168.1.101","192.168.1.102","192.168.1.103","192.168.1.104","192.168.1.112","192.168.1.114","192.168.1.118","192.168.1.105","192.168.1.113","192.168.1.119","192.168.1.111","192.168.1.115","192.168.1.107","192.168.1.116","192.168.1.117","192.168.1.120","192.168.1.106","192.168.1.108","192.168.1.109","192.168.1.110"] 15 | 16 | # The root directory of Kafka 17 | KAFKA_HOME ="/proj/sds-PG0/ahmed/nifty/kafka/kafka_2.13-2.6.0/" 18 | 19 | # the path that ZooKeeper, brokers, and producers 20 | # use to store logs and config files 21 | KAFKA_DATA_DIR ="/media/ssd/kafka/" 22 | 23 | # replication factor for Kafka 24 | REPLICATION_FACTOR = 3 25 | 26 | 27 | assert len(NODES_IPS) > REPLICATION_FACTOR + 1, "At least {} nodes are needed (replication factor:{} & 1 ZooKeeper & 1 Client)".format(REPLICATION_FACTOR+2, REPLICATION_FACTOR) 28 | 29 | # produced message size 30 | MESSAGE_SIZE = 128 31 | 32 | # number of messages produced per producer 33 | MESSAGES_COUNT = 2000000 34 | 35 | # first node will be used to run ZooKeeper 36 | ZOOKEEPER_ADDRESS = NODES_IPS[0] 37 | 38 | # the next (N = REPLICATION_FACTOR) nodes are used to run Kafka brokers 39 | BROKER_NODES = NODES_IPS[1 : 1 + REPLICATION_FACTOR] 40 | 41 | # the remaining nodes are used to run clients 42 | CLIENT_NODES = NODES_IPS[1 + REPLICATION_FACTOR : len(NODES_IPS)] 43 | 44 | 45 | 46 | # the directory contains the binaries of Kafka 47 | KAFKA_BIN =KAFKA_HOME + "bin/" 48 | # Binaries to run various processes 49 | KAFKA_CREATE_TOPIC_BIN = KAFKA_BIN + "kafka-topics.sh " 50 | KAFKA_PRODUCER_TEST_BIN = KAFKA_BIN + "kafka-producer-perf-test.sh " 51 | KAFKA_BROKER_BIN = KAFKA_BIN + "kafka-server-start.sh " 52 | KAFKA_ZK_BIN = KAFKA_BIN + "zookeeper-server-start.sh " 53 | 54 | 55 | # Kafka logs directory 56 | KAFKA_LOGS_DIR = 
KAFKA_DATA_DIR + "logs" 57 | # ZooKeeper logs directory 58 | ZOOKEEPER_LOGS_DIR = KAFKA_DATA_DIR + "zk-logs" 59 | 60 | # Kafka brokers listen on this port 61 | KAFKA_PORT = "55555" 62 | 63 | # ZooKeeper listens on this port 64 | ZOOKEEPER_PORT = "2181" 65 | 66 | -------------------------------------------------------------------------------- /examples/Kafka/deploy_kafka.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This script deploys Kafka and ZooKepper on the cluster specified 3 | in the config.py file 4 | ''' 5 | import os 6 | import time 7 | import config 8 | import cmd_helper 9 | import helpers 10 | 11 | cmdHelper = cmd_helper.CmdHelper() 12 | 13 | def deployKafka(): 14 | # initialize needed directories on all nodes 15 | for node in config.NODES_IPS: 16 | cmd = "rm -r {}".format(config.KAFKA_DATA_DIR) 17 | cmdHelper.executeCmdRemotely(cmd, node, True) 18 | cmd = "mkdir -p {}".format(config.KAFKA_DATA_DIR) 19 | cmdHelper.executeCmdRemotely(cmd, node, True) 20 | cmd = "mkdir -p {}".format(config.KAFKA_LOGS_DIR) 21 | cmdHelper.executeCmdRemotely(cmd, node, True) 22 | cmd = "mkdir -p {}".format(config.ZOOKEEPER_LOGS_DIR) 23 | cmdHelper.executeCmdRemotely(cmd, node, True) 24 | cmd = "chmod 777 -R {}".format(config.KAFKA_DATA_DIR) 25 | cmdHelper.executeCmdRemotely(cmd, node, True) 26 | 27 | 28 | # write ZooKeeper config file 29 | path = TMP_DIR + "zk-config.properties" 30 | helpers.writeZkConfigFile(path, config.ZOOKEEPER_PORT, config.ZOOKEEPER_LOGS_DIR) 31 | cmdHelper.uploadToServer(path, config.ZOOKEEPER_ADDRESS, config.KAFKA_DATA_DIR, True) 32 | 33 | # write brokers config files 34 | for index, node in enumerate(config.BROKER_NODES): 35 | path = TMP_DIR + "server{}.properties".format(index) 36 | helpers.writeBrokerConfigFile(path, index, node, config.KAFKA_PORT, config.KAFKA_LOGS_DIR, config.ZOOKEEPER_ADDRESS, config.ZOOKEEPER_PORT) 37 | cmdHelper.uploadToServer(path, node, config.KAFKA_DATA_DIR, True) 38 | 39 | # start ZooKeeper 40 | path = config.KAFKA_DATA_DIR + "zk-config.properties" 41 | cmd = config.KAFKA_ZK_BIN + " {}".format(path) 42 | cmdHelper.executeCmdRemotely(cmd, config.ZOOKEEPER_ADDRESS, False, "{}/zookeeper.log".format(TMP_DIR)) 43 | 44 | # start kafka brokers 45 | for index, s in enumerate(config.BROKER_NODES): 46 | path = config.KAFKA_DATA_DIR + "server{}.properties".format(index) 47 | cmd = config.KAFKA_BROKER_BIN + " {}".format(path) 48 | cmdHelper.executeCmdRemotely(cmd, s, False, "{}/broker{}.log".format(TMP_DIR, index)) 49 | 50 | 51 | 52 | # temp folder that is used to store temp config and log files 53 | TMP_DIR = os.getcwd() + "/tmp/" 54 | cmd = "mkdir -p {}".format(TMP_DIR) 55 | cmdHelper.executeCmdBlocking(cmd) 56 | 57 | deployKafka() -------------------------------------------------------------------------------- /examples/Kafka/helpers.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | ''' 4 | This helper function writes the configuration file 5 | for a Kafka broker 6 | ''' 7 | def writeBrokerConfigFile(path, brokerId,kafkaAddress, kafkaPort, logFilesPath, zkAddress, zkPort): 8 | path = os.path.expanduser(path) 9 | str = "" 10 | str = str + "broker.id={}\n".format(brokerId) 11 | str = str + "zookeeper.connect={}:{}\n".format(zkAddress, zkPort) 12 | str = str + "listeners=PLAINTEXT://{}:{}\n".format(kafkaAddress, kafkaPort) 13 | str = str + "log.dirs={}\n".format(logFilesPath) 14 | str = str + "num.partitions=3\n" 15 | str = str + 
"num.network.threads=4\n" 16 | str = str + "num.io.threads=10\n" 17 | 18 | with open(path, 'w') as file: 19 | file.write(str) 20 | 21 | ''' 22 | This helper function writes the configuration file 23 | for ZooKeeper 24 | ''' 25 | def writeZkConfigFile(path, zkPort, logFilesPath): 26 | path = os.path.expanduser(path) 27 | str = "" 28 | str = str + "clientPort={}\n".format(zkPort) 29 | str = str + "dataDir={}\n".format(logFilesPath) 30 | 31 | with open(path, 'w') as file: 32 | file.write(str) 33 | 34 | ''' 35 | This helper function writes the configuration file 36 | for Kafka producer 37 | ''' 38 | def writeProducerConfigFile(path, brokersAddresses, logFilesPath): 39 | path = os.path.expanduser(path) 40 | str = "" 41 | str = str + "bootstrap.servers={}\n".format(brokersAddresses) 42 | str = str + "acks=all\n" 43 | 44 | with open(path, 'w') as file: 45 | file.write(str) 46 | 47 | -------------------------------------------------------------------------------- /examples/Kafka/run_benchmark.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This script benchmarks the performance of Kafka by 3 | running multiple producers and measure the throughput. 4 | This script receives one command arguments: number of producers 5 | ''' 6 | 7 | import sys 8 | import threading 9 | import os 10 | import time 11 | import config 12 | import cmd_helper 13 | import helpers 14 | 15 | cmdHelper = cmd_helper.CmdHelper() 16 | 17 | # number of producers to run 18 | PRODUCERS_COUNT = int(sys.argv[1]) 19 | 20 | def printResults(producersProcesses): 21 | # read the results 22 | total = 0 23 | print ("###################") 24 | for index, p in enumerate(producersProcesses): 25 | try: 26 | output, err = p.communicate() 27 | if (err): 28 | print("producer{}: {}".format(index,err.decode("utf-8"))) 29 | else: 30 | output = output.decode("utf-8") 31 | output = output.split('\n') 32 | output.remove('') 33 | output = output[-1] 34 | output = output.split(',') 35 | if (len(output) == 8): 36 | throughput = float(output[1].split(' ')[1]) 37 | total = total + throughput 38 | print("producer{}: {:.2f} messages/sec".format(index, throughput)) 39 | else: 40 | print("producer{}: {}".format(index, "error")) 41 | except Exception as e: 42 | print("{}: {}".format(index, e)) 43 | print ("-------------------") 44 | print("total: {:.2f} messages/sec".format(total)) 45 | 46 | 47 | 48 | # create a single topic 49 | def createSingleTopic(topicName): 50 | cmd = config.KAFKA_CREATE_TOPIC_BIN + " --create --topic {} --zookeeper {}:{} --replication-factor {} --partitions 3".format(topicName,config.ZOOKEEPER_ADDRESS, config.ZOOKEEPER_PORT, config.REPLICATION_FACTOR) 51 | cmdHelper.executeCmdRemotely(cmd, config.BROKER_NODES[0], True, TMP_DIR + "null") 52 | 53 | 54 | # create a topic for each producer. 
55 | def createTopics(): 56 | threads = [] 57 | for i in range(PRODUCERS_COUNT): 58 | threads.append(threading.Thread(target=createSingleTopic, args=["topic-{}".format(i)])) 59 | 60 | for t in threads: 61 | t.start() 62 | 63 | for t in threads: 64 | t.join() 65 | 66 | 67 | def startSingleProducer(cmd, nodeIp, producersProcesses): 68 | [p, err] = cmdHelper.executeCmdRemotely(cmd, nodeIp, False) 69 | producersProcesses.append(p) 70 | 71 | # start the producers and collect the results 72 | def startProducers(): 73 | # thread array to store threads use run clients in parallel 74 | threads = [] 75 | 76 | # convert the brokers ip addresses to producers format 77 | brokersAddressesStr = [] 78 | for index, node in enumerate(config.BROKER_NODES): 79 | brokersAddressesStr.append("{}:{}".format(node, config.KAFKA_PORT)) 80 | brokersAddressesStr = ','.join(brokersAddressesStr) 81 | 82 | # write producers config files 83 | for i in range(PRODUCERS_COUNT): 84 | path = TMP_DIR + "producer{}.properties".format(i) 85 | helpers.writeProducerConfigFile(path, brokersAddressesStr, config.KAFKA_DATA_DIR) 86 | cmdHelper.uploadToServer(path, config.CLIENT_NODES[i%len(config.CLIENT_NODES)], config.KAFKA_DATA_DIR, True) 87 | 88 | # launch the producers on client machines 89 | producersProcesses = [] 90 | for i in range(PRODUCERS_COUNT): 91 | path = config.KAFKA_DATA_DIR + "producer{}.properties".format(i) 92 | cmd = config.KAFKA_PRODUCER_TEST_BIN + "--topic topic-{} --record-size {} --throughput -1 --num-records {} --producer.config {}".format(i, config.MESSAGE_SIZE, config.MESSAGES_COUNT, path) 93 | threads.append(threading.Thread(target=startSingleProducer, args=[cmd, config.CLIENT_NODES[i%len(config.CLIENT_NODES)], producersProcesses])) 94 | 95 | for t in threads: 96 | t.start() 97 | 98 | for t in threads: 99 | t.join() 100 | 101 | # wait until producers finish 102 | for p in producersProcesses: 103 | try: 104 | p.wait() 105 | except Exception as e: 106 | print(e) 107 | 108 | # print the throughput of each producer 109 | printResults(producersProcesses) 110 | 111 | 112 | # temp folder that is used to store temp config and log files 113 | TMP_DIR = os.getcwd() + "/tmp/" 114 | cmd = "mkdir -p {}".format(TMP_DIR) 115 | cmdHelper.executeCmdBlocking(cmd) 116 | 117 | createTopics() 118 | startProducers() -------------------------------------------------------------------------------- /examples/Kafka/run_exp.py: -------------------------------------------------------------------------------- 1 | import optparse 2 | import time 3 | import sys 4 | import os 5 | import cmd_helper 6 | import config 7 | 8 | def parseResult(outputStr): 9 | try: 10 | outputStr = outputStr.split('\n') 11 | outputStr.remove('') 12 | outputStr = outputStr[-1] 13 | outputStr = outputStr.split(' ') 14 | return float(outputStr[1]) 15 | except Exception as e: 16 | print(e) 17 | 18 | if (len(sys.argv) < 4): 19 | print ("::USAGE::") 20 | print ("python ./run_exp.py min_num_clients max_num_clients step") 21 | print ("--------------------------------------------------------") 22 | print ("this script runs multiple experiments by increasing the ") 23 | print ("number of clients from min_num_clients to max_num_clients ") 24 | print ("by the value of step in each iteration") 25 | print ("--------------------------------------------------------") 26 | sys.exit() 27 | 28 | cmdHelper = cmd_helper.CmdHelper() 29 | TMP_DIR = os.getcwd() + "/tmp/" 30 | cmd = "mkdir -p {}".format(TMP_DIR) 31 | cmdHelper.executeCmdBlocking(cmd) 32 | 33 | min_clients = 
int(sys.argv[1]) 34 | max_clients = int(sys.argv[2]) 35 | step = int(sys.argv[3]) 36 | 37 | 38 | resultStr = "num of producers, throughput (ops/sec)\n" 39 | for clients in range(min_clients, max_clients+1, step): 40 | cmd = "python3 ./run_benchmark.py {}".format(clients) 41 | out,err = cmdHelper.executeCmdBlocking(cmd) 42 | out = out.decode("utf-8") 43 | totalThroughput = parseResult(out) 44 | resultStr = resultStr + "{},{}\n".format(clients, totalThroughput) 45 | 46 | print (resultStr) 47 | try: 48 | f = open(TMP_DIR + "/result.csv", "w") 49 | f.write(resultStr) 50 | f.close() 51 | except Exception as e: 52 | print(e) 53 | 54 | 55 | -------------------------------------------------------------------------------- /examples/Kafka/stop_kafka.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This script deploys Kafka and ZooKepper on the cluster specified 3 | in the config.py file 4 | ''' 5 | import config 6 | import cmd_helper 7 | 8 | cmdHelper = cmd_helper.CmdHelper() 9 | 10 | def stopKafka(): 11 | # initialize needed directories on all nodes 12 | for node in config.NODES_IPS: 13 | cmd = "sudo pkill -9 java" 14 | cmdHelper.executeCmdRemotely(cmd, node, True) 15 | 16 | 17 | stopKafka() -------------------------------------------------------------------------------- /examples/simple example/example.md: -------------------------------------------------------------------------------- 1 | Nifty Example 2 | ------- 3 | 4 | Below we describe an example that demonstrates how Nifty works and shows a simple use case of Nifty's functionality (i.e., Artifact Functionality). 5 | 6 | ### Overview 7 | The main idea of this example is to show that using Nifty, we can mask partial partitions in a local network. To demonstrate this, we will conduct a simple experiment where we use ping to show that Nifty infact does cover the partition without any modifications. 8 | 9 | ### Setup 10 | To run this example, you will need 4 machines that are in the same network, say node1, node2, node3, node4. node4 will act as a controller of the experiment, while the other three nodes are the cluster in which we will deploy Nifty. Further, you need to have Nifty present in all of the nodes at the same location (the location specified in the configuration file of the controller node (node4)), and Nifty should be compiled in all of them. You will also need to follow the steps in Nifty's setup and make sure that OVS is installed in nodes 1, 2, and 3. 11 | 12 | Example 1: A partition without Nifty 13 | ------- 14 | 15 | In this example we will create a partition on a cluster. The cluster does not use Nifty. 16 | 17 | 1. Log into all the nodes, and make sure that all the nodes can ping each other. 18 | 19 | ```bash 20 | node2$ ping -c3 node3 21 | PING node3-link-0 (192.168.1.103) 56(84) bytes of data. 22 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=1 ttl=64 time=1.45 ms 23 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=2 ttl=64 time=0.311 ms 24 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=3 ttl=64 time=0.327 ms 25 | 26 | --- node3-link-0 ping statistics --- 27 | 3 packets transmitted, 3 received, 0% packet loss, time 2001ms 28 | ``` 29 | 30 | 2. From the controller node (node4), modify nodes.conf in the deploy folder to include the hostnames (or IPs) of nodes 1, 2, and 3. 31 | 32 | The file will look something like this 33 | ``` 34 | node1 35 | node2 36 | node3 37 | ``` 38 | 39 | 3. 
From the controller node, modify parts.conf as follows: 40 | 41 | ``` 42 | 1 43 | node2_MAC 44 | 1 45 | node3_MAC 46 | ``` 47 | 48 | This effectively defines a partition between node2 and node3, while node1 can communicate with all the nodes. 49 | 50 | 4. From the controller node (node4) run the script ./deploy_partitioner.sh 51 | 52 | ```bash 53 | node4$ sudo ./deploy_partitioner.sh 54 | ``` 55 | 56 | This will create a partition specified in parts.conf 57 | 58 | 6. Test this by logging into nodes 1, 2, and 3. You now should not be able to ping node 2 from node 3 (and vice versa) but should be able to ping nodes 2 and 3 from node 1. 59 | 60 | Here is the output for testing from node2 61 | ```bash 62 | node2$ ping -c3 node3 63 | PING node3-link-0 (192.168.1.103) 56(84) bytes of data. 64 | 65 | --- node3-link-0 ping statistics --- 66 | 3 packets transmitted, 0 received, 100% packet loss, time 2015ms 67 | ``` 68 | 69 | 7. Heal the partition using the heal script on node4 70 | 71 | ```bash 72 | node4$ sudo ./heal_partition.sh 73 | ``` 74 | 75 | Now let's see how the this example is different when we have Nifty running in the system 76 | 77 | Example 2: A partition while using Nifty 78 | ------- 79 | 80 | We will repeat the previous example on the cluster after deploying Nifty. 81 | 82 | Redo steps 1 and 2 from example 1 above. 83 | 84 | 1. Deploy Nifty using the deployment script found in the ```deploy``` directory. Run the deploy script on the controller node. 85 | 86 | ```bash 87 | node4$ sudo ./deploy_nifty.sh 88 | ``` 89 | 90 | 4. Now, on node4 run the script to create a partition using ./deploy_partitioner.sh 91 | 92 | ```bash 93 | node4$ sudo ./deploy_partitioner.sh 94 | ``` 95 | 96 | This will create a partition specified in parts.conf 97 | 98 | 9. Test this by logging into nodes 1,2, and 3. You should still be able to ping all the nodes from all the nodes, despite the partition. 99 | 100 | The following is the test from node2 101 | ```bash 102 | node2$ ping -c3 node3 103 | PING node3-link-0 (192.168.1.103) 56(84) bytes of data. 104 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=1 ttl=64 time=0.301 ms 105 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=2 ttl=64 time=0.334 ms 106 | 64 bytes from node3-link-0 (192.168.1.103): icmp_seq=3 ttl=64 time=0.360 ms 107 | 108 | --- node3-link-0 ping statistics --- 109 | 3 packets transmitted, 3 received, 0% packet loss, time 1999ms 110 | rtt min/avg/max/mdev = 0.301/0.331/0.360/0.032 ms 111 | ``` 112 | 113 | The ping works and node2 and node3 can communicate in spite of the partition. That is because Nifty creates alternative routes in the network to mask the partition. 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | -------------------------------------------------------------------------------- /pnp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UWASL/NIFTY/f1c3fcfb921b00c690208ae8c2f1b6f217a399ac/pnp.png -------------------------------------------------------------------------------- /src/daemon.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * The main file of Nifty (contains driver code) 3 | */ 4 | 5 | #include 6 | #include "nifty.h" 7 | using namespace std; 8 | 9 | 10 | /** 11 | * Prints usage information. 
12 | */ 13 | void printUsage() 14 | { 15 | printf("::USAGE::\n"); 16 | printf("./nifty -i [ip] -m [mac] [OPTIONS]\n\n"); 17 | 18 | printf("-i ip , This machine's IP address, e.g., 192.168.1.5\n"); 19 | printf("-m mac , This machine's MAC address, e.g., f8:c9:7a:92:bb:a3\n"); 20 | printf("OPTIONS: \n"); 21 | printf("-c # , Configuration file path (default is nodes.conf) it contains IPS and MACs of nodes\n"); 22 | printf("-t # , How often should this machine ping others. (every t seconds)\n"); 23 | printf("-v , Verbose mode\n"); 24 | exit(1); 25 | } 26 | 27 | /** 28 | * Prints the args that are passed to the driver. 29 | */ 30 | void printArgs(string myIp, string myMac, int pingingPeriod, int destinationsCount, 31 | const string* destinationIps, const string* destinationMacs, bool verbose) 32 | { 33 | printf("myIp: %s\n", myIp.c_str()); 34 | printf("myMac: %s\n", myMac.c_str()); 35 | printf("pingingPeriod: %d\n", pingingPeriod); 36 | printf("Verbose: %s\n", verbose?"True":"False"); 37 | printf("destinationsCount: %d\n", destinationsCount); 38 | 39 | string temp = ""; 40 | for (int i = 0; i < destinationsCount; ++i) 41 | { 42 | if(i) 43 | temp += ", "; 44 | temp += destinationIps[i]; 45 | } 46 | printf("destinationIps: %s\n", temp.c_str()); 47 | 48 | temp = ""; 49 | for (int i = 0; i < destinationsCount; ++i) 50 | { 51 | if(i) 52 | temp += ", "; 53 | temp += destinationMacs[i]; 54 | } 55 | printf("destinationMacs: %s\n", temp.c_str()); 56 | } 57 | 58 | /** 59 | * Driver code. 60 | * 61 | * Parses arguments passed to the program and creates DV with them. 62 | */ 63 | int main(int argc, char** argv) 64 | { 65 | int destinationsCount = 1; 66 | string* destinationIps; 67 | string* destinationMacs; 68 | string myIp = "IP"; 69 | string myMac = "MAC"; 70 | string confPath = "nodes.conf"; 71 | int pingingPeriod = 5; 72 | bool verbose = false; 73 | 74 | if(argc <= 3) 75 | printUsage(); 76 | 77 | // Parse the command line arguments 78 | try 79 | { 80 | for (int i = 1; i < argc; ++i) 81 | { 82 | if(argv[i][0]=='-' && argv[i][1] == 'i') 83 | myIp = argv[++i]; 84 | 85 | else if(argv[i][0]=='-' && argv[i][1] == 'm') 86 | myMac = argv[++i]; 87 | 88 | else if(argv[i][0]=='-' && argv[i][1] == 't') 89 | pingingPeriod = stoi(argv[++i]); 90 | 91 | else if(argv[i][0]=='-' && argv[i][1] == 'v') 92 | verbose = true; 93 | 94 | else if(argv[i][0]=='-' && argv[i][1] == 'c') 95 | confPath = argv[++i]; 96 | } 97 | ifstream fin(confPath); 98 | fin>>destinationsCount; 99 | destinationIps = new string[destinationsCount]; 100 | destinationMacs = new string[destinationsCount]; 101 | 102 | for (int j = 0; j < destinationsCount; ++j) 103 | { 104 | string dstIp; 105 | fin>>dstIp; 106 | destinationIps[j] = dstIp; 107 | } 108 | 109 | for (int j = 0; j < destinationsCount; ++j) 110 | { 111 | string dstMac; 112 | fin>>dstMac; 113 | destinationMacs[j] = dstMac; 114 | } 115 | 116 | fin.close(); 117 | } 118 | catch(...) 
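    // Illustrative note (added, not in the original source): the configuration file parsed in the try block above is expected to hold the node count on its first line, followed by one IP address per node and then one MAC address per node in the same order, e.g. a count of 3, then three IPs, then three MACs.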
119 | { 120 | printUsage(); 121 | exit(1); 122 | } 123 | printArgs(myIp, myMac, pingingPeriod, destinationsCount, destinationIps, destinationMacs, verbose); 124 | 125 | Nifty nifty(myIp, myMac, pingingPeriod, destinationsCount, destinationIps, destinationMacs, verbose); 126 | nifty.start(); 127 | 128 | return 0; 129 | } 130 | -------------------------------------------------------------------------------- /src/nifty.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * Implementation file for the primary class Nifty 3 | */ 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include "nifty.h" 17 | 18 | using namespace std; 19 | 20 | 21 | string Nifty::toString(string targetIP) 22 | { 23 | int throughID = ipToId.find(targetIP) == ipToId.end()? -1 : ipToId[targetIP]; 24 | 25 | std::string ret = ""; 26 | for (int i = 0; i < destinationsCount; ++i) 27 | { 28 | if(i) 29 | ret += ";"; 30 | ret += distanceVector[i].toString(throughID); 31 | } 32 | return ret; 33 | } 34 | 35 | 36 | void Nifty::print(string msg, bool forcePrint) 37 | { 38 | if(verbose || forcePrint) 39 | printf("%s\n", msg.c_str()); 40 | } 41 | 42 | 43 | DistanceVectorEntry* distancVectorFromString(const char* message, int len) 44 | { 45 | DistanceVectorEntry* ret = new DistanceVectorEntry[len]; 46 | std::stringstream ss(message); 47 | std::string token; 48 | int id = 0; 49 | while(std::getline(ss, token, ';')) 50 | { 51 | std::stringstream ss2(token); 52 | double cost; 53 | std::string ip; 54 | ss2>>cost>>ip; 55 | 56 | ret[id] = DistanceVectorEntry(cost, id, ip); 57 | id++; 58 | } 59 | return ret; 60 | } 61 | 62 | 63 | Nifty::Nifty(std::string _myIp, std::string _myMac, unsigned int _pingingPeriod, unsigned int _destinationsCount, 64 | std::string* _destinationIps, std::string* _destinationMacs, bool _verbose) 65 | { 66 | myIp = _myIp; 67 | myMac = _myMac; 68 | pingingPeriod = _pingingPeriod; 69 | destinationsCount = _destinationsCount; 70 | destinationIps = _destinationIps; 71 | destinationMacs = _destinationMacs; 72 | verbose = _verbose; 73 | 74 | distanceVector = new DistanceVectorEntry[destinationsCount]; 75 | init(); 76 | } 77 | 78 | 79 | void Nifty::start() 80 | { 81 | //A seperate thread to ping others. 82 | pingingThread = std::thread (&Nifty::pingOthers, this, false); 83 | receiveMessages(); 84 | } 85 | 86 | 87 | Nifty::~Nifty() 88 | { 89 | delete[] distanceVector; 90 | } 91 | 92 | 93 | void Nifty::init() 94 | { 95 | for (int i = 0; i < destinationsCount; ++i) 96 | { 97 | ipToId[destinationIps[i]] = i; 98 | 99 | distanceVector[i] = DistanceVectorEntry(MAX_COST, -1, destinationIps[i]); 100 | if(destinationIps[i] == myIp) 101 | distanceVector[i] = DistanceVectorEntry(0.0, i, myIp); //Cost to reach myself 102 | } 103 | updateOF(); 104 | } 105 | 106 | 107 | void Nifty::nodeTimedOut(string ip) 108 | { 109 | // I already know that I cannot reach this one, do nothing. 110 | if(timedOutNodes.find(ip) != timedOutNodes.end() && timedOutNodes[ip]) 111 | return; 112 | 113 | print("Node " + ip + " had timedout!!"); 114 | for (int i = 0; i < destinationsCount; ++i) 115 | { 116 | if (distanceVector[i].throughID >= 0 117 | && destinationIps[distanceVector[i].throughID] == ip) //Used to go through this node, it's inf now! 
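        // Added clarification (not in the original source): resetting this entry to MAX_COST below forces later rounds of DV exchange to rediscover a path to this destination through some other node, if one exists.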
118 | { 119 | distanceVector[i].cost = MAX_COST; 120 | distanceVector[i].throughID = -1; 121 | } 122 | } 123 | } 124 | 125 | 126 | void Nifty::checkTimeOuts() 127 | { 128 | for (int i = 0; i < destinationsCount; ++i) 129 | { 130 | if (myIp == destinationIps[i]) 131 | continue; 132 | 133 | //Check if this node had timed out! 134 | time_t curr_time = time (NULL); 135 | 136 | if(curr_time - nodesTimes[destinationIps[i]] > timeoutPeriod) 137 | nodeTimedOut(destinationIps[i]); 138 | } 139 | } 140 | 141 | 142 | void Nifty::pingOthers(bool onlyOnce) 143 | { 144 | struct sockaddr_in dest_addr; 145 | dest_addr.sin_family = AF_INET; 146 | dest_addr.sin_port = htons(PORT); 147 | int dest_sockfd; 148 | 149 | //Ping everyone every @pingingPeriod seconds 150 | do 151 | { 152 | //Need to check for timeouts first (important!) 153 | checkTimeOuts(); 154 | 155 | for (int i = 0; i < destinationsCount; ++i) 156 | { 157 | if (myIp == destinationIps[i]) 158 | continue; 159 | 160 | string message = toString(destinationIps[i]); 161 | print("Sending {"+destinationIps[i]+"} message : " + message); 162 | 163 | const char* destination = destinationIps[i].c_str(); 164 | inet_pton(AF_INET, destination, &dest_addr.sin_addr); 165 | dest_sockfd = socket(AF_INET, SOCK_DGRAM, 0); 166 | if(dest_sockfd < 0 ) 167 | perror("socket creation failed in pingOthers"); 168 | 169 | int sendingResult = sendto(dest_sockfd, (const char *)message.c_str(), 170 | strlen(message.c_str()), 171 | MSG_CONFIRM, (const struct sockaddr *) &dest_addr, 172 | sizeof(dest_addr)); 173 | close(dest_sockfd); 174 | } 175 | std::this_thread::sleep_for(std::chrono::milliseconds(pingingPeriod)); 176 | }while(!onlyOnce); 177 | } 178 | 179 | 180 | bool Nifty::updateDV(const char* message, const char* sourceIP) 181 | { 182 | bool updated = false; 183 | 184 | if(ipToId.find(sourceIP) == ipToId.end()) //IDK about the source!! do nothing. 185 | return false; 186 | 187 | timedOutNodes[sourceIP] = false; // I now know that this node didn't time out 188 | 189 | // Update when I last heared from this node. 190 | time_t curr_time = time (NULL); 191 | nodesTimes[sourceIP] = curr_time; 192 | 193 | int sourceID = ipToId[sourceIP]; 194 | 195 | DistanceVectorEntry* otherDV = distancVectorFromString(message, destinationsCount); 196 | 197 | int reach_count = 0; 198 | for (int i = 0; i < destinationsCount; ++i) 199 | { 200 | string ip = otherDV[i].targetIP; 201 | double cost = otherDV[i].cost + 1; //One extra hop to get to the node I received the message from 202 | if (cost <=2) //Can directly reach 203 | reach_count++; 204 | int id = ipToId[ip]; 205 | 206 | if (cost < distanceVector[id].cost) //The new cost is better, need to update my DV. 
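        // Worked example (added, not in the original source): if the sender reports cost 1 to some target and this node's current cost to that target is MAX_COST, the candidate cost is 2 (sender's cost + 1), which is lower, so the entry is replaced and traffic to that target is now routed through the sender.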
207 | { 208 | updated = true; 209 | distanceVector[id] = DistanceVectorEntry(cost, sourceID, ip); 210 | } 211 | } 212 | 213 | if(reach_count>=destinationsCount - 1) // Can reach everyone directly 214 | isBridgeNode[sourceIP] = true; 215 | else 216 | isBridgeNode[sourceIP] = false; 217 | 218 | delete[] otherDV; 219 | if(updated) 220 | { 221 | updateOF(); 222 | print("DV got updated.", true); 223 | print("current DV is: " + toString()); 224 | } 225 | return updated; 226 | } 227 | 228 | 229 | void Nifty::receiveMessages() 230 | { 231 | int sockfd; 232 | char buffer[BUFFSIZE]; 233 | struct sockaddr_in servaddr, cliaddr; 234 | char adder_buffer[ADDRSIZE]; 235 | 236 | // Creating socket file descriptor 237 | sockfd = socket(AF_INET, SOCK_DGRAM, 0); 238 | if ( sockfd< 0 ) { 239 | perror("socket creation failed"); 240 | exit(EXIT_FAILURE); 241 | } 242 | 243 | memset(&servaddr, 0, sizeof(servaddr)); 244 | memset(&cliaddr, 0, sizeof(cliaddr)); 245 | 246 | // Filling server information 247 | servaddr.sin_family = AF_INET; // IPv4 248 | servaddr.sin_addr.s_addr = INADDR_ANY; 249 | servaddr.sin_port = htons(PORT); 250 | 251 | // Bind the socket with the server address 252 | if ( bind(sockfd, (const struct sockaddr *)&servaddr, 253 | sizeof(servaddr)) < 0 ) 254 | { 255 | perror("bind failed"); 256 | exit(EXIT_FAILURE); 257 | } 258 | 259 | //Keep listening for others messages. 260 | while(true) 261 | { 262 | unsigned int len; 263 | len = sizeof(struct sockaddr_in); 264 | int n = recvfrom(sockfd, (char *)buffer, BUFFSIZE, 265 | MSG_WAITALL, ( struct sockaddr *) &cliaddr, &len); 266 | buffer[n] = '\0'; 267 | 268 | inet_ntop(AF_INET, &(cliaddr.sin_addr), adder_buffer, ADDRSIZE); 269 | string address = adder_buffer; 270 | string msg = buffer; 271 | print("Received a message from: " + address); 272 | print("The message: " + msg); 273 | 274 | //After receiving a message, update DV 275 | updateDV(buffer, adder_buffer); 276 | } 277 | close(sockfd); 278 | } 279 | 280 | 281 | const void Nifty::installRule(string rule) 282 | { 283 | print("Installing rule: " + rule); 284 | system(rule.c_str()); 285 | } 286 | 287 | 288 | /** 289 | * Update the rules in the OVS's OpenFlow table using the data in 290 | * the distance vector table. 291 | * Uses different cookie numbers for different rules (used to 292 | * have a more targeted analysis of the traffic in the system) 293 | * 294 | * ****COOKIES TABLE***** 295 | * 1 => IN_TRAFFIC: DATA SENT TO ME 296 | * 2 => OUT_TRAFFIC: DATA GOING OUT OF ME TO OTHER DESTINATIONS 297 | * 3 => PASSING_TRAFFIC: DATA PASSING THROUGH ME TO OTHER DESTINATIONS 298 | * 4 => CONTROLLER TRAFFIC 299 | * 5 => OTHER? (not used.) 300 | */ 301 | const void Nifty::updateOF() 302 | { 303 | if(updating) 304 | return; 305 | updating = true; 306 | //All tags from 1 - 9 belong to this controller, delete them all. 
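    // Illustrative note (added, not in the original source): for a destination that is only reachable through an intermediate node, the cookie=2 rule installed below rewrites the packet's destination MAC to the intermediate node's MAC, and that node's own cookie=3 rule rewrites it again toward the final destination; this MAC rewriting is what detours traffic around the partial partition.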
307 | installRule("ovs-ofctl del-flows br0 cookie=1/-1" ); 308 | installRule("ovs-ofctl del-flows br0 cookie=2/-1" ); 309 | installRule("ovs-ofctl del-flows br0 cookie=3/-1" ); 310 | installRule("ovs-ofctl del-flows br0 cookie=4/-1" ); 311 | installRule("ovs-ofctl del-flows br0 cookie=5/-1" ); 312 | installRule("ovs-ofctl del-flows br0 cookie=6/-1" ); 313 | string rule; 314 | 315 | installRule("ovs-ofctl add-flow br0 cookie=1,priority=100,action=normal"); 316 | 317 | //Controller flow doesn't need to be forwarded 318 | installRule("ovs-ofctl add-flow br0 cookie=4,priority=5000,ip,nw_proto=17,tp_dst=8080,action=normal"); 319 | 320 | int reach_count = 0; 321 | for (int i = 0; i < destinationsCount; ++i) 322 | { 323 | if (distanceVector[i].cost >= MAX_COST || destinationIps[i] == myIp) //Cannot really reach it, install nothing. 324 | continue; 325 | 326 | string dest_ip = destinationIps[i]; 327 | string dest_mac = destinationMacs[i]; 328 | string through_ip = destinationIps[distanceVector[i].throughID]; 329 | string through_mac = destinationMacs[distanceVector[i].throughID]; 330 | 331 | double cost = distanceVector[i].cost; 332 | if (cost <=1) //Can directly reach 333 | reach_count++; 334 | 335 | //Modify packets passing through me 336 | rule = "ovs-ofctl add-flow br0 cookie=3,priority=500,ip,in_port=1,nw_dst="+dest_ip+",action=mod_dl_dst:"+through_mac+",mod_dl_src:"+myMac+",in_port"; 337 | installRule(rule); 338 | 339 | //Modify packets going out of me. 340 | rule = "ovs-ofctl add-flow br0 cookie=2,priority=500,ip,nw_dst="+dest_ip+",action=mod_dl_dst:"+through_mac+",mod_dl_src:"+myMac+",1"; 341 | installRule(rule); 342 | } 343 | 344 | if(reach_count>=destinationsCount) // Can reach everyone directly 345 | isBridgeNode[myIp] = true; 346 | else 347 | isBridgeNode[myIp] = false; 348 | 349 | updating = false; 350 | } 351 | 352 | 353 | void Nifty::printDV() 354 | { 355 | print("CurrentDV: " + toString()); 356 | } 357 | 358 | 359 | vector Nifty::getBridgeNodes() 360 | { 361 | vector ret; 362 | 363 | for (int i = 0; i < destinationsCount; ++i) 364 | { 365 | if(isBridgeNode[destinationIps[i]]) 366 | ret.push_back(destinationIps[i]); 367 | } 368 | 369 | return ret; 370 | } 371 | -------------------------------------------------------------------------------- /src/nifty.h: -------------------------------------------------------------------------------- 1 | /** 2 | * This is a header file that contains the skeleton for the main class of Nifty 3 | * and the implementation of the Distance Vectory Entry class. 4 | */ 5 | 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | using namespace std; 13 | const double MAX_COST = 1001; // Anything with a cost more than 1000 is unreachable 14 | #define PORT 8080 // Nifty instances use this port to communicate with each other 15 | #define BUFFSIZE 1024 // Maximum size of a single message sent between Nifty instances (in bytes) 16 | #define ADDRSIZE 20 // Size of address in bytes. 17 | 18 | /** 19 | * DistanceVectorEntry is the struct that contains infromation about each entry 20 | * in the distance-vector table (DV). 21 | * The struct contains three information: 22 | * @targetIP: the targetIP this entry concerns. 23 | * @throughID: the ID of the node traffic should be forwarded to next in 24 | * order to reach to the final distination (@targetIP). 25 | * @cost: the cost (number of hops) of reaching the target. 
26 | */ 27 | struct DistanceVectorEntry 28 | { 29 | double cost; // The cost to reach the node (number of hops) 30 | int throughID; // traffic destined to targetIP is forwarded to node with throughID. 31 | std::string targetIP; // The IP of the node 32 | 33 | DistanceVectorEntry(): DistanceVectorEntry(MAX_COST, -1,"") {} 34 | DistanceVectorEntry(double _cost, int _throughID): DistanceVectorEntry(_cost,_throughID,"") {} 35 | DistanceVectorEntry(double _cost, int _throughID, string _targetIP) 36 | { 37 | cost = _cost; 38 | throughID = _throughID; 39 | targetIP = _targetIP; 40 | } 41 | 42 | /** 43 | * return a string representing the entry. 44 | */ 45 | std::string toString() const 46 | { 47 | stringstream ss; 48 | ss< ipToId; //get the id for a specific ip. 88 | unordered_map nodesTimes; //When was the last message received by this node 89 | unordered_map timedOutNodes; //States whether a node had timed out or not. 90 | unordered_map isBridgeNode; //Tells whether a node is a bridge node or not. 91 | bool updating = false; 92 | 93 | 94 | /** 95 | * Construct a message to send to other nodes. The message 96 | * contains details about my current distance vector table. 97 | * 98 | * @targetIP: The IP address of the node I'm sending the message to. 99 | */ 100 | string toString(string targetIP = ""); 101 | 102 | 103 | /** 104 | * Initializes the underlying data structures. Sets the cost of 105 | * reaching others to inf and reaching myself is 0. 106 | */ 107 | void init(); 108 | 109 | 110 | /** 111 | * A helper function to print messages to the console. 112 | * Needed as verbose might be off, and in that case we shouldn't print 113 | * 114 | * @msg: The message to be printed. 115 | * @forcePrint: defaults to false. Whether or not this print should be forced regardless of verbose settings. 116 | */ 117 | void print(string msg, bool forcePrint=false); 118 | 119 | 120 | /** 121 | * Pings other nodes and piggyback my distance vector to the ping message. 122 | * 123 | * @onlyOnce: If set to true send only one heartbeat. Otherwise, keep sending heartbeats every pingingPeriod. 124 | */ 125 | void pingOthers(bool onlyOnce = false); 126 | 127 | 128 | /** 129 | * Potentially update the unerlying distance vector table 130 | * using a @message received from @sourceIP. 131 | */ 132 | bool updateDV(const char* message, const char* sourceIP); 133 | 134 | 135 | /** 136 | * A method to keep receiving messages from other hosts. 137 | */ 138 | void receiveMessages(); 139 | 140 | 141 | /** 142 | * Update the rules in the OVS's OpenFlow table using the data in the distance vector table. 143 | */ 144 | void const updateOF(); 145 | 146 | 147 | /** 148 | * Sets a node as a timoued out one and potentially modify 149 | * other entries in the DV table if they were using 150 | * the node with IP address @ip as an intermediary node. 151 | * 152 | * @ip: The IP address of the node that got timed out. 153 | */ 154 | void nodeTimedOut(string ip); 155 | 156 | 157 | /** 158 | * A helper function that calls the underlying system funciton to install a new OVS OpenFlow rule. 159 | */ 160 | void const installRule(string rule); 161 | 162 | 163 | /** 164 | * Checks if any of the nodes I'm connected to had recently timed out. 165 | */ 166 | void checkTimeOuts(); 167 | 168 | 169 | /** 170 | * Returns a list of the bridge nodes in the system. 
171 | */ 172 | vector getBridgeNodes(); 173 | 174 | 175 | std::thread pingingThread; 176 | std::thread receivingThread; 177 | public: 178 | /** 179 | * Constructor 180 | * 181 | * @_myIp: IP address of this node. 182 | * @_myMac: MAC address of this node. 183 | * @_pingingPeriod: Ping other nodes every @_pingingPeriod number of seconds. 184 | * @_destinationsCount: Number of other nodes in the system. 185 | * @_destinationIps: IP addresses of other nodes in the system. 186 | * @_destinationMacs: MAC addresses of other nodes in the system. 187 | */ 188 | Nifty(std::string _myIp, std::string _myMac, unsigned int _pingingPeriod, unsigned int _destinationsCount, 189 | std::string* _destinationIps, std::string* _destinationMacs, bool _verbose = false); 190 | 191 | /** 192 | * Starts the Nifty process. Start receiving messages and pinging others. 193 | */ 194 | void start(); 195 | 196 | /** 197 | * Function to output the entire distance-vector table 198 | */ 199 | void printDV(); 200 | 201 | /** 202 | * Destructor. Deletes the underlying data structures (distance vector) 203 | */ 204 | ~Nifty(); 205 | }; 206 | -------------------------------------------------------------------------------- /src/partitioner.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * The partitioner.cpp code is used to help test Nifty 3 | * by introducing artificial partitions between the nodes 4 | */ 5 | 6 | #include 7 | 8 | using namespace std; 9 | 10 | string myMac = ""; 11 | 12 | /** 13 | * A helper function that calls the underlying system funciton to 14 | * install a new OVS OpenFlow rule. 15 | * 16 | * @rule: a string with a command to install an OVS rule 17 | */ 18 | void installRule(string rule) 19 | { 20 | printf("Installing rule: %s\n", rule.c_str()); 21 | system(rule.c_str()); 22 | } 23 | 24 | 25 | /** 26 | * A function to output a help message to the user of the code 27 | * showing how to run the code and what parameters to give it 28 | */ 29 | void printUsage() 30 | { 31 | printf("::USAGE::\n"); 32 | printf("./partitioner myMac [path]\n\n"); 33 | printf("If the tool is called with no arguments, it heals the partial partition\n\n"); 34 | 35 | printf("NOTE: The default path is ./parts.conf\n\n"); 36 | printf("parts.conf structure\n"); 37 | printf("Line1 (count) g1MACs\n"); 38 | printf("Line2 (count) g2MACs\n"); 39 | 40 | exit(1); 41 | } 42 | 43 | 44 | /** 45 | * Creates a MAC-based partial parition where g1 cannot reach g2. 46 | * (everyone else can reach both) 47 | * Uses the "cookie" value of 10 (0xa) for the installed OpenFlow rules. 48 | * This helps with healing the partition as we only remove rules with 0xa. 49 | * Make sure no other rules in the OpenFlow table use the cookie 0xa as well. 
50 | * 51 | * @g1: Vector of g1 members MAC addresses 52 | * @g2: Vector of g2 members MAC addresses 53 | */ 54 | void createMACPNP(const vector g1, const vector g2) 55 | { 56 | if(find(g1.begin(), g1.end(), myMac)!=g1.end()) { 57 | // I'm in g1, shouldn't reach any g2 members 58 | for (int i = 0; i < g2.size(); ++i) 59 | installRule("ovs-ofctl add-flow br0 cookie=10,priority=10000,dl_src="+g2[i]+",action=drop"); 60 | } 61 | else if(find(g2.begin(), g2.end(), myMac)!=g2.end()) { 62 | // I'm in g2, shouldn't reach any g3 members 63 | for (int i = 0; i < g1.size(); ++i) 64 | installRule("ovs-ofctl add-flow br0 cookie=10,priority=10000,dl_src="+g1[i]+",action=drop"); 65 | } 66 | //Do nothing otherwise (this applies to nodes that are not affected by the parition) 67 | } 68 | 69 | 70 | /** 71 | * Driver code for the partitioner. 72 | */ 73 | int main(int argc, char** argv) 74 | { 75 | ifstream fin; 76 | // parse the command line arguments 77 | try 78 | { 79 | fin.exceptions(std::ifstream::failbit | std::ifstream::badbit); 80 | if(argc == 3) 81 | fin.open(argv[2]); 82 | else if(argc == 2) 83 | fin.open("./parts.conf"); 84 | else if (argc == 1) 85 | { 86 | // No command line arguments are provided 87 | // heal all current partial network paritions by deleting 88 | // the flow rules with a cookie=10 89 | installRule("ovs-ofctl del-flows br0 cookie=10/-1"); 90 | exit(0); 91 | } 92 | else printUsage(); 93 | 94 | myMac = argv[1]; 95 | } 96 | catch(...) 97 | { 98 | printf("~~~~~~Couldn't open the partitioning file!!\n\n"); 99 | printUsage(); 100 | } 101 | 102 | int n; 103 | // parse the first group addresses from the file 104 | fin>>n; 105 | vector g1; 106 | vector g2; 107 | while(n--) 108 | { 109 | string addr; 110 | fin>>addr; 111 | g1.push_back(addr); 112 | } 113 | 114 | // parse the second group addresses from the file 115 | fin>>n; 116 | while(n--) 117 | { 118 | string addr; 119 | fin>>addr; 120 | g2.push_back(addr); 121 | } 122 | fin.close(); 123 | 124 | 125 | printf("Creating a partial partition between the following group 1 and group 2.\n"); 126 | printf("Members of group 1: \n"); 127 | for (int i = 0; i < g1.size(); ++i) 128 | printf("%s\n", g1[i].c_str()); 129 | 130 | printf("\n~~~~~~~~~~~~~~~\n\n"); 131 | printf("Members of group 2: \n"); 132 | for (int i = 0; i < g2.size(); ++i) 133 | printf("%s\n", g2[i].c_str()); 134 | 135 | createMACPNP(g1,g2); 136 | 137 | return 0; 138 | } 139 | --------------------------------------------------------------------------------