├── LICENSE
├── README.md
├── bin
    ├── generate-graphs.sh
    ├── sbt
    └── watch-and-email.sh
├── build.sbt
├── conf
    └── cluster-sim-env.sh.template
├── project
    └── build.properties
├── src
    ├── main
    │   ├── java
    │   │   └── ClusterSchedulingSimulation
    │   │   │   └── ClusterSimulationProtos.java
    │   ├── protocolbuffers
    │   │   ├── cluster_simulation_protos.proto
    │   │   └── compile_protobufs.sh
    │   ├── python
    │   │   ├── cluster_simulation_protos_pb2.py
    │   │   ├── generate-txt-from-protobuff.py
    │   │   ├── generate-txt-from-protobuff.sh
    │   │   ├── generate-txt-from-protobuffs-in-dir.sh
    │   │   └── graphing-scripts
    │   │   │   ├── README
    │   │   │   ├── comparison-plot-from-protobuff.py
    │   │   │   ├── comparison-plot-from-protobuff.sh
    │   │   │   ├── generate-plots-from-protobuff.py
    │   │   │   └── utils.py
    │   └── scala
    │   │   ├── CoreClusterSimulation.scala
    │   │   ├── ExperimentRunner.scala
    │   │   ├── MesosSimulation.scala
    │   │   ├── MonolithicSimulation.scala
    │   │   ├── OmegaSimulation.scala
    │   │   ├── ParseParm.scala
    │   │   ├── Simulation.scala
    │   │   ├── Util.scala
    │   │   └── Workloads.scala
    └── test
    │   └── scala
    │       └── TestSimulations.scala
└── traces
    ├── README.txt
    ├── example-init-cluster-state.log
    └── job-distribution-traces
        ├── README.txt
        ├── example_csizes_cmb.log
        ├── example_interarrival_cmb.log
        └── example_runtimes_cmb.log


/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2013, Regents of the University of California
 2 | All rights reserved.
 3 | 
 4 | Redistribution and use in source and binary forms, with or without
 5 | modification, are permitted provided that the following conditions are met:
 6 | 
 7 | Redistributions of source code must retain the above copyright notice, this
 8 | list of conditions and the following disclaimer.  Redistributions in binary
 9 | form must reproduce the above copyright notice, this list of conditions and the
10 | following disclaimer in the documentation and/or other materials provided with
11 | the distribution.  Neither the name of the University of California, Berkeley
12 | nor the names of its contributors may be used to endorse or promote products
13 | derived from this software without specific prior written permission.  THIS
14 | SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
15 | EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
16 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Cluster scheduler simulator overview
 2 | 
 3 | This simulator can be used to prototype and compare different cluster scheduling strategies and policies. It generates synthetic cluster workloads from empirical parameter distributions (thus generating unique workloads even from a small amount of input data), simulates their scheduling and execution using a discrete event simulator, and finally permits analysis of scheduling performance metrics.
 4 | 
 5 | The simulator was originally written as part of research on the "Omega" shared-state cluster scheduling architecture at Google. A paper on Omega, published at EuroSys 2013, uses this simulator for the comparative evaluation of Omega and other alternative architectures (referred to as a "lightweight" simulator there) [1]. As such, the simulator's design is somewhat geared towards the comparative evaluation needs of this paper, but it does also permit more general experimentation with:
 6 | 
 7 |  * scheduling policies and logics (i.e. "what machine should a task be bound to?"),
 8 |  * resource models (i.e. "how are machines represented for scheduling, and how are they shared between tasks?"),
 9 |  * shared-cluster scheduling architectures (i.e. "how can multiple independent schedulers be supported for a large shared, multi-framework cluster?").
10 | 
11 | While the simulator will simulate job arrival, scheduler decision making and task placement, it does **not** simulate the actual execution of the tasks or variation in their runtime due to shared resources.
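
To make the discrete-event idea above concrete, here is a minimal sketch in Python. It is not the simulator's Scala implementation: the `EventQueue` class, the `job_arrives` and `job_placed` callbacks, and all timing values are hypothetical and chosen only for illustration. As in the description above, task execution itself is not modelled; a job simply ends once its runtime has elapsed.

```python
import heapq
import itertools

class EventQueue:
    """Minimal discrete-event loop: run callbacks in order of simulated time."""

    def __init__(self):
        self.now = 0.0
        self._heap = []
        # Tie-breaker so events scheduled for the same time run in FIFO order
        # and callbacks are never compared with each other.
        self._counter = itertools.count()

    def schedule(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, next(self._counter), callback))

    def run(self, until):
        while self._heap and self._heap[0][0] <= until:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()

log = []
queue = EventQueue()

def job_arrives(name, think_time, runtime):
    log.append((queue.now, name, "arrived"))
    # The scheduler "thinks" for a while before the job's tasks are placed.
    queue.schedule(think_time, lambda: job_placed(name, runtime))

def job_placed(name, runtime):
    log.append((queue.now, name, "placed"))
    # Execution is not simulated; the job just departs after its runtime.
    queue.schedule(runtime, lambda: log.append((queue.now, name, "finished")))

# Two made-up jobs (all times in seconds).
queue.schedule(1.0, lambda: job_arrives("job-a", think_time=0.5, runtime=10.0))
queue.schedule(2.0, lambda: job_arrives("job-b", think_time=0.1, runtime=3.0))
queue.run(until=100.0)

for time, name, what in log:
    print(f"{time:6.1f}s  {name} {what}")
```

Simulated time only advances when an event is popped from the queue, so long idle periods cost nothing to simulate.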
12 | 
13 | ## Downloading, building, and running
14 | 
15 | The source code for the simulator is available in a Git repository hosted on Google Code. Instructions for downloading can be found at https://code.google.com/p/cluster-scheduler-simulator/source/checkout.
16 | 
17 | The simulator is written in Scala, and requires the Simple Build Tool (`sbt`) to run. A copy of `sbt` is packaged with the source code, but you will need the following prerequisites in order to run the simulator:
18 | 
19 |  * a working JVM (the `openjdk-6-jre` and `openjdk-6-jdk` packages on Ubuntu as of mid-2013),
20 |  * a working installation of Scala (`scala` Ubuntu package),
21 |  * Python 2.6 or above and matplotlib 1.0 or above for generation of graphs (`python-2.7` and `python-matplotlib` Ubuntu packages).
22 | 
23 | Once you have ensured that all of these exist, simply type `bin/sbt run` from the project home directory in order to run the simulator:
24 | 
25 |     $ bin/sbt run
26 |     [...]
27 |     [info] Compiling 9 Scala sources and 1 Java source to ${WORKING_DIR}/target/scala-2.9.1/classes...
28 |     [...]
29 |     [info] Running Simulation 
30 |     
31 |     RUNNING CLUSTER SIMULATOR EXPERIMENTS
32 |     ------------------------
33 |     [...]
34 | 
35 | ### Using command line flags
36 | 
37 | The simulator accepts command-line arguments via configuration flags, such as `--thread-pool-size NUM_THREADS_INT` and `--random-seed SEED_VAL_INT`. To view all options, run:
38 | 
39 |     $ bin/sbt "run --help"
40 | 
41 | Note that when passing command line options to the `sbt run` command you need to include the word `run` and all of the options that follow it within a single set of quotes. `sbt` can also be used via the `sbt` console by simply running `bin/sbt` which will drop you at a prompt. If you are using this `sbt` console option, you do not need to put quotes around the run command and any flags you pass.
42 | 
43 | ### Configuration file
44 | If a file `conf/cluster-sim-env.sh` exists, it will be sourced in the shell before the simulator is run. This was added as a way of setting up the JVM (e.g. heap size) for simulator runs. Check out `conf/cluster-sim-env.sh.template` as a starting point; you will need to uncomment and possibly modify the example configuration value set in that template file (and, of course, you will need to create a copy of the file removing the ".template" suffix).
45 | 
46 | 
47 | ## Configuring experiments
48 | 
49 | The simulation is controlled by the experiments configured in the `src/main/scala/Simulation.scala` setup file. Comments in the file explain how to set up different workloads, workload-to-scheduler mappings and simulated cluster and machine sizes.
50 | 
51 | Most of the workload setup happens in `src/main/scala/Workloads.scala`, so read through that file and make modifications there to have the simulator read from a trace file of your own (see more below about the type of trace files the simulator uses, and the example files included).
52 | 
53 | Workloads in the simulator are generated from *empirical parameter distributions*. These are typically based on cluster *snapshots* (at a point in time) or *traces* (sequences of events over time). We unfortunately cannot provide the full input data used for our experiments with the simulator, but we do provide example input files in the `traces` subdirectory, illustrating the expected data format (further explained in the local README file in `traces`). The following inputs are required:
54 | 
55 |  * **initial cluster state**: when the simulation starts, the simulated cluster obviously cannot start off empty. Instead, we pre-load it with a set of running jobs (and tasks) at this point in time. These jobs start before the beginning of simulation, and may end during the simulation or after. The example file `traces/example-init-cluster-state.log` shows the input format for the jobs in the initial cluster state, as well as the departure events of those of them which end during the simulation. The resource footprints of tasks generated at simulation runtime will also be sampled from the distribution of resource footprints of tasks in the initial cluster state.
56 |  * **job parameters**: the simulator samples three key parameters for each job from empirical distributions (i.e. randomly picks values from a large set):
57 |     1. Job sizes (`traces/job-distribution-traces/example_csizes_cmb.log`): the number of tasks in the generated job. We assume for simplicity that all tasks in a job have the same resource footprint.
58 |     2. Job inter-arrival times (`traces/job-distribution-traces/example_interarrival_cmb.log`): the time in between job arrivals for each workload (in seconds). The value drawn from this distribution indicates how many seconds elapse until another job arrives, i.e. the "gaps" in between jobs.
59 |     3. Job runtimes (`traces/job-distribution-traces/example_runtimes_cmb.log`): total job runtime. For simplicity, we assume that all tasks in a job run for exactly this long (although if a task gets scheduled later, it will also finish later).
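
The sampling described above can be sketched in Python as follows. This is an illustration only, not the simulator's Scala code: the function names `sample_job` and `generate_arrivals` are hypothetical, the short lists stand in for the contents of the three trace files (whose real format is described in `traces/README.txt`), and drawing the three parameters independently of each other is an assumption of this sketch.

```python
import random

def sample_job(rng, sizes, interarrivals, runtimes):
    """Draw one job by picking each parameter from its empirical value set."""
    return {
        "num_tasks": rng.choice(sizes),
        "interarrival": rng.choice(interarrivals),
        "runtime": rng.choice(runtimes),
    }

def generate_arrivals(rng, sizes, interarrivals, runtimes, horizon):
    """Generate jobs until the next arrival would fall past `horizon` seconds."""
    jobs = []
    now = 0.0
    while True:
        job = sample_job(rng, sizes, interarrivals, runtimes)
        now += job["interarrival"]  # the sampled gap is the wait until this job arrives
        if now > horizon:
            break
        job["arrival_time"] = now
        jobs.append(job)
    return jobs

# Made-up values, not taken from the example trace files.
sizes = [1, 1, 2, 10, 100]
interarrivals = [0.5, 1.0, 4.0, 30.0]
runtimes = [60.0, 300.0, 3600.0]

rng = random.Random(42)  # fixed seed so that a run is repeatable
jobs = generate_arrivals(rng, sizes, interarrivals, runtimes, horizon=600.0)
print(len(jobs), "jobs generated")
```

Because values are drawn with replacement, even a small set of observed values yields a different job sequence for every seed.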
60 | 
61 | For further details, see `traces/README.txt` and `traces/job-distribution-traces/README.txt`.
62 | 
63 | **Please note that the resource amounts specified in the example data files, and the example cluster machines configured in `Simulation.scala` do *not* reflect Google configurations. They are made-up numbers, so please do not quote them or try to interpret them!**
64 | 
65 | A possible starting point for generating realistic input data is the public Google cluster trace [2, 3]. It should be straightforward to write scripts that extract the relevant data from the public trace's event logs. Although we do not provide such scripts, it is worth noting that the "cluster C" workload in the EuroSys paper [1] represents the same workload as the public trace. (If you do write scripts for converting the public trace into simulator format, please let us know, and we will happily include them in the simulator code release!)
66 | 
67 | ## Experimental results: post-processing
68 | 
69 | Experimental results are stored in serialized Protocol Buffers in the `experiment_results` directory at the root of the source tree by default: one subdirectory for each experiment, and with a unique name identifying the experimental setup as well as the start time. The schemas for the `.protobuf` files are stored in `src/main/protocolbuffers`.
70 | 
71 | A script for post-processing and graphing experimental results is located in `src/main/python/graphing-scripts`, and `src/main/python` also contains scripts for converting the protobuf-encoded results into ASCII CSV files. See the README file in the `graphing-scripts` directory for detailed explanation.
72 | 
73 | ## NOTES
74 | 
75 | ### Changing and compiling the protocol buffers
76 | 
77 | If you make changes to the protocol buffer file (in `src/main/protocolbuffers`), you will need to recompile it, which will generate updated Java files in `src/main/java`. To do so, you must install the protocol buffer compiler and run `src/main/protocolbuffers/compile_protobufs.sh`, which itself calls `protoc` (which it assumes is on your `PATH`).
78 | 
79 | ### Known issues
80 | 
81 | - The `schedulePartialJobs` option is not used in the current implementation of the `MesosScheduler` class: partial jobs are always scheduled, even if this flag is set to false. Hence the `mesosSimulatorSingleSchedulerZeroResourceJobsTest` currently fails.
82 | 
83 | ## Contributing, Development Status, and Contact Info
84 | 
85 | Please use the Google Code [project issue tracker](https://code.google.com/p/cluster-scheduler-simulator/issues/list) for all bug reports, pull requests and patches, although we are unlikely to be able to respond to feature requests. You can also send any kind of feedback to the developers, [Andy Konwinski](http://andykonwinski.com/) and [Malte Schwarzkopf](http://www.cl.cam.ac.uk/~ms705/).
86 | 
87 | ## References
88 | 
89 | [1] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek and John Wilkes. **[Omega: flexible, scalable schedulers for large compute clusters](http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf)**. In *Proceedings of the 8th European Conference on Computer Systems (EuroSys 2013)*.
90 | 
91 | [2] Charles Reiss, Alexey Tumanov, Gregory Ganger, Randy Katz and Michael Kozuch. **[Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis](http://www.pdl.cmu.edu/PDL-FTP/CloudComputing/googletrace-socc2012.pdf)**. In *Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC 2012)*.
92 | 
93 | [3] Google public cluster workload traces. [https://code.google.com/p/googleclusterdata/](https://code.google.com/p/googleclusterdata/).
94 | 


--------------------------------------------------------------------------------
/bin/generate-graphs.sh:
--------------------------------------------------------------------------------
  1 | #!/bin/bash
  2 | 
  3 | # Copyright (c) 2013, Regents of the University of California
  4 | # All rights reserved.
  5 | 
  6 | # Redistribution and use in source and binary forms, with or without
  7 | # modification, are permitted provided that the following conditions are met:
  8 | 
  9 | # Redistributions of source code must retain the above copyright notice, this
 10 | # list of conditions and the following disclaimer.  Redistributions in binary
 11 | # form must reproduce the above copyright notice, this list of conditions and the
 12 | # following disclaimer in the documentation and/or other materials provided with
 13 | # the distribution.  Neither the name of the University of California, Berkeley
 14 | # nor the names of its contributors may be used to endorse or promote products
 15 | # derived from this software without specific prior written permission.  THIS
 16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 26 | 
 27 | bin_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 28 | CLUSTER_SIM_HOME=$bin_dir/..
 29 | echo graphing scripts directory is $CLUSTER_SIM_HOME/src/main/python/graphing-scripts
 30 | cd $CLUSTER_SIM_HOME/src/main/python/graphing-scripts
 31 | 
 32 | function usage
 33 | {
 34 |   echo "usage: `basename $0` ABS_PATH_TO_INPUT_DIR [--env-set ENV_SET_1 --env-set ENV_SET_2 --png --paper-mode]"
 35 | }
 36 | 
 37 | if [ $# -eq 0 ]; then
 38 |   echo "please provide the input-directory containing your protocol buffer files (ending in .protobuf)"
 39 |   usage
 40 |   exit 1
 41 | fi
 42 | input_dir=$1
 43 | shift
 44 | echo input_dir set as $input_dir
 45 | run_time='86400'
 46 | # do_png should be "" or "png"
 47 | do_png=''
 48 | # 0 = normal, 1 = paper
 49 | modes=0
 50 | # Used to name directories generated in graph directory.
 51 | # Only affects which lines are actually plotted when in paper-mode.
 52 | # Must be all caps.
 53 | env_sets_to_plot=''
 54 | 
 55 | while [ "$1" != "" ]; do
 56 |   case $1 in
 57 |     -e | --env-set | --env_set ) shift; env_sets_to_plot+=" $1"  # take the flag's value, not the flag
 58 |                              ;;
 59 |     -p | --png )             do_png="png"
 60 |                              ;;
 61 |     --paper-mode )           modes+=" 1"
 62 |                              ;;
 63 |     -h | --help )            usage
 64 |                              exit
 65 |                              ;;
 66 |     * )                      usage
 67 |                              exit 1
 68 |   esac
 69 |   shift
 70 | done
 71 | 
 72 | case $input_dir in
 73 |   *vary_Lambda*) vary_dimensions+=lambda;;  # must precede *vary_L*, which also matches it
 74 |   *vary_C*)      vary_dimensions+=c;;
 75 |   *vary_L*)      vary_dimensions+=l;;
 76 |   *)             echo "Input directory name must contain \"vary_[C|L|Lambda]\"."
 77 |                  exit 1
 78 | esac
 79 | 
 80 | # Use a default env_set if none was specified.
 81 | if [ -z "$env_sets_to_plot" ]; then
 82 |   env_sets_to_plot='C'
 83 | fi
 84 | 
 85 | plotting_script='generate-plots-from-protobuff.py'
 86 | 
 87 | # Assumes runtime is the last token of the filename.
 88 | run_time=`echo ${input_dir} | grep '[0-9]\+$' --only-matching`
 89 | 
 90 | function graph_experiment() {
 91 |   if [ -z "$1" ]; then # Is parameter #1 zero length?
 92 |     echo "graph_experiment requires 1 parameter (the protobuff file name)."
 93 |     exit 1
 94 |   fi
 95 |   filename=$1
 96 | 
 97 |   for mode in $modes; do
 98 |     echo mode is ${mode} '(0 = non-paper, 1 = paper)'
 99 |     if [[ ${mode} -eq 1 ]]; then
100 |       out_dir="${input_dir}/graphs/paper"
101 |     else
102 |       out_dir="${input_dir}/graphs"
103 |     fi
104 | 
105 |     # Figure out which simulator type this protobuff came from.
106 |     case $filename in
107 |       *omega-resource-fit-incremental*)         sim=omega-resource-fit-incremental;;
108 |       *omega-resource-fit-all-or-nothing*)      sim=omega-resource-fit-all-or-nothing;;
109 |       *omega-sequence-numbers-incremental*)     sim=omega-sequence-numbers-incremental;;
110 |       *omega-sequence-numbers-all-or-nothing*)  sim=omega-sequence-numbers-all-or-nothing;;
111 |       *monolithic*)                             sim=monolithic;;
112 |       *mesos*)                                  sim=mesos;;
113 |       *)                                        echo "Unknown simulator type, in ${filename} exiting."
114 |                                                 exit 1
115 |     esac
116 | 
117 |     num_service_scheds=`echo ${filename} | \
118 |                         grep --only-matching '[0-9]\+_service' | \
119 |                         grep --only-matching '[0-9]\+'`
120 |     num_batch_scheds=`echo ${filename} | \
121 |                         grep --only-matching '[0-9]\+_batch' | \
122 |                         grep --only-matching '[0-9]\+'`
123 |     echo "Parsed filename for num service (${num_service_scheds}) and" \
124 |          "batch (${num_batch_scheds}) schedulers."
125 |     for vd in ${vary_dimensions}; do
126 |       echo generating graphs for dimension ${vd}.
127 | 
128 |       case $filename in
129 |         *single_path*) pathness=single_path;;
130 |         *multi_path*)  pathness=multi_path;;
131 |         *)             echo "Protobuf filename must contain"         \
132 |                             "[single|multi]_path."
133 |                        exit 1
134 |       esac
135 |       echo Pathness is ${pathness}.
136 | 
137 |       for envs_to_plot in ${env_sets_to_plot}; do
138 |         complete_out_dir=${out_dir}/${sim}/${pathness}/${envs_to_plot}/${num_service_scheds}_service-${num_batch_scheds}_batch
139 |         mkdir -p ${complete_out_dir}
140 |         echo 'PYTHONPATH=$PYTHONPATH:.. '"python ${plotting_script}" \
141 |            "${complete_out_dir}"                                       \
142 |            "${input_dir}/${filename}"                                \
143 |            "${mode} ${vd} ${envs_to_plot} ${do_png}"
144 |         PYTHONPATH=$PYTHONPATH:.. python ${plotting_script}          \
145 |             ${complete_out_dir}                                        \
146 |             ${input_dir}/${filename}                                 \
147 |             ${mode} ${vd} ${envs_to_plot} ${do_png}
148 |         echo -e "\n"
149 |       done
150 |     done
151 |   done
152 | }
153 | 
154 | PROTO_LIST=''
155 | echo capturing: ls $input_dir|grep protobuf
156 | ls $input_dir|grep protobuf
157 | for curr_filename in `ls $input_dir|grep protobuf`; do
158 |   PROTO_LIST+=" $curr_filename"
159 |   echo Calling graph_experiment with $curr_filename
160 |   graph_experiment $curr_filename
161 | done
162 | 
163 | 


--------------------------------------------------------------------------------
/bin/sbt:
--------------------------------------------------------------------------------
 1 | #! /bin/bash
 2 | NOFORMAT="false"
 3 | if [ "$1" == "NOFORMAT" ]; then
 4 |    NOFORMAT="true"
 5 |    shift
 6 | fi
 7 | export CLUSTER_SIM_HOME=$(cd "$(dirname $0)/.."; pwd)
 8 | if [[ -f $CLUSTER_SIM_HOME/conf/cluster-sim-env.sh ]]; then
 9 |   source $CLUSTER_SIM_HOME/conf/cluster-sim-env.sh #Sets up JAVA_OPTS env variable
10 | fi
11 | 
12 | if [[ ! -f $CLUSTER_SIM_HOME/sbt/bin/sbt-launch.jar ]]; then
13 |   wget -P "$CLUSTER_SIM_HOME" "https://github.com/sbt/sbt/releases/download/v0.13.18/sbt-0.13.18.zip"
14 |   unzip -d "$CLUSTER_SIM_HOME" "$CLUSTER_SIM_HOME/sbt-0.13.18.zip"
15 | fi
16 | java -Dsbt.log.noformat=$NOFORMAT $JAVA_OPTS -XX:+UseParallelGC -jar `dirname $0`/../sbt/bin/sbt-launch.jar "$@"
17 | 


--------------------------------------------------------------------------------
/bin/watch-and-email.sh:
--------------------------------------------------------------------------------
 1 | #! /bin/bash
 2 | 
 3 | # Copyright (c) 2013, Regents of the University of California
 4 | # All rights reserved.
 5 | 
 6 | # Redistribution and use in source and binary forms, with or without
 7 | # modification, are permitted provided that the following conditions are met:
 8 | 
 9 | # Redistributions of source code must retain the above copyright notice, this
10 | # list of conditions and the following disclaimer.  Redistributions in binary
11 | # form must reproduce the above copyright notice, this list of conditions and the
12 | # following disclaimer in the documentation and/or other materials provided with
13 | # the distribution.  Neither the name of the University of California, Berkeley
14 | # nor the names of its contributors may be used to endorse or promote products
15 | # derived from this software without specific prior written permission.  THIS
16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
27 | PROCESS_ID=$@
28 | 
29 | echo monitoring process $PROCESS_ID
30 | 
31 | IS_FINISHED=0
32 | 
33 | function test_is_finished() {
34 |   ps -p $PROCESS_ID
35 |   IS_FINISHED=$?
36 | }
37 | 
38 | test_is_finished
39 | 
40 | while [[ $IS_FINISHED -eq 0 ]]; do
41 |   echo Process $PROCESS_ID still running, waiting to send email.
42 |   sleep 10
43 |   test_is_finished
44 | done
45 | 
46 | echo Process $PROCESS_ID done running, sending email.
47 | 
48 | echo "Cluster Simulation experiments with pid $PROCESS_ID just finished running on `hostname`!" | mail -s  "Cluster Simulation pid $PROCESS_ID finished running!" andykonwinski@gmail.com
49 | 


--------------------------------------------------------------------------------
/build.sbt:
--------------------------------------------------------------------------------
 1 | // Copyright (c) 2013, Regents of the University of California
 2 | // All rights reserved.
 3 | //
 4 | // Redistribution and use in source and binary forms, with or without
 5 | // modification, are permitted provided that the following conditions are met:
 6 | //
 7 | // Redistributions of source code must retain the above copyright notice, this
 8 | // list of conditions and the following disclaimer.  Redistributions in binary
 9 | // form must reproduce the above copyright notice, this list of conditions and the
10 | // following disclaimer in the documentation and/or other materials provided with
11 | // the distribution.  Neither the name of the University of California, Berkeley
12 | // nor the names of its contributors may be used to endorse or promote products
13 | // derived from this software without specific prior written permission.  THIS
14 | // SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
15 | // EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
16 | // WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | // DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | // FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | // DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | // SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | // OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 | 
25 | name := "Omega Simulator"
26 | 
27 | version := "0.1"
28 | 
29 | scalaVersion := "2.10.4"
30 | 
31 | organization := "edu.berkeley.cs"
32 | 
33 | mainClass := Some("Simulation")
34 | 
35 | scalacOptions += "-deprecation"
36 | 
37 | // Add a dependency on commons-math for poisson random number generator
38 | libraryDependencies += "org.apache.commons" % "commons-math" % "2.2"
39 | 
40 | libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.5" % "test"
41 | 
42 | libraryDependencies += "com.google.protobuf" % "protobuf-java" % "2.6.1"
43 | 
44 | 


--------------------------------------------------------------------------------
/conf/cluster-sim-env.sh.template:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env bash
 2 | 
 3 | # Copyright (c) 2013, Regents of the University of California
 4 | # All rights reserved.
 5 | 
 6 | # Redistribution and use in source and binary forms, with or without
 7 | # modification, are permitted provided that the following conditions are met:
 8 | 
 9 | # Redistributions of source code must retain the above copyright notice, this
10 | # list of conditions and the following disclaimer.  Redistributions in binary
11 | # form must reproduce the above copyright notice, this list of conditions and the
12 | # following disclaimer in the documentation and/or other materials provided with
13 | # the distribution.  Neither the name of the University of California, Berkeley
14 | # nor the names of its contributors may be used to endorse or promote products
15 | # derived from this software without specific prior written permission.  THIS
16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
27 | # Copy this file to cluster-sim-env.sh and add configuration settings here.
28 | 
29 | # Export environment variables for your site in this file. Some useful
30 | # variables to set are:
31 | 
32 | # JAVA_OPTS, used in the Java command line in bin/sbt. For example, to set
33 | #            the heap size, make this -Xmx500m for a 500meg heap or
34 | #            -Xmx1G for 1gig heap (this example is commented out below).
35 | # export JAVA_OPTS="-Xmx1G"
36 | 


--------------------------------------------------------------------------------
/project/build.properties:
--------------------------------------------------------------------------------
1 | sbt.version=0.13.7
2 | 


--------------------------------------------------------------------------------
/src/main/protocolbuffers/cluster_simulation_protos.proto:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | package ClusterSchedulingSimulation;
 28 | 
 29 | message ExperimentResultSet {
 30 |   repeated ExperimentEnv experiment_env = 1;
 31 | 
 32 |   message ExperimentEnv {
 33 |     optional string cell_name = 1;
 34 |     optional string workload_split_type = 2;
 35 |     optional bool is_prefilled = 5 [default = false];
 36 |     optional double run_time = 3;
 37 |     repeated ExperimentResult experiment_result = 4;
 38 |     // Next field number: 6
 39 | 
 40 |     // There is a 1-1 relationship between an ExperimentResult and a WorkloadDesc.
 41 |     message ExperimentResult {
 42 |       // Track avg resource utilization attributable to tasks actually running.
 43 |       optional double cell_state_avg_cpu_utilization = 4;
 44 |       optional double cell_state_avg_mem_utilization = 5;
 45 |       // Track avg resource utilization attributable to pessimistic locking
 46 |       // while schedulers make their scheduling decisions.
 47 |       optional double cell_state_avg_cpu_locked = 13;
 48 |       optional double cell_state_avg_mem_locked = 14;
 49 |       // Track per-workload level stats for this experiment.
 50 |       repeated WorkloadStats workload_stats = 6;
 51 |       // Workload specific experiment parameters.
 52 |       optional string sweep_workload = 8;
 53 |       optional double avg_job_interarrival_time = 9;
 54 |       // Track per-scheduler level stats for this experiment.
 55 |       repeated SchedulerStats scheduler_stats = 7;
 56 |       // Scheduler specific experiment parameters.
 57 |       repeated SchedulerWorkload sweep_scheduler_workload = 10;
 58 |       optional double constant_think_time = 11;
 59 |       optional double per_task_think_time = 12;
 60 |       // Next field number: 15
 61 | 
 62 |       // Workload-level stats.
 63 |       message WorkloadStats {
 64 |         optional string workload_name = 1;
 65 |         optional int64 num_jobs = 2;
 66 |         optional int64 num_jobs_scheduled = 3;
 67 |         optional double job_think_times_90_percentile = 4;
 68 |         optional double avg_job_queue_times_till_first_scheduled = 5;
 69 |         optional double avg_job_queue_times_till_fully_scheduled = 6;
 70 |         optional double job_queue_time_till_first_scheduled_90_percentile = 7;
 71 |         optional double job_queue_time_till_fully_scheduled_90_percentile = 8;
 72 |         optional double num_scheduling_attempts_90_percentile = 9;
 73 |         optional double num_scheduling_attempts_99_percentile = 10;
 74 |         optional double num_task_scheduling_attempts_90_percentile = 11;
 75 |         optional double num_task_scheduling_attempts_99_percentile = 12;
 76 |       }
 77 | 
 78 |       message SchedulerStats {
 79 |         optional string scheduler_name = 1;
 80 |         optional double useful_busy_time = 3;
 81 |         optional double wasted_busy_time = 4;
 82 |         repeated PerDayStats per_day_stats = 15;
 83 |         repeated PerWorkloadBusyTime per_workload_busy_time = 5;
 84 |         // These are job level transactions
 85 |         // TODO(andyk): rename these to include "job" in the name.
 86 |         optional int64 num_successful_transactions = 6;
 87 |         optional int64 num_failed_transactions = 7;
 88 |         optional int64 num_no_resources_found_scheduling_attempts = 13;
 89 |         optional int64 num_retried_transactions = 11;
 90 |         optional int64 num_jobs_timed_out_scheduling = 16;
 91 |         optional int64 num_successful_task_transactions = 9;
 92 |         optional int64 num_failed_task_transactions = 10;
 93 |         optional bool is_multi_path = 8;
 94 |         // Num jobs in scheduler's job queue when simulation ended.
 95 |         optional int64 num_jobs_left_in_queue = 12;
 96 |         optional int64 failed_find_victim_attempts = 14;
 97 |         // Next field ID:17
 98 | 
 99 |         // Per-day bucketing of important stats to support error bars.
100 |         message PerDayStats {
101 |           optional int64 day_num = 1;
102 |           optional double useful_busy_time = 2;
103 |           optional double wasted_busy_time = 3;
104 |           optional int64 num_successful_transactions = 4;
105 |           optional int64 num_failed_transactions = 5;
106 |         }
107 | 
108 | 
109 |         // Track busy time per scheduler, per workload.
110 |         message PerWorkloadBusyTime {
111 |           optional string workload_name = 1;
112 |           optional double useful_busy_time = 2;
113 |           optional double wasted_busy_time = 3;
114 |         }
115 |       }
116 | 
117 |       // (scheduler, workload) pairs, used to keep track of which
118 |       // such pairs the parameter sweep is applied to in an experiment run.
119 |       message SchedulerWorkload {
120 |         optional string schedulerName = 1;
121 |         optional string workloadName = 2;
122 |       }
123 |     }
124 |   }
125 | }
126 | 
127 | 


--------------------------------------------------------------------------------
/src/main/protocolbuffers/compile_protobufs.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | # Copyright (c) 2013, Regents of the University of California
 4 | # All rights reserved.
 5 | 
 6 | # Redistribution and use in source and binary forms, with or without
 7 | # modification, are permitted provided that the following conditions are met:
 8 | 
 9 | # Redistributions of source code must retain the above copyright notice, this
10 | # list of conditions and the following disclaimer.  Redistributions in binary
11 | # form must reproduce the above copyright notice, this list of conditions and the
12 | # following disclaimer in the documentation and/or other materials provided with
13 | # the distribution.  Neither the name of the University of California, Berkeley
14 | # nor the names of its contributors may be used to endorse or promote products
15 | # derived from this software without specific prior written permission.  THIS
16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
27 | curr_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
28 | cd "$curr_dir"
29 | 
30 | protoc --java_out=../java --python_out=../python ./cluster_simulation_protos.proto
31 | 


--------------------------------------------------------------------------------
/src/main/python/generate-txt-from-protobuff.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/python
  2 | 
  3 | # Copyright (c) 2013, Regents of the University of California
  4 | # All rights reserved.
  5 | 
  6 | # Redistribution and use in source and binary forms, with or without
  7 | # modification, are permitted provided that the following conditions are met:
  8 | 
  9 | # Redistributions of source code must retain the above copyright notice, this
 10 | # list of conditions and the following disclaimer.  Redistributions in binary
 11 | # form must reproduce the above copyright notice, this list of conditions and the
 12 | # following disclaimer in the documentation and/or other materials provided with
 13 | # the distribution.  Neither the name of the University of California, Berkeley
 14 | # nor the names of its contributors may be used to endorse or promote products
 15 | # derived from this software without specific prior written permission.  THIS
 16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 26 | 
 27 | # For each unique (cell_name, scheduler, metric) tuple, where metric
 28 | # is either busy_frac or conf_frac, write a separate text file
 29 | # whose rows contain the following fields:
 30 | #   cell_name
 31 | #   sched_id
 32 | #   c
 33 | #   l
 34 | #   avg_job_interarrival_time
 35 | #   median of the daily busy fractions (or daily conflict fractions)
 36 | #   error bar metric: median absolute deviation of the same daily values
 37 | 
 38 | import sys, os, re
 39 | import logging
 40 | import numpy as np
 41 | from collections import defaultdict
 42 | import cluster_simulation_protos_pb2
 43 | 
 44 | logging.basicConfig(level=logging.DEBUG)
 45 | 
 46 | def usage():
 47 |   print "usage: generate-txt-from-protobuff.py <input_protobuff_name> <optional: base name for output files. (defaults to inputfilename)>"
 48 |   sys.exit(1)
 49 | 
 50 | logging.debug("len(sys.argv): " + str(len(sys.argv)))
 51 | 
 52 | if len(sys.argv) < 2:
 53 |   logging.error("Not enough arguments provided.")
 54 |   usage()
 55 | 
 56 | try:
 57 |   input_protobuff_name = sys.argv[1]
 58 |   # Start optional args.
 59 |   if len(sys.argv) == 3:
 60 |     outfile_name_base = str(sys.argv[2])
 61 |   else:
 62 |     # Default: base the output file names on the input file name.
 63 |     outfile_name_base = input_protobuff_name
 64 |     
 65 | except:
 66 |   usage()
 67 | 
 68 | logging.info("Input file: %s" % input_protobuff_name)
 69 | 
 70 | def get_mad(median, data):
 71 |   logging.info("in get_mad, with median %f, data: %s"
 72 |                % (median, " ".join([str(i) for i in data])))
 73 |   devs = [abs(x - median) for x in data]
 74 |   mad = np.median(devs)
 75 |   logging.info("returning mad = %f" % mad)
 76 |   return mad
 77 | 
 78 | # Read in the ExperimentResultSet.
 79 | experiment_result_set = cluster_simulation_protos_pb2.ExperimentResultSet()
 80 | infile = open(input_protobuff_name, "rb")
 81 | experiment_result_set.ParseFromString(infile.read())
 82 | infile.close()
 83 | 
 84 | # This dictionary, indexed by 3-tuples of strings
 85 | # (cell_name, scheduler_name, metric_name), holds as values strings,
 86 | # each holding all of the rows that will be written to the text file
 87 | # uniquely identified by the dictionary key.
 88 | # After it has been filled, this dictionary is iterated over
 89 | # to create text files holding its contents.
 90 | output_strings = defaultdict(str)
 91 | # Loop through each experiment environment.
 92 | logging.debug("Processing %d experiment envs."
 93 |               % len(experiment_result_set.experiment_env))
 94 | for env in experiment_result_set.experiment_env:
 95 |   logging.debug("Handling experiment env (%s %s)."
 96 |                 % (env.cell_name, env.workload_split_type))
 97 |   logging.debug("Processing %d experiment results."
 98 |                 % len(env.experiment_result))
 99 |   prev_l_val = -1.0
100 |   for exp_result in env.experiment_result:
101 |     logging.debug("Handling experiment result with C = %f and L = %f."
102 |                   % (exp_result.constant_think_time,
103 |                      exp_result.per_task_think_time))
104 |     for sched_stat in exp_result.scheduler_stats:
105 |       logging.debug("Handling scheduler stat for %s."
106 |                     % sched_stat.scheduler_name)
107 |       # Calculate per day busy time and conflict fractions.
108 |       daily_busy_fractions = []
109 |       daily_conflict_fractions = []
110 |       for day_stats in sched_stat.per_day_stats:
111 |         # Calculate the busy fraction (total busy time over the length
112 |         # of the day) for each day; the median of these is taken below.
113 |         run_time_for_day = env.run_time - 86400 * day_stats.day_num
114 |         logging.info("setting run_time_for_day = env.run_time - 86400 * "
115 |                      "day_stats.day_num = %f - 86400 * %d = %f"
116 |                      % (env.run_time, day_stats.day_num, run_time_for_day))
117 |         if run_time_for_day > 0.0:
118 |           daily_busy_fractions.append(((day_stats.useful_busy_time +
119 |                                         day_stats.wasted_busy_time) /
120 |                                        min(86400.0, run_time_for_day)))
121 |           logging.info("%s appending daily_busy_fraction %f."
122 |                        % (sched_stat.scheduler_name, daily_busy_fractions[-1]))
123 | 
124 |           if day_stats.num_successful_transactions > 0:
125 |             conflict_fraction = (float(day_stats.num_failed_transactions) /
126 |                                  float(day_stats.num_failed_transactions +
127 |                                        day_stats.num_successful_transactions))
128 |             daily_conflict_fractions.append(conflict_fraction)
129 |             logging.info("%s appending daily_conflict_fraction %f."
130 |                          % (sched_stat.scheduler_name, conflict_fraction))
131 |           else:
132 |             daily_conflict_fractions.append(0)
133 |             logging.info("appending 0 to daily_conflict_fraction")
134 | 
135 |       logging.info("Done building daily_busy_fractions: %s"
136 |                    % " ".join([str(i) for i in daily_busy_fractions]))
137 |       logging.info("Also done building daily_conflict_fractions: %s"
138 |                    % " ".join([str(i) for i in daily_conflict_fractions]))
139 | 
140 |       if prev_l_val != exp_result.per_task_think_time and prev_l_val != -1.0:
141 |         opt_extra_newline = "\n"
142 |       else:
143 |         opt_extra_newline = ""
144 |       prev_l_val = exp_result.per_task_think_time
145 | 
146 |       # Compute the busy_time row and append it to the string
147 |       # accumulating output rows for this schedulerName.
148 |       daily_busy_fraction_median = np.median(daily_busy_fractions)
149 |       busy_frac_key = (env.cell_name, sched_stat.scheduler_name, "busy_frac")
150 |       output_strings[busy_frac_key] += \
151 |           "%s%s %s %s %s %s %s %s\n" % (opt_extra_newline,
152 |                                       env.cell_name,
153 |                                       sched_stat.scheduler_name,
154 |                                       exp_result.constant_think_time,
155 |                                       exp_result.per_task_think_time,
156 |                                       exp_result.avg_job_interarrival_time,
157 |                                       daily_busy_fraction_median,
158 |                                       get_mad(daily_busy_fraction_median,
159 |                                               daily_busy_fractions))
160 | 
161 |       conflict_fraction_median = np.median(daily_conflict_fractions)
162 |       conf_frac_key = (env.cell_name, sched_stat.scheduler_name, "conf_frac")
163 |       output_strings[conf_frac_key] += \
164 |           "%s%s %s %s %s %s %s %s\n" % (opt_extra_newline,
165 |                                      env.cell_name,
166 |                                      sched_stat.scheduler_name,
167 |                                      exp_result.constant_think_time,
168 |                                      exp_result.per_task_think_time,
169 |                                      exp_result.avg_job_interarrival_time,
170 |                                      conflict_fraction_median,
171 |                                      get_mad(conflict_fraction_median,
172 |                                              daily_conflict_fractions))
173 | 
174 | # Create output files.
175 | # One output file for each unique (cell_name, scheduler_name, metric) tuple.
176 | for key_tuple, out_str in output_strings.iteritems():
177 |   outfile_name = (outfile_name_base +
178 |                   "." + "_".join([str(i) for i in key_tuple]) + ".txt")
179 |   logging.info("Creating output file: %s" % outfile_name)
180 |   outfile = open(outfile_name, "w")
181 |   outfile.write(out_str)
182 |   outfile.close()
183 | 
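
The per-day statistics computed by the script above can be sketched in
isolation. This is a minimal illustration, not the script itself: the function
names are hypothetical, and statistics.median from the standard library stands
in for the np.median call the script uses, so that the sketch has no
third-party dependencies.

```python
from statistics import median

SECONDS_PER_DAY = 86400.0


def daily_busy_fraction(useful_busy_time, wasted_busy_time, run_time, day_num):
    """Busy fraction for one day, or None if the run ended before that day."""
    run_time_for_day = run_time - SECONDS_PER_DAY * day_num
    if run_time_for_day <= 0.0:
        return None
    # The last day may be shorter than 24 hours, so divide by its real length.
    day_length = min(SECONDS_PER_DAY, run_time_for_day)
    return (useful_busy_time + wasted_busy_time) / day_length


def daily_conflict_fraction(num_failed, num_successful):
    """Failed / (failed + successful); 0 when nothing succeeded, as in the script."""
    if num_successful <= 0:
        return 0.0
    return float(num_failed) / float(num_failed + num_successful)


def median_abs_deviation(data):
    """Median of the absolute deviations from the median (the error bar metric)."""
    med = median(data)
    return median([abs(x - med) for x in data])


if __name__ == "__main__":
    # Hypothetical per-day stats for a 2.5 day run:
    # (day_num, useful_busy_time, wasted_busy_time, num_successful, num_failed)
    run_time = 2.5 * SECONDS_PER_DAY
    days = [(0, 4320.0, 0.0, 90, 10),
            (1, 8000.0, 640.0, 80, 20),
            (2, 3000.0, 1320.0, 0, 5)]
    busy = [daily_busy_fraction(u, w, run_time, d) for d, u, w, _, _ in days]
    conflict = [daily_conflict_fraction(f, s) for _, _, _, s, f in days]
    print("busy:     median=%s mad=%s" % (median(busy), median_abs_deviation(busy)))
    print("conflict: median=%s mad=%s" % (median(conflict),
                                          median_abs_deviation(conflict)))
```

The median absolute deviation is less sensitive to a single outlier day than a
standard deviation would be, which is presumably why it serves as the error
bar here.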


--------------------------------------------------------------------------------
/src/main/python/generate-txt-from-protobuff.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | # Copyright (c) 2013, Regents of the University of California
 4 | # All rights reserved.
 5 | 
 6 | # Redistribution and use in source and binary forms, with or without
 7 | # modification, are permitted provided that the following conditions are met:
 8 | 
 9 | # Redistributions of source code must retain the above copyright notice, this
10 | # list of conditions and the following disclaimer.  Redistributions in binary
11 | # form must reproduce the above copyright notice, this list of conditions and the
12 | # following disclaimer in the documentation and/or other materials provided with
13 | # the distribution.  Neither the name of the University of California, Berkeley
14 | # nor the names of its contributors may be used to endorse or promote products
15 | # derived from this software without specific prior written permission.  THIS
16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
27 | curr_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
28 | cd "$curr_dir"
29 | 
30 | PYTHONPATH=$PYTHONPATH:.. python ./generate-txt-from-protobuff.py "$@"
31 | 


--------------------------------------------------------------------------------
/src/main/python/generate-txt-from-protobuffs-in-dir.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | # Copyright (c) 2013, Regents of the University of California
 4 | # All rights reserved.
 5 | 
 6 | # Redistribution and use in source and binary forms, with or without
 7 | # modification, are permitted provided that the following conditions are met:
 8 | 
 9 | # Redistributions of source code must retain the above copyright notice, this
10 | # list of conditions and the following disclaimer.  Redistributions in binary
11 | # form must reproduce the above copyright notice, this list of conditions and the
12 | # following disclaimer in the documentation and/or other materials provided with
13 | # the distribution.  Neither the name of the University of California, Berkeley
14 | # nor the names of its contributors may be used to endorse or promote products
15 | # derived from this software without specific prior written permission.  THIS
16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
27 | curr_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
28 | cd "$curr_dir"
29 | 
30 | dirname=$1
31 | echo dirname is $dirname
32 | 
33 | if [[ ! -d $dirname ]]; then
34 |  echo "Error: accepts a single argument which is dirname containing protobuffs."
35 |  exit 1
36 | fi
37 | 
38 | # TODO(andyk): make naming convention for protobuf(f) consistent.
39 | for i in `ls $dirname/*.protobuf`; do
40 |   echo PYTHONPATH=$PYTHONPATH:.. python ./generate-txt-from-protobuff.py $i 
41 |   PYTHONPATH=$PYTHONPATH:.. python ./generate-txt-from-protobuff.py $i 
42 | done
43 | 
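
The batch step above could also be written in Python. The sketch below is an
assumption-laden equivalent, not part of the repository: find_protobufs and
convert_all are hypothetical helper names, and the interpreter name "python"
and the relative script path are copied from the shell script.

```python
import glob
import os
import subprocess
import sys


def find_protobufs(dirname):
    """Return the sorted paths of all *.protobuf files directly inside dirname."""
    if not os.path.isdir(dirname):
        raise ValueError("not a directory: %s" % dirname)
    return sorted(glob.glob(os.path.join(dirname, "*.protobuf")))


def convert_all(dirname, script="./generate-txt-from-protobuff.py"):
    """Run the conversion script once per file; return the paths whose run failed."""
    env = dict(os.environ)
    # Mirror PYTHONPATH=$PYTHONPATH:.. from the shell script.
    env["PYTHONPATH"] = os.pathsep.join(
        [p for p in (env.get("PYTHONPATH"), "..") if p])
    failed = []
    for path in find_protobufs(dirname):
        if subprocess.call(["python", script, path], env=env) != 0:
            failed.append(path)
    return failed


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(1 if convert_all(sys.argv[1]) else 0)
```

Passing each path as a separate argument list element avoids the word
splitting that affects file names containing spaces.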


--------------------------------------------------------------------------------
/src/main/python/graphing-scripts/README:
--------------------------------------------------------------------------------
1 | Use these scripts to generate graphs of the output of the synthetic simulator.
2 | Before you can do that, you have to run some simulations.
3 | 
4 | Once you have run some simulations, use CLUSTER_SIM_HOME/bin/generate-graphs.sh
5 | to generate graphs based on the output those simulations generated, which
6 | resides in HOME/experiment_results.
7 | 


--------------------------------------------------------------------------------
/src/main/python/graphing-scripts/comparison-plot-from-protobuff.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/python
  2 | 
  3 | # Copyright (c) 2013, Regents of the University of California
  4 | # All rights reserved.
  5 | 
  6 | # Redistribution and use in source and binary forms, with or without
  7 | # modification, are permitted provided that the following conditions are met:
  8 | 
  9 | # Redistributions of source code must retain the above copyright notice, this
 10 | # list of conditions and the following disclaimer.  Redistributions in binary
 11 | # form must reproduce the above copyright notice, this list of conditions and the
 12 | # following disclaimer in the documentation and/or other materials provided with
 13 | # the distribution.  Neither the name of the University of California, Berkeley
 14 | # nor the names of its contributors may be used to endorse or promote products
 15 | # derived from this software without specific prior written permission.  THIS
 16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 26 | 
 27 | # This file generates a set of graphs for a simulator "experiment".
 28 | # An experiment is equivalent to the file generated from the run of a
 29 | # single Experiment object in the simulator (i.e. a parameter sweep for a
 30 | # set of workload_descs), with the added constraint that only one of
 31 | # C, L, or lambda can be varied per a single series (the simulator
 32 | # currently allows ranges to be provided for more than one of these).
 33 | 
 34 | import sys, os, re
 35 | from utils import *
 36 | import numpy as np
 37 | import matplotlib.pyplot as plt
 38 | import math
 39 | import operator
 40 | import logging
 41 | from collections import defaultdict
 42 | 
 43 | import cluster_simulation_protos_pb2
 44 | 
 45 | logging.basicConfig(level=logging.DEBUG, format="%(message)s")
 46 | 
 47 | def usage():
 48 |   print "usage: comparison-plot-from-protobuff.py <output folder> <REMOVED: input_protobuff> " \
 49 |       "<paper_mode: 0|1> <vary_dim: c|l|lambda> <env: any of A,B,C> [png]"
 50 |   sys.exit(1)
 51 | 
 52 | # if len(sys.argv) < 6:
 53 | #   logging.error("Not enough arguments provided.")
 54 | #   usage()
 55 |  
 56 | paper_mode = True
 57 | output_formats = ['pdf']
 58 | # try:
 59 | #   output_prefix = str(sys.argv[1])
 60 | #   input_protobuff = sys.argv[2]
 61 | #   if int(sys.argv[3]) == 1:
 62 | #     paper_mode = True
 63 | #   vary_dim = sys.argv[4]
 64 | #   if vary_dim not in ['c', 'l', 'lambda']:
 65 | #     logging.error("vary_dim must be c, l, or lambda!")
 66 | #     sys.exit(1)
 67 | #   envs_to_plot = sys.argv[5]
 68 | #   if re.search("[^ABC]",envs_to_plot):
 69 | #     logging.error("envs_to_plot must be any combination of a, b, and c, without spaces!")
 70 | #     sys.exit(1)
 71 | #   if len(sys.argv) == 7:
 72 | #     if sys.argv[6] == "png":
 73 | #       output_formats.append('png')
 74 | #     else:
 75 | #       logging.error("The only valid optional 5th argument is 'png'")
 76 | #       sys.exit(1)
 77 | # 
 78 | # except:
 79 | #   usage()
 80 | # 
 81 | # set_leg_fontsize(11)
 82 | 
 83 | # logging.info("Output prefix: %s" % output_prefix)
 84 | # logging.info("Input file: %s" % input_protobuff)
 85 | 
 86 | # google-omega-resfit-allornoth-single_path-vary_l-604800.protobuf
 87 | # google-omega-resfit-inc-single_path-vary_l-604800.protobuf
 88 | # google-omega-seqnum-allornoth-single_path-vary_l-604800.protobuf
 89 | # google-omega-seqnum-inc-single_path-vary_l-604800.protobuf
 90 | 
 91 | envs_to_plot = "C"
 92 | 
 93 | file_dir = '/Users/andyk/omega-7day-simulator-results/'
 94 | output_prefix = file_dir + "/graphs"
 95 | 
 96 | file_names = [("Fine/Gang", "google-omega-resfit-allornoth-single_path-vary_c-604800.protobuf"),  
 97 |               ("Fine/Inc", "google-omega-resfit-inc-single_path-vary_c-604800.protobuf"),
 98 |               ("Coarse/Gang", "google-omega-seqnum-allornoth-single_path-vary_c-604800.protobuf"),
 99 |               ("Course/Inc", "google-omega-seqnum-inc-single_path-vary_c-604800.protobuf")]
100 | 
101 | experiment_result_sets = []
102 | for title_name_tuple in file_names:
103 |   title = title_name_tuple[0]
104 |   file_name = title_name_tuple[1]
105 |   full_name = file_dir + file_name
106 |   # Read in the ExperimentResultSet.
107 |   #experiment_result_sets.append((title, cluster_simulation_protos_pb2.ExperimentResultSet()))
108 |   res_set = cluster_simulation_protos_pb2.ExperimentResultSet()
109 |   experiment_result_sets.append([title, res_set])
110 |   #titles[experiment_result_sets[-1]] = title
111 |   f = open(full_name, "rb")
112 |   res_set.ParseFromString(f.read())
113 |   f.close()
114 | 
115 | 
116 | # ---------------------------------------
117 | # Set up some general graphing variables.
118 | if paper_mode:
119 |   set_paper_rcs()
120 |   fig = plt.figure(figsize=(2,1.33))
121 | else:
122 |   fig = plt.figure()
123 | 
124 | prefilled_colors_web = { 'A': 'b', 'B': 'r', 'C': 'c', "synth": 'y' }
125 | colors_web = { 'A': 'b', 'B': 'r', 'C': 'm', "synth": 'y' }
126 | colors_paper = { 'A': 'b', 'B': 'r', 'C': 'c', "synth": 'b' }
127 | per_wl_colors = { 'OmegaBatch': 'b',
128 |                   'OmegaService': 'r' }
129 | 
130 | title_colors_web = { "Fine/Gang": 'b', "Fine/Inc": 'r', "Coarse/Gang": 'm', "Course/Inc": 'c' }
131 | 
132 | prefilled_linestyles_web = { 'Monolithic': 'D-',
133 |                    'MonolithicApprox': 's-',
134 |                    'MesosBatch': 'D-',
135 |                    'MesosService': 'D:',
136 |                    'MesosBatchApprox': 's-',
137 |                    'MesosServiceApprox': 's:',
138 |                    'OmegaBatch': 'D-',
139 |                    'OmegaService': 'D:',
140 |                    'OmegaBatchApprox': 's-',
141 |                    'OmegaServiceApprox': 's:',
142 |                    'Batch': 'D-',
143 |                    'Service': 'D:' }
144 | 
145 | linestyles_web = { 'Monolithic': 'x-',
146 |                    'MonolithicApprox': 'o-',
147 |                    'MesosBatch': 'x-',
148 |                    'MesosService': 'x:',
149 |                    'MesosBatchApprox': 'o-',
150 |                    'MesosServiceApprox': 'o:',
151 |                    'OmegaBatch': 'x-',
152 |                    'OmegaService': 'x:',
153 |                    'OmegaBatchApprox': 'o-',
154 |                    'OmegaServiceApprox': 'o:',
155 |                    'Batch': 'x-',
156 |                    'Service': 'x:' }
157 | linestyles_paper = { 'Monolithic': '-',
158 |                      'MonolithicApprox': '--',
159 |                      'MesosBatch': '-',
160 |                      'MesosService': ':',
161 |                      'MesosBatchApprox': '--',
162 |                      'MesosServiceApprox': '-.',
163 |                      'OmegaBatch': '-',
164 |                      'OmegaService': ':',
165 |                      'OmegaBatchApprox': '--',
166 |                      'OmegaServiceApprox': '-.',
167 |                      'Batch': '-',
168 |                      'Service': ':' }
169 | 
170 | dashes_paper = { 'Monolithic': (None,None),
171 |                  'MonolithicApprox': (3,3),
172 |                  'MesosBatch': (None,None),
173 |                  'MesosService': (1,1),
174 |                  'MesosBatchApprox': (3,3),
175 |                  'MesosServiceApprox': (4,2),
176 |                  'OmegaBatch': (None,None),
177 |                  'OmegaService': (1,1),
178 |                  'OmegaBatchApprox': (3,3),
179 |                  'OmegaServiceApprox': (4,2),
180 |                  'Batch': (None,None),
181 |                  'Service': (1,1),
182 |                  'Fine/Gang': (1,1),       
183 |                  'Fine/Inc': (3,3), 
184 |                  'Coarse/Gang': (4,2)
185 |                }
186 | 
187 | # Some dictionaries whose values will be dictionaries
188 | # to make 2d dictionaries, which will be indexed by both exp_env 
189 | # and either workload or scheduler name.
190 | # --
191 | # (cellName, assignmentPolicy, workload_name) -> array of data points
192 | # for the parameter sweep done in the experiment.
193 | workload_queue_time_till_first = {}
194 | workload_queue_time_till_fully = {}
195 | workload_queue_time_till_first_90_ptile = {}
196 | workload_queue_time_till_fully_90_ptile = {}
197 | workload_num_jobs_unscheduled = {}
198 | # (cellName, assignmentPolicy, scheduler_name) -> array of data points
199 | # for the parameter sweep done in the experiment.
200 | sched_total_busy_fraction = {}
201 | sched_daily_busy_fraction = {}
202 | sched_daily_busy_fraction_err = {}
203 | # TODO(andyk): Graph retry_busy_fraction on same graph as total_busy_fraction
204 | #              to parallel Malte's graphs.
205 | # sched_retry_busy_fraction = {}
206 | sched_conflict_fraction = {}
207 | sched_daily_conflict_fraction = {}
208 | sched_daily_conflict_fraction_err = {}
209 | sched_task_conflict_fraction = {}
210 | sched_num_retried_transactions = {}
211 | sched_num_jobs_remaining = {}
212 | sched_failed_find_victim_attempts = {}
213 | 
214 | # Convenience wrapper to override __str__()
215 | class ExperimentEnv:
216 |   def __init__(self, init_exp_env):
217 |     self.exp_env = init_exp_env
218 |     self.cell_name = init_exp_env.cell_name
219 |     self.workload_split_type = init_exp_env.workload_split_type
220 |     self.is_prefilled = init_exp_env.is_prefilled
221 |     self.run_time = init_exp_env.run_time
222 | 
223 |   def __str__(self):
224 |     return str("%s, %s" % (self.exp_env.cell_name, self.exp_env.workload_split_type))
225 | 
226 |   # Figure out if we are varying c, l, or lambda in this experiment.
227 |   def vary_dim(self):
228 |     env = self.exp_env # Make a convenient short handle.
229 |     assert(len(env.experiment_result) > 1)
230 |     if (env.experiment_result[0].constant_think_time !=
231 |         env.experiment_result[1].constant_think_time):
232 |       vary_dim = "c"
233 |       # logging.debug("Varying %s. The first two experiments' c values were %d, %d "
234 |       #               % (vary_dim,
235 |       #                  env.experiment_result[0].constant_think_time,
236 |       #                  env.experiment_result[1].constant_think_time))
237 |     elif (env.experiment_result[0].per_task_think_time !=
238 |           env.experiment_result[1].per_task_think_time):
239 |       vary_dim = "l"
240 |       # logging.debug("Varying %s. The first two experiments' l values were %d, %d "
241 |       #               % (vary_dim,
242 |       #                  env.experiment_result[0].per_task_think_time,
243 |       #                  env.experiment_result[1].per_task_think_time))
244 |     else:
245 |       vary_dim = "lambda"
246 |     # logging.debug("Varying %s." % vary_dim)
247 |     return vary_dim
248 |   
249 | class Value:
250 |   def __init__(self, init_x, init_y):
251 |     self.x = init_x
252 |     self.y = init_y
253 |   def __str__(self):
254 |     return str("%f, %f" % (self.x, self.y))
255 | 
256 | def bt_approx(cell_name, sched_name, point, vary_dim_, tt_c, tt_l, runtime):
257 |   logging.debug("sched_name is %s " % sched_name)
258 |   assert(sched_name == "Batch" or sched_name == "Service")
259 |   lbd = {}
260 |   n = {}
261 |   # This function calculates an approximated scheduler busyness line given
262 |   # an average inter-arrival time and job size for each scheduler
263 |   # XXX: configure the below parameters and comment out the following
264 |   # line in order to
265 |   # 1) disable the warning, and
266 |   # 2) get a correct no-conflict approximation.
267 |   print >> sys.stderr, "*********************************************\n" \
268 |       "WARNING: YOU HAVE NOT CONFIGURED THE PARAMETERS IN THE bt_approx\n" \
269 |       "*********************************************\n"
270 |   ################################
271 |   # XXX EDIT BELOW HERE
272 |   # hard-coded SAMPLE params for cluster A
273 |   lbd['A'] = { "Batch": 0.1, "Service": 0.01 } # lambdas (job arrival rates) per scheduler
274 |   n['A'] = { "Batch": 10.0, "Service": 5.0 } # avg num tasks per job
275 |   # hard-coded SAMPLE params for cluster B
276 |   lbd['B'] = { "Batch": 0.1, "Service": 0.01 }
277 |   n['B'] = { "Batch": 10.0, "Service": 5.0 }
278 |   # hard-coded SAMPLE params for cluster C
279 |   lbd['C'] = { "Batch": 0.1, "Service": 0.01 }
280 |   n['C'] = { "Batch": 10.0, "Service": 5.0 }
281 |   ################################
282 | 
283 |   # approximation formula
284 |   if vary_dim_ == 'c':
285 |     # busy_time = num_jobs * (per_job_think_time = C + nL) / runtime
286 |     return runtime * lbd[cell_name][sched_name] *                       \
287 |            ((point + n[cell_name][sched_name] * float(tt_l))) / runtime
288 |   elif vary_dim_ == 'l':
289 |     return runtime * lbd[cell_name][sched_name] *                       \
290 |            ((float(tt_c) + n[cell_name][sched_name] * point)) / runtime
291 | 
292 | def get_mad(median, data):
293 |   #print "in get_mad, with median %f, data: %s" % (median, " ".join([str(i) for i in data]))
294 |   devs = [abs(x - median) for x in data]
295 |   mad = np.median(devs)
296 |   #print "returning mad = %f" % mad
297 |   return mad
298 | 
299 | def sort_labels(handles, labels):
300 |   hl = sorted(zip(handles, labels),
301 |               key=operator.itemgetter(1))
302 |   handles2, labels2 = zip(*hl)
303 |   return (handles2, labels2)
304 | 
305 | for experiment_result_set_arry in experiment_result_sets:
306 |   title = experiment_result_set_arry[0]
307 |   logging.debug("\n\n==========================\nHandling title %s." % title)
308 |   experiment_result_set = experiment_result_set_arry[1]
309 | 
310 |   # Loop through each experiment environment.
311 |   logging.debug("Processing %d experiment envs."
312 |                 % len(experiment_result_set.experiment_env))
313 |   for env in experiment_result_set.experiment_env:
314 |     if not re.search(cell_to_anon(env.cell_name), envs_to_plot):
315 |       logging.debug("  skipping env/cell " + env.cell_name)
316 |       continue
317 |     logging.debug("\n\n\n env: " + env.cell_name)
318 |     exp_env = ExperimentEnv(env) # Wrap the protobuff object to get __str__()
319 |     logging.debug("  Handling experiment env %s." % exp_env)
320 |   
321 |     # Within this environment, loop through each experiment result
322 |     logging.debug("  Processing %d experiment results." % len(env.experiment_result))
323 |     for exp_result in env.experiment_result:
324 |       logging.debug("    Handling experiment with per_task_think_time %f, constant_think_time %f"
325 |             % (exp_result.per_task_think_time, exp_result.constant_think_time))
326 |       # Record the correct x val depending on which dimension is being
327 |       # swept over in this experiment.
328 |       vary_dim = exp_env.vary_dim() # This line is unnecessary since this value
329 |                                     # is a flag passed as an arg to the script.
330 |       if vary_dim == "c":
331 |         x_val = exp_result.constant_think_time
332 |       elif vary_dim == "l":
333 |         x_val = exp_result.per_task_think_time
334 |       else:
335 |         x_val = exp_result.avg_job_interarrival_time
336 |       # logging.debug("Set x_val to %f." % x_val)
337 |   
338 |       # Build results dictionaries of per-scheduler stats.
339 |       for sched_stat in exp_result.scheduler_stats:
340 |         # Per day busy time and conflict fractions.
341 |         daily_busy_fractions = []
342 |         daily_conflict_fractions = []
343 |         daily_conflicts = [] # absolute number of conflicts for each day.
344 |         daily_successes = []
345 |         logging.debug("      handling scheduler %s" % sched_stat.scheduler_name)
346 |         for day_stats in sched_stat.per_day_stats:
347 |           # Calculate the total busy time for each of the days and then
348 |           # take median of all of them.
349 |           run_time_for_day = exp_env.run_time - 86400 * day_stats.day_num
350 |           # logging.debug("setting run_time_for_day = exp_env.run_time - 86400 * "
351 |           #       "day_stats.day_num = %f - 86400 * %d = %f"
352 |           #       % (exp_env.run_time, day_stats.day_num, run_time_for_day))
353 |           if run_time_for_day > 0.0:
354 |             daily_busy_fractions.append(((day_stats.useful_busy_time +
355 |                                           day_stats.wasted_busy_time) /
356 |                                          min(86400.0, run_time_for_day)))
357 |   
358 |             if day_stats.num_successful_transactions > 0:
359 |               conflict_fraction = (float(day_stats.num_failed_transactions) /
360 |                                    float(day_stats.num_successful_transactions))
361 |               daily_conflict_fractions.append(conflict_fraction)
362 |               daily_conflicts.append(float(day_stats.num_failed_transactions))
363 |               daily_successes.append(float(day_stats.num_successful_transactions))
364 |               # logging.debug("appending daily_conflict_fraction %f / %f = %f." 
365 |               #       % (float(day_stats.num_failed_transactions),
366 |               #          float(day_stats.num_successful_transactions),
367 |               #          conflict_fraction))
368 |             else:
369 |               daily_conflict_fractions.append(0)
370 | 
371 |         # Daily busy time median.
372 |         daily_busy_time_med = np.median(daily_busy_fractions)
373 |         logging.debug("      Daily_busy_fractions, med: %f, vals: %s"
374 |               % (daily_busy_time_med,
375 |                  " ".join([str(i) for i in daily_busy_fractions])))
376 |         value = Value(x_val, daily_busy_time_med)
377 |         append_or_create_2d(sched_daily_busy_fraction,
378 |                             title,
379 |                             sched_stat.scheduler_name,
380 |                             value)
381 |         #logging.debug("sched_daily_busy_fraction[%s %s].append(%s)."
382 |         #              % (exp_env, sched_stat.scheduler_name, value))
383 |         # Error Bar (MAD) for daily busy time.
384 |         value = Value(x_val, get_mad(daily_busy_time_med,
385 |                                      daily_busy_fractions))
386 |         append_or_create_2d(sched_daily_busy_fraction_err,
387 |                             title,
388 |                             sched_stat.scheduler_name,
389 |                             value)
390 |         #logging.debug("sched_daily_busy_fraction_err[%s %s].append(%s)."
391 |         #              % (exp_env, sched_stat.scheduler_name, value))
392 |         # Daily conflict fraction median.
393 |         daily_conflict_fraction_med = np.median(daily_conflict_fractions)
394 |         logging.debug("      Daily_abs_num_conflicts, med: %f, vals: %s"
395 |               % (np.median(daily_conflicts),
396 |                  " ".join([str(i) for i in daily_conflicts])))
397 |         logging.debug("      Daily_num_successful_transactions, med: %f, vals: %s"
398 |               % (np.median(daily_successes),
399 |                  " ".join([str(i) for i in daily_successes])))
400 |         logging.debug("      Daily_conflict_fractions, med : %f, vals: %s\n      --"
401 |               % (daily_conflict_fraction_med,
402 |                  " ".join([str(i) for i in daily_conflict_fractions])))
403 |         value = Value(x_val, daily_conflict_fraction_med)
404 |         append_or_create_2d(sched_daily_conflict_fraction,
405 |                             title,
406 |                             sched_stat.scheduler_name,
407 |                             value)
408 |         # logging.debug("sched_daily_conflict_fraction[%s %s].append(%s)."
409 |         #               % (exp_env, sched_stat.scheduler_name, value))
410 |         # Error Bar (MAD) for daily conflict fraction.
411 |         value = Value(x_val, get_mad(daily_conflict_fraction_med,
412 |                                      daily_conflict_fractions))
413 |         append_or_create_2d(sched_daily_conflict_fraction_err,
414 |                             title,
415 |                             sched_stat.scheduler_name,
416 |                             value)
417 |         
418 | 
419 | def plot_2d_data_set_dict(data_set_2d_dict,
420 |                           plot_title,
421 |                           filename_suffix,
422 |                           y_label,
423 |                           y_axis_type,
424 |                           error_bars_data_set_2d_dict = None):
425 |   assert(y_axis_type == "0-to-1" or
426 |          y_axis_type == "ms-to-day" or 
427 |          y_axis_type == "abs")
428 |   plt.clf()
429 |   ax = fig.add_subplot(111)
430 |   for title, name_to_val_map in data_set_2d_dict.iteritems():
431 |     for wl_or_sched_name, values in name_to_val_map.iteritems():
432 |       line_label = title
433 |       # Hacky: chop MonolithicBatch, MesosBatch, MonolithicService, etc.
434 |       # down to "Batch" and "Service" if in paper mode.
435 |       updated_wl_or_sched_name = wl_or_sched_name
436 |       if paper_mode and re.search("Batch", wl_or_sched_name):
437 |         updated_wl_or_sched_name = "Batch"
438 |       if paper_mode and re.search("Service", wl_or_sched_name):
439 |         updated_wl_or_sched_name = "Service"
440 | 
441 |       # Don't show lines for batch schedulers.
442 |       if updated_wl_or_sched_name == "Batch":
443 |         logging.debug("Skipping a line for a batch scheduler")
444 |         continue
445 |       x_vals = [value.x for value in values]
446 |       # Rewrite zero's for the y_axis_types that will be log.
447 |       y_vals = [0.00001 if (value.y == 0 and y_axis_type == "ms-to-day")
448 |                         else value.y for value in values]
449 |       logging.debug("Plotting line for %s %s %s." %
450 |                     (title, updated_wl_or_sched_name, plot_title))
451 |       #logging.debug("x vals: " + " ".join([str(i) for i in x_vals]))
452 |       #logging.debug("y vals: " + " ".join([str(i) for i in y_vals]))
453 |       logging.debug("wl_or_sched_name: " + wl_or_sched_name)
454 |       logging.debug("title: " + title)
455 | 
456 |       ax.plot(x_vals, y_vals,
457 |               dashes=dashes_paper[wl_or_sched_name],
458 |               color=title_colors_web[title],
459 |               label=line_label, markersize=4,
460 |               mec=title_colors_web[title])
461 | 
462 |   setup_graph_details(ax, plot_title, filename_suffix, y_label, y_axis_type)
463 | 
464 | def setup_graph_details(ax, plot_title, filename_suffix, y_label, y_axis_type):
465 |   assert(y_axis_type == "0-to-1" or
466 |          y_axis_type == "ms-to-day" or 
467 |          y_axis_type == "abs")
468 | 
469 |   # Plot title (omitted in paper mode).
470 |   if not paper_mode:
471 |     plt.title(plot_title)
472 | 
473 |   if paper_mode:
474 |     try:
475 |       # Set up the legend, for removing the border if in paper mode.
476 |       handles, labels = ax.get_legend_handles_labels()
477 |       handles2, labels2 = sort_labels(handles, labels)
478 |       leg = plt.legend(handles2, labels2, loc=2, labelspacing=0)
479 |       fr = leg.get_frame()
480 |       fr.set_linewidth(0)
481 |     except Exception:
482 |       print "Failed to remove frame around legend, legend probably is empty."
483 | 
484 |   # Axis labels.
485 |   if not paper_mode:
486 |     ax.set_ylabel(y_label)
487 |     if vary_dim == "c":
488 |       ax.set_xlabel(u'Scheduler 1 constant processing time [sec]')
489 |     elif vary_dim == "l":
490 |       ax.set_xlabel(u'Scheduler 1 per-task processing time [sec]')
491 |     elif vary_dim == "lambda":
492 |       ax.set_xlabel(u'Job arrival rate to scheduler 1, $\lambda_1$')
493 | 
494 |   # x-axis scale, limit, tics and tic labels.
495 |   ax.set_xscale('log')
496 |   ax.set_autoscalex_on(False)
497 |   if vary_dim == 'c':
498 |     plt.xlim(xmin=0.01)
499 |     plt.xticks((0.01, 0.1, 1, 10, 100), ('10ms', '0.1s', '1s', '10s', '100s'))
500 |   elif vary_dim == 'l':
501 |     plt.xlim(xmin=0.001, xmax=1)
502 |     plt.xticks((0.001, 0.01, 0.1, 1), ('1ms', '10ms', '0.1s', '1s'))
503 |   elif vary_dim == 'lambda':
504 |     plt.xlim([0.1, 100])
505 |     plt.xticks((0.1, 1, 10, 100), ('0.1s', '1s', '10s', '100s'))
506 | 
507 |   # y-axis limit, tics and tic labels.
508 |   if y_axis_type == "0-to-1":
509 |     logging.debug("Setting up y-axis for '0-to-1' style graph.")
510 |     plt.ylim([0, 1])
511 |     plt.yticks((0, 0.2, 0.4, 0.6, 0.8, 1.0),
512 |                ('0.0', '0.2', '0.4', '0.6', '0.8', '1.0'))
513 |   elif y_axis_type == "ms-to-day":
514 |     logging.debug("Setting up y-axis for 'ms-to-day' style graph.")
515 |     #ax.set_yscale('symlog', linthreshy=0.001)
516 |     ax.set_yscale('log')
517 |     plt.ylim(ymin=0.01, ymax=24*3600)
518 |     plt.yticks((0.01, 1, 60, 3600, 24*3600), ('10ms', '1s', '1m', '1h', '1d'))
519 |   elif y_axis_type == "abs":
520 |     plt.ylim(ymin=0)
521 |     logging.debug("Setting up y-axis for 'abs' style graph.")
522 |     #plt.yticks((0.01, 1, 60, 3600, 24*3600), ('10ms', '1s', '1m', '1h', '1d'))
523 |   else:
524 |     logging.error('y_axis_type parameter must be either "0-to-1"'
525 |                   ', "ms-to-day", or "abs".')
526 |     sys.exit(1)
527 | 
528 |   final_filename = (output_prefix +
529 |                    ('/sisi-vary-%s-vs-' % vary_dim) +
530 |                    filename_suffix)
531 |   logging.debug("Writing plot to %s", final_filename)
532 |   writeout(final_filename, output_formats)
533 | 
534 | 
535 | #SCHEDULER DAILY BUSY AND CONFLICT FRACTION MEDIANS
536 | plot_2d_data_set_dict(sched_daily_busy_fraction,
537 |                       "Scheduler processing time vs. median(daily busy time fraction)",
538 |                       "daily-busy-fraction-med",
539 |                       u'Median(daily busy time fraction)',
540 |                       "0-to-1")
541 | 
542 | plot_2d_data_set_dict(sched_daily_conflict_fraction,
543 |                       "Scheduler processing time vs. median(daily conflict fraction)",
544 |                       "daily-conflict-fraction-med",
545 |                       u'Median(daily conflict fraction)',
546 |                       "0-to-1")
547 | 
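
The per-day aggregation in the script above (busy fraction per simulated day, then the median across days with a median absolute deviation as the error bar) can be sketched in isolation. This is a minimal illustration, not the script's own code: it assumes plain tuples in place of the protobuf `per_day_stats` records, uses the standard-library `statistics.median` instead of `np.median`, and the names `daily_busy_fractions` and `median_abs_deviation` are hypothetical. It is written for Python 3, whereas the script itself targets Python 2.

```python
from statistics import median

SECONDS_PER_DAY = 86400.0


def daily_busy_fractions(run_time, per_day_stats):
    """Return one busy fraction per simulated day.

    per_day_stats is assumed to be a list of
    (day_num, useful_busy_time, wasted_busy_time) tuples, all in seconds.
    """
    fractions = []
    for day_num, useful, wasted in per_day_stats:
        # Seconds of the run that remain at the start of this day.
        remaining = run_time - SECONDS_PER_DAY * day_num
        if remaining > 0.0:
            # The last day may be partial, so cap the denominator at one day.
            fractions.append((useful + wasted) / min(SECONDS_PER_DAY, remaining))
    return fractions


def median_abs_deviation(center, data):
    """Median of the absolute deviations of data from center."""
    return median(abs(x - center) for x in data)


# A 2.5-day run: two full days followed by one half day.
stats = [(0, 4320.0, 0.0), (1, 8000.0, 640.0), (2, 8640.0, 0.0)]
fractions = daily_busy_fractions(2.5 * SECONDS_PER_DAY, stats)
center = median(fractions)
spread = median_abs_deviation(center, fractions)
```

The median and MAD are robust to a single unusual day, which is presumably why the script prefers them to a mean and standard deviation for the error bars.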


--------------------------------------------------------------------------------
/src/main/python/graphing-scripts/comparison-plot-from-protobuff.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | # Copyright (c) 2013, Regents of the University of California
 4 | # All rights reserved.
 5 | 
 6 | # Redistribution and use in source and binary forms, with or without
 7 | # modification, are permitted provided that the following conditions are met:
 8 | 
 9 | # Redistributions of source code must retain the above copyright notice, this
10 | # list of conditions and the following disclaimer.  Redistributions in binary
11 | # form must reproduce the above copyright notice, this list of conditions and the
12 | # following disclaimer in the documentation and/or other materials provided with
13 | # the distribution.  Neither the name of the University of California, Berkeley
14 | # nor the names of its contributors may be used to endorse or promote products
15 | # derived from this software without specific prior written permission.  THIS
16 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
17 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
27 | curr_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 28 | cd "$curr_dir"
 29 | 
 30 | PYTHONPATH=$PYTHONPATH:.. python ./comparison-plot-from-protobuff.py "$@"
31 | 


--------------------------------------------------------------------------------
/src/main/python/graphing-scripts/utils.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) 2013, Regents of the University of California
 2 | # All rights reserved.
 3 | 
 4 | # Redistribution and use in source and binary forms, with or without
 5 | # modification, are permitted provided that the following conditions are met:
 6 | 
 7 | # Redistributions of source code must retain the above copyright notice, this
 8 | # list of conditions and the following disclaimer.  Redistributions in binary
 9 | # form must reproduce the above copyright notice, this list of conditions and the
10 | # following disclaimer in the documentation and/or other materials provided with
11 | # the distribution.  Neither the name of the University of California, Berkeley
12 | # nor the names of its contributors may be used to endorse or promote products
13 | # derived from this software without specific prior written permission.  THIS
14 | # SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
15 | # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
16 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 | 
25 | from matplotlib import use, rc
26 | use('Agg')
27 | import matplotlib.pyplot as plt
28 | 
29 | # plot saving utility function
30 | def writeout(filename_base, formats = ['pdf']):
31 |   for fmt in formats:
32 |     plt.savefig("%s.%s" % (filename_base, fmt), format=fmt, bbox_inches='tight')
33 | #    plt.savefig("%s.%s" % (filename_base, fmt), format=fmt)
34 | 
35 | def set_leg_fontsize(size):
36 |   rc('legend', fontsize=size)
37 | 
38 | def set_paper_rcs():
39 |   rc('font',**{'family':'sans-serif','sans-serif':['Helvetica'],
40 |                'serif':['Helvetica'],'size':8})
41 |   rc('text', usetex=True)
42 |   rc('legend', fontsize=7)
43 |   rc('figure', figsize=(3.33,2.22))
44 | #  rc('figure.subplot', left=0.10, top=0.90, bottom=0.12, right=0.95)
45 |   rc('axes', linewidth=0.5)
46 |   rc('lines', linewidth=0.5)
47 | 
48 | def set_rcs():
49 |   rc('font',**{'family':'sans-serif','sans-serif':['Helvetica'],
50 |                'serif':['Times'],'size':12})
51 |   rc('text', usetex=True)
52 |   rc('legend', fontsize=7)
53 |   rc('figure', figsize=(6,4))
54 |   rc('figure.subplot', left=0.10, top=0.90, bottom=0.12, right=0.95)
55 |   rc('axes', linewidth=0.5)
56 |   rc('lines', linewidth=0.5)
57 | 
58 | def append_or_create(d, i, e):
 59 |   if i not in d:
60 |     d[i] = [e]
61 |   else:
62 |     d[i].append(e)
63 | 
64 | # Append e to the array at position (i,k).
65 | # d - a dictionary of dictionaries of arrays, essentially a 2d dictionary.
66 | # i, k - essentially a 2 element tuple to use as the key into this 2d dict.
67 | # e - the value to add to the array indexed by key (i,k).
68 | def append_or_create_2d(d, i, k, e):
 69 |   if i not in d:
70 |     d[i] = {k : [e]}
71 |   elif k not in d[i]: 
72 |     d[i][k] = [e]
73 |   else:
74 |     d[i][k].append(e)
75 | 
76 | def cell_to_anon(cell):
77 |   if cell == 'A':
78 |     return 'A'
79 |   elif cell == 'B':
80 |     return 'B'
81 |   elif cell == 'C':
82 |     return 'C'
83 |   else:
84 |     return 'SYNTH'
85 | 
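
The "2d dictionary" that `append_or_create_2d` in utils.py maintains (outer key, inner key, list of values) can also be expressed with nested `collections.defaultdict` objects. The sketch below is an alternative shown for comparison only, not what the repository does; `make_2d_dict` and the sample keys and values are hypothetical.

```python
from collections import defaultdict


def make_2d_dict():
    """Create a dict of dicts of lists whose missing entries appear on demand."""
    return defaultdict(lambda: defaultdict(list))


# Keyed by (title, scheduler name) in the same way as sched_daily_busy_fraction.
series = make_2d_dict()
series["Omega"]["OmegaService"].append((0.1, 0.25))
series["Omega"]["OmegaService"].append((1.0, 0.50))
series["Mesos"]["MesosService"].append((0.1, 0.30))

# Convert to plain dicts before handing the data to code that tests membership.
plain = {title: dict(by_name) for title, by_name in series.items()}
```

One trade-off to keep in mind: merely reading a missing key from a `defaultdict` creates it, so the explicit helper in utils.py is the safer choice when later code relies on key membership checks.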


--------------------------------------------------------------------------------
/src/main/scala/ExperimentRunner.scala:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | package ClusterSchedulingSimulation
 28 | 
 29 | import scala.collection.mutable.HashMap
 30 | import scala.collection.mutable.ListBuffer
 31 | import ClusterSimulationProtos._
 32 | import java.io._
 33 | 
 34 | /**
 35 |  * An experiment represents a series of runs of a simulator,
  36 |  * across ranges of parameters. Exactly one of {L, C, Lambda}
 37 |  * can be swept over per experiment, i.e. only one of
 38 |  * avgJobInterarrivalTimeRange, constantThinkTimeRange, and
 39 |  * perTaskThinkTimeRange can have size greater than one in a
 40 |  * single Experiment instance.
 41 |  */
 42 | class Experiment(
 43 |     name: String,
 44 |     // Workloads setup.
 45 |     workloadToSweepOver: String,
 46 |     avgJobInterarrivalTimeRange: Option[Seq[Double]] = None,
 47 |     workloadDescs: Seq[WorkloadDesc],
 48 |     // Schedulers setup.
 49 |     schedulerWorkloadsToSweepOver: Map[String, Seq[String]],
 50 |     constantThinkTimeRange: Seq[Double],
 51 |     perTaskThinkTimeRange: Seq[Double],
 52 |     blackListPercentRange: Seq[Double],
 53 |     // Workload -> scheduler mapping setup.
 54 |     schedulerWorkloadMap: Map[String, Seq[String]],
 55 |     // Simulator setup.
 56 |     simulatorDesc: ClusterSimulatorDesc,
 57 |     logging: Boolean = false,
 58 |     outputDirectory: String = "experiment_results",
 59 |     // Map from workloadName -> max % of cellState this prefill workload
 60 |     // can account for. Any prefill workload generator with workloadName
 61 |     // that is not contained in any of these maps will have no prefill
 62 |     // generated for this experiment, and any with name that is in multiple
 63 |     // of these maps will use the first limit that actually kicks in.
 64 |     prefillCpuLimits: Map[String, Double] = Map(),
 65 |     prefillMemLimits: Map[String, Double] = Map(),
 66 |     // Default simulations to 10 minute timeout.
 67 |     simulationTimeout: Double = 60.0*10.0) extends Runnable {
 68 |   prefillCpuLimits.values.foreach(l => assert(l >= 0.0 && l <= 1.0))
 69 |   prefillMemLimits.values.foreach(l => assert(l >= 0.0 && l <= 1.0))
 70 | 
 71 |   var parametersSweepingOver = 0
 72 |   avgJobInterarrivalTimeRange.foreach{opt: Seq[Double] => {
 73 |     if (opt.length > 1) {
 74 |       parametersSweepingOver += 1
 75 |     }
 76 |   }}
 77 |   if (constantThinkTimeRange.length > 1) {parametersSweepingOver += 1}
 78 |   if (perTaskThinkTimeRange.length > 1) {parametersSweepingOver += 1}
 79 |   // assert(parametersSweepingOver <= 1)
 80 | 
 81 |   override
 82 |   def toString = name
 83 | 
 84 |   def run() {
 85 |     // Create the output directory if it doesn't exist.
 86 |     (new File(outputDirectory)).mkdirs()
 87 |     val output =
 88 |         new java.io.FileOutputStream("%s/%s-%.0f.protobuf"
 89 |                                      .format(outputDirectory,
 90 |                                              name,
 91 |                                              simulatorDesc.runTime))
 92 | 
 93 |     val experimentResultSet = ExperimentResultSet.newBuilder()
 94 | 
 95 |     // Parameter sweep over workloadDescs
 96 |     workloadDescs.foreach(workloadDesc => {
 97 |       println("\nSet workloadDesc = %s %s"
 98 |               .format(workloadDesc.cell, workloadDesc.assignmentPolicy))
 99 | 
100 |       // Save Experiment level stats into protobuf results.
101 |       val experimentEnv = ExperimentResultSet.ExperimentEnv.newBuilder()
102 |       experimentEnv.setCellName(workloadDesc.cell)
103 |       experimentEnv.setWorkloadSplitType(workloadDesc.assignmentPolicy)
104 |       experimentEnv.setIsPrefilled(
105 |           workloadDesc.prefillWorkloadGenerators.length > 0)
106 |       experimentEnv.setRunTime(simulatorDesc.runTime)
107 | 
108 |       // Generate preFill workloads. The simulator doesn't modify
109 |       // these workloads like it does the workloads that are played during
110 |       // the simulation.
111 |       var prefillWorkloads = List[Workload]()
112 |       workloadDesc.prefillWorkloadGenerators
113 |                   .filter(wlGen => {
114 |                     prefillCpuLimits.contains(wlGen.workloadName) ||
115 |                     prefillMemLimits.contains(wlGen.workloadName)
116 |                   }).foreach(wlGen => {
117 |         val cpusMaxOpt = prefillCpuLimits.get(wlGen.workloadName).map(i => {
118 |             i * workloadDesc.cellStateDesc.numMachines *
119 |             workloadDesc.cellStateDesc.cpusPerMachine
120 |         })
121 |                       
122 |         val memMaxOpt = prefillMemLimits.get(wlGen.workloadName).map(i => {
123 |           i * workloadDesc.cellStateDesc.numMachines *
124 |           workloadDesc.cellStateDesc.memPerMachine
125 |         })
126 |         println(("Creating a new prefill workload from " +
127 |                 "%s with maxCPU %s and maxMem %s")
128 |                 .format(wlGen.workloadName, cpusMaxOpt, memMaxOpt))
129 |         val newWorkload = wlGen.newWorkload(simulatorDesc.runTime,
130 |                                             maxCpus = cpusMaxOpt,
131 |                                             maxMem = memMaxOpt)
132 |         for(job <- newWorkload.getJobs) {
133 |           assert(job.submitted == 0.0)
134 |         }
135 |         prefillWorkloads ::= newWorkload
136 |       })
137 | 
138 |       // Parameter sweep over lambda.
139 |       // If we have a range for lambda, loop over it, else
140 |       // we just loop over a list holding a single element: None 
141 |       val jobInterarrivalRange = avgJobInterarrivalTimeRange match {
142 |         case Some(paramsRange) => paramsRange.map(Some(_))
143 |         case None => List(None)
144 |       }
145 | 
146 |       println("\nSet up avgJobInterarrivalTimeRange: %s\n"
147 |               .format(jobInterarrivalRange))
148 |       jobInterarrivalRange.foreach(avgJobInterarrivalTime => {
149 |         if (avgJobInterarrivalTime.isEmpty) {
 150 |           println("Since we're not in a lambda sweep, not overwriting lambda.")
151 |         } else {
152 |           println("Curr avgJobInterarrivalTime: %s\n"
153 |                   .format(avgJobInterarrivalTime))
154 |         }
155 | 
156 |         // Set up a list of workloads
157 |         var commonWorkloadSet = ListBuffer[Workload]()
158 |         var newAvgJobInterarrivalTime: Option[Double] = None
159 |         workloadDesc.workloadGenerators.foreach(workloadGenerator => {
160 |           if (workloadToSweepOver.equals(
161 |               workloadGenerator.workloadName)) {
 162 |             // Only update the workload interarrival time if this is the
 163 |             // workload we are supposed to sweep over. If this is not a
 164 |             // lambda parameter sweep then newAvgJobInterarrivalTime
 165 |             // will remain None after this line is executed.
166 |             newAvgJobInterarrivalTime = avgJobInterarrivalTime          
167 |           }
168 |           println("Generating new Workload %s for window %f seconds long."
169 |                   .format(workloadGenerator.workloadName, simulatorDesc.runTime))
170 |           val newWorkload =
171 |               workloadGenerator
172 |               .newWorkload(timeWindow = simulatorDesc.runTime,
173 |                            updatedAvgJobInterarrivalTime = newAvgJobInterarrivalTime)
174 |           commonWorkloadSet.append(newWorkload)
175 |         })
176 | 
177 |         // Parameter sweep over L.
178 |         perTaskThinkTimeRange.foreach(perTaskThinkTime => {
179 |           println("\nSet perTaskThinkTime = %f".format(perTaskThinkTime))
180 | 
181 |           // Parameter sweep over C.
182 |           constantThinkTimeRange.foreach(constantThinkTime => {
183 |             println("\nSet constantThinkTime = %f".format(constantThinkTime))
184 | 
185 |             // Parameter sweep over BlackListPercent (of cellstate).
186 |             blackListPercentRange.foreach(blackListPercent => {
187 |               println("\nSet blackListPercent = %f".format(blackListPercent))
188 | 
189 |               // Make a copy of the workloads that this run of the simulator
190 |               // will modify by using them to track statistics.
191 |               val workloads = ListBuffer[Workload]()
192 |               commonWorkloadSet.foreach(workload => {
193 |                 workloads.append(workload.copy)
194 |               })
195 |               // Set up and run the simulator.
196 |               val simulator =
197 |                   simulatorDesc.newSimulator(constantThinkTime,
198 |                                              perTaskThinkTime,
199 |                                              blackListPercent,
200 |                                              schedulerWorkloadsToSweepOver,
201 |                                              schedulerWorkloadMap,
202 |                                              workloadDesc.cellStateDesc,
203 |                                              workloads,
204 |                                              prefillWorkloads,
205 |                                              logging)
206 | 
207 |               println("Running simulation with run().")
208 |               val success: Boolean = simulator.run(Some(simulatorDesc.runTime),
209 |                                                    Some(simulationTimeout))
210 |               if (success) {
211 |                 // Simulation did not time out, so record stats.
212 |                 /**
213 |                  * Capture statistics into a protocolbuffer.
214 |                  */
215 |                 val experimentResult =
216 |                     ExperimentResultSet.ExperimentEnv.ExperimentResult.newBuilder()
217 | 
218 |                 experimentResult.setCellStateAvgCpuUtilization(
219 |                     simulator.avgCpuUtilization / simulator.cellState.totalCpus)
220 |                 experimentResult.setCellStateAvgMemUtilization(
221 |                     simulator.avgMemUtilization / simulator.cellState.totalMem)
222 | 
223 |                 experimentResult.setCellStateAvgCpuLocked(
224 |                     simulator.avgCpuLocked / simulator.cellState.totalCpus)
225 |                 experimentResult.setCellStateAvgMemLocked(
226 |                     simulator.avgMemLocked / simulator.cellState.totalMem)
227 | 
228 |                 // Save repeated stats about workloads.
229 |                 workloads.foreach(workload => {
230 |                   val workloadStats = ExperimentResultSet.
231 |                                       ExperimentEnv.
232 |                                       ExperimentResult.
233 |                                       WorkloadStats.newBuilder()
234 |                   workloadStats.setWorkloadName(workload.name)
235 |                   workloadStats.setNumJobs(workload.numJobs)
236 |                   workloadStats.setNumJobsScheduled(
237 |                       workload.getJobs.filter(_.numSchedulingAttempts > 0).length)
238 |                   // Think-time, queue-time, and scheduling-attempt statistics.
239 |                   workloadStats.setJobThinkTimes90Percentile(
240 |                       workload.jobUsefulThinkTimesPercentile(0.9))
241 |                   workloadStats.setAvgJobQueueTimesTillFirstScheduled(
242 |                       workload.avgJobQueueTimeTillFirstScheduled)
243 |                   workloadStats.setAvgJobQueueTimesTillFullyScheduled(
244 |                       workload.avgJobQueueTimeTillFullyScheduled)
245 |                   workloadStats.setJobQueueTimeTillFirstScheduled90Percentile(
246 |                       workload.jobQueueTimeTillFirstScheduledPercentile(0.9))
247 |                   workloadStats.setJobQueueTimeTillFullyScheduled90Percentile(
248 |                       workload.jobQueueTimeTillFullyScheduledPercentile(0.9))
249 |                   workloadStats.setNumSchedulingAttempts90Percentile(
250 |                       workload.numSchedulingAttemptsPercentile(0.9))
251 |                   workloadStats.setNumSchedulingAttempts99Percentile(
252 |                       workload.numSchedulingAttemptsPercentile(0.99))
253 |                   workloadStats.setNumTaskSchedulingAttempts90Percentile(
254 |                       workload.numTaskSchedulingAttemptsPercentile(0.9))
255 |                   workloadStats.setNumTaskSchedulingAttempts99Percentile(
256 |                       workload.numTaskSchedulingAttemptsPercentile(0.99))
257 | 
258 |                   experimentResult.addWorkloadStats(workloadStats)
259 |                 })
260 |                 // Record workload specific details about the parameter sweeps.
261 |                 experimentResult.setSweepWorkload(workloadToSweepOver)
262 |                 experimentResult.setAvgJobInterarrivalTime(
263 |                     avgJobInterarrivalTime.getOrElse(
264 |                         workloads.filter(_.name == workloadToSweepOver)
265 |                                  .head.avgJobInterarrivalTime))
266 | 
267 |                 // Save repeated stats about schedulers.
268 |                 simulator.schedulers.values.foreach(scheduler => {
269 |                   val schedulerStats =
270 |                       ExperimentResultSet.
271 |                       ExperimentEnv.
272 |                       ExperimentResult.
273 |                       SchedulerStats.newBuilder()
274 |                   schedulerStats.setSchedulerName(scheduler.name)
275 |                   schedulerStats.setUsefulBusyTime(
276 |                       scheduler.totalUsefulTimeScheduling)
277 |                   schedulerStats.setWastedBusyTime(
278 |                       scheduler.totalWastedTimeScheduling)
279 |                   // Per scheduler metrics bucketed by day.
280 |                   // Use floor since days are zero-indexed. For example, if the
281 |                   // simulator only runs for 1/2 day, we should only have one
282 |                   // bucket (day 0), so our range should be 0 to 0. In this example
283 |                   // we would get floor(runTime / 86400) = floor(0.5) = 0.
284 |                   val daysRan = math.floor(simulatorDesc.runTime/86400.0).toInt
285 |                   println("Computing daily stats for days 0 through %d."
286 |                           .format(daysRan))
287 |                   (0 to daysRan).foreach {
288 |                       day: Int => { 
289 |                     val perDayStats =
290 |                         ExperimentResultSet.
291 |                         ExperimentEnv.
292 |                         ExperimentResult.
293 |                         SchedulerStats.
294 |                         PerDayStats.newBuilder()
295 |                     perDayStats.setDayNum(day)
296 |                     // Busy and wasted time bucketed by day.
297 |                     perDayStats.setUsefulBusyTime(
298 |                         scheduler.dailyUsefulTimeScheduling.getOrElse(day, 0.0))
299 |                     println(("Writing dailyUsefulScheduling(day = %d) = %f for " +
300 |                             "scheduler %s")
301 |                             .format(day,
302 |                                     scheduler
303 |                                       .dailyUsefulTimeScheduling
304 |                                       .getOrElse(day, 0.0),
305 |                                     scheduler.name))
306 |                     perDayStats.setWastedBusyTime(
307 |                         scheduler.dailyWastedTimeScheduling.getOrElse(day, 0.0))
308 |                     // Counters bucketed by day.
309 |                     perDayStats.setNumSuccessfulTransactions(
310 |                         scheduler.dailySuccessTransactions.getOrElse[Int](day, 0))
311 |                     perDayStats.setNumFailedTransactions(
312 |                         scheduler.dailyFailedTransactions.getOrElse[Int](day, 0))
313 | 
314 |                     schedulerStats.addPerDayStats(perDayStats)
315 |                   }}
316 | 
317 |                   assert(scheduler.perWorkloadUsefulTimeScheduling.size ==
318 |                          scheduler.perWorkloadWastedTimeScheduling.size,
319 |                          "the maps held by Scheduler to track per workload " + 
320 |                          "useful and wasted time should be the same size " +
321 |                          "(Scheduler.addJob() should ensure this).")
322 |                   scheduler.perWorkloadUsefulTimeScheduling.foreach{
323 |                       case (workloadName, workloadUsefulBusyTime) => {
324 |                     val perWorkloadBusyTime =
325 |                         ExperimentResultSet.
326 |                         ExperimentEnv.
327 |                         ExperimentResult.
328 |                         SchedulerStats.
329 |                         PerWorkloadBusyTime.newBuilder()
330 |                     perWorkloadBusyTime.setWorkloadName(workloadName)
331 |                     perWorkloadBusyTime.setUsefulBusyTime(workloadUsefulBusyTime)
332 |                     perWorkloadBusyTime.setWastedBusyTime(
333 |                         scheduler.perWorkloadWastedTimeScheduling(workloadName))
334 | 
335 |                     schedulerStats.addPerWorkloadBusyTime(perWorkloadBusyTime)
336 |                   }}
337 |                   // Counts of sched-level job transaction successes, failures,
338 |                   // and retries.
339 |                   schedulerStats.setNumSuccessfulTransactions(
340 |                       scheduler.numSuccessfulTransactions)
341 |                   schedulerStats.setNumFailedTransactions(
342 |                       scheduler.numFailedTransactions)
343 |                   schedulerStats.setNumNoResourcesFoundSchedulingAttempts(
344 |                       scheduler.numNoResourcesFoundSchedulingAttempts)
345 |                   schedulerStats.setNumRetriedTransactions(
346 |                       scheduler.numRetriedTransactions)
347 |                   schedulerStats.setNumJobsTimedOutScheduling(
348 |                       scheduler.numJobsTimedOutScheduling)
349 |                   // Counts of task transaction successes and failures.
350 |                   schedulerStats.setNumSuccessfulTaskTransactions(
351 |                       scheduler.numSuccessfulTaskTransactions)
352 |                   schedulerStats.setNumFailedTaskTransactions(
353 |                       scheduler.numFailedTaskTransactions)
354 | 
355 |                   schedulerStats.setIsMultiPath(scheduler.isMultiPath)
356 |                   schedulerStats.setNumJobsLeftInQueue(scheduler.jobQueueSize)
357 |                   schedulerStats.setFailedFindVictimAttempts(
358 |                       scheduler.failedFindVictimAttempts)
359 | 
360 |                   experimentResult.addSchedulerStats(schedulerStats)
361 |                 })
362 |                 // Record scheduler specific details about the parameter sweeps.
363 |                 schedulerWorkloadsToSweepOver
364 |                     .foreach{case (schedName, workloadNames) => {
365 |                   workloadNames.foreach(workloadName => {
366 |                     val schedulerWorkload =
367 |                         ExperimentResultSet.
368 |                         ExperimentEnv.
369 |                         ExperimentResult.
370 |                         SchedulerWorkload.newBuilder()
371 |                     schedulerWorkload.setSchedulerName(schedName)
372 |                     schedulerWorkload.setWorkloadName(workloadName)
373 |                     experimentResult.addSweepSchedulerWorkload(schedulerWorkload)
374 |                   })
375 |                 }}
376 | 
377 |                 experimentResult.setConstantThinkTime(constantThinkTime)
378 |                 experimentResult.setPerTaskThinkTime(perTaskThinkTime)
379 | 
380 |                 // Save our results as a protocol buffer.
381 |                 experimentEnv.addExperimentResult(experimentResult.build())
382 | 
383 | 
384 |                 /**
385 |                  * TODO(andyk): Once protocol buffer support is finished,
386 |                  *              remove this.
387 |                  */
388 | 
389 |                 // Create a sorted list of schedulers and workloads to compute
390 |                 // a lot of the stats below, so that we can be sure
391 |                 // which column is which when we print the stats.
392 |                 val sortedSchedulers = simulator
393 |                     .schedulers.values.toList.sortWith(_.name < _.name)
394 |                 val sortedWorkloads = workloads.toList.sortWith(_.name < _.name)
395 | 
396 |                 // Sorted names of workloads.
397 |                 val workloadNames = sortedWorkloads.map(_.name).mkString(" ")
398 | 
399 |                 // Count the jobs in each workload.
400 |                 val numJobs = sortedWorkloads.map(_.numJobs).mkString(" ")
401 | 
402 |                 // Count the jobs in each workload that were actually scheduled.
403 |                 val numJobsScheduled = sortedWorkloads.map(workload => {
404 |                   workload.getJobs.filter(_.numSchedulingAttempts > 0).length
405 |                 }).mkString(" ")
406 | 
407 |                 // Sorted names of Schedulers.
408 |                 val schedNames = sortedSchedulers.map(_.name).mkString(" ") 
409 | 
410 |                 // Collect per scheduler counts of successful, failed, and
411 |                 // retried transactions.
412 |                 val schedSuccessfulTransactions = sortedSchedulers.map(sched => {
413 |                   sched.numSuccessfulTransactions
414 |                 }).mkString(" ")
415 |                 val schedFailedTransactions = sortedSchedulers.map(sched => {
416 |                   sched.numFailedTransactions
417 |                 }).mkString(" ")
418 |                 val schedNoResorucesFoundSchedAttempt = sortedSchedulers.map(sched => {
419 |                   sched.numNoResourcesFoundSchedulingAttempts
420 |                 }).mkString(" ")
421 |                 val schedRetriedTransactions = sortedSchedulers.map(sched => {
422 |                   sched.numRetriedTransactions
423 |                 }).mkString(" ")
424 | 
425 |                 // Calculate per scheduler task transaction and conflict rates
426 |                 val schedSuccessfulTaskTransactions = sortedSchedulers.map(sched => {
427 |                   sched.numSuccessfulTaskTransactions
428 |                 }).mkString(" ")
429 |                 val schedFailedTaskTransactions = sortedSchedulers.map(sched => {
430 |                   sched.numFailedTaskTransactions
431 |                 }).mkString(" ")
432 | 
433 |                 val schedNumJobsTimedOutScheduling = sortedSchedulers.map(sched => {
434 |                   sched.numJobsTimedOutScheduling
435 |                 }).mkString(" ")
436 | 
437 |                 // Calculate per scheduler aggregate (useful + wasted) busy time.
438 |                 val schedBusyTimes = sortedSchedulers.map(sched => {
439 |                   println(("calculating busy time for sched %s as " + 
440 |                           "(%f + %f) / %f = %f.")
441 |                           .format(sched.name,
442 |                                   sched.totalUsefulTimeScheduling,
443 |                                   sched.totalWastedTimeScheduling,
444 |                                   simulator.currentTime,
445 |                                   (sched.totalUsefulTimeScheduling +
446 |                                    sched.totalWastedTimeScheduling) /
447 |                                   simulator.currentTime))
448 |                   (sched.totalUsefulTimeScheduling +
449 |                    sched.totalWastedTimeScheduling) / simulator.currentTime
450 |                 }).mkString(" ")
451 | 
452 |                 // Calculate per scheduler useful busy time.
453 |                 val schedUsefulBusyTimes = sortedSchedulers.map(sched => {
454 |                   sched.totalUsefulTimeScheduling / simulator.currentTime
455 |                 }).mkString(" ")
456 | 
457 |                 // Calculate per scheduler wasted busy time.
458 |                 val schedWastedBusyTimes = sortedSchedulers.map(sched => {
459 |                   sched.totalWastedTimeScheduling / simulator.currentTime
460 |                 }).mkString(" ")
461 | 
462 |                 // Calculate per-scheduler per-workload useful + wasted busy time.
463 |                 val perWorkloadSchedBusyTimes = sortedSchedulers.map(sched => {
464 |                   // Sort by workload name.
465 |                   val sortedSchedulingTimes =
466 |                     sched.perWorkloadUsefulTimeScheduling.toList.sortWith(_._1<_._1)
467 |                   sortedSchedulingTimes.map(nameTimePair => {
468 |                     (nameTimePair._2 +
469 |                      sched.perWorkloadWastedTimeScheduling(nameTimePair._1)) /
470 |                     simulator.currentTime
471 |                   }).mkString(" ")
472 |                 }).mkString(" ")
473 | 
474 |                 // Calculate 90%tile per-workload time-scheduling for
475 |                 // scheduled jobs.
476 |                 // sortedWorkloads is a List[Workload].
477 |                 // Workload.jobs is a ListBuffer[Job].
478 |                 val jobThinkTimes90Percentile = sortedWorkloads.map(workload => {
479 |                   workload.jobUsefulThinkTimesPercentile(0.9)
480 |                 }).mkString(" ")
481 | 
482 |                 // Calculate the average time jobs spent in the scheduler's queue
483 |                 // before their first task was scheduled.
484 |                 val avgJobQueueTimesTillFirstScheduled = sortedWorkloads.map(workload => {
485 |                   workload.avgJobQueueTimeTillFirstScheduled
486 |                 }).mkString(" ")
487 | 
488 |                 // Calculate the average time jobs spent in the scheduler's queue
489 |                 // before their final task was scheduled.
490 |                 val avgJobQueueTimesTillFullyScheduled = sortedWorkloads.map(workload => {
491 |                   workload.avgJobQueueTimeTillFullyScheduled
492 |                 }).mkString(" ")
493 | 
494 |                 // Calculate the 90%tile per-workload jobQueueTime*-s for
495 |                 // scheduled jobs.
496 |                 val jobQueueTimeTillFirstScheduled90Percentile =
497 |                     sortedWorkloads.map(workload => {
498 |                   workload.jobQueueTimeTillFirstScheduledPercentile(0.9)
499 |                 }).mkString(" ")
500 | 
501 |                 val jobQueueTimeTillFullyScheduled90Percentile =
502 |                     sortedWorkloads.map(workload => {
503 |                   workload.jobQueueTimeTillFullyScheduledPercentile(0.9)
504 |                 }).mkString(" ")
505 | 
506 |                 val numSchedulingAttempts90Percentile =
507 |                     sortedWorkloads.map(workload => {
508 |                   workload.numSchedulingAttemptsPercentile(0.9)
509 |                 }).mkString(" ")
510 | 
511 |                 val numSchedulingAttempts99Percentile =
512 |                     sortedWorkloads.map(workload => {
513 |                   workload.numSchedulingAttemptsPercentile(0.99)
514 |                 }).mkString(" ")
515 | 
516 |                 val numSchedulingAttemptsMax =
517 |                     sortedWorkloads.map(workload => {
518 |                   workload.getJobs.map(_.numSchedulingAttempts).max
519 |                 }).mkString(" ")
520 | 
521 |                 val numTaskSchedulingAttempts90Percentile =
522 |                     sortedWorkloads.map(workload => {
523 |                   workload.numTaskSchedulingAttemptsPercentile(0.9)
524 |                 }).mkString(" ")
525 | 
526 |                 val numTaskSchedulingAttempts99Percentile =
527 |                     sortedWorkloads.map(workload => {
528 |                   workload.numTaskSchedulingAttemptsPercentile(0.99)
529 |                 }).mkString(" ")
530 | 
531 |                 val numTaskSchedulingAttemptsMax =
532 |                     sortedWorkloads.map(workload => {
533 |                   workload.getJobs.map(_.numTaskSchedulingAttempts).max
534 |                 }).mkString(" ")
535 | 
536 |                 // Per-scheduler stats.
537 |                 val schedulerIsMultiPaths = sortedSchedulers.map(sched => {
538 |                   if (sched.isMultiPath) "1"
539 |                   else "0"
540 |                 }).mkString(" ")
541 |                 val schedulerJobQueueSizes =
542 |                     sortedSchedulers.map(_.jobQueueSize).mkString(" ")
543 | 
544 |                 val prettyLine = ("cell: %s \n" +
545 |                                   "assignment policy: %s \n" +
546 |                                   "runtime: %f \n" +
547 |                                   "avg cpu util: %f \n" +
548 |                                   "avg mem util: %f \n" +
549 |                                   "num workloads %d \n" +
550 |                                   "workload names: %s \n" +
551 |                                   "numjobs: %s \n" + 
552 |                                   "num jobs scheduled: %s \n" + 
553 |                                   "perWorkloadSchedBusyTimes: %s \n" +
554 |                                   "jobThinkTimes90Percentile: %s \n" +
555 |                                   "avgJobQueueTimesTillFirstScheduled: %s \n" +
556 |                                   "avgJobQueueTimesTillFullyScheduled: %s \n" +
557 |                                   "jobQueueTimeTillFirstScheduled90Percentile: %s \n" +
558 |                                   "jobQueueTimeTillFullyScheduled90Percentile: %s \n" +
559 |                                   "numSchedulingAttempts90Percentile: %s \n" +
560 |                                   "numSchedulingAttempts99Percentile: %s \n" +
561 |                                   "numSchedulingAttemptsMax: %s \n" +
562 |                                   "numTaskSchedulingAttempts90Percentile: %s \n" +
563 |                                   "numTaskSchedulingAttempts99Percentile: %s \n" +
564 |                                   "numTaskSchedulingAttemptsMax: %s \n" +
565 |                                   "simulator.schedulers.size: %d \n" +
566 |                                   "schedNames: %s \n" +
567 |                                   "schedBusyTimes: %s \n" +
568 |                                   "schedUsefulBusyTimes: %s \n" +
569 |                                   "schedWastedBusyTimes: %s \n" +
570 |                                   "schedSuccessfulTransactions: %s \n" +
571 |                                   "schedFailedTransactions: %s \n" +     
572 |                                   "schedNoResorucesFoundSchedAttempt: %s \n" +
573 |                                   "schedRetriedTransactions: %s \n" +
574 |                                   "schedSuccessfulTaskTransactions: %s \n" +
575 |                                   "schedFailedTaskTransactions: %s \n" +     
576 |                                   "schedNumJobsTimedOutScheduling: %s \n" +     
577 |                                   "schedulerIsMultiPaths: %s \n" +
578 |                                   "schedulerNumJobsLeftInQueue: %s \n" +
579 |                                   "workloadToSweepOver: %s \n" +
580 |                                   "avgJobInterarrivalTime: %f \n" +
581 |                                   "constantThinkTime: %f \n" + 
582 |                                   "perTaskThinkTime %f").format(
583 |                     workloadDesc.cell,                                  // %s
584 |                     workloadDesc.assignmentPolicy,                      // %s
585 |                     simulatorDesc.runTime,                              // %f
586 |                     simulator.avgCpuUtilization /
587 |                         simulator.cellState.totalCpus,                  // %f
588 |                     simulator.avgMemUtilization /
589 |                         simulator.cellState.totalMem,                   // %f
590 |                     workloads.length,                                   // %d
591 |                     workloadNames,                                      // %s
592 |                     numJobs,                                            // %s
593 |                     numJobsScheduled,                                   // %s
594 |                     perWorkloadSchedBusyTimes,                          // %s
595 |                     jobThinkTimes90Percentile,                          // %s
596 |                     avgJobQueueTimesTillFirstScheduled,                 // %s
597 |                     avgJobQueueTimesTillFullyScheduled,                 // %s
598 |                     jobQueueTimeTillFirstScheduled90Percentile,         // %s
599 |                     jobQueueTimeTillFullyScheduled90Percentile,         // %s
600 |                     numSchedulingAttempts90Percentile,                  // %s
601 |                     numSchedulingAttempts99Percentile,                  // %s
602 |                     numSchedulingAttemptsMax,                           // %s
603 |                     numTaskSchedulingAttempts90Percentile,              // %s
604 |                     numTaskSchedulingAttempts99Percentile,              // %s
605 |                     numTaskSchedulingAttemptsMax,                       // %s
606 |                     simulator.schedulers.size,                          // %d
607 |                     schedNames,                                         // %s
608 |                     schedBusyTimes,                                     // %s
609 |                     schedUsefulBusyTimes,                               // %s
610 |                     schedWastedBusyTimes,                               // %s
611 |                     schedSuccessfulTransactions,                        // %s
612 |                     schedFailedTransactions,                            // %s
613 |                     schedNoResorucesFoundSchedAttempt,                  // %s
614 |                     schedRetriedTransactions,                           // %s
615 |                     schedSuccessfulTaskTransactions,                    // %s
616 |                     schedFailedTaskTransactions,                        // %s
617 |                     schedNumJobsTimedOutScheduling,                     // %s
618 |                     schedulerIsMultiPaths,                              // %s
619 |                     schedulerJobQueueSizes,                             // %s
620 |                     workloadToSweepOver,                                // %s
621 |                     avgJobInterarrivalTime.getOrElse(
622 |                         workloads.filter(_.name == workloadToSweepOver) // %f
623 |                                  .head.avgJobInterarrivalTime),
624 |                     constantThinkTime,                                  // %f
625 |                     perTaskThinkTime)                                   // %f
626 | 
627 |                 println(prettyLine + "\n")
628 |               } else { // if (success)
629 |                 println("Simulation timed out.")
630 |               }
631 |             }) // blackListPercent
632 |           }) // C
633 |         }) // L
634 |       }) // lambda
635 |       experimentResultSet.addExperimentEnv(experimentEnv)
636 |     }) // WorkloadDescs
637 |     experimentResultSet.build().writeTo(output)
638 |     output.close()
639 |   }
640 | }
641 | 
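
Two idioms in ExperimentRunner.scala above are easy to misread: the optional lambda sweep, which wraps each value in Some and falls back to a single None, and the floor-based, zero-indexed day bucketing. The following is a minimal standalone sketch of both. The names SweepSketch, sweepValues, and dayBuckets are hypothetical and do not appear in the repository; the logic mirrors only what the listing above shows.

```scala
// Hypothetical helper object for illustration; not part of the simulator.
object SweepSketch {
  // Wrap each sweep value in Some. With no range, return a single None so
  // that a foreach over the result still runs once and leaves the
  // workload's default interarrival time untouched.
  def sweepValues(range: Option[Seq[Double]]): Seq[Option[Double]] =
    range match {
      case Some(values) => values.map(Some(_))
      case None => Seq(None)
    }

  // Zero-indexed day buckets covered by a run of runTimeSecs seconds,
  // computed as 0 to floor(runTimeSecs / 86400), as in the listing above.
  def dayBuckets(runTimeSecs: Double): Range =
    0 to math.floor(runTimeSecs / 86400.0).toInt

  def main(args: Array[String]): Unit = {
    println(sweepValues(Some(Seq(1.0, 2.0))))
    println(sweepValues(None))
    println(dayBuckets(43200.0).toList)
  }
}
```

Under this formula, a run of exactly 86400 seconds yields buckets 0 and 1, so the final bucket can be empty when the run time is a whole number of days.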


--------------------------------------------------------------------------------
/src/main/scala/MesosSimulation.scala:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | package ClusterSchedulingSimulation
 28 | 
 29 | import collection.mutable.HashMap
 30 | import collection.mutable.ListBuffer
 31 | 
 32 | class MesosSimulatorDesc(
 33 |     schedulerDescs: Seq[MesosSchedulerDesc],
 34 |     runTime: Double,
 35 |     val allocatorConstantThinkTime: Double)
 36 |    extends ClusterSimulatorDesc(runTime){
 37 |   override
 38 |   def newSimulator(constantThinkTime: Double,
 39 |                    perTaskThinkTime: Double,
 40 |                    blackListPercent: Double,
 41 |                    schedulerWorkloadsToSweepOver: Map[String, Seq[String]],
 42 |                    workloadToSchedulerMap: Map[String, Seq[String]],
 43 |                    cellStateDesc: CellStateDesc,
 44 |                    workloads: Seq[Workload],
 45 |                    prefillWorkloads: Seq[Workload],
 46 |                    logging: Boolean = false): ClusterSimulator = {
 47 |     var schedulers = HashMap[String, MesosScheduler]()
 48 |     // Create schedulers according to experiment parameters.
 49 |     schedulerDescs.foreach(schedDesc => {
 50 |       // If any of the scheduler-workload pairs we're sweeping over
 51 |       // are for this scheduler, then apply them before
 52 |       // registering it.
 53 |       var constantThinkTimes = HashMap[String, Double](
 54 |           schedDesc.constantThinkTimes.toSeq: _*)
 55 |       var perTaskThinkTimes = HashMap[String, Double](
 56 |           schedDesc.perTaskThinkTimes.toSeq: _*)
 57 |       var newBlackListPercent = 0.0
 58 |       if (schedulerWorkloadsToSweepOver
 59 |           .contains(schedDesc.name)) {
 60 |         newBlackListPercent = blackListPercent
 61 |         schedulerWorkloadsToSweepOver(schedDesc.name)
 62 |             .foreach(workloadName => {
 63 |           constantThinkTimes(workloadName) = constantThinkTime
 64 |           perTaskThinkTimes(workloadName) = perTaskThinkTime
 65 |         })
 66 |       }
 67 |       schedulers(schedDesc.name) =
 68 |           new MesosScheduler(schedDesc.name,
 69 |                              constantThinkTimes.toMap,
 70 |                              perTaskThinkTimes.toMap,
 71 |                              schedDesc.schedulePartialJobs,
 72 |                              math.floor(newBlackListPercent *
 73 |                                cellStateDesc.numMachines.toDouble).toInt)
 74 |     })
 75 |     // It shouldn't matter which transactionMode we choose, but it does
 76 |     // matter that we use "resource-fit" conflictMode or else
 77 |     // responses to resource offers will likely fail.
 78 |     val cellState = new CellState(cellStateDesc.numMachines,
 79 |                                   cellStateDesc.cpusPerMachine,
 80 |                                   cellStateDesc.memPerMachine,
 81 |                                   conflictMode = "resource-fit",
 82 |                                   transactionMode = "all-or-nothing")
 83 | 
 84 |     val allocator =
 85 |         new MesosAllocator(allocatorConstantThinkTime)
 86 | 
 87 |     new MesosSimulator(cellState,
 88 |                        schedulers.toMap,
 89 |                        workloadToSchedulerMap,
 90 |                        workloads,
 91 |                        prefillWorkloads,
 92 |                        allocator,
 93 |                        logging)
 94 |   }
 95 | }
 96 | 
 97 | class MesosSimulator(cellState: CellState,
 98 |                      override val schedulers: Map[String, MesosScheduler],
 99 |                      workloadToSchedulerMap: Map[String, Seq[String]],
100 |                      workloads: Seq[Workload],
101 |                      prefillWorkloads: Seq[Workload],
102 |                      var allocator: MesosAllocator,
103 |                      logging: Boolean = false,
104 |                      monitorUtilization: Boolean = true)
105 |                     extends ClusterSimulator(cellState,
106 |                                              schedulers,
107 |                                              workloadToSchedulerMap,
108 |                                              workloads,
109 |                                              prefillWorkloads,
110 |                                              logging,
111 |                                              monitorUtilization) {
112 |   assert(cellState.conflictMode.equals("resource-fit"),
113 |          "Mesos requires cellstate to be set up with resource-fit conflictMode")
114 |   // Set up a pointer to this simulator in the allocator.
115 |   allocator.simulator = this
116 | 
117 |   log("========================================================")
118 |   log("Mesos SIM CONSTRUCTOR - CellState total usage: %fcpus (%.1f%s), %fmem (%.1f%s)."
119 |                 .format(cellState.totalOccupiedCpus,
120 |                         cellState.totalOccupiedCpus /
121 |                         cellState.totalCpus * 100.0,
122 |                         "%",
123 |                         cellState.totalOccupiedMem,
124 |                         cellState.totalOccupiedMem /
125 |                           cellState.totalMem * 100.0,
126 |                         "%"))
127 | 
128 |   // Set up a pointer to this simulator in each scheduler.
129 |   schedulers.values.foreach(_.mesosSimulator = this)
130 | }
131 | 
132 | class MesosSchedulerDesc(name: String,
133 |                          constantThinkTimes: Map[String, Double],
134 |                          perTaskThinkTimes: Map[String, Double],
135 |                          val schedulePartialJobs: Boolean)
136 |                         extends SchedulerDesc(name,
137 |                                               constantThinkTimes,
138 |                                               perTaskThinkTimes)
139 | 
140 | class MesosScheduler(name: String,
141 |                      constantThinkTimes: Map[String, Double],
142 |                      perTaskThinkTimes: Map[String, Double],
143 |                      val schedulePartialJobs: Boolean,
144 |                      numMachinesToBlackList: Double = 0)
145 |                     extends Scheduler(name,
146 |                                       constantThinkTimes,
147 |                                       perTaskThinkTimes,
148 |                                       numMachinesToBlackList) {
149 |   println("scheduler-id-info: %d, %s, %d, %s, %s"
150 |           .format(Thread.currentThread().getId(),
151 |                   name,
152 |                   hashCode(),
153 |                   constantThinkTimes.mkString(";"),
154 |                   perTaskThinkTimes.mkString(";")))
155 |   // TODO(andyk): Clean up these <subclass>Simulator classes
156 |   //              by templatizing the Scheduler class and having only
157 |   //              one simulator of the correct type, instead of one
158 |   //              simulator for each of the parent and child classes.
159 |   var mesosSimulator: MesosSimulator = null
160 |   val offerQueue = new collection.mutable.Queue[Offer]
161 | 
162 |   override
163 |   def checkRegistered = {
164 |     super.checkRegistered
165 |     assert(mesosSimulator != null, "This scheduler has not been added to a " +
166 |                                    "simulator yet.")
167 |   }
168 | 
169 |   /**
170 |    * How an allocator sends offers to a framework.
171 |    */
172 |   def resourceOffer(offer: Offer): Unit = {
173 |     offerQueue.enqueue(offer)
174 |     handleNextResourceOffer()
175 |   }
176 | 
177 |   def handleNextResourceOffer(): Unit = {
178 |     // We essentially synchronize access to this scheduling logic
179 |     // via the scheduling variable. We aren't protecting this from real
180 |         // parallelism, but rather from discrete-event-simulation style parallelism.
181 |     if(!scheduling && !offerQueue.isEmpty) {
182 |       scheduling = true
183 |       val offer = offerQueue.dequeue()
184 |       // Use this offer to attempt to schedule jobs.
185 |       simulator.log("------ In %s.resourceOffer(offer %d).".format(name, offer.id))
186 |       val offerResponse = collection.mutable.ListBuffer[ClaimDelta]()
187 |       var aggThinkTime: Double = 0.0
188 |       // TODO(andyk): add an efficient method to CellState that allows us to
189 |       //              check the largest slice of available resources to decide
190 |       //              if we should keep trying to schedule or not.
191 |       while (offer.cellState.availableCpus > 0.000001 &&
192 |              offer.cellState.availableMem > 0.000001 &&
193 |              !pendingQueue.isEmpty) {
194 |         val job = pendingQueue.dequeue
195 |         job.updateTimeInQueueStats(simulator.currentTime)
196 |         val jobThinkTime = getThinkTime(job)
197 |         aggThinkTime += jobThinkTime
198 |         job.numSchedulingAttempts += 1
199 |         job.numTaskSchedulingAttempts += job.unscheduledTasks
200 | 
201 |         // Before calling the expensive scheduleJob() function, check
202 |         // to see if one of this job's tasks could fit into the sum of
203 |         // *all* the currently free resources in the offer's cell state.
204 |         // If one can't, then there is no need to call scheduleJob(). If
205 |         // one can, we call scheduleJob(), though we still might not fit
206 |         // any tasks due to fragmentation.
207 |         if (offer.cellState.availableCpus > job.cpusPerTask &&
208 |             offer.cellState.availableMem > job.memPerTask) {
209 |           // Schedule the job using the cellstate in the ResourceOffer.
210 |           val claimDeltas = scheduleJob(job, offer.cellState)
211 |           if(claimDeltas.length > 0) {
212 |             numSuccessfulTransactions += 1
213 |             recordUsefulTimeScheduling(job,
214 |                                        jobThinkTime,
215 |                                        job.numSchedulingAttempts == 1)
216 |             mesosSimulator.log(("Setting up job %d to accept at least " +
217 |                                 "part of offer %d. About to spend %f seconds " +
218 |                                 "scheduling it. Assigning %d tasks to it.")
219 |                                 .format(job.id, offer.id, jobThinkTime,
220 |                                         claimDeltas.length))
221 |             offerResponse ++= claimDeltas
222 |             job.unscheduledTasks -= claimDeltas.length
223 |           } else {
224 |             mesosSimulator.log(("Rejecting all of offer %d for job %d, " +
225 |                                 "which requires tasks with %f cpu, %f mem. " +
226 |                                 "Not counting busy time for this sched attempt.")
227 |                                 .format(offer.id,
228 |                                         job.id,
229 |                                         job.cpusPerTask,
230 |                                         job.memPerTask))
231 |             numNoResourcesFoundSchedulingAttempts += 1
232 |           }
233 |         } else {
234 |           mesosSimulator.log(("Short-path rejecting all of offer %d for " +
235 |                               "job %d because a single one of its tasks " +
236 |                               "(%f cpu, %f mem) wouldn't fit into the sum " +
237 |                               "of the offer's private cell state's " +
238 |                               "remaining resources (%f cpu, %f mem).")
239 |                               .format(offer.id,
240 |                                       job.id,
241 |                                       job.cpusPerTask,
242 |                                       job.memPerTask,
243 |                                       offer.cellState.availableCpus,
244 |                                       offer.cellState.availableMem))
245 |         }
246 | 
247 |         var jobEventType = "" // Set this conditionally below; used in logging.
248 |         // If job is only partially scheduled, put it back in the pendingQueue.
249 |         if (job.unscheduledTasks > 0) {
250 |           mesosSimulator.log(("Job %d is [still] only partially scheduled " +
251 |                              "(%d out of %d of its tasks remain unscheduled), so " +
252 |                              "putting it back in the queue.")
253 |                              .format(job.id,
254 |                                      job.unscheduledTasks,
255 |                                      job.numTasks))
256 |           // Give up on a job if (a) it hasn't scheduled a single task in
257 |           // 100 tries or (b) it hasn't finished scheduling after 1000 tries.
258 |           if ((job.numSchedulingAttempts > 100 &&
259 |                job.unscheduledTasks == job.numTasks) ||
260 |               job.numSchedulingAttempts > 1000) {
261 |             println(("Abandoning job %d (%f cpu %f mem) with %d/%d " +
262 |                    "remaining tasks, after %d scheduling " +
263 |                    "attempts.").format(job.id,
264 |                                        job.cpusPerTask,
265 |                                        job.memPerTask,
266 |                                        job.unscheduledTasks,
267 |                                        job.numTasks,
268 |                                        job.numSchedulingAttempts))
269 |             numJobsTimedOutScheduling += 1
270 |             jobEventType = "abandoned"
271 |           } else {
272 |             simulator.afterDelay(1) {
273 |               addJob(job)
274 |             }
275 |           }
276 |           job.lastEnqueued = simulator.currentTime
277 |         } else {
278 |           // All tasks in job scheduled so not putting it back in pendingQueue.
279 |           jobEventType = "fully-scheduled"
280 |         }
281 |         if (!jobEventType.equals("")) {
282 |           // Print some stats that we can use to generate CDFs of the job
283 |           // # scheduling attempts and job-time-till-scheduled.
284 |           // println("%s %s %d %s %d %d %f"
285 |           //         .format(Thread.currentThread().getId(),
286 |           //                 name,
287 |           //                 hashCode(),
288 |           //                 jobEventType,
289 |           //                 job.id,
290 |           //                 job.numSchedulingAttempts,
291 |           //                 simulator.currentTime - job.submitted))
292 |         }
293 |       }
294 | 
295 |       if (pendingQueue.isEmpty) {
296 |         // If we have scheduled everything, notify the allocator that we
297 |         // don't need resource offers until we request them again (which
298 |         // we will do when another job is added to our pendingQueue).
299 |         // Do this before we reply to the offer since the allocator may make
300 |         // its next round of offers shortly after we respond to this offer.
301 |         mesosSimulator.log(("After scheduling, %s's pending queue is " +
302 |                             "empty, canceling outstanding " +
303 |                             "resource request.").format(name))
304 |         mesosSimulator.allocator.cancelOfferRequest(this)
305 |       } else {
306 |         mesosSimulator.log(("%s's pending queue still has %d jobs in it, but " +
307 |                             "for some reason, they didn't fit into this " +
308 |                             "offer, so it will patiently wait for more " +
309 |                             "resource offers.").format(name, pendingQueue.size))
310 |       }
311 | 
312 |       // Send our response to this offer.
313 |       mesosSimulator.afterDelay(aggThinkTime) {
314 |         mesosSimulator.log(("Waited %f seconds of aggThinkTime, now " +
315 |                             "responding to offer %d with %d responses.")
316 |                            .format(aggThinkTime, offer.id, offerResponse.length))
317 |         mesosSimulator.allocator.respondToOffer(offer, offerResponse)
318 |       }
319 |       // Done with this offer, see if we have another one to handle.
320 |       scheduling = false
321 |       handleNextResourceOffer()
322 |     }
323 |   }
324 | 
325 |   // When a job arrives, notify the allocator, so that it can make us offers
326 |   // until we notify it that we don't have any more jobs, at which time it
327 |   // can stop sending us offers.
328 |   override
329 |   def addJob(job: Job) = {
330 |     assert(simulator != null, "This scheduler has not been added to a " +
331 |                               "simulator yet.")
332 |       simulator.log("========================================================")
333 |       simulator.log("addJOB: CellState total usage: %fcpus (%.1f%s), %fmem (%.1f%s)."
334 |                     .format(simulator.cellState.totalOccupiedCpus,
335 |                             simulator.cellState.totalOccupiedCpus /
336 |                               simulator.cellState.totalCpus * 100.0,
337 |                             "%",
338 |                             simulator.cellState.totalOccupiedMem,
339 |                             simulator.cellState.totalOccupiedMem /
340 |                               simulator.cellState.totalMem * 100.0,
341 |                             "%"))
342 |     super.addJob(job)
343 |     pendingQueue.enqueue(job)
344 |     simulator.log("Enqueued job %d of workload type %s."
345 |                   .format(job.id, job.workloadName))
346 |     mesosSimulator.allocator.requestOffer(this)
347 |   }
348 | }
349 | 
350 | /**
351 |  * Decides which scheduler to make resource offer to next, and manages
352 |  * the resource offer process.
353 |  *
354 |  * @param constantThinkTime the time this allocator takes to sort the
355 |  *       list of schedulers to decide which to offer to next. This happens
356 |  *       before each series of resource offers is made.
357 |  * @param minCpuOffer minimum cpus (likewise minMemOffer for mem) needed to build an offer
358 |  */
359 | class MesosAllocator(constantThinkTime: Double,
360 |                      minCpuOffer: Double = 100.0,
361 |                      minMemOffer: Double = 100.0,
362 |                      // Min time, in seconds, to batch up resources
363 |                      // before making an offer.
364 |                      val offerBatchInterval: Double = 1.0) {
365 |   var simulator: MesosSimulator = null
366 |   var allocating: Boolean = false
367 |   var schedulersRequestingResources = collection.mutable.Set[MesosScheduler]()
368 |   var timeSpentAllocating: Double = 0.0
369 |   var nextOfferId: Long = 0
370 |   val offeredDeltas = HashMap[Long, Seq[ClaimDelta]]()
371 |   // Are we currently waiting while a resource batch offer builds up
372 |   // that has already been scheduled?
373 |   var buildAndSendOfferScheduled = false
374 | 
375 |   def checkRegistered = {
376 |     assert(simulator != null, "You must assign a simulator to a " +
377 |                               "MesosAllocator before you can use it.")
378 |   }
379 | 
380 |   def getThinkTime: Double = {
381 |     constantThinkTime
382 |   }
383 | 
384 |   def requestOffer(needySched: MesosScheduler): Unit = {
385 |     checkRegistered
386 |     simulator.log("Received an offerRequest from %s.".format(needySched.name))
387 |     // Adding a scheduler to this list will ensure that it gets included
388 |     // in the next round of resource offers.
389 |     schedulersRequestingResources += needySched
390 |     schedBuildAndSendOffer()
391 |   }
392 | 
393 |   def cancelOfferRequest(needySched: MesosScheduler) = {
394 |     simulator.log("Canceling the outstanding resourceRequest for scheduler %s.".format(
395 |         needySched.name))
396 |     schedulersRequestingResources -= needySched
397 |   }
398 | 
399 |   /**
400 |    * We batch up available resources into periodic offers so
401 |    * that we don't send an offer in response to *every* small event,
402 |    * which adds latency to the average offer and slows the simulator down.
403 |    * This feature was in Mesos for the NSDI paper experiments, but didn't
404 |    * get committed to the open source codebase at that time.
405 |    */
406 |   def schedBuildAndSendOffer() = {
407 |     if (!buildAndSendOfferScheduled) {
408 |       buildAndSendOfferScheduled = true
409 |       simulator.afterDelay(offerBatchInterval) {
410 |         simulator.log("Building and sending a batched offer")
411 |         buildAndSendOffer()
412 |         // Let another call to buildAndSendOffer() get scheduled,
413 |         // giving some time for resources to build up that are
414 |         // becoming available due to tasks finishing.
415 |         buildAndSendOfferScheduled = false
416 |       }
417 |     }
418 |   }
419 |   /**
420 |    * Sort schedulers in simulator using DRF, then make an offer to
421 |    * the first scheduler in the list.
422 |    *
423 |    * After any task finishes or scheduler says it wants offers, we
424 |    * call this, i.e. buildAndSendOffer(), again. Note that the only
425 |    * resources that will be available will be the ones that
426 |    * the task that just finished was using.
427 |    */
428 |   def buildAndSendOffer(): Unit = {
429 |     checkRegistered
430 |     simulator.log("========================================================")
431 |     simulator.log(("TOP OF BUILD AND SEND. CellState total occupied: " +
432 |                    "%fcpus (%.1f%%), %fmem (%.1f%%).")
433 |                   .format(simulator.cellState.totalOccupiedCpus,
434 |                           simulator.cellState.totalOccupiedCpus /
435 |                             simulator.cellState.totalCpus * 100.0,
436 |                           simulator.cellState.totalOccupiedMem,
437 |                           simulator.cellState.totalOccupiedMem /
438 |                             simulator.cellState.totalMem * 100.0))
439 |     // Build and send an offer only if:
440 |     // (a) there are enough resources in cellstate and
441 |     // (b) at least one scheduler wants offers currently
442 |     // Else, don't do anything, since this function will be called
443 |     // again when a task finishes or a scheduler says it wants offers.
444 |     if (!schedulersRequestingResources.isEmpty &&
445 |       simulator.cellState.availableCpus >= minCpuOffer &&
446 |       simulator.cellState.availableMem >= minMemOffer) {
447 |       // Use DRF to pick a candidate scheduler to offer resources.
448 |       val sortedSchedulers =
449 |           drfSortSchedulers(schedulersRequestingResources.toSeq)
450 |       sortedSchedulers.headOption.foreach(candidateSched => {
451 |         // Create an offer by taking a snapshot of cell state. We might
452 |         // discard this without sending it if we find that there are
453 |         // no available resources in cell state right now.
454 |         val privCellState = simulator.cellState.copy
455 |         val offer = Offer(nextOfferId, candidateSched, privCellState)
456 |         nextOfferId += 1
457 | 
458 |         // Call scheduleAllAvailable() which creates deltas, applies them,
459 |         // and returns them; all based on common cell state. This doesn't
460 |         // affect the privateCellState we created above. Store the deltas
461 |         // using the offerID as key until we get a response from the scheduler.
462 |         // This has the effect of pessimistically locking the resources in
463 |         // common cell state until we hear back from the scheduler (or time
464 |         // out and rescind the offer).
465 |         val claimDeltas =
466 |             candidateSched.scheduleAllAvailable(cellState = simulator.cellState,
467 |                                                 locked = true)
468 |         // Make sure scheduleAllAvailable() did its job.
469 |         assert(simulator.cellState.availableCpus < 0.01 &&
470 |                simulator.cellState.availableMem < 0.01,
471 |                ("After scheduleAllAvailable() is called on a cell state, " +
472 |                 "that cell state should not have any available resources " +
473 |                 "of any type, but this cell state still has %f cpus and %f " +
474 |                 "memory available").format(simulator.cellState.availableCpus,
475 |                                            simulator.cellState.availableMem))
476 |         if (!claimDeltas.isEmpty) {
477 |           assert(privCellState.totalLockedCpus !=
478 |                  simulator.cellState.totalLockedCpus,
479 |                  "Since some resources were locked and put into a resource " +
480 |                  "offer, we expect the number of total lockedCpus to now be " +
481 |                  "different in the private cell state we created than in the " +
482 |                  "common cell state.")
483 |           offeredDeltas(offer.id) = claimDeltas
484 | 
485 |           val thinkTime = getThinkTime
486 |           simulator.afterDelay(thinkTime) {
487 |             timeSpentAllocating += thinkTime
488 |             simulator.log(("Allocator done thinking, sending offer to %s. " +
489 |                            "Offer contains private cell state with " +
490 |                            "%f cpu, %f mem available.")
491 |                           .format(candidateSched.name,
492 |                                   offer.cellState.availableCpus,
493 |                                   offer.cellState.availableMem))
494 |             // Send the offer.
495 |             candidateSched.resourceOffer(offer)
496 |           }
497 |         }
498 |       })
499 |     } else {
500 |       var reason = ""
501 |       if (schedulersRequestingResources.isEmpty)
502 |         reason = "No schedulers currently want offers."
503 |       if (simulator.cellState.availableCpus < minCpuOffer ||
504 |           simulator.cellState.availableMem < minMemOffer)
505 |         reason = ("Only %f cpus and %f mem available in common cell state " +
506 |                   "but min offer size is %f cpus and %f mem.")
507 |                   .format(simulator.cellState.availableCpus,
508 |                           simulator.cellState.availableMem,
509 |                           minCpuOffer,
510 |                           minMemOffer)
511 |       simulator.log("Not sending an offer after all. %s".format(reason))
512 |     }
513 |   }
514 | 
515 |   /**
516 |    * Schedulers call this to respond to resource offers.
517 |    */
518 |   def respondToOffer(offer: Offer, claimDeltas: Seq[ClaimDelta]) = {
519 |     checkRegistered
520 |     simulator.log(("------Scheduler %s responded to offer %d with " +
521 |                    "%d claimDeltas.")
522 |                   .format(offer.scheduler.name, offer.id, claimDeltas.length))
523 | 
524 |     // Look up, unapply, & discard the saved deltas associated with the offer id.
525 |     // This will cause the framework to stop being charged for the resources that
526 |     // were locked while it made its scheduling decision.
527 |     assert(offeredDeltas.contains(offer.id),
528 |            "Allocator received response to offer that is not on record.")
529 |     offeredDeltas.remove(offer.id).foreach(savedDeltas => {
530 |       savedDeltas.foreach(_.unApply(cellState = simulator.cellState,
531 |                                     locked = true))
532 |     })
533 |       simulator.log("========================================================")
534 |       simulator.log("AFTER UNAPPLYING SAVED DELTAS")
535 |       simulator.log("CellState total usage: %fcpus (%.1f%s), %fmem (%.1f%s)."
536 |                     .format(simulator.cellState.totalOccupiedCpus,
537 |                             simulator.cellState.totalOccupiedCpus /
538 |                               simulator.cellState.totalCpus * 100.0,
539 |                             "%",
540 |                             simulator.cellState.totalOccupiedMem,
541 |                             simulator.cellState.totalOccupiedMem /
542 |                               simulator.cellState.totalMem * 100.0,
543 |                             "%"))
544 |     simulator.log("Committing all %d deltas that were part of response %d "
545 |                   .format(claimDeltas.length, offer.id))
546 |     // commit() all deltas that were part of the offer response, don't use
547 |     // the option of having cell state create the end events for us since we
548 |     // want to add code to the end event that triggers another resource offer.
549 |     if (claimDeltas.length > 0) {
550 |       val commitResult = simulator.cellState.commit(claimDeltas, false)
551 |       assert(commitResult.conflictedDeltas.length == 0,
552 |              "Expecting no conflicts, but there were %d."
553 |              .format(commitResult.conflictedDeltas.length))
554 | 
555 |       // Create end events for all tasks committed.
556 |       commitResult.committedDeltas.foreach(delta => {
557 |         simulator.afterDelay(delta.duration) {
558 |           delta.unApply(simulator.cellState)
559 |           simulator.log(("A task started by scheduler %s finished. " +
560 |                          "Freeing %f cpus, %f mem. Available: %f cpus, %f " +
561 |                          "mem. Also, triggering a new batched offer round.")
562 |                        .format(delta.scheduler.name,
563 |                                delta.cpus,
564 |                                delta.mem,
565 |                                simulator.cellState.availableCpus,
566 |                                simulator.cellState.availableMem))
567 |           schedBuildAndSendOffer()
568 |         }
569 |       })
570 |     }
571 |     schedBuildAndSendOffer()
572 |   }
573 | 
574 |   /**
575 |    * 1/N multi-resource fair sharing.
576 |    */
577 |   def drfSortSchedulers(schedulers: Seq[MesosScheduler]): Seq[MesosScheduler] = {
578 |     val schedulerDominantShares = schedulers.map(scheduler => {
579 |       val shareOfCpus =
580 |           simulator.cellState.occupiedCpus.getOrElse(scheduler.name, 0.0)
581 |       val shareOfMem =
582 |           simulator.cellState.occupiedMem.getOrElse(scheduler.name, 0.0)
583 |       val domShare = math.max(shareOfCpus / simulator.cellState.totalCpus,
584 |                               shareOfMem / simulator.cellState.totalMem)
585 |       var nameOfDomShare = ""
586 |       if (shareOfMem / simulator.cellState.totalMem < domShare) nameOfDomShare = "cpus"
587 |       else nameOfDomShare = "mem"
588 |       simulator.log("%s's dominant share is %s (%f%s)."
589 |                     .format(scheduler.name, nameOfDomShare, domShare * 100.0, "%"))
590 |       (scheduler, domShare)
591 |     })
592 |     schedulerDominantShares.sortBy(_._2).map(_._1)
593 |   }
594 | }
595 | 
596 | case class Offer(id: Long, scheduler: MesosScheduler, cellState: CellState)
597 | 


--------------------------------------------------------------------------------
/src/main/scala/MonolithicSimulation.scala:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | package ClusterSchedulingSimulation
 28 | 
 29 | import scala.collection.mutable.HashMap
 30 | 
 31 | /* This class and its subclasses are used by factory method
 32 |  * ClusterSimulator.newScheduler() to determine which type of Simulator
 33 |  * to create and also to carry any extra fields that the factory needs to
 34 |  * construct the simulator.
 35 |  */
 36 | class MonolithicSimulatorDesc(schedulerDescs: Seq[SchedulerDesc],
 37 |                               runTime: Double)
 38 |                              extends ClusterSimulatorDesc(runTime){
 39 |   override
 40 |   def newSimulator(constantThinkTime: Double,
 41 |                    perTaskThinkTime: Double,
 42 |                    blackListPercent: Double,
 43 |                    schedulerWorkloadsToSweepOver: Map[String, Seq[String]],
 44 |                    workloadToSchedulerMap: Map[String, Seq[String]],
 45 |                    cellStateDesc: CellStateDesc,
 46 |                    workloads: Seq[Workload],
 47 |                    prefillWorkloads: Seq[Workload],
 48 |                    logging: Boolean = false): ClusterSimulator = {
 49 |     var schedulers = HashMap[String, Scheduler]()
 50 |     // Create schedulers according to experiment parameters.
 51 |     schedulerDescs.foreach(schedDesc => {
 52 |       // If any of the scheduler-workload pairs we're sweeping over
 53 |       // are for this scheduler, then apply them before
 54 |       // registering it.
 55 |       var constantThinkTimes = HashMap[String, Double](
 56 |           schedDesc.constantThinkTimes.toSeq: _*)
 57 |       var perTaskThinkTimes = HashMap[String, Double](
 58 |           schedDesc.perTaskThinkTimes.toSeq: _*)
 59 |       var newBlackListPercent = 0.0
 60 |       if (schedulerWorkloadsToSweepOver
 61 |           .contains(schedDesc.name)) {
 62 |         newBlackListPercent = blackListPercent
 63 |         schedulerWorkloadsToSweepOver(schedDesc.name)
 64 |             .foreach(workloadName => {
 65 |           constantThinkTimes(workloadName) = constantThinkTime
 66 |           perTaskThinkTimes(workloadName) = perTaskThinkTime
 67 |         })
 68 |       }
 69 |       schedulers(schedDesc.name) =
 70 |           new MonolithicScheduler(schedDesc.name,
 71 |                                   constantThinkTimes.toMap,
 72 |                                   perTaskThinkTimes.toMap,
 73 |                                   math.floor(newBlackListPercent *
 74 |                                     cellStateDesc.numMachines.toDouble).toInt)
 75 |     })
 76 | 
 77 |     val cellState = new CellState(cellStateDesc.numMachines,
 78 |                                   cellStateDesc.cpusPerMachine,
 79 |                                   cellStateDesc.memPerMachine,
 80 |                                   conflictMode = "resource-fit",
 81 |                                   transactionMode = "all-or-nothing")
 82 | 
 83 |     new ClusterSimulator(cellState,
 84 |                          schedulers.toMap,
 85 |                          workloadToSchedulerMap,
 86 |                          workloads,
 87 |                          prefillWorkloads,
 88 |                          logging)
 89 |   }
 90 | }
 91 | 
 92 | class MonolithicScheduler(name: String,
 93 |                           constantThinkTimes: Map[String, Double],
 94 |                           perTaskThinkTimes: Map[String, Double],
 95 |                           numMachinesToBlackList: Double = 0)
 96 |                          extends Scheduler(name,
 97 |                                            constantThinkTimes,
 98 |                                            perTaskThinkTimes,
 99 |                                            numMachinesToBlackList) {
100 | 
101 |   println("scheduler-id-info: %d, %s, %d, %s, %s"
102 |           .format(Thread.currentThread().getId(),
103 |                   name,
104 |                   hashCode(),
105 |                   constantThinkTimes.mkString(";"),
106 |                   perTaskThinkTimes.mkString(";")))
107 | 
108 |   override
109 |   def addJob(job: Job) = {
110 |     assert(simulator != null, "This scheduler has not been added to a " +
111 |                               "simulator yet.")
112 |     super.addJob(job)
113 |     job.lastEnqueued = simulator.currentTime
114 |     pendingQueue.enqueue(job)
115 |     simulator.log("enqueued job " + job.id)
116 |     if (!scheduling)
117 |       scheduleNextJobAction()
118 |   }
119 | 
120 |   /**
121 |    * Checks to see if there is currently a job in this scheduler's job queue.
122 |    * If there is, and this scheduler is not currently scheduling a job, then
123 |    * pop that job off of the queue and "begin scheduling it". Scheduling a
124 |    * job consists of setting this scheduler's state to scheduling = true, and
 125 |    * adding a finishSchedulingJobAction to the simulator's event queue by
126 |    * calling afterDelay().
127 |    */
128 |   def scheduleNextJobAction(): Unit = {
129 |     assert(simulator != null, "This scheduler has not been added to a " +
130 |                               "simulator yet.")
131 |     if (!scheduling && !pendingQueue.isEmpty) {
132 |       scheduling = true
133 |       val job = pendingQueue.dequeue
134 |       job.updateTimeInQueueStats(simulator.currentTime)
135 |       job.lastSchedulingStartTime = simulator.currentTime
136 |       val thinkTime = getThinkTime(job)
137 |       simulator.log("getThinkTime returned " + thinkTime)
138 |       simulator.afterDelay(thinkTime) {
139 |         simulator.log(("Scheduler %s finished scheduling job %d. " +
140 |                        "Attempting to schedule next job in scheduler's " +
141 |                        "pendingQueue.").format(name, job.id))
142 |         job.numSchedulingAttempts += 1
143 |         job.numTaskSchedulingAttempts += job.unscheduledTasks
144 |         val claimDeltas = scheduleJob(job, simulator.cellState)
145 |         if(claimDeltas.length > 0) {
146 |           simulator.cellState.scheduleEndEvents(claimDeltas)
147 |           job.unscheduledTasks -= claimDeltas.length
 148 |           simulator.log("scheduled %d tasks of job %d, %d remaining."
149 |                         .format(claimDeltas.length, job.id, job.unscheduledTasks))
150 |           numSuccessfulTransactions += 1
151 |           recordUsefulTimeScheduling(job,
152 |                                      thinkTime,
153 |                                      job.numSchedulingAttempts == 1)
154 |         } else {
155 |           simulator.log(("No tasks scheduled for job %d (%f cpu %f mem) " +
156 |                          "during this scheduling attempt, not recording " +
157 |                          "any busy time. %d unscheduled tasks remaining.")
158 |                         .format(job.id,
159 |                                 job.cpusPerTask,
160 |                                 job.memPerTask,
161 |                                 job.unscheduledTasks))
162 |         }
163 |         var jobEventType = "" // Set this conditionally below; used in logging.
164 |         // If the job isn't yet fully scheduled, put it back in the queue.
165 |         if (job.unscheduledTasks > 0) {
166 |           simulator.log(("Job %s didn't fully schedule, %d / %d tasks remain " +
167 |                          "(shape: %f cpus, %f mem). Putting it " +
168 |                          "back in the queue").format(job.id,
169 |                                                      job.unscheduledTasks,
170 |                                                      job.numTasks,
171 |                                                      job.cpusPerTask,
172 |                                                      job.memPerTask))
173 |           // Give up on a job if (a) it hasn't scheduled a single task in
174 |           // 100 tries or (b) it hasn't finished scheduling after 1000 tries.
175 |           if ((job.numSchedulingAttempts > 100 &&
176 |                job.unscheduledTasks == job.numTasks) ||
177 |               job.numSchedulingAttempts > 1000) {
178 |             println(("Abandoning job %d (%f cpu %f mem) with %d/%d " +
179 |                    "remaining tasks, after %d scheduling " +
180 |                    "attempts.").format(job.id,
181 |                                        job.cpusPerTask,
182 |                                        job.memPerTask,
183 |                                        job.unscheduledTasks,
184 |                                        job.numTasks,
185 |                                        job.numSchedulingAttempts))
186 |             numJobsTimedOutScheduling += 1
187 |             jobEventType = "abandoned"
188 |           } else {
189 |             simulator.afterDelay(1) {
190 |               addJob(job)
191 |             }
192 |           }
193 |         } else {
194 |           // All tasks in job scheduled so don't put it back in pendingQueue.
195 |           jobEventType = "fully-scheduled"
196 |         }
197 |         if (!jobEventType.equals("")) {
198 |           // println("%s %s %d %s %d %d %f"
199 |           //         .format(Thread.currentThread().getId(),
200 |           //                 name,
201 |           //                 hashCode(),
202 |           //                 jobEventType,
203 |           //                 job.id,
204 |           //                 job.numSchedulingAttempts,
205 |           //                 simulator.currentTime - job.submitted))
206 |         }
207 | 
208 |         scheduling = false
209 |         scheduleNextJobAction()
210 |       }
211 |       simulator.log("Scheduler named '%s' started scheduling job %d "
212 |                     .format(name,job.id))
213 |     }
214 |   }
215 | }
216 | 
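
Note on the retry policy in MonolithicScheduler.scheduleNextJobAction: a job is
abandoned when it has made more than 100 scheduling attempts without placing a
single task, or more than 1000 attempts overall. The check is inlined in the
scheduler; the sketch below only restates it as a standalone predicate so the
boundary cases are easy to see. The object name, the constant names, and the
use of Int for the counters are assumptions made for illustration and do not
appear in the repository.

```scala
// Hypothetical standalone helper; the simulator itself inlines this check.
object AbandonPolicy {
  // Thresholds copied from the comment and condition in the scheduler.
  val MaxAttemptsWithoutProgress = 100
  val MaxAttemptsTotal = 1000

  /** True when a partially scheduled job should be dropped instead of
   *  being re-enqueued.
   */
  def shouldAbandon(numSchedulingAttempts: Int,
                    unscheduledTasks: Int,
                    numTasks: Int): Boolean = {
    // "No progress" means not a single task of the job has been placed yet.
    val noProgress = unscheduledTasks == numTasks
    (numSchedulingAttempts > MaxAttemptsWithoutProgress && noProgress) ||
      numSchedulingAttempts > MaxAttemptsTotal
  }
}
```

Both comparisons are strict, so the 100th and the 1000th attempt still lead to
a retry; the job is only dropped on the attempt after each threshold.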


--------------------------------------------------------------------------------
/src/main/scala/OmegaSimulation.scala:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | package ClusterSchedulingSimulation
 28 | 
 29 | import collection.mutable.HashMap
 30 | import collection.mutable.ListBuffer
 31 | 
 32 | class OmegaSimulatorDesc(
 33 |     val schedulerDescs: Seq[OmegaSchedulerDesc],
 34 |     runTime: Double,
 35 |     val conflictMode: String,
 36 |     val transactionMode: String)
 37 |    extends ClusterSimulatorDesc(runTime){
 38 |   override
 39 |   def newSimulator(constantThinkTime: Double,
 40 |                    perTaskThinkTime: Double,
 41 |                    blackListPercent: Double,
 42 |                    schedulerWorkloadsToSweepOver: Map[String, Seq[String]],
 43 |                    workloadToSchedulerMap: Map[String, Seq[String]],
 44 |                    cellStateDesc: CellStateDesc,
 45 |                    workloads: Seq[Workload],
 46 |                    prefillWorkloads: Seq[Workload],
 47 |                    logging: Boolean = false): ClusterSimulator = {
 48 |     assert(blackListPercent >= 0.0 && blackListPercent <= 1.0)
 49 |     var schedulers = HashMap[String, OmegaScheduler]()
 50 |     // Create schedulers according to experiment parameters.
 51 |     println("Creating %d schedulers.".format(schedulerDescs.length))
 52 |     schedulerDescs.foreach(schedDesc => {
 53 |       // If any of the scheduler-workload pairs we're sweeping over
 54 |       // are for this scheduler, then apply them before
 55 |       // registering it.
 56 |       var constantThinkTimes = HashMap[String, Double](
 57 |           schedDesc.constantThinkTimes.toSeq: _*)
 58 |       var perTaskThinkTimes = HashMap[String, Double](
 59 |           schedDesc.perTaskThinkTimes.toSeq: _*)
 60 |       var newBlackListPercent = 0.0
 61 |       if (schedulerWorkloadsToSweepOver
 62 |           .contains(schedDesc.name)) {
 63 |         newBlackListPercent = blackListPercent
 64 |         schedulerWorkloadsToSweepOver(schedDesc.name)
 65 |             .foreach(workloadName => {
 66 |           constantThinkTimes(workloadName) = constantThinkTime
 67 |           perTaskThinkTimes(workloadName) = perTaskThinkTime
 68 |         })
 69 |       }
 70 |       println("Creating new scheduler %s".format(schedDesc.name))
 71 |       schedulers(schedDesc.name) =
 72 |           new OmegaScheduler(schedDesc.name,
 73 |                              constantThinkTimes.toMap,
 74 |                              perTaskThinkTimes.toMap,
 75 |                              math.floor(newBlackListPercent *
 76 |                                cellStateDesc.numMachines.toDouble).toInt)
 77 |     })
 78 |     val cellState = new CellState(cellStateDesc.numMachines,
 79 |                                   cellStateDesc.cpusPerMachine,
 80 |                                   cellStateDesc.memPerMachine,
 81 |                                   conflictMode,
 82 |                                   transactionMode)
 83 |       println("Creating new OmegaSimulator with schedulers %s."
 84 |               .format(schedulers.values.map(_.toString).mkString(", ")))
 85 |       println("Setting OmegaSimulator(%s, %s)'s common cell state to %d"
 86 |               .format(conflictMode,
 87 |                       transactionMode,
 88 |                       cellState.hashCode))
 89 |     new OmegaSimulator(cellState,
 90 |                        schedulers.toMap,
 91 |                        workloadToSchedulerMap,
 92 |                        workloads,
 93 |                        prefillWorkloads,
 94 |                        logging)
 95 |   }
 96 | }
 97 | 
 98 | /**
  99 |  * A simple subclass of SchedulerDesc for extensibility and
100 |  * for symmetry in the naming of the type so that we don't
101 |  * have to use a SchedulerDesc for an OmegaSimulator.
102 |  */
103 | class OmegaSchedulerDesc(name: String,
104 |                          constantThinkTimes: Map[String, Double],
105 |                          perTaskThinkTimes: Map[String, Double])
106 |                         extends SchedulerDesc(name,
107 |                                               constantThinkTimes,
108 |                                               perTaskThinkTimes)
109 | 
110 | class OmegaSimulator(cellState: CellState,
111 |                      override val schedulers: Map[String, OmegaScheduler],
112 |                      workloadToSchedulerMap: Map[String, Seq[String]],
113 |                      workloads: Seq[Workload],
114 |                      prefillWorkloads: Seq[Workload],
115 |                      logging: Boolean = false,
116 |                      monitorUtilization: Boolean = true)
117 |                     extends ClusterSimulator(cellState,
118 |                                              schedulers,
119 |                                              workloadToSchedulerMap,
120 |                                              workloads,
121 |                                              prefillWorkloads,
122 |                                              logging,
123 |                                              monitorUtilization) {
124 |   // Set up a pointer to this simulator in each scheduler.
125 |   schedulers.values.foreach(_.omegaSimulator = this)
126 | }
127 | 
128 | /**
129 |  * While an Omega Scheduler has jobs in its job queue, it:
130 |  * 1: Syncs with cell state by getting a new copy of common cell state
131 |  * 2: Schedules the next job j in the queue, using getThinkTime(j) seconds
 132 |  *    and creating and applying one delta per task in the job.
 133 |  * 3: Submits the job to CellState
 134 |  * 4: If any tasks failed to schedule, inserts the job at the back of the queue
 135 |  * 5: Rolls back its changes
 136 |  * 6: Repeats, starting at 1
137 |  */
138 | class OmegaScheduler(name: String,
139 |                      constantThinkTimes: Map[String, Double],
140 |                      perTaskThinkTimes: Map[String, Double],
141 |                      numMachinesToBlackList: Double = 0)
142 |                     extends Scheduler(name,
143 |                                       constantThinkTimes,
144 |                                       perTaskThinkTimes,
145 |                                       numMachinesToBlackList) {
146 |   println("scheduler-id-info: %d, %s, %d, %s, %s"
147 |           .format(Thread.currentThread().getId(),
148 |                   name,
149 |                   hashCode(),
150 |                   constantThinkTimes.mkString(";"),
151 |                   perTaskThinkTimes.mkString(";")))
152 |   // TODO(andyk): Clean up these <subclass>Simulator classes
153 |   //              by templatizing the Scheduler class and having only
154 |   //              one simulator of the correct type, instead of one
155 |   //              simulator for each of the parent and child classes.
156 |   var omegaSimulator: OmegaSimulator = null
157 |   var privateCellState: CellState = null
158 | 
159 |   override
160 |   def checkRegistered = {
161 |     super.checkRegistered
162 |     assert(omegaSimulator != null, "This scheduler has not been added to a " +
163 |                                    "simulator yet.")
164 |   }
165 | 
166 |   def incrementDailycounter(counter: HashMap[Int, Int]) = {
167 |     val index: Int = math.floor(simulator.currentTime / 86400).toInt
168 |     val currCount: Int = counter.getOrElse(index, 0)
169 |     counter(index) = currCount + 1
170 |   }
171 | 
172 |   // When a job arrives, start scheduling, or make sure we already are.
173 |   override
174 |   def addJob(job: Job) = {
175 |     assert(simulator != null, "This scheduler has not been added to a " +
176 |                               "simulator yet.")
177 | 
178 |     assert(job.unscheduledTasks > 0)
179 |     super.addJob(job)
180 |     pendingQueue.enqueue(job)
181 |     simulator.log("Scheduler %s enqueued job %d of workload type %s."
182 |                   .format(name, job.id, job.workloadName))
183 |     if (!scheduling) {
184 |       omegaSimulator.log("Set %s scheduling to TRUE to schedule job %d."
185 |                          .format(name, job.id))
186 |       scheduling = true
187 |       handleJob(pendingQueue.dequeue)
188 |     }
189 |   }
190 | 
191 |   /**
192 |    * Schedule job and submit a transaction to common cellstate for
193 |    * it. If not all tasks in the job are successfully committed,
194 |    * put it back in the pendingQueue to be scheduled again.
195 |    */
196 |   def handleJob(job: Job): Unit = {
197 |     job.updateTimeInQueueStats(simulator.currentTime)
198 |     syncCellState
199 |     val jobThinkTime = getThinkTime(job)
200 |     omegaSimulator.afterDelay(jobThinkTime) {
201 |       job.numSchedulingAttempts += 1
202 |       job.numTaskSchedulingAttempts += job.unscheduledTasks
203 |       // Schedule the job in private cellstate.
204 |       assert(job.unscheduledTasks > 0)
205 |       val claimDeltas = scheduleJob(job, privateCellState)
 206 |       simulator.log(("Job %d (%s) finished %f seconds of scheduling " +
207 |                      "thinktime; now trying to claim resources for %d " +
208 |                      "tasks with %f cpus and %f mem each.")
209 |                      .format(job.id,
210 |                              job.workloadName,
211 |                              jobThinkTime,
212 |                              job.numTasks,
213 |                              job.cpusPerTask,
214 |                              job.memPerTask))
215 |       if (claimDeltas.length > 0) {
216 |         // Attempt to claim resources in common cellstate by committing
217 |         // a transaction.
218 |         omegaSimulator.log("Submitting a transaction for %d tasks for job %d."
219 |                            .format(claimDeltas.length, job.id))
220 |         val commitResult = omegaSimulator.cellState.commit(claimDeltas, true)
221 |         job.unscheduledTasks -= commitResult.committedDeltas.length
222 |         omegaSimulator.log("%d tasks successfully committed for job %d."
223 |                            .format(commitResult.committedDeltas.length, job.id))
224 |         numSuccessfulTaskTransactions += commitResult.committedDeltas.length
225 |         numFailedTaskTransactions += commitResult.conflictedDeltas.length
226 |         if (job.numSchedulingAttempts > 1)
227 |           numRetriedTransactions += 1
228 | 
229 |         // Record job-level stats.
230 |         if (commitResult.conflictedDeltas.length == 0) {
231 |           numSuccessfulTransactions += 1
232 |           incrementDailycounter(dailySuccessTransactions)
233 |           recordUsefulTimeScheduling(job,
234 |                                      jobThinkTime,
235 |                                      job.numSchedulingAttempts == 1)
236 |         } else {
237 |           numFailedTransactions += 1
238 |           incrementDailycounter(dailyFailedTransactions)
239 |           // omegaSimulator.log("adding %f seconds to wastedThinkTime counter."
240 |           //                   .format(jobThinkTime))
241 |           recordWastedTimeScheduling(job,
242 |                                      jobThinkTime,
243 |                                      job.numSchedulingAttempts == 1)
244 |           // omegaSimulator.log(("Transaction task CONFLICTED for job-%d on " +
245 |           //                     "machines %s.")
246 |           //                    .format(job.id,
247 |           //                            commitResult.conflictedDeltas.map(_.machineID)
248 |           //                            .mkString(", ")))
249 |         }
250 |       } else {
251 |         simulator.log(("Not enough resources of the right shape were " +
252 |                       "available to schedule even one task of job %d, " +
253 |                       "so not submitting a transaction.").format(job.id))
254 |         numNoResourcesFoundSchedulingAttempts += 1
255 |       }
256 | 
257 |       var jobEventType = "" // Set this conditionally below; used in logging.
258 |       // If the job isn't yet fully scheduled, put it back in the queue.
259 |       if (job.unscheduledTasks > 0) {
260 |         // Give up on a job if (a) it hasn't scheduled a single task in
261 |         // 100 tries or (b) it hasn't finished scheduling after 1000 tries.
262 |         if ((job.numSchedulingAttempts > 100 &&
263 |              job.unscheduledTasks == job.numTasks) ||
264 |             job.numSchedulingAttempts > 1000) {
265 |           println(("Abandoning job %d (%f cpu %f mem) with %d/%d " +
266 |                  "remaining tasks, after %d scheduling " +
267 |                  "attempts.").format(job.id,
268 |                                      job.cpusPerTask,
269 |                                      job.memPerTask,
270 |                                      job.unscheduledTasks,
271 |                                      job.numTasks,
272 |                                      job.numSchedulingAttempts))
273 |           numJobsTimedOutScheduling += 1
274 |           jobEventType = "abandoned"
275 |         } else {
276 |           simulator.log(("Job %d still has %d unscheduled tasks, adding it " +
277 |                          "back to scheduler %s's job queue.")
278 |                          .format(job.id, job.unscheduledTasks, name))
279 |           simulator.afterDelay(1) {
280 |             addJob(job)
281 |           }
282 |         }
283 |       } else {
284 |         // All tasks in job scheduled so don't put it back in pendingQueue.
285 |         jobEventType = "fully-scheduled"
286 |       }
287 |       if (!jobEventType.equals("")) {
288 |         // println("%s %s %d %s %d %d %f"
289 |         //         .format(Thread.currentThread().getId(),
290 |         //                 name,
291 |         //                 hashCode(),
292 |         //                 jobEventType,
293 |         //                 job.id,
294 |         //                 job.numSchedulingAttempts,
295 |         //                 simulator.currentTime - job.submitted))
296 |       }
297 | 
298 |       omegaSimulator.log("Set " + name + " scheduling to FALSE")
299 |       scheduling = false
300 |       // Keep trying to schedule as long as we have jobs in the queue.
301 |       if (!pendingQueue.isEmpty) {
302 |         scheduling = true
303 |         handleJob(pendingQueue.dequeue)
304 |       }
305 |     }
306 |   }
307 | 
308 |   def syncCellState {
309 |     checkRegistered
310 |     privateCellState = omegaSimulator.cellState.copy
311 |     simulator.log("%s synced private cellstate.".format(name))
312 |     // println("Scheduler %s (%d) has new private cell state %d"
313 |     //         .format(name, hashCode, privateCellState.hashCode))
314 |   }
315 | }
316 | 
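
Note on the sync / plan / commit cycle in OmegaScheduler.handleJob: the
scheduler copies the common cell state, plans against that private copy, and
then submits its claim deltas to the common cell state, where some of them may
conflict with claims committed by other schedulers in the meantime. The sketch
below is a toy model of that cycle, not the repository's CellState: it tracks
only free CPUs per machine, uses first-fit placement, and keeps every delta
that still fits while reporting the rest as conflicts. All names in it
(ToyCellState, ClaimDelta, CommitResult, plan) are assumptions made for
illustration.

```scala
// Toy model of optimistic concurrency over shared cluster state.
object OptimisticCommitSketch {
  final case class ClaimDelta(machineId: Int, cpus: Double)
  final case class CommitResult(committed: Seq[ClaimDelta],
                                conflicted: Seq[ClaimDelta])

  final class ToyCellState(private var freeCpus: Vector[Double]) {
    // A private snapshot; later changes to either state do not affect the other.
    def copy: ToyCellState = new ToyCellState(freeCpus)

    def available(machineId: Int): Double = freeCpus(machineId)

    // Apply one delta if the machine still has room; report whether it fit.
    def claim(delta: ClaimDelta): Boolean = {
      val free = freeCpus(delta.machineId)
      if (free >= delta.cpus) {
        freeCpus = freeCpus.updated(delta.machineId, free - delta.cpus)
        true
      } else false
    }

    // Try each delta in order; keep the ones that fit, report the rest.
    def commit(deltas: Seq[ClaimDelta]): CommitResult = {
      val results = deltas.toList.map(d => (d, claim(d)))
      CommitResult(results.filter(_._2).map(_._1),
                   results.filterNot(_._2).map(_._1))
    }
  }

  // Plan up to numTasks tasks against a private copy, one delta per task,
  // placing each task on the first machine with enough free CPUs.
  def plan(privateState: ToyCellState,
           numTasks: Int,
           cpusPerTask: Double,
           numMachines: Int): Seq[ClaimDelta] = {
    (0 until numTasks).flatMap { _ =>
      (0 until numMachines)
        .find(m => privateState.available(m) >= cpusPerTask)
        .map { m =>
          val delta = ClaimDelta(m, cpusPerTask)
          privateState.claim(delta) // reserve in the private copy only
          delta
        }
    }
  }
}
```

With two machines of 2 CPUs each, two schedulers that sync at the same time
both plan onto machine 0. Whichever commits first succeeds; the other sees its
deltas conflict and has to re-sync and plan again, which is the case the real
scheduler records as wasted think time.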


--------------------------------------------------------------------------------
/src/main/scala/ParseParm.scala:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | /*
 28 |  * ParseParms.scala
 29 |  * 
 30 |  * Copied from: http://code.google.com/p/parse-cmd/wiki/AScalaParserClass
 31 |  *
 32 |  * ParseParms is an implementation of a command-line argument parser in Scala.
 33 |  *
 34 |  * It allows you to define:
 35 |  *
 36 |  *   A Help String
 37 |  *   Parameter entries each including:
 38 |  *       name
 39 |  *       default value
 40 |  *       regular expression used for validation; defaults are used if not stated
 41 |  *       tag to indicate a required parameter; defaults to not-required
 42 |  *   A validate method to test passed arguments against defined parameters
 43 |  *
 44 |  *   A validate method parses the arguments and returns a Scala tuple object
 45 |  *   of the form (Boolean, String, Map)
 46 |  *
  47 |  *   The Boolean indicates success and Map contains merged values: e.g.
  48 |  *   supplied command-line arguments compared and merged against defined parms.
  49 |  *
  50 |  *   On failure (false), the String holds an error message and the Map is empty
 51 |  *
 52 |  *   The String includes an error message indicating missing required parms
 53 |  *   and/or incorrect values: failed regular expression test
 54 |  *
 55 |  *   The Map object contains the merged arguments and parameter default values
 56 |  *
 57 |  *   Usage example is included below under Main
 58 |  *
 59 |  *   jf.zarama at gmail dot com
 60 |  *
 61 |  *   2009.07.24
 62 |  */
 63 | 
 64 | package ca.zmatrix.utils
 65 | 
 66 | class ParseParms(val help: String) {
 67 | 
 68 |     private var parms = Map[String,(String,String,Boolean)]()
 69 |     private var cache: Option[String] = None    // save parm name across calls
 70 |                                                 // used by req and rex methods
 71 |     def parm(name: String) = {
 72 |         parms += name -> ("", "^.*$", false ) ;cache = Some(name)
 73 |         this
 74 |     }
 75 | 
 76 |     def parm(name: String, default: String) = {
 77 |         parms += name -> (default, defRex(default), false); cache = Some(name)
 78 |         this
 79 |     }
 80 | 
 81 |     def parm(name: String, default: String, rex: String) = {
 82 |         parms += name -> (default, rex, false); cache = Some(name)
 83 |         this
 84 |     }
 85 | 
 86 |     def parm(name: String, default: String, rex: String, req: Boolean) = {
 87 |         parms += name -> (default, rex, req); cache = Some(name)
 88 |         this
 89 |     }
 90 | 
 91 |     def parm(name: String, default: String, req: Boolean) = {
 92 |         parms += name -> (default, defRex(default), req); cache =  Some(name)
 93 |         this
 94 |     }
 95 | 
 96 |     def req(value: Boolean) = {                 // update required flag
 97 |         val k = checkName                       // for current parameter name
 98 |         if( k.length > 0 ) {                    // stored in cache
  99 |             val pvalue = parms(k)               // parameter tuple value
100 |             val ntuple = (pvalue._1,pvalue._2,value)    // new tuple
101 |             parms += cache.get -> ntuple        // update entry in parms
102 |         }                                       // .parm("-p1","1").req(true)
103 |         this                                    // enables chained calls
104 |     }
105 | 
106 |     def rex(value: String) = {                  // update regular-expression
107 |         val k = checkName                       // for current name
108 |         if( k.length > 0 ) {                    // stored in cache
109 |             val pvalue = parms(k)               // parameter tuple value
110 |             val ntuple = (pvalue._1,value,pvalue._3)    // new tuple
111 |             parms += cache.get -> ntuple        // update tuple for key in parms
112 |         }                                       // .parm("-p1","1").rex(".+")
113 |         this                                    // enables chained calls
114 |     }
115 | 
116 |     private def checkName = {                           // checks name stored in cache
117 |         cache match {                           // to be a parm-name used for
118 |             case Some(key) => key               // req and rex methods
119 |             case _         => ""                // req & rex will not update
120 |         }                                       // entries if cache other than
121 |     }                                           // Some(key)
122 | 
123 |     private def defRex(default: String): String = {
124 |         if( default.matches("^\\d+$") ) "^\\d+$" else "^.*$"
125 |     }
126 | 
127 |     private def genMap(args: List[String] ) = { // return a Map of args
128 |         var argsMap = Map[String,String]()      // result object
129 |         if( ( args.length % 2 ) != 0 ) argsMap  // must have pairs: -name value
130 |         else {                                  // to return a valid Map
131 |             for( i <- 0.until(args.length,2) ){ // iterate through args by 2
132 |                 argsMap += args(i) -> args(i+1) // add -name value pair
133 |             }
134 |             argsMap                             // return -name value Map
135 |         }
136 |     }
137 | 
138 |     private def testRequired( args: Map[String,String] ) = {
139 |         val ParmsNotSupplied = new collection.mutable.ListBuffer[String]
140 |         for{ (key,value) <- parms               // iterate through parms
141 |             if value._3                         // if parm is required
142 |             if !args.contains(key)              // and it is not in args
143 |         } ParmsNotSupplied += key               // add it to List
144 |         ParmsNotSupplied.toList                 // empty: all required present
145 |     }
146 | 
147 |     private def validParms( args: Map[String,String] ) = {
148 |         val invalidParms = new collection.mutable.ListBuffer[String]
149 |         for{ (key,value) <- args                // iterate through args
150 |             if parms.contains(key)              // if it is a defined parm
151 |             rex = parms(key)._2                 // parm defined rex
152 |             if !value.matches(rex)              // if regex does not match
153 |         } invalidParms += key                   // add invalid arg
154 |         invalidParms.toList                     // empty: all parms valid
155 |     }
156 | 
157 |     private def mergeParms( args: Map[String,String] ) = {
158 |         //val mergedMap = collection.mutable.Map[String,String]()
159 |         var mergedMap = Map[String,String]()    // name value Map of results
160 |         for{ (key,value) <- parms               // iterate through parms
161 |             //mValue = if( args.contains(key) ) args(key) else value(0)
162 |             mValue = args.getOrElse(key,value._1)  // args(key) or default
163 |         }   mergedMap +=  key -> mValue         // update result Map
164 |         mergedMap                               // return mergedMap
165 |     }
166 | 
167 |     private def mkString(l1: List[String],l2: List[String]) = {
168 |         "\nhelp:   " + help + "\n\trequired parms missing: "  +
169 |         ( if( !l1.isEmpty ) l1.mkString(" ")  else "" )       +
170 |         ( if( !l2.isEmpty ) "\n\tinvalid parms:          "    +
171 |                l2.mkString(" ") + "\n" else "" )
172 |     }
173 | 
174 |     def validate( args: List[String] ) = {          // validate args to parms
175 |         val argsMap   = genMap( args )              // Map of args: -name value
176 |         val reqList   = testRequired( argsMap )     // List of missing required
177 |         val validList = validParms( argsMap )       // List of (in)valid args
178 |         if( reqList.isEmpty && validList.isEmpty ) {// successful return
179 |             (true,"",mergeParms( argsMap ))         // true, "", mergedParms
180 |         } else (false,mkString(reqList,validList),Map[String,String]())
181 |     }
182 | }
183 | 
184 | // object Main {
185 | // 
186 | //   /**
187 | //    * @param args the command line arguments
188 | //    */
189 | //   def main(args: Array[String]) = {
190 | //     val helpString = " -p1 out.txt -p2 22 [ -p3 100 -p4 1200 ] "
191 | //     val pp = new ParseParms( helpString )
192 | //     pp.parm("-p1", "output.txt").rex("^.*\\.txt$").req(true)    // required
193 | //       .parm("-p2", "22","^\\d{2}$",true)        // alternate form, required
194 | //       .parm("-p3","100").rex("^\\d{3}$")                        // optional
195 | //       .parm("-p4","1200").rex("^\\d{4}$").req(false)            // optional
196 | // 
197 | //     val result = pp.validate( args.toList )
198 | //     println(  if( result._1 ) result._3  else result._2 )
199 | //     // result is a tuple (Boolean, String, Map)
200 | //     // ._1 Boolean; false: error String contained in ._2, Map in ._3 is empty
201 | //     //              true:  successful, Map of parsed & merged parms in ._3
202 | // 
203 | //     System.exit(0)
204 | //   }
205 | // 
206 | // }
207 | 
208 | 
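
The validation flow above (pair up `-name value` tokens, then check each value against its regular expression) can be sketched in isolation. This is a minimal illustration only: `ArgPairsSketch`, `toPairs`, and `invalid` are hypothetical names that do not appear in ParseParm.scala, and the sketch omits the required-flag and default-merging steps.

```scala
// Standalone sketch; ArgPairsSketch and its members are hypothetical names,
// not part of ParseParms.
object ArgPairsSketch {
  // Pair up "-name value" tokens. An odd-length list yields an empty Map,
  // mirroring the behaviour of genMap.
  def toPairs(args: List[String]): Map[String, String] =
    if (args.length % 2 != 0) Map.empty
    else args.grouped(2).map { case List(k, v) => k -> v }.toMap

  // Return the names whose supplied value does not match its regex rule.
  // Names without a rule are ignored, as in validParms.
  def invalid(args: Map[String, String],
              rules: Map[String, String]): List[String] =
    args.collect {
      case (k, v) if rules.contains(k) && !v.matches(rules(k)) => k
    }.toList

  def main(a: Array[String]): Unit = {
    val rules = Map("-p1" -> "^.*\\.txt$", "-p2" -> "^\\d{2}$")
    val good  = toPairs(List("-p1", "out.txt", "-p2", "22"))
    val bad   = toPairs(List("-p1", "out.txt", "-p2", "222"))
    println(invalid(good, rules)) // prints List()
    println(invalid(bad, rules))  // prints List(-p2)
  }
}
```

Because an odd-length argument list collapses to an empty Map, a dangling `-name` with no value surfaces as missing required parameters rather than as a dedicated parse error.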


--------------------------------------------------------------------------------
/src/main/scala/Util.scala:
--------------------------------------------------------------------------------
 1 | /**
 2 |  * Copyright (c) 2013, Regents of the University of California
 3 |  * All rights reserved.
 4 |  *
 5 |  * Redistribution and use in source and binary forms, with or without
 6 |  * modification, are permitted provided that the following conditions are met:
 7 |  *
 8 |  * Redistributions of source code must retain the above copyright notice, this
 9 |  * list of conditions and the following disclaimer.  Redistributions in binary
10 |  * form must reproduce the above copyright notice, this list of conditions and the
11 |  * following disclaimer in the documentation and/or other materials provided with
12 |  * the distribution.  Neither the name of the University of California, Berkeley
13 |  * nor the names of its contributors may be used to endorse or promote products
14 |  * derived from this software without specific prior written permission.  THIS
15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
25 |  */
26 | 
27 | package ClusterSchedulingSimulation
28 | 
29 | object Seed {
30 |   private var seed: Long = 0
31 |   def set(newSeed: Long) = {seed = newSeed}
32 |   def apply() = seed
33 | }
34 | 


--------------------------------------------------------------------------------
/src/main/scala/Workloads.scala:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | package ClusterSchedulingSimulation
 28 | 
 29 | import java.io.File
 30 | 
 31 | /**
 32 |  * Set up workloads based on measurements from a real cluster.
 33 |  * In the Eurosys paper, we used measurements from Google clusters here.
 34 |  */
 35 | object Workloads {
 36 |   /**
 37 |    * Set up CellStateDescs that will go into WorkloadDescs. Fabricated
 38 |    * numbers are provided as an example. Enter numbers based on your
 39 |    * own clusters instead.
 40 |    */
 41 |   val exampleCellStateDesc = new CellStateDesc(numMachines = 10000,
 42 |                                           cpusPerMachine = 4,
 43 |                                           memPerMachine = 16)
 44 | 
 45 | 
 46 |   /**
 47 |    * Set up WorkloadDescs, containing generators of workloads and
 48 |    * pre-fill workloads based on measurements of cells/workloads.
 49 |    */
 50 |   val exampleWorkloadGeneratorBatch =
 51 |       new ExpExpExpWorkloadGenerator(workloadName = "Batch".intern(),
 52 |                                      initAvgJobInterarrivalTime = 10.0,
 53 |                                      avgTasksPerJob = 100.0,
 54 |                                      avgJobDuration = (100.0),
 55 |                                      avgCpusPerTask = 1.0,
 56 |                                      avgMemPerTask = 2.0)
 57 |   val exampleWorkloadGeneratorService =
 58 |       new ExpExpExpWorkloadGenerator(workloadName = "Service".intern(),
 59 |                                      initAvgJobInterarrivalTime = 20.0,
 60 |                                      avgTasksPerJob = 10.0,
 61 |                                      avgJobDuration = (500.0),
 62 |                                      avgCpusPerTask = 1.0,
 63 |                                      avgMemPerTask = 2.0)
 64 |   val exampleWorkloadDesc = WorkloadDesc(cell = "example",
 65 |                                       assignmentPolicy = "CMB_PBB",
 66 |                                       workloadGenerators =
 67 |                                           exampleWorkloadGeneratorBatch ::
 68 |                                           exampleWorkloadGeneratorService :: Nil,
 69 |                                       cellStateDesc = exampleCellStateDesc)
 70 | 
 71 | 
 72 |   // example pre-fill workload generators.
 73 |   val examplePrefillTraceFileName = "traces/example-init-cluster-state.log"
 74 |   assert((new File(examplePrefillTraceFileName)).exists())
 75 |   val exampleBatchPrefillTraceWLGenerator =
 76 |       new PrefillPbbTraceWorkloadGenerator("PrefillBatch",
 77 |                                            examplePrefillTraceFileName)
 78 |   val exampleServicePrefillTraceWLGenerator =
 79 |       new PrefillPbbTraceWorkloadGenerator("PrefillService",
 80 |                                           examplePrefillTraceFileName)
 81 |   val exampleBatchServicePrefillTraceWLGenerator =
 82 |       new PrefillPbbTraceWorkloadGenerator("PrefillBatchService",
 83 |                                           examplePrefillTraceFileName)
 84 | 
 85 |   val exampleWorkloadPrefillDesc =
 86 |       WorkloadDesc(cell = "example",
 87 |                    assignmentPolicy = "CMB_PBB",
 88 |                    workloadGenerators =
 89 |                        exampleWorkloadGeneratorBatch ::
 90 |                        exampleWorkloadGeneratorService ::
 91 |                        Nil,
 92 |                    cellStateDesc = exampleCellStateDesc,
 93 |                    prefillWorkloadGenerators =
 94 |                        List(exampleBatchServicePrefillTraceWLGenerator))
 95 | 
 96 | 
 97 |   // Set up an example workload whose job interarrival times are
 98 |   // drawn from a trace.
 99 |   val exampleInterarrivalTraceFileName = "traces/job-distribution-traces/" +
100 |       "example_interarrival_cmb.log"
101 |   val exampleNumTasksTraceFileName = "traces/job-distribution-traces/" +
102 |       "example_csizes_cmb.log"
103 |   val exampleJobDurationTraceFileName = "traces/job-distribution-traces/" +
104 |       "example_runtimes_cmb.log"
105 |   assert((new File(exampleInterarrivalTraceFileName)).exists())
106 |   assert((new File(exampleNumTasksTraceFileName)).exists())
107 |   assert((new File(exampleJobDurationTraceFileName)).exists())
108 | 
109 |   // A workload based on traces of interarrival times, tasks-per-job,
110 |   // and job duration. Task shapes now based on pre-fill traces.
111 |   val exampleWorkloadGeneratorTraceAllBatch =
112 |       new TraceAllWLGenerator(
113 |           "Batch".intern(),
114 |           exampleInterarrivalTraceFileName,
115 |           exampleNumTasksTraceFileName,
116 |           exampleJobDurationTraceFileName,
117 |           examplePrefillTraceFileName,
118 |           maxCpusPerTask = 3.9, // Machines in example cluster have 4 CPUs.
119 |           maxMemPerTask = 15.9) // Machines in example cluster have 16GB mem.
120 | 
121 |   val exampleWorkloadGeneratorTraceAllService =
122 |       new TraceAllWLGenerator(
123 |           "Service".intern(),
124 |           exampleInterarrivalTraceFileName,
125 |           exampleNumTasksTraceFileName,
126 |           exampleJobDurationTraceFileName,
127 |           examplePrefillTraceFileName,
128 |           maxCpusPerTask = 3.9,
129 |           maxMemPerTask = 15.9)
130 | 
131 |   val exampleTraceAllWorkloadPrefillDesc =
132 |       WorkloadDesc(cell = "example",
133 |                    assignmentPolicy = "CMB_PBB",
134 |                    workloadGenerators =
135 |                        exampleWorkloadGeneratorTraceAllBatch ::
136 |                        exampleWorkloadGeneratorTraceAllService ::
137 |                        Nil,
138 |                    cellStateDesc = exampleCellStateDesc,
139 |                    prefillWorkloadGenerators =
140 |                       List(exampleBatchServicePrefillTraceWLGenerator))
141 | }
142 | 
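
The example generators above are parameterised by averages such as `initAvgJobInterarrivalTime = 10.0`. As a rough illustration of how an average interarrival time can drive job arrivals, the sketch below draws exponentially distributed gaps by inverse-CDF sampling. It assumes the "Exp" in `ExpExpExpWorkloadGenerator` refers to exponential distributions; `ExpSamplingSketch`, `sampleExp`, and `arrivals` are hypothetical names, and this is not the repository's implementation.

```scala
import java.util.Random

// Hypothetical standalone sketch; not the repository's workload generator.
object ExpSamplingSketch {
  // Inverse-CDF sample from an exponential distribution with the given mean.
  // nextDouble() lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite.
  def sampleExp(mean: Double, rng: Random): Double =
    -mean * math.log(1.0 - rng.nextDouble())

  // Accumulate arrival times until the time window is exhausted.
  // The same seed always produces the same arrival sequence.
  def arrivals(avgInterarrival: Double,
               window: Double,
               seed: Long): List[Double] = {
    val rng = new Random(seed)
    val out = collection.mutable.ListBuffer[Double]()
    var t = sampleExp(avgInterarrival, rng)
    while (t < window) {
      out += t
      t += sampleExp(avgInterarrival, rng)
    }
    out.toList
  }

  def main(args: Array[String]): Unit = {
    val times = arrivals(avgInterarrival = 10.0, window = 1000.0, seed = 0L)
    println("jobs generated: " + times.length)
  }
}
```

Passing the seed explicitly keeps runs reproducible, which is the same purpose the `Seed` object in Util.scala appears to serve for the simulator.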


--------------------------------------------------------------------------------
/src/test/scala/TestSimulations.scala:
--------------------------------------------------------------------------------
  1 | /**
  2 |  * Copyright (c) 2013, Regents of the University of California
  3 |  * All rights reserved.
  4 |  *
  5 |  * Redistribution and use in source and binary forms, with or without
  6 |  * modification, are permitted provided that the following conditions are met:
  7 |  *
  8 |  * Redistributions of source code must retain the above copyright notice, this
  9 |  * list of conditions and the following disclaimer.  Redistributions in binary
 10 |  * form must reproduce the above copyright notice, this list of conditions and the
 11 |  * following disclaimer in the documentation and/or other materials provided with
 12 |  * the distribution.  Neither the name of the University of California, Berkeley
 13 |  * nor the names of its contributors may be used to endorse or promote products
 14 |  * derived from this software without specific prior written permission.  THIS
 15 |  * SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
 16 |  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 17 |  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 18 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 19 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 20 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 21 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 22 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 23 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 24 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 25 |  */
 26 | 
 27 | import org.scalatest.FunSuite
 28 | 
 29 | import ClusterSchedulingSimulation.Workload
 30 | import ClusterSchedulingSimulation.WorkloadDesc
 31 | import ClusterSchedulingSimulation.Job
 32 | import ClusterSchedulingSimulation.UniformWorkloadGenerator
 33 | import ClusterSchedulingSimulation.CellState
 34 | 
 35 | import ClusterSchedulingSimulation.ClusterSimulator
 36 | import ClusterSchedulingSimulation.MonolithicScheduler
 37 | 
 38 | import ClusterSchedulingSimulation.MesosSimulator
 39 | import ClusterSchedulingSimulation.MesosScheduler
 40 | import ClusterSchedulingSimulation.MesosAllocator
 41 | 
 42 | import ClusterSchedulingSimulation.ClaimDelta
 43 | import ClusterSchedulingSimulation.OmegaSimulator
 44 | import ClusterSchedulingSimulation.OmegaScheduler
 45 | 
 46 | import ClusterSchedulingSimulation.PrefillPbbTraceWorkloadGenerator
 47 | import ClusterSchedulingSimulation.InterarrivalTimeTraceExpExpWLGenerator
 48 | 
 49 | import collection.mutable.HashMap
 50 | import collection.mutable.ListBuffer
 51 | import sys.process._
 52 | 
 53 | class SimulatorsTestSuite extends FunSuite {
 54 |   /**
 55 |    * Monolithic simulator tests.
 56 |    */
 57 |   test("MonolithicSimulatorTest") {
 58 |     println("\n\n\n=====================")
 59 |     println("Testing Monolithic simulator functionality.")
 60 |     println("=====================\n\n")
 61 |     // Build a workload manually.
 62 |     var workload = new Workload("unif")
 63 |     val numJobs = 4 // Don't change this unless you update the
 64 |                     // hand calculations used in assert()-s below.
 65 | 
 66 |     (1 to numJobs).foreach(i => {
 67 |       workload.addJob(new Job(id = i,
 68 |                                    submitted = 0,
 69 |                                    numTasks = i,
 70 |                                    taskDuration = i,
 71 |                                    workloadName = workload.name,
 72 |                                    cpusPerTask = 1.0,
 73 |                                    memPerTask = 1.0))
 74 |     })
 75 |     assert(workload.numJobs == numJobs)
 76 |     assert(workload.getJobs.last.id == numJobs)
 77 | 
 78 |     // Create a simple scheduler.
 79 |     val scheduler = new MonolithicScheduler("simple_sched",
 80 |                                             Map("unif" -> 1),
 81 |                                             Map("unif" -> 1))
 82 | 
 83 |     // Set up a CellState, conflictMode and transactionMode shouldn't 
 84 |     // matter for a Monolithic Simulator.
 85 |     val monolithicCellState = new CellState(1, 10.0, 20.0,
 86 |                                             conflictMode = "sequence-numbers",
 87 |                                             transactionMode = "all-or-nothing")
 88 | 
 89 |     val monolithicSimulator = new ClusterSimulator(
 90 |         monolithicCellState,  
 91 |         Map(scheduler.name -> scheduler),
 92 |         Map("unif" -> Seq("simple_sched")),
 93 |         List(workload),
 94 |         List(),
 95 |         logging = true,
 96 |         monitorUtilization = false)
 97 | 
 98 |     assert(monolithicSimulator.schedulers.size == 1)
 99 |     assert(monolithicSimulator.workloadToSchedulerMap.size == 1)
100 |     assert(monolithicSimulator.agendaSize == numJobs)
101 | 
102 |     // Test that run() empties the Simulator's priority queue of WorkItem-s
103 |     monolithicSimulator.run()
104 |     println(monolithicSimulator.currentTime)
105 |     println(numJobs + (1 to numJobs).sum)
106 |     assert(monolithicSimulator.agendaSize == 0)
107 |     // Last task finishes scheduling at (numJobs + (1 to numJobs).sum),
108 |     // and runs for 3 seconds (since jobs are numbered 0 to 3 and each job has
109 |     // task duration set to its id number).
110 |     assert(monolithicSimulator.currentTime ==
111 |            numJobs + (1 to numJobs).sum + numJobs - 1)
112 |   }
113 | 
114 |   test("testStats") {
115 |     var workload = new Workload("unif")
116 |     val numJobs = 4 // Don't change this unless you update the
117 |                     // hand calculations used in assert()-s below.
118 | 
119 |     workload = new Workload("unif")
120 |     (1 to numJobs).foreach(i => {
121 |       workload.addJob(Job(id = i,
122 |                                submitted = i,
123 |                                numTasks = i,
124 |                                taskDuration = i,
125 |                                workloadName = workload.name,
126 |                                cpusPerTask = 1.0,
127 |                                memPerTask = 1.0))
128 |     })
129 | 
130 |     // Create a simple scheduler.
131 |     val scheduler = new MonolithicScheduler("simple_sched",
132 |                                             Map("unif" -> 1),
133 |                                             Map("unif" -> 1))
134 | 
135 |     // Set up a CellState
136 |     val monolithicCellState = new CellState(1, 10.0, 20.0,
137 |                                             conflictMode = "sequence-numbers",
138 |                                             transactionMode = "all-or-nothing")
139 | 
140 |     val monolithicSimulator = new ClusterSimulator(
141 |           monolithicCellState,
142 |           Map(scheduler.name -> scheduler),
143 |           Map("unif" -> Seq("simple_sched")),
144 |           List(workload),
145 |           List(),
146 |           logging = true,
147 |           monitorUtilization = false)
148 | 
149 |     monolithicSimulator.run()
150 | 
151 |     // Test the workload stats.
152 |     workload.getJobs.foreach(job => {
153 |       println(job.usefulTimeScheduling)
154 |       assert(job.usefulTimeScheduling == 1 + job.id * 1)
155 |     })
156 |     println(workload.jobUsefulThinkTimesPercentile(0.9))
157 |     assert(workload.jobUsefulThinkTimesPercentile(0.9) == ((numJobs + 1) * 0.9).toInt)
158 |     // Job queue times:
159 |     // job 1 arrives 1, thinktime 2, finishes at 3, queued 0
160 |     // job 2 arrives 2, thinktime 3, finishes at 6, queued 1
161 |     // job 3 arrives 3, thinktime 4, finishes at 10, queued 3
162 |     // job 4 arrives 4, thinktime 5, finishes at 15, queued 6
163 |     assert(workload.avgJobQueueTimeTillFirstScheduled ==
164 |            (0.0 + 1.0 + 3.0 + 6.0)/4.0)
165 |     println(workload.jobQueueTimeTillFirstScheduledPercentile(0.9))
166 |     val array = Array[Double](0.0, 1.0, 3.0, 6.0)
167 |     assert(workload.jobQueueTimeTillFirstScheduledPercentile(0.9) ==
168 |            array((3 *0.9).toInt))
169 |   }
170 | 
171 |   /**
172 |    * Mesos simulator tests.
173 |    */
174 | 
175 |   // The following test exercises functionality that is not yet implemented,
176 |   // so we currently expect it to fail.
177 |   test("mesosSimulatorSingleSchedulerZeroResourceJobsTest") {
178 |     println("\n\n\n=====================")
179 |     println("Testing Mesos simulator functionality.")
180 |     println("=====================\n\n")
181 | 
182 |     var workload = new Workload("unif")
183 |     val numJobs = 40 
184 |     (1 to numJobs).foreach(i => {
185 |       workload.addJob(Job(id = i,
186 |                                submitted = i,
187 |                                numTasks = i,
188 |                                taskDuration = i,
189 |                                workloadName = workload.name,
190 |                                cpusPerTask = 1.0,
191 |                                memPerTask = 1.0))
192 |     })
193 | 
194 |     // Create a simple scheduler and turn off partial job scheduling
195 |     // so that we can check our think time calculations by just
196 |     // assuming that each job only has to be scheduled once.
197 |     // An alternative to this would be to just increase the size
198 |     // of the test cell (e.g., to 1000cpus and 1000mem),
199 |     // which would allow all of the jobs to fit simultaneously.
200 |     val scheduler = new MesosScheduler(name = "mesos_test_sched",
201 |                                        constantThinkTimes = Map("unif" -> 1),
202 |                                        perTaskThinkTimes = Map("unif" -> 1),
203 |                                        schedulePartialJobs = false)
204 | 
205 |     // Set up a CellState with plenty of space.
206 |     val mesosCellState = new CellState(1, 100.0, 200.0,
207 |                                        conflictMode = "resource-fit",
208 |                                        transactionMode = "all-or-nothing")
209 | 
210 |     // Create a round-robin allocator
211 |     val allocatorConstantThinkTime = 1.0
212 |     val mesosDRFAllocator = new MesosAllocator(allocatorConstantThinkTime)
213 | 
214 |     val mesosSimulator = new MesosSimulator(
215 |           mesosCellState,
216 |           Map(scheduler.name -> scheduler),
217 |           Map("unif" -> Seq("mesos_test_sched")),
218 |           List(workload),
219 |           List(),
220 |           mesosDRFAllocator,
221 |           logging = true,
222 |           monitorUtilization = false)
223 | 
224 |     mesosSimulator.run()
225 |     assert(mesosSimulator.agendaSize == 0, ("Mesos Agenda should have been " +
226 |                                             "zero when simulator finished " +
227 |                                             "running, but was %d.")
228 |                                            .format(mesosSimulator.agendaSize))
229 |     // Each job has constant think time of 1, and per task think of 1.
230 |     // Thus, job i (which has i tasks) will think for 1 + i seconds.
231 |     // So we have numJobs seconds from the constant term of each job,
232 |     // and 1 + 2 + ... + numJobs seconds from the per-task term.
233 |     assert(workload.totalJobUsefulThinkTimes == numJobs + (1 to numJobs).sum,
234 |            ("totalJobThinkTimes should have been %d, but was %f. THIS TEST " +
235 |             "EXERCISES FUNCTIONALITY THAT IS NOT YET IMPLEMENTED, SO WE " +
236 |             "CURRENTLY EXPECT IT TO FAIL").format(
237 |            numJobs + (1 to numJobs).sum, workload.totalJobUsefulThinkTimes))
238 |     // For .75 percentile we should see 1 + i (see comment above)
239 |     // where i = 40 * .75.
240 |     assert(workload.jobUsefulThinkTimesPercentile(.75) == 1 + (40 * .75).toInt,
241 |            ("Expected jobUsefulThinkTimesPercentile(0.75) to be %d, " +
242 |             "but it was %f.")
243 |            .format(1 + (40 * .75).toInt,
244 |                    workload.jobUsefulThinkTimesPercentile(.75)))
245 |   }
246 | 
247 |   test("MesosAllocatorTest") {
248 |     val testAllocator = new MesosAllocator(12)
249 |     assert(testAllocator.getThinkTime == 12)
250 |   }
251 | 
252 |   /**
253 |    * Omega simulator tests.
254 |    */
255 |   test("omegaSimulatorCellStateSyncApplyDeltaAndCommitTest") {
256 |     println("\n\n\n=====================")
257 |     println("omegaSimulatorCellStateSyncApplyDeltaAndCommitTest")
258 |     println("=====================\n\n")
259 |     println("\nRunning cellstate functionality test.")
260 |     // Set up a workload with one job with one task.
261 |     var workload = new Workload("unif")
262 |     workload.addJob(Job(id = 1,
263 |                         submitted = 1.0,
264 |                         numTasks = 1,
265 |                         taskDuration = 10.0,
266 |                         workloadName = workload.name,
267 |                         cpusPerTask = 1.0,
268 |                         memPerTask = 1.0))
269 | 
270 |     // Create an Omega scheduler.
271 |     val scheduler = new OmegaScheduler(name = "omega_test_sched",
272 |                                        constantThinkTimes = Map("unif" -> 1),
273 |                                        perTaskThinkTimes = Map("unif" -> 1))
274 | 
275 |     // Set up a CellState.
276 |     val commonCellState = new CellState(numMachines = 10,
277 |                                         cpusPerMachine = 1.0,
278 |                                         memPerMachine = 2.0,
279 |                                         conflictMode = "sequence-numbers",
280 |                                         transactionMode = "all-or-nothing")
281 | 
282 |     // Set up a Simulator.
283 |     val omegaSimulator = new OmegaSimulator(
284 |           commonCellState,
285 |           Map(scheduler.name -> scheduler),
286 |           Map("unif" -> Seq("omega_test_sched")),
287 |           List(workload),
288 |           List(),
289 |           logging = true,
290 |           monitorUtilization = false)
291 | 
292 |     // Create a private copy of cellstate.
293 |     val privateCellState = commonCellState.copy
294 |     assert(privateCellState.numMachines == commonCellState.numMachines)
295 |     assert(privateCellState.cpusPerMachine == commonCellState.cpusPerMachine)
296 |     assert(privateCellState.memPerMachine == commonCellState.memPerMachine)
297 |     // Test that the per machine state was successfully copied.
298 |     (0 to commonCellState.allocatedCpusPerMachine.length - 1).foreach{ i => {
299 |       assert(privateCellState.allocatedCpusPerMachine(i) ==
300 |              commonCellState.allocatedCpusPerMachine(i))
301 |     }}
302 |     assert(privateCellState.machineSeqNums(0) == 0)
303 |     // Make changes to the private cellstate by creating and applying a delta.
304 |     val claimDelta = new ClaimDelta(scheduler,
305 |                                     machineID = 0,
306 |                                     privateCellState.machineSeqNums(0),
307 |                                     duration = 10,
308 |                                     cpus = 0.25,
309 |                                     mem = 0.75)
310 |     claimDelta.apply(privateCellState, false)
311 |     // Check that the machine's sequence number was incremented in private cellstate.
312 |     assert(privateCellState.machineSeqNums(0) == 1)
313 |     // Check that changes to private cellstate stuck.
314 |     assert(privateCellState.availableCpusPerMachine(0) == 1.0 - 0.25)
315 |     assert(privateCellState.availableMemPerMachine(0) == 2.0 - 0.75)
316 |     assert(privateCellState.allocatedCpusPerMachine(0) == 0.25)
317 |     assert(privateCellState.allocatedMemPerMachine(0) == 0.75)
318 |     // Check that common cellstate didn't change yet.
319 |     assert(commonCellState.availableCpusPerMachine(0) == 1.0,
320 |            ("commonCellState should have 1.0 cpus available on machine 0, " +
321 |             "but only has %f.").format(commonCellState.availableCpusPerMachine(0)))
322 |     assert(commonCellState.availableMemPerMachine(0) == 2.0)
323 |     assert(commonCellState.allocatedCpusPerMachine(0) == 0.0)
324 |     assert(commonCellState.allocatedMemPerMachine(0) == 0.0)
325 | 
326 |     // Commit the changes back to common cellstate.
327 |     commonCellState.commit(Seq(claimDelta))
328 |     // Check that changes to common cellstate stuck.
329 |     assert(commonCellState.availableCpusPerMachine(0) == 1.0 - 0.25)
330 |     assert(commonCellState.availableMemPerMachine(0) == 2.0 - 0.75)
331 |     assert(commonCellState.allocatedCpusPerMachine(0) == 0.25)
332 |     assert(commonCellState.allocatedMemPerMachine(0) == 0.75)
333 |     assert(commonCellState.machineSeqNums(0) == 1)
334 | 
335 |     // Set up two new private cellstates.
336 |     val privateCellState1 = commonCellState.copy
337 |     val privateCellState2 = commonCellState.copy
338 |     assert(privateCellState1.machineSeqNums(0) == 1)
339 |     assert(privateCellState2.machineSeqNums(0) == 1)
340 | 
341 |     // Make parallel changes in both private cellstates that
342 |     // should cause a conflict in all-or-nothing conflict-mode.
343 |     val claimDelta1 = new ClaimDelta(scheduler,
344 |                                      machineID = 0,
345 |                                      privateCellState1.machineSeqNums(0),
346 |                                      duration = 10,
347 |                                      cpus = 0.25,
348 |                                      mem = 0.75)
349 |     claimDelta1.apply(privateCellState1, false)
350 |     assert(privateCellState1.machineSeqNums(0) == 2)
351 | 
352 |     // Check that the other private cellstate didn't change yet.
353 |     assert(privateCellState2.availableCpusPerMachine(0) == 1.0 - 0.25)
354 |     assert(privateCellState2.availableMemPerMachine(0) == 2.0 - 0.75)
355 |     assert(privateCellState2.allocatedCpusPerMachine(0) == 0.25)
356 |     assert(privateCellState2.allocatedMemPerMachine(0) == 0.75)
357 |     assert(privateCellState2.machineSeqNums(0) == 1)
358 | 
359 |     val claimDelta2 = new ClaimDelta(scheduler,
360 |                                      machineID = 0,
361 |                                      privateCellState2.machineSeqNums(0),
362 |                                      duration = 10,
363 |                                      cpus = 0.25,
364 |                                      mem = 0.75)
365 |     claimDelta2.apply(privateCellState2, false)
366 | 
367 |     // Commit the changes from the first private cellstate to common cellstate.
368 |     assert(commonCellState.commit(Seq(claimDelta1)).conflictedDeltas.length == 0)
369 |     // Commit the changes from the second private cellstate and check
370 |     // that it conflicts and doesn't change common cellstate.
371 |     assert(commonCellState.commit(Seq(claimDelta2)).conflictedDeltas.length > 0)
372 |     assert(commonCellState.availableCpusPerMachine(0) == 1.0 - 2 * 0.25)
373 |     assert(commonCellState.availableMemPerMachine(0) == 2.0 - 2 * 0.75)
374 |     assert(commonCellState.allocatedCpusPerMachine(0) == 2 * 0.25)
375 |     assert(commonCellState.allocatedMemPerMachine(0) == 2 * 0.75)
376 |     assert(commonCellState.machineSeqNums(0) == 2)
377 |   }
378 | 
379 |   test("omegaSchedulerTest") {
380 |     println("===========\nomegaSchedulerTest\n==========")
381 |     println("\nRunning cellstate flow test.")
382 |     val workload = new Workload("unif")
383 |     workload.addJob(Job(id = 1,
384 |                              submitted = 1.0,
385 |                              numTasks = 1,
386 |                              taskDuration = 10.0,
387 |                              workloadName = "unif",
388 |                              cpusPerTask = 1.0,
389 |                              memPerTask = 1.0))
390 | 
391 |     val scheduler = new OmegaScheduler(name = "omega_test_sched",
392 |                                        constantThinkTimes = Map("unif" -> 1),
393 |                                        perTaskThinkTimes = Map("unif" -> 1))
394 | 
395 |     val commonCellState = new CellState(numMachines = 20,
396 |                                         cpusPerMachine = 1.0,
397 |                                         memPerMachine = 1.0,
398 |                                         conflictMode = "sequence-numbers",
399 |                                         transactionMode = "all-or-nothing")
400 | 
401 |     val omegaSimulator = new OmegaSimulator(commonCellState,
402 |                                             Map(scheduler.name -> scheduler),
403 |                                             Map("unif" -> Seq("omega_test_sched")),
404 |                                             List(workload),
405 |                                             List(),
406 |                                             logging = true,
407 |                                             monitorUtilization = false)
408 | 
409 |     // The job should be scheduled as soon as it is added to the scheduler.
410 |     println("adding a job to scheduler.")
411 |     scheduler.addJob(workload.getJobs.head)
412 |     assert(scheduler.scheduling)
413 |     assert(scheduler.jobQueueSize == 0)
414 |     println("added job to scheduler.")
415 |   }
416 | 
417 |   test("omegaSimulatorRunWithSingleSchedulerTest") {
418 |     println("===========\nomegaSimulatorRunWithSingleSchedulerTest\n===========")
419 |     println("\nRunning cellstate run w/ single scheduler test.")
420 |     // Set up a workload with 40 jobs, each with 1 task.
421 |     val workload = new Workload("unif")
422 |     val numJobs = 40 
423 |     (1 to numJobs).foreach(i => {
424 |       workload.addJob(Job(id = i,
425 |                                submitted = i,
426 |                                numTasks = 1,
427 |                                taskDuration = i,
428 |                                workloadName = workload.name,
429 |                                cpusPerTask = 1.0,
430 |                                memPerTask = 1.0))
431 |     })
432 | 
433 |     // Create an Omega scheduler.
434 |     val scheduler = new OmegaScheduler(name = "omega_test_sched",
435 |                                        constantThinkTimes = Map("unif" -> 1),
436 |                                        perTaskThinkTimes = Map("unif" -> 1))
437 | 
438 |     // Set up a CellState.
439 |     val commonCellState = new CellState(numMachines = 1000,
440 |                                         cpusPerMachine = 1.0,
441 |                                         memPerMachine = 1.0,
442 |                                         conflictMode = "sequence-numbers",
443 |                                         transactionMode = "all-or-nothing")
444 | 
445 |     // Set up a Simulator.
446 |     val omegaSimulator = new OmegaSimulator(
447 |           commonCellState,
448 |           Map(scheduler.name -> scheduler),
449 |           Map("unif" -> Seq("omega_test_sched")),
450 |           List(workload),
451 |           List(),
452 |           logging = true,
453 |           monitorUtilization = false)
454 | 
455 |     omegaSimulator.run()
456 |     // Each job takes two seconds to schedule, since every job has one
457 |     // task and think time = C + L * 1 = 1 + 1 = 2. Jobs arrive one per second
458 |     // from time 1, so job i is scheduled at time 2 * i + 1. Job 40 is scheduled
459 |     // at time 40 * 2 + 1 = 81 and runs for 40 seconds, so the simulator should
460 |     // finish at time 81 + 40 = 121, when the final task finishes running.
461 |     assert(omegaSimulator.currentTime == 121,
462 |            "Simulation ran for %f seconds, but should have run for %d"
463 |            .format(omegaSimulator.currentTime, 121))
464 |   }
465 |   
466 |   test("UniformWorkloadGeneratorTest") {
467 |     println("\nRunning Uniform workload generator test.")
468 |     // create a new WorkloadGenerator
469 |     val workloadGen =
470 |         new UniformWorkloadGenerator(workloadName = "test_wl",
471 |                                      initJobInterarrivalTime = 1.0,
472 |                                      tasksPerJob = 2,
473 |                                      jobDuration = 3.0,
474 |                                      cpusPerTask = 4.0,
475 |                                      memPerTask = 5.0)
476 | 
477 |     // Test newWorkload.
478 |     val workload = workloadGen.newWorkload(100.0)
479 |     assert(workload.numJobs == 100, "numJobs was %d, should have been %d"
480 |                                     .format(workload.numJobs, 100))
481 |     for(j <- workload.getJobs) {
482 |       assert(j.numTasks == 2.0)
483 |       assert(j.taskDuration == 3.0)
484 |       assert(j.cpusPerTask == 4.0)
485 |       assert(j.memPerTask == 5.0)
486 |     }
487 | 
488 |     // Test newJob.
489 |     val job = workloadGen.newJob(2003.0)
490 |     assert(job.submitted == 2003.0)
491 |     assert(job.numTasks == 2.0)
492 |     assert(job.taskDuration == 3.0)
493 |     assert(job.cpusPerTask == 4.0)
494 |     assert(job.memPerTask == 5.0)
495 |   }
496 | 
497 |   test("PrefillWorkloadGeneratorTest") {
498 |     println("\nRunning example prefill workload generator test.")
499 |     val filename = "traces/example-init-cluster-state.log"
500 | 
501 |     // Load Service jobs.
502 |     val servicePrefillTraceWLGenerator = new PrefillPbbTraceWorkloadGenerator(
503 |         "PrefillService", filename)
504 |     val prefillServiceWL = servicePrefillTraceWLGenerator.newWorkload(1000.0)
505 |     // Cross validated by running at command line:
506 |     val numServiceJobsInFile
507 |       = Seq("awk",
508 |             "$1 == 11 && $4 == 1 && $5 != 0 && $5 != 1",
509 |             filename)
510 |            .!!.split("\n").length
511 |     assert(prefillServiceWL.numJobs == numServiceJobsInFile,
512 |            ("Expected to find %d prefill service jobs from tracefile " +
513 |             "%s, but found %d.")
514 |            .format(numServiceJobsInFile, filename, prefillServiceWL.numJobs))
515 |     for (j <- prefillServiceWL.getJobs) {
516 |       assert(j.submitted == 0)
517 |     }
518 | 
519 |     // Load batch jobs.
520 |     val batchPrefillTraceWLGenerator = new PrefillPbbTraceWorkloadGenerator(
521 |         "PrefillBatch", filename)
522 |     val prefillBatchWL = batchPrefillTraceWLGenerator.newWorkload(1000.0)  
523 |     val numBatchJobsInFile
524 |       = Seq("awk",
525 |             "$1 == 11 && ($4 != 1 || $5 == 0 || $5 == 1)",
526 |             filename)
527 |            .!!.split("\n").length
528 |     assert(prefillBatchWL.numJobs == numBatchJobsInFile,
529 |            ("Expected to find %d prefill batch jobs from tracefile %s, " +
530 |              "but found %d.")
531 |            .format(numBatchJobsInFile, filename, prefillBatchWL.numJobs))
532 |   }
533 | }
534 | 


--------------------------------------------------------------------------------
/traces/README.txt:
--------------------------------------------------------------------------------
 1 | == Schema of cell input file ==
 2 | Fields are space delimited. Each row represents a job scheduling event.
 3 | 
 4 | Each row in our traces belongs to one of two schemas. One schema has 6 columns and the other has 8 columns. We describe both schemas below. The first five columns are the same in both schemas.
 5 | 
 6 | === Common Columns ===
 7 | Column 0: possible values are 11 or 12
 8 |   "11" - a job that was present at the beginning of the time window
 9 |   "12" - a job that was present at the beginning of the time window and ended at [timestamp] (see Column 1)
10 | Column 1: timestamp
11 | Column 2: unique job ID
12 | Column 3: 0 or 1 - prod_job - boolean flag indicating if this job is "production" priority as described in [1]
13 | Column 4: 0, 1, 2, or 3 - sched_class - see description of "Scheduling Class" in [1]
14 | 
15 | === 6 column format ===
16 | Column 5: UNSPECIFIED/UNUSED
17 | 
18 | === 8 column format ===
19 | Column 5: number of tasks
20 | Column 6: aggregate CPU usage of job (in number of cores)
21 | Column 7: aggregate RAM usage of job (in bytes)
22 | 
23 | == CMB_PBB split logic ==
24 | For our primary research evaluation in [2], we used the following job -> scheduler assignment policy, which we call CMB_PBB.
25 | 
26 | The following Python snippet captures the definition of CMB_PBB. It decides a job's scheduler assignment based on the job's prod_job and sched_class fields (see columns 3 and 4 above).
27 | 
28 | elif apol == 'cmb_pbb':
29 |    if prod_job and sched_class != 0 and sched_class != 1:
30 |      return 1
31 |    else:
32 |      return 0
33 | 
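The snippet above is an excerpt (it starts with `elif`), so it cannot be run on its own. A minimal standalone sketch of the same rule follows. The function name `cmb_pbb_scheduler_id` is hypothetical, and reading 1 as the service scheduler and 0 as the batch scheduler is taken from traces/job-distribution-traces/README.txt.

```python
def cmb_pbb_scheduler_id(prod_job, sched_class):
    """Return the scheduler id for a job under the CMB_PBB policy.

    prod_job:    column 3 of the trace (0 or 1).
    sched_class: column 4 of the trace (0, 1, 2, or 3).
    Returns 1 (service) for production jobs in scheduling class 2 or 3,
    and 0 (batch) for everything else.
    """
    if prod_job and sched_class != 0 and sched_class != 1:
        return 1
    return 0


if __name__ == "__main__":
    # (prod_job, sched_class) pairs taken from rows of the example trace files.
    for prod_job, sched_class in [(0, 2), (1, 1), (1, 3)]:
        print(prod_job, sched_class, cmb_pbb_scheduler_id(prod_job, sched_class))
```

Only a production job with sched_class 2 or 3 is assigned to scheduler 1; a non-production job goes to scheduler 0 regardless of its scheduling class.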
34 | == Example Lines ==
35 | 11 0.000000  623486366592 0 2 1 1 1074000000
36 | 12 12.602755 623486366592 0 2 82587
37 | 11 0.000000  158249529602 1 1 1 1 7286400
38 | 
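As an illustration of the two schemas, here is a minimal parsing sketch. It is not the simulator's own loader; the names `TraceRow` and `parse_trace_row` are hypothetical. It splits on whitespace, keeps the job ID as a string so leading zeros survive, and leaves the 8-column-only fields as None for 6-column rows.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the column descriptions above.
TraceRow = namedtuple(
    "TraceRow",
    "event timestamp job_id prod_job sched_class num_tasks cpus ram_bytes")


def parse_trace_row(line):
    """Parse one space-delimited trace row in the 6- or 8-column format."""
    fields = line.split()
    if len(fields) not in (6, 8):
        raise ValueError("expected 6 or 8 columns, got %d" % len(fields))
    event = int(fields[0])          # 11 or 12
    timestamp = float(fields[1])
    job_id = fields[2]              # kept as text to preserve leading zeros
    prod_job = int(fields[3])       # 0 or 1
    sched_class = int(fields[4])    # 0, 1, 2, or 3
    if len(fields) == 8:
        num_tasks = int(fields[5])
        cpus = float(fields[6])
        ram_bytes = float(fields[7])
    else:
        # Column 5 of the 6-column format is unspecified, so it is ignored.
        num_tasks = cpus = ram_bytes = None
    return TraceRow(event, timestamp, job_id, prod_job, sched_class,
                    num_tasks, cpus, ram_bytes)


if __name__ == "__main__":
    print(parse_trace_row("11 0.000000  623486366592 0 2 1 1 1074000000"))
    print(parse_trace_row("12 12.602755 623486366592 0 2 82587"))
```

Rows with any other column count raise a ValueError rather than being silently skipped, which makes malformed traces easier to spot.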
39 | == Bibliography ==
40 | [1] Google ClusterData 2011 traces, https://github.com/google/cluster-data/blob/master/ClusterData2011_2.md
41 | [2] Omega: flexible, scalable schedulers for large compute clusters, https://research.google.com/pubs/pub41684.html
42 | 


--------------------------------------------------------------------------------
/traces/example-init-cluster-state.log:
--------------------------------------------------------------------------------
1 | 11 0.000000 049475829738997701 1 3 1 0.17000000178813934 524288000
2 | 11 0.000000 610339428128088085 0 0 1 0.0010000000474974513 10485760
3 | 11 0.000000 856292140335049805 0 0 1000 0.10000000149011612 1074000000000
4 | 12 6.844149 610339428128088085 0 0 5.001143932342529
5 | 12 8.921186 856292140335049805 0 0 20.576528549194336
6 | 


--------------------------------------------------------------------------------
/traces/job-distribution-traces/README.txt:
--------------------------------------------------------------------------------
 1 | == Schema of trace files of job interarrival time, num_tasks, job_duration ==
 2 | 
 3 | These trace files should contain distributions of job interarrival time,
 4 | num_tasks, and job_duration from your cluster. The simulator will build
 5 | empirical distributions based on these files.
 6 | 
 7 | === Columns ===
 8 | Column 0: cluster_name
 9 | Column 1: assignment policy ("cmb-new" = "CMB_PBB")
10 | Column 2: scheduler id, values can be 0 or 1. 0 = batch, 1 = service
11 | Column 3: depending on which trace file:
12 |     interarrival time (seconds since last job arrival)
13 |     OR tasks in job
14 |     OR job runtime (seconds)
15 | 
16 | Traces may mix batch and service jobs, although the provided examples segregate them.
17 | 
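To make the idea of an empirical distribution concrete, the sketch below groups the values in column 3 by scheduler id and samples from them with a nearest-rank quantile. This is one simple way to do it and may differ from how the simulator builds its distributions; all function names here are hypothetical.

```python
import math
import random
from collections import defaultdict


def load_values_by_scheduler(lines):
    """Group column 3 values by scheduler id (column 2), sorted ascending."""
    values = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _cluster, _policy, sched_id, value = fields
        values[int(sched_id)].append(float(value))
    return {sched_id: sorted(vals) for sched_id, vals in values.items()}


def empirical_quantile(sorted_values, q):
    """Nearest-rank quantile: smallest value whose empirical CDF is >= q."""
    if not sorted_values:
        raise ValueError("no observations")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    index = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[index]


def sample(sorted_values, rng=random):
    """Draw one value by inverting the empirical CDF at a uniform point."""
    return empirical_quantile(sorted_values, rng.random())


if __name__ == "__main__":
    lines = [
        "example_cluster cmb-new 0 1",
        "example_cluster cmb-new 0 1600",
        "example_cluster cmb-new 1 1",
    ]
    by_sched = load_values_by_scheduler(lines)
    print(by_sched)
    print(sample(by_sched[0]))
```

Because values are grouped by scheduler id before sampling, a file that mixes batch and service rows still yields one distribution per scheduler.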


--------------------------------------------------------------------------------
/traces/job-distribution-traces/example_csizes_cmb.log:
--------------------------------------------------------------------------------
 1 | example_cluster cmb-new 0 1
 2 | example_cluster cmb-new 0 1
 3 | example_cluster cmb-new 0 1
 4 | example_cluster cmb-new 0 1600
 5 | example_cluster cmb-new 0 1
 6 | example_cluster cmb-new 0 1
 7 | example_cluster cmb-new 0 1
 8 | example_cluster cmb-new 0 1
 9 | example_cluster cmb-new 0 1
10 | example_cluster cmb-new 0 1
11 | example_cluster cmb-new 1 1
12 | example_cluster cmb-new 1 1
13 | example_cluster cmb-new 1 1
14 | example_cluster cmb-new 1 1
15 | example_cluster cmb-new 1 1
16 | example_cluster cmb-new 1 1
17 | example_cluster cmb-new 1 1
18 | example_cluster cmb-new 1 1
19 | example_cluster cmb-new 1 1
20 | example_cluster cmb-new 1 1
21 | 


--------------------------------------------------------------------------------
/traces/job-distribution-traces/example_interarrival_cmb.log:
--------------------------------------------------------------------------------
 1 | example_cluster cmb-new 0 2.517473
 2 | example_cluster cmb-new 0 1.295932
 3 | example_cluster cmb-new 0 1.618243
 4 | example_cluster cmb-new 0 0.024418
 5 | example_cluster cmb-new 0 0.060959
 6 | example_cluster cmb-new 0 1.291739
 7 | example_cluster cmb-new 0 0.020385
 8 | example_cluster cmb-new 0 2.001258
 9 | example_cluster cmb-new 0 0.090779
10 | example_cluster cmb-new 0 0.018862
11 | example_cluster cmb-new 1 340.360427
12 | example_cluster cmb-new 1 82.592528
13 | example_cluster cmb-new 1 1068.807106
14 | example_cluster cmb-new 1 50.258920
15 | example_cluster cmb-new 1 1205.063186
16 | example_cluster cmb-new 1 76.170744
17 | example_cluster cmb-new 1 884.422104
18 | example_cluster cmb-new 1 237.976148
19 | example_cluster cmb-new 1 107.145817
20 | example_cluster cmb-new 1 1048.466730
21 | 


--------------------------------------------------------------------------------
/traces/job-distribution-traces/example_runtimes_cmb.log:
--------------------------------------------------------------------------------
 1 | example_cluster cmb-new 0 1.727678
 2 | example_cluster cmb-new 0 24.602128
 3 | example_cluster cmb-new 0 21.330643
 4 | example_cluster cmb-new 0 18.513231
 5 | example_cluster cmb-new 0 15.296218
 6 | example_cluster cmb-new 0 13.420730
 7 | example_cluster cmb-new 0 9.506823
 8 | example_cluster cmb-new 0 41.397468
 9 | example_cluster cmb-new 0 43.686984
10 | example_cluster cmb-new 0 46.735950
11 | example_cluster cmb-new 1 101.121137
12 | example_cluster cmb-new 1 208.691963
13 | example_cluster cmb-new 1 110.178521
14 | example_cluster cmb-new 1 178.837062
15 | example_cluster cmb-new 1 101.111495
16 | example_cluster cmb-new 1 194.864865
17 | example_cluster cmb-new 1 176.516334
18 | example_cluster cmb-new 1 109.439388
19 | example_cluster cmb-new 1 243.861564
20 | example_cluster cmb-new 1 107.897281
21 | 


--------------------------------------------------------------------------------