├── power_trace_documentation.pdf
├── PowerData2019.md
├── ClusterData2019.md
├── TraceVersion1.md
├── README.md
├── ETAExplorationTraces.md
├── ClusterData2011_2.md
├── clusterdata_trace_format_v3.proto
├── clusterdata_analysis_colab.ipynb
├── power_trace_analysis_colab.ipynb
└── bibliography.bib

/power_trace_documentation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/cluster-data/HEAD/power_trace_documentation.pdf
--------------------------------------------------------------------------------
/PowerData2019.md:
--------------------------------------------------------------------------------
1 | # PowerData 2019 traces
2 | 
3 | The `powerdata-2019` trace dataset provides power utilization information for 57
4 | power domains in Google data centers. Two of these power domains are from cells
5 | in data centers with the new medium voltage power plane design. The remainder
6 | belong to the eight cells featured in the
7 | [2019 Cluster Data trace](ClusterData2019.md).
8 | 
9 | Please see [the documentation](power_trace_documentation.pdf) for details on
10 | what's in this dataset and how to access it. For additional background, please
11 | refer to the paper [Data Center Power Oversubscription with a Medium Voltage
12 | Power Plane and Priority-Aware
13 | Capping](https://research.google/pubs/data-center-power-oversubscription-with-a-medium-voltage-power-plane-and-priority-aware-capping/).
14 | 
15 | Also included is a [colab](power_trace_analysis_colab.ipynb) recreating the
16 | figures as an example of how to query the data.
17 | 
18 | ## Notes
19 | 
20 | If you use this data, we'd appreciate it if you cite the paper and let us know about
21 | your work! The best way to do so is through the mailing list.
22 | 
23 | * If you haven't already joined our
24 | [mailing list](https://groups.google.com/forum/#!forum/googleclusterdata-discuss),
25 | please do so now. *Important: to avoid spammers, you MUST fill out the
26 | "reason" field, or your application will be rejected.*
27 | 
28 | ![Creative Commons CC-BY license](https://i.creativecommons.org/l/by/4.0/88x31.png)
29 | The data and trace documentation are made available under the
30 | [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. By downloading
31 | or using them, you agree to the terms of this license.
32 | 
33 | **Questions?**
34 | 
35 | You can send email to googleclusterdata-discuss@googlegroups.com. The more
36 | detailed the request, the greater the chance that somebody can help you: screen
37 | shots, concrete examples, error messages, and a list of what you already tried
38 | are all useful.
39 | 
--------------------------------------------------------------------------------
/ClusterData2019.md:
--------------------------------------------------------------------------------
1 | # ClusterData 2019 traces
2 | 
3 | _John Wilkes._
4 | 
5 | The `clusterdata-2019` trace dataset provides information about eight different Borg cells for the month of May 2019. It includes the following new information:
6 | 
7 | * CPU usage information histograms for each 5 minute period, not just a point sample;
8 | * information about alloc sets (shared resource reservations used by jobs);
9 | * job-parent information for master/worker relationships such as MapReduce jobs.
10 | 
11 | The 2019 traces focus on resource requests and usage, and contain no information about end users, their data, or access patterns to storage systems and other services.
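As a quick taste of the new data, the sketch below pulls the per-task CPU usage histograms (the `cpu_usage_distribution` field of the `instance_usage` table, defined in [clusterdata_trace_format_v3.proto](clusterdata_trace_format_v3.proto)) for cell `a`, following the access pattern used in the [analysis colab](clusterdata_analysis_colab.ipynb). The trace lives in BigQuery, as described below; the project ID here is a placeholder for a billing-enabled Cloud project that you must supply yourself.

```python
# A minimal sketch following clusterdata_analysis_colab.ipynb; "my-project-id"
# is a placeholder, and authentication is assumed to be set up as in the colab.
from google.cloud import bigquery

client = bigquery.Client(project="my-project-id")

query = """
SELECT collection_id,
       instance_index,
       average_usage.cpus AS avg_cpu_ncus,
       cpu_usage_distribution  -- percentiles of CPU usage within the 5-minute window
FROM `google.com:google-cluster-data`.clusterdata_2019_a.instance_usage
LIMIT 10
"""
print(client.query(query).to_dataframe())
```

Each returned histogram summarizes one task's CPU usage within a single 5-minute measurement window, which is the histogram data referred to in the first bullet above.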
12 | 
13 | Because of its size (about 2.4TiB compressed), we are only making the trace data available via [Google BigQuery](https://cloud.google.com/bigquery) so that sophisticated analyses can be performed without requiring local resources.
14 | 
15 | **The `clusterdata-2019` traces are described in this document:
16 | [Google cluster-usage traces v3](https://drive.google.com/file/d/10r6cnJ5cJ89fPWCgj7j4LtLBqYN9RiI9/view).** You can find the download and access instructions there, as well as many more details about what is in the traces, and how to interpret them. For additional background information, please refer to the 2015 Borg paper, [Large-scale cluster management at Google with Borg](https://ai.google/research/pubs/pub43438).
17 | 
18 | * If you haven't already joined our
19 | [mailing list](https://groups.google.com/forum/#!forum/googleclusterdata-discuss),
20 | please do so now.
21 | *Important: to avoid spammers, you MUST fill out the "reason" field, or your application will be rejected.*
22 | 
23 | ![Creative Commons CC-BY license](https://i.creativecommons.org/l/by/4.0/88x31.png)
24 | The data and trace documentation are made available under the
25 | [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.
26 | By downloading or using them, you agree to the terms of this license.
27 | 
28 | **Questions?**
29 | 
30 | You can send email to googleclusterdata-discuss@googlegroups.com. The more detailed the request, the greater the chance that somebody can help you: screen shots, concrete examples, error messages, and a list of what you already tried are all useful.
31 | 
32 | **Acknowledgements**
33 | 
34 | This trace is the result of a collaboration involving Muhammad Tirmazi, Nan Deng, Md Ehtesam Haque, Zhijing Gene Qin, Steve Hand and Adam Barker.
35 | 
--------------------------------------------------------------------------------
/TraceVersion1.md:
--------------------------------------------------------------------------------
1 | 
2 | |*Note: for new work, we **strongly** recommend using the [version 2](ClusterData2011_2.md) or [version 3](ClusterData2019.md) traces, which are more recent, more comprehensive, and provide much more data.* |
3 | |:--------- |
4 | 
5 | 
6 | _This trace was first announced in [this January 2010 blog post](https://ai.googleblog.com/2010/01/google-cluster-data.html)._
7 | 
8 | The first dataset provides traces from a Borg cell that were taken over a 7 hour
9 | period. The workload consists of a set of tasks, where each task runs on a
10 | single machine. Tasks consume memory and one or more cores (in fractional
11 | units). Each task belongs to a single job; a job may have multiple tasks (e.g.,
12 | mappers and reducers).
13 | 
14 | The trace data is available
15 | [here](http://commondatastorage.googleapis.com/clusterdata-misc/google-cluster-data-1.csv.gz).
16 | ([SHA1 checksum](http://en.wikipedia.org/wiki/SHA-1#Data_Integrity):
17 | 98c87f059aa1cc37f1e9523ac691ee0fd5629188.)
18 | 
19 | The data have been anonymized in several ways: there are no task or job names,
20 | just numeric identifiers; timestamps are relative to the start of data
21 | collection; the consumption of CPU and memory is obscured using a linear
22 | transformation. However, even with these transformations of the data,
23 | researchers will be able to do workload characterizations (up to a linear
24 | transformation of the true workload) and workload generation.
25 | 
26 | The data are structured as blank-separated columns.
Each row reports on the 27 | execution of a single task during a five minute period. 28 | 29 | * `Time` (int) - time in seconds since the start of data collection 30 | * `JobID` (int) - Unique identifier of the job to which this task belongs (**may be called ParentID**) 31 | * `TaskID` (int) - Unique identifier of the executing task 32 | * `Job Type` (0, 1, 2, 3) - class of job (a categorization of work) 33 | * `Normalized Task Cores` (float) - normalized value of the average number of cores used by the task 34 | * `Normalized Task Memory` (float) - normalized value of the average memory consumed by the task 35 | 36 | ![Creative Commons CC-BY license](https://i.creativecommons.org/l/by/4.0/88x31.png) 37 | The data and trace documentation are made available under the 38 | [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. 39 | By downloading it or using them, you agree to the terms of this license. 40 | 41 | Questions? Send [email](mailto:googleclusterdata-discuss@googlegroups.com) 42 | or peruse the 43 | [discussion group](http://groups.google.com/group/googleclusterdata-discuss). 44 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | This repository describes various traces from parts of the Google cluster 4 | management software and systems. 5 | 6 | * Please join our (low volume) 7 | [discussion group](http://groups.google.com/group/googleclusterdata-discuss), 8 | so we can send you announcements, and you can let us know about any issues, 9 | insights, or papers you publish using these traces. **Important: to avoid 10 | spammers, you MUST fill out the "reason" field, or your application will be 11 | rejected**. Once you are a member, you can send email to 12 | [googleclusterdata-discuss@googlegroups.com](mailto:googleclusterdata-discuss@googlegroups.com) 13 | to: 14 | 15 | * Announce tools and techniques that can help others analyze or decode the 16 | trace data. 17 | * Share insights and surprises. 18 | * Ask questions (the group has a few hundred members) and get help. If you 19 | ask for help, please include concrete examples of issues you run into; 20 | screen shots; error codes; and a list of what you have already tried. 21 | Don't just say "I can't download the data"! 22 | 23 | * We provide a **[trace bibliography](bibliography.bib)** of papers that have 24 | used and/or analyzed the traces, and encourage anybody who publishes one to 25 | add it to the bibliography using a github pull request [preferred], or by 26 | emailing the bibtex entry to 27 | [googleclusterdata-discuss@googlegroups.com](mailto:googleclusterdata-discuss@googlegroups.com). 28 | In either case, please mimic the existing format **exactly**. 29 | 30 | # Borg cluster workload traces 31 | 32 | These are traces of workloads running on Google compute cells that are managed 33 | by the cluster management software internally known as Borg. 34 | 35 | * **[version 3](ClusterData2019.md)** (aka `ClusterData2019`) provides data 36 | from eight Borg cells over the month of May 2019. 37 | * [version 2](ClusterData2011_2.md) (aka `ClusterData2011`) provides data from 38 | a single 12.5k-machine Borg cell from May 2011. 39 | * [version 1](TraceVersion1.md) is an older, short trace that describes a 7 40 | hour period from one cell from 2009. *Deprecated. 
We strongly recommend
41 | using the version 2 or version 3 traces instead.*
42 | 
43 | ## ETA traces
44 | 
45 | In addition, this site hosts a set of
46 | [execution traces from ETA](ETAExplorationTraces.md) (Exploratory Testing
47 | Architecture) - a testing framework that explores interactions between
48 | distributed, concurrently-executing components, with an eye towards improving
49 | how they are tested.
50 | 
51 | ## Power traces
52 | 
53 | This site also hosts [power traces](PowerData2019.md) for 57 power domains
54 | during the month of May 2019. This trace complements the
55 | `ClusterData2019` dataset.
56 | 
57 | ## License
58 | 
59 | ![Creative Commons CC-BY license](https://i.creativecommons.org/l/by/4.0/88x31.png)
60 | The data and trace documentation are made available under the
61 | [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. By downloading
62 | or using them, you agree to the terms of this license.
63 | 
--------------------------------------------------------------------------------
/ETAExplorationTraces.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 | 
3 | [ETA](http://www.pdl.cmu.edu/PDL-FTP/associated/CMU-PDL-11-113.pdf) (Exploratory
4 | Testing Architecture) is a testing framework that explores the execution of a
5 | distributed application, looking for bugs that are provoked by particular
6 | sequences of events caused by non-determinism such as timing and asynchrony.
7 | ETA was developed for [Omega](http://research.google.com/pubs/pub41684.html), a
8 | cluster management system developed at Google.
9 | 
10 | As part of its functionality, ETA provides estimates for when its exploratory
11 | testing will finish. Achieving accurate runtime estimates is a significant
12 | research challenge, and so in order to stimulate interest and foster research
13 | in improving these estimates, we have made available traces of a number of ETA’s
14 | real-world exploratory test runs.
15 | 
16 | You can find the traces
17 | [here](http://commondatastorage.googleapis.com/clusterdata-misc/ETA-traces.tar.gz).
18 | ([SHA1 checksum](http://en.wikipedia.org/wiki/SHA-1#Data_Integrity):
19 | 6664e43caa1bf1f4c0af959fe93d266ead24d234.)
20 | 
21 | # Format
22 | 
23 | These traces describe the execution tree structure explored by ETA. In short,
24 | the execution tree represents, at an abstract level, the different sequences in which
25 | concurrent events can happen during an execution of a test. Further, ETA uses
26 | [state space reduction](http://dl.acm.org/citation.cfm?id=1040315) to avoid the
27 | need to explore equivalent sequences. In other words, certain parts of the
28 | execution tree may never be explored. For details on the execution tree
29 | structure and the application of state space reduction in ETA, read our
30 | [technical report](http://www.pdl.cmu.edu/PDL-FTP/associated/CMU-PDL-11-113.pdf).
31 | 
32 | An exploration trace contains a sequence of events that detail the
33 | exploration. Each event can be one of the following (a short parsing sketch follows this list):
34 | 
35 | * `AddNode x y` -- a node `x` with parent `y` has been added (the parent of the root is -1)
36 | * `Explore x` -- the node `x` has been marked for further exploration
37 | * `Transition x` -- the exploration transitioned from the current node to node `x`
38 | * `Start` -- new test execution (starting from node 0) has been initiated
39 | * `End t` -- current test execution finished after `t` time units.
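To make the format concrete, here is a minimal parsing sketch. It assumes (this is not stated above) that each line of a trace file holds exactly one whitespace-separated event, and the file name used at the bottom is just a placeholder for any of the trace files listed in the next section.

```python
# A minimal sketch of summarizing one ETA exploration trace.
# Assumptions: one whitespace-separated event per line; "resource_2.trace" is a
# placeholder file name.
def summarize_trace(path):
    executions = 0    # number of Start events seen
    total_time = 0.0  # sum of the t values from End events
    parents = {}      # node id -> parent id, from AddNode events
    with open(path) as trace:
        for line in trace:
            fields = line.split()
            if not fields:
                continue
            if fields[0] == "AddNode":
                parents[int(fields[1])] = int(fields[2])
            elif fields[0] == "Start":
                executions += 1
            elif fields[0] == "End":
                total_time += float(fields[1])
    return executions, total_time, len(parents)

if __name__ == "__main__":
    runs, time_units, nodes = summarize_trace("resource_2.trace")
    print(f"{runs} executions, {time_units} time units, {nodes} nodes added")
```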
40 | 41 | A well-formed trace contains a number of executions, each starting with `Start` 42 | and ending with `End t`. Each execution explores a branch of the execution tree, 43 | transitioning from the root all the way to a leaf (`Transition x`), and 44 | optionally adding newly encountered nodes to the tree (`AddNode x y`), and 45 | identifying which unvisited nodes should be explored in future (`Explore x`). 46 | 47 | # Traces 48 | These are all provided in a single compressed tar file (see above for the link). 49 | 50 | * `resource_X.trace`: The `resource_X` test is representative of a class of 51 | Omega tests that evaluate interactions of `X` different users that acquire 52 | and release resources from a pool of `X` resources. 53 | * `store_X_Y_Z.trace`: The `store_X_Y_Z` test is representative of a class of 54 | Omega tests that evaluate interactions of X users of a distributed key-value 55 | store with `Y` front-end nodes and `Z` back-end nodes. 56 | * `scheduling_X.trace`: The `scheduling_X` test is representative of a class of 57 | Omega tests that evaluate interactions of `X` users issuing concurrent 58 | scheduling requests. 59 | * `tlp.trace`: The `tlp` test is representative of a class of Omega tests that 60 | do scheduling work. 61 | 62 | # Notes 63 | 64 | The data may be freely used for any purpose, although acknowledgement of Google 65 | as the source of the data would be appreciated, and we’d love to be sent copies 66 | of any papers you publish that use it. 67 | 68 | Questions? Send us email! 69 | 70 | Jiri Simsa jsimsa@google.com, john wilkes johnwilkes@google.com 71 | 72 | -------------------------------------- 73 | 74 | _Version of: 2012-09-26; revised 2015-07-29_ 75 | -------------------------------------------------------------------------------- /ClusterData2011_2.md: -------------------------------------------------------------------------------- 1 | # ClusterData 2011 traces 2 | 3 | _John Wilkes and Charles Reiss._ 4 | 5 | The `clusterdata-2011-2` trace represents 29 day's worth of Borg cell information 6 | from May 2011, on a cluster of about 12.5k machines. (The `-2` refers to the fact that we added some additional data after the initial release, to create trace version 2.1.) 7 | 8 | * If you haven't already joined our 9 | [mailing list](https://groups.google.com/forum/#!forum/googleclusterdata-discuss), 10 | please do so now. **Important**: please fill out the "reason" field, or your application will be rejected. 11 | 12 | ## Trace data 13 | 14 | The `clusterdata-2011-2` trace starts at 19:00 EDT on Sunday May 1, 2011, and 15 | the datacenter is in that timezone (US Eastern). This corresponds to a trace 16 | timestamp of 600s; see the data schema documentation for why. 17 | 18 | The trace is described in the trace-data 19 | [v2.1 format + schema document](https://drive.google.com/file/d/0B5g07T_gRDg9Z0lsSTEtTWtpOW8/view?usp=sharing&resourcekey=0-cozD56gA4fUDdrkHnLJSrQ). 20 | 21 | Priorities in this trace range from 0 to 11 inclusive; bigger numbers mean "more 22 | important". 0 and 1 are “free” priorities; 9, 10, and 11 are “production” 23 | priorities; and 12 is a “monitoring” priority.
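When post-processing job or task events, it can be convenient to group records by these priority bands. The sketch below encodes only the bands spelled out above (0 and 1 as "free", 9 through 11 as "production"); the "other" label for the remaining priorities is our own placeholder, not a category defined by the trace documentation.

```python
# A small helper for bucketing 2011-trace priorities, based only on the bands
# described above; "other" is our own catch-all label, not part of the trace format.
def priority_band(priority: int) -> str:
    if priority in (0, 1):
        return "free"
    if priority in (9, 10, 11):
        return "production"
    return "other"

# Example: tag a few priorities.
for p in (0, 2, 9, 11):
    print(p, priority_band(p))
```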
24 | 25 | The `clusterdata-2011-2` trace is identical to the one called 26 | `clusterdata-2011-1`, except for the addition of a single new column of data in 27 | the `task_usage` tables. This new data is a randomly-picked 1 second sample of 28 | CPU usage from within the associated 5-minute usage-reporting period for that 29 | task. Using this data, it is possible to build up a stochastic model of task 30 | utilization over time for long-running tasks. 31 | 32 | ![Creative Commons CC-BY license](https://i.creativecommons.org/l/by/4.0/88x31.png) 33 | The data and trace documentation are made available under the 34 | [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. 35 | By downloading it or using them, you agree to the terms of this license. 36 | 37 | ## Downloading the trace 38 | 39 | Download instructions for the trace are in the 40 | [v2.1 format + schema document](https://drive.google.com/file/d/0B5g07T_gRDg9Z0lsSTEtTWtpOW8/view?usp=sharing&resourcekey=0-cozD56gA4fUDdrkHnLJSrQ). 41 | 42 | The trace is stored in 43 | [Google Storage for Developers](https://developers.google.com/storage/) in the 44 | bucket called `clusterdata-2011-2`. The total size of the compressed trace is 45 | approximately 41GB. 46 | 47 | Most users should use the 48 | [gsutil](https://developers.google.com/storage/docs/gsutil) command-line tool to 49 | download the trace data. 50 | 51 | 52 | ## Known anomalies in the trace 53 | 54 | Disk-time-fraction data is only included in about the first 14 days, because of 55 | a change in our monitoring system. 56 | 57 | Some jobs are deliberately omitted because they ran primarily on machines not 58 | included in this trace. The portion that ran on included machines amounts to 59 | approximately 0.003% of the machines’ task-seconds of usage. 60 | 61 | We are aware of only one example of a job that retains its job ID after being 62 | stopped, reconfigured, and restarted (job number 6253771429). 63 | 64 | Approximately 70 jobs (for example, job number 6377830001) have job event 65 | records but no task event records. We believe that this is legitimate in a 66 | majority of cases: typically because the job is started but its tasks are 67 | disabled for its entire duration. 68 | 69 | Approximately 0.013% of task events and 0.0008% of job events in this trace have 70 | a non-empty missing info field. 71 | 72 | We estimate that less than 0.05% of job and task scheduling event records are 73 | missing and less than 1% of resource usage measurements are missing. 74 | 75 | Some cycles per instruction (CPI) and memory accesses per instruction (MAI) 76 | measurements are clearly inaccurate (for example, they are above or below the 77 | range possible on the underlying micro-architectures). We believe these 78 | measurements are caused by bugs in the data-capture system used, such as the 79 | cycle counter and instruction counter not being read at the same time. To obtain 80 | useful data from these measurements, we suggest filtering out measurements 81 | representing a very small amount of CPU time and measurements with unreasonable 82 | CPI and MAI values. 83 | 84 | # Questions? 85 | 86 | Please send email to googleclusterdata-discuss@googlegroups.com. 87 | -------------------------------------------------------------------------------- /clusterdata_trace_format_v3.proto: -------------------------------------------------------------------------------- 1 | // This file defines the format of the 3rd version of cluster trace data 2 | // published by Google. 
Please refer to the associated 'Google cluster-usage 3 | // traces v3' document. 4 | // More information at https://github.com/google/cluster-data 5 | 6 | syntax = "proto2"; 7 | 8 | package google.cluster_data; 9 | 10 | // Values used to indicate "not present" for special cases. 11 | enum Constants { 12 | option allow_alias = true; // OK for multiple names to have the same value. 13 | 14 | NO_MACHINE = 0; // The thing is not bound to a machine. 15 | DEDICATED_MACHINE = -1; // The thing is bound to a dedicated machine. 16 | NO_ALLOC_COLLECTION = 0; // The thing is not running in an alloc set. 17 | NO_ALLOC_INDEX = -1; // The thing does not have an alloc instance index. 18 | } 19 | 20 | // A common structure for CPU and memory resource units. 21 | // All resource measurements are normalized and scaled. 22 | message Resources { 23 | optional float cpus = 1; // Normalized GCUs (NCUs). 24 | optional float memory = 2; // Normalized RAM bytes. 25 | } 26 | 27 | // Collections are either jobs (which have tasks) or alloc sets (which have 28 | // alloc instances). 29 | enum CollectionType { 30 | JOB = 0; 31 | ALLOC_SET = 1; 32 | } 33 | 34 | // This enum is used in the 'type' field of the CollectionEvent and 35 | // InstanceEvent tables. 36 | enum EventType { 37 | // The collection or instance was submitted to the scheduler for scheduling. 38 | SUBMIT = 0; 39 | // The collection or instance was marked not eligible for scheduling by the 40 | // batch scheduler. 41 | QUEUE = 1; 42 | // The collection or instance became eligible for scheduling. 43 | ENABLE = 2; 44 | // The collection or instance started running. 45 | SCHEDULE = 3; 46 | // The collection or instance was descheduled because of a higher priority 47 | // collection or instance, or because the scheduler overcommitted resources. 48 | EVICT = 4; 49 | // The collection or instance was descheduled due to a failure. 50 | FAIL = 5; 51 | // The collection or instance completed normally. 52 | FINISH = 6; 53 | // The collection or instance was cancelled by the user or because a 54 | // depended-upon collection died. 55 | KILL = 7; 56 | // The collection or instance was presumably terminated, but due to missing 57 | // data there is insufficient information to identify when or how. 58 | LOST = 8; 59 | // The collection or instance was updated (scheduling class or resource 60 | // requirements) while it was waiting to be scheduled. 61 | UPDATE_PENDING = 9; 62 | // The collection or instance was updated while it was scheduled somewhere. 63 | UPDATE_RUNNING = 10; 64 | } 65 | // Represents reasons why we synthesized a scheduler event to replace 66 | // apparently missing data. 67 | enum MissingType { 68 | MISSING_TYPE_NONE = 0; // No data was missing. 69 | SNAPSHOT_BUT_NO_TRANSITION = 1; 70 | NO_SNAPSHOT_OR_TRANSITION = 2; 71 | EXISTS_BUT_NO_CREATION = 3; 72 | TRANSITION_MISSING_STEP = 4; 73 | TOO_MANY_EVENTS = 5; 74 | } 75 | // How latency-sensitive a thing is to CPU scheduling delays when running 76 | // on a machine, in increasing-sensitivity order. 77 | // Note that this is _not_ the same as the thing's cluster-scheduling 78 | // priority although latency-sensitive things do tend to have higher priorities. 79 | enum LatencySensitivity { 80 | MOST_INSENSITIVE = 0; // Also known as "best effort". 81 | INSENSITIVE = 1; // Often used for batch jobs. 82 | SENSITIVE = 2; // Used for latency-sensitive jobs. 83 | MOST_SENSITIVE = 3; // Used for the most latency-senstive jobs. 84 | } 85 | 86 | // Represents the type of scheduler that is handling a job. 
87 | enum Scheduler { 88 | // Handled by the default cluster scheduler. 89 | SCHEDULER_DEFAULT = 0; 90 | // Handled by a secondary scheduler, optimized for batch loads. 91 | SCHEDULER_BATCH = 1; 92 | } 93 | 94 | // How the collection is verically auto-scaled. 95 | enum VerticalScalingSetting { 96 | // We were unable to determine the setting. 97 | VERTICAL_SCALING_SETTING_UNKNOWN = 0; 98 | // Vertical scaling was disabled, e.g., in the collection 99 | // creation request. 100 | VERTICAL_SCALING_OFF = 1; 101 | // Vertical scaling was enabled, with user-supplied lower 102 | // and/or upper bounds for GCU and/or RAM. 103 | VERTICAL_SCALING_CONSTRAINED = 2; 104 | // Vertical scaling was enabled, with no user-provided bounds. 105 | VERTICAL_SCALING_FULLY_AUTOMATED = 3; 106 | } 107 | 108 | // A constraint represents a request for a thing to be placed on a machine 109 | // (or machines) with particular attributes. 110 | message MachineConstraint { 111 | // Comparison operation between the supplied value and the machine's value. 112 | // For EQUAL and NOT_EQUAL relationships, the comparison is a string 113 | // comparison; for LESS_THAN, GREATER_THAN, etc., the values are converted to 114 | // floating point numbers first; for PRESENT and NOT_PRESENT, the test is 115 | // merely whether the supplied attribute exists for the machine in question, 116 | // and the value field of the constraint is ignored. 117 | enum Relation { 118 | EQUAL = 0; 119 | NOT_EQUAL = 1; 120 | LESS_THAN = 2; 121 | GREATER_THAN = 3; 122 | LESS_THAN_EQUAL = 4; 123 | GREATER_THAN_EQUAL = 5; 124 | PRESENT = 6; 125 | NOT_PRESENT = 7; 126 | } 127 | 128 | // Obfuscated name of the constraint. 129 | optional string name = 1; 130 | // Target value for the constraint (e.g., a minimum or equality). 131 | optional string value = 2; 132 | // Comparison operator. 133 | optional Relation relation = 3; 134 | } 135 | 136 | // Instance and collection events both share a common prefix, followed by 137 | // specific fields. Information about an instance event (task or alloc 138 | // instance). 139 | message InstanceEvent { 140 | // Common fields shared between instances and collections. 141 | 142 | // Timestamp, in microseconds since the start of the trace. 143 | optional int64 time = 1; 144 | // What type of event is this? 145 | optional EventType type = 2; 146 | // The identity of the collection that this instance is part of. 147 | optional int64 collection_id = 3; 148 | // How latency-sensitive is the instance? 149 | optional LatencySensitivity scheduling_class = 4; 150 | // Was there any missing data? If so, why? 151 | optional MissingType missing_type = 5; 152 | // What type of collection this instance belongs to. 153 | optional CollectionType collection_type = 6; 154 | // Cluster-level scheduling priority for the instance. 155 | optional int32 priority = 7; 156 | // (Tasks only) The ID of the alloc set that this task is running in, or 157 | // NO_ALLOC_COLLECTION if it is not running in an alloc. 158 | optional int64 alloc_collection_id = 8; 159 | 160 | // Begin: fields specific to instances 161 | // The index of the instance in its collection (starts at 0). 162 | optional int32 instance_index = 9; 163 | // The ID of the machine on which this instance is placed (or NO_MACHINE if 164 | // not placed on one, or DEDICATED_MACHINE if it's on a dedicated machine). 165 | optional int64 machine_id = 10; 166 | // (Tasks only) The index of the alloc instance that this task is running in, 167 | // or NO_ALLOC_INDEX if it is not running in an alloc. 
168 | optional int32 alloc_instance_index = 11; 169 | // The resources requested when the instance was submitted or last updated. 170 | optional Resources resource_request = 12; 171 | // Currently active scheduling constraints. 172 | repeated MachineConstraint constraint = 13; 173 | } 174 | 175 | // Collection events apply to the collection as a whole. 176 | message CollectionEvent { 177 | // Common fields shared between instances and collections. 178 | 179 | // Timestamp, in microseconds since the start of the trace. 180 | optional int64 time = 1; 181 | // What type of event is this? 182 | optional EventType type = 2; 183 | // The identity of the collection. 184 | optional int64 collection_id = 3; 185 | // How latency-sensitive is the collection? 186 | optional LatencySensitivity scheduling_class = 4; 187 | // Was there any missing data? If so, why? 188 | optional MissingType missing_type = 5; 189 | // What type of collection is this? 190 | optional CollectionType collection_type = 6; 191 | // Cluster-level scheduling priority for the collection. 192 | optional int32 priority = 7; 193 | // The ID of the alloc set that this job is to run in, or NO_ALLOC_COLLECTION 194 | // (only for jobs). 195 | optional int64 alloc_collection_id = 8; 196 | 197 | // Fields specific to a collection. 198 | 199 | // The user who runs the collection 200 | optional string user = 9; 201 | // Obfuscated name of the collection. 202 | optional string collection_name = 10; 203 | // Obfuscated logical name of the collection. 204 | optional string collection_logical_name = 11; 205 | // ID of the collection that this is a child of. 206 | // (Used for stopping a collection when the parent terminates.) 207 | optional int64 parent_collection_id = 12; 208 | // IDs of collections that must finish before this collection may start. 209 | repeated int64 start_after_collection_ids = 13; 210 | // Maximum number of instances of this collection that may be placed on 211 | // one machine (or 0 if unlimited). 212 | optional int32 max_per_machine = 14; 213 | // Maximum number of instances of this collection that may be placed on 214 | // machines connected to a single Top of Rack switch (or 0 if unlimited). 215 | optional int32 max_per_switch = 15; 216 | // How/whether vertical scaling should be done for this collection. 217 | optional VerticalScalingSetting vertical_scaling = 16; 218 | // The preferred cluster scheduler to use. 219 | optional Scheduler scheduler = 17; 220 | } 221 | 222 | // Machine events describe the addition, removal, or update (change) of a 223 | // machine in the cluster at a particular time. 224 | message MachineEvent { 225 | enum EventType { 226 | // Should never happen :-). 227 | EVENT_TYPE_UNKNOWN = 0; 228 | // Machine added to the cluster. 229 | ADD = 1; 230 | // Machine removed from cluster (usually due to failure or repairs). 231 | REMOVE = 2; 232 | // Machine capacity updated (while not removed). 233 | UPDATE = 3; 234 | } 235 | 236 | // If we detect that data is missing, why do we know this? 237 | enum MissingDataReason { 238 | // No data is missing. 239 | MISSING_DATA_REASON_NONE = 0; 240 | // We observed that a change to the state of a machine must have 241 | // occurred from an internal state snapshot, but did not see a 242 | // corresponding transition event during the trace. 243 | SNAPSHOT_BUT_NO_TRANSITION = 1; 244 | } 245 | 246 | // Timestamp, in microseconds since the start of the trace. [key] 247 | optional int64 time = 1; 248 | // Unique ID of the machine within the cluster. 
[key] 249 | optional int64 machine_id = 2; 250 | // Specifies the type of event 251 | optional EventType type = 3; 252 | // Obfuscated name of the Top of Rack switch that this machine is attached to. 253 | optional string switch_id = 4; 254 | // Available resources that the machine supplies. (Note: may be smaller 255 | // than the physical machine's raw capacity.) 256 | optional Resources capacity = 5; 257 | // An obfuscated form of the machine platform (microarchitecture + motherboard 258 | // design). 259 | optional string platform_id = 6; 260 | // Did we detect possibly-missing data? 261 | optional MissingDataReason missing_data_reason = 7; 262 | } 263 | 264 | // A machine attribute update or (if time = 0) its initial value. 265 | message MachineAttribute { 266 | // Timestamp, in microseconds since the start of the trace. [key] 267 | optional int64 time = 1; 268 | // Unique ID of the machine within the cluster. [key] 269 | optional int64 machine_id = 2; 270 | // Obfuscated unique name of the attribute (unique across all clusters). [key] 271 | optional string name = 3; 272 | // Value of the attribute. If this is unset, then 'deleted' must be true. 273 | optional string value = 4; 274 | // True if the attribute is being deleted at this time. 275 | optional bool deleted = 5; 276 | } 277 | 278 | // Information about resource consumption (usage) during a sample window 279 | // (which is typically 300s, but may be shorter if the instance started 280 | // and/or ended during a measurement window). 281 | message InstanceUsage { 282 | // Sample window end points, in microseconds since the start of the trace. 283 | optional int64 start_time = 1; 284 | optional int64 end_time = 2; 285 | // ID of collection that this instance belongs to. 286 | optional int64 collection_id = 3; 287 | // Index of this instance's position in that collection (starts at 0). 288 | optional int32 instance_index = 4; 289 | // Unique ID of the machine on which the instance has been placed. 290 | optional int64 machine_id = 5; 291 | 292 | // ID and index of the alloc collection + instance in which this instance 293 | // is running, or NO_ALLOC_COLLECTION / NO_ALLOC_INDEX if it is not 294 | // running inside an alloc. 295 | optional int64 alloc_collection_id = 6; 296 | optional int64 alloc_instance_index = 7; 297 | // Type of the collection that this instance belongs to. 298 | optional CollectionType collection_type = 8; 299 | // Average (mean) usage over the measurement period. 300 | optional Resources average_usage = 9; 301 | // Observed maximum usage over the measurement period. 302 | // This measurement may be fully or partially missing in some cases. 303 | optional Resources maximum_usage = 10; 304 | // Observed CPU usage during a randomly-sampled second within the measurement 305 | // window. (No memory data is provided here.) 306 | optional Resources random_sample_usage = 11; 307 | 308 | // The memory limit imposed on this instance; normally, it will not be 309 | // allowed to exceed this amount of memory. 310 | optional float assigned_memory = 12; 311 | // Amount of memory that is used for the instance's file page cache in the OS 312 | // kernel. 313 | optional float page_cache_memory = 13; 314 | 315 | // Average (mean) number of processor and memory cycles per instruction. 
316 | optional float cycles_per_instruction = 14; 317 | optional float memory_accesses_per_instruction = 15; 318 | // The average (mean) number of data samples collected per second 319 | // (e.g., sample_rate=0.5 means a sample every 2 seconds on average). 320 | optional float sample_rate = 16; 321 | 322 | // CPU usage percentile data. 323 | // The cpu_usage_distribution vector contains 10 elements, representing 324 | // 0%ile (aka min), 10%ile, 20%ile, ... 90%ile, 100%ile (aka max) of the 325 | // normalized CPU usage in NCUs. 326 | // Note that the 100%ile may not exactly match the maximum_usage 327 | // value because of interpolation effects. 328 | repeated float cpu_usage_distribution = 17; 329 | // The tail_cpu_usage_distribution vector contains 9 elements, representing 330 | // 91%ile, 92%ile, 93%ile, ... 98%ile, 99%ile of the normalized CPU resource 331 | // usage in NCUs. 332 | repeated float tail_cpu_usage_distribution = 18; 333 | } 334 | -------------------------------------------------------------------------------- /clusterdata_analysis_colab.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "clusterdata_analysis_colab.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "authorship_tag": "ABX9TyN3LZGXsYYQ/HRBwFP0rm4Q", 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "-qcDkhOIjj8a", 32 | "colab_type": "text" 33 | }, 34 | "source": [ 35 | "# Google trace analysis colab\n", 36 | "\n", 37 | "This colab provides several example queries and graphs using [Altair](https://altair-viz.github.io/) for the 2019 Google cluster trace. Further examples will be added over time.\n", 38 | "\n", 39 | "**Important:** in order to be able to run the queries you will need to:\n", 40 | "\n", 41 | "1. Use the [Cloud Resource Manager](https://console.cloud.google.com/cloud-resource-manager) to Create a Cloud Platform project if you do not already have one.\n", 42 | "2. [Enable billing](https://support.google.com/cloud/answer/6293499#enable-billing) for the project.\n", 43 | "3. 
[Enable BigQuery](https://console.cloud.google.com/flows/enableapi?apiid=bigquery) APIs for the project.\n" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "metadata": { 49 | "id": "Vcjo13Kejgij", 50 | "colab_type": "code", 51 | "colab": {} 52 | }, 53 | "source": [ 54 | "#@title Please input your project id\n", 55 | "import pandas as pd\n", 56 | "import numpy as np\n", 57 | "import altair as alt\n", 58 | "from google.cloud import bigquery\n", 59 | "# Provide credentials to the runtime\n", 60 | "from google.colab import auth\n", 61 | "from google.cloud.bigquery import magics\n", 62 | "\n", 63 | "auth.authenticate_user()\n", 64 | "print('Authenticated')\n", 65 | "project_id = '' #@param {type: \"string\"}\n", 66 | "# Set the default project id for %bigquery magic\n", 67 | "magics.context.project = project_id\n", 68 | "\n", 69 | "# Use the client to run queries constructed from a more complicated function.\n", 70 | "client = bigquery.Client(project=project_id)" 71 | ], 72 | "execution_count": null, 73 | "outputs": [] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": { 78 | "id": "NFUPuLiajrC8", 79 | "colab_type": "text" 80 | }, 81 | "source": [ 82 | "# Basic queries\n", 83 | "\n", 84 | "This section shows the most basic way of querying the trace using the [bigquery magic](https://googleapis.dev/python/bigquery/latest/magics.html)" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "metadata": { 90 | "id": "3xyBH9oQjr1w", 91 | "colab_type": "code", 92 | "colab": {} 93 | }, 94 | "source": [ 95 | "%%bigquery\n", 96 | "SELECT capacity.cpus AS cpu_cap, \n", 97 | "capacity.memory AS memory_cap, \n", 98 | "COUNT(DISTINCT machine_id) AS num_machines\n", 99 | "FROM `google.com:google-cluster-data`.clusterdata_2019_a.machine_events\n", 100 | "GROUP BY 1,2" 101 | ], 102 | "execution_count": null, 103 | "outputs": [] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "metadata": { 108 | "id": "SzIHMxy2jvsM", 109 | "colab_type": "code", 110 | "colab": {} 111 | }, 112 | "source": [ 113 | "%%bigquery\n", 114 | "SELECT COUNT(DISTINCT collection_id) AS collections FROM \n", 115 | "`google.com:google-cluster-data`.clusterdata_2019_a.collection_events;" 116 | ], 117 | "execution_count": null, 118 | "outputs": [] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": { 123 | "id": "gm-gaRS-jwZj", 124 | "colab_type": "text" 125 | }, 126 | "source": [ 127 | "# Cell level resource usage time series\n", 128 | "\n", 129 | "This query takes a cell as input and plots a resource usage time-series for every hour of the trace broken down by tier." 
130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "metadata": { 135 | "id": "ekhDGYOAjy54", 136 | "colab_type": "code", 137 | "colab": {} 138 | }, 139 | "source": [ 140 | "#@title Select a cell and a resource to plot the cell level usage series\n", 141 | "\n", 142 | "def query_cell_capacity(cell):\n", 143 | " return '''\n", 144 | "SELECT SUM(cpu_cap) AS cpu_capacity,\n", 145 | " SUM(memory_cap) AS memory_capacity\n", 146 | "FROM (\n", 147 | " SELECT machine_id, MAX(capacity.cpus) AS cpu_cap,\n", 148 | " MAX(capacity.memory) AS memory_cap\n", 149 | " FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.machine_events\n", 150 | " GROUP BY 1\n", 151 | ")\n", 152 | " '''.format(cell=cell)\n", 153 | "\n", 154 | "def query_per_instance_usage_priority(cell):\n", 155 | " return '''\n", 156 | "SELECT u.time AS time,\n", 157 | " u.collection_id AS collection_id,\n", 158 | " u.instance_index AS instance_index,\n", 159 | " e.priority AS priority,\n", 160 | " CASE\n", 161 | " WHEN e.priority BETWEEN 0 AND 99 THEN '1_free'\n", 162 | " WHEN e.priority BETWEEN 100 AND 115 THEN '2_beb'\n", 163 | " WHEN e.priority BETWEEN 116 AND 119 THEN '3_mid'\n", 164 | " ELSE '4_prod'\n", 165 | " END AS tier,\n", 166 | " u.cpu_usage AS cpu_usage,\n", 167 | " u.memory_usage AS memory_usage\n", 168 | "FROM (\n", 169 | " SELECT start_time AS time,\n", 170 | " collection_id,\n", 171 | " instance_index,\n", 172 | " machine_id,\n", 173 | " average_usage.cpus AS cpu_usage,\n", 174 | " average_usage.memory AS memory_usage\n", 175 | " FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.instance_usage\n", 176 | " WHERE (alloc_collection_id IS NULL OR alloc_collection_id = 0)\n", 177 | " AND (end_time - start_time) >= (5 * 60 * 1e6)\n", 178 | ") AS u JOIN (\n", 179 | " SELECT collection_id, instance_index, machine_id,\n", 180 | " MAX(priority) AS priority\n", 181 | " FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.instance_events\n", 182 | " WHERE (alloc_collection_id IS NULL OR alloc_collection_id = 0)\n", 183 | " GROUP BY 1, 2, 3\n", 184 | ") AS e ON u.collection_id = e.collection_id\n", 185 | " AND u.instance_index = e.instance_index\n", 186 | " AND u.machine_id = e.machine_id\n", 187 | " '''.format(cell=cell)\n", 188 | "\n", 189 | "def query_per_tier_utilization_time_series(cell, cpu_capacity, memory_capacity):\n", 190 | " return '''\n", 191 | "SELECT CAST(FLOOR(time/(1e6 * 60 * 60)) AS INT64) AS hour_index,\n", 192 | " tier,\n", 193 | " SUM(cpu_usage) / (12 * {cpu_capacity}) AS avg_cpu_usage,\n", 194 | " SUM(memory_usage) / (12 * {memory_capacity}) AS avg_memory_usage\n", 195 | "FROM ({table})\n", 196 | "GROUP BY 1, 2 ORDER BY hour_index\n", 197 | " '''.format(table=query_per_instance_usage_priority(cell),\n", 198 | " cpu_capacity=cpu_capacity, memory_capacity=memory_capacity)\n", 199 | " \n", 200 | "def run_query_utilization_per_time_time_series(cell):\n", 201 | " cell_cap = client.query(query_cell_capacity(cell)).to_dataframe()\n", 202 | " query = query_per_tier_utilization_time_series(\n", 203 | " cell,\n", 204 | " cell_cap['cpu_capacity'][0],\n", 205 | " cell_cap['memory_capacity'][0])\n", 206 | " time_series = client.query(query).to_dataframe()\n", 207 | " return time_series\n", 208 | "\n", 209 | "cell = 'c' #@param ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']\n", 210 | "hourly_usage = run_query_utilization_per_time_time_series(cell)\n", 211 | "\n", 212 | "# CPU graph\n", 213 | "cpu = alt.Chart(hourly_usage).mark_area().encode(\n", 214 | " alt.X('hour_index:N'),\n", 215 | " 
alt.Y('avg_cpu_usage:Q'),\n", 216 | " color=alt.Color('tier', legend=alt.Legend(orient=\"left\")),\n", 217 | " order=alt.Order('tier', sort='descending'),\n", 218 | " tooltip=['hour_index', 'tier', 'avg_cpu_usage']\n", 219 | " )\n", 220 | "cpu.encoding.x.title = \"Hour\"\n", 221 | "cpu.encoding.y.title = \"Average CPU usage\"\n", 222 | "cpu.display()\n", 223 | "\n", 224 | "# Memory graph\n", 225 | "memory = alt.Chart(hourly_usage).mark_area().encode(\n", 226 | " alt.X('hour_index:N'),\n", 227 | " alt.Y('avg_memory_usage:Q'),\n", 228 | " color=alt.Color('tier', legend=alt.Legend(orient=\"left\")),\n", 229 | " order=alt.Order('tier', sort='descending'),\n", 230 | " tooltip=['hour_index', 'tier', 'avg_memory_usage']\n", 231 | " )\n", 232 | "memory.encoding.x.title = \"Hour\"\n", 233 | "memory.encoding.y.title = \"Average memory usage\"\n", 234 | "memory.display()" 235 | ], 236 | "execution_count": null, 237 | "outputs": [] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": { 242 | "id": "qz9m4P5hj2bv", 243 | "colab_type": "text" 244 | }, 245 | "source": [ 246 | "#Per machine resource usage distribution\n", 247 | "\n", 248 | "This query takes a cell as input and plots a per-machine resource utilization CDF." 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "metadata": { 254 | "id": "fgSP3kvyj25-", 255 | "colab_type": "code", 256 | "colab": {} 257 | }, 258 | "source": [ 259 | "#@title Select a cell and plot its per-machine resource utilization CDFs\n", 260 | "\n", 261 | "# Functions to plot CDFs using Altair\n", 262 | "def pick_quantiles_from_tall_dataframe(data, qcol, name=\"\"):\n", 263 | " quantiles = pd.DataFrame([x for x in data[qcol]]).transpose()\n", 264 | " if name != \"\":\n", 265 | " quantiles.columns = data[name]\n", 266 | " return quantiles\n", 267 | "\n", 268 | "# - data: a dataframe with one row and one or more columns of quantiles (results\n", 269 | "# returned from APPROX_QUANTILES)\n", 270 | "# - qcols: a list of names of the quantiles\n", 271 | "# - names: the names of each returned quantiles' columns.\n", 272 | "def pick_quantiles_from_wide_dataframe(data, qcols, names=[]):\n", 273 | " quantiles = {}\n", 274 | " i = 0\n", 275 | " for qcol in qcols:\n", 276 | " col_name = qcol\n", 277 | " if i < len(names):\n", 278 | " col_name = names[i]\n", 279 | " quantiles[col_name] = data[qcol][0]\n", 280 | " i+=1\n", 281 | " return pd.DataFrame(quantiles)\n", 282 | "\n", 283 | "# - quantiles: a dataframe where each column contains the quantiles of one\n", 284 | "# data set. The index (i.e. row names) of the dataframe is the quantile. 
The\n", 285 | "# column names are the names of the data set.\n", 286 | "def plot_cdfs(quantiles, xlab=\"Value\", ylab=\"CDF\",\n", 287 | " legend_title=\"dataset\", labels=[],\n", 288 | " interactive=False,\n", 289 | " title=''):\n", 290 | " dfs = []\n", 291 | " label = legend_title\n", 292 | " yval = range(quantiles.shape[0])\n", 293 | " esp = 1.0/(len(quantiles)-1)\n", 294 | " yval = [y * esp for y in yval]\n", 295 | " while label == xlab or label == ylab:\n", 296 | " label += '_'\n", 297 | " for col_idx, col in enumerate(quantiles.columns):\n", 298 | " col_label = col\n", 299 | " if col_idx < len(labels):\n", 300 | " col_label = labels[col_idx]\n", 301 | " dfs.append(pd.DataFrame({\n", 302 | " label: col_label,\n", 303 | " xlab: quantiles[col],\n", 304 | " ylab: yval\n", 305 | " }))\n", 306 | " cdfs = pd.concat(dfs)\n", 307 | " lines = alt.Chart(cdfs).mark_line().encode(\n", 308 | " # If you can draw a CDF, it has to be continuous real-valued\n", 309 | " x=xlab+\":Q\",\n", 310 | " y=ylab+\":Q\",\n", 311 | " color=label+\":N\"\n", 312 | " ).properties(\n", 313 | " title=title\n", 314 | " )\n", 315 | " if not interactive:\n", 316 | " return lines\n", 317 | " # Create a selection that chooses the nearest point & selects based on x-value\n", 318 | " nearest = alt.selection(type='single', nearest=True, on='mouseover',\n", 319 | " fields=[ylab], empty='none')\n", 320 | " # Transparent selectors across the chart. This is what tells us\n", 321 | " # the y-value of the cursor\n", 322 | " selectors = alt.Chart(cdfs).mark_point().encode(\n", 323 | " y=ylab+\":Q\",\n", 324 | " opacity=alt.value(0),\n", 325 | " ).properties(\n", 326 | " selection=nearest\n", 327 | " )\n", 328 | "\n", 329 | " # Draw text labels near the points, and highlight based on selection\n", 330 | " text = lines.mark_text(align='left', dx=5, dy=-5).encode(\n", 331 | " text=alt.condition(nearest,\n", 332 | " alt.Text(xlab+\":Q\", format=\".2f\"),\n", 333 | " alt.value(' '))\n", 334 | " )\n", 335 | "\n", 336 | " # Draw a rule at the location of the selection\n", 337 | " rules = alt.Chart(cdfs).mark_rule(color='gray').encode(\n", 338 | " y=ylab+\":Q\",\n", 339 | " ).transform_filter(\n", 340 | " nearest.ref()\n", 341 | " )\n", 342 | " # Draw points on the line, and highlight based on selection\n", 343 | " points = lines.mark_point().encode(\n", 344 | " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", 345 | " )\n", 346 | " # Put the five layers into a chart and bind the data\n", 347 | " return alt.layer(lines, selectors, rules, text, points).interactive(\n", 348 | " bind_y=False)\n", 349 | " \n", 350 | "# Functions to create the query\n", 351 | "\n", 352 | "def query_machine_capacity(cell):\n", 353 | " return '''\n", 354 | "SELECT machine_id, MAX(capacity.cpus) AS cpu_cap,\n", 355 | " MAX(capacity.memory) AS memory_cap\n", 356 | "FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.machine_events\n", 357 | "GROUP BY 1\n", 358 | " '''.format(cell=cell)\n", 359 | "\n", 360 | "def query_top_level_instance_usage(cell):\n", 361 | " return '''\n", 362 | "SELECT CAST(FLOOR(start_time/(1e6 * 300)) * (1000000 * 300) AS INT64) AS time,\n", 363 | " collection_id,\n", 364 | " instance_index,\n", 365 | " machine_id,\n", 366 | " average_usage.cpus AS cpu_usage,\n", 367 | " average_usage.memory AS memory_usage\n", 368 | "FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.instance_usage\n", 369 | "WHERE (alloc_collection_id IS NULL OR alloc_collection_id = 0)\n", 370 | " AND (end_time - start_time) >= (5 * 60 * 
1e6)\n", 371 | " '''.format(cell=cell)\n", 372 | "\n", 373 | "def query_machine_usage(cell):\n", 374 | " return '''\n", 375 | "SELECT u.time AS time,\n", 376 | " u.machine_id AS machine_id,\n", 377 | " SUM(u.cpu_usage) AS cpu_usage,\n", 378 | " SUM(u.memory_usage) AS memory_usage,\n", 379 | " MAX(m.cpu_cap) AS cpu_capacity,\n", 380 | " MAX(m.memory_cap) AS memory_capacity\n", 381 | "FROM ({instance_usage}) AS u JOIN\n", 382 | " ({machine_capacity}) AS m\n", 383 | "ON u.machine_id = m.machine_id\n", 384 | "GROUP BY 1, 2\n", 385 | " '''.format(instance_usage = query_top_level_instance_usage(cell),\n", 386 | " machine_capacity = query_machine_capacity(cell))\n", 387 | " \n", 388 | "def query_machine_utilization_distribution(cell):\n", 389 | " return '''\n", 390 | "SELECT APPROX_QUANTILES(IF(cpu_usage > cpu_capacity, 1.0, cpu_usage / cpu_capacity), 100) AS cpu_util_dist,\n", 391 | " APPROX_QUANTILES(IF(memory_usage > memory_capacity, 1.0, memory_usage / memory_capacity), 100) AS memory_util_dist\n", 392 | "FROM ({table})\n", 393 | " '''.format(table = query_machine_usage(cell))\n", 394 | "\n", 395 | "cell = 'd' #@param ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']\n", 396 | "query = query_machine_utilization_distribution(cell)\n", 397 | "machine_util_dist = client.query(query).to_dataframe()\n", 398 | "plot_cdfs(pick_quantiles_from_wide_dataframe(machine_util_dist, ['cpu_util_dist', 'memory_util_dist'], ['CPU', 'Memory']), xlab='x - resource utilization (%)', ylab=\"Probability (resource utilization < x)\", interactive=True)" 399 | ], 400 | "execution_count": null, 401 | "outputs": [] 402 | } 403 | ] 404 | } -------------------------------------------------------------------------------- /power_trace_analysis_colab.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [] 7 | }, 8 | "kernelspec": { 9 | "name": "python3", 10 | "display_name": "Python 3" 11 | }, 12 | "language_info": { 13 | "name": "python" 14 | } 15 | }, 16 | "cells": [ 17 | { 18 | "cell_type": "markdown", 19 | "source": [ 20 | "# Google Data Center Power Trace Analysis\n", 21 | "\n", 22 | "This colab demonstrates querying the Google data center power traces with bigquery, visualizing them with [Altair](https://altair-viz.github.io/), and analyzing them in conjunction with the 2019 Google cluster data.\n", 23 | "\n", 24 | "**Important:** in order to be able to run the queries you will need to:\n", 25 | "\n", 26 | "1. Use the [Cloud Resource Manager](https://console.cloud.google.com/cloud-resource-manager) to Create a Cloud Platform project if you do not already have one.\n", 27 | "1. [Enable billing](https://support.google.com/cloud/answer/6293499#enable-billing) for the project.\n", 28 | "1. [Enable BigQuery](https://console.cloud.google.com/flows/enableapi?apiid=bigquery) APIs for the project." 29 | ], 30 | "metadata": { 31 | "id": "n86D0dGs4Lh0" 32 | } 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "source": [ 37 | "To begin with, we'll authenticate with GCP and import the python libraries necessary to execute this colab." 
38 | ], 39 | "metadata": { 40 | "id": "c-ipGY9-arep" 41 | } 42 | }, 43 | { 44 | "cell_type": "code", 45 | "source": [ 46 | "#@title Please input your project id\n", 47 | "import altair as alt\n", 48 | "import numpy as np\n", 49 | "import pandas as pd\n", 50 | "from google.cloud import bigquery\n", 51 | "# Provide credentials to the runtime\n", 52 | "from google.colab import auth\n", 53 | "from google.cloud.bigquery import magics\n", 54 | "\n", 55 | "auth.authenticate_user()\n", 56 | "print('Authenticated')\n", 57 | "project_id = 'google.com:google-cluster-data' #@param {type: \"string\"}\n", 58 | "# Set the default project id for %bigquery magic\n", 59 | "magics.context.project = project_id\n", 60 | "\n", 61 | "# Use the client to run queries constructed from a more complicated function.\n", 62 | "client = bigquery.Client(project=project_id)\n" 63 | ], 64 | "metadata": { 65 | "id": "CEhNsC1OPajn" 66 | }, 67 | "execution_count": null, 68 | "outputs": [] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "source": [ 73 | "## Basic Queries" 74 | ], 75 | "metadata": { 76 | "id": "tLOi9ZDM46oa" 77 | } 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "source": [ 82 | "Here are some examples of using the [bigquery magic](https://cloud.google.com/python/docs/reference/bigquery/latest/index.html) to query the power traces.\n", 83 | "\n", 84 | "First we'll calculate the average production utilization for a single power domain.\n" 85 | ], 86 | "metadata": { 87 | "id": "nqX1P6QOOUpt" 88 | } 89 | }, 90 | { 91 | "cell_type": "code", 92 | "source": [ 93 | "%%bigquery\n", 94 | "SELECT\n", 95 | " AVG(production_power_util) AS average_production_power_util\n", 96 | "FROM `google.com:google-cluster-data`.powerdata_2019.cella_pdu10" 97 | ], 98 | "metadata": { 99 | "id": "63jcfAIdWfqN" 100 | }, 101 | "execution_count": null, 102 | "outputs": [] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "source": [ 107 | "\n", 108 | "Now let's find the minimum and maximum measured power utilization for each cell. We use bigquery [wildcard tables](https://cloud.google.com/bigquery/docs/querying-wildcard-tables) in order to conveniently query all trace tables at once." 109 | ], 110 | "metadata": { 111 | "id": "FxNECyAlVJR3" 112 | } 113 | }, 114 | { 115 | "cell_type": "code", 116 | "source": [ 117 | "%%bigquery\n", 118 | "SELECT\n", 119 | " cell,\n", 120 | " MIN(measured_power_util) AS minimum_measured_power_util,\n", 121 | " MAX(measured_power_util) AS maximum_measured_power_util\n", 122 | "FROM `google.com:google-cluster-data.powerdata_2019.cell*`\n", 123 | "GROUP BY cell" 124 | ], 125 | "metadata": { 126 | "id": "YqpYZTniOq-t" 127 | }, 128 | "execution_count": null, 129 | "outputs": [] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "source": [ 134 | "Modifying the previous query to also group by `pdu` gives us the maximum and minimum measured power utilization per power domain." 
135 | ], 136 | "metadata": { 137 | "id": "z4PECIQ0OGYI" 138 | } 139 | }, 140 | { 141 | "cell_type": "code", 142 | "source": [ 143 | "%%bigquery\n", 144 | "SELECT\n", 145 | " cell,\n", 146 | " pdu,\n", 147 | " MIN(measured_power_util) AS minimum_measured_power_util,\n", 148 | " MAX(measured_power_util) AS maximum_measured_power_util\n", 149 | "FROM `google.com:google-cluster-data.powerdata_2019.cell*`\n", 150 | "GROUP BY cell, pdu\n", 151 | "ORDER BY maximum_measured_power_util" 152 | ], 153 | "metadata": { 154 | "id": "23wOHXUpNvNS" 155 | }, 156 | "execution_count": null, 157 | "outputs": [] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "source": [ 162 | "## Measured and Production Power over Time" 163 | ], 164 | "metadata": { 165 | "id": "NvZLmc5TYB76" 166 | } 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "source": [ 171 | "We provide traces for two clusters of Google's new Medium Voltage Power Plane (MVPP) data center design, described in [the paper](https://research.google/pubs/pub49032/). Let's plot the measured and estimated production power utilization of one of these MVPPs: mvpp1. We'll limit this visualization to the first 15 days of the trace period (the first 4320 datapoints of the trace)." 172 | ], 173 | "metadata": { 174 | "id": "i89hzgdGbrrI" 175 | } 176 | }, 177 | { 178 | "cell_type": "code", 179 | "source": [ 180 | "%%bigquery mvpp1_df\n", 181 | "SELECT\n", 182 | " time,\n", 183 | " measured_power_util,\n", 184 | " production_power_util\n", 185 | "FROM `google.com:google-cluster-data`.powerdata_2019.celli_mvpp1\n", 186 | "ORDER BY time\n", 187 | "LIMIT 4320" 188 | ], 189 | "metadata": { 190 | "id": "WhMI0Y8sYBMU" 191 | }, 192 | "execution_count": null, 193 | "outputs": [] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "source": [ 198 | "alt.Chart(mvpp1_df).mark_line().transform_fold(\n", 199 | " [\"measured_power_util\", \"production_power_util\"]).encode(\n", 200 | " x=\"time:Q\",\n", 201 | " y=alt.X(\"value:Q\", scale=alt.Scale(zero=False)),\n", 202 | " color=\"key:N\"\n", 203 | ").properties(width=700, height=75)" 204 | ], 205 | "metadata": { 206 | "id": "K07qANn2Zz_o" 207 | }, 208 | "execution_count": null, 209 | "outputs": [] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "source": [ 214 | "## CPU and Power over Time" 215 | ], 216 | "metadata": { 217 | "id": "bdzTheN-XdHv" 218 | } 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "source": [ 223 | "As an example of joining the power traces with the cluster traces, we'll plot the average CPU utilization and power utilization per hour.\n", 224 | "\n", 225 | "You may remember this query from the [cluster analysis colab](https://github.com/google/cluster-data/blob/master/clusterdata_analysis_colab.ipynb). It's been modified slightly--see `query_per_tier_utilization_time_series` in particular." 
226 | ], 227 | "metadata": { 228 | "id": "TL7Dh7-TFsK7" 229 | } 230 | }, 231 | { 232 | "cell_type": "code", 233 | "source": [ 234 | "def machines_in_pdu(pdu_number):\n", 235 | " return '''\n", 236 | "SELECT machine_id\n", 237 | "FROM `google.com:google-cluster-data`.powerdata_2019.machine_to_pdu_mapping\n", 238 | "WHERE pdu = 'pdu{pdu_number}'\n", 239 | " '''.format(pdu_number=pdu_number)\n", 240 | "\n", 241 | "def query_cell_capacity(cell, pdu_number):\n", 242 | " return '''\n", 243 | "SELECT SUM(cpu_cap) AS cpu_capacity\n", 244 | "FROM (\n", 245 | " SELECT machine_id, MAX(capacity.cpus) AS cpu_cap,\n", 246 | " FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.machine_events\n", 247 | " WHERE machine_id IN ({machine_query})\n", 248 | " GROUP BY 1\n", 249 | ")\n", 250 | " '''.format(cell=cell, machine_query=machines_in_pdu(pdu_number))\n", 251 | "\n", 252 | "def query_per_instance_usage_priority(cell, pdu_num):\n", 253 | " return '''\n", 254 | "SELECT u.time AS time,\n", 255 | " u.collection_id AS collection_id,\n", 256 | " u.instance_index AS instance_index,\n", 257 | " e.priority AS priority,\n", 258 | " CASE\n", 259 | " WHEN e.priority BETWEEN 0 AND 99 THEN '1_free'\n", 260 | " WHEN e.priority BETWEEN 100 AND 115 THEN '2_beb'\n", 261 | " WHEN e.priority BETWEEN 116 AND 119 THEN '3_mid'\n", 262 | " ELSE '4_prod'\n", 263 | " END AS tier,\n", 264 | " u.cpu_usage AS cpu_usage\n", 265 | "FROM (\n", 266 | " SELECT start_time AS time,\n", 267 | " collection_id,\n", 268 | " instance_index,\n", 269 | " machine_id,\n", 270 | " average_usage.cpus AS cpu_usage\n", 271 | " FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.instance_usage\n", 272 | " WHERE (alloc_collection_id IS NULL OR alloc_collection_id = 0)\n", 273 | " AND (end_time - start_time) >= (5 * 60 * 1e6)\n", 274 | ") AS u JOIN (\n", 275 | " SELECT collection_id, instance_index, machine_id,\n", 276 | " MAX(priority) AS priority\n", 277 | " FROM `google.com:google-cluster-data`.clusterdata_2019_{cell}.instance_events\n", 278 | " WHERE (alloc_collection_id IS NULL OR alloc_collection_id = 0)\n", 279 | " AND machine_id IN ({machine_query})\n", 280 | " GROUP BY 1, 2, 3\n", 281 | ") AS e ON u.collection_id = e.collection_id\n", 282 | " AND u.instance_index = e.instance_index\n", 283 | " AND u.machine_id = e.machine_id\n", 284 | " '''.format(cell=cell, machine_query=machines_in_pdu(pdu_num))\n", 285 | "\n", 286 | "def query_per_tier_utilization_time_series(cell, pdu_num, cpu_capacity):\n", 287 | " return '''\n", 288 | "SELECT * FROM (\n", 289 | " SELECT CAST(FLOOR(time/(1e6 * 60 * 60)) AS INT64) AS hour_index,\n", 290 | " tier,\n", 291 | " SUM(cpu_usage) / (12 * {cpu_capacity}) AS avg_cpu_usage,\n", 292 | " FROM ({table})\n", 293 | " GROUP BY 1, 2)\n", 294 | "JOIN (\n", 295 | " SELECT CAST(FLOOR(time/(1e6 * 60 * 60)) AS INT64) AS hour_index,\n", 296 | " pdu,\n", 297 | " AVG(measured_power_util) as avg_measured_power_util,\n", 298 | " AVG(production_power_util) AS avg_production_power_util\n", 299 | " FROM `google.com:google-cluster-data`.`powerdata_2019.cell{cell}_pdu{pdu_num}`\n", 300 | " GROUP BY hour_index, pdu\n", 301 | ") USING (hour_index)\n", 302 | " '''.format(table=query_per_instance_usage_priority(cell, pdu_num),\n", 303 | " cpu_capacity=cpu_capacity, cell=cell, pdu_num=pdu_num)\n", 304 | "\n", 305 | "def run_query_utilization_per_time_time_series(cell, pdu_num):\n", 306 | " cell_cap = client.query(query_cell_capacity(cell, pdu_num)).to_dataframe()\n", 307 | " query = 
query_per_tier_utilization_time_series(\n", 308 | " cell,\n", 309 | " pdu_num,\n", 310 | " cell_cap['cpu_capacity'][0])\n", 311 | " time_series = client.query(query).to_dataframe()\n", 312 | " return time_series\n", 313 | "\n", 314 | "CELL='f'\n", 315 | "PDU_NUM='17'\n", 316 | "hourly_usage = run_query_utilization_per_time_time_series(CELL, PDU_NUM)" 317 | ], 318 | "metadata": { 319 | "id": "Bmp8Z2diaCPC" 320 | }, 321 | "execution_count": null, 322 | "outputs": [] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "source": [ 327 | "Plot power utilization on top of the CPU utilization graph." 328 | ], 329 | "metadata": { 330 | "id": "E47jgIjzGnDP" 331 | } 332 | }, 333 | { 334 | "cell_type": "code", 335 | "source": [ 336 | "# CPU graph\n", 337 | "cpu = alt.Chart().mark_area().encode(\n", 338 | " alt.X('hour_index:N'),\n", 339 | " alt.Y('avg_cpu_usage:Q'),\n", 340 | " color=alt.Color('tier', legend=alt.Legend(orient=\"left\", title=None)),\n", 341 | " order=alt.Order('tier', sort='descending'),\n", 342 | " tooltip=['tier:N', 'avg_cpu_usage:Q']\n", 343 | " )\n", 344 | "cpu.encoding.x.title = \"Hour\"\n", 345 | "cpu.encoding.y.title = \"Average Utilization\"\n", 346 | "\n", 347 | "\n", 348 | "# Power Utilization graph\n", 349 | "pu = (\n", 350 | " alt.Chart()\n", 351 | " .transform_fold(['avg_measured_power_util', 'avg_production_power_util'])\n", 352 | " .encode(\n", 353 | " alt.X(\n", 354 | " 'hour_index:N',\n", 355 | " axis=alt.Axis(labels=False, domain=False, ticks=False),\n", 356 | " ),\n", 357 | " alt.Y('value:Q'),\n", 358 | " color=alt.Color('key:N', legend=None),\n", 359 | " strokeDash=alt.StrokeDash('key:N', legend=None),\n", 360 | " tooltip=['hour_index:N', 'key:N', 'value:Q']\n", 361 | " )\n", 362 | " .mark_line().properties(title=alt.datum.cell + ': ' + alt.datum.cluster)\n", 363 | ")\n", 364 | "\n", 365 | "\n", 366 | "alt.layer(cpu, pu, data=hourly_usage).properties(\n", 367 | " width=1200,\n", 368 | " height=300,\n", 369 | " title=\"Average CPU and Power Utilization\").configure_axis(grid=False)" 370 | ], 371 | "metadata": { 372 | "id": "bUCQxV7-iDPm" 373 | }, 374 | "execution_count": null, 375 | "outputs": [] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "source": [ 380 | "We can adapt the previous queries to calculate the average CPU and power utilizations per day by tier (i.e. cappable workloads or production) for each PDU in cell `b`." 
381 | ], 382 | "metadata": { 383 | "id": "G0qxiNg96pq3" 384 | } 385 | }, 386 | { 387 | "cell_type": "code", 388 | "source": [ 389 | "%%bigquery cluster_and_power_data_df\n", 390 | "WITH\n", 391 | " machines_in_cell AS (\n", 392 | " SELECT machine_id, pdu, cell\n", 393 | " FROM `google.com:google-cluster-data`.powerdata_2019.machine_to_pdu_mapping\n", 394 | " WHERE cell = 'b'\n", 395 | " ),\n", 396 | " cpu_capacities AS (\n", 397 | " SELECT pdu, cell, SUM(cpu_cap) AS cpu_capacity\n", 398 | " FROM\n", 399 | " (\n", 400 | " SELECT machine_id, MAX(capacity.cpus) AS cpu_cap,\n", 401 | " FROM `google.com:google-cluster-data`.clusterdata_2019_b.machine_events\n", 402 | " GROUP BY 1\n", 403 | " )\n", 404 | " JOIN machines_in_cell\n", 405 | " USING (machine_id)\n", 406 | " GROUP BY 1, 2\n", 407 | " ),\n", 408 | " per_instance_usage_priority AS (\n", 409 | " SELECT\n", 410 | " u.time AS time,\n", 411 | " u.collection_id AS collection_id,\n", 412 | " u.instance_index AS instance_index,\n", 413 | " e.priority AS priority,\n", 414 | " IF(e.priority < 120, 'cappable', 'production') AS tier,\n", 415 | " u.cpu_usage AS cpu_usage,\n", 416 | " m.pdu\n", 417 | " FROM\n", 418 | " (\n", 419 | " SELECT\n", 420 | " start_time AS time,\n", 421 | " collection_id,\n", 422 | " instance_index,\n", 423 | " machine_id,\n", 424 | " average_usage.cpus AS cpu_usage\n", 425 | " FROM `google.com:google-cluster-data`.clusterdata_2019_b.instance_usage\n", 426 | " WHERE\n", 427 | " (alloc_collection_id IS NULL OR alloc_collection_id = 0)\n", 428 | " AND (end_time - start_time) >= (5 * 60 * 1e6)\n", 429 | " ) AS u\n", 430 | " JOIN\n", 431 | " (\n", 432 | " SELECT collection_id, instance_index, machine_id, MAX(priority) AS priority\n", 433 | " FROM `google.com:google-cluster-data`.clusterdata_2019_b.instance_events\n", 434 | " WHERE (alloc_collection_id IS NULL OR alloc_collection_id = 0)\n", 435 | " GROUP BY 1, 2, 3\n", 436 | " ) AS e\n", 437 | " ON\n", 438 | " u.collection_id = e.collection_id\n", 439 | " AND u.instance_index = e.instance_index\n", 440 | " AND u.machine_id = e.machine_id\n", 441 | " JOIN machines_in_cell AS m\n", 442 | " ON m.machine_id = u.machine_id\n", 443 | " )\n", 444 | "SELECT *\n", 445 | "FROM\n", 446 | " (\n", 447 | " SELECT\n", 448 | " CAST(FLOOR(time / (1e6 * 60 * 60 * 24)) AS INT64) AS day_index,\n", 449 | " pdu,\n", 450 | " tier,\n", 451 | " SUM(cpu_usage) / (12 * 24 * ANY_VALUE(cpu_capacity)) AS avg_cpu_usage,\n", 452 | " FROM per_instance_usage_priority\n", 453 | " JOIN cpu_capacities\n", 454 | " USING (pdu)\n", 455 | " GROUP BY 1, 2, 3\n", 456 | " )\n", 457 | "JOIN\n", 458 | " (\n", 459 | " SELECT\n", 460 | " CAST(FLOOR((time - 6e8 + 3e8) / (1e6 * 60 * 60 * 24)) AS INT64) AS day_index,\n", 461 | " pdu,\n", 462 | " AVG(measured_power_util) AS avg_measured_power_util,\n", 463 | " AVG(production_power_util) AS avg_production_power_util\n", 464 | " FROM `google.com:google-cluster-data`.`powerdata_2019.cellb_pdu*`\n", 465 | " GROUP BY 1, 2\n", 466 | " )\n", 467 | " USING (pdu, day_index)" 468 | ], 469 | "metadata": { 470 | "id": "Id_W7pdF6xK9" 471 | }, 472 | "execution_count": null, 473 | "outputs": [] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "source": [ 478 | "cluster_and_power_data_df.describe()" 479 | ], 480 | "metadata": { 481 | "id": "-DFupvCbKniC" 482 | }, 483 | "execution_count": null, 484 | "outputs": [] 485 | }, 486 | { 487 | "cell_type": "markdown", 488 | "source": [ 489 | "### Be careful joining on `time` directly!\n", 490 | "Note that in the queries above we converted `time` from 
the cluster and power datasets to an `hour_index`, which we then use to join the two datasets. We don't want to join on `time` directly due to how the datasets are structured.\n", 491 | "\n", 492 | "The power trace `time` values are each aligned to the 5-minute mark. For example, there's a `time` at 600s, 900s, 1200s but never 700s or 601s. The cluster trace `time` values have no specific alignment. If we were to join the two datasets on `time`, we'd end up dropping a lot of data!" 493 | ], 494 | "metadata": { 495 | "id": "ok8vgKFvCCgt" 496 | } 497 | }, 498 | { 499 | "cell_type": "markdown", 500 | "source": [ 501 | "You may have also noticed that `day_index` has 32 unique values in the query above, despite May having 31 days. This is because timestamps in the datasets are represented as microseconds since 600 seconds before the start of the trace period, May 01 2019 at 00:00 PT." 502 | ], 503 | "metadata": { 504 | "id": "ifMNifTPODkO" 505 | } 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "source": [ 510 | "## Recreating Graphs from the Paper" 511 | ], 512 | "metadata": { 513 | "id": "wbXuUBygOu5-" 514 | } 515 | }, 516 | { 517 | "cell_type": "markdown", 518 | "source": [ 519 | "Below, the power traces are used to re-create figures from [the paper](https://research.google/pubs/pub49032/)." 520 | ], 521 | "metadata": { 522 | "id": "aARdyheiRgVp" 523 | } 524 | }, 525 | { 526 | "cell_type": "code", 527 | "source": [ 528 | "def histogram_data(filter='pdu%', agg_by='pdu', util_type='measured_power_util'):\n", 529 | " query = \"\"\"\n", 530 | " SELECT bin as bins, SUM(count) as counts\n", 531 | " FROM (\n", 532 | " SELECT {agg_by},\n", 533 | " ROUND(CAST({util_type} / 0.0001 as INT64) * 0.0001, 3) as bin,\n", 534 | " COUNT(*) as count\n", 535 | " FROM (\n", 536 | " SELECT {agg_by},\n", 537 | " time,\n", 538 | " {util_type},\n", 539 | " FROM `google.com:google-cluster-data`.`powerdata_2019.cell*`\n", 540 | " WHERE NOT bad_measurement_data\n", 541 | " AND NOT bad_production_power_data\n", 542 | " AND {agg_by} LIKE '{filter}'\n", 543 | " AND NOT cell in ('i', 'j')\n", 544 | " )\n", 545 | " GROUP BY 1, 2\n", 546 | " ) GROUP BY 1 ORDER BY 1;\n", 547 | " \"\"\".format(**{'filter': filter, 'agg_by': agg_by, 'util_type': util_type})\n", 548 | " return client.query(query).to_dataframe()" 549 | ], 550 | "metadata": { 551 | "id": "LFuWscTAOyE3" 552 | }, 553 | "execution_count": null, 554 | "outputs": [] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "source": [ 559 | "pdu_df = histogram_data()\n", 560 | "cluster_df = histogram_data('%', 'cell')\n", 561 | "prod_pdu_df = histogram_data(util_type='production_power_util')\n", 562 | "prod_cluster_df = histogram_data('%', 'cell', util_type='production_power_util')" 563 | ], 564 | "metadata": { 565 | "id": "62vMXuSYUIG8" 566 | }, 567 | "execution_count": null, 568 | "outputs": [] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "source": [ 573 | "def make_cdf(p_df, c_df, title):\n", 574 | " pdu_counts, pdu_bins = p_df.counts, p_df.bins\n", 575 | " cluster_counts, cluster_bins = c_df.counts, c_df.bins\n", 576 | "\n", 577 | " pdu_cdf = np.cumsum(list(pdu_counts))\n", 578 | " pdu_cdf = (1.0 * pdu_cdf) / pdu_cdf[-1]\n", 579 | " cluster_cdf = np.cumsum(list(cluster_counts))\n", 580 | " cluster_cdf = (1.0 * cluster_cdf) / cluster_cdf[-1]\n", 581 | "\n", 582 | " pdu_cdf_graph = alt.Chart(pd.DataFrame(\n", 583 | " {'bins': pdu_bins, 'cdf': pdu_cdf})).mark_point(size=.1).encode(\n", 584 | " x=alt.X('bins', scale=alt.Scale(domain=[0.4,
0.90])),\n", 585 | " y=alt.Y('cdf'),\n", 586 | " color=alt.value('steelblue')\n", 587 | " )\n", 588 | "\n", 589 | " cluster_cdf_graph = alt.Chart(pd.DataFrame(\n", 590 | " {'bins': cluster_bins, 'cdf': cluster_cdf})).mark_point(size=.1).encode(\n", 591 | " x=alt.X('bins', scale=alt.Scale(domain=[0.4, 0.90])),\n", 592 | " y=alt.Y('cdf'),\n", 593 | " color=alt.value('forestgreen')\n", 594 | " )\n", 595 | "\n", 596 | " return (pdu_cdf_graph + cluster_cdf_graph).properties(title=title)\n", 597 | "\n", 598 | "\n", 599 | "alt.hconcat(make_cdf(pdu_df, cluster_df, \"Measured Power Util\"), make_cdf(\n", 600 | " prod_pdu_df, prod_cluster_df, \"Production Power Util\"))" 601 | ], 602 | "metadata": { 603 | "id": "eEze9O1CUlSU" 604 | }, 605 | "execution_count": null, 606 | "outputs": [] 607 | } 608 | ] 609 | } -------------------------------------------------------------------------------- /bibliography.bib: -------------------------------------------------------------------------------- 1 | ################################################################ 2 | # Introduction 3 | ################################################################ 4 | 5 | This bibliography is a resource for people writing papers that refer 6 | to the Google cluster traces. It covers papers that analyze the 7 | traces, as well as ones that use them as inputs to other studies. 8 | 9 | * I recommend using \usepackage{url}. 10 | * Entries are in publication-date order, with the most recent at the top. 11 | * Bibtex ignores stuff that is outside the entries, so text like this is safe. 12 | 13 | The following are the RECOMMENDED CITATIONS if you just need the basics: 14 | 15 | * Borg: 16 | * \cite{clusterdata:Verma2015, clusterdata:Tirmazi2020} for Borg itself 17 | * 2019 traces: 18 | * \cite{clusterdata:Wilkes2020, clusterdata:Wilkes2020a, clusterdata:Tirmazi2020} for 19 | the complete set of info about trace itself. 20 | * \cite{clusterdata:Wilkes2020} for the 2019 trace announcement 21 | * \cite{clusterdata:Wilkes2020a} for the details about the 2019 trace contents 22 | * \cite{clusterdata:Tirmazi2020} for the EuroSys paper about the 2019 and 2011 traces 23 | * 2011 trace: 24 | * \cite{clusterdata:Wilkes2011, clusterdata:Reiss2011} for the trace itself 25 | * \cite{clusterdata:Reiss2012b} for the first thorough analysis of it. 26 | 27 | If you use the traces, please send a bibtex entry that looks *exactly* like one 28 | of these to johnwilkes@google.com, so your paper can be added - and cited! A 29 | Github pull request is the best format. 30 | 31 | ################################################################ 32 | # Trace-announcements 33 | ################################################################ 34 | 35 | These entries can be used to cite the traces themselves. 36 | 37 | # The May 2019 traces. 38 | # Use clusterdata:Tirmazi2020 for the first paper to analyze them. 
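A minimal sketch of how entries like the ones below might be pulled into a paper, assuming
you have saved this file as bibliography.bib next to your LaTeX source; the article class and
the plain bibliography style here are purely illustrative, so substitute whatever your venue requires:

  \documentclass{article}
  \usepackage{url}   % recommended above, so the \url{...} notes in these entries typeset cleanly
  \begin{document}
  We use the 2019 trace~\cite{clusterdata:Wilkes2020a}, first analyzed in~\cite{clusterdata:Tirmazi2020}.
  \bibliographystyle{plain}
  \bibliography{bibliography}
  \end{document}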
39 | 40 | # This is the formal announcement of the trace: 41 | @Misc{clusterdata:Wilkes2020, 42 | author = {John Wilkes}, 43 | title = {Yet more {Google} compute cluster trace data}, 44 | howpublished = {Google research blog}, 45 | month = Apr, 46 | year = 2020, 47 | address = {Mountain View, CA, USA}, 48 | note = {Posted at \url{https://ai.googleblog.com/2020/04/yet-more-google-compute-cluster-trace.html}.}, 49 | } 50 | 51 | # If you want to cite details about the trace itself: 52 | @TechReport{clusterdata:Wilkes2020a, 53 | author = {John Wilkes}, 54 | title = {{Google} cluster-usage traces v3}, 55 | institution = {Google Inc.}, 56 | year = 2020, 57 | month = Apr, 58 | type = {Technical Report}, 59 | address = {Mountain View, CA, USA}, 60 | note = {Posted at \url{https://github.com/google/cluster-data/blob/master/ClusterData2019.md}}, 61 | abstract = { 62 | This document describes the semantics, data format, and 63 | schema of usage traces of a few Google compute cells. 64 | This document describes version 3 of the trace format.}, 65 | } 66 | 67 | #---------------- 68 | The next couple are for the May 2011 "full" trace. 69 | @Misc{clusterdata:Wilkes2011, 70 | author = {John Wilkes}, 71 | title = {More {Google} cluster data}, 72 | howpublished = {Google research blog}, 73 | month = Nov, 74 | year = 2011, 75 | address = {Mountain View, CA, USA}, 76 | note = {Posted at \url{http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html}.}, 77 | } 78 | 79 | @TechReport{clusterdata:Reiss2011, 80 | author = {Charles Reiss and John Wilkes and Joseph L. Hellerstein}, 81 | title = {{Google} cluster-usage traces: format + schema}, 82 | institution = {Google Inc.}, 83 | year = 2011, 84 | month = Nov, 85 | type = {Technical Report}, 86 | address = {Mountain View, CA, USA}, 87 | note = {Revised 2014-11-17 for version 2.1. Posted at 88 | \url{https://github.com/google/cluster-data}}, 89 | } 90 | 91 | 92 | #---------------- 93 | # The next one is for the earlier "small" 7-hour trace. 94 | # (Most people should not be using this.) 95 | 96 | @Misc{clusterdata:Hellersetein2010, 97 | author = {Joseph L. Hellerstein}, 98 | title = {{Google} cluster data}, 99 | howpublished = {Google research blog}, 100 | month = Jan, 101 | year = 2010, 102 | note = {Posted at \url{http://googleresearch.blogspot.com/2010/01/google-cluster-data.html}.}, 103 | } 104 | 105 | #---------------- 106 | The canonical Borg paper. 107 | @inproceedings{clusterdata:Verma2015, 108 | title = {Large-scale cluster management at {Google} with {Borg}}, 109 | author = {Abhishek Verma and Luis Pedrosa and Madhukar R. Korupolu and David Oppenheimer and Eric Tune and John Wilkes}, 110 | year = {2015}, 111 | booktitle = {Proceedings of the European Conference on Computer Systems (EuroSys'15)}, 112 | address = {Bordeaux, France}, 113 | articleno = {18}, 114 | numpages = {17}, 115 | abstract = { 116 | Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, 117 | from many thousands of different applications, across a number of clusters each with 118 | up to tens of thousands of machines. 119 | 120 | It achieves high utilization by combining admission control, efficient task-packing, 121 | over-commitment, and machine sharing with process-level performance isolation. 122 | It supports high-availability applications with runtime features that minimize 123 | fault-recovery time, and scheduling policies that reduce the probability of correlated 124 | failures. 
Borg simplifies life for its users by offering a declarative job specification 125 | language, name service integration, real-time job monitoring, and tools to analyze and 126 | simulate system behavior. 127 | 128 | We present a summary of the Borg system architecture and features, important design 129 | decisions, a quantitative analysis of some of its policy decisions, and a qualitative 130 | examination of lessons learned from a decade of operational experience with it.}, 131 | url = {https://dl.acm.org/doi/10.1145/2741948.2741964}, 132 | doi = {10.1145/2741948.2741964}, 133 | } 134 | 135 | 136 | #---------------- 137 | The next paper describes the policy choices and technologies used to 138 | make the traces safe to release. 139 | 140 | @InProceedings{clusterdata:Reiss2012, 141 | author = {Charles Reiss and John Wilkes and Joseph L. Hellerstein}, 142 | title = {Obfuscatory obscanturism: making workload traces of 143 | commercially-sensitive systems safe to release}, 144 | year = 2012, 145 | booktitle = {3rd International Workshop on Cloud Management (CLOUDMAN)}, 146 | month = Apr, 147 | publisher = {IEEE}, 148 | pages = {1279--1286}, 149 | address = {Maui, HI, USA}, 150 | abstract = {Cloud providers such as Google are interested in fostering 151 | research on the daunting technical challenges they face in 152 | supporting planetary-scale distributed systems, but no 153 | academic organizations have similar scale systems on which to 154 | experiment. Fortunately, good research can still be done using 155 | traces of real-life production workloads, but there are risks 156 | in releasing such data, including inadvertently disclosing 157 | confidential or proprietary information, as happened with the 158 | Netflix Prize data. This paper discusses these risks, and our 159 | approach to them, which we call systematic obfuscation. It 160 | protects proprietary and personal data while leaving it 161 | possible to answer interesting research questions. We explain 162 | and motivate some of the risks and concerns and propose how 163 | they can best be mitigated, using as an example our recent 164 | publication of a month-long trace of a production system 165 | workload on a 11k-machine cluster.}, 166 | url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6212064}, 167 | } 168 | 169 | ################################################################ 170 | # Trace-analysis papers 171 | ################################################################ 172 | 173 | These papers are primarily about analyzing the traces. 174 | Order: most recent first. 175 | 176 | If you just want one citation about the Cluster2011 trace, then 177 | use \cite{clusterdata:Reiss2012b}. 178 | 179 | 180 | ################ 2025 181 | @InProceedings{clusterdata:Sliwko2025, 182 | author = {Sliwko, Leszek and Mizera-Pietraszko, Jolanta}, 183 | title = {Enhancing Cluster Scheduling in {HPC}: A Continuous Transfer Learning for Real-Time Optimization}, 184 | year = 2025, 185 | booktitle = {2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)}, 186 | month = Jun, 187 | pages = {316--325}, 188 | doi = {10.1109/IPDPSW66978.2025.00056}, 189 | issn = {2995-066X}, 190 | url = {https://ieeexplore.ieee.org/document/11105897}, 191 | keywords = {Cloud computing;machine learning;load balancing and task assignment;transfer learning}, 192 | abstract = {This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, 193 | focusing on node-affinity constraints. 
Traditional schedulers like Kubernetes struggle with real-time adaptability, 194 | whereas the proposed continuous transfer learning model evolves dynamically during operations, minimizing retraining 195 | needs. Evaluated on Google Cluster Data, the model achieves over 99\% accuracy, reducing computational overhead and 196 | improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing 197 | machine learning integration in cluster management and paving the way for future adaptive scheduling strategies.}, 198 | } 199 | 200 | 201 | ################ 2024 202 | @article{clusterdata:Sliwko2024, 203 | author = {Sliwko, Leszek}, 204 | title = {Cluster Workload Allocation: A Predictive Approach Leveraging Machine Learning Efficiency}, 205 | year = 2024, 206 | month = Dec, 207 | journal = {IEEE Access}, 208 | volume = 12, 209 | pages = {194091--194107}, 210 | doi = {10.1109/ACCESS.2024.3520422}, 211 | issn = {2169-3536}, 212 | url = {https://ieeexplore.ieee.org/document/10807210}, 213 | keywords = {Machine learning;classification algorithms;load balancing and task assignment;Google Cluster Data}, 214 | abstract = {This research investigates how Machine Learning (ML) algorithms can assist in workload allocation 215 | strategies by detecting tasks with node affinity operators (referred to as constraint operators), which constrain 216 | their execution to a limited number of nodes. Using real-world Google Cluster Data (GCD) workload traces and 217 | the AGOCS framework, the study extracts node attributes and task constraints, then analyses them to identify suitable 218 | node-task pairings. It focuses on tasks that can be executed on either a single node or fewer than a thousand out of 219 | 12.5k nodes in the analysed GCD cluster. Task constraint operators are compacted, pre-processed with one-hot 220 | encoding,and used as features in a training dataset. Various ML classifiers, including Artificial Neural Networks, 221 | K-Nearest Neighbours, Decision Trees, Naive Bayes, Ridge Regression, Adaptive Boosting, and Bagging, are fine-tuned 222 | and assessed for accuracy and F1-scores. The final ensemble voting classifier model achieved 98\% accuracy and a 223 | 1.5-1.8\% misclassification rate for tasks with a single suitable node.} 224 | } 225 | 226 | 227 | ################ 2023 228 | @INPROCEEDINGS{clusterdata:Tuns2023, 229 | author = {Tuns, Adrian-Ioan and Spătaru, Adrian}, 230 | booktitle = {2023 25th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)}, 231 | title = {Cloud Service Failure Prediction on {Google’s Borg} Cluster Traces Using Traditional Machine Learning}, 232 | year = 2023, 233 | month = Sep, 234 | ISSN = {2470-881X}, 235 | doi = {10.1109/SYNASC61333.2023.00029}, 236 | url = {https://doi.org/10.1109/SYNASC61333.2023.00029}, 237 | pages = {162--169}, 238 | keywords = {Cloud computing;Machine learning algorithms;Scientific computing;Clustering algorithms; 239 | Prediction algorithms;Boosting;Classification algorithms;failure prediction;big data;machine learning; 240 | classification algorithms;Google Borg}, 241 | abstract = {The ability to predict failures in complex systems is crucial for maintaining their 242 | optimal performance, opening the possibility of reducing downtime and minimizing costs. 
243 | In the context of cloud computing, cloud failure represents one of the most relevant problems, 244 | which not only leads to substantial financial losses but also negatively impacts the productivity 245 | of both industrial and end users. This paper presents a comprehensive study on the application of 246 | failure prediction techniques, by exploring four machine learning algorithms, namely Decision Tree, 247 | Random Forest, Gradient Boosting, and Logistic Regression.The research focuses on analyzing the 248 | workload of an industrial set of clusters, provided as traces in Google’s Borg cluster workload traces. 249 | The aim was to develop highly accurate predictive models for both job and task failures, a goal which 250 | was achieved. A job classifier having a performance of 83.97\% accuracy (Gradient Boosting) and a task 251 | classifier of 98.79\% accuracy performance (Decision Tree) were obtained.}, 252 | } 253 | 254 | 255 | ################ 2022 256 | @inproceedings {clusterdata:jajooSLearn2022, 257 | author = {Akshay Jajoo and Y. Charlie Hu and Xiaojun Lin and Nan Deng}, 258 | title = {A Case for Task Sampling based Learning for Cluster Job Scheduling}, 259 | booktitle = {19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)}, 260 | year = {2022}, 261 | address = {Renton, WA, USA}, 262 | url = {https://www.usenix.org/conference/nsdi22/presentation/jajoo}, 263 | publisher = {USENIX Association}, 264 | keywords = {data centers, big data, job scheduling, learning, online learning}, 265 | abstract = {The ability to accurately estimate job runtime properties allows a 266 | scheduler to effectively schedule jobs. State-of-the-art online cluster job 267 | schedulers use history-based learning, which uses past job execution information 268 | to estimate the runtime properties of newly arrived jobs. However, with fast-paced 269 | development in cluster technology (in both hardware and software) and changing user 270 | inputs, job runtime properties can change over time, which lead to inaccurate predictions. 271 | In this paper, we explore the potential and limitation of real-time learning of job 272 | runtime properties, by proactively sampling and scheduling a small fraction of the 273 | tasks of each job. Such a task-sampling-based approach exploits the similarity among 274 | runtime properties of the tasks of the same job and is inherently immune to changing 275 | job behavior. Our study focuses on two key questions in comparing task-sampling-based 276 | learning (learning in space) and history-based learning (learning in time): (1) Can 277 | learning in space be more accurate than learning in time? (2) If so, can delaying 278 | scheduling the remaining tasks of a job till the completion of sampled tasks be more 279 | than compensated by the improved accuracy and result in improved job performance? Our 280 | analytical and experimental analysis of 3 production traces with different skew and job 281 | distribution shows that learning in space can be substantially more accurate. 
Our 282 | simulation and testbed evaluation on Azure of the two learning approaches anchored in a 283 | generic job scheduler using 3 production cluster job traces shows that despite its online 284 | overhead, learning in space reduces the average Job Completion Time (JCT) by 1.28x, 1.56x, 285 | and 1.32x compared to the prior-art history-based predictor.}, 286 | } 287 | 288 | 289 | ################ 2021 290 | @article{clusterdata:jajooSLearnTechReport2021, 291 | author = {Akshay Jajoo and Y. Charlie Hu and Xiaojun Lin and Nan Deng}, 292 | title = {The Case for Task Sampling based Learning for Cluster Job Scheduling}, 293 | journal = {Computing Research Repository}, 294 | volume = {abs/2108.10464}, 295 | year = {2021}, 296 | url = {https://arxiv.org/abs/2108.10464}, 297 | eprinttype = {arXiv}, 298 | eprint = {2108.10464}, 299 | timestamp = {Fri, 27 Aug 2021 15:02:29 +0200}, 300 | biburl = {https://dblp.org/rec/journals/corr/abs-2108-10464.bib}, 301 | bibsource = {dblp computer science bibliography, https://dblp.org}, 302 | keywords = {data centers, big data, job scheduling, learning, online learning}, 303 | abstract = {The ability to accurately estimate job runtime properties allows a 304 | scheduler to effectively schedule jobs. State-of-the-art online cluster job 305 | schedulers use history-based learning, which uses past job execution information 306 | to estimate the runtime properties of newly arrived jobs. However, with fast-paced 307 | development in cluster technology (in both hardware and software) and changing user 308 | inputs, job runtime properties can change over time, which lead to inaccurate predictions. 309 | In this paper, we explore the potential and limitation of real-time learning of job 310 | runtime properties, by proactively sampling and scheduling a small fraction of the 311 | tasks of each job. Such a task-sampling-based approach exploits the similarity among 312 | runtime properties of the tasks of the same job and is inherently immune to changing 313 | job behavior. Our study focuses on two key questions in comparing task-sampling-based 314 | learning (learning in space) and history-based learning (learning in time): (1) Can 315 | learning in space be more accurate than learning in time? (2) If so, can delaying 316 | scheduling the remaining tasks of a job till the completion of sampled tasks be more 317 | than compensated by the improved accuracy and result in improved job performance? Our 318 | analytical and experimental analysis of 3 production traces with different skew and job 319 | distribution shows that learning in space can be substantially more accurate. Our 320 | simulation and testbed evaluation on Azure of the two learning approaches anchored in a 321 | generic job scheduler using 3 production cluster job traces shows that despite its online 322 | overhead, learning in space reduces the average Job Completion Time (JCT) by 1.28x, 1.56x, 323 | and 1.32x compared to the prior-art history-based predictor.}, 324 | } 325 | 326 | ################ 2020 327 | 328 | @inproceedings{clusterdata:Tirmazi2020, 329 | author = {Tirmazi, Muhammad and Barker, Adam and Deng, Nan and Haque, Md E. 
and Qin, Zhijing Gene and Hand, Steven and Harchol-Balter, Mor and Wilkes, John}, 330 | title = {{Borg: the Next Generation}}, 331 | year = {2020}, 332 | isbn = {9781450368827}, 333 | publisher = {ACM}, 334 | address = {Heraklion, Greece}, 335 | url = {https://doi.org/10.1145/3342195.3387517}, 336 | doi = {10.1145/3342195.3387517}, 337 | booktitle = {Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys'20)}, 338 | articleno = {30}, 339 | numpages = {14}, 340 | keywords = {data centers, cloud computing}, 341 | abstract = { 342 | This paper analyzes a newly-published trace that covers 8 343 | different Borg clusters for the month of May 2019. The 344 | trace enables researchers to explore how scheduling works in 345 | large-scale production compute clusters. We highlight how 346 | Borg has evolved and perform a longitudinal comparison of 347 | the newly-published 2019 trace against the 2011 trace, which 348 | has been highly cited within the research community. 349 | Our findings show that Borg features such as alloc sets 350 | are used for resource-heavy workloads; automatic vertical 351 | scaling is effective; job-dependencies account for much of 352 | the high failure rates reported by prior studies; the workload 353 | arrival rate has increased, as has the use of resource 354 | over-commitment; the workload mix has changed, jobs have 355 | migrated from the free tier into the best-effort batch tier; 356 | the workload exhibits an extremely heavy-tailed distribution 357 | where the top 1\% of jobs consume over 99\% of resources; and 358 | there is a great deal of variation between different clusters.}, 359 | } 360 | 361 | 362 | ################ 2018 363 | 364 | @article{clusterdata:Sebastio2018, 365 | title = {Characterizing machines lifecycle in Google data centers}, 366 | journal = {Performance Evaluation}, 367 | volume = 126, 368 | pages = {39 -- 63}, 369 | year = 2018, 370 | issn = {0166-5316}, 371 | doi = {https://doi.org/10.1016/j.peva.2018.08.001}, 372 | url = {http://www.sciencedirect.com/science/article/pii/S016653161830004X}, 373 | author = {Stefano Sebastio and Kishor S. Trivedi and Javier Alonso}, 374 | keywords = {Statistical analysis, Distributed architectures, Cloud computing, System reliability, Large-scale systems, Empirical studies}, 375 | abstract = {Due to the increasing need for computational power, the market has 376 | shifted towards big centralized data centers. Understanding the nature 377 | of the dynamics of these data centers from machine and job/task 378 | perspective is critical to design efficient data center management 379 | policies like optimal resource/power utilization, capacity planning and 380 | optimal (reactive and proactive) maintenance scheduling. Whereas 381 | jobs/tasks dynamics have received a lot of attention, the study of the 382 | dynamics of the underlying machines supporting the jobs/tasks execution 383 | has received much less attention, even when these dynamics would 384 | substantially affect the performance of the jobs/tasks execution. Given 385 | the limited data available from large computing installations, only a 386 | few previous studies have inspected data centers and only concerning 387 | failures and their root causes. In this paper, we study the 2011 Google 388 | data center traces from the machine dynamics perspective. First, we 389 | characterize the machine events and their underlying distributions in 390 | order to have a better understanding of the entire machine lifecycle. 
391 | Second, we propose a data-driven model to enable the estimate of the 392 | expected number of available machines at any instant of time. The model 393 | is parameterized and validated using the empirical data collected by 394 | Google during a one month period.} 395 | } 396 | 397 | ################ 2017 398 | 399 | @Inbook{clusterdata:Ray2017, 400 | author = {Ray, Biplob R. and Chowdhury, Morshed and Atif, Usman}, 401 | editor = {Doss, Robin and Piramuthu, Selwyn and Zhou, Wei}, 402 | title = {Is {High Performance Computing (HPC)} Ready to Handle Big Data?}, 403 | bookTitle = {Future Network Systems and Security}, 404 | year = 2017, 405 | month = Aug, 406 | publisher = {Springer}, 407 | address = {Cham, Switzerland}, 408 | pages = {97--112}, 409 | abstract={In recent years big data has emerged as a universal term and its 410 | management has become a crucial research topic. The phrase `big data' 411 | refers to data sets so large and complex that the processing of them 412 | requires collaborative High Performance Computing (HPC). How to 413 | effectively allocate resources is one of the prime challenges in 414 | HPC. This leads us to the question: are the existing HPC resource 415 | allocation techniques effective enough to support future big data 416 | challenges? In this context, we have investigated the effectiveness of 417 | HPC resource allocation using the Google cluster dataset and a number of 418 | data mining tools to determine the correlational coefficient between 419 | resource allocation, resource usages and priority. Our analysis 420 | initially focused on correlation between resource allocation and 421 | resource uses. The finding shows that a high volume of resources that 422 | are allocated by the system for a job are not being used by that same 423 | job. To investigate further, we analyzed the correlation between 424 | resource allocation, resource usages and priority. Our clustering, 425 | classification and prediction techniques identified that the allocation 426 | and uses of resources are very loosely correlated with priority of the 427 | jobs. This research shows that our current HPC scheduling needs 428 | improvement in order to accommodate the big data challenge 429 | efficiently.}, 430 | keywords = {Big data; HPC; Data mining; QoS; Correlation }, 431 | isbn = {978-3-319-65548-2}, 432 | doi = {10.1007/978-3-319-65548-2_8}, 433 | url = {https://doi.org/10.1007/978-3-319-65548-2_8}, 434 | } 435 | 436 | @INPROCEEDINGS{clusterdata:Elsayed2017, 437 | author = {Nosayba El-Sayed and Hongyu Zhu and Bianca Schroeder}, 438 | title = {Learning from Failure Across Multiple Clusters: A Trace-Driven Approach to Understanding, Predicting, and Mitigating Job Terminations}, 439 | booktitle={International Conference on Distributed Computing Systems (ICDCS)}, 440 | year=2017, 441 | month=Jun, 442 | pages={1333--1344}, 443 | abstract={In large-scale computing platforms, jobs are prone to interruptions 444 | and premature terminations, limiting their usability and leading to 445 | significant waste in cluster resources. In this paper, we tackle this 446 | problem in three steps. First, we provide a comprehensive study based on 447 | log data from multiple large-scale production systems to identify 448 | patterns in the behaviour of unsuccessful jobs across different clusters 449 | and investigate possible root causes behind job termination. Our results 450 | reveal several interesting properties that distinguish unsuccessful jobs 451 | from others, particularly w.r.t.
resource consumption patterns and job 452 | configuration settings. Secondly, we design a machine learning-based 453 | framework for predicting job and task terminations. We show that job 454 | failures can be predicted relatively early with high precision and 455 | recall, and also identify attributes that have strong predictive power 456 | of job failure. Finally, we demonstrate in a concrete use case how our 457 | prediction framework can be used to mitigate the effect of unsuccessful 458 | execution using an effective task-cloning policy that we propose.}, 459 | keywords={learning (artificial intelligence);parallel 460 | processing;resource allocation;software fault tolerance; job 461 | configuration settings;job failures prediction;job 462 | terminations mitigation;job terminations prediction; 463 | large-scale computing platforms;machine learning-based 464 | framework;resource consumption patterns; task-cloning 465 | policy;trace-driven approach;Computer crashes;Electric 466 | breakdown;Google;Large-scale systems; Linear systems;Parallel 467 | processing;Program processors;Failure Mitigation;Failure 468 | Prediction;Job Failure; Large-Scale Systems;Reliability;Trace 469 | Analysis}, doi={10.1109/ICDCS.2017.317}, issn={1063-6927}, } 470 | 471 | ################ 2014 472 | 473 | @INPROCEEDINGS{clusterdata:Abdul-Rahman2014, 474 | author = {Abdul-Rahman, Omar Arif and Aida, Kento}, 475 | title = {Towards understanding the usage behavior of {Google} cloud 476 | users: the mice and elephants phenomenon}, 477 | booktitle = {IEEE International Conference on Cloud Computing 478 | Technology and Science (CloudCom)}, 479 | year = 2014, 480 | month = dec, 481 | address = {Singapore}, 482 | pages = {272--277}, 483 | keywords = {Google trace; Workload trace analysis; User session view; 484 | Application composition; Mass-Count disparity; Exploratory statistical 485 | analysis; Visual analysis; Color-schemed graphs; Coarse grain 486 | classification; Heavy-tailed distributions; Long-tailed lognormal 487 | distributions; Exponential distribution; Normal distribution; Discrete 488 | modes; Large web services; Batch processing; MapReduce computation; 489 | Human users; }, 490 | abstract = {In the era of cloud computing, users encounter the challenging 491 | task of effectively composing and running their applications on the 492 | cloud. In an attempt to understand user behavior in constructing 493 | applications and interacting with typical cloud infrastructures, we 494 | analyzed a large utilization dataset of Google cluster. In the present 495 | paper, we consider user behavior in composing applications from the 496 | perspective of topology, maximum requested computational resources, and 497 | workload type. We model user dynamic behavior around the user's session 498 | view. Mass-Count disparity metrics are used to investigate the 499 | characteristics of underlying statistical models and to characterize 500 | users into distinct groups according to their composition and behavioral 501 | classes and patterns. 
The present study reveals interesting insight into 502 | the heterogeneous structure of the Google cloud workload.}, 503 | doi = {10.1109/CloudCom.2014.75}, 504 | } 505 | 506 | ################ 2013 507 | 508 | @inproceedings{clusterdata:Di2013, 509 | title = {Characterizing cloud applications on a {Google} data center}, 510 | author = {Di, Sheng and Kondo, Derrick and Cappello, Franck}, 511 | booktitle = {42nd International Conference on Parallel Processing (ICPP)}, 512 | year = 2013, 513 | month = Oct, 514 | address = {Lyon, France}, 515 | abstract = {In this paper, we characterize Google applications, 516 | based on a one-month Google trace with over 650k jobs running 517 | across over 12000 heterogeneous hosts from a Google data 518 | center. On one hand, we carefully compute the valuable 519 | statistics about task events and resource utilization for 520 | Google applications, based on various types of resources (such 521 | as CPU, memory) and execution types (e.g., whether they can 522 | run batch tasks or not). Resource utilization per application 523 | is observed with an extremely typical Pareto principle. On the 524 | other hand, we classify applications via a K-means clustering 525 | algorithm with optimized number of sets, based on task events 526 | and resource usage. The number of applications in the Kmeans 527 | clustering sets follows a Pareto-similar distribution. We 528 | believe our work is very interesting and valuable for the 529 | further investigation of Cloud environment.}, 530 | } 531 | 532 | ################ 2012 533 | 534 | @INPROCEEDINGS{clusterdata:Reiss2012b, 535 | title = {Heterogeneity and dynamicity of clouds at scale: {Google} 536 | trace analysis}, 537 | author = {Charles Reiss and Alexey Tumanov and Gregory R. Ganger and 538 | Randy H. Katz and Michael A. Kozuch}, 539 | booktitle = {ACM Symposium on Cloud Computing (SoCC)}, 540 | year = 2012, 541 | month = Oct, 542 | address = {San Jose, CA, USA}, 543 | abstract = {To better understand the challenges in developing effective 544 | cloud-based resource schedulers, we analyze the first publicly available 545 | trace data from a sizable multi-purpose cluster. The most notable 546 | workload characteristic is heterogeneity: in resource types (e.g., 547 | cores:RAM per machine) and their usage (e.g., duration and resources 548 | needed). Such heterogeneity reduces the effectiveness of traditional 549 | slot- and core-based scheduling. Furthermore, some tasks are 550 | constrained as to the kind of machine types they can use, increasing the 551 | complexity of resource assignment and complicating task migration. The 552 | workload is also highly dynamic, varying over time and most workload 553 | features, and is driven by many short jobs that demand quick scheduling 554 | decisions. While few simplifying assumptions apply, we find that many 555 | longer-running jobs have relatively stable resource utilizations, which 556 | can help adaptive resource schedulers.}, 557 | url = {http://www.pdl.cmu.edu/PDL-FTP/CloudComputing/googletrace-socc2012.pdf}, 558 | privatenote = {An earlier version of this was posted at 559 | \url{http://www.istc-cc.cmu.edu/publications/papers/2012/ISTC-CC-TR-12-101.pdf}, 560 | and included here as clusterdata:Reiss2012a.
Please use this 561 | version instead of that.}, 562 | } 563 | 564 | @INPROCEEDINGS{clusterdata:Liu2012, 565 | author = {Zitao Liu and Sangyeun Cho}, 566 | title = {Characterizing machines and workloads on a {Google} cluster}, 567 | booktitle = {8th International Workshop on Scheduling and Resource 568 | Management for Parallel and Distributed Systems (SRMPDS)}, 569 | year = 2012, 570 | month = Sep, 571 | address = {Pittsburgh, PA, USA}, 572 | abstract = {Cloud computing offers high scalability, flexibility and 573 | cost-effectiveness to meet emerging computing 574 | requirements. Understanding the characteristics of real workloads on a 575 | large production cloud cluster benefits not only cloud service providers 576 | but also researchers and daily users. This paper studies a large-scale 577 | Google cluster usage trace dataset and characterizes how the machines in 578 | the cluster are managed and the workloads submitted during a 29-day 579 | period behave. We focus on the frequency and pattern of machine 580 | maintenance events, job- and task-level workload behavior, and how the 581 | overall cluster resources are utilized.}, 582 | url = {http://www.cs.pitt.edu/cast/abstract/liu-srmpds12.html}, 583 | } 584 | 585 | @INPROCEEDINGS{clusterdata:Di2012a, 586 | author = {Sheng Di and Derrick Kondo and Walfredo Cirne}, 587 | title = {Characterization and comparison of cloud versus {Grid} workloads}, 588 | booktitle = {International Conference on Cluster Computing (IEEE CLUSTER)}, 589 | year = 2012, 590 | month = Sep, 591 | pages = {230--238}, 592 | address = {Beijing, China}, 593 | abstract = {A new era of Cloud Computing has emerged, but the characteristics 594 | of Cloud load in data centers is not perfectly clear. Yet this 595 | characterization is critical for the design of novel Cloud job and 596 | resource management systems. In this paper, we comprehensively 597 | characterize the job/task load and host load in a real-world production 598 | data center at Google Inc. We use a detailed trace of over 25 million 599 | tasks across over 12,500 hosts. We study the differences between a 600 | Google data center and other Grid/HPC systems, from the perspective of 601 | both work load (w.r.t. jobs and tasks) and host load 602 | (w.r.t. machines). In particular, we study the job length, job 603 | submission frequency, and the resource utilization of jobs in the 604 | different systems, and also investigate valuable statistics of machine's 605 | maximum load, queue state and relative usage levels, with different job 606 | priorities and resource attributes. We find that the Google data center 607 | exhibits finer resource allocation with respect to CPU and memory than 608 | that of Grid/HPC systems. Google jobs are always submitted with much 609 | higher frequency and they are much shorter than Grid jobs. 
As such, 610 | Google host load exhibits higher variance and noise.}, 611 | keywords = {cloud computing;computer centres;grid computing;queueing 612 | theory;resource allocation;search engines;CPU;Google data 613 | center;cloud computing;cloud job;cloud load;data centers;grid 614 | workloads;grid-HPC systems;host load;job length;job submission 615 | frequency;jobs resource utilization;machine maximum load;queue 616 | state;real-world production data center;relative usage 617 | levels;resource allocation;resource attributes;resource 618 | management systems;task load;Capacity 619 | planning;Google;Joints;Load modeling;Measurement;Memory 620 | management;Resource management;Cloud Computing;Grid 621 | Computing;Load Characterization}, 622 | doi = {10.1109/CLUSTER.2012.35}, 623 | privatenote = {An earlier version is available at 624 | \url{http://hal.archives-ouvertes.fr/hal-00705858}. It used 625 | to be included here as clusterdata:Di2012.}, 626 | } 627 | 628 | ################ 2010 629 | 630 | @Article{clusterdata:Mishra2010, 631 | author = {Mishra, Asit K. and Hellerstein, Joseph L. and Cirne, 632 | Walfredo and Das, Chita R.}, 633 | title = {Towards characterizing cloud backend workloads: insights 634 | from {Google} compute clusters}, 635 | journal = {SIGMETRICS Perform. Eval. Rev.}, 636 | volume = {37}, 637 | number = {4}, 638 | month = Mar, 639 | year = 2010, 640 | issn = {0163-5999}, 641 | pages = {34--41}, 642 | numpages = {8}, 643 | url = {http://doi.acm.org/10.1145/1773394.1773400}, 644 | doi = {10.1145/1773394.1773400}, 645 | publisher = {ACM}, 646 | abstract = {The advent of cloud computing promises highly available, 647 | efficient, and flexible computing services for applications such as web 648 | search, email, voice over IP, and web search alerts. Our experience at 649 | Google is that realizing the promises of cloud computing requires an 650 | extremely scalable backend consisting of many large compute clusters 651 | that are shared by application tasks with diverse service level 652 | requirements for throughput, latency, and jitter. These considerations 653 | impact (a) capacity planning to determine which machine resources must 654 | grow and by how much and (b) task scheduling to achieve high machine 655 | utilization and to meet service level objectives. 656 | 657 | Both capacity planning and task scheduling require a good understanding 658 | of task resource consumption (e.g., CPU and memory usage). This in turn 659 | demands simple and accurate approaches to workload 660 | classification-determining how to form groups of tasks (workloads) with 661 | similar resource demands. One approach to workload classification is to 662 | make each task its own workload. However, this approach scales poorly 663 | since tens of thousands of tasks execute daily on Google compute 664 | clusters. Another approach to workload classification is to view all 665 | tasks as belonging to a single workload. Unfortunately, applying such a 666 | coarse-grain workload classification to the diversity of tasks running 667 | on Google compute clusters results in large variances in predicted 668 | resource consumptions. 669 | 670 | This paper describes an approach to workload classification and its 671 | application to the Google Cloud Backend, arguably the largest cloud 672 | backend on the planet. 
Our methodology for workload classification 673 | consists of: (1) identifying the workload dimensions; (2) constructing 674 | task classes using an off-the-shelf algorithm such as k-means; (3) 675 | determining the break points for qualitative coordinates within the 676 | workload dimensions; and (4) merging adjacent task classes to reduce the 677 | number of workloads. We use the foregoing, especially the notion of 678 | qualitative coordinates, to glean several insights about the Google 679 | Cloud Backend: (a) the duration of task executions is bimodal in that 680 | tasks either have a short duration or a long duration; (b) most tasks 681 | have short durations; and (c) most resources are consumed by a few tasks 682 | with long duration that have large demands for CPU and memory.}, 683 | } 684 | 685 | 686 | ################################################################ 687 | # Trace-usage papers 688 | ################################################################ 689 | 690 | These entries are for papers that primarily focus on some other topic, but 691 | use the traces as inputs, e.g., in simulations or load predictions. 692 | Order: most recent first. 693 | 694 | ################ 2023 695 | @ARTICLE{clusterdata:jajooSLearnTCC2023, 696 | author={Jajoo, Akshay and Hu, Y. Charlie and Lin, Xiaojun and Deng, Nan}, 697 | journal={IEEE Transactions on Cloud Computing}, 698 | title={SLearn: A Case for Task Sampling Based Learning for Cluster Job Scheduling}, 699 | year={2023}, 700 | volume={11}, 701 | number={3}, 702 | pages={2664--2680}, 703 | publisher = {IEEE}, 704 | keywords = {data centers, big data, job scheduling, learning, online learning}, 705 | abstract = {The ability to accurately estimate job runtime properties allows a 706 | scheduler to effectively schedule jobs. State-of-the-art online cluster 707 | job schedulers use history-based learning, which uses past job execution 708 | information to estimate the runtime properties of newly arrived jobs. 709 | However, with fast-paced development in cluster technology (in both hardware 710 | and software) and changing user inputs, job runtime properties can change over 711 | time, which lead to inaccurate predictions. In this article, we explore the 712 | potential and limitation of real-time learning of job runtime properties, 713 | by proactively sampling and scheduling a small fraction of the tasks of 714 | each job. Such a task-sampling-based approach exploits the similarity among 715 | runtime properties of the tasks of the same job and is inherently immune to 716 | changing job behavior. Our analytical and experimental analysis of 3 production 717 | traces with different skew and job distribution shows that learning in space can 718 | be substantially more accurate. Our simulation and testbed evaluation on Azure of 719 | the two learning approaches anchored in a generic job scheduler using 3 production 720 | cluster job traces shows that despite its online overhead, learning in space reduces 721 | the average Job Completion Time (JCT) by 1.28×, 1.56×, and 1.32× compared to the 722 | prior-art history-based predictor. We further analyze the experimental results to 723 | give intuitive explanations to why learning in space outperforms learning in time 724 | in these experiments.
Finally, we show how sampling-based learning can be extended 725 | to schedule DAG jobs and achieve similar speedups over the prior-art history-based 726 | predictor.}, 727 | doi={10.1109/TCC.2022.3222649}} 728 | 729 | 730 | ################ 2022 731 | @inproceedings {clusterdata:jajooSLearnNSDI2022, 732 | author = {Akshay Jajoo and Y. Charlie Hu and Xiaojun Lin and Nan Deng}, 733 | title = {A Case for Task Sampling based Learning for Cluster Job Scheduling}, 734 | booktitle = {19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)}, 735 | year = {2022}, 736 | address = {Renton, WA, USA}, 737 | url = {https://www.usenix.org/conference/nsdi22/presentation/jajoo}, 738 | publisher = {USENIX Association}, 739 | keywords = {data centers, big data, job scheduling, learning, online learning}, 740 | abstract = {The ability to accurately estimate job runtime properties allows a 741 | scheduler to effectively schedule jobs. State-of-the-art online cluster job 742 | schedulers use history-based learning, which uses past job execution information 743 | to estimate the runtime properties of newly arrived jobs. However, with fast-paced 744 | development in cluster technology (in both hardware and software) and changing user 745 | inputs, job runtime properties can change over time, which lead to inaccurate predictions. 746 | In this paper, we explore the potential and limitation of real-time learning of job 747 | runtime properties, by proactively sampling and scheduling a small fraction of the 748 | tasks of each job. Such a task-sampling-based approach exploits the similarity among 749 | runtime properties of the tasks of the same job and is inherently immune to changing 750 | job behavior. Our study focuses on two key questions in comparing task-sampling-based 751 | learning (learning in space) and history-based learning (learning in time): (1) Can 752 | learning in space be more accurate than learning in time? (2) If so, can delaying 753 | scheduling the remaining tasks of a job till the completion of sampled tasks be more 754 | than compensated by the improved accuracy and result in improved job performance? Our 755 | analytical and experimental analysis of 3 production traces with different skew and job 756 | distribution shows that learning in space can be substantially more accurate. Our 757 | simulation and testbed evaluation on Azure of the two learning approaches anchored in a 758 | generic job scheduler using 3 production cluster job traces shows that despite its online 759 | overhead, learning in space reduces the average Job Completion Time (JCT) by 1.28x, 1.56x, 760 | and 1.32x compared to the prior-art history-based predictor.}, 761 | } 762 | 763 | 764 | ################ 2021 765 | @article{clusterdata:jajooSLearnTechReport2021, 766 | author = {Akshay Jajoo and Y. 
Charlie Hu and Xiaojun Lin and Nan Deng}, 767 | title = {The Case for Task Sampling based Learning for Cluster Job Scheduling}, 768 | journal = {Computing Research Repository}, 769 | volume = {abs/2108.10464}, 770 | year = {2021}, 771 | url = {https://arxiv.org/abs/2108.10464}, 772 | eprinttype = {arXiv}, 773 | eprint = {2108.10464}, 774 | timestamp = {Fri, 27 Aug 2021 15:02:29 +0200}, 775 | biburl = {https://dblp.org/rec/journals/corr/abs-2108-10464.bib}, 776 | bibsource = {dblp computer science bibliography, https://dblp.org}, 777 | keywords = {data centers, big data, job scheduling, learning, online learning}, 778 | abstract = {The ability to accurately estimate job runtime properties allows a 779 | scheduler to effectively schedule jobs. State-of-the-art online cluster job 780 | schedulers use history-based learning, which uses past job execution information 781 | to estimate the runtime properties of newly arrived jobs. However, with fast-paced 782 | development in cluster technology (in both hardware and software) and changing user 783 | inputs, job runtime properties can change over time, which lead to inaccurate predictions. 784 | In this paper, we explore the potential and limitation of real-time learning of job 785 | runtime properties, by proactively sampling and scheduling a small fraction of the 786 | tasks of each job. Such a task-sampling-based approach exploits the similarity among 787 | runtime properties of the tasks of the same job and is inherently immune to changing 788 | job behavior. Our study focuses on two key questions in comparing task-sampling-based 789 | learning (learning in space) and history-based learning (learning in time): (1) Can 790 | learning in space be more accurate than learning in time? (2) If so, can delaying 791 | scheduling the remaining tasks of a job till the completion of sampled tasks be more 792 | than compensated by the improved accuracy and result in improved job performance? Our 793 | analytical and experimental analysis of 3 production traces with different skew and job 794 | distribution shows that learning in space can be substantially more accurate. Our 795 | simulation and testbed evaluation on Azure of the two learning approaches anchored in a 796 | generic job scheduler using 3 production cluster job traces shows that despite its online 797 | overhead, learning in space reduces the average Job Completion Time (JCT) by 1.28x, 1.56x, 798 | and 1.32x compared to the prior-art history-based predictor.}, 799 | } 800 | 801 | ################ 2020 802 | 803 | @INPROCEEDINGS{clusterdata:Lin2020, 804 | title = {Using {GANs} for Sharing Networked Time Series Data: Challenges, 805 | Initial Promise, and Open Questions}, 806 | author = {Lin, Zinan and Jain, Alankar and Wang, Chen and Fanti, 807 | Giulia and Sekar, Vyas}, 808 | year = {2020}, 809 | isbn = {9781450381383}, 810 | publisher = {Association for Computing Machinery}, 811 | url = {https://doi.org/10.1145/3419394.3423643}, 812 | doi = {10.1145/3419394.3423643}, 813 | abstract = {Limited data access is a longstanding barrier to data-driven 814 | research and development in the networked systems community. In this work, 815 | we explore if and how generative adversarial networks (GANs) can be used to 816 | incentivize data sharing by enabling a generic framework for sharing 817 | synthetic datasets with minimal expert knowledge. 
As a specific target, our 818 | focus in this paper is on time series datasets with metadata (e.g., packet 819 | loss rate measurements with corresponding ISPs). We identify key challenges 820 | of existing GAN approaches for such workloads with respect to fidelity 821 | (e.g., long-term dependencies, complex multidimensional relationships, mode 822 | collapse) and privacy (i.e., existing guarantees are poorly understood and 823 | can sacrifice fidelity). To improve fidelity, we design a custom workflow 824 | called DoppelGANger (DG) and demonstrate that across diverse real-world 825 | datasets (e.g., bandwidth measurements, cluster requests, web sessions) and 826 | use cases (e.g., structural characterization, predictive modeling, algorithm 827 | comparison), DG achieves up to 43\% better fidelity than baseline models. 828 | Although we do not resolve the privacy problem in this work, we identify 829 | fundamental challenges with both classical notions of privacy and recent 830 | advances to improve the privacy properties of GANs, and suggest a potential 831 | roadmap for addressing these challenges. By shedding light on the promise 832 | and challenges, we hope our work can rekindle the conversation on workflows 833 | for data sharing.}, 834 | booktitle = {Proceedings of the ACM Internet Measurement Conference (IMC 835 | 2020)}, 836 | pages = {464--483}, 837 | numpages = {20}, 838 | keywords = {privacy, synthetic data generation, time series, 839 | generative adversarial networks}, 840 | } 841 | 842 | @article{clusterdata:Aydin2020, 843 | title = {Multi-objective temporal bin packing problem: an application in cloud computing}, 844 | journal = {Computers \& Operations Research}, 845 | volume = 121, 846 | pages = {104959}, 847 | year = 2020, 848 | month = Sep, 849 | issn = {0305-0548}, 850 | doi = {https://doi.org/10.1016/j.cor.2020.104959}, 851 | url = {http://www.sciencedirect.com/science/article/pii/S0305054820300769}, 852 | author = {Nurşen Aydin and Ibrahim Muter and Ş. Ilker Birbil}, 853 | keywords = {Bin packing, Cloud computing, Heuristics, Exact methods, Column generation}, 854 | abstract = {Improving energy efficiency and lowering operational 855 | costs are the main challenges faced in systems with multiple 856 | servers. One prevalent objective in such systems is to 857 | minimize the number of servers required to process a given set 858 | of tasks under server capacity constraints. This objective 859 | leads to the well-known bin packing problem. In this study, we 860 | consider a generalization of this problem with a time 861 | dimension, where the tasks are to be performed with predefined 862 | start and end times. This new dimension brings about new 863 | performance considerations, one of which is the uninterrupted 864 | utilization of servers. This study is motivated by the problem 865 | of energy efficient assignment of virtual machines to physical 866 | servers in a cloud computing service. We address the virtual 867 | machine placement problem and present a binary integer 868 | programming model to develop different assignment policies. By 869 | analyzing the structural properties of the problem, we propose 870 | an efficient heuristic method based on solving smaller 871 | versions of the original problem iteratively. Moreover, we 872 | design a column generation algorithm that yields a lower bound 873 | on the objective value, which can be utilized to evaluate the 874 | performance of the heuristic algorithm.
Our numerical study 875 | indicates that the proposed heuristic is capable of solving 876 | large-scale instances in a short time with small optimality 877 | gaps.}, 878 | } 879 | 880 | @article{clusterdata:Milocco2020, 881 | title = {Evaluating the Upper Bound of Energy Cost Saving by Proactive Data Center Management}, 882 | journal = {IEEE Transactions on Network and Service Management}, 883 | year = 2020, 884 | issn = {1932-4537}, 885 | doi = {10.1109/TNSM.2020.2988346}, 886 | url = {https://ieeexplore.ieee.org/abstract/document/9069318}, 887 | author = {Ruben Milocco and Pascale Minet and Éric Renault and Selma Boumerdassi}, 888 | keywords = {Data center management, Proactive management, Machine Learning, Prediction, Energy cost}, 889 | abstract = { 890 | Data Centers (DCs) need to periodically configure their servers in order to meet user demands. 891 | Since appropriate proactive management to meet demands reduces the cost, either by improving Quality of 892 | Service (QoS) or saving energy, there is a great interest in studying different proactive strategies 893 | based on predictions of the energy used to serve CPU and memory requests. The amount of savings that can 894 | be achieved depends not only on the selected proactive strategy but also on user-demand statistics and the 895 | predictors used. Despite its importance, it is difficult to find theoretical studies that quantify the 896 | savings that can be made, due to the problem complexity. A proactive DC management strategy is presented 897 | together with its upper bound of energy cost savings obtained with respect to a purely reactive management. 898 | Using this method together with records of the recent past, it is possible to quantify the efficiency of 899 | different predictors. Both linear and nonlinear predictors are studied, using a Google data set collected 900 | over 29 days, to evaluate the benefits that can be obtained with these two predictors.}, 901 | } 902 | 903 | 904 | ################ 2018 905 | 906 | @article{clusterdata:Sliwko2018, 907 | author = {Sliwko, Leszek}, 908 | title = {A Scalable Service Allocation Negotiation For Cloud Computing}, 909 | journal = {Journal of Theoretical and Applied Information Technology}, 910 | volume = 96, 911 | number = 20, 912 | month = Oct, 913 | year = 2018, 914 | issn = {1817-3195}, 915 | pages = {6751--6782}, 916 | numpages = {32}, 917 | keywords = {distributed scheduling, agents; load balancing, MASB}, 918 | abstract={This paper presents a detailed design of a decentralised agent-based 919 | scheduler, which can be used to manage workloads within the computing cells 920 | of a Cloud system. This scheme is based on the concept of service allocation 921 | negotiation, whereby all system nodes communicate between themselves and 922 | scheduling logic is decentralised. The architecture presented has been 923 | implemented, with multiple simulations run using real-world workload traces from 924 | the Google Cluster Data project.
The results were then compared to the 925 | scheduling patterns of Google’s Borg system.} 926 | } 927 | 928 | @INPROCEEDINGS{clusterdata:Liu2018gh, 929 | author = {Liu, Jinwei and Shen, Haiying and Sarker, Ankur and Chung, Wingyan}, 930 | title = {Leveraging Dependency in Scheduling and Preemption for High Throughput in Data-Parallel Clusters}, 931 | booktitle = {2018 IEEE International Conference on Cluster Computing (CLUSTER)}, 932 | year = {2018}, 933 | month = Sep, 934 | pages = {359--369}, 935 | publisher = {IEEE}, 936 | abstract = {Task scheduling and preemption are two important functions in 937 | data-parallel clusters. Though directed acyclic graph task dependencies 938 | are common in data-parallel clusters, previous task scheduling and 939 | preemption methods do not fully utilize such task dependency to increase 940 | throughput since they simply schedule precedent tasks prior to their 941 | dependent tasks or neglect the dependency. We notice that in both 942 | scheduling and preemption, choosing a task with more dependent tasks to 943 | run allows more tasks to be runnable next, which facilitates to select a 944 | task that can more increase throughput. Accordingly, in this paper, we 945 | propose a Dependency-aware Scheduling and Preemption system (DSP) to 946 | achieve high throughput. First, we build an integer linear programming 947 | model to minimize the makespan (i.e., the time when all jobs finish 948 | execution) with the consideration of task dependency and deadline, and 949 | derive the target server and start time for each task, which can 950 | minimize the makespan. Second, we utilize task dependency to determine 951 | tasks' priorities for preemption. Finally, we propose a method to reduce 952 | the number of unnecessary preemptions that cause more overhead than the 953 | throughput gain. Extensive experimental results based on a real cluster 954 | and Amazon EC2 cloud service show that DSP achieves much higher 955 | throughput compared to existing strategies.}, 956 | doi = {10.1109/CLUSTER.2018.00054}, 957 | } 958 | 959 | @inproceedings{clusterdata:Minet2018j, 960 | author = {Pascale Minet and Éric Renault and Ines Khoufi and Selma Boumerdassi}, 961 | title = {Analyzing Traces from a {Google} Data Center}, 962 | booktitle = {14th International Wireless Communications and Mobile Computing Conference (IWCMC 2018)}, 963 | year = 2018, 964 | month = Jun, 965 | publisher = {IEEE}, 966 | address = {Limassol, Cyprus}, 967 | pages = {1167--1172}, 968 | url = {https://doi.org/10.1109/IWCMC.2018.8450304}, 969 | doi = {10.1109/IWCMC.2018.8450304}, 970 | abstract = { 971 | Traces collected from an operational Google data center over 29 days represent a very rich 972 | and useful source of information for understanding the main features of a data center. In this 973 | paper, we characterize the strong heterogeneity of jobs and the medium heterogeneity of machine 974 | configurations. We analyze the off-periods of machines. We study the distribution of jobs per 975 | category, per scheduling class, per priority and per number of tasks. The distribution of job 976 | execution durations shows a high disparity, as does the job waiting time before being scheduled. 977 | The resource requests in terms of CPU and memory are also analyzed. 
The distribution of these 978 | parameter values is very useful to develop accurate models and algorithms for resource allocation 979 | in data centers.}, 980 | keywords = {Data analysis, data center, big data application, resource allocation, scheduling}, 981 | } 982 | 983 | @inproceedings{clusterdata:Minet2018m, 984 | author = {Pascale Minet and Éric Renault and Ines Khoufi and Selma Boumerdassi}, 985 | title = {Data Analysis of a {Google} Data Center}, 986 | booktitle = {18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2018)}, 987 | year = 2018, 988 | month = May, 989 | publisher = {IEEE}, 990 | address = {Washington DC, USA}, 991 | pages = {342--343}, 992 | url = {https://doi.org/10.1109/CCGRID.2018.00049}, 993 | doi = {10.1109/CCGRID.2018.00049}, 994 | abstract = { 995 | Data collected from an operational Google data center during 29 days represent a very rich 996 | and very useful source of information for understanding the main features of a data center. 997 | In this paper, we highlight the strong heterogeneity of jobs. The distribution of job execution 998 | duration shows a high disparity, as well as the job waiting time before being scheduled. The 999 | resource requests in terms of CPU and memory are also analyzed. The knowledge of all these features 1000 | is needed to design models of jobs, machines and resource requests that are representative of a 1001 | real data center.}, 1002 | } 1003 | 1004 | @ARTICLE{clusterdata:Sebastio2018b, 1005 | author = {Stefano Sebastio and Rahul Ghosh and Tridib Mukherjee}, 1006 | journal = {IEEE Transactions on Services Computing}, 1007 | title = {An availability analysis approach for deployment configurations of containers}, 1008 | year = {2018}, 1009 | month = Jan, 1010 | abstract = {Operating system (OS) containers enabling the microservice-oriented 1011 | architecture are becoming popular in the context of Cloud 1012 | services. Containers provide the ability to create lightweight and 1013 | portable runtime environments decoupling the application requirements 1014 | from the characteristics of the underlying system. Services built on 1015 | containers have a small resource footprint in terms of processing, 1016 | storage, memory and network, allowing a denser deployment 1017 | environment. While the performance of such containers is addressed in 1018 | few previous studies, understanding the failure-repair behavior of the 1019 | containers remains unexplored. In this paper, from an availability point 1020 | of view, we propose and compare different configuration models for 1021 | deploying a containerized software system. Inspired by Google 1022 | Kubernetes, a container management system, these configurations are 1023 | characterized with a failure response and migration service. We develop 1024 | novel non-state-space and state-space analytic models for container 1025 | availability analysis. Analytical as well as simulative solutions are 1026 | obtained for the developed models. Our analysis provides insights on k 1027 | out-of N availability and sensitivity of system availability for key 1028 | system parameters. Finally, we build an open-source software tool 1029 | powered by these models. 
The tool helps Cloud administrators to assess 1030 | the availability of containerized systems and to conduct a what-if 1031 | analysis based on user-provided parameters and configurations.}, 1032 | keywords = {Containers;Analytical models;Cloud computing;Stochastic 1033 | processes;Tools;Computer architecture;Google;container;system 1034 | availability;virtual machine;cloud computing;analytic model;stochastic 1035 | reward net}, 1036 | doi = {10.1109/TSC.2017.2788442}, 1037 | ISSN = {1939-1374}, 1038 | } 1039 | 1040 | 1041 | @article{clusterdata:Sebastio2018c, 1042 | author = {Sebastio, Stefano and Amoretti, Michele and Lafuente, Alberto Lluch and Scala, Antonio}, 1043 | title = {A Holistic Approach for Collaborative Workload Execution in Volunteer Clouds}, 1044 | journal = {ACM Transactions on Modeling and Computer Simulation (TOMACS)}, 1045 | volume = 28, 1046 | number = 2, 1047 | month = Mar, 1048 | year = 2018, 1049 | issn = {1049-3301}, 1050 | pages = {14:1--14:27}, 1051 | articleno = {14}, 1052 | numpages = {27}, 1053 | url = {http://doi.acm.org/10.1145/3155336}, 1054 | doi = {10.1145/3155336}, 1055 | acmid = {3155336}, 1056 | publisher = {ACM}, 1057 | keywords = {Collective adaptive systems, ant colony optimization (ACO), 1058 | autonomic computing, cloud computing, collaborative computing, 1059 | computational fields, multiagent optimization, peer-to-peer (P2P), task 1060 | scheduling}, 1061 | abstract={The demand for provisioning, using, and maintaining distributed 1062 | computational resources is growing hand in hand with the quest for 1063 | ubiquitous services. Centralized infrastructures such as cloud computing 1064 | systems provide suitable solutions for many applications, but their 1065 | scalability could be limited in some scenarios, such as in the case of 1066 | latency-dependent applications. The volunteer cloud paradigm aims at 1067 | overcoming this limitation by encouraging clients to offer their own 1068 | spare, perhaps unused, computational resources. Volunteer clouds are 1069 | thus complex, large-scale, dynamic systems that demand for self-adaptive 1070 | capabilities to offer effective services, as well as modeling and 1071 | analysis techniques to predict their behavior. In this article, we 1072 | propose a novel holistic approach for volunteer clouds supporting 1073 | collaborative task execution services able to improve the quality of 1074 | service of compute-intensive workloads. We instantiate our approach by 1075 | extending a recently proposed ant colony optimization algorithm for 1076 | distributed task execution with a workload-based partitioning of the 1077 | overlay network of the volunteer cloud. Finally, we evaluate our 1078 | approach using simulation-based statistical analysis techniques on a 1079 | workload benchmark provided by Google. Our results show that the 1080 | proposed approach outperforms some traditional distributed task 1081 | scheduling algorithms in the presence of compute-intensive workloads.} 1082 | } 1083 | 1084 | @Article{clusterdata:Sebastio2018d, 1085 | author = {Stefano Sebastio and Giorgio Gnecco}, 1086 | title = {A green policy to schedule tasks in a distributed cloud}, 1087 | journal = {Optimization Letters}, 1088 | year = 2018, 1089 | month = Oct, 1090 | day = 01, 1091 | volume = 12, 1092 | number = 7, 1093 | pages = {1535--1551}, 1094 | abstract = {In the last years, demand and availability of computational 1095 | capabilities experienced radical changes. 
Desktops and laptops increased 1096 | their processing resources, exceeding users' demand for large part of 1097 | the day. On the other hand, computational methods are more and more 1098 | frequently adopted by scientific communities, which often experience 1099 | difficulties in obtaining access to the required 1100 | resources. Consequently, data centers for outsourcing use, relying on 1101 | the cloud computing paradigm, are proliferating. Notwithstanding the 1102 | effort to build energy-efficient data centers, their energy footprint is 1103 | still considerable, since cooling a large number of machines situated in 1104 | the same room or container requires a significant amount of power. The 1105 | volunteer cloud, exploiting the users' willingness to share a quote of 1106 | their underused machine resources, can constitute an effective solution 1107 | to have the required computational resources when needed. In this paper, 1108 | we foster the adoption of the volunteer cloud computing as a green 1109 | (i.e., energy efficient) solution even able to outperform existing data 1110 | centers in specific tasks. To manage the complexity of such a large 1111 | scale heterogeneous system, we propose a distributed optimization policy 1112 | to task scheduling with the aim of reducing the overall energy 1113 | consumption executing a given workload. To this end, we consider an 1114 | integer programming problem relying on the Alternating Direction Method 1115 | of Multipliers (ADMM) for its solution. Our approach is compared with a 1116 | centralized one and other non-green targeting solutions. Results show 1117 | that the distributed solution found by the ADMM constitutes a good 1118 | suboptimal solution, worth to be applied in a real environment.}, 1119 | issn = {1862-4480}, 1120 | doi = {10.1007/s11590-017-1208-8}, 1121 | url = {https://doi.org/10.1007/s11590-017-1208-8} 1122 | } 1123 | ################ 2017 1124 | 1125 | @Article{clusterdata:Carvalho2017b, 1126 | author = {Marcus Carvalho and Daniel A. Menasc\'{e} and Francisco Brasileiro}, 1127 | title = {Capacity planning for {IaaS} cloud providers offering multiple 1128 | service classes}, 1129 | journal = {Future Generation Computer Systems}, 1130 | volume = {77}, 1131 | pages = {97--111}, 1132 | month = Dec, 1133 | year = 2017, 1134 | abstract = {Infrastructure as a Service (IaaS) cloud providers typically offer 1135 | multiple service classes to satisfy users with different requirements 1136 | and budgets. Cloud providers are faced with the challenge of estimating 1137 | the minimum resource capacity required to meet Service Level Objectives 1138 | (SLOs) defined for all service classes. This paper proposes a capacity 1139 | planning method that is combined with an admission control mechanism to 1140 | address this challenge. The capacity planning method uses analytical 1141 | models to estimate the output of a quota-based admission control 1142 | mechanism and find the minimum capacity required to meet availability 1143 | SLOs and admission rate targets for all classes. An evaluation using 1144 | trace-driven simulations shows that our method estimates the best cloud 1145 | capacity with a mean relative error of 2.5\% with respect to the 1146 | simulation, compared to a 36\% relative error achieved by a single-class 1147 | baseline method that does not consider admission control 1148 | mechanisms. 
Moreover, our method exhibited a high SLO fulfillment for 1149 | both availability and admission rates, and obtained mean CPU utilization 1150 | over 91\%, while the single-class baseline method had values not greater 1151 | than 78\%.}, 1152 | url = {http://www.sciencedirect.com/science/article/pii/S0167739X16308561}, 1153 | doi = {10.1016/j.future.2017.07.019}, 1154 | issn = {0167-739X}, 1155 | } 1156 | 1157 | @inproceedings{clusterdata:Janus2017, 1158 | author = {Pawel Janus and Krzysztof Rzadca}, 1159 | title = {{SLO}-aware Colocation of Data Center Tasks Based on Instantaneous Processor Requirements}, 1160 | booktitle = {ACM Symposium on Cloud Computing (SoCC)}, 1161 | year = 2017, 1162 | month = Sep, 1163 | pages = {256--268}, 1164 | address = {Santa Clara, CA, USA}, 1165 | publisher = {ACM}, 1166 | abstract = {In a cloud data center, a single physical machine simultaneously 1167 | executes dozens of highly heterogeneous tasks. Such colocation results 1168 | in more efficient utilization of machines, but, when tasks' requirements 1169 | exceed available resources, some of the tasks might be throttled down or 1170 | preempted. We analyze version 2.1 of the Google cluster trace that 1171 | shows short-term (1 second) task CPU usage. Contrary to the assumptions 1172 | taken by many theoretical studies, we demonstrate that the empirical 1173 | distributions do not follow any single distribution. However, high 1174 | percentiles of the total processor usage (summed over at least 10 tasks) 1175 | can be reasonably estimated by the Gaussian distribution. We use this 1176 | result for a probabilistic fit test, called the Gaussian Percentile 1177 | Approximation (GPA), for standard bin-packing algorithms. To check 1178 | whether a new task will fit into a machine, GPA checks whether the 1179 | resulting distribution's percentile corresponding to the requested 1180 | service level objective, SLO is still below the machine's capacity. In 1181 | our simulation experiments, GPA resulted in colocations exceeding the 1182 | machines' capacity with a frequency similar to the requested SLO.}, 1183 | doi = {10.1145/3127479.3132244}, 1184 | url = {http://arxiv.org/abs/1709.01384}, 1185 | } 1186 | 1187 | 1188 | @InProceedings{clusterdata:Carvalho2017, 1189 | title = {Multi-dimensional admission control and capacity planning for {IaaS} clouds with multiple service classes}, 1190 | author = {Carvalho, Marcus and Brasileiro, Francisco and Lopes, Raquel and Farias, Giovanni and Fook, Alessandro and Mafra, Jo\~{a}o and Turull, Daniel}, 1191 | booktitle = {IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)}, 1192 | year = 2017, 1193 | month = May, 1194 | pages = {160--169}, 1195 | address = {Madrid, Spain}, 1196 | keywords = {admission control, capacity planning, cloud computing, performance models, simulation}, 1197 | abstract = {Infrastructure as a Service (IaaS) providers typically offer 1198 | multiple service classes to deal with the wide variety of users adopting 1199 | this cloud computing model. In this scenario, IaaS providers need to 1200 | perform efficient admission control and capacity planning in order to 1201 | minimize infrastructure costs, while fulfilling the different Service 1202 | Level Objectives (SLOs) defined for all service classes 1203 | offered. However, most of the previous work on this field consider a 1204 | single resource dimension -- typically CPU -- when making such 1205 | management decisions. 
We show that this approach will either increase 1206 | infrastructure costs due to over-provisioning, or violate SLOs due to 1207 | lack of capacity for the resource dimensions being ignored. To fill this 1208 | gap, we propose admission control and capacity planning methods that 1209 | consider multiple service classes and multiple resource dimensions. Our 1210 | results show that our admission control method can guarantee a high 1211 | availability SLO fulfillment in scenarios where both CPU and memory can 1212 | become the bottleneck resource. Moreover, we show that our capacity 1213 | planning method can find the minimum capacity required for both CPU and 1214 | memory to meet SLOs with good accuracy. We also analyze how the load 1215 | variation on one resource dimension can affect another, highlighting the 1216 | need to manage resources for multiple dimensions simultaneously.}, 1217 | url = {https://doi.org/10.1109/CCGRID.2017.14}, 1218 | doi = {10.1109/CCGRID.2017.14}, 1219 | } 1220 | 1221 | @article{clusterdata:Sebastio2017, 1222 | title = {Optimal distributed task scheduling in volunteer clouds}, 1223 | journal = {Computers and Operations Research}, 1224 | volume = 81, 1225 | pages = {231--246}, 1226 | year = 2017, 1227 | month = May, 1228 | issn = {0305-0548}, 1229 | doi = {https://doi.org/10.1016/j.cor.2016.11.004}, 1230 | url = {http://www.sciencedirect.com/science/article/pii/S0305054816302660}, 1231 | author = {Stefano Sebastio and Giorgio Gnecco and Alberto Bemporad}, 1232 | keywords = {Cloud computing, Distributed optimization, Integer programming, 1233 | Combinatorial optimization, ADMM}, 1234 | abstract = {The ever increasing request of computational resources has shifted 1235 | the computing paradigm towards solutions where less computation is 1236 | performed locally. The most widely adopted approach nowadays is 1237 | represented by cloud computing. With the cloud, users can transparently 1238 | access to virtually infinite resources with the same aptitude of using 1239 | any other utility. Next to the cloud, the volunteer computing paradigm 1240 | has gained attention in the last decade, where the spared resources on 1241 | each personal machine are shared thanks to the users' willingness to 1242 | cooperate. Cloud and volunteer paradigms have been recently seen as 1243 | companion technologies to better exploit the use of local 1244 | resources. Conversely, this scenario places complex challenges in 1245 | managing such a large-scale environment, as the resources available on 1246 | each node and the presence of the nodes online are not known 1247 | a-priori. The complexity further increases in presence of tasks that 1248 | have an associated Service Level Agreement specified, e.g., through a 1249 | deadline. Distributed management solutions have then been advocated as the 1250 | only approaches that are realistically applicable. In this paper, we 1251 | propose a framework to allocate tasks according to different policies, 1252 | defined by suitable optimization problems. Then, we provide a 1253 | distributed optimization approach relying on the Alternating Direction 1254 | Method of Multipliers (ADMM) for one of these policies, and we compare 1255 | it with a centralized approach.
Results show that, when a centralized 1256 | approach can not be adopted in a real environment, it could be possible 1257 | to rely on the good suboptimal solutions found by the ADMM.} 1258 | } 1259 | ################ 2016 1260 | 1261 | @InProceedings{clusterdata:Zakarya2016, 1262 | title = {An energy aware cost recovery approach for virtual machine migration}, 1263 | author = {Muhammad Zakarya and Lee Gillam}, 1264 | year = 2016, 1265 | booktitle = {13th International Conference on Economics of Grids, Clouds, Systems and Services (GECON2016)}, 1266 | month = Sep, 1267 | address = {Athens, Greece}, 1268 | abstract = {Datacenters provide an IT backbone for today's business and 1269 | economy, and are the principal electricity consumers for Cloud 1270 | computing. Various studies suggest that approximately 30\% of the 1271 | running servers in US datacenters are idle and the others are 1272 | under-utilized, making it possible to save energy and money by using 1273 | Virtual Machine (VM) consolidation to reduce the number of hosts in 1274 | use. However, consolidation involves migrations that can be expensive in 1275 | terms of energy consumption, and sometimes it will be more energy 1276 | efficient not to consolidate. This paper investigates how migration 1277 | decisions can be made such that the energy costs involved with the 1278 | migration are recovered, as only when costs of migration have been 1279 | recovered will energy start to be saved. We demonstrate through a number 1280 | of experiments, using the Google workload traces for 12,583 hosts and 1281 | 1,083,309 tasks, how different VM allocation heuristics, combined with 1282 | different approaches to migration, will impact on energy efficiency. We 1283 | suggest, using reasonable assumptions for datacenter setup, that a 1284 | combination of energy-aware fill-up VM allocation and energy-aware 1285 | migration, and migration only for relatively long running VMs, provides 1286 | for optimal energy efficiency.}, 1287 | url = {http://epubs.surrey.ac.uk/id/eprint/813810}, 1288 | } 1289 | 1290 | @INPROCEEDINGS{clusterdata:Sliwko2016, 1291 | title = {{AGOCS} - Accurate {Google} Cloud Simulator Framework}, 1292 | author = {Leszek Sliwko and Vladimir Getov}, 1293 | booktitle = {16th IEEE International Conference on Scalable Computing and Communications (ScalCom 2016)}, 1294 | year = 2016, 1295 | month = Jul, 1296 | pages={550--558}, 1297 | address = {Toulouse, France}, 1298 | keywords = {cloud system; workload traces; workload simulation framework; google cluster data}, 1299 | abstract = {This paper presents the Accurate Google Cloud Simulator (AGOCS) - 1300 | a novel high-fidelity Cloud workload simulator based on parsing real 1301 | workload traces, which can be conveniently used on a desktop machine for 1302 | day-to-day research. Our simulation is based on real-world workload 1303 | traces from a Google Cluster with 12.5K nodes, over a period of a 1304 | calendar month. The framework is able to reveal very precise and 1305 | detailed parameters of the executed jobs, tasks and nodes as well as to 1306 | provide actual resource usage statistics. The system has been 1307 | implemented in Scala language with focus on parallel execution and an 1308 | easy-to-extend design concept. The paper presents the detailed 1309 | structural framework for AGOCS and discusses our main design decisions, 1310 | whilst also suggesting alternative and possibly performance enhancing 1311 | future approaches.
The framework is available via the Open Source GitHub 1312 | repository.}, 1313 | url = {http://dx.doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.10}, 1314 | doi={10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.10}, 1315 | } 1316 | 1317 | ################ 2015 1318 | 1319 | @INPROCEEDINGS{clusterdata:Carvalho2015, 1320 | title = {Prediction-Based Admission Control for {IaaS} Clouds with Multiple Service Classes}, 1321 | author = {Marcus Carvalho and Daniel Menasce and Francisco Brasileiro}, 1322 | booktitle = {IEEE International Conference on Cloud Computing Technology and Science (CloudCom)}, 1323 | year = 2015, 1324 | month = Nov, 1325 | pages={82--90}, 1326 | address = {Vancouver, BC, Canada}, 1327 | keywords = {admission control;cloud computing;infrastructure-as-a-service; 1328 | performance prediction;quality of service;resource management}, 1329 | abstract = {There is a growing adoption of cloud computing services, 1330 | attracting users with different requirements and budgets to run 1331 | their applications in cloud infrastructures. In order to match 1332 | users' needs, cloud providers can offer multiple service 1333 | classes with different pricing and Service Level Objective (SLO) 1334 | guarantees. Admission control mechanisms can help providers to 1335 | meet target SLOs by limiting the demand at peak periods. This 1336 | paper proposes a prediction-based admission control model for 1337 | IaaS clouds with multiple service classes, aiming to maximize 1338 | request admission rates while fulfilling availability SLOs 1339 | defined for each class. We evaluate our approach with trace-driven 1340 | simulations fed with data from production systems. Our results 1341 | show that admission control can reduce SLO violations 1342 | significantly, specially in underprovisioned scenarios. Moreover, 1343 | our predictive heuristics are less sensitive to different capacity 1344 | planning and SLO decisions, as they fulfill availability SLOs for 1345 | more than 91\% of requests even in the worst case scenario, for 1346 | which only 56\% of SLOs are fulfilled by a simpler greedy heuristic 1347 | and as little as 0.2\% when admission control is not used.}, 1348 | url = {http://dx.doi.org/10.1109/CloudCom.2015.16}, 1349 | doi={10.1109/CloudCom.2015.16}, 1350 | } 1351 | 1352 | @INPROCEEDINGS{clusterdata:Ismaeel2015, 1353 | author = {Salam Ismaeel and Ali Miri}, 1354 | title = {Using {ELM} Techniques to Predict Data Centre {VM} Requests}, 1355 | year = 2015, 1356 | booktitle = {IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)}, 1357 | month = Nov, 1358 | publisher = {IEEE}, 1359 | address = {New York, NY, USA}, 1360 | abstract = {Data centre prediction models can be used to forecast future loads 1361 | for a given centre in terms of CPU, memory, VM requests, and other 1362 | parameters. An effective and efficient model can not only be used to 1363 | optimize resource allocation, but can also be used as part of a strategy 1364 | to conserve energy, improve performance and increase profits for both 1365 | clients and service providers. In this paper, we have developed a 1366 | prediction model, which combines k-means clustering techniques and 1367 | Extreme Learning Machines (ELMs). We have shown the effectiveness of our 1368 | proposed model by using it to estimate future VM requests in a data 1369 | centre based on its historical usage. 
We have tested our model on real 1370 | Google traces that feature over 25 million tasks collected over a 29-day 1371 | time period. Experimental results presented show that our proposed 1372 | system outperforms other models reported in the literature.}, 1373 | } 1374 | 1375 | @INPROCEEDINGS{clusterdata:Sebastio2015b, 1376 | author = {Stefano Sebastio and Antonio Scala}, 1377 | booktitle = {2015 IEEE Conference on Collaboration and Internet Computing (CIC)}, 1378 | title = {A workload-based approach to partition the volunteer cloud}, 1379 | year = 2015, 1380 | month = Oct, 1381 | pages = {210--218}, 1382 | abstract = {The growing demand of computational resources has shifted users 1383 | towards the adoption of cloud computing technologies. Cloud allows users 1384 | to transparently access to remote computing capabilities as an 1385 | utility. The volunteer computing paradigm, another ICT trend of the last 1386 | years, can be considered a companion force to enhance the cloud in 1387 | fulfilling specific domain requirements, such as computational intensive 1388 | requests. Combining the spared resources provided by volunteer nodes 1389 | with few data centers is possible to obtain a robust and scalable cloud 1390 | platform. The price for such benefits relies in increased challenges to 1391 | design and manage a dynamic complex system composed by heterogeneous 1392 | nodes. Task execution requests submitted in the volunteer cloud are 1393 | usually associated with Quality of Service requirements e.g., Specified 1394 | through an execution deadline. In this paper, we present a preliminary 1395 | evaluation of a cloud partitioning approach to distribute task execution 1396 | requests in volunteer cloud, that has been validated through a 1397 | simulation-based statistical analysis using the Google workload data 1398 | trace.}, 1399 | keywords = {cloud computing;computer centres;digital simulation;quality of 1400 | service;statistical analysis;volunteer computing;workload-based 1401 | approach;volunteer cloud partitioning;computational resources;cloud 1402 | computing technologies;remote computing capabilities;volunteer computing 1403 | paradigm;volunteer nodes;data centers;cloud platform;dynamic complex 1404 | system;heterogeneous nodes;task execution request;quality of service 1405 | requirements;simulation-based statistical analysis;Google workload data 1406 | trace;Cloud computing;Measurement;Peer-to-peer computing;Quality of 1407 | service;Google;Overlay networks;Computer applications;cloud 1408 | computing;autonomic clouds;autonomous systems;volunteer 1409 | computing;distributed tasks execution}, 1410 | doi = {10.1109/CIC.2015.27}, 1411 | } 1412 | 1413 | @INPROCEEDINGS{clusterdata:Sirbu2015, 1414 | title = {Towards Data-Driven Autonomics in Data Centers}, 1415 | author = {Alina S{\^\i}rbu and Ozalp Babaoglu}, 1416 | booktitle = {International Conference on Cloud and Autonomic Computing (ICCAC)}, 1417 | month = Sep, 1418 | year = 2015, 1419 | address = {Cambridge, MA, USA}, 1420 | publisher = {IEEE Computer Society}, 1421 | keywords = {Data science; predictive analytics; Google cluster 1422 | trace; log data analysis; failure prediction; machine learning 1423 | classification; ensemble classifier; random forest; BigQuery}, 1424 | abstract = {Continued reliance on human operators for managing data centers is 1425 | a major impediment for them from ever reaching extreme dimensions. 
1426 | Large computer systems in general, and data centers in particular, will 1427 | ultimately be managed using predictive computational and executable 1428 | models obtained through data-science tools, and at that point, the 1429 | intervention of humans will be limited to setting high-level goals and 1430 | policies rather than performing low-level operations. Data-driven 1431 | autonomics, where management and control are based on holistic 1432 | predictive models that are built and updated using generated data, opens 1433 | one possible path towards limiting the role of operators in data 1434 | centers. In this paper, we present a data-science study of a public 1435 | Google dataset collected in a 12K-node cluster with the goal of building 1436 | and evaluating a predictive model for node failures. We use BigQuery, 1437 | the big data SQL platform from the Google Cloud suite, to process 1438 | massive amounts of data and generate a rich feature set characterizing 1439 | machine state over time. We describe how an ensemble classifier can be 1440 | built out of many Random Forest classifiers each trained on these 1441 | features, to predict if machines will fail in a future 24-hour 1442 | window. Our evaluation reveals that if we limit false positive rates to 1443 | 5\%, we can achieve true positive rates between 27\% and 88\% with 1444 | precision varying between 50\% and 72\%. We discuss the practicality of 1445 | including our predictive model as the central component of a data-driven 1446 | autonomic manager and operating it on-line with live data streams 1447 | (rather than off-line on data logs). All of the scripts used for 1448 | BigQuery and classification analyses are publicly available from the 1449 | authors' website.}, 1450 | url = {http://www.cs.unibo.it/babaoglu/papers/pdf/CAC2015.pdf}, 1451 | } 1452 | 1453 | @inproceedings {clusterdata:Delgado2015hawk, 1454 | author = {Pamela Delgado and Florin Dinu and Anne-Marie Kermarrec and Willy Zwaenepoel}, 1455 | title = {{Hawk}: hybrid datacenter scheduling}, 1456 | year = {2015}, 1457 | booktitle = {USENIX Annual Technical Conference (USENIX ATC)}, 1458 | month = Jul, 1459 | publisher = {USENIX Association}, 1460 | pages = {499--510}, 1461 | address = {Santa Clara, CA, USA}, 1462 | isbn = {978-1-931971-225}, 1463 | url = {https://www.usenix.org/conference/atc15/technical-session/presentation/delgado}, 1464 | abstract = {This paper addresses the problem of efficient scheduling of large 1465 | clusters under high load and heterogeneous workloads. A heterogeneous 1466 | workload typically consists of many short jobs and a small number of 1467 | large jobs that consume the bulk of the cluster's resources. 1468 | 1469 | Recent work advocates distributed scheduling to overcome the limitations 1470 | of centralized schedulers for large clusters with many competing 1471 | jobs. Such distributed schedulers are inherently scalable, but may make 1472 | poor scheduling decisions because of limited visibility into the overall 1473 | resource usage in the cluster. In particular, we demonstrate that under 1474 | high load, short jobs can fare poorly with such a distributed scheduler. 1475 | 1476 | We propose instead a new hybrid centralized/distributed scheduler, 1477 | called Hawk. In Hawk, long jobs are scheduled using a centralized 1478 | scheduler, while short ones are scheduled in a fully distributed 1479 | way. Moreover, a small portion of the cluster is reserved for the use of 1480 | short jobs. 
In order to compensate for the occasional poor decisions 1481 | made by the distributed scheduler, we propose a novel and efficient 1482 | randomized work-stealing algorithm. 1483 | 1484 | We evaluate Hawk using a trace-driven simulation and a prototype 1485 | implementation in Spark. In particular, using a Google trace, we show 1486 | that under high load, compared to the purely distributed Sparrow 1487 | scheduler, Hawk improves the 50th and 90th percentile runtimes by 80\% 1488 | and 90\% for short jobs and by 35\% and 10\% for long jobs, 1489 | respectively. Measurements of a prototype implementation using Spark on 1490 | a 100-node cluster confirm the results of the simulation.}, 1491 | } 1492 | 1493 | @article{clusterdata:Sebastio2015, 1494 | author = {Sebastio, Stefano and Amoretti, Michele and Lluch-Lafuente, Alberto}, 1495 | title = {AVOCLOUDY: a simulator of volunteer clouds}, 1496 | journal = {Software: Practice and Experience}, 1497 | volume = {46}, 1498 | number = {1}, 1499 | pages = {3--30}, 1500 | year = 2015, 1501 | month = Jan, 1502 | keywords = {cloud computing, volunteer computing, autonomic computing, distributed computing, discrete event simulation}, 1503 | doi = {10.1002/spe.2345}, 1504 | url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.2345}, 1505 | eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/spe.2345}, 1506 | abstract = {The increasing demand of computational and storage resources is 1507 | shifting users toward the adoption of cloud technologies. Cloud 1508 | computing is based on the vision of computing as utility, where users no 1509 | more need to buy machines but simply access remote resources made 1510 | available on-demand by cloud providers. The relationship between users 1511 | and providers is defined by a service-level agreement, where the 1512 | non-fulfillment of its terms is regulated by the associated penalty 1513 | fees. Therefore, it is important that the providers adopt proper 1514 | monitoring and managing strategies. Despite their reduced application, 1515 | intelligent agents constitute a feasible technology to add autonomic 1516 | features to cloud operations. Furthermore, the volunteer computing 1517 | paradigm---one of the Information and Communications Technology (ICT) 1518 | trends of the last decade---can be pulled alongside traditional cloud 1519 | approaches, with the purpose to ``green'' them. Indeed, the 1520 | combination of data center and volunteer resources, managed by agents, 1521 | allows one to obtain a more robust and scalable cloud computing 1522 | platform. The increased challenges in designing such a complex system 1523 | can benefit from a simulation-based approach, to test autonomic 1524 | management solutions before their deployment in the production 1525 | environment. However, currently available simulators of cloud platforms 1526 | are not suitable to model and analyze such heterogeneous, large-scale, 1527 | and highly dynamic systems. We propose the AVOCLOUDY simulator to fill 1528 | this gap. This paper presents the internal architecture of the 1529 | simulator, provides implementation details, summarizes several notable 1530 | applications, and provides experimental results that measure the 1531 | simulator performance and its accuracy.
The latter experiments are based 1532 | on real-world worldwide distributed computations on top of the PlanetLab 1533 | platform.} 1534 | } 1535 | 1536 | ################ 2014 1537 | 1538 | @InProceedings{clusterdata:Iglesias2014:task-estimation, 1539 | author = {Jesus Omana Iglesias and Liam Murphy and Milan De 1540 | Cauwer and Deepak Mehta and Barry O'Sullivan}, 1541 | title = {A methodology for online consolidation of tasks through 1542 | more accurate resource estimations}, 1543 | year = 2014, 1544 | month = Dec, 1545 | booktitle = {IEEE/ACM Intl. Conf. on Utility and Cloud Computing (UCC)}, 1546 | address = {London, UK}, 1547 | abstract = {Cloud providers aim to provide computing services for a wide range 1548 | of applications, such as web applications, emails, web searches, and map 1549 | reduce jobs. These applications are commonly scheduled to run on 1550 | multi-purpose clusters that nowadays are becoming larger and more 1551 | heterogeneous. A major challenge is to efficiently utilize the cluster's 1552 | available resources, in particular to maximize overall machine 1553 | utilization levels while minimizing application waiting time. We studied 1554 | a publicly available trace from a large Google cluster ($\sim$12,000 1555 | machines) and observed that users generally request more resources than 1556 | required for running their tasks, leading to low levels of utilization. 1557 | In this paper, we propose a methodology for achieving an efficient 1558 | utilization of the cluster's resources while providing the users with 1559 | fast and reliable computing services. The methodology consists of three 1560 | main modules: i) a prediction module that forecasts the maximum resource 1561 | requirement of a task; ii) a scalable scheduling module that efficiently 1562 | allocates tasks to machines; and iii) a monitoring module that tracks 1563 | the levels of utilization of the machines and tasks. We present results 1564 | that show that the impact of more accurate resource estimations for the 1565 | scheduling of tasks can lead to an increase in the average utilization 1566 | of the cluster, a reduction in the number of tasks being evicted, and a 1567 | reduction in task waiting time.}, 1568 | keywords = {online scheduling, Cloud computing, forecasting, resource provisioning, 1569 | constraint programming}, 1570 | } 1571 | 1572 | @InProceedings{clusterdata:Balliu2014, 1573 | author = {Alkida Balliu and Dennis Olivetti and Ozalp Babaoglu and 1574 | Moreno Marzolla and Alina Sirbu}, 1575 | title = {{BiDAl: Big Data Analyzer} for cluster traces}, 1576 | year = 2014, 1577 | booktitle = {Informatik Workshop on System Software Support for Big Data (BigSys)}, 1578 | month = Sep, 1579 | publisher = {GI-Edition Lecture Notes in Informatics}, 1580 | abstract = {Modern data centers that provide Internet-scale services are 1581 | stadium-size structures housing tens of thousands of heterogeneous 1582 | devices (server clusters, networking equipment, power and cooling 1583 | infrastructures) that must operate continuously and reliably. As part 1584 | of their operation, these devices produce large amounts of data in the 1585 | form of event and error logs that are essential not only for identifying 1586 | problems but also for improving data center efficiency and 1587 | management. These activities employ data analytics and often exploit 1588 | hidden statistical patterns and correlations among different factors 1589 | present in the data.
Uncovering these patterns and correlations is 1590 | challenging due to the sheer volume of data to be analyzed. This paper 1591 | presents BiDAl, a prototype ``log-data analysis framework'' that 1592 | incorporates various Big Data technologies to simplify the analysis of 1593 | data traces from large clusters. BiDAl is written in Java with a modular 1594 | and extensible architecture so that different storage backends 1595 | (currently, HDFS and SQLite are supported), as well as different 1596 | analysis languages (current implementation supports SQL, R and Hadoop 1597 | MapReduce) can be easily selected as appropriate. We present the design 1598 | of BiDAl and describe our experience using it to analyze several public 1599 | traces of Google data clusters for building a simulation model capable 1600 | of reproducing observed behavior.}, 1601 | } 1602 | 1603 | @inproceedings{clusterdata:Caglar2014, 1604 | title = {{iOverbook}: intelligent resource-overbooking to support 1605 | soft real-time applications in the cloud}, 1606 | author = {Faruk Caglar and Aniruddha Gokhale}, 1607 | booktitle = {7th IEEE International Conference on Cloud Computing (IEEE CLOUD)}, 1608 | year = 2014, 1609 | month = {Jun--Jul}, 1610 | address = {Anchorage, AK, USA}, 1611 | abstract = {Cloud service providers (CSPs) often overbook their resources 1612 | with user applications despite having to maintain service-level 1613 | agreements with their customers. Overbooking is attractive to CSPs 1614 | because it helps to reduce power consumption in the data center by 1615 | packing more user jobs in less number of resources while improving their 1616 | profits. Overbooking becomes feasible because user applications tend to 1617 | overestimate their resource requirements utilizing only a fraction of 1618 | the allocated resources. Arbitrary resource overbooking ratios, however, 1619 | may be detrimental to soft real-time applications, such as airline 1620 | reservations or Netflix video streaming, which are increasingly hosted 1621 | in the cloud. The changing dynamics of the cloud preclude an offline 1622 | determination of overbooking ratios. To address these concerns, this 1623 | paper presents iOverbook, which uses a machine learning approach to make 1624 | systematic and online determination of overbooking ratios such that the 1625 | quality of service needs of soft real-time systems can be met while 1626 | still benefiting from overbooking. Specifically, iOverbook utilizes 1627 | historic data of tasks and host machines in the cloud to extract their 1628 | resource usage patterns and predict future resource usage along with the 1629 | expected mean performance of host machines. To evaluate our approach, we 1630 | have used a large usage trace made available by Google of one of its 1631 | production data centers. 
In the context of the traces, our experiments 1632 | show that iOverbook can help CSPs improve their resource utilization by 1633 | an average of 12.5\% and save 32\% power in the data center.}, 1634 | url = {http://www.dre.vanderbilt.edu/~gokhale/WWW/papers/CLOUD-2014.pdf}, 1635 | } 1636 | 1637 | @inproceedings{clusterdata:Sebastio2014, 1638 | author = {Sebastio, Stefano and Amoretti, Michele and Lluch Lafuente, Alberto}, 1639 | title = {A computational field framework for collaborative task 1640 | execution in volunteer clouds}, 1641 | booktitle = {International Symposium on Software Engineering for 1642 | Adaptive and Self-Managing Systems (SEAMS)}, 1643 | year = 2014, 1644 | month = Jun, 1645 | isbn = {978-1-4503-2864-7}, 1646 | address = {Hyderabad, India}, 1647 | pages = {105--114}, 1648 | url = {http://doi.acm.org/10.1145/2593929.2593943}, 1649 | doi = {10.1145/2593929.2593943}, 1650 | publisher = {ACM}, 1651 | keywords = {ant colony optimization, bio-inspired algorithms, cloud computing, 1652 | distributed tasks execution, peer-to-peer, self-* systems, spatial 1653 | computing, volunteer computing}, 1654 | abstract = {The increasing diffusion of cloud technologies offers new 1655 | opportunities for distributed and collaborative computing. Volunteer 1656 | clouds are a prominent example, where participants join and leave the 1657 | platform and collaborate by sharing computational resources. The high 1658 | complexity, dynamism and unpredictability of such scenarios call for 1659 | decentralized self-* approaches. We present in this paper a framework 1660 | for the design and evaluation of self-adaptive collaborative task 1661 | execution strategies in volunteer clouds. As a byproduct, we propose a 1662 | novel strategy based on the Ant Colony Optimization paradigm, that we 1663 | validate through simulation-based statistical analysis over Google 1664 | cluster data.}, 1665 | } 1666 | 1667 | @inproceedings{clusterdata:Breitgand2014-adaptive, 1668 | title = {An adaptive utilization accelerator for virtualized environments}, 1669 | author = {Breitgand, David and Dubitzky, Zvi and Epstein, Amir and 1670 | Feder, Oshrit and Glikson, Alex and Shapira, Inbar and 1671 | Toffetti, Giovanni}, 1672 | booktitle = {International Conference on Cloud Engineering (IC2E)}, 1673 | pages = {165--174}, 1674 | year = 2014, 1675 | month = Mar, 1676 | publisher = IEEE, 1677 | address = {Boston, MA, USA}, 1678 | abstract = {One of the key enablers of a cloud provider competitiveness is 1679 | ability to over-commit shared infrastructure at ratios that are higher 1680 | than those of other competitors, without compromising non-functional 1681 | requirements, such as performance. A widely recognized impediment to 1682 | achieving this goal is so called ``Virtual Machines sprawl'', a 1683 | phenomenon referring to the situation when customers order Virtual 1684 | Machines (VM) on the cloud, use them extensively and then leave them 1685 | inactive for prolonged periods of time. Since a typical cloud 1686 | provisioning system treats new VM provision requests according to the 1687 | nominal virtual hardware specification, an often occurring situation is 1688 | that the nominal resources of a cloud/pool become exhausted fast while 1689 | the physical hosts utilization remains low. We present IBM adaPtive 1690 | UtiLiSation AcceleratoR (IBM PULSAR), a cloud resources scheduler that 1691 | extends OpenStack Nova Filter Scheduler. 
IBM PULSAR recognises that 1692 | effective safely attainable over-commit ratio varies with time due to 1693 | workloads' variability and dynamically adapts the effective over-commit 1694 | ratio to these changes.}, 1695 | } 1696 | 1697 | @ARTICLE{clusterdata:Zhang2014-Harmony, 1698 | author = {Qi Zhang and Mohamed Faten Zhani and Raouf Boutaba and 1699 | Joseph L Hellerstein}, 1700 | title = {Dynamic heterogeneity-aware resource provisioning in the cloud}, 1701 | journal = {IEEE Transactions on Cloud Computing (TCC)}, 1702 | year = 2014, 1703 | month = Mar, 1704 | volume = 2, 1705 | number = 1, 1706 | abstract = {Data centers consume tremendous amounts of energy in terms of 1707 | power distribution and cooling. Dynamic capacity provisioning is a 1708 | promising approach for reducing energy consumption by dynamically 1709 | adjusting the number of active machines to match resource 1710 | demands. However, despite extensive studies of the problem, existing 1711 | solutions have not fully considered the heterogeneity of both workload 1712 | and machine hardware found in production environments. In particular, 1713 | production data centers often comprise heterogeneous machines with 1714 | different capacities and energy consumption characteristics. Meanwhile, 1715 | the production cloud workloads typically consist of diverse applications 1716 | with different priorities, performance and resource 1717 | requirements. Failure to consider the heterogeneity of both machines and 1718 | workloads will lead to both sub-optimal energy-savings and long 1719 | scheduling delays, due to incompatibility between workload requirements 1720 | and the resources offered by the provisioned machines. To address this 1721 | limitation, we present Harmony, a Heterogeneity-Aware dynamic capacity 1722 | provisioning scheme for cloud data centers. Specifically, we first use 1723 | the K-means clustering algorithm to divide workload into distinct task 1724 | classes with similar characteristics in terms of resource and 1725 | performance requirements. Then we present a technique that dynamically 1726 | adjusting the number of machines to minimize total energy consumption 1727 | and scheduling delay. Simulations using traces from a Google's compute 1728 | cluster demonstrate Harmony can reduce energy by 28 percent compared to 1729 | heterogeneity-oblivious solutions.}, 1730 | } 1731 | 1732 | ################ 2013 1733 | 1734 | @INPROCEEDINGS{clusterdata:Di2013a, 1735 | title = {Optimization of cloud task processing with checkpoint-restart mechanism}, 1736 | author = {Di, Sheng and Robert, Yves and Vivien, Fr\'ed\'eric and 1737 | Kondo, Derrick and Wang, Cho-Li and Cappello, Franck}, 1738 | booktitle = {25th International Conference on High Performance 1739 | Computing, Networking, Storage and Analysis (SC)}, 1740 | year = 2013, 1741 | month = Nov, 1742 | address = {Denver, CO, USA}, 1743 | abstract = {In this paper, we aim at optimizing fault-tolerance techniques 1744 | based on a checkpointing/restart mechanism, in the context of cloud 1745 | computing. Our contribution is three-fold. (1) We derive a fresh formula 1746 | to compute the optimal number of checkpoints for cloud jobs with varied 1747 | distributions of failure events. Our analysis is not only generic with 1748 | no assumption on failure probability distribution, but also attractively 1749 | simple to apply in practice. 
(2) We design an adaptive algorithm to 1750 | optimize the impact of checkpointing regarding various costs like 1751 | checkpointing/restart overhead. (3) We evaluate our optimized solution 1752 | in a real cluster environment with hundreds of virtual machines and 1753 | Berkeley Lab Checkpoint/Restart tool. Task failure events are emulated 1754 | via a production trace produced on a large-scale Google data 1755 | center. Experiments confirm that our solution is fairly suitable for 1756 | Google systems. Our optimized formula outperforms Young's formula by 1757 | 3--10 percent, reducing wallclock lengths by 50--100 seconds per job on 1758 | average.}, 1759 | } 1760 | 1761 | @inproceedings{clusterdata:Qiang2013-anomaly, 1762 | author = {Qiang Guan and Song Fu}, 1763 | title = {Adaptive Anomaly Identification by Exploring Metric 1764 | Subspace in Cloud Computing Infrastructures}, 1765 | booktitle = {32nd IEEE Symposium on Reliable Distributed Systems (SRDS)}, 1766 | year = 2013, 1767 | month = Sep, 1768 | pages = {205--214}, 1769 | address = {Braga, Portugal}, 1770 | abstract = {Cloud computing has become increasingly popular by obviating the 1771 | need for users to own and maintain complex computing 1772 | infrastructures. However, due to their inherent complexity and large 1773 | scale, production cloud computing systems are prone to various runtime 1774 | problems caused by hardware and software faults and environmental 1775 | factors. Autonomic anomaly detection is a crucial technique for 1776 | understanding emergent, cloud-wide phenomena and self-managing cloud 1777 | resources for system-level dependability assurance. To detect anomalous 1778 | cloud behaviors, we need to monitor the cloud execution and collect 1779 | runtime cloud performance data. These data consist of values of 1780 | performance metrics for different types of failures, which display 1781 | different correlations with the performance metrics. In this paper, we 1782 | present an adaptive anomaly identification mechanism that explores the 1783 | most relevant principal components of different failure types in cloud 1784 | computing infrastructures. It integrates the cloud performance metric 1785 | analysis with filtering techniques to achieve automated, efficient, and 1786 | accurate anomaly identification. The proposed mechanism adapts itself by 1787 | recursively learning from the newly verified detection results to refine 1788 | future detections. We have implemented a prototype of the anomaly 1789 | identification system and conducted experiments in an on-campus cloud 1790 | computing environment and by using the Google data center traces. Our 1791 | experimental results show that our mechanism can achieve more efficient 1792 | and accurate anomaly detection than other existing schemes.}, 1793 | } 1794 | 1795 | @INPROCEEDINGS{clusterdata:Zhani2013-HARMONY, 1796 | title = {{HARMONY}: dynamic heterogeneity-aware resource provisioning in the cloud}, 1797 | author = {Qi Zhang and Mohamed Faten Zhani and Raouf Boutaba and 1798 | Joseph L. Hellerstein}, 1799 | booktitle = {The 33rd International Conference on Distributed Computing Systems (ICDCS)}, 1800 | year = 2013, 1801 | pages = {510--519}, 1802 | month = Jul, 1803 | address = {Philadelphia, PA, USA}, 1804 | abstract = {Data centers today consume tremendous amount of energy in terms 1805 | of power distribution and cooling.
Dynamic capacity provisioning is a 1806 | promising approach for reducing energy consumption by dynamically 1807 | adjusting the number of active machines to match resource 1808 | demands. However, despite extensive studies of the problem, existing 1809 | solutions for dynamic capacity provisioning have not fully considered 1810 | the heterogeneity of both workload and machine hardware found in 1811 | production environments. In particular, production data centers often 1812 | comprise several generations of machines with different capacities, 1813 | capabilities and energy consumption characteristics. Meanwhile, the 1814 | workloads running in these data centers typically consist of a wide 1815 | variety of applications with different priorities, performance 1816 | objectives and resource requirements. Failure to consider heterogenous 1817 | characteristics will lead to both sub-optimal energy-savings and long 1818 | scheduling delays, due to incompatibility between workload requirements 1819 | and the resources offered by the provisioned machines. To address this 1820 | limitation, in this paper we present HARMONY, a Heterogeneity-Aware 1821 | Resource Management System for dynamic capacity provisioning in cloud 1822 | computing environments. Specifically, we first use the K-means 1823 | clustering algorithm to divide the workload into distinct task classes 1824 | with similar characteristics in terms of resource and performance 1825 | requirements. Then we present a novel technique for dynamically 1826 | adjusting the number of machines of each type to minimize total energy 1827 | consumption and performance penalty in terms of scheduling 1828 | delay. Through simulations using real traces from Google's compute 1829 | clusters, we found that our approach can improve data center energy 1830 | efficiency by up to 28\% compared to heterogeneity-oblivious 1831 | solutions.}, 1832 | } 1833 | 1834 | @INPROCEEDINGS{clusterdata:Amoretti2013, 1835 | title = {A cooperative approach for distributed task execution in autonomic clouds}, 1836 | author = {Amoretti, M. and Lafuente, A.L. and Sebastio, S.}, 1837 | booktitle = {21st Euromicro International Conference on Parallel, 1838 | Distributed and Network-Based Processing (PDP)}, 1839 | publisher = {IEEE}, 1840 | year = 2013, 1841 | month = Feb, 1842 | pages = {274--281}, 1843 | abstract = {Virtualization and distributed computing are two key pillars that 1844 | guarantee scalability of applications deployed in the Cloud. In 1845 | Autonomous Cooperative Cloud-based Platforms, autonomous computing nodes 1846 | cooperate to offer a PaaS Cloud for the deployment of user 1847 | applications. Each node must allocate the necessary resources for 1848 | applications to be executed with certain QoS guarantees. If the QoS of 1849 | an application cannot be guaranteed a node has mainly two options: to 1850 | allocate more resources (if it is possible) or to rely on the 1851 | collaboration of other nodes. Making a decision is not trivial since it 1852 | involves many factors (e.g. the cost of setting up virtual machines, 1853 | migrating applications, discovering collaborators). In this paper we 1854 | present a model of such scenarios and experimental results validating 1855 | the convenience of cooperative strategies over selfish ones, where nodes 1856 | do not help each other.
We describe the architecture of the platform of 1857 | autonomous clouds and the main features of the model, which has been 1858 | implemented and evaluated in the DEUS discrete-event simulator. From the 1859 | experimental evaluation, based on workload data from the Google Cloud 1860 | Backend, we can conclude that (modulo our assumptions and 1861 | simplifications) the performance of a volunteer cloud can be compared to 1862 | that of a Google Cluster.}, 1863 | doi = {10.1109/PDP.2013.47}, 1864 | ISSN = {1066-6192}, 1865 | address = {Belfast, UK}, 1866 | url = {http://doi.ieeecomputersociety.org/10.1109/PDP.2013.47}, 1867 | } 1868 | 1869 | ################ 2012 1870 | 1871 | @INPROCEEDINGS{clusterdata:Di2012b, 1872 | title = {Host load prediction in a {Google} compute cloud with a {Bayesian} model}, 1873 | author = {Di, Sheng and Kondo, Derrick and Cirne, Walfredo}, 1874 | booktitle = {International Conference on High Performance Computing, 1875 | Networking, Storage and Analysis (SC)}, 1876 | year = 2012, 1877 | month = Nov, 1878 | isbn = {978-1-4673-0804-5}, 1879 | address = {Salt Lake City, UT, USA}, 1880 | pages = {21:1--21:11}, 1881 | abstract = {Prediction of host load in Cloud systems is critical for achieving 1882 | service-level agreements. However, accurate prediction of host load in 1883 | Clouds is extremely challenging because it fluctuates drastically at 1884 | small timescales. We design a prediction method based on Bayes model to 1885 | predict the mean load over a long-term time interval, as well as the 1886 | mean load in consecutive future time intervals. We identify novel 1887 | predictive features of host load that capture the expectation, 1888 | predictability, trends and patterns of host load. We also determine the 1889 | most effective combinations of these features for prediction. We 1890 | evaluate our method using a detailed one-month trace of a Google data 1891 | center with thousands of machines. Experiments show that the Bayes 1892 | method achieves high accuracy with a mean squared error of 1893 | 0.0014. Moreover, the Bayes method improves the load prediction accuracy 1894 | by 5.6--50\% compared to other state-of-the-art methods based on moving 1895 | averages, auto-regression, and/or noise filters.}, 1896 | url = {http://dl.acm.org/citation.cfm?id=2388996.2389025}, 1897 | publisher = {IEEE Computer Society Press}, 1898 | } 1899 | 1900 | @INPROCEEDINGS{clusterdata:Zhang2012, 1901 | title = {Dynamic energy-aware capacity provisioning for cloud computing environments}, 1902 | author = {Zhang, Qi and Zhani, Mohamed Faten and Zhang, Shuo and 1903 | Zhu, Quanyan and Boutaba, Raouf and Hellerstein, Joseph L.}, 1904 | booktitle = {9th ACM International Conference on Autonomic Computing (ICAC)}, 1905 | year = 2012, 1906 | month = Sep, 1907 | isbn = {978-1-4503-1520-3}, 1908 | address = {San Jose, CA, USA}, 1909 | pages = {145--154}, 1910 | acmid = {2371562}, 1911 | publisher = {ACM}, 1912 | doi = {10.1145/2371536.2371562}, 1913 | keywords = {cloud computing, energy management, model predictive 1914 | control, resource management}, 1915 | abstract = {Data centers have recently gained significant popularity as a 1916 | cost-effective platform for hosting large-scale service 1917 | applications. While large data centers enjoy economies of scale by 1918 | amortizing initial capital investment over large number of machines, 1919 | they also incur tremendous energy cost in terms of power distribution 1920 | and cooling. 
An effective approach for saving energy in data centers is 1921 | to adjust dynamically the data center capacity by turning off unused 1922 | machines. However, this dynamic capacity provisioning problem is known 1923 | to be challenging as it requires a careful understanding of the resource 1924 | demand characteristics as well as considerations to various cost 1925 | factors, including task scheduling delay, machine reconfiguration cost 1926 | and electricity price fluctuation. In this paper, we provide a 1927 | control-theoretic solution to the dynamic capacity provisioning problem 1928 | that minimizes the total energy cost while meeting the performance 1929 | objective in terms of task scheduling delay. Specifically, we model this 1930 | problem as a constrained discrete-time optimal control problem, and use 1931 | Model Predictive Control (MPC) to find the optimal control 1932 | policy. Through extensive analysis and simulation using real workload 1933 | traces from Google's compute clusters, we show that our proposed 1934 | framework can achieve significant reduction in energy cost, while 1935 | maintaining an acceptable average scheduling delay for individual 1936 | tasks.}, 1937 | } 1938 | 1939 | @INPROCEEDINGS{clusterdata:Ali-Eldin2012, 1940 | title = {Efficient provisioning of bursty scientific workloads on the 1941 | cloud using adaptive elasticity control}, 1942 | author = {Ahmed Ali-Eldin and Maria Kihl and Johan Tordsson and Erik Elmroth}, 1943 | booktitle = {3rd Workshop on Scientific Cloud Computing (ScienceCloud)}, 1944 | year = 2012, 1945 | month = Jun, 1946 | address = {Delft, The Netherlands}, 1947 | isbn = {978-1-4503-1340-7}, 1948 | pages = {31--40}, 1949 | url = {http://dl.acm.org/citation.cfm?id=2287044}, 1950 | doi = {10.1145/2287036.2287044}, 1951 | publisher = {ACM}, 1952 | abstract = {Elasticity is the ability of a cloud infrastructure to dynamically 1953 | change the amount of resources allocated to a running service as load 1954 | changes. We build an autonomous elasticity controller that changes the 1955 | number of virtual machines allocated to a service based on both 1956 | monitored load changes and predictions of future load. The cloud 1957 | infrastructure is modeled as a G/G/N queue. This model is used to 1958 | construct a hybrid reactive-adaptive controller that quickly reacts to 1959 | sudden load changes, prevents premature release of resources, takes into 1960 | account the heterogeneity of the workload, and avoids 1961 | oscillations. Using simulations with Web and cluster workload traces, we 1962 | show that our proposed controller lowers the number of delayed requests 1963 | by a factor of 70 for the Web traces and 3 for the cluster traces when 1964 | compared to a reactive controller. Our controller also decreases the 1965 | average number of queued requests by a factor of 3 for both traces, and 1966 | reduces oscillations by a factor of 7 for the Web traces and 3 for the 1967 | cluster traces. This comes at the expense of between 20\% and 30\% 1968 | over-provisioning, as compared to a few percent for the reactive 1969 | controller.}, 1970 | } 1971 | 1972 | ################ 2011 1973 | 1974 | @INPROCEEDINGS{clusterdata:Sharma2011, 1975 | title = {Modeling and synthesizing task placement constraints in 1976 | {Google} compute clusters}, 1977 | author = {Sharma, Bikash and Chudnovsky, Victor and Hellerstein, 1978 | Joseph L.
and Rifaat, Rasekh and Das, Chita R.}, 1979 | booktitle = {2nd ACM Symposium on Cloud Computing (SoCC)}, 1980 | year = 2011, 1981 | month = Oct, 1982 | isbn = {978-1-4503-0976-9}, 1983 | address = {Cascais, Portugal}, 1984 | pages = {3:1--3:14}, 1985 | url = {http://doi.acm.org/10.1145/2038916.2038919}, 1986 | doi = {10.1145/2038916.2038919}, 1987 | publisher = {ACM}, 1988 | keywords = {benchmarking, benchmarks, metrics, modeling, performance 1989 | evaluation, workload characterization}, 1990 | abstract = {Evaluating the performance of large compute clusters requires 1991 | benchmarks with representative workloads. At Google, performance 1992 | benchmarks are used to obtain performance metrics such as task 1993 | scheduling delays and machine resource utilizations to assess changes in 1994 | application codes, machine configurations, and scheduling 1995 | algorithms. Existing approaches to workload characterization for high 1996 | performance computing and grids focus on task resource requirements for 1997 | CPU, memory, disk, I/O, network, etc. Such resource requirements address 1998 | how much resource is consumed by a task. However, in addition to 1999 | resource requirements, Google workloads commonly include task placement 2000 | constraints that determine which machine resources are consumed by 2001 | tasks. Task placement constraints arise because of task dependencies 2002 | such as those related to hardware architecture and kernel version. 2003 | 2004 | This paper develops methodologies for incorporating task placement 2005 | constraints and machine properties into performance benchmarks of large 2006 | compute clusters. Our studies of Google compute clusters show that 2007 | constraints increase average task scheduling delays by a factor of 2 to 2008 | 6, which often results in tens of minutes of additional task wait 2009 | time. To understand why, we extend the concept of resource utilization 2010 | to include constraints by introducing a new metric, the Utilization 2011 | Multiplier (UM). UM is the ratio of the resource utilization seen by 2012 | tasks with a constraint to the average utilization of the resource. UM 2013 | provides a simple model of the performance impact of constraints in that 2014 | task scheduling delays increase with UM. Last, we describe how to 2015 | synthesize representative task constraints and machine properties, and 2016 | how to incorporate this synthesis into existing performance 2017 | benchmarks. Using synthetic task constraints and machine properties 2018 | generated by our methodology, we accurately reproduce performance 2019 | metrics for benchmarks of Google compute clusters with a discrepancy of 2020 | only 13\% in task scheduling delay and 5\% in resource utilization.}, 2021 | } 2022 | 2023 | @INPROCEEDINGS{clusterdata:Wang2011, 2024 | title = {Towards synthesizing realistic workload traces for studying 2025 | the {Hadoop} ecosystem}, 2026 | author = {Wang, Guanying and Butt, Ali R. 
and Monti, Henry and Gupta, Karan}, 2027 | booktitle = {19th IEEE Annual International Symposium on Modelling, 2028 | Analysis, and Simulation of Computer and Telecommunication 2029 | Systems (MASCOTS)}, 2030 | year = 2011, 2031 | month = Jul, 2032 | isbn = {978-0-7695-4430-4}, 2033 | pages = {400--408}, 2034 | url = {http://people.cs.vt.edu/~butta/docs/mascots11-hadooptrace.pdf}, 2035 | doi = {10.1109/MASCOTS.2011.59}, 2036 | publisher = {IEEE Computer Society}, 2037 | address = {Raffles Hotel, Singapore}, 2038 | keywords = {Cloud computing, Performance analysis, Design 2039 | optimization, Software performance modeling}, 2040 | abstract = {Designing cloud computing setups is a challenging task. It 2041 | involves understanding the impact of a plethora of parameters ranging 2042 | from cluster configuration, partitioning, networking characteristics, 2043 | and the targeted applications' behavior. The design space, and the scale 2044 | of the clusters, make it cumbersome and error-prone to test different 2045 | cluster configurations using real setups. Thus, the community is 2046 | increasingly relying on simulations and models of cloud setups to infer 2047 | system behavior and the impact of design choices. The accuracy of the 2048 | results from such approaches depends on the accuracy and realistic 2049 | nature of the workload traces employed. Unfortunately, few cloud 2050 | workload traces are available (in the public domain). In this paper, we 2051 | present the key steps towards analyzing the traces that have been made 2052 | public, e.g., from Google, and inferring lessons that can be used to 2053 | design realistic cloud workloads as well as enable thorough quantitative 2054 | studies of Hadoop design. Moreover, we leverage the lessons learned from 2055 | the traces to undertake two case studies: (i) Evaluating Hadoop job 2056 | schedulers, and (ii) Quantifying the impact of shared storage on Hadoop 2057 | system performance.} 2058 | } 2059 | --------------------------------------------------------------------------------