├── .gitignore ├── CONTRIBUTING.md ├── LICENSE ├── MAINTAINERS.md ├── README.md ├── docker-compose.yml ├── firehose-nozzle ├── Dockerfile ├── influxdb-firehose-nozzle.json └── supervisord.conf ├── grafana ├── Dockerfile ├── dashboards │ └── import_format │ │ ├── CC.json │ │ ├── Etcd_stats.json │ │ ├── Firehose_Stats.json │ │ ├── Influx_stats.json │ │ ├── Routing.json │ │ ├── bbs.json │ │ ├── cell.json │ │ ├── component-health.json │ │ ├── users.json │ │ └── vm-level-stats.json ├── grafana.ini ├── load.sh └── supervisord.conf ├── images ├── architecture.png ├── bosh_stats.png ├── cell_memory.png ├── dashboards.png ├── datastores.png ├── loggregator_stats.png └── slack.png ├── influxdb ├── Dockerfile ├── influxdb.config └── run.sh ├── kapacitor ├── bosh_event_np.tick ├── cpu_wait_np.tick ├── etcd_alert_np.tick ├── job_health_np.tick ├── kapacitor.conf ├── loader.md ├── max_container_np.tick ├── persistent_disk_np.tick ├── slow_consumer_np.tick └── swap_alert_np.tick └── telegraf └── telegraf.conf /.gitignore: -------------------------------------------------------------------------------- 1 | data/ 2 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contribution Guidelines 2 | 3 | ## Pull requests are always welcome 4 | 5 | We're trying very hard to keep our systems simple, lean and focused. We don't want them to be everything for everybody. This means that we might decide 6 | against incorporating a new request. 7 | 8 | 9 | ## Create issues... 10 | 11 | Any significant change should be documented as a GitHub issue before anybody starts working on it. 12 | 13 | 14 | ### ...but check for existing issues first! 15 | 16 | Please take a moment to check that an issue doesn't already exist documenting your request. If it does, it never hurts to add a quick "+1" or "I need this too". This will help prioritize the most common requests. 
17 | 18 | 19 | ## Conventions 20 | 21 | Fork the repository and make changes on your fork on a branch: 22 | 23 | 1. Create the right type of issue (defect, enhancement, test, etc.) 24 | 2. Name the branch N-something where N is the number of the issue. 25 | 26 | Note that the maintainers work on branches in this repository. 27 | 28 | Work hard to ensure your pull request is valid. 29 | 30 | Pull request descriptions should be as clear as possible and include a reference to all the issues that they address. In GitHub, you can reference an 31 | issue by adding a line to your commit description that follows the format: 32 | 33 | `Fixes #N` 34 | 35 | where N is the issue number. 36 | 37 | 38 | ## Merge approval 39 | 40 | Repository maintainers use **LGTM (Looks Good To Me)** in comments on the code review to indicate acceptance. 41 | 42 | A change requires LGTMs from an absolute majority of the **MAINTAINERS**. The **Benevolent Dictator For Life (BDFL)** reserves sole veto power. We recommend also 43 | getting an LGTM from the BDFL in advance of merging to avoid the possibility of a revert. 44 | 45 | 46 | #### Small patch exception 47 | 48 | There are exceptions to the merge approval process. Currently these are: 49 | 50 | * Your patch fixes spelling or grammar errors. 51 | * Your patch fixes Markdown formatting or syntax errors in any .md files in this repository. 52 | 53 | 54 | ## How can I become a maintainer? 55 | 56 | Make important contributions. Don't forget: being a maintainer is a time investment, so make sure you will have time to make yourself available. You don't have to be a maintainer to make a difference on the project! 57 | 58 | 59 | ## What is a maintainer's responsibility? 60 | 61 | It is every maintainer's responsibility to: 62 | 63 | 1. Deliver prompt feedback and decisions on pull requests. 64 | 2. Be available to anyone with questions, bug reports, criticism, etc. on their component. This includes Slack and GitHub requests. 65 | 3.
Make sure their component respects the philosophy, design and road map of the project. 66 | 67 | 68 | ## How are decisions made? 69 | 70 | Short answer: with pull requests to this repository. 71 | 72 | All decisions, big and small, follow the same 3 steps: 73 | 74 | 1. Open a pull request. Anyone can do this. 75 | 76 | 2. Discuss the pull request. Anyone can do this. 77 | 78 | 3. Accept (`LGTM`) or refuse a pull request. The relevant maintainers 79 | do this (see "Who decides what?" below). 80 | 81 | 1. Accepting pull requests 82 | 83 | 1. If the pull request appears to be ready to merge, give it an `LGTM`, which stands for "Looks Good To Me". 84 | 85 | 2. If the pull request has some small problems that need to be changed, make a comment addressing the issues. 86 | 87 | 3. If the changes needed to a PR are small, you can comment "LGTM once the following comments are addressed..."; this reduces needless back-and-forth. 88 | 89 | 4. If the PR only needs a few changes before being merged, any MAINTAINER can make a replacement PR that incorporates the existing commits and fixes the problems before a fast-track merge. 90 | 91 | 2. Closing pull requests 92 | 93 | 1. If a PR appears to be abandoned, a replacement PR may be made after attempting to contact the original contributor. Once the replacement PR is made, any contributor may close the original one. 94 | 95 | 2. If you are not sure whether the pull request implements a good feature or you do not understand the purpose of the PR, ask the contributor to provide more documentation. If the contributor is not able to adequately explain the purpose of the PR, the PR may be closed by any MAINTAINER. 96 | 97 | 3. If a MAINTAINER feels that the pull request is sufficiently architecturally flawed, or if the pull request needs significantly more design discussion before being considered, the MAINTAINER should close the pull request with a short explanation of what discussion still needs to be had.
It is important not to leave such pull requests open, as this will waste both the MAINTAINER's time and the contributor's time. It is not good to string a contributor along for weeks or months, having them make many changes to a PR that will eventually be rejected. 98 | 99 | 100 | ## Who decides what? 101 | 102 | All decisions are pull requests, and the relevant maintainers make decisions by accepting or refusing pull requests. Review and acceptance by anyone is 103 | denoted by adding a comment in the pull request: `LGTM`. However, only currently listed `MAINTAINERS` are counted towards the required majority. 104 | 105 | Event repositories follow the timeless, highly efficient and totally unfair system known as [Benevolent dictator for life](http://en.wikipedia.org/wiki/Benevolent_Dictator_for_Life). This means that all decisions are made in the end, by default, by the **BDFL**. In 106 | practice, decisions are spread across the maintainers with the goal of consensus prior to all merges. 107 | 108 | The current BDFL is listed by convention in the first line of the MAINTAINERS file with a suffix of "BDFL". 109 | 110 | 111 | ## I'm a maintainer, should I make pull requests too? 112 | 113 | Yes. Nobody should ever push to master directly. All changes should be made through a pull request. 114 | 115 | 116 | ## Who assigns maintainers? 117 | 118 | MAINTAINERS are changed via pull requests and the standard approval process, i.e. create an issue and make a pull request with the 119 | changes to the MAINTAINERS file. 120 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | # Modified BSD License 2 | 3 | Copyright (c) 2015, Monsanto Company 4 | All rights reserved.
5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | * Redistributions of source code must retain the above copyright 9 | notice, this list of conditions and the following disclaimer. 10 | * Redistributions in binary form must reproduce the above copyright 11 | notice, this list of conditions and the following disclaimer in the 12 | documentation and/or other materials provided with the distribution. 13 | * Neither the name of the Monsanto Company nor the 14 | names of its contributors may be used to endorse or promote products 15 | derived from this software without specific prior written permission. 16 | 17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 18 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 19 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 20 | DISCLAIMED. IN NO EVENT SHALL MONSANTO COMPANY BE LIABLE FOR ANY 21 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 22 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 23 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 24 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 25 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 26 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
27 | 28 | -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | mjseid (Mark Seidenstricker) 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # cf-metrics 2 | A project for monitoring and alerting on Cloud Foundry using the [InfluxData TICK stack](https://www.influxdata.com/time-series-platform/) 3 | 4 | ## Architecture and Data Flow 5 | ### Architecture 6 | ![](images/architecture.png) 7 | 8 | | Component | Purpose | 9 | | ------------- |-------------| 10 | | Loggregator Firehose | collects logs, events, and metrics from all jobs and app containers in cf - [details](https://docs.cloudfoundry.org/loggregator/architecture.html) | 11 | | Firehose Nozzle | connects to the loggregator firehose and forwards metrics to influxdb - [details](https://github.com/MonsantoCo/influxdb-firehose-nozzle) | 12 | | Bosh HM Forwarder | bosh job which subscribes to BOSH health-monitor metrics and forwards them to the loggregator firehose - [details](https://github.com/cloudfoundry/bosh-hm-forwarder) | 13 | | Bosh HM | collects VM vitals for all VMs in the cf release and sends events to telegraf - [details](https://bosh.io/docs/monitoring.html) | 14 | | Telegraf | receives bosh events in consul protocol format from BOSH HM and sends them to kapacitor for processing - [details](https://docs.influxdata.com/telegraf/v1.3/) | 15 | | Kapacitor | streams data from influxdb and telegraf for processing, anomaly detection, and alerting - [details](https://docs.influxdata.com/kapacitor/v1.3/) | 16 | | Influxdb | stores the incoming metric streams for persistence - [details](https://docs.influxdata.com/influxdb/v1.3/) | 17 | | Grafana | provides dashboards for viewing metric data in influxdb - [details](http://grafana.org/) | 18 |
Slack | receives alerts and event notifications from kapacitor - [details](https://slack.com/) | 19 | 20 | For this project we have packaged the InfluxDB, Telegraf, Kapacitor, firehose nozzle, and Grafana components into a [docker compose](https://docs.docker.com/compose/) environment to allow for a compact and easily portable solution. 21 | 22 | ## Setup 23 | To run the project, you will need the following: 24 | 25 | 1. A working bosh/cloud-foundry environment utilizing the Cloud Foundry Diego architecture 26 | 2. A docker host with [docker](https://docs.docker.com/engine/installation/#server) and [docker compose](https://docs.docker.com/compose/install/) installed and configured. This project has been tested with docker and docker-compose versions 17.03.1-ce 27 | 28 | ### Firehose ClientID & Secret 29 | 30 | Update the following section in your cloud foundry manifest and redeploy cf to enable a uaa client for the firehose nozzle: 31 | ``` 32 | properties: 33 | uaa: 34 | clients: 35 | influxdb-firehose-nozzle: 36 | access-token-validity: 1209600 37 | authorized-grant-types: authorization_code,client_credentials,refresh_token 38 | override: true 39 | secret: 40 | scope: openid,oauth.approvals,doppler.firehose 41 | authorities: oauth.login,doppler.firehose 42 | ``` 43 | 44 | ### Docker Host and Container Configuration 45 | First clone this repo to the docker host and change the following files to reflect your environment: 46 | 47 | #### InfluxDB Compose Configuration 48 | docker-compose.yml: update this to reflect the name of your cf environment, which will be the database name in influx 49 | ``` 50 | environment: 51 | - PRE_CREATE_DB=cf_np 52 | ``` 53 | 54 | #### Nozzle Compose Configuration 55 | The firehose nozzle is configured via environment variables in the docker-compose.yml. Variables set here override the corresponding values baked into the nozzle container from the nozzle JSON file.
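The precedence rule above (a `NOZZLE_*` environment variable, when set, wins over the matching value in `influxdb-firehose-nozzle.json`) can be illustrated with a small shell sketch. This is not the nozzle's actual code, which lives in the upstream influxdb-firehose-nozzle project; the helper name `effective_setting` is hypothetical, and treating an empty variable as unset is an assumption made only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper illustrating the override rule described above:
# if the named environment variable is set and non-empty, its value wins;
# otherwise the value from the nozzle JSON file is used.
# Illustration only; not the nozzle's real implementation.
effective_setting() {
  env_name=$1      # e.g. NOZZLE_INFLUXDB_DATABASE
  json_default=$2  # e.g. the "InfluxDbDatabase" value from the JSON file
  # printenv only sees exported variables, which is how docker-compose
  # hands NOZZLE_* values to the container. It exits non-zero when the
  # variable is unset, so "|| true" keeps that case from being an error.
  env_value=$(printenv "$env_name" || true)
  if [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
  else
    printf '%s\n' "$json_default"
  fi
}
```

For example, after `export NOZZLE_INFLUXDB_DATABASE=cf_np`, running `effective_setting NOZZLE_INFLUXDB_DATABASE cloudfoundry` prints `cf_np`; with the variable unset it falls back to `cloudfoundry`, the default shipped in this repo's nozzle JSON file.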
At a minimum, you will want to change these six variables to reference your cf deployment. The database and deployment names given here should match the database name you set as PRE_CREATE_DB for InfluxDB above: 56 | ``` 57 | NOZZLE_UAAURL=https://uaa.cf-np.company.com 58 | NOZZLE_CLIENT=client_id 59 | NOZZLE_CLIENT_SECRET=secret 60 | NOZZLE_TRAFFICCONTROLLERURL=wss://doppler.cf-np.company.com:443 61 | NOZZLE_DEPLOYMENT=cf_np 62 | NOZZLE_INFLUXDB_DATABASE=cf_np 63 | ``` 64 | Additional environment variable options can be found in the [upstream project](https://github.com/MonsantoCo/influxdb-firehose-nozzle). 65 | 66 | #### Grafana Dashboard Configuration 67 | If you changed your PRE_CREATE_DB value above, then make the following changes: 68 | ``` 69 | grafana/load.sh: replace any string of 'cf_np' with your PRE_CREATE_DB value 70 | grafana/dashboards/import_format/*.json: replace any string of 'cf_np' with your PRE_CREATE_DB value 71 | ``` 72 | 73 | #### Kapacitor Compose Configuration 74 | docker-compose.yml: update the following environment variables for your Slack instance 75 | ``` 76 | KAPACITOR_SLACK_URL=https://hooks.slack.com/services/XXXX/YYYYY/ZZZZZZZZZZZZZ 77 | KAPACITOR_SLACK_CHANNEL=#bot-testing 78 | ``` 79 | 80 | `kapacitor/*.tick`: If you are using the built-in tick scripts, you will have to update the vars at the top of each tick script, such as _grfana_url_, _grafanaenv_, _database_, or _slackchannel_, to match your environment. 81 | 82 | ### BOSH Setup 83 | 84 | #### BOSH HM Forwarder 85 | Follow the directions in the [bosh-hm-forwarder](https://github.com/cloudfoundry/bosh-hm-forwarder-release) project to either deploy a dedicated bosh release for the forwarder, or add the forwarder template to an existing job in your cf deployment.
Instructions for adding the job to the existing consul_z2 job of your cf deployment are below: 86 | 87 | Upload the release to your director: 88 | 89 | ``` 90 | bosh target 91 | bosh upload release https://bosh.io/d/github.com/cloudfoundry/bosh-hm-forwarder-release 92 | ``` 93 | Add the following items to the consul_z2 job section and the releases section of your cf.yml: 94 | ``` 95 | jobs: 96 | - default_networks: 97 | - name: cf2 98 | name: consul_z2 99 | templates: 100 | - name: boshhmforwarder 101 | release: bosh-hm-forwarder 102 | 103 | releases: 104 | - name: cf 105 | version: latest 106 | - name: bosh-hm-forwarder 107 | version: latest 108 | ``` 109 | Run a bosh deploy for your cf deployment. 110 | 111 | #### BOSH HM 112 | 113 | Update the following section in your bosh manifest and redeploy bosh to enable the bosh monitor statistics: 114 | ``` 115 | hm: 116 | tsdb_enabled: true 117 | tsdb: 118 | address: 119 | port: 4000 120 | consul_event_forwarder_enabled: true 121 | consul_event_forwarder: 122 | host: 123 | port: 8125 124 | protocol: http 125 | events: true 126 | heartbeats_as_alerts: true 127 | ``` 128 | 129 | ## Running the Compose Application 130 | In the top level of the project directory, use the following command to create the docker compose application and verify that it started successfully: 131 | ``` 132 | docker-compose up -d && docker-compose ps 133 | ``` 134 | ### Enabling the Kapacitor tick scripts 135 | Auto-defining and enabling the included tick scripts is currently not supported in Kapacitor, so after the kapacitor container is up you need to log into it manually and enable any desired scripts. The example below assumes your influx database is cf_np.
If you changed PRE_CREATE_DB above, substitute your database name for cf_np in the -dbrp arguments below (the bosh_event_np task reads from the telegraf_np database instead). 136 | 137 | **This limitation will be addressed by https://github.com/influxdata/kapacitor/pull/1481** 138 | ``` 139 | docker exec -it bash 140 | 141 | cd /etc/kapacitor/ 142 | kapacitor define etcd_alert_np -type stream -tick etcd_alert_np.tick -dbrp cf_np.autogen 143 | kapacitor enable etcd_alert_np 144 | kapacitor define slow_consumer_np -type stream -tick slow_consumer_np.tick -dbrp cf_np.autogen 145 | kapacitor enable slow_consumer_np 146 | kapacitor define swap_alert_np -type stream -tick swap_alert_np.tick -dbrp cf_np.autogen 147 | kapacitor enable swap_alert_np 148 | kapacitor define job_health_np -type stream -tick job_health_np.tick -dbrp cf_np.autogen 149 | kapacitor enable job_health_np 150 | kapacitor define persistent_disk_np -type stream -tick persistent_disk_np.tick -dbrp cf_np.autogen 151 | kapacitor enable persistent_disk_np 152 | kapacitor define cpu_wait_alert_np -type stream -tick cpu_wait_np.tick -dbrp cf_np.autogen 153 | kapacitor enable cpu_wait_alert_np 154 | kapacitor define bosh_event_np -type stream -tick bosh_event_np.tick -dbrp telegraf_np.autogen 155 | kapacitor enable bosh_event_np 156 | kapacitor define max_container_np -type stream -tick max_container_np.tick -dbrp cf_np.autogen 157 | kapacitor enable max_container_np 158 | 159 | kapacitor list tasks 160 | ``` 161 | 162 | ## Usage 163 | Grafana will be configured for the datasources you specified in the config file (cf_np & influx by default) and will have some preloaded dashboards which show relevant cf and bosh metric data. 164 | 165 | You can use Grafana to create your own dashboards; the following dashboards are also included: 166 | ![](images/dashboards.png) 167 | 168 | Cloud Foundry job-specific dashboards use metrics from the Firehose to show things like Cell available memory ratios and CPU load.
Annotations are included to show "bosh deploy" events and enable correlation between changes and incidents. 169 | ![](images/cell_memory.png) 170 | 171 | VM dashboards provide VM-level statistics from BOSH Monitor to show things like ephemeral disk or swap utilization. 172 | ![](images/bosh_stats.png) 173 | 174 | Alerts from the enabled Kapacitor tick scripts (such as Cell memory and bosh deploys) go to your Slack channel for team notification. You can also modify the Kapacitor TICKscripts to send alerts to any of the supported alert [event handlers](https://docs.influxdata.com/kapacitor/v1.3/nodes/alert_node/), such as email or PagerDuty. 175 | 176 | 177 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: '2' 2 | services: 3 | influxdb: 4 | build: ./influxdb 5 | ports: 6 | - "8083:8083" 7 | - "8088:8088" 8 | - "8086:8086" 9 | environment: 10 | - PRE_CREATE_DB=cf_np 11 | volumes: 12 | - ./data/influxdb:/data 13 | ulimits: 14 | nofile: 1048576 15 | networks: 16 | - mynet 17 | restart: always 18 | grafana: 19 | build: ./grafana 20 | ports: 21 | - "3000:3000" 22 | volumes: 23 | - ./grafana/grafana.ini:/etc/grafana/grafana.ini:ro 24 | - ./data/grafana:/usr/share/grafana/data 25 | environment: 26 | - GF_AUTH_ANONYMOUS_ENABLED=true 27 | - GF_AUTH_ANONYMOUS_ORG_ROLE=Editor 28 | - GF_DASHBOARDS_JSON_ENABLED=true 29 | - GF_DASHBOARDS_JSON_PATH=/etc/grafana/dashboards 30 | networks: 31 | - mynet 32 | restart: always 33 | nozzlenp: 34 | build: ./firehose-nozzle 35 | environment: 36 | - NOZZLE_INFLUXDB_URL=http://influxdb:8086 37 | - NOZZLE_INFLUXDB_DATABASE=cf_np 38 | - NOZZLE_UAAURL=https://uaa.cf.company.com 39 | - NOZZLE_CLIENT=influxdb-firehose-nozzle 40 | - NOZZLE_CLIENT_SECRET=supersecret 41 | - NOZZLE_TRAFFICCONTROLLERURL=wss://doppler.cf.company.com:443 42 | - NOZZLE_DEPLOYMENT=cf_np 43 | -
NOZZLE_EVENT_FILTER=CounterEvent,ValueMetric 44 | - NOZZLE_FIREHOSESUBSCRIPTIONID=cf-metrics 45 | networks: 46 | - mynet 47 | restart: always 48 | kapacitor: 49 | image: kapacitor:1.3.3 50 | ports: 51 | - "9092:9092" 52 | volumes: 53 | - ./data/kapacitor:/var/lib/kapacitor 54 | - ./kapacitor:/etc/kapacitor 55 | environment: 56 | - KAPACITOR_REPORTING_ENABLED=false 57 | - KAPACITOR_INFLUXDB_0_URLS_0=http://influxdb:8086 58 | - KAPACITOR_SLACK_ENABLED=true 59 | - KAPACITOR_SLACK_URL=https://hooks.slack.com/services/XXXX/YYYYY/ZZZZZZZZZZZZZ 60 | - KAPACITOR_SLACK_CHANNEL=#bot-testing 61 | - KAPACITOR_SLACK_USERNAME=kapacitor 62 | - KAPACITOR_HTTPPOST_0_ENDPOINT=jenkins-np 63 | - KAPACITOR_HTTPPOST_0_URL=http://jenkins.company.com/jenkins/job/cf-cell-increase--np/build?token=secrettoken 64 | - KAPACITOR_HTTPPOST_0_BASIC_AUTH_USERNAME=user 65 | - KAPACITOR_HTTPPOST_0_BASIC_AUTH_PASSWORD=password 66 | networks: 67 | - mynet 68 | restart: always 69 | telegraf_np: 70 | image: telegraf:1.4.2 71 | ports: 72 | - "8125:8125" 73 | volumes: 74 | - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro 75 | environment: 76 | - KAPACITOR_DB=telegraf_np 77 | - LISTENER_PORT=tcp://:8125 78 | networks: 79 | - mynet 80 | restart: always 81 | networks: 82 | mynet: 83 | driver: bridge 84 | -------------------------------------------------------------------------------- /firehose-nozzle/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM buildpack-deps:jessie-scm 2 | 3 | # gcc for cgo 4 | RUN apt-get update && apt-get install -y --no-install-recommends \ 5 | g++ \ 6 | gcc \ 7 | libc6-dev \ 8 | make \ 9 | supervisor \ 10 | curl \ 11 | ca-certificates \ 12 | && rm -rf /var/lib/apt/lists/* 13 | 14 | ENV GOLANG_VERSION 1.6.3 15 | ENV GOLANG_DOWNLOAD_URL https://golang.org/dl/go$GOLANG_VERSION.linux-amd64.tar.gz 16 | ENV GOLANG_DOWNLOAD_SHA256 cdde5e08530c0579255d6153b08fdb3b8e47caabbe717bc7bcd7561275a87aeb 17 | 18 | RUN curl -fsSL 
"$GOLANG_DOWNLOAD_URL" -o golang.tar.gz \ 19 | && echo "$GOLANG_DOWNLOAD_SHA256 golang.tar.gz" | sha256sum -c - \ 20 | && tar -C /usr/local -xzf golang.tar.gz \ 21 | && rm golang.tar.gz 22 | 23 | ENV GOPATH /go 24 | ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH 25 | 26 | RUN mkdir -p "$GOPATH/src" "$GOPATH/bin" && chmod -R 777 "$GOPATH" 27 | WORKDIR $GOPATH 28 | 29 | RUN cd /go \ 30 | && go get -u -v github.com/MonsantoCo/influxdb-firehose-nozzle \ 31 | && cd /go/src/github.com/MonsantoCo/influxdb-firehose-nozzle \ 32 | && go build . \ 33 | && cp influxdb-firehose-nozzle /go/ 34 | 35 | ADD influxdb-firehose-nozzle.json /go/influxdb-firehose-nozzle.json 36 | COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf 37 | 38 | CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"] 39 | -------------------------------------------------------------------------------- /firehose-nozzle/influxdb-firehose-nozzle.json: -------------------------------------------------------------------------------- 1 | { 2 | "UAAURL": "https://uaa.cf.domain.com", 3 | "Client": "influxdb-firehose-nozzle", 4 | "ClientSecret": "password", 5 | "TrafficControllerURL": "wss://doppler.cf.domain.com:443", 6 | "FirehoseSubscriptionID": "influxdb-firehose-nozzle", 7 | "InfluxDbUrl": "http://influxdb:8086", 8 | "InfluxDbDatabase": "cloudfoundry", 9 | "InfluxDbUser": "root", 10 | "InfluxDbPassword": "admin", 11 | "FlushDurationSeconds": 15, 12 | "SsLSkipVerify": true, 13 | "MetricPrefix": "firehose.", 14 | "Deployment": "deployment", 15 | "DisableAccessControl": false, 16 | "IdleTimeoutSeconds" : 60 17 | } 18 | -------------------------------------------------------------------------------- /firehose-nozzle/supervisord.conf: -------------------------------------------------------------------------------- 1 | [supervisord] 2 | nodaemon=true 3 | logfile=/dev/null 4 | 5 | [program:nozzle] 6 | command=/go/influxdb-firehose-nozzle -config /go/influxdb-firehose-nozzle.json 7 | 
startsecs=0 8 | autostart=true 9 | startretries=30 10 | autorestart=true 11 | redirect_stderr=true 12 | stdout_logfile=/dev/stdout 13 | stdout_logfile_maxbytes=0 14 | stderr_logfile=/dev/stdout 15 | stderr_logfile_maxbytes=0 16 | -------------------------------------------------------------------------------- /grafana/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:trusty 2 | 3 | RUN apt-get update && apt-get -y install libfontconfig wget adduser openssl ca-certificates curl supervisor 4 | 5 | RUN wget -O grafana_latest_amd64.deb https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_4.5.2_amd64.deb 6 | RUN dpkg -i grafana_latest_amd64.deb 7 | 8 | EXPOSE 3000 9 | 10 | VOLUME ["/usr/share/grafana/data"] 11 | 12 | WORKDIR /usr/share/grafana 13 | 14 | ADD dashboards/import_format/ /etc/grafana/dashboards/ 15 | ADD load.sh /etc/grafana/load.sh 16 | COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf 17 | 18 | CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"] 19 | -------------------------------------------------------------------------------- /grafana/dashboards/import_format/CC.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | { 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.2.0" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "graph", 13 | "name": "Graph", 14 | "version": "" 15 | } 16 | ], 17 | "annotations": { 18 | "list": [ 19 | { 20 | "datasource": "$Environment", 21 | "enable": true, 22 | "iconColor": "rgba(255, 96, 96, 1)", 23 | "name": "deployment", 24 | "query": "select value from bosh_deploy where $timeFilter" 25 | } 26 | ] 27 | }, 28 | "editable": true, 29 | "gnetId": null, 30 | "graphTooltip": 0, 31 | "hideControls": false, 32 | "id": null, 33 | "links": [], 34 | "rows": [ 35 | { 36 | "collapse": false, 37 | "height":
"250px", 38 | "panels": [ 39 | { 40 | "aliasColors": {}, 41 | "bars": false, 42 | "datasource": "$Environment", 43 | "editable": true, 44 | "error": false, 45 | "fill": 0, 46 | "grid": {}, 47 | "id": 1, 48 | "legend": { 49 | "avg": false, 50 | "current": false, 51 | "max": false, 52 | "min": false, 53 | "show": true, 54 | "total": false, 55 | "values": false 56 | }, 57 | "lines": true, 58 | "linewidth": 2, 59 | "links": [], 60 | "nullPointMode": "connected", 61 | "percentage": false, 62 | "pointradius": 1, 63 | "points": true, 64 | "renderer": "flot", 65 | "seriesOverrides": [], 66 | "span": 12, 67 | "stack": false, 68 | "steppedLine": false, 69 | "targets": [ 70 | { 71 | "alias": "$tag_job:$tag_index", 72 | "dsType": "influxdb", 73 | "groupBy": [ 74 | { 75 | "params": [ 76 | "$interval" 77 | ], 78 | "type": "time" 79 | }, 80 | { 81 | "params": [ 82 | "job" 83 | ], 84 | "type": "tag" 85 | }, 86 | { 87 | "params": [ 88 | "index" 89 | ], 90 | "type": "tag" 91 | }, 92 | { 93 | "params": [ 94 | "null" 95 | ], 96 | "type": "fill" 97 | } 98 | ], 99 | "hide": false, 100 | "measurement": "firehose.gorouter.requests.CloudController", 101 | "policy": "default", 102 | "refId": "A", 103 | "resultFormat": "time_series", 104 | "select": [ 105 | [ 106 | { 107 | "params": [ 108 | "value" 109 | ], 110 | "type": "field" 111 | }, 112 | { 113 | "params": [], 114 | "type": "mean" 115 | }, 116 | { 117 | "params": [ 118 | "1s" 119 | ], 120 | "type": "non_negative_derivative" 121 | } 122 | ] 123 | ], 124 | "tags": [] 125 | } 126 | ], 127 | "thresholds": [], 128 | "timeFrom": null, 129 | "timeShift": null, 130 | "title": "CloudController Requests Per Second", 131 | "tooltip": { 132 | "msResolution": true, 133 | "shared": true, 134 | "sort": 2, 135 | "value_type": "cumulative" 136 | }, 137 | "type": "graph", 138 | "xaxis": { 139 | "mode": "time", 140 | "name": null, 141 | "show": true, 142 | "values": [] 143 | }, 144 | "yaxes": [ 145 | { 146 | "format": "short", 147 | "label": null, 148 | 
"logBase": 1, 149 | "max": null, 150 | "min": null, 151 | "show": true 152 | }, 153 | { 154 | "format": "short", 155 | "label": null, 156 | "logBase": 1, 157 | "max": null, 158 | "min": null, 159 | "show": true 160 | } 161 | ] 162 | } 163 | ], 164 | "repeat": null, 165 | "repeatIteration": null, 166 | "repeatRowId": null, 167 | "showTitle": false, 168 | "title": "Row", 169 | "titleSize": "h6" 170 | }, 171 | { 172 | "collapse": false, 173 | "height": "250px", 174 | "panels": [ 175 | { 176 | "aliasColors": {}, 177 | "bars": false, 178 | "datasource": "$Environment", 179 | "editable": true, 180 | "error": false, 181 | "fill": 0, 182 | "grid": {}, 183 | "id": 2, 184 | "legend": { 185 | "avg": false, 186 | "current": false, 187 | "max": false, 188 | "min": false, 189 | "show": true, 190 | "total": false, 191 | "values": false 192 | }, 193 | "lines": true, 194 | "linewidth": 2, 195 | "links": [], 196 | "nullPointMode": "connected", 197 | "percentage": false, 198 | "pointradius": 1, 199 | "points": true, 200 | "renderer": "flot", 201 | "seriesOverrides": [], 202 | "span": 12, 203 | "stack": false, 204 | "steppedLine": false, 205 | "targets": [ 206 | { 207 | "alias": "$tag_job:$tag_index", 208 | "dsType": "influxdb", 209 | "groupBy": [ 210 | { 211 | "params": [ 212 | "$interval" 213 | ], 214 | "type": "time" 215 | }, 216 | { 217 | "params": [ 218 | "job" 219 | ], 220 | "type": "tag" 221 | }, 222 | { 223 | "params": [ 224 | "index" 225 | ], 226 | "type": "tag" 227 | } 228 | ], 229 | "hide": false, 230 | "measurement": "firehose.cc.job_queue_length.total", 231 | "policy": "default", 232 | "refId": "A", 233 | "resultFormat": "time_series", 234 | "select": [ 235 | [ 236 | { 237 | "params": [ 238 | "value" 239 | ], 240 | "type": "field" 241 | }, 242 | { 243 | "params": [], 244 | "type": "mean" 245 | } 246 | ] 247 | ], 248 | "tags": [] 249 | } 250 | ], 251 | "thresholds": [], 252 | "timeFrom": null, 253 | "timeShift": null, 254 | "title": "job Queue Length", 255 | "tooltip": { 
256 | "msResolution": false, 257 | "shared": true, 258 | "sort": 2, 259 | "value_type": "cumulative" 260 | }, 261 | "type": "graph", 262 | "xaxis": { 263 | "mode": "time", 264 | "name": null, 265 | "show": true, 266 | "values": [] 267 | }, 268 | "yaxes": [ 269 | { 270 | "format": "short", 271 | "label": null, 272 | "logBase": 1, 273 | "max": null, 274 | "min": null, 275 | "show": true 276 | }, 277 | { 278 | "format": "short", 279 | "label": null, 280 | "logBase": 1, 281 | "max": null, 282 | "min": null, 283 | "show": true 284 | } 285 | ] 286 | } 287 | ], 288 | "repeat": null, 289 | "repeatIteration": null, 290 | "repeatRowId": null, 291 | "showTitle": false, 292 | "title": "New row", 293 | "titleSize": "h6" 294 | }, 295 | { 296 | "collapse": false, 297 | "height": "250px", 298 | "panels": [ 299 | { 300 | "aliasColors": {}, 301 | "bars": false, 302 | "datasource": "$Environment", 303 | "editable": true, 304 | "error": false, 305 | "fill": 0, 306 | "grid": {}, 307 | "id": 3, 308 | "legend": { 309 | "avg": false, 310 | "current": false, 311 | "max": false, 312 | "min": false, 313 | "show": true, 314 | "total": false, 315 | "values": false 316 | }, 317 | "lines": true, 318 | "linewidth": 2, 319 | "links": [], 320 | "nullPointMode": "connected", 321 | "percentage": false, 322 | "pointradius": 1, 323 | "points": true, 324 | "renderer": "flot", 325 | "seriesOverrides": [], 326 | "span": 12, 327 | "stack": false, 328 | "steppedLine": false, 329 | "targets": [ 330 | { 331 | "alias": "$tag_job:$tag_index", 332 | "dsType": "influxdb", 333 | "groupBy": [ 334 | { 335 | "params": [ 336 | "$interval" 337 | ], 338 | "type": "time" 339 | }, 340 | { 341 | "params": [ 342 | "index" 343 | ], 344 | "type": "tag" 345 | }, 346 | { 347 | "params": [ 348 | "job" 349 | ], 350 | "type": "tag" 351 | }, 352 | { 353 | "params": [ 354 | "null" 355 | ], 356 | "type": "fill" 357 | } 358 | ], 359 | "measurement": "firehose.cc.requests.outstanding", 360 | "policy": "default", 361 | "refId": "A", 362 
| "resultFormat": "time_series", 363 | "select": [ 364 | [ 365 | { 366 | "params": [ 367 | "value" 368 | ], 369 | "type": "field" 370 | }, 371 | { 372 | "params": [], 373 | "type": "mean" 374 | } 375 | ] 376 | ], 377 | "tags": [] 378 | } 379 | ], 380 | "thresholds": [], 381 | "timeFrom": null, 382 | "timeShift": null, 383 | "title": "Outstanding Requests", 384 | "tooltip": { 385 | "msResolution": true, 386 | "shared": true, 387 | "sort": 2, 388 | "value_type": "cumulative" 389 | }, 390 | "type": "graph", 391 | "xaxis": { 392 | "mode": "time", 393 | "name": null, 394 | "show": true, 395 | "values": [] 396 | }, 397 | "yaxes": [ 398 | { 399 | "format": "short", 400 | "label": null, 401 | "logBase": 1, 402 | "max": null, 403 | "min": null, 404 | "show": true 405 | }, 406 | { 407 | "format": "short", 408 | "label": null, 409 | "logBase": 1, 410 | "max": null, 411 | "min": null, 412 | "show": true 413 | } 414 | ] 415 | } 416 | ], 417 | "repeat": null, 418 | "repeatIteration": null, 419 | "repeatRowId": null, 420 | "showTitle": false, 421 | "title": "New row", 422 | "titleSize": "h6" 423 | }, 424 | { 425 | "collapse": false, 426 | "height": 250, 427 | "panels": [ 428 | { 429 | "aliasColors": {}, 430 | "bars": false, 431 | "datasource": "$Environment", 432 | "fill": 0, 433 | "id": 4, 434 | "legend": { 435 | "avg": false, 436 | "current": false, 437 | "max": false, 438 | "min": false, 439 | "show": true, 440 | "total": false, 441 | "values": false 442 | }, 443 | "lines": true, 444 | "linewidth": 1, 445 | "links": [], 446 | "nullPointMode": "connected", 447 | "percentage": false, 448 | "pointradius": 1, 449 | "points": true, 450 | "renderer": "flot", 451 | "seriesOverrides": [], 452 | "span": 12, 453 | "stack": false, 454 | "steppedLine": false, 455 | "targets": [ 456 | { 457 | "alias": "$tag_job:$tag_index", 458 | "dsType": "influxdb", 459 | "groupBy": [ 460 | { 461 | "params": [ 462 | "$interval" 463 | ], 464 | "type": "time" 465 | }, 466 | { 467 | "params": [ 468 | 
"index" 469 | ], 470 | "type": "tag" 471 | }, 472 | { 473 | "params": [ 474 | "job" 475 | ], 476 | "type": "tag" 477 | }, 478 | { 479 | "params": [ 480 | "null" 481 | ], 482 | "type": "fill" 483 | } 484 | ], 485 | "measurement": "firehose.cc.thread_info.event_machine.threadqueue.num_waiting", 486 | "policy": "default", 487 | "refId": "A", 488 | "resultFormat": "time_series", 489 | "select": [ 490 | [ 491 | { 492 | "params": [ 493 | "value" 494 | ], 495 | "type": "field" 496 | }, 497 | { 498 | "params": [], 499 | "type": "mean" 500 | } 501 | ] 502 | ], 503 | "tags": [] 504 | } 505 | ], 506 | "thresholds": [], 507 | "timeFrom": null, 508 | "timeShift": null, 509 | "title": "Available CC Threads", 510 | "tooltip": { 511 | "shared": true, 512 | "sort": 2, 513 | "value_type": "individual" 514 | }, 515 | "type": "graph", 516 | "xaxis": { 517 | "mode": "time", 518 | "name": null, 519 | "show": true, 520 | "values": [] 521 | }, 522 | "yaxes": [ 523 | { 524 | "format": "short", 525 | "label": null, 526 | "logBase": 1, 527 | "max": null, 528 | "min": null, 529 | "show": true 530 | }, 531 | { 532 | "format": "short", 533 | "label": null, 534 | "logBase": 1, 535 | "max": null, 536 | "min": null, 537 | "show": true 538 | } 539 | ] 540 | } 541 | ], 542 | "repeat": null, 543 | "repeatIteration": null, 544 | "repeatRowId": null, 545 | "showTitle": false, 546 | "title": "Dashboard Row", 547 | "titleSize": "h6" 548 | } 549 | ], 550 | "schemaVersion": 14, 551 | "style": "dark", 552 | "tags": [], 553 | "templating": { 554 | "list": [ 555 | { 556 | "current": { 557 | "text": "cf_np", 558 | "value": "cf_np" 559 | }, 560 | "hide": 0, 561 | "label": null, 562 | "name": "Environment", 563 | "options": [], 564 | "query": "influxdb", 565 | "refresh": 1, 566 | "regex": "/cf_*/", 567 | "type": "datasource" 568 | } 569 | ] 570 | }, 571 | "time": { 572 | "from": "now-15m", 573 | "to": "now" 574 | }, 575 | "timepicker": { 576 | "refresh_intervals": [ 577 | "5s", 578 | "10s", 579 | "30s", 580 | 
"1m", 581 | "5m", 582 | "15m", 583 | "30m", 584 | "1h", 585 | "2h", 586 | "1d" 587 | ], 588 | "time_options": [ 589 | "5m", 590 | "15m", 591 | "1h", 592 | "6h", 593 | "12h", 594 | "24h", 595 | "2d", 596 | "7d", 597 | "30d" 598 | ] 599 | }, 600 | "timezone": "browser", 601 | "title": "Cloud Controller", 602 | "version": 0 603 | } 604 | -------------------------------------------------------------------------------- /grafana/dashboards/import_format/Etcd_stats.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | { 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.2.0" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "graph", 13 | "name": "Graph", 14 | "version": "" 15 | } 16 | ], 17 | "annotations": { 18 | "list": [] 19 | }, 20 | "editable": true, 21 | "gnetId": null, 22 | "graphTooltip": 0, 23 | "hideControls": false, 24 | "id": null, 25 | "links": [], 26 | "rows": [ 27 | { 28 | "collapse": false, 29 | "height": "250px", 30 | "panels": [ 31 | { 32 | "aliasColors": {}, 33 | "bars": false, 34 | "datasource": "$Environment", 35 | "editable": true, 36 | "error": false, 37 | "fill": 0, 38 | "grid": {}, 39 | "id": 1, 40 | "legend": { 41 | "avg": false, 42 | "current": false, 43 | "max": false, 44 | "min": false, 45 | "show": true, 46 | "total": false, 47 | "values": false 48 | }, 49 | "lines": true, 50 | "linewidth": 1, 51 | "links": [], 52 | "nullPointMode": "connected", 53 | "percentage": false, 54 | "pointradius": 2, 55 | "points": true, 56 | "renderer": "flot", 57 | "seriesOverrides": [], 58 | "span": 12, 59 | "stack": false, 60 | "steppedLine": false, 61 | "targets": [ 62 | { 63 | "dsType": "influxdb", 64 | "groupBy": [ 65 | { 66 | "params": [ 67 | "$interval" 68 | ], 69 | "type": "time" 70 | }, 71 | { 72 | "params": [ 73 | "index" 74 | ], 75 | "type": "tag" 76 | }, 77 | { 78 | "params": [ 79 | "job" 80 | ], 81 | "type": "tag" 82 | }, 83 | { 84 | 
"params": [ 85 | "null" 86 | ], 87 | "type": "fill" 88 | } 89 | ], 90 | "measurement": "firehose.etcd.IsLeader", 91 | "policy": "default", 92 | "query": "SELECT mean(\"value\") FROM \"firehose.etcd.IsLeader\" WHERE $timeFilter GROUP BY time($interval), \"index\", \"job\" fill(null)", 93 | "rawQuery": false, 94 | "refId": "A", 95 | "resultFormat": "time_series", 96 | "select": [ 97 | [ 98 | { 99 | "params": [ 100 | "value" 101 | ], 102 | "type": "field" 103 | }, 104 | { 105 | "params": [], 106 | "type": "mean" 107 | } 108 | ] 109 | ], 110 | "tags": [] 111 | } 112 | ], 113 | "thresholds": [], 114 | "timeFrom": null, 115 | "timeShift": null, 116 | "title": "Etcd Leaders", 117 | "tooltip": { 118 | "msResolution": true, 119 | "shared": true, 120 | "sort": 0, 121 | "value_type": "cumulative" 122 | }, 123 | "type": "graph", 124 | "xaxis": { 125 | "mode": "time", 126 | "name": null, 127 | "show": true, 128 | "values": [] 129 | }, 130 | "yaxes": [ 131 | { 132 | "format": "short", 133 | "label": null, 134 | "logBase": 1, 135 | "max": null, 136 | "min": null, 137 | "show": true 138 | }, 139 | { 140 | "format": "short", 141 | "label": null, 142 | "logBase": 1, 143 | "max": null, 144 | "min": null, 145 | "show": true 146 | } 147 | ] 148 | } 149 | ], 150 | "repeat": null, 151 | "repeatIteration": null, 152 | "repeatRowId": null, 153 | "showTitle": false, 154 | "title": "Row", 155 | "titleSize": "h6" 156 | }, 157 | { 158 | "collapse": false, 159 | "height": "250px", 160 | "panels": [ 161 | { 162 | "aliasColors": {}, 163 | "bars": false, 164 | "datasource": "$Environment", 165 | "editable": true, 166 | "error": false, 167 | "fill": 0, 168 | "grid": {}, 169 | "id": 2, 170 | "legend": { 171 | "avg": false, 172 | "current": false, 173 | "max": false, 174 | "min": false, 175 | "show": true, 176 | "total": false, 177 | "values": false 178 | }, 179 | "lines": true, 180 | "linewidth": 1, 181 | "links": [], 182 | "nullPointMode": "connected", 183 | "percentage": false, 184 | 
"pointradius": 2, 185 | "points": true, 186 | "renderer": "flot", 187 | "seriesOverrides": [], 188 | "span": 12, 189 | "stack": false, 190 | "steppedLine": false, 191 | "targets": [ 192 | { 193 | "dsType": "influxdb", 194 | "groupBy": [ 195 | { 196 | "params": [ 197 | "$interval" 198 | ], 199 | "type": "time" 200 | }, 201 | { 202 | "params": [ 203 | "index" 204 | ], 205 | "type": "tag" 206 | }, 207 | { 208 | "params": [ 209 | "job" 210 | ], 211 | "type": "tag" 212 | }, 213 | { 214 | "params": [ 215 | "null" 216 | ], 217 | "type": "fill" 218 | } 219 | ], 220 | "measurement": "firehose.etcd.Watchers", 221 | "policy": "default", 222 | "query": "SELECT mean(\"value\") FROM \"measurement\" WHERE $timeFilter GROUP BY time($interval) fill(null)", 223 | "rawQuery": false, 224 | "refId": "A", 225 | "resultFormat": "time_series", 226 | "select": [ 227 | [ 228 | { 229 | "params": [ 230 | "value" 231 | ], 232 | "type": "field" 233 | }, 234 | { 235 | "params": [], 236 | "type": "mean" 237 | } 238 | ] 239 | ], 240 | "tags": [] 241 | } 242 | ], 243 | "thresholds": [], 244 | "timeFrom": null, 245 | "timeShift": null, 246 | "title": "Etcd Watchers", 247 | "tooltip": { 248 | "msResolution": true, 249 | "shared": true, 250 | "sort": 0, 251 | "value_type": "cumulative" 252 | }, 253 | "type": "graph", 254 | "xaxis": { 255 | "mode": "time", 256 | "name": null, 257 | "show": true, 258 | "values": [] 259 | }, 260 | "yaxes": [ 261 | { 262 | "format": "short", 263 | "label": null, 264 | "logBase": 1, 265 | "max": null, 266 | "min": null, 267 | "show": true 268 | }, 269 | { 270 | "format": "short", 271 | "label": null, 272 | "logBase": 1, 273 | "max": null, 274 | "min": null, 275 | "show": true 276 | } 277 | ] 278 | } 279 | ], 280 | "repeat": null, 281 | "repeatIteration": null, 282 | "repeatRowId": null, 283 | "showTitle": false, 284 | "title": "New row", 285 | "titleSize": "h6" 286 | }, 287 | { 288 | "collapse": false, 289 | "height": "250px", 290 | "panels": [ 291 | { 292 | "aliasColors": 
{}, 293 | "bars": false, 294 | "datasource": "$Environment", 295 | "editable": true, 296 | "error": false, 297 | "fill": 0, 298 | "grid": {}, 299 | "id": 3, 300 | "legend": { 301 | "avg": false, 302 | "current": false, 303 | "max": false, 304 | "min": false, 305 | "show": true, 306 | "total": false, 307 | "values": false 308 | }, 309 | "lines": true, 310 | "linewidth": 1, 311 | "links": [], 312 | "nullPointMode": "connected", 313 | "percentage": false, 314 | "pointradius": 2, 315 | "points": true, 316 | "renderer": "flot", 317 | "seriesOverrides": [], 318 | "span": 12, 319 | "stack": false, 320 | "steppedLine": false, 321 | "targets": [ 322 | { 323 | "dsType": "influxdb", 324 | "groupBy": [ 325 | { 326 | "params": [ 327 | "$interval" 328 | ], 329 | "type": "time" 330 | }, 331 | { 332 | "params": [ 333 | "job" 334 | ], 335 | "type": "tag" 336 | }, 337 | { 338 | "params": [ 339 | "index" 340 | ], 341 | "type": "tag" 342 | }, 343 | { 344 | "params": [ 345 | "null" 346 | ], 347 | "type": "fill" 348 | } 349 | ], 350 | "measurement": "firehose.etcd.Followers", 351 | "policy": "default", 352 | "query": "SELECT mean(\"value\") FROM \"measurement\" WHERE $timeFilter GROUP BY time($interval) fill(null)", 353 | "rawQuery": false, 354 | "refId": "A", 355 | "resultFormat": "time_series", 356 | "select": [ 357 | [ 358 | { 359 | "params": [ 360 | "value" 361 | ], 362 | "type": "field" 363 | }, 364 | { 365 | "params": [], 366 | "type": "mean" 367 | } 368 | ] 369 | ], 370 | "tags": [] 371 | } 372 | ], 373 | "thresholds": [], 374 | "timeFrom": null, 375 | "timeShift": null, 376 | "title": "Etcd Followers", 377 | "tooltip": { 378 | "msResolution": true, 379 | "shared": true, 380 | "sort": 0, 381 | "value_type": "cumulative" 382 | }, 383 | "type": "graph", 384 | "xaxis": { 385 | "mode": "time", 386 | "name": null, 387 | "show": true, 388 | "values": [] 389 | }, 390 | "yaxes": [ 391 | { 392 | "format": "short", 393 | "label": null, 394 | "logBase": 1, 395 | "max": null, 396 | "min": 
null, 397 | "show": true 398 | }, 399 | { 400 | "format": "short", 401 | "label": null, 402 | "logBase": 1, 403 | "max": null, 404 | "min": null, 405 | "show": true 406 | } 407 | ] 408 | } 409 | ], 410 | "repeat": null, 411 | "repeatIteration": null, 412 | "repeatRowId": null, 413 | "showTitle": false, 414 | "title": "New row", 415 | "titleSize": "h6" 416 | } 417 | ], 418 | "schemaVersion": 14, 419 | "style": "dark", 420 | "tags": [], 421 | "templating": { 422 | "list": [ 423 | { 424 | "current": { 425 | "text": "cf_np", 426 | "value": "cf_np" 427 | }, 428 | "hide": 0, 429 | "label": null, 430 | "name": "Environment", 431 | "options": [], 432 | "query": "influxdb", 433 | "refresh": 1, 434 | "regex": "/cf_*/", 435 | "type": "datasource" 436 | } 437 | ] 438 | }, 439 | "time": { 440 | "from": "now-1h", 441 | "to": "now" 442 | }, 443 | "timepicker": { 444 | "refresh_intervals": [ 445 | "5s", 446 | "10s", 447 | "30s", 448 | "1m", 449 | "5m", 450 | "15m", 451 | "30m", 452 | "1h", 453 | "2h", 454 | "1d" 455 | ], 456 | "time_options": [ 457 | "5m", 458 | "15m", 459 | "1h", 460 | "6h", 461 | "12h", 462 | "24h", 463 | "2d", 464 | "7d", 465 | "30d" 466 | ] 467 | }, 468 | "timezone": "browser", 469 | "title": "ETCD Stats", 470 | "version": 0 471 | } 472 | -------------------------------------------------------------------------------- /grafana/dashboards/import_format/Influx_stats.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [ 3 | { 4 | "name": "DS_INFLUX", 5 | "label": "Influx", 6 | "description": "", 7 | "type": "datasource", 8 | "pluginId": "influxdb", 9 | "pluginName": "InfluxDB" 10 | } 11 | ], 12 | "__requires": [ 13 | { 14 | "type": "grafana", 15 | "id": "grafana", 16 | "name": "Grafana", 17 | "version": "4.1.2" 18 | }, 19 | { 20 | "type": "panel", 21 | "id": "graph", 22 | "name": "Graph", 23 | "version": "" 24 | }, 25 | { 26 | "type": "datasource", 27 | "id": "influxdb", 28 | "name": "InfluxDB", 29 | 
"version": "1.0.0" 30 | }, 31 | { 32 | "type": "panel", 33 | "id": "singlestat", 34 | "name": "Singlestat", 35 | "version": "" 36 | } 37 | ], 38 | "annotations": { 39 | "list": [] 40 | }, 41 | "editable": true, 42 | "gnetId": null, 43 | "graphTooltip": 0, 44 | "hideControls": false, 45 | "id": null, 46 | "links": [], 47 | "refresh": false, 48 | "rows": [ 49 | { 50 | "collapse": false, 51 | "height": "250px", 52 | "panels": [ 53 | { 54 | "aliasColors": {}, 55 | "bars": false, 56 | "datasource": "influxdb", 57 | "fill": 1, 58 | "id": 1, 59 | "legend": { 60 | "alignAsTable": true, 61 | "avg": true, 62 | "current": false, 63 | "max": true, 64 | "min": false, 65 | "show": true, 66 | "total": false, 67 | "values": true 68 | }, 69 | "lines": true, 70 | "linewidth": 1, 71 | "links": [], 72 | "nullPointMode": "null", 73 | "percentage": false, 74 | "pointradius": 1, 75 | "points": true, 76 | "renderer": "flot", 77 | "seriesOverrides": [], 78 | "span": 12, 79 | "stack": false, 80 | "steppedLine": false, 81 | "targets": [ 82 | { 83 | "dsType": "influxdb", 84 | "groupBy": [ 85 | { 86 | "params": [ 87 | "hostname" 88 | ], 89 | "type": "tag" 90 | } 91 | ], 92 | "hide": false, 93 | "measurement": "httpd", 94 | "policy": "default", 95 | "refId": "A", 96 | "resultFormat": "time_series", 97 | "select": [ 98 | [ 99 | { 100 | "params": [ 101 | "pointsWrittenOK" 102 | ], 103 | "type": "field" 104 | }, 105 | { 106 | "params": [ 107 | "1s" 108 | ], 109 | "type": "non_negative_derivative" 110 | } 111 | ] 112 | ], 113 | "tags": [] 114 | } 115 | ], 116 | "thresholds": [], 117 | "timeFrom": null, 118 | "timeShift": null, 119 | "title": "Points written per second", 120 | "tooltip": { 121 | "shared": true, 122 | "sort": 0, 123 | "value_type": "individual" 124 | }, 125 | "type": "graph", 126 | "xaxis": { 127 | "mode": "time", 128 | "name": null, 129 | "show": true, 130 | "values": [] 131 | }, 132 | "yaxes": [ 133 | { 134 | "format": "short", 135 | "label": null, 136 | "logBase": 1, 137 | "max": 
null, 138 | "min": null, 139 | "show": true 140 | }, 141 | { 142 | "format": "short", 143 | "label": null, 144 | "logBase": 1, 145 | "max": null, 146 | "min": null, 147 | "show": true 148 | } 149 | ] 150 | } 151 | ], 152 | "repeat": null, 153 | "repeatIteration": null, 154 | "repeatRowId": null, 155 | "showTitle": false, 156 | "title": "Dashboard Row", 157 | "titleSize": "h6" 158 | }, 159 | { 160 | "collapse": false, 161 | "height": 250, 162 | "panels": [ 163 | { 164 | "cacheTimeout": null, 165 | "colorBackground": false, 166 | "colorValue": false, 167 | "colors": [ 168 | "rgba(245, 54, 54, 0.9)", 169 | "rgba(237, 129, 40, 0.89)", 170 | "rgba(50, 172, 45, 0.97)" 171 | ], 172 | "datasource": "influxdb", 173 | "format": "none", 174 | "gauge": { 175 | "maxValue": 100, 176 | "minValue": 0, 177 | "show": false, 178 | "thresholdLabels": false, 179 | "thresholdMarkers": true 180 | }, 181 | "id": 2, 182 | "interval": null, 183 | "links": [], 184 | "mappingType": 1, 185 | "mappingTypes": [ 186 | { 187 | "name": "value to text", 188 | "value": 1 189 | }, 190 | { 191 | "name": "range to text", 192 | "value": 2 193 | } 194 | ], 195 | "maxDataPoints": 100, 196 | "nullPointMode": "connected", 197 | "nullText": null, 198 | "postfix": "", 199 | "postfixFontSize": "50%", 200 | "prefix": "", 201 | "prefixFontSize": "50%", 202 | "rangeMaps": [ 203 | { 204 | "from": "null", 205 | "text": "N/A", 206 | "to": "null" 207 | } 208 | ], 209 | "span": 6, 210 | "sparkline": { 211 | "fillColor": "rgba(31, 118, 189, 0.18)", 212 | "full": false, 213 | "lineColor": "rgb(31, 120, 193)", 214 | "show": false 215 | }, 216 | "targets": [ 217 | { 218 | "dsType": "influxdb", 219 | "groupBy": [], 220 | "measurement": "database", 221 | "policy": "default", 222 | "refId": "A", 223 | "resultFormat": "time_series", 224 | "select": [ 225 | [ 226 | { 227 | "params": [ 228 | "numMeasurements" 229 | ], 230 | "type": "field" 231 | } 232 | ] 233 | ], 234 | "tags": [] 235 | } 236 | ], 237 | "thresholds": "", 238 | 
"title": "Number of Measurements", 239 | "type": "singlestat", 240 | "valueFontSize": "80%", 241 | "valueMaps": [ 242 | { 243 | "op": "=", 244 | "text": "N/A", 245 | "value": "null" 246 | } 247 | ], 248 | "valueName": "avg" 249 | }, 250 | { 251 | "cacheTimeout": null, 252 | "colorBackground": false, 253 | "colorValue": false, 254 | "colors": [ 255 | "rgba(245, 54, 54, 0.9)", 256 | "rgba(237, 129, 40, 0.89)", 257 | "rgba(50, 172, 45, 0.97)" 258 | ], 259 | "datasource": "influxdb", 260 | "format": "none", 261 | "gauge": { 262 | "maxValue": 100, 263 | "minValue": 0, 264 | "show": false, 265 | "thresholdLabels": false, 266 | "thresholdMarkers": true 267 | }, 268 | "id": 3, 269 | "interval": null, 270 | "links": [], 271 | "mappingType": 1, 272 | "mappingTypes": [ 273 | { 274 | "name": "value to text", 275 | "value": 1 276 | }, 277 | { 278 | "name": "range to text", 279 | "value": 2 280 | } 281 | ], 282 | "maxDataPoints": 100, 283 | "nullPointMode": "connected", 284 | "nullText": null, 285 | "postfix": "", 286 | "postfixFontSize": "50%", 287 | "prefix": "", 288 | "prefixFontSize": "50%", 289 | "rangeMaps": [ 290 | { 291 | "from": "null", 292 | "text": "N/A", 293 | "to": "null" 294 | } 295 | ], 296 | "span": 6, 297 | "sparkline": { 298 | "fillColor": "rgba(31, 118, 189, 0.18)", 299 | "full": false, 300 | "lineColor": "rgb(31, 120, 193)", 301 | "show": false 302 | }, 303 | "targets": [ 304 | { 305 | "dsType": "influxdb", 306 | "groupBy": [], 307 | "measurement": "database", 308 | "policy": "default", 309 | "refId": "A", 310 | "resultFormat": "time_series", 311 | "select": [ 312 | [ 313 | { 314 | "params": [ 315 | "numSeries" 316 | ], 317 | "type": "field" 318 | } 319 | ] 320 | ], 321 | "tags": [] 322 | } 323 | ], 324 | "thresholds": "", 325 | "title": "Number of series", 326 | "type": "singlestat", 327 | "valueFontSize": "80%", 328 | "valueMaps": [ 329 | { 330 | "op": "=", 331 | "text": "N/A", 332 | "value": "null" 333 | } 334 | ], 335 | "valueName": "avg" 336 | } 337 | ], 
338 | "repeat": null, 339 | "repeatIteration": null, 340 | "repeatRowId": null, 341 | "showTitle": false, 342 | "title": "Dashboard Row", 343 | "titleSize": "h6" 344 | } 345 | ], 346 | "schemaVersion": 14, 347 | "style": "dark", 348 | "tags": [], 349 | "templating": { 350 | "list": [] 351 | }, 352 | "time": { 353 | "from": "now-1h", 354 | "to": "now" 355 | }, 356 | "timepicker": { 357 | "refresh_intervals": [ 358 | "5s", 359 | "10s", 360 | "30s", 361 | "1m", 362 | "5m", 363 | "15m", 364 | "30m", 365 | "1h", 366 | "2h", 367 | "1d" 368 | ], 369 | "time_options": [ 370 | "5m", 371 | "15m", 372 | "1h", 373 | "6h", 374 | "12h", 375 | "24h", 376 | "2d", 377 | "7d", 378 | "30d" 379 | ] 380 | }, 381 | "timezone": "browser", 382 | "title": "InfluxDB stats", 383 | "version": 5 384 | } 385 | -------------------------------------------------------------------------------- /grafana/dashboards/import_format/Routing.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | { 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.4.3" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "graph", 13 | "name": "Graph", 14 | "version": "" 15 | }, 16 | { 17 | "type": "panel", 18 | "id": "singlestat", 19 | "name": "Singlestat", 20 | "version": "" 21 | } 22 | ], 23 | "annotations": { 24 | "list": [ 25 | { 26 | "datasource": "$Environment_CF", 27 | "enable": true, 28 | "iconColor": "#C0C6BE", 29 | "iconSize": 15, 30 | "lineColor": "rgba(255, 96, 96, 0.592157)", 31 | "name": "deployment", 32 | "query": "select value from bosh_deploy where $timeFilter", 33 | "showLine": true 34 | } 35 | ] 36 | }, 37 | "editable": true, 38 | "gnetId": null, 39 | "graphTooltip": 0, 40 | "hideControls": false, 41 | "id": null, 42 | "links": [], 43 | "refresh": false, 44 | "rows": [ 45 | { 46 | "collapse": false, 47 | "height": "250px", 48 | "panels": [ 49 | { 50 | "aliasColors": {}, 51 | "bars": false, 52 | 
"dashLength": 10, 53 | "dashes": false, 54 | "datasource": "$Environment_CF", 55 | "editable": true, 56 | "error": false, 57 | "fill": 0, 58 | "grid": {}, 59 | "id": 7, 60 | "legend": { 61 | "avg": false, 62 | "current": false, 63 | "max": false, 64 | "min": false, 65 | "show": true, 66 | "total": false, 67 | "values": false 68 | }, 69 | "lines": true, 70 | "linewidth": 2, 71 | "links": [], 72 | "nullPointMode": "connected", 73 | "percentage": false, 74 | "pointradius": 5, 75 | "points": false, 76 | "renderer": "flot", 77 | "seriesOverrides": [], 78 | "spaceLength": 10, 79 | "span": 12, 80 | "stack": false, 81 | "steppedLine": false, 82 | "targets": [ 83 | { 84 | "alias": "$tag_job:$tag_index", 85 | "dsType": "influxdb", 86 | "groupBy": [ 87 | { 88 | "params": [ 89 | "index" 90 | ], 91 | "type": "tag" 92 | }, 93 | { 94 | "params": [ 95 | "job" 96 | ], 97 | "type": "tag" 98 | } 99 | ], 100 | "hide": false, 101 | "measurement": "/firehose.gorouter.responses.\\d+/", 102 | "orderByTime": "ASC", 103 | "policy": "default", 104 | "query": "SELECT mean(\"value\") FROM firehose.gorouter.latency.dea-0 WHERE $timeFilter GROUP BY time($interval), \"index\", \"job\" fill(null)", 105 | "rawQuery": false, 106 | "refId": "A", 107 | "resultFormat": "time_series", 108 | "select": [ 109 | [ 110 | { 111 | "params": [ 112 | "value" 113 | ], 114 | "type": "field" 115 | }, 116 | { 117 | "params": [ 118 | "5m" 119 | ], 120 | "type": "non_negative_derivative" 121 | } 122 | ] 123 | ], 124 | "tags": [] 125 | } 126 | ], 127 | "thresholds": [], 128 | "timeFrom": null, 129 | "timeShift": null, 130 | "title": "Router Response Rates (5min)", 131 | "tooltip": { 132 | "msResolution": false, 133 | "shared": true, 134 | "sort": 0, 135 | "value_type": "cumulative" 136 | }, 137 | "type": "graph", 138 | "xaxis": { 139 | "buckets": null, 140 | "mode": "time", 141 | "name": null, 142 | "show": true, 143 | "values": [] 144 | }, 145 | "yaxes": [ 146 | { 147 | "format": "short", 148 | "label": null, 149 | 
"logBase": 1, 150 | "max": null, 151 | "min": null, 152 | "show": true 153 | }, 154 | { 155 | "format": "short", 156 | "label": null, 157 | "logBase": 1, 158 | "max": null, 159 | "min": null, 160 | "show": true 161 | } 162 | ] 163 | } 164 | ], 165 | "repeat": null, 166 | "repeatIteration": null, 167 | "repeatRowId": null, 168 | "showTitle": false, 169 | "title": "New row", 170 | "titleSize": "h6" 171 | }, 172 | { 173 | "collapse": false, 174 | "height": "250px", 175 | "panels": [ 176 | { 177 | "aliasColors": {}, 178 | "bars": false, 179 | "dashLength": 10, 180 | "dashes": false, 181 | "datasource": "$Environment_CF", 182 | "editable": true, 183 | "error": false, 184 | "fill": 0, 185 | "grid": {}, 186 | "id": 1, 187 | "legend": { 188 | "avg": false, 189 | "current": false, 190 | "max": false, 191 | "min": false, 192 | "show": true, 193 | "total": false, 194 | "values": false 195 | }, 196 | "lines": true, 197 | "linewidth": 1, 198 | "links": [], 199 | "nullPointMode": "connected", 200 | "percentage": false, 201 | "pointradius": 1, 202 | "points": true, 203 | "renderer": "flot", 204 | "seriesOverrides": [], 205 | "spaceLength": 10, 206 | "span": 12, 207 | "stack": false, 208 | "steppedLine": false, 209 | "targets": [ 210 | { 211 | "dsType": "influxdb", 212 | "groupBy": [ 213 | { 214 | "params": [ 215 | "job" 216 | ], 217 | "type": "tag" 218 | }, 219 | { 220 | "params": [ 221 | "index" 222 | ], 223 | "type": "tag" 224 | }, 225 | { 226 | "params": [ 227 | "job" 228 | ], 229 | "type": "tag" 230 | }, 231 | { 232 | "params": [ 233 | "index" 234 | ], 235 | "type": "tag" 236 | } 237 | ], 238 | "measurement": "/firehose.gorouter.latency.+/", 239 | "policy": "default", 240 | "query": "SELECT \"value\" FROM \"firehose.gorouter.latency\" WHERE $timeFilter GROUP BY \"job\", \"index\"", 241 | "refId": "A", 242 | "resultFormat": "time_series", 243 | "select": [ 244 | [ 245 | { 246 | "params": [ 247 | "value" 248 | ], 249 | "type": "field" 250 | } 251 | ] 252 | ], 253 | "tags": [] 
254 | } 255 | ], 256 | "thresholds": [], 257 | "timeFrom": null, 258 | "timeShift": null, 259 | "title": "Component Latency (ms)", 260 | "tooltip": { 261 | "msResolution": false, 262 | "shared": true, 263 | "sort": 0, 264 | "value_type": "cumulative" 265 | }, 266 | "type": "graph", 267 | "xaxis": { 268 | "buckets": null, 269 | "mode": "time", 270 | "name": null, 271 | "show": true, 272 | "values": [] 273 | }, 274 | "yaxes": [ 275 | { 276 | "format": "ms", 277 | "logBase": 1, 278 | "max": null, 279 | "min": null, 280 | "show": true 281 | }, 282 | { 283 | "format": "short", 284 | "logBase": 1, 285 | "max": null, 286 | "min": null, 287 | "show": true 288 | } 289 | ] 290 | } 291 | ], 292 | "repeat": null, 293 | "repeatIteration": null, 294 | "repeatRowId": null, 295 | "showTitle": false, 296 | "title": "Row", 297 | "titleSize": "h6" 298 | }, 299 | { 300 | "collapse": false, 301 | "height": "250px", 302 | "panels": [ 303 | { 304 | "aliasColors": {}, 305 | "bars": false, 306 | "dashLength": 10, 307 | "dashes": false, 308 | "datasource": "$Environment_CF", 309 | "editable": true, 310 | "error": false, 311 | "fill": 0, 312 | "grid": {}, 313 | "id": 5, 314 | "legend": { 315 | "alignAsTable": true, 316 | "avg": true, 317 | "current": true, 318 | "max": true, 319 | "min": false, 320 | "show": true, 321 | "total": false, 322 | "values": true 323 | }, 324 | "lines": true, 325 | "linewidth": 2, 326 | "links": [], 327 | "nullPointMode": "connected", 328 | "percentage": false, 329 | "pointradius": 5, 330 | "points": false, 331 | "renderer": "flot", 332 | "seriesOverrides": [], 333 | "spaceLength": 10, 334 | "span": 12, 335 | "stack": false, 336 | "steppedLine": false, 337 | "targets": [ 338 | { 339 | "alias": "$tag_job:$tag_index", 340 | "dsType": "influxdb", 341 | "groupBy": [ 342 | { 343 | "params": [ 344 | "index" 345 | ], 346 | "type": "tag" 347 | }, 348 | { 349 | "params": [ 350 | "job" 351 | ], 352 | "type": "tag" 353 | } 354 | ], 355 | "measurement": 
"firehose.gorouter.total_requests", 356 | "policy": "default", 357 | "refId": "A", 358 | "resultFormat": "time_series", 359 | "select": [ 360 | [ 361 | { 362 | "params": [ 363 | "value" 364 | ], 365 | "type": "field" 366 | }, 367 | { 368 | "params": [ 369 | "1s" 370 | ], 371 | "type": "non_negative_derivative" 372 | } 373 | ] 374 | ], 375 | "tags": [] 376 | }, 377 | { 378 | "alias": "Total for all", 379 | "dsType": "influxdb", 380 | "groupBy": [ 381 | { 382 | "params": [ 383 | "1s" 384 | ], 385 | "type": "time" 386 | } 387 | ], 388 | "hide": true, 389 | "measurement": "firehose.gorouter.total_requests", 390 | "policy": "default", 391 | "refId": "B", 392 | "resultFormat": "time_series", 393 | "select": [ 394 | [ 395 | { 396 | "params": [ 397 | "value" 398 | ], 399 | "type": "field" 400 | }, 401 | { 402 | "params": [], 403 | "type": "sum" 404 | }, 405 | { 406 | "params": [ 407 | "1s" 408 | ], 409 | "type": "non_negative_derivative" 410 | } 411 | ] 412 | ], 413 | "tags": [] 414 | } 415 | ], 416 | "thresholds": [], 417 | "timeFrom": null, 418 | "timeShift": null, 419 | "title": "Requests per Second", 420 | "tooltip": { 421 | "msResolution": false, 422 | "shared": true, 423 | "sort": 0, 424 | "value_type": "cumulative" 425 | }, 426 | "type": "graph", 427 | "xaxis": { 428 | "buckets": null, 429 | "mode": "time", 430 | "name": null, 431 | "show": true, 432 | "values": [] 433 | }, 434 | "yaxes": [ 435 | { 436 | "format": "short", 437 | "label": null, 438 | "logBase": 1, 439 | "max": null, 440 | "min": null, 441 | "show": true 442 | }, 443 | { 444 | "format": "short", 445 | "label": null, 446 | "logBase": 1, 447 | "max": null, 448 | "min": null, 449 | "show": true 450 | } 451 | ] 452 | } 453 | ], 454 | "repeat": null, 455 | "repeatIteration": null, 456 | "repeatRowId": null, 457 | "showTitle": false, 458 | "title": "New row", 459 | "titleSize": "h6" 460 | }, 461 | { 462 | "collapse": false, 463 | "height": "250px", 464 | "panels": [ 465 | { 466 | "aliasColors": {}, 467 |
"bars": false, 468 | "dashLength": 10, 469 | "dashes": false, 470 | "datasource": "$Environment_CF", 471 | "editable": true, 472 | "error": false, 473 | "fill": 0, 474 | "grid": {}, 475 | "id": 6, 476 | "legend": { 477 | "avg": false, 478 | "current": false, 479 | "max": false, 480 | "min": false, 481 | "show": true, 482 | "total": false, 483 | "values": false 484 | }, 485 | "lines": true, 486 | "linewidth": 2, 487 | "links": [], 488 | "nullPointMode": "connected", 489 | "percentage": false, 490 | "pointradius": 5, 491 | "points": false, 492 | "renderer": "flot", 493 | "seriesOverrides": [], 494 | "spaceLength": 10, 495 | "span": 12, 496 | "stack": false, 497 | "steppedLine": false, 498 | "targets": [ 499 | { 500 | "alias": "$tag_job:$tag_index", 501 | "dsType": "influxdb", 502 | "groupBy": [ 503 | { 504 | "params": [ 505 | "$interval" 506 | ], 507 | "type": "time" 508 | }, 509 | { 510 | "params": [ 511 | "null" 512 | ], 513 | "type": "fill" 514 | } 515 | ], 516 | "hide": false, 517 | "measurement": "system_cpu", 518 | "orderByTime": "ASC", 519 | "policy": "default", 520 | "query": "SELECT \"cpu_sys\" + \"cpu_user\" FROM \"firehose.bosh-hm-forwarder.system.cpu\" WHERE \"job\" =~ /router.*/ AND $timeFilter GROUP BY \"job\", \"index\"", 521 | "rawQuery": true, 522 | "refId": "B", 523 | "resultFormat": "time_series", 524 | "select": [ 525 | [ 526 | { 527 | "params": [ 528 | "cpu_sys" 529 | ], 530 | "type": "field" 531 | }, 532 | { 533 | "params": [], 534 | "type": "mean" 535 | } 536 | ] 537 | ], 538 | "tags": [] 539 | } 540 | ], 541 | "thresholds": [], 542 | "timeFrom": null, 543 | "timeShift": null, 544 | "title": "CPU %", 545 | "tooltip": { 546 | "msResolution": false, 547 | "shared": true, 548 | "sort": 0, 549 | "value_type": "cumulative" 550 | }, 551 | "type": "graph", 552 | "xaxis": { 553 | "buckets": null, 554 | "mode": "time", 555 | "name": null, 556 | "show": true, 557 | "values": [] 558 | }, 559 | "yaxes": [ 560 | { 561 | "format": "short", 562 | "label": null, 563 |
"logBase": 1, 564 | "max": null, 565 | "min": null, 566 | "show": true 567 | }, 568 | { 569 | "format": "short", 570 | "label": null, 571 | "logBase": 1, 572 | "max": null, 573 | "min": null, 574 | "show": true 575 | } 576 | ] 577 | } 578 | ], 579 | "repeat": null, 580 | "repeatIteration": null, 581 | "repeatRowId": null, 582 | "showTitle": false, 583 | "title": "New row", 584 | "titleSize": "h6" 585 | }, 586 | { 587 | "collapse": false, 588 | "height": "250px", 589 | "panels": [ 590 | { 591 | "cacheTimeout": null, 592 | "colorBackground": false, 593 | "colorValue": false, 594 | "colors": [ 595 | "rgba(245, 54, 54, 0.9)", 596 | "rgba(237, 129, 40, 0.89)", 597 | "rgba(50, 172, 45, 0.97)" 598 | ], 599 | "datasource": "$Environment_CF", 600 | "editable": true, 601 | "error": false, 602 | "format": "none", 603 | "gauge": { 604 | "maxValue": 100, 605 | "minValue": 0, 606 | "show": false, 607 | "thresholdLabels": true 608 | }, 609 | "id": 2, 610 | "interval": null, 611 | "links": [], 612 | "mappingType": 1, 613 | "mappingTypes": [ 614 | { 615 | "name": "value to text", 616 | "value": 1 617 | }, 618 | { 619 | "name": "range to text", 620 | "value": 2 621 | } 622 | ], 623 | "maxDataPoints": 100, 624 | "nullPointMode": "connected", 625 | "nullText": null, 626 | "postfix": "", 627 | "postfixFontSize": "50%", 628 | "prefix": "", 629 | "prefixFontSize": "50%", 630 | "rangeMaps": [ 631 | { 632 | "from": "null", 633 | "text": "N/A", 634 | "to": "null" 635 | } 636 | ], 637 | "span": 4, 638 | "sparkline": { 639 | "fillColor": "rgba(31, 118, 189, 0.18)", 640 | "full": false, 641 | "lineColor": "rgb(31, 120, 193)", 642 | "show": true 643 | }, 644 | "tableColumn": "", 645 | "targets": [ 646 | { 647 | "dsType": "influxdb", 648 | "groupBy": [ 649 | { 650 | "params": [ 651 | "$interval" 652 | ], 653 | "type": "time" 654 | }, 655 | { 656 | "params": [ 657 | "null" 658 | ], 659 | "type": "fill" 660 | } 661 | ], 662 | "measurement": "firehose.gorouter.rejected_requests", 663 | "policy": 
"default", 664 | "query": "SELECT mean(\"value\") FROM \"firehose.gorouter.rejected_requests\" WHERE $timeFilter GROUP BY time($interval) fill(null)", 665 | "refId": "A", 666 | "resultFormat": "time_series", 667 | "select": [ 668 | [ 669 | { 670 | "params": [ 671 | "value" 672 | ], 673 | "type": "field" 674 | }, 675 | { 676 | "params": [], 677 | "type": "mean" 678 | } 679 | ] 680 | ], 681 | "tags": [] 682 | } 683 | ], 684 | "thresholds": "", 685 | "title": "Rejected Requests", 686 | "type": "singlestat", 687 | "valueFontSize": "80%", 688 | "valueMaps": [ 689 | { 690 | "op": "=", 691 | "text": "N/A", 692 | "value": "null" 693 | } 694 | ], 695 | "valueName": "avg" 696 | }, 697 | { 698 | "cacheTimeout": null, 699 | "colorBackground": false, 700 | "colorValue": false, 701 | "colors": [ 702 | "rgba(245, 54, 54, 0.9)", 703 | "rgba(237, 129, 40, 0.89)", 704 | "rgba(50, 172, 45, 0.97)" 705 | ], 706 | "datasource": "$Environment_CF", 707 | "editable": true, 708 | "error": false, 709 | "format": "none", 710 | "gauge": { 711 | "maxValue": 100, 712 | "minValue": 0, 713 | "show": false, 714 | "thresholdLabels": true 715 | }, 716 | "id": 3, 717 | "interval": null, 718 | "links": [], 719 | "mappingType": 1, 720 | "mappingTypes": [ 721 | { 722 | "name": "value to text", 723 | "value": 1 724 | }, 725 | { 726 | "name": "range to text", 727 | "value": 2 728 | } 729 | ], 730 | "maxDataPoints": 100, 731 | "nullPointMode": "connected", 732 | "nullText": null, 733 | "postfix": "", 734 | "postfixFontSize": "50%", 735 | "prefix": "", 736 | "prefixFontSize": "50%", 737 | "rangeMaps": [ 738 | { 739 | "from": "null", 740 | "text": "N/A", 741 | "to": "null" 742 | } 743 | ], 744 | "span": 4, 745 | "sparkline": { 746 | "fillColor": "rgba(31, 118, 189, 0.18)", 747 | "full": false, 748 | "lineColor": "rgb(31, 120, 193)", 749 | "show": true 750 | }, 751 | "tableColumn": "", 752 | "targets": [ 753 | { 754 | "dsType": "influxdb", 755 | "groupBy": [ 756 | { 757 | "params": [ 758 | "$interval" 759 | ],
760 | "type": "time" 761 | }, 762 | { 763 | "params": [ 764 | "null" 765 | ], 766 | "type": "fill" 767 | } 768 | ], 769 | "hide": false, 770 | "measurement": "firehose.gorouter.bad_gateways", 771 | "policy": "default", 772 | "query": "SELECT mean(\"value\") FROM \"firehose.gorouter.bad_gateways\" WHERE $timeFilter GROUP BY time($interval) fill(null)", 773 | "refId": "A", 774 | "resultFormat": "time_series", 775 | "select": [ 776 | [ 777 | { 778 | "params": [ 779 | "value" 780 | ], 781 | "type": "field" 782 | }, 783 | { 784 | "params": [], 785 | "type": "mean" 786 | } 787 | ] 788 | ], 789 | "tags": [] 790 | } 791 | ], 792 | "thresholds": "", 793 | "title": "Bad Gateways", 794 | "type": "singlestat", 795 | "valueFontSize": "80%", 796 | "valueMaps": [ 797 | { 798 | "op": "=", 799 | "text": "N/A", 800 | "value": "null" 801 | } 802 | ], 803 | "valueName": "avg" 804 | }, 805 | { 806 | "cacheTimeout": null, 807 | "colorBackground": false, 808 | "colorValue": false, 809 | "colors": [ 810 | "rgba(245, 54, 54, 0.9)", 811 | "rgba(237, 129, 40, 0.89)", 812 | "rgba(50, 172, 45, 0.97)" 813 | ], 814 | "datasource": "$Environment_CF", 815 | "editable": true, 816 | "error": false, 817 | "format": "none", 818 | "gauge": { 819 | "maxValue": 100, 820 | "minValue": 0, 821 | "show": false, 822 | "thresholdLabels": true 823 | }, 824 | "id": 4, 825 | "interval": null, 826 | "links": [], 827 | "mappingType": 1, 828 | "mappingTypes": [ 829 | { 830 | "name": "value to text", 831 | "value": 1 832 | }, 833 | { 834 | "name": "range to text", 835 | "value": 2 836 | } 837 | ], 838 | "maxDataPoints": 100, 839 | "nullPointMode": "connected", 840 | "nullText": null, 841 | "postfix": "", 842 | "postfixFontSize": "50%", 843 | "prefix": "", 844 | "prefixFontSize": "50%", 845 | "rangeMaps": [ 846 | { 847 | "from": "null", 848 | "text": "N/A", 849 | "to": "null" 850 | } 851 | ], 852 | "span": 4, 853 | "sparkline": { 854 | "fillColor": "rgba(31, 118, 189, 0.18)", 855 | "full": false, 856 | "lineColor": 
"rgb(31, 120, 193)", 857 | "show": true 858 | }, 859 | "tableColumn": "", 860 | "targets": [ 861 | { 862 | "dsType": "influxdb", 863 | "groupBy": [ 864 | { 865 | "params": [ 866 | "$interval" 867 | ], 868 | "type": "time" 869 | }, 870 | { 871 | "params": [ 872 | "null" 873 | ], 874 | "type": "fill" 875 | } 876 | ], 877 | "measurement": "firehose.gorouter.total_routes", 878 | "policy": "default", 879 | "query": "SELECT mean(\"value\") FROM \"firehose.gorouter.total_routes\" WHERE $timeFilter GROUP BY time($interval) fill(null)", 880 | "refId": "A", 881 | "resultFormat": "time_series", 882 | "select": [ 883 | [ 884 | { 885 | "params": [ 886 | "value" 887 | ], 888 | "type": "field" 889 | }, 890 | { 891 | "params": [], 892 | "type": "mean" 893 | } 894 | ] 895 | ], 896 | "tags": [] 897 | } 898 | ], 899 | "thresholds": "", 900 | "title": "Total Routes", 901 | "type": "singlestat", 902 | "valueFontSize": "80%", 903 | "valueMaps": [ 904 | { 905 | "op": "=", 906 | "text": "N/A", 907 | "value": "null" 908 | } 909 | ], 910 | "valueName": "avg" 911 | } 912 | ], 913 | "repeat": null, 914 | "repeatIteration": null, 915 | "repeatRowId": null, 916 | "showTitle": false, 917 | "title": "New row", 918 | "titleSize": "h6" 919 | } 920 | ], 921 | "schemaVersion": 14, 922 | "style": "dark", 923 | "tags": [], 924 | "templating": { 925 | "list": [ 926 | { 927 | "current": { 928 | "text": "cf_np", 929 | "value": "cf_np" 930 | }, 931 | "hide": 0, 932 | "label": null, 933 | "name": "Environment_CF", 934 | "options": [], 935 | "query": "influxdb", 936 | "refresh": 1, 937 | "regex": "/cf_*/", 938 | "type": "datasource" 939 | } 940 | ] 941 | }, 942 | "time": { 943 | "from": "now-15m", 944 | "to": "now" 945 | }, 946 | "timepicker": { 947 | "refresh_intervals": [ 948 | "5s", 949 | "10s", 950 | "30s", 951 | "1m", 952 | "5m", 953 | "15m", 954 | "30m", 955 | "1h", 956 | "2h", 957 | "1d" 958 | ], 959 | "time_options": [ 960 | "5m", 961 | "15m", 962 | "1h", 963 | "6h", 964 | "12h", 965 | "24h", 966 | 
"2d", 967 | "7d", 968 | "30d" 969 | ] 970 | }, 971 | "timezone": "browser", 972 | "title": "Routing", 973 | "version": 0 974 | } 975 | -------------------------------------------------------------------------------- /grafana/dashboards/import_format/bbs.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | { 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.2.0" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "graph", 13 | "name": "Graph", 14 | "version": "" 15 | }, 16 | { 17 | "type": "panel", 18 | "id": "singlestat", 19 | "name": "Singlestat", 20 | "version": "" 21 | } 22 | ], 23 | "annotations": { 24 | "list": [] 25 | }, 26 | "editable": true, 27 | "gnetId": null, 28 | "graphTooltip": 0, 29 | "hideControls": false, 30 | "id": null, 31 | "links": [], 32 | "refresh": false, 33 | "rows": [ 34 | { 35 | "collapse": false, 36 | "height": "250px", 37 | "panels": [ 38 | { 39 | "aliasColors": {}, 40 | "bars": false, 41 | "datasource": "$Environment", 42 | "editable": true, 43 | "error": false, 44 | "fill": 1, 45 | "grid": {}, 46 | "id": 2, 47 | "legend": { 48 | "alignAsTable": true, 49 | "avg": false, 50 | "current": true, 51 | "max": true, 52 | "min": true, 53 | "show": true, 54 | "total": false, 55 | "values": true 56 | }, 57 | "lines": true, 58 | "linewidth": 1, 59 | "links": [], 60 | "nullPointMode": "connected", 61 | "percentage": false, 62 | "pointradius": 2, 63 | "points": false, 64 | "renderer": "flot", 65 | "seriesOverrides": [], 66 | "span": 12, 67 | "stack": false, 68 | "steppedLine": false, 69 | "targets": [ 70 | { 71 | "dsType": "influxdb", 72 | "groupBy": [ 73 | { 74 | "params": [ 75 | "job" 76 | ], 77 | "type": "tag" 78 | }, 79 | { 80 | "params": [ 81 | "index" 82 | ], 83 | "type": "tag" 84 | } 85 | ], 86 | "hide": false, 87 | "measurement": "firehose.bbs.LRPsRunning", 88 | "policy": "default", 89 | "query": "SELECT \"value\" FROM 
\"firehose.analyzer.NumberOfDesiredApps\" WHERE $timeFilter GROUP BY \"job\", \"index\"", 90 | "rawQuery": false, 91 | "refId": "A", 92 | "resultFormat": "time_series", 93 | "select": [ 94 | [ 95 | { 96 | "params": [ 97 | "value" 98 | ], 99 | "type": "field" 100 | } 101 | ] 102 | ], 103 | "tags": [], 104 | "target": "" 105 | }, 106 | { 107 | "dsType": "influxdb", 108 | "groupBy": [ 109 | { 110 | "params": [ 111 | "job" 112 | ], 113 | "type": "tag" 114 | }, 115 | { 116 | "params": [ 117 | "index" 118 | ], 119 | "type": "tag" 120 | } 121 | ], 122 | "hide": false, 123 | "measurement": "firehose.bbs.LRPsDesired", 124 | "policy": "default", 125 | "query": "SELECT \"value\" FROM \"firehose.analyzer.NumberOfAppsWithAllInstancesReporting\" WHERE $timeFilter GROUP BY \"job\", \"index\"", 126 | "rawQuery": false, 127 | "refId": "B", 128 | "resultFormat": "time_series", 129 | "select": [ 130 | [ 131 | { 132 | "params": [ 133 | "value" 134 | ], 135 | "type": "field" 136 | } 137 | ] 138 | ], 139 | "tags": [], 140 | "target": "" 141 | } 142 | ], 143 | "thresholds": [], 144 | "timeFrom": null, 145 | "timeShift": null, 146 | "title": "LRPs Running vs Desired", 147 | "tooltip": { 148 | "shared": true, 149 | "sort": 0, 150 | "value_type": "cumulative" 151 | }, 152 | "type": "graph", 153 | "xaxis": { 154 | "mode": "time", 155 | "name": null, 156 | "show": true, 157 | "values": [] 158 | }, 159 | "yaxes": [ 160 | { 161 | "format": "short", 162 | "logBase": 1, 163 | "max": null, 164 | "min": null, 165 | "show": true 166 | }, 167 | { 168 | "format": "short", 169 | "logBase": 1, 170 | "max": null, 171 | "min": null, 172 | "show": true 173 | } 174 | ] 175 | } 176 | ], 177 | "repeat": null, 178 | "repeatIteration": null, 179 | "repeatRowId": null, 180 | "showTitle": false, 181 | "title": "New row", 182 | "titleSize": "h6" 183 | }, 184 | { 185 | "collapse": false, 186 | "height": "250px", 187 | "panels": [ 188 | { 189 | "cacheTimeout": null, 190 | "colorBackground": false, 191 |
"colorValue": true, 192 | "colors": [ 193 | "rgba(50, 172, 45, 0.97)", 194 | "rgba(237, 129, 40, 0.89)", 195 | "rgba(245, 54, 54, 0.9)" 196 | ], 197 | "datasource": "$Environment", 198 | "editable": true, 199 | "error": false, 200 | "format": "none", 201 | "gauge": { 202 | "maxValue": 100, 203 | "minValue": 0, 204 | "show": false, 205 | "thresholdLabels": false, 206 | "thresholdMarkers": true 207 | }, 208 | "id": 9, 209 | "interval": null, 210 | "links": [], 211 | "mappingType": 1, 212 | "mappingTypes": [ 213 | { 214 | "name": "value to text", 215 | "value": 1 216 | }, 217 | { 218 | "name": "range to text", 219 | "value": 2 220 | } 221 | ], 222 | "maxDataPoints": 100, 223 | "nullPointMode": "connected", 224 | "nullText": null, 225 | "postfix": "", 226 | "postfixFontSize": "50%", 227 | "prefix": "", 228 | "prefixFontSize": "50%", 229 | "rangeMaps": [ 230 | { 231 | "from": "null", 232 | "text": "N/A", 233 | "to": "null" 234 | } 235 | ], 236 | "span": 4, 237 | "sparkline": { 238 | "fillColor": "rgba(31, 118, 189, 0.18)", 239 | "full": false, 240 | "lineColor": "rgb(31, 120, 193)", 241 | "show": true 242 | }, 243 | "targets": [ 244 | { 245 | "dsType": "influxdb", 246 | "fields": [ 247 | { 248 | "func": "mean", 249 | "name": "value" 250 | } 251 | ], 252 | "groupBy": [ 253 | { 254 | "params": [ 255 | "job" 256 | ], 257 | "type": "tag" 258 | }, 259 | { 260 | "params": [ 261 | "index" 262 | ], 263 | "type": "tag" 264 | } 265 | ], 266 | "groupByTags": [], 267 | "measurement": "firehose.bbs.LRPsUnclaimed", 268 | "policy": "default", 269 | "query": "SELECT \"value\" FROM \"firehose.analyzer.NumberOfDesiredAppsPendingStaging\" WHERE $timeFilter GROUP BY \"job\", \"index\"", 270 | "rawQuery": false, 271 | "refId": "A", 272 | "resultFormat": "time_series", 273 | "select": [ 274 | [ 275 | { 276 | "params": [ 277 | "value" 278 | ], 279 | "type": "field" 280 | } 281 | ] 282 | ], 283 | "tags": [] 284 | } 285 | ], 286 | "thresholds": "1,3", 287 | "title": "Unclaimed LRPs", 288 | 
"type": "singlestat", 289 | "valueFontSize": "80%", 290 | "valueMaps": [ 291 | { 292 | "op": "=", 293 | "text": "N/A", 294 | "value": "null" 295 | } 296 | ], 297 | "valueName": "avg" 298 | }, 299 | { 300 | "cacheTimeout": null, 301 | "colorBackground": false, 302 | "colorValue": true, 303 | "colors": [ 304 | "rgba(50, 172, 45, 0.97)", 305 | "rgba(237, 129, 40, 0.89)", 306 | "rgba(245, 54, 54, 0.9)" 307 | ], 308 | "datasource": "$Environment", 309 | "editable": true, 310 | "error": false, 311 | "format": "none", 312 | "gauge": { 313 | "maxValue": 100, 314 | "minValue": 0, 315 | "show": false, 316 | "thresholdLabels": false, 317 | "thresholdMarkers": true 318 | }, 319 | "id": 10, 320 | "interval": null, 321 | "links": [], 322 | "mappingType": 1, 323 | "mappingTypes": [ 324 | { 325 | "name": "value to text", 326 | "value": 1 327 | }, 328 | { 329 | "name": "range to text", 330 | "value": 2 331 | } 332 | ], 333 | "maxDataPoints": 100, 334 | "nullPointMode": "connected", 335 | "nullText": null, 336 | "postfix": "", 337 | "postfixFontSize": "50%", 338 | "prefix": "", 339 | "prefixFontSize": "50%", 340 | "rangeMaps": [ 341 | { 342 | "from": "null", 343 | "text": "N/A", 344 | "to": "null" 345 | } 346 | ], 347 | "span": 4, 348 | "sparkline": { 349 | "fillColor": "rgba(31, 118, 189, 0.18)", 350 | "full": false, 351 | "lineColor": "rgb(31, 120, 193)", 352 | "show": true 353 | }, 354 | "targets": [ 355 | { 356 | "dsType": "influxdb", 357 | "fields": [ 358 | { 359 | "func": "mean", 360 | "name": "value" 361 | } 362 | ], 363 | "groupBy": [ 364 | { 365 | "params": [ 366 | "$interval" 367 | ], 368 | "type": "time" 369 | }, 370 | { 371 | "params": [ 372 | "null" 373 | ], 374 | "type": "fill" 375 | } 376 | ], 377 | "groupByTags": [], 378 | "measurement": "firehose.bbs.LRPsMissing", 379 | "policy": "default", 380 | "query": "SELECT mean(\"value\") FROM \"firehose.analyzer.NumberOfMissingIndices\" WHERE $timeFilter GROUP BY time($interval) fill(null)", 381 | "rawQuery": false, 382 | 
"refId": "A", 383 | "resultFormat": "time_series", 384 | "select": [ 385 | [ 386 | { 387 | "params": [ 388 | "value" 389 | ], 390 | "type": "field" 391 | }, 392 | { 393 | "params": [], 394 | "type": "mean" 395 | } 396 | ] 397 | ], 398 | "tags": [] 399 | } 400 | ], 401 | "thresholds": "1,3", 402 | "title": "LRPs Missing", 403 | "type": "singlestat", 404 | "valueFontSize": "80%", 405 | "valueMaps": [ 406 | { 407 | "op": "=", 408 | "text": "N/A", 409 | "value": "null" 410 | } 411 | ], 412 | "valueName": "avg" 413 | }, 414 | { 415 | "cacheTimeout": null, 416 | "colorBackground": false, 417 | "colorValue": true, 418 | "colors": [ 419 | "rgba(50, 172, 45, 0.97)", 420 | "rgba(237, 129, 40, 0.89)", 421 | "rgba(245, 54, 54, 0.9)" 422 | ], 423 | "datasource": "$Environment", 424 | "editable": true, 425 | "error": false, 426 | "format": "none", 427 | "gauge": { 428 | "maxValue": 100, 429 | "minValue": 0, 430 | "show": false, 431 | "thresholdLabels": false, 432 | "thresholdMarkers": true 433 | }, 434 | "id": 12, 435 | "interval": null, 436 | "links": [], 437 | "mappingType": 1, 438 | "mappingTypes": [ 439 | { 440 | "name": "value to text", 441 | "value": 1 442 | }, 443 | { 444 | "name": "range to text", 445 | "value": 2 446 | } 447 | ], 448 | "maxDataPoints": 100, 449 | "nullPointMode": "connected", 450 | "nullText": null, 451 | "postfix": "", 452 | "postfixFontSize": "50%", 453 | "prefix": "", 454 | "prefixFontSize": "50%", 455 | "rangeMaps": [ 456 | { 457 | "from": "null", 458 | "text": "N/A", 459 | "to": "null" 460 | } 461 | ], 462 | "span": 4, 463 | "sparkline": { 464 | "fillColor": "rgba(31, 118, 189, 0.18)", 465 | "full": false, 466 | "lineColor": "rgb(31, 120, 193)", 467 | "show": true 468 | }, 469 | "targets": [ 470 | { 471 | "dsType": "influxdb", 472 | "fields": [ 473 | { 474 | "func": "mean", 475 | "name": "value" 476 | } 477 | ], 478 | "groupBy": [ 479 | { 480 | "params": [ 481 | "$interval" 482 | ], 483 | "type": "time" 484 | }, 485 | { 486 | "params": [ 487 |
"null" 488 | ], 489 | "type": "fill" 490 | } 491 | ], 492 | "groupByTags": [], 493 | "measurement": "firehose.bbs.CrashedActualLRPs", 494 | "policy": "default", 495 | "query": "SELECT mean(\"value\") FROM \"firehose.analyzer.NumberOfCrashedInstances\" WHERE $timeFilter GROUP BY time($interval) fill(null)", 496 | "refId": "A", 497 | "resultFormat": "time_series", 498 | "select": [ 499 | [ 500 | { 501 | "params": [ 502 | "value" 503 | ], 504 | "type": "field" 505 | }, 506 | { 507 | "params": [], 508 | "type": "mean" 509 | } 510 | ] 511 | ], 512 | "tags": [] 513 | } 514 | ], 515 | "thresholds": "10,15", 516 | "title": "LRPs Crashing", 517 | "type": "singlestat", 518 | "valueFontSize": "80%", 519 | "valueMaps": [ 520 | { 521 | "op": "=", 522 | "text": "N/A", 523 | "value": "null" 524 | } 525 | ], 526 | "valueName": "avg" 527 | } 528 | ], 529 | "repeat": null, 530 | "repeatIteration": null, 531 | "repeatRowId": null, 532 | "showTitle": false, 533 | "title": "New row", 534 | "titleSize": "h6" 535 | } 536 | ], 537 | "schemaVersion": 14, 538 | "style": "dark", 539 | "tags": [], 540 | "templating": { 541 | "list": [ 542 | { 543 | "current": { 544 | "text": "cf_np", 545 | "value": "cf_np" 546 | }, 547 | "hide": 0, 548 | "label": null, 549 | "name": "Environment", 550 | "options": [], 551 | "query": "influxdb", 552 | "refresh": 1, 553 | "regex": "/cf_*/", 554 | "type": "datasource" 555 | } 556 | ] 557 | }, 558 | "time": { 559 | "from": "now-1h", 560 | "to": "now" 561 | }, 562 | "timepicker": { 563 | "collapse": false, 564 | "enable": true, 565 | "notice": false, 566 | "now": true, 567 | "refresh_intervals": [ 568 | "5s", 569 | "10s", 570 | "30s", 571 | "1m", 572 | "5m", 573 | "15m", 574 | "30m", 575 | "1h", 576 | "2h", 577 | "1d" 578 | ], 579 | "status": "Stable", 580 | "time_options": [ 581 | "5m", 582 | "15m", 583 | "1h", 584 | "6h", 585 | "12h", 586 | "24h", 587 | "2d", 588 | "7d", 589 | "30d" 590 | ], 591 | "type": "timepicker" 592 | }, 593 | "timezone": "browser", 594 |
"title": "BBS Metrics", 595 | "version": 0 596 | } -------------------------------------------------------------------------------- /grafana/dashboards/import_format/cell.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | { 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.4.3" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "graph", 13 | "name": "Graph", 14 | "version": "" 15 | } 16 | ], 17 | "annotations": { 18 | "list": [ 19 | { 20 | "datasource": "$Environment", 21 | "enable": true, 22 | "iconColor": "rgba(255, 96, 96, 1)", 23 | "name": "deployment", 24 | "query": "select value from bosh_deploy where $timeFilter" 25 | } 26 | ] 27 | }, 28 | "editable": true, 29 | "gnetId": null, 30 | "graphTooltip": 0, 31 | "hideControls": false, 32 | "id": null, 33 | "links": [], 34 | "rows": [ 35 | { 36 | "collapse": false, 37 | "height": "250px", 38 | "panels": [ 39 | { 40 | "aliasColors": {}, 41 | "bars": false, 42 | "dashLength": 10, 43 | "dashes": false, 44 | "datasource": "$Environment", 45 | "editable": true, 46 | "error": false, 47 | "fill": 0, 48 | "grid": {}, 49 | "id": 1, 50 | "legend": { 51 | "avg": false, 52 | "current": false, 53 | "max": false, 54 | "min": false, 55 | "show": true, 56 | "total": false, 57 | "values": false 58 | }, 59 | "lines": true, 60 | "linewidth": 2, 61 | "links": [], 62 | "nullPointMode": "connected", 63 | "percentage": false, 64 | "pointradius": 1, 65 | "points": true, 66 | "renderer": "flot", 67 | "seriesOverrides": [], 68 | "spaceLength": 10, 69 | "span": 12, 70 | "stack": false, 71 | "steppedLine": false, 72 | "targets": [ 73 | { 74 | "alias": "$tag_job:$tag_index", 75 | "dsType": "influxdb", 76 | "groupBy": [ 77 | { 78 | "params": [ 79 | "job" 80 | ], 81 | "type": "tag" 82 | }, 83 | { 84 | "params": [ 85 | "index" 86 | ], 87 | "type": "tag" 88 | } 89 | ], 90 | "measurement": "firehose.rep.CapacityRemainingMemory", 91 | 
"orderByTime": "ASC", 92 | "policy": "default", 93 | "refId": "A", 94 | "resultFormat": "time_series", 95 | "select": [ 96 | [ 97 | { 98 | "params": [ 99 | "value" 100 | ], 101 | "type": "field" 102 | } 103 | ] 104 | ], 105 | "tags": [ 106 | { 107 | "key": "job", 108 | "operator": "=~", 109 | "value": "/^$Segment$/" 110 | } 111 | ] 112 | } 113 | ], 114 | "thresholds": [], 115 | "timeFrom": null, 116 | "timeShift": null, 117 | "title": "Cell Remaining Memory", 118 | "tooltip": { 119 | "msResolution": false, 120 | "shared": true, 121 | "sort": 2, 122 | "value_type": "cumulative" 123 | }, 124 | "type": "graph", 125 | "xaxis": { 126 | "buckets": null, 127 | "mode": "time", 128 | "name": null, 129 | "show": true, 130 | "values": [] 131 | }, 132 | "yaxes": [ 133 | { 134 | "format": "mbytes", 135 | "label": null, 136 | "logBase": 1, 137 | "max": null, 138 | "min": null, 139 | "show": true 140 | }, 141 | { 142 | "format": "short", 143 | "label": null, 144 | "logBase": 1, 145 | "max": null, 146 | "min": null, 147 | "show": true 148 | } 149 | ] 150 | } 151 | ], 152 | "repeat": null, 153 | "repeatIteration": null, 154 | "repeatRowId": null, 155 | "showTitle": false, 156 | "title": "Row", 157 | "titleSize": "h6" 158 | }, 159 | { 160 | "collapse": false, 161 | "height": "250px", 162 | "panels": [ 163 | { 164 | "aliasColors": {}, 165 | "bars": false, 166 | "dashLength": 10, 167 | "dashes": false, 168 | "datasource": "$Environment", 169 | "editable": true, 170 | "error": false, 171 | "fill": 0, 172 | "grid": {}, 173 | "id": 2, 174 | "legend": { 175 | "avg": false, 176 | "current": false, 177 | "max": false, 178 | "min": false, 179 | "show": true, 180 | "total": false, 181 | "values": false 182 | }, 183 | "lines": true, 184 | "linewidth": 2, 185 | "links": [], 186 | "nullPointMode": "connected", 187 | "percentage": false, 188 | "pointradius": 1, 189 | "points": true, 190 | "renderer": "flot", 191 | "seriesOverrides": [], 192 | "spaceLength": 10, 193 | "span": 12, 194 | "stack": 
false, 195 | "steppedLine": false, 196 | "targets": [ 197 | { 198 | "alias": "$tag_job:$tag_index", 199 | "dsType": "influxdb", 200 | "groupBy": [ 201 | { 202 | "params": [ 203 | "job" 204 | ], 205 | "type": "tag" 206 | }, 207 | { 208 | "params": [ 209 | "index" 210 | ], 211 | "type": "tag" 212 | } 213 | ], 214 | "measurement": "firehose.rep.CapacityRemainingDisk", 215 | "orderByTime": "ASC", 216 | "policy": "default", 217 | "refId": "A", 218 | "resultFormat": "time_series", 219 | "select": [ 220 | [ 221 | { 222 | "params": [ 223 | "value" 224 | ], 225 | "type": "field" 226 | } 227 | ] 228 | ], 229 | "tags": [ 230 | { 231 | "key": "job", 232 | "operator": "=~", 233 | "value": "/^$Segment$/" 234 | } 235 | ] 236 | } 237 | ], 238 | "thresholds": [], 239 | "timeFrom": null, 240 | "timeShift": null, 241 | "title": "Cell Remaining Disk", 242 | "tooltip": { 243 | "msResolution": false, 244 | "shared": true, 245 | "sort": 2, 246 | "value_type": "cumulative" 247 | }, 248 | "type": "graph", 249 | "xaxis": { 250 | "buckets": null, 251 | "mode": "time", 252 | "name": null, 253 | "show": true, 254 | "values": [] 255 | }, 256 | "yaxes": [ 257 | { 258 | "format": "mbytes", 259 | "label": null, 260 | "logBase": 1, 261 | "max": null, 262 | "min": null, 263 | "show": true 264 | }, 265 | { 266 | "format": "short", 267 | "label": null, 268 | "logBase": 1, 269 | "max": null, 270 | "min": null, 271 | "show": true 272 | } 273 | ] 274 | } 275 | ], 276 | "repeat": null, 277 | "repeatIteration": null, 278 | "repeatRowId": null, 279 | "showTitle": false, 280 | "title": "New row", 281 | "titleSize": "h6" 282 | }, 283 | { 284 | "collapse": false, 285 | "height": "250px", 286 | "panels": [ 287 | { 288 | "aliasColors": {}, 289 | "bars": false, 290 | "dashLength": 10, 291 | "dashes": false, 292 | "datasource": "$Environment", 293 | "editable": true, 294 | "error": false, 295 | "fill": 0, 296 | "grid": {}, 297 | "id": 3, 298 | "legend": { 299 | "avg": false, 300 | "current": false, 301 | "max": 
false, 302 | "min": false, 303 | "show": true, 304 | "total": false, 305 | "values": false 306 | }, 307 | "lines": true, 308 | "linewidth": 2, 309 | "links": [], 310 | "nullPointMode": "connected", 311 | "percentage": false, 312 | "pointradius": 1, 313 | "points": true, 314 | "renderer": "flot", 315 | "seriesOverrides": [], 316 | "spaceLength": 10, 317 | "span": 12, 318 | "stack": false, 319 | "steppedLine": false, 320 | "targets": [ 321 | { 322 | "alias": "$tag_job:$tag_index", 323 | "dsType": "influxdb", 324 | "groupBy": [ 325 | { 326 | "params": [ 327 | "job" 328 | ], 329 | "type": "tag" 330 | }, 331 | { 332 | "params": [ 333 | "index" 334 | ], 335 | "type": "tag" 336 | } 337 | ], 338 | "measurement": "firehose.rep.ContainerCount", 339 | "orderByTime": "ASC", 340 | "policy": "default", 341 | "refId": "A", 342 | "resultFormat": "time_series", 343 | "select": [ 344 | [ 345 | { 346 | "params": [ 347 | "value" 348 | ], 349 | "type": "field" 350 | } 351 | ] 352 | ], 353 | "tags": [ 354 | { 355 | "key": "job", 356 | "operator": "=~", 357 | "value": "/^$Segment$/" 358 | } 359 | ] 360 | } 361 | ], 362 | "thresholds": [], 363 | "timeFrom": null, 364 | "timeShift": null, 365 | "title": "Cell Container Count", 366 | "tooltip": { 367 | "msResolution": true, 368 | "shared": true, 369 | "sort": 2, 370 | "value_type": "cumulative" 371 | }, 372 | "type": "graph", 373 | "xaxis": { 374 | "buckets": null, 375 | "mode": "time", 376 | "name": null, 377 | "show": true, 378 | "values": [] 379 | }, 380 | "yaxes": [ 381 | { 382 | "format": "short", 383 | "label": null, 384 | "logBase": 1, 385 | "max": null, 386 | "min": null, 387 | "show": true 388 | }, 389 | { 390 | "format": "short", 391 | "label": null, 392 | "logBase": 1, 393 | "max": null, 394 | "min": null, 395 | "show": true 396 | } 397 | ] 398 | } 399 | ], 400 | "repeat": null, 401 | "repeatIteration": null, 402 | "repeatRowId": null, 403 | "showTitle": false, 404 | "title": "New row", 405 | "titleSize": "h6" 406 | }, 407 | { 
408 | "collapse": false, 409 | "height": "250px", 410 | "panels": [ 411 | { 412 | "aliasColors": {}, 413 | "bars": false, 414 | "dashLength": 10, 415 | "dashes": false, 416 | "datasource": "$Environment", 417 | "editable": true, 418 | "error": false, 419 | "fill": 0, 420 | "grid": {}, 421 | "id": 4, 422 | "legend": { 423 | "avg": false, 424 | "current": false, 425 | "max": false, 426 | "min": false, 427 | "show": true, 428 | "total": false, 429 | "values": false 430 | }, 431 | "lines": true, 432 | "linewidth": 2, 433 | "links": [], 434 | "nullPointMode": "connected", 435 | "percentage": false, 436 | "pointradius": 1, 437 | "points": true, 438 | "renderer": "flot", 439 | "seriesOverrides": [], 440 | "spaceLength": 10, 441 | "span": 12, 442 | "stack": false, 443 | "steppedLine": false, 444 | "targets": [ 445 | { 446 | "alias": "$tag_job:$tag_index", 447 | "dsType": "influxdb", 448 | "groupBy": [ 449 | { 450 | "params": [ 451 | "job" 452 | ], 453 | "type": "tag" 454 | }, 455 | { 456 | "params": [ 457 | "index" 458 | ], 459 | "type": "tag" 460 | } 461 | ], 462 | "measurement": "firehose.rep.UnhealthyCell", 463 | "orderByTime": "ASC", 464 | "policy": "default", 465 | "refId": "A", 466 | "resultFormat": "time_series", 467 | "select": [ 468 | [ 469 | { 470 | "params": [ 471 | "value" 472 | ], 473 | "type": "field" 474 | } 475 | ] 476 | ], 477 | "tags": [ 478 | { 479 | "key": "job", 480 | "operator": "=~", 481 | "value": "/^$Segment$/" 482 | } 483 | ] 484 | } 485 | ], 486 | "thresholds": [], 487 | "timeFrom": null, 488 | "timeShift": null, 489 | "title": "Cell Unhealthy", 490 | "tooltip": { 491 | "msResolution": false, 492 | "shared": true, 493 | "sort": 0, 494 | "value_type": "cumulative" 495 | }, 496 | "type": "graph", 497 | "xaxis": { 498 | "buckets": null, 499 | "mode": "time", 500 | "name": null, 501 | "show": true, 502 | "values": [] 503 | }, 504 | "yaxes": [ 505 | { 506 | "format": "short", 507 | "label": null, 508 | "logBase": 1, 509 | "max": null, 510 | "min": 
null, 511 | "show": true 512 | }, 513 | { 514 | "format": "short", 515 | "label": null, 516 | "logBase": 1, 517 | "max": null, 518 | "min": null, 519 | "show": true 520 | } 521 | ] 522 | } 523 | ], 524 | "repeat": null, 525 | "repeatIteration": null, 526 | "repeatRowId": null, 527 | "showTitle": false, 528 | "title": "New row", 529 | "titleSize": "h6" 530 | } 531 | ], 532 | "schemaVersion": 14, 533 | "style": "dark", 534 | "tags": [], 535 | "templating": { 536 | "list": [ 537 | { 538 | "current": { 539 | "text": "cf_np", 540 | "value": "cf_np" 541 | }, 542 | "hide": 0, 543 | "label": null, 544 | "name": "Environment", 545 | "options": [], 546 | "query": "influxdb", 547 | "refresh": 1, 548 | "regex": "/cf_*/", 549 | "type": "datasource" 550 | }, 551 | { 552 | "allValue": null, 553 | "current": {}, 554 | "datasource": "$Environment", 555 | "hide": 0, 556 | "includeAll": true, 557 | "label": null, 558 | "multi": true, 559 | "name": "Segment", 560 | "options": [], 561 | "query": "show tag values from \"firehose.rep.CapacityTotalContainers\" with key = \"job\"", 562 | "refresh": 1, 563 | "regex": "", 564 | "sort": 0, 565 | "tagValuesQuery": "show tag values from \"firehose.rep.CapacityTotalContainers\" with key = \"job\" WHERE job =~ /$tag/", 566 | "tags": [ 567 | "cell_z1", 568 | "cell_z2" 569 | ], 570 | "tagsQuery": "show tag values from \"firehose.rep.CapacityTotalContainers\" with key = \"job\"", 571 | "type": "query", 572 | "useTags": true 573 | } 574 | ] 575 | }, 576 | "time": { 577 | "from": "now-1h", 578 | "to": "now" 579 | }, 580 | "timepicker": { 581 | "refresh_intervals": [ 582 | "5s", 583 | "10s", 584 | "30s", 585 | "1m", 586 | "5m", 587 | "15m", 588 | "30m", 589 | "1h", 590 | "2h", 591 | "1d" 592 | ], 593 | "time_options": [ 594 | "5m", 595 | "15m", 596 | "1h", 597 | "6h", 598 | "12h", 599 | "24h", 600 | "2d", 601 | "7d", 602 | "30d" 603 | ] 604 | }, 605 | "timezone": "browser", 606 | "title": "Cell Metrics", 607 | "version": 0 608 | } 609 | 
-------------------------------------------------------------------------------- /grafana/dashboards/import_format/component-health.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | { 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.4.3" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "graph", 13 | "name": "Graph", 14 | "version": "" 15 | } 16 | ], 17 | "annotations": { 18 | "list": [ 19 | { 20 | "datasource": "$Environment", 21 | "enable": true, 22 | "iconColor": "#C0C6BE", 23 | "iconSize": 15, 24 | "lineColor": "rgba(255, 96, 96, 0.592157)", 25 | "name": "deployment", 26 | "query": "select value from bosh_deploy where $timeFilter", 27 | "showLine": true 28 | } 29 | ] 30 | }, 31 | "editable": true, 32 | "gnetId": null, 33 | "graphTooltip": 0, 34 | "hideControls": false, 35 | "id": null, 36 | "links": [], 37 | "rows": [ 38 | { 39 | "collapse": false, 40 | "height": "250px", 41 | "panels": [ 42 | { 43 | "aliasColors": {}, 44 | "bars": false, 45 | "dashLength": 10, 46 | "dashes": false, 47 | "datasource": "$Environment", 48 | "editable": true, 49 | "error": false, 50 | "fill": 0, 51 | "grid": {}, 52 | "id": 1, 53 | "legend": { 54 | "avg": false, 55 | "current": false, 56 | "max": false, 57 | "min": false, 58 | "show": true, 59 | "total": false, 60 | "values": false 61 | }, 62 | "lines": true, 63 | "linewidth": 1, 64 | "links": [], 65 | "nullPointMode": "connected", 66 | "percentage": false, 67 | "pointradius": 2, 68 | "points": true, 69 | "renderer": "flot", 70 | "seriesOverrides": [], 71 | "spaceLength": 10, 72 | "span": 12, 73 | "stack": false, 74 | "steppedLine": false, 75 | "targets": [ 76 | { 77 | "alias": "$tag_job:$tag_index", 78 | "dsType": "influxdb", 79 | "groupBy": [ 80 | { 81 | "params": [ 82 | "job" 83 | ], 84 | "type": "tag" 85 | }, 86 | { 87 | "params": [ 88 | "index" 89 | ], 90 | "type": "tag" 91 | } 92 | ], 93 | "measurement": 
"firehose.bosh-hm-forwarder.system.healthy", 94 | "orderByTime": "ASC", 95 | "policy": "default", 96 | "query": "SELECT \"value\" FROM \"firehose.bosh-hm-forwarder.system.healthy\" WHERE $timeFilter GROUP BY \"job\", \"index\"", 97 | "refId": "A", 98 | "resultFormat": "time_series", 99 | "select": [ 100 | [ 101 | { 102 | "params": [ 103 | "value" 104 | ], 105 | "type": "field" 106 | } 107 | ] 108 | ], 109 | "tags": [] 110 | } 111 | ], 112 | "thresholds": [], 113 | "timeFrom": null, 114 | "timeShift": null, 115 | "title": "Bosh Component Health", 116 | "tooltip": { 117 | "shared": true, 118 | "sort": 0, 119 | "value_type": "cumulative" 120 | }, 121 | "type": "graph", 122 | "xaxis": { 123 | "buckets": null, 124 | "mode": "time", 125 | "name": null, 126 | "show": true, 127 | "values": [] 128 | }, 129 | "yaxes": [ 130 | { 131 | "format": "short", 132 | "logBase": 1, 133 | "max": null, 134 | "min": null, 135 | "show": true 136 | }, 137 | { 138 | "format": "short", 139 | "logBase": 1, 140 | "max": null, 141 | "min": null, 142 | "show": true 143 | } 144 | ] 145 | } 146 | ], 147 | "repeat": null, 148 | "repeatIteration": null, 149 | "repeatRowId": null, 150 | "showTitle": false, 151 | "title": "Row", 152 | "titleSize": "h6" 153 | } 154 | ], 155 | "schemaVersion": 14, 156 | "style": "dark", 157 | "tags": [], 158 | "templating": { 159 | "list": [ 160 | { 161 | "current": { 162 | "text": "cf_np", 163 | "value": "cf_np" 164 | }, 165 | "hide": 0, 166 | "label": null, 167 | "name": "Environment", 168 | "options": [], 169 | "query": "influxdb", 170 | "refresh": 1, 171 | "regex": "/cf_*/", 172 | "type": "datasource" 173 | } 174 | ] 175 | }, 176 | "time": { 177 | "from": "now-1h", 178 | "to": "now" 179 | }, 180 | "timepicker": { 181 | "now": true, 182 | "refresh_intervals": [ 183 | "5s", 184 | "10s", 185 | "30s", 186 | "1m", 187 | "5m", 188 | "15m", 189 | "30m", 190 | "1h", 191 | "2h", 192 | "1d" 193 | ], 194 | "time_options": [ 195 | "5m", 196 | "15m", 197 | "1h", 198 | "6h", 199 
| "12h", 200 | "24h", 201 | "2d", 202 | "7d", 203 | "30d" 204 | ] 205 | }, 206 | "timezone": "browser", 207 | "title": "Component Health", 208 | "version": 0 209 | } 210 | -------------------------------------------------------------------------------- /grafana/dashboards/import_format/users.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | { 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.2.0" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "singlestat", 13 | "name": "Singlestat", 14 | "version": "" 15 | } 16 | ], 17 | "annotations": { 18 | "list": [] 19 | }, 20 | "editable": true, 21 | "gnetId": null, 22 | "graphTooltip": 0, 23 | "hideControls": false, 24 | "id": null, 25 | "links": [], 26 | "rows": [ 27 | { 28 | "collapse": false, 29 | "height": "250px", 30 | "panels": [ 31 | { 32 | "cacheTimeout": null, 33 | "colorBackground": false, 34 | "colorValue": false, 35 | "colors": [ 36 | "rgba(245, 54, 54, 0.9)", 37 | "rgba(237, 129, 40, 0.89)", 38 | "rgba(50, 172, 45, 0.97)" 39 | ], 40 | "datasource": "$Environment", 41 | "format": "none", 42 | "gauge": { 43 | "maxValue": 100, 44 | "minValue": 0, 45 | "show": false, 46 | "thresholdLabels": false, 47 | "thresholdMarkers": true 48 | }, 49 | "id": 1, 50 | "interval": null, 51 | "links": [], 52 | "mappingType": 1, 53 | "mappingTypes": [ 54 | { 55 | "name": "value to text", 56 | "value": 1 57 | }, 58 | { 59 | "name": "range to text", 60 | "value": 2 61 | } 62 | ], 63 | "maxDataPoints": 100, 64 | "nullPointMode": "connected", 65 | "nullText": null, 66 | "postfix": "", 67 | "postfixFontSize": "50%", 68 | "prefix": "", 69 | "prefixFontSize": "50%", 70 | "rangeMaps": [ 71 | { 72 | "from": "null", 73 | "text": "N/A", 74 | "to": "null" 75 | } 76 | ], 77 | "span": 12, 78 | "sparkline": { 79 | "fillColor": "rgba(31, 118, 189, 0.18)", 80 | "full": false, 81 | "lineColor": "rgb(31, 120, 193)", 82 | "show": true 
83 | }, 84 | "targets": [ 85 | { 86 | "dsType": "influxdb", 87 | "groupBy": [], 88 | "measurement": "firehose.cc.total_users", 89 | "policy": "default", 90 | "refId": "A", 91 | "resultFormat": "time_series", 92 | "select": [ 93 | [ 94 | { 95 | "params": [ 96 | "value" 97 | ], 98 | "type": "field" 99 | }, 100 | { 101 | "params": [], 102 | "type": "max" 103 | } 104 | ] 105 | ], 106 | "tags": [] 107 | } 108 | ], 109 | "thresholds": "", 110 | "title": "Total CF Users", 111 | "type": "singlestat", 112 | "valueFontSize": "80%", 113 | "valueMaps": [ 114 | { 115 | "op": "=", 116 | "text": "N/A", 117 | "value": "null" 118 | } 119 | ], 120 | "valueName": "avg" 121 | } 122 | ], 123 | "repeat": null, 124 | "repeatIteration": null, 125 | "repeatRowId": null, 126 | "showTitle": false, 127 | "title": "Row", 128 | "titleSize": "h6" 129 | } 130 | ], 131 | "schemaVersion": 14, 132 | "style": "dark", 133 | "tags": [], 134 | "templating": { 135 | "list": [ 136 | { 137 | "current": { 138 | "text": "cf_np", 139 | "value": "cf_np" 140 | }, 141 | "hide": 0, 142 | "label": null, 143 | "name": "Environment", 144 | "options": [], 145 | "query": "influxdb", 146 | "refresh": 1, 147 | "regex": "/cf_*/", 148 | "type": "datasource" 149 | } 150 | ] 151 | }, 152 | "time": { 153 | "from": "now-1h", 154 | "to": "now" 155 | }, 156 | "timepicker": { 157 | "refresh_intervals": [ 158 | "5s", 159 | "10s", 160 | "30s", 161 | "1m", 162 | "5m", 163 | "15m", 164 | "30m", 165 | "1h", 166 | "2h", 167 | "1d" 168 | ], 169 | "time_options": [ 170 | "5m", 171 | "15m", 172 | "1h", 173 | "6h", 174 | "12h", 175 | "24h", 176 | "2d", 177 | "7d", 178 | "30d" 179 | ] 180 | }, 181 | "timezone": "browser", 182 | "title": "Users", 183 | "version": 0 184 | } 185 | -------------------------------------------------------------------------------- /grafana/dashboards/import_format/vm-level-stats.json: -------------------------------------------------------------------------------- 1 | { 2 | "__inputs": [], 3 | "__requires": [ 4 | 
{ 5 | "type": "grafana", 6 | "id": "grafana", 7 | "name": "Grafana", 8 | "version": "4.4.3" 9 | }, 10 | { 11 | "type": "panel", 12 | "id": "graph", 13 | "name": "Graph", 14 | "version": "" 15 | } 16 | ], 17 | "annotations": { 18 | "list": [ 19 | { 20 | "datasource": "$Environment", 21 | "enable": true, 22 | "iconColor": "#C0C6BE", 23 | "iconSize": 15, 24 | "lineColor": "rgba(243, 46, 7, 0.59)", 25 | "name": "deployment", 26 | "query": "select value from bosh_deploy where $timeFilter", 27 | "showLine": true 28 | } 29 | ] 30 | }, 31 | "editable": true, 32 | "gnetId": null, 33 | "graphTooltip": 0, 34 | "hideControls": false, 35 | "id": null, 36 | "links": [], 37 | "rows": [ 38 | { 39 | "collapse": false, 40 | "height": "250px", 41 | "panels": [ 42 | { 43 | "aliasColors": {}, 44 | "bars": false, 45 | "dashLength": 10, 46 | "dashes": false, 47 | "datasource": "$Environment", 48 | "editable": true, 49 | "error": false, 50 | "fill": 0, 51 | "grid": {}, 52 | "id": 8, 53 | "legend": { 54 | "avg": false, 55 | "current": false, 56 | "max": false, 57 | "min": false, 58 | "show": true, 59 | "total": false, 60 | "values": false 61 | }, 62 | "lines": true, 63 | "linewidth": 2, 64 | "links": [], 65 | "nullPointMode": "connected", 66 | "percentage": false, 67 | "pointradius": 5, 68 | "points": false, 69 | "renderer": "flot", 70 | "seriesOverrides": [], 71 | "spaceLength": 10, 72 | "span": 12, 73 | "stack": false, 74 | "steppedLine": false, 75 | "targets": [ 76 | { 77 | "alias": "$tag_job:$tag_index", 78 | "dsType": "influxdb", 79 | "groupBy": [ 80 | { 81 | "params": [ 82 | "$interval" 83 | ], 84 | "type": "time" 85 | }, 86 | { 87 | "params": [ 88 | "job" 89 | ], 90 | "type": "tag" 91 | }, 92 | { 93 | "params": [ 94 | "index" 95 | ], 96 | "type": "tag" 97 | } 98 | ], 99 | "hide": false, 100 | "measurement": "firehose.bosh-hm-forwarder.system.cpu.user", 101 | "orderByTime": "ASC", 102 | "policy": "default", 103 | "query": "SELECT \"cpu_sys\" + \"cpu_user\" FROM 
\"firehose.bosh-hm-forwarder.system.cpu\" GROUP BY \"job\", \"index\"", 104 | "rawQuery": true, 105 | "refId": "A", 106 | "resultFormat": "time_series", 107 | "select": [ 108 | [ 109 | { 110 | "params": [ 111 | "value" 112 | ], 113 | "type": "field" 114 | }, 115 | { 116 | "params": [], 117 | "type": "mean" 118 | } 119 | ] 120 | ], 121 | "tags": [] 122 | } 123 | ], 124 | "thresholds": [], 125 | "timeFrom": null, 126 | "timeShift": null, 127 | "title": "CPU Percentage", 128 | "tooltip": { 129 | "msResolution": true, 130 | "shared": true, 131 | "sort": 2, 132 | "value_type": "cumulative" 133 | }, 134 | "type": "graph", 135 | "xaxis": { 136 | "buckets": null, 137 | "mode": "time", 138 | "name": null, 139 | "show": true, 140 | "values": [] 141 | }, 142 | "yaxes": [ 143 | { 144 | "format": "percent", 145 | "label": "", 146 | "logBase": 1, 147 | "max": null, 148 | "min": null, 149 | "show": true 150 | }, 151 | { 152 | "format": "short", 153 | "label": null, 154 | "logBase": 1, 155 | "max": null, 156 | "min": null, 157 | "show": true 158 | } 159 | ] 160 | }, 161 | { 162 | "aliasColors": {}, 163 | "bars": false, 164 | "dashLength": 10, 165 | "dashes": false, 166 | "datasource": "$Environment", 167 | "editable": true, 168 | "error": false, 169 | "fill": 0, 170 | "grid": {}, 171 | "id": 2, 172 | "legend": { 173 | "avg": false, 174 | "current": false, 175 | "max": false, 176 | "min": false, 177 | "show": true, 178 | "total": false, 179 | "values": false 180 | }, 181 | "lines": true, 182 | "linewidth": 2, 183 | "links": [], 184 | "nullPointMode": "connected", 185 | "percentage": false, 186 | "pointradius": 5, 187 | "points": false, 188 | "renderer": "flot", 189 | "seriesOverrides": [], 190 | "spaceLength": 10, 191 | "span": 12, 192 | "stack": false, 193 | "steppedLine": false, 194 | "targets": [ 195 | { 196 | "alias": "$tag_job:$tag_index", 197 | "dsType": "influxdb", 198 | "groupBy": [ 199 | { 200 | "params": [ 201 | "$interval" 202 | ], 203 | "type": "time" 204 | }, 205 | { 
206 | "params": [ 207 | "job" 208 | ], 209 | "type": "tag" 210 | }, 211 | { 212 | "params": [ 213 | "index" 214 | ], 215 | "type": "tag" 216 | } 217 | ], 218 | "measurement": "firehose.bosh-hm-forwarder.system.cpu.wait", 219 | "policy": "default", 220 | "refId": "A", 221 | "resultFormat": "time_series", 222 | "select": [ 223 | [ 224 | { 225 | "params": [ 226 | "value" 227 | ], 228 | "type": "field" 229 | }, 230 | { 231 | "params": [], 232 | "type": "mean" 233 | } 234 | ] 235 | ], 236 | "tags": [] 237 | } 238 | ], 239 | "thresholds": [], 240 | "timeFrom": null, 241 | "timeShift": null, 242 | "title": "CPU Wait Percent", 243 | "tooltip": { 244 | "msResolution": false, 245 | "shared": true, 246 | "sort": 2, 247 | "value_type": "cumulative" 248 | }, 249 | "type": "graph", 250 | "xaxis": { 251 | "buckets": null, 252 | "mode": "time", 253 | "name": null, 254 | "show": true, 255 | "values": [] 256 | }, 257 | "yaxes": [ 258 | { 259 | "format": "percent", 260 | "logBase": 1, 261 | "max": null, 262 | "min": null, 263 | "show": true 264 | }, 265 | { 266 | "format": "short", 267 | "logBase": 1, 268 | "max": null, 269 | "min": null, 270 | "show": true 271 | } 272 | ] 273 | }, 274 | { 275 | "aliasColors": {}, 276 | "bars": false, 277 | "dashLength": 10, 278 | "dashes": false, 279 | "datasource": "$Environment", 280 | "editable": true, 281 | "error": false, 282 | "fill": 0, 283 | "grid": {}, 284 | "id": 5, 285 | "legend": { 286 | "avg": false, 287 | "current": false, 288 | "max": false, 289 | "min": false, 290 | "show": true, 291 | "total": false, 292 | "values": false 293 | }, 294 | "lines": true, 295 | "linewidth": 2, 296 | "links": [], 297 | "nullPointMode": "connected", 298 | "percentage": false, 299 | "pointradius": 5, 300 | "points": false, 301 | "renderer": "flot", 302 | "seriesOverrides": [], 303 | "spaceLength": 10, 304 | "span": 12, 305 | "stack": false, 306 | "steppedLine": false, 307 | "targets": [ 308 | { 309 | "alias": "$tag_job:$tag_index", 310 | "dsType": 
"influxdb", 311 | "groupBy": [ 312 | { 313 | "params": [ 314 | "$interval" 315 | ], 316 | "type": "time" 317 | }, 318 | { 319 | "params": [ 320 | "job" 321 | ], 322 | "type": "tag" 323 | }, 324 | { 325 | "params": [ 326 | "index" 327 | ], 328 | "type": "tag" 329 | } 330 | ], 331 | "hide": false, 332 | "measurement": "firehose.bosh-hm-forwarder.system.load.1m", 333 | "policy": "default", 334 | "query": "SELECT value FROM \"firehose.bosh-hm-forwarder.system.load.1m\" WHERE $timeFilter GROUP BY job,index time($interval) ORDER BY asc", 335 | "rawQuery": false, 336 | "refId": "A", 337 | "resultFormat": "time_series", 338 | "select": [ 339 | [ 340 | { 341 | "params": [ 342 | "value" 343 | ], 344 | "type": "field" 345 | }, 346 | { 347 | "params": [], 348 | "type": "mean" 349 | } 350 | ] 351 | ], 352 | "tags": {}, 353 | "target": "" 354 | } 355 | ], 356 | "thresholds": [], 357 | "timeFrom": null, 358 | "timeShift": null, 359 | "title": "CPU Load", 360 | "tooltip": { 361 | "msResolution": false, 362 | "shared": true, 363 | "sort": 2, 364 | "value_type": "cumulative" 365 | }, 366 | "type": "graph", 367 | "xaxis": { 368 | "buckets": null, 369 | "mode": "time", 370 | "name": null, 371 | "show": true, 372 | "values": [] 373 | }, 374 | "yaxes": [ 375 | { 376 | "format": "short", 377 | "logBase": 1, 378 | "max": null, 379 | "min": null, 380 | "show": true 381 | }, 382 | { 383 | "format": "short", 384 | "logBase": 1, 385 | "max": null, 386 | "min": null, 387 | "show": true 388 | } 389 | ] 390 | }, 391 | { 392 | "aliasColors": {}, 393 | "bars": false, 394 | "dashLength": 10, 395 | "dashes": false, 396 | "datasource": "$Environment", 397 | "editable": true, 398 | "error": false, 399 | "fill": 0, 400 | "grid": {}, 401 | "id": 4, 402 | "legend": { 403 | "avg": false, 404 | "current": false, 405 | "max": false, 406 | "min": false, 407 | "show": true, 408 | "total": false, 409 | "values": false 410 | }, 411 | "lines": true, 412 | "linewidth": 2, 413 | "links": [], 414 | "nullPointMode": 
"connected", 415 | "percentage": false, 416 | "pointradius": 5, 417 | "points": false, 418 | "renderer": "flot", 419 | "seriesOverrides": [], 420 | "spaceLength": 10, 421 | "span": 12, 422 | "stack": false, 423 | "steppedLine": false, 424 | "targets": [ 425 | { 426 | "alias": "$tag_job:$tag_index", 427 | "dsType": "influxdb", 428 | "groupBy": [ 429 | { 430 | "params": [ 431 | "$interval" 432 | ], 433 | "type": "time" 434 | }, 435 | { 436 | "params": [ 437 | "job" 438 | ], 439 | "type": "tag" 440 | }, 441 | { 442 | "params": [ 443 | "index" 444 | ], 445 | "type": "tag" 446 | } 447 | ], 448 | "hide": false, 449 | "measurement": "firehose.bosh-hm-forwarder.system.disk.ephemeral.percent", 450 | "policy": "default", 451 | "query": "SELECT value FROM \"firehose.bosh-hm-forwarder.system.disk.ephemeral.percent\" WHERE $timeFilter GROUP BY job,index time($interval) ORDER BY asc", 452 | "rawQuery": false, 453 | "refId": "A", 454 | "resultFormat": "time_series", 455 | "select": [ 456 | [ 457 | { 458 | "params": [ 459 | "value" 460 | ], 461 | "type": "field" 462 | }, 463 | { 464 | "params": [], 465 | "type": "mean" 466 | } 467 | ] 468 | ], 469 | "tags": {}, 470 | "target": "" 471 | } 472 | ], 473 | "thresholds": [], 474 | "timeFrom": null, 475 | "timeShift": null, 476 | "title": "Ephemeral Disk Percent Used", 477 | "tooltip": { 478 | "msResolution": false, 479 | "shared": true, 480 | "sort": 2, 481 | "value_type": "cumulative" 482 | }, 483 | "type": "graph", 484 | "xaxis": { 485 | "buckets": null, 486 | "mode": "time", 487 | "name": null, 488 | "show": true, 489 | "values": [] 490 | }, 491 | "yaxes": [ 492 | { 493 | "format": "percent", 494 | "logBase": 1, 495 | "max": null, 496 | "min": null, 497 | "show": true 498 | }, 499 | { 500 | "format": "short", 501 | "logBase": 1, 502 | "max": null, 503 | "min": null, 504 | "show": true 505 | } 506 | ] 507 | }, 508 | { 509 | "aliasColors": {}, 510 | "bars": false, 511 | "dashLength": 10, 512 | "dashes": false, 513 | "datasource": 
"$Environment", 514 | "editable": true, 515 | "error": false, 516 | "fill": 0, 517 | "grid": {}, 518 | "id": 3, 519 | "legend": { 520 | "avg": false, 521 | "current": false, 522 | "max": false, 523 | "min": false, 524 | "show": true, 525 | "total": false, 526 | "values": false 527 | }, 528 | "lines": true, 529 | "linewidth": 2, 530 | "links": [], 531 | "nullPointMode": "connected", 532 | "percentage": false, 533 | "pointradius": 5, 534 | "points": false, 535 | "renderer": "flot", 536 | "seriesOverrides": [], 537 | "spaceLength": 10, 538 | "span": 12, 539 | "stack": false, 540 | "steppedLine": false, 541 | "targets": [ 542 | { 543 | "alias": "$tag_job:$tag_index", 544 | "dsType": "influxdb", 545 | "groupBy": [ 546 | { 547 | "params": [ 548 | "$interval" 549 | ], 550 | "type": "time" 551 | }, 552 | { 553 | "params": [ 554 | "job" 555 | ], 556 | "type": "tag" 557 | }, 558 | { 559 | "params": [ 560 | "index" 561 | ], 562 | "type": "tag" 563 | } 564 | ], 565 | "measurement": "firehose.bosh-hm-forwarder.system.disk.persistent.percent", 566 | "policy": "default", 567 | "refId": "A", 568 | "resultFormat": "time_series", 569 | "select": [ 570 | [ 571 | { 572 | "params": [ 573 | "value" 574 | ], 575 | "type": "field" 576 | }, 577 | { 578 | "params": [], 579 | "type": "mean" 580 | } 581 | ] 582 | ], 583 | "tags": [] 584 | } 585 | ], 586 | "thresholds": [], 587 | "timeFrom": null, 588 | "timeShift": null, 589 | "title": "Persistent Disk Percent Used", 590 | "tooltip": { 591 | "msResolution": false, 592 | "shared": true, 593 | "sort": 2, 594 | "value_type": "cumulative" 595 | }, 596 | "type": "graph", 597 | "xaxis": { 598 | "buckets": null, 599 | "mode": "time", 600 | "name": null, 601 | "show": true, 602 | "values": [] 603 | }, 604 | "yaxes": [ 605 | { 606 | "format": "percent", 607 | "logBase": 1, 608 | "max": null, 609 | "min": null, 610 | "show": true 611 | }, 612 | { 613 | "format": "short", 614 | "logBase": 1, 615 | "max": null, 616 | "min": null, 617 | "show": true 618 | 
} 619 | ] 620 | }, 621 | { 622 | "aliasColors": {}, 623 | "bars": false, 624 | "dashLength": 10, 625 | "dashes": false, 626 | "datasource": "$Environment", 627 | "editable": true, 628 | "error": false, 629 | "fill": 0, 630 | "grid": {}, 631 | "id": 6, 632 | "legend": { 633 | "avg": false, 634 | "current": false, 635 | "max": false, 636 | "min": false, 637 | "show": true, 638 | "total": false, 639 | "values": false 640 | }, 641 | "lines": true, 642 | "linewidth": 2, 643 | "links": [], 644 | "nullPointMode": "connected", 645 | "percentage": false, 646 | "pointradius": 5, 647 | "points": false, 648 | "renderer": "flot", 649 | "seriesOverrides": [], 650 | "spaceLength": 10, 651 | "span": 12, 652 | "stack": false, 653 | "steppedLine": false, 654 | "targets": [ 655 | { 656 | "alias": "$tag_job:$tag_index", 657 | "dsType": "influxdb", 658 | "groupBy": [ 659 | { 660 | "params": [ 661 | "$interval" 662 | ], 663 | "type": "time" 664 | }, 665 | { 666 | "params": [ 667 | "job" 668 | ], 669 | "type": "tag" 670 | }, 671 | { 672 | "params": [ 673 | "index" 674 | ], 675 | "type": "tag" 676 | } 677 | ], 678 | "hide": false, 679 | "measurement": "firehose.bosh-hm-forwarder.system.swap.percent", 680 | "policy": "default", 681 | "query": "SELECT mean(\"value\") FROM \"firehose.bosh-hm-forwarder.system.swap.percent\" WHERE $timeFilter GROUP BY time($interval), \"job\"", 682 | "rawQuery": false, 683 | "refId": "A", 684 | "resultFormat": "time_series", 685 | "select": [ 686 | [ 687 | { 688 | "params": [ 689 | "value" 690 | ], 691 | "type": "field" 692 | }, 693 | { 694 | "params": [], 695 | "type": "mean" 696 | } 697 | ] 698 | ], 699 | "tags": {}, 700 | "target": "" 701 | } 702 | ], 703 | "thresholds": [], 704 | "timeFrom": null, 705 | "timeShift": null, 706 | "title": "Swap Percentage", 707 | "tooltip": { 708 | "msResolution": false, 709 | "shared": true, 710 | "sort": 2, 711 | "value_type": "cumulative" 712 | }, 713 | "type": "graph", 714 | "xaxis": { 715 | "buckets": null, 716 | 
"mode": "time", 717 | "name": null, 718 | "show": true, 719 | "values": [] 720 | }, 721 | "yaxes": [ 722 | { 723 | "format": "percent", 724 | "logBase": 1, 725 | "max": null, 726 | "min": null, 727 | "show": true 728 | }, 729 | { 730 | "format": "short", 731 | "logBase": 1, 732 | "max": null, 733 | "min": null, 734 | "show": true 735 | } 736 | ] 737 | }, 738 | { 739 | "aliasColors": {}, 740 | "bars": false, 741 | "dashLength": 10, 742 | "dashes": false, 743 | "datasource": "$Environment", 744 | "editable": true, 745 | "error": false, 746 | "fill": 0, 747 | "grid": {}, 748 | "id": 7, 749 | "legend": { 750 | "avg": false, 751 | "current": false, 752 | "max": false, 753 | "min": false, 754 | "show": true, 755 | "total": false, 756 | "values": false 757 | }, 758 | "lines": true, 759 | "linewidth": 2, 760 | "links": [], 761 | "nullPointMode": "connected", 762 | "percentage": false, 763 | "pointradius": 5, 764 | "points": false, 765 | "renderer": "flot", 766 | "seriesOverrides": [], 767 | "spaceLength": 10, 768 | "span": 12, 769 | "stack": false, 770 | "steppedLine": false, 771 | "targets": [ 772 | { 773 | "alias": "$tag_job:$tag_index", 774 | "dsType": "influxdb", 775 | "groupBy": [ 776 | { 777 | "params": [ 778 | "$interval" 779 | ], 780 | "type": "time" 781 | }, 782 | { 783 | "params": [ 784 | "job" 785 | ], 786 | "type": "tag" 787 | }, 788 | { 789 | "params": [ 790 | "index" 791 | ], 792 | "type": "tag" 793 | } 794 | ], 795 | "hide": false, 796 | "measurement": "firehose.bosh-hm-forwarder.system.mem.percent", 797 | "policy": "default", 798 | "query": "SELECT value FROM \"firehose.bosh-hm-forwarder.system.mem.percent\" WHERE $timeFilter GROUP BY job,index time($interval) ORDER BY asc", 799 | "rawQuery": false, 800 | "refId": "A", 801 | "resultFormat": "time_series", 802 | "select": [ 803 | [ 804 | { 805 | "params": [ 806 | "value" 807 | ], 808 | "type": "field" 809 | }, 810 | { 811 | "params": [], 812 | "type": "mean" 813 | } 814 | ] 815 | ], 816 | "tags": {}, 817 | 
"target": "" 818 | } 819 | ], 820 | "thresholds": [], 821 | "timeFrom": null, 822 | "timeShift": null, 823 | "title": "Memory Used Percentage", 824 | "tooltip": { 825 | "msResolution": false, 826 | "shared": true, 827 | "sort": 2, 828 | "value_type": "cumulative" 829 | }, 830 | "type": "graph", 831 | "xaxis": { 832 | "buckets": null, 833 | "mode": "time", 834 | "name": null, 835 | "show": true, 836 | "values": [] 837 | }, 838 | "yaxes": [ 839 | { 840 | "format": "percent", 841 | "logBase": 1, 842 | "max": null, 843 | "min": null, 844 | "show": true 845 | }, 846 | { 847 | "format": "short", 848 | "logBase": 1, 849 | "max": null, 850 | "min": null, 851 | "show": true 852 | } 853 | ] 854 | } 855 | ], 856 | "repeat": null, 857 | "repeatIteration": null, 858 | "repeatRowId": null, 859 | "showTitle": false, 860 | "title": "Row", 861 | "titleSize": "h6" 862 | } 863 | ], 864 | "schemaVersion": 14, 865 | "style": "dark", 866 | "tags": [], 867 | "templating": { 868 | "list": [ 869 | { 870 | "current": { 871 | "text": "cf_np", 872 | "value": "cf_np" 873 | }, 874 | "hide": 0, 875 | "label": null, 876 | "name": "Environment", 877 | "options": [], 878 | "query": "influxdb", 879 | "refresh": 1, 880 | "regex": "/cf_*/", 881 | "type": "datasource" 882 | } 883 | ] 884 | }, 885 | "time": { 886 | "from": "now-1h", 887 | "to": "now" 888 | }, 889 | "timepicker": { 890 | "collapse": false, 891 | "enable": true, 892 | "notice": false, 893 | "now": true, 894 | "refresh_intervals": [ 895 | "5s", 896 | "10s", 897 | "30s", 898 | "1m", 899 | "5m", 900 | "15m", 901 | "30m", 902 | "1h", 903 | "2h", 904 | "1d" 905 | ], 906 | "status": "Stable", 907 | "time_options": [ 908 | "5m", 909 | "15m", 910 | "1h", 911 | "6h", 912 | "12h", 913 | "24h", 914 | "2d", 915 | "7d", 916 | "30d" 917 | ], 918 | "type": "timepicker" 919 | }, 920 | "timezone": "browser", 921 | "title": "VM Level Stats", 922 | "version": 0 923 | } 924 | -------------------------------------------------------------------------------- 
/grafana/grafana.ini: -------------------------------------------------------------------------------- 1 | ##################### Grafana Configuration Example ##################### 2 | # 3 | # Everything has defaults so you only need to uncomment things you want to 4 | # change 5 | 6 | ; app_mode = production 7 | 8 | #################################### Paths #################################### 9 | [paths] 10 | # Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used) 11 | # 12 | ;data = /var/lib/grafana 13 | # 14 | # Directory where grafana can store logs 15 | # 16 | ;logs = /var/log/grafana 17 | 18 | #################################### Server #################################### 19 | [server] 20 | # Protocol (http or https) 21 | ;protocol = http 22 | 23 | # The IP address to bind to; empty binds to all interfaces 24 | ;http_addr = 25 | 26 | # The http port to use 27 | ;http_port = 3000 28 | 29 | # The public facing domain name used to access grafana from a browser 30 | 31 | # The full public facing url 32 | ;root_url = %(protocol)s://%(domain)s:%(http_port)s/ 33 | 34 | # Log web requests 35 | ;router_logging = false 36 | 37 | # Path to static files, relative to the working directory 38 | ;static_root_path = public 39 | 40 | # enable gzip 41 | ;enable_gzip = false 42 | 43 | # https certs & key file 44 | ;cert_file = 45 | ;cert_key = 46 | 47 | #################################### Database #################################### 48 | [database] 49 | # Either "mysql", "postgres" or "sqlite3", it's your choice 50 | ;type = sqlite3 51 | ;host = 127.0.0.1:3306 52 | ;name = grafana 53 | ;user = root 54 | ;password = 55 | 56 | # For "postgres" only, either "disable", "require" or "verify-full" 57 | ;ssl_mode = disable 58 | 59 | # For "sqlite3" only, path relative to data_path setting 60 | ;path = grafana.db 61 | 62 | #################################### Session #################################### 63 | [session] 64 | # Either "memory", "file", "redis",
"mysql", default is "memory" 65 | ;provider = file 66 | 67 | # Provider config options 68 | # memory: not have any config yet 69 | # file: session dir path, is relative to grafana data_path 70 | # redis: config like redis server addr, poolSize, password, e.g. `127.0.0.1:6379,100,grafana` 71 | # mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1)/database_name` 72 | ;provider_config = sessions 73 | 74 | # Session cookie name 75 | ;cookie_name = grafana_sess 76 | 77 | # If you use session in https only, default is false 78 | ;cookie_secure = false 79 | 80 | # Session life time, default is 86400 81 | ;session_life_time = 86400 82 | 83 | #################################### Analytics #################################### 84 | [analytics] 85 | # Server reporting, sends usage counters to stats.grafana.org every 24 hours. 86 | # No ip addresses are being tracked, only simple counters to track 87 | # running instances, dashboard and error counts. It is very helpful to us. 88 | # Change this option to false to disable reporting. 
89 | reporting_enabled = false 90 | 91 | # Google Analytics universal tracking code, only enabled if you specify an id here 92 | ;google_analytics_ua_id = 93 | 94 | #################################### Security #################################### 95 | [security] 96 | # default admin user, created on startup 97 | ;admin_user = admin 98 | 99 | # default admin password, can be changed before first start of grafana, or in profile settings 100 | ;admin_password = admin 101 | 102 | # used for signing 103 | ;secret_key = some_secret 104 | 105 | # Auto-login remember days 106 | ;login_remember_days = 7 107 | ;cookie_username = grafana_user 108 | ;cookie_remember_name = grafana_remember 109 | 110 | #################################### Users #################################### 111 | [users] 112 | # disable user signup / registration 113 | allow_sign_up = false 114 | 115 | # Allow non admin users to create organizations 116 | allow_org_create = false 117 | 118 | # Set to true to automatically assign new users to the default organization (id 1) 119 | auto_assign_org = true 120 | 121 | # Default role new users will be automatically assigned (if auto_assign_org above is set to true) 122 | auto_assign_org_role = Viewer 123 | 124 | #################################### Anonymous Auth ########################## 125 | [auth.anonymous] 126 | # enable anonymous access 127 | enabled = true 128 | 129 | # specify organization name that should be used for unauthenticated users 130 | org_name = Main Org.
131 | 132 | # specify role for unauthenticated users 133 | org_role = Viewer 134 | 135 | #################################### Github Auth ########################## 136 | [auth.github] 137 | enabled = true 138 | #client_id = xxxxxxxxxxx 139 | #client_secret = xxxxxxx 140 | #scopes = user:email 141 | #auth_url = https://github.com/login/oauth/authorize 142 | #token_url = https://github.com/login/oauth/access_token 143 | #api_url = https://api.github.com/user 144 | 145 | # Uncomment below to only allow specific email domains 146 | ; allowed_domains = mycompany.com othercompany.com 147 | allow_sign_up = true 148 | 149 | #################################### Google Auth ########################## 150 | [auth.google] 151 | ;enabled = false 152 | ;client_id = some_client_id 153 | ;client_secret = some_client_secret 154 | ;scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email 155 | ;auth_url = https://accounts.google.com/o/oauth2/auth 156 | ;token_url = https://accounts.google.com/o/oauth2/token 157 | ;api_url = https://www.googleapis.com/oauth2/v1/userinfo 158 | # Uncomment below to only allow specific email domains 159 | ; allowed_domains = mycompany.com othercompany.com 160 | 161 | #################################### Logging ########################## 162 | [log] 163 | # Either "console", "file", default is "console" 164 | # Use comma to separate multiple modes, e.g. "console, file" 165 | ;mode = console, file 166 | 167 | # Buffer length of channel, keep it as it is if you don't know what it is.
168 | ;buffer_len = 10000 169 | 170 | # Either "Trace", "Debug", "Info", "Warn", "Error", "Critical", default is "Trace" 171 | ;level = Info 172 | 173 | # For "console" mode only 174 | [log.console] 175 | ;level = 176 | 177 | # For "file" mode only 178 | [log.file] 179 | ;level = 180 | # Enables automated log rotation (switch for the following options), default is true 181 | ;log_rotate = true 182 | 183 | # Max line number of single file, default is 1000000 184 | ;max_lines = 1000000 185 | 186 | # Max size shift of single file, default is 28 means 1 << 28, 256MB 187 | ;max_size_shift = 28 188 | 189 | # Segment log daily, default is true 190 | ;daily_rotate = true 191 | 192 | # Days to keep log files (deleted after max days), default is 7 193 | ;max_days = 7 194 | 195 | #################################### AMQP Event Publisher ########################## 196 | [event_publisher] 197 | ;enabled = false 198 | ;rabbitmq_url = amqp://localhost/ 199 | ;exchange = grafana_events 200 | 201 | -------------------------------------------------------------------------------- /grafana/load.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | UP=0 4 | 5 | while [ $UP -ne 1 ] 6 | do 7 | curl localhost:3000 > /dev/null 2>&1 8 | if [ $? -eq 0 ] 9 | then 10 | echo "localhost was up" 11 | UP=1 12 | fi 13 | echo "waiting for host to come up ......"
14 | sleep 2 15 | done 16 | 17 | 18 | curl --user admin:admin 'http://localhost:3000/api/datasources' -X POST -H 'Content-Type: application/json;charset=UTF-8' --data-binary '{"name":"influxdb","type":"influxdb","url":"http://influxdb:8086","access":"proxy","isDefault":false,"database":"_internal","user":"root","password":"root"}' 19 | curl --user admin:admin 'http://localhost:3000/api/datasources' -X POST -H 'Content-Type: application/json;charset=UTF-8' --data-binary '{"name":"cf_np","type":"influxdb","url":"http://influxdb:8086","access":"proxy","isDefault":false,"database":"cf_np","user":"root","password":"root"}' 20 | -------------------------------------------------------------------------------- /grafana/supervisord.conf: -------------------------------------------------------------------------------- 1 | [supervisord] 2 | nodaemon=true 3 | logfile=/dev/null 4 | 5 | [program:grafana] 6 | command=/usr/sbin/grafana-server --config /etc/grafana/grafana.ini 7 | autostart=true 8 | autorestart=true 9 | redirect_stderr=true 10 | stdout_logfile=/dev/stdout 11 | stdout_logfile_maxbytes=0 12 | 13 | [program:datasourceload] 14 | command=/bin/bash /etc/grafana/load.sh 15 | startsecs=0 16 | autostart=true 17 | autorestart=false 18 | redirect_stderr=true 19 | stdout_logfile_maxbytes=0 20 | stderr_logfile=/dev/stdout 21 | stderr_logfile_maxbytes=0 22 | -------------------------------------------------------------------------------- /images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bayer-Group/cf-metrics/1c6ca18746cd9a04054f9ad06e81d6c9cbea3fe0/images/architecture.png -------------------------------------------------------------------------------- /images/bosh_stats.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bayer-Group/cf-metrics/1c6ca18746cd9a04054f9ad06e81d6c9cbea3fe0/images/bosh_stats.png
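The readiness wait and datasource registration in `grafana/load.sh` can be sketched more defensively in POSIX sh. This is a minimal sketch, not the project's script: the helper names `wait_for`, `datasource_json` and `register_datasource` are hypothetical, while the endpoint (`/api/datasources`), the `admin:admin` credentials and the payload fields are the ones load.sh already uses. Unlike the original loop, it gives up after a fixed number of attempts instead of polling forever.

```shell
#!/bin/sh
# Hypothetical restructuring of grafana/load.sh; helper names are illustrative.

# Retry a command every 2 seconds, up to MAX_TRIES attempts.
# Usage: wait_for MAX_TRIES command [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
  max_tries=$1
  shift
  tries=0
  until "$@"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      return 1
    fi
    echo "waiting for host to come up ......"
    sleep 2
  done
  return 0
}

# Print the JSON body that load.sh POSTs for an InfluxDB datasource.
# Usage: datasource_json NAME DATABASE
datasource_json() {
  printf '{"name":"%s","type":"influxdb","url":"http://influxdb:8086","access":"proxy","isDefault":false,"database":"%s","user":"root","password":"root"}\n' "$1" "$2"
}

# Register one datasource through Grafana's HTTP API (same call as load.sh).
# Usage: register_datasource NAME DATABASE
register_datasource() {
  curl --user admin:admin 'http://localhost:3000/api/datasources' \
    -X POST -H 'Content-Type: application/json;charset=UTF-8' \
    --data-binary "$(datasource_json "$1" "$2")"
}
```

Running `wait_for 30 curl -s -o /dev/null http://localhost:3000 && register_datasource influxdb _internal && register_datasource cf_np cf_np` performs the same two registrations as load.sh, but exits non-zero instead of looping forever if Grafana never responds. The limit of 30 attempts is an arbitrary example value.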
-------------------------------------------------------------------------------- /images/cell_memory.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bayer-Group/cf-metrics/1c6ca18746cd9a04054f9ad06e81d6c9cbea3fe0/images/cell_memory.png -------------------------------------------------------------------------------- /images/dashboards.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bayer-Group/cf-metrics/1c6ca18746cd9a04054f9ad06e81d6c9cbea3fe0/images/dashboards.png -------------------------------------------------------------------------------- /images/datastores.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bayer-Group/cf-metrics/1c6ca18746cd9a04054f9ad06e81d6c9cbea3fe0/images/datastores.png -------------------------------------------------------------------------------- /images/loggregator_stats.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bayer-Group/cf-metrics/1c6ca18746cd9a04054f9ad06e81d6c9cbea3fe0/images/loggregator_stats.png -------------------------------------------------------------------------------- /images/slack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bayer-Group/cf-metrics/1c6ca18746cd9a04054f9ad06e81d6c9cbea3fe0/images/slack.png -------------------------------------------------------------------------------- /influxdb/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:trusty 2 | MAINTAINER mjseid 3 | ENV INFLUXDB_VERSION 1.3.6 4 | 5 | RUN apt-get update && apt-get install -y curl openssh-client awscli 6 | RUN curl -s -o /tmp/influxdb_latest_amd64.deb 
https://dl.influxdata.com/influxdb/releases/influxdb_${INFLUXDB_VERSION}_amd64.deb && \ 7 | dpkg -i /tmp/influxdb_latest_amd64.deb && \ 8 | rm /tmp/influxdb_latest_amd64.deb && \ 9 | rm -rf /var/lib/apt/lists/* 10 | 11 | ADD influxdb.config /etc/influxdb/influxdb.config 12 | ADD run.sh /run.sh 13 | RUN chmod +x /*.sh 14 | 15 | ENV PRE_CREATE_DB **None** 16 | 17 | EXPOSE 8083 8084 8086 18 | 19 | VOLUME ["/data"] 20 | 21 | CMD ["/run.sh"] 22 | -------------------------------------------------------------------------------- /influxdb/influxdb.config: -------------------------------------------------------------------------------- 1 | reporting-disabled = true 2 | bind-address = ":8088" 3 | 4 | [meta] 5 | dir = "/data/meta" 6 | retention-autocreate = true 7 | logging-enabled = true 8 | 9 | [data] 10 | dir = "/data/db" 11 | engine = "tsm1" 12 | wal-dir = "/data/wal" 13 | wal-logging-enabled = true 14 | query-log-enabled = true 15 | cache-max-memory-size = 524288000 16 | cache-snapshot-memory-size = 26214400 17 | cache-snapshot-write-cold-duration = "1h0m0s" 18 | compact-full-write-cold-duration = "24h0m0s" 19 | max-points-per-block = 0 20 | max-series-per-database = 1000000 21 | trace-logging-enabled = false 22 | 23 | [coordinator] 24 | write-timeout = "10s" 25 | max-concurrent-queries = 0 26 | query-timeout = "0" 27 | log-queries-after = "0" 28 | max-select-point = 0 29 | max-select-series = 0 30 | max-select-buckets = 0 31 | 32 | [retention] 33 | enabled = true 34 | check-interval = "30m0s" 35 | 36 | [shard-precreation] 37 | enabled = true 38 | check-interval = "10m0s" 39 | advance-period = "30m0s" 40 | 41 | [admin] 42 | enabled = true 43 | bind-address = ":8083" 44 | https-enabled = false 45 | https-certificate = "/etc/ssl/influxdb.pem" 46 | 47 | [monitor] 48 | store-enabled = true 49 | store-database = "_internal" 50 | store-interval = "10s" 51 | 52 | [subscriber] 53 | enabled = true 54 | http-timeout = "30s" 55 | write-concurrency = 40 56 | write-buffer-size = 1000 
57 | 58 | [http] 59 | enabled = true 60 | bind-address = ":8086" 61 | auth-enabled = false 62 | log-enabled = true 63 | write-tracing = false 64 | https-enabled = false 65 | https-certificate = "/etc/ssl/influxdb.pem" 66 | https-private-key = "" 67 | # temp fix for grafana 68 | max-row-limit = 0 69 | max-connection-limit = 0 70 | shared-secret = "" 71 | realm = "InfluxDB" 72 | 73 | [[graphite]] 74 | enabled = false 75 | 76 | [[collectd]] 77 | enabled = false 78 | 79 | [[opentsdb]] 80 | enabled = false 81 | 82 | [[udp]] 83 | enabled = false 84 | 85 | [continuous_queries] 86 | log-enabled = true 87 | enabled = true 88 | run-interval = "1s" 89 | -------------------------------------------------------------------------------- /influxdb/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set -m 3 | CONFIG_FILE="/etc/influxdb/influxdb.config" 4 | 5 | if [ "${PRE_CREATE_DB}" == "**None**" ]; then 6 | unset PRE_CREATE_DB 7 | fi 8 | 9 | API_URL="http://localhost:8086" 10 | 11 | if [ -f "/data/.pre_db_created" ] 12 | then 13 | echo "reusing existing local data ..." 14 | else 15 | echo "no existing local data found, starting from scratch" 16 | fi 17 | 18 | #Pre create database on the initiation of the container 19 | if [ -n "${PRE_CREATE_DB}" ]; then 20 | echo "=> About to create the following database: ${PRE_CREATE_DB}" 21 | if [ -f "/data/.pre_db_created" ]; then 22 | echo "=> Database was already created, skipping ..." 23 | else 24 | echo "=> Starting InfluxDB ..." 25 | exec /usr/bin/influxd -config=${CONFIG_FILE} & 26 | PASS=${INFLUXDB_INIT_PWD:-root} 27 | arr=$(echo ${PRE_CREATE_DB} | tr ";" "\n") 28 | 29 | #wait for the startup of influxdb 30 | RET=1 31 | while [[ RET -ne 0 ]]; do 32 | echo "=> Waiting for confirmation of InfluxDB service startup ..." 33 | sleep 3 34 | curl -k ${API_URL}/ping 2> /dev/null 35 | RET=$? 
36 | done 37 | echo "" 38 | 39 | for x in $arr 40 | do 41 | echo "=> Creating database: ${x}" 42 | curl -G 'http://localhost:8086/query?u=root&p=root' --data-urlencode "q=CREATE DATABASE ${x}" 43 | #Pre create the cq's for the databases specified for pre-create on the initiation of the container 44 | echo "=> Creating cq for database: ${x}" 45 | curl -i -XPOST 'http://localhost:8086/query?u=root&p=root' --data-urlencode "q=ALTER RETENTION POLICY autogen ON ${x} DURATION 14d REPLICATION 1 DEFAULT" 46 | curl -G 'http://localhost:8086/query?u=root&p=root' --data-urlencode "q=CREATE RETENTION POLICY one_year ON ${x} DURATION 52w REPLICATION 1" 47 | curl -G 'http://localhost:8086/query?u=root&p=root' --data-urlencode "q=CREATE CONTINUOUS QUERY ${x}_30m ON ${x} BEGIN SELECT mean(value) AS value INTO ${x}.\"one_year\".:MEASUREMENT FROM /.*/ GROUP BY time(30m), * END" 48 | curl -G 'http://localhost:8086/query?u=root&p=root' --data-urlencode "q=CREATE CONTINUOUS QUERY \"sum_cpu_sys\" ON ${x} BEGIN SELECT mean(value) as \"cpu_sys\" INTO \"firehose.bosh-hm-forwarder.system.cpu\" FROM \"firehose.bosh-hm-forwarder.system.cpu.sys\" GROUP BY time(1m),* END" 49 | curl -G 'http://localhost:8086/query?u=root&p=root' --data-urlencode "q=CREATE CONTINUOUS QUERY \"sum_cpu_user\" ON ${x} BEGIN SELECT mean(value) as \"cpu_user\" INTO \"firehose.bosh-hm-forwarder.system.cpu\" FROM \"firehose.bosh-hm-forwarder.system.cpu.user\" GROUP BY time(1m),* END" 50 | done 51 | echo "" 52 | 53 | touch "/data/.pre_db_created" 54 | fg 55 | exit 0 56 | fi 57 | else 58 | echo "=> No database needs to be pre-created" 59 | fi 60 | 61 | echo "=> Starting InfluxDB ..."
62 | 63 | exec /usr/bin/influxd -config=${CONFIG_FILE} 64 | -------------------------------------------------------------------------------- /kapacitor/bosh_event_np.tick: -------------------------------------------------------------------------------- 1 | //curl --max-time 2 -X PUT http://localhost:8125/v1/event/fire/director -d "{\"kind\":\"alert\",\"id\":\"c0cef7c0-808e-48fd-abcb-1234567890\",\"severity\":4,\"title\":\"director - begin update deployment\",\"summary\":\"Begin update deployment for 'cf_aws_np-diego' against Director '52b23505-b43d-435c-ad13-1234567890'\",\"source\":\"director\",\"created_at\":1504142405}" 2 | 3 | var slackchannel = '#bot-testing' 4 | var env = 'Non-Prod' 5 | var database = 'cf_np' 6 | 7 | var events = stream 8 | |from() 9 | //.database('telegraf') 10 | .measurement('deploy_event') 11 | 12 | //deadman alert if we don't get anything in 5 minutes 13 | events 14 | |deadman(1.0, 5m) 15 | .stateChangesOnly(15m) 16 | .noRecoveries() 17 | .id('Deadman alert for ' + env +' BOSH HM Forwarder') 18 | .message('{{ .ID }}') 19 | .slack() 20 | .channel(slackchannel) 21 | 22 | events 23 | // enable this to see all the bosh hm events coming across 24 | //|log() 25 | |eval(lambda: if(strContains("value", 'Finish update deployment') OR strContains("value", 'Error during update'), 'true', 'false'), lambda: bool("textvalue")) 26 | .as('textvalue','value') 27 | .keep('value') 28 | |where(lambda: "value" == bool('true')) 29 | |influxDBOut() 30 | .database(database) 31 | .retentionPolicy('autogen') 32 | .measurement('bosh_deploy') 33 | .tag('status', 'finish') 34 | .precision('s') 35 | 36 | events 37 | |eval(lambda: strContains("value", 'Begin update deployment')) 38 | .as('value') 39 | |where(lambda: "value" == bool('true')) 40 | |influxDBOut() 41 | .database(database) 42 | .retentionPolicy('autogen') 43 | .measurement('bosh_deploy') 44 | .tag('status', 'start') 45 | .precision('s') 46 | 
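The two branches of `bosh_event_np.tick` tag each raw `deploy_event` line by substring match. Text containing `Finish update deployment` or `Error during update` becomes a `bosh_deploy` point with `status=finish`, and text containing `Begin update deployment` becomes `status=start`. The Python function below is a hypothetical restatement of that rule, useful for checking which BOSH messages will produce deploy markers. It assumes Kapacitor's `strContains` is a plain, case-sensitive substring test. Because the two branches are independent, a line matching both yields both statuses.

```python
from typing import List

# Substrings copied verbatim from the two eval() lambdas in bosh_event_np.tick.
FINISH_MARKERS = ("Finish update deployment", "Error during update")
START_MARKER = "Begin update deployment"


def deploy_statuses(event_text: str) -> List[str]:
    """Return the bosh_deploy "status" tags the script would write for one event line.

    Matching is a case-sensitive substring test, mirroring strContains().
    An empty list means both where() filters drop the line.
    """
    statuses: List[str] = []
    if any(marker in event_text for marker in FINISH_MARKERS):
        statuses.append("finish")
    if START_MARKER in event_text:
        statuses.append("start")
    return statuses
```

Note that the sample `title` in the script's test `curl` ("director - begin update deployment") is lowercase and would not match on its own. The `summary` field in the same line ("Begin update deployment for ...") is what triggers the `start` branch.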
-------------------------------------------------------------------------------- /kapacitor/cpu_wait_np.tick: -------------------------------------------------------------------------------- 1 | // will need to run this periodically for 15 minutes to test alert 2 | //curl -i -XPOST 'http://localhost:8086/write?db=cf_np&precision=s' --data-binary 'firehose.bosh-hm-forwarder.system.cpu.wait,deployment=cf_np-diego,job=cell_z1,index=999 value=95 '`date +%s`'' 3 | 4 | var slackchannel = '#bot-testing' 5 | var grafanaurl = 'http://server.company.com:3000/dashboard/file' 6 | var grafanaenv = '?&var-Environment=cf_np' 7 | 8 | stream 9 | |from() 10 | .measurement('firehose.bosh-hm-forwarder.system.cpu.wait') 11 | |groupBy('job', 'index') 12 | |stateDuration(lambda: "value" >= 90) 13 | .unit(1m) 14 | |alert() 15 | .crit(lambda: "state_duration" >= 15) 16 | .stateChangesOnly(15m) 17 | .noRecoveries() 18 | .id('CPU Wait Alert for {{ index .Tags "deployment" }}') 19 | .message('{{ .ID }} 20 | Job {{index .Tags "job" }}/{{index .Tags "index" }} in {{index .Tags "deployment" }} at or above 90% CPU wait for past 15 minutes 21 | <' + grafanaurl +'/vm-level-stats.json'+ grafanaenv +'|Go to Grafana>') 22 | .slack() 23 | .channel(slackchannel) 24 | -------------------------------------------------------------------------------- /kapacitor/etcd_alert_np.tick: -------------------------------------------------------------------------------- 1 | //curl -i -XPOST 'http://localhost:8086/write?db=cf_np&precision=s' --data-binary 'firehose.etcd.IsLeader,deployment=cf_np-diego,ip=192.168.1.1 value=1 '`date +%s`'' 2 | 3 | var env = 'Non-Prod' 4 | var slackchannel = '#bot-testing' 5 | var grafanaurl = 'http://server.company.com:3000/dashboard/file' 6 | var grafanaenv = '?&var-Environment=cf_np' 7 | 8 | stream 9 | |from() 10 | .measurement('firehose.etcd.IsLeader') 11 | |window() 12 | .period(1m) 13 | .every(1m) 14 | |sum('value') 15 | |stateDuration(lambda: "sum" > 1) 16 | .unit(1m) 17 | |alert() 18 | 
.crit(lambda: "state_duration" >= 5) 19 | .stateChangesOnly(15m) 20 | .noRecoveries() 21 | .id('More than 1 EtcD leader in ' + env) 22 | .message('{{ .ID }} 23 | Possible cluster split brain scenario 24 | <' + grafanaurl +'/Etcd_stats.json'+ grafanaenv +'|Go to Grafana>') 25 | .slack() 26 | .channel(slackchannel) 27 | -------------------------------------------------------------------------------- /kapacitor/job_health_np.tick: -------------------------------------------------------------------------------- 1 | //will need to run this periodically for 15 minutes to test alert 2 | //curl -i -XPOST 'http://localhost:8086/write?db=cf_np&precision=s' --data-binary 'firehose.bosh-hm-forwarder.system.healthy,deployment=cf_np-diego,job=cell_z1,index=999 value=0 '`date +%s`'' 3 | 4 | var slackchannel = '#bot-testing' 5 | var env = 'Non-Prod' 6 | var grafanaurl = 'http://server.company.com:3000/dashboard/file' 7 | var grafanaenv = '?&var-Environment=cf_np' 8 | 9 | var data = stream 10 | |from() 11 | .measurement('firehose.bosh-hm-forwarder.system.healthy') 12 | 13 | data 14 | |groupBy('job', 'index') 15 | |stateDuration(lambda: "value" < 1) 16 | .unit(3m) 17 | |alert() 18 | .crit(lambda: "state_duration" >= 5) 19 | .stateChangesOnly(15m) 20 | .noRecoveries() 21 | .id('Job Health Alert for {{ index .Tags "deployment" }}') 22 | .message('{{ .ID }} 23 | Job {{index .Tags "job" }}/{{index .Tags "index" }} in {{index .Tags "deployment" }} not healthy for past 15 minutes 24 | <' + grafanaurl +'/component-health.json'+ grafanaenv +'|Go to Grafana>') 25 | .slack() 26 | .channel(slackchannel) 27 | -------------------------------------------------------------------------------- /kapacitor/kapacitor.conf: -------------------------------------------------------------------------------- 1 | hostname = "localhost" 2 | data_dir = "/var/lib/kapacitor" 3 | skip-config-overrides = false 4 | default-retention-policy = "" 5 | 6 | [http] 7 | bind-address = ":9092" 8 | auth-enabled = false 9 | 
log-enabled = true 10 | write-tracing = false 11 | pprof-enabled = false 12 | https-enabled = false 13 | https-certificate = "/etc/ssl/kapacitor.pem" 14 | shutdown-timeout = "10s" 15 | shared-secret = "" 16 | 17 | [replay] 18 | dir = "/var/lib/kapacitor/replay" 19 | 20 | [storage] 21 | boltdb = "/var/lib/kapacitor/kapacitor.db" 22 | 23 | [task] 24 | dir = "/root/.kapacitor/tasks" 25 | snapshot-interval = "1m0s" 26 | 27 | [[influxdb]] 28 | enabled = true 29 | name = "default" 30 | default = false 31 | urls = ["http://localhost:8086"] 32 | username = "" 33 | password = "" 34 | ssl-ca = "" 35 | ssl-cert = "" 36 | ssl-key = "" 37 | insecure-skip-verify = false 38 | timeout = "0s" 39 | disable-subscriptions = false 40 | subscription-protocol = "http" 41 | kapacitor-hostname = "kapacitor" 42 | http-port = 0 43 | udp-bind = "" 44 | udp-buffer = 1000 45 | udp-read-buffer = 0 46 | startup-timeout = "5m0s" 47 | subscriptions-sync-interval = "1m0s" 48 | [influxdb.subscriptions] 49 | [influxdb.excluded-subscriptions] 50 | _internal = ["monitor"] 51 | 52 | [logging] 53 | file = "STDERR" 54 | #level = "INFO" 55 | level = "DEBUG" 56 | 57 | [config-override] 58 | enabled = true 59 | 60 | [alert] 61 | 62 | [collectd] 63 | enabled = false 64 | bind-address = ":25826" 65 | database = "collectd" 66 | retention-policy = "" 67 | batch-size = 5000 68 | batch-pending = 10 69 | batch-timeout = "10s" 70 | read-buffer = 0 71 | typesdb = "/usr/share/collectd/types.db" 72 | 73 | [opentsdb] 74 | enabled = false 75 | bind-address = ":4242" 76 | database = "opentsdb" 77 | retention-policy = "" 78 | consistency-level = "one" 79 | tls-enabled = false 80 | certificate = "/etc/ssl/influxdb.pem" 81 | batch-size = 1000 82 | batch-pending = 5 83 | batch-timeout = "1s" 84 | log-point-errors = true 85 | 86 | [alerta] 87 | enabled = false 88 | url = "" 89 | insecure-skip-verify = false 90 | token = "" 91 | environment = "" 92 | origin = "" 93 | 94 | [hipchat] 95 | enabled = false 96 | url = "" 97 | token = 
"" 98 | room = "" 99 | global = false 100 | state-changes-only = false 101 | 102 | [opsgenie] 103 | enabled = false 104 | api-key = "" 105 | url = "https://api.opsgenie.com/v1/json/alert" 106 | recovery_url = "https://api.opsgenie.com/v1/json/alert/note" 107 | global = false 108 | 109 | [pagerduty] 110 | enabled = false 111 | url = "https://events.pagerduty.com/generic/2010-04-15/create_event.json" 112 | service-key = "" 113 | global = false 114 | 115 | [[httppost]] 116 | endpoint = "jenkins-np" 117 | url = "placeholderforissue1344" 118 | 119 | [[httppost]] 120 | endpoint = "jenkins-prd" 121 | url = "placeholderforissue1344" 122 | 123 | 124 | [smtp] 125 | enabled = false 126 | host = "localhost" 127 | port = 25 128 | username = "" 129 | password = "" 130 | no-verify = false 131 | global = false 132 | state-changes-only = false 133 | from = "" 134 | idle-timeout = "30s" 135 | 136 | [snmptrap] 137 | enabled = false 138 | addr = "localhost:162" 139 | community = "kapacitor" 140 | retries = 1 141 | 142 | [sensu] 143 | enabled = false 144 | addr = "" 145 | source = "Kapacitor" 146 | 147 | [slack] 148 | enabled = false 149 | url = "" 150 | channel = "placeholder" 151 | username = "kapacitor" 152 | icon-emoji = "" 153 | global = false 154 | state-changes-only = false 155 | 156 | [talk] 157 | enabled = false 158 | url = "" 159 | author_name = "" 160 | 161 | [telegram] 162 | enabled = false 163 | url = "https://api.telegram.org/bot" 164 | token = "" 165 | chat-id = "" 166 | parse-mode = "" 167 | disable-web-page-preview = false 168 | disable-notification = false 169 | global = false 170 | state-changes-only = false 171 | 172 | [victorops] 173 | enabled = false 174 | api-key = "" 175 | routing-key = "" 176 | url = "https://alert.victorops.com/integrations/generic/20131114/alert" 177 | global = false 178 | 179 | [kubernetes] 180 | enabled = false 181 | in-cluster = false 182 | token = "" 183 | ca-path = "" 184 | namespace = "" 185 | 186 | [reporting] 187 | enabled = false 188 
| url = "https://usage.influxdata.com" 189 | 190 | [stats] 191 | enabled = true 192 | stats-interval = "10s" 193 | database = "_kapacitor" 194 | retention-policy = "autogen" 195 | timing-sample-rate = 0.1 196 | timing-movavg-size = 1000 197 | 198 | [udf] 199 | 200 | [deadman] 201 | interval = "10s" 202 | threshold = 0.0 203 | id = "{{ .Group }}:NODE_NAME for task '{{ .TaskName }}'" 204 | message = "{{ .ID }} is {{ if eq .Level \"OK\" }}alive{{ else }}dead{{ end }}: {{ index .Fields \"emitted\" | printf \"%0.3f\" }} points/INTERVAL." 205 | global = false 206 | 207 | 208 | -------------------------------------------------------------------------------- /kapacitor/loader.md: -------------------------------------------------------------------------------- 1 | kapacitor define etcd_alert_np -type stream -tick etcd_alert_np.tick -dbrp cf_np.autogen 2 | kapacitor enable etcd_alert_np 3 | kapacitor define slow_consumer_np -type stream -tick slow_consumer_np.tick -dbrp cf_np.autogen 4 | kapacitor enable slow_consumer_np 5 | kapacitor define swap_alert_np -type stream -tick swap_alert_np.tick -dbrp cf_np.autogen 6 | kapacitor enable swap_alert_np 7 | kapacitor define job_health_np -type stream -tick job_health_np.tick -dbrp cf_np.autogen 8 | kapacitor enable job_health_np 9 | kapacitor define persistent_disk_np -type stream -tick persistent_disk_np.tick -dbrp cf_np.autogen 10 | kapacitor enable persistent_disk_np 11 | kapacitor define cpu_wait_alert_np -type stream -tick cpu_wait_np.tick -dbrp cf_np.autogen 12 | kapacitor enable cpu_wait_alert_np 13 | kapacitor define bosh_event_np -type stream -tick bosh_event_np.tick -dbrp telegraf_np.autogen 14 | kapacitor enable bosh_event_np 15 | kapacitor define max_container_np -type stream -tick max_container_np.tick -dbrp cf_np.autogen 16 | kapacitor enable max_container_np 17 | -------------------------------------------------------------------------------- /kapacitor/max_container_np.tick: 
-------------------------------------------------------------------------------- 1 | var slackchannel = '#bot-testing' 2 | var grafanaurl = 'http://server.company.com:3000/dashboard/file' 3 | var jenkins = 'jenkins-np' 4 | var grafanaenv = '?&var-Environment=cf_np' 5 | var env = 'Non-Prod' 6 | 7 | var repmem = stream 8 | // Select just the cell mem measurement 9 | |from() 10 | .measurement('firehose.rep.CapacityRemainingMemory') 11 | // eval every minute the past 4 minutes of data 12 | |window() 13 | .period(4m) 14 | .every(1m) 15 | 16 | repmem 17 | |deadman(1.0, 5m) 18 | .stateChangesOnly(15m) 19 | .noRecoveries() 20 | .id('Deadman alert for ' + env +' Firehose') 21 | .message('{{ .ID }}') 22 | .slack() 23 | .channel(slackchannel) 24 | 25 | repmem 26 | |where(lambda: "job" =~ /cell_z./) 27 | // get the max from all the cells that have reported in. 28 | |max('value') 29 | |alert() 30 | // if the max remaining mem across all cells that reported in the past 4 minutes is less than 3GB (3072 MB), then send the alert 31 | .crit(lambda: "max" < 3072) 32 | // only alert on state change or every 15 min when state hasn't changed 33 | .stateChangesOnly(15m) 34 | .noRecoveries() 35 | .id('Cell Capacity Alert for {{ index .Tags "deployment" }}') 36 | .message('{{ .ID }} 37 | No Cells in {{ index .Tags "deployment" }} have room for a 3G container. 38 | <' + grafanaurl +'/cell.json'+ grafanaenv +'|Go to Grafana>') 39 | .slack() 40 | .channel(slackchannel) 41 | .post().endpoint(jenkins) 42 | 43 | repmem 44 | |where(lambda: "job" =~ /cell_large_z./) 45 | // get the max from all the cells that have reported in.
46 | |max('value') 47 | |alert() 48 | // if the max remaining mem across all large cells that reported in the past 4 minutes is less than 9GB, then send the alert 49 | .crit(lambda: "max" < 9216) 50 | // only alert on state change or every 15 min when state hasn't changed 51 | .stateChangesOnly(15m) 52 | .noRecoveries() 53 | .id('Large Cell Capacity Alert for {{ index .Tags "deployment" }}') 54 | .message('{{ .ID }} 55 | No Large Cells in {{ index .Tags "deployment" }} have room for a 9G container. 56 | <' + grafanaurl +'/cell.json'+ grafanaenv +'|Go to Grafana>') 57 | .slack() 58 | .channel(slackchannel) 59 | .post().endpoint(jenkins) 60 | -------------------------------------------------------------------------------- /kapacitor/persistent_disk_np.tick: -------------------------------------------------------------------------------- 1 | //run periodically for 5 minutes to test alert 2 | //curl -i -XPOST 'http://localhost:8086/write?db=cf_np&precision=s' --data-binary 'firehose.bosh-hm-forwarder.system.disk.persistent.percent,deployment=cf_np-diego,job=cell_z1,index=999 value=99 '`date +%s`'' 3 | 4 | var grafanaurl = 'http://server.company.com:3000/dashboard/file' 5 | var slackchannel = '#bot-testing' 6 | var grafanaenv = '?&var-Environment=cf_np' 7 | 8 | stream 9 | |from() 10 | .measurement('firehose.bosh-hm-forwarder.system.disk.persistent.percent') 11 | |groupBy('job', 'index') 12 | |stateDuration(lambda: "value" >= 85) 13 | .unit(1m) 14 | |alert() 15 | .crit(lambda: "state_duration" >= 5) 16 | .stateChangesOnly(15m) 17 | .noRecoveries() 18 | .id('Persistent Disk Alert for {{ index .Tags "deployment" }}') 19 | .message('{{ .ID }} 20 | Job {{index .Tags "job" }}/{{index .Tags "index" }} in {{index .Tags "deployment" }} using 85% or more of persistent disk for past 5 minutes 21 | <' + grafanaurl +'/vm-level-stats.json'+ grafanaenv +'|Go to Grafana>') 22 | .slack() 23 | .channel(slackchannel) 24 | -------------------------------------------------------------------------------- 
/kapacitor/slow_consumer_np.tick: -------------------------------------------------------------------------------- 1 | //curl -i -XPOST 'http://localhost:8086/write?db=cf_np&precision=s' --data-binary 'firehose.slowConsumerAlert,deployment=cf_np-diego,ip=192.168.1.1 value=1 '`date +%s`'' 2 | 3 | var env = 'Non-Prod' 4 | var slackchannel = '#bot-testing' 5 | var grafanaenv = '?&var-Environment=cf_np' 6 | 7 | stream 8 | |from() 9 | .measurement('firehose.slowConsumerAlert') 10 | |alert() 11 | .crit(lambda: "value" > 0) 12 | .stateChangesOnly(15m) 13 | .noRecoveries() 14 | .id('Slow Consumer Alert for ' + env + ' Firehose') 15 | .message('{{ .ID }}') 16 | .slack() 17 | .channel(slackchannel) 18 | -------------------------------------------------------------------------------- /kapacitor/swap_alert_np.tick: -------------------------------------------------------------------------------- 1 | //run periodically for 5 minutes to trigger test alert 2 | //curl -i -XPOST 'http://localhost:8086/write?db=cf_np&precision=s' --data-binary 'firehose.bosh-hm-forwarder.system.swap.percent,deployment=cf_np-diego,job=cell_z1,index=999 value=99 '`date +%s`'' 3 | 4 | var grafanaurl = 'http://server.company.com:3000/dashboard/file' 5 | var slackchannel = '#bot-testing' 6 | var grafanaenv = '?&var-Environment=cf_np' 7 | 8 | stream 9 | |from() 10 | .measurement('firehose.bosh-hm-forwarder.system.swap.percent') 11 | |groupBy('job', 'index') 12 | |stateDuration(lambda: "value" >= 25) 13 | .unit(1m) 14 | |alert() 15 | .crit(lambda: "state_duration" >= 5) 16 | .stateChangesOnly(15m) 17 | .noRecoveries() 18 | .id('Swap Alert for {{ index .Tags "deployment" }}') 19 | .message('{{ .ID }} 20 | Job {{index .Tags "job" }}/{{index .Tags "index" }} in {{index .Tags "deployment" }} using more than 25% swap for past 5 minutes 21 | <' + grafanaurl +'/vm-level-stats.json'+ grafanaenv +'|Go to Grafana>') 22 | .slack() 23 | .channel(slackchannel) 24 | 
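The VM-level alerts (`cpu_wait_np`, `persistent_disk_np`, `swap_alert_np`, and `job_health_np` earlier) share one pattern. `stateDuration` measures how long a per-`job`/`index` condition has been continuously true, in multiples of `.unit()`, and `alert().crit()` fires once that figure reaches a threshold. The sketch below models our understanding of that node for a single series: a true point extends the current run, and a false point resets it and reports -1. It is an illustration, not Kapacitor's implementation, and `state_durations` and `first_crit` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (unix timestamp in seconds, the point's "value" field)
Point = Tuple[float, float]


def state_durations(points: Sequence[Point],
                    predicate: Callable[[float], bool],
                    unit_seconds: float = 60.0) -> List[float]:
    """Model stateDuration for one series (one job/index group).

    While predicate(value) stays true, the duration is the time since the
    first true point of the run, in multiples of unit_seconds. A point that
    makes the predicate false resets the run and reports -1.
    """
    durations: List[float] = []
    run_start: Optional[float] = None
    for ts, value in points:
        if predicate(value):
            if run_start is None:
                run_start = ts  # first point of a new run
            durations.append((ts - run_start) / unit_seconds)
        else:
            run_start = None
            durations.append(-1.0)
    return durations


def first_crit(points: Sequence[Point],
               predicate: Callable[[float], bool],
               crit_at: float,
               unit_seconds: float = 60.0) -> Optional[float]:
    """Timestamp of the first point whose state_duration >= crit_at, or None."""
    durations = state_durations(points, predicate, unit_seconds)
    for (ts, _), duration in zip(points, durations):
        if duration >= crit_at:
            return ts
    return None
```

Under this model, with `swap_alert_np.tick`'s settings (`value >= 25`, `.unit(1m)`, `crit >= 5`) and one point per minute, the alert fires on the sixth consecutive breaching point, five minutes after the first. The same model shows that `job_health_np.tick`'s `.unit(3m)` with `>= 5` is equivalent to `.unit(1m)` with `>= 15`.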
-------------------------------------------------------------------------------- /telegraf/telegraf.conf: -------------------------------------------------------------------------------- 1 | # Telegraf Configuration 2 | # 3 | # Telegraf is entirely plugin driven. All metrics are gathered from the 4 | # declared inputs, and sent to the declared outputs. 5 | # 6 | # Plugins must be declared in here to be active. 7 | # To deactivate a plugin, comment out the name and any variables. 8 | # 9 | # Use 'telegraf -config telegraf.conf -test' to see what metrics a config 10 | # file would generate. 11 | # 12 | # Environment variables can be used anywhere in this config file, simply prepend 13 | # them with $. For strings the variable must be within quotes (ie, "$STR_VAR"), 14 | # for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR) 15 | 16 | 17 | # Global tags can be specified here in key="value" format. 18 | [global_tags] 19 | 20 | # Configuration for telegraf agent 21 | [agent] 22 | ## Default data collection interval for all inputs 23 | interval = "10s" 24 | ## Rounds collection interval to 'interval' 25 | ## ie, if interval="10s" then always collect on :00, :10, :20, etc. 26 | round_interval = true 27 | 28 | ## Telegraf will send metrics to outputs in batches of at most 29 | ## metric_batch_size metrics. 30 | ## This controls the size of writes that Telegraf sends to output plugins. 31 | metric_batch_size = 1000 32 | 33 | ## For failed writes, telegraf will cache metric_buffer_limit metrics for each 34 | ## output, and will flush this buffer on a successful write. Oldest metrics 35 | ## are dropped first when this buffer fills. 36 | ## This buffer only fills when writes fail to output plugin(s). 37 | metric_buffer_limit = 10000 38 | 39 | ## Collection jitter is used to jitter the collection by a random amount. 40 | ## Each plugin will sleep for a random time within jitter before collecting. 
41 | ## This can be used to avoid many plugins querying things like sysfs at the 42 | ## same time, which can have a measurable effect on the system. 43 | collection_jitter = "0s" 44 | 45 | ## Default flushing interval for all outputs. You shouldn't set this below 46 | ## interval. Maximum flush_interval will be flush_interval + flush_jitter 47 | flush_interval = "10s" 48 | ## Jitter the flush interval by a random amount. This is primarily to avoid 49 | ## large write spikes for users running a large number of telegraf instances. 50 | ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s 51 | flush_jitter = "0s" 52 | 53 | ## By default or when set to "0s", precision will be set to the same 54 | ## timestamp order as the collection interval, with the maximum being 1s. 55 | ## ie, when interval = "10s", precision will be "1s" 56 | ## when interval = "250ms", precision will be "1ms" 57 | ## Precision will NOT be used for service inputs. It is up to each individual 58 | ## service input to set the timestamp at the appropriate precision. 59 | ## Valid time units are "ns", "us" (or "µs"), "ms", "s". 60 | precision = "" 61 | 62 | ## Logging configuration: 63 | ## Run telegraf with debug log messages. 64 | debug = true 65 | 66 | ## Run telegraf in quiet mode (error log messages only). 67 | quiet = false 68 | ## Specify the log file name. The empty string means to log to stderr. 69 | logfile = "" 70 | 71 | ## Override default hostname, if empty use os.Hostname() 72 | hostname = "" 73 | ## If set to true, do not set the "host" tag in the telegraf agent. 74 | omit_hostname = false 75 | 76 | 77 | ############################################################################### 78 | # OUTPUT PLUGINS # 79 | ############################################################################### 80 | 81 | # Configuration for influxdb server to send metrics to 82 | [[outputs.influxdb]] 83 | ## The HTTP or UDP URL for your InfluxDB instance.
 Each item should be 84 | ## of the form: 85 | ## scheme "://" host [ ":" port] 86 | ## 87 | ## Multiple urls can be specified as part of the same cluster, 88 | ## this means that only ONE of the urls will be written to each interval. 89 | # urls = ["udp://localhost:8089"] # UDP endpoint example 90 | #urls = ["http://localhost:8086"] # required 91 | urls = ["http://kapacitor:9092"] # required 92 | ## The target database for metrics (telegraf will create it if it does not exist). 93 | #database = "telegraf" # required 94 | database = "$KAPACITOR_DB" # required 95 | 96 | ## Name of existing retention policy to write to. Empty string writes to 97 | ## the default retention policy. 98 | retention_policy = "autogen" 99 | ## Write consistency (clusters only), can be: "any", "one", "quorum", "all" 100 | write_consistency = "any" 101 | 102 | ## Write timeout (for the InfluxDB client), formatted as a string. 103 | ## If not provided, will default to 5s. 0s means no timeout (not recommended). 104 | timeout = "5s" 105 | # username = "telegraf" 106 | # password = "metricsmetricsmetricsmetrics" 107 | ## Set the user agent for HTTP POSTs (can be useful for log differentiation) 108 | # user_agent = "telegraf" 109 | ## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes) 110 | # udp_payload = 512 111 | 112 | ## Optional SSL Config 113 | # ssl_ca = "/etc/telegraf/ca.pem" 114 | # ssl_cert = "/etc/telegraf/cert.pem" 115 | # ssl_key = "/etc/telegraf/key.pem" 116 | ## Use SSL but skip chain & host verification 117 | # insecure_skip_verify = false 118 | 119 | 120 | 121 | ############################################################################### 122 | # PROCESSOR PLUGINS # 123 | ############################################################################### 124 | 125 | # # Print all metrics that pass through this filter.
126 | # [[processors.printer]] 127 | 128 | 129 | 130 | ############################################################################### 131 | # AGGREGATOR PLUGINS # 132 | ############################################################################### 133 | 134 | # # Keep the aggregate min/max of each metric passing through. 135 | # [[aggregators.minmax]] 136 | # ## General Aggregator Arguments: 137 | # ## The period on which to flush & clear the aggregator. 138 | # period = "30s" 139 | # ## If true, the original metric will be dropped by the 140 | # ## aggregator and will not get sent to the output plugins. 141 | # drop_original = false 142 | 143 | 144 | 145 | ############################################################################### 146 | # INPUT PLUGINS # 147 | ############################################################################### 148 | 149 | 150 | ############################################################################### 151 | # SERVICE INPUT PLUGINS # 152 | ############################################################################### 153 | 154 | # Generic socket listener capable of handling multiple socket types. 155 | ## listener for np 156 | [[inputs.socket_listener]] 157 | service_address = "$LISTENER_PORT" 158 | #service_address = "tcp://:8125" 159 | data_format = "value" 160 | data_type = "string" 161 | name_override = "deploy_event" 162 | 163 | --------------------------------------------------------------------------------
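The `socket_listener` above turns each incoming line into a `deploy_event` point whose string field `value` holds the whole line, which `bosh_event_np.tick` then matches. The test command in that script uses `curl -X PUT`, which also sends the HTTP request line and headers down the same socket, and the listener will likely parse each of those as an extra `deploy_event` point. A raw TCP write avoids that. The Python sketch below assumes `$LISTENER_PORT` resolves to a TCP address such as the commented `tcp://:8125` and that the listener is reachable on `localhost`. `encode_event` and `send_event` are hypothetical helpers.

```python
import socket

# Assumptions: LISTENER_PORT is a TCP address like the commented "tcp://:8125",
# and the Telegraf listener is reachable on localhost.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8125


def encode_event(text: str) -> bytes:
    """Frame one event as a single newline-terminated line (one deploy_event point)."""
    return text.rstrip("\n").encode("utf-8") + b"\n"


def send_event(text: str, host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
               timeout: float = 2.0) -> None:
    """Write a single raw event line to the socket_listener over TCP."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_event(text))
```

For example, `send_event("Begin update deployment for 'cf_np-diego'")` should produce one `deploy_event` point that the `start` branch of `bosh_event_np.tick` picks up.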