├── .gitignore
├── README.md
├── docker-compose.yaml
├── monitoring_config
│   ├── blackbox_config.yml
│   └── prometheus.yml
├── systemd_service_samples
│   ├── node_exporter.service
│   └── smartmon_collector.service
└── textfile_collectors
    ├── run_smartmon.sh
    └── smartmon.sh

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
venv
.idea

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Monitoring RPi with Prometheus
Just a bunch of docker services to monitor your Raspberry Pi using Prometheus and Grafana. It can be used to
monitor other platforms as well.

Since this uses docker containers, it can also run on `amd64` platforms (i.e. your Windows or Linux machine).
This guide is generic; any platform-specific changes are highlighted with comments.

## Services used
- Prometheus: Time series database
- Grafana: Visualisation
- Blackbox exporter: Probes things using HTTP, ICMP etc., e.g. to check internet connectivity
- rpi_cpu_stats: Raspberry Pi CPU temperature and frequency exporter. You don't need this if you are not using an RPi
- node_exporter: Collects host stats. This needs to be running on each host
- smartmon_collector: Collects disk stats using S.M.A.R.T.

**NOTE:** node_exporter and smartmon_collector are not run as docker containers; they run as systemd services.

## Prerequisites
- docker
- docker-compose

Follow this guide if you need to install docker and docker-compose: https://github.com/thundermagic/rpi_media_centre/blob/master/docs/docker_install.md

## Usage
#### Installing node_exporter
_Skip this if you are not monitoring the host_

**Note: Steps taken from https://devopscube.com/monitor-linux-servers-prometheus-node-exporter/. I have modified them
for the Raspberry Pi.**

1.) Download the latest `armv7` version from https://github.com/prometheus/node_exporter/releases. At the time of
writing, the latest version is `0.18.1`. If you are running this on a different platform, download the package for
that platform instead.
```bash
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-armv7.tar.gz
```

2.) Unpack the tarball
```bash
tar -xvf node_exporter-0.18.1.linux-armv7.tar.gz
```

3.) Move the node_exporter binary to `/usr/local/bin/`
```bash
cd node_exporter-0.18.1.linux-armv7
mv node_exporter /usr/local/bin/
```

4.) Create a user `node_exporter` that will be used to run the node_exporter service. If you want to use a different user,
change the user in the `systemd_service_samples/node_exporter.service` file.
```bash
sudo useradd -rs /bin/false node_exporter
```

5.) Open `systemd_service_samples/node_exporter.service` with a text editor and modify it if needed. Comments in the file
explain what to modify.

6.) Copy the service file to `/etc/systemd/system`
```bash
cp systemd_service_samples/node_exporter.service /etc/systemd/system/
```

7.) Start and enable the service
```bash
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
```

8.) You should now see the stats at `http://<host IP>:9100/metrics`
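As a quick sanity check you can hit the metrics endpoint directly (a minimal sketch; it assumes the default port 9100 and that you run it on the host itself):

```bash
# Confirm the service is running.
sudo systemctl status node_exporter --no-pager
# Fetch a few node_* metrics; replace localhost with the Pi's IP if checking remotely.
curl -s http://localhost:9100/metrics | grep '^node_' | head -n 5
```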
#### Monitoring disk stats
_Skip this step if you are not monitoring disks using S.M.A.R.T._

You need `smartmontools` installed on the host to use this. To install it: `sudo apt install smartmontools`

This uses a script that gets disk stats via smartmontools and writes them to a file. Node exporter then parses
the file and exposes the stats.
The script (`smartmon.sh`) is available at https://github.com/prometheus-community/node-exporter-textfile-collector-scripts.
A copy of it is in the `textfile_collectors` directory. If a newer version of this script is available, you can
replace the copy in that directory with the newer version.

1.) Open `textfile_collectors/run_smartmon.sh` with a text editor and modify it. Comments in the file guide you through the changes.

2.) Run this as a service
```bash
cp systemd_service_samples/smartmon_collector.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start smartmon_collector
sudo systemctl enable smartmon_collector
```

You should now have disk stats being written to a file, and you should see those stats at `http://<host IP>:9100/metrics`

#### Running prometheus

1.) Create a directory structure like the one below to store Prometheus and Grafana data. I have my external hard disk mounted at /mnt/media.
You can change it to whatever you like. You would need to change the `docker-compose.yaml` file accordingly as well.
```bash
/mnt/media/appdata
├── grafana
└── prometheus
```

2.) Open `docker-compose.yaml` with a text editor and modify it. Comments in the file guide you through the changes.

3.) Edit the `monitoring_config/prometheus.yml` file as per your setup.

4.) Run the services
```bash
docker-compose up -d
```

5.) Check that the containers are running
```bash
docker container ls -a
```
All the containers should show as running. If any container is stuck in a restart cycle, check the directory
structure created in the previous steps and check the container logs for more info.

##### How do I check container logs?
```bash
docker container logs <container name>
```
For example, to check the logs for prometheus:
```bash
docker container logs prometheus
```

To follow/tail logs, use the `-f` flag, for example:
```bash
docker container logs -f prometheus
```

Assuming everything went alright, you should have all the services running now.
You can now access each of the services.

## Accessing services
Assuming your IP address is 192.168.4.4

- Prometheus: http://192.168.4.4:9090
- Grafana: http://192.168.4.4:3000
- Blackbox exporter: http://192.168.4.4:9115
- rpi_cpu_stats: http://192.168.4.4:9669

All the port numbers are listed in the `docker container ls` output under the `PORTS` column.

Now you should be able to start building your Grafana dashboards. There are dashboards already built; you can just import them
and modify them as per your needs.
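Before building dashboards, you can confirm that Prometheus is actually scraping everything by querying its HTTP API (a minimal sketch using the example IP above; `jq` is an assumed extra and is optional):

```bash
# Every configured target should report "health": "up".
curl -s http://192.168.4.4:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
# Or evaluate the `up` metric directly; a value of 1 means the scrape is succeeding.
curl -s 'http://192.168.4.4:9090/api/v1/query?query=up'
```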
## Services details
#### Prometheus
Time series database.
Website: https://prometheus.io/
Docker image: https://hub.docker.com/r/prom/prometheus
Documentation: https://prometheus.io/docs/introduction/overview/

#### Grafana
Analytics and visualisation.
Website: https://grafana.com/
Docker image: https://hub.docker.com/r/grafana/grafana
Documentation: https://grafana.com/docs/

#### blackbox_exporter
Probes things with HTTP, HTTPS, DNS, TCP and ICMP.
Website and documentation: https://github.com/prometheus/blackbox_exporter
Docker image: https://hub.docker.com/r/prom/blackbox-exporter

#### rpi_cpu_stats
Prometheus exporter that exposes Raspberry Pi CPU stats like temperature and frequencies. It does not rely on the vcgencmd command.
I use the prometheus [node_exporter](https://github.com/prometheus/node_exporter) to monitor the hosts, but node exporter was not
exposing the CPU temperature and CPU frequencies correctly for the Raspberry Pi, so I created this exporter.
Website: https://github.com/thundermagic/rpi_cpu_stats
Docker image and documentation: https://hub.docker.com/r/thundermagic/rpi_cpu_stats

#### node_exporter
Collects host stats. This needs to run on each host.
Website and documentation: https://github.com/prometheus/node_exporter

#### smartmon_collector
Collects disk stats. It is a node exporter textfile collector script.
Website and documentation: https://github.com/prometheus-community/node-exporter-textfile-collector-scripts

## How to upgrade a service?
When a new version of a service is available, like a newer Grafana version, you can follow these steps to upgrade.

_**A bit of a side note regarding docker image tags:** Docker images are named using the convention `<image name>:<tag>`. Usually
images have a `latest` tag that points to the latest version of the image. In addition, images can have tags for
specific versions. Check the documentation for the image to see which tags are supported._

Assuming all the services use the `latest` tag, to upgrade:

`cd` into the directory where the docker-compose file is.
#### To upgrade all services
```bash
docker-compose pull
docker-compose down
docker-compose up -d
```

#### To upgrade one specific service
Taking the grafana service, which uses the container name `grafana`, as an example:
```bash
docker-compose pull grafana
docker container stop grafana
docker container rm grafana
docker-compose up -d grafana
```

If a service uses a specific tag, you need to change the tag in the `docker-compose.yaml` file to the newer tag.
For example, assume a tag `v2` is available for a service (let's call the service `srv1`, which also uses the same
name for its container) that is currently using the tag `v1`.
Change the tag for the image used by this service from `v1` to `v2` and then run the commands below in the shell. You have
to be in the directory containing the `docker-compose.yaml` file, unless you want to use the `-f` flag with the `docker-compose` command.
```bash
docker-compose pull srv1
docker container stop srv1
docker container rm srv1
docker-compose up -d srv1
```
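For reference, here is what the same upgrade looks like when run from outside the project directory using the `-f` flag (a sketch; the path to the compose file is hypothetical, adjust it to where you cloned the repo):

```bash
# Point docker-compose at the compose file explicitly instead of cd-ing into its directory.
COMPOSE_FILE=/home/pi/rpi_monitoring_with_prometheus/docker-compose.yaml
docker-compose -f "$COMPOSE_FILE" pull srv1
docker container stop srv1
docker container rm srv1
docker-compose -f "$COMPOSE_FILE" up -d srv1
```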
--------------------------------------------------------------------------------
/docker-compose.yaml:
--------------------------------------------------------------------------------
version: "3"
x-extra_hosts:
  &pi
  # Change this to the IP of your RPi
  - "pi:192.168.4.4"
services:
  # The prometheus and grafana services use docker bind mounts to store config data on the host's filesystem. If
  # you're familiar with docker storage, you can change these to use docker volumes instead.
  # I have my external HDD mounted at /mnt/media. If you want to use a different directory for a bind mount, you can
  # do that by modifying the bind mounts under the volumes section of a service.
  #
  # All the services are multi-arch, so you don't need to change the images if you are not running on a Raspberry Pi.
  #
  # Each service runs on a different port number. Services use docker bridge networking and have host ports mapped
  # to container ports. You can change the mapping between host ports and container ports.
  # Port mapping is in the format <host port>:<container port>
  prometheus:
    image: prom/prometheus:latest
    restart: always
    container_name: prometheus
    extra_hosts: *pi
    ports:
      # Host port 9090 is mapped to container port 9090
      - "9090:9090"
    volumes:
      # A sample prometheus config is in the monitoring_config directory
      - ./monitoring_config/prometheus.yml:/etc/prometheus/prometheus.yml
      # Store prometheus data in the host's /mnt/media/appdata/prometheus directory
      - /mnt/media/appdata/prometheus:/prometheus

  grafana:
    image: grafana/grafana:latest
    restart: always
    container_name: grafana
    extra_hosts: *pi
    depends_on:
      - prometheus
    ports:
      # Host port 3000 is mapped to container port 3000
      - "3000:3000"
    volumes:
      # Store grafana data in the host's /mnt/media/appdata/grafana directory
      - /mnt/media/appdata/grafana:/var/lib/grafana

  # This service probes things using HTTP, ICMP, TCP etc. Mainly used for connectivity and service checks
  blackbox_exporter:
    image: prom/blackbox-exporter:master
    restart: always
    container_name: blackbox_exporter
    extra_hosts: *pi
    ports:
      # Host port 9115 is mapped to container port 9115
      - "9115:9115"
    volumes:
      # A sample blackbox config is in the monitoring_config directory. You normally won't need to change this
      - ./monitoring_config/blackbox_config.yml:/etc/blackbox_exporter/config.yml

  # If you are not using a raspberry pi you can delete this section. This collects and exposes CPU temperature and
  # frequency. At the time of writing, these stats are not collected by node_exporter itself.
  rpi_cpu_stats:
    image: thundermagic/rpi_cpu_stats:latest
    restart: always
    container_name: rpi_cpu_stats
    ports:
      - "9669:9669"
    environment: # Add the PUID and PGID of the user you want to run the container as
      - PUID=1001
      - PGID=1001
    volumes:
      # Mount the host's /sys directory to the container's /sys directory.
      - /sys:/sys
--------------------------------------------------------------------------------
/monitoring_config/blackbox_config.yml:
--------------------------------------------------------------------------------
modules:
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: "ip4"  # defaults to "ip6"
      ip_protocol_fallback: false   # no fallback to "ip6"
  http_post_2xx:
    prober: http
    http:
      method: POST
      preferred_ip_protocol: "ip4"  # defaults to "ip6"
      ip_protocol_fallback: false   # no fallback to "ip6"
  tcp_connect:
    prober: tcp
    tcp:
      preferred_ip_protocol: "ip4"  # defaults to "ip6"
      ip_protocol_fallback: false   # no fallback to "ip6"
      source_ip_address: "127.0.0.1"
  icmp:
    prober: icmp
    icmp:
      preferred_ip_protocol: "ip4"  # defaults to "ip6"
      ip_protocol_fallback: false   # no fallback to "ip6"

--------------------------------------------------------------------------------
/monitoring_config/prometheus.yml:
--------------------------------------------------------------------------------
# This is a sample config. Please change it according to your setup. The IP addresses would be one thing to change.

# my global config
global:
  scrape_interval: 5s  # Set the scrape interval to every 5 seconds. Default is every 1 minute.
  evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.4.4:9100']

  - job_name: 'pi_cpu_stats'
    static_configs:
      - targets: ['192.168.4.4:9669']

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    static_configs:
      - targets:
          - http://prometheus.io   # Target to probe with http.
          - https://prometheus.io  # Target to probe with https.
          - https://www.youtube.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.4.4:9115  # The blackbox exporter's real hostname:port.

  - job_name: 'blackbox_ping'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - www.google.com
          - 1.1.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.4.4:9115  # The blackbox exporter's real hostname:port.
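# Tip: after editing this file you can validate it with promtool, which ships in the prom/prometheus image.
# This sketch assumes you run it from the repository root:
#   docker run --rm -v "$(pwd)/monitoring_config/prometheus.yml:/prometheus.yml" \
#     --entrypoint promtool prom/prometheus:latest check config /prometheus.yml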
--------------------------------------------------------------------------------
/systemd_service_samples/node_exporter.service:
--------------------------------------------------------------------------------
[Unit]
Description=Node Exporter
After=network.target

[Service]
# User and group used to run this service. If you want to use a different user and/or group, please change them
User=node_exporter
Group=node_exporter
Type=simple
# Some node exporter collectors are enabled by default. Please see https://github.com/prometheus/node_exporter#collectors for details.
# Please modify the below statement as per your requirements. You can enable and disable collectors as you wish.
#
# This service specifies a textfile collector directory. If you are not using textfile collectors you can remove it from
# the below statement. More details about the textfile collector: https://github.com/prometheus/node_exporter#textfile-collector
# If you are using the textfile collector as per the below statement, make sure the /textfile_collector_result directory
# exists
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.processes --collector.buddyinfo --collector.textfile.directory="/textfile_collector_result"

[Install]
WantedBy=multi-user.target

--------------------------------------------------------------------------------
/systemd_service_samples/smartmon_collector.service:
--------------------------------------------------------------------------------
[Unit]
Description=Node Exporter smartmon textfile collector
After=network.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/run_smartmon.sh

[Install]
WantedBy=multi-user.target

--------------------------------------------------------------------------------
/textfile_collectors/run_smartmon.sh:
--------------------------------------------------------------------------------
#!/bin/bash

while true
do
  # Change to the directory containing the smartmon.sh script. If you cloned this repo into your home directory, it
  # would be the path in the statement below; otherwise, please adjust it accordingly.
  cd "$HOME/rpi_monitoring_with_prometheus/textfile_collectors"
  # Run the script and save the output to a file in the /textfile_collector_result directory. You need to have this
  # directory created.
  ./smartmon.sh > /textfile_collector_result/smartmon.tmp
  # Rename the file; the node exporter textfile collector only parses files with the .prom extension
  mv /textfile_collector_result/smartmon.tmp /textfile_collector_result/smartmon.prom
  # Interval at which to measure stats. Change it as per your needs. It's in seconds
  sleep 30
done
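# One-off test of the collector, if you want to check it before installing the service (illustrative; assumes
# smartmontools is installed and you are in the textfile_collectors directory):
#   sudo mkdir -p /textfile_collector_result
#   sudo ./smartmon.sh > /textfile_collector_result/smartmon.prom
#   head /textfile_collector_result/smartmon.prom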
--------------------------------------------------------------------------------
/textfile_collectors/smartmon.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# Script informed by the collectd monitoring script for smartmontools (using smartctl)
# by Samuel B. (c) 2012
# source at: http://devel.dob.sk/collectd-scripts/

# TODO: This probably needs to be a little more complex. The raw numbers can have more
# data in them than you'd think.
# http://arstechnica.com/civis/viewtopic.php?p=22062211

# Formatting done via shfmt -i 2
# https://github.com/mvdan/sh

parse_smartctl_attributes_awk="$(
  cat <<'SMARTCTLAWK'
$1 ~ /^ *[0-9]+$/ && $2 ~ /^[a-zA-Z0-9_-]+$/ {
  gsub(/-/, "_");
  printf "%s_value{%s,smart_id=\"%s\"} %d\n", $2, labels, $1, $4
  printf "%s_worst{%s,smart_id=\"%s\"} %d\n", $2, labels, $1, $5
  printf "%s_threshold{%s,smart_id=\"%s\"} %d\n", $2, labels, $1, $6
  printf "%s_raw_value{%s,smart_id=\"%s\"} %e\n", $2, labels, $1, $10
}
SMARTCTLAWK
)"

smartmon_attrs="$(
  cat <<'SMARTMONATTRS'
airflow_temperature_cel
command_timeout
current_pending_sector
end_to_end_error
erase_fail_count
g_sense_error_rate
hardware_ecc_recovered
host_reads_mib
host_reads_32mib
host_writes_mib
host_writes_32mib
load_cycle_count
media_wearout_indicator
wear_leveling_count
nand_writes_1gib
offline_uncorrectable
power_cycle_count
power_on_hours
program_fail_count
raw_read_error_rate
reallocated_event_count
reallocated_sector_ct
reported_uncorrect
sata_downshift_count
seek_error_rate
spin_retry_count
spin_up_time
start_stop_count
temperature_case
temperature_celsius
temperature_internal
total_lbas_read
total_lbas_written
udma_crc_error_count
unsafe_shutdown_count
workld_host_reads_perc
workld_media_wear_indic
workload_minutes
SMARTMONATTRS
)"
smartmon_attrs="$(echo ${smartmon_attrs} | xargs | tr ' ' '|')"

parse_smartctl_attributes() {
  local disk="$1"
  local disk_type="$2"
  local labels="disk=\"${disk}\",type=\"${disk_type}\""
  local vars="$(echo "${smartmon_attrs}" | xargs | tr ' ' '|')"
  sed 's/^ \+//g' |
    awk -v labels="${labels}" "${parse_smartctl_attributes_awk}" 2>/dev/null |
    tr A-Z a-z |
    grep -E "(${smartmon_attrs})"
}

parse_smartctl_scsi_attributes() {
  local disk="$1"
  local disk_type="$2"
  local labels="disk=\"${disk}\",type=\"${disk_type}\""
  while read line; do
    attr_type="$(echo "${line}" | tr '=' ':' | cut -f1 -d: | sed 's/^ \+//g' | tr ' ' '_')"
    attr_value="$(echo "${line}" | tr '=' ':' | cut -f2 -d: | sed 's/^ \+//g')"
    case "${attr_type}" in
    number_of_hours_powered_up_) power_on="$(echo "${attr_value}" | awk '{ printf "%e\n", $1 }')" ;;
    Current_Drive_Temperature) temp_cel="$(echo ${attr_value} | cut -f1 -d' ' | awk '{ printf "%e\n", $1 }')" ;;
    Blocks_read_from_cache_and_sent_to_initiator_) lbas_read="$(echo ${attr_value} | awk '{ printf "%e\n", $1 }')" ;;
    Accumulated_start-stop_cycles) power_cycle="$(echo ${attr_value} | awk '{ printf "%e\n", $1 }')" ;;
    Elements_in_grown_defect_list) grown_defects="$(echo ${attr_value} | awk '{ printf "%e\n", $1 }')" ;;
    esac
  done
  [ ! -z "$power_on" ] && echo "power_on_hours_raw_value{${labels},smart_id=\"9\"} ${power_on}"
  [ ! -z "$temp_cel" ] && echo "temperature_celsius_raw_value{${labels},smart_id=\"194\"} ${temp_cel}"
  [ ! -z "$lbas_read" ] && echo "total_lbas_read_raw_value{${labels},smart_id=\"242\"} ${lbas_read}"
  [ ! -z "$power_cycle" ] && echo "power_cycle_count_raw_value{${labels},smart_id=\"12\"} ${power_cycle}"
  [ ! -z "$grown_defects" ] && echo "grown_defects_count_raw_value{${labels},smart_id=\"12\"} ${grown_defects}"
}
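# The parsers above emit raw metric lines of the form name{labels} value; format_output (defined further down)
# later prefixes them with smartmon_ and adds HELP/TYPE headers. Illustrative examples:
#   temperature_celsius_raw_value{disk="/dev/sda",type="sat",smart_id="194"} 3.400000e+01
#   power_on_hours_raw_value{disk="/dev/sda",type="sat",smart_id="9"} 1.234500e+04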
-z "$grown_defects" ] && echo "grown_defects_count_raw_value{${labels},smart_id=\"12\"} ${grown_defects}" 100 | } 101 | 102 | parse_smartctl_info() { 103 | local -i smart_available=0 smart_enabled=0 smart_healthy=0 104 | local disk="$1" disk_type="$2" 105 | local model_family='' device_model='' serial_number='' fw_version='' vendor='' product='' revision='' lun_id='' 106 | while read line; do 107 | info_type="$(echo "${line}" | cut -f1 -d: | tr ' ' '_')" 108 | info_value="$(echo "${line}" | cut -f2- -d: | sed 's/^ \+//g' | sed 's/"/\\"/')" 109 | case "${info_type}" in 110 | Model_Family) model_family="${info_value}" ;; 111 | Device_Model) device_model="${info_value}" ;; 112 | Serial_Number) serial_number="${info_value}" ;; 113 | Firmware_Version) fw_version="${info_value}" ;; 114 | Vendor) vendor="${info_value}" ;; 115 | Product) product="${info_value}" ;; 116 | Revision) revision="${info_value}" ;; 117 | Logical_Unit_id) lun_id="${info_value}" ;; 118 | esac 119 | if [[ "${info_type}" == 'SMART_support_is' ]]; then 120 | case "${info_value:0:7}" in 121 | Enabled) smart_enabled=1 ;; 122 | Availab) smart_available=1 ;; 123 | Unavail) smart_available=0 ;; 124 | esac 125 | fi 126 | if [[ "${info_type}" == 'SMART_overall-health_self-assessment_test_result' ]]; then 127 | case "${info_value:0:6}" in 128 | PASSED) smart_healthy=1 ;; 129 | esac 130 | elif [[ "${info_type}" == 'SMART_Health_Status' ]]; then 131 | case "${info_value:0:2}" in 132 | OK) smart_healthy=1 ;; 133 | esac 134 | fi 135 | done 136 | echo "device_info{disk=\"${disk}\",type=\"${disk_type}\",vendor=\"${vendor}\",product=\"${product}\",revision=\"${revision}\",lun_id=\"${lun_id}\",model_family=\"${model_family}\",device_model=\"${device_model}\",serial_number=\"${serial_number}\",firmware_version=\"${fw_version}\"} 1" 137 | echo "device_smart_available{disk=\"${disk}\",type=\"${disk_type}\"} ${smart_available}" 138 | echo "device_smart_enabled{disk=\"${disk}\",type=\"${disk_type}\"} ${smart_enabled}" 139 | echo "device_smart_healthy{disk=\"${disk}\",type=\"${disk_type}\"} ${smart_healthy}" 140 | } 141 | 142 | output_format_awk="$( 143 | cat <<'OUTPUTAWK' 144 | BEGIN { v = "" } 145 | v != $1 { 146 | print "# HELP smartmon_" $1 " SMART metric " $1; 147 | print "# TYPE smartmon_" $1 " gauge"; 148 | v = $1 149 | } 150 | {print "smartmon_" $0} 151 | OUTPUTAWK 152 | )" 153 | 154 | format_output() { 155 | sort | 156 | awk -F'{' "${output_format_awk}" 157 | } 158 | 159 | smartctl_version="$(/usr/sbin/smartctl -V | head -n1 | awk '$1 == "smartctl" {print $2}')" 160 | 161 | echo "smartctl_version{version=\"${smartctl_version}\"} 1" | format_output 162 | 163 | if [[ "$(expr "${smartctl_version}" : '\([0-9]*\)\..*')" -lt 6 ]]; then 164 | exit 165 | fi 166 | 167 | device_list="$(/usr/sbin/smartctl --scan-open | awk '/^\/dev/{print $1 "|" $3}')" 168 | 169 | for device in ${device_list}; do 170 | disk="$(echo ${device} | cut -f1 -d'|')" 171 | type="$(echo ${device} | cut -f2 -d'|')" 172 | active=1 173 | echo "smartctl_run{disk=\"${disk}\",type=\"${type}\"}" "$(TZ=UTC date '+%s')" 174 | # Check if the device is in a low-power mode 175 | /usr/sbin/smartctl -n standby -d "${type}" "${disk}" > /dev/null || active=0 176 | echo "device_active{disk=\"${disk}\",type=\"${type}\"}" "${active}" 177 | # Skip further metrics to prevent the disk from spinning up 178 | #test ${active} -eq 0 && continue 179 | # Get the SMART information and health 180 | /usr/sbin/smartctl -i -H -d "${type}" "${disk}" | parse_smartctl_info "${disk}" "${type}" 181 | # Get the 
smartctl_version="$(/usr/sbin/smartctl -V | head -n1 | awk '$1 == "smartctl" {print $2}')"

echo "smartctl_version{version=\"${smartctl_version}\"} 1" | format_output

if [[ "$(expr "${smartctl_version}" : '\([0-9]*\)\..*')" -lt 6 ]]; then
  exit
fi

device_list="$(/usr/sbin/smartctl --scan-open | awk '/^\/dev/{print $1 "|" $3}')"

for device in ${device_list}; do
  disk="$(echo ${device} | cut -f1 -d'|')"
  type="$(echo ${device} | cut -f2 -d'|')"
  active=1
  echo "smartctl_run{disk=\"${disk}\",type=\"${type}\"}" "$(TZ=UTC date '+%s')"
  # Check if the device is in a low-power mode
  /usr/sbin/smartctl -n standby -d "${type}" "${disk}" > /dev/null || active=0
  echo "device_active{disk=\"${disk}\",type=\"${type}\"}" "${active}"
  # Skip further metrics to prevent the disk from spinning up
  #test ${active} -eq 0 && continue
  # Get the SMART information and health
  /usr/sbin/smartctl -i -H -d "${type}" "${disk}" | parse_smartctl_info "${disk}" "${type}"
  # Get the SMART attributes
  case ${type} in
  sat) /usr/sbin/smartctl -A -d "${type}" "${disk}" | parse_smartctl_attributes "${disk}" "${type}" ;;
  sat+megaraid*) /usr/sbin/smartctl -A -d "${type}" "${disk}" | parse_smartctl_attributes "${disk}" "${type}" ;;
  scsi) /usr/sbin/smartctl -A -d "${type}" "${disk}" | parse_smartctl_scsi_attributes "${disk}" "${type}" ;;
  megaraid*) /usr/sbin/smartctl -A -d "${type}" "${disk}" | parse_smartctl_scsi_attributes "${disk}" "${type}" ;;
  *)
    echo "disk type is not sat, scsi or megaraid but ${type}"
    exit
    ;;
  esac
done | format_output

--------------------------------------------------------------------------------