├── .gitignore ├── LICENSE ├── README.md ├── bin └── zpool ├── build_tarball.sh ├── prometheus-zfs.go ├── utils.go ├── zpool.go └── zpool_test.go /.gitignore: -------------------------------------------------------------------------------- 1 | bin/ 2 | prometheus-zfs 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Eric Ripa 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # prometheus-zfs 2 | 3 | ---- 4 | 5 | **Note:** This project is **WIP** and was mainly built in order to learn more Go and Prometheus. 
Given the nature of ZFS and the zpool command, this service requires root to work. This is generally a bad idea for a network-exposed service. Because versions v0.1.0 and v0.1.1 require root to run, they are **not** recommended for production use. 6 | 7 | I've come to realize that the Prometheus node_exporter supports a text-file parsing feature. My intention is to rewrite this program so that, instead of being a network service for Prometheus, it simply produces text output in the [Prometheus Exposition Format](http://prometheus.io/docs/instrumenting/exposition_formats/) that node_exporter can then consume. 8 | 9 | ---- 10 | 11 | 12 | Prometheus metric endpoint to get ZFS pool stats, written in Go. 13 | 14 | Using Go gives the nice benefit of static binaries on different platforms. The only external dependency is 'zpool', which you probably have where you want to use this. 15 | 16 | Heavily borrowed from my [nagios-zfs-go](https://github.com/eripa/nagios-zfs-go) utility, which is used to do Nagios status checks of zpools. 17 | 18 | ## Usage 19 | 20 | prometheus-zfs runs in the foreground, providing an HTTP endpoint for Prometheus collection. 21 | 22 | The listen port and endpoint name can be configured using command-line flags, as shown in the help text. 
23 | 24 | Usage of ./prometheus-zfs: 25 | -endpoint string 26 | HTTP endpoint to export data on (default "metrics") 27 | -p string 28 | what ZFS pool to monitor (shorthand) (default "tank") 29 | -pool string 30 | what ZFS pool to monitor (default "tank") 31 | -port string 32 | Port to listen on (default "8080") 33 | -version 34 | display current tool version 35 | 36 | ## Example run 37 | 38 | Launch exporter: 39 | 40 | $ ./prometheus-zfs -p zones -port 8090 -endpoint zonesmetrics 41 | Starting zpool metrics exporter on :8090/zonesmetrics 42 | 43 | And collect using curl: 44 | 45 | $ curl http://localhost:8090/zonesmetrics 2> /dev/null | grep "^zpool" 46 | zpool_capacity_percentage 53 47 | zpool_faulted_providers_count 0 48 | zpool_online_providers_count 6 49 | 50 | ## Build 51 | 52 | I recommend using Go 1.5, which makes cross-compilation a lot easier. 53 | 54 | SmartOS (x86_64): 55 | 56 | env GOOS=solaris GOARCH=amd64 go build -o bin/prometheus-zfs-solaris 57 | 58 | Linux (x86_64): 59 | 60 | env GOOS=linux GOARCH=amd64 go build -o bin/prometheus-zfs-linux 61 | 62 | Mac OS X: 63 | 64 | env GOOS=darwin GOARCH=amd64 go build -o bin/prometheus-zfs-mac 65 | 66 | ## Tests 67 | 68 | There are some simple test cases to make sure that no insane results occur. All test cases are based on a raidz2 setup with 6 disks. More pool configuration variants would be good to add, and one could also create different real pools using disk images. Contributions are welcome! 69 | 70 | Run `go test -v` to run the tests with some verbosity. 71 | 72 | ## bin/zpool 73 | 74 | `bin/zpool` is a shell script that can be used to fake a 'zpool' command on your local development machine where you might not have ZFS installed. It simply runs zpool over SSH on a remote host. Set the environment variable ZFSHOST to the host you want to run zpool on. 
75 | 76 | The script also has some simple sed statements prepared (you will have to remove the hash signs manually) to fake different pool statuses for testing purposes. 77 | 78 | ## License 79 | 80 | The MIT License, see separate LICENSE file for full text. 81 | 82 | ## Contributing 83 | 84 | * Fork it 85 | * Create your feature branch (git checkout -b my-new-feature) 86 | * Commit your changes (git commit -am 'Add some feature') 87 | * Push to the branch (git push origin my-new-feature) 88 | * Create new Pull Request 89 | -------------------------------------------------------------------------------- /bin/zpool: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # A simple script that can be used in case the development machine doesn't 3 | # have a ZFS pool thus no 'zpool' command available. 4 | # 5 | # Set the environment variable ZFSHOST to a valid host and put this folder in 6 | # your PATH, then you should have a zpool command on your dev machine 7 | # 8 | 9 | e=$(ssh -q root@${ZFSHOST} zpool $@) 10 | echo "$e" 11 | 12 | # Below line can be used to fake a degraded pool 13 | #echo "$e" | sed '1 s/ONLINE/DEGRADED/' | sed '2 s/ONLINE/DEGRADED/' | sed '10,11 s/ONLINE/UNAVAIL/' 14 | # Below line can be used to fake capacity on pool (88% in this case) 15 | #echo "$e" | sed -E 's/[0-9]+/88/' 16 | 17 | -------------------------------------------------------------------------------- /build_tarball.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # Shell script for building binaries for all relevant platforms 4 | 5 | SCRIPT_DIR=$(dirname $0) 6 | cd ${SCRIPT_DIR} 7 | 8 | go test 9 | if [ "$?" 
-ne "0" ] ; then 10 | echo "go test failed, aborting" 11 | exit 1 12 | fi 13 | 14 | # Build 15 | declare -a TARGETS=(darwin linux solaris freebsd) 16 | for target in ${TARGETS[@]} ; do 17 | output="prometheus-zfs-${target}" 18 | echo "Building for ${target}, output bin/${output}" 19 | export GOOS=${target} 20 | export GOARCH=amd64 21 | go build -o bin/${output} 22 | done 23 | 24 | # Create a tar-ball for release 25 | DIR_NAME=${PWD##*/} # name of current directory, presumably prometheus-zfs 26 | VERSION=$(git describe --abbrev=0 --tags 2> /dev/null) # this doesn't actually seem to work 27 | if [ "$?" -ne 0 ] ; then 28 | # No tag, use commit hash 29 | HASH=$(git rev-parse HEAD) 30 | VERSION=${HASH:0:7} 31 | fi 32 | 33 | cd ../ 34 | TARBALL="prometheus-zfs-${VERSION}.tar.gz" 35 | tar -cf ${TARBALL} --exclude=.git -vz ${DIR_NAME} 36 | echo "Created: ${PWD}/${TARBALL}" 37 | -------------------------------------------------------------------------------- /prometheus-zfs.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "log" 7 | "net/http" 8 | "os" 9 | "sync" 10 | 11 | "github.com/prometheus/client_golang/prometheus" 12 | ) 13 | 14 | const ( 15 | toolVersion = "0.1.1" 16 | ) 17 | 18 | // Exporter collects zpool stats from the given zpool and exports them using 19 | // the prometheus metrics package. 20 | type Exporter struct { 21 | mutex sync.RWMutex 22 | 23 | poolUsage, providersFaulted, providersOnline prometheus.Gauge 24 | zpool *zpool 25 | } 26 | 27 | // NewExporter returns an initialized Exporter. 28 | func NewExporter(zp *zpool) *Exporter { 29 | // Init and return our exporter. 
30 | return &Exporter{ 31 | zpool: zp, 32 | poolUsage: prometheus.NewGauge(prometheus.GaugeOpts{ 33 | Name: "zpool_capacity_percentage", 34 | Help: "Current zpool capacity level", 35 | }), 36 | providersOnline: prometheus.NewGauge(prometheus.GaugeOpts{ 37 | Name: "zpool_online_providers_count", 38 | Help: "Number of ONLINE zpool providers (disks)", 39 | }), 40 | providersFaulted: prometheus.NewGauge(prometheus.GaugeOpts{ 41 | Name: "zpool_faulted_providers_count", 42 | Help: "Number of FAULTED/UNAVAIL zpool providers (disks)", 43 | }), 44 | } 45 | } 46 | 47 | // Describe describes all the metrics ever exported by the zpool exporter. It 48 | // implements prometheus.Collector. 49 | func (e *Exporter) Describe(ch chan<- *prometheus.Desc) { 50 | ch <- e.poolUsage.Desc() 51 | ch <- e.providersOnline.Desc() 52 | ch <- e.providersFaulted.Desc() 53 | } 54 | 55 | // Collect fetches the stats from configured ZFS pool and delivers them 56 | // as Prometheus metrics. It implements prometheus.Collector. 57 | func (e *Exporter) Collect(ch chan<- prometheus.Metric) { 58 | 59 | e.mutex.Lock() // To protect metrics from concurrent collects. 
60 | defer e.mutex.Unlock() 61 | 62 | e.zpool.getStatus() 63 | e.poolUsage.Set(float64(e.zpool.capacity)) 64 | e.providersOnline.Set(float64(e.zpool.online)) 65 | e.providersFaulted.Set(float64(e.zpool.faulted)) 66 | 67 | ch <- e.poolUsage 68 | ch <- e.providersOnline 69 | ch <- e.providersFaulted 70 | } 71 | 72 | var ( 73 | zfsPool string 74 | listenPort string 75 | metricsHandle string 76 | versionCheck bool 77 | ) 78 | 79 | func init() { 80 | const ( 81 | defaultPool = "tank" 82 | selectedPool = "what ZFS pool to monitor" 83 | versionUsage = "display current tool version" 84 | defaultPort = "8080" 85 | portUsage = "Port to listen on" 86 | defaultHandle = "metrics" 87 | handleUsage = "HTTP endpoint to export data on" 88 | ) 89 | flag.StringVar(&zfsPool, "pool", defaultPool, selectedPool) 90 | flag.StringVar(&zfsPool, "p", defaultPool, selectedPool+" (shorthand)") 91 | flag.StringVar(&listenPort, "port", defaultPort, portUsage) 92 | flag.StringVar(&metricsHandle, "endpoint", defaultHandle, handleUsage) 93 | flag.BoolVar(&versionCheck, "version", false, versionUsage) 94 | flag.Parse() 95 | } 96 | 97 | func main() { 98 | if versionCheck { 99 | fmt.Printf("prometheus-zfs v%s (https://github.com/eripa/prometheus-zfs)\n", toolVersion) 100 | os.Exit(0) 101 | } 102 | err := checkExistance(zfsPool) 103 | if err != nil { 104 | log.Fatal(err) 105 | } 106 | z := zpool{name: zfsPool} 107 | z.getStatus() 108 | 109 | exporter := NewExporter(&z) 110 | prometheus.MustRegister(exporter) 111 | 112 | fmt.Printf("Starting zpool metrics exporter on :%s/%s\n", listenPort, metricsHandle) 113 | http.Handle("/"+metricsHandle, prometheus.Handler()) 114 | log.Fatal(http.ListenAndServe(":"+listenPort, nil)) // report the error if the listener cannot start or stops 115 | 116 | } 117 | -------------------------------------------------------------------------------- /utils.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import "strings" 4 | 5 | func substringInSlice(str string, list []string) bool { 6 
| for _, v := range list { 7 | if strings.Contains(str, v) { 8 | return true 9 | } 10 | } 11 | return false 12 | } 13 | -------------------------------------------------------------------------------- /zpool.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "errors" 5 | "fmt" 6 | "log" 7 | "os/exec" 8 | "strconv" 9 | "strings" 10 | ) 11 | 12 | type zpool struct { 13 | name string 14 | capacity int64 15 | healthy bool 16 | status string 17 | online int64 18 | faulted int64 19 | } 20 | 21 | func (z *zpool) checkHealth(output string) (err error) { 22 | output = strings.Trim(output, "\n") 23 | if output == "ONLINE" { 24 | z.healthy = true 25 | } else if output == "DEGRADED" || output == "FAULTED" { 26 | z.healthy = false 27 | } else { 28 | z.healthy = false // just to make sure 29 | err = errors.New("Unknown status") 30 | } 31 | return err 32 | } 33 | 34 | func (z *zpool) getCapacity(output string) (err error) { 35 | s := strings.Split(output, "%")[0] 36 | z.capacity, err = strconv.ParseInt(s, 0, 8) 37 | if err != nil { 38 | return err 39 | } 40 | return err 41 | } 42 | 43 | func (z *zpool) getProviders(output string) (err error) { 44 | nonProviderLines := []string{ 45 | z.name, 46 | "state:", 47 | "mirror-", 48 | "raid0-", 49 | "raid10-", 50 | "raidz-", 51 | "raidz2-", 52 | "raidz3-", 53 | } 54 | lines := strings.Split(output, "\n") 55 | z.status = strings.Split(lines[1], " ")[2] 56 | 57 | // Count all providers, ONLINE and FAULTED 58 | var fcount int64 59 | var dcount int64 60 | for _, line := range lines { 61 | if (strings.Contains(line, "FAULTED") || strings.Contains(line, "UNAVAIL")) && !substringInSlice(line, nonProviderLines) { 62 | fcount = fcount + 1 63 | } else if strings.Contains(line, "ONLINE") && !substringInSlice(line, nonProviderLines) { 64 | dcount = dcount + 1 65 | } 66 | } 67 | z.faulted = fcount 68 | z.online = dcount 69 | 70 | if z.status != "ONLINE" && z.status != "DEGRADED" 
&& z.status != "FAULTED" { 71 | z.faulted = 1 // fake faulted if there is a parsing error or other status 72 | err = errors.New("Error parsing faulted/unavailable providers") 73 | } 74 | return 75 | } 76 | 77 | func (z *zpool) getStatus() { 78 | output := runZpoolCommand([]string{"status", z.name}) 79 | err := z.getProviders(output) 80 | if err != nil { 81 | log.Fatal("Error parsing zpool status") 82 | } 83 | output = runZpoolCommand([]string{"list", "-H", "-o", "health", z.name}) 84 | err = z.checkHealth(output) 85 | if err != nil { 86 | log.Fatal("Error parsing zpool list -H -o health ", z.name) 87 | } 88 | output = runZpoolCommand([]string{"list", "-H", "-o", "cap", z.name}) 89 | err = z.getCapacity(output) 90 | if err != nil { 91 | log.Fatal("Error parsing zpool capacity") 92 | } 93 | } 94 | 95 | func checkExistance(pool string) (err error) { 96 | output := runZpoolCommand([]string{"list", pool}) 97 | if strings.Contains(fmt.Sprintf("%s", output), "no such pool") { 98 | err = errors.New("No such pool") 99 | } 100 | return 101 | } 102 | 103 | func runZpoolCommand(args []string) string { 104 | zpoolPath, err := exec.LookPath("zpool") 105 | if err != nil { 106 | log.Fatal("Could not find zpool in PATH") 107 | } 108 | cmd := exec.Command(zpoolPath, args...) 
109 | out, _ := cmd.CombinedOutput() 110 | return string(out) 111 | } 112 | -------------------------------------------------------------------------------- /zpool_test.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "testing" 5 | ) 6 | 7 | func TestCheckHealth(t *testing.T) { 8 | z := zpool{name: "tank"} 9 | 10 | // Test ONLINE 11 | err := z.checkHealth("ONLINE") 12 | 13 | if err != nil { 14 | t.Errorf("Error in checkHealth (%s)", err) 15 | } 16 | if z.healthy == false { 17 | t.Errorf("healthy should equal true when given 'ONLINE'") 18 | } 19 | 20 | // Test FAULTED 21 | err = z.checkHealth("FAULTED") 22 | if err != nil { 23 | t.Errorf("Error in checkHealth (%s)", err) 24 | } 25 | if z.healthy == true { 26 | t.Errorf("healthy should equal false when given 'FAULTED'") 27 | } 28 | 29 | // Test DEGRADED 30 | err = z.checkHealth("DEGRADED") 31 | if err != nil { 32 | t.Errorf("Error in checkHealth (%s)", err) 33 | } 34 | if z.healthy == true { 35 | t.Errorf("healthy should equal false when given 'DEGRADED'") 36 | } 37 | 38 | // Test other 39 | err = z.checkHealth("other status") 40 | if err == nil { 41 | t.Errorf("other status should produce an error in checkHealth") 42 | } 43 | if z.healthy == true { 44 | t.Errorf("healthy should equal false when given unknown input") 45 | } 46 | } 47 | 48 | func TestGetCapacity(t *testing.T) { 49 | z := zpool{name: "tank"} 50 | 51 | // Test average capacity 52 | err := z.getCapacity("51%") 53 | 54 | if err != nil { 55 | t.Errorf("Error in getCapacity") 56 | } 57 | if z.capacity != 51 { 58 | t.Errorf("Non-matching integer, should be 51") 59 | } 60 | 61 | // Test non-integer 62 | err = z.getCapacity("foo") 63 | 64 | if err == nil { 65 | t.Errorf("Non-integer should produce error in getCapacity") 66 | } 67 | } 68 | 69 | func TestGetProviders(t *testing.T) { 70 | z := zpool{ 71 | name: "tank", 72 | faulted: -1, // set to -1 to make sure we actually 
test the all-ONLINE case 73 | } 74 | 75 | // Test all ONLINE 76 | err := z.getProviders(` pool: tank 77 | state: ONLINE 78 | scan: scrub repaired 0 in 1h1m with 0 errors on Thu Jan 1 13:37:00 1970 79 | config: 80 | 81 | NAME STATE READ WRITE CKSUM 82 | zones ONLINE 0 0 0 83 | raidz2-0 ONLINE 0 0 0 84 | c0t5000C5006A6E87D9d0 ONLINE 0 0 0 85 | c0t5000C50024CAAFFCd0 ONLINE 0 0 0 86 | c0t5000CCA249D27B4Ed0 ONLINE 0 0 0 87 | c0t5000C5004425F6F6d0 ONLINE 0 0 0 88 | c0t5000C500652DD0EFd0 ONLINE 0 0 0 89 | c0t50014EE25A580141d0 ONLINE 0 0 0 90 | 91 | errors: No known data errors`) 92 | if err != nil { 93 | t.Errorf("Error in getProviders") 94 | } 95 | if z.faulted != 0 { 96 | t.Errorf("Incorrect amount of faulted, should be 0.") 97 | } 98 | 99 | // Test degraded state 100 | err = z.getProviders(` pool: tank 101 | state: DEGRADED 102 | scan: scrub repaired 0 in 1h1m with 0 errors on Thu Jan 1 13:37:00 1970 103 | config: 104 | 105 | NAME STATE READ WRITE CKSUM 106 | zones DEGRADED 0 0 0 107 | raidz2-0 ONLINE 0 0 0 108 | c0t5000C5006A6E87D9d0 FAULTED 0 0 0 109 | c0t5000C50024CAAFFCd0 ONLINE 0 0 0 110 | c0t5000CCA249D27B4Ed0 ONLINE 0 0 0 111 | c0t5000C5004425F6F6d0 UNAVAIL 0 0 0 112 | c0t5000C500652DD0EFd0 ONLINE 0 0 0 113 | c0t50014EE25A580141d0 ONLINE 0 0 0 114 | 115 | errors: No known data errors`) 116 | if err != nil { 117 | t.Errorf("Error in getProviders") 118 | } 119 | if z.faulted != 2 { 120 | t.Errorf("Incorrect amount of faulted, should be 2.") 121 | } 122 | 123 | // Test other output 124 | err = z.getProviders(` pool: tank 125 | state: Oother`) 126 | if err == nil { 127 | t.Errorf("Should produce parsing error in getProviders") 128 | } 129 | if z.faulted != 1 { 130 | t.Errorf("Incorrect amount of faulted, should be 1 when parsing error.") 131 | } 132 | 133 | // Test all ONLINE 134 | err = z.getProviders(` pool: tank 135 | state: ONLINE 136 | scan: scrub repaired 0 in 1h1m with 0 errors on Thu Jan 1 13:37:00 1970 137 | config: 138 | 139 | NAME STATE READ WRITE CKSUM 
140 | tank ONLINE 0 0 0 141 | raidz2-0 ONLINE 0 0 0 142 | c0t5000C5006A6E87D9d0 ONLINE 0 0 0 143 | c0t5000C50024CAAFFCd0 ONLINE 0 0 0 144 | c0t5000CCA249D27B4Ed0 ONLINE 0 0 0 145 | c0t5000C5004425F6F6d0 ONLINE 0 0 0 146 | c0t5000C500652DD0EFd0 ONLINE 0 0 0 147 | c0t50014EE25A580141d0 ONLINE 0 0 0 148 | 149 | errors: No known data errors`) 150 | if err != nil { 151 | t.Errorf("Error in getProviders") 152 | } 153 | if z.online != 6 { 154 | t.Errorf("Incorrect amount of online (%d) providers, should be 6.", z.online) 155 | } 156 | 157 | // Test degraded state 158 | err = z.getProviders(` pool: tank 159 | state: DEGRADED 160 | scan: scrub repaired 0 in 1h1m with 0 errors on Thu Jan 1 13:37:00 1970 161 | config: 162 | 163 | NAME STATE READ WRITE CKSUM 164 | zones DEGRADED 0 0 0 165 | raidz2-0 ONLINE 0 0 0 166 | c0t5000C5006A6E87D9d0 FAULTED 0 0 0 167 | c0t5000C50024CAAFFCd0 ONLINE 0 0 0 168 | c0t5000CCA249D27B4Ed0 ONLINE 0 0 0 169 | c0t5000C5004425F6F6d0 UNAVAIL 0 0 0 170 | c0t5000C500652DD0EFd0 ONLINE 0 0 0 171 | c0t50014EE25A580141d0 ONLINE 0 0 0 172 | 173 | errors: No known data errors`) 174 | if err != nil { 175 | t.Errorf("Error in getProviders") 176 | } 177 | if z.online != 4 { 178 | t.Errorf("Incorrect amount of online (%d) providers, should be 4.", z.online) 179 | } 180 | } 181 | --------------------------------------------------------------------------------