├── .gitignore ├── monitor ├── vars │ └── main.yml ├── handlers │ └── main.yml ├── defaults │ └── main.yml ├── .yamllint ├── README.md ├── tasks │ ├── vagrant_swarm.yml │ └── main.yml ├── templates │ ├── monitor-stack.yml.j2 │ └── prometheus.yml.j2 └── meta │ └── main.yml ├── images ├── pihole.png ├── config_host_ip.png ├── grafana_login.png ├── node_exporter.png ├── pihole_exporter.png ├── cadvisor_exporter.png ├── complete_dashboard.png ├── import_dashboard_01.png ├── import_dashboard_02.png ├── prometheus_query_01.png ├── prometheus_targets.png ├── grafana_node_exporter.png ├── grafana_zfs_job_node.png ├── config_add_data_source.png ├── grafana_dashboard_save.png ├── grafana_pihole_dashboard.png ├── grafana_save_dashboard.png ├── config_dropdown_prometheus.png ├── grafana_save_dashboard_error.png ├── grafana_docker_swarm_dashboard_before.png └── grafana_docker_swarm_dashboard_with_cadvisor.png ├── playbook.yml ├── Jenkinsfile ├── Makefile ├── gh-md-toc ├── README.md └── PART_02.md /.gitignore: -------------------------------------------------------------------------------- 1 | data/* 2 | *molecule* 3 | pihole_exporter/* 4 | -------------------------------------------------------------------------------- /monitor/vars/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # vars file for monitor 3 | -------------------------------------------------------------------------------- /monitor/handlers/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # handlers file for monitor 3 | -------------------------------------------------------------------------------- /monitor/defaults/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # defaults file for monitor 3 | monitor_dir: '/data' 4 | -------------------------------------------------------------------------------- /images/pihole.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/pihole.png -------------------------------------------------------------------------------- /images/config_host_ip.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/config_host_ip.png -------------------------------------------------------------------------------- /images/grafana_login.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_login.png -------------------------------------------------------------------------------- /images/node_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/node_exporter.png -------------------------------------------------------------------------------- /images/pihole_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/pihole_exporter.png -------------------------------------------------------------------------------- /playbook.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - hosts: all 3 | vars: 4 | monitor_dir: /shredder_pool 5 | roles: 6 | - monitor 7 | 
-------------------------------------------------------------------------------- /images/cadvisor_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/cadvisor_exporter.png -------------------------------------------------------------------------------- /images/complete_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/complete_dashboard.png -------------------------------------------------------------------------------- /images/import_dashboard_01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/import_dashboard_01.png -------------------------------------------------------------------------------- /images/import_dashboard_02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/import_dashboard_02.png -------------------------------------------------------------------------------- /images/prometheus_query_01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/prometheus_query_01.png -------------------------------------------------------------------------------- /images/prometheus_targets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/prometheus_targets.png -------------------------------------------------------------------------------- /images/grafana_node_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_node_exporter.png -------------------------------------------------------------------------------- /images/grafana_zfs_job_node.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_zfs_job_node.png -------------------------------------------------------------------------------- /images/config_add_data_source.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/config_add_data_source.png -------------------------------------------------------------------------------- /images/grafana_dashboard_save.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_dashboard_save.png -------------------------------------------------------------------------------- /images/grafana_pihole_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_pihole_dashboard.png -------------------------------------------------------------------------------- /images/grafana_save_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_save_dashboard.png 
-------------------------------------------------------------------------------- /images/config_dropdown_prometheus.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/config_dropdown_prometheus.png -------------------------------------------------------------------------------- /images/grafana_save_dashboard_error.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_save_dashboard_error.png -------------------------------------------------------------------------------- /images/grafana_docker_swarm_dashboard_before.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_docker_swarm_dashboard_before.png -------------------------------------------------------------------------------- /images/grafana_docker_swarm_dashboard_with_cadvisor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_docker_swarm_dashboard_with_cadvisor.png -------------------------------------------------------------------------------- /monitor/.yamllint: -------------------------------------------------------------------------------- 1 | extends: default 2 | 3 | rules: 4 | braces: 5 | max-spaces-inside: 1 6 | level: error 7 | brackets: 8 | max-spaces-inside: 1 9 | level: error 10 | line-length: disable 11 | # NOTE(retr0h): Templates no longer fail this lint rule. 12 | # Uncomment if running old Molecule templates. 13 | # truthy: disable 14 | -------------------------------------------------------------------------------- /monitor/README.md: -------------------------------------------------------------------------------- 1 | Docker Swarm Monitor 2 | ========= 3 | 4 | Configure directory structure for docker swarm volume mounts. 5 | Deploy Prometheus, Cadvisor, Grafana, and more to docker swarm. 
6 | 7 | Requirements 8 | ------------ 9 | 10 | Role Variables 11 | -------------- 12 | 13 | monitor_dir: '/data' 14 | 15 | Dependencies 16 | ------------ 17 | 18 | Example Playbook 19 | ---------------- 20 | 21 | Including an example of how to use your role (for instance, with variables 22 | passed in as parameters) is always nice for users too: 23 | 24 | - hosts: servers 25 | roles: 26 | - { role: monitor, monitor_dir: '/not_data/' } 27 | 28 | License 29 | ------- 30 | 31 | GPLv2 32 | 33 | Author Information 34 | ------------------ 35 | 36 | [homelab.business](https://homelab.business/docker-swarm-monitoring-part-02-fixes-cadvisor-pihole/) 37 | -------------------------------------------------------------------------------- /monitor/tasks/vagrant_swarm.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: install dependencies 3 | become: true 4 | apt: 5 | name: "{{ item }}" 6 | state: present 7 | update_cache: true 8 | with_items: 9 | - nmap 10 | - apt-transport-https 11 | - ca-certificates 12 | - curl 13 | - software-properties-common 14 | 15 | - name: add docker ce repo key 16 | become: true 17 | apt_key: 18 | url: https://download.docker.com/linux/ubuntu/gpg 19 | state: present 20 | 21 | - name: add docker ce repo 22 | become: true 23 | apt_repository: 24 | repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" 25 | state: present 26 | filename: docker 27 | update_cache: true 28 | 29 | - name: install docker-ce 30 | become: true 31 | apt: 32 | name: docker-ce 33 | state: present 34 | update_cache: true 35 | 36 | - name: determine swarm status 37 | become: true 38 | shell: > 39 | docker info | egrep '^Swarm: ' | cut -d ' ' -f2 40 | register: swarm_status 41 | 42 | - name: initialize swarm cluster 43 | become: true 44 | shell: > 45 | docker swarm init 46 | --advertise-addr={{ ansible_default_ipv4.address | default('eth0') }}:2377 47 | when: "'inactive' in swarm_status.stdout_lines" 48 | -------------------------------------------------------------------------------- /Jenkinsfile: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env groovy 2 | 3 | node('master') { 4 | 5 | try { 6 | 7 | stage('build') { 8 | // Clean workspace 9 | deleteDir() 10 | // Checkout the app at the given commit sha from the webhook 11 | checkout scm 12 | } 13 | 14 | stage('test') { 15 | // Run any testing suites 16 | sh "echo 'WE ARE TESTING'" 17 | } 18 | 19 | stage('deploy') { 20 | sh "echo 'WE ARE DEPLOYING'" 21 | wrap([$class: 'AnsiColorBuildWrapper', colorMapName: "xterm"]) { 22 | ansibleTower( 23 | towerServer: 'shredder', 24 | jobTemplate: 'monitor', 25 | importTowerLogs: true, 26 | inventory: '', 27 | jobTags: '', 28 | limit: '', 29 | removeColor: false, 30 | verbose: true, 31 | credential: '', 32 | extraVars: '''--- 33 | test: "test"''' 34 | ) 35 | } 36 | } 37 | 38 | } catch(error) { 39 | throw error 40 | 41 | } finally { 42 | // Any cleanup operations needed, whether we hit an error or not 43 | 44 | } 45 | } 46 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | DATA_DIR?=./data 2 | VERSION = "0.1.0" 3 | 4 | all: dir config update deploy 5 | 6 | config: ## Copy prometheus.yml to config dir 7 | @cp prometheus.yml $(DATA_DIR)/etc/prometheus.yml 8 | 9 | dir: ## Create directories 10 | @mkdir -p \ 11 | $(DATA_DIR)/etc \ 12 | $(DATA_DIR)/grafana \ 13 | 
$(DATA_DIR)/prometheus \ 14 | || echo "TRY SUDO" 15 | @chmod 777 $(DATA_DIR)/prometheus 16 | @chown -R nobody:nobody $(DATA_DIR)/prometheus 17 | 18 | update: ## Pull latest docker images 19 | @docker pull grafana/grafana 20 | @docker pull prom/prometheus:latest 21 | @docker pull prom/node-exporter:latest 22 | 23 | deploy: ## Deploy to docker swarm 24 | @docker stack deploy -c docker-stack.yml monitor 25 | 26 | destroy: ## Docker stack rm && rm -rf data 27 | @docker stack rm monitor 28 | @rm -rf $(DATA_DIR) 29 | 30 | help: ## This help dialog 31 | @IFS=$$'\n' ; \ 32 | help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//'`); \ 33 | for help_line in $${help_lines[@]}; do \ 34 | IFS=$$'#' ; \ 35 | help_split=($$help_line) ; \ 36 | help_command=`echo $${help_split[0]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \ 37 | help_info=`echo $${help_split[2]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \ 38 | printf "%-10s %s\n" $$help_command $$help_info ; \ 39 | done 40 | 41 | .PHONY: all config destroy deploy dir docker help test update 42 | -------------------------------------------------------------------------------- /monitor/templates/monitor-stack.yml.j2: -------------------------------------------------------------------------------- 1 | version: '3' 2 | 3 | services: 4 | 5 | grafana: 6 | image: grafana/grafana 7 | ports: 8 | - "3000:3000" 9 | volumes: 10 | - {{ monitor_dir }}/grafana:/var/lib/grafana:rw 11 | deploy: 12 | mode: replicated 13 | replicas: 1 14 | 15 | prometheus: 16 | image: prom/prometheus:latest 17 | ports: 18 | - '9090:9090' 19 | volumes: 20 | - {{ monitor_dir }}/etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro 21 | - {{ monitor_dir }}/prometheus:/prometheus:rw 22 | deploy: 23 | mode: replicated 24 | replicas: 1 25 | 26 | exporter: 27 | image: prom/node-exporter:latest 28 | ports: 29 | - '9100:9100' 30 | volumes: 31 | - /sys:/host/sys:ro 32 | - /:/rootfs:ro 33 | - /proc:/host/proc:ro 34 | deploy: 35 | mode: global 36 | 37 | pihole-exporter: 38 | image: jahrik/pihole-exporter 39 | ports: 40 | - '9101:9311' 41 | deploy: 42 | replicas: 1 43 | command: "-pihole http://bebop" 44 | 45 | cadvisor: 46 | image: google/cadvisor:latest 47 | ports: 48 | - '9102:8080' 49 | volumes: 50 | - /var/lib/docker/:/var/lib/docker 51 | - /dev/disk/:/dev/disk 52 | - /sys:/sys 53 | - /var/run:/var/run 54 | - /:/rootfs 55 | - /dev/zfs:/dev/zfs 56 | deploy: 57 | mode: global 58 | resources: 59 | limits: 60 | cpus: '0.50' 61 | memory: 1024M 62 | reservations: 63 | cpus: '0.25' 64 | memory: 512M 65 | update_config: 66 | parallelism: 3 67 | monitor: 2m 68 | max_failure_ratio: 0.3 69 | failure_action: rollback 70 | delay: 30s 71 | restart_policy: 72 | condition: on-failure 73 | delay: 5s 74 | max_attempts: 3 75 | -------------------------------------------------------------------------------- /monitor/meta/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | galaxy_info: 3 | author: your name 4 | description: your description 5 | company: your company (optional) 6 | 7 | # If the issue tracker for your role is not on github, uncomment the 8 | # next line and provide a value 9 | # issue_tracker_url: http://example.com/issue/tracker 10 | 11 | # Some suggested licenses: 12 | # - BSD (default) 13 | # - MIT 14 | # - GPLv2 15 | # - GPLv3 16 | # - Apache 17 | # - CC-BY 18 | license: license (GPLv2, CC-BY, etc) 19 | 20 | min_ansible_version: 1.2 21 | 22 | # If this a Container Enabled role, provide the minimum Ansible Container version. 
23 | # min_ansible_container_version: 24 | 25 | # Optionally specify the branch Galaxy will use when accessing the GitHub 26 | # repo for this role. During role install, if no tags are available, 27 | # Galaxy will use this branch. During import Galaxy will access files on 28 | # this branch. If Travis integration is configured, only notifications for this 29 | # branch will be accepted. Otherwise, in all cases, the repo's default branch 30 | # (usually master) will be used. 31 | # github_branch: 32 | 33 | # 34 | # platforms is a list of platforms, and each platform has a name and a list of versions. 35 | # 36 | # platforms: 37 | # - name: Fedora 38 | # versions: 39 | # - all 40 | # - 25 41 | # - name: SomePlatform 42 | # versions: 43 | # - all 44 | # - 1.0 45 | # - 7 46 | # - 99.99 47 | 48 | galaxy_tags: [] 49 | # List tags for your role here, one per line. A tag is a keyword that describes 50 | # and categorizes the role. Users find roles by searching for tags. Be sure to 51 | # remove the '[]' above, if you add tags to this list. 52 | # 53 | # NOTE: A tag is limited to a single word comprised of alphanumeric characters. 54 | # Maximum 20 tags per role. 55 | 56 | dependencies: [] 57 | # List your role dependencies here, one per line. Be sure to remove the '[]' above, 58 | # if you add dependencies to this list. 59 | -------------------------------------------------------------------------------- /monitor/templates/prometheus.yml.j2: -------------------------------------------------------------------------------- 1 | global: 2 | scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. 3 | evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. 4 | # scrape_timeout is set to the global default (10s). 5 | 6 | # Alertmanager configuration 7 | alerting: 8 | alertmanagers: 9 | - static_configs: 10 | - targets: 11 | # - alertmanager:9093 12 | 13 | # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. 14 | rule_files: 15 | # - 'first_rules.yml' 16 | # - 'second_rules.yml' 17 | 18 | # A scrape configuration containing exactly one endpoint to scrape: 19 | # Here it's Prometheus itself. 20 | scrape_configs: 21 | # The job name is added as a label `job=` to any timeseries scraped from this config. 
22 | - job_name: 'prometheus' 23 | scrape_interval: 10s 24 | static_configs: 25 | - targets: 26 | - shredder:9090 27 | 28 | # http://shredder:9100/metrics 29 | - job_name: 'node' 30 | scrape_interval: 10s 31 | metrics_path: '/metrics' 32 | static_configs: 33 | - targets: 34 | - shredder:9100 35 | # - donatello:9100 36 | # - leonardo:9100 37 | - ninja:9100 38 | - venus:9100 39 | - oroku:9100 40 | - bebop:9100 41 | - rocks:9100 42 | 43 | # http://shredder:9101/metrics 44 | - job_name: 'pihole' 45 | scrape_interval: 10s 46 | metrics_path: '/metrics' 47 | static_configs: 48 | - targets: 49 | - shredder:9101 50 | 51 | # http://shredder:9102/metrics/ 52 | - job_name: 'cadvisor' 53 | scrape_interval: 30s 54 | metrics_path: '/metrics' 55 | static_configs: 56 | - targets: 57 | - shredder:9102 58 | 59 | # http://shredder:9103/metrics/ 60 | - job_name: 'transmission' 61 | scrape_interval: 10s 62 | metrics_path: '/metrics' 63 | static_configs: 64 | - targets: 65 | - shredder:9103 66 | 67 | # http://shredder:8080/prometheus/ 68 | - job_name: 'jenkins' 69 | scrape_interval: 10s 70 | metrics_path: '/prometheus' 71 | static_configs: 72 | - targets: 73 | - shredder:8080 74 | -------------------------------------------------------------------------------- /monitor/tasks/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # If built in molecule, user will be 'ubuntu' 3 | # install docker swarm first 4 | 5 | - include_tasks: vagrant_swarm.yml 6 | when: 7 | - ansible_user_id == 'ubuntu' 8 | 9 | - name: Create base directory structure 10 | become: true 11 | file: 12 | path: "{{ item }}" 13 | state: directory 14 | owner: root 15 | group: root 16 | mode: 0755 17 | with_items: 18 | - "{{ monitor_dir }}" 19 | - "{{ monitor_dir }}/stacks" 20 | 21 | - name: Create directories for prometheus 22 | become: true 23 | file: 24 | path: "{{ item }}" 25 | state: directory 26 | owner: 65534 27 | group: 65534 28 | mode: 0755 29 | # recurse: yes 30 | with_items: 31 | - "{{ monitor_dir }}/prometheus" 32 | - "{{ monitor_dir }}/etc/prometheus" 33 | 34 | - name: Create directories for grafana 35 | become: true 36 | file: 37 | path: "{{ item }}" 38 | state: directory 39 | owner: 472 40 | group: 472 41 | mode: 0755 42 | # recurse: yes 43 | with_items: 44 | - "{{ monitor_dir }}/grafana" 45 | 46 | - name: Generate config files 47 | become: true 48 | template: 49 | src: prometheus.yml.j2 50 | dest: "{{ monitor_dir }}/etc/prometheus/prometheus.yml" 51 | mode: 0644 52 | register: prom_conf 53 | 54 | - name: Check if prometheus is running 55 | ignore_errors: true 56 | uri: 57 | url: "http://{{ ansible_default_ipv4.address }}:9090/graph" 58 | status_code: 200 59 | register: result 60 | 61 | - name: kill prometheus service if conf file changes 62 | become: true 63 | command: docker service rm monitor_prometheus 64 | when: 65 | - result.status == 200 66 | - prom_conf.changed 67 | 68 | - name: Generate stack file 69 | become: true 70 | template: 71 | src: monitor-stack.yml.j2 72 | dest: "{{ monitor_dir }}/stacks/monitor-stack.yml" 73 | mode: 0644 74 | 75 | - name: update docker images 76 | become: true 77 | command: "{{ item }}" 78 | with_items: 79 | - docker pull grafana/grafana 80 | - docker pull prom/prometheus 81 | - docker pull prom/node-exporter 82 | - docker pull jahrik/pihole-exporter 83 | 84 | - name: deploy the monitor stack to docker swarm 85 | become: true 86 | command: docker stack deploy -c monitor-stack.yml monitor 87 | args: 88 | chdir: "{{ monitor_dir }}/stacks/" 89 | 90 | - 
name: Wait for prometheus port to come up 91 | wait_for: 92 | host: "{{ ansible_default_ipv4.address }}" 93 | port: 9090 94 | timeout: 30 95 | 96 | - name: Wait for grafana port to come up 97 | wait_for: 98 | host: "{{ ansible_default_ipv4.address }}" 99 | port: 3000 100 | timeout: 30 101 | 102 | - name: Wait for prometheus exporter port to come up 103 | wait_for: 104 | host: "{{ ansible_default_ipv4.address }}" 105 | port: 9100 106 | timeout: 30 107 | 108 | - name: Wait for pihole exporter port to come up 109 | wait_for: 110 | host: "{{ ansible_default_ipv4.address }}" 111 | port: 9101 112 | timeout: 30 113 | 114 | - name: docker stack ps monitor 115 | become: true 116 | shell: docker stack ps monitor 117 | register: docker_stack 118 | 119 | - debug: 120 | msg: "Stack {{ docker_stack.stdout_lines }}" 121 | -------------------------------------------------------------------------------- /gh-md-toc: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # 4 | # Steps: 5 | # 6 | # 1. Download corresponding html file for some README.md: 7 | # curl -s $1 8 | # 9 | # 2. Discard rows where no substring 'user-content-' (github's markup): 10 | # awk '/user-content-/ { ... 11 | # 12 | # 3.1 Get last number in each row like ' ... sitemap.js.*<\/h/)+2, RLENGTH-5) 21 | # 22 | # 5. Find anchor and insert it inside "(...)": 23 | # substr($0, match($0, "href=\"[^\"]+?\" ")+6, RLENGTH-8) 24 | # 25 | 26 | gh_toc_version="0.4.9" 27 | 28 | gh_user_agent="gh-md-toc v$gh_toc_version" 29 | 30 | # 31 | # Download rendered into html README.md by its url. 32 | # 33 | # 34 | gh_toc_load() { 35 | local gh_url=$1 36 | 37 | if type curl &>/dev/null; then 38 | curl --user-agent "$gh_user_agent" -s "$gh_url" 39 | elif type wget &>/dev/null; then 40 | wget --user-agent="$gh_user_agent" -qO- "$gh_url" 41 | else 42 | echo "Please, install 'curl' or 'wget' and try again." 43 | exit 1 44 | fi 45 | } 46 | 47 | # 48 | # Converts local md file into html by GitHub 49 | # 50 | # ➥ curl -X POST --data '{"text": "Hello world github/linguist#1 **cool**, and #1!"}' https://api.github.com/markdown 51 | #

<p>Hello world github/linguist#1 <strong>cool</strong>, and #1!</p>

'" 52 | gh_toc_md2html() { 53 | local gh_file_md=$1 54 | URL=https://api.github.com/markdown/raw 55 | TOKEN="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/token.txt" 56 | if [ -f "$TOKEN" ]; then 57 | URL="$URL?access_token=$(cat $TOKEN)" 58 | fi 59 | curl -s --user-agent "$gh_user_agent" \ 60 | --data-binary @"$gh_file_md" -H "Content-Type:text/plain" \ 61 | $URL 62 | } 63 | 64 | # 65 | # Is passed string url 66 | # 67 | gh_is_url() { 68 | case $1 in 69 | https* | http*) 70 | echo "yes";; 71 | *) 72 | echo "no";; 73 | esac 74 | } 75 | 76 | # 77 | # TOC generator 78 | # 79 | gh_toc(){ 80 | local gh_src=$1 81 | local gh_src_copy=$1 82 | local gh_ttl_docs=$2 83 | 84 | if [ "$gh_src" = "" ]; then 85 | echo "Please, enter URL or local path for a README.md" 86 | exit 1 87 | fi 88 | 89 | 90 | # Show "TOC" string only if working with one document 91 | if [ "$gh_ttl_docs" = "1" ]; then 92 | 93 | echo "Table of Contents" 94 | echo "=================" 95 | echo "" 96 | gh_src_copy="" 97 | 98 | fi 99 | 100 | if [ "$(gh_is_url "$gh_src")" == "yes" ]; then 101 | gh_toc_load "$gh_src" | gh_toc_grab "$gh_src_copy" 102 | else 103 | gh_toc_md2html "$gh_src" | gh_toc_grab "$gh_src_copy" 104 | fi 105 | } 106 | 107 | # 108 | # Grabber of the TOC from rendered html 109 | # 110 | # $1 — a source url of document. 111 | # It's need if TOC is generated for multiple documents. 112 | # 113 | gh_toc_grab() { 114 | # if closed is on the new line, then move it on the prev line 115 | # for example: 116 | # was: The command foo1 117 | # 118 | # became: The command foo1 119 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n<\/h/<\/h/g' | 120 | # find strings that corresponds to template 121 | grep -E -o '//' | sed 's/<\/code>//' | 124 | # now all rows are like: 125 | # ... .*<\/h/)+2, RLENGTH-5)"](" gh_url substr($0, match($0, "href=\"[^\"]+?\" ")+6, RLENGTH-8) ")"}' | sed 'y/+/ /; s/%/\\x/g')" 130 | } 131 | 132 | # 133 | # Returns filename only from full path or url 134 | # 135 | gh_toc_get_filename() { 136 | echo "${1##*/}" 137 | } 138 | 139 | # 140 | # Options hendlers 141 | # 142 | gh_toc_app() { 143 | local app_name="gh-md-toc" 144 | 145 | if [ "$1" = '--help' ] || [ $# -eq 0 ] ; then 146 | echo "GitHub TOC generator ($app_name): $gh_toc_version" 147 | echo "" 148 | echo "Usage:" 149 | echo " $app_name src [src] Create TOC for a README file (url or local path)" 150 | echo " $app_name - Create TOC for markdown from STDIN" 151 | echo " $app_name --help Show help" 152 | echo " $app_name --version Show version" 153 | return 154 | fi 155 | 156 | if [ "$1" = '--version' ]; then 157 | echo "$gh_toc_version" 158 | return 159 | fi 160 | 161 | if [ "$1" = "-" ]; then 162 | if [ -z "$TMPDIR" ]; then 163 | TMPDIR="/tmp" 164 | elif [ -n "$TMPDIR" -a ! 
-d "$TMPDIR" ]; then 165 | mkdir -p "$TMPDIR" 166 | fi 167 | local gh_tmp_md 168 | gh_tmp_md=$(mktemp $TMPDIR/tmp.XXXXXX) 169 | while read input; do 170 | echo "$input" >> "$gh_tmp_md" 171 | done 172 | gh_toc_md2html "$gh_tmp_md" | gh_toc_grab "" 173 | return 174 | fi 175 | 176 | for md in "$@" 177 | do 178 | echo "" 179 | gh_toc "$md" "$#" 180 | done 181 | 182 | echo "" 183 | echo "Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc)" 184 | } 185 | 186 | # 187 | # Entry point 188 | # 189 | gh_toc_app "$@" 190 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Docker Swarm monitoring - part 01 (Node-exporter, Prometheus, and Grafana) 2 | 3 | An effective monitoring system can be built across a Docker Swarm cluster using services managed by swarm itself. Starting with the prometheus node-exporter to gather system info from all host machines running Docker in swarm mode. Mount the system's directories as docker volumes to accomplish read access. Prometheus exporter gathers system info such as CPU, memory, and disk usage and exports it to a website that Prometheus server can then scrape every 15 seconds and fill a Time Series Data Base. With those 2 services in place, Grafana can then be pointed at the Prometheus server to build beautiful graphs and dashboards! 4 | 5 | Prerequisites: 6 | * [Docker Install Docs](https://docs.docker.com/install/linux/docker-ce/ubuntu/) 7 | * [Docker Swarm Docs](https://docs.docker.com/engine/reference/commandline/swarm_init/) 8 | * [github.com/jahrik/docker-swarm-monitor](https://github.com/jahrik/docker-swarm-monitor) 9 | 10 | Docker Swarm uses [Compose v3](https://docs.docker.com/compose/compose-file/) and uses a `docker-stack.yml` file, much like the `docker-compose.yml` files designed to be used with the `docker-compose` tool, which use Compose v2. One of the biggest differences you'll run into when starting services with `docker stack deploy` over `docker-compose up/down` is that docker swarm creates a [Routing Mesh](https://docs.docker.com/engine/swarm/ingress/) for you, where as with `docker-compose` networks and containers have to be explicitly created and linked. In swarm mode, the `link: ` is no longer needed. Services can be included in the same stack file and, by default, be created in the same network stack at deploy time, allowing docker containers to call each other by service name. This network can then be used by other stacks and future services by calling it in the stack file and assigning a service to it. This makes it easy to keep containers on their own isolated network or to cluster certain services like metrics and logging tools together on the same network. 11 | 12 | Here is a Compose v3 docker-stack.yml file for this project that will start three services: Grafana, Prometheus server, and Prometheus node-exporter. 
13 | 14 | **docker-stack.yml** 15 | 16 | version: '3' 17 | 18 | services: 19 | 20 | exporter: 21 | image: prom/node-exporter:latest 22 | ports: 23 | - '9100:9100' 24 | volumes: 25 | - /sys:/host/sys:ro 26 | - /:/rootfs:ro 27 | - /proc:/host/proc:ro 28 | deploy: 29 | mode: global 30 | 31 | prometheus: 32 | image: prom/prometheus:latest 33 | ports: 34 | - '9090:9090' 35 | volumes: 36 | - ./data/etc/prometheus.yml:/etc/prometheus/prometheus.yml:ro 37 | - ./data/prometheus:/prometheus:rw 38 | deploy: 39 | mode: replicated 40 | replicas: 1 41 | 42 | grafana: 43 | image: grafana/grafana 44 | ports: 45 | - "3000:3000" 46 | volumes: 47 | - ./data/grafana:/var/lib/grafana:rw 48 | deploy: 49 | mode: replicated 50 | replicas: 1 51 | 52 | Directory creation needs to be done before deploying this stack. A [Makefile](https://github.com/jahrik/docker-swarm-monitor/blob/master/Makefile) has been included to handle config, build, deploy, destroy operations and should be used as a reference for the commands that will build this thing. 53 | 54 | make help 55 | 56 | config: Copy prometheus.yml to config dir 57 | dir: Create directories 58 | update: Pull latest docker images 59 | deploy: Deploy to docker swarm 60 | destroy: Docker stack rm && rm -rf data 61 | help: This help dialog 62 | 63 | With what's in the source code, the stack can be started with: 64 | 65 | sudo make 66 | 67 | ## Prometheus 68 | 69 | ### Exporter 70 | 71 | Browse to the [Prometheus node-exporter](https://github.com/prometheus/node_exporter) docs on github and you'll see a few lines at the bottom of the readme showing how to run it in docker, which look like this. 72 | 73 | docker run -d \ 74 | --net="host" \ 75 | --pid="host" \ 76 | quay.io/prometheus/node-exporter 77 | 78 | Start by creating the stack file with just this entry. Take the image name from the docs and add it to the stack file. The volumes in the stack file are mounted for prometheus to read. `deploy: mode: global` means this service will be started on every node in the swarm cluster. Output is served at [localhost:9100/](http://localhost:9100/) 79 | 80 | **docker-stack.yml** 81 | 82 | version: '3' 83 | 84 | services: 85 | 86 | exporter: 87 | image: prom/node-exporter:latest 88 | ports: 89 | - '9100:9100' 90 | volumes: 91 | - /sys:/host/sys:ro 92 | - /:/rootfs:ro 93 | - /proc:/host/proc:ro 94 | deploy: 95 | mode: global 96 | 97 | Start this up with the `docker stack deploy` command 98 | 99 | docker stack deploy -c docker-stack.yml monitor 100 | 101 | Creating network monitor_default 102 | Creating service monitor_exporter 103 | 104 | This can also be kicked off with the Makefile 105 | 106 | make deploy 107 | 108 | Updating service monitor_exporter (id: ivbddqpnjr7sdxre0gzopney9) 109 | 110 | Check the service 111 | 112 | docker service ps monitor_exporter 113 | 114 | ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 115 | cp7o7s9t33s6 monitor_exporter.76g7crzb0hk6jp9zysvegmupy prom/node-exporter:latest localhost Running Running about a minute ago 116 | 117 | Check the logs 118 | 119 | docker service logs monitor_exporter 120 | ... 121 | ... 122 | monitor_exporter.0.cp7o7s9t33s6@localhost | time="2018-03-28T08:14:47Z" level=info msg="Listening on :9100" source="node_exporter.go:76" 123 | 124 | Browse [localhost:9100/](http://localhost:9100/) and check it out. 
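The same thing can be checked from the command line; something like this should work, assuming the default port mapping on the docker host:

    # Ask node-exporter for its metrics and show a few of the node_cpu series
    curl -s http://localhost:9100/metrics | grep '^node_cpu' | head
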
125 | 126 | ![node_exporter](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/node_exporter.png) 127 | 128 | ### Server 129 | 130 | Next, start up the [Prometheus Server](https://github.com/prometheus/prometheus). This will scrape the exporter at the 10-second interval set in the prometheus.yml configuration file. This file will be configured locally and mounted into the container as a volume at run time. Volumes will also be used for persistent tsdb data in case of a container restart or failure. 131 | 132 | **docker-stack.yml** 133 | 134 | prometheus: 135 | image: prom/prometheus:latest 136 | ports: 137 | - '9090:9090' 138 | volumes: 139 | - ./data/etc/prometheus.yml:/etc/prometheus/prometheus.yml:ro 140 | - ./data/prometheus:/prometheus:rw 141 | deploy: 142 | mode: replicated 143 | replicas: 1 144 | 145 | Prepare directories for mounting docker volumes. These will need read/write permissions for the default prometheus container user, which is nobody:nobody. 146 | 147 | DATA_DIR="./data" 148 | 149 | mkdir -p \ 150 | "$DATA_DIR/etc" \ 151 | "$DATA_DIR/grafana" \ 152 | "$DATA_DIR/prometheus" 153 | 154 | chmod 777 "$DATA_DIR/prometheus" 155 | chown -R nobody:nobody "$DATA_DIR/prometheus" 156 | 157 | Volumes are configured in the docker-stack.yml file. The first one is where prometheus writes its database. The second mounts the prometheus.yml file, which will come in handy later, when I start deploying this with Jenkins, because it lets me edit this file and reconfigure prometheus at deploy time. 158 | * ./data/prometheus:/prometheus:rw 159 | * ./data/etc/prometheus.yml:/etc/prometheus/prometheus.yml:ro 160 | 161 | Check out your prometheus.yml file and make sure the exporter is added as a scrape target. This is how targets like Cadvisor and mysql-exporter will be added in the future. 162 | 163 | **prometheus.yml** 164 | 165 | global: 166 | scrape_interval: 30s # Set the scrape interval to every 30 seconds. Default is every 1 minute. 167 | evaluation_interval: 30s # Evaluate rules every 30 seconds. The default is every 1 minute. 168 | # scrape_timeout is set to the global default (10s). 169 | 170 | # Alertmanager configuration 171 | alerting: 172 | alertmanagers: 173 | - static_configs: 174 | - targets: 175 | # - alertmanager:9093 176 | 177 | # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. 178 | rule_files: 179 | # - "first_rules.yml" 180 | # - "second_rules.yml" 181 | 182 | # A scrape configuration containing exactly one endpoint to scrape: 183 | # Here it's Prometheus itself. 184 | scrape_configs: 185 | # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. 186 | - job_name: 'prometheus' 187 | # metrics_path defaults to '/metrics' 188 | # scheme defaults to 'http'. 189 | static_configs: 190 | - targets: 191 | - localhost:9090 192 | 193 | # http://exporter:9100/metrics 194 | - job_name: prometheus-exporter 195 | scrape_interval: 10s 196 | metrics_path: "/metrics" 197 | static_configs: 198 | - targets: 199 | - exporter:9100 200 | 201 | Use `make config` and it will copy this config file to where it needs to go. 202 | 203 | make config 204 | 205 | config: 206 | @cp prometheus.yml $(DATA_DIR)/etc/prometheus.yml 207 | 208 | With the prometheus server service added to the docker-stack.yml file and everything configured, redeploy the stack to add the new service. 
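Before redeploying, the scrape config can optionally be sanity-checked with promtool, which is bundled with Prometheus (and included in the prom/prometheus image). Something along these lines should work, assuming the ./data layout from above:

    # Validate prometheus.yml syntax without starting a server
    docker run --rm --entrypoint promtool \
      -v "$(pwd)/data/etc/prometheus.yml:/tmp/prometheus.yml:ro" \
      prom/prometheus check config /tmp/prometheus.yml
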
209 | 210 | docker stack deploy -c docker-stack.yml monitor 211 | 212 | Creating service monitor_prometheus 213 | Updating service monitor_exporter (id: ivbddqpnjr7sdxre0gzopney9) 214 | 215 | Browse to [localhost:9090/targets](http://127.0.0.1:9090/targets) to verify connectivity. 216 | 217 | ![prometheus_targets](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/prometheus_targets.png) 218 | 219 | Now that the server is successfully scraping system data, it's possible to query the TSDB 220 | 221 | node_cpu{cpu="cpu0"} 222 | node_cpu{cpu="cpu0",mode="idle"} 223 | etc... 224 | 225 | ![prometheus_query_01](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/prometheus_query_01.png) 226 | 227 | ## Grafana 228 | 229 | With Prometheus up and running, it's time to start Grafana. A volume is created for persistent data. It will be available at [localhost:3000/](http://localhost:3000/) 230 | 231 | **docker-stack.yml** 232 | 233 | grafana: 234 | image: grafana/grafana 235 | ports: 236 | - "3000:3000" 237 | volumes: 238 | - ./data/grafana:/var/lib/grafana:rw 239 | deploy: 240 | mode: replicated 241 | replicas: 1 242 | 243 | Redeploy the stack to start Grafana 244 | 245 | docker stack deploy -c docker-stack.yml monitor 246 | 247 | Updating service monitor_exporter (id: ivbddqpnjr7sdxre0gzopney9) 248 | Updating service monitor_prometheus (id: q4f07qz2tk3dvic9kc21sa3kq) 249 | Creating service monitor_grafana 250 | 251 | Browse to [localhost:3000/login](http://localhost:3000/login) 252 | 253 | The default user and password are: `admin` / `admin` 254 | 255 | ![grafana_login](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/grafana_login.png) 256 | 257 | Add a data source 258 | ![config_add_data_source](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/config_add_data_source.png) 259 | 260 | Choose Prometheus from the drop-down 261 | ![config_dropdown_prometheus](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/config_dropdown_prometheus.png) 262 | 263 | I used the IP of the host machine in this example 264 | ![config_host_ip](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/config_host_ip.png) 265 | 266 | Go to [grafana.com/dashboards](https://grafana.com/dashboards) to check out the thousands of pre-made dashboards that are out there and find one that will work as a template to build on. A good one to start with in this project is the [node exporter metrics on docker swarm mode](https://grafana.com/dashboards/1442) dashboard, ID `1442`. 267 | 268 | Import this dashboard to Grafana 269 | ![import_dashboard_01](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/import_dashboard_01.png) 270 | 271 | Choose Prometheus as the data source and hit Import 272 | ![import_dashboard_02](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/import_dashboard_02.png) 273 | 274 | And the finished dashboard 275 | ![complete_dashboard](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/complete_dashboard.png) 276 | 277 | With that, a very flexible monitoring system has been established across the swarm cluster! A lot can be done to add to it easily, with new data sources and dashboards. 
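When building panels later, it can also help to test expressions outside of Grafana; Prometheus serves the same data over its HTTP API. Something like this should work against the default port mapping, using a standard node-exporter metric:

    # Instant query for the 1-minute load average collected by node-exporter
    curl -s 'http://localhost:9090/api/v1/query?query=node_load1'
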
278 | 279 | Bring it all down with 280 | 281 | sudo make destroy 282 | 283 | Removing service monitor_exporter 284 | Removing service monitor_grafana 285 | Removing service monitor_prometheus 286 | Removing network monitor_default 287 | -------------------------------------------------------------------------------- /PART_02.md: -------------------------------------------------------------------------------- 1 | # Docker Swarm monitoring - part 02 (Fixes, Cadvisor, and Pihole) 2 | 3 | In [part 01](https://homelab.business/docker-swarm-monitoring-part-01/), I deployed [node exporter](https://github.com/prometheus/node_exporter), [Prometheus](https://github.com/prometheus/prometheus), and [Grafana](https://grafana.com/). This time around, I will touch on some of the problems I've run into since then and how I solved them. I'll tack on another monitoring tool to the stack, [Cadvisor](https://github.com/google/cadvisor). Finally, I'll forward [Pi-Hole](https://pi-hole.net/) metrics to a Grafana dashboard. 4 | 5 | Since part 01, I have added enough to [deploy this to Docker Swarm](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/monitor-stack.yml.j2) using a [Jenkins pipeline](https://github.com/jahrik/docker-swarm-monitor/blob/master/Jenkinsfile) and [Ansible playbook](https://github.com/jahrik/docker-swarm-monitor/blob/master/playbook.yml). This workflow lets me push my changes to github, have Jenkins handle building and testing, then push configuration and deploy to Docker Swarm with Ansible AWX. There is a [write-up on doing the same thing with an Ark server](https://homelab.business/ark-jenkins-ansible-swarm/), if you need more information on how all those pieces fit together. 6 | 7 | * [Grafana](#grafana) 8 | * [Prometheus](#prometheus) 9 | * [Node Exporter](#node-exporter) 10 | * [Cadvisor](#cadvisor) 11 | * [Pihole](#pihole) 12 | * [Pihole exporter](#pihole-exporter) 13 | 14 | ## Grafana 15 | 16 | Somehow, I ended up changing the permissions to the Grafana SQLite.db file and it was still able to read data, but I wasn't able to save anything. Somewhere along the line, maybe I ran a command close to this? `chown 1000:1000 /data/grafana/grafana.db`. Grafana didn't like it. 17 | 18 | ![grafana_save_dashboard_error.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_save_dashboard_error.png?raw=true) 19 | 20 | The following was observed in `docker service logs -f monitor_grafana` 21 | 22 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:54:07+0000 lvl=eror msg="Failed to save dashboard" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 23 | 24 | Plus a repeating stream of the following error, over and over. 
25 | 26 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 27 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 28 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 29 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 30 | 31 | Which makes it pretty obvious what's going on: 32 | * msg="Failed to save dashboard" 33 | * msg="Failed to update last_seen_at" 34 | * error="attempt to write a readonly database" 35 | 36 | This was an easy fix. 37 | 38 | Find the grafana container and note the container id. 39 | 40 | docker ps 41 | 42 | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 43 | 5dbad5cc02a1 grafana/grafana:latest "/run.sh" 21 minutes ago Up 21 minutes 3000/tcp monitor_grafana.1.tyxisxhoxri40hfv56ecgr46i 44 | 45 | Execute a shell on the docker container. 46 | 47 | docker exec -it 5dbad5cc02a1 bash 48 | 49 | Get grafana user id info 50 | 51 | grafana@5dbad5cc02a1:/$ id 52 | uid=472(grafana) gid=472(grafana) groups=472(grafana) 53 | 54 | On the docker host, set the file permission to all the files on the host end of the volume. 55 | Where `/data/grafana` is the mounted volume containing `/data/grafana/grafana.db` 56 | 57 | sudo chown -R 472:472 /data/grafana 58 | 59 | Kill and restart the grafana service 60 | 61 | docker service rm monitor_grafana 62 | 63 | docker stack deploy -c monitor-stack.yml monitor 64 | 65 | Write access is restored. 66 | 67 | ![grafana_save_dashboard.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_save_dashboard.png?raw=true) 68 | 69 | Here's an Ansible task to handle this. 70 | 71 | - name: Create directories for grafana 72 | become: true 73 | file: 74 | path: "{{ item }}" 75 | state: directory 76 | owner: 472 77 | group: 472 78 | mode: 0755 79 | # recurse: yes 80 | with_items: 81 | - "{{ monitor_dir }}/grafana" 82 | - "{{ monitor_dir }}/grafana/sessions" 83 | - "{{ monitor_dir }}/grafana/plugins" 84 | 85 | - name: Set file perms on grafana.db file 86 | become: true 87 | file: 88 | path: "{{ monitor_dir }}/grafana/grafana.db" 89 | state: file 90 | owner: 472 91 | group: 472 92 | mode: 0664 93 | 94 | ## Prometheus 95 | 96 | When you make an update to the prometheus.yml file, the desired action is for the Prometheus server to be restarted. Because I'm deploying this in an automated fashion, I need to handle the restarting of this service the same way and add in a couple of checks along the way. [This Ansible playbook can be found here](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/tasks/main.yml). 97 | 98 | The config file is generated and registers a variable, `prom_conf` containing information about the file in question, `prometheus.yml`, including information on whether the file has been changed this run or not. 
99 | 100 | - name: Generate config files 101 | become: true 102 | template: 103 | src: prometheus.yml.j2 104 | dest: "{{ monitor_dir }}/etc/prometheus/prometheus.yml" 105 | mode: 0644 106 | register: prom_conf 107 | 108 | Next is a check to see whether Prometheus is running, using the [uri module](http://docs.ansible.com/ansible/latest/modules/uri_module.html). This also registers a variable, `result`, containing a status code returned from whatever webserver it's pointed at. In this case, I'm pulling the default IPv4 address from the host that ansible is currently running on and adding `:9090/graph` to the end of that, in hopes of reaching Prometheus. Notice how this one also has `ignore_errors: true`. This matters the very first time this runs, or any time Prometheus is not actually running: without it, you will get a status_code back that does not equal 200 and the task will fail. 109 | 110 | - name: Check if prometheus is running 111 | ignore_errors: true 112 | uri: 113 | url: "http://{{ ansible_default_ipv4.address }}:9090/graph" 114 | status_code: 200 115 | register: result 116 | 117 | With these two checks in place, there is enough information to determine whether the prometheus server needs to be restarted. With a when statement that contains more than one condition, `when: ['one_thing','two_thing']`, both values have to be true before this task is kicked off. If the variable `prom_conf` comes back with a `.changed` status of `true`, that condition is met. Same goes for `result.status`: if it equals 200, it evaluates to `true`. 118 | 119 | - name: kill prometheus service if conf file changes 120 | become: true 121 | command: docker service rm monitor_prometheus 122 | when: 123 | - result.status == 200 124 | - prom_conf.changed 125 | 126 | With that, the stack is redeployed to swarm, restarting Prometheus. 127 | 128 | - name: deploy the monitor stack to docker swarm 129 | become: true 130 | command: docker stack deploy -c monitor-stack.yml monitor 131 | args: 132 | chdir: "{{ monitor_dir }}/stacks/" 133 | 134 | 135 | I've also added a check at the end of the playbook to make sure Prometheus is running. 136 | 137 | - name: Wait for prometheus port to come up 138 | wait_for: 139 | host: "{{ ansible_default_ipv4.address }}" 140 | port: 9090 141 | timeout: 30 142 | 143 | ## Node Exporter 144 | 145 | I'm seeing the following from `docker service logs -f monitor_exporter`. I would like node exporter to ignore docker volume mounts. Ignoring all of /var/lib/docker would be ok with me for now to clean up this error, but I haven't figured out where to configure that yet. It's on the **TODO** list. 146 | 147 | time="2018-05-19T08:33:59Z" level=error msg="Error on statfs() system call for \"/rootfs/var/lib/docker/overlay2/f8da180fa939589132d04099a37c9f182bc0b38e84d0b84ee8958fe42aa5e18d/merged\": permission denied" source="filesystem_linux.go:57" 148 | time="2018-05-19T08:33:59Z" level=error msg="Error on statfs() system call for \"/rootfs/var/lib/docker/containers/01358918338b67982715107fe876b803abbcd0c57f4672c07de0025d1426f2af/mounts/shm\": permission denied" source="filesystem_linux.go:57" 149 | time="2018-05-19T08:33:59Z" level=error msg="Error on statfs() system call for \"/rootfs/run/docker/netns/a2d163e99d44\": permission denied" source="filesystem_linux.go:57" 150 | 151 | Out of the box, the [Node - ZFS](https://grafana.com/dashboards/3170) and [Node - ZFS all](https://grafana.com/dashboards/3161) dashboards rely on a specific job name to work. 
At first I had the Prometheus job name set to 'node-exporter' in the prometheus.yml file, but these dashboards rely on it being just 'node' and use that as a variable. 152 | 153 | ![grafana_zfs_job_node.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_zfs_job_node.png?raw=true) 154 | 155 | The entry in the [prometheus.yml](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/prometheus.yml.j2) file uses a job name of `node` to work with the zfs dashboards. 156 | 157 | # http://docker_host:9100/metrics 158 | - job_name: 'node' 159 | scrape_interval: 10s 160 | metrics_path: '/metrics' 161 | static_configs: 162 | - targets: 163 | - docker_host:9100 164 | 165 | * [Node Exporter Full](https://grafana.com/dashboards/1860) 166 | 167 | ## Cadvisor 168 | 169 | [Cadvisor](https://github.com/google/cadvisor) exports metrics from the containers running on each node. 170 | > cAdvisor has native support for Docker containers and should support just about any other container type out of the box. 171 | 172 | With only node_exporter running and no Cadvisor yet, the [Docker-swarm-monitor dashboard](https://grafana.com/dashboards/2603) will look a bit like this. 173 | ![grafana_docker_swarm_dashboard_before.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_docker_swarm_dashboard_before.png?raw=true) 174 | 175 | Add Cadvisor to the [monitor-stack.yml](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/monitor-stack.yml.j2) file. 176 | 177 | cadvisor: 178 | image: google/cadvisor:latest 179 | ports: 180 | - '9102:8080' 181 | volumes: 182 | - /var/lib/docker/:/var/lib/docker 183 | - /dev/disk/:/dev/disk 184 | - /sys:/sys 185 | - /var/run:/var/run 186 | - /:/rootfs 187 | - /dev/zfs:/dev/zfs 188 | deploy: 189 | mode: global 190 | resources: 191 | limits: 192 | cpus: '0.50' 193 | memory: 1024M 194 | reservations: 195 | cpus: '0.25' 196 | memory: 512M 197 | update_config: 198 | parallelism: 3 199 | monitor: 2m 200 | max_failure_ratio: 0.3 201 | failure_action: rollback 202 | delay: 30s 203 | restart_policy: 204 | condition: on-failure 205 | delay: 5s 206 | max_attempts: 3 207 | 208 | Because I'm deploying this with a [webhook to jenkins](https://homelab.business/ark-jenkins-ansible-swarm/#webhook), the [commit](https://github.com/jahrik/docker-swarm-monitor/commit/ccc13342b8c58a08ce8da8488f2b414cc296f2a7) that added the snippet above to the stack file deployed Cadvisor to the Swarm as I was writing this. 209 | 210 | Cadvisor is viewable at [docker_host:9102/containers](http://docker_host:9102/containers/) 211 | 212 | ![cadvisor_exporter.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/cadvisor_exporter.png?raw=true) 213 | 214 | Create a job in the [prometheus.yml](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/prometheus.yml.j2) file to scrape data from Cadvisor. 215 | 216 | # http://docker_host:9102/metrics/ 217 | - job_name: 'cadvisor' 218 | scrape_interval: 30s 219 | metrics_path: '/metrics' 220 | static_configs: 221 | - targets: 222 | - docker_host:9102 223 | 224 | With that deployed, a new target is added to `Prometheus > targets`, [docker_host:9090/targets](http://docker_host:9090/targets/) 225 | 226 | ![prometheus_targets.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/prometheus_targets.png?raw=true) 227 | 228 | Refresh the docker swarm monitor dashboard and there should be a lot more info now! 
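If the dashboard still looks sparse after a refresh, a quick way to confirm container metrics are actually reaching Prometheus is to query its HTTP API for a cAdvisor series; something like this should work, reusing the `docker_host` placeholder from above:

    # Count the container memory series scraped from the cadvisor job
    curl -s -G 'http://docker_host:9090/api/v1/query' \
      --data-urlencode 'query=count(container_memory_usage_bytes)'
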
229 | 230 | ![grafana_docker_swarm_dashboard_with_cadvisor.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_docker_swarm_dashboard_with_cadvisor.png?raw=true) 231 | 232 | ## Pihole 233 | 234 | [Pi-Hole](https://github.com/pi-hole/pi-hole) is running on a raspberry pi. It acts as the DNS and DHCP server for the network, while caching DNS queries and providing "a [DNS sinkhole](https://en.wikipedia.org/wiki/DNS_sinkhole) that protects your devices from unwanted content, without installing any client-side software." It blocks a surprising amount of ad content from things like Facebook, news sites, and blog posts, like this one, which uses Google Adsense. It works well enough, in fact, that I had to add Google Analytics to the whitelist after setting this up, just to access the site and check metrics. 235 | 236 | ![pihole.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/pihole.png?raw=true) 237 | 238 | Seeing the results and experiencing an increase in query speeds was worth the hour or so of fussing with pfsense. In the end, disabling DHCP and DNS forwarding altogether on the firewall and just letting Pi-Hole handle them worked. The dashboard that comes with pihole is great and really all you need for this service, but it's also nice to have those metrics in the same location as other monitoring tools and graphs. 239 | 240 | ## Pihole exporter 241 | 242 | * In the commands below, `pihole_host_ip` is the IP or hostname of the Pi-Hole host 243 | 244 | One way to accomplish this is with the [pihole_exporter](https://github.com/nlamirault/pihole_exporter) for prometheus. I fought with this for a good hour before getting it to work, eventually building it from source with docker and pushing it up to docker hub to pull into swarm at stack creation time. 245 | 246 | This is the error I kept seeing when using the latest version on docker hub: `standard_init_linux.go:190: exec user process caused "exec format error"`. It can be reproduced like this. 247 | 248 | docker run -it povilasv/arm-pihole_exporter -pihole http://pihole_host_ip 249 | 250 | standard_init_linux.go:190: exec user process caused "exec format error" 251 | 252 | So, I had to clone the repo locally. 253 | 254 | git clone https://github.com/nlamirault/pihole_exporter.git 255 | 256 | cd pihole_exporter 257 | 258 | And build it with docker 259 | 260 | docker build -t jahrik/pihole_exporter . 261 | 262 | Sending build context to Docker daemon 5.881MB 263 | Step 1/11 : FROM golang:alpine AS build 264 | ... 265 | ... 266 | ... 267 | ... 268 | Step 11/11 : EXPOSE 9311 269 | ---> Using cache 270 | ---> f4cfd273446d 271 | Successfully built f4cfd273446d 272 | Successfully tagged jahrik/pihole_exporter:latest 273 | 274 | Somehow, this magically fixed the error and it just works — most likely because the `arm-` image on docker hub is built for ARM, while this swarm node is x86_64, so a local build targets the right architecture. 275 | 276 | docker run -it jahrik/pihole_exporter -pihole http://pihole_host_ip 277 | 278 | INFO[0000] Setup Pihole exporter using URL: %s http://pihole_host_ip source="pihole_exporter.go:112" 279 | INFO[0000] Register exporter source="pihole_exporter.go:197" 280 | INFO[0000] Listening on :9311 source="pihole_exporter.go:211" 281 | 282 | Push it up to dockerhub before deploying to swarm. 283 | 284 | docker push jahrik/pihole-exporter 285 | 286 | I'm pulling the `jahrik/pihole_exporter` version in [the stack file](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/monitor-stack.yml.j2), rather than the original `povilasv/arm-pihole_exporter`, and starting it up in swarm worked after that. 
287 | 288 | pihole-exporter: 289 | image: jahrik/pihole-exporter 290 | ports: 291 | - '9101:9311' 292 | deploy: 293 | replicas: 1 294 | command: "-pihole http://pihole_host_ip" 295 | 296 | Output can be seen at [http://docker_host:9101/metrics](http://docker_host:9101/metrics) 297 | 298 | ![pihole_exporter.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/pihole_exporter.png?raw=true) 299 | 300 | Import the [Pi-Hole dashboard](https://grafana.com/dashboards/5855) to Grafana and it should look something like this to start with. I'm assuming the missing data points are there because the exporter is only seeing what pihole allows without admin access and will need user creds for the api to pull more metrics. Another thing for the **TODO** list! 301 | 302 | ![grafana_pihole_dashboard.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_pihole_dashboard.png?raw=true) 303 | 304 | With a few minor hiccups along the way, this setup has worked great for me in my homelab docker swarm environment. In part 03, I'm thinking of putting together an example Prometheus client with Python to monitor the temperature of my server with [lm_sensors](https://wiki.archlinux.org/index.php/lm_sensors) or [PySensors](https://pypi.org/project/PySensors/#description) and output that to a gauge in Grafana. 305 | 306 | --------------------------------------------------------------------------------