├── .gitignore ├── monitor ├── vars │ └── main.yml ├── handlers │ └── main.yml ├── defaults │ └── main.yml ├── .yamllint ├── README.md ├── tasks │ ├── vagrant_swarm.yml │ └── main.yml ├── templates │ ├── monitor-stack.yml.j2 │ └── prometheus.yml.j2 └── meta │ └── main.yml ├── images ├── pihole.png ├── config_host_ip.png ├── grafana_login.png ├── node_exporter.png ├── pihole_exporter.png ├── cadvisor_exporter.png ├── complete_dashboard.png ├── import_dashboard_01.png ├── import_dashboard_02.png ├── prometheus_query_01.png ├── prometheus_targets.png ├── grafana_node_exporter.png ├── grafana_zfs_job_node.png ├── config_add_data_source.png ├── grafana_dashboard_save.png ├── grafana_pihole_dashboard.png ├── grafana_save_dashboard.png ├── config_dropdown_prometheus.png ├── grafana_save_dashboard_error.png ├── grafana_docker_swarm_dashboard_before.png └── grafana_docker_swarm_dashboard_with_cadvisor.png ├── playbook.yml ├── Jenkinsfile ├── Makefile ├── gh-md-toc ├── README.md └── PART_02.md /.gitignore: -------------------------------------------------------------------------------- 1 | data/* 2 | *molecule* 3 | pihole_exporter/* 4 | -------------------------------------------------------------------------------- /monitor/vars/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # vars file for monitor 3 | -------------------------------------------------------------------------------- /monitor/handlers/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # handlers file for monitor 3 | -------------------------------------------------------------------------------- /monitor/defaults/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # defaults file for monitor 3 | monitor_dir: '/data' 4 | -------------------------------------------------------------------------------- /images/pihole.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/pihole.png -------------------------------------------------------------------------------- /images/config_host_ip.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/config_host_ip.png -------------------------------------------------------------------------------- /images/grafana_login.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_login.png -------------------------------------------------------------------------------- /images/node_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/node_exporter.png -------------------------------------------------------------------------------- /images/pihole_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/pihole_exporter.png -------------------------------------------------------------------------------- /playbook.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - hosts: all 3 | vars: 4 | monitor_dir: /shredder_pool 5 | roles: 6 | - monitor 7 | 
-------------------------------------------------------------------------------- /images/cadvisor_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/cadvisor_exporter.png -------------------------------------------------------------------------------- /images/complete_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/complete_dashboard.png -------------------------------------------------------------------------------- /images/import_dashboard_01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/import_dashboard_01.png -------------------------------------------------------------------------------- /images/import_dashboard_02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/import_dashboard_02.png -------------------------------------------------------------------------------- /images/prometheus_query_01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/prometheus_query_01.png -------------------------------------------------------------------------------- /images/prometheus_targets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/prometheus_targets.png -------------------------------------------------------------------------------- /images/grafana_node_exporter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_node_exporter.png -------------------------------------------------------------------------------- /images/grafana_zfs_job_node.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_zfs_job_node.png -------------------------------------------------------------------------------- /images/config_add_data_source.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/config_add_data_source.png -------------------------------------------------------------------------------- /images/grafana_dashboard_save.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_dashboard_save.png -------------------------------------------------------------------------------- /images/grafana_pihole_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_pihole_dashboard.png -------------------------------------------------------------------------------- /images/grafana_save_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_save_dashboard.png 
-------------------------------------------------------------------------------- /images/config_dropdown_prometheus.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/config_dropdown_prometheus.png -------------------------------------------------------------------------------- /images/grafana_save_dashboard_error.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_save_dashboard_error.png -------------------------------------------------------------------------------- /images/grafana_docker_swarm_dashboard_before.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_docker_swarm_dashboard_before.png -------------------------------------------------------------------------------- /images/grafana_docker_swarm_dashboard_with_cadvisor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/HEAD/images/grafana_docker_swarm_dashboard_with_cadvisor.png -------------------------------------------------------------------------------- /monitor/.yamllint: -------------------------------------------------------------------------------- 1 | extends: default 2 | 3 | rules: 4 | braces: 5 | max-spaces-inside: 1 6 | level: error 7 | brackets: 8 | max-spaces-inside: 1 9 | level: error 10 | line-length: disable 11 | # NOTE(retr0h): Templates no longer fail this lint rule. 12 | # Uncomment if running old Molecule templates. 13 | # truthy: disable 14 | -------------------------------------------------------------------------------- /monitor/README.md: -------------------------------------------------------------------------------- 1 | Docker Swarm Monitor 2 | ========= 3 | 4 | Configure directory structure for docker swarm volume mounts. 5 | Deploy Prometheus, Cadvisor, Grafana, and more to docker swarm. 
6 | 7 | Requirements 8 | ------------ 9 | 10 | Role Variables 11 | -------------- 12 | 13 | monitor_dir: '/data' 14 | 15 | Dependencies 16 | ------------ 17 | 18 | Example Playbook 19 | ---------------- 20 | 21 | Including an example of how to use your role (for instance, with variables 22 | passed in as parameters) is always nice for users too: 23 | 24 | - hosts: servers 25 | roles: 26 | - { role: monitor, monitor_dir: '/not_data/' } 27 | 28 | License 29 | ------- 30 | 31 | GPLv2 32 | 33 | Author Information 34 | ------------------ 35 | 36 | [homelab.business](https://homelab.business/docker-swarm-monitoring-part-02-fixes-cadvisor-pihole/) 37 | -------------------------------------------------------------------------------- /monitor/tasks/vagrant_swarm.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: install dependencies 3 | become: true 4 | apt: 5 | name: "{{ item }}" 6 | state: present 7 | update_cache: true 8 | with_items: 9 | - nmap 10 | - apt-transport-https 11 | - ca-certificates 12 | - curl 13 | - software-properties-common 14 | 15 | - name: add docker ce repo key 16 | become: true 17 | apt_key: 18 | url: https://download.docker.com/linux/ubuntu/gpg 19 | state: present 20 | 21 | - name: add docker ce repo 22 | become: true 23 | apt_repository: 24 | repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" 25 | state: present 26 | filename: docker 27 | update_cache: true 28 | 29 | - name: install docker-ce 30 | become: true 31 | apt: 32 | name: docker-ce 33 | state: present 34 | update_cache: true 35 | 36 | - name: determine swarm status 37 | become: true 38 | shell: > 39 | docker info | egrep '^Swarm: ' | cut -d ' ' -f2 40 | register: swarm_status 41 | 42 | - name: initialize swarm cluster 43 | become: true 44 | shell: > 45 | docker swarm init 46 | --advertise-addr={{ ansible_default_ipv4.address | default('eth0') }}:2377 47 | when: "'inactive' in swarm_status.stdout_lines" 48 | -------------------------------------------------------------------------------- /Jenkinsfile: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env groovy 2 | 3 | node('master') { 4 | 5 | try { 6 | 7 | stage('build') { 8 | // Clean workspace 9 | deleteDir() 10 | // Checkout the app at the given commit sha from the webhook 11 | checkout scm 12 | } 13 | 14 | stage('test') { 15 | // Run any testing suites 16 | sh "echo 'WE ARE TESTING'" 17 | } 18 | 19 | stage('deploy') { 20 | sh "echo 'WE ARE DEPLOYING'" 21 | wrap([$class: 'AnsiColorBuildWrapper', colorMapName: "xterm"]) { 22 | ansibleTower( 23 | towerServer: 'shredder', 24 | jobTemplate: 'monitor', 25 | importTowerLogs: true, 26 | inventory: '', 27 | jobTags: '', 28 | limit: '', 29 | removeColor: false, 30 | verbose: true, 31 | credential: '', 32 | extraVars: '''--- 33 | test: "test"''' 34 | ) 35 | } 36 | } 37 | 38 | } catch(error) { 39 | throw error 40 | 41 | } finally { 42 | // Any cleanup operations needed, whether we hit an error or not 43 | 44 | } 45 | } 46 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | DATA_DIR?=./data 2 | VERSION = "0.1.0" 3 | 4 | all: dir config update deploy 5 | 6 | config: ## Copy prometheus.yml to config dir 7 | @cp prometheus.yml $(DATA_DIR)/etc/prometheus.yml 8 | 9 | dir: ## Create directories 10 | @mkdir -p \ 11 | $(DATA_DIR)/etc \ 12 | $(DATA_DIR)/grafana \ 13 | 
$(DATA_DIR)/prometheus \ 14 | || echo "TRY SUDO" 15 | @chmod 777 $(DATA_DIR)/prometheus 16 | @chown -R nobody:nobody $(DATA_DIR)/prometheus 17 | 18 | update: ## Pull latest docker images 19 | @docker pull grafana/grafana 20 | @docker pull prom/prometheus:latest 21 | @docker pull prom/node-exporter:latest 22 | 23 | deploy: ## Deploy to docker swarm 24 | @docker stack deploy -c docker-stack.yml monitor 25 | 26 | destroy: ## Docker stack rm && rm -rf data 27 | @docker stack rm monitor 28 | @rm -rf $(DATA_DIR) 29 | 30 | help: ## This help dialog 31 | @IFS=$$'\n' ; \ 32 | help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//'`); \ 33 | for help_line in $${help_lines[@]}; do \ 34 | IFS=$$'#' ; \ 35 | help_split=($$help_line) ; \ 36 | help_command=`echo $${help_split[0]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \ 37 | help_info=`echo $${help_split[2]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \ 38 | printf "%-10s %s\n" $$help_command $$help_info ; \ 39 | done 40 | 41 | .PHONY: all config destroy deploy dir docker help test update 42 | -------------------------------------------------------------------------------- /monitor/templates/monitor-stack.yml.j2: -------------------------------------------------------------------------------- 1 | version: '3' 2 | 3 | services: 4 | 5 | grafana: 6 | image: grafana/grafana 7 | ports: 8 | - "3000:3000" 9 | volumes: 10 | - {{ monitor_dir }}/grafana:/var/lib/grafana:rw 11 | deploy: 12 | mode: replicated 13 | replicas: 1 14 | 15 | prometheus: 16 | image: prom/prometheus:latest 17 | ports: 18 | - '9090:9090' 19 | volumes: 20 | - {{ monitor_dir }}/etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro 21 | - {{ monitor_dir }}/prometheus:/prometheus:rw 22 | deploy: 23 | mode: replicated 24 | replicas: 1 25 | 26 | exporter: 27 | image: prom/node-exporter:latest 28 | ports: 29 | - '9100:9100' 30 | volumes: 31 | - /sys:/host/sys:ro 32 | - /:/rootfs:ro 33 | - /proc:/host/proc:ro 34 | deploy: 35 | mode: global 36 | 37 | pihole-exporter: 38 | image: jahrik/pihole-exporter 39 | ports: 40 | - '9101:9311' 41 | deploy: 42 | replicas: 1 43 | command: "-pihole http://bebop" 44 | 45 | cadvisor: 46 | image: google/cadvisor:latest 47 | ports: 48 | - '9102:8080' 49 | volumes: 50 | - /var/lib/docker/:/var/lib/docker 51 | - /dev/disk/:/dev/disk 52 | - /sys:/sys 53 | - /var/run:/var/run 54 | - /:/rootfs 55 | - /dev/zfs:/dev/zfs 56 | deploy: 57 | mode: global 58 | resources: 59 | limits: 60 | cpus: '0.50' 61 | memory: 1024M 62 | reservations: 63 | cpus: '0.25' 64 | memory: 512M 65 | update_config: 66 | parallelism: 3 67 | monitor: 2m 68 | max_failure_ratio: 0.3 69 | failure_action: rollback 70 | delay: 30s 71 | restart_policy: 72 | condition: on-failure 73 | delay: 5s 74 | max_attempts: 3 75 | -------------------------------------------------------------------------------- /monitor/meta/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | galaxy_info: 3 | author: your name 4 | description: your description 5 | company: your company (optional) 6 | 7 | # If the issue tracker for your role is not on github, uncomment the 8 | # next line and provide a value 9 | # issue_tracker_url: http://example.com/issue/tracker 10 | 11 | # Some suggested licenses: 12 | # - BSD (default) 13 | # - MIT 14 | # - GPLv2 15 | # - GPLv3 16 | # - Apache 17 | # - CC-BY 18 | license: license (GPLv2, CC-BY, etc) 19 | 20 | min_ansible_version: 1.2 21 | 22 | # If this a Container Enabled role, provide the minimum Ansible Container version. 
23 | # min_ansible_container_version: 24 | 25 | # Optionally specify the branch Galaxy will use when accessing the GitHub 26 | # repo for this role. During role install, if no tags are available, 27 | # Galaxy will use this branch. During import Galaxy will access files on 28 | # this branch. If Travis integration is configured, only notifications for this 29 | # branch will be accepted. Otherwise, in all cases, the repo's default branch 30 | # (usually master) will be used. 31 | # github_branch: 32 | 33 | # 34 | # platforms is a list of platforms, and each platform has a name and a list of versions. 35 | # 36 | # platforms: 37 | # - name: Fedora 38 | # versions: 39 | # - all 40 | # - 25 41 | # - name: SomePlatform 42 | # versions: 43 | # - all 44 | # - 1.0 45 | # - 7 46 | # - 99.99 47 | 48 | galaxy_tags: [] 49 | # List tags for your role here, one per line. A tag is a keyword that describes 50 | # and categorizes the role. Users find roles by searching for tags. Be sure to 51 | # remove the '[]' above, if you add tags to this list. 52 | # 53 | # NOTE: A tag is limited to a single word comprised of alphanumeric characters. 54 | # Maximum 20 tags per role. 55 | 56 | dependencies: [] 57 | # List your role dependencies here, one per line. Be sure to remove the '[]' above, 58 | # if you add dependencies to this list. 59 | -------------------------------------------------------------------------------- /monitor/templates/prometheus.yml.j2: -------------------------------------------------------------------------------- 1 | global: 2 | scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. 3 | evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. 4 | # scrape_timeout is set to the global default (10s). 5 | 6 | # Alertmanager configuration 7 | alerting: 8 | alertmanagers: 9 | - static_configs: 10 | - targets: 11 | # - alertmanager:9093 12 | 13 | # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. 14 | rule_files: 15 | # - 'first_rules.yml' 16 | # - 'second_rules.yml' 17 | 18 | # A scrape configuration containing exactly one endpoint to scrape: 19 | # Here it's Prometheus itself. 20 | scrape_configs: 21 | # The job name is added as a label `job=` to any timeseries scraped from this config. 
22 | - job_name: 'prometheus' 23 | scrape_interval: 10s 24 | static_configs: 25 | - targets: 26 | - shredder:9090 27 | 28 | # http://shredder:9100/metrics 29 | - job_name: 'node' 30 | scrape_interval: 10s 31 | metrics_path: '/metrics' 32 | static_configs: 33 | - targets: 34 | - shredder:9100 35 | # - donatello:9100 36 | # - leonardo:9100 37 | - ninja:9100 38 | - venus:9100 39 | - oroku:9100 40 | - bebop:9100 41 | - rocks:9100 42 | 43 | # http://shredder:9101/metrics 44 | - job_name: 'pihole' 45 | scrape_interval: 10s 46 | metrics_path: '/metrics' 47 | static_configs: 48 | - targets: 49 | - shredder:9101 50 | 51 | # http://shredder:9102/metrics/ 52 | - job_name: 'cadvisor' 53 | scrape_interval: 30s 54 | metrics_path: '/metrics' 55 | static_configs: 56 | - targets: 57 | - shredder:9102 58 | 59 | # http://shredder:9103/metrics/ 60 | - job_name: 'transmission' 61 | scrape_interval: 10s 62 | metrics_path: '/metrics' 63 | static_configs: 64 | - targets: 65 | - shredder:9103 66 | 67 | # http://shredder:8080/prometheus/ 68 | - job_name: 'jenkins' 69 | scrape_interval: 10s 70 | metrics_path: '/prometheus' 71 | static_configs: 72 | - targets: 73 | - shredder:8080 74 | -------------------------------------------------------------------------------- /monitor/tasks/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # If built in molecule, user will be 'ubuntu' 3 | # install docker swarm first 4 | 5 | - include_tasks: vagrant_swarm.yml 6 | when: 7 | - ansible_user_id == 'ubuntu' 8 | 9 | - name: Create base directory structure 10 | become: true 11 | file: 12 | path: "{{ item }}" 13 | state: directory 14 | owner: root 15 | group: root 16 | mode: 0755 17 | with_items: 18 | - "{{ monitor_dir }}" 19 | - "{{ monitor_dir }}/stacks" 20 | 21 | - name: Create directories for prometheus 22 | become: true 23 | file: 24 | path: "{{ item }}" 25 | state: directory 26 | owner: 65534 27 | group: 65534 28 | mode: 0755 29 | # recurse: yes 30 | with_items: 31 | - "{{ monitor_dir }}/prometheus" 32 | - "{{ monitor_dir }}/etc/prometheus" 33 | 34 | - name: Create directories for grafana 35 | become: true 36 | file: 37 | path: "{{ item }}" 38 | state: directory 39 | owner: 472 40 | group: 472 41 | mode: 0755 42 | # recurse: yes 43 | with_items: 44 | - "{{ monitor_dir }}/grafana" 45 | 46 | - name: Generate config files 47 | become: true 48 | template: 49 | src: prometheus.yml.j2 50 | dest: "{{ monitor_dir }}/etc/prometheus/prometheus.yml" 51 | mode: 0644 52 | register: prom_conf 53 | 54 | - name: Check if prometheus is running 55 | ignore_errors: true 56 | uri: 57 | url: "http://{{ ansible_default_ipv4.address }}:9090/graph" 58 | status_code: 200 59 | register: result 60 | 61 | - name: kill prometheus service if conf file changes 62 | become: true 63 | command: docker service rm monitor_prometheus 64 | when: 65 | - result.status == 200 66 | - prom_conf.changed 67 | 68 | - name: Generate stack file 69 | become: true 70 | template: 71 | src: monitor-stack.yml.j2 72 | dest: "{{ monitor_dir }}/stacks/monitor-stack.yml" 73 | mode: 0644 74 | 75 | - name: update docker images 76 | become: true 77 | command: "{{ item }}" 78 | with_items: 79 | - docker pull grafana/grafana 80 | - docker pull prom/prometheus 81 | - docker pull prom/node-exporter 82 | - docker pull jahrik/pihole-exporter 83 | 84 | - name: deploy the monitor stack to docker swarm 85 | become: true 86 | command: docker stack deploy -c monitor-stack.yml monitor 87 | args: 88 | chdir: "{{ monitor_dir }}/stacks/" 89 | 90 | - 
name: Wait for prometheus port to come up 91 | wait_for: 92 | host: "{{ ansible_default_ipv4.address }}" 93 | port: 9090 94 | timeout: 30 95 | 96 | - name: Wait for grafana port to come up 97 | wait_for: 98 | host: "{{ ansible_default_ipv4.address }}" 99 | port: 3000 100 | timeout: 30 101 | 102 | - name: Wait for prometheus exporter port to come up 103 | wait_for: 104 | host: "{{ ansible_default_ipv4.address }}" 105 | port: 9100 106 | timeout: 30 107 | 108 | - name: Wait for pihole exporter port to come up 109 | wait_for: 110 | host: "{{ ansible_default_ipv4.address }}" 111 | port: 9101 112 | timeout: 30 113 | 114 | - name: docker stack ps monitor 115 | become: true 116 | shell: docker stack ps monitor 117 | register: docker_stack 118 | 119 | - debug: 120 | msg: "Stack {{ docker_stack.stdout_lines }}" 121 | -------------------------------------------------------------------------------- /gh-md-toc: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # 4 | # Steps: 5 | # 6 | # 1. Download corresponding html file for some README.md: 7 | # curl -s $1 8 | # 9 | # 2. Discard rows where no substring 'user-content-' (github's markup): 10 | # awk '/user-content-/ { ... 11 | # 12 | # 3.1 Get last number in each row like ' ... sitemap.js.*<\/h/)+2, RLENGTH-5) 21 | # 22 | # 5. Find anchor and insert it inside "(...)": 23 | # substr($0, match($0, "href=\"[^\"]+?\" ")+6, RLENGTH-8) 24 | # 25 | 26 | gh_toc_version="0.4.9" 27 | 28 | gh_user_agent="gh-md-toc v$gh_toc_version" 29 | 30 | # 31 | # Download rendered into html README.md by its url. 32 | # 33 | # 34 | gh_toc_load() { 35 | local gh_url=$1 36 | 37 | if type curl &>/dev/null; then 38 | curl --user-agent "$gh_user_agent" -s "$gh_url" 39 | elif type wget &>/dev/null; then 40 | wget --user-agent="$gh_user_agent" -qO- "$gh_url" 41 | else 42 | echo "Please, install 'curl' or 'wget' and try again." 43 | exit 1 44 | fi 45 | } 46 | 47 | # 48 | # Converts local md file into html by GitHub 49 | # 50 | # ➥ curl -X POST --data '{"text": "Hello world github/linguist#1 **cool**, and #1!"}' https://api.github.com/markdown 51 | #

<p>Hello world github/linguist#1 <strong>cool</strong>, and #1!</p>

'" 52 | gh_toc_md2html() { 53 | local gh_file_md=$1 54 | URL=https://api.github.com/markdown/raw 55 | TOKEN="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/token.txt" 56 | if [ -f "$TOKEN" ]; then 57 | URL="$URL?access_token=$(cat $TOKEN)" 58 | fi 59 | curl -s --user-agent "$gh_user_agent" \ 60 | --data-binary @"$gh_file_md" -H "Content-Type:text/plain" \ 61 | $URL 62 | } 63 | 64 | # 65 | # Is passed string url 66 | # 67 | gh_is_url() { 68 | case $1 in 69 | https* | http*) 70 | echo "yes";; 71 | *) 72 | echo "no";; 73 | esac 74 | } 75 | 76 | # 77 | # TOC generator 78 | # 79 | gh_toc(){ 80 | local gh_src=$1 81 | local gh_src_copy=$1 82 | local gh_ttl_docs=$2 83 | 84 | if [ "$gh_src" = "" ]; then 85 | echo "Please, enter URL or local path for a README.md" 86 | exit 1 87 | fi 88 | 89 | 90 | # Show "TOC" string only if working with one document 91 | if [ "$gh_ttl_docs" = "1" ]; then 92 | 93 | echo "Table of Contents" 94 | echo "=================" 95 | echo "" 96 | gh_src_copy="" 97 | 98 | fi 99 | 100 | if [ "$(gh_is_url "$gh_src")" == "yes" ]; then 101 | gh_toc_load "$gh_src" | gh_toc_grab "$gh_src_copy" 102 | else 103 | gh_toc_md2html "$gh_src" | gh_toc_grab "$gh_src_copy" 104 | fi 105 | } 106 | 107 | # 108 | # Grabber of the TOC from rendered html 109 | # 110 | # $1 — a source url of document. 111 | # It's need if TOC is generated for multiple documents. 112 | # 113 | gh_toc_grab() { 114 | # if closed is on the new line, then move it on the prev line 115 | # for example: 116 | # was: The command foo1 117 | # 118 | # became: The command foo1 119 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n<\/h/<\/h/g' | 120 | # find strings that corresponds to template 121 | grep -E -o '//' | sed 's/<\/code>//' | 124 | # now all rows are like: 125 | # ... .*<\/h/)+2, RLENGTH-5)"](" gh_url substr($0, match($0, "href=\"[^\"]+?\" ")+6, RLENGTH-8) ")"}' | sed 'y/+/ /; s/%/\\x/g')" 130 | } 131 | 132 | # 133 | # Returns filename only from full path or url 134 | # 135 | gh_toc_get_filename() { 136 | echo "${1##*/}" 137 | } 138 | 139 | # 140 | # Options hendlers 141 | # 142 | gh_toc_app() { 143 | local app_name="gh-md-toc" 144 | 145 | if [ "$1" = '--help' ] || [ $# -eq 0 ] ; then 146 | echo "GitHub TOC generator ($app_name): $gh_toc_version" 147 | echo "" 148 | echo "Usage:" 149 | echo " $app_name src [src] Create TOC for a README file (url or local path)" 150 | echo " $app_name - Create TOC for markdown from STDIN" 151 | echo " $app_name --help Show help" 152 | echo " $app_name --version Show version" 153 | return 154 | fi 155 | 156 | if [ "$1" = '--version' ]; then 157 | echo "$gh_toc_version" 158 | return 159 | fi 160 | 161 | if [ "$1" = "-" ]; then 162 | if [ -z "$TMPDIR" ]; then 163 | TMPDIR="/tmp" 164 | elif [ -n "$TMPDIR" -a ! 
-d "$TMPDIR" ]; then 165 | mkdir -p "$TMPDIR" 166 | fi 167 | local gh_tmp_md 168 | gh_tmp_md=$(mktemp $TMPDIR/tmp.XXXXXX) 169 | while read input; do 170 | echo "$input" >> "$gh_tmp_md" 171 | done 172 | gh_toc_md2html "$gh_tmp_md" | gh_toc_grab "" 173 | return 174 | fi 175 | 176 | for md in "$@" 177 | do 178 | echo "" 179 | gh_toc "$md" "$#" 180 | done 181 | 182 | echo "" 183 | echo "Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc)" 184 | } 185 | 186 | # 187 | # Entry point 188 | # 189 | gh_toc_app "$@" 190 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Docker Swarm monitoring - part 01 (Node-exporter, Prometheus, and Grafana) 2 | 3 | An effective monitoring system can be built across a Docker Swarm cluster using services managed by swarm itself. Starting with the prometheus node-exporter to gather system info from all host machines running Docker in swarm mode. Mount the system's directories as docker volumes to accomplish read access. Prometheus exporter gathers system info such as CPU, memory, and disk usage and exports it to a website that Prometheus server can then scrape every 15 seconds and fill a Time Series Data Base. With those 2 services in place, Grafana can then be pointed at the Prometheus server to build beautiful graphs and dashboards! 4 | 5 | Prerequisites: 6 | * [Docker Install Docs](https://docs.docker.com/install/linux/docker-ce/ubuntu/) 7 | * [Docker Swarm Docs](https://docs.docker.com/engine/reference/commandline/swarm_init/) 8 | * [github.com/jahrik/docker-swarm-monitor](https://github.com/jahrik/docker-swarm-monitor) 9 | 10 | Docker Swarm uses [Compose v3](https://docs.docker.com/compose/compose-file/) and uses a `docker-stack.yml` file, much like the `docker-compose.yml` files designed to be used with the `docker-compose` tool, which use Compose v2. One of the biggest differences you'll run into when starting services with `docker stack deploy` over `docker-compose up/down` is that docker swarm creates a [Routing Mesh](https://docs.docker.com/engine/swarm/ingress/) for you, where as with `docker-compose` networks and containers have to be explicitly created and linked. In swarm mode, the `link: ` is no longer needed. Services can be included in the same stack file and, by default, be created in the same network stack at deploy time, allowing docker containers to call each other by service name. This network can then be used by other stacks and future services by calling it in the stack file and assigning a service to it. This makes it easy to keep containers on their own isolated network or to cluster certain services like metrics and logging tools together on the same network. 11 | 12 | Here is a Compose v3 docker-stack.yml file for this project that will start three services: Grafana, Prometheus server, and Prometheus node-exporter. 
13 | 14 | **docker-stack.yml** 15 | 16 | version: '3' 17 | 18 | services: 19 | 20 | exporter: 21 | image: prom/node-exporter:latest 22 | ports: 23 | - '9100:9100' 24 | volumes: 25 | - /sys:/host/sys:ro 26 | - /:/rootfs:ro 27 | - /proc:/host/proc:ro 28 | deploy: 29 | mode: global 30 | 31 | prometheus: 32 | image: prom/prometheus:latest 33 | ports: 34 | - '9090:9090' 35 | volumes: 36 | - ./data/etc/prometheus.yml:/etc/prometheus/prometheus.yml:ro 37 | - ./data/prometheus:/prometheus:rw 38 | deploy: 39 | mode: replicated 40 | replicas: 1 41 | 42 | grafana: 43 | image: grafana/grafana 44 | ports: 45 | - "3000:3000" 46 | volumes: 47 | - ./data/grafana:/var/lib/grafana:rw 48 | deploy: 49 | mode: replicated 50 | replicas: 1 51 | 52 | Directory creation needs to be done before deploying this stack. A [Makefile](https://github.com/jahrik/docker-swarm-monitor/blob/master/Makefile) has been included to handle config, build, deploy, destroy operations and should be used as a reference for the commands that will build this thing. 53 | 54 | make help 55 | 56 | config: Copy prometheus.yml to config dir 57 | dir: Create directories 58 | update: Pull latest docker images 59 | deploy: Deploy to docker swarm 60 | destroy: Docker stack rm && rm -rf data 61 | help: This help dialog 62 | 63 | With what's in the source code, the stack can be started with: 64 | 65 | sudo make 66 | 67 | ## Prometheus 68 | 69 | ### Exporter 70 | 71 | Browse to the [Prometheus node-exporter](https://github.com/prometheus/node_exporter) docs on github and you'll see a few lines at the bottom of the readme showing how to run it in docker, which look like this. 72 | 73 | docker run -d \ 74 | --net="host" \ 75 | --pid="host" \ 76 | quay.io/prometheus/node-exporter 77 | 78 | Start by creating the stack file with just this entry. Take the image name from the docs and add it to the stack file. The volumes in the stack file are mounted for prometheus to read. `deploy: mode: global` means this service will be started on every node in the swarm cluster. Output is served at [localhost:9100/](http://localhost:9100/) 79 | 80 | **docker-stack.yml** 81 | 82 | version: '3' 83 | 84 | services: 85 | 86 | exporter: 87 | image: prom/node-exporter:latest 88 | ports: 89 | - '9100:9100' 90 | volumes: 91 | - /sys:/host/sys:ro 92 | - /:/rootfs:ro 93 | - /proc:/host/proc:ro 94 | deploy: 95 | mode: global 96 | 97 | Start this up with the `docker stack deploy` command 98 | 99 | docker stack deploy -c docker-stack.yml monitor 100 | 101 | Creating network monitor_default 102 | Creating service monitor_exporter 103 | 104 | This can also be kicked off with the Makefile 105 | 106 | make deploy 107 | 108 | Updating service monitor_exporter (id: ivbddqpnjr7sdxre0gzopney9) 109 | 110 | Check the service 111 | 112 | docker service ps monitor_exporter 113 | 114 | ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 115 | cp7o7s9t33s6 monitor_exporter.76g7crzb0hk6jp9zysvegmupy prom/node-exporter:latest localhost Running Running about a minute ago 116 | 117 | Check the logs 118 | 119 | docker service logs monitor_exporter 120 | ... 121 | ... 122 | monitor_exporter.0.cp7o7s9t33s6@localhost | time="2018-03-28T08:14:47Z" level=info msg="Listening on :9100" source="node_exporter.go:76" 123 | 124 | Browse [localhost:9100/](http://localhost:9100/) and check it out. 
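The same thing can be checked from the command line; something like this should work, assuming the default port mapping on the docker host:

    # Ask node-exporter for its metrics and show a few of the node_cpu series
    curl -s http://localhost:9100/metrics | grep '^node_cpu' | head
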
125 | 126 | ![node_exporter](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/node_exporter.png) 127 | 128 | ### Server 129 | 130 | Next, start up the [Prometheus Server](https://github.com/prometheus/prometheus). This will scrape the exporter at the 10-second interval set in the prometheus.yml configuration file. This file will be configured locally and mounted into the container as a volume at run time. Volumes will also be used for persistent tsdb data in case of a container restart or failure. 131 | 132 | **docker-stack.yml** 133 | 134 | prometheus: 135 | image: prom/prometheus:latest 136 | ports: 137 | - '9090:9090' 138 | volumes: 139 | - ./data/etc/prometheus.yml:/etc/prometheus/prometheus.yml:ro 140 | - ./data/prometheus:/prometheus:rw 141 | deploy: 142 | mode: replicated 143 | replicas: 1 144 | 145 | Prepare directories for mounting docker volumes. These will need read/write permissions for the default prometheus container user, which is nobody:nobody. 146 | 147 | DATA_DIR="./data" 148 | 149 | mkdir -p \ 150 | "$DATA_DIR/etc" \ 151 | "$DATA_DIR/grafana" \ 152 | "$DATA_DIR/prometheus" 153 | 154 | chmod 777 "$DATA_DIR/prometheus" 155 | chown -R nobody:nobody "$DATA_DIR/prometheus" 156 | 157 | Volumes are configured in the docker-stack.yml file. The first one is where prometheus writes its database. The second mounts the prometheus.yml file, which will come in handy later, when I start deploying this with Jenkins, because it lets me edit this file and reconfigure prometheus at deploy time. 158 | * ./data/prometheus:/prometheus:rw 159 | * ./data/etc/prometheus.yml:/etc/prometheus/prometheus.yml:ro 160 | 161 | Check out your prometheus.yml file and make sure the exporter is added as a scrape target. This is how targets like Cadvisor and mysql-exporter will be added in the future. 162 | 163 | **prometheus.yml** 164 | 165 | global: 166 | scrape_interval: 30s # Set the scrape interval to every 30 seconds. Default is every 1 minute. 167 | evaluation_interval: 30s # Evaluate rules every 30 seconds. The default is every 1 minute. 168 | # scrape_timeout is set to the global default (10s). 169 | 170 | # Alertmanager configuration 171 | alerting: 172 | alertmanagers: 173 | - static_configs: 174 | - targets: 175 | # - alertmanager:9093 176 | 177 | # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. 178 | rule_files: 179 | # - "first_rules.yml" 180 | # - "second_rules.yml" 181 | 182 | # A scrape configuration containing exactly one endpoint to scrape: 183 | # Here it's Prometheus itself. 184 | scrape_configs: 185 | # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. 186 | - job_name: 'prometheus' 187 | # metrics_path defaults to '/metrics' 188 | # scheme defaults to 'http'. 189 | static_configs: 190 | - targets: 191 | - localhost:9090 192 | 193 | # http://exporter:9100/metrics 194 | - job_name: prometheus-exporter 195 | scrape_interval: 10s 196 | metrics_path: "/metrics" 197 | static_configs: 198 | - targets: 199 | - exporter:9100 200 | 201 | Use `make config` and it will copy this config file to where it needs to go. 202 | 203 | make config 204 | 205 | config: 206 | @cp prometheus.yml $(DATA_DIR)/etc/prometheus.yml 207 | 208 | With the prometheus server service added to the docker-stack.yml file and everything configured, redeploy the stack to add the new service. 
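Before redeploying, the scrape config can optionally be sanity-checked with promtool, which is bundled with Prometheus (and included in the prom/prometheus image). Something along these lines should work, assuming the ./data layout from above:

    # Validate prometheus.yml syntax without starting a server
    docker run --rm --entrypoint promtool \
      -v "$(pwd)/data/etc/prometheus.yml:/tmp/prometheus.yml:ro" \
      prom/prometheus check config /tmp/prometheus.yml
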
209 | 210 | docker stack deploy -c docker-stack.yml monitor 211 | 212 | Creating service monitor_prometheus 213 | Updating service monitor_exporter (id: ivbddqpnjr7sdxre0gzopney9) 214 | 215 | Browse to [localhost:9090/targets](http://127.0.0.1:9090/targets) to verify connectivity. 216 | 217 | ![prometheus_targets](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/prometheus_targets.png) 218 | 219 | Now that the server is successfully scraping system data, it's possible to query the TSDB 220 | 221 | node_cpu{cpu="cpu0"} 222 | node_cpu{cpu="cpu0",mode="idle"} 223 | etc... 224 | 225 | ![prometheus_query_01](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/prometheus_query_01.png) 226 | 227 | ## Grafana 228 | 229 | With Prometheus up and running, it's time to start Grafana. A volume is created for persistent data. It will be available at [localhost:3000/](http://localhost:3000/) 230 | 231 | **docker-stack.yml** 232 | 233 | grafana: 234 | image: grafana/grafana 235 | ports: 236 | - "3000:3000" 237 | volumes: 238 | - ./data/grafana:/var/lib/grafana:rw 239 | deploy: 240 | mode: replicated 241 | replicas: 1 242 | 243 | Redeploy the stack to start Grafana 244 | 245 | docker stack deploy -c docker-stack.yml monitor 246 | 247 | Updating service monitor_exporter (id: ivbddqpnjr7sdxre0gzopney9) 248 | Updating service monitor_prometheus (id: q4f07qz2tk3dvic9kc21sa3kq) 249 | Creating service monitor_grafana 250 | 251 | Browse to [localhost:3000/login](http://localhost:3000/login) 252 | 253 | The default user and password are: `admin` / `admin` 254 | 255 | ![grafana_login](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/grafana_login.png) 256 | 257 | Add a data source 258 | ![config_add_data_source](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/config_add_data_source.png) 259 | 260 | Choose Prometheus from the drop-down 261 | ![config_dropdown_prometheus](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/config_dropdown_prometheus.png) 262 | 263 | I used the IP of the host machine in this example 264 | ![config_host_ip](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/config_host_ip.png) 265 | 266 | Go to [grafana.com/dashboards](https://grafana.com/dashboards) to check out the thousands of pre-made dashboards that are out there and find one that will work as a template to build on. A good one to start with in this project is the [node exporter metrics on docker swarm mode](https://grafana.com/dashboards/1442) dashboard, ID `1442`. 267 | 268 | Import this dashboard to Grafana 269 | ![import_dashboard_01](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/import_dashboard_01.png) 270 | 271 | Choose Prometheus as the data source and hit Import 272 | ![import_dashboard_02](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/import_dashboard_02.png) 273 | 274 | And the finished dashboard 275 | ![complete_dashboard](https://raw.githubusercontent.com/jahrik/docker-swarm-monitor/master/images/complete_dashboard.png) 276 | 277 | With that, a very flexible monitoring system has been established across the swarm cluster! A lot can be done to add to it easily, with new data sources and dashboards. 
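When building panels later, it can also help to test expressions outside of Grafana; Prometheus serves the same data over its HTTP API. Something like this should work against the default port mapping, using a standard node-exporter metric:

    # Instant query for the 1-minute load average collected by node-exporter
    curl -s 'http://localhost:9090/api/v1/query?query=node_load1'
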
278 | 279 | Bring it all down with 280 | 281 | sudo make destroy 282 | 283 | Removing service monitor_exporter 284 | Removing service monitor_grafana 285 | Removing service monitor_prometheus 286 | Removing network monitor_default 287 | -------------------------------------------------------------------------------- /PART_02.md: -------------------------------------------------------------------------------- 1 | # Docker Swarm monitoring - part 02 (Fixes, Cadvisor, and Pihole) 2 | 3 | In [part 01](https://homelab.business/docker-swarm-monitoring-part-01/), I deployed [node exporter](https://github.com/prometheus/node_exporter), [Prometheus](https://github.com/prometheus/prometheus), and [Grafana](https://grafana.com/). This time around, I will touch on some of the problems I've run into since then and how I solved them. I'll tack on another monitoring tool to the stack, [Cadvisor](https://github.com/google/cadvisor). Finally, I'll forward [Pi-Hole](https://pi-hole.net/) metrics to a Grafana dashboard. 4 | 5 | Since part 01, I have added enough to [deploy this to Docker Swarm](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/monitor-stack.yml.j2) using a [Jenkins pipeline](https://github.com/jahrik/docker-swarm-monitor/blob/master/Jenkinsfile) and [Ansible playbook](https://github.com/jahrik/docker-swarm-monitor/blob/master/playbook.yml). This workflow lets me push my changes to github, have Jenkins handle building and testing, then push configuration and deploy to Docker Swarm with Ansible AWX. There is a [write-up on doing the same thing with an Ark server](https://homelab.business/ark-jenkins-ansible-swarm/), if you need more information on how all those pieces fit together. 6 | 7 | * [Grafana](#grafana) 8 | * [Prometheus](#prometheus) 9 | * [Node Exporter](#node-exporter) 10 | * [Cadvisor](#cadvisor) 11 | * [Pihole](#pihole) 12 | * [Pihole exporter](#pihole-exporter) 13 | 14 | ## Grafana 15 | 16 | Somehow, I ended up changing the permissions to the Grafana SQLite.db file and it was still able to read data, but I wasn't able to save anything. Somewhere along the line, maybe I ran a command close to this? `chown 1000:1000 /data/grafana/grafana.db`. Grafana didn't like it. 17 | 18 | ![grafana_save_dashboard_error.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_save_dashboard_error.png?raw=true) 19 | 20 | The following was observed in `docker service logs -f monitor_grafana` 21 | 22 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:54:07+0000 lvl=eror msg="Failed to save dashboard" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 23 | 24 | Plus a repeating stream of the following error, over and over. 
25 | 26 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 27 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 28 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 29 | monitor_grafana.1.tyxisxhoxri4@ | t=2018-05-18T05:57:59+0000 lvl=eror msg="Failed to update last_seen_at" logger=context userId=1 orgId=1 uname=admin error="attempt to write a readonly database" 30 | 31 | Which makes it pretty obvious what's going on: 32 | * msg="Failed to save dashboard" 33 | * msg="Failed to update last_seen_at" 34 | * error="attempt to write a readonly database" 35 | 36 | This was an easy fix. 37 | 38 | Find the grafana container and note the container id. 39 | 40 | docker ps 41 | 42 | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 43 | 5dbad5cc02a1 grafana/grafana:latest "/run.sh" 21 minutes ago Up 21 minutes 3000/tcp monitor_grafana.1.tyxisxhoxri40hfv56ecgr46i 44 | 45 | Execute a shell on the docker container. 46 | 47 | docker exec -it 5dbad5cc02a1 bash 48 | 49 | Get grafana user id info 50 | 51 | grafana@5dbad5cc02a1:/$ id 52 | uid=472(grafana) gid=472(grafana) groups=472(grafana) 53 | 54 | On the docker host, set the file permission to all the files on the host end of the volume. 55 | Where `/data/grafana` is the mounted volume containing `/data/grafana/grafana.db` 56 | 57 | sudo chown -R 472:472 /data/grafana 58 | 59 | Kill and restart the grafana service 60 | 61 | docker service rm monitor_grafana 62 | 63 | docker stack deploy -c monitor-stack.yml monitor 64 | 65 | Write access is restored. 66 | 67 | ![grafana_save_dashboard.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_save_dashboard.png?raw=true) 68 | 69 | Here's an Ansible task to handle this. 70 | 71 | - name: Create directories for grafana 72 | become: true 73 | file: 74 | path: "{{ item }}" 75 | state: directory 76 | owner: 472 77 | group: 472 78 | mode: 0755 79 | # recurse: yes 80 | with_items: 81 | - "{{ monitor_dir }}/grafana" 82 | - "{{ monitor_dir }}/grafana/sessions" 83 | - "{{ monitor_dir }}/grafana/plugins" 84 | 85 | - name: Set file perms on grafana.db file 86 | become: true 87 | file: 88 | path: "{{ monitor_dir }}/grafana/grafana.db" 89 | state: file 90 | owner: 472 91 | group: 472 92 | mode: 0664 93 | 94 | ## Prometheus 95 | 96 | When you make an update to the prometheus.yml file, the desired action is for the Prometheus server to be restarted. Because I'm deploying this in an automated fashion, I need to handle the restarting of this service the same way and add in a couple of checks along the way. [This Ansible playbook can be found here](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/tasks/main.yml). 97 | 98 | The config file is generated and registers a variable, `prom_conf` containing information about the file in question, `prometheus.yml`, including information on whether the file has been changed this run or not. 
99 | 100 | - name: Generate config files 101 | become: true 102 | template: 103 | src: prometheus.yml.j2 104 | dest: "{{ monitor_dir }}/etc/prometheus/prometheus.yml" 105 | mode: 0644 106 | register: prom_conf 107 | 108 | Next is a check to see whether Prometheus is running, using the [uri module](http://docs.ansible.com/ansible/latest/modules/uri_module.html). This also registers a variable, `result`, containing a status code returned from whatever webserver it's pointed at. In this case, I'm pulling the default IPv4 address from the host that ansible is currently running on and adding `:9090/graph` to the end of that, in hopes of reaching Prometheus. Notice how this one also has `ignore_errors: true`. This matters the very first time this runs, or any time Prometheus is not actually running: without it, you will get a status_code back that does not equal 200 and the task will fail. 109 | 110 | - name: Check if prometheus is running 111 | ignore_errors: true 112 | uri: 113 | url: "http://{{ ansible_default_ipv4.address }}:9090/graph" 114 | status_code: 200 115 | register: result 116 | 117 | With these two checks in place, there is enough information to determine whether the prometheus server needs to be restarted. With a when statement that contains more than one condition, `when: ['one_thing','two_thing']`, both values have to be true before this task is kicked off. If the variable `prom_conf` comes back with a `.changed` status of `true`, that condition is met. Same goes for `result.status`: if it equals 200, it evaluates to `true`. 118 | 119 | - name: kill prometheus service if conf file changes 120 | become: true 121 | command: docker service rm monitor_prometheus 122 | when: 123 | - result.status == 200 124 | - prom_conf.changed 125 | 126 | With that, the stack is redeployed to swarm, restarting Prometheus. 127 | 128 | - name: deploy the monitor stack to docker swarm 129 | become: true 130 | command: docker stack deploy -c monitor-stack.yml monitor 131 | args: 132 | chdir: "{{ monitor_dir }}/stacks/" 133 | 134 | 135 | I've also added a check at the end of the playbook to make sure Prometheus is running. 136 | 137 | - name: Wait for prometheus port to come up 138 | wait_for: 139 | host: "{{ ansible_default_ipv4.address }}" 140 | port: 9090 141 | timeout: 30 142 | 143 | ## Node Exporter 144 | 145 | I'm seeing the following from `docker service logs -f monitor_exporter`. I would like node exporter to ignore docker volume mounts. Ignoring all of /var/lib/docker would be ok with me for now to clean up this error, but I haven't figured out where to configure that yet. It's on the **TODO** list. 146 | 147 | time="2018-05-19T08:33:59Z" level=error msg="Error on statfs() system call for \"/rootfs/var/lib/docker/overlay2/f8da180fa939589132d04099a37c9f182bc0b38e84d0b84ee8958fe42aa5e18d/merged\": permission denied" source="filesystem_linux.go:57" 148 | time="2018-05-19T08:33:59Z" level=error msg="Error on statfs() system call for \"/rootfs/var/lib/docker/containers/01358918338b67982715107fe876b803abbcd0c57f4672c07de0025d1426f2af/mounts/shm\": permission denied" source="filesystem_linux.go:57" 149 | time="2018-05-19T08:33:59Z" level=error msg="Error on statfs() system call for \"/rootfs/run/docker/netns/a2d163e99d44\": permission denied" source="filesystem_linux.go:57" 150 | 151 | Out of the box, the [Node - ZFS](https://grafana.com/dashboards/3170) and [Node - ZFS all](https://grafana.com/dashboards/3161) dashboards rely on a specific job name to work. 
At first I had the Prometheus job name set to 'node-exporter' in the prometheus.yml file, but these dashboards rely on it being just 'node' and use that as a variable. 152 | 153 | ![grafana_zfs_job_node.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_zfs_job_node.png?raw=true) 154 | 155 | The entry in the [prometheus.yml](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/prometheus.yml.j2) file uses a job name of `node` to work with the zfs dashboards. 156 | 157 | # http://docker_host:9100/metrics 158 | - job_name: 'node' 159 | scrape_interval: 10s 160 | metrics_path: '/metrics' 161 | static_configs: 162 | - targets: 163 | - docker_host:9100 164 | 165 | * [Node Exporter Full](https://grafana.com/dashboards/1860) 166 | 167 | ## Cadvisor 168 | 169 | [Cadvisor](https://github.com/google/cadvisor) exports metrics from the containers running on each node. 170 | > cAdvisor has native support for Docker containers and should support just about any other container type out of the box. 171 | 172 | With only node_exporter running and no Cadvisor yet, the [Docker-swarm-monitor dashboard](https://grafana.com/dashboards/2603) will look a bit like this. 173 | ![grafana_docker_swarm_dashboard_before.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_docker_swarm_dashboard_before.png?raw=true) 174 | 175 | Add Cadvisor to the [monitor-stack.yml](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/monitor-stack.yml.j2) file. 176 | 177 | cadvisor: 178 | image: google/cadvisor:latest 179 | ports: 180 | - '9102:8080' 181 | volumes: 182 | - /var/lib/docker/:/var/lib/docker 183 | - /dev/disk/:/dev/disk 184 | - /sys:/sys 185 | - /var/run:/var/run 186 | - /:/rootfs 187 | - /dev/zfs:/dev/zfs 188 | deploy: 189 | mode: global 190 | resources: 191 | limits: 192 | cpus: '0.50' 193 | memory: 1024M 194 | reservations: 195 | cpus: '0.25' 196 | memory: 512M 197 | update_config: 198 | parallelism: 3 199 | monitor: 2m 200 | max_failure_ratio: 0.3 201 | failure_action: rollback 202 | delay: 30s 203 | restart_policy: 204 | condition: on-failure 205 | delay: 5s 206 | max_attempts: 3 207 | 208 | Because I'm deploying this with a [webhook to jenkins](https://homelab.business/ark-jenkins-ansible-swarm/#webhook), the [commit](https://github.com/jahrik/docker-swarm-monitor/commit/ccc13342b8c58a08ce8da8488f2b414cc296f2a7) that added the snippet above to the stack file deployed Cadvisor to the Swarm as I was writing this. 209 | 210 | Cadvisor is viewable at [docker_host:9102/containers](http://docker_host:9102/containers/) 211 | 212 | ![cadvisor_exporter.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/cadvisor_exporter.png?raw=true) 213 | 214 | Create a job in the [prometheus.yml](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/prometheus.yml.j2) file to scrape data from Cadvisor. 215 | 216 | # http://docker_host:9102/metrics/ 217 | - job_name: 'cadvisor' 218 | scrape_interval: 30s 219 | metrics_path: '/metrics' 220 | static_configs: 221 | - targets: 222 | - docker_host:9102 223 | 224 | With that deployed, a new target is added to `Prometheus > targets`, [docker_host:9090/targets](http://docker_host:9090/targets/) 225 | 226 | ![prometheus_targets.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/prometheus_targets.png?raw=true) 227 | 228 | Refresh the docker swarm monitor dashboard and there should be a lot more info now! 
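If the dashboard still looks sparse after a refresh, a quick way to confirm container metrics are actually reaching Prometheus is to query its HTTP API for a cAdvisor series; something like this should work, reusing the `docker_host` placeholder from above:

    # Count the container memory series scraped from the cadvisor job
    curl -s -G 'http://docker_host:9090/api/v1/query' \
      --data-urlencode 'query=count(container_memory_usage_bytes)'
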
229 | 230 | ![grafana_docker_swarm_dashboard_with_cadvisor.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_docker_swarm_dashboard_with_cadvisor.png?raw=true) 231 | 232 | ## Pihole 233 | 234 | [Pi-Hole](https://github.com/pi-hole/pi-hole) is running on a raspberry pi. It acts as the DNS and DHCP server for the network, while caching DNS queries and providing "a [DNS sinkhole](https://en.wikipedia.org/wiki/DNS_sinkhole) that protects your devices from unwanted content, without installing any client-side software." It blocks a surprising amount of ad content from things like Facebook, news sites, and blog posts, like this one, which uses Google Adsense. It works well enough, in fact, that I had to add Google Analytics to the whitelist after setting this up, just to access the site and check metrics. 235 | 236 | ![pihole.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/pihole.png?raw=true) 237 | 238 | Seeing the results and experiencing an increase in query speeds was worth the hour or so of fussing with pfsense. In the end, disabling DHCP and DNS forwarding altogether on the firewall and just letting Pi-Hole handle them worked. The dashboard that comes with pihole is great and really all you need for this service, but it's also nice to have those metrics in the same location as other monitoring tools and graphs. 239 | 240 | ## Pihole exporter 241 | 242 | * In the commands below, `pihole_host_ip` is the IP or hostname of the Pi-Hole host 243 | 244 | One way to accomplish this is with the [pihole_exporter](https://github.com/nlamirault/pihole_exporter) for prometheus. I fought with this for a good hour before getting it to work, eventually building it from source with docker and pushing it up to docker hub to pull into swarm at stack creation time. 245 | 246 | This is the error I kept seeing when using the latest version on docker hub: `standard_init_linux.go:190: exec user process caused "exec format error"`. It can be reproduced like this. 247 | 248 | docker run -it povilasv/arm-pihole_exporter -pihole http://pihole_host_ip 249 | 250 | standard_init_linux.go:190: exec user process caused "exec format error" 251 | 252 | So, I had to clone the repo locally. 253 | 254 | git clone https://github.com/nlamirault/pihole_exporter.git 255 | 256 | cd pihole_exporter 257 | 258 | And build it with docker 259 | 260 | docker build -t jahrik/pihole_exporter . 261 | 262 | Sending build context to Docker daemon 5.881MB 263 | Step 1/11 : FROM golang:alpine AS build 264 | ... 265 | ... 266 | ... 267 | ... 268 | Step 11/11 : EXPOSE 9311 269 | ---> Using cache 270 | ---> f4cfd273446d 271 | Successfully built f4cfd273446d 272 | Successfully tagged jahrik/pihole_exporter:latest 273 | 274 | Somehow, this magically fixed the error and it just works — most likely because the `arm-` image on docker hub is built for ARM, while this swarm node is x86_64, so a local build targets the right architecture. 275 | 276 | docker run -it jahrik/pihole_exporter -pihole http://pihole_host_ip 277 | 278 | INFO[0000] Setup Pihole exporter using URL: %s http://pihole_host_ip source="pihole_exporter.go:112" 279 | INFO[0000] Register exporter source="pihole_exporter.go:197" 280 | INFO[0000] Listening on :9311 source="pihole_exporter.go:211" 281 | 282 | Push it up to dockerhub before deploying to swarm. 283 | 284 | docker push jahrik/pihole-exporter 285 | 286 | I'm pulling the `jahrik/pihole_exporter` version in [the stack file](https://github.com/jahrik/docker-swarm-monitor/blob/master/monitor/templates/monitor-stack.yml.j2), rather than the original `povilasv/arm-pihole_exporter`, and starting it up in swarm worked after that. 
287 | 288 | pihole-exporter: 289 | image: jahrik/pihole-exporter 290 | ports: 291 | - '9101:9311' 292 | deploy: 293 | replicas: 1 294 | command: "-pihole http://pihole_host_ip" 295 | 296 | Output can be seen at [http://docker_host:9101/metrics](http://docker_host:9101/metrics) 297 | 298 | ![pihole_exporter.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/pihole_exporter.png?raw=true) 299 | 300 | Import the [Pi-Hole dashboard](https://grafana.com/dashboards/5855) to Grafana and it should look something like this to start with. I'm assuming the missing data points are there because the exporter is only seeing what pihole allows without admin access and will need user creds for the api to pull more metrics. Another thing for the **TODO** list! 301 | 302 | ![grafana_pihole_dashboard.png](https://github.com/jahrik/docker-swarm-monitor/blob/master/images/grafana_pihole_dashboard.png?raw=true) 303 | 304 | With a few minor hiccups along the way, this setup has worked great for me in my homelab docker swarm environment. In part 03, I'm thinking of putting together an example Prometheus client with Python to monitor the temperature of my server with [lm_sensors](https://wiki.archlinux.org/index.php/lm_sensors) or [PySensors](https://pypi.org/project/PySensors/#description) and output that to a gauge in Grafana. 305 | 306 | --------------------------------------------------------------------------------