├── .github └── workflows │ └── release.yaml ├── .gitignore ├── LICENSE ├── README.md ├── apps ├── backup2graph │ ├── Dockerfile │ ├── README.md │ ├── app │ │ ├── backup2graph.py │ │ ├── const.py │ │ └── db_load.py │ ├── buildkitd.toml │ └── requirements.txt └── neo4j-datasource │ └── kniepdennis-neo4j-datasource-2.0.0.zip ├── charts └── aci-monitoring-stack │ ├── .helmignore │ ├── Chart.lock │ ├── Chart.yaml │ ├── alerts │ ├── loki │ │ └── interfaces.yaml │ └── prom │ │ └── alerts.yaml │ ├── charts │ ├── grafana-9.0.0.tgz │ ├── loki-6.29.0.tgz │ ├── memgraph-0.2.3.tgz │ ├── prometheus-27.13.0.tgz │ └── promtail-6.16.6.tgz │ ├── config.d │ ├── aci-configs.yaml │ ├── bgp.yaml │ ├── interface.yaml │ ├── node.yaml │ ├── ospf.yaml │ ├── system.yaml │ └── vlan.yaml │ ├── dashboards │ ├── alerts.json │ ├── contracts-explorer.json │ ├── drops.json │ ├── epg-explore.json │ ├── epg-stats.json │ ├── fabric-policies-pg.json │ ├── fabric-wide-capacity.json │ ├── faults.json │ ├── missing-targets.json │ ├── node-capacity.json │ ├── node-details.json │ ├── node-interfaces.json │ ├── power-usage.json │ ├── routing-protocols.json │ └── vlans-vmm-trunks.json │ ├── templates │ ├── _helpers.tpl │ ├── aci-exporter │ │ ├── configmap-config.yaml │ │ ├── configmap-queries.yaml │ │ ├── deployment.yaml │ │ └── service.yaml │ ├── backup2graph │ │ └── CronJob.yml │ ├── grafana-configmap-dashboards.yaml │ ├── grafana-datasources.yaml │ ├── loki │ │ └── loki-configmap-alerts.yaml │ ├── openshift │ │ ├── SecurityContextConstraints.yaml │ │ ├── bucket.yaml │ │ ├── cluster-role-binding.yaml │ │ ├── cluster-role.yaml │ │ └── service-account.yaml │ ├── prometheus │ │ ├── configmap-alerts.yaml │ │ └── configmap-config.yaml │ └── syslog-ng │ │ ├── configmap.yaml │ │ ├── deployment.yaml │ │ └── service.yaml │ └── values.yaml └── docs ├── 4-fabric-example.yaml ├── LABDCN-1038 ├── README.md └── overview.md ├── LABDCN-2620 ├── README.md ├── dmz-deploy.md └── overview.md ├── demo-environment.md ├── 
deployment.md ├── development.md ├── example-openshift.yaml ├── images ├── column-filter.png ├── contract-explorer.png ├── dashboards.png ├── fabric-filter.png ├── faults.png ├── missing-targets.png ├── port-group.png └── vlans.png ├── labs ├── images │ ├── lab1 │ │ ├── EmptyDashboard.png │ │ ├── TableView1.png │ │ ├── TimeSeries.png │ │ ├── Visualization.png │ │ ├── label-filtering-1.png │ │ ├── label-filtering-dropdown.png │ │ ├── multiply.png │ │ ├── oganize.png │ │ ├── queryformat.png │ │ └── table-wrong-time.png │ └── lab2 │ │ ├── explore.png │ │ ├── log-details.png │ │ ├── logs-1.png │ │ ├── loki-builder.png │ │ ├── multi-fabric-logs.png │ │ ├── select-loki.png │ │ ├── ui-filter-result.png │ │ └── ui-filter.png ├── lab1.md ├── lab2.md └── lab3.md ├── minikube.md ├── multiple-aci-exporters.md ├── syslog.md └── webex.md /.github/workflows/release.yaml: -------------------------------------------------------------------------------- 1 | name: Release Charts 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | 8 | jobs: 9 | release: 10 | permissions: 11 | contents: write 12 | runs-on: ubuntu-latest 13 | steps: 14 | - name: Checkout 15 | uses: actions/checkout@v4 16 | with: 17 | fetch-depth: 0 18 | 19 | - name: Configure Git 20 | run: | 21 | git config user.name "$GITHUB_ACTOR" 22 | git config user.email "$GITHUB_ACTOR@users.noreply.github.com" 23 | - name: Add repositories 24 | run: | 25 | for dir in $(ls -d charts/*/); do 26 | helm dependency list $dir 2> /dev/null | tail +2 | head -n -1 | awk '{ print "helm repo add " $1 " " $3 }' | while read cmd; do $cmd; done 27 | done 28 | - name: Run chart-releaser 29 | uses: helm/chart-releaser-action@v1.6.0 30 | env: 31 | CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}" 32 | CR_SKIP_EXISTING: True -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | my_lab.yaml 2 | local_tests/ 3 | req.rest 4 | ndi.rest 5 | 
cronRole.yml 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | aci-monitoring-stack - Open Source Monitoring for Cisco ACI 2 | ------------ 3 | 4 | # Overview 5 | 6 | Harness the power of open source to efficiently monitor your Cisco ACI environment with the ACI-Monitoring-Stack. This lightweight, yet robust, monitoring solution combines top-tier open source tools, each contributing unique capabilities to ensure comprehensive visibility into your ACI infrastructure. 7 | 8 | The ACI-Monitoring-Stack integrates the following key components: 9 | 10 | - [Grafana](https://grafana.com/oss/grafana/): The leading open-source analytics and visualization platform. Grafana allows you to create dynamic dashboards that provide real-time insights into your network's performance, health, and metrics. With its user-friendly interface, you can easily visualize and correlate data across your ACI fabric, enabling quicker diagnostics and informed decision-making. 11 | 12 | - [Prometheus](https://prometheus.io/): A powerful open-source monitoring and alerting toolkit. Prometheus excels in collecting and storing metrics in a time-series database, allowing for flexible queries and real-time alerting. Its seamless integration with Grafana ensures that your monitoring stack provides a detailed and up-to-date view of your ACI environment. 13 | 14 | - [Loki](https://grafana.com/oss/loki/): Designed for efficiently aggregating and querying logs from your entire ACI ecosystem. Loki complements Prometheus by focusing on log aggregation, providing a unified stack for metrics and logs. Its integration with Grafana enables you to correlate log data with metrics and create a holistic monitoring experience. 15 | 16 | - [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/): the agent responsible for gathering and shipping the log files to the Loki server. 
17 | 18 | - [Syslog-ng](https://github.com/syslog-ng/syslog-ng): An open-source implementation of the Syslog protocol. Its role in this stack is to translate syslog messages from RFC 3164 to RFC 5424. This is needed because Promtail only supports syslog in RFC 5424 format over TCP, and ACI can only send that format in version 6.1 and above. 19 | 20 | - [aci-exporter](https://github.com/opsdis/aci-exporter): A Prometheus exporter that serves as the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem. The aci-exporter translates ACI-specific metrics into a format that Prometheus can ingest, ensuring that all crucial data points are captured and monitored effectively. 21 | 22 | - [backup2graph](apps/backup2graph/README.md): Converts an ACI Backup into a Graph Database. 23 | 24 | - [Memgraph](https://github.com/memgraph/memgraph): An open source graph database implemented in C/C++ that leverages an in-memory first architecture. It is used in the ACI-Monitoring-Stack to explore the ACI configurations imported by backup2graph. 25 | 26 | - Pre-configured ACI data collection queries, alerts, and dashboards (Work In Progress): The ACI-Monitoring-Stack provides a solid foundation for monitoring an ACI fabric with its pre-defined queries, dashboards, and alerts. While these tools are crafted based on best practices to offer immediate insights into network performance, they are not exhaustive. The strength of the ACI-Monitoring-Stack lies in its community-driven approach. Users are invited to contribute their expertise by providing feedback, sharing custom solutions, and helping enhance the stack. Your input helps to refine and expand the stack's capabilities, ensuring it remains a relevant and powerful tool for network monitoring. 27 | 28 | # Your Stack 29 | 30 | To gain a comprehensive understanding of the ACI Monitoring Stack and its components it is helpful to break down the stack into separate functions. 
Each function focuses on a different aspect of monitoring the Cisco Application Centric Infrastructure (ACI) environment. 31 | 32 | ## Fabric Discovery: 33 | 34 | The ACI monitoring stack uses Prometheus Service Discovery (HTTP SD) to dynamically discover and scrape targets by periodically querying a specified HTTP endpoint for a list of target configurations in JSON format. 35 | 36 | The ACI Monitoring Stack needs only the IP addresses of the APICs; the switches are discovered automatically. If switches are added to or removed from the fabric, no action is required from the end user. 37 | 38 | ```mermaid 39 | flowchart-elk RL 40 | P[("Prometheus")] 41 | A["aci-exporter"] 42 | APIC["APIC"] 43 | 44 | APIC -- "API Query" --> A 45 | A -- "HTTP SD" --> P 46 | ``` 47 | 48 | ## ACI Object Scraping: 49 | 50 | `Prometheus` scraping is the process by which `Prometheus` periodically collects metrics data by sending HTTP requests to predefined endpoints on monitored targets. The `aci-exporter` translates ACI-specific metrics into a format that `Prometheus` can ingest, ensuring that all crucial data points are captured and monitored effectively. 51 | 52 | ```mermaid 53 | flowchart-elk RL 54 | P[("Prometheus")] 55 | A["aci-exporter"] 56 | subgraph ACI 57 | S["Switches"] 58 | APIC["APIC"] 59 | end 60 | A--"Scraping"-->P 61 | S--"API Queries"-->A 62 | APIC--"API Queries"-->A 63 | ``` 64 | ## Syslog Ingestion: 65 | 66 | The syslog config is composed of three components: `promtail`, `loki` and `syslog-ng`. 67 | Prior to ACI 6.1, `syslog-ng` is required between `ACI` and `Promtail` to convert syslog messages from the RFC 3164 format to the RFC 5424 format. 
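To make the RFC 3164 to RFC 5424 translation more concrete, here is a minimal Python sketch of the idea. It is not how `syslog-ng` is implemented and it is not part of this stack; the function name `rfc3164_to_rfc5424`, the regular expression and the `year` parameter are all hypothetical and exist only for illustration. RFC 3164 timestamps carry neither a year nor a time zone, so the sketch assumes the caller supplies the year and that the time is UTC.

```python
import re
from datetime import datetime, timezone

# RFC 3164 layout: <PRI>Mmm dd hh:mm:ss HOSTNAME TAG[PID]: MSG
# (illustrative pattern, real-world messages vary a lot more than this)
RFC3164_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: ?"
    r"(?P<msg>.*)$"
)


def rfc3164_to_rfc5424(line: str, year: int) -> str:
    """Rewrite one RFC 3164 message in RFC 5424 layout (sketch only)."""
    m = RFC3164_RE.match(line)
    if m is None:
        raise ValueError("not an RFC 3164 message")
    # Assumption: the missing year comes from the caller, the zone is UTC.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    timestamp = ts.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    procid = m["pid"] or "-"
    # RFC 5424 layout: <PRI>VERSION TIMESTAMP HOST APP PROCID MSGID SD MSG
    # MSGID and STRUCTURED-DATA are unknown here, so both are set to "-".
    return f"<{m['pri']}>1 {timestamp} {m['host']} {m['tag']} {procid} - - {m['msg']}"


if __name__ == "__main__":
    print(rfc3164_to_rfc5424("<13>Feb  5 17:32:18 leaf101 app[123]: hello", 2024))
```

The main visible differences are the version field after the priority, the ISO 8601 timestamp, and the explicit `PROCID`, `MSGID` and structured-data fields, which are set to `-` when unknown.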
68 | 69 | ```mermaid 70 | flowchart-elk LR 71 | L["Loki"] 72 | PT["Promtail"] 73 | SL["Syslog-ng"] 74 | PT-->L 75 | SL-->PT 76 | subgraph ACI 77 | S["Switches"] 78 | APIC["APIC"] 79 | end 80 | V{Ver >= 6.1} 81 | S--"Syslog"-->V 82 | APIC--"Syslog"-->V 83 | V -->|Yes| PT 84 | V -->|No| SL 85 | ``` 86 | 87 | 88 | ## Config Explorer: 89 | 90 | The ACI-Monitoring-Stack generates a config snapshot every 15 minutes (by default) and automatically loads it into Memgraph. 91 | Backup2Graph uses ACI API calls to: 92 | - Create a new snapshot policy 93 | - Trigger a snapshot 94 | - Delete the snapshot policy and snapshot (once transferred out of the APIC) 95 | 96 | It uses `scp` to copy the snapshot off the APIC for processing. Once the snapshot is copied, the snapshot and its policy are removed from the APIC. 97 | 98 | ```mermaid 99 | flowchart-elk RL 100 | U["User"] 101 | G["Grafana"] 102 | A["APIC"] 103 | B2G["Backup2Graph"] 104 | MG["Memgraph"] 105 | A--"Backup"-->B2G 106 | B2G--"Push"-->MG 107 | MG--"Cypher Queries"-->G 108 | G-->U 109 | ``` 110 | ## Data Visualization 111 | 112 | Data visualization is handled by `Grafana`, an open-source analytics and monitoring platform that allows users to visualize, query, and analyze data from various sources through customizable and interactive dashboards. It supports a wide range of data sources, including `Prometheus` and `Loki`, enabling users to create real-time visualizations, alerts, and reports to monitor system performance and gain actionable insights. 113 | 114 | ```mermaid 115 | flowchart-elk RL 116 | G["Grafana"] 117 | L["Loki"] 118 | P[("Prometheus")] 119 | U["User"] 120 | 121 | P--"PromQL"-->G 122 | L--"LogQL"-->G 123 | G-->U 124 | ``` 125 | ## Alerting 126 | 127 | `Alertmanager` is a component of the `Prometheus` ecosystem designed to handle alerts generated by `Prometheus`. 
It manages the entire lifecycle of alerts, including deduplication, grouping, silencing, and routing notifications to various communication channels like email, `Webex`, `Slack`, and others, ensuring that alerts are delivered to the right people in a timely and organized manner. 128 | 129 | In the ACI Monitoring Stack both `Prometheus` and `Loki` are configured with alerting rules. 130 | ```mermaid 131 | flowchart-elk LR 132 | L["Loki"] 133 | P["Prometheus"] 134 | AM["Alertmanager"] 135 | N["Notifications (Mail/Webex etc...)"] 136 | L --> AM 137 | P --> AM 138 | AM --> N 139 | ``` 140 | # [Demo Environment Access and Use](docs/demo-environment.md) 141 | 142 | # [Stack Deployment Guide](docs/deployment.md) 143 | 144 | # [Stack Development Guide](docs/development.md) 145 | -------------------------------------------------------------------------------- /apps/backup2graph/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.12-slim 2 | WORKDIR /app 3 | ENV PATH="$PATH:/app" 4 | COPY requirements.txt requirements.txt 5 | RUN apt update && apt install -y python3-dev cmake make gcc g++ libssl-dev && pip3 install -r requirements.txt 6 | COPY app /app 7 | ENTRYPOINT ["python", "backup2graph.py"] -------------------------------------------------------------------------------- /apps/backup2graph/README.md: -------------------------------------------------------------------------------- 1 | # Backup 2 Graph 2 | 3 | This Python application is used in conjunction with the ACI Monitoring Stack to convert the ACI Object Model Database into a Graph Database. 
4 | 5 | This is done by following these steps: 6 | 7 | * Load the ACI access details from the aci-exporter config map 8 | * Generate an ACI Backup 9 | * Export it locally 10 | * Parse the backup into a set of CSV files 11 | * Load the backup into Memgraph 12 | 13 | `Backup2Graph` and `Memgraph` share a common PVC so that the data processed by `Backup2Graph` is directly accessible by `Memgraph`. 14 | This is taken care of by the Helm chart automatically. 15 | Currently multiple fabrics are supported, but only one snapshot is preserved. 16 | 17 | ## Building Backup2Graph 18 | 19 | You can use the Dockerfile to build your own images. In my lab I have to deal with a proxy; if you are as unfortunate as me, follow these steps: 20 | 21 | ### Create a Buildx instance 22 | 23 | * If you are using an insecure registry (i.e. one with a self-signed certificate), set your key/cert/ca in the [buildkitd](buildkitd.toml) config file. 24 | * If you have a proxy, set up your `http_proxy/https_proxy/no_proxy` environment variables in `/etc/environment` 25 | ```shell 26 | docker buildx create --use --driver-opt '"env.http_proxy='$http_proxy'"' --driver-opt '"env.https_proxy='$https_proxy'"' --driver-opt '"env.no_proxy='$no_proxy'"' --config ./buildkitd.toml 27 | 28 | ``` 29 | 30 | ### When Building Pass the Proxy Environment to the build container 31 | 32 | For my build I just need `https_proxy`, but you can add the other options if needed. 33 | ```shell 34 | docker login harbor.cam.ciscolabs.com -u -p 35 | docker build . 
--build-arg HTTPS_PROXY=$https_proxy --build-arg HTTP_PROXY=$http_proxy --platform linux/amd64,linux/arm64 --push --tag harbor.cam.ciscolabs.com/library/backup2graph:test 36 | ``` 37 | 38 | If you push to a registry that needs authentication, run 39 | ```shell 40 | docker login 41 | ``` 42 | -------------------------------------------------------------------------------- /apps/backup2graph/app/db_load.py: -------------------------------------------------------------------------------- 1 | import multiprocessing 2 | import mgclient 3 | import os 4 | import logging 5 | 6 | 7 | # Configure the root logger 8 | #logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - [Thread ID: %(thread)d] - %(message)s') 9 | 10 | # Create a logger for this module 11 | logger = logging.getLogger(__name__) 12 | logger.setLevel(logging.INFO) 13 | 14 | # Create a console handler and set its level to INFO 15 | ch = logging.StreamHandler() 16 | ch.setLevel(logging.INFO) 17 | 18 | # Create a formatter and set it for the handler 19 | formatter = logging.Formatter('%(asctime)s - %(levelname)s - [Thread ID: %(thread)d] - %(message)s') 20 | ch.setFormatter(formatter) 21 | 22 | # Add the handler to the logger 23 | logger.addHandler(ch) 24 | 25 | host = os.environ['MEMGRAPH_SVC_HOST'] 26 | port = int(os.environ['MEMGRAPH_SVC_PORT']) 27 | 28 | 29 | def wipe_db(): 30 | # Delete everything for now 31 | conn = mgclient.connect(host=host, port=port) 32 | conn.autocommit = True 33 | cursor = conn.cursor() 34 | cursor.execute("MATCH (n) DETACH DELETE n;") 35 | cursor.execute("CALL schema.assert({}, {}, {}, true) YIELD action, key, keys, label, unique RETURN action, key, keys, label, unique;") 36 | cursor.close() 37 | conn.close() 38 | 39 | def in_memory_analytical(): 40 | conn = mgclient.connect(host=host, port=port) 41 | cursor = conn.cursor() 42 | conn.autocommit = True 43 | cursor.execute("STORAGE MODE IN_MEMORY_ANALYTICAL") 44 | cursor.close() 45 | conn.close() 46 | 47 | def 
in_memory_transactional(): 48 | conn = mgclient.connect(host=host, port=port) 49 | cursor = conn.cursor() 50 | conn.autocommit = True 51 | cursor.execute("STORAGE MODE IN_MEMORY_TRANSACTIONAL") 52 | cursor.close() 53 | conn.close() 54 | 55 | def execute_load(q): 56 | conn = mgclient.connect(host=host, port=port) 57 | cursor = conn.cursor() 58 | logger.debug("execute: %s", q) 59 | max_retries=10 60 | for attempt in range(max_retries): 61 | try: 62 | cursor.execute(q) 63 | conn.commit() 64 | break 65 | except Exception as e: 66 | logger.error("Transaction failed %s", str(e)) 67 | cursor.close() 68 | conn.close() 69 | 70 | def execute_indexes(q): 71 | conn = mgclient.connect(host=host, port=port) 72 | cursor = conn.cursor() 73 | max_retries=10 74 | for attempt in range(max_retries): 75 | try: 76 | conn.autocommit = True 77 | cursor.execute(q) 78 | break 79 | except Exception as e: 80 | logger.error("Transaction failed %s", str(e)) 81 | cursor.close() 82 | conn.close() 83 | 84 | def db_load(fabric_backup_folder): 85 | 86 | csv_folder = os.path.join(fabric_backup_folder, 'csv') 87 | logger.info("Loading Indexes") 88 | with open(csv_folder + "/indexes", 'r') as f: 89 | lines = f.readlines() 90 | with multiprocessing.Pool(1) as pool: 91 | pool.map(execute_indexes, lines) 92 | logger.info("Loading Nodes") 93 | with open(csv_folder + "/nodes", 'r') as f: 94 | lines = f.readlines() 95 | with multiprocessing.Pool(1) as pool: 96 | pool.map(execute_load, lines) 97 | logger.info("Loading Edges") 98 | with open(csv_folder + "/edges", 'r') as f: 99 | lines = f.readlines() 100 | with multiprocessing.Pool(1) as pool: 101 | pool.map(execute_load, lines) 102 | logger.info("DB Import Completed") -------------------------------------------------------------------------------- /apps/backup2graph/buildkitd.toml: -------------------------------------------------------------------------------- 1 | [registry."harbor.cam.ciscolabs.com"] 2 | ca=["/nfs-share/harbor/harbor-install/ca.crt"] 3 | 
[[registry."harbor.cam.ciscolabs.com".keypair]] 4 | key="/nfs-share/harbor/harbor-install/harbor.cam.ciscolabs.com.key" 5 | cert="/nfs-share/harbor/harbor-install/harbor.cam.ciscolabs.com.crt" -------------------------------------------------------------------------------- /apps/backup2graph/requirements.txt: -------------------------------------------------------------------------------- 1 | cisco-pyaci==1.1.2 2 | pymgclient -------------------------------------------------------------------------------- /apps/neo4j-datasource/kniepdennis-neo4j-datasource-2.0.0.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/apps/neo4j-datasource/kniepdennis-neo4j-datasource-2.0.0.zip -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/.helmignore: -------------------------------------------------------------------------------- 1 | .git 2 | experiments -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/Chart.lock: -------------------------------------------------------------------------------- 1 | dependencies: 2 | - name: prometheus 3 | repository: https://prometheus-community.github.io/helm-charts 4 | version: 27.13.0 5 | - name: loki 6 | repository: https://grafana.github.io/helm-charts 7 | version: 6.29.0 8 | - name: promtail 9 | repository: https://grafana.github.io/helm-charts 10 | version: 6.16.6 11 | - name: grafana 12 | repository: https://grafana.github.io/helm-charts 13 | version: 9.0.0 14 | - name: memgraph 15 | repository: https://memgraph.github.io/helm-charts 16 | version: 0.2.3 17 | digest: sha256:1411ab82769a2a2b323ab2fa4914394e5bcb92625ce975e61010da013cbfef5c 18 | generated: "2025-05-20T11:36:12.483387624+10:00" 19 | -------------------------------------------------------------------------------- 
/charts/aci-monitoring-stack/Chart.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v2 2 | name: aci-monitoring-stack 3 | description: A Helm chart for Kubernetes 4 | 5 | # A chart can be either an 'application' or a 'library' chart. 6 | # 7 | # Application charts are a collection of templates that can be packaged into versioned archives 8 | # to be deployed. 9 | # 10 | # Library charts provide useful utilities or functions for the chart developer. They're included as 11 | # a dependency of application charts to inject those utilities and functions into the rendering 12 | # pipeline. Library charts do not define any templates and therefore cannot be deployed. 13 | type: application 14 | 15 | # This is the chart version. This version number should be incremented each time you make changes 16 | # to the chart and its templates, including the app version. 17 | # Versions are expected to follow Semantic Versioning (https://semver.org/) 18 | version: 0.3.0 19 | # This is the version number of the application being deployed. This version number should be 20 | # incremented each time you make changes to the application. Versions are not expected to 21 | # follow Semantic Versioning. They should reflect the version the application is using. 22 | # It is recommended to use it with quotes. 
23 | appVersion: "v0.8.3" 24 | dependencies: 25 | - name: prometheus 26 | version: "27.13.0" 27 | repository: "https://prometheus-community.github.io/helm-charts" 28 | condition: prometheus.enabled 29 | - name: loki 30 | version: "6.29.0" 31 | repository: "https://grafana.github.io/helm-charts" 32 | condition: loki.enabled 33 | - name: "promtail" 34 | condition: promtail.enabled 35 | repository: "https://grafana.github.io/helm-charts" 36 | version: "6.16.6" 37 | - name: "grafana" 38 | condition: grafana.enabled 39 | repository: "https://grafana.github.io/helm-charts" 40 | version: "9.0.0" 41 | - name: "memgraph" 42 | condition: memgraph.enabled 43 | repository: "https://memgraph.github.io/helm-charts" 44 | version: "0.2.3" -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/alerts/loki/interfaces.yaml: -------------------------------------------------------------------------------- 1 | groups: 2 | - name: Interfaces 3 | rules: 4 | - alert: Fabric Interface Down 5 | expr: | 6 | (sum by (fabric, switch,interface)(count_over_time({job="aci-monitoring-stack"} |= `[F1394][soaking]` |= `[sys/phys-` | regexp ".+phys-\\[(?P[^\\]]+)\\]/"[1m])) > 1) 7 | for: 0m 8 | labels: 9 | severity: warning 10 | annotations: 11 | summary: Fabric Interface Down 12 | description: Fabric Interface `{{ $labels.interface }}` on switch `{{ $labels.switch }}` in Fabric `{{ $labels.fabric }}` Went Down 13 | - alert: EPG Interface Down 14 | expr: | 15 | (sum by (fabric, switch,interface)(count_over_time({job="aci-monitoring-stack"} |= `[F0532][soaking]` |= `[sys/phys-` | regexp ".+phys-\\[(?P[^\\]]+)\\]/"[1m])) > 1) 16 | for: 0m 17 | labels: 18 | severity: warning 19 | annotations: 20 | summary: EPG Interface Down 21 | description: EPG Interface `{{ $labels.interface }}` on switch `{{ $labels.switch }}` in Fabric `{{ $labels.fabric }}` Went Down 22 | - alert: vPC Down 23 | expr: | 24 | (sum by (fabric, 
switch,interface)(count_over_time({job="aci-monitoring-stack"} |= `[F1296][soaking]` | regexp ".+vPC (?P.+) is"[1m])) > 1) 25 | for: 0m 26 | labels: 27 | severity: warning 28 | annotations: 29 | summary: vPC Down 30 | description: EPG Port-Channel `{{ $labels.interface }}` on switch `{{ $labels.switch }}` in Fabric `{{ $labels.fabric }}` Went Down 31 | - alert: Interface Up 32 | expr: | 33 | (sum by (fabric, switch,interface)(count_over_time({job="aci-monitoring-stack"} |= "[E4205125]" |= `[sys/phys-` | regexp ".+phys-\\[(?P[^\\]]+)\\]/"[1m])) > 1) 34 | for: 0m 35 | labels: 36 | severity: info 37 | annotations: 38 | summary: Interface Up 39 | description: Interface `{{ $labels.interface }}` on switch `{{ $labels.switch }}` in Fabric `{{ $labels.fabric }}` Went Up 40 | - alert: vPC Up 41 | expr: | 42 | (sum by (fabric, switch,interface)(count_over_time({job="aci-monitoring-stack"} |= `[E4205113]` | regexp ".+vPC (?P.+) is"[1m])) > 1) 43 | for: 0m 44 | labels: 45 | severity: info 46 | annotations: 47 | summary: vPC Up 48 | description: Interface `{{ $labels.interface }}` on switch `{{ $labels.switch }}` in Fabric `{{ $labels.fabric }}` Went Up 49 | - record: aci_interfaces_down_count1m 50 | expr: | 51 | (sum by (fabric, switch,interface)(count_over_time({job="aci-monitoring-stack"} |= `[F1394][soaking]` or `[F0532][soaking]` [1m]))) 52 | labels: 53 | test: "record" -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/alerts/prom/alerts.yaml: -------------------------------------------------------------------------------- 1 | groups: 2 | - name: ACI 3 | rules: 4 | - alert: BGP Received Path high churn detected 5 | expr: abs(deriv(aci_bgp_peer_prefix_received{address_family=~"ipv4-ucast|ipv6-ucast"}[2m])) > 0.2 6 | labels: 7 | severity: warning 8 | annotations: 9 | description: 'BGP Received Path high churn detected: Fabric: {{ $labels.fabric }}, Node: {{ $labels.nodeid }}, Peer: {{ $labels.peer_ip}}, VRF: {{ 
$labels.vrf}}. The churn is {{ $value }} routes per second.' 10 | summary: 'BGP Received Path high churn detected' -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/charts/grafana-9.0.0.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/charts/aci-monitoring-stack/charts/grafana-9.0.0.tgz -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/charts/loki-6.29.0.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/charts/aci-monitoring-stack/charts/loki-6.29.0.tgz -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/charts/memgraph-0.2.3.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/charts/aci-monitoring-stack/charts/memgraph-0.2.3.tgz -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/charts/prometheus-27.13.0.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/charts/aci-monitoring-stack/charts/prometheus-27.13.0.tgz -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/charts/promtail-6.16.6.tgz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/charts/aci-monitoring-stack/charts/promtail-6.16.6.tgz -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/config.d/aci-configs.yaml: -------------------------------------------------------------------------------- 1 | #This file is used to ingest the APIC configs to then analyse/display it inside Grafana. 2 | class_queries: 3 | subnets: 4 | class_name: fvSubnet 5 | query_parameter: '?order-by=fvSubnet.dn' 6 | metrics: 7 | - name: subnets 8 | value_name: fvSubnet.attributes.uid 9 | labels: 10 | - property_name: fvSubnet.attributes.dn 11 | regex: "^uni/tn-(?P.*)/BD-(?P.*)/subnet-\\[(?P.*)\\]" 12 | 13 | epg_to_bd: 14 | class_name: fvRsBd 15 | query_parameter: '?order-by=fvRsBd.dn' 16 | metrics: 17 | - name: epg_to_bd 18 | value_name: fvRsBd.attributes.uid 19 | labels: 20 | - property_name: fvRsBd.attributes.dn 21 | regex: "^uni/tn-(?P.*)/ap-(?P.*)/epg-(?P.*)/rsbd" 22 | - property_name: fvRsBd.attributes.tDn 23 | regex: "^uni/tn-(?P.*)/BD-(?P.*)" 24 | inb_epg_to_bd: 25 | class_name: mgmtRsMgmtBD 26 | query_parameter: '?order-by=mgmtRsMgmtBD.dn' 27 | metrics: 28 | - name: inb_epg_to_bd 29 | value_name: mgmtRsMgmtBD.attributes.uid 30 | labels: 31 | - property_name: mgmtRsMgmtBD.attributes.dn 32 | regex: "^uni/tn-(?P.*)/mgmtp-(?P.*)/inb-(?P.*)/rsmgmtBD" 33 | - property_name: mgmtRsMgmtBD.attributes.tDn 34 | regex: "^uni/tn-(?P.*)/BD-(?P.*)" 35 | svc_epg_to_bd: 36 | class_name: vnsRsLIfCtxToBD 37 | query_parameter: '?order-by=vnsRsLIfCtxToBD.dn' 38 | metrics: 39 | - name: svc_epg_to_bd 40 | value_name: vnsRsLIfCtxToBD.attributes.uid 41 | labels: 42 | - property_name: vnsRsLIfCtxToBD.attributes.dn 43 | regex: "^uni/tn-(?P.*)/ldevCtx-c-(?P.*)-g-(?P.*)-n-(?P.*)/lIfCtx-c-(?P.*)/rsLIfCtxToBD" 44 | - property_name: vnsRsLIfCtxToBD.attributes.tDn 45 | regex: "^uni/tn-(?P.*)/BD-(?P.*)" 46 | bd_to_vrf: 47 | class_name: 
fvRtCtx 48 | query_parameter: '?order-by=fvRtCtx.dn' 49 | metrics: 50 | - name: bd_to_vrf 51 | value_name: fvRtCtx.attributes.status 52 | value_transform: 53 | '' : 0 54 | labels: 55 | - property_name: fvRtCtx.attributes.dn 56 | regex: "^uni/tn-(?P.*)/ctx-(?P.*)/rtctx-\\[uni/tn-(?P.*)/BD-(?P.*)\\]" 57 | 58 | -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/config.d/bgp.yaml: -------------------------------------------------------------------------------- 1 | class_queries: 2 | node_bgp_peers: 3 | class_name: bgpPeer 4 | query_parameter: '?order-by=bgpPeer.dn&rsp-subtree=children&rsp-subtree-class=bgpPeerEntry&rsp-subtree-include=required' 5 | metrics: 6 | - name: bgp_peers 7 | # As the metric I am saving the last time the peer connection changed state 8 | value_name: bgpPeer.children.[.*].attributes.lastFlapTs 9 | value_regex_transformation: "(?P.*)" 10 | value_calculation: "date" 11 | labels: 12 | - property_name: bgpPeer.attributes.asn 13 | regex: "(?P.*)" 14 | - property_name: bgpPeer.attributes.srcIf 15 | regex: "(?P.*)" 16 | - property_name: bgpPeer.attributes.dn 17 | regex: "^sys/bgp/inst/dom-(?P.*)/peer-" 18 | - property_name: bgpPeer.children.[.*].attributes.flags 19 | regex: "(?P.*)" 20 | - property_name: bgpPeer.children.[.*].attributes.type 21 | regex: "(?P.*)" 22 | - property_name: bgpPeer.children.[.*].attributes.operSt 23 | regex: "(?P.*)" 24 | - property_name: bgpPeer.children.[.*].attributes.connAttempts 25 | regex: "(?P.*)" 26 | - property_name: bgpPeer.children.[.*].attributes.connDrop 27 | regex: "(?P.*)" 28 | - property_name: bgpPeer.children.[.*].attributes.connEst 29 | regex: "(?P.*)" 30 | - property_name: bgpPeer.children.[.*].attributes.addr 31 | regex: "(?P.*)" 32 | - property_name: bgpPeer.children.[.*].attributes.rtrId 33 | regex: "(?P.*)" 34 | # BGP Peers 35 | node_bgp_peers_af: 36 | class_name: bgpPeerAfEntry 37 | query_parameter: '?order-by=bgpPeerAfEntry.dn' 38 | metrics: 39 | - 
name: bgp_peer_prefix_sent 40 | value_name: bgpPeerAfEntry.attributes.pfxSent 41 | type: "gauge" 42 | - name: bgp_peer_prefix_received 43 | value_name: bgpPeerAfEntry.attributes.acceptedPaths 44 | type: "gauge" 45 | labels: 46 | - property_name: bgpPeerAfEntry.attributes.dn 47 | regex: "^sys/bgp/inst/dom-(?P.*)/peer-.*\\/ent-\\[(?P.*)\\]" 48 | - property_name: bgpPeerAfEntry.attributes.type 49 | regex: "(?P.*)" 50 | -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/config.d/interface.yaml: -------------------------------------------------------------------------------- 1 | class_queries: 2 | 3 | node_interface_info: 4 | # Interface speed and status 5 | class_name: ethpmPhysIf 6 | metrics: 7 | # The name of the metrics without prefix and unit 8 | - name: interface_oper_speed 9 | value_name: ethpmPhysIf.attributes.operSpeed 10 | unit: bps 11 | type: gauge 12 | help: The current operational speed of the interface, in bits per second. 13 | value_transform: 14 | 'unknown': 0 15 | '100M': 100000000 16 | '1G': 1000000000 17 | '10G': 10000000000 18 | '25G': 25000000000 19 | '40G': 40000000000 20 | '100G': 100000000000 21 | '400G': 400000000000 22 | - name: interface_oper_state 23 | # The field in the json that is used as the metric value, qualified path (gjson) under imdata 24 | value_name: ethpmPhysIf.attributes.operSt 25 | # Type 26 | type: gauge 27 | # Help text without prefix of metrics name 28 | help: The current operational state of the interface. 
(0=unknown, 1=down, 2=up, 3=link-up) 29 | # A string to float64 transform table of the value 30 | value_transform: 31 | 'unknown': 0 32 | 'down': 1 33 | 'up': 2 34 | 'link-up': 3 35 | # The labels to extract as regex 36 | labels: 37 | # The field in the json used to parse the labels from 38 | - property_name: ethpmPhysIf.attributes.dn 39 | regex: "^sys/(?P[a-z]+)-\\[(?P[^\\]]+)\\]/" 40 | 41 | 42 | node_interface_rx_stats: 43 | class_name: eqptIngrBytes5min 44 | metrics: 45 | - name: interface_rx_unicast 46 | value_name: eqptIngrBytes5min.attributes.unicastCum 47 | type: counter 48 | unit: bytes 49 | help: The number of unicast bytes received on the interface since it was integrated into the fabric. 50 | - name: interface_rx_multicast 51 | value_name: eqptIngrBytes5min.attributes.multicastCum 52 | type: counter 53 | unit: bytes 54 | help: The number of multicast bytes received on the interface since it was integrated into the fabric. 55 | - name: interface_rx_broadcast 56 | value_name: eqptIngrBytes5min.attributes.floodCum 57 | type: counter 58 | unit: bytes 59 | help: The number of broadcast bytes received on the interface since it was integrated into the fabric. 60 | labels: 61 | - property_name: eqptIngrBytes5min.attributes.dn 62 | regex: "^sys/(?P[a-z]+)-\\[(?P[^\\]]+)\\]/" 63 | 64 | node_interface_tx_stats: 65 | class_name: eqptEgrBytes5min 66 | metrics: 67 | - name: interface_tx_unicast 68 | value_name: eqptEgrBytes5min.attributes.unicastCum 69 | type: counter 70 | unit: bytes 71 | help: The number of unicast bytes transmitted on the interface since it was integrated into the fabric. 72 | - name: interface_tx_multicast 73 | value_name: eqptEgrBytes5min.attributes.multicastCum 74 | type: counter 75 | unit: bytes 76 | help: The number of multicast bytes transmitted on the interface since it was integrated into the fabric. 
77 | - name: interface_tx_broadcast 78 | value_name: eqptEgrBytes5min.attributes.floodCum 79 | type: counter 80 | unit: bytes 81 | help: The number of broadcast bytes transmitted on the interface since it was integrated into the fabric. 82 | labels: 83 | - property_name: eqptEgrBytes5min.attributes.dn 84 | regex: "^sys/(?P[a-z]+)-\\[(?P[^\\]]+)\\]/" 85 | 86 | node_interface_rx_err_stats: 87 | class_name: eqptIngrDropPkts5min 88 | metrics: 89 | - name: interface_rx_buffer_dropped 90 | value_name: eqptIngrDropPkts5min.attributes.bufferCum 91 | type: counter 92 | unit: pkts 93 | help: The number of packets dropped by the interface due to a 94 | buffer overrun while receiving since it was integrated into the 95 | fabric. 96 | - name: interface_rx_error_dropped 97 | value_name: eqptIngrDropPkts5min.attributes.errorCum 98 | type: counter 99 | unit: pkts 100 | help: The number of packets dropped by the interface due to a 101 | packet error while receiving since it was integrated into the 102 | fabric. 103 | - name: interface_rx_forwarding_dropped 104 | value_name: eqptIngrDropPkts5min.attributes.forwardingCum 105 | type: counter 106 | unit: pkts 107 | help: The number of packets dropped by the interface due to a 108 | forwarding issue while receiving since it was integrated into the 109 | fabric. 110 | - name: interface_rx_loadbal_dropped 111 | value_name: eqptIngrDropPkts5min.attributes.lbCum 112 | type: counter 113 | unit: pkts 114 | help: The number of packets dropped by the interface due to a 115 | load balancing issue while receiving since it was integrated into 116 | the fabric. 
117 | labels: 118 | - property_name: eqptIngrDropPkts5min.attributes.dn 119 | regex: "^sys/(?P[a-z]+)-\\[(?P[^\\]]+)\\]/" 120 | 121 | node_interface_tx_err_stats: 122 | class_name: eqptEgrDropPkts5min 123 | metrics: 124 | - name: interface_tx_queue_dropped 125 | value_name: eqptEgrDropPkts5min.attributes.afdWredCum 126 | type: counter 127 | unit: pkts 128 | help: The number of packets dropped by the interface during queue 129 | management while transmitting since it was integrated into the 130 | fabric. 131 | - name: interface_tx_buffer_dropped 132 | value_name: eqptEgrDropPkts5min.attributes.bufferCum 133 | type: counter 134 | unit: pkts 135 | help: The number of packets dropped by the interface due to a 136 | buffer overrun while transmitting since it was integrated into the 137 | fabric. 138 | - name: interface_tx_error_dropped 139 | value_name: eqptEgrDropPkts5min.attributes.errorCum 140 | type: counter 141 | unit: pkts 142 | help: The number of packets dropped by the interface due to a 143 | packet error while transmitting since it was integrated into the 144 | fabric. 
145 | labels: 146 | - property_name: eqptEgrDropPkts5min.attributes.dn 147 | regex: "^sys/(?P[a-z]+)-\\[(?P[^\\]]+)\\]/" 148 | -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/config.d/ospf.yaml: -------------------------------------------------------------------------------- 1 | class_queries: 2 | # OSPF Neighbors 3 | node_ospf_neighbors: 4 | class_name: ospfAdjEp 5 | query_parameter: '?order-by=ospfAdjEp.dn&rsp-subtree-include=required&rsp-subtree-class=ospfAdjStats&rsp-subtree=children' 6 | metrics: 7 | - name: ospf_neighbors 8 | # As metric I am saving the last time the connection changed state 9 | value_name: ospfAdjEp.children.[ospfAdjStats].attributes.lastStChgTs 10 | value_regex_transformation: "(?P.*)" 11 | value_calculation: "date" 12 | labels: 13 | - property_name: ospfAdjEp.attributes.dn 14 | regex: ".*/dom-(?P.*)/if-\\[(?P.*)\\]" 15 | - property_name: ospfAdjEp.attributes.area 16 | regex: "(?P.*)" 17 | - property_name: ospfAdjEp.attributes.id 18 | regex: "(?P.*)" 19 | - property_name: ospfAdjEp.attributes.operSt 20 | regex: "(?P.*)" 21 | - property_name: ospfAdjEp.attributes.peerIp 22 | regex: "(?P.*)" 23 | staticlabels: 24 | - key: type 25 | value: ospf 26 | -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/config.d/system.yaml: -------------------------------------------------------------------------------- 1 | class_queries: 2 | fabric_node_info: 3 | # Get all the fabric nodes (Controllers, Spines and Leaves) 4 | class_name: fabricNode 5 | query_parameter: '?order-by=fabricNode.dn' 6 | metrics: 7 | - name: fabric_node 8 | # In this case we are not looking for a value just the labels for info 9 | type: "gauge" 10 | help: "Returns the info of the infrastructure node" 11 | unit: "info" 12 | value_name: fabricNode.attributes.fabricSt 13 | value_transform: 14 | 'active': 1 15 | 'inactive': 2 16 | 'disabled': 3 17 | 'discovering': 4 18
| 'undiscovered': 5 19 | 'unsupported': 6 20 | 'unknown': 7 21 | 'decommissioned': 8 22 | 'maintenance': 9 23 | 'commissioned': 10 24 | labels: 25 | - property_name: fabricNode.attributes.name 26 | regex: "^(?P.*)" 27 | - property_name: fabricNode.attributes.address 28 | regex: "^(?P.*)" 29 | - property_name: fabricNode.attributes.role 30 | regex: "^(?P.*)" 31 | - property_name: fabricNode.attributes.serial 32 | regex: "^(?P.*)" 33 | - property_name: fabricNode.attributes.model 34 | regex: "^(?P.*)" 35 | - property_name: fabricNode.attributes.version 36 | regex: "^(?:n9000-)?(?P.*)" 37 | - property_name: fabricNode.attributes.dn 38 | regex: "^topology/pod-(?P[1-9][0-9]*)/node-(?P[1-9][0-9]*)" 39 | 40 | max_capacity: 41 | class_name: fvcapRule 42 | query_parameter: '?order-by=fvcapRule.dn' 43 | # fvcapRule filtering seems to be broken... so we need to filter the results in 44 | # the exporter itself, I only care about fabric wide metrics so no node-id 45 | #query_parameter: '?query-target-filter=wcard(fvcapRule.dn,"^uni/.*")' 46 | metrics: 47 | - name: max_capacity 48 | value_name: fvcapRule.attributes.constraint 49 | type: gauge 50 | help: Returns the max capacity of the fabric 51 | labels: 52 | - property_name: fvcapRule.attributes.subj 53 | regex: "^(?P.*)" 54 | - property_name: fvcapRule.attributes.dn 55 | regex: "^topology/pod-(?P[1-9][0-9]*)/node-(?P[1-9][0-9]*)" 56 | - property_name: fvcapRule.attributes.dn 57 | # This is used as a workaround so I can set the nodeid == fabric to detect fabric wide metrics 58 | regex: "^uni/(?P.*)/compcat-default" 59 | 60 | max_global_pctags: 61 | class_name: fvcapNSRule 62 | query_parameter: '?order-by=fvcapNSRule.dn' 63 | # fvcapNSRule filtering seems to be broken... 
so we need to filter the results in 64 | # the exporter itself, I only care about fabric wide metrics so no node-id 65 | #query_parameter: '?query-target-filter=wcard(fvcapNSRule.dn,"^uni/.*")' 66 | metrics: 67 | - name: max_global_pctag 68 | value_name: fvcapNSRule.attributes.constraint 69 | type: gauge 70 | help: Returns the used capacity for global pctag 71 | labels: 72 | - property_name: fvcapNSRule.attributes.dn 73 | regex: "^topology/pod-(?P[1-9][0-9]*)/node-(?P[1-9][0-9]*)" 74 | - property_name: fvcapNSRule.attributes.dn 75 | # This is used as a workaround so I can set the nodeid == fabric to detect fabric wide metrics 76 | regex: "^uni/(?P.*)/compcat-default" 77 | 78 | fault_insts: 79 | class_name: faultInst 80 | query_parameter: '?order-by=faultInst.dn&query-target-filter=and(not(wcard(faultInst.dn,"__ui_")))' 81 | metrics: 82 | - name: faults 83 | value_name: faultInst.attributes.lastTransition 84 | # Use the time the profile was applied 85 | value_regex_transformation: "(?P.*)" 86 | value_calculation: "lastTransition" 87 | help: Returns the faults last transition time 88 | labels: 89 | - property_name: faultInst.attributes.ack 90 | regex: "(?P.*)" 91 | - property_name: faultInst.attributes.dn 92 | regex: "(?P.*)" 93 | - property_name: faultInst.attributes.cause 94 | regex: "(?P.*)" 95 | - property_name: faultInst.attributes.created 96 | regex: "(?P.*)" 97 | - property_name: faultInst.attributes.descr 98 | regex: "(?P.*)" 99 | - property_name: faultInst.attributes.code 100 | regex: "(?P.*)" 101 | - property_name: faultInst.attributes.severity 102 | regex: "(?P.*)" 103 | - property_name: faultInst.attributes.domain 104 | regex: "(?P.*)" 105 | - property_name: faultInst.attributes.type 106 | regex: "(?P.*)" 107 | 108 | 109 | compound_queries: 110 | object_count: 111 | classnames: 112 | - class_name: fvCtx 113 | # The label value that will be set to the "labelname: class" 114 | label_value: fvCtx 115 | query_parameter: '?rsp-subtree-include=count' 116 | - 
class_name: fvCEp 117 | label_value: fvCEp 118 | query_parameter: '?rsp-subtree-include=count' 119 | - class_name: fvIp 120 | label_value: fvIp 121 | query_parameter: '?rsp-subtree-include=count' 122 | - class_name: fvAEPg 123 | label_value: fvAEPg 124 | query_parameter: '?rsp-subtree-include=count' 125 | #This counts both ESGs and EPGs and is useful to calculate the number of global PCtags 126 | - class_name: fvEPg 127 | label_value: fvEPg_Global 128 | query_parameter: '?rsp-subtree-include=count&query-target-filter=lt(fvEPg.pcTag,"16384")' 129 | - class_name: fvESg 130 | label_value: fvESg 131 | query_parameter: '?rsp-subtree-include=count' 132 | - class_name: fvBD 133 | label_value: fvBD 134 | query_parameter: '?rsp-subtree-include=count' 135 | - class_name: fvTenant 136 | label_value: fvTenant 137 | query_parameter: '?rsp-subtree-include=count' 138 | - class_name: vnsCDev 139 | label_value: vnsCDev 140 | query_parameter: '?rsp-subtree-include=count' 141 | - class_name: vnsGraphInst 142 | label_value: vnsGraphInst 143 | query_parameter: '?rsp-subtree-include=count' 144 | - class_name: eqptLC 145 | label_value: eqptLC 146 | query_parameter: '?rsp-subtree-include=count' 147 | - class_name: coopEpRec 148 | label_value: coopRemoteEp 149 | query_parameter: '?rsp-subtree-include=count&query-target-filter=or(eq(coopEpRec.remoteType,"msite"),eq(coopEpRec.remoteType,"ext_fab"))' 150 | labelname: class 151 | metrics: 152 | - name: object_instances 153 | value_name: moCount.attributes.count 154 | type: gauge 155 | help: Returns the current count of objects for ACI classes 156 | 157 | node_count: 158 | classnames: 159 | - class_name: topSystem 160 | label_value: spine 161 | query_parameter: '?query-target-filter=eq(topSystem.role,"spine")&rsp-subtree-include=count' 162 | - class_name: topSystem 163 | label_value: leaf 164 | query_parameter: '?query-target-filter=eq(topSystem.role,"leaf")&rsp-subtree-include=count' 165 | - class_name: topSystem 166 | label_value: controller
167 | query_parameter: '?query-target-filter=eq(topSystem.role,"controller")&rsp-subtree-include=count' 168 | labelname: type 169 | metrics: 170 | - name: nodes 171 | value_name: moCount.attributes.count 172 | type: gauge 173 | help: Returns the current count of nodes 174 | 175 | # Group class queries 176 | group_class_queries: 177 | # Gather all different health related metrics 178 | health: 179 | name: health 180 | unit: ratio 181 | type: gauge 182 | help: Returns health score 183 | queries: 184 | - node_health: 185 | class_name: topSystem 186 | query_parameter: "?rsp-subtree-include=health" 187 | metrics: 188 | - value_name: topSystem.children.@reverse.0.healthInst.attributes.cur 189 | value_calculation: "value / 100" 190 | labels: 191 | - property_name: topSystem.attributes.dn 192 | regex: "^topology/pod-(?P[1-9][0-9]*)/node-(?P[1-9][0-9]*)/sys" 193 | - property_name: topSystem.attributes.state 194 | regex: "^(?P.*)" 195 | - property_name: topSystem.attributes.oobMgmtAddr 196 | regex: "^(?P.*)" 197 | - property_name: topSystem.attributes.name 198 | regex: "^(?P.*)" 199 | - property_name: topSystem.attributes.role 200 | regex: "^(?P.*)" 201 | # A label for the class query 202 | staticlabels: 203 | - key: class 204 | value: topSystem 205 | 206 | - fabric_health: 207 | class_name: fabricHealthTotal 208 | query_parameter: '?query-target-filter=wcard(fabricHealthTotal.dn,"topology/.*/health")' 209 | metrics: 210 | - value_name: fabricHealthTotal.attributes.cur 211 | value_calculation: "value / 100" 212 | labels: 213 | - property_name: fabricHealthTotal.attributes.dn 214 | regex: "^topology/pod-(?P[1-9][0-9]*)/health" 215 | staticlabels: 216 | - key: class 217 | value: fabricHealthTotal 218 | 219 | - contract: 220 | class_name: fvCtx 221 | query_parameter: '?rsp-subtree-include=health,required' 222 | metrics: 223 | - value_name: fvCtx.children.[healthInst].attributes.cur 224 | value_calculation: "value / 100" 225 | labels: 226 | - property_name: fvCtx.attributes.dn 
227 | regex: "^uni/tn-(?P.*)/ctx-(?P.*)" 228 | staticlabels: 229 | - key: class 230 | value: fvCtx 231 | 232 | - bridge_domain_health_by_label: 233 | class_name: fvBD 234 | query_parameter: '?rsp-subtree-include=health,required' 235 | metrics: 236 | - value_name: fvBD.children.[healthInst].attributes.cur 237 | value_calculation: "value / 100" 238 | labels: 239 | - property_name: fvBD.attributes.dn 240 | regex: "^uni/tn-(?P.*)/BD-(?P.*)" 241 | staticlabels: 242 | - key: class 243 | value: fvBD 244 | 245 | - tenant: 246 | class_name: fvTenant 247 | query_parameter: '?rsp-subtree-include=health,required' 248 | metrics: 249 | - value_name: fvTenant.children.[healthInst].attributes.cur 250 | value_calculation: "value / 100" 251 | labels: 252 | - property_name: fvTenant.attributes.dn 253 | regex: "^(?P.*)" 254 | staticlabels: 255 | - key: class 256 | value: fvTenant 257 | 258 | - ap: 259 | class_name: fvAp 260 | query_parameter: '?rsp-subtree-include=health,required' 261 | metrics: 262 | - value_name: fvAp.children.[healthInst].attributes.cur 263 | value_calculation: "value / 100" 264 | labels: 265 | - property_name: fvAp.attributes.dn 266 | regex: "^uni/tn-(?P.*)/ap-(?P.*)" 267 | staticlabels: 268 | - key: class 269 | value: fvAp 270 | 271 | - aepg: 272 | class_name: fvAEPg 273 | query_parameter: '?rsp-subtree-include=health,required' 274 | metrics: 275 | - value_name: fvAEPg.children.[healthInst].attributes.cur 276 | value_calculation: "value / 100" 277 | labels: 278 | - property_name: fvAEPg.attributes.dn 279 | regex: "^uni/tn-(?P.*)/(?:ap|mgmtp)-(?P.*)/(?:epg|inb)-(?P.*)" 280 | staticlabels: 281 | - key: class 282 | value: fvAEPg 283 | 284 | 285 | -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/config.d/vlan.yaml: -------------------------------------------------------------------------------- 1 | class_queries: 2 | vlans: 3 | class_name: fvnsEncapBlk 4 | query_parameter: '?order-by=fvnsEncapBlk.dn' 5 | 
metrics: 6 | - name: vlans_from 7 | value_name: fvnsEncapBlk.attributes.from 8 | type: gauge 9 | help: The from vlan 10 | value_regex_transformation: "v[x]?lan-(.*)" 11 | - name: vlans_to 12 | value_name: fvnsEncapBlk.attributes.to 13 | type: gauge 14 | help: The to vlan 15 | value_regex_transformation: "v[x]?lan-(.*)" 16 | - name: vlan_pool 17 | value_name: fvnsEncapBlk.attributes.from 18 | type: gauge 19 | help: The to vlan 20 | value_regex_transformation: "v[x]?lan-(.*)" 21 | labels: 22 | - property_name: fvnsEncapBlk.attributes.dn 23 | regex: "^uni/infra/vlanns-\\[(?P.+)\\]-(?Pstatic|dynamic)/from-\\[vlan-(?P.+)\\]-to-\\[vlan-(?P.+)\\]" 24 | - property_name: fvnsEncapBlk.attributes.dn 25 | regex: "^uni/infra/vxlanns-(?P.+)/from-\\[vxlan-(?P.+)\\]-to-\\[vxlan-(?P.+)\\]" 26 | - property_name: fvnsEncapBlk.attributes.dn 27 | regex: "^uni/vmmp-(?P.*)/dom-(?P.*)/usrcustomaggr-(?P.*)/from-\\[vlan-(?P.+)\\]-to-\\[vlan-(?P.+)\\]" 28 | 29 | static_binding_info: 30 | class_name: fvAEPg 31 | query_parameter: "?rsp-subtree-include=required&rsp-subtree-class=fvRsPathAtt&rsp-subtree=children&order-by=fvAEPg.dn" 32 | metrics: 33 | - name: static_binding 34 | value_name: fvAEPg.children.[fvRsPathAtt].attributes.encap 35 | type: gauge 36 | value_regex_transformation: "vlan-(.*)" 37 | help: "Static binding info" 38 | labels: 39 | - property_name: fvAEPg.attributes.dn 40 | regex: "^uni/tn-(?P.*)/(?:ap|mgmtp)-(?P.*)/(?:epg|inb)-(?P.*)" 41 | - property_name: fvAEPg.attributes.[.*].attributes.tDn 42 | regex: "^topology/pod-(?P[1-9][0-9]*)/(protpaths|paths)-(?P[1-9][0-9].*)/pathep-\\[(?P.+)\\]" 43 | - property_name: fvAEPg.attributes.[.*].attributes.encap 44 | regex: "^(?P.*)" 45 | 46 | epg_infos: 47 | class_name: vlanCktEp 48 | query_parameter: '?order-by=vlanCktEp.dn&rsp-subtree-include=stats&rsp-subtree-class=l2RsPathDomAtt,l2IngrPkts5min,l2EgrPkts5min,l2IngrBytes5min,l2EgrBytes5min&rsp-subtree=children' 49 | metrics: 50 | - name: epg_port_vlan_binding 51 | value_name: 
vlanCktEp.children.[l2RsPathDomAtt].attributes.operSt 52 | type: gauge 53 | value_transform: 54 | 'unknown': 0 55 | 'down': 1 56 | 'up': 2 57 | 'link-up': 3 58 | - name: epg_rx_flood 59 | value_name: vlanCktEp.children.[l2IngrBytes5min].attributes.floodCum 60 | value_transform: 61 | '': 0 62 | type: counter 63 | unit: "bytes" 64 | - name: epg_rx_multicast 65 | value_name: vlanCktEp.children.[l2IngrBytes5min].attributes.multicastCum 66 | value_transform: 67 | '': 0 68 | type: counter 69 | unit: "bytes" 70 | - name: epg_rx_unicast 71 | value_name: vlanCktEp.children.[l2IngrBytes5min].attributes.unicastCum 72 | value_transform: 73 | '': 0 74 | type: counter 75 | unit: "bytes" 76 | - name: epg_rx_drop 77 | value_name: vlanCktEp.children.[l2IngrBytes5min].attributes.dropCum 78 | value_transform: 79 | '': 0 80 | type: counter 81 | unit: "bytes" 82 | - name: epg_tx_flood 83 | value_name: vlanCktEp.children.[l2EgrBytes5min].attributes.floodCum 84 | value_transform: 85 | '': 0 86 | type: counter 87 | unit: "bytes" 88 | - name: epg_tx_multicast 89 | value_name: vlanCktEp.children.[l2EgrBytes5min].attributes.multicastCum 90 | value_transform: 91 | '': 0 92 | type: counter 93 | unit: "bytes" 94 | - name: epg_tx_unicast 95 | value_name: vlanCktEp.children.[l2EgrBytes5min].attributes.unicastCum 96 | value_transform: 97 | '': 0 98 | type: counter 99 | unit: "bytes" 100 | - name: epg_tx_drop 101 | value_name: vlanCktEp.children.[l2EgrBytes5min].attributes.dropCum 102 | value_transform: 103 | '': 0 104 | type: counter 105 | unit: "bytes" 106 | - name: epg_rx_flood 107 | value_name: vlanCktEp.children.[l2IngrPkts5min].attributes.floodCum 108 | value_transform: 109 | '': 0 110 | type: counter 111 | unit: "pkts" 112 | - name: epg_rx_multicast 113 | value_name: vlanCktEp.children.[l2IngrPkts5min].attributes.multicastCum 114 | value_transform: 115 | '': 0 116 | type: counter 117 | unit: "pkts" 118 | - name: epg_rx_unicast 119 | value_name:
vlanCktEp.children.[l2IngrPkts5min].attributes.unicastCum 120 | value_transform: 121 | '': 0 122 | type: counter 123 | unit: "pkts" 124 | - name: epg_rx_drop 125 | value_name: vlanCktEp.children.[l2IngrPkts5min].attributes.dropCum 126 | value_transform: 127 | '': 0 128 | type: counter 129 | unit: "pkts" 130 | - name: epg_tx_flood 131 | value_name: vlanCktEp.children.[l2EgrPkts5min].attributes.floodCum 132 | value_transform: 133 | '': 0 134 | type: counter 135 | unit: "pkts" 136 | - name: epg_tx_multicast 137 | value_name: vlanCktEp.children.[l2EgrPkts5min].attributes.multicastCum 138 | value_transform: 139 | '': 0 140 | type: counter 141 | unit: "pkts" 142 | - name: epg_tx_unicast 143 | value_name: vlanCktEp.children.[l2EgrPkts5min].attributes.unicastCum 144 | value_transform: 145 | '': 0 146 | type: counter 147 | unit: "pkts" 148 | - name: epg_tx_drop 149 | value_name: vlanCktEp.children.[l2EgrPkts5min].attributes.dropCum 150 | value_transform: 151 | '': 0 152 | type: counter 153 | unit: "pkts" 154 | 155 | labels: 156 | - property_name: vlanCktEp.attributes.epgDn 157 | regex: "^uni/tn-(?P.*)/(?:ap|mgmtp)-(?P.*)/(?:epg|inb)-(?P.*)" 158 | - property_name: vlanCktEp.attributes.encap 159 | regex: "^vlan-(?P.*)" 160 | - property_name: vlanCktEp.attributes.pcTag 161 | regex: "^(?P.*)" 162 | - property_name: vlanCktEp.children.[l2RsPathDomAtt].attributes.tDn 163 | regex: "^sys/conng/path-\\[(?P[^\\]]+)\\]" 164 | 165 | epg_port_vxlan_binding: 166 | class_name: vxlanCktEp 167 | query_parameter: '?order-by=vxlanCktEp.dn&rsp-subtree-include=required&rsp-subtree-class=l2RsPathDomAtt&rsp-subtree=children' 168 | metrics: 169 | - name: epg_port_vxlan_binding 170 | value_name: vxlanCktEp.children.[l2RsPathDomAtt].attributes.operSt 171 | type: gauge 172 | value_transform: 173 | 'unknown': 0 174 | 'down': 1 175 | 'up': 2 176 | 'link-up': 3 177 | labels: 178 | - property_name: vxlanCktEp.attributes.epgDn 179 | regex: "^uni/tn-(?P.*)/(?:ap|mgmtp)-(?P.*)/(?:epg|inb)-(?P.*)" 180 | - 
property_name: vxlanCktEp.attributes.encap 181 | regex: "^vxlan-(?P.*)" 182 | - property_name: vxlanCktEp.attributes.pcTag 183 | regex: "^(?P.*)" 184 | - property_name: vxlanCktEp.children.[l2RsPathDomAtt].attributes.tDn 185 | regex: "^sys/conng/path-\\[(?P[^\\]]+)\\]" -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/alerts.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "type": "dashboard" 15 | } 16 | ] 17 | }, 18 | "editable": true, 19 | "fiscalYearStartMonth": 0, 20 | "graphTooltip": 0, 21 | "id": 12, 22 | "links": [], 23 | "panels": [ 24 | { 25 | "datasource": { 26 | "type": "prometheus", 27 | "uid": "prometheus" 28 | }, 29 | "gridPos": { 30 | "h": 8, 31 | "w": 24, 32 | "x": 0, 33 | "y": 0 34 | }, 35 | "id": 1, 36 | "options": { 37 | "alertInstanceLabelFilter": "", 38 | "alertName": "", 39 | "dashboardAlerts": false, 40 | "datasource": "Prometheus", 41 | "groupBy": [], 42 | "groupMode": "default", 43 | "maxItems": 100, 44 | "sortOrder": 3, 45 | "stateFilter": { 46 | "error": true, 47 | "firing": true, 48 | "noData": false, 49 | "normal": false, 50 | "pending": true 51 | }, 52 | "viewMode": "list" 53 | }, 54 | "targets": [ 55 | { 56 | "datasource": { 57 | "type": "prometheus", 58 | "uid": "prometheus" 59 | }, 60 | "expr": "", 61 | "instant": false, 62 | "range": true, 63 | "refId": "A" 64 | } 65 | ], 66 | "title": "Alerts", 67 | "type": "alertlist" 68 | } 69 | ], 70 | "schemaVersion": 39, 71 | "tags": [ 72 | "cisco-aci" 73 | ], 74 | "templating": { 75 | "list": [] 76 | }, 77 | "time": { 78 | "from": "now-5m", 79 | "to": "now" 80 | }, 81 | "timepicker": {}, 82 | "timezone": "browser", 83 | 
"title": "Alerts", 84 | "uid": "ddr3ntqzfw5q8d", 85 | "version": 2, 86 | "weekStart": "" 87 | } -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/contracts-explorer.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "type": "dashboard" 15 | } 16 | ] 17 | }, 18 | "description": "Explore the Contracts!", 19 | "editable": true, 20 | "fiscalYearStartMonth": 0, 21 | "graphTooltip": 0, 22 | "id": 29, 23 | "links": [], 24 | "panels": [ 25 | { 26 | "datasource": { 27 | "type": "kniepdennis-neo4j-datasource", 28 | "uid": "memgraph" 29 | }, 30 | "fieldConfig": { 31 | "defaults": { 32 | "color": { 33 | "mode": "thresholds" 34 | }, 35 | "custom": { 36 | "align": "auto", 37 | "cellOptions": { 38 | "type": "auto" 39 | }, 40 | "filterable": true, 41 | "inspect": true 42 | }, 43 | "mappings": [], 44 | "thresholds": { 45 | "mode": "absolute", 46 | "steps": [ 47 | { 48 | "color": "green", 49 | "value": null 50 | }, 51 | { 52 | "color": "red", 53 | "value": 80 54 | } 55 | ] 56 | } 57 | }, 58 | "overrides": [] 59 | }, 60 | "gridPos": { 61 | "h": 8, 62 | "w": 24, 63 | "x": 0, 64 | "y": 0 65 | }, 66 | "id": 1, 67 | "options": { 68 | "cellHeight": "sm", 69 | "footer": { 70 | "countRows": false, 71 | "fields": "", 72 | "reducer": [ 73 | "sum" 74 | ], 75 | "show": false 76 | }, 77 | "showHeader": true, 78 | "sortBy": [] 79 | }, 80 | "pluginVersion": "11.4.0", 81 | "targets": [ 82 | { 83 | "Format": "table", 84 | "cypherQuery": "MATCH (provider)-[r1:fvRsProv|vzRsAnyToProv]-(contract:vzBrCP)-[r2:fvRsCons|vzRsAnyToCons]-(consumer)\nWHERE contract.dn=\"uni/tn-$tenant/brc-$contract\" and contract.fabric='$fabric'\n\nRETURN 
provider.dn as ProviderDN, consumer.dn as ConsumerDN", 85 | "datasource": { 86 | "type": "kniepdennis-neo4j-datasource", 87 | "uid": "memgraph" 88 | }, 89 | "refId": "A" 90 | } 91 | ], 92 | "title": "Contract Table", 93 | "type": "table" 94 | }, 95 | { 96 | "datasource": { 97 | "type": "kniepdennis-neo4j-datasource", 98 | "uid": "memgraph" 99 | }, 100 | "fieldConfig": { 101 | "defaults": { 102 | "color": { 103 | "mode": "thresholds" 104 | }, 105 | "custom": { 106 | "align": "auto", 107 | "cellOptions": { 108 | "type": "auto" 109 | }, 110 | "inspect": false 111 | }, 112 | "mappings": [], 113 | "thresholds": { 114 | "mode": "absolute", 115 | "steps": [ 116 | { 117 | "color": "green", 118 | "value": null 119 | }, 120 | { 121 | "color": "red", 122 | "value": 80 123 | } 124 | ] 125 | } 126 | }, 127 | "overrides": [] 128 | }, 129 | "gridPos": { 130 | "h": 13, 131 | "w": 12, 132 | "x": 0, 133 | "y": 8 134 | }, 135 | "id": 3, 136 | "options": { 137 | "cellHeight": "sm", 138 | "footer": { 139 | "countRows": false, 140 | "fields": "", 141 | "reducer": [ 142 | "sum" 143 | ], 144 | "show": false 145 | }, 146 | "showHeader": true 147 | }, 148 | "pluginVersion": "11.4.0", 149 | "targets": [ 150 | { 151 | "cypherQuery": "MATCH (contract:vzBrCP)-[r1:IS_PARENT]->(subject:vzSubj)-[r2:vzRsSubjFiltAtt]->(filter:vzFilter)-[r3:IS_PARENT]-(entry:vzEntry)\nWHERE contract.dn=\"uni/tn-$tenant/brc-$contract\" and contract.fabric='$fabric'\nWITH split(entry.dn,\"/\") as parsedDN, entry\nreturn split(parsedDN[1],\"-\")[1] as Tenant,\n split(parsedDN[3],\"-\")[1] as Name,\n entry.sFromPort + '-' + entry.sToPort as SrcPort, entry.dFromPort + '-' + entry.dToPort as DestPort", 152 | "datasource": { 153 | "type": "kniepdennis-neo4j-datasource", 154 | "uid": "memgraph" 155 | }, 156 | "refId": "A" 157 | } 158 | ], 159 | "title": "Filter Entry", 160 | "type": "table" 161 | }, 162 | { 163 | "datasource": { 164 | "type": "kniepdennis-neo4j-datasource", 165 | "uid": "memgraph" 166 | }, 167 | 
"fieldConfig": { 168 | "defaults": {}, 169 | "overrides": [ 170 | { 171 | "matcher": { 172 | "id": "byName", 173 | "options": "LeafProfile.name" 174 | }, 175 | "properties": [] 176 | } 177 | ] 178 | }, 179 | "gridPos": { 180 | "h": 13, 181 | "w": 12, 182 | "x": 12, 183 | "y": 8 184 | }, 185 | "id": 2, 186 | "options": { 187 | "edges": {}, 188 | "nodes": {} 189 | }, 190 | "pluginVersion": "11.4.0", 191 | "targets": [ 192 | { 193 | "Format": "nodegraph", 194 | "cypherQuery": "MATCH (provider)-[r1:fvRsProv|vzRsAnyToProv]-(contract:vzBrCP)-[r2:fvRsCons|vzRsAnyToCons]-(consumer)\nWHERE contract.dn=\"uni/tn-$tenant/brc-$contract\" and contract.fabric='$fabric'\n\nRETURN *", 195 | "datasource": { 196 | "type": "kniepdennis-neo4j-datasource", 197 | "uid": "memgraph" 198 | }, 199 | "refId": "A" 200 | } 201 | ], 202 | "title": "Contract Graphs", 203 | "type": "nodeGraph" 204 | } 205 | ], 206 | "preload": false, 207 | "schemaVersion": 40, 208 | "tags": [ 209 | "cisco-aci", 210 | "cisco-aci-config" 211 | ], 212 | "templating": { 213 | "list": [ 214 | { 215 | "current": { 216 | "text": "fab2", 217 | "value": "fab2" 218 | }, 219 | "datasource": { 220 | "type": "prometheus", 221 | "uid": "prometheus" 222 | }, 223 | "definition": "label_values(fabric)", 224 | "label": "Fabric", 225 | "name": "fabric", 226 | "options": [], 227 | "query": { 228 | "qryType": 1, 229 | "query": "label_values(fabric)", 230 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 231 | }, 232 | "refresh": 1, 233 | "regex": "", 234 | "type": "query" 235 | }, 236 | { 237 | "current": { 238 | "text": "common", 239 | "value": "common" 240 | }, 241 | "datasource": { 242 | "type": "kniepdennis-neo4j-datasource", 243 | "uid": "memgraph" 244 | }, 245 | "definition": "", 246 | "label": "Tenants", 247 | "name": "tenant", 248 | "options": [], 249 | "query": { 250 | "cypherQuery": "MATCH (t:fvTenant) WHERE (t.fabric=\"$fabric\")\nRETURN t.name" 251 | }, 252 | "refresh": 1, 253 | "regex": "", 254 | "sort": 1, 255 | 
"type": "query" 256 | }, 257 | { 258 | "current": { 259 | "text": "esg_oob", 260 | "value": "esg_oob" 261 | }, 262 | "datasource": { 263 | "type": "kniepdennis-neo4j-datasource", 264 | "uid": "memgraph" 265 | }, 266 | "definition": "", 267 | "description": "", 268 | "label": "Contracts", 269 | "name": "contract", 270 | "options": [], 271 | "query": { 272 | "cypherQuery": "MATCH (t:fvTenant)-[r:IS_PARENT]->(c:vzBrCP) WHERE (t.fabric=\"$fabric\" and t.name=\"$tenant\")\nRETURN c.name" 273 | }, 274 | "refresh": 1, 275 | "regex": "", 276 | "sort": 1, 277 | "type": "query" 278 | } 279 | ] 280 | }, 281 | "time": { 282 | "from": "now-6h", 283 | "to": "now" 284 | }, 285 | "timepicker": {}, 286 | "timezone": "browser", 287 | "title": "Contract Explorer", 288 | "uid": "be79w387etcsga", 289 | "version": 7, 290 | "weekStart": "" 291 | } -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/drops.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "type": "dashboard" 15 | } 16 | ] 17 | }, 18 | "editable": true, 19 | "fiscalYearStartMonth": 0, 20 | "graphTooltip": 0, 21 | "id": 3, 22 | "links": [], 23 | "panels": [ 24 | { 25 | "datasource": { 26 | "type": "loki", 27 | "uid": "P8E80F9AEF21F6940" 28 | }, 29 | "fieldConfig": { 30 | "defaults": { 31 | "color": { 32 | "mode": "palette-classic" 33 | }, 34 | "custom": { 35 | "axisBorderShow": false, 36 | "axisCenteredZero": false, 37 | "axisColorMode": "text", 38 | "axisLabel": "", 39 | "axisPlacement": "auto", 40 | "barAlignment": 0, 41 | "drawStyle": "line", 42 | "fillOpacity": 0, 43 | "gradientMode": "none", 44 | "hideFrom": { 45 | "legend": false, 46 | "tooltip": false, 
47 | "viz": false 48 | }, 49 | "insertNulls": false, 50 | "lineInterpolation": "linear", 51 | "lineWidth": 1, 52 | "pointSize": 5, 53 | "scaleDistribution": { 54 | "type": "linear" 55 | }, 56 | "showPoints": "auto", 57 | "spanNulls": false, 58 | "stacking": { 59 | "group": "A", 60 | "mode": "none" 61 | }, 62 | "thresholdsStyle": { 63 | "mode": "off" 64 | } 65 | }, 66 | "mappings": [], 67 | "thresholds": { 68 | "mode": "absolute", 69 | "steps": [ 70 | { 71 | "color": "green", 72 | "value": null 73 | }, 74 | { 75 | "color": "red", 76 | "value": 80 77 | } 78 | ] 79 | } 80 | }, 81 | "overrides": [] 82 | }, 83 | "gridPos": { 84 | "h": 7, 85 | "w": 24, 86 | "x": 0, 87 | "y": 0 88 | }, 89 | "id": 1, 90 | "options": { 91 | "legend": { 92 | "calcs": [], 93 | "displayMode": "list", 94 | "placement": "bottom", 95 | "showLegend": true 96 | }, 97 | "tooltip": { 98 | "maxHeight": 600, 99 | "mode": "single", 100 | "sort": "none" 101 | } 102 | }, 103 | "targets": [ 104 | { 105 | "datasource": { 106 | "type": "loki", 107 | "uid": "P8E80F9AEF21F6940" 108 | }, 109 | "editorMode": "code", 110 | "expr": "sum by(SIP, DIP) (count_over_time({job=\"aci-monitoring-stack\",fabric=~\"$fabric\"} |= `ACLLOG-5-ACLLOG_PKTLOG_DENY` | regexp `.+(?PACLLOG-5-ACLLOG_PKTLOG_DENY:) CName: (?P.+)\\(.+SIP: (?P.+), DIP: (?P.+), SPort: (?P[0-9]+), DPort: (?P[0-9]+).+Proto: (?P[0-9]+),.+` [1m]))", 111 | "legendFormat": "{{SIP}}-->{{DIP}}", 112 | "queryType": "range", 113 | "refId": "A" 114 | } 115 | ], 116 | "title": "Packet Drops", 117 | "type": "timeseries" 118 | }, 119 | { 120 | "datasource": { 121 | "type": "loki", 122 | "uid": "P8E80F9AEF21F6940" 123 | }, 124 | "gridPos": { 125 | "h": 15, 126 | "w": 24, 127 | "x": 0, 128 | "y": 7 129 | }, 130 | "id": 2, 131 | "options": { 132 | "dedupStrategy": "none", 133 | "enableLogDetails": true, 134 | "prettifyLogMessage": false, 135 | "showCommonLabels": false, 136 | "showLabels": false, 137 | "showTime": false, 138 | "sortOrder": "Descending", 139 | 
"wrapLogMessage": false 140 | }, 141 | "targets": [ 142 | { 143 | "datasource": { 144 | "type": "loki", 145 | "uid": "P8E80F9AEF21F6940" 146 | }, 147 | "editorMode": "code", 148 | "expr": "{job=\"aci-monitoring-stack\",fabric=~\"$fabric\"} |= \"ACLLOG-5-ACLLOG_PKTLOG_DENY\" | regexp `.+(?PACLLOG-5-ACLLOG_PKTLOG_DENY:) CName: (?P.+)\\(.+SIP: (?P.+), DIP: (?P.+), SPort: (?P[0-9]+), DPort: (?P[0-9]+).+Proto: (?P[0-9]+),.+` | line_format \"Packet Drop on {{.switch}}\\t VRF={{.VRF}} SIP={{.SIP}} SPORT={{.SPORT}} DIP={{.DIP}} DPORT={{.DPORT}} Protocol={{.PROT}}\"\n", 149 | "key": "Q-54a7ca93-3271-4625-bf8d-3aaa07a6dcf3-0", 150 | "queryType": "range", 151 | "refId": "A" 152 | } 153 | ], 154 | "title": "Packet Drops Logs", 155 | "type": "logs" 156 | } 157 | ], 158 | "refresh": "5s", 159 | "schemaVersion": 39, 160 | "tags": [ 161 | "cisco-aci" 162 | ], 163 | "templating": { 164 | "list": [ 165 | { 166 | "current": { 167 | "selected": true, 168 | "text": "fab2", 169 | "value": "fab2" 170 | }, 171 | "datasource": { 172 | "type": "loki", 173 | "uid": "P8E80F9AEF21F6940" 174 | }, 175 | "definition": "", 176 | "hide": 0, 177 | "includeAll": false, 178 | "label": "Fabric", 179 | "multi": false, 180 | "name": "fabric", 181 | "options": [], 182 | "query": { 183 | "label": "fabric", 184 | "refId": "LokiVariableQueryEditor-VariableQuery", 185 | "stream": "", 186 | "type": 1 187 | }, 188 | "refresh": 1, 189 | "regex": "", 190 | "skipUrlSync": false, 191 | "sort": 0, 192 | "type": "query" 193 | } 194 | ] 195 | }, 196 | "time": { 197 | "from": "now-15m", 198 | "to": "now" 199 | }, 200 | "timepicker": {}, 201 | "timezone": "browser", 202 | "title": "Contract Drops Logs", 203 | "uid": "bdq4hhevqzfnkf", 204 | "version": 1, 205 | "weekStart": "" 206 | } -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/fabric-policies-pg.json: -------------------------------------------------------------------------------- 1 | { 2 | 
"annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "type": "dashboard" 15 | } 16 | ] 17 | }, 18 | "description": "Explore the Port Groups to VLAN/Domain/Switch and Port Bindings", 19 | "editable": true, 20 | "fiscalYearStartMonth": 0, 21 | "graphTooltip": 0, 22 | "id": 7, 23 | "links": [], 24 | "panels": [ 25 | { 26 | "datasource": { 27 | "type": "kniepdennis-neo4j-datasource", 28 | "uid": "memgraph" 29 | }, 30 | "fieldConfig": { 31 | "defaults": { 32 | "color": { 33 | "mode": "thresholds" 34 | }, 35 | "custom": { 36 | "align": "auto", 37 | "cellOptions": { 38 | "type": "auto" 39 | }, 40 | "inspect": false 41 | }, 42 | "mappings": [], 43 | "thresholds": { 44 | "mode": "absolute", 45 | "steps": [ 46 | { 47 | "color": "green", 48 | "value": null 49 | }, 50 | { 51 | "color": "red", 52 | "value": 80 53 | } 54 | ] 55 | } 56 | }, 57 | "overrides": [ 58 | { 59 | "matcher": { 60 | "id": "byName", 61 | "options": "LeafProfile.name" 62 | }, 63 | "properties": [ 64 | { 65 | "id": "custom.width", 66 | "value": 220 67 | } 68 | ] 69 | } 70 | ] 71 | }, 72 | "gridPos": { 73 | "h": 21, 74 | "w": 24, 75 | "x": 0, 76 | "y": 0 77 | }, 78 | "id": 1, 79 | "options": { 80 | "cellHeight": "sm", 81 | "footer": { 82 | "countRows": false, 83 | "fields": "", 84 | "reducer": [ 85 | "sum" 86 | ], 87 | "show": false 88 | }, 89 | "showHeader": true, 90 | "sortBy": [] 91 | }, 92 | "pluginVersion": "11.5.1", 93 | "targets": [ 94 | { 95 | "cypherQuery": "//To optimize the query is importan to filter as early as possible.\n//Collect all the Interface Policy Groups and filter by name\nOPTIONAL MATCH (n1:infraAccPortGrp) WHERE (n1.fabric=\"$fabric\" and n1.name=\"$pg\")\nOPTIONAL MATCH (n2:infraAccBndlGrp) WHERE (n2.fabric=\"$fabric\" and n2.name=\"$pg\")\nWITH coalesce(n1, n2) as PG\n\n//Find the 
Interface Policy Group to AAEP relationship\nMATCH (PG)-[r1:infraRsAttEntP]->(AAEP) \n//Find the Interface Policy Group to Access Port Selector relationship\nMATCH (AccessPortSel:infraHPortS)-[r2:infraRsAccBaseGrp]->(PG)\n\n//From now on all the matches are optional, as I also want to be able to display incomplete configs\n//Build PG to VLAN ID relationships\nOPTIONAL MATCH (AAEP)-[r3:infraRsDomP]->(Domain)\nOPTIONAL MATCH (Domain)-[r4:infraRsVlanNs]->(VlanPool)\nOPTIONAL MATCH (VlanPool)-[r5:IS_PARENT]->(vlanBlock)\n\n//Find AccessPortSel to Interface Block relationships\nOPTIONAL MATCH (AccessPortSel)-[r6:IS_PARENT]->(interfaceBlk:infraPortBlk)\n// Find the Switch that is used for the PG\nOPTIONAL MATCH (LeafIntProf:infraAccPortP)-[r7:IS_PARENT]->(AccessPortSel)\nOPTIONAL MATCH (LeafProfile:infraNodeP)-[r8:infraRsAccPortP]->(LeafIntProf)\n//RETURN *\nRETURN vlanBlock.from + ' - ' + vlanBlock.to as VLANIDs,\n VlanPool.name,Domain.name,AAEP.name,PG.name,LeafProfile.name,AccessPortSel.name,LeafIntProf.name,\n interfaceBlk.fromCard + '/' + interfaceBlk.fromPort + ' - ' + interfaceBlk.toCard + '/' + interfaceBlk.toPort as Interface\n", 96 | "datasource": { 97 | "type": "kniepdennis-neo4j-datasource", 98 | "uid": "memgraph" 99 | }, 100 | "refId": "A" 101 | } 102 | ], 103 | "title": "Panel Title", 104 | "type": "table" 105 | } 106 | ], 107 | "preload": false, 108 | "schemaVersion": 40, 109 | "tags": [ 110 | "cisco-aci", 111 | "cisco-aci-config" 112 | ], 113 | "templating": { 114 | "list": [ 115 | { 116 | "current": { 117 | "text": "fab1", 118 | "value": "fab1" 119 | }, 120 | "datasource": { 121 | "type": "prometheus", 122 | "uid": "prometheus" 123 | }, 124 | "definition": "label_values(fabric)", 125 | "label": "Fabric", 126 | "name": "fabric", 127 | "options": [], 128 | "query": { 129 | "qryType": 1, 130 | "query": "label_values(fabric)", 131 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 132 | }, 133 | "refresh": 1, 134 | "regex": "", 135 | "type": "query" 136
| }, 137 | { 138 | "current": { 139 | "text": "APICs", 140 | "value": "APICs" 141 | }, 142 | "datasource": { 143 | "type": "kniepdennis-neo4j-datasource", 144 | "uid": "memgraph" 145 | }, 146 | "definition": "", 147 | "label": "PortGroup", 148 | "name": "pg", 149 | "options": [], 150 | "query": { 151 | "cypherQuery": "OPTIONAL MATCH (n:infraAccPortGrp) WHERE (n.fabric=\"$fabric\") \nRETURN n.name as PGName\nUNION\nOPTIONAL MATCH (n:infraAccBndlGrp) WHERE (n.fabric=\"$fabric\") \nRETURN n.name as PGName" 152 | }, 153 | "refresh": 1, 154 | "regex": "", 155 | "sort": 1, 156 | "type": "query" 157 | } 158 | ] 159 | }, 160 | "time": { 161 | "from": "now-6h", 162 | "to": "now" 163 | }, 164 | "timepicker": {}, 165 | "timezone": "browser", 166 | "title": "Fabric Policies - Port Group", 167 | "uid": "ae79d0uzpr9xcc", 168 | "version": 1, 169 | "weekStart": "" 170 | } -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/missing-targets.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "type": "dashboard" 15 | } 16 | ] 17 | }, 18 | "description": "Show a list of missing Targets", 19 | "editable": true, 20 | "fiscalYearStartMonth": 0, 21 | "graphTooltip": 0, 22 | "id": 29, 23 | "links": [], 24 | "panels": [ 25 | { 26 | "datasource": { 27 | "type": "kniepdennis-neo4j-datasource", 28 | "uid": "memgraph" 29 | }, 30 | "fieldConfig": { 31 | "defaults": { 32 | "color": { 33 | "mode": "thresholds" 34 | }, 35 | "custom": { 36 | "align": "auto", 37 | "cellOptions": { 38 | "type": "auto" 39 | }, 40 | "inspect": false 41 | }, 42 | "mappings": [], 43 | "thresholds": { 44 | "mode": "absolute", 45 | "steps": [ 46 | { 47 | 
"color": "green", 48 | "value": null 49 | }, 50 | { 51 | "color": "red", 52 | "value": 80 53 | } 54 | ] 55 | } 56 | }, 57 | "overrides": [ 58 | { 59 | "matcher": { 60 | "id": "byName", 61 | "options": "LeafProfile.name" 62 | }, 63 | "properties": [ 64 | { 65 | "id": "custom.width", 66 | "value": 220 67 | } 68 | ] 69 | } 70 | ] 71 | }, 72 | "gridPos": { 73 | "h": 17, 74 | "w": 24, 75 | "x": 0, 76 | "y": 0 77 | }, 78 | "id": 1, 79 | "options": { 80 | "cellHeight": "sm", 81 | "footer": { 82 | "countRows": false, 83 | "fields": "", 84 | "reducer": [ 85 | "sum" 86 | ], 87 | "show": false 88 | }, 89 | "frameIndex": 1, 90 | "showHeader": true, 91 | "sortBy": [ 92 | { 93 | "desc": false, 94 | "displayName": "TargetName" 95 | } 96 | ] 97 | }, 98 | "pluginVersion": "11.5.1", 99 | "targets": [ 100 | { 101 | "Format": "table", 102 | "cypherQuery": "// I am not loading commRsKeyRing class in the DB so I just ignore it here\nMATCH (p:MissingTarget)-[r]-(t) WHERE TYPE(r) != \"commRsKeyRing\" AND p.fabric='$fabric'\nRETURN t.dn AS Parent ,r.target AS TargetName, TYPE(r) AS RelationshipClass", 103 | "datasource": { 104 | "type": "kniepdennis-neo4j-datasource", 105 | "uid": "memgraph" 106 | }, 107 | "refId": "A" 108 | } 109 | ], 110 | "title": "Missing Targets", 111 | "type": "table" 112 | } 113 | ], 114 | "preload": false, 115 | "schemaVersion": 40, 116 | "tags": [ 117 | "cisco-aci", 118 | "cisco-aci-config" 119 | ], 120 | "templating": { 121 | "list": [ 122 | { 123 | "current": { 124 | "text": "fab2", 125 | "value": "fab2" 126 | }, 127 | "datasource": { 128 | "type": "prometheus", 129 | "uid": "prometheus" 130 | }, 131 | "definition": "label_values(fabric)", 132 | "label": "Fabric", 133 | "name": "fabric", 134 | "options": [], 135 | "query": { 136 | "qryType": 1, 137 | "query": "label_values(fabric)", 138 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 139 | }, 140 | "refresh": 1, 141 | "regex": "", 142 | "type": "query" 143 | } 144 | ] 145 | }, 146 | "time": { 147 | 
"from": "now-6h", 148 | "to": "now" 149 | }, 150 | "timepicker": {}, 151 | "timezone": "browser", 152 | "title": "Missing Targets", 153 | "uid": "beezkx0398u80f", 154 | "version": 6, 155 | "weekStart": "" 156 | } -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/node-details.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "datasource", 8 | "uid": "grafana" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "target": { 15 | "limit": 100, 16 | "matchAny": false, 17 | "type": "dashboard" 18 | }, 19 | "type": "dashboard" 20 | } 21 | ] 22 | }, 23 | "editable": true, 24 | "fiscalYearStartMonth": 0, 25 | "graphTooltip": 0, 26 | "links": [], 27 | "liveNow": false, 28 | "panels": [ 29 | { 30 | "datasource": { 31 | "type": "prometheus", 32 | "uid": "prometheus" 33 | }, 34 | "description": "", 35 | "fieldConfig": { 36 | "defaults": { 37 | "color": { 38 | "mode": "palette-classic" 39 | }, 40 | "custom": { 41 | "axisBorderShow": false, 42 | "axisCenteredZero": false, 43 | "axisColorMode": "text", 44 | "axisLabel": "", 45 | "axisPlacement": "auto", 46 | "barAlignment": 0, 47 | "drawStyle": "line", 48 | "fillOpacity": 0, 49 | "gradientMode": "none", 50 | "hideFrom": { 51 | "legend": false, 52 | "tooltip": false, 53 | "viz": false 54 | }, 55 | "insertNulls": false, 56 | "lineInterpolation": "linear", 57 | "lineWidth": 1, 58 | "pointSize": 5, 59 | "scaleDistribution": { 60 | "type": "linear" 61 | }, 62 | "showPoints": "auto", 63 | "spanNulls": false, 64 | "stacking": { 65 | "group": "A", 66 | "mode": "none" 67 | }, 68 | "thresholdsStyle": { 69 | "mode": "off" 70 | } 71 | }, 72 | "mappings": [], 73 | "thresholds": { 74 | "mode": "percentage", 75 | "steps": [ 76 | { 77 | "color": "green", 78 | "value": 
null 79 | }, 80 | { 81 | "color": "yellow", 82 | "value": 85 83 | }, 84 | { 85 | "color": "red", 86 | "value": 95 87 | } 88 | ] 89 | }, 90 | "unit": "percentunit" 91 | }, 92 | "overrides": [] 93 | }, 94 | "gridPos": { 95 | "h": 7, 96 | "w": 24, 97 | "x": 0, 98 | "y": 0 99 | }, 100 | "id": 11, 101 | "options": { 102 | "legend": { 103 | "calcs": [], 104 | "displayMode": "list", 105 | "placement": "bottom", 106 | "showLegend": true 107 | }, 108 | "tooltip": { 109 | "maxHeight": 600, 110 | "mode": "single", 111 | "sort": "none" 112 | } 113 | }, 114 | "pluginVersion": "10.1.5", 115 | "targets": [ 116 | { 117 | "datasource": { 118 | "type": "prometheus", 119 | "uid": "prometheus" 120 | }, 121 | "editorMode": "code", 122 | "expr": "aci_node_cpu_user_ratio{fabric=~\"$fabric\",nodeid=~\"$nodeid\"} + aci_node_cpu_kernel_ratio{fabric=~\"$fabric\",nodeid=~\"$nodeid\"}", 123 | "interval": "", 124 | "legendFormat": "{{fabric}} - {{nodeid}}", 125 | "range": true, 126 | "refId": "A" 127 | } 128 | ], 129 | "title": "Node cpu", 130 | "type": "timeseries" 131 | }, 132 | { 133 | "datasource": { 134 | "type": "prometheus", 135 | "uid": "prometheus" 136 | }, 137 | "description": "", 138 | "fieldConfig": { 139 | "defaults": { 140 | "color": { 141 | "mode": "palette-classic" 142 | }, 143 | "custom": { 144 | "axisBorderShow": false, 145 | "axisCenteredZero": false, 146 | "axisColorMode": "text", 147 | "axisLabel": "", 148 | "axisPlacement": "auto", 149 | "barAlignment": 0, 150 | "drawStyle": "line", 151 | "fillOpacity": 0, 152 | "gradientMode": "none", 153 | "hideFrom": { 154 | "legend": false, 155 | "tooltip": false, 156 | "viz": false 157 | }, 158 | "insertNulls": false, 159 | "lineInterpolation": "linear", 160 | "lineWidth": 1, 161 | "pointSize": 5, 162 | "scaleDistribution": { 163 | "type": "linear" 164 | }, 165 | "showPoints": "auto", 166 | "spanNulls": false, 167 | "stacking": { 168 | "group": "A", 169 | "mode": "none" 170 | }, 171 | "thresholdsStyle": { 172 | "mode": "off" 173 | } 
174 | }, 175 | "mappings": [], 176 | "thresholds": { 177 | "mode": "percentage", 178 | "steps": [ 179 | { 180 | "color": "green", 181 | "value": null 182 | }, 183 | { 184 | "color": "yellow", 185 | "value": 85 186 | }, 187 | { 188 | "color": "red", 189 | "value": 95 190 | } 191 | ] 192 | }, 193 | "unit": "percentunit" 194 | }, 195 | "overrides": [] 196 | }, 197 | "gridPos": { 198 | "h": 8, 199 | "w": 24, 200 | "x": 0, 201 | "y": 7 202 | }, 203 | "id": 17, 204 | "options": { 205 | "legend": { 206 | "calcs": [], 207 | "displayMode": "list", 208 | "placement": "bottom", 209 | "showLegend": true 210 | }, 211 | "tooltip": { 212 | "maxHeight": 600, 213 | "mode": "single", 214 | "sort": "none" 215 | } 216 | }, 217 | "pluginVersion": "10.1.5", 218 | "targets": [ 219 | { 220 | "datasource": { 221 | "type": "prometheus", 222 | "uid": "prometheus" 223 | }, 224 | "editorMode": "code", 225 | "expr": "aci_node_memory_used_bytes{fabric=~\"$fabric\",nodeid=~\"$nodeid\"} / aci_node_memory_total_bytes{fabric=~\"$fabric\",nodeid=~\"$nodeid\"}", 226 | "interval": "", 227 | "legendFormat": "{{fabric}} - {{nodeid}}", 228 | "range": true, 229 | "refId": "A" 230 | } 231 | ], 232 | "title": "Node memory used", 233 | "type": "timeseries" 234 | }, 235 | { 236 | "datasource": { 237 | "type": "prometheus", 238 | "uid": "prometheus" 239 | }, 240 | "description": "", 241 | "fieldConfig": { 242 | "defaults": { 243 | "color": { 244 | "mode": "palette-classic" 245 | }, 246 | "custom": { 247 | "axisBorderShow": false, 248 | "axisCenteredZero": false, 249 | "axisColorMode": "text", 250 | "axisLabel": "", 251 | "axisPlacement": "auto", 252 | "barAlignment": 0, 253 | "drawStyle": "line", 254 | "fillOpacity": 0, 255 | "gradientMode": "none", 256 | "hideFrom": { 257 | "legend": false, 258 | "tooltip": false, 259 | "viz": false 260 | }, 261 | "insertNulls": false, 262 | "lineInterpolation": "linear", 263 | "lineWidth": 1, 264 | "pointSize": 5, 265 | "scaleDistribution": { 266 | "type": "linear" 267 | }, 
268 | "showPoints": "never", 269 | "spanNulls": false, 270 | "stacking": { 271 | "group": "A", 272 | "mode": "none" 273 | }, 274 | "thresholdsStyle": { 275 | "mode": "off" 276 | } 277 | }, 278 | "mappings": [], 279 | "max": 1, 280 | "min": 0, 281 | "thresholds": { 282 | "mode": "absolute", 283 | "steps": [ 284 | { 285 | "color": "green", 286 | "value": null 287 | }, 288 | { 289 | "color": "red", 290 | "value": 80 291 | } 292 | ] 293 | }, 294 | "unit": "percentunit" 295 | }, 296 | "overrides": [] 297 | }, 298 | "gridPos": { 299 | "h": 6, 300 | "w": 24, 301 | "x": 0, 302 | "y": 15 303 | }, 304 | "id": 5, 305 | "options": { 306 | "legend": { 307 | "calcs": [], 308 | "displayMode": "list", 309 | "placement": "bottom", 310 | "showLegend": true 311 | }, 312 | "tooltip": { 313 | "maxHeight": 600, 314 | "mode": "multi", 315 | "sort": "none" 316 | } 317 | }, 318 | "pluginVersion": "10.1.5", 319 | "targets": [ 320 | { 321 | "datasource": { 322 | "type": "prometheus", 323 | "uid": "prometheus" 324 | }, 325 | "editorMode": "code", 326 | "expr": "aci_health_ratio{fabric=~\"$fabric\",nodeid=~\"$nodeid\"}", 327 | "interval": "", 328 | "legendFormat": "{{fabric}} - {{nodeid}}", 329 | "range": true, 330 | "refId": "A" 331 | } 332 | ], 333 | "title": "Node health ", 334 | "type": "timeseries" 335 | } 336 | ], 337 | "refresh": "", 338 | "revision": 1, 339 | "schemaVersion": 39, 340 | "tags": [ 341 | "cisco-aci" 342 | ], 343 | "templating": { 344 | "list": [ 345 | { 346 | "current": { 347 | "selected": false, 348 | "text": "All", 349 | "value": "$__all" 350 | }, 351 | "datasource": { 352 | "type": "prometheus", 353 | "uid": "prometheus" 354 | }, 355 | "definition": "label_values(fabric)", 356 | "hide": 0, 357 | "includeAll": true, 358 | "label": "Fabric", 359 | "multi": false, 360 | "name": "fabric", 361 | "options": [], 362 | "query": { 363 | "query": "label_values(fabric)", 364 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 365 | }, 366 | "refresh": 1, 367 | "regex": "", 
368 | "skipUrlSync": false, 369 | "sort": 0, 370 | "type": "query" 371 | }, 372 | { 373 | "current": { 374 | "selected": false, 375 | "text": "All", 376 | "value": "$__all" 377 | }, 378 | "datasource": { 379 | "type": "prometheus", 380 | "uid": "prometheus" 381 | }, 382 | "definition": "query_result(aci_uptime_seconds_total{fabric=~\"$fabric\",role!=\"controller\"})", 383 | "hide": 0, 384 | "includeAll": true, 385 | "label": "Nodeid", 386 | "multi": true, 387 | "name": "nodeid", 388 | "options": [], 389 | "query": { 390 | "query": "query_result(aci_uptime_seconds_total{fabric=~\"$fabric\",role!=\"controller\"})", 391 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 392 | }, 393 | "refresh": 1, 394 | "regex": "/nodeid=\"(.*?)\"/", 395 | "skipUrlSync": false, 396 | "sort": 3, 397 | "tagValuesQuery": "", 398 | "tagsQuery": "", 399 | "type": "query", 400 | "useTags": false 401 | } 402 | ] 403 | }, 404 | "time": { 405 | "from": "now-30m", 406 | "to": "now" 407 | }, 408 | "timeRangeUpdatedDuringEditOrView": false, 409 | "timepicker": { 410 | "refresh_intervals": [ 411 | "10s", 412 | "30s", 413 | "1m", 414 | "5m", 415 | "15m", 416 | "30m", 417 | "1h", 418 | "2h", 419 | "1d" 420 | ] 421 | }, 422 | "timezone": "", 423 | "title": "Node Detail", 424 | "uid": "6RqSXnVMk", 425 | "version": 1, 426 | "weekStart": "" 427 | } -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/power-usage.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "type": "dashboard" 15 | } 16 | ] 17 | }, 18 | "editable": true, 19 | "fiscalYearStartMonth": 0, 20 | "graphTooltip": 0, 21 | "links": [], 22 | "liveNow": false, 23 | 
"panels": [ 24 | { 25 | "datasource": { 26 | "type": "prometheus", 27 | "uid": "prometheus" 28 | }, 29 | "fieldConfig": { 30 | "defaults": { 31 | "color": { 32 | "mode": "palette-classic" 33 | }, 34 | "custom": { 35 | "axisBorderShow": false, 36 | "axisCenteredZero": false, 37 | "axisColorMode": "text", 38 | "axisLabel": "", 39 | "axisPlacement": "auto", 40 | "barAlignment": 0, 41 | "drawStyle": "line", 42 | "fillOpacity": 0, 43 | "gradientMode": "none", 44 | "hideFrom": { 45 | "legend": false, 46 | "tooltip": false, 47 | "viz": false 48 | }, 49 | "insertNulls": false, 50 | "lineInterpolation": "linear", 51 | "lineWidth": 1, 52 | "pointSize": 5, 53 | "scaleDistribution": { 54 | "type": "linear" 55 | }, 56 | "showPoints": "auto", 57 | "spanNulls": false, 58 | "stacking": { 59 | "group": "A", 60 | "mode": "none" 61 | }, 62 | "thresholdsStyle": { 63 | "mode": "off" 64 | } 65 | }, 66 | "mappings": [], 67 | "thresholds": { 68 | "mode": "absolute", 69 | "steps": [ 70 | { 71 | "color": "green", 72 | "value": null 73 | }, 74 | { 75 | "color": "red", 76 | "value": 80 77 | } 78 | ] 79 | } 80 | }, 81 | "overrides": [] 82 | }, 83 | "gridPos": { 84 | "h": 8, 85 | "w": 24, 86 | "x": 0, 87 | "y": 0 88 | }, 89 | "id": 1, 90 | "options": { 91 | "legend": { 92 | "calcs": [], 93 | "displayMode": "list", 94 | "placement": "bottom", 95 | "showLegend": true 96 | }, 97 | "tooltip": { 98 | "maxHeight": 600, 99 | "mode": "single", 100 | "sort": "none" 101 | } 102 | }, 103 | "pluginVersion": "10.1.5", 104 | "targets": [ 105 | { 106 | "datasource": { 107 | "type": "prometheus", 108 | "uid": "prometheus" 109 | }, 110 | "editorMode": "code", 111 | "expr": "sum by(nodeid, fabric) (aci_psu_supplied_avg{fabric=~\"$fabric\",nodeid=~\"$nodeid\"})", 112 | "hide": false, 113 | "instant": false, 114 | "legendFormat": "Total Power Draw {{fabric}} - {{nodeid}}", 115 | "range": true, 116 | "refId": "A" 117 | } 118 | ], 119 | "title": "Panel Title", 120 | "type": "timeseries" 121 | } 122 | ], 123 | 
"refresh": "", 124 | "schemaVersion": 39, 125 | "tags": [ 126 | "cisco-aci" 127 | ], 128 | "templating": { 129 | "list": [ 130 | { 131 | "current": { 132 | "selected": false, 133 | "text": "All", 134 | "value": "$__all" 135 | }, 136 | "datasource": { 137 | "type": "prometheus", 138 | "uid": "prometheus" 139 | }, 140 | "definition": "label_values(fabric)", 141 | "hide": 0, 142 | "includeAll": true, 143 | "label": "Fabric", 144 | "multi": false, 145 | "name": "fabric", 146 | "options": [], 147 | "query": { 148 | "query": "label_values(fabric)", 149 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 150 | }, 151 | "refresh": 1, 152 | "regex": "", 153 | "skipUrlSync": false, 154 | "sort": 0, 155 | "type": "query" 156 | }, 157 | { 158 | "current": { 159 | "selected": true, 160 | "text": [ 161 | "All" 162 | ], 163 | "value": [ 164 | "$__all" 165 | ] 166 | }, 167 | "datasource": { 168 | "type": "prometheus", 169 | "uid": "prometheus" 170 | }, 171 | "definition": "query_result(aci_uptime_seconds_total{fabric=~\"$fabric\"})", 172 | "hide": 0, 173 | "includeAll": true, 174 | "label": "Nodeid", 175 | "multi": true, 176 | "name": "nodeid", 177 | "options": [], 178 | "query": { 179 | "query": "query_result(aci_uptime_seconds_total{fabric=~\"$fabric\"})", 180 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 181 | }, 182 | "refresh": 1, 183 | "regex": "/nodeid=\"(.*?)\"/", 184 | "skipUrlSync": false, 185 | "sort": 0, 186 | "type": "query" 187 | } 188 | ] 189 | }, 190 | "time": { 191 | "from": "now-5m", 192 | "to": "now" 193 | }, 194 | "timeRangeUpdatedDuringEditOrView": false, 195 | "timepicker": {}, 196 | "timezone": "", 197 | "title": "Power Usage", 198 | "uid": "e504e219-6b07-42d3-a534-ed2c6e0fe5e6", 199 | "version": 1, 200 | "weekStart": "" 201 | } -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/dashboards/vlans-vmm-trunks.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "type": "dashboard" 15 | } 16 | ] 17 | }, 18 | "editable": true, 19 | "fiscalYearStartMonth": 0, 20 | "graphTooltip": 0, 21 | "id": 15, 22 | "links": [], 23 | "panels": [ 24 | { 25 | "datasource": { 26 | "type": "kniepdennis-neo4j-datasource", 27 | "uid": "memgraph" 28 | }, 29 | "description": "This shows all the VLANs that are being used in the fabric.", 30 | "fieldConfig": { 31 | "defaults": { 32 | "color": { 33 | "mode": "thresholds" 34 | }, 35 | "custom": { 36 | "align": "left", 37 | "cellOptions": { 38 | "type": "auto" 39 | }, 40 | "filterable": true, 41 | "inspect": false 42 | }, 43 | "mappings": [], 44 | "thresholds": { 45 | "mode": "absolute", 46 | "steps": [ 47 | { 48 | "color": "green", 49 | "value": null 50 | }, 51 | { 52 | "color": "red", 53 | "value": 80 54 | } 55 | ] 56 | } 57 | }, 58 | "overrides": [ 59 | { 60 | "matcher": { 61 | "id": "byName", 62 | "options": "VLANID" 63 | }, 64 | "properties": [ 65 | { 66 | "id": "displayName", 67 | "value": "Static VLANs in use" 68 | } 69 | ] 70 | } 71 | ] 72 | }, 73 | "gridPos": { 74 | "h": 8, 75 | "w": 3, 76 | "x": 0, 77 | "y": 0 78 | }, 79 | "id": 3, 80 | "options": { 81 | "cellHeight": "sm", 82 | "footer": { 83 | "countRows": false, 84 | "fields": "", 85 | "reducer": [ 86 | "sum" 87 | ], 88 | "show": false 89 | }, 90 | "showHeader": true, 91 | "sortBy": [] 92 | }, 93 | "pluginVersion": "11.5.1", 94 | "targets": [ 95 | { 96 | "cypherQuery": "//To optimize the query it is important to filter as early as possible.\n//Collect all the Interface Policy Groups and filter by name \nOPTIONAL MATCH (a:l3extRsPathL3OutAtt) WHERE (a.fabric=\"$fabric\")\nWITH COLLECT(DISTINCT a.encap)
as aencap\nOPTIONAL MATCH (b:l3extVirtualLIfP) WHERE (b.fabric=\"$fabric\")\nWITH aencap + COLLECT(DISTINCT b.encap) as bencap\nOPTIONAL MATCH (c:fvRsPathAtt) WHERE (c.fabric=\"$fabric\")\nWITH bencap + COLLECT(DISTINCT c.encap) as cencap\nOPTIONAL MATCH (d:infraRsFuncToEpg) WHERE (d.fabric=\"$fabric\")\nWITH cencap + COLLECT(DISTINCT d.encap) as dencap\nUNWIND dencap AS individualEncaps\nWITH TOINTEGER(SPLIT(individualEncaps,\"-\")[1]) as VLANID\nRETURN VLANID ORDER BY VLANID", 97 | "datasource": { 98 | "type": "kniepdennis-neo4j-datasource", 99 | "uid": "memgraph" 100 | }, 101 | "refId": "A" 102 | } 103 | ], 104 | "title": "", 105 | "type": "table" 106 | }, 107 | { 108 | "datasource": { 109 | "type": "kniepdennis-neo4j-datasource", 110 | "uid": "memgraph" 111 | }, 112 | "fieldConfig": { 113 | "defaults": { 114 | "color": { 115 | "mode": "thresholds" 116 | }, 117 | "custom": { 118 | "align": "left", 119 | "cellOptions": { 120 | "type": "auto" 121 | }, 122 | "filterable": true, 123 | "inspect": false 124 | }, 125 | "mappings": [], 126 | "thresholds": { 127 | "mode": "absolute", 128 | "steps": [ 129 | { 130 | "color": "green", 131 | "value": null 132 | }, 133 | { 134 | "color": "red", 135 | "value": 80 136 | } 137 | ] 138 | } 139 | }, 140 | "overrides": [ 141 | { 142 | "matcher": { 143 | "id": "byName", 144 | "options": "VLANID" 145 | }, 146 | "properties": [ 147 | { 148 | "id": "custom.width", 149 | "value": 138 150 | } 151 | ] 152 | } 153 | ] 154 | }, 155 | "gridPos": { 156 | "h": 8, 157 | "w": 21, 158 | "x": 3, 159 | "y": 0 160 | }, 161 | "id": 4, 162 | "options": { 163 | "cellHeight": "sm", 164 | "footer": { 165 | "countRows": false, 166 | "fields": "", 167 | "reducer": [ 168 | "sum" 169 | ], 170 | "show": false 171 | }, 172 | "showHeader": true, 173 | "sortBy": [] 174 | }, 175 | "pluginVersion": "11.5.1", 176 | "targets": [ 177 | { 178 | "cypherQuery": "//To optimize the query it is important to filter as early as possible.\n//Collect all the Interface Policy
Groups and filter by name \nOPTIONAL MATCH (a:l3extRsPathL3OutAtt) WHERE (a.fabric=\"$fabric\")\nWITH COLLECT(a) as aencap\nOPTIONAL MATCH (b:l3extVirtualLIfP) WHERE (b.fabric=\"$fabric\")\nWITH aencap + COLLECT(b) as bencap\nOPTIONAL MATCH (c:fvRsPathAtt) WHERE (c.fabric=\"$fabric\")\nWITH bencap + COLLECT(c) as cencap\nOPTIONAL MATCH (d:infraRsFuncToEpg) WHERE (d.fabric=\"$fabric\")\nWITH cencap + COLLECT(d) as dencap\nUNWIND dencap AS individualEncaps\nRETURN TOINTEGER(SPLIT(individualEncaps.encap,\"-\")[1]) as VLANID, individualEncaps.dn as DN\nORDER BY VLANID", 179 | "datasource": { 180 | "type": "kniepdennis-neo4j-datasource", 181 | "uid": "memgraph" 182 | }, 183 | "refId": "A" 184 | } 185 | ], 186 | "title": "VLAN To Port Config", 187 | "type": "table" 188 | }, 189 | { 190 | "datasource": { 191 | "type": "kniepdennis-neo4j-datasource", 192 | "uid": "memgraph" 193 | }, 194 | "fieldConfig": { 195 | "defaults": { 196 | "color": { 197 | "mode": "thresholds" 198 | }, 199 | "custom": { 200 | "align": "auto", 201 | "cellOptions": { 202 | "type": "auto" 203 | }, 204 | "filterable": true, 205 | "inspect": false 206 | }, 207 | "mappings": [], 208 | "thresholds": { 209 | "mode": "absolute", 210 | "steps": [ 211 | { 212 | "color": "green", 213 | "value": null 214 | }, 215 | { 216 | "color": "red", 217 | "value": 80 218 | } 219 | ] 220 | } 221 | }, 222 | "overrides": [] 223 | }, 224 | "gridPos": { 225 | "h": 15, 226 | "w": 12, 227 | "x": 0, 228 | "y": 8 229 | }, 230 | "id": 1, 231 | "options": { 232 | "cellHeight": "sm", 233 | "footer": { 234 | "countRows": false, 235 | "enablePagination": false, 236 | "fields": "", 237 | "reducer": [ 238 | "sum" 239 | ], 240 | "show": false 241 | }, 242 | "frameIndex": 3, 243 | "showHeader": true, 244 | "sortBy": [ 245 | { 246 | "desc": false, 247 | "displayName": "Vlan Pool Name" 248 | } 249 | ] 250 | }, 251 | "pluginVersion": "11.5.1", 252 | "targets": [ 253 | { 254 | "cypherQuery": "MATCH(vlp:fvnsVlanInstP)-[r1]-(vblk:fvnsEncapBlk) 
where (vlp.fabric=\"$fabric\")\nRETURN vlp.name AS POOL_NAME, vblk.from as FROM, vblk.to as TO", 255 | "datasource": { 256 | "type": "kniepdennis-neo4j-datasource", 257 | "uid": "memgraph" 258 | }, 259 | "refId": "A" 260 | } 261 | ], 262 | "title": "Vlans Pools", 263 | "transformations": [ 264 | { 265 | "id": "organize", 266 | "options": { 267 | "excludeByName": { 268 | "Time": true, 269 | "Value": true, 270 | "__name__": true, 271 | "fabric": true, 272 | "instance": true, 273 | "job": true 274 | }, 275 | "indexByName": { 276 | "Time": 9, 277 | "Value": 8, 278 | "__name__": 10, 279 | "aci": 1, 280 | "allocMode": 3, 281 | "fabric": 2, 282 | "from": 4, 283 | "instance": 5, 284 | "job": 6, 285 | "to": 7, 286 | "vlanns": 0 287 | }, 288 | "renameByName": { 289 | "aci": "Fabric", 290 | "fabric": "", 291 | "vlanns": "Vlan Pool Name" 292 | } 293 | } 294 | } 295 | ], 296 | "type": "table" 297 | }, 298 | { 299 | "datasource": { 300 | "type": "kniepdennis-neo4j-datasource", 301 | "uid": "memgraph" 302 | }, 303 | "fieldConfig": { 304 | "defaults": { 305 | "color": { 306 | "mode": "thresholds" 307 | }, 308 | "custom": { 309 | "align": "auto", 310 | "cellOptions": { 311 | "type": "auto" 312 | }, 313 | "filterable": true, 314 | "inspect": false 315 | }, 316 | "mappings": [], 317 | "thresholds": { 318 | "mode": "absolute", 319 | "steps": [ 320 | { 321 | "color": "green", 322 | "value": null 323 | }, 324 | { 325 | "color": "red", 326 | "value": 80 327 | } 328 | ] 329 | } 330 | }, 331 | "overrides": [] 332 | }, 333 | "gridPos": { 334 | "h": 15, 335 | "w": 12, 336 | "x": 12, 337 | "y": 8 338 | }, 339 | "id": 2, 340 | "options": { 341 | "cellHeight": "sm", 342 | "footer": { 343 | "countRows": false, 344 | "fields": "", 345 | "reducer": [ 346 | "sum" 347 | ], 348 | "show": false 349 | }, 350 | "frameIndex": 3, 351 | "showHeader": true 352 | }, 353 | "pluginVersion": "11.5.1", 354 | "targets": [ 355 | { 356 | "cypherQuery": "MATCH(vlp:vmmUsrCustomAggr)-[r1]-(vblk:fvnsEncapBlk) where 
(vlp.fabric=\"$fabric\")\nRETURN vlp.name AS POOL_NAME, vblk.from as FROM, vblk.to as TO", 357 | "datasource": { 358 | "type": "kniepdennis-neo4j-datasource", 359 | "uid": "memgraph" 360 | }, 361 | "refId": "A" 362 | } 363 | ], 364 | "title": "VMM Custom Trunk Port Groups", 365 | "transformations": [ 366 | { 367 | "id": "organize", 368 | "options": { 369 | "excludeByName": { 370 | "Time": true, 371 | "Value": true, 372 | "__name__": true, 373 | "fabric": true, 374 | "instance": true, 375 | "job": true 376 | }, 377 | "indexByName": { 378 | "Time": 10, 379 | "TrunkPGName": 1, 380 | "Value": 9, 381 | "__name__": 11, 382 | "aci": 0, 383 | "fabric": 4, 384 | "from": 5, 385 | "instance": 6, 386 | "job": 7, 387 | "to": 8, 388 | "vmmDomain": 2, 389 | "vmmDomainType": 3 390 | }, 391 | "renameByName": { 392 | "aci": "Fabric", 393 | "fabric": "", 394 | "vlanns": "Vlan Pool Name" 395 | } 396 | } 397 | } 398 | ], 399 | "type": "table" 400 | } 401 | ], 402 | "preload": false, 403 | "refresh": "", 404 | "schemaVersion": 40, 405 | "tags": [ 406 | "cisco-aci", 407 | "cisco-aci-config" 408 | ], 409 | "templating": { 410 | "list": [ 411 | { 412 | "current": { 413 | "text": "fab2", 414 | "value": "fab2" 415 | }, 416 | "datasource": { 417 | "type": "prometheus", 418 | "uid": "prometheus" 419 | }, 420 | "definition": "label_values(fabric)", 421 | "includeAll": false, 422 | "label": "Fabric", 423 | "name": "fabric", 424 | "options": [], 425 | "query": { 426 | "query": "label_values(fabric)", 427 | "refId": "PrometheusVariableQueryEditor-VariableQuery" 428 | }, 429 | "refresh": 1, 430 | "regex": "", 431 | "type": "query" 432 | } 433 | ] 434 | }, 435 | "time": { 436 | "from": "now-5m", 437 | "to": "now" 438 | }, 439 | "timepicker": {}, 440 | "timezone": "", 441 | "title": "Vlans", 442 | "uid": "c094d195-2037-4f78-bc33-b4c7cc0ddc0d", 443 | "version": 14, 444 | "weekStart": "" 445 | } -------------------------------------------------------------------------------- 
/charts/aci-monitoring-stack/templates/_helpers.tpl: -------------------------------------------------------------------------------- 1 | {{/* 2 | Expand the name of the chart. 3 | */}} 4 | {{- define "aci-monitoring-stack.name" -}} 5 | {{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} 6 | {{- end }} 7 | 8 | {{/* 9 | Create a default fully qualified app name. 10 | We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). 11 | If release name contains chart name it will be used as a full name. 12 | */}} 13 | {{- define "aci-monitoring-stack.fullname" -}} 14 | {{- if .Values.fullnameOverride }} 15 | {{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} 16 | {{- else }} 17 | {{- $name := default .Chart.Name .Values.nameOverride }} 18 | {{- if contains $name .Release.Name }} 19 | {{- .Release.Name | trunc 63 | trimSuffix "-" }} 20 | {{- else }} 21 | {{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} 22 | {{- end }} 23 | {{- end }} 24 | {{- end }} 25 | 26 | {{/* 27 | Create chart name and version as used by the chart label. 28 | */}} 29 | {{- define "aci-monitoring-stack.chart" -}} 30 | {{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} 31 | {{- end }} 32 | 33 | {{/* 34 | Common labels 35 | */}} 36 | {{- define "aci-monitoring-stack.labels" -}} 37 | helm.sh/chart: {{ include "aci-monitoring-stack.chart" . }} 38 | {{ include "aci-monitoring-stack.selectorLabels" . }} 39 | {{- if .Chart.AppVersion }} 40 | app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} 41 | {{- end }} 42 | app.kubernetes.io/managed-by: {{ .Release.Service }} 43 | {{- end }} 44 | 45 | {{/* 46 | Selector labels 47 | */}} 48 | {{- define "aci-monitoring-stack.selectorLabels" -}} 49 | app.kubernetes.io/name: {{ include "aci-monitoring-stack.name" . 
}} 50 | app.kubernetes.io/instance: {{ .Release.Name }} 51 | {{- end }} 52 | 53 | {{/* 54 | Create the name of the service account to use 55 | */}} 56 | {{- define "aci-monitoring-stack.serviceAccountName" -}} 57 | {{- if .Values.serviceAccount.create }} 58 | {{- default (include "aci-monitoring-stack.fullname" .) .Values.serviceAccount.name }} 59 | {{- else }} 60 | {{- default "default" .Values.serviceAccount.name }} 61 | {{- end }} 62 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/aci-exporter/configmap-config.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: {{ .Release.Name }}-aci-exporter-config 5 | labels: 6 | app: {{ .Release.Name }}-aci-exporter 7 | {{ include "aci-monitoring-stack.labels" $ | indent 4 }} 8 | data: 9 | config.yaml: |- 10 | port: {{ $.Values.aci_exporter.port }} 11 | prefix: {{ $.Values.aci_exporter.prefix }} 12 | # Profiles for different fabrics 13 | fabrics: 14 | {{- if $.Values.aci_exporter.fabrics }} 15 | {{- range $k, $v := $.Values.aci_exporter.fabrics }} 16 | {{ $k }}: 17 | username: {{ $v.username }} 18 | password: {{ $v.password | quote }} 19 | apic: 20 | {{- range $v.apic }} 21 | - {{ . 
}} 22 | {{- end }} 23 | service_discovery: 24 | target_format: "%s#%s" 25 | target_fields: 26 | - aci_exporter_fabric 27 | - {{ $v.service_discovery }} 28 | {{- end }} 29 | {{- end }} 30 | # Http client settings used to access apic 31 | httpclient: 32 | insecurehttps: {{ $.Values.aci_exporter.httpclient.insecurehttps }} 33 | keepalive: {{ $.Values.aci_exporter.httpclient.keepalive }} 34 | timeout: {{ $.Values.aci_exporter.httpclient.timeout }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/aci-exporter/configmap-queries.yaml: -------------------------------------------------------------------------------- 1 | {{- $files := .Files.Glob "config.d/*.yaml" }} 2 | {{- if $files }} 3 | apiVersion: v1 4 | kind: ConfigMapList 5 | items: 6 | {{- range $path, $fileContents := $files }} 7 | {{- $configName := regexReplaceAll "(^.*/)(.*)\\.yaml$" $path "${2}" }} 8 | - apiVersion: v1 9 | kind: ConfigMap 10 | metadata: 11 | name: {{ printf "%s-aci-exporter-queries-%s" $.Release.Name $configName | trunc 63 | trimSuffix "-" }} 12 | labels: 13 | app: {{ $.Release.Name }}-aci-exporter 14 | {{ include "aci-monitoring-stack.labels" $ | indent 6 }} 15 | data: 16 | {{ $configName }}.yaml: {{ $.Files.Get $path | toYaml | nindent 6}} 17 | {{- end }} 18 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/aci-exporter/deployment.yaml: -------------------------------------------------------------------------------- 1 | {{- $files := .Files.Glob "config.d/*.yaml" }} 2 | apiVersion: apps/v1 3 | kind: Deployment 4 | metadata: 5 | name: {{ $.Release.Name }}-aci-exporter 6 | labels: 7 | app.kubernetes.io/component: {{ $.Release.Name }}-aci-exporter 8 | spec: 9 | replicas: 1 10 | selector: 11 | matchLabels: 12 | app.kubernetes.io/component: {{ $.Release.Name }}-aci-exporter 13 | template: 14 | metadata: 15 | labels: 16 | 
app.kubernetes.io/component: {{ $.Release.Name }}-aci-exporter 17 | annotations: 18 | checksum/config: {{ include (print $.Template.BasePath "/aci-exporter/configmap-config.yaml") . | sha256sum }} 19 | {{- if $files }} 20 | {{- range $path, $fileContents := $files }} 21 | {{- $configName := regexReplaceAll "(^.*/)(.*)\\.yaml$" $path "${2}" }} 22 | checksum/queries-{{ $configName }}: {{ print $fileContents | sha256sum }} 23 | {{- end }} 24 | {{- end }} 25 | spec: 26 | containers: 27 | - name: aci-exporter 28 | image: "{{ .Values.aci_exporter.image.repository }}:{{ .Values.aci_exporter.image.tag | default .Chart.AppVersion }}" 29 | imagePullPolicy: {{ .Values.aci_exporter.image.pullPolicy }} 30 | ports: 31 | - containerPort: {{ .Values.aci_exporter.port }} 32 | protocol: TCP 33 | volumeMounts: 34 | - name: {{ $.Release.Name }}-aci-exporter-config 35 | mountPath: /etc/aci-exporter/config.yaml 36 | subPath: config.yaml 37 | {{- if $files }} 38 | {{- range $path, $fileContents := $files }} 39 | {{- $configName := regexReplaceAll "(^.*/)(.*)\\.yaml$" $path "${2}" }} 40 | - name: {{ printf "%s-aci-exporter-queries-%s" $.Release.Name $configName | trunc 63 | trimSuffix "-" }} 41 | mountPath: /etc/aci-exporter/config.d/{{ $configName }}.yaml 42 | subPath: {{ $configName }}.yaml 43 | {{- end }} 44 | {{- end }} 45 | volumes: 46 | - name: {{ $.Release.Name }}-aci-exporter-config 47 | configMap: 48 | name: {{ $.Release.Name }}-aci-exporter-config 49 | {{- if $files }} 50 | {{- range $path, $fileContents := $files }} 51 | {{- $configName := regexReplaceAll "(^.*/)(.*)\\.yaml$" $path "${2}" }} 52 | - name: {{ printf "%s-aci-exporter-queries-%s" $.Release.Name $configName | trunc 63 | trimSuffix "-" }} 53 | configMap: 54 | name: {{ printf "%s-aci-exporter-queries-%s" $.Release.Name $configName | trunc 63 | trimSuffix "-" }} 55 | {{- end }} 56 | {{- end }} -------------------------------------------------------------------------------- 
/charts/aci-monitoring-stack/templates/aci-exporter/service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: {{ .Values.aci_exporter.url }} 5 | labels: 6 | {{- include "aci-monitoring-stack.labels" . | nindent 4 }} 7 | spec: 8 | type: ClusterIP 9 | ports: 10 | - port: {{ .Values.aci_exporter.port }} 11 | targetPort: {{ .Values.aci_exporter.port }} 12 | protocol: TCP 13 | name: http 14 | selector: 15 | app.kubernetes.io/component: {{ $.Release.Name }}-aci-exporter 16 | -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/backup2graph/CronJob.yml: -------------------------------------------------------------------------------- 1 | {{- if $.Values.backup2graph.enabled }} 2 | apiVersion: batch/v1 3 | kind: CronJob 4 | metadata: 5 | name: {{ $.Release.Name }}-aci-backup2graph 6 | labels: 7 | app.kubernetes.io/component: {{ $.Release.Name }}-aci-backup2graph 8 | spec: 9 | schedule: "{{ .Values.backup2graph.schedule }}" 10 | concurrencyPolicy: 'Forbid' 11 | jobTemplate: 12 | spec: 13 | template: 14 | spec: 15 | {{- if .Capabilities.APIVersions.Has "security.openshift.io/v1" }} 16 | serviceAccountName: {{ .Values.global.serviceAccountName }} 17 | {{- end }} 18 | containers: 19 | - name: aci-backup2graph 20 | image: "{{ .Values.backup2graph.image.repository }}:{{ .Values.backup2graph.image.tag | default .Chart.AppVersion }}" 21 | imagePullPolicy: {{ .Values.backup2graph.image.pullPolicy }} 22 | volumeMounts: 23 | - name: {{ $.Release.Name }}-aci-exporter-config 24 | mountPath: /etc/backup2graph/config.yaml 25 | subPath: config.yaml 26 | - name: backups 27 | mountPath: /app/fabrics/ 28 | env: 29 | - name: MEMGRAPH_SVC_HOST 30 | value: {{ $.Release.Name }}-memgraph 31 | - name: MEMGRAPH_SVC_PORT 32 | value: "{{ $.Values.memgraph.boltPort }}" 33 | {{- with .Values.backup2graph.nodeSelector }} 34 | nodeSelector: 35 
| {{- toYaml . | nindent 12 }} 36 | {{- end }} 37 | volumes: 38 | - name: {{ $.Release.Name }}-aci-exporter-config 39 | configMap: 40 | name: {{ $.Release.Name }}-aci-exporter-config 41 | - name: backups 42 | persistentVolumeClaim: 43 | claimName: {{ $.Release.Name }}-memgraph-user-storage-{{ $.Release.Name }}-memgraph-0 44 | restartPolicy: Never 45 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/grafana-configmap-dashboards.yaml: -------------------------------------------------------------------------------- 1 | {{- $files := .Files.Glob "dashboards/*.json" }} 2 | {{- if $files }} 3 | apiVersion: v1 4 | kind: ConfigMapList 5 | items: 6 | {{- range $path, $fileContents := $files }} 7 | {{- $dashboardName := regexReplaceAll "(^.*/)(.*)\\.json$" $path "${2}" }} 8 | - apiVersion: v1 9 | kind: ConfigMap 10 | metadata: 11 | name: {{ printf "%s-%s-dashboard" $.Release.Name $dashboardName | trunc 63 | trimSuffix "-" }} 12 | annotations: 13 | k8s-sidecar-target-directory: "ACI" 14 | labels: 15 | {{- if $.Values.grafana.sidecar.dashboards.label }} 16 | {{ $.Values.grafana.sidecar.dashboards.label }}: {{ $.Values.grafana.sidecar.dashboards.labelValue | quote }} 17 | {{- end }} 18 | app: {{ $.Release.Name }}-grafana 19 | {{ include "aci-monitoring-stack.labels" $ | indent 6 }} 20 | data: 21 | {{ $dashboardName }}.json: {{ $.Files.Get $path | toJson }} 22 | {{- end }} 23 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/grafana-datasources.yaml: -------------------------------------------------------------------------------- 1 | {{- if or (and .Values.grafana.enabled .Values.grafana.sidecar.datasources.enabled) .Values.grafana.forceDeployDatasources }} 2 | apiVersion: v1 3 | kind: ConfigMap 4 | metadata: 5 | name: {{ .Release.Name }}-grafana-datasources 6 | namespace: {{ .Release.Namespace }} 7 | {{- if 
.Values.grafana.sidecar.datasources.annotations }} 8 | annotations: 9 | {{- toYaml .Values.grafana.sidecar.datasources.annotations | nindent 4 }} 10 | {{- end }} 11 | labels: 12 | {{ $.Values.grafana.sidecar.datasources.label }}: {{ $.Values.grafana.sidecar.datasources.labelValue | quote }} 13 | app: {{ .Release.Name }}-grafana 14 | {{ include "aci-monitoring-stack.labels" $ | indent 4 }} 15 | data: 16 | datasource.yaml: |- 17 | apiVersion: 1 18 | datasources: 19 | {{- $scrapeInterval := .Values.grafana.sidecar.datasources.defaultDatasourceScrapeInterval | default .Values.prometheus.scrapeInterval | default "30s" }} 20 | {{- if .Values.grafana.sidecar.datasources.defaultDatasourceEnabled }} 21 | - name: {{ .Values.grafana.sidecar.datasources.name }} 22 | type: prometheus 23 | uid: {{ .Values.grafana.sidecar.datasources.uid }} 24 | {{- if .Values.grafana.sidecar.datasources.url }} 25 | url: {{ .Values.grafana.sidecar.datasources.url }} 26 | {{- else }} 27 | url: http://{{ .Release.Name }}-prometheus-server.{{ .Release.Namespace }}.svc:{{ .Values.prometheus.service.servicePort }}/{{ trimPrefix "/" .Values.prometheus.server.prefixURL }} 28 | {{- end }} 29 | access: proxy 30 | isDefault: {{ .Values.grafana.sidecar.datasources.isDefaultDatasource }} 31 | jsonData: 32 | httpMethod: {{ .Values.grafana.sidecar.datasources.httpMethod }} 33 | timeInterval: {{ $scrapeInterval }} 34 | {{- if .Values.grafana.sidecar.datasources.timeout }} 35 | timeout: {{ .Values.grafana.sidecar.datasources.timeout }} 36 | {{- end }} 37 | {{- if .Values.grafana.sidecar.datasources.exemplarTraceIdDestinations }} 38 | exemplarTraceIdDestinations: 39 | - datasourceUid: {{ .Values.grafana.sidecar.datasources.exemplarTraceIdDestinations.datasourceUid }} 40 | name: {{ .Values.grafana.sidecar.datasources.exemplarTraceIdDestinations.traceIdLabelName }} 41 | {{- end }} 42 | {{- if .Values.grafana.sidecar.datasources.alertmanager.enabled }} 43 | - name: {{ 
.Values.grafana.sidecar.datasources.alertmanager.name }} 44 | type: alertmanager 45 | uid: {{ .Values.grafana.sidecar.datasources.alertmanager.uid }} 46 | {{- if .Values.grafana.sidecar.datasources.alertmanager.url }} 47 | url: {{ .Values.grafana.sidecar.datasources.alertmanager.url }} 48 | {{- else }} 49 | url: http://{{ .Release.Name }}-alertmanager.{{ .Release.Namespace }}.svc:{{ .Values.prometheus.alertmanager.service.port }} 50 | {{- end }} 51 | access: proxy 52 | jsonData: 53 | handleGrafanaManagedAlerts: {{ .Values.grafana.sidecar.datasources.alertmanager.handleGrafanaManagedAlerts }} 54 | implementation: {{ .Values.grafana.sidecar.datasources.alertmanager.implementation }} 55 | {{- end }} 56 | {{- end }} 57 | 58 | {{- if $.Values.memgraph.enabled }} 59 | - name: "memgraph" 60 | type: kniepdennis-neo4j-datasource 61 | access: proxy 62 | url: "" 63 | jsonData: {url: "bolt://{{ $.Release.Name }}-memgraph.{{ .Release.Namespace }}.svc:{{ $.Values.memgraph.boltPort }}"} 64 | uid: 'memgraph' 65 | version: 2 66 | {{- end }} 67 | 68 | {{- if $.Values.loki.enabled }} 69 | {{- if .Values.loki.loki.enabled }} 70 | - name: Loki 71 | type: loki 72 | access: proxy 73 | url: http://{{ template "loki.gatewayFullname" .Subcharts.loki }}.{{ .Release.Namespace }}.svc:{{ .Values.loki.loki.gateway.service.port }} 74 | version: 1 75 | isDefault: {{ default false .Values.loki.loki.isDefault }} 76 | {{- with .Values.loki.loki.datasource.uid }} 77 | uid: {{ . | quote }} 78 | {{- end }} 79 | {{- with .Values.loki.loki.datasource.jsonData }} 80 | jsonData: 81 | {{- tpl . 
$ | nindent 8 }} 82 | {{- end }} 83 | {{- end }} 84 | 85 | {{- end }} 86 | 87 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/loki/loki-configmap-alerts.yaml: -------------------------------------------------------------------------------- 1 | {{- if $.Values.loki.enabled }} 2 | {{- $files := .Files.Glob "alerts/loki/*.yaml" }} 3 | {{- if $files }} 4 | apiVersion: v1 5 | kind: ConfigMapList 6 | items: 7 | {{- range $path, $fileContents := $files }} 8 | {{- $dashboardName := regexReplaceAll "(^.*/)(.*)\\.yaml$" $path "${2}" }} 9 | - apiVersion: v1 10 | kind: ConfigMap 11 | metadata: 12 | name: {{ printf "%s-%s-alert" $.Release.Name $dashboardName | trunc 63 | trimSuffix "-" }} 13 | annotations: 14 | k8s-sidecar-target-directory: "fake" 15 | labels: 16 | {{- if $.Values.loki.loki.sidecar.rules.enabled }} 17 | {{ index $.Values.loki.loki.sidecar.rules.label }}: {{ $.Values.loki.loki.sidecar.rules.labelValue | quote }} 18 | {{- end }} 19 | app: {{ $.Release.Name }}-grafana 20 | {{ include "aci-monitoring-stack.labels" $ | indent 6 }} 21 | data: 22 | {{ $dashboardName }}.yaml: {{ $.Files.Get $path | toYaml | indent 4 }} 23 | {{- end }} 24 | {{- end }} 25 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/openshift/SecurityContextConstraints.yaml: -------------------------------------------------------------------------------- 1 | {{- if .Capabilities.APIVersions.Has "security.openshift.io/v1" }} 2 | apiVersion: security.openshift.io/v1 3 | kind: SecurityContextConstraints 4 | metadata: 5 | name: {{ $.Release.Name }}-{{ .Values.global.serviceAccountName }} 6 | allowHostDirVolumePlugin: false 7 | allowHostIPC: false 8 | allowHostNetwork: false 9 | allowHostPID: false 10 | allowHostPorts: false 11 | allowPrivilegeEscalation: true 12 | allowPrivilegedContainer: true 13 | allowedCapabilities: 14 | - '*' 15 | 
allowedUnsafeSysctls: 16 | - '*' 17 | defaultAddCapabilities: null 18 | fsGroup: 19 | type: RunAsAny 20 | priority: null 21 | readOnlyRootFilesystem: false 22 | requiredDropCapabilities: null 23 | runAsUser: 24 | type: RunAsAny 25 | seLinuxContext: 26 | type: MustRunAs 27 | seccompProfiles: 28 | - '*' 29 | supplementalGroups: 30 | type: RunAsAny 31 | volumes: 32 | - '*' 33 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/openshift/bucket.yaml: -------------------------------------------------------------------------------- 1 | {{- if .Values.loki.cephBucket.enabled }} 2 | apiVersion: objectbucket.io/v1alpha1 3 | kind: ObjectBucketClaim 4 | metadata: 5 | name: {{ .Values.loki.cephBucket.bucketName }} 6 | spec: 7 | bucketName: {{ .Values.loki.cephBucket.bucketName }} 8 | generateBucketName: {{ .Values.loki.cephBucket.bucketName }} 9 | storageClassName: {{ .Values.loki.cephBucket.storageClassName }} 10 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/openshift/cluster-role-binding.yaml: -------------------------------------------------------------------------------- 1 | {{- if .Capabilities.APIVersions.Has "security.openshift.io/v1" }} 2 | apiVersion: rbac.authorization.k8s.io/v1 3 | kind: ClusterRoleBinding 4 | metadata: 5 | name: {{ $.Release.Name }}-{{ .Values.global.serviceAccountName }} 6 | roleRef: 7 | apiGroup: rbac.authorization.k8s.io 8 | kind: ClusterRole 9 | name: {{ $.Release.Name }}-{{ .Values.global.serviceAccountName }} 10 | subjects: 11 | - kind: ServiceAccount 12 | name: {{ .Values.global.serviceAccountName }} 13 | namespace: {{ $.Release.Namespace }} 14 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/openshift/cluster-role.yaml: 
-------------------------------------------------------------------------------- 1 | {{- if .Capabilities.APIVersions.Has "security.openshift.io/v1" }} 2 | apiVersion: rbac.authorization.k8s.io/v1 3 | kind: ClusterRole 4 | metadata: 5 | name: {{ $.Release.Name }}-{{ .Values.global.serviceAccountName }} 6 | rules: 7 | - apiGroups: 8 | - security.openshift.io 9 | resourceNames: 10 | - {{ $.Release.Name }}-{{ .Values.global.serviceAccountName }} 11 | resources: 12 | - securitycontextconstraints 13 | verbs: 14 | - use 15 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/openshift/service-account.yaml: -------------------------------------------------------------------------------- 1 | {{- if .Capabilities.APIVersions.Has "security.openshift.io/v1" }} 2 | apiVersion: v1 3 | kind: ServiceAccount 4 | metadata: 5 | name: {{ .Values.global.serviceAccountName }} 6 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/prometheus/configmap-alerts.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: prometheus-alerts 5 | labels: 6 | app: {{ .Release.Name }}-prometheus-alerts 7 | {{ include "aci-monitoring-stack.labels" $ | indent 4 }} 8 | data: 9 | prometheus-alerts.yaml: {{ $.Files.Get "alerts/prom/alerts.yaml" | toYaml | indent 4 }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/prometheus/configmap-config.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: {{ .Release.Name }}-prometheus-config 5 | labels: 6 | app: {{ .Release.Name }}-prometheus-config 7 | {{ include "aci-monitoring-stack.labels" $ | indent 4 }} 8 | data: 9 | prometheus.yml: |- 10 | 
global: 11 | evaluation_interval: 1m 12 | scrape_interval: 1m 13 | scrape_timeout: 10s 14 | rule_files: 15 | - /etc/alerts.d/prometheus-alerts.yaml 16 | scrape_configs: 17 | - job_name: prometheus 18 | static_configs: 19 | - targets: 20 | - localhost:9090 21 | {{- if $.Values.aci_exporter.aciServiceDiscoveryURLs }} 22 | {{- range $k, $v := $.Values.aci_exporter.aciServiceDiscoveryURLs }} 23 | - job_name: {{ $k }}-aci-exporter-apics 24 | scrape_interval: {{ $v.apic_polling }} 25 | scrape_timeout: {{ $v.apic_scrape_timeout }} 26 | metrics_path: /probe 27 | params: 28 | # List of the queries to execute on the fabric level. They need to match the aci-exporter config 29 | # DO NOT INSERT SPACES and use \ for next line or aci-exporter will not be able to parse the queries 30 | queries: 31 | - "health,fabric_node_info,max_capacity,max_global_pctags,subnets,epg_to_bd,svc_epg_to_bd,inb_epg_to_bd,bd_to_vrf,\ 32 | vlans,static_binding_info,node_count,object_count,fault_insts,\ 33 | ps_power_usage,apic_hw_sensors,controller_topsystem" 34 | scheme: http 35 | http_sd_configs: 36 | - url: {{ $v.url }}.{{ $.Release.Namespace }}.svc:{{ $.Values.aci_exporter.port }}/sd 37 | refresh_interval: 5m 38 | relabel_configs: 39 | - source_labels: [ __meta_role ] 40 | # This config executes the queries at the "fabric" level and is used to probe any of the APICs 41 | # to get metrics about all the devices in the fabric. A classic use case is to get for example the vlan pools 42 | # the status of the nodes or the scale profile for the whole fabric etc... 
43 | regex: "aci_exporter_fabric" 44 | action: "keep" 45 | - source_labels: [ __address__ ] 46 | target_label: __param_target 47 | - source_labels: [ __param_target ] 48 | target_label: instance 49 | - source_labels: [ __meta_url ] 50 | regex: https?://(.*)/.* 51 | replacement: "$1" 52 | target_label: __address__ 53 | - job_name: {{ $k }}-aci-exporter-switches 54 | scrape_interval: {{ $v.switch_polling }} 55 | scrape_timeout: {{ $v.switch_scrape_timeout }} 56 | metrics_path: /probe 57 | params: 58 | # List of the queries to execute on each switch (node level). They need to match the aci-exporter config 59 | # DO NOT INSERT SPACES; use \ to continue on the next line, or aci-exporter will not be able to parse the queries 60 | queries: 61 | - "node_topsystem,node_bgp_peers,node_bgp_peers_af,node_interface_info,\ 62 | node_interface_rx_stats,node_interface_rx_err_stats,epg_infos,epg_port_vxlan_binding,\ 63 | node_interface_tx_stats,node_interface_tx_err_stats,node_cpu,node_memory,\ 64 | node_scale_profiles,node_active_scale_profile,node_tcam_current,\ 65 | node_labels_current,node_mac_current,node_ipv4_current,\ 66 | node_ipv6_current,node_mcast_current,node_vlan_current,\ 67 | node_lpm_current,node_slash32_current,node_slash128_current,\ 68 | node_scale_ctx,node_ospf_neighbors,node_fru_power_usage,node_temperature" 69 | scheme: http 70 | http_sd_configs: 71 | - url: {{ $v.url }}.{{ $.Release.Namespace }}.svc:{{ $.Values.aci_exporter.port }}/sd 72 | refresh_interval: 5m 73 | relabel_configs: 74 | - source_labels: [ __meta_role ] 75 | # Include only the switches: this job executes the queries on all the switches in the fabric 76 | # but not on the APICs. For example, the APICs have no TCAM, so we need to exclude them. 
77 | regex: "(spine|leaf)" 78 | action: "keep" 79 | 80 | # Get the target (Fabric Name) param from __address__ that is # by default 81 | - source_labels: [ __address__ ] 82 | separator: "#" 83 | regex: (.*)#(.*) 84 | replacement: "$1" 85 | target_label: __param_target 86 | 87 | # Get the node Address param from __address__ that is # by default 88 | - source_labels: [ __address__ ] 89 | separator: "#" 90 | regex: (.*)#(.*) 91 | replacement: "$2" 92 | target_label: __param_node 93 | # Get the aci-exporter URL from the service discovery URL 94 | - source_labels: [ __meta_url ] 95 | regex: https?://(.*)/.* 96 | replacement: "$1" 97 | target_label: __address__ 98 | 99 | # Set instance to the ip/hostname from the __param_node 100 | - source_labels: [ __param_node ] 101 | target_label: instance 102 | 103 | # Add labels from discovery 104 | - source_labels: [ __meta_fabricDomain ] 105 | target_label: aci 106 | - source_labels: [ __meta_id ] 107 | target_label: nodeid 108 | - source_labels: [ __meta_podId ] 109 | target_label: podid 110 | - source_labels: [ __meta_role ] 111 | target_label: role 112 | - source_labels: [ __meta_name ] 113 | target_label: name 114 | {{- end }} 115 | {{- end }} 116 | alerting: 117 | alertmanagers: 118 | - kubernetes_sd_configs: 119 | - role: pod 120 | tls_config: 121 | ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt 122 | bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token 123 | relabel_configs: 124 | - source_labels: [__meta_kubernetes_namespace] 125 | regex: aci-mon-stack 126 | action: keep 127 | - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_instance] 128 | regex: aci-mon-stack 129 | action: keep 130 | - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name] 131 | regex: alertmanager 132 | action: keep 133 | - source_labels: [__meta_kubernetes_pod_container_port_number] 134 | regex: "9093" 135 | action: keep 
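The relabel rules of the switches job in the Prometheus config above can be traced with a small Python sketch. It is an illustration under stated assumptions, not part of the chart: the helper names `relabel` and `relabel_switch_target` are hypothetical, the sample values (`fab1`, `10.0.0.1`, the `monitoring` namespace) are made up, and Python's `re.fullmatch` only approximates Prometheus' fully anchored RE2 matching. The rules run in the same order as in the config, so `__param_target` and `__param_node` are derived from the discovered `fabric#node` address before `__address__` is overwritten with the aci-exporter host taken from the service discovery URL.

```python
import re


def relabel(labels, source, regex, replacement, target):
    """Apply one Prometheus-style 'replace' rule.

    Hypothetical helper: Prometheus anchors the regex on both ends,
    which re.fullmatch approximates here.
    """
    match = re.fullmatch(regex, labels.get(source, ""))
    if match:
        labels[target] = match.expand(replacement)
    return labels


def relabel_switch_target(address, sd_url):
    """Mimic the relabel_configs of the '-aci-exporter-switches' job."""
    labels = {"__address__": address, "__meta_url": sd_url}
    # Fabric name: the part before '#'
    relabel(labels, "__address__", r"(.*)#(.*)", r"\1", "__param_target")
    # Node address: the part after '#'
    relabel(labels, "__address__", r"(.*)#(.*)", r"\2", "__param_node")
    # Scrape the aci-exporter itself, using host:port from the SD URL
    relabel(labels, "__meta_url", r"https?://(.*)/.*", r"\1", "__address__")
    # The instance label is the switch, not the exporter
    relabel(labels, "__param_node", r"(.*)", r"\1", "instance")
    return labels


if __name__ == "__main__":
    # Sample values are assumptions chosen for illustration only.
    result = relabel_switch_target(
        "fab1#10.0.0.1",
        "http://aci-exporter-svc.monitoring.svc:9643/sd",
    )
    for name in sorted(result):
        print(f"{name} = {result[name]}")
```

With these rules every switch is probed through the single aci-exporter service, while the `instance` label still identifies the individual switch.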
-------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/syslog-ng/configmap.yaml: -------------------------------------------------------------------------------- 1 | {{- if $.Values.syslog.enabled }} 2 | apiVersion: v1 3 | kind: ConfigMap 4 | metadata: 5 | name: {{ $.Release.Name }}-syslog-ng-config 6 | labels: 7 | app: {{ .Release.Name }}-syslog-ng 8 | {{ include "aci-monitoring-stack.labels" $ | indent 4 }} 9 | data: 10 | syslog-ng.conf: | 11 | @version: 4.2 12 | @include "scl.conf" 13 | {{- range $key, $values := .Values.syslog.services }} 14 | source s_{{ $key }} { 15 | {{$values.protocol|lower}}( 16 | ip(0.0.0.0) 17 | port({{ $values.service.port }}) 18 | flags(no-parse) 19 | ); 20 | 21 | }; 22 | destination d_{{ $key }} { 23 | syslog("{{ $.Release.Name }}-promtail-{{ $key }}" transport("tcp") port({{ $values.service.port }})); }; 24 | 25 | log { 26 | source(s_{{ $key }}); 27 | parser { syslog-parser(); }; 28 | destination(d_{{ $key }}); 29 | }; 30 | {{- end }} 31 | scl.conf: | 32 | @module appmodel 33 | @include 'scl/*/*.conf' 34 | @define java-module-dir "`module-install-dir`/java-modules" 35 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/syslog-ng/deployment.yaml: -------------------------------------------------------------------------------- 1 | {{- if $.Values.syslog.enabled }} 2 | apiVersion: apps/v1 3 | kind: Deployment 4 | metadata: 5 | name: {{ $.Release.Name }}-syslog-ng 6 | labels: 7 | app.kubernetes.io/component: {{ $.Release.Name }}-syslog-ng 8 | spec: 9 | replicas: 1 10 | selector: 11 | matchLabels: 12 | app.kubernetes.io/component: {{ $.Release.Name }}-syslog-ng 13 | template: 14 | metadata: 15 | labels: 16 | app.kubernetes.io/component: {{ $.Release.Name }}-syslog-ng 17 | annotations: 18 | checksum/config: {{ include (print $.Template.BasePath "/syslog-ng/configmap.yaml") . 
| sha256sum }} 19 | spec: 20 | {{- if .Capabilities.APIVersions.Has "security.openshift.io/v1" }} 21 | serviceAccountName: {{ .Values.global.serviceAccountName }} 22 | {{- end }} 23 | containers: 24 | - name: syslog-ng 25 | securityContext: 26 | runAsUser: 0 27 | image: "{{ .Values.syslog.image.repository }}:{{ .Values.syslog.image.tag }}" 28 | imagePullPolicy: {{ .Values.syslog.image.pullPolicy }} 29 | ports: 30 | {{- range $key, $values := .Values.syslog.services }} 31 | - name: {{ .name | default $key }} 32 | containerPort: {{ $values.containerPort }} 33 | protocol: {{ $values.protocol | default "TCP" }} 34 | {{- end }} 35 | volumeMounts: 36 | - name: {{ $.Release.Name }}-syslog-ng-config 37 | mountPath: /etc/syslog-ng 38 | volumes: 39 | - name: {{ $.Release.Name }}-syslog-ng-config 40 | configMap: 41 | name: {{ $.Release.Name }}-syslog-ng-config 42 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/templates/syslog-ng/service.yaml: -------------------------------------------------------------------------------- 1 | {{- if $.Values.syslog.enabled }} 2 | {{- range $key, $values := .Values.syslog.services }} 3 | --- 4 | apiVersion: v1 5 | kind: Service 6 | metadata: 7 | name: {{ $.Release.Name }}-syslog-{{ $values.name }} 8 | labels: 9 | {{- with $values.labels }} 10 | {{- toYaml . | nindent 4 }} 11 | {{- end }} 12 | spec: 13 | {{- with $values.service }} 14 | type: {{ .type | default "ClusterIP" }} 15 | {{- with .clusterIP }} 16 | clusterIP: {{ . }} 17 | {{- end }} 18 | {{- with .loadBalancerIP }} 19 | loadBalancerIP: {{ . }} 20 | {{- end }} 21 | {{- with .loadBalancerSourceRanges }} 22 | loadBalancerSourceRanges: 23 | {{- toYaml . | nindent 4 }} 24 | {{- end }} 25 | {{- with .externalIPs }} 26 | externalIPs: 27 | {{- toYaml . | nindent 4 }} 28 | {{- end }} 29 | {{- with .externalTrafficPolicy }} 30 | externalTrafficPolicy: {{ . 
}} 31 | {{- end }} 32 | {{- end }} 33 | ports: 34 | - name: {{ .name | default $key }} 35 | targetPort: {{ $values.containerPort }} 36 | protocol: {{ $values.protocol | default "TCP" }} 37 | {{- if $values.service }} 38 | port: {{ $values.service.port | default $values.containerPort }} 39 | {{- if $values.service.nodePort }} 40 | nodePort: {{ $values.service.nodePort }} 41 | {{- end }} 42 | {{- else }} 43 | port: {{ $values.containerPort }} 44 | {{- end }} 45 | selector: 46 | app.kubernetes.io/component: {{ $.Release.Name }}-syslog-ng 47 | {{- end }} 48 | {{- end }} -------------------------------------------------------------------------------- /charts/aci-monitoring-stack/values.yaml: -------------------------------------------------------------------------------- 1 | aci_exporter: 2 | image: 3 | repository: quay.io/camillo/aci-exporter 4 | tag: v0.8.0 5 | pullPolicy: Always 6 | port: 9643 7 | # If you change this URL, you MUST also change the aciServiceDiscoveryURLs.sd.url value to match 8 | url: aci-exporter-svc 9 | httpclient: 10 | insecurehttps: true 11 | keepalive: 120 12 | timeout: 30 13 | pagesize: 1000 14 | fabrics: {} 15 | 16 | # Helm can't use variables in a values file, so naming the service 17 | # dynamically (i.e. including the release name) will break the prometheus sub-chart. 18 | # The easiest solution (and the only one I found that does not involve patching the prom sub-chart) 19 | # is to use anchors so we can define the http_service_discovery URL needed by prometheus 20 | # and then pass it in the prometheus.serverFiles.prometheus.yml.scrape_configs.http_sd_configs.url parameter. 21 | # Sadly, anchors don't support string concatenation, so we have to ensure the aci-sd-url matches the config.port and config.url. 22 | # You need to change this ONLY if you want to deploy this chart multiple times in the same namespace, to avoid 23 | # having duplicate service names in the same NS. 
24 | # Note: The url is then expanded in the config map as {{ $v.url }}.{{ $.Release.Namespace }}.svc:{{ $.Values.aci_exporter.port }}/sd 25 | # so do not include ports or paths here. 26 | aciServiceDiscoveryURLs: 27 | sd: 28 | url: http://aci-exporter-svc 29 | apic_polling: 5m 30 | apic_scrape_timeout: 4m 31 | switch_polling: 1m 32 | switch_scrape_timeout: 30s 33 | prefix: aci_ 34 | 35 | prometheus: 36 | configmapReload: 37 | prometheus: 38 | extraConfigmapMounts: 39 | - name: prometheus-alerts 40 | mountPath: /etc/alerts.d 41 | configMap: prometheus-alerts 42 | readOnly: true 43 | extraVolumeDirs: 44 | - /etc/alerts.d 45 | # We are deploying a dedicated Prometheus instance for the aci-exporter 46 | # So we don't care about collecting metrics from the K8s cluster itself 47 | kube-state-metrics: 48 | enabled: false 49 | prometheus-node-exporter: 50 | enabled: false 51 | prometheus-pushgateway: 52 | enabled: false 53 | server: 54 | configMapOverrideName: prometheus-config 55 | extraConfigmapMounts: 56 | - name: prometheus-alerts 57 | mountPath: /etc/alerts.d 58 | configMap: prometheus-alerts 59 | readOnly: true 60 | prefixURL: "" 61 | extraArgs: 62 | # Is a bit of an odd syntax but this is how is ENABLED 63 | web.enable-remote-write-receiver: null 64 | 65 | service: 66 | servicePort: 80 67 | alertmanager: 68 | enabled: true 69 | service: 70 | port: 9093 71 | templates: 72 | alertmanager-webex-template.tmpl: |- 73 | {{/* Title of the Webex alert */}} 74 | {{ define "webex.default.message" -}} 75 | [{{ .Status | toUpper -}} 76 | {{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{- end -}} 77 | ] {{ .CommonLabels.alertname }} 78 | {{ template "Webex.text" . 
}} 79 | {{- end }} 80 | 81 | {{/* Severity of the alert */}} 82 | {{ define "__alert_severity" -}} 83 | {{- if eq .CommonLabels.severity "critical" -}} 84 | *Severity:* `Critical` 85 | {{- else if eq .CommonLabels.severity "warning" -}} 86 | *Severity:* `Warning` 87 | {{- else if eq .CommonLabels.severity "info" -}} 88 | *Severity:* `Info` 89 | {{- else -}} 90 | *Severity:* `undefined` {{ .CommonLabels.severity }} 91 | {{- end }} 92 | {{- end }} 93 | 94 | {{/* The text to display in the alert */}} 95 | {{ define "Webex.text" -}} 96 | 97 | {{ template "__alert_severity" . }} 98 | {{- if (index .Alerts 0).Annotations.summary }} 99 | {{- "\n" -}} 100 | *Summary:* {{ (index .Alerts 0).Annotations.summary }} 101 | {{- end }} 102 | 103 | {{ range .Alerts }} 104 | 105 | {{- if .Annotations.description }} 106 | {{- "\n" -}} 107 | {{ .Annotations.description }} 108 | {{- "\n" -}} 109 | {{- end }} 110 | {{- if .Annotations.message }} 111 | {{- "\n" -}} 112 | {{ .Annotations.message }} 113 | {{- "\n" -}} 114 | {{- end }} 115 | 116 | {{- end }} 117 | {{- end }} 118 | 119 | grafana: 120 | enabled: true 121 | defaultDashboardsEnabled: false 122 | sidecar: 123 | # This will load datasources that are defined as ConfigMaps with the label grafana_datasource 124 | datasources: 125 | name: Prometheus 126 | uid: prometheus 127 | enabled: true 128 | defaultDatasourceEnabled: true 129 | isDefaultDatasource: true 130 | label: grafana_datasource 131 | labelValue: "aci-monitoring-stack" 132 | alertmanager: 133 | enabled: true 134 | name: Alertmanager 135 | uid: alertmanager 136 | handleGrafanaManagedAlerts: false 137 | implementation: prometheus 138 | dashboards: 139 | enabled: true 140 | label: grafana_dashboard 141 | labelValue: "aci-monitoring-stack" 142 | # This is needed to place the dashboard inside the folder specified 143 | # in the `k8s-sidecar-target-directory` annotation in the ConfigMap 144 | provider: 145 | foldersFromFilesStructure: true 146 | allowUiUpdates: true 147 | alerts: 
148 | enabled: false 149 | rbac: 150 | create: false 151 | promtail: 152 | enabled: true 153 | config: 154 | clients: 155 | - url: http://loki-gateway/loki/api/v1/push 156 | snippets: 157 | scrapeConfigs: | 158 | {{- range $k, $v := .Values.extraPorts }} 159 | - job_name: {{ $v.name }} 160 | syslog: 161 | listen_address: 0.0.0.0:{{ $v.containerPort }} 162 | labels: 163 | job: aci-monitoring-stack 164 | fabric: {{ $v.name }} 165 | relabel_configs: 166 | - source_labels: 167 | - __syslog_message_hostname 168 | target_label: switch 169 | pipeline_stages: 170 | # ACI log levels don't map to the hard-coded Grafana ones, so to get the coloring right I map the ACI levels to the Grafana ones. 171 | - regex: 172 | expression: '\[(?P<level>critical|major|minor|warning|cleared|info)\]' 173 | - replace: 174 | expression: '^(major)$' 175 | source: level 176 | replace: 'error' 177 | - replace: 178 | expression: '^(minor)$' 179 | source: level 180 | replace: 'warning' 181 | - replace: 182 | expression: '^(cleared)$' 183 | source: level 184 | replace: 'info' 185 | - labels: 186 | level: 187 | {{- end }} 188 | daemonset: 189 | enabled: false 190 | deployment: 191 | enabled: true 192 | replicaCount: 1 193 | # Only used to ingest external logs from ACI, so there is no need to mount any of the local volumes 194 | defaultVolumes: [] 195 | defaultVolumeMounts: [] 196 | podSecurityPolicy: 197 | privileged: false 198 | allowPrivilegeEscalation: false 199 | volumes: 200 | - 'secret' 201 | - 'downwardAPI' 202 | syslog: 203 | enabled: false 204 | image: 205 | repository: lscr.io/linuxserver/syslog-ng 206 | tag: 4.8.1 207 | pullPolicy: IfNotPresent 208 | 209 | loki: 210 | enabled: true 211 | cephBucket: 212 | enabled: false 213 | bucketName: loki 214 | storageClassName: ocs-storagecluster-ceph-rgw 215 | fullnameOverride: "loki" 216 | monitoring: 217 | dashboards: 218 | enabled: false 219 | rules: 220 | enabled: false 221 | serviceMonitor: 222 | enabled: false 223 | selfMonitoring: 224 | enabled: false 225 | 
grafanaAgent: 226 | installOperator: false 227 | lokiCanary: 228 | enabled: false 229 | lokiCanary: 230 | enabled: false 231 | test: 232 | enabled: false 233 | loki: 234 | enabled: true 235 | auth_enabled: false 236 | gateway: 237 | service: 238 | port: 80 239 | datasource: 240 | jsonData: "{}" 241 | uid: "" 242 | commonConfig: 243 | replication_factor: 1 244 | limits_config: 245 | discover_log_levels: false 246 | discover_service_name: [] 247 | schemaConfig: 248 | configs: 249 | - from: 2024-04-01 250 | store: tsdb 251 | object_store: s3 252 | schema: v13 253 | index: 254 | prefix: loki_index_ 255 | period: 24h 256 | ingester: 257 | chunk_encoding: snappy 258 | tracing: 259 | enabled: false 260 | querier: 261 | # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing 262 | max_concurrent: 4 263 | sidecar: 264 | rules: 265 | enabled: true 266 | label: loki_rule 267 | labelValue: "" 268 | rulerConfig: 269 | wal: 270 | dir: /rules/ruler-wal 271 | storage: 272 | type: local 273 | local: 274 | directory: /rules 275 | rule_path: /rules/fake 276 | remote_write: 277 | enabled: true 278 | clients: 279 | fake: 280 | url: http://aci-mon-stack-prometheus-server.{{ .Release.Namespace }}.svc:80/api/v1/write 281 | alertmanager_url: http://aci-mon-stack-alertmanager.{{ .Release.Namespace }}.svc:9093 282 | ring: 283 | kvstore: 284 | store: inmemory 285 | enable_alertmanager_v2: true 286 | deploymentMode: SimpleScalable 287 | backend: 288 | replicas: 3 289 | persistence: 290 | size: 2Gi 291 | read: 292 | replicas: 3 293 | write: 294 | replicas: 3 295 | persistence: 296 | size: 2Gi 297 | # Enable minio for storage 298 | minio: 299 | enabled: true 300 | singleBinary: 301 | replicas: 0 302 | chunksCache: 303 | allocatedMemory: 1024 304 | ingester: 305 | replicas: 0 306 | querier: 307 | replicas: 0 308 | queryFrontend: 309 | replicas: 0 310 | queryScheduler: 311 | replicas: 0 312 | distributor: 313 | replicas: 0 314 | compactor: 315 | replicas: 0 316 | 
indexGateway: 317 | replicas: 0 318 | bloomCompactor: 319 | replicas: 0 320 | bloomGateway: 321 | replicas: 0 322 | 323 | backup2graph: 324 | enabled: true 325 | # How often to run the backup2graph job this is a cron schedule expressions by default is every 15min 326 | schedule: "*/15 * * * *" 327 | nodeSelector: {} 328 | image: 329 | repository: quay.io/datacenter/backup2graph 330 | tag: v0.1.4 331 | pullPolicy: Always 332 | memgraph: 333 | enabled: true 334 | boltPort: 7687 335 | container: 336 | terminationGracePeriodSeconds: 60 337 | sysctlInitContainer: 338 | enabled: false 339 | persistentVolumeClaim: 340 | createUserClaim: true 341 | userStorageAccessMode: "ReadWriteMany" 342 | # DO NOT CHANGE: backup2graph is hardcoded to use this path 343 | userMountPath: /app/fabrics/ -------------------------------------------------------------------------------- /docs/4-fabric-example.yaml: -------------------------------------------------------------------------------- 1 | aci_exporter: 2 | # Profiles for different fabrics 3 | fabrics: 4 | fab1: 5 | username: 6 | password: 7 | apic: 8 | - https://IP1 9 | service_discovery: inbMgmtAddr 10 | fab2: 11 | username: 12 | password: 13 | apic: 14 | - https://IP1 15 | service_discovery: inbMgmtAddr 16 | nsd-backbone: 17 | username: 18 | password: 19 | apic: 20 | - https://IP1 21 | - https://IP2 22 | - https://IP3 23 | service_discovery: oobMgmtAddr 24 | steve-uk: 25 | username: 26 | password: 27 | apic: 28 | - https://IP1 29 | service_discovery: oobMgmtAddr 30 | 31 | prometheus: 32 | server: 33 | ingress: 34 | enabled: true 35 | ingressClassName: "traefik" 36 | hosts: 37 | - aci-exporter-prom.apps.c1.cam.ciscolabs.com 38 | baseURL: "http://aci-exporter-prom.apps.c1.cam.ciscolabs.com" 39 | service: 40 | retentionSize: 5GB 41 | persistentVolume: 42 | accessModes: ["ReadWriteOnce"] 43 | size: 5Gi 44 | 45 | alertmanager: 46 | baseURL: "http://aci-exporter-alertmanager.apps.c1.cam.ciscolabs.com" 47 | ingress: 48 | enabled: true 49 | 
ingressClassName: "traefik" 50 | hosts: 51 | - host: aci-exporter-alertmanager.apps.c1.cam.ciscolabs.com 52 | paths: 53 | - path: / 54 | pathType: ImplementationSpecific 55 | config: 56 | route: 57 | group_by: ['alertname'] 58 | group_interval: 30s 59 | repeat_interval: 30s 60 | group_wait: 30s 61 | receiver: 'webex' 62 | receivers: 63 | - name: webex 64 | webex_configs: 65 | - send_resolved: false 66 | api_url: "https://webexapis.com/v1/messages" 67 | room_id: "" 68 | http_config: 69 | authorization: 70 | credentials: "" 71 | grafana: 72 | enable: true 73 | grafana.ini: 74 | users: 75 | viewers_can_edit: "True" 76 | plugins: 77 | allow_loading_unsigned_plugins: "kniepdennis-neo4j-datasource" 78 | plugins: 79 | - http:///kniepdennis-neo4j-datasource-2.0.0.zip;kniepdennis-neo4j-datasource 80 | adminPassword: 81 | deploymentStrategy: 82 | type: Recreate 83 | ingress: 84 | ingressClassName: "traefik" 85 | enabled: true 86 | hosts: 87 | - aci-exporter-grafana.apps.c1.cam.ciscolabs.com 88 | persistence: 89 | enabled: true 90 | size: 2Gi 91 | 92 | loki: 93 | loki: 94 | rulerConfig: 95 | external_url: http://aci-exporter-grafana.apps.c1.cam.ciscolabs.com 96 | 97 | backend: 98 | replicas: 3 99 | persistence: 100 | enableStatefulSetAutoDeletePVC: true 101 | size: 2Gi 102 | read: 103 | replicas: 3 104 | write: 105 | replicas: 3 106 | persistence: 107 | enableStatefulSetAutoDeletePVC: true 108 | size: 2Gi 109 | 110 | syslog: 111 | enabled: true 112 | services: 113 | nsd-backbone: 114 | name: nsd-backbone 115 | containerPort: 1516 116 | protocol: UDP 117 | service: 118 | type: LoadBalancer 119 | port: 1516 120 | 121 | promtail: 122 | extraPorts: 123 | Fab1: 124 | name: fab1 125 | containerPort: 1513 126 | protocol: TCP 127 | service: 128 | type: LoadBalancer 129 | port: 1514 130 | Fab2: 131 | name: fab2 132 | containerPort: 1514 133 | protocol: TCP 134 | service: 135 | type: LoadBalancer 136 | port: 1514 137 | Steve-UK: 138 | name: steve-uk 139 | containerPort: 1515 140 | 
protocol: TCP 141 | service: 142 | type: LoadBalancer 143 | port: 1514 144 | nsd-backbone: 145 | name: nsd-backbone 146 | containerPort: 1516 147 | protocol: TCP 148 | service: 149 | type: ClusterIP 150 | memgraph: 151 | storageClass: 152 | name: memgraph 153 | provisioner: "driver.longhorn.io" 154 | backup2graph: 155 | enabled: true -------------------------------------------------------------------------------- /docs/LABDCN-1038/README.md: -------------------------------------------------------------------------------- 1 | # LABDCN-1038: Open Source Monitoring for Cisco ACI 2 | 3 | This section contains specific instructions on how to run the *LABDCN-1038* Walk In Lab for Cisco Live San Diego 2025. 4 | 5 | ## Task 1 - Getting Familiar with the ACI Monitoring Stack 6 | 7 | If this is your first time learning about the ACI monitoring stack you should start with the [Overview](overview.md), which describes the Stack Architecture. 8 | You do not need to dive deep into the details, unless you want to, but it is good to have a general understanding of the components used in the Stack. 9 | 10 | Next, head over to the [Demo Environment](../demo-environment.md) documentation. As you read it, explore the dashboards that are available in the Demo Environment. 11 | 12 | ## Task 2 - Create a Dashboard 13 | 14 | [Lab1](../labs/lab1.md): In this lab we are going to rebuild the ACI Fault Dashboard. 15 | 16 | ## Task 3 - Explore The Logs 17 | 18 | [Lab2](../labs/lab2.md): In this lab we are going to use `Explore` to visualize the logs received from our ACI fabrics. 19 | 20 | ## Task 4 - Explore the ACI Configs 21 | 22 | The ACI Monitoring Stack introduced a new feature in its latest release that automatically generates a Config Snapshot every 15 minutes (by default) and loads it into a Graph Database. This allows the user to query the ACI config directly from Grafana. 
23 | 24 | [Lab3](../labs/lab3.md) -------------------------------------------------------------------------------- /docs/LABDCN-1038/overview.md: -------------------------------------------------------------------------------- 1 | aci-monitoring-stack - Open Source Monitoring for Cisco ACI 2 | ------------ 3 | 4 | # Overview 5 | 6 | Harness the power of open source to efficiently monitor your Cisco ACI environment with the ACI-Monitoring-Stack. This lightweight, yet robust, monitoring solution combines top-tier open source tools, each contributing unique capabilities to ensure comprehensive visibility into your ACI infrastructure. 7 | 8 | The ACI-Monitoring-Stack integrates the following key components: 9 | 10 | - [Grafana](https://grafana.com/oss/grafana/): The leading open-source analytics and visualization platform. Grafana allows you to create dynamic dashboards that provide real-time insights into your network's performance, health, and metrics. With its user-friendly interface, you can easily visualize and correlate data across your ACI fabric, enabling quicker diagnostics and informed decision-making. 11 | 12 | - [Prometheus](https://prometheus.io/): A powerful open-source monitoring and alerting toolkit. Prometheus excels in collecting and storing metrics in a time-series database, allowing for flexible queries and real-time alerting. Its seamless integration with Grafana ensures that your monitoring stack provides a detailed and up-to-date view of your ACI environment. 13 | 14 | - [Loki](https://grafana.com/oss/loki/): Designed for efficiently aggregating and querying logs from your entire ACI ecosystem. Loki complements Prometheus by focusing on log aggregation, providing a unified stack for metrics and logs. Its integration with Grafana enables you to correlate log data with metrics and create a holistic monitoring experience. 
15 | 16 | - [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/): The agent responsible for gathering logs and shipping them to the Loki server. 17 | 18 | - [Syslog-ng](https://github.com/syslog-ng/syslog-ng): An open-source implementation of the Syslog protocol. Its role in this stack is to translate syslog messages from RFC 3164 to RFC 5424. This is needed because Promtail only supports Syslog RFC 5424 over TCP, and ACI can send that format only from version 6.1 onwards. 19 | 20 | - [aci-exporter](https://github.com/opsdis/aci-exporter): A Prometheus exporter that serves as the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem. The aci-exporter translates ACI-specific metrics into a format that Prometheus can ingest, ensuring that all crucial data points are captured and monitored effectively. 21 | 22 | - [backup2graph](apps/backup2graph/README.md): Converts an ACI Backup into a Graph Database. 23 | 24 | - [Memgraph](https://github.com/memgraph/memgraph): An open source graph database implemented in C/C++ that leverages an in-memory first architecture. It is used in the ACI-Monitoring-Stack to explore the ACI configurations imported by backup2graph. 25 | 26 | - Pre-configured ACI data collection queries, alerts, and dashboards (Work In Progress): The ACI-Monitoring-Stack provides a solid foundation for monitoring an ACI fabric with its pre-defined queries, dashboards, and alerts. While these tools are crafted based on best practices to offer immediate insights into network performance, they are not exhaustive. The strength of the ACI-Monitoring-Stack lies in its community-driven approach. Users are invited to contribute their expertise by providing feedback, sharing custom solutions, and helping enhance the stack. Your input helps to refine and expand the stack's capabilities, ensuring it remains a relevant and powerful tool for network monitoring. 
27 | 28 | # Your Stack 29 | 30 | To gain a comprehensive understanding of the ACI Monitoring Stack and its components it is helpful to break down the stack into separate functions. Each function focuses on a different aspect of monitoring the Cisco Application Centric Infrastructure (ACI) environment. 31 | 32 | ## Fabric Discovery: 33 | 34 | The ACI monitoring stack uses Prometheus Service Discovery (HTTP SD) to dynamically discover and scrape targets by periodically querying a specified HTTP endpoint for a list of target configurations in JSON format. 35 | 36 | The ACI Monitoring Stack needs only the IP addresses of the APICs, the Switches will be Auto Discovered. If switches are added or removed from the fabric no action is required from the end user. 37 | 38 | ```mermaid 39 | flowchart-elk RL 40 | P[("Prometheus")] 41 | A["aci-exporter"] 42 | APIC["APIC"] 43 | 44 | APIC -- "API Query" --> A 45 | A -- "HTTP SD" --> P 46 | ``` 47 | 48 | ## ACI Object Scraping: 49 | 50 | `Prometheus` scraping is the process by which `Prometheus` periodically collects metrics data by sending HTTP requests to predefined endpoints on monitored targets. The `aci-exporter` translates ACI-specific metrics into a format that `Prometheus` can ingest, ensuring that all crucial data points are captured and monitored effectively. 51 | 52 | ```mermaid 53 | flowchart-elk RL 54 | P[("Prometheus")] 55 | A["aci-exporter"] 56 | subgraph ACI 57 | S["Switches"] 58 | APIC["APIC"] 59 | end 60 | A--"Scraping"-->P 61 | S--"API Queries"-->A 62 | APIC--"API Queries"-->A 63 | ``` 64 | ## Syslog Ingestion: 65 | 66 | The syslog config is composed of 3 components: `promtail`, `loki` and `syslog-ng`. 67 | Prior to ACI 6.1 `syslog-ng` is required between `ACI` and `Promtail` to convert from RFC 3164 to 5424 syslog message format. 
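To make the difference between the two formats concrete, here is a minimal Python sketch of the kind of translation that `syslog-ng` performs in this stack. It is an illustration only and not how `syslog-ng` is implemented: the function name `rfc3164_to_rfc5424`, the hostname `leaf101` and the sample message are made up for the example. Because RFC 3164 timestamps carry neither a year nor a time zone, the sketch assumes the caller supplies the year and that the time is UTC.

```python
import re
from datetime import datetime, timezone

# RFC 3164 layout: <PRI>Mmm dd hh:mm:ss HOSTNAME TAG[PID]: MESSAGE
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: ?"
    r"(?P<msg>.*)$"
)


def rfc3164_to_rfc5424(line: str, year: int) -> str:
    """Rewrite one RFC 3164 line into the RFC 5424 layout (hypothetical helper)."""
    m = RFC3164.match(line)
    if m is None:
        raise ValueError("not an RFC 3164 message")
    # Assumption: the year comes from the caller and the time is treated as UTC.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    stamp = ts.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    procid = m["pid"] or "-"
    # RFC 5424 layout:
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    # MSGID and STRUCTURED-DATA have no RFC 3164 equivalent, so both are "-".
    return f"<{m['pri']}>1 {stamp} {m['host']} {m['tag']} {procid} - - {m['msg']}"


if __name__ == "__main__":
    old = "<34>Oct 11 22:14:15 leaf101 su: 'su root' failed"
    print(rfc3164_to_rfc5424(old, year=2024))
```

The priority value is carried over unchanged. The visible differences are the added version field, the ISO 8601 timestamp, and the explicit placeholders for fields that the older format does not have.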
68 | 69 | ```mermaid 70 | flowchart-elk LR 71 | L["Loki"] 72 | PT["Promtail"] 73 | SL["Syslog-ng"] 74 | PT-->L 75 | SL-->PT 76 | subgraph ACI 77 | S["Switches"] 78 | APIC["APIC"] 79 | end 80 | V{Ver >= 6.1} 81 | S--"Syslog"-->V 82 | APIC--"Syslog"-->V 83 | V -->|Yes| PT 84 | V -->|No| SL 85 | ``` 86 | 87 | ## Config Explorer: 88 | 89 | ACI-Monitoring-Stack will generate a Config Snapshot every 15min (By default) and automatically load it into Memgraph. 90 | Backup2Graph uses ACI API Call to: 91 | - Create a new snapshot policy 92 | - Trigger a snapshot 93 | - Delete the snapshot policy and snapshot (once transferred out of the APIC) 94 | 95 | and then uses `scp` to copy it over for processing. Once the Snapshot is copied the APIC config is cleaned up 96 | 97 | ```mermaid 98 | flowchart-elk RL 99 | U["User"] 100 | G["Grafana"] 101 | A["APIC"] 102 | B2G["Backup2Graph"] 103 | MG["Memgraph"] 104 | A--"Backup"-->B2G 105 | B2G--"Push"-->MG 106 | MG--"Cypher Queries"-->G 107 | G-->U 108 | ``` 109 | 110 | ## Data Visualization 111 | 112 | The Data Visualization is handled by `Grafana`, an open-source analytics and monitoring platform that allows users to visualize, query, and analyze data from various sources through customizable and interactive dashboards. It supports a wide range of data sources, including `Prometheus` and `Loki` enabling users to create real-time visualizations, alerts, and reports to monitor system performance and gain actionable insights. 113 | 114 | ```mermaid 115 | flowchart-elk RL 116 | G["Grafana"] 117 | L["Loki"] 118 | P[("Prometheus")] 119 | U["User"] 120 | 121 | P--"PromQL"-->G 122 | L--"LogQL"-->G 123 | G-->U 124 | ``` 125 | ## Alerting 126 | 127 | `Alertmanager` is a component of the `Prometheus` ecosystem designed to handle alerts generated by `Prometheus`. 
It manages the entire lifecycle of alerts, including deduplication, grouping, silencing, and routing notifications to various communication channels like email, `Webex`, `Slack`, and others, ensuring that alerts are delivered to the right people in a timely and organized manner. 128 | 129 | In the ACI Monitoring Stack both `Prometheus` and `Loki` are configured with alerting rules. 130 | ```mermaid 131 | flowchart-elk LR 132 | L["Loki"] 133 | P["Prometheus"] 134 | AM["Alertmanager"] 135 | N["Notifications (Mail/Webex etc...)"] 136 | L --> AM 137 | P --> AM 138 | AM --> N 139 | ``` 140 | 141 | [Back](README.md) -------------------------------------------------------------------------------- /docs/LABDCN-2620/README.md: -------------------------------------------------------------------------------- 1 | # LABDCN-2620: Open Source Monitoring for Cisco ACI 2 | 3 | This section contains specific instructions on how to run the *LABDCN-2620* Walk In Lab for Cisco Live APJC 2024. 4 | All the tasks except the last one can be run without VPN access to the DMZ. 5 | DMZ credentials will be available from the eXpo portal. 6 | 7 | 8 | ## Task 1 - Getting Familiar with the ACI Monitoring Stack 9 | 10 | If this is your first time learning about the ACI monitoring stack you should start with the [Overview](overview.md), which describes the Stack Architecture. 11 | You do not need to dive deep into the details, unless you want to, but it is good to have a general understanding of the components used in the Stack. 12 | 13 | Next, head over to the [Demo Environment](../demo-environment.md) documentation. As you read it, explore the dashboards that are available in the Demo Environment. 
14 | 15 | ## Task 2 - Create a Dashboard 16 | 17 | [Lab1](../labs/lab1.md): In this lab we are going to rebuild the ACI Fault Dashboard. 18 | 19 | ## Task 3 - Explore The Logs 20 | 21 | [Lab2](../labs/lab2.md): In this lab we are going to use `Explore` to visualize the logs received from our ACI fabrics. 22 | 23 | ## Task 4 - Deploy the Monitoring Stack (Requires VPN Access) 24 | 25 | The ACI Monitoring Stack can be deployed on any Kubernetes cluster. For this lab, I am providing a pre-configured Kubernetes environment where no major configuration will be required. 26 | 27 | Before proceeding with this task I suggest you read the [Deployment](../deployment.md) guide, which details how to set up the ACI Monitoring Stack from scratch. For this task, however, you should follow the [DMZ Deployment](dmz-deploy.md) instructions. -------------------------------------------------------------------------------- /docs/LABDCN-2620/dmz-deploy.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | In this lab we are going to deploy the ACI Monitoring Stack in a DMZ environment. The DMZ environment is already pre-configured with a Kubernetes cluster that provides: 4 | 5 | - An ingress controller to expose services via HTTPS 6 | - Persistent Storage 7 | 8 | Note that in this task we will configure a simplified version and will not enable syslog. 9 | 10 | 11 | ## Connect to the Kubernetes Cluster 12 | 13 | The first step is to SSH into the Linux server where we are going to deploy the stack. 14 | The VPN and SSH details are available in the eXpo portal. 15 | 16 | ## Review the Config 17 | 18 | After you have connected to the Linux server, move to the `aci-mon-stack-values` directory. 
19 | 20 | ```shell 21 | ➜ ~ cd aci-mon-stack-values 22 | ➜ aci-mon-stack-values 23 | ``` 24 | 25 | Depending on your `` inspect the `aci-mon-stack-values-pod-.yaml` file for example `pod1` contains this 26 | 27 | ```yaml 28 | aci_exporter: 29 | # Defines 3 ACI fabric to probe with their credentials 30 | fabrics: 31 | site1: 32 | apic: 33 | - https:// 34 | password: 35 | service_discovery: oobMgmtAddr 36 | username: aci-exporter 37 | site2: 38 | apic: 39 | - https:// 40 | password: 41 | service_discovery: oobMgmtAddr 42 | username: aci-exporter 43 | site3: 44 | apic: 45 | - https:// 46 | password: 47 | service_discovery: oobMgmtAddr 48 | username: aci-exporter 49 | # Enable Grafana 50 | grafana: 51 | # Enable Grafana Ingress controller over the grafana.pod1.apps.minikube.dmz URL 52 | ingress: 53 | enabled: true 54 | hosts: 55 | - grafana.pod1.apps.minikube.dmz 56 | adminPassword: 57 | defaultDashboardsEnabled: false 58 | deploymentStrategy: 59 | type: Recreate 60 | enable: true 61 | # Allocate 200Mi for grafana storage 62 | persistence: 63 | enabled: true 64 | size: 200Mi 65 | service: 66 | enabled: true 67 | type: ClusterIP 68 | prometheus: 69 | # Enable prometheus Ingress controller over the prom.pod1.apps.minikube.dmz URL 70 | server: 71 | ingress: 72 | enabled: true 73 | hosts: 74 | - prom.pod1.apps.minikube.dmz 75 | baseURL: "http://prom.pod1.apps.minikube.dmz" 76 | 77 | # Allocate 200Mi for prometheus storage 78 | persistentVolume: 79 | accessModes: 80 | - ReadWriteOnce 81 | size: 200Mi 82 | service: 83 | retentionSize: 200Mi 84 | alertmanager: 85 | # Allocate 200Mi for alertmanager storage 86 | persistence: 87 | size: 200Mi 88 | baseURL: "http://alertmanager.pod1.apps.minikube.dmz" 89 | 90 | # Enable alertmanager Ingress controller over the prom.pod1.apps.minikube.dmz URL 91 | ingress: 92 | enabled: true 93 | hosts: 94 | - host: alertmanager.pod1.apps.minikube.dmz 95 | paths: 96 | - path: / 97 | pathType: ImplementationSpecific 98 | 99 | # For this lab I am 
not enabling Syslog collection 100 | loki: 101 | enabled: false 102 | promtail: 103 | enabled: false 104 | syslog: 105 | enabled: false 106 | ``` 107 | 108 | To deploy your stack, the Helm repository must first be configured. To do so, execute the following: 109 | 110 | ```shell 111 | helm repo add aci-monitoring-stack https://datacenter.github.io/aci-monitoring-stack 112 | helm repo update 113 | ``` 114 | 115 | If you get a message stating `"aci-monitoring-stack" already exists with the same configuration, skipping` it simply means you are not the first student of the day. 116 | 117 | Next we can deploy the stack with this single line. Please **be careful** to use your own PODID 118 | 119 | ``` 120 | helm -n pod--aci-monitoring-stack upgrade --install --create-namespace pod--aci-monitoring-stack aci-monitoring-stack/aci-monitoring-stack -f aci-mon-stack-values-pod-.yaml 121 | ``` 122 | 123 | Now you can check with kubectl whether your pods are deployed. In the example below ` == 2` 124 | 125 | ``` 126 | 127 | ➜ aci-mon-stack-values kubectl -n pod-2-aci-monitoring-stack get pod 128 | NAME READY STATUS RESTARTS AGE 129 | pod-2-aci-monitoring-stack-aci-exporter-f7dfdc997-bswv2 1/1 Running 0 10m 130 | pod-2-aci-monitoring-stack-alertmanager-0 1/1 Running 0 10m 131 | pod-2-aci-monitoring-stack-grafana-7f766c95cd-khxnc 3/3 Running 0 10m 132 | pod-2-aci-monitoring-stack-prometheus-server-5867fb886-jxs66 2/2 Running 0 10m 133 | ``` 134 | 135 | This should also have created the required `Ingress` routes 136 | 137 | ``` 138 | ➜ aci-mon-stack-values kubectl -n pod-2-aci-monitoring-stack get ingress 139 | NAME CLASS HOSTS ADDRESS PORTS AGE 140 | pod-2-aci-monitoring-stack-alertmanager traefik alertmanager.pod2.apps.minikube.dmz 172.16.0.210 80 11m 141 | pod-2-aci-monitoring-stack-grafana traefik grafana.pod2.apps.minikube.dmz 172.16.0.210 80 11m 142 | pod-2-aci-monitoring-stack-prometheus-server traefik prom.pod2.apps.minikube.dmz 172.16.0.210 80 11m 
143 | ``` 144 | 145 | You should now be able to access the Grafana UI from your browser, you **MUST use HTTPS** as the connections are terminated on a reverse proxy. All the URL will be in the format of 146 | `https://grafana.pod.apps.minikube.dmz` 147 | 148 | 149 | -------------------------------------------------------------------------------- /docs/LABDCN-2620/overview.md: -------------------------------------------------------------------------------- 1 | aci-monitoring-stack - Open Source Monitoring for Cisco ACI 2 | ------------ 3 | 4 | # Overview 5 | 6 | Harness the power of open source to efficiently monitor your Cisco ACI environment with the ACI-Monitoring-Stack. This lightweight, yet robust, monitoring solution combines top-tier open source tools, each contributing unique capabilities to ensure comprehensive visibility into your ACI infrastructure. 7 | 8 | The ACI-Monitoring-Stack integrates the following key components: 9 | 10 | - [Grafana](https://grafana.com/oss/grafana/): The leading open-source analytics and visualization platform. Grafana allows you to create dynamic dashboards that provide real-time insights into your network's performance, health, and metrics. With its user-friendly interface, you can easily visualize and correlate data across your ACI fabric, enabling quicker diagnostics and informed decision-making. 11 | 12 | - [Prometheus](https://prometheus.io/): A powerful open-source monitoring and alerting toolkit. Prometheus excels in collecting and storing metrics in a time-series database, allowing for flexible queries and real-time alerting. Its seamless integration with Grafana ensures that your monitoring stack provides a detailed and up-to-date view of your ACI environment. 13 | 14 | - [Loki](https://grafana.com/oss/loki/): Designed for efficiently aggregating and querying logs from your entire ACI ecosystem. Loki complements Prometheus by focusing on log aggregation, providing a unified stack for metrics and logs. 
Its integration with Grafana enables you to correlate log data with metrics and create a holistic monitoring experience. 15 | 16 | - [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/): The agent responsible for gathering logs and shipping them to the Loki server. 17 | 18 | - [Syslog-ng](https://github.com/syslog-ng/syslog-ng): An open-source implementation of the Syslog protocol. Its role in this stack is to translate syslog messages from RFC 3164 to RFC 5424. This is needed because Promtail only supports Syslog RFC 5424 over TCP, and ACI can send that format only from version 6.1 onwards. 19 | 20 | - [aci-exporter](https://github.com/opsdis/aci-exporter): A Prometheus exporter that serves as the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem. The aci-exporter translates ACI-specific metrics into a format that Prometheus can ingest, ensuring that all crucial data points are captured and monitored effectively. 21 | 22 | - Pre-configured ACI data collection queries, alerts, and dashboards (Work In Progress): The ACI-Monitoring-Stack provides a solid foundation for monitoring an ACI fabric with its pre-defined queries, dashboards, and alerts. While these tools are crafted based on best practices to offer immediate insights into network performance, they are not exhaustive. The strength of the ACI-Monitoring-Stack lies in its community-driven approach. Users are invited to contribute their expertise by providing feedback, sharing custom solutions, and helping enhance the stack. Your input helps to refine and expand the stack's capabilities, ensuring it remains a relevant and powerful tool for network monitoring. 23 | 24 | # Your Stack 25 | 26 | To gain a comprehensive understanding of the ACI Monitoring Stack and its components it is helpful to break down the stack into separate functions. Each function focuses on a different aspect of monitoring the Cisco Application Centric Infrastructure (ACI) environment. 
27 | 28 | ## Fabric Discovery: 29 | 30 | The ACI monitoring stack uses Prometheus Service Discovery (HTTP SD) to dynamically discover and scrape targets by periodically querying a specified HTTP endpoint for a list of target configurations in JSON format. 31 | 32 | The ACI Monitoring Stack needs only the IP addresses of the APICs, the Switches will be Auto Discovered. If switches are added or removed from the fabric no action is required from the end user. 33 | 34 | ```mermaid 35 | flowchart-elk RL 36 | P[("Prometheus")] 37 | A["aci-exporter"] 38 | APIC["APIC"] 39 | 40 | APIC -- "API Query" --> A 41 | A -- "HTTP SD" --> P 42 | ``` 43 | 44 | ## ACI Object Scraping: 45 | 46 | `Prometheus` scraping is the process by which `Prometheus` periodically collects metrics data by sending HTTP requests to predefined endpoints on monitored targets. The `aci-exporter` translates ACI-specific metrics into a format that `Prometheus` can ingest, ensuring that all crucial data points are captured and monitored effectively. 47 | 48 | ```mermaid 49 | flowchart-elk RL 50 | P[("Prometheus")] 51 | A["aci-exporter"] 52 | subgraph ACI 53 | S["Switches"] 54 | APIC["APIC"] 55 | end 56 | A--"Scraping"-->P 57 | S--"API Queries"-->A 58 | APIC--"API Queries"-->A 59 | ``` 60 | ## Syslog Ingestion: 61 | 62 | The syslog config is composed of 3 components: `promtail`, `loki` and `syslog-ng`. 63 | Prior to ACI 6.1 `syslog-ng` is required between `ACI` and `Promtail` to convert from RFC 3164 to 5424 syslog message format. 
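
To make the RFC 3164 to RFC 5424 conversion more concrete, here is a minimal Python sketch of the kind of rewrite `syslog-ng` performs in this stack. It is an illustration only, not the `syslog-ng` implementation: the function name `rfc3164_to_rfc5424`, the simplified regular expression, and the sample message are assumptions made for this example. RFC 3164 timestamps carry no year or time zone, so the sketch takes the year as a parameter and assumes UTC.

```python
import re
from datetime import datetime, timezone

# Simplified RFC 3164 layout (assumption): <PRI>Mmm dd hh:mm:ss HOST TAG[PID]: MSG
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: ?"
    r"(?P<msg>.*)$"
)


def rfc3164_to_rfc5424(line: str, year: int) -> str:
    """Rewrite one RFC 3164 message into the RFC 5424 header layout.

    Hypothetical helper: the year must be supplied by the caller and the
    timestamp is assumed to be UTC, because RFC 3164 carries neither.
    """
    m = RFC3164.match(line)
    if m is None:
        raise ValueError("not a recognizable RFC 3164 message")
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)
    procid = m["pid"] or "-"
    # RFC 5424: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    return (
        f"<{m['pri']}>1 {ts.isoformat()} {m['host']} "
        f"{m['tag']} {procid} - - {m['msg']}"
    )


if __name__ == "__main__":
    sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed"
    print(rfc3164_to_rfc5424(sample, year=2024))
```

The main structural differences are the explicit version field, the full ISO 8601 timestamp, and the fixed positions for process ID, message ID and structured data, which the sketch fills with `-` when the legacy message has nothing to offer.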
64 | 65 | ```mermaid 66 | flowchart-elk LR 67 | L["Loki"] 68 | PT["Promtail"] 69 | SL["Syslog-ng"] 70 | PT-->L 71 | SL-->PT 72 | subgraph ACI 73 | S["Switches"] 74 | APIC["APIC"] 75 | end 76 | V{Ver >= 6.1} 77 | S--"Syslog"-->V 78 | APIC--"Syslog"-->V 79 | V -->|Yes| PT 80 | V -->|No| SL 81 | ``` 82 | 83 | ## Data Visualization 84 | 85 | The Data Visualization is handled by `Grafana`, an open-source analytics and monitoring platform that allows users to visualize, query, and analyze data from various sources through customizable and interactive dashboards. It supports a wide range of data sources, including `Prometheus` and `Loki` enabling users to create real-time visualizations, alerts, and reports to monitor system performance and gain actionable insights. 86 | 87 | ```mermaid 88 | flowchart-elk RL 89 | G["Grafana"] 90 | L["Loki"] 91 | P[("Prometheus")] 92 | U["User"] 93 | 94 | P--"PromQL"-->G 95 | L--"LogQL"-->G 96 | G-->U 97 | ``` 98 | ## Alerting 99 | 100 | `Alertmanager` is a component of the `Prometheus` ecosystem designed to handle alerts generated by `Prometheus`. It manages the entire lifecycle of alerts, including deduplication, grouping, silencing, and routing notifications to various communication channels like email, `Webex`, `Slack`, and others, ensuring that alerts are delivered to the right people in a timely and organized manner. 101 | 102 | In the ACI Monitoring Stack both `Prometheus` and `Loki` are configured with alerting rules. 
103 | ```mermaid 104 | flowchart-elk LR 105 | L["Loki"] 106 | P["Prometheus"] 107 | AM["Alertmanager"] 108 | N["Notifications (Mail/Webex etc...)"] 109 | L --> AM 110 | P --> AM 111 | AM --> N 112 | ``` 113 | 114 | [Back](README.md) -------------------------------------------------------------------------------- /docs/demo-environment.md: -------------------------------------------------------------------------------- 1 | # Access 2 | 3 | The Demo environment is hosted in a DMZ and can be accessed with the following credentials: 4 | 5 | https://64.104.255.11/ 6 | 7 | user: `guest` 8 | password: `guest` 9 | 10 | The guest user is able to modify the dashboards and run `Explore` queries; however, it can't save any of the configuration changes. 11 | 12 | # Exploring the ACI Monitoring Stack 13 | 14 | In this section I am going to guide you through the available pre-built dashboards and how to use them. We will cover two types of dashboards: those based on data provided by Prometheus (faults and alerts) and those based on data provided by Loki (syslog). 15 | 16 | *Note:* Grafana supports building dashboards with data coming from multiple data sources, but for the moment the ACI Monitoring Stack does not make use of this capability. 17 | 18 | All the Dashboards are located in the `ACI` Folder in the `Dashboards` section of the UI: 19 | ![dashboards](images/dashboards.png) 20 | 21 | 22 | The stack is pre-provisioned with the following Dashboards. Feel free to explore the ones that are of interest.
23 | 24 | - [Prometheus backed Dashboards](#prometheus-backed-dashboards) 25 | - [ACI Faults](#aci-faults) 26 | - [EPG Explore](#epg-explore) 27 | - [EPG Stats](#epg-stats) 28 | - [Fabric Capacity](#fabric-capacity) 29 | - [Node Capacity](#node-capacity) 30 | - [Node Details](#node-details) 31 | - [Nodes Interfaces](#nodes-interfaces) 32 | - [Power Usage](#power-usage) 33 | - [Routing Protocols](#routing-protocols) 34 | - [Loki backed Dashboards](#loki-backed-dashboards) 35 | - [Contract Drops Logs](#contract-drops-logs) 36 | - [Config Export Dashboards](#config-export-dashboards) 37 | - [Contract Explorer](#contract-explorer) 38 | - [Fabric Policies - Port Group](#fabric-policies-port-group) 39 | - [Missing Targets](#missing-targets) 40 | - [Vlans](#vlans) 41 | 42 | ## Prometheus backed Dashboards 43 | 44 | These dashboards use `Prometheus` as their data source, meaning the data we are visualizing came from an ACI Managed Object and was translated by the `aci-exporter`. 45 | 46 | ### ACI Faults 47 | This dashboard is a 1:1 copy of the faults that are present inside ACI. The main advantages compared to looking at the faults in the ACI UI are: 48 | - the ability to aggregate Faults from multiple Fabrics in a single table 49 | - advanced sorting and filtering 50 | 51 | 52 | ![faults](images/faults.png) 53 | 54 | By using the `Fabric` drop down menu you can select different Fabrics (or All) and you can use the column headers to filter/sort the data: 55 | 56 | 57 | 58 | 59 | This is a good dashboard to understand how Grafana dashboards are built. If you are interested in building your own dashboard you can take a look [here](labs/lab1.md). 60 | 61 | 62 | 63 | ### EPG Explore 64 | 65 | The EPG Explore is composed of 2 tables: 66 | - EPG To Interface - VLANs: This table allows the user to map an EPG to a VLAN port on a switch.
This table can be filtered by: 67 | - fabric 68 | - tenant 69 | - epg 70 | - V(x)LANs to EPG - Interface: This table allows the user to map a VLAN to an EPG and a port on a switch. This table can be filtered by: 71 | - VLAN 72 | - VXLAN 73 | 74 | *Limitations:* This has not yet been tested with overlapping VLANs 75 | 76 | ### EPG Stats 77 | 78 | This dashboard contains the following time series graphs: 79 | 80 | - EPG RX Gbits/s: This shows the traffic received in the EPG 81 | - EPG TX Gbits/s: This shows the traffic transmitted by the EPG 82 | - EPG Drops RX Pkts/s: This shows the number of packet drops in the ingress direction 83 | - EPG Drops TX Pkts/s: This shows the number of packet drops in the egress direction 84 | 85 | These dashboards are built with the same logic as the ACI EPG Stats dashboards, just in Grafana. 86 | 87 | ### Fabric Capacity 88 | 89 | This dashboard contains the same info as the APIC Fabric Capacity dashboard but allows you to plot the resource usage over a time period to better monitor the fabric utilization over time. 90 | 91 | ### Node Capacity 92 | 93 | This dashboard contains the same info as the APIC Node Capacity dashboard but allows you to plot the resource usage over a time period to better monitor the fabric utilization over time. 94 | 95 | ### Node Details 96 | 97 | This dashboard contains the following time series graphs: 98 | 99 | - Node CPU Usage 100 | - Node Memory Usage 101 | - Node Health 102 | 103 | ### Nodes Interfaces 104 | 105 | This dashboard contains the following graphs: 106 | 107 | - Node Interface status: This graph shows which interfaces are Up/Down 108 | - Interface RX/TX Usage: This graph shows the interface utilization in %. It is sorted by highest usage and displays the top 10 interfaces by usage.
109 | 110 | ### Power Usage 111 | 112 | This dashboard displays a time series graph of the average power draw per switch. 113 | 114 | ### Routing Protocols 115 | 116 | This dashboard contains the following graphs: 117 | 118 | - L3 Neighbours: For every BGP or OSPF neighbor we display the Node to Peer IP peering, the routing protocol used, the state of the connection, etc. 119 | - BGP Advertised/Received Paths: For every BGP peering we display the number of paths received/advertised 120 | - BGP Accepted Paths: Time series graph of **received** BGP prefixes 121 | 122 | ## Loki backed Dashboards 123 | 124 | These dashboards use `Loki` as their data source, meaning the data we are visualizing came from an ACI Syslog Message. 125 | 126 | ### Contract Drops Logs 127 | 128 | This dashboard parses the logs received from the switches and extracts information on the Contract Drop Logs. This requires a specific [config](syslog.md) on ACI and is limited to 500 Messages/s per switch. 129 | 130 | ## Config Export Dashboards 131 | These dashboards are based on data extracted from an ACI Config Snapshot and converted into a Graph Database. 132 | 133 | ### Contract Explorer 134 | 135 | This dashboard allows the user to select a contract and displays how the contract is deployed and which EPGs/ESGs are providing or consuming it. 136 | 137 | 138 | 139 | ### Fabric Policies - Port Group 140 | 141 | This dashboard displays detailed information about a port group allowing the user to understand the mappings of: 142 | 143 | - VLANs 144 | - Domains 145 | - AAEP 146 | - Leaves and ports 147 | 148 | 149 | 150 | ### Missing Targets 151 | 152 | Detects and shows missing targets. This is still a work in progress and should be improved. 153 | ![alt text](images/missing-targets.png) 154 | 155 | ### Vlans 156 | 157 | Displays the APIC config for VLAN Pools and VMM Custom Trunk Ports in filterable tables.
158 | ![alt text](images/vlans.png) 159 | 160 | [Next - Lab1](labs/lab1.md) -------------------------------------------------------------------------------- /docs/example-openshift.yaml: -------------------------------------------------------------------------------- 1 | global: 2 | serviceAccountName: &priviledgedServiceAccountName "aci-mon-stack-priv-scc" 3 | 4 | aci_exporter: 5 | aciServiceDiscoveryURLs: 6 | sd: 7 | apic_polling: 5m 8 | apic_scrape_timeout: 4m 9 | switch_polling: 1m 10 | switch_scrape_timeout: 30s 11 | url: http://aci-exporter-svc 12 | fabrics: 13 | fab1: 14 | username: 15 | password: 16 | apic: 17 | - https://IP1 18 | service_discovery: inbMgmtAddr 19 | backup2graph: 20 | enabled: true 21 | grafana: 22 | serviceAccount: 23 | create: false 24 | name: *priviledgedServiceAccountName 25 | adminPassword: aci-monitoring 26 | deploymentStrategy: 27 | type: Recreate 28 | enable: true 29 | env: 30 | http_proxy: http://proxy 31 | https_proxy: http://proxy 32 | no_proxy: .cam.ciscolabs.com,.cluster.local,.svc,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16 33 | grafana.ini: 34 | plugins: 35 | allow_loading_unsigned_plugins: kniepdennis-neo4j-datasource 36 | users: 37 | viewers_can_edit: "True" 38 | ingress: 39 | enabled: true 40 | hosts: 41 | - aci-mon-stack-grafana.apps.ocp-sr-iov.cam.ciscolabs.com 42 | persistence: 43 | enabled: true 44 | size: 2Gi 45 | plugins: 46 | - http:///kniepdennis-neo4j-datasource-2.0.0.zip;kniepdennis-neo4j-datasource 47 | - volkovlabs-form-panel 48 | loki: 49 | cephBucket: 50 | enabled: true 51 | bucketName: &bucketName loki-bucket 52 | storageClassName: ocs-storagecluster-ceph-rgw 53 | endpoint: &bucketEndpoint rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc:443 54 | serviceAccount: 55 | create: false 56 | name: *priviledgedServiceAccountName 57 | global: 58 | dnsService: "dns-default" 59 | dnsNamespace: "openshift-dns" 60 | loki: 61 | rulerConfig: 62 | external_url: 
http://aci-mon-stack-grafana.apps.ocp-sr-iov.cam.ciscolabs.com 63 | storage: 64 | bucketNames: 65 | admin: *bucketName 66 | chunks: *bucketName 67 | ruler: *bucketName 68 | object_store: null 69 | s3: 70 | endpoint: *bucketEndpoint 71 | insecure: true 72 | type: s3 73 | use_thanos_objstore: true 74 | # Yes this needs to be repeated twice... 75 | storage_config: 76 | aws: 77 | bucketnames: *bucketName 78 | endpoint: *bucketEndpoint 79 | insecure: false 80 | http_config: 81 | insecure_skip_verify: true 82 | s3forcepathstyle: true 83 | backend: 84 | extraEnvFrom: 85 | - secretRef: 86 | name: *bucketName 87 | write: 88 | extraEnvFrom: 89 | - secretRef: 90 | name: *bucketName 91 | read: 92 | extraEnvFrom: 93 | - secretRef: 94 | name: *bucketName 95 | # I use CephFS for the storage, so I don't need to set this 96 | minio: 97 | enabled: false 98 | memgraph: 99 | serviceAccount: 100 | create: false 101 | name: *priviledgedServiceAccountName 102 | container: 103 | terminationGracePeriodSeconds: 60 104 | enabled: true 105 | persistentVolumeClaim: 106 | userStorageClassName: ocs-storagecluster-cephfs 107 | prometheus: 108 | serviceAccounts: 109 | server: 110 | create: false 111 | name: *priviledgedServiceAccountName 112 | alertmanager: 113 | serviceAccount: 114 | create: false 115 | name: *priviledgedServiceAccountName 116 | baseURL: http://aci-mon-stack-alertmanager.apps.ocp-sr-iov.cam.ciscolabs.com 117 | config: 118 | receivers: 119 | - name: webex 120 | webex_configs: 121 | - api_url: https://webexapis.com/v1/messages 122 | http_config: 123 | authorization: 124 | credentials: 125 | no_proxy: .cam.ciscolabs.com,.cluster.local,.svc,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16 126 | proxy_url: http://proxy-wsa.esl.cisco.com:80 127 | room_id: 128 | send_resolved: false 129 | route: 130 | group_by: 131 | - alertname 132 | group_interval: 30s 133 | group_wait: 30s 134 | receiver: webex 135 | repeat_interval: 30s 136 | ingress: 137 | enabled: true 138 | hosts: 139 | - host: 
aci-mon-stack-alertmanager.apps.ocp-sr-iov.cam.ciscolabs.com 140 | paths: 141 | - path: / 142 | pathType: ImplementationSpecific 143 | server: 144 | baseURL: http://aci-exporter-prom.apps.ocp-sr-iov.cam.ciscolabs.com 145 | ingress: 146 | enabled: true 147 | hosts: 148 | - aci-exporter-prom.apps.ocp-sr-iov.cam.ciscolabs.com 149 | persistentVolume: 150 | accessModes: 151 | - ReadWriteOnce 152 | size: 5Gi 153 | service: 154 | retentionSize: 5GB 155 | promtail: 156 | serviceAccount: 157 | create: false 158 | name: *priviledgedServiceAccountName 159 | extraPorts: 160 | Fab1: 161 | containerPort: 1513 162 | name: fab1 163 | protocol: TCP 164 | service: 165 | port: 1514 166 | type: NodePort 167 | Fab2: 168 | containerPort: 1514 169 | name: fab2 170 | protocol: TCP 171 | service: 172 | port: 1514 173 | type: NodePort 174 | nsd-backbone: 175 | containerPort: 1516 176 | name: nsd-backbone 177 | protocol: TCP 178 | service: 179 | type: ClusterIP 180 | syslog: 181 | enabled: true 182 | services: 183 | nsd-backbone: 184 | containerPort: 1516 185 | name: nsd-backbone 186 | protocol: UDP 187 | service: 188 | port: 1516 189 | type: NodePort -------------------------------------------------------------------------------- /docs/images/column-filter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/column-filter.png -------------------------------------------------------------------------------- /docs/images/contract-explorer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/contract-explorer.png -------------------------------------------------------------------------------- /docs/images/dashboards.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/dashboards.png -------------------------------------------------------------------------------- /docs/images/fabric-filter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/fabric-filter.png -------------------------------------------------------------------------------- /docs/images/faults.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/faults.png -------------------------------------------------------------------------------- /docs/images/missing-targets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/missing-targets.png -------------------------------------------------------------------------------- /docs/images/port-group.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/port-group.png -------------------------------------------------------------------------------- /docs/images/vlans.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/images/vlans.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/EmptyDashboard.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/EmptyDashboard.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/TableView1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/TableView1.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/TimeSeries.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/TimeSeries.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/Visualization.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/Visualization.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/label-filtering-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/label-filtering-1.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/label-filtering-dropdown.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/label-filtering-dropdown.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/multiply.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/multiply.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/oganize.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/oganize.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/queryformat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/queryformat.png -------------------------------------------------------------------------------- /docs/labs/images/lab1/table-wrong-time.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab1/table-wrong-time.png -------------------------------------------------------------------------------- /docs/labs/images/lab2/explore.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/explore.png -------------------------------------------------------------------------------- 
/docs/labs/images/lab2/log-details.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/log-details.png -------------------------------------------------------------------------------- /docs/labs/images/lab2/logs-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/logs-1.png -------------------------------------------------------------------------------- /docs/labs/images/lab2/loki-builder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/loki-builder.png -------------------------------------------------------------------------------- /docs/labs/images/lab2/multi-fabric-logs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/multi-fabric-logs.png -------------------------------------------------------------------------------- /docs/labs/images/lab2/select-loki.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/select-loki.png -------------------------------------------------------------------------------- /docs/labs/images/lab2/ui-filter-result.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/ui-filter-result.png -------------------------------------------------------------------------------- /docs/labs/images/lab2/ui-filter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacenter/aci-monitoring-stack/d86c508e0dd8ece1ecd90bbe3d177496a146ad1c/docs/labs/images/lab2/ui-filter.png -------------------------------------------------------------------------------- /docs/labs/lab1.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | This is a simple lab that builds a minimal dashboard showing data in a Table Format. 4 | 5 | **This lab assumes you are familiar with the [Demo Environment](../demo-environment.md)** 6 | 7 | # Access 8 | 9 | The Demo environment is hosted in a DMZ and can be accessed with the following credentials: 10 | 11 | https://64.104.255.11/ 12 | 13 | user: `guest` 14 | password: `guest` 15 | 16 | The guest user is able to modify the dashboards and run `Explore` queries; however, it can't save any of the configuration changes. 17 | 18 | # Recreate the ACI Faults Dashboard 19 | 20 | This dashboard is a 1:1 copy of the faults that are present inside ACI. The main advantages compared to looking at the faults in the ACI UI are: 21 | - the ability to aggregate Faults from multiple Fabrics in a single table 22 | - advanced sorting and filtering 23 | 24 | ![faults](../images/faults.png) 25 | 26 | By using the `Fabric` drop down menu you can select different Fabrics (or All) and you can use the column headers to filter/sort the data: 27 | 28 | 29 | 30 | 31 | This is a good dashboard to understand how Grafana dashboards are built, so let's rebuild the `Fault By Last Transition` table.
32 | 33 | **Note:** In this example we are focusing on the Grafana dashboard; *someone* already configured the `aci-exporter` and `Prometheus` to populate the `aci_faults` metric with data. If you want to learn how to configure the `aci-exporter` and `Prometheus` to work together you can check out the [development](../development.md) guide. 34 | 35 | ## Dashboard Editing 36 | 37 | *Warning:* Since this is an environment open to the internet I have not allowed users to save any config changes, so DO NOT close or reload the browser or you will lose your work! 38 | 39 | - Select `Dashboards` --> `Tests` --> `Dashboard Test 1` --> Move your mouse over the empty dashboard --> press `e` on the keyboard. This should open up the editing mode. 40 | - Ensure that you have: 41 | - `Prometheus` selected as Data Source 42 | - Selected the `Builder` mode. This is a good way to learn, but we will also look at the code afterwards. 43 | 44 | ![alt text](images/lab1/EmptyDashboard.png) 45 | 46 | - From the Metric Drop Down menu select `aci_faults` and click `Run Query`. This will display a graph. In the legend you can see that each metric carries information about the Fabric, cause, description, etc. The metric value itself (the 1.7Bil) is the Unix timestamp of the last transition time of the fault. 47 | 48 | ![alt text](images/lab1/TimeSeries.png) 49 | However, this is not a very good visualization for this type of data. We can see interesting data in the legend, but a time series is really not the right visualization as we are interested in a list of faults, aka a table!
50 | 51 | To switch to a `Table` view we need two steps: 52 | - Select the `Table Format` for our query: Go to `Options` --> `Format` --> Select `Table` 53 | 54 | - Select the `Table` from the Visualization drop down Menu by clicking on `Time Series` and then picking `Table` (take a moment to see how many options there are here) 55 | 56 | 57 | 58 | - With just these two simple changes the data should look much better already, however: 59 | - The `Time` and `created` columns are not the last transition time for the Fault but when the fault was first received in `Prometheus`, and for our use case this is useless. 60 | - The table contains a few "useless" columns that would be nice to hide 61 | - The `Value` (last column) that represents the last transition time for our `Fault` is a long number, not a date 62 | 63 | To solve all these issues we need to manipulate our data. In this example we are going to use these 3 Grafana transformations: 64 | 65 | - `Organize fields by name`: This will allow us to rename, re-order and hide the table columns 66 | - `Convert field type`: This will allow us to convert the `Value` from a Unix Time Stamp to an actual human readable date 67 | - `Sort by`: To sort our Events by Last Transition time i.e.
`Value` 68 | - click on the `Transform Data` tab and select `Add Transformation` 69 | 70 | ## Organize fields by name: 71 | 72 | Select `Transform Data` --> `Add Transformation` --> `Organize fields by name` 73 | ![alt text](images/lab1/oganize.png) 74 | Here you can 75 | - Change the ordering of the fields, by dragging them by the vertical dots on the left 76 | - Hide them, by clicking on the `eye` symbol 77 | - Rename them by adding text in the empty box on the right of the field name 78 | 79 | You are free to sort things as you please but I would recommend to at least: 80 | - Hide: 81 | - Time 82 | - aci 83 | - created 84 | - instance 85 | - job 86 | - Rename: 87 | - `Value` to `Last Transition` 88 | - Place `Last Transition` as first item in the table 89 | 90 | ## Convert field type: 91 | 92 | Select `Add another transformation` --> `Convert field type` 93 | - Field: `Last Transition` 94 | - Type: `Time` 95 | 96 | If you have placed the `Last Transition` as the first column you should now see dates, but you will probably also notice that they are not quite right as they show 1970. 97 | This is due to the fact that the epoch is expected in milliseconds since 1970 but what we are getting is just seconds. We will fix this later; for now ignore it. 98 | 99 | ## Sort by: 100 | 101 | Select `Add another transformation` --> `Sort By` 102 | - Field: `Last Transition` 103 | - Reverse: Enabled 104 | 105 | Depending on how you have organized the fields, your table should look something like this: 106 | 107 | ![alt text](images/lab1/table-wrong-time.png) 108 | 109 | ## Fix the `Last Transition` timestamp: 110 | 111 | All we have to do is multiply the `Last Transition` (aka the Value of our Metric) by 1000 112 | 113 | - Click on `Query` --> `+ Operations` 114 | - In the `Search` tab enter `multiply` and select `Multiply by scalar` 115 | ![alt text](images/lab1/multiply.png) 116 | - Set the `Value` to 1000 117 | - Click `Run Queries` 118 | 119 | Now the time should be reflected correctly.
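
If you want to convince yourself why the dates showed 1970 before the multiplication, the following Python sketch reproduces the arithmetic outside Grafana. It is only an illustration: the helper name `from_epoch_ms` and the sample value `1_700_000_000` (a round number in the "1.7Bil" range mentioned earlier) are assumptions, not values taken from the demo environment.

```python
from datetime import datetime, timezone


def from_epoch_ms(value: float) -> datetime:
    """Interpret a number as milliseconds since 1970-01-01 UTC,
    which is how the `Time` field type expects it (hypothetical helper)."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


# Hypothetical metric value: a Unix timestamp expressed in seconds.
last_transition_seconds = 1_700_000_000

# Read as milliseconds, the value is only about 19.7 days after the epoch.
wrong = from_epoch_ms(last_transition_seconds)

# Multiplying by 1000 first (what `Multiply by scalar` does) gives the real date.
right = from_epoch_ms(last_transition_seconds * 1000)

if __name__ == "__main__":
    print("seconds read as ms:", wrong.isoformat())
    print("after * 1000      :", right.isoformat())
```

The first result lands in January 1970 and the second in November 2023, which matches what the table shows before and after the fix.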
120 | 121 | ## Switch to Code 122 | 123 | The query Builder is a great tool to learn, but as you start building more complex queries it becomes too cumbersome to use, and some advanced capabilities are not available in it, so it is a good idea to also learn the `PromQL` syntax. Try to click on `Code` and you should see that the same expression can be written as: 124 | 125 | `aci_faults * 1000` 126 | 127 | ## Filter by Fabric 128 | 129 | We will do this step in the Code mode to learn a bit more. `PromQL` supports filtering your queries by labels. This is super easy: just open a `{` after the metric name and you should see a dropdown menu with all our labels! 130 | 131 | 132 | If you want for example to show Faults only from `site1` you can type the following query `aci_faults{fabric="site1"} * 1000` and now only faults from `site1` should appear. If you want to filter by using Regular Expressions (`RegEx`) you can replace `=` with `=~`. We will use this syntax in the next task. 133 | 134 | ## Filter by Fabric with Dashboards Variables 135 | 136 | Variables in Grafana allow you to create dynamic and interactive dashboards by enabling you to define placeholders that can be replaced with different values at runtime. 137 | 138 | If you are still on the dashboard editing pane let's modify our query to look like this: 139 | 140 | `aci_faults{fabric=~"$fabric"} * 1000` 141 | 142 | The `fabric=~"$fabric"` part simply tells `Grafana` to use the variable `$fabric` in this filter, and the `=~` also allows us to treat this filtering expression as a `RegEx` so that we can select 1 or more fabrics at the same time. 143 | 144 | Click apply. This will result in an *empty* dashboard, which is expected since the variable `$fabric` does not exist yet! 145 | 146 | To create the `$fabric` variable to select our `sites` follow these steps: 147 | 148 | - Click on the gear icon (settings in the top right) and select "Variables."
149 | - Click "New variable" 150 | - Select variable type: `Query` 151 | - Name: `fabric` 152 | - Display Name: `Fabric` 153 | - Show on dashboard: `Labels and Values` 154 | - Data source: `Prometheus` 155 | - Query 156 | - Query type: `Labels Values` 157 | - Label: `fabric` **Warning** select the `fabric` label **DO NOT** select `$fabric` 158 | - Selection options: Enable `Multi-Values` and `Include All option` 159 | - Click `Apply` 160 | - Click `Close` 161 | - If the Dashboard is still empty click the refresh button on the Top Right (the two spinning arrows) 162 | 163 | Now your dashboard will have a new drop down menu where you can dynamically select the fabric to display! 164 | 165 | 166 | 167 | # The End 168 | This concludes Lab1. Feel free to play around more with this dashboard if you want, or you can proceed to [Lab2](lab2.md) -------------------------------------------------------------------------------- /docs/labs/lab2.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | In this lab we are going to use `Explore` to visualize the logs received from our ACI fabrics. 4 | 5 | **This lab assumes you are familiar with the [Demo Environment](../demo-environment.md)** 6 | 7 | # Access 8 | 9 | The Demo environment is hosted in a DMZ and can be accessed with the following credentials: 10 | 11 | https://64.104.255.11/ 12 | 13 | user: `guest` 14 | password: `guest` 15 | 16 | The guest user is able to modify the dashboards and run `Explore` queries, however it can't save any of the configuration changes. 17 | 18 | # Log Filtering and Exploring 19 | 20 | This lab follows a free form where I will show you how to select logs from a fabric and how to filter them. Feel free to play around as well, this is a great way to get a bit more familiar. 21 | 22 | ## Access Explore 23 | 24 | You can find `Explore` in the left panel of the Grafana UI.
25 | ![alt text](images/lab2/explore.png) 26 | 27 | By default `Explore` will select `Prometheus` as the data source, as you can see in the picture above. Let's switch to `Loki` by clicking on `Prometheus` and selecting `Loki` 28 | 29 | 30 | 31 | Now the UI should change and show the `Loki Builder` 32 | 33 | ![alt text](images/lab2/loki-builder.png) 34 | 35 | As we saw previously in [lab1](lab1.md) with `Prometheus`, we can use labels to filter/visualize logs. Loki uses labels as well as regular expressions to filter through our logs. 36 | 37 | ## Grab Some Logs! 38 | 39 | To display the logs from a fabric you need to select a Label and a Value: 40 | Go to Label Filters and select the following from the dropdowns: 41 | 42 | - *Select Label*: `fabric` 43 | - *Select value*: `site2` 44 | 45 | This tells Grafana to pull in the logs where the fabric name is `site2` 46 | 47 | **Note:** In this example we are focusing on the Grafana Dashboard, *someone* configured ACI to send the logs to Loki. You can check out the [deployment](../deployment.md#syslog-config) guide if you would like to know more. 48 | 49 | Press "Run Query" and you should be presented with something similar to the picture below: 50 | ![alt text](images/lab2/logs-1.png) 51 | 52 | As you can see above, the logs are already color coded by severity, making it easy to understand if we should take a deeper look into our fabrics. 53 | It is also possible to load the logs from multiple fabrics: just click on the `=` and select `=~`. Now you can pick all 3 sites and if you re-run the query Grafana will load all the logs: 54 | 55 | ![alt text](images/lab2/multi-fabric-logs.png) 56 | 57 | If you expand a log entry you will see the log message details and its labels. As you can see there are just a handful of labels. 58 | 59 | ![alt text](images/lab2/log-details.png) 60 | 61 | In Loki it is better to keep labels to a minimum and to use filter expressions `(|= "text", |~ "regex", …)` and brute force the logs.
This is in line with [Loki Label best practices](https://grafana.com/docs/loki/latest/get-started/labels/bp-labels/). 62 | 63 | 64 | ## Logs Filtering 65 | 66 | Log filtering can be done manually by entering a `RegEx` or you can highlight a part of a log message and add the text as included or excluded. 67 | 68 | For example if we want to show only messages where the error code is `[E4204936]` you can manually enter that text in the `Line Contains` box or you can simply highlight the text with your mouse and select *Add as line contains filter* 69 | 70 | ![alt text](images/lab2/ui-filter.png) 71 | 72 | The result will be something similar to what you can see below. Feel free to use different filters, these screenshots are just examples! 73 | 74 | ![alt text](images/lab2/ui-filter-result.png) 75 | 76 | # The End 77 | This concludes Lab2. Feel free to play around more with this dashboard if you want, or you can proceed to the [DMZ Deployment](../LABDCN-2620/dmz-deploy.md) instructions -------------------------------------------------------------------------------- /docs/labs/lab3.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | All the dashboards that are tagged as `cisco-aci-config` are generated by creating a backup of the ACI config and importing it into a graph database. 4 | 5 | A graph database is a type of database designed to represent and store data as a network of interconnected nodes (entities) and edges (relationships). Unlike traditional relational databases that use tables, graph databases use graph structures to model relationships between data, making them highly efficient for querying and analyzing complex, interconnected data. Each node represents an entity (e.g., a person, product, or location), while edges define relationships (e.g., "friend of," "purchased," or "located at").
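To make the node and edge model a bit more concrete, here is a minimal Python sketch of a tiny in-memory graph. It is only an illustration under stated assumptions: the DNs, fabric names and the helper `tenant_vrf_pairs` are hypothetical and are not taken from a real ACI backup or from the code in this repository. `fvTenant` and `fvCtx` are used as node labels for tenants and VRFs.

```python
# Hypothetical nodes: each key is a node identifier, each value its properties.
nodes = {
    "uni/tn-prod": {"label": "fvTenant", "fabric": "site3"},
    "uni/tn-prod/ctx-vrf1": {"label": "fvCtx", "fabric": "site3"},
    "uni/tn-dev": {"label": "fvTenant", "fabric": "site1"},
    "uni/tn-dev/ctx-vrf2": {"label": "fvCtx", "fabric": "site1"},
}

# Hypothetical edges: each tuple links two node identifiers.
edges = [
    ("uni/tn-prod", "uni/tn-prod/ctx-vrf1"),
    ("uni/tn-dev/ctx-vrf2", "uni/tn-dev"),
]


def tenant_vrf_pairs(fabric):
    """Return (tenant, vrf) pairs joined by an edge, for tenants in one fabric."""
    pairs = []
    for a, b in edges:
        # Edges are treated as undirected, so both orientations are checked.
        for tenant, vrf in ((a, b), (b, a)):
            if (
                nodes[tenant]["label"] == "fvTenant"
                and nodes[vrf]["label"] == "fvCtx"
                and nodes[tenant]["fabric"] == fabric
            ):
                pairs.append((tenant, vrf))
    return pairs


print(tenant_vrf_pairs("site3"))
```

A graph database does this kind of traversal for you, at scale and with indexes, so you only describe the pattern you are looking for instead of writing the loops yourself.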
6 | 7 | This works great to represent the ACI Configuration and allows us to create custom dashboards by using the `Cypher` language. 8 | 9 | Cypher is a query language specifically designed for working with graph databases. It is declarative, meaning you describe what you want to retrieve or manipulate in the graph, and the database engine determines the best way to execute the query. Cypher is similar in concept to SQL for relational databases but is optimized for graph structures, enabling intuitive and powerful querying of nodes (entities), relationships (edges), and their properties. 10 | 11 | Cypher uses a pattern-matching syntax that resembles ASCII art, making it easy to visualize and query graphs. For example, `(a)-[r]->(b)` represents a node `a` connected to node `b` by a relationship `r`. You can use Cypher to perform a variety of graph operations, such as finding shortest paths, traversing relationships, filtering based on properties, and creating or modifying nodes and edges. 12 | 13 | Feel free to explore the pre-existing dashboards. Once you are done, if you want to experiment, head over to `Explore`: 14 | 15 | You can find `Explore` in the left panel of the Grafana UI; from the drop-down select `memgraph` 16 | ![alt text](images/lab2/explore.png) 17 | 18 | Now let's try writing a simple query: 19 | 20 | ```sql 21 | MATCH (t:fvTenant)-[r]-(vrf:fvCtx) 22 | WHERE t.fabric="site3" 23 | RETURN * 24 | ``` 25 | 26 | This will return a mapping of Tenants to VRFs. Now take a look at the `Contract Explorer` dashboard and edit it. You will see that the Cypher query is a bit more complex: 27 | 28 | ```sql 29 | MATCH (provider)-[r1:fvRsProv|vzRsAnyToProv]-(contract:vzBrCP)-[r2:fvRsCons|vzRsAnyToCons]-(consumer) 30 | WHERE contract.dn="uni/tn-$tenant/brc-$contract" and contract.fabric='$fabric' 31 | 32 | RETURN provider.dn as ProviderDN, consumer.dn as ConsumerDN 33 | ``` 34 | 35 | 1. MATCH: The query is looking for a specific structure (or pattern) in the graph. 36 | 2.
Nodes: 37 | - (provider): A node representing the provider (this could be any entity or object, depending on the graph's context). I leave it unlabeled because it can be an EPG/ESG or a VRF 38 | - (contract:vzBrCP): A node representing a contract, specifically an ACI Class of type vzBrCP 39 | - (consumer): A node representing the consumer (another entity or object). I leave it unlabeled because it can be an EPG/ESG or a VRF 40 | 3. Relationships: 41 | - [r1:fvRsProv|vzRsAnyToProv]: There is a relationship between the `provider` and the `contract`, which can be of type fvRsProv or vzRsAnyToProv. 42 | - [r2:fvRsCons|vzRsAnyToCons]: There is a relationship between the `contract` and the `consumer`, which can be of type fvRsCons or vzRsAnyToCons. 43 | 4. Pattern: 44 | - The query is looking for a provider node that is connected to a contract node via one of the specified relationships (fvRsProv or vzRsAnyToProv). 45 | - Then, it looks for a consumer node that is also connected to the same contract node via one of the specified relationships (fvRsCons or vzRsAnyToCons). 46 | 47 | 5. WHERE: It simply filters the result by the fabric, tenant and contract name that you select from the Grafana dashboard. 48 | 6. RETURN: Instead of returning everything we return only the Distinguished Names of the ACI Objects. 49 | 50 | If you want to challenge yourself you can take a look at the `Fabric Policies - Port Group` dashboard query. -------------------------------------------------------------------------------- /docs/minikube.md: -------------------------------------------------------------------------------- 1 | # Minikube 2 | 3 | This can be used to run aci-monitoring-stack locally (say on your laptop). 4 | 5 | By default, minikube only provides access locally, which is an issue for log ingestion. However, for a lab you can configure HAProxy to expose your Minikube instance over the Host IP Address.
This implies that you should configure all your External Services as `NodePort` and configure HAProxy to send the traffic to the correct `NodePort` 6 | 7 | I have configured minikube with 4GB of RAM and 4 CPUs and that was plenty to monitor a small 10 switch ACI Fabric. 8 | 9 | ```shell 10 | minikube config set memory 4016 11 | minikube config set cpus 4 12 | ``` 13 | 14 | Example HAProxy Config 15 | 16 | ```shell 17 | frontend grafana 18 | # Here I am doing SSL Termination 19 | bind :443 ssl crt /etc/ssl/private/grafana.pem 20 | default_backend grafana 21 | 22 | backend grafana 23 | balance roundrobin 24 | server grafana :30000 check 25 | 26 | frontend promtail-site1 27 | mode tcp 28 | bind :1511 29 | default_backend promtail-site1 30 | 31 | backend promtail-site1 32 | mode tcp 33 | balance roundrobin 34 | server promtail :30001 check 35 | 36 | frontend promtail-site2 37 | mode tcp 38 | bind :1512 39 | default_backend promtail-site2 40 | 41 | backend promtail-site2 42 | mode tcp 43 | balance roundrobin 44 | server promtail :30002 check 45 | 46 | frontend promtail-site3 47 | mode tcp 48 | bind :1513 49 | default_backend promtail-site3 50 | 51 | backend promtail-site3 52 | mode tcp 53 | balance roundrobin 54 | server promtail :30003 check 55 | ``` 56 | 57 | 58 | # Troubleshooting 59 | 60 | While installing Minikube I hit the following issues: 61 | 62 | ## minikube/podman wrong CNI Version 63 | See https://github.com/kubernetes/minikube/issues/17754 64 | For me this fixed it: 65 | ``` 66 | #sudo apt list --all-versions podman 67 | podman/jammy-updates,jammy-security,now 3.4.4+ds1-1ubuntu1.22.04.2 amd64 [installed] <=== Seems is bad 68 | podman/jammy 3.4.4+ds1-1ubuntu1 amd64 69 | 70 | #sudo apt install podman=3.4.4+ds1-1ubuntu1 71 | Reading package lists... Done 72 | ``` 73 | ## Prometheus does not install under minikube/podman 74 | 75 | Log into minikube with `minikube ssh`.
76 | 77 | 78 | ```shell 79 | sudo vi /etc/containers/registries.conf 80 | ``` 81 | 82 | `unqualified-search-registries = ["docker.io", "quay.io"]` 83 | 84 | Restart minikube 85 | ```shell 86 | minikube stop && minikube start 87 | ``` -------------------------------------------------------------------------------- /docs/multiple-aci-exporters.md: -------------------------------------------------------------------------------- 1 | Implemented, not yet documented -------------------------------------------------------------------------------- /docs/syslog.md: -------------------------------------------------------------------------------- 1 | # Cisco ACI Syslog Configuration Guide 2 | 3 | Follow these steps to configure Syslog for Cisco Application Centric Infrastructure (ACI): 4 | 5 | ## 1. Access the APIC Management Console 6 | - Open a web browser and navigate to the management IP address of your Application Policy Infrastructure Controller (APIC). 7 | - Log in with your credentials. 8 | 9 | ## 2. Navigate to the Syslog Policy Configuration 10 | - Click on the `Admin` menu at the top. 11 | - Select `External Data Collectors` 12 | - Choose `Monitoring Destinations`, then `Syslog`. 13 | 14 | ## 3. Add a Syslog Server 15 | - Right click on the `Syslog` folder and add a new Syslog server. 16 | - Enter the name for the Syslog server policy. 17 | - Format: Enhanced Log 18 | - Admin State: Enabled 19 | - Click Next to configure the `Remote Destination` 20 | - Click on `+`: 21 | - Hostname: The promtail Service IP Address. 22 | - Port: The promtail Service port.
23 | - Name: a name 24 | - Admin State: Enabled 25 | - Severity: Informational (this is required to get contract drop logs) 26 | - Management EPG: Select the management EPG to source the messages from 27 | 28 | ## 4. Configure Monitoring Policies 29 | 30 | Syslog monitoring policies can be configured at different scopes: 31 | - Fabric > Fabric Policies > Policies > Monitoring > Default > Callhome/Smart Callhome/SNMP/Syslog/TACACS > Syslog 32 | - Fabric > Access Policies > Policies > Monitoring > Default > Callhome/Smart Callhome/SNMP/Syslog > Syslog 33 | - Tenant > Policies > Monitoring > Default > Callhome/Smart Callhome/SNMP/Syslog > Syslog 34 | - If you want to have a common policy for all your tenants you can configure the policy under the `common` tenant and it will be applied to all your tenants 35 | For each of the above scopes repeat the following: 36 | - Select Syslog and click on `+` 37 | - Name: a name 38 | - Min Severity: Information (Choose based on your needs) 39 | - For ACI Contract Deny Logs the Min severity for the `Access Policies` MUST BE Information 40 | - Include: Select All the Options 41 | - Dest Group: Select the Destination group created in the previous step. 42 | 43 | ### Enabling the sending of ACL/Contract Log entries as SYSLOG events 44 | 45 | To enable ACI to send Contract Permit/Deny log messages, change the default syslog policy from `alerts` to `information`. To do so go to: 46 | - Fabric > Fabric Policies > Policies > Monitoring > Common Policy > Syslog Message Policy > default 47 | - From `Facility Filters` double click on `default` and set the `Severity` to `information` 48 | 49 | *Warning*: ACI Contract Deny Logs are limited to 500 Messages/s per switch aci-monitoring-stack 50 | 51 | ## 5. Apply the Syslog Policy to an Administrative Domain (Tenant) 52 | - Navigate to `Tenants` on the top menu. 53 | - Select the tenant to which you wish to apply the Syslog policy. 54 | - Within the tenant, go to `Monitoring Policies`.
55 | - Choose `Common Policy` and then `Logging`. 56 | - Associate the Syslog server policy with the tenant by selecting it from the list. 57 | - Apply the policy to the desired EPGs (Endpoint Groups), applications, or ACI constructs. 58 | -------------------------------------------------------------------------------- /docs/webex.md: -------------------------------------------------------------------------------- 1 | # Webex How To 2 | 3 | 4 | - Create a new Webex Space 5 | 6 | ## Create a Bot 7 | - Go to: https://developer.webex.com/my-apps 8 | - Create a new Bot 9 | - (Re)generate Access Token: This is the token you set in the `credentials` field of the Alertmanager config for Webex 10 | - Add the Bot to the Webex Space 11 | - Go to People 12 | - Add People 13 | - Search for the name of your Bot 14 | 15 | ## Get your Space/Room ID: 16 | 17 | - Go to https://developer.webex.com/docs/api/v1/rooms/list-rooms 18 | - Disable: `Use personal access token` otherwise we are going to retrieve ALL the rooms... 19 | - In `Bearer` paste the Access Token generated in the previous step 20 | - You should see 1 or 2 rooms, find the one with the right name and copy its `id` into the `room_id` field of the Alertmanager config for Webex 21 | 22 | --------------------------------------------------------------------------------