├── .gitignore
├── README.md
├── day-1
│   ├── images
│   │   ├── Introduction-to-Observability.png
│   │   └── why-monitoring-why-observability.png
│   └── readme.md
├── day-2
│   ├── custom_kube_prometheus_stack.yml
│   ├── images
│   │   └── prometheus-architecture.gif
│   └── readme.md
├── day-3
│   ├── alb_controller.md
│   ├── ingress_kube_prom_stack.yaml
│   └── readme.md
├── day-4
│   ├── alerts-alertmanager-servicemonitor-manifest
│   │   ├── alertmangerconfig.yml
│   │   ├── alerts.yml
│   │   ├── email-secrets.yml
│   │   ├── kustomization.yml
│   │   └── serviceMonitor.yml
│   ├── application
│   │   ├── service-a
│   │   │   ├── Dockerfile
│   │   │   ├── index.js
│   │   │   ├── package-lock.json
│   │   │   ├── package.json
│   │   │   └── tracing.js
│   │   └── service-b
│   │       ├── Dockerfile
│   │       ├── index.js
│   │       ├── package-lock.json
│   │       ├── package.json
│   │       └── tracing.js
│   ├── images
│   │   └── architecture.gif
│   ├── kubernetes-manifest
│   │   ├── deployment-svc-a.yml
│   │   ├── deployment-svc-b.yml
│   │   ├── kustomization.yml
│   │   ├── service-svc-a.yml
│   │   └── service-svc-b.yml
│   ├── readme.md
│   └── test.sh
├── day-5
│   ├── fluentbit-values.yaml
│   ├── images
│   │   └── architecture.gif
│   └── readme.md
├── day-6
│   ├── images
│   │   └── architecture.gif
│   ├── jaeger-values.yaml
│   └── readme.md
├── day-7
│   ├── README.md
│   ├── jaeger-values.yaml
│   ├── k8s-manifests
│   │   ├── deployment-a.yml
│   │   ├── deployment-b.yml
│   │   ├── kustomization.yml
│   │   ├── namespace.yml
│   │   ├── svc-a.yml
│   │   └── svc-b.yml
│   ├── microservice-a
│   │   ├── .dockerignore
│   │   ├── .env
│   │   ├── docker-compose.yml
│   │   ├── dockerfile
│   │   ├── go.mod
│   │   ├── go.sum
│   │   ├── main.go
│   │   ├── otel-collector-config.yaml
│   │   ├── prometheus.yaml
│   │   └── test.sh
│   ├── microservice-b
│   │   ├── .dockerignore
│   │   ├── .env
│   │   ├── docker-compose.yml
│   │   ├── dockerfile
│   │   ├── go.mod
│   │   ├── go.sum
│   │   ├── main.go
│   │   ├── otel-collector-config.yaml
│   │   ├── prometheus.yaml
│   │   └── test.sh
│   ├── otel-collector-values.yaml
│   ├── prometheus-values.yaml
│   └── test.sh
└── opensearch-stack
    ├── fluent-bit-config.yaml
    ├── fluent-bit-daemonset.yaml
    ├── log-generator.yaml
    └── prerequisites.md
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | **/*.pptx
2 |
3 | **/**/node_modules
4 |
5 | **/**/*.pem
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # 📚 7-Day Observability Tutorial Series
3 |
4 | Welcome to the 7-Day Observability Tutorial Series! This repository contains the code and detailed explanations for setting up and understanding observability in Kubernetes using Prometheus, Grafana, the EFK stack (Elasticsearch, FluentBit, Kibana), Jaeger, groundcover (eBPF), OpenTelemetry, and more.
5 |
6 | ## 📅 Overview of Each Day
7 |
8 | ### Day 1: Introduction to Observability
9 | - **Concepts Covered**:
10 | - Introduction to Observability, Monitoring, Logging, and Tracing.
11 | - The difference between Monitoring and Observability.
12 | - Tools available for Monitoring and Observability.
13 | - Comparison between monitoring and observing in Bare-Metal Servers vs. Kubernetes.
14 | - **Key Learning**:
15 | - Understand the fundamental concepts of observability.
16 | - Learn why monitoring and observability are crucial in modern IT environments.
17 |
18 | ### Day 2: Prometheus - Setting Up Monitoring
19 | - **Concepts Covered**:
20 | - Introduction to Prometheus and its architecture.
21 | - Setup and configuration of Prometheus in an EKS cluster.
22 | - Installation of kube-prometheus-stack with Helm and integration with Grafana.
23 | - Basic queries and setup for monitoring with Prometheus and Grafana.
24 | - **Key Learning**:
25 | - Get hands-on experience with Prometheus and Grafana.
26 | - Learn to install and configure Prometheus on Kubernetes.
27 |
28 | ### Day 3: Metrics and PromQL in Prometheus
29 | - **Concepts Covered**:
30 | - Introduction to PromQL and basic querying techniques.
31 | - Aggregation and functions in PromQL to analyze metrics data.
32 | - **Key Learning**:
33 | - Master the Prometheus Query Language (PromQL) for querying and analyzing metrics.
34 |
35 | ### Day 4: Instrumentation and Custom Metrics
36 | - **Concepts Covered**:
37 | - Instrumentation for adding monitoring capabilities to applications.
38 | - Understanding different types of metrics in Prometheus: Counter, Gauge, Histogram, and Summary.
39 | - Writing custom metrics in a Node.js application using the `prom-client` library.
40 | - Dockerizing the application and deploying it on Kubernetes.
41 | - Setting up Alertmanager for alerting based on custom metrics.
42 | - **Key Learning**:
43 | - Learn how to instrument applications to expose custom metrics.
44 | - Configure alerts in Alertmanager to monitor application performance.
45 | - Understand how to work with different types of metrics in Prometheus.
46 |
47 | ### Day 5: Logging with EFK Stack
48 | - **Concepts Covered**:
49 | - Introduction to logging in distributed systems and Kubernetes.
50 | - Setting up the EFK stack (Elasticsearch, Fluentbit, Kibana) on Kubernetes.
51 | - Detailed setup and configuration for collecting and visualizing logs.
52 | - Cleaning up the Kubernetes cluster and resources.
53 | - **Key Learning**:
54 | - Understand the importance of logging and how to set up the EFK stack to collect and visualize logs.
55 |
56 | ### Day 6: Distributed Tracing with Jaeger
57 | - **Concepts Covered**:
58 | - Introduction to Jaeger and its architecture for distributed tracing.
59 | - Setting up Jaeger in a Kubernetes cluster using Helm.
60 | - Instrumenting services using OpenTelemetry to enable tracing.
61 | - Viewing and analyzing traces in the Jaeger UI.
62 | - Cleaning up the environment after setting up Jaeger.
63 | - **Key Learning**:
64 | - Gain insights into distributed tracing and how it helps in debugging and performance optimization.
65 | - Learn how to set up and configure Jaeger for tracing in a microservices architecture.
66 |
67 | ### Day 7: OpenTelemetry – Setting Up Unified Observability
68 | - **Concepts Covered**:
69 | - Introduction to OpenTelemetry, a unified framework for observability.
70 | - Understanding how OpenTelemetry integrates tracing, metrics, and logging.
71 | - Comparison of OpenTelemetry with other observability tools like Jaeger and Prometheus.
72 | - Supported programming languages and multi-language support in OpenTelemetry.
73 | - Step-by-step setup of OpenTelemetry in Kubernetes.
74 | - **Key Learning**:
75 | - Learn how OpenTelemetry simplifies the process of collecting and exporting telemetry data.
76 | - Understand the benefits of a unified observability approach using OpenTelemetry.
77 | - Gain hands-on experience with setting up OpenTelemetry Collector, Prometheus, Jaeger, and Elasticsearch to monitor a Golang microservice application.
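
## 🚀 Getting Started
To follow along, clone the repository and work through the days in order. A minimal sketch (the repository URL is inferred from the image links used throughout this repo):

```bash
git clone https://github.com/iam-veeramalla/observability-zero-to-hero.git
cd observability-zero-to-hero/day-1
```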
78 |
79 |
--------------------------------------------------------------------------------
/day-1/images/Introduction-to-Observability.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iam-veeramalla/observability-zero-to-hero/9445b2364672b23b72f029e65471ed485a0c8950/day-1/images/Introduction-to-Observability.png
--------------------------------------------------------------------------------
/day-1/images/why-monitoring-why-observability.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iam-veeramalla/observability-zero-to-hero/9445b2364672b23b72f029e65471ed485a0c8950/day-1/images/why-monitoring-why-observability.png
--------------------------------------------------------------------------------
/day-1/readme.md:
--------------------------------------------------------------------------------
1 | # 💡 Introduction to Observability
2 | - Observability is the ability to understand the internal state of a system by analyzing the data it produces, including logs, metrics, and traces.
3 |
4 | - Monitoring (Metrics): involves tracking system metrics like CPU usage, memory usage, and network performance, and provides alerts based on predefined thresholds and conditions.
5 | - `Monitoring tells us what is happening.`
6 | - Logging (Logs): involves the collection of log data from various components of a system.
7 | - `Logging explains why it is happening.`
8 | - Tracing (Traces): involves tracking the flow of a request or transaction as it moves through different services and components within a system.
9 | - `Tracing shows how it is happening.`
10 |
11 | ![Introduction to Observability](images/Introduction-to-Observability.png)
12 |
13 | ## 🤔 Why Monitoring?
14 | - Monitoring helps us keep an eye on our systems to ensure they are working properly.
15 | - Purpose: maintaining the **health, performance, and security** of IT environments.
16 | - It enables early detection of issues, ensuring that they can be addressed before causing significant downtime or data loss.
17 |
18 | - We use monitoring to:
19 | - Detect Problems Early
20 | - Measure Performance
21 | - Ensure Availability
22 |
23 | ## 🤔 Why Observability?
24 | - Observability helps us understand why our systems are behaving the way they are.
25 | - It’s like having a detailed map and tools to explore and diagnose issues.
26 |
27 | - We use observability to:
28 | - Diagnose Issues
29 | - Understand Behavior
30 | - Improve Systems
31 |
32 | ![why-monitoring-why-observability](images/why-monitoring-why-observability.png)
33 |
34 |
35 | ## 🆚 What is the Exact Difference Between Monitoring and Observability?
36 | - 🔥 Monitoring is the *`when and what`* of a system error, and observability is the *`why and how`*.
37 |
38 | | Category | Monitoring | Observability |
39 | |----------------|----------------------------------------------|------------------------------------------------------|
40 | | Focus | Checking if everything is working as expected| Understanding why things are happening in the system |
41 | | Data | Collects metrics like CPU usage, memory usage, and error rates | Collects logs, metrics, and traces to provide a full picture |
42 | | Alerts | Sends notifications when something goes wrong| Correlates events and anomalies to identify root causes |
43 | | Example | If a server's CPU usage goes above 90%, monitoring will alert us | If a website is slow, observability helps us trace the user's request through different services to find the bottleneck |
44 | | Insight | Identifies potential issues before they become critical | Helps diagnose issues and understand system behavior |
45 |
46 |
47 | ## 🔭 Does Observability Cover Monitoring?
48 | - Yes! Monitoring is a subset of Observability.
49 | - Observability is a broader concept that includes monitoring as one of its components.
50 | - Monitoring focuses on tracking specific metrics and alerting on predefined conditions.
51 | - Observability provides a comprehensive understanding of the system by collecting and analyzing a wider range of data, including **logs, metrics, and traces**.
52 |
53 | ## 🖥️ What Can Be Monitored?
54 | - Infrastructure: CPU usage, memory usage, disk I/O, network traffic.
55 | - Applications: Response times, error rates, throughput.
56 | - Databases: Query performance, connection pool usage, transaction rates.
57 | - Network: Latency, packet loss, bandwidth usage.
58 | - Security: Unauthorized access attempts, vulnerability scans, firewall logs.
59 |
60 | ## 👀 What Can Be Observed?
61 | - Logs: Detailed records of events and transactions within the system.
62 | - Metrics: Quantitative data points like CPU load, memory consumption, and request counts.
63 | - Traces: Data that shows the flow of requests through various services and components.
64 |
65 | ## 🆚 Monitoring on Bare-Metal Servers vs. Monitoring Kubernetes
66 | - Bare-Metal Servers:
67 | - Direct Access: Easier access to hardware metrics and logs.
68 | - Fewer Layers: Simpler environment with fewer abstraction layers.
69 |
70 | - Kubernetes:
71 | - Dynamic Environment: Challenges with monitoring ephemeral containers and dynamic scaling.
72 | - Distributed Nature: Requires tools that can handle distributed systems and correlate data from multiple sources.
73 |
74 | ## 🆚 Observing on Bare-Metal Servers vs. Observing Kubernetes
75 | - Bare-Metal Servers:
76 | - Simpler Observability: Easier to collect and correlate logs, metrics, and traces due to fewer components and layers.
77 |
78 | - Kubernetes:
79 | - Complex Observability: Requires sophisticated tools to handle the dynamic and distributed nature of containers and microservices.
80 | - Integration: Necessitates the integration of multiple observability tools to get a complete picture of the system.
81 |
82 | ## ⚒️ What are the Tools Available?
83 | - **Monitoring Tools**: Prometheus, Grafana, Nagios, Zabbix, PRTG.
84 | - **Observability Tools**: ELK Stack (Elasticsearch, Logstash, Kibana), EFK Stack (Elasticsearch, FluentBit, Kibana), Splunk, Jaeger, Zipkin, New Relic, Dynatrace, Datadog.
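
To make the three signals concrete, here is roughly what one slow HTTP request might produce in each form (the samples below are illustrative, not the output of any specific tool in this series):

```bash
# Metric: a numeric sample with labels (Prometheus exposition format)
http_request_duration_seconds_sum{method="GET", path="/checkout"} 2.7

# Log: a structured record explaining what happened
{"time":"2024-05-01T12:00:03Z","level":"error","msg":"payment gateway timeout","path":"/checkout"}

# Trace: the request's path through services, with per-span timings
frontend (2.7s) -> checkout-service (2.6s) -> payment-gateway (2.5s, timed out)
```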
85 |
86 |
--------------------------------------------------------------------------------
/day-2/custom_kube_prometheus_stack.yml:
--------------------------------------------------------------------------------
1 | alertmanager:
2 |   alertmanagerSpec:
3 |     # Selects Alertmanager configuration based on these labels. Ensure that the Alertmanager configuration has matching labels.
4 |     # ✅ Solves error: Misconfigured Alertmanager selectors can lead to missing alert configurations.
5 |     # ✅ Solves error: Alertmanager wasn't able to find the applied CRD (kind: AlertmanagerConfig)
6 |     alertmanagerConfigSelector:
7 |       matchLabels:
8 |         release: monitoring
9 |
10 |     # Sets the number of Alertmanager replicas to 2 for high availability.
11 |     # ✅ Solves error: A single replica can cause alerting issues during pod failures.
12 |     # ✅ Solves error: Alertmanager Cluster Status is Disabled (GitHub issue)
13 |     replicas: 2
14 |
15 |     # Sets the strategy for matching Alertmanager configurations. 'None' means no specific matching strategy.
16 |     # ✅ Solves error: Incorrect matcher strategy can lead to unhandled alert configurations.
17 |     # ✅ Solves error: Get rid of namespace matchers when creating AlertManagerConfig (GitHub issue)
18 |     alertmanagerConfigMatcherStrategy:
19 |       type: None
--------------------------------------------------------------------------------
/day-2/images/prometheus-architecture.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iam-veeramalla/observability-zero-to-hero/9445b2364672b23b72f029e65471ed485a0c8950/day-2/images/prometheus-architecture.gif
--------------------------------------------------------------------------------
/day-2/readme.md:
--------------------------------------------------------------------------------
1 | # Monitoring
2 |
3 | ## Metrics vs Monitoring
4 |
5 | Metrics are measurements or data points that tell you what is happening. For example, the number of steps you walk each day, your heart rate, or the temperature outside—these are all metrics.
6 |
7 | Monitoring is the process of keeping an eye on these metrics over time to understand what’s normal, identify changes, and detect problems. It's like watching your step count daily to see if you're meeting your fitness goal or checking your heart rate to make sure it's in a healthy range.
8 |
9 | ## 🚀 Prometheus
10 | - Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.
11 | - It is known for its robust data model, powerful query language (PromQL), and the ability to generate alerts based on the collected time-series data.
12 | - It can be configured and set up on both bare-metal servers and container environments like Kubernetes.
13 |
14 | ## 🏠 Prometheus Architecture
15 | - The architecture of Prometheus is designed to be highly flexible, scalable, and modular.
16 | - It consists of several core components, each responsible for a specific aspect of the monitoring process.
17 |
18 | ![Prometheus Architecture](images/prometheus-architecture.gif)
19 |
20 | ### 🔥 Prometheus Server
21 | - Prometheus server is the core of the monitoring system. It is responsible for scraping metrics from various configured targets, storing them in its time-series database (TSDB), and serving queries through its HTTP API.
22 | - Components:
23 | - **Retrieval**: This module handles the scraping of metrics from endpoints, which are discovered either through static configurations or dynamic service discovery methods.
24 | - **TSDB (Time Series Database)**: The data scraped from targets is stored in the TSDB, which is designed to handle high volumes of time-series data efficiently.
25 | - **HTTP Server**: This provides an API for querying data using PromQL, retrieving metadata, and interacting with other components of the Prometheus ecosystem.
26 | - **Storage**: The scraped data is stored on local disk (HDD/SSD) in a format optimized for time-series data.
27 |
28 | ### 🌐 Service Discovery
29 | - Service discovery automatically identifies and manages the list of scrape targets (i.e., services or applications) that Prometheus monitors.
30 | - This is crucial in dynamic environments like Kubernetes where services are constantly being created and destroyed.
31 | - Components:
32 | - **Kubernetes**: In Kubernetes environments, Prometheus can automatically discover services, pods, and nodes using the Kubernetes API, ensuring it monitors the most up-to-date list of targets.
33 | - **File SD (Service Discovery)**: Prometheus can also read static target configurations from files, allowing for flexibility in environments where dynamic service discovery is not used.
34 |
35 | ### 📤 Pushgateway
36 | - The Pushgateway is used to expose metrics from short-lived jobs or applications that cannot be scraped directly by Prometheus.
37 | - These jobs push their metrics to the Pushgateway, which then makes them available for Prometheus to scrape (pull).
38 | - Use Case:
39 | - It's particularly useful for batch jobs or tasks that have a limited lifespan and would otherwise not have their metrics collected.
40 |
41 | ### 🚨 Alertmanager
42 | - The Alertmanager is responsible for managing alerts generated by the Prometheus server.
43 | - It takes care of deduplicating, grouping, and routing alerts to the appropriate notification channels such as PagerDuty, email, or Slack.
44 |
45 | ### 🧲 Exporters
46 | - Exporters are small applications that collect metrics from various third-party systems and expose them in a format Prometheus can scrape. They are essential for monitoring systems that do not natively support Prometheus.
47 | - Types of Exporters:
48 | - Common exporters include the Node Exporter (for hardware metrics), the MySQL Exporter (for database metrics), and various other application-specific exporters.
49 |
50 | ### 🖥️ Prometheus Web UI
51 | - The Prometheus Web UI allows users to explore the collected metrics data, run ad-hoc PromQL queries, and visualize the results directly within Prometheus.
52 |
53 | ### 📊 Grafana
54 | - Grafana is a powerful dashboard and visualization tool that integrates with Prometheus to provide rich, customizable visualizations of the metrics data.
55 |
56 | ### 🔌 API Clients
57 | - API clients interact with Prometheus through its HTTP API to fetch data, query metrics, and integrate Prometheus with other systems or custom applications.
58 |
59 | # 🛠️ Installation & Configurations
60 | ## 📦 Step 1: Create EKS Cluster
61 |
62 | ### Prerequisites
63 | - Download and install the AWS CLI - please refer to [this](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) link.
64 | - Setup and configure the AWS CLI using the `aws configure` command.
65 | - Install and configure eksctl using the steps mentioned [here](https://eksctl.io/installation/).
66 | - Install and configure kubectl as mentioned [here](https://kubernetes.io/docs/tasks/tools/).
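
Before creating the cluster, it is worth confirming that each CLI is installed and that your AWS credentials work (standard verification commands for these tools):

```bash
aws sts get-caller-identity   # confirms the AWS CLI is configured with valid credentials
eksctl version                # confirms eksctl is on the PATH
kubectl version --client     # confirms kubectl is installed
```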
67 |
68 |
69 | ```bash
70 | eksctl create cluster --name=observability \
71 |                       --region=us-east-1 \
72 |                       --zones=us-east-1a,us-east-1b \
73 |                       --without-nodegroup
74 | ```
75 | ```bash
76 | eksctl utils associate-iam-oidc-provider \
77 |     --region us-east-1 \
78 |     --cluster observability \
79 |     --approve
80 | ```
81 | ```bash
82 | eksctl create nodegroup --cluster=observability \
83 |                         --region=us-east-1 \
84 |                         --name=observability-ng-private \
85 |                         --node-type=t3.medium \
86 |                         --nodes-min=2 \
87 |                         --nodes-max=3 \
88 |                         --node-volume-size=20 \
89 |                         --managed \
90 |                         --asg-access \
91 |                         --external-dns-access \
92 |                         --full-ecr-access \
93 |                         --appmesh-access \
94 |                         --alb-ingress-access \
95 |                         --node-private-networking
96 |
97 | # Update the ~/.kube/config file
98 | aws eks update-kubeconfig --name observability
99 | ```
100 |
101 | ## 🧰 Step 2: Add the kube-prometheus-stack Helm Repository
102 | ```bash
103 | helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
104 | helm repo update
105 | ```
106 |
107 | ## 🚀 Step 3: Deploy the chart into a new namespace "monitoring"
108 | ```bash
109 | kubectl create ns monitoring
110 | ```
111 | ```bash
112 | cd day-2
113 |
114 | helm install monitoring prometheus-community/kube-prometheus-stack \
115 |     -n monitoring \
116 |     -f ./custom_kube_prometheus_stack.yml
117 | ```
118 |
119 | ## ✅ Step 4: Verify the Installation
120 | ```bash
121 | kubectl get all -n monitoring
122 | ```
123 | - **Prometheus UI**:
124 | ```bash
125 | kubectl port-forward service/prometheus-operated -n monitoring 9090:9090
126 | ```
127 |
128 | **NOTE:** If you are using an EC2 instance or cloud VM, you need to pass `--address 0.0.0.0` to the above command. Then you can access the UI on port 9090 of the VM's public IP.
129 |
130 | - **Grafana UI**: the password is `prom-operator`
131 | ```bash
132 | kubectl port-forward service/monitoring-grafana -n monitoring 8080:80
133 | ```
134 | - **Alertmanager UI**:
135 | ```bash
136 | kubectl port-forward service/alertmanager-operated -n monitoring 9093:9093
137 | ```
138 |
139 | ## 🧼 Step 5: Clean Up
140 | - **Uninstall the Helm chart**:
141 | ```bash
142 | helm uninstall monitoring --namespace monitoring
143 | ```
144 | - **Delete the namespace**:
145 | ```bash
146 | kubectl delete ns monitoring
147 | ```
148 | - **Delete the cluster & everything else**:
149 | ```bash
150 | eksctl delete cluster --name observability
151 | ```
152 |
--------------------------------------------------------------------------------
/day-3/alb_controller.md:
--------------------------------------------------------------------------------
1 | # ALB Controller Installation
2 |
3 | Follow the steps mentioned [here](https://github.com/iam-veeramalla/aws-devops-zero-to-hero/blob/main/day-22/alb-controller-add-on.md)
--------------------------------------------------------------------------------
/day-3/ingress_kube_prom_stack.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: networking.k8s.io/v1
2 | kind: Ingress
3 | metadata:
4 |   name: kubernetes-prometheus-stack
5 |   annotations:
6 |     alb.ingress.kubernetes.io/scheme: internet-facing
7 |     alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
8 |     alb.ingress.kubernetes.io/target-type: ip
9 | spec:
10 |   ingressClassName: alb
11 |   rules:
12 |     - http:
13 |         paths:
14 |           - path: /prometheus
15 |             pathType: Prefix
16 |             backend:
17 |               service:
18 |                 name: prometheus-service # Change this to your Prometheus service name
19 |                 port:
20 |                   number: 9090
21 |           - path: /grafana
22 |             pathType: Prefix
23 |             backend:
24 |
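# NOTE (assumption, not part of the original manifest): serving Grafana under a
# path prefix like /grafana typically also requires setting Grafana's root_url
# (e.g. via the GF_SERVER_ROOT_URL env var); Prometheus similarly supports
# --web.external-url / --web.route-prefix for path-prefixed access.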
service: 25 | name: grafana-service # Change this to your Grafana service name 26 | port: 27 | number: 3000 28 | - path: /alertmanager 29 | pathType: Prefix 30 | backend: 31 | service: 32 | name: alertmanager-service # Change this to your Alertmanager service name 33 | port: 34 | number: 9093 35 | -------------------------------------------------------------------------------- /day-3/readme.md: -------------------------------------------------------------------------------- 1 | 2 | ## 📊 Metrics in Prometheus: 3 | - Metrics in Prometheus are the core data objects that represent measurements collected from monitored systems. 4 | - These metrics provide insights into various aspects of **system performance, health, and behavior**. 5 | 6 | ## 🏷️ Labels: 7 | - Metrics are paired with Labels. 8 | - Labels are key-value pairs that allow you to differentiate between dimensions of a metric, such as different services, instances, or endpoints. 9 | 10 | 11 | ## 🔍 Example: 12 | ```bash 13 | container_cpu_usage_seconds_total{namespace="kube-system", endpoint="https-metrics"} 14 | ``` 15 | - `container_cpu_usage_seconds_total` is the metric. 16 | - `{namespace="kube-system", endpoint="https-metrics"}` are the labels. 17 | 18 | 19 | ## 🛠️ What is PromQL? 20 | - PromQL (Prometheus Query Language) is a powerful and flexible query language used to query data from Prometheus. 21 | - It allows you to retrieve and manipulate time series data, perform mathematical operations, aggregate data, and much more. 22 | 23 | - 🔑 Key Features of PromQL: 24 | - Selecting Time Series: You can select specific metrics with filters and retrieve their data. 25 | - Mathematical Operations: PromQL allows for mathematical operations on metrics. 26 | - Aggregation: You can aggregate data across multiple time series. 27 | - Functionality: PromQL includes a wide range of functions to analyze and manipulate data. 28 | 29 | ## 💡 Basic Examples of PromQL 30 | - `container_cpu_usage_seconds_total` 31 | - Return all time series with the metric container_cpu_usage_seconds_total 32 | - `container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}` 33 | - Return all time series with the metric `container_cpu_usage_seconds_total` and the given `namespace` and `pod` labels. 34 | - `container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}[5m]` 35 | - Return a whole range of time (in this case 5 minutes up to the query time) for the same vector, making it a range vector. 36 | 37 | ## ⚙️ Aggregation & Functions in PromQL 38 | - Aggregation in PromQL allows you to combine multiple time series into a single one, based on certain labels. 39 | - **Sum Up All CPU Usage**: 40 | ```bash 41 | sum(rate(node_cpu_seconds_total[5m])) 42 | ``` 43 | - This query aggregates the CPU usage across all nodes. 44 | 45 | - **Average Memory Usage per Namespace:** 46 | ```bash 47 | avg(container_memory_usage_bytes) by (namespace) 48 | ``` 49 | - This query provides the average memory usage grouped by namespace. 50 | 51 | - **rate() Function:** 52 | - The rate() function calculates the per-second average rate of increase of the time series in a specified range. 53 | ```bash 54 | rate(container_cpu_usage_seconds_total[5m]) 55 | ``` 56 | - This calculates the rate of CPU usage over 5 minutes. 57 | - **increase() Function:** 58 | - The increase() function returns the increase in a counter over a specified time range. 
59 | ```bash
60 | increase(kube_pod_container_status_restarts_total[1h])
61 | ```
62 | - This gives the total increase in container restarts over the last hour.
63 |
64 | - **histogram_quantile() Function:**
65 | - The histogram_quantile() function calculates quantiles (e.g., 95th percentile) from histogram data.
66 | ```bash
67 | histogram_quantile(0.95, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))
68 | ```
69 | - This calculates the 95th percentile of Kubernetes API request durations.
70 |
--------------------------------------------------------------------------------
/day-4/alerts-alertmanager-servicemonitor-manifest/alertmangerconfig.yml:
--------------------------------------------------------------------------------
1 | apiVersion: monitoring.coreos.com/v1alpha1
2 | kind: AlertmanagerConfig
3 | metadata:
4 |   name: main-rules-alert-config
5 |   namespace: monitoring
6 |   labels:
7 |     release: monitoring
8 | spec:
9 |   route:
10 |     repeatInterval: 30m
11 |     receiver: 'null'
12 |     routes:
13 |       - matchers:
14 |           - name: alertname
15 |             value: HighCpuUsage
16 |         receiver: 'send-email'
17 |       - matchers:
18 |           - name: alertname
19 |             value: PodRestart
20 |         receiver: 'send-email'
21 |         repeatInterval: 5m
22 |   receivers:
23 |     - name: 'send-email'
24 |       emailConfigs:
25 |         - to: YOUR_EMAIL_ID
26 |           from: YOUR_EMAIL_ID
27 |           sendResolved: false
28 |           smarthost: smtp.gmail.com:587
29 |           authUsername: YOUR_EMAIL_ID
30 |           authIdentity: YOUR_EMAIL_ID
31 |           authPassword:
32 |             name: mail-pass
33 |             key: gmail-pass
34 |     - name: 'null'
--------------------------------------------------------------------------------
/day-4/alerts-alertmanager-servicemonitor-manifest/alerts.yml:
--------------------------------------------------------------------------------
1 | apiVersion: monitoring.coreos.com/v1
2 | kind: PrometheusRule
3 | metadata:
4 |   name: custom-alert-rules
5 |   namespace: monitoring
6 |   labels:
7 |     release: monitoring # if you installed via Helm, this must match the Helm release name, otherwise Prometheus will not recognize the rule
8 | spec:
9 |   groups:
10 |     - name: custom.rules
11 |       rules:
12 |         - alert: HighCpuUsage
13 |           expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 50
14 |           for: 5m
15 |           labels:
16 |             severity: warning
17 |           annotations:
18 |             summary: "High CPU usage on instance {{ $labels.instance }}"
19 |             description: "CPU usage is above 50% (current value: {{ $value }}%)"
20 |         - alert: PodRestart
21 |           expr: kube_pod_container_status_restarts_total > 2
22 |           for: 0m
23 |           labels:
24 |             severity: critical
25 |           annotations:
26 |             summary: "Pod restart detected in namespace {{ $labels.namespace }}"
27 |             description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has restarted {{ $value }} times"
--------------------------------------------------------------------------------
/day-4/alerts-alertmanager-servicemonitor-manifest/email-secrets.yml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Secret
3 | type: Opaque
4 | metadata:
5 |   name: mail-pass
6 |   namespace: monitoring
7 |   labels:
8 |     release: monitoring
9 | data:
10 |   gmail-pass: <> # base64-encoded app password (Secret "data" values must be base64)
11 |
12 |
--------------------------------------------------------------------------------
/day-4/alerts-alertmanager-servicemonitor-manifest/kustomization.yml:
--------------------------------------------------------------------------------
1 | apiVersion: kustomize.config.k8s.io/v1beta1
2 | kind: Kustomization
3 | namespace: monitoring
4
| resources: 5 | - alerts.yml 6 | - email-secrets.yml 7 | - alertmangerconfig.yml 8 | - serviceMonitor.yml 9 | -------------------------------------------------------------------------------- /day-4/alerts-alertmanager-servicemonitor-manifest/serviceMonitor.yml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app: a-service-service-monitor 6 | release: monitoring 7 | name: a-service-service-monitor 8 | namespace: monitoring 9 | spec: 10 | jobLabel: job 11 | endpoints: 12 | - interval: 2s 13 | port: a-service-port 14 | path: /metrics 15 | selector: 16 | matchLabels: 17 | app: a-service 18 | namespaceSelector: 19 | matchNames: 20 | - dev 21 | -------------------------------------------------------------------------------- /day-4/application/service-a/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM node:18-alpine 2 | 3 | COPY package*.json /usr/app/ 4 | 5 | COPY index.js /usr/app/ 6 | 7 | COPY tracing.js /usr/app/ 8 | 9 | WORKDIR /usr/app 10 | 11 | RUN npm install 12 | 13 | CMD ["node", "index.js"] 14 | -------------------------------------------------------------------------------- /day-4/application/service-a/index.js: -------------------------------------------------------------------------------- 1 | // service-a/index.js 2 | require('dotenv').config(); 3 | require('./tracing'); // Add this line to initialize tracing 4 | const express = require('express'); 5 | const morgan = require('morgan'); 6 | const pino = require('pino'); 7 | const axios = require('axios'); 8 | const promClient = require('prom-client'); 9 | 10 | const app = express(); 11 | 12 | const logger = pino(); 13 | 14 | const logging = () => { 15 | logger.info("Here are the logs") 16 | logger.info("Please have a look ") 17 | logger.info("This is just for testing") 18 | } 19 | 20 | app.use(morgan('common')) 21 | 22 | const PORT = 3001; 23 | 24 | 25 | 26 | 27 | // Prometheus metrics 28 | const httpRequestCounter = new promClient.Counter({ 29 | name: 'http_requests_total', 30 | help: 'Total number of HTTP requests', 31 | labelNames: ['method', 'path', 'status_code'], 32 | }); 33 | 34 | const requestDurationHistogram = new promClient.Histogram({ 35 | name: 'http_request_duration_seconds', 36 | help: 'Duration of HTTP requests in seconds', 37 | labelNames: ['method', 'path', 'status_code'], 38 | buckets: [0.1, 0.5, 1, 5, 10], // Buckets for the histogram in seconds 39 | }); 40 | 41 | const requestDurationSummary = new promClient.Summary({ 42 | name: 'http_request_duration_summary_seconds', 43 | help: 'Summary of the duration of HTTP requests in seconds', 44 | labelNames: ['method', 'path', 'status_code'], 45 | percentiles: [0.5, 0.9, 0.99], // Define your percentiles here 46 | }); 47 | 48 | 49 | 50 | // Gauge metric 51 | const gauge = new promClient.Gauge({ 52 | name: 'node_gauge_example', 53 | help: 'Example of a gauge tracking async task duration', 54 | labelNames: ['method', 'status'] 55 | }); 56 | 57 | // Define an async function that simulates a task taking random time 58 | const simulateAsyncTask = async () => { 59 | const randomTime = Math.random() * 5; // Random time between 0 and 5 seconds 60 | return new Promise((resolve) => setTimeout(resolve, randomTime * 1000)); 61 | }; 62 | 63 | app.disable('etag'); 64 | 65 | // Middleware to track metrics 66 | app.use((req, res, next) => { 67 | const start = Date.now(); 68 | res.on('finish', () => { 69 | 
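// Each completed response is recorded once in all three metric types below
// (counter, histogram, summary), labeled by method, path, and status code.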
const duration = (Date.now() - start) / 1000; // Duration in seconds 70 | const { method, url } = req; 71 | const statusCode = res.statusCode; // Get the actual HTTP status code 72 | httpRequestCounter.labels({ method, path: url, status_code: statusCode }).inc(); 73 | requestDurationHistogram.labels({ method, path: url, status_code: statusCode }).observe(duration); 74 | requestDurationSummary.labels({ method, path: url, status_code: statusCode }).observe(duration); 75 | }); 76 | next(); 77 | }); 78 | 79 | app.get('/', (req, res) => { 80 | res.status(200).json({ 81 | status: "🏃- Running" 82 | }); 83 | }); 84 | 85 | app.get('/healthy', (req, res) => { 86 | res.status(200).json({ 87 | name: "👀 - Obserability 🔥- Abhishek Veeramalla", 88 | status: "healthy" 89 | }) 90 | }); 91 | 92 | app.get('/serverError', (req, res) => { 93 | res.status(500).json({ 94 | error: " Internal server error", 95 | statusCode: 500 96 | }) 97 | }); 98 | 99 | app.get('/notFound', (req, res) => { 100 | res.status(404).json({ 101 | error: "Not Found", 102 | statusCode: "404" 103 | }) 104 | }); 105 | 106 | app.get('/logs', (req, res) => { 107 | logging(); 108 | res.status(200).json({ 109 | objective: "To generate logs" 110 | }) 111 | }); 112 | 113 | 114 | // Simulate a crash by throwing an error 115 | app.get('/crash', (req, res) => { 116 | console.log('Intentionally crashing the server...'); 117 | process.exit(1); 118 | }); 119 | 120 | 121 | // Define the /example route 122 | app.get('/example', async (req, res) => { 123 | const endGauge = gauge.startTimer({ method: req.method, status: res.statusCode }); 124 | await simulateAsyncTask(); 125 | endGauge(); 126 | res.send('Async task completed'); 127 | }); 128 | 129 | // Expose metrics for Prometheus to scrape 130 | app.get('/metrics', async (req, res) => { 131 | res.set('Content-Type', promClient.register.contentType); 132 | res.end(await promClient.register.metrics()); 133 | }); 134 | 135 | // Calling to service-b 136 | app.get('/call-service-b', async (req, res) => { 137 | try { 138 | const response = await axios.get(`${process.env.SERVICE_B_URI}/hello`); 139 | res.send(`

Service B says: ${response.data}

`); 140 | } catch (error) { 141 | res.status(500).send('Error communicating with Service B'); 142 | } 143 | }); 144 | 145 | app.listen(PORT, () => { 146 | console.log(`Service A is running on port ${PORT}`); 147 | }); -------------------------------------------------------------------------------- /day-4/application/service-a/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "app-code", 3 | "version": "1.0.0", 4 | "description": "", 5 | "main": "index.js", 6 | "scripts": { 7 | "test": "echo \"Error: no test specified\" && exit 1", 8 | "start": "node index.js" 9 | }, 10 | "author": "", 11 | "license": "ISC", 12 | "dependencies": { 13 | "@opentelemetry/api": "^1.9.0", 14 | "@opentelemetry/auto-instrumentations-node": "^0.49.2", 15 | "@opentelemetry/exporter-jaeger": "^1.26.0", 16 | "@opentelemetry/exporter-trace-otlp-grpc": "^0.53.0", 17 | "@opentelemetry/instrumentation": "^0.53.0", 18 | "@opentelemetry/instrumentation-express": "^0.41.1", 19 | "@opentelemetry/instrumentation-http": "^0.53.0", 20 | "@opentelemetry/resources": "^1.26.0", 21 | "@opentelemetry/sdk-node": "^0.53.0", 22 | "@opentelemetry/sdk-trace-base": "^1.26.0", 23 | "@opentelemetry/sdk-trace-node": "^1.26.0", 24 | "@opentelemetry/semantic-conventions": "^1.27.0", 25 | "axios": "^1.7.6", 26 | "dotenv": "^16.4.5", 27 | "express": "^4.19.2", 28 | "morgan": "^1.10.0", 29 | "pino": "^9.2.0", 30 | "prom-client": "^15.1.2" 31 | } 32 | } 33 | -------------------------------------------------------------------------------- /day-4/application/service-a/tracing.js: -------------------------------------------------------------------------------- 1 | 'use strict'; 2 | 3 | const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node'); // Updated import 4 | const { JaegerExporter } = require('@opentelemetry/exporter-jaeger'); 5 | const { registerInstrumentations } = require('@opentelemetry/instrumentation'); 6 | const { Resource } = require('@opentelemetry/resources'); 7 | const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions'); 8 | const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base'); 9 | const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http'); 10 | const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express'); 11 | 12 | // Initialize the provider 13 | const provider = new NodeTracerProvider({ 14 | resource: new Resource({ 15 | [SemanticResourceAttributes.SERVICE_NAME]: 'service-a', 16 | }), 17 | }); 18 | 19 | const JAEGER_ENDPOINT = process.env.OTEL_EXPORTER_JAEGER_ENDPOINT 20 | 21 | // Setup the exporter 22 | const exporter = new JaegerExporter({ 23 | endpoint: JAEGER_ENDPOINT, // Replace with the appropriate Jaeger collector endpoint 24 | }); 25 | 26 | // Add the exporter to the provider 27 | provider.addSpanProcessor(new SimpleSpanProcessor(exporter)); 28 | 29 | // Initialize the provider and instrumentations 30 | provider.register(); 31 | 32 | registerInstrumentations({ 33 | instrumentations: [ 34 | new HttpInstrumentation({ 35 | applyCustomAttributesOnSpan: (span, request, response) => { 36 | span.setAttribute('custom-attribute', 'custom-value'); 37 | }, 38 | }), 39 | new ExpressInstrumentation(), // Add this for Express.js instrumentation 40 | ], 41 | }); 42 | 43 | console.log('Tracing initialized'); 44 | -------------------------------------------------------------------------------- /day-4/application/service-b/Dockerfile: 
-------------------------------------------------------------------------------- 1 | FROM node:18-alpine 2 | 3 | COPY package*.json /usr/app/ 4 | 5 | COPY index.js /usr/app/ 6 | 7 | COPY tracing.js /usr/app/ 8 | 9 | WORKDIR /usr/app 10 | 11 | RUN npm install 12 | 13 | CMD ["node", "index.js"] 14 | -------------------------------------------------------------------------------- /day-4/application/service-b/index.js: -------------------------------------------------------------------------------- 1 | // service-b/index.js 2 | require('dotenv').config(); 3 | require('./tracing'); // Add this line to initialize tracing 4 | const express = require('express'); 5 | const morgan = require('morgan'); 6 | 7 | const app = express(); 8 | const PORT = 3002; 9 | app.use(morgan('common')) 10 | 11 | app.get('/hello', (req, res) => { 12 | res.send('Hello from Service B!'); 13 | }); 14 | 15 | app.listen(PORT, () => { 16 | console.log(`Service B is running on port ${PORT}`); 17 | }); -------------------------------------------------------------------------------- /day-4/application/service-b/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "app-code", 3 | "version": "1.0.0", 4 | "description": "", 5 | "main": "index.js", 6 | "scripts": { 7 | "test": "echo \"Error: no test specified\" && exit 1", 8 | "start": "node index.js" 9 | }, 10 | "author": "", 11 | "license": "ISC", 12 | "dependencies": { 13 | "@opentelemetry/api": "^1.9.0", 14 | "@opentelemetry/auto-instrumentations-node": "^0.49.2", 15 | "@opentelemetry/exporter-jaeger": "^1.26.0", 16 | "@opentelemetry/exporter-trace-otlp-grpc": "^0.53.0", 17 | "@opentelemetry/instrumentation": "^0.53.0", 18 | "@opentelemetry/instrumentation-express": "^0.41.1", 19 | "@opentelemetry/instrumentation-http": "^0.53.0", 20 | "@opentelemetry/resources": "^1.26.0", 21 | "@opentelemetry/sdk-node": "^0.53.0", 22 | "@opentelemetry/sdk-trace-base": "^1.26.0", 23 | "@opentelemetry/sdk-trace-node": "^1.26.0", 24 | "@opentelemetry/semantic-conventions": "^1.27.0", 25 | "axios": "^1.7.6", 26 | "dotenv": "^16.4.5", 27 | "express": "^4.19.2", 28 | "morgan": "^1.10.0", 29 | "pino": "^9.2.0", 30 | "prom-client": "^15.1.2" 31 | } 32 | } 33 | -------------------------------------------------------------------------------- /day-4/application/service-b/tracing.js: -------------------------------------------------------------------------------- 1 | 'use strict'; 2 | 3 | const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node'); // Updated import 4 | const { JaegerExporter } = require('@opentelemetry/exporter-jaeger'); 5 | const { registerInstrumentations } = require('@opentelemetry/instrumentation'); 6 | const { Resource } = require('@opentelemetry/resources'); 7 | const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions'); 8 | const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base'); 9 | const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http'); 10 | const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express'); 11 | 12 | // Initialize the provider 13 | const provider = new NodeTracerProvider({ 14 | resource: new Resource({ 15 | [SemanticResourceAttributes.SERVICE_NAME]: 'service-b', 16 | }), 17 | }); 18 | 19 | const JAEGER_ENDPOINT = process.env.OTEL_EXPORTER_JAEGER_ENDPOINT 20 | 21 | // Setup the exporter 22 | const exporter = new JaegerExporter({ 23 | endpoint: JAEGER_ENDPOINT, // Replace with the appropriate Jaeger 
collector endpoint 24 | }); 25 | 26 | // Add the exporter to the provider 27 | provider.addSpanProcessor(new SimpleSpanProcessor(exporter)); 28 | 29 | // Initialize the provider and instrumentations 30 | provider.register(); 31 | 32 | registerInstrumentations({ 33 | instrumentations: [ 34 | new HttpInstrumentation({ 35 | applyCustomAttributesOnSpan: (span, request, response) => { 36 | span.setAttribute('custom-attribute', 'custom-value'); 37 | }, 38 | }), 39 | new ExpressInstrumentation(), // Add this for Express.js instrumentation 40 | ], 41 | }); 42 | 43 | console.log('Tracing initialized'); 44 | -------------------------------------------------------------------------------- /day-4/images/architecture.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iam-veeramalla/observability-zero-to-hero/9445b2364672b23b72f029e65471ed485a0c8950/day-4/images/architecture.gif -------------------------------------------------------------------------------- /day-4/kubernetes-manifest/deployment-svc-a.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app: service-a-deployment 6 | # run: service-a-deployment 7 | name: service-a-deployment 8 | spec: 9 | replicas: 1 10 | selector: 11 | matchLabels: 12 | app: service-a-deployment 13 | template: 14 | metadata: 15 | labels: 16 | app: service-a-deployment 17 | spec: 18 | containers: 19 | - image: abhishekf5/demoservice-a:v 20 | name: service-a 21 | imagePullPolicy: Always 22 | ports: 23 | - containerPort: 3001 24 | env: 25 | - name: OTEL_EXPORTER_JAEGER_ENDPOINT 26 | value: "http://jaeger-collector.tracing:14268/api/traces" 27 | - name: SERVICE_B_URI 28 | value: "http://b-service.dev" 29 | -------------------------------------------------------------------------------- /day-4/kubernetes-manifest/deployment-svc-b.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app: service-b-deployment 6 | # run: service-b-deployment 7 | name: service-b-deployment 8 | spec: 9 | replicas: 1 10 | selector: 11 | matchLabels: 12 | app: service-b-deployment 13 | template: 14 | metadata: 15 | labels: 16 | app: service-b-deployment 17 | spec: 18 | containers: 19 | - image: abhishekf5/demoservice-a:v 20 | name: service-b 21 | imagePullPolicy: Always 22 | ports: 23 | - containerPort: 3002 24 | env: 25 | - name: OTEL_EXPORTER_JAEGER_ENDPOINT 26 | value: "http://jaeger-collector.tracing:14268/api/traces" -------------------------------------------------------------------------------- /day-4/kubernetes-manifest/kustomization.yml: -------------------------------------------------------------------------------- 1 | apiVersion: kustomize.config.k8s.io/v1beta1 2 | kind: Kustomization 3 | namespace: dev 4 | resources: 5 | - deployment-svc-a.yml 6 | - service-svc-a.yml 7 | - deployment-svc-b.yml 8 | - service-svc-b.yml 9 | -------------------------------------------------------------------------------- /day-4/kubernetes-manifest/service-svc-a.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | job: node-api 6 | app: a-service 7 | name: a-service 8 | spec: 9 | ports: 10 | - name: a-service-port 11 | port: 80 12 | protocol: TCP 13 | targetPort: 3001 14 | selector: 15 | app: service-a-deployment 16 | type: 
LoadBalancer
17 |
18 |
--------------------------------------------------------------------------------
/day-4/kubernetes-manifest/service-svc-b.yml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Service
3 | metadata:
4 |   labels:
5 |     job: node-api
6 |     app: b-service
7 |   name: b-service
8 | spec:
9 |   ports:
10 |     - name: b-service-port
11 |       port: 80
12 |       protocol: TCP
13 |       targetPort: 3002
14 |   selector:
15 |     app: service-b-deployment
16 |
17 |
--------------------------------------------------------------------------------
/day-4/readme.md:
--------------------------------------------------------------------------------
1 | ## 🎛️ Instrumentation
2 | - Instrumentation refers to the process of adding monitoring capabilities to your applications, systems, or services.
3 | - This involves embedding or writing code, or using tools, to collect metrics, logs, or traces that provide insights into how the system is performing.
4 |
5 | ## 🎯 Purpose of Instrumentation:
6 | - **Visibility**: It helps you gain visibility into the internal state of your applications and infrastructure.
7 | - **Metrics Collection**: By collecting key metrics like CPU usage, memory consumption, request rates, error rates, etc., you can understand the health and performance of your system.
8 | - **Troubleshooting**: When something goes wrong, instrumentation allows you to diagnose the issue quickly by providing detailed insights.
9 |
10 | ## ⚙️ How it Works:
11 | - **Code-Level Instrumentation**: You can add instrumentation directly in your application code to expose metrics. For example, in a `Node.js` application, you might use a library like prom-client to expose custom metrics.
12 |
13 | ## 📈 Instrumentation in Prometheus:
14 | - 📤 **Exporters**: Prometheus uses exporters to collect metrics from different systems. These exporters expose metrics in a format that Prometheus can scrape and store.
15 | - **Node Exporter**: Collects system-level metrics from Linux/Unix systems.
16 | - **MySQL Exporter (For MySQL Database)**: Collects metrics from a MySQL database.
17 | - **PostgreSQL Exporter (For PostgreSQL Database)**: Collects metrics from a PostgreSQL database.
18 | - 📊 **Custom Metrics**: You can instrument your application to expose custom metrics that are relevant to your specific use case. For example, you might track the number of user logins per minute.
19 |
20 | ## 📈 Types of Metrics in Prometheus
21 | - 🔄️ **Counter**:
22 | - A Counter is a cumulative metric that represents a single numerical value that only ever goes up. It is used for counting events like the number of HTTP requests, errors, or tasks completed.
23 | - **Example**: Counting the number of times a container restarts in your Kubernetes cluster.
24 | - **Metric Example**: `kube_pod_container_status_restarts_total`
25 |
26 | - 📏 **Gauge**:
27 | - A Gauge is a metric that represents a single numerical value that can go up and down. It is typically used for things like memory usage, CPU usage, or the current number of active users.
28 | - **Example**: Monitoring the memory usage of a container in your Kubernetes cluster.
29 | - **Metric Example**: `container_memory_usage_bytes`
30 |
31 | - 📊 **Histogram**:
32 | - A Histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.
33 | - It also provides a sum of all observed values and a count of observations.
34 | - **Example**: Measuring the response time of Kubernetes API requests in various time buckets.
35 | - **Metric Example**: `apiserver_request_duration_seconds_bucket`
36 |
37 | - 📝 **Summary**:
38 | - Similar to a Histogram, a Summary samples observations and provides a total count of observations, their sum, and configurable quantiles (percentiles).
39 | - **Example**: Monitoring the 95th percentile of request durations to understand high latency in your Kubernetes API.
40 | - **Metric Example**: `apiserver_request_duration_seconds_sum`
41 |
42 |
43 | # 🎯 Project Objectives
44 | - 🛠️ **Implement Custom Metrics in Node.js Application**: Use the prom-client library to write and expose custom metrics in the Node.js application.
45 | - 🚨 **Set Up Alerts in Alertmanager**: Configure Alertmanager to send email notifications if a container crashes more than two times.
46 | - 📝 **Set Up Logging**: Implement logging on both application and cluster (node) logs for better observability using the EFK stack (Elasticsearch, FluentBit, Kibana).
47 | - 📸 **Implement Distributed Tracing for Node.js Application**: Enhance observability by instrumenting the Node.js application for distributed tracing using Jaeger, enabling better performance monitoring and troubleshooting of complex, multi-service architectures.
48 |
49 | # 🏠 Architecture
50 | ![Project Architecture](images/architecture.gif)
51 |
52 | ## 1) Write Custom Metrics
53 | - Please take a look at the `day-4/application/service-a/index.js` file to learn more about custom metrics; below is a brief overview:
54 | - **Express Setup**: Initializes an Express application and sets up logging with Morgan.
55 | - **Logging with Pino**: Defines a custom logging function using Pino for structured logging.
56 | - **Prometheus Metrics with prom-client**: Integrates Prometheus for monitoring HTTP requests using the prom-client library:
57 | - `http_requests_total`: counter
58 | - `http_request_duration_seconds`: histogram
59 | - `http_request_duration_summary_seconds`: summary
60 | - `node_gauge_example`: gauge for tracking async task duration
61 | ### Basic Routes:
62 | - `/` : Returns a "Running" status.
63 | - `/healthy`: Returns the health status of the server.
64 | - `/serverError`: Simulates a 500 Internal Server Error.
65 | - `/notFound`: Simulates a 404 Not Found error.
66 | - `/logs`: Generates logs using the custom logging function.
67 | - `/crash`: Simulates a server crash by exiting the process.
68 | - `/example`: Tracks async task duration with a gauge.
69 | - `/metrics`: Exposes the Prometheus metrics endpoint.
70 | - `/call-service-b`: Calls service-b and returns the data it receives.
71 |
72 | ## 2) Dockerize & Push to the Registry
73 | - To containerize the applications and push them to your Docker registry, run the following commands:
74 | ```bash
75 | cd day-4
76 |
77 | # Dockerize microservice - a
78 | docker build -t <your-image>:<tag> application/service-a/
79 |
80 | # Dockerize microservice - b
81 | docker build -t <your-image>:<tag> application/service-b/
82 |
83 | # or use the pre-built images instead:
84 | # - abhishekf5/demoservice-a:v
85 | # - abhishekf5/demoservice-b:v
86 |
87 | ```
88 |
89 | ## 3) Kubernetes manifest
90 | - Review the Kubernetes manifest files located in `day-4/kubernetes-manifest`.
92 | - Apply the Kubernetes manifest files to your cluster by running:
93 | ```bash
94 | kubectl create ns dev
95 |
96 | kubectl apply -k kubernetes-manifest/
97 | ```
98 |
99 | ## 4) Test all the endpoints
100 | - Open a browser, get the LoadBalancer DNS name, and hit the DNS name with the following routes to test the application:
101 | - `/`
102 | - `/healthy`
103 | - `/serverError`
104 | - `/notFound`
105 | - `/logs`
106 | - `/example`
107 | - `/metrics`
108 | - `/call-service-b`
109 | - Alternatively, you can run the automated script `test.sh`, which will automatically send random requests to the LoadBalancer and generate metrics:
110 | ```bash
111 | ./test.sh <loadbalancer-dns-name>
112 | ```
113 |
114 | ## 5) Configure Alertmanager
115 | - Review the Alertmanager configuration files located in `day-4/alerts-alertmanager-servicemonitor-manifest`; below is a brief overview.
116 | - Before configuring Alertmanager, we need credentials to send emails. For this project, we are using Gmail, but any SMTP provider like AWS SES can be used, so grab the credentials for your provider.
117 | - Open your Google account settings, search for "App passwords", create a new app password, and put it (base64-encoded, since it goes under the Secret's `data:` field) in `day-4/alerts-alertmanager-servicemonitor-manifest/email-secrets.yml`.
118 | - One last thing: add your email address in `day-4/alerts-alertmanager-servicemonitor-manifest/alertmangerconfig.yml`.
119 | - **HighCpuUsage**: Triggers a warning alert if the average CPU usage across instances exceeds 50% for more than 5 minutes.
120 | - **PodRestart**: Triggers a critical alert immediately if any pod restarts more than 2 times.
121 | - Apply the manifest files to your cluster by running:
122 | ```bash
123 | kubectl apply -k alerts-alertmanager-servicemonitor-manifest/
124 | ```
125 | - Wait for 4-5 minutes and then check the Prometheus UI to confirm that the custom metrics implemented in the Node.js application are available:
126 | - `http_requests_total`: counter
127 | - `http_request_duration_seconds`: histogram
128 | - `http_request_duration_summary_seconds`: summary
129 | - `node_gauge_example`: gauge for tracking async task duration
130 |
131 | ## 6) Testing Alerts
132 | - To test the alerting system, manually crash the container more than 2 times to trigger an alert (email notification).
133 | - To crash the application container, hit the following endpoint:
134 | - `<loadbalancer-dns-name>/crash`
135 | - You should receive an email once the application container has restarted at least 3 times.
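- If clicking the endpoint manually is tedious, a small loop can trigger the restarts; a sketch (replace the placeholder with your LoadBalancer DNS name, and note that the pod needs time to come back up between hits):
```bash
LB=<loadbalancer-dns-name>
for i in 1 2 3; do
  curl -s "http://$LB/crash" || true   # the request may error out as the process exits
  sleep 60                             # give Kubernetes time to restart the container
done
```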
-------------------------------------------------------------------------------- /day-4/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Set the base URL of your Node.js application 4 | BASE_URL="http://$1" 5 | 6 | echo $BASE_URL 7 | 8 | # Define an array of endpoints 9 | ENDPOINTS=( 10 | "/" 11 | "/healthy" 12 | "/serverError" 13 | "/notFound" 14 | "/logs" 15 | "/example" 16 | "/metrics" 17 | "/call-service-b" 18 | "/call-service-b" 19 | "/call-service-b" 20 | ) 21 | 22 | # Function to make a random request to one of the endpoints 23 | make_random_request() { 24 | local endpoint=${ENDPOINTS[$RANDOM % ${#ENDPOINTS[@]}]} 25 | curl -s -o /dev/null -w "%{http_code}" "$BASE_URL$endpoint" 26 | } 27 | 28 | # Make 1000 random requests 29 | for ((i=1; i<=1000; i++)); do 30 | make_random_request 31 | echo "Request $i completed" 32 | sleep 0.1 # Optional: Sleep for a short duration between requests to simulate real traffic 33 | done 34 | 35 | echo "Completed 1000 requests" 36 | -------------------------------------------------------------------------------- /day-5/fluentbit-values.yaml: -------------------------------------------------------------------------------- 1 | # Default values for fluent-bit. 2 | 3 | # kind -- DaemonSet or Deployment 4 | kind: DaemonSet 5 | 6 | # replicaCount -- Only applicable if kind=Deployment 7 | replicaCount: 1 8 | 9 | image: 10 | repository: cr.fluentbit.io/fluent/fluent-bit 11 | # Overrides the image tag whose default is {{ .Chart.AppVersion }} 12 | # Set to "-" to not use the default value 13 | tag: 14 | digest: 15 | pullPolicy: IfNotPresent 16 | 17 | testFramework: 18 | enabled: true 19 | namespace: 20 | image: 21 | repository: busybox 22 | pullPolicy: Always 23 | tag: latest 24 | digest: 25 | 26 | imagePullSecrets: [] 27 | nameOverride: "" 28 | fullnameOverride: "" 29 | 30 | serviceAccount: 31 | create: true 32 | annotations: {} 33 | name: 34 | 35 | rbac: 36 | create: true 37 | nodeAccess: false 38 | eventsAccess: false 39 | 40 | # Configure podsecuritypolicy 41 | # Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/ 42 | # from Kubernetes 1.25, PSP is deprecated 43 | # See: https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes 44 | # We automatically disable PSP if Kubernetes version is 1.25 or higher 45 | podSecurityPolicy: 46 | create: false 47 | annotations: {} 48 | 49 | # OpenShift-specific configuration 50 | openShift: 51 | enabled: false 52 | securityContextConstraints: 53 | # Create SCC for Fluent-bit and allow use it 54 | create: true 55 | name: "" 56 | annotations: {} 57 | # Use existing SCC in cluster, rather then create new one 58 | existingName: "" 59 | 60 | podSecurityContext: {} 61 | # fsGroup: 2000 62 | 63 | hostNetwork: false 64 | dnsPolicy: ClusterFirst 65 | 66 | dnsConfig: {} 67 | # nameservers: 68 | # - 1.2.3.4 69 | # searches: 70 | # - ns1.svc.cluster-domain.example 71 | # - my.dns.search.suffix 72 | # options: 73 | # - name: ndots 74 | # value: "2" 75 | # - name: edns0 76 | 77 | hostAliases: [] 78 | # - ip: "1.2.3.4" 79 | # hostnames: 80 | # - "foo.local" 81 | # - "bar.local" 82 | 83 | securityContext: {} 84 | # capabilities: 85 | # drop: 86 | # - ALL 87 | # readOnlyRootFilesystem: true 88 | # runAsNonRoot: true 89 | # runAsUser: 1000 90 | 91 | service: 92 | type: ClusterIP 93 | port: 2020 94 | internalTrafficPolicy: 95 | loadBalancerClass: 96 | loadBalancerSourceRanges: [] 97 | labels: {} 98 | # nodePort: 30020 99 | # 
clusterIP: 172.16.10.1 100 | annotations: {} 101 | # prometheus.io/path: "/api/v1/metrics/prometheus" 102 | # prometheus.io/port: "2020" 103 | # prometheus.io/scrape: "true" 104 | externalIPs: [] 105 | # externalIPs: 106 | # - 2.2.2.2 107 | 108 | 109 | serviceMonitor: 110 | enabled: false 111 | # namespace: monitoring 112 | # interval: 10s 113 | # scrapeTimeout: 10s 114 | # selector: 115 | # prometheus: my-prometheus 116 | # ## metric relabel configs to apply to samples before ingestion. 117 | # ## 118 | # metricRelabelings: 119 | # - sourceLabels: [__meta_kubernetes_service_label_cluster] 120 | # targetLabel: cluster 121 | # regex: (.*) 122 | # replacement: ${1} 123 | # action: replace 124 | # ## relabel configs to apply to samples after ingestion. 125 | # ## 126 | # relabelings: 127 | # - sourceLabels: [__meta_kubernetes_pod_node_name] 128 | # separator: ; 129 | # regex: ^(.*)$ 130 | # targetLabel: nodename 131 | # replacement: $1 132 | # action: replace 133 | # scheme: "" 134 | # tlsConfig: {} 135 | 136 | ## Bear in mind if you want to collect metrics from a different port 137 | ## you will need to configure the new ports on the extraPorts property. 138 | additionalEndpoints: [] 139 | # - port: metrics 140 | # path: /metrics 141 | # interval: 10s 142 | # scrapeTimeout: 10s 143 | # scheme: "" 144 | # tlsConfig: {} 145 | # # metric relabel configs to apply to samples before ingestion. 146 | # # 147 | # metricRelabelings: 148 | # - sourceLabels: [__meta_kubernetes_service_label_cluster] 149 | # targetLabel: cluster 150 | # regex: (.*) 151 | # replacement: ${1} 152 | # action: replace 153 | # # relabel configs to apply to samples after ingestion. 154 | # # 155 | # relabelings: 156 | # - sourceLabels: [__meta_kubernetes_pod_node_name] 157 | # separator: ; 158 | # regex: ^(.*)$ 159 | # targetLabel: nodename 160 | # replacement: $1 161 | # action: replace 162 | 163 | prometheusRule: 164 | enabled: false 165 | # namespace: "" 166 | # additionalLabels: {} 167 | # rules: 168 | # - alert: NoOutputBytesProcessed 169 | # expr: rate(fluentbit_output_proc_bytes_total[5m]) == 0 170 | # annotations: 171 | # message: | 172 | # Fluent Bit instance {{ $labels.instance }}'s output plugin {{ $labels.name }} has not processed any 173 | # bytes for at least 15 minutes. 
174 | # summary: No Output Bytes Processed 175 | # for: 15m 176 | # labels: 177 | # severity: critical 178 | 179 | dashboards: 180 | enabled: false 181 | labelKey: grafana_dashboard 182 | labelValue: 1 183 | annotations: {} 184 | namespace: "" 185 | 186 | lifecycle: {} 187 | # preStop: 188 | # exec: 189 | # command: ["/bin/sh", "-c", "sleep 20"] 190 | 191 | livenessProbe: 192 | httpGet: 193 | path: / 194 | port: http 195 | 196 | readinessProbe: 197 | httpGet: 198 | path: /api/v1/health 199 | port: http 200 | 201 | resources: {} 202 | # limits: 203 | # cpu: 100m 204 | # memory: 128Mi 205 | # requests: 206 | # cpu: 100m 207 | # memory: 128Mi 208 | 209 | ## only available if kind is Deployment 210 | ingress: 211 | enabled: false 212 | ingressClassName: "" 213 | annotations: {} 214 | # kubernetes.io/ingress.class: nginx 215 | # kubernetes.io/tls-acme: "true" 216 | hosts: [] 217 | # - host: fluent-bit.example.tld 218 | extraHosts: [] 219 | # - host: fluent-bit-extra.example.tld 220 | ## specify extraPort number 221 | # port: 5170 222 | tls: [] 223 | # - secretName: fluent-bit-example-tld 224 | # hosts: 225 | # - fluent-bit.example.tld 226 | 227 | ## only available if kind is Deployment 228 | autoscaling: 229 | vpa: 230 | enabled: false 231 | 232 | annotations: {} 233 | 234 | # List of resources that the vertical pod autoscaler can control. Defaults to cpu and memory 235 | controlledResources: [] 236 | 237 | # Define the max allowed resources for the pod 238 | maxAllowed: {} 239 | # cpu: 200m 240 | # memory: 100Mi 241 | # Define the min allowed resources for the pod 242 | minAllowed: {} 243 | # cpu: 200m 244 | # memory: 100Mi 245 | 246 | updatePolicy: 247 | # Specifies whether recommended updates are applied when a Pod is started and whether recommended updates 248 | # are applied during the life of a Pod. Possible values are "Off", "Initial", "Recreate", and "Auto". 
249 | updateMode: Auto 250 | 251 | enabled: false 252 | minReplicas: 1 253 | maxReplicas: 3 254 | targetCPUUtilizationPercentage: 75 255 | # targetMemoryUtilizationPercentage: 75 256 | ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics 257 | customRules: [] 258 | # - type: Pods 259 | # pods: 260 | # metric: 261 | # name: packets-per-second 262 | # target: 263 | # type: AverageValue 264 | # averageValue: 1k 265 | ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-configurable-scaling-behavior 266 | behavior: {} 267 | # scaleDown: 268 | # policies: 269 | # - type: Pods 270 | # value: 4 271 | # periodSeconds: 60 272 | # - type: Percent 273 | # value: 10 274 | # periodSeconds: 60 275 | 276 | ## only available if kind is Deployment 277 | podDisruptionBudget: 278 | enabled: false 279 | annotations: {} 280 | maxUnavailable: "30%" 281 | 282 | nodeSelector: {} 283 | 284 | tolerations: [] 285 | 286 | affinity: {} 287 | 288 | labels: {} 289 | 290 | annotations: {} 291 | 292 | podAnnotations: {} 293 | 294 | podLabels: {} 295 | 296 | ## How long (in seconds) a pods needs to be stable before progressing the deployment 297 | ## 298 | minReadySeconds: 299 | 300 | ## How long (in seconds) a pod may take to exit (useful with lifecycle hooks to ensure lb deregistration is done) 301 | ## 302 | terminationGracePeriodSeconds: 303 | 304 | priorityClassName: "" 305 | 306 | env: [] 307 | # - name: FOO 308 | # value: "bar" 309 | 310 | # The envWithTpl array below has the same usage as "env", but is using the tpl function to support templatable string. 311 | # This can be useful when you want to pass dynamic values to the Chart using the helm argument "--set =" 312 | # https://helm.sh/docs/howto/charts_tips_and_tricks/#using-the-tpl-function 313 | envWithTpl: [] 314 | # - name: FOO_2 315 | # value: "{{ .Values.foo2 }}" 316 | # 317 | # foo2: bar2 318 | 319 | envFrom: [] 320 | 321 | # This supports either a structured array or a templatable string 322 | extraContainers: [] 323 | 324 | # Array mode 325 | # extraContainers: 326 | # - name: do-something 327 | # image: busybox 328 | # command: ['do', 'something'] 329 | 330 | # String mode 331 | # extraContainers: |- 332 | # - name: do-something 333 | # image: bitnami/kubectl:{{ .Capabilities.KubeVersion.Major }}.{{ .Capabilities.KubeVersion.Minor }} 334 | # command: ['kubectl', 'version'] 335 | 336 | flush: 1 337 | 338 | metricsPort: 2020 339 | 340 | extraPorts: [] 341 | # - port: 5170 342 | # containerPort: 5170 343 | # protocol: TCP 344 | # name: tcp 345 | # nodePort: 30517 346 | 347 | extraVolumes: [] 348 | 349 | extraVolumeMounts: [] 350 | 351 | updateStrategy: {} 352 | # type: RollingUpdate 353 | # rollingUpdate: 354 | # maxUnavailable: 1 355 | 356 | # Make use of a pre-defined configmap instead of the one templated here 357 | existingConfigMap: "" 358 | 359 | networkPolicy: 360 | enabled: false 361 | # ingress: 362 | # from: [] 363 | 364 | luaScripts: 365 | setIndex.lua: | 366 | function set_index(tag, timestamp, record) 367 | index = "abhishek-" 368 | if record["kubernetes"] ~= nil then 369 | if record["kubernetes"]["namespace_name"] == "logging" then 370 | return -1, timestamp, record -- Skip logs from the logging namespace 371 | end 372 | if record["kubernetes"]["namespace_name"] ~= nil then 373 | if record["kubernetes"]["container_name"] ~= nil then 374 | record["es_index"] = index 375 | .. 
record["kubernetes"]["namespace_name"] 376 | .. "-" 377 | .. record["kubernetes"]["container_name"] 378 | return 1, timestamp, record 379 | end 380 | record["es_index"] = index 381 | .. record["kubernetes"]["namespace_name"] 382 | return 1, timestamp, record 383 | end 384 | end 385 | return 1, timestamp, record 386 | end 387 | 388 | ## https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/configuration-file 389 | config: 390 | service: | 391 | [SERVICE] 392 | Daemon Off 393 | Flush {{ .Values.flush }} 394 | Log_Level {{ .Values.logLevel }} 395 | Parsers_File /fluent-bit/etc/parsers.conf 396 | Parsers_File /fluent-bit/etc/conf/custom_parsers.conf 397 | HTTP_Server On 398 | HTTP_Listen 0.0.0.0 399 | HTTP_Port {{ .Values.metricsPort }} 400 | Health_Check On 401 | 402 | ## https://docs.fluentbit.io/manual/pipeline/inputs 403 | inputs: | 404 | [INPUT] 405 | Name tail 406 | Path /var/log/containers/*.log 407 | multiline.parser docker, cri 408 | Tag kube.* 409 | Mem_Buf_Limit 5MB 410 | Skip_Long_Lines On 411 | 412 | [INPUT] 413 | Name systemd 414 | Tag host.* 415 | Systemd_Filter _SYSTEMD_UNIT=kubelet.service 416 | Read_From_Tail On 417 | 418 | ## https://docs.fluentbit.io/manual/pipeline/filters 419 | filters: | 420 | [FILTER] 421 | Name kubernetes 422 | Match kube.* 423 | Merge_Log On 424 | Keep_Log Off 425 | K8S-Logging.Parser On 426 | K8S-Logging.Exclude On 427 | 428 | [FILTER] 429 | Name lua 430 | Match kube.* 431 | script /fluent-bit/scripts/setIndex.lua 432 | call set_index 433 | 434 | ## https://docs.fluentbit.io/manual/pipeline/outputs 435 | outputs: | 436 | [OUTPUT] 437 | Name es 438 | Match kube.* 439 | Type _doc 440 | Host elasticsearch-master 441 | Port 9200 442 | HTTP_User elastic 443 | HTTP_Passwd cbTQj1qxRIPNF5uc 444 | tls On 445 | tls.verify Off 446 | Logstash_Format On 447 | Logstash_Prefix logstash 448 | Retry_Limit False 449 | Suppress_Type_Name On 450 | 451 | [OUTPUT] 452 | Name es 453 | Match host.* 454 | Type _doc 455 | Host elasticsearch-master 456 | Port 9200 457 | HTTP_User elastic 458 | HTTP_Passwd cbTQj1qxRIPNF5uc 459 | tls On 460 | tls.verify Off 461 | Logstash_Format On 462 | Logstash_Prefix node 463 | Retry_Limit False 464 | Suppress_Type_Name On 465 | 466 | ## https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/upstream-servers 467 | ## This configuration is deprecated, please use `extraFiles` instead. 468 | upstream: {} 469 | 470 | ## https://docs.fluentbit.io/manual/pipeline/parsers 471 | customParsers: | 472 | [PARSER] 473 | Name docker_no_time 474 | Format json 475 | Time_Keep Off 476 | Time_Key time 477 | Time_Format %Y-%m-%dT%H:%M:%S.%L 478 | 479 | # This allows adding more files with arbitrary filenames to /fluent-bit/etc/conf by providing key/value pairs. 480 | # The key becomes the filename, the value becomes the file content. 
481 | extraFiles: {} 482 | # upstream.conf: | 483 | # [UPSTREAM] 484 | # upstream1 485 | # 486 | # [NODE] 487 | # name node-1 488 | # host 127.0.0.1 489 | # port 43000 490 | # example.conf: | 491 | # [OUTPUT] 492 | # Name example 493 | # Match foo.* 494 | # Host bar 495 | 496 | # The config volume is mounted by default, either to the existingConfigMap value, or the default of "fluent-bit.fullname" 497 | volumeMounts: 498 | - name: config 499 | mountPath: /fluent-bit/etc/conf 500 | 501 | daemonSetVolumes: 502 | - name: varlog 503 | hostPath: 504 | path: /var/log 505 | - name: varlibdockercontainers 506 | hostPath: 507 | path: /var/lib/docker/containers 508 | - name: etcmachineid 509 | hostPath: 510 | path: /etc/machine-id 511 | type: File 512 | 513 | daemonSetVolumeMounts: 514 | - name: varlog 515 | mountPath: /var/log 516 | - name: varlibdockercontainers 517 | mountPath: /var/lib/docker/containers 518 | readOnly: true 519 | - name: etcmachineid 520 | mountPath: /etc/machine-id 521 | readOnly: true 522 | 523 | command: 524 | - /fluent-bit/bin/fluent-bit 525 | 526 | args: 527 | - --workdir=/fluent-bit/etc 528 | - --config=/fluent-bit/etc/conf/fluent-bit.conf 529 | 530 | # This supports either a structured array or a templatable string 531 | initContainers: [] 532 | 533 | # Array mode 534 | # initContainers: 535 | # - name: do-something 536 | # image: bitnami/kubectl:1.22 537 | # command: ['kubectl', 'version'] 538 | 539 | # String mode 540 | # initContainers: |- 541 | # - name: do-something 542 | # image: bitnami/kubectl:{{ .Capabilities.KubeVersion.Major }}.{{ .Capabilities.KubeVersion.Minor }} 543 | # command: ['kubectl', 'version'] 544 | 545 | logLevel: info 546 | 547 | hotReload: 548 | enabled: false 549 | image: 550 | repository: ghcr.io/jimmidyson/configmap-reload 551 | tag: v0.11.1 552 | digest: 553 | pullPolicy: IfNotPresent 554 | resources: {} 555 | 556 | -------------------------------------------------------------------------------- /day-5/images/architecture.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iam-veeramalla/observability-zero-to-hero/9445b2364672b23b72f029e65471ed485a0c8950/day-5/images/architecture.gif -------------------------------------------------------------------------------- /day-5/readme.md: -------------------------------------------------------------------------------- 1 | # 🔍 Logging overview 2 | - Logging is crucial in any distributed system, especially in Kubernetes, to monitor application behavior, detect issues, and ensure the smooth functioning of microservices. 3 | 4 | 5 | ## 🚀 Importance: 6 | - **Debugging**: Logs provide critical information when debugging issues in applications. 7 | - **Auditing**: Logs serve as an audit trail, showing what actions were taken and by whom. 8 | - **Performance** Monitoring: Analyzing logs can help identify performance bottlenecks. 9 | - **Security**: Logs help in detecting unauthorized access or malicious activities. 10 | 11 | ## 🛠️ Tools Available for Logging in Kubernetes 12 | - 🗂️ EFK Stack (Elasticsearch, Fluentbit, Kibana) 13 | - 🗂️ EFK Stack (Elasticsearch, FluentD, Kibana) 14 | - 🗂️ ELK Stack (Elasticsearch, Logstash, Kibana) 15 | - 📊 Promtail + Loki + Grafana 16 | 17 | ## 📦 EFK Stack (Elasticsearch, Fluentbit, Kibana) 18 | - EFK is a popular logging stack used to collect, store, and analyze logs in Kubernetes. 19 | - **Elasticsearch**: Stores and indexes log data for easy retrieval. 
- **Fluentbit**: A lightweight log forwarder that collects logs from different sources and sends them to Elasticsearch.
- **Kibana**: A visualization tool that allows users to explore and analyze logs stored in Elasticsearch.

# 🏠 Architecture
![Project Architecture](images/architecture.gif)


## 📝 Step-by-Step Setup

### 1) Create IAM Role for Service Account
```bash
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster observability \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve
```
- This command creates an IAM role for the EBS CSI controller.
- The IAM role allows the EBS CSI controller to interact with AWS resources, specifically for managing EBS volumes in the Kubernetes cluster.
- We will attach this role to the service account.

### 2) Retrieve IAM Role ARN
```bash
ARN=$(aws iam get-role --role-name AmazonEKS_EBS_CSI_DriverRole --query 'Role.Arn' --output text)
```
- This command retrieves the ARN of the IAM role created for the EBS CSI controller service account.

### 3) Deploy EBS CSI Driver
```bash
eksctl create addon --cluster observability --name aws-ebs-csi-driver --version latest \
  --service-account-role-arn $ARN --force
```
- The above command deploys the AWS EBS CSI driver as an addon to your Kubernetes cluster.
- It uses the previously created IAM service account role to allow the driver to manage EBS volumes securely.

### 4) Create Namespace for Logging
```bash
kubectl create namespace logging
```

### 5) Install Elasticsearch on K8s

```bash
helm repo add elastic https://helm.elastic.co

helm install elasticsearch \
 --set replicas=1 \
 --set volumeClaimTemplate.storageClassName=gp2 \
 --set persistence.labels.enabled=true elastic/elasticsearch -n logging
```
- Installs Elasticsearch in the `logging` namespace.
- It sets the number of replicas, specifies the storage class, and enables persistence labels to ensure
data is stored on persistent volumes.

### 6) Retrieve Elasticsearch Username & Password
```bash
# for username
kubectl get secrets --namespace=logging elasticsearch-master-credentials -ojsonpath='{.data.username}' | base64 -d
# for password
kubectl get secrets --namespace=logging elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d
```
- Retrieves the username and password for the Elasticsearch cluster's master credentials from the Kubernetes secret.
- Both values are base64 encoded, so they need to be decoded before use (the `base64 -d` above does this).
- 👉 **Note**: Please write down the password for future reference.

### 7) Install Kibana
```bash
helm install kibana --set service.type=LoadBalancer elastic/kibana -n logging
```
- Kibana provides a user-friendly interface for exploring and visualizing data stored in Elasticsearch.
- It is exposed as a LoadBalancer service, making it accessible from outside the cluster; a lookup sketch follows below.
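To open Kibana you will need the DNS name of its LoadBalancer. A quick lookup sketch, assuming the chart created a service named `kibana-kibana` (verify the exact name in the `kubectl get svc` output):

```bash
# List services in the logging namespace and pull the Kibana LoadBalancer hostname
kubectl get svc -n logging
kubectl get svc kibana-kibana -n logging \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```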
### 8) Install Fluentbit with Custom Values/Configurations
- 👉 **Note**: Please update the `HTTP_Passwd` field in the `fluentbit-values.yaml` file with the password retrieved earlier in step 6 (e.g. `NJyO47UqeYBsoaEU`).
```bash
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit -f fluentbit-values.yaml -n logging
```

## ✅ Conclusion
- We have successfully installed the EFK stack in our Kubernetes cluster, which includes Elasticsearch for storing logs, Fluentbit for collecting and forwarding logs, and Kibana for visualizing logs.
- To verify the setup, access the Kibana dashboard by entering the LoadBalancer DNS name followed by `:5601` in your browser:
  - `http://LOAD_BALANCER_DNS_NAME:5601`
- Use the username and password retrieved in step 6 to log in.
- Once logged in, create a new data view in Kibana and explore the logs collected from your Kubernetes cluster.



## 🧼 Clean Up
```bash

helm uninstall monitoring -n monitoring

helm uninstall fluent-bit -n logging

helm uninstall elasticsearch -n logging

helm uninstall kibana -n logging

cd day-4

kubectl delete -k kubernetes-manifest/

kubectl delete -k alerts-alertmanager-servicemonitor-manifest/


eksctl delete cluster --name observability

```
--------------------------------------------------------------------------------
/day-6/images/architecture.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iam-veeramalla/observability-zero-to-hero/9445b2364672b23b72f029e65471ed485a0c8950/day-6/images/architecture.gif
--------------------------------------------------------------------------------
/day-6/jaeger-values.yaml:
--------------------------------------------------------------------------------
storage:
  type: elasticsearch
  elasticsearch:
    host: elasticsearch-master.logging.svc # Replace with your Elasticsearch service DNS
    port: 9200
    scheme: https
    user: elastic # Replace with the actual username if necessary
    password: cbTQj1qxRIPNF5uc # Replace with the actual password
    tls:
      enabled: true
      ca: /tls/ca-cert.pem # Path where the CA cert is mounted

provisionDataStore:
  cassandra: false
  elasticsearch: false

query:
  cmdlineParams:
    es.tls.ca: "/tls/ca-cert.pem"
  extraConfigmapMounts:
    - name: jaeger-tls
      mountPath: /tls
      subPath: ""
      configMap: jaeger-tls
      readOnly: true

collector:
  cmdlineParams:
    es.tls.ca: "/tls/ca-cert.pem"
  extraConfigmapMounts:
    - name: jaeger-tls
      mountPath: /tls
      subPath: ""
      configMap: jaeger-tls
      readOnly: true
--------------------------------------------------------------------------------
/day-6/readme.md:
--------------------------------------------------------------------------------
## 🕵️‍♂️ What is Jaeger?
- Jaeger is an open-source, end-to-end distributed tracing system used for monitoring and troubleshooting microservices-based architectures. It helps developers understand how requests flow through a complex system, by tracing the path a request takes and measuring how long each step in that path takes.

## ❓ Why Use Jaeger?
5 | - In modern applications, especially microservices architectures, a single user request can touch multiple services. When something goes wrong, it’s challenging to pinpoint the source of the problem. Jaeger helps by: 6 | 7 | - 🐢 **Identifying bottlenecks**: See where your application spends most of its time. 8 | - 🔍 **Finding root causes of errors**: Trace errors back to their source. 9 | - ⚡ **Optimizing performance**: Understand and improve the latency of services. 10 | 11 | 12 | ## 📚 Core Concepts of Jaeger 13 | 14 | - 🛤️ **Trace**: A trace represents the journey of a request as it travels through various services. Think of it as a detailed map that shows every stop a request makes in your system. 15 | - 📏 **Span**: Each trace is made up of multiple spans. A span is a single operation within a trace, such as an API call or a database query. It has a start time and a duration. 16 | - 🏷️ **Tags**: Tags are key-value pairs that provide additional context about a span. For example, a tag might indicate the HTTP method used (GET, POST) or the status code returned. 17 | - 📝 **Logs**: Logs in a span provide details about what’s happening during that operation. They can capture events like errors or important checkpoints. 18 | - 🔗 **Context Propagation**: For Jaeger to trace requests across services, it needs to propagate context. This means each service in the call chain passes along the trace information to the next service. 19 | 20 | # 🏠 Architecture 21 | ![Project Architecture](images/architecture.gif) 22 | 23 | 24 | 25 | ## ⚙️ Setting Up Jaeger 26 | 27 | ### Step 1: Instrumenting Your Code 28 | - To start tracing, you need to instrument your services. This means adding tracing capabilities to your code. Most popular programming languages and frameworks have libraries or middleware that make this easy. 29 | - We have already instrumented our code using OpenTelemetry libraries/packages. For more details, refer to `day-4/application/service-a/tracing.js` or `day-4/application/service-b/tracing.js`. 30 | 31 | 32 | ### Step 2: Components of Jaeger 33 | - Jaeger consists of several components: 34 | - Agent: Collects traces from your application. 35 | - Collector: Receives traces from the agent and processes them. 36 | - Query: Provides a UI to view traces. 37 | - Storage: Stores traces for later retrieval (often a database like *Elasticsearch*). 38 | 39 | 40 | ### Step 3: Export Elasticsearch CA Certificate 41 | - This command retrieves the CA certificate from the Elasticsearch master certificate secret and decodes it, saving it to a ca-cert.pem file. 42 | ```bash 43 | kubectl get secret elasticsearch-master-certs -n logging -o jsonpath='{.data.ca\.crt}' | base64 --decode > ca-cert.pem 44 | ``` 45 | 46 | ### Step 4: Create Tracing Namespace 47 | - Creates a new Kubernetes namespace called tracing if it doesn't already exist, where Jaeger components will be installed. 48 | ```bash 49 | kubectl create ns tracing 50 | ``` 51 | 52 | ### Step 5: Create ConfigMap for Jaeger's TLS Certificate 53 | - Creates a ConfigMap in the tracing namespace, containing the CA certificate to be used by Jaeger for TLS. 54 | ```bash 55 | kubectl create configmap jaeger-tls --from-file=ca-cert.pem -n tracing 56 | ``` 57 | ### Step 6: Create Secret for Elasticsearch TLS 58 | - Creates a Kubernetes Secret in the tracing namespace, containing the CA certificate for Elasticsearch TLS communication. 
```bash
kubectl create secret generic es-tls-secret --from-file=ca-cert.pem -n tracing
```
### Step 7: Add Jaeger Helm Repository
- Adds the official Jaeger Helm chart repository to your Helm setup, making it available for installations.
```bash
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts

helm repo update
```

### Step 8: Install Jaeger with Custom Values
- 👉 **Note**: Please update the `password` field and other related fields in the `jaeger-values.yaml` file with the password retrieved earlier in day-5, step 6 (e.g. `NJyO47UqeYBsoaEU`).
- This command installs Jaeger into the tracing namespace using a custom jaeger-values.yaml configuration file. Ensure the password is updated in the file before installation.
```bash
helm install jaeger jaegertracing/jaeger -n tracing --values jaeger-values.yaml
```
### Step 9: Port Forward Jaeger Query Service
- This command forwards port 8080 on your local machine to the Jaeger Query service, allowing you to access the Jaeger UI locally.
```bash
kubectl port-forward svc/jaeger-query 8080:80 -n tracing

```

## 🧼 Clean Up
```bash

helm uninstall jaeger -n tracing

helm uninstall elasticsearch -n logging

# Also delete the PVC created for Elasticsearch

helm uninstall monitoring -n monitoring

cd day-4

kubectl delete -k kubernetes-manifest/

kubectl delete -k alerts-alertmanager-servicemonitor-manifest/

# Delete cluster
eksctl delete cluster --name observability

```

--------------------------------------------------------------------------------
/day-7/README.md:
--------------------------------------------------------------------------------
## 📊 What is OpenTelemetry?
- OpenTelemetry is an open-source observability framework for generating, collecting, and exporting telemetry data (traces, metrics, logs) to help monitor applications.

## 🛠️ How is it Different from Other Libraries?
- OpenTelemetry offers a unified standard for observability across multiple tools and vendors, unlike other libraries that may focus only on a specific aspect like tracing or metrics.

## ⏳ What Existed Before OpenTelemetry?
- Before OpenTelemetry, observability was typically managed using a combination of specialized tools for different aspects:
  - `Tracing`: Tools like Jaeger and Zipkin were used to track requests.
  - `Metrics`: Solutions like Prometheus and StatsD were popular for collecting metrics.
  - `Logging`: Tools like the ELK Stack (Elasticsearch, Logstash, Kibana) or Fluentd were used to aggregate and analyze logs.
- OpenTelemetry unified these by standardizing how telemetry data is collected and exported.
- Prior to OpenTelemetry, there were OpenTracing and OpenCensus, which OpenTelemetry merged to provide a more comprehensive and standardized observability solution.

## 🌐 Supported Programming Languages

OpenTelemetry supports several languages, including:

- **Go**
- **Java**
- **JavaScript**
- **Python**
- **C#**
- **C++**
- **Ruby**
- **PHP**
- **Swift**
- ...and others.
## Architecture

### 🖥️ Step 1: Create EKS Cluster

```bash
eksctl create cluster --name=observability \
  --region=us-east-1 \
  --zones=us-east-1a,us-east-1b \
  --without-nodegroup
```
```bash
eksctl utils associate-iam-oidc-provider \
  --region us-east-1 \
  --cluster observability \
  --approve
```
```bash
eksctl create nodegroup --cluster=observability \
  --region=us-east-1 \
  --name=observability-ng-private \
  --node-type=t3.medium \
  --nodes-min=2 \
  --nodes-max=3 \
  --node-volume-size=20 \
  --managed \
  --asg-access \
  --external-dns-access \
  --full-ecr-access \
  --appmesh-access \
  --alb-ingress-access \
  --node-private-networking

# Update the ~/.kube/config file
aws eks update-kubeconfig --name observability
```

### 🔐 Step 2: Create IAM Role for Service Account
```bash
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster observability \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve
```
- This command creates an IAM role for the EBS CSI controller.
- The IAM role allows the EBS CSI controller to interact with AWS resources, specifically for managing EBS volumes in the Kubernetes cluster.
- We will attach this role to the service account.

### 📝 Step 3: Retrieve IAM Role ARN
```bash
ARN=$(aws iam get-role --role-name AmazonEKS_EBS_CSI_DriverRole --query 'Role.Arn' --output text)
```
- This command retrieves the ARN of the IAM role created for the EBS CSI controller service account.

### 📦 Step 4: Deploy EBS CSI Driver
```bash
eksctl create addon --cluster observability --name aws-ebs-csi-driver --version latest \
  --service-account-role-arn $ARN --force
```
- The above command deploys the AWS EBS CSI driver as an addon to your Kubernetes cluster.
- It uses the previously created IAM service account role to allow the driver to manage EBS volumes securely.


### 🧩 Step 5: Understand the Application
- We have two very simple microservices, A (`microservice-a`) and B (`microservice-b`), built with Golang using the Gin web framework for handling HTTP requests (a local smoke-test sketch follows this list).
- **Microservice A** API Endpoints:
  - `GET /hello-a` – Returns a greeting message
  - `GET /call-b` – Calls another service (Service B) and returns its response
  - `GET /getme-coffee` – Fetches and returns data from an external coffee API
- **Microservice B** API Endpoints:
  - `GET /hello-b` – Returns a greeting message
  - `GET /call-a` – Calls another service (Service A) and returns its response
  - `GET /getme-coffee` – Fetches and returns data from an external coffee API
- Observability:
  - OpenTelemetry SDK integrated for tracing and metrics.
  - Metrics and traces are exported to the OpenTelemetry Collector via OTLP over HTTP.
- Instrumentation:
  - Uses OpenTelemetry middleware (otelgin) for automatic request tracing.
  - Instruments HTTP clients with otelhttp for distributed tracing of outbound requests.
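Before containerizing anything, you can smoke-test these endpoints locally. A minimal sketch, assuming Docker and Go 1.23 are installed and you run it from `day-7/microservice-a`; overriding `PORT` to `8080` is a hypothetical choice, since binding to the default `80` outside a container usually needs elevated privileges:

```bash
# Start the local OTel Collector, Prometheus, and Jaeger defined in docker-compose.yml
docker compose up -d

# Run the service; an explicit PORT wins over the PORT=80 in .env
# (godotenv does not override variables already set in the environment)
PORT=8080 go run . &
sleep 3

# Exercise the endpoints that don't depend on microservice-b
curl -s http://localhost:8080/hello-a
curl -s http://localhost:8080/getme-coffee
```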
### 🐳 Step 6: Dockerize & push it to the registry
```bash
# Dockerize microservice - a
docker build -t <IMAGE_NAME>:<TAG> microservice-a/

# Dockerize microservice - b
docker build -t <IMAGE_NAME>:<TAG> microservice-b/

# push both images
docker push <IMAGE_NAME>:<TAG>
docker push <IMAGE_NAME>:<TAG>
```


### 🗂️ Step 7: Create Namespace for observability components
```bash
kubectl create namespace olly
```

### 📚 Step 8: Install Elasticsearch on K8s
```bash
helm repo add elastic https://helm.elastic.co

helm install elasticsearch \
  --set replicas=1 \
  --set volumeClaimTemplate.storageClassName=gp2 \
  --set persistence.labels.enabled=true elastic/elasticsearch -n olly
```


### 📜 Step 9: Export Elasticsearch CA Certificate
- This command retrieves the CA certificate from the Elasticsearch master certificate secret and decodes it, saving it to a ca-cert.pem file.
```bash
kubectl get secret elasticsearch-master-certs -n olly -o jsonpath='{.data.ca\.crt}' | base64 --decode > ca-cert.pem
```

### 🔑 Step 10: Create ConfigMap for Jaeger's TLS Certificate
- Creates a ConfigMap in the olly namespace, containing the CA certificate to be used by Jaeger for TLS.
```bash
kubectl create configmap jaeger-tls --from-file=ca-cert.pem -n olly
```

### 🛡️ Step 11: Create Secret for Elasticsearch TLS
- Creates a Kubernetes Secret in the olly namespace, containing the CA certificate for Elasticsearch TLS communication.
```bash
kubectl create secret generic es-tls-secret --from-file=ca-cert.pem -n olly
```

### 🔍 Step 12: Retrieve Elasticsearch Username & Password
```bash
# for username
kubectl get secrets --namespace=olly elasticsearch-master-credentials -ojsonpath='{.data.username}' | base64 -d
# for password
kubectl get secrets --namespace=olly elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d
```
- Retrieves the username and password for the Elasticsearch cluster's master credentials from the Kubernetes secret.
- 👉 **Note**: Please write down the password for future reference.


### 🕵️‍♂️ Step 13: Install Jaeger with Custom Values
- 👉 **Note**: Please update the `password` field and other related fields in the `jaeger-values.yaml` file with the password retrieved in the previous step, step 12 (e.g. `NJyO47UqeYBsoaEU`).
- This command installs Jaeger into the olly namespace using a custom jaeger-values.yaml configuration file. Ensure the password is updated in the file before installation.
```bash
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

helm install jaeger jaegertracing/jaeger -n olly --values jaeger-values.yaml
```

### 🌐 Step 14: Access UI - Port Forward Jaeger Query Service
```bash
kubectl port-forward svc/jaeger-query 8080:80 -n olly
```


### 📈 Step 15: Install OpenTelemetry Collector
```bash
# The chart lives in the OpenTelemetry community Helm repository; add it first
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

helm install otel-collector open-telemetry/opentelemetry-collector -n olly --values otel-collector-values.yaml
```

### 📊 Step 16: Install Prometheus
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus prometheus-community/prometheus -n olly --values prometheus-values.yaml
```

### 🚀 Step 17: Deploy the application
- ***Note:*** Review the Kubernetes manifest files located in `./k8s-manifests`, and change the image name & tag to your own image.
```bash
kubectl apply -k k8s-manifests/
```
- 👉 ***Note***: Wait ~5 minutes until your load balancers are in a running state.

### 🔄 Step 18: Generate Load
- Script: `test.sh` takes two load balancer DNS addresses as input arguments and alternates requests between them using curl.
- `test.sh` continuously sends random HTTP requests every second to predefined routes on the two provided load balancer DNS names.
- ***Note:*** Keep the script running in another terminal to quickly gather metrics & traces.

```bash
./test.sh http://Microservice_A_LOAD_BALANCER_DNS http://Microservice_B_LOAD_BALANCER_DNS
```

### 📊 Step 19: Access the UI of Prometheus
```bash
kubectl port-forward svc/prometheus-server 9090:80 -n olly
```
- Look for your application's metrics like `request_count`, `request_duration_ms`, `active_requests`, and others to monitor request rates & performance.


### 🕵️‍♂️ Step 20: Access the UI of Jaeger
```bash
kubectl port-forward svc/jaeger-query 8080:80 -n olly
```
- Look for traces from the service names microservice-a and microservice-b and operations such as `[/hello-a, /call-b, and /getme-coffee]` or `[/hello-b, /call-a, and /getme-coffee]` to monitor request flows and dependencies.

## ✅ Conclusion
- By following the above steps, you have successfully set up an observability stack using OpenTelemetry on an EKS cluster. This setup allows you to monitor your microservices effectively through integrated tracing, metrics, and logging; a final verification sketch follows below.
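As a quick end-to-end check, you can confirm from the command line that metrics and traces actually reached Prometheus and Jaeger. A minimal sketch, assuming the port-forwards from steps 19 and 20 are still running in other terminals:

```bash
# Metrics: the custom counter should return series once test.sh has generated load
curl -s 'http://localhost:9090/api/v1/query?query=request_count' | head -c 300; echo

# Traces: the Jaeger query API should list both instrumented services,
# i.e. the output should include "microservice-a" and "microservice-b"
curl -s 'http://localhost:8080/api/services'
```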
229 | 230 | ## 🧼 Clean Up 231 | ```bash 232 | helm uninstall prometheus -n olly 233 | helm uninstall otel-collector -n olly 234 | helm uninstall jaeger -n olly 235 | helm uninstall elasticsearch -n olly 236 | 237 | 238 | 239 | kubectl delete -k k8s-manifests/ 240 | 241 | 242 | kubectl delete ns olly 243 | 244 | eksctl delete cluster --name observability 245 | ``` -------------------------------------------------------------------------------- /day-7/jaeger-values.yaml: -------------------------------------------------------------------------------- 1 | storage: 2 | type: elasticsearch 3 | elasticsearch: 4 | host: elasticsearch-master.olly.svc # Replace with your Elasticsearch service DNS 5 | port: 9200 6 | scheme: https 7 | user: elastic # Replace with the actual username if necessary 8 | password: F2Dm1tKzDQDYnNXR # Replace with the actual password 9 | tls: 10 | enabled: true 11 | ca: /tls/ca-cert.pem # Path where the CA cert is mounted 12 | 13 | provisionDataStore: 14 | cassandra: false 15 | elasticsearch: false 16 | 17 | query: 18 | cmdlineParams: 19 | es.tls.ca: "/tls/ca-cert.pem" 20 | extraConfigmapMounts: 21 | - name: jaeger-tls 22 | mountPath: /tls 23 | subPath: "" 24 | configMap: jaeger-tls 25 | readOnly: true 26 | 27 | collector: 28 | image: 29 | repository: jaegertracing/jaeger-collector 30 | tag: latest 31 | 32 | # Configure the Collector service to expose OTLP ports 33 | service: 34 | type: ClusterIP 35 | otlp: 36 | grpc: 37 | name: otlp-grpc 38 | # enabled: true 39 | port: 4317 # gRPC OTLP port 40 | http: 41 | name: otlp-http 42 | # enabled: true 43 | port: 4318 # HTTP OTLP port 44 | 45 | 46 | 47 | cmdlineParams: 48 | es.tls.ca: "/tls/ca-cert.pem" 49 | collector.otlp.grpc.host-port: "0.0.0.0:4317" # Enable OTLP gRPC receiver on port 4317 50 | collector.otlp.http.host-port: "0.0.0.0:4318" # Enable OTLP HTTP receiver on port 4318 51 | 52 | extraConfigmapMounts: 53 | - name: jaeger-tls 54 | mountPath: /tls 55 | subPath: "" 56 | configMap: jaeger-tls 57 | readOnly: true 58 | 59 | 60 | # Define the service ports for OTLP receivers 61 | ports: 62 | otlp-grpc: 63 | enabled: true 64 | containerPort: 4317 65 | servicePort: 4317 66 | protocol: TCP 67 | otlp-http: 68 | enabled: true 69 | containerPort: 4318 70 | servicePort: 4318 71 | protocol: TCP 72 | -------------------------------------------------------------------------------- /day-7/k8s-manifests/deployment-a.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app: go-service-a-deployment 6 | # run: go-service-a-deployment 7 | name: go-service-a-deployment 8 | spec: 9 | replicas: 1 10 | selector: 11 | matchLabels: 12 | app: go-service-a-deployment 13 | template: 14 | metadata: 15 | labels: 16 | app: go-service-a-deployment 17 | spec: 18 | containers: 19 | # - image: ankitjodhani/golang-svc-a:latest 20 | - image: <>:<> 21 | name: service-a 22 | imagePullPolicy: Always 23 | ports: 24 | - containerPort: 80 25 | env: 26 | - name: OTEL_COLLECTOR_ENDPOINT 27 | value: "otel-collector-opentelemetry-collector.olly:4318" 28 | - name: SVC_B_URI 29 | value: "http://b-service.dev" 30 | - name: PORT 31 | value: "80" 32 | -------------------------------------------------------------------------------- /day-7/k8s-manifests/deployment-b.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app: go-service-b-deployment 6 | # run: 
go-service-b-deployment 7 | name: go-service-b-deployment 8 | spec: 9 | replicas: 1 10 | selector: 11 | matchLabels: 12 | app: go-service-b-deployment 13 | template: 14 | metadata: 15 | labels: 16 | app: go-service-b-deployment 17 | spec: 18 | containers: 19 | # - image: ankitjodhani/golang-svc-b:latest 20 | - image: <>:<> 21 | name: service-a 22 | imagePullPolicy: Always 23 | ports: 24 | - containerPort: 80 25 | env: 26 | - name: OTEL_COLLECTOR_ENDPOINT 27 | value: "otel-collector-opentelemetry-collector.olly:4318" 28 | - name: SVC_A_URI 29 | value: "http://a-service.dev" 30 | - name: PORT 31 | value: "80" 32 | -------------------------------------------------------------------------------- /day-7/k8s-manifests/kustomization.yml: -------------------------------------------------------------------------------- 1 | apiVersion: kustomize.config.k8s.io/v1beta1 2 | kind: Kustomization 3 | namespace: dev 4 | resources: 5 | - namespace.yml 6 | - deployment-a.yml 7 | - deployment-b.yml 8 | - svc-a.yml 9 | - svc-b.yml 10 | -------------------------------------------------------------------------------- /day-7/k8s-manifests/namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: dev -------------------------------------------------------------------------------- /day-7/k8s-manifests/svc-a.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | job: go-api 6 | app: a-service 7 | name: a-service 8 | annotations: 9 | prometheus.io/scrape: "true" 10 | prometheus.io/port: "80" 11 | prometheus.io/path: "/metrics" 12 | 13 | spec: 14 | ports: 15 | - name: a-service-port 16 | port: 80 17 | protocol: TCP 18 | targetPort: 80 19 | selector: 20 | app: go-service-a-deployment 21 | type: LoadBalancer 22 | 23 | -------------------------------------------------------------------------------- /day-7/k8s-manifests/svc-b.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | job: go-api 6 | app: b-service 7 | name: b-service 8 | annotations: 9 | prometheus.io/scrape: "true" 10 | prometheus.io/port: "80" 11 | prometheus.io/path: "/metrics" 12 | 13 | spec: 14 | ports: 15 | - name: b-service-port 16 | port: 80 17 | protocol: TCP 18 | targetPort: 80 19 | selector: 20 | app: go-service-b-deployment 21 | type: LoadBalancer 22 | 23 | -------------------------------------------------------------------------------- /day-7/microservice-a/.dockerignore: -------------------------------------------------------------------------------- 1 | .env -------------------------------------------------------------------------------- /day-7/microservice-a/.env: -------------------------------------------------------------------------------- 1 | SVC_B_URI=http://localhost:8081 2 | OTEL_COLLECTOR_ENDPOINT=localhost:4318 3 | PORT=80 -------------------------------------------------------------------------------- /day-7/microservice-a/docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: '3' 2 | 3 | services: 4 | otel-collector: 5 | image: otel/opentelemetry-collector-contrib:latest 6 | command: ["--config=/etc/otel-collector-config.yaml"] 7 | volumes: 8 | - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml 9 | ports: 10 | - "4317:4317" # OTLP gRPC receiver 11 | - "4318:4318" # OTLP 
HTTP receiver 12 | - "8889:8889" # Prometheus metrics exporter 13 | depends_on: 14 | - jaeger 15 | 16 | prometheus: 17 | image: prom/prometheus:latest 18 | volumes: 19 | - ./prometheus.yaml:/etc/prometheus/prometheus.yml 20 | command: 21 | - "--config.file=/etc/prometheus/prometheus.yml" 22 | ports: 23 | - "9090:9090" 24 | depends_on: 25 | - otel-collector 26 | 27 | jaeger: 28 | image: jaegertracing/all-in-one:latest 29 | ports: 30 | - "16686:16686" # Jaeger UI 31 | - "14250:14250" # Jaeger gRPC receiver 32 | 33 | -------------------------------------------------------------------------------- /day-7/microservice-a/dockerfile: -------------------------------------------------------------------------------- 1 | # Use official Golang image as the build image 2 | FROM golang:1.23-alpine AS builder 3 | 4 | WORKDIR /app 5 | 6 | COPY go.mod ./ 7 | COPY go.sum ./ 8 | RUN go mod download 9 | 10 | COPY . ./ 11 | 12 | RUN go build -o service-a . 13 | 14 | # Use a minimal image for the runtime 15 | FROM alpine:latest 16 | 17 | WORKDIR /app 18 | 19 | COPY --from=builder /app/service-a . 20 | 21 | EXPOSE 80 22 | 23 | CMD ["./service-a"] 24 | -------------------------------------------------------------------------------- /day-7/microservice-a/go.mod: -------------------------------------------------------------------------------- 1 | module microservice-a 2 | 3 | go 1.23.0 4 | 5 | require ( 6 | github.com/gin-gonic/gin v1.10.0 7 | github.com/joho/godotenv v1.5.1 8 | go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin v0.55.0 9 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.55.0 10 | go.opentelemetry.io/otel v1.30.0 11 | go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.30.0 12 | go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.30.0 13 | go.opentelemetry.io/otel/metric v1.30.0 14 | go.opentelemetry.io/otel/sdk v1.30.0 15 | go.opentelemetry.io/otel/sdk/metric v1.30.0 16 | ) 17 | 18 | require ( 19 | github.com/bytedance/sonic v1.12.2 // indirect 20 | github.com/bytedance/sonic/loader v0.2.0 // indirect 21 | github.com/cenkalti/backoff/v4 v4.3.0 // indirect 22 | github.com/cloudwego/base64x v0.1.4 // indirect 23 | github.com/cloudwego/iasm v0.2.0 // indirect 24 | github.com/felixge/httpsnoop v1.0.4 // indirect 25 | github.com/gabriel-vasile/mimetype v1.4.5 // indirect 26 | github.com/gin-contrib/sse v0.1.0 // indirect 27 | github.com/go-logr/logr v1.4.2 // indirect 28 | github.com/go-logr/stdr v1.2.2 // indirect 29 | github.com/go-playground/locales v0.14.1 // indirect 30 | github.com/go-playground/universal-translator v0.18.1 // indirect 31 | github.com/go-playground/validator/v10 v10.22.1 // indirect 32 | github.com/goccy/go-json v0.10.3 // indirect 33 | github.com/google/uuid v1.6.0 // indirect 34 | github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0 // indirect 35 | github.com/json-iterator/go v1.1.12 // indirect 36 | github.com/klauspost/cpuid/v2 v2.2.8 // indirect 37 | github.com/leodido/go-urn v1.4.0 // indirect 38 | github.com/mattn/go-isatty v0.0.20 // indirect 39 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect 40 | github.com/modern-go/reflect2 v1.0.2 // indirect 41 | github.com/pelletier/go-toml/v2 v2.2.3 // indirect 42 | github.com/twitchyliquid64/golang-asm v0.15.1 // indirect 43 | github.com/ugorji/go/codec v1.2.12 // indirect 44 | go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.30.0 // indirect 45 | go.opentelemetry.io/otel/trace v1.30.0 // indirect 46 | 
go.opentelemetry.io/proto/otlp v1.3.1 // indirect 47 | golang.org/x/arch v0.10.0 // indirect 48 | golang.org/x/crypto v0.27.0 // indirect 49 | golang.org/x/net v0.29.0 // indirect 50 | golang.org/x/sys v0.25.0 // indirect 51 | golang.org/x/text v0.18.0 // indirect 52 | google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1 // indirect 53 | google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1 // indirect 54 | google.golang.org/grpc v1.66.1 // indirect 55 | google.golang.org/protobuf v1.34.2 // indirect 56 | gopkg.in/yaml.v3 v3.0.1 // indirect 57 | ) 58 | -------------------------------------------------------------------------------- /day-7/microservice-a/go.sum: -------------------------------------------------------------------------------- 1 | github.com/bytedance/sonic v1.12.2 h1:oaMFuRTpMHYLpCntGca65YWt5ny+wAceDERTkT2L9lg= 2 | github.com/bytedance/sonic v1.12.2/go.mod h1:B8Gt/XvtZ3Fqj+iSKMypzymZxw/FVwgIGKzMzT9r/rk= 3 | github.com/bytedance/sonic/loader v0.1.1/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU= 4 | github.com/bytedance/sonic/loader v0.2.0 h1:zNprn+lsIP06C/IqCHs3gPQIvnvpKbbxyXQP1iU4kWM= 5 | github.com/bytedance/sonic/loader v0.2.0/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU= 6 | github.com/cenkalti/backoff/v4 v4.3.0 h1:MyRJ/UdXutAwSAT+s3wNd7MfTIcy71VQueUuFK343L8= 7 | github.com/cenkalti/backoff/v4 v4.3.0/go.mod h1:Y3VNntkOUPxTVeUxJ/G5vcM//AlwfmyYozVcomhLiZE= 8 | github.com/cloudwego/base64x v0.1.4 h1:jwCgWpFanWmN8xoIUHa2rtzmkd5J2plF/dnLS6Xd/0Y= 9 | github.com/cloudwego/base64x v0.1.4/go.mod h1:0zlkT4Wn5C6NdauXdJRhSKRlJvmclQ1hhJgA0rcu/8w= 10 | github.com/cloudwego/iasm v0.2.0 h1:1KNIy1I1H9hNNFEEH3DVnI4UujN+1zjpuk6gwHLTssg= 11 | github.com/cloudwego/iasm v0.2.0/go.mod h1:8rXZaNYT2n95jn+zTI1sDr+IgcD2GVs0nlbbQPiEFhY= 12 | github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 13 | github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= 14 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 15 | github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg= 16 | github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U= 17 | github.com/gabriel-vasile/mimetype v1.4.5 h1:J7wGKdGu33ocBOhGy0z653k/lFKLFDPJMG8Gql0kxn4= 18 | github.com/gabriel-vasile/mimetype v1.4.5/go.mod h1:ibHel+/kbxn9x2407k1izTA1S81ku1z/DlgOW2QE0M4= 19 | github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE= 20 | github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI= 21 | github.com/gin-gonic/gin v1.10.0 h1:nTuyha1TYqgedzytsKYqna+DfLos46nTv2ygFy86HFU= 22 | github.com/gin-gonic/gin v1.10.0/go.mod h1:4PMNQiOhvDRa013RKVbsiNwoyezlm2rm0uX/T7kzp5Y= 23 | github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A= 24 | github.com/go-logr/logr v1.4.2 h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY= 25 | github.com/go-logr/logr v1.4.2/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= 26 | github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag= 27 | github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE= 28 | github.com/go-playground/assert/v2 v2.2.0 h1:JvknZsQTYeFEAhQwI4qEt9cyV5ONwRHC+lYKSsYSR8s= 29 | github.com/go-playground/assert/v2 v2.2.0/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4= 30 | github.com/go-playground/locales v0.14.1 
h1:EWaQ/wswjilfKLTECiXz7Rh+3BjFhfDFKv/oXslEjJA= 31 | github.com/go-playground/locales v0.14.1/go.mod h1:hxrqLVvrK65+Rwrd5Fc6F2O76J/NuW9t0sjnWqG1slY= 32 | github.com/go-playground/universal-translator v0.18.1 h1:Bcnm0ZwsGyWbCzImXv+pAJnYK9S473LQFuzCbDbfSFY= 33 | github.com/go-playground/universal-translator v0.18.1/go.mod h1:xekY+UJKNuX9WP91TpwSH2VMlDf28Uj24BCp08ZFTUY= 34 | github.com/go-playground/validator/v10 v10.22.1 h1:40JcKH+bBNGFczGuoBYgX4I6m/i27HYW8P9FDk5PbgA= 35 | github.com/go-playground/validator/v10 v10.22.1/go.mod h1:dbuPbCMFw/DrkbEynArYaCwl3amGuJotoKCe95atGMM= 36 | github.com/goccy/go-json v0.10.3 h1:KZ5WoDbxAIgm2HNbYckL0se1fHD6rz5j4ywS6ebzDqA= 37 | github.com/goccy/go-json v0.10.3/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M= 38 | github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI= 39 | github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= 40 | github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= 41 | github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0= 42 | github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= 43 | github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0 h1:asbCHRVmodnJTuQ3qamDwqVOIjwqUPTYmYuemVOx+Ys= 44 | github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0/go.mod h1:ggCgvZ2r7uOoQjOyu2Y1NhHmEPPzzuhWgcza5M1Ji1I= 45 | github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0= 46 | github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4= 47 | github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM= 48 | github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo= 49 | github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg= 50 | github.com/klauspost/cpuid/v2 v2.2.8 h1:+StwCXwm9PdpiEkPyzBXIy+M9KUb4ODm0Zarf1kS5BM= 51 | github.com/klauspost/cpuid/v2 v2.2.8/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws= 52 | github.com/knz/go-libedit v1.10.1/go.mod h1:MZTVkCWyz0oBc7JOWP3wNAzd002ZbM/5hgShxwh4x8M= 53 | github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= 54 | github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= 55 | github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= 56 | github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= 57 | github.com/leodido/go-urn v1.4.0 h1:WT9HwE9SGECu3lg4d/dIA+jxlljEa1/ffXKmRjqdmIQ= 58 | github.com/leodido/go-urn v1.4.0/go.mod h1:bvxc+MVxLKB4z00jd1z+Dvzr47oO32F/QSNjSBOlFxI= 59 | github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY= 60 | github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y= 61 | github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= 62 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg= 63 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= 64 | github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M= 65 | github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk= 66 | github.com/pelletier/go-toml/v2 v2.2.3 h1:YmeHyLY8mFWbdkNWwpr+qIL2bEqT0o95WSdkNHvL12M= 67 | github.com/pelletier/go-toml/v2 v2.2.3/go.mod 
h1:MfCQTFTvCcUyyvvwm1+G6H/jORL20Xlb6rzQu9GuUkc= 68 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= 69 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 70 | github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8= 71 | github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4= 72 | github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 73 | github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw= 74 | github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo= 75 | github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= 76 | github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 77 | github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 78 | github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU= 79 | github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4= 80 | github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg= 81 | github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= 82 | github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI= 83 | github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08= 84 | github.com/ugorji/go/codec v1.2.12 h1:9LC83zGrHhuUA9l16C9AHXAqEV/2wBQ4nkvumAE65EE= 85 | github.com/ugorji/go/codec v1.2.12/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg= 86 | go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin v0.55.0 h1:n4Dd8YaDFeTd2uw+uCHJzOKeqfLgAOlePZpQ5f9cAoE= 87 | go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin v0.55.0/go.mod h1:8aCCTMjP225r98yevEMM5NYDb3ianWLoeIzZ1rPyxHU= 88 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.55.0 h1:ZIg3ZT/aQ7AfKqdwp7ECpOK6vHqquXXuyTjIO8ZdmPs= 89 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.55.0/go.mod h1:DQAwmETtZV00skUwgD6+0U89g80NKsJE3DCKeLLPQMI= 90 | go.opentelemetry.io/contrib/propagators/b3 v1.30.0 h1:vumy4r1KMyaoQRltX7cJ37p3nluzALX9nugCjNNefuY= 91 | go.opentelemetry.io/contrib/propagators/b3 v1.30.0/go.mod h1:fRbvRsaeVZ82LIl3u0rIvusIel2UUf+JcaaIpy5taho= 92 | go.opentelemetry.io/otel v1.30.0 h1:F2t8sK4qf1fAmY9ua4ohFS/K+FUuOPemHUIXHtktrts= 93 | go.opentelemetry.io/otel v1.30.0/go.mod h1:tFw4Br9b7fOS+uEao81PJjVMjW/5fvNCbpsDIXqP0pc= 94 | go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.30.0 h1:VrMAbeJz4gnVDg2zEzjHG4dEH86j4jO6VYB+NgtGD8s= 95 | go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.30.0/go.mod h1:qqN/uFdpeitTvm+JDqqnjm517pmQRYxTORbETHq5tOc= 96 | go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.30.0 h1:lsInsfvhVIfOI6qHVyysXMNDnjO9Npvl7tlDPJFBVd4= 97 | go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.30.0/go.mod h1:KQsVNh4OjgjTG0G6EiNi1jVpnaeeKsKMRwbLN+f1+8M= 98 | go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.30.0 h1:umZgi92IyxfXd/l4kaDhnKgY8rnN/cZcF1LKc6I8OQ8= 99 | go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.30.0/go.mod h1:4lVs6obhSVRb1EW5FhOuBTyiQhtRtAnnva9vD3yRfq8= 100 | go.opentelemetry.io/otel/metric v1.30.0 h1:4xNulvn9gjzo4hjg+wzIKG7iNFEaBMX00Qd4QIZs7+w= 101 | 
go.opentelemetry.io/otel/metric v1.30.0/go.mod h1:aXTfST94tswhWEb+5QjlSqG+cZlmyXy/u8jFpor3WqQ= 102 | go.opentelemetry.io/otel/sdk v1.30.0 h1:cHdik6irO49R5IysVhdn8oaiR9m8XluDaJAs4DfOrYE= 103 | go.opentelemetry.io/otel/sdk v1.30.0/go.mod h1:p14X4Ok8S+sygzblytT1nqG98QG2KYKv++HE0LY/mhg= 104 | go.opentelemetry.io/otel/sdk/metric v1.30.0 h1:QJLT8Pe11jyHBHfSAgYH7kEmT24eX792jZO1bo4BXkM= 105 | go.opentelemetry.io/otel/sdk/metric v1.30.0/go.mod h1:waS6P3YqFNzeP01kuo/MBBYqaoBJl7efRQHOaydhy1Y= 106 | go.opentelemetry.io/otel/trace v1.30.0 h1:7UBkkYzeg3C7kQX8VAidWh2biiQbtAKjyIML8dQ9wmc= 107 | go.opentelemetry.io/otel/trace v1.30.0/go.mod h1:5EyKqTzzmyqB9bwtCCq6pDLktPK6fmGf/Dph+8VI02o= 108 | go.opentelemetry.io/proto/otlp v1.3.1 h1:TrMUixzpM0yuc/znrFTP9MMRh8trP93mkCiDVeXrui0= 109 | go.opentelemetry.io/proto/otlp v1.3.1/go.mod h1:0X1WI4de4ZsLrrJNLAQbFeLCm3T7yBkR0XqQ7niQU+8= 110 | golang.org/x/arch v0.10.0 h1:S3huipmSclq3PJMNe76NGwkBR504WFkQ5dhzWzP8ZW8= 111 | golang.org/x/arch v0.10.0/go.mod h1:FEVrYAQjsQXMVJ1nsMoVVXPZg6p2JE2mx8psSWTDQys= 112 | golang.org/x/crypto v0.27.0 h1:GXm2NjJrPaiv/h1tb2UH8QfgC/hOf/+z0p6PT8o1w7A= 113 | golang.org/x/crypto v0.27.0/go.mod h1:1Xngt8kV6Dvbssa53Ziq6Eqn0HqbZi5Z6R0ZpwQzt70= 114 | golang.org/x/net v0.29.0 h1:5ORfpBpCs4HzDYoodCDBbwHzdR5UrLBZ3sOnUJmFoHo= 115 | golang.org/x/net v0.29.0/go.mod h1:gLkgy8jTGERgjzMic6DS9+SP0ajcu6Xu3Orq/SpETg0= 116 | golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 117 | golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 118 | golang.org/x/sys v0.25.0 h1:r+8e+loiHxRqhXVl6ML1nO3l1+oFoWbnlu2Ehimmi34= 119 | golang.org/x/sys v0.25.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= 120 | golang.org/x/text v0.18.0 h1:XvMDiNzPAl0jr17s6W9lcaIhGUfUORdGCNsuLmPG224= 121 | golang.org/x/text v0.18.0/go.mod h1:BuEKDfySbSR4drPmRPG/7iBdf8hvFMuRexcpahXilzY= 122 | google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1 h1:hjSy6tcFQZ171igDaN5QHOw2n6vx40juYbC/x67CEhc= 123 | google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1/go.mod h1:qpvKtACPCQhAdu3PyQgV4l3LMXZEtft7y8QcarRsp9I= 124 | google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1 h1:pPJltXNxVzT4pK9yD8vR9X75DaWYYmLGMsEvBfFQZzQ= 125 | google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1/go.mod h1:UqMtugtsSgubUsoxbuAoiCXvqvErP7Gf0so0mK9tHxU= 126 | google.golang.org/grpc v1.66.1 h1:hO5qAXR19+/Z44hmvIM4dQFMSYX9XcWsByfoxutBpAM= 127 | google.golang.org/grpc v1.66.1/go.mod h1:s3/l6xSSCURdVfAnL+TqCNMyTDAGN6+lZeVxnZR128Y= 128 | google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg= 129 | google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw= 130 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 131 | gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk= 132 | gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= 133 | gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 134 | gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= 135 | gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 136 | nullprogram.com/x/optparse v1.0.0/go.mod h1:KdyPE+Igbe0jQUrVfMqDMeJQIJZEuyV7pjYmp6pbG50= 137 | 
-------------------------------------------------------------------------------- /day-7/microservice-a/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "context" 5 | "fmt" 6 | "io/ioutil" 7 | "log" 8 | "net/http" 9 | "os" 10 | "time" 11 | 12 | "github.com/gin-gonic/gin" 13 | "github.com/joho/godotenv" 14 | 15 | "go.opentelemetry.io/otel" 16 | "go.opentelemetry.io/otel/attribute" 17 | "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp" 18 | "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp" 19 | "go.opentelemetry.io/otel/metric" 20 | sdkmetric "go.opentelemetry.io/otel/sdk/metric" 21 | "go.opentelemetry.io/otel/sdk/resource" 22 | "go.opentelemetry.io/otel/sdk/trace" 23 | semconv "go.opentelemetry.io/otel/semconv/v1.21.0" 24 | 25 | "go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin" 26 | "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp" 27 | ) 28 | 29 | var ( 30 | requestCounter metric.Int64Counter 31 | requestDuration metric.Float64Histogram 32 | activeRequestsCounter metric.Int64UpDownCounter 33 | ) 34 | 35 | func initProvider() (func(context.Context) error, error) { 36 | ctx := context.Background() 37 | 38 | // Load environment variables 39 | err := godotenv.Load() 40 | if err != nil { 41 | log.Println("No .env file found. Using environment variables 👌") 42 | } 43 | 44 | // Read the OTEL collector endpoint from environment variable 45 | otelEndpoint := os.Getenv("OTEL_COLLECTOR_ENDPOINT") 46 | if otelEndpoint == "" { 47 | otelEndpoint = "localhost:4318" // Default endpoint 48 | } 49 | 50 | // Create a resource with the service name 51 | res, err := resource.New(ctx, 52 | resource.WithAttributes( 53 | semconv.ServiceNameKey.String("microservice-a"), 54 | ), 55 | ) 56 | if err != nil { 57 | return nil, fmt.Errorf("failed to create resource: %w", err) 58 | } 59 | 60 | // Create OTLP trace exporter over HTTP with custom endpoint 61 | traceExporter, err := otlptracehttp.New(ctx, 62 | otlptracehttp.WithEndpoint(otelEndpoint), 63 | otlptracehttp.WithInsecure(), 64 | ) 65 | if err != nil { 66 | return nil, fmt.Errorf("failed to create trace exporter: %w", err) 67 | } 68 | 69 | // Create OTLP metric exporter over HTTP with custom endpoint 70 | metricExporter, err := otlpmetrichttp.New(ctx, 71 | otlpmetrichttp.WithEndpoint(otelEndpoint), 72 | otlpmetrichttp.WithInsecure(), 73 | ) 74 | if err != nil { 75 | return nil, fmt.Errorf("failed to create metric exporter: %w", err) 76 | } 77 | 78 | // Create trace provider with the exporter and resource 79 | tracerProvider := trace.NewTracerProvider( 80 | trace.WithBatcher(traceExporter), 81 | trace.WithResource(res), 82 | ) 83 | 84 | // Create metric reader and meter provider with the resource 85 | metricReader := sdkmetric.NewPeriodicReader(metricExporter) 86 | meterProvider := sdkmetric.NewMeterProvider( 87 | sdkmetric.WithReader(metricReader), 88 | sdkmetric.WithResource(res), 89 | ) 90 | 91 | // Set global providers 92 | otel.SetTracerProvider(tracerProvider) 93 | otel.SetMeterProvider(meterProvider) 94 | 95 | return func(ctx context.Context) error { 96 | err := tracerProvider.Shutdown(ctx) 97 | if err != nil { 98 | return err 99 | } 100 | err = meterProvider.Shutdown(ctx) 101 | if err != nil { 102 | return err 103 | } 104 | return nil 105 | }, nil 106 | } 107 | 108 | // Basic Hello Handler 109 | func hello(c *gin.Context) { 110 | startTime := time.Now() 111 | ctx := c.Request.Context() 112 | 113 | // 
Increment active requests 114 | 	activeRequestsCounter.Add(ctx, 1) 115 | 	defer activeRequestsCounter.Add(ctx, -1) 116 | 117 | 	c.JSON(http.StatusOK, gin.H{ 118 | 		"message": "👋 Hello from microservice-a", 119 | 	}) 120 | 121 | 	duration := time.Since(startTime).Milliseconds() 122 | 123 | 	requestCounter.Add(ctx, 1, metric.WithAttributes(attribute.String("endpoint", "/hello-a"))) 124 | 	requestDuration.Record(ctx, float64(duration), metric.WithAttributes(attribute.String("endpoint", "/hello-a"))) 125 | } 126 | 127 | // Call Service B Handler 128 | func callB(c *gin.Context) { 129 | 	startTime := time.Now() 130 | 	ctx := c.Request.Context() 131 | 132 | 	activeRequestsCounter.Add(ctx, 1) 133 | 	defer activeRequestsCounter.Add(ctx, -1) 134 | 135 | 	// Load environment variables 136 | 	err := godotenv.Load() 137 | 	if err != nil { 138 | 		log.Println("No .env file found. Using environment variables 👌") 139 | 	} 140 | 141 | 	SVC_B_URI := os.Getenv("SVC_B_URI") 142 | 	if SVC_B_URI == "" { 143 | 		SVC_B_URI = "http://localhost:8081" // Default URI for service-B 144 | 	} 145 | 146 | 	// Create a new HTTP client with OpenTelemetry instrumentation 147 | 	client := http.Client{ 148 | 		Transport: otelhttp.NewTransport(http.DefaultTransport), 149 | 	} 150 | 151 | 	// Create a new request 152 | 	req, err := http.NewRequest("GET", fmt.Sprintf("%s/hello-b", SVC_B_URI), nil) 153 | 	if err != nil { 154 | 		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create request to service-B"}) 155 | 		return 156 | 	} 157 | 158 | 	// Use the context from Gin 159 | 	req = req.WithContext(ctx) 160 | 161 | 	// Make the request 162 | 	resp, err := client.Do(req) 163 | 	if err != nil { 164 | 		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to reach service-B"}) 165 | 		return 166 | 	} 167 | 	defer resp.Body.Close() 168 | 169 | 	resBody, _ := ioutil.ReadAll(resp.Body) 170 | 171 | 	c.JSON(http.StatusOK, gin.H{ 172 | 		"message":  "🥳 Response from service-B", 173 | 		"response": string(resBody), 174 | 	}) 175 | 176 | 	duration := time.Since(startTime).Milliseconds() 177 | 178 | 	requestCounter.Add(ctx, 1, metric.WithAttributes(attribute.String("endpoint", "/call-b"))) 179 | 	requestDuration.Record(ctx, float64(duration), metric.WithAttributes(attribute.String("endpoint", "/call-b"))) 180 | } 181 | 182 | // Get Coffee Handler 183 | func getMeCoffee(c *gin.Context) { 184 | 	startTime := time.Now() 185 | 	ctx := c.Request.Context() 186 | 187 | 	activeRequestsCounter.Add(ctx, 1) 188 | 	defer activeRequestsCounter.Add(ctx, -1) 189 | 190 | 	// Create a new HTTP client with OpenTelemetry instrumentation 191 | 	client := http.Client{ 192 | 		Transport: otelhttp.NewTransport(http.DefaultTransport), 193 | 	} 194 | 195 | 	// Create a new request 196 | 	req, err := http.NewRequest("GET", "https://api.sampleapis.com/coffee/iced", nil) 197 | 	if err != nil { 198 | 		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create request to coffee API"}) 199 | 		return 200 | 	} 201 | 202 | 	// Use the context from Gin 203 | 	req = req.WithContext(ctx) 204 | 205 | 	// Make the request 206 | 	resp, err := client.Do(req) 207 | 	if err != nil { 208 | 		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to fetch coffee"}) 209 | 		return 210 | 	} 211 | 	defer resp.Body.Close() 212 | 213 | 	resBody, _ := ioutil.ReadAll(resp.Body) 214 | 215 | 	c.JSON(http.StatusOK, gin.H{ 216 | 		"message":  "🍵 Here is your coffee", 217 | 		"response": string(resBody), 218 | 	}) 219 | 220 | 	duration := time.Since(startTime).Milliseconds() 221 | 222 | 	requestCounter.Add(ctx, 1, 
metric.WithAttributes(attribute.String("endpoint", "/getme-coffee"))) 223 | requestDuration.Record(ctx, float64(duration), metric.WithAttributes(attribute.String("endpoint", "/getme-coffee"))) 224 | } 225 | 226 | func main() { 227 | ctx := context.Background() 228 | shutdown, err := initProvider() 229 | if err != nil { 230 | log.Fatalf("Failed to initialize OpenTelemetry: %v", err) 231 | } 232 | defer func() { 233 | if err := shutdown(ctx); err != nil { 234 | log.Fatalf("Error shutting down provider: %v", err) 235 | } 236 | }() 237 | 238 | router := gin.Default() 239 | 240 | // Use OpenTelemetry middleware for Gin 241 | router.Use(otelgin.Middleware("microservice-a")) 242 | 243 | // Initialize the Meter 244 | meter := otel.GetMeterProvider().Meter("microservice-a") 245 | 246 | // Initialize instruments using the Meter interface methods 247 | requestCounter, err = meter.Int64Counter( 248 | "request_count", 249 | metric.WithDescription("Counts the number of requests received"), 250 | ) 251 | if err != nil { 252 | log.Fatalf("Failed to create counter: %v", err) 253 | } 254 | 255 | requestDuration, err = meter.Float64Histogram( 256 | "request_duration_ms", 257 | metric.WithDescription("Records the duration of requests in milliseconds"), 258 | ) 259 | if err != nil { 260 | log.Fatalf("Failed to create histogram: %v", err) 261 | } 262 | 263 | activeRequestsCounter, err = meter.Int64UpDownCounter( 264 | "active_requests", 265 | metric.WithDescription("Counts the number of active requests"), 266 | ) 267 | if err != nil { 268 | log.Fatalf("Failed to create up-down counter: %v", err) 269 | } 270 | 271 | router.GET("/hello-a", hello) 272 | router.GET("/call-b", callB) 273 | router.GET("/getme-coffee", getMeCoffee) 274 | 275 | err = godotenv.Load() 276 | if err != nil { 277 | log.Println("No .env file found. 
Using environment variables 👌") 278 | 	} 279 | 280 | 	PORT := os.Getenv("PORT") 281 | 	if PORT == "" { 282 | 		PORT = "80" // Default port 283 | 	} 284 | 285 | 	// Start the server 286 | 	router.Run(fmt.Sprintf(":%s", PORT)) 287 | } 288 | -------------------------------------------------------------------------------- /day-7/microservice-a/otel-collector-config.yaml: -------------------------------------------------------------------------------- 1 | # 👉 Note: this file is for testing in a local environment - nothing to do with k8s 2 | 3 | 4 | # receivers: 5 | #   otlp: 6 | #     protocols: 7 | #       http: 8 | #       grpc: 9 | 10 | receivers: 11 |   otlp: 12 |     protocols: 13 |       http: 14 |         endpoint: "0.0.0.0:4318" 15 |       grpc: 16 |         endpoint: "0.0.0.0:4317" 17 | 18 | processors: 19 |   batch: 20 | 21 | exporters: 22 |   prometheus: 23 |     endpoint: "0.0.0.0:8889" 24 |   otlp: 25 |     endpoint: "jaeger:4317" # Send data to Jaeger over gRPC 26 |     tls: 27 |       insecure: true 28 | service: 29 |   pipelines: 30 |     metrics: 31 |       receivers: [otlp] 32 |       processors: [batch] 33 |       exporters: [prometheus] 34 |     traces: 35 |       receivers: [otlp] 36 |       processors: [batch] 37 |       exporters: [otlp] 38 | -------------------------------------------------------------------------------- /day-7/microservice-a/prometheus.yaml: -------------------------------------------------------------------------------- 1 | # 👉 Note: this file is for testing in a local environment - nothing to do with k8s 2 | 3 | 4 | global: 5 |   scrape_interval: 2s 6 | 7 | scrape_configs: 8 |   - job_name: 'otel-collector' 9 |     scrape_interval: 2s 10 |     static_configs: 11 |       - targets: ['otel-collector:8889'] 12 | 13 | 14 | -------------------------------------------------------------------------------- /day-7/microservice-a/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Set the base URL of your microservice-a application 4 | BASE_URL="http://localhost:8080" 5 | 6 | echo $BASE_URL 7 | 8 | # Define an array of endpoints 9 | ENDPOINTS=( 10 |   "/hello-a" 11 |   "/call-b" 12 |   "/getme-coffee" 13 | ) 14 | 15 | # Function to make a random request to one of the endpoints 16 | make_random_request() { 17 |   local endpoint=${ENDPOINTS[$RANDOM % ${#ENDPOINTS[@]}]} 18 |   curl -s -o /dev/null -w "%{http_code}" "$BASE_URL$endpoint" 19 | } 20 | 21 | # Make 1000 random requests 22 | for ((i=1; i<=1000; i++)); do 23 |   make_random_request 24 |   echo "Request $i completed" 25 |   sleep 0.1 # Optional: Sleep for a short duration between requests to simulate real traffic 26 | done 27 | 28 | echo "Completed 1000 requests" 29 | -------------------------------------------------------------------------------- /day-7/microservice-b/.dockerignore: -------------------------------------------------------------------------------- 1 | .env -------------------------------------------------------------------------------- /day-7/microservice-b/.env: -------------------------------------------------------------------------------- 1 | SVC_A_URI=http://localhost:8080 2 | OTEL_EXPORTER_OTLP_ENDPOINT="localhost:4317" 3 | OTEL_COLLECTOR_ENDPOINT=localhost:4318 4 | PORT=80 -------------------------------------------------------------------------------- /day-7/microservice-b/docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: '3' 2 | 3 | services: 4 |   otel-collector: 5 |     image: otel/opentelemetry-collector-contrib:latest 6 |     command: ["--config=/etc/otel-collector-config.yaml"] 7 |     volumes: 8 |       - 
./otel-collector-config.yaml:/etc/otel-collector-config.yaml 9 | ports: 10 | - "4317:4317" # OTLP gRPC receiver 11 | - "4318:4318" # OTLP HTTP receiver 12 | - "8889:8889" # Prometheus metrics exporter 13 | depends_on: 14 | - jaeger 15 | 16 | prometheus: 17 | image: prom/prometheus:latest 18 | volumes: 19 | - ./prometheus.yaml:/etc/prometheus/prometheus.yml 20 | command: 21 | - "--config.file=/etc/prometheus/prometheus.yml" 22 | ports: 23 | - "9090:9090" 24 | depends_on: 25 | - otel-collector 26 | 27 | jaeger: 28 | image: jaegertracing/all-in-one:latest 29 | ports: 30 | - "16686:16686" # Jaeger UI 31 | - "14250:14250" # Jaeger gRPC receiver 32 | 33 | -------------------------------------------------------------------------------- /day-7/microservice-b/dockerfile: -------------------------------------------------------------------------------- 1 | # Use official Golang image as the build image 2 | FROM golang:1.23-alpine AS builder 3 | 4 | WORKDIR /app 5 | 6 | COPY go.mod ./ 7 | COPY go.sum ./ 8 | RUN go mod download 9 | 10 | COPY . ./ 11 | 12 | RUN go build -o service-b . 13 | 14 | # Use a minimal image for the runtime 15 | FROM alpine:latest 16 | 17 | WORKDIR /app 18 | 19 | COPY --from=builder /app/service-b . 20 | 21 | EXPOSE 80 22 | 23 | CMD ["./service-b"] 24 | -------------------------------------------------------------------------------- /day-7/microservice-b/go.mod: -------------------------------------------------------------------------------- 1 | module microservice-b 2 | 3 | go 1.23.0 4 | 5 | require ( 6 | github.com/gin-gonic/gin v1.10.0 7 | github.com/joho/godotenv v1.5.1 8 | go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin v0.55.0 9 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.55.0 10 | go.opentelemetry.io/otel v1.30.0 11 | go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.30.0 12 | go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.30.0 13 | go.opentelemetry.io/otel/metric v1.30.0 14 | go.opentelemetry.io/otel/sdk v1.30.0 15 | go.opentelemetry.io/otel/sdk/metric v1.30.0 16 | ) 17 | 18 | require ( 19 | github.com/bytedance/sonic v1.12.2 // indirect 20 | github.com/bytedance/sonic/loader v0.2.0 // indirect 21 | github.com/cenkalti/backoff/v4 v4.3.0 // indirect 22 | github.com/cloudwego/base64x v0.1.4 // indirect 23 | github.com/cloudwego/iasm v0.2.0 // indirect 24 | github.com/felixge/httpsnoop v1.0.4 // indirect 25 | github.com/gabriel-vasile/mimetype v1.4.5 // indirect 26 | github.com/gin-contrib/sse v0.1.0 // indirect 27 | github.com/go-logr/logr v1.4.2 // indirect 28 | github.com/go-logr/stdr v1.2.2 // indirect 29 | github.com/go-playground/locales v0.14.1 // indirect 30 | github.com/go-playground/universal-translator v0.18.1 // indirect 31 | github.com/go-playground/validator/v10 v10.22.1 // indirect 32 | github.com/goccy/go-json v0.10.3 // indirect 33 | github.com/google/uuid v1.6.0 // indirect 34 | github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0 // indirect 35 | github.com/json-iterator/go v1.1.12 // indirect 36 | github.com/klauspost/cpuid/v2 v2.2.8 // indirect 37 | github.com/leodido/go-urn v1.4.0 // indirect 38 | github.com/mattn/go-isatty v0.0.20 // indirect 39 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect 40 | github.com/modern-go/reflect2 v1.0.2 // indirect 41 | github.com/pelletier/go-toml/v2 v2.2.3 // indirect 42 | github.com/twitchyliquid64/golang-asm v0.15.1 // indirect 43 | github.com/ugorji/go/codec v1.2.12 // indirect 44 | 
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.30.0 // indirect 45 | go.opentelemetry.io/otel/trace v1.30.0 // indirect 46 | go.opentelemetry.io/proto/otlp v1.3.1 // indirect 47 | golang.org/x/arch v0.10.0 // indirect 48 | golang.org/x/crypto v0.27.0 // indirect 49 | golang.org/x/net v0.29.0 // indirect 50 | golang.org/x/sys v0.25.0 // indirect 51 | golang.org/x/text v0.18.0 // indirect 52 | google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1 // indirect 53 | google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1 // indirect 54 | google.golang.org/grpc v1.66.1 // indirect 55 | google.golang.org/protobuf v1.34.2 // indirect 56 | gopkg.in/yaml.v3 v3.0.1 // indirect 57 | ) 58 | -------------------------------------------------------------------------------- /day-7/microservice-b/go.sum: -------------------------------------------------------------------------------- 1 | github.com/bytedance/sonic v1.12.2 h1:oaMFuRTpMHYLpCntGca65YWt5ny+wAceDERTkT2L9lg= 2 | github.com/bytedance/sonic v1.12.2/go.mod h1:B8Gt/XvtZ3Fqj+iSKMypzymZxw/FVwgIGKzMzT9r/rk= 3 | github.com/bytedance/sonic/loader v0.1.1/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU= 4 | github.com/bytedance/sonic/loader v0.2.0 h1:zNprn+lsIP06C/IqCHs3gPQIvnvpKbbxyXQP1iU4kWM= 5 | github.com/bytedance/sonic/loader v0.2.0/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU= 6 | github.com/cenkalti/backoff/v4 v4.3.0 h1:MyRJ/UdXutAwSAT+s3wNd7MfTIcy71VQueUuFK343L8= 7 | github.com/cenkalti/backoff/v4 v4.3.0/go.mod h1:Y3VNntkOUPxTVeUxJ/G5vcM//AlwfmyYozVcomhLiZE= 8 | github.com/cloudwego/base64x v0.1.4 h1:jwCgWpFanWmN8xoIUHa2rtzmkd5J2plF/dnLS6Xd/0Y= 9 | github.com/cloudwego/base64x v0.1.4/go.mod h1:0zlkT4Wn5C6NdauXdJRhSKRlJvmclQ1hhJgA0rcu/8w= 10 | github.com/cloudwego/iasm v0.2.0 h1:1KNIy1I1H9hNNFEEH3DVnI4UujN+1zjpuk6gwHLTssg= 11 | github.com/cloudwego/iasm v0.2.0/go.mod h1:8rXZaNYT2n95jn+zTI1sDr+IgcD2GVs0nlbbQPiEFhY= 12 | github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 13 | github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= 14 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 15 | github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg= 16 | github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U= 17 | github.com/gabriel-vasile/mimetype v1.4.5 h1:J7wGKdGu33ocBOhGy0z653k/lFKLFDPJMG8Gql0kxn4= 18 | github.com/gabriel-vasile/mimetype v1.4.5/go.mod h1:ibHel+/kbxn9x2407k1izTA1S81ku1z/DlgOW2QE0M4= 19 | github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE= 20 | github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI= 21 | github.com/gin-gonic/gin v1.10.0 h1:nTuyha1TYqgedzytsKYqna+DfLos46nTv2ygFy86HFU= 22 | github.com/gin-gonic/gin v1.10.0/go.mod h1:4PMNQiOhvDRa013RKVbsiNwoyezlm2rm0uX/T7kzp5Y= 23 | github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A= 24 | github.com/go-logr/logr v1.4.2 h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY= 25 | github.com/go-logr/logr v1.4.2/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= 26 | github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag= 27 | github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE= 28 | github.com/go-playground/assert/v2 v2.2.0 h1:JvknZsQTYeFEAhQwI4qEt9cyV5ONwRHC+lYKSsYSR8s= 29 | github.com/go-playground/assert/v2 
v2.2.0/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4= 30 | github.com/go-playground/locales v0.14.1 h1:EWaQ/wswjilfKLTECiXz7Rh+3BjFhfDFKv/oXslEjJA= 31 | github.com/go-playground/locales v0.14.1/go.mod h1:hxrqLVvrK65+Rwrd5Fc6F2O76J/NuW9t0sjnWqG1slY= 32 | github.com/go-playground/universal-translator v0.18.1 h1:Bcnm0ZwsGyWbCzImXv+pAJnYK9S473LQFuzCbDbfSFY= 33 | github.com/go-playground/universal-translator v0.18.1/go.mod h1:xekY+UJKNuX9WP91TpwSH2VMlDf28Uj24BCp08ZFTUY= 34 | github.com/go-playground/validator/v10 v10.22.1 h1:40JcKH+bBNGFczGuoBYgX4I6m/i27HYW8P9FDk5PbgA= 35 | github.com/go-playground/validator/v10 v10.22.1/go.mod h1:dbuPbCMFw/DrkbEynArYaCwl3amGuJotoKCe95atGMM= 36 | github.com/goccy/go-json v0.10.3 h1:KZ5WoDbxAIgm2HNbYckL0se1fHD6rz5j4ywS6ebzDqA= 37 | github.com/goccy/go-json v0.10.3/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M= 38 | github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI= 39 | github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= 40 | github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= 41 | github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0= 42 | github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= 43 | github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0 h1:asbCHRVmodnJTuQ3qamDwqVOIjwqUPTYmYuemVOx+Ys= 44 | github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0/go.mod h1:ggCgvZ2r7uOoQjOyu2Y1NhHmEPPzzuhWgcza5M1Ji1I= 45 | github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0= 46 | github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4= 47 | github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM= 48 | github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo= 49 | github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg= 50 | github.com/klauspost/cpuid/v2 v2.2.8 h1:+StwCXwm9PdpiEkPyzBXIy+M9KUb4ODm0Zarf1kS5BM= 51 | github.com/klauspost/cpuid/v2 v2.2.8/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws= 52 | github.com/knz/go-libedit v1.10.1/go.mod h1:MZTVkCWyz0oBc7JOWP3wNAzd002ZbM/5hgShxwh4x8M= 53 | github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= 54 | github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= 55 | github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= 56 | github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= 57 | github.com/leodido/go-urn v1.4.0 h1:WT9HwE9SGECu3lg4d/dIA+jxlljEa1/ffXKmRjqdmIQ= 58 | github.com/leodido/go-urn v1.4.0/go.mod h1:bvxc+MVxLKB4z00jd1z+Dvzr47oO32F/QSNjSBOlFxI= 59 | github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY= 60 | github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y= 61 | github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= 62 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg= 63 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= 64 | github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M= 65 | github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk= 66 | github.com/pelletier/go-toml/v2 
v2.2.3 h1:YmeHyLY8mFWbdkNWwpr+qIL2bEqT0o95WSdkNHvL12M= 67 | github.com/pelletier/go-toml/v2 v2.2.3/go.mod h1:MfCQTFTvCcUyyvvwm1+G6H/jORL20Xlb6rzQu9GuUkc= 68 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= 69 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 70 | github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8= 71 | github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4= 72 | github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 73 | github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw= 74 | github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo= 75 | github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= 76 | github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 77 | github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 78 | github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU= 79 | github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4= 80 | github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg= 81 | github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= 82 | github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI= 83 | github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08= 84 | github.com/ugorji/go/codec v1.2.12 h1:9LC83zGrHhuUA9l16C9AHXAqEV/2wBQ4nkvumAE65EE= 85 | github.com/ugorji/go/codec v1.2.12/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg= 86 | go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin v0.55.0 h1:n4Dd8YaDFeTd2uw+uCHJzOKeqfLgAOlePZpQ5f9cAoE= 87 | go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin v0.55.0/go.mod h1:8aCCTMjP225r98yevEMM5NYDb3ianWLoeIzZ1rPyxHU= 88 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.55.0 h1:ZIg3ZT/aQ7AfKqdwp7ECpOK6vHqquXXuyTjIO8ZdmPs= 89 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.55.0/go.mod h1:DQAwmETtZV00skUwgD6+0U89g80NKsJE3DCKeLLPQMI= 90 | go.opentelemetry.io/contrib/propagators/b3 v1.30.0 h1:vumy4r1KMyaoQRltX7cJ37p3nluzALX9nugCjNNefuY= 91 | go.opentelemetry.io/contrib/propagators/b3 v1.30.0/go.mod h1:fRbvRsaeVZ82LIl3u0rIvusIel2UUf+JcaaIpy5taho= 92 | go.opentelemetry.io/otel v1.30.0 h1:F2t8sK4qf1fAmY9ua4ohFS/K+FUuOPemHUIXHtktrts= 93 | go.opentelemetry.io/otel v1.30.0/go.mod h1:tFw4Br9b7fOS+uEao81PJjVMjW/5fvNCbpsDIXqP0pc= 94 | go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.30.0 h1:VrMAbeJz4gnVDg2zEzjHG4dEH86j4jO6VYB+NgtGD8s= 95 | go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.30.0/go.mod h1:qqN/uFdpeitTvm+JDqqnjm517pmQRYxTORbETHq5tOc= 96 | go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.30.0 h1:lsInsfvhVIfOI6qHVyysXMNDnjO9Npvl7tlDPJFBVd4= 97 | go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.30.0/go.mod h1:KQsVNh4OjgjTG0G6EiNi1jVpnaeeKsKMRwbLN+f1+8M= 98 | go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.30.0 h1:umZgi92IyxfXd/l4kaDhnKgY8rnN/cZcF1LKc6I8OQ8= 99 | go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.30.0/go.mod h1:4lVs6obhSVRb1EW5FhOuBTyiQhtRtAnnva9vD3yRfq8= 100 | 
go.opentelemetry.io/otel/metric v1.30.0 h1:4xNulvn9gjzo4hjg+wzIKG7iNFEaBMX00Qd4QIZs7+w= 101 | go.opentelemetry.io/otel/metric v1.30.0/go.mod h1:aXTfST94tswhWEb+5QjlSqG+cZlmyXy/u8jFpor3WqQ= 102 | go.opentelemetry.io/otel/sdk v1.30.0 h1:cHdik6irO49R5IysVhdn8oaiR9m8XluDaJAs4DfOrYE= 103 | go.opentelemetry.io/otel/sdk v1.30.0/go.mod h1:p14X4Ok8S+sygzblytT1nqG98QG2KYKv++HE0LY/mhg= 104 | go.opentelemetry.io/otel/sdk/metric v1.30.0 h1:QJLT8Pe11jyHBHfSAgYH7kEmT24eX792jZO1bo4BXkM= 105 | go.opentelemetry.io/otel/sdk/metric v1.30.0/go.mod h1:waS6P3YqFNzeP01kuo/MBBYqaoBJl7efRQHOaydhy1Y= 106 | go.opentelemetry.io/otel/trace v1.30.0 h1:7UBkkYzeg3C7kQX8VAidWh2biiQbtAKjyIML8dQ9wmc= 107 | go.opentelemetry.io/otel/trace v1.30.0/go.mod h1:5EyKqTzzmyqB9bwtCCq6pDLktPK6fmGf/Dph+8VI02o= 108 | go.opentelemetry.io/proto/otlp v1.3.1 h1:TrMUixzpM0yuc/znrFTP9MMRh8trP93mkCiDVeXrui0= 109 | go.opentelemetry.io/proto/otlp v1.3.1/go.mod h1:0X1WI4de4ZsLrrJNLAQbFeLCm3T7yBkR0XqQ7niQU+8= 110 | golang.org/x/arch v0.10.0 h1:S3huipmSclq3PJMNe76NGwkBR504WFkQ5dhzWzP8ZW8= 111 | golang.org/x/arch v0.10.0/go.mod h1:FEVrYAQjsQXMVJ1nsMoVVXPZg6p2JE2mx8psSWTDQys= 112 | golang.org/x/crypto v0.27.0 h1:GXm2NjJrPaiv/h1tb2UH8QfgC/hOf/+z0p6PT8o1w7A= 113 | golang.org/x/crypto v0.27.0/go.mod h1:1Xngt8kV6Dvbssa53Ziq6Eqn0HqbZi5Z6R0ZpwQzt70= 114 | golang.org/x/net v0.29.0 h1:5ORfpBpCs4HzDYoodCDBbwHzdR5UrLBZ3sOnUJmFoHo= 115 | golang.org/x/net v0.29.0/go.mod h1:gLkgy8jTGERgjzMic6DS9+SP0ajcu6Xu3Orq/SpETg0= 116 | golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 117 | golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 118 | golang.org/x/sys v0.25.0 h1:r+8e+loiHxRqhXVl6ML1nO3l1+oFoWbnlu2Ehimmi34= 119 | golang.org/x/sys v0.25.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= 120 | golang.org/x/text v0.18.0 h1:XvMDiNzPAl0jr17s6W9lcaIhGUfUORdGCNsuLmPG224= 121 | golang.org/x/text v0.18.0/go.mod h1:BuEKDfySbSR4drPmRPG/7iBdf8hvFMuRexcpahXilzY= 122 | google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1 h1:hjSy6tcFQZ171igDaN5QHOw2n6vx40juYbC/x67CEhc= 123 | google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1/go.mod h1:qpvKtACPCQhAdu3PyQgV4l3LMXZEtft7y8QcarRsp9I= 124 | google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1 h1:pPJltXNxVzT4pK9yD8vR9X75DaWYYmLGMsEvBfFQZzQ= 125 | google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1/go.mod h1:UqMtugtsSgubUsoxbuAoiCXvqvErP7Gf0so0mK9tHxU= 126 | google.golang.org/grpc v1.66.1 h1:hO5qAXR19+/Z44hmvIM4dQFMSYX9XcWsByfoxutBpAM= 127 | google.golang.org/grpc v1.66.1/go.mod h1:s3/l6xSSCURdVfAnL+TqCNMyTDAGN6+lZeVxnZR128Y= 128 | google.golang.org/protobuf v1.34.2 h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg= 129 | google.golang.org/protobuf v1.34.2/go.mod h1:qYOHts0dSfpeUzUFpOMr/WGzszTmLH+DiWniOlNbLDw= 130 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 131 | gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk= 132 | gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= 133 | gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 134 | gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= 135 | gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 136 | nullprogram.com/x/optparse v1.0.0/go.mod 
h1:KdyPE+Igbe0jQUrVfMqDMeJQIJZEuyV7pjYmp6pbG50= 137 | -------------------------------------------------------------------------------- /day-7/microservice-b/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "context" 5 | "fmt" 6 | "io/ioutil" 7 | "log" 8 | "net/http" 9 | "os" 10 | "time" 11 | 12 | "github.com/gin-gonic/gin" 13 | "github.com/joho/godotenv" 14 | 15 | "go.opentelemetry.io/otel" 16 | "go.opentelemetry.io/otel/attribute" 17 | "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp" 18 | "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp" 19 | "go.opentelemetry.io/otel/metric" 20 | sdkmetric "go.opentelemetry.io/otel/sdk/metric" 21 | "go.opentelemetry.io/otel/sdk/resource" 22 | "go.opentelemetry.io/otel/sdk/trace" 23 | semconv "go.opentelemetry.io/otel/semconv/v1.21.0" 24 | 25 | "go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin" 26 | "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp" 27 | ) 28 | 29 | var ( 30 | requestCounter metric.Int64Counter 31 | requestDuration metric.Float64Histogram 32 | activeRequestsCounter metric.Int64UpDownCounter 33 | ) 34 | 35 | func initProvider() (func(context.Context) error, error) { 36 | ctx := context.Background() 37 | 38 | // Load environment variables 39 | err := godotenv.Load() 40 | if err != nil { 41 | log.Println("No .env file found. Using environment variables 👌") 42 | } 43 | 44 | // Read the OTEL collector endpoint from environment variable 45 | otelEndpoint := os.Getenv("OTEL_COLLECTOR_ENDPOINT") 46 | if otelEndpoint == "" { 47 | otelEndpoint = "localhost:4318" // Default endpoint 48 | } 49 | 50 | // Create a resource with the service name 51 | res, err := resource.New(ctx, 52 | resource.WithAttributes( 53 | semconv.ServiceNameKey.String("microservice-b"), 54 | ), 55 | ) 56 | if err != nil { 57 | return nil, fmt.Errorf("failed to create resource: %w", err) 58 | } 59 | 60 | // Create OTLP trace exporter over HTTP with custom endpoint 61 | traceExporter, err := otlptracehttp.New(ctx, 62 | otlptracehttp.WithEndpoint(otelEndpoint), 63 | otlptracehttp.WithInsecure(), 64 | ) 65 | if err != nil { 66 | return nil, fmt.Errorf("failed to create trace exporter: %w", err) 67 | } 68 | 69 | // Create OTLP metric exporter over HTTP with custom endpoint 70 | metricExporter, err := otlpmetrichttp.New(ctx, 71 | otlpmetrichttp.WithEndpoint(otelEndpoint), 72 | otlpmetrichttp.WithInsecure(), 73 | ) 74 | if err != nil { 75 | return nil, fmt.Errorf("failed to create metric exporter: %w", err) 76 | } 77 | 78 | // Create trace provider with the exporter and resource 79 | tracerProvider := trace.NewTracerProvider( 80 | trace.WithBatcher(traceExporter), 81 | trace.WithResource(res), 82 | ) 83 | 84 | // Create metric reader and meter provider with the resource 85 | metricReader := sdkmetric.NewPeriodicReader(metricExporter) 86 | meterProvider := sdkmetric.NewMeterProvider( 87 | sdkmetric.WithReader(metricReader), 88 | sdkmetric.WithResource(res), 89 | ) 90 | 91 | // Set global providers 92 | otel.SetTracerProvider(tracerProvider) 93 | otel.SetMeterProvider(meterProvider) 94 | 95 | return func(ctx context.Context) error { 96 | err := tracerProvider.Shutdown(ctx) 97 | if err != nil { 98 | return err 99 | } 100 | err = meterProvider.Shutdown(ctx) 101 | if err != nil { 102 | return err 103 | } 104 | return nil 105 | }, nil 106 | } 107 | 108 | // Basic Hello Handler 109 | func hello(c *gin.Context) { 110 | startTime := 
time.Now() 111 | ctx := c.Request.Context() 112 | 113 | // Increment active requests 114 | activeRequestsCounter.Add(ctx, 1) 115 | defer activeRequestsCounter.Add(ctx, -1) 116 | 117 | c.JSON(http.StatusOK, gin.H{ 118 | "message": "👋 Hello from microservice-b", 119 | }) 120 | 121 | duration := time.Since(startTime).Milliseconds() 122 | 123 | requestCounter.Add(ctx, 1, metric.WithAttributes(attribute.String("endpoint", "/hello-b"))) 124 | requestDuration.Record(ctx, float64(duration), metric.WithAttributes(attribute.String("endpoint", "/hello-b"))) 125 | } 126 | 127 | // Call Service A Handler 128 | func callA(c *gin.Context) { 129 | startTime := time.Now() 130 | ctx := c.Request.Context() 131 | 132 | activeRequestsCounter.Add(ctx, 1) 133 | defer activeRequestsCounter.Add(ctx, -1) 134 | 135 | // Load environment variables 136 | err := godotenv.Load() 137 | if err != nil { 138 | log.Println("No .env file found. Using environment variables 👌") 139 | } 140 | 141 | SVC_A_URI := os.Getenv("SVC_A_URI") 142 | if SVC_A_URI == "" { 143 | SVC_A_URI = "http://localhost:8080" // Default URI for service-A 144 | } 145 | 146 | // Create a new HTTP client with OpenTelemetry instrumentation 147 | client := http.Client{ 148 | Transport: otelhttp.NewTransport(http.DefaultTransport), 149 | } 150 | 151 | // Create a new request 152 | req, err := http.NewRequest("GET", fmt.Sprintf("%s/hello-a", SVC_A_URI), nil) 153 | if err != nil { 154 | c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create request to service-A"}) 155 | return 156 | } 157 | 158 | // Use the context from Gin 159 | req = req.WithContext(ctx) 160 | 161 | // Make the request 162 | resp, err := client.Do(req) 163 | if err != nil { 164 | c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to reach service-A"}) 165 | return 166 | } 167 | defer resp.Body.Close() 168 | 169 | resBody, _ := ioutil.ReadAll(resp.Body) 170 | 171 | c.JSON(http.StatusOK, gin.H{ 172 | "message": "🥳 Response from service-A", 173 | "response": string(resBody), 174 | }) 175 | 176 | duration := time.Since(startTime).Milliseconds() 177 | 178 | requestCounter.Add(ctx, 1, metric.WithAttributes(attribute.String("endpoint", "/call-a"))) 179 | requestDuration.Record(ctx, float64(duration), metric.WithAttributes(attribute.String("endpoint", "/call-a"))) 180 | } 181 | 182 | // Get Coffee Handler 183 | func getMeCoffee(c *gin.Context) { 184 | startTime := time.Now() 185 | ctx := c.Request.Context() 186 | 187 | activeRequestsCounter.Add(ctx, 1) 188 | defer activeRequestsCounter.Add(ctx, -1) 189 | 190 | // Create a new HTTP client with OpenTelemetry instrumentation 191 | client := http.Client{ 192 | Transport: otelhttp.NewTransport(http.DefaultTransport), 193 | } 194 | 195 | // Create a new request 196 | req, err := http.NewRequest("GET", "https://api.sampleapis.com/coffee/iced", nil) 197 | if err != nil { 198 | c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create request to coffee API"}) 199 | return 200 | } 201 | 202 | // Use the context from Gin 203 | req = req.WithContext(ctx) 204 | 205 | // Make the request 206 | resp, err := client.Do(req) 207 | if err != nil { 208 | c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to fetch coffee"}) 209 | return 210 | } 211 | defer resp.Body.Close() 212 | 213 | resBody, _ := ioutil.ReadAll(resp.Body) 214 | 215 | c.JSON(http.StatusOK, gin.H{ 216 | "message": "🍵 Here is your coffee", 217 | "response": string(resBody), 218 | }) 219 | 220 | duration := time.Since(startTime).Milliseconds() 221 | 222 
| 	requestCounter.Add(ctx, 1, metric.WithAttributes(attribute.String("endpoint", "/getme-coffee"))) 223 | 	requestDuration.Record(ctx, float64(duration), metric.WithAttributes(attribute.String("endpoint", "/getme-coffee"))) 224 | } 225 | 226 | func main() { 227 | 	ctx := context.Background() 228 | 	shutdown, err := initProvider() 229 | 	if err != nil { 230 | 		log.Fatalf("Failed to initialize OpenTelemetry: %v", err) 231 | 	} 232 | 	defer func() { 233 | 		if err := shutdown(ctx); err != nil { 234 | 			log.Fatalf("Error shutting down provider: %v", err) 235 | 		} 236 | 	}() 237 | 238 | 	router := gin.Default() 239 | 240 | 	// Use OpenTelemetry middleware for Gin 241 | 	router.Use(otelgin.Middleware("microservice-b")) 242 | 243 | 	// Initialize the Meter 244 | 	meter := otel.GetMeterProvider().Meter("microservice-b") 245 | 246 | 	// Initialize instruments using the Meter interface methods 247 | 	requestCounter, err = meter.Int64Counter( 248 | 		"request_count", 249 | 		metric.WithDescription("Counts the number of requests received"), 250 | 	) 251 | 	if err != nil { 252 | 		log.Fatalf("Failed to create counter: %v", err) 253 | 	} 254 | 255 | 	requestDuration, err = meter.Float64Histogram( 256 | 		"request_duration_ms", 257 | 		metric.WithDescription("Records the duration of requests in milliseconds"), 258 | 	) 259 | 	if err != nil { 260 | 		log.Fatalf("Failed to create histogram: %v", err) 261 | 	} 262 | 263 | 	activeRequestsCounter, err = meter.Int64UpDownCounter( 264 | 		"active_requests", 265 | 		metric.WithDescription("Counts the number of active requests"), 266 | 	) 267 | 	if err != nil { 268 | 		log.Fatalf("Failed to create up-down counter: %v", err) 269 | 	} 270 | 271 | 	router.GET("/hello-b", hello) 272 | 	router.GET("/call-a", callA) 273 | 	router.GET("/getme-coffee", getMeCoffee) 274 | 275 | 	PORT := os.Getenv("PORT") 276 | 	if PORT == "" { 277 | 		PORT = "80" // Default port 278 | 	} 279 | 280 | 	// Start the server 281 | 	router.Run(fmt.Sprintf(":%s", PORT)) 282 | } 283 | -------------------------------------------------------------------------------- /day-7/microservice-b/otel-collector-config.yaml: -------------------------------------------------------------------------------- 1 | # 👉 Note: this file is for testing in a local environment - nothing to do with k8s 2 | 3 | 4 | # receivers: 5 | #   otlp: 6 | #     protocols: 7 | #       http: 8 | #       grpc: 9 | 10 | receivers: 11 |   otlp: 12 |     protocols: 13 |       http: 14 |         endpoint: "0.0.0.0:4318" 15 |       grpc: 16 |         endpoint: "0.0.0.0:4317" 17 | 18 | processors: 19 |   batch: 20 | 21 | exporters: 22 |   prometheus: 23 |     endpoint: "0.0.0.0:8889" 24 |   otlp: 25 |     endpoint: "jaeger:4317" # Send data to Jaeger over gRPC 26 |     tls: 27 |       insecure: true 28 | service: 29 |   pipelines: 30 |     metrics: 31 |       receivers: [otlp] 32 |       processors: [batch] 33 |       exporters: [prometheus] 34 |     traces: 35 |       receivers: [otlp] 36 |       processors: [batch] 37 |       exporters: [otlp] 38 | -------------------------------------------------------------------------------- /day-7/microservice-b/prometheus.yaml: -------------------------------------------------------------------------------- 1 | # 👉 Note: this file is for testing in a local environment - nothing to do with k8s 2 | 3 | global: 4 |   scrape_interval: 2s 5 | 6 | scrape_configs: 7 |   - job_name: 'otel-collector' 8 |     scrape_interval: 2s 9 |     static_configs: 10 |       - targets: ['otel-collector:8889'] 11 | -------------------------------------------------------------------------------- /day-7/microservice-b/test.sh: -------------------------------------------------------------------------------- 1 | 
#!/bin/bash 2 | 3 | # Set the base URL of your microservice-b application 4 | BASE_URL="http://localhost:8081" 5 | 6 | echo $BASE_URL 7 | 8 | # Define an array of endpoints 9 | ENDPOINTS=( 10 |   "/hello-b" 11 |   "/call-a" 12 |   "/getme-coffee" 13 | ) 14 | 15 | # Function to make a random request to one of the endpoints 16 | make_random_request() { 17 |   local endpoint=${ENDPOINTS[$RANDOM % ${#ENDPOINTS[@]}]} 18 |   curl -s -o /dev/null -w "%{http_code}" "$BASE_URL$endpoint" 19 | } 20 | 21 | # Make 1000 random requests 22 | for ((i=1; i<=1000; i++)); do 23 |   make_random_request 24 |   echo "Request $i completed" 25 |   sleep 0.1 # Optional: Sleep for a short duration between requests to simulate real traffic 26 | done 27 | 28 | echo "Completed 1000 requests" 29 | -------------------------------------------------------------------------------- /day-7/otel-collector-values.yaml: -------------------------------------------------------------------------------- 1 | # otel-collector-values.yaml 2 | 3 | mode: "deployment" 4 | 5 | config: 6 |   receivers: 7 |     otlp: 8 |       protocols: 9 |         http: 10 |           endpoint: "0.0.0.0:4318" 11 |         grpc: 12 |           endpoint: "0.0.0.0:4317" 13 | 14 |   processors: 15 |     batch: {} 16 | 17 |   exporters: 18 |     prometheus: 19 |       endpoint: "0.0.0.0:8889" 20 | 21 |     # jaeger: 22 |     #   endpoint: "jaeger-collector.olly:14250" # Jaeger gRPC endpoint 23 |     #   insecure: true 24 | 25 |     otlp: 26 |       endpoint: "jaeger-collector.olly:4317" # Update as per your Jaeger service 27 |       tls: 28 |         insecure: true 29 |     debug: 30 |       verbosity: detailed 31 |   service: 32 |     pipelines: 33 |       metrics: 34 |         receivers: [otlp] 35 |         processors: [batch] 36 |         exporters: [prometheus] 37 |       traces: 38 |         receivers: [otlp] 39 |         processors: [batch] 40 |         exporters: [otlp] 41 | 42 | image: 43 |   repository: "otel/opentelemetry-collector-contrib" # Use contrib image 44 |   tag: "latest" # Specify the desired tag 45 |   pullPolicy: "IfNotPresent" 46 | 47 | command: 48 |   name: "otelcol-contrib" # Optional: Update command name if necessary 49 | 50 | service: 51 |   type: ClusterIP 52 | 53 | # Uncomment and configure if using ServiceMonitor 54 | # serviceMonitor: 55 | #   enabled: true 56 | #   namespace: olly 57 | #   selector: 58 | #     matchLabels: 59 | #       app: otel-collector 60 | #   endpoints: 61 | #     - port: prometheus 62 | #       interval: 2s 63 | 64 | resources: 65 |   requests: 66 |     memory: "256Mi" 67 |     cpu: "250m" 68 |   limits: 69 |     memory: "512Mi" 70 |     cpu: "500m" 71 | 72 | ports: 73 |   prometheus: 74 |     enabled: true 75 |     containerPort: 8889 76 |     servicePort: 8889 77 |     hostPort: 8889 78 |     protocol: TCP 79 |     appProtocol: TCP 80 |   otlp: 81 |     enabled: true 82 |     containerPort: 4317 83 |     servicePort: 4317 84 |     hostPort: 4317 85 |     protocol: TCP 86 |     # nodePort: 30317 87 |     appProtocol: grpc 88 |   otlp-http: 89 |     enabled: true 90 |     containerPort: 4318 91 |     servicePort: 4318 92 |     hostPort: 4318 93 |     protocol: TCP 94 |   jaeger-compact: 95 |     enabled: true 96 |     containerPort: 6831 97 |     servicePort: 6831 98 |     hostPort: 6831 99 |     protocol: UDP 100 |   jaeger-thrift: 101 |     enabled: true 102 |     containerPort: 14268 103 |     servicePort: 14268 104 |     hostPort: 14268 105 |     protocol: TCP 106 |   jaeger-grpc: 107 |     enabled: true 108 |     containerPort: 14250 109 |     servicePort: 14250 110 |     hostPort: 14250 111 |     protocol: TCP 112 |   zipkin: 113 |     enabled: true 114 |     containerPort: 9411 115 |     servicePort: 9411 116 |     hostPort: 9411 117 |     protocol: TCP -------------------------------------------------------------------------------- /day-7/prometheus-values.yaml: 
-------------------------------------------------------------------------------- 1 | serverFiles: 2 |   prometheus.yml: 3 |     scrape_configs: 4 |       - job_name: otel-collector 5 |         static_configs: 6 |           - targets: 7 |             - otel-collector-opentelemetry-collector.olly:8889 8 | 9 | alertmanager: 10 |   enabled: false 11 | 12 | prometheus-pushgateway: 13 |   enabled: false 14 | 15 | kube-state-metrics: 16 |   enabled: false 17 | 18 | prometheus-node-exporter: 19 |   enabled: false -------------------------------------------------------------------------------- /day-7/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Check if both load balancer URLs are provided as input 4 | if [ $# -ne 2 ]; then 5 |   echo "Usage: $0 <LB1-URL> <LB2-URL>" 6 |   exit 1 7 | fi 8 | 9 | # Assign input arguments to variables 10 | LB1=$1 11 | LB2=$2 12 | 13 | # Define available routes for LB1 and LB2 14 | LB1_ROUTES=("/call-b" "/hello-a" "/getme-coffee") 15 | LB2_ROUTES=("/call-a" "/hello-b" "/getme-coffee") 16 | 17 | # Function to generate random index and request from LB1 18 | request_lb1() { 19 |   RANDOM_INDEX=$((RANDOM % ${#LB1_ROUTES[@]})) 20 |   URL="$LB1${LB1_ROUTES[$RANDOM_INDEX]}" 21 |   echo "Sending request to LB1: $URL" 22 |   curl -s -o /dev/null -w "%{http_code}" $URL 23 | } 24 | 25 | # Function to generate random index and request from LB2 26 | request_lb2() { 27 |   RANDOM_INDEX=$((RANDOM % ${#LB2_ROUTES[@]})) 28 |   URL="$LB2${LB2_ROUTES[$RANDOM_INDEX]}" 29 |   echo "Sending request to LB2: $URL" 30 |   curl -s -o /dev/null -w "%{http_code}" $URL 31 | } 32 | 33 | # Loop for sending requests to both LBs randomly 34 | while true; do 35 |   # Randomly choose between LB1 and LB2 36 |   if (( RANDOM % 2 == 0 )); then 37 |     request_lb1 38 |   else 39 |     request_lb2 40 |   fi 41 | 42 |   # Sleep for 1 second between requests (adjust if needed) 43 |   sleep 1 44 | done 45 | -------------------------------------------------------------------------------- /opensearch-stack/fluent-bit-config.yaml: -------------------------------------------------------------------------------- 1 | # fluent-bit-config.yaml 2 | apiVersion: v1 3 | kind: ConfigMap 4 | metadata: 5 |   name: fluent-bit-config 6 |   namespace: logging 7 | data: 8 |   fluent-bit.conf: | 9 |     [SERVICE] 10 |         Flush             5 11 |         Daemon            Off 12 |         Log_Level         info 13 |         Parsers_File      parsers.conf 14 | 15 |     [INPUT] 16 |         Name              tail 17 |         Path              /var/log/containers/*.log 18 |         Parser            docker 19 |         Tag               kube.* 20 |         Refresh_Interval  5 21 |         Mem_Buf_Limit     5MB 22 |         Skip_Long_Lines   On 23 | 24 |     [FILTER] 25 |         Name              kubernetes 26 |         Match             kube.* 27 |         Kube_URL          https://kubernetes.default.svc:443 28 |         Merge_Log         On 29 |         K8S-Logging.Parser On 30 |         K8S-Logging.Exclude Off 31 | 32 |     [OUTPUT] 33 |         Name              opensearch 34 |         Match             * 35 |         Host              <OPENSEARCH_HOST> 36 |         Port              <OPENSEARCH_PORT> 37 |         Index             fluentbit 38 |         HTTP_User         <OPENSEARCH_USER> 39 |         HTTP_Passwd       <OPENSEARCH_PASSWORD> 40 |         TLS               On 41 |         TLS.verify        Off 42 |         Suppress_Type_Name On 43 |         Include_Tag_Key   On 44 |         Logstash_Format   On 45 |         Logstash_Prefix   kubernetes 46 |         Replace_Dots      On 47 |         Retry_Limit       False 48 |         # Add these parameters for OpenSearch compatibility 49 |         Write_Operation   create 50 | 51 | 52 |   parsers.conf: | 53 |     [PARSER] 54 |         Name              docker 55 |         Format            json 56 |         Time_Key          time 57 |         Time_Format       %Y-%m-%dT%H:%M:%S.%L 58 |         Time_Keep         On 59 |         Decode_Field_As   escaped_utf8 log do_next 60 |         Decode_Field_As   json log 61 | -------------------------------------------------------------------------------- /opensearch-stack/fluent-bit-daemonset.yaml: -------------------------------------------------------------------------------- 1 | #
fluent-bit-daemonset.yaml 2 | apiVersion: apps/v1 3 | kind: DaemonSet 4 | metadata: 5 |   name: fluent-bit 6 |   namespace: logging 7 |   labels: 8 |     app: fluent-bit 9 | spec: 10 |   selector: 11 |     matchLabels: 12 |       app: fluent-bit 13 |   template: 14 |     metadata: 15 |       labels: 16 |         app: fluent-bit 17 |     spec: 18 |       serviceAccountName: fluent-bit 19 |       containers: 20 |         - name: fluent-bit 21 |           image: fluent/fluent-bit:3.0.4 22 |           imagePullPolicy: Always 23 |           volumeMounts: 24 |             - name: varlog 25 |               mountPath: /var/log 26 |             - name: varlibdockercontainers 27 |               mountPath: /var/lib/docker/containers 28 |               readOnly: true 29 |             - name: config 30 |               mountPath: /fluent-bit/etc/ 31 |       volumes: 32 |         - name: varlog 33 |           hostPath: 34 |             path: /var/log 35 |         - name: varlibdockercontainers 36 |           hostPath: 37 |             path: /var/lib/docker/containers 38 |         - name: config 39 |           configMap: 40 |             name: fluent-bit-config 41 | -------------------------------------------------------------------------------- /opensearch-stack/log-generator.yaml: -------------------------------------------------------------------------------- 1 | # log-generator-deployment.yaml 2 | apiVersion: apps/v1 3 | kind: Deployment 4 | metadata: 5 |   name: log-generator 6 |   labels: 7 |     app: log-generator 8 | spec: 9 |   replicas: 1 10 |   selector: 11 |     matchLabels: 12 |       app: log-generator 13 |   template: 14 |     metadata: 15 |       labels: 16 |         app: log-generator 17 |     spec: 18 |       containers: 19 |         - name: log-generator 20 |           image: busybox 21 |           command: ["/bin/sh", "-c"] 22 |           args: 23 |             - | 24 |               while true; do 25 |                 echo "$(date) INFO: Application started"; 26 |                 echo "$(date) DEBUG: Debugging app logic"; 27 |                 echo "$(date) ERROR: An error occurred!"; 28 |                 sleep 5; 29 |               done 30 | -------------------------------------------------------------------------------- /opensearch-stack/prerequisites.md: -------------------------------------------------------------------------------- 1 | # OpenSearch Stack 2 | 3 | ### Prerequisites 4 | 5 | - Set up a Kubernetes cluster 6 | - Create the logging namespace - `kubectl create ns logging` 7 | - Create a service account in the namespace - `kubectl create sa fluent-bit -n logging` (grant it API read access as shown in the RBAC sketch below) 8 | 
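9 | ### RBAC for the Fluent Bit service account
10 | 
11 | The `kubernetes` filter in `fluent-bit-config.yaml` calls the Kubernetes API to enrich each log record with pod metadata, so the `fluent-bit` service account needs read access to pods and namespaces. Below is a minimal sketch; the resource names and the file name `fluent-bit-rbac.yaml` are assumptions, so adjust them to your own conventions:
12 | 
13 | ```yaml
14 | # fluent-bit-rbac.yaml (hypothetical file name)
15 | # Grants the fluent-bit service account read-only access to pod and namespace metadata.
16 | apiVersion: rbac.authorization.k8s.io/v1
17 | kind: ClusterRole
18 | metadata:
19 |   name: fluent-bit-read
20 | rules:
21 |   - apiGroups: [""]
22 |     resources: ["pods", "namespaces"]
23 |     verbs: ["get", "list", "watch"]
24 | ---
25 | apiVersion: rbac.authorization.k8s.io/v1
26 | kind: ClusterRoleBinding
27 | metadata:
28 |   name: fluent-bit-read
29 | roleRef:
30 |   apiGroup: rbac.authorization.k8s.io
31 |   kind: ClusterRole
32 |   name: fluent-bit-read
33 | subjects:
34 |   - kind: ServiceAccount
35 |     name: fluent-bit
36 |     namespace: logging
37 | ```
38 | 
39 | Apply it with `kubectl apply -f fluent-bit-rbac.yaml` before deploying the DaemonSet.
--------------------------------------------------------------------------------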