├── .dockerignore
├── .github
│   └── workflows
│       └── build_and_push_images.yml
├── .gitignore
├── 00-cluster-setup.md
├── 01-collector-introduction.md
├── 02-operator-introduction.md
├── 03-app-instrumentation.md
├── 04-metrics.md
├── 05-logs.md
├── 06-roadmap.md
├── LICENSE
├── README.md
├── app
│   ├── README.md
│   ├── backend1
│   │   ├── Dockerfile
│   │   ├── app.py
│   │   ├── requirements.txt
│   │   └── run.sh
│   ├── backend2
│   │   ├── .gitignore
│   │   ├── Dockerfile
│   │   ├── build.gradle
│   │   ├── gradle
│   │   │   └── wrapper
│   │   │       ├── gradle-wrapper.jar
│   │   │       └── gradle-wrapper.properties
│   │   ├── gradlew
│   │   ├── gradlew.bat
│   │   ├── run.sh
│   │   ├── settings.gradle
│   │   └── src
│   │       └── main
│   │           ├── java
│   │           │   └── io
│   │           │       └── opentelemetry
│   │           │           └── dice
│   │           │               ├── DiceApplication.java
│   │           │               └── RollController.java
│   │           └── resources
│   │               └── application.properties
│   ├── backend3
│   │   ├── Dockerfile
│   │   ├── Program.cs
│   │   ├── Properties
│   │   │   └── launchSettings.json
│   │   ├── appsettings.Development.json
│   │   ├── appsettings.json
│   │   └── backend3.csproj
│   ├── docker-compose.yml
│   ├── frontend
│   │   ├── Dockerfile
│   │   ├── index.js
│   │   ├── instrument.js
│   │   ├── package-lock.json
│   │   ├── package.json
│   │   └── run.sh
│   ├── instrumentation.yaml
│   ├── k8s-annotated.yaml
│   ├── k8s.yaml
│   ├── loadgen
│   │   ├── Dockerfile
│   │   └── run.sh
│   └── otel-env
├── backend
│   ├── 01-backend.yaml
│   ├── 02-collector.yaml
│   ├── 03-collector-prom-cr.yaml
│   ├── 04-servicemonitors.yaml
│   └── 05-collector-daemonset.yaml
├── collector-config.yaml
├── images
│   ├── filelog-flow.png
│   ├── grafana-complete-trace.png
│   ├── grafana-metrics-backend1-prometheus.png
│   ├── grafana-metrics-backend1.png
│   ├── grafana-metrics-backend2.png
│   ├── grafana-metrics-collector-addtl-scrapes.png
│   ├── grafana-metrics-collector-red.png
│   ├── grafana-metrics-frontend.png
│   ├── grafana-metrics-ta-server.png
│   ├── grafana-metrics-ta.png
│   ├── grafana-traces-player-attribute.jpg
│   ├── grafana-traces-resoure.jpg
│   ├── logs-dashboard.png
│   └── otel-collector.png
└── slides.pdf

/.dockerignore:
--------------------------------------------------------------------------------
Dockerfile
node_modules
--------------------------------------------------------------------------------
/.github/workflows/build_and_push_images.yml:
--------------------------------------------------------------------------------
name: "Build and Push Images"

on:
  push:
    paths:
      - "app/**"

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    strategy:
      matrix:
        app:
          - frontend
          - backend1
          - backend2
          - loadgen

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          config-inline: |
            [worker.oci]
              max-parallelism = 2

      - name: Build and push images
        uses: docker/build-push-action@v3.3.0
        with:
          context: ./app/${{ matrix.app }}
          file: ./app/${{ matrix.app }}/Dockerfile
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ghcr.io/${{ github.repository }}-${{ matrix.app }}
          cache-from: type=gha
          cache-to: type=gha
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.gradle
build
node_modules
obj
bin
__pycache__
javaagent.jar
--------------------------------------------------------------------------------
/00-cluster-setup.md:
--------------------------------------------------------------------------------
# Cluster setup

This tutorial requires Docker and a Kubernetes cluster; refer to [Kind](https://kind.sigs.k8s.io/docs/user/quick-start/) or [Minikube](https://minikube.sigs.k8s.io/docs/start/) for local Kubernetes cluster installation.

## Quickstart

### Kubectl

Almost all of the following steps in this tutorial require kubectl. The version you use should not differ by more than ±1 minor version from the cluster version. Please follow [this](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux) installation guide.

### Kind

If [go](https://go.dev/) is installed on your machine, `kind` can be easily installed as follows:

```bash
go install sigs.k8s.io/kind@v0.18.0
```

If this is not the case, simply download the [kind-v0.18.0](https://github.com/kubernetes-sigs/kind/releases/tag/v0.18.0) binary from the release page. (Other versions will probably work too. :cowboy_hat_face:)

### Create a workshop cluster

After a successful installation, a cluster can be created as follows:

```bash
kind create cluster --name=workshop --image kindest/node:v1.26.3
```
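Kind can also be driven by a declarative configuration file instead of flags. The following is a minimal sketch, not required for this tutorial; the filename `kind-config.yaml` is arbitrary:

```yaml
# kind-config.yaml: equivalent to passing --image on the command line
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.26.3
```

You would then create the cluster with `kind create cluster --name=workshop --config kind-config.yaml`.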
Kind automatically sets the kube context to the created workshop cluster. We can easily check this by getting information about our nodes.

```bash
kubectl get nodes
```

Expected is the following:

```bash
NAME                     STATUS   ROLES           AGE   VERSION
workshop-control-plane   Ready    control-plane   75s   v1.26.3
```

### Cleanup

```bash
kind delete cluster --name=workshop
```

### Telemetrygen (optional)

To send telemetry to the OpenTelemetry Collector (that will be created in step 1), there is a `telemetrygen` helper tool [available in the contrib repository](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.75.0/cmd/telemetrygen). If go is not installed, the container image can be used.

```bash
go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen@v0.74.0
```

---
[Next steps](./README.md#deploy-cert-manager)
--------------------------------------------------------------------------------
/01-collector-introduction.md:
--------------------------------------------------------------------------------
# OpenTelemetry Collector introduction

This tutorial step focuses on the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) introduction.

## Collector Overview

Structure of the OpenTelemetry Collector:
> A vendor-agnostic way to receive, process and export telemetry data.

![OpenTelemetry Collector](images/otel-collector.png)

The OpenTelemetry Collector can be divided into a few major components.

- **Receivers**: Collect data from a specific source, like an application or infrastructure, and convert it into [pData (pipeline data)](https://pkg.go.dev/go.opentelemetry.io/collector/consumer/pdata#section-documentation). This component can be active (e.g. Prometheus) or passive (OTLP).
- **Processors**: Manipulate the data collected by receivers in some way. For example, a processor might filter out irrelevant data, or add metadata to help with analysis. Examples are the batch and metric-renaming processors.
- **Exporters**: Send data to an external system for storage or analysis. Examples are the Prometheus, Loki or OTLP exporters.
- **Extensions**: Add additional functionality to OpenTelemetry, like configuring a bearer token or offering a Jaeger remote sampling endpoint.
- **Connectors**: A connector is both an exporter and a receiver. It consumes data as an exporter in one pipeline and emits data as a receiver in another pipeline.

For more details, check the [official documentation](https://opentelemetry.io/docs/collector/).

### Collector Distributions

A set of receivers, processors, exporters, extensions and connectors is a distribution. Officially, there are two distributions provided, named `core` and `contrib`. New releases are currently published every two weeks. After a release, the default collector distributions are available as binaries, container images and Linux distro packages.

- **core**: Components are developed and maintained by the OpenTelemetry core team. [[manifest-v0.74.0](https://github.com/open-telemetry/opentelemetry-collector-releases/blob/v0.74.0/distributions/otelcol/manifest.yaml)]
- **contrib**: Extends the core distribution with a large list of components developed by the OpenTelemetry community. [[manifest-v0.74.0](https://github.com/open-telemetry/opentelemetry-collector-releases/blob/v0.74.0/distributions/otelcol-contrib/manifest.yaml)]

### OpenTelemetry Collector Builder (OCB)

With each new release of the OpenTelemetry Collector distributions, a new version of the [OpenTelemetry Collector Builder](https://github.com/open-telemetry/opentelemetry-collector/blob/v0.74.0/cmd/builder) is released too. The builder assembles a custom distribution from a manifest like the following:

```yaml
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.74.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.74.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver v0.74.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.74.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.74.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.74.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter v0.74.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter v0.74.0
extensions:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/oidcauthextension v0.74.0
connectors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/spanmetricsconnector v0.74.0
```

### Configuration

The configuration of the OpenTelemetry Collector is described in YAML. The following shows an `OTLP/gRPC` receiver listening on `localhost:4317`, a batch processor with default parameters, and a logging exporter with normal verbosity. It also describes multiple pipelines for the different telemetry signals, which all route their collected telemetry data to the logging exporter.
The easiest way to learn more about the configuration options of individual components is to visit the readme in the component folder directly. Example: [loggingexporter](https://github.com/open-telemetry/opentelemetry-collector/blob/v0.74.0/exporter/loggingexporter).

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 127.0.0.1:4317
processors:
  batch:

exporters:
  logging:
    verbosity: normal

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```

### Run collector locally

Here we launch a collector, accessible via localhost, with the configuration shown above:

```bash
docker run --rm -it --name otel-collector -p 4317:4317 -p 4318:4318 ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector:0.74.0 --config https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2023-opentelemetry-kubernetes-tutorial/main/collector-config.yaml
```

### Send telemetry data to your Collector

The previously configured collector is now listening on `localhost:4317` **without** TLS. To test whether the collector actually receives metrics, logs and traces and passes them on to the specified logging exporter, we can generate test data with `telemetrygen`.

```bash
telemetrygen metrics --otlp-insecure --duration 10s --rate 4
# or
telemetrygen logs --otlp-insecure --duration 10s --rate 4
# or
telemetrygen traces --otlp-insecure --duration 10s --rate 4
```

If you do not have `telemetrygen` installed, you can alternatively use the container image instead:

```bash
docker run --rm -it --link otel-collector ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.74.0 metrics --otlp-endpoint=otel-collector:4317 --otlp-insecure --duration 10s --rate 4
# or
docker run --rm -it --link otel-collector ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.74.0 logs --otlp-endpoint=otel-collector:4317 --otlp-insecure --duration 10s --rate 4
# or
docker run --rm -it --link otel-collector ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.74.0 traces --otlp-endpoint=otel-collector:4317 --otlp-insecure --duration 10s --rate 4
```

Expected output:

```bash
2023-04-22T12:27:26.638Z  info  LogsExporter  {"kind": "exporter", "data_type": "logs", "name": "logging", "#logs": 1}
2023-04-22T12:27:30.248Z  info  LogsExporter  {"kind": "exporter", "data_type": "logs", "name": "logging", "#logs": 2}
2023-04-22T12:27:34.457Z  info  MetricsExporter  {"kind": "exporter", "data_type": "metrics", "name": "logging", "#metrics": 2}
2023-04-22T12:27:34.857Z  info  MetricsExporter  {"kind": "exporter", "data_type": "metrics", "name": "logging", "#metrics": 1}
2023-04-22T12:27:39.468Z  info  TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 8}
2023-04-22T12:27:41.473Z  info  TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 10}
```

---
[Next steps](./02-operator-introduction.md)
--------------------------------------------------------------------------------
/02-operator-introduction.md:
--------------------------------------------------------------------------------
# OpenTelemetry Operator introduction

This tutorial step focuses on the [OpenTelemetry operator](https://github.com/open-telemetry/opentelemetry-operator) introduction.

## What is a Kubernetes operator

A Kubernetes operator can:
* Create `CustomResourceDefinitions` (CRDs) in the cluster
* Hide the deployment complexity of the application
* Support application upgrades (handles breaking changes, schema migrations)
* Auto-scale the application

## What is the OpenTelemetry operator

The OpenTelemetry Kubernetes operator can:
* Deploy and manage the OpenTelemetry collector
* Instrument workloads with OpenTelemetry auto-instrumentation/agents (see the [app instrumentation tutorial step](./03-app-instrumentation.md)). Supports `Java`, `.NET`, `Node.JS` and `Python`.
* Read Prometheus `podmonitor.monitoring.coreos.com` and `servicemonitor.monitoring.coreos.com` resources and distribute scrape targets across deployed OpenTelemetry collectors (see the [metrics tutorial step](./04-metrics.md))

It manages two `CustomResourceDefinition`s (CRDs):
* `opentelemetrycollectors.opentelemetry.io`, short name `otelcol`
* `instrumentations.opentelemetry.io`, short name `otelinst`

## Deploy the operator

The operator installation consists of the operator `Deployment`, `Service`, `ClusterRole`, `ClusterRoleBinding`, `CustomResourceDefinitions` etc.

The operator can be deployed via:
* [Applying the operator Kubernetes manifest files](https://github.com/open-telemetry/opentelemetry-operator/releases)
* [OperatorHub for Kubernetes](https://operatorhub.io/operator/opentelemetry-operator)
* OperatorHub on OpenShift

The default operator installation uses cert-manager to provision certificates for the validating and mutating admission webhooks.

### Deploy the operator to the local Kubernetes cluster

Deploy the OpenTelemetry operator to the cluster:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
sleep 50 # wait until cert-manager is up and ready
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/download/v0.74.0/opentelemetry-operator.yaml
```

Verify the operator installation:

```bash
kubectl get pods -w -n opentelemetry-operator-system
```

## OpenTelemetry collector CRD

Example `OpenTelemetryCollector` CR:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.74.0
  mode: deployment # statefulset, daemonset, sidecar
  autoscaler:
    targetCPUUtilization: 90
    minReplicas: 1
    maxReplicas: 5
  ingress:
    hostname: ...
  config: | # contains OpenTelemetry collector configuration
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:

    exporters:
      logging:

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
```

The sidecar can be injected into a pod by applying the `sidecar.opentelemetry.io/inject: "true"` annotation to the pod spec.
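For illustration, a minimal sketch of a pod opting in to sidecar injection. The application name and image are placeholders; the annotation value may also name a specific sidecar-mode collector CR instead of `"true"`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    sidecar.opentelemetry.io/inject: "true" # the operator injects the sidecar collector container
spec:
  containers:
    - name: myapp
      image: ghcr.io/example/myapp:latest # placeholder image
```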
### Deploy collector

See the [collector CR](./backend/02-collector.yaml), and deploy it:

```bash
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2023-opentelemetry-kubernetes-tutorial/main/backend/02-collector.yaml
```

Verify the collector deployment:

```bash
kubectl get pods -n observability-backend -l app.kubernetes.io/component=opentelemetry-collector -w
```

The collector is by default configured to export data to the observability backend that was deployed in the prerequisites section; however, the configuration can be changed to export data to any [other supported observability system](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter).

Data can be pushed to the collector via the `otel-collector` service in the `observability-backend` namespace. The full endpoint is `otel-collector.observability-backend.svc.cluster.local:4317`. The collector is by default configured to receive the OpenTelemetry protocol (OTLP).

### Change collector configuration

In this step we will change the collector config by adding the Jaeger receiver.

Let's get the currently created services and see what ports are exposed:

```bash
kubectl get svc otel-collector -n observability-backend -o yaml
```

Edit the CR:

```bash
kubectl edit opentelemetrycollectors.opentelemetry.io otel -n observability-backend
```

Let's add the Jaeger receiver:

```yaml
receivers:
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [memory_limiter, batch]
```

The collector pod should be re-deployed and the `otel-collector` service should expose more ports:

```bash
kubectl get svc -n observability-backend
NAME                        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                             AGE
otel-collector              ClusterIP   10.217.4.201   <none>        14250/TCP,6832/UDP,6831/UDP,14268/TCP,4317/TCP,4318/TCP,55681/TCP   5m15s
otel-collector-headless     ClusterIP   None           <none>        14250/TCP,6832/UDP,6831/UDP,14268/TCP,4317/TCP,4318/TCP,55681/TCP   5m15s
otel-collector-monitoring   ClusterIP   10.217.4.207   <none>        8888/TCP
```

## Instrumentation CRD

The operator uses a pod mutating webhook to inject auto-instrumentation libraries into starting pods.
The webhook adds an init container that copies the auto-instrumentation libraries into a volume that is also mounted into the application container, and
it configures the runtime (e.g. the JVM via `JAVA_TOOL_OPTIONS`) to load the libraries.
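A rough sketch of the relevant parts of a mutated pod, assuming Java injection. Names and the image tag follow the operator defaults at the time of writing and may differ between versions:

```yaml
spec:
  initContainers:
    - name: opentelemetry-auto-instrumentation
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest # assumed tag
      command: ["cp", "/javaagent.jar", "/otel-auto-instrumentation/javaagent.jar"]
      volumeMounts:
        - name: opentelemetry-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  containers:
    - name: myapp # the application container
      env:
        - name: JAVA_TOOL_OPTIONS # makes the JVM load the copied agent
          value: " -javaagent:/otel-auto-instrumentation/javaagent.jar"
      volumeMounts:
        - name: opentelemetry-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  volumes:
    - name: opentelemetry-auto-instrumentation
      emptyDir: {}
```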
Example OpenTelemetry `Instrumentation` CR:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  resource:
    addK8sUIDAttributes: true
    attributes: # Add user defined attributes
      env: production
  python:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector:4318
  dotnet:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector:4318
```

Then use an annotation on the pod spec to enable the injection, e.g. `instrumentation.opentelemetry.io/inject-java: "true"`.

We will create the `Instrumentation` resource in the next tutorial step.

---
[Next steps](./03-app-instrumentation.md)
--------------------------------------------------------------------------------
/03-app-instrumentation.md:
--------------------------------------------------------------------------------
# Deploy & instrument the application

This tutorial step focuses on instrumenting the services of the [sample application](./app).

## Application Description

The sample application is a simple _"dice game"_, where two players roll a dice, and the player with the highest number wins.

There are 3 microservices within this application:

- Service `frontend` in Node.JS, that has an API endpoint `/` which takes two player names as query parameters (player1 and player2). The service calls two downstream services (backend1, backend2), each returning a random number between 1 and 6. The winner is computed and returned.
- Service `backend1` in Python, that has an API endpoint `/rolldice` which takes a player name as query parameter. The service returns a random number between 1 and 6.
- Service `backend2` in Java, that also has an API endpoint `/rolldice` which takes a player name as query parameter. The service returns a random number between 1 and 6.

Additionally, there is a `loadgen` service, which utilizes `curl` to periodically call the frontend service.

Let's assume players `alice` and `bob` use our service; here's a potential sequence diagram:

```mermaid
sequenceDiagram
    loadgen->>frontend: /?player1=bob&player2=alice
    frontend->>backend1: /rolldice?player=bob
    frontend->>backend2: /rolldice?player=alice
    backend1-->>frontend: 3
    frontend-->>loadgen: bob rolls: 3
    backend2-->>frontend: 6
    frontend-->>loadgen: alice rolls: 6
    frontend-->>loadgen: alice wins
```

## Manual or Automatic Instrumentation?

To make your application emit traces, metrics & logs you can either instrument your application _manually_ or _automatically_:

- Manual instrumentation means that you modify your code yourself: you initialize and configure the SDK, you load instrumentation libraries, you create your own spans, metrics, etc.
  Developers can use this approach to tune the observability of their application to their needs.
- Automatic instrumentation means that you don't have to touch your code to make your application emit telemetry.
  Automatic instrumentation is great to get you started with OpenTelemetry, and it is also valuable for Application Operators, who have no access to or insight into the source code.

In the following we will introduce you to both approaches.

## Manual instrumentation

As a developer you can add OpenTelemetry to your code by using the language-specific SDKs.

Here you will only instrument the frontend service manually; we will use automatic instrumentation for the other services in the next step.

Before starting, make sure that you have an OpenTelemetry collector up and running locally, as described in the [OpenTelemetry Collector introduction](./01-collector-introduction.md).

For development you can run the app locally by installing all dependencies and running it with `nodemon` from the [./app/frontend](./app/frontend/) directory:

```bash
cd app/frontend
npm install
npx nodemon index.js
```

If you don't have `Node.JS` installed locally, you can use a container for development:

```bash
cd app/frontend
docker run -p 4000:4000 --link otel-collector --rm -t -i -v ${PWD}:/app:z node:18-alpine /bin/sh
```

Within the container run:

```bash
cd /app
npm install
npx nodemon index.js
```

Open the [index.js](./app/frontend/index.js) file with your preferred editor.
Use the instructions provided by the [official OpenTelemetry documentation](https://opentelemetry.io/docs/instrumentation/js/getting-started/nodejs/) to add tracing & metrics. A few differences in your implementation:

- Instead of creating a dedicated `instrument.js` you can add the initialization of the SDK at the top of `index.js` directly.
- Replace the `ConsoleSpanExporter` with an `OTLPTraceExporter` as outlined in the [Exporters](https://opentelemetry.io/docs/instrumentation/js/exporters/) documentation (make use of `@opentelemetry/exporter-metrics-otlp-grpc` & `@opentelemetry/exporter-trace-otlp-grpc`)

Give it a try yourself; if you are unsure how to accomplish this, you can peek into the [instrument.js](./app/frontend/instrument.js) file.

To see if spans are emitted to the collector, call the frontend service via your browser or curl:

```bash
curl localhost:4000/
```

The **Internal Server Error** response is OK for now, because you don't have the backends running.

If all works, your OpenTelemetry collector should receive metrics & traces, and the logs of the frontend service should contain `trace_id` and `span_id`.

Finally, look into the `index.js` file once again: there are a few additional `TODOs` for you!
## Deploy the application

Run the following command to deploy the sample application to your cluster:

```bash
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2023-opentelemetry-kubernetes-tutorial/main/app/k8s.yaml
```

After a short while, verify that it has been deployed successfully:

```bash
$ kubectl get all -n tutorial-application
NAME                                       READY   STATUS    RESTARTS   AGE
pod/loadgen-deployment-5cc46c7f8c-6wwrm    1/1     Running   0          39m
pod/backend1-deployment-69bf64db96-nhd98   1/1     Running   0          19m
pod/frontend-deployment-bdbff495f-wc48h    1/1     Running   0          19m
pod/backend2-deployment-856b75d696-d4m6d   1/1     Running   0          19m

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/backend1-service   ClusterIP   10.43.194.58   <none>        5000/TCP   39m
service/backend2-service   ClusterIP   10.43.176.21   <none>        5165/TCP   39m
service/frontend-service   ClusterIP   10.43.82.230   <none>        4000/TCP   39m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/loadgen-deployment    1/1     1            1           39m
deployment.apps/backend1-deployment   1/1     1            1           39m
deployment.apps/frontend-deployment   1/1     1            1           39m
deployment.apps/backend2-deployment   1/1     1            1           39m
```

### Port forward

Now let's port-forward the frontend application:

```bash
kubectl port-forward -n tutorial-application svc/frontend-service 4000:4000
```

Open it in the browser: [localhost:4000](http://localhost:4000/)

## Auto-instrumentation

The OpenTelemetry Operator supports injecting and configuring auto-instrumentation for you.

With the operator & collector running, you can now let the operator know what pods to instrument and which auto-instrumentation to use for those pods. This is done via the `Instrumentation` CRD. A basic `Instrumentation` resource looks like the following:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
  namespace: tutorial-application
spec:
  exporter:
    endpoint: http://otel-collector.observability-backend.svc.cluster.local:4317
```

To create an [Instrumentation resource](./app/instrumentation.yaml) for our sample application, run the following command:

```bash
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2023-opentelemetry-kubernetes-tutorial/main/app/instrumentation.yaml
```

Until now we have only created the `Instrumentation` resource; as a next step you need to opt your services in to auto-instrumentation. This is done by updating your service's `spec.template.metadata.annotations`, as sketched below.
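A minimal sketch of where such an annotation lives on a `Deployment` (note that it goes on the pod template, not on the `Deployment` metadata; the names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend1-deployment # illustrative name
spec:
  template:
    metadata:
      annotations:
        # evaluated by the operator's webhook when the pod is (re)created
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
        - name: backend1
          image: ghcr.io/example/backend1:latest # placeholder image
```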
### Configure Node.JS - frontend service

You have instrumented the frontend service manually in a previous step. In a real-world scenario you would now rebuild your container image, push it to the registry and make use of it in your deployment:

```yaml
spec:
  containers:
    - name: frontend
      image: ghcr.io/pavolloffay/kubecon-eu-2023-opentelemetry-kubernetes-tutorial-frontend:latest
      env:
        - name: OTEL_INSTRUMENTATION_ENABLED
          value: "true"
```

To provide you with a shortcut here, we have prepared a way for you to use a _manually_ instrumented version of the frontend: the environment variable `OTEL_INSTRUMENTATION_ENABLED` set to true will make sure that the [instrument.js](./app/frontend/instrument.js) is included.

The `Node.js` auto-instrumentation supports traces and metrics.

Before applying the annotation, let's take a look at the pod specification:

```bash
kubectl get pods -n tutorial-application -l app=frontend -o yaml
```

All you need to do now is to inject the configuration:

```bash
kubectl patch deployment frontend-deployment -n tutorial-application -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-sdk":"true"}}}} }'
```

Now verify that it worked:

```bash
kubectl get pods -n tutorial-application -l app=frontend -o yaml
```

and [access traces](http://localhost:3000/grafana/explore?orgId=1&left=%7B%22datasource%22:%22tempo%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22nativeSearch%22,%22serviceName%22:%22frontend-deployment%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D).

### Instrument Python - backend1 service

The `Python` auto-instrumentation supports traces and metrics.

Before applying the annotation, let's take a look at the pod specification:

```bash
kubectl get pods -n tutorial-application -l app=backend1 -o yaml
```

Let's enable the instrumentation by applying the annotation:

```bash
kubectl patch deployment backend1-deployment -n tutorial-application -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-python":"true"}}}} }'
```

Now verify the instrumentation:

```bash
kubectl get pods -n tutorial-application -l app=backend1 -o yaml
```

and [access traces](http://localhost:3000/grafana/explore?orgId=1&left=%7B%22datasource%22:%22tempo%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22nativeSearch%22,%22serviceName%22:%22backend1-deployment%22,%22spanName%22:%22%2Frolldice%22%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22traceId%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D).

### Instrument Java - backend2 service

The `Java` auto-instrumentation supports traces, metrics and logs.
Before applying the annotation, let's take a look at the pod specification:

```bash
kubectl get pods -n tutorial-application -l app=backend2 -o yaml
```

Let's enable the instrumentation by applying the annotation:

```bash
kubectl patch deployment backend2-deployment -n tutorial-application -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}} }'
```

Now verify the instrumentation:

```bash
kubectl get pods -n tutorial-application -l app=backend2 -o yaml
```

and [access traces](http://localhost:3000/grafana/explore?orgId=1&left=%7B%22datasource%22:%22tempo%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22nativeSearch%22,%22serviceName%22:%22backend2-deployment%22%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22traceId%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D).

### The full picture

How everything should look after running through the previous steps:

```mermaid
flowchart LR
    subgraph namespace: observability-backend
        subgraph pod: collector
            OC{OTel Collector}
        end
        subgraph pod: mimir
            OC --metrics--> Mimir
        end
        subgraph pod: loki
            OC --logs--> Loki
        end
        subgraph pod: tempo
            OC --traces--> Tempo
        end
        subgraph pod: grafana
            grafana-.->Mimir
            grafana-.->Loki
            grafana-.->Tempo
        end
    end
    subgraph namespace: app
        subgraph pod: loadgen
            LG((loadgen))
        end
        subgraph pod: frontend
            LG --http--> F((frontend)) --metrics,traces--> OC
        end
        subgraph pod: backend1
            F --http--> B1((backend1)) --metrics,traces--> OC
        end
        subgraph pod: backend2
            F --http--> B2((backend2)) --logs,metrics,traces--> OC
        end
    end
```

Wait for a little bit and then [access your traces once again](http://localhost:3000/grafana/explore?orgId=1&left=%7B%22datasource%22:%22tempo%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22nativeSearch%22,%22serviceName%22:%22frontend-deployment%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D). You should see traces starting in the frontend and continuing across the backend services.

![View of a trace that shows spans in the frontend, backend1 and backend2](./images/grafana-complete-trace.png)

## Resource attributes

There are several ways in which essential Kubernetes resource attributes (`Namespace`, `Deployment`, `ReplicaSet`, `Pod` name and UIDs) can be collected:

* The `Instrumentation` CR - the operator injects the attributes into the application container via the `OTEL_RESOURCE_ATTRIBUTES` env var. The OpenTelemetry SDK used in the auto-instrumentation reads the variable.
* The `OpenTelemetryCollector` CR - the [k8sattributesprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor) enriches spans with attributes in the collector (a minimal configuration sketch follows this list)
* The `OpenTelemetryCollector` CR - in the `sidecar` mode use the [resourcedetectionprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor). The operator sets `OTEL_RESOURCE_ATTRIBUTES` with Kubernetes resource attributes and the variable can be consumed by the `env` detector; see [the blog post](https://opentelemetry.io/blog/2022/k8s-metadata/#using-resource-detector-processor) for more details.
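As referenced in the list above, a minimal sketch of the `k8sattributes` processor wired into a traces pipeline. The processor needs RBAC permissions to read pod metadata; the attribute list shown is illustrative, as the defaults already cover the common `k8s.*` attributes:

```yaml
processors:
  k8sattributes:
    extract:
      metadata: # attributes to attach to each span
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.pod.name
        - k8s.pod.uid

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [logging]
```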
On the injected pod, the Kubernetes resource attributes are set like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-java: "true"
  name: backend2-deployment-58cfcb8db7-tdc8v
  namespace: tutorial-application
spec:
  containers:
  - env:
    - name: JAVA_TOOL_OPTIONS
      value: ' -javaagent:/otel-auto-instrumentation/javaagent.jar'
    - name: OTEL_SERVICE_NAME
      value: backend2-deployment
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://otel-collector.observability-backend.svc.cluster.local:4317
    - name: OTEL_RESOURCE_ATTRIBUTES_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: OTEL_PROPAGATORS
      value: tracecontext,baggage,b3
    - name: OTEL_TRACES_SAMPLER
      value: parentbased_traceidratio
    - name: OTEL_TRACES_SAMPLER_ARG
      value: "1"
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: k8s.container.name=backend2,k8s.deployment.name=backend2-deployment,k8s.namespace.name=tutorial-application,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=backend2-deployment-58cfcb8db7
```

Let's enable the collection of Kubernetes UID attributes. Update the `Instrumentation` CR:

```bash
kubectl edit instrumentations.opentelemetry.io my-instrumentation -n tutorial-application
```

```yaml
spec:
  resource:
    addK8sUIDAttributes: true
```

The resource attributes are injected into the application container; to apply the change to already running applications a restart is required:

```bash
kubectl rollout restart deployment -n tutorial-application -l app=backend1
kubectl rollout restart deployment -n tutorial-application -l app=backend2
kubectl rollout restart deployment -n tutorial-application -l app=frontend
```

[Traces in Grafana](http://localhost:3000/grafana/explore?orgId=1&left=%7B%22datasource%22:%22tempo%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22nativeSearch%22,%22serviceName%22:%22frontend-deployment%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D).

![Traces in Grafana](./images/grafana-traces-resoure.jpg)

## Sampling

Sampling in the OpenTelemetry SDK and auto-instrumentations is configured via the `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` environment variables.
In our demo these environment variables are configured in the `Instrumentation` CR.

Let's change the sampling rate (argument) to sample 25% of requests:

```bash
kubectl edit instrumentations.opentelemetry.io my-instrumentation -n tutorial-application
```

```yaml
spec:
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
```

A restart of the applications is required again; the OTEL environment variables are set only at pod startup:

```bash
kubectl rollout restart deployment -n tutorial-application -l app=backend1
kubectl rollout restart deployment -n tutorial-application -l app=backend2
kubectl rollout restart deployment -n tutorial-application -l app=frontend
```

Now let's take a look at the Grafana dashboard of the collector for [received traces](http://localhost:3000/grafana/d/7hHiATL4z/collector?orgId=1&viewPanel=7).

All possible values of `type` and `argument` are defined in the [SDK configuration](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md#general-sdk-configuration).

### Remotely configurable sampling

The Jaeger remote sampler allows dynamically configuring OpenTelemetry SDKs.
The collector can be configured with the Jaeger remote sampling extension, which exposes an endpoint for SDKs to retrieve sampling configuration per service and span operation name. A configuration sketch follows the list below.

* [Jaeger remote sampler spec](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#jaegerremotesampler)
* [SDK sampler configuration](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md#general-sdk-configuration)
* [Collector Jaeger remote sampler extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/jaegerremotesampling)
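A hedged sketch of the collector side (this requires a contrib-based collector build that includes the extension; the strategies file path is hypothetical):

```yaml
extensions:
  jaegerremotesampling:
    source:
      reload_interval: 30s
      file: /etc/otelcol/sampling_strategies.json # hypothetical mounted strategies file

service:
  extensions: [jaegerremotesampling]
```

SDKs can then be pointed at the exposed endpoint, e.g. via `OTEL_TRACES_SAMPLER=jaeger_remote` with the endpoint passed in `OTEL_TRACES_SAMPLER_ARG`, as described in the SDK configuration linked above.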
## PII and data manipulation

The collector can add, change and/or remove data that is flowing through it (spans, attributes etc.). This is useful for extracting new attributes that can later be used for querying. A second use case for data manipulation is handling personally identifiable information (PII).

The following collector processors can be used for data manipulation:

* [attributesprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/attributesprocessor) adds, updates or removes attributes.
* [filterprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/filterprocessor) removes spans and attributes. It supports regex.
* [redactionprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/redactionprocessor) deletes span attributes that don't match a list of allowed span attributes.
* [transformprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor) modifies telemetry based on configuration using the [OpenTelemetry Transformation Language](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl).

Now let's edit the collector configuration to extract the player's name from the `http.target` attribute:

```bash
kubectl edit opentelemetrycollectors.opentelemetry.io otel -n observability-backend
```

```yaml
processors:
  attributes:
    actions:
      - key: "http.target"
        pattern: ^.*\?player=(?P<player>.*)
        action: extract

service:
  pipelines:
    traces:
      processors: [memory_limiter, attributes, batch]
```

See [traces in Grafana](http://localhost:3000/grafana/explore?orgId=1&left=%7B%22datasource%22:%22tempo%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22nativeSearch%22,%22search%22:%22player%3DPavol%22%7D,%7B%22refId%22:%22B%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22tempo%22%7D,%22queryType%22:%22traceId%22%7D%5D,%22range%22:%7B%22from%22:%22now-3h%22,%22to%22:%22now%22%7D%7D&right=%7B%22datasource%22:%22tempo%22,%22queries%22:%5B%7B%22query%22:%2256683d3ac9a751ffd7abde903dccc247%22,%22queryType%22:%22traceId%22,%22refId%22:%22A%22%7D%5D,%22range%22:%7B%22from%22:%221681295096718%22,%22to%22:%221681309496718%22%7D%7D)

![Traces in Grafana, player attribute](./images/grafana-traces-player-attribute.jpg)

---
[Next Steps](./04-metrics.md)
--------------------------------------------------------------------------------
/04-metrics.md:
--------------------------------------------------------------------------------
# Metrics

This tutorial step focuses on metrics, how the collector can help with metric scraping, and how to use the collector to create metrics from spans.

## Auto-instrumentation and metrics

The instrumentation we set up in the previous step provides metrics as well, which we can see in the [Apps Dashboard](http://localhost:3000/grafana/d/WbvDPqY4k/apps?orgId=1):
![](./images/grafana-metrics-frontend.png)
![](./images/grafana-metrics-backend1.png)
![](./images/grafana-metrics-backend2.png)

Our backend1 app has additional Prometheus metrics that were previously instrumented. We want to be able to see them as well, which we can enable in the following steps.

## Prometheus Target Discovery

### Service and Pod Monitors

If you have services already generating metrics for Prometheus, the collector can collect those using the Prometheus receiver, which scrapes metric endpoints provided in a scrape_config like the one below:

```yaml
- job_name: 'otel-collector'
  scrape_interval: 10s
  static_configs:
    - targets: [ '0.0.0.0:8888' ]
```

This solution works but requires writing out all known targets. When services being deployed are added or changed, it will require updating this configuration. An alternative to this is to set up Prometheus [Service and Pod Monitors](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/design.md#servicemonitor). This allows for discovering metric endpoints dynamically, without needing to modify the collector configuration and restart all collectors.
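For orientation, a minimal `ServiceMonitor` sketch (the label selector and port name are assumptions; compare the real definitions applied further below):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backend1-service # illustrative name
  namespace: tutorial-application
spec:
  selector:
    matchLabels:
      app: backend1 # assumed service label
  endpoints:
    - port: metrics # assumed name of the service's metrics port
      interval: 30s
```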
In order to apply a pod or service monitor, the CRDs need to be installed:

```shell
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml

kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
```

You can verify both CRDs are present with the command `kubectl get customresourcedefinitions`, and then the below lines should be included in your list of CRDs (dates will differ):

```shell
podmonitors.monitoring.coreos.com         2023-04-11T22:17:04Z
servicemonitors.monitoring.coreos.com     2023-04-11T22:16:58Z
```

### Target Allocator

A service called the [Target Allocator](https://github.com/open-telemetry/opentelemetry-operator/blob/main/cmd/otel-allocator/README.md) can use the Prometheus service and pod monitors to discover targets. The target allocator discovers the targets and then distributes both discovered and configured targets among the available collectors. It must be deployed alongside a StatefulSet of collectors.

Notable changes in the CR compared to the collector Deployment we applied earlier:

```yaml
spec:
  mode: statefulset
  replicas: 3
  targetAllocator:
    enabled: true
    allocationStrategy: "consistent-hashing"
    replicas: 2
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.74.0
    prometheusCR:
      enabled: true

  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
        target_allocator:
          endpoint: http://otel-prom-cr-targetallocator:80
          interval: 30s
          collector_id: ${POD_NAME}
          http_sd_config:
            refresh_interval: 60s
```

Applying this manifest will start a new collector as a StatefulSet with the target allocator enabled, and it will create a ClusterRole granting the Target Allocator the permissions it needs:

```shell
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2023-opentelemetry-kubernetes-tutorial/main/backend/03-collector-prom-cr.yaml
```

Applying this manifest will set up service monitors for the backend1 service, the target allocators, and the collector StatefulSet:

```shell
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2023-opentelemetry-kubernetes-tutorial/main/backend/04-servicemonitors.yaml
```

You can verify the collectors and target allocators have been deployed with the command `kubectl get pods -n observability-backend`, where we should see five additional pods:

```shell
otel-prom-cr-collector-0                       1/1     Running   2 (18m ago)   18m
otel-prom-cr-collector-1                       1/1     Running   2 (18m ago)   18m
otel-prom-cr-collector-2                       1/1     Running   2 (18m ago)   18m
otel-prom-cr-targetallocator-f844684ff-fwrzj   1/1     Running   0             18m
otel-prom-cr-targetallocator-f844684ff-r4jd2   1/1     Running   0             18m
```

The service monitors can also be verified with `kubectl get servicemonitors -A`:

```shell
NAMESPACE               NAME                                AGE
observability-backend   otel-prom-cr-collector-monitoring   21m
observability-backend   otel-prom-cr-targetallocator        21m
tutorial-application    backend1-service                    21m
```
Now we're getting our backend1 Prometheus metrics in the [Apps Dashboard](http://localhost:3000/grafana/d/WbvDPqY4k/apps?orgId=1):
![](./images/grafana-metrics-backend1-prometheus.png)

We can see the bump in the Prometheus metrics receiver and additional Prometheus jobs in the [Collector Dashboard](http://localhost:3000/grafana/d/7hHiATL4z/collector?orgId=1):
![](./images/grafana-metrics-collector-addtl-scrapes.png)

And the Target Allocator has its own metrics in the [Target Allocator Dashboard](http://localhost:3000/grafana/d/ulLjw3L4z/target-allocator?orgId=1):
![](./images/grafana-metrics-ta.png)
![](./images/grafana-metrics-ta-server.png)

## Span metrics

In addition to acquiring instrumented metrics, we can use the [spanmetrics connector](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/spanmetricsconnector/README.md) to transform spans into Request, Error, and Duration (RED) metrics. A [connector](https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#connectors) is a special component in the collector that can consume data as an exporter in one pipeline and emit data as a receiver in another.

Our otel collector from the [Operator Introduction](./02-operator-introduction.md) is the collector receiving traces, so we want to modify its configuration to add the spanmetrics connector:

```shell
kubectl edit opentelemetrycollectors.opentelemetry.io otel -n observability-backend
```

```yaml
connectors:
  spanmetrics:
    namespace: "converted"

service:
  pipelines:
    traces:
      exporters: [otlp, spanmetrics]
    metrics:
      receivers: [prometheus, otlp, spanmetrics]
```
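To make the data flow explicit, here is a hedged sketch of how the edited pieces fit together. The processor lists and the metrics exporter name are assumptions about this tutorial's collector CR, not verbatim from it:

```yaml
connectors:
  spanmetrics:
    namespace: "converted" # prefix for the generated metric names

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp, spanmetrics] # spanmetrics consumes spans here...
    metrics:
      receivers: [prometheus, otlp, spanmetrics] # ...and emits RED metrics here
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite] # assumed metrics exporter
```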
Then the collector will need to be restarted:

```shell
kubectl rollout restart deployment otel-collector -n observability-backend
```

Now we can see RED metrics at the bottom of the [Collector Dashboard](http://localhost:3000/grafana/d/7hHiATL4z/collector?orgId=1):
![](./images/grafana-metrics-collector-red.png)

---
[Next steps](./05-logs.md)
--------------------------------------------------------------------------------
/05-logs.md:
--------------------------------------------------------------------------------
# Logs

This tutorial step focuses on logs: how the collector handles log scraping and how to use the filelog receiver to transmit those logs to Loki.

## Summary

As far as logs are concerned, one of the many challenges for SREs is standardizing how logs flow from diverse applications to different logging solutions. To address this, OpenTelemetry has the ability to receive and transmit logs through the OTLP protocol and to correlate them (e.g. through the origin of the telemetry, execution context, time of execution, span ID) with metrics and traces.

## FileLog receiver

The [Filelog](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver) receiver is one of the various receivers available in [OpenTelemetry Collector Contrib](https://github.com/open-telemetry/opentelemetry-collector-contrib).

## Filelog workflow

![](./images/filelog-flow.png)

In order to demonstrate the logs instrumentation, we have to get the OpenTelemetry Collector running as a DaemonSet:

```yaml
filelog:
  include:
    - /var/log/pods/*/*/*.log
  # Each operator fulfills a single responsibility,
  # such as reading lines from a file, or parsing JSON
  # from a field. Operators are then chained together
  # in a pipeline to achieve a desired result.
  operators:
    # Parse CRI-O format
    - type: regex_parser
      id: parser-crio
      regex: '^(?P