├── rules ├── RES-001.md ├── MET-004.md ├── MET-006.md ├── MET-005.md ├── SPA-002.md ├── SPA-003.md ├── LOG-002.md ├── SPA-001.md ├── MET-001.md ├── SPA-005.md ├── MET-003.md ├── MET-002.md ├── RES-004.md ├── SDK-001.md ├── LOG-001.md ├── RES-005.md ├── RES-002.md ├── SPA-004.md ├── RES-003.md └── _template.md ├── .gitignore ├── .github └── ISSUE_TEMPLATE │ └── propose-rule.yml ├── DCO.md ├── GOVERNANCE.md ├── prior-art.md ├── CONTRIBUTING.md ├── README.md ├── LICENSE └── specification.md /rules/RES-001.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** RES-001 2 | 3 | **Description:** `service.instance.id` is present. 4 | 5 | **Rationale:** The `service.instance.id` uniquely identifies a resource, and can be used as the process identifier without taking other resource attributes into account. 6 | 7 | **Target:** Resource 8 | 9 | **Criteria:** The `service.instance.id` resource attribute MUST appear in the resource. 10 | 11 | **Impact:** Normal 12 | -------------------------------------------------------------------------------- /rules/MET-004.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** MET-004 2 | 3 | **Description:** Histogram metrics consistently use the same histogram buckets per metric name. 4 | 5 | **Rationale:** Histograms use buckets to count occurrences in given ranges. Inconsistent buckets may make quantile aggregations less precise. 6 | 7 | **Target:** Metric 8 | 9 | **Criteria:** Given a metric name, all time series of type histogram with that metric name in the past 14 days MUST have the same histogram buckets. 10 | 11 | **Impact:** Normal 12 | -------------------------------------------------------------------------------- /rules/MET-006.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** MET-006 2 | 3 | **Description:** Metric names do not equal semantic convention attribute keys. 4 | 5 | **Rationale:** A metric name that exactly matches an attribute key defined in the semantic conventions (e.g., `http.response.status_code`, a span-level attribute) causes confusion and should be avoided. 6 | 7 | **Target:** Metric 8 | 9 | **Criteria**: Metric names MUST NOT equal any of the attribute keys specified in semantic conventions. 10 | 11 | **Impact:** Important 12 | -------------------------------------------------------------------------------- /rules/MET-005.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** MET-005 2 | 3 | **Description:** Metric names do not contain the name of the metric unit. 4 | 5 | **Rationale:** The metric unit is already provided as part of the unit of the metric per the metric model. The metric name duplicates the metric unit and can be less descriptive. It also implies breaking changes if the unit changes at a later time. 6 | 7 | **Target:** Metric 8 | 9 | **Criteria:** Metric names MUST NOT contain the metric unit (example: `collection.duration.seconds`). 10 | 11 | **Impact:** Normal 12 | -------------------------------------------------------------------------------- /rules/SPA-002.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** SPA-002 2 | 3 | **Description:** Traces do not contain orphan spans. 4 | 5 | **Rationale:** Orphaned spans indicate potential issues in tracing instrumentation or data integrity. 
This can lead to incomplete or misleading trace data, hindering effective troubleshooting and performance analysis. 6 | 7 | **Target:** Span 8 | 9 | **Criteria:** Given any span with a `parent_span_id` reference, there MUST exist a span with the same trace ID whose `span_id` equals that `parent_span_id` value. 10 | 11 | **Impact:** Normal 12 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # dependencies (bun install) 2 | node_modules 3 | 4 | # output 5 | out 6 | dist 7 | *.tgz 8 | 9 | # code coverage 10 | coverage 11 | *.lcov 12 | 13 | # logs 14 | logs 15 | *.log 16 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json 17 | 18 | # dotenv environment variable files 19 | .env 20 | .env.development.local 21 | .env.test.local 22 | .env.production.local 23 | .env.local 24 | 25 | # caches 26 | .eslintcache 27 | .cache 28 | *.tsbuildinfo 29 | 30 | # IntelliJ based IDEs 31 | .idea 32 | 33 | # Finder (macOS) folder config 34 | .DS_Store 35 | -------------------------------------------------------------------------------- /rules/SPA-003.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** SPA-003 2 | 3 | **Description:** Span names have bounded cardinality. 4 | 5 | **Rationale:** HTTP and database span names, depending on the instrumentation, can be high-cardinality due to literals embedded in database queries, or literal URL paths instead of HTTP routes. 6 | 7 | High-cardinality span names impact the usefulness of group-by mechanics, reduce the effectiveness of filtering mechanics, and can blow up indexes in tools that rely on them. 8 | 9 | **Target:** Span 10 | 11 | **Criteria:** TODO 12 | 13 | **Examples:** 14 | 15 | **Impact:** Important 16 | -------------------------------------------------------------------------------- /rules/LOG-002.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** LOG-002 2 | 3 | **Description:** Log records have their `severityNumber` set. 4 | 5 | **Rationale:** When using the filelog receiver of the OpenTelemetry Collector, or an equivalent way of reading logs from files and converting them to OpenTelemetry log records, it is common for adopters not to specify a way to parse the log's severity and transform it into the OTLP `severityNumber` field. This leaves logs less actionable than they should be. 6 | 7 | **Target:** Log 8 | 9 | **Criteria:** No log record with `severity.text` = `UNSET` is observed. 10 | 11 | **Impact:** Important 12 | -------------------------------------------------------------------------------- /rules/SPA-001.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** SPA-001 2 | 3 | **Description:** Traces contain a limited number of `INTERNAL` spans per service. 4 | 5 | **Rationale:** An excessive number of internal spans produced by a service may indicate inefficient or complex operations. This can impact observability and performance monitoring, making it harder to identify bottlenecks and troubleshoot issues. 6 | 7 | **Target:** Span 8 | 9 | **Criteria:** When grouping spans by trace identifier and `service.name`, no more than 10 spans in a single trace SHOULD have `span.kind = SpanKind.SPAN_KIND_INTERNAL`.
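A minimal sketch of how an implementation might evaluate this criterion over a batch of finished spans; the span representation and field names are illustrative assumptions, not part of the specification.

```python
from collections import Counter

# Illustrative span records; an implementation would obtain these from OTLP data.
# The field names used here are assumptions for the sketch, not normative.
spans = [
    {"trace_id": "t1", "service_name": "checkout", "kind": "SPAN_KIND_INTERNAL"},
    {"trace_id": "t1", "service_name": "checkout", "kind": "SPAN_KIND_SERVER"},
]

MAX_INTERNAL_SPANS = 10  # threshold from the criteria above


def spa_001_passes(spans) -> bool:
    """Return True if no (trace, service) group exceeds the INTERNAL span threshold."""
    internal_counts = Counter(
        (span["trace_id"], span["service_name"])
        for span in spans
        if span["kind"] == "SPAN_KIND_INTERNAL"
    )
    return all(count <= MAX_INTERNAL_SPANS for count in internal_counts.values())


print(spa_001_passes(spans))  # True for the sample data above
```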
10 | 11 | **Impact:** Normal 12 | -------------------------------------------------------------------------------- /rules/MET-001.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** MET-001 2 | 3 | **Description:** Metric attributes have bounded cardinality. 4 | 5 | **Rationale:** High-cardinality metric attributes can significantly degrade performance and increase storage costs of observability systems. They lead to a large number of unique time series, making it difficult to aggregate, query, and analyze metrics effectively. This rule helps identify and address such attributes. 6 | 7 | **Target:** Metric 8 | 9 | **Criteria:** Attribute keys on metrics, aggregated by metric name, MUST have fewer than 10,000 unique values within a 1-hour window. 10 | 11 | **Impact:** Important 12 | -------------------------------------------------------------------------------- /rules/SPA-005.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** SPA-005 2 | 3 | **Description:** Traces do not contain a high number of short-duration spans. 4 | 5 | **Rationale:** An excessive number of very short-duration internal spans (`span.kind=INTERNAL`) within a trace might indicate excessive internal calls, instrumentation overhead, or potentially inefficient code. Identifying such traces can help optimize application performance and reduce unnecessary overhead. 6 | 7 | **Target:** Span 8 | 9 | **Criteria:** When grouping spans by `trace_id`, each trace MUST NOT have more than 20 spans with a `duration` of less than 5 milliseconds. 10 | 11 | **Impact:** Important -------------------------------------------------------------------------------- /rules/MET-003.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** MET-003 2 | 3 | **Description:** Metric names are consistently associated with the same metric unit. 4 | 5 | **Rationale:** Metric units are fundamental to understanding what a metric means and how to analyze it. Metric units are specified in the OpenTelemetry SDKs when their instruments are created, and it is possible for different applications to use the same metric name with different metric units, creating chaos during analysis. 6 | 7 | **Target:** Metric 8 | 9 | **Criteria:** Given a metric name, all time series with that metric name in the past 14 days MUST have the same unit of measurement. 10 | 11 | **Impact:** Important -------------------------------------------------------------------------------- /rules/MET-002.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** MET-002 2 | 3 | **Description:** Metrics have useful metric units. 4 | 5 | **Rationale:** Metric units are fundamental to understanding what a metric means and how to analyze it. It makes all the difference to know whether your pod is using 1% of its memory allocation, or 1 MB of memory versus 1 GB. Often, the metric unit is the only thing that tells you how to interpret a metric's value and analyze it in queries. 6 | 7 | **Target:** Metric 8 | 9 | **Criteria:** All metrics MUST have a non-default unit of measurement compliant with the [Unified Code for Units of Measure](https://github.com/ucum-org/ucum).
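A minimal sketch of how an implementation might flag metrics that keep the default (empty) unit; full UCUM validation would require a UCUM-aware validator, so this sketch is illustrative only and the metric names shown are hypothetical.

```python
# Illustrative (metric name, unit) pairs; units follow UCUM case-sensitive codes
# such as "s", "ms", "By", or annotations like "{requests}".
metrics = {
    "http.server.request.duration": "s",
    "process.memory.usage": "By",
    "queue.depth": "",  # default/empty unit: would violate MET-002
}


def met_002_violations(metrics: dict[str, str]) -> list[str]:
    """Return metric names whose unit is missing or left at the default empty value."""
    return [name for name, unit in metrics.items() if not unit.strip()]


print(met_002_violations(metrics))  # ['queue.depth']
```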
10 | 11 | **Impact:** Important 12 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/propose-rule.yml: -------------------------------------------------------------------------------- 1 | name: Propose a new rule 2 | description: Submit a proposal for a new rule. 3 | title: "Rule proposal: " 4 | labels: ["rule"] 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: | 9 | Thanks for taking the time to submit a rule proposal! Please fill in as many details as possible. 10 | - type: textarea 11 | id: rule-proposal 12 | attributes: 13 | label: Proposal 14 | value: | 15 | **Rule ID:** 16 | 17 | **Description:** 18 | 19 | **Rationale:** 20 | 21 | **Target:** 22 | 23 | **Criteria:** 24 | 25 | **Impact:** 26 | 27 | **Type:** 28 | -------------------------------------------------------------------------------- /rules/RES-004.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** RES-004 2 | 3 | **Description:** Semantic convention attributes are used at the right level. 4 | 5 | **Rationale:** Semantic conventions not only specify the namespace keys and the intended meaning, but also at which level in OTLP they should be used, e.g., resource, log, span, or metric data point. Some tools require end users to specify, upon querying, at which level to look for a matching key. Having, e.g., `service.name` at inconsistent levels across telemetry hinders analysis. 6 | 7 | **Target:** Resource, Log, Span 8 | 9 | **Criteria:** Attribute keys specified in the semantic conventions MUST appear at the right level in OTLP. 10 | 11 | **Impact:** Important 12 | 13 | **Examples:** 14 | 15 | * `service.name` in span attributes 16 | * `http.request.method` in resource attributes 17 | -------------------------------------------------------------------------------- /rules/SDK-001.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** SDK-001 2 | 3 | **Description:** Dependencies (language and runtime) are supported by the SDK. 4 | 5 | **Rationale:** Using an up-to-date SDK is recommended so that it receives proper support from the community in case of issues. An up-to-date SDK is also most likely to adhere to the latest Semantic Conventions and features. The versions of the language and runtime should reflect the SDK version selected and be within the values the SDK supports. 6 | 7 | **Target:** SDK 8 | 9 | **Criteria:** The versions of the language and runtime found in the instrumentation dependencies for the SDK SHOULD be within the supported values. 10 | 11 | **Examples:** 12 | 13 | * JS SDK version v2.0.1 supports Node.js versions 18, 20, and 22 and TypeScript version v5.0.4+, following a support window of 2 years. 14 | 15 | **Impact:** Low -------------------------------------------------------------------------------- /rules/LOG-001.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** LOG-001 2 | 3 | **Description:** Debug-level logs are not enabled in production environments for longer than 14 days. 4 | 5 | **Rationale:** Debug-level logging should generally not be enabled in production environments long-term. Retaining debug logs for extended periods in production can lead to increased storage costs, potential security concerns due to sensitive information being logged, and noisy logs that make troubleshooting more difficult. This rule helps identify situations where debug logging is left on inadvertently in production.
6 | 7 | **Target:** Log 8 | 9 | **Criteria:** Log records with `severity.text` = `DEBUG` are not observed in a production environment for longer than 14 days, where a production environment is defined by the value of the `deployment.environment.name` _Resource_ attribute. 10 | 11 | **Impact:** Important 12 | -------------------------------------------------------------------------------- /rules/RES-005.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** RES-005 2 | 3 | **Description:** `service.name` is present. 4 | 5 | **Rationale:** `service.name` represents the logical name of the service, and it is critical for service identification. This attribute is required by the [OpenTelemetry Semantic Conventions for Resources](https://opentelemetry.io/docs/specs/semconv/resource/#service). It must be the same for all instances of horizontally scaled services. 6 | 7 | **Target:** Resource 8 | 9 | **Criteria:** Resource attributes MUST contain a `service.name` key with a non-empty string value. The attribute MUST NOT be null, undefined, or an empty string. 10 | 11 | **Examples:** 12 | 13 | - "Resource attribute `service.name` is missing from the resource attributes." 14 | - "Resource attribute `service.name` is present but has an empty string value." 15 | - "Resource attribute `service.name` is present but has a null value." 16 | 17 | **Impact:** Critical 18 | -------------------------------------------------------------------------------- /rules/RES-002.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** RES-002 2 | 3 | **Description:** `service.instance.id` is unique across logical resources within a given `service.name`. 4 | 5 | **Rationale:** The `service.instance.id` uniquely identifies a resource; however, it is misused when another resource attribute indicates that two workloads are sharing the same `service.instance.id`. This rule is particularly important because violating it negates the effect of RES-001: having a `service.instance.id` provides no value if it is not unique across resources. 6 | 7 | **Target:** Resource 8 | 9 | **Criteria:** The `service.instance.id` resource attribute MUST be unique across logical resources, e.g., different Kubernetes pods. 10 | 11 | **Impact:** Important 12 | 13 | **Examples:** 14 | 15 | * Resource attribute `service.instance.id` is set to `abc` when `k8s.pod.name` is set to `payment-abc123`. The same ID `abc` is also seen on telemetry coming from `k8s.pod.name` set to `payment-def456`. 16 | -------------------------------------------------------------------------------- /rules/SPA-004.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** SPA-004 2 | 3 | **Description:** Root spans are not `CLIENT` spans. 4 | 5 | **Rationale:** In virtually all cases, with the exception of spans generated by the OTel Browser Bundle, `CLIENT` root spans are a sign of either missing instrumentation (e.g., a missing span of kind `SERVER` or `INTERNAL` to denote the beginning of a batch or headless workload) or a sign that trace context was lost within the application. 6 | 7 | Distributed traces are commonly used to describe interactions between components. A component either receives a request (e.g., an HTTP server) or initiates a trace (e.g., an application running in a K8s Job): in either case, the trace should begin with a span that describes which request the application is serving.
Spans of kind `CLIENT` describe requests issued by one application towards another, and generally lack the context of *why* that request is issued in the first place. 8 | 9 | **Target:** Span 10 | 11 | **Criteria:** A root span MUST NOT have kind `CLIENT`. 12 | 13 | **Examples:** 14 | 15 | **Impact:** Important -------------------------------------------------------------------------------- /rules/RES-003.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** RES-003 2 | 3 | **Description:** `k8s.pod.uid` is present in telemetry collected from applications running on a Kubernetes cluster, or from the control plane of the Kubernetes cluster itself. 4 | 5 | **Rationale:** The `k8s.pod.uid` resource attribute enables correlation of telemetry through the `k8sattributesprocessor` and similar facilities, and, unlike `k8s.pod.ip`, it is robust against service meshes (Istio, Linkerd). 6 | 7 | **Target:** Resource 8 | 9 | **Criteria:** The `k8s.pod.uid` resource attribute MUST be present on resources associated with telemetry describing Kubernetes pods. 10 | 11 | **Impact:** Important 12 | 13 | **Examples:** 14 | 15 | * This is likely the actual issue the user is having [here](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/29630). 16 | 17 | **Note:** OpenTelemetry SDKs should strive to collect the `k8s.pod.uid` out of the box when deployed in Kubernetes-based applications, which is possible by parsing the `cgroup` metadata.[1] 18 | 19 | [1]: See e.g. [https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1489](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1489) 20 | -------------------------------------------------------------------------------- /DCO.md: -------------------------------------------------------------------------------- 1 | Developer Certificate of Origin 2 | Version 1.1 3 | 4 | Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 5 | 6 | Everyone is permitted to copy and distribute verbatim copies of this 7 | license document, but changing it is not allowed. 8 | 9 | 10 | Developer's Certificate of Origin 1.1 11 | 12 | By making a contribution to this project, I certify that: 13 | 14 | (a) The contribution was created in whole or in part by me and I 15 | have the right to submit it under the open source license 16 | indicated in the file; or 17 | 18 | (b) The contribution is based upon previous work that, to the best 19 | of my knowledge, is covered under an appropriate open source 20 | license and I have the right under that license to submit that 21 | work with modifications, whether created in whole or in part 22 | by me, under the same open source license (unless I am 23 | permitted to submit under a different license), as indicated 24 | in the file; or 25 | 26 | (c) The contribution was provided directly to me by some other 27 | person who certified (a), (b) or (c) and I have not modified 28 | it. 29 | 30 | (d) I understand and agree that this project and the contribution 31 | are public and that a record of the contribution (including all 32 | personal information I submit with it, including my sign-off) is 33 | maintained indefinitely and may be redistributed consistent with 34 | this project or the open source license(s) involved.
-------------------------------------------------------------------------------- /rules/_template.md: -------------------------------------------------------------------------------- 1 | **Rule ID:** \[Unique Rule ID\] 2 | 3 | **Description:** \[A brief, clear statement describing the purpose of the rule and what it achieves if the check passes. E.g. "Traces contain low number of internal spans per service"\] 4 | 5 | **Rationale:** \[Explanation of why this rule is important for instrumentation quality. What benefit does adherence provide, or what problem does non-adherence cause? Link to relevant OpenTelemetry specifications (e.g., Semantic Conventions) or established best practices if applicable. Explain the impact on observability, efficiency, or cost.\] 6 | 7 | **Target:** \[Specify the primary OTLP element this rule evaluates: Resource | Span | Metric | Log | Profile | Other (Specify) or the component: SDK | Collector | Other (Specify)\] 8 | 9 | **Criteria:** \[The precise, objective conditions under which this rule is triggered when analyzing OTLP data. This description MUST be unambiguous and algorithmically testable. Provide specific attribute names, expected values or patterns, conditions for presence/absence, thresholds, etc. Use backticks for attribute names or code elements.\] 10 | 11 | **Examples:** 12 | 13 | * *"Resource attribute service.name is missing, null, or an empty string."* 14 | * *"A Span has span.kind \= SpanKind.SPAN\_KIND\_SERVER but is missing the http.route attribute when http.request.method is present."* 15 | * *"A Metric point of type Sum has aggregation\_temporality \= AGGREGATION\_TEMPORALITY\_DELTA and is\_monotonic \= true, but its name does not end with the suffix \_total."* 16 | * *"More than 10 spans within the service in a single trace have a duration less than 5 milliseconds."* 17 | * *"A log record with severity\_text \= 'DEBUG' is observed in an environment where the Resource attribute deployment.environment.name is set to production."* 18 | 19 | **Impact:** \[Choose one: Critical | Important | Normal | Low\] (Based on the perceived impact of violating this rule on overall observability effectiveness or efficiency.) 20 | -------------------------------------------------------------------------------- /GOVERNANCE.md: -------------------------------------------------------------------------------- 1 | # Governance Model 2 | 3 | ## Overview 4 | 5 | The Instrumentation Score Specification project operates under an **open governance model** designed to ensure transparency, community participation, and collaborative decision-making. This document outlines the governance structure and participation guidelines for the project. 6 | 7 | ## Current Status 8 | 9 | **Current Host**: OllyGarden, Inc. 10 | **Future Goal**: Transition to neutral hosting under a recognized open-source foundation (such as the Cloud Native Computing Foundation or as part of the OpenTelemetry project) 11 | 12 | This governance model is designed to facilitate the eventual transition to neutral hosting while maintaining the project's open and collaborative nature. 13 | 14 | ## Core Principles 15 | 16 | 1. **Openness**: All project activities, discussions, and decisions are conducted transparently in public forums 17 | 2. **Meritocracy**: Contributions and influence are based on merit, expertise, and constructive participation 18 | 3. **Community-Driven**: The project's direction is determined by the collective needs and input of the community 19 | 4. 
**Vendor Neutrality**: No single organization or vendor controls the project's technical direction 20 | 5. **Consensus Building**: Decisions prioritize consensus, with clear escalation paths when needed 21 | 22 | ## Governance Structure 23 | 24 | ### Maintainers 25 | 26 | Maintainers are responsible for the day-to-day management of the project, including: 27 | - Reviewing and merging contributions 28 | - Maintaining project quality and consistency 29 | - Facilitating community discussions 30 | - Making technical decisions within their areas of expertise 31 | 32 | **Current Maintainers:** 33 | - [Antoine Toulme] (@atoulme), Splunk 34 | - [Daniel Gomez Blanco] (@danielgblanco), New Relic 35 | - [Juraci Paixão Kröhling] (@jpkrohling), OllyGarden 36 | - [Michele Mancioppi] (@mmanciop), Dash0 37 | 38 | **Becoming a Maintainer:** 39 | - Demonstrate consistent, high-quality contributions to the project 40 | - Show good judgment in technical and community matters 41 | - Be nominated by existing maintainers and confirmed by community consensus 42 | - Commit to the project's governance principles and code of conduct 43 | 44 | ### Community Participants 45 | 46 | All community members are encouraged to participate through: 47 | - Contributing to specifications and documentation 48 | - Participating in discussions and reviews 49 | - Reporting issues and suggesting improvements 50 | - Implementing and testing the specification 51 | 52 | ## Transition to Neutral Hosting 53 | 54 | ### Transition Goals 55 | - Move the project to a neutral, community-governed location, such as the CNCF or OpenTelemetry 56 | - Ensure continuity of project governance and community 57 | 58 | ### Transition Criteria 59 | - Demonstrated community adoption and participation 60 | - Stable governance processes and contributor base 61 | - Sustainable project infrastructure 62 | 63 | ## Communication and Transparency 64 | 65 | ### Public Forums 66 | - **GitHub Issues**: Technical discussions and bug reports 67 | - **CNCF Slack**: Join [#instrumentation-score channel](https://cloud-native.slack.com/archives/C090FEG5R0F) for technical discussions 68 | - **Documentation**: To be maintained in the project repository 69 | 70 | ### Regular Activities 71 | - **Community Meetings**: Regular meetings open to all participants (schedule TBD) 72 | - **Release Planning**: Transparent planning process for specification updates 73 | - **Progress Reports**: Regular updates on project status and governance transition 74 | 75 | ## Contribution Guidelines 76 | 77 | ### How to Contribute 78 | 1. Review the CONTRIBUTING.md document 79 | 2. Participate in community discussions 80 | 3. Submit issues and pull requests 81 | 4. Follow the project's code of conduct 82 | 83 | ## Amendments 84 | 85 | This governance document may be amended by the maintainers with community input and transparency. 86 | 87 | ## Contact 88 | 89 | For governance questions or concerns, please: 90 | - Open an issue in the project repository 91 | - Contact the maintainers directly 92 | - Participate in community meetings 93 | -------------------------------------------------------------------------------- /prior-art.md: -------------------------------------------------------------------------------- 1 | **1\. CVSS (Common Vulnerability Scoring System)** 2 | 3 | * **What it is:** A standardized framework for rating the severity of software vulnerabilities. Widely used across the security industry. 
4 | * **How it's Calculated:** 5 | * Based on a formula considering multiple *base metrics* capturing the vulnerability's inherent characteristics: Attack Vector, Attack Complexity, Privileges Required, User Interaction, Scope, Confidentiality Impact, Integrity Impact, Availability Impact. 6 | * Each metric has defined values (e.g., Attack Vector: Network, Adjacent, Local, Physical). 7 | * Outputs a numerical score from 0.0 to 10.0, often mapped to qualitative ratings (Low, Medium, High, Critical). 8 | * Also includes optional Temporal (exploit maturity, remediation level) and Environmental (specific organizational context) metrics that can modify the score, but the *Base score* is the universal standard part. 9 | * **Learnings for Instrumentation Score:** 10 | * **Standardization is Key:** CVSS's strength is its defined, public, vendor-neutral set of metrics and formula. The Instrumentation Score needs this level of specification clarity to be adopted. 11 | * **Multi-dimensional:** Quality/Risk is rarely a single thing. CVSS breaks it down into understandable components. Instrumentation Score's rule-based approach with impact levels aligns well here. 12 | * **Transparency:** The calculation method is open, allowing anyone to understand *why* a vulnerability gets a specific score. Instrumentation Score needs this transparency. 13 | * **Context vs. Standard:** CVSS separates the universal Base score from contextual modifiers (Temporal/Environmental). The Instrumentation Score should likely focus *first* on a standardized "Base Instrumentation Score". 14 | 15 | **2\. Google Lighthouse / PageSpeed Insights** 16 | 17 | * **What it is:** An open-source, automated tool for improving the quality of web pages. Provides scores for Performance, Accessibility, Best Practices, and SEO. 18 | * **How it's Calculated:** 19 | * Runs a series of *audits* against a page. 20 | * The *Performance score* (0-100) is a weighted average of specific metrics (Largest Contentful Paint, Total Blocking Time, Cumulative Layout Shift, etc.). Weights are periodically adjusted based on user experience research. 21 | * Other scores (Accessibility, Best Practices, SEO) are often based on passing/failing a set of specific checks relevant to that category. 22 | * Crucially, provides *actionable opportunities* linked directly to failed audits or poor metrics. 23 | * Uses real-world performance data (from HTTP Archive) to define the score distribution curves (log-normal), meaning scores reflect performance relative to other sites. 24 | * **Learnings for Instrumentation Score:** 25 | * **Actionability is Paramount:** Lighthouse excels because it doesn't just give a score; it tells you *what* to fix and *why*. The Instrumentation Score must be tightly integrated with actionable recommendations based on the rules that were triggered. 26 | * **Weighted Averages:** Combining multiple factors with different levels of importance (weights) is a proven model for a composite score. Instrumentation Score's impact levels serve a similar purpose. 27 | * **Clear Categories:** Separating scores into distinct categories (Performance, Accessibility, etc.) helps users focus. While Instrumentation Score might start with one overall score, thinking about sub-scores (e.g., Resource Attribute Quality, Span Completeness Quality) could be a future evolution. 28 | * **Data-Driven Thresholds:** Using real-world data to set score thresholds makes them more meaningful. 
This is harder initially but a long-term goal could be to base Instrumentation Score thresholds on observed data patterns. 29 | 30 | **3\. SonarQube Quality Gate** 31 | 32 | * **What it is:** A tool for continuous inspection of code quality, providing metrics and enforcing standards. The "Quality Gate" is a key concept. 33 | * **How it's Calculated:** 34 | * Doesn't typically provide a single numerical score out-of-the-box, but rather a *Pass/Fail* status on a configurable "Quality Gate". 35 | * The Quality Gate is defined by a set of *conditions* based on various code metrics (e.g., Code Coverage \> 80%, Reliability Rating \= A, No new Blocker bugs, Duplication % \< 3%). 36 | * Conditions often focus specifically on *new code* ("Clean as You Code" approach). 37 | * Underlying metrics include things like complexity, duplications, bugs/vulnerabilities/smells (often rated A-E), test coverage (%), etc. 38 | * **Learnings for Instrumentation Score:** 39 | * **Focus on "New Code":** The idea of focusing rules on *changes* or *recent* activity can be powerful for driving incremental improvement, though maybe complex for an initial score. Instrumentation Score's time-bound score (30 days) captures some of this. 40 | * **Pass/Fail Thresholds:** While Instrumentation Score aims for a numerical score, having defined thresholds that trigger a "Needs Improvement" or "Poor" status (like a failed Quality Gate) can simplify interpretation and decision-making. 41 | * **Driving Developer Behavior:** Quality Gates are often integrated into CI/CD pipelines to block merges/deployments, directly influencing developer workflows. The Instrumentation Score could eventually serve a similar purpose in observability reviews or pipeline gates. 42 | * **Configurability vs. Standard:** SonarQube allows customizing gates. The Instrumentation Score standard should likely be less configurable initially to *be* a standard, but understanding this trade-off is important. 43 | 44 | **4\. SSL Labs Server Test** 45 | 46 | * **What it is:** A widely respected free online service that performs a deep analysis of the configuration of any SSL/TLS web server. 47 | * **How it's Calculated:** 48 | * Performs numerous specific checks across categories: Certificate validity/trust, Protocol Support, Key Exchange strength, Cipher Strength. 49 | * Assigns scores (0-100) to each category based on best practices (e.g., TLS 1.3 support \= 100%, TLS 1.2 \= 90%, weak ciphers penalized). 50 | * Combines category scores into an overall numerical score. 51 | * Translates the numerical score into a letter grade (F to A+). 52 | * Applies specific *rules* that can cap or boost the grade (e.g., no TLS 1.3 support caps grade at A-; HSTS required for A+; known vulnerabilities result in F). 53 | * **Learnings for Instrumentation Score:** 54 | * **Rewarding Excellence (A+):** Having a top tier that requires specific best practices encourages going beyond "good enough". Instrumentation Score's idea of potential bonus points aligns. 55 | * **Grade Caps for Deficiencies:** Critical issues directly limit the maximum achievable score, regardless of other positive factors. Instrumentation Score's formula using Critical/Important rules directly mirrors this effective approach. 56 | * **Clear Criteria:** SSL Labs is trusted because its criteria (while complex) are documented and based on established security best practices. The Instrumentation Score rules need to be grounded in OTel specs and community consensus. 
57 | * **Evolution:** SSL Labs grading evolves as best practices change (e.g., requirements for TLS versions, key lengths). The Instrumentation Score governance needs to allow for similar evolution. 58 | 59 | **Overall Lessons:** 60 | 61 | * **Transparency is Non-Negotiable:** How the score is calculated *must* be public and clear. 62 | * **Actionability Drives Value:** The score needs to directly guide users on *how* to improve. 63 | * **Multiple Factors Matter:** Combine different aspects of quality, potentially with weighting or criticality, for a meaningful assessment. 64 | * **Clear Scope and Purpose:** Define exactly what the score measures and what it doesn't. 65 | * **Standardization Enables Comparison:** A common definition allows for benchmarking and shared understanding. 66 | * **Governance is Required:** A process for maintaining and evolving the standard is essential for long-term relevance. 67 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to the Instrumentation Score Specification 2 | 3 | Thank you for your interest in contributing to the Instrumentation Score Specification! This document provides guidelines for contributing to the project. 4 | 5 | ## Table of Contents 6 | 7 | - [Getting Started](#getting-started) 8 | - [Ways to Contribute](#ways-to-contribute) 9 | - [Contributing to the Specification](#contributing-to-the-specification) 10 | - [Contributing Rules](#contributing-rules) 11 | - [Submitting Changes](#submitting-changes) 12 | - [Communication](#communication) 13 | - [Code of Conduct](#code-of-conduct) 14 | 15 | ## Getting Started 16 | 17 | The Instrumentation Score Specification defines a standardized metric for assessing OpenTelemetry instrumentation quality. Before contributing, please: 18 | 19 | 1. **Read the specification**: Review the main [specification.md](./specification.md) document 20 | 2. **Understand the goals**: Familiarize yourself with the project's objectives and scope 21 | 3. **Join the community**: Connect with us on [CNCF Slack #instrumentation-score](https://cloud-native.slack.com/archives/C090FEG5R0F) 22 | 4. 
**Review existing issues**: Check [GitHub Issues](https://github.com/instrumentation-score/spec/issues) for current discussions 23 | 24 | ## Ways to Contribute 25 | 26 | We welcome various types of contributions: 27 | 28 | ### 📝 Documentation 29 | - Improve specification clarity and completeness 30 | - Fix typos, grammar, and formatting 31 | - Add examples and use cases 32 | - Enhance explanations of concepts 33 | 34 | ### 📋 Rules Development 35 | - Propose new scoring rules 36 | - Improve existing rule definitions 37 | - Provide rationale for rule changes 38 | - Contribute rule validation examples 39 | 40 | ### 🐛 Issue Reporting 41 | - Report ambiguities or gaps in the specification 42 | - Suggest improvements to existing rules 43 | - Identify inconsistencies or contradictions 44 | 45 | ### 💡 Feature Proposals 46 | - Suggest new features or capabilities 47 | - Propose extensions to the scoring framework 48 | - Share implementation experiences and feedback 49 | 50 | ### 🔬 Research and Analysis 51 | - Provide research on instrumentation best practices 52 | - Share data on real-world instrumentation patterns 53 | - Contribute analysis of scoring effectiveness 54 | 55 | ## Contributing to the Specification 56 | 57 | ### Specification Structure 58 | 59 | The main specification is organized into these key sections: 60 | - **Introduction**: Project overview and goals 61 | - **Prior Art**: Learning from existing scoring systems 62 | - **Goals and Non-Goals**: Project scope definition 63 | - **Detailed Specification**: Core technical specification 64 | - **Score Calculation**: Mathematical framework 65 | - **Rule Structure**: How rules are defined and applied 66 | 67 | ### Making Specification Changes 68 | 69 | 1. **Small Changes**: For minor fixes (typos, clarifications), submit a pull request directly 70 | 2. **Significant Changes**: For major modifications, start with an issue to discuss the proposal 71 | 72 | ### Specification Guidelines 73 | 74 | - **Clarity**: Write clearly and avoid ambiguous language 75 | - **Completeness**: Ensure all concepts are fully defined 76 | - **Consistency**: Maintain consistent terminology throughout 77 | - **Implementability**: Ensure the specification can be practically implemented 78 | - **Vendor Neutrality**: Avoid favoring specific tools or vendors 79 | 80 | ## Contributing Rules 81 | 82 | Rules are the core mechanism for calculating instrumentation scores. They are located in the `rules/` directory. 83 | 84 | ### Rule Structure 85 | 86 | Each rule must include: 87 | - **ID**: Unique identifier (e.g., RES-001, SPAN-042) 88 | - **Description**: Clear, human-readable explanation 89 | - **Rationale**: Why this rule matters for instrumentation quality 90 | - **Criteria**: Specific, objective conditions for rule application 91 | - **Target**: Which OTLP element it applies to (Resource, TraceSpan, Metric, Log) 92 | - **Impact**: Impact level (Critical, Important, Normal, Low) 93 | 94 | ### Rule Guidelines 95 | 96 | - **Objective**: Rules must be measurable and unambiguous 97 | - **Based on Standards**: Align with OpenTelemetry Semantic Conventions 98 | - **Well-Justified**: Include clear rationale for the rule's importance 99 | - **Implementable**: Ensure rules can be consistently implemented across tools 100 | - **Tested**: Provide examples of when the rule should and shouldn't trigger 101 | 102 | ### Proposing New Rules 103 | 104 | 1. **Research**: Ensure the rule aligns with OpenTelemetry best practices 105 | 2. 
**Issue First**: Create an issue describing the proposed rule 106 | 3. **Community Input**: Allow time for community discussion 107 | 4. **Implementation**: Submit a pull request with the complete rule definition 108 | 109 | ## Submitting Changes 110 | 111 | ### Pull Request Process 112 | 113 | 1. **Fork the Repository**: Create your own fork of the project 114 | 2. **Create a Branch**: Use a descriptive branch name (e.g., `add-service-version-rule`) 115 | 3. **Make Changes**: Implement your changes with clear, focused commits 116 | 4. **Test**: Ensure your changes don't break existing content 117 | 5. **Submit PR**: Create a pull request with a clear description 118 | 119 | ### Pull Request Guidelines 120 | 121 | - **Title**: Use a clear, descriptive title, following [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/) 122 | - **Description**: Explain what changes you made and why 123 | - **Scope**: Keep PRs focused on a single issue or feature 124 | - **Documentation**: Update related documentation as needed 125 | 126 | ### Commit Message Format 127 | 128 | Use clear, descriptive commit messages: 129 | ``` 130 | feat: add rule for missing `service.version` attribute 131 | 132 | - Defines RES-002 rule for service.version presence 133 | - Categorized as "Important" impact 134 | - Includes rationale and implementation criteria 135 | ``` 136 | 137 | ## Communication 138 | 139 | ### Discussion Channels 140 | 141 | - **GitHub Issues**: For specific problems, proposals, and bugs 142 | - **CNCF Slack**: Join [#instrumentation-score](https://cloud-native.slack.com/archives/C090FEG5R0F) for real-time discussion 143 | - **Pull Requests**: For code and documentation review discussions 144 | 145 | ### Getting Help 146 | 147 | If you need assistance: 148 | 1. Check existing documentation and issues 149 | 2. Ask questions in the CNCF Slack channel 150 | 3. Create a GitHub issue with the "question" label 151 | 4. Reach out to the maintainers directly 152 | 153 | ### Community Meetings 154 | 155 | We hold regular community meetings to discuss: 156 | - Project roadmap and priorities 157 | - Rule proposals and changes 158 | - Implementation feedback 159 | - Community questions and concerns 160 | 161 | Meeting details will be announced in the CNCF Slack channel and GitHub discussions. 162 | 163 | ## Code of Conduct 164 | 165 | This project follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md). By participating, you agree to uphold this code. Please report unacceptable behavior to the project maintainers. 166 | 167 | ### Our Standards 168 | 169 | - **Inclusive**: Welcome contributors from all backgrounds 170 | - **Respectful**: Treat everyone with respect and professionalism 171 | - **Collaborative**: Work together constructively 172 | - **Constructive**: Provide helpful, actionable feedback 173 | 174 | ## Recognition 175 | 176 | Contributors are recognized in several ways: 177 | - Contributors are listed in project documentation 178 | - Significant contributors may be invited to become maintainers 179 | - Community achievements are highlighted in project communications 180 | 181 | ## Questions? 
182 | 183 | If you have questions about contributing, please: 184 | - Join our [CNCF Slack channel](https://cloud-native.slack.com/archives/C090FEG5R0F) 185 | - Create a GitHub issue with the "question" label 186 | - Review the [governance document](./GOVERNANCE.md) for project structure 187 | 188 | Thank you for contributing to the Instrumentation Score Specification! 189 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Instrumentation Score Specification 2 | 3 | > A standardized metric for assessing OpenTelemetry instrumentation quality 4 | 5 | [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) 6 | [![CNCF Slack](https://img.shields.io/badge/CNCF%20Slack-%23instrumentation--score-blue)](https://cloud-native.slack.com/archives/C090FEG5R0F) 7 | 8 | ## Overview 9 | 10 | The **Instrumentation Score** is a standardized, vendor-neutral metric that quantifies the quality of OpenTelemetry instrumentation. Represented as a numerical value from **0 to 100**, it provides objective feedback on how well a service or system follows OpenTelemetry best practices and semantic conventions. 11 | 12 | The rules defined in this specification are classified into different impact levels, presenting actionable recommendations that teams can implement in order to improve the overall score of their services. These levels provide engineers with a recommended **prioritization** across a range of potential instrumentation issues, allowing them to focus on the most critical actions. 13 | 14 | ## Target Audience 15 | 16 | While this specification can be used and implemented by anyone, we have identified key audiences who will benefit most from the Instrumentation Score: 17 | 18 | ### 🏢 Observability Platform Vendors 19 | 20 | We expect vendors to adopt the Instrumentation Score as a standard metric in their platforms. This provides: 21 | - **Consistent Quality Assessment**: Standardized scoring across different tools and platforms 22 | - **Customer Value**: Clear instrumentation quality insights for users 23 | - **Customer Advisory**: Use the spec as a framework when advising customers on instrumentation best practices 24 | 25 | ### 👩‍💻 Observability & Engineering Teams 26 | 27 | Observability engineers and engineering teams are the primary users who will interpret and act on Instrumentation Scores: 28 | - **Quality Assessment**: Understand instrumentation health at a glance 29 | - **Improvement Guidance**: Get actionable insights for better instrumentation 30 | - **Vendor Independence**: Carry knowledge between different observability platforms 31 | - **Team Communication**: Use common vocabulary when discussing instrumentation quality 32 | 33 | We encourage these teams to join the project and contribute their expertise: 34 | - **Real-world Experience**: Share insights about what constitutes good vs. bad instrumentation 35 | - **Rule Development**: Help define and refine scoring criteria 36 | - **Use Case Validation**: Ensure rules reflect practical observability needs 37 | - **Community Growth**: Expand the collective knowledge base 38 | 39 | ### Why Instrumentation Score? 
40 | 41 | As OpenTelemetry adoption grows, organizations face challenges with instrumentation quality: 42 | - Missing critical attributes (e.g., `service.name`) 43 | - Inefficient telemetry signal usage 44 | - High cardinality issues 45 | - Incomplete traces 46 | - Inconsistent instrumentation patterns 47 | 48 | The Instrumentation Score addresses these challenges by providing: 49 | 50 | - **🎯 Common Vocabulary**: Shared language for discussing instrumentation quality 51 | - **📊 Benchmarking**: Meaningful comparisons across services and over time 52 | - **🔧 Actionable Guidance**: Clear feedback to improve instrumentation 53 | - **💰 Efficiency**: Practices that lead to more cost-effective telemetry 54 | 55 | ## Quick Start 56 | 57 | ### Understanding Your Score 58 | 59 | The Instrumentation Score uses these qualitative categories: 60 | 61 | | Score Range | Category | Meaning | 62 | | ----------- | --------------------- | ------------------------------------------ | 63 | | 90-100 | **Excellent** | High standard of instrumentation quality | 64 | | 75-89 | **Good** | Solid quality; minor improvements possible | 65 | | 50-74 | **Needs Improvement** | Tangible issues requiring attention | 66 | | 0-49 | **Poor** | Significant problems needing urgent action | 67 | 68 | ### How It Works 69 | 70 | 1. **Analyze OTLP Data**: The score is calculated by analyzing OpenTelemetry Protocol (OTLP) telemetry streams. 71 | 2. **Apply Rules**: A set of standardized rules evaluate traces, metrics, and resource attributes. Each rule is evaluated as a boolean condition with `true` implying success and `false` implying failure. 72 | 3. **Calculate Score**: Mathematical formula generates a single score, applying weights to each rule check. 73 | 4. **Provide Feedback**: Actionable insights guide improvements. 74 | 75 | ### Score Calculation 76 | 77 | Let: 78 | 79 | * $N$ be the total number of impact levels. 80 | * $L_i$ denote the $i$-th impact level, where $i \in \{1, 2, \dots, N\}$. 81 | * $W_i$ be the weight assigned to the $i$-th impact level ($L_i$). 82 | * $P_i$ be the number of rules passed, or succeeded, for impact level $L_i$. 83 | * $T_i$ be the total number of rules for impact level $L_i$. 84 | 85 | The _Instrumentation Score_ is calculated as: 86 | 87 | $$\text{Score} = \frac{\sum_{i=1}^{N} (P_i \times W_i)}{\sum_{i=1}^{N} (T_i \times W_i)} \times 100$$ 88 | 89 | See the [Specification](./specification.md) for further examples and further details. 90 | 91 | ## Documentation 92 | 93 | 📖 **[Full Specification](./specification.md)** - Complete technical specification 94 | 🔧 **[Contributing Guide](./CONTRIBUTING.md)** - How to contribute to the project 95 | 🏛️ **[Governance](./GOVERNANCE.md)** - Project governance and maintainers 96 | 📚 **[Prior Art](./prior-art.md)** - Research on existing scoring systems 97 | 📋 **[Rules Directory](./rules/)** - Complete set of scoring rules 98 | 99 | ## Implementation 100 | 101 | This specification is designed to be **implementation-agnostic**. Multiple tools can implement the Instrumentation Score calculation while maintaining consistency through the standardized rules. 
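As an illustration, here is a minimal sketch of the calculation defined above. The weight values used are placeholders, not the normative weights; consult the [Specification](./specification.md) for the authoritative values.

```python
# Placeholder weights per impact level; the normative values are defined in the specification.
WEIGHTS = {"Critical": 40, "Important": 20, "Normal": 10, "Low": 5}


def instrumentation_score(results: dict[str, tuple[int, int]]) -> float:
    """Compute the score from per-impact-level (passed, total) rule counts.

    `results` maps an impact level to (P_i, T_i) as used in the formula above.
    """
    numerator = sum(passed * WEIGHTS[level] for level, (passed, total) in results.items())
    denominator = sum(total * WEIGHTS[level] for level, (passed, total) in results.items())
    return 100.0 * numerator / denominator if denominator else 0.0


# Example: all Critical rules pass, 3 of 4 Important rules pass, 1 of 2 Normal rules pass.
print(round(instrumentation_score({
    "Critical": (2, 2),
    "Important": (3, 4),
    "Normal": (1, 2),
    "Low": (1, 1),
}), 1))  # 83.8 with these placeholder weights
```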
102 | 103 | ### Implementation Requirements 104 | 105 | - ✅ **MUST** implement all rules defined in this specification 106 | - ✅ **MUST** use the standardized calculation formula 107 | - ✅ **MUST** provide information if not all rules are implemented 108 | - ✅ **MUST NOT** include additional rules that affect the standard score 109 | 110 | ## Rules Structure 111 | 112 | Scoring rules are the foundation of the Instrumentation Score. Each rule includes: 113 | 114 | - **ID**: Unique identifier (e.g., `RES-001`, `SPAN-042`). 115 | - **Description**: Human-readable explanation. 116 | - **Rationale**: Why this rule matters for quality. 117 | - **Criteria**: Boolean condition that evaluates as `true` for success or `false` for failure. 118 | - **Target**: OTLP element type (`Resource`, `TraceSpan`, `Metric`, `Log`), 119 | - **Impact**: Impact level (`Critical`, `Important`, `Normal`, `Low`). 120 | 121 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in rule criteria are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). 122 | 123 | ### Example Rule 124 | 125 | ```yaml 126 | id: RES-001 127 | description: "Service name must be present" 128 | rationale: "service.name is fundamental for service identification and observability" 129 | target: Resource 130 | impact: Critical 131 | type: Negative 132 | criteria: "Resource attributes MUST contain 'service.name' key with non-empty value" 133 | ``` 134 | 135 | ## Community 136 | 137 | ### Get Involved 138 | 139 | We welcome contributions from the OpenTelemetry community! Here's how to participate: 140 | 141 | - 💬 **Join Discussion**: [CNCF Slack #instrumentation-score](https://cloud-native.slack.com/archives/C090FEG5R0F) 142 | - 🐛 **Report Issues**: [GitHub Issues](https://github.com/instrumentation-score/spec/issues) 143 | - 🔀 **Submit Changes**: Follow our [contributing guide](./CONTRIBUTING.md) 144 | - 📅 **Attend Meetings**: Community meetings (schedule in Slack) 145 | 146 | ### Maintainers 147 | 148 | - [Antoine Toulme](https://github.com/atoulme) (@atoulme), Splunk 149 | - [Daniel Gomez Blanco](https://github.com/danielgblanco) (@danielgblanco), New Relic 150 | - [Juraci Paixão Kröhling](https://github.com/jpkrohling) (@jpkrohling), OllyGarden 151 | - [Michele Mancioppi](https://github.com/mmanciop) (@mmanciop), Dash0 152 | 153 | ## Project Status 154 | 155 | **Current Status**: Active development and community feedback 156 | 157 | This is an open-source specification initiated by [OllyGarden](https://olly.garden) with the goal of becoming a community-governed standard for instrumentation quality assessment. 158 | 159 | ## Relationship to OpenTelemetry 160 | 161 | The Instrumentation Score specification: 162 | 163 | - 🏗️ **Builds on** OpenTelemetry Semantic Conventions 164 | - 📊 **Analyzes** OTLP telemetry data streams 165 | - 🔧 **Complements** existing observability tools 166 | - 🎯 **Guides** effective use of OTel SDKs and Collector 167 | - 🤝 **Intended for** integration with the OpenTelemetry ecosystem 168 | 169 | ## License 170 | 171 | This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details. 
172 | 173 | ## Acknowledgments 174 | 175 | - OpenTelemetry project and community 176 | - All contributors and community members 177 | 178 | --- 179 | 180 | **Start improving your instrumentation quality today!** 🚀 181 | 182 | For questions, feedback, or discussions, join us in the [CNCF Slack #instrumentation-score channel](https://cloud-native.slack.com/archives/C090FEG5R0F). 183 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /specification.md: -------------------------------------------------------------------------------- 1 | # **Introduction** 2 | 3 | The OpenTelemetry project provides a powerful, vendor-neutral framework for instrumenting, generating, collecting, and exporting telemetry data (traces, metrics, logs). However, as adoption grows, organizations often face challenges with the *quality* and *consistency* of their instrumentation. 
Issues like missing critical attributes (e.g., `service.name`), inefficient use of telemetry signals (e.g., using verbose logs where metrics suffice), high cardinality, or incomplete traces can hinder observability effectiveness, increase costs, and make troubleshooting difficult. Currently, the OpenTelemetry ecosystem lacks a standardized, transparent, and objective method for assessing the quality of this instrumentation. 4 | 5 | To address this gap, this document proposes a specification for a standardized **Instrumentation Score**. This score, represented as a numerical value ranging from **0 to 100**, aims to provide a quantifiable measure of how well a service or system is instrumented according to OpenTelemetry best practices and semantic conventions. It is calculated by analyzing OpenTelemetry Protocol (OTLP) data streams against a defined set of rules. 6 | 7 | The introduction of such a standard score offers several significant benefits: 8 | 9 | 1. **Common Vocabulary:** It establishes a shared language for discussing instrumentation quality among developers, SREs, platform teams, and vendors. 10 | 2. **Benchmarking:** It enables meaningful comparisons of instrumentation quality across different services within an organization, as well as tracking improvements over time. 11 | 3. **Actionable Guidance:** The score and its underlying components are designed to provide clear, actionable feedback, helping teams identify specific areas for improvement. 12 | 4. **Efficiency Promotion:** It encourages practices that lead to more efficient and effective telemetry pipelines, potentially reducing data volume and associated costs. 13 | 14 | This specification is presented as an initial draft (Version 0.1) and is being initiated as an open-source effort by [OllyGarden](https://olly.garden), with the explicit goal of soliciting community feedback and eventually hosting it under an open governance model, ideally within the OpenTelemetry project. It focuses on defining the core scoring framework, the structure of the rules, and the calculation methodology, rather than mandating or endorsing any specific software tool for its implementation. 
15 | 16 | ## **Target Audience** 17 | 18 | This specification is designed for technical stakeholders involved in implementing, adopting, or evaluating the Instrumentation Score standard: 19 | 20 | ### **Tool and Platform Implementers** 21 | 22 | - **Observability Platform Vendors**: Teams building commercial or open-source observability platforms who want to integrate standardized instrumentation scoring, and who can use the spec as a framework when advising customers on instrumentation best practices 23 | - **Tool Developers**: Engineers creating standalone tools for instrumentation analysis and scoring 24 | - **Integration Engineers**: Technical teams implementing the score calculation within existing observability infrastructure 25 | 26 | ### **Technical Decision Makers** 27 | 28 | - **Platform Engineering Teams**: Engineers responsible for observability strategy and tooling decisions within organizations 29 | - **SRE and DevOps Teams**: Teams evaluating instrumentation quality assessment solutions for their production environments 30 | - **Engineering Managers**: Technical leaders assessing the value and feasibility of adopting instrumentation scoring standards 31 | 32 | ### **Community Contributors** 33 | 34 | - **OpenTelemetry Contributors**: Community members interested in extending or refining the scoring methodology 35 | - **Observability Engineers**: Practitioners with real-world experience who can contribute insights about effective instrumentation patterns 36 | - **Standards Enthusiasts**: Technical professionals interested in contributing to open observability standards 37 | 38 | This specification assumes familiarity with OpenTelemetry concepts, OTLP data formats, and observability engineering practices. 39 | 40 | ## **Learning from Prior Art in Scoring** 41 | 42 | Before defining the specifics, it's valuable to consider established scoring systems in other technical domains: 43 | 44 | * **CVSS (Common Vulnerability Scoring System):** This standard for rating vulnerability severity demonstrates the power of a transparent, multi-faceted scoring system based on clearly defined metrics. Its separation of a universal base score from contextual modifiers is also instructive. The key takeaway is the importance of **standardization and transparency** for wide adoption and trust. 45 | * **Google Lighthouse:** Scoring web page quality, Lighthouse excels at linking scores directly to **actionable recommendations**, making it highly valuable for users seeking improvement. Its use of weighted averages and data-driven thresholds also provides useful patterns. 46 | * **SonarQube Quality Gate:** By using Pass/Fail gates based on code metrics, SonarQube shows how quality scores can be integrated into **developer workflows** (like CI/CD pipelines) to enforce standards, particularly focusing on the quality of *new* code. 47 | * **SSL Labs Server Test:** This TLS configuration grader effectively uses **grade capping**, where critical flaws (like weak protocols) limit the maximum achievable score, regardless of other positive factors. It also rewards exceptional configurations (e.g., HSTS for an A+), providing clear incentives. 48 | 49 | These examples underscore the need for the Instrumentation Score to be standardized, transparent, actionable, multi-faceted, and governed effectively, incorporating mechanisms to reflect the critical impact of major deficiencies. A more comprehensive review of the existing prior art is available at [Prior art for Instrumentation Score](./prior-art.md). 
50 | 51 | ## **Specification Goals and Non-Goals** 52 | 53 | The primary **goals** of this specification are to: 54 | 55 | * Define a **standardized**, vendor-neutral metric for instrumentation quality. 56 | * Provide **quantifiable and transparent** feedback via a numerical score (0-100) and an open calculation method. 57 | * Offer **actionable insights** by structuring the score to guide improvements. 58 | * **Promote best practices** in line with OpenTelemetry standards. 59 | * Establish a **governed framework** allowing for community-driven evolution. 60 | * Create a basis for **benchmarking** instrumentation quality. 61 | 62 | It is explicitly **not** the goal of this specification to: 63 | 64 | * Mandate or endorse specific software tools for score calculation. 65 | * Define a system for real-time alerting on instrumentation issues. 66 | * Dictate specific backend implementation details (databases, architecture). 67 | * Cover every niche instrumentation scenario in initial versions. 68 | * Replace existing observability dashboards or analysis tools. 69 | 70 | ## **Detailed Specification** 71 | 72 | ### **Overview** 73 | 74 | The Instrumentation Score is a numerical value between 0 (Poor) and 100 (Excellent). It assesses the quality of instrumentation based on the automated analysis of OTLP telemetry data streams, primarily focusing on Traces and associated Resource attributes in its initial conception, with potential future expansion to Metrics and Logs. The score is typically calculated per `service.name`, representing the quality over a defined sliding time window (defaulting to 30 days). Implementations may support aggregation to higher levels (e.g., organization-wide), potentially applying additional rules at that level. The calculation relies on applying a defined set of Rules to the observed telemetry. Implementations MUST NOT factor rules into the instrumentation score that are not part of this specification: a given service must receive the same instrumentation score regardless of which implementation calculates it. If an implementation does not implement all rules, it MUST inform its users that the instrumentation score may be incomplete. 75 | 76 | ### **Rules** 77 | 78 | The scoring mechanism is driven by rules derived primarily from OpenTelemetry Semantic Conventions and community-accepted best practices. Each rule must be clearly defined with the following attributes (a non-normative sketch of one possible representation follows the list): 79 | 80 | * _ID_: A unique, stable identifier (e.g., RES-001). 81 | * _Description_: A human-readable explanation. 82 | * _Rationale_: Justification for the rule's importance to quality. 83 | * _Criteria_: A Boolean condition that evaluates to `true` for success or `false` for failure. Multiple sub-conditions may be used, in which case the overall result is an `AND` operation on all conditions. 84 | * _Target_: The OTLP signal or element it applies to. Must be one of: `Resource`, `TraceSpan`, `Metric`, `Log`. 85 | * _Impact_: An assigned importance level that determines how much the rule affects the score. Must be one of: `Critical`, `Important`, `Normal`, `Low`. 
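To make the rule structure concrete, here is a minimal, non-normative sketch in TypeScript of how an implementation might represent a rule and its criteria predicate. The `Rule` interface, the field names, and the `serviceNamePresent` example are illustrative assumptions rather than part of the specification, and the placeholder identifier does not refer to an official rule.

```typescript
// Hypothetical representation of a rule; only the attributes are mandated by
// the specification, not any concrete type or API.
type Target = "Resource" | "TraceSpan" | "Metric" | "Log";
type Impact = "Critical" | "Important" | "Normal" | "Low";

interface Rule {
  id: string;          // unique, stable identifier
  description: string; // human-readable explanation
  rationale: string;   // why the rule matters for quality
  target: Target;      // which OTLP element the rule applies to
  impact: Impact;      // importance level used for weighting
  // Criteria: a Boolean predicate over the targeted telemetry element.
  // Multiple sub-conditions would be combined with a logical AND.
  criteria: (attributes: Record<string, unknown>) => boolean;
}

// Illustrative rule in the spirit of the "missing service.name" example below;
// the identifier is a placeholder, not an official rule ID.
const serviceNamePresent: Rule = {
  id: "RES-XXX",
  description: "`service.name` is present on the Resource.",
  rationale: "Without `service.name`, telemetry cannot be attributed to a service.",
  target: "Resource",
  impact: "Critical",
  criteria: (attributes) =>
    typeof attributes["service.name"] === "string" &&
    (attributes["service.name"] as string).length > 0,
};

// Example evaluation against a Resource's attributes:
console.log(serviceNamePresent.criteria({ "service.name": "checkout" })); // true
console.log(serviceNamePresent.criteria({}));                             // false
```

In such a setup, an implementation would evaluate each rule's criteria against the telemetry observed in the scoring window and record one pass/fail result per rule, which is what the calculation described next consumes.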
86 | 87 | As explained in the _Score Calculation Formula_ section below, each of these impact levels has an associated **weight**, which scales the rule's importance in the resulting score: 88 | 89 | * _Critical_: 40 90 | * _Important_: 30 91 | * _Normal_: 20 92 | * _Low_: 10 93 | 94 | ### **Score Calculation Formula** 95 | 96 | The formula for the final _Instrumentation Score_ ensures that major issues significantly impact the score while keeping the result within the 0-100 range. 97 | 98 | Let: 99 | 100 | * $N$ be the total number of impact levels. 101 | * $L_i$ denote the $i$-th impact level, where $i \in \{1, 2, \dots, N\}$. 102 | * $W_i$ be the weight assigned to the $i$-th impact level ($L_i$). 103 | * $P_i$ be the number of rules passed, or succeeded, for impact level $L_i$. 104 | * $T_i$ be the total number of rules for impact level $L_i$. 105 | 106 | The _Instrumentation Score_ is calculated as: 107 | 108 | $$\text{Score} = \frac{\sum_{i=1}^{N} (P_i \times W_i)}{\sum_{i=1}^{N} (T_i \times W_i)} \times 100$$ 109 | 110 | To illustrate this, we can use the weights specified in the previous section and the following compliance across impact levels: 111 | 112 | * **Critical**: 4/8 rules passed ($P_1 = 4$, $T_1 = 8$) 113 | * **Important**: 8/10 rules passed ($P_2 = 8$, $T_2 = 10$) 114 | * **Normal**: 6/8 rules passed ($P_3 = 6$, $T_3 = 8$) 115 | * **Low**: 1/5 rules passed ($P_4 = 1$, $T_4 = 5$) 116 | 117 | Substituting these values and weights into the formula: 118 | 119 | $$\text{Score} = \frac{(4 \times 40) + (8 \times 30) + (6 \times 20) + (1 \times 10)}{(8 \times 40) + (10 \times 30) + (8 \times 20) + (5 \times 10)} \times 100$$ 120 | 121 | The final score is then: 122 | 123 | $$\text{Score} = \frac{530}{830} \times 100 \approx 0.63855 \times 100 \approx 63.86$$ 124 | 125 | 126 | This structure ensures that major deficiencies act as a significant deterrent, potentially capping the achievable score, aligning with lessons from prior art like SSL Labs. At the same time, it presents a clear prioritization for teams addressing failed rules. Solving the 4 failed _Critical_ impact issues would increase the score to 83.13, while solving the 4 failed _Low_ impact issues would only raise it to 68.67. 127 | 128 | ### **Qualitative Categories** 129 | 130 | To simplify interpretation, the numerical score is mapped to intuitive qualitative categories: 131 | 132 | | Score Range | Category | Interpretation Guidance | 133 | | :---- | :---- | :---- | 134 | | 90 \- 100 | **Excellent** | Represents a high standard of instrumentation quality. | 135 | | 75 \- 89 | **Good** | Solid, acceptable quality; minor improvements may be possible. | 136 | | 50 \- 74 | **Needs Improvement** | Indicates tangible issues requiring attention and remediation. | 137 | | 0 \- 49 | **Poor** | Signals significant instrumentation problems needing urgent action. | 138 | 139 | These ranges provide clear signals for action, with "Excellent" being a distinct achievement and "Poor" indicating likely critical issues.
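For implementers, the following sketch (TypeScript, continuing the illustrative style above) applies the formula and the category table to the worked example. The function and type names are assumptions made purely for illustration, and the sketch presumes that per-impact pass/total counts have already been produced by evaluating the rules over the scoring window.

```typescript
// Weights per impact level, as defined in the specification above.
const WEIGHTS = { Critical: 40, Important: 30, Normal: 20, Low: 10 } as const;
type Impact = keyof typeof WEIGHTS;

interface Compliance {
  passed: number; // P_i: rules passed for this impact level
  total: number;  // T_i: total rules for this impact level
}

// Weighted ratio of passed rules to total rules, scaled to 0-100.
function instrumentationScore(results: Record<Impact, Compliance>): number {
  let numerator = 0;
  let denominator = 0;
  for (const impact of Object.keys(WEIGHTS) as Impact[]) {
    const { passed, total } = results[impact];
    numerator += passed * WEIGHTS[impact];
    denominator += total * WEIGHTS[impact];
  }
  return denominator === 0 ? 0 : (numerator / denominator) * 100;
}

// Map a numerical score to the qualitative categories from the table above.
function category(score: number): string {
  if (score >= 90) return "Excellent";
  if (score >= 75) return "Good";
  if (score >= 50) return "Needs Improvement";
  return "Poor";
}

// Worked example from the text: 530 / 830 * 100 is approximately 63.86.
const score = instrumentationScore({
  Critical: { passed: 4, total: 8 },
  Important: { passed: 8, total: 10 },
  Normal: { passed: 6, total: 8 },
  Low: { passed: 1, total: 5 },
});
console.log(score.toFixed(2), category(score)); // "63.86 Needs Improvement"
```

Re-running the same sketch with the 4 failed _Critical_ rules fixed yields 83.13 ("Good"), while fixing the 4 failed _Low_ rules yields 68.67 (still "Needs Improvement"), matching the prioritization discussed above.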
140 | 141 | ### **Initial Rule Set Considerations** 142 | 143 | The initial set of official rules should prioritize high-impact, widely applicable checks, primarily based on stable OpenTelemetry Semantic Conventions and focusing on foundational elements like Traces and Resource attributes. A comprehensive rule set accompanies this repository under the [rules](./rules/) directory, but illustrative examples include: 144 | 145 | * Missing `service.name` (Critical), missing `service.version` (Important), missing `deployment.environment.name` (Normal), patterns suggesting logs used inefficiently instead of metrics (Normal), high cardinality detected in metric dimensions (Important). 146 | * Presence of recommended attributes like `service.instance.id` (Important). 147 | 148 | ## **Intended Usage and Benefits** 149 | 150 | The Instrumentation Score serves multiple purposes: 151 | 152 | * Providing direct **feedback to developers** to guide instrumentation improvements. 153 | * Allowing **platform teams** to track quality trends across services. 154 | * Establishing a basis for internal **benchmarking**. 155 | * Highlighting areas for **optimization** to improve telemetry efficiency and potentially reduce costs. 156 | * Creating a **common language** for discussing instrumentation quality. 157 | * Serving as a standard metric for **consultants and auditors**. 158 | 159 | ## **Relationship to the OpenTelemetry Ecosystem** 160 | 161 | This specification is deeply intertwined with the OpenTelemetry project: 162 | 163 | * It **leverages** OpenTelemetry Semantic Conventions as the primary source for rule definitions and OTLP as the data format analyzed. 164 | * It **informs** users about the effectiveness of their instrumentation choices made using OTel SDKs and configurations within the OTel Collector. 165 | * It **complements** existing observability backends and visualization tools by providing a focused metric on instrumentation quality itself. 166 | * It is intended for eventual submission to a neutral home, perhaps as a CNCF project or an OpenTelemetry SIG. 167 | --------------------------------------------------------------------------------