Merged
Changes from 2 commits
`content/en/docs/collector/internal-telemetry.md`: 63 changes (33 additions, 30 deletions)
@@ -237,28 +237,28 @@ categorized by instrumentation type.

#### Additional `detailed`-level metrics

| Metric name | Description | Type |
| --------------------------------- | ----------------------------------------------------------------------------------------- | --------- |
| `http_client_active_requests` | Number of active HTTP client requests. | Counter |
| `http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections. | Histogram |
| `http_client_open_connections` | Number of outbound HTTP connections that are active or idle on the client. | Counter |
| `http_client_request_body_size` | Measures the size of HTTP client request bodies. | Histogram |
| `http_client_request_duration` | Measures the duration of HTTP client requests. | Histogram |
| `http_client_response_body_size` | Measures the size of HTTP client response bodies. | Histogram |
| `http_server_active_requests` | Number of active HTTP server requests. | Counter |
| `http_server_request_body_size` | Measures the size of HTTP server request bodies. | Histogram |
| `http_server_request_duration` | Measures the duration of HTTP server requests. | Histogram |
| `http_server_response_body_size` | Measures the size of HTTP server response bodies. | Histogram |
| `rpc_client_duration` | Measures the duration of outbound RPC. | Histogram |
| `rpc_client_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `rpc_client_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `rpc_client_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `rpc_client_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `rpc_server_duration` | Measures the duration of inbound RPC. | Histogram |
| `rpc_server_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `rpc_server_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `rpc_server_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `rpc_server_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| Metric name | Description | Type |
| ----------------------------------------- | ----------------------------------------------------------------------------------------- | --------- |
| `otelcol_http_client_active_requests` | Number of active HTTP client requests. | Counter |
| `otelcol_http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections. | Histogram |
| `otelcol_http_client_open_connections` | Number of outbound HTTP connections that are active or idle on the client. | Counter |
| `otelcol_http_client_request_body_size` | Measures the size of HTTP client request bodies. | Histogram |
| `otelcol_http_client_request_duration` | Measures the duration of HTTP client requests. | Histogram |
| `otelcol_http_client_response_body_size` | Measures the size of HTTP client response bodies. | Histogram |
| `otelcol_http_server_active_requests` | Number of active HTTP server requests. | Counter |
| `otelcol_http_server_request_body_size` | Measures the size of HTTP server request bodies. | Histogram |
| `otelcol_http_server_request_duration` | Measures the duration of HTTP server requests. | Histogram |
| `otelcol_http_server_response_body_size` | Measures the size of HTTP server response bodies. | Histogram |
| `otelcol_rpc_client_duration` | Measures the duration of outbound RPC. | Histogram |
| `otelcol_rpc_client_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `otelcol_rpc_client_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_client_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `otelcol_rpc_client_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_server_duration` | Measures the duration of inbound RPC. | Histogram |
| `otelcol_rpc_server_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `otelcol_rpc_server_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_server_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `otelcol_rpc_server_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |

Contributor

These metrics are generated by the underlying instrumentation library, not by Collector components. This means that the `otelcol_` prefix will not be present here.

Contributor Author

@danelson, Aug 1, 2024

@codeboten, are you saying to revert this? Also, can you clarify how this affects consuming the data when scraping the metrics with the Prometheus receiver? I am running 0.104.0 with the config below, and I see metrics with names like `otelcol_http_server_response_size` (note that it is not `response_body_size` either).

OTel config:

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
  prometheus/collector:
    config:
      scrape_configs:
        - job_name: "internal"
          scrape_interval: 10s
          static_configs:
            - targets:
                - "localhost:8888"

processors:
  filter/collector:
    error_mode: ignore
    metrics:
      include:
        match_type: regexp
        metric_names:
          - .*http_server.*

exporters:
  debug:
    verbosity: detailed

service:
  telemetry:
    metrics:
      level: detailed
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [prometheus/collector]
      processors: [filter/collector]
      exporters: [debug]
```

Contributor

Right. Until v0.106.1, the Prometheus exporter configured inside the Collector set a Prometheus namespace of `otelcol`, which prefixed every metric exported via Prometheus. This was inconsistent with the metrics exported via other exporters (OTLP, console). It was addressed by manually prefixing all metrics generated by Collector components with `otelcol_`, so those metrics have consistent names across all exporters. When I say exporters here, I mean the exporters configured inside the Collector for use by the OTel Go SDK.

This means that all metrics generated by instrumentation libraries will keep the names those libraries intended. For example, `http_server_response_size` used to be prefixed with `otelcol_` and will now look like this:

```
http_server_response_size{http_method="POST",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="4318",net_protocol_name="http",net_protocol_version="1.1",service_instance_id="aa3d8988-fdf1-4023-8fff-193877983817",service_name="otelcontribcol",service_version="0.106.1-dev"} 2
```
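
If existing dashboards or queries still expect the `otelcol_` prefix on these instrumentation-library metrics, one possible workaround is to restore the prefix at scrape time with Prometheus `metric_relabel_configs` in the `prometheus/collector` receiver. This is a sketch, assuming the HTTP and RPC metric families listed in the table; the relabel rule is hypothetical. The Collector expands `$` in its own configuration, so the capture-group reference is escaped as `$$1`.

```yaml
receivers:
  prometheus/collector:
    config:
      scrape_configs:
        - job_name: "internal"
          scrape_interval: 10s
          static_configs:
            - targets:
                - "localhost:8888"
          # Hypothetical workaround: re-add the otelcol_ prefix to metrics
          # emitted by the HTTP and RPC instrumentation libraries.
          metric_relabel_configs:
            - source_labels: [__name__]
              # Prometheus anchors this regex, so names that already start
              # with otelcol_ do not match and are not prefixed twice.
              regex: "((http|rpc)_(client|server)_.*)"
              target_label: __name__
              # $$ escapes the Collector's variable expansion; Prometheus sees $1.
              replacement: "otelcol_$$1"
```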

Contributor Author

Thank you. The disconnect I was having was between Collector-generated and instrumentation-library-generated metrics.


### Events observable with internal logs

@@ -283,7 +283,8 @@ own telemetry.

#### Data loss

Use the rate of `otelcol_processor_dropped_spans > 0` and
Use the rate of `otelcol_processor_dropped_log_records > 0`,
`otelcol_processor_dropped_spans > 0`, and
`otelcol_processor_dropped_metric_points > 0` to detect data loss. Depending on
your project's requirements, select a narrow time window before alerting begins
to avoid notifications for small losses that are within the desired reliability
@@ -317,19 +318,21 @@ logs for messages such as `Dropping data because sending_queue is full`.

#### Receive failures

Sustained rates of `otelcol_receiver_refused_spans` and
`otelcol_receiver_refused_metric_points` indicate that too many errors were
returned to clients. Depending on the deployment and the clients' resilience,
this might indicate clients' data loss.
Sustained rates of `otelcol_receiver_refused_log_records`,
`otelcol_receiver_refused_spans`, and `otelcol_receiver_refused_metric_points`
indicate that too many errors were returned to clients. Depending on the
deployment and the clients' resilience, this might indicate clients' data loss.
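
As a sketch of turning this sustained-rate check into an alert, the Prometheus alerting rule below fires only after refusals persist for a while, so short bursts do not trigger notifications. It assumes the Collector's internal metrics are scraped into Prometheus under the names used in this section. Depending on the Collector version, counters might be exposed with a `_total` suffix, so confirm the names on the `:8888/metrics` endpoint first. The group name, alert name, `5m` rate window, `15m` duration, and severity label are hypothetical choices.

```yaml
groups:
  - name: otelcol-receive-failures # hypothetical group name
    rules:
      - alert: OtelcolReceiverRefusingData # hypothetical alert name
        # True if any signal type is being refused. Metric names might need
        # a _total suffix depending on the Collector version.
        expr: |
          sum(rate(otelcol_receiver_refused_log_records[5m])) > 0
          or sum(rate(otelcol_receiver_refused_spans[5m])) > 0
          or sum(rate(otelcol_receiver_refused_metric_points[5m])) > 0
        # The condition must hold continuously for this long before firing.
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Collector receivers are refusing data sent by clients.
```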

Sustained rates of `otelcol_exporter_send_failed_spans` and
Sustained rates of `otelcol_exporter_send_failed_log_records`,
`otelcol_exporter_send_failed_spans`, and
`otelcol_exporter_send_failed_metric_points` indicate that the Collector is not
able to export data as expected. These metrics do not inherently imply data loss
since there could be retries. But a high rate of failures could indicate issues
with the network or backend receiving the data.
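
Because failed sends might still succeed on retry, a failure ratio can be a more useful alert signal than a raw failure rate. This is a sketch for spans only, assuming the metric names used in this paragraph (possibly with a `_total` suffix in your deployment); the rule names, the 5% threshold, and the time windows are hypothetical, and the same pattern applies to log records and metric points.

```yaml
groups:
  - name: otelcol-export-failures # hypothetical group name
    rules:
      - alert: OtelcolExporterSpanFailureRatioHigh # hypothetical alert name
        # Share of span send attempts that failed over the last 5 minutes.
        # If both rates are 0, the division yields NaN and the comparison
        # is false, so the rule produces no alert.
        expr: |
          sum(rate(otelcol_exporter_send_failed_spans[5m]))
            /
          (
            sum(rate(otelcol_exporter_sent_spans[5m]))
            + sum(rate(otelcol_exporter_send_failed_spans[5m]))
          )
          > 0.05
        for: 10m
        labels:
          severity: warning
```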

#### Data flow

You can monitor data ingress with the `otelcol_receiver_accepted_spans` and
`otelcol_receiver_accepted_metric_points` metrics and data egress with the
You can monitor data ingress with the `otelcol_receiver_accepted_log_records`,
`otelcol_receiver_accepted_spans`, and `otelcol_receiver_accepted_metric_points`
metrics and data egress with the `otelcol_exporter_sent_log_records`,
`otelcol_exporter_sent_spans`, and `otelcol_exporter_sent_metric_points` metrics.
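
To compare ingress and egress side by side, Prometheus recording rules can precompute both rates. This is a sketch for spans: the rule names follow the common `level:metric:operations` convention but are hypothetical, and the metric names might carry a `_total` suffix depending on the Collector version. The two series can legitimately differ, for example when processors filter or sample data, or when a pipeline fans out to several exporters.

```yaml
groups:
  - name: otelcol-data-flow # hypothetical group name
    rules:
      # Spans accepted by all receivers, per second (ingress).
      - record: otelcol:receiver_accepted_spans:rate5m
        expr: sum(rate(otelcol_receiver_accepted_spans[5m]))
      # Spans successfully sent by all exporters, per second (egress).
      - record: otelcol:exporter_sent_spans:rate5m
        expr: sum(rate(otelcol_exporter_sent_spans[5m]))
```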