
fix(base-cluster/monitoring)!: grafana-tempo-distributed would need s3#1875

Merged
cwrau merged 1 commit into main from fix/base-cluster/grafana-tempo on Dec 9, 2025

Conversation


@cwrau cwrau commented Dec 9, 2025

The distributed chart's compactor requires S3-compatible object storage; without it the compactor doesn't work, meaning the traces are never cleaned up. Therefore we switch to the singleBinary chart.

Summary by CodeRabbit

  • Chores

    • Upgraded Grafana Tempo integration from tempo-distributed (1.57.0) to tempo (1.24.1) and adjusted tracing deployment defaults (persistence, replicas, retention, resources, and security settings).
    • Flattened tracing configuration: ingester sub-block fields were promoted to top-level tracing fields.
  • Documentation

    • Added migration guidance for upcoming major releases, including tracing config changes.


Copilot AI review requested due to automatic review settings December 9, 2025 13:58
@cwrau cwrau enabled auto-merge December 9, 2025 13:58
@gemini-code-assist

Summary of Changes

Hello @cwrau, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where the grafana-tempo-distributed chart's compactor was failing due to an unmet dependency on S3, leading to an accumulation of traces. To rectify this, the monitoring stack is being migrated to utilize the grafana-tempo single binary Helm chart. This change ensures that trace retention and compaction function correctly by updating the chart reference, adjusting the OTLP endpoint in the Alloy gateway, and reconfiguring the Tempo Helm chart values to suit the single binary architecture.

Highlights

  • Helm Chart Switch: The Grafana Tempo deployment has been switched from the tempo-distributed Helm chart to the tempo single binary Helm chart to resolve issues with trace compaction.
  • Endpoint Update: The OTLP exporter endpoint in the Alloy gateway configuration has been updated to reflect the new grafana-tempo service name.
  • Configuration Refactoring: The values for the Grafana Tempo Helm chart have been refactored to align with the single binary deployment model, removing distributed-specific configurations and adjusting others for proper operation.
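The chart switch in the first highlight can be sketched as a Flux HelmRelease fragment. This is a minimal sketch under assumptions: the release name, namespace, and sourceRef are illustrative; only the chart name and version come from this PR.

```yaml
# Hypothetical HelmRelease fragment illustrating the chart switch.
# Only the chart name (tempo) and version (1.24.1) are from the PR;
# metadata and sourceRef names are illustrative.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: grafana-tempo
  namespace: monitoring
spec:
  chart:
    spec:
      chart: tempo        # was: tempo-distributed
      version: 1.24.1     # was: 1.57.0
      sourceRef:
        kind: HelmRepository
        name: grafana
```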
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai bot commented Dec 9, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR replaces the Grafana Tempo distributed chart with the standalone tempo chart, restructures tempo-related Helm templates/values (removes ingester block, promotes its fields), and updates the OTLP exporter endpoint in the Alloy gateway from grafana-tempo-distributor:4317 to grafana-tempo:4317.
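The endpoint change can be illustrated with a hypothetical excerpt of the Alloy gateway values. The otelcol block label and surrounding value structure are assumptions; only the endpoint rename comes from the PR.

```yaml
# Hypothetical excerpt of Alloy gateway values; only the endpoint
# change is from the PR, block and key names are illustrative.
alloy:
  configMap:
    content: |
      otelcol.exporter.otlp "tempo" {
        client {
          // was: endpoint = "grafana-tempo-distributor:4317"
          endpoint = "grafana-tempo:4317"
        }
      }
```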

Changes

  • Tempo templates & Alloy gateway (charts/base-cluster/templates/monitoring/alloy-gateway.yaml, charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml): Updated the OTLP exporter endpoint to grafana-tempo:4317. Reworked the Tempo Helm template: switched from tempo-distributed to tempo, removed the legacy ingester block, added an otlpReceiver (grpc 0.0.0.0:4317), serviceMonitor config, persistentVolumeClaimRetentionPolicy, and reporting/persistence/replica settings, and adjusted image/registry/resources/securityContext usage.
  • Values & schema (charts/base-cluster/values.yaml, charts/base-cluster/values.schema.json): Replaced the Helm dependency tempo-distributed: 1.57.0 with tempo: 1.24.1. Promoted tracing.ingester.* fields to tracing.* (resourcesPreset, resources, persistence.size) and removed the nested ingester object from the schema.
  • Documentation (charts/base-cluster/README.md.gotmpl): Added migration notes for the 8.x→9.0.0, 9.x→10.0.0, and 10.x→11.0.0 upgrades, documenting the Tempo/ingester migration and other upgrade steps.
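The ingester flattening can be sketched as a before/after of the user-facing values. The key names follow the summary above; the size value is purely illustrative.

```yaml
# Before (tempo-distributed): fields nested in an ingester sub-block
tracing:
  ingester:
    resourcesPreset: small
    persistence:
      size: 10Gi   # illustrative value
---
# After (tempo single binary): fields promoted to the tracing level
tracing:
  resourcesPreset: small
  persistence:
    size: 10Gi     # illustrative value
```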

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Review charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml for compatibility with tempo v1.24.1 API/values.
  • Verify promoted tracing fields in values.yaml and values.schema.json match template references and validation.
  • Confirm OTLP endpoint change in alloy-gateway.yaml aligns with service names and serviceMonitor/service selectors.
  • Check persistence/retention semantics (persistentVolumeClaimRetentionPolicy, retention settings) and any securityContext adjustments.

Possibly related PRs

Suggested reviewers

  • tasches
  • marvinWolff
  • teutonet-bot

Poem

🐇
I hopped through charts at break of dawn,
Swapped distributed for tempo, now reborn,
OTLP found a new front door,
Persistence tucked in, replicas no more,
Tracing hums light — hooray, I snore. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Title check (⚠️ Warning): The title mentions 'grafana-tempo-distributed would need s3' but the actual change is switching FROM tempo-distributed TO tempo (single binary chart), not adding S3 support. Resolution: Revise the title to accurately reflect the main change, e.g. 'fix(base-cluster/monitoring): Switch Grafana Tempo to single binary deployment' or similar.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0fbd47d and 4a3566a.

📒 Files selected for processing (5)
  • charts/base-cluster/README.md.gotmpl (1 hunks)
  • charts/base-cluster/templates/monitoring/alloy-gateway.yaml (1 hunks)
  • charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml (1 hunks)
  • charts/base-cluster/values.schema.json (1 hunks)
  • charts/base-cluster/values.yaml (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • charts/base-cluster/README.md.gotmpl
🚧 Files skipped from review as they are similar to previous changes (1)
  • charts/base-cluster/templates/monitoring/alloy-gateway.yaml
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1605
File: charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml:41-46
Timestamp: 2025-07-24T09:27:02.830Z
Learning: The grafana/tempo-distributed Helm chart uses port 3200 (not 3100) for the query-frontend HTTP service by default. The URL "http://grafana-tempo-query-frontend.monitoring:3200" is correct when configuring Grafana datasources for this chart.
📚 Learning: 2025-07-24T09:41:28.072Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1604
File: charts/base-cluster/templates/monitoring/metrics-server/metrics-server.yaml:20-21
Timestamp: 2025-07-24T09:41:28.072Z
Learning: The official kubernetes-sigs/metrics-server Helm chart pins the image tag to a specific version in its values.yaml by default, ensuring reproducible deployments without requiring explicit tag specification in consuming charts.

Applied to files:

  • charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml
📚 Learning: 2025-07-24T09:55:53.655Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1601
File: charts/base-cluster/templates/dns/external-dns.yaml:30-32
Timestamp: 2025-07-24T09:55:53.655Z
Learning: In charts/base-cluster/templates/dns/external-dns.yaml, the dns.provider field in values.yaml has always been expected to be a map format (e.g., `{ cloudflare: {} }`), never a string format. The template correctly uses `{{ .Values.dns.provider | keys | first }}` to extract the provider name from the map keys.

Applied to files:

  • charts/base-cluster/values.yaml
📚 Learning: 2025-07-24T09:56:41.380Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1601
File: charts/base-cluster/templates/dns/external-dns.yaml:33-39
Timestamp: 2025-07-24T09:56:41.380Z
Learning: In the teutonet-helm-charts base-cluster chart, secret names like "external-dns" for Cloudflare provider are intentionally hard-coded. Users who need custom secret names should use Helm's `valuesFrom` feature to override values rather than expecting dedicated fields in values.yaml. This design keeps the values.yaml clean while still allowing full customization flexibility.

Applied to files:

  • charts/base-cluster/values.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: check licenses
  • GitHub Check: lint helm chart (base-cluster)
🔇 Additional comments (8)
charts/base-cluster/values.schema.json (1)

874-889: Schema updates correctly promote ingester fields to tracing level.

The addition of resourcesPreset, resources, and persistence properties at the tracing level aligns with the restructuring described in the PR. The $ref definitions are valid and consistent with existing patterns in the schema.

charts/base-cluster/values.yaml (2)

114-114: Chart dependency updated correctly.

The addition of tempo 1.24.1 to the grafana helm repository aligns with the shift from the distributed to single binary architecture.


348-353: Tracing configuration promoted correctly.

The resourcesPreset, resources, and persistence.size fields are now at the tracing level, consistent with the schema changes and template updates. The resourcesPreset: small provides an appropriate default allocation for the single binary tempo instance.

charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml (5)

12-12: Chart reference correctly switched to single binary tempo.

The chart spec has been updated from tempo-distributed to tempo, aligning with the PR objective to use the single binary chart.


30-35: OTLP receiver configuration is appropriate.

The addition of the gRPC endpoint at 0.0.0.0:4317 correctly configures the standard OTLP ingest port for the tempo single binary instance, enabling trace collection from instrumented applications and the Alloy gateway.
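The receiver block referenced here would look roughly like the following in the tempo chart values. This is a sketch: the nesting under tempo: is an assumption; the gRPC endpoint itself is from the review comment.

```yaml
# Sketch of an OTLP gRPC receiver in the tempo chart values.
# The endpoint matches the review comment; surrounding keys assumed.
tempo:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
```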


37-45: Persistence and retention configuration supports operational requirements.

The persistent volume configuration with 48h retention (line 42) and PVC retention policy (lines 37-40) addresses the PR objective by enabling proper compaction and cleanup of older traces.
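Taken together, the persistence and retention settings discussed here might look roughly like this. This is a hedged sketch: the whenDeleted/whenScaled values and exact key placement are assumptions; the 48h retention is from the review.

```yaml
# Illustrative sketch of persistence plus retention settings;
# policy values and key placement are assumptions.
persistence:
  enabled: true
persistentVolumeClaimRetentionPolicy:
  whenDeleted: Delete
  whenScaled: Delete
tempo:
  retention: 48h    # 48h retention per the review comment
```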


53-56: ServiceMonitor integration is correctly configured.

The addition of the serviceMonitor with monitoring/provisioned-by: base-cluster label enables proper discovery and scraping of tempo metrics by the Prometheus operator, supporting observability of the tracing system.


75-75: No action required. The query-frontend port 3200 is correct for the Grafana Tempo Helm chart (both distributed and single-binary deployments use port 3200 as the default queryFrontend.service.ports.http value).




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an important issue with trace cleanup by switching from the grafana-tempo-distributed Helm chart to the simpler tempo single-binary chart. This is a good move as it removes the requirement for S3-compatible object storage, which was causing the compactor to fail. The configuration changes across the files are mostly correct and reflect the new chart's structure. However, I've found a critical issue in the resource configuration for Tempo that will prevent it from getting the correct resource limits and requests, as detailed in the specific comment.


Copilot AI left a comment


Pull request overview

This PR migrates Grafana Tempo from the distributed chart (tempo-distributed) to the single binary chart (tempo) to address an issue where the compactor component doesn't work without S3 storage, preventing trace cleanup.

Key Changes

  • Chart migration from tempo-distributed (v1.57.0) to tempo (v1.24.1)
  • Reconfiguration of deployment from distributed microservices to single binary architecture
  • Updated service endpoint from grafana-tempo-distributor:4317 to grafana-tempo:4317 for trace ingestion

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • charts/base-cluster/values.yaml: Updated chart reference from tempo-distributed to tempo with new version
  • charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml: Restructured configuration for single binary deployment, including OTLP receivers, persistence, and metrics generator
  • charts/base-cluster/templates/monitoring/alloy-gateway.yaml: Updated trace export endpoint to match new single binary service name



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b50ea98 and 0fbd47d.

📒 Files selected for processing (3)
  • charts/base-cluster/templates/monitoring/alloy-gateway.yaml (1 hunks)
  • charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml (1 hunks)
  • charts/base-cluster/values.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-07-24T09:27:02.830Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1605
File: charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml:41-46
Timestamp: 2025-07-24T09:27:02.830Z
Learning: The grafana/tempo-distributed Helm chart uses port 3200 (not 3100) for the query-frontend HTTP service by default. The URL "http://grafana-tempo-query-frontend.monitoring:3200" is correct when configuring Grafana datasources for this chart.

Applied to files:

  • charts/base-cluster/templates/monitoring/alloy-gateway.yaml
📚 Learning: 2025-07-24T09:41:28.072Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1604
File: charts/base-cluster/templates/monitoring/metrics-server/metrics-server.yaml:20-21
Timestamp: 2025-07-24T09:41:28.072Z
Learning: The official kubernetes-sigs/metrics-server Helm chart pins the image tag to a specific version in its values.yaml by default, ensuring reproducible deployments without requiring explicit tag specification in consuming charts.

Applied to files:

  • charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml
📚 Learning: 2025-07-24T09:55:53.655Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1601
File: charts/base-cluster/templates/dns/external-dns.yaml:30-32
Timestamp: 2025-07-24T09:55:53.655Z
Learning: In charts/base-cluster/templates/dns/external-dns.yaml, the dns.provider field in values.yaml has always been expected to be a map format (e.g., `{ cloudflare: {} }`), never a string format. The template correctly uses `{{ .Values.dns.provider | keys | first }}` to extract the provider name from the map keys.

Applied to files:

  • charts/base-cluster/values.yaml
📚 Learning: 2025-07-24T09:56:41.380Z
Learnt from: cwrau
Repo: teutonet/teutonet-helm-charts PR: 1601
File: charts/base-cluster/templates/dns/external-dns.yaml:33-39
Timestamp: 2025-07-24T09:56:41.380Z
Learning: In the teutonet-helm-charts base-cluster chart, secret names like "external-dns" for Cloudflare provider are intentionally hard-coded. Users who need custom secret names should use Helm's `valuesFrom` feature to override values rather than expecting dedicated fields in values.yaml. This design keeps the values.yaml clean while still allowing full customization flexibility.

Applied to files:

  • charts/base-cluster/values.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: check licenses
  • GitHub Check: Agent
  • GitHub Check: lint helm chart (base-cluster)
🔇 Additional comments (5)
charts/base-cluster/templates/monitoring/alloy-gateway.yaml (1)

99-107: Endpoint update aligns with chart migration.

The OTLP exporter endpoint correctly targets the new tempo deployment service. Port 4317 is the appropriate OTLP gRPC receiver port.

charts/base-cluster/values.yaml (1)

114-114: Chart version update is appropriate.

The migration to tempo 1.24.1 is consistent with the PR objective to switch from tempo-distributed to the single binary tempo chart.

charts/base-cluster/templates/monitoring/tracing/grafana-tempo.yaml (3)

30-35: OTLP receiver endpoint configuration is correct.

The OTLP gRPC endpoint 0.0.0.0:4317 matches the grafana/tempo chart's default receiver configuration and is properly exposed as containerPort 4317 in the deployment.


52-52: remoteWriteUrl is correctly configured for kube-prometheus-stack 79.11.0.

The endpoint http://kube-prometheus-stack-prometheus:9090/api/v1/write on line 52 is accurate. The service name follows the standard Helm release naming pattern ({release}-{component}), port 9090 is the correct Prometheus default, and /api/v1/write is the correct remote write API path. This configuration is consistent with other components in the codebase (e.g., alloy-collector.yaml uses the same endpoint).
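The remote write target named in this comment could be expressed in the values roughly as follows. The placement under metricsGenerator and the enabled flag are assumptions; the URL is quoted from the review.

```yaml
# Sketch of the metrics-generator remote write configuration;
# key placement is assumed, the URL is from the review comment.
tempo:
  metricsGenerator:
    enabled: true
    remoteWriteUrl: http://kube-prometheus-stack-prometheus:9090/api/v1/write
```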


75-75: The Grafana datasource endpoint is correct. The tempo Helm chart (v1.24.1) deployed here uses distributed architecture with a query-frontend component, and port 3200 is the standard HTTP port for the query-frontend service. The service name grafana-tempo-query-frontend and namespace monitoring follow the chart's default naming conventions.

Likely an incorrect or invalid review comment.

@cwrau cwrau force-pushed the fix/base-cluster/grafana-tempo branch 3 times, most recently from 186e11a to 7e3185a on December 9, 2025 14:14
@cwrau cwrau changed the title from fix(base-cluster/monitoring): grafana-tempo-distributed would need s3 to fix(base-cluster/monitoring)!: grafana-tempo-distributed would need s3 on Dec 9, 2025
@cwrau cwrau disabled auto-merge December 9, 2025 14:19
marvinWolff previously approved these changes Dec 9, 2025
@cwrau cwrau force-pushed the fix/base-cluster/grafana-tempo branch from 7e3185a to 4a3566a on December 9, 2025 14:31
@cwrau cwrau enabled auto-merge December 9, 2025 14:31
@cwrau cwrau added this pull request to the merge queue Dec 9, 2025
Merged via the queue into main with commit df5c372 Dec 9, 2025
31 of 32 checks passed
@cwrau cwrau deleted the fix/base-cluster/grafana-tempo branch December 9, 2025 14:41
github-merge-queue bot pushed a commit that referenced this pull request Dec 11, 2025
🤖 I have created a release *beep* *boop*
---


## [11.0.0](base-cluster-v10.1.2...base-cluster-v11.0.0) (2025-12-10)


### ⚠ BREAKING CHANGES

* **base-cluster/monitoring:** grafana-tempo-distributed would need s3
([#1875](#1875))

### Features

* **base-cluster/flux:** use centralised helmRepositories template
([#1845](#1845))
([60658ed](60658ed))


### Bug Fixes

* **base-cluster/monitoring:** grafana-tempo-distributed would need s3
([#1875](#1875))
([df5c372](df5c372))
* **base-cluster/monitoring:** increase limit of grafana sidecar
([#1878](#1878))
([fb1ebc6](fb1ebc6))
* **base-cluster:** correctly set image repository
([#1848](#1848))
([0e108dc](0e108dc))


### Miscellaneous Chores

* **base-cluster/dependencies:** update common docker tag to v1.7.0
([#1872](#1872))
([b50ea98](b50ea98))
* **base-cluster/dependencies:** update helm release reflector to
v9.1.44
([#1861](#1861))
([2cfe4e5](2cfe4e5))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Breaking Changes**
  * grafana-tempo-distributed now requires S3 configuration

* **New Features**
  * Centralized helmRepositories template management

* **Bug Fixes**
  * Increased Grafana sidecar limit
  * Corrected image repository reference

* **Chores**
  * Updated dependencies and component versions

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>

4 participants