[clickhouse] Implement FindTraces for ClickHouse Storage #7795

Merged
yurishkuro merged 20 commits into jaegertracing:main from mahadzaryab1:find-traces
Dec 31, 2025

Conversation

@mahadzaryab1
Collaborator

@mahadzaryab1 mahadzaryab1 commented Dec 27, 2025

Which problem is this PR solving?

Description of the changes

  • This PR implements FindTraces for ClickHouse storage by leveraging the existing query building logic for FindTraceIDs.

How was this change tested?

  • Enabled integration tests for FindTraces. Currently ignoring the tags integration tests since we need to rework how typed attributes are handled in ClickHouse.

Checklist

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1 mahadzaryab1 added the changelog:experimental (Change to an experimental part of the code) label Dec 27, 2025
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@codecov

codecov Bot commented Dec 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.51%. Comparing base (9658822) to head (f3d4fd5).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7795      +/-   ##
==========================================
+ Coverage   95.35%   95.51%   +0.15%     
==========================================
  Files         310      307       -3     
  Lines       16075    15911     -164     
==========================================
- Hits        15329    15197     -132     
+ Misses        578      561      -17     
+ Partials      168      153      -15     
Flag Coverage Δ
badger_v1 9.18% <0.00%> (+0.15%) ⬆️
badger_v2 1.93% <0.00%> (+0.03%) ⬆️
cassandra-4.x-v1-manual 13.58% <0.00%> (+0.23%) ⬆️
cassandra-4.x-v2-auto 1.92% <0.00%> (+0.03%) ⬆️
cassandra-4.x-v2-manual 1.92% <0.00%> (+0.03%) ⬆️
cassandra-5.x-v1-manual 13.58% <0.00%> (+0.23%) ⬆️
cassandra-5.x-v2-auto 1.92% <0.00%> (+0.03%) ⬆️
cassandra-5.x-v2-manual 1.92% <0.00%> (+0.03%) ⬆️
clickhouse 1.97% <0.00%> (+0.15%) ⬆️
elasticsearch-6.x-v1 17.54% <0.00%> (+0.29%) ⬆️
elasticsearch-7.x-v1 17.57% <0.00%> (+0.29%) ⬆️
elasticsearch-8.x-v1 17.73% <0.00%> (+0.30%) ⬆️
elasticsearch-8.x-v2 1.93% <0.00%> (+0.03%) ⬆️
elasticsearch-9.x-v2 1.93% <0.00%> (+0.03%) ⬆️
grpc_v1 8.84% <0.00%> (-0.07%) ⬇️
grpc_v2 1.93% <0.00%> (+0.03%) ⬆️
kafka-3.x-v2 1.93% <0.00%> (+0.03%) ⬆️
memory_v2 1.93% <0.00%> (+0.03%) ⬆️
opensearch-1.x-v1 17.62% <0.00%> (+0.29%) ⬆️
opensearch-2.x-v1 17.62% <0.00%> (+0.29%) ⬆️
opensearch-2.x-v2 1.93% <0.00%> (+0.03%) ⬆️
opensearch-3.x-v2 1.93% <0.00%> (+0.03%) ⬆️
query 1.93% <0.00%> (+0.03%) ⬆️
tailsampling-processor 0.55% <0.00%> (+<0.01%) ⬆️
unittests 94.14% <100.00%> (+0.22%) ⬆️


Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@github-actions

github-actions Bot commented Dec 28, 2025

Metrics Comparison Summary

Total changes across all snapshots: 90

Detailed changes per snapshot

summary_metrics_snapshot_clickhouse

📊 Metrics Diff Summary

Total Changes: 90

  • 🆕 Added: 0 metrics
  • ❌ Removed: 90 metrics
  • 🔄 Modified: 0 metrics

❌ Removed Metrics

  • rpc_server_duration_milliseconds (18 variants)
View diff sample
-rpc_server_duration_milliseconds{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_duration_milliseconds{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_duration_milliseconds{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_duration_milliseconds{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_duration_milliseconds{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_duration_milliseconds{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_duration_milliseconds{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_request_size_bytes (18 variants)
View diff sample
-rpc_server_request_size_bytes{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_request_size_bytes{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_request_size_bytes{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_request_size_bytes{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_request_size_bytes{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_request_size_bytes{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_request_size_bytes{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_requests_per_rpc (18 variants)
View diff sample
-rpc_server_requests_per_rpc{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_requests_per_rpc{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_requests_per_rpc{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_requests_per_rpc{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_requests_per_rpc{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_requests_per_rpc{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_requests_per_rpc{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_response_size_bytes (18 variants)
View diff sample
-rpc_server_response_size_bytes{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_response_size_bytes{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_response_size_bytes{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_response_size_bytes{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_response_size_bytes{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_response_size_bytes{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_response_size_bytes{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_responses_per_rpc (18 variants)
View diff sample
-rpc_server_responses_per_rpc{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_responses_per_rpc{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_responses_per_rpc{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_responses_per_rpc{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_responses_per_rpc{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_responses_per_rpc{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
-rpc_server_responses_per_rpc{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.37.0",otel_scope_version="0.64.0",rpc_grpc_status_code="0",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
{"spans", sql.TruncateSpans},
{"services", sql.TruncateServices},
{"operations", sql.TruncateOperations},
{"trace_id_timestamps", sql.TruncateTraceIDTimestamps},
Collaborator Author

Drive-by change.

@mahadzaryab1 mahadzaryab1 changed the title from "[WIP] Implement FindTraces for ClickHouse Storage" to "[clickhouse] Implement FindTraces for ClickHouse Storage" Dec 28, 2025
@mahadzaryab1 mahadzaryab1 marked this pull request as ready for review December 28, 2025 15:33
@mahadzaryab1 mahadzaryab1 requested a review from a team as a code owner December 28, 2025 15:33
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Comment thread internal/storage/v2/clickhouse/sql/queries.go Outdated
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Comment thread cmd/jaeger/internal/integration/clickhouse_test.go
Comment thread internal/storage/integration/fixtures/traces/multi_index_trace.json Outdated
Comment thread internal/storage/v2/clickhouse/tracestore/reader.go Outdated
Comment thread internal/storage/v2/clickhouse/sql/queries.go
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@yurishkuro yurishkuro merged commit c5ff269 into jaegertracing:main Dec 31, 2025
59 checks passed
@jixiuf

jixiuf commented Jan 1, 2026

I tried this, but the performance was quite poor, almost unusable.
Perhaps we could consider changing the SQL to something like the following:

SELECT DISTINCT
    l.trace_id,
    t.start,
    t.end
FROM (
    SELECT DISTINCT s.trace_id
    FROM spans s
    WHERE 1=1
    AND s.service_name = 'myservicename'
    AND s.name = 'an operation name'
    AND s.start_time >= toDateTime64('2026-01-01 01:18:00.000000000', 9)
    AND s.start_time <= toDateTime64('2026-01-01 01:19:00.000000000', 9)
    LIMIT 20
) l
LEFT JOIN (   -- or just use INNER JOIN
    SELECT trace_id, start, end
    FROM trace_id_timestamps
    WHERE 1=1
    AND start >= toDateTime64('2026-01-01 01:18:00.000000000', 9)
    AND end <= toDateTime64('2026-01-01 01:19:00.000000000', 9)
) t
ON l.trace_id = t.trace_id

Would this adjustment be worth trying?

@mahadzaryab1
Collaborator Author

@jixiuf Thanks for reaching out! Are you able to share how to reproduce the slowness that you're seeing? I've opened #7814 as a dedicated effort to benchmark the queries and address any bottlenecks.

@mahadzaryab1 mahadzaryab1 deleted the find-traces branch January 1, 2026 17:14
Labels

area/storage, changelog:experimental (Change to an experimental part of the code), enhancement

3 participants