Skip to content

Conversation

@james-ryans
Copy link
Contributor

@james-ryans james-ryans commented Apr 3, 2024

Which problem is this PR solving?

Description of the changes

  • Utilizing existing StorageIntegration to test the jaeger-v2 OTel Collector and gRPC storage backend with the provided config file at cmd/jaeger/grpc_config.yaml.
  • Creates an abstraction for e2e test mode that initializes the collector, span writer to the receiver, and span reader from jaeger_query.

How was this change tested?

  • Run STORAGE=grpc SPAN_STORAGE_TYPE=memory make jaeger-v2-storage-integration-test

Checklist

@james-ryans james-ryans requested a review from a team as a code owner April 3, 2024 07:15
@james-ryans james-ryans requested a review from albertteoh April 3, 2024 07:15
@james-ryans
Copy link
Contributor Author

@yurishkuro There's an issue that translating Jaeger proto to OTLP format changes the tags attributes BINARY type into string at open-telemetry/opentelemetry-collector-contrib/pkg/translator/jaeger/jaegerproto_to_traces.go#L231-L248 which causes the integration test failed. The remaining tests failed below are all caused by this problem. How should we solve this?

...
=== RUN   TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Tags
    integration.go:139: Waiting for storage backend to update documents, iteration 1 out of 100
    integration.go:330: FindTraces: expected: 1, actual: 0
    integration.go:139: Waiting for storage backend to update documents, iteration 2 out of 100
    trace_compare.go:38: Expected and actual differ: [0].Spans[0].Tags[0].VType: 4 != 0
    trace_compare.go:38: Expected and actual differ: [0].Spans[0].Tags[0].VStr: "" != "AAAwOQ=="
    trace_compare.go:38: Expected and actual differ: [0].Spans[0].Tags[0].VBinary: []uint8[4] != []uint8[0]
    trace_compare.go:44: Actual traces: [{"spans":[{"trace_id":"AAAAAAAAAAAAAAAAAAAAAQ==","span_id":"AAAAAAAAAAI=","operation_name":"some-operation","references":null,"flags":0,"start_time":"2024-04-02T16:46:31.639875Z","duration":7000,"tags":[{"key":"blob","v_str":"AAAwOQ=="},{"key":"sameplacetag1","v_str":"sameplacevalue"},{"key":"sameplacetag2","v_type":2,"v_int64":123},{"key":"sameplacetag3","v_type":3,"v_float64":72.5},{"key":"sameplacetag4","v_type":1,"v_bool":true}],"logs":null,"process":{"service_name":"query01-service","tags":null}}],"process_map":null}]
    trace_compare.go:45: Expected traces: [{"spans":[{"trace_id":"AAAAAAAAAAAAAAAAAAAAAQ==","span_id":"AAAAAAAAAAI=","operation_name":"some-operation","references":[],"flags":0,"start_time":"2024-04-02T16:46:31.639875Z","duration":7000,"tags":[{"key":"blob","v_type":4,"v_binary":"AAAwOQ=="},{"key":"sameplacetag1","v_str":"sameplacevalue"},{"key":"sameplacetag2","v_type":2,"v_int64":123},{"key":"sameplacetag3","v_type":3,"v_float64":72.5},{"key":"sameplacetag4","v_type":1,"v_bool":true}],"logs":[],"process":{"service_name":"query01-service","tags":[]}}],"process_map":null}]
...
--- ❌ FAIL: TestGRPCStorage (35.19s)
    --- ✅ PASS: TestGRPCStorage/GetServices (1.04s)
    --- ✅ PASS: TestGRPCStorage/GetOperations (1.04s)
    --- ✅ PASS: TestGRPCStorage/GetTrace (1.04s)
        --- ✅ PASS: TestGRPCStorage/GetTrace/NotFound_error (0.00s)
    --- ✅ PASS: TestGRPCStorage/GetLargeSpans (29.91s)
    --- ❌ FAIL: TestGRPCStorage/FindTraces (1.15s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Tags (1.01s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Logs (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_in_one_spot_-_Process (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Tags_in_different_spots (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Trace_spans_over_multiple_indices (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Operation_name (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Operation_name_+_max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Operation_name_+_Duration_range (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Duration_range (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/default (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Operation_name (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Operation_name_+_max_Duration (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Operation_name_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Tags_+_max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_max_Duration (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_Duration_range (0.00s)
        --- ❌ FAIL: TestGRPCStorage/FindTraces/Multi-spot_Tags_+_max_Duration (0.00s)
        --- ✅ PASS: TestGRPCStorage/FindTraces/Multiple_Traces (0.00s)
❌ FAIL
coverage: 17.6% of statements in ./...
❌ FAIL	github.com/jaegertracing/jaeger/cmd/jaeger/internal/integration/e2e	36.236s
❌ FAIL

@codecov
Copy link

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 87.43169% with 23 lines in your changes are missing coverage. Please review.

Project coverage is 95.17%. Comparing base (6a26005) to head (d88a6a5).

Files Patch % Lines
cmd/jaeger/internal/integration/span_reader.go 80.68% 10 Missing and 7 partials ⚠️
cmd/jaeger/internal/integration/span_writer.go 79.31% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5322      +/-   ##
==========================================
- Coverage   95.25%   95.17%   -0.09%     
==========================================
  Files         340      343       +3     
  Lines       16677    16779     +102     
==========================================
+ Hits        15886    15969      +83     
- Misses        601      610       +9     
- Partials      190      200      +10     
Flag Coverage Δ
badger 10.59% <20.83%> (-0.59%) ⬇️
cassandra-3.x 18.52% <20.83%> (-1.06%) ⬇️
cassandra-4.x 18.52% <20.83%> (-1.06%) ⬇️
elasticsearch-5.x 21.00% <20.83%> (-1.20%) ⬇️
elasticsearch-6.x 20.98% <20.83%> (-1.21%) ⬇️
elasticsearch-7.x 21.04% <20.83%> (-1.22%) ⬇️
elasticsearch-8.x 21.24% <20.83%> (-1.20%) ⬇️
grpc 14.58% <87.43%> (+3.00%) ⬆️
kafka 10.21% <8.33%> (-0.59%) ⬇️
opensearch-1.x 21.09% <20.83%> (-1.22%) ⬇️
opensearch-2.x 21.10% <20.83%> (-1.21%) ⬇️
unittests 91.73% <5.46%> (-0.57%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

)

type GRPCServer struct {
errChan chan error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed? If it is, I would prefer to use a callback function, not a channel, because channels like this tend to cause deadlocks when nothing is reading them, while the callback achieves the same result of communicating the status without the risk of blocking (and the implementation of the callback can still use channel if it wants to, but that channel will be more local and less prone to deadlocks)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I have updated this to pass the error through the struct field

require.NoError(t, err)
require.NoError(t, s.factory.Close())
if s.server != nil {
require.NoError(t, s.server.Close())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to release the server here (s.server = nil)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. actually the server (GRPCServer) here is a declaration if the current test suite uses remote storage or a plugin. If it uses remote storage then it closes and starts the server again.

But the server inside GPRCServer struct is the gRPC server that handles requests, and I'll add to release the server there.

return err
}

func (r *spanReader) GetTrace(ctx context.Context, traceID model.TraceID) (*model.Trace, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit sad that you had to implement all of that. This is a takeway for V2 storage - perhaps it's possible to have remote storage API to be the same as query service API. Then we could've used plugin/storage/grpc/shared/client

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll create a new issue for this task.

"github.com/jaegertracing/jaeger/ports"
)

type GRPCServer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the business function of this class? Calling it grpc server doesn't tell me anything. Is it not RemoteMemoryStorage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what is the difference between this server and cmd/remote-storage/app/server.go? Could we reuse that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually yes, we could reuse cmd/remote-storage/app/server.go. This server is just an extraction of the gRPCServer from existing grpc_test.go so I can reuse this code for the new e2e grpc_test, but I'll change this to reuse the remote-storage server.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

cc, err := grpc.DialContext(ctx, ports.PortToHostPort(ports.QueryGRPC), opts...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call NewClient, dial was deprecated

}
}

func (r *spanReader) Start() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest merging this into create function. What's the benefit of separating Start?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I've seen a lot of structs that separate the creation and start functions so I thought it was cleaner that way. I'll refactor this.

@yurishkuro
Copy link
Member

@james-ryans another possible way to unblock from waiting on OTEL contrib is to tweak the tests to not use binary tags (controlled by a flag, with a new issue to fix it once OTEL fix is released)


// TODO: remove this after upstream issue in OTEL jaeger translator is fixed
// Skip testing trace binary tags, logs, and process
SkipBinaryAttrs bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please book a ticket referring to this commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Booked it at #5341


func (s *RemoteMemoryStorage) Close() error {
if err := s.server.Close(); err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this potentially leaves storageFactory not closed. The pattern we use for multi-close is

	return errors.Join(
		b.reporter.Close(),
		b.tlsCloser.Close(),
		b.GetConn().Close(),
	)

tm := tenancy.NewManager(&opts.Tenancy)
storageFactory, err := storage.NewFactory(storage.FactoryConfigFromEnvAndCLI(os.Args, os.Stderr))
if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about taking t.Testing arg and just calling require.NoError? This will eliminate the uncovered lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the easiest approach. Will do


func TestIndexCleaner_doNotFailOnFullStorage(t *testing.T) {
skipUnlessEnv(t, "elasticsearch", "opensearch")
SkipUnlessEnv(t, "elasticsearch", "opensearch")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this little change is a good candidate for a separate PR. It would significantly reduce the number of files in this PR, making it more focused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I'll do that

@yurishkuro yurishkuro merged commit deea97b into jaegertracing:main Apr 9, 2024
yurishkuro pushed a commit that referenced this pull request May 11, 2024
## Which problem is this PR solving?
- PR #5322 temporarily added a SkipBinaryAttrs flag to avoid checking
span's tags with a binary type since the OTEL Jaeger translator has a
bug that converts binary tags into string tags. Since it has been fixed,
we will delete this flag.

## Description of the changes
- Delete the SkipBinaryAttrs flag from StorageIntegration.

## How was this change tested?
- Tested locally by running all the e2e storage tests. e.g.
`STORAGE=grpc SPAN_STORAGE_TYPE=memory make
jaeger-v2-storage-integration-test`

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

Signed-off-by: James Ryans <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog:test Change that's adding missing tests or correcting existing tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants