Unify zipkin v1 and v2 annotation/tag parsing logic #1002

chris-smith-zocdoc · 2020-05-20T19:34:54Z

Fixes #975

Please look at the individual commits, most of this is just moving code around.

Moves zipkin v2 trace conversion code into translator/trace/zipkin, previously it was in the receiver
Use the same tag parsing logic for both zipkin v1 and v2

codecov · 2020-05-20T20:57:23Z

Codecov Report

Merging #1002 into master will increase coverage by 0.62%.
The diff coverage is 95.60%.

@@            Coverage Diff             @@
##           master    #1002      +/-   ##
==========================================
+ Coverage   86.46%   87.09%   +0.62%     
==========================================
  Files         198      199       +1     
  Lines       14144    14151       +7     
==========================================
+ Hits        12230    12325      +95     
+ Misses       1463     1384      -79     
+ Partials      451      442       -9

Impacted Files	Coverage Δ
translator/trace/zipkin/status_code.go	`97.67% <93.33%> (-2.33%)`	⬇️
translator/trace/zipkin/zipkinv2_to_protospan.go	`95.65% <95.65%> (ø)`
receiver/zipkinreceiver/trace_receiver.go	`89.47% <100.00%> (+19.17%)`	⬆️
translator/trace/zipkin/zipkinv1_to_protospan.go	`93.91% <100.00%> (+9.94%)`	⬆️
translator/internaldata/resource_to_oc.go	`73.52% <0.00%> (-2.95%)`	⬇️


translator/trace/zipkin/status_code.go

dmitryax · 2020-05-20T22:39:55Z

translator/trace/zipkin/status_code.go

 	return nil
 }
+
+const statusCodeUnknown = 2


Looks like this value is not going to be used anymore.

It means that if we get an invalid value in error attribute that is out of the scope of canonicalCodesMap, we don't use that to identify an error in the span anymore. I'm not sure this is what we need. Why do you think we should change the behavior this way?

And since we have types conversion now, I would consider setting the UnknownError status for true boolean value.

AFAIK, another case where this value was set in the zipkinV2 translation is when the opencensus.status_description attribute value is provided but the error attribute is empty. After this change we will be setting SpanStatus = 0 instead of 2 for those spans. That is an acceptable change, I think.

@bogdandrutu @tigrannajaryan thoughts?

It means that if we get an invalid value in error attribute that is out of the scope of canonicalCodesMap, we don't use that to identify an error in the span anymore. I'm not sure this is what we need. Why do you think we should change the behavior this way?

I pushed a commit to add this back, its not super clean, I'm open to suggestions on how to improve it.

And since we have types conversion now, I would consider setting the UnknownError status for true boolean value

For the error attribute? Which piece of the mapper should I set that on? fromCensus/fromStatus/fromHTTP How should that interact with the presence of the other attributes? I've added some test cases, perhaps listing a few ones to add would help clarify for me

AFAIK, another case where this value was set in the zipkinV2 translation is when the opencensus.status_description attribute value is provided but the error attribute is empty. After this change we will be setting SpanStatus = 0 instead of 2 for those spans. That is an acceptable change, I think.

I actually undid this when I added support for unknown back. Let me know what you think the best path forward is

For the error attribute? Which piece of the mapper should I set that on? fromCensus/fromStatus/fromHTTP How should that interact with the presence of the other attributes? I've added some test cases, perhaps listing a few ones to add would help clarify for me

Probably by adding another one, like fromErrorTag. We should set the 2 (Unknown) error code if the "error" tag has a value (like "true" or anything else that is not an actual error code) and the other sources (fromCensus/fromStatus/fromHTTP) don't provide any details to infer an error code.

I actually undid this when I added support for unknown back. Let me know what you think the best path forward is

The ZipkinV1 receiver was defaulting to code 0 if there was a message but no error code was identified. I think we should use that approach in both the v1 and v2 receivers. Setting the Unknown error code based only on a message doesn't sound right to me.

I've rebased and added a commit for these changes
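
The fromErrorTag idea discussed above can be illustrated with a minimal Go sketch. It is not the collector's implementation: the `canonicalCodes` map is an abbreviated, hypothetical stand-in for canonicalCodesMap (the numeric values follow the canonical gRPC status codes), and `codeFromErrorTag` is an illustrative name. The sketch only shows the intended split: an "error" tag that names a known code yields that code, while any other value (such as "true") falls back to Unknown (2).

```go
package main

import "fmt"

// canonicalCodes is a hypothetical, abbreviated stand-in for canonicalCodesMap.
// The numeric values follow the canonical gRPC status codes.
var canonicalCodes = map[string]int32{
	"OK":                0,
	"CANCELLED":         1,
	"UNKNOWN":           2,
	"INVALID_ARGUMENT":  3,
	"DEADLINE_EXCEEDED": 4,
	"NOT_FOUND":         5,
}

// statusCodeUnknown matches the constant shown in the diff above.
const statusCodeUnknown int32 = 2

// codeFromErrorTag (illustrative name) inspects the value of the "error" tag.
// If the value names a known canonical code, that code is returned with
// recognized=true. Any other value, such as "true", yields Unknown with
// recognized=false so callers can treat it as a lower-priority signal.
func codeFromErrorTag(value string) (code int32, recognized bool) {
	if c, ok := canonicalCodes[value]; ok {
		return c, true
	}
	return statusCodeUnknown, false
}

func main() {
	for _, v := range []string{"NOT_FOUND", "true", "something broke"} {
		code, ok := codeFromErrorTag(v)
		fmt.Printf("error=%q -> code=%d recognized=%t\n", v, code, ok)
	}
}
```

Returning the `recognized` flag separately keeps the decision about priority (for example, whether HTTP status should win) out of the parsing step.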

bogdandrutu · 2020-05-20T23:53:08Z

Please sign the CLA

chris-smith-zocdoc · 2020-05-22T15:29:30Z

I signed it

bogdandrutu · 2020-06-02T21:37:30Z

Please rebase and try to add tests :)

chris-smith-zocdoc · 2020-06-02T23:39:41Z

Please rebase and try to add tests :)

Added some tests for the status mapper, there are still some open questions regarding intended behavior I think (see #1002 (comment) ) Let me know if there are other areas I should add tests for

I just pushed another commit, I'll wait to rebase until after dmitryax gets a chance to look at it

chris-smith-zocdoc · 2020-06-02T23:43:38Z

Actually I see dmitryax made some conflicting changes, merging for now

dmitryax · 2020-06-03T08:04:11Z

@chris-smith-zocdoc could you check the failing test please?

chris-smith-zocdoc · 2020-06-03T13:21:15Z

The failing test is also failing on the master branch right now, so I don't think it's related to my change

--- FAIL: TestBatchProcessorTraceSendWhenClosing (5.00s)
    batch_processor_test.go:189: failed to wait for sender timed out waiting for spans
    batch_processor_test.go:193: 
        	Error Trace:	batch_processor_test.go:193
        	Error:      	Not equal: 
        	            	expected: 100
        	            	actual  : 90
        	Test:       	TestBatchProcessorTraceSendWhenClosing
FAIL
FAIL	go.opentelemetry.io/collector/processor/batchprocessor	9.381s

tigrannajaryan · 2020-06-03T14:17:52Z

The failing test is also failing on the master branch right now, so I don't think it's related to my change

--- FAIL: TestBatchProcessorTraceSendWhenClosing (5.00s)
    batch_processor_test.go:189: failed to wait for sender timed out waiting for spans
    batch_processor_test.go:193: 
        	Error Trace:	batch_processor_test.go:193
        	Error:      	Not equal: 
        	            	expected: 100
        	            	actual  : 90
        	Test:       	TestBatchProcessorTraceSendWhenClosing
FAIL
FAIL	go.opentelemetry.io/collector/processor/batchprocessor	9.381s

Filed a bug #1069
We will disable the test if it cannot be fixed quickly.

tigrannajaryan · 2020-06-03T18:21:23Z

@chris-smith-zocdoc please rebase, tests should pass now.

use same tag parsing logic for both zipkin v1 and v2

oc description without error should be OK add more test cases for type conversions on error tag

chris-smith-zocdoc · 2020-06-04T16:03:29Z

translator/trace/zipkin/status_code.go

 		s = m.fromHTTP
 	}

+	// If no codePtr was provided, fallback to the first source with a message


applying this default caused this test to fail

opentelemetry-collector/translator/trace/zipkin/zipkinv1_thrift_to_protospan_test.go

Lines 166 to 175 in 665d64e

	// only status.message tag
	{
		haveTags: []*zipkincore.BinaryAnnotation{{
			Key:            "status.message",
			Value:          []byte("Forbidden"),
			AnnotationType: zipkincore.AnnotationType_STRING,
		}},
		wantAttributes: nil,
		wantStatus:     nil,
	},

attributes: { "status.message": "Forbidden", } expects tracepb.Status to be nil

The jaeger behavior matches the previous zipkin behavior https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/trace/jaeger/jaegerthrift_to_protospan.go#L262-L265

Do you think I should update the tests or modify the behavior?

Please change the behavior. We should not set any error status based on "status.message" only

Does this only apply to that specific open census attribute? I ask because the existing code will return a tracepb.Status when either a code or message is set if s.codePtr != nil || s.message != "" . Because of the way it is currently implemented, this would only ever apply to fromHTTP though.

On master, the current behavior is

given attributes: { "http.status_message": "something" } returns tracepb.Status{ Code: 0, Message: "something", }

You are right. We should not set an error status code based only on an error message attribute such as http.status_message or status.message (hopefully I'm not missing any other message attribute)

@chris-smith-zocdoc could you apply this approach ^ and make sure tests pass?
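
The approach agreed on above (no status derived from a message alone) can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `Status` is a hypothetical stand-in for tracepb.Status, `source` mirrors the codePtr/message pair mentioned in the thread, and `toStatus` is an illustrative name.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for tracepb.Status.
type Status struct {
	Code    int32
	Message string
}

// source mirrors the code/message pair discussed in the thread
// (for example fromHTTP or fromStatus).
type source struct {
	codePtr *int32
	message string
}

// toStatus (illustrative name) returns nil when no code was identified,
// even if a message such as "Forbidden" is present. The earlier behavior
// described in the thread instead returned a status whenever either the
// code or the message was set.
func toStatus(s source) *Status {
	if s.codePtr == nil {
		return nil
	}
	return &Status{Code: *s.codePtr, Message: s.message}
}

func main() {
	// Message only: no status is produced.
	fmt.Println(toStatus(source{message: "Forbidden"}) == nil)

	// Code and message: both are carried over.
	code := int32(7)
	fmt.Printf("%+v\n", *toStatus(source{codePtr: &code, message: "Forbidden"}))
}
```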

dmitryax · 2020-06-05T17:07:06Z

translator/trace/zipkin/status_code.go

+			s = m.fromHTTP
+		}
+
+		s.codePtr = m.fromErrorTag.codePtr


I think this should be guarded by if (s.codePtr == nil) so we don't override a code that was already set

I'll add it to make it more clear, but this would always be true currently

I don't understand why this would always be true. Any of the assignments above can make it non-nil, no?

From the first switch we know that
m.fromCensus.codePtr == nil
m.fromStatus.codePtr == nil

and line 62 gives
m.fromHTTP.codePtr == nil

So we've verified all three are nil

Sorry, I still don't get it. s can be set in any of the cases above, so any of them can set s.codePtr to something other than nil, right? If we don't have the if (s.codePtr == nil) condition here, we always override s.codePtr with m.fromErrorTag.codePtr

translator/trace/zipkin/status_code.go

check for nil before overwriting

linux-foundation-easycla · 2020-06-05T18:27:08Z

The committers are authorized under a signed CLA.

✅ Chris Smith (2080cdc, 45c7df8, 5acb4a0, bbd9ad3)

chris-smith-zocdoc · 2020-06-05T18:31:50Z

Did something change with the CLA process? Not sure why it's complaining about that last commit

chris-smith-zocdoc · 2020-06-09T17:42:41Z

translator/trace/zipkin/status_code_test.go

+			name:     "error only: false",
+			expected: &tracepb.Status{Code: 2},
+			attributes: map[string]string{
+				"error": "false",


This case is still weird to me, I looked through our data and we do have spans with this being reported 🙄

How do you feel about treating a boolean attribute with a value false as OK?

What instrumentation on your end is producing this kind of span? It looks like the OpenTracing convention, but it should not be reported by zipkin instrumentation. If the instrumentation and spans are valid, we should probably handle this edge case and map it to the OK code.

I looked into this more, I don't think we need to make any changes.

Originally I was looking at our exported data in honeycomb where this was accidentally reported as a boolean instead of a string, causing some type conversion problems.


chris-smith-zocdoc · 2020-06-10T17:00:10Z

translator/trace/zipkin/zipkinv1_to_protospan.go

-							},
-						},
-					},
+					Description: &tracepb.TruncatableString{Value: currAnnotation.Value},


@dmitryax found another difference with the v1 parsing on Time Events, converted it to the v2 behavior

opentelemetry-collector/receiver/zipkinreceiver/trace_receiver.go

Lines 562 to 574 in 8aa2731

	func zipkinAnnotationToProtoAnnotation(zas zipkinmodel.Annotation) *tracepb.Span_TimeEvent {
		if zas == blankAnnotation {
			return nil
		}
		return &tracepb.Span_TimeEvent{
			Time: internal.TimeToTimestamp(zas.Timestamp),
			Value: &tracepb.Span_TimeEvent_Annotation_{
				Annotation: &tracepb.Span_TimeEvent_Annotation{
					Description: &tracepb.TruncatableString{Value: zas.Value},
				},
			},
		}
	}

without this change, the exporter would emit time events without a name

Will it still work as expected for v1 spans?

Previously it didn't work, it should now

The time event (annotation in zipkin terms) looks like this

annotations: [ { "timestamp": 1591814867, "value": "my custom annotation", "endpoint": { "serviceName": "my-service" } } ]

Previously this was being converted into this proto structure

time_events: [ { time: 1591814867, attributes: { "my custom annotation": "my-service" } } ]

Now it is

time_events: [ { time: 1591814867, description: "my custom annotation" } ]

Ok. looks good
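
The before/after shapes described in this thread can be reproduced with a small Go sketch. The `Annotation` and `TimeEvent` types and both function names are hypothetical simplifications of the zipkin and tracepb structures, introduced only to contrast the two conversions; the real conversion works on the protobuf types shown earlier.

```go
package main

import "fmt"

// Annotation is a hypothetical, simplified zipkin annotation.
type Annotation struct {
	Timestamp   int64
	Value       string
	ServiceName string
}

// TimeEvent is a hypothetical, simplified stand-in for tracepb.Span_TimeEvent.
type TimeEvent struct {
	Time        int64
	Description string
	Attributes  map[string]string
}

// oldV1Conversion mimics the previous v1 behavior described in the thread:
// the annotation value became an attribute key, the service name its value,
// and the time event had no description.
func oldV1Conversion(a Annotation) TimeEvent {
	return TimeEvent{
		Time:       a.Timestamp,
		Attributes: map[string]string{a.Value: a.ServiceName},
	}
}

// unifiedConversion mimics the v2 behavior that v1 now shares:
// the annotation value becomes the time event description.
func unifiedConversion(a Annotation) TimeEvent {
	return TimeEvent{Time: a.Timestamp, Description: a.Value}
}

func main() {
	a := Annotation{
		Timestamp:   1591814867,
		Value:       "my custom annotation",
		ServiceName: "my-service",
	}
	fmt.Printf("before: %+v\n", oldV1Conversion(a))
	fmt.Printf("after:  %+v\n", unifiedConversion(a))
}
```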

dmitryax · 2020-06-11T16:05:45Z

@chris-smith-zocdoc could you please increase the diff coverage to hit 95% requirement?


chris-smith-zocdoc · 2020-06-13T05:21:46Z

Hmm I just added a bunch of missing tests but the coverage target didn't change? Can you help me understand where I'm missing coverage? Haven't used codecov before

Looks like I increased overall coverage but not on the diff?

dmitryax · 2020-06-15T03:30:28Z

Unfortunately, it's caused by moving around parts that were not covered before. You can see what is not covered here: https://codecov.io/gh/open-telemetry/opentelemetry-collector/compare/1ad767e62f3dff6f62f32c7360b6fefe0fbf32ff...0ea765f5586230d49756fc380585341c6333d4a0/diff
For example this piece in translator/trace/zipkin/zipkinv2_to_protospan.go:

Could you see if it's feasible to cover those?

chris-smith-zocdoc · 2020-06-15T20:19:16Z

wooo its green!

dmitryax · 2020-06-17T07:20:00Z

Thanks @chris-smith-zocdoc ! It looks good overall, but I'd like to give another thorough review tomorrow since it's a pretty risky change.

dmitryax

@chris-smith-zocdoc overall it looks good. I just found one last issue which I think we should solve in this PR. Please let me know what you think.

dmitryax · 2020-06-18T07:40:51Z

translator/trace/zipkin/status_code.go

+		}
+		code, set := canonicalCodesMap[canonicalCodeStr]
+		if set {
+			return &code


It looks like we need to break down this function into two:

If this condition holds and we can get a clear error status from the "error" tag via the case *tracepb.AttributeValue_StringValue -> canonicalCodesMap[canonicalCodeStr] check, the result must be treated as higher priority, and we should not use the HTTP status in that case. We should also drop the "error" tag, since it doesn't provide any additional value (return true in fromAttribute).

Otherwise, if there is something in the "error" that we cannot identify, we should set unknown status as lower priority, and use HTTP status instead. Probably we need to use another statusMapper like fromErrorTagUnknown.

I've made this change and was able to simplify this logic a bit. Please let me know what you think

dmitryax · 2020-06-18T08:08:26Z

translator/trace/zipkin/status_code.go

+
+		if s.codePtr == nil {
+			s.codePtr = m.fromErrorTag.codePtr
+		}


As a result of the proposed change, we would have here:

	if s.codePtr == nil {
		switch {
		case m.fromCensus.message != "":
			s = m.fromCensus
		case m.fromStatus.message != "":
			s = m.fromStatus
		case m.fromErrorTag.codePtr != nil:
			s = m.fromErrorTag
		case m.fromHTTP.message != "":
			s = m.fromHTTP
		default:
			s = m.fromErrorTagUnknown
		}
	}
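
The fallback order proposed above can be exercised in isolation with the following sketch. It covers only the branch taken when no earlier source supplied a code, and it is an illustration rather than the merged code: the `status` and `statusMapper` types are reduced to the fields named in the thread, and the `fallback` method name is hypothetical.

```go
package main

import "fmt"

// status holds an optional code and a message, as in the review discussion.
type status struct {
	codePtr *int32
	message string
}

// statusMapper is reduced to the sources named in the thread.
type statusMapper struct {
	fromCensus, fromStatus, fromHTTP, fromErrorTag, fromErrorTagUnknown status
}

// fallback (hypothetical name) applies the proposed order for the case where
// no source selected earlier carried a code: census message, status message,
// a recognized code from the "error" tag, HTTP message, and finally the
// unknown status derived from an unrecognized "error" tag.
func (m statusMapper) fallback() status {
	switch {
	case m.fromCensus.message != "":
		return m.fromCensus
	case m.fromStatus.message != "":
		return m.fromStatus
	case m.fromErrorTag.codePtr != nil:
		return m.fromErrorTag
	case m.fromHTTP.message != "":
		return m.fromHTTP
	default:
		return m.fromErrorTagUnknown
	}
}

func main() {
	notFound, unknown := int32(5), int32(2)
	m := statusMapper{
		fromErrorTag:        status{codePtr: &notFound},
		fromHTTP:            status{message: "Not Found"},
		fromErrorTagUnknown: status{codePtr: &unknown},
	}
	s := m.fallback()
	fmt.Println("selected code:", *s.codePtr)
}
```

Keeping the recognized and unrecognized "error" tag results in separate fields lets the switch place them at different priorities relative to the HTTP source.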

dmitryax

LGTM


chris-smith-zocdoc requested review from bogdandrutu, dmitryax, flands, owais, pjanotti, rghetia, songy23 and tigrannajaryan as code owners May 20, 2020 19:34

chris-smith-zocdoc force-pushed the cs_975 branch 2 times, most recently from ad533e1 to d0c5225 Compare May 20, 2020 20:46

dmitryax reviewed May 20, 2020


tigrannajaryan assigned dmitryax May 21, 2020

chris-smith-zocdoc mentioned this pull request Jun 3, 2020

Upstream Sync Zocdoc/opentelemetry-collector#1

Merged

chris-smith-zocdoc added 2 commits June 4, 2020 10:06

Move zipkin v2 trace conversion code into translator/trace/zipkin

2080cdc

use same tag parsing logic for both zipkin v1 and v2

fallback to error tag if set

45c7df8

oc description without error should be OK add more test cases for type conversions on error tag

chris-smith-zocdoc force-pushed the cs_975 branch from 7670027 to 45c7df8 Compare June 4, 2020 14:37

chris-smith-zocdoc commented Jun 4, 2020


dmitryax reviewed Jun 5, 2020


return nil instead of unknown

5acb4a0

check for nil before overwriting

return nil ocStatus when only message is provided

bbd9ad3

chris-smith-zocdoc commented Jun 9, 2020


zipkin v1: The time event should use the Description for the zipkin v…

2cdac1b

…alue

chris-smith-zocdoc commented Jun 10, 2020


chris-smith-zocdoc added 4 commits June 12, 2020 10:01

Merge branch 'upstream' into cs_975

937621f

Merge branch 'upstream' into cs_975

56a079d

# Conflicts: # receiver/zipkinreceiver/trace_receiver.go # receiver/zipkinreceiver/trace_receiver_test.go

add missing test case

945fe81

increase test coverage on the zipkin receiver

0ea765f

chris-smith-zocdoc force-pushed the cs_975 branch from 33b9944 to 0ea765f Compare June 13, 2020 05:07

add some more coverage

1470dca

minor coverage improvement for zipkin v1

8d07b93

dmitryax reviewed Jun 18, 2020


Prioritize valid oc status code in error over http

d08da47

chris-smith-zocdoc force-pushed the cs_975 branch from ea5ba4f to d08da47 Compare June 18, 2020 14:59

dmitryax approved these changes Jun 18, 2020


tigrannajaryan merged commit 969a8d4 into open-telemetry:master Jun 23, 2020

chris-smith-zocdoc deleted the cs_975 branch June 27, 2020 03:03

chris-smith-zocdoc mentioned this pull request Aug 5, 2020

Convert Zipkin receiver and exporter to use OTLP and fix translation bugs #1446

Merged

Troels51 pushed a commit to Troels51/opentelemetry-collector that referenced this pull request Jul 5, 2024

Improve consistence of spelling in README (open-telemetry#1002)

21a441e

