Skip to content

Conversation

@chris-smith-zocdoc
Copy link
Contributor

Fixes #975

Please look at the individual commits, most of this is just moving code around.

Moves zipkin v2 trace conversion code into translator/trace/zipkin, previously it was in the receiver
Use the same tag parsing logic for both zipkin v1 and v2

@codecov
Copy link

codecov bot commented May 20, 2020

Codecov Report

Merging #1002 into master will increase coverage by 0.62%.
The diff coverage is 95.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1002      +/-   ##
==========================================
+ Coverage   86.46%   87.09%   +0.62%     
==========================================
  Files         198      199       +1     
  Lines       14144    14151       +7     
==========================================
+ Hits        12230    12325      +95     
+ Misses       1463     1384      -79     
+ Partials      451      442       -9     
Impacted Files Coverage Δ
translator/trace/zipkin/status_code.go 97.67% <93.33%> (-2.33%) ⬇️
translator/trace/zipkin/zipkinv2_to_protospan.go 95.65% <95.65%> (ø)
receiver/zipkinreceiver/trace_receiver.go 89.47% <100.00%> (+19.17%) ⬆️
translator/trace/zipkin/zipkinv1_to_protospan.go 93.91% <100.00%> (+9.94%) ⬆️
translator/internaldata/resource_to_oc.go 73.52% <0.00%> (-2.95%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1ad767e...d08da47. Read the comment docs.

return nil
}

const statusCodeUnknown = 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this values is not going to be being used anymore.

It means that if we get an invalid value in error attribute that is out of the scope of canonicalCodesMap, we don't use that to identify an error in the span anymore. I'm not sure this is what we need. Why do you think we should change the behavior this way?

And since we have types conversion now, I would consider setting the UnknownError status for true boolean value.

IFAIK another case when this value was set in zipkinV2 translation is when opencensus.status_description attribute value is provided but error attribute is empty. After this change we will be setting SpanStatus = 0 instead of 2 for those spans. That is acceptable change I think.

@bogdandrutu @tigrannajaryan thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that if we get an invalid value in error attribute that is out of the scope of canonicalCodesMap, we don't use that to identify an error in the span anymore. I'm not sure this is what we need. Why do you think we should change the behavior this way?

I pushed a commit to add this back, its not super clean, I'm open to suggestions on how to improve it.

And since we have types conversion now, I would consider setting the UnknownError status for true boolean value

For the error attribute? Which piece of the mapper should I set that on? fromCensus/fromStatus/fromHTTP How should that interact with the presence of the other attributes? I've added some test cases, perhaps listing a few ones to add would help clarify for me

IFAIK another case when this value was set in zipkinV2 translation is when opencensus.status_description attribute value is provided but error attribute is empty. After this change we will be setting SpanStatus = 0 instead of 2 for those spans. That is acceptable change I think.

I actually undid this when I added support for unknown back. Let me know what you think the best path forward is

Copy link
Member

@dmitryax dmitryax Jun 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the error attribute? Which piece of the mapper should I set that on? fromCensus/fromStatus/fromHTTP How should that interact with the presence of the other attributes? I've added some test cases, perhaps listing a few ones to add would help clarify for me

Probably to add another one like fromErrorTag. We should set 2 (Unknown) error code if "error" tag has a value (like "true" or anything else that is not an actual error code), and others sources (fromCensus/fromStatus/fromHTTP) doesn't provide any details to infer an error code.

I actually undid this when I added support for unknown back. Let me know what you think the best path forward is

ZipkinV1 receiver was defaulting to 0 code if there is message but not error code identified. I think we should use that approach in both v1 and v2 receivers. Setting Unknown error code only based on a message doesn't sound right to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rebased and added a commit for these changes

@bogdandrutu
Copy link
Member

Please sign the CLA

@chris-smith-zocdoc
Copy link
Contributor Author

I signed it

@bogdandrutu
Copy link
Member

Please rebase and try to add tests :)

@chris-smith-zocdoc
Copy link
Contributor Author

Please rebase and try to add tests :)

Added some tests for the status mapper, there are still some open questions regarding intended behavior I think (see #1002 (comment) ) Let me know if there are other areas I should add tests for

I just pushed another commit, I'll wait to rebase until after dmitryax gets a chance to look at it

@chris-smith-zocdoc
Copy link
Contributor Author

chris-smith-zocdoc commented Jun 2, 2020

Actually I see dmitryax made some conflicting changes, merging for now

@dmitryax
Copy link
Member

dmitryax commented Jun 3, 2020

@chris-smith-zocdoc could you check the failing test please?

@chris-smith-zocdoc
Copy link
Contributor Author

The failing test is also failing on the master branch right now, so I don't think its related to my change

--- FAIL: TestBatchProcessorTraceSendWhenClosing (5.00s)
    batch_processor_test.go:189: failed to wait for sender timed out waiting for spans
    batch_processor_test.go:193: 
        	Error Trace:	batch_processor_test.go:193
        	Error:      	Not equal: 
        	            	expected: 100
        	            	actual  : 90
        	Test:       	TestBatchProcessorTraceSendWhenClosing
FAIL
FAIL	go.opentelemetry.io/collector/processor/batchprocessor	9.381s

@tigrannajaryan
Copy link
Member

The failing test is also failing on the master branch right now, so I don't think its related to my change

--- FAIL: TestBatchProcessorTraceSendWhenClosing (5.00s)
    batch_processor_test.go:189: failed to wait for sender timed out waiting for spans
    batch_processor_test.go:193: 
        	Error Trace:	batch_processor_test.go:193
        	Error:      	Not equal: 
        	            	expected: 100
        	            	actual  : 90
        	Test:       	TestBatchProcessorTraceSendWhenClosing
FAIL
FAIL	go.opentelemetry.io/collector/processor/batchprocessor	9.381s

Filed a bug #1069
We will disable the test if it cannot be fixed quickly.

@tigrannajaryan
Copy link
Member

@chris-smith-zocdoc please rebase, tests should pass now.

use same tag parsing logic for both zipkin v1 and v2
oc description without error should be OK
add more test cases for type conversions on error tag
s = m.fromHTTP
}

// If no codePtr was provided, fallback to the first source with a message
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applying this default caused this test to fail

// only status.message tag
{
haveTags: []*zipkincore.BinaryAnnotation{{
Key: "status.message",
Value: []byte("Forbidden"),
AnnotationType: zipkincore.AnnotationType_STRING,
}},
wantAttributes: nil,
wantStatus: nil,
},

attributes: {
  "status.message": "Forbidden",
}

expects tracepb.Status  to be nil

The jaeger behavior matches the previous zipkin behavior https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/trace/jaeger/jaegerthrift_to_protospan.go#L262-L265

Do you think I should update the tests or modify the behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change behavior . We should not set any error status based on "status.message" only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this only apply to that specific open census attribute? I ask because the existing code will return a tracepb.Status when either a code or message is set if s.codePtr != nil || s.message != "" . Because of the way it is currently implemented, this would only ever apply to fromHTTP though.

On master, the current behavior is

given

attributes: {
   "http.status_message": "something"
}

returns

tracepb.Status{
    Code: 0,
    Message: "something",
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. we should not set error status code based only on an error message attribute such as http.status_message or status.message (hopefully I don't miss any other message attribute)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chris-smith-zocdoc could you apply this approach ^ and make sure tests pass?

s = m.fromHTTP
}

s.codePtr = m.fromErrorTag.codePtr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be under condition if (s.codePtr == nil) so we don't override code that was already set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it to make it more clear, but this would always be true currently

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this always be true? any of the assignments above can change it not null, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the first switch we know that
m.fromCensus.codePtr == nil
m.fromStatus.codePtr == nil

and line 62 gives
m.fromHTTP.codePtr == nil

So we've verified all three are nil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still don't get it. s can be set in any case above, so any of them can set s.codePtr to be something but not nil, right? If we don't have if (s.codePtr == nil) condition here, we override any s.codePtr with m.fromErrorTag.codePtr all the time

check for nil before overwriting
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 5, 2020

CLA Check
The committers are authorized under a signed CLA.

@chris-smith-zocdoc
Copy link
Contributor Author

Did something change with the CLA process? Not sure why its complaining about that last commit

name: "error only: false",
expected: &tracepb.Status{Code: 2},
attributes: map[string]string{
"error": "false",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is still weird to me, I looked through our data and we do have spans with this being reported 🙄

How do you feel about treating a boolean attribute with a value false as OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the instrumentation on your end producing this kind of spans? It looks like OpenTracing convention, but should not be reported by zipkin instrumentation. If the instrumentation and spans are valid we probably should handle this edge case and make it OK code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this more, I don't think we need to make any changes.

Originally I was looking at our exported data in honeycomb where this was accidentally reported as a boolean instead of a string, causing some type conversion problems.

},
},
},
Description: &tracepb.TruncatableString{Value: currAnnotation.Value},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax found another difference with the v1 parsing on Time Events, converted it to the v2 behavior

func zipkinAnnotationToProtoAnnotation(zas zipkinmodel.Annotation) *tracepb.Span_TimeEvent {
if zas == blankAnnotation {
return nil
}
return &tracepb.Span_TimeEvent{
Time: internal.TimeToTimestamp(zas.Timestamp),
Value: &tracepb.Span_TimeEvent_Annotation_{
Annotation: &tracepb.Span_TimeEvent_Annotation{
Description: &tracepb.TruncatableString{Value: zas.Value},
},
},
}
}

without this change, the exporter would emit time events without a name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it still work as expected for v1 spans?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it didn't work, it should now

The time event (annotation in zipkin terms) looks like this

annotations: 
[ {
    "timestamp": 1591814867,
    "value": "my custom annotation",
    "endpoint": {
       "serviceName": "my-service"
   }
} ]

Previously this was being converted into this proto structure

time_events: 
[ {
    time: 1591814867,
    attributes: {
       "my custom annotation": "my-service"
    }
} ]

Now it is

time_events: 
[ {
    time: 1591814867,
    description: "my custom annotation"
} ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. looks good

@dmitryax
Copy link
Member

@chris-smith-zocdoc could you please increase the diff coverage to hit 95% requirement?

@chris-smith-zocdoc
Copy link
Contributor Author

Hmm I just added a bunch of missing tests but the coverage target didn't change? Can you help me understand where I'm missing coverage? Haven't used codecov before

Looks like I increased overall coverage but not on the diff?
Screen Shot 2020-06-13 at 1 20 45 AM

@dmitryax
Copy link
Member

dmitryax commented Jun 15, 2020

Unf it's caused by moving around parts that were not covered before. You can see what is not covered here: https://codecov.io/gh/open-telemetry/opentelemetry-collector/compare/1ad767e62f3dff6f62f32c7360b6fefe0fbf32ff...0ea765f5586230d49756fc380585341c6333d4a0/diff
For example this piece in translator/trace/zipkin/zipkinv2_to_protospan.go:
Screenshot 2020-06-14 20 27 10
Could you see if it's feasible to cover those?

@chris-smith-zocdoc
Copy link
Contributor Author

wooo its green!

@dmitryax
Copy link
Member

Thanks @chris-smith-zocdoc ! It looks good overall, but I'd like to give another thorough review tomorrow since it's a pretty risky change.

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chris-smith-zocdoc overall it looks good. I just found the last issue which I think we should solve in this PR. Please let me know what do you think.

}
code, set := canonicalCodesMap[canonicalCodeStr]
if set {
return &code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we need to break down this function into two:

If this condition worked out and we can get a clear error status from "error" tag by doing the case * tracepb.AttributeValue_StringValue -> canonicalCodesMap[canonicalCodeStr] check, the result must be considered as higher priority. And we should not use HTTP status in that case. We also should drop the "error" tag, since it doesn't provide any additional value (return true in fromAttribute).

Otherwise, if there is something in the "error" that we cannot identify, we should set unknown status as lower priority, and use HTTP status instead. Probably we need to use another statusMapper like fromErrorTagUnknown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this change and was able to simplify this logic a bit. Please let me know what you think


if s.codePtr == nil {
s.codePtr = m.fromErrorTag.codePtr
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the result of the proposed change we would have here:

	if s.codePtr == nil {
		switch {
		case m.fromCensus.message != "":
			s = m.fromCensus
		case m.fromStatus.message != "":
			s = m.fromStatus
		case m.fromErrorTag.codePtr != nil:
			s = m.fromErrorTag
		case m.fromHTTP.message != "":
			s = m.fromHTTP
		default:
			s = m.fromErrorTagUnknown
		}

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tigrannajaryan tigrannajaryan merged commit 969a8d4 into open-telemetry:master Jun 23, 2020
@chris-smith-zocdoc chris-smith-zocdoc deleted the cs_975 branch June 27, 2020 03:03
wyTrivail pushed a commit to mxiamxia/opentelemetry-collector that referenced this pull request Jul 13, 2020
)

Fixes open-telemetry#975

Please look at the individual commits, most of this is just moving code around.

Moves zipkin v2 trace conversion code into translator/trace/zipkin, previously it was in the receiver
Use the same tag parsing logic for both zipkin v1 and v2
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this pull request Nov 11, 2021
…/jaeger (open-telemetry#1002)

* Bump google.golang.org/grpc in /exporters/trace/jaeger

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.30.0 to 1.31.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.30.0...v1.31.0)

Signed-off-by: dependabot[bot] <[email protected]>

* Auto-fix go.sum changes in dependent modules

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <[email protected]>
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this pull request Apr 27, 2023
…y#1002)

Bumps [boto3](https://github.com/boto/boto3) from 1.20.16 to 1.20.17.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.20.16...1.20.17)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Troels51 pushed a commit to Troels51/opentelemetry-collector that referenced this pull request Jul 5, 2024
swiatekm pushed a commit to swiatekm/opentelemetry-collector that referenced this pull request Oct 9, 2024
* demo: add missing namespace on grafana dashboard

fixes: open-telemetry#1001

Signed-off-by: hsinhoyeh <[email protected]>

* add rendered yamls

Signed-off-by: hsinhoyeh <[email protected]>

---------

Signed-off-by: hsinhoyeh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zipkin V2 Trace Receiver doesn't parse tags like http.status_code

4 participants