Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 36 additions & 21 deletions translator/trace/zipkin/status_code.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@ type statusMapper struct {
fromCensus status
// oc status code extracted from "http.status_code" tags
fromHTTP status
// oc status code extracted from "error" tags
fromErrorTag status
}

// ocStatus returns an OC status from the best possible extraction source.
Expand All @@ -45,13 +47,6 @@ type statusMapper struct {
// and finally fallback on code extracted and translated from "http.status_code"
// ocStatus must be called after all tags/attributes are processed with the `fromAttribute` method.
func (m *statusMapper) ocStatus() *tracepb.Status {

noKnownCode := m.fromCensus.codePtr == nil && m.fromHTTP.codePtr == nil && m.fromStatus.codePtr == nil
if noKnownCode && m.fromCensus.message != "" {
unknown := int32(2)
m.fromCensus.codePtr = &unknown
}

var s status
switch {
case m.fromCensus.codePtr != nil:
Expand All @@ -62,6 +57,21 @@ func (m *statusMapper) ocStatus() *tracepb.Status {
s = m.fromHTTP
}

// If no codePtr was provided, fallback to the first source with a message
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applying this default caused this test to fail

// only status.message tag
{
haveTags: []*zipkincore.BinaryAnnotation{{
Key: "status.message",
Value: []byte("Forbidden"),
AnnotationType: zipkincore.AnnotationType_STRING,
}},
wantAttributes: nil,
wantStatus: nil,
},

attributes: {
  "status.message": "Forbidden",
}

expects tracepb.Status  to be nil

The jaeger behavior matches the previous zipkin behavior https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/trace/jaeger/jaegerthrift_to_protospan.go#L262-L265

Do you think I should update the tests or modify the behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change behavior . We should not set any error status based on "status.message" only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this only apply to that specific open census attribute? I ask because the existing code will return a tracepb.Status when either a code or message is set if s.codePtr != nil || s.message != "" . Because of the way it is currently implemented, this would only ever apply to fromHTTP though.

On master, the current behavior is

given

attributes: {
   "http.status_message": "something"
}

returns

tracepb.Status{
    Code: 0,
    Message: "something",
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. we should not set error status code based only on an error message attribute such as http.status_message or status.message (hopefully I don't miss any other message attribute)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chris-smith-zocdoc could you apply this approach ^ and make sure tests pass?

// and use the error tag code
if s.codePtr == nil {
switch {
case m.fromCensus.message != "":
s = m.fromCensus
case m.fromStatus.message != "":
s = m.fromStatus
default:
s = m.fromHTTP
}

s.codePtr = m.fromErrorTag.codePtr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be under condition if (s.codePtr == nil) so we don't override code that was already set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it to make it more clear, but this would always be true currently

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this always be true? any of the assignments above can change it not null, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the first switch we know that
m.fromCensus.codePtr == nil
m.fromStatus.codePtr == nil

and line 62 gives
m.fromHTTP.codePtr == nil

So we've verified all three are nil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still don't get it. s can be set in any case above, so any of them can set s.codePtr to be something but not nil, right? If we don't have if (s.codePtr == nil) condition here, we override any s.codePtr with m.fromErrorTag.codePtr all the time

}

if s.codePtr != nil || s.message != "" {
code := int32(0)
if s.codePtr != nil {
Expand All @@ -84,12 +94,6 @@ func (m *statusMapper) fromAttribute(key string, attrib *tracepb.AttributeValue)
}
return true

case tracetranslator.TagError:
codePtr := extractStatusFromError(attrib)
if codePtr != nil {
m.fromCensus.codePtr = codePtr
}

case tracetranslator.TagZipkinCensusMsg, tracetranslator.TagZipkinOpenCensusMsg:
m.fromCensus.message = attrib.GetStringValue().GetValue()
return true
Expand All @@ -114,6 +118,9 @@ func (m *statusMapper) fromAttribute(key string, attrib *tracepb.AttributeValue)

case tracetranslator.TagHTTPStatusMsg:
m.fromHTTP.message = attrib.GetStringValue().GetValue()

case tracetranslator.TagError:
m.fromErrorTag.codePtr = extractStatusFromError(attrib)
}
return false
}
Expand Down Expand Up @@ -148,15 +155,23 @@ func toInt32(i int) (int32, error) {
func extractStatusFromError(attrib *tracepb.AttributeValue) *int32 {
// The status is stored with the "error" key
// See https://github.com/census-instrumentation/opencensus-go/blob/1eb9a13c7dd02141e065a665f6bf5c99a090a16a/exporter/zipkin/zipkin.go#L160-L165
canonicalCodeStr := attrib.GetStringValue().GetValue()
if canonicalCodeStr == "" {
return nil
}
code, set := canonicalCodesMap[canonicalCodeStr]
if set {
return &code
var unknown int32 = 2

switch val := attrib.Value.(type) {
case *tracepb.AttributeValue_StringValue:
canonicalCodeStr := val.StringValue.GetValue()
if canonicalCodeStr == "" {
return &unknown
}
code, set := canonicalCodesMap[canonicalCodeStr]
if set {
return &code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we need to break down this function into two:

If this condition worked out and we can get a clear error status from "error" tag by doing the case * tracepb.AttributeValue_StringValue -> canonicalCodesMap[canonicalCodeStr] check, the result must be considered as higher priority. And we should not use HTTP status in that case. We also should drop the "error" tag, since it doesn't provide any additional value (return true in fromAttribute).

Otherwise, if there is something in the "error" that we cannot identify, we should set unknown status as lower priority, and use HTTP status instead. Probably we need to use another statusMapper like fromErrorTagUnknown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this change and was able to simplify this logic a bit. Please let me know what you think

}
default:
break
}
return nil

return &unknown
}

var canonicalCodesMap = map[string]int32{
Expand Down
32 changes: 28 additions & 4 deletions translator/trace/zipkin/status_code_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -161,8 +161,8 @@ func TestStatusCodeMapperCases(t *testing.T) {
},

{
name: "oc: description only, use unknown code",
expected: &tracepb.Status{Code: 2, Message: "a description"},
name: "oc: description only, status ok",
expected: &tracepb.Status{Code: 0, Message: "a description"},
attributes: map[string]string{
"opencensus.status_description": "a description",
},
Expand All @@ -181,12 +181,36 @@ func TestStatusCodeMapperCases(t *testing.T) {
},

{
name: "error only",
expected: nil,
name: "error only: string description",
expected: &tracepb.Status{Code: 2},
attributes: map[string]string{
"error": "a description",
},
},

{
name: "error only: true",
expected: &tracepb.Status{Code: 2},
attributes: map[string]string{
"error": "true",
},
},

{
name: "error only: false",
expected: &tracepb.Status{Code: 2},
attributes: map[string]string{
"error": "false",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is still weird to me, I looked through our data and we do have spans with this being reported 🙄

How do you feel about treating a boolean attribute with a value false as OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the instrumentation on your end producing this kind of spans? It looks like OpenTracing convention, but should not be reported by zipkin instrumentation. If the instrumentation and spans are valid we probably should handle this edge case and make it OK code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this more, I don't think we need to make any changes.

Originally I was looking at our exported data in honeycomb where this was accidentally reported as a boolean instead of a string, causing some type conversion problems.

},
},

{
name: "error only: 1",
expected: &tracepb.Status{Code: 2},
attributes: map[string]string{
"error": "1",
},
},
}

for _, test := range tests {
Expand Down