aws-sdk SQS propagation #5167

@blumamir

Hello,
I am trying to make aws-sdk SQS context propagation work between applications written in Java (Scala, actually) and Node.js / Ruby.
In order for it to work, both the sender and receiver should agree on how to inject and extract the context onto the messages.

Unfortunately, the Java instrumentation behaves differently from the Node.js and Ruby ones, which produces broken traces for end users who send messages across those systems.

Node.js / Ruby

These implementations use the "OpenTelemetry" approach: they use the propagators registered in the otel API (W3C, B3, custom, etc.) to inject context into, and extract it from, the message attributes.
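To make the inject / extract round trip concrete, here is a minimal sketch in plain Python. It assumes the W3C `traceparent` format and the boto3-style message attribute shape (`DataType` / `StringValue`). The function names `inject` and `extract` are hypothetical, and the real instrumentations delegate to whatever propagator is registered rather than hand-building the value like this.

```python
import re
import secrets

# W3C traceparent, version 00: 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(trace_id, span_id, sampled, message_attributes):
    """Producer side: write the context into the SQS message attributes."""
    flags = "01" if sampled else "00"
    message_attributes["traceparent"] = {
        "DataType": "String",
        "StringValue": f"00-{trace_id}-{span_id}-{flags}",
    }


def extract(message_attributes):
    """Consumer side: return (trace_id, span_id, sampled), or None if absent or invalid."""
    attr = message_attributes.get("traceparent")
    if attr is None:
        return None
    match = TRACEPARENT_RE.match(attr.get("StringValue", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


# Round trip: the user's own attribute is kept, the context rides alongside it.
attrs = {"customer_id": {"DataType": "String", "StringValue": "42"}}
trace_id, span_id = secrets.token_hex(16), secrets.token_hex(8)
inject(trace_id, span_id, True, attrs)
assert extract(attrs) == (trace_id, span_id, True)
```

Both sides only interoperate if they agree on the attribute name and value format, which is exactly what breaks when one side uses a different mechanism.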

Cons

SQS messages are subject to quotas (https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/quotas-messages.html), and context propagation consumes part of them. This means:

  1. The user is charged for the payload, which includes the message attributes. The added cost should be small, since "Each 64 KB chunk of a payload is billed as 1 request", but the context still adds a few dozen bytes that count toward billing.
  2. If a message is just below the 256 KB hard limit, the added context data can push it over the limit, and the message is rejected.
  3. A message can carry at most 10 message attributes in total, including the user's own. If the user has already used all of them, the instrumentation has no room left to inject propagation headers.
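The quota concerns above can be sketched as a pre-flight check. This is an illustration only: the limits are the ones quoted in this issue (10 attributes, 256 KB), the helper names `message_size` and `can_inject` are hypothetical, and the size estimate only counts string attributes (name + data type + value) plus the body.

```python
MAX_ATTRIBUTES = 10              # per-message attribute limit quoted above
MAX_MESSAGE_BYTES = 256 * 1024   # hard size limit quoted above


def message_size(body, attributes):
    """Approximate the billed size: body plus name, type, and value of each attribute."""
    size = len(body.encode("utf-8"))
    for name, attr in attributes.items():
        size += len(name.encode("utf-8"))
        size += len(attr["DataType"].encode("utf-8"))
        size += len(attr.get("StringValue", "").encode("utf-8"))
    return size


def can_inject(body, attributes, propagation_fields):
    """Return True if adding the propagation fields keeps the message within both limits."""
    new_names = set(propagation_fields) - set(attributes)
    if len(attributes) + len(new_names) > MAX_ATTRIBUTES:
        return False  # con 3: no attribute slot left
    merged = {**attributes, **propagation_fields}
    return message_size(body, merged) <= MAX_MESSAGE_BYTES  # con 2: size limit


context = {
    "traceparent": {
        "DataType": "String",
        "StringValue": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
    }
}
# A message that already carries 10 user attributes has no room for the context.
full = {f"attr{i}": {"DataType": "String", "StringValue": "x"} for i in range(10)}
assert can_inject("hello", {}, context)
assert not can_inject("hello", full, context)
```

A propagator that adds several fields (for example `traceparent`, `tracestate`, and `baggage`) uses one attribute slot per field, so the attribute limit can be reached sooner than the single-field case suggests.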

Java

The Java implementation uses the X-Amzn-Trace-Id header, which does not consume quotas the way the Node.js / Ruby implementations do, but has the following cons:

Cons

  1. If X-Ray is enabled on a service, it may inject additional spans into the trace as the request passes through X-Ray-enabled AWS services. These spans are exported only to X-Ray, so otel users who do not use X-Ray will have missing spans in their traces, and the goal of async messaging visibility is lost.
  2. For some services, even if X-Ray is disabled, the service still creates a new trace, flags it as non-sampled, and propagates this information downstream. If the application is configured with a parent-based sampler, X-Ray then effectively turns off tracing for the application. See this issue in node for reference and more info.
  3. The X-Ray propagator does not support baggage, so baggage values are not propagated via AWS services even if the user registers the W3C Baggage propagator.
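The second con can be illustrated with a small sketch. It assumes the usual `Root=...;Parent=...;Sampled=...` layout of the X-Amzn-Trace-Id value and a deliberately simplified parent-based sampler; the helper names are hypothetical and this is not the code of any actual SDK.

```python
def parse_xray_header(header):
    """Split 'Root=...;Parent=...;Sampled=...' into a dict of its fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def parent_based_should_sample(header, root_decision):
    """Simplified parent-based sampling: follow the parent's flag when a parent exists."""
    if header is None:
        return root_decision  # no parent context: the local root sampler decides
    fields = parse_xray_header(header)
    if "Root" not in fields:
        return root_decision
    # Assumption for this sketch: anything other than Sampled=1 counts as not sampled.
    return fields.get("Sampled") == "1"


# An AWS service with X-Ray disabled may still hand a non-sampled trace downstream:
header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=0"
assert parent_based_should_sample(header, root_decision=True) is False
# Without the injected header, the same application would have sampled the trace:
assert parent_based_should_sample(None, root_decision=True) is True
```

The application's own sampler is never consulted in the first case, which is why the non-sampled flag effectively disables tracing downstream.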

Action Items

Considering the cons above, I suggest adding a second propagation style to the aws-sdk instrumentations, one that is compatible with Node.js / Ruby and allows the user to bypass the X-Ray propagator, so that the issues above do not affect their application.

I'd be very happy to get more insights and ideas on this issue. Do you think it makes sense to add this new propagation style? Or does AWS perhaps have plans to solve the compatibility problems in the near future?

Labels: enhancement