Skip to content

Conversation

@atheriel
Copy link
Collaborator

This commit wraps all LLM model calls in an Open Telemetry span that abides by the (still nascent) semantic conventions for Generative AI clients.

It's very similar in approach to what was done for httr2, and in fact the two of them complement one another nicely:
r-lib/httr2#729.

For example:

library(otelsdk)

Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")

@atheriel atheriel requested review from gaborcsardi and hadley May 22, 2025 19:37
@atheriel
Copy link
Collaborator Author

Traces that mix LLM model call spans with httr2 spans:

Screenshot 2025-05-22 at 14-25-20 Live · posit1_starter-project · Pydantic Logfire

@jcheng5
Copy link
Collaborator

jcheng5 commented May 23, 2025

cc @cpsievert @schloerke @icarusz

@hadley
Copy link
Member

hadley commented May 28, 2025

Do we want to (optionally?) also include user and assistant messages, a la https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/ ?

@atheriel
Copy link
Collaborator Author

@hadley I do. But there's a ton of disagreement in the OTel LLM community about how to do that, and none of the existing instrumentation libraries work in the same way 😞. Plus the whole "structured body" mechanism the current spec proposes (1) isn't supported by the span API; and (2) is formally deprecated.

So I kind of think we need to noodle on what to do there, and I suggest pushing it into follow-up work. I'm planning on writing up an issue describing what options we have.

I also think we should have first-class support for tool call spans, because that's something that ellmer focuses on specifically. This PR is really the "basic" bit that the title implies.

@hadley
Copy link
Member

hadley commented May 28, 2025

@atheriel ok, that makes sense. I'm sure there will be a lot of learning as we figure out exactly what is most useful to instrument across packages.

@atheriel atheriel marked this pull request as draft June 6, 2025 17:59
@atheriel
Copy link
Collaborator Author

atheriel commented Jun 6, 2025

Moving this back to draft because it has known issues (i.e. the concurrency does not work correctly).

This commit instruments various operations with Open Telemetry spans that
abide by the (still nascent) semantic conventions for Generative AI
clients [0].

These conventions classify `ellmer` chatbots as "agents" due to their
ability to run tool calls, so in fact there are three types of span: (1)
a top-level `invoke_agent` span for each chat interaction; (2) `chat`
spans that wrap model API calls; and (3) `execute_tool` spans that wrap
tool calls on our end.

There's currently no community concensus for how to attach turns to
spans, so I've left that out for now.

Example code:

    library(otelsdk)

    Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

    chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
    chat$chat("Tell me a joke in the form of an SQL query.")

Unit tests are included.

[0]: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/

Signed-off-by: Aaron Jacobs <[email protected]>
@atheriel atheriel changed the title Add basic Open Telemetry instrumentation for model calls Add Open Telemetry instrumentation Jun 20, 2025
@atheriel
Copy link
Collaborator Author

atheriel commented Jun 20, 2025

This has been updated to support async operations and for changes in otel and otelsdk. It now also includes pretty extensive unit tests and support for agent and tool call spans.

schloerke and others added 10 commits September 8, 2025 09:50
* main: (95 commits)
  fix(chat): Call `check_echo()` in `chat()` for consistent echo behavior (#742)
  Increment version number to 0.3.2.9000
  Increment version number to 0.3.2
  Don't run `content_image()` on CRAN (#739)
  feat(chat_): Add `params` and `model` to all `chat_` functions (#699)
  Fix spelling in `tool_prompt.md` (#730)
  Fix typos in source comments and regenerate documentation (#736)
  Fix news bullet
  Increment version number to 0.3.1.9000
  Increment version number to 0.3.1
  Update cran comments
  Check revdpes
  Re-build readme
  Typo fixes (#686)
  Polish news
  Update to latest Air settings and use `format-suggest.yaml` (#683)
  Use newer REST API base url (#726)
  Fix auth scope and API endpoints for Google Vertex (#704)
  Run `Rscript data-raw/prices.R` to update pricing info (#727)
  Improve error message for `batch_chat()` (#716)
  ...
@hadley
Copy link
Member

hadley commented Oct 14, 2025

One other question: do we want to log something about auth here (i.e. in particular, which auth was automatically picked?)

* main:
  Batch and parallel chats are no longer experimental (#842)
  Add basic support for Anthropic/Claude Files API (#760)
  More dollars methods (#841)
  Reworking OpenAI interface (#832)
  Better `batch_chat()` error handling for OpenAI (#838)
  Fix broken Anthropic url (#839)
schloerke and others added 4 commits November 7, 2025 11:43
params/args: `parent_otel_span` -> `otel_span`
`local_*_otel_span(parent_otel_span=)` -> `local_*_otel_span(parent=)`
@shikokuchuo
Copy link
Member

Just adding a comment that I've adopted @schloerke's suggestions from #848 (with a couple of extensions as I noted) in 8080dc6, and this is a common approach to fixing the tracer across the packages we're instrumenting.

@shikokuchuo
Copy link
Member

@schloerke one of the live-api test failures concerns a test on this block:

p1 <- chat$chat_async("What's the current date in Y-M-D format?") |>
  promises::then(function(result) {
    chat$chat_async("What date will it be 47 days from now?")
  })

The test expectation is for both agent spans to be top level.

I'm thinking this was set up pre-otel promise domains, and want to confirm with you that one should indeed be a child span of the other given it executes within a $then() context. Thanks!

@schloerke
Copy link
Collaborator

The test expectation is for both agent spans to be top level.

I'm thinking this was set up pre-otel promise domains, and want to confirm with you that one should indeed be a child span of the other given it executes within a $then() context. Thanks!

I'd still expect both chat$chat_async() calls to be top level.

**Investigating

Updated local_agent_otel_span to accept an 'activate' argument. `activate` MUST be `FALSE` for `local_agent_otel_span()` to prevent issues when switching too many coroutine contexts. Improved related tests to check span hierarchy and clarify span activation behavior.
Updated tests to use expect_gte instead of strict length checks for tool and chat spans. This accounts for model variations where tools may be called more than expected or results are cached, improving test robustness across different model behaviors.
@schloerke
Copy link
Collaborator

The error was introduced in eeb0b84 (#526) 🥸

But!.. It's fixed now. Thank goodness for unit tests


Restrictions on the other earlier failing test have been relaxed and now pass

@schloerke schloerke marked this pull request as ready for review November 10, 2025 17:14
@schloerke schloerke requested review from hadley and shikokuchuo and removed request for hadley November 10, 2025 17:14
Copy link
Member

@shikokuchuo shikokuchuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @schloerke. I've made a couple more tidy-ups, the only consequential one to move otel to 'suggests', consistent with the other packages we're instrumenting.

Apart from the unit tests themsleves, I've also been testing using your shinychat demo (Shiny, ellmer, httr2 and mirai spans) and the traces all look good.

@shikokuchuo
Copy link
Member

@hadley this PR is now ready for review. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants