Add Open Telemetry instrumentation #526

atheriel · 2025-05-22T19:37:21Z

This commit wraps all LLM model calls in an Open Telemetry span that abides by the (still nascent) semantic conventions for Generative AI clients.

It's very similar in approach to what was done for httr2, and in fact the two of them complement one another nicely:
r-lib/httr2#729.

For example:

library(otelsdk)

Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")

atheriel · 2025-05-22T19:53:07Z

Traces that mix LLM model call spans with httr2 spans:

jcheng5 · 2025-05-23T00:58:21Z

cc @cpsievert @schloerke @icarusz

hadley · 2025-05-28T15:37:27Z

Do we want to (optionally?) also include user and assistant messages, a la https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/ ?

atheriel · 2025-05-28T15:45:03Z

@hadley I do. But there's a ton of disagreement in the OTel LLM community about how to do that, and none of the existing instrumentation libraries work in the same way 😞. Plus the whole "structured body" mechanism the current spec proposes (1) isn't supported by the span API; and (2) is formally deprecated.

So I kind of think we need to noodle on what to do there, and I suggest pushing it into follow-up work. I'm planning on writing up an issue describing what options we have.

I also think we should have first-class support for tool call spans, because that's something that ellmer focuses on specifically. This PR is really the "basic" bit that the title implies.

hadley · 2025-05-28T16:10:02Z

@atheriel ok, that makes sense. I'm sure there will be a lot of learning as we figure out exactly what is most useful to instrument across packages.

atheriel · 2025-06-06T17:59:47Z

Moving this back to draft because it has known issues (i.e. the concurrency does not work correctly).

This commit instruments various operations with Open Telemetry spans that abide by the (still nascent) semantic conventions for Generative AI clients [0].

These conventions classify `ellmer` chatbots as "agents" due to their ability to run tool calls, so in fact there are three types of span: (1) a top-level `invoke_agent` span for each chat interaction; (2) `chat` spans that wrap model API calls; and (3) `execute_tool` spans that wrap tool calls on our end.

There's currently no community consensus on how to attach turns to spans, so I've left that out for now.

Example code:

library(otelsdk)

Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")

chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")

Unit tests are included.

[0]: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/

Signed-off-by: Aaron Jacobs <[email protected]>
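To make the three span types concrete, here is a plain-R illustration of how they might nest for one chat interaction that triggers a single tool call. It does not call otel and is not ellmer's implementation: `new_span()` is a hypothetical helper that only records names and parent links, and treating the `chat` and `execute_tool` spans as siblings under the agent span is an assumption.

```r
# Plain-R illustration only: records span names and parent links.
# `new_span()` is hypothetical and does not emit any telemetry.
spans <- list()

new_span <- function(name, parent = NA_character_) {
  id <- sprintf("span-%d", length(spans) + 1L)
  spans[[id]] <<- list(name = name, parent = parent)
  id
}

agent <- new_span("invoke_agent")                  # one per chat interaction
chat1 <- new_span("chat", parent = agent)          # model call that requests a tool
tool  <- new_span("execute_tool", parent = agent)  # tool executed on our end
chat2 <- new_span("chat", parent = agent)          # follow-up model call

# Summaries that make the hierarchy easy to inspect.
span_names <- vapply(spans, function(s) s$name, character(1))
parents <- vapply(spans, function(s) s$parent, character(1))
```

Printing `span_names` and `parents` side by side shows one root span and three spans that point back at it.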

atheriel · 2025-06-20T20:36:42Z

This has been updated to support async operations and for changes in otel and otelsdk. It now also includes pretty extensive unit tests and support for agent and tool call spans.

* main: (95 commits)
  fix(chat): Call `check_echo()` in `chat()` for consistent echo behavior (#742)
  Increment version number to 0.3.2.9000
  Increment version number to 0.3.2
  Don't run `content_image()` on CRAN (#739)
  feat(chat_): Add `params` and `model` to all `chat_` functions (#699)
  Fix spelling in `tool_prompt.md` (#730)
  Fix typos in source comments and regenerate documentation (#736)
  Fix news bullet
  Increment version number to 0.3.1.9000
  Increment version number to 0.3.1
  Update cran comments
  Check revdpes
  Re-build readme
  Typo fixes (#686)
  Polish news
  Update to latest Air settings and use `format-suggest.yaml` (#683)
  Use newer REST API base url (#726)
  Fix auth scope and API endpoints for Google Vertex (#704)
  Run `Rscript data-raw/prices.R` to update pricing info (#727)
  Improve error message for `batch_chat()` (#716)
  ...

R/otel.R

hadley · 2025-10-14T13:32:42Z

One other question: do we want to log something about auth here (i.e. in particular, which auth was automatically picked?)

* main:
  Batch and parallel chats are no longer experimental (#842)
  Add basic support for Anthropic/Claude Files API (#760)
  More dollars methods (#841)
  Reworking OpenAI interface (#832)
  Better `batch_chat()` error handling for OpenAI (#838)
  Fix broken Anthropic url (#839)

shikokuchuo · 2025-11-08T15:42:43Z

Just adding a comment that I've adopted @schloerke's suggestions from #848 (with a couple of extensions as I noted) in 8080dc6, and this is a common approach to fixing the tracer across the packages we're instrumenting.
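For readers unfamiliar with the idea, tracer caching can be sketched roughly as follows. The actual change in 8080dc6 is not shown in this thread, so this is only an illustration of the general pattern: `make_tracer()` is a hypothetical stand-in for the real lookup (for example a call into otel), and `cached_tracer()` and `tracer_cache` are made-up names.

```r
# Sketch of caching a tracer for the lifetime of the session.
# `make_tracer()` is a hypothetical stand-in so the block runs without otel.
tracer_cache <- new.env(parent = emptyenv())
lookups <- 0L

make_tracer <- function(name) {
  lookups <<- lookups + 1L   # count how often the expensive lookup happens
  list(name = name)
}

cached_tracer <- function(name = "example.tracer") {
  # Early return: reuse the tracer if it has already been looked up.
  if (!is.null(tracer_cache$tracer)) {
    return(tracer_cache$tracer)
  }
  tracer_cache$tracer <- make_tracer(name)
  tracer_cache$tracer
}

first <- cached_tracer()
second <- cached_tracer()
```

The second call returns the stored object without invoking `make_tracer()` again, which is the property the cache exists to provide.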

shikokuchuo · 2025-11-08T19:39:04Z

@schloerke one of the live-api test failures concerns a test on this block:

p1 <- chat$chat_async("What's the current date in Y-M-D format?") |>
  promises::then(function(result) {
    chat$chat_async("What date will it be 47 days from now?")
  })

The test expectation is for both agent spans to be top level.

I'm thinking this was set up pre-otel promise domains, and want to confirm with you that one should indeed be a child span of the other given it executes within a $then() context. Thanks!

schloerke · 2025-11-10T15:16:39Z

> The test expectation is for both agent spans to be top level.
>
> I'm thinking this was set up pre-otel promise domains, and want to confirm with you that one should indeed be a child span of the other given it executes within a $then() context. Thanks!

I'd still expect both chat$chat_async() calls to be top level.
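As a rough illustration of what "both top level" means in a test, the check could look something like this. The span records below are made-up data, and the field names (`span_id`, `parent_id`) are illustrative rather than otelsdk's actual output format.

```r
# Hypothetical span records, shaped like what a test might collect from an
# in-memory exporter after the two chained chat$chat_async() calls.
recorded <- list(
  list(name = "invoke_agent", span_id = "a1", parent_id = NA_character_),
  list(name = "chat",         span_id = "c1", parent_id = "a1"),
  list(name = "invoke_agent", span_id = "a2", parent_id = NA_character_),
  list(name = "chat",         span_id = "c2", parent_id = "a2")
)

is_agent <- vapply(recorded, function(s) s$name == "invoke_agent", logical(1))
agent_parents <- vapply(recorded[is_agent], function(s) s$parent_id, character(1))

# Both agent spans are top level when neither has a parent.
both_top_level <- all(is.na(agent_parents))
```

If the second agent span had been created as a child of the first, its `parent_id` would be `"a1"` and `both_top_level` would be `FALSE`.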

**Investigating**

Updated local_agent_otel_span to accept an 'activate' argument. `activate` MUST be `FALSE` for `local_agent_otel_span()` to prevent issues when switching too many coroutine contexts. Improved related tests to check span hierarchy and clarify span activation behavior.
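The effect of not activating a span can be sketched in plain R. This does not reproduce `local_agent_otel_span()`: `start_span_sketch()` and the `active` stack are hypothetical, and the sketch only models the idea that a span which is never made "active" cannot be picked up implicitly as a parent by later spans.

```r
# Plain-R sketch: a stack of "active" span names stands in for the
# active-span context. All names here are hypothetical.
active <- character()

start_span_sketch <- function(name, activate = TRUE) {
  # The implicit parent is whatever span is currently active, if any.
  parent <- if (length(active) > 0) active[[length(active)]] else NA_character_
  if (activate) {
    active <<- c(active, name)
  }
  list(name = name, parent = parent)
}

# With activate = FALSE the agent span never becomes the implicit parent,
# so a second agent span started later is still top level.
agent <- start_span_sketch("invoke_agent", activate = FALSE)
other <- start_span_sketch("invoke_agent", activate = FALSE)
```

Under this model, child spans would need to receive their parent explicitly, which is consistent with the `parent=` argument mentioned in the rename commit.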

Updated tests to use expect_gte instead of strict length checks for tool and chat spans. This accounts for model variations where tools may be called more than expected or results are cached, improving test robustness across different model behaviors.
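A small sketch of the difference between a strict and a relaxed count check follows. `tool_spans` is made-up data standing in for spans produced by a live model, and the threshold of 2 is arbitrary.

```r
# Made-up data: pretend the model called the tool three times instead of
# the two calls the test author anticipated.
tool_spans <- list("execute_tool", "execute_tool", "execute_tool")

strict_ok  <- length(tool_spans) == 2L  # brittle: fails on an extra tool call
relaxed_ok <- length(tool_spans) >= 2L  # tolerant: passes with extra calls

if (requireNamespace("testthat", quietly = TRUE)) {
  # The same relaxed check written as a testthat expectation.
  testthat::expect_gte(length(tool_spans), 2)
}
```

The relaxed form trades some precision for stability against model behaviour that the test cannot control.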

schloerke · 2025-11-10T17:02:59Z

The error was introduced in eeb0b84 (#526) 🥸

But! It's fixed now. Thank goodness for unit tests.

Restrictions on the other, earlier failing test have been relaxed, and it now passes.

shikokuchuo

Thanks @schloerke. I've made a couple more tidy-ups, the only consequential one being to move otel to 'Suggests', consistent with the other packages we're instrumenting.

Apart from the unit tests themselves, I've also been testing using your shinychat demo (Shiny, ellmer, httr2 and mirai spans) and the traces all look good.
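With otel as a suggested rather than imported package, instrumentation code has to cope with it being absent. A minimal sketch of that guard, under stated assumptions: `tracing_available()` and `with_optional_span()` are hypothetical names, and the call to `otel::is_tracing_enabled()` assumes that function is available in the installed otel version (any error is treated as "not tracing").

```r
# Sketch of guarding optional instrumentation when otel is only suggested.
tracing_available <- function() {
  if (!requireNamespace("otel", quietly = TRUE)) {
    return(FALSE)
  }
  # Assumption: otel exposes is_tracing_enabled(); fall back to FALSE on error.
  tryCatch(isTRUE(otel::is_tracing_enabled()), error = function(e) FALSE)
}

with_optional_span <- function(code) {
  if (!tracing_available()) {
    return(code)  # no-op path: just evaluate the caller's code
  }
  # A real implementation would start and end a span around `code` here.
  code
}

result <- with_optional_span(1 + 1)
```

Either way the caller's code runs and returns its value; only the telemetry side effect is conditional.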

shikokuchuo · 2025-11-10T19:50:08Z

@hadley this PR is now ready for review. Thanks!

atheriel requested review from gaborcsardi and hadley May 22, 2025 19:37

atheriel marked this pull request as draft June 6, 2025 17:59

atheriel force-pushed the otel branch from 1e41be2 to 6429125 Compare June 20, 2025 20:34

atheriel changed the title ~~Add basic Open Telemetry instrumentation for model calls~~ Add Open Telemetry instrumentation Jun 20, 2025

schloerke and others added 10 commits September 8, 2025 09:50

usethis::use_tidy_description()

a710ce2

Updates to the latest promises ospans

bf51c67

Co-Authored-By: Aaron Jacobs <[email protected]>

Simplify internal api

2d5245a

Use promises::local_ospan_promise_domain()

397052e

Import otel as promises does. Remove suggestions on otelsdk and add to check remotes

833c308

Use new promises main branch (PR was merged)

324c955

Copy in tracer retrieval from httr2

a923a1a

Co-Authored-By: Aaron Jacobs <[email protected]>

Fix runtime error with otelsdk where spn$end(status="auto") would fail

bc00cc2

Pass through the parent chat ospan to the generator methods. Add ospan activation for chat

e59614b

schloerke reviewed Sep 18, 2025

shikokuchuo reviewed Oct 23, 2025

schloerke and others added 4 commits November 6, 2025 12:07

Apply suggestions from code review

5e48b66

Co-authored-by: Charlie Gao <[email protected]>

Merge branch 'main' into otel

0de30c1

Use existing is_testing() method

5735bc6

Use more descriptive otel tracer function for ellmer

30ee794

Inspiration from shiny / promises code reviews

schloerke assigned shikokuchuo Nov 7, 2025

schloerke added 4 commits November 7, 2025 11:21

Remove coro:: namespace for await_each

98e3ff6

Use local_tempfile() helper method

e92322e

Remove otelsdk remote and make a Suggests

0c91f8f

schloerke reviewed Nov 7, 2025

schloerke and others added 4 commits November 7, 2025 11:43

Refactor otel span argument naming for clarity

c377dd6

params/args: `parent_otel_span` -> `otel_span` `local_*_otel_span(parent_otel_span=)` -> `local_*_otel_span(parent=)`

Update chat-tools.R

a7852bc

Update otel tracer caching implementation after #848; add early returns to span creation functions

8080dc6

Merge branch 'main' into otel

302e752

shikokuchuo added 2 commits November 8, 2025 17:05

Corrections for 8080dc6

37e9ca2

Remove superfluous promise domain setups

01f20e6

schloerke added 2 commits November 10, 2025 11:36

schloerke marked this pull request as ready for review November 10, 2025 17:14

schloerke requested review from hadley and shikokuchuo and removed request for hadley November 10, 2025 17:14

shikokuchuo added 4 commits November 10, 2025 18:12

Refactor otel spans tests to be relative instead of comparing against absolute values

2d78a24

Simplify span kinds and test

7f653bb

Move otel to suggests

c03dfde

Add span_recording() helper

ecf85fd

shikokuchuo approved these changes Nov 10, 2025

shikokuchuo added 2 commits November 14, 2025 15:12

Merge branch 'main' into otel

34d8093

Merge branch 'main' into otel

128a7b5
