Add Open Telemetry instrumentation #526
base: main
Conversation
Do we want to (optionally?) also include user and assistant messages, à la https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/?
@hadley I do. But there's a ton of disagreement in the OTel LLM community about how to do that, and none of the existing instrumentation libraries work in the same way 😞. Plus the whole "structured body" mechanism the current spec proposes (1) isn't supported by the span API; and (2) is formally deprecated. So I kind of think we need to noodle on what to do there, and I suggest pushing it into follow-up work. I'm planning on writing up an issue describing what options we have. I also think we should have first-class support for tool call spans, because that's something that …
@atheriel ok, that makes sense. I'm sure there will be a lot of learning as we figure out exactly what is most useful to instrument across packages.
Moving this back to draft because it has known issues (i.e. the concurrency does not work correctly). |
This commit instruments various operations with Open Telemetry spans that
abide by the (still nascent) semantic conventions for Generative AI
clients [0].
These conventions classify `ellmer` chatbots as "agents" due to their
ability to run tool calls, so in fact there are three types of span: (1)
a top-level `invoke_agent` span for each chat interaction; (2) `chat`
spans that wrap model API calls; and (3) `execute_tool` spans that wrap
tool calls on our end.
There's currently no community consensus for how to attach turns to
spans, so I've left that out for now.
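For reference, a rough R sketch of the attributes the GenAI semantic conventions prescribe for each of these span kinds; the values below are invented for illustration, and the exact attribute set this PR emits may differ:

# Illustrative only: attribute names follow the GenAI span conventions [0];
# the values are made up.
invoke_agent_attributes <- list(
  "gen_ai.operation.name" = "invoke_agent"
)
chat_attributes <- list(
  "gen_ai.operation.name" = "chat",
  "gen_ai.request.model" = "databricks-claude-3-7-sonnet",
  "gen_ai.usage.input_tokens" = 42L,
  "gen_ai.usage.output_tokens" = 128L
)
execute_tool_attributes <- list(
  "gen_ai.operation.name" = "execute_tool",
  "gen_ai.tool.name" = "current_date"
)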
Example code:
library(otelsdk)
# Use the stderr exporter so the emitted spans can be inspected locally.
Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")
chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")
Unit tests are included.
[0]: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
Signed-off-by: Aaron Jacobs <[email protected]>
This has been updated to support async operations and for changes in …
* main: (95 commits)
  - fix(chat): Call `check_echo()` in `chat()` for consistent echo behavior (#742)
  - Increment version number to 0.3.2.9000
  - Increment version number to 0.3.2
  - Don't run `content_image()` on CRAN (#739)
  - feat(chat_): Add `params` and `model` to all `chat_` functions (#699)
  - Fix spelling in `tool_prompt.md` (#730)
  - Fix typos in source comments and regenerate documentation (#736)
  - Fix news bullet
  - Increment version number to 0.3.1.9000
  - Increment version number to 0.3.1
  - Update cran comments
  - Check revdeps
  - Re-build readme
  - Typo fixes (#686)
  - Polish news
  - Update to latest Air settings and use `format-suggest.yaml` (#683)
  - Use newer REST API base url (#726)
  - Fix auth scope and API endpoints for Google Vertex (#704)
  - Run `Rscript data-raw/prices.R` to update pricing info (#727)
  - Improve error message for `batch_chat()` (#716)
  - ...
Co-Authored-By: Aaron Jacobs <[email protected]>
…n activation for chat
One other question: do we want to log something about auth here (in particular, which auth was automatically picked)?
Co-authored-by: Charlie Gao <[email protected]>
Inspiration from shiny / promises code reviews
params/args: `parent_otel_span` -> `otel_span`; `local_*_otel_span(parent_otel_span=)` -> `local_*_otel_span(parent=)`
…ns to span creation functions
Just adding a comment that I've adopted @schloerke's suggestions from #848 (with a couple of extensions, as I noted) in 8080dc6, and this is a common approach to fixing the tracer across the packages we're instrumenting.
@schloerke one of the live-api test failures concerns a test on this block:

p1 <- chat$chat_async("What's the current date in Y-M-D format?") |>
  promises::then(function(result) {
    chat$chat_async("What date will it be 47 days from now?")
  })

The test expectation is for both agent spans to be top level. I'm thinking this was set up pre-otel promise domains, and want to confirm with you that one should indeed be a child span of the other given it executes within a …
I'd still expect both … **Investigating**
Updated `local_agent_otel_span()` to accept an `activate` argument. `activate` MUST be `FALSE` for `local_agent_otel_span()` to prevent issues when switching too many coroutine contexts. Improved related tests to check span hierarchy and clarify span activation behavior.
Updated tests to use expect_gte instead of strict length checks for tool and chat spans. This accounts for model variations where tools may be called more than expected or results are cached, improving test robustness across different model behaviors.
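A minimal sketch of the relaxed expectation, assuming a `spans` list collected by the tests; the span names and filtering below are stand-ins, not the test suite's actual helpers:

# Hypothetical collected spans; in the real tests these come from the test exporter.
spans <- list(list(name = "execute_tool current_date"), list(name = "chat"))
# Keep only the tool-call spans and require at least one, rather than exactly one.
tool_spans <- Filter(function(s) startsWith(s$name, "execute_tool"), spans)
testthat::expect_gte(length(tool_spans), 1)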
The error was introduced in … But it's fixed now. Thank goodness for unit tests! Restrictions on the other, earlier-failing test have been relaxed and it now passes.
shikokuchuo left a comment
Thanks @schloerke. I've made a couple more tidy-ups, the only consequential one being to move otel to 'Suggests', consistent with the other packages we're instrumenting.
Apart from the unit tests themselves, I've also been testing using your shinychat demo (Shiny, ellmer, httr2 and mirai spans), and the traces all look good.
@hadley this PR is now ready for review. Thanks!

This commit wraps all LLM model calls in an Open Telemetry span that abides by the (still nascent) semantic conventions for Generative AI clients.
It's very similar in approach to what was done for httr2, and in fact the two of them complement one another nicely: r-lib/httr2#729.
For example:
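(A sketch along the lines of the example in the commit message above; the stderr exporter and the Databricks model name are carried over from there as assumptions about what the original example showed.)

library(otelsdk)
# Select the stderr exporter so the emitted spans are printed locally.
Sys.setenv(OTEL_TRACES_EXPORTER = "stderr")
chat <- ellmer::chat_databricks(model = "databricks-claude-3-7-sonnet")
chat$chat("Tell me a joke in the form of an SQL query.")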