
Document AI-assisted component development guidance #2909

Open
lquerel wants to merge 11 commits into
open-telemetry:main from
lquerel:ai-methods

Conversation

@lquerel
Contributor

@lquerel lquerel commented May 8, 2026

Change Summary

Adds AI-assisted development guidance for OTAP Dataflow Engine contributors and maintainers.

This PR introduces a concise docs/ai entry point and documents the project’s posture for responsible AI-assisted work: controlled, reviewable, evidence-based, and owned by engineers familiar with OTAP Dataflow, Rust, and OpenTelemetry.

It also clarifies the current AI-assisted guidance set:

  • AI-Assisted Component Development: overview for choosing the right approach.
    • Spec-Constrained Oracle Reimplementation: for interoperability-focused work where a reference implementation acts as an executable oracle.
    • Reference-Informed OTAP-Native Capability Design: for designing improved OTAP-native capabilities from existing implementations, feedback, and future direction.
  • AI-Assisted Pull Request Review: for human and agent reviewers, focused on OTAP architectural invariants, thread-per-core runtime behavior, bounded resources, backpressure, performance, correctness, security, portability, and test intent.

What issue does this PR close?

How are these changes tested?

  • Ran python3 tools/sanitycheck.py

Are there any user-facing changes?

Yes. This is documentation-only, but contributor-facing. It adds and updates guidance for engineers using AI-assisted workflows in OTAP Dataflow Engine development.

@lquerel lquerel requested a review from a team as a code owner May 8, 2026 17:40
@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 8, 2026
@codecov

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.02%. Comparing base (b753329) to head (ef4b011).
⚠️ Report is 22 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2909      +/-   ##
==========================================
- Coverage   86.20%   86.02%   -0.18%     
==========================================
  Files         715      720       +5     
  Lines      271673   273338    +1665     
==========================================
+ Hits       234203   235149     +946     
- Misses      36946    37665     +719     
  Partials      524      524              
Components Coverage Δ
otap-dataflow 87.18% <ø> (+0.02%) ⬆️
query_abstraction 80.61% <ø> (ø)
query_engine 89.57% <ø> (-1.16%) ⬇️
otel-arrow-go 52.45% <ø> (ø)
quiver 92.25% <ø> (ø)

@gouslu
Contributor

gouslu commented May 11, 2026

Seems like AI-assisted PR review could also be added as a skill to guide AI agents to take certain things into consideration when doing the review, and possibly the spec-constrained oracle reimplementation as well. I think this guidance is good, but it could be even more visible/accessible if there were already skills in this repository that people could quickly import to follow these AI-assisted development and review guidelines.

edit: to clarify, by skill I meant copilot/AI skills. cc: @albertlockett

Member

@albertlockett albertlockett left a comment


LGTM. The PR review guidelines I feel will be especially helpful, and I agree with @gouslu that some of these could also be framed as rust skills.

- clear ownership, lifetimes, and cancellation behavior
- idiomatic Rust without sacrificing hot-path performance
- no implicit behavior that hides failure, retries, or data loss
- justified and reviewed `unsafe` blocks
Member

@albertlockett albertlockett May 12, 2026


should we add that expect()/unwrap()/unreachable!() should also be documented with a comment about safety?
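A hedged sketch of what that convention might look like (the function and comment style here are hypothetical, not an established project rule):

```rust
// Hypothetical illustration of the suggestion above: each potential panic
// point carries a comment explaining why it cannot fire in practice.
fn parse_port(s: &str) -> u16 {
    // PANIC-SAFETY: callers validate this string against the config schema
    // before reaching this point, so parsing cannot fail here.
    s.parse::<u16>()
        .expect("port already validated by config schema")
}

fn main() {
    assert_eq!(parse_port("8080"), 8080);
    println!("ok");
}
```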

Comment thread rust/otap-dataflow/docs/ai/README.md Outdated
## Development Note and PR Strategy

Each component using either approach must keep a short development note, for
example `<component-module>/DEVELOPMENT.md`.
Member


in the future, it might be helpful to have an example

- classify preserved, changed, unsupported, rejected, or intentionally divergent
behavior
- make unsupported behavior and intentional divergences explicit
- preserve human ownership of correctness, maintainability, and security
Member


Can a human delegate this?

For example, if AI can upgrade the package versions for downstream dependencies, and there is a reasonable quality gate, would project maintainers be okay letting bots merge the PR directly?


- OTAP-native data representation and component composition
- ownership, allocation, and hot-path materialization behavior
- thread-per-core, share-nothing execution model
Member


Is there a way to verify and gate this?

More of a question for @cijothomas.


Reviewers must protect the OTAP runtime model.

Check that changes preserve the thread-per-core, share-nothing design:
Member


What about code which invokes a function from an external library?
I think there are two challenges:

  1. How would the PR reviewer know if the invoked function meets the thread-per-core expectation.
  2. Even if the function meets the expectation for now, how do we detect/prevent future drifts? (e.g., the newer version of the library might behave differently)

Member


That makes sense. For external libraries, I do not think reviewers can always know the internals just from the call site.

Maybe the doc could say that if we call an external library on the runtime path, the PR should write down the important assumption: can it block, start threads, use shared/global state, allocate a lot, or hide buffering/retries?

That note could be simple: a link to the library docs/source, a short code comment, a note in the component development doc, or a small test/benchmark if needed.

For future drift, dependency upgrades should re-check that assumption. It will not make this perfect, but it makes the risk visible for reviewers.
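The assumption note described above could be as small as a call-site comment. A hypothetical sketch, where `compress` is a local stand-in for an external library function and the library name and date are invented:

```rust
// Hypothetical shape of a call-site assumption note for an external
// dependency on the runtime path. `compress` stands in for the external
// library call; the placeholder body just inverts bytes.
fn compress(data: &[u8]) -> Vec<u8> {
    data.iter().map(|&b| b ^ 0xFF).collect()
}

fn hot_path(payload: &[u8]) -> Vec<u8> {
    // ASSUMPTION (libfoo v0.3, checked 2026-05): compress() is CPU-bound,
    // never blocks on I/O, spawns no threads, and holds no shared/global
    // state. Re-verify this note on every dependency upgrade.
    compress(payload)
}

fn main() {
    assert_eq!(hot_path(b"abc").len(), 3);
    println!("ok");
}
```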

Contributor

@jmacd jmacd left a comment


This works for me as a human. I think the framing of spec-constrained and reference-informed is useful.

I asked an AI.

My concern is that this document is not very consumable by an AI. Its prime quote: "I'm more likely to follow a tight 50-line checklist than a thorough essay."

What works well for me as an AI consumer

  • ai-assisted-pr-review.md — The architectural invariants list
    (thread-per-core, bounded resources, backpressure, no blocking in async) is
    exactly the kind of checklist of falsifiable properties I can actually
    apply. I can grep a diff for tokio::spawn, Mutex, unbounded_channel,
    block_on, etc. and have a concrete rubric. High utility.

  • The two-approach framing (Spec-Constrained Oracle Reimplementation vs.
    Reference-Informed OTAP-Native Design) is genuinely useful because it tells
    me which mode I'm in before I start. The failure mode I most often fall
    into is silently drifting between "port this faithfully" and "redesign it
    idiomatically." Having an explicit name for each mode makes it easier to
    stay in one.

  • Internal consistency and link validity — boring but real. When docs
    cross-reference cleanly, I can chase a thread without burning tool calls on
    dead ends.

What's less useful or actively risky

  • Prose-heavy framing around principles I already infer from the code. A
    lot of the "be thoughtful, preserve invariants" guidance is something a
    competent reviewer derives anyway. It costs context tokens to load and
    rarely changes my output.

  • Long documents tempt me to skim. 794 lines is past the point where I'll
    reliably internalize every rule. I'm more likely to follow a tight 50-line
    checklist than a thorough essay. The PR-review doc is closer to
    checklist-shaped; the philosophical framing docs are more at risk of being
    skimmed.

  • The script-path bug flagged in the review is a perfect example of the core
    hazard: documentation written for AI agents must be executable-correct,
    not just narratively correct. A human reviewer would mentally fix
    ./scripts/... to rust/otap-dataflow/scripts/...; an agent will dutifully
    run the wrong command and report failure. Every concrete command in
    agent-facing docs is a small contract.

What I'd want more of

  • Negative examples ("here's a diff that looks fine but violates X").
  • Exact grep/ripgrep patterns for the invariants.
  • A short "if you only read one section, read this" pointer.
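The grep/ripgrep patterns asked for here could start as a one-liner. An illustrative sketch using plain grep against a sample file; the pattern list is an assumption for demonstration, not the project's official rubric:

```shell
# Hypothetical invariant scan for changed Rust files (patterns illustrative).
# Flags constructs that often violate thread-per-core / bounded-resource rules.
cat > /tmp/ai_review_sample.rs <<'EOF'
async fn run() {
    let h = tokio::spawn(async move { do_work().await });
}
EOF
patterns='tokio::spawn|unbounded_channel|block_on|static mut|std::sync::Mutex'
grep -En "$patterns" /tmp/ai_review_sample.rs   # flags the tokio::spawn line
```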


- Treat AI output as a draft, not as authority.
- Require traceable evidence for accepted behavior and design decisions.
- Keep changes small enough to review and validate.
Contributor


Suggested change
- Keep changes small enough to review and validate.
- Keep changes small enough to review and validate.
- Consider optimizing changes for human readability and reviewability.

I'm trying to suggest ways that AI can help humans here, but I could elaborate:

  • isolate large mechanical changes into separate units
  • avoid unnecessary reordering
  • do not rename "for clarity" inside unrelated changes
  • avoid repetitive blocks of code; refactor to reduce size

## Component Development Note

Each component using this approach must keep a short development note, for
example `<component-module>/DEVELOPMENT.md`.
Contributor


This made me think of #2902, it will be good to see what one of these looks like or to link to a real example (later).


## Choosing the Right Approach

| Question | Spec-Constrained Oracle | Reference-Informed Design |
Member


One small clarification: this decision matrix is useful for choosing between oracle based work and reference-informed work, but what about greenfield OTAP-native work where there is no clear reference implementation? Should the doc say that this case is outside this decision tree, or point contributors back to the normal design/review process?

| Schema-required | Required by a schema, IDL, or generated-code contract. |
| Ecosystem-required | Not fully specified, but required for interoperability. |
| Implementation artifact | Observable in the oracle, but not required. |
| Reference bug | Believed to be incorrect and intentionally not reproduced. |
Member

@lalitb lalitb May 12, 2026


This may already be implied by the evidence/documentation guidance, but “Reference bug” feels like the riskiest classification here. If we intentionally do not reproduce behavior from the reference implementation, should the doc explicitly say to write down why, for example with a spec link, upstream issue, or short note in the development note?
That would make it easier for reviewers to tell whether this is really a reference bug, and not just behavior that is hard or inconvenient to implement.


Labels

rust Pull requests that update Rust code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Document AI-Assisted Development Guidance for OTAP Dataflow

6 participants