
Document AI-assisted component development guidance #2909

Open
lquerel wants to merge 11 commits into
open-telemetry:main from
lquerel:ai-methods

Conversation

@lquerel
Contributor

@lquerel lquerel commented May 8, 2026

Change Summary

Adds AI-assisted development guidance for OTAP Dataflow Engine contributors and maintainers.

This PR introduces a concise docs/ai entry point and documents the project’s posture for responsible AI-assisted work: controlled, reviewable, evidence-based, and owned by engineers familiar with OTAP Dataflow, Rust, and OpenTelemetry.

It also clarifies the current AI-assisted guidance set:

  • AI-Assisted Component Development: overview for choosing the right approach.
    • Spec-Constrained Oracle Reimplementation: for interoperability-focused work where a reference implementation acts as an executable oracle.
    • Reference-Informed OTAP-Native Capability Design: for designing improved OTAP-native capabilities from existing implementations, feedback, and future direction.
  • AI-Assisted Pull Request Review: for human and agent reviewers, focused on OTAP architectural invariants, thread-per-core runtime behavior, bounded resources, backpressure, performance, correctness, security, portability, and test intent.

What issue does this PR close?

How are these changes tested?

  • Ran python3 tools/sanitycheck.py

Are there any user-facing changes?

Yes. This is documentation-only, but contributor-facing. It adds and updates guidance for engineers using AI-assisted workflows in OTAP Dataflow Engine development.

@lquerel lquerel requested a review from a team as a code owner May 8, 2026 17:40
@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 8, 2026
@codecov

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.02%. Comparing base (b753329) to head (ef4b011).
⚠️ Report is 22 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2909      +/-   ##
==========================================
- Coverage   86.20%   86.02%   -0.18%     
==========================================
  Files         715      720       +5     
  Lines      271673   273338    +1665     
==========================================
+ Hits       234203   235149     +946     
- Misses      36946    37665     +719     
  Partials      524      524              
Components Coverage Δ
otap-dataflow 87.18% <ø> (+0.02%) ⬆️
query_abstraction 80.61% <ø> (ø)
query_engine 89.57% <ø> (-1.16%) ⬇️
otel-arrow-go 52.45% <ø> (ø)
quiver 92.25% <ø> (ø)

@gouslu
Contributor

gouslu commented May 11, 2026

Seems like AI-assisted PR review could also be added as a skill to guide AI agents to take certain things into consideration when doing the review, and possibly the spec-constrained oracle reimplementation as well. I think this guidance is good, but it could be even more visible/accessible if there were already skills in this repository that people could quickly import to follow these AI-assisted development and review guidelines.

edit: to clarify, by skill I meant copilot/AI skills. cc: @albertlockett

Member

@albertlockett albertlockett left a comment


LGTM. The PR review guidelines I feel will be especially helpful, and I agree with @gouslu that some of these could also be framed as rust skills.

- clear ownership, lifetimes, and cancellation behavior
- idiomatic Rust without sacrificing hot-path performance
- no implicit behavior that hides failure, retries, or data loss
- justified and reviewed `unsafe` blocks
Member

@albertlockett albertlockett May 12, 2026


should we add that expect()/unwrap()/unreachable!() should also be documented with a comment about safety?
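A hedged sketch of what that convention might look like (the function and comment style here are hypothetical, not an established project rule):

```rust
// Hypothetical illustration of the suggestion above: each potential panic
// point carries a comment explaining why it cannot fire in practice.
fn parse_port(s: &str) -> u16 {
    // PANIC-SAFETY: callers validate this string against the config schema
    // before reaching this point, so parsing cannot fail here.
    s.parse::<u16>()
        .expect("port already validated by config schema")
}

fn main() {
    assert_eq!(parse_port("8080"), 8080);
    println!("ok");
}
```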

Comment thread rust/otap-dataflow/docs/ai/README.md Outdated
## Development Note and PR Strategy

Each component using either approach must keep a short development note, for
example `<component-module>/DEVELOPMENT.md`.
Member


in the future, it might be helpful to have an example

- classify preserved, changed, unsupported, rejected, or intentionally divergent
behavior
- make unsupported behavior and intentional divergences explicit
- preserve human ownership of correctness, maintainability, and security
Member


Can a human delegate this?

For example, if AI can upgrade the package versions for downstream dependencies, and there is a reasonable quality gate, would project maintainers be okay letting bots merge the PR directly?


- OTAP-native data representation and component composition
- ownership, allocation, and hot-path materialization behavior
- thread-per-core, share-nothing execution model
Member


Is there a way to verify and gate this?

More of a question for @cijothomas.


Reviewers must protect the OTAP runtime model.

Check that changes preserve the thread-per-core, share-nothing design:
Member


What about code which invokes a function from an external library?
I think there are two challenges:

  1. How would the PR reviewer know if the invoked function meets the thread-per-core expectation.
  2. Even if the function meets the expectation for now, how do we detect/prevent future drifts? (e.g., the newer version of the library might behave differently)

Member


That makes sense. For external libraries, I do not think reviewers can always know the internals just from the call site.

Maybe the doc could say that if we call an external library on the runtime path, the PR should write down the important assumption: can it block, start threads, use shared/global state, allocate a lot, or hide buffering/retries?

That note could be simple: a link to the library docs/source, a short code comment, a note in the component development doc, or a small test/benchmark if needed.

For future drift, dependency upgrades should re-check that assumption. It will not make this perfect, but it makes the risk visible for reviewers.
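The assumption note described above could be as small as a call-site comment. A hypothetical sketch, where `compress` is a local stand-in for an external library function and the library name and date are invented:

```rust
// Hypothetical shape of a call-site assumption note for an external
// dependency on the runtime path. `compress` stands in for the external
// library call; the placeholder body just inverts bytes.
fn compress(data: &[u8]) -> Vec<u8> {
    data.iter().map(|&b| b ^ 0xFF).collect()
}

fn hot_path(payload: &[u8]) -> Vec<u8> {
    // ASSUMPTION (libfoo v0.3, checked 2026-05): compress() is CPU-bound,
    // never blocks on I/O, spawns no threads, and holds no shared/global
    // state. Re-verify this note on every dependency upgrade.
    compress(payload)
}

fn main() {
    assert_eq!(hot_path(b"abc").len(), 3);
    println!("ok");
}
```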

Contributor

@jmacd jmacd left a comment


This works for me as a human. I think the framing of spec-constrained and reference-informed is useful.

I asked an AI.

My concern is that this document is not very consumable by an AI. Its prime quote: "I'm more likely to follow a tight 50-line checklist than a thorough essay."

What works well for me as an AI consumer

  • ai-assisted-pr-review.md — The architectural invariants list
    (thread-per-core, bounded resources, backpressure, no blocking in async) is
    exactly the kind of checklist of falsifiable properties I can actually
    apply. I can grep a diff for tokio::spawn, Mutex, unbounded_channel,
    block_on, etc. and have a concrete rubric. High utility.

  • The two-approach framing (Spec-Constrained Oracle Reimplementation vs.
    Reference-Informed OTAP-Native Design) is genuinely useful because it tells
    me which mode I'm in before I start. The failure mode I most often fall
    into is silently drifting between "port this faithfully" and "redesign it
    idiomatically." Having an explicit name for each mode makes it easier to
    stay in one.

  • Internal consistency and link validity — boring but real. When docs
    cross-reference cleanly, I can chase a thread without burning tool calls on
    dead ends.

What's less useful or actively risky

  • Prose-heavy framing around principles I already infer from the code. A
    lot of the "be thoughtful, preserve invariants" guidance is something a
    competent reviewer derives anyway. It costs context tokens to load and
    rarely changes my output.

  • Long documents tempt me to skim. 794 lines is past the point where I'll
    reliably internalize every rule. I'm more likely to follow a tight 50-line
    checklist than a thorough essay. The PR-review doc is closer to
    checklist-shaped; the philosophical framing docs are more at risk of being
    skimmed.

  • The script-path bug flagged in the review is a perfect example of the core
    hazard: documentation written for AI agents must be executable-correct,
    not just narratively correct. A human reviewer would mentally fix
    ./scripts/... to rust/otap-dataflow/scripts/...; an agent will dutifully
    run the wrong command and report failure. Every concrete command in
    agent-facing docs is a small contract.

What I'd want more of

  • Negative examples ("here's a diff that looks fine but violates X").
  • Exact grep/ripgrep patterns for the invariants.
  • A short "if you only read one section, read this" pointer.
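The grep/ripgrep patterns asked for here could start as a one-liner. An illustrative sketch using plain grep against a sample file; the pattern list is an assumption for demonstration, not the project's official rubric:

```shell
# Hypothetical invariant scan for changed Rust files (patterns illustrative).
# Flags constructs that often violate thread-per-core / bounded-resource rules.
cat > /tmp/ai_review_sample.rs <<'EOF'
async fn run() {
    let h = tokio::spawn(async move { do_work().await });
}
EOF
patterns='tokio::spawn|unbounded_channel|block_on|static mut|std::sync::Mutex'
grep -En "$patterns" /tmp/ai_review_sample.rs   # flags the tokio::spawn line
```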


- Treat AI output as a draft, not as authority.
- Require traceable evidence for accepted behavior and design decisions.
- Keep changes small enough to review and validate.
Contributor


Suggested change
- Keep changes small enough to review and validate.
- Keep changes small enough to review and validate.
- Consider optimizing changes for human readability and reviewability.

I'm trying to suggest ways that AI can help humans here, but I could elaborate:

  • isolate large mechanical changes into separate units
  • avoid unnecessary reordering
  • do not rename "for clarity" inside unrelated changes
  • avoid repetitive blocks of code; refactor to reduce size

## Component Development Note

Each component using this approach must keep a short development note, for
example `<component-module>/DEVELOPMENT.md`.
Contributor


This made me think of #2902, it will be good to see what one of these looks like or to link to a real example (later).


## Choosing the Right Approach

| Question | Spec-Constrained Oracle | Reference-Informed Design |
Member


One small clarification: this decision matrix is useful for choosing between oracle based work and reference-informed work, but what about greenfield OTAP-native work where there is no clear reference implementation? Should the doc say that this case is outside this decision tree, or point contributors back to the normal design/review process?

| Schema-required | Required by a schema, IDL, or generated-code contract. |
| Ecosystem-required | Not fully specified, but required for interoperability. |
| Implementation artifact | Observable in the oracle, but not required. |
| Reference bug | Believed to be incorrect and intentionally not reproduced. |
Member

@lalitb lalitb May 12, 2026


This may already be implied by the evidence/documentation guidance, but “Reference bug” feels like the riskiest classification here. If we intentionally do not reproduce behavior from the reference implementation, should the doc explicitly say to write down why, for example with a spec link, upstream issue, or short note in the development note?
That would make it easier for reviewers to tell whether this is really a reference bug, and not just behavior that is hard or inconvenient to implement.


Labels

rust Pull requests that update Rust code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Document AI-Assisted Development Guidance for OTAP Dataflow

6 participants