Document AI-assisted component development guidance #2909
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2909 +/- ##
==========================================
- Coverage 86.20% 86.02% -0.18%
==========================================
Files 715 720 +5
Lines 271673 273338 +1665
==========================================
+ Hits 234203 235149 +946
- Misses 36946 37665 +719
Partials 524 524
Seems like AI-assisted PR review could also be added as a skill to guide AI agents to take certain things into consideration when doing the review, and possibly the spec-constrained oracle reimplementation as well. I think this guidance is good, but it could be even more visible/accessible if there were already skills in this repository that people could quickly import to follow these AI-assisted development and review guidelines. edit: to clarify, by skill I meant copilot/AI skills. cc: @albertlockett
albertlockett
left a comment
LGTM. The PR review guidelines I feel will be especially helpful, and I agree with @gouslu that some of these could also be framed as rust skills.
> - clear ownership, lifetimes, and cancellation behavior
> - idiomatic Rust without sacrificing hot-path performance
> - no implicit behavior that hides failure, retries, or data loss
> - justified and reviewed `unsafe` blocks
Should we add that `expect()`/`unwrap()`/`unreachable!()` should also be documented with a comment about safety?
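For illustration, a hedged sketch of what such a comment could look like. Everything here (the function, the route map, and the config-validation rationale) is hypothetical, not taken from this repository:

```rust
use std::collections::HashMap;

/// Looks up a route id for a signal name on the hot path.
fn lookup_route(routes: &HashMap<&str, u32>, key: &str) -> u32 {
    // PANIC-SAFETY: `key` comes from the pipeline config, which is validated
    // at startup and rejected if it references an unknown route, so this
    // lookup cannot fail at runtime (hypothetical invariant for this sketch).
    *routes
        .get(key)
        .expect("route validated at config load; see config validation")
}

fn main() {
    let routes = HashMap::from([("logs", 1u32), ("traces", 2u32)]);
    assert_eq!(lookup_route(&routes, "logs"), 1);
}
```

The point is that the comment names the invariant that makes the panic unreachable, so a reviewer can check the claim instead of taking the `expect` on faith.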
> ## Development Note and PR Strategy
>
> Each component using either approach must keep a short development note, for
> example `<component-module>/DEVELOPMENT.md`.
In the future, it might be helpful to have an example.
> - classify preserved, changed, unsupported, rejected, or intentionally divergent
>   behavior
> - make unsupported behavior and intentional divergences explicit
> - preserve human ownership of correctness, maintainability, and security
Can humans delegate this? For example, if AI can upgrade the package versions for downstream dependencies, and there is a reasonable quality gate, would project maintainers be okay letting bots merge the PR directly?
> - OTAP-native data representation and component composition
> - ownership, allocation, and hot-path materialization behavior
> - thread-per-core, share-nothing execution model
Is there a way to verify and gate this?
More of a question for @cijothomas.
> Reviewers must protect the OTAP runtime model.
>
> Check that changes preserve the thread-per-core, share-nothing design:
What about code which invokes a function from an external library?
I think there are two challenges:
- How would the PR reviewer know if the invoked function meets the thread-per-core expectation?
- Even if the function meets the expectation for now, how do we detect/prevent future drifts? (e.g., the newer version of the library might behave differently)
That makes sense. For external libraries, I do not think reviewers can always know the internals just from the call site.
Maybe the doc could say that if we call an external library on the runtime path, the PR should write down the important assumption: can it block, start threads, use shared/global state, allocate a lot, or hide buffering/retries?
That note could be simple: a link to the library docs/source, a short code comment, a note in the component development doc, or a small test/benchmark if needed.
For future drift, dependency upgrades should re-check that assumption. It will not make this perfect, but it makes the risk visible for reviewers.
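As a sketch of that convention, the assumption note could live right on the wrapper around the external call. Everything below is illustrative: `encode_on_hot_path` is a stand-in for a real external-crate call, and the listed assumptions are examples of what a PR might record:

```rust
/// Runtime-path call into an "external" routine (stand-in for an external
/// crate; in a real PR this comment would link the library docs/source).
///
/// ASSUMPTIONS (re-check on every dependency upgrade):
/// - CPU-bound only: no blocking I/O and no internal threads
/// - no shared/global mutable state
/// - allocates only the output buffer; no hidden buffering or retries
fn encode_on_hot_path(input: &[u8]) -> String {
    // Stand-in body: hex-encode the input.
    input.iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    assert_eq!(encode_on_hot_path(b"ab"), "6162");
}
```

A dependency-upgrade PR would then diff against this comment: if the new library version can block or spawn threads, the assumption list is where the drift becomes visible to reviewers.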
jmacd
left a comment
This works for me as a human. I think the framing of spec-constrained and reference-informed is useful.
I asked an AI.
My concern is that this document is not very consumable by an AI. Its prime quote: "I'm more likely to follow a tight 50-line checklist than a thorough essay."
What works well for me as an AI consumer

- `ai-assisted-pr-review.md` — The architectural invariants list
  (thread-per-core, bounded resources, backpressure, no blocking in async) is
  exactly the kind of checklist of falsifiable properties I can actually
  apply. I can grep a diff for `tokio::spawn`, `Mutex`, `unbounded_channel`,
  `block_on`, etc. and have a concrete rubric. High utility.
- The two-approach framing (Spec-Constrained Oracle Reimplementation vs.
  Reference-Informed OTAP-Native Design) is genuinely useful because it tells
  me which mode I'm in before I start. The failure mode I most often fall
  into is silently drifting between "port this faithfully" and "redesign it
  idiomatically." Having an explicit name for each mode makes it easier to
  stay in one.
- Internal consistency and link validity — boring but real. When docs
  cross-reference cleanly, I can chase a thread without burning tool calls on
  dead ends.
What's less useful or actively risky

- Prose-heavy framing around principles I already infer from the code. A lot
  of the "be thoughtful, preserve invariants" guidance is something a
  competent reviewer derives anyway. It costs context tokens to load and
  rarely changes my output.
- Long documents tempt me to skim. 794 lines is past the point where I'll
  reliably internalize every rule. I'm more likely to follow a tight 50-line
  checklist than a thorough essay. The PR-review doc is closer to
  checklist-shaped; the philosophical framing docs are more at risk of being
  skimmed.
- The script-path bug flagged in the review is a perfect example of the core
  hazard: documentation written for AI agents must be executable-correct, not
  just narratively correct. A human reviewer would mentally fix
  `./scripts/...` → `rust/otap-dataflow/scripts/...`; an agent will dutifully
  run the wrong command and report failure. Every concrete command in
  agent-facing docs is a small contract.
What I'd want more of
- Negative examples ("here's a diff that looks fine but violates X").
- Exact grep/ripgrep patterns for the invariants.
- A short "if you only read one section, read this" pointer.
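To make the "negative example" ask concrete, here is a hedged sketch (not from this repository) of the kind of diff that compiles, passes tests, and still violates the thread-per-core, share-nothing model by introducing cross-core shared mutable state:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Looks like an innocent shared metrics counter, but it introduces
// cross-thread shared mutable state: every worker now contends on one
// lock on the hot path, breaking the share-nothing execution model.
fn shared_counter_antipattern() -> u64 {
    let hits = Arc::new(Mutex::new(0u64)); // RED FLAG: state shared across cores
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                for _ in 0..1000 {
                    *hits.lock().unwrap() += 1; // hot-path lock contention
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *hits.lock().unwrap();
    total
}

fn main() {
    // The result is correct, which is exactly why review has to catch it:
    // the violation is architectural, not functional.
    assert_eq!(shared_counter_antipattern(), 4000);
}
```

Patterns like `Mutex`, `Arc::new`, and `thread::spawn` appearing in a dataflow-path diff are exactly the grep targets a checklist could name.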
> - Treat AI output as a draft, not as authority.
> - Require traceable evidence for accepted behavior and design decisions.
> - Keep changes small enough to review and validate.
Suggested change:

> - Keep changes small enough to review and validate.
> - Consider optimizing changes for human readability and reviewability.
I'm trying to suggest ways that AI can help humans here, but I could elaborate:
- isolate large mechanical changes into separate units
- avoid unnecessary reordering
- do not rename "for clarity" inside unrelated changes
- avoid repetitive blocks of code; refactor to reduce size
> ## Component Development Note
>
> Each component using this approach must keep a short development note, for
> example `<component-module>/DEVELOPMENT.md`.
This made me think of #2902; it will be good to see what one of these looks like, or to link to a real example (later).
> ## Choosing the Right Approach
>
> | Question | Spec-Constrained Oracle | Reference-Informed Design |
One small clarification: this decision matrix is useful for choosing between oracle based work and reference-informed work, but what about greenfield OTAP-native work where there is no clear reference implementation? Should the doc say that this case is outside this decision tree, or point contributors back to the normal design/review process?
> | Schema-required | Required by a schema, IDL, or generated-code contract. |
> | Ecosystem-required | Not fully specified, but required for interoperability. |
> | Implementation artifact | Observable in the oracle, but not required. |
> | Reference bug | Believed to be incorrect and intentionally not reproduced. |
This may already be implied by the evidence/documentation guidance, but “Reference bug” feels like the riskiest classification here. If we intentionally do not reproduce behavior from the reference implementation, should the doc explicitly say to write down why, for example with a spec link, upstream issue, or short note in the development note?
That would make it easier for reviewers to tell whether this is really a reference bug, and not just behavior that is hard or inconvenient to implement.
Co-authored-by: albertlockett <[email protected]>
Change Summary
Adds AI-assisted development guidance for OTAP Dataflow Engine contributors and maintainers.
This PR introduces a concise `docs/ai` entry point and documents the project's posture for responsible AI-assisted work: controlled, reviewable, evidence-based, and owned by engineers familiar with OTAP Dataflow, Rust, and OpenTelemetry.

It also clarifies the current AI-assisted guidance set:

- AI-Assisted Component Development: overview for choosing the right approach.
- Spec-Constrained Oracle Reimplementation: for interoperability-focused work where a reference implementation acts as an executable oracle.
- Reference-Informed OTAP-Native Capability Design: for designing improved OTAP-native capabilities from existing implementations, feedback, and future direction.
- AI-Assisted Pull Request Review: for human and agent reviewers, focused on OTAP architectural invariants, thread-per-core runtime behavior, bounded resources, backpressure, performance, correctness, security, portability, and test intent.

What issue does this PR close?
How are these changes tested?

`python3 tools/sanitycheck.py`

Are there any user-facing changes?
Yes. This is documentation-only, but contributor-facing. It adds and updates guidance for engineers using AI-assisted workflows in OTAP Dataflow Engine development.