Conversation
Coverage ReportCreated: 2026-03-13 16:27Click here for information about interpreting this report.
Generated by llvm-cov -- llvm version 18.1.3 |
|
Really great to see true declarative tracing! I ended up building my own entire trace injection toolkit for running tests with arbitrary tracing (tools/trace-*.py in xdna-emu -- fair warning, the parent project is a messy work-in-progress), so it's lovely to have the capability in mlir-aie proper. Would love to contribute here. Some things I can help with:
Let me know if contributions are welcome, and if so, what scope. Just don't want to step on anyone's toes -- this project has been evolving incredibly rapidly recently. |
|
Hi @FIM43-Redeye, Thank you for your interest in the repo! I think the enhancements you're proposing here are very interesting and useful.
Contributions on trace enhancements are more than welcome, so you are definitely not stepping on anyone's toes! Please feel free to open PRs whenever you're ready. I will also let others with deeper trace expertise weigh in. @jackl-xilinx @fifield James |
…e lowered through trace passes. - Added auto-detection of packet type based on tile when not defined in `aie.trace`. - Fixed field mode emission. Only core trace control registers have a mode field. - Updated expected broadcast event values to actual hardware event codes - Update manual `aiex.npu.write32` trace configuration in test/npu-xrt/vec_mul_event_trace/aie.mlir and programming_examples/basic/event_trace/aie_trace.mlir to declarative `aie.trace` ops - Updated examples to feature trace config for all 4 options: coretile, core_mem, memtile, shimtile. - Register -aie-insert-trace-flows to aiecc. - TODO: Future PR to update python to use `aie.trace` bindings in python/utils/trace.
d70bdb5 to
e9c2a06
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds a new AIE MLIR transform pass to automatically insert trace packet flows (trace source → shim DMA) and the required runtime-sequence NPU operations for trace capture, so users no longer need to manually author low-level npu.write32/DMA setup for tracing. It also updates the trace lowering pipeline, examples, and tests to use aie.trace + aie.trace.start_config, and tightens trace-to-config lowering for broadcast events and mode-field emission.
Changes:
- Add
-aie-insert-trace-flowspass and wire it into theaiecctrace-lowering pipeline before-aie-trace-to-config. - Update
AIETraceToConfigbroadcast lowering (broadcast channel → HW event ID) and prevent emittingModefor non-core trace modules. - Refresh tests/examples to use declarative
aie.traceand validate end-to-end trace output.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/aiecc/aiecc.cpp | Adds the new trace-flow insertion pass into the existing trace lowering pipeline. |
| lib/Dialect/AIE/Transforms/AIEInsertTraceFlows.cpp | Implements the new pass that inserts aie.packet_flow ops and runtime-sequence NPU trace setup/teardown. |
| lib/Dialect/AIE/Transforms/AIETraceToConfig.cpp | Resolves broadcast start/stop to hardware event IDs and restricts Mode field emission to core traces. |
| lib/Dialect/AIE/Transforms/CMakeLists.txt | Adds the new pass source file to the build. |
| include/aie/Dialect/AIE/Transforms/AIEPasses.td | Declares the new pass, its options, and documentation. |
| include/aie/Dialect/AIE/Transforms/AIEPasses.h | Exposes createAIEInsertTraceFlowsPass() factory. |
| test/dialect/AIE/trace/test_insert_trace_flows_simple.mlir | New lit test for basic packet-flow insertion. |
| test/dialect/AIE/trace/test_insert_trace_flows_multiple.mlir | New lit test covering multiple trace sources/IDs. |
| test/dialect/AIE/trace/test_trace_to_config.mlir | Updates FileCheck expectations for resolved broadcast event IDs. |
| test/Dialect/AIE/trace/test_trace_port_to_config.mlir | Updates FileCheck expectations for resolved broadcast event IDs. |
| test/Dialect/AIE/combo_edge/test_combo_edge_full.mlir | Updates FileCheck expectations for resolved broadcast event IDs. |
| test/npu-xrt/vec_mul_event_trace/aie.mlir | Converts the end-to-end XRT test to declarative aie.trace across core/mem/memtile/shim. |
| test/npu-xrt/vec_mul_event_trace/test.py | Updates to an NPU2 end-to-end run and validates events from multiple trace types. |
| test/npu-xrt/vec_mul_event_trace/vector_scalar_mul.cc | Updates copyright header. |
| programming_examples/basic/event_trace/aie_trace.py | Updates example generator to use memtile forwarding and expanded trace coverage. |
| programming_examples/basic/event_trace/aie_trace.mlir | Updates MLIR example similarly (memtile forwarding + multiple trace types). |
| programming_examples/basic/event_trace/README.md | Documents the updated trace lowering pipeline including the new pass. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let constructor = "xilinx::AIE::createAIEInsertTraceFlowsPass()"; | ||
|
|
||
| let dependentDialects = [ | ||
| "xilinx::AIE::AIEDialect", |
| "Minimize number of shim tiles used (prefer one shim for all traces)">, | ||
| Option<"preferSameColumn", "prefer-same-column", "bool", "true", | ||
| "When choosing shim, prefer same column as trace sources"> |
| // CHECK: aie.trace @core_trace | ||
| aie.trace @core_trace(%tile02) { | ||
| aie.trace.mode "Event-Time" | ||
| aie.trace.packet id=1 type="core" |
| std::set<std::pair<int, int>> processedTiles; // (col, row) | ||
| for (auto &info : traceInfos) { | ||
| int col = info.tile.getCol(); | ||
| int row = info.tile.getRow(); | ||
|
|
||
| if (processedTiles.find({col, row}) != processedTiles.end()) | ||
| continue; | ||
| processedTiles.insert({col, row}); | ||
|
|
||
| // Compute timer control address | ||
| uint32_t timerCtrlAddr = computeTimerCtrlAddress( | ||
| info.tile, targetModel, info.packetType == TracePacketType::Mem); |
| // Compute timer control address | ||
| uint32_t timerCtrlAddr = computeTimerCtrlAddress( | ||
| info.tile, targetModel, info.packetType == TracePacketType::Mem); | ||
|
|
||
| // Timer control value: BROADCAST_15 event (122 << 8 = 31232) | ||
| uint32_t timerCtrlValue = 31232; // Event 122 (BROADCAST_15) << 8 | ||
|
|
||
| builder.create<xilinx::AIEX::NpuWrite32Op>( | ||
| runtimeSeq.getLoc(), timerCtrlAddr, timerCtrlValue, nullptr, | ||
| builder.getI32IntegerAttr(col), builder.getI32IntegerAttr(row)); | ||
| } |
| aie.trace.packet id=1 type="core" | ||
| aie.trace.event<"INSTR_VECTOR"> | ||
| aie.trace.start broadcast=15 | ||
| } | ||
|
|
||
| // Mem trace on tile (0,2) | ||
| aie.trace @mem_trace_02(%tile02) { | ||
| aie.trace.packet id=2 type="mem" | ||
| aie.trace.event<"DMA_S2MM_0_START_TASK"> | ||
| aie.trace.start broadcast=15 | ||
| } | ||
|
|
||
| // Core trace on tile (0,3) | ||
| aie.trace @core_trace_03(%tile03) { | ||
| aie.trace.packet id=3 type="core" |
| } else { | ||
| // Fallback: Use one shim for all traces (column 0) | ||
| int targetCol = 0; | ||
| TileOp shimTile = nullptr; | ||
| for (auto tile : device.getOps<TileOp>()) { | ||
| if (tile.getCol() == targetCol && tile.getRow() == 0) { | ||
| shimTile = tile; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if (!shimTile) { | ||
| builder.setInsertionPointToStart(&device.getRegion().front()); | ||
| shimTile = builder.create<TileOp>(device.getLoc(), targetCol, 0); | ||
| } | ||
|
|
||
| ShimInfo shimInfo; | ||
| shimInfo.shimTile = shimTile; | ||
| shimInfo.channel = shimChannel; | ||
| shimInfo.bdId = defaultBdId; | ||
| shimInfo.argIdx = traceArgIdx; | ||
| shimInfo.traceSources = traceInfos; | ||
| shimInfos[targetCol] = shimInfo; | ||
| } | ||
|
|
||
| // Phase 3: Insert packet flows | ||
| // Insert before the device terminator | ||
| Block &deviceBlock = device.getRegion().front(); | ||
| builder.setInsertionPoint(deviceBlock.getTerminator()); | ||
|
|
||
| for (auto &info : traceInfos) { | ||
| // Find target shim for this trace | ||
| int col = info.tile.getCol(); | ||
| ShimInfo &shimInfo = shimInfos[col]; | ||
|
|
| // Find packet ID and type from trace body | ||
| std::optional<int> packetId; | ||
| std::optional<TracePacketType> packetType; | ||
| for (auto &op : trace.getBody().getOps()) { | ||
| if (auto packetOp = dyn_cast<TracePacketOp>(op)) { | ||
| packetId = packetOp.getId(); | ||
| packetType = packetOp.getType(); | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| // Determine packet type from tile type if not specified | ||
| if (!packetType) { | ||
| if (tile.isShimTile()) { | ||
| packetType = TracePacketType::ShimTile; | ||
| } else if (tile.isMemTile()) { | ||
| packetType = TracePacketType::MemTile; | ||
| } else { | ||
| // Core tile defaults to core type | ||
| packetType = TracePacketType::Core; | ||
| } | ||
| } | ||
|
|
||
| // Allocate packet ID if not specified | ||
| if (!packetId) { | ||
| if (usedPacketIds.find(col) == usedPacketIds.end()) { | ||
| usedPacketIds[col] = nextPacketId; | ||
| } | ||
| packetId = usedPacketIds[col]++; | ||
| } | ||
|
|
| // Timer control value: BROADCAST_15 event (122 << 8 = 31232) | ||
| uint32_t timerCtrlValue = 31232; // Event 122 (BROADCAST_15) << 8 | ||
|
|
||
| builder.create<xilinx::AIEX::NpuWrite32Op>( | ||
| runtimeSeq.getLoc(), timerCtrlAddr, timerCtrlValue, nullptr, | ||
| builder.getI32IntegerAttr(col), builder.getI32IntegerAttr(row)); | ||
| } |
| // 4c. Write buffer descriptor | ||
| builder.create<xilinx::AIEX::NpuWriteBdOp>( | ||
| runtimeSeq.getLoc(), | ||
| shimCol, // column | ||
| shimInfo.bdId, // bd_id | ||
| traceBufferSize, // buffer_length | ||
| 0, // buffer_offset | ||
| 1, // enable_packet | ||
| 0, // out_of_order_id | ||
| 0, // packet_id (not used for reception) | ||
| 0, // packet_type (not used for reception) | ||
| 0, 0, 0, 0, 0, | ||
| 0, // d0_size, d0_stride, d1_size, d1_stride, d2_size, d2_stride | ||
| 0, 0, 0, // iteration_current, iteration_size, iteration_stride | ||
| 0, // next_bd | ||
| 0, // row | ||
| 0, // use_next_bd | ||
| 1, // valid_bd | ||
| 0, 0, 0, 0, 0, // lock_rel_val, lock_rel_id, lock_acq_enable, | ||
| // lock_acq_val, lock_acq_id | ||
| 0, 0, 0, 0, 0, 0, // d0_zero_before, d1_zero_before, d2_zero_before, | ||
| // d0_zero_after, d1_zero_after, d2_zero_after | ||
| traceBurstLength // burst_length | ||
| ); |
I have some examples + draft code for parsing the other two trace modes (event-pc and execution-trace) which I used as a quick proof of life check when developing the declarative trace ops. I'll try to dig it out and stick it in a branch if other folks want to take a look. |
|
@yenjames Thanks for the warm welcome! I'll start with FileCheck tests for multi-tile scenarios -- that's low-friction and immediately useful for your coverage numbers. @fifield That would be incredibly valuable! I've been reverse-engineering the event-pc and execution-trace formats from raw hardware captures on my Phoenix NPU and have gotten reasonably far -- event-pc cross-references against the ELF, execution mode appears to be ARM CoreSight ETM-style atom encoding -- but I've been working without any format specification, so I'd love to compare notes against your parser to see where I've gotten things right vs wrong. For broader context: I'm building an open-source cycle-accurate NPU emulator (xdna-emu dev branch) intended as both a learning tool and a verification aid -- so documentation on any aspect of the NPU is hugely useful to me, not just trace formats. I've been deriving behavior from the open-source toolchain (aie-rt, mlir-aie, llvm-aie) and hardware observation, which has gotten me surprisingly far, but I'll be honest -- I'd much rather work from open documentation than reverse-engineer proprietary tooling. It's not just inefficient, it makes me genuinely uncomfortable. AM020/AM025 cover register layouts but not things like trace wire formats or detailed DMA pipeline behavior. If there's any documentation that can be shared, even informally, it would save a lot of effort. And if not, I completely understand -- I'll keep working from hardware observations and the toolchain source. |
Not ready for review, working on copilot suggestions and increasing test coverage
Summary
This PR implements a new MLIR pass to automatically insert trace packet flows and runtime sequence configuration for AIE trace operations. Following #2705, this should eliminate the need for manual trace
npu.write32configurations.Changes
test/npu-xrt/vec_mul_event_traceto be the full end to end trace test. Checks trace output after hardware run.aie.trace.packettype is not specified, the pass auto-detects it from the tile type. For core tiles, it defaults to core instead of mem.AIETraceToConfig.cppwhich incorrectly allows MemTile and ShimTiles to be configured with a Mode field in their trace. Based on my understanding, onlycoremodule has the Mode field in its trace control register.Example
What user writes:
--aie-insert-trace-flows generates:
aie.traceis lowered to anaie.trace.configsequence of register writes, then tonpu.write32(Implemented in #2705):Follow-up PR
python/utils/trace/setup.pyto emitaie.traceops instead of directnpu_write32calls.