Open
Conversation
…g_examples Code bugs (would cause failures during tutorial): - Fix SAXPY data_size 2048->4096 to match C++ kernel and README - Fix undefined variable: section-2e loop `for _ in range_` -> `for i in range_` - Fix section-3 Worker construction: add missing comma, correct variable names (of_in1/of_out1 -> of_in/of_out to match aie2.py) - Fix section-2g invalid DMA direction: SS2M -> S2MM - Fix section-2g buffer() calls: add missing datatype= keyword - Fix section-4c make command: `int_bit_width=32 trace` -> `make int_bit_width=32 trace` - Fix vector_exp README: `make text_exp.exe` -> `make vector_exp.exe` Navigation bugs: - Fix 34+ broken image links: mlir_tutorials -> mlir_exercises (9 files) - Fix section-2d/DMATasks.md and RuntimeTasks.md: retitle from Section 2g to Section 2d and correct TOC ordering - Fix section-2e broken link: 03_Link_Distribute_Join -> 03_Implicit_Copy - Fix section-3 broken nav link: ../section4/section-4b -> ../section-4/section-4b - Fix section-2f L2 README: correct filename ext_to_coreL2.py/ext_to_core.py -> ext_to_core_L2.py - Fix section-4b README-placed.md self-referential Up link: ../../section-4b -> ../ Wrong/misleading content: - Remove reference to non-existent passThrough.cc from passthrough_pykernel README - Fix scale_shift README title: "Eltwise Multiplication" -> "Scale Shift" - Fix eltwise_add README: 3x "addtiplication" -> "addition" - Update ml/README.md to include all 20 ML examples (was only listing 8) - Add algorithms/ directory to programming_examples/README.md - Fix vector_reduce_max_1col.py assert comment: "4 bytes = 1 integer" -> "4 bytes = 2 bfloat16 elements" Typos: Peformance, becuase, strides/strides, intialized, synchornization, iniitliaze, addd, additionl/descrbied/belows, evnet, incorect, colshift spacing, prase_trace.py (2 files), abtracts, candiate, fucntion, desing (3 files), multiplcation, Defintions, techncially, dimenioni/dimenion, chanenls Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add Prerequisites section to getting_started/README.md listing required
hardware, software, packages, and links to setup docs (rec 2)
- Fix vector_reduce_max reference: initialize ref_max to bfloat16('-inf')
instead of 0 so all-negative inputs are handled correctly (rec 4)
- Fix saxpy.py validation: only print on error, not on every correct element;
suppress 4096 lines of noise on a passing run (rec 5)
- Remove dead imports from saxpy.py (range_, TensorTiler2D, TensorAccessPattern,
Kernel, jit — none used in this file) (rec 6)
- Remove dead REL_WRITE/REL_READ macro definitions from saxpy.cc; add
explanatory comment to saxpy_scalar marking it as a readable reference
implementation (rec 6)
- Add inline comments to TensorAccessPattern calls in memcpy.py and
matrix_multiplication_single_core.py explaining tiler_shape/offset/sizes/
strides dimension ordering (rec 7)
- Rename start_core_body -> final_core_body in vector_reduce_max_1col.py;
the old name implied it ran first when it is actually the terminal node
in the cascade that produces the final output (rec 9)
- Add links to programming_guide sections (section-2b, section-2c, section-2f)
from files that use ObjectFIFO split(), data layout transforms, and
multi-level memory hierarchy patterns (rec 11)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
mini_tutorial: - Rename aie2p.py -> aie2.py; the file auto-detects hardware via iron.get_current_device() and runs on both npu1 and npu2 — the old name misleadingly implied AIE2P/npu2-only targeting - Update run.lit and README.md references accordingly - Add hardware prerequisites note to README.md: requires Ryzen AI NPU (any generation), XRT, and mlir-aie environment; note that @iron.jit handles device detection automatically section-1/README.md: - Fix LocalBuffer -> Buffer; LocalBuffer does not exist in the IRON API (readers copying the snippet would get a NameError) section-2a/README.md: - Fix 2x "refered" -> "referred" section-2d/RuntimeTasks.md: - Fix unused loop variable: `for groups in [0, 1]` -> `for _ in [0, 1]` section-2d/DMATasks.md: - Update Conclusion to acknowledge both npu_dma_memcpy_nd/dma_wait and the newer shim_dma_single_bd_task/dma_await_task/dma_free_task APIs; note that dma_task operations are preferred for new designs section-2e/aie2.py: - Fix file header comment: said section-2d/aie2.py, is in section-2e section-2f/02_external_mem_to_core/ext_to_core.py: - Fix file header comment: said single_buffer.py, is ext_to_core.py section-3/README.md: - Fix undefined variable: tile_size = data_size // 4 -> tensor_size // 4 - Fix kernel snippet parameter names: a_in/c_out -> a/c to match the actual vector_scalar_mul.cc source (body used c[i] and a[i] which were undeclared under the old parameter names) section-4a/README.md: - Fix C++ syntax: "imported via `import <chrono>`" -> "included via `#include <chrono>`" section-4c/README.md: - Reconcile cycle count: update prose, efficiency table, and load/store analysis from 72 -> 78 cycles to match the exercise answer tooltip, which reflects a more recent hardware measurement; update derived efficiency percentages accordingly quick_reference.md: - Remove duplicate "AIE2 - Table of supported data types and vector sizes" link that appeared in both "Helpful References" and "AIE Detailed References" sections Co-Authored-By: Claude Opus 4.6 <[email protected]>
The prose said 72 cycles while the exercise answer tooltip said 78. Rather than picking one without running the design, acknowledge that results vary by hardware revision and compiler version (~72-78 cycles). Update all derived efficiency values and prose accordingly. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The conclusion now acknowledges both the npu_dma_memcpy_nd/dma_wait and dma_task interfaces without asserting one is preferred — that claim was not grounded in any project documentation. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Restore original 72-cycle / 44% / 22% values throughout — these are the numbers the original author measured and the ones consistent with the rest of the document's worked examples. The discrepancy with the exercise tooltip (78 cycles) should be resolved by someone running the design, not by editing the prose to hedge. Retain the two valid fixes from the earlier pass: - mlir_tutorials -> mlir_exercises image links - missing `make` in `make clean; int_bit_width=32 trace` command Co-Authored-By: Claude Opus 4.6 <[email protected]>
…mples Fix 1 - basic/passthrough_kernel/README.md: Correct stale filename: 'aie2.py' -> 'passthrough_kernel.py' Fix 2 - getting_started/01_SAXPY/saxpy.cc + saxpy.py: Document the hardcoded N=4096 loop bound in both saxpy() and saxpy_scalar(); add matching comment in saxpy.py main() warning that changing data_size without updating the C++ kernel will produce silently incorrect results Fix 3 - ml/gelu/README.md, ml/silu/README.md, ml/swiglu/README.md: Write READMEs for all three missing ML activation function examples; all three are listed in ml/README.md's table but had no documentation Fix 5 - basic/README.md: Correct 'matrix * vector' -> 'adds a scalar constant' for matrix_scalar_add; expand example list from 14 to all 29 subdirectories, including chaining_channels, combined_transpose, dma_transpose_packet, event_trace, packet_switch, passthrough_pykernel, row_wise_bias_add, shuffle_transpose, tiling_exploration, and others not previously listed; add VCK5000 hardware note for passthrough_dmas_plio Fix 6 - basic/passthrough_dmas_plio/README.md: Add prominent hardware note at top: targets VCK5000, not Ryzen AI NPU Fix 7 - ml/relu, ml/eltwise_add, ml/scale_shift, basic/vector_scalar_mul: Replace boilerplate 'verifies the memcpy results' with accurate per-design descriptions of what each testbench actually verifies Fix 8 - vision/edge_detect/README.md, vision/color_threshold/README.md: Replace deprecated 'OrderedObjectBuffer (OOB)' terminology with current 'ObjectFIFO' throughout both READMEs Fix 9 - mlir/README.md: Write top-level README for the mlir/ category; explains that these examples target Versal hardware (not Ryzen AI NPU), use raw MLIR rather than the Python IRON API, and describes each of the 5 examples Fix 10 - basic/memcpy/README.md: Add prominent exercise note at top making explicit that the runtime sequence is intentionally unoptimized; link to the reference solution in getting_started/00_memcpy/ Co-Authored-By: Claude Opus 4.6 <[email protected]>
black reformatted vector_reduce_max_1col.py: wrapped a long assert statement to fit within the line length limit. All other changed Python files (memcpy.py, saxpy.py, matrix_multiplication_single_core.py, aie2.py x3) were already compliant. saxpy.cc was already compliant with clang-format. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…python/utils Code bugs fixed: - buffer.py: Buffer.shape and Buffer.dtype referenced self._obj_type which does not exist; fixed to self._arr_type - buffer.py: __getitem__ and __setitem__ used `return AttributeError(...)` instead of `raise AttributeError(...)`; now raises correctly - worker.py: error message said "per works" instead of "per worker" - npukernel.py: __init__ received device_index but never stored it; added self._device_index = device_index - objectfifo.py: error message said "no consumers were not created" (double negative); fixed to "no consumers were created" - objectfifo.py: ObjectFifoLink.__init__ error messages said "Then number of" (two occurrences); fixed to "The number of" Placeholder text removed (would render literally on GitHub Pages): - worker.py: Worker class docstring had _summary_ placeholder; replaced with proper description - worker.py: orphaned floating string literal before __init__ removed - resolvable.py: NotResolvedError docstring had _type_ and _description_ placeholders; replaced with accurate description - objectfifo.py: ObjectFifo.__init__ plio arg and Raises both had _description_ placeholders; replaced with real text - objectfifo.py: floating string literal converted to comment - runtime/runtime.py: sequence() Yields had _type_ placeholder; fixed runtime/runtime.py: finish_task_group() had _description_ placeholder; fixed - buffer.py: floating string literal converted to comment - runtime/runtime.py: floating string literal converted to comment Inaccurate docstrings corrected: - runtime/runtime.py: __init__ documented parameter as check_task_groups but actual name is strict_task_groups; also fixed wrong -> Runtime return annotation to -> None - runtime/runtime.py: drain() tap arg said "sending it to the in_fifo"; fixed to "reading from the out_fifo"; removed duplicate "Defaults to None" - runtime/runtime.py: enable_trace() had "Enable trace." as its entire docstring with invalid [] type annotations; now fully documented with all 7 parameters and correct list | None annotations - runtime/endpoint.py: __init__ had wrong -> RuntimeEndpoint return annotation; fixed to -> None - runtime/data.py: arr_type property had -> np.ndarray return annotation; fixed to -> type[np.ndarray] - program.py: __init__ docstring claimed verify() is called there; it is called in resolve_program(); corrected - device/device.py: is_mem_accessible Returns said "int: Number of connections"; it returns bool; corrected - resolvable.py: Resolvable.resolve first param named cls but is an instance method; renamed to self - placeable.py: tile property Returns omitted | None; corrected - objectfifo.py: "overriden" -> "overridden" (2x), "callled" -> "called" Documentation added: - Module docstrings added to all 30 changed files (previously absent from all) - Worker.__init__: added trace_events parameter to Args section - WorkerRuntimeBarrier.__init__: added docstring - _BarrierSetOp.__init__: added docstring - FinishTaskGroupTask.__init__, RuntimeStartTask.__init__: added docstrings - DMATask: added class docstring - Program: added class docstring - SequentialPlacer.__init__: added docstring - SequentialPlacer.make_placement: added docstring (was undocumented override) - Device.rows, Device.cols: added property docstrings - Device.get_num_*_connections (x4): added Args section for t parameter - Tile.__init__: added docstring - transform(), transform_binary(), transform_parallel(), transform_parallel_binary(): expanded from one-liners to full Args/Returns - for_each(): fixed example (tile_size was positional, should be keyword); added Returns section - controlflow.py: documented range_ semantics (NPU loop vs Python unroll) - dtype.py: fixed stale filename comment (config.py -> dtype.py); added comment explaining dtype_map entries - buffer.py: fixed stale filename comment (globalbuffer.py -> buffer.py) - npukernel.py: fixed __call__ Returns which referenced undefined KernelResult - compile/utils.py: fixed "xbclbin" typo to "xclbin" - runtime/taskgroup.py: fixed "to indicated" -> "to indicate" - runtime/dmatask.py: fixed "shoudl" -> "should" - placers.py: fixed "informatio" -> "information" Co-Authored-By: Claude Opus 4.6 <[email protected]>
…tion - get_num_connections and is_mem_accessible had their Returns: sections swapped; restore each to match its actual return type and semantics - programming_examples/README.md described the algorithms/ directory as containing "sorting, searching" patterns; replace with an accurate description of the element-wise transform/for_each templates it actually contains Co-Authored-By: Claude Opus 4.6 <[email protected]>
- jit.py: add `assert cached_kernel is not None` to narrow type before calling the cached kernel, resolving reportOptionalCall - compile/utils.py: widen path parameters from `str | None` to `str | Path | None` to match actual call sites in jit.py - npukernel.py: add None guard before DefaultNPURuntime.load_and_run() to resolve reportOptionalMemberAccess; mirrors the guard already present in jit.py Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Merge origin/main ('Replace mlir_tutorials with mlir_exercises')
- Resolve conflicts in section-4c (keep correct 'make' command) and
section-5 (keep fixed 'candidate' spelling)
- Fix resnet.py: remove dead RTP writes in set_rtps() for conv2_fn
tiles (rtp[0][1], rtp[0][2], rtp[1][0], rtp[1][2], rtp[2][1],
rtp[2][3]) that are never placed/resolved because conv2_fn workers
use a hardcoded scale=1 and have no RTP argument. This was a latent
bug exposed by the buffer.py fix (return->raise AttributeError).
Co-Authored-By: Claude Opus 4.6 <[email protected]>
- aie2.py: correct stale header comment (aie2p.py -> aie2.py)
- quick_reference.md: remove empty '## AIE Detailed References' section
- objectfifo.py: fix word order in cons() docstring ('is it' -> 'it is')
- section-2e/aie2_multi.py: fix header path (section-2d -> section-2e)
and fix out-of-bounds loop (range_(data_size) -> range_(tile_size));
each worker tile holds tile_size elements, not data_size
- section-2e/README.md: sync core_fn loop bound to match aie2_multi.py
(range_(data_size) -> range_(tile_size))
- section-4b/README.md: fix typos 'precendence' -> 'precedence' and
'woker' -> 'worker'
Co-Authored-By: Claude Opus 4.6 <[email protected]>
section-1/README.md: - Prose said AIEDevice.npu1_1col but code examples use AIEDevice.npu1 - NOTE said AIEDevice.npu (nonexistent enum value), corrected to npu1 section-2a/README.md: - Fix broken internal anchor (#advanced-topic-data-movement-accelerators -> #advanced-topic-direct-memory-access-channels) - Fix core_fn example: test_func2(elemIn,...) -> test_func2(elemOut,...) (elemIn was already released at that point) section-2e/README.md: - rt.sequence(data_size,...) -> rt.sequence(data_ty,...) (×2): sequence() takes numpy array types, not integer sizes - Fix inner loop variable shadowing: 'for i in range_(tile_size)' -> 'for j' to avoid shadowing the outer 'for i in range(n_cores)' loop - Fix shell commands in python code fences -> bash fences (×2) section-4b/README.md: - xrt_test_wrapper_.h -> xrt_test_wrapper.h (spurious underscore, ×3) - parse_trace.py -> parse.py (actual script name, ×3) section-4c/README.md: - 'make -f Makefile.chess use_placed=1 trace' -> 'make use_placed=1 trace' (Makefile.chess does not exist in that directory) section-5/README.md: - passthrough.cc -> passThrough.cc (match actual filename case) Co-Authored-By: Claude Opus 4.6 <[email protected]>
hunhoffe
commented
Mar 12, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In preparation for future tutorials on mlir-aie/IRON, Claude helped me audit the programming_guide, programming_examples, and python/iron code + documentation for correctness and clarity.