Conversation
|
Awesome! When this works, we can probably get rid of https://github.com/Xilinx/mlir-aie/blob/main/python/utils/compile/link.py |
Coverage ReportCreated: 2026-03-12 20:48Click here for information about interpreting this report.
Generated by llvm-cov -- llvm version 18.1.3 |
eabf522 to
b2975f1
Compare
- Add link_files StrArrayAttr to CoreOp (canonical post-pass list of .o paths) - Deprecate CoreOp-level link_with in favour of func.func-level link_with - Add AIEAssignCoreLinkFiles pass: traces call edges from each aie.core to func.func declarations carrying link_with, accumulates .o paths into CoreOp link_files, and migrates any legacy CoreOp-level link_with - Wire link_files into BCF and LdScript emitters - Register pass in CMakeLists.txt and AIEPasses.h Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Run aie-assign-core-link-files pass in aiecc.cpp pipeline before BCF/LdScript - Update Python aiecc driver (main.py) to invoke the pass and handle link_files - Add atomicCopyFile helper for race-free kernel .o staging when multiple cores share the same kernel object file - Use GEN_PASS_DEF_AIEASSIGNCORELINKFILES / impl:: base to be compatible with both wheel and source-built MLIR tablegen versions Co-Authored-By: Claude Opus 4.6 <[email protected]>
aiecc dialect tests (test/aiecc/):
- cpp_link_with.mlir: migrate existing test to func-level link_with
- cpp_link_with_func_level.mlir: basic func.func link_with → CoreOp link_files
- cpp_link_with_both_attrs.mlir: CoreOp + func.func link_with coexist → error
- cpp_link_with_deprecation.mlir: CoreOp-level link_with deprecation warning
- cpp_link_with_emitter_fallback.mlir: BCF/LdScript emit from link_files
- cpp_link_with_indirect_call.mlir: indirect call triggers warning
- cpp_link_with_mixed.mlir: mixed kernel .o per-core merged and deduped
- cpp_link_with_shared_func.mlir: same func.func called by multiple cores
- cpp_link_with_unused_func.mlir: unused func.func with link_with warns
- cpp_multi_link_with.mlir: multiple link_with attrs on different funcs
npu-xrt end-to-end tests (require physical NPU):
- add_one_func_link_with_{chess,peano}: single kernel via func-level link_with
- add_one_scale_func_link_with_{chess,peano}: two kernels (add_one + scale_by_two)
each with its own func.func link_with, exercising multi-.o linking per core
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace ::close(tmpFD) with sys::fs::closeFile() in atomicCopyFile so that <unistd.h> does not need to be included. The system header was conflicting with the existing cl::opt<bool> link variable at file scope, causing a build error on Linux. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…rride, atomic copy - Move createAIEAssignCoreLinkFilesPass() declaration to alphabetical position in AIEPasses.h (after the two AssignBufferAddresses overloads) - Remove getDependentDialects() override from AIEAssignCoreLinkFilesPass; the tablegen dependentDialects field already registers FuncDialect and AIEDialect via the generated base class - Use tempfile.mkstemp + os.replace in the Python aiecc driver's .o staging loop for the same atomic copy-then-rename semantics as the C++ atomicCopyFile helper Co-Authored-By: Claude Opus 4.6 <[email protected]>
- worker.py: drop dead `isinstance(Kernel, ExternalFunction): pass` branch (now handled implicitly via func.call ops), remove unused import, update comment - kernel.py: update stale `bin_name` docstring to reflect func.func link_with usage - jit.py: remove redundant pre-scan of ExternalFunction in args/kwargs; simplify _instances collection into a single list comprehension - compile/utils.py: remove three empty try/except/raise blocks in compile_external_kernel that catch and immediately re-raise Co-Authored-By: Claude Opus 4.6 <[email protected]>
The Peano linker (ld.lld via clang) resolves bare INPUT() filenames in the generated ldscript against the linker process's working directory. When invoked via do_call with no cwd, this inherited the Python process's cwd rather than tmpdirname where the .o files live. Fix by adding a cwd parameter to do_call and passing cwd=self.tmpdirname to all three Peano linker invocations in process_core. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The merge_object_files approach (link.py) is superseded by the link_with attribute on func.func declarations, which aiecc aggregates onto CoreOp via AIEAssignCoreLinkFiles. Remove link.py, its CMakeLists entry, and the test_compile_and_link test that exercised it. Add a link_with keyword argument to external_func in aie.py so Python/IRON callers can attach the link_with attribute directly when declaring external kernel functions. Co-Authored-By: Claude Opus 4.6 <[email protected]>
During rebase onto origin/main, later commits in our branch re-introduced the 2217-line Python aiecc implementation (via replay of the cwd/JIT fix). Replace it with the 116-line thin wrapper from dde51cb that delegates all compilation to the C++ aiecc binary, consistent with PR #2925's deprecation of aiecc.py. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Three bugs found during post-rebase testing: 1. aiecc.cpp: atomicCopyFile was called unconditionally even when src==dest (JIT case where the .o is already in tmpDirName). Guard the copy with a src!=dest check in both compileCores and compileCoresUnified Peano sections. 2. aiecc.cpp: ld.lld resolves INPUT() paths relative to process cwd, not the linker script directory. Before generating the ldscript, patch the CoreOp's link_files attribute to use absolute paths (tmpDirName-prefixed) so INPUT() directives are always absolute. 3. python/utils/compile/utils.py: when work_dir is provided, write aie.mlir into work_dir and call aiecc directly on that path so relative link_with filenames (e.g. "add_one.o") resolve against work_dir where compile_external_kernel placed the compiled .o. 4. python/utils/jit.py: the post-migration audit (2974552) incorrectly removed the pre-scan of args/kwargs for ExternalFunction instances. Since _instances.add() runs in __init__ (before the JIT call), and _instances.clear() runs before function() executes, the post-scan always found nothing. Restore the pre-scan so externally-constructed ExternalFunction instances are collected before _instances.clear(). Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…s.py
aiecc.cpp:
- Fix thread-safety: ldscript CoreOp link_files patching was mutating the
shared moduleOp from parallel core threads. Move the walk+patch onto
coreModule (the per-thread clone) and pass it to AIETranslateToLdScript.
- Add missing verbose logging in compileCoresUnified Peano copy loop to
match the logging in compileCores.
- Fix stale comment "above" → "as part of" in getCoreInfo.
python/utils/compile/utils.py:
- Replace call to private API aiecc._find_aiecc_binary() with
shutil.which("aiecc"), which is already imported. Add a clear error
message if the binary is not found.
- Improve error message format: include a newline before stderr output
to separate it visually from the RuntimeError prefix.
python/utils/jit.py:
- Replace O(n) list `in` check for deduplication with an id-based set,
making intent explicit: deduplication is by object identity, not equality.
- Remove duplicate "Determine target architecture based on device type"
comment.
- Tighten collection of arg-passed ExternalFunction into a list
comprehension for clarity.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
aiecc.cpp — ldscript clone was already LLVM-lowered: The ldscript patch cloned coreModule, but coreModule had already been destructively lowered to LLVM IR by runLLVMLoweringPipeline. AIETranslateToLdScript expects AIE dialect input, so it failed for every first-time compilation. Fix: clone moduleOp (the shared pre-lowering module) into a fresh ldScriptModule specifically for this purpose. This also correctly addresses the thread-safety concern: ldScriptModule is a per-call local clone so mutations are never visible to other threads. python/utils/jit.py — is_placed=True captured empty outer context: Program.resolve_program() creates its own mlir_mod_ctx() internally and returns its module. The jit decorator was wrapping function() in an outer mlir_mod_ctx() and capturing ctx.module, which was always empty because all AIE ops were generated into resolve_program's inner context. Fix: always use the function's return value directly; the is_placed=True / is_placed=False split is now unnecessary and removed. Also remove the now-unused mlir_mod_ctx import. Co-Authored-By: Claude Opus 4.6 <[email protected]>
61b6e2e to
3a09dbc
Compare
The runtime_sequence had two parameters (%in, %out), mapping %out to XRT group_id(4) = bo_inB. But test.cpp reads results from bo_out at group_id(5), which was never written, causing all-zero output. Add the standard dummy middle buffer (%buf : memref<32xi32>) to shift %out to group_id(5) = bo_out, matching test.cpp's three-buffer layout. Also remove the unused %c8 constant from the sequence body. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Commit 3a09dbc ("Fix two bugs exposed by running tests without warm cache") correctly observed that non-placed designs (using Program.resolve_program()) already open their own mlir_mod_ctx() internally and return the module, so wrapping them in an outer context gives an empty module. However, the fix over-reached: it also removed the outer mlir_mod_ctx() for placed designs, which use the raw @device(...) DSL. Placed designs populate the context as a side effect and return nothing, so they still require an outer context to be active when their decorator arguments are evaluated. Without it, iron.get_current_device().resolve() fails with: RuntimeError: An MLIR function requires a Context but none was provided Fix: restore the is_placed branch so placed designs get an outer mlir_mod_ctx() (capturing ctx.module), while non-placed designs continue to use the function's return value directly. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…pyFile On Windows, llvm::sys::fs::file_t is HANDLE (not int), so calling closeFile(int tmpFD) fails to compile with MSVC: error C2664: cannot convert argument 1 from 'int' to 'file_t &' The open+close sequence was unnecessary: we only needed a unique path, not an open file descriptor, since copy_file immediately overwrites the temp file. Switch to the two-argument createUniqueFile(model, path) overload which reserves a unique filename without opening it, and drop the closeFile call entirely. This is correct on all platforms. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Resolve conflicts: - Keep link.py deleted (superseded by link_with attribute approach) - Remove verbose=False arg from compile_cxx_core_function call (param removed upstream)
- Fix broken LLVM file header modeline in AIEAssignCoreLinkFiles.cpp
- Remove duplicate #include in aiecc.cpp
- Extract duplicated core-skipping predicate into coreNeedsCompilation()
- Normalize verbose output terminology ("external object" vs "link_with")
- Improve comments in AIEAssignCoreLinkFiles, BCF/ldscript targets, aiecc
- Add/improve docstrings in AIEOps.td, AIEPasses.td, kernel.py, aie.py, utils.py
- Extend indirect-call warning check in test to cover actionable message text
- Add comment explaining uninitialized bo_inB in npu-xrt tests
Co-Authored-By: Claude Opus 4.6 <[email protected]>
aiecc.cpp: - Add missing lock_guard(outputMutex) around new verbose log lines in compileCore and compileCoresUnified (data race with parallel compilation) - Add src == dest guard before atomicCopyFile in the BCF linkWithFiles loop, consistent with the existing guard in the Peano loop python/utils/compile/utils.py: - Forward aiecc subprocess stdout/stderr to logger.debug on success so verbose output is not silently discarded - Remove dead hasattr(func, "_compiled") guard; _compiled is always initialized in ExternalFunction.__init__ python/utils/jit.py: - Replace assert mlir_module.operation.verify() with explicit if/raise so it cannot be silently disabled by python -O lib/Dialect/AIE/Transforms/AIEAssignCoreLinkFiles.cpp: - Use mlir::Builder instead of mlir::OpBuilder; no ops are inserted lib/Dialect/AIE/Transforms/CMakeLists.txt: - Restore alphabetical ordering of source files test/npu-xrt/add_one_scale_func_link_with_peano/run.lit: - Fix wrong function signature in comment: scale_by_two takes three args Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add link_with to the external_func signature entry and note that multiple declarations may share the same .o file (automatically deduplicated by aie-assign-core-link-files). Co-Authored-By: Claude Opus 4.6 <[email protected]>
…sion examples
edge_detect and color_detect were packing multiple kernel object files
into combined archives (combined_gray2rgba_addWeighted.a,
combined_bitwiseOR_gray2rgba_bitwiseAND.a) solely because the old
aie.core link_with attribute accepted only one file per core.
Now that link_with is declared per external_func and
aie-assign-core-link-files aggregates and deduplicates the per-function
object files into a per-core link_files list, each external_func
declaration can point directly to the .o that implements it:
- edge_detect: gray2rgbaLine -> gray2rgba.cc.o
addWeightedLine -> addWeighted.cc.o
- color_detect: bitwiseORLine -> bitwiseOR.cc.o
gray2rgbaLine -> gray2rgba.cc.o
bitwiseANDLine -> bitwiseAND.cc.o
Remove the ar archive build rules from both Makefiles and update the
xclbin dependencies accordingly. Update the Iron Kernel() calls in the
non-placed variants and the edge_detect test in test/python/npu.py.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
kernel.py: - Remove apologetic "harmless redundancy" comment from ExternalFunction.__call__; the arg-count check is explained by the fact that BaseKernel validates count while ExternalFunction also validates types, so the super() call naturally re-checks count. compile/utils.py: - Clarify compile_mlir_module comment: the MLIR write in the work_dir branch is intentional (and idempotent when jit.py pre-writes the same file) because the C++ aiecc binary needs the MLIR on disk to resolve relative link_with paths against the compilation directory. Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
Does this PR address #2530? If so, is it appropriate to close that draft PR? |
It does, I'll close it. Much appreciated! |
jgmelber
left a comment
There was a problem hiding this comment.
Comment on archive files in the examples, but overall this looks good! Thanks!
|
|
||
| exp_bf16_1024 = external_func("exp_bf16_1024", inputs=[tile_ty, tile_ty]) | ||
| exp_bf16_1024 = external_func( | ||
| "exp_bf16_1024", inputs=[tile_ty, tile_ty], link_with="kernels.a" |
There was a problem hiding this comment.
I admit I didn't dig into this example before writing this, but doesn't func-level link_with eliminate the need for archive files?
There was a problem hiding this comment.
My understanding of this example is that the archive format is a reflection of needing to bundle the function definition with a file of common helpers (e.g., kernel source requirement) and not a reflection of a need to archive multiple .o files due to mlir-aie.
Summary
Removes the one-object-file-per-core restriction by moving
link_withfromaie.coreto individualfunc.funcdeclarations. The newaie-assign-core-link-filespass tracesfunc.calledges from each core, aggregates referenced object files into alink_filesarray on theCoreOp, and both compiler backends (xbridge/Chess and lld/Peano) consume it.Changes
Dialect:
CoreOpgainslink_files(StrArrayAttr);func.funcdeclarations carrylink_with. Verifier rejects designs that set both on the same core.Pass
aie-assign-core-link-files: Traces directfunc.calledges per core, deduplicateslink_withvalues intolink_files. Warns on indirect calls, unusedlink_withfunctions, and deprecated core-levellink_with. Wired into the resource-allocation pipeline inaiecc.cpp.Backends: BCF and ldscript emitters consume
link_fileswith fallback to deprecatedlink_with.atomicCopyFileadded for safe parallel-core.ocopying.Python/IRON (breaking changes):
Kernel(name, bin_name)→Kernel(name, object_file_name);.bin_name→.object_file_nameCore(link_with=...)raisesTypeError; setlink_withonexternal_func()insteadtile_size(),arg_types(), and arg-count validation lifted toBaseKernelExternalFunction: removeddebugparameter and no-op context managermerge_object_files/link.pyremoved (superseded by per-functionlink_with)vector_exp,softmax(×2),color_detect,edge_detectplaced examplesTests
test/aiecc/: 11 LIT pass tests covering canonical, multi-.o, dedup, deprecation, warnings, emitter fallback, no-optest/npu-xrt/: single-.oand two-.oend-to-end on NPU1/NPU2 (Chess + Peano)test/python/:external_func(link_with=...)MLIR output; JIT with twoExternalFunctions on one coreMigration
Known limitations
func.calledges are traced; transitive calls through helper functions are not followed automatically.