Skip to content

func level link_with#2922

Merged
hunhoffe merged 37 commits intomainfrom
func-level-link-with
Mar 13, 2026
Merged

func level link_with#2922
hunhoffe merged 37 commits intomainfrom
func-level-link-with

Conversation

@hunhoffe
Copy link
Collaborator

@hunhoffe hunhoffe commented Mar 4, 2026

Summary

Removes the one-object-file-per-core restriction by moving link_with from aie.core to individual func.func declarations. The new aie-assign-core-link-files pass traces func.call edges from each core, aggregates referenced object files into a link_files array on the CoreOp, and both compiler backends (xbridge/Chess and lld/Peano) consume it.

Changes

Dialect: CoreOp gains link_files (StrArrayAttr); func.func declarations carry link_with. Verifier rejects designs that set both on the same core.

Pass aie-assign-core-link-files: Traces direct func.call edges per core, deduplicates link_with values into link_files. Warns on indirect calls, unused link_with functions, and deprecated core-level link_with. Wired into the resource-allocation pipeline in aiecc.cpp.

Backends: BCF and ldscript emitters consume link_files with fallback to deprecated link_with. atomicCopyFile added for safe parallel-core .o copying.

Python/IRON (breaking changes):

  • Kernel(name, bin_name)Kernel(name, object_file_name); .bin_name.object_file_name
  • Core(link_with=...) raises TypeError; set link_with on external_func() instead
  • tile_size(), arg_types(), and arg-count validation lifted to BaseKernel
  • ExternalFunction: removed debug parameter and no-op context manager
  • merge_object_files / link.py removed (superseded by per-function link_with)
  • Migrated vector_exp, softmax (×2), color_detect, edge_detect placed examples

Tests

  • test/aiecc/: 11 LIT pass tests covering canonical, multi-.o, dedup, deprecation, warnings, emitter fallback, no-op
  • test/npu-xrt/: single-.o and two-.o end-to-end on NPU1/NPU2 (Chess + Peano)
  • test/python/: external_func(link_with=...) MLIR output; JIT with two ExternalFunctions on one core

Migration

# Before
my_fn = external_func("my_fn", inputs=[...])
@core(tile, "my_kernel.o")
def core_body(): ...

# After
my_fn = external_func("my_fn", inputs=[...], link_with="my_kernel.o")
@core(tile)
def core_body(): ...

Known limitations

  • Only direct func.call edges are traced; transitive calls through helper functions are not followed automatically.
  • Does not handle symbol collisions; that can be a separate PR.

@ypapadop-amd
Copy link
Collaborator

Awesome! When this works, we can probably get rid of https://github.com/Xilinx/mlir-aie/blob/main/python/utils/compile/link.py

@hunhoffe hunhoffe mentioned this pull request Mar 4, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Coverage Report

Created: 2026-03-12 20:48

Click here for information about interpreting this report.

FilenameFunction CoverageLine CoverageRegion CoverageBranch Coverage
include/aie/Dialect/AIE/Transforms/AIEPasses.h 100.00% 100.00% 100.00% -
lib/Dialect/AIE/IR/AIEDialect.cpp 90.48% 86.12% 87.33% 78.94%
lib/Dialect/AIE/Transforms/AIEAssignCoreLinkFiles.cpp 100.00% 100.00% 100.00% 90.91%
Totals 90.79% 86.65% 87.53% 79.15%
Generated by llvm-cov -- llvm version 18.1.3

@hunhoffe hunhoffe force-pushed the func-level-link-with branch 3 times, most recently from eabf522 to b2975f1 Compare March 5, 2026 21:29
hunhoffe and others added 13 commits March 5, 2026 16:28
- Add link_files StrArrayAttr to CoreOp (canonical post-pass list of .o paths)
- Deprecate CoreOp-level link_with in favour of func.func-level link_with
- Add AIEAssignCoreLinkFiles pass: traces call edges from each aie.core to
  func.func declarations carrying link_with, accumulates .o paths into
  CoreOp link_files, and migrates any legacy CoreOp-level link_with
- Wire link_files into BCF and LdScript emitters
- Register pass in CMakeLists.txt and AIEPasses.h

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Run aie-assign-core-link-files pass in aiecc.cpp pipeline before BCF/LdScript
- Update Python aiecc driver (main.py) to invoke the pass and handle link_files
- Add atomicCopyFile helper for race-free kernel .o staging when multiple cores
  share the same kernel object file
- Use GEN_PASS_DEF_AIEASSIGNCORELINKFILES / impl:: base to be compatible with
  both wheel and source-built MLIR tablegen versions

Co-Authored-By: Claude Opus 4.6 <[email protected]>
aiecc dialect tests (test/aiecc/):
- cpp_link_with.mlir: migrate existing test to func-level link_with
- cpp_link_with_func_level.mlir: basic func.func link_with → CoreOp link_files
- cpp_link_with_both_attrs.mlir: CoreOp + func.func link_with coexist → error
- cpp_link_with_deprecation.mlir: CoreOp-level link_with deprecation warning
- cpp_link_with_emitter_fallback.mlir: BCF/LdScript emit from link_files
- cpp_link_with_indirect_call.mlir: indirect call triggers warning
- cpp_link_with_mixed.mlir: mixed kernel .o per-core merged and deduped
- cpp_link_with_shared_func.mlir: same func.func called by multiple cores
- cpp_link_with_unused_func.mlir: unused func.func with link_with warns
- cpp_multi_link_with.mlir: multiple link_with attrs on different funcs

npu-xrt end-to-end tests (require physical NPU):
- add_one_func_link_with_{chess,peano}: single kernel via func-level link_with
- add_one_scale_func_link_with_{chess,peano}: two kernels (add_one + scale_by_two)
  each with its own func.func link_with, exercising multi-.o linking per core

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace ::close(tmpFD) with sys::fs::closeFile() in atomicCopyFile so
that <unistd.h> does not need to be included. The system header was
conflicting with the existing cl::opt<bool> link variable at file scope,
causing a build error on Linux.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…rride, atomic copy

- Move createAIEAssignCoreLinkFilesPass() declaration to alphabetical
  position in AIEPasses.h (after the two AssignBufferAddresses overloads)
- Remove getDependentDialects() override from AIEAssignCoreLinkFilesPass;
  the tablegen dependentDialects field already registers FuncDialect and
  AIEDialect via the generated base class
- Use tempfile.mkstemp + os.replace in the Python aiecc driver's .o
  staging loop for the same atomic copy-then-rename semantics as the
  C++ atomicCopyFile helper

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- worker.py: drop dead `isinstance(Kernel, ExternalFunction): pass` branch
  (now handled implicitly via func.call ops), remove unused import, update comment
- kernel.py: update stale `bin_name` docstring to reflect func.func link_with usage
- jit.py: remove redundant pre-scan of ExternalFunction in args/kwargs; simplify
  _instances collection into a single list comprehension
- compile/utils.py: remove three empty try/except/raise blocks in
  compile_external_kernel that catch and immediately re-raise

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The Peano linker (ld.lld via clang) resolves bare INPUT() filenames in
the generated ldscript against the linker process's working directory.
When invoked via do_call with no cwd, this inherited the Python process's
cwd rather than tmpdirname where the .o files live.

Fix by adding a cwd parameter to do_call and passing cwd=self.tmpdirname
to all three Peano linker invocations in process_core.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The merge_object_files approach (link.py) is superseded by the link_with
attribute on func.func declarations, which aiecc aggregates onto CoreOp
via AIEAssignCoreLinkFiles. Remove link.py, its CMakeLists entry, and
the test_compile_and_link test that exercised it.

Add a link_with keyword argument to external_func in aie.py so Python/IRON
callers can attach the link_with attribute directly when declaring external
kernel functions.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
During rebase onto origin/main, later commits in our branch re-introduced
the 2217-line Python aiecc implementation (via replay of the cwd/JIT fix).
Replace it with the 116-line thin wrapper from dde51cb that delegates
all compilation to the C++ aiecc binary, consistent with PR #2925's
deprecation of aiecc.py.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Three bugs found during post-rebase testing:

1. aiecc.cpp: atomicCopyFile was called unconditionally even when src==dest
   (JIT case where the .o is already in tmpDirName). Guard the copy with
   a src!=dest check in both compileCores and compileCoresUnified Peano
   sections.

2. aiecc.cpp: ld.lld resolves INPUT() paths relative to process cwd, not
   the linker script directory. Before generating the ldscript, patch the
   CoreOp's link_files attribute to use absolute paths (tmpDirName-prefixed)
   so INPUT() directives are always absolute.

3. python/utils/compile/utils.py: when work_dir is provided, write aie.mlir
   into work_dir and call aiecc directly on that path so relative link_with
   filenames (e.g. "add_one.o") resolve against work_dir where
   compile_external_kernel placed the compiled .o.

4. python/utils/jit.py: the post-migration audit (2974552) incorrectly
   removed the pre-scan of args/kwargs for ExternalFunction instances.
   Since _instances.add() runs in __init__ (before the JIT call), and
   _instances.clear() runs before function() executes, the post-scan always
   found nothing. Restore the pre-scan so externally-constructed
   ExternalFunction instances are collected before _instances.clear().

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…s.py

aiecc.cpp:
- Fix thread-safety: ldscript CoreOp link_files patching was mutating the
  shared moduleOp from parallel core threads. Move the walk+patch onto
  coreModule (the per-thread clone) and pass it to AIETranslateToLdScript.
- Add missing verbose logging in compileCoresUnified Peano copy loop to
  match the logging in compileCores.
- Fix stale comment "above" → "as part of" in getCoreInfo.

python/utils/compile/utils.py:
- Replace call to private API aiecc._find_aiecc_binary() with
  shutil.which("aiecc"), which is already imported. Add a clear error
  message if the binary is not found.
- Improve error message format: include a newline before stderr output
  to separate it visually from the RuntimeError prefix.

python/utils/jit.py:
- Replace O(n) list `in` check for deduplication with an id-based set,
  making intent explicit: deduplication is by object identity, not equality.
- Remove duplicate "Determine target architecture based on device type"
  comment.
- Tighten collection of arg-passed ExternalFunction into a list
  comprehension for clarity.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
aiecc.cpp — ldscript clone was already LLVM-lowered:
  The ldscript patch cloned coreModule, but coreModule had already been
  destructively lowered to LLVM IR by runLLVMLoweringPipeline.
  AIETranslateToLdScript expects AIE dialect input, so it failed for every
  first-time compilation.  Fix: clone moduleOp (the shared pre-lowering
  module) into a fresh ldScriptModule specifically for this purpose.
  This also correctly addresses the thread-safety concern: ldScriptModule
  is a per-call local clone so mutations are never visible to other threads.

python/utils/jit.py — is_placed=True captured empty outer context:
  Program.resolve_program() creates its own mlir_mod_ctx() internally and
  returns its module.  The jit decorator was wrapping function() in an
  outer mlir_mod_ctx() and capturing ctx.module, which was always empty
  because all AIE ops were generated into resolve_program's inner context.
  Fix: always use the function's return value directly; the is_placed=True
  / is_placed=False split is now unnecessary and removed.  Also remove the
  now-unused mlir_mod_ctx import.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@hunhoffe hunhoffe force-pushed the func-level-link-with branch from 61b6e2e to 3a09dbc Compare March 6, 2026 01:34
hunhoffe and others added 10 commits March 6, 2026 10:21
The runtime_sequence had two parameters (%in, %out), mapping %out to
XRT group_id(4) = bo_inB. But test.cpp reads results from bo_out at
group_id(5), which was never written, causing all-zero output.

Add the standard dummy middle buffer (%buf : memref<32xi32>) to shift
%out to group_id(5) = bo_out, matching test.cpp's three-buffer layout.
Also remove the unused %c8 constant from the sequence body.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Commit 3a09dbc ("Fix two bugs exposed by running tests without warm
cache") correctly observed that non-placed designs (using
Program.resolve_program()) already open their own mlir_mod_ctx()
internally and return the module, so wrapping them in an outer context
gives an empty module.

However, the fix over-reached: it also removed the outer mlir_mod_ctx()
for placed designs, which use the raw @device(...) DSL.  Placed designs
populate the context as a side effect and return nothing, so they still
require an outer context to be active when their decorator arguments are
evaluated.  Without it, iron.get_current_device().resolve() fails with:

  RuntimeError: An MLIR function requires a Context but none was provided

Fix: restore the is_placed branch so placed designs get an outer
mlir_mod_ctx() (capturing ctx.module), while non-placed designs
continue to use the function's return value directly.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…pyFile

On Windows, llvm::sys::fs::file_t is HANDLE (not int), so calling
closeFile(int tmpFD) fails to compile with MSVC:

  error C2664: cannot convert argument 1 from 'int' to 'file_t &'

The open+close sequence was unnecessary: we only needed a unique path,
not an open file descriptor, since copy_file immediately overwrites the
temp file. Switch to the two-argument createUniqueFile(model, path)
overload which reserves a unique filename without opening it, and drop
the closeFile call entirely. This is correct on all platforms.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Resolve conflicts:
- Keep link.py deleted (superseded by link_with attribute approach)
- Remove verbose=False arg from compile_cxx_core_function call (param removed upstream)
- Fix broken LLVM file header modeline in AIEAssignCoreLinkFiles.cpp
- Remove duplicate #include in aiecc.cpp
- Extract duplicated core-skipping predicate into coreNeedsCompilation()
- Normalize verbose output terminology ("external object" vs "link_with")
- Improve comments in AIEAssignCoreLinkFiles, BCF/ldscript targets, aiecc
- Add/improve docstrings in AIEOps.td, AIEPasses.td, kernel.py, aie.py, utils.py
- Extend indirect-call warning check in test to cover actionable message text
- Add comment explaining uninitialized bo_inB in npu-xrt tests

Co-Authored-By: Claude Opus 4.6 <[email protected]>
aiecc.cpp:
- Add missing lock_guard(outputMutex) around new verbose log lines in
  compileCore and compileCoresUnified (data race with parallel compilation)
- Add src == dest guard before atomicCopyFile in the BCF linkWithFiles
  loop, consistent with the existing guard in the Peano loop

python/utils/compile/utils.py:
- Forward aiecc subprocess stdout/stderr to logger.debug on success so
  verbose output is not silently discarded
- Remove dead hasattr(func, "_compiled") guard; _compiled is always
  initialized in ExternalFunction.__init__

python/utils/jit.py:
- Replace assert mlir_module.operation.verify() with explicit
  if/raise so it cannot be silently disabled by python -O

lib/Dialect/AIE/Transforms/AIEAssignCoreLinkFiles.cpp:
- Use mlir::Builder instead of mlir::OpBuilder; no ops are inserted

lib/Dialect/AIE/Transforms/CMakeLists.txt:
- Restore alphabetical ordering of source files

test/npu-xrt/add_one_scale_func_link_with_peano/run.lit:
- Fix wrong function signature in comment: scale_by_two takes three args

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add link_with to the external_func signature entry and note that
multiple declarations may share the same .o file (automatically
deduplicated by aie-assign-core-link-files).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
hunhoffe

This comment was marked as outdated.

hunhoffe and others added 3 commits March 11, 2026 15:55
…sion examples

edge_detect and color_detect were packing multiple kernel object files
into combined archives (combined_gray2rgba_addWeighted.a,
combined_bitwiseOR_gray2rgba_bitwiseAND.a) solely because the old
aie.core link_with attribute accepted only one file per core.

Now that link_with is declared per external_func and
aie-assign-core-link-files aggregates and deduplicates the per-function
object files into a per-core link_files list, each external_func
declaration can point directly to the .o that implements it:

- edge_detect: gray2rgbaLine -> gray2rgba.cc.o
               addWeightedLine -> addWeighted.cc.o
- color_detect: bitwiseORLine -> bitwiseOR.cc.o
                gray2rgbaLine -> gray2rgba.cc.o
                bitwiseANDLine -> bitwiseAND.cc.o

Remove the ar archive build rules from both Makefiles and update the
xclbin dependencies accordingly. Update the Iron Kernel() calls in the
non-placed variants and the edge_detect test in test/python/npu.py.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
kernel.py:
- Remove apologetic "harmless redundancy" comment from
  ExternalFunction.__call__; the arg-count check is explained by the
  fact that BaseKernel validates count while ExternalFunction also
  validates types, so the super() call naturally re-checks count.

compile/utils.py:
- Clarify compile_mlir_module comment: the MLIR write in the work_dir
  branch is intentional (and idempotent when jit.py pre-writes the same
  file) because the C++ aiecc binary needs the MLIR on disk to resolve
  relative link_with paths against the compilation directory.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@hunhoffe hunhoffe requested a review from ypapadop-amd March 11, 2026 22:07
@hunhoffe hunhoffe marked this pull request as ready for review March 12, 2026 02:19
@hunhoffe
Copy link
Collaborator Author

Does this PR address #2530? If so, is it appropriate to close that draft PR?

@ypapadop-amd
Copy link
Collaborator

Does this PR address #2530? If so, is it appropriate to close that draft PR?

It does, I'll close it. Much appreciated!

Copy link
Collaborator

@jgmelber jgmelber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on archive files in the examples, but overall this looks good! Thanks!


exp_bf16_1024 = external_func("exp_bf16_1024", inputs=[tile_ty, tile_ty])
exp_bf16_1024 = external_func(
"exp_bf16_1024", inputs=[tile_ty, tile_ty], link_with="kernels.a"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I didn't dig into this example before writing this, but doesn't func-level link_with eliminate the need for archive files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of this example is that the archive format is a reflection of needing to bundle the function definition with a file of common helpers (e.g., kernel source requirement) and not a reflection of a need to archive multiple .o files due to mlir-aie.

@hunhoffe hunhoffe added this pull request to the merge queue Mar 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 12, 2026
@hunhoffe hunhoffe added this pull request to the merge queue Mar 13, 2026
Merged via the queue into main with commit 9c1e32d Mar 13, 2026
60 checks passed
@hunhoffe hunhoffe deleted the func-level-link-with branch March 13, 2026 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants