Add C ObjectFIFO API for kernel-managed lock synchronization #2898

Draft
jgmelber wants to merge 5 commits into main from c-objfifo-locks

@jgmelber
Collaborator

Enable precompiled C kernels to directly manage ObjectFIFO synchronization by receiving lock IDs and buffer references from MLIR, rather than relying on the compiler to insert acquire/release ops.

New MLIR ops:

  • aie.objectfifo.lock: returns (acq_lock, rel_lock) for a port
  • aie.objectfifo.buffer: returns a buffer memref at a given index

Stateful transform lowering resolves these ops to concrete lock/buffer SSA values, handling AIE1 single-lock and AIE2/AIE2P dual-lock semantics.

C API header (aie_runtime_lib/AIE2{,P}/aie_objectfifo.h): objectfifo_port_t struct with inline acquire/release functions using compiler-provided lock intrinsics. Use acq_value=-1 for AcquireGreaterEqual semantics matching the DMA protocol.
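A host-side sketch may help when reading the header description. It does not use the real lock intrinsics: `sim_lock`, `sim_port_t`, `sim_port_try_acquire`, and `sim_port_release` are hypothetical stand-ins that model each lock as a plain counter. It assumes that acq_value=-1 means "wait until the lock is at least 1, then decrement it by 1". The four fields mirror the objectfifo_port_t quoted later in this conversation; everything else is an assumption.

```c
#include <stdint.h>

#define SIM_NUM_LOCKS 16
/* One counter per lock ID. Host model only, not the hardware lock unit. */
static int32_t sim_lock[SIM_NUM_LOCKS];

typedef struct {
  int32_t acq_lock;  /* lock ID used for acquire */
  int32_t rel_lock;  /* lock ID used for release */
  int32_t acq_value; /* -1 is read here as AcquireGreaterEqual with amount 1 */
  int32_t rel_value; /* amount added on release */
} sim_port_t;

/* Non-blocking model of acquire: returns 1 on success and 0 where a real
 * core would stall. Positive acq_value is treated the same way as negative
 * in this simplified model. */
static int sim_port_try_acquire(const sim_port_t *p) {
  int32_t need = p->acq_value < 0 ? -p->acq_value : p->acq_value;
  if (sim_lock[p->acq_lock] < need)
    return 0;
  sim_lock[p->acq_lock] -= need;
  return 1;
}

/* Model of release: hand one object to the other side of the FIFO. */
static void sim_port_release(const sim_port_t *p) {
  sim_lock[p->rel_lock] += p->rel_value;
}
```

In a dual-lock setup sketched this way, a depth-2 FIFO would start with the producer-side lock at 2 and the consumer-side lock at 0; the producer port acquires on the first lock and releases on the second, and the consumer port does the reverse.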

Python API: get_lock()/get_buffer() on object_fifo and ObjectFifoHandle.

Bug fixes discovered during hardware bring-up:

  • python/dialects/aie.py: get_lock() passed incorrect extra args to ObjectFifoGetLockOp auto-generated constructor
  • AIELocalizeLocks: used getParentOp() == coreOp which failed for lock uses nested inside scf.for; fixed to use isProperAncestor()
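The difference between the two checks in the AIELocalizeLocks fix can be illustrated with a toy parent chain. This is a sketch in plain C, not the MLIR classes: `Op`, `is_direct_child`, and `is_proper_ancestor` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a nested operation; only the parent link matters here. */
typedef struct Op {
  const char *name;
  const struct Op *parent; /* NULL for the outermost op */
} Op;

/* Shape of the original check: true only when `core` is the immediate parent. */
static bool is_direct_child(const Op *op, const Op *core) {
  return op->parent == core;
}

/* Shape of the fixed check: walk up the whole parent chain.
 * An op is never its own proper ancestor. */
static bool is_proper_ancestor(const Op *core, const Op *op) {
  for (const Op *p = op->parent; p != NULL; p = p->parent)
    if (p == core)
      return true;
  return false;
}
```

With a chain core -> scf.for -> lock use, the direct-parent test rejects the lock use because its immediate parent is the loop, while the ancestor walk still finds the core.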

Includes:

  • CMake install rules for aie_objectfifo.h into install tree
  • LIT tests for lock localization with external function calls
  • E2E NPU tests (lock_pass_to_c, objectfifo_lock_c_api)
  • Programming example: passthrough_kernel_c_objfifo

@jgmelber
Collaborator Author

Related to #288

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions
Contributor

github-actions bot commented Feb 26, 2026

Coverage Report

Created: 2026-03-03 03:40


| Filename | Function Coverage | Line Coverage | Region Coverage | Branch Coverage |
|---|---|---|---|---|
| IR/AIEDialect.cpp | 93.28% | 86.63% | 87.75% | 79.19% |
| Transforms/AIELocalizeLocks.cpp | 100.00% | 98.11% | 94.44% | 93.75% |
| Transforms/AIEObjectFifoStatefulTransform.cpp | 100.00% | 94.14% | 91.61% | 85.12% |
| Totals | 95.26% | 90.54% | 89.38% | 81.91% |

Generated by llvm-cov -- llvm version 18.1.3

@fifield
Collaborator

fifield commented Feb 27, 2026

The C-ABI is basically: manually extract locks and buffers from the objectfifo symbol, pass them around in an ad-hoc way, then glue some of them back together into a partial object fifo data structure so that you can call objectfifo-like functions on them.

out_buf0 = of_out.get_buffer(0)
out_buf1 = of_out.get_buffer(1)
out_acq, out_rel = of_out.get_lock(ObjectFifoPort.Produce)

# C kernel owns the compute loop and buffer rotation
passThroughLine(
    in_buf0,
    in_buf1,
    out_buf0,
    out_buf1,
    in_acq,
    in_rel,
    out_acq,
    out_rel,
)

typedef struct {
  int32_t acq_lock;  // Lock ID for acquire operation
  int32_t rel_lock;  // Lock ID for release operation
  int32_t acq_value; // Value for acquire_equal(): use -1 for AcquireGreaterEqual
  int32_t rel_value; // Value for release() call (typically 1)
} objectfifo_port_t;

void passThroughLine(int32_t *in_buf0, int32_t *in_buf1, int32_t *out_buf0,
                     int32_t *out_buf1, int64_t in_acq_lock,
                     int64_t in_rel_lock, int64_t out_acq_lock,
                     int64_t out_rel_lock) {
  objectfifo_port_t port_in = {(int32_t)in_acq_lock, (int32_t)in_rel_lock, -1,
                               1};
  objectfifo_port_t port_out = {(int32_t)out_acq_lock, (int32_t)out_rel_lock,
                                -1, 1};

  // ...

  objectfifo_acquire(&port_in);
  objectfifo_acquire(&port_out);

Did you consider, instead of objectfifo_port_t, an objectfifo_t that also includes the buffers? This could be a de facto C-compatible ABI for passing around object fifos, analogous to the numpy-array-style struct format that memrefs lower to.

# C kernel owns the compute loop and buffer rotation
passThroughLine(of_out.get_value())

typedef struct {
  // some well defined layout
  // ports ...
  // buffers ...
} objectfifo_t;

void passThroughLine(objectfifo_t *of) {
  // ...

  objectfifo_acquire(Consume, of);
  objectfifo_acquire(Produce, of);

@jgmelber
Collaborator Author

Did you consider, instead of objectfifo_port_t, an objectfifo_t that also includes the buffers? This could be a de facto C-compatible ABI for passing around object fifos, analogous to the numpy-array-style struct format that memrefs lower to.

# C kernel owns the compute loop and buffer rotation
passThroughLine(of_out.get_value())

typedef struct {
  // some well defined layout
  // ports ...
  // buffers ...
} objectfifo_t;

void passThroughLine(objectfifo_t *of) {
  // ...

  objectfifo_acquire(Consume, of);
  objectfifo_acquire(Produce, of);

Yes, I was thinking of something along those lines. Handling those buffers gets messy from the user's perspective. The struct should also know the depth of the ObjectFIFO so that it can use a modulo (%) or, better, an increment/decrement with wraparound to access the objects in the correct order.
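The two indexing options mentioned here can be compared with a small sketch. `slot_by_modulo` and `slot_next` are hypothetical helper names, not part of the PR; the second form keeps the current slot and advances it with a compare and reset, which may matter on cores where integer division is expensive.

```c
#include <stdint.h>

/* Option 1: derive the slot from a running iteration counter. */
static uint32_t slot_by_modulo(uint32_t iter, uint32_t depth) {
  return iter % depth;
}

/* Option 2: carry the slot itself and step it forward, wrapping back to 0
 * once it reaches depth. No division or modulo is needed. */
static uint32_t slot_next(uint32_t slot, uint32_t depth) {
  slot += 1;
  return slot == depth ? 0 : slot;
}
```

Starting from slot 0 and calling `slot_next` once per iteration visits the same sequence of slots as `slot_by_modulo(iter, depth)`, including the depth-1 case where the slot stays at 0.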

@jgmelber
Collaborator Author

This PR needs more npu-xrt tests at depth 1 and depth 3.

jgmelber and others added 3 commits March 2, 2026 19:21
Replace the lock-only objectfifo_port_t with a self-contained objectfifo_t
struct that bundles locks, buffer pointers, and depth. Add
objectfifo_get_buffer() for automatic buffer rotation via iter % depth,
eliminating manual modular indexing in kernel code.

Also remove hardcoded XRT paths from passthrough_kernel_c_objfifo
CMakeLists.txt since common.cmake handles auto-detection.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
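A self-contained struct along the lines described in this commit might look like the sketch below. The real layout of objectfifo_t is not shown in this conversation, so `sketch_objectfifo_t`, `sketch_get_buffer`, and `SKETCH_MAX_DEPTH` are hypothetical names and the field order is an assumption; only the idea of bundling locks, buffer pointers, and depth, and of selecting a buffer via iter % depth, comes from the commit message.

```c
#include <stdint.h>

#define SKETCH_MAX_DEPTH 4 /* illustrative bound, not taken from the PR */

typedef struct {
  int32_t acq_lock;                   /* lock ID for acquire */
  int32_t rel_lock;                   /* lock ID for release */
  int32_t depth;                      /* number of valid entries in buffers */
  int32_t *buffers[SKETCH_MAX_DEPTH]; /* one pointer per FIFO object */
} sketch_objectfifo_t;

/* Pick the buffer for a given iteration so that kernel code does not have
 * to do the modular indexing by hand. Assumes iter >= 0 and depth >= 1. */
static int32_t *sketch_get_buffer(const sketch_objectfifo_t *of, int32_t iter) {
  return of->buffers[iter % of->depth];
}
```

With depth 3, iterations 0, 1, 2, 3 map to buffers 0, 1, 2, 0; with depth 1 every iteration maps to the single buffer.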
- Add get_depth() method to ObjectFifoCreateOp (aie.py) and
  ObjectFifoHandle (objectfifo.py) that emits an arith.constant with
  the ObjectFIFO depth as an index value, for passing to C kernels
- Add npu-xrt tests for depth-1 (single buffer) and depth-3 (triple
  buffering) exercising the objectfifo_t C API with
  objectfifo_get_buffer() rotation at each depth
- Add MLIR FileCheck tests verifying stateful transform lowering of
  objectfifo.lock and objectfifo.buffer at depth 1 and depth 3

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>