Add C ObjectFIFO API for kernel-managed lock synchronization #2898
Related to #288
Enable precompiled C kernels to directly manage ObjectFIFO
synchronization by receiving lock IDs and buffer references from MLIR,
rather than relying on the compiler to insert acquire/release ops.
New MLIR ops:
- aie.objectfifo.lock: returns (acq_lock, rel_lock) for a port
- aie.objectfifo.buffer: returns a buffer memref at a given index
Stateful transform lowering resolves these ops to concrete lock/buffer
SSA values, handling AIE1 single-lock and AIE2/AIE2P dual-lock
semantics.
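A host-side C sketch of the port-to-lock mapping the lowering has to make. It assumes the usual mlir-aie conventions rather than anything spelled out in this PR. On AIE2/AIE2P an ObjectFIFO has a producer lock (initialised to the depth) and a consumer lock (initialised to 0), and each port acquires one and releases the other. On AIE1 each buffer has a single lock that the producer acquires at 0 and releases to 1, and the consumer does the reverse. `aie_arch_t`, `of_port_t`, `lock_plan_t` and `plan_port_locks` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; ARCH_AIE2 also stands for AIE2P. */
typedef enum { ARCH_AIE1, ARCH_AIE2 } aie_arch_t;
typedef enum { PORT_PRODUCE, PORT_CONSUME } of_port_t;

typedef struct {
  int32_t acq_lock;  /* lock to acquire before touching the buffer */
  int32_t acq_value; /* AIE1: value to wait for; AIE2: -1 = acquire >= 1 */
  int32_t rel_lock;  /* lock to release after touching the buffer */
  int32_t rel_value; /* AIE1: value to set; AIE2: amount to add */
} lock_plan_t;

/* aie1_locks[i] is the single lock guarding buffer i (AIE1 only).
   prod_lock / cons_lock are the semaphore pair (AIE2/AIE2P only). */
static lock_plan_t plan_port_locks(aie_arch_t arch, of_port_t port,
                                   const int32_t *aie1_locks, int buf_index,
                                   int32_t prod_lock, int32_t cons_lock) {
  lock_plan_t p;
  if (arch == ARCH_AIE1) {
    assert(aie1_locks != NULL && buf_index >= 0);
    int32_t l = aie1_locks[buf_index];
    p.acq_lock = l; /* single lock: same ID for acquire and release */
    p.rel_lock = l;
    p.acq_value = (port == PORT_PRODUCE) ? 0 : 1; /* 0 = empty, 1 = full */
    p.rel_value = (port == PORT_PRODUCE) ? 1 : 0;
  } else {
    /* dual lock: producer takes a free slot and publishes a full one,
       consumer takes a full slot and hands back a free one */
    p.acq_lock = (port == PORT_PRODUCE) ? prod_lock : cons_lock;
    p.rel_lock = (port == PORT_PRODUCE) ? cons_lock : prod_lock;
    p.acq_value = -1; /* AcquireGreaterEqual 1 */
    p.rel_value = 1;  /* release increments by 1 */
  }
  return p;
}
```

The AIE1 branch returns the same lock ID twice, which is why a `(acq_lock, rel_lock)` pair can describe both architectures.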
C API header (aie_runtime_lib/AIE2{,P}/aie_objectfifo.h):
objectfifo_port_t struct with inline acquire/release functions using
compiler-provided lock intrinsics. Use acq_value=-1 for
AcquireGreaterEqual semantics matching the DMA protocol.
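The `acq_value = -1` convention can be modelled on the host with a mock lock array. This sketch assumes the AIE2 reading of a negative acquire value as "wait until the lock is >= 1, then decrement by 1", and of release as "add `rel_value`". The `mock_*` and `port_*` helpers are hypothetical stand-ins for the compiler intrinsics, and the acquire returns `false` where the hardware would stall. Only the negative (greater-equal) case is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* objectfifo_port_t as quoted in this PR. */
typedef struct {
  int32_t acq_lock;  /* Lock ID for acquire operation */
  int32_t rel_lock;  /* Lock ID for release operation */
  int32_t acq_value; /* -1 selects AcquireGreaterEqual */
  int32_t rel_value; /* typically 1 */
} objectfifo_port_t;

/* Hypothetical host-side stand-in for the tile's lock values. */
#define MOCK_NUM_LOCKS 16
static int32_t mock_locks[MOCK_NUM_LOCKS];

/* Non-blocking model of a negative-valued acquire: succeed only if the
   lock holds at least -v, then add v (i.e. decrement). */
static bool mock_try_acquire(int32_t id, int32_t v) {
  assert(id >= 0 && id < MOCK_NUM_LOCKS);
  assert(v < 0); /* equality acquires are not modelled here */
  if (mock_locks[id] < -v)
    return false; /* hardware would stall instead */
  mock_locks[id] += v;
  return true;
}

/* Release: add v to the lock value. */
static void mock_release(int32_t id, int32_t v) {
  assert(id >= 0 && id < MOCK_NUM_LOCKS);
  mock_locks[id] += v;
}

static bool port_try_acquire(const objectfifo_port_t *p) {
  return mock_try_acquire(p->acq_lock, p->acq_value);
}

static void port_release(const objectfifo_port_t *p) {
  mock_release(p->rel_lock, p->rel_value);
}
```

With two free slots on lock 0, a producer port `{0, 1, -1, 1}` can acquire twice, the third attempt would stall, and each release bumps lock 1 so the consumer side can proceed.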
Python API: get_lock()/get_buffer() on object_fifo and ObjectFifoHandle.
Bug fixes discovered during hardware bring-up:
- python/dialects/aie.py: get_lock() passed incorrect extra arguments to
the auto-generated ObjectFifoGetLockOp constructor
- AIELocalizeLocks: checked getParentOp() == coreOp, which missed lock
uses nested inside scf.for; fixed to use isProperAncestor()
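The AIELocalizeLocks fix comes down to the difference between a direct-parent test and an ancestor walk. Below is a sketch over a toy parent-linked op tree (the `op_t` type and both helpers are hypothetical; in MLIR the real call is `Operation::isProperAncestor`): a lock used by a call inside `scf.for` has the loop, not the core, as its direct parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for an MLIR operation tree (hypothetical). */
typedef struct op {
  const char *name;
  struct op *parent; /* NULL at the top level */
} op_t;

/* Old check: only matches uses whose immediate parent is the core. */
static bool direct_child_of(const op_t *core, const op_t *use) {
  return use->parent == core;
}

/* Fixed check: walk the whole parent chain, like isProperAncestor.
   An op is not a proper ancestor of itself. */
static bool is_proper_ancestor(const op_t *core, const op_t *use) {
  for (const op_t *p = use->parent; p != NULL; p = p->parent)
    if (p == core)
      return true;
  return false;
}
```

A use in a different core still fails both checks, so the ancestor walk only widens the match to ops nested anywhere inside this core.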
Includes:
- CMake install rules for aie_objectfifo.h into install tree
- LIT tests for lock localization with external function calls
- E2E NPU tests (lock_pass_to_c, objectfifo_lock_c_api)
- Programming example: passthrough_kernel_c_objfifo
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Branch updated from b1bb77e to 24af672.
Coverage Report created: 2026-03-03 03:40. Generated by llvm-cov (LLVM version 18.1.3).
The C-ABI is basically: manually extract locks and buffers from the objectfifo symbol, pass them around in an ad-hoc way, then glue some of them back together into a partial ObjectFIFO data structure so that you can call objectfifo-like functions on them:

```python
out_buf0 = of_out.get_buffer(0)
out_buf1 = of_out.get_buffer(1)
out_acq, out_rel = of_out.get_lock(ObjectFifoPort.Produce)
# C kernel owns the compute loop and buffer rotation
passThroughLine(
    in_buf0,
    in_buf1,
    out_buf0,
    out_buf1,
    in_acq,
    in_rel,
    out_acq,
    out_rel,
)
```

```c
typedef struct {
  int32_t acq_lock;  // Lock ID for acquire operation
  int32_t rel_lock;  // Lock ID for release operation
  int32_t acq_value; // Value for acquire_equal(): use -1 for AcquireGreaterEqual
  int32_t rel_value; // Value for release() call (typically 1)
} objectfifo_port_t;

void passThroughLine(int32_t *in_buf0, int32_t *in_buf1, int32_t *out_buf0,
                     int32_t *out_buf1, int64_t in_acq_lock,
                     int64_t in_rel_lock, int64_t out_acq_lock,
                     int64_t out_rel_lock) {
  objectfifo_port_t port_in = {(int32_t)in_acq_lock, (int32_t)in_rel_lock, -1, 1};
  objectfifo_port_t port_out = {(int32_t)out_acq_lock, (int32_t)out_rel_lock, -1, 1};
  // ...
  objectfifo_acquire(&port_in);
  objectfifo_acquire(&port_out);
  // ...
}
```

Did you consider something like this instead?

```python
# C kernel owns the compute loop and buffer rotation
passThroughLine(of_out.get_value())
```

```c
typedef struct {
  // some well-defined layout
  // ports ...
  // buffers ...
} objectfifo_t;

void passThroughLine(objectfifo_t *of) {
  // ...
  objectfifo_acquire(Consume, of);
  objectfifo_acquire(Produce, of);
  // ...
}
```
Yes, I was thinking of something along those lines. Handling those buffers gets messy from the user's perspective. The struct should also know how deep the ObjectFIFO is so that it can do the …
This PR needs more …
Replace the lock-only objectfifo_port_t with a self-contained objectfifo_t struct that bundles locks, buffer pointers, and depth. Add objectfifo_get_buffer() for automatic buffer rotation via iter % depth, eliminating manual modular indexing in kernel code. Also remove hardcoded XRT paths from the passthrough_kernel_c_objfifo CMakeLists.txt, since common.cmake handles auto-detection.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
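A minimal sketch of a bundled struct with `iter % depth` rotation, as described in the commit above. The commit does not give the field layout or signatures, so `of_sketch_t`, `of_sketch_get_buffer`, `OF_MAX_DEPTH` and `passthrough_lines` are hypothetical. The lock fields are carried along, but the acquire/release calls are left as comments because they need the AIE intrinsics.

```c
#include <assert.h>
#include <stdint.h>

#define OF_MAX_DEPTH 4 /* hypothetical cap for this sketch */

/* Hypothetical layout: locks, depth, and one pointer per buffer. */
typedef struct {
  int32_t acq_lock;
  int32_t rel_lock;
  int32_t acq_value; /* -1 = AcquireGreaterEqual */
  int32_t rel_value;
  int32_t depth;                 /* number of valid entries in buffers[] */
  int32_t *buffers[OF_MAX_DEPTH];
} of_sketch_t;

/* Buffer for iteration `iter`: rotates through buffers[0..depth-1]. */
static int32_t *of_sketch_get_buffer(const of_sketch_t *of, int32_t iter) {
  assert(of->depth > 0 && of->depth <= OF_MAX_DEPTH && iter >= 0);
  return of->buffers[iter % of->depth];
}

/* Example kernel body: copy `n` lines of `line_len` elements each,
   letting the helper pick the buffer for every iteration. */
static void passthrough_lines(const of_sketch_t *in, const of_sketch_t *out,
                              int32_t n, int32_t line_len) {
  for (int32_t i = 0; i < n; i++) {
    /* on hardware: acquire(in), acquire(out) */
    const int32_t *src = of_sketch_get_buffer(in, i);
    int32_t *dst = of_sketch_get_buffer(out, i);
    for (int32_t k = 0; k < line_len; k++)
      dst[k] = src[k];
    /* on hardware: release(in), release(out) */
  }
}
```

Keeping the rotation inside one helper means the kernel loop body no longer depends on the depth, which is what lets the same kernel run against different buffer counts.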
- Add get_depth() method to ObjectFifoCreateOp (aie.py) and ObjectFifoHandle (objectfifo.py) that emits an arith.constant with the ObjectFIFO depth as an index value, for passing to C kernels
- Add npu-xrt tests for depth-1 (single buffer) and depth-3 (triple buffering) exercising the objectfifo_t C API with objectfifo_get_buffer() rotation at each depth
- Add MLIR FileCheck tests verifying stateful transform lowering of objectfifo.lock and objectfifo.buffer at depth 1 and depth 3

Co-Authored-By: Claude Opus 4.6 <[email protected]>
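Why the depth-1 and depth-3 tests exercise different behaviour: with AIE2-style counting locks (assumed here: the producer lock starts at the depth, the consumer lock at 0), the producer can run at most `depth` iterations ahead of the consumer. A host-side simulation with hypothetical `fifo_sim_t` helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two counting semaphores modelled on the AIE2 lock pair (assumption). */
typedef struct {
  int32_t prod;  /* free slots: producer acquires, consumer releases */
  int32_t cons;  /* filled slots: consumer acquires, producer releases */
  int32_t depth;
} fifo_sim_t;

static fifo_sim_t fifo_sim_init(int32_t depth) {
  assert(depth > 0);
  fifo_sim_t f = {depth, 0, depth};
  return f;
}

/* Producer step: take a free slot, fill it, publish it. */
static bool sim_produce(fifo_sim_t *f) {
  if (f->prod < 1)
    return false; /* hardware would stall here */
  f->prod -= 1;
  f->cons += 1;
  return true;
}

/* Consumer step: take a filled slot, drain it, hand it back. */
static bool sim_consume(fifo_sim_t *f) {
  if (f->cons < 1)
    return false;
  f->cons -= 1;
  f->prod += 1;
  return true;
}

/* Iterations the producer completes before an idle consumer blocks it. */
static int32_t producer_run_ahead(int32_t depth) {
  fifo_sim_t f = fifo_sim_init(depth);
  int32_t n = 0;
  while (sim_produce(&f))
    n++;
  return n;
}
```

With depth 1 the producer and consumer strictly alternate; with depth 3 the producer can fill all three buffers before it has to wait.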
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>