Add bf16-emulation option to convert-vector-to-aievec pipeline#2942

Merged
erwei-xilinx merged 2 commits into Xilinx:main from erwei-xilinx:erwei/bf16-emulation-mode
Mar 10, 2026
Conversation

erwei-xilinx (Collaborator) commented Mar 10, 2026

Summary

  • Add a bf16-emulation boolean option to the convert-vector-to-aievec pipeline that emulates f32 vector arithmetic using bf16 operations
  • When enabled, the pass inserts arith.truncf/arith.extf around f32 vector ops so they compute in bf16, trading precision for performance (1 bf16 op vs. 3-9 MACs for f32 emulation on AIE2)
  • Excludes arith.divf, because bf16 vector divf is unsupported on all AIE targets (Peano does not legalize G_FDIV on <16 x s16>)
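As an illustrative sketch (hand-written IR, not taken from the patch), the rewrite for a single f32 vector add inserts a truncf on each operand and an extf on the result:

```mlir
// Before: f32 vector add
%r = arith.addf %a, %b : vector<16xf32>

// After demotion (sketch): compute in bf16, extend result back to f32
%a16 = arith.truncf %a : vector<16xf32> to vector<16xbf16>
%b16 = arith.truncf %b : vector<16xf32> to vector<16xbf16>
%r16 = arith.addf %a16, %b16 : vector<16xbf16>
%r2  = arith.extf %r16 : vector<16xbf16> to vector<16xf32>
```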

Details

The pass runs as the first step in the canonicalize-vector-for-aievec pipeline (Vector→Vector stage), before the existing lowering patterns. Supported ops:

| Category | Operations |
| --- | --- |
| Binary arithmetic | `arith.addf`, `arith.subf`, `arith.mulf`, `arith.maximumf`, `arith.minimumf` |
| Comparison | `arith.cmpf` (result stays `vector<Nxi1>`) |
| Select | `arith.select` |
| FMA | `vector.fma` |
| Unary | `arith.negf` |
| Reduction | `vector.reduction` |

A smart truncation helper (smartTruncF32ToBF16) eliminates redundant extf→truncf chains between consecutive demoted ops. For example, addf f32 → mulf f32 becomes addf bf16 → mulf bf16 with no intermediate type conversions.
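A sketch of the chain optimization described above (illustrative IR; the exact helper output may differ). Without the helper, the addf result would be extended to f32 and immediately truncated back to bf16 before feeding the mulf; with it, the bf16 value flows through directly:

```mlir
// Naive demotion would produce a redundant round trip between ops:
//   %t = arith.extf %s16 : vector<16xbf16> to vector<16xf32>
//   %u = arith.truncf %t : vector<16xf32> to vector<16xbf16>
// smartTruncF32ToBF16 forwards the bf16 value instead (sketch):
%s16 = arith.addf %a16, %b16 : vector<16xbf16>
%p16 = arith.mulf %s16, %c16 : vector<16xbf16>   // consumes the bf16 addf result directly
%res = arith.extf %p16 : vector<16xbf16> to vector<16xf32>
```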

After demotion, the existing bf16 lowering patterns handle the ops naturally. FMA fusion still works: mulf bf16 + addf bf16 fuses to aievec.mac_elem with f32 accumulator, providing some precision recovery.
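Roughly, the fused form looks like the following (a sketch only; the exact `aievec.mac_elem` assembly format and operand order are assumptions, not copied from the patch):

```mlir
// mulf bf16 + addf bf16 fuses into a single multiply-accumulate whose
// accumulator is f32, recovering some of the precision lost to demotion
%acc = aievec.mac_elem %a16, %b16, %c32 :
         vector<16xbf16>, vector<16xbf16>, vector<16xf32>
```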

Usage

aie-opt --convert-vector-to-aievec="aie-target=aie2 target-backend=llvmir bf16-emulation=true" input.mlir

Test plan

  • New test file test/Conversion/VectorToAIEVec/test-bf16-emulation.mlir with 11 test cases covering all patterns, chain optimization, divf exclusion, and pass-through of non-f32 ops
  • Verify existing VectorToAIEVec tests pass (no regression)
  • Full pipeline test: f32 mulf+addf with bf16-emulation produces aievec.mac_elem (bf16 inputs, f32 accumulator)
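A hypothetical FileCheck case in the style of the divf-exclusion tests mentioned above (function name and CHECK lines are invented for illustration):

```mlir
// RUN: aie-opt %s --convert-vector-to-aievec="aie-target=aie2 target-backend=llvmir bf16-emulation=true" | FileCheck %s

// divf must not be demoted: bf16 vector divf is unsupported on AIE targets
// CHECK-LABEL: @divf_not_demoted
// CHECK-NOT: arith.truncf
func.func @divf_not_demoted(%a: vector<16xf32>, %b: vector<16xf32>) -> vector<16xf32> {
  %r = arith.divf %a, %b : vector<16xf32>
  return %r : vector<16xf32>
}
```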

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 10, 2026 00:30
Add a `bf16-emulation` option that emulates f32 vector arithmetic
using bf16 operations. When enabled, the pass inserts arith.truncf/
arith.extf around f32 vector ops to compute in bf16, trading
precision for performance (1 bf16 op vs 3-9 MACs for f32 emulation
on AIE2).

The pass runs as the first step in the canonicalize-vector-for-aievec
pipeline. It handles binary ops (addf, subf, mulf, maximumf,
minimumf), comparison (cmpf), select, vector.fma, negf, and
vector.reduction. arith.divf is excluded because bf16 vector divf
is not supported by Peano on any AIE target.

A smart truncation helper eliminates redundant extf->truncf chains
between consecutive demoted ops, keeping the IR clean.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI left a comment

Pull request overview

Adds an opt-in bf16-emulation mode to the AIEVec vector canonicalization stage so f32 vector arithmetic can be computed via bf16 ops (by inserting arith.truncf/arith.extf) before the existing AIEVec lowering patterns run.

Changes:

  • Introduce a new BF16 emulation pass that rewrites supported f32 vector ops to bf16 equivalents with ext/trunc boundaries (and avoids redundant extf→truncf chains).
  • Plumb a new bf16-emulation boolean pipeline option through convert-vector-to-aievec into the canonicalize-vector-for-aievec sub-pipeline.
  • Add a new MLIR test file covering several emulation rewrite patterns and the divf exclusion behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lib/Dialect/AIEVec/Transforms/VectorToVectorConversions.cpp | Adds BF16 emulation rewrite patterns + pass and hooks it into the canonicalize pipeline when enabled. |
| include/aie/Dialect/AIEVec/Pipelines/Passes.h | Adds bf16-emulation option to pipeline options and forwards it from convert→canonicalize options. |
| test/Conversion/VectorToAIEVec/test-bf16-emulation.mlir | New FileCheck tests for bf16-emulation behavior (demotion, chaining, and divf non-demotion). |


@erwei-xilinx erwei-xilinx added this pull request to the merge queue Mar 10, 2026
Merged via the queue into Xilinx:main with commit d4f709d Mar 10, 2026
60 checks passed
@erwei-xilinx erwei-xilinx deleted the erwei/bf16-emulation-mode branch March 10, 2026 04:42
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request Mar 10, 2026
Update mlir-aie wheel to 0.0.1.2026031005+d4f709d which includes
the bf16-emulation pass (PR Xilinx/mlir-aie#2942). This commit adds
the convert-vector-to-aievec bf16-emulation option and the aiecc
--bf16-emulation CLI flag needed by the f32 primitive tests.

Also updates eudsl-python-extras hash to 09d24cd.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
erwei-xilinx added a commit to erwei-xilinx/mlir-aie that referenced this pull request Mar 12, 2026
The --bf16-emulation flag was added to the Python aiecc in Xilinx#2942 but
was lost when the Python aiecc was replaced by the C++ binary in
Xilinx#2925. This adds it to the C++ binary and threads it through to the
convert-vector-to-aievec pipeline string.

Co-Authored-By: Claude Opus 4.6 <[email protected]>