
feat(flatbuffer_direct): add QLinear ops and quant-chain fusion #873

Merged
PINTO0309 merged 2 commits into main from feat-fdirect-int8 on Feb 15, 2026

Conversation

PINTO0309 (Owner) commented on Feb 14, 2026

Summary

This PR extends flatbuffer_direct for quantized ONNX graphs and documents the updated support matrix.

face_recognition_sface_2021dec_int8.onnx.zip

What was added

  1. Added direct lowering for quantized operators in flatbuffer_direct:

    • QuantizeLinear -> QUANTIZE
    • DequantizeLinear -> DEQUANTIZE
    • QLinearAdd -> ADD
    • QLinearMul -> MUL
    • QLinearConv -> CONV_2D / DEPTHWISE_CONV_2D
    • QLinearMatMul -> FULLY_CONNECTED
  2. Added flatbuffer_direct-only preprocess fusion:

    • DequantizeLinear -> BatchNormalization -> PRelu -> QuantizeLinear
    • BN params are folded and rewritten into Mul + Add form before lowering (see the folding sketch after this list).
  3. Added supporting direct builders and registry coverage:

    • BatchNormalization direct builder (MUL + ADD form)
    • Flatten direct builder (RESHAPE form)
    • op registry validators for quantized op constraints (const inputs, rank/shape checks, axis checks)
  4. Updated README support status section for flatbuffer_direct.
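
To make the fusion in item 2 concrete, here is a minimal sketch of the BatchNormalization folding (the helper name and layout are illustrative, not the actual rule code):

```python
import numpy as np

def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into an equivalent Mul + Add pair.

    y = gamma * (x - mean) / sqrt(var + eps) + beta
      = scale * x + bias
    """
    scale = gamma / np.sqrt(var + eps)  # per-channel Mul constant
    bias = beta - mean * scale          # per-channel Add constant
    return scale, bias
```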

Implementation details

Quantized builder layer

  • New file: onnx2tf/tflite_builder/op_builders/quantized.py
  • Implements quant parameter handling (scale, zero_point, quantized_dimension) for both:
    • raw quantized path (QLinear*)
    • dequant/quant bridge path (DequantizeLinear / QuantizeLinear)
  • Handles:
    • dtype derivation from zero_point
    • quant metadata propagation across intermediate tensors
    • bias scale derivation for int32 bias (input_scale * weight_scale)
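
A rough sketch of the dtype and bias-scale rules above, with hypothetical function names (only the arithmetic follows the PR description):

```python
import numpy as np

def dtype_from_zero_point(zero_point: np.ndarray) -> np.dtype:
    # The quantized tensor dtype is derived from the zero_point constant:
    # an int8 zero_point implies an INT8 tensor, a uint8 one implies UINT8.
    return zero_point.dtype

def quantize_bias(bias_f32: np.ndarray, input_scale: float,
                  weight_scale: np.ndarray) -> np.ndarray:
    # TFLite int32 bias uses scale = input_scale * weight_scale
    # (per-channel when weight_scale is per-channel) and zero_point = 0.
    return np.round(bias_f32 / (input_scale * weight_scale)).astype(np.int32)
```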

Dispatch/validation layer

  • Updated onnx2tf/tflite_builder/op_registry.py with new DispatchEntry records and validators:
    • _validate_quantize_dequantize_linear
    • _validate_qlinear_binary
    • _validate_qlinear_conv
    • _validate_qlinear_matmul
    • _validate_batch_norm
    • _validate_flatten
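
For illustration only (the real signatures in op_registry.py may differ), a validator in this style checks static constraints before dispatch; here a hypothetical sketch of `_validate_qlinear_binary` rejecting nodes whose quant params are not constants:

```python
def _validate_qlinear_binary(node, graph):
    """Hypothetical sketch: QLinearAdd/QLinearMul take quant params at
    inputs 1, 2, 4, 5, 6, 7 (a/b/c scale and zero_point); all of them
    must be graph constants for direct lowering."""
    for idx in (1, 2, 4, 5, 6, 7):
        name = node.input[idx]
        if name not in graph.constants:  # 'constants' is an assumed lookup
            return f"{node.op_type}: quant param '{name}' must be constant"
    return None  # supported
```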

Preprocess rules

  • New rule file: onnx2tf/tflite_builder/preprocess/rules/quant_chain_fusion.py
  • Registered via:
    • onnx2tf/tflite_builder/preprocess/rules/__init__.py
    • onnx2tf/tflite_builder/preprocess/__init__.py
  • Rule ID: quant_chain_fusion_wave3
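
The rule body is not shown in this PR description; as a hypothetical sketch of the matcher such a rule might register (names and graph API assumed):

```python
RULE_ID = "quant_chain_fusion_wave3"
CHAIN = ("DequantizeLinear", "BatchNormalization", "PRelu", "QuantizeLinear")

def matches_chain(node, consumers):
    """Walk single-consumer edges from `node` and compare op_types
    against the DQ -> BN -> PRelu -> Q pattern."""
    cur = node
    for expected in CHAIN:
        if cur is None or cur.op_type != expected:
            return False
        nxt = consumers.get(cur.output[0], [])
        cur = nxt[0] if len(nxt) == 1 else None
    return True
```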

Tests

Added/updated tests

  • tests/test_tflite_builder_direct.py
    • Added qlinear conv chain model
    • Added qlinear matmul/fc chain model
    • Included both in direct operator smoke coverage
  • tests/test_tflite_builder_preprocess.py
    • Added test_quant_chain_fusion_wave3_rewrites_dq_bn_prelu_q_chain

Test results

  • pytest -q tests/test_tflite_builder_preprocess.py tests/test_tflite_builder_direct.py
    • 72 passed

Real model verification

Validated with:

  • face_recognition_sface_2021dec_int8.onnx
  • command: python -m onnx2tf -i face_recognition_sface_2021dec_int8.onnx -o /tmp/onnx2tf_sface_pr_check -tb flatbuffer_direct --report_op_coverage -n

Coverage report highlights:

  • conversion_error = null
  • graph_summary.supported_nodes = 278 / 278
  • graph_summary.unsupported_nodes = 0
  • graph_summary.coverage_ratio = 1
  • quant_chain_fusion_wave3 applied (matched_patterns=26, rewritten_patterns=26)

Additional updates

  • Version bump:
    • pyproject.toml: 2.0.11
    • onnx2tf/__init__.py: 2.0.11
  • README Docker tag examples aligned to 2.0.11.

PINTO0309 (Owner, Author) commented on Feb 14, 2026

Supplement for additional commit 1090987.

Additional implementation (more INT8-oriented)

  • Added direct lowering of PRelu to PRELU in flatbuffer_direct
    • onnx2tf/tflite_builder/op_builders/elementwise.py:215
  • For quantized input (INT8/UINT8), alpha (slope) is converted to a quantized constant with quantization metadata (see the sketch after this list)
    • onnx2tf/tflite_builder/op_builders/elementwise.py:185
  • Added PRelu dispatch and validator in the registry
    • onnx2tf/tflite_builder/op_registry.py:664
    • onnx2tf/tflite_builder/op_registry.py:940
  • Removed PRelu expansion from pseudo_ops_wave1; PRelu is now handled by the direct builder without decomposition
    • onnx2tf/tflite_builder/preprocess/rules/pseudo_ops.py:489
  • Added PRELU builtin options mapping in model writer
    • onnx2tf/tflite_builder/model_writer.py:202
  • Updated README support matrix
    • README.md:280 (builtin count 38)
    • README.md:315 (PRelu row)
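
A minimal sketch of the slope quantization mentioned above (the scale choice and helper are assumptions, not the exact code at elementwise.py:185):

```python
import numpy as np

def quantize_alpha(alpha_f32: np.ndarray, dtype=np.int8):
    """Quantize the PRelu slope to int8/uint8 and return the quantized
    data plus the (scale, zero_point) metadata attached to the constant."""
    qmin, qmax = np.iinfo(dtype).min, np.iinfo(dtype).max
    zero_point = 128 if np.dtype(dtype) == np.uint8 else 0
    max_abs = float(np.max(np.abs(alpha_f32)))
    scale = max_abs / (qmax - zero_point) if max_abs > 0 else 1.0
    q = np.round(alpha_f32 / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(dtype), (scale, zero_point)
```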

Tests

  • Existing + newly added tests:
    • tests/test_tflite_builder_direct.py:881 (verifies PRELU builtin emission)
  • Result:
    • pytest -q tests/test_tflite_builder_preprocess.py tests/test_tflite_builder_direct.py tests/test_tflite_builder_op_coverage.py
    • 76 passed

Real model verification (face_recognition_sface_2021dec_int8.onnx)

  • Conversion succeeded (--report_op_coverage): unsupported 0 / coverage 1.0
  • All 27 PRelu nodes are dispatch_mode=builtin
  • Number of float-output ops in generated TFLite:
    • Before: 223 / 335
    • After: 115 / 227
    • Main effect: decomposed PRelu (Neg/Relu/Mul/Sub) was consolidated into a single PRELU op

PINTO0309 merged commit 4bdb913 into main on Feb 15, 2026
3 checks passed
PINTO0309 deleted the feat-fdirect-int8 branch on February 15, 2026 at 00:38