NPU usage: TensorFlow Lite NNAPI and TensorFlow Lite QNN Support #47

@DoS007

Description

Models like the medium or large-v3-turbo variants are (much) better at speech recognition, but they need more hardware performance or efficiency.

To be able to use those variants on smartphones, it is important to use the NPU/AI chip, which is optimized for the kinds of operations the Whisper models need.

The corresponding benchmark comparison of NPU usage vs. CPU and GPU on the S23 Ultra (higher is better):

| Backend (S23 Ultra)   | Quantized | Half-Precision |
|-----------------------|-----------|----------------|
| TensorFlow Lite CPU   | 1738      | 926            |
| TensorFlow Lite GPU   | 2240      | 2747           |
| TensorFlow Lite NNAPI | 634       | 279            |
| TensorFlow Lite QNN   | 37498     | 16474          |

Source

  • On Qualcomm SoCs, QNN is probably the best backend; on other SoCs (e.g. MediaTek, Tensor G on Pixels), NNAPI is the way to use the NPU.
  • Quantized should correspond to int8. On this device, NPU usage scores about 16.7 times better than the best other backend (37498 / 2240 ≈ 16.74).
  • On other devices the difference might be smaller (or larger), but on many devices NPU usage should provide a big advantage (e.g. 6 times).
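As a quick sanity check on the ratio quoted above, the quantized scores from the table can be compared directly; a minimal sketch (backend names and scores taken from the table, higher is better):

```python
# Quantized benchmark scores from the table above (higher is better).
scores = {
    "CPU": 1738,
    "GPU": 2240,
    "NNAPI": 634,
    "QNN": 37498,
}

# Speedup of QNN over the best non-QNN backend on this device.
best_other = max(v for k, v in scores.items() if k != "QNN")
speedup = scores["QNN"] / best_other
print(f"QNN scores {speedup:.2f}x the best other backend")  # ~16.74x
```

Note that the "best other backend" differs per column: for half-precision the GPU is also the runner-up (16474 / 2747 ≈ 6.0), which matches the "e.g. 6 times" estimate above.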

(And even when staying with the same model, more hardware performance also means faster transcription.)
