NPU usage: TensorFlow Lite NNAPI and TensorFlow Lite QNN Support #47

@DoS007

Description

Models like the medium or large-v3-turbo variants are (much) better at speech recognition, but they need more hardware performance or efficiency.

To be able to use those variants on smartphones, it is important to use the NPU/AI chip, which is optimized for the kinds of operations the Whisper models need.

The corresponding benchmark comparison of NPU usage vs. CPU and GPU on the S23 Ultra (higher is better):

| Backend (S23 Ultra)   | Quantized | Half-Precision |
|-----------------------|-----------|----------------|
| TensorFlow Lite CPU   | 1738      | 926            |
| TensorFlow Lite GPU   | 2240      | 2747           |
| TensorFlow Lite NNAPI | 634       | 279            |
| TensorFlow Lite QNN   | 37498     | 16474          |

Source

  • On Qualcomm SoCs, QNN is probably the best backend; on other SoCs (e.g. MediaTek, Tensor G on Pixels), NNAPI is the way to use the NPU.
  • Quantized should correspond to int8. On this device, NPU usage scores about 16.7 times better than the best other backend (37498 / 2240 ≈ 16.74).
  • On other devices the difference might be smaller (or larger), but on many devices NPU usage should provide a big advantage (e.g. 6 times).
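As a quick sanity check on the ratio quoted above, the quantized scores from the table can be compared directly; a minimal sketch (backend names and scores taken from the table, higher is better):

```python
# Quantized benchmark scores from the table above (higher is better).
scores = {
    "CPU": 1738,
    "GPU": 2240,
    "NNAPI": 634,
    "QNN": 37498,
}

# Speedup of QNN over the best non-QNN backend on this device.
best_other = max(v for k, v in scores.items() if k != "QNN")
speedup = scores["QNN"] / best_other
print(f"QNN scores {speedup:.2f}x the best other backend")  # ~16.74x
```

Note that the "best other backend" differs per column: for half-precision the GPU is also the runner-up (16474 / 2747 ≈ 6.0), which matches the "e.g. 6 times" estimate above.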

(And even when staying with the same model, more hardware performance also means faster transcription.)
