Models like medium or the large-v3-turbo variant are (much) better at speech recognition, but they need more hardware performance or efficiency.
To be able to use those variants on a smartphone it is important to use the NPU/AI chip, which is optimized for AI-style operations (exactly what the Whisper models need).
A benchmark comparison of NPU usage vs. CPU and GPU on the S23 Ultra (higher is better):
| Backend | Quantized | Half-Precision |
|---|---|---|
| S23 Ultra: TensorFlow Lite CPU | 1738 | 926 |
| S23 Ultra: TensorFlow Lite GPU | 2240 | 2747 |
| S23 Ultra: TensorFlow Lite NNAPI | 634 | 279 |
| S23 Ultra: TensorFlow Lite QNN | 37498 | 16474 |
- On Qualcomm SoCs QNN is probably the best backend; on other SoCs (e.g. MediaTek, Tensor G on Pixels) NNAPI is the way to reach the NPU.
- "Quantized" should correspond to int8. NPU usage is about 16 times faster than the best other backend on that device (37498/2240 ≈ 16.74).
- On other devices the difference may be smaller (or larger), but on many devices the NPU should still provide a big advantage (e.g. 6×).
(And even when staying with the same model, more hardware performance also means faster transcription.)
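To make the backend choice and the speedup arithmetic above concrete, here is a minimal Python sketch. The function names and the vendor-based selection rule are illustrative assumptions, not part of any real TFLite API; the scores are the quantized column of the table above.

```python
# Quantized (int8) benchmark scores for the S23 Ultra, from the table above
# (higher is better).
SCORES_QUANTIZED = {
    "CPU": 1738,
    "GPU": 2240,
    "NNAPI": 634,
    "QNN": 37498,
}

def preferred_backend(soc_vendor: str) -> str:
    """Hypothetical helper: pick the backend most likely to use the NPU.

    QNN is Qualcomm-specific; on other SoCs (MediaTek, Tensor G, ...)
    NNAPI is assumed to be the route to the NPU.
    """
    return "QNN" if soc_vendor.lower() == "qualcomm" else "NNAPI"

def npu_speedup(scores: dict) -> float:
    """Speedup of the QNN (NPU) score over the best non-QNN backend."""
    best_other = max(v for k, v in scores.items() if k != "QNN")
    return scores["QNN"] / best_other

print(preferred_backend("Qualcomm"))            # QNN
print(preferred_backend("MediaTek"))            # NNAPI
print(round(npu_speedup(SCORES_QUANTIZED), 2))  # 16.74
```

Note that on this particular device the NNAPI score (634) is actually below the CPU score, so the vendor-based fallback is only a rough heuristic.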