
Conversation

@abitofevrything
Contributor

Adds SSE3 SIMD support and support for using Imath for fp16 <-> fp32 conversions. Imath can be faster on systems where whisper.cpp has no native method for the conversion, because it uses a lookup table; on my system this gives an ~3.5x speed increase.

@abitofevrything abitofevrything marked this pull request as draft January 3, 2023 21:15
@abitofevrything
Contributor Author

Drafting as I am unsure what values to use for GGML_F32_STEP and GGML_F16_STEP; guidance on this would be appreciated.

@abitofevrything
Contributor Author

A quick test seems to show that 32 leads to better performance than 16 or 64

@ggerganov
Member

A quick test seems to show that 32 leads to better performance than 16 or 64

Yes, that's what I do - trial and error to find the best value :)

This is a great contribution.
Before merging, I would like to avoid the Imath dependency.
We can simply generate a lookup table in ggml.c and use it instead of relying on Imath.
Take a look at the existing lookup tables for gelu and exp:

https://github.com/ggerganov/whisper.cpp/blob/a0d4f8e65ca03247ef385552a34be11ef6f1a871/ggml.c#L246-L250

I'm very curious to see whether this F16 LUT will speed up the WASM examples, because WASM has no intrinsic for FP16 <-> FP32 conversion and so falls back to the naive conversion method.
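The suggested approach can be sketched as follows: since a half-precision value has only 2^16 bit patterns, all conversions can be precomputed once at init into a 256 KiB table, after which a conversion is a single load. This is a hedged sketch under stated assumptions; the naive-conversion helper and function names are illustrative, not ggml's exact code:

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t ggml_fp16_t;

// naive IEEE binary16 -> binary32 conversion, used only to fill the table
static float fp16_to_fp32_naive(ggml_fp16_t h) {
    const uint32_t sign = (uint32_t)(h >> 15) << 31;
    const uint32_t exp  = (h >> 10) & 0x1f;
    const uint32_t mant = h & 0x3ff;
    uint32_t bits;
    if (exp == 0) {
        // zero or subnormal: value is mant * 2^-24
        float f = mant * (1.0f / 16777216.0f);
        memcpy(&bits, &f, sizeof(bits));
        bits |= sign;
    } else if (exp == 0x1f) {
        bits = sign | 0x7f800000u | (mant << 13); // inf / NaN
    } else {
        // rebias exponent: 15 -> 127, widen mantissa: 10 -> 23 bits
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

static float table_f16_f32[1 << 16]; // 256 KiB, filled once at init

static void init_f16_table(void) {
    for (uint32_t i = 0; i < (1 << 16); i++) {
        table_f16_f32[i] = fp16_to_fp32_naive((ggml_fp16_t) i);
    }
}

static inline float lookup_fp16_to_fp32(ggml_fp16_t h) {
    return table_f16_f32[h];
}
```

The same trade-off as gelu and exp applies: a small one-time init cost and a table that mostly stays cache-resident, in exchange for removing the bit manipulation from the hot loop.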

@abitofevrything
Contributor Author

Leaving as a draft for now as I want to see if I can get rid of some of the memcpy calls in the ggml_lookup_fp16_to_fp32 function.

A review would still be appreciated, though, as I am almost done with this.

@abitofevrything abitofevrything changed the title Add SSE3 and Imath support Add SSE3 and fp16 conversion lookup table Jan 6, 2023
@abitofevrything
Contributor Author

Turns out the memcpy calls are optimised out by the compiler anyway :) Marking this as ready.
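For readers unfamiliar with the idiom: memcpy through a local is the standard strict-aliasing-safe way to reinterpret bits in C, and GCC and Clang compile it down to a plain register move at any optimisation level above -O0, so there is no copy at runtime. A minimal sketch (helper names are illustrative):

```c
#include <stdint.h>
#include <string.h>

// Reinterpret float <-> uint32_t bits without violating strict aliasing.
// Compilers recognise the fixed-size memcpy and emit a single move.
static inline uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static inline float bits_f32(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}
```

This is why the LUT indexing stays cheap: the "copy" exists only in the source, not in the generated code.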

@abitofevrything abitofevrything marked this pull request as ready for review January 6, 2023 12:34
@ggerganov
Member

@abitofevrything
Good news! As expected, the lookup table improves the WASM performance.
On a MacBook M1 Pro, I observe a 25% speed-up in Firefox and a 35% speed-up in Chrome.

@ggerganov ggerganov merged commit a62170c into ggml-org:master Jan 6, 2023
anandijain pushed a commit to anandijain/whisper.cpp that referenced this pull request Apr 28, 2023
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <[email protected]>
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023