
Conversation

@abitofevrything
Contributor

Adds SSE3 SIMD support and support for using Imath for fp16 <-> fp32 conversions. Imath can be faster on systems where whisper.cpp has no native method for the conversion, because it uses a lookup table; on my system this gives an ~3.5x speed increase.

@abitofevrything abitofevrything marked this pull request as draft January 3, 2023 21:15
@abitofevrything
Contributor Author

Drafting as I am unsure what values to use for GGML_F32_STEP and GGML_F16_STEP; guidance on this would be appreciated.

@abitofevrything
Contributor Author

A quick test seems to show that 32 leads to better performance than 16 or 64

@ggerganov
Member

A quick test seems to show that 32 leads to better performance than 16 or 64

Yes, that's what I do - trial and error to find the best value :)

This is a great contribution.
Before merging, I would like to avoid the Imath dependency.
We can simply generate a lookup table in ggml.c and use it instead of relying on Imath.
Take a look at the existing lookup tables for gelu and exp:

https://github.com/ggerganov/whisper.cpp/blob/a0d4f8e65ca03247ef385552a34be11ef6f1a871/ggml.c#L246-L250

I'm very curious to see whether this F16 LUT will speed up the WASM examples, because WASM has no intrinsic for FP16 <-> FP32 conversion and so falls back to the naive conversion method.
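The suggested approach can be sketched as follows: since a half-precision value has only 2^16 bit patterns, all conversions can be precomputed once at init into a 256 KiB table, after which a conversion is a single load. This is a hedged sketch under stated assumptions; the naive-conversion helper and function names are illustrative, not ggml's exact code:

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t ggml_fp16_t;

// naive IEEE binary16 -> binary32 conversion, used only to fill the table
static float fp16_to_fp32_naive(ggml_fp16_t h) {
    const uint32_t sign = (uint32_t)(h >> 15) << 31;
    const uint32_t exp  = (h >> 10) & 0x1f;
    const uint32_t mant = h & 0x3ff;
    uint32_t bits;
    if (exp == 0) {
        // zero or subnormal: value is mant * 2^-24
        float f = mant * (1.0f / 16777216.0f);
        memcpy(&bits, &f, sizeof(bits));
        bits |= sign;
    } else if (exp == 0x1f) {
        bits = sign | 0x7f800000u | (mant << 13); // inf / NaN
    } else {
        // rebias exponent: 15 -> 127, widen mantissa: 10 -> 23 bits
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

static float table_f16_f32[1 << 16]; // 256 KiB, filled once at init

static void init_f16_table(void) {
    for (uint32_t i = 0; i < (1 << 16); i++) {
        table_f16_f32[i] = fp16_to_fp32_naive((ggml_fp16_t) i);
    }
}

static inline float lookup_fp16_to_fp32(ggml_fp16_t h) {
    return table_f16_f32[h];
}
```

The same trade-off as gelu and exp applies: a small one-time init cost and a table that mostly stays cache-resident, in exchange for removing the bit manipulation from the hot loop.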

@abitofevrything
Contributor Author

Leaving as a draft for now as I want to see if I can get rid of some of the memcpy calls in the ggml_lookup_fp16_to_fp32 function.

A review would still be appreciated, though, as I am almost done with this.

@abitofevrything abitofevrything changed the title Add SSE3 and Imath support Add SSE3 and fp16 conversion lookup table Jan 6, 2023
@abitofevrything
Contributor Author

Turns out the memcpy calls are optimised out by the compiler anyway :) Marking this as ready.
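For readers unfamiliar with the idiom: memcpy through a local is the standard strict-aliasing-safe way to reinterpret bits in C, and GCC and Clang compile it down to a plain register move at any optimisation level above -O0, so there is no copy at runtime. A minimal sketch (helper names are illustrative):

```c
#include <stdint.h>
#include <string.h>

// Reinterpret float <-> uint32_t bits without violating strict aliasing.
// Compilers recognise the fixed-size memcpy and emit a single move.
static inline uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static inline float bits_f32(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}
```

This is why the LUT indexing stays cheap: the "copy" exists only in the source, not in the generated code.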

@abitofevrything abitofevrything marked this pull request as ready for review January 6, 2023 12:34
@ggerganov
Member

@abitofevrything
Good news! As expected, the lookup table improves the WASM performance.
On a MacBook M1 Pro, I observe a 25% speed-up in Firefox and a 35% speed-up in Chrome.

@ggerganov ggerganov merged commit a62170c into ggml-org:master Jan 6, 2023
anandijain pushed a commit to anandijain/whisper.cpp that referenced this pull request Apr 28, 2023
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <[email protected]>
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023