Support AVX2 in windows better #381

Mike-Bell · 2023-01-06T16:21:53Z

I spent quite a while trying to figure out why AVX2 wasn't enabled when I built this on Windows with my build system. Adding /arch:AVX2 to the C compiler flags seems to fix it. I'm not building through Visual Studio but via a more customized build system (leveraging MSBuild), which is likely why other people building on Windows don't need this. Still, since ggml.c is a C file (not C++), this seems correct and necessary in some circumstances.
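One way to confirm whether a flag such as /arch:AVX2 actually reached the C compiler is to check the predefined feature macros from inside a C translation unit. The following is a minimal sketch, not code from this repository: the names `simd_macros_enabled`, `print_simd_macros` and the `HAS_*` flags are hypothetical, and the sketch assumes the compiler signals each feature through the usual `__AVX__`, `__AVX2__`, `__FMA__` and `__F16C__` macros.

```c
#include <stdio.h>

/* Illustrative bit flags, one per feature macro (hypothetical names). */
enum { HAS_AVX = 1, HAS_AVX2 = 2, HAS_FMA = 4, HAS_F16C = 8 };

/* Returns a bitmask of the SIMD feature macros that the compiler
 * predefined for this translation unit. */
static int simd_macros_enabled(void) {
    int mask = 0;
#if defined(__AVX__)
    mask |= HAS_AVX;
#endif
#if defined(__AVX2__)
    mask |= HAS_AVX2;
#endif
#if defined(__FMA__)
    mask |= HAS_FMA;
#endif
#if defined(__F16C__)
    mask |= HAS_F16C;
#endif
    return mask;
}

/* Prints one line per macro so the result is visible in a build or test log. */
static void print_simd_macros(void) {
    int m = simd_macros_enabled();
    printf("__AVX__  : %s\n", (m & HAS_AVX)  ? "defined" : "not defined");
    printf("__AVX2__ : %s\n", (m & HAS_AVX2) ? "defined" : "not defined");
    printf("__FMA__  : %s\n", (m & HAS_FMA)  ? "defined" : "not defined");
    printf("__F16C__ : %s\n", (m & HAS_F16C) ? "defined" : "not defined");
}
```

Calling `print_simd_macros()` from a file compiled with the same flags as ggml.c shows what that file sees. Because the check happens at compile time, it reflects the flags passed to the compiler, not what the CPU supports at run time.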

RndyP · 2023-01-06T16:44:28Z

Thanks Mike! I had tried AVX2 and it worked, but was extremely slow. The JFK.wav sample took 17 seconds, where with AVX it takes 4.5 seconds. Using Visual Studio with AVX2 set under C/C++ Code Generation -> Enable Enhanced Instruction Set -> Advanced Vector Extensions 2 (/arch:AVX2) time went from 4.5 to about 3.6 seconds. Nice improvement.

RndyP · 2023-01-06T19:12:12Z

I did some research into what's going on here. One of the most critical sections of code is in ggml_vec_dot_f16(). AVX2 has fused multiply add instructions and AVX does not. So with AVX2 we have:

Without (just AVX) we have:

Note that in the AVX2 case the disassembly shows the fused instruction vfmadd231ps.

ggerganov · 2023-01-06T19:21:38Z

@RndyP
So recently, I was revisiting the SIMD code in ggml.c and I came to the realisation that we don't use AVX2 at all, because all x86 intrinsics that we currently use are either AVX, FMA or F16C. This is based on a lookup on the following site:

https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html

For example, the case that you refer to, I think, actually depends on whether FMA is supported or not:

https://github.com/ggerganov/whisper.cpp/blob/87dd4a30811ee07700ee6fee267508e8935b32fc/ggml.c#L483-L487

So it is strange that AVX2 flags make any difference at all on Windows. My understanding is that AVX2 should not make a difference, but I might be missing something.

RndyP · 2023-01-06T20:42:25Z

I think what's happening is the Microsoft compiler is smart enough to convert the _mm256_add_ps(_mm256_mul_ps(b, c), a) to the fused assembly version, when the compiler option is set. The code above goes from 16 down to 12 vector instructions. 4.5 to 3.6 seconds is a nice improvement.
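The two code paths under discussion can be sketched as below. This is an illustration under stated assumptions, not the code in ggml.c: the macro `SKETCH_F32x8_FMA` and the function `sketch_dot_f32` are hypothetical names, the selection is assumed to hinge on `__FMA__`, and a scalar fallback is included so the file still compiles when `__AVX__` is not defined.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical macro: computes a + b*c on 8 packed floats.
 * With __FMA__ defined it uses the fused intrinsic; otherwise it uses a
 * separate multiply and add, which a compiler may or may not fuse itself. */
#if defined(__AVX__)
  #if defined(__FMA__)
    #define SKETCH_F32x8_FMA(a, b, c) _mm256_fmadd_ps(b, c, a)
  #else
    #define SKETCH_F32x8_FMA(a, b, c) _mm256_add_ps(_mm256_mul_ps(b, c), a)
  #endif
#endif

/* Dot product of two float arrays of length n. */
static float sketch_dot_f32(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    size_t i = 0;
#if defined(__AVX__)
    /* Vector part: accumulate 8 products per iteration. */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        acc = SKETCH_F32x8_FMA(acc, _mm256_loadu_ps(x + i), _mm256_loadu_ps(y + i));
    }
    /* Horizontal reduction of the 8 accumulator lanes. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; ++k) {
        sum += lanes[k];
    }
#endif
    /* Scalar tail (or the whole array when AVX is not enabled). */
    for (; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

If the compiler never defines `__FMA__`, the sketch always takes the multiply-then-add branch, so any vfmadd231ps in the disassembly would have to come from the compiler fusing the two operations on its own, which matches the behavior described above.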

Support AVX2 in windows better

8f4f7d9

Mike-Bell force-pushed the SupportWindowsAvx2Better branch from 4faae25 to 8f4f7d9 Compare January 6, 2023 17:05

ggerganov merged commit 41e05c6 into ggml-org:master Jan 6, 2023

Mike-Bell deleted the SupportWindowsAvx2Better branch January 6, 2023 17:46

anandijain pushed a commit to anandijain/whisper.cpp that referenced this pull request Apr 28, 2023

cmake : support AVX2 in Windows better (ggml-org#381)

499563f

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023

cmake : support AVX2 in Windows better (ggml-org#381)

d40c2be
