fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) #7864
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,7 +7,7 @@ BUILD_TYPE?= | |
| NATIVE?=false | ||
| ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh | ||
| TARGET?=--target grpc-server | ||
| JOBS?=$(shell nproc) | ||
| JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1) | ||
| ARCH?=$(shell uname -m) | ||
|
|
||
| # Disable Shared libs as we are linking on static gRPC and we can't mix shared and static | ||
|
|
@@ -109,10 +109,10 @@ llama-cpp-avx: llama.cpp | |
| $(info ${GREEN}I llama-cpp build info:avx${RESET}) | ||
| ifeq ($(OS),Darwin) | ||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server | ||
| else ifeq ($(ARCH),$(filter $(ARCH),aarch64 arm64)) | ||
| else ifneq ($(filter $(ARCH),aarch64 arm64),) | ||
|
Owner
any specific reason? I find
Contributor
Author
The reason here is:
The I can change to
ifeq ($(OS),Darwin)
# No BMI flags (Darwin)
else ifeq ($(filter $(ARCH),aarch64 arm64),)
# This is x86_64 - ADD BMI flags here
else
# This is ARM - NO BMI flags
endif |
||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server | ||
| else | ||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DCMAKE_C_FLAGS=-mno-bmi2 -DCMAKE_CXX_FLAGS=-mno-bmi2" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server | ||
| CFLAGS="-mno-bmi2" CXXFLAGS="-mno-bmi2" CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server | ||
|
Owner
Do we also need CFLAGS/CXXFLAGS? If not, let's drop them. GGML_BMI2 should be enough.
Contributor
Author
You're right here. -DGGML_BMI2=off alone works (as I tested in the successful build) - the |
||
| endif | ||
| cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-avx-build/grpc-server llama-cpp-avx | ||
|
|
||
|
|
@@ -122,10 +122,10 @@ llama-cpp-fallback: llama.cpp | |
| $(info ${GREEN}I llama-cpp build info:fallback${RESET}) | ||
| ifeq ($(OS),Darwin) | ||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server | ||
| else ifeq ($(ARCH),$(filter $(ARCH),aarch64 arm64)) | ||
| else ifneq ($(filter $(ARCH),aarch64 arm64),) | ||
|
Owner
ditto about logic inversion |
||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server | ||
| else | ||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DCMAKE_C_FLAGS='-mno-bmi -mno-bmi2' -DCMAKE_CXX_FLAGS='-mno-bmi -mno-bmi2'" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server | ||
| CFLAGS="-mno-bmi -mno-bmi2" CXXFLAGS="-mno-bmi -mno-bmi2" CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server | ||
|
Owner
ditto above |
||
| endif | ||
| cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build/grpc-server llama-cpp-fallback | ||
|
|
||
|
|
@@ -135,10 +135,10 @@ llama-cpp-grpc: llama.cpp | |
| $(info ${GREEN}I llama-cpp build info:grpc${RESET}) | ||
| ifeq ($(OS),Darwin) | ||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server | ||
| else ifeq ($(ARCH),$(filter $(ARCH),aarch64 arm64)) | ||
| else ifneq ($(filter $(ARCH),aarch64 arm64),) | ||
|
Owner
ditto |
||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server | ||
| else | ||
| CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DCMAKE_C_FLAGS='-mno-bmi -mno-bmi2' -DCMAKE_CXX_FLAGS='-mno-bmi -mno-bmi2'" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server | ||
| CFLAGS="-mno-bmi -mno-bmi2" CXXFLAGS="-mno-bmi -mno-bmi2" CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI=off -DGGML_BMI2=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server | ||
|
Owner
ditto |
||
| endif | ||
| cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/grpc-server llama-cpp-grpc | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| cmake_minimum_required(VERSION 3.12) | ||
| # CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues | ||
| cmake_minimum_required(VERSION 3.31.10) | ||
|
Owner
ditto |
||
| project(gosd LANGUAGES C CXX) | ||
| set(CMAKE_POSITION_INDEPENDENT_CODE ON) | ||
|
|
||
|
|
||
Is this really needed? If so, it requires compiling CMake as part of the build process. That is doable, but it adds to compilation and CI times. If there is no specific reason for it, I would avoid doing so for now.
I had trouble compiling for RTX 5060 (SM_120) and the only version that consistently worked was CMake 3.31.10. I tried multiple 4.0.x versions and lower CMake versions, but none succeeded. I’d prefer we standardize on 3.31.10 for now - it looks like the safest option, and PyTorch also uses it. Also worth noting: 3.31.9 includes a fix related to CUDA 13, which may be connected to what we’re seeing.
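On the logic-inversion question raised in the Makefile review above, the two conditional forms can be compared in isolation. What follows is a minimal standalone sketch, not the project's Makefile: `BMI_ARGS` is a hypothetical variable introduced only for illustration, and the `-DGGML_BMI2=off` value is taken from the author's reply. It uses the `ifneq ($(filter ...),)` form, which is true only when `ARCH` matches one of the listed ARM names. The older `ifeq ($(ARCH),$(filter ...))` form agrees with it for any single-word, non-empty `ARCH`, but it also takes the ARM branch when `ARCH` is empty, because both sides then expand to nothing.

```makefile
# Standalone sketch (assumption: GNU make 3.81 or newer for "else ifneq").
# BMI_ARGS is a hypothetical name, not a variable from the PR.
OS   ?= $(shell uname -s)
ARCH ?= $(shell uname -m)

ifeq ($(OS),Darwin)
# Darwin: no BMI flags
BMI_ARGS :=
else ifneq ($(filter $(ARCH),aarch64 arm64),)
# filter returned a non-empty match, so this is ARM: no BMI flags
BMI_ARGS :=
else
# Everything else (e.g. x86_64): disable BMI2 through the CMake option
BMI_ARGS := -DGGML_BMI2=off
endif

# Printed while the makefile is parsed, so no recipe is needed.
$(info OS=$(OS) ARCH=$(ARCH) BMI_ARGS=$(BMI_ARGS))

# Empty recipe so that make has a default target.
all: ;
```

Saved under any file name and run with `make -f <file> OS=Linux ARCH=x86_64`, the info line should report `BMI_ARGS=-DGGML_BMI2=off`; with `ARCH=arm64` or `OS=Darwin` it should report an empty `BMI_ARGS`.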