Description
Version: dev
Describe the Bug
I can't load models due to a broken migration when building the latest dev version.
- flash-attention (--flash-attn): the value should be on, off, or auto, not undefined or false (see the sketch below)
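A minimal sketch of how the migration could map the legacy setting onto the values llama.cpp now accepts, assuming the old value was stored as a boolean or string; all names here are illustrative, not the actual app code:

// Hypothetical migration helper: map a legacy flash-attention setting
// (boolean, string, or undefined) onto the values --flash-attn accepts.
type FlashAttn = 'on' | 'off' | 'auto';

function migrateFlashAttn(legacy: boolean | string | undefined): FlashAttn {
  if (legacy === true || legacy === 'true' || legacy === 'on') return 'on';
  if (legacy === false || legacy === 'false' || legacy === 'off') return 'off';
  // Anything else (undefined or an unrecognized string) falls back to the default.
  return 'auto';
}

// The failing case from the log below: 'false' is currently passed through verbatim.
console.log(migrateFlashAttn('false'));   // -> 'off'
console.log(migrateFlashAttn(undefined)); // -> 'auto'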
Steps to Reproduce
- Download a model
- Load it
Screenshots / Logs
LLAMA_CPP_PROCESS_ERROR: The model process encountered an unexpected error.
Details:
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 26800.60 MB
error while handling argument "--flash-attn": error: unkown value for --flash-attn: 'false'
usage:
-fa, --flash-attn [on|off|auto] set Flash Attention use ('on', 'off', or 'auto', default: 'auto')
(env: LLAMA_ARG_FLASH_ATTN)
to show complete usage, run with -h
Operating System
- MacOS
- Windows
- Linux