
bug: can't load models on dev #6856

@louis-jan

Description


Version: dev

Describe the Bug

I can't load models due to a broken settings migration when building the latest dev branch.

--flash-attn: the value should be 'on', 'off', or 'auto', not undefined or false.
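
For illustration, this is roughly what the migration needs to do: map the legacy boolean (or missing) setting onto the new enum llama.cpp expects. A minimal sketch; the names here are hypothetical, not Jan's actual code:

```ts
// Hypothetical migration sketch: the old setting stored a boolean (or nothing),
// the new llama.cpp flag wants 'on' | 'off' | 'auto'.
type FlashAttn = 'on' | 'off' | 'auto';

function migrateFlashAttn(legacy: unknown): FlashAttn {
  if (legacy === true || legacy === 'true' || legacy === 'on') return 'on';
  if (legacy === false || legacy === 'false' || legacy === 'off') return 'off';
  return 'auto'; // undefined/missing falls back to llama.cpp's default
}
```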

Steps to Reproduce

  1. Download a model
  2. Load it

Screenshots / Logs


LLAMA_CPP_PROCESS_ERROR: The model process encountered an unexpected error.

Details:
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_device_init: GPU name: Apple M2 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 26800.60 MB
error while handling argument "--flash-attn": error: unkown value for --flash-attn: 'false'
usage:
-fa, --flash-attn [on|off|auto] set Flash Attention use ('on', 'off', or 'auto', default: 'auto')
(env: LLAMA_ARG_FLASH_ATTN)
to show complete usage, run with -h
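
On the launch side, the normalized value just needs to be forwarded as-is, per the usage text above. A minimal sketch assuming Node's child_process and a local llama-server binary; the binary and model paths are placeholders, not Jan's actual code:

```ts
import { spawn, type ChildProcess } from 'node:child_process';

// Hypothetical launcher: passes the already-normalized flash-attention value
// through to llama-server, which accepts --flash-attn [on|off|auto].
function launchModel(modelPath: string, flashAttn: 'on' | 'off' | 'auto'): ChildProcess {
  return spawn('./llama-server', ['-m', modelPath, '--flash-attn', flashAttn]);
}

launchModel('./models/model.gguf', 'auto');
```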

Operating System

  • macOS
  • Windows
  • Linux
