This repository was archived by the owner on Sep 23, 2023. It is now read-only.

Can't load model if n-gpu-layers > 0 #130

@jhandl

I'm running this on a Mac mini (M2 Pro, 16 GB). I used the macOS one-click installer and copied the vicuna-13b-v1.5-16k.Q4_K_M model into the models directory. When I select this model, the web UI automatically selects the llama.cpp loader.

If I set the n-gpu-layers parameter to 0, everything works, but the GPU isn't used.

If I set it to 1 (or any value other than 0), loading the model produces the following:

2023-09-17 18:38:19 INFO:Loading vicuna-13b-v1.5-16k.Q4_K_M.gguf...
2023-09-17 18:38:19 INFO:llama.cpp weights detected: models/vicuna-13b-v1.5-16k.Q4_K_M.gguf
2023-09-17 18:38:19 INFO:Cache capacity is 0 bytes
llama_model_loader: loaded meta data with 20 key-value pairs and 363 tensors from models/vicuna-13b-v1.5-16k.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_K     [  5120,  5120,     1,     1 ]
...
llama_model_loader: - tensor  362:                    output.weight q6_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str     
llama_model_loader: - kv   1:                               general.name str     
llama_model_loader: - kv   2:                       llama.context_length u32     
llama_model_loader: - kv   3:                     llama.embedding_length u32     
llama_model_loader: - kv   4:                          llama.block_count u32     
llama_model_loader: - kv   5:                  llama.feed_forward_length u32     
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32     
llama_model_loader: - kv   7:                 llama.attention.head_count u32     
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32     
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32     
llama_model_loader: - kv  10:                    llama.rope.scale_linear f32     
llama_model_loader: - kv  11:                          general.file_type u32     
llama_model_loader: - kv  12:                       tokenizer.ggml.model str     
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr     
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32     
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32     
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32     
llama_model_loader: - kv  19:               general.quantization_version u32     
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q4_K:  241 tensors
llama_model_loader: - type q6_K:   41 tensors
llm_load_print_meta: format         = GGUF V2 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 16384
llm_load_print_meta: n_ctx          = 1048
llm_load_print_meta: n_embd         = 5120
llm_load_print_meta: n_head         = 40
llm_load_print_meta: n_head_kv      = 40
llm_load_print_meta: n_layer        = 40
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 13824
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 0.25
llm_load_print_meta: model type     = 13B
llm_load_print_meta: model ftype    = mostly Q4_K - Medium
llm_load_print_meta: model size     = 13.02 B
llm_load_print_meta: general.name   = lmsys_vicuna-13b-v1.5-16k
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: mem required  = 7500.97 MB (+  818.75 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size  =  818.75 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Pro
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: loading '(null)'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=258 "The file name is invalid."
llama_new_context_with_model: ggml_metal_init() failed
2023-09-17 18:38:19 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/Users/jhandl/oobabooga_macos/text-generation-webui/modules/ui_model_menu.py", line 194, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/Users/jhandl/oobabooga_macos/text-generation-webui/modules/models.py", line 77, in load_model
    output = load_func_map[loader](model_name)
  File "/Users/jhandl/oobabooga_macos/text-generation-webui/modules/models.py", line 245, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
  File "/Users/jhandl/oobabooga_macos/text-generation-webui/modules/llamacpp_model.py", line 87, in from_pretrained
    result.model = Llama(**params)
  File "/Users/jhandl/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/llama_cpp/llama.py", line 334, in __init__
    assert self.ctx is not None
AssertionError

Exception ignored in: <function LlamaCppModel.__del__ at 0x157ee39a0>
Traceback (most recent call last):
  File "/Users/jhandl/oobabooga_macos/text-generation-webui/modules/llamacpp_model.py", line 46, in __del__
    self.model.__del__()
AttributeError: 'LlamaCppModel' object has no attribute 'model'
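
To check whether the failure comes from llama-cpp-python itself rather than from the web UI, a minimal sketch along these lines could load the same file directly with both settings. The helper names `build_params` and `try_load` are hypothetical and used only for illustration; `Llama`, `model_path`, and `n_gpu_layers` come from llama-cpp-python, which the traceback shows being called as `Llama(**params)`. The sketch assumes it runs inside the installer's Python environment, and it catches `AssertionError` only because that is what the traceback above shows; other versions of the library may raise something else.

```python
def build_params(model_path, n_gpu_layers):
    """Build the keyword arguments for llama_cpp.Llama (hypothetical helper)."""
    return {
        "model_path": model_path,      # path to the .gguf file
        "n_gpu_layers": n_gpu_layers,  # 0 keeps every layer on the CPU
    }


def try_load(params):
    """Return True if the model loads, False if context creation fails."""
    # Imported here so the helpers above stay usable without llama-cpp-python.
    from llama_cpp import Llama

    try:
        Llama(**params)
    except AssertionError:
        # Matches "assert self.ctx is not None" in the traceback above.
        return False
    return True
```

Calling `try_load(build_params("models/vicuna-13b-v1.5-16k.Q4_K_M.gguf", 0))` and then the same with `1` would show whether the `ggml_metal_init` error also appears outside the web UI.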
