GPT2: llama_model_load: error loading model: missing tensor 'output.weight' #12567
Closed as not planned
Labels
wontfix: This will not be worked on
Description
Name and Version
build/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
version: 4945 (9b169a4)
built with cc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-23) for x86_64-redhat-linux
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 4090
Models
GPT2LMHeadModel
Problem description & steps to reproduce
When I convert the basic GPT2LMHeadModel using the convert_hf_to_gguf.py script, the conversion works perfectly, but when I then load the model with:
from llama_cpp import Llama
llama = Llama("model.path.gguf")
I get this error:
llama_model_load: error loading model: missing tensor 'output.weight'
Evidently the output layer has not been converted, but why? Does anyone have an idea?
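For reference, a minimal reproduction sketch of the steps above, assuming the stock Hugging Face "gpt2" checkpoint; the local directory name "gpt2-hf" is a placeholder, and "model.path.gguf" matches the snippet above:

```python
# Minimal reproduction sketch. Assumptions: the stock Hugging Face "gpt2"
# checkpoint; "gpt2-hf" is a placeholder output directory.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# GPT-2 ties its output projection to the token embedding, so the
# checkpoint has no separate output matrix (possibly relevant here):
assert model.lm_head.weight is model.transformer.wte.weight

model.save_pretrained("gpt2-hf")
tokenizer.save_pretrained("gpt2-hf")

# Convert with the llama.cpp script, then load the result:
#   python convert_hf_to_gguf.py gpt2-hf --outfile model.path.gguf
from llama_cpp import Llama
llama = Llama(model_path="model.path.gguf")  # fails: missing tensor 'output.weight'
```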
First Bad Commit
No response
Relevant log output
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 3072
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = -1
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 1024
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 0.1B
print_info: model params = 124.44 M
print_info: general.name = Gpt2
print_info: vocab type = BPE
print_info: n_vocab = 50257
print_info: n_merges = 50000
print_info: BOS token = 50256 '<|endoftext|>'
print_info: EOS token = 50256 '<|endoftext|>'
print_info: EOT token = 50256 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 50256 '<|endoftext|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device CUDA0
load_tensors: layer 1 assigned to device CUDA0
load_tensors: layer 2 assigned to device CUDA0
load_tensors: layer 3 assigned to device CUDA0
load_tensors: layer 4 assigned to device CUDA0
load_tensors: layer 5 assigned to device CUDA0
load_tensors: layer 6 assigned to device CUDA0
load_tensors: layer 7 assigned to device CUDA0
load_tensors: layer 8 assigned to device CUDA0
load_tensors: layer 9 assigned to device CUDA0
load_tensors: layer 10 assigned to device CUDA0
load_tensors: layer 11 assigned to device CUDA0
load_tensors: layer 12 assigned to device CUDA0
llama_model_load: error loading model: missing tensor 'output.weight'
llama_model_load_from_file_impl: failed to load model