Supports SmolLM by Stillerman · Pull Request #495 · mozilla-ai/llamafile

Stillerman · 2024-07-21T06:52:56Z

These changes are needed to make a GGUF of SmolLM work with llamafile. The GGUF was generated with this PR for llama.cpp.

Tested with:

llamafile-convert smol-135M.gguf
./smol-135M.llamafile

jart · 2024-07-22T06:32:57Z

Wow, thanks for sending this. I checked out your llama.cpp PR, but I'm having trouble creating a GGUF file. Any ideas?

Traceback (most recent call last):
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 3673, in <module>
    main()
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 3666, in main
    model_instance.write()
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 400, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 348, in prepare_metadata
    self.metadata = gguf.Metadata.load(self.metadata_override, self.dir_model, self.model_name, total_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jart/llama.cpp/gguf-py/gguf/metadata.py", line 59, in load
    metadata = Metadata.apply_metadata_heuristic(metadata, model_card, hf_params, model_path, total_params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jart/llama.cpp/gguf-py/gguf/metadata.py", line 396, in apply_metadata_heuristic
    model_full_name_component, org_component, basename, finetune, version, size_label = Metadata.get_model_id_components(model_id, total_params)
                                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jart/llama.cpp/gguf-py/gguf/metadata.py", line 233, in get_model_id_components
    if at_start and ((len(t) == 0 and part[0].isalpha()) or "version" in t):
                                      ~~~~^^^
IndexError: string index out of range

I ran this command:

$?=1 main jart@luna:/fast/hf/SmolLM-135M$ ~/llama.cpp/convert_hf_to_gguf.py --outtype bf16 .
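One possible reading of this traceback, which the thread does not confirm: if the converter derives a model id from the final component of the directory argument, then `.` yields an empty string, and indexing `part[0]` into an empty part raises exactly this `IndexError`. Stillerman's reproduction passes a named directory (`smol-135`) instead of `.`. The sketch is a hypothetical simplification; `name_parts`, `starts_with_letter`, and `model_id_from_dir` are illustrative names, not functions from llama.cpp's `gguf-py`.

```python
from pathlib import Path


def name_parts(model_id: str) -> list:
    # Hypothetical stand-in for the converter's heuristic:
    # split the model id into dash-separated parts.
    return model_id.split("-")


def starts_with_letter(model_id: str) -> list:
    # Mirrors the `part[0].isalpha()` access in the traceback;
    # indexing into an empty string raises IndexError.
    return [part[0].isalpha() for part in name_parts(model_id)]


def model_id_from_dir(path: str) -> str:
    # Resolving first turns "." into an absolute path whose
    # last component is the real directory name.
    return Path(path).resolve().name


if __name__ == "__main__":
    print(repr(Path(".").name))  # "." has no final component, so this is ''
    try:
        starts_with_letter(Path(".").name)
    except IndexError as err:
        print("IndexError:", err)
    print(starts_with_letter("SmolLM-135M"))
```

If this reading is right, running the converter from the parent directory with an explicit model directory, or passing an absolute path, would sidestep the empty name.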

Stillerman · 2024-07-22T16:33:12Z

Hmmm, looking into this now.

Stillerman · 2024-07-22T17:18:09Z

Could you try a fresh copy of llama.cpp at d94c6e0 with the following environment?

(justine) jts@Jasons-MacBook-Air justine % uv pip freeze
certifi==2024.7.4
charset-normalizer==3.3.2
filelock==3.15.4
fsspec==2024.6.1
huggingface-hub==0.24.0
idna==3.7
jinja2==3.1.4
markupsafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==1.26.4
packaging==24.1
pyyaml==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
sentencepiece==0.2.0
sympy==1.13.1
tokenizers==0.19.1
torch==2.3.1
tqdm==4.66.4
transformers==4.42.4
typing-extensions==4.12.2
urllib3==2.2.2
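To check whether a local environment matches these pins before converting, a minimal sketch can compare installed versions using the standard library's `importlib.metadata`. The `PINS` subset is copied from the freeze list above; which packages actually matter for conversion is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative subset of the pins above (the selection is an assumption).
PINS = {
    "huggingface-hub": "0.24.0",
    "numpy": "1.26.4",
    "safetensors": "0.4.3",
    "sentencepiece": "0.2.0",
    "torch": "2.3.1",
    "transformers": "4.42.4",
}


def check_pins(pins: dict) -> dict:
    """Return {package: (pinned, installed or None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINS).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```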

download.py

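from huggingface_hub import snapshot_download

# https://huggingface.co/HuggingFaceTB/SmolLM-135M
model_id = "HuggingFaceTB/SmolLM-135M"

# Download the full repo into ./smol-135 as real files rather than
# symlinks into the Hugging Face cache (recent huggingface_hub
# releases deprecate local_dir_use_symlinks).
snapshot_download(repo_id=model_id, local_dir="smol-135",
                  local_dir_use_symlinks=False, revision="main")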

and then run

python download.py
python convert_hf_to_gguf.py smol-135 --outtype bf16

I was then able to run

make -j8
./llama-cli -m "smol-135/smol-135M-135-BF16.gguf" -p "hi there llama\!"

and it appears to run inference.

jart · 2024-07-22T17:28:50Z

Conversion works now, although the filename it chooses is a little weird.

jart

Fantastic. This model goes wicked fast on CPU.

llama_print_timings:        load time =      59.98 ms
llama_print_timings:      sample time =       1.16 ms /    30 runs   (    0.04 ms per token, 25862.07 tokens per second)
llama_print_timings: prompt eval time =      45.54 ms /   203 tokens (    0.22 ms per token,  4457.82 tokens per second)
llama_print_timings:        eval time =     185.74 ms /    29 runs   (    6.40 ms per token,   156.14 tokens per second)
llama_print_timings:       total time =     237.18 ms /   232 tokens
Log end
smol jart@luna:~/llamafile$ ls -hal /weights/SmolLM-135M.BF16.gguf
-rw-rw-r-- 1 jart jart 259M Jul 22 10:29 /weights/SmolLM-135M.BF16.gguf
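The figures in this log are internally consistent, which a short script can confirm: each tokens-per-second value is the count divided by the elapsed time (29 runs in 185.74 ms is about 156 tokens per second), and prompt plus generated tokens sum to the 232-token total. At 2 bytes per BF16 weight, roughly 135 million parameters come to about 257.5 MiB, close to the 259M that `ls -h` reports. The parameter count is taken from the model name and is approximate; attributing the remaining few MiB to vocabulary and GGUF metadata is an assumption.

```python
import re

# Timing lines copied from the llamafile run above.
LOG = """\
llama_print_timings:        load time =      59.98 ms
llama_print_timings:      sample time =       1.16 ms /    30 runs   (    0.04 ms per token, 25862.07 tokens per second)
llama_print_timings: prompt eval time =      45.54 ms /   203 tokens (    0.22 ms per token,  4457.82 tokens per second)
llama_print_timings:        eval time =     185.74 ms /    29 runs   (    6.40 ms per token,   156.14 tokens per second)
llama_print_timings:       total time =     237.18 ms /   232 tokens
"""

# "<stage> time = <ms> ms [/ <n> runs|tokens] [... <tps> tokens per second]"
LINE = re.compile(
    r"llama_print_timings:\s+(?P<stage>[\w ]+?) time =\s+(?P<ms>[\d.]+) ms"
    r"(?: /\s+(?P<count>\d+) (?:runs|tokens))?"
    r"(?:.*?(?P<tps>[\d.]+) tokens per second)?"
)


def parse_timings(log: str) -> dict:
    """Map each stage to (milliseconds, count or None, reported tokens/s or None)."""
    stats = {}
    for m in LINE.finditer(log):
        count = int(m["count"]) if m["count"] else None
        tps = float(m["tps"]) if m["tps"] else None
        stats[m["stage"]] = (float(m["ms"]), count, tps)
    return stats


def recomputed_tps(ms: float, count: int) -> float:
    """Throughput implied by a duration in milliseconds and a count."""
    return count / (ms / 1000.0)


def bf16_size_mib(n_params: float) -> float:
    """Assumes every weight is BF16 (2 bytes); `ls -h` prints M as 2**20 bytes."""
    return n_params * 2 / 2**20


if __name__ == "__main__":
    stats = parse_timings(LOG)
    for stage, (ms, count, tps) in stats.items():
        if count and tps:
            print(f"{stage}: reported {tps:.2f} tok/s, "
                  f"recomputed {recomputed_tps(ms, count):.2f}")
    # Prompt tokens plus generated tokens should equal the total count.
    print(stats["prompt eval"][1] + stats["eval"][1] == stats["total"][1])
    print(f"~{bf16_size_mib(135e6):.1f} MiB of raw BF16 weights")
```

The small gaps between reported and recomputed throughput come from the log rounding milliseconds to two decimals.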

Thank you! Approved!

Stillerman · 2024-07-23T02:23:29Z

Llamafiles for all SmolLM models can be found here.

jart · 2024-07-23T02:38:06Z

Nice!

Supports SmolLM

946b204

github-actions bot added the llama.cpp label Jul 21, 2024

jart self-requested a review July 22, 2024 06:24

jart approved these changes Jul 22, 2024


jart merged commit cc30400 into mozilla-ai:main Jul 22, 2024

Stillerman mentioned this pull request Jul 22, 2024

Bug: unsupported op 'MUL_MAT' on bf16 but not f16 on SmolLM #499

Closed

jart merged 1 commit into mozilla-ai:main from Stillerman:main
