kraxel@ollivander ~# ramalama --debug run tiny hello
run_cmd: podman inspect quay.io/ramalama/cuda:0.6
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_v2MjJibw8H --env=HOME=/tmp --init --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=ollama://tinyllama --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.command=run --env LLAMA_PROMPT_PREFIX=🦭 > --pull=newer -t --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --network none --mount=type=bind,src=/home/kraxel/.local/share/ramalama/models/ollama/tinyllama:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/cuda:latest llama-run -c 2048 --temp 0.8 -v --ngl 999 /mnt/models/model.file hello
Loading model
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes
kraxel@ollivander ~# dmesg | tail -1
[ 401.460183] traps: llama-run[1907] trap invalid opcode ip:7f2d23f352ac sp:7ffcf9dbfd20 error:0 in libggml-cpu.so[3a2ac,7f2d23f03000+60000]
kraxel@ollivander ~# lscpu | grep avx
kraxel@ollivander ~#
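
As a cross-check independent of lscpu, a small C program can query the same CPUID feature bits; __builtin_cpu_supports() is the GCC/Clang builtin for this (avxcheck.c is just a name I made up):

/* avxcheck.c - print which x86 SIMD levels the host CPU reports.
 * build: gcc -O2 -o avxcheck avxcheck.c   (GCC/Clang, x86 only) */
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();
    printf("sse4.2:  %d\n", __builtin_cpu_supports("sse4.2"));
    printf("avx:     %d\n", __builtin_cpu_supports("avx"));
    printf("avx2:    %d\n", __builtin_cpu_supports("avx2"));
    printf("avx512f: %d\n", __builtin_cpu_supports("avx512f"));
    return 0;
}

On this machine it should print 0 for all the avx lines, matching the empty lscpu grep above.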
I suspect libggml-cpu.so uses AVX instructions without checking whether the CPU actually supports them.
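
For illustration, here is a minimal sketch of the kind of runtime dispatch that would avoid the SIGILL: probe CPUID once and only hand out the AVX kernel when the bit is actually set. The function names are hypothetical, not actual ggml symbols.

/* dispatch.c - hedged sketch, not ggml's actual code.
 * __attribute__((target("avx"))) lets one translation unit contain AVX
 * code without building everything with -mavx; that function must then
 * never be called unless CPUID confirmed AVX.
 * build: gcc -O2 -o dispatch dispatch.c */
#include <stdio.h>

typedef float (*dot_fn)(const float *x, const float *y, int n);

static float dot_scalar(const float *x, const float *y, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

__attribute__((target("avx")))          /* compiled with AVX enabled */
static float dot_avx(const float *x, const float *y, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)         /* may be auto-vectorized */
        s += x[i] * y[i];
    return s;
}

static dot_fn pick_dot(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return dot_avx;                 /* safe: CPUID confirmed AVX */
    return dot_scalar;                  /* pre-AVX CPUs land here */
}

int main(void)
{
    float x[4] = {1, 2, 3, 4}, y[4] = {1, 1, 1, 1};
    printf("dot = %f\n", pick_dot()(x, y, 4));
    return 0;
}

Without such a guard (or a separate non-AVX build of the library), any code path compiled with -mavx traps with exactly this invalid-opcode signature on a pre-AVX CPU.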