
Tesla P40 issue #6928

@KOPACb

Description


Hello. I'm trying to build from source, and I can't get LocalAI up and running on a Tesla P40 vGPU under Proxmox.

LocalAI version:
v3.6.0-215-g1e5b9135
commit 1e5b913 (HEAD -> master, origin/master, origin/HEAD)

Environment, CPU architecture, OS, and Version:
VM in Proxmox with a vGPU passed through.
Profile GRID-P40-24Q (mdev=nvidia-53) num_heads=4, frl_config=60, framebuffer=24576M
4 cores from an AMD Ryzen 9 5900X 12-Core Processor, CPU type set to 'host'.

Linux LLM-teslap40 6.1.0-38-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.147-1 (2025-08-02) x86_64 GNU/Linux

nvidia-smi
root@LLM-teslap40:/srv/localai/src/LocalAI# nvidia-smi
Fri Oct 31 00:29:41 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID P40-24Q                   On  | 00000000:00:10.0 Off |                  N/A |
| N/A   N/A    P0              N/A /  N/A |      0MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
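
For completeness, the vGPU does expose the full framebuffer through the standard nvidia-smi query interface (nothing LocalAI-specific here); the command below should print roughly "GRID P40-24Q, 24576 MiB":

nvidia-smi --query-gpu=name,memory.total --format=csv,noheader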

Describe the bug
LocalAI does not see my GPU.
When the VM still had the default QEMU GPU attached, I needed to set LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia to start. Now that override is disabled, but:

12:27AM INF LocalAI version: v3.6.0-215-g1e5b9135 (1e5b9135df80534e96cb0f33fd80b4ec5414cb86)
12:27AM INF Capability automatically detected, set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override Capability=nvidia
12:27AM WRN VRAM is less than 4GB, defaulting to CPU. Set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override

LocalAI does not see the full VRAM.
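
For reference, this is the override I had to use when the QEMU default GPU was still attached (the variable name is taken verbatim from the warning above):

LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia ./local-ai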

To Reproduce

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make build
./local-ai
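
Note that a plain "make build" leaves BUILD_TYPE empty (visible in the build log below), so nothing CUDA-specific is compiled in. If that is relevant here, the docs mention a CUDA build along these lines (not verified in this setup):

make BUILD_TYPE=cublas build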

Logs

root@LLM-teslap40:/srv/localai/src/LocalAI# make build
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
go install google.golang.org/protobuf/cmd/[email protected]
mkdir -p pkg/grpc/proto
./protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
    backend/backend.proto
I local-ai build info:
I BUILD_TYPE:
I GO_TAGS:
I LD_FLAGS: -s -w -X "github.com/mudler/LocalAI/internal.Version=v3.6.0-215-g1e5b9135" -X "github.com/mudler/LocalAI/internal.Commit=1e5b9135df80534e96cb0f33fd80b4ec5414cb86"
I UPX:
rm -rf local-ai || true
CGO_LDFLAGS="" go build -ldflags "-s -w -X "github.com/mudler/LocalAI/internal.Version=v3.6.0-215-g1e5b9135" -X "github.com/mudler/LocalAI/internal.Commit=1e5b9135df80534e96cb0f33fd80b4ec5414cb86"" -tags "" -o local-ai ./cmd/local-ai
root@LLM-teslap40:/srv/localai/src/LocalAI# ./local-ai
12:48AM INF Starting LocalAI using 4 threads, with models path: /srv/localai/src/LocalAI/models
12:48AM INF LocalAI version: v3.6.0-215-g1e5b9135 (1e5b9135df80534e96cb0f33fd80b4ec5414cb86)
12:48AM INF Capability automatically detected, set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override Capability=nvidia
12:48AM WRN VRAM is less than 4GB, defaulting to CPU. Set LOCALAI_FORCE_META_BACKEND_CAPABILITY to override
12:48AM INF Preloading models from /srv/localai/src/LocalAI/models

  Model name: qwen-image



  Model name: qwen3-8b


12:48AM INF core/startup process completed!
12:48AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
12:49AM INF Success ip=192.168.1.134 latency=533.876175ms method=POST status=200 url=/v1/chat/completions
12:49AM INF BackendLoader starting backend=llama-cpp modelID=qwen3-8b o.model=Qwen3-8B.Q4_K_M.gguf
12:49AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:46609: connect: connection refused\""
12:49AM ERR Failed to load model qwen3-8b with backend llama-cpp error="failed to load model with internal loader: grpc service not ready" modelID=qwen3-8b
12:49AM ERR Stream ended with error: failed to load model with internal loader: grpc service not ready

Additional context
However, it works in a container using the image localai/localai:master-gpu-nvidia-cuda12.

log
root@LLM-teslap40:/srv/docker-localai# docker compose up -d
[+] Running 1/1
 ✔ Container docker-localai-localai-1  Started                                                                                                                                                           3.1s
root@LLM-teslap40:/srv/docker-localai# docker compose logs -f
localai-1  | CPU info:
localai-1  | model name : AMD Ryzen 9 5900X 12-Core Processor
localai-1  | flags              : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor fsrm arch_capabilities
localai-1  | CPU:    AVX    found OK
localai-1  | CPU:    AVX2   found OK
localai-1  | CPU: no AVX512 found
localai-1  | 9:52PM INF Setting logging to info
localai-1  | 9:52PM INF Starting LocalAI using 4 threads, with models path: //models
localai-1  | 9:52PM INF LocalAI version: 5ce982b (5ce982b9c91851c82554d2eebf15f439b1caa7a9)
localai-1  | 9:52PM INF Preloading models from //models
localai-1  |
localai-1  |   Model name: dreamshaper
localai-1  |
localai-1  |
localai-1  |
localai-1  |   Model name: moondream2-20250414
localai-1  |
localai-1  |
localai-1  |
localai-1  |   Model name: qwen3-4b
localai-1  |
localai-1  |
localai-1  | 9:52PM INF core/startup process completed!
localai-1  | 9:52PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
localai-1  | 9:52PM WRN Client error ip=192.168.1.134 latency=1.520117ms method=GET status=401 url=/
localai-1  | 9:52PM INF Success ip=192.168.1.134 latency="10.529µs" method=GET status=200 url=/static/assets/highlightjs.css
localai-1  | 9:52PM INF Success ip=192.168.1.134 latency="22.58µs" method=GET status=200 url=/static/assets/highlightjs.js
localai-1  | 9:52PM INF Success ip=192.168.1.134 latency="25.468µs" method=GET status=200 url=/static/general.css
...
localai-1  | 9:53PM INF Success ip=192.168.1.134 latency=635.634738ms method=POST status=200 url=/v1/chat/completions
localai-1  | 9:53PM INF BackendLoader starting backend=llama-cpp modelID=qwen3-4b o.model=Qwen3-4B.Q4_K_M.gguf
localai-1  | 9:53PM INF Success ip=192.168.1.134 latency=636.851105ms method=POST status=200 url=/v1/chat/completions
localai-1  | 9:53PM INF Success ip=127.0.0.1 latency="30.249µs" method=GET status=200 url=/readyz
localai-1  | 9:54PM INF Success ip=127.0.0.1 latency="11.11µs" method=GET status=200 url=/readyz
docker-compose.yml
services:
  localai:
    # See https://localai.io/basics/container/#standard-container-images for
    # a list of available container images (or build your own with the provided Dockerfile)
    # Available images with CUDA, ROCm, SYCL, Vulkan
    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
    # For images with python backends, use:
    #image: localai/localai:master-cublas-cuda12-ffmpeg
    image: localai/localai:master-gpu-nvidia-cuda12
    restart: always
#    command:
#    - ${MODEL_NAME:-gemma-3-4b-it-qat}
#    - ${MULTIMODAL_MODEL:-moondream2-20250414}
#    - ${IMAGE_MODEL:-sd-1.5-ggml}
#    - granite-embedding-107m-multilingual
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 60s
      timeout: 10m
      retries: 120
    ports:
    - 8081:8080
    environment:
      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
#      - DEBUG=true
      - API_KEY=apikeysecret
      - REBUILD=true
    volumes:
      - ./volumes/models:/models
      - ./volumes/backends:/backends
      - ./volumes/models:/build/models
      - ./volumes/backends:/build/backends
      - ./volumes/images:/tmp/generated/images
    # For images with python backends, use:
    # image: localai/localai:master-cublas-cuda12-ffmpeg
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
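
As a quick sanity check that the container actually sees the vGPU (assuming the service name "localai" from the compose file above):

docker compose exec localai nvidia-smi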
