
RUN --no-cache to skip reading & writing a RUN layer to cache #6303

@fenollp

Description


Some RUNs may need to always be re-run, e.g. apt update, or this AI inference example.

I'll reuse this example: here I'd like to be able to re-run the llama-cli inference on every build (without providing an explicit seed, simply relying on non-deterministic floating-point operations).

IMO a RUN-level --no-cache flag is a more portable solution than --no-cache-filter <STAGE>.

docker build -o=. - <<DOCKERFILE
# syntax=docker/dockerfile:1-labs

FROM scratch AS model
ADD https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf /model.gguf

FROM scratch AS prompt
COPY <<EOF prompt.txt
Q: Generate a list of 10 unique biggest countries by population in JSON with their estimated population in 1900 and 2024. Answer only newline formatted JSON with keys "country", "population_1900", "population_2024" with 10 items.
A:
[
    {

EOF

FROM ghcr.io/ggml-org/llama.cpp:full-cuda-b5124 AS infer
# RUN --no-cache \ ... instead
RUN --device=nvidia.com/gpu=all \
    --mount=from=model,target=/models \
    --mount=from=prompt,target=/tmp \
    ./llama-cli -m /models/model.gguf -no-cnv -ngl 99 -f /tmp/prompt.txt | tee /infered

FROM scratch
COPY --from=infer /infered /
DOCKERFILE

Related discussions, in no particular order:

Workaround (still populates cache):

  • Introduce a build arg just for this RUN and pass e.g. --build-arg MYARG=$RANDOM on every build (sketched below)
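
A minimal sketch of that workaround, assuming BuildKit and using CACHEBUST as an arbitrary arg name (the model/prompt stages from the example above are unchanged):

FROM ghcr.io/ggml-org/llama.cpp:full-cuda-b5124 AS infer
# Every RUN after this declaration sees CACHEBUST as an env var, so a new
# value invalidates the step below -- yet its result is still written to cache.
ARG CACHEBUST
RUN --device=nvidia.com/gpu=all \
    --mount=from=model,target=/models \
    --mount=from=prompt,target=/tmp \
    ./llama-cli -m /models/model.gguf -no-cnv -ngl 99 -f /tmp/prompt.txt | tee /infered

Invoked with a fresh value on each build:

docker build --build-arg CACHEBUST=$RANDOM -o=. .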

Workaround (stage granularity instead of at the RUN level; also, moves out of the Dockerfile):

  • Pass --no-cache-filter <STAGE> on the build command line on each invocation (see below)
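
Applied to the example above, with infer as the stage name (note this re-executes every step of that stage, not just the one RUN):

docker buildx build --no-cache-filter=infer -o=. .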
