Description
Some RUN instructions may need to be re-run on every build, e.g. `apt update`, or the AI inference example below.
I'll reuse that example here: I'd like to be able to re-run the `llama-cli` inference on every build, without providing an explicit seed, simply relying on non-deterministic floating-point operations.
IMO a RUN-level `--no-cache` flag is a more portable solution than `--no-cache-filter <STAGE>`.
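A minimal sketch of the proposed flag (hypothetical syntax, not implemented) applied to the `apt update` case:

```sh
docker build - <<DOCKERFILE
FROM ubuntu:24.04
# Hypothetical: --no-cache would force this single step to re-run on every
# build, while the rest of the Dockerfile keeps using the cache.
RUN --no-cache apt-get update
DOCKERFILE
```

The full inference example: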
```sh
docker build -o=. - <<DOCKERFILE
# syntax=docker/dockerfile:1-labs

FROM scratch AS model
ADD https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf /model.gguf

FROM scratch AS prompt
COPY <<EOF prompt.txt
Q: Generate a list of 10 unique biggest countries by population in JSON with their estimated population in 1900 and 2024. Answer only newline formatted JSON with keys "country", "population_1900", "population_2024" with 10 items.
A:
[
{
EOF

FROM ghcr.io/ggml-org/llama.cpp:full-cuda-b5124 AS infer
# RUN --no-cache \ ... instead
RUN --device=nvidia.com/gpu=all \
    --mount=from=model,target=/models \
    --mount=from=prompt,target=/tmp \
    ./llama-cli -m /models/model.gguf -no-cnv -ngl 99 -f /tmp/prompt.txt | tee /infered

FROM scratch
COPY --from=infer /infered /
DOCKERFILE
```

Related discussions, in no particular order:
- [feature] no cache export for specific copy statements/layers #1817
- Add --no-cache-filter to disable cache per target and per cache mount #1213
- Proposal: Allow setting cache expiration time for build steps #4294
- New feature request: Selectively disable caching for specific RUN commands in Dockerfile moby#1996
Workaround (still populates the cache):
- Introduce a build arg just for this RUN and pass e.g. `--build-arg MYARG=$RANDOM` on every build, as sketched below.
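A minimal sketch of that workaround (the `ubuntu:24.04` base and the `apt-get` step are illustrative):

```sh
# MYARG has no effect on the image contents; a fresh value (here $RANDOM)
# changes the cache key of the step that references it, forcing a re-run --
# but each value still leaves a new cache entry behind.
docker build --build-arg MYARG=$RANDOM - <<DOCKERFILE
# syntax=docker/dockerfile:1
FROM ubuntu:24.04
ARG MYARG
RUN echo "cache-bust: \$MYARG" && apt-get update
DOCKERFILE
```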
Workaround (stage granularity instead of at the RUN level; also moves the knob out of the Dockerfile):
- Pass `--no-cache-filter <STAGE>` on every build, as sketched below.
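For example, re-running only the `infer` stage of the example above (`--no-cache-filter` is an existing `docker buildx build` flag):

```sh
# Invalidates the cache for every step of the "infer" stage; all other
# stages are still resolved from the cache. The stage name lives in the
# CLI invocation rather than in the Dockerfile itself.
# (Assumes the Dockerfile from the example above is in the current directory.)
docker buildx build --no-cache-filter=infer -o=. .
```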