mz334: multistage cache lookahead #299
base: main
Conversation
hmmm... for I can work with
YES! Multistage cache lookahead appears to work correctly. It's just a bit ugly, but now I know where to continue with the refactoring to make this change feel good.
func squash(a, b *stageBuilder) *stageBuilder {
	acmds := filterOnBuild(a.cmds)
	return &stageBuilder{
we should split stageBuilder into two parts. It currently has two responsibilities: it cares both about how to build and about what to build. I think the how can stay in stageBuilder, but the what should be a separate struct entirely. This new struct can then be passed into the build() function, and the same stageBuilder entity can crunch through all stages. Here in squash it becomes obvious: only the what parts are affected by the squashing; for the how parts, squashing is meaningless.
}
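To make the suggestion concrete, here is a minimal Go sketch of how the split could look. The names stageContent, command, and filterOnBuild are illustrative placeholders, not kaniko's actual types; only the "what" struct participates in squash, while the "how" stays untouched.

```go
package build

// command stands in for kaniko's Dockerfile command interface.
type command interface{ Name() string }

// stageContent carries only *what* to build.
type stageContent struct {
	cmds   []command
	config map[string]string // stand-in for the image config
}

// stageBuilder keeps only *how* to build (cache client, snapshotter, ...).
type stageBuilder struct{}

// build crunches through whatever content it is handed, so the same
// stageBuilder instance can process every stage.
func (s *stageBuilder) build(content *stageContent) error {
	for range content.cmds {
		// execute the command, snapshot, push cache layer, ...
	}
	return nil
}

// filterOnBuild is a placeholder for the filtering helper in the diff above.
func filterOnBuild(cmds []command) []command { return cmds }

// squash merges two contents. Only the "what" is affected, which is why the
// split makes squash simpler: the "how" has nothing to squash.
func squash(a, b *stageContent) *stageContent {
	acmds := filterOnBuild(a.cmds)
	return &stageContent{
		cmds:   append(acmds, b.cmds...),
		config: b.config,
	}
}
```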
// Apply optimizations to the instructions.
if err := sb.optimize(*compositeKey, sb.cf.Config); err != nil {
we already do the cache-optimization here, as we want to run skip&squash over the cached commands. I left it as part of the build() function anyway, though, as I'm not yet sure whether it has any side-effects on the compositeKey.
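If the worry is side effects on the compositeKey, one option would be to let the lookahead pass work on a copy of the key and only report which commands resolve from cache. A rough sketch under that assumption follows; none of these names are kaniko's real signatures.

```go
package lookahead

// compositeCache is a stand-in for the composite cache key.
type compositeCache struct {
	keys []string
}

// clone returns an independent copy so that probing the cache during the
// lookahead cannot leak back into the key used for the real build.
func (c compositeCache) clone() compositeCache {
	cp := make([]string, len(c.keys))
	copy(cp, c.keys)
	return compositeCache{keys: cp}
}

// lookahead reports, per command key, whether the command resolves from
// cache. It never mutates the key it was given.
func lookahead(key compositeCache, cmdKeys []string) []bool {
	k := key.clone()
	hits := make([]bool, len(cmdKeys))
	for i, ck := range cmdKeys {
		k.keys = append(k.keys, ck)
		hits[i] = probeCache(k)
	}
	return hits
}

// probeCache is a placeholder for the actual cache lookup.
func probeCache(compositeCache) bool { return false }
```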
For a simple build like this I get promising results already:
Of course this is a trivial example; in a real-world use case the expected saving will be way higher. But it is promising that the saving is visible even in such a trivial example.
Fixes #334
Description
For single-stage builds we perform a cache lookahead. This means that with a 100% cache hit rate we don't even need to download and unpack the files in order to create the image. The idea here is to take this lookahead functionality further so that it works across stages too.
Currently in multistage builds every stage is built (from cache), even if the dependent stages would have resolved to a 100% cache hit rate themselves. This means that even with a 100% cache hit rate, we at least need to download and unpack all the files for all multistage ancestors. In case of long FROM chains this is partially mitigated by squashing the stages together, meaning that in effect we build a single stage. In case of COPY --from, though, this is not possible, as we can't squash across forks & merges. If we opt to --cache-copy-layers we don't use the files from the hereby built image at all and instead load them from cache directly. In that case downloading and unpacking was completely in vain, and we can skip it entirely as an optimization.

The difficulty here is that currently our cache key depends on the file contents. This is the safe option: even if an upstream image changes, or a multistage ancestor changes, we have a cache hit as long as the file contents stay the same. The reverse is true too; we don't need to detect upstream changes, as we will notice them through the changed files. However, our lookahead needs to know this a priori, hence it only works if the files are guaranteed to be the same. This is, for example, the case if you reference images by their shasum; then it is guaranteed that the files will be the same after download. The same logic also applies if you provide a checksum to COPY/ADD, which is not yet implemented in kaniko, but would be a nice incentive to do so.
We can know a priori whether a lookahead key will be stable, and we can opt to do file hashing in case it is not. The only downside is that we would lose a cache hit if the a-priori key changed but the file contents did not. I'm not yet sure whether we can remedy this case, as it would basically require having two references to the same cache layer.
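As a rough illustration of the a-priori decision (a hypothetical helper, not kaniko's current API): when the base image is pinned by digest, the lookahead key can be computed before downloading anything; when it is a moving tag, we would fall back to content hashing.

```go
package lookahead

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"strings"
)

// aprioriKey derives a lookahead key without touching any file contents.
// A digest-pinned base image fully determines the files, so the key is
// stable; a moving tag does not, so the caller would fall back to hashing
// the actual file contents after download.
func aprioriKey(baseImageRef string, instructions []string) (key string, stable bool) {
	if !strings.Contains(baseImageRef, "@sha256:") {
		// e.g. "alpine:3.19" can move under us: no stable a-priori key.
		return "", false
	}
	h := sha256.New()
	io.WriteString(h, baseImageRef)
	for _, ins := range instructions {
		io.WriteString(h, ins)
	}
	return hex.EncodeToString(h.Sum(nil)), true
}
```

The lost-cache-hit case mentioned above corresponds to the digest in the reference changing while the resulting files stay identical: the a-priori key changes even though a content-hashed key would not.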
Submitter Checklist
These are the criteria that every PR should meet; please check them off as you review them:
See the contribution guide for more details.
Reviewer Notes
Release Notes
Describe any changes here so the maintainer can include them in the release notes, or delete this block.