feat: enable target-triple build cache#1425
Conversation
Summary of ChangesHello @cpunion, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive build caching mechanism to 'llgo', significantly improving build performance by reusing previously compiled packages. It achieves this through deterministic fingerprinting of all build inputs, a robust cache management system for storing compiled artifacts, and intelligent conditional linking of runtime components. The manifest format has been upgraded to YAML for better structure and extensibility, and the entire system is designed for reliability with atomic file operations and graceful fallbacks. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive build caching mechanism to improve compilation speed. It works by fingerprinting package inputs (source files, dependencies, environment, build flags) and storing compiled archives (.a files) along with a manifest. The implementation is robust, using atomic file operations to prevent race conditions and a deterministic YAML-based manifest format. The build logic has been significantly refactored to integrate caching, including deferring the build of runtime packages until they are confirmed to be needed. My review found the overall design to be solid. I've provided a few comments on potential issues in fallback paths that could lead to cache inconsistencies and suggestions for minor code simplification.
| temp := &aPackage{Package: dep} | ||
| if c.pkgByID == nil { | ||
| c.pkgByID = make(map[string]Package) | ||
| } | ||
| c.pkgByID[dep.ID] = temp | ||
| if err := c.collectFingerprint(temp); err != nil { | ||
| return entry, fmt.Errorf("collect fingerprint for %s: %w", dep.ID, err) | ||
| } | ||
| entry.Fingerprint = temp.Fingerprint | ||
| return entry, nil | ||
| } |
There was a problem hiding this comment.
This fallback logic for a dependency not found in c.pkgByID creates a minimal temporary aPackage instance just for fingerprinting. However, this temporary instance is also stored in c.pkgByID. If this package is later retrieved during the linking phase, it will be missing crucial information like compiled object files (LLFiles) and linker arguments (LinkArgs), as it never goes through the full buildPkg process. This will likely cause an incomplete or failed link. All packages should be fully discovered and initialized during the buildSSAPkgs phase to avoid this scenario.
| if manifestContent == "" { | ||
| // Fallback: rebuild if missing (should not happen in normal flow). | ||
| m := newManifestBuilder() | ||
| c.collectEnvInputs(m) | ||
| c.collectCommonInputs(m) | ||
| if err := c.collectPackageInputs(m, pkg); err != nil { | ||
| return err | ||
| } | ||
| manifestContent = m.Build() | ||
| } |
There was a problem hiding this comment.
The fallback logic to rebuild the manifest if pkg.Manifest is empty is incomplete. It does not include dependency information, which is part of the original manifest generation in collectFingerprint. If this fallback is ever triggered, it will write a manifest to the cache that does not match the package's fingerprint, as the fingerprint was calculated based on the full manifest including dependencies. This could lead to cache corruption. The manifest should be rebuilt completely, including dependencies, or this fallback should be removed if it's truly unreachable.
internal/build/fingerprint.go
Outdated
| func digestFiles(paths []string) (string, []fileDigest, error) { | ||
| if len(paths) == 0 { | ||
| return "", nil, nil | ||
| } | ||
|
|
||
| digests := make([]fileDigest, 0, len(paths)) | ||
| for _, path := range paths { | ||
| hash, err := digestFile(path) | ||
| if err != nil { | ||
| return "", nil, fmt.Errorf("digest file %q: %w", path, err) | ||
| } | ||
| digests = append(digests, fileDigest{Path: path, SHA256: hash}) | ||
| } | ||
|
|
||
| sort.Slice(digests, func(i, j int) bool { return digests[i].Path < digests[j].Path }) | ||
|
|
||
| var parts []string | ||
| for _, d := range digests { | ||
| parts = append(parts, fmt.Sprintf("%s]sha256:%s", d.Path, d.SHA256)) | ||
| } | ||
|
|
||
| return strings.Join(parts, ","), digests, nil | ||
| } |
There was a problem hiding this comment.
The digestFiles function returns a serialized string digest and a slice of fileDigest. However, all call sites in internal/build/collect.go discard the string return value. This legacy string format seems to be unused and could be removed to simplify the function.
func digestFiles(paths []string) ([]fileDigest, error) {
if len(paths) == 0 {
return nil, nil
}
digests := make([]fileDigest, 0, len(paths))
for _, path := range paths {
hash, err := digestFile(path)
if err != nil {
return nil, fmt.Errorf("digest file %q: %w", path, err)
}
digests = append(digests, fileDigest{Path: path, SHA256: hash})
}
sort.Slice(digests, func(i, j int) bool { return digests[i].Path < digests[j].Path })
return digests, nil
}| // digestFilesWithOverlay calculates digests for files, using overlay content when available. | ||
| func digestFilesWithOverlay(paths []string, overlay map[string][]byte) (string, []fileDigest, error) { | ||
| if len(paths) == 0 { | ||
| return "", nil, nil | ||
| } | ||
|
|
||
| digests := make([]fileDigest, 0, len(paths)) | ||
| for _, path := range paths { | ||
| var hash string | ||
| if content, ok := overlay[path]; ok { | ||
| hash = digestBytes(content) | ||
| } else { | ||
| var err error | ||
| hash, err = digestFile(path) | ||
| if err != nil { | ||
| return "", nil, fmt.Errorf("digest file %q: %w", path, err) | ||
| } | ||
| } | ||
| digests = append(digests, fileDigest{Path: path, SHA256: hash}) | ||
| } | ||
|
|
||
| sort.Slice(digests, func(i, j int) bool { return digests[i].Path < digests[j].Path }) | ||
|
|
||
| var parts []string | ||
| for _, d := range digests { | ||
| parts = append(parts, fmt.Sprintf("%s]sha256:%s", d.Path, d.SHA256)) | ||
| } | ||
|
|
||
| return strings.Join(parts, ","), digests, nil | ||
| } |
There was a problem hiding this comment.
Similar to digestFiles, the digestFilesWithOverlay function returns a serialized string that is not used by its callers. This can be removed to simplify the code.
func digestFilesWithOverlay(paths []string, overlay map[string][]byte) ([]fileDigest, error) {
if len(paths) == 0 {
return nil, nil
}
digests := make([]fileDigest, 0, len(paths))
for _, path := range paths {
var hash string
if content, ok := overlay[path]; ok {
hash = digestBytes(content)
} else {
var err error
hash, err = digestFile(path)
if err != nil {
return nil, fmt.Errorf("digest file %q: %w", path, err)
}
}
digests = append(digests, fileDigest{Path: path, SHA256: hash})
}
sort.Slice(digests, func(i, j int) bool { return digests[i].Path < digests[j].Path })
return digests, nil
}
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1425 +/- ##
==========================================
+ Coverage 90.59% 90.63% +0.03%
==========================================
Files 43 43
Lines 11400 11400
==========================================
+ Hits 10328 10332 +4
+ Misses 911 907 -4
Partials 161 161 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/review |
Fixes #1361
Summary
.aarchives plus manifests under$LLGO_CACHE/build/<target-triple>/<pkg>/<fingerprint>.{a,manifest}and wire it intobuildPkg,linkMainPkg, and archive creation so cached packages skip LLVM lowering entirelyLLGO_BUILD_CACHEwith graceful fallbacks, make runtime linking conditional on actual NeedRt/NeedPyInit requirements, and ensure manifests capture metadata like linker args for cache hitsManifest Format
The cache manifest moved from the legacy INI layout to a YAML document so sections can embed structured data (lists/maps) deterministically. Files now record
size/mtime(andoverlay_hashfor in-memory overlays). Example output:Testing
go test ./internal/build/...