Added non-conflicting hash for install files #1454
|
This is looking really good. I like the idea of only having conflicts if the transitive deps have changed.
for (String key : keys) {
  toHash.put(key, rendered.get(key));
@SuppressWarnings("unchecked")
private static Map<String, Integer> calculateArtifactHash(Map<String, Object> rendered) {
@shs96c question for a potential breaking change in the next major update.
It seems like this code and the code in v3_lock_file.bzl are similar. IIRC, the Starlark implementation exists for the case where the user doesn't have a lockfile.
If that is the case, is there a possibility to consolidate around the java code (which is easiest to test tbh) by forcing lockfile usage?
We'd likely want to consolidate on the starlark version of the code, since that's the one that's used by people when they verify the signatures.
I agree that it would be the best solution, but it's not simple to implement.
The Starlark code runs in the analysis phase, while this Java code runs in the execution phase, so it's impossible to do without a minor rewrite of the flow.
|
@MarconZet, I'm waiting until you move this out of draft before reviewing. Please LMK when you're ready!
|
@shs96c any progress on the review?
|
Could we add a description to the PR like:
Summary
This commit introduces lock file version 3 with per-artifact hashing instead of a single global hash. The main purpose is to create "non-conflicting" hashes that allow more granular change
Key Changes
Example in maven_install.json: // New format
The new _compute_lock_file_hash_v3 function computes individual hashes per artifact that include:
compute_dependency_inputs_signature now returns a dictionary of per-artifact hashes plus backward-compatible v1/v2 signatures.
Changed from --input_hash (single value) to --input-hash-path (path to JSON file containing the hash dictionary).
The code still supports reading v2 lock files - it checks for v3 first, then falls back to v2, then v1. Users with older lock files will see a message to repin.
Purpose
This per-artifact hashing approach allows the system to detect exactly which artifacts changed, rather than just knowing "something changed." This is useful for incremental updates and
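To make the "detect exactly which artifacts changed" point concrete, here is a minimal Python sketch (not the PR's Starlark code). The function name `changed_artifacts` and the sample hash values are made up for illustration; only the idea of a dictionary mapping `group:artifact` to a hash comes from the discussion.

```python
def changed_artifacts(old_hashes, new_hashes):
    """Return the sorted keys whose hash was added, removed, or modified."""
    keys = set(old_hashes) | set(new_hashes)
    return sorted(k for k in keys if old_hashes.get(k) != new_hashes.get(k))


# Hypothetical per-artifact hashes keyed by "group:artifact".
old = {"com.google.guava:guava": 111, "junit:junit": 222}
new = {
    "com.google.guava:guava": 333,  # hash changed
    "junit:junit": 222,             # untouched
    "org.slf4j:slf4j-api": 444,     # newly added
}

print(changed_artifacts(old, new))
```

With a single global hash, both edits above would touch the same line of the lock file; with one entry per artifact, two people changing different artifacts touch different lines.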
|
We tried this patch and so far it has been working well. There is one thing though. In case of mismatched signature, rules_jvm_external/private/rules/coursier.bzl Line 568 in e95b9d7 |
|
@honnix I changed the code. It should print errors more clearly now.
Nice! Thank you. We will take the new patch and try it out. |
|
FWIW, we've been using this at Confluent for a month now and it has been working well.
def _add_to_hash_dictionary(dictionary, artifact, salt):
    artifact_dict = json.decode(artifact)
    key = artifact_dict["group"] + ":" + artifact_dict["artifact"]
You should use the same logic that's in Coordinates.asKey() to get a stable key that includes things like the classifier. That's already in coordinates.bzl as to_key
Hmm, this key maps directly to what appears in the lock file.
My idea was that I don't want things like classifier and packaging in it, for two reasons:
- It looked ugly in the lock file: when I tried it in my repo, the size of __INPUT_ARTIFACTS_HASH doubled because of the :sources entries. They did not provide any information and were just noise.
- I think that at this level, we don't want to allow a non-conflicting merge.
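The trade-off in this thread can be sketched in a few lines of Python. `short_key` mirrors the `group:artifact` key from the snippet under review; `full_key` is a hypothetical stand-in for a classifier-aware key and is not the actual `to_key` from coordinates.bzl.

```python
def short_key(coords):
    """group:artifact only, as in the snippet under review."""
    return coords["group"] + ":" + coords["artifact"]


def full_key(coords):
    """Hypothetical classifier-aware key (illustration only, not the real to_key)."""
    key = short_key(coords)
    classifier = coords.get("classifier")
    if classifier:
        key += ":" + classifier
    return key


# Sample coordinates chosen for illustration.
artifacts = [
    {"group": "com.google.guava", "artifact": "guava"},
    {"group": "com.google.guava", "artifact": "guava", "classifier": "sources"},
]

short_keys = {short_key(a) for a in artifacts}
full_keys = {full_key(a) for a in artifacts}
print(len(short_keys), len(full_keys))
```

The short key folds the main jar and its sources jar into one dictionary entry, while the classifier-aware key gives each its own entry, which is where the extra `:sources` lines in the lock file would come from.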
if boms and len(boms):
    for bom in sorted(boms):
        artifact_inputs.append(_stable_artifact(bom))
        _add_to_hash_dictionary(all_hashes, bom, "bom")
Why is the salt needed for artifacts and boms? They should have unique coordinates no matter what.
I did this with artifact and excluded_artifact in mind.
In the v2 hash, excluding an artifact would change the hash because the hash order changed. In the group:artifact: HASH notation, excluding an artifact would not change the hash, so we need a salt.
The rest is about code design principles: adding salt everywhere is easier than adding salt only to excluded_artifact.
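A small Python sketch of why the salt matters. It uses SHA-256 from `hashlib` instead of Starlark's `hash()`, and the salt strings `"artifact"` and `"excluded_artifact"` are assumptions for illustration (only `"bom"` appears in the diff above).

```python
import hashlib


def salted_hash(coordinates, salt):
    """Hash the coordinates together with a salt naming the role of the entry."""
    data = (salt + "|" + coordinates).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


coords = "com.example:lib"  # hypothetical coordinates

# The same coordinates listed as a dependency and as an exclusion.
as_artifact = salted_hash(coords, "artifact")
as_exclusion = salted_hash(coords, "excluded_artifact")

print(as_artifact != as_exclusion)
```

Without the salt, both entries would hash the same input under the same `group:artifact` key, so moving an artifact into the exclusion list would leave the stored hash unchanged.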
|
Let me handle the rebase, and I'll merge this when I've done so.
|
Ah! I can't do the rebase. Could you please handle that?
|
The test failures look related to this change. The V3LockFileTest is failing.
|
@shs96c I forgot to add some files. It should be OK now.
* master: (25 commits)
  fix: use forward slash separator in Maven purl format (bazel-contrib#1530)
  Load rules from specific bzl files and add sh_test imports (bazel-contrib#1529)
  Added non-conflicting hash for install files (bazel-contrib#1454)
  Update the maven and coursier resolver tests to create a class index file. (bazel-contrib#1519)
  [ci] Drop Bazel 6 and ensure we run on Bazel 7 and 8 (bazel-contrib#1525)
  Only allow modules specified in known_contributing_modules to contribute artifacts or boms to the root module (bazel-contrib#1523)
  [gradle] Fix false resolution failures when BOM upgrades dependency version (bazel-contrib#1520)
  [gradle] Fix Gradle resolver to respect force_version and include runtime dependencies (bazel-contrib#1516)
  Correctly merge BOMs from non-root modules (bazel-contrib#1518)
  Update more lock files
  Filter test_only artifacts out of artifacts merged into root repos and print a warning when a root artifact version is overridden by a non_root bazel_dep (bazel-contrib#1511)
  Fix SHA mismatch for conflicting dependency versions (bazel-contrib#1513)
  [gradle] Plumb through the force_version attribute (bazel-contrib#1515)
  [gradle] Add dep exclusions to only that dep (bazel-contrib#1514)
  [gradle] Handle aggregating dependencies and relocation version conflicts (bazel-contrib#1512)
  BOM Fixes (bazel-contrib#1506)
  Allow an optional index of dep -> class to be created (bazel-contrib#1492)
  Put files in `ResolutionResult` (bazel-contrib#1484)
  Optimize dependency graph building with O(1) lookups (bazel-contrib#1483)
  Provide a mechanism to list all resolved direct deps for a workspace (bazel-contrib#1510)
  ...
* master:
  Add presubmit check for prebuilt jars (bazel-contrib#1486)
  Upload artifacts in parallel (address artifactorys "Maven Snapshot Version Behaviour") (bazel-contrib#1524)
  feat: Support COURSIER_SHA256 environment variable (bazel-contrib#1527)
  fix: Do not add coursier opts when run other tools (bazel-contrib#1531)
  fix: add string attributes to `amend_artifact` for explicit unset state (bazel-contrib#1499)
  fix: use forward slash separator in Maven purl format (bazel-contrib#1530)
  Load rules from specific bzl files and add sh_test imports (bazel-contrib#1529)
  Added non-conflicting hash for install files (bazel-contrib#1454)
  Update the maven and coursier resolver tests to create a class index file. (bazel-contrib#1519)
  [ci] Drop Bazel 6 and ensure we run on Bazel 7 and 8 (bazel-contrib#1525)
  Only allow modules specified in known_contributing_modules to contribute artifacts or boms to the root module (bazel-contrib#1523)
  [gradle] Fix false resolution failures when BOM upgrades dependency version (bazel-contrib#1520)
Summary
This commit introduces lock file version 3 with per-artifact hashing instead of a single global hash.
This per-artifact hashing approach can reduce the number of merge conflicts when multiple people update canonical versions in a large monorepo.
The code still supports reading v2 lock files - it checks for v3 first, then falls back to v2, then v1. Users with older lock files will see a message to repin.
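The fallback order described above could look roughly like the following Python sketch. It assumes the parsed lock file carries a `version` field whose value is `"3"` or `"2"`, and that anything else is treated as v1; the field name, its values, and both function names are assumptions for illustration, not taken from the PR.

```python
def detect_lock_file_version(lock_file):
    """Check for v3 first, then v2, and fall back to v1 (assumed layout)."""
    version = str(lock_file.get("version", ""))
    if version == "3":
        return 3
    if version == "2":
        return 2
    return 1


def needs_repin(lock_file):
    """Older lock files are still readable, but the user is asked to repin."""
    return detect_lock_file_version(lock_file) < 3


print(needs_repin({"version": "2"}))
```

Checking the newest format first keeps the common path short once most repositories have repinned.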
Key Changes
Example in maven_install.json:
The new _compute_lock_file_hash_v3 function computes individual hashes per artifact that include:
compute_dependency_inputs_signature now returns a dictionary of per-artifact hashes plus backward-compatible v1/v2 signatures.