-
Notifications
You must be signed in to change notification settings - Fork 53
✨ Add Maven index search functionality #921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThreaded a local Maven index path ( Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Prov as Java Provider
participant Walk as Dependency Walker
participant Index as Local Maven Index
participant Legacy as Legacy Remote Lookup
Note over Prov,Walk: Init threads mavenIndexPath into walker
Prov->>Walk: create walker(mvnIndexPath)
alt Legacy flow (before)
Walk->>Legacy: toDependency(..., disableMavenSearch / dep labels)
Legacy->>Legacy: HTTP SHA lookup / POM/structure fallbacks
Legacy-->>Walk: artifact or error
else Index flow (after)
Walk->>Index: toDependency(jarFile, indexPath)
Index->>Index: search(indexPath, jar signature)
Index->>Index: readEntryAt / readKeyAt / findValue
Index-->>Walk: buildJavaArtifact -> artifact
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
external-providers/java-external-provider/pkg/java_external_provider/dependency.go (1)
33-33: Track the TODO for config key rename.The config key
providerSpecificConfigOpenSourceDepListKeywith value "depOpenSourceLabelsFile" should be renamed to reflect its current usage as the Maven index path. This is a breaking change that should be planned carefully.Would you like me to open an issue to track this config key rename? This will help ensure it's not forgotten and can be coordinated with a major version release.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
external-providers/dotnet-external-provider/go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
external-providers/java-external-provider/pkg/java_external_provider/dependency.go(6 hunks)external-providers/java-external-provider/pkg/java_external_provider/provider.go(10 hunks)external-providers/java-external-provider/pkg/java_external_provider/util.go(11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:11:45.673Z
Learnt from: pranavgaikwad
PR: konveyor/analyzer-lsp#859
File: external-providers/java-external-provider/pkg/java_external_provider/dependency.go:694-694
Timestamp: 2025-07-30T12:11:45.673Z
Learning: In the Java external provider dependency walker (external-providers/java-external-provider/pkg/java_external_provider/dependency.go), errors from toDependency function calls should be ignored as they are not considered important by the maintainers.
Applied to files:
external-providers/java-external-provider/pkg/java_external_provider/dependency.go
🧬 Code graph analysis (2)
external-providers/java-external-provider/pkg/java_external_provider/dependency.go (1)
provider/provider.go (2)
DepDAGItem(474-474)DepExcludeLabel(34-34)
external-providers/java-external-provider/pkg/java_external_provider/util.go (1)
external-providers/java-external-provider/pkg/java_external_provider/provider.go (1)
JavaArchive(34-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test
- GitHub Check: benchmark (windows-latest, windows)
- GitHub Check: benchmark (macos-latest, mac)
- GitHub Check: test (windows-latest)
- GitHub Check: test (macos-latest)
🔇 Additional comments (15)
external-providers/java-external-provider/pkg/java_external_provider/util.go (8)
6-6: LGTM!The new imports support the index-based search functionality added in this PR.
Also applies to: 9-9, 18-18
97-97: LGTM!The signature change to accept
mavenIndexPathaligns with the PR's goal of using index-based Maven lookups.
172-172: LGTM!The function signature update is consistent with the index-based dependency resolution approach.
234-234: LGTM!The signature update maintains consistency with the index-based Maven resolution strategy.
530-554: LGTM!The refactoring from remote Maven Central HTTP search to local index-based lookup improves reliability and eliminates network dependencies. The fallback to POM parsing provides a safety net for artifacts not found in the index.
597-608: LGTM!The function provides a reasonable fallback for creating Maven coordinates from file paths when other resolution methods fail.
620-632: LGTM!The index entry structure and constants are well-defined for a fixed-size binary index format. The 40-byte key size correctly accommodates SHA-1 hex strings.
634-783: LGTM!The index search implementation is efficient and well-structured:
- Binary search avoids loading the entire index into memory
- Proper use of
ReadAtfor random access- Error handling is consistent throughout
- The approach correctly verifies exact matches after binary search
Note: The implementation assumes the index file is pre-sorted and the data format follows the expected schema (colon-separated Maven coordinates). Consider adding validation if the index could be corrupted or incorrectly formatted.
external-providers/java-external-provider/pkg/java_external_provider/provider.go (3)
289-289: LGTM!The retrieval of
mavenIndexPathfrom provider-specific config is implemented correctly.
536-536: LGTM!The function signature updates and call sites are consistent with the index-based Maven resolution approach introduced in this PR.
Also applies to: 546-546, 589-589, 698-698, 845-845, 898-898
362-362: LGTM!The call to
decompileJavacorrectly passes themavenIndexPathparameter.external-providers/java-external-provider/pkg/java_external_provider/dependency.go (4)
664-683: LGTM!The addition of
mvnIndexPathto the walker struct properly threads the index path through the dependency discovery process.
702-702: LGTM!The
toDependencycall signature is updated correctly to use the new index-based approach. The error is appropriately ignored per established patterns.Based on learnings.
951-967: LGTM!The signature simplification is appropriate. Returning an empty map when exclude packages are not configured is the correct behavior for this specialized initialization function.
971-996: LGTM!The refactoring to have the function create and return its own map improves clarity and reduces coupling. The implementation remains correct.
external-providers/java-external-provider/pkg/java_external_provider/provider.go
Outdated
Show resolved
Hide resolved
| func RandomName() string { | ||
| rand.Seed(int64(time.Now().Nanosecond())) | ||
| charset := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" | ||
| b := make([]byte, 16) | ||
| for i := range b { | ||
| b[i] = charset[rand.Intn(len(charset))] | ||
| } | ||
| log.V(10).Info(fmt.Sprintf("could not find groupId for in our public listing of group id's", jarFile)) | ||
| return artifact, nil | ||
| return string(b) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace deprecated rand.Seed with rand.New.
The global rand.Seed is deprecated as of Go 1.20. Use a local random generator instead to avoid future compatibility issues.
Apply this diff:
func RandomName() string {
- rand.Seed(int64(time.Now().Nanosecond()))
+ rng := rand.New(rand.NewSource(time.Now().UnixNano()))
charset := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
b := make([]byte, 16)
for i := range b {
- b[i] = charset[rand.Intn(len(charset))]
+ b[i] = charset[rng.Intn(len(charset))]
}
return string(b)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func RandomName() string { | |
| rand.Seed(int64(time.Now().Nanosecond())) | |
| charset := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" | |
| b := make([]byte, 16) | |
| for i := range b { | |
| b[i] = charset[rand.Intn(len(charset))] | |
| } | |
| log.V(10).Info(fmt.Sprintf("could not find groupId for in our public listing of group id's", jarFile)) | |
| return artifact, nil | |
| return string(b) | |
| } | |
| func RandomName() string { | |
| rng := rand.New(rand.NewSource(time.Now().UnixNano())) | |
| charset := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" | |
| b := make([]byte, 16) | |
| for i := range b { | |
| b[i] = charset[rng.Intn(len(charset))] | |
| } | |
| return string(b) | |
| } |
🤖 Prompt for AI Agents
external-providers/java-external-provider/pkg/java_external_provider/util.go
around lines 610 to 618: replace use of the deprecated global rand.Seed with a
local rand.Rand instance seeded from time.Now().UnixNano(); create a new source
and rand.New(source) at function start and use that instance's Intn when
selecting characters, leaving the rest of the function logic intact so it
returns a 16-character random string without touching global state.
Signed-off-by: Juan Manuel Leflet Estrada <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
external-providers/java-external-provider/pkg/java_external_provider/util.go (2)
530-554: Add validation for empty indexPath.The function uses
indexPathto construct file paths without checking if it's non-empty. IfindexPathis empty or invalid,filepath.Joinwill create incorrect paths leading to confusing error messages.Consider adding a check at the beginning of the function:
func toDependency(_ context.Context, log logr.Logger, jarFile string, indexPath string) (javaArtifact, error) { dep := javaArtifact{} + if indexPath == "" { + log.V(5).Info("index path not configured, falling back to pom parsing") + return constructArtifactFromPom(log, jarFile) + } // we look up the jar in maven file, err := os.Open(jarFile)
774-783: Add bounds checking in buildJavaArtifact.The function accesses
parts[0],parts[1], andparts[4]without verifying thatpartshas at least 5 elements. If the data file contains malformed entries, this could cause a panic.Apply this diff to add validation:
func buildJavaArtifact(sha, str string) javaArtifact { dep := javaArtifact{} parts := strings.Split(str, ":") + if len(parts) < 5 { + return dep // return empty artifact for malformed data + } dep.GroupId = parts[0] dep.ArtifactId = parts[1] dep.Version = parts[4] dep.foundOnline = true dep.sha1 = sha return dep }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
external-providers/dotnet-external-provider/go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
external-providers/java-external-provider/pkg/java_external_provider/dependency.go(4 hunks)external-providers/java-external-provider/pkg/java_external_provider/provider.go(8 hunks)external-providers/java-external-provider/pkg/java_external_provider/util.go(11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:11:45.673Z
Learnt from: pranavgaikwad
PR: konveyor/analyzer-lsp#859
File: external-providers/java-external-provider/pkg/java_external_provider/dependency.go:694-694
Timestamp: 2025-07-30T12:11:45.673Z
Learning: In the Java external provider dependency walker (external-providers/java-external-provider/pkg/java_external_provider/dependency.go), errors from toDependency function calls should be ignored as they are not considered important by the maintainers.
Applied to files:
external-providers/java-external-provider/pkg/java_external_provider/dependency.go
🧬 Code graph analysis (2)
external-providers/java-external-provider/pkg/java_external_provider/util.go (1)
external-providers/java-external-provider/pkg/java_external_provider/provider.go (1)
JavaArchive(34-34)
external-providers/java-external-provider/pkg/java_external_provider/dependency.go (1)
provider/provider.go (1)
DepDAGItem(474-474)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test
- GitHub Check: benchmark (windows-latest, windows)
- GitHub Check: benchmark (macos-latest, mac)
- GitHub Check: test (windows-latest)
🔇 Additional comments (13)
external-providers/java-external-provider/pkg/java_external_provider/dependency.go (2)
703-703: LGTM!The call signature update correctly replaces
disableMavenSearchwithmvnIndexPath, aligning with the new index-based dependency resolution approach. Error handling follows the established pattern per previous feedback.Based on learnings.
757-767: LGTM!The walker initialization properly includes
mvnIndexPath, maintaining consistency with the same pattern used indiscoverDepsFromJars.external-providers/java-external-provider/pkg/java_external_provider/provider.go (4)
523-523: LGTM, but depends on fixing line 289.The
mvnIndexPathfield initialization is correct. However, it relies onmavenIndexPathbeing retrieved with the correct config key at line 289.
536-536: LGTM!The
resolveSourcesJarsForMavensignature update correctly adds themavenIndexPathparameter, replacing the previousdisableMavenSearchflag with the new index-based approach.Also applies to: 846-846
546-546: LGTM!The
resolveSourcesJarsForGradlesignature update correctly adds themavenIndexPathparameter, maintaining consistency with the Maven equivalent and the new index-based resolution approach.Also applies to: 590-590
699-699: LGTM!The
decompilefunction calls correctly pass themavenIndexPathparameter, aligning with the updated function signature in util.go.Also applies to: 899-899
external-providers/java-external-provider/pkg/java_external_provider/util.go (7)
97-97: LGTM!The
decompilefunction signature correctly adds themavenIndexPathparameter, enabling index-based dependency resolution throughout the decompilation flow.
172-172: LGTM!The function signature and parameter threading correctly propagate
mavenIndexPaththrough the decompilation pipeline fromdecompileJava→explode→decompile.Also applies to: 185-185, 198-198
234-234: LGTM!The
explodefunction correctly threadsmvnIndexPaththrough both recursive calls and totoDependency, maintaining consistency in the index-based lookup approach.Also applies to: 366-366, 374-374
556-595: LGTM!The
constructArtifactFromPomfunction correctly parses Maven coordinates from embedded pom.properties files. File handles are properly managed with defer statements, and error handling is appropriate.
597-608: LGTM!The
toFilePathDependencyfunction provides a simple fallback mechanism to create dependency information from file paths when other methods fail. The logic is clear and appropriate for its purpose.
620-632: LGTM!The constants and
IndexEntrytype are well-defined. TheKeySizeof 40 correctly represents a SHA-1 hash in hexadecimal format, and theentrySizecalculation accurately accounts for the fixed-size binary format.
634-709: LGTM!The
searchandsearchIndexfunctions implement a correct binary search over the index file. File handles are properly managed, error handling is appropriate, and the exact match verification ensures correctness.
external-providers/java-external-provider/pkg/java_external_provider/dependency.go
Show resolved
Hide resolved
external-providers/java-external-provider/pkg/java_external_provider/provider.go
Outdated
Show resolved
Hide resolved
external-providers/java-external-provider/pkg/java_external_provider/provider.go
Outdated
Show resolved
Hide resolved
shawn-hurley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at the output of the files, couldn't you just do the binary search over the .txt file? it appears as if the file is sorted by the sha, and if that is the case then we can drop the 791MB .idx file.
Signed-off-by: Juan Manuel Leflet Estrada <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
external-providers/java-external-provider/pkg/java_external_provider/provider.go (1)
590-590: Consider using a parameter object for improved maintainability.The function now accepts 5 parameters. While not critical, consolidating related parameters into a struct (e.g.,
SourceResolutionConfig) would improve readability and make future parameter additions easier.Example refactor:
type SourceResolutionConfig struct { fernflower string disableMavenSearch bool taskFile string mavenIndexPath string } func (s *javaServiceClient) resolveSourcesJarsForGradle(ctx context.Context, config SourceResolutionConfig) error { // function body }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
external-providers/java-external-provider/pkg/java_external_provider/provider.go(8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: benchmark (windows-latest, windows)
- GitHub Check: test (windows-latest)
🔇 Additional comments (3)
external-providers/java-external-provider/pkg/java_external_provider/provider.go (3)
368-369: LGTM -mavenIndexPathcorrectly threaded through binary decompilation flow.The
mavenIndexPathparameter is appropriately passed todecompileJavafor binary archive processing.
536-546: LGTM -mavenIndexPathcorrectly passed to Maven and Gradle source resolution.Both Maven and Gradle source resolution flows receive the
mavenIndexPathparameter appropriately.
699-699: LGTM -mavenIndexPathcorrectly passed to decompile functions.Both Maven and Gradle flows correctly pass
mavenIndexPathto thedecompilefunction for dependency source resolution.Also applies to: 899-899
| disableMavenSearch, ok := config.ProviderSpecificConfig[DISABLE_MAVEN_SEARCH].(bool) | ||
| mavenIndexPath, ok := config.ProviderSpecificConfig[providerSpecificConfigMavenIndexPath].(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify handling of missing or empty mavenIndexPath.
The ok variable is captured but not checked. If the config key is missing or the value is not a string, mavenIndexPath will be an empty string. Given that this PR replaces Maven Central HTTP search with index-based lookup, an empty or missing mavenIndexPath could cause silent failures in downstream decompilation and dependency resolution flows.
Consider adding validation and logging:
disableMavenSearch, ok := config.ProviderSpecificConfig[DISABLE_MAVEN_SEARCH].(bool)
-mavenIndexPath, ok := config.ProviderSpecificConfig[providerSpecificConfigMavenIndexPath].(string)
+mavenIndexPath, ok := config.ProviderSpecificConfig[providerSpecificConfigMavenIndexPath].(string)
+if !ok || mavenIndexPath == "" {
+ log.V(5).Info("mavenIndexPath not configured, index-based dependency resolution will be unavailable")
+}Run this script to verify how downstream code handles an empty mavenIndexPath:
🏁 Script executed:
#!/bin/bash
# Description: Check how decompileJava and decompile functions handle empty mavenIndexPath
# Search for decompileJava function definition and usage
ast-grep --pattern $'func decompileJava($$$, mavenIndexPath string) $$$'
# Search for decompile function definition and usage
ast-grep --pattern $'func decompile($$$, mavenIndexPath string) $$$'
# Check if there are any validation checks for empty mavenIndexPath
rg -nP --type=go 'mavenIndexPath\s*(==|!=)\s*""' -A3 -B3Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Find all function definitions containing "decompile"
rg -n "^func.*decompile" --type=go
# Find where mavenIndexPath is used
rg -n "mavenIndexPath" --type=go -A2 -B2
Length of output: 11596
---
🏁 Script executed:
```shell
#!/bin/bash
# Find all calls to decompileJava to see the pattern for cleanBin parameter
rg -n "decompileJava\(" --type=go -B2 -A2
# Check what cleanBin parameter should be by looking at its usage
rg -n "cleanBin" --type=go -B3 -A3
Length of output: 5021
Fix critical bug: Line 369 passes ok instead of cleanBin to decompileJava.
At line 366, cleanBin and ok are assigned from the type assertion: cleanBin, ok := config.ProviderSpecificConfig[CLEAN_EXPLODED_BIN_OPTION].(bool). The variable cleanBin holds the actual configuration value, while ok indicates whether the type assertion succeeded.
Line 369 incorrectly passes ok as the cleanBin parameter:
depLocation, sourceLocation, err := decompileJava(ctx, log, fernflower,
config.Location, getMavenLocalRepoPath(mavenSettingsFile), ok, mavenIndexPath)This should pass cleanBin instead. Line 379 confirms they are distinct: if ok && cleanBin { shows both variables are checked separately. The decompileJava function uses this parameter to determine project path naming at line 177.
Regarding the original mavenIndexPath concern: defensive validation exists at service_client.go:139-141 (if p.mvnIndexPath != ""), but explicit validation and logging at config read time (line 289) would provide better clarity about missing configuration.
🤖 Prompt for AI Agents
external-providers/java-external-provider/pkg/java_external_provider/provider.go
around lines 288-289: the code performs a type assertion assigning cleanBin, ok
:= config.ProviderSpecificConfig[CLEAN_EXPLODED_BIN_OPTION].(bool) but then
mistakenly passes ok to decompileJava instead of the actual cleanBin value;
change the decompileJava call to pass cleanBin as the boolean argument (not ok).
Also add a defensive check/log when reading mavenIndexPath to validate it's
non-empty when required (log a warning or error if missing) so configuration
issues are surfaced early.
shawn-hurley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does conform to the enhancement which I did just agree to.
We should have follow-ons for this work, so going to approve this
Emily on PTO - request for change was addressed
- Adds search functionality using pre-generated, bundled index - Removes unneeded functionality: - Maven central HTTP sha search - Search by list of public groupIds - `disableMavenSearch` is no longer used - to be removed in a subsequent PR? Addresses https://issues.redhat.com/browse/MTA-6231 Requires konveyor/java-analyzer-bundle#158 Bundle PR: 158 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added local Maven index lookup and utilities (index entry type and random-name helper) for resolving Java artifacts. * **Improvements** * Dependency extraction, decompilation, and source resolution now use a configurable Maven index path for more reliable, faster resolution. * **Removed** * Removed reliance on open-source dependency label lists and remote Maven HTTP lookup in resolution flows. * **Refactor** * Public interfaces updated to accept and propagate a Maven index path instead of label-based inputs. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Juan Manuel Leflet Estrada <[email protected]> Signed-off-by: Cherry Picker <[email protected]>
- Adds search functionality using pre-generated, bundled index - Removes unneeded functionality: - Maven central HTTP sha search - Search by list of public groupIds - `disableMavenSearch` is no longer used - to be removed in a subsequent PR? Addresses https://issues.redhat.com/browse/MTA-6231 Requires konveyor/java-analyzer-bundle#158 Bundle PR: 166 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added local Maven index lookup and utilities (index entry type and random-name helper) for resolving Java artifacts. * **Improvements** * Dependency extraction, decompilation, and source resolution now use a configurable Maven index path for more reliable, faster resolution. * **Removed** * Removed reliance on open-source dependency label lists and remote Maven HTTP lookup in resolution flows. * **Refactor** * Public interfaces updated to accept and propagate a Maven index path instead of label-based inputs. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Juan Manuel Leflet Estrada <[email protected]> Signed-off-by: Cherry Picker <[email protected]> Signed-off-by: Juan Manuel Leflet Estrada <[email protected]> Signed-off-by: Cherry Picker <[email protected]> Co-authored-by: Juan Manuel Leflet Estrada <[email protected]>
disableMavenSearchis no longer used - to be removed in a subsequent PR?Addresses https://issues.redhat.com/browse/MTA-6231
Requires konveyor/java-analyzer-bundle#158
Bundle PR: 158
Summary by CodeRabbit
New Features
Improvements
Removed
Refactor