Support compatible Kotlin AST diff trees #1101

Open. thromel wants to merge 1 commit into tsantalis:master from thromel:romel/issue-1100-compatible-ast

thromel (Contributor) commented Jun 20, 2026

Fixes #1100.

Why

Issue #1100 shows a real mismatch in AST-diff for Java-to-Kotlin changes. Java files used the JDT-style GumTree labels, while Kotlin files still used tree-sitter labels such as class_declaration, function_declaration, and property_declaration. That made the two trees harder to match even when the source-level change was a straightforward Java-to-Kotlin conversion.

This PR changes AST-diff mode only. Refactoring detection still uses the existing Kotlin PSI/UML pipeline.

What changed

  • Added a Kotlin PSI-based AST-diff visitor that emits Java-compatible GumTree nodes, including CompilationUnit, PackageDeclaration, ImportDeclaration, TypeDeclaration, FieldDeclaration, MethodDeclaration, SingleVariableDeclaration, Block, ReturnStatement, MethodInvocation, InfixExpression, and comments.
  • Switched Kotlin AST-diff generation to that visitor when astDiff=true.
  • Based Java-to-Kotlin detection on language identity instead of declaration node labels, which keeps the existing detection behavior. Kotlin now intentionally emits TypeDeclaration, so comparing declaration labels would hide Java-to-Kotlin diffs.
  • Hardened matcher paths that can see missing trees after pruning or location lookup.
  • Made decorator checks use the destination language constants.
  • Kept Nikos's recent operator work compatible with the new Kotlin AST shape: Kotlin infix operators and assignment operators now use the same compatible operator node names as Java, and initialized Kotlin properties expose their = as ASSIGNMENT_OPERATOR.
  • Regenerated the Java-to-Kotlin AST-diff fixtures for the OkHttp and IntelliJ commits covered by the test data.
  • Fixed wildcard Kotlin import ranges so a QualifiedName labeled kotlin.math.* covers the full source text, including .*.
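
The wildcard import fix in the last bullet can be illustrated with a minimal sketch. This is not the visitor code from this PR: the class WildcardImportRange, the Range type, and importNameRange are hypothetical names, and the scan is simplified (it assumes plain identifiers with no backticks or comments on the import line). It only shows the intended invariant, namely that the range of the name node matches its label, including the trailing .* part.

```java
class WildcardImportRange {
    /** Half-open [start, end) character range into the source text. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Hypothetical helper: range of the imported name on the import line that
     * begins at importStart, widened over a trailing ".*" so that it matches
     * a label such as "kotlin.math.*".
     */
    static Range importNameRange(String source, int importStart) {
        int pos = importStart + "import".length();
        // Skip the spaces between the keyword and the name.
        while (pos < source.length() && source.charAt(pos) == ' ') {
            pos++;
        }
        int start = pos;
        // Consume identifier characters and the dots that separate them.
        while (pos < source.length()
                && (Character.isJavaIdentifierPart(source.charAt(pos))
                    || source.charAt(pos) == '.')) {
            pos++;
        }
        // A wildcard import ends in ".*"; include the star in the range.
        if (pos < source.length() && source.charAt(pos) == '*'
                && pos > start && source.charAt(pos - 1) == '.') {
            pos++;
        }
        return new Range(start, pos);
    }

    public static void main(String[] args) {
        String source = "package demo\nimport kotlin.math.*\nimport kotlin.math.PI\n";
        Range r = importNameRange(source, source.indexOf("import"));
        // The covered text equals the node label, star included.
        System.out.println(source.substring(r.start, r.end)); // prints kotlin.math.*
    }
}
```

With a range that stopped before the star, the label kotlin.math.* would be two characters longer than the source span it claims to cover, which is the label/source-span mismatch described under review feedback.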

Why the fixture diff is large

No fixture files were removed. The large deletion count is inside generated JSON snapshots.

The old Kotlin AST-diff tree came from tree-sitter and included many low-level syntax nodes. The new tree is smaller and Java-compatible. For example, Kotlin now emits CompilationUnit, TypeDeclaration, MethodDeclaration, FieldDeclaration, VariableDeclarationFragment, INFIX_EXPRESSION_OPERATOR, and ASSIGNMENT_OPERATOR instead of tree-sitter-specific nodes such as source_file, class_declaration, function_declaration, and property_declaration.

That means the expected mapping snapshots had to be regenerated. The changed fixture folders are limited to the two commits listed in java2kotlin.json.

Behavior notes

The new visitor is compatibility-oriented. It is not trying to model every Kotlin PSI node as a perfect JDT equivalent. It maps the Kotlin constructs that matter for the current AST-diff path to Java-style labels, then falls back to generic expression or statement wrappers when a construct does not have a dedicated mapping yet.
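
The "dedicated mapping first, generic wrapper otherwise" idea can be sketched as follows. Everything here is illustrative: the real visitor dispatches on Kotlin PSI elements, while this sketch uses plain string kinds, and the names CompatibleLabels, labelFor, GenericExpression, and GenericStatement are assumptions, not identifiers from this PR. The pairings in the table are the natural reading of the label lists above, not a copy of the implementation.

```java
import java.util.Map;

class CompatibleLabels {
    // Assumed pairings between Kotlin constructs and Java-style labels.
    // The keys are hypothetical kind names, not PSI class names.
    static final Map<String, String> DEDICATED = Map.of(
            "file", "CompilationUnit",
            "class", "TypeDeclaration",
            "function", "MethodDeclaration",
            "property", "FieldDeclaration",
            "parameter", "SingleVariableDeclaration",
            "return", "ReturnStatement",
            "call", "MethodInvocation");

    /**
     * Returns the Java-style label for a construct kind. Kinds without a
     * dedicated entry fall back to a generic wrapper label (hypothetical
     * names) chosen by whether the construct is an expression.
     */
    static String labelFor(String kind, boolean isExpression) {
        String dedicated = DEDICATED.get(kind);
        if (dedicated != null) {
            return dedicated;
        }
        return isExpression ? "GenericExpression" : "GenericStatement";
    }

    public static void main(String[] args) {
        System.out.println(labelFor("function", false));      // prints MethodDeclaration
        System.out.println(labelFor("not-yet-mapped", true)); // prints GenericExpression
    }
}
```

The fallback keeps the tree well-formed for constructs that have no dedicated mapping yet, so adding a mapping later only changes one label rather than the tree shape around it.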

That keeps the issue fix scoped: Java-to-Kotlin AST diffs now get comparable tree roots and common declaration/body structure, without changing the refactoring model or the non-AST-diff Kotlin path.

Review feedback addressed

Claude Code with the Opus model alias reviewed the change before the PR was opened. I addressed the concrete findings around in-body comments, non-ASCII source offsets, and avoiding silent visitor failures.

puku-cli with the Opus model reviewed the PR after it was opened. One finding held up against the code: wildcard imports had a label/source-span mismatch. This PR now makes the node range cover the full wildcard import text and bounds contained-node lookup by the owning PSI element.

Nikos's follow-up commit on master added Java-to-Kotlin operator mappings for infix operators, assignment operators, and variable declaration =. This branch is rebased on that work and adapts it to the compatible Kotlin AST: those operators are now represented with compatible node names instead of tree-sitter operator node names.

The puku review also flagged cross-language detection as a possible regression. I checked that against the previous behavior. Before this PR, Java and Kotlin were already treated as cross-language because Java used TypeDeclaration and Kotlin used tree-sitter declaration labels. The new language-identity check preserves that behavior after Kotlin starts emitting Java-compatible declaration labels.
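
A small sketch of why the check had to move from labels to language identity. The types and method names below (CrossLanguageCheck, Language, Tree, byLabel, byLanguage) are hypothetical and only model the reasoning in the paragraph above; they are not the matcher code from this PR.

```java
class CrossLanguageCheck {
    enum Language { JAVA, KOTLIN }

    /** Minimal stand-in for a parsed file: its language and top declaration label. */
    static final class Tree {
        final Language language;
        final String declarationLabel;

        Tree(Language language, String declarationLabel) {
            this.language = language;
            this.declarationLabel = declarationLabel;
        }
    }

    // Label-based detection: relies on the two languages using different labels.
    static boolean byLabel(Tree src, Tree dst) {
        return !src.declarationLabel.equals(dst.declarationLabel);
    }

    // Identity-based detection: independent of how the trees are labeled.
    static boolean byLanguage(Tree src, Tree dst) {
        return src.language != dst.language;
    }

    public static void main(String[] args) {
        Tree java = new Tree(Language.JAVA, "TypeDeclaration");
        Tree oldKotlin = new Tree(Language.KOTLIN, "class_declaration");
        Tree newKotlin = new Tree(Language.KOTLIN, "TypeDeclaration");

        System.out.println(byLabel(java, oldKotlin));    // prints true
        System.out.println(byLabel(java, newKotlin));    // prints false
        System.out.println(byLanguage(java, newKotlin)); // prints true
    }
}
```

The label comparison stops flagging the pair once Kotlin emits TypeDeclaration, while the identity comparison gives the same answer before and after the label change.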

Tests

  • git diff --check
  • ./gradlew test --tests org.refactoringminer.astDiff.tests.KotlinCompatibleTreeVisitorTest --tests org.refactoringminer.astDiff.tests.JavaToKotlinDiffTest --no-daemon
  • ./gradlew test --tests org.refactoringminer.astDiff.tests.TreeFromParserTest --tests org.refactoringminer.astDiff.tests.TreeMatcherTest --tests org.refactoringminer.astDiff.tests.KotlinCompatibleTreeVisitorTest --tests org.refactoringminer.astDiff.tests.JavaToKotlinDiffTest --tests org.refactoringminer.astDiff.tests.PythonDiffTest --tests org.refactoringminer.astDiff.tests.TypeScriptDiffTest --no-daemon
  • ./gradlew test --tests org.refactoringminer.test.TestKotlinDatasetRefactorings --no-daemon

thromel force-pushed the romel/issue-1100-compatible-ast branch from 5fe7b6a to 79d9299 (June 20, 2026 10:19)
thromel marked this pull request as draft (June 20, 2026 10:57)
thromel force-pushed the romel/issue-1100-compatible-ast branch from 79d9299 to fbd2f3c (June 20, 2026 11:18)
thromel force-pushed the romel/issue-1100-compatible-ast branch from fbd2f3c to ed5205d (June 20, 2026 11:51)
thromel marked this pull request as ready for review (June 20, 2026 11:53)
Development

Successfully merging this pull request may close these issues.

Generate compatible AST for Kotlin and Java
