Skip to content

Remove abstractions of MMapDirectory and its "fake" MR-JAR support and move it to main sourceSet (vectors untouched)#14564

Merged
uschindler merged 14 commits intoapache:mainfrom
uschindler:dev/removeMMapDirectoryAbstractions
Apr 30, 2025
Merged

Remove abstractions of MMapDirectory and its "fake" MR-JAR support and move it to main sourceSet (vectors untouched)#14564
uschindler merged 14 commits intoapache:mainfrom
uschindler:dev/removeMMapDirectoryAbstractions

Conversation

@uschindler
Copy link
Contributor

This PR moves all classes of MMapDirectory and the MemorySegment supporting classes from the java21 sourceset to the main sourceset.

It also updates the MR-JAR sourceset for vectorization to the Java 23 API extraction and removes all stub classes from the APIJAR which refer to memory segments, so only the vector API is extracted and saved in the API-JAR. This change makes merging vector code a bit harder as the file paths changes, but at some point we have to live with this.

The MMapDirectory abstractions (attachments, provider classes,..) were removed from code and all implementations are statically initialized by MMapDirectory (native access). The MemorySegmentIndexInput is now a 1st class citizen.

We may remove further abstractions in future and maybe also move other code away from ByteBuffer to MemorySegment to allow mapping vectors and directly access index parts also from other directories (NIOFSDir may also return vector views)...

@uschindler
Copy link
Contributor Author

This PR updates the ASM dependcy to 9.8 because its needed for the APIJAR extraction. Due to our locking code, the expressions module therefore also updates.

But after a discussion with Robert, I am already working on a move away from ASM for expressions with Java 24. The ASM is then no longer a dependency of Lucene artifacts.

@uschindler uschindler changed the title Remove abstractions of MMapDirectory and MR-JAR support and move it to main sourceSet Remove abstractions of MMapDirectory and its required MR-JAR support and move it to main sourceSet (vectors untouched) Apr 29, 2025
@uschindler uschindler changed the title Remove abstractions of MMapDirectory and its required MR-JAR support and move it to main sourceSet (vectors untouched) Remove abstractions of MMapDirectory and its "fake" MR-JAR support and move it to main sourceSet (vectors untouched) Apr 29, 2025
Copy link
Member

@rmuir rmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, this is a good step.

@uschindler
Copy link
Contributor Author

uschindler commented Apr 29, 2025

I also removed some workaround code for Java 21 that prevented thread safe status check on closed arena to propagate AlreadyClosedException: 9925e5e

P.S.: with this check removed we ensure that the JDK 23+ version is correct! If we see errors in tests that close open files we have to reopen issues @ JDK.

@uschindler
Copy link
Contributor Author

I applied some further code cleanups to get rid of more abstractions in private methods. I also added changes entry.

I think it's ready.

…o dev/removeMMapDirectoryAbstractions

# Conflicts:
#	lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
@uschindler
Copy link
Contributor Author

I updated the code to Java 24 (generation of APIJAR).

I also tested vectorization, seems to compile and work:

> Task :lucene:core:test
WARNING: Using incubator modules: jdk.incubator.vector
  2> Apr. 29, 2025 9:47:35 PM org.apache.lucene.internal.vectorization.PanamaVectorizationProvider <init>
  2> INFORMATION: Java vector incubator API enabled; uses preferredBitSize=512
  2> Apr. 29, 2025 9:47:35 PM org.apache.lucene.internal.vectorization.PanamaVectorizationProvider <init>
  2> INFORMATION: Java vector incubator API enabled; uses preferredBitSize=512
  1> filesystem: ExtrasFS(HandleLimitFS(LeakFS(ShuffleFS(DisableFsyncFS(VerboseFS(sun.nio.fs.WindowsFileSystemProvider@292241db))))))
  1> FS 0 [2025-04-29T19:47:35.248226100Z; SUITE-TestVectorUtil-seed#[B55333409746887B]-worker]: createDirectory: ..\tests-tmp (FAILED: java.nio.file.FileAlreadyExistsException: C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\build\tmp\tests-tmp)
  2> Apr. 29, 2025 9:47:35 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
  2> WARNUNG: Vector bitsize and/or integer vectors enforcement; using default vectorization provider outside of testMode
  1> Loaded codecs: [MinimalCodec, MinimalCompoundCodec, Lucene103, Asserting, CheapBastard, DeflateWithPresetCompressingStoredFieldsData, FastCompressingStoredFieldsData, FastDecompressionCompressingStoredFieldsData, HighCompressionCompressingStoredFieldsData, LZ4WithPresetCompressingStoredFieldsData, DummyCompressingStoredFieldsData, ConfigurableMCodec, SimpleText]
  1> Loaded postingsFormats: [Lucene103, MockRandom, RAMOnly, LuceneFixedGap, LuceneVarGapFixedInterval, LuceneVarGapDocFreqInterval, TestBloomFilteredLucenePostings, Asserting, UniformSplitRot13, STUniformSplitRot13, BlockTreeOrds, BloomFilter, Direct, FST50, UniformSplit, SharedTermsUniformSplit]
  1> NOTE: Created shared ExecutorService with 1 threads
  2> NOTE: test params are: codec=Asserting(Lucene103): {}, knn_vectors:{}, docValues:{}, maxPointsInLeafNode=1005, maxMBSortInHeap=6.190502464272589, sim=Asserting(RandomSimilarity(queryNorm=false): {}), locale=xog, timezone=America/Martinique
  2> NOTE: Windows 11 10.0 amd64/Oracle Corporation 24 (64-bit)/cpus=1,threads=1,free=278614240,total=324534272
  2> NOTE: All tests run in this JVM: [TestVectorUtil]
:lucene:core:test (SUCCESS): 30 test(s)

…n version is greater than minimum java version
@uschindler
Copy link
Contributor Author

I added a small change to the JAR task so the MR-JAR part is only build if the target version is > minimum version. This was basically a bug in the previous main branch: it added a Java 21 MRJAR part. I backported the last commit to 10.x, although the issue won't appear there.

…o dev/removeMMapDirectoryAbstractions

# Conflicts:
#	lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
#	lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java
#	versions.lock
@uschindler
Copy link
Contributor Author

Damn: "Create file open hints on IOContext to replace ReadAdvice (#14482)" was merged yesterday..... Trying to figure out what's going on. @thecoop

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @uschindler - LGTM

@uschindler
Copy link
Contributor Author

I am still verifyying that the conflicting changes due to readAdvice are merged in here in a correct way.

I will merge this very soon tro avoid conflicts like this. My head is burning....

@thecoop
Copy link
Contributor

thecoop commented Apr 30, 2025

The changes around ReadAdvice look ok to me

@uschindler
Copy link
Contributor Author

uschindler commented Apr 30, 2025

Thanks @thecoop. About the other PR: It was only for main branch, right? So it won't be backported, so all changes to readAdvice will be a feature for Lucene 11. To me those changes look to invasive for a minor release.

I am just asking this because due to the changes here merging of MemorySegmentIndexInput and Native access code will be more complicated du to the new location of files and removed abstractions. If the code is only developed for main branch it's not a problem.

I would hope that Directory and MMapDirectoy are in "feature freeze" for 10.x.

EDIT: DAMN it was backported. Happy merging in future!

@uschindler uschindler merged commit 2a5597e into apache:main Apr 30, 2025
7 checks passed
@uschindler uschindler deleted the dev/removeMMapDirectoryAbstractions branch April 30, 2025 08:24
weizijun added a commit to weizijun/lucene that referenced this pull request Apr 30, 2025
* main: (51 commits)
  Fix ECJ compiler config for Java 24 (also affects Eclipse IDE)
  Correct shebang in gradlew. apache#14592
  Overwrite gradlew scripts with up-to-date ones and apply Lucene customizations. (apache#14592)
  Remove abstractions of MMapDirectory and its "fake" MR-JAR support and move it to main sourceSet (vectors untouched) (apache#14564)
  deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14573)
  deps(java): bump com.gradle.develocity from 3.18.2 to 4.0.1 (apache#14585)
  deps(java): bump nl.littlerobots.version-catalog-update (apache#14568)
  Bump expected base Java version to 24. apache#14533
  Create file open hints on IOContext to replace ReadAdvice (apache#14482)
  deps(java): bump com.github.ben-manes.versions from 0.51.0 to 0.52.0 (apache#14574)
  deps(java): bump org.owasp.dependencycheck from 7.2.0 to 12.1.1 (apache#14584)
  deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14579)
  deps(java): bump com.diffplug.spotless from 6.9.1 to 7.0.3 (apache#14588)
  deps(java): bump com.gradle.common-custom-user-data-gradle-plugin (apache#14587)
  deps(java): bump net.ltgt.errorprone from 4.1.0 to 4.2.0 (apache#14567)
  deps(java): bump org.locationtech.jts:jts-core from 1.17.0 to 1.20.0 (apache#14586)
  deps(java): bump org.apache.opennlp:opennlp-tools from 2.5.3 to 2.5.4 (apache#14580)
  deps(java): bump asm from 9.6 to 9.8 (apache#14578)
  deps(java): bump commons-codec:commons-codec from 1.17.2 to 1.18.0 (apache#14581)
  deps(java): bump net.java.dev.javacc:javacc from 7.0.12 to 7.0.13 (apache#14576)
  ...

# Conflicts:
#	lucene/CHANGES.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants