Skip to content

Clean up FileTypeHint a bit.#14635

Merged
jpountz merged 1 commit intoapache:mainfrom
jpountz:cleanup_filetypehint
May 11, 2025
Merged

Clean up FileTypeHint a bit.#14635
jpountz merged 1 commit intoapache:mainfrom
jpountz:cleanup_filetypehint

Conversation

@jpountz
Copy link
Contributor

@jpountz jpountz commented May 9, 2025

  • more javadocs to guide usage of INDEX vs. DATA,
  • the METADATA constant is removed, as metadata files should be opened with Directory#openChecksumIndexInput, which doesn't take hints,
  • configure FileTypeHint on more files of the default codec,
  • remove checks on FileTypeHint from toReadAdvice - the default impl should only look at DataAccessHint to determine the appropriate read advice.

 - more javadocs to guide usage of `INDEX` vs. `DATA`,
 - the `METADATA` constant is removed, as metadata files should be opened with
   `Directory#openChecksumIndexInput`, which doesn't take hints,
 - configure `FileTypeHint` on more files of the default codec,
 - remove checks on `FileTypeHint` from `toReadAdvice` - the default impl
   should only look at `DataAccessHint` to determine the appropriate read
   advice.
vectorsStreamFN,
context.withHints(
FileTypeHint.DATA, FileDataHint.KNN_VECTORS, DataAccessHint.RANDOM));
d.openInput(vectorsStreamFN, context.withHints(FileTypeHint.DATA, DataAccessHint.RANDOM));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I removed KNN_VECTORS since these are term vectors, not KNN vectors.

@jpountz
Copy link
Contributor Author

jpountz commented May 9, 2025

cc @thecoop @ChrisHegarty

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thecoop
Copy link
Contributor

thecoop commented May 9, 2025

Oops - copy/paste error

@jpountz jpountz merged commit 40d7460 into apache:main May 11, 2025
7 checks passed
@jpountz jpountz deleted the cleanup_filetypehint branch May 11, 2025 21:22
jpountz added a commit that referenced this pull request May 11, 2025
- more javadocs to guide usage of `INDEX` vs. `DATA`,
 - the `METADATA` constant is removed, as metadata files should be opened with
   `Directory#openChecksumIndexInput`, which doesn't take hints,
 - configure `FileTypeHint` on more files of the default codec,
 - remove checks on `FileTypeHint` from `toReadAdvice` - the default impl
   should only look at `DataAccessHint` to determine the appropriate read
   advice.
weizijun added a commit to weizijun/lucene that referenced this pull request May 16, 2025
* main: (31 commits)
  Fix termination condition in TestStressNRTReplication. (apache#14665)
  deps(java): bump com.gradle.develocity from 3.19 to 3.19.2 (apache#14662)
  Build: remove hard-coded Java versions from ecj.javadocs.prefs (apache#14651)
  Update verifier comment to show label (apache#14658)
  Catch and re-throw Throwable rather than using a success boolean (apache#14633)
  Mention label in changelog verifier comment (apache#14656)
  Enable PR actions in changelog verifier (apache#14644)
  Fix FuzzySet#getEstimatedNumberUniqueValuesAllowingForCollisions to properly account for hashCount (apache#14614)
  Don't perform additional KNN querying after timeout, fixes apache#14639 (apache#14640)
  Add instructions to help/IDEs.txt for VSCode and Neovim (apache#14646)
  build(deps): bump ruff from 0.11.7 to 0.11.8 in /dev-tools/scripts (apache#14603)
  deps(java): bump de.jflex:jflex from 1.8.2 to 1.9.1 (apache#14583)
  Use the preload hint on completion fields and memory terms dictionaries. (apache#14634)
  Clean up FileTypeHint a bit. (apache#14635)
  Expressions: Improve test to use a fully private class or method
  Remove deprecations in expressions (apache#14641)
  removing constructor with deprecated attribute 'onlyLongestMatch (apache#14356)
  Moving CHANGES entry for apache#14609 from 11.0 to 10.3 (apache#14638)
  Overrides rewrite in PointRangeQuery to optimize AllDocs/NoDocs cases (apache#14609)
  Adding benchmark for histogram collector over point range query (apache#14622)
  ...

# Conflicts:
#	lucene/CHANGES.txt
@jpountz jpountz added this to the 10.3.0 milestone May 20, 2025
@bharath-techie
Copy link

Hi @thecoop @jpountz ,
I've commented the same in https://github.com/apache/lucene/pull/14510/files#r2265722765

lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene101/Lucene101PostingsReader.java
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java

- (removed)
    // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform
      // readahead.
      docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL))

+ (added) 
docIn =
          state.directory.openInput(
              docName, state.context.withHints(FileTypeHint.DATA, FileDataHint.POSTINGS));

comment : This doesn't set a DataAccessHint as it's not RANDOM or SEQUENTIAL, just NORMAL.

In this PR we have removed the change in MMAPDirectory to apply read advice as normal.

So if no data access hint is provided, it falls back to default read advice which is "random".

So read advice for this file has changed from default to random ?

Can you please confirm if I'm missing something and if not, then should we revert back the read advice to normal ?

@thecoop
Copy link
Contributor

thecoop commented Aug 11, 2025

The logic in MMapDirectory.toReadAdvice sets the read advice to NORMAL if FileTypeHint.DATA is specified. It is here, so this results in a read advice of NORMAL being used here

@bharath-techie
Copy link

@thecoop But that is reverted in this PR, isn't it ? am i missing something ?

image

@thecoop
Copy link
Contributor

thecoop commented Aug 11, 2025

Ah, yes it does. However, #15040 (also applied to 10.3) changes the default read advice to NORMAL

@bharath-techie
Copy link

Yes got it thanks :) I just noted this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants