Skip to content

Change uses of withReadAdvice to use hints instead#14510

Merged
ChrisHegarty merged 4 commits intoapache:mainfrom
thecoop:read-advice-to-hints
May 9, 2025
Merged

Change uses of withReadAdvice to use hints instead#14510
ChrisHegarty merged 4 commits intoapache:mainfrom
thecoop:read-advice-to-hints

Conversation

@thecoop
Copy link
Contributor

@thecoop thecoop commented Apr 16, 2025

Followon from #14482.

ReadAdvice is now only really used by MMapDirectory, no longer part of Directory.

This PR doesn't change any behaviour, it just replicates the ReadAdvice that would be used using hints + MMapDirectory.toReadAdvice.

int chunkSizePower,
boolean confined) {
boolean confined,
Function<IOContext, ReadAdvice> toReadAdvice) {
Copy link
Contributor Author

@thecoop thecoop Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Function representing the mapping keeps the IOContext -> ReadAdvice mapping entirely within MMapDirectory

@thecoop thecoop marked this pull request as ready for review April 30, 2025 13:45
docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL));
docIn =
state.directory.openInput(
docName, state.context.withHints(FileTypeHint.DATA, FileDataHint.POSTINGS));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call also pass a DataAccessHint? The fact that it's sometimes provided and sometimes not is a bit confusing to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't set a DataAccessHint as it's not RANDOM or SEQUENTIAL, just NORMAL.

I've tried to keep the initial hints as simple as possible here, without changing behaviour, so we can normalise and possibly modify behaviour in a subsequent PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh I had assumed until now that DataAccessHint.SEQUENTIAL effectively meant ReadAdvice.NORMAL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It maps onto ReadAdvice.SEQUENTIAL

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thecoop @jpountz ,

This doesn't set a DataAccessHint as it's not RANDOM or SEQUENTIAL, just NORMAL.

#14635 - with this PR - we have removed the change in MMAPDirectory to apply read advice as normal. So if no data access hint is provided, it falls back to default read advice which is random. So read advice has changed from default to random ? Can you please confirm if I'm missing something and if not, then should we revert back the read advice to normal ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See reply in #14635 (comment)

// readahead.
if (context.hints().contains(FileDataHint.POSTINGS)) {
return ReadAdvice.NORMAL;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this default implementation should be dumb and trust the DataAccessHint, and the FileDataHint hint should only be used by users to override defaults when they know better about their data / access pattern?

Copy link
Contributor Author

@thecoop thecoop May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, ReadAdvice.NORMAL doesn't have a corresponding DataAccessHint, so you get that by not specifying the DataAccessHint but then as the default is now ReadAdvice.RANDOM (or configurable), you need some way to explicitly specify 'I want the OS default behaviour here'.

Maybe we do have a DataAccessHint.DEFAULT, but then doesn't DataAccessHint just mirror ReadAdvice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's natural for DataAccessHint to be closely related to ReadAdvice.

My suggestion would be to do the following:

  • DataAccessHint has a single enum constant: RANDOM
  • the lack of a DataAccessHint means that the access pattern is Lucene's standard access pattern: a mix of forward seeks and reads of data in the order of 1kB-1MB (what terms, postings, points and doc values typically do)
  • Vectors pass a DataAccessHint.RANDOM at open time, they are the only ones to pass a DataAccessHint to the IOContext, because they're the only one to seek backwards to evaluate a single query
  • To mimic today's behavior MMapDirectory's default implementation uses Constants.DEFAULT_READADVICE on all files that have FileTypeHint.DATA.
  • In the future (different PR), we can discuss whether MMapDirectory's default implementation should force ReadAdvice.RANDOM for DataAccessHint.RANDOM, and whether Constants.DEFAULT_READADVICE should move back to NORMAL.

Copy link
Contributor Author

@thecoop thecoop May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still ReadAdvice.SEQUENTIAL, with corresponding DataAccessHint, which is used for a few things - whilst we need to consolidate uses of NORMAL vs SEQUENTIAL, I want to keep existing behaviour in this PR as much as as sensible, and modify it in subsequent PRs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken another look at this, and reduced MMapDirectory down to only check DataAccessHint and FileTypeHint, without changing behavior.

@thecoop thecoop requested review from ChrisHegarty and jpountz May 6, 2025 13:57
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thecoop - LGTM

thecoop added a commit to thecoop/lucene that referenced this pull request May 8, 2025
thecoop added a commit to thecoop/lucene that referenced this pull request May 8, 2025
@ChrisHegarty ChrisHegarty merged commit bec671e into apache:main May 9, 2025
7 checks passed
@thecoop thecoop deleted the read-advice-to-hints branch May 9, 2025 08:37
matriv added a commit to crate/crate that referenced this pull request Oct 21, 2025
Apply changes made between 10.2.2 and 10.3.1 versions:
```
$ g log --oneline releases/lucene/10.2.2..releases/lucene/10.3.1 lucene/core/src/java/org/apache/lucene/codecs/lucene90/{Lucene90DocValuesProducer.java,Lucene90DocValuesConsumer.java,Lucene90DocValuesFormat.java}
d176a42b659 Implement IndexedDISI#docIDRunEnd (#14753)
223d7a41e3c [10.x] Change uses of withReadAdvice to use hints instead (#14629) (#14510)
bb3167e57c6 Impl intoBitset for IndexedDISI and Docvalues (#14529)
```

Relevant PRs:
apache/lucene#14753
apache/lucene#14510
apache/lucene#14529

Follows: #18545
matriv added a commit to crate/crate that referenced this pull request Oct 21, 2025
Apply changes made between 10.2.2 and 10.3.1 versions:
```
$ g log --oneline releases/lucene/10.2.2..releases/lucene/10.3.1 lucene/core/src/java/org/apache/lucene/codecs/lucene90/{Lucene90DocValuesProducer.java,Lucene90DocValuesConsumer.java,Lucene90DocValuesFormat.java}
d176a42b659 Implement IndexedDISI#docIDRunEnd (#14753)
223d7a41e3c [10.x] Change uses of withReadAdvice to use hints instead (#14629) (#14510)
bb3167e57c6 Impl intoBitset for IndexedDISI and Docvalues (#14529)
```

Relevant PRs:
apache/lucene#14753
apache/lucene#14510
apache/lucene#14529

Follows: #18545
matriv added a commit to crate/crate that referenced this pull request Oct 21, 2025
Apply changes made between 10.2.2 and 10.3.1 versions:
```
$ g log --oneline releases/lucene/10.2.2..releases/lucene/10.3.1 lucene/core/src/java/org/apache/lucene/codecs/lucene90/{Lucene90DocValuesProducer.java,Lucene90DocValuesConsumer.java,Lucene90DocValuesFormat.java}
d176a42b659 Implement IndexedDISI#docIDRunEnd (#14753)
223d7a41e3c [10.x] Change uses of withReadAdvice to use hints instead (#14629) (#14510)
bb3167e57c6 Impl intoBitset for IndexedDISI and Docvalues (#14529)
```

Relevant PRs:
apache/lucene#14753
apache/lucene#14510
apache/lucene#14529

Follows: #18545
matriv added a commit to crate/crate that referenced this pull request Oct 21, 2025
Apply changes made between 10.2.2 and 10.3.1 versions:
```
$ g log --oneline releases/lucene/10.2.2..releases/lucene/10.3.1 lucene/core/src/java/org/apache/lucene/codecs/lucene90/{Lucene90DocValuesProducer.java,Lucene90DocValuesConsumer.java,Lucene90DocValuesFormat.java}
d176a42b659 Implement IndexedDISI#docIDRunEnd (#14753)
223d7a41e3c [10.x] Change uses of withReadAdvice to use hints instead (#14629) (#14510)
bb3167e57c6 Impl intoBitset for IndexedDISI and Docvalues (#14529)
```

Relevant PRs:
apache/lucene#14753
apache/lucene#14510
apache/lucene#14529

Follows: #18545
matriv added a commit to crate/crate that referenced this pull request Oct 21, 2025
Apply changes made between 10.2.2 and 10.3.1 versions:
```
$ g log --oneline releases/lucene/10.2.2..releases/lucene/10.3.1 lucene/core/src/java/org/apache/lucene/codecs/lucene90/{Lucene90DocValuesProducer.java,Lucene90DocValuesConsumer.java,Lucene90DocValuesFormat.java}
d176a42b659 Implement IndexedDISI#docIDRunEnd (#14753)
223d7a41e3c [10.x] Change uses of withReadAdvice to use hints instead (#14629) (#14510)
bb3167e57c6 Impl intoBitset for IndexedDISI and Docvalues (#14529)
```

Relevant PRs:
apache/lucene#14753
apache/lucene#14510
apache/lucene#14529

Follows: #18545
mergify bot pushed a commit to crate/crate that referenced this pull request Oct 21, 2025
Apply changes made between 10.2.2 and 10.3.1 versions:
```
$ g log --oneline releases/lucene/10.2.2..releases/lucene/10.3.1 lucene/core/src/java/org/apache/lucene/codecs/lucene90/{Lucene90DocValuesProducer.java,Lucene90DocValuesConsumer.java,Lucene90DocValuesFormat.java}
d176a42b659 Implement IndexedDISI#docIDRunEnd (#14753)
223d7a41e3c [10.x] Change uses of withReadAdvice to use hints instead (#14629) (#14510)
bb3167e57c6 Impl intoBitset for IndexedDISI and Docvalues (#14529)
```

Relevant PRs:
apache/lucene#14753
apache/lucene#14510
apache/lucene#14529

Follows: #18545
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants