Change uses of withReadAdvice to use hints instead #14510
ChrisHegarty merged 4 commits into apache:main from
fee3d2c to 4e1356b
6154e9b to 020aa4c
lucene/test-framework/src/java/org/apache/lucene/tests/store/SerialIOCountingDirectory.java
```diff
   int chunkSizePower,
-  boolean confined) {
+  boolean confined,
+  Function<IOContext, ReadAdvice> toReadAdvice) {
```
A Function representing the mapping keeps the IOContext -> ReadAdvice mapping entirely within MMapDirectory.
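The injection pattern described in this comment can be sketched roughly as follows. This is a minimal, hypothetical sketch: `ReadAdvice`, `IOContext`, `InputProvider` and `SketchDirectory` are simplified stand-ins written for illustration, not Lucene's actual classes or signatures. The provider only applies whatever the injected `Function<IOContext, ReadAdvice>` returns, so the policy itself lives in a single place.

```java
import java.util.Set;
import java.util.function.Function;

// Simplified stand-ins, NOT Lucene's org.apache.lucene.store types.
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

record IOContext(Set<?> hints) {}

// Hypothetical input provider: it applies whatever advice the injected
// function returns and knows nothing about how hints are interpreted.
final class InputProvider {
  final int chunkSizePower;
  final boolean confined;
  private final Function<IOContext, ReadAdvice> toReadAdvice;

  InputProvider(
      int chunkSizePower, boolean confined, Function<IOContext, ReadAdvice> toReadAdvice) {
    this.chunkSizePower = chunkSizePower;
    this.confined = confined;
    this.toReadAdvice = toReadAdvice;
  }

  // Stand-in for opening an input: returns the advice that would be applied to the mapping.
  ReadAdvice adviceFor(IOContext context) {
    return toReadAdvice.apply(context);
  }
}

// Hypothetical directory: the only place that owns the IOContext -> ReadAdvice policy.
final class SketchDirectory {
  private final Function<IOContext, ReadAdvice> toReadAdvice;

  SketchDirectory(Function<IOContext, ReadAdvice> toReadAdvice) {
    this.toReadAdvice = toReadAdvice;
  }

  InputProvider newProvider(int chunkSizePower, boolean confined) {
    return new InputProvider(chunkSizePower, confined, toReadAdvice);
  }
}
```

For example, `new SketchDirectory(ctx -> ReadAdvice.RANDOM).newProvider(30, true)` yields a provider that advises RANDOM for every context; changing the policy requires no change to the provider.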
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
```diff
-docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL));
+docIn =
+    state.directory.openInput(
+        docName, state.context.withHints(FileTypeHint.DATA, FileDataHint.POSTINGS));
```
Should this call also pass a DataAccessHint? The fact that it's sometimes provided and sometimes not is a bit confusing to me.
This doesn't set a DataAccessHint as it's not RANDOM or SEQUENTIAL, just NORMAL.
I've tried to keep the initial hints as simple as possible here, without changing behaviour, so we can normalise and possibly modify behaviour in a subsequent PR.
Ohhhh, I had assumed until now that DataAccessHint.SEQUENTIAL effectively meant ReadAdvice.NORMAL.
It maps onto ReadAdvice.SEQUENTIAL.
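A rough sketch of the correspondence discussed in this thread, using simplified stand-in enums rather than Lucene's real types: an explicit `DataAccessHint` maps one-to-one onto a `ReadAdvice`, and there is no hint for `NORMAL`, so the absence of a hint is returned as an empty `Optional` and left to some other rule to decide. `DataAccessMapping.fromHints` is a hypothetical helper, not an existing Lucene method.

```java
import java.util.Optional;
import java.util.Set;

// Simplified stand-ins, NOT Lucene's org.apache.lucene.store types.
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

enum DataAccessHint { RANDOM, SEQUENTIAL }

final class DataAccessMapping {
  // An explicit DataAccessHint maps one-to-one onto a ReadAdvice.
  // There is no hint for NORMAL: a caller that wants NORMAL sets no DataAccessHint,
  // and the result is empty so that some other rule decides the advice.
  static Optional<ReadAdvice> fromHints(Set<?> hints) {
    if (hints.contains(DataAccessHint.RANDOM)) {
      return Optional.of(ReadAdvice.RANDOM);
    }
    if (hints.contains(DataAccessHint.SEQUENTIAL)) {
      return Optional.of(ReadAdvice.SEQUENTIAL);
    }
    return Optional.empty();
  }
}
```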
> This doesn't set a DataAccessHint as it's not RANDOM or SEQUENTIAL, just NORMAL.

#14635: with this PR, we have removed the change in MMapDirectory that applied NORMAL read advice. So if no DataAccessHint is provided, it falls back to the default read advice, which is RANDOM. Has the read advice therefore changed from NORMAL to RANDOM? Can you please confirm whether I'm missing something, and if not, should we revert the read advice back to NORMAL?
...kward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50CompoundReader.java
```java
// readahead.
if (context.hints().contains(FileDataHint.POSTINGS)) {
  return ReadAdvice.NORMAL;
}
```
I feel like this default implementation should be dumb and trust the DataAccessHint, and FileDataHint should only be used by users to override the defaults when they know better about their data/access pattern?
Currently, ReadAdvice.NORMAL doesn't have a corresponding DataAccessHint, so you get it by not specifying a DataAccessHint. But as the default is now ReadAdvice.RANDOM (or configurable), you need some way to explicitly say 'I want the OS default behaviour here'.
Maybe we could have a DataAccessHint.DEFAULT, but then doesn't DataAccessHint just mirror ReadAdvice?
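The need for an explicit way to ask for `NORMAL` can be illustrated with a minimal sketch of the fallback branch from the snippet under review. It assumes stand-in enums, a hypothetical `FallbackAdvice.fallback` helper, and that the configured default (RANDOM or otherwise) is passed in by the caller; it is not MMapDirectory's actual code. `NORMAL` leaves the OS's default readahead in place, whereas `RANDOM` typically disables readahead.

```java
import java.util.Set;

// Simplified stand-ins, NOT Lucene's org.apache.lucene.store types.
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

enum FileDataHint { POSTINGS }

final class FallbackAdvice {
  // Assumed to run only when the context carries no DataAccessHint.
  // Postings keep NORMAL so the OS readahead stays enabled; any other
  // file gets the configured default supplied by the caller.
  static ReadAdvice fallback(Set<?> hints, ReadAdvice configuredDefault) {
    if (hints.contains(FileDataHint.POSTINGS)) {
      return ReadAdvice.NORMAL;
    }
    return configuredDefault;
  }
}
```

Without the POSTINGS branch, a postings file opened with no DataAccessHint would simply receive the configured default.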
I think it's natural for DataAccessHint to be closely related to ReadAdvice.
My suggestion would be to do the following:
- DataAccessHint has a single enum constant: RANDOM
- the lack of a DataAccessHint means that the access pattern is Lucene's standard access pattern: a mix of forward seeks and reads of data in the order of 1kB-1MB (what terms, postings, points and doc values typically do)
- Vectors pass DataAccessHint.RANDOM at open time. They are the only ones to pass a DataAccessHint to the IOContext, because they're the only ones that seek backwards to evaluate a single query
- To mimic today's behavior, MMapDirectory's default implementation uses Constants.DEFAULT_READADVICE on all files that have FileTypeHint.DATA.
- In the future (different PR), we can discuss whether MMapDirectory's default implementation should force ReadAdvice.RANDOM for DataAccessHint.RANDOM, and whether Constants.DEFAULT_READADVICE should move back to NORMAL.
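The suggestion above could be sketched roughly like this. Everything here is a hypothetical stand-in: `defaultReadAdvice` plays the role of `Constants.DEFAULT_READADVICE`, `FileTypeHint` is reduced to two assumed constants, and the `forceRandomForRandomHint` flag models the follow-up option from the last bullet. With the flag set to `false`, the mapping ignores `DataAccessHint.RANDOM`, mimicking today's behaviour. Treating files without `FileTypeHint.DATA` as `NORMAL` is also an assumption, since the suggestion doesn't cover them.

```java
import java.util.Set;

// Simplified stand-ins, NOT Lucene's org.apache.lucene.store types.
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

// Per the suggestion, RANDOM is the only access hint.
enum DataAccessHint { RANDOM }

// Hypothetical subset of file types for this sketch.
enum FileTypeHint { INDEX, DATA }

final class SuggestedMapping {
  // defaultReadAdvice stands in for Constants.DEFAULT_READADVICE.
  // forceRandomForRandomHint models the follow-up option; false mimics today's behaviour.
  static ReadAdvice toReadAdvice(
      Set<?> hints, ReadAdvice defaultReadAdvice, boolean forceRandomForRandomHint) {
    if (forceRandomForRandomHint && hints.contains(DataAccessHint.RANDOM)) {
      return ReadAdvice.RANDOM;
    }
    if (hints.contains(FileTypeHint.DATA)) {
      return defaultReadAdvice;
    }
    // Assumption: files without FileTypeHint.DATA keep the OS default.
    return ReadAdvice.NORMAL;
  }
}
```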
There's still ReadAdvice.SEQUENTIAL, with a corresponding DataAccessHint, which is used for a few things. Whilst we need to consolidate uses of NORMAL vs SEQUENTIAL, I want to keep existing behaviour in this PR as much as is sensible, and modify it in subsequent PRs.
I've taken another look at this, and reduced MMapDirectory down to only check DataAccessHint and FileTypeHint, without changing behavior.
lucene/test-framework/src/java/org/apache/lucene/tests/store/SerialIOCountingDirectory.java
ChrisHegarty left a comment
Thanks @thecoop - LGTM
Backport of apache#14510 to 10.x
Apply changes made between 10.2.2 and 10.3.1 versions:
```
$ git log --oneline releases/lucene/10.2.2..releases/lucene/10.3.1 lucene/core/src/java/org/apache/lucene/codecs/lucene90/{Lucene90DocValuesProducer.java,Lucene90DocValuesConsumer.java,Lucene90DocValuesFormat.java}
d176a42b659 Implement IndexedDISI#docIDRunEnd (#14753)
223d7a41e3c [10.x] Change uses of withReadAdvice to use hints instead (#14629) (#14510)
bb3167e57c6 Impl intoBitset for IndexedDISI and Docvalues (#14529)
```
Relevant PRs:
apache/lucene#14753
apache/lucene#14510
apache/lucene#14529
Follows: #18545
Follow-on from #14482.
ReadAdvice is now only really used by MMapDirectory; it is no longer part of Directory. This PR doesn't change any behaviour. It just replicates the ReadAdvice that would have been used, via hints + MMapDirectory.toReadAdvice.