-
Notifications
You must be signed in to change notification settings - Fork 955
Fix incorrect lag and entries-read value with tombstone #685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #685 +/- ##
============================================
+ Coverage 70.03% 70.87% +0.83%
============================================
Files 110 123 +13
Lines 60076 65929 +5853
============================================
+ Hits 42076 46726 +4650
- Misses 18000 19203 +1203
🚀 New features to boost your workflow:
|
hpatro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for taking this long to get to this. I'm not super confident in getting all these edge cases fixed in one go. If you don't mind, could you split out the changes into separate PR and target this one to fix only the lag issue described in #641 . @valkey-io/core-team Could one of you with more expertise have a look at it?
Disclaimer: I'm also reading this code flow in detail for the first time.
| if (!s->length || streamIDEqZero(&s->max_deleted_entry_id)) { | ||
| /* The stream is empty or has no tombstones. */ | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to remove this early exit condition?
Got it, thanks for the review. I'll try to split it in the next few days. |
| long long entries_read = streamEstimateDistance(s, cg, &cg->last_id); | ||
| if (entries_read != SCG_INVALID_ENTRIES_READ) { | ||
| /* A valid counter was obtained. */ | ||
| lag = (long long)s->entries_added - entries_read; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Estimate distance is just an estimate, not exact?
If it is exact, then we should rename the estimate functions to something more exact like calculate, compute, etc.
|
We can merge #1952 first, backport it and include it in patch releases 7.2.9, 8.0.3 and 8.1.1. After that, we can merge this PR and include it in 9.0 because it has a small behavior change. Sounds good? |
|
@zuiderkwast Can we discuss this in the next weekly meeting to understand the little changes you referred to? |
|
As far as I remember, there are some differences. Let me revise this PR to facilitate the discussion. If there are no issues, I'll close it. |
Not sure if I will be able join. It's about the first bullet in the top comment. The small changes to test cases in this PR illustrate the small behaviour change. The XINFO field entries-read returns a number instead of null, which looks like a correction more than anything. |
|
As @zuiderkwast mentioned, currently, the Theoretically, if these two fields cannot be calculated, they should both be We need to discuss whether to amend this logic: ensuring that either both |
zuiderkwast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@valkey-io/core-team Bump! Let's include this in 9.0. It's a good FIX. In some cases, we return a number instead of null for entries-read and lag, which is an improvement and a correction, not a breaking change. Look at the test case changes to see the behavior change.
@artikell One commit is missing the DCO. Please fix it.
f5c2ba3 to
67ecf89
Compare
|
Yeah, this behavior change seems OK to me. |
|
@artikell Can you fix the merge conflicts, please? |
Okay, I have time to fix this today. |
Signed-off-by: artikell <[email protected]>
67ecf89 to
c6cc724
Compare
| assert_equal [dict get $reply max-deleted-entry-id] "1-0" | ||
| assert_equal [dict get $reply entries-added] 1 | ||
| assert_equal [dict get $group entries-read] {} | ||
| assert_equal [dict get $group entries-read] 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this will be one if we haven't read any message from the stream?
| # so the lag can't be calculated | ||
| set reply [r XINFO STREAM x FULL] | ||
| set group [lindex [dict get $reply groups] 0] | ||
| assert_equal [dict get $group entries-read] 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the amount of entries-read haven't change even if there is a tombstone, why the null?
If a consumer is relaying on the metrics it will be very weird to get null entries-read for this case
| assert_equal [dict get $group lag] 1 | ||
| set group [lindex [dict get $reply groups] 1] | ||
| assert_equal [dict get $group entries-read] {} | ||
| assert_equal [dict get $group entries-read] 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same case here, g2 haven't read anything from the stream at this point
I don't see the problem with the two having different values, since they measure different things. The read entries will tell you how many items the consumer has read, and the delay will tell you an estimation how much is left to read. |
This PR fix three issues(#641):