Skip to content

Conversation

@artikell
Copy link
Contributor

This PR fix three issues(#641):

  • Fixed the incorrect entries-read count in the case of an empty stream.
  • Ensured the real-time accuracy of the entries-read value in XINFO, preventing inconsistencies between lag and entries-read. Consequently, the return results of some unit tests (UT) have been modified.
  • Fixed the counting error when last-id < max_deleted_entry_id < first_id.

@codecov
Copy link

codecov bot commented Jun 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.87%. Comparing base (ae2d421) to head (f5c2ba3).
Report is 804 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #685      +/-   ##
============================================
+ Coverage     70.03%   70.87%   +0.83%     
============================================
  Files           110      123      +13     
  Lines         60076    65929    +5853     
============================================
+ Hits          42076    46726    +4650     
- Misses        18000    19203    +1203     
Files with missing lines Coverage Δ
src/t_stream.c 92.88% <100.00%> (-0.27%) ⬇️

... and 115 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@madolson madolson requested a review from hpatro November 22, 2024 03:29
Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for taking this long to get to this. I'm not super confident in getting all these edge cases fixed in one go. If you don't mind, could you split out the changes into separate PR and target this one to fix only the lag issue described in #641 . @valkey-io/core-team Could one of you with more expertise have a look at it?

Disclaimer: I'm also reading this code flow in detail for the first time.

Comment on lines -1391 to -1395
if (!s->length || streamIDEqZero(&s->max_deleted_entry_id)) {
/* The stream is empty or has no tombstones. */
return 0;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to remove this early exit condition?

@artikell
Copy link
Contributor Author

artikell commented Jan 4, 2025

Apologies for taking this long to get to this. I'm not super confident in getting all these edge cases fixed in one go. If you don't mind, could you split out the changes into separate PR and target this one to fix only the lag issue described in #641 . @valkey-io/core-team Could one of you with more expertise have a look at it?

Disclaimer: I'm also reading this code flow in detail for the first time.

Got it, thanks for the review. I'll try to split it in the next few days.

Comment on lines +1424 to +1428
long long entries_read = streamEstimateDistance(s, cg, &cg->last_id);
if (entries_read != SCG_INVALID_ENTRIES_READ) {
/* A valid counter was obtained. */
lag = (long long)s->entries_added - entries_read;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Estimate distance is just an estimate, not exact?

If it is exact, then we should rename the estimate functions to something more exact like calculate, compute, etc.

@zuiderkwast
Copy link
Contributor

We can merge #1952 first, backport it and include it in patch releases 7.2.9, 8.0.3 and 8.1.1.

After that, we can merge this PR and include it in 9.0 because it has a small behavior change. Sounds good?

@artikell
Copy link
Contributor Author

We can merge #1952 first, backport it and include it in patch releases 7.2.9, 8.0.3 and 8.1.1.

After that, we can merge this PR and include it in 9.0 because it has a small behavior change. Sounds good?

LGTM, We also have to pay attention to max-deleted-entry-id incompatible changes in #1952.

@madolson madolson moved this to Optional for next release candidate in Valkey 9.0 Jul 21, 2025
@madolson
Copy link
Member

@zuiderkwast Can we discuss this in the next weekly meeting to understand the little changes you referred to?

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Jul 21, 2025
@nesty92
Copy link
Contributor

nesty92 commented Jul 21, 2025

@madolson I think is already solved in #1952

@artikell
Copy link
Contributor Author

As far as I remember, there are some differences. Let me revise this PR to facilitate the discussion. If there are no issues, I'll close it.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Jul 22, 2025

@zuiderkwast Can we discuss this in the next weekly meeting to understand the little changes you referred to?

Not sure if I will be able join. It's about the first bullet in the top comment. The small changes to test cases in this PR illustrate the small behaviour change. The XINFO field entries-read returns a number instead of null, which looks like a correction more than anything.

@artikell
Copy link
Contributor Author

artikell commented Aug 4, 2025

As @zuiderkwast mentioned, currently, the entries-read and lag fields of xinfo group are calculated separately.

            if (cg->entries_read != SCG_INVALID_ENTRIES_READ) {
                addReplyLongLong(c, cg->entries_read);
            } else {
                addReplyNull(c);
            }
            addReplyBulkCString(c, "lag");
            streamReplyWithCGLag(c, s, cg);

Theoretically, if these two fields cannot be calculated, they should both be null. However, this is not the case now, which seems to be an anomaly.

We need to discuss whether to amend this logic: ensuring that either both lag and entries-read are null, or both have values.

cc @madolson @nesty92

@madolson madolson moved this from Optional for next release candidate to Todo in Valkey 9.0 Aug 6, 2025
@madolson madolson moved this from Todo to In Progress in Valkey 9.0 Aug 6, 2025
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@valkey-io/core-team Bump! Let's include this in 9.0. It's a good FIX. In some cases, we return a number instead of null for entries-read and lag, which is an improvement and a correction, not a breaking change. Look at the test case changes to see the behavior change.

@artikell One commit is missing the DCO. Please fix it.

@artikell artikell force-pushed the fix_stream_lag_value branch from f5c2ba3 to 67ecf89 Compare October 10, 2025 03:49
@madolson
Copy link
Member

Yeah, this behavior change seems OK to me.

@madolson madolson added major-decision-approved Major decision approved by TSC team release-notes This issue should get a line item in the release notes and removed major-decision-pending Major decision pending by TSC team labels Oct 13, 2025
@zuiderkwast
Copy link
Contributor

@artikell Can you fix the merge conflicts, please?

@artikell
Copy link
Contributor Author

@artikell Can you fix the merge conflicts, please?

Okay, I have time to fix this today.

@artikell artikell force-pushed the fix_stream_lag_value branch from 67ecf89 to c6cc724 Compare October 15, 2025 06:27
assert_equal [dict get $reply max-deleted-entry-id] "1-0"
assert_equal [dict get $reply entries-added] 1
assert_equal [dict get $group entries-read] {}
assert_equal [dict get $group entries-read] 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this will be one if we haven't read any message from the stream?

# so the lag can't be calculated
set reply [r XINFO STREAM x FULL]
set group [lindex [dict get $reply groups] 0]
assert_equal [dict get $group entries-read] 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the amount of entries-read haven't change even if there is a tombstone, why the null?

If a consumer is relaying on the metrics it will be very weird to get null entries-read for this case

assert_equal [dict get $group lag] 1
set group [lindex [dict get $reply groups] 1]
assert_equal [dict get $group entries-read] {}
assert_equal [dict get $group entries-read] 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same case here, g2 haven't read anything from the stream at this point

@nesty92
Copy link
Contributor

nesty92 commented Oct 15, 2025

As @zuiderkwast mentioned, currently, the entries-read and lag fields of xinfo group are calculated separately.

            if (cg->entries_read != SCG_INVALID_ENTRIES_READ) {
                addReplyLongLong(c, cg->entries_read);
            } else {
                addReplyNull(c);
            }
            addReplyBulkCString(c, "lag");
            streamReplyWithCGLag(c, s, cg);

Theoretically, if these two fields cannot be calculated, they should both be null. However, this is not the case now, which seems to be an anomaly.

We need to discuss whether to amend this logic: ensuring that either both lag and entries-read are null, or both have values.

cc @madolson @nesty92

I don't see the problem with the two having different values, since they measure different things. The read entries will tell you how many items the consumer has read, and the delay will tell you an estimation how much is left to read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

major-decision-approved Major decision approved by TSC team release-notes This issue should get a line item in the release notes

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

5 participants