Skip to content

Conversation

@otheng03
Copy link
Contributor

@otheng03 otheng03 commented Oct 11, 2024

ISSUE LINK: #1827

  • Add two new parameters to valkey.conf:
    • keyinfo-num-elements-larger-than
    • keyinfo-large-num-elements-max-len
  • Implement three new commands:
    • KEYINFO GET <count> MANY_ELEMENTS
    • KEYINFO LEN MANY_ELEMENTS
    • KEYINFO RESET MANY_ELEMENTS

@otheng03 otheng03 force-pushed the bigkeylog branch 2 times, most recently from 97e7b68 to e19f7ee Compare October 11, 2024 15:34
@otheng03 otheng03 force-pushed the bigkeylog branch 2 times, most recently from 6574313 to deb21af Compare April 8, 2025 18:48
@otheng03 otheng03 changed the title Adding bigkeylog commands to find out bigkeys Adding KEYINFO command to find out keys that have large number of elements Apr 8, 2025
@codecov
Copy link

codecov bot commented Apr 8, 2025

Codecov Report

Attention: Patch coverage is 79.56204% with 28 lines in your changes missing coverage. Please review.

Project coverage is 71.07%. Comparing base (2d200df) to head (23a9c40).
Report is 11 commits behind head on unstable.

Files with missing lines Patch % Lines
src/keyinfo.c 72.72% 24 Missing ⚠️
src/config.c 0.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1151      +/-   ##
============================================
+ Coverage     71.02%   71.07%   +0.05%     
============================================
  Files           123      124       +1     
  Lines         66116    66245     +129     
============================================
+ Hits          46956    47082     +126     
- Misses        19160    19163       +3     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/db.c 89.58% <100.00%> (+<0.01%) ⬆️
src/server.c 87.94% <100.00%> (+0.03%) ⬆️
src/server.h 100.00% <ø> (ø)
src/t_hash.c 96.23% <100.00%> (-0.11%) ⬇️
src/t_list.c 92.96% <100.00%> (+0.08%) ⬆️
src/t_set.c 97.86% <100.00%> (+0.35%) ⬆️
src/t_stream.c 93.18% <100.00%> (+<0.01%) ⬆️
src/t_string.c 96.79% <100.00%> (+0.01%) ⬆️
src/t_zset.c 96.77% <100.00%> (+0.01%) ⬆️
... and 2 more

... and 11 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@otheng03 otheng03 force-pushed the bigkeylog branch 4 times, most recently from 4949a98 to 47c89bd Compare April 16, 2025 14:45
@otheng03 otheng03 force-pushed the bigkeylog branch 2 times, most recently from 53e94d4 to 0f13625 Compare April 28, 2025 14:39
@otheng03 otheng03 marked this pull request as ready for review April 29, 2025 00:48
@hwware
Copy link
Member

hwware commented May 2, 2025

I have not begun to review your codes, I just take a look the new command names:
KEYINFO GET MANY_ELEMENTS
KEYINFO LEN MANY_ELEMENTS
KEYINFO RESET MANY_ELEMENTS

The MANY_ELEMENTS is a little bit weird, i am not sure if you are willing to attend Valkey meeting next Monday (Link here https://github.com/valkey-io/valkey/wiki/Weekly-meeting#weekly-meeting-schedule ) and introduce this PR background to Valkey maintainer? Thanks

@otheng03
Copy link
Contributor Author

otheng03 commented May 5, 2025

Hi, thank you for your message.
I’m not sure if I’ll be able to attend the meeting, as May 3rd to 6th is one of the biggest holiday in my country.
Also, since I’m not fluent in English, I’m a bit concerned about communicating effectively during the meeting.
Would it be possible for me to introduce the feature in a document instead?
If that’s not sufficient, I’d be happy to join the meeting on Monday, May 12, if that works for you.

@madolson madolson added the major-decision-pending Major decision pending by TSC team label May 5, 2025
@hwware
Copy link
Member

hwware commented May 5, 2025

In today meeting, tsc member discussed this PR and express 2 concerns:

  1. The new KEYINFO list could cost some memory due to the "key name" in the entry. Because in Valkey, the maximum allowed key size is 512 MB, thus if multiply such these kinds of key is saved to KEYINFO list, it causes huge memory consumption. For the slowlog, no key name is saved (https://valkey.io/commands/slowlog-get/)

  2. MANY_ELEMENTS name is a little bit weird, I suggest we have a better name

Could you wrote down a specific example how this feature in used in production environment, (I know the valkey-cli --bigkeys is too expensive in real production) so that TSC member can understand your idea or join next Monday meeting, Thanks

@otheng03
Copy link
Contributor Author

otheng03 commented May 11, 2025

Thanks for the review.

  1. Oh, I see. I'll try to modify it to reference the existing key in order to reduce memory usage. I realized that referencing the existing key in hashtabase makes code complex since when the hashtable is resized, we may need to verify whether it's safe to keep using the pointer. Also, having an informational feature affect the system’s core functionality could make maintenance more difficult, so I don't think it's a good approach.
    Instead, could we consider adding an option to limit the key length in KEYINFO? (ex: keyinfo-max-key-len)

  2. I've listed a few candidates. Could you please pick one?
    2.1. keyinfo get -1 bigkeys
    2.2. keyinfo get -1 HIGH_CARDINALITY
    2.3. info bigkeys
    2.4. info highcardinalitykeys
    IMHO, keyinfo get -1 HIGH_CARDINALITY looks fine, but any suggestions are welcome.

@otheng03
Copy link
Contributor Author

otheng03 commented May 11, 2025

| Could you wrote down a specific example how this feature in used in production environment,

The use case I have in mind is to identify keys with a large number of elements easily and early—before they cause problems in the system.
To achieve this, the monitoring tool periodically checks which keys are large.

I realized that using this feature to retrieve big keys can block other commands if a key is extremely large, such as one that is 512MB in size. It might be better to change the default behavior to return only the beginning part of the key name, and provide an option to return the full key name if needed.

@otheng03
Copy link
Contributor Author

otheng03 commented May 12, 2025

This is a specific usecase I have in mind.

  • Regularly checking for big keys and raising an alert when they are detected.
    • I want to save time getting dump files from remote servers and running valkey-cli --bigkeys.
    • Due to security restrictions, I sometimes can’t access a remote host, which makes it difficult to get dump.rdb
      • Sometimes, access to remote hosts is granted through a complex verification process.
    • Although rare, there are cases where retrieving the full dataset is restricted for security reasons.

{3C7043BE-333D-48BF-A6A1-F4628CB715D0}

@otheng03
Copy link
Contributor Author

otheng03 commented May 12, 2025

Hi, @hwware.
Thank you, It was a great experience joining the meeting — I really liked how the open discussion moved quickly but still went deep.
I didn’t get a chance to bring up the topic I had prepared. But I left them by comments in this PR and I needed some time to work on the feedback I got from the May 5 meeting anyway.
If it's okay, I'd like to attend the meeting often.

@hwware
Copy link
Member

hwware commented May 15, 2025

Hi, @hwware. Thank you, It was a great experience joining the meeting — I really liked how the open discussion moved quickly but still went deep. I didn’t get a chance to bring up the topic I had prepared. But I left them by comments in this PR and I needed some time to work on the feedback I got from the May 5 meeting anyway. If it's okay, I'd like to attend the meeting often.

I add your PR in our meeting agenda (https://github.com/valkey-io/valkey/wiki/Weekly-meeting#core-team-sync-2025-05-19-1400-utc), you should have chance to discuss your requirement with TSC member in the meeting.

@otheng03
Copy link
Contributor Author

| I add your PR in our meeting agenda (https://github.com/valkey-io/valkey/wiki/Weekly-meeting#core-team-sync-2025-05-19-1400-utc), you should have chance to discuss your requirement with TSC member in the meeting.

Thanks for adding this PR to the agend.
However, I noticed that you've drawn a consensus between INFO KEYSIZES and this PR. #1 of this comment
I'm a bit concerned that this might lead to redundant discussions.
Would it be better to discuss this further? Anyway, I'm not sure it's acceptable to develop two similar features.

@madolson
Copy link
Member

madolson commented May 26, 2025

Thanks for investigating this implementation. We discussed it in our weekly meeting, and have some open questions.

  1. Performance: How much CPU does it take to perform the crc16 or some similar hash (xxhash or siphash). We probably can't enable this feature by default if the hashing is taking a lot of time.
  2. Module interaction: The module API needs to also be included here. JSON should be able to be supported here as well as modules that edit hash objects.
  3. This isn't really the 10 largest keys or the 10 most recently edited large keys. It's really just a bin of "N" large keys. There was an ask to either make it the 10 largest or the 10 most recently edited. That may not be possible though.
  4. Do we really need this functionality. Can an end user solve all of their problems with either a valkey-cli --bigkeys or using second order information like slowlog or commandlong large-reply.
  5. The deletion code path can lead to dangling keys. If something is async deleted, it looks like it's not getting tracked. If we do a flushall that is asynchronous, we don't do any mainthread work to clean up the data so the item in the table will not get cleaned up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

major-decision-pending Major decision pending by TSC team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants