-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat: add RESET statement for configuration variabless #18408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting this PR, I've left a few comments
| self.return_empty_dataframe() | ||
| } | ||
|
|
||
| async fn reset_variable(&self, stmt: ResetVariable) -> Result<DataFrame> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd to always return an empty dataframe in this function. Is that really necessary vs Result<()> ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
datafusion/common/src/config.rs
Outdated
| } | ||
|
|
||
| /// Reset a configuration option back to its default value | ||
| pub fn reset(&mut self, key: &str) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall it might be better to add a reset fn to ConfigField and update the config_namespace! macro rather than implement it like this. Just my opinion..
|
One additional thing - I didn't see anything in this PR for the unparser. It's possible that the set/unset isn't supported (and/or shouldn't be supported) there but could you take a look and see? |
9a18356 to
9cbfb17
Compare
@Omega359 Thanks for flagging it. I double-checked the unparser: today it simply funnels |
be08bf3 to
645e40b
Compare
645e40b to
38a27d5
Compare
|
Thanks for continuing to work on this - I'll try and do an in-depth review tomorrow for this @Weijun-H. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good after a quick review. My biggest concern is the ability to reset whole sections of config - I am uncertain we want to support that.
| } | ||
| _ => _config_err!( | ||
| "Config value \"{}\" not found on ConfigFileEncryptionProperties", | ||
| "Config value \"{}\" not found on ConfigFileDecryptionProperties", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
| SET datafusion.runtime.memory_limit = '1M' | ||
|
|
||
| statement ok | ||
| RESET datafusion.runtime.memory_limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the show to verify it was indeed reset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the SHOW statement does not display runtime settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file #18452
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a hack way to test, we can ensure the below query succeed after resetting.
> set datafusion.runtime.memory_limit = '1K';
0 row(s) fetched.
Elapsed 0.001 seconds.
> explain analyze select * from generate_series(1000) as t1(v1) order by v1;
Not enough memory to continue external sort. Consider increasing the memory limit, or decreasing sort_spill_reservation_bytes
caused by
Resources exhausted: Additional allocation failed for ExternalSorterMerge[0] with top memory consumers (across reservations) as:
DataFusion-Cli#89(can spill: false) consumed 0.0 B, peak 0.0 B,
ExternalSorter[0]#90(can spill: true) consumed 0.0 B, peak 0.0 B,
ExternalSorterMerge[0]#91(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated for this reservation - 1024.0 B remain available for the total pool
datafusion/common/src/config.rs
Outdated
| match section { | ||
| "catalog" => { | ||
| if rem.is_empty() { | ||
| self.catalog = CatalogOptions::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am uncertain that this should be supported to be honest. I think that a full configuration key should be required for reset.
| } | ||
|
|
||
| fn reset(&mut self, key: &str) -> $crate::error::Result<()> { | ||
| if key.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems off. Why error if there is a key but don't if there isn't? Seems backwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scalar config fields the caller already strips off the key for the field before invoking reset. At that point key only contains a nested suffix (if any).
- key.is_empty() → we’re exactly at the scalar value, so resetting means “restore to default”, which we do.
- non-empty key → someone tried to reset something below a scalar (e.g. foo.bar where foo is a scalar); that’s invalid, so we emit the helpful error.
datafusion/common/src/config.rs
Outdated
| Ok(()) | ||
| } else { | ||
| $crate::error::_config_err!( | ||
| "Config value \"{}\" not found on {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message doesn't make sense in this context. We also should have a test to cover this error
| } | ||
| "max_temp_directory_size" => { | ||
| // default is 100 GB | ||
| const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this const in disk_manager.rs, could we not just use that? Might need to change to pub(crate) though...
| "metadata_cache_limit" => { | ||
| // default is 50 MB | ||
| const DEFAULT_METADATA_CACHE_LIMIT: usize = 50 * 1024 * 1024; | ||
| builder = builder.with_metadata_cache_limit(DEFAULT_METADATA_CACHE_LIMIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here in cache_manager.rs
|
Small suggestions, but otherwise LGTM, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on it! I haven't review the implementations yet, I'll continue soon.
| SET datafusion.runtime.memory_limit = '1M' | ||
|
|
||
| statement ok | ||
| RESET datafusion.runtime.memory_limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a hack way to test, we can ensure the below query succeed after resetting.
> set datafusion.runtime.memory_limit = '1K';
0 row(s) fetched.
Elapsed 0.001 seconds.
> explain analyze select * from generate_series(1000) as t1(v1) order by v1;
Not enough memory to continue external sort. Consider increasing the memory limit, or decreasing sort_spill_reservation_bytes
caused by
Resources exhausted: Additional allocation failed for ExternalSorterMerge[0] with top memory consumers (across reservations) as:
DataFusion-Cli#89(can spill: false) consumed 0.0 B, peak 0.0 B,
ExternalSorter[0]#90(can spill: true) consumed 0.0 B, peak 0.0 B,
ExternalSorterMerge[0]#91(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated for this reservation - 1024.0 B remain available for the total pool
|
|
||
| statement error Arrow error: Parser error: Invalid timezone "Asia/Taipei2": failed to parse timezone | ||
| SELECT '2000-01-01T00:00:00'::TIMESTAMP::TIMESTAMPTZ | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some test coverage for invalid resets like
reset dataexplosion.execution.batch_size, reset datafusion.exec.batch_size, reset datafusion.execution.batches_size and datafusion.execution.batch_size.bar
|
Looks like some tests are failing in CI |
Which issue does this PR close?
Rationale for this change
Without a SQL-level reset, clients that SET DataFusion options have to rebuild the session to recover defaults
What changes are included in this PR?
ResetVariableAre these changes tested?
Yes
Are there any user-facing changes?
Yes, SQL clients (including the CLI) can issue RESET