-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add PostgreSQL-style named arguments support for scalar functions #18019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Nice! I'll try and find time to review this if no one beats me to it in the next few days. |
|
It looks like we have some CI failures. Why can we not support user defined signatures? This will be a problem for things like the FFI crate where we rely on those to handle UDF signatures. For me, this takes away the major compelling reason I wanted to get this support. I made a small change to the description because I don't think this closes out the entire issue, but one portion of it. I plan to take a closer look this weekend. |
|
@timsaucer thanks for having a first look I incorporated your feedback |
timsaucer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a major improvement. Thank you for the contribution!
|
|
||
| ## Named Arguments | ||
|
|
||
| DataFusion supports PostgreSQL-style named arguments for scalar functions, allowing you to pass arguments by parameter name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work for other dialects? Do you know what happens when you try setting a different dialect and running your tests against it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that this does work with other dialects out of the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it didn't work for the postgres dialect, but I added support for it, and it doesn't work for the mssql dialect for which I wasn't able to add support
|
Is there a way that we can also take care of |
How do you envision the call syntax? This https://www.postgresql.org/docs/9.5/xfunc-sql.html section I haven't checked the feasibility, but on first glance it looks like a niche case with much added complexity |
|
I'll take another look at this PR soon, sorry for the delay |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks good, with some very nice test coverage. In future followups we can look at adding these names to more of our inbuilt functions, and potentially hooking them into the auto-generated documentation somehow 🤔
Would be great if could have another pair of eyes considering the size of PR & functionality it adds, cc @timsaucer
| #[test] | ||
| fn test_named_arguments_with_dialects() { | ||
| let sql = "SELECT my_func(arg1 => 'value1')"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this test quite hard to read and understand what exactly it is supposed to test; could we get away with removing this and potentially only doing it via SLT test cases? (I see we already have some postgres & mssql dialect cases so maybe those are already sufficient)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this test was to understand how each dialect parses the parameter name. Most dialects return FunctionArg::Named, the PostgresSQLDialect returns FunctionArg::ExprNamed, and MsSqlDialect doesn't return anything.
Yes, the SLT test cases cover the findings of this test.
|
@bubulalabu can you please review @Jefffrey's comments and let us know if you can address them before we merge this PR? Thank you 🙏 |
|
yes, I'll address @Jefffrey's comments soon |
Implement support for calling functions with named parameters using
PostgreSQL-style syntax (param => value).
Features:
- Parse named arguments from SQL (param => value syntax)
- Resolve named arguments to positional order before execution
- Support mixed positional and named arguments
- Store parameter names in function signatures
- Show parameter names in error messages
Implementation:
- Added argument resolution logic with validation
- Extended Signature with parameter_names field
- Updated SQL parser to handle named argument syntax
- Integrated into physical planning phase
- Added comprehensive tests and documentation
Example usage:
SELECT substr(str => 'hello', start_pos => 2, length => 3);
SELECT substr('hello', start_pos => 2, length => 3);
Error messages now show:
Candidate functions:
substr(str, start_pos)
substr(str, start_pos, length)
Instead of generic types like substr(Any, Any).
Related issue: apache#17379
fixed CI issues
added test to verify arguments can be used case in-sensitive added support for PostgresSQL dialect and verified other dialects
…rt_pos: Int64") fix ArraySignature to pair each name with its corresponding type move positional argument validation upfront
reset to default dialect in named_arguments.slt after testing MsSQL dialect compacted pattern matching in sql_fn_arg_to_logical_expr_with_name
07d033b to
ebc391e
Compare
|
Thanks @bubulalabu |
|
likewise thanks for the thorough feedback, it was very educative |
…ache#18019) ## Which issue does this PR close? Addresses one portion of apache#17379. ## Rationale for this change PostgreSQL supports named arguments for function calls using the syntax `function_name(param => value)`, which improves code readability and allows arguments to be specified in any order. DataFusion should support this syntax to enhance the user experience, especially for functions with many optional parameters. ## What changes are included in this PR? This PR implements PostgreSQL-style named arguments for scalar functions. **Features:** - Parse named arguments from SQL (param => value syntax) - Resolve named arguments to positional order before execution - Support mixed positional and named arguments - Store parameter names in function signatures - Show parameter names in error messages **Limitations:** - Named arguments only work for functions with known arity (fixed number of parameters) - Variadic functions (like `concat`) cannot use named arguments as they accept variable numbers of arguments - Supported signature types: `Exact`, `Uniform`, `Any`, `Coercible`, `Comparable`, `Numeric`, `String`, `Nullary`, `ArraySignature`, `UserDefined`, and `OneOf` (combinations of these) - Not supported: `Variadic`, `VariadicAny` **Implementation:** - Added argument resolution logic with validation - Extended Signature with parameter_names field - Updated SQL parser to handle named argument syntax - Integrated into physical planning phase - Added comprehensive tests and documentation **Example usage:** ```sql -- All named arguments SELECT substr(str => 'hello world', start_pos => 7, length => 5); -- Mixed positional and named arguments SELECT substr('hello world', start_pos => 7, length => 5); -- Named arguments in any order SELECT substr(length => 5, str => 'hello world', start_pos => 7); ``` **Improved error messages:** Before this PR, error messages showed generic types: ``` Candidate functions: substr(Any, Any) substr(Any, Any, Any) ``` After this PR, error messages show parameter names: ``` Candidate functions: substr(str, start_pos) substr(str, start_pos, length) ``` Example error output: ``` datafusion % target/debug/datafusion-cli DataFusion CLI v50.1.0 > SELECT substr(str => 'hello world'); Error during planning: Execution error: Function 'substr' user-defined coercion failed with "Error during planning: The substr function requires 2 or 3 arguments, but got 1.". No function matches the given name and argument types 'substr(Utf8)'. You might need to add explicit type casts. Candidate functions: substr(str, start_pos, length) ``` Note: The function shows all parameters including optional ones for UserDefined signatures. The error message "requires 2 or 3 arguments" indicates that `length` is optional. ## Are these changes tested? Yes, comprehensive tests are included: 1. **Unit tests** (18 tests total): - Argument validation and reordering logic (8 tests in `udf.rs`) - Error message formatting with parameter names (2 tests in `utils.rs`) - TypeSignature parameter name support for all fixed-arity variants including ArraySignature (10 tests in `signature.rs`) 2. **Integration tests** (`named_arguments.slt`): - Positional arguments (baseline) - Named arguments in order - Named arguments out of order - Mixed positional and named arguments - Optional parameters - Function aliases - Error cases (positional after named, unknown parameter, duplicate parameter) - Error message format verification All tests pass successfully. ## Are there any user-facing changes? **Yes**, this PR adds new user-facing functionality: 1. **New SQL syntax**: Users can now call functions with named arguments using `param => value` syntax (only for functions with fixed arity) 2. **Improved error messages**: Signature mismatch errors now display parameter names instead of generic types 3. **UDF API**: Function authors can add parameter names to their functions using: ```rust signature: Signature::uniform(2, vec![DataType::Float64], Volatility::Immutable) .with_parameter_names(vec!["base".to_string(), "exponent".to_string()]) .expect("valid parameter names") ``` **Potential breaking change** (very unlikely): Added new public field `parameter_names: Option<Vec<String>>` to `Signature` struct. This is technically a breaking change if code constructs `Signature` using struct literal syntax. However, this is extremely unlikely in practice because: - `Signature` is almost always constructed using builder methods (`Signature::exact()`, `Signature::uniform()`, etc.) - The new field defaults to `None`, maintaining existing behavior - Existing code using builder methods continues to work without modification **No other breaking changes**: The feature is purely additive - existing SQL queries and UDF implementations work without modification.
## Which issue does this PR close? Addresses portions of #17379. ## Rationale for this change Add support for aggregate and window UDFs in the same way as we did it for scalar UDFs here: #18019 ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the changes are user-facing, documented, purely additive and non-breaking.
…8389) ## Which issue does this PR close? Addresses portions of apache#17379. ## Rationale for this change Add support for aggregate and window UDFs in the same way as we did it for scalar UDFs here: apache#18019 ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the changes are user-facing, documented, purely additive and non-breaking.
Which issue does this PR close?
Addresses one portion of #17379.
Rationale for this change
PostgreSQL supports named arguments for function calls using the syntax
function_name(param => value), which improves code readability and allows arguments to be specified in any order. DataFusion should support this syntax to enhance the user experience, especially for functions with many optional parameters.What changes are included in this PR?
This PR implements PostgreSQL-style named arguments for scalar functions.
Features:
Limitations:
concat) cannot use named arguments as they accept variable numbers of argumentsExact,Uniform,Any,Coercible,Comparable,Numeric,String,Nullary,ArraySignature,UserDefined, andOneOf(combinations of these)Variadic,VariadicAnyImplementation:
Example usage:
Improved error messages:
Before this PR, error messages showed generic types:
After this PR, error messages show parameter names:
Example error output:
Note: The function shows all parameters including optional ones for UserDefined signatures. The error message "requires 2 or 3 arguments" indicates that
lengthis optional.Are these changes tested?
Yes, comprehensive tests are included:
Unit tests (18 tests total):
udf.rs)utils.rs)signature.rs)Integration tests (
named_arguments.slt):All tests pass successfully.
Are there any user-facing changes?
Yes, this PR adds new user-facing functionality:
param => valuesyntax (only for functions with fixed arity)Potential breaking change (very unlikely): Added new public field
parameter_names: Option<Vec<String>>toSignaturestruct. This is technically a breaking change if code constructsSignatureusing struct literal syntax. However, this is extremely unlikely in practice because:Signatureis almost always constructed using builder methods (Signature::exact(),Signature::uniform(), etc.)None, maintaining existing behaviorNo other breaking changes: The feature is purely additive - existing SQL queries and UDF implementations work without modification.