@bubulalabu
Contributor

@bubulalabu bubulalabu commented Oct 11, 2025

Which issue does this PR close?

Addresses one portion of #17379.

Rationale for this change

PostgreSQL supports named arguments for function calls using the syntax function_name(param => value), which improves code readability and allows arguments to be specified in any order. DataFusion should support this syntax to enhance the user experience, especially for functions with many optional parameters.

What changes are included in this PR?

This PR implements PostgreSQL-style named arguments for scalar functions.

Features:

  • Parse named arguments from SQL (param => value syntax)
  • Resolve named arguments to positional order before execution
  • Support mixed positional and named arguments
  • Store parameter names in function signatures
  • Show parameter names in error messages

Limitations:

  • Named arguments only work for functions with known arity (fixed number of parameters)
  • Variadic functions (like concat) cannot use named arguments as they accept variable numbers of arguments
  • Supported signature types: Exact, Uniform, Any, Coercible, Comparable, Numeric, String, Nullary, ArraySignature, UserDefined, and OneOf (combinations of these)
  • Not supported: Variadic, VariadicAny

Implementation:

  • Added argument resolution logic with validation
  • Extended Signature with parameter_names field
  • Updated SQL parser to handle named argument syntax
  • Integrated into physical planning phase
  • Added comprehensive tests and documentation
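
The resolution step listed above (mapping named arguments back to positional order, with validation) could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `resolve_arguments`, its tuple-based argument representation, and the error strings are all hypothetical. It assumes that positional arguments must come before named ones, that names match case-insensitively, and that only trailing parameters may be omitted.

```rust
/// Reorder call arguments into the positional order given by `param_names`.
///
/// Each argument is `(name, value)`; a `None` name marks a positional argument.
/// Hypothetical helper for illustration only, not DataFusion's actual API.
fn resolve_arguments<T>(
    param_names: &[&str],
    args: Vec<(Option<String>, T)>,
) -> Result<Vec<T>, String> {
    if args.len() > param_names.len() {
        return Err(format!(
            "expected at most {} arguments, got {}",
            param_names.len(),
            args.len()
        ));
    }

    // One slot per declared parameter, filled as arguments are matched.
    let mut slots: Vec<Option<T>> = (0..param_names.len()).map(|_| None).collect();
    let mut seen_named = false;

    for (position, (name, value)) in args.into_iter().enumerate() {
        let index = match name {
            None if seen_named => {
                return Err("positional argument follows named argument".to_string());
            }
            // Positional arguments keep the index at which they were written.
            None => position,
            Some(name) => {
                seen_named = true;
                param_names
                    .iter()
                    .position(|p| p.eq_ignore_ascii_case(&name))
                    .ok_or_else(|| format!("unknown parameter '{name}'"))?
            }
        };
        if slots[index].is_some() {
            return Err(format!(
                "parameter '{}' specified more than once",
                param_names[index]
            ));
        }
        slots[index] = Some(value);
    }

    // Assumption: trailing parameters may be omitted, but gaps are rejected.
    let mut resolved = Vec::new();
    let mut first_gap: Option<usize> = None;
    for (i, slot) in slots.into_iter().enumerate() {
        match slot {
            Some(value) => {
                if let Some(gap) = first_gap {
                    return Err(format!(
                        "missing argument for parameter '{}'",
                        param_names[gap]
                    ));
                }
                resolved.push(value);
            }
            None => {
                if first_gap.is_none() {
                    first_gap = Some(i);
                }
            }
        }
    }
    Ok(resolved)
}
```

With `param_names = ["str", "start_pos", "length"]`, a call written as `length => 5, str => 'hello world', start_pos => 7` would come back in declaration order, while a positional argument after a named one, an unknown name, or a repeated name would produce an error.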

Example usage:

-- All named arguments
SELECT substr(str => 'hello world', start_pos => 7, length => 5);

-- Mixed positional and named arguments
SELECT substr('hello world', start_pos => 7, length => 5);

-- Named arguments in any order
SELECT substr(length => 5, str => 'hello world', start_pos => 7);

Improved error messages:

Before this PR, error messages showed generic types:

Candidate functions:
    substr(Any, Any)
    substr(Any, Any, Any)

After this PR, error messages show parameter names:

Candidate functions:
    substr(str, start_pos)
    substr(str, start_pos, length)

Example error output:

datafusion % target/debug/datafusion-cli
DataFusion CLI v50.1.0
> SELECT substr(str => 'hello world');
Error during planning: Execution error: Function 'substr' user-defined coercion failed with "Error during planning: The substr function requires 2 or 3 arguments, but got 1.". No function matches the given name and argument types 'substr(Utf8)'. You might need to add explicit type casts.
        Candidate functions:
        substr(str, start_pos, length)

Note: The function shows all parameters including optional ones for UserDefined signatures. The error message "requires 2 or 3 arguments" indicates that length is optional.
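
The before/after candidate lists above amount to preferring parameter names over type names when a signature carries them. A rough sketch of that choice, assuming a hypothetical `format_candidate` helper (the real formatting code in this PR may differ, and may include more detail per parameter):

```rust
/// Render one "Candidate functions" line.
///
/// Parameter names are used when present and when their count matches the
/// number of types; otherwise the generic type names are shown.
/// Hypothetical helper for illustration only.
fn format_candidate(
    fn_name: &str,
    type_names: &[&str],
    parameter_names: Option<&[&str]>,
) -> String {
    let shown = match parameter_names {
        Some(names) if names.len() == type_names.len() => names,
        _ => type_names,
    };
    format!("{fn_name}({})", shown.join(", "))
}
```

Falling back to the type names keeps the old output for functions that have not declared parameter names.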

Are these changes tested?

Yes, comprehensive tests are included:

  1. Unit tests (18 tests total):

    • Argument validation and reordering logic (8 tests in udf.rs)
    • Error message formatting with parameter names (2 tests in utils.rs)
    • TypeSignature parameter name support for all fixed-arity variants including ArraySignature (10 tests in signature.rs)
  2. Integration tests (named_arguments.slt):

    • Positional arguments (baseline)
    • Named arguments in order
    • Named arguments out of order
    • Mixed positional and named arguments
    • Optional parameters
    • Function aliases
    • Error cases (positional after named, unknown parameter, duplicate parameter)
    • Error message format verification

All tests pass successfully.

Are there any user-facing changes?

Yes, this PR adds new user-facing functionality:

  1. New SQL syntax: Users can now call functions with named arguments using param => value syntax (only for functions with fixed arity)
  2. Improved error messages: Signature mismatch errors now display parameter names instead of generic types
  3. UDF API: Function authors can add parameter names to their functions using:
    signature: Signature::uniform(2, vec![DataType::Float64], Volatility::Immutable)
        .with_parameter_names(vec!["base".to_string(), "exponent".to_string()])
        .expect("valid parameter names")

Potential breaking change (very unlikely): Added new public field parameter_names: Option<Vec<String>> to Signature struct. This is technically a breaking change if code constructs Signature using struct literal syntax. However, this is extremely unlikely in practice because:

  • Signature is almost always constructed using builder methods (Signature::exact(), Signature::uniform(), etc.)
  • The new field defaults to None, maintaining existing behavior
  • Existing code using builder methods continues to work without modification

No other breaking changes: The feature is purely additive - existing SQL queries and UDF implementations work without modification.
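
To illustrate why the new field only affects struct-literal construction, here is a simplified stand-in for `Signature`. The `arity` field and the validation rules in `with_parameter_names` are assumptions made for this sketch; only the `parameter_names: Option<Vec<String>>` field and the fallible builder shape come from the description above.

```rust
/// Simplified stand-in for DataFusion's `Signature` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Signature {
    /// Hypothetical field standing in for the real type signature.
    pub arity: usize,
    /// The newly added field; `None` preserves the old behavior.
    pub parameter_names: Option<Vec<String>>,
}

impl Signature {
    /// Builder-style constructor: callers never name the new field,
    /// so adding it does not break them.
    pub fn uniform(arity: usize) -> Self {
        Self { arity, parameter_names: None }
    }

    /// Attach parameter names. The checks here (matching count, no
    /// duplicates) are assumed for this sketch.
    pub fn with_parameter_names(mut self, names: Vec<String>) -> Result<Self, String> {
        if names.len() != self.arity {
            return Err(format!(
                "expected {} parameter names, got {}",
                self.arity,
                names.len()
            ));
        }
        {
            let mut seen = std::collections::HashSet::new();
            for name in &names {
                if !seen.insert(name.as_str()) {
                    return Err(format!("duplicate parameter name '{name}'"));
                }
            }
        }
        self.parameter_names = Some(names);
        Ok(self)
    }
}

// By contrast, a struct literal written before the field existed, such as
// `Signature { arity: 2 }`, no longer compiles once the field is added,
// because every public field must be listed.
```

Code that goes through the builder methods keeps compiling and gets `parameter_names: None` by default.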

@github-actions github-actions bot added the documentation, sql, logical-expr, sqllogictest, and functions labels Oct 11, 2025
@Omega359
Contributor

Nice! I'll try and find time to review this if no one beats me to it in the next few days.

@timsaucer
Member

It looks like we have some CI failures.

Why can we not support user defined signatures? This will be a problem for things like the FFI crate where we rely on those to handle UDF signatures. For me, this takes away the major compelling reason I wanted to get this support.

I made a small change to the description because I don't think this closes out the entire issue, but one portion of it.

I plan to take a closer look this weekend.

@bubulalabu
Contributor Author

@timsaucer thanks for having a first look

I incorporated your feedback

Member

@timsaucer timsaucer left a comment

This looks like a major improvement. Thank you for the contribution!


## Named Arguments

DataFusion supports PostgreSQL-style named arguments for scalar functions, allowing you to pass arguments by parameter name:
Member

Will this work for other dialects? Do you know what happens when you try setting a different dialect and running your tests against it?

Member

My guess is that this does work with other dialects out of the box.

Contributor Author

It didn't work for the PostgreSQL dialect at first, but I added support for it. It doesn't work for the MsSQL dialect, and I wasn't able to add support for that one.

@timsaucer
Member

Is there a way that we can also take care of Variadic and VariadicAny? What is the technical holdup for taking care of them also?

@bubulalabu
Contributor Author

@timsaucer

Is there a way that we can also take care of Variadic and VariadicAny? What is the technical holdup for taking care of them also?

How do you envision the call syntax? Section 35.4.5, "SQL Functions with Variable Numbers of Arguments", of https://www.postgresql.org/docs/9.5/xfunc-sql.html describes how PostgreSQL does it.

I haven't checked the feasibility, but at first glance it looks like a niche case that would add a lot of complexity.

@Jefffrey
Contributor

I'll take another look at this PR soon, sorry for the delay

Contributor

@Jefffrey Jefffrey left a comment

Overall I think this looks good, with some very nice test coverage. In future followups we can look at adding these names to more of our inbuilt functions, and potentially hooking them into the auto-generated documentation somehow 🤔

Would be great if we could have another pair of eyes on this, considering the size of the PR and the functionality it adds, cc @timsaucer

Comment on lines 3957 to 3959
#[test]
fn test_named_arguments_with_dialects() {
let sql = "SELECT my_func(arg1 => 'value1')";
Contributor

I find this test quite hard to read and understand what exactly it is supposed to test; could we get away with removing this and potentially only doing it via SLT test cases? (I see we already have some postgres & mssql dialect cases so maybe those are already sufficient)

Contributor Author

The purpose of this test was to understand how each dialect parses the parameter name. Most dialects return FunctionArg::Named, the PostgreSqlDialect returns FunctionArg::ExprNamed, and MsSqlDialect doesn't return anything.

Yes, the SLT test cases cover the findings of this test.
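
The dialect difference described above could be normalized in a single match, roughly as sketched below. The enums are deliberately simplified stand-ins for the parser's types (the real `FunctionArg` variants carry more fields), and `split_name` is a hypothetical helper, not the function changed in this PR.

```rust
/// Simplified stand-in for a parsed SQL expression (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Identifier(String),
    Literal(String),
}

/// Simplified stand-in for the parser's function-argument type.
#[derive(Debug, Clone, PartialEq)]
enum FunctionArg {
    /// Most dialects: the name is parsed as a plain identifier.
    Named { name: String, arg: Expr },
    /// PostgreSQL dialect: the name is parsed as an expression.
    ExprNamed { name: Expr, arg: Expr },
    /// Positional argument.
    Unnamed(Expr),
}

/// Split an argument into an optional parameter name and its value,
/// treating both named forms the same way.
fn split_name(arg: FunctionArg) -> Result<(Option<String>, Expr), String> {
    match arg {
        FunctionArg::Named { name, arg }
        | FunctionArg::ExprNamed { name: Expr::Identifier(name), arg } => {
            Ok((Some(name), arg))
        }
        // Assumption: a name that is not a bare identifier is rejected.
        FunctionArg::ExprNamed { .. } => {
            Err("argument name must be an identifier".to_string())
        }
        FunctionArg::Unnamed(arg) => Ok((None, arg)),
    }
}
```

After this step the rest of the planner only needs to deal with `(Option<String>, Expr)` pairs, regardless of which dialect produced them.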

@alamb
Contributor

alamb commented Oct 26, 2025

@bubulalabu can you please review @Jefffrey's comments and let us know if you can address them before we merge this PR?

Thank you 🙏

@bubulalabu
Contributor Author

yes, I'll address @Jefffrey's comments soon

Implement support for calling functions with named parameters using
PostgreSQL-style syntax (param => value).

Features:
- Parse named arguments from SQL (param => value syntax)
- Resolve named arguments to positional order before execution
- Support mixed positional and named arguments
- Store parameter names in function signatures
- Show parameter names in error messages

Implementation:
- Added argument resolution logic with validation
- Extended Signature with parameter_names field
- Updated SQL parser to handle named argument syntax
- Integrated into physical planning phase
- Added comprehensive tests and documentation

Example usage:
  SELECT substr(str => 'hello', start_pos => 2, length => 3);
  SELECT substr('hello', start_pos => 2, length => 3);

Error messages now show:
  Candidate functions:
    substr(str, start_pos)
    substr(str, start_pos, length)

Instead of generic types like substr(Any, Any).

Related issue: apache#17379
added test to verify arguments can be used case-insensitively
added support for PostgreSQL dialect and verified other dialects
…rt_pos: Int64")

fix ArraySignature to pair each name with its corresponding type
move positional argument validation upfront
reset to default dialect in named_arguments.slt after testing MsSQL dialect
compacted pattern matching in sql_fn_arg_to_logical_expr_with_name
@bubulalabu bubulalabu force-pushed the 17379-add-named-arguments branch from 07d033b to ebc391e Compare October 26, 2025 12:54
@Jefffrey Jefffrey added this pull request to the merge queue Oct 28, 2025
Merged via the queue into apache:main with commit 74beabc Oct 28, 2025
33 checks passed
@Jefffrey
Contributor

Thanks @bubulalabu

@bubulalabu
Contributor Author

Likewise, thanks for the thorough feedback; it was very educational.

tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
…ache#18019)

github-merge-queue bot pushed a commit that referenced this pull request Nov 5, 2025
## Which issue does this PR close?

Addresses portions of #17379.

## Rationale for this change

Add support for aggregate and window UDFs in the same way as we did it
for scalar UDFs here: #18019

## Are these changes tested?

Yes

## Are there any user-facing changes?

Yes, the changes are user-facing, documented, purely additive and
non-breaking.
jizezhang pushed a commit to jizezhang/datafusion that referenced this pull request Nov 5, 2025
…8389)
