Skip to content

Incorrect column cast when coalescing array type #9098

@vigimite

Description

@vigimite

Describe the bug

When coalescing between two arrays, datafusion will infer the type to be Utf8.

To Reproduce

❯ select arrow_typeof(coalesce([1], [2]));
+-------------------------------------------------------------------+
| arrow_typeof(coalesce(make_array(Int64(1)),make_array(Int64(2)))) |
+-------------------------------------------------------------------+
| Utf8                                                              |
+-------------------------------------------------------------------+
1 row in set. Query took 0.001 seconds.

❯ select arrow_typeof(coalesce(NULL, [2]));
+---------------------------------------------------+
| arrow_typeof(coalesce(NULL,make_array(Int64(2)))) |
+---------------------------------------------------+
| Utf8                                              |
+---------------------------------------------------+
1 row in set. Query took 0.001 seconds.

❯ select arrow_typeof(coalesce([1], NULL));
+---------------------------------------------------+
| arrow_typeof(coalesce(make_array(Int64(1)),NULL)) |
+---------------------------------------------------+
| Utf8                                              |
+---------------------------------------------------+
1 row in set. Query took 0.001 seconds.

Expected behavior

Postgres handles this correctly for example:

select pg_typeof(coalesce(NULL, array[1,2,3]));

-- pg_typeof
-- integer[]

Additional context

I didn't test this extensively but I think the same is true for struct types (and by extension maybe all "complex" type?)

If someone comes accross this, I am currently working around this by replacing the coalesce with a case statement:

when(col("c1").is_null(), col("c2"))
    .otherwise(col("c1"))
    .unwrap()
    .alias("c1"),

This preserves the type of the column.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinghelp wantedExtra attention is needed

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions