[BUG] Extra fields returned for left semi and left anti joins #1192

@ChrisJar

Description

What happened:
When performing a left semi or left anti join, getFieldList and getFieldNames return an extra field that we need to filter out:

select_names = [field for field in rel.getRowType().getFieldList()]

Minimal Complete Verifiable Example:
For example:

import pandas as pd
from dask_sql import Context

c = Context()

dfa = pd.DataFrame({"id":[1,2,2,4], "a":["a","b","c","d"]})
dfb = pd.DataFrame({"id":[2,3,3,4], "b":["e","f","g","h"]})
c.create_table("dfa", dfa, gpu=True)
c.create_table("dfb", dfb, gpu=True)

query = "Select * from dfa left anti join dfb on dfa.id = dfb.id"
res = c.sql(query).compute()
print(res)

This query should result in only 2 field names being returned by getFieldNames (the columns of dfa), yet it returns 3.
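For reference, here is a minimal pandas-only sketch of the output shape one would expect from these joins. It is not how dask-sql implements them; it only assumes the usual semantics, where a left semi or left anti join filters the rows of the left table and returns the left table's columns only. It runs on CPU, so no gpu=True is needed.

```python
import pandas as pd

dfa = pd.DataFrame({"id": [1, 2, 2, 4], "a": ["a", "b", "c", "d"]})
dfb = pd.DataFrame({"id": [2, 3, 3, 4], "b": ["e", "f", "g", "h"]})

# Boolean mask: True where a dfa row has at least one matching id in dfb.
has_match = dfa["id"].isin(dfb["id"])

# Left anti join: keep dfa rows with no match in dfb.
anti = dfa[~has_match]
# Left semi join: keep dfa rows with at least one match in dfb.
semi = dfa[has_match]

# In both cases only dfa's columns appear in the result.
print(list(anti.columns))
print(anti)
print(list(semi.columns))
print(semi)
```

Under that assumption, both results carry only the two columns id and a, which is the field count the report expects from getFieldNames.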

Metadata

Labels: bug (Something isn't working), needs triage (Awaiting triage by a dask-sql maintainer)
