Skip to content

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Jun 2, 2022

What changes were proposed in this pull request?

After SPARK-38118, dataTypeMismatchError related to Filter will be checked and throw in RemoveTempResolvedColumn, this will cause compatibility issue with exception message presentation.

For example, the following case:

spark.sql("create table t1(user_id int, auct_end_dt date) using parquet;")
spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show

The expected message is

Table or view not found: t2

But the actual message is

org.apache.spark.sql.AnalysisException: cannot resolve 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires date type, however, ''2020-12-27'' is of string type.; line 1 pos 76

For forward compatibility, this pr change to only records DATA_TYPE_MISMATCH_ERROR_MESSAGE in the RemoveTempResolvedColumn check process , and move failAnalysis to CheckAnalysis#checkAnalysis

Why are the changes needed?

Fix analysis exception message compatibility.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass Github Actions and add a new test case

@github-actions github-actions bot added the SQL label Jun 2, 2022
|SELECT (t.c AND t.d) c
|FROM t
|GROUP BY t.c
|GROUP BY t.c, t.d
Copy link
Contributor Author

@LuciferYang LuciferYang Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without t.d, the error message is

expression 't.d' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.

@LuciferYang
Copy link
Contributor Author

Try to fix SPARK-39354, if acceptable, I will update the pr description

@wangyum
Copy link
Member

wangyum commented Jun 2, 2022

@cloud-fan

@LuciferYang LuciferYang changed the title [DON'T MERGE] [DON'T MERGE][SQL] Show Table or view not found first even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [DON'T MERGE][SQL] Show Table or view not found first even if there are dataTypeMismatchError at the same time [DON'T MERGE][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [DON'T MERGE][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang marked this pull request as draft June 2, 2022 02:31
@LuciferYang LuciferYang marked this pull request as ready for review June 2, 2022 02:31
@LuciferYang LuciferYang changed the title [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError at the same time [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError for Filter at the same time Jun 2, 2022
@LuciferYang LuciferYang changed the title [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError for Filter at the same time [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time Jun 2, 2022
@LuciferYang
Copy link
Contributor Author

Try to fix SPARK-39354, if acceptable, I will update the pr description

done

@LuciferYang LuciferYang changed the title [WIP][SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time Jun 2, 2022
@MaxGekk
Copy link
Member

MaxGekk commented Jun 2, 2022

+1, LGTM. Merging to master/3.3.
Thank you, @LuciferYang.

MaxGekk pushed a commit that referenced this pull request Jun 2, 2022
…e are `dataTypeMismatchError` related to `Filter` at the same time

### What changes were proposed in this pull request?
After SPARK-38118,  `dataTypeMismatchError` related to `Filter` will be checked and throw in `RemoveTempResolvedColumn`,  this will cause compatibility issue with exception message presentation.

For example, the following case:

```
spark.sql("create table t1(user_id int, auct_end_dt date) using parquet;")
spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
```

The expected message is

```
Table or view not found: t2
```

But the actual message is
```
org.apache.spark.sql.AnalysisException: cannot resolve 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
```

For forward compatibility, this pr change to only records `DATA_TYPE_MISMATCH_ERROR_MESSAGE` in the `RemoveTempResolvedColumn` check  process , and move `failAnalysis` to `CheckAnalysis#checkAnalysis`

### Why are the changes needed?
Fix analysis exception message compatibility.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions and add a new test case

Closes #36746 from LuciferYang/SPARK-39354.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit 89fdb8a)
Signed-off-by: Max Gekk <[email protected]>
@MaxGekk MaxGekk closed this in 89fdb8a Jun 2, 2022
@LuciferYang
Copy link
Contributor Author

thanks @MaxGekk @wangyum ~

cloud-fan added a commit that referenced this pull request Jun 16, 2022
### What changes were proposed in this pull request?

This is a followup of #35404 and #36746 , to simplify the error handling of `TempResolvedColumn`.

The idea is:
1. The rule `ResolveAggregationFunctions` in the main resolution batch creates `TempResolvedColumn` and only removes it if the aggregate expression is fully resolved. It either strips `TempResolvedColumn` if it's inside aggregate function or group expression, or restores `TempResolvedColumn` to `UnresolvedAttribute` otherwise, hoping other rules can resolve it.
2. The rule `RemoveTempResolvedColumn` in a latter batch can still hit `TempResolvedColumn` if the aggregate expression is unresolved (due to input type mismatch for example, e.g. `avg(bool_col)`, `date_add(int_group_col, 1)`). At this stage, there is no way to restore `TempResolvedColumn` to `UnresolvedAttribute`  and resolve it differently. The query will fail and we should blindly strip `TempResolvedColumn` to provide better error message.
### Why are the changes needed?

code cleanup

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #36809 from cloud-fan/error.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants