Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,7 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
override def nullable: Boolean = child.nullable
override def foldable: Boolean = child.foldable
override def dataType: DataType = child.dataType
override lazy val canonicalized: Expression = child.canonicalized
Copy link
Member

@gatorsmile gatorsmile Jul 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this fixes the issue, the root cause is the unneeded aliases (these aliases were added for nested fields) are not properly cleaned up in these RuntimeReplaceable expressions by the rule CleanupAliases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it. However, the aliases actually are replaced by Optimizer because RuntimeReplaceable. So they are no harm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, those aliases are not children of RuntimeReplaceable which is an UnaryExpression. So we can't trim the aliases out by simple transforming the expressions in CleanupAliases.

If we want to replace the non-children aliases in RuntimeReplaceable, we need to add more codes to RuntimeReplaceable and modify all expressions of RuntimeReplaceable. It makes the interface ugly IMO.

Consider those aliases will be replaced soon and no harm of them, that's why I choose to simply override canonicalized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, my only concern is CleanupAliases might be called in the other Analyzer rules. These unneeded aliases could break the assumptions made by the caller of CleanupAliases.

Second, you should post what I said above in the PR description. So far, the description is not clear to the reviewers.

Third, I am ok about the fix, but we should write the comments to explain why we did it here. We need to add two comments. One is in CleanupAliases ; another is above this line to explain why we add canonicalized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR description and comments accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you meant there is an assumption that CleanupAliases would clean up ALL aliases in a plan, that is not correct. It can only clean up the aliases reachable by expression transformation. I added a comment to clarify this.

}


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,9 @@ SELECT float(1), double(1), decimal(1);
SELECT date("2014-04-04"), timestamp(date("2014-04-04"));
-- error handling: only one argument
SELECT string(1, 2);

-- SPARK-21555: RuntimeReplaceable used in group by
CREATE TABLE test(a INT, foo STRUCT<foo1:STRING,foo2:STRING>) USING parquet;
INSERT INTO test VALUES(1, ("value1", "value2"));
SELECT nvl(foo.foo1, "value"), count(*) FROM test GROUP BY nvl(foo.foo1, "value");
DROP TABLE test;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify the test case to something like

CREATE TEMPORARY VIEW tempView1 AS VALUES (1, NAMED_STRUCT('col1', 'gamma', 'col2', 'delta')) AS T(id, st)")
SELECT nvl(st.col1, "value"), count(*) FROM from tempView1 GROUP BY nvl(st.col1, "value")

Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 13
-- Number of queries: 17


-- !query 0
Expand Down Expand Up @@ -122,3 +122,35 @@ struct<>
-- !query 12 output
org.apache.spark.sql.AnalysisException
Function string accepts only one argument; line 1 pos 7


-- !query 13
CREATE TABLE test(a INT, foo STRUCT<foo1:STRING,foo2:STRING>) USING parquet
-- !query 13 schema
struct<>
-- !query 13 output



-- !query 14
INSERT INTO test VALUES(1, ("value1", "value2"))
-- !query 14 schema
struct<>
-- !query 14 output



-- !query 15
SELECT nvl(foo.foo1, "value"), count(*) FROM test GROUP BY nvl(foo.foo1, "value")
-- !query 15 schema
struct<nvl(test.`foo`.`foo1` AS `foo1`, 'value'):string,count(1):bigint>
-- !query 15 output
value1 1


-- !query 16
DROP TABLE test
-- !query 16 schema
struct<>
-- !query 16 output